Research Article

LTTng CLUST: A System-Wide Unified CPU and GPU Tracing Tool for OpenCL Applications

Table 3

Asynchronous OpenCL API function overhead benchmark.

| Buffer size (bytes) | Base ave. (ns/call) | Base std. dev. (ns/call) | Preload ave. (ns/call) | Preload std. dev. (ns/call) | Trace ave. (ns/call) | Trace std. dev. (ns/call) | Preload overhead (ns/call) | Trace overhead (ns/call) |
|---|---|---|---|---|---|---|---|---|
| 4 | 149.51 | 1.47 | 164.7 | 1.1 | 7000.6 | 261.8 | 15.2 | 6851.1 |
| 40 | 158.99 | 0.92 | 168.7 | 1.3 | 7026.8 | 289.5 | 9.7 | 6867.8 |
| 400 | 156.15 | 1.50 | 174.7 | 1.3 | 7269.3 | 240.6 | 18.5 | 7113.2 |
|  | 188.44 | 1.14 | 226.7 | 1.3 | 7043.6 | 244.2 | 38.3 | 6855.2 |
|  | 1499.76 | 5.47 | 1503.3 | 5.6 | 8393.0 | 227.2 | 3.6 | 6893.3 |
|  | 17805.67 | 134.31 | 17862.1 | 16.1 | 25404.7 | 276.3 | 56.4 | 7599.0 |

Sample size = 100; loop size = 1000.
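
The per-call figures and overhead columns in Table 3 can be illustrated with a minimal sketch. It assumes that each of the 100 samples times a loop of 1000 calls and divides the elapsed time by the loop size, and that each overhead value is the mode's average minus the base average. The first assumption is one reading of the table note, not something the table states. The second agrees with the reported rows to within rounding. The helper names `per_call_stats` and `overhead` are hypothetical, and the sketch times an arbitrary Python callable rather than an actual OpenCL call.

```python
import statistics
import time


def per_call_stats(fn, samples=100, loops=1000):
    """Return (mean, std dev) of the per-call cost of fn, in ns/call.

    Assumed procedure: each sample times `loops` back-to-back calls
    and divides the elapsed time by `loops`.
    """
    per_call = []
    for _ in range(samples):
        start = time.perf_counter_ns()
        for _ in range(loops):
            fn()
        elapsed = time.perf_counter_ns() - start
        per_call.append(elapsed / loops)
    return statistics.mean(per_call), statistics.stdev(per_call)


def overhead(mode_mean, base_mean):
    """Overhead of a mode relative to the base run, rounded to 0.1 ns."""
    return round(mode_mean - base_mean, 1)


# First row of Table 3 (4-byte buffer): base, preload, and trace averages.
base, preload, trace = 149.51, 164.7, 7000.6
print(overhead(preload, base))  # 15.2
print(overhead(trace, base))    # 6851.1
```

Running the same subtraction on the other rows reproduces the reported overheads to within 0.1 ns/call, which suggests the table's averages were rounded before printing.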