Research Article
LTTng CLUST: A System-Wide Unified CPU and GPU Tracing Tool for OpenCL Applications
Table 3
Asynchronous OpenCL API function overhead benchmark.
| Buffer size (bytes) | Base avg. (ns/call) | Base std. dev. (ns/call) | Preload avg. (ns/call) | Preload std. dev. (ns/call) | Trace avg. (ns/call) | Trace std. dev. (ns/call) | Preload overhead vs. base (ns/call) | Trace overhead vs. base (ns/call) |
|---|---|---|---|---|---|---|---|---|
| 4 | 149.51 | 1.47 | 164.7 | 1.1 | 7000.6 | 261.8 | 15.2 | 6851.1 |
| 40 | 158.99 | 0.92 | 168.7 | 1.3 | 7026.8 | 289.5 | 9.7 | 6867.8 |
| 400 | 156.15 | 1.50 | 174.7 | 1.3 | 7269.3 | 240.6 | 18.5 | 7113.2 |
|  | 188.44 | 1.14 | 226.7 | 1.3 | 7043.6 | 244.2 | 38.3 | 6855.2 |
|  | 1499.76 | 5.47 | 1503.3 | 5.6 | 8393.0 | 227.2 | 3.6 | 6893.3 |
|  | 17805.67 | 134.31 | 17862.1 | 16.1 | 25404.7 | 276.3 | 56.4 | 7599.0 |

Sample size = 100; loop size = 1000.
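Both overhead columns appear to equal the corresponding configuration's average minus the base average (for the 4-byte row, 164.7 − 149.51 ≈ 15.2 ns and 7000.6 − 149.51 ≈ 6851.1 ns). The short Python sketch below recomputes every overhead from the tabulated averages under that assumed definition and compares it with the reported value. It is only a consistency check, not the authors' tooling. Because the source omits the buffer sizes of the last three rows, the sketch identifies rows by position.

```python
# Minimal consistency check for Table 3 (a sketch, not the authors' tooling).
# Assumption: each overhead column = that configuration's average - base average.

# (base_avg, preload_avg, trace_avg) in ns/call, copied from Table 3.
# Buffer sizes are given only for the first three rows (4, 40, 400 bytes).
rows = [
    (149.51, 164.7, 7000.6),
    (158.99, 168.7, 7026.8),
    (156.15, 174.7, 7269.3),
    (188.44, 226.7, 7043.6),
    (1499.76, 1503.3, 8393.0),
    (17805.67, 17862.1, 25404.7),
]

# Reported (preload_overhead, trace_overhead) in ns/call, also from Table 3.
reported = [
    (15.2, 6851.1),
    (9.7, 6867.8),
    (18.5, 7113.2),
    (38.3, 6855.2),
    (3.6, 6893.3),
    (56.4, 7599.0),
]


def overheads(base: float, preload: float, trace: float) -> tuple[float, float]:
    """Return (preload - base, trace - base) under the assumed definition."""
    return preload - base, trace - base


if __name__ == "__main__":
    for i, (avgs, (rep_pre, rep_trace)) in enumerate(zip(rows, reported), start=1):
        pre, trace = overheads(*avgs)
        # Print recomputed values next to the published ones for comparison.
        print(f"row {i}: preload {pre:.2f} (reported {rep_pre}), "
              f"trace {trace:.2f} (reported {rep_trace})")
```

Every recomputed value agrees with the published one to within 0.1 ns/call. The small residual differences (for example 3.54 versus 3.6 in the fifth row) are consistent with the averages having been rounded before publication. The recomputed values also show that the tracing overhead stays between roughly 6.85 and 7.6 µs per call across all rows, while the preload overhead remains below 60 ns.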