Research Article
LTTng CLUST: A System-Wide Unified CPU and GPU Tracing Tool for OpenCL Applications
Table 2
Synchronous OpenCL API function overhead benchmark.
| Loop size | Base avg. (ns/call) | Base std. dev. (ns/call) | Preload avg. (ns/call) | Preload std. dev. (ns/call) | Trace avg. (ns/call) | Trace std. dev. (ns/call) | Preload overhead (ns/call) | Trace overhead (ns/call) |
|---|---|---|---|---|---|---|---|---|
| 1 | 16 | 3 | 18 | 3 | 383 | 8 | 2 | 367 |
| 10 | 5.2 | 0.5 | 7.8 | 0.6 | 366.5 | 2.2 | 2.6 | 361.3 |
|  | 4.64 | 0.04 | 6.66 | 0.05 | 365.68 | 6.38 | 2.02 | 361.04 |
|  | 4.291 | 0.006 | 6.058 | 0.028 | 365.168 | 2.88 | 1.767 | 360.877 |
|  | 4.277 | 0.012 | 6.283 | 0.036 | 359.780 | 13.425 | 2.006 | 355.503 |
|  | 4.526 | 0.005 | 6.484 | 0.101 | 359.379 | 1.055 | 1.958 | 354.853 |
|  | 4.531 | 0.029 | 6.467 | 0.097 | 363.313 | 5.138 | 1.936 | 358.782 |
|  | 4.537 | 0.018 | 6.499 | 0.150 | 361.145 | 2.791 | 1.962 | 356.608 |
|  | 4.535 | 0.022 | 6.460 | 0.026 | 361.108 | 1.966 | 1.925 | 356.573 |
Sample size = 100.
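
Comparing the columns of Table 2 suggests that each overhead value is the corresponding average minus the base average (for example, 383 - 16 = 367 ns/call for the trace overhead in the first row). The following is a minimal Python sketch of that arithmetic, assuming this reading of the table is correct. The function name `overheads` is hypothetical and is not part of LTTng CLUST; the sample values are copied from three rows of the table.

```python
# (base_avg, preload_avg, trace_avg) in ns/call, copied from three rows of Table 2.
rows = [
    (16.0, 18.0, 383.0),
    (5.2, 7.8, 366.5),
    (4.535, 6.460, 361.108),
]


def overheads(base_avg, preload_avg, trace_avg):
    """Return (preload_overhead, trace_overhead) relative to the base run.

    Assumption: overhead is the difference between the instrumented
    average and the base average, as the table values suggest.
    """
    return preload_avg - base_avg, trace_avg - base_avg


for base, preload, trace in rows:
    preload_ovh, trace_ovh = overheads(base, preload, trace)
    print(f"preload overhead = {preload_ovh:.3f} ns/call, "
          f"trace overhead = {trace_ovh:.3f} ns/call")
```

Under this reading, the preload overhead stays near 2 ns/call and the trace overhead stays between roughly 355 and 367 ns/call across all rows of the table.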