Table 1: Experimental environment.

CPUs                    Intel Xeon E5645
# Cores                 4
Vector width            SSE 4.2, 4 single precision FP
Caches                  L1D/L2/L3: 64 KB/256 KB/12 MB
FP peak performance     230.4 GFlops
Core frequency          2.40 GHz
DRAM                    4 GB

GPUs                    NVidia GeForce GTX 580
# SMs                   16
Caches                  L1/Global L2: 16 KB/768 KB
FP peak performance     1.56 TFlops
Shader Clock frequency  1544 MHz

O/S                     Ubuntu 12.04.1 LTS
Platform                Intel OpenCL Platform 1.5 for CPU
                        NVidia OpenCL Platform 4.2 for GPU
Compiler                Intel C/C++ compiler 12.1.3