Research Article

3D Data Denoising via Nonlocal Means Filter by Using Parallel GPU Strategies

Table 4

Full unrolling algorithm on a single GPU unit. Execution times and speed-up values for several block size configurations and 3D datasets of normally distributed random numbers. Search and similarity windows have been set according to and .

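GPU cells give execution time/speed-up; the CPU column gives execution time only.

| Dataset size | 1 GPU unit, block (16, 16, 1) | 1 GPU unit, block (128, 1, 1) | 1 GPU unit, block (256, 1, 1) | 1 GPU unit, block (512, 1, 1) | CPU |
|---|---|---|---|---|---|
| | 0.59/29.7 | 0.58/30.2 | 1.15/15.2 | 2.4/7.29 | 17.5 |
| | 2.33/35.6 | 2.29/36.2 | 2.3/36.1 | 4.8/17.3 | 83 |
| | 4.46/38.1 | 4.5/37.8 | 4.38/38.8 | 9.2/18.5 | 170 |
| | 17.9/39 | 17.5/39.7 | 17.5/39.7 | 18.4/37.8 | 696 |
| | 34.9/39.9 | 34.2/40.7 | 34.3/40.6 | 36/38.7 | 1393 |
| | 71.4/39.4 | 70/40.2 | 70.1/40.1 | 73.9/38.1 | 2814 |
| | 140/40.3 | 137/41.1 | 137/41 | 145/38.9 | 5623 |
| | 276/40.9 | 271/41.7 | 271/41.7 | 286/39.5 | 11291 |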
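
The speed-up values in Table 4 appear to be the ratio of the CPU execution time to the GPU execution time. This excerpt does not state the definition explicitly, but the tabulated numbers agree with it to within rounding. The minimal Python sketch below checks that relation against the first data row and also reports the number of threads per block implied by each block size. The helper names `speed_up` and `threads_per_block` are hypothetical and introduced only for illustration.

```python
# Execution times copied from the first data row of Table 4.
# Units are not stated in this excerpt; only the ratio matters here.
cpu_time = 17.5
gpu_times = {
    (16, 16, 1): 0.59,
    (128, 1, 1): 0.58,
    (256, 1, 1): 1.15,
    (512, 1, 1): 2.4,
}


def speed_up(cpu, gpu):
    """Assumed definition: CPU execution time divided by GPU execution time."""
    return cpu / gpu


def threads_per_block(block):
    """Total threads in a block with dimensions (x, y, z)."""
    x, y, z = block
    return x * y * z


if __name__ == "__main__":
    for block, gpu_time in gpu_times.items():
        ratio = speed_up(cpu_time, gpu_time)
        print(block, threads_per_block(block), round(ratio, 2))
```

Under this assumed definition, the (16, 16, 1) and (256, 1, 1) configurations both use 256 threads per block, yet their first-row execution times differ by roughly a factor of two, which suggests that the block shape, and not only the thread count, matters for the smallest dataset.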