Research Article

3D Data Denoising via Nonlocal Means Filter by Using Parallel GPU Strategies

Table 2

Partial unrolling algorithm on a single GPU unit. Execution times and speed-up values for several block size configurations and 3D datasets of normally distributed random numbers. Search and similarity windows have been set according to and .

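GPU columns give execution time/speed-up on 1 GPU unit for each block size; the CPU column gives execution time only.

| Dataset size | (16, 16, 1) | (128, 1, 1) | (256, 1, 1) | (512, 1, 1) | CPU |
|---|---|---|---|---|---|
| | 0.99/17.7 | 0.77/22.7 | 1.32/13.3 | 2.72/6.43 | 17.5 |
| | 2.31/35.9 | 2.25/36.9 | 2.23/37.2 | 4.57/18.2 | 83 |
| | 4.62/36.8 | 4.5/37.8 | 4.46/38.1 | 9.22/18.4 | 170 |
| | 16.9/41.2 | 16.2/42.9 | 16.6/42 | 18.6/37.3 | 696 |
| | 33.79/41.2 | 32.5/42.9 | 33.14/42 | 37.5/37.2 | 1393 |
| | 65.2/43.1 | 63.6/44.3 | 63.9/44 | 69.2/40.7 | 2814 |
| | 131/43.1 | 128/43.9 | 128/43.8 | 139/40.5 | 5623 |
| | 264/42.7 | 261/43.3 | 259/43.5 | 281/40.2 | 11291 |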
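The speed-up values in Table 2 are consistent with the ratio of CPU execution time to GPU execution time (for the first row, 17.5/0.99 ≈ 17.7). The short Python sketch below reproduces that arithmetic for the first row and reports the number of threads per block for each configuration. The definition of speed-up as CPU time divided by GPU time is an assumption inferred from the tabulated numbers, and the function names `speed_up`, `threads_per_block`, and `fastest_block` are hypothetical helpers introduced only for illustration.

```python
# Execution times copied from the first row of Table 2 (units as in the table).
CPU_TIME = 17.5
GPU_TIMES = {
    (16, 16, 1): 0.99,
    (128, 1, 1): 0.77,
    (256, 1, 1): 1.32,
    (512, 1, 1): 2.72,
}


def speed_up(cpu_time, gpu_time):
    """Assumed definition: CPU execution time divided by GPU execution time."""
    return cpu_time / gpu_time


def threads_per_block(block):
    """Total threads in a block with dimensions (x, y, z)."""
    x, y, z = block
    return x * y * z


def fastest_block(gpu_times):
    """Block configuration with the smallest execution time."""
    return min(gpu_times, key=gpu_times.get)


if __name__ == "__main__":
    for block, gpu_time in GPU_TIMES.items():
        ratio = speed_up(CPU_TIME, gpu_time)
        print(block, threads_per_block(block), f"{ratio:.2f}")
    print("fastest:", fastest_block(GPU_TIMES))
```

Note that (16, 16, 1) and (256, 1, 1) both contain 256 threads per block, so any difference between those two columns comes from the block shape rather than from the thread count.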