Research Article

3D Data Denoising via Nonlocal Means Filter by Using Parallel GPU Strategies

Table 5

Full unrolling algorithm on 2 GPU units. Execution times and speed-up values for several block size configurations and 3D datasets of normally distributed random numbers. Search and similarity windows have been set according to and .

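Entries in the GPU columns are execution time/speed-up; the CPU column lists execution time only.

| Dataset size | 2 GPU units, (16, 16, 1) | 2 GPU units, (128, 1, 1) | 2 GPU units, (256, 1, 1) | 2 GPU units, (512, 1, 1) | CPU |
|---|---|---|---|---|---|
| | 0.12/146 | 0.23/76.1 | 0.43/40.7 | 0.9/19.4 | 17.5 |
| | 0.87/95.4 | 0.85/97.6 | 0.85/97.6 | 1.79/46.4 | 83 |
| | 1.94/87.6 | 1.9/89.5 | 1.9/89.5 | 3.99/42.6 | 170 |
| | 7.73/90 | 7.58/91.8 | 7.59/91.7 | 7.98/87.2 | 696 |
| | 16.3/85.7 | 15.9/87.4 | 16/87.3 | 16.8/83 | 1393 |
| | 30.9/91.1 | 30.3/92.8 | 30.3/92.7 | 32/87.9 | 2814 |
| | 65/86.5 | 63.8/88.2 | 63.8/88.1 | 67.3/83.5 | 5623 |
| | 133/84.8 | 131/86.4 | 131/86.3 | 138/81.8 | 11291 |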
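
The speed-up values in the table appear consistent with the ratio of CPU execution time to GPU execution time. The following minimal sketch recomputes them for the first row under that assumption; the definition is inferred from the tabulated numbers and not stated on this page, and the names `gpu_times`, `cpu_time`, and `speed_up` are illustrative only.

```python
# First row of Table 5: GPU execution time for each block size configuration.
gpu_times = {
    (16, 16, 1): 0.12,
    (128, 1, 1): 0.23,
    (256, 1, 1): 0.43,
    (512, 1, 1): 0.9,
}
cpu_time = 17.5  # CPU execution time reported in the same row


def speed_up(cpu, gpu):
    """Assumed definition: CPU execution time divided by GPU execution time."""
    return cpu / gpu


# Speed-up for every block size configuration in the row.
speed_ups = {block: speed_up(cpu_time, t) for block, t in gpu_times.items()}

for block, s in speed_ups.items():
    # Three significant digits, matching the precision used in the table.
    print(block, f"{s:.3g}")
```

Rounded to three significant digits, the computed ratios agree with the speed-ups listed in that row, which supports reading each GPU entry as execution time followed by speed-up.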