Research Article

3D Data Denoising via Nonlocal Means Filter by Using Parallel GPU Strategies

Table 3

Partial unrolling algorithm on 2 GPU units. Execution times and speed-up values for several block size configurations and 3D datasets of normally distributed random numbers. Search and similarity windows have been set according to and .

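Columns (16, 16, 1) to (512, 1, 1): execution time/speed-up on 2 GPU units for that block size. Column CPU: execution time.

| Dataset size | (16, 16, 1) | (128, 1, 1) | (256, 1, 1) | (512, 1, 1) | CPU |
|---|---|---|---|---|---|
| | 0.49/35.7 | 0.38/46.1 | 0.66/26.5 | 1.36/12.9 | 17.5 |
| | 1.15/72.2 | 1.12/74.1 | 1.11/74.8 | 2.27/36.6 | 83 |
| | 2.31/73.6 | 2.25/75.6 | 2.23/76.2 | 4.57/37.2 | 170 |
| | 8.44/82.5 | 8.12/85.7 | 8.28/84.1 | 9.26/75.2 | 696 |
| | 16.9/82.5 | 16.2/85.8 | 16.6/84.1 | 18.6/74.7 | 1393 |
| | 32.6/86.2 | 31.7/88.9 | 31.92/88.2 | 34.5/81.3 | 2814 |
| | 65.2/86.2 | 63.6/88.5 | 63.9/88 | 69.2/81.3 | 5623 |
| | 131/86.5 | 129/87.8 | 128/87.9 | 139/81.2 | 11291 |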
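
The relation between the two numbers in each cell can be checked with a short script. The sketch below assumes that speed-up is defined as CPU execution time divided by GPU execution time, which is consistent with the tabulated values but is not stated in the table itself. The names `rows`, `speed_up`, and `best_block` are illustrative only, and the data are the first two rows of the table.

```python
# Execution times copied from the first two rows of Table 3.
# "cpu" is the CPU time; "gpu" maps each block size to the 2-GPU time.
rows = [
    {"cpu": 17.5,
     "gpu": {"(16, 16, 1)": 0.49, "(128, 1, 1)": 0.38,
             "(256, 1, 1)": 0.66, "(512, 1, 1)": 1.36}},
    {"cpu": 83.0,
     "gpu": {"(16, 16, 1)": 1.15, "(128, 1, 1)": 1.12,
             "(256, 1, 1)": 1.11, "(512, 1, 1)": 2.27}},
]


def speed_up(cpu_time, gpu_time):
    """Assumed definition: ratio of CPU time to GPU time."""
    return cpu_time / gpu_time


def best_block(row):
    """Return the block size with the shortest GPU execution time."""
    return min(row["gpu"], key=row["gpu"].get)


for row in rows:
    for block, gpu_time in row["gpu"].items():
        # Round to one decimal place, as in the table.
        print(block, round(speed_up(row["cpu"], gpu_time), 1))
    print("fastest block size:", best_block(row))
```

For these two rows the computed ratios match the tabulated speed-ups to one decimal place. For the larger datasets the ratio of the printed times can differ from the printed speed-up in the last digit, presumably because the times themselves are rounded.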