Research Article

3D Data Denoising via Nonlocal Means Filter by Using Parallel GPU Strategies

Table 1

Influence of the L1-prefer switch on execution times for the partial and full unrolling algorithms, using a (128, 1, 1) block size configuration and 3D datasets of normally distributed random numbers.

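| Partial unrolling, single GPU, L1 | Partial unrolling, single GPU, no L1 | Partial unrolling, multi-GPU, L1 | Partial unrolling, multi-GPU, no L1 | Full unrolling, single GPU, L1 | Full unrolling, single GPU, no L1 | Full unrolling, multi-GPU, L1 | Full unrolling, multi-GPU, no L1 |
|---|---|---|---|---|---|---|---|
| 0.77 | 0.83 | 0.38 | 0.42 | 0.58 | 0.58 | 0.23 | 0.23 |
| 2.25 | 2.3 | 1.12 | 1.15 | 2.29 | 2.29 | 0.85 | 0.85 |
| 4.5 | 4.59 | 2.25 | 2.3 | 4.38 | 4.38 | 1.9 | 1.9 |
| 16.24 | 16.54 | 8.12 | 8.25 | 17.51 | 17.51 | 7.58 | 7.58 |
| 32.5 | 33.58 | 16.24 | 16.54 | 34.22 | 34.24 | 15.94 | 15.95 |
| 63.55 | 66.42 | 31.67 | 32.52 | 70.01 | 70 | 30.31 | 30.32 |
| 128.19 | 138.06 | 63.57 | 66.59 | 136.89 | 136.95 | 63.75 | 63.77 |
| 260.93 | 288.84 | 128.63 | 138.2 | 270.61 | 270.75 | 130.78 | 130.67 |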
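
The effect of the L1-prefer switch in Table 1 is easier to compare across configurations as a relative figure. The following is a minimal sketch, not part of the original study, that derives two quantities from the tabulated times: the relative slowdown when L1 is not preferred, and the ratio of single-GPU to multi-GPU time. The names `ROWS`, `LABELS`, `no_l1_overhead`, and `multi_gpu_speedup` are hypothetical and introduced only for illustration; the numbers are copied from the table as reported.

```python
# Execution times copied from Table 1, one tuple per dataset (row order as in the table).
# Column order: partial/single, partial/multi, full/single, full/multi,
# each stored as an (L1, no L1) pair.
ROWS = [
    (0.77, 0.83, 0.38, 0.42, 0.58, 0.58, 0.23, 0.23),
    (2.25, 2.3, 1.12, 1.15, 2.29, 2.29, 0.85, 0.85),
    (4.5, 4.59, 2.25, 2.3, 4.38, 4.38, 1.9, 1.9),
    (16.24, 16.54, 8.12, 8.25, 17.51, 17.51, 7.58, 7.58),
    (32.5, 33.58, 16.24, 16.54, 34.22, 34.24, 15.94, 15.95),
    (63.55, 66.42, 31.67, 32.52, 70.01, 70.0, 30.31, 30.32),
    (128.19, 138.06, 63.57, 66.59, 136.89, 136.95, 63.75, 63.77),
    (260.93, 288.84, 128.63, 138.2, 270.61, 270.75, 130.78, 130.67),
]

# Hypothetical labels for the four (algorithm, GPU setup) configurations.
LABELS = ("partial/single", "partial/multi", "full/single", "full/multi")


def no_l1_overhead(row):
    """Relative slowdown of 'no L1' versus 'L1' for each of the four configurations."""
    return [(row[2 * i + 1] - row[2 * i]) / row[2 * i] for i in range(4)]


def multi_gpu_speedup(row):
    """Single-GPU time divided by multi-GPU time (L1 preferred): (partial, full)."""
    return row[0] / row[2], row[4] / row[6]


if __name__ == "__main__":
    for row in ROWS:
        overheads = ", ".join(
            f"{label}: {100 * value:+.2f}%"
            for label, value in zip(LABELS, no_l1_overhead(row))
        )
        partial, full = multi_gpu_speedup(row)
        print(f"{overheads} | speedup partial {partial:.2f}x, full {full:.2f}x")
```

For the largest dataset, this gives a slowdown of about 10.7% without L1 preference for the partial unrolling algorithm on a single GPU, while the full unrolling algorithm changes by less than 0.1%.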