Research Article

3D Data Denoising via Nonlocal Means Filter by Using Parallel GPU Strategies

Figure 7

Schematic representation of thread organization inside a single GPU and between GPUs. The thread workload highlights the voxels processed by a single thread. Threads can be organized in strips or tiles of the specified block size. The red cube depicts the search window and the smaller blue cube depicts the similarity window.
(a) 1D CUDA block configuration (12, 1, 1)
(b) 2D CUDA block configuration (12, 12, 1)
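
The difference between the strip and tile layouts in panels (a) and (b) can be illustrated with a short host-side calculation. The following Python sketch is not taken from the article; it assumes that threads cover one x-y slice of the volume and that the grid is sized by ceiling division, which is the usual CUDA convention. The function name `grid_for_slice` and the 48 x 48 slice size are hypothetical and chosen only for illustration.

```python
def grid_for_slice(nx, ny, block):
    """Number of blocks along x and y needed to cover an nx-by-ny slice.

    `block` is a CUDA-style (x, y, z) block shape. Assumption: only the
    x and y extents of the block are used to tile the slice.
    """
    bx, by, _ = block
    # Ceiling division, so partially filled blocks at the border are counted.
    return ((nx + bx - 1) // bx, (ny + by - 1) // by)


def threads_per_block(block):
    """Total number of threads in one block of the given shape."""
    bx, by, bz = block
    return bx * by * bz


if __name__ == "__main__":
    nx, ny = 48, 48  # hypothetical slice size, not from the article
    for name, block in (("strip", (12, 1, 1)), ("tile", (12, 12, 1))):
        gx, gy = grid_for_slice(nx, ny, block)
        print(name, "blocks:", gx * gy, "threads per block:", threads_per_block(block))
```

Under these assumptions, both configurations launch the same total number of threads for a slice whose sides are multiples of 12; the strip layout uses many small blocks, whereas the tile layout uses fewer, larger ones.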