Research Article
3D Data Denoising via Nonlocal Means Filter by Using Parallel GPU Strategies
Algorithm 5
CUDA multi-GPU code of the NLM algorithm with the partial unrolling strategy.
/* "my_in_img" and "my_out_img" are, respectively, the sections of the
   images "in_img" and "out_img" split among the "n_gpus" GPUs. */
int const i_1 = threadIdx.x + blockDim.x*blockIdx.x;
int const i_2 = threadIdx.y + blockDim.y*blockIdx.y;
/* local statements */
if ((i_1 >= 0) && (i_1 < X_Dim) && (i_2 >= 0) && (i_2 < Y_Dim)) {
    for (i_3 = 0; i_3 < Z_Dim/n_gpus; i_3++) {
        /* compute my_out_img[i_1 + i_2*X_Dim + i_3*X_Dim*Y_Dim/n_gpus]
           using my_in_img[i_1 + i_2*X_Dim + i_3*X_Dim*Y_Dim/n_gpus] */
    }
}
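The listing leaves the host side and the per-voxel computation unspecified. The following is a minimal sketch of how the same thread layout could be driven from the host. It is not the authors' implementation. The names slab_kernel, process_volume, and check are hypothetical, and the kernel body copies each voxel as a stand-in for the NLM weighted average. The sketch assumes each GPU receives a contiguous block of z-slices stored with a slice stride of X_Dim*Y_Dim, and it omits the neighboring slices that a real NLM search window would need at the slab boundaries.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Hypothetical helper: abort on any CUDA runtime error.
static void check(cudaError_t err)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// One thread per (i_1, i_2) column; each thread loops over the z-slices
// of the slab owned by its GPU, as in the listing above.
__global__ void slab_kernel(const float* my_in_img, float* my_out_img,
                            int X_Dim, int Y_Dim, int my_Z_Dim)
{
    int const i_1 = threadIdx.x + blockDim.x * blockIdx.x;
    int const i_2 = threadIdx.y + blockDim.y * blockIdx.y;
    if (i_1 < X_Dim && i_2 < Y_Dim) {
        for (int i_3 = 0; i_3 < my_Z_Dim; ++i_3) {
            size_t const idx = i_1 + (size_t)i_2 * X_Dim
                             + (size_t)i_3 * X_Dim * Y_Dim;
            // Placeholder (assumption): a real NLM kernel would write the
            // weighted average over the search window here.
            my_out_img[idx] = my_in_img[idx];
        }
    }
}

// Hypothetical host driver: splits the volume along z among the visible GPUs.
std::vector<float> process_volume(const std::vector<float>& in_img,
                                  int X_Dim, int Y_Dim, int Z_Dim)
{
    assert(in_img.size() == (size_t)X_Dim * Y_Dim * Z_Dim);
    int n_gpus = 0;
    check(cudaGetDeviceCount(&n_gpus));
    n_gpus = std::min(n_gpus, Z_Dim);  // never more slabs than slices

    size_t const slice = (size_t)X_Dim * Y_Dim;
    std::vector<float> out_img(in_img.size());
    std::vector<float*> d_in(n_gpus), d_out(n_gpus);
    dim3 const block(16, 16);
    dim3 const grid((X_Dim + block.x - 1) / block.x,
                    (Y_Dim + block.y - 1) / block.y);

    // First pass: upload each slab and launch its kernel (launches are
    // asynchronous, so kernels on different GPUs can overlap).
    for (int g = 0; g < n_gpus; ++g) {
        int const z0 = (int)((long long)g * Z_Dim / n_gpus);
        int const z1 = (int)((long long)(g + 1) * Z_Dim / n_gpus);
        size_t const bytes = (size_t)(z1 - z0) * slice * sizeof(float);
        check(cudaSetDevice(g));
        check(cudaMalloc(&d_in[g], bytes));
        check(cudaMalloc(&d_out[g], bytes));
        check(cudaMemcpy(d_in[g], in_img.data() + z0 * slice, bytes,
                         cudaMemcpyHostToDevice));
        slab_kernel<<<grid, block>>>(d_in[g], d_out[g], X_Dim, Y_Dim, z1 - z0);
        check(cudaGetLastError());
    }

    // Second pass: the blocking copy waits for each kernel, then gathers
    // the slab back into its place in out_img.
    for (int g = 0; g < n_gpus; ++g) {
        int const z0 = (int)((long long)g * Z_Dim / n_gpus);
        int const z1 = (int)((long long)(g + 1) * Z_Dim / n_gpus);
        size_t const bytes = (size_t)(z1 - z0) * slice * sizeof(float);
        check(cudaSetDevice(g));
        check(cudaMemcpy(out_img.data() + z0 * slice, d_out[g], bytes,
                         cudaMemcpyDeviceToHost));
        check(cudaFree(d_in[g]));
        check(cudaFree(d_out[g]));
    }
    return out_img;
}
```

The slab bounds z0 and z1 are computed per GPU so that a Z_Dim that is not a multiple of n_gpus is still covered in full, whereas the listing's bound Z_Dim/n_gpus presumes an even split. Because the kernel here only copies, calling process_volume on any volume should return the input unchanged, which gives a simple check of the partitioning before a real NLM body is substituted.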