Research Article

3D Data Denoising via Nonlocal Means Filter by Using Parallel GPU Strategies

Algorithm 4

CUDA code of the NLM algorithm with the full unrolling strategy.
int const i_1 = threadIdx.x + blockDim.x*blockIdx.x;
int const i_2 = threadIdx.y + blockDim.y*blockIdx.y;
int const i_3 = threadIdx.z + blockDim.z*blockIdx.z;
/* local statements */
if ((i_1 >= 0) && (i_1 < X_Dim) && (i_2 >= 0) && (i_2 < Y_Dim) &&
    (i_3 >= 0) && (i_3 < Z_Dim)) {
    /* compute out_img[i_1 + i_2*X_Dim + i_3*X_Dim*Y_Dim] using
       in_img[i_1 + i_2*X_Dim + i_3*X_Dim*Y_Dim] */
}
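The listing gives only the body of a kernel. As a minimal sketch of how such a body might sit in a complete kernel and be launched, the code below wraps the same index computation and bounds check in a kernel that copies each voxel. The copy is a placeholder for the NLM computation, which the listing leaves as a comment, and is not the authors' filter. The names linear_index, blocks_for, copy_voxels, and run_copy, the float voxel type, and the 8 x 8 x 8 block size are all assumptions made for illustration.

```cuda
#include <cassert>
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: linear offset of voxel (i_1, i_2, i_3), using the
// same index expression as the listing (i_1 varies fastest).
__host__ __device__ inline int linear_index(int i_1, int i_2, int i_3,
                                            int X_Dim, int Y_Dim) {
    return i_1 + i_2 * X_Dim + i_3 * X_Dim * Y_Dim;
}

// Hypothetical helper: number of blocks of size `block` needed to cover n
// elements (ceiling division).
inline unsigned int blocks_for(int n, int block) {
    return static_cast<unsigned int>((n + block - 1) / block);
}

// Placeholder kernel: one thread per voxel. It copies the input voxel
// instead of performing the NLM weighting.
__global__ void copy_voxels(float const* in_img, float* out_img,
                            int X_Dim, int Y_Dim, int Z_Dim) {
    int const i_1 = threadIdx.x + blockDim.x * blockIdx.x;
    int const i_2 = threadIdx.y + blockDim.y * blockIdx.y;
    int const i_3 = threadIdx.z + blockDim.z * blockIdx.z;
    // The grid is rounded up to whole blocks, so threads in the last block
    // along each axis can fall outside the volume and must do nothing.
    if ((i_1 >= 0) && (i_1 < X_Dim) && (i_2 >= 0) && (i_2 < Y_Dim) &&
        (i_3 >= 0) && (i_3 < Z_Dim)) {
        int const idx = linear_index(i_1, i_2, i_3, X_Dim, Y_Dim);
        out_img[idx] = in_img[idx];
    }
}

// Hypothetical host wrapper: copies the volume to the device, launches the
// kernel, and copies the result back. Returns false on any CUDA error.
bool run_copy(std::vector<float> const& in, std::vector<float>& out,
              int X_Dim, int Y_Dim, int Z_Dim) {
    assert(in.size() == static_cast<std::size_t>(X_Dim) * Y_Dim * Z_Dim);
    std::size_t const bytes = in.size() * sizeof(float);
    float* d_in = nullptr;
    float* d_out = nullptr;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) {
        cudaFree(d_in);
        return false;
    }
    cudaError_t const up_err =
        cudaMemcpy(d_in, in.data(), bytes, cudaMemcpyHostToDevice);

    dim3 const block(8, 8, 8);  // 512 threads per block (assumed size)
    dim3 const grid(blocks_for(X_Dim, 8), blocks_for(Y_Dim, 8),
                    blocks_for(Z_Dim, 8));
    copy_voxels<<<grid, block>>>(d_in, d_out, X_Dim, Y_Dim, Z_Dim);
    cudaError_t const launch_err = cudaGetLastError();

    out.resize(in.size());
    cudaError_t const down_err =
        cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return up_err == cudaSuccess && launch_err == cudaSuccess &&
           down_err == cudaSuccess;
}
```

For a 10 x 10 x 10 volume this launch uses 2 x 2 x 2 blocks, that is, 16 threads per axis for 10 voxels, so the bounds check is what keeps the six surplus threads per axis from writing outside out_img. Since the indices are built from unsigned built-in variables, the lower-bound tests never fail in this sketch; they are kept only to mirror the listing.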