Research Article

3D Data Denoising via Nonlocal Means Filter by Using Parallel GPU Strategies

Algorithm 5

CUDA multi-GPU code of the NLM algorithm with partial unrolling strategy.
(1) /* my_in_img and my_out_img are, respectively, the sections
    of the images in_img and out_img split among the
    n_gpus GPUs. */
(2) int const i_1 = threadIdx.x + blockDim.x*blockIdx.x;
(3) int const i_2 = threadIdx.y + blockDim.y*blockIdx.y;
(4) /* local statements */
(5) if ((i_1 >= 0) && (i_1 < X_Dim) && (i_2 >= 0) && (i_2 < Y_Dim)) {
(6)   for (i_3 = 0; i_3 < Z_Dim/n_gpus; i_3++) {
(7)     /* compute my_out_img[i_1 + i_2*X_Dim + i_3*X_Dim*Y_Dim/n_gpus]
           using my_in_img[i_1 + i_2*X_Dim + i_3*X_Dim*Y_Dim/n_gpus] */
(8)   } }
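
The slab decomposition used in the listing can be checked on the CPU before moving to the GPUs. The following is a minimal host-side C++ sketch, not the authors' implementation. It emulates one thread (i_1, i_2) per column and walks the local z range of each section. Several points are assumptions made for illustration: the helper names Dims, process_column, and run_all_slabs are hypothetical; Z_Dim is taken to be divisible by n_gpus; each section is assumed to be a contiguous block of Z_Dim/n_gpus slices stored with x fastest, so the local slice stride is X_Dim*Y_Dim (this may differ from the layout the listing intends); and the per-voxel work is a placeholder copy rather than the NLM weighting, which would also need neighbouring slices across section boundaries.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical holder for the volume dimensions (names mirror the listing).
struct Dims {
    int X_Dim;
    int Y_Dim;
    int Z_Dim;
};

// Emulates one kernel thread (i_1, i_2) working on one device's section:
// it loops over the local z range, as the partially unrolled kernel does.
// The per-voxel work is a placeholder copy, NOT the NLM computation.
void process_column(const float* my_in_img, float* my_out_img,
                    int i_1, int i_2, const Dims& d, int n_gpus) {
    if (i_1 >= 0 && i_1 < d.X_Dim && i_2 >= 0 && i_2 < d.Y_Dim) {
        const int local_z = d.Z_Dim / n_gpus;
        for (int i_3 = 0; i_3 < local_z; ++i_3) {
            // Assumed local layout: x fastest, then y, then local z.
            const std::size_t idx = static_cast<std::size_t>(i_1)
                + static_cast<std::size_t>(i_2) * d.X_Dim
                + static_cast<std::size_t>(i_3) * d.X_Dim * d.Y_Dim;
            my_out_img[idx] = my_in_img[idx];
        }
    }
}

// Splits the volume into n_gpus z-sections and processes them one after
// another; on real hardware each section would reside on its own GPU.
std::vector<float> run_all_slabs(const std::vector<float>& in_img,
                                 const Dims& d, int n_gpus) {
    assert(n_gpus > 0 && d.Z_Dim % n_gpus == 0);  // sketch assumes an even split
    assert(in_img.size() ==
           static_cast<std::size_t>(d.X_Dim) * d.Y_Dim * d.Z_Dim);

    std::vector<float> out_img(in_img.size(), -1.0f);  // -1 marks "not written"
    const std::size_t slab =
        static_cast<std::size_t>(d.X_Dim) * d.Y_Dim * (d.Z_Dim / n_gpus);

    for (int g = 0; g < n_gpus; ++g) {
        const float* my_in_img = in_img.data() + g * slab;
        float* my_out_img = out_img.data() + g * slab;
        // The two loops stand in for the 2D grid of CUDA threads.
        for (int i_2 = 0; i_2 < d.Y_Dim; ++i_2)
            for (int i_1 = 0; i_1 < d.X_Dim; ++i_1)
                process_column(my_in_img, my_out_img, i_1, i_2, d, n_gpus);
    }
    return out_img;
}
```

Calling run_all_slabs on a small volume (for example 4 x 3 x 6 voxels with n_gpus set to 2 or 3) and comparing the result with the input confirms that every voxel is visited exactly once and that no section writes outside its own range.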