Research Article

Performance Optimization of 3D Lattice Boltzmann Flow Solver on a GPU

Algorithm 4

Single CUDA kernel after combining the two phases of LBM (collision and streaming).
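int i;
for (i = 0; i < timeSteps; i++) {
    /* One launch performs both LBM phases for every cell */
    lbm_kernel<<<GRID, BLOCK>>>(source_grid, dest_grid, xdim, ydim, zdim,
                                cell_size, grid_size);
    /* Wait for the kernel to finish before the grids are exchanged */
    cudaDeviceSynchronize();
    /* Exchange source and destination grids for the next time step;
       swap_grid must exchange the pointer variables themselves
       (e.g. be a macro), since C passes arguments by value */
    swap_grid(source_grid, dest_grid);
}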