Research Article

Performance Optimization of 3D Lattice Boltzmann Flow Solver on a GPU

Table 2

Summary of experiments.

Experiment                Description

Serial                    Serial implementation on single CPU core
Parallel
 AoS                      AoS scheme
 SoA_Push_Only            SoA scheme + push data scheme
 SoA_Pull
  SoA_Pull_Only           SoA scheme + pull data scheme
  SoA_Pull_BR             SoA scheme + pull data scheme + branch divergence removal
  SoA_Pull_RR             SoA scheme + pull data scheme + register reduction
  SoA_Pull_Full           SoA scheme + pull data scheme + branch divergence removal + register usage reduction
  SoA_Pull_Full_Tiling    SoA scheme + pull data scheme + branch divergence removal + register usage reduction + tiling with data layout change
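
The layout and streaming variants named in Table 2 can be illustrated with a minimal sketch. It is not the article's implementation: the 1D three-velocity lattice, the periodic boundaries, and the names `aos_index`, `soa_index`, `stream_push`, and `stream_pull` are all assumptions chosen to keep the example small. It shows how the AoS and SoA layouts map a (cell, direction) pair to a memory offset, and how the push and pull schemes move the same data while differing in which side of the copy is non-local.

```python
# Toy sizes chosen for illustration only; they are not taken from the article.
Q = 3                 # discrete velocities per cell (a 1D stand-in for a 3D lattice)
N_CELLS = 8           # number of lattice cells
C = [0, 1, -1]        # lattice velocity of each direction q


def aos_index(cell, q):
    """AoS: the Q values of one cell sit next to each other in memory."""
    return cell * Q + q


def soa_index(cell, q):
    """SoA: the values of one direction q for all cells sit next to each other."""
    return q * N_CELLS + cell


def stream_push(f):
    """Push: each cell reads its own values and writes them to its neighbors."""
    f_new = [[0.0] * N_CELLS for _ in range(Q)]
    for q in range(Q):
        for x in range(N_CELLS):
            # Local read, non-local write (periodic wrap-around assumed).
            f_new[q][(x + C[q]) % N_CELLS] = f[q][x]
    return f_new


def stream_pull(f):
    """Pull: each cell reads from its neighbors and writes only to itself."""
    f_new = [[0.0] * N_CELLS for _ in range(Q)]
    for q in range(Q):
        for x in range(N_CELLS):
            # Non-local read, local write (periodic wrap-around assumed).
            f_new[q][x] = f[q][(x - C[q]) % N_CELLS]
    return f_new


# Neighboring cells, same direction: stride Q under AoS, stride 1 under SoA.
aos_stride = aos_index(1, 0) - aos_index(0, 0)
soa_stride = soa_index(1, 0) - soa_index(0, 0)

# Both streaming variants produce the same post-streaming state.
f0 = [[float(10 * q + x) for x in range(N_CELLS)] for q in range(Q)]
same_result = stream_push(f0) == stream_pull(f0)
```

In this sketch, threads assigned to consecutive cells would touch consecutive addresses only under the SoA layout, which is the access pattern that GPU global memory generally coalesces well.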