Research Article

Performance Optimization of 3D Lattice Boltzmann Flow Solver on a GPU

Table 3

Performance comparison, in MLUPS (million lattice updates per second), of the four implementations.

Domain sizes  TimeSteps    Serial   AoS      SoA_Pull_Only  SoA_Pull_Full_Tiling

              1000         9.89     111.73   759.52         1034
              5000         9.75     112.24   814.01         1115.63
              10000        9.82     111.73   818.36         1129.32
              Avg. Perf.   9.82     112.41   797.30         1092.99

              1000         7.64     78.58    798.21         1115.2
              5000         7.65     78.74    855.55         1189.15
              10000        6.98     78.74    861.42         1199.33
              Avg. Perf.   7.42     78.69    838.39         1167.69

              1000         9.47     76.79    811.35         1114.96
              5000         9.69     76.89    866.76         1185.95
              10000        9.56     76.91    871.39         1205.48
              Avg. Perf.   9.57     76.86    849.83         1168.8

              1000         8.99     74.74    787.76         1113.77
              5000         8.91     75.09    873.8          1182.33
              10000        8.97     75.14    883.56         1210.63
              Avg. Perf.   8.93     74.99    848.37         1168.91
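
The MLUPS figures in Table 3 are easier to compare as speedups over the serial baseline. The following Python sketch is illustrative only and is not taken from the article: it assumes the usual definition of MLUPS (lattice nodes times time steps, divided by elapsed seconds and by 10^6), and the helper names mlups and speedups are hypothetical. The only data used are the "Avg. Perf." values of the first domain-size group in the table.

```python
def mlups(nx, ny, nz, time_steps, elapsed_seconds):
    """Million lattice updates per second for an nx * ny * nz grid.

    Assumes every lattice node is updated once per time step.
    """
    return nx * ny * nz * time_steps / (elapsed_seconds * 1e6)


def speedups(avg_perf, baseline="Serial"):
    """Ratio of each implementation's MLUPS to the baseline's MLUPS."""
    base = avg_perf[baseline]
    return {name: value / base for name, value in avg_perf.items()}


# "Avg. Perf." row of the first domain-size group in Table 3.
first_group_avg = {
    "Serial": 9.82,
    "AoS": 112.41,
    "SoA_Pull_Only": 797.30,
    "SoA_Pull_Full_Tiling": 1092.99,
}

if __name__ == "__main__":
    for name, ratio in speedups(first_group_avg).items():
        print(f"{name}: {ratio:.2f}x over Serial")
```

The same speedups call can be applied to the "Avg. Perf." row of any other group by swapping in that row's four values.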