Research Article
Performance Optimization of 3D Lattice Boltzmann Flow Solver on a GPU
Table 3
Performance (MLUPS) comparisons of four implementations.
| Domain sizes | TimeSteps | Serial | AoS | SoA_Pull_Only | SoA_Pull_Full_Tiling |
|---|---|---|---|---|---|
| | 1000 | 9.89 | 111.73 | 759.52 | 1034 |
| | 5000 | 9.75 | 112.24 | 814.01 | 1115.63 |
| | 10000 | 9.82 | 111.73 | 818.36 | 1129.32 |
| | Avg. Perf. | 9.82 | 112.41 | 797.30 | 1092.99 |
| | 1000 | 7.64 | 78.58 | 798.21 | 1115.2 |
| | 5000 | 7.65 | 78.74 | 855.55 | 1189.15 |
| | 10000 | 6.98 | 78.74 | 861.42 | 1199.33 |
| | Avg. Perf. | 7.42 | 78.69 | 838.39 | 1167.69 |
| | 1000 | 9.47 | 76.79 | 811.35 | 1114.96 |
| | 5000 | 9.69 | 76.89 | 866.76 | 1185.95 |
| | 10000 | 9.56 | 76.91 | 871.39 | 1205.48 |
| | Avg. Perf. | 9.57 | 76.86 | 849.83 | 1168.8 |
| | 1000 | 8.99 | 74.74 | 787.76 | 1113.77 |
| | 5000 | 8.91 | 75.09 | 873.8 | 1182.33 |
| | 10000 | 8.9 | 75.14 | 883.56 | 1210.63 |
| | Avg. Perf. | 8.93 | 74.99 | 848.37 | 1168.91 |
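For readers unfamiliar with the metric in Table 3, MLUPS (million lattice updates per second) is conventionally the number of lattice nodes multiplied by the number of time steps, divided by the wall-clock time in seconds and by 10^6. The following is a minimal Python sketch of that calculation. The function names `mlups` and `speedup`, and the 100 x 100 x 100 domain with a 10-second runtime, are hypothetical illustrations and are not taken from the article.

```python
def mlups(nx, ny, nz, time_steps, elapsed_seconds):
    """Return million lattice updates per second for an nx*ny*nz lattice.

    Assumes every lattice node is updated once per time step.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    total_updates = nx * ny * nz * time_steps
    return total_updates / (elapsed_seconds * 1e6)


def speedup(candidate_mlups, baseline_mlups):
    """Return the ratio of two MLUPS figures (candidate over baseline)."""
    return candidate_mlups / baseline_mlups


# Hypothetical domain and timing, chosen only to make the arithmetic easy:
# 100^3 nodes * 1000 steps = 1e9 updates in 10 s gives 100 MLUPS.
example = mlups(100, 100, 100, 1000, 10.0)
```

Applied to the first group of averages in Table 3, `speedup(1092.99, 9.82)` gives roughly 111, the ratio of SoA_Pull_Full_Tiling to Serial throughput.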