Research Article

Scalable Parallel Algorithm of Multiple-Relaxation-Time Lattice Boltzmann Method with Large Eddy Simulation on Multi-GPUs

Algorithm 3: Multi-GPU implementation of MRT-LBM-LES.
Read grid file.
Domain decomposition.
Memory allocation on host and device.
Initialization.
Copy data from host to device.
Determine the lattice style of each node.
Iterate on the GPUs until the convergence condition is satisfied:
  Read data from global memory and perform propagation.
  Apply the boundary conditions.
  Calculate macroscopic quantities.
  Perform collision and write data back to global memory.
  Exchange the outer-subdomain data using MPI.
Copy data from device to host.
Gather the data and write it back to host memory.
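
The control flow of the steps above can be illustrated with a minimal single-process sketch in Python. It is not the authors' implementation: it assumes a one-dimensional D1Q3 lattice with periodic boundaries, swaps in a single-relaxation-time (BGK) collision in place of MRT-LES for brevity, and replaces the MPI exchange and the host/device copies with plain list copies. All names (`decompose`, `exchange_halos`, `step`, `gather`, `run`, `OMEGA`) are hypothetical.

```python
# Sketch only: D1Q3 lattice, BGK collision (stand-in for MRT-LES),
# periodic domain split into subdomains with one ghost cell per side.
W = (2.0 / 3.0, 1.0 / 6.0, 1.0 / 6.0)  # D1Q3 weights for velocities 0, +1, -1
C = (0, 1, -1)                          # discrete lattice velocities
OMEGA = 1.2                             # assumed relaxation rate (hypothetical value)

def equilibrium(rho, u):
    # Second-order equilibrium with squared sound speed 1/3.
    return [w * rho * (1 + 3 * c * u + 4.5 * (c * u) ** 2 - 1.5 * u * u)
            for w, c in zip(W, C)]

def decompose(cells, parts):
    # Domain decomposition; assumes len(cells) is divisible by parts.
    size = len(cells) // parts
    subs = []
    for r in range(parts):
        interior = [cell[:] for cell in cells[r * size:(r + 1) * size]]
        subs.append([[0.0] * 3] + interior + [[0.0] * 3])
    return subs

def exchange_halos(subs):
    # Stand-in for the MPI exchange: copy each neighbour's outermost
    # interior cell into the local ghost cell (periodic ring of subdomains).
    k = len(subs)
    for r in range(k):
        subs[r][0] = subs[(r - 1) % k][-2][:]
        subs[r][-1] = subs[(r + 1) % k][1][:]

def step(cells):
    # One iteration on one subdomain; first and last cells are ghosts.
    new = [cell[:] for cell in cells]
    for x in range(1, len(cells) - 1):
        # Propagation (pull scheme): read each population from the upstream cell.
        f = [cells[x - c][i] for i, c in enumerate(C)]
        # Macroscopic quantities: density and velocity.
        rho = sum(f)
        u = (f[1] - f[2]) / rho
        # Collision, then write back.
        feq = equilibrium(rho, u)
        new[x] = [fi - OMEGA * (fi - fe) for fi, fe in zip(f, feq)]
    return new

def gather(subs):
    # Collect interior cells of all subdomains in global order.
    return [cell for sub in subs for cell in sub[1:-1]]

def run(rho0, parts, steps):
    cells = [equilibrium(rho, 0.0) for rho in rho0]  # initialization at rest
    subs = decompose(cells, parts)
    exchange_halos(subs)                             # fill ghosts before the loop
    for _ in range(steps):
        subs = [step(s) for s in subs]
        exchange_halos(subs)
    return [sum(cell) for cell in gather(subs)]      # density per lattice node
```

Under these assumptions, running the same initial density profile with `parts=1` and with `parts=4` should give matching densities, which is a convenient check that the ghost-cell exchange is placed correctly relative to propagation and collision.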