Research Article

Hybrid MPI and CUDA Parallelization for CFD Applications on Multi-GPU HPC Clusters

Algorithm 3

Blocking communication mode algorithm.
(1) if device_count > 1 then
(2)   cudaMemcpy (h_a, d_a, sizeof(float) * n, cudaMemcpyDeviceToHost);
(3)   MPI_Bsend (buf, count, datatype, dest, tag, MPI_COMM_WORLD);
(4)   MPI_Recv (buf, count, datatype, source, tag, MPI_COMM_WORLD, &status);
(5)   cudaMemcpy (d_a, h_a, sizeof(float) * n, cudaMemcpyHostToDevice);
(6) end if