Research Article
Hybrid MPI and CUDA Parallelization for CFD Applications on Multi-GPU HPC Clusters
Algorithm 3
Blocking communication mode algorithm.
(1) if device_count > 1 then
(2)     cudaMemcpy(h_a, d_a, sizeof(float) * n, cudaMemcpyDeviceToHost);
(3)     MPI_Bsend(buf, count, datatype, dest, tag, MPI_COMM_WORLD);
(4)     MPI_Recv(buf, count, datatype, source, tag, MPI_COMM_WORLD, &status);
(5)     cudaMemcpy(d_a, h_a, sizeof(float) * n, cudaMemcpyHostToDevice);
(6) end if
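The blocking exchange in Algorithm 3 can be sketched as a small standalone program. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes one MPI rank per GPU (so the process count stands in for device_count), pairs ranks as (0,1), (2,3), and so on, and uses hypothetical values for the halo length n and the message tag. It also attaches a user buffer with MPI_Buffer_attach, which the MPI standard requires before MPI_Bsend can be used and which the algorithm listing does not show.

```cuda
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// Abort the whole MPI job if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                   \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_)); \
            MPI_Abort(MPI_COMM_WORLD, 1);                                  \
        }                                                                  \
    } while (0)

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Assumption: one rank per GPU, assigned round-robin by rank number.
    int device_count = 0;
    CUDA_CHECK(cudaGetDeviceCount(&device_count));
    CUDA_CHECK(cudaSetDevice(rank % device_count));

    const int n = 1024;                      // hypothetical halo length
    const int tag = 0;                       // hypothetical message tag
    const size_t bytes = sizeof(float) * n;

    // Host staging buffer h_a and device buffer d_a, as named in Algorithm 3.
    float *h_a = (float *)malloc(bytes);
    float *d_a = NULL;
    CUDA_CHECK(cudaMalloc((void **)&d_a, bytes));
    for (int i = 0; i < n; ++i) h_a[i] = (float)rank;
    CUDA_CHECK(cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice));

    // Assumption: ranks exchange in pairs (0,1), (2,3), ...
    int peer = rank ^ 1;
    if (size > 1 && peer < size) {
        // MPI_Bsend copies the message into a user-attached buffer,
        // so that buffer must hold the packed data plus MPI's overhead.
        int pack_size = 0;
        MPI_Pack_size(n, MPI_FLOAT, MPI_COMM_WORLD, &pack_size);
        int buf_size = pack_size + MPI_BSEND_OVERHEAD;
        void *bsend_buf = malloc(buf_size);
        MPI_Buffer_attach(bsend_buf, buf_size);

        MPI_Status status;
        // Step (2): stage the device data on the host.
        CUDA_CHECK(cudaMemcpy(h_a, d_a, bytes, cudaMemcpyDeviceToHost));
        // Step (3): buffered send; returns once h_a has been copied out.
        MPI_Bsend(h_a, n, MPI_FLOAT, peer, tag, MPI_COMM_WORLD);
        // Step (4): blocking receive of the partner's data into h_a.
        MPI_Recv(h_a, n, MPI_FLOAT, peer, tag, MPI_COMM_WORLD, &status);
        // Step (5): move the received data back to the device.
        CUDA_CHECK(cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice));

        // Detach blocks until all buffered messages have been delivered.
        MPI_Buffer_detach(&bsend_buf, &buf_size);
        free(bsend_buf);
        printf("rank %d received first value %.1f from rank %d\n",
               rank, h_a[0], peer);
    }

    CUDA_CHECK(cudaFree(d_a));
    free(h_a);
    MPI_Finalize();
    return 0;
}
```

With a typical toolchain this might be built with something like `nvcc` using the MPI compiler wrapper as host compiler and launched with two or more ranks; the exact commands depend on the cluster setup. Because MPI_Bsend returns as soon as the message has been copied into the attached buffer, the same host array h_a can safely be reused for the receive, which is why a single staging buffer suffices here.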