(1) !$omp parallel do private(idev, ib, … …)
(2) do idev = 0, 1
(3) repeat
(4) offload target(mic:idev): set_boundary_condition for each block
(5) if (icycle > 1) offload target(mic:idev) wait(sgr(idev)): set_CRI_to_domain
(6) offload target(mic:idev): exchange_interface_data
(7) do ib = 1, nb(idev)
(8)  offload target(mic:idev): spatial_step
(9)  offload target(mic:idev): temporal_step
(10)  offload target(mic:idev): compute_CPI
(11)  offload_transfer target(mic:idev) out() signal((ib))
(12) end do
(13) master thread: offload_wait all related to each device
(14) master thread on CPU: exchange_interpolation_data
(15) master thread: offload_transfer target(mic:idev) in() signal(sgr(idev))
(16) until convergence
(17) !$omp end parallel do
Algorithm 1: Communication optimization algorithm.
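
The signal/wait pattern of Algorithm 1 can be illustrated with a small host-only simulation. This is a minimal sketch, not the offload implementation: Python futures stand in for the signals, a single-worker thread pool per device stands in for that device's asynchronous transfer queue, and the names `compute_block`, `transfer`, and `run_cycles` are hypothetical. For simplicity the devices are visited in a sequential loop rather than from parallel OpenMP threads.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def compute_block(idev, ib, icycle):
    # Hypothetical stand-in for spatial_step, temporal_step and compute_CPI.
    return (idev, ib, icycle)


def transfer(payload):
    # Hypothetical stand-in for an asynchronous offload_transfer.
    return payload


def run_cycles(blocks_per_device, n_cycles):
    n_dev = len(blocks_per_device)
    # One single-worker pool per device plays the role of its transfer queue.
    queues = [ThreadPoolExecutor(max_workers=1) for _ in range(n_dev)]
    sgr = [None] * n_dev   # pending "in" transfer per device
    waited_in = 0          # number of waits on "in" transfers
    received = []          # data gathered on the host, one entry per cycle
    try:
        for icycle in range(1, n_cycles + 1):
            out_signals = []
            for idev in range(n_dev):
                # From the second cycle on, wait for the "in" transfer
                # that was started at the end of the previous cycle.
                if icycle > 1:
                    sgr[idev].result()
                    waited_in += 1
                for ib in range(1, blocks_per_device[idev] + 1):
                    cpi = compute_block(idev, ib, icycle)
                    # Start the "out" transfer and continue with the next
                    # block without waiting for it to finish.
                    out_signals.append(queues[idev].submit(transfer, cpi))
            # Host: wait for all outstanding "out" transfers of this cycle.
            wait(out_signals)
            received.append(sorted(f.result() for f in out_signals))
            # Host-side exchange would happen here; then start the
            # asynchronous "in" transfer back to each device.
            for idev in range(n_dev):
                sgr[idev] = queues[idev].submit(transfer, ("cri", idev, icycle))
    finally:
        for q in queues:
            q.shutdown(wait=True)
    return received, waited_in
```

Calling `run_cycles([2, 3], 3)` simulates two devices holding two and three blocks over three cycles. The point of the structure is that each block's outgoing transfer is launched as soon as that block is computed, so transfers overlap with the computation of later blocks, and the incoming transfer is only waited on at the start of the next cycle.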