Research Article

A Performance Study of a Dual Xeon-Phi Cluster for the Forward Modelling of Gravitational Fields

Listing 1

Fortran code fragments corresponding to Algorithm 1. They show how nested parallelism is implemented to control the two Xeon Phi coprocessors embedded in the same node using the Intel LEO (Language Extensions for Offload) directives.
!$OMP PARALLEL DEFAULT(NONE) PRIVATE(G, P_G, TID, DEVICE, STATUS) &
!$OMP & SHARED(P_G_shared, Xm, Ym, height) &
!$OMP & SHARED(prisms, &
!$OMP & Xa, Xb, Ya, Yb, Za, Zb, GType, X1p, Y1p, Z1p, posx, posy, posz) &
!$OMP & SHARED(mic_start, mic_end, density) &
!$OMP & FIRSTPRIVATE(dxp, dyp, dzp, start, end, Typeoffile, Nx, Ny) &
!$OMP & NUM_THREADS(n_of_devices)
  ALLOCATE(G(Ny,Nx), STAT=status)  ! Mesh for this device, shared by its threads through P_G
  CALL CHECKMEMORY(status)
  G = 0.0
  P_G => G
  TID = OMP_GET_THREAD_NUM()  ! Host thread number selects the coprocessor
  !dir$ omp offload target(mic:TID) inout(P_G: alloc_if(.true.) free_if(.false.))
  !$OMP PARALLEL DEFAULT(NONE) SHARED(P_G, mic_start, mic_end) PRIVATE(G) &
  !$OMP & PRIVATE(DEVICE, TID) FIRSTPRIVATE(Ny, Nx, STATUS, start, end) &
  !$OMP & FIRSTPRIVATE(dxp, dyp, dzp, Typeoffile, GType) &
  !$OMP & SHARED(Xa, Xb, Ya, Yb, Za, Zb, X1p, Y1p, Z1p, posx, posy, posz) &
  !$OMP & SHARED(Xm, Ym, height, density, prisms) NUM_THREADS(num_threads)
    !$OMP SINGLE
    prisms = 0
    !$OMP END SINGLE
    Allocate(G(Ny,Nx), Stat=status)  ! Local mesh for every thread created on the Phi
    Call CHECKMEMORY(status)
    G(:,:) = 0.0
    TID = OMP_GET_THREAD_NUM()
    DEVICE = offload_get_device_number()
    !$OMP DO PRIVATE(k)
    Do k = mic_start(device+1), mic_end(device+1)  ! Range of prisms to be processed by this device
      If (Typeoffile == 1) Then
        Xa(k) = X1p(posx(k))
        Xb(k) = Xa(k) + dxp
        Ya(k) = Y1p(posy(k))
        Yb(k) = Ya(k) + dyp
        Za(k) = Z1p(posz(k))
        Zb(k) = Za(k) + dzp
      End If
      ! Accumulate the contribution of prism k into the thread-local mesh G
      Call GravAnomBoxMalla(Xa(k), Ya(k), Za(k), Xb(k), Yb(k), Zb(k), &
        & density(k), GType, G, Xm, Ym, height, Ny, Nx)
      !$OMP ATOMIC
      prisms = prisms + 1
    End Do
    !$OMP END DO
    !$OMP CRITICAL (INTERIOR)
    P_G(:,:) = P_G(:,:) + G(:,:)
    !$OMP END CRITICAL (INTERIOR)
  !$OMP END PARALLEL
  !$OMP CRITICAL (EXTERIOR)
  P_G_shared(:,:) = P_G_shared(:,:) + P_G(:,:)
  !$OMP END CRITICAL (EXTERIOR)
!$OMP END PARALLEL
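
The two-level accumulation in Listing 1 (thread-local mesh, then per-device mesh, then the mesh shared on the host) can be tried without coprocessors. The following is a minimal sketch that uses only standard OpenMP nested parallelism and no LEO offload; it is not the authors' code. The names total, group_sum, local, n_groups, n_inner and n_items are hypothetical stand-ins for P_G_shared, P_G, G, the number of devices, the threads per device and the prisms, and adding 1.0 to every cell stands in for the call to GravAnomBoxMalla. It assumes a compiler with OpenMP enabled (for example, gfortran with -fopenmp).

```fortran
program nested_reduction_sketch
  use omp_lib
  implicit none
  integer, parameter :: nx = 4, ny = 3     ! hypothetical mesh size
  integer, parameter :: n_groups = 2       ! stands in for the two coprocessors
  integer, parameter :: n_inner = 4        ! threads per group (assumed value)
  integer, parameter :: n_items = 100      ! stands in for the prisms
  real :: total(ny, nx)                    ! role of P_G_shared
  real :: group_sum(ny, nx)                ! role of P_G (one per outer thread)
  real :: local(ny, nx)                    ! role of G (one per inner thread)
  integer :: k, gid, ng, chunk, first, last

  total = 0.0
  call omp_set_max_active_levels(2)        ! permit two active nested levels

  !$omp parallel default(none) shared(total) &
  !$omp & private(group_sum, local, gid, ng, chunk, first, last, k) &
  !$omp & num_threads(n_groups)
  gid = omp_get_thread_num()
  ng = omp_get_num_threads()               ! actual team size, may be below n_groups
  chunk = n_items / ng
  first = gid * chunk + 1                  ! contiguous block, like mic_start/mic_end
  last = (gid + 1) * chunk
  if (gid == ng - 1) last = n_items        ! last group takes the remainder
  group_sum = 0.0

  ! Inner team: group_sum, first and last are private to the outer thread
  ! but shared among the threads of its inner team.
  !$omp parallel default(none) shared(group_sum, first, last) &
  !$omp & private(local, k) num_threads(n_inner)
  local = 0.0
  !$omp do
  do k = first, last
    local = local + 1.0                    ! placeholder for one prism's contribution
  end do
  !$omp end do
  !$omp critical (inner)
  group_sum = group_sum + local            ! merge thread-local mesh into group mesh
  !$omp end critical (inner)
  !$omp end parallel

  !$omp critical (outer)
  total = total + group_sum                ! merge group mesh into the shared mesh
  !$omp end critical (outer)
  !$omp end parallel

  ! Every cell should equal n_items if both merge levels are correct.
  print *, all(abs(total - real(n_items)) < 1.0e-6)
end program nested_reduction_sketch
```

Each thread accumulates into its own mesh and enters a critical section only once, at the merge step, so the cost of synchronization does not grow with the number of prisms. An array reduction clause such as reduction(+:group_sum) would be an alternative way to express the inner merge.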