Overlapping Communication and Computation with OpenMP and MPI
Machines comprised of a distributed collection of shared memory or SMP nodes are becoming common for parallel computing. OpenMP can be combined with MPI on many such machines. Motivations for combing OpenMP and MPI are discussed. While OpenMP is typically used for exploiting loop-level parallelism it can also be used to enable coarse grain parallelism, potentially leading to less overhead. We show how coarse grain OpenMP parallelism can also be used to facilitate overlapping MPI communication and computation for stencil-based grid programs such as a program performing Gauss-Seidel iteration with red-black ordering. Spatial subdivision or domain decomposition is used to assign a portion of the grid to each thread. One thread is assigned a null calculation region so it was free to perform communication. Example calculations were run on an IBM SP using both the Kuck & Associates and IBM compilers.