Review Article

Survey of Methodologies, Approaches, and Challenges in Parallel Programming Using High-Performance Computing Systems

Table 5

Hybridity in various technologies.

Tech/API | Support for hybridity (description) | Potential disadvantages or shortcomings
OpenMP | Allows running threads on multicore/many-core CPUs as well as offloading and parallelizing computations on devices, including GPUs | Setting up offloading to GPUs is not easy
CUDA | CUDA's API allows management of several GPUs: computations on several GPUs can be managed from a single CPU thread, and several streams may be used for sequences of commands on one or more GPUs | Requires combining with a multithreaded API such as OpenMP or Pthreads for load balancing across CPU + GPU systems, and with MPI for clusters; multiple host threads may be preferred for balancing among several GPUs
OpenCL | A universal, kernel-based model for execution on several, potentially different, compute devices; command queues are used for several streams of computation | Requires many more lines of code for hybrid CPU + GPU systems than, e.g., OpenMP + CUDA
OpenACC | Allows managing computations across several devices within a node | While computations can be balanced among devices using OpenACC functions (similarly to CUDA), CPU threads (and the APIs that provide them) might be preferred for more efficient balancing strategies [71]
MPI | The standard allows a hybrid multiprocess + multithreaded model if the implementation supports it (check with MPI_Init_thread()); an MPI implementation can be combined with multithreaded APIs such as OpenMP or Pthreads, and a CUDA-aware MPI implementation allows device pointers to be used in MPI calls | Requires combining with APIs such as OpenCL or, e.g., OpenMP/CUDA for efficient use with hybrid multicore/many-core CPUs and GPUs; such combinations are not fully supported by every MPI implementation, e.g., CUDA-aware features may be limited to certain types of operations, such as point-to-point communication
Apache Hadoop | Ability to manage computations with different processing paradigms: MapReduce, Spark, HiveQL, Tez, etc. | Basic installation is easy, but a production-ready, secure cluster requires considerable effort
Apache Spark | Barrier execution mode makes integration with machine learning pipelines much easier | Production-ready solutions typically require an external cluster manager
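The CUDA and OpenACC rows note that several host threads may be preferred for balancing work among devices. The following Python sketch illustrates that pattern under stated assumptions: devices are simulated with time.sleep, and the device names and per-chunk timings are hypothetical. Each host thread "owns" one device and pulls the next chunk from a shared queue as soon as its device is free, so faster devices naturally take more chunks (dynamic load balancing). In real code, each thread would typically select its device first (e.g., cudaSetDevice() in CUDA or acc_set_device_num() in OpenACC) and launch a kernel where the sketch sleeps.

```python
import queue
import threading
import time


def balance_across_devices(chunks, seconds_per_chunk):
    """Dynamically distribute work chunks using one host thread per (simulated) device.

    seconds_per_chunk maps a hypothetical device name to the simulated time the
    device needs per chunk. Returns {device name: list of chunks it processed}.
    """
    work = queue.Queue()
    for chunk in chunks:
        work.put(chunk)

    # Each host thread appends only to its own list, so no extra locking is needed.
    processed = {device: [] for device in seconds_per_chunk}

    def host_thread(device, delay):
        while True:
            try:
                chunk = work.get_nowait()  # take the next chunk as soon as this device is idle
            except queue.Empty:
                return  # no work left
            time.sleep(delay)  # stand-in for a kernel launch plus synchronization
            processed[device].append(chunk)

    threads = [threading.Thread(target=host_thread, args=(device, delay))
               for device, delay in seconds_per_chunk.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return processed


# Hypothetical node: one slow CPU "device" and two faster GPUs.
result = balance_across_devices(range(40), {"cpu": 0.02, "gpu0": 0.005, "gpu1": 0.005})
print({device: len(chunks) for device, chunks in result.items()})
```

A static split (e.g., one third of the chunks per device) would leave the GPUs idle while the CPU finishes its share; pulling from a shared queue avoids this without knowing device speeds in advance.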
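For the MPI row, the usual pattern in C is to call MPI_Init_thread() with a required thread level and then compare the level the library actually provides before enabling threads. The Python sketch below models only that decision step. The ThreadLevel values are illustrative (the MPI standard guarantees only the ordering SINGLE < FUNNELED < SERIALIZED < MULTIPLE, not the numeric values), and choose_hybrid_mode is a hypothetical helper. With mpi4py, the provided level could be obtained from MPI.Query_thread() and compared against constants such as MPI.THREAD_MULTIPLE.

```python
from enum import IntEnum


class ThreadLevel(IntEnum):
    # Illustrative values; only the ordering is guaranteed by the MPI standard.
    SINGLE = 0
    FUNNELED = 1
    SERIALIZED = 2
    MULTIPLE = 3


def choose_hybrid_mode(provided: ThreadLevel) -> str:
    """Pick a hybrid MPI + threads strategy from the thread level the library provides."""
    if provided >= ThreadLevel.MULTIPLE:
        return "any thread may call MPI concurrently"
    if provided >= ThreadLevel.SERIALIZED:
        return "threads call MPI one at a time (guarded by a lock)"
    if provided >= ThreadLevel.FUNNELED:
        return "only the main thread calls MPI; worker threads compute"
    return "no threads; one single-threaded process per core"


print(choose_hybrid_mode(ThreadLevel.FUNNELED))
# prints "only the main thread calls MPI; worker threads compute"
```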
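The Hadoop row lists MapReduce as one of the supported processing paradigms. As a minimal, single-machine sketch of that paradigm (not Hadoop's API), a word count can be split into a map phase that emits (key, value) pairs, a shuffle that groups values by key, and a reduce phase that aggregates each group. Here a thread pool stands in for independent mappers; the function names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import chain


def map_phase(document):
    # Mapper: emit (word, 1) for every word in one input split.
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    # Shuffle: group intermediate values by key (done by the framework in Hadoop).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Reducer: aggregate all values that share a key.
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    # Mappers are independent, so they can run in parallel.
    with ThreadPoolExecutor() as pool:
        mapped = list(chain.from_iterable(pool.map(map_phase, documents)))
    return reduce_phase(shuffle(mapped))


print(word_count(["To be or not", "to be"]))
# prints {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```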
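Spark's barrier execution mode (in PySpark, a stage created with rdd.barrier().mapPartitions(...), with tasks synchronizing via BarrierTaskContext.get().barrier()) launches all tasks of a stage together and lets them wait for each other, which is what distributed training steps such as an allreduce need. The sketch below does not use Spark; it assumes ordinary threads in place of Spark tasks and uses threading.Barrier to show the synchronization semantics only (Spark additionally relaunches the whole stage if any task fails, which is not modeled here). The function name is hypothetical.

```python
import threading


def run_barrier_stage(num_tasks, local_values):
    """Simulate a gang-scheduled stage: every task finishes its local step,
    waits at a barrier, and then reads all peers' results (as in an allreduce)."""
    barrier = threading.Barrier(num_tasks)
    partial = [None] * num_tasks
    totals = [None] * num_tasks

    def task(rank):
        partial[rank] = local_values[rank] ** 2  # local step, e.g., a gradient computation
        barrier.wait()  # no task continues until all tasks have reached this point
        totals[rank] = sum(partial)  # safe: every partial result is now available

    threads = [threading.Thread(target=task, args=(rank,)) for rank in range(num_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return totals


print(run_barrier_stage(4, [1, 2, 3, 4]))
# prints [30, 30, 30, 30]
```

Without the barrier, a fast task could call sum(partial) while a peer's entry is still None; the barrier is what makes the all-to-all read safe.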