Scientific Programming

Review Article

Survey of Methodologies, Approaches, and Challenges in Parallel Programming Using High-Performance Computing Systems

Table 5

Hybridity in various technologies.


Tech/API	Support for hybridity (description)	Potential disadvantages or shortcomings

OpenMP	Allows running threads on multicore/many-core CPUs as well as offloading computations to devices, including GPUs, and parallelizing them there	Setting up offloading to GPUs is not straightforward

CUDA	CUDA’s API allows management of several GPUs: computations on several GPUs can be managed from a single CPU thread, and several streams can be used to issue sequences of commands to one or more GPUs	Load balancing across CPU + GPU systems requires combining CUDA with a multithreaded API such as OpenMP or Pthreads, and clusters additionally require MPI; multiple host threads may be preferred for balancing work among several GPUs

OpenCL	A universal, kernel-based model for execution on several, potentially different, compute devices; command queues can be used for several streams of computation	Requires many more lines of code than, e.g., OpenMP + CUDA when used with hybrid CPU + GPU systems

OpenACC	Allows managing computations across several devices within a node	Although computations can be balanced among devices using OpenACC functions (similarly to CUDA), CPU threads (and hence the APIs that provide them) might be preferred for more efficient balancing strategies [71]

MPI	The standard allows a hybrid multiprocess + multithreaded model if the implementation supports it (check with MPI_Init_thread()); an MPI implementation can be combined with multithreaded APIs such as OpenMP or Pthreads, and a CUDA-aware MPI implementation allows device pointers to be passed in MPI calls	Must be combined with APIs such as OpenCL or OpenMP/CUDA to be used efficiently with hybrid multicore/many-core CPUs and GPUs; such combinations are not always fully supported by every MPI implementation (for example, CUDA-aware features may be limited to certain types of operations, such as point-to-point communication)

Apache Hadoop	Ability to manage computations with different processing paradigms: MapReduce, Spark, HiveQL, Tez, etc.	Basic installation is easy, but providing a production-ready and secure cluster requires considerable effort

Apache Spark	Barrier execution mode makes integration with machine learning pipelines much easier	Production-ready solutions typically require an external cluster manager
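Several rows of Table 5 (CUDA, OpenACC) note that CPU threads are often preferred for balancing work among GPUs or between CPUs and GPUs. The Python sketch below illustrates that general pattern under simplifying assumptions: one host thread per device repeatedly takes the next chunk of work from a shared queue, so faster devices naturally process more chunks. It is not tied to any of the listed APIs; the names balance_dynamically, make_simulated_device, and DeviceFn are hypothetical, and the "devices" only simulate execution time with a sleep. In a real code, each device function would instead wrap, for example, a CUDA kernel launch or an OpenMP target region.

```python
import queue
import threading
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical: a "device" is any callable that processes a contiguous chunk of
# indices and returns one result per index (stand-in for a real kernel launch).
DeviceFn = Callable[[range], List[float]]


def balance_dynamically(
    n_items: int, chunk_size: int, devices: Dict[str, DeviceFn]
) -> Tuple[List[float], Dict[str, int]]:
    """Process indices 0..n_items-1 in chunks using one host thread per device.

    Each host thread repeatedly takes the next chunk from a shared queue and
    runs it on its own device, so faster devices end up processing more chunks.
    Returns the per-index results and the number of chunks each device handled.
    """
    work: queue.Queue = queue.Queue()
    for start in range(0, n_items, chunk_size):
        work.put(range(start, min(start + chunk_size, n_items)))

    results: List[float] = [0.0] * n_items
    chunks_done: Dict[str, int] = {name: 0 for name in devices}
    lock = threading.Lock()

    def host_thread(name: str, run_on_device: DeviceFn) -> None:
        while True:
            try:
                chunk = work.get_nowait()
            except queue.Empty:
                return  # queue drained: this device has no more work
            out = run_on_device(chunk)  # device work runs outside the lock
            with lock:
                results[chunk.start:chunk.stop] = out
                chunks_done[name] += 1

    threads = [
        threading.Thread(target=host_thread, args=(name, fn))
        for name, fn in devices.items()
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, chunks_done


def make_simulated_device(seconds_per_chunk: float) -> DeviceFn:
    """Hypothetical device that squares each index after a fixed delay."""

    def run(chunk: range) -> List[float]:
        time.sleep(seconds_per_chunk)  # simulates device execution time
        return [float(i * i) for i in chunk]

    return run


if __name__ == "__main__":
    devices = {
        "cpu": make_simulated_device(0.004),  # slower simulated device
        "gpu": make_simulated_device(0.001),  # faster simulated device
    }
    results, chunks_done = balance_dynamically(1000, 50, devices)
    assert results == [float(i * i) for i in range(1000)]
    print(chunks_done)  # the faster "gpu" entry typically handles more of the 20 chunks
```

Compared with a static split fixed in advance, this dynamic scheme adapts to unequal device speeds at the cost of some scheduling overhead per chunk; the chunk size trades that overhead against load imbalance near the end of the run.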