|
Tech/API | Support for hybridity (description) | Potential disadvantages or shortcomings |
|
OpenMP | Allows running threads on multicore/many-core CPUs as well as offloading to and parallelizing within devices, including GPUs | Offloading to GPUs is not easy to set up |
|
CUDA | CUDA’s API allows management of several GPUs: computations on several GPUs can be managed from a single CPU thread, and several streams may be used to issue sequences of commands to one or more GPUs | Requires combination with a multithreaded API such as OpenMP or Pthreads for load balancing across CPU + GPU systems, and with MPI for clusters; many host threads may be preferred for balancing among several GPUs |
|
OpenCL | A universal kernel-based model for execution on several, potentially different, compute devices; command queues are used for several streams of computations | Requires many more lines of code when used with hybrid CPU + GPU systems compared to, e.g., OpenMP + CUDA |
|
OpenACC | Allows managing computations across several devices within a node | While it is possible to balance computations among devices using OpenACC functions (similarly to CUDA), CPU threads (and APIs that provide them) might be preferred for more efficient balancing strategies [71] |
|
MPI | The standard allows a hybrid multiprocess + multithreaded model if the implementation supports it (check with MPI_Init_thread()). An MPI implementation can be combined with multithreaded APIs such as OpenMP or Pthreads; a CUDA-aware MPI implementation allows using device pointers in MPI calls | Requires combining with APIs such as OpenCL or, e.g., OpenMP/CUDA to make efficient use of hybrid multicore/many-core CPUs and GPUs; such solutions are not fully supported by every MPI implementation, e.g., CUDA features can be limited to some types of operations, such as point-to-point |
|
Apache Hadoop | Ability to manage computations with different processing paradigms: MapReduce, Spark, HiveQL, Tez, etc. | Basic installation is easy, but providing a production-ready and secure cluster requires a lot of effort |
|
Apache Spark | Barrier execution mode makes integration with machine learning pipelines much easier | Production-ready solutions typically require an external cluster manager |
|
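
The CUDA and OpenACC rows above note that host threads may be preferred for balancing work among several devices. A minimal sketch of that idea follows, assuming one host thread per device and a shared queue of data chunks. It is a CPU-only simulation and not a definitive implementation: the names `balance_chunks` and `sum_of_squares` and the device labels are hypothetical, and the per-chunk function merely stands in for launching a kernel on the given device.

```python
import queue
import threading


def balance_chunks(num_items, chunk_size, device_names, process_chunk):
    """Dynamically distribute chunks of [0, num_items) among devices.

    One host thread is started per device; each thread repeatedly takes the
    next unprocessed chunk from a shared queue until none remain, so a
    faster device naturally ends up handling more chunks.
    """
    chunks = queue.Queue()
    for start in range(0, num_items, chunk_size):
        chunks.put((start, min(start + chunk_size, num_items)))

    results = {}                                   # chunk start -> partial result
    handled = {name: 0 for name in device_names}   # chunks processed per device
    lock = threading.Lock()

    def worker(name):
        while True:
            try:
                start, end = chunks.get_nowait()
            except queue.Empty:
                return  # no work left for this device
            partial = process_chunk(name, start, end)
            with lock:  # protect the shared result structures
                results[start] = partial
                handled[name] += 1

    threads = [threading.Thread(target=worker, args=(name,))
               for name in device_names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, handled


def sum_of_squares(device, start, end):
    # Hypothetical stand-in for a kernel launch on `device`; runs on the CPU.
    return sum(i * i for i in range(start, end))


if __name__ == "__main__":
    results, handled = balance_chunks(
        1000, 64, ["cpu", "gpu0", "gpu1"], sum_of_squares)
    print("total:", sum(results.values()))
    print("chunks per device:", handled)
```

The number of chunks each device handles depends on thread scheduling and device speed, so it varies between runs; only the combined result is deterministic. In a real hybrid code, each host thread would typically select its device first and then issue transfers and kernel launches for every chunk it takes.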