Review Article

Survey of Methodologies, Approaches, and Challenges in Parallel Programming Using High-Performance Computing Systems

Table 1. Target/model classification of technologies.

| Technology/API | Abstraction level/group | Programming model | Programming language | Supported platforms/target parallel system | License/standard |
|---|---|---|---|---|---|
| OpenMP | Library | Multithreaded application | C/C++/Fortran | Heterogeneous system with CPU(s) and accelerators including GPU(s) [54]; supported by, e.g., gcc | OpenMP is a standard [7] |
| CUDA | Library | CUDA model: computations launched as kernels executed by multiple threads grouped into blocks; global and shared memory on the GPU, as well as host memory, for data management | C | Server or workstation with one or more NVIDIA GPUs | Proprietary NVIDIA solution, NVIDIA EULA [8] |
| OpenCL | Library | OpenCL model: computations launched as kernels executed by multiple work items grouped into work groups; memory objects for data management | C/C++ | Heterogeneous platform including CPUs, GPUs from various vendors, FPGAs, etc.; supported by, e.g., gcc | OpenCL is a standard [9] |
| Pthreads | Library | Multithreaded application; provides thread management routines and synchronization mechanisms including mutexes and condition variables | C | Widely available on UNIX platforms; implementations include, e.g., NPTL | Part of the POSIX standard |
| OpenACC | Library | Multithreaded application | C/C++/Fortran | Heterogeneous architectures, e.g., a server or workstation with x86/POWER + NVIDIA GPUs; supported by compilers such as PGI, gcc, accULL, etc. | OpenACC is a standard [10] |
| Java Concurrency | JVM [14] specific | Multithreaded application | Java, Scala | Server, workstation, mobile device | Open standards: [12, 13] |
| TCP/IP | Network stack | Multiprocess | C, Fortran, C++, Java, and others | Cluster, server, workstation, mobile device, and others | TCP/IP [15] is a standard broadly implemented by OS developers |
| RDMA | Network stack | Multiprocess | C | Cluster | RDMA [17] is a standard implemented over InfiniBand and converged Ethernet protocols |
| UCX | Network stack | Multiprocess, multithreaded | C, Java, Python | Cluster, server, workstation | UCX [21] is a set of network APIs with a reference implementation |
| MPI | Library | Multiprocess, also multithreaded if the implementation supports it | C/Fortran | Cluster, server, workstation | MPI is a standard [26]; several implementations are available, e.g., OpenMPI and MPICH |
| OpenSHMEM | Library | Multiprocess application | C, Fortran | Cluster | Open standard with a reference implementation |
| PCJ | Java library | Multiprocess application | Java | Cluster | Open source Java library [29] |
| Apache Hadoop | Set of applications | YARN-managed resource negotiation, multiprocess MapReduce tasks [41] | Core functionality in Java, also C, Bash, and others | Cluster, server, workstation | Open source implementation of Google’s MapReduce [40]; Apache Software License (ASL) 2.0 |
| Apache Spark | Set of applications | Resource negotiation based on the selected resource manager (YARN, Spark Standalone, etc.); executors run workers in threads [49] | Scala | Cluster, server, workstation | Apache Software License (ASL) 2.0 [55] |
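
As a hedged illustration of the multithreaded, shared-memory model that Table 1 lists for Pthreads (thread management plus mutexes and condition variables), the sketch below uses Python's standard `threading` module rather than the Pthreads C API itself. The names `BoundedQueue` and `run_demo` are hypothetical, introduced only for this example. `threading.Lock` and `threading.Condition` stand in, by analogy, for a pthread mutex and pthread condition variables.

```python
import threading
from collections import deque


class BoundedQueue:
    """Hypothetical fixed-capacity FIFO guarded by one lock and two conditions."""

    def __init__(self, capacity):
        self._items = deque()
        self._capacity = capacity
        self._lock = threading.Lock()                      # plays the role of a mutex
        self._not_empty = threading.Condition(self._lock)  # notified after put()
        self._not_full = threading.Condition(self._lock)   # notified after get()

    def put(self, item):
        with self._not_full:                  # acquires the shared lock
            while len(self._items) >= self._capacity:
                self._not_full.wait()         # releases the lock while blocked
            self._items.append(item)
            self._not_empty.notify()          # wake one waiting consumer

    def get(self):
        with self._not_empty:                 # acquires the same shared lock
            while not self._items:
                self._not_empty.wait()
            item = self._items.popleft()
            self._not_full.notify()           # wake one waiting producer
            return item


def run_demo(n_items=100, capacity=4):
    """One producer thread and one consumer thread exchange n_items integers."""
    queue = BoundedQueue(capacity)
    received = []

    def producer():
        for i in range(n_items):
            queue.put(i)

    def consumer():
        for _ in range(n_items):
            received.append(queue.get())

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received
```

With a single producer and a single consumer, `run_demo(50, 3)` returns the integers 0 to 49 in order, because the queue is first-in, first-out. The `while` loops around `wait()` recheck the predicate after every wake-up, which is the same pattern usually recommended for condition variables in Pthreads.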