Review Article

Survey of Methodologies, Approaches, and Challenges in Parallel Programming Using High-Performance Computing Systems

Table 4

Selected, important latest features and extensions in various technologies.

| Tech/API | Description of latest features | Version | Literature |
|---|---|---|---|
| OpenMP | Support for controlling offloading behavior (offloading to GPUs is also possible), extensions to thread affinity information management (affinity added to the task construct), data mapping clarifications and extensions, extended support for various C++ and Fortran versions | 5.0 | [7] |
| CUDA | Improved scalability of cudaFree in multi-GPU environments, support for cooperative group kernels with MPS, new cuBLASLt library for general matrix multiplication (GEMM) operations, cuBLASLt support for FP16 matrix multiplies using tensor cores on Volta and Turing GPUs, improved performance of cuFFT on multi-GPU systems and of some random generators in cuRAND | 10.1 | [8] |
| OpenCL | Minor changes in the latest 2.1 to 2.2 update, e.g., added calls clSetProgramSpecializationConstant and clSetProgramReleaseCallback; major changes in the 1.2 to 2.0 update, including shared virtual memory, device queues used to enqueue kernels on a device, and the possibility for kernels to enqueue kernels using a device queue | 2.2 | [9] |
| OpenACC | Reduction clause on a compute construct assumes a copy for each reduction variable, arrays and composite variables are allowed in reduction clauses, local device defined | 2.7 | [10] |
| Java Conc. | An interoperable publish-subscribe framework with the Flow class and various other improvements | 9 | [67] |
| MPI | Introduction of nonblocking collective I/O routines, corrections in Fortran bindings | 3.1 | [26] |
| OpenSHMEM | Multithreading support, extended type support, C11 type-generic interfaces for point-to-point synchronization, additional functions and extensions to the existing ones | 1.4 | [68] |
| Apache Hadoop | Support for opportunistic containers, i.e., containers that are scheduled even if there are not enough resources to run them. Opportunistic containers wait for resource availability and, since they have low priority, they are preempted if higher priority jobs are scheduled | 3.0.3 | [69] |
| Apache Spark | Built-in Avro data source, support for eager evaluation of DataFrames | 2.4 | [70] |
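
As an illustration of the Java Conc. entry above, the publish-subscribe framework introduced in Java 9 can be sketched with the standard `java.util.concurrent.Flow` interfaces and `SubmissionPublisher`. This is a minimal sketch, not code from the surveyed literature: the class `FlowDemo`, the subscriber `CollectingSubscriber`, the method `publishAndCollect`, and the 5-second timeout are hypothetical names and values chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; only Flow and SubmissionPublisher come from the JDK.
class FlowDemo {

    // Hypothetical subscriber that stores every item it receives and
    // requests one item at a time, which is how Flow expresses backpressure.
    static final class CollectingSubscriber implements Flow.Subscriber<Integer> {
        final List<Integer> received = new CopyOnWriteArrayList<>();
        final CountDownLatch done = new CountDownLatch(1);
        private Flow.Subscription subscription;

        @Override
        public void onSubscribe(Flow.Subscription s) {
            subscription = s;
            s.request(1); // ask for the first item
        }

        @Override
        public void onNext(Integer item) {
            received.add(item);
            subscription.request(1); // ask for the next item
        }

        @Override
        public void onError(Throwable t) {
            done.countDown();
        }

        @Override
        public void onComplete() {
            done.countDown();
        }
    }

    // Publishes all items asynchronously and returns what the subscriber saw.
    static List<Integer> publishAndCollect(List<Integer> items) throws InterruptedException {
        CollectingSubscriber subscriber = new CollectingSubscriber();
        // close() (via try-with-resources) signals onComplete to the subscriber.
        try (SubmissionPublisher<Integer> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(subscriber);
            for (Integer item : items) {
                publisher.submit(item); // blocks only if the subscriber's buffer is full
            }
        }
        // Delivery happens on another thread, so wait for completion (assumed timeout).
        if (!subscriber.done.await(5, TimeUnit.SECONDS)) {
            throw new IllegalStateException("subscriber did not complete in time");
        }
        return new ArrayList<>(subscriber.received);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(publishAndCollect(List.of(1, 2, 3)));
    }
}
```

The subscriber requests items one at a time to keep the demand signalling explicit; a real consumer would likely request larger batches to reduce signalling overhead.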