Abstract

The paper presents the state of the art of energy-aware high-performance computing (HPC), in particular the identification and classification of approaches by system and device types, optimization metrics, and energy/power control methods. System types include single devices, clusters, grids, and clouds, while considered device types include CPUs, GPUs, multiprocessor systems, and hybrid systems. Optimization goals include various combinations of metrics such as execution time, energy consumption, and temperature, with consideration of imposed power limits. Control methods include scheduling, DVFS/DFS/DCT, power capping with programmatic APIs such as Intel RAPL and NVIDIA NVML, application optimizations, and hybrid methods. We discuss tools and APIs for energy/power management as well as tools and environments for prediction and/or simulation of energy/power consumption in modern HPC systems. Finally, programming examples, i.e., applications and benchmarks used in particular works, are discussed. Based on our review, we identify a set of open areas and important up-to-date problems concerning methods and tools that allow energy-aware processing in modern HPC systems.

1. Introduction

In today’s high-performance computing (HPC) systems, consideration of energy and power plays an increasingly important role. New cluster systems are designed not to exceed 20 MW of power [1], with the aim of reaching exascale performance soon. Apart from the performance-oriented TOP500 ranking (https://www.top500.org/lists/top500/), the Green500 list (https://www.top500.org/green500/) ranks supercomputers by performance per watt. Wide adoption of GPUs has helped to increase this ratio for applications that can be run efficiently on such systems. Programming and parallelization of such hybrid systems has become a necessity to obtain high performance but is also a challenge when using multi- and manycore environments. In terms of power and energy control methods, apart from scheduling, DVFS/DFS/DCT and power capping APIs have become available for CPUs and GPUs of mobile, desktop, and server lines. Power capping is now also available in cluster job management systems such as Slurm, which allows shutting down idle nodes, starting them again when required, and setting a cap on the power used through DVFS [2]. Metrics such as execution time, energy, power, and temperature are used in various contexts and combinations, for various applications. There is a need for constant and thorough analysis of possibilities, mechanisms, tools, and results in this field to identify current and future challenges, which is the primary aim of this work.

2. Existing Surveys

Firstly, the matter of appropriate energy and performance metrics has been investigated in the literature [3]. There are several survey papers related to energy-aware high-performance computing, but as the field, its technology, and features evolve very rapidly, they lack certain aspects that we cover in this paper.

Early works concerning data centers and clouds were surveyed in [4], showing a variety of energy-aware aspects in the related literature. The authors proposed a taxonomy of power/energy management in computing systems, distinguishing different abstraction levels, and presented energy-related works, including ones describing models, hardware, and software components. Our survey extends the above work with newer solutions and provides a more compact view of today’s energy/power-related issues.

The study in [5] categorizes energy-aware computing methods for servers, clusters, data centers, and grids and clouds but lacks discussion of all currently considered optimization criteria, of mechanisms such as power capping, and a detailed analysis of the applications and benchmarks used in the field. Thus, we include analysis of available target optimization metrics, energy-aware control methods, and benchmarks in our classification.

The study in [6] reviews energy-aware performance analysis methodologies for HPC available in 2012, listing hardware, software, and hybrid approaches as well as tools dedicated to energy monitoring. However, the paper does not review methodologies for controlling the energy/power budget; its main goal is to collect available energy/power monitoring techniques. In addition, the paper validates the existing tools in terms of overhead, portability, and user-friendliness. Consequently, we include energy and power control methods in our analysis.

The study in [7] includes a survey of software methods for improving energy efficiency in parallel computing from a slightly different perspective; namely, it focuses on increasing energy efficiency for parallel computations. It discusses components such as the processor, memory, and network, from the application to the system level, and elements such as load and mixed-precision computations in parallel computing.

A survey of techniques for improving energy efficiency in distributed systems, focused more on grids and clouds, was presented in [8]. Compared to our work, it does not analyze in such great detail possible optimization goals, node- and cluster-level techniques, or energy-aware simulation systems. Thus, we include an exhaustive list of optimization criteria used in various works and classify approaches also by device types and computing environments.

Power- and energy-related analytical models for high-performance computing systems and applications are discussed in detail in [9], with references to contributions of other works in this particular subarea. Node architecture is discussed; models considering CPUs, GPUs, Intel Xeon Phis, and FPGAs are included; and counter-based models are analyzed. We focus more on methods and tools as well as whole simulation environments that can make use of such models.

Techniques related to energy efficiency in cluster computing are surveyed in [10], including software- and hardware-related factors that influence energy efficiency, adaptive resource management, dynamic power management (DPM), and dynamic voltage and frequency scaling (DVFS) methods. Our paper extends that work considerably in terms of the number of methods considered.

Concepts, techniques, and algorithms for energy-efficient processing in ultrascale systems were surveyed in [11], along with hardware mechanisms, software mechanisms for energy and power consumption, energy-aware scheduling, energy characteristics of algorithms, and algorithmic techniques for energy-aware processing. The paper can be considered complementary to ours as it provides descriptions of energy-aware algorithms and algorithmic techniques that we do not focus on; in contrast, we provide a wider consideration of energy metrics and methods.

Paper [12] presents current research related to energy efficiency and solutions related to power-constrained processing in high-performance computing, on the way towards exascale computing. Specifically, it considers the power cap of 20 MW for future systems, objectives such as energy efficiency and power-aware computing, and energy and power management technologies such as DVFS and DCT. The work also surveys various power monitoring tools such as Watts Up? Pro, vendor tools such as Intel RAPL, NVIDIA NVML, AMD Application Power Management, and IBM EnergyScale, and finer-grained tools such as PowerPack, Penguin PowerInsight, PowerMon [13], PowerMon2 [13], Ilsche, and High-Definition Energy Efficiency Monitoring (HDEEM). While the paper provides a detailed description of selected methods, especially DVFS and monitoring tools, we extend the characterization of energy approaches per device and system types and various optimization metrics.

The study in [14] presents how to adapt performance measuring tools for energy efficiency management of parallel applications, specifically the libadapt library and an OpenMP wrapper.

The study in [15] presents a survey of several energy-saving methodologies with analysis of their effectiveness in an environment in which failures do occur. The energy costs of reliability are considered, and an energy-reliability metric is proposed that accounts for the energy required to run an application in such a system.

The survey presented in [16] provides a systematic approach for analyzing works related to energy efficiency, covering the main data center domains from basic equipment, including server and network devices, through management systems to end-user software, all in the context of cloud computing. The proposed analysis allowed the authors to present existing challenges and possible future works. Our survey is more concerned with HPC solutions; however, some aspects are common to cloud-related topics as well.

Topics related to power monitoring for ultrascale systems are presented in [17]. The paper describes solutions used for online power measurement, including a thorough analysis of the current state of the art, a detailed description of selected tools with examples of their usage, open areas concerning the subject, and possible future research directions. Our paper is more focused on power/energy management, providing a review of control tools, models, and simulators.

3. Motivations for This Work

In view of the existing reviews of work on energy-related aspects in high-performance computing, the contribution of our work can be considered as an up-to-date survey and analysis of progress in the field including the following aspects:
(1) Study of available APIs and tools for energy and power management in HPC
(2) Consideration of various target systems such as single devices, multiprocessor systems, cluster, grid, and cloud systems
(3) Consideration of various device types including CPUs, GPUs, and also hybrid systems
(4) Consideration of the variety of optimization metrics and their combinations considered in the literature, including performance, power, energy, and temperature
(5) Consideration of various optimization methods including known scheduling, DVFS/DFS/DCT, but also the latest power capping features for both CPUs and GPUs, application optimizations, and hybrid approaches
(6) Consideration of applications used for measurements and benchmarking in energy-aware works
(7) Tools for prediction and simulation of energy and power consumption in HPC systems
(8) Formulation of open research problems in the field based on the latest developments and results

In the paper, we focus on a survey of available methods and tools allowing proper configuration, management, and simulation of HPC systems for energy-aware processing. While we do not discuss designing applications, we discuss available APIs and power management tools that can be used by programmers and users of such systems. Methods that require hardware modifications, such as cooling or architectural changes, are out of the scope of this paper.

4. Tools for Energy/Power Management in Modern HPC Systems

Available tools for energy/power management can be considered in two categories: monitoring and controlling. Depending on the approach or vendor, some tools allow only reading the energy/power consumption, while others allow both reading and limiting (capping) it. Also, some tools limit energy/power consumption only indirectly, e.g., by letting a user modify device frequency to lower the energy consumption. Finally, there are many derived tools that wrap the aforementioned low-level interfaces in a more user-friendly form.

A solid survey on available tools for energy/power management was presented in paper [12]. Below we propose a slightly different classification, choosing the most significant tools available in 2019 and filling some gaps in the aforementioned survey.

4.1. Power Monitoring

After the HPC community started focusing not only on job execution time but also on energy efficiency, researchers began monitoring the energy/power consumption of the system as a whole using external meters such as Watts Up? Pro. Such an approach has the big advantage of monitoring actual energy/power consumption. However, external meters cannot report the energy/power consumption of system subcomponents (e.g., CPU, GPU, and memory).

4.2. Power Controlling

As mentioned before, there are several indirect tools or methods that allow us to control energy and power consumption. Dynamic voltage and frequency scaling (DVFS), sometimes considered separately as DFS and DVS, is one of the approaches that allow us to lower the processor voltage and/or frequency in order to reduce energy/power consumption, at the same time degrading performance. DVFS is available for both CPUs and GPUs. The study in [18] discusses the differences between using DVFS on a CPU and on a GPU.
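As a minimal illustration of such indirect control on Linux, the following C sketch lowers the maximum frequency of one core through the cpufreq sysfs interface. It assumes a standard cpufreq-enabled kernel and sufficient privileges; the 1.8 GHz ceiling is an arbitrary illustrative value, and the admissible range is hardware specific:

```c
/* Sketch: cap the frequency of CPU core 0 via the Linux cpufreq
 * sysfs interface (values are in kHz); the driver clamps the request
 * to the range reported in cpuinfo_min_freq/cpuinfo_max_freq. */
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    const char *path =
        "/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq";
    FILE *f = fopen(path, "w");
    if (!f) { perror("fopen"); return EXIT_FAILURE; }
    fprintf(f, "%d\n", 1800000);  /* request a 1.8 GHz ceiling */
    fclose(f);
    return EXIT_SUCCESS;
}
```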

Dynamic concurrency throttling (DCT) and concurrency packing [19] are other techniques that can result in energy/power savings. By reducing the number of available resources, such as the number of threads of an OpenMP application, a user is able to control the power consumption and performance of the application.
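A minimal OpenMP sketch of concurrency throttling follows: the parallel loop is executed with a reduced thread count to trade performance for power (the value of 4 threads is an arbitrary illustrative choice):

```c
/* Sketch: throttle concurrency of an OpenMP loop by limiting the
 * thread count before the parallel region. Compile with -fopenmp. */
#include <omp.h>
#include <stdio.h>

int main(void) {
    double sum = 0.0;
    omp_set_num_threads(4);  /* throttled concurrency level */
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < 100000000L; i++)
        sum += 1.0 / (double)(i + 1);
    printf("threads=%d sum=%f\n", omp_get_max_threads(), sum);
    return 0;
}
```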

4.3. Power Monitoring and Controlling

Full power management, including monitoring energy/power consumption as well as controlling power limits, has been implemented by many hardware manufacturers. Vendor-specific tools were described in detail in an appendix of [12]. The authors identified the following power management tools, for Intel: Running Average Power Limit (RAPL); AMD: Application Power Management (APM); IBM: EnergyScale; and NVIDIA: NVIDIA Management Library (NVML). It is worth noting that besides the C-based programming library (NVML), NVIDIA introduced nvidia-smi, a command-line utility built on top of NVML. Both NVML and nvidia-smi are supported for most of the Tesla, Quadro, Titan, and GRID lines [20].
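A minimal NVML sketch of both capabilities, monitoring the current power draw and requesting a power cap, could look as follows (assuming a supported GPU, sufficient privileges for setting the limit, and linking with -lnvidia-ml; error handling is shortened for brevity):

```c
/* Sketch: read GPU power draw and set a power cap through NVML. */
#include <stdio.h>
#include <nvml.h>

int main(void) {
    nvmlDevice_t dev;
    unsigned int powerMw, minMw, maxMw;

    if (nvmlInit() != NVML_SUCCESS) return 1;
    nvmlDeviceGetHandleByIndex(0, &dev);

    /* Current board power draw, reported in milliwatts. */
    nvmlDeviceGetPowerUsage(dev, &powerMw);
    printf("current power: %u mW\n", powerMw);

    /* Query the allowed capping range and request a cap in its middle. */
    if (nvmlDeviceGetPowerManagementLimitConstraints(dev, &minMw, &maxMw)
            == NVML_SUCCESS)
        nvmlDeviceSetPowerManagementLimit(dev, (minMw + maxMw) / 2);

    nvmlShutdown();
    return 0;
}
```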

Intel RAPL provides capabilities for monitoring and controlling power/energy consumption for privileged users through model-specific registers (MSRs). Since its first release (Sandy Bridge), RAPL has used a software power model estimating energy usage based on hardware performance counters. According to the study [21], Haswell introduced an enhanced RAPL implementation with fully integrated voltage regulators, allowing for actual energy measurements and improving measurement accuracy. The precision of RAPL was evaluated in [22] against an external power meter, showing that the measurements are almost identical. The study in [23] reviews existing CPU RAPL measurement validations and focuses on validating RAPL DRAM power measurements using different types of DDR3 and DDR4 memory, comparing them with measurements from an actual hardware power meter.
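For illustration, a minimal sketch of reading package energy through the RAPL MSRs via the Linux msr driver is given below. It assumes root privileges, a loaded msr module, and the register addresses and bit layout documented in Intel's Software Developer's Manual for Sandy Bridge and later; reading the counter twice and subtracting yields the energy consumed over an interval:

```c
/* Sketch: read RAPL package energy from MSRs. Compile with -lm. */
#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <math.h>

#define MSR_RAPL_POWER_UNIT   0x606
#define MSR_PKG_ENERGY_STATUS 0x611

static uint64_t read_msr(int fd, uint32_t reg) {
    uint64_t value = 0;
    pread(fd, &value, sizeof(value), reg);  /* offset = MSR address */
    return value;
}

int main(void) {
    int fd = open("/dev/cpu/0/msr", O_RDONLY);
    if (fd < 0) { perror("open msr"); return 1; }

    /* Bits 12:8 encode the energy status unit: 1 / 2^ESU joules. */
    uint64_t units = read_msr(fd, MSR_RAPL_POWER_UNIT);
    double joules_per_tick = pow(0.5, (double)((units >> 8) & 0x1f));

    uint32_t before = (uint32_t)read_msr(fd, MSR_PKG_ENERGY_STATUS);
    sleep(1);
    uint32_t after  = (uint32_t)read_msr(fd, MSR_PKG_ENERGY_STATUS);

    /* Unsigned subtraction handles counter wraparound. */
    printf("package energy over ~1 s: %.3f J\n",
           (double)(after - before) * joules_per_tick);
    close(fd);
    return 0;
}
```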

Although Intel RAPL is well known and well described in the literature, and research on processor power management and power capping has been documented since Sandy Bridge was released, competitors’ tools such as AMD’s APM TDP Power Cap and IBM’s EnergyScale have mostly just been mentioned in many papers but never fully examined in any significant work. This seems to be one of the open areas for researchers.

Table 1 collects basic information regarding the aforementioned tools for energy/power management, with comments and related example works.

4.4. Derived Tools

The Performance Application Programming Interface (PAPI), since its release and first papers [27], is still under development; recently, besides processor performance counters, it was extended to offer access to RAPL and the NVML library through the PAPI interface [28].
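A minimal sketch of reading a RAPL energy event through PAPI's component interface is shown below. The event name "rapl:::PACKAGE_ENERGY:PACKAGE0" is the one commonly exposed by PAPI's rapl component, but available names should be verified with the papi_native_avail utility on the target system (compile with -lpapi):

```c
/* Sketch: measure package energy of a code region via PAPI's rapl
 * component; the component reports energy in nanojoules. */
#include <stdio.h>
#include <papi.h>

int main(void) {
    int evset = PAPI_NULL;
    long long value;

    if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT) return 1;
    PAPI_create_eventset(&evset);
    if (PAPI_add_named_event(evset, "rapl:::PACKAGE_ENERGY:PACKAGE0")
            != PAPI_OK) {
        fprintf(stderr, "RAPL event not available on this system\n");
        return 1;
    }

    PAPI_start(evset);
    /* ... code region to be measured ... */
    PAPI_stop(evset, &value);

    printf("package energy: %.3f J\n", value / 1e9);
    return 0;
}
```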

Processor Counter Monitor (PCM) [29] is an open-source library and set of command-line utilities designed by Intel, very similar to PAPI. It also accesses performance counters and allows for energy/power monitoring via the RAPL interface.

Performance under Power Limits (PUPiL) [30] is an example of a hybrid hardware-software approach to achieving energy/power consumption benefits. It manipulates DVFS as well as core allocation, socket usage, memory usage, and hyperthreading. The authors compared this approach to raw RAPL power capping, with results in favor of PUPiL.

Score-P, intended for analysis and subsequent optimization of HPC applications, allows energy-aware analysis. It is shown in [31] how clock frequency affects the execution time and energy consumption of a finite element application on the SuperMUC infrastructure. Consequently, energy-optimal and time-optimal configurations are distinguished: the former saves 2% energy while extending execution time by 14%, and the latter saves 14% time while taking 6% more energy.

Since the Ubuntu 18.04 LTS release, power capping has become available through the user-friendly command-line utility powercap-set [32]. This tool is also based on RAPL, so it is only applicable to Intel processors. It allows setting a power limit on each of the available domains (PKG, PP0, PP1, and DRAM).
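The utility wraps the Linux powercap sysfs tree, which can also be written directly. A minimal sketch, assuming root privileges and the first package zone being exposed as intel-rapl:0 (limits are expressed in microwatts), is:

```c
/* Sketch: cap the CPU package at 50 W via the Linux powercap sysfs
 * interface that powercap-set wraps. */
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    const char *path =
        "/sys/class/powercap/intel-rapl:0/constraint_0_power_limit_uw";
    FILE *f = fopen(path, "w");
    if (!f) { perror("fopen"); return EXIT_FAILURE; }
    fprintf(f, "%ld\n", 50000000L);  /* 50 W in microwatts */
    fclose(f);
    return EXIT_SUCCESS;
}
```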

5. Classification of Energy-Aware Optimizations for High-Performance Computing

The paper classifies existing works in terms of several aspects and features, including the following major factors:
Computing Environment. What and how many devices, especially compute devices, are considered, and whether optimization is performed at the level of a single device, a single multiprocessor system, a cluster, a grid, or a cloud (Table 2).
Device Type. What type(s) of devices are considered in optimization, specifically CPU(s), GPU(s), or hybrid CPU + accelerator environments (Table 3). It can be seen that all identified types of systems are represented by several works in the literature. However, there are few works that address energy-aware computing for hybrid CPU + accelerator systems. Additionally, there are more works addressing these issues for multicore CPUs compared to GPUs.
Target Metric(s) Being Optimized. Specifically, execution time, power limit, energy consumption, and temperature (Table 4). We can see that many works address the issue of minimizing energy consumption at the cost of minimal performance impact. This may be performed by identifying application phases in which power minimization can contribute to that goal. Relatively few works consider network and memory components for that purpose. There is a lack of automatic profiling and adjustment for parallel applications running in hybrid CPU + accelerator systems.
Energy/Power Control Method. How the devices are managed for optimization, including selection of devices/scheduling, lower-level CPU frequency control, power capping APIs for CPUs/GPUs, application-level modifications, or hybrid methods (Table 5). It can be seen that direct power capping APIs, described in more detail in Section 4.3, are relatively new and have not been investigated in many works yet, which opens possibilities for new solutions.

In terms of system components that can be controlled with respect to power and energy, the literature distinguishes frequency, core and uncore [45], disk [53], and network [53]. The latter can also be controlled through Energy-Efficient Ethernet (EEE) [78], which can put physical layer devices into a low-power mode with savings of up to 70%; work [78] shows that the overhead of the technology is negligible for many practical scenarios. The MREEF framework considered in [57] distinguishes optimization steps such as detection of system phases, characterization of phases, classification, prediction of the upcoming system state, and reconfiguration for minimization of energy consumption (with consideration of disk and network scaling).

Table 6 correlates three of the major factors defined in Tables 3–5 and presents the existing works in the context of target metrics, energy/power control methods, and device types. The combination of these factors is a strong foundation for identifying both the recent trends in research on energy-aware high-performance computing and open areas for future research.

While the majority of the works presented in the literature focus on performance and power or energy optimization during an application’s execution, it is also possible to consider pre- or postexecution scenarios. For example, the study in [37] considers postexecution scenarios after an application on a GPU has terminated. Through forced frequency control, it is possible to achieve lower energy consumption in such a situation compared to the default scenario. Details are considered in the tables.

Finally, applications and benchmarks used for power/energy-aware optimization in HPC systems are summarized in Table 7. It can be seen that NAS Parallel Benchmarks, physical phenomena simulations, and compute-intensive applications are mainly used for measurements of solution performance. Identifying the same benchmarks across various papers makes it possible to either cross-check conclusions or integrate complementary approaches in future work.

6. Tools for Prediction and/or Simulation of Energy/Power Consumption in an HPC System

There are several systems that allow us to predict and/or simulate energy/power consumption in HPC systems. Table 8 presents a summary of the currently used tools.

GSSim [80] (Grid Scheduling Simulator) is a tool dedicated to simulating scheduling in a grid environment. Tasks are assigned to the underlying computational resources, and their communication is evaluated according to the defined network equipment. Its extension DCworms [81] provides additional plugins for temperature and power/energy usage in a modeled data center. The simulator provides three approaches to energy modeling: static, with various power-level modes; dynamic, where the energy consumption depends on the resource load; and application specific, which can be used for advanced model tuning. The experimental results of the performed simulations compared to real hardware measurements showed a high correlation between the simulation and a real HPC environment, for both power and thermal models [91].

The MERPSYS [92] (Modeling Efficiency, Reliability and Power consumption of multilevel parallel HPC SYStems using CPUs and GPUs) simulator enables hierarchical modeling of a grid, a cluster, or a single-machine architecture and testing it against a defined application. The tool provides means (Java scripts specified using the web simulator interface) for flexible system and application definitions for simulating energy consumption and execution time. The simulator was tested using typical SPMD (Single Program Multiple Data) and DAC (Divide and Conquer) applications [82].

CloudSim [84] is a framework dedicated to simulating the behavior of a cloud or a whole cloud federation, supporting the IaaS model. The tool enables modeling all the main elements of a cloud architecture, including physical devices, VM allocation, the cloud market, network behavior, and dynamic workflows. The results of the simulation support data center resource provisioning, QoS, and energy consumption analysis. CloudSim is used by researchers in academic and commercial organizations, e.g., HP Labs in the USA.

SimGrid [85] is a discrete-event simulation framework for grid environments focusing on versatility and scalability. The tool supports three different sources of input data: two kinds of API, including MPI traces from real applications, and a DAG (directed acyclic graph) format for task workflows. The SimGrid extension [93] enables accounting for the energy consumption of concurrent applications in HPC grids featuring the DVFS technology of multicore processors.

GENSim [94] is a data center simulator capable of modeling mixed task input, comprising both interactive web service calls and batch tasks. The tool has been used for estimating power consumption, assuming usage of both brown and green energy, where the latter is used for accelerating the current batch computations during the predicted peak times of the renewable energy sources. The results were validated using a real hardware experimental testbed consisting of a collection of Intel Nehalem-based cloud servers [86].

A combination of the OMNeT++ and INET tools [58] was used for HPC computation modeling, where energy-aware scheduling algorithms were tested. A specific cluster configuration was assumed, and a number of clients requesting a total of 400 jobs were simulated. The behavior of the main server components was evaluated, including procedures such as switching off idle nodes. The simulation results were compared to results obtained in a real testbed environment.

GDCSim [87] (Green Data Center Simulator) provides a holistic solution for evaluating data center energy consumption. The tool enables analysis of data center geometries, workload characteristics, platform power management schemes, and applied scheduling algorithms. It supports both thermal analysis under different physical configurations (using CFD) and energy efficiency analysis of resource management algorithms (using an event-based approach). The simulator was used for evaluating scheduling in an HPC environment and a transactional workload on an Internet data center.

GreenCloud [88] is a packet-level simulator for a cloud, providing an energy consumption model for various data center architectures. The model covers the basic infrastructure elements of a workload: computing servers and access, aggregation, and core network devices, including various L2/L3 switches working at various network speeds (1, 10, and 100 Gb Ethernet). For power management purposes, the simulator uses DVFS (dynamic voltage and frequency scaling) and DNS (DyNamic Shutdown) schemes along with the different workload characteristics incorporated into a defined data center model. The presented use case shows evaluation of energy consumption for two- and three-layer data center architectures, including a variant supporting a high-speed (100 Gb Ethernet) interconnect.

TracSim [89] is a simulator for a typical HPC cluster with a fixed power cap, which should not be exceeded due to cooling and electrical connection limitations. The assumption is that some compute jobs do not need as much power; thus, others can use more energy-consuming resources. The tool implements various scheduling policies to simulate different approaches for evaluating the achievable power level. The experiments showed that this solution can be tuned for a specific environment, i.e., a production HPC cluster at Los Alamos National Laboratory (LANL), and the overall simulation results are accurate to 90% in most cases.

In [90], Ostermann et al. proposed a combination of three tools, providing a sophisticated, event-based simulator for a cloud environment working under the Infrastructure as a Service (IaaS) model with a given power cap for the whole modeled system. The simulator consists of the following components: (i) ASKALON [95], responsible for scientific workflows; (ii) GroudSim [96], the main event-based engine of the solution; and (iii) DISSECT-CF [97], containing functionality related to cloud modeling. The evaluation of the approach was based on the simulation of scientific workflows (using traces of real executions) and showed good performance and scalability despite the complexity of the solution.

The energy-aware HyperSim-G simulator [74] was used for testing genetic-based scheduling algorithms deployed in a grid environment. The tool is based on the basic version of the HyperSim-G event-based simulation package described in [98]. As an energy-saving technique, the tool utilized DVFS, and the performed experiments demonstrated a systematic method for evaluating compute grid schedulers supporting biobjective energy and performance minimization.

Design space exploration for GPGPUs was proposed in [39], providing a tool for multiobjective evaluation of GPGPU devices in the context of specific medical or industrial applications. The analysis is performed for various parameters of the modeled devices, including energy efficiency, performance, and real-time capabilities. The simulator was designed as a distributed application deployed in a heterogeneous cloud environment, supporting a variety of GPUs, including ones still to be released by the manufacturer. The validation of the solution was performed using a real-life streaming application and showed a low error level (below 4% in the worst case) in comparison to the real devices.

In [35], Langer et al. presented work covering energy minimization of multicore chips for two mini-benchmark HPC applications. The optimal configuration was selected using integer linear programming, solved with heuristics. The simulation was based on the Sniper [99] package, which aims to increase efficiency by optimizing the level of simulation accuracy. The tool was enhanced with the McPAT framework [100], which provides energy-aware design space exploration for multicore chips, considering dynamic, short-circuit, and leakage power modeling.

7. Open Areas

Finally, based on the analyzed research, we can formulate open areas for research that seem crucial for further progress in the field of energy-aware high-performance computing:
(1) The variety of the HPC tools used for energy/power management, presented in Table 1, shows a need for unification of the various APIs provided by different vendors, i.e., for a uniform power-aware API spanning available HPC computing devices such as multi- and manycore CPUs, GPUs, and accelerators, supporting a common, cardinal subset of universal parameters related to power/energy as well as performance measurement and management.
(2) The usability, precision, and performance of the currently used tools for prediction and simulation presented in Table 8, in the context of their support for specific computing environments (Table 2), device types (Table 3), and used metrics, show that further development of, possibly empirical, performance-energy models is required for a wide range of CPU and GPU architectures and various classes of applications (e.g., those described in Table 7), including performance (power limit) functions, available for runtime usage as well as for simulator environments.
(3) As a conclusion from Table 6, we can recognize several open research directions in the energy-aware HPC field which still need further development:
(i) Energy/power-aware methods for hybrid (CPU + accelerator) systems
(ii) Optimization with any energy/power control method but targeted at minimization of the product of energy and execution time
(iii) Using hybrid energy/power control methods for minimization of energy consumption and the energy-time product
(4) Finally, analysis of the energy/power control methods presented in Table 5 leads us to the following conclusions:
(i) There is a need for tools for automatic configuration of an HPC system, including power caps, for a wide variety of application classes, focusing on performance and energy consumption and available for various parallel programming APIs. While approaches exist for selected classes of applications using MPI (e.g., [65]), there are no general tools able to adjust to a variety of application APIs. Such tools could use the models proposed in the previous step as well as detect and assign an application to one of several classes in terms of performance-energy profiles.
(ii) Automatic configuration of an HPC system in terms of performance and energy for a hybrid (CPU + accelerator) application at runtime is needed, where off-loading of computations can be conditioned not only on the time of the computations but also on power/energy constraints.
(iii) Further development and validation of currently existing tools in the energy/power management area is needed, including functionality extensions as well as quality improvements, e.g., validation of AMD’s Application Power Management TDP Power Cap tool or IBM’s EnergyScale capabilities.

8. Summary and Future Work

In the paper, we have discussed APIs for controlling the energy and power aspects of high-performance computing systems incorporating state-of-the-art CPUs and GPUs and presented tools for prediction and/or simulation of energy/power consumption in an HPC system. We analyzed approaches, control methods, optimization criteria, and programming examples as well as benchmarks used in state-of-the-art research on energy-aware high-performance computing. Specifically, solutions for systems such as workstations, clusters, grids, and clouds using various computing devices such as multi- and manycore CPUs and GPUs were presented. Optimization metrics including combinations of execution time, energy used, power consumption, and temperature were analyzed. Control methods used in the available approaches include scheduling, DVFS/DFS/DCT, power capping, application optimizations, and hybrid approaches. We have finally presented open areas and recommendations for future research in this field.

Conflicts of Interest

The authors declare that they have no conflicts of interest.