Scientific Programming

Review Article

Energy-Aware High-Performance Computing: Survey of State-of-the-Art Tools, Techniques, and Environments

Table 4

Target metric.


Target metric	Works	Description

(1) Performance/execution time with power limit	[73]	Minimization of application running time with an upper bound on the total power consumption of compute devices selected for computations
	[47]	Shows benefit of power monitoring for a resource manager and compares results for fixed frequency mode, minimum power level assigned to a job, and automatic mode with consideration of available power
	[30]	Performance under power cap, timeliness, and efficiency/weighted speedups are considered
	[48]	Execution time vs maximum power consumption per system; system utilization, power consumption profiles, and the cumulative distribution function of the job waiting time are also considered
	[52]	Application slowdown vs power reduction and optimization of performance per Watt
	[24]	Analysis of performance vs power limit configurations
	[42]	Analysis of performance vs power limit configurations
	[62]	Analysis of performance/execution time vs power limit configurations
	[68]	Turnaround time vs cluster power limits
	[46]	Consideration of impact of power allocation for CPU and DRAM domains on performance when power capping
	[70]	Optimization of the number of nodes and power distribution between CPU and memory in an overprovisioned HPC cluster

(2) Performance/execution time/energy minimization + thermal awareness	[33]	Task partitioning and scheduling: a heuristic algorithm, task partitioning and scheduling (TPS), based on task partitioning and compared to Min-min and PDTM
	[38]	Finding core speeds such that tasks complete before their deadlines and peak temperature is minimized
	[67]	Thermal-aware task scheduling algorithms are proposed for reduction of temperature and power consumption in a data center; job response times are considered
	[44]	Minimization of energy consumption of the system with consideration of task deadlines and temperature limit
	[66]	Workload placement with thermal consideration and analysis of cooling costs vs data center utilization

(3) Performance/execution time/value + energy optimization	[34]	Concurrent kernel scheduling on a GPU + impact of frequency scaling on performance and energy consumption
	[45]	Dynamic core and uncore tuning to achieve the best energy/performance trade-off: core frequency is lowered and uncore frequency increased for codes with low operational intensity, while core frequency is increased and uncore frequency lowered for other codes; tuning targets energy, energy delay product, and energy delay product squared
	[57]	Trade-off between performance (measured by execution time) and energy consumption (with consideration of disk and network scaling)
	[14]	Trade-off between performance and energy consumption
	[74]	Biobjective optimization of makespan and average energy consumption
	[50]	Joint optimization of value (utility) and energy, consideration of jobs with dependent tasks, profiling-based and nonprofiling-based approaches
	[51]	Performance and energy efficiency, with a focus on an application autotuning framework
	[58]	Keeping performance close to the initial level while achieving energy savings
	[53]	Performance vs energy consumption trade-off; impact of detection and recognition thresholds on energy consumption and execution time
	[75]	Maximization of performance and minimization of energy consumption at the same time shown for the proposed GPU-RAID compared to a regular Linux-RAID
	[39]	Exploration of trade-offs between performance and energy consumption for various GPUs
	[76]	Simultaneous minimization of energy and execution time
	[77]	Minimization of energy consumption (kWh) while maintaining defined QoS (percentage of SLA violation)
	[59]	Minimization of energy consumption (J) while keeping a minimal performance influence
	[42]	Analysis of execution time vs energy usage for various power limit configurations, using DDR4 or MCDRAM memories
	[60]	Pareto optimal solutions incorporating performance and energy, taking into account functions such as speed of execution vs workload size and dynamic energy vs workload size; the optimal number of processors is selected as well
	[63, 64]	Considers the impact of DCT and combined DVFS/DCT on execution time and energy usage of hybrid MPI/OpenMP applications; controls execution of OpenMP phases through the number of threads and DVFS level, based on prediction of phase execution time from event rates
	[65]	Minimization of energy delay product of MPI applications in a transparent way through reduction of CPU performance during MPI communication phases
	[43]	Exploration of execution time and energy of parallel OpenMP programs on a multicore Intel Xeon CPU through strategies involving various loop scheduling methods, chunk sizes, optimization levels, and thread counts
	[69]	Minimization of energy used at the cost of minimal performance loss; proposes energy-aware MPI (EAM), an application-oblivious MPI runtime that observes MPI slack to maximize energy efficiency using power levers
	[72]	Providing a default configuration for an acceptable power-performance trade-off, with additional policies implied for a specific computing center

(4) Energy minimization	[49]	Energy minimization with no impact on performance
	[35]	Energy minimization at the cost of increased execution time; an integer linear programming-based approach is used to find a configuration with the number of cores minimizing energy consumption
	[55, 56]	Energy minimization at the cost of increased execution time, achieving energy savings while running a parallel application on a cluster through DVFS and frequency minimization during periods of lower activity: intranode optimization related to inefficiencies of communication, and intranode optimization related to nonoptimal data and computation distribution among processes of an application
	[54]	A what-if approach to predict energy savings of possible optimizations; the work focuses on identification of a set of performance counters for a power and performance model
	[36]	Finding an optimal GPU configuration (in terms of the number of threads per block and the number of blocks)
	[37]	Minimization of energy after an application has finished through frequency control
	[40]	Energy minimization at the cost of increased execution time through power capping for parallel applications on modern multi- and manycore processors
	[41]	Energy minimization with low performance degradation (aiming up to 5%)
	[61]	Energy minimization of MPI programs through frequency scaling with constraints on execution time; a linear programming approach is used, with traces collected from MPI application execution

(5) Product of energy and execution time	[36]	Finding an optimal GPU configuration (in terms of the number of threads per block and the number of blocks)
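
The target metrics grouped in Table 4 can be made concrete with a small numerical sketch. The Python fragment below is only an illustration, not taken from any of the surveyed works: the `Run` type, the configuration labels, and all measurements are hypothetical, and the power limit is assumed to be checked against average power. It computes execution time, energy, energy delay product (EDP), and energy delay product squared (ED2P) for three made-up configurations and selects the best one under each metric, optionally subject to a power limit as in category (1).

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Run:
    """One measured run of an application (hypothetical, illustrative values)."""
    label: str          # made-up configuration name
    time_s: float       # execution time in seconds
    avg_power_w: float  # average power draw in watts

    @property
    def energy_j(self) -> float:
        # Energy (J) = average power (W) * execution time (s)
        return self.avg_power_w * self.time_s


def exec_time(r: Run) -> float:
    return r.time_s                     # categories (1) and (2)


def energy(r: Run) -> float:
    return r.energy_j                   # category (4)


def edp(r: Run) -> float:
    return r.energy_j * r.time_s        # category (5): energy * time


def ed2p(r: Run) -> float:
    return r.energy_j * r.time_s ** 2   # energy * time squared


def best(runs: Iterable[Run],
         metric: Callable[[Run], float],
         power_limit_w: Optional[float] = None) -> Optional[Run]:
    """Return the run minimizing `metric` among runs within the power limit.

    Assumption: the limit applies to average power. Returns None if no
    run satisfies the limit.
    """
    feasible = [r for r in runs
                if power_limit_w is None or r.avg_power_w <= power_limit_w]
    return min(feasible, key=metric) if feasible else None


# Made-up measurements for three frequency settings.
RUNS = [
    Run("high-freq", time_s=10.0, avg_power_w=250.0),
    Run("mid-freq", time_s=12.0, avg_power_w=150.0),
    Run("low-freq", time_s=18.0, avg_power_w=90.0),
]

if __name__ == "__main__":
    for name, metric in [("time", exec_time), ("energy", energy),
                         ("EDP", edp), ("ED2P", ed2p)]:
        print(name, "->", best(RUNS, metric).label)
    print("time, limit 200 W ->", best(RUNS, exec_time, 200.0).label)
```

With these invented numbers, each metric can favor a different configuration, which is why the surveyed works state their target metric explicitly: minimizing time alone favors the fastest run, minimizing energy alone favors the slowest, and EDP and ED2P weight execution time progressively more heavily.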