Abstract
Energy consumption has recently become a major concern for multiprocessor computing systems, whose primary performance goal has traditionally been to reduce the execution time of applications. In the context of scheduling, there has been increasing research interest in algorithms using dynamic voltage scaling (DVS), which allows processors to operate at lower voltage supply levels at the expense of processing speed, to achieve a satisfactory tradeoff between schedule quality and energy consumption. The problem considered in this paper is to find a schedule for a workflow, which is normally a precedence-constrained application, on a bounded number of heterogeneous DVS-enabled processors, so as to minimize both makespan (the overall execution time of the application) and energy consumption. A fast and efficient heuristic is proposed and evaluated by simulation with two real-world applications as well as randomly generated ones.
1. Introduction
During the last few decades, the explosive growth in the volume of computation and data has stimulated a variety of research on multiprocessor platforms (such as grids and clouds) that host complicated applications such as workflows [1, 2], which are widely used in engineering, business, and science. These powerful platforms, with a large and still increasing amount of computing, storage, and networking equipment, consume an enormous amount of energy. It has been estimated that annual data center energy consumption in the United States in 2011 was over 100 billion kWh, at a cost of $7.4 billion [3]. According to [4], in the United States, the energy consumed by information and communication technology equipment is roughly 8% of the total and will increase by 50% within a decade. This will further harm the environment through increased emissions.
The increasingly challenging energy problem creates a growing need for energy-efficient solutions for multiprocessor platforms. However, most current research on resource management for these platforms (e.g., Condor [5] and Pegasus [6]) focuses mainly on performance goals such as high performance, high throughput, high reliability, and/or high availability to cater to users' requirements. As a result, most existing multiprocessor platforms lack energy-saving capability. This makes energy consumption an urgent and crucial issue to address.
Recent advances in hardware technologies [7] (including dynamic voltage and frequency scaling, resource hibernation, memory optimization, solid state drives, and energy-efficient computer monitors) have dealt with the energy consumption issue to some extent. However, achieving substantial energy savings through software techniques such as scheduling algorithms, especially on a multiprocessor platform, remains a serious concern.
In this paper we consider workflow scheduling based on DVS, which has been demonstrated to be a promising technique in an abundance of literature [8–12]. DVS enables processors to dynamically adjust voltage supply levels (VSLs) and CPU frequencies to reduce power consumption, at the expense of an acceptable loss of performance.
Since we aim to minimize makespan and energy consumption simultaneously, the general form of the problem considered here is bi-objective DAG scheduling, as we assume every workflow application is represented by a directed acyclic graph (DAG). In particular, we focus on DAG scheduling for admission control in service-oriented and market-oriented computing environments such as clouds, where a user and a service provider need to reach an agreement before the execution of the user application, and users are free to choose among different service providers. In such a scenario, a service provider needs DAG scheduling to return a competitive makespan (to attract customers) and a low energy consumption (to save energy). Moreover, the scheduling should be performed in a short time, as users normally require a real-time response. A few bi-objective DAG scheduling heuristics exist in the literature. Some of them provide a quick response, but their performance leaves considerable room for improvement. Others exhibit satisfactory performance, but their scheduling cost is extremely high, which makes them unsuitable for the scenario discussed above. The need for fast and efficient DAG scheduling heuristics, suitable for real admission control in clouds, motivates the work presented in this paper.
This paper presents a new bi-objective heuristic that aims to provide both effective DVS-based DAG scheduling and a short scheduling time. Our heuristic is an enhancement of the energy-conscious scheduling heuristic (ECS) [11], which makes quick scheduling decisions but whose scheduling performance is often limited by local optima. We refine the core of ECS by proposing a novel objective function for the RS (relative superiority) phase and a new criterion for the MCER (makespan-conservative energy reduction) phase of ECS, and thereby derive a new heuristic. The comparison results obtained from our extensive evaluation show that our approach significantly improves both makespan optimization and energy reduction while still meeting the real-time response requirement. This indicates that our approach can be readily applied to admission control in service-oriented and market-oriented computing systems.
The remainder of the paper is organized as follows. Section 2 describes the background and related work. Section 3 describes the models used in our study and specifies the problem to be addressed. The proposed scheduling approach is presented in Section 4 with an illustrative example. The results of our comparative evaluation are shown in Section 5. Finally, the paper is concluded in Section 6.
2. Related Work
Dozens of static DAG scheduling heuristics that aim to minimize makespan on heterogeneous multiprocessor systems have been presented in the literature. These heuristics follow different design principles. We roughly classify them into list-scheduling algorithms [13–16], duplication-based algorithms [17–19], clustering algorithms [20, 21], and guided random search methods [22, 23]. All these heuristics differ from our study in that their scheduling does not take energy consumption into account.
As DVS is a promising energy-saving technique that can be incorporated into scheduling, a large number of DVS-based scheduling algorithms have been proposed for diverse applications and computing platforms. The majority of these DVS-based scheduling heuristics target homogeneous computing systems [9, 10, 24, 25] or single-processor systems [3, 26, 27], or focus on independent tasks [28–30]. These heuristics cannot address issues such as task dependency and processor heterogeneity, which are addressed in our study.
There are also DVS-based scheduling heuristics that focus on DAG applications as well as heterogeneous systems. Huang et al. [12] proposed an enhanced energy-efficient scheduling algorithm to reduce energy consumption while meeting a performance-based service level agreement (e.g., a deadline constraint). This algorithm exploits the slack between initially scheduled tasks and reallocates them in a global manner to save power. Unlike this work, the applications considered in our study are not deadline-constrained, and the quality of schedules is measured on both makespan and energy consumption.
Evolutionary techniques (e.g., genetic algorithms) have been widely applied to various problems (e.g., energy supply [31], space allocation [32], and multiobjective scheduling [33]). Mezmaz et al. [34] proposed a hybrid genetic algorithm using DVS to simultaneously minimize makespan and energy consumption. Algorithms based on evolutionary techniques normally perform well on optimization. However, these algorithms usually incur significantly high scheduling costs, even though modifications may be applied to improve their efficiency [35]. As a result, these algorithms are too time-consuming for admission control in clouds, where a real-time response is required.
The energy-conscious scheduling heuristic (ECS) [11] is a list-scheduling algorithm that aims to minimize makespan and energy consumption simultaneously with low complexity. The heuristic consists of two phases. In the first phase, the heuristic applies bottom-level ranking to prioritize tasks and then, task by task, selects the processor and the VSL for the current task so that the devised objective function, which is defined as relative superiority (RS), is maximized. After the first phase, a temporary schedule is generated. In the second phase, a new criterion is used, which is defined as the makespan-conservative energy reduction technique (MCER). That is, for each prioritized task in the current schedule, all other combinations of task, processor, and VSL are checked to see whether any of them reduces the energy consumption of the task without increasing the current makespan. If so, such a combination is applied to obtain a new schedule. After the second phase, the newest schedule is returned as the scheduling result. Evaluation results demonstrate that ECS significantly outperforms energy-unconscious heuristics on energy consumption. However, RS and MCER, which are the core of ECS, consider only local optimality. As a result, the scheduling decisions made by ECS tend to be confined to a local optimum. This motivates our work to propose a novel objective function and criterion and to devise a new heuristic. The experimental results presented in Section 5 show that our approach obtains schedules that are better than those found by ECS on both makespan optimization and energy reduction.
3. Problem Description
In this section, we describe the application model, the system model, and the energy model used in our work and then specify the problem we address.
3.1. Application Model
We use a directed acyclic graph (DAG) to represent an application to be scheduled (shown in Figure 1 with its details in Table 3). In a DAG, nodes denote tasks and edges represent data transmission between tasks. In our work, we use $G = (V, E)$ to represent a DAG, which consists of a set of nodes $V$ and a set of edges $E$. A node $n_i \in V$ represents the corresponding task, and an edge $e_{i,j} \in E$ represents the intercommunication and precedence constraint between nodes $n_i$ and $n_j$. For an edge $e_{i,j}$, $n_i$ is called a parent node of $n_j$, and $n_j$ is called a child node of $n_i$. A child node cannot start execution until all of its parents have finished and all the required data has arrived. Parentless nodes are called source nodes; childless nodes are called sink nodes. An entry node of $G$ must be a source node and an exit node a sink node. For standardization, we specify in this paper that a DAG has only a single entry node and a single exit node. All DAGs with multiple entry or exit nodes can be equivalently transformed to this standard form [36]. For illustration, a simple example DAG is shown in Figure 1, where the weight attached to each edge denotes the amount of data to be transmitted.
In order to meet the precedence constraints, the start time $ST(n_i, p_j)$ and the finish time $FT(n_i, p_j)$ of task $n_i$ on processor $p_j$ are computed by
$$ST(n_i, p_j) = \max\Big\{ FT_{last}(p_j),\ \max_{n_k \in pred(n_i)} \big( FT(n_k, proc(n_k)) + TL(n_k, n_i) \big) \Big\}, \qquad FT(n_i, p_j) = ST(n_i, p_j) + w_{i,j},$$
where $w_{i,j}$ represents the execution time of task $n_i$ on processor $p_j$; $FT_{last}(p_j)$ denotes the finish time of the task that is currently last on processor $p_j$; $pred(n_i)$ represents the set of all parent tasks of task $n_i$; $proc(n_k)$ denotes the processor to which task $n_k$ is assigned; $TL(n_k, n_i)$ is the transmission latency between the two tasks (Section 3.2); and if there is no task assigned to processor $p_j$, $FT_{last}(p_j)$ is equal to zero. In the case of the entry task, we have
$$ST(n_{entry}, p_j) = 0.$$
After the scheduling is completed, the makespan of the schedule, $M$, is defined as
$$M = \max_{n_i \in V} FT(n_i, proc(n_i)).$$
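The start-time rule above can be illustrated with a minimal Python sketch. The function name `earliest_start`, the dictionary layout, and the toy values are assumptions made for illustration only; they are not taken from the paper.

```python
from typing import Callable, Dict, Hashable, List


def earliest_start(
    task: Hashable,
    proc: Hashable,
    parents: Dict[Hashable, List[Hashable]],
    finish: Dict[Hashable, float],
    assigned: Dict[Hashable, Hashable],
    proc_ready: Dict[Hashable, float],
    latency: Callable[[Hashable, Hashable, Hashable, Hashable], float],
) -> float:
    """Earliest time `task` can start on `proc`.

    The task must wait for (a) the last task already placed on `proc`
    and (b) the arrival of data from every parent task.
    """
    data_ready = 0.0
    for parent in parents.get(task, []):
        arrival = finish[parent] + latency(parent, task, assigned[parent], proc)
        data_ready = max(data_ready, arrival)
    # An empty processor is ready at time zero.
    return max(proc_ready.get(proc, 0.0), data_ready)


# Hypothetical toy instance: tasks A and B both feed task C.
DATA = {("A", "C"): 2.0, ("B", "C"): 1.0}  # data amounts (assumed)
RATE = 1.0                                  # time per unit of data (assumed)


def toy_latency(src, dst, src_proc, dst_proc):
    """Latency is zero when both tasks share a processor."""
    if src_proc == dst_proc:
        return 0.0
    return DATA[(src, dst)] * RATE


parents = {"C": ["A", "B"]}
finish = {"A": 3.0, "B": 4.0}
assigned = {"A": "p0", "B": "p1"}
proc_ready = {"p0": 3.0, "p1": 4.0}

start_c = earliest_start("C", "p0", parents, finish, assigned, proc_ready, toy_latency)
finish_c = start_c + 2.0  # assumed execution time of C on p0
```

In this toy instance, task C on `p0` has to wait for the data sent by B from `p1`, which arrives later than the moment `p0` becomes free.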
3.2. System Model
We consider a set of DVS-enabled heterogeneous processors (denoted by $P$) that are fully interconnected and equally capable of running any application. All the processors can run at different voltage and frequency levels. While a processor is idle, it stays at its lowest voltage and frequency level for maximal energy saving [37]. It is assumed that the time needed to transmit one unit of data from one processor to another, named the transmission rate, is constant and known in advance (as illustrated in Table 2). Therefore, the time needed to transmit data from one processor to another, named the transmission latency, is computed by
$$TL(n_i, n_j) = d_{i,j} \times TR(proc(n_i), proc(n_j)),$$
where $d_{i,j}$ denotes the amount of data transmitted from task $n_i$ to $n_j$ and $TR$ is the transmission rate between the two processors; if tasks $n_i$ and $n_j$ are allocated to the same processor, the transmission latency is zero. It is also assumed that a processor can run only one task at a time and that no preemption is considered.
Each processor $p_j$ can operate at a set of voltage supply levels (VSLs, denoted by $VSL_j$), each of which corresponds to a specific relative speed (as illustrated in Table 1). For task $n_i$, we assume that its execution time on a processor $p_j$ operating at VSL 0 (denoted by $w_{i,j,0}$) is known in advance; the execution time of $n_i$ at a different VSL $k$ (denoted by $w_{i,j,k}$) is then obtained as the ratio of $w_{i,j,0}$ to the relative speed of VSL $k$.
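As a small illustration of the execution-time scaling just described, the following Python sketch divides the known VSL 0 execution time by a relative speed. The function name and the relative speeds used here are hypothetical; they are not the values of Table 1.

```python
def exec_time_at_vsl(base_time: float, relative_speed: float) -> float:
    """Execution time at a VSL, given the time at VSL 0.

    `relative_speed` is expressed as a fraction of the VSL 0 speed
    (an assumption; a table may list it as a percentage instead).
    """
    if relative_speed <= 0:
        raise ValueError("relative speed must be positive")
    return base_time / relative_speed


# Hypothetical relative speeds for three VSLs of one processor.
levels = [1.0, 0.8, 0.5]
times = [exec_time_at_vsl(10.0, speed) for speed in levels]
```

Lower relative speeds give longer execution times, which is the performance cost paid for running at a lower voltage.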
3.3. Energy Consumption Model
We adopt the energy model used in [11], which is derived from the power consumption model of complementary metal-oxide-semiconductor (CMOS) logic circuits. Since we assume that processors consume a certain amount of energy while idling, the total energy consumption of the execution of a DAG comprises direct and indirect energy consumption. The direct energy consumption is defined as
$$E_d = \sum_{i=1}^{n} \alpha V_i^2 w_i^*,$$
where $n$ is the number of tasks, $\alpha$ is a device-related constant, $V_i$ is the voltage at which the processor operates when executing task $n_i$, and $w_i^*$ is the amount of time taken for the execution of $n_i$. On the other hand, the indirect energy consumption is defined as
$$E_i = \sum_{j=1}^{p} \sum_{idle_{j,k} \in IDLE_j} \alpha V_{\min_j}^2 L_{j,k},$$
where $p$ is the number of processors, $IDLE_j$ is the set of idling slots (between time 0 and the makespan) on processor $p_j$, $V_{\min_j}$ is the lowest supply voltage on $p_j$, and $L_{j,k}$ is the amount of idling time for $idle_{j,k}$. Then, the total energy consumption is defined as
$$E_t = E_d + E_i.$$
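The energy model can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the constant `ALPHA`, the function names, and all numeric values are placeholders, not values from the paper or from [11].

```python
from typing import Dict, Iterable, List, Tuple

ALPHA = 1.0  # device-related constant; placeholder value (assumption)


def direct_energy(executions: Iterable[Tuple[float, float]], alpha: float = ALPHA) -> float:
    """Sum of alpha * V^2 * t over all executed tasks.

    `executions` holds one (voltage, execution_time) pair per task.
    """
    return sum(alpha * voltage ** 2 * time for voltage, time in executions)


def indirect_energy(
    idle_slots: Dict[str, List[float]],
    lowest_voltage: Dict[str, float],
    alpha: float = ALPHA,
) -> float:
    """Energy spent idling: each processor idles at its lowest voltage."""
    return sum(
        alpha * lowest_voltage[proc] ** 2 * duration
        for proc, slots in idle_slots.items()
        for duration in slots
    )


def total_energy(executions, idle_slots, lowest_voltage, alpha: float = ALPHA) -> float:
    """Total energy is the sum of the direct and indirect parts."""
    return direct_energy(executions, alpha) + indirect_energy(idle_slots, lowest_voltage, alpha)


# Hypothetical schedule: two tasks and three idle slots on two processors.
executions = [(1.5, 4.0), (1.0, 5.0)]
idle_slots = {"p0": [2.0], "p1": [1.0, 3.0]}
lowest_voltage = {"p0": 1.0, "p1": 0.5}

e_direct = direct_energy(executions)
e_indirect = indirect_energy(idle_slots, lowest_voltage)
e_total = total_energy(executions, idle_slots, lowest_voltage)
```

Because idle slots are measured between time 0 and the makespan, a longer makespan usually increases the indirect part even when the direct part is unchanged.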
3.4. Scheduling Problem
The scheduling problem in this study is to allocate the tasks of a DAG to DVS-enabled heterogeneous processors so as to simultaneously minimize makespan and energy consumption while still meeting the precedence constraints between tasks. We assume that all DAGs start execution at time 0 and that the makespan is defined as the latest finish time of tasks after the scheduling is completed.
4. Methodology
In this section, we present the proposed heuristic, the enhanced energy-conscious scheduling heuristic (EECS), as well as a simple example for illustration.
4.1. Proposed Heuristic
As presented in Algorithm 1, our heuristic first prioritizes tasks based on bottom-level ranking (denoted by b-level), which is computed by adding the average computation and communication costs along the longest path from the task to the exit node in the DAG. Next, Algorithm 2 is applied to the prioritized tasks to generate an initial schedule. However, the scheduling decisions made in Algorithm 2 are inevitably limited by their locally greedy nature. Therefore, the generated schedule is adjusted by Algorithm 3 for further optimization.
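The bottom-level ranking can be sketched as a short recursive computation. The code below is an illustrative sketch, not the paper's Algorithm 1: the function name, the data layout, the toy costs, and the choice to sort tasks in decreasing order of rank are all assumptions.

```python
from typing import Dict, Hashable, List, Tuple


def b_levels(
    children: Dict[Hashable, List[Hashable]],
    avg_comp: Dict[Hashable, float],
    avg_comm: Dict[Tuple[Hashable, Hashable], float],
) -> Dict[Hashable, float]:
    """Bottom level of every node.

    The rank of a node is its average computation cost plus the largest
    (average communication cost + rank) over its children; a node without
    children (the exit node) has only its own computation cost.
    """
    memo: Dict[Hashable, float] = {}

    def rank(node: Hashable) -> float:
        if node not in memo:
            tail = max(
                (avg_comm[(node, child)] + rank(child) for child in children.get(node, [])),
                default=0.0,
            )
            memo[node] = avg_comp[node] + tail
        return memo[node]

    return {node: rank(node) for node in avg_comp}


# Hypothetical four-node DAG: A -> B, A -> C, B -> D, C -> D.
children = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
avg_comp = {"A": 2.0, "B": 3.0, "C": 4.0, "D": 1.0}
avg_comm = {("A", "B"): 1.0, ("A", "C"): 2.0, ("B", "D"): 2.0, ("C", "D"): 2.0}

levels = b_levels(children, avg_comp, avg_comm)
# Assumed priority order: highest bottom level first.
order = sorted(levels, key=levels.get, reverse=True)
```

Sorting by decreasing rank keeps every parent ahead of its children when all costs are positive, so the resulting list respects the precedence constraints.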



Algorithm 2 explains how the scheduling decision is made for each task in the initial schedule. We make scheduling decisions for tasks in turn. In each turn, one task is assigned a specific processor with a specific VSL, which is selected as the best among all possible combinations of processor and VSL. Note that our scheduling aims to minimize two objectives (i.e., makespan and energy consumption), which normally conflict with each other. This means that evaluating a processor-VSL combination is not straightforward. In order to compare two combinations, we devise the substitution score (SUBS). For a given task, SUBS quantifies the score gained if one processor-VSL combination is replaced by another. SUBS deliberately takes into account the tradeoff between makespan minimization and energy reduction. As defined in (8), SUBS is a sum of three factors. The first factor is the local energy factor, which is the difference in energy caused by the substitution, normalized by the energy consumption of the current task. The second factor is the local execution time factor, which is the difference in task execution time caused by the substitution, normalized by the execution time of the current task. The third factor is the makespan factor, which is the difference in task finish time caused by the substitution, normalized by the execution time of the current task. In one case of (8), the makespan factor is ignored, as the sign of the makespan factor is then always in accordance with the sign of the local execution time factor. The quantities involved in (8), for a task on a processor with a given VSL, are the direct energy consumption of the task, its execution time, and its finish time.
In Algorithm 3, for each scheduled task, we check whether there exists another processor-VSL combination that, by replacing the currently scheduled combination, can reduce the total energy consumption without increasing the makespan. This differs from the MCER technique used in ECS, which considers only the energy consumption of the current task. If such a combination exists, the replacement is applied.
Based on the above description, the complexity of our heuristic can be expressed in terms of the number of DAG nodes, the number of DAG edges, the number of processors, and the number of VSLs.
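The three factors of the substitution score can be sketched as follows. This is only an illustration under assumptions: the sign convention (a positive score means the candidate combination is better), the class and function names, and the numeric values are not taken from the paper, and the exact condition under which (8) drops the makespan factor is not reproduced here; it is exposed as a caller-supplied flag instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    """A processor-VSL combination evaluated for one task."""
    energy: float       # direct energy consumption of the task
    exec_time: float    # execution time of the task
    finish_time: float  # finish time of the task


def subs_score(current: Option, candidate: Option, include_makespan: bool = True) -> float:
    """Score gained by replacing `current` with `candidate`.

    Assumed convention: each factor is (current - candidate), so a
    positive value means the candidate improves on the current choice.
    """
    energy_factor = (current.energy - candidate.energy) / current.energy
    time_factor = (current.exec_time - candidate.exec_time) / current.exec_time
    score = energy_factor + time_factor
    if include_makespan:
        # Difference of finish times, normalized by the execution time.
        score += (current.finish_time - candidate.finish_time) / current.exec_time
    return score


# Hypothetical comparison: the candidate saves energy but runs longer.
current = Option(energy=10.0, exec_time=4.0, finish_time=10.0)
candidate = Option(energy=8.0, exec_time=5.0, finish_time=11.0)

score_full = subs_score(current, candidate)
score_local = subs_score(current, candidate, include_makespan=False)
```

In this toy comparison the energy saving does not compensate for the longer execution and later finish, so both scores are negative and the current combination would be kept.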
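The acceptance rule of this phase can be sketched in Python. The code is a simplified illustration rather than the paper's Algorithm 3: the function name, the `evaluate` callback (which is assumed to return the makespan and total energy of a complete schedule), and the toy cost tables are hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Tuple

Schedule = Dict[Hashable, Hashable]  # task -> processor-VSL combination


def global_energy_pass(
    tasks: List[Hashable],
    initial: Schedule,
    combos: List[Hashable],
    evaluate: Callable[[Schedule], Tuple[float, float]],
) -> Tuple[Schedule, float, float]:
    """Try every other combination for every task.

    A replacement is kept only if the total energy of the whole schedule
    drops and the makespan does not increase.
    """
    schedule = dict(initial)
    best_makespan, best_energy = evaluate(schedule)
    for task in tasks:
        for combo in combos:
            if combo == schedule[task]:
                continue
            trial = dict(schedule)
            trial[task] = combo
            makespan, energy = evaluate(trial)
            if makespan <= best_makespan and energy < best_energy:
                schedule, best_makespan, best_energy = trial, makespan, energy
    return schedule, best_makespan, best_energy


# Hypothetical toy model: two independent tasks, each on its own processor.
TIME = {"a": {"fast": 4.0, "slow": 6.0}, "b": {"fast": 2.0, "slow": 3.0}}
ENERGY = {"a": {"fast": 10.0, "slow": 7.0}, "b": {"fast": 6.0, "slow": 4.0}}


def toy_evaluate(schedule: Schedule) -> Tuple[float, float]:
    makespan = max(TIME[task][combo] for task, combo in schedule.items())
    energy = sum(ENERGY[task][combo] for task, combo in schedule.items())
    return makespan, energy


initial = {"a": "fast", "b": "fast"}
result, makespan, energy = global_energy_pass(["a", "b"], initial, ["fast", "slow"], toy_evaluate)
```

In the toy model, slowing task `a` would lengthen the makespan and is rejected, while slowing task `b` saves energy within the existing makespan and is accepted.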
4.2. An Example
A simple DAG with 8 nodes is used here for illustration. Figure 1 shows the DAG structure and the size of the data to be transmitted between two interdependent tasks. Three processors (as depicted in Table 1) are assumed to run the DAG, and the execution time of each task on each processor is provided in Table 3. Additionally, Table 2 provides the data transmission rates among these processors.
Table 4 provides the b-level results computed for each node of the DAG example, according to which the tasks are sorted.
Figure 2(a) depicts the schedule generated by the first phase (i.e., RS) of ECS, and Figure 2(b) is the schedule finally obtained by ECS after applying MCER. Figures 2(c) and 2(d) show the schedules generated by the PCS and GES phases of EECS, respectively. The corresponding makespan and energy consumption of each schedule are provided in Table 5.
Figure 2: Schedules for the example DAG: (a) result of RS of ECS; (b) result of MCER of ECS; (c) result of PCS of EECS; (d) result of GES of EECS.
In this specific example, the PCS phase of EECS generates a better schedule (with a shorter makespan and less energy consumption) than the one obtained by the RS phase of ECS. By comparing Figures 2(c) and 2(d), we clearly see the effectiveness of GES in reducing energy without increasing the makespan. For ECS, MCER also improves the quality of the schedule obtained by RS. However, the final schedule of ECS is still 4.41% worse in makespan and 4.60% worse in energy consumption than the result obtained by EECS. This suggests that EECS can outperform ECS on both minimizing makespan and reducing energy. We verify this in the next section.
5. Performance Evaluation
In this section, we compare our algorithm (EECS) with ECS. We consider DAGs derived from real-world workflow applications and a simulated heterogeneous system, which consists of processors with DVS parameter settings derived from real CPU models. Simulation results demonstrate the significant improvement that our algorithm achieves on both makespan optimization and energy saving.
5.1. Experimental Setting
In our evaluation, we considered randomly generated DAGs and two real-world applications, namely LIGO [38] with 77 nodes and the Laplace equation solver [39] with 49 nodes. When generating random DAGs, we followed the method presented in [40]. Figure 1 illustrates what a random DAG looks like. Note that the number of nodes of LIGO and Laplace is fixed, while the number of nodes of a random DAG is selected at random from a predefined range.
We also considered different numbers of processors: 3, 5, and 8. All processors are DVS-enabled and the VSL parameters are randomly selected from Table 1. In order to model task execution times, we adopted the method presented in [41]. In brief, in this method two values are selected from a uniform distribution over a certain interval, and their product is adopted as one task execution time. We classified the generated task execution times into low heterogeneity and high heterogeneity according to the interval from which the values were drawn.
The communication to computation ratio (CCR) is a measure that indicates whether a DAG is communication intensive, computation intensive, or moderate. CCR is defined as the ratio between the average communication cost and the average computation cost on the target system. We considered five specific CCR values: 0.1, 0.2, 1.0, 5, and 10. With a set of generated task execution times, the communication costs of the tasks were randomly generated so as to be consistent with the given CCR.
For each competing heuristic (ECS and EECS), the number of experiments conducted is 45000. Table 6 summarizes the parameters used in our experiments. Specifically, for each type of DAG, the base DAG set consists of 500 random samples. This figure is combined with 5 different CCRs, 3 different numbers of processors, 2 different types of heterogeneity, and 3 different DAG types, which gives 500 × 5 × 3 × 2 × 3 = 45000. In each experiment, every algorithm is used to generate a schedule with its makespan and energy consumption. Hence, the total number of experiments in our evaluation is 90000 (two algorithms were evaluated).
Finally, all the experiments were implemented in Java and run on a PC with an AMD A6 CPU running at 2.20 GHz with 4 GB of memory.
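The CCR definition can be illustrated with a short Python sketch. Note that rescaling existing communication costs, as done below, is a simple stand-in chosen for illustration; it is not necessarily the random generation procedure used in the experiments. The function names and values are hypothetical.

```python
from statistics import mean
from typing import List


def ccr(comm_costs: List[float], comp_costs: List[float]) -> float:
    """Average communication cost divided by average computation cost."""
    return mean(comm_costs) / mean(comp_costs)


def scale_to_ccr(comm_costs: List[float], comp_costs: List[float], target: float) -> List[float]:
    """Rescale communication costs so that the DAG reaches the target CCR."""
    factor = target / ccr(comm_costs, comp_costs)
    return [cost * factor for cost in comm_costs]


# Hypothetical costs for a small DAG.
comp_costs = [10.0, 20.0, 30.0]
comm_costs = [1.0, 2.0, 3.0]

base_ccr = ccr(comm_costs, comp_costs)
heavy_comm = scale_to_ccr(comm_costs, comp_costs, target=5.0)
heavy_ccr = ccr(heavy_comm, comp_costs)
```

A CCR well below 1 corresponds to a computation-intensive DAG, while a CCR well above 1 corresponds to a communication-intensive one.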
5.2. Comparison Metrics
In our evaluation, we consider makespan and energy consumption as equally important performance metrics. For a given schedule, its makespan is normalized to a lower bound, which is the sum of the execution and communication costs of the tasks along the critical path (denoted by $M_{LB}$), while its energy consumption is normalized to an upper bound, which is the total energy consumption of the schedule in which every task is scheduled so that the energy consumption is maximized (denoted by $E_{UB}$).
Specifically, for each experiment, the performance of each heuristic (ECS and EECS) is normalized to $MR$ (makespan ratio) and $ER$ (energy ratio), defined as follows:
$$MR = \frac{M}{M_{LB}}, \qquad ER = \frac{E_t}{E_{UB}},$$
where $M$ is the makespan of the schedule and $E_t$ the energy consumption of the schedule.
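The two normalized metrics can be computed as in the following Python sketch. The function names and the sample values are hypothetical and serve only to show how a schedule's raw makespan and energy are turned into ratios.

```python
def makespan_ratio(makespan: float, critical_path_bound: float) -> float:
    """Makespan normalized to the critical-path lower bound."""
    if critical_path_bound <= 0:
        raise ValueError("lower bound must be positive")
    return makespan / critical_path_bound


def energy_ratio(energy: float, max_energy_bound: float) -> float:
    """Energy consumption normalized to the maximum-energy upper bound."""
    if max_energy_bound <= 0:
        raise ValueError("upper bound must be positive")
    return energy / max_energy_bound


# Hypothetical values for one schedule.
mr = makespan_ratio(makespan=120.0, critical_path_bound=100.0)
er = energy_ratio(energy=45.0, max_energy_bound=90.0)
```

For both ratios, a smaller value indicates a better schedule, which makes results comparable across DAGs of different sizes.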
5.3. Experimental Results
The results for the two scheduling heuristics on the three types of DAGs are shown in Figures 3, 4, 5, 6, 7, and 8. For each DAG type, both the results affected by the number of processors and the results affected by heterogeneity are considered, which gives 6 sets of figures. In particular, the actual comparative results of Figures 5 and 6 are shown in Tables 9 and 10, respectively. The results are normalized to MR and ER, respectively, as presented in Section 5.2. Recall that for each heuristic, all results are averaged over 500 runs.
Figures 3, 5, and 7, panels: (a) low heterogeneity (MR); (b) high heterogeneity (MR); (c) low heterogeneity (ER); (d) high heterogeneity (ER).
Figures 4, 6, and 8, panels: (a) three processors (MR); (b) five processors (MR); (c) eight processors (MR); (d) three processors (ER); (e) five processors (ER); (f) eight processors (ER).
In all cases depicted in the result figures, EECS always obtained an MR and an ER lower than those achieved by ECS. This indicates that EECS outperforms ECS in all cases on both makespan optimization and energy reduction.
It is interesting to see that the makespan improvement of EECS over ECS is correlated with CCR. When Laplace is used, for both low and high heterogeneity, the difference between EECS and ECS decreases as CCR increases from 0.1 to 5. This difference then increases significantly as CCR changes from 5 to 10. Such a variation can also be observed when LIGO and random DAGs are used with high heterogeneity: the difference between EECS and ECS varies little when CCR is not more than 5.0, but increases significantly when CCR changes from 5 to 10. These observations imply that EECS may perform particularly well relative to ECS when CCR is high.
Table 7 summarizes the comparative results between ECS and EECS as the number of processors changes. As the sizes of LIGO and Laplace are fixed, each setting of the number of processors corresponds to a specific scenario. Here, using 3 processors indicates a "resource-hungry" situation, 8 processors a "resource-rich" situation, and 5 processors an intermediate one. When LIGO is used, the improvement of EECS over ECS, on both makespan and energy, increases as the number of processors grows. In the case of Laplace, the improvement is lowest when 5 processors are used and highest when 8 processors are used. It therefore seems that EECS may obtain a greater improvement over ECS in a "resource-rich" scenario.
The comparative results between ECS and EECS for different processor heterogeneities are summarized in Table 8. They suggest that the advantage of EECS over ECS may be magnified as processor heterogeneity changes from low to high.
From Tables 7 and 8, we can see that ECS may obtain a makespan up to 17.84% longer and an energy consumption up to 10.1% higher than EECS. On average, EECS outperforms ECS by 12% on makespan minimization and 8% on energy reduction.
Aside from comparing scheduling performance, we assessed the running times of ECS and EECS for DAGs of different sizes. The results are shown in Figure 9. Although EECS and ECS are both based on list scheduling, EECS needs somewhat more running time than ECS, as the computation involved in EECS is more complicated. However, as can be seen in the graph, when scheduling a DAG with 200 nodes, EECS needs only around 7 seconds on average. This suggests that EECS can still cope well with the real-time requirement of workflow scheduling for admission control in market-oriented systems.
6. Conclusion
This paper proposed EECS, a novel and efficient bi-objective DAG scheduling heuristic that enhances the energy-conscious scheduling heuristic ECS. The proposed heuristic aims to minimize makespan and energy consumption simultaneously with low complexity. The experimental results suggest that EECS can significantly outperform the existing approach (i.e., ECS) on both makespan optimization and energy reduction. It also appears that EECS has a low execution time cost and is thus able to produce a schedule as a real-time response to users in market-oriented systems.
Based on the work in this paper, further work could try to examine the performance of EECS in an uncertain environment. Further study could investigate how EECS can cope with significant overestimation or underestimation of task execution time and assess its robustness against such uncertainties.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgment
The work is supported by National Natural Science Foundation of China (NSFC, Grant no. 61202361).