Abstract

Data centers, as resource providers, take advantage of virtualization technology to achieve excellent resource utilization, scalability, and high availability. However, the large numbers of servers that host virtual machines in data centers consume a tremendous amount of energy, so resource utilization must be improved significantly. Among the many issues associated with energy, scheduling plays a very important role in successful task execution and energy consumption in virtualized environments. This paper seeks to implement an energy-efficient task scheduling algorithm for virtual machines with changeless speed. The algorithm rests on two main steps: assigning as many tasks as possible to virtual machines with lower energy consumption and keeping the makespan of each virtual machine within a deadline. We propose a novel scheduling algorithm for heterogeneous virtual machines in virtualized environments to effectively reduce energy consumption and finish all tasks before a deadline. The new scheduling strategy is simulated using the CloudSim toolkit package. Experimental results show that our approach outperforms previous scheduling methods by a significant margin in terms of energy consumption.

1. Introduction

Nowadays, large-scale data centers, such as those behind cloud computing, take advantage of virtualization technology [1-3] to achieve excellent resource utilization, scalability, and high availability. Cloud computing has achieved tremendous success in offering infrastructure, platform, and software as a service based on virtualization technology. Virtualized environments provide computing resources to clients in the form of virtual machines (VMs), which are software machines implemented on physical machines. A VM behaves like a physical machine in that it can run different operating systems and applications. Because task assignment is poorly optimized, current data centers with huge numbers of heterogeneous servers consume, and at the same time waste, massive amounts of power to execute the numerous tasks assigned to them. Among the various energy issues, scheduling plays a very important role in the successful execution of tasks in virtualized environments. Scheduling seeks maximum utilization of resources by appropriately assigning tasks to the available resources such as CPU, memory, and storage [4-6]. Efficient scheduling is therefore necessary for both service providers and requesters.

In virtualized environments such as cloud computing, end-users simply use the available services and pay for what they use without owning any part of the infrastructure. Several criteria determine the quality of the provided service; the duration of the service (makespan) and the consumed energy are among them. As shown in Section 4, energy consumption and execution time are conflicting objectives. That is to say, in general, there exists no solution that is close to the optimal value on both objectives (makespan and energy consumption) at the same time. For this kind of conflicting problem, compromise solutions are often used to obtain trade-offs. Therefore, we tackle the problem in a compromise way: we mainly optimize one objective while the second objective is maintained at a reasonable value. For the problem in this paper, this means minimizing the energy consumption under the premise that the makespan is within a threshold value. However, since finding the optimal makespan is usually NP-hard, we aim to develop an energy-efficient task scheduling algorithm that completes tasks on heterogeneous virtual machines (VMs) in virtualized environments within a certain deadline while minimizing the total energy consumption. A number of real applications can be modeled as this kind of problem, such as parallel picture transmission and the real-time iteration procedures of algorithms in real-time multiprocessing systems and environments.

Most of the current energy-efficient scheduling strategies are based on dynamic voltage scaling (DVS) [7] or dynamic voltage frequency scaling (DVFS) [8, 9] with a changeable voltage and power supply. However, current processors with variable voltage/speed offer only a few discrete voltage/speed settings [10], which means that DVS is still in the development stage. In this paper, we mainly focus on the energy issue in task scheduling, particularly under the condition that the processors or VMs have a changeless voltage supply. We prove that assigning as many tasks as possible to the VMs with low speed is the key strategy of energy-efficient task scheduling and propose a new task scheduling algorithm that takes into account the makespan and energy consumption at the same time. Our new approach investigates the problem of minimizing energy consumption with a schedule length constraint on VMs in virtualized environments.

The rest of the paper is organized as follows. Section 2 discusses the related works. Section 3 defines the task scheduling problem of minimizing energy consumption with schedule length constraint. A new strategy for minimizing energy of task assignment is proposed and the task scheduling algorithm based on the strategy is described mathematically in Section 4. Section 5 presents and discusses the simulations and performance analysis. We conclude our work in Section 6.

2. Related Works

Currently, green computing is applied more and more extensively in data centers. A number of new green scheduling algorithms for saving energy and resources have been proposed, such as [11, 12]. In [13], researchers developed energy-efficient algorithms by incorporating DVS and frequency scaling technology to minimize energy consumption. For tasks with or without precedence constraints, the scheduling algorithms proposed in [14] adopted shared slack reclamation on variable voltage/speed processors to minimize energy consumption. Researchers proved in [7] that the total energy consumption of a computer with multiple identical processors is minimal only when all tasks are executed with the same power (or at the same speed). Some studies also investigated different ways of minimizing the energy consumption of cloud computing [15].

A number of studies focused on parallel applications with precedence constraints and on algorithms to minimize the makespan [16-21]. The QoS-based workflow scheduling algorithm proposed in [16] tried to minimize the cost of workflow execution under a user-defined deadline constraint based on a novel concept called partial critical paths (PCP). The HEFT (heterogeneous earliest finish time) algorithm [17] is a heuristic method based on list scheduling that consists of two phases: task prioritization and processor selection. Many previous studies [18-20] showed that HEFT can obtain competitive results with low complexity.

A genetic algorithm (GA) [22] and 10 other heuristics are implemented and compared in [23], and the results show that GA performs best among all the tested algorithms for task scheduling problems. The hybrid particle swarm optimization algorithm (HPSO) [24] belongs to the family of modified particle swarm optimization algorithms, and researchers in [25] showed that the HPSO algorithm for the task scheduling problem performs competitively in comparison with the GA-based algorithm.

Most previous studies on the energy consumption of task scheduling are based on homogeneous computing systems [14, 26-28] or single-processor systems [29]. Researchers in [30] extend the work in [14] to AND/OR model applications, focusing on shared-memory multiprocessor systems without consideration of communication. In [28], the researchers adopted DVS (i.e., slack reclamation) to develop a system based on linear programming that exploits slack. Two scheduling algorithms for bag-of-tasks applications on clusters are proposed in [26]. Researchers in [31] proposed an energy-aware scheduling algorithm with a detailed discussion of slack time computation. The problem of energy-aware task allocation for a computational grid with DVS was studied in [32]. An energy-conscious scheduling heuristic (ECS) that takes into account both makespan and energy consumption is devised in [33].

3. Problem Definition

In virtualized environments (such as cloud computing), successful task scheduling depends on the effectiveness of the techniques used to execute the tasks. In our definition, the environment is assumed to be hosted in a data center composed of heterogeneous servers which provide resources through VMs. The servers and VMs may have different memory sizes, processing capacities, and failure rates. Similarly, the communication links may have different bandwidths. It is also assumed that computation can be overlapped with communication. The communications among processors are assumed to perform at the same speed on all links without contention. Let $DC$ be the data center comprised of servers $S_1, S_2, \ldots, S_k$. Let $V = \{v_1, v_2, \ldots, v_m\}$, where $v_1, v_2, \ldots, v_m$ are the VMs in the servers of $DC$. Each VM $v_j$ has its own computing capability or speed $s_j$, represented in millions of instructions per second (MIPS).

A parallel application consisting of $n$ tasks can generally be represented by a directed acyclic graph (DAG). The vertices of the DAG represent the partitioned tasks of the application and the edges of the DAG represent precedence constraints among the tasks (if any), as shown in Figure 1. In this paper, we focus only on independent tasks, so that no schedule contains idle time. Many real-world problems can be modeled as a DAG, such as the iterative solution of systems of equations, power system simulations, and VLSI simulation programs. A DAG, $G = (T, A)$, consists of a set $T$ of nodes and a set $A$ of edges. Let $T = \{t_1, t_2, \ldots, t_n\}$, where $t_1, t_2, \ldots, t_n$ are the tasks to be executed in the data center. Let $R = \{r_1, r_2, \ldots, r_n\}$, where $r_1, r_2, \ldots, r_n$ are the sizes of the tasks (execution requirement or number of instructions).

Because tasks and VMs differ in nature, the execution time and energy consumption of a task vary with the VM on which it runs. Similarly, when the communication between two tasks is transmitted through different communication paths, the communication time may differ. Notice that when tasks are assigned to the same VM, the communication cost is zero and can be ignored. For ease of discussion, we consider only independent tasks with no communication.

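The scheduling optimization problem of makespan and energy consumption is defined as follows: $m$ VMs in a data center are used to finish $n$ tasks by the deadline time $D$. Assume that $n_j$ is the number of tasks which are assigned to VM $v_j$, for $1 \le j \le m$; then $\sum_{j=1}^{m} n_j = n$. A changeless speed for each VM $v_j$ is denoted as $s_j$. The speed is defined in MIPS. The number of instructions of task $t_i$ is denoted as $r_i$. The execution time for task $t_i$ on VM $v_j$ is $r_i / s_j$. The total execution time for tasks on VM $v_j$ is defined as $C_j = \sum_{t_i \in T_j} r_i / s_j$, where $T_j$ is the set of tasks assigned to $v_j$. According to [7], the energy consumption for task $t_i$ on VM $v_j$ is $e_{ij} = r_i s_j^{\alpha - 1}$, where $\alpha \ge 3$, for $1 \le i \le n$ and $1 \le j \le m$. The total energy is $E = \sum_{j=1}^{m} \sum_{t_i \in T_j} e_{ij}$. The optimization problem is given below.

Minimize $E$ with constraints $C_j \le D$, $\sum_{j=1}^{m} n_j = n$, $s_j > 0$, $r_i > 0$, for $1 \le i \le n$, and $1 \le j \le m$.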
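
The problem statement above can be made concrete with a short sketch. It is an illustration under stated assumptions rather than the authors' code: the power model (power = speed to the power `ALPHA`, with `ALPHA = 3`) is assumed from the general form used in the DVS literature, and all names (`execution_time`, `task_energy`, `evaluate`, `is_feasible`) are hypothetical.

```python
from typing import Sequence, Tuple

ALPHA = 3.0  # ASSUMPTION: power = speed ** ALPHA; the exponent is not fixed by the text


def execution_time(length: float, speed: float) -> float:
    """Time to run a task of `length` instructions on a VM of constant `speed`."""
    return length / speed


def task_energy(length: float, speed: float, alpha: float = ALPHA) -> float:
    """Energy = power * time = speed**alpha * (length / speed)."""
    return length * speed ** (alpha - 1)


def evaluate(assignment: Sequence[int], lengths: Sequence[float],
             speeds: Sequence[float], alpha: float = ALPHA) -> Tuple[float, float]:
    """Return (makespan, total_energy), where assignment[i] is the VM index of task i."""
    finish = [0.0] * len(speeds)  # total execution time accumulated on each VM
    energy = 0.0
    for i, j in enumerate(assignment):
        finish[j] += execution_time(lengths[i], speeds[j])
        energy += task_energy(lengths[i], speeds[j], alpha)
    return max(finish), energy


def is_feasible(assignment: Sequence[int], lengths: Sequence[float],
                speeds: Sequence[float], deadline: float) -> bool:
    """A schedule is feasible when every VM finishes by the deadline."""
    makespan, _ = evaluate(assignment, lengths, speeds)
    return makespan <= deadline
```

Under this assumed model, moving a task to a faster VM shortens its execution time but raises its energy, which is the trade-off the optimization problem captures.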

Taking energy consumption into account in task scheduling adds another layer of complexity to an already complicated problem. The applications in our study are real-time applications, which means that they are deadline-constrained. To evaluate the quality of schedules, both makespan and energy consumption should be measured explicitly. Therefore, we consider both makespan and energy consumption as the performance criteria and try to minimize the energy consumption under the makespan constraint. In Section 4, we propose a new energy-efficient task scheduling algorithm that can find an optimal or near-optimal schedule to complete all tasks on VMs with minimum or near-minimum energy by the deadline $D$.

4. Energy-Efficient Scheduling with Makespan Constraint

4.1. Energy-Efficient Analysis

As discussed before, the makespan objective is a given hard constraint and we aim at determining the least possible energy consumption. We mainly focus on the biobjective problem of minimizing makespan and minimizing energy consumption. Unfortunately, these objectives are conflicting. Inspired by [34], we propose Proposition 1, which states that the minimum energy consumption is obtained only when all the tasks are mapped to the VM which has the minimum speed. However, mapping all the tasks to the VM with the minimum speed would lead to a schedule that is arbitrarily far from the optimal makespan.

Proposition 1. Let $S$ be a schedule assigning all tasks to VM $v_0$ (whose speed $s_0$ is minimal) in topological order. Let $E$ be the energy consumption of the successful execution of schedule $S$. Then, any schedule $S' \ne S$, with energy consumption $E'$, is such that $E \le E'$.

Proof. Suppose that . Let be the completion time of all the tasks mapped to VM 0; then, . Let be the completion time of the last task on VM with schedule . Therefore, . Let be the task set and the task sets that are not executed on VM 0 by schedule . Then,  (there are still some tasks of to be executed on VM 0). Let , where is task set executed on VM by schedule . Then, . Let us compute the difference :

Proposition 1 shows that assigning all tasks to the VM with minimal speed minimizes energy consumption. On the basis of Proposition 1, we present below an approximation algorithm based on list scheduling, which has a lower complexity and is easy to implement.
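
The inequality behind Proposition 1 can be sketched under an assumed energy model. This is not the authors' proof: it assumes that a task of size $r_i$ on a VM of speed $s_j$ consumes $r_i s_j^{\alpha-1}$ with $\alpha > 1$, that $s_0 = \min_j s_j$, and that $T_j$ is the set of tasks a competing schedule places on VM $v_j$ (all of this notation is introduced here for illustration). Because the sets $T_j$ partition the $n$ tasks, the energy difference between the competing schedule ($E'$) and the all-on-slowest schedule ($E$) is

```latex
\begin{align*}
E' - E
  &= \sum_{j} \sum_{t_i \in T_j} r_i\, s_j^{\alpha-1}
     \;-\; \sum_{i=1}^{n} r_i\, s_0^{\alpha-1} \\
  &= \sum_{j} \sum_{t_i \in T_j} r_i \left( s_j^{\alpha-1} - s_0^{\alpha-1} \right)
  \;\ge\; 0 ,
\end{align*}
```

since $s_j \ge s_0 > 0$ and $\alpha - 1 > 0$ make every term nonnegative. Equality holds only when every task runs on a VM whose speed equals $s_0$.
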

Let $D$ be the given makespan (deadline of tasks). For each task $t_i$, let $V_i = \{ v_j \mid C_j + r_i / s_j \le D \}$, where $C_j$ is the completion time of the last task previously assigned to VM $v_j$ and $r_i / s_j$ is the execution time of $t_i$ on $v_j$. It is obvious that if task $t_i$ is executed on a VM $v_j \notin V_i$, then the makespan will be greater than $D$. It can be seen from the definition of $V_i$ that if task $t_k$ has fewer operations than task $t_i$, then all the machines able to schedule $t_i$ with makespan less than $D$ are also able to schedule $t_k$ with makespan less than $D$. Notice that if $D$ is very large, $V_i$ would contain all VMs and hence all the tasks will be scheduled on the VM with the minimal speed, leading to the most energy-efficient schedule. The proposed approach is illustrated in the proposed algorithm.
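
The set of VMs that can still accept a task can be illustrated with a few lines of code. This is a sketch under an assumed reading of the definition (a VM qualifies when its current completion time plus the task's execution time stays within the deadline); the function name `eligible_vms` and its parameters are hypothetical.

```python
from typing import List, Sequence


def eligible_vms(length: float, finish: Sequence[float],
                 speeds: Sequence[float], deadline: float) -> List[int]:
    """Indices of VMs that can run a task of `length` without exceeding `deadline`.

    ASSUMPTION: finish[j] is the current completion time of VM j and
    length / speeds[j] is the task's execution time on VM j.
    """
    return [j for j, (done, speed) in enumerate(zip(finish, speeds))
            if done + length / speed <= deadline]
```

With this reading, a shorter task is eligible on every VM on which a longer task is eligible, which matches the monotonicity remark in the text.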

4.2. Algorithm Analysis

The time complexity of the proposed algorithm is in $O(n \log n + m \log m)$, where $n$ is the number of tasks and $m$ is the number of VMs. The proposed algorithm can be implemented around a heap. The cost of sorting tasks is in $O(n \log n)$ and sorting VMs costs $O(m \log m)$. The cost of heap operations is in $O(m \log m)$ and scheduling operations cost $O(n)$. The proposed algorithm either returns a schedule whose makespan is lower than the deadline $D$ or reports that no such schedule exists.

Researchers in [34] proposed a task scheduling algorithm named CMLT, which is also a kind of list scheduling. Unlike our algorithm (Algorithm 1), CMLT tackles the reliability of task scheduling problems, and it guarantees that the makespan is lower than $2D$ (not $D$) or returns no solution.

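{
 Sort all the tasks in the decreasing order of $r_i$ in a waiting list
 Let $C_j = 0$ for every VM $v_j$
 while the waiting list is not null
  Remove the first task $t_i$ from the waiting list
  Compute $V_i$, the set of VMs able to finish task $t_i$ within $D$
  if $V_i$ is not null then
    Choose the VM $v_j$ that has the minimum $s_j$ from $V_i$
    Assign task $t_i$ to VM $v_j$
    Update the completion date $C_j$ of VM $v_j$
    if the completion date $C_j$ of VM $v_j$ is bigger than the given threshold $D$ then
     Mark VM $v_j$ as non reusable
    end if
  else
    return no solution
  end if
 end while
 return the generated schedule
}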
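
The list-scheduling loop above can be sketched in runnable form. The following is a minimal illustration, not the authors' implementation: it assumes that a VM is eligible for a task when its current completion time plus the task's execution time does not exceed the deadline, and it scans all VMs per task (an O(nm) simplification) instead of maintaining the heap mentioned in the complexity analysis. Under this eligibility test a VM's completion time never exceeds the deadline, so the sketch omits the "non reusable" marking. The name `schedule_tasks` and its parameters are hypothetical.

```python
from typing import List, Optional, Sequence


def schedule_tasks(lengths: Sequence[float], speeds: Sequence[float],
                   deadline: float) -> Optional[List[int]]:
    """Return assignment[i] = VM index for task i, or None when no VM can take a task."""
    finish = [0.0] * len(speeds)        # completion time of each VM so far
    assignment = [-1] * len(lengths)
    # Waiting list: tasks in decreasing order of length.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    for i in order:
        # VMs that can still finish task i by the deadline (assumed eligibility test).
        eligible = [j for j in range(len(speeds))
                    if finish[j] + lengths[i] / speeds[j] <= deadline]
        if not eligible:
            return None                 # corresponds to "return no solution"
        j = min(eligible, key=lambda k: speeds[k])  # slowest eligible VM
        assignment[i] = j
        finish[j] += lengths[i] / speeds[j]
    return assignment
```

With a very loose deadline every task lands on the slowest VM; tightening the deadline pushes the later tasks onto faster VMs, and a deadline that is too tight yields `None`.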

5. Experiments and Result Analysis

5.1. Experimental Scenarios

The comparison experiments used to verify our algorithm were conducted on a PC with a 2.6 GHz Pentium Dual Core processor running Windows XP. The experiments used the CloudSim 3 simulator [35] to simulate virtualized environments. The CloudSim 3 toolkit supports the modeling of virtualized environments, including cloud system components such as data centers, hosts, VMs, and scheduling policies. Many previous studies have conducted evaluation experiments on the CloudSim platform.

In our experiments, a set of VMs with different speeds (MIPS) is created using the VM components of CloudSim, and the RAM size of all the VMs is set to 512 MB. The number of VMs in the set is fixed at 12. The speed of each VM (MIPS) is chosen uniformly in $[10^2, 10^3]$. The VM set is used for running the task sets to obtain the makespan and energy consumption results.

To evaluate the performance of the proposed task scheduling algorithm in virtualized environments, randomly generated problem instances are used. We randomly generated 10 sets of tasks using the Cloudlet component, where the length of each task (the computation requirement) is chosen uniformly in $[10^5, 10^7]$ and the number of tasks per set ranges from 100 to 1000 in increments of 100. These numbers may not be very realistic, but they provide comprehensive results that are easy to read. The task sets are scheduled on the VM set by the proposed algorithm and by the comparison algorithms.
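
The instance generation described above can be mimicked outside CloudSim with a few lines. This sketch does not use the CloudSim API; it assumes the ranges read as 10^2 to 10^3 MIPS for VM speeds and 10^5 to 10^7 for task lengths, and the function name `generate_instance` and the seeding scheme are hypothetical.

```python
import random
from typing import Dict, List, Optional, Tuple


def generate_instance(num_tasks: int, num_vms: int = 12,
                      seed: Optional[int] = None) -> Tuple[List[float], List[float]]:
    """Return (task_lengths, vm_speeds) drawn uniformly from the assumed ranges."""
    rng = random.Random(seed)
    speeds = [rng.uniform(1e2, 1e3) for _ in range(num_vms)]    # assumed MIPS range
    lengths = [rng.uniform(1e5, 1e7) for _ in range(num_tasks)]  # assumed length range
    return lengths, speeds


# Ten task sets with 100, 200, ..., 1000 tasks; seeding by size is an arbitrary choice.
instances: Dict[int, Tuple[List[float], List[float]]] = {
    n: generate_instance(n, seed=n) for n in range(100, 1001, 100)
}
```

Fixing the seed keeps an instance reproducible across the compared algorithms, which matters when stochastic methods such as GA are run repeatedly on the same input.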

The experiment consists of two steps: first, the proposed algorithm and the comparison algorithms schedule the task sets onto the VM set to obtain the makespan and energy consumption; second, the makespan and energy consumption of the different algorithms are processed and evaluated. The performance of the proposed algorithm is compared with that of the HEFT, GA, and HPSO algorithms, which are three of the best existing scheduling algorithms. As mentioned earlier, the HEFT heuristic proposed in [17], a list scheduling heuristic, can achieve excellent schedules in terms of makespan for independent-constraint tasks. Eleven heuristics are implemented and compared in [23], and the results show that the genetic algorithm (GA) performs best among all the tested algorithms for task scheduling problems. Researchers in [25] showed that the HPSO algorithm for the task assignment problem performs competitively in comparison with the GA-based algorithm. To evaluate the performance of the proposed algorithm, we implement HEFT, GA, and HPSO as well as the proposed algorithm and compare their performance. Some changes have to be made to the original HEFT, GA, and HPSO for convenient implementation and comparison. Besides the primary properties of HEFT, GA, and HPSO, we add the deadline constraint: given that $T_j$ is the set of tasks assigned to VM $v_j$, then $\sum_{t_i \in T_j} r_i / s_j \le D$, where $r_i$ is the length of task $t_i$, $s_j$ is the speed of $v_j$, and $D$ is the deadline.

The energy consumption of a schedule has been defined earlier: $E = \sum_{j=1}^{m} \sum_{t_i \in T_j} r_i s_j^{\alpha - 1}$, where $T_j$ is the set of tasks assigned to VM $v_j$ for $1 \le j \le m$, $r_i$ is the length of task $t_i$, and $s_j$ is the speed of $v_j$. The makespan is computed as follows: $\max_{1 \le j \le m} \sum_{t_i \in T_j} r_i / s_j$. Moreover, stochastic approaches such as GA may yield a different result on each independent run; we thus run each algorithm 8 times for every problem instance and report the average results. The average percentage improvement (API) is chosen to conduct the performance analysis of the algorithms. The API in energy consumption of the proposed algorithm over HEFT, GA, and HPSO, respectively, is computed as (take HEFT for instance) $\mathrm{API} = (E_{\mathrm{HEFT}} - E_{\mathrm{proposed}}) / E_{\mathrm{HEFT}} \times 100\%$.
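
The improvement metric can be illustrated with a short sketch. It assumes that API is the relative energy reduction of the proposed algorithm with respect to a baseline, expressed as a percentage, and that energies are averaged over the repeated runs before the ratio is taken; both points are assumptions, and the function names are hypothetical.

```python
from statistics import mean
from typing import Sequence


def api(baseline_energy: float, proposed_energy: float) -> float:
    """Percentage by which the proposed energy improves on the baseline energy."""
    return (baseline_energy - proposed_energy) / baseline_energy * 100.0


def average_api(baseline_runs: Sequence[float],
                proposed_runs: Sequence[float]) -> float:
    """API computed on energies averaged over repeated runs (assumed order of operations)."""
    return api(mean(baseline_runs), mean(proposed_runs))


# Made-up energies for illustration only; these are not results from the paper.
example = average_api([200.0, 220.0], [100.0, 110.0])
```

A positive value means the proposed schedule consumes less energy than the baseline; a negative value would mean it consumes more.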

5.2. Results Analysis

Experiments conducted with different numbers of tasks lead to similar results, as shown in Figures 2, 3, and 4. These figures plot the makespan, the energy consumption, and the execution time, respectively, for 100 to 1000 tasks. Figure 2 shows that the proposed algorithm is more efficient than HEFT, GA, and HPSO in terms of energy consumption. In these experiments, we consider schedule length and energy consumption as the main metrics. The schedule length of the proposed algorithm is larger than that of the other algorithms for every number of tasks. This is because the proposed algorithm selects VMs by giving priority to those with low speed under the threshold makespan constraint, which leads to a greater makespan than HEFT, GA, and HPSO obtain. Notice that HPSO is slightly better than GA in terms of makespan.

As Figure 3 shows, the proposed algorithm performs best according to the energy consumption indicator. The energy consumption of the proposed algorithm is much lower than that of HEFT, GA, and HPSO for every number of tasks. More detailed results and the API are presented in Table 1. As we can see from Table 1, our algorithm is better than the metaheuristics (GA and HPSO) and HEFT in terms of energy consumption (e.g., its API over HPSO ranges from 33.23% to 55.36%). It can also be seen from Figure 3 that, for the energy consumption indicator, GA is slightly better than HPSO.

Figure 4 shows the average execution times of HEFT, GA, HPSO, and the proposed algorithm. As the number of tasks increases, the running times of the algorithms become longer. As Figure 4 shows, the running times of GA and HPSO are higher than that of our algorithm, while the running time of HEFT is close to ours. Moreover, the list scheduling heuristics (the proposed algorithm and HEFT) perform better than the metaheuristics (GA and HPSO) in terms of running time. The reason is that heuristics based on list scheduling tend to reach competitive solutions with low time complexity.

6. Conclusions

This paper has focused on energy-efficient task scheduling on VMs with changeless speed in virtualized environments, keeping the makespan of each VM within a deadline. We define the problem of minimizing energy consumption under a schedule length constraint on VMs in virtualized environments. It has been proved in this paper that the speed of the VMs plays a key role in optimal solutions for energy consumption. Based on this analysis, an energy-efficient task scheduling algorithm has been proposed by combining list scheduling with the key property of VM speed. Task scheduling on VMs has been implemented in simulation, and the results demonstrate that our algorithm consumes less energy than three other excellent algorithms (HEFT, GA, and HPSO).

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This work is supported by Shandong Province Natural Science Foundation.