Abstract

Reducing the energy consumption of data centers is an important way for cloud providers to improve their return on investment, but they must also ensure that the delivered services meet the various requirements of consumers. In this paper, we propose a resource management strategy to reduce both energy consumption and Service Level Agreement (SLA) violations in cloud data centers. It contains three improved methods for subproblems in dynamic virtual machine (VM) consolidation. To make host detection more effective and improve the VM selection results, first, the overloaded hosts detecting method sets a dynamic independent saturation threshold for each host, which takes the CPU utilization trend into consideration; second, the underutilized hosts detecting method uses multiple factors besides CPU utilization and applies the Naive Bayesian classifier to calculate the combined weights of hosts in the prioritization step; and third, the VM selection method considers both the current CPU usage and the future growth space of the CPU demand of VMs. To evaluate the performance of the proposed strategy, we simulate it in CloudSim and compare it with five existing energy-saving strategies using real-world workload traces. The experimental results show that our strategy outperforms the others, with minimum energy consumption and SLA violation.

1. Introduction

Cloud computing [1] has revolutionized the ownership model of IT infrastructure by offering on-demand provisioning of elastic resources [2]. Due to its flexibility, low latency, and parallel processing capability, it has become a suitable and popular platform in many areas. Industry giants such as Google, IBM, Microsoft, and Amazon have invested massive manpower and financial resources in promoting the commercialization of cloud computing and related services [3], and a number of large-scale data centers have been built all around the world. Since the average energy consumption of a data center is almost as high as that of 25,000 households, the rapid growth in the number of data centers is necessarily accompanied by a fast increase in energy demand. Such high energy consumption directly increases the carbon dioxide (CO2) emissions and operational costs of data centers [4]. In view of global warming and shrinking returns on investment, the issue of the high energy consumption of data centers has aroused great concern from both governments and cloud providers. Consequently, improving energy efficiency and eliminating unnecessary energy costs have become hot spots in the industry and a main difficulty and challenge for next-generation data centers.

Many infrastructure-based solutions have been proposed to deal with this problem [5], but their implementations are expensive, and the reduction in energy consumption is limited [6]. In addition, apart from the huge quantity and low power efficiency of infrastructure, inefficient usage of computing resources is another reason for the high energy consumption of cloud data centers. A study that collected data from more than five thousand hosts over six months found that the hosts in data centers are rarely idle or fully utilized; most of the time, they operate at only 10% to 50% of their full capacity [7]. Moreover, it is important to realize that an idle host still uses about 70% of its peak power consumption. All the above data indicate that inefficient resource usage leads to a huge amount of energy wastage. Therefore, although many remarkable improvements to infrastructure have been made, designing effective resource management strategies to improve resource utilization is still necessary and meaningful for further decreasing the energy consumption of a data center.

To address this problem, the capabilities of virtualization technology [8] should be well utilized. First, it allows multiple virtual machines (VMs) to be created on a single host and mapped to different consumers, which increases the throughput and scalability of a data center. Second, it provides a function named live migration [9], by which a VM can be transferred between hosts with close-to-zero downtime. With the support of live migration, dynamic VM consolidation has recently emerged as the most popular strategy in this area. In dynamic VM consolidation, VMs are reallocated periodically: some VMs are migrated from overloaded hosts to avoid performance degradation, and all VMs on underutilized hosts are moved out so that these hosts can be shut down to minimize the number of active hosts. However, it should be stressed that excessive resource utilization may affect the performance of cloud services; for instance, the resource requirements of some VMs may increase abruptly, and during the live migration process, resources are occupied on both the source and target hosts. Maintaining reliable Quality of Service (QoS) is essential for cloud providers, as consumers pay for the services they get. The SLA is the concrete form of QoS, which describes the various details of the service level provided to consumers [10]. Improper migrations and unconstrained VM consolidation can cause performance degradation of VMs and then lead to SLA violations, for which a penalty must be paid to the customer, increasing the total costs of cloud providers. Therefore, a dynamic VM consolidation strategy should find the trade-off between energy consumption and SLA violation.

In this paper, we propose an energy and SLA-aware resource management strategy based on dynamic VM consolidation. It intends to improve the resource utilization and the status of VM allocation in a cloud data center so that energy consumption can be reduced while meeting the QoS delivered by cloud providers. Generally, four subproblems need to be seriously considered in dynamic VM consolidation: (1) overloaded hosts detection; (2) VM selection from overloaded hosts; (3) underutilized hosts detection; and (4) VM placement [11]. The proposed strategy contains methods to deal with these subproblems. Finally, we run it on the CloudSim toolkit with real-world workload traces and demonstrate its superiority by comparing it with several existing strategies. The new and effective parameters proposed in the strategy make its detection of overloaded and underutilized hosts and its selection of VMs from overloaded hosts more reasonable than those of the existing strategies. Specifically, the differences from previous works, along with our main contributions, are listed as follows:

(1) For overloaded hosts detection, previous methods either set a common upper threshold for all hosts or take the host as the basic investigation unit to obtain its upper threshold, which makes them naïve and unreasonable. In our method, we introduce a dynamic independent saturation threshold for each host. When calculating the saturation threshold of a host, each VM on it is considered as the basic investigation unit; that is, parameters such as the type and CPU usage of each VM are considered, as well as the number of VMs on the host. Accordingly, a new host state, the saturated state, is added. Meanwhile, this method takes the CPU utilization trend of the host into account by introducing the saturation degree.

(2) Instead of just considering CPU utilization, as most of the previous underutilized hosts detecting methods do, our method introduces a new indicator for candidate hosts in the priority calculation process. This indicator considers both the CPU usage of each VM and the number of VMs to improve the performance of the detection. In addition, the Naive Bayesian classifier is applied to predict the variation trend of the indicator.

(3) In order to accommodate the changes above, we also present a new VM selection method. For the purpose of reducing energy consumption and SLA violation, the basic idea of our method is to reduce the number and cost of migrations. It therefore takes both the current CPU usage and the future growth space of the CPU demand of VMs into consideration, which makes it more comprehensive than previous works.

The rest of the paper is organized as follows. The previous works related to energy-aware resource management are presented in Section 2. Section 3 is the main part of this paper, which introduces our strategy and the correlative methods in detail. Experimentation setup is depicted in Section 4. Experimental results are given and analyzed in Section 5. Finally, Section 6 provides the conclusion of our research.

2. Related Work

Apart from infrastructure-based optimizations, many works have been done to provide high-quality services with minimal energy consumption in cloud data centers. In general, depending on whether they are implemented at the hardware or software level, the mainstream energy-aware resource management strategies can be divided into two types.

2.1. Hardware Strategies

Hardware strategies employ parallel architectures, multicore architectures, voltage and frequency scaling, and dynamic component consolidation and deactivation to reduce the energy consumption of hardware in cloud data centers. Dynamic Voltage and Frequency Scaling (DVFS) is the most popular technique among them [12]. By employing it, the CPU can adjust its performance dynamically; specifically, in order to save energy, the voltage and frequency of the CPU are reduced when it is not fully utilized. DVFS has alleviated the energy consumption issue to some extent, but it has limitations. The methods based on DVFS are static and offline, which means that the workload traces must be known in advance, or the future CPU utilization must be predicted by leveraging knowledge of past periods. So, they may not be suitable when the workload trace is unknown and irregular.

2.2. Software Strategies

Most software strategies introduce VM dynamic consolidation methods to optimize resource utilization and reduce energy consumption across the cloud data center. Zhu et al. [13] studied the dynamic VM consolidation problem of automated resource allocation and capacity planning. They set a static CPU utilization upper threshold of 85% and introduced a heuristic method for detecting overloaded hosts. The value of 85% was first proposed by Gmach et al. [14], based on their study of real workloads. Beloglazov and Buyya [15] divide VM allocation into two parts: allocation of newly requested VMs and optimization of the current placements of existing VMs. The first part is considered a bin-packing problem, which they solve by applying the Modified Best Fit Decreasing (MBFD) method. For the second part, they propose four heuristic methods for choosing VMs to migrate: Single Threshold (ST), Minimization of Migrations (MM), Highest Potential Growth (HPG), and Random Choice (RC). Meanwhile, the authors present a decentralized architecture of an energy-aware resource management system and three stages of VM placement optimization in [16]: VM reallocation considering current resource utilization, virtual network topology optimization, and VM reallocation considering the thermal state of hosts. They prove that their heuristics perform better than DVFS.

In order to adapt to variable and unknown workloads, several strategies focus on statistical analysis of historical data. Beloglazov and Buyya [17] give a competitive analysis and prove competitive ratios for the single VM migration and dynamic VM consolidation problems. Furthermore, they propose an adaptive double CPU utilization threshold method. In [11], they summarize and extend their previous work: the problem of dynamic VM consolidation is split into four parts, and they put forward several heuristic methods for each part. To find overloaded hosts, there are four statistical methods: Median Absolute Deviation (MAD), Interquartile Range (IQR), Local Regression (LR), and Local Regression Robust (LRR). Minimum Migration Time (MMT), Maximum Correlation (MC), and Random Choice (RC) are proposed to deal with the subproblem of VM selection. They also propose a simple method for underutilized hosts detecting and use Power-Aware Best Fit Decreasing (PABFD) for VM placement. Arianyan et al. [18] introduce a holistic method for the resource management procedure in cloud data centers, called Enhanced Optimization (EO). Besides, the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) power and SLA-aware allocation (TPSA) method is proposed as the resource allocation method. Moreover, for underutilized hosts detecting, methods including Available Capacity (AC), Migration Delay (MDL), and TOPSIS-Available Capacity-Number of VMs-Migration Delay (TACND) are proposed. Yadav et al. [19] introduce Maximum Utilization Minimum Size (MuMs), based on statistical analysis of the hosts' CPU utilization history, as the VM selection method. Then, Yadav and Zhang [20] propose an adaptive heuristic, M-estimation Regression (MeReg), to estimate the upper CPU utilization threshold from recent CPU utilization history. Yadav et al. [21] also propose a novel overloaded host detection method called Least Medial Square Regression (LmsReg), which is more robust than other regression techniques, and a VM selection method called Minimum Utilization Prediction (MuP), which considers the types of applications running and the CPU utilization at different time periods. Yadav et al. [22] propose a multiresource double-threshold method, including two regression-based methods named Gradient Descent-based Regression (Gdr) and Maximize Correlation Percentage (MCP) to set a dynamic CPU utilization upper threshold, together with a dynamic Bandwidth-aware (Bw) VM selection method. Based on the first-order Markov chain model, a load detection method named Median Absolute Deviation Markov Chain Host Detection (MadMCHD) is proposed by Melhem et al. [23] to find future overloaded and underutilized hosts. They also add the Markov prediction model to PABFD and propose a Markov Power-Aware Best Fit Decreasing (MPABFD) method for VM placement. Ranjbari and Torkestani [24] use Learning Automata Overload Detection (LAOD) to predict the CPU utilization of a host from its historical usage data and determine dynamically whether it is overloaded.

With the popularity of artificial intelligence techniques, some strategies take advantage of intelligent methods, such as neural networks, machine learning, and fuzzy logic, to find the most optimal VM allocation. For example, Abd et al. [25] propose a DNA-based fuzzy genetic algorithm (DFGA) that deals with the real-time tasks of dynamic users to reduce power consumption in cloud data centers. An energy-aware VM scheduling approach named PreAntPolicy is introduced by Duan et al. [26], which consists of a prediction model based on fractal mathematics and a scheduler using an improved ant colony algorithm. Li et al. [27] first develop a multiresource double-threshold method and then introduce a Modified Particle Swarm Optimization (MPSO) method for VM reallocation. Ghobaei-Arani et al. [28] propose a VM placement optimization method combining learning automata theory, correlation coefficients, and an ensemble prediction model. However, these methods require long learning periods to give good solutions. Zhou et al. [29] introduce an adaptive three-threshold framework and use a method named K-Means clustering algorithm with Midrange-Interquartile range (KMI) to obtain the three thresholds; the hosts are then divided into four classes: less loaded hosts, little loaded hosts, normally loaded hosts, and overloaded hosts. Based on this framework, they also put forward two VM selection methods, named Maximum Ratio of CPU utilization to memory utilization (MRCU) and Minimum Product of CPU Utilization (MPCU), for CPU-intensive and I/O-intensive workloads, respectively, and a VM placement method named VM Placement with Maximizing energy Efficiency (VPME).

3. Energy and SLA-Efficient Resource Management Strategy

In this section, we give a detailed introduction to the proposed energy and SLA-aware resource management strategy. For the subproblems in dynamic VM consolidation, it contains three improved methods that perform overloaded hosts detecting, underutilized hosts detecting, and VM selection, while using the existing PABFD method for VM placement. In order to explain the working process of the whole resource management strategy and the relations between the four methods, we give a flow chart in Figure 1. The acronyms in the figure are the names of the methods detailed in the rest of this section.

When a host is judged as overloaded, the VMs selected from it are put into a migration list. However, the VMs on the migration list will not be reallocated until all hosts have been detected. In contrast, each time an underutilized host is identified, all its VMs must be reallocated immediately before the underutilized hosts detecting process can proceed to the next candidate host.

For ease of reference and understanding, Table 1 summarizes the acronyms of the terms defined in this section.

3.1. Overloaded Hosts Detecting Method

Theoretically, a host will only be identified as overloaded if the total CPU demands of all VMs on it exceed its total CPU capacity at some point. But at that time, performance degradation and SLA violations are already inevitable. To prevent this from happening, practical overloaded hosts detecting methods set an upper threshold. When the CPU utilization of a host exceeds this threshold, it is identified as overloaded, and some of its VMs must be migrated to other hosts to bring its CPU utilization back to normal. Therefore, overloaded hosts detecting can avoid SLA violations caused by a sudden increase in the CPU demands of some VMs.

Clearly, in the overloaded hosts detecting process, determining the appropriate value of the upper threshold is the key problem. First, it should be noted that using the same upper threshold for all hosts is not reasonable even for hosts with the same configuration, because the number, type, and CPU usage of the VMs on them are different. Two extreme examples are given in Figure 2 to illustrate the irrationality of using a common upper threshold for all hosts.

For illustration purposes, the actual unit of CPU capacity is not used in the examples, and shaded areas represent idle CPU capacity. The two hosts, denoted H_A and H_B here, have the same total CPU capacity of 100 and the same upper threshold of 80%. VM_A is the only VM on H_A, and the maximum amount of CPU that can be required by VM_A is 100. At the present moment, the CPU usage of VM_A is 90, so the CPU utilization of H_A is 90%, which is larger than the upper threshold; H_A is judged as overloaded, and VM_A must be migrated to another host. However, as long as no new VMs are created on H_A, even if the CPU usage of VM_A increases to its maximum amount, the CPU demand on H_A will not exceed the total CPU capacity. Furthermore, migrating VM_A is unwise because its CPU usage is huge: first, the time and cost of this migration are extremely large; second, it easily leads to the overload of the destination host at the next time point. For this situation, we think that keeping VM_A on H_A and simply not allowing H_A to accept new VMs is better than treating H_A as overloaded. There are 60 VMs on H_B, and the maximum amount of CPU that can be required by each VM is 100. At the present moment, the CPU usage of each VM is 1, so the CPU utilization of H_B is 60%, which is smaller than the upper threshold, and H_B is judged as not overloaded. However, if the CPU usage of each VM increases only slightly, for example by 1, at the next time point, the CPU demand on H_B will increase to 120, which exceeds its total CPU capacity. The upper threshold is therefore ineffective for H_B at the present moment. Consequently, hosts should be judged separately by giving each of them an independent threshold. In addition, when the CPU utilization of a host exceeds some certain threshold, it may not need VM migrations; just limiting the creation of new VMs can prevent it from overloading.

Similarly, the total CPU utilization of a host is not a complete reflection of its state, so taking the host as the basic unit of investigation to get the upper threshold is also irrational. The actual situation of every VM on the host should be reflected directly in the calculation of the threshold. Based on the considerations above, we introduce a new threshold named Saturation Threshold (ST). Each host has its own private ST, and the ST changes dynamically with the actual situation of every VM on the host. Before giving the calculation formula of ST, several concepts need to be clarified.

3.1.1. VM Maximum Amount of Resource (VMR)

The VMR is determined by the type a VM belongs to, and it is equal to the maximum amount of CPU available for that type. Since all the VM types are known and fixed, the VMR of a VM is also a known fixed value.

3.1.2. VM Resource Allocation (VRA) and VM Resource Allocation Rate (VRAR)

In the actual situation, in order to reduce operating costs, instead of allocating the VMR to a VM, the cloud provider only allocates the amount of CPU that the VM needs at a given moment. Therefore, the VRA equals the actual CPU usage of a VM. Then, the VRAR is calculated as

\mathrm{VRAR} = \frac{\mathrm{VRA}}{\mathrm{VMR}}. \qquad (1)

3.1.3. VM Resource Reservation (VRR) and Host Resource Reservation (HRR)

Each VM is treated as an independent object when calculating the VRR. Depending on the CPU request and usage of a VM, a part of the CPU capacity is reserved on the host for its future usage. Since the VMR of a VM is fixed, with the increase of its VRAR, the growth space of its future CPU demand decreases, and accordingly, its VRR should be reduced. Based on this fact, the calculation formula of VRR is given as

\mathrm{VRR} = (1 - \mathrm{VRAR}) \times (\mathrm{VMR} - \mathrm{VRA}). \qquad (2)

The sum of the VRRs of all VMs on a host is called the HRR; hence, as the number of VMs increases, the HRR also increases.

Finally, the calculation formula of ST is given as

\mathrm{ST} = \frac{\mathrm{HMR} - \mathrm{HRR}}{\mathrm{HMR}}. \qquad (3)

In equation (3), the acronym HMR stands for Host Maximum amount of Resource; it is the total CPU capacity that the host can provide. The definition of HRR is closely related to the number of VMs and the CPU usage of each VM on a host, so using HRR in the calculation of ST means that these two parameters are also taken into account in our overloaded hosts detecting method.

If the CPU utilization of a host exceeds its ST, the host is marked as saturated. The VRR of a VM is not really allocated to it, and the HRR of a host is a part of the CPU capacity that can be shared by all VMs on it. Therefore, it should be noted that immediate VM migrations are not required on a saturated host; it simply no longer accepts new VM allocations.

When judging whether a host is overloaded, the changing trend of its CPU utilization is also a parameter that cannot be ignored. In order to take this into consideration, a concept named Saturation Degree (SD) is introduced. It is the extent to which the host's CPU utilization exceeds its ST and can be calculated as

\mathrm{SD} = u - \mathrm{ST}, \qquad (4)

where u denotes the current CPU utilization of the host.

If a host stays saturated and its SD increases continuously at n consecutive monitoring points (the points at which SD remains the same are excluded), the state of the host will be changed from saturation to overload. Here, n is an adjustable parameter, and its value can be optimized and finally determined through experiments.

The pseudocode of our overloaded hosts detecting method is shown in Algorithm 1. It is referred to as the Dynamic Independent Saturation Threshold (DIST) method. In order to get the ST, the SD, and the CPU utilization of a host, all VMs on it must be traversed, so the time complexity of each of these computations is O(m). The rest of DIST uses only numerical comparisons to determine whether the host is overloaded, with time complexity O(1). Therefore, the time complexity of DIST for one host is O(m), where m is the number of VMs on the host, and the time complexity of the entire overloaded hosts detecting process for all hosts is O(h · m), where h refers to the number of hosts.

Input: host
(1) Calculate ST and SD of the host;
(2) u ← current CPU utilization of the host;
(3) if u > ST
(4)   if SD > lastSD
(5)     count ← count + 1;
(6)   else if SD < lastSD
(7)     count ← 0;
(8)   end if
(9)   lastSD ← SD;
(10)  if count ≥ n
(11)    state ← OVERLOADED;
(12)    count ← 0;
(13)  else
(14)    state ← SATURATED;
(15)  end if
(16) else
(17)  state ← NORMAL;
(18)  count ← 0;
(19) end if
(20) return state
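To make the DIST logic concrete, the following minimal Python sketch mirrors Algorithm 1 under the formulas reconstructed above (equations (1)-(4)); the `Vm`/`Host` classes, their field names, and the counter handling are illustrative assumptions of this sketch, not CloudSim code.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Vm:
    vmr: float  # VMR: maximum CPU the VM's type may request (MIPS)
    vra: float  # VRA: CPU actually used by the VM at this moment (MIPS)

@dataclass
class Host:
    hmr: float                              # HMR: total CPU capacity (MIPS)
    vms: List[Vm] = field(default_factory=list)
    count: int = 0                          # consecutive points with rising SD
    last_sd: float = float("-inf")

def saturation_threshold(host: Host) -> float:
    # HRR is the sum of VRRs; VRR = (1 - VRAR) * (VMR - VRA) per equation (2)
    hrr = sum((1 - vm.vra / vm.vmr) * (vm.vmr - vm.vra) for vm in host.vms)
    return (host.hmr - hrr) / host.hmr      # ST, equation (3)

def dist(host: Host, n: int) -> str:
    st = saturation_threshold(host)
    u = sum(vm.vra for vm in host.vms) / host.hmr   # CPU utilization
    if u <= st:
        host.count = 0
        return "normal"
    sd = u - st                                     # SD, equation (4)
    if sd > host.last_sd:
        host.count += 1        # SD rose: extend the consecutive streak
    elif sd < host.last_sd:
        host.count = 0         # SD fell: the streak is broken
    # equal SD: the point is excluded, the counter is unchanged
    host.last_sd = sd
    if host.count >= n:
        host.count = 0
        return "overloaded"
    return "saturated"
```

Note that a "saturated" result only blocks new VM allocations on the host; only an "overloaded" result triggers the VM selection and migration process.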
3.2. Underutilized Hosts Detecting Method

Migrating all VMs from underutilized hosts and then shutting them off or setting them to deep sleep mode is an efficient way to increase CPU utilization and reduce energy consumption of a cloud data center. In the proposed underutilized hosts detecting method, all active hosts except saturated ones are put into a candidate host set for detection. We first get the priorities of all candidate hosts and then try to migrate all VMs from the host with the highest priority to other unsaturated hosts while keeping them not overloaded. If the entire migration process completes successfully, the host with the highest priority is marked as underutilized, and it will be turned off or switched to deep sleep mode after all VM migrations are done. Otherwise, it remains active. The host is removed from the candidate host set after detection. Meanwhile, some candidate hosts may have just accepted the migrated VMs, so the candidate host set and the priorities of the remaining candidate hosts should be updated. The underutilized hosts detecting method does not terminate until there is no host left in the candidate set.

In the priority calculation process, unlike previous works which simply use the CPU utilization of a host to decide its priority, we take more factors into consideration to improve effectiveness. Therefore, a new indicator named Host Resource Occupancy Rate (HROR) is proposed. To explain it, the definition of Resource Occupancy is given first.

3.2.1. VM Resource Occupancy (VRO) and Host Resource Occupancy (HRO)

The VRO is the sum of the VRA and the VRR of a VM. The HRO is equal to the sum of the VROs of all VMs on that host.

The calculation formula of HROR is

\mathrm{HROR} = \frac{\mathrm{HRO}}{\mathrm{HMR}}. \qquad (5)

The reasons for using HROR to calculate the priority of a host are as follows. First, besides the CPU utilization of hosts at present, their possible future CPU demands and maximum CPU capacities are also critical for prioritizing them. Second, it is important to take the number of VMs into account in the priority calculation, because the more VMs a host has, the greater the probability that it will not be underutilized in the future. Comprehensive consideration of these factors can clearly improve the effectiveness of underutilized hosts detection. It should be noted that, according to their definitions, the VRO simultaneously reflects the actual CPU usage of a VM at present and the CPU capacity that should be reserved for its future usage; the HRO is related to the number of VMs on the host; and the calculation of HROR uses the HMR. Therefore, using HROR to calculate the priority of a host is undoubtedly much better than using the CPU utilization alone.

In addition, the variation trend of HROR is another important factor which should be taken into consideration in the priority calculation. Specifically, among hosts with approximately equal HROR values at one monitoring point, the one whose HROR is likely to decrease at the next monitoring point should have a higher priority. We use the Naive Bayesian classifier to get the probability that the HROR decreases. Then, based on the HROR and this probability, an indicator named Adjusted Host Resource Occupancy Rate (AHROR) is proposed.

In the Naive Bayesian classifier, we need data samples to form a training set, and each sample is represented by an (n + 1)-dimensional vector consisting of n feature attributes and a class label. There may exist m classes, so the range of the class label can be expressed as {C_1, C_2, \ldots, C_m}. After training, for a sample which has no class label, the classifier predicts that it belongs to the class with the highest posterior probability conditioned on the sample vector.

As we intend to use the historical data of HROR to predict its probability of decreasing or not decreasing at the next monitoring point, the direct approach under the Naive Bayesian classifier is to build the features of the sample vector from the historical data. Suppose u_1, u_2, \ldots, u_k are the HRORs observed at the k preceding monitoring points in time order; then we get the input vector U = (u_1, u_2, \ldots, u_k). The variation of HROR can be divided into two types, decreasing and not decreasing, so the range of the class is {C_0, C_1}. Specifically, class C_0 is the state of decreasing, and class C_1 is the state of not decreasing. For simple and efficient use of the input vector U, it is transformed to the binary vector X = (x_1, x_2, \ldots, x_{k-1}) using the rule shown in the following equation:

x_i = \begin{cases} 1, & u_{i+1} < u_i, \\ 0, & u_{i+1} \geq u_i, \end{cases} \quad i = 1, 2, \ldots, k-1. \qquad (6)

For an input vector X, our goal is to calculate P(C_0 | X) and P(C_1 | X):

P(C_0 \mid X) = \frac{P(X \mid C_0)P(C_0)}{P(X)}, \qquad (7)

P(C_1 \mid X) = \frac{P(X \mid C_1)P(C_1)}{P(X)}. \qquad (8)

Under the naive conditional independence assumption, the class conditional probabilities P(X | C_0) and P(X | C_1) can be calculated by the following equations:

P(X \mid C_0) = \prod_{i=1}^{k-1} P(x_i \mid C_0), \qquad (9)

P(X \mid C_1) = \prod_{i=1}^{k-1} P(x_i \mid C_1). \qquad (10)

The probabilities P(x_i | C_0) and P(x_i | C_1) can be obtained from the training samples:

P(x_i \mid C_0) = \frac{s_{i0}}{s_0}, \qquad P(x_i \mid C_1) = \frac{s_{i1}}{s_1}, \qquad (11)

where s_{i0} is the number of training samples of class C_0 having the value x_i for their ith feature, and s_0 is the number of training samples belonging to class C_0; s_{i1} is the number of training samples of class C_1 having the value x_i for their ith feature, and s_1 is the number of training samples belonging to class C_1. For the special case where s_{i0} or s_{i1} is 0, Laplace smoothing can be used to solve it.

Besides, in (7) and (8), P(C_0) and P(C_1) are the class prior probabilities, which can be estimated by the following equations:

P(C_0) = \frac{s_0}{s}, \qquad (12)

P(C_1) = \frac{s_1}{s}, \qquad (13)

where s is the total number of training samples.

Finally, the calculation of AHROR is given in equation (14); a host with a smaller AHROR should have higher priority:

\mathrm{AHROR} = \begin{cases} \mathrm{HROR}, & \text{fewer than } 2k \text{ historical points are available}, \\ \mathrm{HROR} - \lambda P(C_0 \mid X), & \text{otherwise}, \end{cases} \qquad (14)

where \lambda is a small positive coefficient.

Since the purpose of introducing the variation trend of HROR is just to distinguish the priorities of hosts with approximately equal HROR, we multiply P(C_0 | X) by the small coefficient \lambda to reduce its weight. Though P(X) cannot be obtained from the formulas, we can get P(X | C_0)P(C_0) and P(X | C_1)P(C_1), and P(X) can be treated as a nonzero constant. In addition, P(C_0 | X) + P(C_1 | X) = 1; then the second case of equation (14) can be transformed into the following equation:

\mathrm{AHROR} = \mathrm{HROR} - \lambda \frac{P(X \mid C_0)P(C_0)}{P(X \mid C_0)P(C_0) + P(X \mid C_1)P(C_1)}. \qquad (15)

In our experiment, the interval of measurements is five minutes, and the workload data of the nearest one hour is enough for predicting the state of the next monitoring point, so we let k = 12. For each prediction, we use the nearest 24 measurements to form 13 sample vectors as a training sample set, so the AHRORs of the first 24 monitoring points are equal to their HRORs.
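The following sketch shows one way to estimate the posterior P(C_0 | X) with binary trend features and Laplace smoothing, as described above. The sliding-window construction of the training set and the weight `lam` are assumptions of this sketch; the paper's exact coefficient and windowing details are not reproduced here.

```python
def trend_features(window):
    # x_i = 1 if HROR decreased from point i to point i + 1, else 0 (eq. (6))
    return tuple(1 if b < a else 0 for a, b in zip(window, window[1:]))

def p_decrease(history, k=12):
    """Posterior probability P(C0 | X) that HROR decreases at the next
    monitoring point; label 0 stands for C0 (decreasing), 1 for C1."""
    if len(history) <= k:
        return 0.0  # not enough history: caller falls back to AHROR = HROR
    samples = []
    for i in range(len(history) - k):
        window = history[i:i + k]
        label = 0 if history[i + k] < window[-1] else 1
        samples.append((trend_features(window), label))
    x = trend_features(history[-k:])        # vector to classify
    joint = {}
    for c in (0, 1):
        sc = [f for f, lbl in samples if lbl == c]
        prior = len(sc) / len(samples)      # class prior, eqs. (12)-(13)
        like = 1.0
        for j, xj in enumerate(x):
            match = sum(1 for f in sc if f[j] == xj)
            like *= (match + 1) / (len(sc) + 2)   # Laplace smoothing
        joint[c] = prior * like             # P(X | C) * P(C), eqs. (9)-(11)
    return joint[0] / (joint[0] + joint[1])  # P(X) cancels, as in eq. (15)

def ahror(hror, history, lam=0.1):
    # lam is an illustrative small weight standing in for the coefficient
    # of equation (14); a smaller AHROR means a higher priority.
    return hror - lam * p_decrease(history)
```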

The pseudocode of our underutilized hosts detecting method is shown in Algorithm 2. It is referred to as the Combined Weight Prioritization (CWP) method. In order to get the AHROR of a host, all VMs on it must be traversed, so the time complexity is O(m). The rest of CWP uses a double loop over candidate hosts and their VMs to determine whether a host is underutilized, with time complexity O(h · m). Therefore, the time complexity of CWP is O(h · m), where m is the number of VMs, while h refers to the number of hosts.

Input: activeHostList
(1) Put all active hosts except saturated ones into candidateSet;
(2) Calculate the AHROR of each host in candidateSet;
(3) underutilizedHostList ← ∅;
(4) for (each host h in candidateSet, in ascending order of AHROR)
(5)   for (each VM v on h)
(6)     if (v cannot be placed on any other unsaturated host without overloading it)
(7)       Destroy all tentative VM reallocations of h;
(8)       continue with the next candidate host;
(9)     end if
(10)  end for
(11)  Add h to underutilizedHostList;
(12)  Update candidateSet and the AHROR of each host in it;
(13) end for
(14) return underutilizedHostList
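A compact Python rendering of the CWP scan may clarify the tentative-reallocation and rollback semantics of Algorithm 2. The `fits` predicate, the `reserved` counter, and the `state`/`ahror` fields are assumed stand-ins for the real placement checks, not CloudSim API.

```python
def cwp_detect(hosts, fits):
    """Illustrative CWP scan. fits(dst, vm) is an assumed predicate: True if
    dst can accept vm while staying unsaturated and not overloaded."""
    candidates = [h for h in hosts if h.state != "saturated"]
    underutilized, migration_map = [], []
    while candidates:
        # Highest priority = smallest AHROR.
        host = min(candidates, key=lambda h: h.ahror)
        candidates.remove(host)
        targets = [t for t in hosts if t is not host
                   and t not in underutilized and t.state != "saturated"]
        plan, ok = [], True
        for vm in host.vms:
            dst = next((t for t in targets if fits(t, vm)), None)
            if dst is None:
                ok = False                 # one VM cannot move: abort this host
                break
            plan.append((vm, dst))
            dst.reserved += vm.vra         # tentative reservation
        if ok:
            underutilized.append(host)     # host will be switched off later
            migration_map.extend(plan)
            # accepting hosts changed: their AHRORs would be recomputed here
        else:
            for vm, dst in plan:           # destroy all tentative reallocations
                dst.reserved -= vm.vra
    return underutilized, migration_map
```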
3.3. VM Selection Method

Determining which VMs to migrate from an overloaded host has a direct impact on the number and cost of migrations; inappropriate selections can lead to extra SLA violations, which in turn increase energy consumption. In our consideration, among the VMs on an overloaded host, the one with a bigger VRA and a smaller VRAR should be prioritized for migration. A bigger VRA means it takes up a lot of CPU at present, and a smaller VRAR means it has a larger growth space of CPU demand in the future. This rule makes the current migration of such a VM more valuable for keeping the host running properly in the future, and accordingly, the total number and cost of migrations will be reduced.

Considering that using two separate factors makes the selection process difficult, it is better to find one factor that reflects both simultaneously. According to the definition of VRR, of two VMs with the same VRA, the one with a smaller VRAR must have a larger VRR. So, the VM with a bigger VRA and VRR should be selected first. The VRO is the sum of the VRA and the VRR, so the selection can be based on the VRO. In conclusion, the VM with a larger VRO should have a higher priority to be selected for migration.

The proposed VM selection method is referred to as the Minimize Number and Cost of Migrations (MNCM) method. The VRO of each VM has already been obtained in the previous part, and all VMs on the overloaded host are traversed to select the proper ones; therefore, the time complexity of MNCM is O(m), where m is the number of VMs.
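As a sketch, MNCM reduces to sorting the host's VMs by VRO and migrating from the top until the host leaves the overloaded state. The `still_overloaded` predicate is an assumed helper that re-evaluates the host against its saturation threshold after removing the already-selected VMs; the VRO expression follows the formulas reconstructed above.

```python
def vro(vm):
    # VRO = VRA + VRR, with VRR = (1 - VRAR) * (VMR - VRA) per equation (2)
    vrar = vm.vra / vm.vmr
    return vm.vra + (1 - vrar) * (vm.vmr - vm.vra)

def mncm_select(host, still_overloaded):
    migration_list = []
    for vm in sorted(host.vms, key=vro, reverse=True):  # larger VRO first
        if not still_overloaded(host, migration_list):
            break                       # host is back to normal: stop selecting
        migration_list.append(vm)
    return migration_list
```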

4. Experimental Setup

In this section, the simulator, hosts and VMs characteristics, workload data, and performance metrics in our experiment are described in detail.

4.1. Simulator

It is essential to evaluate the proposed energy and SLA-efficient resource management strategy and compare it with previous works on a large-scale data center infrastructure. However, experimentation on a real cloud data center is expensive and time-consuming. Moreover, real cloud data centers are proprietary and invisible to consumers, so experimental results are often difficult to reproduce and analyze. In addition, the influence of the network and data transmission cannot be ignored, which leads to inaccurate evaluation of energy consumption. To solve this issue, many simulators based on modeling and simulation technology have been designed. They provide an experimental environment that is very close to a real data center, and they make it much easier to evaluate and compare different resource management strategies. Considering that the modern open-source CloudSim toolkit provides reproducible results for checking cloud strategies and has very good support for energy consumption modeling [30], CloudSim 4.0 is chosen as the experimental platform in this paper. More details of CloudSim are given in [31, 32].

4.2. Configuration of Hosts and VMs

In the simulation, we implement a data center which contains 800 heterogeneous hosts: half of them are HP ProLiant G4, and the other half are HP ProLiant G5. The specific characteristics of these two types of servers [11] are listed in Table 2. Referring to Amazon EC2, we set up four types of VMs, whose characteristics [11] are depicted in Table 3. Initially, the resources of each VM are allocated according to the resource requirements defined by its type. Then, fewer resources are allocated to VMs dynamically according to their workload during their lifetime, which creates opportunities for dynamic VM consolidation.

4.3. Workload Traces

To make the experiment more convincing, it is necessary to use real workload data. In this paper, the data used is derived from the CoMon project which is a monitoring infrastructure for PlanetLab [33]. We use 10 days’ workload traces collected from more than 1000 VMs on 800 hosts located at more than 500 places throughout the world [11] as shown in Table 4. These traces characterize CPU utilization in 5 min intervals of the VMs.

4.4. Performance Metrics

The goal of an energy-aware resource management strategy is to minimize the power consumption and SLA violation of the data center. To verify its effectiveness, we choose energy consumption, SLA violation metrics, energy efficiency, the number of VM migrations, and the number of host shutdowns as performance metrics to evaluate our strategy.

4.4.1. Energy Consumption

In comparison to other resources like memory, disk storage, and network, it has been shown that the CPU accounts for most of the energy consumed by a host. Even when the DVFS technique is applied, the energy consumption of a host can be approximated by a linear model of its CPU utilization. However, the introduction of modern multicore CPUs, large memory, and big hard disks makes the traditional linear model inaccurate and makes it complex to establish an accurate analytical model of host energy consumption. Therefore, we use the real energy consumption data of the HP ProLiant G4 and HP ProLiant G5 under different CPU utilizations derived from the SPECpower benchmark (http://www.spec.org/power_ssj2008/). The details of the data are shown in Table 5.
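The per-host power draw can then be obtained by piecewise-linear interpolation between the measured 10% utilization steps, which is how CloudSim's SPECpower-based power models behave. The wattage rows below are the published SPECpower figures for these two servers as quoted in [11]; Table 5 remains authoritative if the numbers differ.

```python
# Power (watts) at 0%, 10%, ..., 100% CPU utilization.
POWER = {
    "HP ProLiant G4": [86, 89.4, 92.6, 96, 99.5, 102, 106, 108, 112, 114, 117],
    "HP ProLiant G5": [93.7, 97, 101, 105, 110, 116, 121, 125, 129, 133, 135],
}

def power_watts(model: str, utilization: float) -> float:
    """Linearly interpolate power between the measured 10% steps."""
    table = POWER[model]
    u = min(max(utilization, 0.0), 1.0) * 10   # position in 10% steps
    lo = int(u)
    if lo == 10:
        return table[10]                       # fully utilized
    frac = u - lo                              # fraction into the next step
    return table[lo] + (table[lo + 1] - table[lo]) * frac
```

For example, power_watts("HP ProLiant G4", 0.0) returns the 86 W idle figure, which is roughly 74% of the 117 W peak, illustrating why consolidating VMs and switching idle hosts off saves energy.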

4.4.2. SLA Violation Metrics

The values of SLA violation metrics are key indicators to evaluate the QoS of a data center. The CPU demand of a VM varies arbitrarily over time, and SLA violations will be caused if the host is oversubscribed. Two metrics have been introduced in [11] to depict SLA violation: the SLA violation Time per Active Host (SLATAH) and the Performance Degradation due to Migration (PDM). VMs cannot be provided with their CPU demands if the host is experiencing 100% CPU utilization, so SLATAH captures SLA violations due to overutilization of hosts. PDM captures the negative impact on the performance of a VM caused by its live migration process. They are defined as

\mathrm{SLATAH} = \frac{1}{N}\sum_{i=1}^{N}\frac{T_{s_i}}{T_{a_i}}, \qquad (16)

\mathrm{PDM} = \frac{1}{M}\sum_{j=1}^{M}\frac{C_{d_j}}{C_{r_j}}, \qquad (17)

where N and M denote the number of hosts and VMs in a data center, respectively; T_{s_i} is the time during which the CPU utilization of host i reaches 100%; T_{a_i} is the total active time of host i; C_{d_j} is the estimated performance degradation of VM j caused by migrations, which, according to Dumitrescu and Foster [34], is set to 10% of the CPU utilization during the total migration time of VM j; and C_{r_j} is the total CPU capacity requested by VM j.

As the two metrics are independent and equally important, SLA Violation (SLAV) is calculated by multiplying them together as

\mathrm{SLAV} = \mathrm{SLATAH} \times \mathrm{PDM}. \qquad (18)
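Given per-host activity records and per-VM migration records, the metrics of equations (16)-(18) reduce to a few averages, as the following sketch shows. The field names are illustrative stand-ins for the quantities just defined.

```python
def sla_metrics(hosts, vms):
    # SLATAH: average fraction of its active time a host spent at 100% CPU,
    # equation (16)
    slatah = sum(h.t_full / h.t_active for h in hosts) / len(hosts)
    # PDM: average performance degradation due to migrations, equation (17);
    # c_degraded is estimated as 10% of the CPU utilization during each VM's
    # total migration time [34]
    pdm = sum(vm.c_degraded / vm.c_requested for vm in vms) / len(vms)
    slav = slatah * pdm                      # equation (18)
    return slatah, pdm, slav
```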

4.4.3. Energy Efficiency

A good energy-aware resource management strategy should minimize power consumption and SLAV simultaneously. However, the two metrics constrain each other, and using them individually makes it hard to give an intuitive judgment of how good or bad a strategy is compared with others. Therefore, the energy efficiency (EE) proposed in [29], as shown in equation (19), is used as the other metric; obviously, the strategy with a bigger EE value performs better:

\mathrm{EE} = \frac{1}{E \times \mathrm{SLAV}}, \qquad (19)

where E is the total energy consumption.

4.4.4. Number of VM Migrations

VM migration is an expensive operation, as it burdens the network with data transmission and occupies resources on both the source and destination hosts during the migration process. Hence, a smaller number of VM migrations indicates a better strategy.

4.4.5. Number of Host Shutdowns

A reduction in the number of host state switches can lead to additional energy savings in a data center, so a smaller number of host shutdowns represents a better-performing strategy.

5. Experimental Results and Analysis

In this section, we first present the impact of the parameter n in the overloaded hosts detecting method on the performance of the proposed resource management strategy and determine its optimal value. Then, the performance of our strategy is evaluated with the aforementioned metrics, and the experimental results are analyzed in comparison to several benchmark strategies.

5.1. Determining the Optimal Value of Parameter n

As mentioned in Section 3.1, in DIST, the state of a host changes from saturation to overload if it stays saturated and its SD increases continuously at n consecutive monitoring points (the points at which SD remains constant are excluded). Theoretically, when the value of n is very small, hosts easily enter the overloaded state; in the extreme case when n is equal to 0, no host stays in the saturated state, because as soon as a host meets the criteria of the saturated state, it is judged as overloaded and the VM selection and migration processes on it begin. Therefore, the number of VM migrations is large, and the SLA violations and energy consumption caused by VM migrations are also very large. As n increases, fewer and fewer hosts change from the saturated to the overloaded state, so the number of VM migrations decreases, and so do the SLA violations and energy consumption caused by them. However, when the value of n is too large, some saturated hosts cannot be converted into the overloaded state in time for VM migrations; then the resource requests of some VMs on them may not be satisfied, resulting in increasing SLA violations and energy consumption. Finally, when n exceeds a certain critical value, no host in the saturated state ever becomes overloaded, and the number of VM migrations, the SLA violations, and the energy consumption all reach definite values and no longer change as n increases.

In order to find the most suitable value for n, we study it with the first three of the ten PlanetLab workload traces, using the aforementioned metrics as evaluation criteria. The impact of n on all the metrics has been studied; however, for the sake of space, we only show the impact on the energy consumption, SLA violations, energy efficiency, and number of VM migrations metrics. Moreover, we find that as n grows, the values of all the metrics become very stable, and the results obtained when n is 0 differ greatly from those obtained with other values. So, in order to show the critical data more clearly, we draw the results for the nonzero values of n in Figures 3-6, and the results obtained when n is 0 are listed separately in Table 6.

From Table 6, it is obvious that when n is 0, a huge number of VM migrations occurs in all three groups of experiments; accordingly, the values of SLAV and energy consumption are also very large, and the values of energy efficiency are low. The trends of the data in the figures basically conform to our theoretical analysis above. It can be clearly seen from the figures that as n increases from its smallest values, the number of VM migrations, the SLAV, and the energy consumption decrease considerably while the energy efficiency increases considerably, after which the values of the four metrics tend to be stable. It should be noted that, although the energy consumption remains basically unchanged over the middle range of n, the number of VM migrations and the SLAV first decrease to a certain extent and then rise slightly before stabilizing. Based on these observations, we choose the value of n within the stable region that reduces both energy consumption and SLA violations as the most suitable one.

5.2. Comparison to Benchmark Strategies

In this section, the proposed strategy is compared with five existing energy-saving strategies which use THR (with static utilization threshold 0.8), LR (with safety parameter 1.2), IQR (with safety parameter 1.5), MAD (with safety parameter 2.5), and LAOD (with safety parameter 0.9) [24] in the overloaded hosts detecting phase, respectively, a simple method (SM) [11] in the underutilized hosts detecting phase, and MMT in the VM selection phase. Additionally, our strategy and all the comparison strategies use the PABFD method in the VM placement phase. The five overloaded hosts detecting methods have been explained in Section 2. The safety parameter controls the aggressiveness of these methods in consolidating VMs: the smaller the parameter, the lower the energy consumption, but the higher the level of SLA violations caused by the consolidation. The value of the safety parameter selected for each method has been shown to be optimal [11]. In SM, the host with the minimum CPU utilization relative to the others is considered underutilized if all the VMs on it can be migrated to other hosts while keeping them not overloaded. MMT selects the VM that requires the minimum migration time relative to the others; the migration time is estimated as the amount of RAM utilized divided by the available network bandwidth. PABFD first sorts the VMs in decreasing order of their CPU utilizations and then allocates each VM to the host that will have the least increase in power caused by the allocation.
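For reference, the following is a minimal sketch of PABFD as just described; `power_with` (estimating a host's power after hypothetically adding a VM, or its current power when the VM is None) and `can_host` (a capacity check) are assumed helpers, not CloudSim API.

```python
def pabfd(vm_list, hosts, power_with, can_host):
    placement = {}
    # VMs are considered in decreasing order of their CPU utilization.
    for vm in sorted(vm_list, key=lambda v: v.vra, reverse=True):
        best, best_delta = None, float("inf")
        for host in hosts:
            if not can_host(host, vm):
                continue
            # best fit = host with the least power increase from this VM
            delta = power_with(host, vm) - power_with(host, None)
            if delta < best_delta:
                best, best_delta = host, delta
        if best is not None:
            placement[vm] = best
            best.vms.append(vm)   # commit so later VMs see this allocation
    return placement
```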

In the following, we use DIST/CWP/MNCM to represent our strategy, and the comparison strategies are THR/SM/MMT, LR/SM/MMT, MAD/SM/MMT, IQR/SM/MMT, and LAOD/SM/MMT. For each strategy, experiments are executed separately on the 10 days of workload traces depicted in Table 4. The comparisons of the energy consumption, SLA metrics, number of VM migrations, and number of host shutdowns of these strategies are reported in Figures 7-10. Each value in the bar graphs is the average of the ten results obtained using the 10 days of data.

From Figure 7, it is obvious that the proposed strategy has a much smaller number of VM migrations compared with the other strategies. Specifically, relative to the proposed strategy, LAOD/SM/MMT has the minimum difference and LR/SM/MMT has the maximum difference in the number of VM migrations. The range of difference reached 21359 to 24929, with a reduction rate range of 86.74% to 88.41%. The reason can be explained as follows: first, DIST and CWP consider the uniqueness of each host according to the actual situation of the VMs on it when determining whether it is overloaded or underutilized; second, in DIST, a saturated host no longer accepts VM allocations, which reduces the chance that it becomes overloaded and requires VM migrations; third, besides the current CPU usage, MNCM takes the future growth space of CPU demand into account. These methods make the host detecting results and the VM selection results more effective, and thus the number of VM migrations is reduced.

As we can see in Figure 8, the proposed strategy also has a significant advantage in the number of host shutdowns. Since the proposed strategy properly chooses the underutilized hosts and the VMs that need to be migrated from overloaded hosts, many unnecessary and incorrect migrations and the restarting of some previously shut-down hosts are prevented. As a result, it shuts down a much smaller number of hosts than the other strategies do. Compared to the proposed strategy, LR/SM/MMT has the minimum difference and IQR/SM/MMT has the maximum difference in the number of host shutdowns. The range of difference reached 4261 to 4954, with a reduction rate range of 84.28% to 86.17%.

To save space, the comparisons of the proposed strategy to the benchmark strategies on the three SLA metrics are all shown in Figure 9. According to the results, the proposed strategy has smaller values of SLAV, PDM, and SLATAH; compared to it, LAOD/SM/MMT has the minimum difference and LR/SM/MMT has the maximum difference in the SLA metrics. The ranges of difference reached 2.292 to 4.002, 0.28 to 0.44, and 2.35 to 3.53, with reduction rate ranges of 70.31% to 80.52%, 43.75% to 55%, and 46.81% to 56.94%, respectively. First, by introducing the saturated state, DIST prevents the CPU utilization of hosts from reaching 100%; consequently, SLATAH decreases. Second, the number of VM migrations of the proposed strategy is much smaller than that of the other strategies, so the time cost and performance degradation due to migration are smaller; thus, PDM decreases. Therefore, SLAV, formed by multiplying the two metrics, is also reduced.

Figure 10 depicts the comparison of the energy consumption of the different strategies. Notably, the proposed strategy has a smaller energy consumption value than the others. Specifically, relative to the proposed strategy, LR/SM/MMT has the minimum difference and THR/SM/MMT has the maximum difference in energy consumption. The range of the difference is 37.48 kWh to 64.12 kWh, and the range of the reduction rate is 23.16% to 34.02%. Since the proposed strategy has smaller values of the above metrics in comparison to the other strategies, and since VM migrations and switching hosts ON/OFF produce extra energy consumption, this result is easy to understand.

In addition, in order to see the specific effect of our strategy in reducing energy consumption, we run the Non-Power-Aware (NPA) and DVFS policies to get their energy consumption values as benchmarks, because they do not involve VM migration. NPA uses no energy management measures during workload processing, and its energy consumption is 2410.8 kWh, while DVFS consumes 829.5 kWh. In comparison, the proposed strategy reduces energy consumption by 94.84% and 85.01%, respectively.

The above simulation results have shown that our strategy, using the three proposed methods together, produces much better performance than other combinations of existing methods. Then, to demonstrate the validity and reliability of each of them, we combine them separately with benchmark methods to compose various strategies. Extensive experiments were conducted to test them, but to save space, only a selection of representative and illustrative results is listed in this paper. Table 7 shows the results of some experiments using the workload traces of April 20, 2011, with LR, SM, and MMT taken as the benchmark methods for the three phases of dynamic VM consolidation.

The first row is the baseline strategy, and the second to fourth rows are the strategies that use one of the three proposed methods. From these results, the three proposed methods work better than their corresponding benchmark methods. DIST greatly reduces SLAV as it introduces the saturated state for hosts, which prevents their CPU utilization from reaching 100%; CWP cuts SLAV and energy consumption almost in half because it considers more factors and uses the Naive Bayesian classifier for prediction; and MNCM cuts energy consumption almost in half because it considers the selection of VMs more comprehensively. More remarkably, they integrate well: as the results in the last row show, our strategy, DIST/CWP/MNCM, has the best energy efficiency compared to the other combinations; that is, using the three methods together achieves the best overall performance.

6. Conclusions

For the energy consumption problem in cloud data centers, this study puts forward a threshold-based energy and SLA-efficient resource management strategy that makes a trade-off between energy consumption and SLA violation. For the subproblems in dynamic VM consolidation, the overloaded hosts detecting method DIST, the underutilized hosts detecting method CWP, and the VM selection method MNCM are proposed. These methods reduce the chance that hosts become overloaded and turn off underutilized hosts as much as possible, while keeping the numbers of VM migrations and host shutdowns well controlled. Therefore, the energy consumption and SLA violation of the cloud data center are both reduced. The results of the simulation experiments show that our strategy significantly outperforms the comparison strategies on all evaluation metrics. As future work, more resource types, such as memory and network bandwidth, will be considered in addition to the CPU. Furthermore, we plan to further improve the performance of our strategy by using machine learning algorithms to predict future workloads based on historical data.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (grant nos. 11372067 and 61772112) and the Innovation Foundation of Science and Technology of Dalian (grant no. 2018J11CY010).