Abstract

To address the imbalance of resources and workloads in data centers, as well as the overhead and high cost of virtual machine (VM) migrations, this paper proposes a new VM migration strategy based on a cloud model time series workload prediction algorithm. The strategy sets upper and lower workload bounds for host machines, forecasts the trend of their subsequent workloads by building a workload time series with the cloud model, and defines a general VM migration criterion, workload-aware migration (WAM). It then selects a source host machine, a destination host machine, and a VM on the source host machine to be migrated. Experimental results and analyses, including comparisons with peer research works, show that the proposed method can effectively avoid VM migrations caused by momentary peak workload values, significantly reduce the number of VM migrations, and dynamically reach and maintain a balance of resources and workloads among virtual machines, thereby improving resource utilization across the entire data center.

1. Introduction

With the rapid and continuous growth of cloud computing on a global scale, typical cloud computing techniques such as virtualization, parallel computation, and distributed databases and storage have developed substantially and been applied extensively in different areas. In particular, as one of the foundational components of cloud computing architecture, virtualization plays a critical role in delivering guaranteed cloud computing services [1]. By creating multiple virtual machines (VMs) on clusters of high performance network servers and providing on-demand services to users through these virtual machines, virtualization makes it possible to realize the rapid deployment, dynamic allocation, and cross-domain management of IT resources [2–4]. However, driven by constantly changing user demands, both the number and the workloads of virtual machines vary frequently, which presents new challenges for the resource scheduling and migration of virtual machines. Within the virtual machine migration process, the selection of the source host machine and of the destination host machine has been recognized as the most significant step.

To support decisions on the selection of source and destination host machines, and to avoid unnecessary virtual machine migrations caused by momentary peak workload values, live migration strategies for virtual machines have been proposed by a number of researchers [5–8]. Currently, there are two types of virtual machine migration approaches in the literature: one combines the upper and lower workload thresholds of the host machine to manage resource usage [9, 10]; the other uses the workload threshold of the host machine together with a prediction of the trend of its subsequent workloads [11–14]. While the former approach resolves the resource waste caused by static workload balancing strategies, it cannot resolve the aggregation conflict that exists in traditional workload balancing strategies. The latter approach, on the other hand, resolves the issue of “false alarm” virtual machine migrations caused by transient peak workload values, but it fails to take into account the uncertainty and randomness of the workload values on host machines, or the combination of the two.

Therefore, to bring the uncertainty and randomness of workload values into the decision process of virtual machine migration, and thus obtain a more robust migration strategy, we propose in this paper a new virtual machine migration strategy based on time series prediction with the cloud model. The strategy works as follows: it first sets upper and lower workload thresholds for host machines, then forecasts the future workload trend of each host machine using the cloud model, and finally stipulates a migration selection criterion and uses it to select the source host machine, the destination host machine, and the virtual machine to be migrated. We argue that the proposed strategy offers a comprehensive treatment of the uncertainty, fuzziness, and randomness of workload values, converts qualitative notions into quantitative ones and vice versa, eliminates the aggregation conflict problem induced by virtual machine migrations triggered by transient peak workload values, and contributes to a dynamic balance of virtual machine resources.

The rest of the paper is structured as follows: Section 2 reviews related work on virtual machine migration. Section 3 reviews the background of the cloud model and introduces the computation of time series workload prediction. Our proposed virtual machine migration algorithm is presented in Section 4. Section 5 presents the experimental results and analyses of the proposed migration strategy in comparison with peer works, and Section 6 concludes the paper.

2. Related Work

The subject of virtual machine migration has been studied extensively [15–19], mainly because the constant increase in the number of virtual machines at cloud computing data centers poses new challenges for virtual machine resource scheduling and deployment. Because the workload of a host machine changes dynamically with users’ ever-changing service demands, simple and static virtual machine migration strategies are no longer adequate for delivering quality services.

Conventional virtual machine migration strategies can be classified into single-threshold and dual-threshold methods [20, 21]. The single-threshold method places only an upper bound on the workload of a host machine and initiates a virtual machine migration when the workload exceeds this bound, whereas the dual-threshold method places both an upper and a lower bound on the workload and initiates a migration when the workload rises above the upper bound or falls below the lower bound. Beloglazov et al. [22–25] proposed an adaptive, energy-efficient, threshold-based heuristic algorithm that controls virtual machine migration by monitoring the resource utilization rate. However, threshold-based migration strategies cannot anticipate the future workload trend of host machines, and consequently they may trigger unnecessary and wasteful virtual machine migrations when the workload of a host machine peaks only momentarily.
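To make the difference between the two trigger rules concrete, here is a minimal Python sketch. The thresholds (0.8 and 0.2) and the utilization trace are hypothetical values chosen only for illustration; note that the single momentary spike to 0.93 is enough to fire both rules, which is the weakness of threshold-only strategies described above.

```python
from typing import Optional


def single_threshold_trigger(utilization: float, upper: float) -> bool:
    """Single-threshold method: migrate only when utilization exceeds the upper bound."""
    return utilization > upper


def dual_threshold_trigger(utilization: float, upper: float, lower: float) -> Optional[str]:
    """Dual-threshold method: report 'overload' above the upper bound,
    'underload' below the lower bound, and None (no migration) otherwise."""
    if utilization > upper:
        return "overload"
    if utilization < lower:
        return "underload"
    return None


if __name__ == "__main__":
    trace = [0.55, 0.62, 0.93, 0.60, 0.15]  # hypothetical CPU utilization samples
    for u in trace:
        print(u, single_threshold_trigger(u, 0.8), dual_threshold_trigger(u, 0.8, 0.2))
```

Neither rule looks beyond the current sample, so the spike at 0.93 triggers a migration even though the next reading is back to normal.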

Various workload prediction techniques have also been used in the context of virtual machine migration [26–29]. In [28], Khan et al. proposed a prediction method based on hidden Markov models, whose applicability depends on the temporal and domain correlations of the workload. Gmach et al. [26] presented a resource pool management strategy based on workload analysis and demand prediction, but did not address actual virtual machine migration. Zhao and Shen [30] used the autoregressive (AR) model, a time series prediction technique that predicts future values from a sequence of past values ordered in time, to forecast future workload values. In general, much of the current research on prediction techniques does not connect the workload prediction analysis of host machines with the resource management of virtual machines to obtain a better migration strategy.
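For reference, the sketch below fits a generic AR($p$) model by ordinary least squares and produces iterated multi-step forecasts. It illustrates the AR technique only, under our own assumptions (an intercept term, least-squares estimation, a user-chosen order `p`); the exact configuration used by Zhao and Shen [30] is not given here.

```python
import numpy as np


def fit_ar(series, p):
    """Fit y_t = c + a_1*y_{t-1} + ... + a_p*y_{t-p} by ordinary least squares.

    Requires len(series) > p. Returns the array [c, a_1, ..., a_p].
    """
    y = np.asarray(series, dtype=float)
    # Each design row is [1, y_{t-1}, ..., y_{t-p}] for one target y_t.
    rows = [np.concatenate(([1.0], y[t - p:t][::-1])) for t in range(p, len(y))]
    coef, *_ = np.linalg.lstsq(np.array(rows), y[p:], rcond=None)
    return coef


def forecast_ar(series, coef, steps):
    """Iterated multi-step forecast: each prediction is fed back in as a lag."""
    p = len(coef) - 1
    history = [float(v) for v in series]
    out = []
    for _ in range(steps):
        lags = history[-p:][::-1]  # most recent value first, matching fit_ar
        nxt = coef[0] + float(np.dot(coef[1:], lags))
        out.append(nxt)
        history.append(nxt)
    return out
```

Because the model only extrapolates a fixed linear recurrence, it has no explicit way to represent the randomness and fuzziness of the workload, which is the gap the cloud model approach targets.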

The main purpose of our work is to improve existing virtual machine migration strategies by applying the cloud model time series workload prediction technique to the decision process of virtual machine migration.

3. Time Series Workload Prediction Based on Cloud Model

The cloud model was proposed by Li et al. [14] in 2000 to handle the conversion between qualitative concepts and quantitative descriptions under uncertainty. A certain kind of mapping exists between the inherently ambiguous descriptive power of natural languages and the objective things they are intended to describe, and the cloud model is designed to capture the essence of this mapping.

3.1. Cloud Model Basics

Let $U$ be a quantitative domain of precise values and $C$ be a qualitative concept over $U$. For any $x \in U$, there exists a random number $\mu(x) \in [0, 1]$ with a stable tendency, which represents the certainty degree of $x$ with respect to the concept $C$. The distribution of $x$ over the domain $U$ is called a cloud, and each $x$ corresponds to a cloud droplet $(x, \mu(x))$. A cloud can be quantitatively characterized by three numerical values, the expectation $Ex$, the entropy $En$, and the hyper entropy $He$ (see Figure 1), where
(i) $Ex$ denotes the most typical quantitative value of the qualitative concept;
(ii) $En$ indicates the uncertainty of the concept, and its value shows the range of $U$ over which the concept can be accepted;
(iii) $He$ is the uncertainty measure of $En$ and is affected by both the randomness and the fuzziness of $En$; its value indirectly indicates the cohesion degree of the cloud droplets.

A large number of cloud droplets together form a cloud. Each droplet represents one quantification of the qualitative concept as a point in $U$. According to the cloud generating mechanism of the cloud model, there are forward cloud generators and backward cloud generators [31].

As shown in Figure 2(a), the forward cloud generator produces cloud droplets (a quantitative notion) from the three numerical values $Ex$, $En$, and $He$ that characterize a cloud (a qualitative notion); every produced droplet is a concrete realization of the qualitative cloud. Conversely, as shown in Figure 2(b), the backward cloud generator reverses the process of the forward cloud generator by converting cloud droplets into the three numerical values $Ex$, $En$, and $He$, which are an abstract representation of a qualitative cloud.
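The two generators can be sketched in Python with NumPy as follows. This is a minimal illustration assuming the widely used normal cloud model with expectation Ex, entropy En, and hyper entropy He; the backward generator shown is the common variant that works from droplet positions alone (without certainty degrees), and the function names are our own.

```python
import numpy as np


def forward_cloud(ex, en, he, n, rng=None):
    """Forward normal cloud generator: (Ex, En, He) -> n droplets (x_i, mu_i)."""
    rng = np.random.default_rng() if rng is None else rng
    en_prime = rng.normal(en, he, size=n)       # En'_i ~ N(En, He^2)
    x = rng.normal(ex, np.abs(en_prime))        # x_i ~ N(Ex, En'_i^2), one per En'_i
    mu = np.exp(-(x - ex) ** 2 / (2 * en_prime ** 2))  # certainty degree in [0, 1]
    return x, mu


def backward_cloud(x):
    """Backward normal cloud generator (no certainty degrees): droplets -> (Ex, En, He)."""
    x = np.asarray(x, dtype=float)
    ex = x.mean()
    en = np.sqrt(np.pi / 2) * np.abs(x - ex).mean()
    s2 = x.var(ddof=1)                          # sample variance
    he = np.sqrt(max(s2 - en ** 2, 0.0))        # clamp: small samples can give s2 < En^2
    return ex, en, he
```

Running the backward generator on many droplets from the forward generator recovers $Ex$ and $En$ closely, while the estimate of $He$ is much noisier, which is why the clamp at zero is included.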

3.2. The Computation for Time Series Workload Predication

The procedure for obtaining a time series workload prediction is depicted in Figure 3. It consists of four core modules: parameter setting/optimization, data preprocessing, prediction modeling by the cloud model, and model evaluation, of which data preprocessing and prediction modeling by the cloud model are the primary ones. Parameter setting is important because it affects the accuracy of the prediction results. The collected sample data are first transformed into standardized form by the zero-mean, difference, and center compression operations and then fed into the prediction module to obtain a preliminary result. The final prediction result is obtained by applying reversed difference operations to the preliminary prediction result.

Specific steps for computing the workload prediction are given as follows.

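Step 1. Let $X = \{x_1, x_2, \dots, x_n\}$ be a known workload sequence indexed by time. Apply the zero-mean operation to $X$ to obtain the sequence $Y = \{y_1, \dots, y_n\}$, where $y_t = x_t - \bar{x}$ and $\bar{x}$ is the average of $X$. Then calculate the first-order difference $Y'$ of $Y$ and the second-order difference $Y''$ (the first-order difference of $Y'$) as follows:
$$y'_t = y_t - y_{t-1}, \quad t = 2, \dots, n,$$
$$y''_t = y'_t - y'_{t-1}, \quad t = 3, \dots, n.$$
We use $Z = \{z_t\}$, with $z_t = y''_t$, to denote $Y''$ in subsequent steps.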
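Step 1 can be sketched in a few lines of NumPy; `step1_preprocess` is a hypothetical helper name, and `np.diff` computes exactly the first-order difference of consecutive values used in this step.

```python
import numpy as np


def step1_preprocess(x):
    """Zero-mean the workload sequence, then take first- and second-order differences."""
    x = np.asarray(x, dtype=float)
    y = x - x.mean()   # zero-mean sequence
    y1 = np.diff(y)    # first-order difference (length n - 1)
    y2 = np.diff(y1)   # second-order difference (length n - 2), used in later steps
    return y, y1, y2
```

The zero-mean sequence and the first-order differences are returned as well, since both are needed later to reverse the differencing.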

Step 2. Suppose we need to predict $k$ subsequent workload values. Using standard-deviation standardization, standardize the sequence $Z$ obtained in Step 1 as follows:
$$z'_t = \frac{z_t - \bar{z}}{s_z},$$
where $\bar{z}$ is the average of $Z$ and $s_z$ is the standard deviation of $Z$. Denote the standardized sequence by $Z'$.
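A sketch of Step 2 follows. The source does not say whether the population or the sample standard deviation is used; the sample version (`ddof=1`) is our assumption. The inverse function is included only because standardized predictions would need to be mapped back to the differenced scale before the differencing is reversed. A constant sequence (zero standard deviation) would need special handling.

```python
import numpy as np


def standardize(z):
    """Standard-deviation (z-score) standardization.

    Returns the standardized values plus (mean, std) so the step can be inverted.
    """
    z = np.asarray(z, dtype=float)
    mean, std = z.mean(), z.std(ddof=1)  # sample std: an assumption, see lead-in
    return (z - mean) / std, mean, std


def destandardize(z_std, mean, std):
    """Map standardized values back to the original scale (assumed inverse step)."""
    return np.asarray(z_std, dtype=float) * std + mean
```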

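Step 3. Let $z'_{\max}$ be the maximum value of $Z'$ calculated in Step 2, and let $r_t = z'_t / z'_{\max}$ be the ratio of $z'_t$ to $z'_{\max}$. Process $Z'$ by eliminating every $z'_t$ whose ratio $r_t$ does not meet the center compression condition, and call the result $\tilde{Z} = \{\tilde{z}_1, \dots, \tilde{z}_m\}$. By the relevant formulas of the backward cloud generator, calculate the expectation $Ex$, entropy $En$, and hyper entropy $He$ as follows:
$$Ex = \frac{1}{m}\sum_{i=1}^{m}\tilde{z}_i, \qquad En = \sqrt{\frac{\pi}{2}} \cdot \frac{1}{m}\sum_{i=1}^{m}\lvert \tilde{z}_i - Ex \rvert,$$
$$He = \sqrt{S^2 - En^2}, \qquad S^2 = \frac{1}{m-1}\sum_{i=1}^{m}\left(\tilde{z}_i - Ex\right)^2.$$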

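Step 4. Feed $Ex$, $En$, and $He$ obtained in Step 3 into a forward cloud generator. Generate a normal random number $En'_i = \mathrm{NORM}(En, He^2)$ with expectation $En$ and standard deviation $He$, and then a normal random number $x_i = \mathrm{NORM}(Ex, {En'_i}^2)$ with expectation $Ex$ and standard deviation $\lvert En'_i \rvert$. Then calculate the certainty degree $\mu_i$ as follows:
$$\mu_i = \exp\left(-\frac{(x_i - Ex)^2}{2\,(En'_i)^2}\right).$$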

Repeating these calculations $k$ times produces the set of cloud droplets $\{(x_i, \mu_i)\}_{i=1}^{k}$, which is denoted by $D$ (the preliminary prediction result). Then $D$ is processed by the reversed difference operation twice, in combination with $Y'$ and $Y$, which yields the final result.
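The reversed difference operation can be sketched as two cumulative sums, anchored at the last observed values of the first-order difference sequence and of the zero-mean sequence. In this sketch, `pred_d2` is assumed to hold predicted second-order differences already returned to their unstandardized scale, and adding back the mean removed in Step 1 is our assumption rather than a step stated in the source.

```python
import numpy as np


def reverse_second_difference(pred_d2, y, y1, x_mean=0.0):
    """Undo two rounds of differencing for k predicted second-order differences.

    pred_d2 : predicted second-order differences
    y       : zero-mean sequence; its last value anchors the levels
    y1      : first-order difference sequence; its last value anchors the slopes
    x_mean  : mean removed in Step 1 (adding it back is an assumption)
    """
    d2 = np.asarray(pred_d2, dtype=float)
    d1 = y1[-1] + np.cumsum(d2)      # first reversal: predicted first differences
    levels = y[-1] + np.cumsum(d1)   # second reversal: predicted zero-mean workloads
    return levels + x_mean
```

For example, for the sequence (1, 2, 4, 7, 11) the second-order differences are all 1; feeding two further predicted values of 1 through this function gives 16 and 22, which continue the sequence.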

4. The Virtual Machine Migration Strategy

Virtual machine migration in a cloud computing environment generally consists of the following steps: (1) determine the source host machine from which a virtual machine is to be removed; (2) determine the destination host machine to which the virtual machine will be moved; (3) choose an appropriate migration scheme (such as static, dynamic, or mixed migration); (4) carry out the virtual machine migration; and (5) delete the image files associated with the migrated virtual machine on the source host machine.

The novelty of our work lies in applying the cloud model based time series prediction technique to virtual machine migration, namely, to the selection of the source host machine, the destination host machine, and the virtual machine to be migrated.

4.1. The Strategy Outline

The controlling idea of our virtual machine migration strategy can be characterized as workload-aware migration (WAM). We use the cloud model based time series prediction technique introduced in Section 3 to forecast the future workloads of each host machine. If the workload of a host machine is predicted to stay above the upper workload threshold, the machine is labeled a potential source host machine. Similarly, if the workload of a host machine is predicted to stay between the lower and upper thresholds, the machine is labeled a potential destination host machine. The flow chart of the proposed virtual machine migration strategy is shown in Figure 4.

Since the workload of each host machine at a data center changes frequently in accordance with the users’ ever-changing service demands, we need to monitor in real time the changes of the workload for all host machines. This constitutes the data source phase in Figure 4. Collected workload data will then be standardized into desired format as the input to the cloud model based prediction engine. The next step is to forecast the future workloads for host machines by the cloud model based prediction engine, and based on the prediction result, to carry out the migration by appropriately selecting the source and destination host machines as well as the virtual machine to be migrated. Finally, the virtual machine mirror image files on the source host machine will be deleted, and the virtual machine on the destination host machine will be started to continue the user services.

4.2. Workload Prediction

The workload history of a host machine consists of its actual past workload values, which are collected by the data collection module of the data center. The CPU workload of a machine is used as the indicator of its overall workload, since a higher CPU utilization rate suggests a higher resource consumption. In the simulation test of our virtual machine migration strategy, workload data collected from the PlanetLab [32] platform, which are accessible from within the cloud computing simulation software CloudSim [33], are used as the simulated CPU workloads. The set of workload histories of host machines is denoted by $H$ (for convenience, the notations used in our workload prediction algorithm and their meanings are listed in the Notation and Their Meanings section).

To eliminate virtual machine migrations caused by momentary peak workload values, we analyze in advance the likelihood of host machine overload or underload by forecasting the future workload trend of each host machine with the cloud model based workload time series prediction technique. The workload upper and lower thresholds are denoted by $T_u$ and $T_l$, respectively; both are set heuristically. The three characteristic values $Ex$, $En$, and $He$ of a cloud are produced by a one-dimensional backward cloud generator and then fed into a one-dimensional forward cloud generator to produce the predicted host machine workloads. The set of predicted future workloads for host machine $h_i$, where $i$ is the index number of the host machine, is denoted by $P_i$.

4.3. The Selections of Source Host Machine, Destination Host Machine, and Virtual Machine
4.3.1. The List of Source Host Machines

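For each set $P_i$ of predicted workload values, compute its average and denote the result by $\overline{P}_i$. If
$$\overline{P}_i > T_u \qquad (5)$$
holds, then host machine $h_i$ is added to the list $L_S$ of source host machines. Note that because (5) is evaluated on the average of the predicted values rather than on any single value, the chance of a virtual machine migration being triggered by a single momentary peak workload value is largely eliminated.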

Likewise, virtual machines also need to be migrated when the predicted workload falls below the lower threshold. Specifically, if
$$\overline{P}_i < T_l \qquad (6)$$
holds, then host machine $h_i$ is added to the list $L_U$ of underloaded host machines, which indicates that all virtual machines on this host machine need to be migrated.
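The construction of the source and underload host lists can be sketched as follows. The host names, thresholds (0.8 upper, 0.2 lower), and predicted values are hypothetical; host `h1` has one predicted spike to 0.95, but its average (0.79) stays below the upper threshold, so it is not placed on the source list.

```python
from statistics import mean


def classify_hosts(predicted, t_upper, t_lower):
    """Build the source (overloaded) and underloaded host lists.

    predicted : dict host_id -> list of predicted future workloads
    Hosts are judged on the average of their predictions, so a single
    predicted spike is smoothed out rather than acted on directly.
    """
    sources, underloaded = [], []
    for host, values in predicted.items():
        avg = mean(values)             # average of the predicted workloads
        if avg > t_upper:
            sources.append(host)       # condition (5): overloaded
        elif avg < t_lower:
            underloaded.append(host)   # underload condition
    return sources, underloaded


if __name__ == "__main__":
    predicted = {
        "h1": [0.70, 0.95, 0.72],  # one spike, average 0.79
        "h2": [0.88, 0.91, 0.90],  # persistently high
        "h3": [0.10, 0.12, 0.15],  # persistently low
    }
    print(classify_hosts(predicted, t_upper=0.8, t_lower=0.2))  # prints (['h2'], ['h3'])
```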

4.3.2. The List of Virtual Machines

There are two cases for virtual machine selection, depending on whether the host machine is overloaded or underloaded. For underloaded host machines, every virtual machine on every host machine in the list $L_U$ is to be migrated, so all of them are added to the list $V_U$, which records, for each virtual machine, the index number of the host machine on which it resides and its average running workload value. The entries of $V_U$ are sorted by workload value in descending order. For overloaded host machines, the list $L_S$ is traversed to select, on each host machine, the virtual machine with the largest average running workload value, and the selected virtual machine is added to the list $V_O$, which records the same information as $V_U$ and is likewise sorted in descending order. For migration efficiency, the available virtual machine with the largest workload value is chosen for migration, so that the maximum migration benefit tends to be obtained with the minimum number of migrations.
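The construction of the two to-be-migrated virtual machine lists might look like this in Python. Each entry is a `(vm_id, host_id, workload)` tuple; the dictionary layout of `vms_by_host` is our assumption, and hosts without virtual machines are simply skipped.

```python
def build_vm_lists(vms_by_host, sources, underloaded):
    """Build the underloaded-host and overloaded-host VM lists.

    vms_by_host : dict host_id -> dict vm_id -> average running workload
    sources     : IDs of overloaded (source) hosts
    underloaded : IDs of underloaded hosts
    Each returned list holds (vm_id, host_id, workload), largest workload first.
    """
    # Every VM on an underloaded host is to be migrated.
    v_u = [(vm, host, w)
           for host in underloaded
           for vm, w in vms_by_host.get(host, {}).items()]
    # Only the heaviest VM on each overloaded host is selected.
    v_o = []
    for host in sources:
        vms = vms_by_host.get(host, {})
        if not vms:
            continue  # nothing to move from an empty host
        vm, w = max(vms.items(), key=lambda item: item[1])
        v_o.append((vm, host, w))
    v_u.sort(key=lambda rec: rec[2], reverse=True)
    v_o.sort(key=lambda rec: rec[2], reverse=True)
    return v_u, v_o
```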

4.3.3. The List of Destination Host Machines

A host machine is chosen as a candidate destination for virtual machine migrations if its predicted workload value lies between the lower and upper thresholds. That is, if
$$T_l \le \overline{P}_j \le T_u \qquad (7)$$
holds, then host machine $h_j$ is added to the list $L_D$.

4.4. The Migration of Virtual Machines

The actual migration of a virtual machine is decided using the host and virtual machine lists obtained above. If
$$\overline{P}_j + \overline{w}_k \le T_u \qquad (8)$$
holds, where $\overline{P}_j$ is taken from the list $L_D$ and $\overline{w}_k$ is a virtual machine workload value taken from the list $V_O$, then the corresponding virtual machine $v_k$ is migrated to the host machine indexed by $j$; subsequently, $h_j$ and $v_k$ are removed from the lists $L_D$ and $V_O$, respectively. The index numbers of the source and destination host machines are then released back to the host set $H$.

For underloaded host machines, all virtual machines in $V_U$ are migrated to destination host machines according to (8), as long as the list $L_D$ is not empty. The information about each virtual machine is saved into the list $M$ after the virtual machine is migrated. The detailed virtual machine migration algorithm is shown in Algorithm 1.

Algorithm 1
Input: $H$, $T_u$, $T_l$.
Output: $M$.
Procedure begin
  for each host $h_i$ in $H$ do
    Use the cloud model to predict future workloads for $h_i$, and push the results into the list $P_i$;
    Calculate the average $\overline{P}_i$ of the predicted workloads, and push it into the list $\overline{P}$;
    if $\overline{P}_i > T_u$ then
      Add host ID $i$ into the list $L_S$, select the VM with the largest average history workload value on $h_i$, and add it into the queue $V_O$;
    end if
    if $\overline{P}_i < T_l$ then
      Add host ID $i$ into the list $L_U$, and add all the virtual machines on $h_i$ into $V_U$;
    end if
    if $T_l \le \overline{P}_i \le T_u$ then
      Add host ID $i$ into the list $L_D$;
    end if
  end for
  for each $v_k$ in $V_O$, followed by each $v_k$ in $V_U$, do
    for each $h_j$ in $L_D$ do
      if $\overline{P}_j + \overline{w}_k \le T_u$ then
        Migrate $v_k$ to $h_j$, add the VM information into the list $M$, remove $h_j$ from $L_D$, and stop searching for $v_k$;
      end if
    end for
  end for
  return $M$;
end Procedure
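The migration loop in the second half of Algorithm 1 can be sketched as follows. The inputs are assumed to be the outputs of the earlier steps (the predicted average workloads of the candidate destination hosts, and the two sorted lists of to-be-migrated virtual machines); the cloud model prediction itself is not reproduced here. Following Section 4.4 literally, each destination host is removed from the candidate list after receiving one virtual machine in a round.

```python
def migrate(dest_avg, v_o, v_u, t_upper):
    """Match to-be-migrated VMs with destination hosts.

    dest_avg : dict host_id -> predicted average workload of a destination host
    v_o, v_u : lists of (vm_id, source_host, workload), largest workload first
    t_upper  : the upper workload threshold
    Returns the migrated list as (vm_id, source_host, dest_host) tuples.
    """
    remaining = dict(dest_avg)  # work on a copy of the destination list
    migrated = []
    for vm, src, w in list(v_o) + list(v_u):  # overloaded-host VMs first
        for dest, load in list(remaining.items()):
            # Condition (8): the destination must stay within the upper threshold.
            if load + w <= t_upper:
                migrated.append((vm, src, dest))
                del remaining[dest]  # remove the destination host, as in Section 4.4
                break                # each VM is migrated at most once
    return migrated
```

A VM that fits no remaining destination is simply left in place for this round, which is one reasonable reading of the pseudocode rather than a rule stated in the source.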

5. Simulation Results

Simulation experiments are performed to test the effectiveness of our cloud model based time series workload prediction algorithm in comparison with the autoregressive (AR) time series workload prediction model [30]. Workload data collected from the global network platform PlanetLab [32] and used through the cloud computing simulation software CloudSim [33] serve as the input for predicting the future workload trend of host machines.

5.1. Raw Data

Table 1 contains the workload data of a physical node at 96 successive time points, spaced 5 minutes apart, collected from PlanetLab on April 20, 2011. The first 90 entries in Table 1 are used as the sample input to the two workload prediction algorithms, and the remaining 6 entries are used to check the accuracy of their prediction results.

5.2. Experiment Metrics

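The standard statistical error measures mean absolute error (MAE) and root mean squared error (RMSE) are used to evaluate the accuracy of the algorithms. If the predicted workload sequence is $\hat{y}_1, \dots, \hat{y}_n$ and the actual workload sequence is $y_1, \dots, y_n$, then MAE and RMSE are given by
$$\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\lvert \hat{y}_t - y_t \rvert, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(\hat{y}_t - y_t\right)^2}.$$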

Although they are formulated differently and suit different purposes, both MAE and RMSE indicate how well the prediction results match the actual data; RMSE weights large errors more heavily than MAE. A smaller MAE or RMSE value means that the prediction results fit the actual data more closely.
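Both metrics are straightforward to compute; a minimal Python sketch (the function names are our own) is given below. Because RMSE squares each error before averaging, it is always at least as large as MAE for the same errors and penalizes a few large misses more heavily.

```python
import math


def mae(pred, actual):
    """Mean absolute error between predicted and actual sequences."""
    if len(pred) != len(actual) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(actual)


def rmse(pred, actual):
    """Root mean squared error between predicted and actual sequences."""
    if len(pred) != len(actual) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))
```

For predictions (2, 4, 6) against actual values (1, 4, 8), the errors are 1, 0, and 2, so MAE is 1.0 and RMSE is the square root of 5/3, about 1.29.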

5.3. Experiment Results

Due to their unstable and discrete nature, the originally collected workload data are processed by the difference operation and the normalization operation before they are used to generate the predictions. The results of applying these operations to the original data are depicted in Figures 5 and 6, respectively.

The prediction results of our cloud model based algorithm are compared with those of the AR model; details of the comparison are recorded in Table 2 and contrasted in Figures 7 and 8. Our cloud model based algorithm has lower values than the AR model in terms of both absolute error and error rate. The MAE and RMSE of our algorithm are 2.9310 and 3.0334, respectively, while those of the AR model are 6.5187 and 7.3504, as shown in Table 3 and graphed in Figures 9 and 10. Both error measures of our algorithm are thus less than half of those of the AR model, indicating that the cloud model based algorithm yields considerably more accurate predictions and can therefore support a more appropriate selection of host machines and virtual machines during virtual machine migration.

Finally, the prediction results of our algorithm and of the AR model, together with the actual workload data, are charted in Figure 11 for a direct visual comparison, and Figure 12 shows that the predictions of our algorithm closely follow the actual data curve as a continuation of the sample data curve. Both figures further support the conclusion that our cloud model based algorithm is preferable to the AR model.

6. Conclusion

In this paper, we proposed a new virtual machine migration strategy that predicts the future workloads of host machines using the forward and backward cloud generators of the cloud model, determines source and destination host machines from the prediction result according to the WAM criterion, and selects the most resource-demanding virtual machine on each source host machine for migration. In comparison with the peer AR model time series workload prediction technique, our algorithm delivers clearly more precise workload predictions, and thus it provides more effective support for selecting host machines during virtual machine migration, reduces the number of virtual machine migrations, and ultimately helps the system reach a dynamic resource balance that improves resource utilization and the quality of virtualized services.

Notation and Their Meanings

$H$: The set of workload histories of host machines
$T_u$: The workload upper threshold
$T_l$: The workload lower threshold
$P$: The set of predicted future workloads for host machines ($P_i$ for host machine $h_i$)
$\overline{P}_i$: The average value of predicted future workloads for host machine $h_i$ (indexed by $i$)
$L_S$: The list of source host machines
$L_U$: The list of low workload (underloaded) host machines
$V_U$: The list of to-be-migrated virtual machines on low workload host machines
$V_O$: The list of to-be-migrated virtual machines on high workload (overloaded) host machines
$L_D$: The list of destination host machines
$\overline{P}_j$: The average value of predicted future workloads for destination host machine $h_j$ (indexed by $j$)
$\overline{w}_k$: The average value of predicted future workloads for virtual machine $v_k$ (indexed by $k$)
$M$: The list of migrated virtual machines
$\overline{P}$: The set of average values of predicted future workloads for host machines

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work is supported in part by the following Grants: National Science Foundation of China (Grant no. 61272400), Chongqing Innovative Team Fund for College Development Project (Grant no. KJTD201310), Chongqing Youth Innovative Talent Project (Grant no. cstc2013kjrc-qnrc40004), Ministry of Education of China and China Mobile Research Fund (Grant no. MCM20130351), and Science and Technology on Information Transmission and Dissemination in Communication Networks Laboratory Open Project (Grant no. ITD-U13002/KX132600009).