The cloud computing paradigm offers Internet service providers (ISPs) a new, lower-cost way to deliver their services. ISPs can rent virtual machines (VMs) from the Infrastructure-as-a-Service (IaaS) offered by the cloud rather than purchasing their own hardware. In addition, commercial cloud providers (CPs) offer diverse VM instance rental services at various time granularities, which gives ISPs another opportunity to reduce cost. We investigate a Coarse-grain QoS-aware Dynamic Instance Provisioning (CDIP) problem for interactive workload in the cloud from the perspective of ISPs. We formulate the CDIP problem as an optimization problem whose objective is to minimize the VM instance rental cost subject to a percentile delay bound. Since Internet traffic exhibits a strong self-similar property, it is hard to obtain an analytical form of the percentile delay constraint. To address this issue, we propose a lookup table structure together with a learning algorithm to estimate the performance of an instance provisioning policy. This approach is further extended with two function approximations to enhance the scalability of the learning algorithm. We also present an efficient dynamic instance provisioning algorithm, which takes full advantage of the diversity of rental services, to determine the instance rental policy. Extensive simulations validate the effectiveness of the proposed algorithms.

1. Introduction

Before the advent of cloud computing, Internet service providers (ISPs) used to reserve massive amounts of resources to handle the peak workload; otherwise, the service response time could increase to an intolerable degree during a flash crowd and greatly degrade the user experience. However, this approach is energy-inefficient, since the peak resource utilization of a typical ISP is often three times its average utilization. Things get even worse in systems that provide interactive services, where the average utilization is only around 10% of the total capacity provisioned for the peak load [1]. Cloud computing provides a novel service paradigm called Infrastructure-as-a-Service (IaaS) that reduces both hardware and maintenance costs. With IaaS, ISPs only need to rent resources (e.g., virtual servers and network bandwidth) from cloud providers (CPs) instead of purchasing a vast number of physical servers themselves. IaaS thus enables a more flexible and effective approach to resource provisioning. For example, users of the Amazon EC2 system can rent resources for a short period of time to cope with flash traffic.

This paper studies a Coarse-grain Dynamic Virtual Machine (VM) Instance Provisioning (CDIP) problem for interactive workload subject to a percentile delay constraint in the cloud, from the perspective of ISPs. More specifically, the problem concerns a dynamic VM rental policy that minimizes the ISP's resource rental cost while satisfying QoS constraints. A fine-grain (on the order of seconds or minutes) resource provisioning policy may be more effective at increasing resource utilization and reducing cost, but it is more complex and harder to implement. For example, the startup phase of a VM instance in EC2, which “typically takes less than 10 minutes” [2] (observed on November 2nd, 2013), is too long to support a fine-grain control policy. Further, a fine-grain policy can induce fluctuations and undermine system stability. In practice, CPs such as Amazon EC2 provide coarse-grain rather than fine-grain IaaS services. For example, EC2 offers IaaS at two time scales. At the higher level, VMs can be rented for 1 or 3 years (denoted as the Reserved Instance Service, RIS); at the lower level, VM instances can be acquired on an hourly basis (denoted as the Marginal Instance Service, MIS) to absorb instant flash traffic. Generally speaking, the cost of using MIS instances is much higher than that of using RIS instances (refer to Table 1 for the detailed pricing structure of Amazon’s EC2 platform). How to combine these two services properly is one of the most important cost-minimization problems faced by ISPs.

Besides the VM instance rental cost, ISPs also care about the Quality of Service (QoS) delivered to their end users. For interactive workload, QoS has traditionally been expressed as the mean queueing delay, which is easy to analyze using classic queueing theory. However, the self-similar nature of Internet traffic [3] invalidates such queueing-based analysis. In addition, because interactive workload can tolerate some QoS violations, researchers have proposed an alternative, percentile form of QoS specification:

$$\Pr\{d > D\} \le \varepsilon, \tag{1}$$

where $d$ is the system response delay, and $D$ and $\varepsilon$ are the desired delay threshold and violation probability threshold, respectively, determined by the Service Level Agreement (SLA). Unfortunately, the left-hand side of (1) has no analytical form for self-similar traffic.
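To make specification (1) concrete, the following minimal Python sketch estimates $\Pr\{d > D\}$ as the fraction of logged requests whose response delay exceeds $D$ and compares it with $\varepsilon$. The function names and the delay values are hypothetical and only illustrate the check.

```python
def violation_probability(delays, D):
    """Fraction of requests whose response delay exceeds the SLA threshold D."""
    if not delays:
        raise ValueError("need at least one delay sample")
    violations = sum(1 for d in delays if d > D)
    return violations / len(delays)


def meets_sla(delays, D, eps):
    """Percentile constraint: Pr{d > D} <= eps."""
    return violation_probability(delays, D) <= eps


# Hypothetical delays in seconds; D = 0.5 s and eps = 5% are illustrative only.
delays = [0.12, 0.30, 0.45, 0.51, 0.20, 0.33, 0.48, 0.25, 0.60, 0.15]
print(violation_probability(delays, D=0.5))  # 0.2 (2 of 10 requests exceed 0.5 s)
print(meets_sla(delays, D=0.5, eps=0.05))    # False
```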

In this paper, we formulate the CDIP problem as an optimization problem whose QoS constraints cannot be precisely determined in closed form. We develop efficient algorithms to solve the CDIP problem and conduct numerical analysis to evaluate them. Our contributions are as follows: (i) we design a resource prediction algorithm to estimate the performance of a resource provisioning policy under self-similar traffic; (ii) we extend the resource prediction algorithm with function approximations to enhance its scalability; (iii) we present a VM instance provisioning algorithm that allows ISPs to determine the optimal numbers of RIS and MIS VM instances, minimizing the VM instance rental cost.

The rest of this paper proceeds as follows. Section 2 discusses related work; Section 3 shows the opportunity for reducing rental cost by combining RIS and MIS; Section 4 presents a general optimization framework for the CDIP problem together with solution algorithms; Section 5 extends the algorithms with function approximations to address the scalability issue; Section 6 evaluates the proposed algorithms in various settings; and Section 7 concludes the paper.

2. Related Work

To provision resources in a cloud computing environment, the first issue to address is predicting future resource demand accurately, and much research has been dedicated to this area. Chen et al. [4] used a multiplicative Seasonal Autoregressive Moving Average (S-ARMA) approach to predict the mean and standard deviation of interarrival times, and used a simple decomposed model as well as Winters’ smoothing method to predict the mean and standard deviation of file size. Gmach et al. [5] developed a pattern prediction method for cyclic workload based on a workload periodogram function and an autocorrelation function. Caron and Desprez [6] used pattern matching to forecast resource demand in the cloud. Niu et al. [7] proposed a channel interleaving scheme that can predict the demand for new videos that lack historical demand data.

A number of works aim to lower the operational cost of cloud providers (CPs). Ahmad and Vijaykumar [8] proposed PowerTrade to lower the total energy consumption of active servers, standby servers, and cooling facilities. They also developed SurgeGuard, which maintains extra servers at two time granularities to absorb flash crowds. Meisner et al. [1] developed PowerNap, a mechanism comprising a sleep/active state scheduling component and a network interface card (NIC) with Wake-on-LAN support. The system enters sleep mode when there is no workload, and the NIC wakes it up within 1 ms as soon as packets arrive from the network. Leverich and Kozyrakis [9] integrated Hadoop with an energy controller that recasts the data layout and task distribution so that significant portions of a cluster can be shut down. Our work, in contrast, studies how to reduce cost from the perspective of Internet service providers (ISPs).

Several recent studies are closely related to ours. In [10], the author formulated the resource leasing problem as an Integer Programming Problem (IPP) and developed CoH, a family of heuristic policies to solve it. However, [10] treated batch jobs only and gave little consideration to SLAs. Reference [11] also studied the instance provisioning problem and proposed a dynamic instance purchasing scheme based on the Central Limit Theorem to minimize cost. The SLA constraint considered there is the overload probability, which is not suitable for delay-sensitive interactive workload. The works in [12, 13] make resource provisioning decisions based on the Autoregressive Integrated Moving Average (ARIMA) prediction method, but they do not consider a delay constraint either. In contrast, [14] explicitly incorporated delay into the objective function of the optimization problem. However, the delay was derived from Markovian queueing theory, whose assumptions do not hold in today’s Internet, which is dominated by self-similar traffic.

3. Problem Statement

The structure of a data center in a cloud computing system is shown in Figure 1. Inside the data center, there are a number of physical servers, each hosting one or more Virtual Machines (VMs) according to its resource capacity. Note that the figure shows only the VMs, not the physical servers. An ISP rents VMs from the cloud provider to serve its end users. To reduce the request response time, the data center often employs a shared queue structure.

The arrival rate of end users varies over time, which induces a time-varying VM instance demand. Figure 2 presents an example that divides a day into 8 phases (3 hr/phase); the y-axis shows the VM instance demand needed to ensure the QoS requirement in each phase. The marginal rental costs in Amazon EC2 are given in Table 1. From Figure 2, we can see a large gap between the maximum and the minimum instance demand. If the ISP uses only RIS instances, it must acquire 23 instances to satisfy the peak workload in the 6th phase, which wastes a lot of resources and raises the daily instance rental cost to 247.96 dollars (the rental cost of using only RIS instances is the product of the number of instances, the marginal cost, and the 24 hours of a day, i.e., $23 \times c_R \times 24$). In contrast, if the ISP uses only MIS instances, it obtains the highest resource utilization, and the daily rental cost can be reduced to 230.52 dollars (since a phase contains 3 hours, the rental cost of using only MIS instances is $3\,c_M \sum_{i=1}^{8} n_i$, where $n_i$ is the instance demand of phase $i$ read from Figure 2).

On the other hand, if the ISP uses a hybrid approach that combines RIS and MIS, the daily instance rental cost can be reduced remarkably. To see this, consider a resource provisioning policy that rents 10 RIS VM instances and acquires extra MIS instances whenever the RIS instances are insufficient. The number of MIS instances in phase $i$ is $m_i = \max(n_i - 10,\, 0)$, where $n_i$ denotes the VM instance demand in phase $i$. The daily rental cost of this hybrid approach is $10 \times c_R \times 24 + 3\,c_M \sum_{i=1}^{8} m_i$, where the first term is the RIS cost and the second is the MIS cost; this saves 24.3% and 18.8% compared with using purely RIS and purely MIS instances, respectively.
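The three policies compared above can be sketched in a few lines of Python. The demand profile and the hourly prices below are hypothetical placeholders (Figure 2 and Table 1 are not reproduced here); only the cost structure follows the text: peak-sized RIS for the whole day, per-phase MIS, and a fixed RIS base topped up with MIS.

```python
def ris_only_cost(demand, c_r, hours_per_day=24):
    """Rent enough RIS instances for the peak phase during the whole day."""
    return max(demand) * c_r * hours_per_day


def mis_only_cost(demand, c_m, phase_hours):
    """Rent exactly n_i MIS instances in every phase i."""
    return sum(demand) * c_m * phase_hours


def hybrid_cost(demand, reserved, c_r, c_m, phase_hours, hours_per_day=24):
    """Keep `reserved` RIS instances all day and add m_i = max(n_i - reserved, 0) MIS instances."""
    mis = [max(n - reserved, 0) for n in demand]
    return reserved * c_r * hours_per_day + sum(mis) * c_m * phase_hours


# Hypothetical 8-phase profile (3 h per phase) peaking at 23 instances in phase 6,
# with hypothetical hourly prices for RIS and MIS instances.
demand = [6, 5, 8, 12, 18, 23, 15, 9]
c_r, c_m = 0.45, 0.80
print(round(ris_only_cost(demand, c_r), 2))
print(round(mis_only_cost(demand, c_m, phase_hours=3), 2))
print(round(hybrid_cost(demand, 10, c_r, c_m, phase_hours=3), 2))
```

With these placeholder numbers the hybrid policy is the cheapest of the three, mirroring the qualitative conclusion of the example; the actual savings depend on the real demand profile and prices.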

The above analysis relies on two prerequisites. First, the QoS performance in terms of percentile delay must be precisely predicted; second, the numbers of RIS and MIS instances must be determined so as to minimize the VM instance rental cost. The following sections address these two issues in detail.

4. A General Optimization Framework for the CDIP Problem

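The notations used in this paper are listed in the Notations section. The CDIP problem can be formulated as

$$\min_{R,\,m_1,\dots,m_N} \; \tau \sum_{i=1}^{N} \left( c_R R + c_M m_i \right) \tag{2}$$

$$\text{subject to} \quad \Pr\{d_i > D\} \le \varepsilon, \quad i = 1, \dots, N, \tag{3}$$

where $R \in \mathbb{Z}_{\ge 0}$ is the number of RIS instances, $m_i \in \mathbb{Z}_{\ge 0}$, $i = 1, \dots, N$, is the number of MIS instances in phase $i$, and $\tau = 24/N$ is the length of a phase in hours.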
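The sketch below evaluates objective (2) and checks constraint (3) for a candidate policy $(R, m_1, \dots, m_N)$. It assumes a hypothetical callback `vp(i, k)` that returns an estimate of $\Pr\{d_i > D\}$ when $k$ instances serve phase $i$ (in this paper such an estimate is produced by the learning algorithm of Section 4.1). Phases are indexed from 0 in the code.

```python
from typing import Callable, Sequence


def cdip_objective(R: int, m: Sequence[int], c_r: float, c_m: float, tau: float) -> float:
    """Daily rental cost: tau * sum over phases of (c_R * R + c_M * m_i)."""
    return tau * sum(c_r * R + c_m * m_i for m_i in m)


def cdip_feasible(R: int, m: Sequence[int], vp: Callable[[int, int], float], eps: float) -> bool:
    """Percentile constraint: the estimated violation probability with R + m_i instances is <= eps."""
    return all(vp(i, R + m_i) <= eps for i, m_i in enumerate(m))


# Hypothetical two-phase example: phase 0 needs 3 instances, phase 1 needs 5.
need = [3, 5]


def toy_vp(i: int, k: int) -> float:
    """Toy violation-probability estimate: 0 once the phase's need is met, 20% otherwise."""
    return 0.0 if k >= need[i] else 0.2


print(cdip_feasible(3, [0, 2], toy_vp, eps=0.05))              # True
print(cdip_feasible(3, [0, 1], toy_vp, eps=0.05))              # False
print(cdip_objective(3, [0, 2], c_r=0.5, c_m=1.0, tau=12.0))   # 60.0
```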

Note that, in the CDIP problem, the distribution of $d_i$ is determined by the characteristics of the exogenous interactive workload arrivals and the number of active VM instances $R + m_i$. As stated in Section 1, this problem is hard to solve, since an explicit form of constraint (3) can hardly be derived. In this section, we show how to characterize constraint (3) approximately and obtain the optimal solution.

4.1. A Learning Algorithm to Characterize the Percentile QoS Constraint in Self-Similar Traffic

Algorithm 1 learns the performance of various instance provisioning policies, in the form of percentile delay, via the stochastic gradient method. The algorithm first creates a data structure called VP_table (Violation Probability Table), in which each item VP_table[i][k] estimates the delay violation probability when $k$ instances are used in phase $i$. The algorithm runs for several iterations to obtain unbiased delay violation probability samples p[i][k] for each phase $i$. These samples, which can be generated by running the real system or by simulation, are smoothed into VP_table[i][k], so that VP_table[i][k] estimates the delay violation probability with $k$ VM instances in phase $i$. Variables $j$, $i$, and $k$ are the iteration counter, decision point counter, and instance number counter, respectively. Algorithm 1 has the following property.

Algorithm 1
Input: J, N, and SLA specification (D, ε); {J is the number of iterations and N is the
number of decision points in a day.}
Output: VP_table;
(1) Create VP_table and initialize each item in VP_table to 0;
(2) Create p[ ][ ] and counter; {p[i][k] is a sample of the QoS violation ratio of using k VM
    instances in phase i, and counter logs the number of delay violations in a phase.}
(3) for j = 1 to J do
(4)  for i = 1 to N do
(5)   for k = MIN_NUM to MAX_NUM do
(6)    counter = 0; log the response time d of each incoming request;
(7)    if d > D then
(8)      counter = counter + 1;
(9)    end if
(10)   Calculate an unbiased sample of the delay violation probability p[i][k] = counter / A_ij,
       where A_ij is the total number of requests arrived in phase i, iteration j;
(11)   VP_table[i][k] = VP_table[i][k] + (p[i][k] − VP_table[i][k]) / j;
(12)   end for {Loop k}
(13)  end for {Loop i}
(14) end for {Loop j}
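A minimal Python sketch of Algorithm 1 follows. The hypothetical callback `run_phase(i, k)` stands in for running the real system or a simulator in phase $i$ with $k$ instances and returns the logged response delays. The toy exponential-delay model used for the demonstration is an assumption for illustration only and does not model self-similar traffic.

```python
import random


def learn_vp_table(run_phase, n_phases, k_min, k_max, D, iterations):
    """Running-average estimate of Pr{d > D} for every (phase, instance count) pair."""
    # vp[i][k - k_min] plays the role of VP_table[i][k] (line 1: all zeros).
    vp = [[0.0] * (k_max - k_min + 1) for _ in range(n_phases)]
    for j in range(1, iterations + 1):               # line 3: iteration counter
        for i in range(n_phases):                    # line 4: decision points
            for k in range(k_min, k_max + 1):        # line 5: instance counts
                delays = run_phase(i, k)             # line 6: logged response times
                counter = sum(1 for d in delays if d > D)   # lines 7-9
                p = counter / len(delays)            # line 10: sample p[i][k]
                idx = k - k_min
                vp[i][idx] += (p - vp[i][idx]) / j   # line 11: running average
    return vp


# Toy stand-in for the system: exponential delays whose mean load[i] / k shrinks with k.
rng = random.Random(42)
load = [2.0, 4.0]


def toy_run_phase(i, k, n_requests=200):
    return [rng.expovariate(k / load[i]) for _ in range(n_requests)]


vp_table = learn_vp_table(toy_run_phase, n_phases=2, k_min=1, k_max=8, D=1.0, iterations=10)
for i, row in enumerate(vp_table):
    print(i, [round(p, 3) for p in row])
```

Under this toy model each row should roughly follow $e^{-k/\text{load}_i}$, the exact tail probability of an exponential delay with mean $\text{load}_i / k$, which makes the learned table easy to sanity-check.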

Proposition 1. Algorithm 1 converges to an unbiased estimate of the percentile QoS performance of using $k$ VM instances in phase $i$.

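Proof. Let $\mathrm{VP}^{(j)}[i][k]$ denote the value of VP_table[i][k] after iteration $j$, and let $p^{(j)}[i][k]$ denote the sample obtained in iteration $j$. Since $\mathrm{VP}^{(0)}[i][k] = 0$, the right-hand side of line 11 in Algorithm 1 can be rewritten by induction as

$$\mathrm{VP}^{(j)}[i][k] = \Big(1 - \frac{1}{j}\Big)\mathrm{VP}^{(j-1)}[i][k] + \frac{1}{j}\,p^{(j)}[i][k] = \frac{1}{j}\sum_{l=1}^{j} p^{(l)}[i][k]. \tag{4}$$

That is, $\mathrm{VP}^{(j)}[i][k]$ is the mean value of all samples up to iteration $j$. Since each $p^{(l)}[i][k]$ is an unbiased sample of the percentile QoS performance metric, their mean is also unbiased; as long as the end-user request arrival process and the service process in phase $i$ with $k$ VM instances are stationary stochastic processes, $\mathrm{VP}^{(j)}[i][k]$ converges to the percentile QoS performance as $j \to \infty$.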
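The rewriting used in the proof can be checked numerically: applying the update of line 11 repeatedly yields the sample mean of the samples seen so far, up to floating-point rounding. The sample values below are hypothetical.

```python
def running_average(samples):
    """Apply VP <- VP + (p - VP) / j for j = 1, 2, ... and record every intermediate value."""
    vp, history = 0.0, []
    for j, p in enumerate(samples, start=1):
        vp += (p - vp) / j
        history.append(vp)
    return history


samples = [0.08, 0.02, 0.05, 0.03, 0.07]   # hypothetical violation-ratio samples
history = running_average(samples)
for j, vp in enumerate(history, start=1):
    batch_mean = sum(samples[:j]) / j
    assert abs(vp - batch_mean) < 1e-12    # running average equals mean of first j samples
print(round(history[-1], 6))               # 0.05
```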

In practice, it is impossible to let $j \to \infty$. In fact, Algorithm 1 converges very fast in our numerical analysis (within tens of iterations). Alternatively, the following stop criterion can be used:

$$\max_{i,k} \left| \mathrm{VP}^{(j)}[i][k] - \mathrm{VP}^{(j-1)}[i][k] \right| < \delta, \tag{5}$$

where $\delta$ is a threshold value chosen to obtain the desired precision.
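A stop criterion of this kind can be sketched for a single table entry as follows; over the whole table, the maximum change across all entries would be compared with $\delta$ instead. The uniform sampler is a hypothetical stand-in for the per-iteration sample of Algorithm 1. Because a single small step can occur by chance, applying the check to all entries at once, as the maximum does, is the more conservative choice.

```python
import random


def learn_until_converged(sample, delta, max_iter=10_000):
    """Running-average update of one VP_table entry, stopped once |VP^(j) - VP^(j-1)| < delta."""
    vp = 0.0
    for j in range(1, max_iter + 1):
        new_vp = vp + (sample() - vp) / j
        # Skip j = 1: the jump from the initial value 0 to the first sample is not informative.
        if j > 1 and abs(new_vp - vp) < delta:
            return new_vp, j
        vp = new_vp
    return vp, max_iter


rng = random.Random(7)
estimate, iterations = learn_until_converged(lambda: rng.uniform(0.03, 0.07), delta=1e-3)
print(f"estimate={estimate:.4f} after {iterations} iterations")
```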

4.2. The Instance Provisioning Algorithm

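Based on VP_table, we can obtain the minimum number of VM instances needed to meet the QoS constraint in phase $i$, that is, $n_i = \min\{k : \text{VP\_table}[i][k] \le \varepsilon\}$. Finding the number of RIS instances $R$ is then equivalent to solving the following optimization problem:

$$\min_{R,\,m_1,\dots,m_N} \; \tau \sum_{i=1}^{N} \left( c_R R + c_M m_i \right) \tag{6}$$

$$\text{subject to} \quad R + m_i \ge n_i, \quad R,\, m_i \in \mathbb{Z}_{\ge 0}, \quad i = 1, \dots, N, \tag{7}$$

where the delay is considered as a function of the number of VM instances, so that constraint (3) is satisfied in phase $i$ whenever $R + m_i \ge n_i$.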

Substituting $m_i = \max(n_i - R, 0)$ into (6) turns problem (6)-(7) into the minimization of a piecewise-linear function of the integer $R$ whose breakpoints are $n_1, \dots, n_N$; hence an optimal solution must appear at one of these boundary points (or at $R = 0$). Algorithm 2 provides the solution method for problem (6)-(7). It can be divided into three parts as follows: (i) the first part (lines 1–8) uses exhaustive search to obtain the minimum number of VM instances required to satisfy the QoS constraint in each phase and stores the result in the vector $\mathbf{n} = (n_1, \dots, n_N)$; (ii) the second part (lines 9–17) solves problem (6)-(7), and the result is $R^*$, the optimal number of RIS instances, together with the corresponding value of the objective function $f^*$; (iii) the third part (lines 18–20) computes the number of MIS instances based on $\mathbf{n}$ and $R^*$.
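The breakpoint argument can be checked by brute force. The sketch below substitutes $m_i = \max(n_i - R, 0)$ into the rental-cost objective and compares a search over every integer $R$ with a search restricted to $R = 0$ and the breakpoints $n_i$. The demand vector, prices, and phase length are hypothetical.

```python
def total_cost(R, demand, c_r, c_m, tau):
    """Rental-cost objective after substituting m_i = max(n_i - R, 0)."""
    return tau * sum(c_r * R + c_m * max(n - R, 0) for n in demand)


demand = [6, 5, 8, 12, 18, 23, 15, 9]   # hypothetical n_i, one per phase
c_r, c_m, tau = 0.45, 0.80, 3.0          # hypothetical prices and phase length (hours)


def cost(R):
    return total_cost(R, demand, c_r, c_m, tau)


best_bruteforce = min(range(max(demand) + 1), key=cost)
best_breakpoint = min([0] + sorted(set(demand)), key=cost)
print(best_bruteforce, best_breakpoint)   # 9 9
```

Because the slope of the objective changes only at the breakpoints, both searches return the same $R$; restricting the search to the $n_i$ is what lets the algorithm avoid scanning every integer.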

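Algorithm 2
Input: VP_table;
Output: R*, m_i; {R* is the number of RIS instances, and m_i is the number
of MIS instances in phase i.}
(1) for i = 1 to N do
(2)  for k = MIN_NUM to MAX_NUM do
(3)   if VP_table[i][k] ≤ ε and VP_table[i][k−1] > ε then
(4)     n_i = k;
(5)    break;
(6)   end if
(7)  end for
(8) end for
(9) R* = 0;
(10) f* = τ c_M (n_1 + ... + n_N);
(11) for i = 1 to N do
(12)   R = n_i;
(13)   f = τ Σ_{l=1..N} [c_R R + c_M max(n_l − R, 0)];
(14)  if f < f* then
(15)    R* = R; f* = f;
(16)  end if
(17) end for
(18) for i = 1 to N do
(19)   m_i = max(n_i − R*, 0);
(20) end for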
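A compact Python sketch of the three-part provisioning procedure is given below. It assumes the table is stored as a list of rows, with `vp[i][k - k_min]` estimating the violation probability with $k$ instances in phase $i$, and that prices and the phase length are known. The toy table in the usage lines is hypothetical.

```python
def provision(vp, k_min, eps, c_r, c_m, tau):
    """Return (R*, [m_i], [n_i]) for a violation-probability table vp[i][k - k_min]."""
    # Part 1: smallest instance count whose estimate meets the SLA in each phase.
    demand = []
    for row in vp:
        n_i = next((k_min + idx for idx, p in enumerate(row) if p <= eps), None)
        if n_i is None:
            raise ValueError("no instance count in the table meets the SLA")
        demand.append(n_i)

    def cost(R):
        # Rental cost with m_i = max(n_i - R, 0) substituted.
        return tau * sum(c_r * R + c_m * max(n - R, 0) for n in demand)

    # Part 2: evaluate R = 0 and every breakpoint R = n_i.
    best_R, best_cost = 0, cost(0)
    for R in demand:
        c = cost(R)
        if c < best_cost:
            best_R, best_cost = R, c

    # Part 3: MIS instances cover the remaining demand in each phase.
    mis = [max(n - best_R, 0) for n in demand]
    return best_R, mis, demand


toy_vp = [
    [0.30, 0.10, 0.04, 0.01, 0.00, 0.00],
    [0.50, 0.30, 0.20, 0.08, 0.03, 0.00],
    [0.20, 0.04, 0.01, 0.00, 0.00, 0.00],
]
print(provision(toy_vp, k_min=1, eps=0.05, c_r=0.45, c_m=0.80, tau=1.0))  # (3, [0, 2, 0], [3, 5, 2])
```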

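The worst-case time complexity of Algorithm 2 is $O(NK + N^2)$, where $K = \text{MAX\_NUM} - \text{MIN\_NUM} + 1$: lines 1–8 scan at most $K$ table entries per phase, and each of the $N$ iterations of lines 11–17 evaluates an $N$-term objective.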

5. Extensions

Algorithms 1 and 2 can effectively predict the number of instances needed to satisfy the QoS constraints and reduce the total rental cost for ISPs. However, their scalability is questionable: to obtain a precise estimate of the violation probability in VP_table, we must visit all possible instance provisioning policies and collect sufficient violation probability samples for each. This section simplifies VP_table with function approximation techniques to enhance the scalability of Algorithms 1 and 2.

The idea of function approximation is to use a function $\hat{p}_i(k)$ to approximate the mapping between the number of instances $k$ and the violation probability in phase $i$. In this paper, we use two forms of approximation: (i) a linear approximation given by

$$\hat{p}_i(k) = a_i k + b_i, \tag{8}$$

and (ii) a nonlinear approximation given by

$$\hat{p}_i(k) = a_i e^{b_i k}. \tag{9}$$

Note that the function $\hat{p}_i$ is associated with a certain phase $i$; therefore the parameters $a_i$ and $b_i$ carry the subscript $i$. We make two further remarks on these function approximations: (1) intuitively, the QoS violation probability decreases as more VM instances are used; that is, $\hat{p}_i(k)$ is a decreasing function of $k$, and therefore $b_i$ must be negative in the nonlinear case; (2) the violation probability samples will all be 0 once $k$ exceeds a certain threshold, since no QoS violations occur when there is a sufficient number of VM instances. When using the linear approximation, we should filter out samples with zero violation probability; otherwise, the estimation precision will be remarkably undermined for the cases where the violation probability is positive.

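We use the least squares approach to obtain the parameters $a_i$ and $b_i$ of the approximate function $\hat{p}_i(k)$. Formally, the least squares approach is given by

$$\min_{a_i,\,b_i} \; \sum_{s=1}^{S} \left( \hat{p}_i(k_s) - p_s \right)^2, \tag{10}$$

where $S$ is the number of samples and $p_s$ is the $s$th unbiased sample of the violation probability with $k_s$ instances in phase $i$.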

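For the linear approximation, the optimal solution should satisfy

$$\sum_{s=1}^{S} \left( a_i k_s + b_i - p_s \right) k_s = 0, \qquad \sum_{s=1}^{S} \left( a_i k_s + b_i - p_s \right) = 0. \tag{11}$$

Rearranging these two equations, we have

$$a_i \sum_{s=1}^{S} k_s^2 + b_i \sum_{s=1}^{S} k_s = \sum_{s=1}^{S} k_s p_s, \qquad a_i \sum_{s=1}^{S} k_s + S\, b_i = \sum_{s=1}^{S} p_s. \tag{12}$$

The above analysis suggests

$$a_i = \frac{S \sum_{s} k_s p_s - \sum_{s} k_s \sum_{s} p_s}{S \sum_{s} k_s^2 - \left(\sum_{s} k_s\right)^2}, \qquad b_i = \frac{\sum_{s} p_s - a_i \sum_{s} k_s}{S}. \tag{13}$$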
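The closed-form linear least-squares solution translates directly into code. The minimal sketch below assumes that the zero-probability samples have already been filtered out, as recommended in remark (2). The sample points are hypothetical and lie exactly on a line, so the fit can be checked by eye.

```python
def fit_linear(ks, ps):
    """Closed-form least-squares fit of p ~ a*k + b."""
    S = len(ks)
    sk, sp = sum(ks), sum(ps)
    skk = sum(k * k for k in ks)
    skp = sum(k * p for k, p in zip(ks, ps))
    denom = S * skk - sk * sk
    if denom == 0:
        raise ValueError("need samples at two or more distinct instance counts")
    a = (S * skp - sk * sp) / denom      # slope a_i
    b = (sp - a * sk) / S                # intercept b_i
    return a, b


ks = [1, 2, 3, 4]
ps = [0.9, 0.7, 0.5, 0.3]          # hypothetical samples on p = -0.2 k + 1.1
a, b = fit_linear(ks, ps)
print(round(a, 6), round(b, 6))    # -0.2 1.1
```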

For the nonlinear approximation, let $y = \ln \hat{p}_i(k)$, $x = k$, $A_i = \ln a_i$, and $B_i = b_i$, and take “$\ln$” on both sides of (9), which transforms the nonlinear approximation into the linear approximation

$$y = B_i x + A_i. \tag{14}$$

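Following the idea of the linear approximation, with $y_s = \ln p_s$ (which requires $p_s > 0$), we obtain the solution for the nonlinear approximation as

$$b_i = B_i = \frac{S \sum_{s} k_s y_s - \sum_{s} k_s \sum_{s} y_s}{S \sum_{s} k_s^2 - \left(\sum_{s} k_s\right)^2}, \qquad a_i = e^{A_i} = \exp\!\left( \frac{\sum_{s} y_s - b_i \sum_{s} k_s}{S} \right). \tag{15}$$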
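Likewise, the nonlinear fit amounts to an ordinary least-squares line fitted to the points $(k_s, \ln p_s)$. The sketch assumes the exponential form $\hat{p}_i(k) = a_i e^{b_i k}$ and strictly positive samples. Note that it minimizes the squared error in log space rather than the squared error of $\hat{p}_i$ itself. The synthetic samples are hypothetical.

```python
import math


def fit_exponential(ks, ps):
    """Fit p ~ a * exp(b * k) by least squares on ln p (log-linear transform)."""
    if any(p <= 0 for p in ps):
        raise ValueError("ln p is undefined for p <= 0; filter out zero samples first")
    ys = [math.log(p) for p in ps]
    S = len(ks)
    sk, sy = sum(ks), sum(ys)
    skk = sum(k * k for k in ks)
    sky = sum(k * y for k, y in zip(ks, ys))
    denom = S * skk - sk * sk
    if denom == 0:
        raise ValueError("need samples at two or more distinct instance counts")
    b = (S * sky - sk * sy) / denom      # slope B_i = b_i
    A = (sy - b * sk) / S                # intercept A_i = ln a_i
    return math.exp(A), b


ks = [1, 2, 3, 4, 5]
ps = [0.8 * math.exp(-0.5 * k) for k in ks]   # hypothetical noise-free samples
a, b = fit_exponential(ks, ps)
print(round(a, 6), round(b, 6))               # 0.8 -0.5
```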

We integrate the function approximations into Algorithms 1 and 2 by replacing VP_table with an array func_app[]. Each item func_app[i] contains two elements, namely $a_i$ and $b_i$. With function approximations, some revisions are needed for Algorithms 1 and 2, which are shown in Table 5.
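With func_app[] in place of VP_table, the minimum instance count of each phase can be obtained by inverting the fitted function instead of scanning a table. The exact revisions of Table 5 are not reproduced here, so the sketch below is only one plausible way to do this. It assumes $a_i < 0$ in the linear case, and $a_i > 0$, $b_i < 0$ in the exponential case $\hat{p}_i(k) = a_i e^{b_i k}$. The function names and parameter values are hypothetical.

```python
import math


def min_instances_linear(a, b, eps, k_min=1):
    """Smallest integer k >= k_min with a*k + b <= eps (requires a < 0)."""
    if a >= 0:
        raise ValueError("expected a decreasing fit (a < 0)")
    # a*k + b <= eps  <=>  k >= (eps - b) / a, since dividing by a < 0 flips the inequality.
    return max(k_min, math.ceil((eps - b) / a))


def min_instances_exponential(a, b, eps, k_min=1):
    """Smallest integer k >= k_min with a*exp(b*k) <= eps (requires a > 0, b < 0)."""
    if a <= 0 or b >= 0:
        raise ValueError("expected a > 0 and b < 0")
    # a*exp(b*k) <= eps  <=>  b*k <= ln(eps / a)  <=>  k >= ln(eps / a) / b.
    return max(k_min, math.ceil(math.log(eps / a) / b))


print(min_instances_linear(-0.2, 1.1, eps=0.05))       # 6
print(min_instances_exponential(0.8, -0.5, eps=0.05))  # 6
```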

6. Evaluations

6.1. Simulation Setup

Internet traffic shows a strong self-similar property [3, 15]. We use the Multiscale Markov-Modulated Poisson Process (MMPP) model to generate self-similar-like traffic. This approach has been proved effective in previous research [16–18] and has been successfully applied in works such as [19–22]. Following [22], we use a three-dimensional Markov on-off modulated Poisson process to generate the interactive workload arrivals, as follows: (i) the first dimension is the workload burst on the order of 1 second; we assume that the peak workload arrives in the middle of the day, that is, at the 43200th second, so the arrival rate is a function of time $t$ that peaks at $t = 43200$ s; (ii) the second dimension of workload burst is 2000 requests per 5 seconds; (iii) the last dimension of workload burst is 5000 requests per 10 seconds.
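For readers who want to experiment with traffic of this kind, the sketch below generates arrivals from a two-state (on/off) Markov-modulated Poisson process and superposes several such sources. It is a generic illustration, not the exact generator of [22]: the rates and mean on/off durations are hypothetical, and the daily trend around the 43200th second is omitted.

```python
import random


def on_off_mmpp(rate_on, rate_off, mean_on, mean_off, horizon, rng):
    """Arrival times on [0, horizon) of a two-state (on/off) Markov-modulated Poisson process."""
    t, on, arrivals = 0.0, False, []
    while t < horizon:
        # The sojourn time in the current state is exponential with the given mean.
        stay = rng.expovariate(1.0 / (mean_on if on else mean_off))
        end = min(t + stay, horizon)
        rate = rate_on if on else rate_off
        if rate > 0:
            # Poisson arrivals at the current rate; memorylessness lets us restart at t.
            s = t + rng.expovariate(rate)
            while s < end:
                arrivals.append(s)
                s += rng.expovariate(rate)
        t, on = end, not on
    return arrivals


def superpose(*streams):
    """Merge independent arrival streams into one time-ordered stream."""
    return sorted(a for stream in streams for a in stream)


rng = random.Random(2013)
horizon = 60.0                                            # seconds
base = on_off_mmpp(50, 50, 1.0, 1.0, horizon, rng)        # equal rates: plain Poisson at 50 req/s
burst_a = on_off_mmpp(400, 0, 5.0, 30.0, horizon, rng)    # hypothetical 5 s bursts
burst_b = on_off_mmpp(500, 0, 10.0, 60.0, horizon, rng)   # hypothetical 10 s bursts
arrivals = superpose(base, burst_a, burst_b)
print(len(base), len(arrivals))
```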

6.2. Estimation of the Response Time

In a production cloud system, it is impractical to log the response time of every incoming request to calculate the delay violation probability. A more practical way is to measure the mean response delay $\bar{d}$ in a small time slot and take $\bar{d}$ as the response delay of all requests that arrive in that slot. This approximation becomes more accurate as the length of the time slot decreases. For example, in [23], the length of the time slot is set to 10 minutes. In our work, we set it to 10 seconds, since we need to measure the delay violation probability with higher precision.
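The slot-based approximation can be sketched as follows. Every request that arrives in a slot is assigned that slot's mean delay, so the violation probability becomes the fraction of requests that fall in slots whose mean delay exceeds $D$. The per-slot values below are hypothetical.

```python
def slot_violation_probability(slot_mean_delays, slot_counts, D):
    """Approximate Pr{d > D} by assigning each request the mean delay of its time slot."""
    total = sum(slot_counts)
    if total == 0:
        raise ValueError("no requests observed")
    violating = sum(n for d_bar, n in zip(slot_mean_delays, slot_counts) if d_bar > D)
    return violating / total


# Hypothetical 10 s slots: mean delay (s) and number of requests in each slot.
means = [0.20, 0.60, 0.40, 0.70]
counts = [100, 50, 100, 50]
print(round(slot_violation_probability(means, counts, D=0.5), 4))   # 0.3333
```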

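To estimate the mean response time in a time slot, we employ the Allen–Cunneen approximation formula [24, 25] for the G/G/c queueing system:

$$\bar{d} \approx \frac{1}{\mu} + \frac{P_c}{\mu} \cdot \frac{C_A^2 + C_S^2}{2c(1 - \rho)}, \tag{16}$$

where $\bar{d}$ is the average response time, $\mu$ is the average service rate, $\lambda$ is the average arrival rate, $\rho = \lambda / (c\mu)$ is the average utilization of a server, and $c$ is the number of servers. $P_c$, an approximation of the probability that an arriving request has to wait, takes its value from the following formula:

$$P_c \approx \begin{cases} \dfrac{\rho^c + \rho}{2}, & \rho > 0.7, \\[4pt] \rho^{(c+1)/2}, & \rho \le 0.7. \end{cases} \tag{17}$$

$C_A$ and $C_S$ are the coefficients of variation of the request interarrival times and service times, respectively.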
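A direct Python transcription of the Allen–Cunneen approximation is sketched below. It uses the standard form $\bar d \approx 1/\mu + \frac{P_c}{\mu}\cdot\frac{C_A^2 + C_S^2}{2c(1-\rho)}$ together with the piecewise approximation of $P_c$. As a sanity check, for an M/M/1 queue ($c = 1$, $C_A = C_S = 1$) it reduces exactly to $1/(\mu - \lambda)$. The numeric inputs are hypothetical.

```python
def waiting_probability(rho, c):
    """Piecewise approximation of P_c, the probability that an arriving request must wait."""
    return (rho ** c + rho) / 2 if rho > 0.7 else rho ** ((c + 1) / 2)


def allen_cunneen_response_time(lam, mu, c, ca, cs):
    """Approximate mean response time of a G/G/c queue (Allen-Cunneen).

    ca and cs are the coefficients of variation of interarrival and service times.
    """
    rho = lam / (c * mu)                      # average per-server utilization
    if not 0 <= rho < 1:
        raise ValueError("unstable queue: rho must be in [0, 1)")
    p_wait = waiting_probability(rho, c)
    wait = p_wait / mu * (ca ** 2 + cs ** 2) / (2 * c * (1 - rho))
    return 1 / mu + wait


print(allen_cunneen_response_time(lam=8, mu=10, c=1, ca=1, cs=1))      # ~0.5 = 1 / (10 - 8)
print(allen_cunneen_response_time(lam=80, mu=10, c=10, ca=1.5, cs=1))
```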

In this paper, we assume a Poisson service process with $\mu$ requests per second; therefore $C_S = 1$. To estimate $C_A$ online, we further divide a time slot into $W$ time windows of length $\Delta$ (see Figure 3). The algorithm to estimate $C_A$ is shown in Algorithm 3.

Algorithm 3
(1) for w = 1 to W do
(2)   Measure a_w; {a_w is the number of request arrivals in time window w.}
(3)   t_w = Δ / a_w; {Estimate the average inter-arrival time in time window w.}
(4)   A = A + a_w; {A, initialized to 0, logs the accumulative total number of requests in this time slot.}
(5) end for
(6) t̄ = (Σ_{w=1..W} a_w t_w) / A; {Estimate the average inter-arrival time in the time slot.}
(7) σ = sqrt( Σ_{w=1..W} a_w (t_w − t̄)² / A ); {Estimate the standard deviation of the inter-arrival time.}
(8) C_A = σ / t̄;
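The window-based estimate of $C_A$ can be sketched in Python as below. It assumes per-window arrival counts and a common window length $\Delta$. Each window is weighted by its number of arrivals, and windows without arrivals are skipped; skipping them is an assumption of this sketch, since their mean inter-arrival time is undefined. The counts in the usage lines are hypothetical.

```python
import math


def estimate_ca(window_counts, window_length):
    """Coefficient of variation of inter-arrival times in one time slot, from per-window counts."""
    # Mean inter-arrival time in each non-empty window w: window_length / a_w.
    pairs = [(a, window_length / a) for a in window_counts if a > 0]
    total = sum(a for a, _ in pairs)                            # total arrivals A
    if total == 0:
        raise ValueError("no arrivals in this time slot")
    mean = sum(a * t for a, t in pairs) / total                 # arrival-weighted mean
    var = sum(a * (t - mean) ** 2 for a, t in pairs) / total    # arrival-weighted variance
    return math.sqrt(var) / mean


print(estimate_ca([10, 10, 10], 1.0))           # 0.0 (identical windows, no variability)
print(round(estimate_ca([5, 20], 1.0), 6))      # 0.75
```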

6.3. Result Analysis
6.3.1. Cost of Various Instance Provisioning Policies

In this experiment, the length of a phase is set to 1 hr. Algorithms 1 and 2 yield an optimal number of 29 RIS instances. Figure 4 shows the cost of three instance provisioning policies: (i) in the RIS mode, the ISP must rent 37 instances in every hour of the day, since the system must satisfy the peak workload demand; this policy costs 408.576 dollars per day; (ii) in the MIS mode, the ISP makes an instance provisioning decision in each hour according to the predicted demand, so resource utilization is the highest; unfortunately, the total daily cost (514.08 dollars) is even higher than that of the RIS mode; (iii) in the hybrid mode, the optimal number of RIS instances is 29; although this wastes a little resource in some hours, the daily cost of this policy is the lowest (360.768 dollars).
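The reported daily costs imply the following relative savings of the hybrid policy; this is simple arithmetic on the three figures above.

```python
def relative_saving(baseline, hybrid):
    """Fraction of the baseline daily cost saved by the hybrid policy."""
    return (baseline - hybrid) / baseline


daily_cost = {"RIS": 408.576, "MIS": 514.08, "hybrid": 360.768}   # dollars per day, from the text
for mode in ("RIS", "MIS"):
    saving = relative_saving(daily_cost[mode], daily_cost["hybrid"])
    print(f"hybrid vs {mode}: {saving:.1%}")
# hybrid vs RIS: 11.7%
# hybrid vs MIS: 29.8%
```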

6.3.2. Effects of the Rental Granularity

The length of a phase (or interdecision time) in Amazon EC2 is 1 hour. Here, we vary the length to 2 and 3 hours to study its impact on the daily cost. Figure 5 plots the optimal number of reserved instances in each hour; the curve becomes “smoother” as the interdecision time becomes longer. For example, over the same two time intervals, the mean numbers of reserved instances are 30 and 34 at the 3 hr granularity, compared with 29.67 and 33 at the 1 hr granularity. This implies that the instance provisioning policy becomes more flexible as the interdecision time decreases.

Figure 6 presents the total cost for the three rental granularities. As expected, the total cost increases with the length of the interdecision time. However, this increase is not linear; that is, the marginal cost reduction shrinks as the interdecision time becomes smaller. In production systems, a small interdecision time may induce additional system overhead; therefore, there is a tradeoff between the rental cost and the system overhead.

Figure 7 shows the impact of rental granularity on the delay violation probability. With the instance provisioning policies generated by Algorithms 1 and 2, the target SLA specification is satisfied at all three rental granularities. A more detailed comparison is provided in Table 2. The mean delay violation probabilities at the 1 hr and 2 hr granularities are very close to each other, whereas the one at the 3 hr granularity is relatively small, implying that more resources are reserved. On the other hand, the standard deviation of the delay violation probability decreases as the interdecision time becomes smaller. Since a small standard deviation implies a more stable response delay, we recommend the 1 hr granularity rental policy for delay- and jitter-sensitive applications such as VoIP and video streaming.

6.3.3. Effects of Function Approximations

Here, we evaluate the effectiveness of the two function approximation approaches at the 1 hr rental granularity. The parameters $a_i$ and $b_i$ obtained using (13) and (15) for all phases are shown in Table 3. In particular, the results for the first hour are plotted in Figure 8, where we can see that the nonlinear approximation is more accurate than the linear approximation. Figure 9 shows the estimated VM instance demands: the linear approximation tends to overestimate the demand by 2–4 instances, and the nonlinear approximation underestimates it by 0–1 instances. Figure 10 shows the delay violation probability. Using the VP_table structure, the delay violation probability is around 4%. The linear approximation reduces the delay violation probability to about 1% because it reserves more instances. By contrast, with the nonlinear approximation, the delay violation probabilities in 13 of the 24 phases exceed the 5% target, and they even exceed 9% in the 10th and 16th phases.

The basic instance provisioning algorithm makes the best resource–SLA tradeoff but suffers from the scalability problem. The two function approximation approaches only need to estimate two parameters per phase. They visit fewer instance provisioning policies and avoid the lookup table structure (VP_table); thus, the scalability of Algorithms 1 and 2 is enhanced. Their effectiveness, however, depends on how well the function approximates the behavior of VP_table. A poor approximation may deviate severely from VP_table and generate a wrong instance provisioning policy that either degrades performance or increases the rental cost. Figures 11 and 12 present the number of RIS instances and the total daily rental cost. The number of RIS instances in the VP_table approach is the same as that in the nonlinear approximation approach (29 VMs). The linear approximation approach, although it achieves a lower delay violation probability, overestimates the VM instance demand too much (33 VMs).

To further evaluate the two function approximation approaches, we define the instance deviation $\Delta_I$ and the violation probability deviation $\Delta_P$, which measure how far $\hat{I}$ deviates from $I$ and $\hat{P}$ from $P$, respectively. Here, $\hat{I}$ and $\hat{P}$ denote the number of rented instances (including both RIS and MIS instances) and the violation probability obtained using function approximations, and $I$ and $P$ denote the same quantities obtained using the VP_table structure. Clearly, smaller $\Delta_I$ and $\Delta_P$ indicate a more accurate approximation. The results are shown in Table 4. The linear approximation achieves a lower violation probability at the expense of a much higher number of instances. In addition, the nonlinear approximation has a lower violation probability deviation. Therefore, we recommend using the nonlinear approximation in Algorithms 1 and 2.

7. Conclusions

Dynamic instance provisioning is a key issue for Internet service providers in the cloud computing environment. In this paper, we investigate the coarse-grain (on the order of hours) QoS-aware dynamic instance provisioning problem for interactive workload. The optimization problem under consideration (see (2)-(3)) is not a traditional optimization problem, since the QoS constraint (3) has no analytical form for self-similar Internet traffic; therefore, it cannot be solved using classic methods. We use a lookup table and two function approximations to characterize constraint (3). The lookup table approach suffers from a scalability issue because, to obtain a precise estimate of the violation probability in the table, we must visit all possible instance provisioning policies and collect sufficient violation probability samples. In contrast, function approximations can predict the performance from a small set of samples. Function approximations (especially the nonlinear approximation) address the scalability problem at the cost of a small loss in prediction precision. We conduct extensive simulations to evaluate the effectiveness of the proposed dynamic instance provisioning policy.


Notations

$N$: The number of phases in a day in which instance provisioning decisions are made, that is, the number of decision points
$n_i$: The number of instances needed in phase $i$ to meet the QoS requirement
$c_R$: The marginal rental cost for an RIS instance
$c_M$: The marginal rental cost for an MIS instance
$d_i$: The delay in phase $i$
$D$: The threshold delay set by the SLA
$\varepsilon$: The threshold violation probability set by the SLA.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.


Acknowledgments

The work is funded in part by the National Natural Science Foundation of China (NSFC) under Grant no. 61363052, the Inner Mongolia Provincial Natural Science Foundation under Grant nos. 2010MS0913 and 2013MS0920, and the Science Research Project for Inner Mongolia College under Grant nos. NJZY14064 and NJZY13120.