Abstract

As the scale of data centers increases, electricity cost is becoming the fastest-growing element of their operating costs. In this paper, we investigate opportunities for reducing the electricity cost of data centers by utilizing the energy storage facilities already deployed as uninterruptible power supply (UPS) units. The basic idea is to combine the temporal diversity of electricity prices with energy storage to conceive a strategy for reducing the electricity cost. The electricity cost minimization is formulated in the framework of a finite state-action, discounted cost Markov decision process (MDP). We apply a Q-Learning algorithm to solve the MDP optimization problem and derive a dynamic energy storage control strategy that does not require any a priori information on the Markov process. In order to address the slow convergence of the Q-Learning based algorithm, we introduce a Speedy Q-Learning algorithm. We further discuss the offline optimization problem and obtain the optimal offline solution as a lower bound on the performance of the online, learning theoretic problem. Finally, we evaluate the performance of the proposed scheme using real workload traces and electricity price data sets. The experimental results show the effectiveness of the proposed scheme.

1. Introduction

Cloud computing is an emerging Internet-based computing paradigm which offers on-demand computing services to cloud consumers. To meet the increasing demand for computing and storage resources in cloud computing, there is a trend toward ever larger data centers. As more data centers are deployed and their scale increases, energy consumption cost is becoming the fastest-growing element of their operating costs, including the computing energy cost, cooling energy cost, and other energy overheads. It has been estimated that energy consumption may amount to 30%–50% of the operating cost of large-scale data centers built by companies such as Google, Microsoft, and Facebook [1]. In fact, data centers consumed approximately 1.5% of all electricity worldwide in 2010, about 56% more than five years earlier [2, 3]. In the near future, the energy cost problem of data centers is likely to worsen and become more challenging as technology infrastructures continue to expand and upgrade. Hence, efficiently controlling the electricity cost of data centers has attracted intensive attention from both academia and industry in recent years.

The electricity cost depends not only on the total amount of energy consumed by the data centers but also on the electricity price, which is therefore an important factor in the electricity cost of data centers. With the development of smart grid technology for the next-generation power grid, more and more electricity markets are undergoing deregulation, where electricity market operators offer dynamic electricity rates to large industrial and commercial customers instead of traditional flat retail rates. Thus, there is an opportunity to reduce the electricity cost of data centers by observing and exploiting the time-varying electricity prices in deregulated electricity markets.

Normally, UPS units are deployed in data centers to provide emergency energy from their stored charge when the main power system experiences an outage, bridging the gap until the backup diesel generators (DG) can start up and operate as a secondary power source. The transition from the main power system to the secondary power source usually takes 10–20 seconds. With improvements in rechargeable battery technology, UPS units typically have enough energy storage capacity to keep a data center running for 5–30 minutes at its maximum power demand [4]. Hence, this excess energy storage capacity offers a good opportunity for electricity cost saving by dynamically controlling the energy stored in the UPS units.

Based on the above two facts, the basic principle for achieving electricity cost saving is to recharge the UPS units residing in the data center when the grid electricity price is low and to discharge them to power the data center when the price is high. Hence, this paper focuses on a dynamic energy storage control strategy for reducing the electricity cost of data centers. Dynamic energy storage control is expected to adapt to the fluctuations of the electricity price and the workload by dynamically making recharge/discharge decisions for the UPS units. It aims to achieve substantial electricity cost saving without performance degradation.

In this paper, we formulate the electricity cost reduction problem utilizing energy storage facilities as a discounted cost Markov decision process. Since statistical information about the workload arrivals and the electricity prices is not available, we propose an online algorithm based on the Q-Learning and Speedy Q-Learning approaches to solve the optimization problem. The main contributions of this paper are summarized as follows.
(i) The problem of electricity cost minimization in data centers with energy storage facilities under time-varying electricity prices in deregulated electricity markets is modeled as a discounted cost Markov decision process, which achieves cost saving by making decisions to recharge/discharge the battery.
(ii) In order to solve the optimization problem, we propose a dynamic energy storage control strategy based on the Q-Learning algorithm, which avoids reliance on any prior knowledge of the workload and the electricity prices. Furthermore, we introduce a Speedy Q-Learning algorithm to accelerate the convergence of standard Q-Learning.
(iii) We formulate an offline optimization problem of electricity cost minimization to obtain the optimal offline solution as a lower bound on the performance of the online, learning theoretic problem. The offline problem is solved by mapping it into a tractable mixed integer linear program instead of a nonlinear program.
(iv) Finally, experiments are carried out on real workload traces and electricity price data sets to evaluate the performance of the proposed scheme. Although the real traces may not provably follow the Markovian assumption, the results show that the proposed scheme generally performs well.

The rest of the paper is organized as follows: Section 2 presents and discusses related work in this area. Section 3 describes a system model for an energy management system using energy storage facilities in data centers. Section 4 formulates the problem of electricity cost minimization in data centers with energy storage facilities as a discounted cost Markov decision process. Section 5 is devoted to designing a dynamic energy storage control strategy for the battery based on the Q-Learning and Speedy Q-Learning algorithms to solve the optimization problem. The optimal offline solution is discussed in Section 6. In Section 7, we provide the numerical evaluation results and performance comparisons. Finally, conclusions are drawn in Section 8.

2. Related Work

The severe energy consumption problem in data centers has motivated many works on reducing their electricity cost. These works may be roughly categorized into two basic types of mechanisms: (1) reducing the energy consumption or improving the energy efficiency of data centers and (2) exploiting the temporal and geographical variation of electricity prices to achieve electricity cost savings.

Regarding the first mechanism, new hardware designs and engineering techniques such as energy-efficient chips, multicore servers [5], DC power supplies [6], advanced cooling systems [7, 8], and virtualization [9, 10] have been developed to improve the power usage effectiveness (PUE) of data centers. From the perspective of algorithm design, energy consumption savings can be pursued at two different levels: the server level and the data center level [4]. At the server level, dynamic voltage-frequency scaling (DVFS) [11] reduces power consumption by adapting both the voltage and the frequency of the CPU to the changing workload. However, DVFS is applicable only to components (like the CPU) that support multiple speed and voltage levels. DVFS-based power saving policies can be found in [12, 13]. Dynamic power management (DPM) is another energy conservation approach, which turns off the power or switches the system to a low-power state when it is inactive. It can be employed for any system component with multiple power states. In [14], DPM is applied to achieve energy-efficient computation by selectively turning off (or reducing the performance of) system components when they are idle (or partially unexploited).

At the data center level, approaches such as dynamic cluster reconfiguration (DCR) [15] and VM migration and consolidation for load balancing and power management [16] are widely discussed for reducing energy consumption in data centers. DCR in [15] develops an online, measurement-based algorithm that decides how many servers to power on/off to achieve energy savings while keeping the overload probability below a desired threshold, making decisions without any prior knowledge of the workload statistics. VM migration and consolidation [16] achieve energy savings by continuously consolidating VMs according to current resource utilization, the virtual network topologies connecting VMs, and the thermal state of computing nodes. The methods mentioned above mainly focus on reducing energy consumption to save electricity cost. They can operate in a complementary way, assisting the method proposed in this paper to further reduce the electricity cost.

The second mechanism for reducing electricity cost relies on the notable temporal and geographical variations in electricity prices. In [1], Qureshi et al. develop and analyze a new method for reducing the electricity costs of running large Internet-scale systems. The key idea of the method is to distribute more traffic to data centers with low electricity prices. In [17], Rao et al. utilize both the location diversity and the time diversity of electricity prices in a multiple-electricity-market environment to minimize the total electricity cost while guaranteeing the quality of service (QoS). Luo et al. [18] propose a novel spatiotemporal load balancing approach that leverages both geographic and temporal variations of the electricity price to minimize the energy cost of distributed Internet data centers (IDCs). However, the works mentioned above do not utilize the energy storage facilities residing in data centers, which may be used to achieve further electricity cost savings. Compared with existing techniques for electricity cost reduction, energy-storage-based methods cause no performance degradation in the data center. In this paper, our work focuses on the problem of electricity cost minimization in data centers with energy storage facilities under deregulated electricity markets where the electricity prices exhibit temporal variation, and it is mainly motivated by [19]. In [19], an online control algorithm using Lyapunov optimization theory is proposed for reducing the time-averaged electric utility bill of a data center, and the solution has a threshold structure. Although simple, the technique of Lyapunov optimization is unable to learn the system dynamics, which may not lead to an optimal control of energy storage. Alternatively, by exploiting a Markov decision process approach and reinforcement learning tools, the proposed algorithms learn the system dynamics and adapt the control decisions accordingly to save more electricity cost. Generally, optimal control policies for Markov decision processes suffer from the “curse of dimensionality.” In our work, we consider the total energy consumption of all components in the data center as the energy consumption state, instead of each component’s consumption individually. Furthermore, there are only three actions on the battery, that is, recharging, discharging, and doing neither. All of these considerations effectively alleviate the “curse of dimensionality.”

3. System Model

In this section, we describe the system architecture model for energy management in a data center, present the models for the battery, energy consumption, and electricity cost, and formulate the problem of dynamic energy storage control to minimize the expected total electricity cost.

3.1. System Architecture

A general system architecture model for a data center with energy storage facilities, depicted in Figure 1, is composed of an energy management system (EMS) and a data center facility. The EMS acts as the heart of the energy management framework and manages the energy provision in the data center, while the data center facility provides computation and storage resources for executing the submitted tasks. The key components of the EMS are the information collector (IC) and the energy storage management unit (ESMU). The IC periodically collects information on the electricity prices, the energy storage, and the energy demand generated by the data center, while the ESMU makes the optimal decision on whether to recharge or discharge the energy storage facilities for electricity cost minimization, according to the information collected by the IC. The energy storage unit (ESU), that is, the UPS, is capable of storing energy drawn from the power grid and discharging the stored energy to power the data center. Below, we use the terms UPS and battery interchangeably. The main contribution of this paper is a dynamic energy storage control strategy for the ESMU.

The basic running process of the EMS can be described as follows. The IC periodically collects the battery level information as well as the electricity price information from the grid. The data center submits its energy demand information to the IC, and the ESMU uses this information to decide whether the energy supply is drawn from the grid or from the battery. Finally, the data center provides services using the energy managed by the EMS.

3.2. Mathematical Model

In this subsection, we introduce the time-slotted system model used in this paper: time is divided into slots of equal duration. A small time slot size is beneficial for characterizing the state variation of the system at a fine time granularity, thus achieving a better cost saving policy owing to its prompt adaptation to changes of the system state. However, it may increase the battery cost owing to the increased frequency of recharge/discharge switching. Therefore, the time slot size should be chosen appropriately. The energy storage control decisions are made at the beginning of each slot, and the system’s state is assumed to be constant throughout each slot.

From [4], we know that the energy consumption demand of the data center in each slot is proportional to the total number of workload requests to be served in that slot. The workload requests served in each slot consist of the unfinished requests from the previous slot and the new incoming requests of the current slot, which implies that the energy consumption demand in each slot depends only on the previous demand and not on earlier history; that is, it fulfills the Markov property. Thus, we model the energy consumption demands of the data center over the slots as a correlated time process following a first-order discrete-time Markov model. The energy consumption demand in each slot is assumed to be known at the beginning of the slot. In reality, it has to be estimated; several effective methods exist for estimating the workload, such as autoregressive moving-average (ARMA) models. Let d_t denote the energy consumption demand of the data center in slot t, taking values in a finite demand state space D whose elements are nonnegative and finite. The state transition probability Pr{d_{t+1} = d' | d_t = d} gives the probability that the demand moves from state d to state d'.
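For illustration, the transition probabilities of such a first-order Markov chain can be estimated empirically from a discretized demand trace. The following sketch assumes the demand trace has already been mapped to integer state indices; it is not part of the paper's formulation, only an example of how the model could be instantiated.

```python
import numpy as np

def estimate_transition_matrix(state_sequence, num_states):
    """Empirical first-order Markov transition matrix from a trace of state indices."""
    counts = np.zeros((num_states, num_states))
    for s, s_next in zip(state_sequence[:-1], state_sequence[1:]):
        counts[s, s_next] += 1.0
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows for states never visited fall back to a uniform distribution.
    uniform = np.full_like(counts, 1.0 / num_states)
    return np.divide(counts, row_sums, out=uniform, where=row_sums > 0)
```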

The energy market usually consists of a Day-Ahead market and a Real-Time market [1]. In this paper, we consider data centers in the Real-Time market. The Real-Time market is a spot market in which the current real-time price is calculated every five minutes or so, based on actual grid operating conditions rather than on expected load. The electricity price in the Real-Time electricity market in slot t is denoted by p_t, which takes values in a finite price state space P whose elements are assumed to be nonnegative and finite. Following [20], we model the electricity price as a Markov chain with state transition probabilities Pr{p_{t+1} = p' | p_t = p}.

In current data centers, UPS units typically use lead-acid batteries, whose practical operation has several notable characteristics. For a given battery, each recharge-discharge cycle incurs energy loss due to AC-DC conversion, so the battery is not completely efficient, and its performance is characterized by a recharge efficiency and a discharge efficiency [21]. The energy in the battery is also subject to dissipation over time; that is, it exhibits a leaky character. However, since the storage leakage loss is much smaller than the quantities of interest to us, it is negligible for lead-acid batteries [19]. The recharging rate is assumed to be constant, which is reasonable when the battery is recharged in constant-current mode. To capture the impact of repeated recharging and discharging on the battery’s lifetime, we assume that each recharge and discharge operation incurs a fixed cost of C_rc and C_dc, respectively; following [19], these costs can be derived from the price of a new battery and the number of recharge/discharge cycles it can sustain. Let b_t be the battery energy level in slot t, which can be no more than the battery capacity B_max; that is, b_t ≤ B_max for all t. The UPS unit is mainly employed to power the data center with the stored energy in case of a power failure, before the backup diesel generators start up and provide power. In order to ensure the reliability of the data center, the battery energy level is required to stay above a minimum energy level B_min; that is, b_t ≥ B_min for all t. Hence, the battery energy level is subject to the constraint B_min ≤ b_t ≤ B_max for all t.

Let a_t be the decision variable indicating whether the battery is recharged or discharged in slot t. Without loss of generality, we assume that recharge and discharge operations cannot be performed simultaneously; that is, in each slot we can either recharge the battery, discharge it, or do neither, but not both. Thus, a_t takes one of three values: a_t = 1 when the battery is recharged, a_t = -1 when it is discharged, and a_t = 0 when it is neither recharged nor discharged. Let r_t represent the amount of energy bought to recharge the battery in slot t, and let u_t denote the energy discharged from the battery towards satisfying the demand in slot t; using an indicator function of the chosen action, r_t is positive only when a_t = 1 and u_t is positive only when a_t = -1. The battery energy level is updated at the end of each slot according to the update equation (5): the energy purchased to recharge the battery is reduced by the recharge efficiency, while only a fraction of the discharged energy is converted into usable electricity, as determined by the discharge efficiency.

According to inequality (1), the battery level can neither exceed its maximum capacity nor fall below the minimum battery level. Therefore, the recharge amount r_t and the discharge amount u_t have to satisfy corresponding constraints in every slot.

Let g_t represent the external energy drawn from the power grid in slot t, which is used both to power the data center and to recharge the battery. As shown in Figure 1, in order to meet the energy consumption demand for powering the data center in slot t, the energy supplied from the grid together with the energy discharged from the battery must cover the demand d_t.

Thus, the total amount of energy drawn from the grid in slot t is the sum of the energy supplied directly to the data center and, when a_t = 1, the energy purchased to recharge the battery; for notational simplicity, using the indicator function of the action, g_t can be written compactly in terms of d_t, u_t, and r_t.

Define c_t as the total immediate cost incurred in slot t. For every slot, c_t consists of the electricity cost for the energy drawn from the grid in slot t, that is, the grid energy g_t multiplied by the electricity price p_t, plus the battery cost C_rc or C_dc incurred by a recharge or discharge operation.
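To make the per-slot model concrete, the following sketch combines the battery update, the grid energy, and the immediate cost described above. The variable names and the exact form of the dynamics (recharging a fixed amount R per slot, efficiencies eta_c and eta_d, operation costs c_rc and c_dc) are illustrative assumptions rather than a reproduction of the model's equations.

```python
def slot_step(battery, demand, price, action, R, eta_c, eta_d, c_rc, c_dc, b_min, b_max):
    """One-slot transition: returns (next battery level, immediate cost).

    Assumed dynamics: recharging buys R units from the grid, of which eta_c * R
    reaches the battery; discharging drains energy from the battery, of which a
    fraction eta_d serves the demand. The immediate cost is the grid energy
    times the electricity price plus a fixed cost per recharge/discharge operation.
    """
    if action == 1:                                    # recharge
        bought = min(R, (b_max - battery) / eta_c)     # do not overfill the battery
        grid_energy = demand + bought
        next_battery = battery + eta_c * bought
        op_cost = c_rc
    elif action == -1:                                 # discharge
        usable = eta_d * (battery - b_min)             # usable energy above the reserve
        served = min(demand, usable)
        grid_energy = demand - served
        next_battery = battery - served / eta_d
        op_cost = c_dc
    else:                                              # do neither
        grid_energy = demand
        next_battery = battery
        op_cost = 0.0
    return next_battery, price * grid_energy + op_cost
```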

In this paper, the goal of dynamic energy storage control is to minimize the expected total electricity cost of the data center with energy storage facilities. Based on the above models, the problem can be formulated as minimizing the expected total discounted cost E[∑_t γ^t c_t] over the recharge/discharge decisions, where E denotes the expectation operator and γ ∈ (0, 1) is the discount factor that represents value reduction over time. The reason for considering discounted electricity costs is to emphasize early decisions and costs, in order to emulate the effect of reduced battery efficiency over time. Note that the total discounted electricity cost is finite, since the per-slot costs are bounded. We call this problem the expected total electricity cost minimization problem (ETC-problem), as the data center aims at minimizing the total electricity cost.

According to (10), problem (11) can be rewritten in the equivalent form (12), expressed directly in terms of the grid energy, the electricity price, and the battery operation costs.

3.3. Discussion

In a data center, lower-level management routines such as server consolidation and the instantiation of new VMs may be executed. Different management routines may have different energy consumption demand profiles. However, once the lower-level management routine is given, the demand profile for the workload is determined and can be modeled mathematically. Hence, we can still apply the above model to achieve electricity cost savings.

4. Cost Management Problem as an MDP

In this section, we map problem (12) into the framework of a Markov decision process (MDP). A Markov decision process, also referred to as a discrete-time stochastic control process, provides a mathematical framework for modeling decision-making situations where outcomes are partly random and partly under the control of the decision maker [22]. An MDP can be defined via a 4-tuple (S, A, P, c), where
(i) S is the finite set of states,
(ii) A is the finite set of actions,
(iii) P(s' | s, a) denotes the probability that the system is in state s' at the (t+1)th slot when the decision maker chooses action a in state s at the tth slot,
(iv) c(s, a, s') denotes the immediate cost incurred when the state of the system at the tth slot is s, action a is selected, and the system occupies state s' at the (t+1)th slot.

The energy management system in the data center, as described above, can be formulated as a finite-state discrete-time MDP. In the model, let s_t denote the joint state (hereafter, state) of the system at the tth slot; it consists of the energy consumption demand d_t, the battery energy level b_t, and the electricity price p_t. Thus s_t can be expressed as the triple (d_t, b_t, p_t). Since all components of s_t are discrete and finite, the state space S is finite. The action set A represents all allowable actions in all possible states. According to the definition of the decision variable in (2), let A = {-1, 0, 1} be the set of actions for the system, where action -1 indicates that the battery is discharged, action 1 indicates that the battery is recharged, and action 0 indicates that the battery is neither recharged nor discharged. A policy specifies the decision rule to be used at all decision epochs, where the decision-making instants are referred to as decision epochs. The policy provides the decision maker with a prescription for action selection under any possible future system state or history. A policy π maps the state space to the action space. In this paper, we restrict our attention to stationary deterministic policies that do not depend on time but only on the current state. Let P(s' | s, a) denote the transition probability from state s to state s' when action a is taken. The immediate electricity cost is c(s, a, s') when action a is taken in state s at the tth slot and the state changes to s' at the (t+1)th slot. Thus, the objective of the MDP is to find the optimal energy storage control policy that minimizes the expected total discounted electricity cost for the energy consumption of the data center over an infinite time horizon. Here the immediate cost function is given by (10), the expected total discounted electricity cost is equivalent to (12), and π(s) is the action taken when the system is in state s.

As described in Section 3, the energy consumption demand and the electricity price are described by their state transition probability functions, while the battery energy level can be uniquely derived from the update equation (5) given the policy and the current system state. Since the system state consists of the energy consumption demand, the battery energy level, and the electricity price, the transition of the system state depends only on the current state and the current action. This means that the model described above fulfills the Markov property, which requires that a state depend only on the previous state and not on earlier states. Thus, we can make use of dynamic programming (DP) and reinforcement learning (RL) theories to solve problem (12). For convenience, we introduce the definitions of the state-value function and the action-value function before solving the MDP problem [23].

In search of an optimal policy, the decision maker needs a way to differentiate the desirability of possible successor states in order to decide on the best action. A common way to rank states is by computing a so-called state-value function, which estimates the expected discounted sum of costs when starting in a specific state and taking actions determined by policy π. Accordingly, the state-value function V^π(s) for policy π is defined by (14), which is also called the Bellman equation for V^π; it expresses a relationship between the value of a state and the values of its successor states.

Similarly, define the value of taking action a in state s under the policy π, denoted by Q^π(s, a), as the expected discounted cost starting from s, taking the action a, and thereafter following policy π; Q^π(s, a) is given by (15) and is referred to as the action-value function for policy π.
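Equations (14) and (15) are not reproduced here; in the notation above, the standard discounted-cost forms that they presumably take are, as a sketch,

\[
V^{\pi}(s) = \sum_{s'} P\bigl(s' \mid s, \pi(s)\bigr)\Bigl[c\bigl(s, \pi(s), s'\bigr) + \gamma V^{\pi}(s')\Bigr],
\qquad
Q^{\pi}(s, a) = \sum_{s'} P(s' \mid s, a)\Bigl[c(s, a, s') + \gamma V^{\pi}(s')\Bigr].
\]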

For finite MDPs, an optimal policy can be precisely defined in the following way. A policy π is defined to be better than or equal to a policy π' if its expected discounted cost is less than or equal to that of π' for all states. Let π* be the optimal policy, which is better than or equal to all other policies. Accordingly, the state-value function under the optimal policy, V*, is given by (16). Intuitively, (16) expresses the fact that the value of a state under the optimal policy must equal the expected discounted cost of the best action from that state; the optimal policy is therefore the greedy policy with respect to V*. According to (16), the optimal action-value function Q* under the optimal policy can be written as (17).
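Likewise, (16) and (17) are not reproduced here; their standard Bellman optimality forms for cost minimization would read, as a sketch,

\[
V^{*}(s) = \min_{a \in \mathcal{A}} \sum_{s'} P(s' \mid s, a)\bigl[c(s, a, s') + \gamma V^{*}(s')\bigr],
\qquad
Q^{*}(s, a) = \sum_{s'} P(s' \mid s, a)\Bigl[c(s, a, s') + \gamma \min_{a'} Q^{*}(s', a')\Bigr].
\]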

As seen from the above analysis, in order to minimize the expected total electricity cost, we can obtain the optimal policy by learning the Q-values, instead of estimating the demand and the real-time electricity prices to solve (11) directly. Solving the ETC-problem in this way would still require prior information on the state transition probabilities. Unfortunately, an accurate probability distribution of the state transitions is usually difficult to obtain beforehand in practice. Consequently, V* and Q* cannot be computed using value iteration. To overcome this difficulty, we apply a model-free, learning theoretic algorithm based on RL to arrive at an optimal policy that minimizes the expected discounted total cost by taking actions and observing their corresponding costs. In the next section, the detailed learning theoretic algorithm is presented.

5. Learning Theoretic Algorithm

In this section, we introduce the learning theoretic algorithms, namely, Q-Learning and Speedy Q-Learning, which we use to find the optimal energy storage control policy. Q-Learning is a reinforcement learning (RL) algorithm for solving MDP problems; it directly estimates Q* under the assumption that the system’s dynamics are completely unknown a priori. It is a well-known model-free algorithm, so its main advantages are simplicity, easy implementation, and online operation [24]. Hence, Q-Learning is well suited to our ETC-problem. The core of the Q-Learning algorithm is a Q-table together with a rule for updating the Q-table and choosing actions. The Q-table is a matrix indexed by state s and action a whose entry Q(s, a) is the estimated expected discounted cost of taking action a in state s.

According to (15), the action-value function can be expressed as a combination of the expected immediate cost and the state-value function of the next state when following policy π. Note that Q* provides the expected long-term consequences of each state-action pair. The action incurring the lowest expected cost can therefore be taken as the optimal action simply by inspecting Q*. Hence, the optimal action-value function allows optimal actions to be selected without knowing anything about the transition probabilities, and we can derive the optimal policy by estimating Q*. The Q-Learning process tries to find Q* in a recursive manner. Let Q_k be the estimate of Q* in the kth iteration. Then, in each slot the update process of the estimate can be described as follows:
(i) observe the current state s,
(ii) choose an action a and then perform the chosen action,
(iii) observe the next state s' and receive an immediate cost,
(iv) update the estimate Q_{k+1}(s, a) according to (18), where α_k is the learning rate in the kth iteration, responsible for weighing the newly learnt experience.
The sequence of estimates Q_k can be proven to converge with probability 1 to Q* as k → ∞ when the learning rate satisfies the standard stochastic approximation conditions, namely, that the sum of the learning rates diverges while the sum of their squares remains finite [25]. The Q-table can be initialized arbitrarily for all state-action pairs.
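The update rule (18) is not reproduced here; in its standard cost-minimization form, the Q-Learning update on an observed transition from s to s' with immediate cost c would read, as a sketch,

\[
Q_{k+1}(s, a) = (1 - \alpha_k)\, Q_k(s, a) + \alpha_k \Bigl[c + \gamma \min_{a' \in \mathcal{A}} Q_k(s', a')\Bigr].
\]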

Based on the above discussion, the estimate of Q* can be used for determining an action. However, the optimal action can be identified only if the estimate of Q* is accurate; otherwise, there will always be cases in which the action with the currently lowest estimated cost does not actually produce the lowest cost. During the learning process, unguided randomized exploration cannot guarantee acceptable performance, while greedily exploiting the information already available in the Q-table can guarantee a certain level of performance but prevents the discovery of better actions. In order to estimate Q* accurately, the action selection method should balance the trade-off between exploitation and exploration so that the EMS keeps reinforcing the evaluation of actions it already knows to be good while also exploring new actions. Here, we consider the ε-greedy method. At each slot, this method selects a random action (exploration) with probability ε and the best action (exploitation), that is, the one with the lowest Q-value at the moment, with probability 1 − ε, where 0 < ε < 1. Therefore, the exploration probability ε enables Q-Learning to continuously explore other possible actions in a new environment, despite the currently lowest-cost action.
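A minimal sketch of the ε-greedy rule described above is given below for the cost-minimization setting, so exploitation picks the action with the lowest Q-value; the dictionary-based Q-table is an illustrative assumption.

```python
import random

def epsilon_greedy_action(q_table, state, actions, epsilon):
    """Pick a random action with probability epsilon, otherwise the lowest-cost action."""
    if random.random() < epsilon:
        return random.choice(actions)                        # explore
    return min(actions, key=lambda a: q_table[(state, a)])   # exploit (minimum Q-value)
```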

Although the sequence of estimates converges to the optimal action-value function Q*, Q-Learning suffers from slow convergence when the discount factor is close to one. To address this problem, the asynchronous Speedy Q-Learning (ASQL) method is applied to improve the convergence rate. At each slot, ASQL uses the two most recent estimates of the action-value function to update the Q-values, achieving a faster convergence rate than standard Q-Learning. The update process for ASQL is given by (19), where the action a is chosen in state s using the ε-greedy exploration method and the system occupies state s' next; the two terms entering the update are obtained by applying the empirical Bellman operator of the observed transition to the previous and the current estimates, as computed in (20) and (21), respectively.
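Equations (19)–(21) are not reproduced here; written for cost minimization and following the form of Speedy Q-Learning in [26], the update on an observed transition (s, a, c, s') would read, as a sketch,

\[
\mathcal{T}_k Q(s, a) = c + \gamma \min_{a' \in \mathcal{A}} Q(s', a'),
\]
\[
Q_{k+1}(s, a) = Q_k(s, a) + \alpha_k\bigl[\mathcal{T}_k Q_{k-1}(s, a) - Q_k(s, a)\bigr] + (1 - \alpha_k)\bigl[\mathcal{T}_k Q_k(s, a) - \mathcal{T}_k Q_{k-1}(s, a)\bigr],
\]

where \(\mathcal{T}_k\) denotes the empirical Bellman operator of the observed transition, applied to the previous estimate in (20) and to the current estimate in (21).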

In the ASQL algorithm, the learning rate decays linearly with time, that is, inversely with the iteration counter k. Note that other (polynomial) learning steps can also be used with Speedy Q-Learning; however, it has been shown that the rate of convergence of ASQL is optimized for the linearly decaying learning rate [26]. Intuitively, the third term on the right-hand side of (19) plays little role for small k, and aggressive steps are taken as k increases while the error in the estimate is still large. Further, when k is very large, the error of the estimate goes to zero as Q_k approaches its optimal value Q*; the two successive estimates then nearly coincide, and the third term no longer affects the updates.

By applying the proposed scheme, we can obtain the optimal energy storage control policy for electricity cost minimization using the storage facilities in data centers. The detailed procedure of the proposed scheme is presented in Algorithm 1.

(1) Initialize:
for each state s in S, each action a in A do
 Initialize Q(s, a) arbitrarily
end for
Initialize the learning counter k = 0
Initialize the starting state s
(2) Learning:
repeat
 Decide to explore/exploit the action with exploration probability ε
if exploration then
  Choose action a at random
else if exploitation then
  Choose action a with the lowest Q-value in state s
end if
 Take action a
 Observe the next state s'
 Receive an immediate cost c
 Calculate the Bellman term of the previous estimate according to (20)
 Calculate the Bellman term of the current estimate according to (21)
 Update the estimate Q_{k+1}(s, a) according to (19), combining the two Bellman terms
 Update the current state s = s'
 Update the learning counter k = k + 1
until the learning counter k reaches the prescribed number of learning iterations
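For concreteness, a compact Python sketch of the procedure in Algorithm 1 is given below. The environment step function, the dictionary-based Q-table, and the stopping criterion are assumptions for illustration; the update follows the Speedy Q-Learning form sketched above.

```python
import random
from collections import defaultdict

def speedy_q_learning(env_step, initial_state, actions, gamma, epsilon, num_iterations):
    """Sketch of Algorithm 1: learn a Q-table by asynchronous Speedy Q-Learning.

    env_step(state, action) -> (next_state, cost) stands in for observing the
    demand/battery/price transition and the immediate electricity cost.
    """
    q_prev = defaultdict(float)      # estimate Q_{k-1}
    q_curr = defaultdict(float)      # estimate Q_k
    state = initial_state
    for k in range(num_iterations):
        # epsilon-greedy: explore at random, otherwise take the lowest-cost action
        if random.random() < epsilon:
            action = random.choice(actions)
        else:
            action = min(actions, key=lambda a: q_curr[(state, a)])

        next_state, cost = env_step(state, action)

        # empirical Bellman operator applied to the two most recent estimates
        t_prev = cost + gamma * min(q_prev[(next_state, a)] for a in actions)
        t_curr = cost + gamma * min(q_curr[(next_state, a)] for a in actions)

        alpha = 1.0 / (k + 1)        # linearly decaying learning rate
        q_new = (q_curr[(state, action)]
                 + alpha * (t_prev - q_curr[(state, action)])
                 + (1.0 - alpha) * (t_curr - t_prev))

        q_prev = defaultdict(float, q_curr)   # Q_{k-1} <- Q_k
        q_curr[(state, action)] = q_new       # asynchronous update at (s, a)
        state = next_state
    return q_curr
```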

6. Optimal Offline Solution

In this section, we give a lower bound on the performance of the learning theoretic problem via the optimal offline solution, which is employed as a benchmark to evaluate the optimality of the proposed learning theoretic algorithm. In order to formulate the offline optimization problem, we assume that all future workload arrivals as well as the electricity price variations are known noncausally before the energy storage control decisions are made. This information can be obtained in advance from the workload and electricity price traces. The online learning theoretic problem optimizes the expected total electricity cost over an infinite horizon, while the offline solution does so over a realization of the MDP for a finite number of time slots. As previously described, an MDP realization is a sequence of state transitions of the workload, battery energy level, and electricity price processes over a finite number of time slots. Hence, in the offline problem we can optimize the recharge/discharge decisions such that the total electricity cost is minimized for a given MDP realization. According to (12), the offline optimization problem can be written as (22a)–(22e), with the decisions constrained to the finite horizon of the realization.

From definition (4), it can be seen that the indicator function of the action is nonlinear, so the problem in (22a)–(22e) is a nonlinear programming (NLP) problem in which the objective function or some of the constraints are nonlinear [27]. Nonlinear optimization problems are in general difficult to solve. For this reason, we show that (22a)–(22e) can be mapped into a tractable linear formulation before solving it.

Let us define two binary variables for the recharge and discharge operations in slot t: x_t indicates a recharge operation in slot t, while y_t indicates a discharge operation in slot t. Under the assumption that recharge and discharge operations cannot be performed simultaneously, they must satisfy the constraint x_t + y_t ≤ 1 in each slot. We then take the pair (x_t, y_t) as the joint decision variable controlling the recharge/discharge operation in slot t. As a result, the optimization problem (22a)–(22e) can be rewritten as (26a)–(26f), whose solution is an optimal sequence of control decisions over the horizon.

From (26a)–(26f), we can observe that the objective and constraint functions are linear. Moreover, the optimization variables x_t and y_t are constrained to be binary. Therefore, the problem in (26a)–(26f) is a mixed integer linear programming (MILP) problem. Many existing tools can solve MILP problems, such as GLPK [28], YALMIP [29], and lp_solve [30]. In this paper, we employ lp_solve to solve the proposed MILP problem. lp_solve is a free linear (integer) programming solver based on the revised simplex method and the branch-and-bound method for the integer variables, and it can solve pure linear, (mixed) integer/binary, semicontinuous, and special ordered sets (SOS) models.
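For illustration, the structure of such a MILP can be written down with any modeling tool. The sketch below uses the PuLP library (rather than lp_solve) and a deliberately simplified set of constraints; the data arrays, parameter values, and the battery dynamics constraint are assumptions and do not reproduce (26a)–(26f) exactly.

```python
import pulp

# Illustrative placeholder data (assumed): per-slot prices, demands, and battery parameters.
price, demand = [30.0, 55.0, 20.0], [800.0, 900.0, 750.0]   # $/MWh and MWh per slot
B_MIN, B_MAX, B_INIT, R = 0.0, 2000.0, 0.0, 500.0           # MWh
ETA_C, ETA_D, C_RC, C_DC, GAMMA = 0.9, 0.9, 50.0, 50.0, 0.99

T = len(price)
prob = pulp.LpProblem("offline_energy_storage", pulp.LpMinimize)
x = pulp.LpVariable.dicts("recharge", range(T), cat="Binary")      # recharge indicator
y = pulp.LpVariable.dicts("discharge", range(T), cat="Binary")     # discharge indicator
u = pulp.LpVariable.dicts("served_from_battery", range(T), lowBound=0)
b = pulp.LpVariable.dicts("battery_level", range(T + 1), lowBound=B_MIN, upBound=B_MAX)

# Discounted objective: grid energy (unserved demand plus recharge energy) priced per
# slot, plus fixed recharge/discharge operation costs.
prob += pulp.lpSum((GAMMA ** t) * (price[t] * (demand[t] - u[t] + R * x[t])
                                   + C_RC * x[t] + C_DC * y[t]) for t in range(T))

prob += b[0] == B_INIT
for t in range(T):
    prob += x[t] + y[t] <= 1                           # no simultaneous recharge/discharge
    prob += u[t] <= demand[t]                          # battery only offsets actual demand
    prob += u[t] <= (B_MAX - B_MIN) * y[t]             # discharging requires y[t] = 1
    prob += b[t + 1] == b[t] + ETA_C * R * x[t] - (1.0 / ETA_D) * u[t]   # assumed dynamics

prob.solve()
print(pulp.LpStatus[prob.status], pulp.value(prob.objective))
```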

7. Performance Evaluation

In this section, the performance of the proposed dynamic energy storage control scheme is characterized quantitatively. Real-world workload traces and electricity price data sets are employed to evaluate the performance of the proposed scheme. In the following, we elaborate on the design of the experiments and present the experimental results.

7.1. Experimental Setup

In the experiments, we simulated a cloud-scale data center hosting a large number of servers [4]. For simplicity, we assume that the servers in the data center are homogeneous; the experiments can easily be extended to heterogeneous servers with little modification. In order to evaluate the performance of the proposed scheme, we conducted experiments based on real-world workloads and electricity price data sets.

7.1.1. Workload Data

The real workload requests are extracted from trace data gathered from the Intel Netbatch Grid in 2012 [31]. We set the time slot size to 15 minutes and count the number of job requests executed in each slot. The original trace covers only one month, so we repeat it to obtain a three-month workload trace for the performance evaluation. Figure 2 shows the variation of the workload requests in each 15 min period over four days. In order to perform the experiment with a larger-scale data center workload, the number of requests extracted from the Intel Netbatch Grid has to be scaled up. One approach to scaling up the workload is to capture the underlying structure of the trace by separating the steady part and the random part of the original trace, scaling up the steady part, and adding the random part back; however, this requires an appropriate method to capture the characteristics of the random part, and we leave this approach to future work. In the current experiment, we assume that the number of users is scaled up by 1000 times and, accordingly, that the number of requests is statistically scaled up by 1000 times. The nominal power consumption demand for the workload of the data center in each slot is approximated by the formula in [4], whose constants are determined by the data center: the average energy consumption of an idle server in one slot, the energy consumed per served request, the number of workload requests served by one server in the slot, the number of active servers in the slot (bounded by the total number of servers), and the PUE, that is, the ratio of the total power drawn by the data center facility (including cooling power) to the IT equipment power. In today's energy-efficient data centers, the PUE generally lies between 1.1 and 2.0; for example, Google data centers had an average PUE of 1.12 in 2012 [32], and we set the PUE within this range in our experiment. The Intel Netbatch Grid is used for running chip-simulation workloads, and serving one request on one server takes considerable time; according to the real workload trace, the average service time of each request served by one server is 7-8 minutes. Following [4], the per-server energy parameters correspond to a server with an AMD Athlon CPU and a service rate of 2 requests/s, and the remaining parameters of the demand formula are set accordingly after calculation.
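For concreteness, a commonly used linear server power model of the kind described above can be sketched as follows. The functional form and the parameter values are illustrative assumptions, not the constants of the formula in [4].

```python
import math

def slot_energy_demand(requests, e_idle, e_req, reqs_per_server, max_servers, pue):
    """Approximate energy demand of the data center in one slot.

    Assumed model: each active server consumes an idle energy e_idle plus e_req per
    request it serves in the slot; facility overheads (cooling, power delivery) are
    captured by multiplying the IT energy by the PUE.
    """
    active_servers = min(max_servers, max(1, math.ceil(requests / reqs_per_server)))
    requests_per_active_server = requests / active_servers
    it_energy = active_servers * (e_idle + e_req * requests_per_active_server)
    return pue * it_energy
```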

7.1.2. Electricity Price Data

We use real-time electricity prices at Houston obtained from the Electric Reliability Council of Texas (ERCOT), and the real-time electricity prices vary on a 15 min basis [33]. The time horizon we consider in the experiment covers the period from January 1 to March 31, 2013. In these three months, there are 8640 real-time electricity price samples. Figure 3 shows the real-time electricity price variation characteristics at Houston from January 3 to January 6, 2013.

In the experiments, we simulated a time-slotted system with a slot duration of 15 minutes. The unit for energy consumption and battery energy level is MWh, and the unit for the real-time electricity price is $/MWh. We discretize the energy consumption demand into 4 equal-interval bins, whose representative values form the energy consumption demand state space. Similarly, the real-time electricity price is discretized into 4 equal-interval bins that define the electricity price state space. For a given maximum battery capacity, we also discretize the battery energy level into 4 equal-interval bins to obtain the battery energy level state space. Meanwhile, we let the energy used for recharging the battery in one slot be 500 MWh, and we choose the discount factor as justified in [34]. Since the minimum energy level is a constant, its value has no effect on the experimental results, and we fix it to a constant in the experiments.
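As an illustration of the discretization step, the sketch below maps continuous observations to the discrete joint state used by the learning algorithm; the bin boundaries are made-up placeholders, whereas in the experiments they are derived from the traces as described above.

```python
import numpy as np

B_MAX = 2000.0                                      # MWh, assumed battery capacity
demand_bins = np.array([500.0, 1000.0, 1500.0])     # MWh boundaries -> 4 bins (assumed)
price_bins = np.array([20.0, 40.0, 60.0])           # $/MWh boundaries (assumed)
battery_bins = np.linspace(0.0, B_MAX, 5)[1:-1]     # 4 equal-interval bins over [0, B_MAX]

def discretize(demand_mwh, battery_mwh, price):
    """Map continuous observations to a discrete (demand, battery, price) state triple."""
    return (int(np.digitize(demand_mwh, demand_bins)),
            int(np.digitize(battery_mwh, battery_bins)),
            int(np.digitize(price, price_bins)))
```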

7.2. Experimental Results

In order to demonstrate the performance improvement of the proposed dynamic energy storage control algorithm, we consider the Lyapunov optimization algorithm [19] and the offline optimization problem as benchmarks. The Lyapunov optimization algorithm makes recharge/discharge decisions to minimize the electricity cost using a solution with a threshold structure. The solution of the offline optimization problem can be considered as a lower bound on the performance of both the proposed learning theoretic algorithm and the Lyapunov optimization algorithm.

7.2.1. Impact of the Number of Learning Iteration

In the first experiment, we investigate the convergence rate and the performance improvement of the proposed scheme using the real-world workload and electricity price traces. Let K denote the number of learning iterations; the experiment sweeps K over a range of values, and the maximum battery capacity (in MWh) is fixed. The initial battery energy level is set to zero, and we evaluate the optimal policy for a fully efficient battery (unit recharge and discharge efficiencies). The new battery cost is determined by a unit price (in $ per MWh) [35]; that is, for a new battery of a given capacity, the battery cost is the unit price multiplied by the capacity. Here, we set the unit price to 100 $/MWh and the number of recharge/discharge cycles to 2800. The other parameters for the Lyapunov optimization are set as follows: a constant determined by the maximum real-time electricity price, the control parameter of the algorithm, and the maximum power that can be drawn from the grid in any slot, which depends on the maximum energy consumption demand in any slot. These parameter settings for the Lyapunov optimization follow [19].

In Figure 4, we illustrate the expected total electricity cost of the Q-Learning based approaches against the number of learning iterations K, together with the performance of the Lyapunov optimization approach. As shown in Figure 4, beyond a sufficient number of iterations the Q-Learning based approaches incur less total electricity cost than the Lyapunov optimization, and the Speedy Q-Learning algorithm yields approximately 11% more electricity cost than the offline solution. We also see that the expected total electricity costs of the Q-Learning based approaches decrease as the number of learning iterations increases, while the costs of the Lyapunov optimization and of the data center without energy storage facilities do not vary with K. The reason for this trend is that a larger K implies a more accurate estimate of Q*, so the policy derived from the estimated Q-values is closer to the optimal policy and a lower cost is obtained. The result shows that the Q-Learning based approaches approximate the optimal policy with increasing accuracy as K increases. From Figure 4, it can also be observed that for the Speedy Q-Learning algorithm a low exploration probability leads to slower learning than a higher exploration probability, and that Speedy Q-Learning converges to Q* faster than standard Q-Learning. This is because the Speedy Q-Learning algorithm uses two successive estimates of the state-action value function to update the Q-values in order to achieve faster convergence, and because a larger exploration probability is more likely to explore better actions that might otherwise remain unexplored, further accelerating the convergence of the estimates to Q*. Therefore, with a suitable choice of the exploration probability, the Speedy Q-Learning algorithm can strike a balance in the exploration versus exploitation trade-off and achieve a faster convergence rate.

Figure 5 shows the long-run average electricity cost for different numbers of learning iterations K. It can be observed from Figure 5 that the long-run average electricity costs of the Q-Learning based approaches decrease as the number of learning iterations increases, while the costs of the Lyapunov optimization and of the data center without energy storage facilities remain unchanged as K varies. Beyond a sufficient number of iterations, the Speedy Q-Learning algorithm (under either exploration probability) yields a lower average cost than the Lyapunov optimization algorithm. Compared with standard Q-Learning and with Speedy Q-Learning under the smaller exploration probability, the Speedy Q-Learning algorithm with the larger exploration probability performs better. The reason is that for a smaller number of learning iterations, the error between the Q-values estimated by the Q-Learning based algorithms and the optimal Q-values is larger, so the policies are not optimal and this results in higher average costs. As the number of learning iterations increases, more accurate Q-values are estimated, Speedy Q-Learning with a larger exploration probability further accelerates the convergence of the Q-values, and more cost can be saved.

7.2.2. Impact of Battery Capacity

In this subsection, we carried out a further experiment to investigate the impact of the battery capacity of the data center by varying the maximum capacity (in MWh). We fixed the number of learning iterations and the exploration probability; the other parameters and simulation settings were the same as those in Section 7.2.1.

In Figure 6 we show the impact of the battery capacity on the expected total electricity cost. It can be observed that the expected total electricity cost decreases as the battery capacity increases; that is, the larger the battery capacity, the more cost saving the Speedy Q-Learning based scheme can obtain. Additionally, we see that the Speedy Q-Learning algorithm yields at most approximately 10% more electricity cost than the offline solution, and a lower cost than the Lyapunov optimization algorithm. The reason is that for a larger battery capacity the Speedy Q-Learning based scheme is more likely to learn a policy that stores more energy at lower prices, while the threshold-structured solution of the Lyapunov optimization algorithm has no capability of learning the system dynamics; it stores energy at prices below its thresholds but higher than the prices exploited by the Speedy Q-Learning based scheme.

Figure 7 shows the long-run average electricity costs for different battery capacities. We plot the performance of the Speedy Q-Learning based scheme for two settings of the exploration probability and compare it with the other approaches. From Figure 7, it can be observed that as the battery capacity increases, the average costs yielded by the Speedy Q-Learning algorithm and by the Lyapunov optimization algorithm decrease, while the Speedy Q-Learning algorithm achieves more average cost saving than the Lyapunov optimization algorithm.

8. Conclusion

In this paper, we investigated the problem of electricity cost minimization in data centers using energy storage under time-varying electricity prices in deregulated electricity markets, which was formulated as a discounted cost Markov decision process. A dynamic energy storage control strategy based on the Q-Learning algorithm was designed to reduce the electricity cost, and we also applied the Speedy Q-Learning algorithm in order to accelerate convergence. The advantage of the proposed scheme is that it makes decisions without any a priori information about the energy management system of the data center, and it can also adapt to the variations of the workload and the electricity prices. We also studied the offline optimization problem, which was characterized as an MILP problem; its optimal solution can be considered as a lower bound on the performance of the proposed algorithm. In the experiments, real workload traces and electricity price data sets were used to verify the performance of the proposed scheme. The results illustrated the effectiveness of the proposed scheme in saving electricity cost in comparison with the benchmark algorithms. The results on real traces, which may not provably follow the Markovian assumption, also show that the proposed scheme generally performs well.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported by National “863” Project of China (no. 2012AA050802), the Fundamental Research Funds for the Central Universities (WK2100100021), and National Natural Science Foundation of China (no. 61174062).