Special Issue: Advanced Cloud Computing and Novel Applications
Research Article | Open Access
Modeling and Analysis of Queueing-Based Vary-On/Vary-Off Schemes for Server Clusters
A cloud system usually consists of many server clusters handling various applications. To satisfy increasing demand, especially for front-end web applications, the computing capacity of a cloud system is often provisioned for peak demand. Such provisioning causes resource underutilization during off-peak hours. Vary-On/Vary-Off (VOVO) schemes concentrate workloads on some servers instead of distributing them across all servers in a cluster to reduce idle energy waste. Recent VOVO schemes adopt queueing theory to model the arrival process and the service process for determining the number of powered-on servers. For the arrival process, a Poisson process can be safely assumed in web services due to the large number of independent sources. On the other hand, heavy-tailed service time distributions are observed in real web systems. However, there are no exact solutions for the performance of M/heavy-tailed/m queues. Therefore, this paper presents two queueing-based sizing approximations for Poisson and non-Poisson governed service processes. The simulation results of the proposed approximations are analyzed and evaluated by comparison with the simulated system running at full capacity. This relative measurement indicates that a Pareto distributed service process may be adequately modeled by memoryless queues when VOVO schemes are adopted.
1. Introduction
Internet requests are not uniformly distributed over time, and the number of requests is very large during peak hours. Cloud service providers tend to install surplus server nodes to handle the bursty load. Clearly, these servers waste a lot of energy during off-peak periods. Dynamically adjusting the number of active servers, that is, a Vary-On/Vary-Off (VOVO) scheme, improves the energy efficiency of server clusters. However, overly shrinking the number of powered-on servers may degrade service quality. Therefore, finding the right number of active servers to balance energy consumption and operating performance is a primary issue for a practical VOVO scheme.
VOVO schemes date back to the early 2000s [1, 2]. The basic idea of the earlier VOVO schemes is to dynamically size a cluster according to CPU utilization or resource usage. This resource provisioning problem in a cluster is analogous to the staff sizing problem in a telephone call center. In a call center, the customers are callers, the servers are telephone agents, and the tele-queues consist of callers that await service by an agent. The well-known Erlang-C model [3] has been widely applied to this problem. Many recent VOVO studies [4–9] adopt queueing analysis to manage the resource usage of clusters.
Most available analytic solutions in queueing theory rely on independence assumptions and Poisson processes. Internet traffic patterns are well known to possess extreme variability and bursty structure. Heavy-tailed distributions of service times are observed in real web systems [12, 13]. This behavior is characteristic of self-similar processes, and the Pareto distribution is a popular model for such processes. However, queueing models with Pareto distributed service times are very difficult to analyze. Although heavy-tailed service processes in web systems are widely documented, memoryless queues are still used for evaluating system performance in many studies [17–21]. On the other hand, studies [22–25] that adopt general or Pareto distributions need approximations for the analytically intractable distributions to obtain the performance measures.
The Poisson arrival process is particularly appropriate if the arrivals are from a large number of independent sources, such as users of web services. However, the difference between modeling service times with Poisson process and non-Poisson process governed queues remains a challenging research topic, since many queueing models remain analytically intractable. To understand this performance difference, a series of simulations is conducted in this study. Compared with mathematical analysis and numerical methods, simulation consumes more time and memory, but it is sometimes the only way to get reasonably accurate results.
This paper presents approximations of VOVO cluster sizing for systems modeled by M/M/m and M/G/m queues. Randomly generated workload traces with Pareto and exponentially distributed service times are simulated using the approximations. Two distinct types of real web access logs are simulated as well. A relative performance evaluation method is proposed and used for gauging the simulation results. Through the evaluation, the performance difference between modeling service times with Poisson process and non-Poisson process governed queues is quantified. The result suggests that an M/M/m based sizing approach may be adequate when a queueing-based VOVO scheme is adopted in a cluster.
This paper is organized as follows. Section 2 shows the approximation methods for cluster sizing. Section 3 details the simulation setup and the evaluation metric. Section 4 presents the simulation process and discusses the results. Section 5 concludes this paper.
2. Approximation for Queueing-Based Cluster Sizing
Queueing analysis of a system mainly aims at obtaining its performance measures, which are probabilistic properties of random variables such as the number of customers in the system, the number of waiting customers, the utilization of the servers, the response time of a request, the waiting time of a customer, the idle time of a server, and the busy time of a server. These measures depend heavily on the assumptions concerning the distributions of interarrival times and service times as well as the number of servers and the service discipline. Queueing analysis can be naturally applied to the performance measures of server clusters. Server clusters have been widely adopted in many cloud data centers to meet increasing user needs. Although heterogeneity is common in multifunctional cloud data centers, the server closets or blade systems that form the basic computing units usually consist of homogeneous nodes. Therefore, this work focuses on single-queue homogeneous systems.
The symbols and definitions used in this paper to describe the performance measures of queueing systems are shown in Symbols and Definitions.
In classical queueing analysis, suppose that requests are handled by a single-queue homogeneous server system with the First-Come First-Serve (FCFS) discipline, exponentially distributed service times, and Poisson process governed arrivals; the system can then be modeled as an M/M/m system. The traffic intensity ρ = λ/(mμ), where λ is the arrival rate, μ is the service rate of one server, and m is the number of servers, must be less than 1 for the M/M/m system to be in a stable state. Many performance measures of a stable M/M/m system have been thoroughly studied and are shown in (1) to (7). The calculations and proofs of these equations can be found in many textbooks, for example, [29, p. 412]:
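For orientation, two of the standard textbook M/M/m measures can be sketched in a few lines: the Erlang-C probability that an arriving job has to wait, and the mean response time. This is a minimal sketch of the textbook forms, not a transcription of the paper's (1) to (7); the function names `erlang_c` and `mmm_mean_response_time` and the symbols `lam`, `mu`, and `m` are illustrative assumptions.

```python
from math import factorial


def erlang_c(m: int, a: float) -> float:
    """Probability that an arriving job must wait in an M/M/m queue.

    a = lam / mu is the offered load; stability requires a < m.
    """
    if m < 1 or not 0.0 <= a < m:
        raise ValueError("need m >= 1 and 0 <= lam/mu < m")
    rho = a / m
    # Term for "all m servers busy", including the geometric queue tail.
    tail = a**m / factorial(m) / (1.0 - rho)
    # Terms for states with fewer than m jobs in the system.
    head = sum(a**k / factorial(k) for k in range(m))
    return tail / (head + tail)


def mmm_mean_response_time(lam: float, mu: float, m: int) -> float:
    """Mean response time = mean waiting time + mean service time."""
    mean_wait = erlang_c(m, lam / mu) / (m * mu - lam)
    return mean_wait + 1.0 / mu


if __name__ == "__main__":
    # With one server the result reduces to the M/M/1 value 1 / (mu - lam).
    print(mmm_mean_response_time(lam=0.5, mu=1.0, m=1))
    print(mmm_mean_response_time(lam=0.5, mu=1.0, m=2))
```

With a single server the sketch reduces to the familiar M/M/1 result, which is a convenient sanity check before using it for larger m.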
2.1. Approximation for Sizing M/M/m Modeled Clusters
In a homogeneous system, of each server is identical. From (5), of system can be considered as a function of denoted by :Let be an arrival rate of system maintaining a targeted response time given . The curves of for , , , , and with a targeted response time are shown in Figure 1.
can be easily obtained from (5):For , , based on (7), can be represented asFrom (1) and (2), isTherefore, to get of for , the following equation has to be solved: can also be easily obtained by solving (13) with :
It is difficult to get a closed-form expression of in terms of , , and when . Therefore, an approximation is proposed for for . Assume that this approximation can be applicable for the systems with at most servers. Every , , is shifted with the offset value of and denoted as :Figure 2 shows the combination of the curves of and , , with emphasis on the intersections between the targeted response time and these curves.
By observing Figure 2, the distances between all consecutive and approximately form an exponential decay series . Let the series be approximated by an exponential decay function, let be the initial quantity, and let be the exponential decay constant. An element in the series can be expressed asLet the initial quantity ; can be obtained from (10):From (17), (16), (10), and (14), is can be obtained by rearranging (18):Therefore, can be represented asLet . For a positive integer , can be approximated asConsequently, with an anticipated arrival rate and the measured service rate , the number, denoted as , of servers that maintain the targeted mean response time can be approximated as
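The closed-form approximation derived above is the paper's own result and is not reproduced here. As an independent cross-check, the smallest number of servers that meets a targeted mean response time can also be found by direct search over the exact M/M/m formula. The sketch below takes that brute-force route; `min_servers_mmm` and its parameters are hypothetical names, and the search is an assumption of this example rather than the method of the paper.

```python
from math import floor


def erlang_c(m: int, a: float) -> float:
    """Erlang-C waiting probability via the numerically stable Erlang-B recursion."""
    b = 1.0
    for k in range(1, m + 1):
        b = a * b / (k + a * b)
    rho = a / m
    return b / (1.0 - rho * (1.0 - b))


def mmm_mean_response_time(lam: float, mu: float, m: int) -> float:
    """Exact mean response time of a stable M/M/m queue."""
    return erlang_c(m, lam / mu) / (m * mu - lam) + 1.0 / mu


def min_servers_mmm(lam: float, mu: float, target: float, m_max: int = 10_000) -> int:
    """Smallest m whose mean response time does not exceed the target."""
    if target <= 1.0 / mu:
        raise ValueError("target is not above the mean service time")
    m = floor(lam / mu) + 1  # smallest m that keeps the queue stable
    while m <= m_max:
        if mmm_mean_response_time(lam, mu, m) <= target:
            return m
        m += 1
    raise ValueError("no feasible m up to m_max")


if __name__ == "__main__":
    # Illustrative numbers only: 1000 requests/s, 108.71 requests/s per server.
    print(min_servers_mmm(lam=1000.0, mu=108.71, target=1.0))
```

Because each candidate m costs only O(m) arithmetic, the search is cheap enough to run once per sizing interval and can serve as a reference when judging how closely a closed-form approximation tracks the exact model.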
2.2. Approximation for Sizing M/G/m Modeled Clusters
Internet workload characterization has found that service times in real web systems follow a heavy-tailed distribution rather than an exponential distribution [12, 14, 30]. In other words, a single-queue m-server cluster should be referred to as an M/G/m queue for Internet services. There are no exact formulas for the mean response time of an M/G/m system, but numerous approximations can be used. Kingman’s Exponential Law of Congestion is a popular approximation that is calculated using the coefficient of variation of service times and known solutions from M/M/m queues. Kingman’s approximation is expressed asLet . The mean response time of M/G/m system can be expressed asLet represent the mean response time of M/G/m system on different arrival rates. Based on (8), can be expressed asAlthough rises at a more precipitous rate than , the correlation observed in Figure 2 and aforementioned approximation still remain valid.
Let the variables , , and be the correspondences in model to the variables , , and previously mentioned in model. The mean response time for system can be approximated based on the Pollaczek-Khintchine transform:Suppose that the targeted response time is still ; then
Similar to the process from (14) to (20), the following equations can be derived:With an anticipated arrival rate and the measured service rate , the number, denoted as , of servers that is expected to maintain the required mean response time can be approximated as
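To illustrate how the coefficient of variation enters an M/G/m estimate, the sketch below uses one widely used two-moment approximation: the M/M/m mean waiting time is scaled by (1 + cv²)/2. This may differ in detail from the expressions used in this section, and the names `mgm_mean_response_time` and `min_servers_mgm` are hypothetical. The sizing step is a plain search, not the closed-form approximation of the paper.

```python
from math import floor


def erlang_c(m: int, a: float) -> float:
    """Erlang-C waiting probability via the Erlang-B recursion."""
    b = 1.0
    for k in range(1, m + 1):
        b = a * b / (k + a * b)
    return b / (1.0 - (a / m) * (1.0 - b))


def mmm_mean_wait(lam: float, mu: float, m: int) -> float:
    """Exact mean waiting time in queue for a stable M/M/m system."""
    if lam / mu >= m:
        raise ValueError("unstable: need lam/mu < m")
    return erlang_c(m, lam / mu) / (m * mu - lam)


def mgm_mean_response_time(lam: float, mu: float, m: int, cv: float) -> float:
    """Two-moment approximation: scale the M/M/m wait by (1 + cv^2) / 2."""
    return 0.5 * (1.0 + cv * cv) * mmm_mean_wait(lam, mu, m) + 1.0 / mu


def min_servers_mgm(lam: float, mu: float, cv: float, target: float,
                    m_max: int = 10_000) -> int:
    """Smallest m whose approximate mean response time meets the target."""
    if target <= 1.0 / mu:
        raise ValueError("target is not above the mean service time")
    for m in range(floor(lam / mu) + 1, m_max + 1):
        if mgm_mean_response_time(lam, mu, m, cv) <= target:
            return m
    raise ValueError("no feasible m up to m_max")


if __name__ == "__main__":
    for cv in (0.0, 1.0, 3.0):
        print(cv, min_servers_mgm(lam=0.5, mu=1.0, cv=cv, target=2.5))
```

With cv = 1 the estimate coincides with the M/M/m value, and with m = 1 it reproduces the Pollaczek-Khintchine mean for M/G/1. For Pareto service times with a tail index of 2 or less the theoretical variance is infinite, so in practice `cv` would have to be measured from the trace.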
3. Simulation Setup and Evaluation Metric
A cluster managed by a VOVO scheme periodically adjusts the number of active servers that provide the required services. In general, there are several key functional components, including the following:
(1) Job queue: the job queue holds the waiting requests. Each request enters the tail of the queue and waits for service in FCFS manner. In this work, all jobs share a common queue.
(2) Workload distributor: the workload distributor retrieves a job from the head of the job queue and distributes the job to an available node.
(3) Cluster sizing unit: this unit decides the number of active servers. The decision may be based on some predefined thresholds of certain resources, for example, CPU utilization, job throughput, and energy usage. In this work, the decision is calculated based on (22) or (29) according to the given arrival rate, mean service rate, and targeted response time.
(4) On/off controller: the on/off controller periodically activates or deactivates server nodes according to the number given by the sizing unit.
(5) Managed servers: the cluster consists of a group of identical computer nodes, which may be commodity servers. Each server node processes the assigned jobs and reports its working status to the workload distributor.
3.1. The Design of Simulation Program
A simulation program for the VOVO managed system is developed to investigate the performance of the proposed sizing methods. This program is written using the C++ programming language.
In a real VOVO managed system, every incoming job is queued, and an event notification is issued to the workload distributor upon the arrival of a job. If there are available nodes, the workload distributor then dispatches the queued jobs to the available nodes. When a node has completed its assigned job, it also sends an event notification to inform the distributor about its availability. The instructions for node activation and deactivation are periodically issued by the on/off controller. If a deactivation command is issued to a busy node, the node completes the job in progress before turning itself off. However, simulating the system with a time-based event-driven process would be extremely time consuming. Since the input workload traces are fully prepared before the simulation starts, this work adopts a sequential process that significantly reduces the simulation time. The simulation process is shown in Algorithm 1.
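The idea of processing a prepared trace sequentially, instead of advancing a simulated clock tick by tick, can be sketched as follows. This is not Algorithm 1: it is a simplified illustration that assumes a fixed number of active servers and omits the periodic resizing and pending-off handling. The function name `simulate_fcfs` and the trace layout are assumptions of this example.

```python
import heapq
from typing import List, Tuple


def simulate_fcfs(trace: List[Tuple[float, float]], m: int) -> List[float]:
    """Sequentially replay a trace on m identical servers with one FCFS queue.

    trace: (arrival_time, service_time) pairs sorted by arrival time.
    Returns the response time (waiting + service) of every job.
    """
    # Min-heap holding the time at which each server next becomes free.
    free_at = [0.0] * m
    heapq.heapify(free_at)
    response_times = []
    for arrival, service in trace:
        earliest = heapq.heappop(free_at)   # server that frees up first
        start = max(arrival, earliest)      # job waits if no server is idle
        finish = start + service
        heapq.heappush(free_at, finish)
        response_times.append(finish - arrival)
    return response_times


if __name__ == "__main__":
    demo = [(0.0, 2.0), (1.0, 2.0), (2.0, 2.0)]
    print(simulate_fcfs(demo, m=1))
    print(simulate_fcfs(demo, m=2))
```

Each job is handled in O(log m) time regardless of how long the trace spans, which is why a sequential pass is much cheaper than stepping through every time unit.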
3.2. Randomly Generated Traces
A set of randomly generated traces and two real-world traces are simulated in this work. The Pareto distribution is the most widely used heavy-tailed distribution for modeling service times. A Poisson arrival process is appropriate if the arrivals are from a large number of independent sources, such as web requests [10, 32]. Therefore, the randomly generated traces have Pareto distributed service times with tail indexes from 0.1 to 4.0 in steps of 0.1 and exponentially distributed arrival intervals with traffic intensities from 0.05 to 0.95 in steps of 0.05.
A randomly generated trace with a tail index and a traffic intensity is represented by . is a series of pairs of an arrival time, denoted by , and a service time, denoted by . Suppose that has elements; it can be represented asEach unique combination of and is randomly generated 10 times. That is, there are 10 different traces for a combination of and . Each trace contains values covering 36,000 time units. All traces are generated with the same mean service time. Therefore, there are 7,600 randomly generated traces which have been simulated in this study. The generating functions for Pareto distributed values and exponential distributed values can be found in many textbooks, for example, [10, p. 509]. The coefficient of variation is often used to measure the relative variation in the data and is the ratio of the standard deviation to the mean. For Pareto distributed values, the coefficient of variation denoted by of can be calculated as The coefficient of variation of exponential distributed values is supposed to be 1. The coefficients of variation of service times and arrival intervals of the generated traces are shown in Figures 3(a) and 3(b), respectively.
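A trace of this kind can be produced with inverse-transform sampling, as in the sketch below. The names `pareto_variate`, `exponential_variate`, `pareto_cv`, and `generate_trace`, the scale parameter `x_min`, and the seeding scheme are assumptions of this example; how the original traces were normalized to a common mean service time is not reproduced. The coefficient of variation uses the standard Pareto result 1/sqrt(α(α − 2)), which is finite only for a tail index α above 2.

```python
import math
import random
from typing import List, Tuple


def pareto_variate(rng: random.Random, alpha: float, x_min: float) -> float:
    """Inverse-transform sample from a Pareto(alpha, x_min) distribution."""
    u = 1.0 - rng.random()            # uniform on (0, 1], avoids division by zero
    return x_min / u ** (1.0 / alpha)


def exponential_variate(rng: random.Random, rate: float) -> float:
    """Inverse-transform sample of an exponential interarrival time."""
    return -math.log(1.0 - rng.random()) / rate


def pareto_cv(alpha: float) -> float:
    """Theoretical coefficient of variation; infinite when alpha <= 2."""
    if alpha <= 2.0:
        return math.inf
    return 1.0 / math.sqrt(alpha * (alpha - 2.0))


def generate_trace(alpha: float, lam: float, x_min: float,
                   horizon: float, seed: int = 0) -> List[Tuple[float, float]]:
    """(arrival_time, service_time) pairs with Poisson arrivals up to horizon."""
    rng = random.Random(seed)
    trace, t = [], 0.0
    while True:
        t += exponential_variate(rng, lam)
        if t >= horizon:
            return trace
        trace.append((t, pareto_variate(rng, alpha, x_min)))


if __name__ == "__main__":
    trace = generate_trace(alpha=1.5, lam=2.0, x_min=0.1, horizon=36_000.0)
    print(len(trace), pareto_cv(3.0))
```

The parameter grid described above gives 40 tail indexes times 19 traffic intensities times 10 repetitions, which matches the stated total of 7,600 traces.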
Figure 3: (a) CV of service times versus tail index; (b) CV of arrival intervals versus tail index.
3.3. Real-World Traces
The simulation adopts two real-world workload traces: a publicly available trace and a trace acquired from a university campus. The service time of a request is assumed to be proportional to the size of the page returned in its response.
The publicly available trace was recorded at the 1998 World Cup web site. This workload trace is one of the few logs providing server activation records. It is known for having a heavy-tailed page-size distribution with a tail index of 1.37. Each request recorded in the log contains an arrival time, a response page size, and a server identification. The 1998 World Cup log was collected from 05:30:17 May 1, 1998, through 05:59:55 July 27, 1998, a total of 87 days. The log exhibits the following characteristics: 1,352,804,107 requests, 33 hosting servers, 4,040.684 bytes per response on average, 108.71 requests per second per server (the peak service rate), and an average service time of 0.0092 seconds per request with a standard deviation of 0.084.
The second workload trace was acquired from a university with a student population of 4,219, including 3,531 undergraduates. This web access log was collected from 12:03:59 September 19, 2014, through 00:01:39 October 21, 2014, a total of 31 days. The log is from a site hosting a student information system that provides course information, handout and homework systems, a message system, an email system, and other campus information. The log exhibits the following characteristics: 7,054,170 requests, 8 hosting servers, 5,991.64 bytes per response on average, 74.74 requests per second per server (the peak service rate), an average service time of 0.0134 seconds per request with a standard deviation of 0.227, and a tail index of 0.154 for the service time distribution.
The hourly traffic patterns of the 1998 World Cup log and the 2014 campus log are shown in Figures 4(a) and 4(b), respectively. The two logs represent two distinct service patterns: an occasional service pattern, that is, the 1998 World Cup, and a regular service pattern, that is, the student information system. The World Cup log shows a growth-decay pattern. A recurring pattern that follows the daily working hours is observed in the campus log. Note that a school break and a scheduled maintenance fall within the recorded period.
Figure 4: (a) Traffic pattern of the World Cup log; (b) Traffic pattern of the campus log.
For the World Cup log, the simulated cluster consists of 33 servers based on the information given in the log. For the campus log, the simulated cluster consists of 8 servers. For the randomly generated traces, the simulated cluster consists of 10 servers. The on/off controller periodically sizes the simulated cluster at an interval of 300 seconds, which is long enough to compensate for machine boot-up delays and short enough to reflect demand changes [1, 2, 33].
3.4. Evaluation Metric
Three simulation scenarios, namely, all-on, M/M/m, and M/G/m, are performed. In the all-on scenario, all servers in a cluster are always powered on. This scenario is expected to consume the most energy but to have the best service quality. The M/M/m scenario uses (22) to approximate the number of servers. The M/G/m scenario is similar to M/M/m except that (29) is used for the sizing approximation. Nielsen’s response time limits for usability are adopted by setting the targeted response time at 1 second and the failure threshold at 10 seconds.
The objective of a VOVO scheme is to reduce the energy consumption while maintaining a reasonable service quality. To gauge the performance of an approach (denoted by ), relative measures to all-on are adopted instead of absolute measurements, since the all-on scenario must have the least response time and the highest energy consumption. The considered factors of a scenario are as follows:(1)being satisfactory, denoted by , which is the portion of responses conforming to the targeted response time;(2)acceptance, denoted by , which is the portion of responses being admissible (i.e., under the failure threshold);(3)energy, denoted by , which is the average number of activated servers, since all servers are identical and have the same power profile.
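The three factors listed above can be computed directly from simulation output, as sketched below. The function name `service_factors`, the argument names, and the per-interval list of active server counts are assumptions of this example; the defaults of 1 second and 10 seconds follow the targeted response time and failure threshold stated in this section. The relative measures and the weighted combination are not reproduced here.

```python
from typing import Sequence, Tuple


def service_factors(response_times: Sequence[float],
                    active_servers: Sequence[int],
                    target: float = 1.0,
                    failure: float = 10.0) -> Tuple[float, float, float]:
    """Return (satisfactory portion, acceptance portion, mean active servers).

    response_times: one response time per served request, in seconds.
    active_servers: number of powered-on servers in each sizing interval.
    """
    if not response_times or not active_servers:
        raise ValueError("both inputs must be non-empty")
    n = len(response_times)
    # Portion of responses conforming to the targeted response time.
    satisfactory = sum(r <= target for r in response_times) / n
    # Portion of responses under the failure threshold.
    acceptance = sum(r < failure for r in response_times) / n
    # Identical servers share one power profile, so the mean count stands for energy.
    energy = sum(active_servers) / len(active_servers)
    return satisfactory, acceptance, energy


if __name__ == "__main__":
    print(service_factors([0.5, 1.0, 5.0, 12.0], [4, 6]))
```

Running the same function on an all-on run and on a sized run yields the pairs of values from which relative measures can then be formed.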
The relative measurements of , , and are defined as
Let , , and be the weighting coefficients for , , and , respectively. The relative performance, denoted by , is defined as
4. Simulation Results and Analysis
4.1. Simulation Results
With this relative measurement, that is, (33), the optimal solution produces the minimal value. The simulation results of the randomly generated traces are summarized by the relative performance of the simulated scenarios to all-on under a fixed set of weighting coefficients. To make the results easier to comprehend, the relative performances of M/M/m and M/G/m are visualized using gray levels. Figure 5 shows the relative performance of scenarios M/M/m and M/G/m. It is very difficult to visually differentiate Figures 5(a) and 5(b). Using the averaged values, as shown in Figure 6, it can be found that M/G/m has a slightly better performance than M/M/m. On average, based on Figure 6, scenarios M/M/m and M/G/m outperform all-on in most cases except when the tail index is between 0.4 and 0.9. Furthermore, the averaged relative performances shown in Figure 6(a) are clearly correlated with the coefficient of variation of service times (as shown in Figure 3(a)). This simulation result indicates that both M/M/m and M/G/m yield a worse performance than all-on for diverse access patterns. This may imply that these approaches undersize the cluster when the variation of service times is high.
Figure 5: (a) Relative performance of scenario M/M/m; (b) Relative performance of scenario M/G/m.
Figure 6: (a) Relative performances versus tail index; (b) Relative performances versus traffic intensity.
In Figures 6(a) and 6(b), the curves of M/M/m and M/G/m are indistinguishable at those scales. In fact, the relative performances of scenarios M/M/m and M/G/m are not identical. Figure 7(a) shows the ratios of the relative performance of M/G/m to that of M/M/m. There are some regions between tail indexes 0.3 and 1.3 where the ratios are not 1, that is, where the two scenarios differ. In Figure 7(b), the average ratio is always less than or equal to 1, which means that M/G/m based sizing is more effective than M/M/m based sizing. However, Figure 7 also shows that the difference is very small, that is, under 1% on average. Given the fluctuating nature of web traffic, M/M/m based sizing may be adequate in practice.
Figure 7: (a) Relative performance ratios of M/G/m to M/M/m; (b) Average of relative performance ratios.
To examine the above findings, two real-world traces are simulated under the previously mentioned scenarios, that is, all-on, M/M/m, and M/G/m. Figure 8 shows the cumulative distribution of the response times of the simulated real-world traces. As shown in Figure 8(a), all requests in scenario all-on can be served within 1 second, but only approximately 80% of requests meet this targeted response time in scenarios M/M/m and M/G/m. The curves of M/M/m and M/G/m are indistinguishable in Figure 8(a). In Figure 8(b), more than 99.96% of requests in scenario all-on can be served within 1 second, and more than 97% of requests meet this targeted response time in the M/M/m and M/G/m scenarios. The curves of M/M/m and M/G/m are also indistinguishable in Figure 8(b).
Figure 8: (a) CDF of response times (World Cup); (b) CDF of response times (campus).
Based on the relative performance, Table 1 shows that M/M/m and M/G/m are very similar in both cases. As expected, all-on always has the shortest mean response time but the highest energy consumption. The proposed queueing-based sizing approaches, that is, M/M/m and M/G/m, can significantly reduce energy consumption while maintaining a reasonable service quality.
4.2. Analysis and Comparison
Energy consumption and service quality of the server machines are two major performance measures for a cloud service provider. The above results are fully based on simulation. To evaluate the proposed strategy on a real system, a 6-hour log is extracted from the World Cup trace and fed to a cluster consisting of 33 computers. In addition to the 33-node cluster, there are an external computer that hosts the other key functional components mentioned in Section 3 and a network switch connecting all nodes and the external computer through 1000BASE-T Ethernet. The extracted log contains 22,821,177 access records, from June 29, 1998, 17:20:00 GMT to June 29, 1998, 23:19:59 GMT. Each node of the cluster is equipped with a dual-core 1.66 GHz Intel Atom N280 processor and 1 GB of memory. All nodes use Linux 2.6 as the operating system with Apache 2.2 installed. The average power demand is 20.83 Watts when an idle node waits for a request with all its parts turned on. The peak power level measured on a node is 26.33 Watts. The node profile of the test cluster is shown in Table 2.
In the evaluation, the on/off controller periodically sizes the cluster at an interval of 300 seconds. Interval energy data of the cluster, excluding the external computer and the network switch, are measured and stored by a digital multimeter (DMM). The evaluation result is shown in Figure 9 and conforms to the simulation results. As shown in Figure 9(a), with all nodes turned on, that is, in the all-on scenario, all requests are responded to within 1 second, while only approximately 92% of requests are responded to within 1 second in either the M/M/m scenario or the M/G/m scenario. On the other hand, both the M/M/m scenario and the M/G/m scenario consume much less energy than the all-on scenario, as shown in Figure 9(b). Similar to the simulation results, the curves of M/M/m and M/G/m are also very close to each other in both Figures 9(a) and 9(b).
Figure 9: (a) Response times; (b) Energy usage.
The VOVO strategy has been studied for more than a decade. Many VOVO approaches [33, 35–37], which dynamically size a cluster according to a preset threshold of CPU utilization or resource usage, were developed based on the designs proposed by Chase et al. [1] or Pinheiro et al. [2]. To compare the proposed queueing-based approach with the threshold-based approaches, Pinheiro’s approach [2] is simulated and denoted as the vovo scenario. In vovo, the service demand is smoothed and estimated using the cumulative moving average. vovo periodically activates one more node of the cluster when the estimated utilization rate exceeds a predefined threshold and deactivates one node otherwise. The World Cup trace is also used in the simulation of vovo. Since vovo uses a threshold on the CPU utilization rate instead of the response time as the controlling factor, 3 different threshold values, 0.7, 0.8, and 0.9, are simulated to get a comparable result.
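The threshold rule described above can be sketched as follows. This is a simplified reading of the description and not Pinheiro's implementation: the function name `threshold_vovo`, the notion of a per-period demand expressed in the same units as a per-server `capacity`, and the clamping of the server count between 1 and `m_max` are all assumptions of this example.

```python
from typing import List, Sequence


def threshold_vovo(demands: Sequence[float], capacity: float, threshold: float,
                   m_init: int, m_max: int) -> List[int]:
    """Number of active servers chosen at the end of each sizing period.

    demands: observed demand per period, in the same units as capacity.
    capacity: demand that one server can handle per period.
    """
    m = m_init
    avg = 0.0
    sizes = []
    for n, demand in enumerate(demands, start=1):
        avg += (demand - avg) / n             # cumulative moving average
        utilization = avg / (m * capacity)    # estimated utilization rate
        if utilization > threshold:
            m = min(m + 1, m_max)             # activate one more node
        else:
            m = max(m - 1, 1)                 # deactivate one node otherwise
        sizes.append(m)
    return sizes


if __name__ == "__main__":
    print(threshold_vovo([10, 10, 10, 40, 40], capacity=10.0,
                         threshold=0.8, m_init=1, m_max=8))
```

Because the rule changes the cluster size by one node per period and always moves in one direction or the other, the result depends strongly on the chosen threshold, which is consistent with the need to try several threshold values.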
The simulation results of vovo are evaluated with the metric proposed in Section 3.4 and compared with all-on and M/M/m, as shown in Table 3. According to this comparison, the threshold of the CPU utilization rate has to be less than 0.8 for vovo to achieve a result comparable with M/M/m. Although vovo outperforms M/M/m with the threshold set at 0.7, it requires more nodes and therefore consumes more energy than M/M/m. To find a reasonable threshold value for vovo, it may be necessary to go through several simulation runs or other lengthy procedures. On the other hand, the proposed approach requires only the anticipated arrival rate, the service rate, and the desired response time to approximate the required number of servers, that is, (22).
5. Conclusions
This paper proposes two queueing-based sizing methods to periodically adjust the number of servers in a cluster. The proposed methods aim at achieving a fair energy-delay performance trade-off for server clusters. The proposed approximation formulas, that is, (22) and (29), are simple closed-form expressions, which may be implemented in a network switch for real-time processing.
According to the simulation results, the schemes with the proposed approximation formulas reduce a considerable amount of energy consumption while maintaining comparable service performance when service time fluctuations are gentle. However, the proposed methods tend to underestimate the number of required servers for service processes with high variability, that is, with a tail index between 0.3 and 1.3. A similar observation has also been documented in the literature.
The relative measurements of M/M/m and M/G/m are almost indistinguishable, except that M/G/m is very slightly better than M/M/m for service processes with high variability. Although Internet workload characterization has found that service times follow a heavy-tailed distribution, periodically resizing the cluster can alleviate the situation of long jobs blocking short jobs in the waiting queue, because once a deactivation command is issued to a busy node, the node becomes a pending-off node that has to complete its unfinished job before turning itself off. If a long job is handled by this pending-off node, the queued jobs can be quickly assigned to other newly activated nodes in the next period without waiting for that long job to finish. Therefore, sizing the cluster based on the M/M/m model or the M/G/m model makes little difference. Based on the simulation results, the simpler M/M/m model may be adequate and preferable for sizing clusters adopting queueing-based VOVO schemes.
Server clusters are widely adopted in cloud data centers. To support various kinds of services, including user-end applications and back-end activities, heterogeneity has become common in multifunctional cloud data centers. A data center often has different groups of servers with different computation capacities. Since the basic computing units that are grouped for a specific function usually consist of the same type of machines, the proposed approach is built on the assumption of homogeneous nodes. Therefore, the proposed approach is particularly pertinent for the computing units forming the underlying base of cloud data centers. Nevertheless, extending this work to heterogeneous environments is immediate future work. The multitier system is an obvious case of server heterogeneity and is widely adopted in many enterprise systems. Many approaches have been proposed to address the applicability of queueing models to multitier systems, such as Multitier Internet Applications, Heterogeneous Multitier Web Clusters, Layered Queueing Networks (LQN) [39, 40], and Power-Saving Server Farms. Job dispatching [42, 43] and scheduling [44, 45] also arise as important issues in a heterogeneous environment. Considering these related developments and integrating the proposed approach with the existing work may be a practical way to extend this study to a heterogeneous environment.
Symbols and Definitions
|:||The job arrival rate of a queueing system|
|:||The mean service rate of a server in a queueing system|
|:||The mean service time of a server in a queueing system|
|:||The standard deviation of the service times in a queueing system|
|:||The number of servers in a queueing system|
|:||The traffic intensity,|
|:||A system state, which is the same as the number of jobs in the system|
|:||The probability of a state|
|:||The coefficient of variation of service times in a queueing system,|
|:||The number of jobs in system|
|:||The number of jobs in system|
|:||The number of busy servers in system|
|:||The mean value of|
|:||The mean value of|
|:||The mean value of|
|:||The response time of a job in system|
|:||The response time of a job in system|
|:||The response time of a job in system|
|:||The response time of a job in system|
|:||The waiting time of a job in system|
|:||The waiting time of a job in system|
|:||The mean value of|
|:||The mean value of|
|:||The mean value of|
|:||The mean value of|
|:||The mean value of|
|:||The mean value of.|
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This study is funded by the Ministry of Science and Technology (Taiwan) under Grant no. NSC 101-2632-E-036-001-MY3 for the project "A Study of Applications and Examinations on the Smart Meter Enabled Electricity Grid."
Copyright © 2015 Cheng-Jen Tang and Miau-Ru Dai. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.