Abstract

Distributed cloud has been widely adopted to support service requests from dispersed regions, especially for large enterprises that request virtual desktops for multiple geodistributed branch companies. The cloud service provider (CSP) aims to deliver satisfactory services at the least cost. The CSP selects proper data centers (DCs) close to the branch companies so as to shorten the response time to user requests. At the same time, it also strives to cut cost at both the DC level and the server level. At the DC level, consumption of expensive long distance inter-DC bandwidth should be reduced and lower electricity prices are sought. Inside each tree-like DC, as few servers as possible should be used so as to save equipment cost and power. By nature, there is a noncooperative relation between the DC level and the server level in the selection. To attain these objectives and capture the noncooperative relation, multiobjective bilevel programming is used to formulate the problem. Then a unified genetic algorithm is proposed to solve the problem, which realizes the selection of DC and server simultaneously. Extensive simulation shows that the proposed algorithm outperforms the baseline algorithm in both guaranteeing quality of service and saving cost.

1. Introduction

Distributed cloud computing has been widely adopted to support service requests from dispersed regions by exploiting the differences in locations and service capabilities. As one of the most promising services in cloud computing, virtual desktop technology [1, 2] enables users to access virtual machines (VMs) [3], named virtual desktops (VDs) and deployed in remote data centers (DCs), through local thin clients. Compared with traditional personal computers or desktops, the local clients are equipped with fewer resources and no data or files are saved on them, which greatly reduces the initial investment and also improves confidentiality. Moreover, it can be greener [4]. VDs have attracted the interest of many CSPs such as Microsoft [5], VMware [6], Huawei [7], and other traditional telecom operators [8], and they have been widely used around the world [7].

The service delivery scheme is depicted in Figure 1. VDs are placed in geodistributed DCs close to the branch companies (BCs). Users in each BC access cloud services through one specified gateway using local thin clients. Because of the relatively small scale of a distributed DC or a specific availability policy [9] (e.g., an upper limit on the VMs of one BC in any single DC), more than one DC may be used to accommodate the VDs of a large BC [10]. The distributed DCs are often connected by dedicated high-speed links or expensive long distance links. Inside each DC, servers are normally networked in a tree-like topology. In the context of infrastructure as a service, a VD corresponds to a VM (in this paper, we use servers and physical machines (PMs), as well as VDs and VMs, interchangeably). The traffic delay inside a DC is mainly dominated by the switches or routers in the path. The cost matrix [11] in Figure 1 gives the number of switches between each pair of PMs for a tree-like topology such as PortLand, VL2, or BCube.
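To make the cost matrix concrete, the following sketch (in Python, not from the paper) builds the inter-PM communication-cost matrix for a small three-layer tree, where the cost of a PM pair is the number of switches on the path between them; the layout parameters are illustrative assumptions chosen to mirror the 1-/3-/5-hop pattern of Figure 1.

import numpy as np

def cost_matrix(pms_per_access=4, access_per_agg=2, agg_per_core=2):
    # single core switch; PMs are numbered consecutively under each access switch
    n = pms_per_access * access_per_agg * agg_per_core
    cost = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue                      # traffic inside one PM crosses no switch
            same_access = i // pms_per_access == j // pms_per_access
            same_agg = (i // (pms_per_access * access_per_agg)
                        == j // (pms_per_access * access_per_agg))
            if same_access:
                cost[i, j] = 1                # one access switch
            elif same_agg:
                cost[i, j] = 3                # access - aggregate - access
            else:
                cost[i, j] = 5                # access - aggregate - core - aggregate - access
    return cost

print(cost_matrix()[:4, :8])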

CSP faces great challenges in VD service delivery. On the one hand, it should guarantee the service level agreement. Especially for latency sensitive VDs, it should shorten the response time while keeping the delay within the agreed threshold. The delay consists of three parts: the time between the thin client and the gateway, the time between the gateway and the DC, and the queueing delay inside the DC. Since the first part is fixed for each BC once the gateway is specified, we consider only the latter two. For the second part, we only consider the one-way delay. Delay optimization drives the CSP to deploy VMs in the DC closest to each BC, but the closest one may not be the best candidate from an economic perspective.

On the other hand, aiming to maximize profit, the CSP also strives to reduce cost as much as possible. At the DC level, appropriate DCs should be selected to pursue cheaper electricity prices and less consumption of long distance inter-DC links. Sometimes competition occurs, where adjacent BCs compete for the resources of one DC, which complicates matters even more. Inside each DC, VMs should be consolidated and kept together so that fewer PMs are used, power is saved, and less inter-PM bandwidth is consumed. Because nodes, bandwidth, and power comprise 75% of DC cost [12], they are the prime factors to optimize.

When the CSP selects DCs with cheaper electricity prices that are closer to the BCs, it must consider the service type and the resource capacity of the PMs inside each DC. The usual two-phase scheme first selects DCs according to the power and location objectives, then determines PMs and assigns VMs to the hosts. This is natural but not necessarily a good strategy, because the choice of DCs limits the candidate PMs to a large extent, and the assignment of VMs to these PMs inside each DC cannot guarantee the optimal overall cost. So the decision on DCs is also subject to the assignment of VMs to proper PMs inside each DC. By nature, there is a noncooperative relation, which leads to a bilevel structure. Each level pursues different objectives; some are consistent and others are not (detailed in Section 3). The CSP should balance quality with cost by considering both DC and server selection simultaneously.

To the best of our knowledge, our work is the first to explore unified algorithms for multi-BC virtual desktop placement. The main contributions are as follows:
(1) We formulate the problem as multiobjective bilevel programming considering resource provision at both the DC and the server level.
(2) A novel unified algorithm, the segmented two-level grouping genetic algorithm (STLGGA), is proposed. It realizes the selection of DC and server simultaneously and minimizes the response delay between VDs and BCs at the least cost, consisting of bandwidth, server nodes, and power.
(3) Extensive simulation demonstrates that STLGGA outperforms the baseline algorithm in both multi-BC and single BC scenarios. For multi-BC, it achieves a 13% shorter delay while saving power and resources by 21% and 6% on average, respectively.

The remainder of the paper is organized as follows. Related work is reviewed in Section 2. Section 3 formulates the problem and Section 4 presents a novel GA. It is evaluated in Section 5 and the whole paper is concluded in Section 6.

2. Related Work

Various objectives of virtual desktop placement have been explored recently. Most works consider the resource efficiency inside a DC [2, 14–16]. Man and Kayashima [14] use a bin packing (BP) method to find the group of VDs suitable for a server so that a minimal number of hosts is used. Makarov et al. [15] propose a tool for evaluating VDs; it focuses on assessing the effect of the VD access protocol so as to guide resource configuration according to the reaction time of task execution. Deboosere et al. [2] study different aspects of optimizing resources and user satisfaction: the resource requirement is predicted first, then an overbooking strategy is adopted, and resource allocation and reallocation are used to achieve load balance or energy efficiency. This method provides a VD resource management framework. Secure sharing of VDs is explored in [16]. However, none of these works considers the location of VDs or inter-DC resources.

Other works investigate VD deployment across DCs. Kochut [17] prefers placing VDs across DCs in different locations to lower the power consumption, but service quality is not addressed. An OpenFlow based mechanism is proposed to distribute VDs in multiple DCs so that performance and scalability are maximized [18]; however, the authors focus on route setup, route selection, and load balancing between the thin client and the DC. The most similar work to ours is [1], which optimizes latency sensitive VDs by exploiting the locations of geodistributed DCs. A greedy algorithm, VMShadow, migrates a VD closer to its user while considering the overhead of migration. But it does not establish a mapping between VDs and PMs. Furthermore, the resources at the DC level and inside the DC, as well as energy cost, are not involved.

Only a few papers [13, 19] consider the selection of DC and server simultaneously when placing VMs. Yao et al. [19] propose a two-time-scale Lyapunov optimization algorithm to reduce power cost for delay tolerant workloads. A multilevel group GA (MLGGA) is proposed to reduce carbon emission by exploiting green energy [13]. The main idea is DC consolidation and PM consolidation: the distributed DCs are viewed as the higher level group and the servers in a DC as the lower level group. This scheme can group the items and is designed for multilevel bin packing, but it does not consider bandwidth optimization or quality of service guarantees. Both of these works focus on general VMs. Calyam et al. [10] present a utility driven model, U-RAM, to allocate resources for VDs so that the utility in each DC is maximized. The resources should be sufficient to ensure the user-perceived timeliness and the coding efficiency of the VD access protocol. It adopts a two-phase scheme: the DCs are selected based on balance, power, or express migration, and then VMs are assigned to PMs in the selected DC. Inter-DC bandwidth cost is not considered, and the overall resource cost across the two levels cannot be optimized in an integrated manner. None of these works captures the noncooperative relation between the DC network and the servers.

3. Formulation

Suppose there are a given number of BCs, each requiring a certain number of VMs, and there are several candidate DCs, each containing a set of PMs; the total number of PMs is the sum over all DCs. The cost and quality aware multi-BC virtual desktop placement problem can be summarized as placing the VMs belonging to the BCs on the PMs distributed in the DCs, so that the maximum distance between the DCs and the BCs being served is as short as possible, while minimizing the overall cost at both the DC and the server level, including power, network, and server cost. The problem is modeled as multiobjective bilevel programming (MOBLP).

3.1. Low Level Objective and Constraints

The low level is the server level. It aims to select PMs in the DC determined by the high level (the DC level, as illustrated in Figure 1), and the VMs are placed in that DC. Note that the VMs serving different BCs may be placed in one DC. The number of VMs to place in the DC is fixed. Each VM requires several kinds of resources, such as network bandwidth, CPU, memory, and disk storage. Each PM in the DC possesses the same kinds of normalized resources (each dimension has a normalized weight). Specifically, for ease of notation, we designate the first dimension as bandwidth and the second as CPU. The communication traffic rate between a pair of VMs is given, and the rate of a VM with itself equals 0. Herein we suppose the rates are symmetric, in that a bidirectional link normally has equal bandwidth in each direction; if they differ, we take the larger one as the traffic. Two Boolean variables are controlled by the low level: one indicates whether a VM is assigned to a PM (1 if assigned and 0 otherwise), and the other indicates whether a PM is active (1 if active and 0 otherwise).

The low level only considers the resource cost of PM nodes (the former half of the objective) and the bandwidth cost (the latter half) inside each DC, where each resource dimension has its own price. The node term is summed from the second dimension because the bandwidth cost is accounted for explicitly in the latter part. The active-PM indicator used as a multiplier means that only the cost of active PMs needs to be considered. The bandwidth price depends on the layer, where the layers represent the access, aggregate, and core layer (Figure 1), respectively; normally, the higher and scarcer the layer, the higher the price. The bandwidth consumption of each layer is defined as the sum of that layer's traffic between VMs placed across different PMs (2), so it equals zero for VMs on the same PM (3).
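As a hedged illustration of this cost structure (not the paper's exact formula, whose symbols were not recoverable here), the sketch below charges the non-bandwidth resources of every active PM at their unit prices and charges each inter-PM flow at every layer it traverses; the data layout, the top_layer() helper, and the prices are assumptions.

def low_level_cost(assign, traffic, pm_resources, res_price, bw_price, top_layer):
    # assign[v] -> PM hosting VM v; traffic[(u, v)] -> rate between VMs u and v
    # pm_resources[p][d] -> resource of PM p in dimension d (d = 0 is bandwidth)
    # res_price[d] -> unit price of dimension d; bw_price[l] -> price of layer l
    # top_layer(p, q) -> 1, 2, or 3: highest layer (access/aggregate/core) between PMs p and q
    active = set(assign.values())
    node_cost = sum(res_price[d] * pm_resources[p][d]
                    for p in active
                    for d in range(1, len(res_price)))    # start at d = 1: bandwidth priced below
    bw_cost = 0.0
    for (u, v), rate in traffic.items():
        if assign[u] == assign[v]:
            continue                                      # same-PM traffic consumes no link
        for layer in range(1, top_layer(assign[u], assign[v]) + 1):
            bw_cost += bw_price[layer] * rate             # a flow loads every layer it crosses
    return node_cost + bw_cost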

In each PM, the resource capacity should be respected. If VMs are placed in the same PM, their mutual traffic becomes intra-PM traffic and occupies no outgoing bandwidth, so the intra-PM traffic is subtracted. The bandwidth constraint is as follows:

The constraint for the other resource dimensions is

For each DC , the low level programming is written as

Constraint (7) states that a PM is viewed as active if it hosts at least one VM. Constraint (9) implies that all VMs should be assigned and a VM can only be placed in one PM.
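The feasibility conditions described above can be sketched as a simple check (an illustrative reading, not the paper's notation): every VM is mapped to exactly one PM, the bandwidth demand of the VMs on a PM minus the traffic that stays inside that PM must fit its bandwidth capacity, and every other dimension must fit as well.

def feasible(assign, vm_demand, traffic, pm_capacity):
    # assign[v] -> PM; vm_demand[v][d] -> demand of VM v in dimension d (d = 0 is bandwidth)
    # pm_capacity[p][d] -> capacity of PM p in dimension d
    for p in set(assign.values()):
        vms = [v for v, host in assign.items() if host == p]
        # bandwidth: total demand minus the traffic that stays inside this PM
        intra = sum(rate for (u, v), rate in traffic.items() if u in vms and v in vms)
        if sum(vm_demand[v][0] for v in vms) - intra > pm_capacity[p][0]:
            return False
        # the remaining dimensions (CPU, memory, disk, ...)
        for d in range(1, len(pm_capacity[p])):
            if sum(vm_demand[v][d] for v in vms) > pm_capacity[p][d]:
                return False
    return True  # assign maps each VM to exactly one PM by construction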

3.2. High Level Objectives and Constraints

The high level focuses on DC selection. It optimizes the delay, the overall power, and the physical resource cost. A binary variable indicates whether there exists a PM in a given DC that is used to serve a given BC; it equals 1 if so and 0 otherwise. Once there exists one active PM in a DC, that DC is viewed as active. So we have

Suppose the one-way delay between a DC and the BC being served is given; it can be estimated using extensively studied techniques [20]. We hope to ensure that all the delays are within a threshold, which is the maximum delay permitted. The multiplier in this constraint means that we only care about the delay between an active DC and the BC it serves. At the same time, we want to reduce the delay to a minimum. This is the first objective:

The second objective aims to optimize the power cost of all the selected DCs by leveraging geodiverse electricity prices, where each DC has its own electricity price, a coefficient reflects the relation between power and CPU load, and an additional term gives the power consumption of a PM in the idle or standby state. Because power grows roughly in proportion to CPU utilization [21], we use an affine function of CPU load to estimate the power cost. To make the power consumed and the physical resource cost comparable, we follow [12]: all one-time purchased physical resource cost is amortized over a reasonable lifetime, so all the physical resource prices in the formulation are the amortized ones. Implicitly, the formulation only balances the cost within the amortized period. The former half of the objective represents the power cost caused by workload and the latter half the power in the idle or standby state.
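A minimal numeric sketch of this power model, assuming (as in Section 5) that idle power is 60% of peak and that the dynamic part grows linearly with CPU load; the peak power and price figures are made up for illustration.

def dc_power_cost(cpu_loads, elec_price, peak_power=300.0, idle_ratio=0.6):
    # cpu_loads: CPU utilizations (0..1) of the active PMs hosted in one DC
    idle = idle_ratio * peak_power
    dynamic = peak_power - idle
    return elec_price * sum(idle + dynamic * u for u in cpu_loads)

# e.g., 10 active PMs at 50% load in a DC with electricity price 0.08 per unit of energy
print(dc_power_cost([0.5] * 10, elec_price=0.08))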

The third objective is the overall resource cost, including PMs and bandwidth. Its former two parts are consistent with the objective of the low level; however, the low level only considers resources inside each DC, whereas the high level additionally considers the inter-DC bandwidth cost in the third part. The definition of inter-DC bandwidth is similar to that of inter-PM bandwidth (2), (3):

The high level optimization can be summarized as a multiobjective program (MOP), in which the assignment and activation variables are the optimal solutions of the low level programming.

Obviously, each level has its own objectives and constraints. The high level objective value depends on the optimal solutions of the low level. Not only the first two parts of the resource cost objective, which are exactly what the low level pursues, but also the inter-DC bandwidth and the overall power are subject to the optimal solution of the lower server level. The inter-DC bandwidth consists of the traffic between VMs placed across DCs (16), and the overall power is directly related to the CPU load and how many PMs are used. The low level optimizes itself under the high level decision already made (the DC is determined by the high level). In particular, there is a noncooperative relation between the DC level and the server level. For example, each DC only tries to minimize the resources it consumes, but sometimes this is contrary to minimizing the inter-DC traffic, because minimizing inter-DC traffic may require consuming more PM resources. This is exactly the case described by bilevel programming [22].

Note that because the bandwidth related constraints (2), (4), and (16) are nonlinear, the MOBLP is a nonlinear bilevel program. Even linear bilevel programming, the simplest class of bilevel programming problems, has been proved to be strongly NP-hard, so the problem formulated herein is NP-hard. GA has been demonstrated to be a very efficient scheme for bilevel programming [23, 24], so we resort to GA to solve it.

4. Algorithm

For a minimization program, suppose there are two vectors u and v with the same dimension k; we say u dominates v if and only if u_i ≤ v_i for every i = 1, ..., k and there exists at least one j so that u_j < v_j. Suppose the feasible solution set of the multiobjective program (MOP) is given; then a solution is Pareto optimal to the MOP if there is no other feasible point whose objective vector dominates that of this solution. In short, any decrease in one dimension of a Pareto optimal objective vector must lead to the increase of at least one other dimension. We try to find multiple Pareto solutions for the MOBLP.
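The dominance test and the Pareto filter can be written compactly; the sketch below restates the definition above for minimization (vector names are generic).

def dominates(u, v):
    # u dominates v: u is no worse in every objective and strictly better in at least one
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))

def pareto_front(points):
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

print(pareto_front([(1, 5), (2, 2), (3, 1), (4, 4)]))  # (4, 4) is dominated by (2, 2)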

Distinctive encoding and decoding, initial population generation scheme, and genetic operators are designed to address the multiobjective bilevel resource provision problem.

4.1. Encoding and Decoding Scheme

In GA, it is very important to reflect the structure and information of the problem to be optimized in the genes of the chromosome (readers are assumed to be familiar with the structure of GA; otherwise please refer to [25] for details). Considering the characteristics of multi-BC and the two levels, we propose a segmented two-level grouping encoding scheme as depicted in Figure 2. The entire set of candidate PMs and DCs is numbered first. The encoding gives the serial number (SN) of the PM to which each VM is assigned and of the DC to which each PM belongs. It consists of segments in series, one per BC, separated by semicolons. The structure of each segment is the same as the encoding of MLGGA [13] and comprises three parts: the first is the SNs of the PMs used, the second is the SN of the DC to which each PM belongs, and the third is the DC list of the second part after deleting the repeated entries. The three parts are separated by colons. For example, the VMs of BC 1 are assigned to a set of PMs; the second part records the DC of each of those PMs, and the third part abbreviates that list by deleting the repeated DCs.
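The following illustrative snippet shows the shape of such a chromosome for two BCs; all serial numbers are invented for illustration and do not come from the paper's example.

segment_bc1 = {
    "pms":   [3, 7, 12],   # serial numbers of the PMs hosting BC 1's VMs
    "pm_dc": [1, 1, 2],    # DC to which each of those PMs belongs
    "dcs":   [1, 2],       # the second part with repeated DCs removed
}
segment_bc2 = {
    "pms":   [12, 20],
    "pm_dc": [2, 4],
    "dcs":   [2, 4],
}
chromosome = [segment_bc1, segment_bc2]   # segments in series, one per BC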

The new encoding scheme lists the PMs and DCs to which the VMs of each BC are assigned. It captures the placement of the VMs of multiple BCs as a whole and facilitates resolving the competitive scenario, so better solutions can be found. It also remedies the limitation of MLGGA, which can only place the VMs of one BC at a time, and achieves better performance as revealed in Section 5. The decoding is self-evident.

A distinctive initial population generation scheme and genetic operators are designed to address the multiobjective bilevel VD placement problem.

4.2. Initial Population Generation

The Pareto solution aims to optimize each scalar objective, so we strive to embody the optimum of each objective in the initial population. Solutions targeting each single objective are produced so that the initial population contains rather good genes to be inherited by the offspring.

For the delay objective, the shortest delay VM placement algorithm (SD) is proposed in Algorithm 1. For each BC, only those DCs whose delay to the BC does not exceed the threshold are considered; denote these candidate DCs as the feasible DC set of the BC. For each BC, SD prefers placing the VMs of this BC in the closer DCs of its feasible set until all VMs are assigned. This procedure is repeated for all BCs.

Input: : numbers of BCs
  : feasible DC set for BC and
  : VMs set for BC
Output: VM placement solution encoding
(1) for    do
(2)  Sort the DCs in according to ; a closer DC is selected
  with higher probability. Denote the sorted sequence
  as
(3)  for    do
(4)   Randomly select a non-assigned VM from and put
   it into a random non-full PM in until it is full
    or the upper limit is reached
(5)   ∖VMs assigned to )
(6)   if    then
(7)    break
(8)   end if
(9)  end for
(10) end for
(11) Encode the solution according to Section 4.1
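A hedged Python sketch of this procedure is given below; it sorts the feasible DCs deterministically by delay (the listing selects closer DCs only with higher probability) and relies on an assumed fits() capacity test and an assumed dc.pms attribute.

import random

def shortest_delay_placement(bcs, feasible_dcs, delay, vms, fits):
    # bcs: BC ids; feasible_dcs[b]: DCs within the delay threshold of BC b
    # delay[(dc, b)]: one-way delay; vms[b]: VMs of BC b; fits(vm, pm): capacity test
    placement = {}                                    # vm -> (dc, pm)
    for b in bcs:
        remaining = list(vms[b])
        for dc in sorted(feasible_dcs[b], key=lambda d: delay[(d, b)]):
            for pm in dc.pms:
                # fill this PM with randomly chosen unassigned VMs until it is full
                while remaining:
                    vm = random.choice(remaining)
                    if not fits(vm, pm):
                        break
                    placement[vm] = (dc, pm)
                    remaining.remove(vm)
                if not remaining:
                    break
            if not remaining:
                break
    return placement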

In Algorithm 1, we can replace the sorting criterion with the electricity price. Then we have another method, which strives to place VDs in the feasible DC with the lowest electricity price. We denote it as LeastPowerCost (LPC). LPC aims to optimize the power cost objective.

For the resource cost objective, a modified first fit decreasing algorithm (MFFD), which takes into account both node and inter-PM bandwidth optimization, is depicted in Algorithm 2. It strives to place VDs in the smallest-cost cluster with the largest capacity inside the largest DC of the feasible set. The communication cost is defined as the number of switches or routers in the path [11], and the PMs with the same pairwise cost are named a cluster of that cost. There are three kinds of clusters in the topology in Figure 1; for example, PM1~PM4 is a 1-cluster, PM9~PM16 is a 3-cluster, and PM5~PM8 and PM9~PM11 are both 5-clusters. Each time, we prefer the largest capacity cluster with the smallest cost, because a bigger cost means more aggregate and core links will be used, and overconsumption of these relatively scarce top layer links may further lead to congestion and communication delay. This tactic can reduce consumption of the higher layer bandwidth. In the process of placement, once a cluster is selected, another cluster with the same cost will be selected with priority if the two together constitute a larger cluster. This ensures the effect of consolidation and saves more links between clusters. For example, if PM1~PM4 is the cluster with the biggest capacity among all 1-clusters, it will be selected first and then the PM5~PM8 cluster is considered unconditionally because together they constitute a larger cluster.

Input: : numbers of BCs
  : feasible DC set for BC and
  : cost matrix of each DC as illustrated in Figure 1
  : VMs set for BC
Output: solution encoding
(1) for    do
(2)  Sort the DCs in according to their capacity; a bigger DC
   is selected with higher probability. Denote the sorted
   sequence as
(3)  for    do
(4)   while () and (there exists a not-yet-searched cluster)  do
(5)    Find smallest-cost-PM cluster with the largest
     capacity. Suppose there are total PMs in this
     cluster
(6)    Sort these PMs in nonincreasing order of capacity ((21)
      and (22)), denoted as
(7)    for    do
(8)    Select the biggest non-assigned VM from
      and put it into until is full
(9)    ∖VMs assigned to )
(10)      if    then
(11)      break
(12)      end if
(13)    end for
(14)   end while
(15)   if    then
(16)     break
(17)   end if
(18)  end for
(19)  if () and (all clusters have been searched) then
(20)   There is overflow, return FAILURE
(21)  end if
(22) end for
(23) Encode the solution according to Section 4.1

The capacity of a PM is defined as in [26]: for an empty PM, it is the sum over all of its resource dimensions; if there is at least one VM in the PM, the residual capacities are used instead, each weighted by a normalization factor for its dimension. The reason is that if any single dimension is used up, the PM cannot host any more VMs. The corresponding cluster capacity is defined as the sum of the capacities of the PMs in the cluster, and similarly for the capacity of a DC. The capacity of a VM is defined similarly to (21), except that the PM resources are replaced by the VM's requirements.

MFFD strives to place VDs in the smallest-cost cluster with the largest capacity inside the largest DC chosen from the feasible set.
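The capacity bookkeeping used to rank clusters and DCs can be sketched as follows; since the exact form of (21) and (22) could not be recovered here, the weighted sum over residual resources is an assumption standing in for the per-PM capacity.

def pm_capacity(residual, weights):
    # residual[d]: remaining resource of the PM in dimension d; weights[d]: normalization factor
    return sum(w * r for w, r in zip(weights, residual))

def cluster_capacity(cluster, weights):
    # a cluster's capacity is the sum of the capacities of its PMs
    return sum(pm_capacity(pm, weights) for pm in cluster)

def dc_capacity(clusters, weights):
    # a DC's capacity aggregates its clusters in the same way
    return sum(cluster_capacity(c, weights) for c in clusters)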

MFFD, SD, and LPC are each invoked multiple times to produce the initial feasible solutions. This scheme produces a rather large initial population, and the three groups of solutions embody relatively good assignments for the three objectives, respectively. Thus, the initial parents are endowed with some optimal properties. In the subsequent crossover and mutation, although an initial solution may be replaced by a new one that dominates it, the size of the population is kept at least as large as the initial one so that the GA can converge faster.

Lines (2)–(21) of MFFD describe the placement scheme for one BC; we denote them as MFFDOneB. Compared with line (2), where a DC with larger capacity is preferred, in MFFDOneB the DC that has the smallest residual capacity and has already been used has a higher priority for selection, so as to take full advantage of the residual capacity and reduce the number of DCs used. Lines (4)–(21) of MFFD describe the placement scheme inside one DC; we denote them as MFFDOneDC. MFFDOneDC mainly works inside one DC, while SD, LPC, MFFD, and MFFDOneB work across DCs.

4.3. Crossover Operator

The crossover is applied segment by segment, and every segment can produce an effect on other segments if the BCs corresponding to these segments compete for the same DC. The crossover operator is depicted in Algorithm 3; each segment of an encoding corresponds to one BC. For each segment, the mechanism of the crossover operator in MLGGA is adopted, but to capture the multi-BC scenario, we propose two exceptions.

Input: : two individuals
Output: : two individuals after crossover
(1) for    do
(2)   crossover with according to
  MLGGA [13], with the exception that, when
  “competition” occurs, the competition resolution
  scheme described in the text is used
(3) end for

First, the classic FFD used in MLGGA is replaced by MFFDOneB when placing the VMs of one BC, or by MFFDOneDC when placing VMs inside one DC.

Second, a new technique is recommended to address the “competition” case in the multi-BC scenario. MLGGA tries to inherit the property of the parent and keep the VMs in the inserted group (PM or DC) unchanged. So it clears, in advance, the differing resident VMs in the same group of the target chromosome and keeps the common VMs. For example, consider two parent chromosomes in which the same letter in different cases represents the same PM or DC: in the first (target) chromosome the DC in question contains VM 4 and VM 8, while in the second it contains VM 2 and VM 4. When the crossover operates, the group from the second chromosome is inserted into the target, so the offspring momentarily contains two copies of the same DC. The VMs resident in the target's copy are then cleared so that the VMs of the inserted group are kept unchanged; that is, VM 8 is reassigned by FFD while the common VM 4 is preserved. But in the multi-BC scenario, the target DC may contain many VMs of other BCs, and when there are too many of them the DC does not have enough residual capacity to host these VMs. This is exactly the “competition” case, where BCs compete for the same PM or DC. We propose a competition resolution scheme that allows one BC to drive out the VMs of other BCs, as follows. The resident VMs of the other BCs and of this driving BC in the target DC are cleared first, which ensures that the VMs in the inserted group are kept unchanged and the group property of the parent is inherited. Then the cleared VMs are reassigned in the DC by MFFDOneDC in the following BC order: first this driving BC and then another randomly selected one; the procedure is repeated until all BCs are processed or the DC is full. At last, the overflow VMs are reassigned to the feasible DCs of the BC being served by MFFDOneB. Crossover can reduce both the delay and the PM and network cost.
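A hedged sketch of this competition resolution step is shown below; mffd_one_dc and mffd_one_b stand in for MFFDOneDC and MFFDOneB, and the caller is assumed to have already cleared the contested DC as described above.

import random

def resolve_competition(dc, driving_bc, cleared_vms, mffd_one_dc, mffd_one_b):
    # dc: the contested DC; cleared_vms: dict BC -> VMs that were cleared from dc
    # mffd_one_dc(dc, vms) returns the VMs that did not fit; mffd_one_b(bc, vms)
    # places the overflow in that BC's other feasible DCs
    order = [bc for bc in cleared_vms if bc != driving_bc]
    random.shuffle(order)                       # remaining BCs in random order
    order.insert(0, driving_bc)                 # the driving BC gets the DC first
    overflow = {}
    for bc in order:
        leftover = mffd_one_dc(dc, cleared_vms.get(bc, []))
        if leftover:
            overflow[bc] = leftover
    for bc, vms_left in overflow.items():       # spill the rest to feasible DCs
        mffd_one_b(bc, vms_left)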

4.4. Mutation Operator

The mutation happens in the third part of each segment, that is, the DC list, thus leading to the replacement of VMs in the PMs belonging to the mutated DC. There are three possible scenarios for the mutation: the DC mutates to an idle candidate, it mutates to a DC already used by the same BC, or the target DC has been used by other BCs, in which case this DC/PM is competed for by two BCs. The latter two scenarios may coexist. Because the newly added VMs may violate the capacity or exceed the upper limit of the BC, some VMs may overflow; that is, the resident VMs need to be reassigned. This facilitates the changeover of DCs/PMs between two BCs so that resources can be balanced between them. See Algorithm 4 for details. In line (7), the overflow VMs of the other BC are preferentially assigned to the mutated DC because it may still not be full after the VMs of the mutating BC are assigned.

Input: : input individual
  : feasible DC set for BC and
Output: : individual after mutation
(1) for    do
(2)  Randomly select one DC in the third part of
  and change it to one DC with lower electricity price
   with higher probability in , denoted as
(3)  if ( is used by ) or ( is used by another BC (such
  as )) or ( is shared by BC and BC ) then
(4)   Clear . Assign VMs of BC in to by
   MFFDOneDC in priority
(5)   The overflow VMs of , if any, are assigned to other
    DCs in nonincreasing order of electricity
    price in . Inside each DC, the assignment is
    completed by MFFDOneDC
(6)   The VMs of originally in are assigned to by
   MFFDOneDC.
(7)   The overflow VMs of , if any, are preferentially
    assigned to , then to other DCs in nonincreasing
    order of electricity price in . Inside each
    DC, the assignment is completed by MFFDOneDC
(8)  else
(9)    Place the VMs in by MFFDOneDC
(10)  end if
(11)  if All VMs are assigned successfully then
(12)   Clear the VMs of BC in DC
(13)  else
(14)   Keep the original assignment before mutation unchanged
(15)  end if
(16) end for

Power cost optimization is mainly fulfilled by the mutation operator, since the electricity price differs across DCs.

4.5. The Unified Genetic Algorithm: STLGGA

The unified GA, segmented two-level grouping GA (STLGGA), is depicted in Algorithm 5.

(1) Pareto solution set
(2) Generate the initial population by SD (Algorithm 1), LPC,
and MFFD (Algorithm 2), respectively. Denote this
population as
(3) while Stopping condition not satisfied do
(4)  Crossover. Randomly select two individuals from and
   denote them as . Apply Algorithm 3 to them,
   producing two offspring
(5)  Mutation. Apply Algorithm 4 to to produce
  
(6)  Update of  . For each in ,
  If , then
  If , then
  The size of is kept not less than
(7)  Update of  .
  Remove from all the points if is
  dominated by and
   Add to if there exists no point in so
   that dominates
   Add to if there exists no point in so
   that dominates
(8) end while
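Step (7) maintains a nondominated archive; the sketch below shows one plausible reading of that update, with fitness() standing in for the evaluation of the three objectives.

def dominates(u, v):
    # u dominates v (minimization): no worse everywhere, strictly better somewhere
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))

def update_archive(archive, offspring, fitness):
    # archive: list of (solution, objective-vector) pairs
    for child in offspring:
        f_child = fitness(child)
        # drop archived points that the child dominates
        archive[:] = [(s, f) for s, f in archive if not dominates(f_child, f)]
        # keep the child only if no archived point dominates it
        if not any(dominates(f, f_child) for _, f in archive):
            archive.append((child, f_child))
    return archive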

5. Simulation Results

The DC network is simulated on a grid in the plane. Generally, the DCs exhibit clustering: 80% of the DCs follow a normal distribution and 20% are selected uniformly from the grid. The distance between DCs is measured on this grid, and the number of PMs inside each DC is drawn randomly. Configurations of PMs are borrowed from the IBM System x M5 server and System x3300 M4 server [27]. Four classes of PMs equipped with a 1 Gbps Ethernet card are simulated. Considering the proportional configuration of PMs, we simply give each class a price instead of giving every resource a unique price. For the resource requirements of VMs, we adopt the four configurations of the Amazon m3 series [28] (for consistency with the PMs, GiB is replaced by GB); the m3 series is designed for general purpose use and is very suitable for VDs. Table 1 lists the details. For each DC, we simulate a tree-like topology (Figure 1): each core switch administrates 15 aggregate switches, each aggregate switch administrates 2 access switches, and 5 PMs are connected to each access switch.
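The DC layout can be reproduced with a short script like the one below: 80% of the DCs are clustered around a normally distributed center and 20% are drawn uniformly from the grid. The grid size, cluster spread, and per-DC PM range are assumptions, since the concrete values were not recoverable from the text.

import numpy as np

def simulate_dcs(n_dc=40, grid=1000.0, cluster_std=100.0, seed=0):
    rng = np.random.default_rng(seed)
    n_cluster = int(0.8 * n_dc)
    center = rng.uniform(0, grid, size=2)
    clustered = rng.normal(center, cluster_std, size=(n_cluster, 2))
    uniform = rng.uniform(0, grid, size=(n_dc - n_cluster, 2))
    coords = np.clip(np.vstack([clustered, uniform]), 0, grid)
    pm_counts = rng.integers(50, 500, size=n_dc)      # per-DC PM count (range assumed)
    return coords, pm_counts

coords, pm_counts = simulate_dcs()
print(coords.shape, pm_counts[:5])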

According to VMware [29], a virtual machine cannot have more vCPUs than the number of logical cores of the host. The number of logical cores equals the number of physical cores if hyperthreading is disabled, or at most twice that number if hyperthreading is enabled. So we suppose there is a one-to-one relation between vCPU and physical core.

For the multi-BC scenario, the number of VDs each BC requires is chosen uniformly between 20 and 300. The traffic between VDs follows the distribution given in [9] (in Mbps). All VDs in the same BC communicate with one another, while for VDs belonging to different BCs, only 10% communicate. Bandwidth prices are attached to the access, aggregate, and core layer, respectively, and a separate price applies to the long distance inter-DC bandwidth.

The electricity price pool is taken from the August 2015 data of the EIA [30]. Each simulated DC is assigned a random price selected from the pool. The transmission delay is measured by distance; herein we use 300 as the threshold. We assume the queueing delay is the same for all DCs and therefore omit it. We take the idle or standby power consumption as 60% of the peak power [21].

The initial population of STLGGA is generated as described in Section 4.2. Our simulation is implemented in Matlab, and all numerical experiments stop after 30 thousand iterations. On average, a run takes about 105 seconds, a little slower than MLGGA, which takes about 81 seconds as claimed in [13]; this is because STLGGA needs to deal with the competition scenario. In all the simulations, the numerical results are the average over all the Pareto solutions of the multiobjective program.

We use a recently proposed unified algorithm, MLGGA [13], as the baseline. Because it can only place the VMs of one BC at a time, one of the three objectives cannot be calculated during placement; therefore, the other two are used to calculate the fitness values. MLGGA is invoked for a randomly selected BC to optimize these two objectives, a solution is chosen randomly from the Pareto solutions as the assignment scheme of that BC, and then the other BCs are traversed on the basis of the resources remaining after the deployment of the VDs of previous BCs, until the VMs of all BCs are assigned. The remaining objective can then be calculated from the results. The simulation results are detailed in Section 5.1. We also compare the two algorithms for the single BC case in Section 5.2, where each algorithm pursues the three objectives defined in the MOBLP. Both scenarios validate the effectiveness of STLGGA.

5.1. Simulation Results for Multi-BC Scenario

To investigate the scalability of STLGGA, we vary the number of BCs from 5 to 15. Figure 3 plots the solution quality compared with MLGGA. The three objectives achieved are shown in Figures 3(a)~3(c). As the number of BCs increases, the delay, together with power and resource cost, increases for both algorithms. STLGGA results in an average of 13% shorter delay, which also means that communication among users supported by VDs deployed in different DCs becomes much faster. When STLGGA is used, power and resources are saved by 21% and 6% on average, respectively.

Resource efficiency is detailed in Figures 3(d)~3(g). STLGGA uses fewer PMs; the average reduction is about 27%, that is, 118 PMs (Figure 3(d)). Because of the heterogeneity of PMs, we also compare the PM resources consumed: STLGGA leads to 1%~19% resource cost saving, about 9% on average (Figure 3(e)). Fewer PMs mean that less power is needed to keep PMs active, so power efficiency is improved and the total power is reduced, which further backs up Figure 3(b). On average, STLGGA also reduces the expensive inter-DC traffic by 13% (Figure 3(f)). Because the total traffic between all VMs is fixed, STLGGA saves more of the expensive long distance inter-DC bandwidth by converting inter-DC traffic to intra-DC traffic at the cost of relatively cheaper intra-DC bandwidth across the access, aggregate, and core layers. The traffic across these three layers produced by STLGGA is larger than that produced by MLGGA, but the cost of the total required bandwidth inside the DCs is much lower, an average reduction of about 13%. This is consistent with our purpose of optimizing the bandwidth cost inside the DC (Figure 3(g)).

Suppose 5 BCs apply for VDs. We study VD placement with the number of DCs varied from 4 to 60, while the capacity of each DC and the number of VDs requested by each BC remain as before. Figure 4 demonstrates the three objectives: on average, the delay is shortened by 5%, and power and resource cost are reduced by 10% and 5%, respectively. The detailed resource comparison shows the same tendency as Figures 3(d)~3(f) and is omitted here.

Naturally, one expects the solution quality to improve as the number of DCs increases, because there are more candidates. But in reality, the Pareto solution tries to balance the three objectives, so the curves are not smooth. The delay shows an uptrend when the number of DCs increases from 35 to 50 (Figure 4(a)), accompanied by a decline in power cost (Figure 4(b)) and resource cost (Figure 4(c)); when the delay decreases as the number of DCs increases from 55 to 60, the latter two objectives go upward. This is due to the random number of PMs per DC, the location diversity, and the random electricity price assignment. It is also observed that STLGGA still performs much better than MLGGA in the latter two objectives, at the cost of a slightly larger delay when 60 DCs are searched. Generally, with the number of DCs increasing, all three figures show a downtrend.

5.2. Simulation Results for Single BC Scenario

We also examine VD placement when there is just one BC, with the number of VDs applied for varying from 500 to 1000. This time, MLGGA is invoked to optimize all three objectives simultaneously within the feasible DCs, since the previously incomputable objective can now be calculated. The average results over the Pareto solutions of both algorithms are reported in Figure 5. Similar to the multi-BC results, STLGGA outperforms MLGGA for the three objectives, as well as for the PM number, PM resources, inter-DC traffic, and total required bandwidth cost.

This further validates that STLGGA works well not only for multi-BC but also for a single BC.

6. Conclusion

Considering the bilevel resource provision for the deployment of virtual desktops of multiple BCs in a distributed cloud, service delay, power efficiency, and cost optimization are explored in this paper. The problem is formulated as multiobjective bilevel programming, which captures the noncooperative relation between the DC network level and the server level. It thus facilitates the optimization of node and bandwidth cost at both levels without violating the delay threshold, while striving to further minimize the maximum delay of each BC. Because of the NP-hard nature of the problem, a segmented two-level grouping GA is proposed, with novel encoding, initial population generation, and operator schemes tailored to the problem. The effectiveness of the algorithm is validated by extensive simulations; it outperforms the baseline algorithm in both multi-BC and single BC scenarios.

Though we focus on VD deployment, it is just one applicable target of the proposed formulation and algorithm. They can also be applied to the placement of VMs supporting any location-sensitive or delay-sensitive services [31] in distributed clouds, such as VOD [32] and big data [33].

In this paper, we only consider the different electricity prices of DCs in energy cost optimization, which cannot reflect the utilization of renewable energy. Because renewable energy, such as solar, wind, and tidal energy, varies with time and region, VDs can be migrated to exploit it more efficiently within the delay threshold [34, 35]. In our future work, we aim to utilize more renewable energy by leveraging the globally distributed DCs, so that not only economic but also social benefits can be achieved.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work is supported in part by National High Technology Research and Development Program of China (no. 2015AA016008), National Science and Technology Major Project (no. JC201104210032A), National Natural Science Foundation of China (no. 11371004, 61402136), Natural Science Foundation of Guangdong Province, China (no. 2014A030313697), International Exchange and Cooperation Foundation of Shenzhen City, China (no. GJHZ20140422173959303), Shenzhen Strategic Emerging Industries Program (no. ZDSY20120613125016389), Shenzhen Overseas High Level Talent Innovation and Entrepreneurship Special Funds (no. KQCX20150326141251370), Shenzhen Applied Technology Engineering Laboratory for Internet Multimedia Application of Shenzhen Development and Reform Commission (no. ), and Public Service Platform of Mobile Internet Application Security Industry of Shenzhen Development and Reform Commission (no. ).