Abstract

In VM consolidation for cloud energy saving, different workloads require different resources, so a VM placement solution is more reasonable when it takes workload characteristics into account. In practice, a workload's CPU utilization varies over its run time according to its task characteristics, which means that energy consumption depends on both CPU utilization and CPU frequency. Therefore, a model that evaluates energy consumption from CPU frequency alone is insufficient. This paper shows theoretically that, for a given CPU utilization, there is a CPU frequency that minimizes energy consumption. Based on this result, we put forward a heuristic CPU frequency scaling algorithm, VP-FS (virtual machine placement with frequency scaling). For the experiments, we implemented three typical greedy algorithms for VM placement and simulated three groups of VM tasks. Our results show that different workloads lead to different VM allocation results, and that each group of workloads has its own most suitable algorithm when the goal is to minimize the number of used physical machines. Because of CPU frequency scaling, VP-FS achieves the lowest total energy consumption of the four algorithms under all three groups of workloads.

1. Introduction

On-demand and effective resource management is crucial for the availability of a scalable cloud datacenter [1, 2]. According to statistics, the resource utilization of datacenters is very low, only about 30% on average. Moreover, a server is idle most of the time during the day, and even an idle server still consumes almost 60% of its full-load power. With its technical advantages and hardware support, virtualization technology [3] has come to the fore again. With support from processors and servers, the virtual machine (VM) has become the basic unit of resource management and sharing [4]. VMs achieve high utilization efficiency and are well isolated from one another. Nowadays, VMs are also used to address the energy saving problem in datacenters, which can be cast as the so-called packing problem [5]. The packing problem is a global optimization problem: the objective is to pack items into as few bins as possible, subject to the constraint that the total size of the items in each bin is less than or equal to the size of the bin. Similarly, VM packing means placing as many VMs as possible onto each physical machine (PM), so that unused PMs can be shut down to save energy. Therefore, the VM packing problem is also called the VM placement problem.

Unlike dynamic VM placement, initial VM placement has long-term effects in a datacenter: VM migration costs time and resources, so frequent migration is impractical. Initial placement therefore plays an important role in the efficient use of resources and in energy saving in datacenters [6]. Initial VM placement is subject to the VMs' resource requirements, the SLA requirements set by users, and the resources available on PMs. Many works study the optimal solution of the initial VM placement problem, using heuristic methods [7–9] or genetic algorithms [10, 11]. However, a heuristic algorithm is essentially a single-point search, so it easily falls into a local optimum, while a genetic algorithm does not use feedback information from the system, which makes its search blind. Reference [12] proposed an improved ant colony optimization algorithm to find the optimal solution, and reference [11] proposed a genetic algorithm based on NSGA-II. Many works also focus on energy or power in virtualized environments [13–18]. In [12], the authors proposed a heuristic method to find the optimal energy consumption in a virtualized enterprise environment. Reference [7] studied the relation between CPU utilization and applications, so that the system can choose a VM placement that decreases energy consumption. Reference [15] proposed algorithms based on first-fit descending to consolidate VMs in a datacenter. Reference [16] considers not only energy but also SLA, for optimal service provisioning according to users' resource requirements. References [17, 18] also target energy: they regard the CPU as the main contributor to energy consumption and monitor it, and the objective of their VM migration is to minimize energy consumption.

These algorithms all assume a fixed CPU frequency. References [19, 20] put forward models of CPU frequency and energy consumption. Reference [21] proposed a solution with variable CPU frequency, but it only applies to unchanging workloads. In [22], the authors argued that an energy conservation algorithm must consider the workload characteristics of virtual machines, since typical or random workloads affect the final energy consumption. Reference [23] describes the relationship between power consumption and operating frequency, and this energy consumption model has been used by many works [24, 25]. However, it assumes that the CPU operates at full load, that is, at 100% CPU utilization. As [22] points out, different workloads lead to different CPU utilization because of their task types and algorithms: some workloads are CPU intensive, while others are data intensive or I/O intensive. Therefore, considering only the relation between power and frequency is not comprehensive.

This paper considers the relationship among power consumption, CPU frequency, and CPU utilization, taking workload characteristics into account; that is, the CPU utilization changes dynamically according to the characteristics of each task. Traditional PMs in datacenters usually run at a fixed CPU frequency regardless of their CPU utilization. Based on the energy consumption model, we derive that, for a given workload CPU utilization, there is a suitable frequency that minimizes energy consumption. We therefore put forward the idea of CPU frequency scaling and design a heuristic ant colony algorithm, VP-FS (virtual machine placement with frequency scaling), to search for a global solution to VM placement. For the experiments, we simulate three groups of workloads with different CPU requirement characteristics as evaluation data; the CPU utilization of the three groups follows a linear, an even, and a bipolar distribution, respectively. For comparison, we also design and implement three algorithms with different greedy policies for initial VM placement, in which CPU, memory, and bandwidth requirements are given different weights. The results show that each kind of workload has its own suitable placement algorithm, and that VP-FS achieves the minimum energy consumption among these algorithms because it can dynamically scale the CPU frequency to fit a given CPU utilization. The main contributions of this paper are as follows.
(1) Different from the work in [23], we consider CPU utilization, which is also an important factor in energy consumption, and we verify that there is a frequency best suited to a given CPU utilization with respect to minimum energy consumption.
(2) Different from the work in [20], we consider the optimal energy consumption rather than energy efficiency, which gives a more direct view of the relation among the number of PMs, the CPU frequency, and the workload capacity.
(3) Based on the above analysis, we put forward a CPU frequency scaling algorithm that dynamically scales the frequency of a PM according to its CPU utilization in order to obtain the optimal energy consumption.
(4) To evaluate our approach, we simulate three groups of workloads, each with its own CPU utilization distribution, and run experiments with the proposed algorithms. The results show that the VM placement solution depends on the workload, and that energy consumption can be lowered by frequency scaling.

The rest of this paper is organized as follows. Section 2 states the VM packing problem and presents three greedy algorithms. In Section 3, we model the VM placement problem with multiple objectives and frequency scaling, and derive the factors related to minimum energy consumption. Based on this analysis and the idea of CPU frequency scaling, Section 4 proposes a heuristic ant colony algorithm, VP-FS, which finds the optimal VM placement solution by searching the global solution space. In Section 5, we design workloads with different CPU requirement characteristics and evaluate the proposed algorithms experimentally. Section 6 concludes the paper.

2. Problem Formulation and the VM Packing Algorithms

2.1. Problem Formulation

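For future reference, we summarize the notation used throughout this paper in the Notation section. There, each $r_{ik}$ denotes a resource utilization ratio (RUR) of VM $v_i$ for resource $k$, such as CPU, memory, or bandwidth. With this notation, we use $u_i = \sum_{k=1}^{d} w_k r_{ik}$ to represent the comprehensive RUR of $v_i$. The VM packing problem can then be formulated as follows: given $n$ VMs $v_i$, $m$ PMs $p_j$, the RURs $r_{ik}$, and the weights $w_k$, find a placement solution $X = (x_{ij})$ such that every VM is placed on exactly one PM, that is, $\sum_{j=1}^{m} x_{ij} = 1$ for each $i$, and no PM is overloaded, that is, $\sum_{i=1}^{n} x_{ij} u_i \le R_j$ for each $j$.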
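The two constraints of the formulation can be checked mechanically. The following Python sketch assumes the placement solution is given as a 0/1 matrix `x` (one row per VM), that the comprehensive RURs `u` have already been computed, and that `remnant` holds each PM's remnant comprehensive RUR; the function name `is_feasible` and the example values are hypothetical.

```python
def is_feasible(x, u, remnant):
    """Check a 0/1 placement matrix x against the packing constraints.

    x[i][j] == 1 means VM i is placed on PM j; u[i] is the comprehensive
    RUR of VM i; remnant[j] is the remnant comprehensive RUR of PM j.
    """
    n, m = len(x), len(remnant)
    # Every VM must be placed on exactly one PM.
    if any(sum(row) != 1 for row in x):
        return False
    # The VMs on each PM must not exceed that PM's remnant capacity.
    return all(
        sum(x[i][j] * u[i] for i in range(n)) <= remnant[j] for j in range(m)
    )


# Hypothetical example: three VMs, two PMs.
u = [0.375, 0.375, 0.5]
remnant = [1.0, 0.5]
ok = is_feasible([[1, 0], [1, 0], [0, 1]], u, remnant)  # first PM holds 0.75, second 0.5
overloaded = is_feasible([[0, 1], [1, 0], [0, 1]], u, remnant)  # second PM would hold 0.875
```

The check is linear in the size of `x`; it only verifies a given solution and says nothing about how good it is.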

2.2. RUR Based VM Packing Algorithms with Different Greedy Policies

We will present three algorithms with different greedy policies in this section.

(1) VPBFD Algorithm. The idea of the VPBFD (virtual machine placement with best fit descending policy) algorithm is as follows. Compute the comprehensive RUR of each VM and sort the VMs in descending order of RUR. Sort the PMs in ascending order of their RUR. For each VM, search for the first PM that can satisfy the VM's RUR. Once a VM has found its host PM, recalculate the surplus RUR of that PM and move the PM to its new position in the ascending PM queue. The pseudocode of VPBFD is presented in Algorithm 1, where uVM is an array that stores the comprehensive RUR of each VM and the corresponding VM labels are recorded in array oVM.

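Algorithm 1: VPBFD.
Input: VM, PM
Output: M
(1) for i = 1 to n do
(2)   for k = 1 to d do
(3)     u[i] = u[i] + w[k] * VM[i][k];
(4)   uVM[i] = u[i]; oVM[i] = i;
(5) sort(uVM, oVM, descending);
(6) for j = 1 to m do
(7)   for k = 1 to d do
(8)     s[j] = s[j] + w[k] * PM[j][k];
(9)   uPM[j] = s[j]; oPM[j] = j;
(10) sort(uPM, oPM, ascending);
(11) for i = 1 to n do
(12)   for j = 1 to m do
(13)     if uPM[j] >= uVM[i]
(14)       uPM[j] = uPM[j] - uVM[i];
(15)       M[oVM[i], oPM[j]] = 1;
(16)       adjust(uPM, oPM, j);
(17)       break;
(18) end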

The VPBFD algorithm serves the VM with the highest resource requirements first and selects the PM that can just satisfy those requirements, so it leaves more room for other VMs.
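A compact Python sketch of the best fit descending policy of Algorithm 1 follows. As in Algorithm 1, it assumes each VM and each PM is reduced to a single scalar (the VM's comprehensive RUR and the PM's remnant comprehensive RUR); the function name `vpbfd` and the sample numbers are illustrative. The ascending PM queue is kept sorted with Python's `bisect` module, so locating the tightest-fitting PM and re-inserting it after an allocation (the `adjust` step) both use binary search.

```python
import bisect


def vpbfd(demands, remnants):
    """Best fit descending: returns {vm_index: pm_index}."""
    # Visit VMs from the largest comprehensive RUR to the smallest (oVM).
    order = sorted(range(len(demands)), key=lambda i: demands[i], reverse=True)
    # PM queue in ascending order of remnant, as (remnant, pm_index) pairs.
    queue = sorted((r, j) for j, r in enumerate(remnants))
    placement = {}
    for i in order:
        need = demands[i]
        # First PM whose remnant is >= need, i.e. the tightest fit.
        k = bisect.bisect_left(queue, (need, -1))
        if k == len(queue):
            raise ValueError(f"VM {i} fits on no PM")
        remnant, j = queue.pop(k)
        placement[i] = j
        # Re-insert the PM at its new position in the ascending queue.
        bisect.insort(queue, (remnant - need, j))
    return placement


placement = vpbfd([0.5, 0.25, 0.625, 0.375], [1.0, 1.0, 1.0])
```

On this sample input, best fit packs the four VMs onto only two of the three PMs, which is the consolidation behavior the paragraph above describes.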

(2) VPWFD Algorithm. The idea of the VPWFD (virtual machine placement with worst fit descending policy) algorithm is as follows. Compute the comprehensive RUR of each VM and sort the VMs in descending order of RUR. Sort the PMs in descending order of their RUR. For each VM, search for the first PM that can satisfy the VM's RUR. Once a VM has found its host PM, recalculate the remnant RUR of that PM and move the PM to its new position in the descending PM queue. Since VPWFD differs from VPBFD only in sorting the PMs in descending rather than ascending order, its pseudocode is omitted.

Unlike VPBFD, the VPWFD algorithm first selects the PM with the largest resource surplus, so that this PM still has room for other VMs.
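Because the head of VPWFD's descending PM queue is always the PM with the largest remnant, a max-heap is a natural data structure for it. The Python sketch below is illustrative: the function name `vpwfd` and the sample numbers are hypothetical, and, as in Algorithm 1, each VM and PM is reduced to a scalar comprehensive RUR. Python's `heapq` is a min-heap, so remnants are stored negated.

```python
import heapq


def vpwfd(demands, remnants):
    """Worst fit descending: returns {vm_index: pm_index}."""
    order = sorted(range(len(demands)), key=lambda i: demands[i], reverse=True)
    # Negated remnants: the top of the min-heap is the PM with the most room.
    heap = [(-r, j) for j, r in enumerate(remnants)]
    heapq.heapify(heap)
    placement = {}
    for i in order:
        neg_remnant, j = heap[0]
        # If the roomiest PM cannot hold the VM, no PM can.
        if -neg_remnant < demands[i]:
            raise ValueError(f"VM {i} fits on no PM")
        # Shrink that PM's remnant and restore heap order in one step.
        heapq.heapreplace(heap, (neg_remnant + demands[i], j))
        placement[i] = j
    return placement


placement = vpwfd([0.5, 0.25, 0.625, 0.375], [1.0, 1.0, 1.0])
```

On this sample input the four VMs end up on all three PMs, illustrating how worst fit spreads load instead of consolidating it.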

(3) VPRandom Algorithm. The idea of the VPRandom (virtual machine placement with random policy) algorithm is as follows. Compute the comprehensive RUR of each VM, without sorting the VMs or the PMs. Then, for each VM, search for the first PM that can satisfy its RUR. The pseudocode of VPRandom is presented in Algorithm 2. As in Algorithm 1, uVM is an array that stores the comprehensive RUR of each VM, and the VM labels are recorded in array oVM.

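Algorithm 2: VPRandom.
Input: VM, PM
Output: M
(1) for i = 1 to n do
(2)   for k = 1 to d do
(3)     u[i] = u[i] + w[k] * VM[i][k];
(4)   uVM[i] = u[i]; oVM[i] = i;
(5) for i = 1 to n do
(6)   for j = 1 to m do
(7)     if PM[j] >= uVM[i]
(8)       PM[j] = PM[j] - uVM[i];
(9)       M[oVM[i], j] = 1;
(10)      break;
(11) end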

Unlike VPBFD and VPWFD, the VPRandom algorithm does not sort the VMs or the PMs. Following this unsorted, effectively random order, it selects the first PM that can satisfy each VM. In some cases, this method can obtain the ideal solution.
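Algorithm 2 amounts to first fit in the input order. The Python sketch below follows it directly; the function name `vprandom` and the sample numbers are hypothetical, and VMs and PMs are again reduced to scalar comprehensive RURs. It treats the given order as the random order; to emulate a random arrival order one could shuffle the VM list beforehand, which this sketch does not do.

```python
def vprandom(demands, remnants):
    """First fit in the given (unsorted) order: returns {vm_index: pm_index}."""
    remaining = list(remnants)
    placement = {}
    for i, need in enumerate(demands):  # VMs in their original order
        for j, r in enumerate(remaining):  # PMs in their original order
            if r >= need:
                remaining[j] = r - need
                placement[i] = j
                break
        else:  # the inner loop found no PM with enough room
            raise ValueError(f"VM {i} fits on no PM")
    return placement


placement = vprandom([0.5, 0.25, 0.625, 0.375], [1.0, 1.0, 1.0])
```

Here the VMs fill the PMs sequentially from the first PM onward, which matches the sequential allocation pattern reported for VPRandom in Section 5.2.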

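Algorithm 3: VP-FS.
Input: VM, PM, number of ants A, error margin ε, pheromone factor ρ
Output: X
(1) for each path (v_i, p_j): τ_ij = τ_0
(2) F = 0
(3) do {
(4)   F' = F
(5)   S = ∅
(6)   for each ant a from 1 to A
(7)     X_a = ∅
(8)     for each VM v_i from 1 to n
(9)       choose host p_j from hosts with probability based on τ_ij
(10)      Allocate(v_i, p_j)
(11)      if any remnant resource of p_j < 0
(12)      then Dealloc(v_i, p_j), back to step 9
(13)      else record x_ij = 1 in X_a
(14)    S = S ∪ {X_a}
(15)  S_E = choose_energy(S)
(16)  X = choose_balance(S_E)
(17)  for (v_i, p_j) in X
(18)    τ_ij = ρ * τ_ij
(19)  F = energy_sla_sum(X)
(20) } while (|F - F'| > ε)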

3. Multiobjective VM Placement with Frequency Scaling

3.1. VM Placement Problem with Variable Frequencies

In this paper, we place $n$ VMs onto $m$ PMs, and the basic objective is to minimize the energy consumption of all the PMs. Without frequency scaling, the solution space consists of the $m^n$ possible assignments of VMs to PMs. If each PM can also select a proper CPU frequency from its available frequencies according to its workload, the solution space becomes correspondingly larger. Generally speaking, different types of workloads have different resource requirements, and resources can be allocated to a VM only if all of them satisfy the VM's requirements. However, unbalanced resource utilization easily leads to wasted resources. If some VMs have complementary resource requirements, they should be placed on the same PM to obtain better resource utilization. For example, if one VM runs a CPU-intensive workload and another runs a data-intensive workload, packing these two VMs onto one PM yields better resource utilization, which makes such a placement desirable in an optimal solution. Thus, besides the energy consumption objective, this paper proposes a second objective for evaluating a solution: the resource balancing degree.

3.2. The Objectives of the Problem

(1) Energy Consumption. Consider
$$P = P_{\text{idle}} + \alpha\, u f^{3}, \qquad (1)$$
where $P$ denotes the instantaneous power of a PM. It depends on the energy consumption $P_{\text{idle}}$ of the PM in its idle time and on the instantaneous frequency $f$ and CPU utilization $u$; $\alpha$ is a coefficient, which indicates that the dynamic energy consumption of the CPU is proportional to the cube of the frequency and to the utilization [21]. In [23], the energy consumption is modeled as $P = P_{\text{idle}} + \alpha f^{3}$. However, that model assumes the workload runs at 100% CPU utilization. In fact, a workload task can be expressed as the product of the CPU frequency, the CPU utilization, and a coefficient [20]. Therefore, in formula (1), we add $u$ as a multiplier of $f^{3}$.

(2) Resources Balancing Degree. Consider the resource balancing degree $B$ of a PM given by formula (2). In this paper, we mainly consider three kinds of resources: CPU, memory, and bandwidth. In formula (2), $u$ is the CPU utilization, $u_{\text{mem}}$ and $u_{\text{bw}}$ denote the memory and bandwidth utilization of the PM, respectively, and $\bar{u}$ is the average utilization of the three resources. The three utilizations are normalized in formula (2) so that $B$ describes how balanced the three kinds of resources are on a PM: a small value of $B$ means that the utilization of the three resources on the PM is well balanced.
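The following Python sketch illustrates the idea of a normalized balancing degree with one plausible measure: the coefficient of variation (standard deviation divided by the mean) of a PM's CPU, memory, and bandwidth utilization. This particular form is an assumption for illustration, not necessarily the exact formula (2); it only shares the described properties of being normalized and being small when the three utilizations are balanced. The function name `balancing_degree` is hypothetical.

```python
import math


def balancing_degree(cpu, mem, bw):
    """Coefficient of variation of a PM's CPU, memory, and bandwidth utilization.

    Assumed stand-in for formula (2): 0 means perfectly balanced use of the
    three resources; larger values mean less balanced use.
    """
    values = (cpu, mem, bw)
    mean = sum(values) / len(values)  # the average utilization
    if mean == 0:
        return 0.0  # an empty PM is treated as balanced
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    # Dividing by the mean normalizes the spread across load levels.
    return math.sqrt(variance) / mean


balanced = balancing_degree(0.5, 0.5, 0.5)
skewed = balancing_degree(0.9, 0.1, 0.2)  # CPU-heavy PM
```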

(3) Objective Function. Based on formulas (1) and (2), the objective function of initial VM placement minimizes a weighted combination of the energy consumption $E_j$ and the resource balancing degree $B_j$ of each PM $p_j$, where the two weights express the relative importance of energy consumption and resource balance. The minimization is subject to an SLA constraint: the utilization of every resource on every PM must not exceed an upper threshold $T$. In other words, the objective function seeks the minimum energy consumption and the best resource balancing degree, while objective SLA is enforced by constraining resource utilization below $T$.
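A minimal Python sketch of this objective follows, assuming a weighted sum over PMs for the objective and treating the SLA threshold as a hard filter on candidate solutions. The function names, weights, threshold, and candidate data are hypothetical, and the sketch omits the normalization that energy (in joules) and balancing degree (dimensionless) would need to be put on comparable scales. Note that VP-FS in Section 4 applies the objectives in sequence rather than as one weighted sum.

```python
def objective(energy, balance, w_energy, w_balance):
    """Weighted sum of per-PM energy E_j and balancing degree B_j."""
    return w_energy * sum(energy) + w_balance * sum(balance)


def meets_sla(utilization, threshold):
    """SLA constraint: every resource utilization on every PM is <= threshold."""
    return all(value <= threshold for pm in utilization for value in pm)


def best_solution(candidates, threshold, w_energy, w_balance):
    """Among SLA-feasible candidates, return the one with the smallest objective."""
    feasible = [c for c in candidates if meets_sla(c["utilization"], threshold)]
    return min(
        feasible,
        key=lambda c: objective(c["energy"], c["balance"], w_energy, w_balance),
        default=None,
    )


# Two hypothetical candidates; utilization rows are [CPU, memory, bandwidth] per PM.
a = {"utilization": [[0.5, 0.75, 0.25], [0.25, 0.5, 0.5]],
     "energy": [100.0, 50.0], "balance": [0.5, 0.25]}
b = {"utilization": [[0.875, 0.5, 0.5], [0.5, 0.5, 0.25]],
     "energy": [90.0, 40.0], "balance": [0.25, 0.25]}
chosen = best_solution([a, b], threshold=0.8, w_energy=0.5, w_balance=0.5)
```

Here `b` has the smaller objective but violates the 0.8 threshold on one PM's CPU, so `a` is chosen; raising the threshold to 0.9 would make `b` feasible and preferred.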

3.3. Frequency Scaling for Lowest Energy Consumption

Formula (1) relates the instantaneous power of a PM to its instantaneous frequency $f$ and CPU utilization $u$. Consider a set of workloads, each with its own resource requirements; each workload may also demand a different amount of CPU resource depending on its load. Theoretically, if a PM works at the CPU frequency best suited to its CPU utilization, its energy efficiency will be optimal. We therefore need to answer two questions. Is there a CPU frequency best suited to a given workload that minimizes energy consumption? And if so, how can that frequency be expressed?

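For convenience, in this section we write $u_j$ and $f_j$ for the CPU utilization and the frequency of PM $p_j$. If there are $m$ PMs, each working at frequency $f_j$, then, as in [23], the total required computing power of all the PMs is $\sum_{j=1}^{m} u_j f_j$. According to formula (1), the total energy consumption of all the PMs is given by
$$E = \sum_{j=1}^{m}\left(P_{\text{idle}} + \alpha\, u_j f_j^{3}\right). \qquad (6)$$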

As formula (6) shows, the total energy consumption depends on two variables: CPU frequency and CPU utilization. If the CPU frequency and utilization of every PM take their optimal values, the total energy consumption is minimized. Therefore, in the following we derive the relations of the total energy consumption with respect to the CPU frequency and to the number of PMs separately.

(1) Total Energy Consumption with respect to CPU Frequency. The first question asks whether, to obtain the lowest energy consumption, there is an optimal CPU frequency for each PM. Based on formula (6), we take the first partial derivative of the total energy consumption with respect to the CPU frequency. Consider

In formula (7),

According to (8), we can rewrite (7) as

To obtain the minimum total energy consumption, we set formula (9) equal to 0 and then have

(2) Total Energy Consumption with respect to the Number of PMs. We substitute each frequency in formula (6) with (11), so that formula (6) can be rewritten as

If PM works at frequency and the CPU utilization is , we define as

Therefore, according to (11), we have

Then we can rewrite formula (12) as

We take the first derivative of the total energy consumption in (15) with respect to the number of PMs $m$. This can be expressed as

Setting this derivative equal to 0, we obtain the optimal number of PMs $m$.

Thus,

Equation (17) shows that, besides the constants $P_{\text{idle}}$ and $\alpha$ and the number of PMs $m$, the optimal energy consumption also depends on the frequency of each PM.
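The existence of a best frequency for a given utilization can also be seen in a simplified single-PM setting. The derivation below is a sketch under our own assumptions, not a reproduction of (6) to (17): we take the power model as $P = P_{\text{idle}} + \alpha u f^{3}$ and assume, following the observation in [20] that the work done is proportional to $u f$, that a task with a fixed amount of work $C$ runs for $t = C/(u f)$ (the proportionality coefficient is absorbed into $C$).

```latex
\begin{aligned}
E(f) &= \left(P_{\text{idle}} + \alpha u f^{3}\right)\frac{C}{u f}
      = \frac{C P_{\text{idle}}}{u f} + \alpha C f^{2},\\
\frac{dE}{df} &= -\frac{C P_{\text{idle}}}{u f^{2}} + 2\alpha C f = 0
  \quad\Longrightarrow\quad
  f^{*} = \left(\frac{P_{\text{idle}}}{2\alpha u}\right)^{1/3},\\
\frac{d^{2}E}{df^{2}} &= \frac{2 C P_{\text{idle}}}{u f^{3}} + 2\alpha C > 0 .
\end{aligned}
```

Because the second derivative is positive, $f^{*}$ is a minimum. Under these assumptions, $f^{*}$ depends on the utilization $u$, which is the qualitative claim of this section; on real hardware, $f^{*}$ must also be clamped to the PM's available frequency range (at most 2.7 GHz in the setup of Section 5).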

4. Ant Colony Algorithm with Frequency Scaling

To find the optimal solution for energy consumption, each PM should run at the frequency best suited to the workload running on it. Based on this idea, this paper proposes an ant colony optimization algorithm, VP-FS (virtual machine placement with frequency scaling), which selects a suitable frequency by scaling it according to the CPU utilization of the running workload. The basic idea is as follows.

Each pair of a VM $v_i$ and a PM $p_j$ is initialized as a two-tuple $(v_i, p_j)$, which represents placing $v_i$ on $p_j$; we call such a two-tuple a path. Each path is assigned an initial pheromone value $\tau_{ij}$. An ant randomly chooses a path for $v_i$ according to the pheromone values, so a path with a larger pheromone value is more likely to be selected. Once an ant has chosen a path for every $v_i$, an initial placement solution is formed.

If there are $A$ ants, then $A$ placement solutions are produced. All solutions that meet objective SLA form the solution space for the second objective, energy consumption. From this space, we select the first subset of solutions, those with the lowest energy consumption, as the solution space for the third objective, the resource balancing degree, and then choose the best solution. The pheromone of every path in the final solution is increased by multiplying it by a factor $\rho$. These steps are iterated until the value of the objective function in formula (4) no longer changes or changes only within a sufficiently small range. The pseudocode of VP-FS is presented in Algorithm 3.
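The path-choosing and pheromone-reinforcement steps can be sketched in Python as follows. The sketch assumes that an ant picks a host with probability proportional to the pheromone on the (VM, PM) path (roulette-wheel selection, a common ACO rule; the exact rule used by VP-FS is not specified here), and it filters out PMs without enough room up front instead of performing the allocate/undo/retry loop. Function names, the seed, the reinforcement factor 1.5, and the data are hypothetical.

```python
import random


def construct_solution(demands, remnants, pheromone, rng):
    """One ant builds a placement {vm: pm}, or returns None if it gets stuck.

    A VM's host is drawn among the PMs that can still hold it, with
    probability proportional to the pheromone on the (VM, PM) path.
    """
    remaining = list(remnants)
    placement = {}
    for i, need in enumerate(demands):
        # Filtering infeasible PMs up front stands in for the
        # allocate / undo / choose-again loop of Algorithm 3.
        feasible = [j for j, r in enumerate(remaining) if r >= need]
        if not feasible:
            return None
        weights = [pheromone[i][j] for j in feasible]
        j = rng.choices(feasible, weights=weights, k=1)[0]
        placement[i] = j
        remaining[j] -= need
    return placement


def reinforce(pheromone, placement, factor):
    """Multiply the pheromone of every path used by the chosen solution."""
    for i, j in placement.items():
        pheromone[i][j] *= factor


rng = random.Random(42)
demands = [0.5, 0.25, 0.375]
remnants = [1.0, 1.0]
tau = [[1.0, 1.0] for _ in demands]  # uniform initial pheromone
solution = construct_solution(demands, remnants, tau, rng)
reinforce(tau, solution, factor=1.5)
```

A full VP-FS loop would run several ants per iteration, filter and rank their solutions by SLA, energy, and balance, reinforce only the winner, and stop once the objective changes by less than the error margin.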

In Algorithm 3, $F$ denotes the value of the objective function in formula (4). Variable $a$ denotes the ant and loops from 1 to $A$; each ant produces one placement solution. Variables $i$ and $j$ are used to find a proper PM $p_j$ for each VM $v_i$, which is chosen with a probability determined by the pheromone $\tau_{ij}$. If any remnant resource of $p_j$ becomes insufficient, the allocation is undone; otherwise, the allocation $x_{ij} = 1$ is recorded. choose_energy() selects the solutions with the lowest energy consumption from the solution set, and choose_balance() selects from them the best solution with the minimal resource balancing degree. The iteration ends when the difference between two successive results is less than a sufficiently small value $\varepsilon$.
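The frequency-scaling step that VP-FS adds on top of placement can be sketched as a search over a PM's discrete frequency levels. The Python sketch below assumes the power model $P = P_{\text{idle}} + \alpha u f^3$ and that work progresses at a rate proportional to $u f$, so the energy per unit of work is $P/(u f)$. The frequency levels, `P_IDLE`, and `ALPHA` are hypothetical values (only the 2.7 GHz maximum comes from the experimental setup in Section 5), and the function names are illustrative.

```python
FREQS_GHZ = [1.2, 1.6, 2.0, 2.4, 2.7]  # hypothetical frequency levels of a PM
P_IDLE = 100.0  # hypothetical idle power (W)
ALPHA = 12.5  # hypothetical dynamic-power coefficient (W / GHz^3)


def energy_per_work(f, u, p_idle=P_IDLE, alpha=ALPHA):
    """Energy to finish one unit of work at frequency f and utilization u."""
    return (p_idle + alpha * u * f ** 3) / (u * f)


def best_frequency(u, freqs=FREQS_GHZ, p_idle=P_IDLE, alpha=ALPHA):
    """Pick the available frequency that minimizes energy per unit of work."""
    if not 0 < u <= 1:
        raise ValueError("CPU utilization must be in (0, 1]")
    return min(freqs, key=lambda f: energy_per_work(f, u, p_idle, alpha))


def continuous_optimum(u, p_idle=P_IDLE, alpha=ALPHA):
    """Closed-form optimum f* = (p_idle / (2 * alpha * u)) ** (1/3)."""
    return (p_idle / (2 * alpha * u)) ** (1 / 3)


for u in (0.25, 0.5, 1.0):
    print(u, best_frequency(u), round(continuous_optimum(u), 3))
```

In this toy model the chosen level tracks the closed-form optimum and decreases as utilization rises: at higher utilization the idle-power share of each unit of work is already smaller, so there is less to gain from running faster.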

5. Experiments

Our experiments run on a cluster of 10 PMs with OpenStack as the cloud infrastructure platform. All PMs are connected by Gigabit Ethernet. Each PM has a four-core CPU with a maximum frequency of 2.7 GHz per core, 2 GB of memory, and 100 Mbps of bandwidth. We deploy 30 VMs, each on one core of a PM, and use sysbench to simulate each workload.

5.1. CPU-Intensive Workloads

Different workloads vary differently in their resource requirements, which may call for different initial placement algorithms for the best performance. In this paper, we simulate three groups of workloads, focusing mainly on the CPU resource in each group. The distribution of CPU utilization in each group is shown in Figure 1.

In Figure 1(a), the CPU utilization requirements of the three VM groups follow different distributions: the 1st group follows a linear distribution, the 2nd group an even distribution, and the 3rd group a bipolar distribution. Figure 1(b) shows the memory and bandwidth requirements, which are almost the same in each group.

The comprehensive RUR of $v_i$ is $u_i = \sum_{k=1}^{d} w_k r_{ik}$. In our experiments, $d = 3$; that is, we consider three types of resources: CPU, memory, and bandwidth. Because the CPU is the dominant factor in energy consumption and this paper proposes a CPU frequency scaling approach, we set the weights $w_1$, $w_2$, and $w_3$ of CPU, memory, and bandwidth to 7, 3, and 2, respectively. In the implementation of the proposed VP-FS algorithm, we set , .
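As a worked example of the weights 7, 3, and 2, the Python sketch below computes the comprehensive RUR of two hypothetical VMs. Whether the weighted sum is normalized is not stated in the text, so the sketch divides by the weight sum (12) by default as an assumption, which keeps the result in [0, 1]; the function name and the utilization values are illustrative.

```python
WEIGHTS = (7, 3, 2)  # CPU, memory, bandwidth weights from Section 5.1


def comprehensive_rur(cpu, mem, bw, weights=WEIGHTS, normalize=True):
    """Weighted comprehensive RUR of a VM from its three utilization ratios."""
    w_cpu, w_mem, w_bw = weights
    total = w_cpu * cpu + w_mem * mem + w_bw * bw
    # Assumption: dividing by the weight sum keeps the result in [0, 1].
    return total / sum(weights) if normalize else total


cpu_heavy = comprehensive_rur(0.75, 0.25, 0.25)  # (5.25 + 0.75 + 0.5) / 12
mem_heavy = comprehensive_rur(0.25, 0.75, 0.25)  # (1.75 + 2.25 + 0.5) / 12
```

With these weights, a CPU-heavy VM ranks ahead of an equally memory-heavy VM in the descending VM queues of VPBFD and VPWFD, so the CPU dominates the placement order.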

5.2. Placement Solution of Each Algorithm

For the VMs in each group, we use the four algorithms in this paper (VPBFD, VPRandom, VPWFD, and VP-FS) to allocate the 30 VMs to the 10 PMs. Figure 2 shows the allocation results for each VM group.

Because the resource distribution differs in each group, the allocation results also differ, as Figures 2(a), 2(b), and 2(c) show. The x-axis shows the 10 PMs and the y-axis shows the 30 VMs. For example, in Figure 2(a), VPBFD places VMs 30, 29, 28, and 27 on PM 1, whereas VP-FS places VMs 1, 19, and 3 on PM 1. The allocation results of the four algorithms differ in two ways. First, the number of used PMs differs: VPBFD and VPRandom use the fewest PMs, 8 instead of 10, for the 30 VMs, while VPWFD and VP-FS spread the 30 VMs over all 10 PMs. Second, the distribution pattern differs. Because VPRandom does not sort the PMs or the VMs in advance, in Figures 2(a), 2(b), and 2(c) the 30 VMs are allocated sequentially from PM 1 to PM 10 whenever a PM can provide the required resources; the same holds for VP-FS. With VPWFD, the 30 VMs of all three groups are distributed across the PMs in no particular order, because VPWFD always tries to find the PM with the most available resources.

5.3. Resource Balancing Degree of Each Algorithm

After VM placement, we calculate the resource balancing degree of each PM according to formula (2). The results are shown in Figure 3.

In Figure 3, we compare the resource balancing degree of each PM under the different algorithms. A low value means that the PM has a good resource balance; a high value means a poor one. Figure 3(a) shows the results for the first group, with linearly distributed CPU utilization. The values obtained by VPBFD and VPRandom fluctuate widely, which means that only some of the PMs achieve a very good resource balance after VMs are allocated. Because VPBFD first tries to allocate a VM to an already used PM as long as that PM has enough resources, the resources of some PMs are used effectively. VPWFD first tries to allocate a VM to the PM with the most resources, and therefore all 10 PMs are used. Although some PMs have the lowest values under VPBFD, both VPWFD and VP-FS achieve an even resource balance across all 10 PMs. Under linear CPU utilization workloads, VPWFD is slightly better than VP-FS. Figure 3(b) shows the results under an even CPU RU distribution: the values of all PMs are uniform under every algorithm. Figure 3(c) shows the results under a bipolar CPU RU distribution. The values of the PMs under VP-FS are the smoothest of the four algorithms, which means that, for workloads with a bipolar distribution, VP-FS gives all PMs the best resource balance. We can also use the standard deviation of the values over the 10 PMs to describe the resource balance of each algorithm. As Table 1 shows, VP-FS has a smaller standard deviation than the other algorithms, which means that resources are well balanced on each of the 10 PMs under VP-FS.

5.4. Energy Consumption of Each Algorithm

The VMs run for 2 hours after being placed on the PMs. We use power meters to measure the instantaneous power and the total energy consumption of each PM. Measuring the PMs in their idle state gives the basic energy consumption of each PM, shown in Table 2.

Figure 4(a) shows the energy consumption of each PM for the placement results of the 1st group, and Figures 4(b) and 4(c) show the same for the 2nd and 3rd groups. Because VPBFD and VPRandom use only 8 PMs, the energy consumption of each of the first 8 PMs is higher under these two algorithms than under VPWFD and VP-FS. VP-FS yields almost the lowest energy consumption on each PM in every group, except for a few PMs.

The workload distribution in each group clearly affects the energy consumption of each PM. In the first group, CPU RU follows a linear distribution. Of the four algorithms, VPBFD sorts the VMs in descending order of resource utilization and places the first VM first, so the energy consumption from PM 1 to PM 8 also descends in Figure 4(a). VPRandom does not sort the VMs in advance, so its result in Figure 4(a) follows the same ascending order as the CPU RU distribution. In the second group, CPU RU follows an even distribution; accordingly, the energy consumption of VPBFD and VPWFD is also even in Figure 4(b). In the third group, CPU RU follows a bipolar distribution; although VPBFD sorts the VMs in advance, there are many more large RU values than in the first group, so the VMs at the front of the queue are placed on consecutive PMs. As a result, the energy consumption of VPBFD and VPRandom shows no clear order. In all three groups, VP-FS, which does not sort the VMs in advance but applies proper frequency scaling, yields similar energy consumption on each PM.

Basic energy consumption accounts for a large part of the total energy consumption of an active PM. Even including the basic energy consumption of PM 9 and PM 10, VP-FS still has the lowest total energy consumption of the four VM placement algorithms. Figure 5 shows the total energy consumption for each group of data. In each of the three groups, VPRandom and VPWFD consume almost the same energy. Table 3 gives the energy savings of VP-FS relative to the other algorithms. From Figure 5 and Table 3, in the first group, where the CPU workload follows a linear distribution, VP-FS achieves its largest energy saving, 10.37%, relative to the other traditional methods, and VPBFD has the second-best energy consumption. In the third group, the advantage of VP-FS over VPBFD is small: the saving is only 1.84%, which means that for workloads with a bipolar distribution, VPBFD is almost as good as VP-FS. On average, VP-FS saves 6.56%, 9.99%, and 10% of total energy compared with VPBFD, VPRandom, and VPWFD, respectively. These results indicate that, with frequency scaling in initial VM placement, each PM can find the proper CPU frequency for the VMs allocated to it.

6. Conclusions

In this paper, we incorporate workload characteristics, in the form of dynamic CPU utilization, and energy consumption under CPU frequency scaling into the ordinary VM placement problem. If each PM can run at different CPU frequencies, the solution space of VM placement expands beyond the $m^n$ assignments of VMs to PMs to also include the frequency chosen for each PM. Using the energy consumption model, we verify that, with respect to minimum energy consumption, there is a CPU frequency best suited to each CPU utilization of a PM. We then model the energy, resource balance, and SLA objectives for an optimal VM placement solution and propose an ACO-based CPU frequency scaling algorithm, VP-FS. For comparison, we put forward three typical greedy algorithms, VPBFD, VPRandom, and VPWFD, each with a different greedy policy. We design three groups of VMs with different resource distributions to evaluate how the workloads affect the algorithms in VM consolidation and energy saving. Our results show that, for workloads with different CPU resource utilization, different algorithms produce different VM allocation results. In terms of the number of used PMs, each group of workloads has its own most suitable algorithm, and VP-FS is not the best in this respect. However, because of its frequency scaling policy, VP-FS has the lowest energy consumption of the four algorithms under all three groups of VM workloads. In future work, we will further consider workload awareness in dynamic VM allocation.

Notation

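$n$: Number of VMs
$m$: Number of PMs
$F$: Set of available frequencies of a PM
$f$: Instantaneous frequency of a PM
$P_{\text{idle}}$: Energy consumption of a PM in its idle time
$v_i$: The $i$th VM
$p_j$: The $j$th PM
$d$: Dimension of the resources
$r_{ik}$: Utilization ratio of resource $k$ of $v_i$, $k = 1, \dots, d$
$w_k$: Weight of resource $k$
$R_j$: Remnant of the comprehensive resource utilization of $p_j$
$x_{ij}$: Placement solution; if $v_i$ is placed in $p_j$, then $x_{ij} = 1$, else $x_{ij} = 0$
$E_j$: Energy consumption of $p_j$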

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This paper is sponsored by the National Natural Science Funds of China under Grants nos. 61202429, 61379145.