Abstract

Cloud computing, an emerging computing paradigm, has attracted wide attention due to its high scalability and availability. An essential stage of cloud computing is cloud resource management. Existing cloud computing research suffers from two prevalent problems: high energy consumption and low resource utilization. Since greedy scheduling is an effective strategy for cloud resource management, particularly for improving resource utilization and reducing energy consumption, we exploit the heterogeneous characteristics of resources to reduce the energy consumption of the datacenter, taking tasks as the fundamental element of the cloud datacenter. Meanwhile, granular computing is a problem-solving strategy that tackles complex problems through granulation. Thus, we introduce granular computing theory into cloud task scheduling and propose a greedy scheduling strategy based on different information granules, dividing tasks into three types (i.e., CPU, memory, and hybrid type). Finally, we apply different scheduling strategies to cloud tasks with different characteristics. Numerical experiments on the CloudSim platform show that our method significantly reduces energy consumption and is a practical task scheduling algorithm.

1. Introduction

Cloud computing is an emerging distributed computing paradigm, which can be regarded as a pool of accessible and virtualized resources provided via Internet technology. This relatively new computing paradigm has attracted wide attention and has been successfully applied in many engineering fields, such as big data analysis, medical diagnosis, and financial transactions [1]. Cloud task scheduling is an integral part of cloud resource management, which directly affects the overall performance of the cloud datacenter [2]. Thus, how to carry out efficient task scheduling has attracted the attention of many researchers. Currently, as the number of cloud computing customers increases, scheduling becomes quite tricky and requires appropriate scheduling algorithms. Some scheduling algorithms were developed in the early stages under the grid computing environment [3].

In cloud computing, where users may consume hundreds or thousands of virtualized resources, it is impossible to assign every task manually. Because of commercialization, virtualization, and the complexity of handling task scheduling at the virtual machine layer, scheduling plays a critical role in the cloud datacenter and requires efficient and effective allocation of resources to each task. Decker et al. [4] use two granular computing methods, namely, fuzzy-set-based evolving modeling (FBeM) and evolving granular neural networks (eGNN), to model and monitor data. A progressive granular neural network (PGNN) can improve classifier performance [5]. Shi et al. [6] propose a task duplication and insertion algorithm based on list scheduling that dynamically schedules tasks by predicting their completion time. The improved whale optimization algorithm (IWC) in [7] can effectively improve task scheduling efficiency.

According to the above discussion, we can see that some existing works only consider task scheduling optimization or the multitask scheduling problem, but they ignore the impact of different task types on scheduling. Moreover, in large-scale deployments on real hardware, even a slight adjustment to the scheduling strategy is likely to have an enormous impact on the cloud platform. Thus, this paper considers the multitype task scheduling problem and proposes a cloud energy-saving scheduling strategy. The basic idea of our method is to divide resource requests into CPU type, memory type, and hybrid type according to three-way decision theory.

Tasks have many attributes, but most of them are not suitable for clustering, and the attributes with the most significant impact on scheduling are CPU and memory. We therefore use these attributes to divide tasks into three types (i.e., CPU, memory, and hybrid type). Meanwhile, combined with different scheduling strategies, tasks with different characteristics are placed on more suitable computing resources to achieve energy savings.

The remainder of this paper is organized as follows. Section 2 reviews the basic notions of three-way decision and the cloud task scheduling framework. Section 3 gives a cloud task clustering method based on three-way decision (TWD-CTC), which adopts the corresponding energy-saving scheduling strategy. Section 4 gives the experimental analysis results. Finally, this paper concludes with a suggestion for future research in Section 5.

2. Preliminaries

2.1. Three-Way Decision in Cloud Computing

Three-way decision is an effective data-information-knowledge-wisdom (DIKW) processing theory proposed by Yao [8], whose essential elements are trisecting, acting, and outcome, i.e., the TAO model. Three-way decision has been widely used in various disciplines such as computer science, management science, mathematical science, and decision science [9–11]. Before introducing three-way decision into cloud task scheduling, it is necessary to review its basic notions.

Definition 1 (see [12]). Let $U$ be a finite nonempty universe and $R$ be an equivalence relation on $U$. The pair $(U, R)$ is called an approximation space of the fuzzy probabilistic rough set. For $X \subseteq U$ and a pair of thresholds $(\alpha, \beta)$ with $0 \le \beta < \alpha \le 1$, the upper and lower approximation sets can be defined as $\overline{apr}_{(\alpha,\beta)}(X) = \{x \in U \mid P(X \mid [x]_R) > \beta\}$ and $\underline{apr}_{(\alpha,\beta)}(X) = \{x \in U \mid P(X \mid [x]_R) \ge \alpha\}$, where $P(X \mid [x]_R) = |X \cap [x]_R| / |[x]_R|$ denotes the conditional probability of the classification and $|\cdot|$ denotes the cardinality of a set.
Given the pair of thresholds $(\alpha, \beta)$, $U$ is divided into the positive region $\mathrm{POS}_{(\alpha,\beta)}(X)$, the boundary region $\mathrm{BND}_{(\alpha,\beta)}(X)$, and the negative region $\mathrm{NEG}_{(\alpha,\beta)}(X)$, which we call the trisection of three-way decision and which is defined as $\mathrm{POS}_{(\alpha,\beta)}(X) = \{x \in U \mid P(X \mid [x]_R) \ge \alpha\}$, $\mathrm{BND}_{(\alpha,\beta)}(X) = \{x \in U \mid \beta < P(X \mid [x]_R) < \alpha\}$, and $\mathrm{NEG}_{(\alpha,\beta)}(X) = \{x \in U \mid P(X \mid [x]_R) \le \beta\}$. According to the decision-theoretic rough set, the rule generated from $\mathrm{POS}_{(\alpha,\beta)}(X)$ states that the object belongs to $X$; the rule generated from $\mathrm{NEG}_{(\alpha,\beta)}(X)$ states that the object does not belong to $X$; and the rule generated from $\mathrm{BND}_{(\alpha,\beta)}(X)$ states that it is uncertain whether the object belongs to $X$.
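For illustration only, the following minimal Java sketch (not part of the original formulation) applies the trisection rule above: an object is assigned to the positive, boundary, or negative region according to its conditional probability and the thresholds $(\alpha, \beta)$; the class and method names are chosen purely for this example.

public class ThreeWayDecisionRule {

    enum Region { POSITIVE, BOUNDARY, NEGATIVE }

    // Decide the region of an object from P(X | [x]) and the thresholds (alpha, beta),
    // where 0 <= beta < alpha <= 1.
    static Region decide(double conditionalProbability, double alpha, double beta) {
        if (conditionalProbability >= alpha) return Region.POSITIVE;  // accept: x belongs to X
        if (conditionalProbability <= beta) return Region.NEGATIVE;   // reject: x does not belong to X
        return Region.BOUNDARY;                                       // defer: membership is uncertain
    }

    public static void main(String[] args) {
        // With (alpha, beta) = (0.7, 0.3), a conditional probability of 0.5 falls in the boundary region.
        System.out.println(decide(0.5, 0.7, 0.3));   // prints BOUNDARY
    }
}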

Definition 2 (see [13]). Let $U$ be a finite nonempty universal set and $\pi$ be a trisection of $U$, which divides $U$ into three pairwise disjoint subsets $P$, $B$, and $N$, denoted as $\pi = \{P, B, N\}$, such that the following statements hold: (1) Pairwise disjoint: $P \cap B = \emptyset$, $P \cap N = \emptyset$, and $B \cap N = \emptyset$. (2) Covering the universal set: $P \cup B \cup N = U$. For these subsets, the complements are constructed as $P^{c} = B \cup N$, $B^{c} = P \cup N$, and $N^{c} = P \cup B$.

As a decision theory, the three-way decision model provides an incremental way of thinking for solving complex problems. In the first stage, the universe is divided into three reasonable regions. In the second stage, the optimal strategy is formulated according to the three regions, and different strategies are applied to these regions [14, 15].

With the in-depth study of cloud computing, it is easy to find many phenomena related to “3” in cloud computing, which can be called the three elements of cloud computing. For example, task time can be represented in three parts: long-time, medium-time, and short-time tasks; virtual machine operations can be represented in three parts: merging, migrating, and shutting down; and the host state can be represented in three parts: working, sleeping, and shut down. With so many phenomena related to “3,” it is natural to assume that a granular computing model based on three-way decision exists in the cloud computing environment. We draw on the basic ideas of three-way decision and its many theoretical achievements to study some interesting cloud computing problems in this paper.

The main idea of three-way decision is to divide the whole into three independent parts and apply different processing methods to the different parts, and it also provides an effective strategy for solving complex problems. This paper proposes an energy-saving task scheduling model based on three-way decision with greedy strategy. The cloud tasks will be divided into CPU type, memory type, and hybrid type by the characteristics of resource requests, and different scheduling strategies are adopted for the different types of tasks to save energy consumption.

2.2. Cloud Task Scheduling

Task scheduling is one of the important components of cloud computing, which allocates tasks with appropriate computing resources for execution reasonably. The effect of scheduling strategy directly affects the effectiveness of the entire cloud computing system, including but not limited to operational efficiency, stability, flexibility, user service quality, system load balancing, and system energy consumption. Figure 1 shows the process of task scheduling in cloud computing.

As shown in Figure 1, the steps of task scheduling are as follows: first, users submit tasks with their resource requirements, and the tasks are added to the task queue. The task scheduler obtains the available resource information, mainly on virtual machines, through scheduling technology; then, tasks are reasonably allocated to appropriate virtual machines for execution according to the scheduling strategy; finally, the tasks run on the virtual machines, and after they are completed, the results are summarized and fed back to the users.

3. Proposed Algorithm

This section presents a greedy scheduling model with saving energy consumption based on the characteristics of cloud tasks.

3.1. Cloud Task Scheduling System

The architecture diagram of the task scheduling system is shown in Figure 2, in which VM denotes virtual machine. The structure diagram gives the process of the cloud task scheduling system from the perspective of three-way decision, in which user requests are analyzed through historical data, including the data structure, data attributes, and amount of data. The data preprocessing module of the system filters irrelevant attributes and normalizes the valuable raw data. In order to achieve effective task scheduling, the tasks are divided into three types (CPU type, memory type, and hybrid type) according to the data characteristics and thresholds, and the corresponding scheduling strategy is applied to each type. Tasks with different attributes are then placed on the most suitable hosts.

Alam et al. [16] pointed out that tasks fall into three modes when clustered. According to the resource requests and the number of requests of users, tasks can be represented in three parts by clustering: long-time, short-time, and medium-time tasks. Long-time tasks occur less frequently but always request more resources; in particular, they issue many requests for CPU resources, so they are called CPU-intensive tasks. Short-time tasks have fewer resource requests but occur most frequently. Medium-time tasks have a middle number of occurrences and resource requests, and most of their requests are memory requirements, so they are called memory-intensive tasks.

This paper uses a mix of multiple job types to cluster all tasks. The task dimensions include requests for CPU resources, requests for memory resources, task waiting time, etc. Zhang et al. [17] pointed out that task resource usage is a more reliable metric than task waiting time, so CPU and memory resource requests are chosen as the task dimensions. On the other hand, task type and priority are also essential considerations. However, there are nine types of tasks, and the differences between types are complicated. Da et al. [18] reported the completion status of tasks: 73% of tasks complete normally, 26% of tasks are terminated, less than 1% of tasks are in other states, and almost no tasks in the terminated state can be completed. Since the states of tasks differ, some tasks, such as timers, cannot be paused, postponed, or stopped midway once they are started. There are 12 priorities, and too many priorities are not suitable for analysis and clustering. Finally, the basic dimensions of a task are summarized into two: the CPU resource request and the memory resource request.

3.2. The Cloud Task Clustering Based on Three-Way Decision

In this paper, tasks are divided into three types via three-way decision: CPU type, memory type, and hybrid type. Let $T = \{t_1, t_2, \ldots, t_n\}$ be a task set, where each task $t_i$ is described by two dimensions, its CPU resource request and its memory resource request. Based on the basic idea of three-way decision, the task clustering is expressed as $T = CO \cup TR \cup BN$, where $CO$ represents the CPU type, $TR$ represents the memory type, and $BN$ represents the hybrid type, and they satisfy the following properties: (1) There is no overlap between the three types of tasks: $CO \cap TR = \emptyset$, $CO \cap BN = \emptyset$, and $TR \cap BN = \emptyset$. (2) The three types of tasks include all tasks: $CO \cup TR \cup BN = T$.

It is assumed that, in each iteration, the centroid of each cluster $c_j$ is selected by finding the largest CPU value and the smallest memory value among the tasks in that cluster, which together form the cluster centroid.

Definition 3. Let $C_i$ be the CPU request of task $t_i$ and $M_i$ be the memory request of task $t_i$. Let $d_C(t_i)$ denote the distance between task $t_i$ in cluster $c_j$ and the cluster centroid on the CPU attribute, and let $d_M(t_i)$ denote the corresponding distance on the memory attribute. Let $avg_C$ and $avg_M$ denote the averages of the CPU and memory attributes of the tasks in cluster $c_j$. Given that the universe of discourse $U$ is a finite nonempty set and $R$ is an indiscernibility relation on it, $(U, R)$ is the approximation space of a rough set. For $X \subseteq U$, upper and lower approximation sets are defined from these distances and averages, and all objects are divided into the core region $CO$, the boundary region $BN$, and the trivial region $TR$ based on the upper and lower approximation sets.

The clustering algorithm first carries out k-means on the objects. Then, the tasks of each cluster are divided into three parts: the core region $CO$, the boundary region $BN$, and the trivial region $TR$. The algorithm mainly has the following six steps: (1) Input the dataset and the number of clusters $k$. (2) Randomly select $k$ objects in the dataset as cluster centers. (3) Calculate the Euclidean distance between each object and every cluster center, and assign each object to the cluster whose center has the smallest Euclidean distance. (4) In each cluster $c_j$, use the largest CPU value and the smallest memory value as the new cluster center, and return to step (3) until the cluster centers no longer change. (5) Calculate the thresholds $\alpha$ and $\beta$ for each cluster $c_j$ and process each cluster: according to the upper and lower approximation conditions of Definition 3, the objects of each cluster are divided into the core region $CO$ (CPU type), the trivial region $TR$ (memory type), and the boundary region $BN$ (hybrid type). (6) Finally, the $CO$, $TR$, and $BN$ regions of all clusters are merged.

Based on the above discussion in Subsection 3.1, we propose a cloud task clustering method based on three-way decision (TWD-CTC) by analyzing cloud tasks. The specific algorithm is as follows.

Input: task set T, the number of clusters k
Output: CO, TR, BN
centroids ⟵ ChooseCentroid(T, k);
while centroids are changed do:
   clusters ⟵ DivideCluster(T, centroids);
   centroids ⟵ UpdateCentroid(clusters);
endwhile;
foreach cluster c_j and each task t_i in c_j do:
 if (t_i satisfies the core-region condition of Definition 3) do:
    CO ⟵ CO ∪ {t_i};
 else if (t_i satisfies the trivial-region condition of Definition 3) do:
    TR ⟵ TR ∪ {t_i};
 else if (t_i satisfies the boundary-region condition of Definition 3) do:
    BN ⟵ BN ∪ {t_i};
 endif;
endfor;
Output: CO, TR, BN
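To make the above procedure more concrete, the following is a minimal Java sketch of the TWD-CTC flow under simplifying assumptions: tasks are treated as points (CPU request, memory request); the centroid rule of step (4) is used; and each cluster is split with an illustrative stand-in for Definition 3 (a CPU request above the cluster average combined with a memory request below it yields the core/CPU type, the reverse yields the trivial/memory type, and everything else the boundary/hybrid type). The exact thresholds and conditions of Definition 3 may differ, and all class and method names are hypothetical.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class TwdCtcSketch {

    static class Task {
        final double cpu, mem;   // normalized CPU and memory requests
        Task(double cpu, double mem) { this.cpu = cpu; this.mem = mem; }
    }

    // Step (3): assign every task to the centroid with the smallest Euclidean distance.
    static List<List<Task>> divide(List<Task> tasks, double[][] centroids) {
        List<List<Task>> clusters = new ArrayList<>();
        for (int j = 0; j < centroids.length; j++) clusters.add(new ArrayList<>());
        for (Task t : tasks) {
            int best = 0;
            double bestDist = Double.MAX_VALUE;
            for (int j = 0; j < centroids.length; j++) {
                double d = Math.hypot(t.cpu - centroids[j][0], t.mem - centroids[j][1]);
                if (d < bestDist) { bestDist = d; best = j; }
            }
            clusters.get(best).add(t);
        }
        return clusters;
    }

    // Step (4): the new centroid is the largest CPU value and the smallest memory value.
    static double[] centroidOf(List<Task> cluster, double[] old) {
        if (cluster.isEmpty()) return old;
        double maxCpu = 0.0, minMem = Double.MAX_VALUE;
        for (Task t : cluster) { maxCpu = Math.max(maxCpu, t.cpu); minMem = Math.min(minMem, t.mem); }
        return new double[]{maxCpu, minMem};
    }

    public static void main(String[] args) {
        Random rnd = new Random(1);
        List<Task> tasks = new ArrayList<>();
        for (int i = 0; i < 200; i++) tasks.add(new Task(rnd.nextDouble(), rnd.nextDouble()));

        int k = 3;   // number of clusters
        double[][] centroids = new double[k][];
        for (int j = 0; j < k; j++) centroids[j] = new double[]{tasks.get(j).cpu, tasks.get(j).mem};

        // Steps (3)-(4): iterate until the centroids no longer change (guarded against cycling).
        List<List<Task>> clusters = divide(tasks, centroids);
        boolean changed = true;
        for (int iter = 0; iter < 100 && changed; iter++) {
            changed = false;
            for (int j = 0; j < k; j++) {
                double[] c = centroidOf(clusters.get(j), centroids[j]);
                if (c[0] != centroids[j][0] || c[1] != centroids[j][1]) { centroids[j] = c; changed = true; }
            }
            if (changed) clusters = divide(tasks, centroids);
        }

        // Steps (5)-(6): split each cluster with an assumed rule (a stand-in for Definition 3) and merge.
        List<Task> core = new ArrayList<>(), trivial = new ArrayList<>(), boundary = new ArrayList<>();
        for (List<Task> cluster : clusters) {
            double avgC = cluster.stream().mapToDouble(t -> t.cpu).average().orElse(0.0);
            double avgM = cluster.stream().mapToDouble(t -> t.mem).average().orElse(0.0);
            for (Task t : cluster) {
                if (t.cpu >= avgC && t.mem < avgM) core.add(t);          // CPU type
                else if (t.mem >= avgM && t.cpu < avgC) trivial.add(t);  // memory type
                else boundary.add(t);                                    // hybrid type
            }
        }
        System.out.println("CPU type: " + core.size() + ", memory type: " + trivial.size()
                + ", hybrid type: " + boundary.size());
    }
}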
3.3. The Energy-Saving Task Scheduling with Greedy Strategy Based on TWD-CTC

This subsection proposes an energy-saving scheduling model that combines greedy scheduling with three-way decision.

3.3.1. Greedy Strategy

The energy-saving greedy task scheduling model is shown in Figure 3, in which VM denotes virtual machine. The structure diagram presents the greedy scheduling model based on TWD-CTC from a macroscopic perspective.

The process is as follows: (1) According to their resource requests, tasks are divided by clustering into core tasks (CPU type), trivial tasks (memory type), and boundary tasks (hybrid type). (2) The greedy strategy is used to allocate tasks: each task is allocated to the VM with the optimal resources, so as to achieve better overall allocation efficiency.

3.3.2. Scheduling Algorithm

The greedy algorithm is a simple and efficient way to find good solutions to such problems. Its basic idea is first to find the current locally optimal solution and then gradually approach the globally optimal solution, which saves the time needed to search for the optimal solution and yields the overall best allocation.

The greedy-based task scheduling strategy first sorts the tasks in descending order of task length and sorts the VMs in ascending order of MIPS. Then, the estimated time of running task $t_i$ on VM $v_j$ is calculated, and this estimated time serves as the greedy objective used to iteratively assign each task to the VM with the optimal resources. After each allocation, the remaining task set shrinks, and each iteration selects the most suitable resource for the task currently being assigned. When the iteration is completed, the global task allocation is obtained.

Based on the above discussion in this subsection, the specific algorithm of the TWD-CTC greedy scheduling model is given as follows.

Input: task set T, VM set V
Output: total energy consumption E of the cloud datacenter
 SortTask(T, descending by task length);
 SortVM(V, ascending by MIPS);
 foreach task t_i in T do:
  foreach VM v_j in V do:
   time[i][j] ⟵ ComputeTime(t_i, v_j);
 E ⟵ 0;
 foreach task t_i in T do:
  foreach VM v_j in V do:
   if (v_j has enough available resources for t_i and
time[i][j] is the smallest estimated time so far) do:
    assign t_i to v_j;
    update the available resources of v_j;
   endif;
  endfor;
 endfor;
E ⟵ SchedulVmTask(T, V);
Output: E
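As a concrete illustration of the greedy step, the following is a minimal Java sketch under simplifying assumptions: the estimated time of a task on a VM is its length divided by the VM's MIPS, each task is placed on the VM where it would finish earliest, and the resulting makespan is reported; in the full model, the total energy consumption would then be computed from the resulting host utilization (the SchedulVmTask step above). All class and method names are illustrative.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class GreedySchedulerSketch {

    static class Task {
        final long lengthMi;          // task length in million instructions
        Task(long lengthMi) { this.lengthMi = lengthMi; }
    }

    static class Vm {
        final double mips;            // processing capacity of the VM
        double busyUntil = 0.0;       // time at which the VM becomes free again
        Vm(double mips) { this.mips = mips; }
    }

    // Greedy assignment: longest tasks first, each placed on the VM where it finishes earliest.
    static double schedule(List<Task> tasks, List<Vm> vms) {
        tasks.sort((a, b) -> Long.compare(b.lengthMi, a.lengthMi));     // descending by task length
        vms.sort(Comparator.comparingDouble((Vm v) -> v.mips));         // ascending by MIPS
        double makespan = 0.0;
        for (Task t : tasks) {
            Vm best = null;
            double bestFinish = Double.MAX_VALUE;
            for (Vm v : vms) {
                double finish = v.busyUntil + t.lengthMi / v.mips;      // estimated completion time
                if (finish < bestFinish) { bestFinish = finish; best = v; }
            }
            best.busyUntil = bestFinish;                                // commit the allocation
            makespan = Math.max(makespan, bestFinish);
        }
        return makespan;
    }

    public static void main(String[] args) {
        List<Task> tasks = new ArrayList<>(List.of(new Task(4000), new Task(1500), new Task(800)));
        List<Vm> vms = new ArrayList<>(List.of(new Vm(500), new Vm(1000)));
        System.out.println("Estimated makespan: " + schedule(tasks, vms));
    }
}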

4. Performance Evaluation

4.1. Experiment Setup

The experiment uses the CloudSim platform, which is designed to help create and manage multiple, independent, and collaborative virtualized services on datacenter nodes and enables flexible switching between time-shared and space-shared allocation of processing cores to virtualized services.

We evaluate our algorithm based on the Google traces from [19], which were gathered from Google MapReduce Cloud trace logs; this version of the trace contains information on 25 million tasks spanning nearly one month. Our simulations were conducted on a computer with an Intel® Core™ i7-10750H CPU at 2.60 GHz, 32 GB RAM, and a 64-bit Windows 10 operating system.

The CPU (MIPS) request of each task lies in [0, 1], the memory request of each task lies in [100, 500] MB, and the file size of each task lies in [400, 1000] MB; the VM MIPS values are 312, 512, 800, 920, 1000, and 1500; and the host MIPS is 3720. The relationship between the host and its energy consumption is shown in Figure 4.
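To show how such a workload could be instantiated, the following sketch draws task and VM parameters from the ranges above and builds the corresponding CloudSim objects. It assumes the CloudSim 3.x constructors for Vm and Cloudlet; in a complete experiment, these objects would be submitted to a DatacenterBroker after CloudSim initialization and datacenter creation, which are omitted here, and the RAM, bandwidth, image size, task length, and output size values are illustrative only.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import org.cloudbus.cloudsim.Cloudlet;
import org.cloudbus.cloudsim.CloudletSchedulerTimeShared;
import org.cloudbus.cloudsim.UtilizationModel;
import org.cloudbus.cloudsim.UtilizationModelFull;
import org.cloudbus.cloudsim.Vm;

public class WorkloadSetupSketch {

    static final int[] VM_MIPS = {312, 512, 800, 920, 1000, 1500};   // VM MIPS values from the setup

    public static void main(String[] args) {
        Random rnd = new Random(42);
        int brokerId = 0;                         // placeholder; normally obtained from a DatacenterBroker
        UtilizationModel full = new UtilizationModelFull();

        // VMs: one per MIPS value; RAM, bandwidth, image size, and VMM name are illustrative.
        List<Vm> vms = new ArrayList<>();
        for (int i = 0; i < VM_MIPS.length; i++) {
            vms.add(new Vm(i, brokerId, VM_MIPS[i], 1, 512, 1000, 10000, "Xen",
                    new CloudletSchedulerTimeShared()));
        }

        // Tasks: CPU request in [0, 1] and memory request in [100, 500] MB are kept as the
        // clustering features of TWD-CTC; the cloudlet file size is drawn from [400, 1000] MB.
        List<Cloudlet> cloudlets = new ArrayList<>();
        double[] cpuRequest = new double[100];
        int[] memRequest = new int[100];
        for (int i = 0; i < 100; i++) {
            cpuRequest[i] = rnd.nextDouble();           // [0, 1]
            memRequest[i] = 100 + rnd.nextInt(401);     // [100, 500] MB
            long fileSize = 400 + rnd.nextInt(601);     // [400, 1000] MB
            long length = 1000 + rnd.nextInt(9000);     // task length in MI (illustrative)
            Cloudlet c = new Cloudlet(i, length, 1, fileSize, 300, full, full, full);
            c.setUserId(brokerId);
            cloudlets.add(c);
        }
        System.out.println(vms.size() + " VMs and " + cloudlets.size() + " cloudlets created");
    }
}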

The performance of the TWD-CTCG algorithm (the TWD-CTC greedy scheduling model from Section 3.3) is evaluated by two indicators: energy consumption and load balancing. The comparative experiment implements the min-max-min [20], min-max [21], SJF [22], and FCFS [22] task scheduling algorithms, and our algorithm is compared with them in terms of energy consumption and load balancing.

4.2. Energy Consumption Comparison Experiment
4.2.1. Different Task Experiment Comparison

(1) Fewer-Task Experiment Comparison. In the fewer-task experiment, there are 100, 200, 300, 400, and 500 tasks. The compared results are shown in Table 1 and Figure 5.

From the analysis of the experimental data in Table 1 and Figure 5, when the number of tasks is 100, min-max and min-max-min are essentially similar, FCFS is relatively higher, SJF has the lowest energy consumption, and TWD-CTCG performs slightly worse. As the number of tasks increases, the energy consumption of SJF increases dramatically, and TWD-CTCG has the lowest energy consumption for all the other task numbers.

(2) More-Task Experiment Comparison. In the more-task experiment, there are 1000, 1250, 1500, 1750, and 2000 tasks. The compared results are shown in Table 1 and Figure 6.

According to the analysis of the experimental data in Table 1 and Figure 6, as the number of tasks increases, the energy consumption generated by min-max and min-max-min is nearly the same, SJF generates relatively more, and FCFS keeps growing, although the growth is not remarkable. The energy consumption of the TWD-CTCG algorithm increases relatively little.

By comparing the experimental results with fewer tasks and more tasks, it can be seen that TWD-CTCG is better than the other algorithms in the fewer-task case, except when the number of tasks is 100, and TWD-CTCG is better than the others in the more-task case.

4.2.2. Different Host Experiment Comparison

(1) Fewer-Host Experiment Comparison. In the fewer-host experiment, there are 10, 20, 30, 40, and 50 hosts. The compared results are shown in Table 2 and Figure 7.

From the analysis of the experimental data in Table 2 and Figure 7, as the number of hosts increases, the energy consumption generated by min-max-min and min-max is almost the same, while the SJF algorithm is better than FCFS; in the fewer-host case, TWD-CTCG is better than the other algorithms.

(2) More-Host Experiment Comparison. In the more-host experiment, there are 100, 150, 200, 250, and 300 hosts. The compared results are shown in Table 2 and Figure 8.

According to the analysis of the experimental data in Table 2 and Figure 8, as the number of hosts continues to increase, the energy consumption generated by min-max-min and min-max is still basically the same, but FCFS performs better than SJF, indicating that the FCFS algorithm is better when there are more hosts; the effect of TWD-CTCG is the best.

By comparing the experimental results with fewer hosts and more hosts, it can be seen that the algorithm proposed in this paper is better than the other algorithms in terms of energy saving and works well in both cases.

4.3. Load Balancing Comparison Experiment
4.3.1. Different Task Experiment Comparison

(1) Fewer-Task Experiment Comparison. Similar to Subsection 4.2, the experiment has 100, 200, 300, 400, and 500 tasks and compares the load balancing of the min-max, min-max-min, SJF, and FCFS algorithms under different numbers of tasks. The compared results are shown in Table 3 and Figure 9.

From the experimental data analysis in Table 3 and Figure 9, in the fewer-task case, the load balancing of the TWD-CTCG algorithm is not good at the beginning, but as the number of tasks increases, its load balancing gradually improves. The difference in load balancing between the min-max and min-max-min algorithms is not very large, SJF keeps growing, and FCFS grows steadily.

(2) More-Task Experiment Comparison. This is a load balancing comparison experiment with more tasks, namely 1000, 1250, 1500, 1750, and 2000 tasks. The load balancing results are shown in Table 3 and Figure 10.

According to the analysis of the experimental data in Table 3 and Figure 10, in the more-task case, as the number of tasks increases, the load balancing value of SJF becomes larger and larger and exceeds that of FCFS. The load balancing of min-max-min is basically the same as that of min-max. TWD-CTCG remains low throughout, except that SJF matches TWD-CTCG when the number of tasks is 1000.

From the above experimental results with fewer tasks and more tasks, it can be seen that the algorithm proposed in this paper is better than the other algorithms in terms of load balancing.

4.3.2. Different Host Experiment Comparison

(1) Fewer-Host Experiment Comparison. The experiment has 10, 20, 30, 40, and 50 hosts and compares the load balancing of the min-max, min-max-min, SJF, and FCFS algorithms under different numbers of hosts. The compared results are shown in Table 4 and Figure 11.

From the experimental data analysis in Table 4 and Figure 11, in the fewer-host case, the load balancing of all algorithms is similar at first, but as the number of hosts increases, the load balancing of min-max-min and min-max becomes larger and larger, and the two remain relatively similar. The TWD-CTCG algorithm stays at a low level, and FCFS performs better than SJF when there are fewer hosts.

(2) More-Host Experiment Comparison. This is a load balancing comparison experiment with more hosts, namely 100, 150, 200, 250, and 300 hosts. The compared results are shown in Table 4 and Figure 12.

According to the analysis of the experimental data in Table 4 and Figure 12, in the more-host case, min-max-min is similar to min-max, and SJF is similar to FCFS; SJF is slightly better than FCFS, but TWD-CTCG stays at a relatively low level.

In both the fewer-host and more-host cases, TWD-CTCG maintains a good effect, indicating that it achieves good load balancing under various numbers of hosts.

5. Conclusions

In the development of cloud computing, energy consumption optimization is still one of the important issues. This paper proposes the TWD-CTCG algorithm, which clusters tasks by their resource requests and schedules cloud tasks with a greedy strategy to achieve reasonable resource allocation.

In future work, it is necessary to enlarge the dataset and, based on in-depth research into Hadoop, the Spark big data platform, container cloud platforms, etc., introduce big data processing methods into our research. The task clustering granularity will be further refined, the threshold function will be further improved, and a dynamic threshold will be used to further reduce energy consumption and improve resource utilization.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

Shuaishuai Liu and Xinyu Ma contributed equally to this work.

Acknowledgments

This work is supported by the Postgraduate Innovation Project of the Harbin Normal University (HSDSSCX2021-30).