Abstract

Task execution in cloud computing requires obtaining stored data from remote data centers. Though this storage process reduces the memory constraints of the user's computer, the time deadline is a serious concern. In this paper, adaptive cost-based task scheduling (ACTS) is proposed to provide data access to the virtual machines (VMs) within the deadline without increasing the cost. ACTS considers the data access completion time for selecting the cost-effective path to access the data. To allocate data access paths, the data access completion time is computed from the mean and variance of the network service time and the arrival rate of network input/output requests. Task priorities are then assigned to the received tasks based on the data access completion time. Finally, the costs of the data paths are analyzed and the paths are allocated according to task priority: the minimum-cost path is allocated to low-priority tasks, while fast-access paths are allocated to high-priority tasks so as to meet the time deadline. Thus, efficient task scheduling can be achieved using ACTS. Experimental results in terms of execution time, computation cost, communication cost, bandwidth, and CPU utilization show that the proposed algorithm provides better performance than the state-of-the-art methods.

1. Introduction

Cloud computing is a promising technology that provides efficient services to customers on a distant virtual platform under a pay-per-use model. According to the NIST definition [1], cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of computing resources that can be rapidly provisioned and released with minimal interaction. Cloud computing provides different types of services, such as infrastructure, software, and platform, to the requesting users at a specific price. Cloud services use the Internet and central remote servers to maintain data and applications, allowing consumers and businesses to use applications without installation and to access their personal files from any computer with Internet access. This approach improves computing processes such as data storage and processing. The cloud is deployed in different models; public, private, hybrid, community, and distributed clouds are some examples.

Service-oriented architecture is the basic principle of cloud computing, which treats everything on the cloud as a service [2]. Infrastructure-as-a-Service (IaaS) provides physical machines (PMs) or virtual machines (VMs) to users for processing resources, data partitioning, scaling, security, and backup. Platform-as-a-Service (PaaS) provides vendors with platforms for application development, including databases, web servers, and development tools. Software-as-a-Service (SaaS) provides services such as e-mail, virtual desktops, communication, and gaming applications. These services are normally paid services whose prices are fixed by the service providers based on the customers' usage levels. The price of cloud services is much lower than that of comparable locally installed services.

In cloud computing, tasks are performed in the physical machines (PMs) or the VMs as per the task requirements. The data required for executing the tasks and services are stored at multiple distant storage locations called data centers, which are also used at a specific cost [3]. When the tasks are executed in the processing machines, the required data are requested and obtained from the data centers. The data from the data centers have to reach the VM within a particular time, referred to as the data access completion time. The problem with this process is that the data are accessed through paths which are bound by computation and storage costs. Thus one of two situations arises: either the cost has to be sacrificed in order to obtain the data in time, or a delay in data access has to be accepted in order to reduce the cost. This problem reduces the overall scheduling performance.

In order to overcome this data access problem, adaptive cost-based task scheduling (ACTS) is proposed in this paper so that the data are obtained at the required time, without delay, and through affordable-cost paths. The proposed approach estimates the completion time for accessing the data [13] required by the VMs during task execution. Then the cost of each possible path is estimated as the sum of the computation, communication, and storage costs [14] of the path. Using the data access completion time, the priority of the tasks is assigned. Paths with a high cost but quick data access are assigned to tasks with high priority, and paths with a low cost are assigned to low-priority tasks. Thus the data paths can be selected adaptively to reduce the overall cost while delivering the data at the required time.

The remainder of the paper is organized as follows: Section 2 briefly reviews related research and presents an analysis of scheduling schemes. Section 3 presents the methodology proposed in the paper. Section 4 provides the experimental results and their discussion. Section 5 concludes the research.

2. Related Work

A cloud scheduler is a cloud-enabled distributed resource manager. It manages virtual machines on clouds to create an environment for job execution. The FIFO scheduler in Hadoop MapReduce, the fair scheduler at Facebook, and the capacity scheduler at Yahoo are typical examples that serve cloud systems with efficient and equitable resource management, but none of these schedulers satisfies QoS (quality of service) constraints. Therefore, they are not applicable to applications and services with soft real-time requirements, which are becoming more and more important in the hybrid cloud environment. The main objective of this section is not to propose methodologies to overcome all of the current issues in cloud task scheduling but to study and analyze some of the current methodologies and identify their drawbacks.

Sahni and Vidyarthi [4] presented a cost-effective, deadline-constrained dynamic scheduling algorithm for scientific workflows. Workflow scheduling algorithms for grids and clusters are efficient but cannot be used effectively in the cloud environment because of on-demand resource provisioning and the pay-as-you-go pricing model. Hence a dynamic, cost-effective, deadline-constrained heuristic algorithm has been used to exploit the features of the cloud by considering virtual machine performance variability and instance acquisition delay when determining the schedule. The problem with the approach is that VM failures may adversely affect the overall workflow execution time.

Tsai et al. [5] proposed a hyper-heuristic scheduling algorithm (HHSA) for providing effective cloud scheduling solutions. Diversity detection and improvement detection operators are used in this approach to dynamically determine the better low-level heuristic for effective scheduling. HHSA can reduce the makespan of task scheduling and improves the overall scheduling performance. The drawback is that the approach incurs a high connection overhead, which diminishes the benefit of the scheduling and thus reduces the overall performance.

Zhu et al. [6] proposed an agent-based dynamic scheduling algorithm named ANGEL for effective scheduling of tasks in virtualized clouds. In this approach, a bidirectional announcement-bidding mechanism and a collaborative process are employed to improve the scheduling performance. To further improve the scheduling, elasticity is considered so that VMs can be added dynamically, and calculation rules are generated to improve the bidding process, which in turn reduces the delay. The problem with this approach is that it does not consider the communication and dispatching times, which reduces performance.

Zhu et al. [7] presented an evolutionary multiobjective (EMO) workflow scheduling approach to address the workflow scheduling objectives of cost and makespan. Owing to the specific properties of the workflow scheduling problem, the EMO approach builds on existing genetic operations, such as binary encoding, real-valued encoding, and their corresponding variation operators. The problem is that the approach does not consider the monetary costs and time overheads of both communication and storage.

Zhang et al. [8] proposed a fine-grained scheduling approach called the phase and resource information-aware scheduler for MapReduce (PRISM). MapReduce is widely used for its efficiency in reducing the running time of data-intensive jobs, but most MapReduce schedulers are designed as task-level solutions that provide suboptimal job performance, and task-level schedulers have difficulty reducing job execution time. Hence PRISM divides tasks into phases, each with a constant resource usage profile, and performs scheduling at the phase level. The overall job execution time can thus be reduced significantly, but meeting job deadlines with phase-level scheduling remains a serious concern that requires specific attention.

Zhu et al. [9] presented an energy-aware (EA) scheduling approach for real-time tasks, called EARH, for virtualized clouds. The approach is based on rolling-horizon (RH) optimization, and procedures are developed for the creation, migration, and cancellation of VMs to dynamically adjust the scale of the cloud, meet real-time deadlines, and reduce energy consumption. The drawback of EARH is that the number of cycles assigned to the VMs cannot be updated dynamically.

Maguluri and Srikant [10] suggested a method for scheduling jobs with unknown duration in the cloud environment. The job sizes are assumed to be unknown not only at arrival but also at the beginning of service. Hence a throughput-optimal scheduling and load-balancing algorithm for a cloud data center is introduced for the case when job sizes are unknown. The algorithm uses queue lengths, instead of the workload, as the weights in a max-weight schedule.

Zuo et al. [11] presented a self-adaptive learning particle swarm optimization- (SLPSO-) based scheduling approach for deadline-constrained task scheduling in hybrid IaaS clouds. The approach addresses the problem of meeting peak demand while preserving quality-of-service constraints by using PSO optimization. It provides better task scheduling, maximizing the profit of the IaaS provider while guaranteeing QoS. The problem with this approach is the lack of priority determination, which results in deadline tasks failing.

Scheduling tasks in a cloud computing environment is a challenging process. In [12], Su et al. presented a cost-efficient task scheduling method that can be used for processing large programs. However, the performance of the approach is insufficient because it does not consider the data access completion time and the per-path costs when scheduling.

From the literature it is found that the major issues in the methods described above are high cost consumption, especially for communication and computation of data from cloud data centers. The inability to meet deadlines, due to inappropriate data path allocation during task scheduling, is another area of concern. The analysis of the various scheduling schemes is presented below.

2.1. Analysis of Scheduling Schemes

Generally, the efficient task scheduling concepts of clusters and grids are not effective in the cloud environment. The main reason is that in cloud computing resource provisioning is on-demand and resources are provided on a pay-per-use basis. Hence the scheduling approach has to exploit the features of the cloud in order to schedule the tasks efficiently without time delay. While processing a task in a VM, the data need to be obtained from distant data centers located at multiple locations. As the tasks are deadline-constrained, the data need to be obtained within a particular time using effective scheduling approaches. However, the solution for scheduling deadline-constrained tasks in the cloud leads to a new problem in the form of cost. The computation and storage resources are the basic resources in the cloud environment that form the cost models.

Table 1 shows the various scheduling schemes described in the literature and their drawbacks.

The high cost problem can be reduced by selecting minimum-cost paths based on the availability of the data paths. The difficulty is that not all tasks take the same execution time, which means some tasks require data more quickly than others. When only the minimum-cost path is used, the data would have to wait in a queue or might be lost due to queue overflow. So the cost paths need to be selected adaptively for deadline-constrained tasks. These two problems are the major focus of this research.

3. Adaptive Cost-Based Task Scheduling

The proposed adaptive cost-based task scheduling (ACTS) is discussed in this section. The scheduling of tasks to the VMs can be performed effectively using the proposed method. This work takes inspiration from the work of Su et al. [12], whose cost-efficient task scheduling considers the overall execution time and total monetary cost for scheduling. Although execution time and monetary cost are considered, the scheme cannot be regarded as fully efficient because these two factors are aggregate factors. The execution time is the time for task completion, that is, the time from when a task is assigned to a VM until its output is obtained. However, the time consumed by each process within task execution varies, and not all of them can be minimized; in particular, the time taken for obtaining the data from the data centers is considerably longer than that of all other processes in task execution. Similarly, the monetary cost is the combined cost of resources for computation, communication, storage, data transfer, and so forth; among these, the costs for computation and communication are normally higher than the other costs. Su et al. considered only the combined factors for scheduling. Hence the proposed ACTS focuses on the individual processes as scheduling factors, the major ones being the data access completion time, the computation cost, and the communication cost.

The data to be processed in the VM or the PM are stored in distant data centers and need to be fetched to the processing VMs through cost-effective paths. The data access requests of each VM follow an independent Poisson process whose rate is the average arrival rate of the VM's network I/O requests. Data accesses at the driver domain (PM) are served on a first-come, first-served basis, while the other requests wait in a queue. The service time of a data access in the driver domain follows an arbitrary (general) distribution.

The data access completion time is the deciding factor in the selection of the data paths. It is calculated from the parameters of the network input/output (I/O) requests at the physical machines: let $\mu$ be the mean service time of network I/O requests in the PM, $\sigma^{2}$ the variance of the service time, and $\lambda$ the arrival rate of network I/O requests to the PM. The arrival rate of the network I/O requests to a PM is calculated as
$$\lambda = w_{E}\lambda_{E} + w_{N}\lambda_{N}, \quad (1)$$
where $w_{E}$ and $w_{N}$ are the ratios of the CPU time allocated to the existing and new VMs, and $\lambda_{E}$ and $\lambda_{N}$ are the arrival rates of the network I/O requests of the existing and new VMs to the PM. Because the arrivals are Poisson, the service times follow an arbitrary distribution, and requests are served in first-come, first-served order, the driver domain behaves as an M/G/1 queue, and the completion time of a data access can be estimated by the Pollaczek-Khinchine response-time formula
$$T_{access} = \mu + \frac{\lambda\left(\sigma^{2} + \mu^{2}\right)}{2\left(1 - \lambda\mu\right)}. \quad (2)$$
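To make the estimation concrete, the minimal sketch below evaluates (1) and (2) as plain functions, assuming the M/G/1 response-time form reconstructed above; the variable names (w_existing, lam_existing, and so on) and the numerical example are illustrative and not taken from the paper.

```python
# Illustrative sketch of eqs. (1) and (2); names and numbers are assumptions.

def arrival_rate(w_existing, lam_existing, w_new, lam_new):
    """Eq. (1): aggregate arrival rate of network I/O requests at a PM,
    weighted by the CPU-time ratios of the existing and new VMs."""
    return w_existing * lam_existing + w_new * lam_new


def completion_time(mu, sigma2, lam):
    """Eq. (2): mean data-access completion time at the driver domain,
    modelled as an M/G/1 response time (Pollaczek-Khinchine form).

    mu     -- mean service time of a network I/O request (seconds)
    sigma2 -- variance of the service time
    lam    -- arrival rate of network I/O requests to the PM (requests/second)
    """
    rho = lam * mu  # server utilization; the queue is stable only if rho < 1
    if rho >= 1.0:
        raise ValueError("unstable queue: lambda * mu must be < 1")
    waiting = lam * (sigma2 + mu ** 2) / (2.0 * (1.0 - rho))
    return mu + waiting  # service time plus mean queueing delay


# Example: two VM groups feeding one PM whose service time averages 10 ms.
lam = arrival_rate(0.6, 50.0, 0.4, 25.0)      # 40.0 requests/second
print(completion_time(0.010, 0.00004, lam))   # ~0.0147 seconds
```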

The tasks are performed in virtual machines (VMs), which obtain the data from the cloud data centers through the data access paths. Each data access path contains resources for processing the requests and accessing the data and also requires storage resources for storing the accessed data. Each of these resources carries a certain cost for its use. The computation cost includes the cost of the resources used for executing the I/O requests for the data access and the cost for re-accessing the same data, as well as the cost for regenerating the datasets. The communication cost is the total cost of the resources used for processing the I/O requests; it can be expressed as the product of the dataset size and the network traffic price.

The costs of the possible data access paths are analyzed in order to determine the minimum-cost path. Following the cost model above, the cost of each path can be estimated as the sum of its computation, communication, and storage costs:
$$C_{d} = C_{comp} + C_{comm} + C_{sto}. \quad (3)$$
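As a minimal illustration of how the per-path cost in (3) can be assembled from the computation cost (7), the communication cost (9), and a storage component, consider the sketch below; the field names and the flat pricing model are assumptions for illustration, not the paper's exact cost model.

```python
# Illustrative sketch of the path cost in eq. (3); pricing fields are assumed.
from dataclasses import dataclass


@dataclass
class DataPath:
    vm_price_per_hour: float      # c_{p,vm}: price of running a VM for one time unit
    traffic_price_per_gb: float   # price_p: inbound network traffic price per GB
    storage_price_per_gb: float   # storage price of the data center on this path


def computation_cost(runtimes_hours, vm_price_per_hour):
    """Eq. (7): sum over the tasks of the cost of running each task on a VM."""
    return sum(r * vm_price_per_hour for r in runtimes_hours)


def communication_cost(data_gb, traffic_price_per_gb):
    """Eq. (9): data size required by the task times the network traffic price."""
    return data_gb * traffic_price_per_gb


def path_cost(path, runtimes_hours, data_gb):
    """Eq. (3): computation + communication + storage cost of one data path."""
    return (computation_cost(runtimes_hours, path.vm_price_per_hour)
            + communication_cost(data_gb, path.traffic_price_per_gb)
            + data_gb * path.storage_price_per_gb)


# Example: two tasks (0.5 h and 1.2 h of VM time) reading 3 GB over one path.
p = DataPath(vm_price_per_hour=0.10, traffic_price_per_gb=0.05,
             storage_price_per_gb=0.02)
print(path_cost(p, [0.5, 1.2], 3.0))   # 0.17 + 0.15 + 0.06 = 0.38
```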

The computation cost and the communication cost are vital in determining the cost-effective paths, as the corresponding resources handle the I/O requests of the VMs. When a VM executes a task, it sends a request to access the required data from the data centers; the data centers receive the I/O requests and then provide access to the data.

The proposed ACTS considers both the cost and the completion time of data access for efficiently scheduling the tasks. ACTS assigns priority to the tasks based on the completion time: a fixed time T is chosen, and the estimated completion time of each task is compared with T to determine its priority. The low-priority tasks are those with a longer completion time, and for them the minimum-cost path is selected to reduce the overall cost; these tasks can be executed at a normal pace without much urgency. The high-priority tasks are those that require data within a shorter completion time, and for them the paths that provide quicker data access are selected without waiting for the minimum-cost path. This may increase the cost, but the main aim is to obtain the requested data within the deadline, so the small increase in the overall cost is negligible. After the execution of the tasks, the CPU utilization and the bandwidth utilization are estimated.
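The following sketch shows one way such priority-based path selection could be realized: tasks whose estimated completion time exceeds the fixed threshold T are treated as low priority and routed over the cheapest path, while the remaining tasks receive the fastest path that still meets their deadline. The path attributes and the tie-breaking rules are illustrative assumptions rather than the paper's exact procedure.

```python
# Illustrative sketch of the priority assignment and adaptive path selection.

def assign_paths(tasks, paths, threshold_T):
    """tasks: list of dicts with 'id', 'completion_time', 'deadline'.
    paths: list of dicts with 'id', 'cost', 'access_time'.
    Returns a mapping from task id to the chosen path id."""
    cheapest = min(paths, key=lambda p: p["cost"])
    assignment = {}
    for task in tasks:
        if task["completion_time"] > threshold_T:
            # Low priority: not urgent, so the minimum-cost path is sufficient.
            assignment[task["id"]] = cheapest["id"]
        else:
            # High priority: pick the cheapest among the paths fast enough to
            # meet the deadline; fall back to the fastest path otherwise.
            feasible = [p for p in paths if p["access_time"] <= task["deadline"]]
            chosen = (min(feasible, key=lambda p: p["cost"]) if feasible
                      else min(paths, key=lambda p: p["access_time"]))
            assignment[task["id"]] = chosen["id"]
    return assignment


tasks = [{"id": 1, "completion_time": 0.9, "deadline": 0.5},
         {"id": 2, "completion_time": 0.2, "deadline": 0.3}]
paths = [{"id": "cheap", "cost": 1.0, "access_time": 0.8},
         {"id": "fast", "cost": 3.0, "access_time": 0.1}]
print(assign_paths(tasks, paths, threshold_T=0.5))  # {1: 'cheap', 2: 'fast'}
```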

Figure 1 shows the proposed ACTS procedure. This work focuses on scheduling the tasks to the VMs over minimum-cost paths to reduce the complexity of accessing data from the cloud data center. The tasks are allocated to underloaded VMs based on normal load conditions. The tasks allocated to the VMs access the data from the distant cloud data centers. The cost incurred for I/O processing is computed and the completion time for data access is estimated. Then the CPU utilization and bandwidth utilization are calculated and updated for successive task executions.

For example, let us consider V tasks consisting of simple mathematical programs with flexible properties of bandwidth, random access memory (RAM), and million instructions per second (MIPS). These parameters of the cloud tasks are user defined and can be chosen flexibly. The simulations are carried out in the CloudSim simulation environment, which is convenient to use. The tasks are nonpreemptive, dependent tasks.

The VMs are initiated in the cloud environment, with the existing VMs denoted as E and the newly initiated VMs placed under N; new VMs are introduced when the load is large. The tasks execute simple mathematical programs whose length differs according to the initiated code: a simple addition program is executed once for a task of length 4, and it is repeated to achieve the prescribed length for the longer tasks.
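To illustrate how such synthetic tasks might be built, the snippet below repeats a simple addition until a prescribed task length is reached; representing a task as a list of additions is purely an assumption for illustration and does not reproduce the paper's task generator.

```python
# Illustrative sketch: building synthetic tasks of prescribed lengths by
# repeating a simple addition program. The task representation is assumed.
import random


def make_addition_task(length, seed=0):
    """Return a list of `length` (a, b) pairs standing in for one task."""
    rng = random.Random(seed)
    return [(rng.randint(0, 15), rng.randint(0, 15)) for _ in range(length)]


def run_task(task):
    """Execute the task: perform every queued addition."""
    return [a + b for a, b in task]


lengths = [4, 8, 11, 5, 3, 9, 10]                 # task lengths used in Section 4
tasks = [make_addition_task(n, seed=i) for i, n in enumerate(lengths)]
print([len(run_task(t)) for t in tasks])          # [4, 8, 11, 5, 3, 9, 10]
```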

Now let us consider a task v with m resources available. Initially, the tasks are checked for possible execution. All the VMs run in parallel, are unrelated, and each VM runs on its own resources; no VM shares its resources with another. We schedule nonpreemptive, dependent tasks to the VMs. For each task v, the arrival rate λ_v and the data access completion time T_v are calculated using (1) and (2). Then the cost of each data path d is computed using (3), (7), and (9): the computation cost (7) is estimated as the sum of all costs incurred for running a task on a VM of a provider (8), while the communication cost (9) is the product of the data size required and the inbound network traffic price. Based on the completion time, the task priority is assigned. Then, based on the completion time and the cost, the paths are sorted and allocated to each task, and the underloaded VMs are loaded with the tasks, which access the data from the cloud data center within the deadline. Finally, the CPU utilization (11) and the bandwidth utilization (12) are calculated to determine the efficiency of the system. This scheduling procedure is summarized in Algorithm 1.

Algorithm 1 (adaptive cost-based task scheduling).

Input: number of tasks, VMs
Output: task scheduling

Begin
Deploy the set of physical machines.
E = set of existing VMs present in the cloud computing system.
N = set of new VMs to be created.
Set of tasks V = {v_1, v_2, ..., v_n}.
Set of resources R = {r_1, r_2, ..., r_m}.
For each task v in V
  Compute the arrival rate λ_v to the PM using (1)
    // w_E and w_N are the ratios of the CPU time allocated to the existing and new VMs; λ_E and λ_N are the arrival rates of the network I/O requests of the existing and new VMs to the PMs.
  Compute the completion time T_v of the data access using (2)
    // μ is the mean service time of network I/O requests in the PM, σ² is the variance of the service-time distribution, and λ is the arrival rate of network I/O requests to the PM.
End for
Compute the cost of each possible data path d using (3):
  Computation cost: C_comp(p) = Σ_{v ∈ V} C(v, p, vm)    (7)
    // C(v, p, vm), defined in (8), is the cost of running a task v on provider p with a VM; it is determined by the time to deadline d_v of v, the runtime r_v of v, and the cost c_{p,vm} of running a VM on p for one time unit.
  Communication cost: C_comm(v) = s_v × price_p    (9)
    // s_v is the amount of data (in GB) required by task v and price_p is the inbound network traffic price per GB of provider p.
Select the minimum-cost path d_min.
Assign priority to the tasks:
If T_v > T then
  priority(v) = low; data path(v) = d_min
Else if T_v ≤ T then
  priority(v) = high
    // T is a fixed time with which the data access completion time of the tasks is compared to determine the priority.
  Analyze the data paths which satisfy the time to deadline for task v;
  data path(v) = d_fast
    // the path d_fast has faster data access to satisfy the time to deadline even without minimum cost.
End if
Assign the tasks to the VMs.
Estimate the CPU utilization: U_CPU = L_c / (M_r × t_c)    (11)
  // U_CPU is the CPU utilization; L_c is the cloudlet's MIPS length; M_r is the MIPS rating of the CPU; t_c is the cloudlet's duration when executed on a CPU with a MIPS rating of M_r.
Estimate the bandwidth utilization: U_BW = D / (B × t_vm)    (12)
  // U_BW is the bandwidth utilization; B is the allotted bandwidth quota; D is the amount of data transferred during the life of the VM; t_vm is the VM lifetime, equal to the VM release time minus the VM creation time.
Update the VM characteristics for the next iteration.
End.

3.1. Description

The tasks V, the number of VMs, and the VM resources are initialized. The set of existing VMs and the set of newly created VMs are assigned. For each task, the data access completion time is calculated using (2). Similarly, the computation cost and the communication cost are calculated in order to estimate the cost of each data path. Using the completion time and the computation and communication costs of each path, the scheduling is performed. The tasks are assigned priorities based on the completion time. The high-priority tasks, which have a shorter completion time, are allocated fast data access paths that satisfy the time to deadline without prioritizing the cost; the low-priority tasks, which have a longer completion time, are allocated the minimum-cost paths. Then the tasks are executed, and the CPU and bandwidth utilization are calculated.
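For completeness, the sketch below evaluates the utilization estimates referenced as (11) and (12), using the ratio forms reconstructed in Algorithm 1; the argument names, units, and numerical values are assumptions for illustration.

```python
# Illustrative sketch of the utilization estimates in eqs. (11) and (12).

def cpu_utilization(cloudlet_mips_length, cpu_mips_rating, duration_s):
    """Eq. (11): fraction of the CPU capacity consumed by a cloudlet,
    i.e. the instructions executed divided by what the CPU could execute
    in the same duration (assumed ratio form)."""
    return cloudlet_mips_length / (cpu_mips_rating * duration_s)


def bandwidth_utilization(data_transferred, bandwidth_quota, vm_lifetime_s):
    """Eq. (12): data transferred over the VM lifetime divided by the
    allotted bandwidth quota times that lifetime (assumed ratio form)."""
    return data_transferred / (bandwidth_quota * vm_lifetime_s)


# Example: a 1200 MI cloudlet on a 400 MIPS CPU for 10 s -> 30% utilization.
print(cpu_utilization(1200, 400, 10.0))            # 0.3
# Example: 500 Mb moved by a VM with an 800 Mb/s quota over a 2 s lifetime.
print(bandwidth_utilization(500, 800, 2.0))        # 0.3125
```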

4. Experimental Results

Experiments are conducted to evaluate the performance of the proposed adaptive cost-based task scheduling, and the results are tabulated. The cost-efficient task scheduling presented in [12], which does not consider the cost and completion time of the data access, is used as the baseline and compared with the proposed ACTS, which considers both the cost and the data access completion time. The experiments are carried out using the CloudSim [15] tool, whose classes have been extended (overridden) to implement the new algorithm. CloudSim opens the possibility of evaluating a hypothesis prior to software development in an environment where tests can be reproduced. Specifically, in cloud computing, where access to the infrastructure incurs payments in real currency, a simulation-based approach allows cloud customers to test their services in a repeatable and controllable environment and to tune performance bottlenecks before deployment on real clouds. The efficiency of the approaches is compared in terms of computation cost, communication cost, execution time, CPU utilization, and bandwidth.

The numbers of tasks and VMs are flexible to user requirements, which means that the user provides memory, MIPS, and bandwidth values which are then assigned randomly to the VMs. The appropriate determination of the characteristics of the VMs and the tasks is highly recommended for obtaining the desired performance evaluation results. The VM characteristics are as follows: RAM (256, 312, 712, and 856) bytes; MIPS (330, 370, and 400); bandwidth (700, 750, 800, and 900) bits per second (bps). Likewise, the I/O-intensive tasks are taken as follows: length (4, 8, 11, 5, 3, 9, and 10); memory (256, 312, 378, 280, 436, 553, and 375) bytes. An I/O-intensive task performs the function of reading input/output data and writing them onto files. These values are user provided; if, for instance, the number of VMs is 10, the combination of RAM, MIPS, and bandwidth is chosen randomly, so one possible set of RAM values for the 10 VMs would be 256, 312, 712, 856, 256, 312, 712, 856, 256, and 312, respectively.
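A small sketch of how such user-provided parameter sets might be expanded into per-VM configurations is given below; the structure is an assumption for illustration and does not reproduce the CloudSim classes actually extended in the experiments.

```python
# Illustrative sketch: drawing VM characteristics from the user-provided sets.
import random

RAM_SET = [256, 312, 712, 856]       # RAM values listed in Section 4
MIPS_SET = [330, 370, 400]           # MIPS values listed in Section 4
BW_SET = [700, 750, 800, 900]        # bandwidth values (bps) listed in Section 4


def make_vm_configs(num_vms, seed=42):
    """Randomly combine the user-provided RAM, MIPS, and bandwidth values."""
    rng = random.Random(seed)
    return [{"ram": rng.choice(RAM_SET),
             "mips": rng.choice(MIPS_SET),
             "bw": rng.choice(BW_SET)}
            for _ in range(num_vms)]


for cfg in make_vm_configs(10):
    print(cfg)
```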

4.1. Computation Cost

Computation cost is the cost of the resources used for computing the I/O requests for data access. It can be computed using (7).

Figure 2 compares the existing cost-efficient task scheduling, which does not consider the completion time and cost, with the proposed adaptive cost-based task scheduling (ACTS), which does, in terms of the computation cost. Along the x-axis the number of tasks is taken, while along the y-axis the computation cost (price) is taken. When the number of tasks is 50, the cost-efficient task scheduling has a computation cost of 2890, whereas the proposed ACTS has 2534.8. Thus the proposed ACTS provides better scheduling with minimal computation cost.

4.2. Communication Cost

Communication cost is the cost of the resources used for the I/O requests and responses between the data center and the VM during data access. It can be calculated using (9).

Figure 3 compares the existing cost-efficient task scheduling, which does not consider the completion time and cost, with the proposed adaptive cost-based task scheduling (ACTS), which does, in terms of the communication cost. Along the x-axis the number of tasks is taken, while along the y-axis the communication cost (price) is taken. When the number of tasks is 50, the existing cost-efficient task scheduling has a communication cost of 1100, whereas the proposed ACTS has 946.6. This shows that the proposed ACTS consumes less cost than the existing scheme.

4.3. Execution Time

The execution time is the time required to process a task in a VM. It is estimated as the product of the number of instructions, the number of cycles per instruction, and the time per cycle.
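As a purely illustrative numerical example of this relation (the figures are arbitrary and are not measurements from the experiments), a task of $10^{6}$ instructions at 2 cycles per instruction on a CPU with a 1 ns cycle time takes

$$T_{exec} = N_{instr} \times CPI \times t_{cycle} = 10^{6} \times 2 \times 1\,\text{ns} = 2\,\text{ms}.$$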

Figure 4 compares the existing cost-efficient task scheduling, which does not consider the completion time and cost, with the proposed adaptive cost-based task scheduling (ACTS), which does, in terms of the execution time. Along the x-axis the number of tasks is taken, while along the y-axis the execution time in milliseconds (ms) is taken. When the number of tasks is 50, the existing cost-efficient task scheduling has an execution time of 4.978 ms, whereas the proposed ACTS has 2.56 ms. This shows that the proposed ACTS reduces the time taken for the overall process.

4.4. CPU Utilization

CPU utilization refers to the usage of processing resources or the amount of work handled by a CPU. CPU utilization varies depending on the amount and type of managed computing tasks. It is estimated using (11).

Figure 5 compares the existing cost-efficient task scheduling with the proposed adaptive cost-based task scheduling (ACTS) in terms of CPU utilization. Along the x-axis the number of tasks is taken, while along the y-axis the CPU utilization in % is taken. When the number of tasks is 50, the existing cost-efficient task scheduling has a CPU utilization of 13.14%, whereas the proposed ACTS has 11.345%. This shows that the proposed ACTS has lower CPU utilization.

4.5. Bandwidth Utilization

Bandwidth is the amount of data that can be transmitted in a fixed amount of time. It is given in bits per second (bps). It is estimated using (12).

Figure 6 compares the existing cost-efficient task scheduling with the proposed adaptive cost-based task scheduling (ACTS) in terms of bandwidth. Along the x-axis the number of tasks is taken, while along the y-axis the bandwidth in bps is taken. When the number of tasks is 50, the existing cost-efficient task scheduling has a bandwidth of 240.98 bps, whereas the proposed ACTS has 34.123 bps.

Thus, the experimental results make clear that the proposed adaptive cost-based task scheduling (ACTS), which considers the completion time, the computation cost, and the communication cost, is more efficient than the existing cost-efficient task scheduling.

5. Conclusion

Scheduling tasks in cloud computing with reduced delay and effective cost management is a challenging task. Hence, in this paper, adaptive cost-based task scheduling (ACTS) is proposed, which considers the data access completion time and the cost of data access. By considering these two factors, the data can be fetched from the data centers effectively and the scheduling performance can be improved. The approach focuses on providing data access for executing each task while keeping costs under control. Experimental results also show that the proposed adaptive cost-based task scheduling provides better performance in terms of execution time, computation cost, communication cost, bandwidth, and CPU utilization when compared with the existing cost-efficient task scheduling approach.

In this paper, task scheduling is performed for already-determined task demands; scheduling tasks with undetermined demands is quite challenging and could be addressed in the future by utilizing efficient resource provisioning techniques. The cost for regenerating datasets is not computed in ACTS, which is not efficient in exceptional cases and should be considered in future research. Moreover, load-balancing problems also need to be resolved to provide efficient cloud computing services, which will be part of our future scope of research.

Competing Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.