Abstract

Cloud computing is a computing paradigm that evolved from distributed computing and provides computing resources in a pay-as-you-go manner. For example, Amazon EC2 offers Infrastructure-as-a-Service (IaaS) instances in three different ways, which differ in price, reliability, and performance. Our study is based on an environment that uses spot instances. Spot instances can significantly decrease costs compared to reserved and on-demand instances, but they provide a less reliable environment. In this paper, we propose a workflow scheduling scheme that reduces out-of-bid situations and consequently decreases the total task completion time. The simulation results reveal that, compared to various instance types, our scheme improves the average combined metric by 12.76% over a workflow scheme that does not consider the task processing rate. However, the cost of our scheme is higher than that of a low-performance instance and lower than that of a high-performance instance.

1. Introduction

In recent years, due to the increased interest in cloud computing, many cloud projects and commercial systems such as Amazon EC2 [1] have been implemented. Cloud computing provides many benefits, including easy access to user data, ease of management for users, and reduced costs. Cloud computing services combine computing resources with internet technology to provide a high level of scalability to many customers [2, 3]. In most cloud services, the concept of an instance unit is used to provide users with resources in a cost-efficient manner. There are many different cloud computing providers, and each offers different layers of services. This paper focuses on Infrastructure-as-a-Service (IaaS) platforms that allow clients access to massive computational resources in the form of instances [4-7].

Generally, cloud computing resources are provided as reliable on-demand instances. On-demand instances allow the user to pay for computing capacity by the hour, with no long-term commitments. This frees users from the costs and complexities of planning, purchasing, and maintaining hardware and transforms what are usually large fixed costs into much smaller variable costs [1]. However, on-demand instances may incur a higher cost than other instances such as reserved instances and spot instances. We focus on spot instances, which form an unreliable environment. If users have time flexibility for executing applications, spot instances can significantly decrease their Amazon EC2 costs [8, 9]. For task completion, therefore, spot instances may incur a lower cost than on-demand instances.

Spot instances operate in a spot market-based cloud environment. In this environment, the spot price varies with the supply of and demand for spot instances, and these variations determine whether tasks complete successfully or fail. Because spot prices follow a market structure and the law of supply and demand, a cloud service such as Amazon EC2 can provide a spot instance when a user’s bid is higher than the current spot price. A running instance stops when the user’s bid becomes less than or equal to the current spot price. After a running instance stops, it restarts when the user’s bid becomes greater than the current spot price [10-12].

In particular, scientific applications are now commonly expressed as workflows. However, instances in spot instance-based cloud computing differ in performance. For a spot instance, the available execution time depends on the spot price, which changes periodically based on supply and demand. The completion time for the same amount of work varies according to the performance of an instance. In particular, the failure time of each instance differs according to the user’s bid and the performance of the instance. Therefore, we address the problem that the completion time of a task on an instance increases when a failure occurs. For efficient execution, a task is divided into subtasks that run on various types of available instances. We analyze information about the task and the instances from the price history, and from the analyzed data we estimate the size of the task and the information of each available instance. We then create a workflow using each available instance and the size of the task. As a consequence, we propose a scheduling scheme that uses workflow to solve the job execution problem and considers the task processing rate. We execute the user’s job at the boundary of the selected instances and expand the suggested user budget.

In this section, we describe the background of the workflow model, focusing on spot instances. First, we explain spot instances in cloud environments. For the spot instance environment [8, 9], there are numerous studies on fault tolerance [10-12] and workflow scheduling [13, 14].

2.1. Spot Instances

Amazon EC2 offers IaaS instances in three different ways, which differ in price, reliability, and performance: reserved instances, on-demand instances, and spot instances. With reserved instances, a user pays a yearly fee and receives a discount on hourly rates. With on-demand instances, a user pays an hourly rate. With spot instances, a user specifies a bid, and the spot market determines the spot price based on supply and demand. Our scheduling focuses on offering services at the boundary of spot instances. Spot instances give an unreliable environment compared to reserved and on-demand instances, but they can significantly decrease the user’s costs. The spot price follows a market structure and the law of supply and demand. Therefore, the cloud service can provide a spot instance when a user’s bid is higher than the current spot price: while the user’s bid exceeds the current market price, the user runs the instance. However, if the market price exceeds the user’s bid, the instance is terminated and the partial hour is not charged. The spot system stops the spot instance immediately, without any notice to the user. After a running instance stops, it restarts when the user’s bid is again greater than the current spot price. An example of spot history is shown in Figure 1, which shows fluctuations of the spot price for standard instances (m1.small and m1.large) and high-CPU instances (c1.medium and c1.xlarge) during 7 days in October 2010 [15].
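
The run and stop rule described above can be sketched in a few lines of Python. This is a minimal illustration and not the simulator used in this paper: the function name `availability`, the price list, and the bid value are hypothetical, and the spot price is assumed to be sampled at fixed intervals.

```python
from typing import List, Tuple

def availability(prices: List[float], bid: float) -> Tuple[List[bool], int]:
    """Return the running state per price sample and the out-of-bid count.

    Assumption: the instance runs in a sample only when bid > spot price,
    and an out-of-bid event is a transition from running to stopped.
    """
    running = [bid > price for price in prices]
    out_of_bid = sum(
        1 for prev, cur in zip(running, running[1:]) if prev and not cur
    )
    return running, out_of_bid

# Hypothetical spot prices (USD per hour) and a hypothetical bid.
prices = [0.030, 0.031, 0.040, 0.029, 0.038, 0.030]
states, failures = availability(prices, bid=0.035)
print(states)    # running state for each sample
print(failures)  # number of out-of-bid events
```

With these sample values the instance is interrupted twice, and each interruption is a point at which work since the last checkpoint would be lost.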

2.2. Fault Tolerance

On the fault tolerance side, two similar studies (hour-boundary checkpointing [10] and rising edge-driven checkpointing [11]) proposed enforcing fault tolerance in cloud computing with spot instances. Based on the actual price history of EC2 spot instances, they compared several adaptive checkpointing schemes in terms of monetary cost and job execution time. In hour-boundary checkpointing, the checkpointing operation is performed at the hour boundary, and a user pays the bidding price on an hourly basis. In rising edge-driven checkpointing, the checkpointing operation is performed when the price of the spot instance rises and the price is less than the user’s bid. However, both schemes have the problem that the cost and task completion time increase as the number of checkpoints increases. To solve this problem, in our previous study [12], we proposed a checkpointing scheme that uses checkpoint thresholds based on rising edge-driven checkpointing. Checkpointing is performed using two thresholds, price and time, based on the expected execution time according to the price history. In this paper, we propose a workflow system that applies this previously proposed checkpointing scheme.
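
The rising edge-driven rule can be illustrated with a short Python sketch. It assumes the spot price is available as a list of discrete samples; the function name and the sample values are hypothetical, and the threshold-based scheme of [12] is not modeled here.

```python
from typing import List

def rising_edge_checkpoints(prices: List[float], bid: float) -> List[int]:
    """Indices at which a rising edge-driven checkpoint would be taken.

    A checkpoint is taken at sample t when the price rose relative to
    sample t-1 and the new price is still below the user's bid.
    """
    return [
        t
        for t in range(1, len(prices))
        if prices[t] > prices[t - 1] and prices[t] < bid
    ]

# Hypothetical price samples and bid.
prices = [0.030, 0.032, 0.031, 0.034, 0.040, 0.033]
checkpoints = rising_edge_checkpoints(prices, bid=0.035)
print(checkpoints)
```

In this example the rise to 0.040 does not trigger a checkpoint because the price is no longer below the bid; at that sample the instance is already out of bid.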

2.3. Workflow Scheduling

A workflow is a model that represents complex problems with structures such as directed acyclic graphs (DAGs). Workflow scheduling is a kind of global task scheduling, as it focuses on mapping and managing the execution of interdependent tasks on shared resources. However, existing workflow scheduling methods have limited scalability and are based on centralized scheduling algorithms. Consequently, these methods are not suitable for spot instance-based cloud computing, in which job execution has to consider the available time and cost of an instance. A fully decentralized workflow scheduling system uses a chemistry-inspired model to determine the instance in a community cloud platform [13]. A throughput maximization strategy has been designed for transaction-intensive workflow scheduling, but it does not support multiple workflows [14]. Our proposed scheduling guarantees an equal task distribution to available instances in spot instance-based cloud computing, and it redistributes tasks based on the task processing rate.
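
As a small illustration of a DAG-structured workflow, the following Python sketch derives one valid execution order for interdependent tasks. The task names are hypothetical, and the standard-library `graphlib` module (Python 3.9 or later) is used only for illustration; it is not part of the proposed system.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each task maps to the set of tasks it depends on.
workflow = {
    "split": set(),
    "process_a": {"split"},
    "process_b": {"split"},
    "merge": {"process_a", "process_b"},
}

# static_order() yields the tasks so that every dependency comes first.
order = list(TopologicalSorter(workflow).static_order())
print(order)
```

The two processing tasks have no dependency on each other, so a scheduler is free to place them on different instances.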

3. Proposed Workflow System

3.1. System Architecture

Our proposed scheme extends our previous work [12] and includes a workflow scheduling algorithm. Figure 2(a) presents the relation between workflows and instances, and Figure 2(b) shows the composition of the coordinator and manager. Figure 2 illustrates the roles of the instance information manager, the workflow manager, and the resource scheduler. The instance information manager obtains the information needed for job allocation and resource management. This information includes the VM specifications of each instance and execution-related information such as the execution cost, execution completion time, and failure time. The execution-related information is calculated for the selected VM based on the spot history. The workflow manager and the resource scheduler extract the execution-related information they need from the instance information manager. First, the workflow manager generates the workflow for the requested job. The generated workflow determines the task size according to the VM performance, the execution time and cost, and the failure time when the selected instance is used. Second, the resource scheduler manages the resources and allocates tasks to handle the job. Resource and task management is needed in order to reallocate tasks when the resource cannot get the information for a task and when a task has a fault during execution.

3.2. Workflow Scheduling Technique considering Task Processing Rate

The scheduling scheme is depicted in Figure 3. The three instances in the figure represent high, medium, and low performance, respectively. The high-performance instance belongs to the positive group, and the other two instances belong to the negative group. The scheduler distributes the task size across the available instances while considering their performance. Task size recalculation points divide the expected task execution time into four quarters, and the task size is recalculated at the end of each quarter except the last. The task size rate is determined based on the average task execution time of each instance up to the recalculation point, and the modified task size of each instance is allocated according to this rate.

Figure 4 shows the recalculation point of the task size from the position in Figure 3. In Figure 4, we assume that the processing rate of instances is proportional to the performance of instances. The left side of Figure 4, “before recalculation,” represents the tasks assigned to each instance. The right side, “after recalculation with relocation,” shows the result of task migration based on the average task execution time in each instance. After a recalculation operation, we perform the rearrangement of tasks. The rearrangement method sorts tasks in increasing order of their indices.
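
The recalculation step can be sketched as follows. This is an illustration under stated assumptions rather than the exact procedure of the scheme: the processing rate of an instance is assumed to be the inverse of its average execution time per task, the remaining task size is assumed to be split in proportion to that rate, and all names and values are hypothetical.

```python
from typing import Dict, List

def recalculation_points(expected_time: float) -> List[float]:
    """Quarter boundaries of the expected execution time, excluding the last."""
    return [expected_time * k / 4 for k in (1, 2, 3)]

def redistribute(remaining: float, avg_exec_time: Dict[str, float]) -> Dict[str, float]:
    """Split the remaining task size in proportion to the processing rate.

    Assumption: processing rate = 1 / average execution time per task.
    """
    rates = {name: 1.0 / t for name, t in avg_exec_time.items()}
    total = sum(rates.values())
    return {name: remaining * rate / total for name, rate in rates.items()}

points = recalculation_points(expected_time=400.0)
shares = redistribute(
    remaining=700.0,
    avg_exec_time={"high": 1.0, "medium": 2.0, "low": 4.0},
)
print(points)
print(shares)
```

Under these assumptions an instance that processes a task four times faster receives four times as much of the remaining work at each recalculation point.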

To realize the above model, our proposed scheme uses workflow on spot instances, and its purpose is to minimize the job processing time within the cost suggested by the user. The task size is determined by considering the availability and performance of each instance. To improve the performance and stability of task processing, the available time is estimated from the execution time and cost using the price history of spot instances. The estimated data are used to determine the amount of tasks assigned to each instance. Our proposed scheme reduces out-of-bid situations and improves the job execution time. However, the total cost is higher than when workflow is not used.

Our task distribution method determines the task size to allocate to each selected instance. The task size of an instance is calculated from its compute-unit and its available state, together with the total size of tasks required for executing the user request. The available state can be either 0 (unavailable) or 1 (available).
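
One plausible reading of this distribution rule is sketched below in Python. The formula is an assumption, not taken from the text: each instance is given a share of the total task size proportional to its compute-unit multiplied by its available state. The function name, the compute-unit values, and the total size are hypothetical.

```python
from typing import Dict, Tuple

def initial_task_sizes(
    total_size: float, instances: Dict[str, Tuple[float, int]]
) -> Dict[str, float]:
    """Distribute total_size over instances.

    instances maps a name to (compute_unit, available_state), where the
    available state is 0 (unavailable) or 1 (available).
    Assumption: the share is proportional to compute_unit * available_state.
    """
    weights = {name: cu * avail for name, (cu, avail) in instances.items()}
    total_weight = sum(weights.values())
    if total_weight == 0:
        raise ValueError("no available instance")
    return {name: total_size * w / total_weight for name, w in weights.items()}

# Hypothetical compute-units and availability.
sizes = initial_task_sizes(
    1300.0,
    {"m1.small": (1, 1), "m1.large": (4, 1), "m1.xlarge": (8, 1), "c1.medium": (5, 0)},
)
print(sizes)
```

An unavailable instance receives no work, and the sizes assigned to the available instances always sum to the total task size.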

We use the instance rate as the criterion for dividing instances into groups. The instance rate represents the unit taken for processing a task size on an instance and is computed from the task execution time and the task failure time.

And we define the to classify groups. The is the average of available instances such as and which represent the average of the and , respectively. The set of instances is classified into two groups, positive and negative, based on . The positive group is the set of instances with greater than . Consider

We calculate the task size to transfer from instance ( ) in as follows:

In group , the task size of each instance is given as . We are able to get by considering after the transfer operation:

The negative group is the set of instances with less than . Consider

The tasks are allocated according to the instance performance . The task size to receive is allocated according to the task size of each instance . In the group , the task size of each instance is given as . After the receive operation, is added to . Consider
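
The grouping step can be sketched in Python as follows. The sketch covers only the classification into positive and negative groups; the per-instance rates are taken as given inputs, and the function name and values are hypothetical. An instance whose rate equals the average falls into neither group here, since the text defines the groups only by strict comparison.

```python
from typing import Dict, List, Tuple

def classify(rates: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Split instances into (positive, negative) groups around the average rate."""
    average = sum(rates.values()) / len(rates)
    positive = [name for name, rate in rates.items() if rate > average]
    negative = [name for name, rate in rates.items() if rate < average]
    return positive, negative

# Hypothetical per-instance rates.
positive, negative = classify({"high": 8.0, "medium": 3.0, "low": 1.0})
print(positive)
print(negative)
```

With these values only the high-performance instance lies above the average, which matches the grouping shown for Figure 3.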

We propose a workflow scheduling algorithm based on the above equations. Algorithms 1 and 2 show the workflow scheduling algorithm and the workflow recalculation function, respectively.

Boolean S_flag = false   // a flag representing occurrence of a task execution
while (search user’s job) do
    if (require job execution by the user) then
        take the cost and total execution time by the user;
        S_flag = true;
    end if
    if (S_flag) then
        invoke initial_workflow ( );   // thread function
        while (task execution does not finish) do
            if (meet the recalculation point by instance) then
                invoke recalculation_workflow ( );   // thread function
            end if
        end while
    end if
end while

Thread_Function initial_workflow ( ) begin
    forall instance in Ins do
        retrieve the instance information that meets the user’s requirement;
        analyze the available execution time and cost of the instance;
        store the analyzed available instance in queue_instance;
    end forall
    calculate the priority list for the priority job allocation;
    forall instance in queue_instance do
        allocate tasks to the instance;
    end forall
end Thread_Function
Thread_Function recalculation_workflow ( ) begin
    forall instance in Ins do
        retrieve the information of the instance;
        calculate the modified task size;
    end forall
end Thread_Function

4. Performance Evaluation

The simulations were conducted using the history data obtained from Amazon EC2 spot instances [15]. The history data before 10-01-2010 was used to extract the expected execution time and failure occurrence probability for our checkpointing scheme. The applicability of our scheme was tested using the history data after 10-01-2010.

In the simulations, one type of spot instance was applied to show the effect of task time on performance. Table 1 shows the resource types used in Amazon EC2, each of which comprises a number of different instance types. First, standard instances offer a basic resource type. Second, high-CPU instances offer more compute-units than other resources and can be used for compute-intensive applications. Finally, high-memory instances offer more memory capacity than other resources and can be used for high-throughput applications, including database and memory caching applications. Under the simulation environments, we compare the performance of our proposed scheme with that of existing schemes that do not distribute tasks, in terms of various analyses according to the task time.

Table 1 shows the resource information of each instance type, and Table 2 shows the parameters and values used in the simulation. The spot price information is extracted from the spot history from 11-30-2009 to 01-23-2011. The user’s bid is set to the average spot price taken from this information. The task size is decided by the compute-unit rate relative to a baseline. Initially, the baseline is the instance m1.xlarge. For example, the task size of an instance m1.small is calculated by the following:

4.1. Comparison Results of Each Instance before Applying Workflow

Figure 5 shows the simulation results for each instance, taking the performance of each instance into account. For each instance, the user’s bid is set to the average spot price given in Table 2. Figure 5 presents the execution time and cost for the various instance types. An instance with high performance reduces the execution time but incurs a higher cost than an instance with low performance. As the total execution time in Figure 5(a) increases, the failure time in Figure 5(c) also increases. Figure 5(d) shows the rollback time of each instance. Rollback time is the time interval between a failure occurrence time and the last checkpoint time.
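
The rollback time defined above can be computed with a short Python sketch. The function name and sample times are hypothetical, and the case in which no checkpoint precedes the failure is handled by an assumption: work is rolled back to time 0.

```python
from bisect import bisect_right
from typing import List

def rollback_time(failure_time: float, checkpoint_times: List[float]) -> float:
    """Interval between a failure and the last checkpoint taken before it.

    checkpoint_times must be sorted in increasing order.
    Assumption: with no earlier checkpoint, execution restarts from time 0.
    """
    i = bisect_right(checkpoint_times, failure_time)
    last_checkpoint = checkpoint_times[i - 1] if i > 0 else 0.0
    return failure_time - last_checkpoint

# Hypothetical times in minutes.
lost = rollback_time(130.0, [60.0, 120.0, 180.0])
lost_without_checkpoint = rollback_time(30.0, [60.0])
print(lost, lost_without_checkpoint)
```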

4.2. Comparison Results after Applying Workflow

Figure 6 shows the simulation results for the task distribution. Figure 6(a) shows the total execution time for each instance and for TotalT. In the figures, TotalT denotes the total time taken for distributing and merging tasks, and TotalC denotes the sum of the costs of task execution in each instance. TotalT improves the average execution time by 81.47% over the shortest execution time in each task time interval. In Figure 6(b), the cost of our scheme increases by an average of 11.64 compared to an instance m1.small and decreases by an average of 32.87 compared to an instance . The failure time in Figure 6(c) and the rollback time in Figure 6(d) are smaller than those in Figures 5(c) and 5(d).

Figure 7 shows the execution results of workflow based on the task processing rate after applying our proposed scheme. Figures 6(a) and 7(a) show that the total execution time is reduced by an average of 18.8% after applying our scheme. Figures 6(c) and 7(c) show that the failure time increased by 6.68% after applying our proposed scheme. However, Figures 6(d) and 7(d) show that the rollback time improved by an average of 4.3% after applying our scheme. The rollback time is calculated from a failure point to the last checkpoint time. Figures 6(b) and 7(b) show that the total cost decreased by an average of 0.37 after applying our scheme. Two facts can be deduced from these results. One is the increase in failure time. The other is the improvement in total execution time through efficient task distribution; in addition, the task execution loss was reduced when the out-of-bid situation occurred. We also compare the experiments in terms of both execution time and cost.

Figure 8 shows the combined performance metric, which is the product of the total task execution time and the cost. Across the task time intervals, there is little difference between the basic and the applying schemes when compared to each instance. In the figure, the basic scheme denotes the workflow product that applies only task distribution without considering the task processing rate, and the applying scheme denotes the workflow product that considers the task processing rate. The basic scheme improves the average combined metric by 87.71% over the average product of the instances in each task time interval. The applying scheme improves the average combined metric by 12.76% compared to the basic scheme.
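
The combined metric can be written as a small Python sketch. The product of time and cost follows the description above; the improvement formula (relative reduction with respect to a baseline, where lower is better) is an assumption, and the sample values are hypothetical rather than simulation results.

```python
def combined_metric(total_time: float, total_cost: float) -> float:
    """Product of total task execution time and cost (lower is better)."""
    return total_time * total_cost

def improvement_percent(baseline: float, candidate: float) -> float:
    """Relative reduction of the metric with respect to the baseline, in percent."""
    return (baseline - candidate) / baseline * 100.0

# Hypothetical values for two schemes.
basic = combined_metric(total_time=100.0, total_cost=2.0)
applying = combined_metric(total_time=80.0, total_cost=2.25)
gain = improvement_percent(basic, applying)
print(basic, applying, gain)
```

Because the metric multiplies time by cost, a scheme can improve it even when its cost rises slightly, provided the execution time falls by a larger factor.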

5. Conclusion

In this paper, we proposed a workflow scheduling technique that considers the task processing rate in unreliable cloud computing environments. The workflow scheduling scheme recalculates the task size based on the task processing rate at each recalculation point. In addition, our previously proposed checkpoint scheme takes checkpoints based on two kinds of thresholds: price and time. Our scheme reduces the failure time and the absolute time through the checkpoint scheme. The rollback time of our scheme can be less than that of the existing scheme without workflow because our scheme adaptively distributes tasks according to the available instances. The simulation results showed that the average execution time was improved by 17.8% after applying our proposed scheme, at approximately the same cost as before applying it. Other simulation results reveal that, compared to various instance types, our scheme improves the average combined metric by 12.76% over a workflow scheme that does not consider the task processing rate.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MEST) (NRF-2012R1A2A2A02046684).