Abstract

Scientific workflows are extensively used to execute large-scale applications on cloud computing platforms. However, scheduling such workflows to optimize multiple QoS parameters is a challenging task. Various metaheuristic scheduling techniques have been proposed to satisfy QoS parameters such as makespan, cost, and resource utilization. Still, traditional metaheuristic approaches struggle to maintain a good balance between exploration and exploitation of the search space, because they tend to get trapped in local optima at later evolution stages and cope poorly with high-dimensional nonlinear optimization problems. This paper proposes an improved Fruit Fly Optimization (IFFO) algorithm to minimize makespan and cost when scheduling multiple workflows in the cloud computing environment. The proposed algorithm is evaluated using CloudSim, and the comparative results show that IFFO outperforms FFO, PSO, and GA.

1. Introduction

The cloud is a virtually unlimited pool of configurable computing resources (storage, network, processors, bandwidth, etc.) offering features such as an on-demand pay-per-use model, high availability, scalability, and reliability [1, 2]. It also supports a distributed architecture in which geographically distributed heterogeneous resources are provisioned to clients through virtualization to host large-scale applications. These applications are deployed as workflows, which are further divided into smaller tasks.

Because workloads keep growing in both volume and complexity, workflow scheduling has become a widely studied problem in cloud computing. As depicted in Figure 1, workflow scheduling allocates the required resources to the appropriate tasks so that their execution can complete. When tasks are scheduled on virtual machines (VMs), the client's QoS constraints must be fulfilled, and different clients may have different QoS requirements in terms of cost, time, security, and so forth.

Multiple workflow scheduling becomes relevant when various client requests are received simultaneously and the throughput of the cloud architecture must be maximized. Similar tasks can be identified and allocated to a similar set of resources [3]. Strategies must be adopted to enhance the system's performance and ensure that all client requests are completed before their deadlines.

An efficient scheduling technique maintains a trade-off between user requirements and resource utilization [4]. Maintaining this trade-off becomes challenging when tasks have parent-child relationships, where a child task can begin executing only once its parent task has finished and all output data from the parent has been transferred to the child [5, 6]. Many list-based algorithms cannot be directly applied in cloud computing environments because of resource heterogeneity, varied QoS constraints, and the cloud's dynamic nature [2]. Several metaheuristic algorithms, such as FFO [7], PSO [8], and GA [9, 10], have been explored for solving workflow scheduling problems. However, optimizing multiple objectives in the cloud computing environment (CCE) is still challenging [11–14], because improving one QoS parameter often comes at the expense of another. Thus, scheduling multiple workflows on the cloud while maintaining a trade-off among multiple QoS parameters remains an open problem.

In this paper, an enhanced metaheuristic optimization technique, IFFO, is proposed for scheduling multiple workflows in cloud computing environments while optimizing multiple QoS parameters. The proposed technique was simulated using CloudSim and compared with the existing FFO, PSO, and GA algorithms to validate its performance in terms of makespan and cost.

The rest of the paper is organized as follows. Section 2 presents a concise literature survey on workflow scheduling for QoS parameters. Section 3 formulates the problem. Section 4 proposes a framework based on the IFFO algorithm. Section 5 reports the experimental results of the proposed algorithm, and Section 6 concludes the paper.

2. Background

Yassa et al. presented a multiobjective approach, called DVFS-MODPSO [15], for scheduling workflows in cloud computing environments. The algorithm hybridizes PSO with Heterogeneous Earliest Finish Time (HEFT) to optimize multiple objectives such as makespan, cost, and energy consumption. Dynamic Voltage and Frequency Scaling (DVFS) is used for energy optimization, and the results show better Pareto optimal solutions than those of HEFT.

CGA2 [16] is a technique proposed by Liu et al., which includes an adaptive penalty function for scheduling deadline constrained workflows in the cloud computing environment, addressing the limitations of previously proposed evolutionary algorithms. The proposed algorithm prevents premature convergence, unlike several existing static techniques, by applying adaptive crossover and mutation probabilities and generates solutions that are able to meet deadline constraints. CGA2 is compared with traditional algorithms such as PSO, HEFT, GA, and Random to demonstrate better performance in terms of meeting deadlines under strict constraints and reducing the overall workflow execution cost.

HSGA [17] is a GA-based hybrid workflow scheduling technique proposed by Delavar et al. that combines the optimization characteristics of the Round Robin (RR) and Best Fit (BF) scheduling algorithms. The technique first ranks tasks by priority based on their dependencies and then allocates resources, using RR and BF to select appropriate VMs. The experimental results showed that HSGA reduced makespan, lowered the failure rate, and balanced the load better than the LAGA and NGA scheduling algorithms.

CDMWS [18], proposed by Delavar et al., is a dynamic technique for scheduling multiple workflows on the cloud that aims to improve CPU utilization, reduce makespan, and improve the makespan-deadline meeting ratio. This technique is also divided into two stages. The first stage estimates the execution time of each task by considering the workflow deadline and task dependencies. The second stage performs dynamic VM allocation, where VMs can be reused for tasks with similar requirements; VM reuse lowers power consumption and increases resource utilization. CDMWS is compared with two other algorithms, EWSA and RR, to verify its superiority.

MOWS [19], a list-based heuristic introduced by F. Abazari et al., adopts a greedy approach for prioritizing tasks and allocating appropriate resources to them. It aims to improve the security of the overall cloud architecture while maintaining the execution time of workflows. The first phase of the algorithm builds a solution based on task prioritization and assesses security risks. The second phase deals with various security threats while scheduling a workflow on the cloud.

MOGA [20], a GA-based multiobjective workflow scheduling algorithm presented by Attiqa Rehman et al., aims to optimize a diverse range of objectives, including makespan, budget, resource utilization, deadline, and energy efficiency. This work also introduced a gap search algorithm that finds gaps in the schedule generated for a particular workflow and fills them with independent tasks to maximize resource utilization. MOGA was compared with three GA-based algorithms and one PSO-based algorithm (MOPSO) to validate its superiority. Table 1 summarizes the related works.

3. Problem Formulation

Any large-scale application that needs to be deployed on a cloud platform is generally represented as a workflow W = (T, E), which can be pictured as a directed acyclic graph (DAG), where T = {T1, T2, …, Tn} denotes the set of tasks and E denotes the set of edges. The complete application is divided into several dependent and independent subtasks. The tasks in a workflow are arranged at different levels with parent-child relationships, where a child task cannot begin execution until all its parent tasks have finished execution and all their output data has been transferred to it. An edge Eij from Ti to Tj indicates that Ti is a parent of Tj, that is, Tj depends on Ti.

Consider the sample workflow depicted in Figure 2. The application is divided into seven subtasks T1, T2, …, T7 spread over five levels, Level 0 to Level 4. At Level 2, tasks T3 and T4 are independent of each other, since they are at the same level and no edge connects them, so they can be executed concurrently on different resources. However, their execution can begin only once T2 at Level 1 has finished and transferred all output data to T3 and T4. Similarly, the execution of T2 depends on its predecessor task, T1. Since T1 has no predecessor, it is the first task to be executed. Workflow execution is considered complete once the last task in the DAG, T7, has finished and generated its output.
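The level structure described above can be derived mechanically from the edge set. The following Python sketch computes each task's level using Kahn's topological ordering, so that tasks sharing a level (such as T3 and T4) are candidates for concurrent execution. The text does not state the edges into and out of T5 and T6, so the edge list below is a hypothetical reconstruction of Figure 2, and the function name `task_levels` is illustrative.

```python
from collections import deque


def task_levels(tasks, edges):
    """Assign each task a level: entry tasks get level 0, and every other task
    gets one more than its deepest parent. Raises if the graph is not a DAG."""
    children = {t: [] for t in tasks}
    indegree = {t: 0 for t in tasks}
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1

    level = {t: 0 for t in tasks}
    ready = deque(t for t in tasks if indegree[t] == 0)  # tasks with no parents
    processed = 0
    while ready:
        task = ready.popleft()
        processed += 1
        for child in children[task]:
            level[child] = max(level[child], level[task] + 1)
            indegree[child] -= 1
            if indegree[child] == 0:  # all parents handled: level is now final
                ready.append(child)
    if processed != len(tasks):
        raise ValueError("the workflow contains a cycle, so it is not a DAG")
    return level


# Hypothetical edge set for the seven-task workflow of Figure 2
# (only T1->T2, T2->T3 and T2->T4 are stated in the text).
tasks = [f"T{i}" for i in range(1, 8)]
edges = [("T1", "T2"), ("T2", "T3"), ("T2", "T4"),
         ("T3", "T5"), ("T4", "T6"), ("T5", "T7"), ("T6", "T7")]

levels = task_levels(tasks, edges)
for lvl in sorted(set(levels.values())):
    print(f"Level {lvl}:", [t for t in tasks if levels[t] == lvl])
```

Tasks printed on the same line have no dependency path between them at that level and can therefore be dispatched to different VMs in parallel.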

Resource heterogeneity, as offered by IaaS cloud providers, is also considered: different types of VMs are available with different configurations. Any cloud service provider that joins the cloud marketplace submits a two-dimensional bid BVMi = (PVMi, CVMi), where PVMi is the processing capacity of the VM measured in MIPS and CVMi is the cost of execution on that VM. The pricing model follows Amazon EC2's hourly billing scheme, in which a full cycle lasts 60 minutes and even one extra minute is charged as a complete cycle. So, if a resource is used for 61 minutes, the user is charged for two complete cycles, that is, 120 minutes.
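The charging rule above rounds every started cycle up to a full one. A minimal Python sketch of that arithmetic follows; the function names are illustrative, and the per-cycle price in the example is a hypothetical value, not one taken from EC2 or from the experiments.

```python
import math

CYCLE_MINUTES = 60  # one billing cycle, as in the hourly model described above


def billed_cycles(usage_minutes: float) -> int:
    """Number of cycles charged: every started cycle is billed in full."""
    if usage_minutes <= 0:
        return 0
    return math.ceil(usage_minutes / CYCLE_MINUTES)


def lease_cost(usage_minutes: float, price_per_cycle: float) -> float:
    """Cost of one VM lease under the cycle-based pricing model."""
    return billed_cycles(usage_minutes) * price_per_cycle


print(billed_cycles(61))    # 2 cycles, i.e. 120 minutes billed
print(lease_cost(61, 0.5))  # hypothetical price of 0.5 per cycle -> 1.0
```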

The execution time of a task Tj, j ∈ {1, 2, 3, …, J}, on a resource VMl is calculated using equation (1), which considers the size of Tj in million instructions (MI) and the processing capacity of VMl adjusted by its performance variation factor. Since a workflow involves task dependencies, the data transfer time from task Tj to task Tk is calculated using equation (2), where dj is the amount of output data generated by Tj, which is assumed to be known in advance for each task, and bw is the bandwidth between VMs. The data transfer time between two tasks scheduled on the same resource is zero. Hence, the total processing time of task Tj on resource VMl is calculated using equation (3), where e is the number of edges leaving the parent task Tj, and τjk = 0 if Tj and Tk are scheduled on the same VM and 1 otherwise:

$$ET(T_j, VM_l) = \frac{Size(T_j)}{P'_{VM_l}} \quad (1)$$

$$TT(T_j, T_k) = \frac{d_j}{bw} \quad (2)$$

$$PT(T_j, VM_l) = ET(T_j, VM_l) + \sum_{k=1}^{e} \tau_{jk} \, TT(T_j, T_k) \quad (3)$$

where $P'_{VM_l}$ is the processing capacity $P_{VM_l}$ of VMl after adjustment by its performance variation factor.
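The execution, transfer, and processing times defined above can be checked with a small Python sketch. The text does not pin down two details, so the sketch assumes them. First, the performance variation factor is modeled as a fraction that reduces the nominal MIPS capacity multiplicatively. Second, all sizes and rates in the example (MI, MIPS, MB, MB/s) are hypothetical. The function names are illustrative.

```python
def execution_time(size_mi, vm_mips, perf_variation=0.0):
    """Task size divided by the VM's effective capacity.
    ASSUMPTION: variation scales capacity as vm_mips * (1 - perf_variation)."""
    effective_mips = vm_mips * (1.0 - perf_variation)
    return size_mi / effective_mips


def transfer_time(output_data, bandwidth):
    """Output data of the parent task divided by the bandwidth."""
    return output_data / bandwidth


def processing_time(task, vm_of, size_mi, out_data, children,
                    vm_mips, vm_variation, bandwidth):
    """Execution time plus the transfer time to each child placed on a
    different VM (tau = 0 on the same VM, 1 otherwise)."""
    vm = vm_of[task]
    et = execution_time(size_mi[task], vm_mips[vm], vm_variation[vm])
    tt = sum(transfer_time(out_data[task], bandwidth)
             for child in children.get(task, [])
             if vm_of[child] != vm)
    return et + tt


# Hypothetical example: T1 runs on VM "a"; child T2 shares the VM, T3 does not.
vm_of = {"T1": "a", "T2": "a", "T3": "b"}
pt = processing_time("T1", vm_of,
                     size_mi={"T1": 1000},
                     out_data={"T1": 100},
                     children={"T1": ["T2", "T3"]},
                     vm_mips={"a": 500, "b": 250},
                     vm_variation={"a": 0.0, "b": 0.0},
                     bandwidth=50)
print(pt)  # 4.0: 2.0 s of execution plus 2.0 s of transfer to T3 only
```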

The multiple workflow scheduling problem can be formulated similarly. Consider the diagram shown in Figure 3. It consists of three small workflows, which are combined into a single large workflow by adding Dummystart and Dummyend as the starting and ending tasks, respectively. The execution time of both of these tasks is zero, as they are included only to merge the smaller workflows.
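The merging step can be sketched in Python as follows. Task names are prefixed with their workflow index to keep them unique. Every entry task (no parent) is linked from Dummystart, every exit task (no child) is linked to Dummyend, and both dummy tasks are given zero execution time. The function name and the `W{i}.` naming scheme are illustrative assumptions, not part of the original formulation.

```python
def merge_workflows(workflows):
    """Merge (tasks, edges) workflows into one DAG with dummy entry/exit tasks."""
    tasks, edges = ["Dummystart"], []
    for i, (w_tasks, w_edges) in enumerate(workflows, start=1):
        rename = {t: f"W{i}.{t}" for t in w_tasks}  # keep task names unique
        tasks.extend(rename[t] for t in w_tasks)
        edges.extend((rename[p], rename[c]) for p, c in w_edges)
        has_parent = {c for _, c in w_edges}
        has_child = {p for p, _ in w_edges}
        # Entry tasks hang off Dummystart; exit tasks feed into Dummyend.
        edges.extend(("Dummystart", rename[t]) for t in w_tasks if t not in has_parent)
        edges.extend((rename[t], "Dummyend") for t in w_tasks if t not in has_child)
    tasks.append("Dummyend")
    return tasks, edges


# Dummy tasks only join the workflows, so they take no execution time.
DUMMY_EXEC_TIME = {"Dummystart": 0.0, "Dummyend": 0.0}

# Two tiny hypothetical workflows: A -> B, and a single task C.
merged_tasks, merged_edges = merge_workflows([(["A", "B"], [("A", "B")]),
                                              (["C"], [])])
print(merged_tasks)
print(merged_edges)
```

Because Dummystart is the only task without a parent and Dummyend the only task without a child, the merged graph can be scheduled exactly like a single workflow.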

Optimal task scheduling and resource provisioning can be driven by various objectives. This work focuses on two scheduling objectives, makespan and cost, by finding a schedule S = (Res, Map, Zct, Zms) for scheduling workflows in the cloud computing environment. Here, Res = {r1, r2, …, rc} is the set of available resources, and Map is the task-to-resource mapping, given for each task Tj in the workflow as a tuple (Tj, rk, STj, FTj), which means that Tj is scheduled on resource rk, begins execution at start time STj, and finishes execution at end time FTj. Zct and Zms are the total execution cost and the total execution time (makespan), calculated with equations (4) and (5), respectively:

$$Z_{ct} = \sum_{k=1}^{c} C_{r_k} \cdot \left\lceil \frac{LET_{r_k} - LST_{r_k}}{\alpha} \right\rceil \quad (4)$$

$$Z_{ms} = \max_{T_j \in T} \{ FT_j \} \quad (5)$$

where α denotes one cycle of the time unit for which the VM is charged, $C_{r_k}$ is the cost per cycle of resource rk, $LET_{r_k}$ is the lease end time of rk, and $LST_{r_k}$ is its lease start time. A sample schedule generated for the workflow shown in Figure 2 is depicted in Figure 4.
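The total cost and makespan can be evaluated directly from the task mapping. The Python sketch below makes two assumptions. Each resource's lease runs from the earliest start to the latest finish of the tasks mapped to it, ignoring VM boot time. The per-cycle prices are hypothetical. α defaults to 60 time units to match the hourly cycle, and the function name is illustrative.

```python
import math


def schedule_cost_and_makespan(mapping, price_per_cycle, alpha=60.0):
    """mapping: task -> (resource, start, finish).
    Returns (Z_ct, Z_ms): total lease cost and makespan.
    ASSUMPTION: a lease spans the first start to the last finish on a resource."""
    lease = {}
    for resource, start, finish in mapping.values():
        lst, let = lease.get(resource, (start, finish))
        lease[resource] = (min(lst, start), max(let, finish))
    # Every started cycle of alpha time units is charged in full.
    z_ct = sum(price_per_cycle[r] * math.ceil((let - lst) / alpha)
               for r, (lst, let) in lease.items())
    # The makespan is the latest finish time of any task.
    z_ms = max(finish for _, _, finish in mapping.values())
    return z_ct, z_ms


mapping = {"T1": ("r1", 0, 30), "T2": ("r1", 30, 70), "T3": ("r2", 70, 100)}
prices = {"r1": 2, "r2": 3}  # hypothetical cost per cycle
print(schedule_cost_and_makespan(mapping, prices))  # (7, 100)
```

Here r1 is leased for 70 time units and is therefore billed for two cycles, while r2 is leased for 30 and billed for one. This illustrates why packing tasks onto fewer, longer leases can lower Zct without changing Zms.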

Hence, based on the definitions given so far, this paper aims to find an optimal schedule S that minimizes both the total execution cost and the total execution time.

4. Fruit Fly Optimization

The Fruit Fly Optimization Algorithm [7] has been widely adopted for solving global optimization problems because of its simple structure and small number of parameters. The algorithm is inspired by the food-finding behavior of fruit flies, which can smell food even from a distance of 40 km. Once they get closer to the food source, they use their sensitive vision to fly toward it.

FFO works in the following phases:
(1) Initialization: the fruit flies are randomly distributed in the search space, and the swarm location (X_init, Y_init) is initialized.
(2) Each fruit fly is given a random direction and distance (X_init + RandomValue, Y_init + RandomValue) to move toward the food source.
(3) The distance between each fruit fly and the food location is estimated, and the smell concentration is calculated as the reciprocal of this distance.
(4) The algorithm then enters the fitness evaluation phase, in which a fitness function is evaluated on the smell concentration.
(5) The fruit fly with the best (maximum) smell concentration is retained, and the swarm moves in that direction.
(6) These steps are repeated iteratively, and the result of each iteration is compared with the previous one to check whether an improved result has been obtained.
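The six phases can be illustrated with the commonly used two-dimensional formulation of FFO, in which each fly's distance is measured from the origin and its reciprocal serves as the smell-concentration judgment value. The Python sketch below minimizes a toy fitness function. The function name, the step size, and the toy target value 0.5 are illustrative assumptions rather than settings from this paper.

```python
import math
import random


def ffo_minimize(fitness, n_flies=20, iterations=40, step=1.0, seed=0):
    """Basic fruit fly optimization; returns the best fitness and its history."""
    rng = random.Random(seed)
    # Phase 1: random initial swarm location.
    x_axis, y_axis = rng.uniform(-1, 1), rng.uniform(-1, 1)
    best_smell, history = math.inf, []
    for _ in range(iterations):
        flies = []
        for _ in range(n_flies):
            # Phase 2: random direction and distance around the swarm location.
            x = x_axis + step * rng.uniform(-1, 1)
            y = y_axis + step * rng.uniform(-1, 1)
            # Phase 3: distance (from the origin) and its reciprocal.
            dist = max(math.hypot(x, y), 1e-12)
            s = 1.0 / dist
            # Phase 4: fitness evaluated on the smell-concentration value.
            flies.append((fitness(s), x, y))
        # Phase 5: keep the best fly (lowest fitness when minimizing); the
        # swarm flies to its position only if it improves on the best so far.
        smell, bx, by = min(flies)
        if smell < best_smell:
            best_smell, x_axis, y_axis = smell, bx, by
        # Phase 6: record this iteration's best for comparison with the last.
        history.append(best_smell)
    return best_smell, history


best, history = ffo_minimize(lambda s: (s - 0.5) ** 2)
print(best, history[:3])
```

Because the swarm moves only on improvement, the recorded best value never gets worse. The same greedy acceptance is also what can leave the swarm stuck in a local optimum.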

The FFO algorithm became popular because of its easy-to-implement structure and quick convergence. However, it is less suitable for complex optimization problems, as it can get trapped in local optima at later evolution stages and may fail to reach the global optimum. In addition, its convergence rate on complex optimization problems leaves room for improvement.

Hence, this paper presents an enhanced version of the traditional FFO algorithm for complex optimization problems such as scheduling multiple workflows in the cloud computing environment.

4.1. Proposed Framework

Figure 5 presents the proposed framework for IFFO. Consider a cloud provider with a set of virtual machines VMl = {VM1, VM2, VM3, …, VML}, l ∈ {1, 2, 3, …, L}, each having some computational capacity. In a cloud computing scenario, there are multiple workflows Wi = {W1, W2, W3, …, WI}, i ∈ {1, 2, 3, …, I}, that need to be scheduled with optimized QoS parameters. To this end, the workflows Wi are merged into a unified workflow, with dummy tasks added at its starting and ending positions. Each workflow Wi may contain several tasks Tj = {T1, T2, T3, …, TJ}, j ∈ {1, 2, 3, …, J}.

Each resource VMl is available on demand, is accessible from a shared pool of computing resources, and has QoS parameters associated with it. In this work, makespan and cost are considered as the QoS parameters. The cost of running each VM is given by VMCm = {VMC1, VMC2, VMC3, …, VMCM}, m ∈ {1, 2, 3, …, M}, and its makespan by VMMs = {VMM1, VMM2, VMM3, …, VMMS}, s ∈ {1, 2, 3, …, S}.

The total cost and makespan are calculated by aggregating the cost and makespan of the individual tasks, and the objective of the problem is to minimize both for executing the entire workflow. Because this is a multiobjective optimization problem, a Pareto optimal solution is sought in each iteration. Once the cost and makespan have been evaluated for every task Tj, the IFFO algorithm is applied to the input values to find the best smell value in each iteration. Based on the updated smell values, the entire swarm population updates its smell concentration and becomes ready for the next iteration. When the maximum number of iterations is reached, or a Pareto optimal solution is achieved, all the tasks are submitted for simulation.
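A Pareto optimal (rank-1) solution is one that no other candidate dominates on both cost and makespan. A minimal Python sketch of that non-dominance filter follows. The candidate (cost, makespan) pairs are hypothetical, and the function names are illustrative.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in at
    least one (both objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(solutions):
    """Rank-1 front: the solutions not dominated by any other solution."""
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]


# Hypothetical (cost, makespan) pairs for five candidate schedules.
candidates = [(10, 50), (12, 40), (11, 55), (15, 38), (13, 45)]
print(pareto_front(candidates))  # [(10, 50), (12, 40), (15, 38)]
```

Each surviving schedule trades cost against makespan. For example, (12, 40) is more expensive than (10, 50) but finishes earlier, so neither dominates the other.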

4.2. Proposed IFFO Algorithm

The proposed IFFO algorithm is an enhanced version of the traditional Fruit Fly Optimization Algorithm. It optimizes multiple objectives, namely cost and makespan, for multiple workflow scheduling in the cloud environment. The algorithm iteratively refines the current solution using the smell concentration function. This paper also shows an improvement in the convergence rate, achieved by updating the particles to appropriate positions in each iteration.

The main steps of the proposed method are as follows:
(1) Input constraints: let n be the swarm size of the fruit fly population, and let the initial position of each fruit fly be SPloc p = {SPloc 1, SPloc 2, SPloc 3, …, SPloc P}, p ∈ {1, 2, 3, …, P}. Each swarm particle represents a possible solution and moves in a random direction over a randomized distance Rv. Following traditional FFO, the maximum number of iterations of fruit fly movement is taken from the range Q = {20–40}.
(2) Output constraints: the target is to find the rank-1 Pareto optimal solution for the given input values. Here, f(P) = {S1, S2, S3, …, SN}, N ∈ {1, 2, 3, …, K}, is a set of solutions with lower bound LB and upper bound UB, and ZCtMs is the total cost and makespan of the workflows, which needs to be minimized.
(3) Define the objective function f(obj) in terms of the scaling factor Sf, a randomized function R, and arbitrary constants.
(4) Use the maximum number of iterations Imax as the terminating condition, with the iteration counter t running from 1 to T.
(5) Calculate the position of each swarm particle from its initial position SPloc p and a random variable ranging from 0 to 1.
(6) Calculate the distance Dist between each individual fruit fly and the food, and its smell concentration as the reciprocal of that distance, S = 1/Dist.
(7) Calculate SSq, q ∈ {1, 2, 3, …, n}, that is, the smell concentration of each individual fruit fly.
(8) Find the mean smell concentration SSmean = (SS1 + SS2 + … + SSn)/n.
(9) Update the positions of the swarm particles with the updated values obtained from the mean smell concentration, and go to step 3.

The algorithmic representation of these steps is given below (Algorithm 1).

(1) Input: SSn, SPloc p = {SPloc 1, SPloc 2, SPloc 3, …, SPloc P}, p ∈ {1, 2, 3, …, P}, and Imax = {20–40}
  // SSn = swarm size, SPloc = initial location of the individual swarm particles, and Imax = maximum number of iterations
(2) Output: Pareto optimal solution Outmin and the minimized ZCtMs
  // S1, …, SN are existing solutions, ZCtMs is the total cost and makespan of the multiple workflows, and Outmin is the expected QoS-optimized solution
(3) Define the objective function f(obj)
  // sf is the scaling factor, and R is a randomized function
(4) for Imax (t) ← 1 to T do
(5)   Compute the position of each swarm particle from SPloc p and a random variable in (0, 1)
(6)   Dist ← distance between each individual fruit fly and the food; S ← 1/Dist
      // Dist is the distance between an individual fruit fly and the food, and S is its smell concentration
(7)   Compute SSq for each individual fruit fly
(8)   SSmean ← (SS1 + SS2 + … + SSn)/n
(9)   Update the swarm particles' locations
  9.1–9.2. Generate new particle positions using SSmean
  9.3. Go to step 3.
(10) end for
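Steps 5 to 9 can be sketched in Python as below. Steps 6 to 8 follow the text: distance to the food, smell concentration as its reciprocal, and the mean smell concentration. The exact position-update expressions of step 9 cannot be recovered from the text, so the update here is a hypothetical stand-in. The swarm centre moves to the centroid of the flies whose smell concentration is at or above the mean, and all flies are then re-scattered around it. The known food location is a toy assumption that makes the distance computable, and all names and parameters are illustrative.

```python
import math
import random


def smell_step(positions, food):
    """Steps 6-8: distance to the food, smell = 1/distance, and the mean smell."""
    smells = [1.0 / max(math.dist(p, food), 1e-12) for p in positions]
    return smells, sum(smells) / len(smells)


def mean_smell_update(positions, smells, mean_smell, rng, radius=0.5):
    """HYPOTHETICAL step 9: move the swarm centre to the centroid of the flies
    whose smell is at or above the mean, then re-scatter flies around it."""
    elite = [p for p, s in zip(positions, smells) if s >= mean_smell]  # never empty
    centre = tuple(sum(coords) / len(elite) for coords in zip(*elite))
    new_positions = [tuple(c + rng.uniform(-radius, radius) for c in centre)
                     for _ in positions]
    return new_positions, centre


def iffo_sketch(food, n_flies=20, iterations=30, seed=1):
    rng = random.Random(seed)
    # Step 5: initial positions drawn from random variables in [0, 1).
    positions = [(rng.random(), rng.random()) for _ in range(n_flies)]
    history, centre = [], None
    for _ in range(iterations):
        smells, mean_smell = smell_step(positions, food)
        history.append(max(smells))  # best smell concentration this iteration
        positions, centre = mean_smell_update(positions, smells, mean_smell, rng)
    return centre, history


centre, history = iffo_sketch(food=(3.0, 4.0))
print(centre, history[-1])
```

Averaging over all above-mean flies, rather than jumping to the single best fly as in basic FFO, is one way to use the mean smell concentration to soften greedy moves. Whether it matches the authors' update rule is not established by the text.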

5. Results and Discussion

5.1. Dataset and Simulation Setup

The experimental analysis was conducted using the CloudSim framework [21], a toolkit for simulating cloud computing environments. The proposed algorithm was evaluated on three different datasets, and the results were compared with those of three other metaheuristic optimization techniques: FFO, GA, and PSO. The datasets differ in the number of tasks in a workflow and the number of available resources. Although the cloud is considered to offer an unlimited set of resources, the number of resources must be limited to arrive at an optimal solution. We consider three sample workflows consisting of 15, 25, and 35 tasks, with 5, 10, and 15 available resources, respectively.

5.2. Performance Analysis

The proposed IFFO algorithm is compared with PSO, GA, and FFO on two scheduling objectives, makespan and cost. The algorithms were executed for 20 iterations, and the results show that IFFO performs better than the other algorithms in terms of both makespan and cost. The experimental results for datasets 1, 2, and 3 are presented in Figures 6–8, where the blue line represents the cost of execution and the orange line represents the makespan. These graphs show that IFFO outperforms PSO, GA, and FFO on both parameters.

The percentage improvement of the proposed algorithm is depicted in Figure 9. For dataset 1, IFFO is 8.54%, 17.55%, and 10.46% better than PSO, GA, and FFO, respectively, in terms of cost and 6.6%, 2.49%, and 1.03% better in terms of makespan. For dataset 2, the improvement is 9.21%, 17.8%, and 11.38% in terms of cost and 7.25%, 9.4%, and 8.91% in terms of makespan when compared with PSO, GA, and FFO, respectively. Similarly, for dataset 3, IFFO showed an improvement of 11.24%, 19.34%, and 14.98% in terms of cost and 9.61%, 13.68%, and 19.35% in terms of makespan when compared with PSO, GA, and FFO, respectively.
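The text does not state how these percentages are computed. A common definition for minimized metrics such as cost and makespan is the relative reduction with respect to the baseline, sketched below in Python. The baseline and IFFO values in the example are hypothetical, not the measured results, and the function name is illustrative.

```python
def improvement_pct(baseline: float, proposed: float) -> float:
    """Relative reduction (in percent) of a minimized metric versus a baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - proposed) * 100.0 / baseline


# Hypothetical costs: a baseline schedule costing 200 units vs. IFFO at 180.
print(improvement_pct(200.0, 180.0))  # 10.0
```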

The proposed algorithm optimizes both parameters simultaneously, unlike many other optimization algorithms in which the client has to compromise on one objective while optimizing the other. In such cases, a decision has to be made about which objective takes precedence.

6. Conclusion

Scientific workflows play a significant role in large-scale cloud-based applications. In workflow scheduling, nature-inspired algorithms have produced promising results for multiobjective problems in the cloud environment. However, to avoid getting trapped in local optima in multiobjective optimization, traditional nature-inspired techniques must maintain a balance between exploration and exploitation, which they often struggle to do. In this paper, multiple workflows are considered and merged with dummy start and end nodes to represent them as a single monolithic workflow. The proposed IFFO enhances the traditional FFO algorithm to mitigate the problem of getting stuck in local optima by using an enhanced swarm smell function, in which the mean smell concentration is used to generate new positions for the swarm particles. IFFO is used for scheduling multiple workflows to minimize cost and makespan while providing a Pareto optimal solution. The proposed algorithm is implemented on the CloudSim platform, and the results for dataset 1 show that IFFO is better than PSO, GA, and FFO by 15.14%, 20.04%, and 11.47%, respectively, in terms of cost and makespan combined. Similarly, for dataset 2, the proposed algorithm shows 16.46%, 27.2%, and 20.29% improvement, and for dataset 3, the improvement is 20.85%, 33.02%, and 34.33% compared with PSO, GA, and FFO, respectively.

Future work will apply the proposed IFFO technique to more QoS parameters, such as energy efficiency and load balancing, to enhance overall system performance. IFFO can also be applied in various state-of-the-art research areas, such as sensor networks, IoT, decision-making systems, smart agriculture, and ecological engineering problems.

Data Availability

All data are included within this manuscript.

Conflicts of Interest

The authors declare that they have no conflicts of interest to report regarding the present study.

Acknowledgments

The authors would like to acknowledge the support from Taif University Researchers Supporting Project (no. TURSP-2020/216), Taif University, Taif, Saudi Arabia.