Abstract
Cloud computing platforms have been extensively using scientific workflows to execute large-scale applications. However, multi-objective scheduling of scientific workflows to optimize QoS parameters is a challenging task. Various metaheuristic scheduling techniques have been proposed to satisfy QoS parameters such as makespan, cost, and resource utilization. Still, traditional metaheuristic approaches struggle to maintain a satisfactory balance between exploration and exploitation of the search space: they tend to get trapped in local optima at later evolution stages and cope poorly with high-dimensional nonlinear optimization problems. This paper proposes an Improved Fruit Fly Optimization (IFFO) algorithm to minimize makespan and cost for scheduling multiple workflows in the cloud computing environment. The proposed algorithm is evaluated using CloudSim for scheduling multiple workflows. The comparative results show that IFFO outperforms FFO, PSO, and GA.
1. Introduction
The cloud is a seemingly infinite pool of configurable computing resources (storage, network, processor, bandwidth, etc.) with features such as an on-demand pay-per-use model, high availability, scalability, and reliability [1, 2]. It also supports a distributed architecture for geographically distributed heterogeneous resources and provisions them to clients through virtualization for hosting large-scale applications. These applications are deployed in the form of workflows, which are further divided into smaller tasks.
Due to continuously increasing workloads and their growing complexity, workflow scheduling has become a widely studied cloud computing problem that attracts many researchers. As depicted in Figure 1, workflow scheduling allocates the required resources to the appropriate tasks to complete the execution process. While tasks are being scheduled on virtual machines (VMs), the client’s QoS constraints must be fulfilled. Different clients may have different QoS requests in terms of cost, time, security, and so forth.
Multiple workflow scheduling comes into consideration to maximize the cloud architecture’s throughput when various client requests are received simultaneously. Similar tasks can be identified to be allocated on a similar set of resources [3]. Strategies must be adopted to enhance the system’s performance and ensure that all client requests are completed before the deadline.
An efficient scheduling technique maintains a trade-off between user requirements and resource utilization [4]. Maintaining this trade-off becomes challenging when some tasks have a parent-child relationship, where a child task can only begin executing once its parent task has finished and all the output data from the predecessor task has been communicated to the child task [5, 6]. Various list-based algorithms cannot be directly implemented in cloud computing environments because of resource heterogeneity, varied QoS constraints, and the cloud’s dynamic nature [2]. Several metaheuristic algorithms, such as FFO [7], PSO [8], and GA [9, 10], have been explored for solving workflow scheduling problems. However, optimizing multiple objectives is still a challenging task for the cloud computing environment [11–14]. Optimizing one QoS parameter often means compromising on another. Thus, scheduling multiple workflows on the cloud while maintaining a trade-off among multiple QoS parameters remains a problem that needs to be solved.
In this paper, an enhanced metaheuristic optimization technique IFFO has been proposed for scheduling multiple workflows on cloud computing environments to optimize multiple QoS parameters. The results of the proposed technique were generated using CloudSim and compared with existing FFO, PSO, and GA algorithms to validate the performance of IFFO in terms of makespan and cost parameters.
The rest of the paper is organized as follows: Section 1 describes the introductory concepts of cloud computing related to workflow scheduling. A crisp and concise literature survey on workflow scheduling for QoS parameters is discussed in Section 2. Section 3 highlights the problem formulation and problem definition. A novel framework using the IFFO algorithm is proposed in Section 4. The proposed algorithm’s experimental results are given in Section 5, and the conclusion is mentioned in Section 6.
2. Background
Yassa et al. have presented a new multi-objective approach, called DVFS-MODPSO [15], for scheduling workflows on the cloud computing environment. The presented algorithm is a hybridization of PSO with Heterogeneous Earliest Finish Time (HEFT), aiming to optimize multiple objectives such as makespan, cost, and energy consumption. Dynamic Voltage and Frequency Scaling (DVFS) is used for energy optimization, and the results show better Pareto optimal solutions than HEFT.
CGA^{2} [16] is a technique proposed by Liu et al. that includes an adaptive penalty function for scheduling deadline-constrained workflows in the cloud computing environment, addressing the limitations of previously proposed evolutionary algorithms. Unlike several existing static techniques, the proposed algorithm prevents premature convergence by applying adaptive crossover and mutation probabilities, and it generates solutions that are able to meet deadline constraints. CGA^{2} is compared with traditional algorithms such as PSO, HEFT, GA, and Random to demonstrate better performance in terms of meeting deadlines under strict constraints and reducing the overall workflow execution cost.
HSGA [17] is a GA-based hybrid workflow scheduling technique proposed by Delavar et al., which utilizes the optimization characteristics of the Round Robin (RR) and Best Fit (BF) scheduling algorithms. The technique first ranks tasks by priority based on their dependencies and then allocates resources by applying RR and BF for appropriate VM selection. The experimental results showed better performance of HSGA in terms of reducing makespan, lowering failure rate, and balancing the load when compared with the LAGA and NGA scheduling algorithms.
CDMWS [18] is a dynamic optimization technique for scheduling multiple workflows on the cloud, proposed by Delavar et al., which aims at improving CPU utilization, reducing makespan, and improving the makespan-deadline meeting ratio. The proposed technique is also divided into two stages. The first stage is responsible for estimating the execution time for each task by considering the workflow deadline and task dependencies. The second stage is responsible for dynamic VM allocation, where VMs can be reused for tasks having similar requirements. VM reusability is implemented to lower power consumption and increase resource utilization. CDMWS is compared with two other algorithms, EWSA and RR, to verify its superiority.
Another list-based heuristic, MOWS [19], introduced by F. Abazari et al., adopts a greedy approach for prioritizing tasks and allocating appropriate resources to them. The proposed technique aims at improving the security of the overall cloud architecture while maintaining the execution time of workflows. The first phase of the algorithm designs the solution based on task prioritization and assesses security risk. The second phase proposes an algorithm to deal with various security threats while scheduling a workflow on the cloud.
MOGA [20], a GA-based multi-objective workflow scheduling algorithm presented by Attiqa Rehman et al., aims at optimizing a diverse range of objectives, including makespan, budget, resource utilization, deadline, and energy efficiency. A gap search algorithm was also introduced in this work; it finds gaps in the schedule generated for a particular workflow and fills them with independent tasks to maximize resource utilization. MOGA was compared with three GA-based algorithms and one PSO-based algorithm (MOPSO) to validate its superiority. Table 1 shows the summary of the related works.
3. Problem Formulation
Any large-scale application that needs to be deployed on a cloud platform is generally represented in the form of a workflow W = (T, E). A workflow can be pictorially represented using a directed acyclic graph (DAG), where T = {T_{1}, T_{2}, …, T_{n}} denotes the set of tasks. The complete application is divided into several dependent and independent subtasks. The tasks in a workflow are arranged at different levels with a parent-child relationship, where a child task cannot begin execution until all its parent tasks have finished execution and all output data has been transferred to the child task. An edge E_{ij} from T_{i} to T_{j} represents that T_{i} is the parent of T_{j}, that is, a dependency between T_{i} and T_{j}.
Consider the sample workflow depicted in Figure 2. The entire application is divided into seven subtasks T_{1}, T_{2}, …, T_{7} falling at five different levels, that is, Level 0 to Level 4. At Level 2, tasks T_{3} and T_{4} are independent of each other since they are at the same level, so they can be executed concurrently on different resources. However, their execution can only begin once T_{2} at Level 1 has finished its execution and transferred all output data to T_{3} and T_{4}. Similarly, the execution of T_{2} depends on its predecessor task, T_{1}. Since T_{1} has no predecessor, it will be the first task to be executed. Workflow execution can be considered complete once the last task in the DAG, T_{7}, has finished its execution and generated its output.
Resource heterogeneity, as offered by any IaaS cloud provider, is also considered. Different types of VMs are available based on different configurations. Any cloud service provider who joins the cloud marketplace provides a two-dimensional bid B_{VMi} = (P_{VMi}, C_{VMi}), where P_{VMi} represents the processing capacity of the VM measured in MIPS and C_{VMi} is the cost of execution on that VM. The pricing model is based on the current Amazon EC2 standards, where a full cycle consists of 60 minutes, and one extra minute counts as one complete cycle. So, if a resource is consumed for 61 minutes, the user will be charged for two complete cycles, that is, 120 minutes.
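The level structure described above can be sketched in a few lines of Python. This is only an illustration: the edges T_{1} → T_{2} → {T_{3}, T_{4}} come from the text, while the edges among T_{5}, T_{6}, and T_{7} and the helper name `task_levels` are assumptions, since Figure 2 is not reproduced here.

```python
from collections import defaultdict

def task_levels(edges, tasks):
    """Return {task: level}; a task's level is the longest path from an entry task."""
    parents = defaultdict(list)
    for parent, child in edges:
        parents[child].append(parent)
    levels = {}

    def level(task):
        # Entry tasks have no parents and land on level 0.
        if task not in levels:
            levels[task] = 1 + max((level(p) for p in parents[task]), default=-1)
        return levels[task]

    for t in tasks:
        level(t)
    return levels

tasks = ["T1", "T2", "T3", "T4", "T5", "T6", "T7"]
# T1 -> T2 -> {T3, T4} follows the text; the remaining edges are assumed.
edges = [("T1", "T2"), ("T2", "T3"), ("T2", "T4"),
         ("T3", "T5"), ("T4", "T6"), ("T5", "T7"), ("T6", "T7")]
levels = task_levels(edges, tasks)
```

Tasks that share a level have no dependency on each other along the longest path, which is why T_{3} and T_{4} can run concurrently on different resources.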
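The billing rule above amounts to rounding the lease duration up to whole 60-minute cycles. A minimal sketch follows; the function names and the price value are hypothetical.

```python
import math

CYCLE_MINUTES = 60  # one billing cycle, as stated in the text

def billed_cycles(minutes_used):
    """Number of full cycles charged; any started cycle counts as a whole one."""
    return math.ceil(minutes_used / CYCLE_MINUTES)

def lease_cost(minutes_used, price_per_cycle):
    """Cost of a lease given a (hypothetical) price per 60-minute cycle."""
    return billed_cycles(minutes_used) * price_per_cycle

cycles_61 = billed_cycles(61)              # the 61-minute case from the text
charged_minutes = cycles_61 * CYCLE_MINUTES
```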
The execution time of a task T_{j} on a resource VM_{l}, where T_{j} = {T_{1}, T_{2}, T_{3}, …, T_{J}}, j ∈ {1, 2, 3, …, J}, is calculated using equation (1) by considering the size of task T_{j} in million instructions (MI) and the performance variation factor of VM_{l}, which is introduced to adjust the processing capacity of a VM. Since a workflow involves task dependencies, the data transfer time from task T_{j} to T_{k} is calculated using equation (2) from the amount of output data generated by T_{j}, which is assumed to be known in advance for each task, and bw, the bandwidth between each pair of VMs. The data transfer time between two tasks scheduled on the same resource is zero. Hence, the total processing time of each task T_{j} on a resource VM_{l} is calculated using equation (3), where e is the number of edges connected to a parent task T_{j}, and the indicator attached to each edge is 0 if T_{j} and T_{k} are scheduled on the same VM and 1 otherwise.
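The displayed forms of equations (1) to (3) are not reproduced in this text. One reconstruction consistent with the definitions above is sketched below. The symbol names (ET, TT, PT, Size, PerVar, Data^{out}, s_{k}) are assumptions chosen for illustration, and the (1 − PerVar) form of the capacity adjustment is an assumed convention rather than something this text confirms.

```latex
\begin{align}
ET_{T_j}^{VM_l} &= \frac{Size_{T_j}}{P_{VM_l}\,\bigl(1 - PerVar_{VM_l}\bigr)} \tag{1}\\
TT_{jk} &= \frac{Data_{T_j}^{out}}{bw} \tag{2}\\
PT_{T_j}^{VM_l} &= ET_{T_j}^{VM_l} + \sum_{k=1}^{e} TT_{jk}\, s_k,
\qquad
s_k = \begin{cases}
0 & \text{if } T_j \text{ and } T_k \text{ run on the same VM}\\
1 & \text{otherwise}
\end{cases} \tag{3}
\end{align}
```

Under this reading, a task whose children all run on the same VM pays no transfer time, so its processing time reduces to its execution time.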
Similarly, multiple workflow scheduling problems can also be formulated. Consider the diagram shown in Figure 3. It consists of three small workflows, all of which can be combined to form a single large workflow by adding Dummy_{start} and Dummy_{end} as the starting and ending tasks, respectively. The execution time of both of these tasks will be zero as they are included only for merging the smaller workflows.
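The merging step can be illustrated with a short Python sketch. The adjacency-dict representation, the task names, and the function name `merge_workflows` are assumptions made for this example; only the dummy start and end tasks come from the text.

```python
def merge_workflows(workflows):
    """Merge several DAGs into one by adding zero-cost dummy entry and exit tasks.

    Each workflow is a dict {task: [children]} listing every task as a key;
    task names are assumed to be unique across workflows.
    """
    merged = {"Dummy_start": [], "Dummy_end": []}
    for wf in workflows:
        has_parent = {c for children in wf.values() for c in children}
        for task, children in wf.items():
            merged[task] = list(children)
            if task not in has_parent:      # entry task of this workflow
                merged["Dummy_start"].append(task)
            if not children:                # exit task of this workflow
                merged[task].append("Dummy_end")
    return merged

# Two hypothetical small workflows.
w1 = {"A1": ["A2"], "A2": []}
w2 = {"B1": ["B2", "B3"], "B2": [], "B3": []}
merged = merge_workflows([w1, w2])
```

Because the dummy tasks take zero execution time, the makespan of the merged DAG equals the time at which the last real task finishes.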
Optimal task scheduling and resource provisioning can be done based on various objectives. This work focuses on optimizing two scheduling objectives, makespan and cost, by finding a schedule S = (Res, Map, Z_{ct}, Z_{ms}) for scheduling a workflow on the cloud computing environment, where Res = {r_{1}, r_{2}, …, r_{c}} is the set of available resources and Map gives the task-to-resource mapping: for each task in the workflow, it records the resource on which the task is scheduled, the start time at which it begins execution, and the end time at which it finishes. Z_{ct} and Z_{ms} represent the total execution cost and the total execution time, respectively; in the equations that define them, α denotes one cycle of the time unit for which a VM is charged, and each resource has a lease start time and a lease end time. A sample schedule generated for the workflow shown in Figure 2 is depicted in Figure 4.
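The two objectives can be computed from a schedule as sketched below. The displayed equations are not reproduced in this text, so the forms used here are assumptions: Z_{ct} is taken as the sum over resources of the per-cycle price times the number of started cycles of length α in the lease interval, and Z_{ms} as the latest task end time. The tuple layouts and sample values are hypothetical.

```python
import math

def total_cost(leases, alpha=60):
    """Z_ct under the assumed form: sum of price * ceil((lease_end - lease_start) / alpha).

    leases: list of (price_per_cycle, lease_start, lease_end), times in minutes.
    """
    return sum(price * math.ceil((end - start) / alpha)
               for price, start, end in leases)

def makespan(mapping):
    """Z_ms under the assumed form: the latest end time over all tasks.

    mapping: list of (task, resource, start_time, end_time).
    """
    return max(end for _task, _res, _start, end in mapping)

# Hypothetical schedule on two resources.
mapping = [("T1", "r1", 0, 30), ("T2", "r1", 30, 61), ("T3", "r2", 0, 30)]
leases = [(1.0, 0, 61), (2.0, 0, 30)]
z_ct = total_cost(leases)
z_ms = makespan(mapping)
```

In this example the lease on r1 lasts 61 minutes and is therefore charged for two cycles, which illustrates why shortening the makespan by a minute can also lower the cost.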
Hence, this paper works on finding an optimal schedule S to minimize total execution cost and total execution time based on the definitions given so far.
4. Fruit Fly Optimization
The Fruit Fly Optimization Algorithm [7] has been widely adopted for solving global optimization problems because of its simple structure and small number of parameters. The algorithm is inspired by the food-finding ability of fruit flies, which can smell food even at a distance of 40 km. Once they get closer to the food source, they use their sensitive vision to fly towards it.
FFO works in various phases as follows:
(1) The first phase is the initialization phase, where the fruit flies are randomly distributed in the search space, and their location (X_init, Y_init) is initialized
(2) In the second phase, each fruit fly is given some random direction and distance (X_init + RandomValue, Y_init + RandomValue) to move towards the food source
(3) Next, the distance between each fruit fly and the food location is estimated, and the smell concentration is calculated, which is the reciprocal of the distance
(4) The algorithm then goes into the fitness evaluation phase, which is a function based on the smell concentration
(5) The maximum smell concentration of the individual fruit fly is retained, and the swarm updates its position to move in that direction
(6) These steps are iteratively repeated, and the result of each iteration is compared with the previous one to check whether optimized results are obtained or not
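The six phases can be sketched as a short Python loop. This is a minimal illustration of basic FFO, not the implementation used in this paper: the distance is measured to the origin (a common convention when the food location is unknown), and the parameter names, step size, and toy fitness function are assumptions.

```python
import math
import random

def ffo(fitness, swarm_size=20, iterations=40, step=1.0, seed=0):
    """Basic FFO loop; returns (best_s, best_smell), maximising fitness(s)."""
    rng = random.Random(seed)
    # Phase 1: random initial swarm location.
    x_axis, y_axis = rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0)
    best_s, best_smell = None, -math.inf
    for _ in range(iterations):                  # Phase 6: iterate
        flies = []
        for _ in range(swarm_size):
            # Phase 2: random direction and distance around the swarm location.
            x = x_axis + rng.uniform(-step, step)
            y = y_axis + rng.uniform(-step, step)
            # Phase 3: distance (to the origin, by assumption) and its reciprocal.
            dist = max(math.hypot(x, y), 1e-12)
            s = 1.0 / dist
            # Phase 4: fitness (smell) of this fly.
            flies.append((fitness(s), s, x, y))
        # Phase 5: keep the best fly; move the swarm there if it improves.
        smell, s, x, y = max(flies)
        if smell > best_smell:
            best_s, best_smell = s, smell
            x_axis, y_axis = x, y
    return best_s, best_smell

def toy_fitness(s):
    # Hypothetical objective whose maximum (0) lies at s = 2.
    return -(s - 2.0) ** 2

best_s, best_smell = ffo(toy_fitness)
```

Because the swarm only moves when a fly improves on the best smell found so far, the search is greedy, which is one way to see why basic FFO can stall in a local optimum.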
The FFO algorithm became popular because of its easy-to-implement structure and quick convergence. However, it is not well suited to complex optimization problems, as it can get trapped in local optima at later evolution stages and might not reach the global optimum. Also, the convergence rate of the algorithm on complex optimization problems could be improved.
Hence, this paper presents an enhanced version of the traditional FFO algorithm, which could be implemented for complex optimization problems such as scheduling multiple workflows in the cloud computing environment.
4.1. Proposed Framework
Figure 5 presents the proposed framework for IFFO. Consider a cloud provider with a set of virtual machines VM_{l} = {VM_{1}, VM_{2}, VM_{3}, …, VM_{L}}, l ∈ {1, 2, 3, …, L}, each having some computational capacity. In a cloud computing scenario, there are multiple workflows W_{i} = {W_{1}, W_{2}, W_{3}, …, W_{I}}, i ∈ {1, 2, 3, …, I}, that need to be scheduled with optimized QoS parameters. Keeping this in mind, the workflows W_{i} are merged into a unified workflow, and dummy tasks are added at its starting and ending positions. Each workflow W_{i} may contain several tasks T_{j} = {T_{1}, T_{2}, T_{3}, …, T_{J}}, j ∈ {1, 2, 3, …, J}.
Each resource VM_{l} is available on demand, is accessible from a shared pool of computing resources, and has some QoS parameters associated with it. For the present research work, the authors have considered makespan and cost as QoS parameters. The cost of running each VM is VMC_{m} = {VMC_{1}, VMC_{2}, VMC_{3}, …, VMC_{M}}, m ∈ {1, 2, 3, …, M}, and the makespan is VMM_{s} = {VMM_{1}, VMM_{2}, VMM_{3}, …, VMM_{S}}, s ∈ {1, 2, 3, …, S}.
Maximum cost and makespan are calculated by adding the cost and makespan of each task. The objective is to minimize the cost and makespan for executing the entire workflow. This multi-objective optimization problem therefore tries to find the Pareto optimal solution in each iteration. Once the cost and makespan are optimized while T_{j} ≠ T_{J}, the IFFO algorithm is applied to the input values to find the best smell function in each iteration. Based on the updated smell values, the entire swarm population updates its smell concentration and becomes ready for the next iteration. When the maximum number of iterations is completed, or a Pareto optimal solution is achieved, all the tasks are sent for simulation.
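The notion of a Pareto optimal solution used above can be made concrete with a small dominance filter. This is a generic sketch for two minimised objectives (cost, makespan); the candidate values are hypothetical and the function names are not taken from the paper.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(points):
    """Keep the points that no other point dominates (rank-1 solutions)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (cost, makespan) pairs for four candidate schedules.
schedules = [(10.0, 50.0), (12.0, 40.0), (11.0, 55.0), (15.0, 40.0)]
front = pareto_front(schedules)
```

Here two schedules survive: one is cheaper and the other is faster, and neither can be improved in one objective without worsening the other.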
4.2. Proposed IFFO Algorithm
The proposed IFFO algorithm is an enhanced version of the traditional Fruit Fly Optimization Algorithm. It is used to optimize multiple objectives, that is, cost and makespan, for multiple workflow scheduling in the cloud environment. The proposed algorithm continuously improves the previous solution using the smell concentration function. This paper also shows an improvement in convergence rate by updating appropriate positions in each iteration.
The main steps of the proposed method are described as follows.
(1) Input constraints: let n be the swarm size of the fruit fly population; the initial positions of the fruit flies are SP_{loc p} = {SP_{loc 1}, SP_{loc 2}, SP_{loc 3}, …, SP_{loc P}}, p ∈ {1, 2, 3, …, P}. Here, each swarm particle represents a possible solution, moving in a random direction with randomized distance R_{v}. As per traditional FFO, the maximum iteration should be {20–40}; for the current research work, the maximum iteration (distance) of fruit fly movement is Q = {20–40}.
(2) Output constraints: the target is to find the rank-1 Pareto optimal solution for the given input values. Here, f(P) = {S_{1}, S_{2}, S_{3}, …, S_{N}}, N ∈ {1, 2, 3, …, K}, is a set of solutions with lower bound LB and upper bound UB. Z_{CtMs} is the total cost and makespan of the workflows that need to be minimized.
(3) Define the objective function f(obj) using the scaling factor S_{f}, a randomized function R, and arbitrary constants.
(4) For the terminating condition (maximum iteration), I_{max}(t) runs from 1 to T.
(5) Calculate the initial position of each swarm particle from the initial positions SP_{loc p} = {SP_{loc 1}, SP_{loc 2}, SP_{loc 3}, …, SP_{loc P}} and a random variable ranging from 0 to 1.
(6) Calculate the distance between each individual swarm particle and the food, and then the smell concentration.
(7) Calculate SS_{q}, where q = {1, 2, 3, …, n}, that is, the smell concentration of each individual fruit fly.
(8) Find the mean of the smell concentrations.
(9) Update the swarm particles’ positions with the updated values, and go to step 3.
The algorithmic representation of these steps is mentioned below (Algorithm 1).

5. Results and Discussion
5.1. Dataset and Simulation Setup
The experimental analysis was conducted using the CloudSim framework [21], a tool for simulating cloud environments. The proposed algorithm was implemented for three different datasets, and the results were compared with three other metaheuristic optimization techniques: FFO, GA, and PSO. The datasets differ in the number of tasks in a workflow and the number of resources available. Although the cloud environment is considered to have an unlimited set of resources, the number of resources must be limited in order to arrive at an optimal solution. We have considered three sample workflows consisting of 15, 25, and 35 tasks. For these three workflows, the number of resources is assumed to be 5, 10, and 15, respectively.
5.2. Performance Analysis
The proposed IFFO algorithm is compared with PSO, GA, and FFO based on two scheduling objectives, makespan and cost. The algorithms were executed for 20 iterations, and the results show better performance of IFFO as compared with the other algorithms, both in terms of makespan and cost. The experimental results are presented in the graphs shown in Figures 6–8 for datasets 1, 2, and 3. The blue line represents the cost of execution, while the orange line depicts the makespan. It is clear from these graphs that IFFO outperforms PSO, GA, and FFO in both parameters.
The percentage improvement of the proposed algorithm is depicted in Figure 9, which shows that for dataset 1, IFFO is 8.54%, 17.55%, and 10.46% better than PSO, GA, and FFO, respectively, in terms of cost and 6.6%, 2.49%, and 1.03% better than PSO, GA, and FFO, respectively, in terms of makespan. For dataset 2, the improvement is 9.21%, 17.8%, and 11.38% in terms of cost and 7.25%, 9.4%, and 8.91% in terms of makespan when compared with PSO, GA, and FFO, respectively. Similarly, for dataset 3, IFFO showed an improvement of 11.24%, 19.34%, and 14.98% in terms of cost and 9.61%, 13.68%, and 19.35% in terms of makespan when compared with PSO, GA, and FFO, respectively.
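The text does not state how the improvement percentages were computed. A common convention for minimised objectives, assumed here, is the relative reduction with respect to the baseline algorithm; the function name and the sample values below are hypothetical and are not results from this paper.

```python
def improvement_pct(baseline, proposed):
    """Relative reduction of a minimised metric, in percent (assumed convention)."""
    return (baseline - proposed) / baseline * 100.0

# Hypothetical values: a baseline cost of 200 units against 180 units.
example = improvement_pct(200.0, 180.0)
```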
The proposed algorithm is capable of optimizing both the parameters simultaneously, unlike many other optimization algorithms where the client has to compromise with one objective while trying to optimize the other. In such cases, a decision has to be made regarding which objective is to be given preference over the other.
6. Conclusion
Scientific workflows play a significant role in large-scale cloud-based applications. In workflow scheduling, nature-inspired algorithms deliver promising optimized results for multi-objective problems in the cloud environment. However, to avoid being trapped in local optima in multi-objective optimization, traditional nature-inspired techniques continuously try to maintain a balance between exploration and exploitation. In this paper, multiple workflows are considered and merged with dummy start and end nodes to represent them as a single monolithic workflow. The proposed IFFO enhances the traditional FFO algorithm to reduce the “stuck at the local optima” problem by using an enhanced swarm smell function. The activation function uses the mean smell function to generate new positions of the swarm particles. IFFO is used for scheduling multiple workflows to minimize cost and makespan while providing a Pareto optimal solution. The proposed algorithm is implemented on the CloudSim platform, and the result for dataset 1 shows that IFFO is better than PSO, GA, and FFO by 15.14%, 20.04%, and 11.47%, respectively, in terms of cost and makespan conjointly. Similarly, for dataset 2, the proposed algorithm shows 16.46%, 27.2%, and 20.29% improvement, and for dataset 3, the improvement is 20.85%, 33.02%, and 34.33% as compared with PSO, GA, and FFO.
The future scope is to implement the proposed IFFO technique with more QoS parameters, such as energy efficiency and load balancing, to enhance the overall system performance. IFFO can also be applied in various state-of-the-art research areas such as sensor networks, IoT, decision-making systems, smart agriculture, and ecological engineering problems.
Data Availability
All data are included within this manuscript.
Conflicts of Interest
The authors declare that they have no conflicts of interest to report regarding the present study.
Acknowledgments
The authors would like to acknowledge the support from Taif University Researchers Supporting Project (no. TURSP-2020/216), Taif University, Taif, Saudi Arabia.