Wireless Communications and Mobile Computing

Volume 2018, Article ID 1934784, 16 pages

https://doi.org/10.1155/2018/1934784

## Workflow Scheduling Using Hybrid GA-PSO Algorithm in Cloud Computing

¹Network and Information Security Department, Yarmouk University, Irbid 21163, Jordan

²Computer Sciences Department, Yarmouk University, Irbid 21163, Jordan

Correspondence should be addressed to Ahmad M. Manasrah; ahmad.a@yu.edu.jo

Received 27 September 2017; Accepted 11 December 2017; Published 8 January 2018

Academic Editor: B. B. Gupta

Copyright © 2018 Ahmad M. Manasrah and Hanan Ba Ali. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

The cloud computing environment provides several on-demand services and resource sharing for clients. Business processes are managed over the cloud using workflow technology, and the dependencies between workflow tasks make it challenging to use the resources efficiently. In this paper, a Hybrid GA-PSO algorithm is proposed to allocate tasks to the resources efficiently. The Hybrid GA-PSO algorithm aims to reduce the makespan and the cost and to balance the load of the dependent tasks over the heterogeneous resources in cloud computing environments. The experimental results show that the GA-PSO algorithm decreases the total execution time of the workflow tasks in comparison with the GA, PSO, HSGA, WSGA, and MTCT algorithms. Furthermore, it reduces the execution cost and improves the load balancing of the workflow application over the available resources. Finally, the results also show that the proposed algorithm converges to optimal solutions faster and with higher quality than the other algorithms.

#### 1. Introduction

The demand for computing and large-scale storage resources is growing rapidly. As a result, cloud computing has attracted attention because of the high-performance computing services and facilities it provides to users as Software as a Service (SaaS), Infrastructure as a Service (IaaS), and Platform as a Service (PaaS) [1–3]. Many applications can be modeled as workflow applications: sets of tasks with dependencies between them, in the sense that a task cannot start until the tasks it depends on have completed their execution. Workflow applications are used in a range of domains, such as astrophysics, bioinformatics, and disaster modeling and prediction. Moreover, complex problems, such as large scientific applications, increasingly combine various methods and techniques in a single solution. Such applications have traditionally been executed on supercomputers, clusters, and grids [4]; with the advent of clouds, they can now be executed in the cloud. Workflow applications are a mechanism for executing large-scale business processes, consisting of a set of events or tasks in which information is passed from one task to another according to technical rules to achieve a common goal [5]. Because the output of some tasks is the input to others, their execution order must be considered when assigning the tasks to VM processors in a multiprocessor environment. Assigning dependent tasks to the most appropriate VM processors is known to be an NP-complete problem, as discussed by Verma and Kaushal [6]. Workflow scheduling is a multiobjective optimization problem (also known as Pareto optimization), in which users may wish to minimize both the monetary cost and the execution time of the whole workflow application while balancing the load efficiently over the VMs in the cloud environment. 
The optimal decision for multiobjective workflow optimization is a trade-off between the three objectives. The objectives must therefore be weighted according to their importance to the users in order to select the best Pareto solutions because, for instance, minimizing the overall cost may increase the execution time and the load on a specific VM [7, 8]. The workflow scheduling problem is inherited from heterogeneous computing environments, where various research efforts have addressed it [9–11]. However, heterogeneous computing environments are not easy to set up, and their ability to deliver uniform performance with few failures is quite limited compared with cloud environments [12, 13]. Moreover, the previous efforts to address workflow scheduling in heterogeneous environments aimed only at minimizing the finish time. With the wide adoption of cloud environments and their pay-per-use services, there is a need to consider both the total monetary cost and the execution makespan. As a result, several metaheuristic algorithms have been proposed to schedule workflow tasks and distribute them efficiently over the different VMs in the cloud environment. For instance, the Genetic Algorithm (GA) [14], Ant Colony Optimization [15], Swarm Intelligence [16], and Artificial Bee Colony (ABC) [17] are a few examples of the proposed solutions to the workflow scheduling problem that address the total monetary cost and the execution makespan.
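To make the three objectives concrete, the following minimal sketch scores one candidate schedule (a mapping of each workflow task to a VM) by makespan, monetary cost, and load imbalance, respecting the task dependencies of a DAG. Everything in it is a hypothetical assumption for illustration: the four-task workflow, the task lengths, the VM speeds and prices, the use of the coefficient of variation of VM busy time as the load-balance measure, and the weights. Data-transfer time between VMs is ignored, and this is not the fitness formulation defined later in the paper. It requires Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter
from statistics import pstdev

# Hypothetical 4-task workflow: each task maps to the tasks it depends on.
PARENTS = {"t0": [], "t1": ["t0"], "t2": ["t0"], "t3": ["t1", "t2"]}
LENGTH = {"t0": 10.0, "t1": 20.0, "t2": 30.0, "t3": 10.0}  # work units (assumed)
SPEED = [1.0, 2.0]  # work units per second on each VM (assumed)
PRICE = [0.1, 0.3]  # monetary cost per second on each VM (assumed)


def evaluate(assign):
    """Return (makespan, cost, imbalance) for a task -> VM index mapping.

    Data-transfer times between VMs are ignored in this sketch.
    """
    finish = {}
    vm_ready = [0.0] * len(SPEED)  # time at which each VM becomes free
    busy = [0.0] * len(SPEED)      # total execution time placed on each VM
    # Visit tasks in dependency order so every parent is finished first.
    for task in TopologicalSorter(PARENTS).static_order():
        vm = assign[task]
        run = LENGTH[task] / SPEED[vm]
        # A task starts once its VM is free and all of its parents have finished.
        start = max([vm_ready[vm]] + [finish[p] for p in PARENTS[task]])
        finish[task] = start + run
        vm_ready[vm] = finish[task]
        busy[vm] += run
    makespan = max(finish.values())
    cost = sum(b * p for b, p in zip(busy, PRICE))
    mean = sum(busy) / len(busy)
    imbalance = pstdev(busy) / mean if mean else 0.0  # coefficient of variation
    return makespan, cost, imbalance


def weighted_fitness(assign, weights=(0.5, 0.3, 0.2)):
    """Scalarize the three objectives; lower is better. Weights are illustrative."""
    return sum(w * v for w, v in zip(weights, evaluate(assign)))


print(evaluate({"t0": 1, "t1": 0, "t2": 1, "t3": 1}))
```

For the mapping shown, `t3` must wait for `t1` on the slower VM, so the makespan is 30 s even though VM 1 is idle from 20 s. Because the raw objectives are in different units (seconds, money, a unitless ratio), a real weighted sum would normally normalize each one first; the weights simply encode how much the user cares about each objective.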

The main objective of this paper is to propose an algorithm that addresses the workflow scheduling problem by reducing the total makespan and balancing the load over the VMs with minimum total monetary cost. To this end, this paper proposes a Hybrid GA-PSO algorithm that combines the strengths of both algorithms. The proposed algorithm is evaluated against other algorithms to demonstrate its effectiveness in solving the workflow scheduling problem in the cloud environment.

The remainder of the paper is organized as follows. Section 2 describes the problem and the state of the art in workflow scheduling, and highlights the challenges of applying existing common scheduling algorithms on IaaS platforms. Section 3 presents the design of the workflow scheduling algorithm and the definitions of the proposed algorithm. Section 4 details the performance evaluation of the multiobjective scheduling problem in the cloud, along with the experimental results and their discussion. Section 5 concludes the paper and summarizes future work.

#### 2. Related Work

Workflow scheduling is considered one of the main challenges in cloud environments. Many heuristic algorithms have been proposed to solve the task scheduling problem using different strategies. However, the problem becomes harder when the tasks depend on each other (i.e., in a workflow application), because dependent tasks require a specific execution order. There are two types of workflow scheduling: best-effort workflow scheduling and quality of service (QoS) constrained workflow scheduling [5, 18]. Best-effort workflow scheduling focuses on reducing the execution time of the whole workflow regardless of other factors. Many studies adopted best-effort scheduling to reduce the execution time. For example, Braun et al. [16] use the min-min algorithm for workflow scheduling; their approach executes the small tasks first and delays the larger tasks. In contrast, Mao et al. [19] use the max–min algorithm, which executes the large tasks first and delays the small tasks. To resolve these issues, Kumar and Verma [20] combined the min-min and max–min algorithms with the Genetic Algorithm to schedule multiple jobs over multiple virtual machines efficiently. The authors employ the min-min and max–min algorithms to generate GA individuals, providing a better initial population than a randomly chosen one. The results were better than those of GA-based algorithms; however, the approach requires many time-consuming computation steps, which makes it unsuitable for pay-per-use cloud computing models. Guo et al. [21] proposed a Particle Swarm Optimization (PSO) based algorithm for the task-scheduling problem with the objective of reducing the total execution and transfer time. 
The optimization process combines heuristic scheduling with PSO to allocate the tasks to the different available resources. They showed experimentally that PSO runs faster and gives better solutions than GA. However, PSO may get trapped in a local optimum [22].
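The difference between min-min and max-min is easiest to see on a small case. The following sketch implements both greedy heuristics for independent tasks using an expected-time-to-compute matrix; the matrix values are hypothetical, and dependencies between tasks are deliberately ignored, which is the setting these heuristics were originally designed for.

```python
def list_schedule(etc, largest_first=False):
    """Greedy min-min (default) or max-min scheduling of independent tasks.

    etc[i][j] is the expected time to compute task i on VM j.
    Returns (assignment dict task -> VM, makespan).
    """
    n_vms = len(etc[0])
    ready = [0.0] * n_vms  # time at which each VM becomes free
    pending = list(range(len(etc)))
    assignment = {}
    while pending:
        # For every pending task, find its earliest completion time and that VM.
        best = {}
        for i in pending:
            vm = min(range(n_vms), key=lambda j: ready[j] + etc[i][j])
            best[i] = (ready[vm] + etc[i][vm], vm)
        # min-min commits the task that can finish soonest; max-min commits
        # the task whose best completion time is largest.
        pick = max if largest_first else min
        task = pick(pending, key=lambda i: best[i][0])
        finish, vm = best[task]
        assignment[task] = vm
        ready[vm] = finish
        pending.remove(task)
    return assignment, max(ready)


# Hypothetical instance: two small tasks, one large task; VM 1 is twice as slow.
etc = [[2, 4], [3, 6], [10, 20]]
print(list_schedule(etc))                      # ({0: 0, 1: 0, 2: 0}, 15.0)
print(list_schedule(etc, largest_first=True))  # ({2: 0, 1: 1, 0: 1}, 10.0)
```

On this particular instance, min-min keeps placing the small tasks on the fast VM and the large task ends up queued behind them, while max-min places the large task first and spreads the small tasks onto the slower VM. Neither heuristic wins in general. Schedules produced this way are what Kumar and Verma [20] use to seed the GA's initial population.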

Other studies, based on QoS constraints, aim to reduce the execution time under different predefined constraints, such as user-defined budget constraints, user-defined deadline constraints, or constraints on reliability, time, cost, load balance, and fault recovery. In this regard, Pandey et al. [23] presented a heuristic algorithm based on Particle Swarm Optimization to schedule workflow tasks over cloud resources. Their experiment shows that the computation cost obtained with PSO is three times better than that of the “Best Resource Selection” (BRS) algorithm under user-defined time constraints. However, the result was not entirely accurate because of PSO's fast convergence, which may cause it to get stuck in local optima, so the results may not reflect the real performance of PSO. Arabnejad and Barbosa [24] presented the Heterogeneous Budget-Constrained Scheduling (HBCS) algorithm, which computes two possible schedules for the DAG (Directed Acyclic Graph) of the workflow: one produces the minimum execution time at the maximum cost, while the other produces the minimum cost. The user can therefore decide which schedule to use to execute the workflow before the required deadline and within the cost range. The HBCS algorithm reduces the makespan by 30% while keeping the cost within the user’s specified budget. Furthermore, it has lower time complexity than other budget-constrained algorithms.
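Since several of the works above (Guo et al. [21], Pandey et al. [23]) build on PSO, a minimal global-best PSO for mapping tasks to VMs is sketched here. The encoding (one real-valued coordinate per task, truncated to a VM index), the parameter values `w`, `c1`, `c2`, and the cost-only objective in the usage example are assumptions for illustration, not the formulations used in [21] or [23].

```python
import random


def pso_assign(cost_fn, n_tasks, n_vms, particles=20, iters=100,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal global-best PSO that maps each task to a VM index.

    Each particle holds one real-valued coordinate per task in [0, n_vms];
    truncating a coordinate gives the VM that runs that task.
    Returns (best assignment, its cost).
    """
    rng = random.Random(seed)

    def decode(pos):
        # Truncate to an integer VM index, clamping the upper edge.
        return [min(int(x), n_vms - 1) for x in pos]

    xs = [[rng.uniform(0, n_vms) for _ in range(n_tasks)] for _ in range(particles)]
    vs = [[0.0] * n_tasks for _ in range(particles)]
    pbest = [x[:] for x in xs]
    pbest_f = [cost_fn(decode(x)) for x in xs]
    g = min(range(particles), key=pbest_f.__getitem__)
    gbest, gbest_f = pbest[g][:], pbest_f[g]

    for _ in range(iters):
        for i in range(particles):
            for d in range(n_tasks):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vs[i][d] = (w * vs[i][d]
                            + c1 * r1 * (pbest[i][d] - xs[i][d])
                            + c2 * r2 * (gbest[d] - xs[i][d]))
                # Keep the position inside [0, n_vms] so it decodes to a valid VM.
                xs[i][d] = max(0.0, min(float(n_vms), xs[i][d] + vs[i][d]))
            f = cost_fn(decode(xs[i]))
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = xs[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = xs[i][:], f
    return decode(gbest), gbest_f


# Hypothetical workload: running task t on VM v costs LENGTHS[t] * PRICES[v],
# so the optimum puts every task on VM 1 (the cheapest).
LENGTHS = [4.0, 2.0, 6.0, 1.0, 3.0]
PRICES = [0.5, 0.2, 0.9]


def total_cost(assign):
    return sum(LENGTHS[t] * PRICES[vm] for t, vm in enumerate(assign))


best, best_cost = pso_assign(total_cost, len(LENGTHS), len(PRICES))
print(best, best_cost)
```

The global-best term is what gives PSO its fast convergence: every particle is pulled toward the same point, which is also why the swarm can collapse onto a local optimum, as noted for [22] and [23].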

Researchers such as Verma and Kaushal [6] observed that task priorities determine the execution order. Consequently, they presented a *Bicriteria Priority Based Particle Swarm Optimization (BPSO)* algorithm to schedule workflow tasks over the available cloud resources. The BPSO algorithm captures the trade-off between execution time and execution cost under the user’s predefined budget and deadline constraints. Compared with BHEFT (Budget-constrained Heterogeneous Earliest Finish Time) [31] and PSO algorithms [22, 26], it significantly reduces the execution cost and the makespan by selecting the best-known scheduling solution from the heuristic solutions that satisfy the predefined deadline and budget constraints. However, BPSO does not consider the varying loads of the available resources. Xu et al. [25] developed a multiobjective heuristic algorithm based on the min-min algorithm and evaluated its performance on four real-world scientific workflows. Their experiments evaluate the makespan and the execution cost with a fault recovery procedure. The min-min-based heuristic is considered a better choice only when both the cost and the makespan are considered.
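The final step described for BPSO, picking the best known schedule among those that meet both the deadline and the budget, can be illustrated in isolation. This sketch assumes the candidate schedules have already been produced by some heuristic; the `Candidate` type, the normalized bicriteria score, and the trade-off weight `alpha` are hypothetical choices, not Verma and Kaushal's formulation.

```python
from typing import NamedTuple, Optional, Sequence


class Candidate(NamedTuple):
    name: str
    makespan: float  # seconds
    cost: float      # monetary units


def pick_schedule(cands: Sequence[Candidate], deadline: float, budget: float,
                  alpha: float = 0.5) -> Optional[Candidate]:
    """Return the best candidate meeting both constraints, or None.

    Feasible candidates are ranked by the hypothetical bicriteria score
    alpha * (makespan / deadline) + (1 - alpha) * (cost / budget).
    """
    feasible = [c for c in cands if c.makespan <= deadline and c.cost <= budget]
    if not feasible:
        return None
    return min(feasible, key=lambda c: alpha * c.makespan / deadline
               + (1 - alpha) * c.cost / budget)


cands = [Candidate("fast", 40.0, 12.0),
         Candidate("cheap", 90.0, 4.0),
         Candidate("mid", 60.0, 7.0)]
print(pick_schedule(cands, deadline=80.0, budget=10.0).name)              # mid
print(pick_schedule(cands, deadline=100.0, budget=15.0, alpha=0.9).name)  # fast
print(pick_schedule(cands, deadline=100.0, budget=15.0, alpha=0.1).name)  # cheap
```

With tight constraints only one schedule is feasible; with loose constraints, `alpha` decides whether time or money dominates, which is the trade-off a bicriteria scheduler has to expose to the user.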

Multiobjective optimization is a very promising direction for tackling the workflow scheduling problem. In this regard, Ge and Wei [27] used a Genetic Algorithm to optimize task scheduling in the job queue. They used a centralized scheduler (i.e., a master node) to distribute the waiting tasks to the different available resources (i.e., slave nodes) based on resource status messages. Their results show that the proposed scheduler performed better than First-In-First-Out (FIFO) and Delay scheduling, which distributes the load over all resources in the cloud. However, the proposed algorithm requires a lot of processing time to reach the optimal solution. Later, Fard et al. [28] suggested a static multiobjective heuristic scheduling algorithm for scientific workflows in heterogeneous environments. The algorithm adopts the strategy of maximizing and minimizing the distance between the constraints for each of four objectives (i.e., makespan, economic cost, energy consumption, and reliability). The researchers analyzed and categorized the objectives based on their impact on the optimization process. The results showed that most of the generated solutions satisfy the predefined deadline and budget constraints. However, the algorithm is not efficient with a small number of tasks and processors. Wu et al. [29] suggested a Revised Discrete Particle Swarm Optimization (RDPSO) algorithm to schedule workflow applications over the different available resources. The experiments were conducted on a set of workflow applications with different data communication and computation costs. The results showed that RDPSO reduces the cost and yields a better makespan than standard PSO and the BRS (Best Resource Selection) algorithm. However, it is not efficient with a large search space. Subsequently, Chitra et al. [26] proposed a local-minima jump PSO (JPSO) for workflow scheduling in the cloud to schedule the tasks and balance the load of workflow applications, reducing the makespan. JPSO avoids getting trapped in local minima by making a jump in the particle values, which prevents poor convergence. The results show that the proposed algorithm is 3.8% more efficient than GA with a small number of tasks; however, GA gives better results with a large number of tasks.
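The "jump" idea can be illustrated as a step applied to a single PSO particle. The section only describes JPSO at a high level, so the rule below (after `limit` iterations without improvement of the particle's personal best, re-draw roughly 30% of its coordinates uniformly in `[0, n_vms)` and reset the counter) is a hypothetical interpretation, not Chitra et al.'s published rule.

```python
import random


def jump_if_stagnant(position, stagnant_iters, limit, n_vms, rng, fraction=0.3):
    """Hypothetical 'jump' for a PSO particle stuck in a local minimum.

    If the particle's personal best has not improved for `limit` iterations,
    re-draw a random subset of its coordinates uniformly in [0, n_vms] and
    reset the stagnation counter; otherwise leave it unchanged.
    Returns (new_position, new_stagnation_count); the input is not modified.
    """
    if stagnant_iters < limit:
        return position, stagnant_iters
    jumped = position[:]
    k = max(1, int(fraction * len(position)))  # how many coordinates to re-draw
    for d in rng.sample(range(len(position)), k):
        jumped[d] = rng.uniform(0, n_vms)
    return jumped, 0


rng = random.Random(1)
particle = [0.5] * 10  # every task currently decodes to VM 0
print(jump_if_stagnant(particle, 2, 5, 3, rng))  # below the limit: unchanged
print(jump_if_stagnant(particle, 5, 5, 3, rng))  # at the limit: partial re-draw
```

Re-drawing only part of the position keeps most of what the particle has learned while moving it out of the basin it has converged into; re-randomizing the whole particle would amount to a restart.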

Many researchers have attempted to solve the multiobjective optimization problem of workflow applications using different numbers of objectives. In this paper, a Hybrid GA-PSO algorithm is proposed to schedule workflow tasks over the available resources. The proposed algorithm aims to achieve three objectives: reducing the makespan, reducing the cost, and balancing the load of the workflow tasks over heterogeneous VMs in the selected cloud data center (DC). In summary, GA-based algorithms provide better results than other algorithms when the number of iterations is large; however, increasing the number of iterations means that GA takes more time to reach the optimal solution. On the other hand, PSO-based algorithms provide better results than the other algorithms in less time; however, their results may not be accurate because their fast convergence may leave them stuck in a local optimum. The proposed Hybrid GA-PSO algorithm therefore combines the characteristics of the GA and PSO algorithms. It is expected to work faster with workflow applications of different sizes compared with other algorithms that have the same objectives. Moreover, it is less likely to get trapped in a local optimum because it uses the GA mutation operator, which enhances the accuracy of the solutions. Table 1 summarizes the reviewed literature along with the pros and cons of each work.
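The mutation operator credited with helping the hybrid escape local optima can be sketched as a GA-style uniform mutation on a particle's decoded task-to-VM assignment. The mutation rate, the greedy accept-if-better step, and the fitness function in the usage example are assumptions for illustration; how the proposed Hybrid GA-PSO actually interleaves the GA and PSO steps is given in Section 3 and is not reproduced here.

```python
import random


def mutate(assign, n_vms, rate, rng):
    """GA-style uniform mutation: each task is moved to a random *different*
    VM with probability `rate`. Returns a new list; the input is not modified."""
    child = list(assign)
    for t in range(len(child)):
        if n_vms > 1 and rng.random() < rate:
            child[t] = rng.choice([v for v in range(n_vms) if v != child[t]])
    return child


def accept_if_better(assign, fitness, n_vms, rate, rng):
    """Mutate a particle's assignment and keep the mutant only if its
    fitness is lower (better); otherwise keep the original (hypothetical rule)."""
    trial = mutate(assign, n_vms, rate, rng)
    return trial if fitness(trial) < fitness(assign) else assign


def busiest_vm(assign, n_vms=3):
    """Hypothetical fitness: number of tasks on the most loaded VM."""
    return max(assign.count(v) for v in range(n_vms))


rng = random.Random(42)
particle = [2, 2, 1, 0, 2]  # decoded task -> VM assignment; VM 2 holds 3 tasks
print(accept_if_better(particle, busiest_vm, 3, 0.4, rng))
```

Unlike the PSO velocity update, which only moves a particle toward points the swarm has already found, mutation can place a task on a VM that no particle currently uses, which is the diversity the hybrid relies on to avoid premature convergence.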