Abstract

Efficient DAG task scheduling is crucial for leveraging the performance potential of a heterogeneous system, yet finding a schedule that minimizes the makespan (i.e., the total execution time) of a DAG is known to be NP-complete. Chemical Reaction Optimization (CRO), a recently proposed metaheuristic method, has demonstrated its capability for solving NP-complete optimization problems. This paper develops an algorithm named Double-Reaction-Structured Chemical Reaction Optimization (DRSCRO) for DAG scheduling on heterogeneous systems, which modifies the conventional CRO framework and combines CRO with the variable neighborhood search (VNS) method. DRSCRO has two reaction phases, for super molecule selection and global optimization, respectively. In the super molecule selection phase, CRO as a metaheuristic algorithm is adopted to obtain a super molecule for accelerating convergence. To promote the intensification capability, in the global optimization phase the VNS algorithm with a new processor selection model is used as the initialization, taking both the scheduling order and the processor assignment into account, and the load balance neighborhood structure of VNS is also utilized in the ineffective reaction operator. The experimental results verify the effectiveness and efficiency of DRSCRO in terms of makespan and convergence rate.

1. Introduction

A large application can be decomposed into several smaller modules (i.e., tasks) processed in parallel on heterogeneous computing systems. Efficient task scheduling is crucial for leveraging the performance potential of a heterogeneous system. The task scheduling problem on a heterogeneous system can be stated as assigning processors to tasks so as to minimize the makespan (i.e., the total execution time). As a task can be executed only after all of its predecessors have been executed, tasks with precedence constraints can be modeled as directed acyclic graphs (DAGs), where the nodes and the directed edges represent the tasks and the communications between the tasks, respectively. Finding a schedule that minimizes the execution time of a parallel program is known to be NP-complete [1]. Therefore, two kinds of scheduling strategies, heuristic and metaheuristic, have been developed to search for a suboptimal solution with lower execution time.

Heuristic scheduling strategies identify a solution by exploiting heuristics. An important class of heuristic algorithms is list scheduling [2–12], such as heterogeneous earliest finish time (HEFT) [3]. List scheduling consists of two basic phases: constructing a scheduling list of tasks ordered by priority, and mapping each task to a processor in priority order with a greedy approach (i.e., the highest-priority task is assigned to the processor that allows the earliest finish time). The performance of heuristic-based algorithms relies heavily on the effectiveness of the heuristics.

Metaheuristic scheduling strategies such as Ant Colony Optimization (ACO) [13], Genetic Algorithms (GA) [14–21], Tabu Search (TS) [22, 23], and Simulated Annealing (SA) [24] search the solution space directly and produce consistent, high-quality results on a wide range of problems; in comparison with heuristic-based algorithms, however, these strategies cost much more time. Chemical Reaction Optimization (CRO) is a new metaheuristic method and has shown its efficiency in solving NP-complete problems [25–29]. To our knowledge, there are so far only two CRO-based algorithms [27, 30] for DAG scheduling on heterogeneous systems. Both focus on DAG scheduling with the objective of minimizing the makespan. However, as metaheuristic scheduling strategies, CRO-based algorithms for DAG scheduling still have a very high time cost, and their convergence rates also need to be improved. In [30], the concept of a super molecule is applied for accelerating convergence, and the super molecule is selected by heuristic scheduling strategies. However, the performance of this kind of super molecule selection method varies with the range of problems.

This paper proposes an algorithm, Double-Reaction-Structured CRO (DRSCRO), for DAG task scheduling on heterogeneous systems, aiming at schedules of better quality. The conventional CRO framework is modified, and two reaction phases, one for super molecule selection and another for global optimization, are developed in DRSCRO. CRO as a metaheuristic algorithm is utilized in the super molecule selection phase to obtain a super molecule [31] for a better convergence rate. In addition, the variable neighborhood search (VNS) algorithm [32] with a new processor selection model, as well as one of its neighborhood structures, is utilized to promote the intensification capability in the global optimization phase.

There are three major contributions of this work:
(1) Developing DRSCRO by modifying the conventional CRO framework and utilizing a metaheuristic method to obtain a super molecule for accelerating convergence.
(2) Utilizing the VNS [32] algorithm with a new processor selection model as the global optimization phase initialization, which takes into account the optimization of the scheduling order and processor assignment, and applying one of its neighborhood structures in the reaction operator to promote the intensification capability of DRSCRO.
(3) Conducting simulation experiments to verify the efficiency and effectiveness of DRSCRO in terms of makespan and convergence rate.

The next section reviews related work on the DAG scheduling problem on heterogeneous systems. Section 3 gives a formal statement of the models of the studied problem. Section 4 presents the design of the proposed DRSCRO for DAG scheduling. In Section 5, the simulation performance of DRSCRO is analyzed and compared with some existing algorithms. Section 6 draws the conclusions of this paper and offers suggestions for future research.

2. Literature Review

The DAG scheduling problem, which has been proven to be NP-hard in general [1], can be formulated as the search for an optimal assignment of the tasks in a DAG onto a set of processors so as to minimize the total schedule length (i.e., makespan). The various scheduling algorithms proposed over the last decade fall into two main categories, heuristic (deterministic) and metaheuristic (nondeterministic). As metaheuristic methods, CRO-based algorithms for DAG scheduling on heterogeneous systems are based on the Chemical Reaction Optimization (CRO) algorithm, which was proposed very recently and has shown its power in dealing with NP-complete problems.

2.1. Heuristic and Metaheuristic Methods

The heuristic methods are based on heuristics extracted from intuition, and the most important class of them is list scheduling algorithms [2–12]. The HEFT algorithm, proposed by Topcuoglu et al. [3], uses the average execution cost of each task as an upward-ranking heuristic to calculate the task priority. At each step of HEFT, the task with the highest upward rank is selected and mapped to a processor with a greedy approach (i.e., the assigned processor minimizes the earliest finish time of the selected task). Experimental results show that HEFT obtains better schedule quality and computational cost than the other list scheduling algorithms. The performance of heuristic-based algorithms relies heavily on the effectiveness of the heuristics. The more complex the DAG scheduling problem is, the harder it is for greedy heuristics to produce consistent results on a wide range of problems. Different from heuristic-based algorithms, the metaheuristic methods use a guided-random-search-based process for solution searching. They typically require sufficient sampling of candidate solutions in the search space and have shown robust performance on a variety of scheduling problems. In particular, GA has been widely used to evolve solutions for many task scheduling problems as the most representative metaheuristic method [21]. Many metaheuristic algorithms have been applied successfully to the DAG scheduling problem, such as GA [14–21], ACO [13], SA [24], TS [22, 23], CRO [27, 30], VNS [21], and energy-efficient stochastic scheduling [33].

According to the No-Free-Lunch Theorem [34], all well-designed metaheuristic methods have the same performance in searching for optimal solutions when averaged over all possible fitness functions. In comparison with the heuristic methods, the metaheuristic methods have a much higher computational cost but can obtain better schedule quality, because their guided-random-search-based processes explore a wider area of the solution space, whereas the search of heuristic-based algorithms is narrowed down to a much smaller portion by the heuristics.

2.2. CRO-Based Algorithms for DAG Scheduling on Heterogeneous Systems

CRO was proposed by Lam and Li very recently [25], and, as far as we know, Double Molecular Structure-Based Chemical Reaction Optimization (DMSCRO) [27] and Tuple-Based Chemical Reaction Optimization (TMSCRO) [30] are the only two CRO-based metaheuristic algorithms for DAG scheduling on heterogeneous systems. CRO-based algorithms mimic the chemical reaction process in a closed container, which obeys energy conservation. The molecules in CRO-based algorithms are the solutions to the DAG scheduling problem, and each has two kinds of energy, potential energy (PE) and kinetic energy (KE). The PE value of a molecule is calculated by the fitness function and is equal to the objective value, makespan, of the corresponding solution. KE, whose value is nonnegative, helps the molecule escape from local optima. A buffer is also used in CRO-based algorithms for energy interchange and conservation. Moreover, to find the solution with the global minimal makespan, four types of elementary chemical reactions, on-wall ineffective collision, decomposition, intermolecular ineffective collision, and synthesis, are applied for the intensification and the diversification searches. The typical execution flow of the CRO framework adopted in DMSCRO and TMSCRO is as proposed in [25], and the parameters used in CRO are presented in Table 1.

As metaheuristic methods, DMSCRO and TMSCRO achieve better schedule quality than heuristic methods, for the reason presented in the last paragraph of Section 2.1. The experimental results in [27, 30] show that both DMSCRO and TMSCRO outperform GA. DMSCRO is the first algorithm to apply CRO, proposed by Lam and Li in [25], to the DAG scheduling problem, and it enjoys the advantages of both GA and SA. On the one hand, the intermolecular collision and on-wall collision designed in DMSCRO have effects similar to the crossover operation and the mutation operation in GA, respectively. On the other hand, the energy conservation requirement in DMSCRO guides the search for the optimal solution similarly to the way the Metropolis Algorithm guides the evolution of the solutions in SA. Two additional operations, decomposition and synthesis, give DMSCRO more opportunities to jump out of local optima and explore wider areas of the solution space. This benefit enables DMSCRO to find good solutions faster than GA, which has been widely used to evolve solutions for many task scheduling problems. DMSCRO is not compared with SA in [27, 30], because the underlying principles and philosophies of DMSCRO and SA differ considerably [27]. Typically, metaheuristic algorithms that operate on a population of solutions, such as CRO-based or GA-based algorithms, are able to find good solutions faster than those that operate on a single solution, such as SA-based algorithms. Compared with DMSCRO, TMSCRO applies the constrained earliest finish time algorithm in data pretreatment to take advantage of the super molecule and the constrained critical paths [35], which serve as heuristic information for accelerating convergence. Moreover, the molecular structure and the elementary reaction operators designed in TMSCRO are more reasonable than those in DMSCRO with respect to intensification and diversification when searching the solution space.

However, for solving the NP-complete problem of DAG scheduling on heterogeneous systems, the CRO-based algorithms TMSCRO and DMSCRO, as metaheuristic scheduling strategies, still have a very large time expenditure; therefore, their searching capabilities and convergence rates need to be improved. TMSCRO and DMSCRO have three deficiencies. First, in [30], the concept of a super molecule is applied for accelerating convergence and the super molecule is selected by heuristic scheduling strategies, but the performance of this kind of super molecule selection method varies with the range of problems. Second, in both TMSCRO and DMSCRO, the initial molecules, which are very important for the whole searching process, are randomly created, and the uncertainty of this kind of initialization undermines the searching capabilities of both algorithms. Third, the intensification capabilities of CRO-based algorithms for DAG scheduling also need to be improved in order to obtain better average results when the iteration stopping criteria are satisfied.

Therefore, this paper proposes an algorithm, Double-Reaction-Structured CRO (DRSCRO), for DAG task scheduling on heterogeneous systems, aiming at schedules of better quality. The conventional CRO framework is modified, and two reaction phases, one for super molecule selection and another for global optimization, are developed in DRSCRO. CRO as a metaheuristic algorithm is utilized in the super molecule selection phase to obtain a super molecule [31] for a better convergence rate. Moreover, in the global optimization phase, the variable neighborhood search (VNS) algorithm [21, 32, 36], an effective metaheuristic that uses neighborhood structures and a local search to change the neighborhood systematically, is used to optimize the initial molecules, and one of its neighborhood structures is also adopted in the reaction operator to promote the intensification capability. A new processor selection model is also proposed and used in the neighborhood structures of the VNS algorithm for better effectiveness.

Moreover, in [21], VNS was combined with GA for DAG scheduling, but the task priority was unchangeable in the VNS algorithm of [21], which reduces the efficiency of VNS in obtaining a better solution. So, different from [21], to promote the intensification capability of the whole algorithm, the VNS in DRSCRO is modified to optimize both the scheduling order and the processor assignment.

3. Problem Formulation

The DAG scheduling problem typically has two inputs: a heterogeneous system for computing tasks in parallel and a parallel program of an application (i.e., a DAG). In this paper, the heterogeneous system is assumed to be a static computing system model represented by the processor set $P$, which is a fully connected network of processors. The heterogeneity level is governed by a single parameter: $w(t_i, p_m)$ represents the computation cost of a task $t_i$ mapped to the processor $p_m$, and the value of each $w(t_i, p_m)$ is randomly chosen within a range determined by that parameter.

In general, a DAG $G = (V, E)$ consists of a task (node) set $V$ and an edge set $E$. The processor set $P$ is as defined in the first paragraph of this section, and each task in the DAG is executed by a single processor without preemption. The constraint between tasks $t_i$ and $t_j$ is denoted as the edge $e_{i,j}$, which means that task $t_j$ can be executed only after the execution result of task $t_i$ has been transmitted to task $t_j$. Each edge $e_{i,j}$ has a nonnegative weight $c_{i,j}$ denoting the communication cost between $t_i$ and $t_j$. Each task in a DAG can only be executed on one processor, and the communication can be performed simultaneously by the processors. In addition, when two communicating tasks are mapped to the same processor, the communication cost between them is zero. predecessor($t_i$) represents the set of the predecessors of $t_i$, while successor($t_i$) represents the set of the successors of $t_i$. The task with no predecessor is denoted as $t_{entry}$, while the task with no successor is denoted as $t_{exit}$.

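Consider a DAG with $n$ tasks to be mapped to a heterogeneous system $P$ with $|P|$ processors. Assuming that $t_i$ is the highest-priority ready task on the processor $p_m$, the earliest start time of $t_i$, $EST(t_i, p_m)$, can be formulated as

$$EST(t_i, p_m) = \max\{T_{avail}(p_m),\ T_{ready}(t_i, p_m)\}, \tag{1}$$

where $T_{avail}(p_m)$ is defined as in (2). $T_{avail}(p_m)$ is the time when processor $p_m$ is available for the execution of the task $t_i$:

$$T_{avail}(p_m) = \max_{t_j \in exec(p_m)} AFT(t_j), \tag{2}$$

where $exec(p_m)$ represents all the tasks which have already been scheduled on the processor $p_m$ and $AFT(t_j)$ denotes the actual finish time when the task $t_j$ finishes its execution. $T_{ready}(t_i, p_m)$ in (1) represents the time when all the data needed for the processing of $t_i$ have been transmitted to $p_m$, which is formulated as

$$T_{ready}(t_i, p_m) = \max_{t_j \in predecessor(t_i)} \bigl(AFT(t_j) + c_{j,i}\bigr), \tag{3}$$

where $AFT(t_j)$ has the same definition as in (2) and predecessor($t_i$) denotes the set of all the immediate predecessors of task $t_i$. The communication cost $c_{j,i}$ is 0 if the task $t_j$ and task $t_i$ are mapped to the same processor $p_m$.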

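If task $t_i$ is mapped to the processor $p_m$ with the nonpreemptive processing approach, the earliest finish time of task $t_i$, $EFT(t_i, p_m)$, is formulated as

$$EFT(t_i, p_m) = EST(t_i, p_m) + w(t_i, p_m), \tag{4}$$

where $w(t_i, p_m)$ is the computation cost of $t_i$ on $p_m$.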

After the task $t_i$ is executed by the processor $p_m$, $EFT(t_i, p_m)$ is assigned to $AFT(t_i)$. The makespan of the entire parallel program is equivalent to the actual finish time of the exit task $t_{exit}$:

$$makespan = AFT(t_{exit}). \tag{5}$$

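The communication-to-computation ratio (CCR), the ratio of the average communication cost to the average computation cost, can be formulated as

$$CCR = \frac{\dfrac{1}{|E|}\sum_{e_{i,j} \in E} c_{i,j}}{\dfrac{1}{|V|}\sum_{t_i \in V} \overline{w}(t_i)}, \tag{6}$$

where $c_{i,j}$ is the communication cost of edge $e_{i,j}$ and $\overline{w}(t_i)$ is the average computation cost of task $t_i$, which can be calculated as follows:

$$\overline{w}(t_i) = \frac{1}{|P|}\sum_{p_m \in P} w(t_i, p_m). \tag{7}$$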
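
As a small illustration of the CCR definition, the sketch below computes the ratio of average communication cost to average computation cost. The function names, the dictionary layout, and the toy numbers are assumptions made for this example; they are not taken from Figure 1 or Table 3.

```python
from statistics import mean


def average_cost(costs_per_processor):
    """Average computation cost of one task over all processors."""
    return mean(costs_per_processor)


def ccr(comm_costs, comp_costs):
    """Communication-to-computation ratio of a DAG.

    comm_costs: dict mapping an edge (i, j) to its communication cost.
    comp_costs: dict mapping a task to its per-processor computation costs.
    """
    avg_comm = mean(comm_costs.values())
    avg_comp = mean(average_cost(row) for row in comp_costs.values())
    return avg_comm / avg_comp


# Hypothetical four-task, three-processor instance.
comm = {(0, 1): 4, (0, 2): 6, (1, 3): 2, (2, 3): 8}
comp = {0: [2, 4, 6], 1: [3, 3, 3], 2: [5, 7, 9], 3: [4, 6, 8]}
print(ccr(comm, comp))
```

A CCR above 1 indicates a communication-intensive DAG, and a CCR below 1 a computation-intensive one.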

A simple four-task DAG and a heterogeneous computation system with three processors are shown in Figures 1(a) and 1(b), respectively. The definition of the notations can be found in Table 2.

4. Design of DRSCRO

DRSCRO imitates the molecular interactions in chemical reactions based on the concepts of atoms, molecule, molecular structure, and energy of a molecule. In DRSCRO, a molecule corresponds to a scheduling solution in DAG scheduling, with a unique molecular structure representing the atom positions in a molecule. We utilize the molecular structure of TMSCRO in our work, under the consideration of its capability to represent the constrained relationship between the tasks in a molecule (solution). In addition, the energy of each molecule corresponds to the fitness value of a solution. The molecular interactions try to reconstruct more stable molecular structure with lower energy. There are four kinds of basic chemical reactions, on-wall ineffective collision, decomposition, intermolecular ineffective collision, and synthesis, for molecular interactions in DRSCRO, and each kind of reaction contains two subclasses. These two subclasses of reaction operators are applied in the phase of super molecule selection and the phase of global optimization, respectively.

4.1. Framework of DRSCRO

The framework of DRSCRO to schedule a DAG job is as shown in Figure 2 with two basic phases, the phase of super molecule selection and the phase of global optimization. In each phase, DRSCRO first initializes the process of a phase, and then the phase process enters iteration.

In this framework, DRSCRO first executes the phase of super molecule selection to obtain the super molecule SMole (i.e., the molecule with the minimal makespan among all molecules). SMole and the other output molecules form the input of the global optimization phase the first time it runs; in later iterations, the input of the VNS algorithm is the population with SMole after each iteration of the global optimization phase. DRSCRO then performs the phase of global optimization to approach the global optimum solution. The VNS algorithm with a new model for processor selection is adopted as the initialization of the global optimization phase, and it is also utilized as a local search process to promote the intensification capability of DRSCRO. There are four kinds of elementary chemical reaction in DRSCRO: on-wall collision, decomposition, intermolecular collision, and synthesis. Each kind of reaction contains two types of operators, which are utilized in the two phases of DRSCRO, respectively. In each iteration, one of the elementary chemical reaction operators is performed to generate new molecules, and the PEs of the newly generated molecules (i.e., their fitness function values) are calculated. In addition, SMole is tracked and participates only in on-wall ineffective collision and intermolecular ineffective collision in the global optimization phase, so as to explore as much as possible of the solution space in its neighborhoods while preventing the super molecule from changing dramatically. The iteration of each phase repeats until the stopping criteria (or next phase criteria) are met, and SMole and its fitness function value are the final solution and makespan (i.e., the global minimum point), respectively. In the experiments of this paper, both the next phase criterion and the stopping criterion of DRSCRO are met when there is no makespan improvement after 10000 consecutive iterations of the search loop.

4.2. Molecular Structure and Fitness Function

This subsection presents the encoding of scheduling solutions (i.e., the molecular structure) and the statement of the fitness function in DRSCRO.

4.2.1. Molecular Structure

In this paper, an atom with three elements is denoted as a tuple $(t_i, f_i, p_i)$, and the molecular structure M, an array of tuples, is formulated as in (8) to represent a solution to the DAG scheduling problem. The order of the tuples in M represents the priority of each DAG task $t_i$ with the allocated processor $p_i$, and $(t_1, t_2, \ldots, t_n)$ is a topological sequence of the DAG, which begins with the hypothetical entry task $t_{entry}$ (with no predecessors) and ends with the exit task $t_{exit}$ (with no successors), representing the beginning and end of execution, respectively. Moreover, if tuple $A$ is before tuple $B$ and the task $t_A$ is the predecessor of $t_B$ in the DAG, the second integer of tuple $B$, $f_B$, will be 1, and vice versa:

$$M = \bigl((t_1, f_1, p_1), (t_2, f_2, p_2), \ldots, (t_n, f_n, p_n)\bigr). \tag{8}$$
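
The tuple encoding can be sketched as follows. This is a minimal illustration under one stated assumption: the flag of a tuple is taken to be 1 when the task in the tuple immediately before it is one of its predecessors, and 0 otherwise. The helper names (`compute_flags`, `is_topological`, `encode`) and the four-task diamond DAG are hypothetical.

```python
def compute_flags(order, preds):
    """Flag per position: 1 if the preceding task is a predecessor (assumed reading)."""
    flags = [0] * len(order)
    for i in range(1, len(order)):
        if order[i - 1] in preds[order[i]]:
            flags[i] = 1
    return flags


def is_topological(order, preds):
    """Check that every task appears after all of its predecessors."""
    seen = set()
    for task in order:
        if not preds[task] <= seen:
            return False
        seen.add(task)
    return True


def encode(order, assignment, preds):
    """Build a molecule as a list of (task, flag, processor) tuples."""
    if not is_topological(order, preds):
        raise ValueError("task order violates the precedence constraints")
    flags = compute_flags(order, preds)
    return [(t, f, assignment[t]) for t, f in zip(order, flags)]


# Hypothetical diamond DAG: 0 -> {1, 2} -> 3.
preds = {0: set(), 1: {0}, 2: {0}, 3: {1, 2}}
molecule = encode([0, 1, 2, 3], {0: 0, 1: 1, 2: 0, 3: 2}, preds)
print(molecule)
```

Storing the flag in the tuple lets an operator see, without consulting the DAG, whether a tuple can be moved one position earlier without breaking the topological order.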

4.2.2. Fitness Function

Potential energy (PE) is defined as the fitness function value of the corresponding solution represented by $M$. The overall schedule length of the entire DAG, namely, makespan, is the largest finish time among all tasks, which is equivalent to the actual finish time of the exit node in the DAG. In this paper, the goal of DAG scheduling by DRSCRO is to obtain the schedule that minimizes makespan while ensuring that the precedence of the tasks is not violated. Hence, each fitness function value is defined as

$$Fit(M) = makespan = \max_{t_i \in V} AFT(t_i) = AFT(t_{exit}). \tag{9}$$

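Algorithm 1 presents how to calculate the value of the optimization fitness function $Fit(M)$.

Algorithm 1: Calculation of the fitness function $Fit(M)$.
(1) makespan = 0;
(2) for each node $t_i$ in $M = ((t_1, f_1, p_1), (t_2, f_2, p_2), \ldots, (t_n, f_n, p_n))$ do
(3)  calculate the actual finish time of $t_i$ (i.e., $AFT(t_i)$)
(4)  if makespan < $AFT(t_i)$
(5)   update makespan
(6)   makespan = $AFT(t_i)$
(7)  end if
(8) end for
(9) return makespan;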
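
The fitness evaluation can be sketched in Python as below. This is an illustrative reading of Algorithm 1 together with the earliest start and finish time definitions of Section 3, not the authors' implementation; the function name, the data layout, and the toy costs are assumptions. Tasks are appended to their processor in molecule order (no insertion into idle slots).

```python
def makespan(molecule, preds, comp, comm):
    """Schedule length of a molecule.

    molecule: list of (task, flag, processor) in topological order.
    preds:    task -> set of immediate predecessors.
    comp:     comp[task][processor] is the computation cost.
    comm:     (pred, task) -> communication cost of the edge.
    """
    proc_avail = {}   # time at which each processor becomes free
    aft = {}          # actual finish time of each scheduled task
    proc_of = {}      # processor chosen for each scheduled task
    for task, _flag, proc in molecule:
        ready = 0
        for p in preds[task]:
            # Communication is free when both tasks share a processor.
            cost = 0 if proc_of[p] == proc else comm[(p, task)]
            ready = max(ready, aft[p] + cost)
        est = max(proc_avail.get(proc, 0), ready)
        aft[task] = est + comp[task][proc]
        proc_avail[proc] = aft[task]
        proc_of[task] = proc
    return max(aft.values())


# Hypothetical diamond DAG on two processors.
preds = {0: set(), 1: {0}, 2: {0}, 3: {1, 2}}
comp = {0: [2, 3], 1: [3, 2], 2: [4, 4], 3: [2, 1]}
comm = {(0, 1): 1, (0, 2): 2, (1, 3): 3, (2, 3): 1}
mol = [(0, 0, 0), (1, 1, 1), (2, 0, 0), (3, 1, 1)]
print(makespan(mol, preds, comp, comm))
```
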
4.3. Super Molecule Selection Phase
4.3.1. Initialization

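There are two kinds of initial molecule generator, one used in the phase of super molecule selection and the other used in the phase of global optimization, to generate the initial solutions for DRSCRO to manipulate. The tuples of the first molecule used in the initialization of the phase of super molecule selection are ordered by nonincreasing upward rank value [27] of their $t_i$, which yields a topological sequence, and the third element of each tuple is generated by a random perturbation. The upward rank value can be calculated by

$$rank_u(t_i) = \overline{w}(t_i) + \max_{t_j \in successor(t_i)} \bigl(c_{i,j} + rank_u(t_j)\bigr), \tag{10}$$

where $\overline{w}(t_i)$ is the average computation cost of $t_i$, $c_{i,j}$ is the communication cost of the edge from $t_i$ to $t_j$, and the upward rank of the exit task is its average computation cost.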
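
A minimal sketch of the upward rank in the standard HEFT form follows. It assumes that edge weights are used directly as communication costs; the function name and the toy data are hypothetical.

```python
from functools import lru_cache
from statistics import mean


def upward_ranks(succs, comp, comm):
    """Upward rank of every task.

    succs: task -> tuple of immediate successors.
    comp:  task -> list of per-processor computation costs.
    comm:  (task, successor) -> communication cost.
    """
    @lru_cache(maxsize=None)
    def rank(task):
        # An exit task has no successors, so its tail contribution is 0.
        tail = max((comm[(task, s)] + rank(s) for s in succs[task]), default=0)
        return mean(comp[task]) + tail

    return {task: rank(task) for task in succs}


# Hypothetical diamond DAG on two processors.
succs = {0: (1, 2), 1: (3,), 2: (3,), 3: ()}
comp = {0: [2, 4], 1: [3, 3], 2: [5, 7], 3: [4, 6]}
comm = {(0, 1): 4, (0, 2): 6, (1, 3): 2, (2, 3): 8}
ranks = upward_ranks(succs, comp, comm)
order = sorted(ranks, key=ranks.get, reverse=True)
print(ranks, order)
```

With positive computation costs, sorting by decreasing upward rank places every task after its predecessors, so the resulting order can serve as the tuple order of the first molecule.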

A detailed description of the initial molecule generator of the super molecule selection phase is given in Algorithm 2. For the first input molecule  , in each tuple in is set as .

Algorithm 2: Initial molecule generator of the super molecule selection phase.
(1) MoleN = 1;
(2) while MoleN ≤ PopSize do
(3)  for each $p_i$ in molecule $M$ do
(4)   change $p_i$ randomly
(5)  end for
(6)  generate a new molecule $M'$;
(7)  MoleN = MoleN + 1;
(8) end while
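
The generator can be sketched as follows, assuming that "change randomly" means drawing a processor index uniformly at random. The function name and the `rng` parameter are assumptions added for reproducibility of the example.

```python
import random


def init_population(first, num_procs, pop_size, rng=random):
    """Create pop_size molecules by perturbing the processor of every tuple.

    first: list of (task, flag, processor) tuples; task order and flags are kept.
    """
    population = []
    while len(population) < pop_size:
        molecule = [(t, f, rng.randrange(num_procs)) for t, f, _p in first]
        population.append(molecule)
    return population


# Hypothetical first molecule for a four-task DAG and three processors.
first = [(0, 0, 0), (1, 1, 0), (2, 0, 0), (3, 1, 0)]
pop = init_population(first, num_procs=3, pop_size=5, rng=random.Random(0))
print(len(pop))
```

Because only the processor element changes, every molecule in this phase shares the same task order.
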
4.3.2. Elementary Chemical Reaction Operators

In DRSCRO, the operators for super molecule selection only change the $p_i$ of tuples in a molecule at random, as intensification searches or diversification searches [25], to optimize the processor mapping of a solution. Figures 3, 4, 5, and 6, respectively, show examples of the four operators for super molecule selection, in which the molecules correspond to the DAG shown in Figure 1(a). The white blocks in these examples denote the tuples that do not change during the reaction operations.

As shown in Figure 3, the operator OnWallSMS is used to generate a new molecule $M'$ from a given reaction molecule $M$ for optimization. OnWallSMS works as follows: (1) The operator randomly chooses a tuple $(t_i, f_i, p_i)$ in $M$. (2) The operator changes $p_i$ randomly. In the end, the operator generates a new molecule $M'$ from $M$ as an intensification search.

As shown in Figure 4, the operator DecompSMS is used to generate new molecules $M'_1$ and $M'_2$ from a given reaction molecule $M$. DecompSMS works as follows: (1) The operator generates two molecules $M'_1$ and $M'_2$. (2) The operator keeps the tuples in $M'_1$ that are at the odd positions in $M$ and then changes the remaining $p_i$'s of the tuples in $M'_1$ randomly. (3) The operator retains the tuples in $M'_2$ that are at the even positions in $M$ and then changes the remaining $p_i$'s of the tuples in $M'_2$ randomly. In the end, the operator generates two new molecules $M'_1$ and $M'_2$ from $M$ as a diversification search.

As shown in Figure 5, the operator IntermoleSMS is used to generate new molecules $M'_1$ and $M'_2$ from given molecules $M_1$ and $M_2$. This operator first uses the steps in OnWallSMS to generate $M'_1$ from $M_1$, and then generates the other new molecule $M'_2$ from $M_2$ in a similar fashion. In the end, the operator generates two new molecules $M'_1$ and $M'_2$ from $M_1$ and $M_2$ as an intensification search.

As shown in Figure 6, the operator SynthSMS is used to generate a new molecule $M'$ from given molecules $M_1$ and $M_2$. SynthSMS works as follows: the operator keeps the tuples in $M'$ that are at the same position in $M_1$ and $M_2$ with the same $p_i$'s, and then changes the remaining $p_i$'s in $M'$ randomly. As a result, the operator generates $M'$ from $M_1$ and $M_2$ as a diversification search.
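
The four operators can be sketched compactly in Python. The sketch assumes that molecules in this phase share the same task order (only processors differ), that positions are counted from 1 as in the text, and that a random change draws a processor uniformly; the lowercase function names mirror the operator names but are otherwise hypothetical.

```python
import random


def on_wall_sms(m, num_procs, rng=random):
    """Change the processor of one randomly chosen tuple."""
    i = rng.randrange(len(m))
    task, flag, _ = m[i]
    new = list(m)
    new[i] = (task, flag, rng.randrange(num_procs))
    return new


def decomp_sms(m, num_procs, rng=random):
    """Return two children keeping the odd and the even positions, respectively."""
    def child(keep_parity):
        # 1-based odd positions are the 0-based even indices, and vice versa.
        return [(t, f, p) if idx % 2 == keep_parity
                else (t, f, rng.randrange(num_procs))
                for idx, (t, f, p) in enumerate(m)]
    return child(0), child(1)


def intermole_sms(m1, m2, num_procs, rng=random):
    """Apply the on-wall step to each of the two molecules."""
    return on_wall_sms(m1, num_procs, rng), on_wall_sms(m2, num_procs, rng)


def synth_sms(m1, m2, num_procs, rng=random):
    """Keep processors on which both parents agree; redraw the others."""
    return [(t, f, p1) if p1 == p2 else (t, f, rng.randrange(num_procs))
            for (t, f, p1), (_t, _f, p2) in zip(m1, m2)]
```

A short usage example with two hypothetical parents: `synth_sms([(0, 0, 0), (1, 1, 1)], [(0, 0, 0), (1, 1, 2)], 3)` always keeps processor 0 for task 0 and redraws the processor of task 1.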

4.4. Global Optimization Phase
4.4.1. Initialization

VNS is utilized by the proposed algorithm as the initialization of the global optimization phase, and it also serves as a local search process to promote the intensification capability of DRSCRO while the whole algorithm runs.

Algorithms 3 and 4, respectively, present the subset generator of the phase output and the main steps of the whole VNS algorithm (i.e., the initialization of the global optimization phase). In DRSCRO, the VNS algorithm only processes a subset of the population containing the super molecule, SMole, after each iteration in the global optimization phase (the output of the super molecule selection phase is the input of VNS the first time). As presented in Algorithm 3, if pop_set (i.e., the population set) is the output of the super molecule selection phase, the tuple orders and $p_i$'s of its elements are adjusted. pop_subset is the subset of the population and pop_subset_num is the number of elements in pop_subset, which is set to PopSize × 50% in this paper.

Algorithm 3: InitVNS(pop_set), the subset generator.
(1) tempSet = pop_set;
(2) pop_subset = $\emptyset$;
(3) if pop_set is the input of the VNS algorithm for the first time (i.e., the output of the super molecule selection phase)
(4)  for each $M$ in tempSet except SMole
(5)   choose a tuple (, , ) in , where , randomly;
(6)   generate a random number ;
(7)   if  rnd≥ 0.5
(8)    find the first predecessor = Pred() from to the begin in molecule ;
(9)    interchanged position of (, , ) and (, , ) in molecule ;
(10)   update , and as defined in the last paragraph of Section 4.2.1.
(11)  end if
(12)  for each $p_i$ in molecule $M$ do
(13)   change $p_i$ randomly
(14)  end for
(15)  if Fit($M$) < Fit(SMole)
(16)   SMole = $M$;
(17)  end if
(18) end for
(19) end if
(20) pop_subset adds SMole;
(21) pop_subset adds the molecules in tempSet with tuple order different from SMole;
(22) while |pop_subset| < pop_subset_num do
(23)  pop_subset adds a molecule in pop_set which does not exist in pop_subset;
(24) end while

Algorithm 4: The VNS algorithm (initialization of the global optimization phase).
(1) pop_subset = InitVNS(pop_set);
(2) Select the set of neighborhood structures $N_d$ ($d = 1, \ldots, d_{\max}$);
(3) for each individual $M$ in the pop_subset do
(4)  d = 1;
(5)  while $d \le d_{\max}$ do
(6)   Randomly generate a molecule $M'$ from the $d$th neighborhood of $M$;
(7)   Apply some local search method with $M'$ as the initial molecule (the local optimum presented by $M''$);
(8)   if $M''$ is better than $M$
(9)    $M = M''$;
(10)    d = 1;
(11)   else
(12)    d = d + 1;
(13)   end if
(14)  end while
(15)  until the termination condition is satisfied
(16) end for
(17) execute the combination strategy;
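
The per-molecule loop of the VNS can be sketched as below. This is a flattened variant written for illustration, not the authors' code: the inner neighborhood loop and the outer termination test are merged into one loop, the neighborhood index wraps around, and the defaults 20 and 3 follow the dual termination criteria described in this section. The callables `neighborhoods`, `local_search`, and `fitness` are hypothetical parameters supplied by the caller.

```python
def vns(start, neighborhoods, local_search, fitness, max_iters=20, max_stall=3):
    """Variable neighborhood search on a single molecule.

    neighborhoods: list of functions mapping a molecule to a random neighbor.
    local_search:  function mapping a molecule to a local optimum.
    fitness:       function returning the makespan (lower is better).
    """
    best, best_fit = start, fitness(start)
    iters = stall = 0
    d = 0  # index of the current neighborhood structure
    while iters < max_iters and stall < max_stall:
        candidate = local_search(neighborhoods[d](best))
        cand_fit = fitness(candidate)
        iters += 1
        if cand_fit < best_fit:
            # Improvement: accept and restart from the first neighborhood.
            best, best_fit = candidate, cand_fit
            d, stall = 0, 0
        else:
            # No improvement: move on to the next neighborhood structure.
            stall += 1
            d = (d + 1) % len(neighborhoods)
    return best


# Toy usage: integers stand in for molecules, distance to 7 for the makespan.
moves = [lambda x: x + 1, lambda x: x - 1]
print(vns(3, moves, lambda x: x, lambda x: abs(x - 7)))
```
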

In Algorithm 4, different from the VNS proposed in [21], the task priority is changeable in the VNS algorithm used in DRSCRO, because an unchangeable task priority reduces the efficiency of VNS in obtaining a better solution. Therefore, to optimize both the scheduling order and the processor assignment, the input molecules of VNS can have different tuple orders (i.e., task priorities) in each iteration, as presented in Algorithm 3. $d_{\max}$, the upper bound of the neighborhood index $d$ in Algorithm 4, is set to 2 as presented in [37]. As the essential factor of VNS, two neighborhood structures, the load balance and the communication reduction neighborhood structures, which have demonstrated their power in solving the DAG scheduling problem on heterogeneous systems as presented in [21], are adopted by the VNS algorithm in DRSCRO for their high efficiency. In this paper, a new model is also proposed for processor selection in these two neighborhood structures. As presented in [21], two intuitions are used to construct the neighborhood structures. One is that balancing the load among the processors usually helps minimize the makespan, especially when most tasks are allocated to only a few processors; the other is that reducing the communication overhead and the idle waiting time of processors usually results in a more effective schedule, especially given a relatively high unit communication cost. However, these two intuitions contradict each other, because reducing the communication overhead and the idle waiting time of processors usually means that some processors hold most of the tasks. So, different from the original ones in [21], we develop a new model for processor selection. For a processor $p_i$, consider the total execution cost of the tasks on $p_i$ and the communication cost overhead of $p_i$ as defined in [21]. These two values are the tendencies of load balancing and communication reducing, respectively (i.e., the tendency of reducing or increasing the tasks on $p_i$): the greater the total execution cost is, the stronger the tendency of reducing tasks on $p_i$ is, and the greater the communication cost overhead is, the stronger the tendency of increasing tasks on $p_i$ is. Therefore, a parameter Tend($p_i$) is developed to measure this tendency as a combination of the two values, as given in (11). The neighborhood structure computation processes of load balance and communication reduction are presented in Algorithms 5 and 6, respectively. The proposed model takes both intuitions into account and can make the VNS algorithm more effective than the original one.

Algorithm 5: Load balance neighborhood structure, GenLBNeighborhood($M$).
(1) for each processor $p_i$ in the solution $M$ do
(2)  Compute Tend($p_i$);
(3) end for
(4) Choose the processor $p_{\max}$ with the largest Tend();
(5) Randomly choose a task $t_{rand}$ from exec($p_{\max}$);
(6) Randomly choose a processor $p_{rand}$ different from $p_{\max}$;
(7) Reallocate $t_{rand}$ to the processor $p_{rand}$;
(8) Encode and reschedule the changed solution $M'$;
(9) return $M'$;

Algorithm 6: Communication reduction neighborhood structure.
(1) for each processor $p_i$ in the solution $M$ do
(2)  Compute Tend($p_i$);
(3) end for
(4) Choose the processor $p_{\min}$ with the smallest Tend();
(5) Set the candidate set, cand, empty;
(6) for each task $t_i$ in the set exec($p_{\min}$) do
(7)  Compute the set predecessor($t_i$);
(8)  Update predecessor($t_i$) with predecessor($t_i$) = predecessor($t_i$) − exec($p_{\min}$);
(9)  cand = cand + predecessor($t_i$);
(10) end for
(11) Randomly choose a task $t_{rand}$ from cand;
(12) Reallocate $t_{rand}$ to the processor $p_{\min}$;
(13) Encode and reschedule the changed solution $M'$;
(14) return $M'$;
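
The two neighborhood moves can be sketched as follows. Since the formula for Tend() is not reproduced here, the sketch takes precomputed Tend values as an input dictionary, which is an assumption of this example; the function names and the task-to-processor dictionary layout are also hypothetical, and the final "encode and reschedule" step is left to the caller.

```python
import random


def load_balance_neighbor(assign, tend, num_procs, rng=random):
    """Move one random task away from the processor with the largest Tend.

    assign: task -> processor; tend: processor -> precomputed Tend value.
    """
    source = max(tend, key=tend.get)
    tasks = [t for t, p in assign.items() if p == source]
    if not tasks or num_procs < 2:
        return dict(assign)
    task = rng.choice(tasks)
    target = rng.choice([p for p in range(num_procs) if p != source])
    new = dict(assign)
    new[task] = target
    return new


def comm_reduction_neighbor(assign, tend, preds, rng=random):
    """Pull one predecessor task onto the processor with the smallest Tend."""
    target = min(tend, key=tend.get)
    local = {t for t, p in assign.items() if p == target}
    cand = set()
    for t in local:
        # Predecessors already on the target processor are not candidates.
        cand |= preds[t] - local
    if not cand:
        return dict(assign)
    task = rng.choice(sorted(cand))
    new = dict(assign)
    new[task] = target
    return new
```

Both functions return a new assignment and leave the input untouched, so a caller can evaluate the neighbor and discard it if the makespan does not improve.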

The VNS algorithm in DRSCRO utilizes the dual termination criteria. The termination criterion 1 sets the upper bound of the local search iterations to 20, and the termination criterion 2 sets the maximum iteration number without improvement to 3. The VNS algorithm will stop if either criterion is satisfied. To form a new initial population, a combination strategy is utilized for combining the current population and the VNS output after the VNS algorithm outputs the subset of the population. The current population and the VNS output are first merged and sorted by increasing makespan; then the first PopSize molecules are selected to generate the new initial population.
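
The combination strategy described above amounts to a merge, a sort, and a truncation. A minimal sketch, with a hypothetical function name and with integers standing in for molecules in the usage line:

```python
def combine(population, vns_output, fitness, pop_size):
    """Merge both sets, sort by increasing makespan, keep the best pop_size."""
    merged = list(population) + list(vns_output)
    merged.sort(key=fitness)
    return merged[:pop_size]


# Toy usage: each integer is its own makespan.
print(combine([5, 3, 9], [1, 7], lambda m: m, pop_size=3))
```
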

4.4.2. Elementary Chemical Reaction Operators

The operators for global optimization not only vary of each tuple but also interchange the positions of the tuples in a molecule as the intensification searches or the diversification searches [25] to optimize the whole solution.

On-wall ineffective collision (an intensification search), decomposition (a diversification search), and synthesis (a diversification search) are as presented in [30]; we do not repeat them here in order to focus on our main work. In [30], the function of the ineffective collision operator is similar to that of the on-wall ineffective collision operator. Therefore, unlike [30], this paper proposes a modified ineffective collision operator that uses the load balance neighborhood structure of the VNS described above, to promote the intensification capability of DRSCRO and to avoid duplicating functions.

The operator, IntermoleGO (i.e., the ineffective collision operator), is used to generate new molecules and from given molecules and . This operator first uses the steps in OnWallGO to generate from , and then the operator generate the other new molecule from in the similar fashion. In the end, the operator generates two new molecules and from and as an intensification search. The detailed executions are presented in Algorithm 7. Figure 7 shows the example of the IntermoleGO, in which the molecules correspond to the DAG as shown in Figure 1(a).

(1) choose randomly a tuple (, , ) in where = 0;
(2) exchange the positions of (, , ) and (,, );
(3) modify , and in as defined in the last paragraph of Section 4.2.1;
(4) generate a new molecule = GenLBNeighborhood();
(5) choose randomly a tuple (, , ) in where = 0;
(6) exchange the positions of (, , ) and (, , );
(7) modify , and in as defined in the last paragraph of Section 4.2.1;
(8) generate a new molecule = GenLBNeighborhood();

4.5. Illustrative Example

Consider the example shown in Figure 1(a). Its edges are labeled with the communication costs, whereas the execution costs are shown in Table 3.

Initially, the path (, , , ) is found based on the upward rank value of each task in DAG, and the first molecule,   = ((, 0, ), (, 1, ), (, 0, ), and (, 1, )), can be obtained. Algorithm 2, InitMoleSMS, is then executed to generate the initial population with 10 elements (i.e., PopSize is set as 10) for super molecule selection phase. The initial population molecules are operated during the iterations in the super molecule selection phase as presented in the framework of DRSCRO in Section 4.1, and the super molecule,  SMole = ((, 0, ), (, 1, ), (, 0, ), and (, 1, )), can be obtained.

In the global optimization phase, Algorithm 4, InitMoleGOVNS, is then executed to generate (or to update) the initial population after each iteration as presented in Section 4.1. The molecules are operated during the iterations in the global optimization phase as presented in the framework of DRSCRO in Section 4.1, and the global minimal makespan = 40 is finally obtained, for which the corresponding solution (i.e., molecule) is ((, 0, ), (, 1, ), (, 0, ), and (, 1, )).

4.6. Analysis of DRSCRO

As a new metaheuristic strategy, the recently proposed CRO-based methods for DAG scheduling have demonstrated their capability for solving this kind of NP-hard optimization problem. An analysis of the framework, molecular structure, chemical reaction operators, and operational environment of DRSCRO indicates that the scheme has three advantages over other CRO-based algorithms for DAG scheduling.

First, to some degree, the super molecule in DRSCRO is similar to InitS in TMSCRO [30] or the “elite” in GA [31]. However, the “elite” in GA is usually generated from two chromosomes, while the super molecule is obtained by executing the first phase of DRSCRO. Moreover, in comparison with TMSCRO, DRSCRO uses a metaheuristic strategy (CRO) to get a better super molecule. As an intelligent random-search algorithm, the CRO used in the super molecule selection phase of DRSCRO searches a wider area of the solution space than the CEFT applied in TMSCRO, which narrows the search down to a very small portion of the solution space. As a result, a better super molecule may contribute to a better global optimum solution and accelerate convergence. Second, the DAG scheduling problem has two complex aspects, task sequence optimization and processor assignment optimization, which lead to a very large and complicated solution space. To obtain a better intensification search capability than other CRO-based algorithms for DAG scheduling on heterogeneous systems, DRSCRO applies the VNS algorithm as the initialization of the global optimization phase, where it also acts as a local search process during the running of DRSCRO, and one of the neighborhood structures of VNS is also used in the ineffective reaction operator. Third, unlike the VNS proposed in [22], the task priority is changeable in our adopted VNS algorithm during the running of DRSCRO, and a new model for processor selection is used in the neighborhood structures to promote the efficiency of VNS. These three advantages improve both the speed of convergence and the search result over the whole solution space, as demonstrated by the experimental results in Sections 5.3 and 5.4. The time complexity of DRSCRO is .

5. Experimental Details

This section presents the simulation experiments and a comparative evaluation of HEFT, DMSCRO, TMSCRO, and the proposed DRSCRO. As shown in [27, 30] by theoretical analysis and experimental results, TMSCRO and DMSCRO perform better than GA. Our work is a further study of CRO-based algorithms for DAG scheduling on heterogeneous systems; therefore, for DRSCRO as a metaheuristic algorithm, we focus on the performance of the proposed algorithm itself and on its comparison with algorithms of a similar kind.

First, two extensive sets of graphs as the test beds for comparative study are described. Next, the parameter settings which are used in the simulation experiments are presented. The results and analysis of the experiment, including makespan test and convergence rate test, are given in the final part.

5.1. Test Bed

As presented in [27, 30], two extensive sets of DAGs, real-world application graphs and randomly generated application graphs, are considered as the test beds in the experiments to enhance the comparability of various algorithms. The first extensive test bed consists of two real-world problem DAGs, molecular dynamics code [38] and Gaussian elimination [8]. Molecular dynamics is a computer simulation of the physical movements of molecules and atoms, which are allowed to interact for a period of time, in the context of N-body simulation. The molecular dynamics code DAG is shown in Figure 8. Gaussian elimination is used to calculate the solution of a linear equation system; it systematically applies row operations to convert a set of linear equations to upper triangular form. As shown in Figure 9, the total number of tasks in the Gaussian elimination DAG with a matrix size of 7 is 27, and the largest number of tasks at the same level is 6. These two application graphs are used as a test bed not only to enhance the comparability of various algorithms but also to demonstrate, without loss of generality, how our proposed algorithm is applied. The second extensive test bed for the comparative study consists of random graph DAGs. A random graph generator presented in [39] is implemented to generate random graphs in the simulation experiment. It allows the user to generate a variety of random graphs with different characteristics, such as CCR, the amount of calculation of a task, the successor number of a task, and the total number of tasks in a random graph. It is also assumed that all tasks and communication links have the same computation cost and communication cost, respectively.
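The task count and the level width quoted for the Gaussian elimination DAG can be checked with a short sketch. It assumes the layout commonly used for this benchmark: for a matrix of size m, each elimination step k = 1, ..., m − 1 has one pivot task followed by m − k update tasks, which gives (m² + m − 2)/2 tasks in total. Whether Figure 9 follows exactly this layout is an assumption; the function names are illustrative.

```python
from collections import Counter


def gaussian_elimination_dag(m):
    """Build tasks and edges; task (k, k) is the pivot of step k, (k, j) with j > k an update."""
    tasks, edges = [], []
    for k in range(1, m):
        pivot = (k, k)
        tasks.append(pivot)
        for j in range(k + 1, m + 1):
            tasks.append((k, j))
            edges.append((pivot, (k, j)))
            if k + 1 < m:
                # Column j feeds the next step: its pivot if j == k + 1, else its update task.
                edges.append(((k, j), (k + 1, j)))
    return tasks, edges


def level_widths(tasks, edges):
    """Count tasks per level, where the level of a task is its longest-path depth."""
    preds = {}
    for u, v in edges:
        preds.setdefault(v, []).append(u)
    depth = {}
    for t in tasks:  # tasks are generated in topological order
        depth[t] = 1 + max((depth[p] for p in preds.get(t, [])), default=-1)
    return Counter(depth.values())


tasks, edges = gaussian_elimination_dag(7)
total_tasks = len(tasks)                                # 27 for m = 7
widest_level = max(level_widths(tasks, edges).values())  # 6 for m = 7
```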

As shown in Figure 2, the next-phase criterion and the stopping criterion of DRSCRO are both that the makespan stays unchanged for 5000 consecutive iterations in the search loop. The stopping criterion of TMSCRO and DMSCRO is that the makespan remains the same for 10000 consecutive iterations.
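A stagnation-based criterion of this kind can be sketched as a counter that is reset whenever the best makespan changes. The function `run_until_stagnant` and its `step` callback are hypothetical; `patience` would be 5000 for DRSCRO and 10000 for TMSCRO and DMSCRO under the settings stated above.

```python
def run_until_stagnant(step, initial_makespan, patience):
    """Iterate until the best makespan has stayed unchanged for `patience` consecutive iterations."""
    best = initial_makespan
    unchanged = 0
    iterations = 0
    while unchanged < patience:
        candidate = step(best)  # one search iteration; returns the makespan it found
        iterations += 1
        if candidate < best:
            best = candidate
            unchanged = 0       # progress was made, so the counter restarts
        else:
            unchanged += 1
    return best, iterations
```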

5.2. Parameter Setting

In the experiments, a parameter is set to represent the heterogeneity level as presented in the first paragraph of Section 3. It complies with the MHM model assumption and results in the fact that speeds of a computing processor are different for different tasks. In doing so, the heterogeneity level is equal to the biggest possible ratio of the best processor speed to the worst processor speed for each task. is set as the value to make the heterogeneity level 2 unless otherwise specified in this paper.
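For a concrete execution cost matrix, the heterogeneity level described above can be measured as follows. This is a sketch under the assumption that `exec_cost[i][j]` holds the execution time of task i on processor j, so that the speed ratio of the best to the worst processor for a task equals the ratio of its largest to its smallest execution time. It measures the ratio realized in a given matrix, not the bound set by the parameter.

```python
def heterogeneity_level(exec_cost):
    """Largest ratio, over all tasks, of the slowest to the fastest execution time."""
    return max(max(row) / min(row) for row in exec_cost)


# Hypothetical costs for two tasks on two processors.
level = heterogeneity_level([[10, 20], [6, 9]])
```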

The details of the parameter settings are shown in Table 4. Parameters 6–12, which are used by the CRO-based algorithms tested in the simulation, are set as presented in [25].

5.3. Makespan Tests

The performance of the proposed algorithm is compared with two state-of-the-art CRO-based scheduling algorithms, DMSCRO and TMSCRO, and a heuristic algorithm, HEFT. Each makespan value plotted in the graphs is the average over a number of independent runs. In the first extensive test bed, the makespan is averaged over 10 independent runs (HEFT, as a deterministic algorithm, is run only once), while in the second extensive test bed the makespan is averaged over 30 different random graph running instances. Moreover, to assess the robustness of DRSCRO, the best final value achieved in all these runs, the worst final value, and the related standard deviation or variance are also presented.
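The reported statistics can be computed from the final makespans of the independent runs as in the following sketch. The input values are hypothetical, and the sample standard deviation is used as one reasonable choice.

```python
import statistics


def summarize_runs(makespans):
    """Average, best, worst, and sample standard deviation of the final makespans."""
    return {
        "mean": statistics.mean(makespans),
        "best": min(makespans),    # a smaller makespan is better
        "worst": max(makespans),
        "stdev": statistics.stdev(makespans),
    }


summary = summarize_runs([40, 42, 44])  # hypothetical final makespans of three runs
```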

5.3.1. Real-World Application Graphs

Figures 10–13 show the simulation experiment results of DRSCRO, DMSCRO, TMSCRO, and HEFT on the real-world application graphs, and Tables 4–7 list the details of the experimental results.

As shown in Figures 10 and 11, the average makespan decreases as the processor number increases. The results also show that DRSCRO, TMSCRO, and DMSCRO, which are all metaheuristic methods, achieve very similar performance. This is consistent with the No-Free-Lunch Theorem, according to which all well-designed metaheuristic methods have the same performance in searching for optimal solutions when averaged over all possible fitness functions. The TMSCRO and DMSCRO used in the simulation are well-designed and taken from the literature, so the similar performance suggests that the DRSCRO developed in our work is also well-designed.

A close observation of the results in Tables 10 and 11 shows that, on average, DRSCRO slightly outperforms TMSCRO and DMSCRO. The likely reason is that DRSCRO has a better intensification search capability, owing to the VNS and the use of one of its neighborhood structures in the ineffective reaction operator, as presented in the last paragraph of Section 4.6. Therefore, when the stopping criterion is satisfied, the average results obtained by DRSCRO are better than those obtained by TMSCRO and DMSCRO. Moreover, DRSCRO, TMSCRO, and DMSCRO typically outperform HEFT because, as metaheuristic methods, they search a wider area of the solution space, while the search of HEFT is narrowed down to a much smaller portion by its heuristics.

Figures 12 and 13 and the results in Tables 7 and 8 show the performance of these four algorithms as the CCR value increases. The average makespan increases with the CCR value, because the heterogeneous processors stay idle for longer as the DAGs become more communication-intensive. It can also be observed that DRSCRO, TMSCRO, and DMSCRO outperform HEFT and that the advantage becomes more significant as the CCR value increases. This suggests that a heuristic algorithm like HEFT performs less consistently across a wide range of scheduling scenarios and that metaheuristic algorithms perform more effectively for communication-intensive DAGs.

5.3.2. Randomly Generated Application Graphs

As shown in Figures 14–16, randomly generated DAGs are used to evaluate the performance of DRSCRO, TMSCRO, DMSCRO, and HEFT in these experiments. The details of these experimental results are listed in Tables 9–11.

Figure 14 shows the performance of these four algorithms as the processor number increases: DRSCRO always outperforms TMSCRO, DMSCRO, and HEFT. Figure 15 shows that DRSCRO has better performance than the other three algorithms as the task number increases. The reasons are similar to those explained in the third paragraph of Section 5.3.1. Figure 16 shows the average makespan as the CCR value increases. The average makespan increases rapidly with the CCR value, because the DAG becomes more communication-intensive, which leads to the processors staying idle for longer.

5.4. Convergence Tests

In this section, convergence experiments are conducted to show how the makespan changes over time for DRSCRO, TMSCRO, and DMSCRO. The convergence traces and significance tests further reveal the differences between DRSCRO and the other two algorithms. In these experiments, as suggested in [27], the stopping criterion of all three algorithms is that the total running time reaches a preset value (e.g., 180 s). For the sake of comparability, the time counting of DRSCRO begins at the start of the global optimization phase. In the first extensive test bed, the makespan is averaged over 10 independent runs, while in the second extensive test bed the makespan is averaged over 30 different random graph running instances.

5.4.1. Convergence Trace

The convergence traces of DRSCRO, TMSCRO, and DMSCRO for processing the molecular dynamics code and Gaussian elimination are plotted in Figures 17 and 18, respectively. Figures 19–21 show the convergence traces when processing the randomly generated DAG sets, which contain 10, 20, and 50 tasks, respectively. As shown in Figures 17–21, the convergence traces of these three algorithms differ noticeably, and DRSCRO converges faster than the other two algorithms in every case. The reason for the better convergence rate of DRSCRO is as presented in the last paragraph of Section 4.6 (i.e., DRSCRO takes advantage of its double-reaction structure to obtain a better super molecule for accelerating convergence). Even though the VNS algorithm adds time cost in each iteration, the enhanced optimization capability of DRSCRO still gives it a better convergence rate than TMSCRO and DMSCRO. The simulation results show that DRSCRO converges faster than TMSCRO by 19.4% on average (by 29.3% in the best case) and faster than DMSCRO by 33.9% on average (by 41.2% in the best case).

Moreover, a statistical analysis based on the average values achieved is presented in Section 5.4.2, to show that DRSCRO outperforms the other CRO-based algorithms for DAG scheduling from a statistical point of view.

5.4.2. Significance Tests

Statistical analysis is necessary for the average convergence rates obtained in all cases by DRSCRO, TMSCRO, and DMSCRO, which are metaheuristic methods, in order to find significant differences among these results. Nonparametric tests are used, following the recommendations in [40], since the experimental results may present neither a normal distribution nor variance homogeneity. Therefore, the Friedman test and the Quade test are applied to check whether significant differences exist in the performance of these three algorithms. A significance level of 0.05 is used in all statistical tests.
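As an illustration of the first of these tests, the Friedman statistic can be computed from a table of per-case results by ranking the algorithms within each case and comparing the rank sums. The sketch below uses hypothetical data, omits the tie correction, and covers only the Friedman test, not the Quade test. With three algorithms the statistic is compared against the chi-square critical value for 2 degrees of freedom at the 0.05 level, 5.991; this approximation is rough when the number of cases is small.

```python
def average_ranks(values):
    """Rank values in ascending order (1 = smallest); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def friedman_statistic(results):
    """results[case][algorithm] -> Friedman chi-square statistic (no tie correction)."""
    n, k = len(results), len(results[0])
    rank_sums = [0.0] * k
    for row in results:
        for alg, rank in enumerate(average_ranks(row)):
            rank_sums[alg] += rank
    return 12.0 * sum(r * r for r in rank_sums) / (n * k * (k + 1)) - 3.0 * n * (k + 1)


# Hypothetical convergence times (s) for three algorithms on five test cases.
results = [[50, 60, 70], [55, 63, 80], [48, 59, 66], [52, 61, 75], [47, 58, 69]]
stat = friedman_statistic(results)
reject_null = stat > 5.991  # chi-square critical value, 2 degrees of freedom, 0.05 level
```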

Tables 12 and 13 list the results of the Friedman test and the Quade test, respectively, both of which reject the null hypothesis of equivalent performance. In both tests, the algorithms are not only compared with each other; our proposed DRSCRO is also compared against the remaining ones as the control method. The results in Tables 12 and 13 confirm that the performance of DRSCRO, TMSCRO, and DMSCRO differs significantly.

In sum, it can be concluded that DRSCRO, which is the control algorithm, statistically outperforms the other CRO-based DAG scheduling algorithms in convergence rate at a significance level of 0.05.

6. Discussion

The experimental results of the makespan tests show that the performance of DRSCRO is very similar to that of other metaheuristic algorithms of the same kind. This is consistent with the No-Free-Lunch Theorem, according to which each well-designed metaheuristic algorithm has the same performance in searching for optimal solutions when averaged over all possible fitness functions. However, the experimental results of the convergence tests show that the proposed DRSCRO can achieve better performance and find good solutions faster than those algorithms. As analyzed in the last paragraph of Section 4.6, the reason is twofold: DRSCRO creates a better super molecule by a metaheuristic method, and, considering both the scheduling order and the processor assignment, it takes advantage of the VNS algorithm in the global optimization phase to improve the optimization capability. A load balance neighborhood structure is also applied in the ineffective reaction operator for a better intensification capability. The new processor selection model used in the neighborhood structures also promotes the efficiency of the VNS algorithm.

7. Conclusion and Future Study

This paper develops an algorithm named Double-Reaction-Structured CRO (DRSCRO) for DAG scheduling on heterogeneous systems. DRSCRO includes two reaction phases, one for super molecule selection and another for global optimization. Unlike other CRO-based algorithms for DAG scheduling on heterogeneous systems, DRSCRO uses the super molecule selection phase to obtain a super molecule by a metaheuristic method for a better convergence rate. In addition, to promote the intensification capability of DRSCRO, the VNS algorithm, with a new processor selection model used in its neighborhood structures, serves as the initialization of the global optimization phase, and the load balance neighborhood structure of VNS is also applied in the ineffective reaction operator. The experimental results show that DRSCRO converges faster than the other CRO-based algorithms known to us and that it can also obtain a better average makespan in some cases.

In future work, we will analyze the parameter sensitivity of DRSCRO to improve its effectiveness. Moreover, to make the proposed algorithm more practical, DRSCRO will also be extended to address two main objectives: (1) minimization of the schedule length (time domain) and (2) minimization of the number of used processors (resource domain).

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work is financially supported by the National Natural Science Foundation of China (Grant no. 61462073) and the National High Technology Research and Development Program of China (863 Program) (Grant no. 2015AA020107).