Mathematical Problems in Engineering

Volume 2013 (2013), Article ID 708495, 16 pages

http://dx.doi.org/10.1155/2013/708495

## A Genetic Algorithm for Task Scheduling on NoC Using FDH Cross Efficiency

School of Communication and Information Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China

Received 5 August 2013; Accepted 7 November 2013

Academic Editor: Hao-Chun Lu

Copyright © 2013 Song Chai et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

A CrosFDH-GA algorithm is proposed for the multicriterion task scheduling problem on NoC-based MPSoCs. First, four common criteria, namely, makespan, data routing energy, average link load, and workload balance, are extracted from the task scheduling problem on NoC and used to construct the DEA DMU model. Then FDH analysis is applied to the problem, and an FDH cross efficiency formulation is derived for evaluating the relative advantage among schedule solutions. Finally, we introduce the DEA approach to the genetic algorithm and propose a CrosFDH-GA scheduling algorithm to find the most efficient schedule solution for a given scheduling problem. The simulation results show that our FDH cross efficiency formulation effectively evaluates the performance of schedule solutions. In comparative simulations, our CrosFDH-GA proposal produces more metrics-balanced schedule solutions than other multicriterion algorithms.

#### 1. Introduction

The scheduling problem has been a research hotspot since it was first posed in the 1950s as the job-shop scheduling problem [1]. After computer technology emerged in the 1940s, the scheduling problem also found its place in computer science, as the task scheduling problem on the uniprocessor in the 1960s [2], the multiprocessor in the 1970s [3], distributed computing in the 1980s [4], and grid computing in the early 21st century [5]. Chip fabrication technology has now brought us to the single-chip multicore era [6]. The emergence of the chip multiprocessor (CMP), especially the NoC (network-on-chip)-based MPSoC [7], brings new challenges to task scheduling algorithm design.

In the NoC solution, introducing network infrastructure into chip design, along with the newly arisen concept of green communication [8], changes the goal of the scheduling algorithm from single-objective optimization of makespan to simultaneous optimization of multiple metrics: not only the traditional makespan, but also energy [9] and NoC criteria [10]. Some of these optimizations even conflict with each other, so the goal of scheduling algorithm design shifts toward balancing these multiple metrics.

On the other hand, DEA is a nonparametric technique used to measure the relative efficiency of multi-input multioutput DMUs (decision making units). It was first presented by Charnes, Cooper, and Rhodes as the CCR model in 1978 [11] and was then developed into several variations based on different RTS (returns-to-scale) assumptions. The concept of efficiency in DEA gives us a reasonable standard for making trade-offs between multiple metrics.

In this paper, an FDH DEA model of NoC task scheduling is constructed, and an FDH cross efficiency formulation is proposed based on peer appraisal for further assessment of the relative advantages of DMUs. Then the proposed DEA approach is introduced to the genetic algorithm, and a CrosFDH-GA scheduling algorithm is proposed for the task scheduling problem on NoC to find the most efficient and balanced schedule solution.

The rest of this paper is organized as follows. Section 2 summarizes the related work; Section 3 formulates the task scheduling problem on NoC; the FDH and FDH cross efficiency formulations are given in Section 4; Section 5 presents our CrosFDH-GA scheduling algorithm; simulation results and discussion are given in Section 6; Section 7 concludes the paper.

#### 2. Related Works

Multicriterion scheduling algorithms for CMPs have been widely researched in recent years. In [12], a scheduling algorithm is proposed for multicore processors to avoid resource contention, as well as to reduce energy consumption. A modified genetic algorithm which incorporates a bacteriological algorithm is proposed in [13] to maximize the system reliability and reduce makespan. In [14], the optimization of makespan and workload balance is addressed, and an NSGA-II-based scheduling algorithm is proposed for multicore-based grids. A multiobjective evolutionary algorithm (MOEA)-based scheduling heuristic is proposed for the joint optimization of performance, energy, and temperature on multicore processors in [15].

As for the application of DEA to task scheduling on multicore systems, an FDH-based evaluation method for the assessment of scheduling heuristics is proposed in [16]. Although both [16] and our work adopt the DEA FDH model as the analytic tool, our work introduces the concept of cross efficiency to the FDH model and uses FDH cross efficiency to rank schedules. Moreover, the incorporation of the DEA evaluation method into a metaheuristic also distinguishes our work from [16].

#### 3. Problem Formulation

##### 3.1. Task Model

In this paper, tasks are modeled using directed acyclic graphs (DAGs). A DAG is an acyclic graph $G = (V, E)$, where $V$ is the set of nodes, which represent the tasks, and $E$ is the set of edges, in which an element $e_{ij}$ denotes the communication from task $t_i$ to task $t_j$. The edge $e_{ij}$ indicates the precedence relation between the two tasks.

Each node and each edge is associated with a weight, denoted by $w_i$ and $c_{ij}$, respectively. Weight $w_i$ is the computational load required by a processing element (PE) to execute task $t_i$, and $c_{ij}$ is the data transmission load between task $t_i$ and task $t_j$. In our work, both $w_i$ and $c_{ij}$ are expressed in time units (cycles).
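The task model above can be illustrated with a small data structure. This is a minimal sketch, not the authors' implementation: the class name `TaskGraph` and its methods are hypothetical, and integer task identifiers are assumed. Node weights hold computation cycles and edge weights hold transmission cycles, and a topological order confirms that the precedence relation is acyclic.

```python
from dataclasses import dataclass, field


@dataclass
class TaskGraph:
    """Weighted DAG: node weights are compute cycles, edge weights are transfer cycles."""
    compute: dict = field(default_factory=dict)  # task id -> computation cycles
    comm: dict = field(default_factory=dict)     # (src, dst) -> transmission cycles

    def add_task(self, task, cycles):
        self.compute[task] = cycles

    def add_edge(self, src, dst, cycles):
        if src not in self.compute or dst not in self.compute:
            raise KeyError("both endpoints must be added as tasks first")
        self.comm[(src, dst)] = cycles

    def predecessors(self, task):
        """Tasks whose output this task must wait for."""
        return [s for (s, d) in self.comm if d == task]

    def topological_order(self):
        """Kahn's algorithm; raises ValueError if the graph has a cycle."""
        indeg = {t: 0 for t in self.compute}
        for (_, d) in self.comm:
            indeg[d] += 1
        ready = sorted(t for t, k in indeg.items() if k == 0)
        order = []
        while ready:
            t = ready.pop(0)
            order.append(t)
            for (s, d) in sorted(self.comm):
                if s == t:
                    indeg[d] -= 1
                    if indeg[d] == 0:
                        ready.append(d)
        if len(order) != len(self.compute):
            raise ValueError("graph contains a cycle")
        return order
```

For instance, three tasks with edges 1→2, 1→3, and 2→3 can only be executed in the order 1, 2, 3, which is what `topological_order()` reports.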

##### 3.2. Network-on-Chip Hardware

The target hardware is a 2D mesh NoC-based MPSoC, as illustrated in Figure 1(a). Each PE is connected to a router, and routers are interconnected with each other through bidirectional links. Data is transferred through the NoC in the form of packets.

PEs are homogeneous processor cores with local data cache. If two consecutive tasks are scheduled to the same PE, the successor task reads the predecessor’s data directly from the data cache of the PE without routing in the NoC.

The microstructure of a NoC router is shown in Figure 1(b). The router has five *Inports* and *Outports* corresponding to five directions of *East*, *West*, *North*, *South*, and *Local*. The decoder in the *Inport* scans the first flit of the FIFO for any incoming packet. If the decoder detects the head flit of a packet, it performs *XY routing algorithm* and sends request signal to the arbiter of the corresponding *Outport*. If the arbiter receives multiple request signals, the contention is solved using *Round-Robin arbitration*. The granted *Inport* then forwards the packet to the downstream router. *Wormhole routing* is adopted to minimize the buffer requirement as well as the packet latency [17]. The *back pressure* mechanism is also employed to further reduce end-to-end delay [18].

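The energy model of a NoC is given by the *Bit Energy* proposed in [19]. Analytically, the average energy consumption of transmitting one bit from node $i$ to node $j$ is calculated by

$$E_{\text{bit}}^{i,j} = n_{\text{hops}} \times E_{S\text{bit}} + (n_{\text{hops}} - 1) \times E_{L\text{bit}}, \tag{1}$$

where $E_{S\text{bit}}$ and $E_{L\text{bit}}$ represent the energy consumed on the node and on the link, respectively, and $n_{\text{hops}}$ is the number of nodes the bit passes on its way from node $i$ to node $j$.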
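The bit energy model can be sketched in a few lines. The code below assumes the commonly used form of the model, in which a bit pays the node energy at every router it passes (endpoints included) and the link energy on every link between consecutive routers, and it assumes XY routing on a 2D mesh as described above. The function names, the coordinate convention, and the energy values in the usage note are illustrative assumptions, not values from the paper.

```python
def xy_route(src, dst):
    """Routers visited from src to dst on a 2D mesh, moving along X first, then Y."""
    (x, y), (dx, dy) = src, dst
    path = [(x, y)]
    while x != dx:
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:
        y += 1 if dy > y else -1
        path.append((x, y))
    return path


def bit_energy(src, dst, e_node, e_link):
    """Energy per bit: one node cost per router passed, one link cost per link crossed."""
    n_hops = len(xy_route(src, dst))  # routers traversed, endpoints included
    return n_hops * e_node + (n_hops - 1) * e_link
```

With hypothetical costs `e_node = 1.0` and `e_link = 0.5`, a bit sent from `(0, 0)` to `(2, 1)` passes four routers and three links, so `bit_energy((0, 0), (2, 1), 1.0, 0.5)` evaluates to `5.5`. Communication between tasks on the same PE does not enter the NoC at all, so a scheduler would skip this function in that case.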

##### 3.3. Monitored Metrics

In this paper, four common metrics of NoC are extracted and monitored for the assessment of schedule solutions: makespan, data routing energy, average link load [20], and workload balance [21].

*Makespan* (the $M$ metric), which is the amount of time required by a NoC to finish all tasks in a DAG following the instruction of a schedule, is the time metric of a schedule, while the *data routing Energy* (the $E$ metric) is the total amount of energy dissipated in each NoC component during the execution of the DAG.

The *average Link load* (the $L$ metric) is calculated by adding up all the data transmission time (in cycles) on each link and then dividing it by the number of links (for the NoC in Figure 1(a), there are 48 links). The $L$ metric represents how busy the NoC is during the execution, and a higher average link load implies a higher possibility of link contention. A good schedule is supposed to reduce the $L$ metric.

Finally, the *workload Balance* (the $B$ metric) is defined as the inverse of the coefficient of variation of the total workload on each processor, as shown in (2). Here $load_i$ is the actual load on processor $i$, and $load_{ave}$ is the average load. The $B$ metric reflects the load balance of the processors.
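The two NoC-specific metrics are easy to compute once per-link and per-processor totals are known. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and the workload balance uses the population standard deviation, which is one reasonable reading of "coefficient of variation" (the exact normalization used by the authors is not recoverable from the text).

```python
from statistics import mean, pstdev


def average_link_load(link_cycles, n_links):
    """Total transmission cycles over all links divided by the number of links.

    link_cycles maps a link identifier to its busy cycles; idle links may be omitted.
    """
    return sum(link_cycles.values()) / n_links


def workload_balance(loads):
    """Inverse coefficient of variation: mean load divided by its standard deviation."""
    sd = pstdev(loads)       # population standard deviation (an assumption)
    if sd == 0:
        return float("inf")  # perfectly even load
    return mean(loads) / sd
```

A larger `workload_balance` value means a more even distribution, which is why this metric is treated as larger-is-better while the average link load is smaller-is-better.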

#### 4. DEA Evaluation of Schedules

##### 4.1. A Brief Review of DEA

Data envelopment analysis (DEA) is a nonparametric technique that is widely used to measure the relative efficiency among many-input, many-output decision-making units (DMUs), which in our context are the schedules. The efficiency of a DMU is defined as the weighted sum of its outputs divided by the weighted sum of its inputs. The essence of DEA is that it allows each DMU to choose a particular set of weight coefficients which favors its own efficiency, under the constraint that the efficiencies of all DMUs calculated by this set of coefficients do not exceed 1. A DMU is “efficient” if the efficiency calculated by DEA is 1; otherwise the DMU is marked as “inefficient.” The following *Linear Programming* (LP) problem is the CCR model of DEA (CCR multiplier form):

$$\begin{aligned} \max_{u,\,v}\quad & \theta_o = u^{T} y_o \\ \text{s.t.}\quad & v^{T} x_o = 1, \\ & u^{T} y_j - v^{T} x_j \le 0, \quad j = 1, \dots, n, \\ & u \ge 0,\ v \ge 0, \end{aligned} \tag{3}$$

where $x_j$ and $y_j$ are the input and output vectors of DMU $j$; $v$ and $u$ are the coefficient (multiplier) vectors of the inputs and outputs; $n$ is the number of DMUs; and the objective function is the efficiency of DMU $o$.

The result of applying DEA is a classification of DMUs into an efficient group and an inefficient group. The efficient DMUs form an efficient frontier in the multi-input, multioutput space that envelops all inefficient DMUs. The projection of an inefficient DMU on the efficient frontier is the hypothetical efficient unit, which is a linear combination of the efficient DMUs. An inefficient DMU can also be converted to an efficient DMU by proportionally scaling down its inputs by the value of its efficiency while maintaining its original outputs. This interpretation of DEA efficiency corresponds to the envelop form of DEA, which is the dual problem of (3) (CCR envelop form):

$$\begin{aligned} \min_{\theta,\,\lambda}\quad & \theta \\ \text{s.t.}\quad & \theta x_o - X\lambda \ge 0, \\ & Y\lambda - y_o \ge 0, \\ & \lambda \ge 0, \end{aligned} \tag{4}$$

where $X$ and $Y$ are the input and output matrices; $x_o$ and $y_o$ are the input and output vectors of DMU $o$; $\lambda$ is a nonnegative vector; and $\theta$ is the efficiency of DMU $o$.

The way the efficient frontier is generated differs between DEA models, which imply different returns to scale assumptions. There are four basic returns to scale assumptions: constant returns to scale (CRS), corresponding to the CCR model [11]; variable returns to scale (VRS), corresponding to the BCC model [22]; increasing returns to scale; and decreasing returns to scale.

In this paper, we focus on a special case of DEA, namely, the free disposal hull (FDH) [23]. In VRS FDH formulation of DEA, each DMU is evaluated by comparing itself to other DMUs on a one-on-one basis, and a DMU is considered efficient only when no other DMU dominates it.

In most cases, the idea of DEA that lets each DMU specify its own weight coefficients to show its maximum advantage is desirable. However, in some extreme cases, a DMU can “cheat” its way to a high efficiency score by weighting a single input or a single output and setting the remaining weight coefficients close to 0. This can happen when some DMUs have a particularly small input or a particularly large output; in our words, these DMUs have unbalanced metrics, and these “mavericks” need to be depreciated. Moreover, although DEA effectively discriminates between efficient and inefficient DMUs, it does not further assess the relative advantages among the efficient ones.

One solution to the above problems is to introduce cross efficiency in the efficiency measurement. The concept of cross efficiency, as opposed to the simple efficiency implied by the original DEA, is the peer-appraisal equivalent of DEA’s self-appraisal process. The cross efficiency of a certain DMU is the efficiency value calculated by using weight coefficients derived by other DMUs. If a DMU is a maverick or has unbalanced metrics, then its cross efficiency value derived from other DMUs’ coefficients is not likely to be high.

Two widely accepted cross efficiency formulations, the aggressive and benevolent formulations, were proposed in [24] based on the CCR model. Both formulations add a secondary goal to the normal DEA efficiency calculation (maximizing the reference DMU’s efficiency): the aggressive formulation minimizes the target DMU’s cross efficiency, while the benevolent formulation maximizes it. Given that the simple DEA efficiency of DMU is , then the cross efficiency of DMU evaluated by DMU is defined by (CCR cross efficiency, benevolent formulation)

##### 4.2. FDH DEA Evaluation of Schedules

In order to apply data envelopment analysis to the schedule evaluation, the multi-input multioutput DMU model needs to be defined using the schedule metrics, namely, *makespan* $M$, *routing energy* $E$, *average link load* $L$, and *workload balance* $B$, proposed in Section 3.

The classification of metrics as inputs and outputs follows a simple “rule of thumb” [16]. If a larger value of a metric is better, then it is an output; otherwise the metric is an input. As a result of this classification, the DMU model of our schedule evaluation is as follows: $M$, $E$, and $L$ are the inputs, and $B$ is the output.

Moreover, in the rest of this paper the term “schedule” is referring to a schedule that is discriminable using the four-metric classification. If two schedules have identical metrics, they are regarded as the same schedule; at least they are not discriminable under current metrics.

With the scheduling DMU model, the observed schedules are defined as follows.

*Definition 1 (schedule set).* The *schedule set* is the set of observed schedules, where each schedule is defined by the input vector and output scalar .

Another concept that is relevant to DEA is the *possible production set*. The *possible production set* is the space enveloped by the efficient frontier in the multi-input multioutput space. Normally, the *possible production set* is unknown in DEA and needs to be constructed using the observed DMUs. The FDH *possible production set* postulates were proposed in [23]. Here, in our context, we restate these postulates as the following axiom.

*Axiom (possible schedule set).* The *possible schedule set* (PSS) of a schedule set satisfies the following.

*Postulate I.* PSS contains all schedules in .

*Postulate II.* PSS contains unobserved schedule if (i) , and or (ii) .

Postulate II is the free disposal postulate, which suggests a free disposal hull of [25]. Together with the deterministic Postulate I, it defines the FDH *possible schedule set* of our schedule efficiency analysis. Now, we formally introduce FDH DEA to the schedule evaluation and define the efficient schedule as follows.

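*Definition 2 (efficient schedule and efficient schedule set).* In a schedule set $S$, a schedule $s_o$ is called efficient if the optimization problem (6) has an optimal objective value of $1$; otherwise $s_o$ is inefficient. The set of all efficient schedules in $S$ is called the efficient schedule set, and likewise, the set of the remaining schedules is called the inefficient schedule set:

$$\begin{aligned} \min\quad & \theta - \varepsilon\left(\mathbf{1}^{T} s^{-} + s^{+}\right) \\ \text{s.t.}\quad & \theta x_o - X\lambda - s^{-} = 0, \\ & Y\lambda - s^{+} = y_o, \\ & \mathbf{1}^{T}\lambda = 1, \\ & \lambda \in \{0,1\}^{n},\quad s^{-} \ge 0,\quad s^{+} \ge 0, \end{aligned} \tag{6}$$

where $x_o$ and $y_o$ are the input vector and output scalar of $s_o$; $n$ is the number of schedules in $S$; $X$ and $Y$ are the input and output matrices of $S$; $\lambda$ is a binary vector in $\{0,1\}^{n}$; $s^{-}$ and $s^{+}$ are the nonnegative slack variables, representing input excess and output shortfall; $\varepsilon$ is a non-Archimedean infinitesimal constant; and $\theta$ is the efficiency of $s_o$.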

Optimization problem (6) is a *Mixed-Integer Programming* (MIP) problem. The binary vector and the constraint enforce a one-on-one comparison between the target schedule and all the schedules in to search for a reference schedule that minimizes .

The first two constraints of problem (6) can be simplified to (7) by removing the slack variables. Obviously, the problem has feasible solutions when and in this situation. From this point on, if a reference schedule is found with , that means that produces output at least as large as that of , with inputs no more than , which is a scale-down from . This makes inefficient:

However, alone is not enough for a schedule to be efficient. Consider an efficient schedule and an inefficient schedule which is distinguished from by a small input excess in the makespan; the constraint (7) holding for also holds for ; thus for . The slack variables and are introduced to remove these schedules with input excess and output shortfall. The nonzero and force the objective function in (6) to be less than 1.
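Because the reference vector in the FDH model is binary and selects exactly one schedule, the radial part of the problem can be solved by plain enumeration instead of a MIP solver. The sketch below is an illustration under stated assumptions, not the authors' procedure: the function name is hypothetical, each schedule is written as `((M, E, L), B)` with strictly positive inputs, the target is assumed to be a member of the schedule list, and the infinitesimal slack term is ignored.

```python
def fdh_radial_efficiency(target, schedules):
    """Radial FDH input efficiency of `target` against `schedules`.

    Each schedule is (inputs, output) with inputs = (M, E, L) and output = B.
    Only references whose output is at least the target's output are admissible.
    """
    x_o, y_o = target
    best = 1.0  # the target can always reference itself with scale factor 1
    for x_j, y_j in schedules:
        if y_j < y_o:
            continue  # would cause an output shortfall
        # Smallest uniform scale of the target's inputs that still covers x_j.
        theta = max(xj / xo for xj, xo in zip(x_j, x_o))
        best = min(best, theta)
    return best
```

A returned value below 1 proves inefficiency, but a value of exactly 1 does not prove efficiency. For example, `((10, 5, 2), 1.0)` scores 1.0 against `((8, 4, 2), 1.2)` because the third input cannot be scaled down, although the second schedule is better in every other metric. This is the input-excess case that the slack variables are meant to catch.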

Definition 2 implies the dominance/Pareto optimality of an efficient schedule. A dominant schedule is defined as follows.

*Definition 3 (dominant schedule).* A schedule $s_a = (x_a, y_a)$ is said to dominate $s_b = (x_b, y_b)$ if

(I) each component of $x_a$ is not greater than that of $x_b$;

(II) $y_a$ is not less than $y_b$; and

(III.a) at least one component of $x_a$ is less than the corresponding one of $x_b$ ($s_a$ dominates $s_b$ in input); or

(III.b) $y_a$ is greater than $y_b$ ($s_a$ dominates $s_b$ in output).

Then the relationship between efficient schedule and dominant schedule is given in Theorem 4.

Theorem 4 (dominance and efficiency). *In a schedule set , a schedule is called efficient if and only if no other schedule dominates it.*

*Proof. **If Part.* Suppose there is a schedule that dominates .

The efficiency of is calculated by solving the following optimization problem:

Let and ; then constraints of (8) are converted to

From the domination relation, we have , , , and , and at least one of the above inequalities holds strictly. This means at least one of , , , and in (4) is not zero. So the objective function in (8) has a feasible solution of , and schedule solution is not efficient.

*Only If Part.* Suppose schedule is not efficient; then there must exist a set of , , , and that makes the optimal value of (8) be . Assuming in , from (8) we have
The optimal value of suggests that (I) , and at least one of the and is not zero or (II) . Either of the above situations implies that schedule is dominated by .

Theorem 4 relates the relatively abstract concept of FDH efficiency to the concept of dominance. Moreover, the following two corollaries are deduced from Theorem 4.

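Corollary 5. *In a schedule set $S$, if a schedule $s_i$ with metrics $(M_i, E_i, L_i, B_i)$ satisfies* (1) *$M_i < M_j$, or* (2) *$E_i < E_j$, or* (3) *$L_i < L_j$, or* (4) *$B_i > B_j$, for all $j \neq i$, then it is in the efficient schedule set.*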

*Proof. *If satisfies one of the above conditions, then there is no schedule dominating . From Theorem 4, is efficient.

Corollary 5 points out that the schedule with the smallest makespan or the smallest energy consumption or the smallest average link load or the best workload balance is an efficient schedule.

Corollary 6. *In a schedule set , removing any inefficient schedule from does not change the elements in .*

*Proof. *The schedules in are dominant ones, and removing any inefficient schedule will not change the dominant position of the elements in . Thus, remains unchanged.
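Theorem 4 means that the efficient schedule set can be found by pairwise dominance checks, without solving any optimization problem. The following sketch illustrates this under the same assumed representation as before, `((M, E, L), B)` with the three inputs first and the output last; the function names are hypothetical.

```python
def dominates(a, b):
    """True if schedule a dominates schedule b in the sense of Definition 3."""
    (xa, ya), (xb, yb) = a, b
    no_worse = all(p <= q for p, q in zip(xa, xb)) and ya >= yb
    strictly_better = any(p < q for p, q in zip(xa, xb)) or ya > yb
    return no_worse and strictly_better


def efficient_set(schedules):
    """Schedules that no other schedule in the list dominates (Theorem 4)."""
    return [s for s in schedules
            if not any(dominates(t, s) for t in schedules)]
```

Filtering a list with `efficient_set` and then filtering the result again returns the same schedules, which is the behavior stated by Corollary 6: dropping inefficient schedules does not change which schedules are efficient.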

##### 4.3. FDH Cross Evaluation of Schedules

In this section, a cross evaluation process is proposed based on peer-appraisal FDH DEA for further assessment of schedules. DEA calculates the efficiency of a DMU by allowing the DMU to choose a scenario that is best for itself. Although this basis is plausible in most situations, some “maverick” DMUs, especially the DMUs with a single small input or a single large output, may “cheat” DEA to achieve high score by valuing its only strength and depreciating other metrics. These unbalanced DMUs must be devaluated during further assessment. Moreover, an assessment of relative advantages among efficient DMUs is also required.

The FDH model is a MIP problem in nature. In order to derive its peer-appraisal variation, the dual problem of the FDH DEA in (6) is needed to construct the formulation. Normally, it is difficult to write the dual problem of a MIP; however, by exploiting the particular structure of the vector , Agrell has proven in [26] that the MIP problem in the FDH model can be simplified to an LP problem. Using his result, the FDH envelop form in (6) is reduced to the LP problem given in (11).

The dual problem of (11) is (FDH multiplier form equivalent)

LP problem (12) is the multiplier form equivalent of FDH model. It also reveals the economic meaning of FDH. Coefficients are the prices for output and input . The profit of DMU *i* under the price system is calculated by , and the input cost of target DMU is normalized . The second constraint of (11) is equivalent to
which suggests a nonnegative profit difference between the input-scaled DMU and DMU . The upper bound of scale factor , calculated by letting , is , which indicates a scaling down of DMU ’s input. FDH scans all the DMUs to find a reference DMU and a price system with the largest scale-down factor .

Based on the FDH multiplier form equivalent given in (12), we now define the peer-appraisal FDH cross efficiency as follows.

*Definition 7 (FDH cross efficiency, benevolent formulation). *Given that the FDH efficiency of schedule is , the FDH cross efficiency of evaluated by is the optimal value of in

Definition 7 is the FDH counterpart of the peer-appraisal CCR (benevolent formulation) in [24], which is reviewed in Section 4.1. The third constraint of the LP in (14) ensures that the efficiency of DMU *i* calculated by coefficients is the FDH efficiency (primary goal), and under this constraint, (14) searches for the best efficiency value of (secondary goal).

The following two theorems reveal the relation between FDH efficiency and FDH cross efficiency.

Theorem 8. *The cross efficiency of schedule evaluated by itself is its FDH efficiency.*

*Proof. *Assume that the cross efficiency of schedule evaluated by itself is and the FDH efficiency (simple efficiency) is .

The calculation of is solving the following LP:

Compared with the FDH multiplier form equivalent in (12), the LP in (15) has extra constraints of
where is the optimal value of (12). Since satisfies (12), the extra constraints of (16) always hold true. That means is also the optimal value of (15); thus .

Theorem 9. *The cross efficiency of schedule evaluated by schedule does not exceed the value of its simple efficiency .*

*Proof. *It is obvious that the optimal solution in (14) is a feasible solution of calculating schedule 's FDH efficiency using (12)

Thus the optimal value of in (14) is not greater than the optimal value of in (17).

Using the peer-appraisal FDH proposed in (14), the cross efficiency of a DMU is defined as follows.

*Definition 10 (cross efficiency matrix, average cross efficiency, and the most efficient schedule). *In a schedule set with schedules, the cross efficiency matrix is defined by
where is the FDH cross efficiency of evaluated by using (14). The average cross efficiency of DMU is the average value of its cross efficiencies evaluated by all DMUs in , defined by . The most efficient schedule () is the schedule with the largest average cross efficiency.

The cross efficiency matrix is constructed to calculate the cross efficiency of each DMU. The diagonal elements are the self-appraisal FDH efficiencies, and the rest of the elements are the peer-appraisal FDH efficiencies. The elements in the $i$th row are the efficiencies of DMU $i$ rated by its peers, and the elements in the $i$th column are the efficiencies of peers rated by DMU $i$. The average cross efficiency of DMU $i$ is the average value of the $i$th row. The most efficient schedule is the best (both efficient and well balanced across metrics) schedule in the schedule set under our evaluation system.
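Once the cross efficiency matrix is available, selecting the most efficient schedule is a row average followed by an argmax. The sketch below assumes the matrix is given as a list of rows, with row $i$ holding the efficiencies of DMU $i$ as rated by every DMU (itself included); the function names and the numbers in the usage note are made up for illustration.

```python
def average_cross_efficiency(matrix):
    """Row averages: the mean rating each DMU receives from all raters."""
    return [sum(row) / len(row) for row in matrix]


def most_efficient(matrix):
    """Index of the DMU with the largest average cross efficiency."""
    avg = average_cross_efficiency(matrix)
    return max(range(len(avg)), key=avg.__getitem__)
```

For the illustrative matrix `[[1.0, 0.6, 0.8], [0.9, 1.0, 0.95], [0.5, 0.7, 1.0]]`, all three DMUs are efficient under self-appraisal (diagonal of 1.0), but the second one is rated highly by its peers as well, so `most_efficient` returns index 1.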

Corollary 11. *The average cross efficiency of a schedule , which is defined in Definition 10, is not greater than its simple efficiency .*

*Proof. *Using the result of previous theorem, the cross efficiency of schedule evaluated by arbitrary schedule is not greater than the . Thus , which is the average value of , is not greater than the .

Then the relation between the FDH efficiency and the most efficient schedule is given in the following theorem.

Theorem 12. *The most efficient schedule of a schedule set is an efficient schedule.*

*Proof.* Suppose the most efficient schedule is an inefficient schedule; then there exists an efficient schedule that dominates it. Formally, let be the most efficient schedule of , where is the input vector and is the output scalar. Let be the cross efficiency of schedule evaluated by schedule , and , , , are the optimal coefficients. Let be a dominant schedule of .

First assume that dominates in output (balance), and , where and has a small increment in balance. Now we calculate the cross efficiency of schedule evaluated by schedule as follows:
Compared with the calculation of the cross efficiency of schedule evaluated by schedule , it is easy to verify that , , , , is a feasible solution of the above LP. Then the optimal value of is not less than , which is greater than . That means the cross efficiency of schedule evaluated by an arbitrary schedule is greater than the , which contradicts the assumption of being the most efficient schedule.

Then assume that dominates in input. Without loss of generality, we assume that dominates in makespan, and , where and has a small decrement in makespan. Then the cross efficiency of schedule evaluated by schedule is calculated by solving the following LP:

Let , , and . By substituting , , and in (20), it is easy to verify that they are a feasible solution of (20). Then the optimal value of is not less than , which contradicts the assumption of being the most efficient schedule.

Finally, if dominates in multiple metrics, say , intermediary schedules of and can be constructed to prove that is not the most efficient schedule using the previous results.

Now we will prove the unit invariance property of our peer-appraisal FDH proposal as well as the original FDH model.

Theorem 13 (unit invariance property). *The values of optimal goal in (5) and (12) are independent of the units that inputs and outputs are measured in.*

*Proof. *First we prove that FDH efficiency (5) is unit invariant. The FDH model in (5) is equivalent to the LP in (11). So if (11) is unit invariant, then (5) is too.

In a schedule set , let and be the input vector and output scalar, and let , and be the optimal value for (12).

Under the new unit system, the inputs and outputs now are the original values multiplied by conversion coefficients, denoted by and . Then applying (11) to the new schedule set , obviously we have a feasible solution of , and , which is transformed from the original problem. Therefore, we have the optimal value of new problem . Suppose , and the weight coefficients are and under this circumstance. Converting the weight coefficients to the original unit system, we have and , which also satisfy the constraints of original problem in (12). This suggests that , , and are also a feasible solution of original problem, and contradicts the optimal assumption of . So is the only possibility. This proves that the FDH efficiency is not affected by the units that inputs and outputs are measured in.

Using this result, the unit invariance of peer-appraisal FDH can be proved in the same manner.

#### 5. Efficient Scheduling Using CrosFDH-GA

##### 5.1. Basic Design

The most straightforward way to introduce our FDH cross evaluation method to the genetic algorithm (GA) is to calculate the average cross efficiency of all individuals in the pool and use the efficiency value as the fitness of each individual. The problem with this simple solution is the high computational requirement of the DEA calculation.

For a genetic algorithm with a population of 1000, calculating the FDH simple efficiency of a single individual means solving an LP with 4001 variables (, , , , , and ) and 2000 constraints using (12), and calculating the FDH simple efficiency of all individuals means solving 1000 such LPs.

Calculating the FDH cross efficiency of an individual evaluated by another individual then means solving an LP with 4001 variables and 3000 constraints using (14). To calculate the average cross efficiency of an individual, 999 such LPs must be solved, and to calculate the average cross efficiency of all individuals, the process must be repeated 1000 times.

In our computing configuration (Intel i5-3210M, 4 GB RAM, 32-bit Win7, VS2010, and GLPK 4.47), the calculation of a single FDH simple efficiency and a single FDH cross efficiency in the above DEA-GA implementation takes about 3.6 seconds and 6.2 seconds on average, respectively. The calculation of an average cross efficiency requires over 10 minutes. The DEA solving time of 1000 individuals in a single generation is estimated to be about 7 days. If the GA runs for 50 generations, the whole solving time is nearly a year, which is beyond an acceptable level.

In this paper, we propose a solution to this problem using a “divide-and-conquer” method. The whole population (*Metapopulation*) is divided into 4 subpopulations: *Subpopulation M*, *Subpopulation E*, *Subpopulation L*, and *Subpopulation B*, each of which experiences its own evolution towards a single optimization goal (*Makespan*, *Energy*, *average Link load*, and *workload Balance*, correspondingly). In each generation, after the algorithm evaluates the performance of every individual, the elites of each subpopulation are selected and regrouped as the *DEA-ready pool* for the DEA evaluation process. The basic idea is that, according to Theorem 4, the more preeminent a schedule is in one metric, the less likely it is to be dominated by another schedule. Then the top performers in the DEA-ready pool are duplicated to each subpopulation and replace the bottom individuals, and the subpopulations continue to evolve. The process is shown in Figure 2.
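The regrouping and migration steps described above can be sketched as two small functions. This is an illustration under stated assumptions rather than the authors' code: individuals are represented only by their metric tuple `(M, E, L, B)`, the names `KEYS`, `regroup_elites`, and `migrate` are hypothetical, and the elite count and the number of migrated top performers are left as free parameters because the text does not fix them here.

```python
# Sort key per subpopulation; lower key value means better.
# M, E, L are minimized, B is maximized (hence the sign flip).
KEYS = {
    "M": lambda s: s[0],
    "E": lambda s: s[1],
    "L": lambda s: s[2],
    "B": lambda s: -s[3],
}


def regroup_elites(subpops, keys, n_elite):
    """Collect the n_elite best individuals of each subpopulation into one pool."""
    pool = []
    for name, pop in subpops.items():
        pool.extend(sorted(pop, key=keys[name])[:n_elite])
    return pool


def migrate(subpops, keys, top_performers):
    """Replace the worst individuals of each subpopulation with the top performers."""
    k = len(top_performers)
    out = {}
    for name, pop in subpops.items():
        ranked = sorted(pop, key=keys[name])
        keep = max(len(ranked) - k, 0)
        out[name] = ranked[:keep] + list(top_performers)
    return out
```

Only the pool returned by `regroup_elites` goes through the expensive DEA evaluation, which is what keeps the number of LPs per generation small compared with evaluating the whole metapopulation.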

##### 5.2. Genetic Operations

In our proposal, a chromosome, or individual, represents one schedule solution. A chromosome is an array whose length is the number of tasks, and the value of its $i$th element, $c_i = j$, represents that task $i$ is assigned to processor $j$.

The evolution of the four subpopulations is independent, each of which goes through a complete series of genetic operations. The basic framework of genetic algorithm is based on the proposal in [27] as follows.

*Step 1 (initialization).* Chromosomes are randomly created and added to the subpopulation. When the population size reaches *subpopulation_size*, the algorithm goes to the next step.

*Step 2 (evaluation). *Performance of each individual in the pool is evaluated.

*Step 3 (selection). *Chromosomes are ordered according to their subpopulation’s optimization goal, and the top *sel_ratio* chromosomes directly enter the next generation’s pool.

*Step 4 (crossover).* Two random chromosomes, *chr1* and *chr2*, are selected from the current pool, and two new chromosomes, *nchr1* and *nchr2*, are generated by swapping the middle part of the chromosome array. This step produces a population of *cros_ratio* × *subpopulation_size*.

*Step 5 (mutation). *A random chromosome is picked, and values of two random positions of the chromosome are swapped to produce a new chromosome. The population generated in the mutation step accounts for *mut_ratio* of *subpopulation_size*.

*Step 6 (termination).* The GA is terminated after a certain number of generations. If the GA has not met its termination condition, the algorithm returns to Step 2 and repeats the whole process.

For example, consider a scheduling problem of 9 tasks (task 1~task 9) scheduling to 3 processors (processor 1~processor 3). Two randomly generated chromosomes, *chr1* and *chr2*, are listed in Table 1. Following the previous definition, *chr1* represents that task 2 and task 7 are scheduled to processor 1; tasks 4, 5, and 8 are scheduled to processor 2; and tasks 1, 3, 6, and 9 are scheduled to processor 3. Chromosome *chr2* represents that tasks 3, 5, 6, 7, and 8 are scheduled to processor 1; task 1 and task 2 are scheduled to processor 2; and task 4 and task 9 are scheduled to processor 3.

In the *crossover* operation, two randomly selected chromosomes, *chr1* and *chr2*, swap the middle parts of their arrays to form two new chromosomes, *nchr1* and *nchr2*, as shown in Figure 3.

In the *mutation* operation, a new chromosome *nchr1* is generated by randomly picking a chromosome *chr1* and swapping the values at two arbitrary positions in *chr1*, as shown in Figure 4.
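The encoding and the two operators can be illustrated with a short sketch. The arrays `chr1` and `chr2` below are reconstructed from the textual description of the example, assuming that element $i$ holds the processor of task $i$. The helper names `decode`, `crossover`, and `mutate`, and the cut points `lo` and `hi` that delimit the "middle part", are hypothetical.

```python
import random

def decode(chromosome):
    """Group 1-based task ids by the processor each gene assigns them to."""
    groups = {}
    for task, proc in enumerate(chromosome, start=1):
        groups.setdefault(proc, []).append(task)
    return groups

def crossover(chr1, chr2, lo, hi):
    """Swap the middle slice [lo, hi) of two equal-length chromosomes."""
    nchr1 = chr1[:lo] + chr2[lo:hi] + chr1[hi:]
    nchr2 = chr2[:lo] + chr1[lo:hi] + chr2[hi:]
    return nchr1, nchr2

def mutate(chromosome, rng=random):
    """Return a copy with the values at two distinct random positions swapped."""
    i, j = rng.sample(range(len(chromosome)), 2)
    out = list(chromosome)
    out[i], out[j] = out[j], out[i]
    return out

chr1 = [3, 1, 3, 2, 2, 3, 1, 2, 3]  # tasks 2, 7 -> P1; 4, 5, 8 -> P2; 1, 3, 6, 9 -> P3
chr2 = [2, 2, 1, 3, 1, 1, 1, 1, 3]  # tasks 3, 5, 6, 7, 8 -> P1; 1, 2 -> P2; 4, 9 -> P3
print(decode(chr1))  # {3: [1, 3, 6, 9], 1: [2, 7], 2: [4, 5, 8]}
print(crossover(chr1, chr2, 3, 6))
print(mutate(chr1, random.Random(1)))
```

Note that the swap mutation only permutes the genes of one chromosome, so it never changes how many tasks each processor receives; only crossover can change those counts.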

##### 5.3. Cross Evaluation of Individuals

The elites of each subpopulation are selected to form a *DEA-ready pool*. Then the DEA approach is applied to the individuals in this pool.

First, the FDH simple efficiency of every individual in the pool is calculated, and the inefficient individuals are removed. According to Corollary 5, the removal of inefficient DMUs does not change the dominant position of the efficient ones. Then the average cross efficiencies of the remaining individuals are calculated, and the individual with the largest value of average cross efficiency is marked as the most efficient schedule.

The reason for the removal of inefficient DMUs is threefold. First, as proven in Theorem 12, the most efficient schedule that we are pursuing is never an inefficient schedule. Second, the removal eliminates the influence of these obviously defective schedules on the coefficients used in the subsequent calculation of FDH cross efficiency (otherwise, an inefficient schedule would become a constraint in the calculation of FDH cross efficiency according to (14)). Finally, it further reduces the computational demand of our algorithm.

Pseudocode 1 shows the cross evaluation process in our algorithm.
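The filter-then-rank flow described in this section can also be sketched in Python. This is an illustrative sketch, not a transcription of Pseudocode 1. It assumes the standard input-oriented FDH efficiency, treats the split of the schedule metrics into `inputs` (to be minimized) and `outputs` (to be maximized) as given, and takes the FDH cross efficiency of (14) as a caller-supplied function `cross_efficiency`, which is a hypothetical placeholder. Whether the self-appraisal is included in the average is also an assumption.

```python
def fdh_efficiency(k, inputs, outputs):
    """Standard input-oriented FDH efficiency of DMU k (all metrics positive)."""
    best = 1.0  # DMU k always compares against itself with ratio 1
    for j in range(len(inputs)):
        # Only DMUs that are at least as good on every output are candidates.
        if all(yj >= yk for yj, yk in zip(outputs[j], outputs[k])):
            ratio = max(xj / xk for xj, xk in zip(inputs[j], inputs[k]))
            best = min(best, ratio)
    return best

def most_efficient(inputs, outputs, cross_efficiency, tol=1e-9):
    """Drop FDH-inefficient DMUs, then pick the best average cross efficiency."""
    n = len(inputs)
    eff = [k for k in range(n) if fdh_efficiency(k, inputs, outputs) >= 1 - tol]
    sub_in = [inputs[k] for k in eff]
    sub_out = [outputs[k] for k in eff]

    def average(pos):
        # Every remaining DMU (the rater) appraises the DMU at position pos.
        scores = [cross_efficiency(rater, pos, sub_in, sub_out)
                  for rater in range(len(eff))]
        return sum(scores) / len(scores)

    best_pos = max(range(len(eff)), key=average)
    return eff[best_pos], eff
```

The function returns the index of the selected schedule in the original pool together with the indices of all FDH efficient schedules, so the caller can see how many DMUs entered the cross evaluation.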

#### 6. Computational Experiments and Discussion

##### 6.1. Simulation Results of FDH Cross Evaluation Formulation

In this section, we extract 40 actual schedule solutions from our simulation and use our proposed FDH cross evaluation, as well as other DEA methods, to analyze the performance of these DMUs. The chosen schedules are from the 5th generation of our CrosFDH-GA for the problem of scheduling 100 tasks onto a 4 × 4 mesh NoC. The top 10 schedules in each subpopulation are grouped as our schedule set. The schedules are listed in Table 2, and for more intuitive observation, all the metrics shown are preprocessed by dividing each value by the average value of that metric in the set. As we have proved in Theorem 13, this normalization process does not affect the value of a DMU’s efficiency.
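The preprocessing step can be sketched as follows. The metric values are hypothetical and the helper name `normalize_by_mean` is an assumption. The check at the end only illustrates the intuition behind the invariance (ratios between schedules on the same metric are unchanged by a per-metric scaling); it is not a substitute for the proof of Theorem 13.

```python
import math

def normalize_by_mean(rows):
    """Divide each metric (column) by its mean over the whole schedule set."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [tuple(v / m for v, m in zip(row, means)) for row in rows]

# Hypothetical raw metrics for three schedules: (makespan, energy).
raw = [(1200.0, 3.5e-6), (1500.0, 2.8e-6), (1350.0, 4.1e-6)]
norm = normalize_by_mean(raw)

# The ratio between any two schedules on a metric survives the scaling.
for i in range(2):
    assert math.isclose(raw[0][i] / raw[1][i], norm[0][i] / norm[1][i])
```

After normalization every metric has a mean of 1 over the set, which makes metrics with very different units directly comparable in a table.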

CCR efficiency (CCR), BCC efficiency (BCC), FDH efficiency (FDH), CCR super efficiency (Super CCR), CCR average Cross efficiency (Cros CCR), and FDH average Cross efficiency (Cros FDH) of each DMU are also calculated and listed in Table 2. All the DEA formulations are solved using GLPK [28].

Two kinds of FDH cross efficiencies, Cros FDH (All) and Cros FDH (Eff), are presented in Table 2. Cros FDH (All) calculates the FDH cross efficiency using all 40 schedules, while Cros FDH (Eff) only takes into account the FDH efficient ones, as suggested in Section 5.3. The value of Cros FDH (Eff) is not calculated for inefficient schedules and is marked with “—.”

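Moreover, a *maverick index* (MI), which is suggested in [24], is calculated for each FDH cross efficiency. The MI of DMU $k$ is calculated by $\mathrm{MI}_k = (E_{kk} - e_k)/e_k$, where $E_{kk}$ is the simple efficiency of DMU $k$ and $e_k$ is its average cross efficiency.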

MI measures the difference between a DMU’s simple efficiency and its cross efficiency. A larger MI value implies that the DMU is more likely to be a maverick, that is, one that “cheats” its way to a high simple efficiency by choosing a particular set of coefficients that favors its only strength and depreciates the other metrics. In our words, it is a metric-unbalanced DMU.
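As a small numerical illustration, the sketch below assumes the definition of [24], in which MI is the gap between the simple efficiency and the average cross efficiency, divided by the average cross efficiency. The function name `maverick_index` is hypothetical. For *schedule 4*, which is FDH efficient (simple efficiency 1) and has an average cross efficiency of 0.905, this gives the MI of 0.105 reported in the discussion of Table 2.

```python
def maverick_index(simple_eff, avg_cross_eff):
    """Relative gap between simple efficiency and average cross efficiency."""
    return (simple_eff - avg_cross_eff) / avg_cross_eff

# schedule 4: FDH efficient, Cros FDH (All) = 0.905
print(round(maverick_index(1.0, 0.905), 3))  # 0.105
```

An FDH efficient DMU whose peers rate it almost as highly as it rates itself has an MI close to zero.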

From Table 2 we observe that, among the three simple efficiency formulations, CCR has the fewest efficient schedules (2), while BCC has 9 and FDH has 13. Both CCR efficient schedules are also efficient under BCC and FDH; all 9 BCC efficient schedules are also FDH efficient, and FDH has 4 more efficient schedules than BCC: *schedule 4*, *schedule 15*, *schedule 18*, and *schedule 19*. The difference between the efficient schedule numbers of these three DEA models is caused by the different shapes of their efficient frontiers, which result from the different constraints in the model formulations. The convex-shaped efficient frontier of BCC in the multi-input multioutput space contains more DMUs than the CCR efficient frontier, while the staircase-like FDH efficient frontier has the largest number of efficient DMUs on it. Moreover, the CCR efficiency of a schedule is generally the smallest among the three models, and the BCC efficiency is generally smaller than the FDH efficiency. In fact, as pointed out in [29], FDH efficiencies are generally higher than CCR and BCC efficiencies.

Three DMU ranking methods, CCR super efficiency, CCR cross efficiency, and FDH cross efficiency, are also compared in Table 2. As observed in the table, CCR super efficiency and CCR cross efficiency are consistent with the CCR simple efficiency, and both CCR ranking methods mark *schedule 32* as the best schedule. The FDH average cross efficiency, Cros FDH (All), is clearly more discriminating than the previous methods. The FDH average cross efficiencies of the 13 FDH efficient schedules vary from 0.9934 (*schedule 20*, ranked 1st of 40 schedules) to 0.905 (*schedule 4*, ranked 35th of 40 schedules). The reason why the efficient *schedule 4* achieves such a low average cross efficiency score is explained by its MI value. *schedule 4* has a high MI of 0.105, which suggests that it “cheats” in the FDH simple efficiency calculation. A close look at its metrics reveals that it compromises too much on the and metric. The same situation happens with *schedule 31* (ranked 25th of 40) and *schedule 32* (ranked 24th of 40), both of which are classified as CCR efficient schedules; *schedule 32* is even marked as the “best” schedule under the CCR super efficiency and CCR cross efficiency analyses. The MIs of *schedules 31* and *32* are 0.0568 and 0.0553; these relatively high MIs imply that they are more likely to be mavericks. Examining their metrics shows that both are metric unbalanced because they trade too much performance on the and metric for the metric.

The two implementations of the FDH average cross efficiency, Cros FDH (All) and Cros FDH (Eff), deliver very similar DMU rankings. The top 3 schedules under Cros FDH (All) are *schedules 20*, *17*, and *18*. Compared with the top 3 under Cros FDH (Eff), *schedules 17*, *20*, and *18*, only a slight change of order exists, which validates the removal of inefficient DMUs in the cross evaluation of individuals proposed in Section 5.3.

Moreover, the average values of the four metrics and MI (All) in each subpopulation are calculated and listed in Table 3. The most interesting results are in *Subpopulation B*. The average *workload balance* of *Subpopulation B* is almost twice the average value of the other three subpopulations. However, the other metrics do not perform as well in *Subpopulation B*: they are 1.2%, 15.4%, and 16% larger than the average value of the other three subpopulations in *makespan*, *energy*, and *average link load*, respectively. This indicates that optimizing the *workload balance* metric conflicts with the other metrics, especially with the *energy* and *average link load* metrics. The assertion is also supported by MI (All). The average MI in *Subpopulation B* is 129.3% larger than the average value of the other three subpopulations, which suggests that the schedules in *Subpopulation B* are extremely metric unbalanced compared with the schedules in the other subpopulations. Thus, in order to achieve a highly metric-balanced schedule solution, a compromise must be made on the *workload balance* metric.

##### 6.2. Simulation Results of CrosFDH-GA Scheduling Algorithm

###### 6.2.1. Simulation Setup

In this section, comparative simulations are made to evaluate the performance of our CrosFDH-GA scheduling algorithm. Twenty DAGs, *tg1~tg20*, are generated using *Task Graphs For Free* (TGFF) [30]. The task number of the generated DAGs varies from 50 to 100. Along with two real-world applications, solving the *Laplace equation* using the Gauss-Seidel algorithm [31] and *molecular dynamic coding* [32], a total of 22 task sets are simulated.

The control groups of our simulation are four GAs with different global objective functions, and they are multiplication and division (MD), weighted sum (WS), weighted exponential sum (WES), and exponential weighted criterion (EWC) based on global criterion method [33]. The fitness functions are presented as follows: where , , , and are the weight coefficients of the corresponding metric. In our implementation, , , , and are all set to 0.25, which means there is no preference between four metrics.
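For readers unfamiliar with these scalarizations, the sketch below shows the textbook forms of three of them as surveyed in [33]: the weighted sum, the weighted exponential sum, and the exponential weighted criterion. It is a sketch under stated assumptions, not a restatement of the fitness functions above: it assumes that all four metrics are normalized and are to be minimized, the exponent `p` is a hypothetical parameter, and the MD function is omitted.

```python
import math

def weighted_sum(f, w):
    """WS: sum of w_i * f_i."""
    return sum(wi * fi for wi, fi in zip(w, f))

def weighted_exponential_sum(f, w, p=2):
    """WES: sum of w_i * f_i ** p."""
    return sum(wi * fi ** p for wi, fi in zip(w, f))

def exponential_weighted_criterion(f, w, p=1):
    """EWC: sum of (exp(p * w_i) - 1) * exp(p * f_i)."""
    return sum((math.exp(p * wi) - 1) * math.exp(p * fi)
               for wi, fi in zip(w, f))

weights = [0.25, 0.25, 0.25, 0.25]   # no preference among the four metrics
balanced = [1.0, 1.0, 1.0, 1.0]      # hypothetical normalized metrics
skewed = [0.4, 1.0, 1.0, 1.6]        # same weighted sum, less balanced
print(weighted_sum(balanced, weights), weighted_sum(skewed, weights))
print(weighted_exponential_sum(balanced, weights),
      weighted_exponential_sum(skewed, weights))
```

With equal weights the weighted sum cannot distinguish the two hypothetical schedules, whereas the exponential forms penalize the larger deviation of the skewed one.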

Moreover, the *sel_ratio*, *cros_ratio*, and *mut_ratio* of GA are 0.2, 0.4, and 0.4, and GA terminates its iteration after 50 generations. The population of four global objective function-based GAs is 1000. The *subpopulation_size* of CrosFDH-GA is 250, which ensures that total population is 1000 individuals.

All the output schedules are simulated under a System C based cycle-accurate NoC simulator, which is a wormhole-routing modified version of [34]. The implemented NoC simulator is a 4 × 4 mesh NoC with the router structure illustrated in Figure 3. The link width is 16 bit, and the FIFO depth is one flit. routing is used to forward packets and RR arbitration is adopted to solve contentions. The NoC simulator also integrates Orion 2.0 [35] to measure the actual routing energy during execution of a DAG.

###### 6.2.2. Results and Discussion

The simulation results of the 22 task sets, which consist of 20 DAGs generated by TGFF and two real-world applications, solving the Laplace equation (LE) using the Gauss-Seidel algorithm and molecular dynamic coding (MDC), under the four global criterion GAs and our CrosFDH-GA are illustrated in Figure 5. All the makespan and energy metrics shown are the actual results measured in our NoC simulator.

As shown in Figure 5, the four global criterion GAs render similar performance on all four metrics, while the proposed CrosFDH-GA exhibits a different optimization tendency. As shown in Figure 5(a), all five scheduling algorithms demonstrate the same level of performance on the *makespan*. The main performance differences lie in the energy (Figure 5(b)), average link load (Figure 5(c)), and workload balance (Figure 5(d)) metrics. The four global criterion GAs always output the schedule solution with the best *workload balance* (except in *tg10*, where MD-GA and WS-GA do not find the schedule solution with the best *workload balance* within 50 generations as WES-GA and EWC-GA do), as shown in Figure 5(d). Our proposal, on the other hand, always compromises on the *workload balance* metric in exchange for better optimization of the *energy* and *average link load* metrics.

To be more specific, for each of the 22 task sets we normalize the four metrics of the five scheduling algorithms to the corresponding CrosFDH-GA values and then calculate the average value of each metric for each scheduling algorithm. The results are shown in Table 4.
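The normalization behind Table 4 can be sketched as follows, assuming that the ratio to the CrosFDH-GA value is taken per task set and that these ratios are then averaged. The function name `normalized_average` and the numbers are hypothetical.

```python
def normalized_average(results, baseline="CrosFDH-GA"):
    """results[alg] lists one metric value per task set; ratios are averaged."""
    base = results[baseline]
    return {
        alg: sum(v / b for v, b in zip(vals, base)) / len(base)
        for alg, vals in results.items()
    }

# Hypothetical energy values for two task sets.
energy = {"CrosFDH-GA": [2.0, 4.0], "WS-GA": [3.0, 5.0]}
print(normalized_average(energy))  # {'CrosFDH-GA': 1.0, 'WS-GA': 1.375}
```

By construction the baseline algorithm scores exactly 1 on every metric, so a value above 1 for a metric to be minimized means that the algorithm performs worse than CrosFDH-GA on average.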

According to the table, the four global criterion GAs achieve on average eight times the *workload balance* of CrosFDH-GA. However, our proposal performs better on both the *energy* and *average link load* metrics. The average energy of our algorithm is about 36% smaller, and the average link load is about 29% smaller, than those of the other algorithms on average. This optimization tendency of CrosFDH-GA is explained by the previous analysis of the schedules’ FDH cross efficiency, which suggests that a better metric-balanced schedule should trade the *workload balance* metric for the *energy* and *average link load* metrics.

Moreover, in our observation, schedule solutions that perform similarly to the output schedules of the four global criterion GAs exist in *Subpopulation B* of the CrosFDH-GA. However, these schedule solutions are depreciated during the peer-appraisal process of FDH cross efficiency, which supports the conclusion that the output schedules of the global criterion GAs are metric unbalanced.

Figure 5(e) illustrates the solving time (measured in seconds) of the five algorithms. For illustrative presentation, the data in Figure 5(e) are preprocessed by applying $\log_{10}$ to the solving time of each algorithm. According to the figure, MD-GA has the smallest solving time of the five algorithms. WS-GA, WES-GA, and EWC-GA take 4.4%, 4.5%, and 54.6% more solving time than MD-GA on average. CrosFDH-GA has the largest solving time, which is nearly 109 times larger than that of MD-GA. Such a long solving time is caused by the high computation load introduced by the DEA analysis. In CrosFDH-GA, 90.2% of the solving time is used to calculate the average FDH cross efficiency, 8.3% is used to calculate the FDH simple efficiency, and the rest of the algorithm consumes only 1.5% of the solving time.

In MD-GA, WS-GA, WES-GA, and EWC-GA, a clear increasing trend of solving time is observed as the scale of the scheduling problem (task number) rises, as shown in Figure 6(a). However, no such trend is observed in CrosFDH-GA, as shown in Figure 6(b).

The reason is that the solving time of CrosFDH-GA is largely determined by the calculation time of the average FDH cross efficiency, which in turn depends on the size of the schedule set. In CrosFDH-GA this is the number of efficient schedules in the DEA-ready pool in each generation. Thus, the solving time of CrosFDH-GA is not directly related to the task number of a scheduling problem but to the number of efficient schedules in the DEA-ready pool in each generation.

Figure 7(a) shows the relation between the number of efficient schedules in the DEA-ready pool and the average time needed to solve the average FDH cross efficiency in a generation. As shown in the figure, the solving time rises rapidly as the number of efficient schedules increases. Moreover, Figure 7(b) gives the distribution of the efficient schedule number over all 22 simulations. As observed in the figure, most generations have between 30 and 70 efficient schedules, which requires about 7 to 100 seconds to solve the average FDH cross efficiency.

Finally, Figure 8 demonstrates the trend of how four metrics converge during the iteration. The illustrated metrics are the best ones in the corresponding subpopulations in each generation and are all normalized to the final value of the 50th generation. Figure 8 shows that the metric is the first converged metric, followed by the metric and metric, which both converge in the same pace. And the metric is the last converged metric.

#### 7. Conclusion

In this paper, an FDH cross efficiency formulation, as well as a CrosFDH-GA algorithm, is proposed for the task scheduling problem on the NoC-based MPSoC. Four common metrics, namely, makespan, routing energy, average link load, and workload balance, are used to construct the multi-input multioutput DMU model. After FDH simple efficiency is used to eliminate inefficient (dominated) schedules, the peer-appraisal FDH cross efficiency is introduced to rank schedules, during which the maverick (metric-unbalanced) schedules are depreciated. Then an FDH cross efficiency-based genetic algorithm with four subpopulations, each of which optimizes a single metric, is proposed for solving actual scheduling problems on NoC. According to our simulation results, the proposed FDH cross efficiency effectively distinguishes schedule solutions according to the balance of their metrics, and our CrosFDH-GA always outputs more metric-balanced schedules than the other global criterion GAs.

#### Conflict of Interests

The authors certify that there is no conflict of interests regarding the publication of this paper.

#### Acknowledgments

This work was supported by the National Natural Science Foundation of China (61201005), the Fundamental Research Funds for the Central Universities (No. ZYGX2011J006 and No. ZYGX2012J001), and the National Major Projects (2011ZX03003-003-04).

#### References

- A. S. Manne, “On the job-shop scheduling problem,” *Operations Research*, vol. 8, no. 2, pp. 219–223, 1960.
- M. H. Rothkopf, “Scheduling with random service times,” *Management Science*, vol. 12, no. 9, pp. 707–713, 1966.
- R. R. Muntz and E. G. Coffman Jr., “Preemptive scheduling of real-time tasks on multiprocessor systems,” *Journal of the ACM*, vol. 17, no. 2, pp. 324–338, 1970.
- T. Malone, “Enterprise: a market-like task scheduler for distributed computing environments,” *The Ecology of Computation*, pp. 177–205, 1988.
- X. He, X. Sun, and G. Von Laszewski, “QoS guided Min-Min heuristic for grid task scheduling,” *Journal of Computer Science and Technology*, vol. 18, no. 4, pp. 442–451, 2003.
- D. Geer, “Industry trends: chip makers turn to multicore processors,” *Computer*, vol. 38, no. 5, pp. 11–13, 2005.
- W. J. Dally and B. Towles, “Route packets, not wires: on-chip interconnection networks,” in *Proceedings of the 38th Design Automation Conference*, pp. 684–689, June 2001.
- W. Vereecken, W. Van Heddeghem, D. Colle, M. Pickavet, and P. Demeester, “Overall ICT footprint and green communication technologies,” in *Proceedings of the 4th International Symposium on Communications, Control, and Signal Processing (ISCCSP '10)*, pp. 1–6, March 2010.
- J. Huang, C. Buckl, A. Raabe, and A. Knoll, “Energy-aware task allocation for network-on-chip based heterogeneous multiprocessor systems,” in *Proceedings of the 19th International Euromicro Conference on Parallel, Distributed, and Network-Based Processing (PDP '11)*, pp. 447–454, February 2011.
- B. Ge, N. Jing, W. He, and Z. Mao, “Contention and energy aware mapping for real-time applications on Network-on-Chip,” in *Proceedings of the International Conference in SoC Design Conference (ISOCC '12)*, pp. 72–76, 2012.
- A. Charnes, W. W. Cooper, and E. Rhodes, “Measuring the efficiency of decision making units,” *European Journal of Operational Research*, vol. 2, no. 6, pp. 429–444, 1978.
- A. Merkel, J. Stoess, and F. Bellosa, “Resource-conscious scheduling for energy efficiency on multicore processors,” in *Proceedings of the 5th ACM EuroSys Conference on Computer Systems (EuroSys '10)*, pp. 153–166, April 2010.
- O. Sathappan, P. Chitra, P. Venkatesh, and M. Prabhu, “Modified genetic algorithm for multiobjective task scheduling on heterogeneous computing system,” *International Journal of Information Technology, Communications and Convergence*, vol. 1, no. 1, pp. 146–158, 2011.
- J. M. N. Abad, S. K. Shekofteh, H. Tabatabaee, and M. Mehrnejad, “CoreIIScheduler: scheduling tasks in a multi-core-based grid using NSGA-II technique,” in *Intelligent Informatics*, pp. 507–518, Springer, New York, NY, USA, 2013.
- H. F. Sheikh and I. Ahmad, “Fast algorithms for thermal constrained performance optimization in DAG scheduling on multi-core processors,” in *Proceedings of the International Green Computing Conference (IGCC '11)*, pp. 1–6, July 2011.
- A. J. Ruiz-Torres and F. J. López, “Using the FDH formulation of DEA to evaluate a multi-criteria problem in parallel machine scheduling,” *Computers and Industrial Engineering*, vol. 47, no. 2-3, pp. 107–121, 2004.
- F. Samman, T. Hollstein, and M. Glesner, “New theory for deadlock-free multicast routing in wormhole-switched virtual-channelless networks-on-chip,” *IEEE Transactions on Parallel and Distributed Systems*, vol. 22, no. 4, pp. 544–557, 2011.
- E. Bolotin, I. Cidon, R. Ginosar, and A. Kolodny, “QNoC: QoS architecture and design process for network on chip,” *Journal of Systems Architecture*, vol. 50, no. 2-3, pp. 105–128, 2004.
- T. T. Ye, L. Benini, and G. De Micheli, “Analysis of power consumption on switch fabrics in network routers,” in *Proceedings of the 39th Annual Design Automation Conference (DAC '02)*, pp. 524–529, June 2002.
- E. Carvalho, N. Calazans, and F. Moraes, “Heuristics for dynamic task mapping in NoC-based heterogeneous MPSoCs,” in *Proceedings of the 18th IEEE/IFIP International Workshop on Rapid System Prototyping (RSP '07)*, pp. 34–40, May 2007.
- M. Zakarya, N. Dilawar, M. A. Khattak, and H. Maqssod, “Energy efficient workload balancing algorithm for real-time tasks over multi-core,” *World Applied Sciences Journal*, vol. 22, no. 10, pp. 1431–1439, 2013.
- R. D. Banker, A. Charnes, and W. W. Cooper, “Some models for estimating technical and scale inefficiencies in data envelopment analysis,” *Management Science*, vol. 30, no. 9, pp. 1078–1092, 1984.
- D. Deprins, L. Simar, and H. Tulkens, “Measuring labor-efficiency in post offices,” in *Public Goods, Environmental Externalities and Fiscal Competition*, pp. 285–309, Springer, New York, NY, USA, 2006.
- J. Doyle and R. Green, “Efficiency and cross-efficiency in DEA: derivations, meanings and uses,” *Journal of the Operational Research Society*, vol. 45, no. 5, pp. 567–578, 1994.
- D. McFadden, “Cost, revenue, and profit functions,” in *History of Economic Thought Chapters*, vol. 1, 1978.
- P. J. Agrell and J. Tind, “A dual approach to nonconvex frontier models,” *Journal of Productivity Analysis*, vol. 16, no. 2, pp. 129–147, 2001.
- O. Sinnen, L. A. Sousa, and F. E. Sandnes, “Toward a realistic task scheduling model,” *IEEE Transactions on Parallel and Distributed Systems*, vol. 17, no. 3, pp. 263–275, 2006.
- A. Makhorin, “GLPK (GNU linear programming kit),” 2006.
- H. O. Fried, C. A. K. Lovell, and S. S. Schmidt, *The Measurement of Productive Efficiency and Productivity Growth*, Oxford University Press, Oxford, UK, 2008.
- R. P. Dick, D. L. Rhodes, and W. Wolf, “TGFF: task graphs for free,” in *Proceedings of the 1998 6th International Workshop on Hardware/Software Codesign*, pp. 97–101, March 1998.
- M.-Y. Wu and D. D. Gajski, “Hypertool: a programming aid for message-passing systems,” *IEEE Transactions on Parallel and Distributed Systems*, vol. 1, no. 3, pp. 330–343, 1990.
- S. Kim and J. Browne, “A general approach to mapping of parallel computation upon multiprocessor architectures,” in *Proceedings of the International Conference on Parallel Processing*, 1988.
- R. T. Marler and J. S. Arora, “Survey of multi-objective optimization methods for engineering,” *Structural and Multidisciplinary Optimization*, vol. 26, no. 6, pp. 369–395, 2004.
- C. Song, W. Chang, L. Yubai, and Y. Zhongming, “A NoC simulation and verification platform based on systemC,” in *Proceedings of the International Conference on Computer Science and Software Engineering (CSSE '08)*, pp. 423–426, December 2008.
- A. B. Kahng, L. Bin, L. Peh, and K. Samadi, “Orion 2.0: a fast and accurate NoC power and area model for early-stage design space exploration,” in *Proceedings of the conference on Design, Automation and Test in Europe (DATE '09)*, pp. 423–428, April 2009.