#### Abstract

The evaluation of multiobjective evolutionary algorithms (MOEAs) involves many metrics, so it can be treated as a multiple-criteria decision making (MCDM) problem. A framework is proposed to evaluate MOEAs, in which six MOEAs, five performance metrics, and two MCDM methods are used. An experimental study with thirteen benchmark functions is designed to validate the proposed framework. The experimental results indicate that the framework is effective in evaluating MOEAs.

#### 1. Introduction

Without loss of generality, a multiobjective optimization problem (MOP) can be expressed as

\[ \min F(x) = \bigl(f_1(x), f_2(x), \ldots, f_m(x)\bigr), \quad x \in \Omega, \]

where \(x\) is the decision vector in the decision space \(\Omega\) and \(F : \Omega \to \mathbb{R}^m\) is the vector of objective functions. Generally speaking, the objectives conflict with one another: no single solution optimizes all objectives simultaneously, and improving one objective often degrades at least one other.

Over the past two decades, many multiobjective evolutionary algorithms (MOEAs) have been proposed, such as the vector evaluated genetic algorithm (VEGA) [1], Pareto archived evolution strategy (PAES) [2], strength Pareto evolutionary algorithm (SPEA) [3], SPEA2 [4], Pareto envelope-based selection algorithm (PESA) [5, 6], nondominated sorting genetic algorithm (NSGA) [7], NSGAII [8], multiobjective evolutionary algorithm based on decomposition (MOEAD) [9, 10], indicator-based evolutionary algorithm (IBEA) [11], epsilon-multiobjective evolutionary algorithm (epsilon-MOEA) [12], multiobjective particle swarm optimizer (MOPSO) [13], speed-constrained multiobjective particle swarm optimizer (SMPSO) [14], generalized differential evolution (GDE3) [15], AbYSS [16], multiobjective symbiotic organism search (MOSOS) [17], multiobjective differential evolution algorithm (MODEA) [18], and grid-based adaptive MODE (GAMODE) [19]. These algorithms have made great contributions to the development of evolutionary algorithms and optimization approaches, and all of them try to drive the population towards the optimal Pareto front.

In single-objective optimization, the performance of an algorithm can be evaluated by the difference between the best value it finds and the known optimum of the function. However, this method cannot be applied to MOPs, so many criteria have been proposed to evaluate the performance of MOEAs. In fact, the experimental results of almost every algorithm indicate that the proposed algorithm is competitive with the state of the art. Nondominated objective space plots and box plots are adopted in SPEA2 [4]. NSGAII employs convergence and diversity metrics to compare with SPEA and PAES [8]. Set coverage and the inverted generational distance (IGD) are used to evaluate MOEAD [9]. The epsilon indicator is used in IBEA [11]. Convergence measurement, spread, hypervolume, and computational time are selected as performance metrics in epsilon-MOEA [12]. To validate MOPSO, four quantitative performance indexes (success counting, IGD, set coverage, and two-set difference hypervolume) and one qualitative index (plotting the Pareto fronts) are adopted [13]. Three quality indicators, the additive unary epsilon indicator, spread, and hypervolume, are considered in SMPSO [14]. Spacing and binary metrics are used in GDE3 [15]. Three metrics, generational distance (GD), spread, and hypervolume, are used to evaluate AbYSS [16]. GD, diversity, computational time, and box plots are considered as measurements in MOSOS [17]. GD and diversity metrics are adopted in MODEA [18]. Three metrics, GD, IGD, and hypervolume, are used in GAMODE [19].

Among these metrics, some focus on the convergence of MOEAs, while others pay attention to their diversity. Convergence measures the ability to attain the global Pareto front, and diversity measures the distribution of solutions along the Pareto front. It can be observed that each proposed algorithm typically introduces only a few metrics to assess its performance on benchmark problems, and the conclusion is invariably that the new algorithm is the best or highly competitive. However, it is unreliable to measure MOEA performance with only one or two metrics: each metric captures a specific aspect of performance while neglecting other information. For instance, GD provides information about the convergence of an MOEA but cannot evaluate its diversity. Such evaluations are therefore not comprehensive and cannot assess the overall performance of MOEAs. Since the evaluation of MOEAs involves many metrics, it can be regarded as a multiple-criteria decision making (MCDM) problem, and MCDM techniques can be used to address it. In order to make fair comparisons, a framework based on MCDM methods is proposed. In the framework, a comprehensive set of performance metrics is established, covering both convergence and diversity, and two MCDM methods are employed to evaluate six MOEAs. This yields fairer and more reliable comparisons than any single metric.

The rest of this paper is organized as follows. Section 2 proposes the framework, in which the six algorithms, five performance metrics, and two MCDM methods are briefly introduced. Experiments are presented in Section 3, and conclusions are drawn in Section 4.

#### 2. Evaluation Framework

The proposed framework for evaluating multiobjective algorithms is shown in Figure 1. Six MOEAs, five performance metrics, and two MCDM methods are employed in the framework.

##### 2.1. Six MOEAs

*(1) NSGAII [8]*. NSGAII was proposed to address the high computational complexity, lack of elitism, and need to specify a sharing parameter in NSGA. In NSGAII, a selection operator creates a mating pool by combining the parent and offspring populations. Fast nondominated sorting and crowding-distance ranking are also implemented in the algorithm.

*(2) PAES [2]*. The Pareto archived evolution strategy (PAES) is a simple evolutionary algorithm. It is a (1 + 1) evolution strategy, employing local search from a population of one but using a reference archive of previously found solutions in order to identify the approximate dominance ranking of the current and candidate solution vectors.

*(3) SPEA2 [4]*. The strength Pareto evolutionary algorithm (SPEA) was proposed in 1999 by Zitzler. Based on the SPEA, an improved version, namely, SPEA2, was proposed, which incorporated a fine-grained fitness assignment, a density estimation technique, and an enhanced archive truncation method.

*(4) MOEAD [9]*. The multiobjective evolutionary algorithm based on decomposition (MOEAD) was proposed by Zhang and Li. It decomposes a multiobjective optimization problem into a number of scalar optimization subproblems and optimizes them simultaneously. Each subproblem is optimized using information only from its neighboring subproblems, which makes the algorithm effective and efficient. The work won an outstanding paper award of the IEEE Transactions on Evolutionary Computation.

*(5) MOPSO [13]*. The multiobjective particle swarm optimizer (MOPSO) is based on Pareto dominance and uses a crowding factor to filter the list of available leaders. Different mutation operators act on different subdivisions of the swarm, and the epsilon-dominance concept is also incorporated in the algorithm.

*(6) SMPSO [14]*. The speed-constrained multiobjective PSO (SMPSO) was proposed in 2009. It applies a velocity constriction mechanism to particles whose velocity would otherwise become excessively high. A turbulence (mutation) factor is applied, and an external archive stores the nondominated solutions found during the search.

##### 2.2. Performance Metrics

Nowadays, there are many metrics for measuring the performance of MOEAs. Among them, the following five are widely employed; together they can reveal both the convergence and the diversity of MOEAs very well. However, many studies employ only a few of them to evaluate algorithms and argue that their proposed algorithms are the best. In fact, such a conclusion is unfair without comprehensive metrics and evaluations. Therefore, these five metrics are selected here to make comprehensive comparisons.

*(1) GD*. The generational distance is defined as

\[ \mathrm{GD} = \frac{\sqrt{\sum_{i=1}^{n} d_i^2}}{n}, \]

where \(d_i\) is the distance in objective space between the \(i\)th nondominated solution and the nearest solution on the true Pareto front, and \(n\) is the number of nondominated solutions. GD measures the closeness of the solutions to the real Pareto front. If GD equals zero, all the generated nondominated solutions lie on the real Pareto front; hence, a lower GD value indicates better performance [20].
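As a minimal sketch, GD can be computed for a two-objective problem as follows; the sampled front and solution set below are illustrative assumptions, not data from the paper.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two objective vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def gd(solutions, front):
    """Generational distance: sqrt(sum d_i^2) / n, where d_i is the
    distance from solution i to the nearest sampled true-front point."""
    d = [min(euclidean(s, p) for p in front) for s in solutions]
    return math.sqrt(sum(di ** 2 for di in d)) / len(solutions)

# Illustrative: a front sampled from f2 = 1 - sqrt(f1) (ZDT1-shaped)
front = [(i / 100, 1 - math.sqrt(i / 100)) for i in range(101)]
solutions = [(0.0, 1.0), (0.25, 0.5), (1.0, 0.0)]
print(gd(solutions, front))  # 0.0: every solution lies on the front
```

In practice the true front is only available as a finite sample, so GD depends on how densely the front is sampled.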

*(2) IGD*. Let \(P^*\) be a set of uniformly distributed points on the true Pareto front in the objective space and \(P\) the nondominated solution set obtained by an algorithm. The distance from \(P^*\) to \(P\) is defined as

\[ \mathrm{IGD}(P^*, P) = \frac{\sum_{v \in P^*} d(v, P)}{|P^*|}, \]

where \(d(v, P)\) is the minimum Euclidean distance between \(v\) and the points in \(P\). Algorithms with smaller IGD values are preferable [21, 22].
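A corresponding sketch for IGD, again with illustrative data; note that the roles of the two sets are reversed relative to GD, so IGD also penalizes gaps in coverage of the front:

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def igd(front_sample, solutions):
    """IGD: mean, over the reference points P*, of the minimum
    Euclidean distance to the obtained nondominated set P."""
    d = [min(euclidean(v, s) for s in solutions) for v in front_sample]
    return sum(d) / len(front_sample)

front_sample = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
# Each obtained solution sits 0.1 above the corresponding front point
solutions = [(0.0, 1.1), (0.5, 0.6), (1.0, 0.1)]
print(igd(front_sample, solutions))  # ~0.1
```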

*(3) Hypervolume*. The hypervolume metric calculates the volume (in the objective space) covered by the nondominated solution set obtained by an MOEA, where all objectives are to be minimized [16]. It can be calculated as

\[ \mathrm{HV} = \operatorname{volume}\Bigl(\bigcup_{i=1}^{n} v_i\Bigr), \]

where \(v_i\) is the hypercube spanned by the \(i\)th nondominated solution and a chosen reference point.

The larger the HV value is, the better the algorithm is.
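For two objectives, the union of hypercubes reduces to a sum of disjoint rectangles obtained by sweeping the sorted nondominated points. This sketch assumes minimization and a user-chosen reference point that is dominated by every solution:

```python
def hypervolume_2d(solutions, ref):
    """2-D hypervolume (minimization): area dominated by the
    nondominated set and bounded above by the reference point."""
    pts = sorted(solutions)            # ascending f1 => descending f2
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        # rectangle between this point, the previous f2 level, and ref
        hv += (ref[0] - f1) * (prev_f2 - f2)
        prev_f2 = f2
    return hv

# Illustrative nondominated set and reference point
print(hypervolume_2d([(1, 3), (2, 2), (3, 1)], ref=(4, 4)))  # 6.0
```

For more than two objectives, exact HV computation is substantially harder and dedicated algorithms (e.g., the WFG algorithm) are normally used.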

*(4) Spacing*. The spacing metric measures how uniformly the nondominated set is distributed. It can be formulated as

\[ S = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\bigl(\bar{d} - d_i\bigr)^2}, \]

where \(d_i\) is the same distance as in the GD metric, \(\bar{d}\) is the average of the \(d_i\), and \(n\) is the number of individuals in the nondominated set. The smaller the spacing, the better the algorithm performs [23, 24].

*(5) Maximum Pareto Front Error (MPFE)*. This metric measures the worst case and can be formulated as

\[ \mathrm{MPFE} = \max_{1 \le i \le n} d_i, \]

where \(d_i\) is the same distance as in GD; MPFE is the largest of these distances. The lower the MPFE value, the better the algorithm [25].

To illustrate the five metrics, Figure 2(a) shows the distances used in GD, spacing, and MPFE; Figure 2(b) presents the distances used in the IGD metric; and Figure 2(c) depicts the HV metric.

**(a) GD, spacing, and MPFE**

**(b) IGD**

**(c) HV**

##### 2.3. TOPSIS

TOPSIS is an MCDM method for evaluating alternatives. In TOPSIS, the best alternative has two characteristics: it is the farthest from the negative-ideal solution and the nearest to the positive-ideal solution. The negative-ideal solution maximizes the cost criteria and minimizes the benefit criteria; it consists of the worst values attainable on all criteria. The positive-ideal solution minimizes the cost criteria and maximizes the benefit criteria; it consists of the best values attainable on all criteria [26, 27]. TOPSIS consists of the following steps.

*Step 1 (obtain the decision matrix). *If the number of alternatives is \(J\) and the number of criteria is \(n\), a decision matrix \(X = (x_{ij})\) with \(n\) rows and \(J\) columns is obtained, as in Table 1.

In Table 1, \(x_{ij}\) is a value indicating the performance rating of the \(j\)th algorithm with respect to the \(i\)th criterion.

*Step 2 (normalize the decision matrix). *The normalized value \(r_{ij}\) is calculated as in (7):

\[ r_{ij} = \frac{x_{ij}}{\sqrt{\sum_{j=1}^{J} x_{ij}^{2}}}. \tag{7} \]

*Step 3 (calculate the weighted normalized decision matrix). *The weighted normalized matrix is obtained by multiplying the normalized decision matrix by the criterion weights:

\[ v_{ij} = w_i r_{ij}, \tag{8} \]

where \(w_i\) is the weight of the \(i\)th criterion and \(\sum_{i=1}^{n} w_i = 1\).

*Step 4 (find the negative-ideal and positive-ideal solutions). *

\[ A^{+} = \{v_1^{+}, \ldots, v_n^{+}\} = \Bigl\{\bigl(\max_j v_{ij} \mid i \in I'\bigr), \bigl(\min_j v_{ij} \mid i \in I''\bigr)\Bigr\}, \qquad A^{-} = \{v_1^{-}, \ldots, v_n^{-}\} = \Bigl\{\bigl(\min_j v_{ij} \mid i \in I'\bigr), \bigl(\max_j v_{ij} \mid i \in I''\bigr)\Bigr\}, \tag{9} \]

where \(I''\) is the set of cost criteria and \(I'\) is the set of benefit criteria.

*Step 5 (calculate the \(n\)-dimensional Euclidean distances). *The separation of each algorithm from the positive-ideal solution is

\[ D_j^{+} = \sqrt{\sum_{i=1}^{n} \bigl(v_{ij} - v_i^{+}\bigr)^2}, \tag{10} \]

and the separation from the negative-ideal solution is

\[ D_j^{-} = \sqrt{\sum_{i=1}^{n} \bigl(v_{ij} - v_i^{-}\bigr)^2}. \tag{11} \]

*Step 6 (calculate the relative closeness to the ideal solution). *The relative closeness of the \(j\)th algorithm is defined as

\[ CC_j = \frac{D_j^{-}}{D_j^{+} + D_j^{-}}. \tag{12} \]

*Step 7 (rank the algorithms). *\(CC_j\) lies between 0 and 1; the larger \(CC_j\) is, the better the algorithm.
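The seven steps can be sketched end to end as follows. The decision matrix, weights, and criterion directions here are illustrative assumptions (two criteria, three algorithms), not the paper's experimental data:

```python
import math

def topsis(matrix, weights, benefit):
    """Return the closeness coefficient CC_j for each alternative.

    matrix[i][j] -- rating of alternative j on criterion i (Step 1)
    weights[i]   -- criterion weights summing to 1
    benefit[i]   -- True for benefit criteria, False for cost criteria
    """
    n, J = len(matrix), len(matrix[0])
    # Step 2: vector normalization of each criterion row
    norms = [math.sqrt(sum(x ** 2 for x in row)) for row in matrix]
    # Step 3: weighted normalized matrix v_ij = w_i * r_ij
    v = [[weights[i] * matrix[i][j] / norms[i] for j in range(J)]
         for i in range(n)]
    # Step 4: positive- and negative-ideal solutions
    pos = [max(v[i]) if benefit[i] else min(v[i]) for i in range(n)]
    neg = [min(v[i]) if benefit[i] else max(v[i]) for i in range(n)]
    # Steps 5-6: Euclidean separations and relative closeness
    cc = []
    for j in range(J):
        dp = math.sqrt(sum((v[i][j] - pos[i]) ** 2 for i in range(n)))
        dn = math.sqrt(sum((v[i][j] - neg[i]) ** 2 for i in range(n)))
        cc.append(dn / (dp + dn))
    return cc

# Illustrative: GD (cost) and HV (benefit) for three algorithms
matrix = [[0.01, 0.05, 0.02],   # GD: smaller is better
          [0.90, 0.60, 0.80]]   # HV: larger is better
cc = topsis(matrix, weights=[0.5, 0.5], benefit=[False, True])
print(cc)  # Step 7: the largest CC_j ranks first
```

Here algorithm 0 is best on both criteria, so it coincides with the positive-ideal solution and obtains \(CC = 1\); algorithm 1 coincides with the negative-ideal solution and obtains \(CC = 0\).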

##### 2.4. VIKOR Method

The VIKOR was proposed by Opricovic and Tzeng [28–31]. The method is developed to rank and select from a set of alternatives. The multicriteria ranking index is introduced based on the idea of closeness to the ideal solutions. The VIKOR requires the following steps.

*Step 1. *Determine the best value \(f_i^{*}\) and the worst value \(f_i^{-}\) of each criterion \(i = 1, \ldots, n\). For benefit criteria,

\[ f_i^{*} = \max_j f_{ij}, \qquad f_i^{-} = \min_j f_{ij}, \]

and conversely for cost criteria, where \(f_{ij}\) is the value of the \(i\)th criterion for alternative \(j\), \(n\) is the number of criteria, and \(J\) is the number of alternatives.

*Step 2. *\(S_j\) and \(R_j\) can be formulated as

\[ S_j = \sum_{i=1}^{n} w_i \frac{f_i^{*} - f_{ij}}{f_i^{*} - f_i^{-}}, \qquad R_j = \max_i \Bigl[ w_i \frac{f_i^{*} - f_{ij}}{f_i^{*} - f_i^{-}} \Bigr], \]

where \(w_i\) is the weight of the \(i\)th criterion. \(S_j\) and \(R_j\) are employed to measure the ranking.

*Step 3. *Compute the values \(Q_j\) as

\[ Q_j = v\,\frac{S_j - S^{*}}{S^{-} - S^{*}} + (1 - v)\,\frac{R_j - R^{*}}{R^{-} - R^{*}}, \]

where \(S^{*} = \min_j S_j\), \(S^{-} = \max_j S_j\), \(R^{*} = \min_j R_j\), and \(R^{-} = \max_j R_j\). The alternative obtained by \(\min_j S_j\) has maximum group utility, the alternative obtained by \(\min_j R_j\) has minimum individual regret of the opponent, and \(v\) is the weight of the strategy of the majority of criteria, often set to 0.5.

*Step 4. *Rank the alternatives by each of the three measures \(S\), \(R\), and \(Q\) separately; in each list, the alternative with the smallest value ranks first.

*Step 5. *The alternative \(A^{(1)}\) ranked first by \(Q\) is considered the best if the following two conditions are met. C1 (acceptable advantage): \(Q(A^{(2)}) - Q(A^{(1)}) \ge DQ\), where \(A^{(2)}\) is the alternative in second position of the ranking list by \(Q\) and \(DQ = 1/(J-1)\), with \(J\) the number of alternatives. C2 (acceptable stability): \(A^{(1)}\) must also be the best ranked by \(S\) or \(R\).
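The five VIKOR steps can likewise be sketched in a few lines; the matrix, weights, and criterion directions are the same illustrative assumptions as in the TOPSIS example, not the paper's data:

```python
def vikor(matrix, weights, benefit, v=0.5):
    """Return (S, R, Q) lists for the alternatives in the columns."""
    n, J = len(matrix), len(matrix[0])
    # Step 1: best f* and worst f- value of each criterion
    best = [max(matrix[i]) if benefit[i] else min(matrix[i]) for i in range(n)]
    worst = [min(matrix[i]) if benefit[i] else max(matrix[i]) for i in range(n)]
    # Step 2: group utility S_j and individual regret R_j
    S, R = [], []
    for j in range(J):
        terms = [weights[i] * (best[i] - matrix[i][j]) / (best[i] - worst[i])
                 for i in range(n)]
        S.append(sum(terms))
        R.append(max(terms))
    # Step 3: compromise measure Q_j (v weights the majority strategy)
    Q = [v * (S[j] - min(S)) / (max(S) - min(S))
         + (1 - v) * (R[j] - min(R)) / (max(R) - min(R)) for j in range(J)]
    return S, R, Q

matrix = [[0.01, 0.05, 0.02],   # GD: cost criterion
          [0.90, 0.60, 0.80]]   # HV: benefit criterion
S, R, Q = vikor(matrix, weights=[0.5, 0.5], benefit=[False, True])
print(Q)                 # Steps 4-5: smallest Q ranks first
DQ = 1.0 / (len(Q) - 1)  # acceptable-advantage threshold for C1
print(sorted(Q)[1] - sorted(Q)[0] >= DQ)
```

For cost criteria the ratio \((f_i^{*} - f_{ij})/(f_i^{*} - f_i^{-})\) has a negative numerator and denominator, so each term still falls in \([0, 1]\).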

#### 3. Experiments

The experiments are designed to evaluate the six algorithms above. To make fair comparisons, thirteen benchmark functions that are widely used in MOP studies are employed. They fall into two groups, the ZDT suite and the WFG suite, and all objectives in both suites are to be minimized. Detailed information is given in Table 2 [32, 33].

The mathematical forms of the WFG suite can be obtained from [32], and those of the ZDT suite from [33].

The parameter settings of the algorithms follow their original papers. The maximum number of function evaluations is set to 25,000. Each algorithm is run thirty times, and the average values of the performance metrics are reported.

##### 3.1. Results

To illustrate the whole calculation process, the ZDT1 results for the five metrics are presented in Table 3. The four metrics GD, IGD, MPFE, and spacing of SMPSO are the smallest, and its hypervolume is the biggest. PAES is the worst because its values for all five metrics are the worst among the six algorithms. The normalized decision matrix of the five performance metrics is presented in Table 4. Suppose each weight equals 1/5. Then, according to Table 4, the positive-ideal and negative-ideal solutions can be determined, and the distances \(D_j^{+}\) and \(D_j^{-}\) are calculated according to (10) and (11), as shown in Table 5. The global performance of each algorithm is determined by \(CC_j\), calculated by (12) and also presented in Table 5. The resulting ranking of the six algorithms is SMPSO > SPEA2 > MOPSO > NSGAII > MOEAD > PAES: SMPSO is the best algorithm and PAES is the worst for ZDT1.

For the VIKOR method, the \(Q\), \(S\), and \(R\) values are calculated and presented in Table 6. According to these values, SMPSO is the best and PAES is the worst, and SPEA2 is better than MOEAD. However, as condition C1 cannot be satisfied, the \(S\) values are used to determine the ranking among NSGAII, SPEA2, MOEAD, and MOPSO. The resulting ranking of the six algorithms is SMPSO > SPEA2 > MOPSO > NSGAII > MOEAD > PAES.

Tables 7 and 8 give the complete rankings of the TOPSIS and VIKOR methods for all benchmark functions. Over the thirteen test functions, NSGAII wins on one problem, WFG1, and performs well on ZDT3, WFG2, WFG4, and WFG8. SPEA2 wins on five problems (ZDT3, WFG2, WFG4, WFG7, and WFG8) and performs well on ZDT1, ZDT2, WFG1, WFG3, and WFG9, though it achieves a poor result on WFG6. MOEAD obtains the best result on ZDT6 and performs well on WFG5. MOPSO wins on one problem, WFG3, and achieves good results on ZDT6 and WFG6. SMPSO provides the best performance on five problems: ZDT1, ZDT2, WFG5, WFG6, and WFG9.

The TOPSIS and VIKOR methods produce the same rankings for ZDT1, ZDT2, ZDT3, ZDT6, WFG2, WFG3, WFG4, WFG5, WFG8, and WFG9.

However, the TOPSIS and VIKOR methods produce different rankings for WFG1, WFG6, and WFG7. Take WFG1 as an instance. The final TOPSIS and VIKOR values are presented in Table 9. As there are six algorithms, \(J = 6\) and \(DQ = 1/(J-1) = 0.2\), so the \(Q\) value difference between two algorithms must be more than 0.2; otherwise, the rank between the two algorithms is determined by \(S\) or \(R\).

From Table 9, it can be seen that this condition is not met between NSGAII and SPEA2, so their \(S\) values are compared. The \(S\) value of NSGAII is smaller than that of SPEA2, so under VIKOR NSGAII is better and ranks first, with SPEA2 second. TOPSIS, however, directly uses \(CC\) as the ranking criterion; the \(CC\) value of SPEA2 is larger than that of NSGAII, so under TOPSIS SPEA2 ranks first and NSGAII second.

##### 3.2. Discussion

To make further comparisons, the best- and worst-performing of the six algorithms are selected, and the nondominated solutions obtained by these two kinds of algorithms are depicted in Figures 3–5. For WFG1, NSGAII and SPEA2 achieve the best ranking according to the VIKOR and TOPSIS methods, respectively, so the nondominated solutions of both algorithms are shown in Figure 4.

It can be clearly observed that the better algorithm achieves nondominated solutions uniformly distributed along the Pareto front, whereas the worse algorithm obtains nondominated solutions with poor convergence and diversity.

ZDT1, ZDT2, WFG5, WFG6, and WFG9 share common features: they have no local Pareto fronts, and their Pareto fronts are continuous. SMPSO constrains particle velocities that would otherwise become excessively high, applies a turbulence factor, and maintains an external archive of the nondominated solutions found during the search. These mechanisms help the population move quickly towards the Pareto front on problems of this type, which have no local Pareto fronts and continuous Pareto fronts.

ZDT3 and WFG4 have discrete Pareto fronts, and SPEA2 achieves the best performance on both. Thus, if a problem has a discrete Pareto front, SPEA2 is a good choice.

ZDT6 is a biased function, as its first objective takes larger values than the second. MOEAD obtains superior results on it, so if a problem has this feature, MOEAD should be chosen.

From Tables 7 and 8, the no-free-lunch theorem can also be observed: any improvement of an optimization algorithm over one class of problems is exactly paid for by a loss over another class. No algorithm achieves the best, or the worst, performance on all test functions.

#### 4. Conclusions

Many MOEAs exist. When a new multiobjective optimization algorithm is proposed, the experimental results often indicate that it is competitive on the basis of only one or two performance metrics. Generally, such comparisons are unfair and their conclusions unreliable. In order to make fair comparisons and rank MOEAs, a framework is proposed to evaluate them. The framework employs six well-known MOEAs, five performance metrics, and two MCDM methods. The six MOEAs are NSGAII, PAES, SPEA2, MOEAD, MOPSO, and SMPSO. The five performance metrics are GD, IGD, MPFE, spacing, and hypervolume, which together cover both the convergence and the diversity of the nondominated solutions. The two MCDM methods are TOPSIS and VIKOR.

The results indicate that SPEA2 is the best algorithm overall and PAES the worst. However, SPEA2 does not perform well on all test functions, nor does PAES achieve the worst performance on all of them; the experimental results are consistent with the no-free-lunch theorem. Moreover, the results show that the ability of an MOEA to solve an MOP depends on both the algorithm and the features of the problem.

#### Conflicts of Interest

The authors declare that they have no conflicts of interest.

#### Acknowledgments

The authors thank the Natural Science Foundation of China (Grant nos. 71503134, 91546117, and 71373131), the Key Project of the National Social and Scientific Fund Program (16ZDA047), and the Philosophy and Social Sciences in Universities of Jiangsu program (Grant no. 2016SJB630016).