Mathematical Problems in Engineering

Volume 2017, Article ID 6439631, 20 pages

https://doi.org/10.1155/2017/6439631

## Latest Stored Information Based Adaptive Selection Strategy for Multiobjective Evolutionary Algorithm

Air Force Engineering University, Xi’an, China

Correspondence should be addressed to Jiale Gao; gaojiale_kgd@163.com

Received 4 July 2017; Revised 18 October 2017; Accepted 15 November 2017; Published 17 December 2017

Academic Editor: Salvatore Alfonzetti

Copyright © 2017 Jiale Gao et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Adaptive operator selection (AOS) and adaptive parameter control are widely used to enhance the search power of many multiobjective evolutionary algorithms. This paper proposes a novel adaptive selection strategy with bandits for the multiobjective evolutionary algorithm based on decomposition (MOEA/D), named latest stored information based adaptive selection (LSIAS). The strategy adopts an improved upper confidence bound (UCB) method, in which the operator usage rate and the abandonment of extreme fitness improvements are introduced to improve the performance of UCB. The strategy uses a sliding window to store recent valuable information about operators, such as factors, probabilities, and efficiency. Four commonly used DE operators are chosen for the AOS, and two kinds of assist information on operator are selected to improve the operators' search power. The operator information is updated with the help of LSIAS, and the resulting algorithmic combination is called MOEA/D-LSIAS. Compared with some well-known MOEA/D variants, LSIAS demonstrates superior robustness and fast convergence on various multiobjective optimization problems. The comparative experiments also demonstrate the improved search power of operators with different assist information on different problems.

#### 1. Introduction

Multiobjective optimization is a common problem that scientists and engineers face; it concerns optimizing problems with multiple and often conflicting objectives. In principle, a multiobjective optimization problem (MOP) has no single solution but a set of Pareto-optimal solutions. This paper considers the following continuous MOP:

$$\min F(x) = (f_1(x), \ldots, f_m(x))^T, \quad \text{s.t. } x \in \Omega, \tag{1}$$

where $x = (x_1, \ldots, x_n)^T$ is an *n*-dimensional decision variable vector, $\Omega$ is the decision space, and $F$ consists of *m* real-valued continuous objective functions [1, 2].

Over the past decades, a number of multiobjective evolutionary algorithms (MOEAs) have been proposed. The first MOEA, named vector evaluated genetic algorithm, has been used for MOPs since the 1980s [3], and since then more and more attention has been paid to MOEAs. The first generation of intelligent algorithms for MOPs, represented by NSGA [4], NPGA [5], and MOGA [6], was presented in the 1990s. In the 2000s, the second generation appeared; the renowned ones include NSGA-II [7], SPEA-II [8], MOEA/D [9], and some biologically inspired algorithms, such as MOPSO [2], MOIA [10], and MOACO [11]. These algorithms are usually classified into three categories: Pareto-based methods [4–8], indicator-based methods [12], and decomposition-based methods [9]. The first evaluates individuals based on nondominated ranking. The second integrates convergence and diversity into a single indicator to guide evolution. The last decomposes an MOP into a set of single-objective subproblems and evaluates solutions with regard to a reference vector.

A major difficulty for most MOEAs is how to improve search efficiency, that is, how to improve the operators so that they search a high-dimensional space more effectively. There are two ways to do so: enhancing an operator with adaptive parameter control and using multiple operators with adaptive operator selection (AOS). The simulated binary crossover (SBX) [13] and the differential evolution (DE) operator [14] are widely adopted, and they are usually combined with the binomial crossover or the polynomial mutation, as these combinations can bring powerful search ability [15, 16]. Composite operators [17] and variants [18] still have limitations when searching complex problems. Consequently, adaptive control strategies are increasingly considered indispensable. In [19], the best individuals are selected as donor vectors for the DE, and the parameters of every mutation vary within the range of the statistical data of each generation. Zhu et al. [17] design a novel recombination operator possessing the advantages of both the DE and the SBX; its adaptive parameter control strategy allocates probabilities to SBX and DE according to the search period. Zhao et al. [20], in turn, suggest that, within the MOEA/D framework, different neighborhood sizes have an unavoidable influence on the search power of operators, and their experiments imply that adaptive selection of neighborhood sizes works very well.

The main purpose of adaptive parameter control is to address the exploration versus exploitation (EvE) dilemma. Exploitation means searching the local space deeply, that is, making full use of the current operator with its current parameters. Exploration, by contrast, refers to the search power in unfamiliar areas, which is provided by other operators or by the current operator configured with different parameters. The EvE dilemma can therefore be viewed as looking for a tradeoff between the search power in unfamiliar areas and that in familiar areas.

The EvE dilemma is the main motivation for AOS methods. It has been intensively studied in the game theory community in the context of the multiarmed bandit (MAB) problem, which was first proposed in 1952 [21]. The upper confidence bound (UCB) selection strategy has been used to address the EvE dilemma since 1994 [22]. It has distinctive advantages among many AOSs, and a host of improved UCB versions have appeared since. The application of the UCB strategy to the MAB problem has been widely recognized, and in the following UCB is referred to as the MAB algorithm for convenience. DMAB [23] combines the MAB algorithm with two statistical tests and suggests that the operator selection is sensitive to any change of the reward distribution. SlMAB [24] uses a sliding time window to store recent rewards and applied operators in a first-in, first-out (FIFO) manner. Compared with DMAB, SlMAB requires less assessment and fewer parameters. Furthermore, two rank-based methods, the Area Under Curve and the Sum of Ranks, are presented for assigning credits [25]. Inspired by SlMAB and rank-based credit assignment, Li et al. [16] suggest that a decay factor is useful for credit assignment and that it can improve the selection probability of the best operator. Two modified MAB methods, called UCB-Tuned and UCB-V, use the reward variance as a parameter to obtain a better EvE tradeoff [26].

Whether one enhances an operator with adaptive parameter control or uses multiple operators with AOS, statistical data about the operators is essential. Inspired by the sliding time window, we present a novel adaptive method that stores information about the operators, called the latest stored information based adaptive selection (LSIAS) strategy. The stored information includes operator names, operator efficacies, and operator parameters such as neighborhood sizes, scaling factors, and other parameters (for convenience, these parameters are referred to as assist information on operator). In this paper, two kinds of assist information are taken into account, and they are used within MOEA/D with dynamical resource allocation (MOEA/D-DRA) [27], which won the CEC 2009 MOEA contest. A decomposition-based algorithm is chosen mainly because each decomposed subproblem is a single-objective problem, which readily gives an exact value for measuring the performance of an operator each time it is applied. To validate the effectiveness and robustness of LSIAS, 22 well-known benchmark problems, such as the ZDT problems [28], DTLZ problems [29], and UF problems [30], are adopted. MOEA/D-LSIAS is compared experimentally with several versions of MOEA/D, for example, MOEA/D-DE [31], MOEA/D-DRA [27], MOEA/D-FRRMAB [16], MOEA/D-STM [32], MOEA/D-UCB-T [21], and MOEA/D-AGR [33].

The rest of this paper is organized as follows. The background and some works regarding AOS and adaptive parameter control are described in Section 2. A detailed description of the LSIAS strategy is given in Section 3, and its use with MOEA/D-DRA is presented in Section 4. Section 5 analyzes the results of the comparison experiments. Finally, conclusions are summarized and further work on adaptive selection is discussed.

#### 2. Related Background

##### 2.1. Tchebycheff Approach in MOEA/D

MOEA/D decomposes an MOP into a series of single-objective subproblems, which makes it suitable for evaluating the performance of operators. There are three common decomposition methods: the weighted sum approach, the Tchebycheff approach, and the boundary intersection method, all of which are described in [9]. This paper employs the Tchebycheff approach, which takes the form

$$\min g^{te}(x \mid \lambda, z^*) = \max_{1 \le i \le m} \{\lambda_i \, |f_i(x) - z_i^*|\}, \quad \text{s.t. } x \in \Omega, \tag{2}$$

where $z^* = (z_1^*, \ldots, z_m^*)^T$ is the reference point and $z_i^* = \min\{f_i(x) \mid x \in \Omega\}$ for each $i = 1, \ldots, m$. For each Pareto optimal point $x^*$ there exists a weight vector $\lambda$ such that $x^*$ is the optimal solution of (2), and each optimal solution of (2) is a Pareto optimal solution of (1). Therefore, one is able to obtain different Pareto optimal solutions by altering the weight vector.
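As an illustration of how the Tchebycheff scalarization turns a vector of objective values into a single fitness value, here is a minimal Python sketch. The function name `tchebycheff`, the candidate objective vectors, and the weight vectors are hypothetical and chosen only for this example; NumPy is assumed to be available.

```python
import numpy as np

def tchebycheff(f, weights, z_star):
    """Tchebycheff value: max_i weights[i] * |f[i] - z_star[i]|."""
    f = np.asarray(f, dtype=float)
    weights = np.asarray(weights, dtype=float)
    z_star = np.asarray(z_star, dtype=float)
    return float(np.max(weights * np.abs(f - z_star)))

# Hypothetical two-objective data: two candidate objective vectors
# and an ideal (reference) point at the origin.
z_star = [0.0, 0.0]
candidates = {"a": [0.2, 0.8], "b": [0.5, 0.5]}

# Changing the weight vector changes which candidate the subproblem prefers.
preferred = {}
for lam in ([0.9, 0.1], [0.5, 0.5]):
    best = min(candidates, key=lambda k: tchebycheff(candidates[k], lam, z_star))
    preferred[tuple(lam)] = best
```

With these illustrative numbers, the subproblem weighted toward the first objective prefers candidate `a`, while the evenly weighted subproblem prefers candidate `b`, which is the sense in which altering the weight vector yields different Pareto optimal solutions.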

##### 2.2. Basic Differential Evolution Operator

Differential evolution (DE) is a parallel direct search method with many mutation strategies. Several frequently used DE operators [15, 34–36] are the following:

(i) DE/rand/1

(ii) DE/rand/2

(iii) DE/target-to-rand/1

(iv) DE/target-to-rand/2

(v) DE/best/1

(vi) DE/best/2

(vii) DE/target-to-best/1

In these operators, $x_i$ is the target vector and $v_i$ is the mutant vector, both of which belong to the *i*th subproblem, and $x_{best}$ is one of the best individual vectors in the population. The scale factors *F* and *K* control the influence of the difference vectors on the mutant vector. The donor vectors $x_{r_1}$, $x_{r_2}$, $x_{r_3}$, $x_{r_4}$, and $x_{r_5}$ are mutually different individuals.

After the differential mutation, the crossover operation and the polynomial mutation are usually applied to the mutant vector $v_i$. The crossover operation decides, for each dimension of the new solution, whether the value is taken from $x_i$ or from $v_i$. DE usually works with the binomial crossover [14], and the trial vector $u_i$ is formed as follows:

$$u_{i,j} = \begin{cases} v_{i,j}, & \text{if } r \le CR \text{ or } j = j_{\text{rand}}, \\ x_{i,j}, & \text{otherwise}, \end{cases} \tag{3}$$

where *r* is a random number in the range $[0, 1]$, CR is the crossover rate and has the same range as *r*, $j = 1, \ldots, n$, and $j_{\text{rand}}$ is a random integer within the range $[1, n]$.

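The polynomial mutation then perturbs $u_i$ further to generate a new solution $y_i$. It is defined as follows:

$$y_{i,j} = \begin{cases} u_{i,j} + \delta_j (b_j - a_j), & \text{if } \text{rand} \le p_m, \\ u_{i,j}, & \text{otherwise}, \end{cases} \tag{4}$$

where $a_j$ and $b_j$ are the lower and upper bounds of the *j*th variable, respectively, $p_m$ is the mutation probability, and $\text{rand}$ is a random number within the range $[0, 1]$. $\delta_j$ is a mutation factor and is obtained by

$$\delta_j = \begin{cases} (2\,\text{rand})^{1/(\eta+1)} - 1, & \text{if } \text{rand} < 0.5, \\ 1 - (2 - 2\,\text{rand})^{1/(\eta+1)}, & \text{otherwise}, \end{cases} \tag{5}$$

where $\text{rand}$ is a random number within the range $[0, 1]$ and $\eta$ is the distribution index, which is usually set to 20.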
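The three steps described above (differential mutation, binomial crossover, polynomial mutation) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: DE/rand/1 is written in its common textbook form $v = x_{r_1} + F(x_{r_2} - x_{r_3})$, the polynomial mutation follows the commonly used two-branch form, and clipping to the bounds is an added repair step. All function names are hypothetical.

```python
import numpy as np

def de_rand_1(pop, i, F, rng):
    """Textbook DE/rand/1 (assumed form): v = x_r1 + F * (x_r2 - x_r3)."""
    others = [k for k in range(len(pop)) if k != i]
    r1, r2, r3 = rng.choice(others, size=3, replace=False)
    return pop[r1] + F * (pop[r2] - pop[r3])

def binomial_crossover(x, v, CR, rng):
    """Take v[j] where r_j <= CR or j == j_rand; otherwise keep x[j]."""
    n = len(x)
    j_rand = rng.integers(n)            # forces at least one gene from v
    mask = rng.random(n) <= CR
    mask[j_rand] = True
    return np.where(mask, v, x)

def polynomial_mutation(u, lower, upper, p_m, eta, rng):
    """Perturb each variable with probability p_m using distribution index eta."""
    rand = rng.random(len(u))
    delta = np.where(
        rand < 0.5,
        (2.0 * rand) ** (1.0 / (eta + 1.0)) - 1.0,
        1.0 - (2.0 - 2.0 * rand) ** (1.0 / (eta + 1.0)),
    )
    mutate = rng.random(len(u)) < p_m
    y = np.where(mutate, u + delta * (upper - lower), u)
    return np.clip(y, lower, upper)     # assumption: repair by clipping

# Hypothetical usage on a random population in the unit box.
rng = np.random.default_rng(0)
n = 5
lower, upper = np.zeros(n), np.ones(n)
pop = rng.random((10, n))
v = de_rand_1(pop, i=0, F=0.5, rng=rng)
u = binomial_crossover(pop[0], v, CR=1.0, rng=rng)
y = polynomial_mutation(u, lower, upper, p_m=1.0 / n, eta=20.0, rng=rng)
```

With `CR=1.0` the trial vector coincides with the mutant vector; smaller values of `CR` keep more components of the target vector.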

The description above is a common DE mutation process. It has been validated that these DE operators have different search powers. “DE/rand/1” and “DE/rand/2” show strong exploration performance. “DE/target-to-rand/1,” “DE/best/1,” and “DE/best/2” show strong exploitation performance and are useful for unimodal problems. However, “DE/target-to-rand/1” is more suitable for rotated problems than other DE operators [19]. Accordingly, a combination of multiple DE operators could offer strong search power for most MOPs.

##### 2.3. Adaptive Operator Selection

Each operator has a different search power. Although self-adaptive parameter adjustment improves the search power of an operator, the improvement is bounded by the best performance that operator can achieve. AOS offers stronger search power by choosing different operators for different situations.

There are mainly two kinds of AOS methods: probability-based methods, such as probability matching (PM) [37] and adaptive pursuit (AP) [38], and bandit-based methods [39]. All AOS methods share a similar process, which consists of credit assignment based on the applied operators and operator selection based on the accumulated credit. Nevertheless, the details of selection and credit accumulation differ.

###### 2.3.1. Credit Assignment

Two credit assignment methods are often used. One evaluates dynamic statistical information about operators, and the other evaluates search power using various complex statistics to detect the production of outliers. The former takes recent assist information on operator into account: some of it is employed as rewards, which decide the credit assignment of operators [24], and a rank factor is used to increase the use frequency of better operators. The latter involves a more complex calculation that considers two measures, fitness and diversity. In [40], the evaluation criterion depends on the probability that outlier solutions appear. This criterion does not regard fitness as the only criterion, as the authors argue that infrequent but powerful operators are as significant as frequent but powerless operators. A density estimator is adopted as the evaluation criterion in [41, 42], and a statistical method is added in [42]. This method calculates the normalized relative fitness improvements from successful operators and regards the mean improvement brought by an operator as its credit. In [26], four credit assignment methods are adopted: Average Absolute Reward, Extreme Absolute Reward, Average Normalized Reward, and Extreme Normalized Reward. All the rewards of each method are evaluated, and the method with the maximum probability is chosen as the credit assignment method for the current generation.

###### 2.3.2. Operator Selection

Assume that there are *K* different operators, and that $P_t = (p_{1,t}, \ldots, p_{K,t})$ and $q_{i,t}$ are the probability vector and the estimate of the *i*th operator reward, respectively.

*Probability Matching*

$$q_{i,t+1} = (1 - \alpha)\, q_{i,t} + \alpha\, r_{i,t}, \qquad p_{i,t+1} = p_{\min} + (1 - K p_{\min}) \frac{q_{i,t+1}}{\sum_{j=1}^{K} q_{j,t+1}}, \tag{6}$$

where *r* is the reward of the selected and successfully applied operator and *t* is the time point. $\alpha$ is a decay factor that reduces the influence of the accumulated reward of previously used operators. In this method, the worst probability is $p_{\min}$ and the best probability is $p_{\max} = 1 - (K - 1) p_{\min}$, so each operator has a nonzero probability of being chosen during the whole search process. As every operator performs differently in different phases, selection with nonzero probabilities is well suited to AOS and shows superior robustness.
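To make the probability matching update concrete, the following Python sketch implements one step in a common textbook form: an exponentially decayed reward estimate followed by probabilities bounded below by a minimum value. The function name, the uniform fallback when all estimates are zero, and the numbers in the usage lines are assumptions made for illustration.

```python
import numpy as np

def probability_matching_step(q, p_min, alpha, op, reward):
    """Update the reward estimate of operator `op`, then all probabilities."""
    q = np.array(q, dtype=float)
    q[op] = (1.0 - alpha) * q[op] + alpha * reward
    K = len(q)
    total = q.sum()
    if total <= 0.0:
        # Assumption: fall back to a uniform distribution without information.
        return q, np.full(K, 1.0 / K)
    p = p_min + (1.0 - K * p_min) * q / total
    return q, p

# Hypothetical usage: four operators, operator 2 has just earned a reward of 3.
q, p = probability_matching_step(
    [1.0, 1.0, 1.0, 1.0], p_min=0.05, alpha=0.5, op=2, reward=3.0
)
```

In this example the rewarded operator receives the largest probability, yet no operator falls below `p_min`, which is the nonzero-probability property discussed above.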

*Adaptive Pursuit*. In this method, the reward estimate $q_{i,t}$ is the same as in (6). When the best operator is successfully applied, it gets a relatively better reward. Its accumulated credit is thereby enlarged, which is why this selection strategy always chooses the best operator, denoted $i^*$.
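The winner-take-all behavior of adaptive pursuit can be illustrated with a short Python sketch. It follows a common textbook form in which the probability of the operator with the highest reward estimate is pushed toward a maximum value and all others toward a minimum value; the learning rate `beta`, the function name, and the numbers are assumptions for illustration rather than values from this paper.

```python
import numpy as np

def adaptive_pursuit_step(p, q, p_min, beta):
    """Move the best operator's probability toward p_max, the others toward p_min."""
    p = np.array(p, dtype=float)
    K = len(p)
    p_max = 1.0 - (K - 1) * p_min
    best = int(np.argmax(q))            # index of the currently best operator
    target = np.full(K, p_min)
    target[best] = p_max
    return p + beta * (target - p)

# Hypothetical usage: operator 2 keeps the highest reward estimate.
p = np.full(4, 0.25)
for _ in range(50):
    p = adaptive_pursuit_step(p, q=[1.0, 1.0, 2.0, 1.0], p_min=0.05, beta=0.3)
```

After repeated steps the probability vector approaches `p_max` for the best operator and `p_min` for the rest, so the best operator is chosen almost all the time.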

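*Multiarmed Bandit*

$$op_t = \arg\max_{i = 1, \ldots, K} \left( q_{i,t} + C \sqrt{\frac{2 \ln \sum_{j=1}^{K} n_{j,t}}{n_{i,t}}} \right), \tag{9}$$

where $q_{i,t}$ is the same as that in (6), $n_{i,t}$ is the number of successful applications of the *i*th operator at time point *t*, and *C* is a scaling factor that controls the tradeoff between different search powers. In the MAB algorithm, each operator corresponds to an arm.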
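The bandit-based selection rule can be sketched in Python as below. The sketch uses the standard UCB1 form, in which the score of an operator is its reward estimate plus an exploration term scaled by *C*; selecting any operator that has not been used yet is an added convention, and the function name and numbers are hypothetical.

```python
import math

def ucb_select(q, n, C):
    """Return the index maximizing q[i] + C * sqrt(2 * ln(sum(n)) / n[i])."""
    for i, n_i in enumerate(n):
        if n_i == 0:
            return i                    # assumption: try unused operators first
    total = sum(n)
    scores = [
        q_i + C * math.sqrt(2.0 * math.log(total) / n_i)
        for q_i, n_i in zip(q, n)
    ]
    return max(range(len(q)), key=scores.__getitem__)

# Hypothetical usage: operator 1 is slightly worse than operator 0 but rarely used.
q = [0.5, 0.4, 0.1]
n = [50, 5, 45]
greedy_choice = ucb_select(q, n, C=0.0)     # pure exploitation
explore_choice = ucb_select(q, n, C=1.0)    # exploration term included
```

With `C=0.0` the rule exploits the operator with the highest estimate, whereas with `C=1.0` the rarely used operator wins because its exploration term dominates.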

Each operator gets a different credit after credit assignment, which is the key to operator selection. As (9) shows, the selection depends on two factors: the credit value of the operators ($q_{i,t}$) and the usage number of the operators (the term multiplied by the parameter *C*). The parameter *C* decides which factor plays the more important role. SlMAB [24] uses a sliding window with a FIFO mechanism to store the latest information about operators, which reflects the current operator performance. Because accumulated rewards do not reflect recent performance in a timely manner, a decay factor is suggested in [16]. In [21], two modified MAB methods are proposed, namely MOEA/D-UCB-Tuned and MOEA/D-UCB-V. Both use the variance of the rewards to modify (9), and the experimental results show that UCB-T performs better than UCB-V on most test problems.

Besides the two kinds of methods mentioned above, there are other adaptive operator selection methods, such as gradient-based methods and methods based on comparing multiple trial vectors. Schütze et al. [43] propose a local search mechanism, Hypervolume Directed Search (HVDS). In HVDS, gradient information is used to select among search behaviors, namely greedy search, descent direction, and search along the Pareto front. As these search behaviors are based on gradients, such methods cannot be used if the objectives are not differentiable. Lara et al. [2] suggest a novel iterative search procedure, Hill Climber with Sidestep (HCS), which is capable of moving both toward and along the Pareto set depending on the distance of the current iterate from this set. The search direction is selected on the basis of the dominance relation between several trial vectors and the old individual.

#### 3. The Proposed Algorithm

In this section, we present an improved bandit-based method for MOPs, named the latest stored information based adaptive selection strategy. This method pays more attention to the dynamic nature of AOS. It is mainly composed of two parts, credit assignment and operator selection.

##### 3.1. Credit Assignment

Credit assignment contains two main tasks: one is to calculate credit value of applied operators; the other is to assign the credit fairly.

For the first task, the fitness change is adopted as the reward of successfully applied operators and is referred to as the Fitness Improvement Rate (FIR). Because the convergence levels of individuals differ greatly across search stages, the FIR is normalized as follows:

$$\text{FIR}_i = \frac{pf_i - cf_i}{pf_i}, \tag{10}$$

where $pf_i$ is the fitness value of the solution of the last generation on subproblem *i* and $cf_i$ is the fitness value of the current solution on the same subproblem.

A sliding window of length *W* is used to store operators, their assist information, and their FIRs with a FIFO mechanism. It always holds the latest *W* configurations of operators and their related information.
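The combination of a normalized fitness improvement and a FIFO sliding window can be sketched in Python with `collections.deque`. This is only an illustration under stated assumptions: the FIR is taken as the relative decrease of a minimized fitness value, each record stores one operator name, one hypothetical piece of assist information (a scaling factor `F`), and the resulting FIR, and the numbers are made up.

```python
from collections import deque

def fitness_improvement_rate(previous, current):
    """Relative improvement (previous - current) / previous for minimization."""
    if previous == 0:
        return 0.0                      # assumption: no rate for a zero baseline
    return (previous - current) / previous

W = 3
window = deque(maxlen=W)                # FIFO: the oldest record is dropped

# Hypothetical records: (operator, scaling factor, previous fitness, current fitness).
records = [
    ("DE/rand/1", 0.5, 10.0, 9.0),
    ("DE/rand/2", 0.7, 9.0, 9.0),
    ("DE/rand/1", 0.5, 9.0, 6.3),
    ("DE/best/1", 0.3, 6.3, 6.3),
]
for name, F, prev, cur in records:
    fir = fitness_improvement_rate(prev, cur)
    window.append({"op": name, "F": F, "fir": fir})

stored_ops = [rec["op"] for rec in window]
```

Because the window length is 3, the first record is discarded when the fourth arrives, so only the latest three configurations remain available for credit assignment.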

Suppose that there are $K$ operators and *T* types of assist information on operator, each type having its own number of candidate values. The structure of the sliding window is shown in Figure 1. As the figure shows, the first layer stores operator names, the last layer stores FIRs, and the middle layers store parameters. It is worth noting that the different types of parameters are placed in the sliding window in order of their significance, and this order is also the order in which the assist information on operator is computed. Since the most suitable operator and the most suitable configuration of operator and assist information are considered simultaneously, the credit value is assigned to the operator first, and the configuration of operator and assist information is considered afterwards. Configurations among different types of assist information are not taken into consideration. The index of FIR is not shown in Figure 1, because it is associated with the credit assignment of different operators and different types of assist information on operator. The details of the index of FIR are illustrated in Figure 2.