Complexity


Research Article | Open Access

Volume 2021 | Article ID 8878780 | https://doi.org/10.1155/2021/8878780

Jin Jin, "Large-Scale Evolutionary Strategy Based on Gradient Approximation", Complexity, vol. 2021, Article ID 8878780, 12 pages, 2021. https://doi.org/10.1155/2021/8878780

Large-Scale Evolutionary Strategy Based on Gradient Approximation

Academic Editor: Guang Li
Received: 04 Sep 2020
Revised: 05 Apr 2021
Accepted: 03 May 2021
Published: 17 May 2021

Abstract

For large-scale optimization, CMA-ES suffers from high complexity and premature stagnation. This paper proposes an improved CMA-ES algorithm called GI-ES. To reduce the complexity, GI-ES replaces the computation of a covariance matrix with modeling of the expected fitness for a given covariance matrix. To avoid premature stagnation, GI-ES replaces the historical information of elite individuals with the historical information of all individuals. This information can be regarded as approximate gradients, from which the parameters of the next generation of individuals are generated. The algorithm was tested on the CEC2010 and CEC2013 LSGO benchmark suites, and the experimental results verify its effectiveness on a number of different tasks.

1. Introduction

Most machine learning problems can be modeled as optimization problems, and as the amount of data increases, the models become more and more complex. Large-scale optimization has therefore become a focus of many researchers. The covariance matrix adaptation evolution strategy (CMA-ES) [1] is one of the most powerful evolution strategies for global optimization. It is based on a probability distribution over the population and is very similar to estimation of distribution algorithms (EDAs) [2]. A common shortcoming of simple evolution strategies is that the standard deviation of the sampling noise is fixed. CMA-ES instead adjusts the standard deviation automatically according to the distribution of the population, which brings two benefits:
(1) It increases population diversity, so that the algorithm can escape from local optima.
(2) It adjusts the standard deviation so that the algorithm adapts to the fitness landscape.

In addition, because CMA-ES uses the information of the best solutions to adjust its parameters, it can expand the search scope when the optimum is far away and narrow it when the optimum is near. Owing to these advantages, CMA-ES has become one of the most popular gradient-free optimization algorithms and the method of choice for many researchers and practitioners.

Despite its strengths, CMA-ES has two limitations when dealing with large-scale optimization problems (LSOPs):
(1) Time consumption: the basic operations of CMA-ES are built on the covariance matrix, whose storage and per-generation update cost grow at least quadratically with the dimension, and sampling a new population additionally requires decomposing this matrix. To cope with numerical errors and ill-conditioning, CMA-ES relies on spectral decomposition, which is generally considered computationally inefficient compared with other decomposition techniques.
(2) Lack of population diversity: CMA-ES uses only a few of the best individuals to update its distribution. Although this speeds up convergence to some extent, it discards most of the information in the population, which prevents CMA-ES from performing well on ill-conditioned problems. By analogy, we can learn from the qualities of successful people, but those who fail also leave a record of what not to do, and this record helps the next generation make better calculations and evaluations.

These two limitations may prevent CMA-ES from being used for LSOPs.

To address these problems, this paper proposes an evolution strategy based on gradient information utilization (GI-ES), which extends the application of CMA-ES to large-scale optimization. GI-ES has the following characteristics:
(1) GI-ES optimizes the expected fitness score of each sampling scheme. If the expected score is good enough, the best-performing solutions in the sampled generation are likely to perform well too; maximizing the expected fitness score of a sampling scheme amounts to maximizing the overall fitness score of the population sampled from it.
(2) The gradient information is approximated using the information of individuals, and the approximate gradient is used to guide the search.
(3) GI-ES uses the approximate gradient as the search direction, which allows the algorithm to adapt to the fitness landscape, including the dependencies among variables. The fitness evaluation is based on the expected fitness score, and the gradient signal is obtained by applying likelihood-based estimation to the model described above. Unlike traditional evolutionary computation, GI-ES represents the “population” as a parameterized distribution and updates the parameters of this distribution with a search gradient computed from the fitness values of historical samples.

Following the introduction section, we first discuss the related work of large-scale optimization and the utilization of historical information strategies for evolutionary computation in Section 2. In Section 3, the background and motivation of this paper are introduced. Section 4 describes the detailed implementations of GI-ES. Thereafter, the simulation results on the benchmark test suites are examined to evaluate the effectiveness of the proposed approach in Section 5. Finally, Section 6 summarizes this paper.

In recent years, large-scale optimization has become a hot research topic, and many large-scale benchmark functions have been proposed to examine the merits of large-scale optimization algorithms. Researchers have done a great deal of useful work and proposed many approaches to solving large-scale global optimization (LSGO) problems.

There are currently two main research directions for large-scale optimization problems:
(1) Decomposition-based algorithms: the problem is decomposed (its dimensionality reduced) by grouping variables, so that a large-scale problem is split into multiple subproblems. The subproblems are optimized by evolutionary algorithms within the framework of cooperative coevolution (CC).
(2) Instead of decomposing the problem directly, optimization is carried out by combining multiple local search algorithms, each with its own set of parameters.

The rest of this section discusses several algorithms from recent CEC large-scale optimization competitions.

CEC2008 hosted the first known LSGO competition, and multiple trajectory search (MTS) [3] was its winner. MTS-LS1, MTS-LS2, and MTS-LS3 are improved versions of MTS. Several DE-based algorithms have also achieved good results in LSGO competitions. Self-adaptive differential evolution with modified multi-trajectory search (SaDE-MMTS) [4] is a hybrid algorithm that integrates JADE [5] and a modified MTS-LS1. MA-SW-Chains [6] is an extension of MA-CMA-Chains in which the CMA-ES local search is replaced with the Solis-Wets (SW) local search.

MA-SW-Chains won the CEC2010 competition, and the Ensemble Optimization Evolutionary Algorithm (EOEA) [7] ranked second. EOEA divides the optimization process into two stages: global shrinking and local exploration. In EOEA, an EDA based on a mixed Gaussian and Cauchy model (MUEDA) is used to achieve faster convergence, and the goal of the second stage is exploitation. Third place in CEC2010 went to the differential ant-stigmergy algorithm (DASA) [8], which solves LSGO problems by converting the real-parameter optimization problem into a graph search problem.

An improved multiple offspring sampling (MOS) algorithm [9] won CEC2012; it combines SW and MTS-LS as two local searches. The self-adaptive differential evolution algorithm jDElsgo was proposed in [10] and, after continuous improvement, ranked second in CEC2012. Third place went to the cooperative coevolution evolutionary algorithm with global search (CCGS), which is considered an extension of EOEA [11]. Cooperative coevolution with delta grouping (DECC-DML) was proposed in [12] to enhance the performance of the CC framework on non-separable problems.

CEC2013 introduced a new set of benchmark functions. The winner of the CEC2013 competition was a modified MOS [13], and DECC-G [14] served as the reference algorithm. The second-ranked algorithm was smoothing and auxiliary function-based cooperative coevolution for global optimization (SACC) [15], which was the first to adopt parallel search within the CC framework.

The bi-space interactive cooperative coevolutionary algorithm (BICA) [16] is a coevolutionary framework that evolves in two spaces: the model evolves to provide better grouping, and the individuals evolve to achieve better fitness. SHADE with iterative local search (SHADE-ILS) [17] iteratively combines a modern differential evolution algorithm with a local search method selected from several candidates. The selection is dynamic: it considers the improvement each local search achieved in the previous intensification phase to determine the best method for the problem at each stage. In LSHADE-SPA [18], differential evolution with linear population size reduction is used for global exploration, while a modified version of multiple trajectory search is used for local exploitation.

A hybrid adaptive evolutionary differential evolution algorithm (HACC-D) was also proposed in CEC2014 [19]. HACC-D belongs to the CC class of algorithms and uses JADE and SaNSDE as subcomponent optimizers. Scaling up the covariance matrix adaptation evolution strategy using cooperative coevolution (CC-CMA-ES) is another competitive CC-based algorithm, whose base optimizer is CMA-ES.

For a detailed overview of state-of-the-art large-scale optimization algorithms, please refer to [20]. This paper proposes an algorithm for large-scale optimization problems called GI-ES. The algorithm has been evaluated on the CEC2010 and CEC2013 test suites, and the experimental results demonstrate its potential value.

3. Background and Motivation

This section briefly introduces the background of the CMA-ES algorithm, focusing on how CMA-ES uses information. Based on an analysis of different ways of using information, this paper then proposes a model of gradient information utilization.

3.1. CMA-ES

CMA-ES is an evolution strategy proposed by Hansen et al. [1]. It samples the solution space of the optimization problem with a Gaussian distribution whose parameters are updated according to a sample selection mechanism. The update process is repeated until a termination criterion is met.

For an objective function $f: \mathbb{R}^n \to \mathbb{R}$, CMA-ES generates each new population by sampling from the normal distribution $\mathcal{N}(m, C)$, where $m \in \mathbb{R}^n$ is the mean and $C \in \mathbb{R}^{n \times n}$ is a positive definite matrix called the covariance matrix. The covariance matrix is adapted continuously so that the distribution of the search points follows the contour lines of the objective function; ideally, the covariance matrix should be proportional to the inverse of the Hessian matrix, although this is difficult to achieve for more complex functions. The basic ideas behind CMA-ES are briefly introduced next.

3.1.1. Sampling

For a given objective function, CMA-ES first assumes that there is an optimal way to reshape the fitness landscape so that the search moves in the optimal direction. For a convex quadratic function, the dependencies between variables are linear and can be eliminated completely, so CMA-ES effectively treats such functions as spherical functions and solves them efficiently. This strategy also applies to general black-box objective functions, because their local landscapes can be approximated by convex quadratic functions. Since the optimal covariance matrix is generally unknown, CMA-ES samples new solutions from the multivariate Gaussian distribution

$$x_k^{(g+1)} \sim m^{(g)} + \sigma^{(g)}\,\mathcal{N}\big(0, C^{(g)}\big), \qquad k = 1, \dots, \lambda,$$

where $\sigma^{(g)}$ is the search step size, which controls the local search capability of the Gaussian distribution, and $\lambda$ is the number of samples drawn in each generation. The fitness value of each sample is then computed. In this way, CMA-ES provides in each generation the parameters of a multivariate normal distribution from which the next generation is sampled.
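The sampling step can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation: it factors $C = LL^{\top}$ with a Cholesky decomposition for brevity (CMA-ES itself typically uses an eigendecomposition), and the helper names `cholesky` and `sample_population` are hypothetical.

```python
import math
import random


def cholesky(c):
    """Return lower-triangular L with C = L L^T (C must be symmetric positive definite)."""
    n = len(c)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = c[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def sample_population(m, sigma, c, lam, rng):
    """Draw lam points x_k = m + sigma * L z_k with z_k ~ N(0, I), i.e. x_k ~ N(m, sigma^2 C)."""
    low = cholesky(c)
    n = len(m)
    population = []
    for _ in range(lam):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # L is lower-triangular, so only the first i + 1 entries of row i are nonzero.
        population.append([m[i] + sigma * sum(low[i][k] * z[k] for k in range(i + 1))
                           for i in range(n)])
    return population


rng = random.Random(42)
pop = sample_population([1.0, -2.0], 0.5, [[4.0, 2.0], [2.0, 3.0]], 10, rng)
```

Sampling through a factor of $C$ means only standard normal draws are needed; the factor is recomputed only when $C$ changes.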

3.1.2. Update

CMA-ES updates the parameters $m$, $\sigma$, and $C$, and these update operations are relatively complex. The new mean is computed from the best individuals of the current generation, and the contribution of each individual to the mean is determined by weighted recombination:

$$m^{(g+1)} = \sum_{i=1}^{\mu} w_i\, x_{i:\lambda}^{(g+1)},$$

where

$$\sum_{i=1}^{\mu} w_i = 1, \qquad w_1 \ge w_2 \ge \dots \ge w_{\mu} > 0,$$

and $x_{i:\lambda}^{(g+1)}$ denotes the individual with the $i$-th best fitness among the $\lambda$ samples. Because the weights differ, samples with better fitness receive larger weights.
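A minimal sketch of this weighted recombination follows. It assumes minimization and the log-rank weights $w_i \propto \ln\frac{\lambda+1}{2} - \ln i$ with $\mu = \lfloor \lambda/2 \rfloor$, which are common CMA-ES defaults; the paper does not state its weights, and the function names are hypothetical.

```python
import math


def recombination_weights(lam):
    """Log-rank weights w_1 >= ... >= w_mu > 0 that sum to 1, with mu = lam // 2."""
    mu = lam // 2
    raw = [math.log((lam + 1) / 2.0) - math.log(i) for i in range(1, mu + 1)]
    total = sum(raw)
    return [r / total for r in raw]


def update_mean(population, fitness):
    """m_new = sum_i w_i x_{i:lam}: weighted average of the mu best samples (minimization)."""
    lam = len(population)
    weights = recombination_weights(lam)
    order = sorted(range(lam), key=lambda k: fitness[k])  # best (smallest fitness) first
    n = len(population[0])
    return [sum(w * population[order[i]][j] for i, w in enumerate(weights)) for j in range(n)]


pop = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
fit = [3.0, 2.0, 1.0, 0.0]
new_mean = update_mean(pop, fit)
```

Here the best sample `[3.0, 3.0]` receives the largest weight, so the new mean lies between the two best samples but closer to the best one.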

There are two ways to update the covariance matrix: the rank-one update and the rank-$\mu$ update. This paper adopts the rank-one update, in which the covariance matrix is updated as

$$C^{(g+1)} = (1 - c_1)\,C^{(g)} + c_1\, p_c^{(g+1)} \big(p_c^{(g+1)}\big)^{\top},$$

where $C$ is the $n \times n$ covariance matrix, $c_1 \in [0, 1]$ is the learning rate, and $p_c$ is the evolution path, that is, a memory of the mean shifts that decays over the optimization process. In this way, CMA-ES adapts the covariance matrix of the multivariate Gaussian distribution from the history of successful steps instead of estimating it from the current samples by maximum likelihood. As a result, a step that was successful in previous generations is likely to be sampled again in the next generation. For details of the update, see [21].
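The rank-one step can be sketched as below. It is a simplified illustration of the standard CMA-ES form that omits the rank-$\mu$ term and the stall indicator $h_\sigma$; the evolution-path rule $p_c \leftarrow (1-c_c)\,p_c + \sqrt{c_c(2-c_c)\mu_{\mathrm{eff}}}\,(m^{(g+1)}-m^{(g)})/\sigma^{(g)}$ and the name `rank_one_update` are assumptions made for the example.

```python
import math


def rank_one_update(c, p_c, m_old, m_new, sigma, c_c, c_1, mu_eff):
    """Update the evolution path p_c, then apply C <- (1 - c_1) C + c_1 p_c p_c^T."""
    n = len(m_old)
    coeff = math.sqrt(c_c * (2.0 - c_c) * mu_eff)
    # Exponentially fading memory of the normalized mean shifts.
    p_c = [(1.0 - c_c) * p_c[i] + coeff * (m_new[i] - m_old[i]) / sigma for i in range(n)]
    c = [[(1.0 - c_1) * c[i][j] + c_1 * p_c[i] * p_c[j] for j in range(n)] for i in range(n)]
    return c, p_c


identity = [[1.0, 0.0], [0.0, 1.0]]
c_new, path = rank_one_update(identity, [0.0, 0.0], [0.0, 0.0], [1.0, 0.0],
                              sigma=1.0, c_c=0.5, c_1=0.1, mu_eff=1.0)
```

After one successful shift along the first axis, the variance in that direction (0.975) exceeds the variance along the second axis (0.9), so that direction is more likely to be sampled again.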

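The step size is updated as

$$\sigma^{(g+1)} = \sigma^{(g)} \exp\!\left(\frac{c_{\sigma}}{d_{\sigma}}\left(\frac{\big\|p_{\sigma}^{(g+1)}\big\|}{\mathbb{E}\|\mathcal{N}(0, I)\|} - 1\right)\right),$$

where $p_{\sigma}$ is the evolution path for the step size, $c_{\sigma}$ is its learning rate, and $d_{\sigma}$ is a damping parameter.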
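The step-size rule can be sketched as follows. This minimal illustration uses cumulative step-size adaptation with the path update $p_\sigma \leftarrow (1-c_\sigma)\,p_\sigma + \sqrt{c_\sigma(2-c_\sigma)\mu_{\mathrm{eff}}}\,C^{-1/2}(m^{(g+1)}-m^{(g)})/\sigma^{(g)}$; it assumes $C = I$ so that $C^{-1/2}$ need not be computed, and `csa_step` is a hypothetical name.

```python
import math


def expected_norm(n):
    """Common approximation of E||N(0, I_n)||."""
    return math.sqrt(n) * (1.0 - 1.0 / (4.0 * n) + 1.0 / (21.0 * n * n))


def csa_step(sigma, p_sigma, m_old, m_new, c_sigma, d_sigma, mu_eff):
    """Cumulative step-size adaptation, assuming C = I so that C^(-1/2) is the identity."""
    n = len(m_old)
    coeff = math.sqrt(c_sigma * (2.0 - c_sigma) * mu_eff)
    p_sigma = [(1.0 - c_sigma) * p_sigma[i] + coeff * (m_new[i] - m_old[i]) / sigma
               for i in range(n)]
    norm = math.sqrt(sum(v * v for v in p_sigma))
    # A path longer than expected increases sigma; a shorter one decreases it.
    sigma *= math.exp((c_sigma / d_sigma) * (norm / expected_norm(n) - 1.0))
    return sigma, p_sigma


grown, _ = csa_step(1.0, [0.0, 0.0], [0.0, 0.0], [5.0, 5.0], c_sigma=0.3, d_sigma=1.0, mu_eff=1.0)
shrunk, _ = csa_step(1.0, [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], c_sigma=0.3, d_sigma=1.0, mu_eff=1.0)
```

A large consistent mean shift lengthens the path and enlarges $\sigma$, while a stalled mean shortens the path and shrinks $\sigma$.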

3.2. Motivation

Because CMA-ES uses the information provided by the best solutions to adjust its mean and covariance simultaneously, it can expand the search scope when the optimum is far away and narrow it when the optimum is near. However, the $n \times n$ covariance matrix makes storage and computation grow at least quadratically with the dimension $n$. Moreover, only the individuals with the highest fitness ranks are considered throughout the iterations. On simple problems this lets the algorithm converge quickly, but on large-scale optimization problems it can quickly lead to premature stagnation.

Survival of the fittest is an important part of evolutionary computation, but population diversity is crucial to the performance of the algorithm. Traditional CMA-ES preserves the best individuals and shapes the distribution of the next generation by learning only from this elite part. Over the course of evolution, the population therefore tends more and more toward the “elite” individuals and away from the “bad” ones. Yet the bad individuals retain key information about what not to do. By analogy, success stories are everywhere, but the experience of unsuccessful people is often more useful, because it contains long-term observations and lessons. These considerations motivate this work.

4. Proposed Approach

The proposed algorithm GI-ES adopts the basic framework of CMA-ES but improves it in several ways. The proposed scheme keeps the information of every sampled solution in each generation, whether good or bad. Using the gradient signal estimated from this information, the whole search distribution can be moved in a better direction for the next generation. Since a gradient is estimated, the standard stochastic gradient descent (SGD) algorithm used in deep learning [22] can be applied.

4.1. GI-ES

GI-ES optimizes the expected fitness score of each sampling scheme. If the expected score is good enough, the best-performing solutions in the sampled generation are likely to perform well. Maximizing the expected fitness score of a sampling scheme is in fact equivalent to maximizing the overall fitness score of the sampled population.

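Assume that $z$ is a solution vector sampled from the probability distribution function $\pi(z \mid \theta)$. The expected value of the objective function $F$ can then be defined as

$$J(\theta) = \mathbb{E}_{\theta}\big[F(z)\big] = \int F(z)\,\pi(z \mid \theta)\,dz,$$

where $\theta$ represents the parameters of the probability distribution function. For example, if $\pi$ is a normal distribution, $\theta$ consists of $\mu$ and $\sigma$. For a simple two-dimensional problem, each solution $z$ is a two-dimensional vector $(x, y)$. Using the same log-likelihood trick as in REINFORCE, the gradient of $J(\theta)$ can be written as

$$\nabla_{\theta} J(\theta) = \mathbb{E}_{\theta}\big[F(z)\,\nabla_{\theta}\log \pi(z \mid \theta)\big].$$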

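For a sample of size $N$ with solutions $z^1, z^2, \dots, z^N$, the gradient can be estimated by summation:

$$\nabla_{\theta} J(\theta) \approx \frac{1}{N}\sum_{i=1}^{N} F(z^i)\,\nabla_{\theta}\log \pi(z^i \mid \theta).$$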
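The summation above can be illustrated with a short sketch. It assumes, for simplicity, an isotropic Gaussian $\mathcal{N}(\mu, \sigma^2 I)$ with fixed $\sigma$, so that $\nabla_{\mu}\log\pi(z \mid \theta) = (z-\mu)/\sigma^2$. The helper `estimate_gradient` and the test fitness $F(z) = -\|z\|^2$ are hypothetical, not code from the paper.

```python
import random


def estimate_gradient(f, mu, sigma, n_samples, rng):
    """Monte Carlo estimate of grad_mu E[F(z)] for z ~ N(mu, sigma^2 I) (log-likelihood trick)."""
    d = len(mu)
    grad = [0.0] * d
    for _ in range(n_samples):
        z = [rng.gauss(m, sigma) for m in mu]
        fz = f(z)
        for i in range(d):
            # F(z) times the score grad_mu log pi(z | mu) = (z - mu) / sigma^2
            grad[i] += fz * (z[i] - mu[i]) / sigma ** 2
    return [g / n_samples for g in grad]


def neg_sphere(z):
    """Fitness to maximize: F(z) = -||z||^2, whose exact expected gradient is -2 * mu."""
    return -sum(v * v for v in z)


g = estimate_gradient(neg_sphere, [1.0, 2.0], 1.0, 100000, random.Random(0))
```

With $\mu = (1, 2)$ the estimate approaches the exact value $(-2, -4)$ as the number of samples grows; the estimator needs only fitness values, not derivatives of $F$.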

With this gradient and a learning rate $\alpha$ (say, 0.01), we can optimize the parameters $\theta$ of the probability distribution function so that the sampled solutions obtain higher fitness scores on the objective function $F$. Using SGD or Adam [23], $\theta$ is updated for the next generation as

$$\theta \leftarrow \theta + \alpha\,\nabla_{\theta} J(\theta).$$
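A minimal sketch of the resulting loop with plain SGD (not Adam) follows. It assumes a fixed isotropic $\sigma$ so that only the mean is learned, a batch-mean fitness baseline for variance reduction, and a hypothetical two-dimensional toy fitness with its maximum at $(3, -1)$; `es_gradient_ascent` is an illustrative name.

```python
import random


def es_gradient_ascent(f, mu, sigma=0.5, pop_size=50, alpha=0.01, generations=500, seed=0):
    """SGD ascent on J(mu) = E[F(z)] with z ~ N(mu, sigma^2 I); sigma stays fixed."""
    rng = random.Random(seed)
    mu = list(mu)
    d = len(mu)
    for _ in range(generations):
        samples = [[rng.gauss(m, sigma) for m in mu] for _ in range(pop_size)]
        scores = [f(z) for z in samples]
        # Batch-mean baseline: lowers variance and, in expectation, only scales the
        # gradient by (1 - 1/pop_size), so its direction is unchanged.
        baseline = sum(scores) / pop_size
        grad = [sum((s - baseline) * (z[i] - mu[i]) for s, z in zip(scores, samples))
                / (pop_size * sigma ** 2) for i in range(d)]
        mu = [mu[i] + alpha * grad[i] for i in range(d)]  # theta <- theta + alpha * grad J
    return mu


def fitness(z):
    """Two-dimensional toy problem with its maximum at (3, -1)."""
    return -((z[0] - 3.0) ** 2 + (z[1] + 1.0) ** 2)


best_mu = es_gradient_ascent(fitness, [0.0, 0.0])
```

Starting from the origin, the mean drifts toward $(3, -1)$; replacing the plain update with Adam would only change how `grad` is turned into a step.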

After the probability distribution function is updated, new candidate solutions are sampled from it, and the process is repeated until an appropriate solution is obtained. Since no correlation parameters are used, this algorithm is computationally efficient.

GI-ES adopts an approximate gradient as the search direction. It represents the “population” of traditional evolutionary computation as a parameterized distribution $\pi(z \mid \theta)$.

4.1.1. Multinormal Distribution

In the proposed method, the multivariate normal distribution is used as the parameterized distribution. Its parameters are $\theta = (\mu, \Sigma)$, where $\mu$ is the mean and $\Sigma$ is the covariance matrix. To sample efficiently, we need a matrix $A$ satisfying $AA^{\top} = \Sigma$; then $z = \mu + As$ with $s \sim \mathcal{N}(0, I)$, so the problem reduces to sampling from a standard multivariate normal distribution. $\pi(z \mid \theta)$ denotes the probability density function of the multinormal distribution.

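To compute the gradient with respect to the parameters of the multivariate Gaussian, the logarithm of the probability density is differentiated, and the gradient is again estimated by summation over the samples. The resulting $\nabla_{\mu}\log\pi(z \mid \theta)$ and $\nabla_{\Sigma}\log\pi(z \mid \theta)$ are given in (14) and (15).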
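For reference, the standard log-density gradients of a multivariate normal distribution, as used in the natural evolution strategies literature, are shown below. We assume, but cannot confirm from this text, that they correspond to (14) and (15); each is averaged over the $\lambda$ sampled solutions $z_k$ in the same way as the summation estimate of $\nabla_{\theta} J$:

```latex
\begin{aligned}
\nabla_{\mu}\log\pi(z \mid \theta) &= \Sigma^{-1}(z-\mu),\\
\nabla_{\Sigma}\log\pi(z \mid \theta) &= \tfrac{1}{2}\,\Sigma^{-1}(z-\mu)(z-\mu)^{\top}\Sigma^{-1} - \tfrac{1}{2}\,\Sigma^{-1},\\
\nabla_{\theta}J(\theta) &\approx \frac{1}{\lambda}\sum_{k=1}^{\lambda} F(z_k)\,\nabla_{\theta}\log\pi(z_k \mid \theta), \qquad \theta \in \{\mu, \Sigma\}.
\end{aligned}
```

Both expressions follow from $\log\pi(z \mid \theta) = -\tfrac{1}{2}\log\det\Sigma - \tfrac{1}{2}(z-\mu)^{\top}\Sigma^{-1}(z-\mu) + \text{const}$.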

Then, update the parameters with the calculated gradient information.

To balance exploration and exploitation of the solution space, the algorithm changes the search distribution according to the parameters of the solutions it is currently exploring.

4.1.2. The Technique of GI-ES

Furthermore, $A$ can be decomposed as $A = \sigma B$ into a scalar step size $\sigma > 0$ and a normalized covariance factor $B$ satisfying $\det(B) = 1$. These two orthogonal components, scale and shape, can then be learned independently.

Using the information of the whole population prevents information loss, but outliers still need to be handled. This paper therefore adopts fitness shaping [24] to prevent outliers from dominating the update. The individuals are sorted by fitness from small to large, and each individual is assigned a utility value $u_k$ according to its rank rather than its raw fitness value.
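One common form of rank-based fitness shaping from the NES literature can be sketched as follows. The exact utility function of GI-ES is not given in this text, so the formula $u_k = \frac{\max(0,\,\ln(\lambda/2+1) - \ln k)}{\sum_j \max(0,\,\ln(\lambda/2+1) - \ln j)} - \frac{1}{\lambda}$ for rank $k$ and the name `shaped_utilities` are assumptions.

```python
import math


def shaped_utilities(fitness):
    """Map raw fitness values (lower is better) to rank-based utilities that sum to zero."""
    lam = len(fitness)
    raw = [max(0.0, math.log(lam / 2.0 + 1.0) - math.log(k)) for k in range(1, lam + 1)]
    total = sum(raw)
    by_rank = [r / total - 1.0 / lam for r in raw]        # utility of rank 1, 2, ..., lam
    order = sorted(range(lam), key=lambda i: fitness[i])  # best (smallest) first
    utilities = [0.0] * lam
    for rank, idx in enumerate(order):
        utilities[idx] = by_rank[rank]
    return utilities


normal = shaped_utilities([4.0, 1.0, 3.0, 2.0])
with_outlier = shaped_utilities([1e12, 1.0, 3.0, 2.0])
```

Because the utilities depend only on ranks, replacing the worst fitness value with an extreme outlier leaves every utility unchanged, which is the property the text relies on.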

Computing each covariance matrix update directly is expensive, and its complexity can be reduced by computing the update in local coordinates. In this case, the gradient update decomposes into the four components given in (17)–(20).
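For orientation, the corresponding components in the exponential natural evolution strategy (xNES), whose $A = \sigma B$ parameterization matches the one used here, are shown below. The block assumes samples $x_k = \mu + \sigma B s_k$ with $s_k \sim \mathcal{N}(0, I)$, dimension $d$, utilities $u_k$ ordered by rank, and learning rates $\eta_{\mu}, \eta_{\sigma}, \eta_{B}$; treat it as a reference sketch rather than a transcription of (17)–(20):

```latex
\begin{aligned}
G_{\delta} &= \sum_{k=1}^{\lambda} u_k\, s_k, &
G_{M} &= \sum_{k=1}^{\lambda} u_k\,\big(s_k s_k^{\top} - I\big),\\
G_{\sigma} &= \operatorname{tr}(G_{M})/d, &
G_{B} &= G_{M} - G_{\sigma} I,\\
\mu &\leftarrow \mu + \eta_{\mu}\,\sigma B\, G_{\delta}, &
\sigma &\leftarrow \sigma \exp\!\big(\tfrac{\eta_{\sigma}}{2}\, G_{\sigma}\big), \qquad
B \leftarrow B \exp\!\big(\tfrac{\eta_{B}}{2}\, G_{B}\big).
\end{aligned}
```

Since $\operatorname{tr}(G_B) = 0$ and $\det\exp(M) = e^{\operatorname{tr}(M)}$, the update of $B$ preserves $\det(B) = 1$.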

The pseudocode of GI-ES is given in Algorithm 1.

Require: f: objective function; μ: initial mean; A: initial covariance factor
Ensure: optimal solution x*
(1) initialize σ and B (A = σB, det(B) = 1);
(2) repeat
(3)   for k = 1, ..., λ do
(4)     draw sample s_k ~ N(0, I);
(5)     x_k ← μ + σ B s_k;
(6)     evaluate the fitness value f(x_k)
(7)   end for
(8)   sort the samples according to their fitness values and compute the utility values u_k
(9)   compute the approximate gradients according to (17)–(20)
(10)  update the parameters using the approximate gradient information:
(11)    update the mean vector μ
(12)    update the scalar step size σ
(13)    update the covariance factor B
(14) until the termination criterion is met
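To make Algorithm 1 concrete, here is a compact pure-Python sketch. It is an assumption-laden illustration rather than the author's implementation: it fills in the steps with the xNES updates (rank-based utilities, $x_k = \mu + \sigma B s_k$, exponential updates of $\sigma$ and $B$) together with the default xNES population size and learning rates, none of which are specified in this text, and all function names are hypothetical. It keeps a full $d \times d$ factor $B$, so it shows the update logic, not a large-scale implementation.

```python
import math
import random


def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_exp(m, terms=30):
    """Matrix exponential via a truncated Taylor series (adequate for small-norm m)."""
    n = len(m)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for p in range(1, terms):
        term = [[v / p for v in row] for row in mat_mul(term, m)]  # term = m^p / p!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


def utilities(lam):
    """Rank-based utilities for ranks 1..lam (best first); they sum to zero."""
    raw = [max(0.0, math.log(lam / 2.0 + 1.0) - math.log(k)) for k in range(1, lam + 1)]
    total = sum(raw)
    return [r / total - 1.0 / lam for r in raw]


def gi_es_sketch(f, mu, sigma=1.0, iterations=300, seed=0):
    """xNES-style loop mirroring Algorithm 1 (minimization); x = mu + sigma * B s."""
    rng = random.Random(seed)
    d = len(mu)
    mu = list(mu)
    lam = 4 + int(3 * math.log(d))                        # default xNES population size
    eta_mu = 1.0
    eta_sigma = eta_b = 3.0 * (3.0 + math.log(d)) / (5.0 * d * math.sqrt(d))
    b = [[float(i == j) for j in range(d)] for i in range(d)]  # covariance factor, det(B) = 1
    u = utilities(lam)
    for _ in range(iterations):
        draws = []
        for _ in range(lam):                              # steps (3)-(7): sample and evaluate
            s = [rng.gauss(0.0, 1.0) for _ in range(d)]
            x = [mu[i] + sigma * sum(b[i][j] * s[j] for j in range(d)) for i in range(d)]
            draws.append((f(x), s))
        draws.sort(key=lambda item: item[0])              # step (8): best (smallest f) first
        ss = [s for _, s in draws]
        # Step (9): approximate natural-gradient components.
        g_delta = [sum(u[k] * ss[k][i] for k in range(lam)) for i in range(d)]
        g_m = [[sum(u[k] * (ss[k][i] * ss[k][j] - float(i == j)) for k in range(lam))
                for j in range(d)] for i in range(d)]
        g_sigma = sum(g_m[i][i] for i in range(d)) / d
        g_b = [[g_m[i][j] - (g_sigma if i == j else 0.0) for j in range(d)] for i in range(d)]
        # Steps (10)-(13): update the mean, the step size, and the covariance factor.
        step = [sum(b[i][j] * g_delta[j] for j in range(d)) for i in range(d)]
        mu = [mu[i] + eta_mu * sigma * step[i] for i in range(d)]
        sigma *= math.exp(0.5 * eta_sigma * g_sigma)
        b = mat_mul(b, mat_exp([[0.5 * eta_b * v for v in row] for row in g_b]))
    return mu, sigma


def sphere(x):
    return sum(v * v for v in x)


best, final_sigma = gi_es_sketch(sphere, [3.0] * 5, seed=1)
```

On a 5-dimensional sphere function started from (3, ..., 3), the mean moves close to the optimum at the origin within the 300 default iterations while $\sigma$ shrinks accordingly.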

5. Experiments and Analysis

In this section, GI-ES is compared with state-of-the-art algorithms to verify its effectiveness. Three experiments were performed. First, GI-ES was evaluated on CEC2010. Second, GI-ES was evaluated on CEC2013 and compared with nine state-of-the-art algorithms. Finally, a parametric analysis was performed to study the effect of each component of GI-ES.

The first experiment uses CEC2010, the second uses CEC2013, and the third is a statistical analysis. The benchmark functions are described below. The dimension (D) of all functions is 1000, except for the two overlapping functions F13 and F14 in CEC2013, for which D = 905.
(1) CEC2010 contains 20 test problems, which can be divided into four classes:
(i) F1–F3: separable functions
(ii) F4–F8: partially separable functions, in which a small number of variables are dependent while all the remaining ones are independent
(iii) F9–F18: partially separable functions consisting of multiple independent subcomponents, each of which is m-nonseparable
(iv) F19–F20: fully non-separable functions
For more detailed features, please refer to [25].
(2) CEC2013 contains 15 test problems:
(i) F1–F3: separable functions
(ii) F4–F11: partially separable functions
(iii) F12–F14: overlapping functions (D = 905 for F13 and F14)
(iv) F15: fully non-separable function

For more detailed features, please refer to [26].

To obtain reliable statistics, each experiment was run 25 times. The solution error $f(x) - f(x^*)$ was recorded at the end of each run, where $x^*$ is the known global optimum of each function. The maximum number of function evaluations (FE) is set to 3.0E+06, the default value of the test suites.

5.1. Parametric Analysis

Most of the parameters of GI-ES are the same as in [27]. The population size and the learning rate of the gradient information are the parameters that must be specified manually.

As described in Section 4, GI-ES combines three main components: (1) the expected fitness score of each sampling scheme is optimized; (2) the gradient information is approximated using individual information, and the approximate gradient is used to guide the search; and (3) the approximate gradient is used as the search direction.

To further verify the performance of the algorithm, each component was analyzed separately to show its individual effect. Table 1 lists the mean values obtained with each component, with the best values marked in bold. The results show that GI-ES with all three mechanisms works best. The variants are defined as follows:
GI-ES (1): the expected fitness score of each sampling scheme is optimized.
GI-ES (2): the gradient information is approximated using individual information, and the approximate gradient is used to guide the search.
GI-ES (3): the approximate gradient is used as the search direction.


Table 1: Mean values of GI-ES and its component variants.

Func. | GI-ES (1) | GI-ES (2) | GI-ES (3) | GI-ES
F1 | 1.26E+05 | 6.62E+03 | 7.34E+04 | 5.06E+04
F2 | 9.11E+02 | 2.92E+02 | 9.85E+03 | 2.10E-24
F3 | 6.16E+00 | 8.09E-13 | 9.36E+00 | 1.05E+01
F4 | 4.51E+09 | 5.07E+09 | 3.30E+11 | 1.39E+08
F5 | 1.93E+06 | 1.25E+06 | 7.48E+06 | 1.09E+06
F6 | 5.43E+03 | 5.47E+03 | 2.28E+01 | 1.11E+06
F7 | 7.78E+01 | 3.59E+06 | 4.20E+09 | 6.34E+01
F8 | 2.08E+12 | 2.04E+10 | 6.21E+15 | 3.12E+11
F9 | 1.61E+08 | 8.60E+07 | 5.28E+08 | 1.35E+08
F10 | 2.91E+04 | 8.46E+02 | 7.60E+02 | 8.11E+07
F11 | 2.55E+08 | 2.21E+08 | 3.80E+11 | 5.14E+05
F12 | 4.35E+03 | 2.39E+03 | 1.16E+11 | 5.13E+01
F13 | 8.02E+08 | 9.04E+06 | 1.92E+08 | 9.87E+04
F14 | 5.15E+08 | 6.88E+08 | 3.65E+07 | 4.58E+06
F15 | 3.39E+06 | 2.86E+06 | 1.32E+06 | 5.56E+05

5.2. Evaluation Criteria

To evaluate the performance of GI-ES, the same evaluation methods as in [18] are used, with three evaluation criteria. The first is the Formula One Score (FOS), which was used in the latest LSGO competition (CEC2015). Under this criterion, the algorithms are ranked from best to worst on each function, and the top 10 ranks receive 25, 18, 15, 12, 10, 8, 6, 4, 2, and 1 points, respectively; algorithms ranked outside the top 10 receive zero. Higher total scores indicate better performance. The second and third criteria are two non-parametric statistical hypothesis tests, including the Friedman test performed at a fixed significance level.
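The two ranking-based criteria can be sketched as follows. The point list is the one given above. Tie handling is not specified in the text, so this sketch assumes that FOS ties are broken by algorithm order and that Friedman mean ranks average tied ranks; the error matrix in the example is a toy input, not data from the tables, and the function names are hypothetical.

```python
FOS_POINTS = [25, 18, 15, 12, 10, 8, 6, 4, 2, 1]


def formula_one_scores(errors):
    """errors[f][a]: error of algorithm a on function f (lower is better). Total FOS per algorithm."""
    n_alg = len(errors[0])
    totals = [0] * n_alg
    for row in errors:
        order = sorted(range(n_alg), key=lambda a: row[a])  # stable sort: ties keep algorithm order
        for place, a in enumerate(order[:len(FOS_POINTS)]):
            totals[a] += FOS_POINTS[place]
    return totals


def friedman_mean_ranks(errors):
    """Mean rank of each algorithm over all functions (1 = best); tied errors share the average rank."""
    n_alg = len(errors[0])
    sums = [0.0] * n_alg
    for row in errors:
        order = sorted(range(n_alg), key=lambda a: row[a])
        start = 0
        while start < n_alg:
            end = start
            while end + 1 < n_alg and row[order[end + 1]] == row[order[start]]:
                end += 1
            shared = (start + end) / 2.0 + 1.0  # average of ranks start+1 .. end+1
            for pos in range(start, end + 1):
                sums[order[pos]] += shared
            start = end + 1
    return [s / len(errors) for s in sums]


def friedman_statistic(mean_ranks, n_functions):
    """Friedman chi-square statistic (without tie correction) computed from mean ranks."""
    k = len(mean_ranks)
    return (12.0 * n_functions / (k * (k + 1))
            * (sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4.0))


toy_errors = [[1e3, 1e-2, 5.0],  # function 1
              [0.0, 0.0, 1.0]]   # function 2
```

For the toy input, the FOS totals are `[40, 43, 33]` and the mean ranks are `[2.25, 1.25, 2.5]`: FOS rewards top placements heavily, while mean ranks weigh every position equally.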

5.3. Performance Analysis Using CEC2010 and CEC2013

Statistical results of GI-ES on CEC2010 and CEC2013 are given in Tables 2 and 3, respectively. Figure 1 illustrates the convergence behavior of GI-ES on sample functions from each class of CEC2013: F3 (fully separable), F8 and F11 (partially separable), F12 and F14 (overlapping; D = 905 for F14), and F15 (fully non-separable).


Table 2: Statistical results of GI-ES on CEC2010.

Func. | Best | Worst | Median | Mean | Std.
F1 | 3.54E+03 | 3.88E+03 | 3.77E+03 | 3.71E+03 | 1.70E+02
F2 | 8.59E+02 | 1.18E+03 | 1.01E+03 | 1.02E+03 | 1.61E+02
F3 | 8.69E-01 | 1.33E+00 | 1.03E+00 | 1.10E+00 | 2.31E-01
F4 | 2.49E+09 | 2.73E+09 | 2.27E+09 | 2.61E+09 | 1.21E+08
F5 | 1.87E+07 | 4.41E+07 | 3.52E+07 | 3.14E+07 | 1.27E+07
F6 | 1.02E+01 | 1.02E+01 | 1.02E+01 | 1.02E+01 | 1.27E-02
F7 | 8.81E+04 | 1.32E+05 | 1.12E+05 | 1.10E+05 | 2.19E+04
F8 | 2.42E+05 | 3.30E+05 | 1.20E+05 | 2.86E+05 | 4.41E+04
F9 | 2.10E+03 | 4.54E+03 | 2.78E+03 | 3.32E+03 | 1.22E+03
F10 | 9.07E+02 | 1.13E+03 | 1.01E+03 | 1.02E+03 | 1.13E+02
F11 | 1.09E+01 | 1.34E+01 | 2.01E+01 | 1.21E+01 | 1.25E+00
F12 | 0.00E+00 | 0.00E+00 | 0.00E+00 | 0.00E+00 | 0.00E+00
F13 | 1.90E+00 | 2.23E+01 | 1.04E+01 | 1.21E+01 | 1.02E+01
F14 | 3.30E+03 | 6.36E+03 | 5.22E+03 | 4.83E+03 | 1.53E+03
F15 | 9.07E+02 | 1.15E+03 | 1.01E+03 | 1.03E+03 | 1.23E+02
F16 | 9.10E+00 | 3.37E+01 | 3.23E+01 | 2.14E+01 | 1.23E+01
F17 | 0.00E+00 | 0.00E+00 | 0.00E+00 | 0.00E+00 | 0.00E+00
F18 | 1.98E+02 | 2.20E+03 | 2.00E+03 | 1.20E+03 | 1.00E+03
F19 | 8.03E+04 | 8.99E+04 | 9.21E+04 | 8.51E+04 | 4.82E+03
F20 | 6.40E+01 | 1.01E+03 | 5.19E+02 | 4.39E+02 | 4.75E+02


Table 3: Statistical results of GI-ES on CEC2013.

Func. | Best | Worst | Median | Mean | Std.
F1 | 3.86E+04 | 6.26E+04 | 5.01E+04 | 5.06E+04 | 1.20E+04
F2 | 1.98E-24 | 2.22E-24 | 0.00E+00 | 2.10E-24 | 1.21E-25
F3 | 1.05E+01 | 1.05E+01 | 1.03E+01 | 1.05E+01 | 1.14E-02
F4 | 6.98E+07 | 2.08E+08 | 6.05E+12 | 1.39E+08 | 6.92E+07
F5 | 8.56E+05 | 1.32E+06 | 4.03E-25 | 1.09E+06 | 2.34E+05
F6 | 1.10E+06 | 1.12E+06 | 1.25E+03 | 1.11E+06 | 1.15E+04
F7 | 2.32E+01 | 1.04E+02 | 1.93E+01 | 6.34E+01 | 4.02E+01
F8 | 1.06E+11 | 5.18E+11 | 2.01E+11 | 3.12E+11 | 2.06E+11
F9 | 1.24E+08 | 1.46E+08 | 1.51E+08 | 1.35E+08 | 1.09E+07
F10 | 8.06E+07 | 8.16E+07 | 8.10E+07 | 8.11E+07 | 5.30E+05
F11 | 3.02E+05 | 7.26E+05 | 3.23E+05 | 5.14E+05 | 2.12E+05
F12 | 4.12E+01 | 6.14E+01 | 2.31E+00 | 5.13E+01 | 1.01E+01
F13 | 9.15E+04 | 1.06E+05 | 6.21E+04 | 9.87E+04 | 7.18E+03
F14 | 4.24E+06 | 4.92E+06 | 4.35E+06 | 4.58E+06 | 3.39E+05
F15 | 3.22E+05 | 7.90E+05 | 5.01E+05 | 5.56E+05 | 2.34E+05

Variants of CMA-ES for large-scale optimization are used as comparison algorithms, together with other state-of-the-art algorithms not derived from CMA-ES, such as CC-based differential evolution (DECC-G) [14] and multiple offspring sampling (MOS) [9]. The comparison algorithms are listed in Tables 4 and 5. All of them followed the same CEC2010 and CEC2013 guidelines, and the results of the comparison are recorded in Tables 6 and 7.


Table 4: Comparison algorithms on CEC2010.

# | Algorithm | Published year
1 | SDENS [28] | 2010
2 | jDElsgo [10] | 2010
3 | DECC-DML [12] | 2010
4 | MA-SW-chains [29] | 2010
5 | DASA [8] | 2010
6 | EOEA [7] | 2010
7 | LMDEa [30] | 2012
8 | jDEsps [4] | 2012
9 | MOS2012 [9] | 2012
10 | CCGS [11] | 2012
11 | DM-HDMR [31] | 2014
12 | HACC-D [32] | 2014
13 | DISCC [33] | 2016
14 | EADE [34] | 2017
15 | ANDE [35] | 2017
16 | SEE [36] | 2018
17 | MMO-CC [37] | 2018


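Table 5: Comparison algorithms on CEC2013.

# | Algorithm | Published year
1 | MOS2013 [13] | 2013
2 | SACC [15] | 2013
3 | DECC-CG [14] | 2013
4 | VGDE [38] | 2014
5 | IHDELS [39] | 2015
6 | CBCC3-DG2 [40] | 2016
7 | CRO [41] | 2016
8 | CCFR-I [42] | 2017
9 | CCFRI-DG2 [42] | 2017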


Table 6: Comparison results of GI-ES and the state-of-the-art algorithms on CEC2010.

Func. | GI-ES | EADE | ANDE | LMDEa | SDENS | jDElsgo | DECC-DML | MA-SW | DISCC
F1 | 3.71E+03 | 4.70E-22 | 3.72E-26 | 1.35E-23 | 5.73E-06 | 8.86E-20 | 1.93E-25 | 2.10E-14 | 9.77E-25
F2 | 1.02E+03 | 4.16E+02 | 3.72E+02 | 6.97E+02 | 2.21E+03 | 1.25E-01 | 2.17E+02 | 8.10E+02 | 5.07E+02
F3 | 1.10E+00 | 6.25E-14 | 7.46E-14 | 6.44E-01 | 2.70E-05 | 3.81E-12 | 1.18E-13 | 7.28E-13 | 7.77E-14
F4 | 2.61E+09 | 1.08E+11 | 5.13E+11 | 2.08E+11 | 5.11E+12 | 8.06E+10 | 3.58E+12 | 3.53E+11 | 1.26E+12
F5 | 3.14E+07 | 8.79E+07 | 9.19E+07 | 6.62E+07 | 1.18E+08 | 9.72E+07 | 2.99E+08 | 1.68E+08 | 2.35E+08
F6 | 1.02E+01 | 1.90E+01 | 1.64E+00 | 2.63E-01 | 2.02E-04 | 1.70E-08 | 7.93E+05 | 8.14E+04 | 2.13E+06
F7 | 1.10E+05 | 2.11E-01 | 1.80E+00 | 2.45E-01 | 1.20E+08 | 1.31E-02 | 1.39E+08 | 1.03E+02 | 6.45E+05
F8 | 2.86E+05 | 2.26E-04 | 1.25E+07 | 3.61E-04 | 5.12E+07 | 3.15E+06 | 3.46E+07 | 1.41E+07 | 7.54E+07
F9 | 3.32E+03 | 3.67E+07 | 4.12E+07 | 2.64E+07 | 5.63E+08 | 3.11E+07 | 5.92E+07 | 1.41E+07 | 6.08E+07
F10 | 1.02E+03 | 2.62E+03 | 3.06E+03 | 2.80E+03 | 6.87E+03 | 2.64E+03 | 1.25E+04 | 2.07E+03 | 2.27E+03
F11 | 1.21E+01 | 1.14E+02 | 8.95E+01 | 1.19E+01 | 2.21E+02 | 2.20E+01 | 1.80E-13 | 3.80E+01 | 8.20E-01
F12 | 0.00E+00 | 2.80E+04 | 4.89E+04 | 1.83E+04 | 4.13E+05 | 1.21E+04 | 3.80E+06 | 3.62E-06 | 3.34E+04
F13 | 1.21E+01 | 1.01E+03 | 1.11E+03 | 5.95E+02 | 2.19E+03 | 7.11E+02 | 1.14E+03 | 1.25E+03 | 1.31E+03
F14 | 4.83E+03 | 1.46E+08 | 1.87E+08 | 8.63E+07 | 1.88E+09 | 1.69E+08 | 1.89E+08 | 3.11E+07 | 2.08E+08
F15 | 1.03E+03 | 3.18E+03 | 3.19E+03 | 5.63E+03 | 7.32E+03 | 5.84E+03 | 1.54E+04 | 2.74E+03 | 5.54E+03
F16 | 2.14E+01 | 3.00E+02 | 3.04E+02 | 3.87E+02 | 4.08E+02 | 1.44E+02 | 5.08E-02 | 9.98E+01 | 1.98E+01
F17 | 0.00E+00 | 1.52E+05 | 2.02E+05 | 2.14E+05 | 1.08E+06 | 1.02E+05 | 6.54E+06 | 1.24E+00 | 1.81E+05
F18 | 1.20E+03 | 2.26E+03 | 2.35E+03 | 1.68E+03 | 3.08E+04 | 1.85E+03 | 2.47E+03 | 1.30E+03 | 5.16E+03
F19 | 8.51E+04 | 1.29E+06 | 1.63E+06 | 4.42E+05 | 8.80E+05 | 2.74E+05 | 1.59E+07 | 2.85E+05 | 1.71E+06
F20 | 4.39E+02 | 2.10E+03 | 2.20E+03 | 1.38E+03 | 9.90E+02 | 1.53E+03 | 9.91E+02 | 1.07E+03 | 1.94E+03

Func. | DASA | EOEA | jDEsps | MOS2012 | DM-HDMR | HACC-D | CCGS | SEE | MMO-CC
F1 | 1.52E-21 | 2.20E-23 | 4.10E-23 | 0.00E+00 | 2.34E+01 | 1.99E-27 | 1.83E-22 | 6.99E-11 | 0.00E+00
F2 | 8.48E+00 | 3.62E-01 | 1.10E+02 | 1.97E+02 | 4.36E+03 | 1.43E-14 | 4.44E-02 | 8.77E+03 | 1.43E+03
F3 | 7.20E-11 | 1.67E-13 | 1.30E-13 | 1.12E+00 | 1.67E+01 | 3.45E-14 | 1.91E-01 | 1.99E+01 | 0.00E+00
F4 | 5.05E+11 | 2.86E+12 | 8.15E+11 | 1.91E+10 | 6.96E+11 | 1.55E+12 | 1.79E+12 | 2.58E+11 | 7.64E+06
F5 | 6.20E+08 | 2.24E+07 | 7.71E+07 | 6.81E+08 | 1.45E+08 | 1.96E+08 | 1.97E+07 | 5.85E+08 | 3.34E+08
F6 | 1.97E+07 | 3.85E+06 | 5.58E-03 | 1.99E+07 | 1.63E+01 | 3.55E-09 | 2.88E+06 | 1.99E+07 | 5.77E-01
F7 | 7.78E+00 | 1.24E+02 | 5.77E+05 | 0.00E+00 | 2.91E+05 | 3.87E-07 | 1.37E+02 | 3.14E-02 | 2.41E+10
F8 | 4.98E+07 | 1.01E+07 | 1.52E+06 | 1.12E+06 | 4.41E+07 | 7.44E+07 | 2.81E+07 | 1.82E+06 | 2.63E+08
F9 | 3.60E+07 | 4.63E+07 | 2.31E+04 | 5.75E+06 | 5.20E+07 | 3.32E+07 | 5.53E+07 | 2.67E+07 | 8.99E+01
F10 | 7.29E+03 | 1.00E+03 | 1.85E+03 | 7.86E+03 | 4.49E+03 | 1.30E+04 | 4.74E+03 | 1.27E+04 | 1.63E+03
F11 | 1.98E+02 | 3.18E+01 | 1.94E-05 | 1.99E+02 | 1.10E+01 | 7.82E-14 | 2.99E+01 | 2.19E+02 | 2.99E+00
F12 | 1.78E+03 | 2.61E+04 | 1.57E+04 | 0.00E+00 | 1.97E+03 | 1.31E+06 | 5.35E+03 | 2.60E+02 | 0.00E+00
F13 | 1.21E+03 | 1.24E+03 | 1.86E+02 | 1.36E+03 | 3.35E+06 | 1.96E+03 | 1.51E+03 | 7.12E+02 | 3.05E+04
F14 | 1.00E+08 | 1.65E+08 | 3.85E+05 | 1.52E+07 | 3.41E+08 | 9.21E+07 | 1.35E+08 | 9.88E+07 | 0.00E+00
F15 | 1.45E+04 | 2.14E+03 | 5.50E+03 | 1.54E+04 | 5.95E+03 | 1.56E+04 | 1.74E+03 | 1.50E+04 | 2.05E+03
F16 | 3.97E+02 | 8.26E+01 | 4.97E+00 | 3.97E+02 | 1.24E-06 | 1.95E-11 | 3.11E+01 | 3.97E+02 | 8.87E+00
F17 | 1.03E+04 | 7.93E+04 | 5.52E+04 | 4.66E-05 | 4.03E+04 | 1.42E+06 | 1.48E+04 | 7.40E+03 | 0.00E+00
F18 | 4.92E+03 | 2.94E+03 | 9.73E+02 | 3.91E+03 | 8.40E+03 | 4.02E+03 | 3.13E+03 | 3.14E+03 | 3.37E+04
F19 | 8.34E+05 | 1.84E+06 | 8.00E+05 | 3.41E+04 | 1.71E+06 | 1.87E+07 | 5.93E+05 | 7.13E+05 | 1.54E+07
F20 | 1.13E+03 | 1.97E+03 | 8.79E+02 | 5.31E+02 | 2.45E+06 | 1.51E+03 | 1.31E+03 | 1.43E+03 | 1.10E+03

FE = 3.0E+06; bold font represents the best result.

Func. | GI-ES | MOS2013 | DECC-CG | CBCC3-DG2 | CCFR-IDG2 | CCFR-I | CRO | IHDELS | VGDE | SACC

F1 | 5.06E+04 | 0.00E+00 | 2.03E-13 | 8.65E+05 | 2.00E-05 | 1.30E-05 | 1.84E+06 | 4.34E-28 | 0.00E+00 | 2.73E-24
F2 | 2.10E-24 | 8.32E+02 | 1.03E+03 | 1.41E+04 | 3.60E+02 | 5.50E-01 | 9.84E+02 | 1.32E+03 | 4.56E+01 | 7.06E+02
F3 | 1.05E+01 | 9.17E-13 | 2.87E-10 | 2.06E+01 | 2.10E+01 | 2.00E+01 | 2.01E+01 | 2.01E+01 | 3.98E-13 | 1.11E+00
F4 | 1.39E+08 | 1.74E+08 | 2.60E+10 | 3.39E+07 | 9.60E+07 | 4.50E+07 | 1.55E+10 | 3.04E+08 | 5.96E+08 | 4.56E+10
F5 | 1.09E+06 | 6.94E+06 | 7.28E+14 | 2.14E+06 | 2.80E+06 | 2.50E+06 | 2.38E+07 | 9.59E+06 | 3.00E+06 | 7.74E+06
F6 | 1.11E+06 | 1.48E+05 | 4.85E+04 | 1.05E+06 | 1.10E+06 | 1.10E+06 | 1.06E+06 | 1.03E+06 | 1.31E+05 | 2.47E+05
F7 | 6.34E+01 | 1.62E+04 | 6.07E+08 | 2.95E+07 | 2.00E+07 | 8.60E+06 | 2.78E+08 | 3.46E+04 | 1.85E+03 | 8.98E+07
F8 | 3.12E+11 | 8.00E+12 | 4.26E+14 | 6.74E+10 | 7.00E+10 | 9.60E+09 | 4.56E+14 | 1.36E+12 | 7.00E+14 | 1.20E+15
F9 | 1.35E+08 | 3.83E+08 | 4.27E+08 | 1.70E+08 | 1.90E+08 | 1.90E+08 | 5.27E+08 | 6.74E+08 | 2.31E+08 | 5.98E+08
F10 | 8.11E+07 | 9.02E+05 | 1.10E+07 | 9.28E+07 | 9.50E+07 | 9.50E+07 | 9.44E+07 | 9.16E+07 | 1.57E+02 | 2.95E+07
F11 | 5.14E+05 | 5.22E+07 | 2.46E+11 | 7.70E+08 | 4.00E+08 | 3.30E+08 | 2.91E+10 | 1.07E+07 | 7.52E+07 | 2.78E+09
F12 | 5.13E+01 | 2.47E+02 | 1.04E+03 | 5.81E+07 | 1.60E+09 | 6.00E+08 | 3.69E+03 | 3.77E+02 | 2.52E+03 | 8.73E+02
F13 | 9.87E+04 | 3.40E+06 | 3.42E+10 | 6.03E+08 | 1.20E+09 | 9.30E+08 | 5.33E+09 | 3.80E+06 | 1.36E+09 | 1.78E+09
F14 | 4.58E+06 | 2.56E+07 | 6.08E+11 | 1.11E+09 | 3.40E+09 | 2.10E+09 | 6.08E+10 | 1.58E+07 | 2.29E+10 | 1.75E+10
F15 | 5.56E+05 | 2.35E+06 | 6.05E+07 | 7.11E+06 | 9.80E+06 | 8.20E+06 | 1.88E+07 | 2.81E+06 | 3.44E+06 | 2.01E+06

FE = 3.0E+06; bold font represents the best result.

Tables 8 and 9 summarize the ranking for GI-ES and the state-of-the-art algorithms using Formula One Score (FOS). Tables 10 and 11 summarize the ranking obtained using Friedman’s test.


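Table 8: Formula One Score (FOS) of GI-ES and the compared algorithms on CEC2010.

GI-ES | EADE | ANDE | LMDEa | SDENS | jDElsgo | DECC-DML | MA-SW-Chains | DISCC
226 | 106 | 76 | 125 | 29 | 125 | 71 | 115 | 50

DASA | EOEA | jDEsps | MOS2012 | DM-HDMR | HACC-D | CCGS | SEE | MMO-CC
49 | 94 | 195 | 168 | 43 | 164 | 98 | 79 | 237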


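Table 9: Formula One Score (FOS) of GI-ES and the compared algorithms on CEC2013.

GI-ES | MOS2013 | DECC-CG | CBCC3-DG2 | CCFR-IDG2 | CCFR-I | CRO | IHDELS | VGDE | SACC
278 | 218 | 103 | 150 | 120 | 154 | 55 | 157 | 196 | 111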


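Table 10: Friedman average ranks of GI-ES and the compared algorithms on CEC2010 (lower is better).

GI-ES | jDEsps | jDElsgo | MMO-CC | LMDEa | MA-SW-Chains | MOS2012 | EADE | CCGS
5.33 | 6 | 7.75 | 7.75 | 7.8 | 7.95 | 8.63 | 8.8 | 9.4

EOEA | HACC-D | ANDE | DASA | SEE | DISCC | DECC-DML | DM-HDMR | SDENS
9.6 | 9.85 | 9.9 | 11.1 | 11.28 | 11.63 | 11.68 | 12.48 | 14.1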


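Table 11: Friedman average ranks of GI-ES and the compared algorithms on CEC2013 (lower is better).

GI-ES | MOS2013 | DECC-CG | CBCC3-DG2 | CCFR-IDG2 | CCFR-I | CRO | IHDELS | VGDE | SACC
3 | 3.57 | 7.2 | 5.6 | 6.3 | 5.17 | 8.3 | 5.17 | 4.43 | 6.27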

5.3.1. Formula One Score (FOS)

As shown in Tables 8 and 9, GI-ES ranked second among all compared algorithms on CEC2010 under the Formula One Score (FOS) and first on CEC2013. On CEC2010, the best algorithm is MMO-CC with 237 points, and GI-ES obtains 226 points. For comparison, MA-SW-Chains and MOS2012, the winners of the CEC2010 and CEC2012 competitions, obtain 115 and 168 points, respectively. This shows that GI-ES is competitive. On CEC2013, GI-ES obtains 278 points, followed by MOS2013 and VGDE with 218 and 196 points, respectively.
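The FOS totals are obtained by ranking the algorithms on every benchmark function and awarding points by finishing position. This section does not restate the point scheme, so the sketch below assumes the usual Formula One allocation (25, 18, 15, 12, 10, 8, 6, 4, 2, 1 points for positions 1 to 10 and 0 below) and lets tied algorithms share the better position. The function names and the toy error values are hypothetical and are not the paper's data.

```python
from typing import Dict, List

# Assumed Formula One points for positions 1..10; any lower position scores 0.
F1_POINTS = [25, 18, 15, 12, 10, 8, 6, 4, 2, 1]


def positions(errors: List[float]) -> List[int]:
    """1-based finishing position of each algorithm on one function.

    Lower error is better; tied errors share the better position (assumed tie rule).
    """
    return [1 + sum(e < x for e in errors) for x in errors]


def formula_one_scores(results: Dict[str, List[float]]) -> Dict[str, int]:
    """Total Formula One points per algorithm.

    `results` maps an algorithm name to its mean errors, one entry per function.
    """
    names = list(results)
    n_funcs = len(results[names[0]])
    totals = {name: 0 for name in names}
    for f in range(n_funcs):
        errors = [results[name][f] for name in names]
        for name, pos in zip(names, positions(errors)):
            totals[name] += F1_POINTS[pos - 1] if pos <= len(F1_POINTS) else 0
    return totals


# Hypothetical mean errors of three algorithms on four functions (not the paper's data).
toy = {
    "A": [1e-10, 5.0e3, 2.0e6, 7.0e1],
    "B": [0.0, 4.0e3, 3.0e6, 9.0e1],
    "C": [2e-5, 6.0e3, 1.0e6, 8.0e1],
}
print(formula_one_scores(toy))  # {'A': 79, 'B': 80, 'C': 73}
```

Because points fall off steeply after the first place, a single extra win can outweigh several second places, which is why FOS totals and average ranks need not agree.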

5.3.2. Friedman Test

According to the Friedman test results in Tables 10 and 11, GI-ES obtained the best (lowest) average rank on both the CEC2010 and CEC2013 benchmarks. On CEC2010, GI-ES obtains an average rank of 5.33, compared with jDEsps (6), MMO-CC (7.75), and MA-SW-Chains (7.95). On CEC2013, GI-ES obtains 3 and MOS2013 obtains 3.57.
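The significance of these differences can be checked approximately from the average ranks in Table 11. The sketch below applies the textbook Friedman statistic, chi2_F = 12N / (k(k + 1)) * [sum_j R_j^2 - k(k + 1)^2 / 4], assuming N = 15 problems (CEC2013 F1 to F15) and k = 10 algorithms. The value 16.92 quoted in the text equals the chi-square critical value for k - 1 = 9 degrees of freedom at alpha = 0.05. Because the published ranks are rounded to two decimals, the statistic is only approximate; the helper name is hypothetical.

```python
from typing import Sequence

# Average Friedman ranks on CEC2013 transcribed from Table 11 (rounded to two decimals).
TABLE_11_RANKS = {
    "GI-ES": 3.0, "MOS2013": 3.57, "DECC-CG": 7.2, "CBCC3-DG2": 5.6,
    "CCFR-IDG2": 6.3, "CCFR-I": 5.17, "CRO": 8.3, "IHDELS": 5.17,
    "VGDE": 4.43, "SACC": 6.27,
}


def friedman_statistic(avg_ranks: Sequence[float], n_problems: int) -> float:
    """Friedman chi-square statistic computed from per-algorithm average ranks."""
    k = len(avg_ranks)
    sum_sq = sum(r * r for r in avg_ranks)
    return 12.0 * n_problems / (k * (k + 1)) * (sum_sq - k * (k + 1) ** 2 / 4.0)


chi2 = friedman_statistic(list(TABLE_11_RANKS.values()), n_problems=15)
CRITICAL_9DF_005 = 16.92  # chi-square critical value, 9 degrees of freedom, alpha = 0.05

print(f"chi2_F = {chi2:.2f}, reject H0: {chi2 > CRITICAL_9DF_005}")
# prints: chi2_F = 38.32, reject H0: True
```

A quick sanity check on the transcription is that average ranks over k algorithms must sum to k(k + 1)/2 = 55; the Table 11 values sum to 55.01.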

In the Friedman test, the critical value is 16.92 and the p value is 1.64E-05, which indicates that there are significant differences between the algorithms.

As the previous comparison shows, a high ranking under the Formula One Score (FOS) does not guarantee the same ranking under the Friedman test, because the FOS gives more weight to the top positions.
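The effect of this weighting can be seen in a small hypothetical example. Assuming the usual Formula One point scheme (25, 18, 15, 12, 10, 8, 6, 4, 2, 1), which this section does not list explicitly, an algorithm that alternates between first and fourth place collects more FOS points than one that always finishes second, yet its average rank, the quantity the Friedman test works with, is worse. The positions below are invented for illustration.

```python
# Assumed Formula One points by finishing position (positions beyond 10 score 0).
POINTS = {1: 25, 2: 18, 3: 15, 4: 12, 5: 10, 6: 8, 7: 6, 8: 4, 9: 2, 10: 1}

# Hypothetical finishing positions on four benchmark functions; other algorithms
# in the field occupy the remaining positions.
positions = {
    "spiky": [1, 4, 1, 4],       # best on some functions, mediocre on others
    "consistent": [2, 2, 2, 2],  # always runner-up
}


def fos(pos):
    """Total Formula One points for a list of finishing positions."""
    return sum(POINTS.get(p, 0) for p in pos)


def average_rank(pos):
    """Mean finishing position (lower is better), as used by the Friedman test."""
    return sum(pos) / len(pos)


for name, pos in positions.items():
    print(f"{name:>10}: FOS = {fos(pos)}, average rank = {average_rank(pos):.2f}")
```

Here "spiky" wins on FOS (74 versus 72 points) while "consistent" wins on average rank (2.00 versus 2.50).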

5.4. CPU Computational Time

The previous experiments evaluated the effectiveness of GI-ES; this section analyzes its computational cost. Table 12 records the average running time of GI-ES, the baseline CMA-ES, and MA-SW-Chains on the CEC2010 LSGO benchmark with dimension D = 1000.


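Table 12: Average running time of MA-SW-Chains, CMA-ES, and GI-ES on the CEC2010 LSGO functions (D = 1000).

D = 1000 | F1 | F2 | F3 | F4 | F5 | F6 | F7 | F8 | F9 | F10
MA-SW-Chains | 166.87 | 218.9 | 197.29 | 199.92 | 260.2 | 229.17 | 169.57 | 165.96 | 431.78 | 478.9
CMA-ES | 159.14 | 216.73 | 184.41 | 141.91 | 232.9 | 237.31 | 236.01 | 136.01 | 186.61 | 198.43
GI-ES | 147.96 | 213.04 | 180.72 | 98.22 | 189.21 | 193.62 | 192.32 | 92.32 | 142.92 | 154.74

D = 1000 | F11 | F12 | F13 | F14 | F15 | F16 | F17 | F18 | F19 | F20
MA-SW-Chains | 472.41 | 169.31 | 169.96 | 687.67 | 735.8 | 725.51 | 174.55 | 180.11 | 163.41 | 174.07
CMA-ES | 308.27 | 106.25 | 137.12 | 233.54 | 372.58 | 277.15 | 36.5 | 38.39 | 136.57 | 138.22
GI-ES | 251.96 | 92.54 | 93.41 | 189.85 | 328.89 | 333.46 | 92.81 | 94.71 | 92.88 | 194.53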

MA-SW-Chains is included as an additional comparison. Each entry is the average time over 51 independent runs of the corresponding algorithm at D = 1000, with a maximum of 10E06 iterations.

The results show that, for the separable functions F1–F3, GI-ES runs slightly faster than MA-SW-Chains and CMA-ES, although the differences among the three algorithms are small. For the partially separable functions (F4–F18), GI-ES is clearly faster than CMA-ES on F4–F15 and faster than MA-SW-Chains on every function except F7, whereas CMA-ES is faster on F16–F18. For the fully non-separable functions, the picture is mixed: GI-ES is the fastest on F19, while CMA-ES is the fastest on F20. Overall, GI-ES offers its clearest runtime advantage on the partially separable problems.
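To make the runtime comparison easy to audit, the sketch below transcribes Table 12 into Python and counts, per CEC2010 function class, which algorithm is fastest. The grouping follows the classes named in the text (F1–F3 separable, F4–F18 partially separable, F19–F20 fully non-separable); the helper names `fastest` and `wins` are hypothetical.

```python
from statistics import mean

# Average running times transcribed from Table 12 (CEC2010, D = 1000), F1..F20.
RUNTIME = {
    "MA-SW-Chains": [166.87, 218.9, 197.29, 199.92, 260.2, 229.17, 169.57, 165.96, 431.78, 478.9,
                     472.41, 169.31, 169.96, 687.67, 735.8, 725.51, 174.55, 180.11, 163.41, 174.07],
    "CMA-ES": [159.14, 216.73, 184.41, 141.91, 232.9, 237.31, 236.01, 136.01, 186.61, 198.43,
               308.27, 106.25, 137.12, 233.54, 372.58, 277.15, 36.5, 38.39, 136.57, 138.22],
    "GI-ES": [147.96, 213.04, 180.72, 98.22, 189.21, 193.62, 192.32, 92.32, 142.92, 154.74,
              251.96, 92.54, 93.41, 189.85, 328.89, 333.46, 92.81, 94.71, 92.88, 194.53],
}

# CEC2010 function classes as described in the text (1-based function indices).
GROUPS = {
    "separable": range(1, 4),
    "partially separable": range(4, 19),
    "fully non-separable": range(19, 21),
}


def fastest(func: int) -> str:
    """Algorithm with the lowest average runtime on function F<func>."""
    return min(RUNTIME, key=lambda name: RUNTIME[name][func - 1])


def wins(a: str, b: str, funcs: range) -> int:
    """Number of functions in `funcs` on which algorithm `a` is faster than `b`."""
    return sum(RUNTIME[a][f - 1] < RUNTIME[b][f - 1] for f in funcs)


for label, funcs in GROUPS.items():
    group_means = {name: round(mean(RUNTIME[name][f - 1] for f in funcs), 2) for name in RUNTIME}
    print(f"{label}: mean runtime {group_means}, fastest {[fastest(f) for f in funcs]}")

print("GI-ES faster than CMA-ES on F4-F18:", wins("GI-ES", "CMA-ES", range(4, 19)), "of 15")
print("GI-ES faster than MA-SW-Chains on F4-F18:", wins("GI-ES", "MA-SW-Chains", range(4, 19)), "of 15")
```

With these numbers, GI-ES is faster than CMA-ES on 12 of the 15 partially separable functions and faster than MA-SW-Chains on 14 of them.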

6. Conclusion

This research has shown that guidance information derived from approximate gradients improves the performance of evolution strategies. The low utilization of historical information in ES was addressed by using this guidance information to generate the distribution of next-generation solutions, and the guidance information was obtained by approximating the gradient. This strategy not only increased the diversity of the information but also made full use of the best information found during the heuristic search. The theoretical analysis and experimental results showed that the method incorporating guidance information is accurate and stable.
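The central idea summarized here is that the fitness of all sampled individuals, not only the elites, is turned into an approximate gradient that steers the next generation. The exact GI-ES update is defined earlier in the paper and is not reproduced here. As a hypothetical illustration of the general idea only, the sketch below swaps in the standard antithetic Gaussian-smoothing gradient estimator from the evolution-strategies literature, followed by plain gradient steps on the distribution mean; the function names, step sizes, fixed sigma, and the sphere test function are all assumptions.

```python
import random
from typing import Callable, List

Vector = List[float]


def approx_gradient(f: Callable[[Vector], float], mean: Vector, sigma: float,
                    pairs: int, rng: random.Random) -> Vector:
    """Antithetic Gaussian-smoothing estimate of the gradient of f at `mean`.

    Every sampled individual contributes through its fitness, not only the elites:
        g = 1 / (2 * sigma * pairs) * sum_i [f(m + sigma*e_i) - f(m - sigma*e_i)] * e_i
    """
    dim = len(mean)
    grad = [0.0] * dim
    for _ in range(pairs):
        eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        plus = [m + sigma * e for m, e in zip(mean, eps)]
        minus = [m - sigma * e for m, e in zip(mean, eps)]
        diff = f(plus) - f(minus)
        for j in range(dim):
            grad[j] += diff * eps[j]
    scale = 1.0 / (2.0 * sigma * pairs)
    return [g * scale for g in grad]


def es_minimize(f: Callable[[Vector], float], x0: Vector, sigma: float = 0.1,
                lr: float = 0.05, pairs: int = 20, iters: int = 200, seed: int = 0) -> Vector:
    """Move the search-distribution mean against the approximate gradient each generation."""
    rng = random.Random(seed)
    mean = list(x0)
    for _ in range(iters):
        g = approx_gradient(f, mean, sigma, pairs, rng)
        mean = [m - lr * gj for m, gj in zip(mean, g)]
    return mean


def sphere(x: Vector) -> float:
    """Simple convex test function with its minimum 0 at the origin."""
    return sum(v * v for v in x)


best = es_minimize(sphere, [1.0] * 10)
print(f"sphere(best) = {sphere(best):.3e}")
```

All 2 x `pairs` evaluations in each generation enter the estimate, which mirrors the contrast the paper draws with elite-only updates; a fixed sigma is used only to keep the sketch short.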

The experimental results showed that the use of guiding information is effective. The algorithm was also compared with other state-of-the-art metaheuristic algorithms and showed good average performance as well as good performance in pairwise comparisons across a wide range of test functions.

The experiments showed that the algorithm is an effective global optimization method for large-scale problems, which makes it applicable to many practical problems. The principle of using guidance information is simple but effective, and it offers some guidance for the design of other heuristic optimization algorithms.

Data Availability

The data are available at https://titan.csit.rmit.edu.au/~e46507/publications/lsgo-cec10.pdf.

Conflicts of Interest

The author declares that there are no conflicts of interest.

References

  1. N. Hansen, S. D. Müller, and P. Koumoutsakos, “Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (CMA-ES),” Evolutionary Computation, vol. 11, no. 1, pp. 1–18, 2003.
  2. M. Hauschild and M. Pelikan, “An introduction and survey of estimation of distribution algorithms,” Swarm and Evolutionary Computation, vol. 1, no. 3, pp. 111–128, 2011.
  3. L.-Y. Tseng and C. Chen, “Multiple trajectory search for large scale global optimization,” in Proceedings of the 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence), Hong Kong, China, June 2008.
  4. S.-Z. Zhao, P. N. Suganthan, and S. Das, “Self-adaptive differential evolution with multi-trajectory search for large-scale optimization,” Soft Computing, vol. 15, no. 11, pp. 2175–2185, 2011.
  5. J. Zhang and A. C. Sanderson, “JADE: adaptive differential evolution with optional external archive,” IEEE Transactions on Evolutionary Computation, vol. 13, no. 5, pp. 945–958, 2009.
  6. D. Molina, M. Lozano, and F. Herrera, “MA-SW-Chains: memetic algorithm based on local search chains for large scale continuous global optimization,” in Proceedings of the IEEE Congress on Evolutionary Computation, pp. 1–8, Barcelona, Spain, July 2010.
  7. Y. Wang and B. Li, “Two-stage based ensemble optimization for large-scale global optimization,” in Proceedings of the IEEE Congress on Evolutionary Computation, pp. 1–8, Barcelona, Spain, July 2010.
  8. P. Korošec, J. Šilc, and B. Filipič, “The differential ant-stigmergy algorithm,” Information Sciences, vol. 192, pp. 82–97, 2012.
  9. A. LaTorre, S. Muelas, and J.-M. Peña, “Multiple offspring sampling in large scale global optimization,” in Proceedings of the 2012 IEEE Congress on Evolutionary Computation, Brisbane, Australia, June 2012.
  10. J. Brest, A. Zamuda, I. Fister, and M. S. Maučec, “Large scale global optimization using self-adaptive differential evolution algorithm,” in Proceedings of the IEEE Congress on Evolutionary Computation, pp. 1–8, Barcelona, Spain, July 2010.
  11. K. Zhang and B. Li, “Cooperative coevolution with global search for large scale global optimization,” in Proceedings of the 2012 IEEE Congress on Evolutionary Computation, pp. 1–7, Brisbane, Australia, June 2012.
  12. M. N. Omidvar, X. Li, and X. Yao, “Cooperative co-evolution with delta grouping for large scale non-separable function optimization,” in Proceedings of the IEEE Congress on Evolutionary Computation, pp. 1–8, Barcelona, Spain, July 2010.
  13. A. LaTorre, S. Muelas, and J.-M. Peña, “Large scale global optimization: experimental results with MOS-based hybrid algorithms,” in Proceedings of the 2013 IEEE Congress on Evolutionary Computation, pp. 2742–2749, Cancun, Mexico, June 2013.
  14. Z. Yang, K. Tang, and X. Yao, “Large scale evolutionary optimization using cooperative coevolution,” Information Sciences, vol. 178, no. 15, pp. 2985–2999, 2008.
  15. F. Wei, Y. Wang, and Y. Huo, “Smoothing and auxiliary functions based cooperative coevolution for global optimization,” in Proceedings of the 2013 IEEE Congress on Evolutionary Computation, pp. 2736–2741, Cancun, Mexico, June 2013.
  16. H. Ge, M. Zhao, Y. Hou et al., “Bi-space interactive cooperative coevolutionary algorithm for large scale black-box optimization,” Applied Soft Computing, vol. 97, Article ID 106798, 2020.
  17. D. Molina, A. LaTorre, and F. Herrera, “SHADE with iterative local search for large-scale global optimization,” in Proceedings of the 2018 IEEE Congress on Evolutionary Computation (CEC), pp. 1–8, Rio de Janeiro, Brazil, July 2018.
  18. A. A. Hadi, A. W. Mohamed, and K. M. Jambi, “LSHADE-SPA memetic framework for solving large-scale optimization problems,” Complex & Intelligent Systems, vol. 5, no. 1, pp. 25–40, 2019.
  19. V. A. Shim, K. C. Tan, and K. K. Tan, “A hybrid adaptive evolutionary algorithm in the domination-based and decomposition-based frameworks of multi-objective optimization,” in Proceedings of the 2012 IEEE Congress on Evolutionary Computation, IEEE, Brisbane, Australia, June 2012.
  20. A. LaTorre, S. Muelas, and J.-M. Peña, “A comprehensive comparison of large scale global optimizers,” Information Sciences, vol. 316, pp. 517–549, 2015.
  21. N. Hansen, “The CMA evolution strategy: a tutorial,” 2016, http://arxiv.org/abs/1604.00772.
  22. A. Bordes, L. Bottou, and P. Gallinari, “SGD-QN: careful quasi-Newton stochastic gradient descent,” Journal of Machine Learning Research, vol. 10, pp. 1737–1754, 2009.
  23. D. P. Kingma and J. Ba, “Adam: a method for stochastic optimization,” 2014, http://arxiv.org/abs/1412.6980.
  24. J. Grefenstette, “Rank-based selection,” Evolutionary Computation, vol. 1, pp. 187–194, 2000.
  25. K. Tang, X. Li, P. N. Suganthan, Z. Yang, and T. Weise, “Benchmark functions for the CEC’2010 special session and competition on large-scale global optimization,” Tech. Rep., Nature Inspired Computation and Applications Laboratory, Hefei, China, 2009.
  26. X. Li, K. Tang, M. N. Omidvar, Z. Yang, K. Qin, and H. China, “Benchmark functions for the CEC 2013 special session and competition on large-scale global optimization,” Gene, vol. 7, no. 33, p. 8, 2013.
  27. J. Jin, C. Yang, and Y. Zhang, “An improved CMA-ES for solving large scale optimization problem,” in Advances in Swarm Intelligence, Y. Tan, Y. Shi, and M. Tuba, Eds., pp. 386–396, Springer International Publishing, 2020.
  28. H. Wang, Z. Wu, S. Rahnamayan, and D. Jiang, “Sequential DE enhanced by neighborhood search for large scale global optimization,” in Proceedings of the IEEE Congress on Evolutionary Computation, pp. 1–7, Barcelona, Spain, July 2010.
  29. D. Molina, M. Lozano, and F. Herrera, “MA-SW-Chains: memetic algorithm based on local search chains for large scale continuous global optimization,” in Proceedings of the IEEE Congress on Evolutionary Computation, Barcelona, Spain, July 2010.
  30. T. Takahama and S. Sakai, “Large scale optimization by differential evolution with landscape modality detection and a diversity archive,” in Proceedings of the 2012 IEEE Congress on Evolutionary Computation, pp. 1–8, Brisbane, Australia, June 2012.
  31. S. Mahdavi, M. E. Shiri, and S. Rahnamayan, “Cooperative co-evolution with a new decomposition method for large-scale optimization,” in Proceedings of the 2014 IEEE Congress on Evolutionary Computation (CEC), pp. 1285–1292, Beijing, China, July 2014.
  32. S. Ye, G. Dai, L. Peng, and M. Wang, “A hybrid adaptive coevolutionary differential evolution algorithm for large-scale optimization,” in Proceedings of the 2014 IEEE Congress on Evolutionary Computation (CEC), pp. 1277–1284, Beijing, China, July 2014.
  33. G. Dai, X. Chen, L. Chen, M. Wang, and L. Peng, “Cooperative coevolution with dependency identification grouping for large scale global optimization,” in Proceedings of the 2016 IEEE Congress on Evolutionary Computation (CEC), pp. 5201–5208, Vancouver, Canada, July 2016.
  34. A. W. Mohamed, “Solving large-scale global optimization problems using enhanced adaptive differential evolution algorithm,” Complex & Intelligent Systems, vol. 3, no. 4, pp. 205–231, 2017.
  35. A. W. Mohamed and A. S. Almazyad, “Differential evolution with novel mutation and adaptive crossover strategies for solving large scale global optimization problems,” Applied Computational Intelligence and Soft Computing, vol. 2017, Article ID 7974218, 18 pages, 2017.
  36. P. Yang, K. Tang, and X. Yao, “Turning high-dimensional optimization into computationally expensive optimization,” IEEE Transactions on Evolutionary Computation, vol. 22, no. 1, pp. 143–156, 2017.
  37. X. Peng, Y. Jin, and H. Wang, “Multimodal optimization enhanced cooperative coevolution for large-scale optimization,” IEEE Transactions on Cybernetics, vol. 49, no. 9, pp. 3507–3520, 2018.
  38. F. Wei, Y. Wang, and T. Zong, “Variable grouping based differential evolution using an auxiliary function for large scale global optimization,” in Proceedings of the 2014 IEEE Congress on Evolutionary Computation (CEC), pp. 1293–1298, IEEE, Beijing, China, July 2014.
  39. D. Molina and F. Herrera, “Iterative hybridization of DE with local search for the CEC’2015 special session on large scale global optimization,” in Proceedings of the 2015 IEEE Congress on Evolutionary Computation (CEC), pp. 1974–1978, IEEE, Sendai, Japan, May 2015.
  40. M. N. Omidvar, B. Kazimipour, X. Li, and X. Yao, “CBCC3—a contribution-based cooperative co-evolutionary algorithm with improved exploration/exploitation balance,” in Proceedings of the 2016 IEEE Congress on Evolutionary Computation (CEC), pp. 3541–3548, Vancouver, Canada, July 2016.
  41. S. Salcedo-Sanz, C. Camacho-Gómez, D. Molina, and F. Herrera, “A coral reefs optimization algorithm with substrate layers and local search for large scale global optimization,” in Proceedings of the 2016 IEEE Congress on Evolutionary Computation (CEC), pp. 3574–3581, Vancouver, Canada, July 2016.
  42. M. Yang, M. N. Omidvar, C. Li et al., “Efficient resource allocation in cooperative co-evolution for large-scale global optimization,” IEEE Transactions on Evolutionary Computation, vol. 21, no. 4, pp. 493–505, 2016.

Copyright © 2021 Jin Jin. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
