Abstract
For large-scale optimization, CMA-ES suffers from high computational complexity and premature stagnation. This paper proposes an improved CMA-ES algorithm called GIES. To address the complexity problem, the method replaces the explicit computation of a covariance matrix with the modeling of expected fitness for a given covariance matrix. To address premature stagnation, it replaces the historical information of elite individuals with the historical information of all individuals; this information can be viewed as an approximate gradient, from which the parameters of the next generation of individuals are generated. The algorithm was evaluated on the CEC2010 and CEC2013 LSGO benchmark test suites, and the experimental results verify its effectiveness on a number of different tasks.
1. Introduction
Most machine learning problems can be modeled as optimization problems, and as the amount of data increases, the models become more and more complex. Therefore, large-scale optimization has become a focus for many researchers. The covariance matrix adaptation evolution strategy (CMA-ES) [1] is one of the most powerful evolution strategies for global optimization. It is an algorithm based on the population probability distribution, very similar to the estimation of distribution algorithm (EDA) [2]. A common shortcoming of simple evolution strategies is that the standard deviation of the mutation noise is fixed. CMA-ES automatically adjusts the standard deviation according to the distribution of the population, which brings two benefits: (1) it increases the diversity of the population, so that the algorithm can escape local optima; (2) it adapts the standard deviation parameters so that the algorithm can fit the fitness landscape.
In addition, because CMA-ES can use the information of the best solutions to adjust its parameters, it can expand the search scope when the optimal solution is far away or narrow it when the optimal solution is near. Due to these advantages, CMA-ES, as one of the most popular gradient-free optimization algorithms, has become the choice of many researchers and practitioners.
Despite its huge advantage in solving optimization problems, CMA-ES suffers from some limitations when dealing with LSOPs. Time consumption: the basic operations of CMA-ES involve the covariance matrix, so each generation takes O(n^2) time and space, and sampling the new population requires an additional matrix decomposition. CMA-ES relies on spectral (eigen) decomposition to cope with numerical errors and ill-conditioning, which is generally considered computationally inefficient compared with other decomposition techniques. Lack of population diversity: another obvious disadvantage is that CMA-ES evaluates only a few of the best individuals. Although this can speed up convergence to some extent, the strategy discards most of the available information, which prevents CMA-ES from performing well on ill-conditioned problems. Successful individuals have qualities worth learning from, but unsuccessful individuals also keep a record of what "not to do"; both kinds of information matter for better evaluation of the next generation.
These two limitations may prevent the use of CMA-ES for LSOPs.
To solve the above problems, this paper proposes an evolution strategy based on gradient information utilization (GIES), which extends the application of CMA-ES to large-scale optimization. The characteristics of GIES are as follows: (1) GIES optimizes the expected fitness score of each sampling scheme; if the expectation is good enough, the best-performing scheme is likely to perform even better in the next sampled generation. Maximizing the expected fitness score of a sampling scheme is equivalent to maximizing the overall fitness score of the samples it generates. (2) Gradient information is approximated from the information of all individuals, and this approximate gradient is used to guide the search. (3) GIES uses the approximate gradient as the search direction, allowing the algorithm to adapt to the fitness landscape on which the variables depend. The gradient signal is obtained by maximum likelihood estimation under the model described above. GIES differs from traditional evolutionary computation in that it represents the "population" as a parameterized distribution and uses a search gradient, computed from the fitness values of historical data, to update the parameters of that distribution.
Following this introduction, Section 2 discusses related work on large-scale optimization and on the utilization of historical information in evolutionary computation. Section 3 introduces the background and motivation of this paper. Section 4 describes the detailed implementation of GIES. Section 5 then examines simulation results on the benchmark test suites to evaluate the effectiveness of the proposed approach. Finally, Section 6 concludes the paper.
2. Related Work
In recent years, large-scale optimization has become a hot research topic, and many large-scale benchmark functions have been put forward to examine the merits of large-scale optimization algorithms. Researchers have done a lot of useful work and proposed many approaches to solve large-scale global optimization problems.
There are currently two main research directions for large-scale optimization problems:
(1) Decomposition-based algorithms: dimensionality reduction (decomposition) is carried out by grouping the variables of a large-scale problem, so as to decompose it into multiple subproblems. The subproblems are then optimized by evolutionary algorithms within the framework of cooperative coevolution (CC).
(2) Hybrid local-search algorithms: instead of directly decomposing the large-scale problem, optimization is carried out by combining multiple local search algorithms, each with its own set of parameters.
The rest of this section discusses some algorithms from the recent CEC large-scale optimization competitions.
CEC2008 hosted the first known LSGO competition. Multiple trajectory search (MTS) [3] was the first winner; MTS-LS1, MTS-LS2, and MTS-LS3 are improved versions of MTS. Some DE-based algorithms have also achieved good results in LSGO competitions. Self-adaptive differential evolution with multiple trajectory search (SaDE-MMTS) [4] is a hybrid algorithm that integrates JADE [5] and a modified MTS-LS1. MA-SW-Chains [6] is an extension of MA-CMA-Chains in which CMA is replaced with the Solis-Wets (SW) local search.
In the CEC2010 competition, MA-SW-Chains was the winner. The Ensemble Optimization Evolutionary Algorithm (EOEA) [7] ranked second. In EOEA, the optimization process is divided into two stages, namely, global shrinking and local exploration. In the first stage, an EDA based on a mixed Gaussian and Cauchy model (MUEDA) is used to achieve faster convergence; the goal of the second stage is exploitation. Third place in CEC2010 went to the Differential Ant-Stigmergy Algorithm (DASA) [8], which tries to solve LSGO by converting the real-parameter optimization problem into a graph search problem.
Improved multiple offspring sampling (MOS) [9] was the winner of CEC2012. MOS combines SW and MTS-LS as two local searches. A self-adaptive differential evolution algorithm (jDElsgo) was proposed in [10]; after continuous improvement, jDElsgo ranked second in CEC2012. Third place went to the cooperative coevolution evolutionary algorithm with global search (CCGS), which is considered an extension of EOEA [11]. Cooperative coevolution with delta grouping (DECC-DML) was proposed in [12] to enhance the performance of the CC framework on nonseparable problems.
CEC2013 introduced a new set of benchmark functions. The winner of the CEC2013 competition was a modified MOS [13]; DECC-G [14] was the reference algorithm. The second-ranked algorithm was smoothing and auxiliary function-based cooperative coevolution for global optimization (SACC) [15], which adopted parallel search for the first time under the CC framework.
The bi-space interactive cooperative coevolutionary algorithm (BICA) [16] is a coevolutionary framework that evolves in two spaces: the model evolves to provide better grouping, and the individuals evolve to achieve better fitness. SHADE with iterative local search (SHADE-ILS) [17] iteratively combines a modern differential evolution algorithm with a local search method selected from several candidates; the selection is dynamic and considers the improvement each method achieved in the previous enhancement phase to determine the best method for the problem in each situation. In LSHADE-SPA [18], differential evolution with linear population size reduction is used for global exploration, while a modified version of multiple trajectory search is used for local exploitation.
A hybrid adaptive evolutionary differential evolution algorithm (HACC-D) was also proposed in CEC2014 [19]. HACC-D belongs to the CC class of algorithms; JADE and SaNSDE are used as CC subcomponent optimizers. Scaling up the covariance matrix adaptation evolution strategy using cooperative coevolution (CC-CMA-ES) is another competitive CC-based algorithm whose basic optimizer is CMA-ES.
For a detailed overview of state-of-the-art large-scale optimization algorithms, please refer to [20]. This paper proposes an algorithm for large-scale optimization problems, called GIES. The algorithm has been verified on the CEC2010 and CEC2013 test suites, and the experimental results verify its potential value.
3. Background and Motivation
This section briefly introduces the background of the CMA-ES algorithm, focusing on how CMA-ES utilizes information. Based on an analysis of different information utilization methods, this paper then proposes a model of gradient information utilization.
3.1. CMA-ES
The CMA-ES algorithm is an evolution strategy proposed by Hansen et al. [1]. It uses a Gaussian distribution to sample the solution space of the optimization problem, and the parameters of the distribution are updated according to a sample selection mechanism. This sampling-and-update process is iterated until the stopping conditions are met.
For an objective function f : R^n -> R, CMA-ES generates a new generation of the population by estimating the distribution of good solutions; that is, the points of the new population are drawn from the normal distribution N(m, sigma^2 C), where m is the mean and C is a positive definite matrix called the covariance matrix. During the run, the covariance matrix is continuously adjusted to make the distribution of the search points fit the level sets of the target function more closely; ideally, the covariance matrix becomes proportional to the inverse of the Hessian matrix, although this is difficult to achieve for more complex functions. Next, we briefly introduce the basic ideas behind CMA-ES.
3.1.1. Sampling
For a given objective function, CMA-ES first assumes the existence of an optimal fitness landscape that makes the search move in the optimal direction. For a convex quadratic function, the correlations between variables are linear and can be completely eliminated, so CMA-ES can treat such functions "as" spherical functions and solve them effectively. This strategy also applies to general black-box objective functions, because their local landscapes can be approximated by convex quadratic functions. In general, the optimal covariance is not known. CMA-ES samples through the multivariate Gaussian distribution x_i ~ m + sigma * N(0, C), i = 1, ..., lambda, where sigma is the search step size, used to control the local search capability of the Gaussian distribution. lambda samples are drawn per generation, and the fitness value of each sample is then calculated. In each generation, CMA-ES thus provides the parameters of a multivariate normal distribution for sampling the next generation.
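The sampling step described above can be sketched in a few lines. The following is a minimal illustration, not the full CMA-ES implementation; the Cholesky factor is one convenient way to obtain a matrix square root of C, and the function name and interface are ours:

```python
import numpy as np

def sample_population(mean, cov, sigma, lam, seed=None):
    """Draw lam candidates from N(mean, sigma^2 * cov): the CMA-ES
    sampling step, sketched with a Cholesky factor of the covariance."""
    rng = np.random.default_rng(seed)
    A = np.linalg.cholesky(cov)                # A @ A.T == cov
    z = rng.standard_normal((lam, len(mean)))  # standard normal samples
    return mean + sigma * z @ A.T              # shape (lam, n)

pop = sample_population(np.zeros(3), np.eye(3), 0.5, lam=2000, seed=0)
```

Each row of `pop` is one candidate solution; its fitness would be evaluated next, and the best candidates would drive the parameter updates of the following subsection.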
3.1.2. Update
The CMA-ES algorithm updates the parameters (m, C, and sigma); the update operations are more involved. The new mean is computed from the mu best individuals of the current generation, with each individual's contribution determined by recombination weights:

m <- sum_{i=1}^{mu} w_i x_{i:lambda},  with  sum_{i=1}^{mu} w_i = 1 and w_1 >= w_2 >= ... >= w_mu > 0,

where x_{i:lambda} denotes the i-th best individual among the lambda samples ranked by fitness. Because the weight of each sample differs, samples with better fitness receive larger weights.
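The weighted recombination of the mean can be sketched as follows, assuming minimization (smaller fitness is better); the log-rank weights used here are one common choice, not necessarily the exact weights of [1]:

```python
import numpy as np

def update_mean(samples, fitness, mu):
    """Weighted recombination of the mu best samples: a sketch of the
    CMA-ES mean update, assuming minimization. The log-rank weights
    below are one common choice of decreasing, positive weights."""
    order = np.argsort(fitness)                  # best (smallest) first
    best = samples[order[:mu]]
    w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
    w /= w.sum()                                 # positive, decreasing, sum to 1
    return w @ best

samples = np.array([[0.0], [1.0], [2.0]])
new_mean = update_mean(samples, np.array([0.0, 1.0, 4.0]), mu=2)
```

Here the best sample (fitness 0) receives the largest weight, so the new mean is pulled close to it.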
There are two ways to update the covariance: the rank-one update and the rank-mu update. This paper adopts the rank-one update strategy:

C <- (1 - c1) C + c1 p_c p_c^T,

where C is initialized to the n-dimensional identity matrix, c1 in [0, 1] is the learning rate, and p_c is the evolution path, a memory of mean shifts that decays over the course of the optimization. In this way, CMA-ES models the covariance matrix of the multivariate Gaussian distribution directly, instead of fitting the samples by maximum likelihood, so a successful step from the previous generation is highly likely to appear again in the next generation. For the detailed update method, see [21].
The step size is updated by cumulative step-size adaptation:

sigma <- sigma * exp( (c_sigma / d_sigma) * ( ||p_sigma|| / E||N(0, I)|| - 1 ) ),

where p_sigma is the evolution path for step-size control.
3.2. Motivation
Because CMA-ES uses the information provided by the best solutions to adjust its mean and variance simultaneously, it can expand the search scope when the optimal solution is far away or narrow it when the optimal solution is near. However, the covariance matrix takes O(n^2) space, and its update and decomposition take up to O(n^3) time. Throughout the iteration process, only the individuals with the highest fitness ranking are considered. For simple problems the algorithm converges quickly, but for large-scale optimization problems this can quickly lead to premature stagnation.
Survival of the fittest is an important part of evolutionary computation, but population diversity is crucial to the performance of the algorithm. Through evolution, traditional CMA-ES preserves the best individuals and shapes the distribution of the next generation by learning only from this elite fraction. That is, over the course of evolution, populations tend more and more toward "elite" individuals and away from "bad" individuals. But the bad individuals retain key information about what not to do: success stories are everywhere, while what is really useful is often the experience of the unsuccessful, because it contains long-term observations and lessons. These observations constitute the motivation for this article.
4. Proposed Approach
The proposed algorithm GIES adopts the basic framework of CMA-ES but makes several improvements. In the proposed scheme, all the information about every candidate, good or bad, is kept available to each generation. With these gradient-signal estimates, the whole distribution can be moved in a better direction for the next generation. Since the gradient is estimated rather than computed analytically, the standard stochastic gradient descent (SGD) algorithm used in deep learning [22] can be applied.
4.1. GIES
In GIES, the expected fitness score is optimized for each sampling scheme. If the expectation is good enough, the best-performing scheme in the current generation is likely to perform even better after sampling. Maximizing the expected fitness score of a sampling scheme is equivalent to maximizing the overall fitness score of its samples.
Assuming that z is a sampling-scheme vector drawn from the probability distribution p(z | theta), we can define the expected value of the objective function F as

J(theta) = E_theta[F(z)] = Integral F(z) p(z | theta) dz,

where theta represents the parameters of the probability distribution. For example, if p is a normal distribution, theta consists of mu and sigma. For our simple two-dimensional problem, every individual is a two-dimensional vector z = (z1, z2). Using the same log-likelihood trick as in REINFORCE, we can calculate the gradient of J(theta):

grad_theta J(theta) = E_theta[ F(z) grad_theta log p(z | theta) ].

In a sample of size N with schemes z^1, ..., z^N, the gradient can be estimated by summation:

grad_theta J(theta) ~= (1/N) sum_{i=1}^{N} F(z^i) grad_theta log p(z^i | theta).

With the above gradient, we can choose a learning rate alpha (say 0.01) and optimize the parameters theta of the probability distribution so that the sampling scheme achieves a higher fitness score on the target function F. Using SGD or the Adam [23] algorithm, we can update theta for the next generation:

theta <- theta + alpha * grad_theta J(theta).
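The estimator and update above can be sketched for a one-dimensional Gaussian. This is an illustrative toy, not the full GIES procedure; the function names, sample size, and learning rate are our assumptions:

```python
import numpy as np

def grad_estimate(mu, sigma, F, n_samples, rng):
    """Score-function (REINFORCE-style) estimate of the gradient of
    J = E[F(z)] for z ~ N(mu, sigma^2); a 1-D illustrative sketch."""
    z = rng.normal(mu, sigma, n_samples)
    f = F(z)
    # d/dmu log p(z) = (z - mu) / sigma^2
    g_mu = np.mean(f * (z - mu) / sigma**2)
    # d/dsigma log p(z) = ((z - mu)^2 - sigma^2) / sigma^3
    g_sigma = np.mean(f * ((z - mu)**2 - sigma**2) / sigma**3)
    return g_mu, g_sigma

# Ascend E[F] for F(z) = -(z - 3)^2; mu should drift toward 3.
rng = np.random.default_rng(0)
mu, sigma, alpha = 0.0, 1.0, 0.05
for _ in range(300):
    g_mu, g_sigma = grad_estimate(mu, sigma, lambda z: -(z - 3.0)**2, 100, rng)
    mu += alpha * g_mu
    sigma = max(1e-3, sigma + alpha * g_sigma)   # keep sigma positive
```

After a few hundred generations the mean of the search distribution settles near the optimum at 3, while sigma shrinks as the search concentrates.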
After the probability distribution is updated, new competitive schemes can be sampled, and the process repeats until an appropriate solution is obtained. Since no correlation parameters are maintained in this basic form, the complexity of the algorithm is linear in the problem dimension.
GIES adopts an approximate gradient as the search direction. It represents the "population" of traditional evolutionary computation as a parameterized distribution p(z | theta).
4.1.1. Multinormal Distribution
In the proposed method, a multivariate normal distribution is used as the parameterized distribution, with parameters theta = (mu, Sigma): mu is the mean and Sigma is the covariance matrix. To sample more efficiently, we need a matrix A satisfying A A^T = Sigma; the problem then simplifies to sampling a standard multivariate normal distribution, z = mu + A s with s ~ N(0, I). p(z | mu, Sigma) denotes the probability density function of the multivariate normal distribution.
To calculate the gradient information for the multivariate Gaussian, the logarithm of the probability density is taken, so that the gradient can be estimated by summation:

log p(z | mu, Sigma) = -(n/2) log(2 pi) - (1/2) log det Sigma - (1/2) (z - mu)^T Sigma^{-1} (z - mu),
grad_mu log p(z | mu, Sigma) = Sigma^{-1} (z - mu),
grad_Sigma log p(z | mu, Sigma) = (1/2) ( Sigma^{-1} (z - mu)(z - mu)^T Sigma^{-1} - Sigma^{-1} ),

so grad_mu J and grad_Sigma J can be obtained by substituting these expressions into the summation estimator above.
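The closed-form gradients of the Gaussian log-density can be checked numerically. The sketch below compares them against a finite difference of the log-density on an arbitrary 2-D example; it is a verification aid, not part of the algorithm, and the example values are ours:

```python
import numpy as np

def log_density(z, mu, cov):
    """log N(z | mu, cov) for an n-dimensional Gaussian."""
    n = len(mu)
    d = z - mu
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + d @ np.linalg.inv(cov) @ d)

def log_density_gradients(z, mu, cov):
    """Closed-form gradients of log N(z | mu, cov) with respect to mu
    and cov, the quantities summed in the gradient estimator above."""
    inv = np.linalg.inv(cov)
    d = z - mu
    g_mu = inv @ d
    g_cov = 0.5 * (np.outer(inv @ d, inv @ d) - inv)
    return g_mu, g_cov

# Numerical check on an arbitrary 2-D example.
mu0 = np.array([0.5, -1.0])
cov0 = np.array([[2.0, 0.3], [0.3, 1.0]])
z0 = np.array([1.0, 0.2])
g_mu, g_cov = log_density_gradients(z0, mu0, cov0)
eps = 1e-6
e0 = np.array([eps, 0.0])
fd_mu0 = (log_density(z0, mu0 + e0, cov0)
          - log_density(z0, mu0 - e0, cov0)) / (2 * eps)
dc = np.zeros((2, 2)); dc[0, 0] = eps
fd_cov00 = (log_density(z0, mu0, cov0 + dc)
            - log_density(z0, mu0, cov0 - dc)) / (2 * eps)
```

The finite-difference values agree with the analytic gradients to high precision, confirming the formulas before they are plugged into the Monte Carlo estimator.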
Then, update the parameters with the calculated gradient information.
To balance exploring and exploiting the solution space, the algorithm can change the distribution of solutions according to the parameters of the solutions it is exploring.
4.1.2. The Technique of GIES
Further, A can be decomposed into a scale parameter sigma and a normalized covariance factor B satisfying det(B) = 1. This decoupling into two orthogonal components allows them to be learned independently.
The advantage of utilizing the whole population's information is that it prevents information loss, but outliers still need to be handled. Therefore, this paper adopts fitness shaping to prevent outliers from dominating the update [24]. In this method, the individuals of the population are ranked by fitness value from small to large, and a utility value u_i is calculated from the rank instead of the raw fitness value f_i.
The complexity of each covariance matrix update is O(n^3). It can be reduced to O(n^2) by calculating the update in local non-exponential coordinates, in which case the gradient-information update decomposes into separate components for the mean mu, the scale sigma, and the normalized covariance factor B.
The pseudocode of GIES is given in Algorithm 1.

5. Experiments and Analysis
In this section, GIES is compared with state-of-the-art algorithms to verify its effectiveness. Three performance analysis experiments were performed. First, GIES was evaluated on CEC2010. Second, CEC2013 was used to evaluate GIES against nine state-of-the-art algorithms. Finally, a parametric analysis was performed to study the effect of each component of GIES.
The first experiment uses CEC2010, the second uses CEC2013, and the third is a statistical analysis. The benchmark functions are as follows. The dimension (D) of all functions is 1000, except for the two overlapping functions F13 and F14 in CEC2013, where D is 905.
(1) CEC2010 contains 20 test functions, which can be divided into four classes:
(i) F1-F3: separable functions
(ii) F4-F8: partially separable functions, in which a small number of variables are dependent while all the remaining ones are independent
(iii) F9-F18: partially separable functions that consist of multiple independent subcomponents, each of which is m-nonseparable
(iv) F19-F20: fully nonseparable functions
For more detailed features, please refer to [25].
(2) CEC2013 contains 15 test functions:
(i) F1-F3: separable functions
(ii) F4-F11: partially separable functions
(iii) F12-F14: overlapping functions (D = 905)
(iv) F15: fully nonseparable functions
For more detailed features, please refer to [26].
To make the experimental data more reliable, each experiment was run 25 times and the statistical results were recorded. The solution error measure f(x) - f(x*) was recorded at the end of each run, where x* is the well-known global optimum of each function. The maximum number of fitness evaluations is set to 3.0E+6 according to the default value of the test suite.
5.1. Parametric Analysis
Most of the parameters of GIES are the same as in reference [27]. In the algorithm, the population size and the learning rate for the gradient information are the parameters that need to be specified manually.
In Section 4, we stated that GIES combines the effects of three main components: (1) the expected fitness score is optimized for each sampling scheme, (2) the gradient information is approximated using individual information and the approximate gradient is used to guide the search, and (3) GIES uses the approximate gradient as the search direction.
To further verify the performance of the algorithm, we analyzed each part of the algorithm to show the individual effect of each component. Table 1 lists the mean values for each component, with the best values marked in bold. The results show that the GIES algorithm with all three mechanisms works best.
GIES (1): the expected fitness score is optimized for each sampling scheme.
GIES (2): the gradient information is approximated using individual information, and the approximate gradient is used to guide the search.
GIES (3): GIES uses the approximate gradient as the search direction.
5.2. Evaluation Criteria
To evaluate the performance of GIES, we apply the same methods as in [18], using three evaluation criteria. The first is the Formula One Score (FOS), used in the latest LSGO competition (CEC2015). Under this criterion, the algorithms are ranked from best to worst, and the top 10 ranks receive 25, 18, 15, 12, 10, 8, 6, 4, 2, and 1 points, respectively; algorithms ranked outside the top 10 receive zero. Higher scores indicate better performance. The second and third criteria are nonparametric statistical hypothesis tests based on the Friedman test at a fixed significance level.
5.3. Performance Analysis Using CEC2010 and CEC2013
Statistical results of GIES using CEC2010 and CEC2013 are illustrated in Tables 2 and 3, respectively. Figure 1 illustrates the convergence behavior of GIES using sample functions from each class in CEC2013: f3 as fully separable, f8 and f11 as partially separable functions, f12 and f14 as overlapping functions where D = 905, and f15 as fully nonseparable functions.
CMA-ES variants for large-scale optimization are used as comparison algorithms, together with some state-of-the-art algorithms not derived from CMA-ES, for example, CC-based differential evolution (DECC-G) [14] and multiple offspring sampling (MOS) [9]. The comparison algorithms are listed in Tables 4 and 5. All of these algorithms followed the same CEC2010 and CEC2013 guidelines. The results of the comparative experiments are recorded in Tables 6 and 7.
Tables 8 and 9 summarize the ranking for GIES and the stateoftheart algorithms using Formula One Score (FOS). Tables 10 and 11 summarize the ranking obtained using Friedman’s test.
5.3.1. Formula One Score (FOS)
As shown in Tables 8 and 9, GIES ranked second among all comparison algorithms on CEC2010 using the Formula One Score (FOS) and first on CEC2013. On CEC2010, the best algorithm is MMO-CC with 237 points; the winners of CEC2010 and CEC2012, MA-SW-Chains and MOS2012, get 115 and 168 points, respectively. This shows that GIES is competitive. On CEC2013, GIES gets the most points, followed by MOS2013 and VGDE with 218 and 196 points, respectively.
5.3.2. Friedman Test
According to the Friedman test illustrated in Tables 10 and 11, GIES obtained the best ranking on both the CEC2010 and CEC2013 benchmarks. On CEC2010, GIES scores 5.33, jDEsps scores 6, MMO-CC scores 7.75, and MA-SW-Chains scores 7.95. On CEC2013, GIES scores 3 and MOS2013 scores 3.57.
In the Friedman test, the critical value is 16.92 and the p value is 1.64E-05, indicating that there are significant differences between the algorithms.
A high ranking using the Formula One Score (FOS) does not guarantee the same ranking under the Friedman test, because the Formula One Score gives more weight to the top positions.
5.4. CPU Computational Time
Previous experiments evaluated the effectiveness of GIES; this section analyzes its runtime. Table 12 records the average running time of GIES and the baseline versions of CMA-ES and MA-SW-Chains on the CEC2010 LSGO suite, with dimension 1000.
The results show that, for the separable functions f1-f3, the runtime of GIES is slightly better than that of MA-SW-Chains and CMA-ES, although the difference between the three algorithms is not pronounced. For partially separable functions (f4-f18), the running time of GIES is significantly better than that of CMA-ES and MA-SW-Chains. For fully nonseparable functions, CMA-ES performs better than GIES. In general, GIES offers better performance on partially separable problems.
6. Conclusion
This research has shown that the use of guiding gradient information improves the performance of evolutionary computation. The problem of low utilization of historical information in ES was addressed by using guidance information to shape the distribution from which next-generation solutions are generated; the guidance information is obtained by approximating the gradient. This strategy not only increases the diversity of information but also makes full use of the optimal information in the heuristic algorithm. The theoretical analysis and experimental results showed that the method incorporating guidance information is accurate and stable.
The experimental results showed that the use of guiding information is effective. The algorithm was also compared with other stateoftheart metaheuristic algorithms and demonstrated good average performance and pairwise comparison performance across a wide range of test functions.
The experiments showed that this algorithm is an effective global optimization method for largescale problems, which makes it applicable to a large number of practical applications. The principle behind the use of guidance information is simple, but effective, and has a certain guiding significance for heuristic optimization algorithms.
Data Availability
The data are available at https://titan.csit.rmit.edu.au/∼e46507/publications/lsgocec10.pdf.
Conflicts of Interest
The author declares that there are no conflicts of interest.