Research Article | Open Access
Zhongbo Hu, Shengwu Xiong, Qinghua Su, Xiaowei Zhang, "Sufficient Conditions for Global Convergence of Differential Evolution Algorithm", Journal of Applied Mathematics, vol. 2013, Article ID 193196, 14 pages, 2013. https://doi.org/10.1155/2013/193196
Sufficient Conditions for Global Convergence of Differential Evolution Algorithm
The differential evolution algorithm (DE) is one of the most powerful stochastic real-parameter optimization algorithms. Theoretical studies of DE have gradually attracted the attention of more and more researchers. However, little theoretical research has addressed the convergence conditions for DE. In this paper, a sufficient condition and a corollary for the convergence of DE to the global optima are derived by using the infinite product. A DE algorithm framework satisfying the convergence conditions is then established. It is also proved that two common mutation operators satisfy the algorithm framework. The numerical experiments consist of two parts. The first visualizes how five convergent DE variants, based on classical DE algorithms, escape from a local optimal set on two low-dimensional functions. The second tests the performance of a modified DE algorithm, inspired by the convergent algorithm framework, on the CEC2005 benchmarks.
The differential evolution algorithm (DE) is a population-based stochastic parallel evolutionary algorithm. DE has emerged as a very competitive form of evolutionary computing since it was proposed by Storn and Price in 1995. DE and its variants have achieved competitive rankings in various competitions held at the IEEE Congress on Evolutionary Computation (CEC) conference series [2, 3]. According to frequently reported comprehensive studies [4–6], DE outperforms many other optimization methods in terms of convergence speed and robustness over common benchmark functions. Compared to most other evolutionary algorithms, DE is much simpler and more straightforward to implement, and has very few control parameters. Perhaps due to these advantages, it has found many practical applications, such as function optimization [7–11], multiobjective optimization, classification, and scheduling.
Theoretical studies of algorithms are very important to understand their search behaviors and to develop more efficient algorithms. With the popularity of DE in applications, more and more researchers pay attention to the theoretical studies on DE. According to the research contents, the main results of theoretical studies on DE can be divided into three classes as follows.
1.1. Research on the Runtime Complexity of DE
DE is a population-based stochastic search algorithm. Its runtime-complexity analysis is a critical issue. Zielinski et al.  investigated the runtime complexity of DE for various stopping criteria including a fixed number of generations () and maximum distance criterion (MaxDist). MaxDist means that algorithms stop the execution if the maximum distance from every vector to the best population member is below a given threshold.
1.2. Research on the Dynamical Behavior of DE’s Population
This class focuses on investigating the evolving process of DE’s population. For instance, the development of the expected population variance and population distribution over time is an important issue. Zaharie [16–20] theoretically analyzed the influence of the variation operators (mutation and crossover) and their parameters on the expected population variance. In 2009, Zaharie theoretically investigated the influence of the crossover operators (including the classical binomial and exponential strategies) and the crossover probability on the expected population variance. Dasgupta et al. [22, 23] proposed a mathematical model of the underlying evolutionary dynamics of a one-dimensional DE population, and the model showed that the fundamental dynamics of each parameter vector in DE follows a gradient-descent type search strategy. Wang and Huang developed a stochastic model of a one-dimensional DE population to analyze how the population distribution evolves over time.
1.3. Research on the Convergence Property of DE
This class investigates the limit behavior of DE’s population. The main issue is under which assumptions it can be guaranteed that DE or its variants reach an optimal solution. Technically speaking, commonly used concepts include convergence in probability, almost sure convergence, and convergence in distribution.
Xue et al. performed a mathematical modeling and convergence analysis of continuous multi-objective differential evolution (MODE) under certain simplified assumptions, and this work was later extended. Zhao et al. proposed a hybrid differential evolution with transform function (HtDE) and proved its convergence. Sun developed a Markov chain model and proved that the classical DE does not converge in probability. He et al. defined the differential operator (DO) as a random mapping from the solution space to the Cartesian product of the solution space and analyzed the asymptotic convergence of DE by using the random contraction mapping theorem. Ghosh et al. established the asymptotic convergence behavior of a classical DE (DE/rand/1/bin) algorithm by applying the concepts of Lyapunov stability theorems. Their analysis is based on the assumption that the objective function has the following two properties: (1) it has a continuous second-order derivative in the search space, and (2) it possesses a unique global optimum in the search range.
The studies of this paper are confined to the third class, convergence property of DE.
We note that the conclusion of [30, 31] is in contradiction with that of Sun. According to the inference process, the asymptotic convergence in He et al. refers to almost sure convergence. In fact, if DE does not converge in probability, then it does not converge almost surely either. We also note that the value of the random mapping DO defined by He et al. may be greater than 1, which is debatable. The asymptotic convergence of DE/rand/1/bin proved by Ghosh et al. by applying Lyapunov stability theorems should be regarded as a local convergence property. The reason is that, according to Lyapunov stability theorems, the admissible distribution of the initial population depends on the maximum region of asymptotic stability. So for some functions, DE/rand/1/bin possesses the asymptotic stability property if and only if the initial individuals are close enough to the global optimum. In addition, from the mutation operators of the classical DE, it can be derived that DE cannot escape once its population is trapped in a local optimum. This property was employed by Sun to prove that the classical DE does not possess global convergence in probability.
A convergent algorithm may be more robust than a divergent one. With this in mind, Zhao et al. developed a convergent algorithm, HtDE, and proved its convergence. Zhan and Zhang proposed a DE with random walk. Xue et al. [26, 27] analyzed MODE’s convergence. However, the conditions for global convergence of DE have not been explored. In this paper, the following problems will be addressed. (i) What are sufficient conditions for the global convergence of DE? (ii) What is the algorithm framework of the convergent DE? (iii) Which operators can assist the classical DE to achieve a certain asymptotic convergence?
The discussion in this paper will be undertaken in a general measurable space, and the infinite product will be used as an analysis tool.
This paper is organized as follows. Section 2 introduces the classical DE. Section 3 proves a sufficient condition and a corollary for the convergence of DE to the global optima. Section 4 presents a DE algorithm framework satisfying the convergence conditions. Section 5 proves that several operators satisfy the convergent algorithm framework. Section 6 gives numerical experiments to verify the robustness of the convergent DE. Section 7 analyzes and discusses the theoretical conclusions and the experimental results in detail. Section 8 summarizes this paper and indicates several directions for future research.
2. Classical Differential Evolution
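DE is a competitive algorithm for solving continuous optimization problems. Consider the optimization problem
$$\min f(x), \quad x = (x_1, \dots, x_n) \in \psi, \tag{1}$$
where $\psi$ is a measurable space and $f : \psi \to \mathbb{R}$ is the objective function (or the fitness of $x$), which satisfies that for any bounded $\psi' \subseteq \psi$, $f(\psi')$ is bounded. The optimal solution set is denoted as $B^* = \{x^* \in \psi \mid f(x^*) \le f(x) \text{ for all } x \in \psi\}$, where $x^*$ is an optimum solution.
Let $\mu$ be a measure on the space $\psi$. Perhaps $\mu(B^*) = 0$, which means that $B^*$ is a set with measure 0. This is not convenient to analyze. In view of the accuracy of practical problems, without loss of generality, we can consider an expanded set $B^*_\varepsilon = \{x \in \psi \mid |f(x) - f(x^*)| < \varepsilon\}$, where $\varepsilon$ is a small positive value. We can choose an appropriate $\varepsilon$ that meets the required accuracy and makes $\mu(B^*_\varepsilon) > 0$. In this paper, $B^*$ refers to this expanded set $B^*_\varepsilon$. Meanwhile, in order to simplify the calculation, let us suppose that the search space is $\psi = \prod_{j=1}^{n} [a_j, b_j]$, where $n$ is the dimension of $\psi$.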
The classical DE [2, 33, 34] works through a simple cycle of reproduction and selection operators after initialization. The reproduction operator includes mutation and crossover operators. The classical DE for solving the above problem (1) can be described in detail as follows.
(1) Initialization: generate an initial population, denoted by $X(0)$, and let $t = 0$.
(2) Reproduction: generate a trial population $U(t)$ from the target population $X(t)$.
Mutation: generate a new population from $X(t)$ by a mutation operator, denoted by $V(t)$.
Crossover: generate a new population from $V(t)$ and $X(t)$ by a crossover operator, denoted by $U'(t)$, and let $U(t) = U'(t)$.
(3) Selection: generate a new population from $X(t)$ and $U(t)$ by a selection operator, denoted by $X'(t)$.
(4) If the termination condition is satisfied, then stop; else let $X(t+1) = X'(t)$ and $t = t + 1$; then go to Step 2.
The initial population is generated by assigning random values in the search space to the variables of every solution.
2.1. Reproduction Operator
2.1.1. Mutation Operator
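After initialization, DE creates a donor vector $v_i(t)$ corresponding to each individual $x_i(t)$ in the $t$th generation through the mutation operator. Several of the most frequently used mutation strategies are as follows:
DE/rand/1: $v_i(t) = x_{r_1}(t) + F\,(x_{r_2}(t) - x_{r_3}(t))$,
DE/best/1: $v_i(t) = x_{best}(t) + F\,(x_{r_1}(t) - x_{r_2}(t))$,
DE/cur-to-best/1: $v_i(t) = x_i(t) + F\,(x_{best}(t) - x_i(t)) + F\,(x_{r_1}(t) - x_{r_2}(t))$,
DE/best/2: $v_i(t) = x_{best}(t) + F\,(x_{r_1}(t) - x_{r_2}(t)) + F\,(x_{r_3}(t) - x_{r_4}(t))$,
DE/rand/2: $v_i(t) = x_{r_1}(t) + F\,(x_{r_2}(t) - x_{r_3}(t)) + F\,(x_{r_4}(t) - x_{r_5}(t))$,
where $x_{best}(t)$ denotes the best individual of the current generation, the indices $r_1, r_2, r_3, r_4, r_5 \in \{1, \dots, NP\}$ are uniformly random integers that are mutually different and distinct from the running index $i$, NP is the population size, and $F$ is a real parameter, called the mutation factor or scaling factor.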
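The five strategies above can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions: the function name `mutate`, the array layout (one individual per row), and the use of minimization to pick the best individual are choices made here, not part of the original description.

```python
import numpy as np

def mutate(pop, fitness, i, F, strategy, rng):
    """Return the donor vector for individual i (requires NP >= 6)."""
    NP = len(pop)
    best = pop[np.argmin(fitness)]  # minimization: smallest objective value is best
    # Five mutually different indices, all distinct from the running index i.
    candidates = [r for r in range(NP) if r != i]
    r1, r2, r3, r4, r5 = rng.choice(candidates, size=5, replace=False)
    x = pop
    if strategy == "rand/1":
        return x[r1] + F * (x[r2] - x[r3])
    if strategy == "best/1":
        return best + F * (x[r1] - x[r2])
    if strategy == "cur-to-best/1":
        return x[i] + F * (best - x[i]) + F * (x[r1] - x[r2])
    if strategy == "best/2":
        return best + F * (x[r1] - x[r2]) + F * (x[r3] - x[r4])
    if strategy == "rand/2":
        return x[r1] + F * (x[r2] - x[r3]) + F * (x[r4] - x[r5])
    raise ValueError(f"unknown strategy: {strategy}")
```

Every donor is a combination of current population members, which is why a population that has collapsed onto a local optimum produces no new points under these operators.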
2.1.2. Crossover Operator
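Following mutation, the crossover operator is applied to further increase the diversity of the population. In crossover, the target vector, $x_i(t)$, is combined with elements from the donor vector, $v_i(t)$, to produce the trial vector, $u_i(t)$, using the binomial crossover,
$$u_{i,j}(t) = \begin{cases} v_{i,j}(t), & \text{if } \mathrm{rand}(0,1) \le CR \text{ or } j = j_{rand}, \\ x_{i,j}(t), & \text{otherwise}, \end{cases}$$
where $CR$ is the probability of crossover and $j_{rand}$ is a random integer in $[1, n]$. Unless otherwise mentioned, $\mathrm{rand}(0,1)$ is a uniformly distributed random number confined to the range $[0, 1]$.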
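A minimal sketch of binomial crossover, assuming NumPy arrays for the target and donor vectors. The function name and the 0-based index for the forced dimension are illustrative choices made here.

```python
import numpy as np

def binomial_crossover(target, donor, CR, rng):
    """Mix donor and target component-wise; one random dimension always comes from the donor."""
    n = target.shape[0]
    j_rand = rng.integers(n)        # forced dimension (0-based in this sketch)
    mask = rng.random(n) <= CR      # components taken from the donor
    mask[j_rand] = True             # guarantees the trial differs from the target in at least one slot
    return np.where(mask, donor, target)
```

Forcing one donor component prevents the trial vector from being an exact copy of the target, even when CR is very small.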
2.2. Selection Operator
Finally, the selection operator is employed to retain the most promising trial individuals in the next generation. The classical DE adopts a simple selection scheme. It compares the objective values of the target vector $x_i(t)$ and the trial vector $u_i(t)$. If the trial individual reduces the value of the objective function, then it is accepted for the next generation; otherwise the target individual is retained in the population. The selection operator is defined as
$$x_i(t+1) = \begin{cases} u_i(t), & \text{if } f(u_i(t)) < f(x_i(t)), \\ x_i(t), & \text{otherwise}. \end{cases}$$
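The greedy one-to-one selection described in this subsection can be sketched as follows. The function name and the strict inequality (a trial is accepted only if it strictly reduces the objective value) are assumptions of this sketch, following the wording "reduces the value of the objective function".

```python
import numpy as np

def select(pop, trial, f):
    """Keep, slot by slot, whichever of target and trial has the smaller objective value."""
    f_pop = np.array([f(x) for x in pop])
    f_trial = np.array([f(u) for u in trial])
    accept = f_trial < f_pop                      # trial must strictly improve on its target
    return np.where(accept[:, None], trial, pop)
```

Because each slot can only improve, the best objective value in the population never gets worse from one generation to the next. This is the elitist property that the convergence proof relies on.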
3. Convergence Condition
There are different kinds of definitions of convergence for analyzing asymptotic convergence of algorithms. The following definition of convergence, that is, convergence in probability, is used in this paper.
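Definition 1. Let $\{X(t), t = 0, 1, 2, \dots\}$ be a population sequence generated by using DE to solve the optimization problem (1). Then DE converges to the global optimum if and only if
$$\lim_{t \to \infty} P\{X(t) \cap B^* \ne \emptyset\} = 1,$$
where $B^*$ denotes the optimal solution set.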
Let us give a sufficient condition for the convergence of DE.
Theorem 2. Consider using DE to solve the optimization problem (1). If in the target population $X(t_k)$ there exists at least one individual $x(t_k)$, corresponding to the trial individual $u(t_k)$ produced by a reproduction operator, such that
$$P\{u(t_k) \in B^*\} \ge \zeta(t_k) > 0$$
and the series $\sum_{k=1}^{\infty} \zeta(t_k)$ diverges, then DE converges to the optimal solution set $B^*$.
Here $\{t_k, k = 1, 2, \dots\}$ denotes any subsequence of the set of natural numbers, $P\{u(t_k) \in B^*\}$ denotes the probability that $u(t_k)$ belongs to the optimal solution set $B^*$, and $\zeta(t_k)$ is a small positive value which may change with $t_k$.
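Proof. In DE, each target individual corresponds to a trial individual produced by the reproduction operator. According to the condition of Theorem 2, the probability that no individual of the trial population $U(t_k)$ belongs to the optimal solution set $B^*$ satisfies
$$P\{U(t_k) \cap B^* = \emptyset\} \le 1 - \zeta(t_k),$$
so the probability that no individual of any of the trial populations $U(t_1), \dots, U(t_k)$ belongs to $B^*$ satisfies
$$P\Big\{\bigcap_{i=1}^{k} \big(U(t_i) \cap B^* = \emptyset\big)\Big\} \le \prod_{i=1}^{k} \big(1 - \zeta(t_i)\big).$$
Because of the elitist selection operation in DE, the best individual of the trial populations is retained in the next-generation population. So the probability that the population $X(t_k + 1)$ does not contain an optimum satisfies
$$P\{X(t_k + 1) \cap B^* = \emptyset\} \le \prod_{i=1}^{k} \big(1 - \zeta(t_i)\big).$$
Since elitist selection never discards an optimum once it has been found, this probability is nonincreasing in $t$. So for the classical DE with elitist selection, we have
$$\lim_{t \to \infty} P\{X(t) \cap B^* \ne \emptyset\} \ge 1 - \lim_{k \to \infty} \prod_{i=1}^{k} \big(1 - \zeta(t_i)\big).$$
From the property of the infinite product, using $1 - \zeta \le e^{-\zeta}$,
$$\prod_{i=1}^{k} \big(1 - \zeta(t_i)\big) \le \exp\Big(-\sum_{i=1}^{k} \zeta(t_i)\Big).$$
So for the divergent series $\sum_{k=1}^{\infty} \zeta(t_k)$, the infinite product equals 0, and we get
$$\lim_{t \to \infty} P\{X(t) \cap B^* \ne \emptyset\} = 1.$$
According to Definition 1, this theorem holds.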
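The role of the divergent series in the proof can be illustrated numerically. The sketch below evaluates the finite product of the terms (1 - zeta_k), which bounds the probability of never entering the optimal set, for two hypothetical choices of zeta_k picked here purely for illustration: 1/k (divergent sum) and 1/k^2 (convergent sum).

```python
def miss_probability_bound(zetas):
    """Product of (1 - zeta_k): upper bound on the probability of never hitting the optimal set."""
    bound = 1.0
    for z in zetas:
        bound *= 1.0 - z
    return bound

n = 100_000
# sum of 1/k diverges: the product telescopes to 1/n and tends to 0
divergent = miss_probability_bound(1.0 / k for k in range(2, n + 1))
# sum of 1/k^2 converges: the product telescopes to (n + 1) / (2n) and tends to 1/2
convergent = miss_probability_bound(1.0 / k**2 for k in range(2, n + 1))
print(divergent, convergent)
```

In the second case the bound stays away from zero, so a per-generation hit probability that decays this fast would not be enough for the argument of Theorem 2.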
Corollary 3. In Theorem 2, if $\zeta(t_k)$ always equals a positive constant $\zeta$, then DE converges to the optimal solution set $B^*$.
Proof. Obviously, the series $\sum_{k=1}^{\infty} \zeta(t_k)$ diverges when $\zeta(t_k)$ always equals a positive constant $\zeta$. From Theorem 2, it follows that DE converges to the optimal solution set $B^*$.
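For the constant case of Corollary 3, the bound on the probability of never entering the optimal set after k qualifying generations is (1 - zeta)^k, so one can ask how many such generations are needed to push this bound below a tolerance delta. The helper below is a small sketch. Its name and the tolerance parameter `delta` are introduced here for illustration only.

```python
import math

def generations_needed(zeta, delta):
    """Smallest k with (1 - zeta)**k <= delta, for 0 < zeta < 1 and 0 < delta < 1."""
    return math.ceil(math.log(delta) / math.log(1.0 - zeta))

k = generations_needed(0.01, 1e-3)
print(k, (1 - 0.01) ** k)
```

The count grows roughly like ln(1/delta)/zeta, which makes concrete that convergence in probability says nothing about speed when zeta is tiny.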
Now we give several observations on the above conditions.
(i) Theorem 2 means that if the probability of entering the optimal set along a certain subsequence of populations is large enough, then the modified DE converges to the global optimal set in probability. The population states need not be ergodic.
(ii) Corollary 3 is just a special case of Theorem 2 and is very easy to check. Some improved DE algorithms, such as HtDE proposed by Zhao et al., DE-RW proposed by Zhan and Zhang, and DE-MC proposed by Braak, satisfy the convergence condition of Theorem 2 (or Corollary 3).
(iii) He and Yu and Rudolph presented several important conclusions on convergence conditions for evolutionary algorithms. These conclusions do apply to the DE algorithm. However, compared with these conclusions, Theorem 2 is more relaxed and easier to check.
4. Algorithm Framework Possessing Convergence
As analyzed in the introduction, it cannot be guaranteed that the classical DE converges globally. However, DE can converge to the global optimal solution if its reproduction operation satisfies the sufficient conditions given in Theorem 2 or Corollary 3. A DE algorithm framework integrating an extra mutation component will be given in this section. Because the purpose of the extra mutation is to assist the classical DE to converge, this paper refers to the operator as the AsCo-mutation operator.
According to the sufficient conditions proved above, we can define the AsCo-mutation operator as follows.
Definition 4. AsCo-mutation is a mutation operator assisting the classical DE to converge. It satisfies the following conditions.
(1) For a certain subsequence $\{X(t_k)\}$ of the population sequence $\{X(t)\}$, AsCo-mutation changes at least one individual in each $X(t_k)$ with a positive probability.
(2) Let $W(t_k)$ denote the population generated by using AsCo-mutation; there exists at least one individual $w(t_k)$ in $W(t_k)$ such that
$$P\{w(t_k) \in B^*\} \ge \zeta(t_k) > 0$$
and the series $\sum_{k=1}^{\infty} \zeta(t_k)$ diverges, where $B^*$ is the optimal solution set.
Because the algorithm framework using AsCo-mutation contains some convergent algorithms of the DE family, this paper refers to the algorithm framework as CDE. The algorithm framework CDE can be described as follows.
(1) Initialization: generate an initial population, denoted by $X(0)$, and let $t = 0$.
(2) Reproduction: generate a trial population $U(t)$ from the target population $X(t)$.
Mutation: generate a new population from $X(t)$ by a mutation operator, denoted by $V(t)$.
Crossover: generate a new population from $V(t)$ and $X(t)$ by a crossover operator, denoted by $U'(t)$.
AsCo-mutation: if the condition that generates the subsequence population is satisfied, then generate a new population $W(t)$ from $U'(t)$ by AsCo-mutation and let $U(t) = W(t)$; otherwise, let $U(t) = U'(t)$.
(3) Selection: generate a new population from $X(t)$ and $U(t)$ by a selection operator, denoted by $X'(t)$.
(4) If the termination condition is satisfied, then stop; else let $X(t+1) = X'(t)$ and $t = t + 1$; then go to Step 2.
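The framework above can be sketched as a short, self-contained loop. The sketch makes several assumptions that are not taken from the paper: it uses DE/rand/1/bin as the base reproduction operator, applies uniform resampling as the AsCo-mutation in every generation (so the subsequence is the whole sequence), applies it to the trial of the currently worst target individual, clips trials to the search box, and uses illustrative default parameters. The name `cde_um` is hypothetical.

```python
import numpy as np

def cde_um(f, lower, upper, NP=20, F=0.5, CR=0.9, max_gen=200, seed=0):
    """Sketch of a CDE loop: DE/rand/1/bin plus a uniform AsCo-mutation step."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    n = lower.size
    pop = rng.uniform(lower, upper, size=(NP, n))            # (1) initialization
    fit = np.array([f(x) for x in pop])
    for t in range(max_gen):
        trial = np.empty_like(pop)
        for i in range(NP):                                  # (2) reproduction
            others = [r for r in range(NP) if r != i]
            r1, r2, r3 = rng.choice(others, 3, replace=False)
            donor = pop[r1] + F * (pop[r2] - pop[r3])        # DE/rand/1 mutation
            mask = rng.random(n) <= CR
            mask[rng.integers(n)] = True
            trial[i] = np.where(mask, donor, pop[i])         # binomial crossover
        trial = np.clip(trial, lower, upper)                 # assumption: keep trials in the box
        worst = np.argmax(fit)
        trial[worst] = rng.uniform(lower, upper)             # AsCo-mutation (uniform resampling)
        trial_fit = np.array([f(u) for u in trial])
        better = trial_fit < fit                             # (3) greedy selection
        pop[better], fit[better] = trial[better], trial_fit[better]
    best = np.argmin(fit)
    return pop[best], fit[best]

x_best, f_best = cde_um(lambda x: float(np.sum(x**2)), [-5.0, -5.0], [5.0, 5.0])
print(x_best, f_best)
```

Because the resampled trial is drawn uniformly from the whole box in every generation, its probability of landing in any fixed set of positive measure is a positive constant, which is the situation covered by Corollary 3.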
Compared with DE, the reproduction operator of CDE adds one step, AsCo-mutation. Obviously, the algorithm framework CDE satisfies Theorem 2 when the AsCo-mutation satisfies Definition 4. That is to say, CDE, which employs the AsCo-mutation given by Definition 4, converges to the global optimum.
5. Several Mutation Operators Satisfying Convergence Condition
Like DE algorithm, most evolutionary algorithms for numerical optimization problems use vectors of floating point numbers for their chromosomal representations. For such representations, many mutation operators  have been proposed. The most common mutation operators include Uniform mutation  and Gaussian mutation [41, 42]. We introduce these operators and prove that they meet the definition of AsCo-mutation for CDE in turn.
5.1. Uniform Mutation
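Uniform mutation replaces the solution vector $x$ with a uniformly distributed random vector $w$ confined to the domain $\psi = \prod_{j=1}^{n} [a_j, b_j]$. Each component $w_j$ of the vector is an independent, uniformly distributed random number from $[a_j, b_j]$. So the density function of $w$ can be expressed as:
$$\varphi(w) = \begin{cases} \dfrac{1}{\mu(\psi)}, & w \in \psi, \\ 0, & \text{otherwise}, \end{cases} \qquad \mu(\psi) = \prod_{j=1}^{n} (b_j - a_j).$$
As shown in the CDE algorithm framework, suppose that the AsCo-mutation operator employed by CDE is uniform mutation. Let $w$ denote the new individual generated by uniform mutation; then the probability that $w$ belongs to the optimal solution set $B^*$ can be calculated as follows:
$$P\{w \in B^*\} = \int_{B^*} \varphi(w)\, dw = \frac{\mu(B^*)}{\mu(\psi)} > 0.$$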
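The way CDE uses uniform mutation is flexible: for example, it may mutate an arbitrary individual selected from the population with a given probability, or mutate more than one individual. Let $M$ denote the number of mutated individuals; then the probability that at least one of the mutated individuals $w_1, \dots, w_M$ belongs to the optimal solution set $B^*$ can be calculated as follows:
$$P = 1 - \Big(1 - p_m \frac{\mu(B^*)}{\mu(\psi)}\Big)^{M},$$
where $\mu(B^*)/\mu(\psi)$ is the ratio of the measure of the optimal solution set to that of the search space, $p_m$ is an empirical mutation probability, $0 < p_m \le 1$, and the diversity of the population will gradually increase as $p_m$ increases.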
In addition, the implementation of Uniform mutation operator can be also flexible. For example, in order to keep the tradeoff between exploration and exploitation, this paper presents the following operator.
DE/um-best/1: where rand(0,1) denotes a uniform random number in $[0, 1]$. The , are boundary individuals at a given probability , each element of which equals either the upper boundary or the lower boundary value. The , are uniform random integers in . That is, when the index is no less than NP, will take a boundary individual. Obviously, if takes the upper boundary value of the th dimension while takes the lower boundary value (and vice versa), then the element is ergodic in the th dimension. Therefore the individual can be ergodic in the search space, as with the uniform mutation operator.
5.2. Gaussian Mutation
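Gaussian mutation modifies all components of the solution vector $x$ by adding random noise:
$$x' = x + N(0, \sigma),$$
where $N(0, \sigma)$ is a vector of independent Gaussian random numbers with a mean of zero and standard deviations $\sigma = (\sigma_1, \dots, \sigma_n)$. The density function of $x'$ can be expressed as:
$$\varphi(x') = \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_j} \exp\Big(-\frac{(x'_j - x_j)^2}{2\sigma_j^2}\Big).$$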
Now, let us suppose that AsCo-mutation operator employed by CDE is Gaussian mutation. Then the probability that generated by Gaussian mutation belongs to the optimal solution set can be calculated as follow: where.
On the other hand, for any individual , such that . So Implying that
Like uniform mutation, the used method of Gaussian mutation is flexible. As before denotes the number of individuals mutated by Gaussian mutation operator, denotes the probability that each individual is mutated, and denotes the probability that at least one of belongs to the optimal solution set. Then can be calculated as follow:
Theorem 5. DE algorithm employing uniform mutation or Gaussian mutation operator converges in probability to the global optimum of the optimization problem (1).
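For the Gaussian case, the positivity of the hit probability can be made concrete when the target set is an axis-aligned box, because the coordinates are independent and the probability factorizes into one-dimensional normal integrals. The following sketch assumes such a box-shaped target. The function names and the box are illustrative and not taken from the paper.

```python
import math
import numpy as np

def gaussian_mutation(x, sigma, rng):
    """Add independent zero-mean Gaussian noise with per-coordinate standard deviation sigma."""
    return x + rng.normal(0.0, sigma, size=x.shape)

def box_hit_probability(x, sigma, lo, hi):
    """P(x' in the box [lo, hi]) for x' = x + N(0, sigma^2) with independent coordinates."""
    p = 1.0
    for xj, sj, a, b in zip(x, sigma, lo, hi):
        # normal CDF centred at xj with standard deviation sj, written via erf
        cdf = lambda z: 0.5 * (1.0 + math.erf((z - xj) / (sj * math.sqrt(2.0))))
        p *= cdf(b) - cdf(a)
    return p

# Even from a distant parent the probability is small but strictly positive.
print(box_hit_probability([5.0, 5.0], [1.0, 1.0], [-0.5, -0.5], [0.5, 0.5]))
```

Unlike uniform mutation, this probability depends on where the parent is, so the convergence argument needs a lower bound that holds for every parent in the bounded search space.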
6. Experimental Verification
The previous sections proved that CDE algorithms converge in probability. This only guarantees that CDE algorithms reach an optimal solution as the number of iterations approaches infinity; it does not mean that CDE can find the optimal solution within a finite number of iterations. However, a convergent algorithm should generally be more robust. This section therefore presents experiments in two parts to verify CDE’s robustness. The first visualizes how CDE escapes from a local optimal set on two low-dimensional functions. The second tests a modified DE algorithm, inspired by the above convergence theory, on the CEC2005 benchmark functions.
6.1. Experiments on Low Dimensional Functions
To achieve the aims mentioned above, experiments are conducted on two numerical functions, which are chosen according to the experimental results of [43–46]. One is the DE deceptive function, which can trap the classical DE in a local optimum. The other is the Rastrigin function. In [45, 46], nineteen benchmark functions including the Rastrigin function were tested using the classical DE. Those results indicated that the classical DE performed among the worst on the Rastrigin function.
6.1.1. Deceptive Function
Consider where the function is given by
The landscape of DE deceptive function is shown in Figure 1. The global optimum of the function is with the function value . There is a deceptive local minimum with function value in this test function.
6.1.2. Rastrigin Function (2 Dimensions)
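In its standard two-dimensional form, the Rastrigin function is $f(x) = \sum_{i=1}^{2} \big(x_i^2 - 10\cos(2\pi x_i) + 10\big)$. The global optimum of the function is $x^* = (0, 0)$ with the function value $f(x^*) = 0$. There are many local optima in this test function.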
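For reference, the standard Rastrigin function can be written in a few lines. This sketch uses the textbook definition with the constant 10. The search range used in the experiments is not restated here.

```python
import numpy as np

def rastrigin(x):
    """Standard Rastrigin function; global minimum 0 at the origin."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(x**2 - 10.0 * np.cos(2.0 * np.pi * x) + 10.0))

print(rastrigin([0.0, 0.0]))
```

The cosine term creates a regular grid of local minima close to the integer points, which is what makes the function hard for algorithms that cannot leave a basin once the population has collapsed into it.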
Let CDE-um denote the CDE algorithm using the uniform mutation operator. Suppose that CDE-um mutates the worst individual of the population with probability 1, and the new individual is retained directly in the next generation. Experiments were conducted to compare five typical versions of the classical DE with the CDE-um algorithm. All experiments were run for 50 independent replications. The convergence times and convergence ratio over the 50 replications are reported.
To show the robustness of CDE-um, we report the number of function evaluations (FES) needed to achieve Ter_Err within Max_FES. Table 1 gives the FES of 50 independent replications of the five typical versions on DE’s deceptive function, while Table 2 reports the FES on the Rastrigin function. The typical versions are DE/best/1 versus CDE-um/best/1, DE/rand/1 versus CDE-um/rand/1, DE/cur-to-best/1 versus CDE-um/cur-to-best/1, DE/best/2 versus CDE-um/best/2, and DE/rand/2 versus CDE-um/rand/2. Table 3 analyzes the results of Tables 1 and 2. From the statistics of Table 3, we can see that the ratio of runs converging to the optimum (ConRa) is much higher for CDE-um than for the corresponding DE.
Note for Tables 1 and 2: “FES” denotes the number of function evaluations. “—” indicates that the algorithm cannot find the optimum within Max_FES.
Note for Table 3: “RTs” denotes running times. “CTs” denotes convergence times. “ConRa” denotes convergence ratio.
Figures 2, 3, and 4 show the convergence-times graphs over 50 independent replications within 800 iterations for DE’s deceptive function. All the convergence curves share two characteristics.
(i) When the number of iterations is small, the convergence times of the five typical versions of the classical DE are slightly larger than those of the corresponding CDE-um. However, as the number of iterations increases, the convergence times of CDE-um become far larger than those of the corresponding DE. Thus a small increase in computational cost greatly improves the robustness of the CDE-um algorithm.
(ii) When the number of iterations is large, the convergence graphs of the five typical versions of the classical DE all flatten into a straight line, whereas the graphs of CDE-um keep rising in a staircase pattern. This indirectly shows that the classical DE cannot escape from a local optimal set or a premature solution set once trapped, whereas CDE-um has an enhanced ability to escape from such sets.
The convergence graphs for the Rastrigin function have characteristics similar to those for DE’s deceptive function, so they are omitted here.
The population size is set to . The maximum number of function evaluations (Max_FES) is set to 5,000,000.(i)Mutation factor, [44, 47]; (ii)Crossover probability, [44, 47]; (iii)Termination error value (Ter_Err), Ter_Err.
6.2. Experiments on Functions of CEC2005
Wang et al. presented a composite differential evolution algorithm (CoDE), which employs three trial vector generation strategies, that is, rand/1/bin, rand/2/bin, and current-to-rand/1. Their experimental studies on the 25 benchmark functions of CEC2005 indicated that CoDE’s overall performance was better than that of seven other outstanding competitors (see Wang et al. for details). We now give a convergent CoDE algorithm (CCoDE-umbest) based on the above convergent algorithm framework. The CCoDE-umbest algorithm uses the DE/um-best/1 operator, which was presented in Section 5.1, in place of the current-to-rand/1 strategy of CoDE.
This paper compares CCoDE-umbest with CoDE on the 25 benchmark functions of CEC2005. Table 4 reports the average and standard deviation of the function error values obtained in 25 runs when FES = 1.5E + 5 and FES = 3.0E + 5, respectively. The two bottom lines of Table 4 give the test statistics of the sign test on the mean errors. From Table 4, the probability values supporting the null hypothesis (0.012 for FES = 1.5E + 5, 0.041 for FES = 3.0E + 5) are less than the significance level of 0.05. So we can reject the null hypothesis; that is to say, the overall performance of CCoDE-umbest is better than that of CoDE on the benchmarks. This suggests that the use of the convergent algorithm framework can improve the performance of CoDE.
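A sign test of this kind reduces to a binomial tail computation. The sketch below is a generic two-sided sign test, assuming ties are discarded beforehand. The function name and the win/loss counts in the usage line are hypothetical and are not the counts behind Table 4.

```python
from math import comb

def sign_test_p_value(wins, losses):
    """Two-sided sign test p-value under H0: wins and losses are equally likely."""
    n = wins + losses
    k = min(wins, losses)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n   # P(X <= k) for X ~ Binomial(n, 1/2)
    return min(1.0, 2.0 * tail)

# Hypothetical example: one algorithm has the lower mean error on 5 functions, the other on none.
print(sign_test_p_value(5, 0))  # 0.0625
```

A p-value below the chosen significance level (0.05 in the comparison above) leads to rejecting the hypothesis that the two algorithms are equally likely to win on a given function.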