Sufficient Conditions for Global Convergence of Differential Evolution Algorithm

Zhongbo Hu, Shengwu Xiong, Qinghua Su, Xiaowei Zhang

Journal of Applied Mathematics, vol. 2013, Article ID 193196, 14 pages, 2013. https://doi.org/10.1155/2013/193196

Research Article | Open Access

Academic Editor: Yongkun Li
Received 06 Jul 2013; Revised 11 Aug 2013; Accepted 27 Aug 2013; Published 21 Oct 2013

Abstract

The differential evolution algorithm (DE) is one of the most powerful stochastic real-parameter optimization algorithms, and theoretical studies on DE have gradually attracted the attention of more and more researchers. However, little theoretical research has dealt with the convergence conditions for DE. In this paper, a sufficient condition and a corollary for the convergence of DE to the global optima are derived using infinite products. A DE algorithm framework satisfying the convergence conditions is then established, and it is proved that two common mutation operators satisfy the framework. The numerical experiments consist of two parts. One visualizes the process by which five convergent variants of the classical DE algorithms escape from a local optimal set on two low-dimensional functions. The other tests the performance of a modified DE algorithm, inspired by the convergent algorithm framework, on the CEC2005 benchmarks.

1. Introduction

The differential evolution algorithm (DE) is a population-based stochastic parallel evolutionary algorithm. DE has been a very competitive form of evolutionary computing since it was proposed by Storn and Price in 1995 [1]. DE and its variants have achieved competitive rankings in various competitions held at the IEEE Congress on Evolutionary Computation (CEC) conference series [2, 3]. According to frequently reported comprehensive studies [4–6], DE outperforms many other optimization methods in terms of convergence speed and robustness over common benchmark functions. Compared to most other evolutionary algorithms, DE is much simpler and more straightforward to implement and has very few control parameters. Perhaps due to these advantages, it has found many practical applications, such as function optimization [7–11], multiobjective optimization [12], classification [13], and scheduling [14].

Theoretical studies of algorithms are very important for understanding their search behaviors and for developing more efficient algorithms. With the popularity of DE in applications, more and more researchers have paid attention to theoretical studies of DE. According to their content, the main results of theoretical studies on DE can be divided into the following three classes.

1.1. Research on the Runtime Complexity of DE

DE is a population-based stochastic search algorithm, so its runtime-complexity analysis is a critical issue. Zielinski et al. [15] investigated the runtime complexity of DE under various stopping criteria, including a fixed number of generations and the maximum distance criterion (MaxDist), which stops the execution once the maximum distance from every vector to the best population member falls below a given threshold.

1.2. Research on the Dynamical Behavior of DE's Population

This class focuses on investigating the evolving process of DE's population. For instance, the development of the expected population variance and the population distribution over time is an important issue. Zaharie [16–20] theoretically analyzed the influence of the variation operators (mutation and crossover) and their parameters on the expected population variance. In 2009, Zaharie [21] theoretically investigated the influence of the crossover operators (including the classical binomial and exponential strategies) and the crossover probability on the expected population variance. Dasgupta et al. [22, 23] proposed a mathematical model of the underlying evolutionary dynamics of a one-dimensional DE population; the model showed that the fundamental dynamics of each parameter vector in DE employ a gradient-descent-type search strategy. Wang and Huang [24] developed a stochastic model of a one-dimensional DE population to analyze how the population distribution evolves over time.

1.3. Research on the Convergence Property of DE

This class investigates the limit behavior of DE's population. The main issue is under which assumptions it can be guaranteed that DE or its variants reach an optimal solution [25]. Technically speaking, the commonly used concepts include convergence in probability, almost sure convergence, and convergence in distribution.

Xue et al. [26] performed a mathematical modeling and convergence analysis of continuous multiobjective differential evolution (MODE) under certain simplified assumptions, and this work was extended in [27]. Zhao et al. [28] proposed a hybrid differential evolution with a transform function (HtDE) and proved its convergence. Sun [29] developed a Markov chain model and proved that the classical DE does not converge in probability. He et al. [30] defined the differential operator (DO) as a random mapping from the solution space to the Cartesian product of the solution space and analyzed the asymptotic convergence of DE using the random contraction mapping theorem. Ghosh et al. [31] established the asymptotic convergence behavior of a classical DE (DE/rand/1/bin) algorithm by applying Lyapunov stability theorems; their analysis assumes that the objective function has two properties: (1) it has continuous second-order derivatives in the search space, and (2) it possesses a unique global optimum in the search range.

The studies in this paper are confined to the third class, the convergence property of DE.

We note that the conclusions of [30, 31] contradict that of [29]. According to its inference process, the asymptotic convergence in [30] refers to almost sure convergence; in fact, if DE does not converge in probability, then it does not converge almost surely either. We also note that the value of the random mapping DO defined in [30] may be greater than 1, which is debatable. In [31], the asymptotic convergence of DE/rand/1/bin, proved by applying Lyapunov stability theorems, should be a local convergence property. The reason is that, according to Lyapunov stability theorems, the distribution of the initial population depends on the maximum region of asymptotic stability; so, for some functions, DE/rand/1/bin possesses the asymptotic stability property if and only if the initial individuals are close enough to the global optimum. In addition, it can be derived from the mutation operators of the classical DE that, if its population is trapped in a local optimum, DE cannot escape. This property was employed by [29] to prove that the classical DE does not possess global convergence in probability.

Taking into account that a convergent algorithm may be more robust than a divergent one, Zhao et al. [28] developed a convergent algorithm, HtDE, and proved its convergence; Zhan and Zhang [32] proposed a DE with random walk; and Xue et al. [26, 27] analyzed MODE's convergence. However, the conditions for global convergence of DE have not been explored. In this paper, the following problems will be addressed.
(i) What are sufficient conditions for the global convergence of DE?
(ii) What is the algorithm framework of a convergent DE?
(iii) Which operators can assist the classical DE to achieve asymptotic convergence?

The discussion in this paper is undertaken in a general measurable space, and infinite products are used as the analysis tool.

This paper is organized as follows. Section 2 introduces the classical DE. Section 3 proves a sufficient condition and a corollary for the convergence of DE to the global optima. Section 4 presents a DE algorithm framework satisfying the convergence conditions. Section 5 proves that several operators satisfy the convergent algorithm framework. Section 6 gives numerical experiments to verify the robustness of the convergent DE. Section 7 analyzes and discusses the theoretical conclusions and the experimental results in detail. Section 8 summarizes this paper and indicates several directions for future research.

2. Classical Differential Evolution

DE is a competitive algorithm for solving continuous optimization problems. Consider the optimization problem
$$\min f(X), \quad X \in S, \tag{1}$$
where $S$ is a measurable space and $f(X)$ is the objective function (or the fitness of $X$), which satisfies that for any bounded $A \subseteq S$, $f(A)$ is bounded. The optimal solution set is denoted as $S^* = \{X^* \in S \mid f(X^*) \le f(X),\ \forall X \in S\}$, where $X^*$ is the optimum solution.

Let $\mu$ be a measure on the space $S$. It may happen that $\mu(S^*) = 0$, that is, $S^*$ is a set of measure zero, which is not convenient for analysis. In view of the accuracy of practical problems, without loss of generality, we can consider an expanded set $S_\varepsilon^* = \{X \in S \mid |f(X) - f(X^*)| < \varepsilon\}$, where $\varepsilon$ is a small positive value. We can choose an appropriate $\varepsilon$ which meets the required accuracy and makes $\mu(S_\varepsilon^*) > 0$. We use $S_\varepsilon^*$ (with $\mu(S_\varepsilon^*) > 0$) in place of the set $S^*$ in this paper. Meanwhile, in order to simplify the calculation, let us suppose that the search space is $S = \prod_{j=1}^{D} [L_j, U_j]$, where $D$ is the dimension of $X$.

The classical DE [2, 33, 34] works through a simple cycle of reproduction and selection operators after initialization. The reproduction operator includes mutation and crossover operators. The classical DE for solving the above problem (1) can be described in detail as follows.
(1) Initialization: generate an initial population denoted by $P_0$, and let $t = 0$.
(2) Reproduction: generate a trial population $U_t$ from the target population $P_t$.
Mutation: generate a new population $V_t$ from $P_t$ by a mutation operator.
Crossover: generate a new population $U_t$ from $P_t$ and $V_t$ by a crossover operator.
(3) Selection: generate a new population $P_{t+1}$ from $P_t$ and $U_t$ by a selection operator.
(4) If the termination condition is satisfied, then stop; else let $t = t + 1$ and go to Step 2.

The initial population is generated by assigning random values in the search space to the variables of every solution.

2.1. Reproduction Operator
2.1.1. Mutation Operator

After initialization, DE creates a donor vector $V_{i,t}$ corresponding to each individual $X_{i,t}$ in the $t$th generation through the mutation operator. The most frequently used mutation strategies are as follows:

DE/rand/1: $V_{i,t} = X_{r_1,t} + F(X_{r_2,t} - X_{r_3,t})$,
DE/best/1: $V_{i,t} = X_{best,t} + F(X_{r_1,t} - X_{r_2,t})$,
DE/cur-to-best/1: $V_{i,t} = X_{i,t} + F(X_{best,t} - X_{i,t}) + F(X_{r_1,t} - X_{r_2,t})$,
DE/best/2: $V_{i,t} = X_{best,t} + F(X_{r_1,t} - X_{r_2,t}) + F(X_{r_3,t} - X_{r_4,t})$,
DE/rand/2: $V_{i,t} = X_{r_1,t} + F(X_{r_2,t} - X_{r_3,t}) + F(X_{r_4,t} - X_{r_5,t})$,

where $X_{best,t}$ denotes the best individual of the current generation, the indices $r_1, \dots, r_5$ are uniformly distributed random integers that are mutually different and distinct from the running index $i$, NP is the population size, and $F$ is a real parameter called the mutation factor or scaling factor.
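For concreteness, here is a minimal NumPy sketch of the five strategies above. The array layout (`pop` of shape (NP, D)), the function name `mutate`, and the index-sampling details are illustrative assumptions, not the authors' code.

```python
import numpy as np

def mutate(pop, i, best, F, strategy="rand/1", rng=np.random):
    """Donor vector for target index i under one of the five strategies.

    pop: (NP, D) array of the current population.
    best: index of the best individual of the current generation.
    F: mutation (scaling) factor.
    """
    NP = len(pop)
    # r1..r5: mutually different indices, all distinct from i.
    r = rng.choice([k for k in range(NP) if k != i], size=5, replace=False)
    if strategy == "rand/1":
        return pop[r[0]] + F * (pop[r[1]] - pop[r[2]])
    if strategy == "best/1":
        return pop[best] + F * (pop[r[0]] - pop[r[1]])
    if strategy == "cur-to-best/1":
        return pop[i] + F * (pop[best] - pop[i]) + F * (pop[r[0]] - pop[r[1]])
    if strategy == "best/2":
        return pop[best] + F * (pop[r[0]] - pop[r[1]]) + F * (pop[r[2]] - pop[r[3]])
    if strategy == "rand/2":
        return pop[r[0]] + F * (pop[r[1]] - pop[r[2]]) + F * (pop[r[3]] - pop[r[4]])
    raise ValueError(f"unknown strategy: {strategy}")
```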

2.1.2. Crossover Operator

Following mutation, the crossover operator is applied to further increase the diversity of the population. In crossover, the target vector $X_{i,t}$ is combined with elements from the donor vector $V_{i,t}$ to produce the trial vector $U_{i,t}$ using the binomial crossover
$$u_{i,t}^{j} = \begin{cases} v_{i,t}^{j}, & \text{if } \mathrm{rand}(0,1) \le CR \text{ or } j = j_{rand}, \\ x_{i,t}^{j}, & \text{otherwise}, \end{cases}$$
where $CR$ is the probability of crossover and $j_{rand}$ is a random integer in $[1, D]$. Unless otherwise mentioned, $\mathrm{rand}(0,1)$ is a uniformly distributed random number confined to the range $(0, 1)$.
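A matching sketch of the binomial crossover, under the same illustrative conventions as the `mutate` sketch above:

```python
import numpy as np

def binomial_crossover(target, donor, CR, rng=np.random):
    """Binomial crossover: each component comes from the donor with
    probability CR; component j_rand always comes from the donor, so the
    trial vector differs from the target in at least one dimension."""
    D = len(target)
    j_rand = rng.randint(D)          # forced crossover index in [0, D)
    take_donor = rng.rand(D) <= CR
    take_donor[j_rand] = True
    return np.where(take_donor, donor, target)
```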

2.2. Selection Operator

Finally, the selection operator is employed to maintain the most promising trial individuals in the next generation. The classical DE adopts a simple selection scheme: it compares the objective values of the target vector $X_{i,t}$ and the trial vector $U_{i,t}$. If the trial individual reduces the value of the objective function, it is accepted into the next generation; otherwise the target individual is retained in the population. The selection operator is defined as
$$X_{i,t+1} = \begin{cases} U_{i,t}, & \text{if } f(U_{i,t}) \le f(X_{i,t}), \\ X_{i,t}, & \text{otherwise}. \end{cases}$$
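Putting the pieces together, a compact DE/rand/1/bin loop might look as follows. It reuses the `mutate` and `binomial_crossover` sketches above; the bound handling via `np.clip` and all default parameter values are common conventions assumed here, not part of the classical definition.

```python
import numpy as np

def classical_de(f, low, high, NP=30, F=0.5, CR=0.9, max_gen=1000,
                 rng=np.random):
    """Classical DE cycle: initialization, mutation, crossover, greedy
    one-to-one selection. low/high: per-dimension bound vectors."""
    D = len(low)
    pop = low + rng.rand(NP, D) * (high - low)   # random initial population
    fit = np.array([f(x) for x in pop])
    for _ in range(max_gen):
        best = int(np.argmin(fit))
        for i in range(NP):
            donor = mutate(pop, i, best, F, "rand/1", rng)
            trial = np.clip(binomial_crossover(pop[i], donor, CR, rng),
                            low, high)
            f_trial = f(trial)
            if f_trial <= fit[i]:                # elitist selection
                pop[i], fit[i] = trial, f_trial
    best = int(np.argmin(fit))
    return pop[best], fit[best]
```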

3. Convergence Condition

There are several definitions of convergence for analyzing the asymptotic behavior of algorithms. The following definition, convergence in probability, is used in this paper.

Definition 1. Let $\{P_t\}_{t=0}^{\infty}$ be the population sequence generated by using DE to solve the optimization problem (1). Then DE converges to the global optimum if and only if
$$\lim_{t \to \infty} P\{P_t \cap S_\varepsilon^* \ne \emptyset\} = 1.$$
Let us give a sufficient condition for the convergence of DE.

Theorem 2. Consider using DE to solve the optimization problem (1). Suppose that in each target population $P_{t_k}$ there exists at least one individual $X_{t_k}$, which corresponds to the trial individual $U_{t_k}$ by a reproduction operator, such that $P\{U_{t_k} \in S_\varepsilon^*\} \ge \delta_{t_k} > 0$ and the series $\sum_{k=1}^{\infty} \delta_{t_k}$ diverges; then DE converges to the optimal solution set.
Here $\{t_k\}$ denotes any subsequence of the set of natural numbers, $P\{U_{t_k} \in S_\varepsilon^*\}$ denotes the probability that $U_{t_k}$ belongs to the optimal solution set $S_\varepsilon^*$, and $\delta_{t_k}$ is a small positive value which may change with $t_k$.

Proof. In DE, each target individual corresponds to a trial individual through the reproduction operator. According to the condition of Theorem 2, the probability that no individual of the trial population $U_{t_k}$ belongs to the optimal solution set $S_\varepsilon^*$ satisfies
$$P\{U_{t_k} \cap S_\varepsilon^* = \emptyset\} \le 1 - \delta_{t_k},$$
so the probability that no individual of any trial population in the first $k$ selected iterations belongs to $S_\varepsilon^*$ satisfies
$$P\left\{\bigcap_{j=1}^{k} \left(U_{t_j} \cap S_\varepsilon^* = \emptyset\right)\right\} \le \prod_{j=1}^{k} \left(1 - \delta_{t_j}\right).$$
Because of the elitist selection operation in DE, the optimal individual of the trial populations is retained in the next-generation population, so the probability that the $t_k$th population contains no optimum satisfies
$$P\{P_{t_k} \cap S_\varepsilon^* = \emptyset\} \le \prod_{j=1}^{k} \left(1 - \delta_{t_j}\right).$$
Thus, for the classical DE with elitist selection,
$$\lim_{k \to \infty} P\{P_{t_k} \cap S_\varepsilon^* \ne \emptyset\} \ge 1 - \lim_{k \to \infty} \prod_{j=1}^{k} \left(1 - \delta_{t_j}\right).$$
By the property of the infinite product [35], $\lim_{k \to \infty} \prod_{j=1}^{k} (1 - \delta_{t_j}) = 0$ if and only if the series $\sum_{j=1}^{\infty} \delta_{t_j}$ diverges. So, for the divergent series $\sum_{j=1}^{\infty} \delta_{t_j}$, we get
$$\lim_{t \to \infty} P\{P_t \cap S_\varepsilon^* \ne \emptyset\} = 1.$$
According to Definition 1, the theorem holds.

Corollary 3. In Theorem 2, if $\delta_{t_k}$ always equals a positive constant $\delta$, then DE converges to the optimal solution set.

Proof. Obviously, the series $\sum_{k=1}^{\infty} \delta_{t_k} = \sum_{k=1}^{\infty} \delta$ diverges when $\delta_{t_k}$ always equals a positive constant $\delta$. From Theorem 2, DE converges to the optimal solution set.
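To see how quickly the bound in the proof collapses under Corollary 3, the following worked example plugs in an illustrative constant $\delta = 0.01$ (a value chosen here purely for demonstration, not taken from the paper):

```latex
% Failure-probability bound after K selected generations, delta = 0.01:
\[
P\{P_{t_K} \cap S_{\varepsilon}^{*} = \emptyset\}
  \le \prod_{k=1}^{K} (1 - \delta)
  = (1 - \delta)^{K},
\qquad
(1 - 0.01)^{1000} \approx 4.3 \times 10^{-5},
\]
% so after 1000 such generations the population misses the optimal set
% with probability below 0.005%, and the bound tends to 0 as K grows.
```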

Now we give several observations on the above conditions.
(i) Theorem 2 means that if the probability of entering the optimal set in a certain sub-sequence of populations is large enough, then the modified DE converges to the global optimal set in probability, and the population states need no ergodicity.
(ii) Corollary 3 is just a special case of Theorem 2 and is very easy to check. Some improved DE algorithms, such as HtDE proposed by Zhao et al. [28], DE-RW proposed by Zhan and Zhang [32], and DE-MC proposed by Braak [36], satisfy the convergence condition of Theorem 2 (or Corollary 3).
(iii) He and Yu [37] and Rudolph [38] presented several important conclusions on convergence conditions for evolutionary algorithms, and these conclusions do apply to the DE algorithm. However, compared with those conclusions, Theorem 2 is more relaxed and easier to check.

4. Algorithm Framework Possessing Convergence

As analyzed in the introduction, the classical DE cannot be guaranteed to converge globally. However, DE can converge to the global optimal solution if its reproduction operation satisfies the sufficient conditions given in Theorem 2 or Corollary 3. A DE algorithm framework integrating an extra mutation component is given in this section. Because the purpose of the extra mutation is to assist the classical DE to converge, this paper refers to the operator as the AsCo-mutation operator.

According to the sufficient conditions proved above, we can define the AsCo-mutation operator as follows.

Definition 4. AsCo-mutation is a mutation operator assisting the classical DE to converge. It satisfies the following conditions.
(1) For a certain sub-sequence $\{P_{t_k}\}$ of the population sequence $\{P_t\}$, AsCo-mutation changes at least one individual in each $P_{t_k}$ with a positive probability.
(2) Let $P'_{t_k}$ denote the population generated by AsCo-mutation; there exists at least one individual $X'_{t_k}$ in $P'_{t_k}$ such that $P\{X'_{t_k} \in S_\varepsilon^*\} \ge \delta_{t_k} > 0$ and the series $\sum_{k=1}^{\infty} \delta_{t_k}$ diverges.
Taking into account the fact that the algorithm framework using AsCo-mutation contains some convergent algorithms of the DE family, this paper refers to the algorithm framework as CDE. The algorithm framework CDE can be described as follows.
(1) Initialization: generate an initial population denoted by $P_0$, and let $t = 0$.
(2) Reproduction: generate a trial population $U_t$ from the target population $P_t$.
Mutation: generate a new population $V_t$ from $P_t$ by a mutation operator.
Crossover: generate a new population $C_t$ from $P_t$ and $V_t$ by a crossover operator.
AsCo-mutation: if the condition generating the sub-sequence of populations is satisfied, then generate a new population $U_t$ from $C_t$ by AsCo-mutation; otherwise, let $U_t = C_t$.
(3) Selection: generate a new population $P_{t+1}$ from $P_t$ and $U_t$ by a selection operator.
(4) If the termination condition is satisfied, then stop; else let $t = t + 1$ and go to Step 2.
On the basis of DE, the reproduction operator of CDE adds one step, AsCo-mutation. Obviously, the algorithm framework CDE satisfies Theorem 2 when the AsCo-mutation satisfies Definition 4. That is to say, a CDE that employs an AsCo-mutation as given by Definition 4 converges to the global optimum.
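For illustration, here is a minimal Python sketch of the CDE framework under stated assumptions: `mutate` and `binomial_crossover` are the sketches from Section 2, and the names `asco_mutation`, `trigger`, and all default parameter values are illustrative choices, not the authors' code.

```python
import numpy as np

def cde(f, low, high, asco_mutation, trigger, NP=30, F=0.5, CR=0.9,
        max_gen=1000, rng=np.random):
    """CDE framework: the classical DE cycle plus an AsCo-mutation step
    applied whenever trigger(t) puts generation t into the sub-sequence
    (trigger = lambda t: True applies it at every generation)."""
    pop = low + rng.rand(NP, len(low)) * (high - low)
    fit = np.array([f(x) for x in pop])
    for t in range(max_gen):
        best = int(np.argmin(fit))
        trial = np.empty_like(pop)
        for i in range(NP):
            donor = mutate(pop, i, best, F, "rand/1", rng)
            trial[i] = np.clip(binomial_crossover(pop[i], donor, CR, rng),
                               low, high)
        if trigger(t):
            trial = asco_mutation(trial, low, high, rng)  # extra step
        for i in range(NP):                               # elitist selection
            f_trial = f(trial[i])
            if f_trial <= fit[i]:
                pop[i], fit[i] = trial[i], f_trial
    best = int(np.argmin(fit))
    return pop[best], fit[best]
```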

5. Several Mutation Operators Satisfying Convergence Condition

Like the DE algorithm, most evolutionary algorithms for numerical optimization problems use vectors of floating-point numbers as their chromosomal representations. For such representations, many mutation operators [39] have been proposed; the most common include uniform mutation [40] and Gaussian mutation [41, 42]. We introduce these operators in turn and prove that each meets the definition of AsCo-mutation for CDE.

5.1. Uniform Mutation

Uniform mutation replaces the solution vector $X$ with a uniformly distributed random vector $X'$ confined to the search space $S$. Each component $x'_j$ of the vector is a uniformly distributed (independent, identically distributed) random number from $[L_j, U_j]$. So the density function of $X'$ can be expressed as
$$p(X') = \prod_{j=1}^{D} \frac{1}{U_j - L_j}, \quad X' \in S.$$

As shown in the CDE algorithm framework, suppose that the AsCo-mutation operator employed by CDE is uniform mutation. Let $X'$ denote the new individual generated by uniform mutation; then the probability that $X'$ belongs to the optimal solution set $S_\varepsilon^*$ can be calculated as follows:
$$P\{X' \in S_\varepsilon^*\} = \int_{S_\varepsilon^*} p(X')\,dX' = \frac{\mu(S_\varepsilon^*)}{\mu(S)} > 0.$$

The way CDE uses uniform mutation is flexible, for example, mutating an arbitrary individual selected from the population at a given probability, or mutating more than one individual. Let $m$ denote the number of mutated individuals; then the probability $P_m$ that at least one of them belongs to the optimal solution set $S_\varepsilon^*$ can be calculated as follows:
$$P_m = 1 - \left(1 - p_u \frac{\mu(S_\varepsilon^*)}{\mu(S)}\right)^m,$$
where $p_u$ is an empirical mutation probability with $0 < p_u \le 1$, and the diversity of the population gradually increases as $m$ increases.
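A minimal sketch of one such flexible use, with hypothetical parameters `p_u` (per-individual mutation probability) and `m` (number of candidates); it plugs into the `cde` sketch above as the `asco_mutation` argument:

```python
import numpy as np

def uniform_asco(trial, low, high, rng=np.random, p_u=0.1, m=1):
    """Uniform AsCo-mutation: each of m randomly chosen trial individuals
    is replaced, with probability p_u, by a fresh uniform sample from S.
    A constant p_u yields the constant delta required by Corollary 3."""
    trial = trial.copy()
    for i in rng.choice(len(trial), size=m, replace=False):
        if rng.rand() < p_u:
            trial[i] = low + rng.rand(len(low)) * (high - low)
    return trial
```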

In addition, the implementation of the uniform mutation operator can also be flexible. For example, in order to keep the tradeoff between exploration and exploitation, this paper presents the following operator.

DE/um-best/1:
$$V_{i,t} = X_{best,t} + \mathrm{rand}(0,1)\,\bigl(\widetilde{X}_{r_1,t} - \widetilde{X}_{r_2,t}\bigr),$$
where $\mathrm{rand}(0,1)$ denotes a uniform random number in $(0,1)$ and $r_1, r_2$ are uniform random integers in $[1, 2NP]$. When an index $r$ is no less than NP, $\widetilde{X}_{r,t}$ takes a boundary individual $B_r$ at a given probability, each element of which equals either the upper or the lower boundary value; otherwise $\widetilde{X}_{r,t}$ is the population member $X_{r,t}$. Obviously, if $\widetilde{X}_{r_1,t}$ takes the upper boundary value of the $j$th dimension while $\widetilde{X}_{r_2,t}$ takes the lower boundary value (or vice versa), then the element $v_{i,t}^{j}$ is ergodic in the $j$th dimension. Therefore the individual $V_{i,t}$ can be ergodic in the search space, like the uniform mutation operator.
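The following sketch encodes one possible reading of DE/um-best/1 as described above; the exact index convention, the boundary probability `p_b`, and the helper `pick` are assumptions made for illustration, not the authors' published code.

```python
import numpy as np

def um_best_1(pop, best, low, high, p_b=0.1, rng=np.random):
    """DE/um-best/1 donor under this reading: indices r1, r2 are drawn
    from [0, 2*NP); an index r >= NP selects, with probability p_b, a
    boundary individual whose elements are the lower or upper bound
    (each chosen with probability 1/2), otherwise member r - NP."""
    NP, D = pop.shape

    def pick(r):
        if r >= NP and rng.rand() < p_b:
            return np.where(rng.rand(D) < 0.5, low, high)  # boundary B_r
        return pop[r % NP]

    r1, r2 = rng.randint(2 * NP, size=2)
    return pop[best] + rng.rand() * (pick(r1) - pick(r2))
```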

5.2. Gaussian Mutation

Gaussian mutation modifies all components of the solution vector $X$ by adding random noise:
$$X' = X + \Delta,$$
where $\Delta = (\Delta_1, \dots, \Delta_D)$ is a vector of independent random Gaussian numbers with a mean of zero and standard deviations $\sigma_j$. The density function of $\Delta$ can be expressed as
$$p(\Delta) = \prod_{j=1}^{D} \frac{1}{\sqrt{2\pi}\,\sigma_j} \exp\left(-\frac{\Delta_j^2}{2\sigma_j^2}\right).$$

Now, let us suppose that the AsCo-mutation operator employed by CDE is Gaussian mutation. Then the probability that the individual $X' = X + \Delta$ generated by Gaussian mutation belongs to the optimal solution set $S_\varepsilon^*$ can be calculated as follows:
$$P\{X' \in S_\varepsilon^*\} = \int_{S_\varepsilon^*} p(Z - X)\,dZ,$$
where $Z$ ranges over $S_\varepsilon^*$ and $p(\cdot)$ is the Gaussian density above.

On the other hand, for any individual $X \in S$ and any $Z \in S$, each component satisfies $|Z_j - X_j| \le U_j - L_j$, so that
$$p(Z - X) \ge \prod_{j=1}^{D} \frac{1}{\sqrt{2\pi}\,\sigma_j} \exp\left(-\frac{(U_j - L_j)^2}{2\sigma_j^2}\right) = c > 0,$$
implying that
$$P\{X' \in S_\varepsilon^*\} \ge c\,\mu(S_\varepsilon^*) > 0.$$

Like uniform mutation, the way Gaussian mutation is used is flexible. As before, let $m$ denote the number of individuals mutated by the Gaussian mutation operator, $p_g$ the probability that each individual is mutated, and $P_m$ the probability that at least one of the mutated individuals belongs to the optimal solution set $S_\varepsilon^*$. Then $P_m$ can be calculated as follows:
$$P_m \ge 1 - \left(1 - p_g\,c\,\mu(S_\varepsilon^*)\right)^m.$$
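A matching sketch of Gaussian mutation used as AsCo-mutation; `sigma_scale`, `p_g`, and `m` are hypothetical parameters, and clipping to the box is one simple way to keep mutants inside $S$:

```python
import numpy as np

def gaussian_asco(trial, low, high, rng=np.random, p_g=0.1, m=1,
                  sigma_scale=0.1):
    """Gaussian AsCo-mutation: each of m randomly chosen trial individuals
    is perturbed, with probability p_g, by zero-mean Gaussian noise whose
    per-dimension standard deviation is proportional to the box width."""
    sigma = sigma_scale * (high - low)
    trial = trial.copy()
    for i in rng.choice(len(trial), size=m, replace=False):
        if rng.rand() < p_g:
            noise = rng.randn(len(low)) * sigma
            trial[i] = np.clip(trial[i] + noise, low, high)
    return trial
```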

Obviously, taking $\delta_{t_k}$ to be the corresponding positive constant lower bound, the uniform mutation and Gaussian mutation operators satisfy Definition 4, and thus we can get the following Theorem 5.

Theorem 5. A DE algorithm employing the uniform mutation or Gaussian mutation operator converges in probability to the global optimum of the optimization problem (1).

6. Experimental Verification

It is proved in the previous sections that CDE algorithms converge in probability, which only guarantees that CDE algorithms reach an optimal solution as the number of iterations approaches infinity; it does not mean that CDE can find the optimal solution within finitely many iterations. However, a convergent algorithm should generally be more robust. Thus this section presents experiments in two parts to verify CDE's robustness. One visualizes the process by which CDE escapes from a local optimal set on two low-dimensional functions. The other tests a modified DE algorithm, inspired by the above convergence theory, on the benchmark functions of CEC2005.

6.1. Experiments on Low Dimensional Functions

To achieve the aims mentioned above, experiments are conducted on two numerical functions chosen according to the experimental results of [43–46]. One is the DE deceptive function [45], which can lead the classical DE to become trapped in the local optimum. The other is the Rastrigin function. In [45, 46], nineteen benchmark functions including the Rastrigin function were tested using the classical DE, and those results indicated that performance on the Rastrigin function was among the worst.

6.1.1. Deceptive Function

Consider the DE deceptive function constructed in [45].

The landscape of the DE deceptive function is shown in Figure 1. The function has a single global optimum and a deceptive local minimum that attracts the classical DE.

6.1.2. Rastrigin Function (2 Dimensions)

Consider
$$f(x_1, x_2) = \sum_{j=1}^{2} \left( x_j^2 - 10\cos(2\pi x_j) + 10 \right), \quad x_j \in [-5.12, 5.12].$$

The global optimum of the function is $(0, 0)$ with the function value 0. There are many local optima in this test function.
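For reference, the two-dimensional Rastrigin function above is straightforward to code; a minimal sketch:

```python
import numpy as np

def rastrigin(x):
    """2-D Rastrigin test function; global optimum f(0, 0) = 0."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(x * x - 10.0 * np.cos(2.0 * np.pi * x) + 10.0))
```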

Let CDE-um denote the CDE algorithm using the uniform mutation operator. Suppose that CDE-um mutates the worst individual of the trial population with probability 1 and that the new individual is directly retained in the next generation. Experiments were conducted to compare five typical versions of the classical DE with the corresponding CDE-um algorithms. All experiments were run for 50 independent replications, and the convergence times and convergence ratio over the 50 replications were recorded.
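The specific CDE-um step used in these experiments, replacing the worst trial individual and keeping the replacement unconditionally, might look like this (a sketch; the function name and calling convention are assumptions):

```python
import numpy as np

def cde_um_step(trial, trial_fit, low, high, rng=np.random):
    """CDE-um variant used here: the worst trial individual is replaced
    (with probability 1) by a uniform random point, and this new
    individual is retained directly in the next generation."""
    worst = int(np.argmax(trial_fit))
    trial = trial.copy()
    trial[worst] = low + rng.rand(len(low)) * (high - low)
    return trial, worst  # caller accepts index `worst` unconditionally
```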

In order to show the robustness of CDE-um, we report the number of function evaluations (FES) needed to achieve Ter_Err within Max_FES. Table 1 gives the FES of 50 independent replications of the five typical versions on the DE deceptive function, while Table 2 reports the FES on the Rastrigin function. The typical versions are DE/best/1 versus CDE-um/best/1, DE/rand/1 versus CDE-um/rand/1, DE/cur-to-best/1 versus CDE-um/cur-to-best/1, DE/best/2 versus CDE-um/best/2, and DE/rand/2 versus CDE-um/rand/2. Table 3 summarizes the results of Tables 1 and 2. From the statistics of Table 3, we can see that the ratio of runs converging to the optimum (ConRa) is much higher for CDE-um than for the corresponding DE.


Table 1: FES of 50 independent replications of DE and CDE-um under the best/1, rand/1, cur-to-best/1, best/2, and rand/2 strategies on the DE deceptive function (summarized in Table 3). "FES" denotes the number of function evaluations. "—" indicates that the algorithm cannot find the optimum within Max_FES.

Table 2: FES of 50 independent replications of DE and CDE-um under the best/1, rand/1, cur-to-best/1, best/2, and rand/2 strategies on the Rastrigin function (summarized in Table 3). "FES" denotes the number of function evaluations. "—" indicates that the algorithm cannot find the optimum within Max_FES.

Table 3: statistics of the results in Tables 1 and 2.

Algorithm              Deceptive function        Rastrigin function
                       RTs   CTs   ConRa         RTs   CTs   ConRa

DE/best/1              50     8    16%           50    27    54%
CDE-um/best/1          50    50    100%          50    50    100%
DE/rand/1              50    28    56%           50    45    90%
CDE-um/rand/1          50    50    100%          50    50    100%
DE/cur-to-best/1       50    17    34%           50    49    98%
CDE-um/cur-to-best/1   50    50    100%          50    50    100%
DE/best/2              50    29    58%           50    47    94%
CDE-um/best/2          50    50    100%          50    50    100%
DE/rand/2              50    37    74%           50    49    98%
CDE-um/rand/2          50    50    100%          50    50    100%

"RTs" denotes running times. "CTs" denotes convergence times. "ConRa" denotes convergence ratio.

Figures 2, 3, and 4 show the convergence-times graphs of 50 independent replications within 800 iterations on the DE deceptive function. All the convergence curves share two common characteristics.
(i) When the iteration count is small, the convergence times of the five typical versions of the classical DE are slightly larger than those of the corresponding CDE-um. However, as the iteration count increases, the convergence times of CDE-um become far larger than those of the corresponding DE. From this we can see that a small increase in computational cost greatly improves the robustness of the CDE-um algorithm.
(ii) When the iteration count is large, all the convergence graphs of the five typical versions of the classical DE become straight lines, whereas all the graphs of CDE-um keep rising in a stepwise pattern. This indirectly shows that the classical DE cannot escape from a local optimal set or a premature solution set once trapped, while CDE-um enhances the ability to escape from such sets.

The convergence graphs on the Rastrigin function show characteristics similar to those on the DE deceptive function, so they are omitted here.

The parameter settings are as follows: the maximum number of function evaluations (Max_FES) is 5,000,000; the population size NP, the mutation factor F, and the crossover probability CR are set following [44, 47]; and the termination error value (Ter_Err) specifies the accuracy at which a run is counted as convergent.

6.2. Experiments on Functions of CEC2005

Wang et al. [48] presented a composite differential evolution algorithm (CoDE), which employs three trial vector generation strategies, namely rand/1/bin, rand/2/bin, and current-to-rand/1. Experimental studies on the 25 benchmark functions of CEC2005 indicated that CoDE's overall performance was better than that of seven other outstanding competitors (please refer to [48] for details). We now give a convergent CoDE algorithm (CCoDE-umbest) based on the above convergent algorithm framework: CCoDE-umbest replaces the current-to-rand/1 strategy of CoDE with the DE/um-best/1 operator presented in Section 5.1.

This paper compares CCoDE-umbest with CoDE on the 25 benchmark functions of CEC2005. Table 4 reports the average and standard deviation of the function error values obtained in 25 runs at FES = 1.5E+5 and FES = 3.0E+5, respectively. The two bottom lines in Table 4 give the test statistics for the sign test [49] on the mean errors. From Table 4, the probability values supporting the null hypothesis (0.012 for FES = 1.5E+5, 0.041 for FES = 3.0E+5) are less than the 0.05 significance level. So we can reject the null hypothesis; that is to say, the overall performance of CCoDE-umbest is better than that of CoDE on the benchmarks. This implies that the use of the convergent algorithm framework can improve the performance of CoDE.
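The sign test on paired mean errors reduces to a binomial test with success probability 0.5 under the null hypothesis; a sketch using SciPy, with placeholder win counts rather than the actual values behind Table 4:

```python
from scipy.stats import binomtest

# Sign test: under the null hypothesis the two algorithms win equally
# often on non-tied functions, so wins ~ Binomial(n, 0.5). The counts
# below are hypothetical placeholders, NOT the values from Table 4.
wins_ccode, n_compared = 19, 25
result = binomtest(wins_ccode, n_compared, p=0.5)
print(result.pvalue)   # reject the null at the 0.05 level if < 0.05
```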


Table 4: mean (Mean) and standard deviation (Std.) of the function error values obtained by CCoDE-umbest and CoDE over 25 runs on the CEC2005 benchmark functions at FES = 1.5E+5 and FES = 3.0E+5.