Abstract

Population initialization is a crucial task in population-based optimization methods; it can affect both the convergence speed and the quality of the final solutions. Generally, if no a priori information about the solutions is available, the initial population is selected randomly using random numbers. This paper presents a new initialization method that applies the concept of adaptive randomness (AR) to distribute the individuals as evenly as possible over the search space. To verify the performance of AR, a comprehensive set of 34 benchmark functions with a wide range of dimensionalities is utilized. The conducted experiments demonstrate that AR-based population initialization outperforms other population initialization methods, such as random, opposition-based, and generalized opposition-based population initialization, in both convergence speed and the quality of the final solutions. Further, the influences of the problem dimensionality and of the new control parameter (the number of trial individuals) are also investigated.

1. Introduction

Evolutionary algorithms (EAs) are population-based stochastic optimization algorithms. For each optimization problem, they maintain a set of candidate solutions to play the role of individuals in a population, perform crossover and mutation operations on this set to generate different solutions, and use a fitness function to determine the environment within which the solutions live. In the last decade, EAs have been applied successfully to solve many real-world and benchmark optimization problems. However, as population-based algorithms, EAs such as the genetic algorithm (GA) [1] and differential evolution (DE) [2, 3] all have common drawbacks—long computational time and premature convergence, especially when the solution space is hard to explore.

Since reducing the computation time needed to reach optimal solutions and improving the quality of the final solutions would be beneficial, much effort has already been made. However, most of this work has focused on the introduction and improvement of selection mechanisms, crossover and mutation operators, parameter adjustments, and other hybrid strategies. If no information about the solution is available, the most commonly used method to generate the initial population is random initialization. Little work has been done on population initialization, even though it is a crucial task in EAs that can affect the convergence speed and the quality of the final solution. Maaranen et al. [4] used quasirandom sequences to generate the initial population for GAs. The experimental results showed that their approach could improve the quality of the final solutions, but with no noteworthy improvement in convergence speed. Rahnamayan et al. [5] proposed an opposition-based population initialization method, which achieved a fast convergence speed. Wang et al. [6] presented a population initialization method based on space transformation search (in their follow-up work, this method was renamed generalized opposition-based initialization [7]). Experimental results showed that their approach, when combined with other strategies, outperformed the traditional random initialization and opposition-based initialization.

This paper proposes a new approach for population initialization that employs adaptive randomness (AR) to improve the quality of the final solutions and accelerate the convergence speed. AR initialization is an enhanced version of random initialization: it is simple and easy to implement. The main idea of AR is to exploit the differences between individuals so that they are spread more evenly over the entire search space, yielding a better approximation of the candidate solutions. Although this paper only embeds AR in the population initialization of classical DE, the idea is general enough to be applied to any other EA. Experimental results on 34 well-known benchmark problems show that the proposed approach performs better than random initialization, opposition-based initialization, and generalized opposition-based initialization, both in the quality of the final solutions and in convergence speed.

The remainder of this paper is organized as follows: in Section 2, the concept of AR is briefly explained. In Section 3, the classical DE is briefly reviewed. In Section 4, the proposed AR-based population initialization algorithm is presented. Experimental results are given in Section 5, with a focus on the test functions used, parameter settings, results, and analysis. In Section 6, we conclude the work; all benchmark functions are listed in the appendix.

2. Adaptive Randomness

Traditionally, EAs imitate natural evolution in a population. The population is a set of candidate solutions to an optimization problem, making us consider several solutions at the same time. The population evolves from one generation to another as the individuals are crossbred and mutated until the predefined criteria are satisfied. If no a priori information about the solution is available, the initial population is often selected randomly using random numbers [4]. Obviously, the computation time is directly related to the distance of the random numbers from optimal solutions [5].

In practice, truly random numbers cannot be generated algorithmically; algorithmically generated numbers (usually called pseudorandom numbers) only imitate random numbers. However, it is usually more important that the numbers are as evenly distributed as possible than that they imitate true randomness [4], for an even spread provides much more information about the fitness function. This observation forms the basis of our approach to population initialization, namely, adaptive randomness (AR).

AR slightly modifies random initialization by controlling which individuals may enter the initial population. When adding a new individual to the initial population, AR makes sure that the individual is not too close to any of the individuals already in the initial population.

To achieve this, AR maintains two sets of individuals, that is, the partial initial population P and the set of trial individuals T. Before concentrating on AR-based population initialization, we define the two sets first.

Definition 1. Let P = {X_1, X_2, ..., X_Np} be the initial population of a specific optimization method, where Np is the population size and X_i (i = 1, 2, ..., Np) is the i-th candidate solution in a D-dimensional space. Then the partial initial population is defined by P_k = {X_1, X_2, ..., X_k}, 0 ≤ k ≤ Np, that is, the set of the first k individuals already added.

Definition 2. T = {T_1, T_2, ..., T_m} is the set of trial individuals such that T ∩ P = ∅. Each trial individual T_j (j = 1, 2, ..., m) is randomly chosen from the D-dimensional search space, and m is the predefined number of trial individuals.

Obviously, trial individuals are those individuals that are randomly generated from the search space but have not yet been added into P.

To distribute the individuals in P as evenly as possible, when adding a new individual into P, AR first generates m trial individuals to form T; the trial individual that is farthest away from all individuals in P is then selected and added into P. This process is repeated until the number of individuals in P reaches Np.

In AR the distance between every pair of individuals X = (x_1, ..., x_D) and Y = (y_1, ..., y_D) is calculated by the Euclidean distance as

d(X, Y) = sqrt( sum_{j=1}^{D} (x_j - y_j)^2 ).  (1)

So when adding a new individual into P, the trial individual T* will be chosen such that

min_{i=1,...,|P|} d(T*, X_i) ≥ min_{i=1,...,|P|} d(T_j, X_i) for all T_j ∈ T,  (2)

where |P| counts the number of individuals in P and min returns the minimum value in a set of values. The rationale for using (2) is to evenly distribute the individuals in the initial population by maximizing the minimum distance between the selected trial individual and the individuals already in P.
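As an illustration, this maximin selection rule can be sketched in a few lines of Python (the function names and the plain-list representation of individuals are our own illustrative choices, not part of the paper's implementation):

```python
import math

def euclidean(x, y):
    """Euclidean distance between two D-dimensional points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def select_trial(population, trials):
    """Pick the trial individual whose minimum distance to the
    partial initial population is largest (the maximin rule)."""
    return max(trials, key=lambda t: min(euclidean(t, x) for x in population))
```

For example, given a partial population clustered near the origin, `select_trial` favors the trial point farthest from its nearest neighbor in that population.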

Figure 1 gives a simple example of how the trial individual is selected to update P. Assume that P already has two individuals (i.e., X_1 and X_2) and that we generate T with two (suppose that m = 2) trial individuals (i.e., T_1 and T_2). According to (1), we can obtain d(T_1, X_1), d(T_1, X_2), d(T_2, X_1), and d(T_2, X_2), and hence min{d(T_1, X_1), d(T_1, X_2)} and min{d(T_2, X_1), d(T_2, X_2)}. The trial individual with the larger of these two minimum distances will be added into P.

Before introducing the AR-based population initialization algorithm, the classical DE is briefly reviewed in the following section.

3. Brief Review of the Classical DE

DE, first proposed by Storn and Price in [2, 3], is a population-based stochastic optimization algorithm and has been successfully applied to both benchmark test functions and real-world applications. It is simple yet effective and robust, and numerous experimental studies have shown that it performs better than many other EAs.

The proposed algorithm is also based on this DE scheme. Let us assume that X_i^G (i = 1, 2, ..., Np) is the i-th individual in population P^G, where Np is the population size, G is the generation index, and P^G is the population in the G-th generation. The main idea of DE is to generate trial vectors: mutation and crossover are used to produce new trial vectors, and selection determines which of the vectors will survive into the next generation.

For classical DE (DE/rand/1/bin), the mutation, crossover, and selection operators can be defined as follows.

Mutation. For each vector X_i^G in generation G, a mutant vector V_i^G is defined by

V_i^G = X_{r1}^G + F * (X_{r2}^G - X_{r3}^G),

where i = 1, 2, ..., Np and r1, r2, and r3 are randomly selected integer indices from {1, 2, ..., Np}. Further, r1 ≠ r2 ≠ r3 ≠ i, so the population size satisfies Np ≥ 4. F is a real number which determines the amplification of the differential variation (X_{r2}^G - X_{r3}^G). Larger values of F result in higher diversity in the generated population and lower values in faster convergence.

Crossover. As many other EAs do, DE also employs a crossover operation to increase the diversity of the population and generate the trial vectors. The trial vector can be defined as U_i^G = (u_{i,1}^G, u_{i,2}^G, ..., u_{i,D}^G), where D is the problem dimension. The classical DE uses the DE/rand/1/bin scheme to generate the trial vector:

u_{i,j}^G = v_{i,j}^G if rand_j ≤ CR or j = j_rand; otherwise u_{i,j}^G = x_{i,j}^G,

where CR is the predefined crossover probability, rand_j is a random number within [0, 1] for the j-th dimension, and j_rand is a random parameter index that ensures the trial vector differs from X_i^G in at least one dimension.

Selection. A greedy selection mechanism is used:

X_i^{G+1} = U_i^G if f(U_i^G) < f(X_i^G); otherwise X_i^{G+1} = X_i^G,

where f is the fitness function. Without loss of generality, this paper only considers minimization problems. If, and only if, the trial vector U_i^G is better than X_i^G (i.e., f(U_i^G) < f(X_i^G)), X_i^{G+1} is set to U_i^G; otherwise, X_i^G is carried over; that is, X_i^{G+1} = X_i^G. Hence the population either gets better or remains the same with respect to the fitness function but never deteriorates.
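To make the three operators concrete, the following Python sketch performs one generation of DE/rand/1/bin (a minimal illustration under our own naming; the defaults F = 0.5 and CR = 0.9 match the settings used later in the paper):

```python
import random

def de_generation(pop, f_obj, F=0.5, CR=0.9):
    """One generation of classical DE (DE/rand/1/bin):
    mutation, binomial crossover, and greedy selection."""
    Np, D = len(pop), len(pop[0])
    new_pop = []
    for i in range(Np):
        # three mutually distinct indices, all different from i (requires Np >= 4)
        r1, r2, r3 = random.sample([j for j in range(Np) if j != i], 3)
        mutant = [pop[r1][j] + F * (pop[r2][j] - pop[r3][j]) for j in range(D)]
        j_rand = random.randrange(D)  # at least one mutant component survives
        trial = [mutant[j] if random.random() <= CR or j == j_rand else pop[i][j]
                 for j in range(D)]
        # greedy selection: keep the trial vector only if it is no worse
        new_pop.append(trial if f_obj(trial) <= f_obj(pop[i]) else pop[i])
    return new_pop
```

Because the selection step is greedy, the best fitness in the population is non-increasing from one generation to the next.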

Though there are many variants of DE [2, 3], to maintain a general comparison, this paper only uses the classical DE in the conducted experiments to demonstrate the improvement of the convergence speed and the quality of the final solutions by using AR-based population initialization.

4. The Proposed AR-Based Population Initialization Algorithm

For a specific optimization problem, when a priori information about the solutions is lacking, the initial population is usually created using random numbers. AR makes full use of distance information during population initialization: by applying the AR strategy, we can distribute the individuals as evenly as possible over the search space and obtain a better approximation of the candidate solutions. So instead of using pure random initialization, we propose the following AR-based population initialization algorithm (see Algorithm 1).

(1)  Set P = ∅, T = ∅, k = 0
(2)  Randomly generate an individual X from the D-dimensional search space
(3)  P = P ∪ {X}
(4)  k++
(5)  while k < Np do
(6)    Randomly initialize T with m trial individuals
(7)    Select the trial individual T* such that for all T_j ∈ T:
       min_{X ∈ P} d(T*, X) ≥ min_{X ∈ P} d(T_j, X)
(8)    P = P ∪ {T*}
(9)    k++
(10)   Set T = ∅
(11) end while
(12) return the final P as the initial population

As shown in Algorithm 1, P is initially empty and the first individual of P is randomly chosen from the search space. During population initialization, P is incrementally updated with the individuals selected from T until the number of individuals in P reaches the population size Np.
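A minimal Python sketch of this initialization procedure might look as follows (the function name, the bounds representation, and the default m = 3 are our own illustrative choices, not the paper's implementation):

```python
import math
import random

def ar_initialize(Np, bounds, m=3):
    """AR-based population initialization.
    bounds is a list of (low, high) pairs, one per dimension;
    m is the number of trial individuals generated per step."""
    rand_point = lambda: [random.uniform(lo, hi) for lo, hi in bounds]
    dist = lambda x, y: math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    P = [rand_point()]                     # first individual is purely random
    while len(P) < Np:
        trials = [rand_point() for _ in range(m)]
        # maximin rule: the trial farthest from its nearest neighbour in P wins
        best = max(trials, key=lambda t: min(dist(t, x) for x in P))
        P.append(best)
    return P
```

The returned list can then be handed to any population-based optimizer in place of a purely random initial population.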

The flowcharts of DE with random population initialization, opposition-based population initialization, generalized opposition-based population initialization, and AR-based population initialization are shown in Figure 2.

AR-based population initialization will be embedded in the classical DE in Section 5 to show its effectiveness in the improvement of the convergence speed and the quality of the final solutions.

5. Empirical Study

To investigate the effectiveness of the proposed AR-based population initialization algorithm in improving the convergence speed and the final solution quality, we embedded it in the classical DE and conducted controlled experiments. Our experiments were carried out on a PC at 2.3 GHz with 2 GB of RAM.

In the following subsections, we provide details on the test functions of our study (Section 5.1), parameter settings (Section 5.2), and our experiment results and analysis (Section 5.3).

5.1. Test Functions

In order to compare the convergence speed and the quality of the final solutions of DE with random population initialization (DEr), DE with opposition-based population initialization (DEo), DE with generalized opposition-based population initialization (DEgo), and DE with AR-based population initialization (DEar), a comprehensive test set with 34 numerical benchmark functions is employed. The test set includes well-known unimodal as well as highly multimodal minimization problems [8, 9]. The definition, the range of the search space, and the global optimum(s) for each function are given in the appendix. The dimensionality of these problems varies from 2 to 100, covering a wide range of problem complexity.

5.2. Parameter Settings

For all the conducted experiments, the parameters of the classical DE, namely, the population size (Np), the differential amplification factor (F), and the crossover probability constant (CR), are fixed to 100, 0.5, and 0.9, respectively, and the maximum number of function calls is fixed to the same budget for all algorithms, unless a change is mentioned. This setting follows the suggestions given in the literature (e.g., [10-12]). The parameter m in AR-based population initialization is set to 3 unless a change is mentioned.

5.3. Results and Analysis

The experiments are categorized as follows. In Section 5.3.1, DEr, DEo, DEgo, and DEar are compared in terms of convergence speed and robustness. In Section 5.3.2, DEr, DEo, DEgo, and DEar are compared in terms of the quality of the final solutions. In Section 5.3.3, the effect of problem dimensionality is investigated. In Section 5.3.4, the effect of the parameter m is studied. All the experiments are conducted 1,000 times with different random seeds, and the average results over the optimization runs are recorded. It should be noted that, in the experiments, we found that for a small number of runs, the recorded numbers of function calls and the final solutions are not stable.

5.3.1. Comparison of DEr, DEo, DEgo, and DEar in Terms of Convergence Speed and Robustness

Following the suggestions in the literature (e.g., [5, 6, 9]), we compare the convergence speed of DEr, DEo, DEgo, and DEar by measuring the number of function calls (NFC). For each optimization problem, the NFC is recorded when a specific algorithm reduces the best value below the value-to-reach (VTR) before exhausting the maximum number of function calls. In order to minimize the effect of the stochastic nature of the algorithms, all the reported NFCs are averaged over 1,000 independent trials. Obviously, a smaller NFC means a higher convergence speed. In order to compare the convergence speed between two specific algorithms, we introduce another metric, the acceleration rate (ARE), which is defined as

ARE = NFC_algA / NFC_algB,

where NFC_algA and NFC_algB are the NFCs for the two algorithms algA and algB (both chosen from {DEr, DEo, DEgo, DEar}). So ARE > 1 means that algB is faster. The same VTR is used for all benchmark functions; the same setting has been used in the literature (e.g., [13, 14]).

We also compare the robustness of DEr, DEo, DEgo, and DEar by measuring the success rate (SR) [13]. In the current work, a successful run means that a specific algorithm reaches the VTR for the given test function within the allowed maximum number of function calls. So SR can be calculated as

SR = (number of successful runs) / (total number of runs).

SR is a commonly used metric to characterize the robustness of a specific algorithm; that is, a larger SR means that the algorithm is more robust.

Further, the average NFC, the average SR, and the average ARE over the test functions are calculated as the arithmetic means of the per-function NFC, SR, and ARE values, respectively.
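As a sketch, the SR and ARE bookkeeping can be expressed as follows (the run-record representation, with None marking a failed run, is a hypothetical choice of ours, not the paper's):

```python
def success_rate(nfcs, max_nfc):
    """SR: fraction of runs that reached the VTR within the budget.
    A run is recorded as None when it failed to reach the VTR."""
    ok = [n for n in nfcs if n is not None and n <= max_nfc]
    return len(ok) / len(nfcs)

def acceleration_rate(nfc_a, nfc_b):
    """ARE = NFC_algA / NFC_algB; a value > 1 means algB converged faster."""
    return nfc_a / nfc_b
```

Averaging these per-function values over the test set then yields the summary statistics reported in the last row of Table 1.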

Table 1 summarizes the numerical results when solving the 34 benchmark functions listed in the appendix. The best NFC result for each function is highlighted in boldface, and the average NFC, SR, and ARE are shown in the last row. Since comparing algorithms with different SR values seems meaningless, the reported average values are calculated only on the functions where all the algorithms have the same success rate.

As seen, DEar outperforms DEr on 22 test functions (about 64.7% of the problems). Though DEr is faster than DEar on 4 functions, its SR is worse; except for one of them, the remaining 3 functions have a low dimensionality. DEar outperforms DEo on 22 test functions (about 64.7% of the problems), while DEo surpasses DEar only on 4 functions; though DEo is faster than DEar on 2 of them, its SR is worse. DEar outperforms DEgo on 24 test functions (about 70.6% of the problems), while DEgo surpasses DEar only on 2 functions; though DEgo is faster than DEar on one of them, its SR is worse. All the algorithms fail to solve 6 functions. On 2 further functions, only DEar can solve them (with a very small SR), while the other three algorithms all fail. The average ARE between DEr and DEar is 1.0116, which means that DEar is on average 1.16% faster than DEr. Similarly, DEar is on average 1.52% faster than DEo and on average 1.38% faster than DEgo.

So we can conclude that DEar shows better convergence speed than the other 3 algorithms under the same parameter settings and the same maximum number of function calls. Some sample bar charts for the performance comparison of the 4 algorithms are given in Figure 3.

5.3.2. Comparison of DEr, DEo, DEgo, and DEar in Terms of the Quality of Final Solutions

In this section, DEar is compared with DEr, DEo, and DEgo with respect to the quality of the final solutions. All the experiments were conducted 1,000 times, and the mean function error value and standard deviation of the results are recorded. The results of the 4 DE algorithms on the 34 test problems are presented in Tables 2 and 3, where "Mean" indicates the mean function error value and "Std. Dev." stands for the standard deviation. The best results among the 4 DE algorithms are shown in boldface.

From the results, it can be seen that DEar achieves better results than DEr, DEo, and DEgo on 24 test functions (about 70.6% of the test functions). DEgo achieves the same performance as DEar on one function, on which both are much better than the other algorithms. For the remaining 9 functions, all the algorithms achieve the same results.

To compare the performance of multiple algorithms on the test suite, the average-ranking Friedman test is conducted, following the suggestions in [7, 15]. Table 4 shows the average rankings of the 4 DE algorithms over the test functions; DEar and DEo obtain the best and worst average rankings among the four algorithms, respectively. So, as seen, although opposition-based population initialization can accelerate the convergence speed on some test problems, when compared with DEr, it cannot improve the quality of the final solutions. Hence, if we want to obtain a high-quality solution, opposition-based population initialization should not be used alone.

To investigate the significant differences between the behaviors of each pair of algorithms, we conduct four post hoc tests, namely, Nemenyi's, Holm's, Shaffer's, and Bergmann-Hommel's [7, 15]. For each test, we calculate the adjusted p values on pairwise comparisons of all algorithms; Table 5 shows the resulting adjusted p values. Under the null hypothesis, the two algorithms are equivalent; if the null hypothesis is rejected, then the performances of the two algorithms are significantly different. In this paper, we only discuss whether a hypothesis is rejected at the 0.05 level of significance. As we can see, all four tests reject the corresponding hypotheses.

Besides the above four tests, we also conduct Wilcoxon's test to recognize significant differences between the behaviors of each pair of algorithms [7, 15]. Table 6 shows the p values of applying Wilcoxon's test between DEar and the other three DE algorithms. The p values below 0.05 (the significance level) are shown in boldface. From the results, it can be seen that DEar is significantly better than DEr, DEo, and DEgo.

5.3.3. Scalability Test: Effect of Problem Dimensionality

The performance of most EAs (including DE) deteriorates quickly as the dimensionality of the problem grows. The main reason is that, in general, the complexity of the problem (search space) increases exponentially with its dimension. Here, we conduct a scalability test of DEar at three increasing dimensions for each scalable function in our test set. Table 7 summarizes the comparison results of the four DE algorithms at these dimensions. From the results, it can be seen that DEar is not always affected by the growth of dimensionality. For 9 of the scalable functions, DEar achieves similar performance as the dimension increases. The performance of DEar deteriorates quickly with the growth of dimension for 5 functions, and for the remaining function, the growth of dimension does not affect the performance of DEar.

5.3.4. Effect of Different m Settings

In DEar, a new control parameter m (the number of trial individuals) is added to DE's parameters (Np, F, and CR). Since m denotes the number of trial individuals, m should be a positive integer, and when m = 1, DEar is equal to DEr. So in our study we restrict m to positive integers within a bounded range. As mentioned above, m was fixed to 3 in all experiments. This value was set without any effort to find an optimal one; however, the performance of DEar may be influenced by different settings of m.

We investigate the correlation between m and the quality of the final solutions using the Spearman correlation [16]. We repeat the experiments of Section 5.3.3 for increasing values of m with a step size of 10 (i.e., 1,000 trials per function per m value). Owing to space limitations, we do not show all the results; only the final solutions obtained on two representative functions are shown in Table 8 for illustration. But almost identical behavior has been observed for all functions: the quality of the final solution is better than that of DEr, DEo, and DEgo whenever m > 1.

Table 9 shows the Spearman correlation test results between m and the final solutions obtained on each test function. As seen, there is no significant correlation between m and the quality of the final solutions. It means that m, like the other control parameters of DE, has a problem-oriented optimal value. Since the larger m is, the more time is required to initialize the population, especially when the dimensionality of the problem is also large, our limited experiments suggest using a small value of m.

6. Conclusions

This paper employs the concept of adaptive randomness (AR) for population initialization. The main idea of AR is to exploit the differences between individuals so that they are spread more evenly over the search space, yielding a better approximation of the candidate solutions. In order to investigate the performance of AR-based population initialization, the classical DE has been utilized; by embedding AR within DE, DEar was proposed. Experiments are conducted on 34 benchmark functions. The experimental results can be summarized as follows.

(i) DEar is compared with other DE algorithms, namely, DE with random initialization (DEr), DE with opposition-based initialization (DEo), and DE with generalized opposition-based initialization (DEgo), with respect to convergence speed and robustness. The results demonstrate that DEar performs better than the other three DE algorithms on at least 64.7% of the test functions. Although the other three DE algorithms outperform DEar on some functions, their success rates are always worse.

(ii) DEar is further compared with DEr, DEo, and DEgo with respect to the quality of the final solutions. The results show that DEar performs better than the other three DE algorithms on the majority (about 70.6%) of the test functions, and on the rest of the functions, DEar obtains results no worse than those of the other three DE algorithms. Statistical comparisons also show that DEar is the best of the four DE algorithms.

(iii) A scalability test of DEar over 15 test functions with three different problem dimensions is conducted. The 15 functions are scalable and chosen from the 34 test functions. The results show that DEar is not always affected by the growth of dimensionality: for 9 functions, DEar achieves similar performance as the dimension increases; for 5 functions, the performance of DEar deteriorates quickly with the growth of dimension; and for the remaining function, the growth of dimension does not affect the performance of DEar.

(iv) The influence of m (the number of trial individuals) was studied by investigating the Spearman correlation between m and the quality of the final solutions. The results obtained on the 34 test functions show that there is no significant correlation between m and the quality of the final solutions, but the quality of the final solution is better than that of the other three DE algorithms whenever m > 1.

The main motivation of the current work was the introduction of the concept of adaptive randomness for population initialization. Although this paper only embeds AR within classical DE, the idea is general enough to be applied to all other population-based methods (e.g., GA and PSO). Further, since AR is new, additional studies are required to investigate its benefits, weaknesses, and limitations. The current work can be considered a first step in applying AR.

Appendix

List of Benchmark Functions

The 34 test functions we employed are given below. All the functions used in this paper are to be minimized.

(1) Sphere Model. Consider

f1(x) = sum_{i=1}^{D} x_i^2,

where x = (x_1, ..., x_D) and the global optimum is 0 at x = (0, ..., 0). f1 is a unimodal, scalable, convex, and easy function.

(2) Axis Parallel Hyperellipsoid. Consider

f2(x) = sum_{i=1}^{D} i * x_i^2,

where the global optimum is 0 at x = (0, ..., 0). f2 is a unimodal, scalable, convex, and easy function.

(3) Schwefel’s Problem 1.2. Consider

f3(x) = sum_{i=1}^{D} ( sum_{j=1}^{i} x_j )^2,

where the global optimum is 0 at x = (0, ..., 0). f3 is a unimodal and scalable function.

(4) Rosenbrock’s Valley. Consider

f4(x) = sum_{i=1}^{D-1} [ 100 (x_{i+1} - x_i^2)^2 + (1 - x_i)^2 ],

where the global optimum is 0 at x = (1, ..., 1). f4 is a nonconvex unimodal function. Its optimum is inside a long, narrow, parabolic-shaped flat valley.

(5) Rastrigin’s Function. Consider

f5(x) = sum_{i=1}^{D} [ x_i^2 - 10 cos(2*pi*x_i) + 10 ],

where the global optimum is 0 at x = (0, ..., 0). f5 is highly multimodal.

(6) Griewangk’s Function. Consider

f6(x) = (1/4000) sum_{i=1}^{D} x_i^2 - prod_{i=1}^{D} cos(x_i / sqrt(i)) + 1,

where the global optimum is 0 at x = (0, ..., 0).
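As an illustration, the usual definitions of the two multimodal benchmarks above (Rastrigin and Griewangk) can be written in Python as follows (our rendering of the standard forms; the paper's exact variants may differ in scaling or search range):

```python
import math

def rastrigin(x):
    """Rastrigin: highly multimodal; minimum 0 at the origin."""
    return sum(v * v - 10.0 * math.cos(2.0 * math.pi * v) + 10.0 for v in x)

def griewank(x):
    """Griewangk: many regularly spaced local minima; minimum 0 at the origin."""
    s = sum(v * v for v in x) / 4000.0
    p = math.prod(math.cos(v / math.sqrt(i)) for i, v in enumerate(x, start=1))
    return s - p + 1.0
```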

(7) Sum of Different Powers. Consider

f7(x) = sum_{i=1}^{D} |x_i|^(i+1),

where the global optimum is 0 at x = (0, ..., 0). f7 is a unimodal and scalable function.

(8) Ackley’s Problem. Consider

f8(x) = -20 exp( -0.2 sqrt( (1/D) sum_{i=1}^{D} x_i^2 ) ) - exp( (1/D) sum_{i=1}^{D} cos(2*pi*x_i) ) + 20 + e,

where the global optimum is 0 at x = (0, ..., 0).

(9) Beale’s Function. Consider

f9(x) = (1.5 - x_1 + x_1 x_2)^2 + (2.25 - x_1 + x_1 x_2^2)^2 + (2.625 - x_1 + x_1 x_2^3)^2,

where the global optimum is 0 at (x_1, x_2) = (3, 0.5).

(10) Colville’s Function. Consider

f10(x) = 100 (x_2 - x_1^2)^2 + (1 - x_1)^2 + 90 (x_4 - x_3^2)^2 + (1 - x_3)^2 + 10.1 [ (x_2 - 1)^2 + (x_4 - 1)^2 ] + 19.8 (x_2 - 1)(x_4 - 1),

where the global optimum is 0 at x = (1, 1, 1, 1).

(11) Easom’s Function. Consider

f11(x) = -cos(x_1) cos(x_2) exp( -((x_1 - pi)^2 + (x_2 - pi)^2) ),

where the global optimum is attained at (x_1, x_2) = (pi, pi). f11 is unimodal and its global minimum lies in a narrow area relative to the search space.

(12) Six-Hump Camel Back Function. Consider

f12(x) = 4 x_1^2 - 2.1 x_1^4 + (1/3) x_1^6 + x_1 x_2 - 4 x_2^2 + 4 x_2^4,

where the global optimum is attained at (0.0898, -0.7126) and (-0.0898, 0.7126). f12 has six local minima, two of them global.

(13) Levy’s Function. Consider Levy’s function, whose global optimum is 0; the function has a large number of local minima.

(14) Matyas Function. Consider

f14(x) = 0.26 (x_1^2 + x_2^2) - 0.48 x_1 x_2,

where the global optimum is 0 at (x_1, x_2) = (0, 0). f14 is unimodal.

(15) Perm Function. Consider the Perm function, whose global optimum is 0. f15 is unimodal.

(16) Michalewicz’s Function. Consider Michalewicz’s function, whose global optimum is 0.

(17) Zakharov’s Function. Consider

f17(x) = sum_{i=1}^{D} x_i^2 + ( sum_{i=1}^{D} 0.5 i x_i )^2 + ( sum_{i=1}^{D} 0.5 i x_i )^4,

where the global optimum is 0 at x = (0, ..., 0). f17 is unimodal.

(18) Schwefel’s Problem 2.22. Consider

f18(x) = sum_{i=1}^{D} |x_i| + prod_{i=1}^{D} |x_i|,

where the global optimum is 0 at x = (0, ..., 0). f18 is unimodal.

(19) Schwefel’s Problem 2.21. Consider

f19(x) = max_{i=1,...,D} |x_i|,

where the global optimum is 0 at x = (0, ..., 0). f19 is unimodal.

(20) Step Function. Consider

f20(x) = sum_{i=1}^{D} ( floor(x_i + 0.5) )^2,

where the global optimum is 0 when |x_i| < 0.5 for all i.

(21) Noisy Quartic Function. Consider

f21(x) = sum_{i=1}^{D} i * x_i^4 + rand[0, 1),

where the global optimum is 0 at x = (0, ..., 0) (in the absence of noise).

(22) Kowalik’s Function. Consider Kowalik’s function with coefficients α = [0.1957, 0.1947, 0.1735, 0.1600, 0.0844, 0.0627, 0.0456, 0.0342, 0.0323, 0.0235, 0.0246], where the global optimum is 0.0003075. f22 is multimodal.

(23) Shekel 5 Problem. Consider the Shekel function with 5 terms, whose global optimum is 0.

(24) Shekel 7 Problem. Consider the Shekel function with 7 terms, whose global optimum is 0.

(25) Shekel 10 Problem. Consider the Shekel function with 10 terms, whose global optimum is 0.

(26) Tripod Function. Consider the two-dimensional tripod function, defined piecewise via the sign function, whose global optimum is 0.

(27) 4th De Jong. Consider

f27(x) = sum_{i=1}^{D} i * x_i^4,

where the global optimum is 0 at x = (0, ..., 0).

(28) Alpine Function. Consider

f28(x) = sum_{i=1}^{D} | x_i sin(x_i) + 0.1 x_i |,

where the global optimum is 0 at x = (0, ..., 0). f28 is multimodal and not symmetrical.

(29) Schaffer’s Function 6. Consider

f29(x) = 0.5 + ( sin^2( sqrt(x_1^2 + x_2^2) ) - 0.5 ) / ( 1 + 0.001 (x_1^2 + x_2^2) )^2,

where the global optimum is 0 at (x_1, x_2) = (0, 0). f29 is multimodal.

(30) Pathological Function. Consider the pathological function, whose global optimum is 0 at x = (0, ..., 0). f30 is multimodal and extremely complex.

(31) Inverted Cosine Wave Function. Consider the inverted cosine wave function, whose global optimum is attained at x = (0, ..., 0). f31 is multimodal.

(32) Aluffi-Pentini’s Problem. Consider Aluffi-Pentini’s two-dimensional problem, whose global optimum is 0.

(33) Becker and Lago Problem. Consider

f33(x) = ( |x_1| - 5 )^2 + ( |x_2| - 5 )^2,

where the global optimum is 0 at (x_1, x_2) = (±5, ±5).

(34) Bohachevsky 1 Problem. Consider

f34(x) = x_1^2 + 2 x_2^2 - 0.3 cos(3*pi*x_1) - 0.4 cos(4*pi*x_2) + 0.7,

where the global optimum is 0 at (x_1, x_2) = (0, 0).

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (nos. 61202048 and 61202200), the Commonwealth Project of Science and Technology Department of Zhejiang Province (no. 2014C23008), the Zhejiang Provincial Nature Science Foundation of China (no. LY13F020010), and the Open Foundation of State Key Laboratory of Software Engineering of Wuhan University of China (no. SKLSE-2012-09-21).