Abstract
In evolutionary algorithms, genetic operators iteratively generate new offspring, which constitute a potentially valuable search history. To boost the performance of offspring generation in the real-coded genetic algorithm (RCGA), we propose to exploit the search history cached so far in an online manner during the iteration. Specifically, survivor individuals over the past few generations are collected and stored in an archive to form the search history. We introduce a simple yet effective crossover model driven by the search history (abbreviated as SHX). In particular, the search history is clustered, and each cluster is assigned a score for SHX. In essence, the proposed SHX is a data-driven method which exploits the search history to perform offspring selection after offspring generation. Since no additional fitness evaluations are needed, SHX is favorable for tasks with a limited budget or expensive fitness evaluations. We experimentally verify the effectiveness of SHX over 15 benchmark functions. Quantitative results show that SHX can significantly enhance the performance of RCGA in terms of both accuracy and convergence speed. Also, the induced additional runtime is negligible compared to the total processing time.
1. Introduction
Evolutionary algorithms (EAs) have been shown, both theoretically [1–3] and practically [4–6], to be generic and effective in searching for global optima in complex search spaces. The exploration process of EAs imitates natural selection, which is realized by conducting offspring generation and survivor selection alternately and iteratively. The population quality is gradually improved throughout the exploration process, which can be viewed as a stochastic population-based generation-and-test process. Because of the offspring generation, a large number of candidate solutions (i.e., individuals) are sampled, accompanied by their fitness values, genetic information, and genealogy information. Such accumulated search data constitute a search history which can be very informative and valuable for boosting the overall performance. For instance, exploiting the search history can improve the search procedure under a limited budget of fitness evaluations (FEs), that is, when no additional FEs are allowed for improving the search performance. Also, the computational cost of a single FE can be high when the fitness function is complicated. To obtain a better solution without increasing the number of FEs, the way the search history is exploited truly matters. Nevertheless, search history has been only sparsely exploited and studied in existing methods.
The real-coded genetic algorithm (RCGA) has been widely studied in the past decades [7–11], and the main efforts for improving its performance have focused on the development of crossover techniques [12]. Because the crossover operator generates new offspring from the current population, the quality of the new solutions directly affects the evolution direction and convergence speed. Depending on the mechanism, crossover methods can differ in (1) parent selection, (2) offspring generation, and (3) offspring selection. Both the parents and the offspring can number more than two, depending on the design. The above-mentioned three aspects link the exploration ability with the exploitation ability, and the degree of and balance between the two largely affect the performance [13]. Although the self-adaptive feature of RCGA [14] can adjust this relationship to a certain extent, the "best" degrees and balance between exploration and exploitation for achieving a satisfactory solution can differ greatly across problem settings and can hardly be achieved with the adaptive feature alone.
With a large amount of search history data up to the current generation in hand, we introduce in this paper a crossover method that effectively exploits the history data. First, an archive is defined to collect the survivor individuals over generations as the search history. Then, the stored individuals are clustered by k-means [15], and each cluster is assigned a score depending on the number of individuals belonging to it. Finally, offspring are generated and selected according to the scores. We introduce two different schemes to update the archive. The proposed crossover operator, named search history-driven crossover (SHX), generates offspring by considering the cluster scores. Since SHX provides an offspring selection mechanism, any existing parent selection and offspring generation mechanisms can be easily integrated with it. To our knowledge, this is the first work to design a crossover model by effectively exploiting search history. We present a set of experiments to systematically evaluate the effectiveness of the proposed method using 15 benchmark functions. Three conventional crossover operators are employed, and the results with and without SHX are compared. In addition, the two archive update methods are analyzed.
The main technical contributions of this paper are threefold. First, we propose a novel crossover model by effectively exploiting the search history. Second, we introduce offspring selection based on the clusters calculated from the search history. Third, we introduce two schemes to update the survivor archive. A preliminary version of this paper appears in GECCO 2020 [16].
2. Related Work
Crossover is one of the principal operators for generating offspring and is closely related to the performance of the real-coded genetic algorithm (RCGA). Blend crossover (BLX-α) [17], proposed by Eshelman and Schaffer, is one of the most popular operators. Offspring genes are independently and uniformly sampled within an interval between a gene pair of the parents. The parameter α corresponds to the extension of the sampling interval, which plays a key role in maintaining the diversity of offspring. Eshelman et al. proposed Blend-α-β crossover (BLX-α-β) [18], which involves two extension parameters. Deb and Agrawal introduced simulated binary crossover (SBX) [19], which simulates the single-point crossover of binary-coded GA in a continuous search space. The interval used in SBX is determined by a polynomial probability distribution depending on the distribution index η, which indirectly adjusts the tendency of offspring generation. The above crossover operators share the feature that the offspring genes are drawn according to a certain probability distribution from a predefined interval on the parent genes. This feature enables better results than using crossover operators for binary coding in the continuous search space. On the other hand, some crossover operators take more than two individuals as parents, aiming to generate offspring with well-preserved population statistics. In the case of unimodal normal distribution crossover (UNDX) [20], the generation of offspring follows a unimodal normal distribution defined on the line connecting two of the three parents. For simplex crossover (SPX) [21], n + 1 individuals are taken as parents in the n-dimensional search space. SPX uniformly generates offspring within the n-dimensional simplex constructed by the parent individuals and expanded by a parameter ε.
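To make the interval-based sampling of blend crossover concrete, the following is a minimal sketch of the commonly described formulation, in which each gene interval is extended on both sides by a fraction of its width. The function name `blx_alpha`, the symbol `alpha` for the extension parameter, and the default value 0.5 are assumptions made for illustration; they are not taken from this paper's settings.

```python
import random


def blx_alpha(parent_a, parent_b, alpha=0.5, rng=random):
    """Generate one child by blend crossover (illustrative sketch).

    Each gene is sampled uniformly from the interval spanned by the two
    parent genes, extended on both sides by alpha times its width.
    alpha=0.5 is an assumed default, not a value from the paper.
    """
    child = []
    for x, y in zip(parent_a, parent_b):
        lo, hi = min(x, y), max(x, y)
        extent = alpha * (hi - lo)  # extension added on each side
        child.append(rng.uniform(lo - extent, hi + extent))
    return child


if __name__ == "__main__":
    print(blx_alpha([0.0, 2.0, -1.0], [1.0, 2.0, 3.0]))
```

Because every gene is sampled independently, the operator has no notion of correlation between parameters, which is the parameter-wise behavior discussed later for nonseparable functions.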
Search history has also been exploited in some research, but to the best of our knowledge, none of it aims at improving the crossover model. Since online real systems often provide uncertain evaluation values which lead to unreliable convergence of GA, Sano and Kita proposed the memory-based fitness estimation GA (MFEGA) [22]. MFEGA estimates the fitness from neighboring individuals stored in the search history. Leveraging search history allows estimation without requiring additional evaluation. Amor and Rettinger proposed GA using self-organizing maps (GASOM) [23]. Self-organizing maps (SOM) can provide a visualized search history, which makes the explored regions intuitive for users. Moreover, individual novelty is introduced through the activation frequency in the search history table and utilized by the reseeding operator to preserve the exploration power. Yuen and Chow presented the continuous non-revisiting GA (cNrGA) [24]. A binary partitioning tree called a density tree stores all evaluated individuals and divides the search space into non-overlapping partitions by means of distributions. These subregions are used to check whether a new individual needs to be evaluated or not.
3. Overview
Principles of designing good crossover operators for RCGA are discussed in [25]. Two among them are especially important: (1) the crossover operator should preserve the statistics of the population; (2) the crossover operator should generate offspring with as much diversity as possible under the constraint of (1). By following these suggestions, the key idea of SHX is to cluster the search history and select population members from excessively generated candidate solutions by preserving the statistics represented by the clusters. Figure 1 illustrates the overview of our SHX. The proposed method is performed under the framework of RCGA which mainly involves survivor selection and crossover. Mutation is optional, but we exclude it to clearly investigate the effectiveness of SHX in this work.
The proposed method is described in Algorithm 1. The population comprises a fixed number of individuals and is indexed by the generation number. Similarly, separate sets are maintained for the parents for SHX, the excessively generated candidate solutions during SHX, the offspring after SHX, and the survivors for the next generation, and each set has its own size parameter. In addition to the population, our method manages an archive which preserves survivors throughout the generation alternation. The population and the archive are initialized by randomly placing individuals in the search space. The archive update process is conducted after the survivor selection. Survivor individuals of the current generation are aggregated into both the population and the archive of the next generation. SHX can be further divided into parent selection, offspring generation, and offspring selection. Different from conventional RCGA, individuals generated from the parents are regarded as offspring candidates. The main purpose of SHX is to narrow down the candidates to the offspring according to the statistics provided by the archive. These statistics are calculated from the clustering result of the archive and directly impact the offspring selection.

SHX can adopt any existing crossover operator (e.g., BLX [17] and SPX [21]) for the offspringGeneration function (Algorithm 1, line 8) to generate the candidates from the parents. For the parentSelection function (Algorithm 1, line 6) and the survivorSelection function (Algorithm 1, line 11), the just generation gap (JGG) [26, 27] is employed in this work. That is, the parentSelection function randomly extracts the parents from the population, and the survivorSelection function selects the top individuals among the offspring as survivors according to the fitness value. To show the performance increase brought by SHX, we choose the widely applied BLX, SPX, and UNDX for the offspring generation and compare the results in Section 6. We explain archiveUpdate (Algorithm 1, lines 3 and 13) and offspringSelection (Algorithm 1, line 9) in detail in Section 4 and Section 5, respectively.
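The two JGG-style steps described above can be sketched as follows. This is a simplified illustration under stated assumptions: the function names `parent_selection` and `survivor_selection` are hypothetical Python counterparts of parentSelection and survivorSelection, the parents are removed from the population when they are extracted, and fitness is minimized.

```python
import random


def parent_selection(population, n_parents, rng=random):
    """Randomly extract n_parents individuals from the population.

    Returns the extracted parents and the remaining individuals.
    Removing the parents from the population is an assumption here.
    """
    picked = set(rng.sample(range(len(population)), n_parents))
    parents = [population[i] for i in sorted(picked)]
    rest = [ind for i, ind in enumerate(population) if i not in picked]
    return parents, rest


def survivor_selection(offspring, fitness, n_survivors):
    """Keep the n_survivors best offspring, assuming minimization."""
    return sorted(offspring, key=fitness)[:n_survivors]


if __name__ == "__main__":
    pop = [[float(i)] for i in range(10)]
    parents, rest = parent_selection(pop, 3)
    print(parents, len(rest))
    print(survivor_selection([[3.0], [-1.0], [0.5]], lambda x: x[0] ** 2, 2))
```

In this sketch the survivors would then be merged with `rest` to form the next population; how the merge is done in the authors' implementation is not specified here.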
4. Survivor Archive
Since the genetic operations are run alternately and iteratively, collecting and analyzing the history data may be beneficial for boosting performance. Given that SHX aims to maintain the historical statistics while producing offspring for the next generation, the archive is designed to store survivors over the past few generations and to extract statistics from them. The calculation of the statistics is based on k-means, an off-the-shelf unsupervised clustering method. The pseudocode of k-means is shown in Algorithm 2. In particular, k-means is employed to cluster the individuals in the archive based on their positions in the search space, and the statistics take the form of a normalized frequency histogram that shows the proportion of each cluster size to the archive size. A higher score indicates that the corresponding cluster is more likely to be a promising search region. The statistics can then be maintained by probabilistically assigning newly generated candidates to each cluster according to the histogram.
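As an illustration of the normalized frequency histogram, the sketch below assigns each archived individual to its nearest centroid and divides each cluster count by the archive size. It assumes the centroids have already been fitted by k-means and uses Euclidean distance; the names `nearest_centroid` and `cluster_scores` are hypothetical.

```python
import math


def nearest_centroid(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda c: math.dist(point, centroids[c]))


def cluster_scores(archive, centroids):
    """Proportion of archived individuals that fall in each cluster."""
    counts = [0] * len(centroids)
    for individual in archive:
        counts[nearest_centroid(individual, centroids)] += 1
    return [count / len(archive) for count in counts]


if __name__ == "__main__":
    archive = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (0.0, 0.2)]
    centroids = [(0.0, 0.0), (5.0, 5.0)]
    print(cluster_scores(archive, centroids))  # [0.75, 0.25]
```

The scores sum to one, so they can be used directly as selection probabilities for the clusters.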

To keep the computational cost brought by k-means within an acceptable and constant range, the archive size is fixed. That is, a part of the individuals in the archive must be replaced with new survivors during the archive update to incorporate new information. Two types of update methods are considered in this work: (1) randomly selecting individuals in the archive and replacing them with the new survivors (denoted by random); (2) replacing a part of the archive with the new survivors in the order in which the archived individuals arrived (denoted by sequential). The performance comparison between these two approaches is discussed in Section 6.
The update of the archive and the calculation of the statistics are executed in the function archiveUpdate (Algorithm 1, lines 3 and 13), which is summarized in Algorithm 3. At the replacement step (Algorithm 3, line 4), individuals are discarded from the archive based on the random or sequential approach, and the new survivors are stored in the archive. Initialization is executed when the generation number equals 0. The kmeansFit function (Algorithm 3, line 7) updates the centroids of the clusters according to the updated archive and assigns updated cluster labels to each individual in the archive. After that, the normalized frequency histogram over the clusters is calculated by the hist function (Algorithm 3, line 9) for further usage in offspring selection (Algorithm 4). Note that the initial centroids of the clusters in the current generation are inherited from the previous generation, as most individuals in the archive are the same as those of the previous generation.
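The two replacement strategies can be sketched as below. This is an illustrative reading of the description, not the authors' implementation: the function names are hypothetical, the random variant overwrites randomly chosen slots of a list, and the sequential variant is modeled as a first-in-first-out queue using a bounded `collections.deque`. Both assume the number of new survivors does not exceed the archive size.

```python
import random
from collections import deque


def update_random(archive, survivors, rng=random):
    """Overwrite randomly chosen archive slots with the new survivors."""
    slots = rng.sample(range(len(archive)), len(survivors))
    for slot, individual in zip(slots, survivors):
        archive[slot] = individual
    return archive


def update_sequential(archive, survivors):
    """FIFO update: archive must be a deque created with maxlen set.

    Appending to a full bounded deque evicts the oldest entries.
    """
    archive.extend(survivors)
    return archive


if __name__ == "__main__":
    fifo = deque([1, 2, 3, 4], maxlen=4)
    print(update_sequential(fifo, [5, 6]))  # deque([3, 4, 5, 6], maxlen=4)
    print(update_random([1, 2, 3, 4], [5, 6]))
```

In both variants the archive size stays constant, which keeps the clustering cost per generation bounded.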


5. Search History-Driven Crossover (SHX)
SHX randomly selects parents by following the strategy of existing crossover operators (e.g., two parents in the case of BLX and n + 1 parents in the case of SPX in an n-dimensional search space) and excessively generates candidate offspring for further offspring selection. The number of candidates is larger than the number of offspring to be selected, because the candidate set must contain a sufficient number of individuals that can be assigned to each cluster of the archive. Here, generating individuals excessively can also be considered a mechanism of diversity preservation. It is worth pointing out that offspring selection is a different procedure from survivor selection. Offspring selection belongs to the crossover model and is conducted before fitness evaluation, whereas survivor selection is conducted after fitness evaluation. Offspring selection narrows down the candidates to the offspring based on roulette wheel selection [28]. Each proportion of the wheel relates to one possible selection (i.e., a cluster), and the normalized frequency histogram is used to associate a probability of selection with each cluster of the archive. This can also be viewed as a procedure in which SHX preferentially selects individuals in more "promising" regions. This biased selection can encourage the evolution of the population and accelerate the overall convergence. Besides, the statistics of the population (e.g., cluster size) can be maintained between two consecutive generations because the new generation is sampled based on the statistics of the history. Also, the diversity of the offspring can be preserved because each newly generated candidate has a probability of being assigned to the offspring.
The algorithm of offspring selection is shown in Algorithm 4. The input is the set of candidates excessively generated by an existing crossover operator (Algorithm 1, line 8). Each candidate is labeled by the kmeansPredict function (Algorithm 4, line 1) based on the current clusters estimated from the archive. Then, the roulette is constructed based on the normalized frequency histogram. The roulette selection is called once for each offspring to be selected. Each roulette selection produces a cluster ID, and one candidate that belongs to the corresponding cluster is randomly selected and assigned to the offspring. To avoid duplicate selection, a selected candidate is excluded from the candidates. If no more candidates correspond to a certain cluster (this is rarely the case when the number of candidates is sufficiently larger than the number of offspring), the roulette is reconstructed by eliminating the proportion of the corresponding cluster. Finally, the offspring are passed to the survivor selection process, which determines the survivors using JGG.
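The offspring selection procedure can be sketched as follows. This is a minimal illustration under assumptions: the function name `offspring_selection` is hypothetical, candidates are labeled by their nearest centroid under Euclidean distance (standing in for kmeansPredict), and the sketch stops early if every remaining candidate lies in a cluster with zero score, which is a choice made here rather than a behavior stated in the paper.

```python
import math
import random


def offspring_selection(candidates, centroids, scores, n_offspring, rng=random):
    """Select n_offspring candidates by roulette wheel over cluster scores."""
    # Label each candidate with its nearest centroid.
    buckets = {c: [] for c in range(len(centroids))}
    for cand in candidates:
        label = min(range(len(centroids)),
                    key=lambda c: math.dist(cand, centroids[c]))
        buckets[label].append(cand)

    selected = []
    while len(selected) < n_offspring:
        # Rebuild the wheel from clusters that still hold candidates.
        live = [c for c in buckets if buckets[c] and scores[c] > 0]
        if not live:
            break  # assumption: return fewer offspring if nothing is eligible
        cluster = rng.choices(live, weights=[scores[c] for c in live], k=1)[0]
        pool = buckets[cluster]
        # Pop a random candidate so it cannot be selected twice.
        selected.append(pool.pop(rng.randrange(len(pool))))
    return selected


if __name__ == "__main__":
    cands = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.1), (5.0, 5.1), (4.9, 5.0)]
    print(offspring_selection(cands, [(0.0, 0.0), (5.0, 5.0)], [0.75, 0.25], 3))
```

No fitness function appears in this sketch, which reflects the point made above that offspring selection happens before fitness evaluation.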
6. Experimental Results
The performance of SHX is investigated over 15 benchmark functions, with each function in two different dimension settings. We comprehensively compare the performance of RCGA with and without SHX, and SHX is run with different settings of archive update methods (random/sequential) and offspring generation methods (BLX [17]/SPX [21]/UNDX [20]).
6.1. Experimental Setup
Benchmark functions are a useful tool to verify the effectiveness of a method, and it is common to use several functions with different properties, such as in [29, 30]. We selected 15 benchmark functions with different characteristics from the literature [31–33] for evaluation. Detailed information on each function is summarized in Table 1. Initialization of the population and the archive is conducted within the range provided by the 4th column in Table 1. It is worth mentioning that the search space (i.e., the range of parameters) during the generation alternation is not constrained. Each function is labeled according to its combination of characteristics (U + S, U + NS, M + S, and M + NS, where U and M stand for unimodal and multimodal, and S and NS for separable and nonseparable). By involving functions with various characteristics, we can analyze the proposed method more comprehensively and objectively. Furthermore, as all selected functions are adjustable in dimension, we adopt two different numbers of dimensions to control the difficulty of the search problem.
The settings of the hyperparameters of the proposed method are listed in Table 2. The proposed method includes hyperparameters of not only RCGA (e.g., the number of generations) but also SHX. In general, the search problem defined by each function becomes harder as the number of dimensions increases, which requires more evaluations. For adaptive adjustment, the number of generations and the related size parameters listed in Table 2 are set proportional to the number of dimensions. The constant values of each parameter are empirically determined because the purpose of the experiments is to validate the effectiveness of having SHX, rather than to achieve the best solution for each function.
All experiments are executed 100 times with different random seeds. In each experiment, the generation alternation is executed for the full number of generations defined in Table 2. For a fair comparison, runs under the same random seed start from the same population. The runtime and fitness are recorded with a Python implementation (without either parallelization or optimization) on a desktop computer with an i7-7700 CPU at 3.60 GHz and 12.0 GB RAM.
6.2. Comparison in the Final-Generation Elite
The absolute errors between the optimal value and the final-generation elite fitness for all combinations of functions, dimensions, and methods are displayed in Table 3. Table 3 shows the minimum, maximum, median, mean, standard deviation (SD), and p value of the Mann–Whitney U test for each combination. The Mann–Whitney U test evaluates the significance of the results with SHX against the results without SHX at the chosen significance level. Before showing the superiority brought by SHX, we first exclude a few results for which all the methods are trapped by local optima or cannot reach the global optimum. (1) Easom Function. This function has several local minima. It is unimodal, and the global minimum occupies only a small area relative to the search space, which can hardly be reached. (2) Schwefel 2.26. Since the setup of this experiment does not restrict the range of parameters during the search, an extremely small fitness value (even smaller than the global optimum) can be achieved with this function, which makes it unsuitable for comparisons.
From Table 3, we can observe the clear improvement in performance brought by SHX. The p values show that the methods with SHX are significantly better in at least 23 of the 30 settings. For the other five statistics (minimum, maximum, median, mean, and SD), the methods without SHX fail to outperform those with SHX in most settings. For instance, focusing on the minimum results, the methods without SHX outperform the methods with SHX only 5, 0, and 4 times for BLX, SPX, and UNDX, respectively. Among the SHX variants, SHX with the sequential archive update achieves the best performance. SHBLX_sequential, SHSPX_sequential, and SHUNDX_sequential show significance in 27, 26, and 27 settings, respectively. In addition, they achieve the best results in most settings with respect to the maximum, median, and mean. One possible reason is that the sequential update removes the oldest individuals, which arrived first, and therefore SHX can select offspring according to an up-to-date search history that reflects the trend of evolution more sensitively. In contrast, the random update removes individuals from the archive uniformly, which may impede the discovery of new solutions since old individuals may be retained in the archive for more generations.
6.2.1. Analysis on BLX vs. SHBLX
It is already known that the standard BLX [17] faces difficulties, especially when the target function is nonseparable [34], due to its parameter-wise sampling. By observing the results for the nonseparable functions in Table 3, we find that involving SHX significantly improves the performance, which indicates that SHX can help BLX to greatly mitigate this drawback. This is easy to understand because offspring selection with clusters embeds a distance measure that builds a relationship among the parameters.
6.2.2. Analysis on SPX vs. SHSPX
SPX [21] is a better alternative to BLX, and we can observe from Table 3 that SPX noticeably outperforms BLX. Table 3 also makes clear that SHX further boosts the performance of SPX to a large extent. In particular, the minimum and median results are improved by involving SHX for all settings. As pointed out in [21], SPX has the ability to maintain the mean and covariance of the parent individuals, which is consistent with the design guideline for good crossover operators mentioned in Section 3. Since SHX manages an archive that stores search history over a few generations, it can preserve some useful statistics (e.g., centroids of clusters) much longer. That is why SHX is able to enhance SPX.
6.2.3. Analysis on UNDX vs. SHUNDX
Similar to BLX and SPX, Table 3 shows that the results involving SHX are improved in most settings. UNDX is also designed to generate offspring inheriting the distribution of the parent individuals [35]. Therefore, statistics of the search history provided by SHX are useful for UNDX to enhance search ability.
6.3. Comparison in Convergence Curve
With the aid of search history, SHX not only achieves better results but also improves the convergence speed. In this section, we compare the generation alternation over all the test functions for a fixed dimension setting. Evaluation values of elite individuals from the 1st generation to the 100th generation are plotted in Figure 2. The mean value of 100 trials is represented by the line, and the range between the minimum and the maximum is represented by the shaded area. A smaller area means a more stable search. It should be noted that as the ranges of parameters are not constrained during the search procedure, methods can achieve arbitrarily small values of fitness, and a lower value does not mean a better result in the case of Schwefel 2.26, as explained in Section 6.2.
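[Figure 2, panels (a) to (f): evaluation values of elite individuals from the 1st to the 100th generation; the line shows the mean of 100 trials and the shaded area spans the minimum to the maximum.]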
For BLX, SPX, and UNDX, exploiting SHX yields faster convergence than the same operators without SHX in most cases. The superiority becomes more obvious when the problem setting is more difficult (e.g., multimodal functions vs. unimodal functions).
6.4. Comparison in Processing Time
In this section, we show the runtime overhead brought by SHX. Figure 3 compares the processing time of an optimization task (in which a single fitness evaluation takes 0.01 second) for BLX and SPX. The parameter setting follows Table 2, and all the results are averaged over 10 trials. It took 93.9 seconds and 94.1 seconds for BLX and SPX to complete the entire process, respectively. SHBLX_random took 1.7 seconds more than BLX, and SHBLX_sequential took 1.6 seconds more than BLX. Similarly, the additional runtime of both SHSPX_random and SHSPX_sequential relative to SPX was 3.9 seconds. These numbers demonstrate that the additional runtime occupies only a small part of the total processing time. The additional computational cost mainly arises from the clustering of the archive data and the label assignment of the candidate offspring. The cost can be further reduced by using an efficient distance measure or parallel computing. For a fixed archive size, the runtime grows linearly with the number of generations. Considering the complexity of the fitness function and the budget, SHX is a practical alternative to other crossover models.
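The relative overhead implied by these timings can be worked out directly. The short calculation below uses only the numbers reported above; the helper name `overhead_percent` is introduced here for illustration.

```python
def overhead_percent(extra_seconds, baseline_seconds):
    """Additional runtime as a percentage of the baseline runtime."""
    return 100.0 * extra_seconds / baseline_seconds


# Baseline runtimes and additional runtimes reported in the text (seconds).
BASELINE = {"BLX": 93.9, "SPX": 94.1}
EXTRA = {
    "SHBLX_random": ("BLX", 1.7),
    "SHBLX_sequential": ("BLX", 1.6),
    "SHSPX_random": ("SPX", 3.9),
    "SHSPX_sequential": ("SPX", 3.9),
}

if __name__ == "__main__":
    for name, (base, extra) in EXTRA.items():
        print(f"{name}: {overhead_percent(extra, BASELINE[base]):.2f}%")
```

Under these figures the overhead is roughly 1.7% to 1.8% for the BLX variants and about 4.1% for the SPX variants.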
7. Conclusions
In this paper, we have proposed a novel crossover model (SHX) which is simple yet effective and efficient. It can be easily integrated with any existing crossover operators. The key idea is to exploit search history over generations to gain useful information for generating offspring. Experimental results demonstrate that our SHX can significantly boost the performance of existing crossovers, in terms of the final solution and the convergence speed. Also, according to experiments, the induced extra runtime is negligible compared to the total processing time.
SHX still has a few limitations. (1) Additional hyperparameters need to be determined. (2) The induced additional runtime may be too large for applications which require high processing speed. As future work, we would like to address the above limitations. For instance, hyperparameters can be adaptively set by considering specific contexts, and parallelization can be introduced to speed up SHX.
Data Availability
The test data used to support the findings of this study are included within the article.
Disclosure
A preliminary version of this work appears in GECCO 2020 and has also been mentioned in the manuscript, which can be viewed at the following link: https://arxiv.org/abs/2003.13508.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was partly supported by JSPS KAKENHI, Grant number (JP18K17823).