Abstract

Computationally expensive optimization problems are often solved using surrogates, and a common variant is the radial basis functions (RBF) model. It aggregates several basis functions which all depend on a hyperparameter affecting their individual outputs and consequently the overall surrogate prediction. However, the optimal value of the hyperparameter is typically unknown and should therefore be calibrated. This raises the questions of how the hyperparameter affects the overall optimization search effectiveness (and not just the stand-alone surrogate accuracy) and to what extent such a calibration is beneficial, which are important considerations for end-users and algorithm researchers alike. To rigorously address this issue, this paper presents an analysis based on an extensive set of numerical experiments with an RBF surrogate-assisted evolutionary algorithm. The results show that the hyperparameter strongly affected performance and that the extent of its impact varied depending on the basis function, objective function modality, and problem dimension. Overall, calibration of the hyperparameter was typically highly beneficial to the search performance, while dynamically optimizing the hyperparameter during the search yielded additional performance gains.

1. Introduction

Industrial optimization problems often involve functions whose evaluation is computationally expensive, for example, in engineering design optimization candidate products could be evaluated by a computer simulation (instead of real-world experiments) to improve the design process efficiency. This simulation-driven setup introduces several optimization challenges: (i) the simulation acts as a “black-box function”, namely, it assigns an output value to each input vector (candidate product) through involved numerical procedures or proprietary codes but there is no analytic expression for this functional relation thereby necessitating derivative-free optimization methods, (ii) each simulation run is typically computationally intensive so only a small number of vectors can be evaluated, and (iii) such “black-box functions” often have complicated features (such as multimodality) which exacerbate the search difficulty.

A framework which has proven effective in such problems combines an evolutionary algorithm (EA) with a surrogate model (also termed a “metamodel” in the literature), yielding the surrogate-assisted EA (SAEA). The evolutionary algorithm is a heuristic search algorithm which seeks an optimum by applying operators inspired by the process of evolution to a population of candidate solutions, while the surrogate model approximates the true expensive black-box function and provides estimated function values at a much lower computational cost. A common surrogate variant is the radial basis functions (RBF) model, which uses a set of functions whose values are linearly aggregated. RBFs rely on a hyperparameter (HP) which affects their individual outputs and consequently the surrogate prediction and the overall SAEA performance. The optimal HP value is typically unknown and therefore needs to be calibrated, which raises the following question: how significant is the impact of the HP on the search, and does the numerical effort spent on such calibration yield significant performance gains? The impact of the HP has been examined in the literature with respect to the stand-alone surrogate prediction accuracy but hardly so in the context of the overall surrogate-assisted search, which is a significantly more involved relation [1]. To address these open questions, this study presents an extensive set of numerical experiments and analysis which examine how the HP affects the overall SAEA performance, both when a fixed HP is selected a priori and when it is dynamically calibrated during the search. Based on the large volume of results, the analysis clarifies in which instances the HP strongly affects the SAEA performance.
The remainder of this paper is organized as follows: Section 2 surveys the relevant literature, Section 3 explains the methods used, Section 4 presents the numerical experiments and their analysis, and lastly Section 5 concludes the paper.

2. Literature Survey

SAEAs have shown their effectiveness across a variety of problems ranging from drug formulation to aircraft design [2, 3]. The success of such algorithms stems from the combination of the EA, which is both derivative-free and capable of handling functions with complicated features, with the surrogate which provides approximate function values efficiently. In this setup the EA searches for an optimum of the surrogate while the latter is updated during the search. A basic SAEA is presented in Algorithm 1.

Input: initial sample size, max evaluations, EA parameters;
Generate an initial set of vectors and evaluate with the true function;
Repeat
 Train a surrogate model by using the vectors evaluated so far;
 Search for an optimum of the surrogate by using an EA;
 Evaluate the predicted solution (and possibly additional vectors) with the true function;
until max evaluations;
Output: the best solution found;
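As an illustration, the loop above can be sketched in code as follows. This is a minimal sketch, not the paper's implementation: a Gaussian-RBF surrogate with a fixed hyperparameter is assumed, a random candidate search stands in for the EA, and all function names, bounds, and budget values are illustrative.

```python
import numpy as np

def rbf_train(X, y, c=1.0):
    """Fit an interpolating Gaussian-RBF surrogate; c is the hyperparameter (assumed fixed here)."""
    r = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    Phi = np.exp(-(r / c) ** 2) + 1e-10 * np.eye(len(X))  # small ridge for numerical stability
    lam = np.linalg.solve(Phi, y)
    def predict(Z):
        rz = np.linalg.norm(Z[:, None, :] - X[None, :, :], axis=2)
        return np.exp(-(rz / c) ** 2) @ lam
    return predict

def saea(true_f, dim, lb, ub, init_size=8, max_evals=40, seed=0):
    """Algorithm 1 sketch: a random candidate search stands in for the inner EA."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, size=(init_size, dim))   # initial sample
    y = np.array([true_f(x) for x in X])             # true (expensive) evaluations
    while len(X) < max_evals:
        surrogate = rbf_train(X, y)                  # train on all vectors evaluated so far
        cand = rng.uniform(lb, ub, size=(500, dim))  # cheap search over the surrogate
        x_new = cand[np.argmin(surrogate(cand))]     # predicted optimum
        X = np.vstack([X, x_new])                    # evaluate it with the true function
        y = np.append(y, true_f(x_new))
    return X[np.argmin(y)], float(y.min())

# example: minimize the 3-D sphere function with a budget of 40 true evaluations
x_best, f_best = saea(lambda x: float(np.sum(x ** 2)), dim=3, lb=-5.0, ub=5.0)
```

Note how the true function is called only once per loop iteration, while the surrogate absorbs the 500-candidate search; this is the cost asymmetry that motivates the framework.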

Various SAEA implementations have been explored in the literature, for example in [4], a multifidelity setup was proposed where several simulations of different accuracy were combined for design optimization of hydraulic turbines. In [5] the authors used an RBF-based SAEA for design optimization of a mechanical controller, while in [6] a multiobjective SAEA based on a Kriging surrogate was proposed. In [7] the authors used a SAEA to optimize the layout of a wind turbine farm with the goal of maximizing energy production while accounting for wind regimes and turbine wakes. More intricate frameworks include [8] which combined an EA with machine-learning techniques to handle vectors in which the simulation fails and [9, 10] in which ensembles of multiple surrogates were used concurrently to improve the prediction accuracy. Elaborate surrogate updating procedures were studied in [11, 12], while a dynamic in-search selection of the surrogate type was studied in [9, 13]. Recently in [14] the authors proposed a dual surrogate-assisted algorithm which uses a global surrogate and a group of local surrogates for approximation and a dual-population cooperative particle swarm algorithm to explore the surrogates to better handle highly multimodal functions. Other studies have proposed ensembles of surrogates [15] and more recently in [16] the authors proposed using an ensemble of local surrogates and selecting the subset that provides the most accurate approximation. Also related is [17] in which the authors proposed a surrogate-assisted particle swarm algorithm in which the sampled points are split into multiple subsets and a different surrogate is trained based on each sample. The surrogates are then managed to improve the global and local search capabilities of the algorithm. 
Also using multiple surrogates, in [18] the authors proposed a surrogate-assisted algorithm which used both a Kriging surrogate for localized modelling and a global quadratic model to approximate the overall function trend. The algorithm also uses two optimization algorithms (“grey wolf” and a multistart search) to identify good solutions from each surrogate.

Various studies have also examined the impact of the HP on the stand-alone surrogate. In [19] a Kriging surrogate was used along with several HP calibration methods to achieve different balances between global and local prediction accuracies. In [20], the authors proposed estimating the optimal Kriging HP by using a multistart gradient search on the cross-validation error function, while in [21], calibrating the HP was found to affect the accuracy of various surrogates. In [22], the authors found that the size of the initial sample affected the calibration of the HP. However, the mentioned studies examined the impact of the HP only on the prediction accuracy of the stand-alone surrogate while this study analyzes the impact on the overall SAEA performance.

3. Methods

3.1. The RBF Surrogate

The RBF surrogate used in this study is a linear combination (superposition) of basis functions (BFs), namely,

$$\hat{f}(\mathbf{x}) = \sum_{i=1}^{n} \lambda_i \, \phi\left(\lVert \mathbf{x} - \mathbf{x}_i \rVert\right),$$

where $\hat{f}(\mathbf{x})$ is the surrogate prediction, while $\mathbf{x}_i$, $i = 1, \ldots, n$, and $\phi$ are the sampled vectors and basis functions, respectively. The scalar weights $\lambda_i$ are determined from the Lagrange interpolation conditions

$$\hat{f}(\mathbf{x}_i) = f(\mathbf{x}_i), \quad i = 1, \ldots, n.$$

Two widely used BFs are the Gaussian and multiquadric (MQ) [23], which are defined as

$$\phi(r) = e^{-r^2/c^2} \quad \text{(Gaussian)}, \qquad \phi(r) = \sqrt{r^2 + c^2} \quad \text{(MQ)},$$

where $c > 0$ is the HP which determines the BF shape and consequently the overall surrogate prediction $\hat{f}(\mathbf{x})$. This relation is visualized in Figure 1.
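To make the role of the hyperparameter concrete, the following sketch solves the interpolation conditions for a 1-D toy sample and shows that different HP values (denoted `c` in the code) change the prediction at the same query point. The sampled function and all numeric values are illustrative.

```python
import numpy as np

def rbf_fit_predict(X, y, Xq, bf, c):
    """Solve the interpolation conditions Phi @ lam = y, then predict at query points Xq."""
    dist = lambda A, B: np.abs(A[:, None] - B[None, :])   # pairwise 1-D distances
    Phi = bf(dist(X, X), c)
    lam = np.linalg.solve(Phi, y)
    return bf(dist(Xq, X), c) @ lam

gaussian = lambda r, c: np.exp(-(r / c) ** 2)      # Gaussian basis function
multiquadric = lambda r, c: np.sqrt(r ** 2 + c ** 2)  # multiquadric basis function

X = np.linspace(0.0, np.pi, 8)   # sampled vectors
y = np.sin(X)                    # observed function values
Xq = np.array([0.4])             # query point

# the same data and query point, but two different hyperparameter values,
# yield different surrogate predictions
p_narrow = rbf_fit_predict(X, y, Xq, gaussian, c=0.2)
p_wide = rbf_fit_predict(X, y, Xq, gaussian, c=2.0)
```

At the sampled points themselves the predictions match the data exactly (the interpolation conditions), while away from them the HP governs the shape of the fit.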

Given its impact, various studies have examined how to calibrate the HP based on the stand-alone surrogate accuracy. In [24], the authors used an empirical fixed value derived from the mean distance between the sampled vectors but without accounting for the observed function values. In [25], the HP was calibrated by minimizing the root mean square error (RMSE) of the RBF prediction using gradient descent. A related leave-one-out cross-validation procedure was proposed in [26–29], with various modifications explored in [30–33]. In [34] the HP was calibrated based on a least-squares fit procedure, while in [35], MARS interpolating polynomials were used. More recently, [36] proposed using an EA for the HP calibration. In the numerical experiments performed in this study, both a fixed and a dynamically optimized HP were used, as explained in Section 4.

3.2. The SAEA Algorithm

A representative SAEA was implemented for the numerical experiments. It begins by sampling an initial random population and training the RBF surrogate. The main loop then begins, in which a real-coded EA seeks an optimum of the surrogate. At a fixed interval of generations, a prescribed percentage of the best population members is evaluated with the true objective function and added to the cache of sampled vectors. The surrogate is retrained based on the updated cache and the EA population is re-evaluated with the new surrogate. Following [37], the settings of these two parameters were adopted. The loop repeats until either the true (expensive) function evaluations reach a prescribed limit (a multiple of the function dimension d), or 1000 EA generations have occurred. Algorithm 2 provides the complete SAEA workflow.

Input: initial sample size, max evaluations, EA parameters, and surrogate-update parameters;
 generate and sample an initial random population;
repeat
  train an RBF surrogate based on the evaluated vectors;
  run one generation of the EA;
  every prescribed interval of generations do
   evaluate the best vectors with the true function;
   retrain the surrogate and re-evaluate the population;
  end
until max function evaluations or max EA generations;
output: the best solution found;

For the real-coded EA a representative variant with a population size of 20 was used which operates as follows: (i) vectors are ranked by fitness, (ii) parents are selected by stochastic universal selection (SUS) and recombined with a uniform crossover operator with a probability of 0.7 to produce offspring, and (iii) the latter are mutated with the breeder genetic algorithm operator with a probability of 0.1.
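The generation described above can be sketched as follows. Rank-based SUS and uniform crossover follow the text; a simple Gaussian perturbation stands in for the breeder genetic algorithm mutation operator, and the bounds and population values are illustrative.

```python
import numpy as np

def ea_generation(pop, fitness, rng, pc=0.7, pm=0.1, lb=-5.0, ub=5.0):
    """One generation of a real-coded EA: ranking, SUS, uniform crossover, mutation."""
    n, d = pop.shape
    # (i) rank the vectors by fitness (minimization: best rank gets the largest share)
    order = np.argsort(fitness)
    ranks = np.empty(n)
    ranks[order] = np.arange(n, 0, -1)
    probs = ranks / ranks.sum()
    # (ii) stochastic universal selection: n equally spaced pointers on the wheel
    cum = np.cumsum(probs)
    pointers = rng.uniform(0, 1.0 / n) + np.arange(n) / n
    parents = pop[np.minimum(np.searchsorted(cum, pointers), n - 1)]
    rng.shuffle(parents)
    # uniform crossover with probability pc per pair: swap a random subset of genes
    offspring = parents.copy()
    for i in range(0, n - 1, 2):
        if rng.random() < pc:
            mask = rng.random(d) < 0.5
            offspring[i, mask] = parents[i + 1, mask]
            offspring[i + 1, mask] = parents[i, mask]
    # (iii) mutation with probability pm per gene (simplified Gaussian perturbation,
    # standing in for the breeder genetic algorithm operator)
    mut = rng.random((n, d)) < pm
    offspring[mut] += rng.normal(0.0, 0.1 * (ub - lb), mut.sum())
    return np.clip(offspring, lb, ub)

rng = np.random.default_rng(3)
pop = rng.uniform(-5.0, 5.0, size=(20, 4))
fitness = np.sum(pop ** 2, axis=1)
new_pop = ea_generation(pop, fitness, rng)
```

SUS is used here because, unlike repeated roulette-wheel draws, its equally spaced pointers keep the number of copies of each parent close to its expected value.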

4. Experiments

4.1. Fixed HP

A first set of numerical experiments was performed to analyze if, and to what extent, different fixed HP values chosen a priori affect the overall SAEA performance. Following the Design-of-Experiments (DOE) framework [38], the numerical experiments were formulated by using a single variable, namely the HP, which was assigned different fixed values, and 3 factors (function type, function dimension, and BF type). Accordingly, in each combination of factors the numerical experiments were repeated with the different HP values so that the HP impact could be analyzed across a wide range of problem settings. The DOE setup is summarized in Table 1.

The established set of test functions of [39] was used as it includes functions with diverse features as shown in Figure 2.

In each test case (namely, a combination of factors), the SAEA was run with four different HP settings and with 30 runs per setting to obtain an adequate sample, thereby yielding a total of 2400 runs (20 factor combinations × 4 HP settings × 30 runs).

Tables 2–5 present the resultant statistics (mean, median, standard deviation, min, and max) of the final function values for each batch of 30 runs across the different HP values and test cases. Also provided is the p-value of the Mann–Whitney statistical significance test, obtained by comparing the results corresponding to the HP associated with the best mean to the results associated with the 3 remaining HP settings. Statistical significance was measured at the 0.05 level, namely, for p < 0.05. As such, in each of the 20 combinations of factors (function-dimension-BF), the number of significant performance differences observed varied between 0 (no significant differences observed), which implies that the HP had no significant impact, and 3 (all comparisons had significant differences), which implies that the HP had a strong impact.
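As a sketch, one such pairwise comparison can be performed with SciPy's Mann–Whitney U implementation. The two samples below are synthetic stand-ins for two batches of 30 final function values; they are not the paper's data.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(1)
best_hp_runs = rng.normal(1.0, 0.1, 30)   # synthetic: batch for the best-mean HP setting
other_hp_runs = rng.normal(1.5, 0.1, 30)  # synthetic: batch for a competing HP setting

# two-sided rank test: does either batch tend to produce smaller final values?
stat, p = mannwhitneyu(best_hp_runs, other_hp_runs, alternative="two-sided")
significant = p < 0.05  # significance at the 0.05 level, as in the text
```

The rank-based test is a natural choice here since final objective values from heuristic runs are rarely normally distributed.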

The corresponding results are summarized in Table 6, from which it follows that in 1 combination of factors the HP had no significant impact, in 3 cases it had only a minor impact, and in the remaining 16 cases it had a significant impact. The cases showing a lessened HP impact are low-dimensional and involve the Gaussian basis function. These results are attributed to the relative simplicity of the lower-dimensional cases, which allowed for a good approximation across a range of HP values. Also, the output of a Gaussian BF surrogate approaches the sample mean faster than that of an MQ surrogate due to the Gaussian BF behaviour, and therefore the RBF output was similar for different HP values, which resulted in fewer performance differences, as observed. In contrast, with the MQ basis function and in the high-dimensional cases the HP consistently had a significant impact. This is attributed to the MQ basis function, whose output varies over a wider range of HP values. Also, in high-dimensional cases generating an accurate surrogate is more challenging, hence different HP values resulted in significantly different prediction accuracies. These aspects are further addressed at length in the analysis which follows.

Expanding on the preceding analysis, an additional fine-grained analysis was performed to highlight how each factor (BF, modality, dimension) affected performance; the results are summarized in Table 7. The per-factor influence is as follows:

4.1.1. Basis Function

For a Gaussian BF, in 4 cases varying the HP had a negligible impact, while for the MQ BF the variations were always significant, namely, the sensitivity to the HP was lower for a Gaussian BF than for an MQ one. This is attributed to the mathematical formulation of each BF: for a Gaussian BF, the term $e^{-r^2/c^2}$ decays to 0 rapidly for a wide range of HP values, which results in similar surrogate predictions for different HP values and consequently a lower HP impact. In contrast, for an MQ BF, the output $\sqrt{r^2 + c^2}$ varies over a wider range of HP values, which in turn affects the surrogate prediction and results in a more significant HP impact. As an example, Figure 3(a) shows a plot of the RMSE of the surrogate prediction when using Gaussian and MQ BFs with the Ackley-5D function. The error quickly rises for the Gaussian BF at small HP values since the individual BFs decay to zero and the surrogate prediction approaches the sample mean. In contrast, there is a much higher sensitivity to the HP values with the MQ BFs, as observed in the experiments.

4.1.2. Modality

The HP impact was significant for the low modality functions (Dixon, Rosenbrock; 2-3 significant differences per test case) but was diminished for the high modality functions (Ackley, Griewank, and Rastrigin). This is attributed to the fact that as the modality increased, the surrogate prediction accuracy degraded due to the more complicated function features, up to a point where the prediction accuracy was negligibly affected by different HP values. To demonstrate this, numerical experiments were performed with a modified Rastrigin function in which a prescribed parameter determines the function modality (higher values increase modality). The surrogate prediction accuracy was examined for a 5D function and the results are presented in Figure 3(b). It follows that the surrogate prediction RMSE increased with modality for all HP values. Consequently, the impact of the HP was more pronounced for the lower modality settings and diminished as the modality was increased, as observed in the numerical experiments.
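The exact parameterization of the modified Rastrigin function is not reproduced here; one plausible form, in which a frequency parameter controls the modality, is sketched below. The parameter name `alpha` and the functional form are assumptions for illustration, not the paper's definition.

```python
import numpy as np

def rastrigin_mod(x, alpha=2.0 * np.pi):
    """Rastrigin-like function; larger alpha packs in more local minima (assumed form)."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(x ** 2 - 10.0 * np.cos(alpha * x) + 10.0))

# the global minimum at the origin is unchanged by alpha
f0 = rastrigin_mod(np.zeros(5))

# higher alpha increases the oscillation frequency, hence the modality
f_low = rastrigin_mod(np.full(5, 0.3), alpha=np.pi)
f_high = rastrigin_mod(np.full(5, 0.3), alpha=8.0 * np.pi)
```

Sweeping such a parameter lets the surrogate accuracy be measured as a function of modality alone, with all other problem features held fixed.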

4.1.3. Dimension

In the lower dimensional cases, the HP had a significant impact in 7 cases, while in the higher dimensional cases it did so in 9 cases. These results are attributed to the “curse of dimensionality,” namely, in the lower dimensional problems a more accurate surrogate could be trained since the problems are simpler, and hence the sensitivity to the HP was lower. In contrast, in the higher dimensional cases training an accurate surrogate was more challenging and required more precise HP values, hence the higher sensitivity to the HP. To demonstrate this, Figure 3(c) shows the variation of the prediction RMSE for different dimensions with the Rosenbrock function. It follows that for lower dimensions changing the HP affected the RMSE much less than for higher dimensions, as observed in the experiments.

Beyond the above per-factor analysis, an additional evaluation was performed to examine if and to what extent the impact of the HP varied depending on the main algorithm parameter, namely, the limit of true (expensive) function evaluations. Accordingly, beyond the baseline evaluations limit, the full set of numerical experiments was repeated for limits of 100d and 200d evaluations. Table 8 summarizes the obtained results for the number of cases in which the HP had a significant impact. It follows that the impact was consistent across all sets and that the number of cases with a statistically significant performance difference, and the test cases in which they occurred, remained largely unchanged across the three settings. Accordingly, the remainder of the analysis is presented for the baseline setting.

Overall, the analysis shows that except for specific cases (Gaussian BF/high modality functions/low dimensions), the HP had a significant impact on the overall SAEA performance.

4.2. Dynamically Optimized HP

The preceding analysis examined cases where the HP value was unchanged. This raises the following question: would the overall SAEA performance vary if the HP is dynamically calibrated during the search? For a rigorous evaluation, an additional set of experiments was performed with a modified algorithm in which the HP was dynamically calibrated based on a cross-validation procedure with an 80-20 training-testing split ratio. The HP optimization was performed with a golden-section search algorithm over a prescribed interval. Tests were conducted as in the preceding section and the results are presented in Table 9, compared to those of the best-performing variant (mean-wise) from the preceding section. It follows that optimizing the HP consistently improved performance, as evident from both the means and the p-values, where the variant with the optimized HP had a statistically significant performance advantage at the 0.05 level (p < 0.05) in 8 out of 10 cases (all except the Ackley-5D and Rosenbrock-5D functions).
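The calibration step can be sketched as follows: an 80-20 split as in the text, with a golden-section search minimizing the held-out RMSE. The search interval, the Gaussian BF choice, and all numeric values here are illustrative; the paper's exact interval is not reproduced.

```python
import numpy as np

def cv_rmse(X, y, c):
    """RMSE of a Gaussian-RBF surrogate on a held-out 20% split (80-20 ratio)."""
    k = int(0.8 * len(X))
    Xtr, ytr, Xte, yte = X[:k], y[:k], X[k:], y[k:]
    d = lambda A, B: np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    Phi = np.exp(-(d(Xtr, Xtr) / c) ** 2) + 1e-8 * np.eye(k)  # small ridge for stability
    lam = np.linalg.solve(Phi, ytr)
    pred = np.exp(-(d(Xte, Xtr) / c) ** 2) @ lam
    return float(np.sqrt(np.mean((pred - yte) ** 2)))

def golden_section(f, a, b, tol=1e-2):
    """Golden-section search for a minimizer of f on [a, b] (assumes unimodality)."""
    g = (np.sqrt(5.0) - 1.0) / 2.0
    x1, x2 = b - g * (b - a), a + g * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 < f2:                      # minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - g * (b - a)
            f1 = f(x1)
        else:                            # minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + g * (b - a)
            f2 = f(x2)
    return 0.5 * (a + b)

# illustrative data: 40 sampled vectors of a 2-D quadratic
rng = np.random.default_rng(2)
X = rng.uniform(-2.0, 2.0, size=(40, 2))
y = np.sum(X ** 2, axis=1)
c_opt = golden_section(lambda c: cv_rmse(X, y, c), 0.05, 5.0)
```

Golden-section search needs only one new error evaluation per iteration, which keeps the per-retraining calibration overhead modest.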

Furthermore, the prediction accuracies of the optimized and fixed HP variants were compared and the results are presented in Figure 4. It follows that the optimized HP variant had significantly smaller prediction errors and consequently achieved better final results. This trend was observed across different test cases, as shown. Overall, the analysis shows that dynamically optimizing the HP consistently improved the SAEA performance.

5. Conclusion

Surrogate-assisted evolutionary algorithms (SAEAs) are often used to solve simulation-driven optimization problems, and a common surrogate is the radial basis functions (RBF) model. The latter relies on a hyperparameter (HP) which affects both the individual basis functions and the overall surrogate prediction. The HP needs to be set by some numerical procedure, which raises the question of how significant its impact is on the overall SAEA performance (and not only on the stand-alone surrogate), namely, whether the computational effort spent on HP calibration is justified in terms of the overall SAEA performance. This is an important consideration for end-users and algorithm researchers alike.

To address this issue, this paper has presented an extensive set of experiments involving a representative RBF-based SAEA. Analysis of the large volume of results shows that a fixed HP typically had a significant impact on performance, except in specific cases (Gaussian basis function/low dimension/high modality). The analysis also shows that dynamically optimizing the HP during the search significantly improved performance. These results indicate that a careful a priori selection of the HP can significantly improve the SAEA effectiveness, while additional gains could be achieved with dynamic HP calibration.

Data Availability

Log files of the numerical experiments are available from the corresponding author upon request.

Conflicts of Interest

The author declares that there are no conflicts of interest.