Abstract

Surrogate models (SMs) can profitably be employed, often in conjunction with evolutionary algorithms, in optimisation problems in which it is expensive to test candidate solutions. The spatial intuition behind SMs makes them naturally suited to continuous problems, and the only combinatorial problems that have previously been addressed are those whose solutions can be encoded as integer vectors. We show how radial basis functions can provide a generalised SM for combinatorial problems which have a geometric solution representation, through the conversion of that representation to a different metric space. This approach allows an SM to be cast in a natural way for the problem at hand, without ad hoc adaptation to a specific representation. We test this adaptation process on problems involving binary strings, permutations, and tree-based genetic programs.

1. Introduction

Some optimisation problems have objective functions which are prohibitively expensive to evaluate [1, 2]. Functions may be mathematically ill-behaved (e.g., discontinuous, nonlinear, or nonconvex) or even a black box with largely unknown characteristics. Many engineering design problems have functions of this type [3, 4] and require experiments, lengthy simulations, or both to evaluate the extent to which the design objectives are met by a function of the parameters controlling the design. In the jargon of evolutionary computation, these controlling parameters are the genotype that encodes the design (i.e., the phenotype), which has to be expressed by means of an expensive simulation (i.e., a fitness evaluation).

Optimisation methods based on surrogate models (SMs), also known as response surface models, can tackle this problem of expensive objective functions [5–7]. A survey of surrogate model-based optimisation (SMBO) methods can be found elsewhere [8]. An SM is an easily evaluated mathematical model that approximates an expensive objective function as precisely as possible. Knowledge of the inner workings of the objective function is not necessary to construct an SM, which is built solely from discrete evaluations of the expensive objective function. We refer to a pair of a candidate solution and its known objective function value as a data-point. Many simple problems have solutions which are real numbers, and perhaps the simplest example of an SM is piecewise-linear interpolation, which creates a function from data-points by linking them with straight-line segments. More useful SMs for solutions on the real line are polynomial interpolants, which have continuous derivatives. These and other methods of building SMs naturally extend to spatial interpolation and regression.
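As a concrete illustration of the simplest of these cases, the following is a minimal Python sketch (using NumPy, with made-up data-points) of piecewise-linear interpolation acting as a cheap surrogate for a one-dimensional function:

```python
import numpy as np

# Hypothetical data-points: solutions x_i and their expensive objective values f(x_i).
xs = np.array([0.0, 1.0, 2.5, 4.0])
fs = np.array([3.2, 1.1, 2.7, 0.4])

def piecewise_linear_sm(x):
    """Cheap surrogate: linear interpolation between known data-points."""
    return np.interp(x, xs, fs)

print(piecewise_linear_sm(1.7))  # prediction at a point that was never evaluated expensively
```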

The usual SMBO procedure [8] is given in Algorithm 1. An initial SM is constructed from a few solutions evaluated with the expensive objective function. Further evaluations are applied to candidate solutions which the SM predicts to be promising. Subsequently, the processes of searching the SM for an optimal set of solutions, evaluating those solutions using the expensive objective function, and updating the SM with the new data-points are repeated. An evolutionary algorithm can be used in the SMBO procedure to infer the location of a promising set of solutions using the SM, rather than having to evaluate the expensive objective function. This is feasible because the computational cost of a complete run of the evolutionary algorithm on the SM is negligible (of the order of a few seconds) compared with the cost of evaluating a solution using the expensive objective function of the problem (of the order of minutes, hours, or even days depending on the problem).

(1) Sample uniformly at random a small set of candidate solutions and evaluate them using the
 expensive objective function (initial set of data-points)
(2) while a limit on the number of expensive function evaluations has not been reached do
(3) Construct a new surrogate model (SM) using all data-points available
(4) Determine the optimum value of the SM by search, for example, using an evolutionary algorithm
  (this is feasible as the model is cheap to evaluate)
(5) Evaluate the solution which optimises the SM using the expensive objective function
  (making an additional data-point available)
(6) end while
(7) Return the best solution found
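A minimal Python sketch of this loop is given below; the helper names (`expensive_objective`, `sample_random_solution`, `fit_surrogate`, `search_surrogate`) and the overall structure are illustrative placeholders rather than the exact components used in our experiments.

```python
def smbo(expensive_objective, sample_random_solution,
         fit_surrogate, search_surrogate, budget, init_samples=2):
    """Generic SMBO loop following Algorithm 1 (a sketch, not the exact setup used here)."""
    # (1) Initial data-points from uniform random sampling.
    data = [(x, expensive_objective(x))
            for x in (sample_random_solution() for _ in range(init_samples))]

    # (2) Main loop, bounded by the budget of expensive evaluations.
    while len(data) < budget:
        model = fit_surrogate(data)            # (3) rebuild the SM from all data-points
        candidate = search_surrogate(model)    # (4) cheap search of the SM (e.g., with an EA)
        y = expensive_objective(candidate)     # (5) one expensive evaluation
        data.append((candidate, y))            # new data-point

    # (7) Return the best (solution, value) pair found, assuming minimisation.
    return min(data, key=lambda pair: pair[1])
```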

Virtually all SMs are implicitly or explicitly spatial models, and the prediction process involves exploiting some assumed spatial relations (e.g., a smooth curve or surface) between the values of the objective function at a query point and those at the known data-points. This makes SMBOs naturally suited to continuous optimisation problems. However, they are not obviously applicable to combinatorial optimisation problems, except those with solutions which are naturally represented as vectors of integers, where a discretized SM may be used. When each solution is a vector, an integer, or a real number, techniques for building SMs from data-points can be borrowed from statistics (e.g., multivariate regression [9]) or from machine learning (e.g., supervised learning by neural networks or support vector machines [10–12]).

There is increasing interest in optimisation problems whose solutions have complicated representations and which also have expensive objective functions. For example, permutations and related representations are natural representations of solutions to many scheduling problems, yet a candidate schedule may have to be tested by simulating an entire production process, making the SMBO approach very attractive. However, although a permutation can be regarded as a special type of vector, permutations cannot be treated in the same way as ordinary vectors, because the information they encode is in the order of the elements, not in their values. This makes the standard SMBO approach unsuitable.

Variable-length sequences occur in many bioinformatics problems [13], and an SMBO can be used to select biological sequences for detailed study or simulation at an atomic level: an example is the search for proteins with desired properties.

Genetic programming (GP) [14] normally operates on a tree representation of a problem, and a number of its well-known applications have expensive objective functions. For example, genetic programs can be used to encode a robot’s behavioral controller, which may need to be tested repeatedly in a virtual or real environment to assess how good it is at controlling the robot in performing a task such as wall-following or obstacle avoidance [15].

Let us summarize the current state of SMs with regard to solution representations. Evolutionary algorithms and other search algorithms have been widely used to optimise SMs for continuous spaces [16]. More recent work [17] has considered vector solutions. Other studies [18] have approached applications with expensive objective functions which are inherently combinatorial problems with structured solutions (e.g., graphs) by encoding solutions in vector form to allow the use of standard SMs. Evolutionary algorithms have also been used to train, rather than search, the SM using the known data-points [19]; in that approach, GP performs symbolic regression to obtain the vector-input function which best fits the data-points.

Apart from the recent initial work of the present authors [20, 21], SMs do not seem to have been defined directly on more complicated representations than vectors. In order to use SMs on search problems with structured representations, the state of the art is to shoe-horn the original representation into a vector form in a preprocessing phase, known as feature extraction in the machine learning literature [22]. There are a number of drawbacks to this approach. For a start, feature extraction is a very delicate task. Only a carefully chosen vector of features will be a good representation of the information relevant to a learning task. Secondly, the unnatural encoding of a solution in vector form introduces extra nonlinearity into an already expensive objective function, making it harder to learn and consequently requiring additional expensive function evaluations to approximate it well enough to locate the optimum solution. In addition, the extraction of features from structured representations such as GP trees is itself unnatural and hence ineffective. For example, a symbolic regression formula or a Boolean formula would appear to have no obvious mapping to a fixed-length vector.

The underlying difficulty is that of making a problem fit the format of current SMs. Surely it would be better to modify the SM to accommodate the problem? Is there some way to adapt satisfactory SMs so that they accept more complicated solution representations?

We recently [20, 21] answered these questions by generalizing a well-known class of SMs—radial basis function networks [23]—using a geometric framework [24–27] which had previously been used to generalize search algorithms, such as particle swarm optimisation and differential evolution, from continuous spaces to combinatorial spaces. The generalization method is conceptually simple. Firstly, an algorithm which operates in a continuous space is rewritten in terms of Euclidean distances between points. Many spatial algorithms can be rewritten in this way. Then Euclidean distance is replaced with a generic distance metric, which yields a formally well-defined algorithm. This algorithm can be adapted to any solution representation by specifying an appropriate distance metric for that representation.

An algorithm generalised using this geometric methodology can readily be adapted to complicated representations because many types of structured object admit natural relations of distance or similarity. In particular, edit distances are well suited to structured objects. The edit distance between two configurations is the minimum number of unit edit operations required to transform one of them into the other. For example, Hamming distance is an edit distance between binary strings based on the unit edit of a bit flip. For permutations, one such metric is swap distance, which is the minimum number of pairwise exchanges of elements required to transform one permutation into the other. For variable-length sequences, Levenshtein distance measures the minimum number of insertions, deletions, or changes of characters required to transform one sequence into the other. There are also edit distances defined on trees and graphs, based on modifications of edges and nodes.
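For concreteness, straightforward Python implementations of the three edit distances just mentioned (Hamming distance on equal-length strings, swap distance on permutations, and Levenshtein distance on variable-length sequences) are sketched below.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))

def swap_distance(p, q):
    """Minimum number of element exchanges turning permutation p into q,
    computed as len(p) minus the number of cycles of the composite permutation."""
    n = len(p)
    pos_in_q = {v: i for i, v in enumerate(q)}
    mapping = [pos_in_q[v] for v in p]      # composite permutation q^-1 . p
    seen, cycles = [False] * n, 0
    for i in range(n):
        if not seen[i]:
            cycles += 1
            j = i
            while not seen[j]:
                seen[j] = True
                j = mapping[j]
    return n - cycles

def levenshtein_distance(s, t):
    """Minimum number of insertions, deletions, or substitutions turning s into t."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (cs != ct)))   # substitution
        prev = curr
    return prev[-1]
```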

In the remainder of this paper, we first review how radial basis function networks [23] can be generalised to a range of solution representations using this geometric methodology. We will show how the resulting generalised models can be linked to a target representation using an appropriate distance metric and then used within an SMBO to optimise problems on the target representation. We will illustrate the derivation of SMBOs for three target representations: binary strings, permutations, and GP trees. All our test problems are assumed to have costly objective functions. We use Hamming distance as the metric for binary strings and test the resulting SMBO on the well-known NK-landscapes [28] problem. We use Hamming distance and swap distance with permutations and test the SMBO on the quadratic assignment problem [29]. We use a form of tree edit distance with GP trees and address standard GP benchmarks of symbolic regression and parity. We should be clear that we are not aiming to show that a generalised SMBO can replace expensive objective functions with structured representations in solving practical problems, but to demonstrate that generalised SMBOs can in principle be applied to such problems and that they provide meaningful results when applied to classic example problems in simple discrete spaces, which is itself a large conceptual leap.

2. Radial Basis Function Networks

The machine learning literature [22] contains a number of approaches to the problem of finding the function within a certain class that best interpolates a set of data-points. Several of these approaches are naturally cast in terms of Euclidean distances and could readily be generalised to other metric spaces by replacing Euclidean distance with some other metric. These methods include nearest-neighbor regression, inverse distance-weighted interpolation, radial basis function network interpolation, and Gaussian process regression (also known as kriging). The first two methods are relatively simple, but they cannot be used as SMs because the global optimum of the function they create coincides with one of the data-points used in its construction, so they can never suggest solutions better than the data-points already available. Gaussian process regression [30] is a very powerful method with a solid theoretical foundation, which can not only extrapolate a global optimum but also give it an interval of confidence. Radial basis function network interpolation is similar to Gaussian process regression but conceptually simpler. We focus on radial basis function networks (RBFNs) and leave the generalization of Gaussian process regression for future work.

2.1. Classic RBFNs

A radial basis function (RBF) is a real-valued function $\varphi$ whose value depends only on the distance from some point $c$, called its center, so that $\varphi(x) = \varphi(\|x - c\|)$. The point $x$ is the argument of the function. The norm is usually Euclidean, so $\|x - c\|$ is the Euclidean distance between $x$ and $c$, but other norms are possible and have been used. Commonly used RBFs include Gaussian functions, multiquadrics, polyharmonic splines, and thin-plate splines. The most frequently used are Gaussian functions of the form $\varphi(x) = \exp\!\left(-\|x - c\|^2 / (2\sigma^2)\right)$, where $\sigma$ is the width parameter.

RBFs are typically used to build function approximations of the form $\hat{f}(x) = b + \sum_{i=1}^{n} w_i\, \varphi(\|x - c_i\|)$. The approximating function $\hat{f}$ is thus the sum of $n$ RBFs, each associated with its own center $c_i$ and width $\sigma_i$ and weighted by a coefficient $w_i$; there is also a bias term $b$. Figure 1 shows an example of a function obtained in this way. Any continuous function can in principle be approximated with arbitrary accuracy by such a sum, if enough RBFs are used.

In an RBFN, there are three types of parameters that need to be determined to optimise the fit between $\hat{f}$ and the data: the weights $w_i$, the centers $c_i$, and the width parameters $\sigma_i$. The most common way to find these parameters has two phases. Firstly, unsupervised learning (i.e., clustering) is used to determine the positions of the centers and the widths of the RBFs. Then, the weights that optimise the fit are obtained by least-squares minimisation.

A simplified procedure for fitting an RBFN, which skips the unsupervised learning phase, is widely used. The centers $c_i$ are first chosen to coincide with the known data-points $x_i$. Then the widths $\sigma_i$ are determined by a heuristic based on the distance of each center to its nearest neighbors (local model), or all widths are set to the same value, which is chosen in relation to the maximum distance between any two centers (global model). The bias $b$ can either be set to the mean of the function values at the known data-points (i.e., the training set) or to 0. The weights $w_i$ are then determined by solving the system of $n$ simultaneous linear equations which express the requirement that the function interpolates the data-points: $\hat{f}(x_j) = f(x_j)$ for $j = 1, \ldots, n$. Setting $\Phi_{ji} = \varphi(\|x_j - c_i\|)$, the system can be written in matrix form as $\Phi w = f - b$. The matrix $\Phi$ is nonsingular if the points $x_j$ are distinct and the family of functions $\varphi$ is positive definite (which is the case for Gaussian functions), and thus the weights can be obtained by simple linear algebra: $w = \Phi^{-1}(f - b)$.
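The following Python sketch illustrates this simplified fitting procedure in the classic Euclidean case, using the global width heuristic and a mean-valued bias as described above; the data-points in the usage example are hypothetical.

```python
import numpy as np

def fit_rbfn(X, f):
    """Simplified RBFN fit: centers = data-points, one global Gaussian width,
    bias = mean of the known values, weights obtained by solving Phi w = f - b."""
    X, f = np.asarray(X, dtype=float), np.asarray(f, dtype=float)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    sigma = D.max()                        # global width heuristic (max distance between centers)
    b = f.mean()                           # bias = average of known function values
    Phi = np.exp(-D**2 / (2 * sigma**2))   # Gaussian RBF matrix
    w = np.linalg.solve(Phi, f - b)        # interpolation weights
    def model(x):
        d = np.linalg.norm(X - np.asarray(x, dtype=float), axis=-1)
        return b + np.exp(-d**2 / (2 * sigma**2)) @ w
    return model

# Hypothetical usage on a few 2D data-points.
sm = fit_rbfn([[0, 0], [1, 0], [0, 1]], [1.0, 2.0, 0.5])
print(sm([0.5, 0.5]))
```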

2.2. Generalization of RBFNs to Arbitrary Representations

To generalize RBFNs, we need to generalize (i) the class of functions used to approximate the unknown function, (ii) the training procedure which finds the function within that class that best fits the data-points, and (iii) the model query procedure that predicts the value of the unknown function at a query point.

Following the geometric methodology of our generalization, we first need to rewrite each of the above three elements as a function of Euclidean distance alone and then substitute a distance metric which is chosen to suit the target representation. Finally we rewrite the algorithm in terms of that distance to obtain an instance of that algorithm specific to the target representation.

Let $(S, d)$ be a metric space with an associated distance function $d$. An RBF whose value depends only on the Euclidean distance from some point $c$, so that $\varphi(x) = \varphi(\|x - c\|)$, can be generalised to a function whose value depends only on the distance from some point $c \in S$ in the metric space, so that $\varphi(x) = \varphi(d(x, c))$. For example, generalised Gaussian functions can be obtained by replacing Euclidean distance with the generic metric $d$ in the original definition, so that $\varphi(x) = \exp\!\left(-d(x, c)^2 / (2\sigma^2)\right)$.

A set of configurations and an associated edit distance comprise a metric space, as all edit distances satisfy the metric axioms [27, 31, 32]. Consequently, a generalised RBF is well-defined on any set of configurations, making it a representation-independent function. For example, the set of binary strings and Hamming distance form a metric space. If Hamming distance is used as the metric $d$, then generalised Gaussian functions become well-defined functions which map binary strings to real numbers. Note that both the argument $x$ and the center $c$ are then binary strings. Alternatively, if the swap distance on permutations replaces the metric $d$, then these generalised Gaussian functions become well-defined functions mapping permutations to real numbers.

The SM $\hat{f}$, which is a linear combination of RBFs, can be generalised to a linear combination of generalised RBFs: $\hat{f}(x) = b + \sum_{i=1}^{n} w_i\, \varphi(d(x, c_i))$. Like its components, the generalised SM is representation independent, and it can be applied to any solution representation by replacing the metric $d$ with a metric appropriate to the target representation. An SM generalised in this way parameterises a large family of functions on a general metric space economically, in terms of the $w_i$, $c_i$, and $\sigma_i$. This property is independent of the underlying representation. When the underlying metric space is finite, as it is in combinatorial optimisation, any function can be approximated with arbitrary accuracy by a sufficiently large number of RBFs. In the limit, every point in the space would be associated with an RBF, parameterised to fit the function value at that point exactly.

The SM is fitted to the known data-points without reference to their underlying representation but solely in terms of the distances $d(x_i, x_j)$ between data-points and the objective function values $f(x_j)$. Therefore, the fitting process is representation independent, like the model. In particular, the simplified model-fitting procedure can obtain the centers, widths, and weights by least-squares minimisation of the system $\Phi w = f - b$, where now $\Phi_{ji} = \varphi(d(x_j, c_i))$. However, when the distance function $d$ is not embeddable in a Euclidean space, the RBFs are no longer necessarily positive definite, and neither is the matrix $\Phi$; hence the inverse matrix $\Phi^{-1}$ needed to determine the weights $w$ may not exist. This difficulty can be overcome by using the pseudoinverse $\Phi^{+}$ of $\Phi$, which always exists, is unique, and coincides with $\Phi^{-1}$ when the inverse exists. It can also be shown that the weights determined by solving the system using the pseudoinverse are the same as those obtained by least-squares minimisation. This way of generalizing RBFNs to structured representations is related to kernel methods in machine learning. However, in those methods the distances used between objects can be difficult to design, because they must be implicitly embeddable in a vector space (i.e., they must give rise to positive-definite kernels), which is not necessary for our approach.
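The sketch below illustrates the generalised fitting and query procedures: the representation enters only through the distance function passed in, and the pseudoinverse (here `numpy.linalg.pinv`) replaces the matrix inverse. The example usage with Hamming distance on binary strings is hypothetical.

```python
import numpy as np

def fit_generalised_rbfn(centers, f, dist, sigma=None):
    """Generalised RBFN: the solution representation enters only through dist(x, c)."""
    f = np.asarray(f, dtype=float)
    D = np.array([[dist(a, b) for b in centers] for a in centers], dtype=float)
    if sigma is None:
        sigma = D.max()                       # global width heuristic
    b = f.mean()                              # bias = mean of known values
    Phi = np.exp(-D**2 / (2 * sigma**2))      # generalised Gaussian RBF matrix
    w = np.linalg.pinv(Phi) @ (f - b)         # pseudoinverse gives least-squares weights
    def model(x):
        d = np.array([dist(x, c) for c in centers], dtype=float)
        return b + np.exp(-d**2 / (2 * sigma**2)) @ w
    return model

# Hypothetical example on binary strings with Hamming distance.
hamming = lambda a, b: sum(x != y for x, y in zip(a, b))
sm = fit_generalised_rbfn(["0000", "1100", "1111"], [3.0, 1.0, 2.0], hamming)
print(sm("1110"))
```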

3. Experiments on Binary Strings

Binary strings are of course a special type of vector. However, they can illustrate the application of generalised SMBOs to combinatorial spaces because their property of being vectors is not utilised. We experimented with the well-known NK-landscape problem [28], which provides a tunable set of rugged, epistatic landscapes over a space of binary strings, and we treat it as having a costly objective function. We evaluated the SMBO algorithm on landscapes of various sizes $n$, each for various values of $k$ (16 combinations of $n$ and $k$ in total).
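For readers unfamiliar with the model, a minimal sketch of an NK-landscape objective is given below, using adjacent epistatic neighbours and lazily drawn random contribution tables; it is illustrative only and does not reproduce the exact instance-generation details of our experiments.

```python
import random

def make_nk_landscape(n, k, seed=0):
    """Fitness = mean of n contributions, each depending on a bit and its k neighbours."""
    rng = random.Random(seed)
    tables = [{} for _ in range(n)]

    def fitness(bits):
        total = 0.0
        for i in range(n):
            # Neighbourhood of locus i: itself plus the next k loci (cyclically).
            key = tuple(bits[(i + j) % n] for j in range(k + 1))
            if key not in tables[i]:
                tables[i][key] = rng.random()   # lazily drawn random contribution
            total += tables[i][key]
        return total / n

    return fitness

f = make_nk_landscape(n=10, k=2)
print(f([0, 1] * 5))
```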

We used a standard SMBO algorithm (Algorithm 1). The SM is an RBFN model fitted to the data-points using the simplified learning procedure presented in the previous section. The centers of the RBFs are the data-points. The widths of the RBFs are all set to the same value, chosen in relation to the maximum distance between any two centers. Thus each RBF extends over all the centers, allowing the known function value at each center to contribute to the prediction of the function value at any point in the landscape near the given center. The value of the bias term is set to the average of the function values at all the known data-points; thus the SM returns this value at any point outside the influence of all centers. The coefficients of the RBFs are determined by least-squares minimisation, as described in the previous section.

We set the other parameters as functions of the problem size $n$. Our aim is to find the best solution using a number of candidate solutions that is quadratic in the problem size; that is, we set the number of allowable expensive function evaluations to $n^2$. Initially, 2 data-points are sampled, and the remaining sample points are suggested by the SM. To search the SM, we use a standard generational evolutionary algorithm with tournament selection (tournament size 2), uniform crossover at a rate of 0.5, and bitwise mutation at a per-bit rate set as a function of $n$. The population size and the number of generations are both set to the same value, also chosen as a function of $n$. If the predicted value of the best solution found by the SM is better than the best value at any of the known data-points, then the model could extrapolate from the data, and that solution is evaluated using the expensive objective function. Otherwise, a point is chosen at random and evaluated with the expensive objective function in an attempt to gather more data about undersampled regions.
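A small sketch of this acceptance rule (minimisation assumed; the helper names are hypothetical placeholders):

```python
def next_point_to_evaluate(model, data, search_model, random_solution):
    """Evaluate the EA's best SM solution only if the model extrapolates beyond the data."""
    best_known_value = min(y for _, y in data)   # best expensive value seen so far
    candidate = search_model(model)              # EA run on the cheap surrogate
    if model(candidate) < best_known_value:
        return candidate                         # model extrapolates: trust its suggestion
    return random_solution()                     # otherwise explore an undersampled region
```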

We compared SMBO with random search (RS), a standard (1+1) evolutionary algorithm ((1+1) EA), and a generational evolutionary algorithm (EA), all using the expensive objective function directly. We expect evolutionary algorithms to outperform random search, but we include the latter as it can do well with small samples. We allowed all the algorithms the same budget of $n^2$ evaluations of the expensive objective function.

The (1+1) EA has a population of a single individual and uses bitwise mutation with a fixed bit-flip probability. The generational EA runs with a fixed population size and number of generations and uses tournament selection with tournament size 2, bitwise mutation, and uniform crossover. For each of the 16 combinations of $n$ and $k$, we generated a single fitness landscape and ran all four algorithms 50 times each. We also estimated the global optimum using an evolutionary algorithm with 1,000 individuals and 1,000 generations.

Table 1 shows that, for each combination of $n$ and $k$, SMBO consistently found the best solution and the best average solution. Furthermore, in 12 out of 16 cases, SMBO was able to find the estimated real optimum. As the problem size increases, the advantage in favor of SMBO increases. As expected, as the ruggedness of the problem increases, the search algorithms get less close to the estimated real optimum. As for the other algorithms in the comparison, the population-based EA generally did better than the (1+1) EA and RS, especially on larger problems. Perhaps surprisingly, RS often did better than the (1+1) EA. It seems that the (1+1) EA can easily get trapped at local optima, especially when the sample and problem sizes are large.

4. Experiments on Permutations

This section greatly extends our previous work [21]. Experiments were carried out on six standard quadratic assignment problems (QAPs), kra30a, kra32, lipa30a, nug30, ste36a, and tho30 (where the number in the name indicates the problem size), and on two instances of a unimodal problem on permutations of size 30, in which the fitness of a permutation, to be minimised, is given by its distance to some fixed permutation. This unimodal problem can be seen as a generalization of the OneMax problem for binary strings [33], in which the fitness of a solution is the number of 1s in the string. This is in turn equivalent to a problem in which the fitness of a solution is given by the Hamming distance from the solution to the string with all bits set to 1. By the symmetry of Hamming space, this problem is again equivalent to any problem in which the all-ones string is replaced by some other target string. The two instances of the unimodal problem are obtained by using two different distance functions on permutations: Hamming distance (unih30) and swap distance [27] (unis30). We address this unimodal problem to test the SMBO on a fitness landscape with an explicit and visible topography. We treat the problems in the test-bed as having costly objective functions and leave testing the SMBO on real-world problems with expensive objective functions as future work. Furthermore, using a larger test-bed and testing the scalability of SMBO with respect to instance size would be desirable. However, we found that this would take an excessive amount of time, as the SM is searched every time it is used to suggest a solution to test with the expensive objective function. We will therefore also consider a larger test-bed and a scalability analysis in future work.
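For reference, minimal sketches of the two kinds of objective used in this section, with hypothetical instance data: the QAP cost of a permutation and the unimodal distance-to-target fitness.

```python
def qap_cost(perm, flow, dist):
    """Quadratic assignment objective: sum over i, j of flow[i][j] * dist[perm[i]][perm[j]]."""
    n = len(perm)
    return sum(flow[i][j] * dist[perm[i]][perm[j]]
               for i in range(n) for j in range(n))

def unimodal_fitness(perm, target, metric):
    """Unimodal problem: fitness (to be minimised) is the distance to a fixed target permutation."""
    return metric(perm, target)
```

Here `metric` can be, for instance, the Hamming or swap distance on permutations sketched earlier.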

We refer to the SMBO algorithm that uses Hamming distance as the Hamming-based SMBO and to the one that uses swap distance as the swap-based SMBO. Clearly, the choice of a distance well suited to the problem at hand is crucial to obtain an SM able to make meaningful predictions and to guide the search of the SMBO appropriately. In this paper, we limit ourselves to experiments with these two distances. In future work, we will investigate other distances and other problems in an attempt to find a rule for selecting a priori a good distance for a given type of problem.

As in the previous section on binary strings, we used a standard SMBO algorithm (Algorithm 1) with an RBFN model fitted to the available data-points using the simplified learning procedure. All the RBFs share the same width. For one of the two distances, this width was set in relation to the maximum distance across all centers; this setting did not work well for the other distance, for which a different value produced better results. The value of the width greatly affects the accuracy of the predictions of the SM, so it needs to be tuned, but it might in future be “learnt” from the sampled data-points. The value of the bias term is set to the average of the known data-point values, and the coefficients of the RBFs are determined by least-squares minimisation, as in the previous section.

The other settings of the SMBO are as follows. For all problems, the allowance of expensive function evaluations is set to 100. Initially, 10 data-points are sampled, and the remaining 90 sample points are suggested by the SM. To search the SM, we used a memetic algorithm on permutations with truncation selection, cycle crossover, swap mutation at a mutation rate of 0.01, and local search based on the 2-opt neighborhood. The population size and the number of generations were both set to 20, and 10 new offspring were generated in each generation, allowing an adequate solution to be obtained from the SM. The solution with the best predicted objective value is evaluated with the expensive objective function, provided it has not been sampled before. If it has, the second-best is evaluated, provided that it has not been sampled before; otherwise the third-best, and so on.
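A small sketch of this duplicate-avoidance rule, which returns the best-ranked candidate not yet evaluated:

```python
def select_unsampled(ranked_candidates, already_sampled):
    """Return the best-predicted candidate that has not been evaluated before."""
    sampled = {tuple(s) for s in already_sampled}
    for candidate in ranked_candidates:          # ordered best-first by SM prediction
        if tuple(candidate) not in sampled:
            return candidate
    return None                                  # all candidates already evaluated
```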

We compared the SMBO algorithms with random search (RS) and a standard genetic algorithm (GA), both using the expensive objective function directly. We allowed all the algorithms the same number of evaluations of the expensive objective function.

The GA had a population of 10 individuals and ran for 18 generations, with 5 new individuals generated in each generation. It used truncation selection, swap mutation with a probability of 0.01, and cycle crossover. We did 50 runs for each algorithm and problem.

The results, presented in Table 2, consistently rank the Hamming-based SMBO as the most effective algorithm, followed by the GA and the swap-based SMBO, and then random search (RS). Clearly, the SM based on Hamming distance is effective, and it is better than the one based on swap distance by a surprising margin, considering that Hamming and swap distance are closely related. The Hamming-based SMBO even outperforms the swap-based SMBO on the unimodal landscape under swap distance (unis30), which we expected to favor the latter.

We performed further analyses to try to understand the mechanism of the SMBO algorithms more fully. Firstly, to make sure that the distance metrics used by the SMBOs are suitable, we did a static analysis of the predictive power of the SMs in isolation from the SMBOs. This analysis is presented in Table 3. Looking at the number of significantly positive correlations and the average correlation, it is evident that the Hamming metric always gives better predictions than the swap metric. Neither metric is deceptive, as there are no negative correlations. Fitness-distance correlations [34] of both distance metrics, given in Table 4, show that swap distance has higher correlations for the QAP, even though this is not reflected in its performance. This suggests that static prediction power gives a better indication than fitness-distance correlation of whether a distance metric is appropriate for an SMBO.

Other aspects of the SM may affect the performance of an SMBO. For instance, we would obviously prefer to locate the real optimum of the SM before sampling with the expensive objective function, but the GA which we actually use to search the SM provides no guarantee of finding the optimum or even a good solution. How good are the solutions that it finds? Table 5 shows fitness-distance correlations for the SMs, after training with 100 randomly sampled data-points. All these values are extremely high, suggesting that the GA usually locates very good solutions.

Another attribute of the SM that may affect the performance of an SMBO is the effect of the distance metric and the width parameter on the topography of the model. These choices affect the extrapolative property of the model, which allows an optimum value to be predicted that is better than that of any data-point. Table 6 shows that the Hamming-based SM can extrapolate much more often than the swap-based SM. This may well be one reason why the Hamming-based SMBO outperforms the swap-based one. However, the precise merit of Hamming distance in this regard remains a subject for future work.

5. Experiments on Genetic Programming

Experiments were carried out on standard GP problems (symbolic regression and parity problems) and on a unimodal problem, in which the fitness of a tree (to be minimised) is given by its distance to a given target tree. This last problem can again be seen as a generalization of the OneMax problem for binary strings [33].

We have used structural Hamming distance [35] as the metric of distance between two GP trees: this is a parameterless variant of the well-known structural distance for GP trees [36].
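A sketch of the recursive definition of structural Hamming distance, assuming trees represented as (symbol, children) pairs; this follows our reading of [35] and is illustrative rather than a reference implementation.

```python
def structural_hamming_distance(t1, t2):
    """Structural Hamming distance between trees given as (symbol, [children]) pairs
    (a sketch of the recursive definition in [35]; values lie in [0, 1])."""
    (sym1, kids1), (sym2, kids2) = t1, t2
    if len(kids1) != len(kids2):
        return 1.0                                   # different arities: maximally distant
    mismatch = 0.0 if sym1 == sym2 else 1.0
    if not kids1:                                    # both nodes are leaves
        return mismatch
    m = len(kids1)
    return (mismatch + sum(structural_hamming_distance(a, b)
                           for a, b in zip(kids1, kids2))) / (m + 1)

# Hypothetical example: (x + y) versus (x * y) gives 1/3.
t_a = ("+", [("x", []), ("y", [])])
t_b = ("*", [("x", []), ("y", [])])
print(structural_hamming_distance(t_a, t_b))
```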

As in previous sections, we used a standard SMBO with an RBFN model fitted using the simplified learning procedure. The RBFs all have the same width, set in relation to the maximum distance across all centers. The value of the bias term is set to the average function value of the known data-points. The coefficients of the RBFs in the linear model are determined by least-squares minimisation.

We set the other parameters as functions of the maximum depth $D$ of the trees in the initial population, which is likely to determine the proportion of the search space that will actually be visited. The maximum number of nodes in a binary tree of maximum depth $D$ grows exponentially with $D$. The number of expensive function evaluations allowed was set in proportion to this maximum number of nodes; thus our aim was to get each algorithm to produce the best solution in a time linearly proportional to the maximum size of the trees in the initial population. We set the initial sample size to 2 data-points, and the remaining points are suggested by the SM. To search the SM, we use a standard GP with tournament selection (tournament size 2), subtree crossover at a rate of 0.8, subtree mutation at a rate of 0.17, and the reproduction operator at a rate of 0.03. The population size and the number of generations were both set to the same value, which we expected to provide GP with enough trials to locate a good solution of the SM. If the predicted value of the best solution found by the SM is better than the best value at any of the known data-points, then the model could extrapolate from the data, and that solution is evaluated using the expensive objective function. Otherwise, a point is chosen at random and evaluated with the expensive objective function in an attempt to gather more data about undersampled regions.

We compare the SMBO algorithm with random search (RS) and a standard GP, both using the expensive objective function directly. We allowed all the algorithms the same number of evaluations of the expensive objective function. For fairness, the population size and the number of generations of the GP are assigned so that their product exactly equals this evaluation budget. The GP uses tournament selection with a tournament of size 2, subtree mutation with a probability of 0.17, subtree crossover at a rate of 0.8, and the reproduction operator at a rate of 0.03. For each problem, we varied the maximum depth $D$ between 3 and 7 and did 50 runs.

The results given in Table 7 make it immediately apparent that all algorithms get better results as the maximum depth $D$ (and hence the evaluation budget) is increased, as we would expect. On the unimodal problem, looking at the average results, SMBO is consistently the best, followed by RS and finally by GP. The unimodal problem has the best fitness-distance correlation with structural Hamming distance, suggesting that this metric is well suited for applying SMBO to this problem. This suggests, more generally, that a good distance metric for SMBO should have good fitness-distance correlation for the problem at hand.

Surprisingly, RS does better than GP, which appears not to have had enough fitness evaluations available to get the evolutionary process properly started, especially when the sample and problem sizes were large. On the parity problem, SMBO wins again but with a smaller margin. Again, GP is worse than RS; however, if it is allowed a larger budget of expensive evaluations, its performance matches RS, while more evaluations improve the performance of SMBO even further. On the symbolic regression problem, RS performs the best and GP the worst, although more evaluations allow SMBO and GP to outperform RS. This suggests that structural Hamming distance is not particularly suitable for applying the SMBO to this problem.

There are many possible distances for parse trees that we could use as a basis for the SMBO. In future work, we should select distances suited to the problem at hand, that is, distances that give rise to smoother and more unimodal landscapes. Recently, Moraglio et al. [37] introduced a distance for GP, the semantic distance, which turns any GP problem into a unimodal problem, so it could be interesting to use this distance as a basis for the SMBO.

6. Conclusions and Future Work

New applications are opened up by extending surrogate model-based optimisation (SMBO) to more complicated representations which cannot be naturally mapped to vectors of features. We have put forward a conceptually simple, formal, general, and systematic approach to adapting SMBO using radial basis function (RBF) networks to any target representation. Any algorithm that can be written in terms of Euclidean distances between candidate solutions can be generalised by replacing the Euclidean distance function with a generic metric appropriate to the target representation (e.g., an edit distance). RBF networks can be naturally generalised to encompass any representation because both the approximating model and the learning of the model parameters can be cast in a completely representation-independent way, relying only on distance relations between training instances and query instances.

We have validated the framework experimentally on three representations. First, we considered the binary string representation endowed with Hamming distance and tested the SMBO on NK-landscapes, finding that, with the same budget of expensive function evaluations, the SMBO consistently performs best in comparison with other search algorithms. Second, we considered the permutation representation endowed with Hamming distance and with swap distance and tested the SMBO on the quadratic assignment problem and on unimodal problems; again, with the same budget of expensive function evaluations, the SMBO with Hamming distance consistently performs best in comparison with other search algorithms. Surprisingly, the SMBO based on swap distance does not work as well as the SMBO based on Hamming distance. We have presented an analysis in an attempt to elucidate the causes of this difference in performance; further investigation is required to pinpoint the structural difference between Hamming distance and swap distance that gives rise to it. Lastly, as an experimental validation of the framework on a nontrivial discrete space and structured representation, we considered GP trees endowed with the structural Hamming distance and tested the SMBO on a test-bed of standard GP problems, obtaining that, with the same budget of expensive function evaluations, the SMBO compares well with other search algorithms. These results suggest that our approach has the potential to solve real-world combinatorial optimisation problems with complicated solution representations and nontrivial discrete search spaces.

Much work remains to be done. Firstly, we plan to look at further well-known permutation and GP problems and consider different distance metrics. For instance, the traveling salesman problem may be cast in terms of a distance based on the 2-opt move. Then we intend to consider problems with other complicated nonvectorial representations, such as variable-length sequences. Our eventual aim is to address some challenging real-world problems in a new way. We will also experiment with different types of RBF and more complex learning processes (i.e., learning the centers and the widths of the RBFs). Lastly, we will attempt the generalization of more sophisticated interpolation and regression methods, including Gaussian process regression, which is a state-of-the-art method in machine learning.

Disclosure

A preliminary version of this paper appeared in Proceedings of the Eleventh European Conference on Evolutionary Computation in Combinatorial Optimization, pp. 142–154, 2011, and in Proceedings of the Genetic and Evolutionary Computation Conference (Companion Material), pp. 133-134, ACM SIGEVO, 2011.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.