#### Abstract

Evolution strategies are successful global optimization methods. In many practical numerical problems constraints are not explicitly given. Evolution strategies have to incorporate techniques to optimize in restricted solution spaces. Well-known constraint-handling techniques are penalty and multiobjective approaches. Past work has shown that, in particular, an ill-conditioned alignment between the coordinate system of Gaussian mutation and the constraint boundaries leads to premature convergence. Covariance matrix adaptation evolution strategies offer a solution to this alignment problem. Finally, metamodeling of the constraint boundary leads to significant savings of constraint function calls and to a speedup by repairing infeasible solutions. This work gives a brief overview of constraint-handling methods for evolution strategies by demonstrating the approaches experimentally on two exemplary constrained problems.

#### 1. Introduction

Many continuous optimization problems in practical applications are subject to constraints [1]. Constraints can make an easy problem hard and hard problems even harder. Surprisingly, little research effort has been devoted in the past to the development of efficient and effective constraint-handling techniques, in contrast to the energy invested in the development of new methods for unconstrained optimization. This observation also holds true in the field of evolutionary computation. This paper is devoted to constraint-handling techniques that have been developed in particular for evolution strategies. It summarizes our line of research of the last years in the field of constraint handling and premature step-size reduction [2–7]. In real-valued solution spaces a constrained problem can be hard to solve due to a coordinate system alignment problem that leads to premature fitness stagnation. We review not only general approaches like penalty functions, but also specialized approaches that have been developed to solve coordinate alignment problems. For each constraint-handling method we give a summary, state experimental results on two exemplary test functions, and discuss advantages and disadvantages.

The remainder of this section gives a brief introduction to evolution strategies, constrained problems, and a taxonomy of constraint-handling techniques. Section 2 introduces three examples from the famous family of penalty functions. A bioinspired multiobjective approach is reviewed in Section 3. The methods that concentrate on coordinate system alignment are presented in Section 4, while Section 5 is devoted to metamodeling of the constraint boundary.

##### 1.1. Evolution Strategies

Evolution strategies (ES) are a family of strong stochastic methods for global optimization. Developed by Rechenberg [8] and Schwefel [9], they have become famous for global numerical optimization, that is, nonconvex optimization in $\mathbb{R}^N$. In each iteration $\lambda$ offspring solutions are produced and the best $\mu$ are selected as parents for the following generation. An important basis of ES is the self-adaptive Gaussian mutation operator that we briefly repeat in this context. An individual of a $(\mu, \lambda)$-ES with the $N$-dimensional objective variable vector $\mathbf{x}$ is mutated in the following way:

$$\mathbf{x}' = \mathbf{x} + \mathbf{z}, \qquad \mathbf{z} = \left(\sigma_1 \mathcal{N}_1(0,1), \ldots, \sigma_N \mathcal{N}_N(0,1)\right),$$

while $\mathcal{N}_i(0,1)$ delivers a Gaussian distributed number. The strategy parameter vector undergoes mutation, a typical variation of $\boldsymbol{\sigma}$, with log-normal mutation:

$$\boldsymbol{\sigma}' = e^{\tau_0 \mathcal{N}_0(0,1)} \left(\sigma_1 e^{\tau_1 \mathcal{N}_1(0,1)}, \ldots, \sigma_N e^{\tau_1 \mathcal{N}_N(0,1)}\right).$$

As crossover operator, arithmetic recombination is applied in most cases. For a detailed introduction to ES we recommend the introduction by Beyer and Schwefel [10] or the introductory chapter on ES in Eiben's book [11].
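The self-adaptive mutation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the learning-rate settings `tau0` and `tau1` are common textbook choices and are not taken from this paper.

```python
import math
import random

def self_adaptive_mutation(x, sigma, rng):
    """Mutate the step sizes log-normally, then mutate x with the new step sizes."""
    n = len(x)
    # Assumed learning rates (a common recommendation, not prescribed by the text).
    tau0 = 1.0 / math.sqrt(2.0 * n)
    tau1 = 1.0 / math.sqrt(2.0 * math.sqrt(n))
    # One global log-normal factor shared by all step sizes.
    common = math.exp(tau0 * rng.gauss(0.0, 1.0))
    # Each step size additionally gets its own log-normal factor.
    new_sigma = [s * common * math.exp(tau1 * rng.gauss(0.0, 1.0)) for s in sigma]
    # Gaussian mutation of the objective variables with the mutated step sizes.
    new_x = [xi + si * rng.gauss(0.0, 1.0) for xi, si in zip(x, new_sigma)]
    return new_x, new_sigma

rng = random.Random(1)
child, child_sigma = self_adaptive_mutation([0.0, 0.0, 0.0], [1.0, 1.0, 1.0], rng)
```

Because the step sizes are multiplied by positive factors, they stay positive without any explicit clamping.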

##### 1.2. Constrained Problems

In the field of evolutionary computation the constraints are typically not assumed to be available in explicit formal form. Rather, the constraints are treated as black boxes: a vector fed to the black box just returns a numerical or boolean value. If there is a numerical response, then the information about a positive value can be used to assess the distance to feasible solutions. A number of constraint-handling methods exploit this information. In general, the constrained continuous nonlinear programming problem is defined as follows: find a solution $\mathbf{x} = (x_1, \ldots, x_N)^T$ in the $N$-dimensional solution space $\mathbb{R}^N$ that minimizes the objective function $f(\mathbf{x})$, in symbols:

$$f(\mathbf{x}) \to \min, \quad \mathbf{x} \in \mathbb{R}^N, \quad \text{subject to} \quad g_i(\mathbf{x}) \le 0, \; i = 1, \ldots, n_1, \quad h_j(\mathbf{x}) = 0, \; j = 1, \ldots, n_2.$$

A feasible solution satisfies all inequality and equality constraints. A feasible solution that minimizes $f$ is termed an optimal solution. If $g_i(\mathbf{x}^*) = 0$ for some inequality constraint at an optimal solution $\mathbf{x}^*$, then the constraint is said to be active. We assume that the evaluations of the constraint functions are computationally expensive and that the return values are boolean, indicating only whether the solution is feasible or not. In order to develop more advanced constraint-handling techniques, for example, repair or feasibility check approaches, metamodels of the constraint function can be built under certain assumptions, for example, that the constraint is linear or quadratic.

The following two test functions clearly demonstrate the phenomenon of premature fitness stagnation that will be discussed in the following sections and that is a challenge for most constraint-handling techniques. Both functions will be used for the discussion of the methods reviewed in this paper. Problem 2.40, taken from Schwefel's artificial test problems [10], exhibits a linear objective function and an optimum with five active linear constraints. The problem is to minimize

subject to

with minimum and .

The second problem is called the tangent problem (TR). It is based on the sphere model subject to one linear constraint: with and . The success rates on TR deteriorate as the optimum is approached. In this paper we will focus on the TR problem with $N = 2$ dimensions, denoted as TR2.

##### 1.3. A Brief Taxonomy of Constraint-Handling Methods

A variety of constraint-handling methods for evolutionary algorithms have been developed in the last decades. Most of them can be classified into five main types of concepts.

(1) *Penalty functions* decrease the fitness of infeasible solutions by taking the number of violated constraints or the distance to feasibility into account [12–16]. The history of penalty functions began with the sequential unconstrained minimization technique by Fiacco and McCormick [13], in which the constrained problem is solved by a sequence of unconstrained optimizations while the penalty factors are intensified stepwise. In similar approaches penalty factors can be defined statically [14] or depending on the number of satisfied constraints [16]. They can also depend dynamically on the number of generations, as Joines and Houck propose [15]:

$$\tilde{f}(\mathbf{x}) = f(\mathbf{x}) + (C \cdot t)^{\alpha} \cdot G(\mathbf{x})$$

at generation $t$. The parameters $C$ and $\alpha$ are user defined; typical settings are $C = 0.5$ and $\alpha = 1$ or $2$. $G(\mathbf{x})$ is a measure for the constraint violation. A frequent definition is $G(\mathbf{x}) = \sum_{i} \max[0, g_i(\mathbf{x})]^{\beta} + \sum_{j} |h_j(\mathbf{x})|^{\gamma}$ with factors $\beta$ and $\gamma$. Penalties can be adapted according to an external cooling scheme [15] or by adaptive heuristics [12]. In the death penalty approach [5] infeasible solutions are rejected and new solutions are created until enough feasible ones exist. In the segregated genetic algorithm by Riche et al. [17] two penalty functions, a weak one and an intense one, are calculated in order to surround the optimum. In the coevolutionary penalty function approach by Coello Coello [18] the penalty factors of an inner evolutionary algorithm are adapted by an outer evolutionary algorithm. Some methods are based on the assumption that any feasible solution is better than any infeasible one [19, 20]. Examples are the metric penalty functions by Hoffmeister and Sprave [21]: feasible solutions are compared using the objective function, while infeasible solutions are compared considering the satisfaction of constraints. The approach by Oyman et al. [22] is similar. Their fitness function depends on the parent and offspring populations at every generation and is therefore a dynamic approach. In his stochastic ranking approach, Runarsson [23] uses metamodels to predict both fitness function values and penalties based on constraint violations. From this point of view the approach is related to methods that are based on metamodeling the constraint boundary.

(2) *Repair algorithms* either replace infeasible solutions or only use the repaired solutions for the evaluation of their infeasible counterparts [24, 25]. This class of algorithms can also be seen as local search methods that reduce the constraint violation. The repair algorithm generates a feasible solution from an infeasible one. In the Baldwinian case, the fitness of the repaired solution replaces the fitness of the original solution. In the Lamarckian case, the feasible solution overwrites the infeasible one. In general, defining a repair algorithm can be as complex as solving the problem itself.

(3) *Decoder functions* map genotypes to phenotypes which are guaranteed to be feasible. Decoders establish a relationship between the constrained solution space and an artificial solution space that is easier to handle [25–27]. Since a genotype is mapped to a feasible phenotype, even quite different genotypes may be mapped onto the same phenotype. Eiben and Smith [11] define decoders as a class of mappings from the genotype space $S'$ to the feasible regions $F$ of the solution space $S$ with the following properties: every $z \in S'$ must map to a single solution $s \in F$, every solution $s \in F$ must have at least one representation $s' \in S'$, and every $s \in F$ must have the same number of representations in $S'$ (this need not be one).

(4) *Feasibility preserving representations and operators* force candidate solutions to be feasible [28, 29]. A famous example is the GENOCOP algorithm [27] that reduces the problem to convex search spaces and linear constraints. A predator-prey approach to handle constraints is proposed by Paredis [28] using two separate populations. Schoenauer and Michalewicz [29] propose special operators that are designed to search regions in the vicinity of active constraints. A comprehensive overview of decoder-based constraint-handling techniques is given by Coello [25] and also by Michalewicz and Fogel [27].

(5) *Multiobjective optimization* techniques are based on the idea of handling each constraint as an objective [30–35]. Under this assumption many multiobjective optimization methods can be applied. Such approaches were used by Parmee and Purchase [34], Jimenez and Verdegay [32], Coello Coello [31], and Surry et al. [36]. In the behavioral memory method by Schoenauer and Xanthakis [35] the EA concentrates on minimizing the violation of each constraint in a certain order and optimizes the objective function in the last step.

Of course, constraint-handling methods exist that do not fit into this taxonomy. Montes and Coello Coello [37] introduced a technique based on a multimembered ES with a feasibility comparison mechanism. The $\varepsilon$-constrained differential evolution approach by Takahama and Sakai [38] combines the usage of an $\varepsilon$ tolerance for equality constraints with differential evolution. The dynamic multiswarm particle swarm optimizer by Liang and Suganthan [39] makes use of a set of subswarms concentrating on different constraints. It is combined with sequential quadratic programming as a local search method. The approach of Mezura-Montes et al. [40] combines differential evolution, different mutation operators to increase the probability of producing better offspring, three selection criteria, and a diversity mechanism. Mezura-Montes [41] gives a survey of constraint-handling methods for evolutionary algorithms.

In the following sections we compare various approaches from different fields, in particular with regard to the premature step-size reduction problem mentioned above. The next section demonstrates this problem experimentally.

#### 2. Penalty Methods

Evolutionary search is guided by the quality of its candidate solutions. Consequently, an obvious approach to constraint handling is to deteriorate the fitness of infeasible solutions [11, 25]. Here we review three penalty functions as examples. Death penalty is the simplest way, but wastes comparatively many constraint function calls. Section 2.2 reviews a typical penalty technique in which the solutions are penalized with regard to the progress of the search. The death penalty step control approach, which prevents premature step-size reduction, is reviewed in Section 2.3.

##### 2.1. Death Penalty

First of all, we analyze the behavior of death penalty, that is, simply discarding infeasible offspring solutions [42, 43]. This is the first time we can observe premature fitness stagnation. Table 1 shows the corresponding results of a (15,100)-ES with the following settings on problems 2.40 and TR2. We use the mutation introduced in Section 1.1 with settings and and arithmetic recombination with randomly chosen parents. All experiments in this article use the same experimental settings unless stated otherwise. The termination condition is fitness stagnation: the algorithms terminate if the fitness gain from generation to generation falls below . In this case the magnitude of the step sizes is too small to effect further improvements. Parameters best, mean, and dev describe the achieved fitness (difference between the optimal fitness and the fitness of the best solution ) of 25 experimental runs, while ffe counts the average number of fitness function evaluations and cfe the average number of constraint function evaluations. The results show that death penalty is not able to approximate the optimum of the problem satisfactorily. The relatively high standard deviations dev show that the results vary considerably from run to run.

In summary, the advantage of *death penalty* is that it is easy to implement. The disadvantages are that *death penalty* is inefficient, as many infeasible trials are wasted, and that it suffers from premature convergence. The following methods aim at preventing premature convergence.
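To make the cost of discarding infeasible offspring concrete, the following minimal sketch resamples a mutation until it is feasible and counts the constraint function evaluations spent on the way. The function name, the single shared step size, and the toy constraint are illustrative assumptions, not part of the experimental setup above.

```python
import random

def death_penalty_offspring(parent, sigma, is_feasible, rng, max_tries=10_000):
    """Resample Gaussian mutations of `parent` until one is feasible.

    Returns the feasible child and the number of constraint function
    evaluations (cfe) spent, including those wasted on rejected trials.
    """
    cfe = 0
    for _ in range(max_tries):
        child = [xi + sigma * rng.gauss(0.0, 1.0) for xi in parent]
        cfe += 1
        if is_feasible(child):
            return child, cfe
    raise RuntimeError("no feasible offspring found")

# Hypothetical toy constraint: a point is feasible iff x1 + x2 >= 2.
rng = random.Random(0)
child, cfe = death_penalty_offspring(
    [1.0, 1.0], 0.5, lambda x: x[0] + x[1] >= 2.0, rng
)
```

With the parent placed directly on the boundary, roughly every second trial is rejected, so `cfe` grows accordingly.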

##### 2.2. Dynamic Penalty Functions

The question arises whether dynamic penalty functions also suffer from premature convergence. To answer this question we tested the penalty function by Joines and Houck [15], which is based on adding a penalty to infeasible solutions:

$$\tilde{f}(\mathbf{x}) = f(\mathbf{x}) + (C \cdot t)^{\alpha} \cdot G(\mathbf{x})$$

with parameters $C$, $\alpha$ and the constraint violation $G(\mathbf{x})$. The penalty depends on the number of iterations $t$ and increases in the course of time. Table 2 shows the experimental analysis of the penalty function on 2.40 and TR2 with and . Again, the algorithm is based on a (15,100)-ES with the same settings as in Section 2.1. The algorithm stops earlier, but the results are even worse and show that premature fitness stagnation occurs here, too. The reason is quite obvious: the success rate in the vicinity of the infeasible search space remains small because of the penalty, no matter whether infeasible solutions are discarded or penalized. Consequently, we can summarize as follows: *dynamic penalty functions* are easy to implement, and no feasible starting point is required. The disadvantage is that *dynamic penalty functions* suffer from premature convergence. Related work on penalty functions can be found in [12–16].
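A dynamic penalty of the Joines and Houck type can be sketched as follows. The sketch assumes the commonly cited form in which a violation measure is weighted by a generation-dependent factor; the function names, the sign convention (a constraint is satisfied when g(x) <= 0), and all parameter values in the example are assumptions chosen for illustration.

```python
def constraint_violation(x, inequalities, equalities=(), beta=1.0, gamma=1.0):
    """Summed violation G(x), assuming g(x) <= 0 and h(x) = 0 mean 'satisfied'."""
    g_part = sum(max(0.0, g(x)) ** beta for g in inequalities)
    h_part = sum(abs(h(x)) ** gamma for h in equalities)
    return g_part + h_part

def dynamic_penalty_fitness(f, x, t, inequalities, equalities=(), C=0.5, alpha=2.0):
    """Penalized fitness f(x) + (C * t)**alpha * G(x) for minimization."""
    violation = constraint_violation(x, inequalities, equalities)
    return f(x) + (C * t) ** alpha * violation

# Toy example: sphere function with the hypothetical constraint x1 + x2 >= 2,
# rewritten as g(x) = 2 - (x1 + x2) <= 0.
sphere = lambda x: sum(v * v for v in x)
g = lambda x: 2.0 - (x[0] + x[1])

penalized = dynamic_penalty_fitness(sphere, [0.5, 0.5], t=10, inequalities=[g])
unpenalized = dynamic_penalty_fitness(sphere, [1.0, 1.0], t=10, inequalities=[g])
```

In this toy setting the infeasible point has the better raw fitness (0.5 versus 2.0), but at generation 10 the penalty term outweighs that advantage.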

##### 2.3. Death Penalty Step Control

The most obvious modification to prevent premature step-size reduction is the introduction of a minimum step-size for the mutation strengths with :

This is exactly what the death penalty step control evolution strategy (DSES) is aiming at [5]. Nevertheless, a lower bound on the step sizes will also prevent an unlimited approximation of the optimum when reaching the range of . Consequently, the DSES makes use of a control mechanism to reduce during convergence to the optimal solution. The reduction process depends on the number of infeasible mutations produced when reaching the area of the optimum at the boundary of the feasible solution space. The reduction process of depends on the number of rejected infeasible solutions: in every infeasible trial, is reduced by a factor according to the equation

The DSES is denoted by [; ]-DSES. Again, we show the behavior of the constraint-handling method on problems TR2 and 2.40; see Table 3. The method is able to approximate the optimum of problem 2.40 with comparatively few fitness function evaluations, but at the cost of many constraint function evaluations. Intuitively, the five active linear constraints of problem 2.40 cause many infeasible samples, and so does the step-size reduction mechanism. On *harder* problems like TR2 the low success rates still prevent an arbitrarily exact approximation of the optimal solution. The success of the DSES depends on a proper *reduction speed*, that is, proper parameter settings for and . Too fast a reduction results in premature convergence; too slow a reduction is inefficient. Further experiments on other test functions confirm this picture.

Again, we summarize the following results: *death penalty step control* is easy to implement, and shows an improvement of the approximation of optima with active constraints. But the disadvantages are that *death penalty step control* consumes many constraint function evaluations, its success depends on proper parameter settings, and on some problems it may still suffer from low success rates. A more detailed experimental analysis of the DSES can be found in [4, 5].
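The interplay between the lower bound on the step sizes and its reduction can be illustrated with a small helper class. This is a sketch under assumptions: the class and parameter names are hypothetical, and the option to reduce the bound only after every few infeasible trials (`trials_per_reduction`) is an illustrative generalization.

```python
class MinStepSizeControl:
    """Lower bound on mutation strengths that shrinks with infeasible trials."""

    def __init__(self, eps, factor, trials_per_reduction=1):
        if not 0.0 < factor < 1.0:
            raise ValueError("factor must lie in (0, 1)")
        self.eps = eps                      # current minimum step size
        self.factor = factor                # reduction factor
        self.trials_per_reduction = trials_per_reduction
        self.infeasible_trials = 0

    def report_infeasible(self):
        """Register one rejected infeasible offspring and shrink eps if due."""
        self.infeasible_trials += 1
        if self.infeasible_trials % self.trials_per_reduction == 0:
            self.eps *= self.factor

    def clamp(self, sigma):
        """Raise every step size that fell below the current lower bound."""
        return [max(self.eps, s) for s in sigma]

control = MinStepSizeControl(eps=0.1, factor=0.5, trials_per_reduction=2)
control.report_infeasible()   # first infeasible trial: eps stays at 0.1
control.report_infeasible()   # second infeasible trial: eps is halved
clamped = control.clamp([0.01, 1.0])
```

A factor close to 1 keeps the bound high for a long time, while a small factor lets it collapse quickly, which mirrors the trade-off between inefficiency and premature convergence described above.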

#### 3. A Multiobjective Bioinspired Approach

A familiar way to handle constraints is to treat each constraint, or an aggregated sum of all constraints, and the objective function as separate objectives in a multiobjective formulation. Similar approaches have been introduced in the past [30–35]. Here we review a related constraint-handling technique that treats the fulfillment of constraints and the optimization of the objective function as separate objectives that are optimized using a population-specific selection scheme. The bioinspired concept offers an answer to the problem of low success rates: our two-sex evolution strategy (Kramer and Schwefel [5]) allows candidate solutions to cross the constraint boundary. The mechanism that enforces the approach to the optimum stems from nature: individuals of different sex are selected by different criteria, and nature allows pairing only between individuals of different sex. Transferring this principle to constraint handling means that every individual of the two-sex evolution strategy (TSES) is assigned a feature called *sex*. Individuals with sex $o$ are selected by the objective function. Individuals with sex $c$ are selected by the fulfillment of constraints. The intermediary recombination operator plays a key role: recombination is only allowed between parents of different sex. A few modifications are necessary to prevent an explosion of the step size, that is, a two-step selection operator for individuals of sex $c$ similar to the operator by Hoffmeister and Sprave [21]. For a list of TSES variants and modifications we refer to [5]. The populations are denoted as $(\mu_o + \mu_c, \lambda_o + \lambda_c)$; the index determines the sex, that is, $o$ for objective function and $c$ for constraints.

Table 4 shows the experimental results of the TSES on problems TR2 and 2.40. While death penalty completely fails on problem 2.40, the (8+8, 13+87)-TSES reaches the optimum in every run. A better approximation of the *harder* problem TR2 is now possible as well. Nevertheless, the approximation quality may still be improved, and an analysis on further test problems, which can be found in [4], shows that the TSES is successful on many constrained problems, but not on all. Fortunately, the TSES is quite robust to the chosen population ratios.

We can summarize that the *two-sex evolution strategy* improves the approximation of optima with active constraints, allows infeasible starting points, saves constraint function evaluations, for example, in comparison to the DSES, and is quite robust to parameter changes. But the disadvantages are that the *two-sex evolution strategy* still consumes many fitness function evaluations; on some problems it may still suffer from low success rates, for example, on TR2.

#### 4. Coordinate Alignment Techniques

In real-valued optimization the coordinate system plays an important role. If the coordinate system of the mutation operator, for example, of Gaussian mutation, is not aligned with the coordinate system of the objective function, which is frequently the case in black-box optimization, undesirable effects such as premature step-size reduction may occur.

##### 4.1. Premature Step-Size Reduction

The phenomenon of premature step-size reduction at the constraint boundary has been analyzed in [2], in particular for the condition that the optimum lies on the constraint boundary or even in a vertex of the feasible search space. In such cases the evolutionary algorithm frequently suffers from low success probabilities near the constraint boundaries. Under simple conditions, that is, a linear objective function, linear constraints, and a comparatively simple mutation operator, the occurrence of premature convergence due to a premature decrease of step sizes was proven. Figure 1 illustrates the reason for premature step-size reduction. We assume the simplified case in which mutations are produced on the boundary of the circles. In the case of large mutation strengths, the region of success, that is, the marked part of the circle, is smaller relative to the whole circle than in the case of small mutation strengths, where the circle is not cut by the constraint boundary. Consequently, the probability of producing successful mutations is higher for small step sizes, and these mutations are favored during optimization. This is a coordinate system alignment problem: with independent step sizes and coordinate rotation the mutation circle can adapt to a mutation ellipsoid whose region of success is not restricted by the constraint boundary.
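The geometric argument can be checked numerically with a small toy model. The sketch below places mutations on a circle around a parent at the origin and counts the fraction that is both feasible and improving; the linear objective and the position of the constraint boundary are assumptions chosen only to reproduce the qualitative effect.

```python
import math

def success_rate(sigma, dist_to_boundary, n_angles=3600):
    """Fraction of points on a circle of radius sigma around the parent (at the
    origin) that are feasible and improve a linear objective.

    Assumed toy setting: minimize f(x, y) = 0.5 * x + y, and a point is
    feasible iff y >= -dist_to_boundary.
    """
    successes = 0
    for k in range(n_angles):
        theta = 2.0 * math.pi * k / n_angles
        x, y = sigma * math.cos(theta), sigma * math.sin(theta)
        feasible = y >= -dist_to_boundary
        improves = 0.5 * x + y < 0.0   # parent fitness is f(0, 0) = 0
        if feasible and improves:
            successes += 1
    return successes / n_angles

small = success_rate(sigma=0.5, dist_to_boundary=1.0)  # circle not cut by the boundary
large = success_rate(sigma=5.0, dist_to_boundary=1.0)  # circle cut by the boundary
```

As long as the circle is not cut, half of all mutations succeed; once the step size exceeds the distance to the boundary, the success fraction drops well below one half, so selection favors the small step sizes.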

Arnold and Brauer [44] analyzed the behavior at the boundary of linear constraints and modeled the distance between the search point and the constraint plane with a Markov chain. Furthermore, they discuss the working of step length adaptation mechanisms based on success probabilities.

##### 4.2. Biased Mutation

The shape of the standard mutation ellipsoid is Gaussian. The best way to improve the success rate would be a more flexible mutation distribution. Later, we will see that a rotation of the mutation ellipsoid is a reasonable undertaking. But is a deformation also an adequate answer to low success rates? Biased mutation aims at biasing the mean of the Gaussian distribution into beneficial directions self-adaptively [7]. A self-adaptive bias coefficient vector determines the direction of this bias and adds a degree of freedom to the mutation operator. This additional degree of freedom improves the success rate of producing superior offspring. The mutation operator adapts the bias direction within the interval $[-1, 1]$, from $-1$ (for left) to $1$ (for right), in each of the dimensions:

This relative direction must be translated into an absolute bias vector. For this sake the step sizes can be used. For every the bias vector is defined by

Since the absolute value of bias coefficient is less than or equal to 1, the bias will be bound to the step sizes . This restriction prevents the search from being biased too far away from the parent. Hence, the biased mutation works as follows:

To allow self-adaptation, the bias coefficients are mutated in the following *meta-EP* way:

with parameter determining the mutation strength on the bias. The biased mutation operator (BMO) biases the mean of mutation and enables the ES to reproduce offspring outside the standard mutation ellipsoid. To direct the search, the biased mutation enables the center of the ellipsoid to move within the bounds of the regular step sizes . An *adaptive* variant of the originally *self-adaptive* biased mutation is the descent mutation operator. It estimates the descent direction of two population's centers and of successive generations. Let be the center of the population at generation :

The normalized descent direction of two successive population centers and is

Similar to the BMO, the descent mutation operator (DMO) now becomes

The DMO is reasonable as long as the assumption of locality is true: the estimated direction of the global optimum can be derived from local information, that is, the descent direction of two successive populations' centers. Again, we analyze both biased mutation operators on the test problems 2.40 and TR2 and show the results in Table 5. For the sake of adaptation of the bias an increase of offspring individuals to is necessary. The bias mutation parameter is set to the standard setting . Our experiments show that the BMO and the DMO are both able to improve the results on problem 2.40. The experiments reveal that the mutation distribution deformation improves the success rate—intuitively by shifting the center of the mutation ellipsoid so that the latter is not cut off by the infeasible solution space. But the results show that the *harder* problem TR2 is still not easy to approximate.

We can conclude that *biased mutation* improves the approximation of optima with active constraints. *Descent biased mutation* is comparatively efficient, in particular more efficient than the BMO. But the disadvantages are that *biased mutation* consumes many fitness and constraint function evaluations, and on some problems it may still suffer from low success rates.
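Both bias variants can be sketched compactly. The code below is a simplified illustration under assumptions: the variable names, the clipping of the bias coefficients to [-1, 1], and the default mutation strength `gamma=0.1` are choices made for this example and are not confirmed by the text.

```python
import math
import random

def biased_mutation(x, sigma, coeffs, rng, gamma=0.1):
    """BMO sketch: mutate the bias coefficients, then shift the Gaussian mean."""
    # Additive Gaussian mutation of the coefficients, clipped to [-1, 1] (assumed).
    new_coeffs = [min(1.0, max(-1.0, c + gamma * rng.gauss(0.0, 1.0))) for c in coeffs]
    # Absolute bias: bounded by the step sizes because |coefficient| <= 1.
    bias = [c * s for c, s in zip(new_coeffs, sigma)]
    child = [xi + s * rng.gauss(0.0, 1.0) + b for xi, s, b in zip(x, sigma, bias)]
    return child, new_coeffs

def descent_direction(prev_center, center):
    """DMO sketch: normalized direction between two successive population centers."""
    diff = [c - p for p, c in zip(prev_center, center)]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff] if norm > 0.0 else [0.0] * len(diff)

rng = random.Random(3)
child, coeffs = biased_mutation([0.0, 0.0], [1.0, 2.0], [0.9, -0.9], rng)
direction = descent_direction([0.0, 0.0], [3.0, 4.0])
```

The first function needs the coefficients as additional strategy parameters per individual, whereas the second derives one direction for the whole population from information that is already available.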

##### 4.3. Mutation Ellipsoid Rotation

Correlated mutation by Schwefel [45] rotates the axes of the hyperellipsoid to adapt to local properties of the fitness landscape. There are three ways to rotate the mutation ellipsoid with the help of the $N(N-1)/2$ possible rotation angles:

(1) a self-adaptive rotation: in this case the rotation angles become strategy parameters and the algorithm has to tune itself,

(2) a rotation with the help of a coevolutionary approach,

(3) a rotation with a metamodel of the constraint boundary that delivers the orientation of the constraint boundary.

Table 6 shows the experimental results of self-adaptive correlated mutation (SA-ES), a metaevolutionary approach ((3,15(3,15))-MA-ES) [5], and correlated mutation using the metamodel estimator (MM-ES) with and binary search steps. Correlated mutations make use of additional strategy parameters, that is, angles for the rotation of the hyperellipsoid. The self-adaptation process of the SA-ES fails to adapt the angles automatically: the parameter space of step sizes and angles is too large to be adapted successfully by means of self-adaptation. The MA-ES is a nested ES, that is, an outer ES evolves the angles of an inner ES that optimizes the problem itself. Of course, this approach is rather inefficient, as one fitness evaluation of the outer ES causes a whole run of the inner ES on the original problem, but the results demonstrate that the rotation of the hyperellipsoid has a strong impact on the approximation capabilities on problem TR2. The MM-ES approach is capable of estimating the proper rotation angle and controlling the mutation ellipsoid to approximate the optimum. We use the linear metamodel that will be introduced in Section 5. The rotation angles can be computed from the normal vector of the estimated hyperplane and the axes of the mutation ellipsoid. This is an easy undertaking in two dimensions. A comparison between the MM-ES approach with and with binary search steps shows that it is advantageous to invest effort in a precise metamodel estimation: a higher accuracy of the metamodel delivers better approximation results.

Obviously, the coordinate system alignment problem is solved with the mutation ellipsoid rotation. But the self-adaptive rotation does not lead to satisfying results, while the metaevolutionary approach is inefficient. In the following paragraph we will investigate whether the covariance matrix adaptation techniques, which are designed to align coordinate systems, are able to adapt their covariance matrix to constrained problems automatically without a metamodel.
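In two dimensions the rotation of the mutation ellipsoid reduces to a single angle, which the following sketch derives from an estimated normal vector of a linear constraint boundary. The function names are hypothetical, and aligning the first ellipsoid axis with the boundary is one plausible convention assumed here for illustration.

```python
import math
import random

def boundary_angle(normal):
    """Angle (radians) of the direction along a linear boundary with this normal."""
    return math.atan2(normal[1], normal[0]) - math.pi / 2.0

def rotated_mutation_2d(x, sigma, angle, rng):
    """Sample an axis-aligned Gaussian step with step sizes (s1, s2),
    then rotate it counterclockwise by `angle`."""
    z1 = sigma[0] * rng.gauss(0.0, 1.0)
    z2 = sigma[1] * rng.gauss(0.0, 1.0)
    c, s = math.cos(angle), math.sin(angle)
    return [x[0] + c * z1 - s * z2, x[1] + s * z1 + c * z2]

# Degenerate ellipsoid (second step size zero) aligned with a boundary whose
# normal is (1, 1): every step then moves parallel to the boundary.
rng = random.Random(5)
angle = boundary_angle((1.0, 1.0))
child = rotated_mutation_2d([0.0, 0.0], (1.0, 0.0), angle, rng)
```

With a large step size along the boundary and a small one orthogonal to it, most mutations stay on the feasible side while still making progress, which is the effect the rotation aims for.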

##### 4.4. Covariance Matrix Techniques

Past research on constraint handling has largely neglected covariance matrix adaptation techniques. It is astonishing that no sophisticated constraint-handling techniques for these algorithms have been introduced so far. Nevertheless, we will now analyze whether the coordinate system alignment problem can be solved with covariance matrix adaptation using death penalty. The idea of covariance matrix adaptation techniques is to adapt the distribution of the mutation operator such that the probability of reproducing steps that led to the current population increases. This idea is similar to estimation of distribution approaches. The covariance matrix adaptation evolution strategy (CMA-ES) was introduced by Hansen [46] and Ostermeier [47]. The results of the CMA-ES on problems TR2 and 2.40 can be found in Table 7. Remarkably, the CMA-ES is able to cope with the low success rates around the optimum of the TR problem. We observed that the average number of infeasible solutions during the approximation is . This indicates that a reasonable adaptation of the mutation ellipsoid takes place. An analysis of the angle between the main axis of the mutation ellipsoid and the constraint boundary shows that it converges to zero, as do the step sizes during the approximation of the optimum. Hence, the coordinate system alignment is successful.

We can conclude that the CMA-ES is able to align the coordinate system automatically without a metamodel. Recent results have shown that an acceleration can be achieved if the covariance matrix is rotated with the help of a metamodel exactly at the time when the constraint boundary is reached [3].

#### 5. Metamodeling of Constraints

In black-box scenarios the constraint boundaries are not explicitly given. Metamodeling of constraints allows advanced constraint-handling methods. Metamodels can be used for various purposes, for example, for checking the feasibility and for the repair of infeasible mutations, and, as we have seen in the previous section, for the control of mutation ellipsoids and covariance matrices. Metamodeling of objective functions has developed into a successful standard in evolutionary optimization [48–50].

##### 5.1. Linear Constraint Estimation

For constraint metamodeling various classification and regression methods can be applied. For the case of linear constraints a metamodel has been developed that is based on sampling infeasible points and binary search on the segments to the last feasible point [3]. The approach works as follows: first, the center point of the model estimator is determined. When the first infeasible offspring individual is produced, the feasible parent is the center of the corresponding metamodel estimator and the distance becomes the radius of the model estimator. Then, random points are generated on the surface of a hypersphere. Point is the center of a hypersphere with radius , such that the constraint boundary is cut. In dimensions additional infeasible points , have to be produced. The model estimator produces the infeasible points by sampling on the surface of a hypersphere with radius until a sufficient number of infeasible points has been produced. The points on the surface are sampled randomly with uniform distribution using the method of Marsaglia [51]. In the first step the algorithm produces Gaussian distributed points and scales the numbers to length $1$. Further scaling and shifting yields randomly distributed points on the hypersphere surface.
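The sampling step can be sketched as follows: Gaussian vectors are normalized to unit length, scaled by the radius, and shifted to the center, and points are kept only if the constraint function reports them as infeasible. The function names, the `max_tries` safeguard, and the toy constraint are assumptions for illustration.

```python
import math
import random

def sample_on_hypersphere(center, radius, rng):
    """Uniformly distributed point on the surface of a hypersphere."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in center]
        norm = math.sqrt(sum(v * v for v in g))
        if norm > 0.0:   # guard against the (practically impossible) zero vector
            break
    # Normalize to length 1, scale to the radius, shift to the center.
    return [c + radius * v / norm for c, v in zip(center, g)]

def sample_infeasible_points(center, radius, is_feasible, count, rng, max_tries=100_000):
    """Collect `count` infeasible points on the hypersphere surface."""
    points = []
    for _ in range(max_tries):
        p = sample_on_hypersphere(center, radius, rng)
        if not is_feasible(p):   # one constraint function evaluation per sample
            points.append(p)
            if len(points) == count:
                return points
    raise RuntimeError("could not find enough infeasible points")

# Hypothetical toy constraint: feasible iff x1 <= 1, with the center at the origin.
rng = random.Random(7)
infeasible = sample_infeasible_points([0.0, 0.0], 2.0, lambda x: x[0] <= 1.0, 2, rng)
```

The radius must be large enough for the hypersphere to cut the constraint boundary; otherwise no infeasible point exists on its surface and the loop runs into the safeguard.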

In a next step the binary search procedure is applied to identify points *on* the constraint boundary: the line between the feasible point and the th infeasible point cuts the real constraint hyperplane in point . We approximate with binary search on this segment. The center of the last interval defined by the last points of the binary search is an estimation of point on . Figure 2 illustrates the situation. With regard to the estimated angle error , the real hyperplane lies between and .
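The binary search on the segment between a feasible and an infeasible point can be written in a few lines. The sketch assumes a boolean constraint function and returns the center of the last interval as the boundary estimate; the function name and the example constraint are illustrative.

```python
def boundary_point(feasible, infeasible, is_feasible, steps):
    """Estimate where the segment feasible -> infeasible crosses the boundary.

    Each step costs one constraint function evaluation and halves the
    interval that is known to contain the boundary.
    """
    lo, hi = list(feasible), list(infeasible)
    for _ in range(steps):
        mid = [(a + b) / 2.0 for a, b in zip(lo, hi)]
        if is_feasible(mid):
            lo = mid   # the boundary lies between mid and hi
        else:
            hi = mid   # the boundary lies between lo and mid
    # Center of the last interval serves as the estimate.
    return [(a + b) / 2.0 for a, b in zip(lo, hi)]

# Hypothetical toy constraint: feasible iff x1 <= 1.
estimate = boundary_point([0.0, 0.0], [2.0, 0.0], lambda x: x[0] <= 1.0, steps=20)
```

The remaining uncertainty after the search is half the length of the last interval, so every additional step halves the error at the price of one more constraint function evaluation.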

In the last step we calculate the normal vector of the estimated hyperplane using the points on the constraint boundary. We assume that these points represent linearly independent vectors, as the endpoints of the lines they lie on have been generated by a random procedure. A successive Gram-Schmidt orthogonalization of each vector against the previously produced vectors delivers the normal vector of the hyperplane. Note that we estimate the normal vector of the linear constraint model only once, namely when the first infeasible solutions have been detected. Later update steps only concern the local support point of the hyperplane; hence, in each iteration the linear model is specified by the normal vector and the current support point. At the beginning, any of the boundary points may serve as the support point. For later update steps two cases have to be distinguished. The relevant quantities are the distance between the mutation ellipsoid center and the constraint boundary at the current time and the number of binary search steps needed to achieve the required angle accuracy.

(1) The search (i.e., the center of the mutation ellipsoid) approaches the constraint boundary: if the distance between the center and the boundary falls below a threshold, a reestimation of the support point is reasonable.

(2) The search moves parallel to the constraint boundary: exceeding a distance threshold causes a reestimation of the support point.

The new support point is found with binary search steps on the line between the current infeasible solution and a feasible point.
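The normal vector estimation by Gram-Schmidt orthogonalization described above can be sketched as follows. The sketch makes several assumptions that are not taken from [3]: the number of boundary points equals the dimension of the search space, the spanning vectors are formed as differences to the first boundary point, and the feasible center is used to orient the normal toward the infeasible side. All function names are hypothetical.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _orthogonalize(vector, basis):
    """Remove from `vector` its components along the orthonormal `basis`."""
    result = list(vector)
    for b in basis:
        coeff = _dot(result, b)
        result = [r - coeff * x for r, x in zip(result, b)]
    return result


def hyperplane_normal(boundary_points, feasible_point, tol=1e-12):
    """Unit normal of the hyperplane through `boundary_points`.

    The normal points away from `feasible_point`, that is, toward the
    infeasible side. `tol` is an absolute, scale-dependent threshold.
    """
    origin = boundary_points[0]
    basis = []
    # Orthonormalize the difference vectors that span the hyperplane.
    for p in boundary_points[1:]:
        v = _orthogonalize([a - b for a, b in zip(p, origin)], basis)
        length = math.sqrt(_dot(v, v))
        if length <= tol:
            raise ValueError("boundary points are not linearly independent")
        basis.append([x / length for x in v])
    # Orthogonalize one further vector that leaves the hyperplane.
    w = _orthogonalize([a - b for a, b in zip(origin, feasible_point)], basis)
    length = math.sqrt(_dot(w, w))
    if length <= tol:
        raise ValueError("feasible point lies on the estimated hyperplane")
    return [x / length for x in w]
```

For example, for the boundary points `[1, 0, 0]`, `[1, 1, 0]`, `[1, 0, 1]` and a feasible point with a first coordinate below 1, the result is the unit vector along the first axis.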

For nonlinear constraints, other regression or classification techniques, such as support vector regression or support vector machines [52], may be considered.

##### 5.2. Feasibility Check

A metamodel can be used to check the feasibility of new solutions in order to reduce the number of constraint function evaluations [3]. For this purpose an accurate estimate of the constraint boundary is necessary. Solutions that the model predicts to be feasible are verified with a real evaluation of the constraint function. Two kinds of errors are possible in the feasibility prediction of an individual.

(1) The model predicts that the individual is feasible, but it is not. Points of this category are still examined with the real constraint function, so this error causes an unnecessary constraint function evaluation.

(2) The model predicts that the individual is infeasible, but it is feasible. The individual is discarded, although it may be a very good approximation of the optimum.

As an example, we take the linear constraint metamodel of Section 5.1 and test the feasibility check approach. A *safety margin* can reduce the number of errors of type 2. We set the margin to the distance between the mutation ellipsoid center and the estimated constraint boundary. Hence, the distance between the center and the shifted constraint boundary becomes twice this distance. A regular update of the constraint boundary support point is necessary; see Section 5.1. Table 8 shows the results of the CMA-ES with feasibility check using the constraint metamodel. We observe significant savings of fitness and constraint function evaluations at a high approximation capability.
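The feasibility check with a linear metamodel and a safety margin can be illustrated with a short sketch. It assumes that the model is given by a unit normal pointing toward the infeasible side and a support point; the function names and the returned pair are hypothetical choices made for this example.

```python
def signed_distance(x, normal, support):
    """Signed distance of x to the model hyperplane (positive = infeasible side)."""
    return sum(n * (a - s) for n, a, s in zip(normal, x, support))


def check_feasibility(x, normal, support, margin, is_feasible):
    """Return (feasible, evaluated) for a candidate solution x.

    Candidates beyond the boundary shifted by `margin` are discarded without
    calling the real constraint function `is_feasible`. All other candidates
    are verified with a real evaluation.
    """
    if signed_distance(x, normal, support) > margin:
        return False, False  # predicted infeasible, no evaluation spent
    return is_feasible(x), True  # predicted feasible, verified for real
```

A larger margin lowers the risk of discarding feasible candidates (errors of type 2) at the price of more real constraint evaluations for infeasible ones (errors of type 1).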

##### 5.3. Solution Repair

The repair approach projects infeasible mutations onto the estimated constraint boundary. We assume an angle error that can be estimated from the number of binary search steps. In the solution repair approach the projection vector is elongated by a length that compensates for this error. Figure 3 illustrates the situation. Let be the support point of the hyperplane at time and let be the infeasible solution. It holds that and . We get

The elongation of the projection into the potentially feasible region guarantees feasibility of the repaired individuals. Nevertheless, it might prevent fast convergence, in particular in regions far away from the hyperplane support point, because the elongation grows with the distance from it. The center of the hyperplane is updated at regular generation intervals. The results of the CMA-ES repair algorithm can be found in Table 9. We observe a significant decrease of fitness function evaluations, in particular on problem TR2. The search concentrates on the boundary of the infeasible search space, in particular on the feasible side.
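A minimal sketch of the repair step is given below. The elongation rule used here, the distance between the projected point and the support point multiplied by the tangent of the angle error, is an assumption made for illustration. It is consistent with an elongation that grows with the distance from the support point, but it is not necessarily the formula of the original work. The function and parameter names are hypothetical.

```python
import math


def repair(x, normal, support, angle_error):
    """Project an infeasible point onto the model hyperplane and push it inward.

    `normal` is a unit vector pointing toward the infeasible side and
    `angle_error` is the assumed angle error of the model in radians.
    """
    # Signed distance of x to the hyperplane (positive = infeasible side).
    d = sum(n * (a - s) for n, a, s in zip(normal, x, support))
    if d <= 0.0:
        return list(x)  # already on the predicted feasible side
    projected = [a - d * n for a, n in zip(x, normal)]
    # Assumed elongation: grows with the distance from the support point.
    b = math.sqrt(sum((p - s) ** 2 for p, s in zip(projected, support)))
    delta = b * math.tan(angle_error)
    return [p - delta * n for p, n in zip(projected, normal)]
```

With this rule a point projected exactly onto the support point is not moved further, while points far from the support point are pushed deeper into the predicted feasible region.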

#### 6. Summary

Many constraint-handling methods exist for evolution strategies, most prominently penalty functions. Due to low success rates at the constraint boundary, ES without coordinate alignment techniques often fail to find optima located at a vertex of the feasible solution space. The death penalty step control approach and the multiobjective biologically inspired two-sex ES prevent a premature step-size reduction on some problems, but their success depends on proper parameter settings. Low success rates at the constraint boundary can be increased with coordinate system alignment techniques. A first step in this direction are biased mutation techniques, that is, biased mutation and descent biased mutation. Much better results can be achieved with metamodel-based mutation ellipsoid rotation. This rotation cannot be achieved self-adaptively, but it can be achieved automatically with covariance matrix adaptation mechanisms. The latter shows excellent results, even on hard problems like TR2. Further improvements of the CMA-ES can be achieved with metamodeling: constraint boundary surrogates can be used for predicting the feasibility of mutations and for repairing infeasible solutions. Finally, Table 10 summarizes the best results that could be achieved on the two problems by combining the CMA-ES with covariance matrix rotation, feasibility check, and repair of infeasible solutions.

Metamodeling of constraints will probably become increasingly important for future research. Nonlinear models will increase the accuracy of feasibility prediction. Advanced regression methods will improve the accuracy of repaired infeasible solutions. Further constraint-handling methods are conceivable, such as the adaptation of mutation probability distributions and covariance matrices, also with nonlinear metamodels.