Research Article | Open Access
Benjamin Van Dyke, "Equal Angle Distribution of Polling Directions in Direct-Search Methods", Journal of Optimization, vol. 2014, Article ID 619249, 15 pages, 2014. https://doi.org/10.1155/2014/619249
Equal Angle Distribution of Polling Directions in Direct-Search Methods
The purpose of this paper is twofold: first, to introduce deterministic strategies for directional direct-search methods, including new instances of the mesh adaptive direct-search (MADS) and the generating set search (GSS) class of algorithms, which utilize a nice distribution of PoLL directions when compared to other strategies, and second, to introduce variants of each algorithm which utilize a minimal positive basis at each step. The strategies base their PoLL directions on the use of the QR decomposition to obtain an orthogonal set of directions or on using the equal angular directions from a regular simplex centered at the origin with vertices on the unit sphere. Test results are presented on a set of smooth, nonsmooth, unconstrained, and constrained problems that give comparisons between the various implementations of these directional direct-search methods.
In , Vicente and Custódio introduced a framework for directional direct-search methods (DSM) encompassing both the MADS  class of algorithms (using an underlying mesh and simple decrease) and the Gss  class of algorithms (using sufficient decrease under a forcing function ) for black-box optimization problems of the form where the objective function is nonsmooth, possibly discontinuous and is the set of feasible points. The algorithms under this framework are shown to have first order convergence results in the Rockafellar sense  for discontinuous functions. The framework is shown in Algorithm 1.
While both algorithms fall under this framework and are shown to have the same convergence results, there are tradeoffs for either choice. The MADS class of algorithms not only requires a bit more complexity in choosing polling directions, but also requires simple decrease in order to accept an iteration as successful. Under the given framework, any instance of Gss requires sufficient decrease using a forcing function that must be chosen under unknown criteria. The sufficient decrease condition, however, can be implemented in such a way that it is mostly negligible and the method for choosing polling directions is simpler because of the absence of a mesh.
Each iteration of this DSM framework is characterized by two steps, a SEARCH and a POLL. The POLL step may be opportunistic (the POLL stops if a successful point has been found) and is governed by a POLL size parameter (MADS denotes this ) such that under certain assumptions. These assumptions include the requirement that at each iteration either the set of SEARCH and POLL points lie on an integer mesh (MADS) or in order for the iteration to be a success the algorithm must find a point that satisfies sufficient decrease for some forcing function that is continuous, nondecreasing with , when as shown in Algorithm 1. The use of in the figure indicates the use of either a forcing function (sufficient decrease) or the zero function (simple decrease).
In , Audet and Dennis introduced the mesh adaptive direct-search (MADS) class of algorithms for (1), where the objective function either has no available derivative information or the derivative information is unreliable and is the set of feasible points determined by black-box constraint functions. Under certain assumptions, MADS is shown to converge to both first-order  and second-order  stationary points in the Clarke sense , as well as the previously mentioned first-order stationary points in the Rockafellar sense.
MADS was introduced as a means of extending the Gps  class of algorithms by allowing for an infinite number of polling directions when exploring the space of variables. The first instance of MADS called LTMADS  relied on a random lower triangular matrix to generate the set of POLL directions and relied on probabilistic arguments to obtain convergence results. This instance was later shown to have undesirably large angles between POLL directions at some iterations. To correct these deficiencies, a second instance of MADS called ORTHOMADS  was introduced that is deterministic and uses a maximal orthogonal basis at each iteration for the set of polling directions.
Each iteration of MADS is characterized by two steps, a SEARCH and a local POLL, performed on a mesh whose fineness approaches zero in the limit. At each iteration the mesh is defined by where is the set of all evaluated points at the start of iteration , is the mesh size parameter at iteration , and is a matrix with directions in . The POLL step is executed if the SEARCH step fails to find a lower objective function value than the current best solution. The POLL step evaluates mesh points adjacent to the current best solution in directions that form a positive spanning set for . The POLL step can be characterized by the following definition taken from .
Definition 1. At iteration , the set of POLL trial points for the MADS algorithm is defined to be where is a positive spanning set such that and for each , (i) can be written as a nonnegative integer combination of the directions of ,(ii)the distance in -norm from the POLL center to a POLL trial point is bounded above by the POLL size parameter , and(iii)limits (as defined by Coope and Price ) of the normalized sets are positive spanning sets.
In  it is observed that the POLL and mesh size parameters produced by a MADS instance satisfy
ORTHOMADS  uses the Halton sequence  and the scaled Householder transformation, with and , to construct an orthogonal basis for at each iteration. It defines where is the mesh index, with at the initial iteration. ORTHOMADS has been shown to behave well in practice, but recently a new variant of MADS, called QRMADS , was introduced to counter the tendency of ORTHOMADS to generate a nonuniform set of polling directions.
QRMADS augments a single direction from a dense sequence on with a matrix of full rank and then uses the QR decomposition to construct an orthogonal basis that is then rounded to the current mesh. It defines where satisfies and is chosen so that after rounding the basis does not become degenerate.
Although QRMADS was shown to have a more uniform set of polling directions than ORTHOMADS, the version tested in  is not deterministic. This is due to the fact that the augmented matrix with full rank was generated at random. Consequently, one outcome of this paper is to introduce a deterministic version of QRMADS by deterministically generating a matrix with full rank.
It has been observed, in some cases, that reducing the number of function evaluations at every iteration from a maximal positive spanning set to a minimal positive spanning set can improve the performance of these types of algorithms (see, e.g., [12–14]). For this reason, this paper introduces instances of MADS and Gss, called EADMADS and EADGss, that have versions using both a maximal positive basis ( polling directions) and a minimal positive basis ( polling directions). This is accomplished by generating either an orthogonal basis or a regular simplex at each iteration. In the case of the maximal variant, the algorithmic framework for EADMADS is exactly the same as QRMADS and the new name supersedes the old. The new name was chosen to represent the use of a set of directions, for each POLL step, that have an “equal angle distribution.” By this it is meant that within the positive basis, the angle between any two directions (except a direction and its opposite, when applicable) is constant.
Section 2 gives the methods for constructing maximal and minimal polling sets by finding either an orthogonal set of directions or the vertices of a regular simplex centered at the origin on the unit sphere. Section 3 gives the method used by EADMADS for constructing a polling set on the current mesh consisting of “nearly orthogonal” directions or a “nearly regular” simplex. Section 4 outlines the EADMADS algorithm and shows that it is a valid MADS instance that shares the theoretical convergence results of this class of algorithms. Section 5 outlines the EADGss instance of Gss which is essentially the same as EADMADS except that it requires sufficient decrease instead of restricting the SEARCH and POLL points to a mesh. Finally, in Section 6, a method for making both EADMADS and EADGss deterministic is detailed and numerical results comparing each against ORTHOMADS, QRMADS, and one another on a set of test problems taken from the optimization literature are given.
2. Constructing a Basis
At each iteration of the MADS algorithm (and DSM framework) a positive spanning set is required. In this section, we discuss a method for taking a single direction from a sequence that is dense on the unit sphere and forming either an orthogonal basis or the set of vertices of a regular simplex on the unit sphere centered at the origin that includes the given direction. We can then take the negatives of all the directions from the orthogonal basis to form a maximal positive spanning set or take the vertices of the regular simplex to form a minimal positive spanning set. Since one of the directions from the dense set is included in each positive spanning set, we will then have as dense in the unit sphere.
2.1. Orthogonal Basis
To generate an orthogonal basis from a given direction from a dense set, we employ the QR decomposition of an matrix of full rank. This matrix can be created by augmenting with any matrix of full rank (e.g., the identity or a rotation matrix) and taking the first columns that form a basis. The QR decomposition then generates unique orthogonal matrix and upper triangular matrix such that . As this method will generate the same basis as the Gram-Schmidt process, the first direction will correspond to .
2.2. Regular Simplex
To generate a regular simplex, , centered at the origin with vertices on the unit sphere from a given direction from a dense set, we again start from an matrix of full rank, created in the same manner as above, and use the following algorithm.
Let , with , , where is the th column of and is linearly independent. We construct a regular simplex, , with vertices on the unit sphere by using the fact that for any two vertices of the simplex. This can be done as follows.(i)Set .(ii)For set (iii)Set
Proposition 2. is the unique solution to the following equation:
Due to rounding issues, this algorithm has the same unfortunate computational problems as the Gram-Schmidt process and may give an answer that does not closely approximate a regular simplex, possibly even containing complex directions. To ensure that a valid regular simplex is generated, an orthogonal matrix can be used in place of an arbitrary matrix of full rank. This can be accomplished by finding the QR decomposition of and then applying the algorithm to the orthogonal matrix . In the following proof, we will simply assume that is already orthogonal.
Proof. To find the solution to (10) for each , we employ the method of Lagrange multipliers. First we define the Lagrangian function as follows:
We then solve the system of the following equations:
First, we multiply (12) by and use (14) to obtain
We can then solve for to obtain
From (12) and by induction, we can see that for each , . Consequently, , . We can then reduce (16) to Using , , when , and (17), we solve to obtain Now we can substitute (18) into (17) and simplify to obtain
Next, we substitute (18) and (19) into (12) to obtain Then we take the norm of each side and use , , when , (13), and the fact that to obtain the quadratic equation We then get .
Finally, we observe Thus the maximum occurs when this is negative definite or when . We can then take and the corresponding value of . We then substitute , , and (18) into (12) and solve for to obtain .
Note, the above computation shows that when orthogonal directions are used, problem (10) has a unique solution. This is reflected by the fact that the above quadratic equation has two real solutions, one corresponding to the closest point to within the constrained region and the other corresponding to the farthest point. In order to ensure that the process for generating a regular simplex is deterministic, it is important to note that this unique closest point exists and is found. We can also observe that each new vertex of the regular simplex is linearly independent to the set , showing that the first vertices of the simplex form a basis for .
3. Rounding a Basis to the Mesh
Since each iteration of the MADS class of algorithms requires a positive spanning set to lie on a mesh, it is necessary to take any basis from Section 2 and obtain a close approximation to that basis lying on the given mesh. To obtain a “nearly orthogonal” integer basis or a “nearly regular” simplex with integer vertices we take each column vector of or the first columns of , then project onto the hypercube centered at the origin with integer side length , and then round each component to the nearest integer as follows: where refers to the usual rounding operation applied to each component of the vector. We then generate the new basis with column vectors . When is formed from we define to obtain a maximal positive spanning set. When is formed from we set to get a minimal positive spanning set.
Note, for small values of , it is possible that the new set of vectors will not form a basis. Consequently, we add the constant to the exponent in order to ensure that will be nonsingular. In , it was shown that when is formed from and satisfies (7), the resulting set of directions will form a basis.
To find the constant when is obtained from , we use the fact that the distance from a matrix to the nearest singular matrix (in Frobenius norm) is bounded below by the smallest singular value of ; that is, if is singular, then . Thus we need the singular values of the matrix comprised of the first columns of . Proposition 3 gives this information. (Note: in  the matrices and shown below are studied in a similar context.)
Proposition 3. Let be a matrix with columns corresponding to any vertices of a regular simplex centered at the the origin with vertices on . Then the singular values of are and .
Proof. We need to find the eigenvalues of . We use the fact that the vertices, , of a regular simplex centered at the origin on have the property
Then observe that
showing that has eigenvalue . Next, observe that
showing that is an eigenvalue with eigenvector .
Similarly, are also eigenvectors with eigenvalue . Since these vectors are linearly independent, they form a basis for the corresponding eigenspace, thus accounting for all eigenvalues. We finish by taking the square roots to obtain the singular values of .
Next, we need the fact that for any two matrices and , , where denotes the smallest singular value (follows from Theorem 3.1 of ). Then letting be the matrix with columns , that is, we obtain . So for to be nonsingular, we need where represents the perturbation matrix resulting from rounding. Thus we require for all values of . Then letting , we obtain the following requirement for the constant determined only by the dimension :
Finally, we end with the following proposition from  which will be useful for showing EADMADS as a valid instance of MADS and for establishing convergence results.
Proposition 4. If is arbitrary, , and , then for large we have .
4. The EADMADS Instance of MADS
The EADMADS algorithm differs from the LTMADS and ORTHOMADS algorithms only in the construction of the POLL directions and the POLL set . For each of the three instances, the mesh is defined by the set of directions . For the EADMADS implementation with polling directions, the choices of the mesh and POLL size parameters are with defined as in (7), so that , for all , as in (23). For the EADMADS implementation with polling directions, the choices of the mesh and POLL size parameters are with defined as in (31), so that , for all . The use of the factor reflects the possibility that may be up to times as long as each in -norm and is the same strategy used for the implementation of LTMADS . Since approximates a regular simplex, however, we can expect to remain close to the length of each . The update rules for the mesh index are as in Algorithm 2, which describes the EADMADS algorithm.
This algorithm is very similar to the algorithm found in , with and in place of the Halton directions and . The integer and the update rules are chosen so that the algorithm will generate a subsequence of unsuccessful iterations with and such that the directions used in this subsequence will correspond to the entire sequence , where represents a sequence that is dense on .
The POLL directions are determined by using any sequence that is dense on the unit sphere (or the hypercube ), the mesh index , a sequence of matrices of full rank, and the index . At each iteration, a direction from the dense sequence is augmented with a matrix to obtain a basis and then either QR decomposition or the algorithm from Section 2.2 is used to find a basis matrix with the direction from the first column corresponding to . The columns of the resulting matrix are then projected onto the hypercube centered at zero with side length and rounded. The resulting matrix will then be either “nearly orthogonal” or correspond to vertices of a “nearly regular” simplex with the length of each column in -norm bounded by . In the case of the orthogonal basis, is completed to a maximal positive basis composed of the directions given by . In the case of the regular simplex, is completed to a minimal positive basis composed of the directions given by . This then forces the trial point to be exactly from the POLL center in -norm (except possibly in the case of the direction ).
Proposition 5. The set of normalized directions , with as in (23), is dense on .
Proof. We need to show that the POLL directions for EADMADS satisfy the following five properties from [2, 22]. (i)Each is a positive spanning set: this follows from the use of the constant to ensure that the round procedure on the first columns of results in a basis.(ii)Any direction can be written as a nonnegative integer combination of the directions of : this is true by construction since and is an integer vector on the hypercube centered at the origin with side length .(iii)The distance from the POLL center to a POLL trial point in -norm is bounded above by : this is true by construction since we ensured and so .(iv)The set of normalized directions used over all failed iterations is dense on the unit sphere: this follows from the strategy chosen for updating so that the entire sequence corresponds to the sequence of failed iterates, by Proposition 5, and (4) which ensures the existence of large .(v)Limits (as defined by ) of convergent subsequences of the normalized sets are positive spanning sets: for this property we need to show , for all , for some constant . The result will then follow as in .
First, note that the set columns of correspond to vertices of a regular simplex centered at the origin on the unit sphere} is bounded since , where denotes the Frobenius norm on and is the ball of radius centered around the matrix of zeros. Also, the set is bounded since any such would satisfy and so . Now, since the determinant function is on , it is Lipschitz on the bounded set . That is, for any there exists a constant such that This then yields So, in particular, if we take (since ) and such that , then by applying Proposition 3 to compute , we get If is the matrix corresponding to the first columns of , then note that, for any , and , for some , since for some perturbation matrix resulting from rounding. We can then choose small enough so that . Then by Proposition 4, we can choose large enough, say , so that and conclude that
Next, let have columns and note that by construction . Since has integer components we know that . Also, note that , , since each is on the hypercube with side length . We then get the following estimate:
Finally, note that is decreasing as increases. So , if , where is defined above. We then let and conclude that
5. The EADGss Instance of a Directional Direct-Search Method
At each iteration, the EADGss instance of Gss uses a positive spanning set whose directions have an equal angle distribution either by using the QR decomposition to generate an orthogonal basis or by generating a regular simplex centered at the origin. The integer and the update rules are chosen so that there will be a subsequence of unsuccessful iterations with and such that the directions used in this subsequence will correspond to the entire sequence , where is dense on . The POLL size parameter is given by . The update rules for are as in Algorithm 3, which describes the EADGss algorithm.
The POLL directions are determined by using any sequence that is dense on the unit sphere (or the hypercube ), the index , a sequence of matrices of full rank, and the index . At each iteration, a direction from the dense sequence is augmented with a matrix to obtain a basis and then either QR decomposition or the algorithm from Section 2.2 (along with the QR decomposition) is used to find a basis matrix with the direction from the first column corresponding to . The resulting matrix will then be either orthogonal, , or correspond to vertices of a regular simplex, , with the length of each column having -norm one. In the case of the orthogonal basis, is completed to a maximal positive basis composed of directions, . In the case of the regular simplex, the first columns of are completed to a minimal positive basis composed of directions by finding the final vertex of the simplex, , where are the columns of .
It should be noted that this algorithm is very similar to that for EADMADS (Algorithm 2), with the exception that EADGss uses sufficient decrease at each iteration instead of confining the SEARCH and POLL steps to a mesh. Consequently, there are only two main differences. The first is that in the final step of the construction of , EADMADS rounds and EADGss does not (hence no need for separate for the and variants). The second difference is in the determination of a successful iterate; EADGss uses a forcing function to determine sufficient decrease and EADMADS uses simple decrease.
6. Numerical Tests
In this section, numerical test results are given to compare various instances of MADS and Gss amongst one another. Each test was performed using a MATLAB implementation. The tests were performed on 130 problems taken from the optimization literature, with dimensions between nine and 60. The test problems were chosen to satisfy three categories. The first category consists of 33 smooth functions, the second category consists of 60 nonsmooth functions, and the third category consists of 37 constrained test problems, as listed in Table 1. Since many of the problems are variably dimensioned versions of one another, no generalization was used more than three times. The distribution of problems by dimension is given in Table 2.
The choices for and for ORTHOMADS correspond to (5), for the versions of EADMADS they correspond to (6) and (7), and for the version of EADMADS they correspond to (33) and (31). For each version of EADGss . The mesh index is allowed to be negative for all instances of both algorithms and for each instance of MADS the initial mesh size is set to one.
The stopping criteria are satisfied when the number of function evaluations reach 3,000. For each implementation, a POLL step was required at every iteration and an opportunistic strategy was used (the POLL stops if a successful point has been found). For each test, no search was performed and constraints are handled using the extreme barrier method ( if ).
In order to create and test deterministic versions of both EADMADS and EADGss, the Sobol sequence  is used in place of either the Halton sequence or the sequence of partition centers used for QRMADS. This is done for two reasons. Although the Halton sequence is deterministic, the sequence of partition centers is not. In higher dimensions, however, it can be observed that the Halton sequence exhibits poor uniformity. Sobol sequences, on the other hand, are both deterministic and remain uniform. For each test, the Sobol sequence was generated using the sobolset command in MATLAB and used in the given order.
At each iteration is taken to be the Sobol direction. This is then augmented with the next Sobol directions, taken in order, and the identity matrix. This is done in order to ensure that a matrix of full rank is created. From this matrix, a new matrix is formed from the first columns that form a basis for and then the QR decomposition is applied to this new matrix. When a minimal positive basis is required, the resulting orthogonal matrix is then input into the simplex algorithm of Section 2.2. For EADMADS, these directions are then rounded to the current mesh. In place of the mesh, EADGss uses a sufficient decrease requirement implemented with the forcing function .
To compare the various instances of each algorithm, the results from the tests were used to generate data profiles, as outlined in . For each test function, is defined to be the smallest value found for by any of the instances of either MADS or Gss after 3,000 function evaluations. Then after each evaluation, the algorithm is declared a success if the smallest known value for thus far satisfies the test
The number of successes over all the test functions is then recorded as a percent of success. The results for each instance of MADS or Gss are then plotted as the number of function evaluations against the percent of problems to obtain the data profile.
The first test performed was to compare both the nondeterministic and deterministic instances of EADMADS with polling directions against ORTHOMADS. The data profiles for the deterministic instance of EADMADS, five runs of the nondeterministic instance of EADMADS (formerly called QRMADS), and ORTHOMADS are given in Figure 1. A data profile is given for each instance using the set of all test functions, the set of smooth test functions only, the set of nonsmooth test functions only, and the set of constrained test functions only. In this figure, we can see that the deterministic instance of EADMADS performs very well compared to the other two instances when considering the set of all test functions or just the set of nonsmooth functions. When just considering the set of smooth or constrained test functions, the deterministic instance of EADMADS performs roughly as well as the nondeterministic instance and better than ORTHOMADS.
Figure 2 illustrates the distribution and the density of directions for all three Mads instances. Each instance was run on the Rosenbrock function with dimensions and 20. In each case, there are roughly directions shown and each direction is projected onto the plane defined by the first two coordinates. The resulting plots are typical of what can be expected regardless of the choices of the two coordinates. In this figure, we can see that the directions for both the deterministic and nondeterministic instances of EADMADS have relatively uniform distributions with little clustering. The directions for ORTHOMADS, on the other hand tend to cluster within specific regions. While the directions for both instances of EADMADS are very similar, with the unit sphere being covered nicely, the test results from this section indicate that the deterministic instance is expected to perform as well as or better than the nondeterministic instance (QRMADS).
Note 1. In Figures 2, 4, and 6, the EADMADS and EADGss directions become more restricted towards the center as the dimension increases. This behavior is a reflection of the concentration of measure phenomenon and the fact that the observable diameter of converges to zero as (for more on this topic see ). Intuitively, one can imagine that as n increases, the contribution to the norm of a vector from the first two elements gets smaller.
The next test performed was to compare the deterministic instance of EADGss with polling directions to a nondeterministic instance of Gss with polling directions using random directions in place of the Sobol sequence (this is essentially the algorithm tested in ). The data profiles for the deterministic instance of EADGss and five runs of the nondeterministic instance are given in Figure 3. A data profile is given for each instance using the set of all test functions, the set of smooth test functions only, the set of nonsmooth test functions only, and the set of constrained test functions only. From this figure, we can observe that the deterministic instance performed better on the sets containing all the test functions, the nonsmooth test functions, and the constrained test functions. Both the deterministic and nondeterministic instances show similar performances on the set of smooth test functions.
Figure 4 illustrates the distribution and the density of directions for both instances of Gss. Each instance was run on the Rosenbrock function with dimensions and 20. Similar to what is observed in Figure 2, we can see that the directions from these two instances of EADGss have relatively uniform distributions with little clustering.
The last test performed was to compare the deterministic instances of EADMADS and EADGss with both and polling directions. The data profiles for each instance of the two algorithms are given in Figure 5. A data profile is given for each instance using the set of all test functions, the set of smooth test functions only, the set of nonsmooth test functions only, and the set of constrained test functions only. From this figure, we can observe that the instances of both EADMADS and EADGss perform better than the instances on both the set of all test problems and the set of smooth test problems. We can see a lot of parity amongst the four instances on the set of nonsmooth functions with the two instances doing a bit better as the number of function evaluations approaches 3,000. The results from the set of constrained test functions are a little more mixed with the instance of EADMADS performing very well overall and the instance of EADGss performing well as the number of function evaluations approaches 3,000.
Figure 6 illustrates the distribution and the density of directions for the , deterministic instances of EADMADS and EADGss. Each instance was run on the Rosenbrock function with dimensions and 20. In this figure, we can observe that the overall distribution of directions for the