Abstract

We propose a derivative-free trust region algorithm with a nonmonotone filter technique for bound constrained optimization. The derivative-free strategy targets minimization problems in which not all derivatives of the objective function are available. The nonmonotone filter technique preserves the trust region framework while ensuring global convergence under reasonable assumptions. Numerical experiments demonstrate that the new algorithm is effective for bound constrained optimization. Locally optimal parameters with respect to overall computational time on a set of test problems are also identified. The best parameter values obtained by the proposed algorithm differ from the traditionally used values, and their performance indicates that the algorithm has a certain advantage for nondifferentiable optimization problems.

1. Introduction

Many of the objective functions in mathematical optimization that occur in engineering are obtained from a mass of numerical experiments and have special characteristics, such as nonconvexity, for which their first-order or second-order derivatives are unavailable. In this paper, we analyze the solution of the nonlinear problem with bound constraints
$$\min\ f(x) \quad (1a)$$
$$\text{s.t. } x \in \Omega := \{x \in \mathbb{R}^n : l \le x \le u\}, \quad (1b)$$
where $l, u \in \mathbb{R}^n$ with $l \le u$ and $f : \mathbb{R}^n \to \mathbb{R}$ is a twice continuously differentiable function whose first-order or second-order derivatives are not explicitly available. Such problems mainly emerge in operations research, management science, industrial engineering, applied mathematics, and network transmission [1], as well as in disciplines that deal with analytical optimization techniques such as banking and weather analysis. The unavailability of first- or second-order derivatives may cause traditional derivative-based methods, such as quasi-Newton and conjugate gradient methods, to fail. Therefore, research focuses on derivative-free methods, which avoid the use of derivative information altogether.

1.1. Derivative-Free Trust Region Method

Derivative-free techniques have been explored to tune parameters of nonlinear optimization methods [2], to perform automatic error analysis [3, 4], and to design helicopter rotor blades [5, 6] and hydrodynamic devices [7]. These methods are special-purpose algorithms designed for particular applications and have their usage limitations. In [8–12] another type of derivative-free method is proposed, based on the traditional derivative-based algorithmic framework [8, 13–15]. These methods construct a model function, whose derivatives are all available, to approximate the original objective function. Conn, Scheinberg, and Vicente [8, 13] have already given a derivative-free method within the trust region framework. They construct the trust region subproblem
$$\min_{s}\ m_k(x_k + s) = f(x_k) + g_k^T s + \tfrac{1}{2}\, s^T H_k s, \quad \text{s.t. } \|s\| \le \Delta_k, \quad (2)$$
where $f(x_k)$ is the function value at the $k$th iterate, $g_k$ is the gradient of the model $m_k$ at $x_k$, and $H_k$ is the model Hessian. Although the model $m_k$ and the true objective function $f$ are meant to coincide, the model gradient $g_k$ and Hessian $H_k$ may be (and typically are) different from the true gradient $\nabla f(x_k)$ and Hessian $\nabla^2 f(x_k)$. The model (2) defined in [8] is called a fully linear or a fully quadratic model, depending on which truncated-Taylor-series conditions are imposed; it must be occasionally updated in order to guarantee that the residual between the approximate and true functions (and, more critically, their gradients) stays within the related error bounds. In fact, by definition, the function values are essentially the same. We give the definition of a fully linear model after a reasonable assumption.

Assumption (A1). Suppose that a level set $L(x_0)$ and a maximal radius $\Delta_{\max} > 0$ are given. Suppose furthermore that $f$ is twice continuously differentiable with Lipschitz continuous Hessian in an appropriate open domain containing the $\Delta_{\max}$-neighborhood $\bigcup_{x \in L(x_0)} B(x; \Delta_{\max})$ of the set $L(x_0)$.

Definition 1. Let a function $f$, which satisfies Assumption (A1), be given. A set of model functions $\mathcal{M} \subset C^1(\mathbb{R}^n)$ is called a fully linear class of models if the following hold.
There exist positive constants $\kappa_{ef}$, $\kappa_{eg}$, and $\nu_1^m$ such that, for any $x \in L(x_0)$ and $\Delta \in (0, \Delta_{\max}]$, there is a model function $m$ in $\mathcal{M}$, with Lipschitz continuous gradient and corresponding Lipschitz constant bounded by $\nu_1^m$, such that
(1) the error between the gradient of the model and the gradient of the original objective function satisfies
$$\|\nabla f(x+s) - \nabla m(x+s)\| \le \kappa_{eg}\, \Delta \quad \text{for all } s \in B(0; \Delta), \quad (3)$$
(2) the error between the model and the original objective function satisfies
$$|f(x+s) - m(x+s)| \le \kappa_{ef}\, \Delta^2 \quad \text{for all } s \in B(0; \Delta). \quad (4)$$
Such a model $m$ is called fully linear on $B(x; \Delta)$.

Remark 2. For this class there exists an algorithm, which we will call a model-improvement algorithm, that in a finite, uniformly bounded (with respect to $x$ and $\Delta$) number of steps can
(1) either establish that a given model $m \in \mathcal{M}$ is fully linear on $B(x; \Delta)$ (we will say that a certificate has been provided and the model is certifiably fully linear),
(2) or find a model $\tilde{m} \in \mathcal{M}$ that is fully linear on $B(x; \Delta)$.
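To make the construction concrete, the following minimal MATLAB sketch (hypothetical helper, not the authors' code) builds a linear interpolation model on the sample set $\{x_0, x_0 + \Delta e_1, \dots, x_0 + \Delta e_n\}$; for a well-poised set of this kind the resulting model is fully linear in the sense of Definition 1 when $f$ has a Lipschitz continuous gradient.

```matlab
% Minimal sketch (hypothetical helper, not the authors' code): build a
% linear model m(x0+s) = c + g'*s by interpolation on the sample set
% {x0, x0+Delta*e_1, ..., x0+Delta*e_n}.
function [c, g] = linear_model(fhandle, x0, Delta)
n = numel(x0);
c = fhandle(x0);                      % model value at the center
g = zeros(n, 1);
for i = 1:n
    e = zeros(n, 1); e(i) = 1;
    g(i) = (fhandle(x0 + Delta*e) - c) / Delta;  % forward difference
end
end
```

Repeating this construction with a smaller $\Delta$ is, in spirit, the model-improvement step of Remark 2.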

1.2. Affine-Scaling Trust Region Method for Bound Constrained Optimization

Since we analyze the bound constrained optimization (1a) and (1b), the trust region subproblem is
$$\min_{s}\ m_k(x_k + s) \quad \text{s.t. } x_k + s \in \Omega,\ \|s\| \le \Delta_k. \quad (5)$$
The solution $s_k$ of subproblem (5) induces the following decrease:
$$m_k(x_k) - m_k(x_k + s_k) \ge c\, \pi_k \min\{\pi_k, \Delta_k\}, \quad (6)$$
with a constant $c > 0$ independent of $k$. Hereby, the norm of the projected gradient
$$\pi_k = \|P_{\Omega}(x_k - g_k) - x_k\| \quad (7)$$
is a suitably chosen criticality measure. In order to obtain a relatively short and elegant convergence result, we describe a concrete implementation by means of a Cauchy step defined by an affine-scaled gradient, which has stronger smoothness properties. Similar approaches can be found in [16–18]. Define the diagonal affine-scaling matrix as
$$D_k = \operatorname{diag}(d_k^1, \dots, d_k^n), \qquad d_k^i = \begin{cases} \min\{x_k^i - l_i,\ c_1\}, & g_k^i \ge 0, \\ \min\{u_i - x_k^i,\ c_2\}, & g_k^i < 0, \end{cases} \quad (8)$$
where $c_1 > 0$ and $c_2 > 0$ are given constants and $g_k^i$ is the $i$th component of the gradient of $m_k$. Solve the subproblem
$$\min_{t \ge 0}\ m_k(x_k - t D_k g_k) \quad \text{s.t. } x_k - t D_k g_k \in \Omega,\ t\,\|D_k g_k\| \le \Delta_k. \quad (9)$$
Following the idea of [18], we are able to prove that the solution of this quadratic model (9) also satisfies the decrease (6).
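The following MATLAB sketch illustrates the scaled Cauchy step of (8)–(9) under the Coleman–Li-type scaling reconstructed above; the cap constant cmax and the exact form of $D_k$ are our assumptions. It computes the largest feasible stepsize with respect to the trust region and the bounds and then minimizes the one-dimensional quadratic.

```matlab
% Sketch of the scaled Cauchy step of (8)-(9); x, l, u, g are column
% vectors, H is the model Hessian, cmax is an assumed cap constant.
function s = scaled_cauchy_step(g, H, x, l, u, Delta, cmax)
n = numel(x);
d = zeros(n, 1);
for i = 1:n
    if g(i) >= 0
        d(i) = min(x(i) - l(i), cmax);   % distance to lower bound, capped
    else
        d(i) = min(u(i) - x(i), cmax);   % distance to upper bound, capped
    end
end
p = -d .* g;                             % scaled steepest descent direction
if norm(p) == 0, s = zeros(n, 1); return; end
tmax = Delta / norm(p);                  % trust region limit on the stepsize
for i = 1:n                              % feasibility limits from the bounds
    if p(i) > 0
        tmax = min(tmax, (u(i) - x(i)) / p(i));
    elseif p(i) < 0
        tmax = min(tmax, (l(i) - x(i)) / p(i));
    end
end
sigma = p' * H * p;                      % curvature along p
if sigma <= 0
    t = tmax;                            % nonconvex model: go to the boundary
else
    t = min(-(g' * p) / sigma, tmax);    % 1D quadratic minimizer, clipped
end
s = t * p;
end
```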

1.3. Nonmonotone Filter Technique

The filter method was first introduced for constrained nonlinear optimization by Fletcher and Leyffer [19], and it has since found a wide range of applications in various optimization problems; see [18, 20–23]. In 2005, the filter method was extended to a filter trust region method by Gould et al. [24] for unconstrained optimization and by Sainvitu [25] for general box constrained optimization. These works indicate that the filter method is a reliable and efficient approach for nonlinear optimization. In this paper, we make a further study of the nonmonotone filter method and propose a new algorithm for (1a) and (1b). The main features of this paper are as follows:
(i) We present a further extension of the filter trust region method by introducing both a suitable nonmonotonicity criterion and a derivative-free strategy for bound constrained optimization.
(ii) The global convergence of the presented derivative-free trust region method with the nonmonotone filter technique for bound constrained optimization is established.
(iii) Numerical results indicate that the new algorithm is effective for problems whose derivatives are unavailable.

The paper is organized as follows: we present our algorithmic scheme in Section 2. There we first show that the descent direction described in this paper satisfies the predicted decrease inequality, recall the nonmonotone trust region method from [18, 26], and then make the necessary modifications for a derivative-free version. The global convergence properties of the derivative-free trust region method with the nonmonotone filter technique are established in Section 3. The corresponding numerical results are reported in Section 4 together with some additional tests. Finally, conclusions and further discussions are given.

Notation. Unless otherwise specified, throughout this paper the norm $\|\cdot\|$ is the 2-norm for a vector and the induced 2-norm for a matrix. Let $B$ denote a closed ball in $\mathbb{R}^n$, and let $B(x; \Delta)$ denote the closed ball centered at $x$ with radius $\Delta > 0$. In addition, $L(x_0) = \{x \in \Omega : f(x) \le f(x_0)\}$ is the level set about $x_0$. We use the subscript $f$ and the subscript $m$ to distinguish the relevant information between the original function and the approximate function; for example, $\pi_k^f$ is the criticality measure of $f$ and $\pi_k^m$ is the criticality measure of $m_k$. $s_k$ is the trial step at the $k$th iteration, and $g_k$ and $H_k$ are the gradient and the Hessian of the model at the $k$th iteration, respectively.

2. The Derivative-Free Trust Region Algorithm with Nonmonotone Filter Technique

We analyze the behavior of subproblem (9) with the diagonal matrix $D_k$ defined by (8). Let $\pi_k$ denote the criticality measure such that
$$\kappa_0\, \pi_k \le \|D_k g_k\| \quad (10)$$
holds on $\Omega$ for some $\kappa_0 > 0$. The fraction of Cauchy decrease condition is defined as
$$m_k(x_k) - m_k(x_k + s_k) \ge \beta\, \big[m_k(x_k) - m_k(x_k + s_k^C)\big], \quad (11)$$
where $\beta \in (0, 1]$ is a constant and $s_k^C = -t_k^C D_k g_k$ with $t_k^C$ the solution of (9). It is not difficult to prove that both $\pi_k^f$ and $\pi_k^m$ are criticality measures, i.e., $\pi(x^*) = 0$ if and only if $x^*$ is a KKT point of problem (1a) and (1b). Furthermore, if $\nabla f$ is bounded and uniformly continuous on $L(x_0)$, then $\pi^f$ is uniformly continuous. The proof is similar to Lemma 6.1 and Lemma 6.2 in [18], except that we now have to replace the exact gradient $\nabla f$ by the approximate gradient $g_k$. In order to discuss the global convergence, we first provide the following lemma to show that the descent direction satisfies the predicted decrease inequality (6).
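As a small illustration, the fraction of Cauchy decrease test (11) can be checked as follows in MATLAB (m is a handle for the model $s \mapsto m_k(x_k + s)$; all names are ours, introduced for illustration only):

```matlab
% Check of the fraction of Cauchy decrease condition (11); sc is the
% scaled Cauchy step, s the trial step, and beta a constant in (0,1].
function ok = fcd_test(m, sc, s, beta)
dec_cauchy = m(zeros(size(s))) - m(sc);  % model decrease of the Cauchy step
dec_trial  = m(zeros(size(s))) - m(s);   % model decrease of the trial step
ok = dec_trial >= beta * dec_cauchy;
end
```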

Lemma 3. Suppose that the criticality measure satisfies (10). If $g_k \neq 0$ and if the trial step $s_k$ satisfies the fraction of Cauchy decrease condition (11), then (6) holds.

Proof. Considering that $g_k \neq 0$ and $D_k$ is defined as above, firstly we obtain from the inequality
$$\frac{d}{dt}\, m_k(x_k - t D_k g_k)\Big|_{t=0} = -g_k^T D_k g_k < 0$$
that $-D_k g_k$ is a descent direction of $\psi_k(t) := m_k(x_k - t D_k g_k)$ at $t = 0$. In the case that the maximum stepsize is determined by the trust region constraint $t\,\|D_k g_k\| \le \Delta_k$, we obtain
$$t_{\max} = \frac{\Delta_k}{\|D_k g_k\|}.$$
In the case that the maximum stepsize is determined by the lower bounds of the set $\Omega$, then
$$t_{\max} = \min_{i:\, (D_k g_k)_i > 0} \frac{x_k^i - l_i}{(D_k g_k)_i}.$$
In the same way, the stepsize admitted by the upper bounds of the set $\Omega$ can be estimated:
$$t_{\max} = \min_{i:\, (D_k g_k)_i < 0} \frac{u_i - x_k^i}{-(D_k g_k)_i}.$$
In the case $\sigma_k := (D_k g_k)^T H_k (D_k g_k) \le 0$, we set $t_k^C = t_{\max}$. Otherwise, in the case that $\sigma_k$ is positive as well as less than a constant (recall that $\|H_k\|$ is bounded), the strictly convex quadratic $\psi_k$ attains its global minimum at $t^*$, where
$$t^* = \frac{g_k^T D_k g_k}{\sigma_k}.$$
We have $t_k^C = \min\{t^*, t_{\max}\}$. If $t^* \ge t_{\max}$, then $t_k^C = t_{\max}$, and
$$\psi_k(0) - \psi_k(t_k^C) \ge \tfrac{1}{2}\, t_{\max}\, g_k^T D_k g_k.$$
If, on the other hand, $t^* < t_{\max}$, then
$$\psi_k(0) - \psi_k(t_k^C) = \frac{(g_k^T D_k g_k)^2}{2\,\sigma_k} \ge \frac{(g_k^T D_k g_k)^2}{2\,\|H_k\|\, \|D_k g_k\|^2}.$$
Combining these estimates with the assumptions (10) and (11), the conclusion is obtained.

There are two criteria in the proposed algorithm to measure whether the trial step is acceptable. The first one stems from the trust region method: the new algorithm is based on a nonmonotone decrease criterion. Nonmonotone trust region methods were investigated by Toint [26] and Ulbrich [18]. Let the increasing sequence $\{k_j\}$ enumerate all indices of accepted steps; that is,
$$x_{k_j + 1} = x_{k_j} + s_{k_j}. \quad (20)$$
Conversely, if $k \ne k_j$ for all $j$, then $s_k$ was rejected. In the following we denote the set of all these "successful" indices by $\mathcal{S}$:
$$\mathcal{S} = \{k_j : j = 0, 1, 2, \dots\}. \quad (21)$$
We follow [18] and choose an integer $M \ge 0$ and weights $\lambda_0, \dots, \lambda_{M-1} \ge 0$ with $\sum_i \lambda_i = 1$, and then compare the predicted decrease promised by the trust region model with a relaxation of the actual decrease
$$r_k = \max\Big\{ f(x_k),\ \sum_{i} \lambda_i\, f(x_{k_{j-i}}) \Big\} - f(x_k + s_k) \quad (22)$$
for the computation of the reduction ratio, in order to decide whether a step is acceptable or not. The idea behind the update rule (22) is the following: instead of requiring that $f(x_k + s_k)$ be smaller than $f(x_k)$, it is only required that $f(x_k + s_k)$ is either less than $f(x_k)$ or less than the weighted mean of the function values at the last successful iterates. Of course, if $M = 0$, then the sum is empty and the usual reduction ratio is recovered. Our approach is a slightly stronger requirement than the straightforward idea of replacing $f(x_k)$ with the maximum of the function values at the last $M$ successful iterates. Unfortunately, for this latter choice it does not seem to be possible to establish all the global convergence results that are available for the monotone case. For our approach, however, this is possible without making the theory substantially more difficult.
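A minimal MATLAB sketch of the relaxed reduction ratio built from (22), under the reconstruction above (fsucc and lambda are our hypothetical names for the stored successful function values and their weights):

```matlab
% Relaxed reduction ratio based on (22); fsucc holds the f-values at the
% last (at most M) successful iterates and lambda the nonnegative weights
% summing to one (hypothetical names, illustration only).
function rho = nonmonotone_ratio(fk, ftrial, pred, fsucc, lambda)
if isempty(fsucc)
    ref = fk;                             % empty sum: fall back to f(x_k)
else
    ref = max(fk, lambda(:)' * fsucc(:)); % relax by the weighted mean
end
rho = (ref - ftrial) / pred;              % compare with predicted decrease
end
```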

The other criterion is the filter step. We prefer a filter mechanism to assess the suitability of the trial point. Our strategy is inspired by that of [24]: we decide that a trial point $x_k + s_k$ is acceptable for the filter if and only if
$$\pi^f(x_k + s_k) \le (1 - \gamma_F)\, \pi^f(\bar{x}) \quad \text{for some } \bar{x} \text{ in the filter}, \quad (23)$$
where $\gamma_F$ is a small positive constant and $\pi^f$ is the criticality measure of the original objective function.
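Under this reconstructed acceptance rule (the exact inequality in the original (23) may differ), a MATLAB sketch of the test reads:

```matlab
% Filter acceptance test in the spirit of (23) as reconstructed above;
% filterPi stores the criticality measures of the current filter entries.
function ok = filter_acceptable(pi_trial, filterPi, gammaF)
if isempty(filterPi)
    ok = true;                            % an empty filter accepts everything
else
    ok = any(pi_trial <= (1 - gammaF) * filterPi);
end
end
```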

Aiming to solve nonlinear optimization problems with unavailable first- (or second-) order derivatives and to guarantee reliable and efficient numerical performance, we now state the following derivative-free trust region method.

Algorithm 4 (a derivative-free trust region method with nonmonotone filter technique).
Initialization Step. Choose a starting point $x_0 \in \Omega$ and suitable algorithmic constants (the trust region bounds, the acceptance and criticality thresholds, the nonmonotonicity weights, and the filter parameter) satisfying the usual ordering restrictions among them. Set NONCONVEX = 0, RESTRICT = 0, and $k = 0$.
Main Step.
Step 1. Construct the model $m_k$ as in (2); get $g_k$ and $H_k$. If $H_k$ is not positive semidefinite, set NONCONVEX = 1.
Step 2. If the criticality measure $\pi_k^m$ is below the criticality threshold, $m_k$ is fully linear, and $\Delta_k$ is sufficiently small, stop; otherwise, implement Step 9 to obtain an improved model, make $m_k$ fully linear, and update $\Delta_k$.
Step 3. Solve the trust region subproblem (5) to obtain the trial step $s_k$.
Step 4. Compute the trial point $x_k + s_k$; obtain $f(x_k + s_k)$ and the corresponding criticality information.
Step 5. Evaluate the ratio $\rho_k$ by formulas (6) and (22). If $\rho_k$ is below the acceptance threshold, set $x_{k+1} = x_k$ and RESTRICT = 1, and go to Step 7.
Step 6. Test whether the trial step is acceptable by (20) and (23). If $x_k + s_k$ is acceptable for the filter and NONCONVEX = 0, set $x_{k+1} = x_k + s_k$ and RESTRICT = 0, and add the new point to the filter if either of the filter-update conditions holds. Elseif $\rho_k$ is sufficiently large, set $x_{k+1} = x_k + s_k$ and RESTRICT = 0. Elseif NONCONVEX = 1, set $x_{k+1} = x_k + s_k$ and reinitialize the filter to the empty set. Else set $x_{k+1} = x_k$ and RESTRICT = 1.
Step 7. Adjust the trust region radius $\Delta_{k+1}$ according to the outcome of Steps 5 and 6.
Step 8. If the iteration was successful, or RESTRICT = 0 and $m_k$ is not fully linear on $B(x_k; \Delta_k)$, increment $k$ by one and go to Step 1; else set RESTRICT = 1, implement Step 9, and then go to Step 2.
Step 9 (model improvement). Set $i = 0$ and $\Delta_k^{(0)} = \Delta_k$. Repeat: increment $i$ by one, shrink the radius to $\Delta_k^{(i)}$, and improve the model by the model-improvement algorithm until it is fully linear on $B(x_k; \Delta_k^{(i)})$ (i.e., satisfying the error bounds (3) and (4)). Set $m_k$ and $\Delta_k$ to the resulting model and radius. Until the criticality condition of Step 2 is met.

Remark 5. At the beginning of the iteration process, Step 5 is first encountered with an empty list of successful iterates. In this case the sum in (22) is empty, and thus we define the relaxed actual decrease simply as $f(x_k) - f(x_k + s_k)$.

Remark 6. In order to obtain a suitable approximate function, the objective function of the trust region subproblem needs to be updated when necessary. Step 9 is the model-improvement method, which follows the same principle as Algorithm 2 proposed in [14, 15].

3. Global Convergence for First-Order Critical Points

The purpose of this section is to provide an in-depth treatment of the global convergence properties of Algorithm 4 in the first-order case. We recall or state some reasonable assumptions on problem (1a) and (1b) that are needed to obtain global convergence of the derivative-free trust region method.

Assumptions.
(A2) The level set $L(x_0)$ is bounded.
(A3) There exist positive constants $\kappa_f$ and $\kappa_H$ such that $\|\nabla^2 f(x)\| \le \kappa_f$ and $\|H_k\| \le \kappa_H$, respectively, for all $x \in L(x_0)$ and all $k$. Set $\kappa = \max\{\kappa_f, \kappa_H\}$.

The global convergence property is described by the following Theorem 7, which indicates that there exists at least one accumulation point of the sequence generated by the derivative-free trust region method of Algorithm 4 with the filter technique that is a stationary point of the optimization problem (1a) and (1b).

Theorem 7. Suppose that Assumptions (A1)–(A3) hold, together with the error bounds (3) and (4) and the fact that $\|H_k\|$ is bounded by $\kappa_H$. Suppose furthermore that $m_k$ is fully linear on $B(x_k; \Delta_k)$. Then
$$\lim_{k \to \infty} \pi_k^f = 0.$$

In order to prove Theorem 7, the remainder of this section derives Lemmas 8–15 as supporting results.

Lemma 8. Suppose that Assumptions (A1)–(A3) hold, together with the error bounds (3) and (4) and the fact that $\|H_k\|$ is bounded by $\kappa_H$, and that $m_k$ is fully linear on $B(x_k; \Delta_k)$. Then the criticality test of Algorithm 4 (via the loop in Step 9) will terminate in a finite number of improvement steps if $\pi^f(x_k) \neq 0$.

Proof. We prove this result by contradiction. Assume that the loop in Step 9 is infinite. We will show that $\pi^f(x_k)$ must be zero in this case.
When Step 9 is invoked, we notice that we do not have a certifiably fully linear model or that the radius exceeds the criticality bound. The loop then shrinks the radius and improves the model until it is fully linear on $B(x_k; \Delta_k^{(i)})$. If the criticality measure $\pi_k^{m,(i)}$ of the resulting model satisfies the stopping test of Step 2, the procedure stops.
Otherwise, the radius is shrunk again, the model is improved until it is fully linear on the smaller ball, and the test is repeated.
The only way for this procedure to be infinite is if
$$\Delta_k^{(i)} > \mu\, \pi_k^{m,(i)} \quad \text{for all } i \ge 0,$$
where $\mu > 0$ is the criticality constant. This construction implies
$$\pi_k^{m,(i)} < \frac{\Delta_k^{(i)}}{\mu} \longrightarrow 0 \quad \text{as } i \to \infty.$$
Since each model was fully linear on $B(x_k; \Delta_k^{(i)})$, the bound (3) provides
$$\pi^f(x_k) \le \pi_k^{m,(i)} + \kappa_{eg}\, \Delta_k^{(i)}.$$
The choice of the shrinking factor in Algorithm 4 implies that $\Delta_k^{(i)} \to 0$, and hence $\pi^f(x_k) = 0$, a contradiction.

Lemma 9. Suppose that Assumptions (A1)–(A3) hold, together with the error bounds (3) and (4) and the fact that $\|H_k\|$ is bounded by $\kappa_H$, and that $m_k$ is fully linear on $B(x_k; \Delta_k)$. Suppose furthermore that $x_k$, $s_k$, $\Delta_k$, etc. are generated by Algorithm 4. Then for all computed indices the estimate (32) holds.

Proof. We note from (6) that the predicted decrease is positive. The proof is by induction over the successful indices. For $j = 0$ the claim follows from (20) and the definition of the relaxed decrease. Now assume that (32) holds for $j$. If the step was accepted by the nonmonotone ratio test, then the claim for $j + 1$ follows by using (32) and the relaxation (22). If the step was accepted by the filter test, then the claim follows from the acceptance condition (23) together with the definition of $\mathcal{S}$, and the induction is complete.

Lemma 9 provides the sufficient descent property at the $k$th iteration of Algorithm 4.

Lemma 10. Suppose that Assumptions (A1)–(A3) hold, together with the error bounds (3) and (4) and the fact that $\|H_k\|$ is bounded by $\kappa_H$. Suppose furthermore that $x_k$, $s_k$, $\Delta_k$, etc. are generated by Algorithm 4, that $m_k$ is fully linear on $B(x_k; \Delta_k)$, and that
$$\Delta_k \le \kappa_s\, \pi_k^m \quad (38)$$
for a sufficiently small constant $\kappa_s > 0$. Then for arbitrary $k$ with $\pi_k^m \neq 0$, the $k$th iteration is successful.

Proof. Since $\pi_k^m \neq 0$ and (38) holds, we obtain from the decrease condition (6) a lower bound on the predicted decrease of order $\pi_k^m \Delta_k$. On the other hand, since the current model is fully linear on $B(x_k; \Delta_k)$, the bound (3) on the error between the function and the model together with (38) shows that the deviation of the actual decrease from the predicted decrease is of order $\Delta_k^2$, where we have used the assumption that $\kappa_s$ is sufficiently small to deduce the resulting inequality.
For the remaining case we obtain, with an appropriate constant, an analogous estimate. Since $\pi^f$ is continuous, the criticality measure stays bounded away from zero in a neighborhood of $x_k$, and hence, for the corresponding indices, the ratio of actual to predicted decrease approaches one. At the same time, we get from (5) and (38) that the trial step remains feasible for the trust region subproblem. These estimates together imply that the reduction ratio exceeds the acceptance threshold. For either case, we obtain the conclusion that the step is accepted; that is, the $k$th iteration is successful.

Lemma 11. Suppose that Assumptions (A1)–(A3) hold, together with the error bounds (3) and (4) and the fact that $\|H_k\|$ is bounded by $\kappa_H$, and that $m_k$ is fully linear on $B(x_k; \Delta_k)$. Suppose furthermore that there exists a constant $\epsilon > 0$ such that $\pi_k^m \ge \epsilon$ for all $k$. Then there is a constant $\Delta_{\min} > 0$ such that $\Delta_k \ge \Delta_{\min}$ for all $k$.

Proof. The proof is the same as that of Lemma 5.3 in [8], except that we now have to replace the model gradient norm by the criticality measure $\pi_k^m$ and use (38) instead of the model decrease defined in [8].

Lemma 12. Suppose that Assumptions (A1)–(A3) hold, together with the error bounds (3) and (4) and the fact that $\|H_k\|$ is bounded by $\kappa_H$, and that $m_k$ is fully linear on $B(x_k; \Delta_k)$. Suppose furthermore that there exists a constant $\epsilon > 0$ such that $\pi_k^m \ge \epsilon$ for all $k$. Then there can be only finitely many successful nonconvex iterations in the course of the algorithm, i.e., $|\mathcal{N}| < \infty$, where $\mathcal{N}$ denotes the set of successful nonconvex iterations.

Proof. Suppose, for the purpose of obtaining a contradiction, that there are infinitely many successful nonconvex iterations, which we index by $\mathcal{N}$. The mechanism of the algorithm guarantees that every iteration in $\mathcal{N}$ also belongs to the set of sufficient descent iterations, which in turn implies with (6) that, for $k \in \mathcal{N}$, the model decrease is bounded below by a positive constant, where we have used Lemma 8, (A3), and our lower bound on the criticality measure to obtain the last inequality. Combining now this bound with (3), we deduce that the objective function decreases by at least a fixed positive amount at every iteration in $\mathcal{N}$. As we have supposed that there are infinitely many successful nonconvex iterations, the accumulated decrease is unbounded above, which contradicts the fact that the objective function is bounded below, as stated in Assumption (A1). Our initial assumption must then be false, and the set of successful nonconvex iterations must be finite.

Lemma 13. Suppose that Assumptions (A1)–(A3) hold, together with the error bounds (3) and (4) and the fact that $\|H_k\|$ is bounded by $\kappa_H$, and that $m_k$ is fully linear on $B(x_k; \Delta_k)$. Suppose furthermore that there are only finitely many successful iterations, i.e., $|\mathcal{S}| < \infty$. Then $\pi^f(x^*) = 0$, where $x^*$ is the final accepted iterate.

Proof. Let $k_0$ be the index of the last successful iteration. Then $x_k = x_{k_0 + 1} =: x^*$ for all $k > k_0$ and
$$f(x_k) = f(x^*) \quad \text{for all } k > k_0. \quad (48)$$
Now observe that RESTRICT is set by the algorithm in the course of every unsuccessful iteration. This flag must thus be set at the beginning of every iteration of index $k$ for $k > k_0 + 1$. As a consequence, the model-improvement step is invoked for all such $k$. This, (48), and the mechanism of Step 7 of the algorithm then imply that
$$\lim_{k \to \infty} \Delta_k = 0. \quad (49)$$
Assume now, for the purpose of establishing a contradiction, that $\pi^f(x^*) > 0$. Then Lemma 10 implies that (49) is impossible, since a sufficiently small radius would force a successful iteration, and we deduce that $\pi^f(x^*) = 0$.

Lemma 14. Suppose that Assumptions (A1)–(A3) hold, together with the error bounds (3) and (4) and the fact that $\|H_k\|$ is bounded by $\kappa_H$, and that $m_k$ is fully linear on $B(x_k; \Delta_k)$. Suppose furthermore that $|\mathcal{S}| = \infty$. Then
$$\liminf_{k \to \infty} \pi_k^f = 0.$$

Proof. Assume, for the purpose of obtaining a contradiction, that, for all $k$ large enough,
$$\pi_k^f \ge \epsilon \quad (51)$$
for some $\epsilon > 0$, and consider the corresponding tail of successful iterations. The bound (51) and Lemma 12 then imply that the number of successful nonconvex iterations is finite and therefore that the filter is no longer reset to the empty set for $k$ sufficiently large. Moreover, since our assumptions imply that $\Delta_k$ is bounded above and away from zero, there must exist a subsequence $\{k_i\} \subseteq \mathcal{S}$ along which the criticality measures of the accepted points converge, as in (52). By definition of $\mathcal{S}$, the point $x_{k_i} + s_{k_i}$ is acceptable for the filter at each iteration $k_i$. This implies, since the filter is not reset for $k$ large enough, that, for each $i$ sufficiently large, there exists an earlier filter entry for which the acceptance inequality (53) holds. But (51) implies that the criticality measures stay bounded away from zero for all $i$ sufficiently large. Hence we deduce from (53) that the margin inequality (54) holds for all $i$ sufficiently large. But the left-hand side of this inequality tends to zero when $i$ tends to infinity because of (52), yielding the desired contradiction. Hence the conclusion holds.

We now state the analogue of Lemma 5.7 in [8]: if the model criticality measure converges to zero on a subsequence, then so does the true criticality measure.

Lemma 15. For any subsequence $\{k_i\}$ such that
$$\lim_{i \to \infty} \pi_{k_i}^m = 0, \quad (55)$$
it also holds that
$$\lim_{i \to \infty} \pi_{k_i}^f = 0. \quad (56)$$

Combining Lemmas 14 and 15, the global convergence property stated in Theorem 7 follows immediately; this also illustrates that the criticality step plays an important role in ensuring that a subsequence of the iterates approaches first-order stationarity.

4. Numerical Experiments

In this section, we examine the practical performance of the derivative-free trust region method in two respects. First, we compare the numerical results of the proposed derivative-free trust region method with those of traditional gradient-based algorithms, in order to illustrate the effectiveness of Algorithm 4 in solving general optimization problems. Then we apply the derivative-free trust region algorithm with the nonmonotone filter technique to a parameter estimation problem, to show the performance of Algorithm 4 on derivative-free optimization problems. All routines are written in Matlab R2009a and run on a PC with a 2.66 GHz Intel(R) Core(TM)2 Quad CPU and 4 GB DDR2 memory.

4.1. Numerical Results of the Derivative-Free Trust Region Algorithm with Nonmonotone Filter Technique

The Hock and Schittkowski [27] test set is frequently used to test derivative-free algorithms on moderate-size problems. For running the proposed derivative-free trust region algorithm with the nonmonotone filter technique, the bound constraints of each problem define the set $\Omega$, and linear (not bound) constraints of the original problem were handled by projections. We test 27 simple bound constrained optimization problems (listed in Table 1) from the test examples for nonlinear programming codes [27–29]. The same set of algorithmic parameter values is used for all problems. At the same time, we introduce two classical algorithms, the traditional quasi-Newton method and the conjugate gradient method [30], as benchmarks to measure the efficiency of the proposed algorithm on the tested problems. We denote these two algorithms by Algorithm 1.1 and Algorithm 1.2.

We use the performance profile tool of Dolan and Moré [31] to analyze the efficiency of the three algorithms. Figures 1 and 2 show that Algorithm 4 is feasible and is the most robust among the three methods. It is not difficult to see from Figure 1 that Algorithm 4 has a clear lead among the three methods in CPU time on the test problems. Simultaneously, Figure 2 illustrates that Algorithm 4 reaches optimal performance more quickly in terms of the number of function evaluations than the other two algorithms.
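For reference, a minimal MATLAB sketch of the Dolan–Moré performance profile [31] used to produce such figures; the data matrix T is an assumed input holding one cost per problem/solver pair, not the authors' script.

```matlab
% Minimal sketch of a Dolan-More performance profile [31]; T(p,s) is the
% cost (e.g., CPU time) of solver s on problem p, with Inf marking
% failures, and names is a cell array of solver names (assumed inputs).
function perf_profile(T, names)
[np, ns] = size(T);
R = T ./ repmat(min(T, [], 2), 1, ns);   % performance ratios per problem
taus = unique(R(isfinite(R)));           % candidate ratio thresholds
for s = 1:ns
    rho = arrayfun(@(t) sum(R(:, s) <= t) / np, taus);
    semilogx(taus, rho); hold on;        % fraction solved within ratio tau
end
legend(names); xlabel('\tau'); ylabel('\rho_s(\tau)');
end
```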

To measure the efficiency of the proposed algorithm on large-scale optimization problems, we compare this method with Algorithm 2.1 in [32] using three characteristics, "NI," "NF," and "CPU," where "NI" denotes the number of iterations, "NF" the number of function evaluations, and "CPU" the processing time for the tested problems. The numerical results for the corresponding problems are listed in Table 2.

Although we only list the data for dimension 9000, the objective algorithm (Algorithm 4) is evidently more effective for large-scale optimization, since its iteration counts and CPU times, two essential measures of the efficiency of an algorithm, are smaller than those of Algorithm 2.1.

4.2. The Derivative-Free Trust Region Algorithm to Parameter Estimation

Algorithm 16 (basic trust region method).
Step 0. An initial point $x_0$ and an initial trust region radius $\Delta_0$ are given, as well as parameters $0 < \eta_1 \le \eta_2 < 1$ and $0 < \gamma_1 < 1 < \gamma_2$. Compute $f(x_0)$ and set $k = 0$.
Step 1. Solve a trust region subproblem and compute a step which satisfies the sufficient decrease conditions.
Step 2. Compute $f(x_k + s_k)$ and
$$\rho_k = \frac{f(x_k) - f(x_k + s_k)}{m_k(x_k) - m_k(x_k + s_k)}.$$
If $\rho_k \ge \eta_1$, then set $x_{k+1} = x_k + s_k$; otherwise, set $x_{k+1} = x_k$.
Step 3. Set
$$\Delta_{k+1} = \begin{cases} \gamma_1 \Delta_k, & \text{if } \rho_k < \eta_1, \\ \Delta_k, & \text{if } \eta_1 \le \rho_k < \eta_2, \\ \gamma_2 \Delta_k, & \text{if } \rho_k \ge \eta_2. \end{cases} \quad (57)$$
Increment $k$ by one, and go to Step 1.

Trust region methods generate steps with the help of a quadratic model of the objective function, but they use this model in a different way from line search methods: they define a region around the current iterate within which they trust the model to be an adequate representation of the objective function and then choose the step to be an approximate minimizer of the model in this region. The size of the trust region is critical to the effectiveness of each step. There are four parameters in the trust region update process (57), namely $\eta_1$, $\eta_2$, $\gamma_1$, and $\gamma_2$, which are used to adjust the size of the trust region radius. These values are somewhat arbitrary, and much better options may be available. The classical parameter values (58) are recommended in the literature [30], but they may not be the best choice if we consider the total CPU time or the overall number of function evaluations needed to solve a set of optimization problems. Identifying better values is itself a derivative-free optimization problem. In this section, the objective is to identify optimal values of the four parameters of the trust region update using the proposed Algorithm 4; a runnable sketch of Algorithm 16 with these parameters exposed is given below. Next we describe the specific formulation of this parameter estimation problem.
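The following MATLAB sketch implements Algorithm 16 with the four update parameters of (57) passed in explicitly, so that they can be treated as optimization variables; the steepest-descent step on a linear model is an illustrative simplification, not the authors' implementation.

```matlab
% Runnable sketch of Algorithm 16 with the parameters of (57) exposed;
% f and grad are function handles, x the starting point, Delta the radius.
function [x, k] = basic_tr(f, grad, x, Delta, eta1, eta2, gamma1, gamma2)
for k = 1:1000
    g = grad(x);
    if norm(g) <= 1e-8, break; end       % first-order stopping test
    s = -(Delta / norm(g)) * g;          % minimizer of the linear model on the ball
    pred = Delta * norm(g);              % predicted decrease -g'*s
    rho = (f(x) - f(x + s)) / pred;      % actual over predicted decrease
    if rho >= eta1, x = x + s; end       % Step 2: accept the trial point
    if rho < eta1                        % Step 3: radius update (57)
        Delta = gamma1 * Delta;
    elseif rho >= eta2
        Delta = gamma2 * Delta;
    end
end
end
```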

Let $\mathcal{P}$ be a set of optimization problems, let $t_p(\theta)$, $p \in \mathcal{P}$, be the CPU time necessary to solve problem $p$ with the parameter values $\theta = (\eta_1, \eta_2, \gamma_1, \gamma_2)$, and let $n_p(\theta)$ be the corresponding number of function evaluations. Thus the small-dimensional nondifferentiable optimization problem, measured by the overall CPU time or the overall number of function evaluations, is written as
$$\min_{\theta}\ T(\theta) := \sum_{p \in \mathcal{P}} t_p(\theta) \qquad \text{or} \qquad \min_{\theta}\ N(\theta) := \sum_{p \in \mathcal{P}} n_p(\theta), \quad (59)$$
subject to the natural bounds on $\theta$. We consider minimizing the total computing time $T(\theta)$, as sketched below. The test problems in $\mathcal{P}$ and their dimensions are listed in Table 3. The initial point chosen for Algorithm 4 is the classical parameter vector (58). The best set of parameters found by Algorithm 4 is given in (60). Table 3 shows the results on the test problems; timings are in seconds, and the second measure is the number of function evaluations in each case. Failures occur when a problem cannot be solved within the prescribed limits. Table 3 demonstrates that this strategy improved the total CPU time relative to the classical initial values by approximately 10.31%, and the total number of function evaluations by approximately 11.52%.
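A minimal MATLAB sketch of the black-box objective $T(\theta)$ in (59), reusing the basic_tr sketch above (`problems` is an assumed cell array of problem structs; this is not the authors' test harness):

```matlab
% Black-box objective T(theta) of (59): total CPU time over the test set
% as a function of theta = (eta1, eta2, gamma1, gamma2). `problems` is an
% assumed cell array of structs with fields f, grad, x0, Delta0.
function T = total_time(theta, problems)
T = 0;
for p = 1:numel(problems)
    pb = problems{p};
    tic;
    basic_tr(pb.f, pb.grad, pb.x0, pb.Delta0, ...
             theta(1), theta(2), theta(3), theta(4));
    T = T + toc;                         % accumulate CPU time for this problem
end
end
```

Minimizing this function with Algorithm 4 treats the trust region parameters as the variables of a small nondifferentiable bound constrained problem, exactly as formulated in (59).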

The function which characterizes the performance profiles of the two sets of parameter values is the same as in [2]. The performance in computing time required to solve each optimization problem in the set $\mathcal{P}$, with the optimized parameter values, can be visualized in the profile of Figure 3, which compares the CPU times under the two parameter choices. The profile of Figure 4 presents a similar comparison using the number of function evaluations. Both Figures 3 and 4 illustrate that Algorithm 4 has an advantage in solving nondifferentiable optimization problems.

5. Conclusion

This paper proposes a derivative-free trust region algorithm with a nonmonotone filter technique for bound constrained optimization.
(i) The algorithm is mainly designed to solve optimization problems in engineering whose derivatives are unavailable. It possesses the trust region property and adopts a nonmonotone filter technique for bound constrained optimization.
(ii) Global convergence is established under the definition of the fully linear model. The sufficient descent property forces the objective function value to decrease, and the iteration sequence then converges to a global limit point if the problem is convex.
(iii) The preliminary numerical results, compared with the traditional quasi-Newton method and the conjugate gradient method, show that the proposed algorithm is feasible for problems whose derivative functions are unavailable. Tests on large-scale problems show that the new algorithm is very effective.
(iv) Finally, optimal parameters with respect to overall computational time on a set of test problems are identified. We use the proposed algorithm to obtain a best choice of parameter values, which differs from the traditionally used values, and compare the numerical results under the two parameter sets in terms of both CPU time and the number of function evaluations.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work is supported by the National Science Foundation of China (Grant nos. 11626037, 11526036, and 11601014); the 13th Five-Year Science and Technology Project of Education Department of Jilin Province (Grant nos. JJKH20170036KJ, JJKH20160046KJ); the Innovation Talent Training Program of Science and Technology of Jilin Province of China (Grant no. 20180519011JH); the Scientific and Technological Planning Project of Jilin Province of China (Grant nos. 20160520108JH, 20170101037JC); the PhD Start-Up Fund of Natural Science Foundation of Beihua University and the Youth Training Project Foundation of Beihua University (Grant no. 2017QNJJL10).