Nonlinear Analysis: Algorithm, Convergence, and Applications 2014View this Special Issue
Sufficient Descent Polak-Ribière-Polyak Conjugate Gradient Algorithm for Large-Scale Box-Constrained Optimization
A practical algorithm for solving large-scale box-constrained optimization problems is developed, analyzed, and tested. In the proposed algorithm, an identification strategy is involved to estimate the active set at per-iteration. The components of inactive variables are determined by the steepest descent method at first finite number of steps and then by conjugate gradient method subsequently. Under some appropriate conditions, we show that the algorithm converges globally. Numerical experiments and comparisons by using some box-constrained problems from CUTEr library are reported. Numerical comparisons illustrate that the proposed method is promising and competitive with the well-known method—L-BFGS-B.
We consider the box-constrained optimization problem where is continuously differentiable, and , with . The gradient of at is , and its th component is () or for the sake of simplicity. We define the feasible region of (1) as ; that is, We say that a vector is a stationary point for problem (1) if it satisfies Problem (1) is very important in practical optimization, because numerous practical problems can be converted into this form. In addition, problem (1) is often treated as a subproblem of augmented Lagrangian or penalty schemes for general constrained optimization. Hence, the development of numerical algorithms to efficiently solve (1), especially when the dimension is large, is important both in theory and application. Moreover, the box-constrained optimization problems have been received much attention in recent decades. We refer to the excellent paper  for a good review.
Many algorithms for solving this type of problems are based on active set strategies, for example, . In this class of methods, a working set is used to estimate the set of active constraints at the solution and it is updated at per-iteration. The early active set methods are quite efficient for relatively lower dimensional problems but are typically unattractive for large-scale ones . The main reason is that at most one constraint can be added to or dropped from the active set at each step. However, this slows down the rate of the convergence. Based on this consideration, more and more scholars are engaged in designing active set methods aiming to make rapid changes against/from incorrect predictions, for example, [3–8].
Let . It is clear that problem (1) reduces to the following unconstrained optimization problem: It can be solved by almost any existing effective methods such as the nonlinear conjugate gradient method, limited memory BFGS method, and spectral gradient method. These methods are much suitable for solving (4) with being assumed to be large due to their simplicity and low storage requirements.
In conjugate gradient methods scheme, Polak-Ribière-Polyak (PRP) method is generally believed to be the most efficient from computation point of view. But its global convergence for nonconvex nonlinear function is still uncertain . To overcome this shortcoming, various techniques have been proposed. In particular, Zhang et al.  introduced a modified MPRP method in which the generated direction is descent independent of line search rule . Global convergence is guaranteed by means of a variation of the Armijo-like line search strategy of Grippo and Lucidi . Moreover, MPRP is also considered to be competent to solve the box-constrained optimization problem (1) based on projected gradient techniques . However, to the best of our knowledge, papers on nonlinear conjugate gradient for solving box-constrained minimization problems are relatively fewer.
In this paper, from a different point of view, we further study the applications of the MPRP method in  for solving box-constrained minimization problems. First, motivated by the work of Facchinei et al. , we estimate a lower dimensional free subspace at per-iteration, which has the potencies of being numerically cheaper when applied to solve large-scale problems. The search direction at per-iteration consists of two parts: some of the components are defined simply; the others are determined by the nonlinear gradient method. The remarkable feather of the proposed method is that all the generated points are feasible without requiring gradient projection. Under some appropriate conditions, we show that the method converges globally. We present experimental results comparing the developed method with the well-known algorithm L-BFGS-B.
The paper is organized as follows. For easy comprehension of the proposed algorithm, we briefly recall an active set identification technique and then state the steps of our new algorithm in Section 2. In Section 3, we show that the proposed method converges globally. In Section 4, we test the performance of the algorithm and do some numerical comparisons. Finally, we draw some conclusions in Section 5. In the rest of this paper, the symbol denotes the Euclidean norm of vectors.
2. Motivation and Algorithm
In this section, we introduce a new algorithm which is defined by where is a search direction and is the corresponding step length. To construct our algorithm, throughout this paper, we assume that the following assumptions hold.
Assumption 1. The level set is compact.
Assumption 2. In some neighborhood of , the gradient is Lipschitz continuous; that is, there exists a constant such that
Assumption 3. Strict complementarity condition holds at ; that is, the strict inequalities hold in the first and the last implications of (3).
In , Facchinei et al. presented an active set Newton algorithm for solving problem (1). Their algorithm possesses some favorable properties, such as fast local convergence and feasibility of all iterations. In addition, only a lower dimensional quadratic program subproblem needs to be solved at per-iteration. We now simply recall the active set identification technique in . Let and be nonnegative continuous and bounded functions defined on , such that if or , then or , respectively. For any , define the index set , and as The set is an estimate of the active set at point . Facchinei et al.  showed that with Assumption 3, when is sufficiently close to , the index set is an exact identification of . This property will enable us to find the active set at the stationary point in a finite number of steps. This active set identification technique (7) has motivated much additional studies on box-constrained optimization problems, for example, [14–18].
Using the active set estimation technique in (7), we deduce the search direction used in our algorithm in detail now. For the sake of simplicity, let , , , and let be the number of elements in the corresponding vector let be the matrix whose columns are , where is the th column of the identity matrix. Additionally, we denote, respectively, the th component of and by and .
Let , in which Now, we restrict our attention to define components with index . Let where in which , , and . Clearly, and therefore . It is not difficult to see that the definition of the search direction ensures the sufficient descent property; that is The definition of the search direction (9) was designed by Zhang et al.  in full space . Here, it is stated for slightly different circumstances, but it is easy to verify that it is still valid in our context.
In order to guarantee the feasibility of all iterations and the descent of the objective function at each iteration, we define the positive scalar by Note that when the strict complementarity condition holds and , then is always well defined unless the stationary point is achieved. Combining with (8) and (9), the compact form of can be rewritten as
Remark 4. Observe that the search direction of inactive variables in (9) contain both gradient and direction information at the current or the previous steps, while the set of inactive variables might be changed as the iteration progresses. In particular, the sets and might not be the same. In this situation, we set , that is, a negative gradient direction. Otherwise, we use the conjugate gradient formula to update the search direction in the free subspace.
Remark 5. Considering the case when the current iteration is still not an optimal point, the value of might be sufficiently small. In this case, for practical purpose, it might be efficient to restrict bounded from zero.
To list our algorithm in detail, we first show some useful properties of the direction. The following result shows that whenever , it is a descent direction for objective function at current point . The property is very important to our algorithm.
Lemma 6. If is attained by (13), it satisfies and the equality holds if and only if .
Proof. By the definition of search direction , we have The above relation can be easily obtained from the definition of the direction in (13). Notice that the strict complementary condition holds, which shows if and only if .
As an immediate consequence of the definition of the search direction, we have the following fact.
Lemma 7. Let be determined by (13) and assume that ; then,
The above lemma shows that for all such that Since the search direction has been computed, a step length ought to be determined in order to obtain the next iteration as (5). In this paper, we consider a backtracking line search procedure, where a decreasing sequence of is tried until the smallest is found which satisfies where , . Then a step length is accepted corresponding to .
We are now ready to formally state the overall algorithm for solving the box-constrained optimization problems (1) as follows.
Algorithm 8 (SDPRP). The steps of the algorithm is given as follows.
Step 0. Given starting point , constants , and . Set .
Step 1. Determine , , and according to (7).
Step 2. If , set . Otherwise, set Determine by (13).
Step 3. If , then stop.
Step 4. Find the smallest integer (say ) satisfying (18). Let .
Step 5. Set .
Step 6. Let , go to Step 1.
Remark 9. Now suppose that the active sets have been identified after a finite number of steps; then, we have for all and for all . If , by the line search process, we know that does not satisfy (18). That is,
By the mean-value theorem, there is a such that where is the Lipschitz constant. It follows from [10, Lemma 3.1] that there exists a positive constant such that . Substituting the above inequality into (20), we have It follows that when , the line search step is well defined, so is the whole algorithm.
3. Convergence Analysis
Proof. Let . If , then we have Since and , the right inequality implies . If , we will prove it similarly. And if , we can get from (11) that . Now suppose that is a stationary point of (1); it gets from (3) and (7) that Then it follows from (9) and (11) that , and . Therefore .
Lemma 11. Let be a sequence of iterations generated by Algorithm 8 and let be a search direction defined by (13). Suppose that there are subsequences and as . Then is a stationary point of problem (1).
Proof. Since the number of distinct sets , , and is finite and Assumptions 1–3 hold, Assumption 3 ensures that there exists such that for all , Since , we obviously have Furthermore, the fact that and implies The proof of can be obtained once we notice that (11) and as .
Proof. It easily follows from Lemma 7 that the sequence of points generated by the Algorithm 8 is contained in the compact set . Hence, by Assumption 1, there exists at least a limit point of this sequence.
If the sequence is finite with last point , then, by Lemma 10, is a stationary point of problem (1). So we assume that the sequence is infinite.
Regarding the first part of proof in Lemma 11, , , and are constants starting from some iteration index ; that is,
It is easy to see that as . Now, it remains to prove () as . According to the global convergence theorem in , we have By the definition of in (9), it is equivalent to It follows from (29) and (31) that , which shows our claims by Lemma 11. The proof is complete.
4. Numerical Experiments
Now, let us report some numerical results attained by our Sufficient Descent Polak-Ribière-Polyak Algorithm—SDPRP. The algorithm is implemented by Fortran77 code in double precision arithmetic. All experiments are run on a PC with CPU Intel Pentium Dual E2140 1.6 GHz, 512 M bytes of SDRAM memory, and Red Hat Linux 9.03 operating system. Our experiments are performed on a set of the nonlinear box-constrained problems from the CUTEr  library that have second derivatives available. Since we are interested in large problems, we refine this selection by considering only problems where the number of variables is at least . Altogether, we solve problems. The type of objective function and the character of the problems being tested are listed in Table 1.
In the experiments, for easily comparing with other codes, we use the projected gradient errors to measure the quality of the solutions instead of ; that is, we force the iteration stopped when where is the projected gradient of objective function and denotes the maximum absolute component of a vector. Clearly, the stopping criteria (32) are equivalent to in our method. We also stop the execution of SDPRP when iterations or function evaluations are completed without achieving convergence. We choose the initial parameters , , , and . Moreover, we also test our proposed methods with different parameters and to see that is the best choice. In order to assess the performance of the SDPRP, we then test the well-known method L-BFGS-B (available at http://www.ece.northwestern.edu/~nocedal/lbfgsb.html) [20, 21] for comparison.
When running the codes L-BFGS-B, default values are used for all parameters. For L-BFGS-B, besides (32), the algorithm has another built-in stopping test based on the parameter factr. It is designed to terminate the run when the change in the objective function is sufficiently small. The default value factr for moderate accuracy. Additionally, it can occur sometimes that the line search cannot make any progress, or the value of Nfg reached a prefixed number (=10000); in these cases, the run is also terminated.
The numerical results of the algorithms SDPRP and L-BFGS-B are listed in Tables 2 and 3, respectively. We used horizontal lines in both tables to divide the selected problems into classes according to Table 1. The columns in Tables 2 and 3 have the following meanings: Problem: name of the problem; Dim: dimension of the problem; Iter: number of iterations; Nf: number of function evaluations; Ng: number of gradient evaluations; Time: CPU time in seconds; Fv: final function value; Pgnorm1: maximum-norm of the final gradient projection.
The symbol “—” indicates that the CUTEr system becomes nonrespondent when calculating the corresponding problem. For L-BFGS-B, we record the number of function and gradient evaluations Nfg because L-BFGS-B always evaluates the function and gradient at the same time.
Some general observations on the results in Tables 2 and 3 are the following. From the last column of Table 3, we see that, for problems, L-BFGS-B has terminated abnormally without being able to satisfy the termination condition (32). SDPRP fails to reach a stationary point based on the stopping criteria (32) only on the problems: BDEXP, EXPQUAD, SOND1LS, QR3DLS, BIGGS1, BQPGAUSS, CHENHARK, JNLBRNGB, NCVXBQP1, ODNAMUR, NCVXBQP2, GRIDGENA. It is an interesting fact that L-BFGS-B obtains at least as good function value as SDPRP on some problems but the projected gradient did not meet the stooping condition. This reason is not clear as pointed out by Zhu et al. in . In analyzing the performance of both methods on the problems at which at least one method works successfully, we observe that, on these problems, L-BFGS-B requires less iterations, less function evaluations than SDPRP. The sterical data of both algorithms are summarized in Table 4, where “Scueed” is the number of successes and “Failure” is the number of failures of tested problems for both algorithms.
Taking everything together, the preliminary numerical comparisons indicate that our proposed method is efficient and competitive with the well-known method L-BFGS-B. Moreover, we conclude that the method provides a valid approach for solving large-scale box-constrained problems.
In this paper, we have developed a subspace nonlinear conjugate gradient method for solving box-constrained optimization problems. For most of the tested optimization problems (42 out of 54), the algorithm works successfully to terminate at the solution. However, in the most cases, the number of function evaluations seems large. A common feather of this method is that all the generated points are feasible, without requiring gradient projection as usual methods in this scheme. Therefore, this may be the reason why the function evaluation numbers are higher than those of projected gradient or trust-region methods. However, we also believe that SDPRP is a valid approach for box-constrained problems. In our view, there are at least three issues that could lead to improvements. The first point that should be considered is probably the choice of the parameters in the active set identification technique and the value of the used parameters is not the only choice. Another important point that should be further investigated is the adoption of gradient projection technique. The third assumption is strong; how can we modify this algorithm so as to avoid the strict complementary assumption? Additionally, it is worthwhile to investigate that some existing conjugate gradient methods, such as [12, 22], whether it is possible to embed an active set framework to solve box-constrained problems. To this end, although the proposed method does not obtain significant development as we have expected, we think that the enhancement of this proposed method is still noticeable. Hence, we believe that the proposed algorithm is a valid approach for the problems and possibility and it may have its own potency.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
The authors thank the associate editor and two anonymous referees for their constructive suggestions which improved the paper greatly. This author’s work is supported by Naturel Science Foundation of Henan Province Grants GGJS2011030 and 132300410168.
D. G. Luenberger, Introduction to Linear and Nonlinear Programming, chapter 11, Addison-Wesley, Reading, Mass, USA, 1973.
J. Zhang, Y. Xiao, and Z. Wei, “Nonlinear conjugate gradient methods with sufficient descent condition for large-scale unconstrained optimization,” Mathematical Problems in Engineering, vol. 2009, Article ID 243290, 16 pages, 2009.View at: Publisher Site | Google Scholar | Zentralblatt MATH | MathSciNet
L. Zhang, Nonlinear conjugate gradient methods for optimization problems [Ph.D. thesis], College of Mathematics and Econometrics, Hunan University, Hunan, China, 2006.