Abstract and Applied Analysis

Abstract and Applied Analysis / 2014 / Article
Special Issue

Nonlinear Analysis: Algorithm, Convergence, and Applications 2014

View this Special Issue

Research Article | Open Access

Volume 2014 |Article ID 236158 | https://doi.org/10.1155/2014/236158

Qiuyu Wang, Yingtao Che, "Sufficient Descent Polak-Ribière-Polyak Conjugate Gradient Algorithm for Large-Scale Box-Constrained Optimization", Abstract and Applied Analysis, vol. 2014, Article ID 236158, 9 pages, 2014. https://doi.org/10.1155/2014/236158

Sufficient Descent Polak-Ribière-Polyak Conjugate Gradient Algorithm for Large-Scale Box-Constrained Optimization

Academic Editor: Gaohang Yu
Received07 Dec 2013
Accepted14 Feb 2014
Published13 Apr 2014


A practical algorithm for solving large-scale box-constrained optimization problems is developed, analyzed, and tested. In the proposed algorithm, an identification strategy is involved to estimate the active set at per-iteration. The components of inactive variables are determined by the steepest descent method at first finite number of steps and then by conjugate gradient method subsequently. Under some appropriate conditions, we show that the algorithm converges globally. Numerical experiments and comparisons by using some box-constrained problems from CUTEr library are reported. Numerical comparisons illustrate that the proposed method is promising and competitive with the well-known method—L-BFGS-B.

1. Introduction

We consider the box-constrained optimization problem where is continuously differentiable, and , with . The gradient of at is , and its th component is () or for the sake of simplicity. We define the feasible region of (1) as ; that is, We say that a vector is a stationary point for problem (1) if it satisfies Problem (1) is very important in practical optimization, because numerous practical problems can be converted into this form. In addition, problem (1) is often treated as a subproblem of augmented Lagrangian or penalty schemes for general constrained optimization. Hence, the development of numerical algorithms to efficiently solve (1), especially when the dimension is large, is important both in theory and application. Moreover, the box-constrained optimization problems have been received much attention in recent decades. We refer to the excellent paper [1] for a good review.

Many algorithms for solving this type of problems are based on active set strategies, for example, [2]. In this class of methods, a working set is used to estimate the set of active constraints at the solution and it is updated at per-iteration. The early active set methods are quite efficient for relatively lower dimensional problems but are typically unattractive for large-scale ones [3]. The main reason is that at most one constraint can be added to or dropped from the active set at each step. However, this slows down the rate of the convergence. Based on this consideration, more and more scholars are engaged in designing active set methods aiming to make rapid changes against/from incorrect predictions, for example, [38].

Let . It is clear that problem (1) reduces to the following unconstrained optimization problem: It can be solved by almost any existing effective methods such as the nonlinear conjugate gradient method, limited memory BFGS method, and spectral gradient method. These methods are much suitable for solving (4) with being assumed to be large due to their simplicity and low storage requirements.

In conjugate gradient methods scheme, Polak-Ribière-Polyak (PRP) method is generally believed to be the most efficient from computation point of view. But its global convergence for nonconvex nonlinear function is still uncertain [9]. To overcome this shortcoming, various techniques have been proposed. In particular, Zhang et al. [10] introduced a modified MPRP method in which the generated direction is descent independent of line search rule [11]. Global convergence is guaranteed by means of a variation of the Armijo-like line search strategy of Grippo and Lucidi [12]. Moreover, MPRP is also considered to be competent to solve the box-constrained optimization problem (1) based on projected gradient techniques [13]. However, to the best of our knowledge, papers on nonlinear conjugate gradient for solving box-constrained minimization problems are relatively fewer.

In this paper, from a different point of view, we further study the applications of the MPRP method in [10] for solving box-constrained minimization problems. First, motivated by the work of Facchinei et al. [3], we estimate a lower dimensional free subspace at per-iteration, which has the potencies of being numerically cheaper when applied to solve large-scale problems. The search direction at per-iteration consists of two parts: some of the components are defined simply; the others are determined by the nonlinear gradient method. The remarkable feather of the proposed method is that all the generated points are feasible without requiring gradient projection. Under some appropriate conditions, we show that the method converges globally. We present experimental results comparing the developed method with the well-known algorithm L-BFGS-B.

The paper is organized as follows. For easy comprehension of the proposed algorithm, we briefly recall an active set identification technique and then state the steps of our new algorithm in Section 2. In Section 3, we show that the proposed method converges globally. In Section 4, we test the performance of the algorithm and do some numerical comparisons. Finally, we draw some conclusions in Section 5. In the rest of this paper, the symbol denotes the Euclidean norm of vectors.

2. Motivation and Algorithm

In this section, we introduce a new algorithm which is defined by where is a search direction and is the corresponding step length. To construct our algorithm, throughout this paper, we assume that the following assumptions hold.

Assumption 1. The level set is compact.

Assumption 2. In some neighborhood of , the gradient is Lipschitz continuous; that is, there exists a constant such that

Assumption 3. Strict complementarity condition holds at ; that is, the strict inequalities hold in the first and the last implications of (3).

In [3], Facchinei et al. presented an active set Newton algorithm for solving problem (1). Their algorithm possesses some favorable properties, such as fast local convergence and feasibility of all iterations. In addition, only a lower dimensional quadratic program subproblem needs to be solved at per-iteration. We now simply recall the active set identification technique in [3]. Let and be nonnegative continuous and bounded functions defined on , such that if or , then or , respectively. For any , define the index set , and as The set is an estimate of the active set at point . Facchinei et al. [3] showed that with Assumption 3, when is sufficiently close to , the index set is an exact identification of . This property will enable us to find the active set at the stationary point in a finite number of steps. This active set identification technique (7) has motivated much additional studies on box-constrained optimization problems, for example, [1418].

Using the active set estimation technique in (7), we deduce the search direction used in our algorithm in detail now. For the sake of simplicity, let ,  ,  , and let be the number of elements in the corresponding vector let be the matrix whose columns are , where is the th column of the identity matrix. Additionally, we denote, respectively, the th component of and by and .

Let , in which Now, we restrict our attention to define components with index . Let where in which , , and . Clearly,   and therefore . It is not difficult to see that the definition of the search direction ensures the sufficient descent property; that is The definition of the search direction (9) was designed by Zhang et al. [10] in full space . Here, it is stated for slightly different circumstances, but it is easy to verify that it is still valid in our context.

In order to guarantee the feasibility of all iterations and the descent of the objective function at each iteration, we define the positive scalar by Note that when the strict complementarity condition holds and , then is always well defined unless the stationary point is achieved. Combining with (8) and (9), the compact form of can be rewritten as

Remark 4. Observe that the search direction of inactive variables in (9) contain both gradient and direction information at the current or the previous steps, while the set of inactive variables might be changed as the iteration progresses. In particular, the sets and might not be the same. In this situation, we set , that is, a negative gradient direction. Otherwise, we use the conjugate gradient formula to update the search direction in the free subspace.

Remark 5. Considering the case when the current iteration is still not an optimal point, the value of might be sufficiently small. In this case, for practical purpose, it might be efficient to restrict bounded from zero.

To list our algorithm in detail, we first show some useful properties of the direction. The following result shows that whenever , it is a descent direction for objective function at current point . The property is very important to our algorithm.

Lemma 6. If is attained by (13), it satisfies and the equality holds if and only if .

Proof. By the definition of search direction , we have The above relation can be easily obtained from the definition of the direction in (13). Notice that the strict complementary condition holds, which shows if and only if .

As an immediate consequence of the definition of the search direction, we have the following fact.

Lemma 7. Let be determined by (13) and assume that ; then,

The above lemma shows that for all such that Since the search direction has been computed, a step length ought to be determined in order to obtain the next iteration as (5). In this paper, we consider a backtracking line search procedure, where a decreasing sequence of is tried until the smallest is found which satisfies where , . Then a step length is accepted corresponding to .

We are now ready to formally state the overall algorithm for solving the box-constrained optimization problems (1) as follows.

Algorithm 8 (SDPRP). The steps of the algorithm is given as follows.
Step  0.  Given starting point , constants , and . Set .
Step  1.  Determine , , and according to (7).
Step  2.  If , set . Otherwise, set Determine by (13).
Step  3.  If , then stop.
Step  4.  Find the smallest integer (say ) satisfying (18). Let .
Step  5.  Set .
Step  6.  Let , go to Step  1.

Remark 9. Now suppose that the active sets have been identified after a finite number of steps; then, we have for all and for all . If , by the line search process, we know that does not satisfy (18). That is,
By the mean-value theorem, there is a such that where is the Lipschitz constant. It follows from [10, Lemma 3.1] that there exists a positive constant such that . Substituting the above inequality into (20), we have It follows that when , the line search step is well defined, so is the whole algorithm.

3. Convergence Analysis

In this section, we show that Algorithm 8. converges globally. The following lemma gives a sufficient and necessary condition for the global convergence of Algorithm 8.

Lemma 10. Let be a sequence of iterations generated by Algorithm 8, and let be a search direction defined by (13). Then is a stationary point of (1) if and only if .

Proof. Let . If , then we have Since and , the right inequality implies . If , we will prove it similarly. And if , we can get from (11) that . Now suppose that is a stationary point of (1); it gets from (3) and (7) that Then it follows from (9) and (11) that , and . Therefore .

Lemma 11. Let be a sequence of iterations generated by Algorithm 8 and let be a search direction defined by (13). Suppose that there are subsequences and as . Then is a stationary point of problem (1).

Proof. Since the number of distinct sets , , and is finite and Assumptions 13 hold, Assumption 3 ensures that there exists such that for all [3], Since , we obviously have Furthermore, the fact that and implies The proof of can be obtained once we notice that (11) and as .

The similar proof of Lemmas 10 and 11 can also be found in [3]. To end of this section, now we are ready to establish global convergence of Algorithm 8.

Theorem 12. Suppose that Assumptions 13 hold. Then the sequence generated by Algorithm 8 has at least a limit point, and every limit point of this sequence is a stationary point of problem (1).

Proof. It easily follows from Lemma 7 that the sequence of points generated by the Algorithm 8 is contained in the compact set . Hence, by Assumption 1, there exists at least a limit point of this sequence.
If the sequence is finite with last point , then, by Lemma 10, is a stationary point of problem (1). So we assume that the sequence is infinite.
Regarding the first part of proof in Lemma 11, , , and are constants starting from some iteration index ; that is,
It is easy to see that as . Now, it remains to prove () as . According to the global convergence theorem in [10], we have By the definition of in (9), it is equivalent to It follows from (29) and (31) that , which shows our claims by Lemma 11. The proof is complete.

4. Numerical Experiments

Now, let us report some numerical results attained by our Sufficient Descent Polak-Ribière-Polyak Algorithm—SDPRP. The algorithm is implemented by Fortran77 code in double precision arithmetic. All experiments are run on a PC with CPU Intel Pentium Dual E2140 1.6 GHz, 512 M bytes of SDRAM memory, and Red Hat Linux 9.03 operating system. Our experiments are performed on a set of the nonlinear box-constrained problems from the CUTEr [19] library that have second derivatives available. Since we are interested in large problems, we refine this selection by considering only problems where the number of variables is at least . Altogether, we solve problems. The type of objective function and the character of the problems being tested are listed in Table 1.

Number of sets Objective type Problem character Classification

1 Others AcademicOBR2-AN- -
2 Others ModelingOBR2-MN- -
3 Others Real applicationOBR2-RN- -
4 Sum of squares AcademicSBR2-AN- -
5 Sum of squares ModelingSBR2-AN- -
6 Quadratic AcademicQBR2-AN- -
7 Quadratic ModelingQBR2-MN- -
8 Quadratic Real applicationQBR2-RN- -
9 Others Modeling OBI2-MY- -
10 Quadratic AcademicQBR2-AY- -

In the experiments, for easily comparing with other codes, we use the projected gradient errors to measure the quality of the solutions instead of ; that is, we force the iteration stopped when where is the projected gradient of objective function and denotes the maximum absolute component of a vector. Clearly, the stopping criteria (32) are equivalent to in our method. We also stop the execution of SDPRP when iterations or function evaluations are completed without achieving convergence. We choose the initial parameters , , , and . Moreover, we also test our proposed methods with different parameters and to see that is the best choice. In order to assess the performance of the SDPRP, we then test the well-known method L-BFGS-B (available at http://www.ece.northwestern.edu/~nocedal/lbfgsb.html) [20, 21] for comparison.

When running the codes L-BFGS-B, default values are used for all parameters. For L-BFGS-B, besides (32), the algorithm has another built-in stopping test based on the parameter factr. It is designed to terminate the run when the change in the objective function is sufficiently small. The default value factr for moderate accuracy. Additionally, it can occur sometimes that the line search cannot make any progress, or the value of Nfg reached a prefixed number (=10000); in these cases, the run is also terminated.

The numerical results of the algorithms SDPRP and L-BFGS-B are listed in Tables 2 and 3, respectively. We used horizontal lines in both tables to divide the selected problems into classes according to Table 1. The columns in Tables 2 and 3 have the following meanings:Problem: name of the problem;Dim: dimension of the problem;Iter: number of iterations;Nf: number of function evaluations;Ng: number of gradient evaluations;Time: CPU time in seconds;Fv: final function value;Pgnorm1: maximum-norm of the final gradient projection.

Problem Dim Iter Nf Time Fv Pgnorm1

SINEALI 1000 19 64 0.030
BDEXP 1000 10001 10003 6.030
EXPLIN 120 76 260 0.010
EXPLIN2 120 59 183 0.000
EXPQUAD 120 4007 20001 0.230
MCCORMCK 5000 32 72 0.130
PROBPENL 500 18 173 0.010

QRTQUAD 120 949 4240 0.030
S368 100 54 136 0.240

HADAMALS 1024 113 479 0.390

SCOND1LS 5002 5061 20001 81.330
CHEBYQAD 50 735 4514 6.440
HS110 200 1 2 0.000
LINVERSE 1999 1527 1565 2.220
NONSCOMP 5000 67 128 0.110
QR3DLS 610 2760 20000 5.530

DECONVB 61 3353 11830 0.260

QUDLIN 5000 1 2 0.00
BIGGSB1 5000 10001 11535 10.93
BQPGABIM 50 45 213 0.00
BQPGASIM 50 57 223 0.00
BQPGAUSS 2003 3945 20001 6.78
CHENHARK 5000 6335 20001 9.56
CVXBQP1 10000 14 15 0.05
HARKERP2 100 10 47 0.00
JNLBRNG1 10000 4001 7427 38.40
JNLBRNG2 10000 1911 6880 23.35
JNLBRNGA 10000 2182 5955 20.17
JNLBRNGB 10000 4981 20001 54.23
NCVXBQP1 10000 1 2 0.01
NCVXBQP2 10000 3180 20000 16.18
NCVXBQP3 10000 3184 20000 17.11
NOBNDTOR 5476 893 1462 3.54
OBSTCLAE 10000 5261 5794 40.60
OBSTCLAL 10000 813 1121 6.29
OBSTCLBL 10000 3509 3872 26.90
OBSTCLBM 10000 3362 3730 25.97
OBSTCLBU 10000 2126 2454 16.71
PENTDI 1000 11 22 0.00

TORSION1 10000 1550 2401 12.51
TORSION2 10000 3227 4745 26.11
TORSION3 10000 394 712 3.23
TORSION4 10000 2183 2957 17.40
TORSION5 10000 114 228 1.26
TORSION6 10000 1639 1847 12.37
TORSIONA 10000 1410 2273 13.26
TORSIONB 10000 2662 4351 25.30
TORSIONC 10000 433 747 4.41
TORSIOND 10000 2269 3019 20.06
TORSIONE 10000 97 194 0.95
TORSIONF 10000 1589 1775 13.42

ODNAMUR 11130 10001 16448 44.78

GRIDGENA 6218 620 20000 16.58

NOBNDTOR 5476 893 1462 3.69


Problem Dim Nf Ng Time Fv Pgnorm1

SINEALI 1000 44 44 0.03
BDEXP 1000 18 18 0.01
EXPLIN 120 40 40 0.00
EXPLIN2 120 43 43 0.00
EXPQUAD 120 8 8 0.00
MCCORMCK 5000 10150 10150 40.29
PROBPENL 500 4 4 0.00

QRTQUAD 120 13349 13349 0.63
S368 100 21 21 0.07

HADAMALS 1024 20 20 0.04

SCOND1LS 5002 25 25 0.15
CHEBYQAD 50 21 21 0.15
HS110 200 21 21 0.02
LINVERSE 1999 195 195 0.32
NONSCOMP 5000 42 42 0.09
QR3DLS 610 11054 11054 10.24

DECONVB 61 291 291 0.02

QUDLIN 5000 314 314 0.16
BIGGSB1 5000 4452 4452 8.36
BQPGABIM 50 41 41 0.00