Nonlinear Conjugate Gradient Methods with Sufficient Descent Condition for Large-Scale Unconstrained Optimization
Two nonlinear conjugate gradient-type methods for solving unconstrained optimization problems are proposed. An attractive property of the methods is that the generated directions are always descent directions, regardless of the line search used. Under some mild conditions, global convergence results for both methods are established. Preliminary numerical results show that the proposed methods are promising and competitive with the well-known PRP method.
In this paper, we consider the unconstrained optimization problem $\min_{x \in \mathbb{R}^n} f(x)$ (1.1), where $f : \mathbb{R}^n \to \mathbb{R}$ is a continuously differentiable function, and its gradient at the point $x_k$ is denoted by $g(x_k)$, or $g_k$ for the sake of simplicity. Here $n$ is the number of variables, which is assumed to be large. The iterative formula of the nonlinear conjugate gradient method is given by $x_{k+1} = x_k + \alpha_k d_k$ (1.2), where $\alpha_k$ is a steplength and $d_k$ is a search direction determined by $d_k = -g_k$ if $k = 0$ and $d_k = -g_k + \beta_k d_{k-1}$ if $k \ge 1$ (1.3), where $\beta_k$ is a scalar. Since 1952, there have been many well-known formulas for the scalar $\beta_k$, for example, Fletcher-Reeves (FR), Polak-Ribière-Polyak (PRP), Hestenes-Stiefel (HS), and Dai-Yuan (DY) (see [1–4]): $\beta_k^{FR} = \frac{\|g_k\|^2}{\|g_{k-1}\|^2}$, $\beta_k^{PRP} = \frac{g_k^T y_{k-1}}{\|g_{k-1}\|^2}$, $\beta_k^{HS} = \frac{g_k^T y_{k-1}}{d_{k-1}^T y_{k-1}}$, $\beta_k^{DY} = \frac{\|g_k\|^2}{d_{k-1}^T y_{k-1}}$ (1.4), where $y_{k-1} = g_k - g_{k-1}$ and $\|\cdot\|$ denotes the Euclidean norm of vectors. The corresponding methods are generally referred to as the FR, PRP, HS, and DY conjugate gradient methods. If $f$ is a strictly convex quadratic function, all these methods are equivalent when an exact line search is used. If the objective function is nonconvex, their behaviors may be distinctly different. In the past two decades, the convergence properties of the FR, PRP, HS, and DY methods have been intensively studied by many researchers (e.g., [5–13]).
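As a rough illustration of the four classical scalars, the following sketch computes FR, PRP, HS, and DY from the current gradient, the previous gradient, and the previous direction. It assumes the standard textbook definitions (FR and PRP divide by the squared norm of the previous gradient; HS and DY divide by the inner product of the previous direction with the gradient difference). The function names `dot` and `cg_betas` are hypothetical helpers introduced only for this example.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def cg_betas(g_new, g_old, d_old):
    """Return the FR, PRP, HS and DY scalars for one iteration.

    g_new : gradient at the current iterate
    g_old : gradient at the previous iterate
    d_old : previous search direction
    """
    y = [a - b for a, b in zip(g_new, g_old)]  # gradient difference
    gg_new = dot(g_new, g_new)
    gg_old = dot(g_old, g_old)
    dy = dot(d_old, y)
    return {
        "FR": gg_new / gg_old,
        "PRP": dot(g_new, y) / gg_old,
        "HS": dot(g_new, y) / dy,
        "DY": gg_new / dy,
    }


# One exact-line-search step on f(x) = 0.5 * (x1^2 + 4 * x2^2) from (1, 1):
# the steepest descent step gives the gradients below.
g0 = [1.0, 4.0]
d0 = [-1.0, -4.0]
g1 = [48 / 65, -12 / 65]
print(cg_betas(g1, g0, d0))
```

On this strictly convex quadratic with an exact line search, the four values coincide, which is the equivalence mentioned above; with an inexact step they would generally differ.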
In practical computation, the HS and PRP methods, which share the common numerator $g_k^T y_{k-1}$, are generally believed to be the most efficient conjugate gradient methods, and have received considerable attention in recent years. One remarkable property of both methods is that they essentially perform a restart if a bad direction occurs (see ). However, Powell constructed an example showing that both methods can cycle infinitely without approaching any stationary point, even if an exact line search is used. This counterexample also indicates that both methods may fail to converge globally when the objective function is nonconvex. Therefore, during the past few years, much effort has been devoted to creating new formulae for $\beta_k$ that not only possess global convergence for general functions but are also superior to the original methods from the computational point of view (see [15–22]). An excellent survey of nonlinear conjugate gradient methods, with special attention to global convergence properties, was given by Hager and Zhang.
Recently, Dai and Liao proposed two new formulae (called DL and DL+) for $\beta_k$ based on the secant condition from quasi-Newton methods. More recently, Li, Tang, and Wei presented another two formulae (called LTW and LTW+) based on a modified secant condition. The conjugate gradient method with $\beta_k$ from DL+ (or LTW+) converges globally for nonconvex minimization problems, and the reported numerical results showed that it outperforms the standard PRP method. However, the convergence of the method with $\beta_k$ from DL (or LTW) has not been fully explored yet. In this paper, we further study conjugate gradient methods for the solution of unconstrained optimization problems, focusing on the scalar $\beta_k$ given by DL (or LTW). Our motivation mainly comes from the recent work of Zhang et al. We introduce two modified versions of the DL and LTW conjugate gradient-type methods. An attractive property of both proposed methods is that the generated directions are always descent directions. Moreover, this property is independent of the line search used and of the convexity of the objective function. Under some favorable conditions, we establish the global convergence of the proposed methods. We also carry out numerical experiments on a large set of unconstrained optimization problems, which indicate that the proposed methods perform better than the classic PRP method.
We organize this paper as follows. In the next section, we briefly review the conjugate gradient methods proposed in [18, 20]. We present two conjugate gradient methods in Sections 3 and 4, respectively, and discuss their global convergence properties there. In the last section, we perform numerical experiments on a set of large problems and compare the results with those of the PRP method.
2. Conjugate Gradient Methods with Secant Condition
In this section, we give a short description of the conjugate gradient method of Dai and Liao and briefly review another effective conjugate gradient method due to Li, Tang, and Wei. Motivated by these methods, we introduce our new versions of conjugate gradient-type methods in the following sections.
The following two assumptions are often utilized in convergence analysis for conjugate gradient algorithms.
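Assumption 2.1. The objective function $f$ is bounded below, and the level set $\Omega = \{x \in \mathbb{R}^n : f(x) \le f(x_0)\}$ is bounded.
Assumption 2.2. In some neighborhood $N$ of $\Omega$, $f$ is differentiable and its gradient is Lipschitz continuous, namely, there exists a positive constant $L$ such that $\|g(x) - g(y)\| \le L \|x - y\|$ for all $x, y \in N$ (2.1).
The above assumption implies that there exists a positive constant $\gamma$ such that $\|g(x)\| \le \gamma$ for all $x \in \Omega$ (2.2).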
2.1. Dai-Liao Method
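Note that, in the quasi-Newton method, the standard BFGS method, and the limited memory BFGS method, the search direction always has the common form $d_k = -B_k^{-1} g_k$ (2.3), where $B_k$ is some symmetric positive definite matrix satisfying the secant condition (or quasi-Newton equation) $B_k s_{k-1} = y_{k-1}$ (2.4), with $s_{k-1} = x_k - x_{k-1}$. Combining the above two equations, we obtain $d_k^T y_{k-1} = d_k^T B_k s_{k-1} = -g_k^T s_{k-1}$ (2.5). Keeping these relations in mind, Dai and Liao introduced the following conjugacy condition: $d_k^T y_{k-1} = -t\, g_k^T s_{k-1}$ (2.6), where $t \ge 0$ is a parameter. Multiplying (1.3) with $y_{k-1}$ and making use of the new conjugacy condition (2.6), Dai and Liao obtained the following new formula for computing $\beta_k$: $\beta_k^{DL} = \frac{g_k^T (y_{k-1} - t s_{k-1})}{d_{k-1}^T y_{k-1}}$ (2.7). In order to ensure global convergence for general functions, Dai and Liao restricted the HS part of the formula to be nonnegative, that is, $\beta_k^{DL+} = \max\left\{\frac{g_k^T y_{k-1}}{d_{k-1}^T y_{k-1}}, 0\right\} - t\, \frac{g_k^T s_{k-1}}{d_{k-1}^T y_{k-1}}$ (2.8). The reported numerical experiments showed that the corresponding conjugate gradient method is efficient.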
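The two Dai-Liao scalars can be sketched in a few lines. The code below assumes the forms usually quoted for DL and DL+: the HS scalar corrected by a term proportional to the parameter times the inner product of the current gradient with the previous step, with the HS part truncated at zero in the DL+ variant. The helper names `dot` and `dai_liao_betas` and the argument layout are hypothetical choices made for this illustration.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def dai_liao_betas(g_new, g_old, d_old, s_old, t):
    """Return (beta_DL, beta_DL_plus) under the assumed Dai-Liao formulas.

    g_new, g_old : current and previous gradients
    d_old        : previous search direction
    s_old        : previous step, x_k - x_{k-1}
    t            : nonnegative Dai-Liao parameter
    """
    y = [a - b for a, b in zip(g_new, g_old)]
    dy = dot(d_old, y)
    gy = dot(g_new, y)
    gs = dot(g_new, s_old)
    beta_dl = (gy - t * gs) / dy
    # DL+ keeps only the nonnegative part of the HS term.
    beta_dl_plus = max(gy / dy, 0.0) - t * gs / dy
    return beta_dl, beta_dl_plus


# A one-dimensional-looking case where the HS part is negative.
print(dai_liao_betas([1.0, 0.0], [2.0, 0.0], [-2.0, 0.0], [-1.0, 0.0], t=0.1))
```

With the parameter set to zero, both expressions reduce to HS and its nonnegative truncation, which is one quick way to sanity-check an implementation.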
2.2. Li-Tang-Wei Method
Recently, Wei et al.  proposed a modified secant condition where
Notice this new secant condition contains not only gradient value information, but also function value information at the present and the previous step. Additionally, this modified secant condition has inspired many further studies on optimization problems (e.g., [23–25]).
Based on the modified secant condition (2.9), Li, Tang and Wei (see ) presented the new conjugacy condition: Similar to the Dai-Liao formulas in (2.7) and (2.8), Li, Tang, and Wei also constructed the following two conjugate gradient formulas for : where .
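To make the role of function values concrete, here is a minimal sketch of the modified gradient difference. It assumes the form commonly attributed to Wei et al. in the quasi-Newton literature: a scalar built from twice the decrease in function value plus the inner product of the summed gradients with the step, which then scales a correction along the step. The names `theta` and `z`, and the function `modified_secant_vector`, are labels chosen for this example and may differ from the notation of the cited papers.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def modified_secant_vector(f_old, f_new, g_old, g_new, s):
    """Return (theta, z) for the assumed modified secant condition B s = z.

    theta = 2 * (f_old - f_new) + (g_new + g_old)^T s
    z     = y + (theta / ||s||^2) * s,  with y = g_new - g_old
    """
    y = [a - b for a, b in zip(g_new, g_old)]
    g_sum = [a + b for a, b in zip(g_new, g_old)]
    theta = 2.0 * (f_old - f_new) + dot(g_sum, s)
    ss = dot(s, s)
    z = [yi + (theta / ss) * si for yi, si in zip(y, s)]
    return theta, z


# Quadratic f(x) = 0.5 * (x1^2 + 4 * x2^2), step from (1, 1) to (0.5, 0.25).
theta_q, z_q = modified_secant_vector(2.5, 0.25, [1.0, 4.0], [0.5, 1.0], [-0.5, -0.75])
print(theta_q, z_q)

# Non-quadratic f(x) = x^4 in one variable, step from 1 to 0.
theta_n, z_n = modified_secant_vector(1.0, 0.0, [4.0], [0.0], [-1.0])
print(theta_n, z_n)
```

For a quadratic objective the scalar vanishes, so the modified vector coincides with the ordinary gradient difference; the correction only becomes active when the function deviates from a quadratic model along the step.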
Obviously, the new secant condition (2.11) gives a more accurate approximation for the Hessian of the objective function (see ). Hence, in theory, the formulas (2.12) should outperform the Dai-Liao formulas, and the numerical results in  confirmed this claim. In addition, based on another modified quasi-Newton equation of Zhang et al., Yabe and Takano also proposed some similar conjugate gradient methods for unconstrained optimization.
Combined with the strong Wolfe-Powell line search, the conjugate gradient methods with $\beta_k$ from DL+ or LTW+ were proved to converge globally for nonconvex minimization problems. For $\beta_k$ from DL or LTW, however, no similar results are available. The major contribution of the following work is to circumvent this difficulty. We do not focus on the general iterative form (1.3); instead, our idea mainly originates from the recent three-term conjugate gradient method of Zhang et al.
3. Modified Dai-Liao Method
As stated in the previous section, the standard conjugate gradient method (1.2)-(1.3) with (2.7) cannot guarantee that the generated sequence approaches a stationary point of the problem. In this section, we use a three-term form in place of (1.3).
The first three-term nonlinear conjugate gradient algorithm was presented by Nazareth , in which the search direction is determined by with , . The main property of is that, for a quadratic function, it remains conjugate even without exact line searches. Recently, Zhang et al.  proposed a descent modified PRP conjugate gradient method with three terms as follows: where . A remarkable property of the method is that it produces a descent direction at each iteration. Motivated by the nice descent property, we also give a three-term conjugate gradient method based on the DL formula for in (2.7), that is, where and . It is easy to see that the sufficient descent condition also holds true if no line search is used, that is,
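The descent identity can be checked numerically. The sketch below assumes one natural three-term construction based on the DL scalar: the usual two-term direction minus a multiple of the vector (gradient difference minus the parameter times the previous step), where the multiplier is the inner product of the current gradient with the previous direction divided by the same denominator as in DL. Under this assumption the inner product of the gradient with the new direction equals minus the squared gradient norm, for any data with a nonzero denominator. The function name `mdl_direction` is hypothetical, and the exact form of (3.3) in the source may differ in notation.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def mdl_direction(g_new, g_old, d_old, s_old, t):
    """Three-term direction built on the DL scalar (assumed form).

    d = -g_new + beta * d_old - theta * (y - t * s_old)
    beta  = g_new^T (y - t * s_old) / (d_old^T y)
    theta = g_new^T d_old / (d_old^T y)
    """
    y = [a - b for a, b in zip(g_new, g_old)]
    w = [yi - t * si for yi, si in zip(y, s_old)]
    dy = dot(d_old, y)  # assumed nonzero here
    beta = dot(g_new, w) / dy
    theta = dot(g_new, d_old) / dy
    return [-g + beta * d - theta * wi for g, d, wi in zip(g_new, d_old, w)]


# Arbitrary data, no line search involved.
g_new = [1.0, -2.0, 0.5]
g_old = [0.3, 1.0, -1.0]
d_old = [-0.3, -1.0, 1.0]
s_old = [0.5 * di for di in d_old]
d_new = mdl_direction(g_new, g_old, d_old, s_old, t=0.1)
# The two printed numbers should agree up to rounding.
print(dot(g_new, d_new), -dot(g_new, g_new))
```

The cancellation happens because the second and third terms contribute equal and opposite amounts to the inner product with the current gradient, which is why the property does not depend on how the step was chosen.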
In order to achieve the global convergence result of the PRP method, Grippo and Lucidi  proposed a new line search below. For given constants , , and , let satisfy where are constants. Here we prefer this new line search to the classical Armijo one for the sake of a greater reduction of objective function and wider tolerance of (see ).
Introducing the line search rule, we are now ready to state the steps of the modified Dai-Liao (MDL) conjugate gradient-type algorithm as follows.
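A minimal sketch of how such an algorithm might be organized is given below. It is not the authors' implementation. It assumes a three-term DL-based direction as described in this section and a backtracking rule in the spirit of Grippo and Lucidi: start from a trial step proportional to the absolute value of the gradient-direction inner product divided by the squared direction norm, and shrink it until the objective decreases by at least a constant times the squared step length times the squared direction norm. The parameter names (`t`, `delta`, `rho`, `tau`), their default values, and the steepest descent fallback for a tiny denominator are assumptions of this example.

```python
import math


def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def mdl_minimize(f, grad, x0, t=0.1, delta=1e-3, rho=0.5, tau=1.0,
                 tol=1e-8, max_iter=500):
    """Sketch of a modified Dai-Liao type iteration; returns (x, f history)."""
    x = list(x0)
    g = grad(x)
    d = [-gi for gi in g]
    history = [f(x)]
    for _ in range(max_iter):
        if math.sqrt(dot(g, g)) <= tol:
            break
        dd = dot(d, d)
        fx = history[-1]
        # Trial step, then backtracking until the decrease condition holds.
        alpha = tau * abs(dot(g, d)) / dd
        while True:
            x_new = [xi + alpha * di for xi, di in zip(x, d)]
            f_new = f(x_new)
            if f_new <= fx - delta * alpha * alpha * dd:
                break
            alpha *= rho
        g_new = grad(x_new)
        s = [alpha * di for di in d]
        y = [a - b for a, b in zip(g_new, g)]
        dy = dot(d, y)
        if abs(dy) <= 1e-12 * dd:
            # Safeguard added for this sketch only: restart along -g.
            d = [-gi for gi in g_new]
        else:
            w = [yi - t * si for yi, si in zip(y, s)]
            beta = dot(g_new, w) / dy
            theta = dot(g_new, d) / dy
            d = [-gn + beta * di - theta * wi
                 for gn, di, wi in zip(g_new, d, w)]
        x, g = x_new, g_new
        history.append(f_new)
    return x, history


# Toy problem: f(x) = 0.5 * (x1^2 + 10 * x2^2).
f = lambda x: 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)
grad = lambda x: [x[0], 10.0 * x[1]]
x_final, hist = mdl_minimize(f, grad, [1.0, 1.0])
print(len(hist) - 1, hist[0], hist[-1])
```

By construction of the acceptance test, the recorded function values never increase from one iteration to the next. How quickly the gradient norm falls depends on the parameter choices, which are placeholders here rather than the values used in the reported experiments.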
From now on, we use $f_k$ to denote $f(x_k)$. For the MDL algorithm, we have the following two important results. The first lemma below was essentially established by Zoutendijk, where it is stated under slightly different circumstances. For convenience, we give the detailed proof here.
Lemma 3.2. Consider the conjugate gradient-type method in the form (1.2) and (3.3), and let the steplength be obtained by the line search (3.7)-(3.8). Suppose that Assumptions 2.1-2.2 hold. Then one has
Proof. Since $\alpha_k$ is obtained by the line search (3.7)-(3.8), by (3.4) and (3.7) we have Hence, $\{f(x_k)\}$ is a decreasing sequence and the sequence $\{x_k\}$ is contained in the level set of Assumption 2.1. Hence, Assumptions 2.1-2.2 imply that there exists a constant such that From (3.11), we have This together with (3.10) implies that (3.9) holds.
Lemma 3.3. If there exists a constant such that then there exists a constant such that
Proof. From the line search (3.7)-(3.8) and (3.4), we have By the definition of in (3.3) , we get from (2.1), (2.2), (3.4), and (3.13) that Lemma 3.2 indicates that as , then there exists a constant and an integer , such that the following inequality holds for all : Hence, we have for any Setting , we deduce that for all .
Using the preceding lemmas, we are now ready to give the promised convergence results.
Proof. We proceed by contradiction. Assume that the conclusion is not true. Then there exists a positive constant such that
If , we have from (3.9) that . This contradicts assumption (3.20).
Suppose that . Using Assumptions 2.1-2.2 and (3.8), we obtain Combining with (3.4) yields The above inequality and Lemma 3.3 imply , which contradicts (3.20). This completes the proof.
4. Modified Li-Tang-Wei Method
In a similar manner, we provide a modified Li-Tang-Wei method with three terms in the form: where . It is not difficult to see that the sufficient descent property (3.4) also holds.
Proof. See Lemma 3.2.
Lemma 4.3. Consider the conjugate gradient-type method in the form (1.2) and (4.1), let the steplength be obtained by line search (3.7)-(3.8). Suppose that Assumptions 2.1-2.2 hold. Then one has where was defined as in Assumption 2.2.
Proof. Since is obtained by (3.7)-(3.8), from (3.10) we know that By mean value theorem, we know that there exists such that Using (4.4) we get where denotes the closed convex hull of . It follows from the definition of and (4.5) that From the definition of and Assumption 2.2, we know that This verifies our claims.
Lemma 4.4. If there exists a constant such that then there exists a constant such that
Proof. From the line search (3.7)-(3.8) and (3.4), we have According to the definition of in (4.1), we get from (2.1), (2.2), (4.9), and (4.11) that Lemma 4.3 shows that as . Hence there exists a constant and an integer , such that the following inequality holds for all : Hence, we have for any Let , we get (4.10).
Now we can establish the following global convergence theorem for the MLTW method. Since its proof is essentially the same as that of Theorem 3.4, we omit it.
To end this section, we show that the MLTW method is equivalent to the general method (1.4) if an exact line search is used. In deriving this equivalence, we work with an exact line search rule, that is, we compute such that is satisfied. Hence, Subsequently, Moreover, let where is an symmetric positive definite matrix, , and is a real number. In this case, it is not difficult to see that . Note that by the definition of in (2.12), we have Then we have the main properties of a conjugate gradient method. The following theorem shows that the MLTW method has the quadratic termination property, which means that the method terminates in at most $n$ steps when it is applied to a positive definite quadratic. The proof can be found in Theorem in  and is omitted.
Theorem 4.6. For a positive definite quadratic function (4.19), the conjugate gradient method (1.2)–(4.1) with exact line search terminates after steps, and the following properties hold for all , (), where is the number of distinct eigenvalues of .
The theorem also shows that the conjugate gradient method (1.2)–(4.1) possesses conjugacy of directions, orthogonality of gradients, and the descent condition. This also indicates that the method (1.2)–(4.1) remains equivalent to the general conjugate gradient method (1.4) for strictly convex quadratics with an exact line search. The cases of DL, LTW, and MDL can be proved in a similar way.
5. Numerical Experiments
This section reports the performance of the algorithms MDL and MLTW on a set of test problems. The codes were written in Fortran 77 in double precision arithmetic. All the tests were performed on a PC (Intel Pentium Dual E2140 1.6 GHz, 256 MB SDRAM). Our experiments were performed on a set of 73 nonlinear unconstrained problems that have second derivatives available. These test problems were contributed by N. Andrei, and the Fortran expressions of their functions and gradients are available at http://www.ici.ro/camo/neculai/SCALCG/evalfg.for. Twenty-six of these problems are from the CUTE  library. For each test function, we considered 10 numerical experiments with number of variables .
In order to assess the reliability of our algorithms, we also tested these methods against the well-known PRP routine using the same problems. The PRP code is coauthored by Liu, Nocedal, and Waltz; it can be obtained from Nocedal’s web page at http://www.ece.northwestern.edu/~nocedal/software.html/. When running the PRP code, default values were used for all parameters. All these algorithms terminate when the following stopping criterion is met: We also force these routines to stop if the number of iterations exceeds or the number of function evaluations reaches without achieving convergence. In MDL and MLTW, we use , . Moreover, we also test our proposed methods MDL and MLTW with different parameters to see that is the best choice. Since a large set of problems is used, we describe the results fully on the first author's web page: http://maths.henu.edu.cn/szdw/teachers/xyh.htm. The tables contain the number of the problem (Problem), the dimension of the problem (Dim), the number of iterations (Iter), the number of function and gradient evaluations (Nfcnt), the CPU time required in seconds (Time), the final function value (Fv), and the norm of the final gradient (Norm).
There are problems that were excluded from the first two tables because they lead to an “overflow error” when evaluated at some point by the MDL and MLTW methods. However, the same error occurred on problems when evaluated by the PRP method. From these tables, we also see that MDL and MLTW failed to satisfy the termination condition (5.1) on other and problems, respectively, while the PRP method did not achieve convergence on problems. So only 634 problems remain where at least one method runs successfully. We now consider the function values found by all three methods on the remaining problems. We note that, on problems, the differences between the function values obtained by the three methods are less than the small tolerance . Therefore, it is reasonable to think that all three methods obtained the same optimal solution on these problems.
To assess the performance of the MDL, MLTW, and PRP methods on the remaining problems, we use the performance profile of Dolan and Moré  as an evaluation tool. That is, for each of the methods being analyzed, we plot the fraction of problems for which the method is within a factor $\tau$ of the best. We use the number of iterations, the number of function and gradient evaluations, and the CPU time as performance measures, since they reflect the main computational cost and the efficiency of each method. The performance profiles of all three methods are plotted in Figures 1, 2, and 3.
From Figures 1 and 2, we conclude that MDL and MLTW are the top performers for almost all values of $\tau$, which shows that they perform better than the PRP method with respect to iterations and function and gradient evaluations. Figure 3 shows the performance of these methods using the total CPU time as a measure. This figure shows that the PRP method is faster than the others. Why do our methods need more computing time while requiring fewer iterations? We think it is highly likely that the new formulas are somewhat more costly to evaluate than the standard PRP formula.
Taking everything into consideration, although the two proposed conjugate gradient methods did not achieve as significant an improvement as we had expected, we think that, for some specific problems, the improvement offered by the proposed methods is still noticeable. Hence, we believe that each of the new algorithms is a valid approach for such problems and has its own potential.
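The profile computation itself is short. The sketch below follows the usual Dolan-Moré construction: for each problem, divide every solver's cost by the best cost on that problem, then report, for each solver, the fraction of problems whose ratio does not exceed a given factor. The data layout (a dictionary mapping solver names to cost lists, with `float("inf")` marking a failure) and the function name `performance_profile` are assumptions made for this example, and the numbers are invented purely for illustration.

```python
def performance_profile(costs, taus):
    """Compute performance profile values.

    costs : dict mapping solver name -> list of positive costs per problem
            (use float('inf') when the solver failed on a problem)
    taus  : list of factors at which to evaluate the profile
    Returns a dict mapping solver name -> list of fractions, one per tau.
    """
    solvers = list(costs)
    n_prob = len(costs[solvers[0]])
    # Best (smallest) cost achieved by any solver on each problem.
    best = [min(costs[s][p] for s in solvers) for p in range(n_prob)]
    profile = {}
    for s in solvers:
        ratios = [costs[s][p] / best[p] for p in range(n_prob)]
        profile[s] = [sum(1 for r in ratios if r <= tau) / n_prob
                      for tau in taus]
    return profile


# Invented iteration counts for two solvers on three problems.
example = {"A": [10.0, 20.0, float("inf")], "B": [20.0, 10.0, 30.0]}
print(performance_profile(example, taus=[1.0, 2.0]))
```

The value at a factor of one is the fraction of problems on which a solver was the best (ties included), and the value for large factors approaches the fraction of problems the solver managed to solve at all.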
The authors are very grateful to two anonymous referees for their useful suggestions and comments on the previous version of this paper. This work was supported by Chinese NSF Grant 10761001, and Henan University Science Foundation Grant 07YBZR002.
M. R. Hestenes and E. Stiefel, “Methods of conjugate gradient for solving linear systems,” Journal of Research of the National Bureau of Standards, vol. 49, pp. 409–436, 1952.
E. Polak and G. Ribière, “Note sur la convergence de directions conjugées,” Revue Francaise d'Informatique et de Recherche Operationnelle, vol. 16, pp. 35–43, 1969.
Z. Wei, G. Y. Li, and L. Qi, “Global convergence of the Polak-Ribière-Polyak conjugate gradient method with an Armijo-type inexact line search for nonconvex unconstrained optimization problems,” Mathematics of Computation, vol. 77, no. 264, pp. 2173–2193, 2008.
W. Sun and Y. Yuan, Optimization Theory and Methods: Nonlinear Programming, vol. 1 of Springer Optimization and Its Applications, Springer, New York, NY, USA, 2006.