Abstract

We consider a hybrid Dai-Yuan conjugate gradient method. We confirm that its numerical performance can be improved when the method is equipped with a practical steplength rule developed by Dong, and the associated convergence is analyzed as well.

1. Introduction

Consider the following problem of finding an $x^* \in \mathbb{R}^n$ such that
$$g\left(x^*\right)=0, \tag{1}$$
where $\mathbb{R}^n$ is the $n$-dimensional Euclidean space and $g:\mathbb{R}^n \to \mathbb{R}^n$ is continuous. Throughout this paper, this problem corresponds to the optimality condition of a certain problem of minimizing a function $f$, which may be not easy to calculate or cannot be expressed in terms of elementary functions. When the dimension $n$ is large, conjugate gradient methods can be efficient for solving problem (1). For any given starting point $x_0 \in \mathbb{R}^n$, a sequence $\{x_k\}$ is generated by the following recursive relation:
$$x_{k+1}=x_k+\alpha_k d_k, \quad k=0,1,\ldots, \tag{2}$$
with
$$d_k=\begin{cases}-g_k, & k=0,\\ -g_k+\beta_k d_{k-1}, & k\geq 1,\end{cases} \tag{3}$$
where $\alpha_k>0$ is a steplength, $d_k$ is a descent direction, $g_k$ stands for $g(x_k)$, and $\beta_k$ is a parameter. Different choices of $\beta_k$ result in different nonlinear conjugate gradient methods. The Dai-Yuan (DY) formula [1] and the Hestenes-Stiefel (HS) formula [2] are two famous ones, given by
$$\beta_k^{DY}=\frac{\|g_k\|^2}{d_{k-1}^T y_{k-1}}, \qquad \beta_k^{HS}=\frac{g_k^T y_{k-1}}{d_{k-1}^T y_{k-1}}, \tag{4}$$
respectively, where $\|\cdot\|$ means the Euclidean norm and $y_{k-1}=g_k-g_{k-1}$. For other well-known formulae for $\beta_k$, such as the Fletcher-Reeves formula [3], the Polak-Ribière-Polyak formula [4, 5], and the Hager-Zhang formula [6], please refer to [7, 8] for a further survey.
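To fix notation, the following Python sketch (our own illustration, with hypothetical function names) implements the direction update (3) together with the DY and HS choices of $\beta_k$ in (4).

```python
import numpy as np

def beta_dy(g_new, d_prev, y_prev):
    # Dai-Yuan formula: ||g_k||^2 / (d_{k-1}^T y_{k-1})
    return np.dot(g_new, g_new) / np.dot(d_prev, y_prev)

def beta_hs(g_new, d_prev, y_prev):
    # Hestenes-Stiefel formula: (g_k^T y_{k-1}) / (d_{k-1}^T y_{k-1})
    return np.dot(g_new, y_prev) / np.dot(d_prev, y_prev)

def cg_direction(g_new, d_prev=None, g_prev=None, beta_rule=beta_dy):
    # d_0 = -g_0; for k >= 1, d_k = -g_k + beta_k d_{k-1} with y_{k-1} = g_k - g_{k-1}.
    if d_prev is None:
        return -g_new
    y_prev = g_new - g_prev
    return -g_new + beta_rule(g_new, d_prev, y_prev) * d_prev
```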

In [1], the steplength $\alpha_k$ is obtained by the following weak Wolfe line search:
$$f\left(x_k+\alpha_k d_k\right) \leq f\left(x_k\right)+\delta \alpha_k g_k^T d_k, \tag{5}$$
$$g\left(x_k+\alpha_k d_k\right)^T d_k \geq \sigma g_k^T d_k, \tag{6}$$
where $0<\delta<\sigma<1$. With the same line search, two hybrid versions related to the DY method and the HS method were proposed in [9], which generate the parameter $\beta_k$ by
$$\beta_k=\max\left\{0,\min\left\{\beta_k^{HS},\beta_k^{DY}\right\}\right\}, \qquad \beta_k=\max\left\{-\frac{1-\sigma}{1+\sigma}\beta_k^{DY},\min\left\{\beta_k^{HS},\beta_k^{DY}\right\}\right\}, \tag{7}$$
respectively. Initial numerical results in [10] suggested that the two hybrid conjugate gradient methods (abbreviated as DYHS and DYHS+, resp.) are efficient, and the DYHS+ method in particular performed better.
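As an illustration only, the sketch below computes a hybrid parameter of the truncated form $\beta_k=\max\{\eta_k,\min\{\beta_k^{HS},\beta_k^{DY}\}\}$, where $\eta_k$ is either $0$ or $-((1-\sigma)/(1+\sigma))\beta_k^{DY}$; the function name and the flag are ours, and the exact pairing of the two truncations with the labels DYHS and DYHS+ should be checked against [9, 10].

```python
import numpy as np

def beta_hybrid(g_new, d_prev, y_prev, sigma=0.1, allow_negative=False):
    # Hybrid DY/HS parameter: beta = max(eta, min(beta_HS, beta_DY)).
    # eta = 0 truncates at zero; eta = -((1 - sigma)/(1 + sigma)) * beta_DY
    # permits slightly negative values (sigma is the Wolfe curvature parameter).
    denom = np.dot(d_prev, y_prev)            # d_{k-1}^T y_{k-1}
    b_dy = np.dot(g_new, g_new) / denom       # Dai-Yuan value
    b_hs = np.dot(g_new, y_prev) / denom      # Hestenes-Stiefel value
    eta = -((1.0 - sigma) / (1.0 + sigma)) * b_dy if allow_negative else 0.0
    return max(eta, min(b_hs, b_dy))
```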

The line search plays an important role in the efficiency of conjugate gradient methods. Hager and Zhang [6] showed that the first condition (5) of the weak Wolfe line search limits the accuracy of a conjugate gradient method to the order of the square root of the machine precision (see also [11, 12]); thus, in order to obtain higher precision, they proposed the approximate Wolfe conditions [6, 7]
$$\sigma g_k^T d_k \leq g\left(x_k+\alpha_k d_k\right)^T d_k \leq (2\delta-1) g_k^T d_k, \tag{8}$$
where $0<\delta<1/2$ and $\delta\leq\sigma<1$, which are usually used in combination with the weak Wolfe line search. However, there is no theory to guarantee convergence in [6–8]. Following a referee's suggestion, we adapt the approximate Wolfe conditions to a Dai-Yuan hybrid conjugate gradient method and investigate its numerical performance.
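For concreteness, a minimal check of the approximate Wolfe conditions (8) might look as follows; `grad` is an assumed callable returning $g(x)$, and the parameter names follow the ranges stated above.

```python
import numpy as np

def satisfies_approx_wolfe(grad, x, d, alpha, delta=0.1, sigma=0.9):
    # Approximate Wolfe conditions (8):
    #   sigma * g_k^T d_k <= g(x_k + alpha*d_k)^T d_k <= (2*delta - 1) * g_k^T d_k,
    # with 0 < delta < 1/2 and delta <= sigma < 1; g_k^T d_k < 0 for a descent direction.
    dg0 = np.dot(grad(x), d)
    dga = np.dot(grad(x + alpha * d), d)
    return sigma * dg0 <= dga <= (2.0 * delta - 1.0) * dg0
```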

More recently, Dong designed a practical Armijo-type steplength rule [13] that uses only gradient information; see [14] for a more conceptual version. The steplength is chosen by the following steps: choose $\rho\in(0,1)$, compute some appropriate initial steplength $s_k>0$, determine a real number $\nu_k$ (see (12)), and take $\alpha_k$ to be the largest steplength of the form $s_k\rho^i$, $i$ a nonnegative integer, satisfying the gradient-only acceptance condition (9) of [13].
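Since the acceptance test (9) is specified in [13], we do not restate it here; the sketch below only illustrates the backtracking mechanism over the grid $\{s_k\rho^i\}$, with the test passed in as a placeholder callable `accept` (an assumption of ours, to be replaced by Dong's actual condition).

```python
def largest_grid_steplength(accept, s_k, rho=0.5, max_trials=50):
    # Return the largest steplength of the form s_k * rho**i (i = 0, 1, 2, ...)
    # for which accept(alpha) is True; accept stands in for the test (9) of [13].
    alpha = s_k
    for _ in range(max_trials):
        if accept(alpha):
            return alpha
        alpha *= rho
    return None  # no acceptable steplength found within max_trials trials
```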

The main differences between the practical steplength rule and the weak Wolfe conditions are that the former requires no function evaluations, achieves higher accuracy, and has a broader application scope. The feature of high accuracy is supported by the corresponding theoretical analysis in [6, 11]. Numerical results reported in [13] also imply that the line search (9) is efficient and highly accurate. So, it is meaningful to embed this line search into the hybrid conjugate gradient method with the hybrid parameter $\beta_k$ and to check its efficiency.

This paper aims to solve problem (1), which corresponds to the optimality condition of a certain problem of minimizing $f$. If the original function $f$ cannot be expressed in terms of elementary functions, the weak Wolfe conditions cannot be applied directly, whereas the practical steplength rule (9) and the approximate Wolfe conditions (8) can still be used to solve this kind of nonlinear unconstrained minimization problem. So, in order to investigate the numerical performance of the two modified methods with steplength rules (9) and (8) and to confirm their broader application scope, two classes of test problems are selected. One class is composed of unconstrained nonlinear optimization problems from the CUTEr library, and the other class is composed of some boundary value problems.

The rest of this paper is organized as follows. In Section 2, we give some basic definitions and properties used in this paper. In Section 3, we describe two modified versions of the Dai-Yuan hybrid conjugate gradient method, with the line search (9) and with the approximate Wolfe conditions (8), and show that if $g$ is Lipschitz continuous, the former version is convergent in the sense that $\liminf_{k\to\infty}\|g_k\|=0$. In Section 4, we test the modified hybrid conjugate gradient methods on two classes of standard test problems and compare them with the DYHS method and the DYHS+ method. Finally, some conclusions are given in Section 5.

2. Preliminaries

In this section, we give some basic definitions and related properties which will be used in the following discussions.

Assumption 1. Assume that $g$ is $L$-Lipschitz continuous; that is, there exists a constant $L>0$ such that
$$\|g(x)-g(y)\| \leq L\|x-y\| \quad \text{for all } x, y.$$
Moreover, its original function $f$ is bounded below in the level set $\Omega=\{x\in\mathbb{R}^n : f(x)\leq f(x_0)\}$.

Definition 2. A mapping $g:\mathbb{R}^n\to\mathbb{R}^n$ is said to be $\nu$-monotone on the set $S$ if there exists $\nu>0$ such that
$$\left(g(x)-g(y)\right)^T(x-y) \geq -\nu\|x-y\|^2 \quad \text{for all } x, y \in S.$$

By using the definition above, we know that if the gradient $g$ is $L$-Lipschitz continuous, then for any given direction $d$, the gradient must be $\nu$-monotone along the ray $\{x+td : t\geq 0\}$ for some $\nu>0$. Then, how can such a $\nu$ be evaluated? In [13], it is suggested to evaluate $\nu$ by the approximation formula (12).
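The approximation formula (12) itself is given in [13]; as a rough illustration of the idea only, one simple gradient-only proxy for a local monotonicity constant along $d_k$ is a finite-difference quotient of $g$ on the ray, as sketched below. This proxy is our assumption and is not necessarily formula (12).

```python
import numpy as np

def estimate_nu(grad, x, d, t=1e-6):
    # Illustrative proxy for nu_k: how much g varies per unit move along the ray x + t*d.
    # This is only a plausible stand-in for the approximation formula (12) of [13].
    g0 = grad(x)
    gt = grad(x + t * d)
    return np.linalg.norm(gt - g0) / (t * np.linalg.norm(d))
```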

Lemma 3. For any given $x, d\in\mathbb{R}^n$ and for all $t\geq 0$, if $f$ is continuously differentiable and the gradient $g$ is $\nu$-monotone along the ray $\{x+td : t\geq 0\}$, then an upper estimate for $f(x+td)$ in terms of $g$, $\nu$, $t$, and $\|d\|$ holds; see [13].

Proof. Please see [13] for a detailed proof.

3. Algorithm and Its Convergence

In this section, we first formally describe the hybrid conjugate gradient method. Then, we give its two modified versions and illustrate that the modified version with steplength rule (9) is convergent.

Algorithm 4. Step 0. Choose an initial point $x_0\in\mathbb{R}^n$. Set $d_0=-g_0$ and $k:=0$.

Step 1. If $g_k=0$, then stop; otherwise, find a steplength $\alpha_k>0$ such that a certain steplength rule holds.

Step 2. Compute the new iterate $x_{k+1}$ by (2) and the new search direction $d_{k+1}$ by (3), where $\beta_{k+1}$ is the hybrid parameter given in (7).

Set $k:=k+1$ and go to Step 1.

Denote the set of trial steplengths at iteration $k$ by $\{s_k\rho^i : i=0,1,2,\ldots\}$.
(i) We abbreviate Algorithm 4 as MDYHS+ if the steplength rule in Step 1 is to find $\alpha_k$ to be the largest trial steplength $s_k\rho^i$ ($i=0,1,2,\ldots$) such that (16) holds, where $\nu_k$ is defined in (12) and $\rho\in(0,1)$.
(ii) We abbreviate Algorithm 4 as MDYHS+1 if $\alpha_k$ in Step 1 is located to satisfy the approximate Wolfe conditions (8). It should be noticed that, on the face of it, the approximate Wolfe conditions use only gradient information to locate the steplength, while the practical implementations in [6–8] require function evaluations; please refer to [8, pages 120–125] for details. So, the algorithm in [6–8] for generating a steplength satisfying the approximate Wolfe conditions (8) is not applicable here. Instead, we determine the steplength of the MDYHS+1 method following the inexact line search strategies of [15, Algorithm 2.6]. Detailed steps are described in Algorithm 6. The initial value of $\alpha$ is taken to be $s_k$.
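For readers who prefer code, the following skeleton (ours; the names are illustrative) organizes Algorithm 4 so that the same loop yields either MDYHS+ or MDYHS+1, depending on which steplength callable and which $\beta$-rule are passed in.

```python
import numpy as np

def hybrid_cg(grad, x0, steplength, beta_rule, eps=1e-6, max_iter=10000):
    # Skeleton of Algorithm 4: Step 0 initializes d_0 = -g_0; Step 1 tests the
    # stopping rule and calls the chosen steplength rule (e.g., (16) for MDYHS+,
    # or the approximate Wolfe search of Algorithm 6 for MDYHS+1); Step 2 updates
    # the iterate by (2) and the search direction with the hybrid beta_k.
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g
    for k in range(max_iter):
        if np.linalg.norm(g) <= eps:
            return x, k
        alpha = steplength(x, d, g)
        x_new = x + alpha * d
        g_new = grad(x_new)
        y = g_new - g
        d = -g_new + beta_rule(g_new, d, y) * d
        x, g = x_new, g_new
    return x, max_iter
```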

Remark 5. The choice of $s_k$ comes from [13]. Since it is important to use current information about the algorithm and the problem to make an initial guess of the steplength, the author of [13] uses the constant $\nu_k$ together with the current gradient information to construct an approximation $s_k$ to the optimal steplength along $d_k$.
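One natural way to turn $\nu_k$ into an initial trial value, and hence one possible reading of Remark 5, is to minimize the quadratic model $q(t)=t\,g_k^T d_k+\tfrac{\nu_k}{2}t^2\|d_k\|^2$ along $d_k$; whether this coincides exactly with the rule of [13] should be checked against that reference.

```python
import numpy as np

def initial_steplength(g, d, nu, floor=1e-12):
    # Minimizer of q(t) = t*g^T d + 0.5*nu*||d||^2*t^2, used as an illustrative
    # initial guess s_k for the line search (cf. Remark 5); floor guards against
    # a nonpositive or tiny value.
    return max(-np.dot(g, d) / (nu * np.dot(d, d)), floor)
```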

Now, we describe the line search algorithm in [15], which is very close to one suggested in [16].

Algorithm 6. Step 0. Set $a:=0$ and $b:=+\infty$. Choose an initial trial steplength $\alpha>0$.

Step 1. If $\alpha$ does not satisfy the second inequality in (8), then set $b:=\alpha$, and go to Step 2. If $\alpha$ does not satisfy the first inequality in (8), then set $a:=\alpha$, and go to Step 3. Otherwise, set $\alpha_k:=\alpha$, and return.

Step 2. Set $\alpha:=(a+b)/2$. Then go to Step 1.

Step 3. Set $\alpha:=2\alpha$ if $b=+\infty$ and $\alpha:=(a+b)/2$ otherwise. Then go to Step 1.
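A compact sketch of the bracketing idea behind Algorithm 6 is given below; the shrink/expand updates are our reading of a standard scheme and may differ in detail from [15, Algorithm 2.6].

```python
import numpy as np

def approx_wolfe_search(grad, x, d, alpha0, delta=0.1, sigma=0.9, max_trials=60):
    # Bracketing sketch: shrink the step when the upper bound in (8) fails,
    # expand it (or bisect a finite bracket) when the lower bound fails.
    a, b = 0.0, np.inf
    alpha = alpha0
    dg0 = np.dot(grad(x), d)                   # g_k^T d_k
    for _ in range(max_trials):
        dga = np.dot(grad(x + alpha * d), d)   # g(x_k + alpha*d_k)^T d_k
        if dga > (2.0 * delta - 1.0) * dg0:    # upper inequality in (8) violated
            b = alpha
            alpha = 0.5 * (a + b)
        elif dga < sigma * dg0:                # lower inequality in (8) violated
            a = alpha
            alpha = 2.0 * alpha if np.isinf(b) else 0.5 * (a + b)
        else:
            return alpha
    return alpha
```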

Next, we analyze the convergence properties of the MDYHS+ method.

Lemma 7. Consider the MDYHS+ method described above. If $g_k\neq 0$ for all $k\geq 0$, then the steplength $\alpha_k$ is well defined; namely, $\alpha_k$ can be found to satisfy (16) after a finite number of trials. Moreover, the search direction $d_k$ satisfies the sufficient descent condition (21).

Proof. We prove the desired results by induction. When $k=0$, by using $d_0=-g_0$, we have $g_0^T d_0=-\|g_0\|^2<0$, so (21) holds for $k=0$. We now show that the steplength $\alpha_0$ can be determined within a finite number of trials. Suppose, to the contrary, that for every nonnegative integer $i$ the inequality (23) holds. Since $g$ is continuous, taking limits with respect to $i$ on both sides of (23) yields a contradiction. Hence $\alpha_0$ is well defined. Now assume that (21) holds for $k-1$. A discussion similar to the case $k=0$ shows that the corresponding steplength is well defined. Multiplying the direction formula (3) by $g_k^T$, it then follows that (21) also holds for $k$. This completes the induction.

Theorem 8. Consider the MDYHS+ method described above, and assume that $g$ is $\nu_k$-monotone on the relevant segment determined by $x_k$ and $d_k$, where $\nu_k$ is the constant defined in (12). If Assumption 1 holds, then either the method terminates with $g_k=0$ for some $k$, or (30) holds.

Proof. Since $g$ is $\nu_k$-monotone, by using Lemma 3 we obtain (33). Combining (33) with the line search condition (16) and then invoking the sufficient descent condition (21) yields (34).
Next, we follow [13] to consider two possible cases.
Case 1. In this case, Assumption 1 directly yields the desired estimate; thus, (30) holds.
Case 2. Since $\alpha_k$ is the largest element of $\{s_k\rho^i : i=0,1,2,\ldots\}$ such that (16) holds, for all sufficiently large $k$ the larger trial value $\alpha_k/\rho$ violates (16). Combining this with the Lipschitz continuity of $g$ and with (34) gives the estimate (40). Using (40) recursively and invoking Assumption 1, we obtain (30).

Theorem 9. The MDYHS+ method is convergent in the sense that $\liminf_{k\to\infty}\|g_k\|=0$.

Proof. The proof of Theorem 9 is much like the convergence analysis of the DY method [1, Theorem 3.3], so the corresponding details are omitted here.

4. Numerical Experiments

In this section, we report numerical experiments testing the performance of the MDYHS+ method and the MDYHS+1 method. One purpose of this section is to compare them with the DYHS method and the DYHS+ method. The other purpose is to confirm their broader application scope by solving boundary value problems. Accordingly, two classes of test problems were selected. One class was drawn from the CUTEr library [17, 18], and the other class came from [19]. More information is given in the following subsections.

For the MDYHS+ method, the parameters of the line search (16) were set following [13]. For the MDYHS+1 method, we followed [8] to choose $\delta$ and $\sigma$ in (8). For the hybrid conjugate gradient methods DYHS and DYHS+, the values of $\delta$ and $\sigma$ in (5) and (6) were taken to be 0.01 and 0.1, respectively. The initial value of the steplength was taken differently for the first iteration and for the subsequent iterations (see [10]). For all the methods, an upper limit was imposed on the number of steplength trials at each iteration, and the stopping criterion was based on the norm of the gradient. In order to understand the numerical performance of each method more deeply, we carried out experiments at several accuracy levels. Our computations were carried out using Matlab R2012a on a desktop computer with an Intel(R) Xeon(R) 2.40 GHz CPU and 6.00 GB of RAM. The operating system was Linux (Ubuntu 8.04).

4.1. Tested by Some Problems in the CUTEr Library

In this subsection, we implemented the four hybrid conjugate gradient methods and compared their numerical performance. Because the DYHS method and the DYHS+ method need the original function of (1), we selected a collection of test problems from the CUTEr library and listed them in Table 1. The first column "Prob." denotes the problem number, and the columns "Name" and "$n$" denote the name and the dimension of the problem, respectively. Since we were interested in large-scale problems, we only considered problems of relatively large size; the largest dimension was set to 10,000. Moreover, we accessed the CUTEr functions from within Matlab R2012a by using the Matlab interface.

Our numerical results are reported in Tables 2, 3, 4, 5, and 6 as triples giving the number of iterations, the total number of line search trials, and the elapsed CPU time. For the DYHS+ and DYHS methods, we counted the numbers of function evaluations and gradient evaluations and converted them into an equivalent number of trials via the automatic differentiation argument (see [10, 20] for details). Moreover, "—" means that the method failed to achieve the prescribed accuracy within the maximum number of iterations, and the test problems are represented in the form #Pro.($n$).

The performance of the four methods, relative to CPU time, was evaluated using the profiles of Dolan and Moré [21]. That is, for each of the four methods, we plotted the fraction of problems for which that method was within a given factor of the best time. Figures 1, 2, and 3 show the performance profiles with respect to CPU time, the number of iterations, and the total number of line search trials, respectively. These figures reveal that the MDYHS+ method and the MDYHS+1 method performed better than the DYHS method and the DYHS+ method. The performance profiles also show that the MDYHS+ method and the MDYHS+1 method were comparable and solved almost all of the test problems, even at the tightest tolerance. Yet, the latter has no convergence guarantee.
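For reference, the performance profile of Dolan and Moré can be computed from a matrix of timings as in the short sketch below (our own helper; failures are encoded as infinity).

```python
import numpy as np

def performance_profile(costs, taus):
    # costs: (n_problems, n_solvers) array of CPU times (np.inf marks a failure).
    # Returns rho[i, s]: fraction of problems solver s solves within a factor
    # taus[i] of the best solver on each problem (Dolan and More, 2002).
    best = np.min(costs, axis=1, keepdims=True)
    with np.errstate(invalid="ignore"):
        ratios = costs / best
    return np.array([np.mean(ratios <= tau, axis=0) for tau in taus])
```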

4.2. Tested by Some Boundary Value Problems

In this section, we applied the MDYHS+ method and the MDYHS+1 method to some boundary value problems. See [22, Chapter 1] for background on boundary value problems.

In order to confirm the efficiency of the MDYHS+ method and the MDYHS+1 method in solving this class of problems, we drew a set of 11 boundary value problems from [19] and listed them in Table 7, where the test problems are denoted by #Pro.($n$) (#Pro. denotes the problem number in [19] and $n$ denotes the dimension), and the test results are reported in the same form as in Section 4.1.

From Table 7, we can see that both the MDYHS+ method and the MDYHS+1 method are efficient in solving boundary value problems. The MDYHS+1 method seems a little better but has no convergence guarantee.

5. Conclusions

This paper has studied two modified versions of a Dai-Yuan hybrid conjugate gradient method with two different line searches that use only gradient information, and it has been proven that, with the line search (9), the method is convergent in the sense that $\liminf_{k\to\infty}\|g_k\|=0$. We then investigated the numerical behavior of the two modified versions on two classes of standard test problems. From the numerical results, we can conclude that the two modified hybrid conjugate gradient methods are more efficient (especially at high precision) in solving large-scale nonlinear unconstrained minimization problems and have a broader application scope. For example, they can be used to solve some boundary value problems, where the objective function is not explicitly available.

Acknowledgments

The authors are very grateful to the associate editor and the referees for their valuable suggestions. Meanwhile, the first author is also very grateful to Yunda Dong for suggesting that she write this paper and add Section 4.2 to the revised version. This work was supported by the National Science Foundation of China, no. 60974082.