Abstract

A hybrid method combining the FR conjugate gradient method and the WYL conjugate gradient method is proposed for unconstrained optimization problems. The proposed method possesses the sufficient descent property under the strong Wolfe-Powell (SWP) line search rule with a relaxed restriction on the parameter $\sigma$. Under suitable conditions, the global convergence with the SWP line search rule and the weak Wolfe-Powell (WWP) line search rule is established for nonconvex functions. Numerical results show that this method performs better than the FR method and the WYL method.

1. Introduction

Consider the following unconstrained optimization problem in $n$ variables:
$$\min_{x \in \mathbb{R}^n} f(x), \qquad (1.1)$$
where $f : \mathbb{R}^n \to \mathbb{R}$ is smooth and its gradient $g(x) = \nabla f(x)$ is available. The nonlinear conjugate gradient (CG) method for (1.1) is designed by the iterative form
$$x_{k+1} = x_k + \alpha_k d_k, \qquad (1.2)$$
where $x_k$ is the $k$th iterative point, $\alpha_k > 0$ is a steplength, and $d_k$ is the search direction defined by
$$d_k = \begin{cases} -g_k, & k = 1, \\ -g_k + \beta_k d_{k-1}, & k \ge 2, \end{cases} \qquad (1.3)$$
where $\beta_k$ is a scalar that determines the particular conjugate gradient method [1, 2], and $g_k$ is the gradient of $f$ at the point $x_k$. There are many well-known formulas for $\beta_k$, such as the Fletcher-Reeves (FR) [3], Polak-Ribière-Polyak (PRP) [4], Hestenes-Stiefel (HS) [5], Conjugate-Descent (CD) [6], Liu-Storey (LS) [7], and Dai-Yuan (DY) [8] formulas. The CG method is a powerful line search method for solving optimization problems, and it remains very popular among engineers and mathematicians interested in solving large-scale problems [9-11]. Like the steepest descent method, it avoids the computation and storage of the matrices associated with the Hessian of the objective function. Many new formulas have since been studied by various authors (see [12-20], etc.).
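
To make the scheme (1.2)-(1.3) concrete, the following Python sketch implements the generic nonlinear CG loop; the names `beta_rule` and `line_search` are illustrative pluggable callables, not identifiers from the paper.

    import numpy as np

    def nonlinear_cg(f, grad, x0, beta_rule, line_search, eps=1e-6, max_iter=1000):
        """Generic nonlinear CG skeleton for (1.2)-(1.3).  beta_rule and
        line_search are pluggable callables (illustrative names)."""
        x = np.asarray(x0, dtype=float)
        g = grad(x)
        d = -g                                  # d_1 = -g_1 in (1.3)
        for _ in range(max_iter):
            if np.linalg.norm(g) <= eps:
                break
            alpha = line_search(f, grad, x, d)  # steplength alpha_k
            x = x + alpha * d                   # iteration (1.2)
            g_new = grad(x)
            d = -g_new + beta_rule(g_new, g, d) * d   # direction update (1.3)
            g = g_new
        return x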

The following formula for $\beta_k$ is the famous FR formula:
$$\beta_k^{FR} = \frac{\|g_k\|^2}{\|g_{k-1}\|^2}, \qquad (1.4)$$
where $g_k$ and $g_{k-1}$ are the gradients of $f$ at the points $x_k$ and $x_{k-1}$, respectively, and $\|\cdot\|$ denotes the Euclidean norm of vectors. Throughout this paper, we also denote $f(x_k)$ by $f_k$. Under the exact line search, Powell [21] analyzed the small-stepsize property of the FR conjugate gradient method and its global convergence, and Zoutendijk [22] proved its global convergence for nonconvex functions. Al-Baali [23] proved the sufficient descent condition and the global convergence of the FR conjugate gradient method with the SWP line search by restricting the parameter $\sigma < 1/2$; Liu et al. [24] extended the result to $\sigma = 1/2$. Wei et al. (WYL) [17] proposed a new conjugate gradient formula:
$$\beta_k^{WYL} = \frac{g_k^T \left( g_k - \frac{\|g_k\|}{\|g_{k-1}\|} g_{k-1} \right)}{\|g_{k-1}\|^2}. \qquad (1.5)$$
Numerical results show that this method is competitive with the PRP method, and its global convergence with the exact line search and the Grippo-Lucidi line search conditions has been proved. Huang et al. [25] proved that, by restricting the parameter $\sigma < 1/4$ under the SWP line search rule, this method has the sufficient descent property. It is therefore an interesting task to extend the bound on the parameter $\sigma$ while retaining the sufficient descent condition.
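
In code, the two formulas (1.4) and (1.5) translate directly; here `g` and `g_prev` stand for $g_k$ and $g_{k-1}$.

    import numpy as np

    def beta_fr(g, g_prev):
        # FR formula (1.4): ||g_k||^2 / ||g_{k-1}||^2
        return g.dot(g) / g_prev.dot(g_prev)

    def beta_wyl(g, g_prev):
        # WYL formula (1.5): g_k^T (g_k - (||g_k||/||g_{k-1}||) g_{k-1}) / ||g_{k-1}||^2
        ratio = np.linalg.norm(g) / np.linalg.norm(g_prev)
        return g.dot(g - ratio * g_prev) / g_prev.dot(g_prev)

Note that the Cauchy-Schwarz inequality gives $0 \le \beta_k^{WYL} \le 2\beta_k^{FR}$, a fact frequently used in analyses of the WYL method.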

The sufficient descent condition
$$g_k^T d_k \le -c \|g_k\|^2, \qquad (1.6)$$
where $c > 0$ is a constant, is crucial to ensure the global convergence of nonlinear conjugate gradient methods [23, 26-28]. In order to obtain better conjugate gradient methods, Andrei [29, 30] proposed hybrid conjugate gradient algorithms built as convex combinations of other conjugate gradient algorithms. Motivated by the ideas of Andrei [29, 30] and the above observations, we give a hybrid method combining the FR method and the WYL method. The proposed method, with a relaxed restriction on the parameter $\sigma$ under the SWP line search technique, possesses the sufficient descent condition (1.6). The global convergence of our method with the SWP line search and the WWP line search is established for nonconvex functions. Numerical results show that the presented method is competitive with the FR and WYL methods.
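
As a small diagnostic, condition (1.6) can be monitored directly during a run; the constant `c` below is illustrative.

    def sufficient_descent(g, d, c=1e-4):
        # Sufficient descent condition (1.6): g_k^T d_k <= -c * ||g_k||^2
        return g.dot(d) <= -c * g.dot(g)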

In the following section, the algorithm is stated. The properties and the global convergence of the new method are proved in Section 3. Numerical results are reported in Section 4, and conclusions are given in Section 5.

2. Algorithm

Now we describe our algorithm as follows.

Algorithm 2.1 (the hybrid method).
Step 1. Choose an initial point $x_1 \in \mathbb{R}^n$, $\epsilon > 0$, $\delta \in (0, 1)$, and $\sigma \in (\delta, 1)$. Set $d_1 = -g_1$ and $k := 1$.
Step 2. If $\|g_k\| \le \epsilon$, then stop; otherwise go to the next step.
Step 3. Compute the stepsize $\alpha_k$ by some line search rule.
Step 4. Let $x_{k+1} = x_k + \alpha_k d_k$. If $\|g_{k+1}\| \le \epsilon$, then stop.
Step 5. Calculate the search direction
$$d_{k+1} = -g_{k+1} + \beta_{k+1} d_k, \qquad (2.1)$$
where $\beta_{k+1}$ is the hybrid parameter combining $\beta_{k+1}^{FR}$ and $\beta_{k+1}^{WYL}$.
Step 6. Set $k := k + 1$ and go to Step 3.
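
The displayed hybrid formula for $\beta_{k+1}$ in Step 5 is not recoverable from this copy, so the Python sketch below uses a plain convex combination $\beta_k = \mu \beta_k^{FR} + (1 - \mu)\beta_k^{WYL}$ with a weight $\mu \in [0, 1]$, in the spirit of the convex-combination hybrids of Andrei [29, 30]. This stand-in is an assumption, not necessarily the paper's exact rule.

    import numpy as np

    def hybrid_cg(f, grad, x0, line_search, mu=0.5, eps=1e-6, max_iter=1000):
        """Sketch of Algorithm 2.1.  The hybrid beta below is the assumed
        convex combination mu*beta_FR + (1 - mu)*beta_WYL; the paper's
        exact Step 5 formula is not recoverable from this copy."""
        x = np.asarray(x0, dtype=float)
        g = grad(x)
        d = -g                                  # Step 1: d_1 = -g_1
        for _ in range(max_iter):
            if np.linalg.norm(g) <= eps:        # Steps 2 and 4: stopping test
                break
            alpha = line_search(f, grad, x, d)  # Step 3: SWP or WWP rule
            x = x + alpha * d                   # Step 4: x_{k+1} = x_k + alpha_k d_k
            g_new = grad(x)
            beta_fr = g_new.dot(g_new) / g.dot(g)
            ratio = np.linalg.norm(g_new) / np.linalg.norm(g)
            beta_wyl = g_new.dot(g_new - ratio * g) / g.dot(g)
            beta = mu * beta_fr + (1.0 - mu) * beta_wyl   # assumed hybrid rule
            d = -g_new + beta * d               # Step 5: direction update (2.1)
            g = g_new                           # Step 6: k := k + 1
        return x

With `line_search` set to a procedure satisfying the Wolfe-Powell conditions, such as the one sketched in Section 3.1 below, this reproduces the overall flow of Algorithm 2.1.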

3. The Properties and the Global Convergence

In the following, we assume that $g_k \neq 0$ for all $k$; otherwise, a stationary point has been found. The following assumptions are often used to prove the convergence of nonlinear conjugate gradient methods (see [3, 8, 16, 17, 27]).

Assumption 3.1. (i) The function $f(x)$ has a lower bound on the level set $\Omega = \{x \in \mathbb{R}^n : f(x) \le f(x_1)\}$, where $x_1$ is a given point.
(ii) In an open convex set $\Omega_0$ that contains $\Omega$, $f$ is differentiable and its gradient $g$ is Lipschitz continuous; namely, there exists a constant $L > 0$ such that
$$\|g(x) - g(y)\| \le L \|x - y\|, \quad \forall x, y \in \Omega_0. \qquad (3.1)$$

3.1. The Properties with the Strong Wolfe-Powell Line Search

The strong Wolfe-Powell (SWP) line search rule is to find a steplength $\alpha_k$ such that
$$f(x_k + \alpha_k d_k) \le f(x_k) + \delta \alpha_k g_k^T d_k, \qquad (3.2)$$
$$|g(x_k + \alpha_k d_k)^T d_k| \le \sigma |g_k^T d_k|, \qquad (3.3)$$
where $0 < \delta < \sigma < 1$.
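
For illustration, a steplength satisfying (3.2)-(3.3) can be found by a standard bracketing-bisection procedure. The sketch below is a minimal version; the default values $\delta = 10^{-4}$ and $\sigma = 0.1$ are illustrative, not those used in the experiments of Section 4.

    def swp_line_search(f, grad, x, d, delta=1e-4, sigma=0.1,
                        alpha=1.0, max_iter=50):
        """Bracketing-bisection sketch for the SWP conditions (3.2)-(3.3)."""
        f0, slope0 = f(x), grad(x).dot(d)   # f(x_k) and g_k^T d_k (< 0)
        lo, hi = 0.0, None
        for _ in range(max_iter):
            if f(x + alpha * d) > f0 + delta * alpha * slope0:
                hi = alpha                              # (3.2) fails: shrink
                alpha = 0.5 * (lo + hi)
            else:
                slope = grad(x + alpha * d).dot(d)
                if abs(slope) <= sigma * abs(slope0):
                    return alpha                        # (3.3) holds: accept
                if slope < 0.0:
                    lo = alpha                          # still descending: grow
                    alpha = 2.0 * alpha if hi is None else 0.5 * (lo + hi)
                else:
                    hi = alpha                          # overshot: shrink
                    alpha = 0.5 * (lo + hi)
        return alpha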

The following theorem shows that the hybrid algorithm with the SWP line search possesses the sufficient descent condition (1.6) under a relaxed restriction on the parameter $\sigma$.

Theorem 3.2. Let the sequences $\{x_k\}$ and $\{d_k\}$ be generated by Algorithm 2.1, and let the stepsize $\alpha_k$ be determined by the SWP line search (3.2) and (3.3). If the parameter $\sigma$ satisfies the relaxed restriction, then the sufficient descent condition (1.6) holds.

Proof. By the definition of $\beta_k$ and the formulae (1.4) and (1.5), we first derive the bound (3.4) on $\beta_k$. Using (3.3) and this bound, we get the estimate (3.5), and by (2.1) we obtain (3.6). We prove the descent property of $d_k$ by induction. Since $d_1 = -g_1$, we have $g_1^T d_1 = -\|g_1\|^2 < 0$; now suppose that $d_1, \ldots, d_{k-1}$ are all descent directions, that is, $g_i^T d_i < 0$ for each $i \le k - 1$.
By (3.6), we get (3.8),
that is, (3.9) holds. Then, from (3.7) together with (3.9), we deduce (3.10). Repeating this process and using the fact that $d_1 = -g_1$ imply (3.11). By the restriction on the parameter $\sigma$, we have (3.12), and so (3.11) can be rewritten as (3.13). Thus, by induction, $g_k^T d_k < 0$ holds for all $k$.
Denoting the resulting constant by $c > 0$, (3.13) turns out to be (3.14),
which implies that (1.6) holds. The proof is complete.

Lemma 3.3. Suppose that Assumption 3.1 holds. Let the sequences $\{x_k\}$ and $\{d_k\}$ be generated by Algorithm 2.1, let the stepsize $\alpha_k$ be determined by the SWP line search (3.2) and (3.3), and let the conditions in Theorem 3.2 hold. Then the Zoutendijk condition [22]
$$\sum_{k \ge 1} \frac{(g_k^T d_k)^2}{\|d_k\|^2} < +\infty \qquad (3.15)$$
holds.
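
As a numerical sanity check, the terms of the series in (3.15) can be accumulated along a run; this is a diagnostic sketch only.

    def zoutendijk_terms(gs, ds):
        """Terms (g_k^T d_k)^2 / ||d_k||^2 of the series in (3.15); their
        partial sums should remain bounded along a convergent run."""
        return [(g.dot(d)) ** 2 / d.dot(d) for g, d in zip(gs, ds)]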

In the same way, if Assumption 3.1 and the descent condition $g_k^T d_k < 0$ (for all $k$) hold, then (3.15) also holds for the exact line search, the Armijo-Goldstein line search, and the weak Wolfe-Powell line search. The proofs can be found in [31, 32]. Now we prove the global convergence theorem of Algorithm 2.1 with the SWP line search.

Theorem 3.4. Suppose that Assumption 3.1 holds. Let the sequences $\{x_k\}$ and $\{d_k\}$ be generated by Algorithm 2.1, let the stepsize $\alpha_k$ be determined by the SWP line search (3.2) and (3.3), let the conditions in Theorem 3.2 hold, and let the parameter $\sigma$ satisfy the stated restriction. Then
$$\liminf_{k \to \infty} \|g_k\| = 0. \qquad (3.16)$$

Proof. By (1.6), (3.3), and the Zoutendijk condition (3.15), we get (3.17). With the notation introduced there, (3.17) can be rewritten as (3.18). We prove the result of this theorem by contradiction. Assume that the theorem is not true; then there exists a positive constant $\varepsilon$ such that
$$\|g_k\| \ge \varepsilon \quad \text{for all } k. \qquad (3.19)$$
Squaring both sides of (2.1), we obtain (3.20). Dividing both sides of (3.20) by $(g_k^T d_k)^2$ and applying (3.4), (3.18), and the restriction on the parameter $\sigma$, we get (3.21). Using (3.3) and (1.6), we have (3.22). Repeating this process and using the fact that $d_1 = -g_1$, we get (3.24). Now, combining (3.24) and (3.20), we get another bound (3.25). Thus we arrive at a conclusion that contradicts (3.19); hence the conclusion of the theorem is correct. The following theorem shows that Algorithm 2.1 with the SWP line search is globally convergent under only the descent condition
$$g_k^T d_k < 0 \quad \text{for all } k. \qquad (3.27)$$

Theorem 3.5. Suppose that Assumption 3.1 holds. Let the sequence $\{x_k\}$ be generated by Algorithm 2.1, let the stepsize $\alpha_k$ be determined by the SWP line search (3.2) and (3.3), let the parameter $\sigma$ satisfy the stated restriction, and let (3.27) hold. Then, for all $k$, the inequalities (3.28) hold, where the constants are as specified in the proof; furthermore, (3.16) holds.

Proof. By (2.1), (3.3), and (3.27), we can deduce that (3.10) holds for all $k$. According to the second inequality of (3.10), we have (3.30). Proceeding as in [31], it is not difficult to obtain (3.28).
Second, using (3.30) and the Cauchy-Schwarz inequality implies (3.31) (or see [31]).
Moreover, repeating the argument for the first inequality of (3.10) and using $d_1 = -g_1$, we get (3.32). By (3.3), (3.22), and this inequality, we have (3.33). Thus, using (3.20), we have (3.34), with the constant defined there. Assume, to derive a contradiction, that (3.16) fails; then the Zoutendijk condition (3.15), (3.31), and (3.34) together yield a contradiction, which shows that (3.16) is true.

3.2. The Properties with the Weak Wolfe-Powell (WWP) Line Search

The weak Wolfe-Powell line search is to find a steplength $\alpha_k$ satisfying (3.2) and
$$g(x_k + \alpha_k d_k)^T d_k \ge \sigma g_k^T d_k, \qquad (3.36)$$
where $0 < \delta < \sigma < 1$.
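
A steplength satisfying (3.2) and (3.36) can likewise be found by bracketing and bisection; the sketch below mirrors the SWP procedure of Section 3.1 but requires only a lower bound on the new slope. The default parameter values are again illustrative.

    def wwp_line_search(f, grad, x, d, delta=1e-4, sigma=0.1,
                        alpha=1.0, max_iter=50):
        """Bracketing-bisection sketch for the WWP conditions (3.2) and (3.36)."""
        f0, slope0 = f(x), grad(x).dot(d)   # f(x_k) and g_k^T d_k (< 0)
        lo, hi = 0.0, None
        for _ in range(max_iter):
            if f(x + alpha * d) > f0 + delta * alpha * slope0:
                hi = alpha                              # (3.2) fails: shrink
                alpha = 0.5 * (lo + hi)
            elif grad(x + alpha * d).dot(d) < sigma * slope0:
                lo = alpha                              # (3.36) fails: grow
                alpha = 2.0 * alpha if hi is None else 0.5 * (lo + hi)
            else:
                return alpha
        return alpha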

From the computational point of view, one of the best-known formulas for $\beta_k$ is the PRP formula. Its global convergence with the exact line search was proved by Polak and Ribière [4] when the objective function is convex. Powell [33] gave a counterexample to show that there exist nonconvex functions on which the PRP method does not converge globally even if the exact line search is used. He suggested that $\beta_k$ should not be allowed to be less than zero. Following this suggestion, under the assumption of the sufficient descent condition, Gilbert and Nocedal [27] proved that the modified PRP method with $\beta_k^{PRP+} = \max\{\beta_k^{PRP}, 0\}$ is globally convergent with the WWP line search technique. For the new formula, we know that $\beta_k$ is always larger than zero. Then we can also get the global convergence of the hybrid method with the WWP line search.
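
For reference, the PRP formula and Powell's nonnegativity restriction as used by Gilbert and Nocedal [27] read as follows in code.

    def beta_prp(g, g_prev):
        # PRP formula: g_k^T (g_k - g_{k-1}) / ||g_{k-1}||^2
        return g.dot(g - g_prev) / g_prev.dot(g_prev)

    def beta_prp_plus(g, g_prev):
        # PRP+ of Gilbert and Nocedal [27]: truncate negative values at zero
        return max(0.0, beta_prp(g, g_prev))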

Lemma 3.6 (see [31]). Let Assumption 3.1 hold, and let the sequences $\{x_k\}$ and $\{d_k\}$ be generated by Algorithm 2.1, with the stepsize $\alpha_k$ determined by (3.2) and (3.36). Suppose that (3.20) is true and that the sufficient descent condition (1.6) holds. Then we have $d_k \neq 0$ and
$$\sum_{k \ge 1} \|u_k - u_{k-1}\|^2 < +\infty, \qquad (3.37)$$
where $u_k = d_k / \|d_k\|$.

The following Property 1 was introduced by Gilbert and Nocedal [27]; it pertains to the PRP method under the sufficient descent condition, and the WYL method also has this property. Now we prove that Property 1 pertains to the new method as well.

Property 1. Suppose that $0 < \gamma \le \|g_k\| \le \bar{\gamma}$ for all $k$. We say that the method has Property 1 if there exist constants $b > 1$ and $\lambda > 0$ such that $|\beta_k| \le b$ for all $k$, and $\|s_{k-1}\| \le \lambda$ implies $|\beta_k| \le \frac{1}{2b}$, where $s_{k-1} = x_k - x_{k-1}$.

Lemma 3.7. Let Assumption 3.1 hold, let the sufficient descent condition (1.6) hold, and let the sequences $\{x_k\}$ and $\{d_k\}$ be generated by Algorithm 2.1. Suppose that there exists a constant $\gamma > 0$ such that $\|g_k\| \ge \gamma$ for all $k$ (3.38). Then this method possesses Property 1.

Proof. By (3.36), (1.6), (3.1), and (3.38), we obtain (3.39), and hence (3.40). By (3.1) again, we obtain (3.41). Combining the above inequality with (3.41) implies (3.42). By (3.5) and (3.38), we can bound $|\beta_k|$ by a constant $b > 1$, and we choose $\lambda$ accordingly. If $\|s_{k-1}\| \le \lambda$, then, using (3.1), (3.38), (3.43), and the above equation, we obtain $|\beta_k| \le \frac{1}{2b}$. Therefore, the conclusion of this lemma holds.

Lemma 3.8 (see [31]). Let the sequences $\{x_k\}$ and $\{d_k\}$ be generated by Algorithm 2.1, and let the conditions in Lemma 3.7 hold. If $\|g_k\| \ge \gamma > 0$ for all $k$ and the method has Property 1, then there exists $\lambda > 0$ such that, for any $\Delta \in Z^+$ and any index $k_0$, there is an index $k \ge k_0$ satisfying
$$|K_{k,\Delta}^{\lambda}| > \frac{\Delta}{2},$$
where $K_{k,\Delta}^{\lambda} = \{ i \in Z^+ : k \le i \le k + \Delta - 1, \ \|s_{i-1}\| > \lambda \}$, $Z^+$ denotes the set of positive integers, and $|K_{k,\Delta}^{\lambda}|$ denotes the number of elements in $K_{k,\Delta}^{\lambda}$.

Finally, by Lemmas 3.6 and 3.8, we present the global convergence theorem of Algorithm 2.1 with the WWP line search. Similar to the corresponding theorem of [31], it is not difficult to prove the result, so we omit the proof.

Theorem 3.9. Let the sequences $\{x_k\}$ and $\{d_k\}$ be generated by Algorithm 2.1 with the weak Wolfe-Powell line search, and let the conditions in Lemma 3.7 hold. Then (3.16) holds.

4. Numerical Results

In this section, we report some results of the numerical experiments. It is well known that there exist many new conjugate gradient methods (see [1, 13-16, 18, 19, 29, 30]) that have good properties and good numerical performance. Since the given formula is a hybrid of the FR formula and the WYL formula, we test Algorithm 2.1 under the WWP line search only, on the test problems in [34] with the given initial points and dimensions, and compare its performance with those of the FR [3] and WYL [17] methods. The parameters $\delta$ and $\sigma$ are chosen to satisfy $0 < \delta < \sigma < 1$. The following Himmelblau stopping rule is used.

If $|f(x_k)| > e_1$, let $stop1 = \frac{|f(x_k) - f(x_{k+1})|}{|f(x_k)|}$; otherwise, let $stop1 = |f(x_k) - f(x_{k+1})|$.

If $\|g(x)\| < \epsilon$ or $stop1 < e_2$ was satisfied, we stop the program, where $\epsilon$, $e_1$, and $e_2$ are small positive tolerances. We also stop the program if the iteration number exceeds one thousand. All codes were written in MATLAB and run on a PC with a 2.60 GHz CPU, 256 MB of memory, and the Windows XP operating system. The detailed numerical results are listed at http://210.36.18.9:8018/publication.asp?id=36990.
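
In code, the stopping test described above may be sketched as follows; the tolerance values are placeholders, since the original values are not recoverable from this copy.

    def should_stop(f_k, f_k1, g_norm, eps=1e-6, e1=1e-5, e2=1e-5):
        """Himmelblau-style stop rule with illustrative tolerances."""
        if abs(f_k) > e1:
            stop1 = abs(f_k - f_k1) / abs(f_k)   # relative decrease in f
        else:
            stop1 = abs(f_k - f_k1)              # absolute decrease in f
        return g_norm < eps or stop1 < e2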

Dolan and Moré [35] gave a new tool to analyze the efficiency of algorithms. They introduced the notion of a performance profile as a means to evaluate and compare the performance of a set of solvers $S$ on a test set $P$. Assuming that there exist $n_s$ solvers and $n_p$ problems, for each problem $p$ and solver $s$ they defined

$t_{p,s}$ = computing time (the number of function evaluations or another performance measure) required to solve problem $p$ by solver $s$.

Requiring a baseline for comparisons, they compared the performance of solver $s$ on problem $p$ with the best performance by any solver on this problem; that is, they used the performance ratio
$$r_{p,s} = \frac{t_{p,s}}{\min\{ t_{p,s} : s \in S \}}.$$
Suppose that a parameter $r_M \ge r_{p,s}$ for all $p, s$ is chosen; then $r_{p,s} = r_M$ if and only if solver $s$ does not solve problem $p$.

The performance of solver $s$ on any given problem may be of interest, but we would like to obtain an overall assessment of the performance of the solver, so they defined
$$\rho_s(\tau) = \frac{1}{n_p} \, \mathrm{size}\{ p \in P : r_{p,s} \le \tau \};$$
thus $\rho_s(\tau)$ is the probability for solver $s \in S$ that a performance ratio $r_{p,s}$ is within a factor $\tau \in \mathbb{R}$ of the best possible ratio. The function $\rho_s$ is the (cumulative) distribution function for the performance ratio. The performance profile $\rho_s : \mathbb{R} \to [0, 1]$ for a solver is a nondecreasing, piecewise constant function, continuous from the right at each breakpoint. The value of $\rho_s(1)$ is the probability that the solver will win over the rest of the solvers.
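
A performance profile of this kind is straightforward to compute. The sketch below assumes a cost matrix with one row per problem and one column per solver, with `np.inf` marking failures.

    import numpy as np

    def performance_profile(T, taus):
        """T: (n_p, n_s) array of costs t_{p,s}; use np.inf when solver s
        fails on problem p (assumes each problem is solved by someone).
        Returns a (len(taus), n_s) array of rho_s(tau) values."""
        best = T.min(axis=1, keepdims=True)   # best cost per problem
        R = T / best                          # performance ratios r_{p,s}
        return np.array([(R <= tau).mean(axis=0) for tau in taus])

Plotting the columns of the returned array against $\tau$ reproduces figures in the style of Figures 1-2.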

According to the above rules, we know that a solver whose performance profile plot lies on the top right will win over the rest of the solvers.

Figures 1 and 2 show the performance of these methods relative to the number of iterations and the number of function and gradient evaluations, respectively, where "FR" denotes the FR formula with the WWP rule, "WYL" denotes the WYL formula with the WWP rule, and "Algorithm 2.1" denotes the new method with the WWP rule.

From Figures 1 and 2, it is easy to see that Algorithm 2.1 is the best among the three methods, and the WYL method is much better than the FR method. Notice that the global convergence of the WYL method with the WWP line search has not been established yet. In other words, the given method is competitive with the other two normal methods, and the hybrid formula is notable.

5. Conclusions

This paper gives a hybrid conjugate gradient method for solving unconstrained optimization problems. Under the SWP line search, this method possesses the sufficient descent condition with a relaxed restriction on the parameter $\sigma$. The global convergence with the SWP line search and the WWP line search is established for nonconvex functions. Numerical results show that the given method is competitive with the other two conjugate gradient methods.

For further research, we should study the new method with nonmonotone line search techniques. Moreover, more numerical experiments on large practical problems (such as the problems in [36]) should be done, and the given method should be compared with other famous formulas. How to choose the parameters in the algorithm is another aspect for future investigation.

Acknowledgments

The authors are very grateful to the anonymous referees and the editors for their valuable suggestions and comments, which improved our paper greatly. This work is supported by China NSF grant 10761001 and the Scientific Research Foundation of Guangxi University (Grant no. X081082).