Abstract

This paper describes a modified three-term Hestenes–Stiefel (HS) method. The original HS method is the earliest conjugate gradient method. Although the HS method achieves global convergence with an exact line search, this is not guaranteed for an inexact line search, and the HS method does not usually satisfy the descent property. Our modified three-term conjugate gradient method possesses the sufficient descent property regardless of the type of line search and guarantees global convergence under the inexact Wolfe–Powell line search. The numerical efficiency of the modified three-term HS method is checked using 75 standard test functions. Three-term conjugate gradient methods are known to be numerically more efficient than two-term methods; importantly, this paper quantifies how much better the three-term performance is. Thus, in the numerical results, we compare our new modification with an efficient two-term conjugate gradient method, and we also compare it with a state-of-the-art three-term HS method. We conclude that the proposed modification is globally convergent and numerically efficient.

1. Introduction

In the field of optimization, conjugate gradient (CG) methods are a well-known approach for solving large-scale unconstrained optimization problems. CG methods are simple and have relatively modest storage requirements. This class of methods has a vast number of applications in different areas, especially in engineering [1–3].

Consider the unconstrained optimization problem
$$\min_{x \in \mathbb{R}^n} f(x), \tag{1}$$
where $f: \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable and its gradient is $g(x) = \nabla f(x)$. Normally, CG methods generate a sequence $\{x_k\}$ defined by
$$x_{k+1} = x_k + \alpha_k d_k, \quad k = 0, 1, 2, \ldots, \tag{2}$$
where, in (2), $\alpha_k > 0$ is a step size determined by a general line search and $d_k$ is a search direction given by
$$d_0 = -g_0, \qquad d_{k+1} = -g_{k+1} + \beta_k d_k, \tag{3}$$
where $g_k = g(x_k)$ and $\beta_k$ is a parameter of the CG method. The six pioneering forms of $\beta_k$ are defined in [4–10].
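
To make the scheme (2)–(3) concrete, the following minimal sketch (our own illustration in Python/NumPy, not the authors' code; the helper names and the sample quadratic are ours) performs one generic CG iteration with the rule for $\beta_k$ supplied as a callable.

import numpy as np

# One generic CG iteration of (2)-(3): x_{k+1} = x_k + alpha_k d_k and
# d_{k+1} = -g_{k+1} + beta_k d_k, with the rule for beta_k left abstract.
def cg_iterate(grad, x, d, alpha, beta_rule):
    g = grad(x)
    x_new = x + alpha * d            # iteration (2)
    g_new = grad(x_new)
    beta = beta_rule(g, g_new, d)    # any CG parameter, e.g., (6) below
    d_new = -g_new + beta * d        # direction update (3)
    return x_new, g_new, d_new

# A Fletcher-Reeves-type rule, purely for illustration:
fr_rule = lambda g, g_new, d: (g_new @ g_new) / (g @ g)

x = np.array([1.0, 2.0])
d = -2.0 * x                         # f(x) = ||x||^2, so g(x) = 2x and d_0 = -g_0
print(cg_iterate(lambda z: 2.0 * z, x, d, alpha=0.5, beta_rule=fr_rule))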

Line searches may be exact or inexact. Exact line searches are time consuming, computationally expensive, and difficult, and they require large amounts of storage [11–13]. Thus, inexact line search techniques are often adopted because of their efficiency and global convergence properties. Well-known inexact line searches include the Wolfe and strong Wolfe techniques. The Wolfe line search computes $\alpha_k$ such that
$$f(x_k + \alpha_k d_k) \le f(x_k) + \delta \alpha_k g_k^T d_k, \qquad g(x_k + \alpha_k d_k)^T d_k \ge \sigma g_k^T d_k, \tag{4}$$
where $0 < \delta < \sigma < 1$, and the strong Wolfe line search replaces the second condition in (4) by
$$|g(x_k + \alpha_k d_k)^T d_k| \le \sigma |g_k^T d_k|. \tag{5}$$
Recently, Alhawarat and Salleh [14], Salleh and Alhawarat [15], and Alhawarat et al. [16, 17] proposed efficient CG and hybrid CG methods that fulfill the required global convergence properties. To improve the existing methods, a three-term CG technique has been introduced, and several researchers have suggested various modifications of the three-term CG method. For instance, Beale [18] and Nazareth [19] proposed CG methods based on three terms that possess the finite termination property, but these do not perform well in practice [20, 21]. Furthermore, reports by McGuire and Wolfe [22], Deng and Li [23], Zhang et al. [24, 25], Cheng [26], Al-Bayati and Sharif [27], Zhang, Xiao, and Wei [28], Andrei [29–31], Sugiki et al. [32], Narushima et al. [33], Babaie-Kafaki and Ghanbari [34], Al-Baali et al. [35], Sun and Liu [36], and Baluch et al. [37] discuss the global convergence and numerical results of modified three-term CG methods.
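
As a concrete reference for (4) and (5), the sketch below (again our own illustration; delta and sigma are the Wolfe parameters with $0 < \delta < \sigma < 1$) tests whether a candidate step size satisfies the Armijo, Wolfe, and strong Wolfe conditions.

import numpy as np

def wolfe_flags(f, grad, x, d, alpha, delta=1e-4, sigma=0.9):
    # Returns (armijo, wolfe, strong_wolfe) for the step alpha along d.
    g0_d = grad(x) @ d                        # g_k^T d_k, negative for a descent d
    g1_d = grad(x + alpha * d) @ d            # g(x_k + alpha_k d_k)^T d_k
    armijo = f(x + alpha * d) <= f(x) + delta * alpha * g0_d
    wolfe = g1_d >= sigma * g0_d              # curvature part of (4)
    strong = abs(g1_d) <= sigma * abs(g0_d)   # strong Wolfe condition (5)
    return armijo, wolfe, strong

f = lambda x: x @ x                           # toy objective
grad = lambda x: 2.0 * x
x = np.array([3.0, -1.0])
print(wolfe_flags(f, grad, x, -grad(x), alpha=0.4))   # (True, True, True)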

In this paper, a modified three-term Hestenes–Stiefel (HS) method is proposed. The general formula of the HS method [4] is
$$\beta_k^{HS} = \frac{g_{k+1}^T y_k}{d_k^T y_k}, \qquad y_k = g_{k+1} - g_k. \tag{6}$$
This is known to be the first of all the CG parameters. The HS method ensures global convergence under an exact line search. A nice property of the HS method is that it satisfies the conjugacy condition $d_{k+1}^T y_k = 0$, regardless of whether the line search is exact or inexact [38]. However, the method does not satisfy the global convergence property when used with an inexact line search.

In this paper, the method of Zhang et al. [25] is modified with the help of another efficient CG parameter proposed by Wei et al. [39]. An attractive feature of the new three-term HS method is that it satisfies the sufficient descent condition regardless of the line search used. Furthermore, our modification is globally convergent for both convex and nonconvex functions when using an inexact line search. Numerical experiments show that the new modification is more efficient and robust than the MTTHS algorithm proposed by Zhang et al. [25]. The second aim of this paper is to quantify the improvement of three-term CG methods over two-term approaches. To do this, we consider the efficient two-term CG method of Dai and Wen [40], given by
$$\beta_k^{DHS} = \frac{\|g_{k+1}\|^2 - \frac{\|g_{k+1}\|}{\|g_k\|} |g_{k+1}^T g_k|}{\mu |g_{k+1}^T d_k| + d_k^T y_k}, \qquad \mu > 1. \tag{7}$$
This DHS method [40] is one of the more efficient CG techniques, as it possesses the sufficient descent property and offers global convergence under the Wolfe–Powell line search conditions. The numerical results given by this method are also convincing. Therefore, this two-term CG method is compared with our new modification to quantify the improvement offered by three-term CG methods.

The remainder of this paper is organized as follows. In Section 2, the motivation for and construction of the three-term HS CG method is discussed, and the general form is presented in Algorithm A. Section 3 is divided into two subsections, with Section 3.1 covering the sufficient descent condition and the global convergence properties for convex and nonconvex functions and Section 3.2 presenting detailed numerical results to evaluate the proposed method. Finally, Section 4 concludes this paper.

2. Motivation and Formulas

Zhang et al. [25] proposed the first three-term HS (TTHS) method. Its search direction can be written as
$$d_0 = -g_0, \qquad d_{k+1} = -g_{k+1} + \beta_k^{HS} d_k - \theta_k y_k, \tag{8}$$
and
$$\theta_k = \frac{g_{k+1}^T d_k}{d_k^T y_k}. \tag{9}$$

TTHS satisfies the descent property; if an exact line search is used, then $g_{k+1}^T d_k = 0$, so $\theta_k = 0$ and (8) reduces to the original HS method. Further, to guarantee the global convergence properties of the search direction given by (8), a modified (MTTHS) algorithm was introduced, in which the search direction (8) is replaced by a suitably safeguarded variant; see [25] for its explicit form. A numerical illustration of the descent property of (8) and (9) is given below.
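
The following sketch (our notation; the random vectors merely stand in for $g_{k+1}$, $d_k$, and $y_k$) verifies numerically that, because $\beta_k^{HS}$ and $\theta_k$ in (8) and (9) share the denominator $d_k^T y_k$, the TTHS direction always satisfies $g_{k+1}^T d_{k+1} = -\|g_{k+1}\|^2$.

import numpy as np

def tths_direction(g_new, d, y):
    # Three-term HS direction (8)-(9).
    denom = d @ y                     # d_k^T y_k (positive under Wolfe searches)
    beta = (g_new @ y) / denom        # beta_k^{HS}, see (6)
    theta = (g_new @ d) / denom       # theta_k, see (9)
    return -g_new + beta * d - theta * y

rng = np.random.default_rng(0)
g_new, d, y = rng.normal(size=3), rng.normal(size=3), rng.normal(size=3)
d_new = tths_direction(g_new, d, y)
# The two trailing terms of (8) cancel in g_{k+1}^T d_{k+1}:
print(np.isclose(g_new @ d_new, -(g_new @ g_new)))    # True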

As MTTHS was introduced to establish the global convergence of the search direction in (8), the question arises as to why (8) itself is not used to prove the global convergence properties. Instead of discarding (8), it should be made efficient and globally convergent. Thus, there is room to modify (8) so that it satisfies the global convergence properties, and it is reasonable to expect that such a modification would outperform the MTTHS algorithm numerically.

Wei et al. [39] proposed an efficient CG parameter given by
$$\beta_k = \frac{\|g_{k+1}\|^2 - \frac{\|g_{k+1}\|}{\|g_k\|} |g_{k+1}^T g_k|}{\mu |g_{k+1}^T d_k| + \|g_k\|^2}, \qquad \mu > 1. \tag{10}$$
In this parameter, the term $\mu |g_{k+1}^T d_k|$ plays an important role in satisfying the sufficient descent and global convergence properties. Thus, we take $\mu |g_{k+1}^T d_k|$ from the denominator of (10) and use it with (8) to construct a new modified three-term HS method. Hence,
$$\beta_k^{BZA} = \frac{g_{k+1}^T y_k}{\mu |g_{k+1}^T d_k| + d_k^T y_k}. \tag{11}$$
It is known that the HS method does not converge globally when the objective function is nonconvex. Further, Gilbert and Nocedal [41] showed that the parameter $\beta_k$ must be nonnegative to achieve convergence for nonconvex or nonlinear functions, i.e.,
$$\beta_k^{PRP+} = \max\{\beta_k^{PRP}, 0\}. \tag{12}$$
Applying the same technique to our parameter gives
$$\beta_k^{BZA+} = \max\{\beta_k^{BZA}, 0\}, \tag{13}$$
where
$$\theta_k = \frac{g_{k+1}^T d_k}{\mu |g_{k+1}^T d_k| + d_k^T y_k}, \tag{14}$$
so that the new three-term search direction is
$$d_0 = -g_0, \qquad d_{k+1} = -g_{k+1} + \beta_k^{BZA+} d_k - \theta_k y_k. \tag{15}$$
If the line search is exact, then the parameters $\beta_k^{BZA}$, $\beta_k^{BZA+}$, and $\theta_k$ reduce to the original parameters of HS [4], PRP+ [41], and TTHS [25], respectively. The procedure of our proposed three-term CG method is described in Algorithm A.
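
Before stating Algorithm A formally, the construction is summarized in the sketch below (our own illustration; mu stands for the parameter $\mu > 1$). With the untruncated $\beta_k^{BZA}$ of (11), the shared denominator again forces the exact cancellation $g_{k+1}^T d_{k+1} = -\|g_{k+1}\|^2$, which is the sufficient descent identity established in Section 3.1; under the Wolfe line search, $d_k^T y_k > 0$, so the shared denominator is positive.

import numpy as np

def bza_direction(g_new, d, y, mu=1.2, truncate=True):
    # Modified three-term HS direction (15), built from (11), (13), and (14).
    denom = mu * abs(g_new @ d) + d @ y    # shared denominator of (11) and (14)
    beta = (g_new @ y) / denom             # beta_k^{BZA}, (11)
    if truncate:
        beta = max(beta, 0.0)              # beta_k^{BZA+}, (13)
    theta = (g_new @ d) / denom            # theta_k, (14)
    return -g_new + beta * d - theta * y   # direction (15)

rng = np.random.default_rng(1)
g_new, d, y = rng.normal(size=4), rng.normal(size=4), rng.normal(size=4)
d_new = bza_direction(g_new, d, y, truncate=False)
print(np.isclose(g_new @ d_new, -(g_new @ g_new)))    # True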

Algorithm A.  

Step 0. Choose an initial point $x_0 \in \mathbb{R}^n$ and a tolerance $\epsilon > 0$; set $d_0 = -g_0$ and $k := 0$.

Step 1. For convergence, if $\|g_k\| \le \epsilon$, then the algorithm terminates; otherwise, go to Step 2.

Step 2. Compute the search direction $d_k$ by (15) (with the index shifted by one, i.e., $d_k = -g_k + \beta_{k-1}^{BZA+} d_{k-1} - \theta_{k-1} y_{k-1}$ for $k \ge 1$), where $\beta_{k-1}^{BZA+}$ and $\theta_{k-1}$ are given in (13) and (14).

Step 3. Determine the step size $\alpha_k$ by the Wolfe line search (4).

Step 4. Compute the new point $x_{k+1} = x_k + \alpha_k d_k$.

Step 5. Set $k := k + 1$ and go to Step 1.
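
For illustration, the following self-contained sketch (ours, not the authors' MATLAB implementation) runs Algorithm A on the two-dimensional Rosenbrock function. For Step 3 it uses scipy.optimize.line_search, which enforces the strong Wolfe conditions (a slightly stronger requirement than (4)); the steepest-descent restart on line-search failure is a safeguard we added and is not part of Algorithm A.

import numpy as np
from scipy.optimize import line_search

def bza_direction(g_new, d, y, mu=1.2):
    denom = mu * abs(g_new @ d) + d @ y
    if denom == 0.0:
        return -g_new                          # fallback: steepest descent
    beta = max((g_new @ y) / denom, 0.0)       # (13)
    theta = (g_new @ d) / denom                # (14)
    return -g_new + beta * d - theta * y       # (15)

def algorithm_a(f, grad, x0, eps=1e-6, max_iter=1000):
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                     # Step 0
    for k in range(max_iter):
        if np.linalg.norm(g) <= eps:           # Step 1
            return x, k
        alpha = line_search(f, grad, x, d)[0]  # Step 3 (strong Wolfe)
        if alpha is None:
            alpha, d = 1e-4, -g                # safeguard restart (ours)
        x_new = x + alpha * d                  # Step 4, iteration (2)
        g_new = grad(x_new)
        d = bza_direction(g_new, d, g_new - g) # Step 2 for the next iteration
        x, g = x_new, g_new                    # Step 5
    return x, max_iter

rosen = lambda z: (1 - z[0])**2 + 100.0 * (z[1] - z[0]**2)**2
rosen_grad = lambda z: np.array([-2*(1 - z[0]) - 400.0*z[0]*(z[1] - z[0]**2),
                                 200.0*(z[1] - z[0]**2)])
print(algorithm_a(rosen, rosen_grad, [-1.2, 1.0]))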

3. Results and Discussion

This section contains a theoretical discussion and numerical results. The first subsection considers the global convergence properties of our proposed method and the second presents the results from numerical computations.

3.1. Global Convergence Properties

Assumptions

(A1) The level set $S = \{x \in \mathbb{R}^n : f(x) \le f(x_0)\}$ is bounded, where $x_0$ is the initial point.

(A2) In some neighborhood $N$ of $S$, the gradient is Lipschitz continuous on an open convex set that contains $S$; i.e., there exists a positive constant $L$ such that
$$\|g(x) - g(y)\| \le L \|x - y\| \quad \text{for all } x, y \in N. \tag{16}$$
Assumptions (A1) and (A2) imply that there exist positive constants $B$ and $\gamma$ such that
$$\|x\| \le B \quad \text{for all } x \in S, \tag{17}$$
$$\|g(x)\| \le \gamma \quad \text{for all } x \in S. \tag{18}$$
We now prove the sufficient descent condition independent of the line search, and also that $\|d_{k+1}\| \ge \|g_{k+1}\|$. From (11), (14), and (15), we can write
$$g_{k+1}^T d_{k+1} = -\|g_{k+1}\|^2 + \beta_k^{BZA} g_{k+1}^T d_k - \theta_k g_{k+1}^T y_k = -\|g_{k+1}\|^2 + \frac{(g_{k+1}^T y_k)(g_{k+1}^T d_k) - (g_{k+1}^T d_k)(g_{k+1}^T y_k)}{\mu |g_{k+1}^T d_k| + d_k^T y_k}, \tag{19}$$
that is,
$$g_{k+1}^T d_{k+1} = -\|g_{k+1}\|^2. \tag{20}$$
Hence, the sufficient descent condition holds regardless of the line search. Now, we prove that
$$\|d_{k+1}\| \ge \|g_{k+1}\|. \tag{21}$$
As we have (20), taking the modulus on both sides gives
$$|g_{k+1}^T d_{k+1}| = \|g_{k+1}\|^2. \tag{22}$$
By the Schwartz inequality, we have
$$|g_{k+1}^T d_{k+1}| \le \|g_{k+1}\| \|d_{k+1}\|, \tag{23}$$
so
$$\|g_{k+1}\|^2 \le \|g_{k+1}\| \|d_{k+1}\|, \tag{24}$$
or $\|g_{k+1}\| \le \|d_{k+1}\|$. Hence, we have (21). The HS method is well known for its conjugacy condition,
$$d_{k+1}^T y_k = 0. \tag{25}$$
By [15], CG methods that inherit (25) will be more efficient than other CG parameters that do not inherit this property. Dai and Liao [42] proposed the following conjugacy condition for an inexact line search:
$$d_{k+1}^T y_k = -t g_{k+1}^T s_k, \qquad t \ge 0, \quad s_k = x_{k+1} - x_k. \tag{26}$$
Using the exact line search ($g_{k+1}^T s_k = 0$), (26) reduces to the conjugacy condition in (25).
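
As a check on (25) and (26), the sketch below (our example on a strictly convex quadratic; the matrix and vectors are random) performs one exact line search step and verifies that $g_{k+1}^T d_k = 0$ and that the direction (15) then satisfies the conjugacy condition (25).

import numpy as np

def bza_direction(g_new, d, y, mu=1.2):
    denom = mu * abs(g_new @ d) + d @ y
    beta = max((g_new @ y) / denom, 0.0)
    theta = (g_new @ d) / denom
    return -g_new + beta * d - theta * y

rng = np.random.default_rng(2)
M = rng.normal(size=(5, 5))
A = M @ M.T + 5.0 * np.eye(5)          # SPD Hessian of f(x) = 0.5 x^T A x - b^T x
b = rng.normal(size=5)
x = rng.normal(size=5)
g = A @ x - b
d = -g
alpha = -(g @ d) / (d @ A @ d)         # exact line search step for a quadratic
g_new = A @ (x + alpha * d) - b
y = g_new - g
d_new = bza_direction(g_new, d, y)
print(np.isclose(g_new @ d, 0.0))      # exact line search: g_{k+1}^T d_k = 0
print(np.isclose(d_new @ y, 0.0))      # conjugacy condition (25) holds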

Lemma 1 (see [43]). Suppose there is an initial point $x_0$ for which Assumptions (A1) and (A2) hold. Now, consider the method in the form of (2), in which $d_k$ is a descent direction and $\alpha_k$ satisfies the Wolfe line search condition (4). Then
$$\sum_{k=0}^{\infty} \frac{(g_k^T d_k)^2}{\|d_k\|^2} < \infty. \tag{27}$$
This is known as Zoutendijk's condition and is used for proving the global convergence of a CG method. This condition together with the sufficient descent condition (20) shows that
$$\sum_{k=0}^{\infty} \frac{\|g_k\|^4}{\|d_k\|^2} < \infty. \tag{28}$$

Definition 2. The function $f$ is called uniformly convex [36] on $S$ if there exists a positive constant $m$ such that
$$(g(x) - g(y))^T (x - y) \ge m \|x - y\|^2 \quad \text{for all } x, y \in S. \tag{29}$$
We now show the global convergence of Algorithm A for uniformly convex functions.

Lemma 3. Let the sequences $\{x_k\}$ and $\{d_k\}$ be generated by Algorithm A, and suppose that (29) holds. Then,
$$\lim_{k \to \infty} \|s_k\| = 0, \tag{30}$$
where $s_k = x_{k+1} - x_k = \alpha_k d_k$.

Proof. For details, see Lemma 2.1 of [44].

Theorem 4. Let the conditions in Assumptions (A1) and (A2) hold, and let the function $f$ be uniformly convex. Then,
$$\lim_{k \to \infty} \|g_k\| = 0. \tag{31}$$

Proof. As $f$ is uniformly convex, applying (29) with $x = x_{k+1}$ and $y = x_k$ gives
$$y_k^T s_k \ge m \|s_k\|^2, \tag{32}$$
so that $d_k^T y_k = y_k^T s_k / \alpha_k \ge m \alpha_k \|d_k\|^2$. Then, using the second Wolfe condition (4) and the sufficient descent condition (20), we have
$$d_k^T y_k = (g_{k+1} - g_k)^T d_k \ge (\sigma - 1) g_k^T d_k = (1 - \sigma) \|g_k\|^2 > 0. \tag{33}$$
From (11), (13), (18), (32), and Assumption (A2),
$$\beta_k^{BZA+} \le \frac{|g_{k+1}^T y_k|}{d_k^T y_k} \le \frac{\|g_{k+1}\| \|y_k\|}{m \alpha_k \|d_k\|^2} \le \frac{\gamma L \|s_k\|}{m \alpha_k \|d_k\|^2} = \frac{\gamma L}{m \|d_k\|}. \tag{34}$$
Now, from (14) and (33),
$$|\theta_k| = \frac{|g_{k+1}^T d_k|}{\mu |g_{k+1}^T d_k| + d_k^T y_k} \le \frac{1}{\mu}. \tag{35}$$
Combining (34) and (35) with (15), and noting that $\|s_k\|$ is bounded, say $\|s_k\| \le \bar{s}$, by (17) and Lemma 3, we obtain
$$\|d_{k+1}\| \le \|g_{k+1}\| + \beta_k^{BZA+} \|d_k\| + |\theta_k| \|y_k\| \le \gamma + \frac{\gamma L}{m} + \frac{L \bar{s}}{\mu} =: M. \tag{36}$$
This implies that
$$\sum_{k=0}^{\infty} \frac{\|g_k\|^4}{\|d_k\|^2} \ge \frac{1}{M^2} \sum_{k=0}^{\infty} \|g_k\|^4.$$
Hence, by (28), we have $\sum_{k=0}^{\infty} \|g_k\|^4 < \infty$, and therefore $\lim_{k \to \infty} \|g_k\| = 0$.

We now prove the global convergence of Algorithm A for nonconvex functions.

Lemma 5. Suppose that Assumptions (A1) and (A2) hold. Let the sequence $\{x_k\}$ be generated by Algorithm A. If there exists a constant $\epsilon > 0$ such that $\|g_k\| \ge \epsilon$ for every $k$, then
$$\sum_{k=0}^{\infty} \|u_{k+1} - u_k\|^2 < \infty, \tag{37}$$
where $u_k = d_k / \|d_k\|$.

Proof. As $\beta_k^{BZA+} \ge 0$ and $d_k^T y_k > 0$ by (33), and also $\|g_k\| \ge \epsilon$ for all $k$, then by (21) we have $\|d_k\| \ge \|g_k\| \ge \epsilon > 0$ for all $k$. Hence, $u_k$ is well defined. If we set
$$r_{k+1} = \frac{-g_{k+1} - \theta_k y_k}{\|d_{k+1}\|}, \qquad \delta_k = \frac{\beta_k^{BZA+} \|d_k\|}{\|d_{k+1}\|}, \tag{38}$$
then, by (15),
$$u_{k+1} = r_{k+1} + \delta_k u_k, \tag{39}$$
where $u_k$ and $u_{k+1}$ are unit vectors. Therefore, as $\delta_k \ge 0$ and $\|u_{k+1} - \delta_k u_k\| = \|\delta_k u_{k+1} - u_k\| = \|r_{k+1}\|$ for unit vectors,
$$\|u_{k+1} - u_k\| \le (1 + \delta_k) \|u_{k+1} - u_k\| \le \|u_{k+1} - \delta_k u_k\| + \|\delta_k u_{k+1} - u_k\| = 2 \|r_{k+1}\|. \tag{40}$$
Now, from Assumption (A2), (17), (18), and (35),
$$\|g_{k+1} + \theta_k y_k\| \le \|g_{k+1}\| + \frac{1}{\mu} \|y_k\| \le \gamma + \frac{2LB}{\mu}. \tag{41}$$
From (17), (18), and (41), there exists a constant $c > 0$ such that
$$\|r_{k+1}\| \le \frac{c}{\|d_{k+1}\|}. \tag{42}$$
From (28), (42), and $\|g_k\| \ge \epsilon$, we obtain
$$\sum_{k=0}^{\infty} \|r_{k+1}\|^2 \le \frac{c^2}{\epsilon^4} \sum_{k=0}^{\infty} \frac{\|g_{k+1}\|^4}{\|d_{k+1}\|^2} < \infty. \tag{43}$$
Combining this with (40) completes the proof.

Theorem 6. Let Assumptions (A1) and (A2) hold. Then, the sequence $\{g_k\}$ generated by Algorithm A satisfies
$$\liminf_{k \to \infty} \|g_k\| = 0. \tag{44}$$

Proof. Suppose, to the contrary, that (44) does not hold. Then, there exists a constant $\epsilon > 0$ such that
$$\|g_k\| \ge \epsilon \quad \text{for all } k. \tag{45}$$
The proof has two parts.
Part 1. See Theorem 2.2, step 1 in [36].
Part 2. From (15) and (41), we have
$$\|d_{k+1}\| \le \|g_{k+1} + \theta_k y_k\| + \beta_k^{BZA+} \|d_k\| \le c + \beta_k^{BZA+} \|d_k\|. \tag{46}$$
At the beginning of the proof, we supposed that $\|g_k\| \ge \epsilon$. Combining (45) and (46) with Part 1 and Lemma 5, there exist a positive constant $c_1$ and some index $k_0$ such that
$$\|d_k\|^2 \le c_1 k \quad \text{for all } k \ge k_0. \tag{47}$$
Thus,
$$\sum_{k \ge k_0} \frac{1}{\|d_k\|^2} \ge \frac{1}{c_1} \sum_{k \ge k_0} \frac{1}{k} = \infty, \tag{48}$$
which contradicts (28) and (45), since these imply $\sum_{k} 1/\|d_k\|^2 \le (1/\epsilon^4) \sum_{k} \|g_k\|^4 / \|d_k\|^2 < \infty$. Therefore, $\liminf_{k \to \infty} \|g_k\| = 0$.

3.2. Numerical Discussion

We now report the results of several numerical experiments. Zhang et al. [25] demonstrated the superior numerical efficiency of the MTTHS algorithm with respect to PRP+ [41], CG_DESCENT [45], and L-BFGS [46] using the Wolfe line search, while Dai and Wen [40] reported the numerical efficiency of the DHS method. Thus, we compare the efficient three-term HS method proposed in this paper (named the Bakhtawar–Zabidin–Ahmad method, BZA) with MTTHS [25] and DHS [40]. The BZA method was implemented using the Wolfe–Powell line search (4), with the parameters $\delta$, $\sigma$, and $\mu$ held fixed across all test problems.

All codes were written in MATLAB 7.1 and run on an Intel Core i5 system with 8.0 GB of RAM and a 2.60 GHz processor. Table 1 lists the numerical results given by BZA, MTTHS, and DHS for a number of test functions. In Table 1, NI, CT, GE, and FE denote the number of iterations, the CPU time, the number of gradient evaluations, and the number of function evaluations, respectively.

According to Moré et al. [47], the efficiency of a method can be judged by its performance on a collection of test functions. The collection should be neither too large nor too small, with 75 functions considered suitable for testing the efficiency of a method. The test functions in Table 1 were taken from Andrei's test function collection [48], with standard initial points and dimensions ranging from 2 to 10,000.

If the solution had not converged after 500 seconds, the program was terminated. Generally, convergence was achieved within this time limit; functions for which the time limit was exceeded are denoted by “F” for Fail in Table 1.

The Sigma plotting software was used to graph the data. We adopt the performance profiles given by Dolan and Moré [49]; thus, MTTHS, DHS, and BZA are compared in terms of NI, CT, GE, and FE in Figures 1–4. For each method, we plot the fraction of problems that were solved within a factor $t$ of the best performance; in the figures, the uppermost curve corresponds to the method that solves the most problems within a factor $t$ of the best time. From Table 1 and Figures 1–4, the BZA method outperforms the MTTHS algorithm and the DHS method in terms of NI, CT, GE, and FE.
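
For reference, a performance profile in the sense of Dolan and Moré [49] can be computed as in the following sketch (our illustration; the timing matrix is made up and is not the paper's data). Failures ("F") enter as infinite cost, so they never count as solved.

import numpy as np

def performance_profile(T, taus):
    # T[p, s] = cost of solver s on problem p (np.inf marks a failure).
    # Returns rho[i, s] = fraction of problems with ratio r_{p,s} <= taus[i].
    best = T.min(axis=1, keepdims=True)            # best cost on each problem
    ratios = T / best                              # performance ratios r_{p,s}
    return np.array([(ratios <= tau).mean(axis=0) for tau in taus])

# Fabricated CPU times for 4 problems (rows) and 3 solvers (columns):
T = np.array([[1.0, 2.0, 1.5],
              [3.0, 3.0, 6.0],
              [2.0, np.inf, 2.5],                  # the second solver fails here
              [0.5, 1.0, 0.6]])
print(performance_profile(T, taus=np.linspace(1.0, 4.0, 7)))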

The BZA method solves around 99.5% of the problems, and its performance is 85% better than that of DHS and 77% better than that of MTTHS. These results also suggest that, on average over this test set, the three-term approach is about 85% better than the two-term DHS method.

4. Conclusion

We have proposed a modified three-term HS conjugate gradient method. An attractive property of the proposed method is that it produces the sufficient descent condition $g_k^T d_k = -\|g_k\|^2$, regardless of the line search. The global convergence properties of the proposed method have been established under the Wolfe line search conditions. Numerical results show that the proposed method is more efficient and robust than the state-of-the-art three-term (MTTHS) and two-term (DHS) CG methods.

Data Availability

No data were used to support this study.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.