A New Modified Three-Term Hestenes–Stiefel Conjugate Gradient Method with Sufficient Descent Property and Its Global Convergence
This paper describes a modified three-term Hestenes–Stiefel (HS) method. The original HS method is the earliest conjugate gradient method. Although the HS method achieves global convergence using an exact line search, this is not guaranteed in the case of an inexact line search. In addition, the HS method does not usually satisfy the descent property. Our modified three-term conjugate gradient method possesses a sufficient descent property regardless of the type of line search and guarantees global convergence using the inexact Wolfe–Powell line search. The numerical efficiency of the modified three-term HS method is checked using 75 standard test functions. It is known that three-term conjugate gradient methods are numerically more efficient than two-term conjugate gradient methods. Importantly, this paper quantifies how much better the three-term performance is compared with two-term methods. Thus, in the numerical results, we compare our new modification with an efficient two-term conjugate gradient method. We also compare our modification with a state-of-the-art three-term HS method. Finally, we conclude that our proposed modification is globally convergent and numerically efficient.
1. Introduction
In the field of optimization, conjugate gradient (CG) methods are a well-known approach for solving large-scale unconstrained optimization problems. CG methods are simple and have relatively modest storage requirements. This class of methods has a vast number of applications in different areas, especially in engineering [1–3].
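Consider the unconstrained optimization problem
$$\min_{x \in \mathbb{R}^n} f(x), \tag{1}$$
where $f : \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable and its gradient is $g(x) = \nabla f(x)$. CG methods normally generate a sequence $\{x_k\}$ defined by
$$x_{k+1} = x_k + \alpha_k d_k, \quad k = 0, 1, 2, \ldots \tag{2}$$
In (2), $\alpha_k > 0$ is a step size obtained by a general line search and $d_k$ is a search direction given by
$$d_k = \begin{cases} -g_k, & k = 0, \\ -g_k + \beta_k d_{k-1}, & k \ge 1, \end{cases} \tag{3}$$
where $g_k = g(x_k)$ and $\beta_k$ is a parameter of the CG method. The six pioneering forms of $\beta_k$ are defined in [4–10].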
Line searches may be exact or inexact. Exact line searches are time-consuming, computationally expensive, difficult to carry out, and demanding in storage [11–13]. Thus, inexact line search techniques are often adopted because of their efficiency and global convergence properties. Well-known inexact line searches include the Wolfe and strong Wolfe techniques. The Wolfe conditions can be written as
$$f(x_k + \alpha_k d_k) \le f(x_k) + \delta \alpha_k g_k^T d_k, \qquad g(x_k + \alpha_k d_k)^T d_k \ge \sigma g_k^T d_k, \tag{4}$$
where $0 < \delta < \sigma < 1$, and the strong Wolfe conditions as
$$f(x_k + \alpha_k d_k) \le f(x_k) + \delta \alpha_k g_k^T d_k, \qquad |g(x_k + \alpha_k d_k)^T d_k| \le \sigma |g_k^T d_k|. \tag{5}$$
Recently, Alhawarat and Salleh, Salleh and Alhawarat, and Alhawarat et al. [16, 17] proposed efficient CG and hybrid CG methods that fulfill the required global convergence properties. To improve the existing methods, a three-term CG technique has been introduced. Several researchers have suggested modifications to the three-term CG method. For instance, Beale and Nazareth proposed CG methods based on three terms that possess the finite termination property, but these do not perform well in practice [20, 21]. Furthermore, reports by McGuire and Wolfe, Deng and Li, Zhang et al. [24, 25], Cheng, Al-Bayati and Sharif, Zhang, Xiao, and Wei, Andrei [29–31], Sugiki et al., Narushima et al., Babaie-Kafaki and Ghanbari, Al-Baali et al., Sun and Liu, and Baluch et al. discuss the global convergence and numerical results of modified three-term CG methods.
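To make the Wolfe and strong Wolfe conditions concrete, the following minimal Python sketch tests a trial step size against their standard textbook forms. The function name `wolfe_conditions`, the default values of `delta` and `sigma`, and the quadratic test function are illustrative assumptions and are not taken from the paper.

```python
def dot(u, v):
    """Euclidean inner product of two equally long sequences."""
    return sum(a * b for a, b in zip(u, v))


def wolfe_conditions(f, grad, x, d, alpha, delta=1e-4, sigma=0.9):
    """Return (sufficient_decrease, curvature, strong_curvature) for step alpha.

    delta and sigma are assumed to satisfy 0 < delta < sigma < 1.
    """
    x_new = [xi + alpha * di for xi, di in zip(x, d)]
    slope0 = dot(grad(x), d)        # directional derivative at the current point
    slope1 = dot(grad(x_new), d)    # directional derivative at the trial point
    sufficient_decrease = f(x_new) <= f(x) + delta * alpha * slope0
    curvature = slope1 >= sigma * slope0
    strong_curvature = abs(slope1) <= sigma * abs(slope0)
    return sufficient_decrease, curvature, strong_curvature


# Illustrative convex quadratic (hypothetical test problem).
def f(x):
    return 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)


def grad(x):
    return [x[0], 10.0 * x[1]]


x = [1.0, 1.0]
d = [-gi for gi in grad(x)]  # steepest-descent direction

for alpha in (0.1, 0.195, 1e-6):
    print(alpha, wolfe_conditions(f, grad, x, d, alpha))
```

In this example the step 0.1 satisfies both sets of conditions, the step 0.195 satisfies the Wolfe conditions but violates the strong curvature condition because it overshoots the minimizer along $d$, and the very short step $10^{-6}$ fails the curvature condition.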
In this paper, a modified three-term Hestenes–Stiefel (HS) method is proposed. The HS parameter is
$$\beta_k^{HS} = \frac{g_k^T y_{k-1}}{d_{k-1}^T y_{k-1}}, \qquad y_{k-1} = g_k - g_{k-1}. \tag{6}$$
This is the earliest of the CG parameters. The HS method is globally convergent under an exact line search. A nice property of the HS method is that it satisfies the conjugacy condition regardless of whether the line search is exact or inexact. However, global convergence is not guaranteed when an inexact line search is used.
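The two properties of the HS direction mentioned above can be checked numerically. The sketch below assumes the standard HS formula $\beta_k^{HS} = g_k^T y_{k-1} / (d_{k-1}^T y_{k-1})$ with $y_{k-1} = g_k - g_{k-1}$. The vectors are hand-picked for illustration and are not iterates of any particular objective function; the function names are hypothetical.

```python
def dot(u, v):
    """Euclidean inner product of two equally long sequences."""
    return sum(a * b for a, b in zip(u, v))


def hs_beta(g_new, g_old, d_old):
    """Standard Hestenes-Stiefel parameter."""
    y = [a - b for a, b in zip(g_new, g_old)]
    return dot(g_new, y) / dot(d_old, y)


def hs_direction(g_new, g_old, d_old):
    """Two-term HS direction d_k = -g_k + beta_k * d_{k-1}."""
    beta = hs_beta(g_new, g_old, d_old)
    return [-gn + beta * do for gn, do in zip(g_new, d_old)]


# Hand-picked vectors: d_old is a descent direction for g_old.
g_old = [-0.5, 1.0]
d_old = [3.0, 1.0]
g_new = [1.0, 0.0]

d_new = hs_direction(g_new, g_old, d_old)
y = [a - b for a, b in zip(g_new, g_old)]

conjugacy = dot(d_new, y)   # vanishes up to rounding for any line search
slope = dot(g_new, d_new)   # positive here, so d_new is not a descent direction

print("d_k^T y_{k-1} =", conjugacy)
print("g_k^T d_k     =", slope)
```

The conjugacy condition holds by construction, whereas the sign of $g_k^T d_k$ depends on the data, which is why descent is not automatic for the two-term HS direction.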
In this paper, the method of Zhang et al. is modified with the help of another efficient CG parameter proposed by Wei et al. An attractive feature of the new three-term HS method is that it satisfies the sufficient descent condition regardless of the line search used. Furthermore, our modification is globally convergent for both convex and nonconvex functions when using an inexact line search. Numerical experiments show that the new modification is more efficient and robust than the MTTHS algorithm proposed by Zhang et al. The second aim of this paper is to quantify the improvement of the three-term CG method over two-term approaches. To do this, we consider the efficient two-term DHS method. DHS is one of the more efficient CG techniques, as it possesses the sufficient descent property and is globally convergent under Wolfe–Powell line search conditions. Its numerical results are also convincing. Therefore, this two-term CG method is compared with our new modification to quantify the improvement offered by three-term CG methods.
The remainder of this paper is organized as follows. In Section 2, the motivation for and construction of the three-term HS CG method is discussed, and the general form is presented in Algorithm A. Section 3 is divided into two subsections, with Section 3.1 covering the sufficient descent condition and the global convergence properties for convex and nonconvex functions and Section 3.2 presenting detailed numerical results to evaluate the proposed method. Finally, Section 4 concludes this paper.
2. Motivation and Formulas
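Zhang et al. proposed the first three-term HS (TTHS) method. This can be written as
$$d_k = \begin{cases} -g_k, & k = 0, \\ -g_k + \beta_k^{HS} d_{k-1} - \theta_k y_{k-1}, & k \ge 1, \end{cases} \tag{8}$$
where $y_{k-1} = g_k - g_{k-1}$, $\beta_k^{HS} = \dfrac{g_k^T y_{k-1}}{d_{k-1}^T y_{k-1}}$, and $\theta_k = \dfrac{g_k^T d_{k-1}}{d_{k-1}^T y_{k-1}}$.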
TTHS satisfies the descent property; if an exact line search is used, then it reduces to the original HS method. Further, to guarantee the global convergence properties of the search direction given by (8), a modified (MTTHS) algorithm was introduced with the search direction: where and .
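The descent property of TTHS and its reduction to HS under an exact line search can be illustrated with a short Python sketch. It assumes the commonly stated form of the three-term HS direction, $d_k = -g_k + \beta_k^{HS} d_{k-1} - \theta_k y_{k-1}$ with $\theta_k = g_k^T d_{k-1} / (d_{k-1}^T y_{k-1})$. It does not implement MTTHS or the method proposed in this paper. The vectors and function names are hypothetical.

```python
def dot(u, v):
    """Euclidean inner product of two equally long sequences."""
    return sum(a * b for a, b in zip(u, v))


def hs_direction(g_new, g_old, d_old):
    """Two-term HS direction, used here only for comparison."""
    y = [a - b for a, b in zip(g_new, g_old)]
    beta = dot(g_new, y) / dot(d_old, y)
    return [-gn + beta * do for gn, do in zip(g_new, d_old)]


def three_term_hs_direction(g_new, g_old, d_old):
    """Three-term HS direction in its commonly stated form (an assumption)."""
    y = [a - b for a, b in zip(g_new, g_old)]
    denom = dot(d_old, y)
    beta = dot(g_new, y) / denom
    theta = dot(g_new, d_old) / denom
    return [-gn + beta * do - theta * yi
            for gn, do, yi in zip(g_new, d_old, y)]


# Case 1: hand-picked data for which the two-term HS direction is not descent.
g_old, d_old, g_new = [-0.5, 1.0], [3.0, 1.0], [1.0, 0.0]
d_tt = three_term_hs_direction(g_new, g_old, d_old)
descent_gap = dot(g_new, d_tt) + dot(g_new, g_new)  # zero up to rounding

# Case 2: g_new is orthogonal to d_old, as after an exact line search.
g_old2, d_old2, g_new2 = [-1.0, 1.0], [1.0, 0.0], [0.0, 2.0]
same_as_hs = (three_term_hs_direction(g_new2, g_old2, d_old2)
              == hs_direction(g_new2, g_old2, d_old2))

print("g_k^T d_k + ||g_k||^2 =", descent_gap)
print("reduces to HS under exact line search:", same_as_hs)
```

In the first case the extra term cancels the contribution of $\beta_k^{HS} g_k^T d_{k-1}$, so $g_k^T d_k = -\|g_k\|^2$ independently of how the step size was chosen.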
As MTTHS was introduced to prove the global convergence properties of the search direction in (8), the question arises as to why (8) is not used to prove the global convergence properties. Instead of ignoring (8), it should be made efficient and globally convergent. Thus, there is room to modify (8) so as to satisfy the global convergence properties. It is expected that such a modification would outperform the MTTHS algorithm numerically.
Wei et al. proposed an efficient CG parameter given by
In this parameter, the term plays an important role in satisfying the sufficient descent and global convergence properties. Thus, we take from the denominator of the above parameter and use it with (8) to construct a new modified three-term HS method. Hence,
It is known that the HS method does not converge globally when the objective function is nonconvex. Further, Gilbert and Nocedal showed that the parameter must be nonnegative to achieve convergence for nonconvex or nonlinear functions, i.e.,
Applying the same technique to our parameter gives
where
If the line search is exact, then the parameters , , and reduce to the original parameters , , and TTHS . The procedure of our proposed three-term CG method is described in Algorithm A.
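Step 0. Choose an initial point $x_0 \in \mathbb{R}^n$ and a tolerance $\varepsilon \ge 0$, and set $d_0 = -g_0$, $k = 0$.
Step 1. Test for convergence: if $\|g_k\| \le \varepsilon$, then the algorithm terminates; otherwise, go to step 2.
Step 2. Compute the search direction $d_k$.
Step 3. Determine the step size $\alpha_k$ by the Wolfe line search (4).
Step 4. Compute the new point $x_{k+1} = x_k + \alpha_k d_k$.
Step 5. Set $k = k + 1$ and go to step 1.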
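The control flow of Algorithm A can be sketched in Python as follows. The sketch is meant only to show the loop structure. As a stand-in for the proposed direction it uses the three-term HS direction of Zhang et al. in its commonly stated form, together with a simple bisection-type weak Wolfe line search. The function names, parameter defaults, restart safeguard, and test problem are all assumptions made for illustration and do not come from the paper.

```python
import math


def dot(u, v):
    """Euclidean inner product of two equally long sequences."""
    return sum(a * b for a, b in zip(u, v))


def wolfe_step(f, grad, x, d, delta=1e-4, sigma=0.1, max_trials=60):
    """Bisection-type search for a step satisfying the weak Wolfe conditions."""
    fx, slope0 = f(x), dot(grad(x), d)
    lo, hi, alpha = 0.0, math.inf, 1.0
    for _ in range(max_trials):
        x_new = [xi + alpha * di for xi, di in zip(x, d)]
        if f(x_new) > fx + delta * alpha * slope0:
            hi = alpha                       # sufficient decrease fails: shrink
        elif dot(grad(x_new), d) < sigma * slope0:
            lo = alpha                       # curvature condition fails: grow
        else:
            return alpha                     # both conditions hold
        alpha = 2.0 * lo if math.isinf(hi) else 0.5 * (lo + hi)
    return alpha                             # fallback: last trial value


def three_term_cg(f, grad, x0, eps=1e-6, max_iter=1000):
    """Generic three-term CG loop; returns (approximate minimizer, iterations)."""
    x = list(x0)
    g = grad(x)
    d = [-gi for gi in g]                                # Step 0
    for k in range(max_iter):
        if math.sqrt(dot(g, g)) <= eps:                  # Step 1
            return x, k
        alpha = wolfe_step(f, grad, x, d)                # Step 3
        x = [xi + alpha * di for xi, di in zip(x, d)]    # Step 4
        g_new = grad(x)
        y = [a - b for a, b in zip(g_new, g)]
        denom = dot(d, y)
        if denom <= 0.0:
            d = [-gi for gi in g_new]                    # safeguard: restart
        else:                                            # Step 2 (stand-in)
            beta = dot(g_new, y) / denom
            theta = dot(g_new, d) / denom
            d = [-gn + beta * di - theta * yi
                 for gn, di, yi in zip(g_new, d, y)]
        g = g_new                                        # Step 5
    return x, max_iter


# Hypothetical test problem: a convex quadratic with minimizer at the origin.
def f(x):
    return 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)


def grad(x):
    return [x[0], 10.0 * x[1]]


solution, iterations = three_term_cg(f, grad, [1.0, 1.0])
print(solution, iterations)
```

To experiment with a different three-term formula, only the block marked as Step 2 needs to change; the stopping test, line search, and update are independent of the particular choice of direction.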
3. Results and Discussion
This section contains a theoretical discussion and numerical results. The first subsection considers the global convergence properties of our proposed method and the second presents the results from numerical computations.
3.1. Global Convergence Properties
(A1) The level set $\Omega = \{x \in \mathbb{R}^n : f(x) \le f(x_0)\}$ is bounded.
(A2) In some neighborhood $N$ of $\Omega$, the gradient is Lipschitz continuous on an open convex set that contains $\Omega$, i.e., there exists a positive constant $L$ such that
$$\|g(x) - g(y)\| \le L \|x - y\| \quad \text{for all } x, y \in N.$$
Assumptions (A1) and (A2) imply that there exist positive constants and such that
We now prove the sufficient descent condition independent of the line search and also . From (15), (11), and (14), we can write
that is,
Hence, the sufficient descent condition holds regardless of the line search. Now, we prove that
As we have , taking the modulus on both sides gives
By the Cauchy–Schwarz inequality, we have
so
or
Hence, we have
The HS method is well known for its conjugacy condition
$$d_k^T y_{k-1} = 0. \tag{27}$$
It has been argued that CG methods that inherit (27) are more efficient than CG methods that do not inherit this property. Dai and Liao proposed the following conjugacy condition for an inexact line search:
$$d_k^T y_{k-1} = -t\, g_k^T s_{k-1}, \quad t \ge 0, \tag{28}$$
where $s_{k-1} = x_k - x_{k-1}$. Under an exact line search, $g_k^T s_{k-1} = 0$, and (28) reduces to the conjugacy condition in (27).
Lemma 1. Suppose there is an initial point $x_0$ for which Assumptions (A1) and (A2) hold. Consider any method of the form (2), in which $d_k$ is a descent direction and $\alpha_k$ satisfies the Wolfe line search conditions (4). Then
$$\sum_{k=0}^{\infty} \frac{(g_k^T d_k)^2}{\|d_k\|^2} < \infty. \tag{29}$$
This is known as Zoutendijk’s condition and is used for proving the global convergence of a CG method. This condition together with (26) shows that
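Definition 2. The function $f$ is called uniformly convex on $\mathbb{R}^n$ if there exists a positive constant $\mu$ such that
$$(g(x) - g(y))^T (x - y) \ge \mu \|x - y\|^2 \quad \text{for all } x, y \in \mathbb{R}^n. \tag{31}$$
We now show the global convergence of Algorithm A for uniformly convex functions.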
Lemma 3. Let the sequences and be generated by Algorithm A and suppose that (31) holds. Then,where .
Proof. For details, see Lemma 2.1 of .
Theorem 4. Let the conditions in Assumptions (A1) and (A2) hold and the function be uniformly convex. Then,
Proof. As
Then, using the second Wolfe condition (4) and the sufficient descent condition,
we have
From (11), (32), (36), and Assumption (A2),
Let us suppose that , where and so that . Thus,
Now,
Combining (38) and (39) with (15), we obtain
Now, let so that
and we get . This implies that
Hence, by (30), we have
We are now going to prove the global convergence of Algorithm A for nonconvex functions.
Lemma 5. Suppose that Assumptions (A1) and (A2) hold. Let the sequence be generated by Algorithm A. If there exists a constant such that for every , then where
Proof. As and , and also for all , then for all . Hence, is well defined. If
then , where and are unit vectors. Therefore,
As ,
Now, from Assumption (A2), (14), and (18),
From (17), (18), and (48), there exists a constant such that
From (30) and (49), we obtain
Combining this with (44) completes the proof.
Theorem 6. Let Assumptions (A1) and (A2) hold. Then, the sequence () generated by Algorithm A satisfies
Proof. Suppose that . Then, there exists a constant such that
The proof has two parts.
Part 1. See Theorem 2.2, step 1 in .
Part 2. From (15) and (49), we have In the beginning of the proof, we suppose that . Then, there exist a positive constant and some such that . Thus, which contradicts Assumption (A2), (30), and (52). Therefore,
3.2. Numerical Discussion
We now report the results of several numerical experiments. Zhang et al.  demonstrated the superior numerical efficiency of the MTTHS algorithm with respect to PRP+ , CG_DESCENT , and L-BFGS  using the Wolfe line search, while Dai and Wen  reported the numerical efficiency of the DHS method. Thus, we compare the efficient three-term HS method proposed in this paper (named the Bakhtawar–Zabidin–Ahmad method, BZA) with MTTHS  and DHS . The BZA method was implemented using the Wolfe–Powell line search (4) with , , and
All codes were written in MATLAB 7.1 and run on an Intel Core i5 system with 8.0 GB RAM and a 2.60 GHz processor. Table 1 lists the numerical results given by BZA, MTTHS, and DHS for a number of test functions. In Table 1, NI, CT, GE, and FE denote the number of iterations, CPU time, number of gradient evaluations, and number of function evaluations, respectively.
According to Moré et al. , the efficiency of any method can be determined by its performance on a number of test functions. The number of test functions should not be too large or too small, with 75 considered ideal for testing the efficiency of any method. The test functions in Table 1 were taken from Andrei’s test function collection  with standard initial points and dimensions ranging from 2 to 10000.
If the solution had not converged after 500 seconds, the program was terminated. Generally, convergence was achieved within this time limit; functions for which the time limit was exceeded are denoted by “F” for Fail in Table 1.
The Sigma plotting software was used to graph the data. We adopt the performance profiles given by Dolan and Moré. Thus, MTTHS, DHS, and BZA are compared in terms of NI, CT, GE, and FE in Figures 1–4. For each method, we plot the fraction of problems solved within a factor $t$ of the best result among the compared methods. In the figures, the uppermost curve corresponds to the method that solves the most problems within a factor $t$ of the best result. From Table 1 and Figures 1–4, the BZA method outperforms the MTTHS algorithm and the DHS method in terms of NI, CT, GE, and FE.
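The performance profile described above can be computed with a few lines of Python. The sketch follows the usual definition: for each problem, the cost of every solver is divided by the best cost on that problem, and the profile of a solver at $t$ is the fraction of problems whose ratio does not exceed $t$. The solver names and cost values below are made up for illustration and are not taken from Table 1; a failed run is encoded as `math.inf`, and all costs are assumed to be positive.

```python
import math


def performance_profile(costs, factors):
    """Performance profile from a dict mapping solver name to per-problem costs.

    costs[s][p] is the cost (e.g., iterations or CPU time) of solver s on
    problem p, with math.inf marking a failure. Returns a dict mapping each
    solver to the list of fractions of problems solved within each factor.
    """
    solvers = list(costs)
    n_problems = len(costs[solvers[0]])
    best = [min(costs[s][p] for s in solvers) for p in range(n_problems)]
    profile = {}
    for s in solvers:
        ratios = [costs[s][p] / best[p] if math.isfinite(best[p]) else math.inf
                  for p in range(n_problems)]
        profile[s] = [sum(r <= t for r in ratios) / n_problems for t in factors]
    return profile


# Made-up costs for two hypothetical solvers on four problems.
costs = {
    "solver_a": [10.0, 20.0, math.inf, 8.0],
    "solver_b": [12.0, 15.0, 30.0, 8.0],
}
factors = [1.0, 1.5, 2.0]
profile = performance_profile(costs, factors)
print(profile)
```

The value of a profile at $t = 1$ is the fraction of problems on which the solver is the best (ties included), and its limit for large $t$ is the fraction of problems the solver manages to solve at all.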
The BZA method solves around 99.5% of the problems, and the performance of BZA is 85% better than that of DHS and 77% better than that of MTTHS. We can also conclude that, on average over this test set, the three-term CG method is 85% better than the two-term CG method (DHS).
4. Conclusion
We have proposed a modified three-term HS conjugate gradient method. An attractive property of the proposed method is that it satisfies the sufficient descent condition regardless of the line search. The global convergence properties of the proposed method have been established under Wolfe line search conditions. Numerical results show that the proposed method is more efficient and robust than the state-of-the-art three-term (MTTHS) and two-term (DHS) CG methods.
No data were used to support this study.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
E. Polak, Optimization: Algorithms and Consistent Approximations, Springer, New York, NY, USA, 1997.
R. Fletcher, Practical Methods of Optimization, vol. I: Unconstrained Optimization, John Wiley & Sons, New York, NY, USA, 2nd edition, 1987.
A. Alhawarat and Z. Salleh, “Modification of nonlinear conjugate gradient method with weak Wolfe-Powell line search,” Abstract and Applied Analysis, Article ID 7238134, 6 pages, 2017.
A. Alhawarat, Z. Salleh, M. Mamat, and M. Rivaie, “An efficient modified Polak-Ribière-Polyak conjugate gradient method with global convergence properties,” Optimization Methods and Software, vol. 32, no. 6, pp. 1299–1312, 2017.
Y. H. Dai and Y. Yuan, Nonlinear Conjugate Gradient Methods, Shanghai Science and Technology Publisher, Shanghai, China, 2000.
M. F. McGuire and P. Wolfe, “Evaluating a restart procedure for conjugate gradients,” Report RC-4382, IBM Research Center, Yorktown Heights, 1973.
N. Y. Deng and Z. Li, “Global convergence of three terms conjugate gradient methods,” Optimization Methods and Software, vol. 4, pp. 273–282, 1995.
A. Y. Al-Bayati and W. H. Sharif, “A new three-term conjugate gradient method for unconstrained optimization,” Canadian Journal on Science and Engineering Mathematics, vol. 1, no. 5, pp. 108–124, 2010.
K. Sugiki, Y. Narushima, and H. Yabe, “Globally convergent three-term conjugate gradient methods that use secant conditions and generate descent search directions for unconstrained optimization,” Journal of Optimization Theory and Applications, vol. 153, no. 3, pp. 733–757, 2012.