A New Modified Three-Term Conjugate Gradient Method with Sufficient Descent Property and Its Global Convergence

Baluch, Bakhtawar; Salleh, Zabidin; Alhawarat, Ahmad; Roslan, U. A. M.

doi:https://doi.org/10.1155/2017/2715854

Journal of Mathematics


Research Article | Open Access

Volume 2017 | Article ID 2715854 | https://doi.org/10.1155/2017/2715854


Bakhtawar Baluch,¹ Zabidin Salleh,¹ Ahmad Alhawarat,² and U. A. M. Roslan¹

Academic Editor: Liwei Zhang

Received: 04 Apr 2017

Revised: 30 Jun 2017

Accepted: 03 Aug 2017

Published: 13 Sept 2017

Abstract

A new modified three-term conjugate gradient (CG) method is proposed for solving large-scale optimization problems. The idea is related to the famous Polak-Ribière-Polyak (PRP) formula. Although the numerator of PRP plays a vital role in numerical performance and avoids the jamming issue, the PRP method is not globally convergent in general. So, for the new three-term CG method, the idea is to use the PRP numerator and combine it with the denominator of a CG formula that performs well. The new modification of the three-term CG method possesses the sufficient descent condition independent of any line search. The novelty is that, by using the Wolfe-Powell line search, the new modification possesses global convergence properties for convex and nonconvex functions. Numerical computations with the Wolfe-Powell line search on standard optimization test functions show the efficiency and robustness of the new modification.

1. Introduction

The conjugate gradient method is an efficient and well-organized tool for solving large-scale nonlinear optimization problems, owing to its simplicity and low memory requirements. It is very popular among mathematicians, engineers, and others interested in solving large-scale optimization problems [1–3].

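Consider the unconstrained optimization problem

$$\min_{x\in\mathbb{R}^n} f(x), \tag{1}$$

where $f:\mathbb{R}^n\to\mathbb{R}$ is continuously differentiable and its gradient $g(x)=\nabla f(x)$ is available. Generally, a CG method generates an iterative sequence $\{x_k\}$ defined by

$$x_{k+1}=x_k+\alpha_kd_k,\quad k=0,1,2,\ldots, \tag{2}$$

where the step size $\alpha_k>0$ is obtained by a line search and $d_k$ is a search direction defined by

$$d_k=\begin{cases}-g_k, & k=0,\\ -g_k+\beta_kd_{k-1}, & k\ge1,\end{cases} \tag{3}$$

where $g_k=g(x_k)$ and the term $\beta_k$ is a scalar. With $y_{k-1}=g_k-g_{k-1}$, the six essential formulas for $\beta_k$ are

$$\beta_k^{HS}=\frac{g_k^Ty_{k-1}}{d_{k-1}^Ty_{k-1}}\quad\text{(Hestenes and Stiefel [4], 1952)}, \tag{4}$$

$$\beta_k^{FR}=\frac{\|g_k\|^2}{\|g_{k-1}\|^2}\quad\text{(Fletcher and Reeves [5], 1964)}, \tag{5}$$

$$\beta_k^{PRP}=\frac{g_k^Ty_{k-1}}{\|g_{k-1}\|^2}\quad\text{(Polak et al. [6, 7], 1969)}, \tag{6}$$

$$\beta_k^{CD}=-\frac{\|g_k\|^2}{d_{k-1}^Tg_{k-1}}\quad\text{(Conjugate Descent [8], 1997)}, \tag{7}$$

$$\beta_k^{LS}=-\frac{g_k^Ty_{k-1}}{d_{k-1}^Tg_{k-1}}\quad\text{(Liu and Storey [9], 1991)}, \tag{8}$$

$$\beta_k^{DY}=\frac{\|g_k\|^2}{d_{k-1}^Ty_{k-1}}\quad\text{(Dai and Yuan [10], 2000)}. \tag{9}$$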
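The six formulas above can be evaluated side by side. The following minimal Python sketch computes them from $g_k$, $g_{k-1}$, and $d_{k-1}$, with $y_{k-1}=g_k-g_{k-1}$. It uses plain lists so that it runs without third-party packages; the function names `dot` and `classical_betas` are illustrative and are not taken from the authors' Matlab code.

```python
def dot(u, v):
    """Euclidean inner product u^T v of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def classical_betas(g_new, g_old, d_old):
    """Evaluate the HS, FR, PRP, CD, LS, and DY parameters.

    g_new = g_k, g_old = g_{k-1}, d_old = d_{k-1}.
    """
    y = [a - b for a, b in zip(g_new, g_old)]  # y_{k-1} = g_k - g_{k-1}
    gy = dot(g_new, y)                          # numerator of HS, PRP, LS
    gg_new = dot(g_new, g_new)                  # ||g_k||^2
    gg_old = dot(g_old, g_old)                  # ||g_{k-1}||^2
    dy = dot(d_old, y)                          # d_{k-1}^T y_{k-1}
    dg = dot(d_old, g_old)                      # d_{k-1}^T g_{k-1}
    return {
        "HS": gy / dy,
        "FR": gg_new / gg_old,
        "PRP": gy / gg_old,
        "CD": -gg_new / dg,
        "LS": -gy / dg,
        "DY": gg_new / dy,
    }


if __name__ == "__main__":
    # With d_{k-1} = -g_{k-1}, FR coincides with CD and PRP with LS.
    print(classical_betas([0.5, 0.5], [1.0, 0.0], [-1.0, 0.0]))
```

The example uses $d_{k-1}=-g_{k-1}$, so $d_{k-1}^Tg_{k-1}=-\|g_{k-1}\|^2$; in that case the CD and LS denominators reduce to the FR/PRP denominator, which is an easy consistency check.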

Generally, an inexact line search, such as the Wolfe line search or the strong Wolfe line search, is used to obtain the global convergence of the conjugate gradient method. The Wolfe line search computes $\alpha_k>0$ such that

$$f(x_k+\alpha_kd_k)\le f(x_k)+\rho\alpha_kg_k^Td_k,\qquad g(x_k+\alpha_kd_k)^Td_k\ge\sigma g_k^Td_k, \tag{10}$$

where $0<\rho<\sigma<1$. The strong Wolfe line search computes $\alpha_k$ such that

$$f(x_k+\alpha_kd_k)\le f(x_k)+\rho\alpha_kg_k^Td_k,\qquad |g(x_k+\alpha_kd_k)^Td_k|\le-\sigma g_k^Td_k. \tag{11}$$

Recently, Alhawarat et al. [11, 12] and Alhawarat and Salleh [13, 14] proposed efficient and hybrid conjugate gradient methods that satisfy the global convergence properties. To enhance the effectiveness of two-term conjugate gradient methods, three-term conjugate gradient methods have been widely studied and given much importance. A three-term conjugate gradient method attains different numerical outcomes depending on how its scalar parameters are selected. The papers by Beale [15], McGuire and Wolfe [16], Nazareth [17], Deng and Li [18], Dai and Yuan [19], Zhang et al. [20, 21], Cheng [22], Zhang et al. [23], Al-Bayati and Sharif [24], Narushima et al. [25], Andrei [26–28], Sugiki et al. [29], Al-Baali et al. [30], Babaie-Kafaki and Ghanbari [31], and Sun and Liu [32] presented different types of three-term conjugate gradient methods, reported their numerical performance and efficiency, and proved their global convergence properties. These three-term algorithms have been reported to be numerically strong, efficient, reliable, and robust compared with the classical conjugate gradient algorithms; Beale [15] was the first to propose a three-term conjugate gradient method.
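The Wolfe conditions (10) and the strong Wolfe conditions (11) translate directly into code. The sketch below checks them for a trial step and finds a Wolfe step with a simple bracketing/bisection scheme. The symbols `rho` and `sigma` match the text, with defaults 0.1 and 0.5 taken from the values reported in Section 5; the bisection strategy is a standard textbook choice assumed here for illustration only, since the paper does not describe how its line search was implemented.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def satisfies_wolfe(f, grad, x, d, alpha, rho=0.1, sigma=0.5, strong=False):
    """Check the Wolfe conditions (10), or the strong Wolfe conditions (11)."""
    gtd = dot(grad(x), d)                        # g_k^T d_k (< 0 for descent)
    x_new = [xi + alpha * di for xi, di in zip(x, d)]
    armijo = f(x_new) <= f(x) + rho * alpha * gtd
    slope = dot(grad(x_new), d)                  # g(x_k + alpha d_k)^T d_k
    if strong:
        return armijo and abs(slope) <= -sigma * gtd
    return armijo and slope >= sigma * gtd


def wolfe_bisection(f, grad, x, d, rho=0.1, sigma=0.5, alpha=1.0, max_iter=60):
    """Bracketing/bisection search for a step satisfying (10)."""
    lo, hi = 0.0, math.inf
    fx, gtd = f(x), dot(grad(x), d)
    for _ in range(max_iter):
        x_new = [xi + alpha * di for xi, di in zip(x, d)]
        if f(x_new) > fx + rho * alpha * gtd:
            hi = alpha                           # too long: sufficient decrease fails
        elif dot(grad(x_new), d) < sigma * gtd:
            lo = alpha                           # too short: curvature condition fails
        else:
            return alpha
        alpha = 0.5 * (lo + hi) if hi < math.inf else 2.0 * lo
    return alpha                                 # best effort after max_iter trials
```

For example, on $f(x)=\tfrac12(x_1^2+4x_2^2)$ from $x=(1,1)$ along $d=-g=(-1,-4)$, the unit step and the step $0.5$ both violate the sufficient decrease condition, and the search returns $\alpha=0.25$, which satisfies both (10) and (11).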

In the new three-term modification, we focus on the numerator of the PRP method, whose parameter is

$$\beta_k^{PRP}=\frac{g_k^Ty_{k-1}}{\|g_{k-1}\|^2}.$$

The PRP method is among the most efficient and reliable conjugate gradient methods owing to its good numerical performance. The global convergence of PRP is established when the objective function is strongly convex and the line search is exact [6]. On the other hand, Powell [33] showed through his analysis that there exist nonconvex functions for which the PRP method does not converge globally. Gilbert and Nocedal [34] established the so-called PRP+ method, in which $\beta_k$ is restricted to be nonnegative:

$$\beta_k^{PRP+}=\max\{0,\beta_k^{PRP}\}.$$

If the standard Wolfe line search (10) is used, then the PRP+ method attains global convergence, provided that the sufficient descent condition is also satisfied.
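The Gilbert-Nocedal truncation changes $\beta_k$ only when the PRP value is negative, in which case the method restarts along $-g_k$. A minimal Python sketch (the function names are illustrative):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def beta_prp(g_new, g_old):
    """PRP parameter: g_k^T (g_k - g_{k-1}) / ||g_{k-1}||^2."""
    y = [a - b for a, b in zip(g_new, g_old)]
    return dot(g_new, y) / dot(g_old, g_old)


def beta_prp_plus(g_new, g_old):
    """Gilbert-Nocedal PRP+ parameter: max{0, beta_k^PRP}."""
    return max(0.0, beta_prp(g_new, g_old))


if __name__ == "__main__":
    g_old, g_new = [1.0, 0.0], [0.5, 0.0]
    print(beta_prp(g_new, g_old), beta_prp_plus(g_new, g_old))
```

In this example $g_k^Ty_{k-1}=0.25-0.5<0$, so PRP gives $-0.25$, while PRP+ sets $\beta_k=0$ and therefore $d_k=-g_k$.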

Recently, Sun and Liu [32] proposed a new conjugate gradient method called the TMPRP1 method, using the VFR formula of Wei et al. [35], in which the search direction is stated as where or .

This method has the attractive property of satisfying the sufficient descent condition independent of any line search, and it attains global convergence if the standard Wolfe line search is used. Compared with the strong Wolfe line search, the standard Wolfe line search requires less computation to obtain an acceptable step size at each iteration. Hence, the standard Wolfe line search increases the effectiveness of the conjugate gradient method [32].

The rest of the paper is organized as follows. In Section 2, the motivation and the formula for constructing the three-term conjugate gradient method are given. In Section 3, we present Algorithm 1.1, which shows the general form of the three-term conjugate gradient method. In Section 4, the sufficient descent condition and the global convergence properties for convex and nonconvex functions are proven. In Section 5, detailed numerical results for the proposed method are reported.

2. Motivation and Formula

Wei et al. [35] proposed three new formulas, which are given in the following: There is an efficient conjugate gradient method named In this formula, the denominator plays an important role in satisfying the sufficient descent condition, and the method performs well in terms of global convergence and numerical results. This motivated us to take the denominator from this formula. Secondly, the PRP [6, 7] method is considered to be one of the most proficient CG methods due to the properties of its numerator $g_k^Ty_{k-1}$. If the step taken becomes very small, then $g_k\approx g_{k-1}$, so $y_{k-1}=g_k-g_{k-1}$ approaches zero and so does the numerator $g_k^Ty_{k-1}$. Afterwards $\beta_k^{PRP}\approx0$, and the search direction $d_k\approx-g_k$ continues as in the steepest descent method. So the numerator of the PRP method works efficiently and prevents jamming.
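The restart effect of the PRP numerator described above can be observed numerically. In this sketch the vectors are arbitrary illustrative values: a tiny step makes $g_k\approx g_{k-1}$, so the PRP parameter collapses toward zero and $d_k\approx-g_k$, whereas the FR parameter stays near one and keeps most of the old direction $d_{k-1}$, which is the behavior usually associated with jamming.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def next_direction(g_new, d_old, beta):
    """Two-term update (3): d_k = -g_k + beta_k d_{k-1}."""
    return [-g + beta * d for g, d in zip(g_new, d_old)]


g_old = [1.0, -2.0]
g_new = [1.0 + 1e-8, -2.0 - 1e-8]   # tiny step, so g_k is almost g_{k-1}
d_old = [3.0, 4.0]                   # an arbitrary previous direction

y = [a - b for a, b in zip(g_new, g_old)]
beta_prp = dot(g_new, y) / dot(g_old, g_old)     # numerator g_k^T y_{k-1} is ~0
beta_fr = dot(g_new, g_new) / dot(g_old, g_old)  # ratio of norms is ~1

d_prp = next_direction(g_new, d_old, beta_prp)   # close to -g_k (restart)
d_fr = next_direction(g_new, d_old, beta_fr)     # close to d_{k-1} - g_k
print(beta_prp, beta_fr)
print(d_prp, d_fr)
```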

This motivated us to construct a new modified three-term conjugate gradient method, as follows: Further, Powell [36] showed that the PRP method can cycle infinitely without approaching a minimum point, even if the step size is chosen as the least positive minimizer. To overcome this, we use the analysis of Gilbert and Nocedal [34]: So In , , and , the parameters and play an important role: when is getting smaller, the numbers of iterations, function evaluations, and gradient evaluations decrease, and when is getting larger, these numbers also decrease. So we observe that the best value for the parameters is .

3. Algorithm 1.1

Step 0. Given an initial point $x_0\in\mathbb{R}^n$, , , , set $d_0=-g_0$ and $k:=0$.

Step 1. If $\|g_k\|\le\varepsilon$, where , then the algorithm stops; otherwise, go to Step 2.

(Note: all norms used in this paper are the Euclidean norm $\|\cdot\|_2$.)

Step 2. Compute the search direction (19) by using and where where .

Step 3. Determine the step size $\alpha_k$ by the Wolfe line search (10).

Step 4. Compute $x_{k+1}=x_k+\alpha_kd_k$, where $\alpha_k$ is given in Step 3 and $d_k$ is given in Step 2.

Step 5. Set $k:=k+1$ and go to Step 1.
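Algorithm 1.1 can be sketched in Python as follows. The exact BZAU formulas are those of (19); for illustration only, this sketch plugs in a hypothetical three-term PRP-type direction $d_k=-g_k+\beta_kd_{k-1}-\theta_ky_{k-1}$ with $\beta_k=g_k^Ty_{k-1}/D_k$, $\theta_k=g_k^Td_{k-1}/D_k$, and an assumed VFR-style denominator $D_k=\mu|g_k^Td_{k-1}|+\|g_{k-1}\|^2$ that reflects the motivation of Section 2 but should not be read as the authors' formula. The defaults $\mu=2$, $\rho=0.1$, $\sigma=0.5$ follow Section 5, and the Wolfe step is found by a simple bisection scheme, which is also an assumption.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def wolfe_step(f, grad, x, d, rho, sigma, max_iter=60):
    """Bisection search for a step satisfying the Wolfe conditions (10)."""
    lo, hi, alpha = 0.0, math.inf, 1.0
    fx, gtd = f(x), dot(grad(x), d)
    for _ in range(max_iter):
        x_new = [xi + alpha * di for xi, di in zip(x, d)]
        if f(x_new) > fx + rho * alpha * gtd:
            hi = alpha                           # sufficient decrease fails
        elif dot(grad(x_new), d) < sigma * gtd:
            lo = alpha                           # curvature condition fails
        else:
            break                                # alpha satisfies (10)
        alpha = 0.5 * (lo + hi) if hi < math.inf else 2.0 * lo
    return alpha


def three_term_direction(g, g_old, d_old, mu):
    """HYPOTHETICAL three-term PRP-type direction (not the paper's exact (19))."""
    y = [a - b for a, b in zip(g, g_old)]        # y_{k-1}
    gd = dot(g, d_old)                           # g_k^T d_{k-1}
    denom = mu * abs(gd) + dot(g_old, g_old)     # assumed D_k > 0
    beta = dot(g, y) / denom                     # PRP numerator over D_k
    theta = gd / denom
    return [-gi + beta * di - theta * yi for gi, di, yi in zip(g, d_old, y)]


def algorithm_1_1(f, grad, x0, eps=1e-6, mu=2.0, rho=0.1, sigma=0.5, max_iter=10000):
    x = list(x0)
    g = grad(x)
    d = [-gi for gi in g]                        # Step 0: d_0 = -g_0
    for k in range(max_iter):
        if math.sqrt(dot(g, g)) <= eps:          # Step 1: stopping test
            return x, k
        alpha = wolfe_step(f, grad, x, d, rho, sigma)  # Step 3
        x = [xi + alpha * di for xi, di in zip(x, d)]  # Step 4
        g_new = grad(x)
        d = three_term_direction(g_new, g, d, mu)      # Step 2 for the next k
        g = g_new                                      # Step 5: k <- k + 1
    return x, max_iter


if __name__ == "__main__":
    c = [1.0, 10.0, 100.0]                       # illustrative convex quadratic
    f = lambda x: 0.5 * sum(ci * xi * xi for ci, xi in zip(c, x))
    grad = lambda x: [ci * xi for ci, xi in zip(c, x)]
    x_star, iters = algorithm_1_1(f, grad, [1.0, 1.0, 1.0])
    print(iters, x_star)
```

Replacing `three_term_direction` with the authors' formula (19) is the only change needed to turn this skeleton into the BZAU method itself; the loop structure mirrors Steps 0 to 5 above.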

4. Global Convergence of the Modified Three-Term Method

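Assumptions 1. (A1) The level set $\Omega=\{x\in\mathbb{R}^n: f(x)\le f(x_0)\}$ is bounded.
(A2) In some neighborhood $N$ of $\Omega$, the gradient $g$ is Lipschitz continuous on an open convex set that contains $\Omega$; that is, there exists a positive constant $L$ such that

$$\|g(x)-g(y)\|\le L\|x-y\|,\quad\forall x,y\in N.$$

Assumptions (A1) and (A2) and [32, 34] imply that there exist positive constants $B$ and $\bar\gamma$ such that

$$\|x\|\le B,\qquad\|g(x)\|\le\bar\gamma,\quad\forall x\in\Omega.$$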

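Since $\{f(x_k)\}$ is decreasing (by the first inequality in (10) and $g_k^Td_k<0$), Assumption (A1) implies that the sequence $\{x_k\}$ generated by Algorithm 1.1 is contained in the bounded level set of (A1). Then the sequence $\{f(x_k)\}$, being decreasing and bounded below (since $f$ is continuous on that closed and bounded set), is convergent.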

Now we will prove the sufficient descent condition independent of line search and also . From (19),Multiplying by , we obtain that is,Hence the sufficient descent condition independent of line search holds.
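Why a three-term PRP-type direction can satisfy sufficient descent independent of the line search is purely algebraic. For the three-term form of Zhang, Zhou, and Li [20], $d_k=-g_k+\beta_kd_{k-1}-\theta_ky_{k-1}$ with $\beta_k=g_k^Ty_{k-1}/D_k$ and $\theta_k=g_k^Td_{k-1}/D_k$, multiplying by $g_k^T$ gives

$$g_k^Td_k=-\|g_k\|^2+\frac{(g_k^Ty_{k-1})(g_k^Td_{k-1})-(g_k^Td_{k-1})(g_k^Ty_{k-1})}{D_k}=-\|g_k\|^2$$

for any scalar $D_k\ne0$; [20] uses $D_k=\|g_{k-1}\|^2$. The Python sketch below checks this identity numerically for the hypothetical choice $D_k=\mu|g_k^Td_{k-1}|+\|g_{k-1}\|^2$ with $\mu=2$. It illustrates the mechanism only and is not a statement of the exact BZAU formula.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def three_term_direction(g, g_old, d_old, mu=2.0):
    """Hypothetical direction with D_k = mu*|g_k^T d_{k-1}| + ||g_{k-1}||^2."""
    y = [a - b for a, b in zip(g, g_old)]
    gd = dot(g, d_old)
    denom = mu * abs(gd) + dot(g_old, g_old)
    beta, theta = dot(g, y) / denom, gd / denom
    return [-gi + beta * di - theta * yi for gi, di, yi in zip(g, d_old, y)]


def descent_residual(g, g_old, d_old, mu=2.0):
    """Return g_k^T d_k + ||g_k||^2, which should vanish up to rounding."""
    d = three_term_direction(g, g_old, d_old, mu)
    return dot(g, d) + dot(g, g)


if __name__ == "__main__":
    rng = random.Random(0)
    worst = 0.0
    for _ in range(1000):
        g = [rng.uniform(-1, 1) for _ in range(5)]
        g_old = [rng.uniform(1, 2) for _ in range(5)]   # keeps ||g_{k-1}|| away from 0
        d_old = [rng.uniform(-1, 1) for _ in range(5)]
        worst = max(worst, abs(descent_residual(g, g_old, d_old)))
    print("largest |g_k^T d_k + ||g_k||^2| over 1000 random trials:", worst)
```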

Now we prove that . As we have , by taking modulus on both sidesBy Schwarz inequality we have So,Thus we have

Lemma 1. Suppose that Assumptions (A1) and (A2) hold, where $x_0$ is the initial point, and consider any method of the form (2) in which $d_k$ is a descent direction and $\alpha_k$ satisfies the Wolfe condition (10) or the strong Wolfe condition (11). Then the Zoutendijk condition

$$\sum_{k\ge0}\frac{(g_k^Td_k)^2}{\|d_k\|^2}<\infty \tag{30}$$

holds; it is normally used to prove the global convergence of CG methods. From (29), the Zoutendijk condition is equivalent to the following inequality:

$$\sum_{k\ge0}\frac{\|g_k\|^4}{\|d_k\|^2}<\infty. \tag{31}$$

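Definition 2 (see [32]). The function $f$ is called uniformly convex on $\mathbb{R}^n$ if there exists a positive constant $m$ such that

$$m\|z\|^2\le z^T\nabla^2f(x)z,\quad\forall x,z\in\mathbb{R}^n, \tag{32}$$

where $\nabla^2f(x)$ is the Hessian matrix of $f$.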
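For a twice continuously differentiable $f$, uniform convexity with modulus $m>0$ implies the gradient monotonicity inequality $(g(x)-g(y))^T(x-y)\ge m\|x-y\|^2$, obtained by integrating the Hessian along the segment from $y$ to $x$. The sketch below checks this numerically for the illustrative quadratic $f(x)=\tfrac12\sum_ic_ix_i^2$, whose Hessian is $\mathrm{diag}(c)$ and whose modulus is $m=\min_ic_i$; the test function and the name `monotonicity_ratio` are assumptions chosen for clarity.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


C = [1.0, 10.0, 100.0]                     # Hessian diag(C) of the test quadratic


def grad(x):
    """Gradient of f(x) = 0.5 * sum(C_i * x_i^2)."""
    return [ci * xi for ci, xi in zip(C, x)]


def monotonicity_ratio(x, y):
    """(g(x) - g(y))^T (x - y) / ||x - y||^2, for x != y."""
    s = [a - b for a, b in zip(x, y)]
    dg = [a - b for a, b in zip(grad(x), grad(y))]
    return dot(dg, s) / dot(s, s)


if __name__ == "__main__":
    rng = random.Random(0)
    ratios = []
    for _ in range(1000):
        x = [rng.uniform(-5, 5) for _ in C]
        y = [rng.uniform(-5, 5) for _ in C]
        ratios.append(monotonicity_ratio(x, y))
    print("modulus m =", min(C), " smallest observed ratio =", min(ratios))
```

For this quadratic the ratio is a weighted average of the $c_i$, so it always lies between $\min_ic_i$ and $\max_ic_i$, with the lower bound attained along the first coordinate axis.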

We now show the global convergence of Algorithm 1.1 for uniformly convex functions.

Lemma 3. Let both sequences and be generated by Algorithm 1.1 and suppose that (32) holds; thenwhere , is a positive constant, and is a positive number whose range is , from the Wolfe line search (10).

Proof. For the details of the proof, see the corresponding lemma in [37].

Theorem 4. Suppose that Assumptions (A1) and (A2) hold and the function is uniformly convex; then

Proof. From (15), (33), and (A2), we haveNow From (18), (33), and (A2), we have Combining (35) and (37) with (19),Now letting , we get ,So by (31), we get
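The closing step of this proof follows a standard pattern, sketched here under two stated assumptions: that the bound obtained by combining the estimates with (19) has the linear form $\|d_k\|\le c\|g_k\|$ for some constant $c>0$, and that (31) reads $\sum_{k\ge0}\|g_k\|^4/\|d_k\|^2<\infty$. Since $g_k\ne0$ at every iterate where the algorithm has not stopped,

$$\sum_{k\ge0}\|g_k\|^2=c^2\sum_{k\ge0}\frac{\|g_k\|^4}{c^2\|g_k\|^2}\le c^2\sum_{k\ge0}\frac{\|g_k\|^4}{\|d_k\|^2}<\infty,$$

so the series $\sum_k\|g_k\|^2$ converges, and therefore $\lim_{k\to\infty}\|g_k\|=0$.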

We now show the global convergence for nonconvex functions.

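Lemma 5. Let Assumptions (A1) and (A2) hold, and consider the sequence $\{x_k\}$ generated by Algorithm 1.1. If there exists a positive constant $\varepsilon$ such that $\|g_k\|\ge\varepsilon$ for every $k$, then $d_k\ne0$ and

$$\sum_{k\ge1}\|u_k-u_{k-1}\|^2<\infty,$$

where $u_k=d_k/\|d_k\|$.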

Proof. Since , , and for every , then we get for every , so that is well defined. Ifthen we get . Also and are unit vectors, soas we know ,Now from (18), (22), and (A2) Now from (21), (22), and (45), there is a constant as follows:Therefore, from (31) and (46), we haveThis along with (44) completes the proof.

Theorem 6. Suppose that Assumptions (A1) and (A2) hold. Then the sequence $\{x_k\}$ generated by Algorithm 1.1 satisfies

$$\liminf_{k\to\infty}\|g_k\|=0. \tag{48}$$

Proof. Suppose that the conclusion (48) is not true. Then there exists a positive constant $\varepsilon$ such that $\|g_k\|\ge\varepsilon$ for all $k$.
The proof has the following two parts.
Part 1. We notice that for any and we have , such that Proceeding as in the proof of Theorem , Step 1, in [32], we have
Part 2 (a bound on the direction $d_k$). From (19) and (46), we have At the beginning of the proof we assumed that $\|g_k\|$ is bounded away from zero; then there exist a constant and an index such that . Then which contradicts (A2), (31), and (51). Hence (48) holds.
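The contradiction at the end of Part 2 follows the usual pattern, sketched here under stated assumptions: the bound derived there has the form $\|d_k\|\le M$ for all $k\ge k_0$ (with $M$ and $k_0$ standing for the constant and index asserted to exist), $\|g_k\|\ge\varepsilon$ for all $k$ as in the contrary hypothesis, and (31) reads $\sum_{k\ge0}\|g_k\|^4/\|d_k\|^2<\infty$. Then

$$\sum_{k\ge k_0}\frac{\|g_k\|^4}{\|d_k\|^2}\ \ge\ \sum_{k\ge k_0}\frac{\varepsilon^4}{M^2}=+\infty,$$

which is incompatible with (31). Hence $\|g_k\|$ cannot stay bounded away from zero, which is exactly the statement $\liminf_{k\to\infty}\|g_k\|=0$.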

5. Numerical Results

In this section, we compare the numerical results of the proposed three-term BZAU (Bakhtawar, Zabidin, Ahmad, and Ummu) method with those of the recently developed TMPRP1 method. The Wolfe line search (10) is used, and the parameter values for the BZAU and TMPRP1 methods are μ = 2 and 10⁻⁴, η = 1 and 0, ρ = 0.1 and 0.1, and σ = 0.5 and 0.5, respectively. The code was written in Matlab 7.1 and run on an Intel Core i5 computer with a 2.40 GHz CPU and 2.0 GB of RAM. We test the functions taken from [38] with dimensions in the range . The main purpose of selecting a large number of test functions is to test unconstrained optimization algorithms properly. Dantzig (1914–2005) said that the final test of a theory is its capacity to solve the problems which originated it. This is one of the main reasons we select large-scale unconstrained optimization problems to test the theoretical progress in numerical form through mathematical programming [38].

Moré et al. [39] argued that judging the efficiency of a method or algorithm on a small number of test functions is not suitable, because this may lead to the choice of an unfavorable algorithm. Testing a method on a large number of test functions yields a large amount of data, from which we can interpret which method is more efficient and robust. However, the number of test functions should be neither very large nor very small, so a benchmark of 75 test functions is chosen to test the efficiency of the methods.

In practice, optimizers need to evaluate nonlinear optimization methods. Theory alone, such as a proof of global convergence, is not enough to determine the reliability and efficiency of a method. As a result, the robustness of a method is established by testing it on a large number of test problems [38].

In the global convergence analysis, property is used for the convex case and property is used for the nonconvex case. In the numerical part, however, is used for the comparison with the TMPRP1 method. The TMPRP1 method possesses the sufficient descent property without any line search. Theoretically, the TMPRP1 method is well established and converges globally. In numerical computation, the TMPRP1 method was tested on a benchmark of 75 test functions and showed promising results. Hence, the TMPRP1 method is compared with our BZAU method.

In Table 1, the number of iterations, number of function evaluations, number of gradient evaluations, and CPU time are denoted by NI/NF/GE/CT. If the CT exceeds 500 seconds and the NI is more than 10000, then the run is marked as a failure (F). This standard is commonly followed in the literature. For most of the functions we obtain results within this limit; a function that does not meet this limit is marked F.

The performance profiles of Dolan and Moré [40] are adopted. In Figures 1–4, we compare the performance based on NI, NF, GE, and CT. For every method, we plot the fraction of problems for which the method is within a factor τ of the best time. The left-hand side of each figure gives the percentage of test problems for which a method is the fastest; the right-hand side shows the percentage of test problems that are solved successfully by either the BZAU or the TMPRP1 method. Because τ takes a wide range of values, which makes the graph difficult to read, the horizontal τ-axis is converted to the natural log of τ, so its values appear as powers of e, while the vertical axis is kept on a linear scale. Comparing Figures 1–4 shows that the BZAU method outperforms the TMPRP1 method in every case. The method whose curve lies on top is the most efficient, so the new modified three-term CG method is also efficient in terms of numerical results.
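Performance profiles of the kind shown in Figures 1–4 follow the definition of Dolan and Moré [40]: for problem $p$ and solver $s$ with cost $t_{p,s}$ (NI, NF, GE, or CT), the performance ratio is $r_{p,s}=t_{p,s}/\min_st_{p,s}$, and $P_s(\tau)$ is the fraction of problems with $r_{p,s}\le\tau$; a failed run (F) gets $r_{p,s}=\infty$. The Python sketch below computes these fractions. The small cost table in the usage example is made-up illustration data, not the results of Table 1.

```python
import math


def performance_profile(costs, taus):
    """Dolan-More performance profile.

    costs[p][s] is the positive cost of solver s on problem p
    (math.inf marks a failure). Returns profile[s][i], the fraction
    of problems with performance ratio r_{p,s} <= taus[i].
    """
    n_p, n_s = len(costs), len(costs[0])
    ratios = []
    for row in costs:
        best = min(row)
        # If every solver failed on this problem, all ratios stay infinite.
        ratios.append([c / best if math.isfinite(c) else math.inf for c in row])
    return [
        [sum(1 for r in ratios if r[s] <= tau) / n_p for tau in taus]
        for s in range(n_s)
    ]


if __name__ == "__main__":
    # Hypothetical CPU times for two solvers on four problems.
    costs = [[1.0, 2.0], [3.0, 3.0], [4.0, math.inf], [2.0, 1.0]]
    taus = [1.0, 2.0, math.e]
    for name, prof in zip(["solver A", "solver B"], performance_profile(costs, taus)):
        print(name, prof)
```

Plotting $P_s(\tau)$ against $\ln\tau$ gives the log-scaled horizontal axis described above; $P_s(1)$ is the share of problems on which solver $s$ is fastest, and the value for large $\tau$ is the share of problems it solves.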

6. Conclusion

In this paper, we have proposed a new modified three-term conjugate gradient method for unconstrained optimization. The new modified three-term BZAU method possesses the sufficient descent property independent of any line search. Global convergence is shown for both convex and nonconvex functions using the Wolfe line search. In the numerical results, we compared the three-term BZAU method with the TMPRP1 method [32]. In [32], the TMPRP1 method was shown to be numerically efficient in comparison with two other robust methods, the CG_Descent method [41] and the DTPRP method [42]. That is the reason for comparing our BZAU method with the TMPRP1 method.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

References

[1] J. Nocedal, “Conjugate gradient methods and nonlinear optimization,” in Linear and Nonlinear Conjugate Gradient Related Methods, J. L. Nazareth, Ed., pp. 9–23, SIAM, Philadelphia, PA, USA, 1995.
[2] E. Polak, Optimization: Algorithms and Consistent Approximations, Springer, New York, NY, USA, 1997.
[3] R. Ziadi, R. Ellaia, and A. Bencherif-Madani, “Global optimization through a stochastic perturbation of the Polak-Ribière conjugate gradient method,” Journal of Computational and Applied Mathematics, vol. 317, pp. 672–684, 2017.
[4] M. R. Hestenes and E. Stiefel, “Methods of conjugate gradients for solving linear systems,” Journal of Research of the National Bureau of Standards, vol. 49, pp. 409–436, 1952.
[5] R. Fletcher and C. M. Reeves, “Function minimization by conjugate gradients,” The Computer Journal, vol. 7, pp. 149–154, 1964.
[6] E. Polak and G. Ribière, “Note sur la convergence de méthodes de directions conjuguées,” Revue Française d'Informatique et de Recherche Opérationnelle, vol. 3, no. 16, pp. 35–43, 1969.
[7] B. T. Polyak, “The conjugate gradient method in extreme problems,” USSR Computational Mathematics and Mathematical Physics, vol. 9, no. 4, pp. 94–112, 1969.
[8] R. Fletcher, Practical Methods of Optimization, Vol. I: Unconstrained Optimization, Wiley, New York, NY, USA, 2nd edition, 1997.
[9] Y. Liu and C. Storey, “Efficient generalized conjugate gradient algorithms. I. Theory,” Journal of Optimization Theory and Applications, vol. 69, no. 1, pp. 129–137, 1991.
[10] Y. H. Dai and Y. Yuan, “A nonlinear conjugate gradient method with a strong global convergence property,” SIAM Journal on Optimization, vol. 10, no. 1, pp. 177–182, 2000.
[11] A. Alhawarat, M. Mamat, M. Rivaie, and Z. Salleh, “An efficient hybrid conjugate gradient method with the strong Wolfe-Powell line search,” Mathematical Problems in Engineering, vol. 2015, Article ID 103517, 7 pages, 2015.
[12] A. Alhawarat, Z. Salleh, M. Mamat, and M. Rivaie, “An efficient modified Polak–Ribière–Polyak conjugate gradient method with global convergence properties,” Optimization Methods and Software, pp. 1–14, 2016.
[13] A. Alhawarat and Z. Salleh, “Modification of nonlinear conjugate gradient method with weak Wolfe-Powell line search,” Abstract and Applied Analysis, vol. 2017, Article ID 7238134, 6 pages, 2017.
[14] Z. Salleh and A. Alhawarat, “An efficient modification of the Hestenes-Stiefel nonlinear conjugate gradient method with restart property,” Journal of Inequalities and Applications, vol. 2016, no. 1, article no. 110, 2016.
[15] E. M. Beale, “A derivative of conjugate gradients,” in Numerical Methods for Nonlinear Optimization, F. A. Lootsma, Ed., pp. 39–43, Academic Press, London, 1972.
[16] M. F. McGuire and P. Wolfe, “Evaluating a restart procedure for conjugate gradients,” Tech. Rep., IBM Research Center, Yorktown Heights, 1973.
[17] L. Nazareth, “A conjugate direction algorithm without line searches,” Journal of Optimization Theory and Applications, vol. 23, no. 3, pp. 373–387, 1977.
[18] N. Y. Deng and Z. Li, “Global convergence of three terms conjugate gradient methods,” Optimization Methods and Software, vol. 4, pp. 273–282, 1995.
[19] Y. H. Dai and Y. Yuan, “A nonlinear conjugate gradient method with a strong global convergence property,” SIAM Journal on Optimization, vol. 10, no. 1, pp. 177–182, 1999.
[20] L. Zhang, W. Zhou, and D.-H. Li, “A descent modified Polak-Ribière-Polyak conjugate gradient method and its global convergence,” IMA Journal of Numerical Analysis, vol. 26, no. 4, pp. 629–640, 2006.
[21] L. Zhang, W. Zhou, and D. Li, “Some descent three-term conjugate gradient methods and their global convergence,” Optimization Methods and Software, vol. 22, no. 4, pp. 697–711, 2007.
[22] W. Cheng, “A two-term PRP-based descent method,” Numerical Functional Analysis and Optimization, vol. 28, no. 11-12, pp. 1217–1230, 2007.
[23] J. Zhang, Y. Xiao, and Z. Wei, “Nonlinear conjugate gradient methods with sufficient descent condition for large-scale unconstrained optimization,” Mathematical Problems in Engineering, vol. 2009, Article ID 243290, 16 pages, 2009.
[24] A. Y. Al-Bayati and W. H. Sharif, “A new three-term conjugate gradient method for unconstrained optimization,” Canadian Journal on Science & Engineering Mathematics, vol. 1, no. 5, pp. 108–124, 2010.
[25] Y. Narushima, H. Yabe, and J. A. Ford, “A three-term conjugate gradient method with sufficient descent property for unconstrained optimization,” SIAM Journal on Optimization, vol. 21, no. 1, pp. 212–230, 2011.
[26] N. Andrei, “A modified Polak-Ribière-Polyak conjugate gradient algorithm for unconstrained optimization,” Optimization, vol. 60, no. 12, pp. 1457–1471, 2011.
[27] N. Andrei, “On three-term conjugate gradient algorithms for unconstrained optimization,” Applied Mathematics and Computation, vol. 219, no. 11, pp. 6316–6327, 2013.
[28] N. Andrei, “A simple three-term conjugate gradient algorithm for unconstrained optimization,” Journal of Computational and Applied Mathematics, vol. 241, pp. 19–29, 2013.
[29] K. Sugiki, Y. Narushima, and H. Yabe, “Globally convergent three-term conjugate gradient methods that use secant conditions and generate descent search directions for unconstrained optimization,” Journal of Optimization Theory and Applications, vol. 153, no. 3, pp. 733–757, 2012.
[30] M. Al-Baali, Y. Narushima, and H. Yabe, “A family of three-term conjugate gradient methods with sufficient descent property for unconstrained optimization,” Computational Optimization and Applications, vol. 60, no. 1, pp. 89–110, 2015.
[31] S. Babaie-Kafaki and R. Ghanbari, “Two modified three-term conjugate gradient methods with sufficient descent property,” Optimization Letters, vol. 8, no. 8, pp. 2285–2297, 2014.
[32] M. Sun and J. Liu, “Three modified Polak-Ribière-Polyak conjugate gradient methods with sufficient descent property,” Journal of Inequalities and Applications, vol. 2015, no. 1, 2015.
[33] M. J. D. Powell, “Restart procedures for the conjugate gradient method,” Mathematical Programming, vol. 12, no. 2, pp. 241–254, 1977.
[34] J. C. Gilbert and J. Nocedal, “Global convergence properties of conjugate gradient methods for optimization,” SIAM Journal on Optimization, vol. 2, no. 1, pp. 21–42, 1992.
[35] Z. Wei, G. Li, and L. Qi, “New nonlinear conjugate gradient formulas for large-scale unconstrained optimization problems,” Applied Mathematics and Computation, vol. 179, no. 2, pp. 407–430, 2006.
[36] M. J. D. Powell, “Nonconvex minimization calculations and the conjugate gradient method,” in Numerical Analysis, vol. 1066 of Lecture Notes in Mathematics, pp. 122–141, Springer, Berlin, Germany, 1984.
[37] Z.-F. Dai and B.-S. Tian, “Global convergence of some modified PRP nonlinear conjugate gradient methods,” Optimization Letters, vol. 5, no. 4, pp. 615–630, 2011.
[38] N. Andrei, “An unconstrained optimization test functions collection,” Advanced Modeling and Optimization, vol. 10, no. 1, pp. 147–161, 2008.
[39] J. J. Moré, B. S. Garbow, and K. E. Hillstrom, “Testing unconstrained optimization software,” ACM Transactions on Mathematical Software, vol. 7, no. 1, pp. 17–41, 1981.
[40] E. D. Dolan and J. J. Moré, “Benchmarking optimization software with performance profiles,” Mathematical Programming, vol. 91, no. 2, pp. 201–213, 2002.
[41] W. W. Hager and H. Zhang, “A survey of nonlinear conjugate gradient methods,” Pacific Journal of Optimization, vol. 2, no. 1, pp. 35–58, 2006.
[42] Z. Dai and F. Wen, “Another improved Wei-Yao-Liu nonlinear conjugate gradient method with sufficient descent property,” Applied Mathematics and Computation, vol. 218, no. 14, pp. 7421–7430, 2012.

Copyright

Copyright © 2017 Bakhtawar Baluch et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
