Journal of Mathematics

Volume 2017, Article ID 2715854, 12 pages

https://doi.org/10.1155/2017/2715854

## A New Modified Three-Term Conjugate Gradient Method with Sufficient Descent Property and Its Global Convergence

^{1}School of Informatics and Applied Mathematics, Universiti Malaysia Terengganu, 21030 Kuala Nerus, Terengganu, Malaysia

^{2}Department of Mathematics, College of Science, Al-Isra University, Amman, Jordan

Correspondence should be addressed to Zabidin Salleh; zabidin@umt.edu.my

Received 4 April 2017; Revised 30 June 2017; Accepted 3 August 2017; Published 13 September 2017

Academic Editor: Liwei Zhang

Copyright © 2017 Bakhtawar Baluch et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

A new modified three-term conjugate gradient (CG) method is presented for solving large-scale optimization problems. The idea relates to the famous Polak-Ribière-Polyak (PRP) formula. Although the numerator of the PRP formula plays a vital role in its numerical performance and avoids the jamming issue, the PRP method is not globally convergent. So, for the new three-term CG method, the idea is to combine the PRP numerator with the denominator of another CG formula that performs well. The new three-term CG method satisfies the sufficient descent condition independently of any line search. The novelty is that, with the Wolfe-Powell line search, the new modification possesses global convergence properties for both convex and nonconvex functions. Numerical computations with the Wolfe-Powell line search on standard optimization test functions show the efficiency and robustness of the new modification.

#### 1. Introduction

The conjugate gradient method is an efficient and well-organized tool for solving large-scale nonlinear optimization problems, owing to its simplicity and low memory requirements. The method is very popular among mathematicians, engineers, and others interested in solving large-scale optimization problems [1–3].

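Consider the unconstrained optimization problem

$$\min_{x \in \mathbb{R}^{n}} f(x), \tag{1}$$

where $f : \mathbb{R}^{n} \to \mathbb{R}$ is continuously differentiable and its gradient $g(x) = \nabla f(x)$ is available. Generally, the CG method generates an iterative sequence defined by

$$x_{k+1} = x_{k} + \alpha_{k} d_{k}, \quad k = 0, 1, 2, \ldots, \tag{2}$$

where $\alpha_{k}$ is the step size obtained by a line search and $d_{k}$ is a search direction defined by

$$d_{k} = \begin{cases} -g_{k}, & k = 0, \\ -g_{k} + \beta_{k} d_{k-1}, & k \ge 1, \end{cases} \tag{3}$$

where $g_{k} = g(x_{k})$ and the term $\beta_{k}$ is a scalar. There are six essential formulas for $\beta_{k}$, which, with $y_{k-1} = g_{k} - g_{k-1}$, are stated as

$$\beta_{k}^{HS} = \frac{g_{k}^{T} y_{k-1}}{d_{k-1}^{T} y_{k-1}} \quad \text{(Hestenes and Stiefel [4], 1952)}, \tag{4}$$

$$\beta_{k}^{FR} = \frac{\|g_{k}\|^{2}}{\|g_{k-1}\|^{2}} \quad \text{(Fletcher and Reeves [5], 1964)}, \tag{5}$$

$$\beta_{k}^{PRP} = \frac{g_{k}^{T} y_{k-1}}{\|g_{k-1}\|^{2}} \quad \text{(Polak et al. [6, 7], 1969)}, \tag{6}$$

$$\beta_{k}^{CD} = -\frac{\|g_{k}\|^{2}}{d_{k-1}^{T} g_{k-1}} \quad \text{(Conjugate-Descent [8], 1997)}, \tag{7}$$

$$\beta_{k}^{LS} = -\frac{g_{k}^{T} y_{k-1}}{d_{k-1}^{T} g_{k-1}} \quad \text{(Liu and Storey [9], 1991)}, \tag{8}$$

$$\beta_{k}^{DY} = \frac{\|g_{k}\|^{2}}{d_{k-1}^{T} y_{k-1}} \quad \text{(Dai and Yuan [10], 2000)}. \tag{9}$$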
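As an illustration only, the six classical parameters can be evaluated side by side from the two most recent gradients and the previous direction. The sketch below is not taken from the paper: the function name `classical_betas`, the helper `dot`, and the sample vectors are hypothetical, and vectors are plain Python lists.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def classical_betas(g_new, g_old, d_old):
    """Return the six classical CG parameters as a dict.

    g_new = g_k, g_old = g_{k-1}, d_old = d_{k-1}.
    """
    y = [a - b for a, b in zip(g_new, g_old)]  # y_{k-1} = g_k - g_{k-1}
    gg_new = dot(g_new, g_new)                 # ||g_k||^2
    gg_old = dot(g_old, g_old)                 # ||g_{k-1}||^2
    gy = dot(g_new, y)                         # g_k^T y_{k-1}
    dy = dot(d_old, y)                         # d_{k-1}^T y_{k-1}
    dg_old = dot(d_old, g_old)                 # d_{k-1}^T g_{k-1}
    return {
        "HS": gy / dy,
        "FR": gg_new / gg_old,
        "PRP": gy / gg_old,
        "CD": -gg_new / dg_old,
        "LS": -gy / dg_old,
        "DY": gg_new / dy,
    }


# Case 1: d_{k-1} = -g_{k-1}, g_k orthogonal to both g_{k-1} and d_{k-1}.
betas_exact = classical_betas([0.0, 2.0], [1.0, 0.0], [-1.0, 0.0])
print(betas_exact)    # every value equals 4.0

# Case 2: g_k is no longer orthogonal to d_{k-1}, so the formulas separate.
betas_inexact = classical_betas([0.5, 2.0], [1.0, 0.0], [-1.0, 0.0])
print(betas_inexact)  # HS 7.5, FR 4.25, PRP 3.75, CD 4.25, LS 3.75, DY 8.5
```

In the first case the six values coincide because the new gradient is orthogonal to the previous gradient and direction; once that orthogonality is lost, as with an inexact line search, the formulas give different values.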

Generally, an inexact line search is used in order to obtain the global convergence of the conjugate gradient method, such as the Wolfe line search or the strong Wolfe line search. The Wolfe line search is given as

$$f(x_{k} + \alpha_{k} d_{k}) \le f(x_{k}) + \rho \alpha_{k} g_{k}^{T} d_{k}, \qquad g(x_{k} + \alpha_{k} d_{k})^{T} d_{k} \ge \sigma g_{k}^{T} d_{k}, \tag{10}$$

where $0 < \rho < \sigma < 1$. The strong Wolfe line search computes $\alpha_{k}$ such that

$$f(x_{k} + \alpha_{k} d_{k}) \le f(x_{k}) + \rho \alpha_{k} g_{k}^{T} d_{k}, \qquad \left| g(x_{k} + \alpha_{k} d_{k})^{T} d_{k} \right| \le \sigma \left| g_{k}^{T} d_{k} \right|. \tag{11}$$

Recently, Alhawarat et al. [11, 12] and Alhawarat and Salleh [13, 14] have proposed efficient and hybrid conjugate gradient methods that satisfy the global convergence properties. To enhance the effectiveness of the two-term conjugate gradient method, the three-term conjugate gradient method has been widely studied and given much importance. The three-term conjugate gradient method attains different numerical outcomes, depending on how the scalar parameter is selected. The papers by Beale [15], McGuire and Wolfe [16], Nazareth [17], Deng and Li [18], Dai and Yuan [19], Zhang et al. [20, 21], Cheng [22], Zhang et al. [23], Al-Bayati and Sharif [24], Narushima et al. [25], Andrei [26–28], Sugiki et al. [29], Al-Baali et al. [30], Babaie-Kafaki and Ghanbari [31], and Sun and Liu [32] presented different types of three-term conjugate gradient methods along with their numerical performance and efficiency and proved their global convergence properties. Compared with the classical conjugate gradient algorithms, these three-term conjugate gradient algorithms are numerically strong, efficient, reliable, and robust. Beale [15] was the first to propose the three-term conjugate gradient method.
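The two line search tests can be checked numerically for a trial step. The following is a minimal sketch, assuming the usual form of the Wolfe and strong Wolfe conditions with parameters $0 < \rho < \sigma < 1$; the function name `wolfe_conditions` and the one-dimensional example are hypothetical and serve only as an illustration.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def wolfe_conditions(f, grad, x, d, alpha, rho=0.1, sigma=0.5):
    """Return (weak, strong): does alpha pass the Wolfe / strong Wolfe tests?"""
    slope0 = dot(grad(x), d)                      # g_k^T d_k (negative for descent)
    x_new = [xi + alpha * di for xi, di in zip(x, d)]
    slope1 = dot(grad(x_new), d)                  # g(x_k + alpha d_k)^T d_k
    sufficient_decrease = f(x_new) <= f(x) + rho * alpha * slope0
    weak = sufficient_decrease and slope1 >= sigma * slope0
    strong = sufficient_decrease and abs(slope1) <= sigma * abs(slope0)
    return weak, strong


def f(x):
    return x[0] ** 2


def grad(x):
    return [2.0 * x[0]]


x0 = [1.0]
d0 = [-2.0]  # steepest descent direction at x0

# For this example the weak test holds for 0.25 <= alpha <= 0.9,
# and the strong test holds for 0.25 <= alpha <= 0.75.
results = {a: wolfe_conditions(f, grad, x0, d0, a) for a in (0.1, 0.5, 0.8, 1.0)}
print(results)
```

The step 0.8 passes the weak test but not the strong one, which shows why the standard Wolfe search accepts a wider range of steps.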

In the new three-term modification, we focus on the numerator of the PRP method, in which the parameter $\beta_{k}$ is given as

$$\beta_{k}^{PRP} = \frac{g_{k}^{T} (g_{k} - g_{k-1})}{\|g_{k-1}\|^{2}}.$$

The PRP method is one of the most efficient and reliable conjugate gradient methods owing to its good numerical performance. The global convergence of PRP is established when the objective function is strongly convex and the line search is exact [6]. On the other hand, Powell [33] showed through his analysis that there exist nonconvex functions for which the PRP method does not converge globally. Gilbert and Nocedal [34] established the so-called PRP+ method, in which $\beta_{k}$ is restricted to be nonnegative, denoted as

$$\beta_{k}^{PRP+} = \max\{0, \beta_{k}^{PRP}\}.$$

If the standard Wolfe line search (10) is used, then the PRP+ method attains global convergence and the sufficient descent condition is also satisfied.

Recently, Sun and Liu [32] proposed a new conjugate gradient method, called the TMPRP1 method, by using the VFR formula from Wei et al. [35], in which the search direction is stated as where or .

This method has the attractive property of satisfying the sufficient descent condition independently of any line search, and it attains global convergence if the standard Wolfe line search is used. Compared with the strong Wolfe line search, the standard Wolfe line search requires less computation to find an acceptable step size at each iteration. Hence the standard Wolfe line search increases the effectiveness of the conjugate gradient method [32].

The rest of the paper is organized as follows. In Section 2, the motivation and formula for the construction of the three-term conjugate gradient method are given. In Section 3, we present Algorithm 1.1, which shows the general form of the three-term conjugate gradient method. In Section 4, the sufficient descent condition and the global convergence properties for convex and nonconvex functions are proven. In Section 5, detailed numerical results testing the proposed method are reported.

#### 2. Motivation and Formula

Wei et al. [35] proposed three new formulas which are given in the following: There is an efficient conjugate gradient method named In this formula, the denominator plays an important role in satisfying the sufficient descent condition and performs well in terms of global convergence and numerical results. This motivated us to take the denominator from this formula. Secondly, the PRP [6, 7] method is considered to be one of the most proficient CG parameters due to the properties of its numerator $g_{k}^{T}(g_{k} - g_{k-1})$. If the step taken becomes very small, then $g_{k} - g_{k-1}$ approaches zero, so $\beta_{k}^{PRP} \to 0$ and $d_{k} \approx -g_{k}$; the search then continues as the steepest descent method. So the numerator of the PRP method works efficiently and does not jam.
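The effect of the PRP numerator after a very small step can be seen numerically. The sketch below is illustrative and not taken from the paper: the function names and sample gradients are hypothetical. It compares the PRP parameter with the Fletcher-Reeves parameter when two consecutive gradients are almost identical.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def beta_prp(g_new, g_old):
    """PRP parameter: g_k^T (g_k - g_{k-1}) / ||g_{k-1}||^2."""
    y = [a - b for a, b in zip(g_new, g_old)]
    return dot(g_new, y) / dot(g_old, g_old)


def beta_fr(g_new, g_old):
    """Fletcher-Reeves parameter: ||g_k||^2 / ||g_{k-1}||^2."""
    return dot(g_new, g_new) / dot(g_old, g_old)


# A tiny step leaves the gradient almost unchanged.
g_old = [1.0, 1.0]
g_new = [1.0 + 1e-8, 1.0]

prp = beta_prp(g_new, g_old)
prp_plus = max(0.0, prp)  # nonnegative restriction of Gilbert and Nocedal
fr = beta_fr(g_new, g_old)

print(prp, prp_plus, fr)
```

Here the PRP value is close to zero, so the next direction is close to the negative gradient, while the Fletcher-Reeves value stays close to one and keeps most of the previous direction.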

This motivated us to construct a new modified three-term conjugate gradient method, such as Further, Powell [36] showed that the PRP method can cycle infinitely without approaching a minimum point, even if the step size is chosen to be the least positive minimizer. To overcome this, Gilbert and Nocedal [34] showed their analysis So In , , and , the parameters and have an important role in the sense that when is getting smaller, the numbers of iterations, function evaluations, and gradient evaluations decrease, and when is getting larger, the numbers of iterations, function evaluations, and gradient evaluations also decrease. So we observe that the best value for the parameters is .

#### 3. Algorithm 1.1

*Step 0.* Given an initial point , , , , and set , .

*Step 1.* If , where , then the algorithm stops; otherwise, go to Step 2.

(Note: every norm used in this paper is the Euclidean norm $\|\cdot\|_{2}$.)

*Step 2.* Compute the search direction (19) by using and where where .

*Step 3.* Determine the step size $\alpha_{k}$ by the Wolfe line search (10).

*Step 4.* Compute $x_{k+1} = x_{k} + \alpha_{k} d_{k}$, where $\alpha_{k}$ is given in Step 3 and $d_{k}$ is given in Step 2.

*Step 5.* Set $k := k + 1$ and go to Step 1.
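The overall loop structure of such an algorithm can be sketched as follows. This is a hedged illustration and not the BZAU method: the direction used here is a stand-in, the well-known three-term PRP direction of Zhang et al., $d_{k} = -g_{k} + \beta_{k}^{PRP} d_{k-1} - \theta_{k} y_{k-1}$ with $\theta_{k} = g_{k}^{T} d_{k-1} / \|g_{k-1}\|^{2}$. The bisection-style Wolfe search, the function names, the tolerance, and the quadratic test problem are all assumptions made for the example.

```python
import math


def dot(u, v):
    """Euclidean inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def wolfe_step(f, grad, x, d, rho=0.1, sigma=0.5, max_iter=60):
    """Bisection/expansion search for a step satisfying the Wolfe conditions."""
    fx, slope = f(x), dot(grad(x), d)
    lo, hi, alpha = 0.0, math.inf, 1.0
    for _ in range(max_iter):
        x_new = [xi + alpha * di for xi, di in zip(x, d)]
        if f(x_new) > fx + rho * alpha * slope:    # sufficient decrease fails: shrink
            hi = alpha
            alpha = 0.5 * (lo + hi)
        elif dot(grad(x_new), d) < sigma * slope:  # curvature fails: enlarge
            lo = alpha
            alpha = 2.0 * lo if math.isinf(hi) else 0.5 * (lo + hi)
        else:
            break
    return alpha  # last trial step if no Wolfe step was found in max_iter tries


def three_term_cg(f, grad, x0, eps=1e-6, max_iter=10000):
    """Generic three-term CG loop with a stand-in three-term PRP direction."""
    x = list(x0)
    g = grad(x)
    d = [-gi for gi in g]                          # Step 0: d_0 = -g_0
    for k in range(max_iter):
        if math.sqrt(dot(g, g)) <= eps:            # Step 1: stopping test
            return x, k
        alpha = wolfe_step(f, grad, x, d)          # Step 3: Wolfe line search
        x = [xi + alpha * di for xi, di in zip(x, d)]  # Step 4: new iterate
        g_new = grad(x)
        y = [a - b for a, b in zip(g_new, g)]
        gg = dot(g, g)
        beta = dot(g_new, y) / gg                  # stand-in: PRP parameter
        theta = dot(g_new, d) / gg                 # stand-in: third-term scalar
        # Step 2 for the next iteration: three-term direction
        d = [-gn + beta * di - theta * yi for gn, di, yi in zip(g_new, d, y)]
        g = g_new                                  # Step 5: k := k + 1
    return x, max_iter


def f_quad(x):
    return 0.5 * (x[0] ** 2 + 4.0 * x[1] ** 2 + 10.0 * x[2] ** 2)


def grad_quad(x):
    return [x[0], 4.0 * x[1], 10.0 * x[2]]


x_min, iterations = three_term_cg(f_quad, grad_quad, [1.0, 1.0, 1.0])
print(x_min, iterations)
```

Replacing the two lines marked as stand-ins with other choices of the scalars gives other members of the three-term family while leaving the loop unchanged.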

#### 4. Global Convergence of Modified Three Term

*Assumptions 1.* (A1) The level set is bounded.

(A2) In some neighborhood of , the gradient is Lipschitz continuous on an open convex set that contains ; that is, there exists a positive constant such that Then Assumptions (A1) and (A2) and [32, 34] imply that there exist positive constants and such that

Since is decreasing as , from Assumption (A1) it is shown that the sequence created by Algorithm 1.1 will be contained in a bounded region. Then the sequence is convergent.

Now we will prove the sufficient descent condition independent of line search and also . From (19), Multiplying by , we obtain that is, Hence the sufficient descent condition independent of line search holds.

Now we prove that . As we have , by taking modulus on both sides By the Schwarz inequality we have So, Thus we have
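A descent identity that holds independently of the line search can be illustrated numerically. The sketch below does not use the direction (19) of this paper. As a stand-in it uses the well-known three-term PRP direction of Zhang et al., for which $g_{k}^{T} d_{k} = -\|g_{k}\|^{2}$ holds for any previous direction; the function name and the sample vectors are hypothetical.

```python
import math


def dot(u, v):
    """Euclidean inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def three_term_prp_direction(g_new, g_old, d_old):
    """Stand-in direction: d_k = -g_k + beta * d_{k-1} - theta * y_{k-1}."""
    y = [a - b for a, b in zip(g_new, g_old)]
    gg_old = dot(g_old, g_old)
    beta = dot(g_new, y) / gg_old       # PRP parameter
    theta = dot(g_new, d_old) / gg_old  # scalar of the third term
    return [-gn + beta * do - theta * yi for gn, do, yi in zip(g_new, d_old, y)]


# Arbitrary data: d_old is not produced by any line search.
g_old = [1.0, 2.0, -1.0]
d_old = [0.5, -1.0, 2.0]
g_new = [2.0, -1.0, 0.5]

d_new = three_term_prp_direction(g_new, g_old, d_old)

descent_gap = dot(g_new, d_new) + dot(g_new, g_new)  # zero up to rounding
norm_g = math.sqrt(dot(g_new, g_new))
norm_d = math.sqrt(dot(d_new, d_new))

print(descent_gap, norm_g, norm_d)
```

Because $g^{T} d = -\|g\|^{2}$, the Cauchy-Schwarz inequality gives $\|g\|^{2} = |g^{T} d| \le \|g\| \|d\|$, so the gradient norm cannot exceed the direction norm, which the printed values confirm for this example.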

Lemma 1. *Suppose that Assumptions (A1) and (A2) hold, where is supposed to be an initial point. Consider any method in the form of (2), in which is a descent direction and satisfies the Wolfe condition (10) or the strong Wolfe line search condition (11). Then we have the Zoutendijk condition $\sum_{k=0}^{\infty} (g_{k}^{T} d_{k})^{2} / \|d_{k}\|^{2} < +\infty$, which is normally used to prove the global convergence of the CG method. From (29) the Zoutendijk condition is equivalent to the following inequality:*

*Definition 2 (see [32]).* The function is called uniformly convex on , if there exists a positive constant such that where the function has the Hessian matrix .

We now show the global convergence of Algorithm 1.1 for uniformly convex functions.

Lemma 3. *Let both sequences and be generated by Algorithm 1.1 and suppose that (32) holds; then where , is a positive constant, and is a positive number whose range is , from the Wolfe line search (10).*

*Proof.* Details of the proof can be found in Lemma of [37].

Theorem 4. *Suppose that Assumptions (A1) and (A2) hold and the function is uniformly convex; then *

*Proof.* From (15), (33), and (A2), we have Now, from (18), (33), and (A2), we have Combining (35) and (37) with (19), Now letting , we get , So by (31), we get

We now show the global convergence for nonconvex functions.

Lemma 5. *Let Assumptions (A1) and (A2) hold. Consider the sequence () to be generated by Algorithm 1.1. If there exists a positive constant in such a way that for every , where .*

*Proof.* Since , , and for every , then we get for every , so that is well defined. If then we get . Also and are unit vectors, so as we know , Now from (18), (22), and (A2) Now from (21), (22), and (45), there is a constant as follows: Therefore, from (31) and (46), we have This along with (44) completes the proof.

Theorem 6. *Suppose that Assumptions (A1) and (A2) hold. Then the sequence () generated by Algorithm 1.1 satisfies*

*Proof.* Suppose that the conclusion (48) is not true. Then there exists a positive constant in such a way that .

The proof has the following two parts.

*Part 1.* We noticed that for any and we have , such that Proceeding with the same proof as in Theorem , step 1 from [32], we have

*Part 2.* Taking a bound on the direction . Now from (19) and (46) we have At the beginning of the proof we assume that ; then there exists a constant and also there exists such that . Then which contradicts (A2), (31), and (51). Hence it is proved that .

#### 5. Numerical Results

In this part we compare the numerical results of the proposed three-term BZAU (Bakhtawar, Zabidin, Ahmad, and Ummu) method with those of the recently developed TMPRP1 method. The Wolfe line search (10) is used, and the values of the parameters for the BZAU and TMPRP1 methods are *μ* = 2, 10^{−4}; *η* = 1, 0; *ρ* = 0.1, 0.1; and *σ* = 0.5, 0.5, respectively. The code was written in Matlab 7.1 and run on an i5 computer with a 2.40 GHz CPU and 2.0 GB of RAM. We test the functions taken from [38] with dimension ranges . The main purpose of selecting a large number of test functions is to test the unconstrained optimization algorithms properly. Dantzig (1914–2005) said the final test of a theory is its capacity to solve the problems which originated it. This is one of the main reasons we select large-scale unconstrained optimization problems to test the theoretical progress in numerical form through mathematical programming [38].

Moré et al. [39] argued that judging the efficiency of a method or algorithm on a small number of test functions is not suitable, because it can lead to the choice of an unfavorable algorithm. Testing a method or algorithm on a large number of test functions produces a large amount of data, from which we can interpret which method or algorithm is more efficient and robust. However, the number of test functions should be neither very large nor very small, so a benchmark of 75 test functions is chosen to test the efficiency of any method.

In practice, optimizers need to evaluate nonlinear optimization methods numerically. A proof of global convergence alone is not enough to determine the reliability and efficiency of a method. As a result, the robustness of a method is established by testing it on a large number of test problems [38].

In global convergence property is used in case of proving convex function and property is used for proving nonconvex function. But in the numerical part is used for a comparison with the TMPRP1 method. The TMPRP1 method possesses the sufficient descent property without any line search. Theoretically, the TMPRP1 method is well established and converges globally. In numerical computation, the TMPRP1 method was tested on a benchmark of 75 test functions and showed promising results. Hence the TMPRP1 method is compared with our BZAU method.

In Table 1, the number of iterations, number of function evaluations, number of gradient evaluations, and CPU time are denoted by NI, NF, GE, and CT, respectively. If the CT exceeds 500 seconds and the NI is more than 10000 iterations, the run is recorded as a failure, denoted F. This convention is common in the literature. For most of the functions the result is obtained within these limits; any function that exceeds them is marked F.