#### Abstract

To solve unconstrained optimization problems, we normally use a conjugate gradient (CG) method, since it does not require the storage of second-derivative information, unlike Newton's method or the Broyden–Fletcher–Goldfarb–Shanno (BFGS) method. Recently, a modification of the Polak–Ribière–Polyak (PRP) method with a new restart condition was proposed, giving the so-called AZPRP method. In this paper, we propose a new modification of the AZPRP CG method for solving large-scale unconstrained optimization problems, based on a modified restart condition. The new parameter satisfies the descent property, and global convergence is established under the strong Wolfe–Powell line search. The numerical results show that the new CG method is strongly competitive with the CG_Descent method. The comparisons are made over a set of more than 140 standard test functions from the CUTEst library, measuring the number of iterations and the CPU time.

#### 1. Introduction

The conjugate gradient (CG) method aims to find a solution of unconstrained optimization problems. Consider the following problem:

$$\min_{x \in \mathbb{R}^n} f(x), \tag{1}$$

where $f:\mathbb{R}^n \to \mathbb{R}$ is continuously differentiable and its gradient $g(x) = \nabla f(x)$ is available. The iterative method is given by the following sequence:

$$x_{k+1} = x_k + \alpha_k d_k, \quad k = 0, 1, 2, \ldots, \tag{2}$$

where $x_0$ is the starting point and $\alpha_k > 0$ is a step length. The search direction of the CG method is defined as follows:

$$d_{k+1} = -g_{k+1} + \beta_k d_k, \quad d_0 = -g_0, \tag{3}$$

where $g_k = g(x_k)$ and $\beta_k$ is a scalar parameter.
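As a concrete illustration (our sketch, not the authors' implementation), the iteration (2)-(3) can be written in Python with the line search and the choice of $\beta_k$ left as pluggable pieces; a steepest-descent restart guards against non-descent directions:

```python
import numpy as np

def backtracking(f, g, x, d, delta=1e-4, rho=0.5):
    """Simple Armijo backtracking, used here as a stand-in line search."""
    alpha, fx, gTd = 1.0, f(x), g(x) @ d
    while f(x + alpha * d) > fx + delta * alpha * gTd:
        alpha *= rho
    return alpha

def cg_minimize(f, g, x0, beta_fn, tol=1e-6, max_iter=1000):
    """Nonlinear CG: x_{k+1} = x_k + alpha_k d_k, d_{k+1} = -g_{k+1} + beta_k d_k."""
    x = np.asarray(x0, dtype=float)
    gk = g(x)
    d = -gk
    for _ in range(max_iter):
        if np.linalg.norm(gk) <= tol:
            break
        alpha = backtracking(f, g, x, d)
        x = x + alpha * d
        g_new = g(x)
        d = -g_new + beta_fn(g_new, gk, d) * d
        if g_new @ d >= 0:  # restart with steepest descent if not a descent direction
            d = -g_new
        gk = g_new
    return x
```

For example, the PRP choice below corresponds to `beta_fn = lambda gn, go, d: (gn @ (gn - go)) / (go @ go)`.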

To obtain the step length, we normally use an inexact line search, since the exact line search, which is defined as follows,

$$f(x_k + \alpha_k d_k) = \min_{\alpha \ge 0} f(x_k + \alpha d_k), \tag{4}$$

requires many iterations to obtain the step length. Normally, we use the strong version of the Wolfe–Powell (SWP) [1, 2] line search, which is given by

$$f(x_k + \alpha_k d_k) \le f(x_k) + \delta \alpha_k g_k^T d_k, \tag{5}$$

$$|g(x_k + \alpha_k d_k)^T d_k| \le \sigma |g_k^T d_k|, \tag{6}$$

where $0 < \delta < \sigma < 1$.
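As a small illustration (ours, not from the paper), the two SWP tests (5) and (6) can be checked directly for a trial step:

```python
import numpy as np

def satisfies_swp(f, grad, x, d, alpha, delta=1e-4, sigma=0.1):
    """Check the strong Wolfe-Powell conditions for a trial step alpha:
       f(x + alpha d) <= f(x) + delta * alpha * g(x)^T d   (sufficient decrease)
       |g(x + alpha d)^T d| <= sigma * |g(x)^T d|          (strong curvature)
    with 0 < delta < sigma < 1."""
    gTd = grad(x) @ d
    armijo = f(x + alpha * d) <= f(x) + delta * alpha * gTd
    curvature = abs(grad(x + alpha * d) @ d) <= sigma * abs(gTd)
    return armijo and curvature
```

On the quadratic $f(x) = \tfrac{1}{2}\|x\|^2$ with $d = -g(x)$, the exact minimizing step $\alpha = 1$ satisfies both conditions, while a very short step fails the curvature test (6).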

The weak Wolfe–Powell (WWP) line search is defined by (5) and

$$g(x_k + \alpha_k d_k)^T d_k \ge \sigma g_k^T d_k, \tag{7}$$

where $0 < \delta < \sigma < 1$. The most famous choices of the parameter $\beta_k$ are the Hestenes–Stiefel (HS) [3], Fletcher–Reeves (FR) [4], and Polak–Ribière–Polyak (PRP) [5] formulas, which are given by

$$\beta_k^{HS} = \frac{g_{k+1}^T y_k}{d_k^T y_k}, \quad \beta_k^{FR} = \frac{\|g_{k+1}\|^2}{\|g_k\|^2}, \quad \beta_k^{PRP} = \frac{g_{k+1}^T y_k}{\|g_k\|^2}, \tag{8}$$

where $y_k = g_{k+1} - g_k$.
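For instance, the three classical coefficients in (8) can be computed directly from the gradients and the previous direction (a small illustration using the definitions above):

```python
import numpy as np

def beta_hs(g_new, g_old, d):
    """Hestenes-Stiefel: g_{k+1}^T y_k / d_k^T y_k, with y_k = g_{k+1} - g_k."""
    y = g_new - g_old
    return (g_new @ y) / (d @ y)

def beta_fr(g_new, g_old):
    """Fletcher-Reeves: ||g_{k+1}||^2 / ||g_k||^2."""
    return (g_new @ g_new) / (g_old @ g_old)

def beta_prp(g_new, g_old):
    """Polak-Ribiere-Polyak: g_{k+1}^T y_k / ||g_k||^2."""
    y = g_new - g_old
    return (g_new @ y) / (g_old @ g_old)
```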

Powell [6] showed that there exists a nonconvex function on which the PRP method does not converge globally. Gilbert and Nocedal [7] showed that the method with $\beta_k = \max\{\beta_k^{PRP}, 0\}$ is globally convergent under the WWP line search, provided the descent property is satisfied.

Al-Baali [8] proved that the CG method with the FR coefficient converges under the SWP line search when $\sigma < 1/2$. Hager and Zhang [9, 10] presented a new CG parameter with the descent property, i.e., $g_k^T d_k \le -\tfrac{7}{8}\|g_k\|^2$. This formula is given as follows:

$$\beta_k^{HZ} = \max\{\beta_k^N, \eta_k\}, \quad \beta_k^N = \frac{1}{d_k^T y_k}\left(y_k - 2 d_k \frac{\|y_k\|^2}{d_k^T y_k}\right)^T g_{k+1}, \tag{9}$$

where $\eta_k = \dfrac{-1}{\|d_k\| \min\{\eta, \|g_k\|\}}$ and $\eta > 0$ is a constant. In their numerical experiments, they set $\eta = 0.01$ in (9). Al-Baali et al. [11] proposed a new three-term CG method (G3TCG) and compared it with $\beta_k^{HZ}$.

The need for speed, low memory requirements, few iterations, few function and gradient evaluations, and robustness in solving unconstrained optimization problems has prompted the continued development of CG methods; the reader is referred to [10–15] for more information on these new formulas.

#### 2. The New Formula and the Algorithm

Alhawarat et al. [15] presented the following simple formula:

$$\beta_k^{AZPRP} = \begin{cases} \dfrac{\|g_{k+1}\|^2 - \mu_k |g_{k+1}^T g_k|}{\|g_k\|^2}, & \text{if } \|g_{k+1}\|^2 > \mu_k |g_{k+1}^T g_k|, \\[2mm] 0, & \text{otherwise}, \end{cases} \tag{10}$$

where $\mu_k = \dfrac{\|x_{k+1} - x_k\|}{\|y_k\|}$.
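A minimal sketch of this coefficient (assuming the truncated PRP-type form with $\mu_k = \|s_k\|/\|y_k\|$ described above; the zero branch acts as a restart):

```python
import numpy as np

def beta_azprp(g_new, g_old, s, y):
    """Truncated PRP-type coefficient: the quantity
    (||g_{k+1}||^2 - mu_k |g_{k+1}^T g_k|) / ||g_k||^2 when it is positive,
    and 0 otherwise (restart), with mu_k = ||s_k|| / ||y_k||."""
    mu = np.linalg.norm(s) / np.linalg.norm(y)
    num = g_new @ g_new - mu * abs(g_new @ g_old)
    return num / (g_old @ g_old) if num > 0 else 0.0
```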

Dai and Liao [12] presented the following formula:

$$\beta_k^{DL} = \frac{g_{k+1}^T y_k}{d_k^T y_k} - t\,\frac{g_{k+1}^T s_k}{d_k^T y_k}, \tag{11}$$

where $t > 0$ is a constant and $s_k = x_{k+1} - x_k$.

The new formula, denoted $\beta_k^{A}$, is a modification of $\beta_k^{AZPRP}$ in which the restart condition is stated in terms of the Lipschitz constant of the gradient.

From these definitions, we obtain the relations used in Algorithm 1.

#### 3. Convergence Analysis of the New Coefficient with the CG Method

*Assumption 1. *(A) The level set $\Omega = \{x : f(x) \le f(x_0)\}$ is bounded; that is, a positive constant $B$ exists such that

$$\|x\| \le B \quad \text{for all } x \in \Omega. \tag{14}$$

(B) In some neighbourhood $N$ of $\Omega$, $f$ is continuously differentiable and its gradient is available and Lipschitz continuous; that is, for all $x, y \in N$ there exists a constant $L > 0$ such that

$$\|g(x) - g(y)\| \le L\|x - y\|. \tag{15}$$

This assumption implies that there exists a positive constant $\gamma$ such that

$$\|g(x)\| \le \gamma \quad \text{for all } x \in \Omega. \tag{16}$$

The descent condition

$$g_k^T d_k < 0 \tag{17}$$

plays an important role in the CG method. The sufficient descent condition proposed by Al-Baali [8] is a strengthening of (17), as follows:

$$g_k^T d_k \le -c\|g_k\|^2, \tag{18}$$

where $c \in (0, 1)$. Note that the general form of the sufficient descent condition is (18) with $c > 0$.

##### 3.1. Global Convergence of the New Coefficient with the SWP Line Search

The following theorem shows that $\beta_k^{A}$ ensures that the sufficient descent condition (18) is satisfied with the SWP line search. The proof is similar to that presented in [8].

Theorem 1. *Let $\{x_k\}$ and $\{d_k\}$ be generated using (2), (3), and $\beta_k^{A}$, where $\alpha_k$ is computed by the SWP line search (5) and (6). If $\sigma < 1/2$, then the sufficient descent condition (18) holds.*

Algorithm 1 shows the steps to obtain the solution of the optimization problem using the strong Wolfe–Powell line search.


*Proof. *By multiplying (3) by $g_{k+1}^T$, we obtain

$$g_{k+1}^T d_{k+1} = -\|g_{k+1}\|^2 + \beta_k^{A} g_{k+1}^T d_k. \tag{19}$$

Divide (19) by $\|g_{k+1}\|^2$; using

$$\beta_k^{A} \le \frac{\|g_{k+1}\|^2}{\|g_k\|^2} \tag{20}$$

and (12), we obtain

$$\frac{g_{k+1}^T d_{k+1}}{\|g_{k+1}\|^2} = -1 + \beta_k^{A} \frac{g_{k+1}^T d_k}{\|g_{k+1}\|^2}. \tag{21}$$

From (3), we obtain $g_0^T d_0 = -\|g_0\|^2$, so the bound

$$-\sum_{j=0}^{k} \sigma^j \le \frac{g_k^T d_k}{\|g_k\|^2} \le -2 + \sum_{j=0}^{k} \sigma^j$$

holds for $k = 0$. Assume that it is true up to $k$, i.e., for $i = 0, 1, \ldots, k$. Repeating the process for (21) and using the SWP condition (6), we obtain

$$-1 - \sigma \sum_{j=0}^{k} \sigma^j \le \frac{g_{k+1}^T d_{k+1}}{\|g_{k+1}\|^2} \le -1 + \sigma \sum_{j=0}^{k} \sigma^j = -2 + \sum_{j=0}^{k+1} \sigma^j.$$

As

$$\sum_{j=0}^{k+1} \sigma^j < \sum_{j=0}^{\infty} \sigma^j = \frac{1}{1 - \sigma},$$

hence

$$\frac{g_{k+1}^T d_{k+1}}{\|g_{k+1}\|^2} < -2 + \frac{1}{1 - \sigma},$$

and when $\sigma < 1/2$, we obtain $-2 + \frac{1}{1-\sigma} < 0$. Let $c = 2 - \frac{1}{1-\sigma} > 0$; then $g_{k+1}^T d_{k+1} \le -c\|g_{k+1}\|^2$. The proof is complete.

Theorem 2. *Let $\{x_k\}$ and $\{d_k\}$ be obtained by using (2), (3), and $\beta_k^{A} = 0$ (the restart case), where $\alpha_k$ is computed by the SWP line search (5) and (6); then the descent condition holds.*

*Proof. *By multiplying (3) by $g_{k+1}^T$ and substituting $\beta_k^{A} = 0$, we obtain

$$g_{k+1}^T d_{k+1} = -\|g_{k+1}\|^2 < 0,$$

which completes the proof.

Zoutendijk [16] presented a lemma that is useful for establishing the global convergence of CG methods. The condition is given as follows.

Lemma 1. *Let Assumption 1 hold and consider any method of the form (2) and (3), where $\alpha_k$ is obtained by the WWP line search (5) and (7), in which the search direction is a descent direction. Then, the following condition holds:*

$$\sum_{k=0}^{\infty} \frac{(g_k^T d_k)^2}{\|d_k\|^2} < \infty.$$

Theorem 3. *Suppose Assumption 1 holds. Consider any method of the form (2) and (3) with the new formula (12), in which $\alpha_k$ is obtained from the SWP line search (5) and (6) with $\sigma < 1/2$. Then,*

$$\liminf_{k \to \infty} \|g_k\| = 0.$$

*The proof is similar to that presented in [8].*

*Proof. *We will prove the theorem by contradiction. Assume that the conclusion is not true; then a constant $\epsilon > 0$ exists such that

$$\|g_k\| \ge \epsilon \quad \text{for all } k. \tag{30}$$

Squaring both sides of equation (3), we obtain

$$\|d_{k+1}\|^2 = \|g_{k+1}\|^2 - 2\beta_k g_{k+1}^T d_k + \beta_k^2 \|d_k\|^2. \tag{31}$$

Divide (31) by $(g_{k+1}^T d_{k+1})^2$; we get

$$\frac{\|d_{k+1}\|^2}{(g_{k+1}^T d_{k+1})^2} = \frac{\|g_{k+1}\|^2 - 2\beta_k g_{k+1}^T d_k + \beta_k^2 \|d_k\|^2}{(g_{k+1}^T d_{k+1})^2}. \tag{32}$$

Using (6), (12), and (32), together with the sufficient descent condition (18), there exists a constant $M > 0$ such that

$$\frac{\|d_{k+1}\|^2}{(g_{k+1}^T d_{k+1})^2} \le \frac{\|d_k\|^2}{(g_k^T d_k)^2} + \frac{M}{\|g_{k+1}\|^2}. \tag{33}$$

Repeating the process for (33) and using the relationship $\|d_0\|^2 = \|g_0\|^2$ yields

$$\frac{\|d_{k+1}\|^2}{(g_{k+1}^T d_{k+1})^2} \le M \sum_{j=0}^{k+1} \frac{1}{\|g_j\|^2}.$$

From (30), we obtain

$$\frac{\|d_{k+1}\|^2}{(g_{k+1}^T d_{k+1})^2} \le \frac{M(k+2)}{\epsilon^2}.$$

Therefore,

$$\sum_{k=0}^{\infty} \frac{(g_k^T d_k)^2}{\|d_k\|^2} = \infty.$$

This result contradicts Lemma 1; thus $\liminf_{k \to \infty} \|g_k\| = 0$. The proof is complete.

#### 4. Numerical Results

To investigate the effectiveness of the new parameter, several test problems from the CUTEst library [17], listed in Table 1, are chosen. We performed a comparison with CG_Descent 5.3 based on the CPU time and the number of iterations. We employed the SWP line search with the parameter values given in [1, 2]. The modified CG_Descent 6.8 with the memory (mem) option set to zero is employed to obtain all results. The code can be downloaded from Hager's web page: http://users.clas.ufl.edu/hager/papers/Software/.

The CG_Descent 5.3 results are obtained by running CG_Descent 6.8 with the memory option set to zero. The host computer is an AMD A4-7210 with 4 GB of RAM. The results are shown in Figures 1 and 2, in which the performance-profile measure introduced by Dolan and Moré [18] is employed. As shown in Figure 1, formula A strongly outperforms CG_Descent in the number of iterations. In Figure 2, we see that the new CG formula A is strongly competitive with CG_Descent in CPU time.
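For reference (our sketch, not the authors' code), the Dolan–Moré profile used in Figures 1 and 2 computes, for each solver, the fraction of problems it solves within a factor $\tau$ of the best solver on that problem:

```python
import numpy as np

def performance_profile(T, taus):
    """T: (n_problems, n_solvers) array of costs (e.g., CPU time or iterations).
    Returns, for each tau in taus and each solver s, the fraction of problems
    where the ratio t_{p,s} / min_s t_{p,s} is <= tau."""
    ratios = T / T.min(axis=1, keepdims=True)
    return np.array([[np.mean(ratios[:, s] <= tau) for s in range(T.shape[1])]
                     for tau in taus])
```

A solver whose curve sits higher and reaches 1.0 sooner is the more efficient and more robust one, which is how Figures 1 and 2 are read.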

##### 4.1. Multimodal Function with Its Graph

In this section, we present the six-hump camel back function, a multimodal function used to test the efficiency of optimization algorithms. The function is defined as follows:

$$f(x_1, x_2) = \left(4 - 2.1x_1^2 + \frac{x_1^4}{3}\right)x_1^2 + x_1 x_2 + \left(-4 + 4x_2^2\right)x_2^2.$$

The number of variables (*n*) equals 2. This function has six local minima, two of which are global. Thus, this function is a multimodal function commonly used to test for global minima. The global minima are $(x_1, x_2) = (0.0898, -0.7126)$ and $(-0.0898, 0.7126)$, and the function value there is $\approx -1.0316$. As its name describes, this function looks like the back of an upside-down camel with six humps (see Figure 3 for a three-dimensional graph); for more information about two-dimensional test functions, the reader can refer to [19].
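The function and its stated global minima are easy to verify numerically (a quick check, not from the paper):

```python
def six_hump_camel(x1, x2):
    """Six-hump camel back function (n = 2); two of its six local minima are global."""
    return (4 - 2.1 * x1**2 + x1**4 / 3) * x1**2 + x1 * x2 + (-4 + 4 * x2**2) * x2**2
```

By the symmetry $f(-x_1, -x_2) = f(x_1, x_2)$, the two global minimizers give exactly the same value.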

Finally, note that the CG method can be applied to image restoration problems, neural network training, and other applications. For more information, the reader can refer to [20, 21].

#### 5. Conclusions

In this study, a modified version of the CG algorithm (A) is proposed and its performance is investigated. The modified formula restarts based on the value of the Lipschitz constant. Global convergence is established under the SWP line search. Our numerical results show that the new coefficient produces efficient and competitive results compared with other methods, such as CG_Descent 5.3. In the future, the new version of the CG method will be combined with a feed-forward neural network (the back-propagation (BP) algorithm) to improve the training process and produce a fast multilayer training algorithm. This should reduce the time needed to train neural networks when the training set is massive.

#### Data Availability

The data used to support the findings of this study are included within the article.

#### Conflicts of Interest

The authors declare that they have no conflicts of interest regarding the publication of this paper.

#### Acknowledgments

The authors would like to thank Universiti Malaysia Terengganu for supporting this work.