Special Issue: Advances in Barycentric Interpolation Methods and their Applications
An Efficient Modified AZPRP Conjugate Gradient Method for Large-Scale Unconstrained Optimization Problem
To solve unconstrained optimization problems, the conjugate gradient (CG) method is commonly used because, unlike Newton’s method or the Broyden–Fletcher–Goldfarb–Shanno (BFGS) method, it does not need to store second-derivative information. Recently, a modification of the Polak and Ribiere method with a new restart condition was proposed, giving the so-called AZPRP method. In this paper, we propose a new modification of the AZPRP CG method for large-scale unconstrained optimization problems, based on a modified restart condition. The new parameter satisfies the descent property, and global convergence is established under the strong Wolfe-Powell line search. The numerical results show that the new CG method is strongly competitive with the CG_Descent method. The comparisons are made on a set of more than 140 standard functions from the CUTEst library and cover the number of iterations and CPU time.
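The conjugate gradient (CG) method aims to find a solution of optimization problems without constraints. Suppose that the following optimization problem is considered:
$$\min_{x \in \mathbb{R}^n} f(x), \tag{1}$$
where $f:\mathbb{R}^n \to \mathbb{R}$ is continuous and differentiable, and the gradient $g(x) = \nabla f(x)$ is available. The iterative method is given by the following sequence:
$$x_{k+1} = x_k + \alpha_k d_k, \quad k = 0, 1, 2, \ldots, \tag{2}$$
where $x_0$ is the starting point and $\alpha_k > 0$ is a step length. The search direction $d_k$ of the CG method is defined as follows:
$$d_k = \begin{cases} -g_k, & k = 0, \\ -g_k + \beta_k d_{k-1}, & k \ge 1, \end{cases} \tag{3}$$
where $g_k = g(x_k)$ and $\beta_k$ is a parameter.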
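To obtain the step length, we normally use an inexact line search, since the exact line search, which is defined as follows,
$$f(x_k + \alpha_k d_k) = \min_{\alpha \ge 0} f(x_k + \alpha d_k), \tag{4}$$
requires many iterations to obtain the step length. Normally, we use the strong version of the Wolfe-Powell (SWP) [1, 2] line search, which is given by
$$f(x_k + \alpha_k d_k) \le f(x_k) + \delta \alpha_k g_k^T d_k, \tag{5}$$
$$\left| g(x_k + \alpha_k d_k)^T d_k \right| \le \sigma \left| g_k^T d_k \right|, \tag{6}$$
where $0 < \delta < \sigma < 1$.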
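The two SWP tests can be checked numerically for a trial step. The sketch below is only an illustration: the helper name `strong_wolfe_ok`, the default values of `delta` and `sigma`, and the quadratic test function are assumptions made for this example, not settings taken from the paper.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def strong_wolfe_ok(f, grad, x, d, alpha, delta=1e-4, sigma=0.1):
    """Return True if the step alpha along d satisfies both SWP tests.

    delta and sigma are illustrative defaults with 0 < delta < sigma < 1.
    """
    slope0 = dot(grad(x), d)  # directional derivative at alpha = 0
    x_new = [xi + alpha * di for xi, di in zip(x, d)]
    sufficient_decrease = f(x_new) <= f(x) + delta * alpha * slope0
    strong_curvature = abs(dot(grad(x_new), d)) <= sigma * abs(slope0)
    return sufficient_decrease and strong_curvature


# Hypothetical test problem: f(x) = x1^2 + 2*x2^2, steepest descent from (1, 1).
def f(x):
    return x[0] ** 2 + 2.0 * x[1] ** 2


def grad(x):
    return [2.0 * x[0], 4.0 * x[1]]


x0 = [1.0, 1.0]
d0 = [-gi for gi in grad(x0)]

print(strong_wolfe_ok(f, grad, x0, d0, 5.0 / 18.0))  # exact minimizer along d0
print(strong_wolfe_ok(f, grad, x0, d0, 1e-3))        # too short: curvature test fails
print(strong_wolfe_ok(f, grad, x0, d0, 1.0))         # too long: decrease test fails
```

For this quadratic the exact minimizer along the steepest-descent direction is 5/18, so it passes both tests, while a very short step violates the curvature test and a very long one violates the sufficient-decrease test.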
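The weak Wolfe-Powell (WWP) line search is defined by (5) and
$$g(x_k + \alpha_k d_k)^T d_k \ge \sigma g_k^T d_k, \tag{7}$$
where $0 < \delta < \sigma < 1$. Well-known choices of the parameter $\beta_k$ are the Hestenes–Stiefel (HS), Fletcher–Reeves (FR), and Polak–Ribière–Polyak (PRP) formulas, which are given by
$$\beta_k^{HS} = \frac{g_k^T y_{k-1}}{d_{k-1}^T y_{k-1}}, \quad \beta_k^{FR} = \frac{\|g_k\|^2}{\|g_{k-1}\|^2}, \quad \beta_k^{PRP} = \frac{g_k^T y_{k-1}}{\|g_{k-1}\|^2}, \tag{8}$$
where $y_{k-1} = g_k - g_{k-1}$.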
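As a small illustration, the three classical coefficients can be written in their standard textbook forms, with $y_{k-1} = g_k - g_{k-1}$. The function names and the sample vectors below are hypothetical and chosen only so the values are easy to check by hand.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def beta_hs(g, g_prev, d_prev):
    """Hestenes-Stiefel: g_k^T y_{k-1} / (d_{k-1}^T y_{k-1})."""
    y = [a - b for a, b in zip(g, g_prev)]
    return dot(g, y) / dot(d_prev, y)


def beta_fr(g, g_prev):
    """Fletcher-Reeves: ||g_k||^2 / ||g_{k-1}||^2."""
    return dot(g, g) / dot(g_prev, g_prev)


def beta_prp(g, g_prev):
    """Polak-Ribiere-Polyak: g_k^T y_{k-1} / ||g_{k-1}||^2."""
    y = [a - b for a, b in zip(g, g_prev)]
    return dot(g, y) / dot(g_prev, g_prev)


# Hypothetical data: previous gradient, previous direction, current gradient.
g_prev = [2.0, 4.0]
d_prev = [-2.0, -4.0]
g = [1.0, -1.0]

print(beta_hs(g, g_prev, d_prev))  # 4/22
print(beta_fr(g, g_prev))          # 2/20
print(beta_prp(g, g_prev))         # 4/20
```

Here $y_{k-1} = (-1, -5)$, so $g_k^T y_{k-1} = 4$, $d_{k-1}^T y_{k-1} = 22$, and $\|g_{k-1}\|^2 = 20$.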
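Powell  shows that there exists a nonconvex function such that the PRP method does not globally converge. Gilbert and Nocedal  show that if the PRP coefficient is restricted to nonnegative values, $\beta_k = \max\{0, \beta_k^{PRP}\}$, and the WWP conditions and the descent property are satisfied, then the method is globally convergent.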
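Al-Baali  proved that the CG method with the FR coefficient is convergent with the SWP line search when $\sigma < 1/2$. Hager and Zhang [9, 10] presented a new CG parameter with the descent property, i.e., $g_k^T d_k \le -\frac{7}{8}\|g_k\|^2$. This formula is given as follows:
$$\beta_k^{HZ} = \max\{\beta_k^{N}, \eta_k\}, \tag{9}$$
where $\beta_k^{N} = \frac{1}{d_{k-1}^T y_{k-1}} \left( y_{k-1} - 2 d_{k-1} \frac{\|y_{k-1}\|^2}{d_{k-1}^T y_{k-1}} \right)^T g_k$; $\eta_k = -\frac{1}{\|d_{k-1}\| \min\{\eta, \|g_{k-1}\|\}}$; and $\eta > 0$ is a constant. In the numerical experiments, they set $\eta = 0.01$ in (9). Al-Baali et al.  compared this formula with a new three-term CG method (G3TCG).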
Improvements in speed, memory requirements, number of iterations, function evaluations, gradient evaluations, and robustness in solving unconstrained optimization problems have prompted the development of new CG formulas; readers are referred to [10–15] for more information on these formulas.
2. The New Formula and the Algorithm
Alhawarat et al.  presented the following simple formula:
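Dai and Liao  presented the following formula:
$$\beta_k^{DL} = \frac{g_k^T y_{k-1} - t\, g_k^T s_{k-1}}{d_{k-1}^T y_{k-1}}, \tag{11}$$
where $s_{k-1} = x_k - x_{k-1}$ and $t \ge 0$.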
The new formula is a modification of and is defined as follows:where and .
We obtain the following relations (Algorithm 1):
3. Convergence Analysis of Coefficient with CG Method
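Assumption 1. (A) The level set $\Omega = \{x \in \mathbb{R}^n : f(x) \le f(x_0)\}$ is bounded; that is, a positive constant $\rho$ exists such that
$$\|x\| \le \rho, \quad \forall x \in \Omega.$$
(B) In some neighbourhood $N$ of $\Omega$, $f$ is continuous, its gradient is available, and the gradient is Lipschitz continuous; that is, there exists a constant $L > 0$ such that
$$\|g(x) - g(y)\| \le L \|x - y\|, \quad \forall x, y \in N.$$
This assumption implies that there exists a positive constant $\gamma$ such that
$$\|g(x)\| \le \gamma, \quad \forall x \in N.$$
The descent condition
$$g_k^T d_k < 0 \tag{17}$$
plays an important role in the CG method. The sufficient descent condition proposed by Al-Baali  is a modification of (17) as follows:
$$g_k^T d_k \le -c \|g_k\|^2, \tag{18}$$
where $c \in (0, 1)$. Note that the general form of the sufficient descent condition is (18) with $c > 0$.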
3.1. Global Convergence for with the SWP Line Search
The following theorem demonstrates that ensures that the sufficient descent condition (21) is satisfied with the SWP line search.
The following theorem shows that satisfies the descent condition. The proof is similar to that presented in .
Algorithm 1 shows the steps to obtain the solution of the optimization problem using the strong Wolfe-Powell line search.
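The overall structure of a CG iteration with a strong Wolfe-Powell line search can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' Algorithm 1: the coefficient is passed in as a function, a nonnegative PRP coefficient is used as a stand-in because the new formula is not reproduced here, the line search is a simple bracket-and-bisect routine, and all names, tolerances, and the quadratic test problem are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def strong_wolfe_step(f, grad, x, d, delta=1e-4, sigma=0.1, max_iter=60):
    """Bracket-and-bisect search for a step satisfying the SWP conditions."""
    f0, slope0 = f(x), dot(grad(x), d)
    lo, hi, alpha = 0.0, None, 1.0
    for _ in range(max_iter):
        x_new = [xi + alpha * di for xi, di in zip(x, d)]
        slope = dot(grad(x_new), d)
        if f(x_new) > f0 + delta * alpha * slope0:
            hi = alpha          # sufficient decrease fails: step too long
        elif abs(slope) <= sigma * abs(slope0):
            return alpha        # both SWP conditions hold
        elif slope > 0:
            hi = alpha          # passed a minimizer along d
        else:
            lo = alpha          # slope still strongly negative: step too short
        alpha = 2.0 * alpha if hi is None else 0.5 * (lo + hi)
    return alpha                # fallback if no acceptable step was found


def beta_prp_plus(g, g_prev, d_prev):
    """Stand-in coefficient: nonnegative PRP, max(0, beta_PRP)."""
    y = [a - b for a, b in zip(g, g_prev)]
    return max(0.0, dot(g, y) / dot(g_prev, g_prev))


def cg_minimize(f, grad, x0, beta_fn, tol=1e-6, max_iter=1000):
    """Generic nonlinear CG loop: x_{k+1} = x_k + alpha_k d_k."""
    x = list(x0)
    g = grad(x)
    d = [-gi for gi in g]                      # first direction: steepest descent
    for k in range(max_iter):
        if math.sqrt(dot(g, g)) <= tol:
            return x, k
        alpha = strong_wolfe_step(f, grad, x, d)
        x = [xi + alpha * di for xi, di in zip(x, d)]
        g_new = grad(x)
        beta = beta_fn(g_new, g, d)
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        if dot(g_new, d) >= 0:                 # safeguard: restart if not descent
            d = [-gn for gn in g_new]
        g = g_new
    return x, max_iter


# Hypothetical convex quadratic with minimizer (1, -2).
def f(x):
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2


def grad(x):
    return [2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)]


x_star, iters = cg_minimize(f, grad, [0.0, 0.0], beta_prp_plus)
print(x_star, iters)
```

Passing the coefficient as `beta_fn` keeps the loop independent of the particular formula, so a different coefficient can be substituted without touching the line search or the restart safeguard.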
Descent condition is (18) with c > 0.
Proof. By multiplying () by , we obtainDivide (19) by ; usingand (12), we obtainFrom (3), we obtain . Assume that it is true until i.e., for . Repeating the process for (21), we obtainAshence,and when , we obtain . Let , thenThe proof is complete.
Proof. By multiplying (3) by and substituting , we obtain
which completes the proof.
Zoutendijk  presented a useful lemma for the global convergence property of the CG method. The condition is given as follows.
Lemma 1. Let Assumption 1 hold and consider any method in the form of (2) and (3), where $\alpha_k$ is obtained by the WWP line search (5) and (7), and the search direction $d_k$ is a descent direction. Then, the following condition holds:
$$\sum_{k=0}^{\infty} \frac{(g_k^T d_k)^2}{\|d_k\|^2} < \infty.$$
Theorem 3. Suppose Assumption 1 holds. Consider any form of equations (2) and (3), with the new formula (12), in which is obtained from the SWP line search (5) and (6) with . Then,The proof is similar to that presented in .
Proof. We will prove the theorem by contradiction. Assume that the conclusion is not true, then a constant exists such thatSquaring both sides of equation (3), we obtainDivide (31) by , we getUsing (6), (12), and (32), we obtainRepeating the process for (33) and using the relationship yieldsFrom (33), we obtainTherefore,This result contradicts (32), thus . The proof is complete.
4. Numerical Results
To investigate the effectiveness of the new parameter, several test problems from CUTEst , listed in Table 1, are chosen. We performed a comparison with CG_Descent 5.3 based on the CPU time and the number of iterations. We employed the SWP line search as described in [1, 2] with and . A modified CG_Descent 6.8 in which the memory (mem) equals zero is employed to obtain all results. The code can be downloaded from Hager's web page: http://users.clas.ufl.edu/hager/papers/Software/.
The CG_Descent 5.3 results are obtained by running CG_Descent 6.8 with the memory set to zero. The host computer is an AMD A4-7210 with 4 GB of RAM. The results are shown in Figures 1 and 2, in which the performance measure introduced by Dolan and Moré  was employed. As shown in Figure 1, formula A strongly outperforms CG_Descent in the number of iterations. In Figure 2, we notice that the new CG formula A is strongly competitive with CG_Descent.
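For readers unfamiliar with the Dolan and Moré measure, the sketch below shows how a performance profile is computed: for each solver, it reports the fraction of problems on which that solver's cost is within a factor $\tau$ of the best cost achieved by any solver. The function name, solver labels, and cost values are made up purely for illustration and are not results from this paper.

```python
import math


def performance_profile(costs, taus):
    """Fraction of problems each solver solves within a factor tau of the best.

    costs maps a solver name to a list of costs (iterations or CPU time),
    one entry per problem, with math.inf marking a failure.
    """
    solvers = list(costs)
    n_problems = len(costs[solvers[0]])
    best = [min(costs[s][p] for s in solvers) for p in range(n_problems)]
    profile = {}
    for s in solvers:
        ratios = [costs[s][p] / best[p] for p in range(n_problems)]
        profile[s] = [sum(r <= tau for r in ratios) / n_problems for tau in taus]
    return profile


# Made-up iteration counts on four problems (illustrative only).
costs = {
    "solver_1": [10, 20, 30, math.inf],
    "solver_2": [20, 10, 30, 40],
}
profile = performance_profile(costs, taus=[1.0, 2.0])
print(profile)
```

The value at $\tau = 1$ is the share of problems on which a solver is the best (ties included), and the limiting value for large $\tau$ is the share of problems it solves at all, so a curve lying higher indicates better performance.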
4.1. Multimodal Function with Its Graph
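In this section, we present the six-hump camel back function, a multimodal function used to test the efficiency of optimization algorithms. The function is defined as follows:
$$f(x_1, x_2) = \left(4 - 2.1 x_1^2 + \frac{x_1^4}{3}\right) x_1^2 + x_1 x_2 + \left(-4 + 4 x_2^2\right) x_2^2.$$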
The number of variables (n) equals 2. This function has six local minima, two of which are global. Thus, it is a multimodal function commonly used to test the search for global minima. The global minima are $x^* = (0.0898, -0.7126)$ and $x^* = (-0.0898, 0.7126)$, where the function value is $f(x^*) = -1.0316$. As its name suggests, this function looks like the back of an upside-down camel with six humps (see Figure 3 for a three-dimensional graph); for more information about two-dimensional functions, the reader can refer to .
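As a quick check, the six-hump camel back function can be evaluated in its standard form, $f(x_1, x_2) = (4 - 2.1 x_1^2 + x_1^4/3) x_1^2 + x_1 x_2 + (-4 + 4 x_2^2) x_2^2$, at the commonly quoted global minimizers $(\pm 0.0898, \mp 0.7126)$. The function names below are illustrative, and the minimizers are rounded to four decimals, so the gradient is only approximately zero there.

```python
def six_hump_camel(x1, x2):
    """Standard six-hump camel back function of two variables."""
    return ((4.0 - 2.1 * x1 ** 2 + x1 ** 4 / 3.0) * x1 ** 2
            + x1 * x2
            + (-4.0 + 4.0 * x2 ** 2) * x2 ** 2)


def six_hump_camel_grad(x1, x2):
    """Analytic gradient, as needed by a gradient-based method such as CG."""
    df_dx1 = 8.0 * x1 - 8.4 * x1 ** 3 + 2.0 * x1 ** 5 + x2
    df_dx2 = x1 - 8.0 * x2 + 16.0 * x2 ** 3
    return [df_dx1, df_dx2]


f_a = six_hump_camel(0.0898, -0.7126)
f_b = six_hump_camel(-0.0898, 0.7126)
g_a = six_hump_camel_grad(0.0898, -0.7126)

print(round(f_a, 4), round(f_b, 4))  # both approximately -1.0316
print(g_a)                            # both components close to zero
```

The function is symmetric under $(x_1, x_2) \mapsto (-x_1, -x_2)$, which is why the two global minimizers appear as a mirrored pair with the same function value.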
In this study, a modified version of the CG algorithm (A) is proposed and its performance is investigated. The modified formula is restarted based on the value of the Lipschitz constant. Global convergence is established under the SWP line search. Our numerical results show that the new coefficient produces efficient and competitive results compared with other methods, such as CG_Descent 5.3. In future work, the new version of the CG method will be combined with a feed-forward neural network (the back-propagation (BP) algorithm) to improve the training process and obtain a fast training algorithm for multilayer networks. This will help reduce the time needed to train a neural network when the number of training samples is massive.
The data used to support the findings of this study are included within the article.
Conflicts of Interest
The authors declare that they have no conflicts of interest regarding the publication of this paper.
The authors would like to thank Universiti Malaysia Terengganu for supporting this work.
E. Stiefel, “Methods of conjugate gradients for solving linear systems,” Journal of Research of the National Bureau of Standards, vol. 49, pp. 409–435, 1952.
M. J. Powell, “Nonconvex minimization calculations and the conjugate gradient method,” in Numerical Analysis, pp. 122–141, Springer, Berlin, Heidelberg, 1984.
G. Zoutendijk, Nonlinear Programming, Computational Methods, Integer and Nonlinear Programming, North Holland, Amsterdam, 1970.
G. Yuan, J. Lu, and Z. Wang, “The modified PRP conjugate gradient algorithm under a non-descent line search and its application in the Muskingum model and image restoration problems,” Soft Computing, vol. 25, no. 8, pp. 5867–5879, 2021.