Abstract

The main goal of this paper is to introduce a new conjugate gradient class for solving unconstrained optimization problems. The presented class enjoys the benefits of having three free parameters, its directions are descent, and it can fulfill the Dai–Liao conjugacy condition. The global convergence of the new class is proved under the weak Wolfe–Powell line search technique. The numerical efficiency of the proposed class is confirmed in three sets of experiments involving 210 test problems and 11 different conjugate gradient methods.

1. Introduction

In recent years, many iterative methods have been developed to solve the large-scale unconstrained optimization problem
$\min_{x \in \mathbb{R}^n} f(x)$,  (1)
where $f:\mathbb{R}^n \to \mathbb{R}$ is a smooth function with Lipschitz continuous gradient $g(x) = \nabla f(x)$. At the current iterate $x_k$, an iterative optimization algorithm finds a descent direction $d_k$ and a step length $\alpha_k > 0$ and computes the next iterate as
$x_{k+1} = x_k + \alpha_k d_k$.  (2)

Usually, step lengths are accepted if they fulfill the conditions of an inexact line search technique. A well-known example of such techniques is the weak Wolfe–Powell (WWP) technique,
$f(x_k + \alpha_k d_k) \le f(x_k) + \delta \alpha_k g_k^T d_k$ and $g(x_k + \alpha_k d_k)^T d_k \ge \sigma g_k^T d_k$,  (3)
where $g_k = g(x_k)$ and $0 < \delta < \sigma < 1$ [1]. Different inexact line search techniques are presented in [1], and three improvements of (3) are proposed by Bojari and Eslahchi [2], Yuan et al. [3], and Dai and Kou [4].
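For readers who wish to experiment with condition (3), the following is a minimal sketch of a bisection-type weak Wolfe–Powell line search; the function name, the handles f and gradf, and the loop cap are illustrative assumptions and are not taken from the paper's code.

function alpha = wwp_bisection(f, gradf, x, d, delta, sigma, maxTries)
% A minimal bisection-type weak Wolfe-Powell (WWP) line search sketch.
% f and gradf are function handles returning f(x) and the gradient column
% vector; d is assumed to be a descent direction at x.
    lo = 0; hi = Inf; alpha = 1;
    fx = f(x);
    gd = gradf(x)' * d;                          % g_k' * d_k, negative for a descent direction
    for t = 1:maxTries
        if f(x + alpha*d) > fx + delta*alpha*gd
            % The sufficient-decrease part of (3) fails: shrink the step.
            hi = alpha;
            alpha = 0.5*(lo + hi);
        elseif gradf(x + alpha*d)' * d < sigma*gd
            % The curvature part of (3) fails: enlarge the step.
            lo = alpha;
            if isinf(hi), alpha = 2*lo; else, alpha = 0.5*(lo + hi); end
        else
            return;                              % both WWP conditions in (3) hold
        end
    end
end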

On the other hand, descent directions are typically obtained by Newton-based methods for small problems, quasi-Newton-based methods for medium-sized problems, and gradient-based methods for large-scale problems [1].

The conjugate gradient (CG) method is one of the most popular gradient-based methods; it combines the negative gradient with other available information to build the next descent direction. Generally, the CG process can be summarized as
$d_0 = -g_0$, $d_{k+1} = -\theta_{k+1} g_{k+1} + \beta_k d_k + \gamma_k z_k$,  (4)
where $\theta_{k+1}$ is the scale parameter, $\beta_k$ and $\gamma_k$ are the CG parameters, and $z_k$ is an arbitrary vector related to previous iterations.

In classic CG methods such as those of Hestenes–Stiefel [5], Fletcher–Reeves [6], Polak–Ribiére–Polyak [7, 8], Liu–Storey [9], and Dai–Yuan [10], only the two parts $-g_{k+1}$ and $\beta_k d_k$ of direction (4) are considered, and the parameter $\beta_k$ is defined as
$\beta_k^{HS} = \frac{g_{k+1}^T y_k}{d_k^T y_k}$, $\beta_k^{FR} = \frac{\|g_{k+1}\|^2}{\|g_k\|^2}$, $\beta_k^{PRP} = \frac{g_{k+1}^T y_k}{\|g_k\|^2}$, $\beta_k^{LS} = -\frac{g_{k+1}^T y_k}{g_k^T d_k}$, $\beta_k^{DY} = \frac{\|g_{k+1}\|^2}{d_k^T y_k}$,  (5)
where $y_k = g_{k+1} - g_k$ and the scale parameter is taken as $\theta_{k+1} = 1$. Note that $\|\cdot\|$ denotes the Euclidean norm of vectors.
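To illustrate equation (5), the snippet below evaluates the five classic CG parameters in their standard textbook forms; the function and variable names are only illustrative, and the indexing follows the common convention $y_k = g_{k+1} - g_k$.

function beta = classic_beta(method, g_new, g_old, d_old)
% Standard textbook forms of the classic CG parameters in (5).
% g_new = g_{k+1}, g_old = g_k, d_old = d_k (column vectors).
    y = g_new - g_old;                           % y_k = g_{k+1} - g_k
    switch upper(method)
        case 'HS'    % Hestenes-Stiefel
            beta = (g_new' * y) / (d_old' * y);
        case 'FR'    % Fletcher-Reeves
            beta = norm(g_new)^2 / norm(g_old)^2;
        case 'PRP'   % Polak-Ribiere-Polyak
            beta = (g_new' * y) / norm(g_old)^2;
        case 'LS'    % Liu-Storey
            beta = -(g_new' * y) / (g_old' * d_old);
        case 'DY'    % Dai-Yuan
            beta = norm(g_new)^2 / (d_old' * y);
        otherwise
            error('Unknown classic CG method.');
    end
end

The corresponding two-term direction is then $d_{k+1} = -g_{k+1} + \beta_k d_k$.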

Over the years, many researchers have developed the methods in (5) and improved their theoretical and numerical efficiency. For example, interested readers can see some modifications of the method in the studies by Faramarzi and Amini [11] and Hu et al. [12], several combinations of the method in the works by Abubakar et al. [13] and Sakai and Iiduka [14], various developments of the method in the studies by Mishra et al. [15], Wu [16], and Andrei [17], an extended method in [18], and several improvements of the method in the studies by Deepho et al. [19], Zhu et al. [20], and Jiang and Jian [21]. Furthermore, some researchers have used techniques such as quasi-Newton updates [22, 23], regularization [24–26], combinations of the above methods [27, 28], or alternative techniques [29, 30] to introduce suitable CG methods for optimization problems. For a more detailed discussion of CG methods, the reader is referred to [31].

In addition to their original authors, researchers such as Al-Baali [32] and Gilbert and Nocedal [33] have also investigated the global convergence of the methods in (5).

As we mentioned before, one technique for developing a CG method is to consider a three-term CG direction such as (6) [22, 23]. This point of view usually leads to good behavior in numerical experiments.

Besides, it is known that the method has excellent global convergence properties, which means that it generally solves more problems than the other classic methods in (5). A well-known extension of the method is the three-term CG direction (7), introduced by Andrei in [17]. It is established that direction (7) is descent and satisfies the Dai–Liao [34] conjugacy condition. Also, the method is globally convergent under the WWP line search technique.

It is known that large-scale optimization problems have wide applications in science, engineering, transport, the military, space technology [1, 35], artificial intelligence and image processing [12], risk management [13, 19], and business and financial management [36, 37]. Furthermore, as we mentioned before, CG methods are usually the best choices for solving large-scale optimization problems. For these reasons, and also because of the excellent theoretical and numerical performance of methods (6) and (7), in this paper we combine them and create a new class of three-term scaled conjugate gradient methods. We show that our class inherits all of the strong properties of methods (6) and (7). Furthermore, we illustrate the advantages of using the new class through extensive numerical comparisons [38].

The rest of this paper is organized as follows. In the next section, the new class of scaled three-term CG directions is introduced. Then, in Section 3, some properties of the presented class and the global convergence theorems are proved. Finally, the numerical results are presented in Section 4.

2. The Algorithm

In this section, we examine what happens if we consider a CG direction whose denominators are similar to those of the method and of equation (7) and whose numerators contain the same parts as equation (6). Therefore, we first considered a direction constructed in this way.

Then, as we show below, to ensure that our method satisfies the Dai–Liao conjugacy condition and to prove the global convergence theorems, we also had to include the scaling coefficient of equation (7). The structure of our directions therefore changed to a form that is in fact a modification of equation (7).

Finally, to enjoy the benefits of free parameters, such as the possibility of creating a balance between the components of the direction and of choosing an appropriate method for different problems, we introduce our new CG class (10), in which the three free parameters are arbitrary constants.

In the rest of this article, for simplicity, we refer to direction (10) as the new class. The process of the class is described in Algorithm 1.

Input: An initial point $x_0$, a tolerance $\varepsilon > 0$, line search constants $0 < \delta < \sigma < 1$, and the three free parameters of the class.
(1) Set $k = 0$ and compute $g_0 = \nabla f(x_0)$.
(2) While $\|g_k\| > \varepsilon$ do
(3)  if $k = 0$ then
(4)   Set $d_0 = -g_0$.
(5)  else
(6)   Obtain $d_k$ by (10).
(7)  end
(8)  Calculate the step length $\alpha_k$ by (3).
(9)  Set $x_{k+1} = x_k + \alpha_k d_k$ and compute $g_{k+1}$.
(10)  Set $k = k + 1$.
(11) end
Output: The solution of problem (1).
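Under the assumption that a routine implementing direction (10) and a WWP line search for (3) are available, the loop of Algorithm 1 can be sketched as follows. The handle dirfun is a placeholder for the class direction (10), which is not reproduced here; wwp_bisection is the line search sketch given after condition (3); all other names are illustrative.

function [x, g] = cg_driver(f, gradf, dirfun, x0, epsilon, delta, sigma, maxIter)
% A sketch of the loop in Algorithm 1 (not the authors' actual code).
    x = x0;
    g = gradf(x);
    d = -g;                                      % step (4): d_0 = -g_0
    k = 0;
    while norm(g) > epsilon && k < maxIter       % step (2): stopping test
        alpha = wwp_bisection(f, gradf, x, d, delta, sigma, 15);  % step (8): WWP step length by (3)
        x = x + alpha*d;                         % step (9): x_{k+1} = x_k + alpha_k d_k
        g_old = g;
        g = gradf(x);
        d = dirfun(g, g_old, d, alpha);          % step (6): next direction, e.g., by (10) (placeholder)
        k = k + 1;
    end
end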

One of the interesting features of the class is that its members fulfill the conjugacy condition of Dai and Liao [34] whenever $d_k^T y_k > 0$. For example, in Algorithm 1 we use the WWP line search technique (3), so the positivity of $d_k^T y_k$ is guaranteed. Therefore, from the definition of the class in equation (10) and the stated conditions on the parameters, the conjugacy condition can be verified directly.
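As a reminder, and as an assumption about notation since the original symbols are not reproduced here, the Dai–Liao conjugacy condition [34] in its standard form reads

$d_{k+1}^T y_k = -t \, g_{k+1}^T s_k, \qquad t \ge 0,$

where $s_k = x_{k+1} - x_k = \alpha_k d_k$ and $y_k = g_{k+1} - g_k$; for $t = 0$ it reduces to the classical conjugacy condition $d_{k+1}^T y_k = 0$.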

Remark 1. Direction (7) is a member of the class; it is obtained by a particular choice of the three free parameters.

3. Convergence Theorems

To prove the global convergence of the class, we need the Zoutendijk lemma [1] as well as the following common assumption.

Assumption 2. (1) The level set $\mathcal{L} = \{x \in \mathbb{R}^n : f(x) \le f(x_0)\}$ is bounded. (2) In some neighborhood of $\mathcal{L}$, the function $f$ is continuously differentiable and its gradient $g$ is Lipschitz continuous.

Lemma 3. (Zoutendijk lemma) Consider an iterative algorithm of the form (2) under Assumption 2, and define $\theta_k$ as the angle between $d_k$ and $-g_k$, i.e.,
$\cos\theta_k = \dfrac{-g_k^T d_k}{\|g_k\| \, \|d_k\|}$.

If the directions $d_k$ are descent and the step lengths $\alpha_k$ are obtained from WWP condition (3), then
$\sum_{k \ge 0} \cos^2\theta_k \, \|g_k\|^2 < +\infty$.

Proof. See Theorem 3.2 in [1].
Ultimately, to establish the global convergence of Algorithm 1, we confirm two issues: (1) the directions in the class are descent, and (2) the quantities $\cos\theta_k$ are bounded away from zero.
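For completeness, here is the standard way these two issues combine with Lemma 3 (a sketch under the above notation, not the paper's exact argument): if the directions are descent and there exists a constant $c > 0$ with $\cos\theta_k \ge c$ for all $k$, then the Zoutendijk condition gives

$c^2 \sum_{k \ge 0} \|g_k\|^2 \le \sum_{k \ge 0} \cos^2\theta_k \, \|g_k\|^2 < +\infty,$

so that $\|g_k\| \to 0$ as $k \to \infty$.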

Theorem 4. Suppose that Assumption 2 holds. Under the WWP line search technique, the directions of the class are descent.

Proof. From the definition of the direction in equation (8), we can compute $g_{k+1}^T d_{k+1}$ directly. Since the WWP technique guarantees $d_k^T y_k > 0$, and because of the conditions imposed on the free parameters, the directions of the class are descent.

Theorem 5. Suppose that Assumption 2 holds. Then, for the sequence generated by Algorithm 1, inequality (15) holds.

Proof. We can rewrite the directions of the class in a matrix form, with the associated matrix expressed in terms of the identity matrix. Let us examine the three parts of this matrix separately:
(1) The first part: since the involved quantities are positive (by the WWP technique), this part is a positive definite diagonal matrix with eigenvalues far from zero.
(2) The second part: this part is a skew-symmetric matrix, and therefore its eigenvalues are purely imaginary or zero.
(3) The third part: here, we have a rank-one positive semidefinite matrix with a nonnegative coefficient.
From these three observations, it is clear that the condition numbers of these matrices are far from zero and their eigenvalues have positive real parts. On the other hand, for all $k$, we have a relation which means that the square roots of these matrices can be defined.
Now, with an appropriate substitution, the Kantorovich inequality [1] yields the required bound. Since this bound is far from zero, the proof is complete.
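For reference, the Kantorovich inequality in its standard form, for a symmetric positive definite matrix $A$ with extreme eigenvalues $\lambda_{\min}$ and $\lambda_{\max}$, states that for every nonzero vector $x$

$\dfrac{(x^T A x)\,(x^T A^{-1} x)}{(x^T x)^2} \le \dfrac{(\lambda_{\min} + \lambda_{\max})^2}{4\,\lambda_{\min}\,\lambda_{\max}}.$

Presumably this is the type of bound used above, translating a bounded condition number into a lower bound on $\cos\theta_k$.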

Theorem 6. Under Assumption 2, for the iterates obtained from Algorithm 1, the method is globally convergent, i.e., $\lim_{k \to \infty} \|g_k\| = 0$.

Proof. From Theorem 4 and inequality (9), the directions are descent, so Lemma 3 applies. Therefore, combining the Zoutendijk condition with inequality (15) in Theorem 5, we obtain the conclusion of the theorem.

4. Numerical Results

One important question for an iterative method is how it performs numerically. To confirm the efficiency of the class within the structure of Algorithm 1, we create three sets of experiments. In all three sets, we perform the following:
(1) Run our codes in MATLAB 9.5 on a computer (Intel i5-10400F, 2.90 GHz, and 8 GB memory) with the Windows 10 operating system.
(2) Terminate the algorithms whenever the gradient norm $\|g_k\|$ falls below the stopping tolerance, or the number of iterations exceeds 4000, or the number of function evaluations exceeds 20,000. Note that in the last two cases, we say the algorithm is not successful.
(3) Use the WWP line search technique in a bisection form similar to Algorithm 2.5.1 of [35], with fixed values of $\delta$ and $\sigma$ and suitable initial step lengths.
(4) Stop the loops of the line search algorithm after 15 tries, to avoid an uphill search direction.
(5) Select 42 test problems from [39], which are shown in Table 1, and consider them in five different dimensions.
(6) Compare the algorithms in four terms: (i) the number of iterations, (ii) the number of function evaluations, (iii) the number of gradient evaluations, and (iv) the CPU time in seconds.
(7) Apply the method of Dolan and Moré [40] to compare the algorithms. In their method, for a threshold $\tau \ge 1$, the probability function $\rho_s(\tau)$ represents the percentage of problems that solver $s$ solves within a factor $\tau$ of the best solver. We call the graph of $\rho_s(\tau)$ for all solvers a performance profile, as sketched below.
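As an illustration of item (7), the following is a minimal sketch of how a Dolan–Moré performance profile can be computed from a cost matrix. The matrix T, with one row per problem and one column per solver and failed runs recorded as Inf, is an assumption made only for this illustration.

function [tau, rho] = perf_profile(T)
% A minimal sketch of a Dolan-More performance profile [40].
% T(p, s) is the cost (e.g., CPU time) of solver s on problem p; Inf marks failure.
    [np, ns] = size(T);
    best = min(T, [], 2);                        % best cost on each problem
    r = T ./ best;                               % performance ratios r_{p,s}
    tau = sort(unique(r(isfinite(r))));          % thresholds where rho can change
    rho = zeros(numel(tau), ns);
    for s = 1:ns
        for i = 1:numel(tau)
            rho(i, s) = sum(r(:, s) <= tau(i)) / np;  % fraction solved within factor tau
        end
    end
end

Plotting rho against tau, one curve per solver, gives performance profiles of the kind shown in the figures.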

Moreover, for the first two sets of experiments, we test the class with 25 sets of randomly chosen parameters and take the best one as our representative in the competitions. This means that, although it is not possible to choose the optimal set of parameters for a given problem in advance, users can be reasonably confident that any selected set of parameters will solve their problem with acceptable results.

In the first set of experiments, we compare our chosen candidate of the class, obtained by a specific choice of the parameters, with the classic methods (5). The results of this competition and the percentage of problems solved by each algorithm are presented in Tables 2 and 3, respectively. In addition, the performance profiles of this competition are displayed in Figures 1 to 4. As we predicted, the classic method with the best convergence theory solved more problems than the other classic methods, but its other results are not good. Hence, in Figures 1 to 4, it is usually the worst method at the beginning (for small values of $\tau$) but gradually becomes the best among the classic methods as the value of $\tau$ increases. On the other hand, a different method is the best performer among the five classic methods (5). From Tables 2 and 3 and Figures 1 to 4, it is clear that our candidate of the class is the best method in this competition. Thus, we reach our goal of creating a method that combines excellent global convergence with the distinguished numerical behavior of the best classic methods.

Remark 7. Since three-term CG directions are sometimes more sensitive to round-off errors than two-term ones, in this set of experiments we consider the accuracy of the final answers as a criterion to compare the round-off errors of the classic methods (5) with those of our chosen candidate of the class. The performance profile of the first set of experiments in this term is presented in Figure 5. From Tables 2 and 3 and Figure 5, it is clear that our method solved more problems with fewer iterations and reached more accurate answers. So it seems that the class can control round-off errors properly.
For the second set of experiments, we consider the seven newly developed CG methods listed in Table 4:
(1) a descent two-term member of the Dai–Liao family;
(2) a scaled three-term CG method;
(3) a three-term modification of the method;
(4) a three-term CG method;
(5) a two-term modification of the method;
(6) a hybrid two-term modification of the method;
(7) Algorithm 1 with our selected candidate of the class.
The results of this competition are shown in Tables 5 and 6 and Figures 6–9.
Table 5 shows that, in the structure of Algorithm 1, the representative of the class solved 51.43, 41.43, and 45.71 percent of the problems with the least number of iterations, function evaluations, and gradient evaluations, respectively. The method with the shortest CPU time is also ours. Figures 6–9 show that our method behaved acceptably even in the cases where it was not the best method. Table 6 indicates that our chosen member of the class solved the largest number of problems (91.4286 percent, which is about 31 percent more than the worst result) among the participating methods in this set of experiments. So, all the outcomes of the second set of experiments show the advantages of the class and therefore confirm its superiority.
Although Andrei has noted that finding the best CG method is one of the open problems in optimization [42], in our third set of experiments we try to numerically investigate the effects of the free parameters, or equivalently of the corresponding parts of the direction, on the local and global convergence of Algorithm 1. To this aim, we consider three members of the class with different parameter settings. Please note that the directions of one of these members are actually direction (7) multiplied by 0.1.
The results of the third set of experiments, which are displayed in Table 7, suggest that both parts improved local convergence and reduced the costs. In addition, it seems that one part had more effect on improving local convergence, while the other had more effect on improving global convergence.

Remark 8. It seems that the strong numerical results of the selected member of the class are due to the following reasons:
(1) The scaling coefficient. This element controls the first part of the direction.
(2) The coefficient in the second part of the directions. This element is inherited from equation (6), and it is one of the reasons for the appropriate numerical behavior of such directions.
(3) The denominator. This element usually leads to global convergence for general functions, so the algorithm presumably solves more problems.
(4) The three free parameters. These parameters create a balance between the components of the direction.

5. Conclusion

In this paper, we developed a new CG class by combining methods (6) and (7). In order to encourage readers to use the class, we showed that:
(1) The directions of the class satisfy the Dai–Liao conjugacy condition, so it is indeed a conjugate gradient method.
(2) Its directions fulfill a descent inequality. This means that, under any line search technique that can guarantee the required condition, they are descent. In addition, under a further condition on the parameters, the directions of the class can be regarded as sufficient descent directions.
(3) Under the WWP line search, the method is globally convergent, without any assumption (such as convexity) on the objective function.
(4) Due to the presence of three free parameters, the class contains an infinite number of directions. Thus, users can select an appropriate method according to their problems.
(5) The method yields excellent results in numerical experiments because of its structure.

Data Availability

Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.

Disclosure

A preprint has previously been published (Bojari et al. in Research Square (2023)).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

All the authors contributed equally to this work.