Special Issue: Machine Learning and its Applications in Image Restoration
Research Article | Open Access
Shengwei Yao, Yuping Wu, Jielan Yang, Jieqiong Xu, "A Three-Term Gradient Descent Method with Subspace Techniques", Mathematical Problems in Engineering, vol. 2021, Article ID 8867309, 7 pages, 2021. https://doi.org/10.1155/2021/8867309
A Three-Term Gradient Descent Method with Subspace Techniques
We propose a three-term gradient descent method that is well suited to the optimization problems considered in this article. The search direction of the method is generated in a specific subspace; specifically, a quadratic approximation model is minimized in the process of generating the search direction. In order to reduce the amount of calculation and make the best use of existing information, the subspace is made up of the gradients at the current and previous iteration points and the previous search direction. By using this subspace-based optimization technique, a global convergence result is established under the Wolfe line search. The results of numerical experiments show that the new method is effective and robust.
Gradient descent and conjugate gradient (CG) methods are of profound significance for unconstrained optimization problems:

$\min_{x \in \mathbb{R}^n} f(x),$ (1)

where $f: \mathbb{R}^n \to \mathbb{R}$ is a continuously differentiable function. They are widely used because of their low storage requirements and strong convergence properties. Starting from an initial iteration point $x_0$, at each iteration, the method produces an approximate solution sequence of (1):

$x_{k+1} = x_k + \alpha_k d_k,$ (2)

in which $x_k$ is the current iteration point, $\alpha_k > 0$ is the step length, and $d_k$ is the search direction, which usually has the following form (for typical CG methods):

$d_k = \begin{cases} -g_k, & k = 0, \\ -g_k + \beta_k d_{k-1}, & k \ge 1, \end{cases}$ (3)

where $g_k = \nabla f(x_k)$ is the gradient of $f$ at $x_k$ and $\beta_k$ is the CG parameter. The step size $\alpha_k$ is usually determined by performing a specific line search, among which the Wolfe line search [1, 2] is one of the most commonly used:

$f(x_k + \alpha_k d_k) \le f(x_k) + \delta \alpha_k g_k^{T} d_k$ (4)

and

$g(x_k + \alpha_k d_k)^{T} d_k \ge \sigma g_k^{T} d_k,$ (5)

where $0 < \delta < \sigma < 1$. Sometimes, the strong Wolfe line search, given by (4) and

$|g(x_k + \alpha_k d_k)^{T} d_k| \le -\sigma g_k^{T} d_k,$ (6)

is also widely used in CG methods to establish convergence results. Different choices of $\beta_k$ yield different versions of the CG method. The research results on conjugate gradient algorithms are very rich, including the PRP, HS, LS, and DY methods [3–10]. For a detailed survey of conjugate gradient methods, refer to [11].
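As a concrete illustration of the CG iteration and line search described above, the sketch below runs conjugate gradient with the Hestenes–Stiefel parameter on a strictly convex quadratic; on a quadratic, the exact minimizing step satisfies the Wolfe conditions for $\delta \le 1/2$. All names and data here are illustrative, not taken from the paper.

```python
import numpy as np

# Conjugate gradient with the Hestenes-Stiefel beta on the strictly convex
# quadratic f(x) = 0.5 x'Ax - b'x, using the exact line-search step size.
rng = np.random.default_rng(1)
n = 8
A0 = rng.standard_normal((n, n))
A = A0 @ A0.T + n * np.eye(n)      # symmetric positive definite Hessian
b = rng.standard_normal(n)

x = np.zeros(n)
g = A @ x - b                      # gradient of f at x
d = -g                             # d_0 = -g_0
for k in range(n):
    alpha = -(g @ d) / (d @ A @ d)     # exact minimizer along d
    x = x + alpha * d                  # the iteration x_{k+1} = x_k + alpha_k d_k
    g_new = A @ x - b
    y = g_new - g                      # y_{k-1} = g_k - g_{k-1}
    beta = (g_new @ y) / (d @ y)       # Hestenes-Stiefel parameter
    d = -g_new + beta * d              # three-term-free CG direction update
    g = g_new

print(np.linalg.norm(A @ x - b))       # residual is essentially zero
```

On a strictly convex quadratic, this iteration reproduces linear CG and terminates in at most $n$ steps.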
Recently, subspace techniques have attracted more and more researchers' attention. Various subspace techniques are applied to construct different approaches to various optimization problems; for a detailed description of subspace techniques, refer to [12–14]. In [15], Yuan and Stoer came up with the SMCG method by applying a subspace technique in the conjugate gradient method. Specifically, a two-dimensional subspace $\operatorname{span}\{g_k, s_{k-1}\}$ is used in [15] to determine the search direction, that is,

$d_k = \mu_k g_k + \nu_k s_{k-1},$ (7)

where $\mu_k$ and $\nu_k$ are parameters and $s_{k-1} = x_k - x_{k-1}$. In [16, 17], the values of $\mu_k$ and $\nu_k$ are obtained by minimizing a dynamically selected conic approximation model and a tensor model, respectively. In order to solve nonlinear optimization problems, Conn et al. [18] brought up a class of iterated-subspace minimization methods. Following Yuan's idea [15], Yang et al. [19] raised a new CG method in the same subspace.
For better convergence and more robust numerical performance, some authors have studied conjugate gradient methods with different forms. Zhang et al. [20] presented a three-term conjugate gradient method that possesses the sufficient descent property without any line search. They further extended it to two variants and established global convergence results for general functions under the standard Wolfe line search. In [21], Narushima et al. proposed a specific three-term CG method, drawing on the idea of the multistep quasi-Newton method. Deng and Wan [22] put forward a three-term CG algorithm in which the direction is a combination of three terms. For some other three-term conjugate gradient methods, refer to [23–27].
The outline of the article is as follows. In Section 2, we use the subspace minimization technique to derive the search direction on the subspace spanned by $g_k$, $g_{k-1}$, and $d_{k-1}$. We elaborate the proposed algorithm in Section 3. Section 4 provides the convergence analysis of the given algorithm under some suitable conditions. In Section 5, the numerical experiments comparing the method with other methods are presented.
2. Derivation of the Search Direction
Different from the abovementioned subspace, in this paper, the following subspace is used to construct the search direction:

$\Omega_k = \operatorname{span}\{g_k, g_{k-1}, d_{k-1}\}.$ (8)

In order to simplify the calculation process, at each iteration point $x_k$, the following quadratic model is used to approximate the function $f$:

$\min_{d \in \Omega_k} \phi_k(d) = g_k^{T} d + \frac{1}{2} d^{T} B_k d,$ (9)

where $B_k$ can be viewed as an approximation of the Hessian matrix. Moreover, $B_k$ is supposed to be positive definite and to satisfy the quasi-Newton equation $B_k s_{k-1} = y_{k-1}$, where $s_{k-1} = x_k - x_{k-1} = \alpha_{k-1} d_{k-1}$ and $y_{k-1} = g_k - g_{k-1}$. Model (9) has two advantages: it is not only a good approximation of $f$ near $x_k$ but also convenient to minimize in the subspace $\Omega_k$. Since $g_k$, $g_{k-1}$, and $d_{k-1}$ are usually not collinear, we discuss the cases of dimensions 3 and 2.

Case I. dim($\Omega_k$) = 3. In this case, the direction has the form

$d_k = -g_k + \beta_k d_{k-1} + \gamma_k g_{k-1},$ (10)

where $\beta_k$ and $\gamma_k$ are parameters to be calculated. Substituting (10) into the minimizing problem (9), we get the following minimizing problem:

$\min_{\beta_k, \gamma_k} \phi_k(-g_k + \beta_k d_{k-1} + \gamma_k g_{k-1});$ (11)

then $\beta_k$ and $\gamma_k$ are the solutions of the following linear system:

$\begin{pmatrix} d_{k-1}^{T} B_k d_{k-1} & d_{k-1}^{T} B_k g_{k-1} \\ d_{k-1}^{T} B_k g_{k-1} & g_{k-1}^{T} B_k g_{k-1} \end{pmatrix} \begin{pmatrix} \beta_k \\ \gamma_k \end{pmatrix} = \begin{pmatrix} d_{k-1}^{T} B_k g_k - d_{k-1}^{T} g_k \\ g_{k-1}^{T} B_k g_k - g_{k-1}^{T} g_k \end{pmatrix}.$ (12)

By the quasi-Newton equation, $B_k d_{k-1} = y_{k-1}/\alpha_{k-1}$, so to solve system (12), we only need to estimate the two quantities $g_{k-1}^{T} B_k g_{k-1}$ and $g_k^{T} B_k g_{k-1}$. Inspired by BBCG [28], we adopt

$g_{k-1}^{T} B_k g_{k-1} \approx \frac{3}{2} \frac{\|y_{k-1}\|^2}{s_{k-1}^{T} y_{k-1}} \|g_{k-1}\|^2.$ (13)

Next, we choose the BFGS [29–32] update initialized with the identity matrix to compute $g_k^{T} B_k g_{k-1}$:

$B_k = I - \frac{s_{k-1} s_{k-1}^{T}}{s_{k-1}^{T} s_{k-1}} + \frac{y_{k-1} y_{k-1}^{T}}{s_{k-1}^{T} y_{k-1}}.$ (14)

This choice not only retains useful information but also gives numerical results that perform better than a simple scaling of the identity matrix. Substituting (13) and (14) into the linear system (12) yields system (15); computing the determinant of this system (16) and solving, $\beta_k$ and $\gamma_k$ are obtained as in (17).

Case II. dim($\Omega_k$) = 2. By observing numerical experiments, we find that the subspace $\operatorname{span}\{g_k, d_{k-1}\}$ produces better numerical results, so we choose it in this case. The direction is expressed as

$d_k = -g_k + \beta_k d_{k-1}.$ (18)

Substituting (18) into (9), we get

$\min_{\beta_k} \phi_k(-g_k + \beta_k d_{k-1}).$ (19)

Then, the solution of (19) can be expressed as

$\beta_k = \frac{g_k^{T} y_{k-1} - \alpha_{k-1} g_k^{T} d_{k-1}}{d_{k-1}^{T} y_{k-1}}.$ (20)

Therefore,

$d_k = -g_k + \frac{g_k^{T} y_{k-1} - \alpha_{k-1} g_k^{T} d_{k-1}}{d_{k-1}^{T} y_{k-1}}\, d_{k-1}.$ (21)

If an exact line search is used, namely, $g_k^{T} d_{k-1} = 0$, then, from (17) and (21), we get

$\beta_k = \frac{g_k^{T} y_{k-1}}{d_{k-1}^{T} y_{k-1}} = \beta_k^{HS},$ (22)

which means $d_k = -g_k + \beta_k^{HS} d_{k-1}$, and the obtained method degrades to the HS method (Hestenes and Stiefel) [4].
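As a rough sketch of the subspace step in Case I, the snippet below minimizes the quadratic model over directions of the assumed three-term form $d = -g_k + \beta d_{k-1} + \gamma g_{k-1}$, using an explicit symmetric positive definite matrix as a stand-in for $B_k$ rather than the BBCG/BFGS-based estimates; all data are synthetic and the names are illustrative.

```python
import numpy as np

# Minimize phi(d) = g_k.d + 0.5 d.B.d over d = -g_k + beta*d_prev + gamma*g_prev,
# with an explicit SPD matrix B standing in for the Hessian approximation B_k.
rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n))
B = A @ A.T + n * np.eye(n)          # SPD stand-in for B_k
g_k = rng.standard_normal(n)          # current gradient
g_prev = rng.standard_normal(n)       # previous gradient
d_prev = rng.standard_normal(n)       # previous search direction

# Setting dphi/dbeta = dphi/dgamma = 0 gives a 2x2 linear system.
M = np.array([[d_prev @ B @ d_prev, d_prev @ B @ g_prev],
              [d_prev @ B @ g_prev, g_prev @ B @ g_prev]])
rhs = np.array([d_prev @ B @ g_k - d_prev @ g_k,
                g_prev @ B @ g_k - g_prev @ g_k])
beta, gamma = np.linalg.solve(M, rhs)
d = -g_k + beta * d_prev + gamma * g_prev

# Stationarity: the model gradient g_k + B d is orthogonal to the
# directions we optimized over.
assert abs(d_prev @ (g_k + B @ d)) < 1e-6
assert abs(g_prev @ (g_k + B @ d)) < 1e-6
```

Since $\beta = \gamma = 0$ is feasible, the computed $d$ never does worse on the model than the plain steepest-descent direction $-g_k$.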
3. The Proposed TCGS Algorithm
4. Convergence Analysis
The main content of this section is to study the convergence properties of the TCGS algorithm. Some necessary assumptions for the objective function are given as follows.
Assumption 1. The level set $\Omega = \{x \in \mathbb{R}^n : f(x) \le f(x_0)\}$ is bounded.
Assumption 2. Suppose that $f$ is continuously differentiable and its gradient is Lipschitz continuous with constant $L > 0$, i.e.,

$\|g(x) - g(y)\| \le L \|x - y\|, \quad \forall x, y \in \mathbb{R}^n.$ (23)

Based on the above assumptions, it can be shown that there exists a constant $\gamma > 0$ such that

$\|g(x)\| \le \gamma, \quad \forall x \in \Omega.$ (24)

It is well known that, for an optimization algorithm, the properties of the search direction are very important for convergence. Firstly, we study some properties of the search direction generated by the TCGS algorithm. The following lemmas show that the direction is a descent direction and that the Dai–Liao conjugacy condition is satisfied.
Lemma 1. Suppose that $g_k \neq 0$. Then, the direction $d_k$ generated by the TCGS algorithm is a descent direction.
Proof. According to (9), we can get $d_{k-1}^{T}(g_k + B_k d_k) = 0$, since $d_k$ minimizes the model along $d_{k-1}$. Since $B_k$ is positive definite and $d_k$ is generated by the TCGS algorithm, substituting (17) (respectively (20)) into $g_k^{T} d_k$ and rearranging shows that $g_k^{T} d_k < 0$.
Lemma 2. The search direction $d_k$ satisfies the Dai–Liao conjugacy condition with some scalar $t$, namely, $d_k^{T} y_{k-1} = -t\, g_k^{T} s_{k-1}$.
Lemma 5. Suppose that the objective function $f$ satisfies Assumptions 1 and 2, and consider the TCGS algorithm in which the strong Wolfe line search (4) and (6) is used to compute the step size $\alpha_k$. If

$\sum_{k \ge 0} \frac{1}{\|d_k\|^2} = \infty$ (30)

holds, then

$\liminf_{k \to \infty} \|g_k\| = 0.$ (31)
Theorem 1. Under Assumptions 1 and 2, consider the sequence $\{x_k\}$ generated by the TCGS algorithm, where $\alpha_k$ satisfies conditions (4) and (6). Suppose that $f$ is uniformly convex; i.e., there exists a constant $\mu > 0$ such that

$(g(x) - g(y))^{T}(x - y) \ge \mu \|x - y\|^2, \quad \forall x, y \in \Omega.$ (32)
Then, (31) holds.
Proof. From (23) and (32), we get, respectively,

$\|y_{k-1}\| \le L \|s_{k-1}\|$ (33)

and

$s_{k-1}^{T} y_{k-1} \ge \mu \|s_{k-1}\|^2.$ (34)

On the basis of (33), combined with the Cauchy inequality, the inner products appearing in the parameters can be bounded in terms of $\|s_{k-1}\|$ (35). Furthermore, applying the triangle inequality and (24) gives corresponding bounds on $\|y_{k-1}\|$ and $\|g_k\|$ (36). Next, we analyze the convergence in two cases.

Case I. dim($\Omega_k$) = 3. Here, $\beta_k$ and $\gamma_k$ are computed by (16), and using (24), (33), (35), and (36), we get an upper bound for $|\beta_k|$. Similarly, we obtain an upper bound for $|\gamma_k|$. Therefore, $\|d_k\|$ is bounded above by a constant.

Case II. dim($\Omega_k$) = 2. Using (20), (24), and (33), we obtain an upper bound for $|\beta_k|$, and therefore $\|d_k\|$ is bounded above.

In both cases, (30) holds, and from Lemma 3, (31) holds.
5. Numerical Results
This section observes the performance of the TCGS algorithm through numerical experiments and verifies the effectiveness of the algorithm in dealing with unconstrained problems. For experimental purposes, we compared the numerical performance of TCGS with the PRP and MTHREECG [22] methods, in which MTHREECG has a structure similar to TCGS, and PRP is a classic and effective CG method.
In [22], Deng and Wan presented a similar three-term conjugate gradient method (MTHREECG), whose search direction likewise combines three terms.
We chose a total of 75 test functions, all taken from [35]. For the numerical experiments, the dimension of each function ranges over 1000, 2000, ..., up to 10000. The code is written in Fortran and is available at https://camo.ici.ro/neculai/THREECG/threecg.for. The default parameter values of the algorithm are consistent with [36].
The performance profile introduced by Dolan and Moré [37] is one of the most widely used tools for evaluating the performance of different methods. In this paper, we use it to investigate the numerical performance of the proposed TCGS algorithm.
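For reference, the Dolan–Moré performance profile used in Figures 1 and 2 can be computed as below; `t[p, s]` holds the cost (e.g., iteration count) of solver `s` on problem `p`, and the data here are made up purely for illustration.

```python
import numpy as np

def performance_profile(t, taus):
    """rho[s][i] = fraction of problems solved by solver s within
    factor taus[i] of the best solver on each problem."""
    ratios = t / t.min(axis=1, keepdims=True)      # r_{p,s} = t_{p,s} / min_s t_{p,s}
    return np.array([[np.mean(ratios[:, s] <= tau) for tau in taus]
                     for s in range(t.shape[1])])

# 3 problems, 2 solvers; smaller cost is better.
t = np.array([[10.0, 12.0],
              [20.0, 18.0],
              [30.0, 60.0]])
rho = performance_profile(t, taus=[1.0, 2.0])
print(rho)   # rho[s][0] is the share of problems where solver s is (tied) best
```

The value of $\rho_s(\tau)$ at $\tau = 1$ gives the share of problems on which solver $s$ is fastest, which is exactly how Figure 1 is read at the left edge.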
The performance profile for the number of iterations is shown in Figure 1. It can be seen that the TCGS algorithm solves the largest share of the test problems with the fewest iterations. As the factor $\tau$ increases, the TCGS method outperforms both the MTHREECG and PRP methods. Figure 2 presents the performance profile with respect to the number of function evaluations. The result is similar to that in Figure 1: TCGS again outperforms both MTHREECG and PRP. From the numerical experiments, it can be seen that the proposed TCGS method is efficient and robust in dealing with a set of unconstrained test problems.
By using the idea of subspace minimization, we have proposed a new type of subspace gradient descent method in this paper. In our method, the subspace is spanned by $g_k$, $g_{k-1}$, and $d_{k-1}$, and a quadratic approximation of the objective function is minimized over it to obtain the search direction. Therefore, the direction has the form $d_k = -g_k + \beta_k d_{k-1} + \gamma_k g_{k-1}$, where the values of $\beta_k$ and $\gamma_k$ are computed by distinguishing two cases. In addition, we prove the descent property of the direction and show that the Dai–Liao conjugacy condition is satisfied. Under the Wolfe line search, the convergence of the proposed TCGS algorithm is established. The numerical results show that the performance of the TCGS algorithm on a set of unconstrained problems is competitive.
Data Availability
The data used to support the findings of this study are included within the article.
Conflicts of Interest
The authors declare that there are no conflicts of interest.
Acknowledgments
This research was supported by the Guangxi Natural Science Foundation (nos. 2018GXNSFAA281340 and 2020GXNSFAA159014) and the Program for Innovative Team of Guangxi University of Finance and Economics.
- P. Wolfe, “Convergence conditions for ascent methods,” SIAM Review, vol. 11, no. 2, pp. 226–235, 1969.
- P. Wolfe, “Convergence conditions for ascent methods. II: some corrections,” SIAM Review, vol. 13, no. 2, pp. 185–188, 1971.
- B. T. Polyak, “The conjugate gradient method in extremal problems,” USSR Computational Mathematics and Mathematical Physics, vol. 9, no. 4, pp. 94–112, 1969.
- M. R. Hestenes and E. Stiefel, Methods of Conjugate Gradients for Solving Linear Systems, NBS, Washington, DC, USA, 1952.
- Y. Liu and C. Storey, “Efficient generalized conjugate gradient algorithms, part 1: Theory,” Journal of Optimization Theory and Applications, vol. 69, no. 1, pp. 129–137, 1991.
- Y. H. Dai and Y. Yuan, “A nonlinear conjugate gradient method with a strong global convergence property,” SIAM Journal on Optimization, vol. 10, no. 1, pp. 177–182, 1999.
- Z. Wei, S. Yao, and L. Liu, “The convergence properties of some new conjugate gradient methods,” Applied Mathematics and Computation, vol. 183, no. 2, pp. 1341–1350, 2006.
- G. Yuan, Z. Meng, and Y. Li, "A modified Hestenes and Stiefel conjugate gradient algorithm for large-scale nonsmooth minimizations and nonlinear equations," Journal of Optimization Theory and Applications, vol. 168, no. 1, pp. 129–152, 2016.
- G. Yuan, Z. Wei, and X. Lu, “Global convergence of BFGS and PRP methods under a modified weak Wolfe-Powell line search,” Applied Mathematical Modelling, vol. 47, pp. 811–825, 2017.
- G. Yuan, Z. Wei, and Y. Yang, “The global convergence of the Polak-Ribière-Polyak conjugate gradient algorithm under inexact line search for nonconvex functions,” Journal of Computational and Applied Mathematics, vol. 362, pp. 262–275, 2019.
- W. W. Hager and H. Zhang, “A survey of nonlinear conjugate gradient methods,” Pacific Journal of Optimization, vol. 2, no. 1, pp. 35–58, 2006.
- Y.-X. Yuan, “Subspace techniques for nonlinear optimization,” Series in Contemporary Applied Mathematics, Chinese Academy of Sciences, Beijing, China, 2007.
- Y.-X. Yuan, “Subspace methods for large scale nonlinear equations and nonlinear least squares,” Optimization and Engineering, vol. 10, no. 2, pp. 207–218, 2009.
- Y. Yuan, “A review on subspace methods for nonlinear optimization,” in Proceedings of the International Congress of Mathematics, pp. 807–827, Seoul, South Korea, August 2014.
- Y.-X. Yuan and J. Stoer, “A subspace study on conjugate gradient algorithms,” ZAMM-Journal of Applied Mathematics and Mechanics/Zeitschrift für Angewandte Mathematik und Mechanik, vol. 75, no. 1, pp. 69–77, 1995.
- Y. Li, Z. Liu, and H. Liu, “A subspace minimization conjugate gradient method based on conic model for unconstrained optimization,” Computational and Applied Mathematics, vol. 38, no. 1, p. 16, 2019.
- T. Wang, Z. Liu, and H. Liu, “A new subspace minimization conjugate gradient method based on tensor model for unconstrained optimization,” International Journal of Computer Mathematics, vol. 96, no. 10, pp. 1924–1942, 2019.
- A. Conn, N. Gould, A. Sartenaer, and P. Toint, “On iterated-subspace minimization methods for nonlinear optimization,” in Proceedings of the Linear and Nonlinear Conjugate Gradient-Related Methods, Seattle, WA, USA, January 1996.
- Y. Yang, Y. Chen, and Y. Lu, “A subspace conjugate gradient algorithm for large-scale unconstrained optimization,” Numerical Algorithms, vol. 76, no. 3, pp. 813–828, 2017.
- L. Zhang, W. Zhou, and D. Li, “Some descent three-term conjugate gradient methods and their global convergence,” Optimization Methods and Software, vol. 22, no. 4, pp. 697–711, 2007.
- Y. Narushima, H. Yabe, and J. A. Ford, “A three-term conjugate gradient method with sufficient descent property for unconstrained optimization,” SIAM Journal on Optimization, vol. 21, no. 1, pp. 212–230, 2011.
- S. Deng and Z. Wan, “A three-term conjugate gradient algorithm for large-scale unconstrained optimization problems,” Applied Numerical Mathematics, vol. 92, pp. 70–81, 2015.
- N. Andrei, “A new three-term conjugate gradient algorithm for unconstrained optimization,” Numerical Algorithms, vol. 68, no. 2, pp. 305–321, 2015.
- S. Yao and L. Ning, “An adaptive three-term conjugate gradient method based on self-scaling memoryless BFGS matrix,” Journal of Computational and Applied Mathematics, vol. 332, pp. 72–85, 2018.
- H. Kobayashi, Y. Narushima, and H. Yabe, “Descent three-term conjugate gradient methods based on secant conditions for unconstrained optimization,” Optimization Methods and Software, vol. 32, no. 6, pp. 1313–1329, 2017.
- X. Y. Wang, S. J. Li, and X. P. Kou, “A self-adaptive three-term conjugate gradient method for monotone nonlinear equations with convex constraints,” Calcolo, vol. 53, no. 2, pp. 133–145, 2016.
- P. Gao, C. He, and Y. Liu, “An adaptive family of projection methods for constrained monotone nonlinear equations with applications,” Applied Mathematics and Computation, vol. 359, pp. 1–16, 2019.
- Y. Dai and C. Kou, “A Barzilai-Borwein conjugate gradient method,” Science China Mathematics, vol. 59, no. 8, pp. 1511–1524, 2016.
- C. G. Broyden, “The convergence of a class of double-rank minimization algorithms 1. General considerations,” IMA Journal of Applied Mathematics, vol. 6, no. 1, pp. 76–90, 1970.
- R. Fletcher, “A new approach to variable metric algorithms,” The Computer Journal, vol. 13, no. 3, pp. 317–322, 1970.
- D. Goldfarb, “A family of variable-metric methods derived by variational means,” Mathematics of Computation, vol. 24, no. 109, p. 23, 1970.
- D. F. Shanno, “Conditioning of quasi-Newton methods for function minimization,” Mathematics of Computation, vol. 24, no. 111, p. 647, 1970.
- N. Andrei, “Acceleration of conjugate gradient algorithms for unconstrained optimization,” Applied Mathematics and Computation, vol. 213, no. 2, pp. 361–369, 2009.
- G. Zoutendijk, “Nonlinear programming, computational methods,” Integer and Nonlinear Programming, pp. 37–86, 1970.
- N. Andrei, “An unconstrained optimization test functions collection,” Advanced Modeling and Optimization, vol. 10, no. 1, pp. 147–161, 2008.
- N. Andrei, “A simple three-term conjugate gradient algorithm for unconstrained optimization,” Journal of Computational and Applied Mathematics, vol. 241, pp. 19–29, 2013.
- E. D. Dolan and J. J. Moré, “Benchmarking optimization software with performance profiles,” Mathematical Programming, vol. 91, no. 2, pp. 201–213, 2002.
Copyright © 2021 Shengwei Yao et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.