## Machine Learning and its Applications in Image Restoration

Research Article | Open Access

Shengwei Yao, Yuping Wu, Jielan Yang, Jieqiong Xu, "A Three-Term Gradient Descent Method with Subspace Techniques", Mathematical Problems in Engineering, vol. 2021, Article ID 8867309, 7 pages, 2021. https://doi.org/10.1155/2021/8867309

# A Three-Term Gradient Descent Method with Subspace Techniques

Revised: 21 Sep 2020
Accepted: 18 Dec 2020
Published: 07 Jan 2021

#### Abstract

We propose a three-term gradient descent method that is well suited to unconstrained optimization problems. The search direction of the method is generated in a specific subspace: a quadratic approximation model is minimized in the process of generating the search direction. In order to reduce the amount of calculation and make the best use of existing information, the subspace is spanned by the gradients at the current and prior iteration points together with the previous search direction. By using this subspace-based optimization technique, the global convergence result is established under the Wolfe line search. The results of numerical experiments show that the new method is effective and robust.

#### 1. Introduction

Gradient descent and conjugate gradient (CG) methods have profound significance for unconstrained optimization problems:

$$\min_{x\in\mathbb{R}^{n}} f(x), \qquad (1)$$

where $f:\mathbb{R}^{n}\to\mathbb{R}$ is a continuously differentiable function. They are widely used because of their low storage requirements and strong convergence properties. Starting from an initial iteration point $x_0$, in each iteration the method produces an approximate solution sequence for (1):

$$x_{k+1}=x_k+\alpha_k d_k, \qquad (2)$$

in which $x_k$ is the current iteration point, $\alpha_k>0$ is the step length, and $d_k$ is the search direction, which for typical CG methods has the form

$$d_k=\begin{cases}-g_k, & k=0,\\ -g_k+\beta_k d_{k-1}, & k\ge 1,\end{cases} \qquad (3)$$

where $g_k=\nabla f(x_k)$ is the gradient of $f$ at $x_k$ and $\beta_k$ is the CG parameter. The step size $\alpha_k$ is usually determined by performing a specific line search, among which the Wolfe line search [1, 2] is one of the most commonly used:

$$f(x_k+\alpha_k d_k)\le f(x_k)+\delta\alpha_k g_k^{T}d_k \qquad (4)$$

and

$$g(x_k+\alpha_k d_k)^{T}d_k\ge\sigma g_k^{T}d_k, \qquad (5)$$

where $0<\delta<\sigma<1$. Sometimes, the strong Wolfe line search, given by (4) and

$$\left|g(x_k+\alpha_k d_k)^{T}d_k\right|\le-\sigma g_k^{T}d_k, \qquad (6)$$

is also widely used in CG methods for the establishment of convergence results. Different choices of $\beta_k$ yield different versions of the CG method. The research results on conjugate gradient algorithms are very rich, including the PRP [3], HS [4], LS [5], and DY [6] methods. For a detailed survey of conjugate gradient methods, refer to [7–11].
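The CG iteration (2)–(3) with a Wolfe line search can be sketched as follows. This is an illustrative toy implementation, not the paper's code: it uses the PRP parameter, a bisection-based weak Wolfe search, a steepest-descent restart safeguard, and a simple 2-D convex quadratic test function; all names here are hypothetical.

```python
# Illustrative sketch (not the paper's code): a PRP conjugate gradient
# iteration (2)-(3) with a bisection-based weak Wolfe line search, applied
# to a simple 2-D convex quadratic.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def axpy(a, x, y):
    """Return a*x + y for list-based vectors."""
    return [a * xi + yi for xi, yi in zip(x, y)]

def f(x):  # f(x) = 2*x0^2 + x1^2, a strictly convex quadratic
    return 2 * x[0] ** 2 + x[1] ** 2

def grad(x):
    return [4 * x[0], 2 * x[1]]

def wolfe_step(x, d, delta=1e-4, sigma=0.9):
    """Bisection search for a step satisfying the weak Wolfe conditions (4)-(5)."""
    lo, hi, alpha = 0.0, None, 1.0
    fx, gd = f(x), dot(grad(x), d)
    for _ in range(60):
        xn = axpy(alpha, d, x)
        if f(xn) > fx + delta * alpha * gd:        # sufficient decrease (4) fails
            hi = alpha
        elif dot(grad(xn), d) < sigma * gd:        # curvature condition (5) fails
            lo = alpha
        else:
            return alpha
        alpha = (lo + hi) / 2 if hi is not None else 2 * alpha
    return alpha

def prp_cg(x, tol=1e-8, max_iter=200):
    g = grad(x)
    d = [-gi for gi in g]
    for _ in range(max_iter):
        if dot(g, g) ** 0.5 <= tol:
            break
        alpha = wolfe_step(x, d)
        x = axpy(alpha, d, x)
        g_new = grad(x)
        y = [gn - gi for gn, gi in zip(g_new, g)]
        beta = dot(g_new, y) / dot(g, g)           # PRP parameter
        d = axpy(beta, d, [-gi for gi in g_new])   # d = -g_new + beta*d
        if dot(g_new, d) >= 0:                     # safeguard: restart if not descent
            d = [-gi for gi in g_new]
        g = g_new
    return x

print(prp_cg([3.0, -2.0]))  # converges to the minimizer (0, 0)
```

The restart safeguard illustrates why descent properties such as those proved in Section 4 matter: without it, the raw PRP direction under a weak Wolfe step can fail to be a descent direction even on a quadratic.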

Recently, subspace techniques have attracted more and more researchers' attention. Various subspace techniques have been applied to construct different approaches to various optimization problems; for a detailed description of subspace techniques, refer to [12–14]. In [15], Yuan and Stoer came up with the SMCG method by using a subspace technique in the conjugate gradient method. Specifically, a two-dimensional subspace $\Omega_k=\operatorname{Span}\{g_k, s_{k-1}\}$ is used in [15] to determine the search direction, that is,

$$d_k=\mu_k g_k+\nu_k s_{k-1}, \qquad (7)$$

where $\mu_k$ and $\nu_k$ are parameters and $s_{k-1}=x_k-x_{k-1}$. Li et al. [16] and Wang et al. [17] obtained the values of $\mu_k$ and $\nu_k$ by minimizing a dynamically selected conic approximation model and a tensor model, respectively. In order to solve nonlinear optimization problems, Conn et al. [18] brought up a class of iterated-subspace minimization methods. Following Yuan's idea [15], Yang et al. [19] raised a new CG method on the same subspace $\Omega_k$.
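The subspace idea behind (7) can be illustrated concretely: restricting the quadratic model $g^{T}d+\frac{1}{2}d^{T}Bd$ to $\operatorname{Span}\{g_k, s_{k-1}\}$ and minimizing over the two coefficients reduces the problem to a 2×2 linear system. The explicit SPD matrix $B$ below is chosen purely for illustration; SMCG-type methods estimate products such as $g^{T}Bg$ rather than forming $B$.

```python
# Sketch of the two-dimensional subspace idea in (7): minimize the quadratic
# model m(d) = g^T d + (1/2) d^T B d over Span{g, s}, which reduces to a
# 2x2 linear system in the coefficients (mu, nu).

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(B, v):
    return [dot(row, v) for row in B]

def subspace_direction(g, s, B):
    Bg, Bs = matvec(B, g), matvec(B, s)
    # Reduced system: [[g^T B g, g^T B s], [s^T B g, s^T B s]] [mu, nu]^T = -[g^T g, s^T g]^T
    a11, a12, a22 = dot(g, Bg), dot(g, Bs), dot(s, Bs)
    b1, b2 = -dot(g, g), -dot(s, g)
    det = a11 * a22 - a12 * a12
    mu = (b1 * a22 - a12 * b2) / det   # Cramer's rule
    nu = (a11 * b2 - a12 * b1) / det
    return [mu * gi + nu * si for gi, si in zip(g, s)]

B = [[4.0, 0.0], [0.0, 2.0]]  # Hessian of f(x) = 2*x0^2 + x1^2
g = [12.0, -4.0]              # gradient of f at x = (3, -2)
s = [1.0, 1.0]                # a hypothetical previous step
d = subspace_direction(g, s, B)
print(d, dot(g, d) < 0)       # → [-3.0, 2.0] True: a descent direction
```

Note that with the exact Hessian, the subspace minimizer here coincides with the Newton step $-B^{-1}g=(-3,2)$, which happens to lie in the chosen subspace.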

For better convergence and robust numerical performance, some authors have paid attention to conjugate gradient methods with different forms. Zhang et al. [20] presented a three-term conjugate gradient method that possesses the sufficient descent property without any line search; they further extended it to two variant methods and established global convergence results for general functions under the standard Wolfe line search. In [21], Narushima et al. proposed a specific three-term CG method, drawing on the idea of the multistep quasi-Newton method. Deng and Wan [22] put forward a three-term CG algorithm in which the direction combines the current gradient with information from the previous iteration. For some other three-term conjugate gradient methods, refer to [23–27].

The outline of the article is as follows. In Section 2, we use the technique of subspace minimization to derive the search direction on the subspace spanned by $g_k$, $g_{k-1}$, and $d_{k-1}$. We elaborate the proposed algorithm in Section 3. Section 4 provides the convergence analysis of the given algorithm under suitable conditions. In Section 5, numerical experiments comparing the method with other methods are presented.

#### 2. Derivation of the Search Direction

Different from the abovementioned two-dimensional subspace, in this paper the subspace

$$\mathcal{S}_k=\operatorname{Span}\{g_k,\, g_{k-1},\, d_{k-1}\} \qquad (8)$$

is used to construct the search direction. In order to simplify the calculation, at each iteration point $x_k$ the following quadratic model is used to approximate the function $f$:

$$\min_{d\in\mathcal{S}_k}\; m_k(d)=g_k^{T}d+\frac{1}{2}d^{T}B_k d, \qquad (9)$$

where $B_k$ can be viewed as an approximation of the Hessian matrix. Moreover, $B_k$ is supposed to be positive definite and to satisfy the quasi-Newton equation $B_k s_{k-1}=y_{k-1}$, where $s_{k-1}=x_k-x_{k-1}$ and $y_{k-1}=g_k-g_{k-1}$. Model (9) has two advantages: it is not only a good approximation of $f$ near $x_k$ but also convenient to minimize over the subspace $\mathcal{S}_k$. Since the generating vectors are usually, but not always, linearly independent, we discuss the cases of dimension 3 and dimension 2.

Case I: $\dim(\mathcal{S}_k)=3$. In this case, the direction has the form

$$d_k=\mu_k g_k+\nu_k g_{k-1}+\omega_k d_{k-1}, \qquad (10)$$

where $\mu_k$, $\nu_k$, and $\omega_k$ are parameters to be calculated. Substituting (10) into the minimizing problem (9) turns it into a minimization over the coefficients, problem (11); setting the gradient of (11) to zero shows that the coefficients solve the linear system (12). To solve system (12), curvature quantities of the form $u^{T}B_k v$ must be estimated. Inspired by BBCG [28], we adopt the estimate (13). Next, we choose the BFGS update [29–32] initialized with the identity matrix to compute $B_k$, as in (14). This choice not only retains useful information but also gives numerical results that perform better than a scaling matrix. Substituting (13) and (14) into the linear system (12), we get system (15). Combining (13), the determinant of system (15) is calculated as in (16), and the coefficients are then obtained as in (17).

Case II: $\dim(\mathcal{S}_k)=2$. Numerical experiments indicate that the two-dimensional subspace $\operatorname{Span}\{g_k, d_{k-1}\}$ produces better numerical results in this case, so we choose it and express the direction as

$$d_k=\mu_k g_k+\nu_k d_{k-1}. \qquad (18)$$

Substituting (18) into (9) gives the two-variable problem (19), whose solution is expressed in (20); the resulting direction is given in (21). Under an exact line search, i.e., $g_k^{T}d_{k-1}=0$, it follows from (17) and (21) that the search direction reduces to the form (22), which means that the obtained method degrades to the HS method of Hestenes and Stiefel [4].
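As a hedged sketch of Case I: with basis $V=[g_k \mid g_{k-1} \mid d_{k-1}]$, minimizing the quadratic model over $\operatorname{Span}(V)$ reduces to the 3×3 system $(V^{T}B V)z=-V^{T}g_k$ with $d_k=Vz$. Here an explicit SPD matrix stands in for $B_k$, whereas the paper estimates the curvature products as in (13)–(14); all names are hypothetical.

```python
# Hedged sketch of Case I (dim = 3): with basis V = [g_k, g_{k-1}, d_{k-1}],
# minimizing m(d) = g^T d + (1/2) d^T B d over Span(V) reduces to the 3x3
# system (V^T B V) z = -V^T g, and d = V z.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(B, v):
    return [dot(row, v) for row in B]

def solve3(A, b):
    """Gaussian elimination with partial pivoting for a 3x3 system."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, 3):
            t = M[r][c] / M[c][c]
            M[r] = [mr - t * mc for mr, mc in zip(M[r], M[c])]
    z = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        z[r] = (M[r][3] - sum(M[r][c] * z[c] for c in range(r + 1, 3))) / M[r][r]
    return z

def three_term_direction(g, g_prev, d_prev, B):
    V = [g, g_prev, d_prev]                           # subspace basis
    BV = [matvec(B, v) for v in V]
    A = [[dot(vi, Bvj) for Bvj in BV] for vi in V]    # V^T B V
    rhs = [-dot(v, g) for v in V]                     # -V^T g
    z = solve3(A, rhs)
    return [sum(zi * vi[j] for zi, vi in zip(z, V)) for j in range(len(g))]

B = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 5.0]]  # illustrative SPD "Hessian"
g = [3.0, -1.0, 2.0]
g_prev = [1.0, 2.0, 0.0]
d_prev = [0.0, 1.0, 1.0]
d = three_term_direction(g, g_prev, d_prev, B)
print(dot(g, d) < 0)  # the subspace minimizer is a descent direction: True
```

In this toy example the three basis vectors happen to span the whole space, so the subspace minimizer coincides with the Newton step $-B^{-1}g$; in general the subspace step is only the best direction within $\mathcal{S}_k$.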

#### 3. The Proposed TCGS Algorithm

This section elaborates the three-term CG method with subspace techniques (TCGS), in which the acceleration scheme of [33] is used (Algorithm 1).

Step 1: Given $x_0$ and $\varepsilon>0$, set $d_0=-g_0$ and $k=0$.
Step 2: If $\|g_k\|\le\varepsilon$ holds, then stop; else, go to Step 3.
Step 3: Determine the step size $\alpha_k$ by the Wolfe line search, which means that conditions (4) and (5) hold.
Step 4: Generate $x_{k+1}$ by using the acceleration scheme [33], and compute $g_{k+1}$, $s_k$, and $y_k$.
Step 5: If the subspace dimension is 2, compute $d_{k+1}$ by (21), and go to Step 6. If the subspace dimension is 3, compute $d_{k+1}$ by (10) and (17), and go to Step 6.
Step 6: Set $k=k+1$, and go to Step 2.

#### 4. Convergence Analysis

The main content of this section is to study the convergence properties of the TCGS algorithm. Some necessary assumptions for the objective function are given as follows.

Assumption 1. The level set $\mathcal{L}_0=\{x\in\mathbb{R}^{n}: f(x)\le f(x_0)\}$ is bounded.

Assumption 2. Suppose that $f$ is continuously differentiable and its gradient is Lipschitz continuous with constant $L>0$; that is,

$$\|g(x)-g(y)\|\le L\|x-y\|,\quad \forall x,y\in\mathbb{R}^{n}. \qquad (23)$$

Based on the above assumptions, it can be shown that there exists a constant $\Gamma>0$ such that

$$\|g(x)\|\le\Gamma \qquad (24)$$

for all $x$ in the level set. It is well known that, for an optimization algorithm, the properties of the search direction are very important for the convergence of the algorithm. First, we study some properties of the search direction generated by the TCGS algorithm. The following lemmas show that the direction is a descent direction and that the Dai–Liao conjugacy condition is satisfied.

Lemma 1. Suppose that $g_k\ne 0$. Then, the direction $d_k$ generated by the TCGS algorithm is a descent direction.

Proof. Since $d_k$ minimizes the model (9) over a subspace that contains $-tg_k$ for all $t>0$ and $g_k\ne 0$, we get $m_k(d_k)<m_k(0)=0$. Since $B_k$ is positive definite and $d_k$ is generated by the TCGS algorithm, $d_k^{T}B_k d_k>0$. Furthermore, $g_k^{T}d_k=m_k(d_k)-\frac{1}{2}d_k^{T}B_k d_k<0$, so $d_k$ is a descent direction.

Lemma 2. The search direction $d_k$ satisfies the Dai–Liao conjugacy condition with a suitable scalar $t$; namely, $d_k^{T}y_{k-1}=-t\,g_k^{T}s_{k-1}$.

Lemma 3. Suppose that $d_k$ is generated by the proposed TCGS algorithm and the step size $\alpha_k$ satisfies conditions (4) and (5); then

$$\alpha_k\ge\frac{1-\sigma}{L}\cdot\frac{\left|g_k^{T}d_k\right|}{\|d_k\|^{2}}.$$

Proof. Based on condition (5) and the Lipschitz condition (23), we get

$$(\sigma-1)\,g_k^{T}d_k\le\left(g_{k+1}-g_k\right)^{T}d_k\le L\alpha_k\|d_k\|^{2}.$$

Since $g_k^{T}d_k<0$, dividing by $L\|d_k\|^{2}$ proves the claim.

Lemma 4. Assume that Assumptions 1 and 2 hold. Consider the algorithm TCGS in which the Wolfe line search is used to compute the step size $\alpha_k$. Then, the Zoutendijk condition [34] holds:

$$\sum_{k\ge 0}\frac{\left(g_k^{T}d_k\right)^{2}}{\|d_k\|^{2}}<\infty. \qquad (28)$$

Proof. Combining condition (4) and Lemma 3, we have

$$f(x_k)-f(x_{k+1})\ge-\delta\alpha_k g_k^{T}d_k\ge\frac{\delta(1-\sigma)}{L}\cdot\frac{\left(g_k^{T}d_k\right)^{2}}{\|d_k\|^{2}}. \qquad (29)$$

Summing (29) over $k$ and using Assumption 1, which implies that $f$ is bounded below on the level set, deduces (28) directly.

Lemma 5. Suppose that the objective function satisfies Assumptions 1 and 2, and consider the algorithm TCGS in which the strong Wolfe line search (4) and (6) is used to compute the step size $\alpha_k$. If

$$\sum_{k\ge 0}\frac{1}{\|d_k\|^{2}}=\infty \qquad (30)$$

holds, then

$$\liminf_{k\to\infty}\|g_k\|=0. \qquad (31)$$

Theorem 1. Suppose that Assumptions 1 and 2 hold, and consider the sequence $\{x_k\}$ generated by the algorithm TCGS, where the step size $\alpha_k$ satisfies conditions (4) and (6). Let $f$ be uniformly convex; i.e., there exists a constant $\lambda>0$ such that

$$\left(g(x)-g(y)\right)^{T}(x-y)\ge\lambda\|x-y\|^{2},\quad \forall x,y\in\mathbb{R}^{n}. \qquad (32)$$

Then, (31) holds.

Proof. From (23) and (32), we get, respectively,

$$y_{k-1}^{T}s_{k-1}\le L\|s_{k-1}\|^{2}\quad\text{and}\quad y_{k-1}^{T}s_{k-1}\ge\lambda\|s_{k-1}\|^{2}. \qquad (33)$$

On the basis of (33), combined with the Cauchy–Schwarz inequality, it follows that (34) holds. Furthermore, applying the triangle inequality and (24), we obtain (35) and (36). Next, we analyze the convergence in two cases.

Case I: the subspace dimension is 3. The coefficients are computed by (16) and (17); using (24), (33), (35), and (36), we get the bound (37). Similarly, we obtain (38). Therefore, $\|d_k\|$ is bounded above, as in (39).

Case II: the subspace dimension is 2. Using (20), (24), and (33), we obtain (40); therefore, $\|d_k\|$ is bounded, as in (41).

In both cases, the boundedness of $\|d_k\|$ implies that condition (30) holds, and hence (31) follows from Lemma 5.

#### 5. Numerical Results

This section aims to assess the performance of the TCGS algorithm through numerical experiments and to verify its effectiveness on unconstrained problems. For this purpose, we compared the numerical performance of TCGS with the PRP [3] and MTHREECG [22] methods, in which MTHREECG has a structure similar to TCGS, and PRP is a classic and effective CG method.

In [22], Deng and Wan presented a similar three-term conjugate gradient method (MTHREECG).

We chose a total of 75 test functions, all of which come from [35]. For the numerical experiments, the dimension of each function ranges over 1000, 2000, …, 10000. The code is written in Fortran and is available at https://camo.ici.ro/neculai/THREECG/threecg.for. The default parameter values of the algorithm are consistent with [36].

The performance profile introduced by Dolan and Moré [37] is one of the most widely used tools for evaluating the performance of different methods. In this paper, we use it to investigate the numerical performance of the proposed TCGS algorithm.
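The profile can be sketched in a few lines: given a cost $t_{s,p}$ for each solver $s$ on problem $p$ (e.g., iteration counts), $\rho_s(\tau)$ is the fraction of problems on which solver $s$ is within a factor $\tau$ of the best solver. The data below are made up purely for illustration.

```python
# Minimal sketch of the Dolan-More performance profile: costs[s][p] holds the
# cost (e.g., iteration count) of solver s on problem p, and rho_s(tau) is the
# fraction of problems on which solver s is within a factor tau of the best.

def performance_profile(costs, taus):
    n_solvers, n_problems = len(costs), len(costs[0])
    best = [min(costs[s][p] for s in range(n_solvers)) for p in range(n_problems)]
    ratios = [[costs[s][p] / best[p] for p in range(n_problems)]
              for s in range(n_solvers)]
    return [[sum(r <= tau for r in ratios[s]) / n_problems for tau in taus]
            for s in range(n_solvers)]

# iteration counts of three hypothetical solvers on four problems
costs = [[10, 20, 15, 30],   # solver A
         [12, 18, 15, 60],   # solver B
         [25, 40, 45, 28]]   # solver C
print(performance_profile(costs, taus=[1.0, 2.0, 4.0]))
# → [[0.5, 1.0, 1.0], [0.5, 0.75, 1.0], [0.25, 0.25, 1.0]]
```

Reading the output: $\rho_s(1)$ is the share of problems on which solver $s$ is strictly best (or tied), and $\rho_s(\tau)\to 1$ indicates robustness as the tolerance factor grows.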

The performance profile for the number of iterations can be seen in Figure 1. It shows that the TCGS algorithm solves the largest share of the test problems with the fewest iterations, and as the factor $\tau$ increases, the TCGS method continues to outperform both the MTHREECG and PRP methods. Figure 2 presents the performance profile with respect to the number of function evaluations. The result is similar to that in Figure 1: TCGS again outperforms both the MTHREECG and PRP methods. From the numerical experiments, it can be seen that the proposed TCGS method is efficient and robust in dealing with a set of unconstrained test problems.

#### 6. Conclusion

By using the idea of subspace minimization, we propose a new type of subspace gradient descent method in this paper. In our method, the subspace is spanned by $g_k$, $g_{k-1}$, and $d_{k-1}$, and a quadratic approximation of the objective function is minimized over it to obtain the search direction; the direction is thus a linear combination of these three vectors, with coefficients computed by distinguishing two cases. In addition, we prove the descent property of the direction and show that the Dai–Liao conjugacy condition is satisfied. Under the Wolfe line search, the convergence of the proposed TCGS algorithm is established. The numerical results show that the performance of the TCGS algorithm on a set of unconstrained problems is competitive.

#### Data Availability

The data used to support the findings of this study are included within the article.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest.

#### Acknowledgments

This research was supported by the Guangxi Natural Science Foundation (nos. 2018GXNSFAA281340 and 2020GXNSFAA159014) and Program for Innovative Team of Guangxi University of Finance and Economics.

1. P. Wolfe, "Convergence conditions for ascent methods," SIAM Review, vol. 11, no. 2, pp. 226–235, 1969.
2. P. Wolfe, "Convergence conditions for ascent methods. II: some corrections," SIAM Review, vol. 13, no. 2, pp. 185–188, 1971.
3. B. T. Polyak, "The conjugate gradient method in extremal problems," USSR Computational Mathematics and Mathematical Physics, vol. 9, no. 4, pp. 94–112, 1969.
4. M. R. Hestenes and E. Stiefel, Methods of Conjugate Gradients for Solving Linear Systems, NBS, Washington, DC, USA, 1952.
5. Y. Liu and C. Storey, "Efficient generalized conjugate gradient algorithms, part 1: theory," Journal of Optimization Theory and Applications, vol. 69, no. 1, pp. 129–137, 1991.
6. Y. H. Dai and Y. Yuan, "A nonlinear conjugate gradient method with a strong global convergence property," SIAM Journal on Optimization, vol. 10, no. 1, pp. 177–182, 1999.
7. Z. Wei, S. Yao, and L. Liu, "The convergence properties of some new conjugate gradient methods," Applied Mathematics and Computation, vol. 183, no. 2, pp. 1341–1350, 2006.
8. G. Yuan, Z. Meng, and Y. Li, "A modified Hestenes and Stiefel conjugate gradient algorithm for large-scale nonsmooth minimizations and nonlinear equations," Journal of Optimization Theory and Applications, vol. 168, no. 1, pp. 129–152, 2016.
9. G. Yuan, Z. Wei, and X. Lu, "Global convergence of BFGS and PRP methods under a modified weak Wolfe-Powell line search," Applied Mathematical Modelling, vol. 47, pp. 811–825, 2017.
10. G. Yuan, Z. Wei, and Y. Yang, "The global convergence of the Polak-Ribière-Polyak conjugate gradient algorithm under inexact line search for nonconvex functions," Journal of Computational and Applied Mathematics, vol. 362, pp. 262–275, 2019.
11. W. W. Hager and H. Zhang, "A survey of nonlinear conjugate gradient methods," Pacific Journal of Optimization, vol. 2, no. 1, pp. 35–58, 2006.
12. Y.-X. Yuan, "Subspace techniques for nonlinear optimization," Series in Contemporary Applied Mathematics, Chinese Academy of Sciences, Beijing, China, 2007.
13. Y.-X. Yuan, "Subspace methods for large scale nonlinear equations and nonlinear least squares," Optimization and Engineering, vol. 10, no. 2, pp. 207–218, 2009.
14. Y. Yuan, "A review on subspace methods for nonlinear optimization," in Proceedings of the International Congress of Mathematics, pp. 807–827, Seoul, South Korea, August 2014.
15. Y.-X. Yuan and J. Stoer, "A subspace study on conjugate gradient algorithms," ZAMM-Journal of Applied Mathematics and Mechanics/Zeitschrift für Angewandte Mathematik und Mechanik, vol. 75, no. 1, pp. 69–77, 1995.
16. Y. Li, Z. Liu, and H. Liu, "A subspace minimization conjugate gradient method based on conic model for unconstrained optimization," Computational and Applied Mathematics, vol. 38, no. 1, p. 16, 2019.
17. T. Wang, Z. Liu, and H. Liu, "A new subspace minimization conjugate gradient method based on tensor model for unconstrained optimization," International Journal of Computer Mathematics, vol. 96, no. 10, pp. 1924–1942, 2019.
18. A. Conn, N. Gould, A. Sartenaer, and P. Toint, "On iterated-subspace minimization methods for nonlinear optimization," in Proceedings of the Linear and Nonlinear Conjugate Gradient-Related Methods, Seattle, WA, USA, January 1996.
19. Y. Yang, Y. Chen, and Y. Lu, "A subspace conjugate gradient algorithm for large-scale unconstrained optimization," Numerical Algorithms, vol. 76, no. 3, pp. 813–828, 2017.
20. L. Zhang, W. Zhou, and D. Li, "Some descent three-term conjugate gradient methods and their global convergence," Optimization Methods and Software, vol. 22, no. 4, pp. 697–711, 2007.
21. Y. Narushima, H. Yabe, and J. A. Ford, "A three-term conjugate gradient method with sufficient descent property for unconstrained optimization," SIAM Journal on Optimization, vol. 21, no. 1, pp. 212–230, 2011.
22. S. Deng and Z. Wan, "A three-term conjugate gradient algorithm for large-scale unconstrained optimization problems," Applied Numerical Mathematics, vol. 92, pp. 70–81, 2015.
23. N. Andrei, "A new three-term conjugate gradient algorithm for unconstrained optimization," Numerical Algorithms, vol. 68, no. 2, pp. 305–321, 2015.
24. S. Yao and L. Ning, "An adaptive three-term conjugate gradient method based on self-scaling memoryless BFGS matrix," Journal of Computational and Applied Mathematics, vol. 332, pp. 72–85, 2018.
25. H. Kobayashi, Y. Narushima, and H. Yabe, "Descent three-term conjugate gradient methods based on secant conditions for unconstrained optimization," Optimization Methods and Software, vol. 32, no. 6, pp. 1313–1329, 2017.
26. X. Y. Wang, S. J. Li, and X. P. Kou, "A self-adaptive three-term conjugate gradient method for monotone nonlinear equations with convex constraints," Calcolo, vol. 53, no. 2, pp. 133–145, 2016.
27. P. Gao, C. He, and Y. Liu, "An adaptive family of projection methods for constrained monotone nonlinear equations with applications," Applied Mathematics and Computation, vol. 359, pp. 1–16, 2019.
28. Y. Dai and C. Kou, "A Barzilai-Borwein conjugate gradient method," Science China Mathematics, vol. 59, no. 8, pp. 1511–1524, 2016.
29. C. G. Broyden, "The convergence of a class of double-rank minimization algorithms 1. General considerations," IMA Journal of Applied Mathematics, vol. 6, no. 1, pp. 76–90, 1970.
30. R. Fletcher, "A new approach to variable metric algorithms," The Computer Journal, vol. 13, no. 3, pp. 317–322, 1970.
31. D. Goldfarb, "A family of variable-metric methods derived by variational means," Mathematics of Computation, vol. 24, no. 109, p. 23, 1970.
32. D. F. Shanno, "Conditioning of quasi-Newton methods for function minimization," Mathematics of Computation, vol. 24, no. 111, p. 647, 1970.
33. N. Andrei, "Acceleration of conjugate gradient algorithms for unconstrained optimization," Applied Mathematics and Computation, vol. 213, no. 2, pp. 361–369, 2009.
34. G. Zoutendijk, "Nonlinear programming, computational methods," Integer and Nonlinear Programming, pp. 37–86, 1970.
35. N. Andrei, "An unconstrained optimization test functions collection," Advanced Modeling and Optimization, vol. 10, no. 1, pp. 147–161, 2008.
36. N. Andrei, "A simple three-term conjugate gradient algorithm for unconstrained optimization," Journal of Computational and Applied Mathematics, vol. 241, pp. 19–29, 2013.
37. E. D. Dolan and J. J. Moré, "Benchmarking optimization software with performance profiles," Mathematical Programming, vol. 91, no. 2, pp. 201–213, 2002.
