Advanced Techniques in Computational Mechanics (Special Issue)
Research Article | Open Access
A Conjugate Gradient Method with Global Convergence for Large-Scale Unconstrained Optimization Problems
The conjugate gradient (CG) method has played a special role in solving large-scale nonlinear optimization problems because of the simplicity of its iterations and its very low memory requirements. This paper proposes a conjugate gradient method that is similar to the Dai-Liao conjugate gradient method (Dai and Liao, 2001) but has stronger convergence properties. The proposed method satisfies the sufficient descent condition and is globally convergent under the strong Wolfe-Powell (SWP) line search for general functions. Our numerical results show that the proposed method is very efficient on the test problems.
1. Introduction
The conjugate gradient (CG) method has played a special role in solving large-scale nonlinear optimization problems because of the simplicity of its iterations and its very low memory requirements. In fact, the CG method is not among the fastest or most robust optimization algorithms for nonlinear problems available today, but it remains very popular with engineers and mathematicians who are interested in solving large problems. The conjugate gradient method is designed to solve the following unconstrained optimization problem:
$$\min \{ f(x) : x \in \mathbb{R}^n \}, \tag{1}$$
where $f : \mathbb{R}^n \to \mathbb{R}$ is a smooth, nonlinear function whose gradient will be denoted by $g(x)$. The iterative formula of the conjugate gradient method is given by
$$x_{k+1} = x_k + \alpha_k d_k, \tag{2}$$
where $\alpha_k$ is a step length computed by a line search, and $d_k$ is the search direction defined by
$$d_k = \begin{cases} -g_k, & k = 1, \\ -g_k + \beta_k d_{k-1}, & k \ge 2, \end{cases} \tag{3}$$
where $\beta_k$ is a scalar and $g_k$ denotes the gradient $\nabla f(x_k)$. If $f$ is a strictly convex quadratic function, namely,
$$f(x) = \frac{1}{2} x^T H x + b^T x, \tag{4}$$
where $H$ is a positive definite matrix, and if $\alpha_k$ is the exact one-dimensional minimizer along the direction $d_k$, then the method given by (2) and (3) is called the linear conjugate gradient method. Otherwise, (2) and (3) define the nonlinear conjugate gradient method. The most important feature of the linear conjugate gradient method is that the search directions satisfy the following conjugacy condition:
$$d_i^T H d_j = 0, \quad i \ne j. \tag{5}$$
For nonlinear conjugate gradient methods and general objective functions, (5) does not hold, since the Hessian changes from point to point.
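Some well-known formulas for $\beta_k$ are the Fletcher-Reeves (FR), Polak-Ribière-Polyak (PRP), Hestenes-Stiefel (HS), and Dai-Yuan (DY) formulas, which are given, respectively, by
$$\beta_k^{FR} = \frac{\|g_k\|^2}{\|g_{k-1}\|^2}, \tag{6}$$
$$\beta_k^{PRP} = \frac{g_k^T (g_k - g_{k-1})}{\|g_{k-1}\|^2}, \tag{7}$$
$$\beta_k^{HS} = \frac{g_k^T (g_k - g_{k-1})}{d_{k-1}^T (g_k - g_{k-1})}, \tag{8}$$
$$\beta_k^{DY} = \frac{\|g_k\|^2}{d_{k-1}^T (g_k - g_{k-1})}, \tag{9}$$
where $\|\cdot\|$ denotes the Euclidean norm. The corresponding conjugate gradient methods are abbreviated as the FR, PRP, HS, and DY methods. Although all these methods are equivalent in the linear case, namely, when $f$ is a strictly convex quadratic function and the $\alpha_k$ are determined by exact line search, their behavior for general objective functions may differ considerably.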
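As a small illustration of the four classical parameters, the following Python sketch evaluates them from the current gradient, the previous gradient, and the previous direction. It uses the standard textbook forms of the FR, PRP, HS, and DY formulas; the function names `dot` and `classical_betas` are hypothetical helpers introduced here, not part of the paper.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def classical_betas(g_new, g_old, d_old):
    """Return the FR, PRP, HS and DY parameters for one CG step.

    g_new : gradient at the new iterate (g_k)
    g_old : gradient at the previous iterate (g_{k-1})
    d_old : previous search direction (d_{k-1})
    """
    y = [a - b for a, b in zip(g_new, g_old)]  # gradient change g_k - g_{k-1}
    gg_new = dot(g_new, g_new)                 # ||g_k||^2
    gg_old = dot(g_old, g_old)                 # ||g_{k-1}||^2
    dy = dot(d_old, y)                         # d_{k-1}^T (g_k - g_{k-1})
    return {
        "FR": gg_new / gg_old,
        "PRP": dot(g_new, y) / gg_old,
        "HS": dot(g_new, y) / dy,
        "DY": gg_new / dy,
    }
```

For instance, with `g_old = [3.0, 4.0]`, `d_old = [-3.0, -4.0]` (a steepest descent first step) and `g_new = [4.0, -3.0]`, which is orthogonal to both as it would be after an exact line search on a quadratic, all four values coincide. This is consistent with the equivalence of the methods in the linear case.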
For general functions, Zoutendijk [1] proved the global convergence of the FR method with exact line search (here and throughout this paper, by global convergence we mean that the sequence generated by the corresponding method either terminates after finitely many steps or contains a subsequence that converges to a stationary point of the objective function from a given initial point). Although one would be satisfied with its global convergence properties, the FR method performs much worse than the PRP (HS) method in real computations. Powell [2] analyzed a major numerical drawback of the FR method; namely, if a small step is generated away from the solution point, the subsequent steps may also be very short. On the other hand, in practical computation, the HS method resembles the PRP method, and both methods are generally believed to be the most efficient conjugate gradient methods since they essentially perform a restart if a bad direction occurs. However, Powell [3] constructed a counterexample and showed that the PRP method and the HS method can cycle infinitely without approaching the solution. This example suggests that these two methods are not globally convergent for general functions. Therefore, in the past two decades, much effort has been exerted to find new formulas for conjugate gradient methods that are not only globally convergent for general functions but also perform well numerically.
Recently, using a new conjugacy condition, Dai and Liao [4] proposed two new methods. Interestingly, one of their methods is not only globally convergent for general functions but also performs better than the HS and PRP methods. In this paper, similar to Dai and Liao's approach, we propose another formula for $\beta_k$, analyze the convergence properties of the resulting method, and carry out numerical experiments which show that the method is robust and efficient.
The remainder of this paper is organized as follows. In Section 2, we first state the formula proposed by Dai and Liao [4] and the motivations of this paper, and then we propose the new nonlinear conjugate gradient method. In Section 3, the convergence analysis of the proposed method is presented. Numerical results are reported in Section 4. Finally, some conclusions are given in Section 5.
2. Motivations and New Nonlinear Conjugate Gradient Method
2.1. Dai-Liao's Methods
It is well known that the linear conjugate gradient methods generate a sequence of search directions $d_k$ such that the conjugacy condition (5) holds. Denote by $y_{k-1}$ the gradient change, that is,
$$y_{k-1} = g_k - g_{k-1}. \tag{10}$$
For a general nonlinear function $f$, we know by the mean value theorem that there exists some $\tau \in (0, 1)$ such that
$$d_k^T y_{k-1} = \alpha_{k-1} d_k^T \nabla^2 f(x_{k-1} + \tau \alpha_{k-1} d_{k-1}) d_{k-1}. \tag{11}$$
Therefore, it is reasonable to replace (5) with the following conjugacy condition:
$$d_k^T y_{k-1} = 0. \tag{12}$$
Recently, an extension of (12) has been studied by Dai and Liao in [4]. Their approach is based on quasi-Newton techniques. Recall that, in the quasi-Newton method, an approximation matrix $B_{k-1}$ of the Hessian is updated such that the new matrix $B_k$ satisfies the following quasi-Newton equation:
$$B_k s_{k-1} = y_{k-1}, \tag{13}$$
where $s_{k-1} = x_k - x_{k-1} = \alpha_{k-1} d_{k-1}$. The search direction in the quasi-Newton method is calculated by
$$d_k = -B_k^{-1} g_k. \tag{14}$$
Combining these two equations and using the symmetry of $B_k$, we obtain
$$d_k^T y_{k-1} = d_k^T (B_k s_{k-1}) = -g_k^T s_{k-1}. \tag{15}$$
The previous relation implies that (12) holds if the line search is exact, since in this case $g_k^T d_{k-1} = 0$. However, practical numerical algorithms normally adopt inexact line searches instead of exact line searches. For this reason, it seems more reasonable to replace the conjugacy condition (12) with the condition
$$d_k^T y_{k-1} = -t g_k^T s_{k-1}, \tag{16}$$
where $t \ge 0$ is a scalar.
To ensure that the search direction $d_k$ satisfies the conjugacy condition (16), one only needs to multiply (3) by $y_{k-1}^T$ and use (16), yielding
$$\beta_k^{DL} = \frac{g_k^T (y_{k-1} - t s_{k-1})}{d_{k-1}^T y_{k-1}}. \tag{17}$$
It is obvious that
$$\beta_k^{DL} = \beta_k^{HS} - t \frac{g_k^T s_{k-1}}{d_{k-1}^T y_{k-1}}. \tag{18}$$
For simplicity, we call the method with (2), (3), and (17) the DL method. Dai and Liao also prove that the conjugate gradient method with $\beta_k^{DL}$ is globally convergent for uniformly convex functions. For general functions, Powell [3] constructed an example showing that the PRP method may cycle without approaching any solution point if the step length $\alpha_k$ is chosen to be the first local minimizer along $d_k$. Since the DL method reduces to the PRP method in the case that $g_k^T d_{k-1} = 0$ holds, this implies that the method with (17) need not converge for general functions. To obtain global convergence, like Gilbert and Nocedal [5], who proved the global convergence of the PRP method with the restriction $\beta_k^{PRP} \ge 0$, Dai and Liao replaced (17) by
$$\beta_k^{DL+} = \max \left\{ \frac{g_k^T y_{k-1}}{d_{k-1}^T y_{k-1}}, 0 \right\} - t \frac{g_k^T s_{k-1}}{d_{k-1}^T y_{k-1}}. \tag{19}$$
We call the method with (2), (3), and (19) the DL+ method. Dai and Liao show that the DL+ method is globally convergent for general functions under the sufficient descent condition (21) and some suitable conditions. In addition, numerical experiments in [4] indicate the efficiency of this method.
Similar to Dai and Liao's approach, Li et al. [6] proposed another conjugacy condition and related conjugate gradient methods, and they also prove that the proposed methods are globally convergent under some assumptions.
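The Dai-Liao construction can be checked numerically. The sketch below, written under the assumption that the DL parameter has the usual form $g_k^T(y_{k-1} - t s_{k-1}) / d_{k-1}^T y_{k-1}$ with $s_{k-1} = \alpha_{k-1} d_{k-1}$, computes the DL and DL+ parameters; the function names and argument layout are hypothetical and chosen only for this illustration.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def _y_s_dy(g_new, g_old, d_old, alpha_old):
    """Helper: gradient change y, step s = alpha * d, and d^T y."""
    y = [a - b for a, b in zip(g_new, g_old)]
    s = [alpha_old * di for di in d_old]
    return y, s, dot(d_old, y)


def beta_dl(g_new, g_old, d_old, alpha_old, t):
    """Dai-Liao parameter: g^T (y - t s) / (d^T y)."""
    y, s, dy = _y_s_dy(g_new, g_old, d_old, alpha_old)
    return (dot(g_new, y) - t * dot(g_new, s)) / dy


def beta_dl_plus(g_new, g_old, d_old, alpha_old, t):
    """Truncated variant: max(g^T y / d^T y, 0) - t g^T s / d^T y."""
    y, s, dy = _y_s_dy(g_new, g_old, d_old, alpha_old)
    return max(dot(g_new, y) / dy, 0.0) - t * dot(g_new, s) / dy
```

With the direction built as $d_k = -g_k + \beta_k d_{k-1}$ from `beta_dl`, the product $d_k^T y_{k-1}$ equals $-t g_k^T s_{k-1}$ up to rounding, which is exactly the extended conjugacy condition. The truncated variant only differs when $g_k^T y_{k-1} / d_{k-1}^T y_{k-1}$ is negative.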
2.2. Motivations
From the above discussion, Dai and Liao's approach is effective; the main reason is that the search directions generated by the DL method or the DL+ method contain not only gradient information but also some Hessian information. From (18) and (19), $\beta_k^{DL}$ and $\beta_k^{DL+}$ are formed by two parts; the first part is $\beta_k^{HS}$ (or $\max\{\beta_k^{HS}, 0\}$), and the second part is $-t g_k^T s_{k-1} / d_{k-1}^T y_{k-1}$. So, we can also consider the DL and DL+ methods as modified forms of the HS method obtained by adding some Hessian information, which is contained in the second part. The convergence properties of the HS method are similar to those of the PRP method; it does not converge for general functions even if the line search is exact. In order to obtain convergence, one also needs the nonnegative restriction $\beta_k \ge 0$ and the sufficient descent assumption (21). From the above discussion, the descent condition or sufficient descent condition and the nonnegativity of $\beta_k$ play important roles in the convergence analysis. We say that the descent condition holds if, for each search direction $d_k$,
$$g_k^T d_k < 0. \tag{20}$$
In addition, we say that the sufficient descent condition holds if there exists a constant $c > 0$ such that, for each search direction $d_k$, we have
$$g_k^T d_k \le -c \|g_k\|^2. \tag{21}$$
Motivated by the above ideas, in this paper, we focus on finding a new conjugate gradient method which possesses the following properties: (1) the nonnegative property $\beta_k \ge 0$; (2) the new formula contains not only gradient information but also some Hessian information; (3) the search directions generated by the proposed method satisfy the sufficient descent condition (21).
2.3. The New Conjugate Gradient Method
From the structure of (6), (7), (8), and (9), the FR and DY methods have the common numerator $\|g_k\|^2$, and the PRP and HS methods have the common numerator $g_k^T y_{k-1}$; this different choice gives them different properties. Generally speaking, the FR and DY methods have better convergence properties, and the PRP and HS methods perform better numerically. Powell [2] pointed out that the FR method, with exact line search, is susceptible to jamming. That is, the algorithm can take many short steps without making significant progress toward the minimum. If the line search is exact, that is, $g_k^T d_{k-1} = 0$, the DY method reduces to the FR method. So, these two methods have the same disadvantage. The PRP and HS methods, which share the common numerator $g_k^T y_{k-1}$, possess a built-in restart feature that avoids the jamming problem: when the step $s_{k-1}$ is small, the factor $y_{k-1}$ in the numerator of $\beta_k$ tends to zero. Hence, the next search direction $d_k$ is essentially the steepest descent direction $-g_k$. So, the numerical performance of these methods is better than that of the methods with $\|g_k\|^2$ in the numerator of $\beta_k$.
As discussed above, great attention has been given to finding methods that not only have global convergence properties but also perform well numerically.
Recently, Wei et al. [7] proposed a new formula:
$$\beta_k^{WYL} = \frac{g_k^T \left( g_k - \frac{\|g_k\|}{\|g_{k-1}\|} g_{k-1} \right)}{\|g_{k-1}\|^2}. \tag{22}$$
The method with formula $\beta_k^{WYL}$ not only has good numerical results but also satisfies the sufficient descent condition and is globally convergent under the strong Wolfe-Powell line search. From the structure of $\beta_k^{WYL}$, we know that the method with $\beta_k^{WYL}$ can also avoid jamming: when the step $s_{k-1}$ is small, $\|g_k\| / \|g_{k-1}\|$ tends to 1 and the next search direction tends to the steepest descent direction, which is similar to the PRP method. But the WYL method has some advantages: under the strong Wolfe-Powell line search, $\beta_k^{WYL} \ge 0$, and, under a suitable restriction on the parameter $\sigma$ of the strong Wolfe-Powell line search, the WYL method satisfies the sufficient descent condition, which implies the global convergence of the method.
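In [8, 9], Shengwei et al. extended this modification to the HS method as follows:
$$\beta_k^{MHS} = \frac{g_k^T \left( g_k - \frac{\|g_k\|}{\|g_{k-1}\|} g_{k-1} \right)}{d_{k-1}^T y_{k-1}}. \tag{23}$$
The formulae $\beta_k^{WYL}$ and $\beta_k^{MHS}$ can be considered as modified forms of $\beta_k^{PRP}$ and $\beta_k^{HS}$, respectively, obtained by replacing $g_{k-1}$ with $(\|g_k\| / \|g_{k-1}\|) g_{k-1}$. In [8, 9], the corresponding methods are proved to be globally convergent for general functions under the strong Wolfe-Powell line search and the Grippo-Lucidi line search. Based on the same approach, some authors give other discussions and modifications in [10–12]. In fact, the scaling factor $\|g_k\| / \|g_{k-1}\|$ was not our original point; our purpose was to bring in information about the angle between $g_k$ and $g_{k-1}$. From this point of view, $\beta_k^{WYL}$ has the following form:
$$\beta_k^{WYL} = \frac{\|g_k\|^2 (1 - \cos\theta_k)}{\|g_{k-1}\|^2} = \beta_k^{FR} (1 - \cos\theta_k), \tag{24}$$
where $\theta_k$ is the angle between $g_k$ and $g_{k-1}$. By multiplying $\beta_k^{FR}$ by $1 - \cos\theta_k$, the WYL method not only has convergence properties similar to those of the FR method but also avoids jamming, as the PRP method does.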
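The angle interpretation of the Wei-Yao-Liu parameter can be verified with a few lines of Python. The sketch below assumes the commonly cited form $\beta^{WYL} = g_k^T (g_k - (\|g_k\| / \|g_{k-1}\|) g_{k-1}) / \|g_{k-1}\|^2$ and compares it with the Fletcher-Reeves value scaled by $1 - \cos\theta_k$; the helper names are hypothetical.

```python
import math


def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def beta_wyl(g_new, g_old):
    """WYL parameter in its vector form (assumed standard form)."""
    ratio = math.sqrt(dot(g_new, g_new) / dot(g_old, g_old))
    shifted = [a - ratio * b for a, b in zip(g_new, g_old)]
    return dot(g_new, shifted) / dot(g_old, g_old)


def beta_fr_times_angle(g_new, g_old):
    """Same quantity written as beta_FR * (1 - cos(theta))."""
    n_new = math.sqrt(dot(g_new, g_new))
    n_old = math.sqrt(dot(g_old, g_old))
    cos_theta = dot(g_new, g_old) / (n_new * n_old)
    return (n_new ** 2 / n_old ** 2) * (1.0 - cos_theta)
```

Because $1 - \cos\theta_k$ lies in $[0, 2]$, both expressions are nonnegative and bounded by twice the Fletcher-Reeves value, and they vanish when the two gradients point in the same direction.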
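The above analysis motivates us to propose the following formula to compute $\beta_k$:
$$\beta_k = \max \left\{ \frac{g_k^T \left( g_k - \frac{\|g_k\|}{\|g_{k-1}\|} g_{k-1} \right)}{d_{k-1}^T y_{k-1}}, 0 \right\} - t \frac{g_k^T s_{k-1}}{d_{k-1}^T y_{k-1}}, \tag{25}$$
where $t \ge 0$. Since the $\beta_k^{MHS}$ are nonnegative under the strong Wolfe-Powell line search, we omit the nonnegative restriction and propose the following formula:
$$\beta_k^{MDL} = \frac{\|g_k\|^2 - \frac{\|g_k\|}{\|g_{k-1}\|} g_k^T g_{k-1} - t g_k^T s_{k-1}}{d_{k-1}^T y_{k-1}}. \tag{26}$$
From (25) and (26), we see that we only substitute $g_k^T y_{k-1}$ in the first part of the numerator of $\beta_k^{DL}$ by $g_k^T \left( g_k - \frac{\|g_k\|}{\|g_{k-1}\|} g_{k-1} \right)$. The reason is that we want the formulae (25) and (26) to contain information about the angle between $g_k$ and $g_{k-1}$. In fact, $\beta_k^{MDL}$ can be expressed as
$$\beta_k^{MDL} = \frac{\|g_k\|^2 (1 - \cos\theta_k)}{d_{k-1}^T y_{k-1}} - t \frac{g_k^T s_{k-1}}{d_{k-1}^T y_{k-1}}. \tag{27}$$
For simplicity, we call the method generated by (2), (3), and (26) the MDL method and give the algorithm as follows.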
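Algorithm 1 (MDL). Step 1. Given $x_1 \in \mathbb{R}^n$ and $\varepsilon \ge 0$, set $d_1 = -g_1$ and $k = 1$; if $\|g_1\| \le \varepsilon$, then stop.
Step 2. Compute $\alpha_k$ by some line search.
Step 3. Let $x_{k+1} = x_k + \alpha_k d_k$, and let $g_{k+1} = g(x_{k+1})$; if $\|g_{k+1}\| \le \varepsilon$, then stop.
Step 4. Compute $\beta_{k+1}$ by (26) and generate $d_{k+1}$ by (3).
Step 5. Set $k = k + 1$ and go to Step 2.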
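The five steps of Algorithm 1 can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the objective is restricted to a strictly convex quadratic $f(x) = \frac{1}{2} x^T H x + b^T x$ so that Step 2 can use the exact step length (which satisfies the strong Wolfe-Powell conditions for such a function), and the parameter is assumed to have the form $\beta = (\|g_k\|^2 - (\|g_k\| / \|g_{k-1}\|) g_k^T g_{k-1} - t g_k^T s_{k-1}) / d_{k-1}^T y_{k-1}$. The names `beta_mdl` and `mdl_quadratic` and the defaults for `t`, `eps`, and `max_iter` are hypothetical.

```python
import math


def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, v) for row in A]


def beta_mdl(g_new, g_old, d_old, s_old, t):
    """Assumed MDL parameter (see the lead-in for the formula used)."""
    y = [a - b for a, b in zip(g_new, g_old)]
    ratio = math.sqrt(dot(g_new, g_new) / dot(g_old, g_old))
    first = dot(g_new, g_new) - ratio * dot(g_new, g_old)
    return (first - t * dot(g_new, s_old)) / dot(d_old, y)


def mdl_quadratic(H, b, x, t=0.1, eps=1e-8, max_iter=100):
    """Run the CG iteration on f(x) = 0.5 x^T H x + b^T x.

    Returns the final iterate and the number of line searches performed.
    """
    def grad(z):
        return [hz + bi for hz, bi in zip(matvec(H, z), b)]

    g = grad(x)
    d = [-gi for gi in g]                              # Step 1
    for k in range(1, max_iter + 1):
        if math.sqrt(dot(g, g)) <= eps:
            return x, k - 1
        alpha = -dot(g, d) / dot(d, matvec(H, d))      # Step 2 (exact step)
        s = [alpha * di for di in d]
        x = [xi + si for xi, si in zip(x, s)]          # Step 3
        g_new = grad(x)
        if math.sqrt(dot(g_new, g_new)) <= eps:
            return x, k
        beta = beta_mdl(g_new, g, d, s, t)             # Step 4
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        g = g_new                                      # Step 5
    return x, max_iter
```

For example, `mdl_quadratic([[4.0, 1.0], [1.0, 3.0]], [-1.0, -2.0], [0.0, 0.0])` approaches the solution of $Hx = -b$, namely $(1/11, 7/11)$. With an exact line search the term $g_k^T s_{k-1}$ vanishes, so the value of `t` has no effect in this special case; a general objective would require an inexact strong Wolfe-Powell line search in Step 2.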
We make the following basic assumptions on the objective functions.
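Assumption A. (i) The level set $\Omega = \{x \in \mathbb{R}^n : f(x) \le f(x_1)\}$ is bounded; namely, there exists a constant $B > 0$ such that
$$\|x\| \le B \quad \text{for all } x \in \Omega. \tag{28}$$
(ii) In some neighborhood $N$ of $\Omega$, $f$ is continuously differentiable, and its gradient is Lipschitz continuous; namely, there exists a constant $L > 0$ such that
$$\|g(x) - g(y)\| \le L \|x - y\| \quad \text{for all } x, y \in N. \tag{29}$$
Under the above assumptions on $f$, there exists a constant $\bar{\gamma} \ge 0$ such that
$$\|g(x)\| \le \bar{\gamma} \quad \text{for all } x \in \Omega. \tag{30}$$
The step length $\alpha_k$ in Algorithm 1 (MDL) is obtained by some line search scheme. In conjugate gradient methods, the strong Wolfe-Powell conditions, namely,
$$f(x_k + \alpha_k d_k) \le f(x_k) + \delta \alpha_k g_k^T d_k, \tag{31}$$
$$|g(x_k + \alpha_k d_k)^T d_k| \le -\sigma g_k^T d_k, \tag{32}$$
where $0 < \delta < \sigma < 1$, are often imposed on the line search (SWP).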
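To make the two strong Wolfe-Powell conditions concrete, the following Python sketch checks whether a given trial step satisfies them: the sufficient decrease condition $f(x + \alpha d) \le f(x) + \delta \alpha g^T d$ and the strong curvature condition $|g(x + \alpha d)^T d| \le -\sigma g^T d$ with $0 < \delta < \sigma < 1$. The function name `satisfies_swp` and the default values `delta=1e-4` and `sigma=0.1` are illustrative assumptions, not values taken from the paper.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def satisfies_swp(f, grad, x, d, alpha, delta=1e-4, sigma=0.1):
    """Return True if step length alpha satisfies both SWP conditions.

    f, grad : callables returning the objective value and gradient (a list)
    x, d    : current point and a descent direction (g^T d < 0)
    """
    if not 0.0 < delta < sigma < 1.0:
        raise ValueError("need 0 < delta < sigma < 1")
    gd = dot(grad(x), d)                       # directional derivative at x
    x_new = [xi + alpha * di for xi, di in zip(x, d)]
    sufficient_decrease = f(x_new) <= f(x) + delta * alpha * gd
    strong_curvature = abs(dot(grad(x_new), d)) <= -sigma * gd
    return sufficient_decrease and strong_curvature
```

For $f(x) = x_1^2 + x_2^2$ at $x = (1, 0)$ with $d = (-1, 0)$, the step $\alpha = 1$ lands on the minimizer and passes both tests, $\alpha = 0.5$ passes the decrease test but fails the curvature test, and $\alpha = 2$ fails the decrease test. A practical line search would search for an acceptable $\alpha$ by bracketing and interpolation rather than only testing a candidate.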
3. Convergence Analysis
Lemma 2. Suppose that Assumption A holds. Consider any conjugate gradient method in the form (2)-(3), where $d_k$ is a descent direction and $\alpha_k$ is obtained by the strong Wolfe-Powell line search. If
$$\sum_{k \ge 1} \frac{1}{\|d_k\|^2} = \infty, \tag{33}$$
one has that
$$\liminf_{k \to \infty} \|g_k\| = 0. \tag{34}$$
If the objective function is uniformly convex, we can prove that the norm of $d_k$ generated by Algorithm 1 (MDL) is bounded above. Thus, by Lemma 2 one immediately has the following result.
Theorem 3. Suppose that Assumption A holds. Consider the MDL method, where $d_k$ is a descent direction and $\alpha_k$ is obtained by the strong Wolfe-Powell line search. If the objective function is uniformly convex, namely, there exists a constant $\mu > 0$ such that
$$(\nabla f(x) - \nabla f(y))^T (x - y) \ge \mu \|x - y\|^2 \quad \text{for all } x, y, \tag{35}$$
one has that
$$\lim_{k \to \infty} g_k = 0. \tag{36}$$
Proof. It follows from (35) that By (3), (26), (29), (30), and (37), we have which implies the truth of (33). Therefore, by Lemma 2 we have (34), which is equivalent to (36) for uniformly convex functions. The proof is completed.
In order to prove the convergence of the MDL method, we need to state some properties of $\beta_k^{MDL}$.
Proof. By condition (32), we have , since and . So we have The proof is completed.
In addition, we can also prove that, in a conjugate gradient method of the form (2)-(3), if $\beta_k$ is computed by (26) and $\alpha_k$ is determined by the strong Wolfe-Powell line search, then the search direction $d_k$ satisfies the sufficient descent condition (21).
Theorem 5. In any conjugate gradient method in which the parameter $\beta_k$ is computed by (26), namely, $\beta_k = \beta_k^{MDL}$, and $\alpha_k$ is determined by the strong Wolfe-Powell line search (31) and (32), if , then the search direction $d_k$ satisfies the sufficient descent condition (21).
Proof. We prove this theorem by induction. Firstly, we prove the descent condition as follows.
Since $g_1^T d_1 = -\|g_1\|^2 < 0$, suppose that $g_{k-1}^T d_{k-1} < 0$ holds for $k - 1$; we deduce that the descent condition holds by proving that $g_k^T d_k < 0$ holds for $k$, as follows.
By condition (32), we have . Combining (3) and (26), we have Equation (41) means that descent condition holds.
Secondly, we prove the following sufficient descent condition.
Set ; since the restriction , we have . Combining and (41), the sufficient descent condition (21) holds immediately.
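Lemma 6. Suppose that Assumption A holds. Consider the MDL method, where $\alpha_k$ is obtained by the strong Wolfe-Powell line search with . If there exists a constant $\gamma > 0$ such that
$$\|g_k\| \ge \gamma \quad \text{for all } k \ge 1, \tag{42}$$
then $d_k \ne 0$ and
$$\sum_{k \ge 2} \|u_k - u_{k-1}\|^2 < \infty, \tag{43}$$
where $u_k = d_k / \|d_k\|$.
Proof. Firstly, note that $d_k \ne 0$; otherwise, (21) is false. Therefore, $u_k$ is well defined. In addition, by relation (42) and Lemma 2, we have
$$\sum_{k \ge 1} \frac{1}{\|d_k\|^2} < \infty. \tag{44}$$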
Now, we divide formula into two parts as follows:
Then by (3) we have for all , Using the identity and (47) we can obtain using the condition , the triangle inequality, and (48), it follows that On the other hand, the line search condition (32) gives Equations (50), (32), and (21) imply that It follows from the definition of , (51), (28), and (30) that So, we have The proof is completed.
Gilbert and Nocedal  introduced property (*) which is very important for the convergence properties of the conjugate gradient methods. We are going to show that method with possesses such property (*).
In fact, by (50), (21), and (42), we have Using this, (28), (29), and (30) we obtain Note that can be defined such that . Therefore, we can say . As a result, we define we get from the first inequality in (56) that if , then
Let denote the set of positive integers. For and a positive integer , denote Let denote the number of elements in . From the previous property (*), we can prove the following lemma.
Lemma 7. Suppose that Assumption A holds. Consider method, where is obtained by the strong Wolfe-Powell line search in which . Then if (42) holds, there exists such that, for any and any index , there is an index such that
The proof of this lemma is similar to the proof of Lemma 6 in [4]. In [4], the authors proved that the DL+ method with (19) has this property if the search direction satisfies the sufficient descent condition (21). In our paper, we do not need this assumption, since the directions generated by the MDL method with the strong Wolfe-Powell line search always satisfy the sufficient descent condition (21). So, we omit the proof of this lemma.
According to the previous lemmas and theorems, we can prove the following convergence theorem for the MDL method.
Theorem 8. Suppose that Assumption A holds. Consider the MDL method, where $\alpha_k$ is obtained by the strong Wolfe-Powell line search with . Then we have $\liminf_{k \to \infty} \|g_k\| = 0$.
Proof. We proceed by contradiction. If , then (42) must hold. Then the conditions of Lemmas 6 and 7 hold. Defining , we have for any indices , , with ,
Equation (61), , and (28) give
Let be given by Lemma 7, and define to be the smallest integer not less than . By Lemma 6, we can find an index such that With this and , Lemma 7 gives an index such that For any index , by the Cauchy-Schwarz inequality and (63), From these relations (65) and (64) and taking in (62), we get Thus, , which contradicts the definition of . The proof is completed.
4. Numerical Results
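The formula $\beta_k^{MDL}$ can be written in the following three equivalent forms. Form 1:
$$\beta_k^{MDL} = \frac{g_k^T \left( g_k - \frac{\|g_k\|}{\|g_{k-1}\|} g_{k-1} - t s_{k-1} \right)}{d_{k-1}^T y_{k-1}}.$$
Form 2:
$$\beta_k^{MDL} = \beta_k^{MHS} - t \frac{g_k^T s_{k-1}}{d_{k-1}^T y_{k-1}}.$$
Form 3:
$$\beta_k^{MDL} = \beta_k^{DY} (1 - \cos\theta_k) - t \frac{g_k^T s_{k-1}}{d_{k-1}^T y_{k-1}}.$$
In form 1, the $y_{k-1}$ in the numerator of $\beta_k^{DL}$ is replaced by $g_k - \frac{\|g_k\|}{\|g_{k-1}\|} g_{k-1}$. By this modification, we can drop the nonnegativity restriction used in the DL+ method. In form 2, $\beta_k^{MDL}$ is obtained from $\beta_k^{MHS}$ by adding an adjusting term which contains some Hessian information of the objective function. Form 3 shows that $\beta_k^{MDL}$ is obtained by multiplying $\beta_k^{DY}$ by $1 - \cos\theta_k$ and adding the second term $-t g_k^T s_{k-1} / d_{k-1}^T y_{k-1}$.
From the above convergence analysis, we know that the MDL method has stronger convergence properties than the DL+ method and convergence properties similar to those of the MHS and DY methods. So, in this section, we test the following four CG methods:
(i) MDL method: the method of the form (2) and (3), in which $\beta_k$ is computed by (26);
(ii) DL+ method: the method of the form (2) and (3), in which $\beta_k$ is computed by (19);
(iii) MHS method: the method of the form (2) and (3), in which $\beta_k$ is computed by (23);
(iv) DY method: the method of the form (2) and (3), in which $\beta_k$ is computed by (9).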
The column problem represents the problem name in , Dim represents the dimension of the problems. The numerical results are given in the form of , where , , and denote the numbers of iterations, function evaluations and gradient evaluations, respectively. The stopping condition is . Since we want to compare the performance of the different methods, in the numerical results, we omit the problems if all the four methods perform equally. The notation means that, for this problem, the corresponding method fails.
5. Conclusions
In this paper, building on the formulas discussed in Section 2, a new formula is proposed to compute the parameter $\beta_k$ of the conjugate gradient method. The main motivation is to improve both the convergence properties and the numerical behavior of the conjugate gradient method. For general conjugate gradient methods, in order to obtain global convergence results, the methods are required to possess the following major properties: (1) the generated directions $d_k$ are descent directions; (2) the parameters $\beta_k$ are nonnegative.
In addition, to ensure that the methods have robust and efficient numerical behavior, the parameter $\beta_k$ needs to approach zero when a small step occurs.
From the convergence analysis of this paper, we know that the directions generated by the MDL method are descent directions, which is not true for the DL or DL+ methods, and that the proposed method is globally convergent for general functions. In the previous section, we compared the numerical performance of the MDL method with that of the DL+, MHS, and DY methods. From the convergence analysis and the numerical results, we can conclude the following.
(a) MDL method versus DL+ method: from the computational point of view, for most of the test problems, the MDL method performs quite similarly to the DL+ method. There are 15 problems in which the MDL method outperforms the DL+ method and 18 problems in which the DL+ method outperforms the MDL method. But, from the convergence point of view, the MDL method outperforms the DL+ method.
(b) MDL method versus MHS method: the convergence properties of the MDL method are similar to those of the MHS method. Comparing the numerical results of the two methods, there are 27 test problems in which the MDL method outperforms the MHS method and only 4 test problems in which the MHS method outperforms the MDL method. Therefore, we could say that the MDL method is much better than the MHS method in numerical behavior.
(c) MDL method versus DY method: they possess similar convergence properties; the numerical results show that the MDL method performs slightly better than the DY method.
Acknowledgments
This research was supported by Guangxi High School Foundation Grant no. 2013BYB210 and Guangxi University of Finance and Economics Science Foundation Grant no. 2013A015.
References
[1] G. Zoutendijk, “Nonlinear programming, computational methods,” in Integer and Nonlinear Programming, J. Abadie, Ed., pp. 37–86, North-Holland Publishing, Amsterdam, 1970.
[2] M. J. D. Powell, “Restart procedures for the conjugate gradient method,” Mathematical Programming, vol. 12, no. 2, pp. 241–254, 1977.
[3] M. J. D. Powell, “Nonconvex minimization calculations and the conjugate gradient method,” in Numerical Analysis, vol. 1066 of Lecture Notes in Mathematics, pp. 122–141, Springer, Berlin, Germany, 1984.
[4] Y.-H. Dai and L.-Z. Liao, “New conjugacy conditions and related nonlinear conjugate gradient methods,” Applied Mathematics and Optimization, vol. 43, no. 1, pp. 87–101, 2001.
[5] J. C. Gilbert and J. Nocedal, “Global convergence properties of conjugate gradient methods for optimization,” SIAM Journal on Optimization, vol. 2, no. 1, pp. 21–42, 1992.
[6] G. Li, C. Tang, and Z. Wei, “New conjugacy condition and related new conjugate gradient methods for unconstrained optimization,” Journal of Computational and Applied Mathematics, vol. 202, no. 2, pp. 523–539, 2007.
[7] Z. Wei, S. Yao, and L. Liu, “The convergence properties of some new conjugate gradient methods,” Applied Mathematics and Computation, vol. 183, no. 2, pp. 1341–1350, 2006.
[8] Y. Shengwei, Z. Wei, and H. Huang, “A note about WYL's conjugate gradient method and its applications,” Applied Mathematics and Computation, vol. 191, no. 2, pp. 381–388, 2007.
[9] H. Huang, S. Yao, and H. Lin, “A new conjugate gradient method based on HS-DY methods,” Journal of Guangxi University of Technology, no. 4, pp. 63–66, 2008.
[10] L. Zhang, “An improved Wei-Yao-Liu nonlinear conjugate gradient method for optimization computation,” Applied Mathematics and Computation, vol. 215, no. 6, pp. 2269–2274, 2009.
[11] L. Zhang, “Further studies on the Wei-Yao-Liu nonlinear conjugate gradient method,” Applied Mathematics and Computation, vol. 219, no. 14, pp. 7616–7621, 2013.
[12] Z. Dai and F. Wen, “Another improved Wei-Yao-Liu nonlinear conjugate gradient method with sufficient descent property,” Applied Mathematics and Computation, vol. 218, no. 14, pp. 7421–7430, 2012.
[13] Y. Dai, J. Han, G. Liu, D. Sun, H. Yin, and Y.-X. Yuan, “Convergence properties of nonlinear conjugate gradient methods,” SIAM Journal on Optimization, vol. 10, no. 2, pp. 345–358, 1999.
[14] J. J. Moré, B. S. Garbow, and K. E. Hillstrom, “Testing unconstrained optimization software,” ACM Transactions on Mathematical Software, vol. 7, no. 1, pp. 17–41, 1981.
Copyright © 2013 Shengwei Yao et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.