A Hybrid of DL and WYL Nonlinear Conjugate Gradient Methods
The conjugate gradient method is an efficient method for solving large-scale nonlinear optimization problems. In this paper, we propose a nonlinear conjugate gradient method which can be considered as a hybrid of DL and WYL conjugate gradient methods. The given method possesses the sufficient descent condition under the Wolfe-Powell line search and is globally convergent for general functions. Our numerical results show that the proposed method is very robust and efficient for the test problems.
The nonlinear conjugate gradient (CG) method has played a special role in solving large-scale nonlinear optimization problems due to the simplicity of its iterations and its very low memory requirements. In fact, the CG method is not among the fastest or most robust optimization algorithms for nonlinear problems available today, but it remains very popular with engineers and mathematicians who are interested in solving large-scale problems. As we know, the nonlinear conjugate gradient method is an extension of the linear conjugate gradient method. The first linear conjugate gradient method was proposed by Hestenes and Stiefel in 1952 for solving linear equations. In 1964, Fletcher and Reeves extended it to nonlinear problems and obtained the first nonlinear conjugate gradient method (FR method).
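In this paper, we focus on solving the following nonlinear unconstrained optimization problem by the conjugate gradient method: $\min \{ f(x) : x \in \mathbb{R}^n \}$ (1), where $f: \mathbb{R}^n \to \mathbb{R}$ is a smooth, nonlinear function whose gradient will be denoted by $g(x)$. The iterative formula of the conjugate gradient method is given by $x_{k+1} = x_k + \alpha_k d_k$ (2), where $d_k$ is the search direction at $x_k$ and $\alpha_k$ is the step-length. For the nonlinear conjugate gradient method, $d_k$ is computed by $d_k = -g_k$ for $k = 1$ and $d_k = -g_k + \beta_k d_{k-1}$ for $k \ge 2$ (3), where $\beta_k$ is a scalar and $g_k = g(x_k)$ denotes the gradient. Different conjugate gradient methods correspond to different ways of computing $\beta_k$. Some well-known formulae for $\beta_k$ are the Fletcher-Reeves (FR), Polak-Ribière (PR), Hestenes-Stiefel (HS), Dai-Yuan (DY), and CG-DESCENT formulae, which are given, respectively, by $\beta_k^{FR} = \frac{\|g_k\|^2}{\|g_{k-1}\|^2}$ (see ), $\beta_k^{PR} = \frac{g_k^T y_{k-1}}{\|g_{k-1}\|^2}$ (see ), $\beta_k^{HS} = \frac{g_k^T y_{k-1}}{d_{k-1}^T y_{k-1}}$ (see ), $\beta_k^{DY} = \frac{\|g_k\|^2}{d_{k-1}^T y_{k-1}}$ (see ), and $\beta_k^{N} = \left( y_{k-1} - 2 d_{k-1} \frac{\|y_{k-1}\|^2}{d_{k-1}^T y_{k-1}} \right)^T \frac{g_k}{d_{k-1}^T y_{k-1}}$ (see ), where $y_{k-1} = g_k - g_{k-1}$ denotes the gradient change.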
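To make the difference between these choices concrete, the following minimal sketch evaluates the FR, PR, HS, and DY values of $\beta_k$ from $g_k$, $g_{k-1}$, and $d_{k-1}$. The function name `classic_betas` and the helper `dot` are illustrative names, not taken from the paper, and the formulae are assumed to be the standard textbook forms. The sample data come from one exact line search step on the quadratic $f(x) = \frac{1}{2}(x_1^2 + 4 x_2^2)$ started at $(1, 1)$, where all four formulae should coincide.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def classic_betas(g_new, g_old, d_old):
    """Return the FR, PR, HS and DY values of beta_k.

    g_new = g_k, g_old = g_{k-1}, d_old = d_{k-1} (plain Python lists).
    """
    y = [a - b for a, b in zip(g_new, g_old)]  # y_{k-1} = g_k - g_{k-1}
    dy = dot(d_old, y)                         # d_{k-1}^T y_{k-1}
    gg_old = dot(g_old, g_old)                 # ||g_{k-1}||^2
    return {
        "FR": dot(g_new, g_new) / gg_old,
        "PR": dot(g_new, y) / gg_old,
        "HS": dot(g_new, y) / dy,
        "DY": dot(g_new, g_new) / dy,
    }


# One exact line search step on f(x) = 0.5 * (x1^2 + 4 * x2^2) from (1, 1):
# g_1 = (1, 4), d_1 = -g_1, alpha_1 = 17/65, hence g_2 = (48/65, -12/65).
betas = classic_betas([48 / 65, -12 / 65], [1.0, 4.0], [-1.0, -4.0])
print(betas)
```

Because $g_2^T g_1 = 0$ and $d_1^T y_1 = \|g_1\|^2$ in this example, the four values agree, which illustrates the equivalence of the methods in the linear case with exact line search.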
Although all these methods are equivalent in the linear case, namely, when $f$ is a strictly convex quadratic function and the step-lengths $\alpha_k$ are determined by exact line search, their behaviors for general objective functions may differ considerably. For general functions, Zoutendijk proved the global convergence of the FR method with exact line search. (Here and throughout this paper, by global convergence we mean that the sequence generated by the corresponding method will either terminate after finitely many steps or contain a subsequence that converges to a stationary point of the objective function from a given initial point.) Although one would be satisfied with its global convergence properties, the FR method performs much worse than the PR (HS) method in real computations. Powell analyzed a major numerical drawback of the FR method, namely, if a small step is generated away from the solution point, the subsequent steps may also be very short. On the other hand, in practical computation, the HS method resembles the PR method, and both methods are generally believed to be the most efficient conjugate gradient methods, since these two methods essentially perform a restart if a bad direction occurs. However, Powell constructed a counterexample and showed that the PR method and the HS method can cycle infinitely without approaching the solution. This example suggests that these two methods have the drawback that they are not globally convergent for general functions. Similar counterexamples were also constructed by Dai and Yuan. Therefore, over the past few years, much effort has been devoted to finding new formulae for conjugate gradient methods that are not only globally convergent for general functions but also have good numerical performance.
From the structure of the above formulae for $\beta_k$, we know that $\beta_k^{FR}$ and $\beta_k^{DY}$ have the common numerator $\|g_k\|^2$. They are globally convergent if the objective function is Lipschitz continuous and the level set is bounded. For inexact line search, Al-Baali proved the global convergence of the FR method under the strong Wolfe-Powell line search with the restriction $\sigma < 1/2$. Based on Al-Baali's result, Liu et al. extended the global convergence of the FR method to the case $\sigma = 1/2$. Dai and Yuan proved that the sufficient descent condition must hold for at least one of two consecutive search directions, and established the global convergence of the FR method with general Wolfe line searches.
Since $\beta_k^{PR}$ and $\beta_k^{HS}$ share the common numerator $g_k^T y_{k-1}$, they possess a built-in restart feature that avoids the jamming problem as follows: when the step $s_{k-1} = x_k - x_{k-1}$ is small, the factor $y_{k-1}$ in the numerator of $\beta_k$ tends to zero. Hence, the next search direction $d_k$ is essentially the steepest descent direction $-g_k$. So, the numerical performance of these methods is better than the performance of the methods with $\|g_k\|^2$ in the numerator of $\beta_k$. In , Polak and Ribière proved that if the objective function is strongly convex and the line search is exact, the PR method is globally convergent. For general functions, Powell [7, 8] analyzed the convergence properties of the PR method and constructed an example which shows that the PR method may cycle infinitely between nonstationary points. To get global convergence, Gilbert and Nocedal imposed the following nonnegative restriction on $\beta_k^{PR}$: $\beta_k^{PR+} = \max\{\beta_k^{PR}, 0\}$.
Generally speaking, methods with numerator $\|g_k\|^2$ possess better convergence properties than methods with numerator $g_k^T y_{k-1}$. But from the point of view of numerical performance, methods with numerator $g_k^T y_{k-1}$ outperform methods with numerator $\|g_k\|^2$. So, in the past decades a lot of effort has been made to find a method that has both nice convergence properties and efficient numerical performance. In , the authors proposed a new conjugacy condition which made use of not only gradient values but also function values. Based on this conjugacy condition, a class of nonlinear conjugate gradient methods was proposed. The PR method outperforms many methods in numerical experiments, but it does not possess the sufficient descent condition. So, some modified forms of the PR method have been studied in [15, 16]; these methods possess the sufficient descent condition and are globally convergent for general functions.
2. Motivations and the New Formula
Since the PR method is considered as one of the most efficient nonlinear conjugate gradient methods, a lot of effort has been made on its convergence properties and its modifications. In , with the sufficient descent assumption, Gilbert and Nocedal proved the global convergence of PR+ method under the Wolfe line search. Grippo and Lucidi  constructed an Armijo-type line search and proved that under this line search, directions generated by PR method satisfy the sufficient descent condition.
The method with formula $\beta_k^{WYL} = \frac{g_k^T \left( g_k - \frac{\|g_k\|}{\|g_{k-1}\|} g_{k-1} \right)}{\|g_{k-1}\|^2}$ not only has nice numerical results but also possesses the sufficient descent condition and global convergence properties under the strong Wolfe-Powell line search. From the structure of $\beta_k^{WYL}$, we know that the method with $\beta_k^{WYL}$ can also avoid jamming: when the step is small, $\|g_k\| / \|g_{k-1}\|$ tends to 1 and the next search direction tends to the steepest descent direction, which is similar to the PR method. But the WYL method has some advantages: under the strong Wolfe-Powell (SWP) line search, $\beta_k^{WYL} \ge 0$, and if the parameter $\sigma < 1/4$ in SWP, the WYL method possesses the sufficient descent condition, from which the global convergence of the WYL method is deduced.
In [20, 21], Shengwei et al. and Huang et al. extended this modification to other conjugate gradient formulae, including the HS method. The resulting formulae can be considered as modified forms of the original ones, obtained by using $g_k - \frac{\|g_k\|}{\|g_{k-1}\|} g_{k-1}$ to replace $y_{k-1}$. In [20, 21], the corresponding methods are proved to be globally convergent for general functions under the strong Wolfe-Powell line search and the Grippo-Lucidi line search. Based on the same approach, some authors gave further discussions and modifications in [22–24]. In fact, this replacement was not our original aim; our purpose was to involve the information of the angle between $g_k$ and $g_{k-1}$. From this point of view, $\beta_k^{WYL}$ has the following form: $\beta_k^{WYL} = \beta_k^{FR} (1 - \cos\theta_k)$, where $\theta_k$ is the angle between $g_k$ and $g_{k-1}$. By multiplying $\beta_k^{FR}$ with $1 - \cos\theta_k$, the method not only has convergence properties similar to those of the FR method, but also avoids jamming in the same way as the PR method.
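The angle form can be checked in a few lines. The derivation below is a sketch that assumes the standard Wei-Yao-Liu numerator $g_k^T ( g_k - \frac{\|g_k\|}{\|g_{k-1}\|} g_{k-1} )$ and the usual definition $\cos\theta_k = \frac{g_k^T g_{k-1}}{\|g_k\| \|g_{k-1}\|}$; neither is displayed in the surviving text, so both should be read as assumptions.

```latex
\begin{aligned}
\beta_k^{WYL}
  &= \frac{\|g_k\|^2 - \dfrac{\|g_k\|}{\|g_{k-1}\|}\, g_k^T g_{k-1}}{\|g_{k-1}\|^2}
   = \frac{\|g_k\|^2 - \dfrac{\|g_k\|}{\|g_{k-1}\|}\, \|g_k\|\,\|g_{k-1}\| \cos\theta_k}{\|g_{k-1}\|^2} \\
  &= \frac{\|g_k\|^2}{\|g_{k-1}\|^2}\,(1 - \cos\theta_k)
   = \beta_k^{FR}\,(1 - \cos\theta_k).
\end{aligned}
```

Since $-1 \le \cos\theta_k \le 1$, this gives $0 \le \beta_k^{WYL} \le 2 \beta_k^{FR}$, which is the kind of bound that FR-type convergence arguments rely on. When the step is small, $g_k \approx g_{k-1}$, so $\cos\theta_k \approx 1$ and $\beta_k^{WYL} \approx 0$, which is the restart behavior described above.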
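Recently, Dai and Liao proposed a new conjugacy condition which is based on quasi-Newton techniques. According to the new conjugacy condition, the following formula is given: $\beta_k^{DL1} = \frac{g_k^T (y_{k-1} - t s_{k-1})}{d_{k-1}^T y_{k-1}}$ (13), where $t \ge 0$ and $s_{k-1} = x_k - x_{k-1}$; for simplicity, we call the method with (13) the DL1 method. It is obvious that $\beta_k^{DL1} = \beta_k^{HS} - t \frac{g_k^T s_{k-1}}{d_{k-1}^T y_{k-1}}$ (14). In , it is shown that, if the line search is exact, the DL1 method has the same convergence properties as the PR method, which indicates that the DL1 method does not converge for general functions. To get global convergence, Dai and Liao replaced (13) by $\beta_k^{DL} = \max\left\{ \frac{g_k^T y_{k-1}}{d_{k-1}^T y_{k-1}}, 0 \right\} - t \frac{g_k^T s_{k-1}}{d_{k-1}^T y_{k-1}}$ (15). The formula (15) can be considered as a modified form of $\max\{\beta_k^{HS}, 0\}$, obtained by adding the part $-t \frac{g_k^T s_{k-1}}{d_{k-1}^T y_{k-1}}$, which may contain some information of the Hessian $\nabla^2 f(x)$. From the convergence analysis in , the nonnegative restriction and the sufficient descent condition are significant for the global convergence results.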
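The claim that the DL1 method behaves like the PR method under exact line search can be made explicit. The short derivation below is a sketch that assumes the standard Dai-Liao form $\beta_k^{DL1} = g_k^T (y_{k-1} - t s_{k-1}) / (d_{k-1}^T y_{k-1})$ with $s_{k-1} = \alpha_{k-1} d_{k-1}$, and that every previous line search was exact, that is, $g_j^T d_{j-1} = 0$ for all $j \le k$.

```latex
\begin{aligned}
g_k^T s_{k-1} &= \alpha_{k-1}\, g_k^T d_{k-1} = 0
  &&\Longrightarrow\quad \beta_k^{DL1} = \beta_k^{HS}, \\
d_{k-1}^T y_{k-1} &= d_{k-1}^T g_k - d_{k-1}^T g_{k-1}
  = -\left(-g_{k-1} + \beta_{k-1} d_{k-2}\right)^T g_{k-1}
  = \|g_{k-1}\|^2
  &&\Longrightarrow\quad \beta_k^{HS} = \beta_k^{PR}.
\end{aligned}
```

So with exact line searches the term weighted by $t$ vanishes and DL1 reduces to HS and then to PR, which is why it inherits the PR method's lack of global convergence for general functions.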
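Motivated by the above discussion, in this paper, we give the following formula to compute the parameter $\beta_k$: $\beta_k^{WYLDL} = \frac{g_k^T \left( g_k - \frac{\|g_k\|}{\|g_{k-1}\|} g_{k-1} \right)}{\|g_{k-1}\|^2} - t \frac{g_k^T s_{k-1}}{d_{k-1}^T y_{k-1}}$ (16).
The formula (16) can be considered a modification of $\beta_k^{WYL}$: by adding the term $-t \frac{g_k^T s_{k-1}}{d_{k-1}^T y_{k-1}}$, $\beta_k^{WYLDL}$ may contain some information of the Hessian $\nabla^2 f(x)$. It can also be considered as a modified form of $\beta_k^{DL}$, obtained by substituting $\beta_k^{WYL}$ for $\max\left\{ \frac{g_k^T y_{k-1}}{d_{k-1}^T y_{k-1}}, 0 \right\}$. We call the method defined by (2), (3), and (16) the WYLDL method and give the corresponding algorithm as follows.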
Algorithm 1 (WYLDL).
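Step 1. Given $x_1 \in \mathbb{R}^n$ and $\varepsilon > 0$, set $d_1 = -g_1$, $k = 1$; if $\|g_1\| \le \varepsilon$, then stop;
Step 2. Compute $\alpha_k$ by the strong Wolfe-Powell line search;
Step 3. Let $x_{k+1} = x_k + \alpha_k d_k$, $g_{k+1} = g(x_{k+1})$; if $\|g_{k+1}\| \le \varepsilon$, then stop;
Step 4. Compute $\beta_{k+1}$ by (16) and generate $d_{k+1}$ by (3);
Step 5. Set $k = k + 1$, go to Step 2.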
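The steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' Fortran 77 code: the formula used for $\beta$ is the WYL term minus $t\, g_{k+1}^T s_k / (d_k^T y_k)$, the line search is a simple bracketing and bisection routine rather than the Lemarechal, Fletcher, or Moré-Thuente strategies mentioned in the numerical section, the values `delta=1e-4`, `sigma=0.1`, and `t=0.1` are placeholders, and the steepest descent restart is an added safeguard. All function names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def axpy(a, x, y):
    """Return a*x + y for plain Python lists."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def strong_wolfe(f, grad, x, d, delta=1e-4, sigma=0.1, max_iter=60):
    """Bracketing/bisection search for a step satisfying the strong Wolfe conditions."""
    phi0, dphi0 = f(x), dot(grad(x), d)
    lo, hi, a = 0.0, math.inf, 1.0
    for _ in range(max_iter):
        xa = axpy(a, d, x)
        dphi = dot(grad(xa), d)
        if f(xa) > phi0 + delta * a * dphi0:  # sufficient decrease fails: too long
            hi = a
        elif dphi < sigma * dphi0:            # slope still too negative: too short
            lo = a
        elif dphi > -sigma * dphi0:           # slope too positive: too long
            hi = a
        else:
            return a
        a = 2.0 * a if math.isinf(hi) else 0.5 * (lo + hi)
    return a  # best effort if the iteration cap is reached


def wyldl(f, grad, x, t=0.1, eps=1e-6, max_iter=1000):
    """Sketch of Algorithm 1; returns the final point and the iteration count."""
    g = grad(x)
    d = [-gi for gi in g]                      # Step 1: d_1 = -g_1
    for k in range(max_iter):
        if math.sqrt(dot(g, g)) <= eps:        # stopping test of Steps 1 and 3
            return x, k
        a = strong_wolfe(f, grad, x, d)        # Step 2
        x_new = axpy(a, d, x)                  # Step 3
        g_new = grad(x_new)
        s = [a * di for di in d]               # s_k = x_{k+1} - x_k
        y = [gn - go for gn, go in zip(g_new, g)]
        ratio = math.sqrt(dot(g_new, g_new) / dot(g, g))
        beta_wyl = (dot(g_new, g_new) - ratio * dot(g_new, g)) / dot(g, g)
        beta = beta_wyl - t * dot(g_new, s) / dot(d, y)   # Step 4
        d = axpy(beta, d, [-gi for gi in g_new])
        if dot(g_new, d) >= 0.0:               # safeguard (assumption): restart
            d = [-gi for gi in g_new]
        x, g = x_new, g_new                    # Step 5
    return x, max_iter


def f_quad(x):
    return 0.5 * (x[0] ** 2 + 2.0 * x[1] ** 2 + 3.0 * x[2] ** 2)


def g_quad(x):
    return [x[0], 2.0 * x[1], 3.0 * x[2]]


x_star, iters = wyldl(f_quad, g_quad, [1.0, 1.0, 1.0])
print(iters, x_star)
```

The denominator $d_k^T y_k$ is positive whenever the returned step satisfies the strong Wolfe curvature condition, because then $g_{k+1}^T d_k \ge \sigma g_k^T d_k > g_k^T d_k$. The small convex quadratic is used only as a smoke test and says nothing about performance on the paper's test set.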
3. Convergence Analysis
For conjugate gradient methods, during the iteration process, the gradient of the objective function is required. We make the following basic assumptions on the objective functions.
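Assumption 2. (i) The level set $\Omega = \{ x \in \mathbb{R}^n : f(x) \le f(x_1) \}$ is bounded, namely, there exists a constant $B > 0$ such that $\|x\| \le B$ for all $x \in \Omega$ (17).
(ii) In some neighborhood $N$ of $\Omega$, $f$ is continuously differentiable and its gradient is Lipschitz continuous, namely, there exists a constant $L > 0$ such that $\|g(x) - g(y)\| \le L \|x - y\|$ for all $x, y \in N$ (18).
Under the above assumptions on $f$, there exists a constant $\bar{\gamma} \ge 0$ such that $\|g(x)\| \le \bar{\gamma}$ for all $x \in \Omega$ (19). Exact Line Search. Suppose that $d_k$ is a descent direction and the step length $\alpha_k$ is the solution of $\min_{\alpha \ge 0} f(x_k + \alpha d_k)$ (20). For exact line search, $g_{k+1}^T d_k = 0$ (21). From (16) and (21), we can get that $\beta_k^{WYLDL} = \beta_k^{WYL}$. In , the authors proved that if the line search is exact, the method with $\beta_k^{WYL}$ is globally convergent for uniformly convex functions.
For conjugate gradient methods, the sufficient descent condition is significant for global convergence. We say the sufficient descent condition holds if there exists a constant $c > 0$ such that $g_k^T d_k \le -c \|g_k\|^2$ for all $k$ (22). In nonlinear optimization algorithms, the strong Wolfe-Powell conditions, namely, $f(x_k + \alpha_k d_k) \le f(x_k) + \delta \alpha_k g_k^T d_k$ (23) and $|g(x_k + \alpha_k d_k)^T d_k| \le \sigma |g_k^T d_k|$ (24), where $0 < \delta < \sigma < 1$, are often imposed on the line search.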
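As a small illustration of the two strong Wolfe-Powell conditions, the sketch below tests a trial step for a one-dimensional function $\varphi(\alpha) = f(x_k + \alpha d_k)$. The checker name and the values `delta=1e-4`, `sigma=0.1` are illustrative assumptions; the conditions are assumed to be the standard sufficient decrease test and the absolute-value curvature test.

```python
def satisfies_strong_wolfe(phi0, dphi0, phi_a, dphi_a, a, delta=1e-4, sigma=0.1):
    """Check the strong Wolfe-Powell conditions for a trial step a > 0.

    phi0, dphi0  : value and slope of phi at 0 (dphi0 < 0 for a descent direction)
    phi_a, dphi_a: value and slope of phi at the trial step a
    """
    sufficient_decrease = phi_a <= phi0 + delta * a * dphi0
    curvature = abs(dphi_a) <= sigma * abs(dphi0)
    return sufficient_decrease and curvature


def phi(a):
    return (a - 1.0) ** 2      # model line function, minimized at a = 1


def dphi(a):
    return 2.0 * (a - 1.0)


# Steps that are too short, acceptable, and too long, respectively.
results = [satisfies_strong_wolfe(phi(0.0), dphi(0.0), phi(a), dphi(a), a)
           for a in (0.1, 1.0, 2.0)]
print(results)
```

The step 0.1 fails the curvature test because the slope is still steep, the step 1.0 passes both tests, and the step 2.0 fails the sufficient decrease test because the function value has returned to its starting level.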
In , Dai and Liao proved that, if directions satisfy the sufficient descent condition (22), the DL method is globally convergent under the strong Wolfe-Powell line search for general functions. In this section, we will prove that the directions generated by Algorithm 1 satisfy the sufficient descent condition (22). Based on this result, the global convergence of Algorithm 1 will be established.
Lemma 3. Suppose that the sequence is generated by Algorithm 1, the step-length satisfy the strong Wolfe-Powell conditions (23) and (24), if ; then, the generated directions satisfy the sufficient descent condition (22).
Proof. We prove this result by induction. By using (3), we have , and combining this equation with (), we can deduce that By strong Wolfe-Powell condition (24), it follows that . Which means that From (25), the following inequality holds: (25) and Wolfe-Powell condition (24) deduce that namely, The repeating of (29) can deduce that Since (30) can be expressed as With the restrictions and , for , the inequality (32) means that
For the PR method, when a small step-length occurs, $\beta_k^{PR}$ will tend to zero, and the next search direction automatically approaches $-g_k$. In this way, the PR method automatically avoids jamming. This property was first studied by Gilbert and Nocedal, who called it Property (*). We are going to show that the method with $\beta_k^{WYLDL}$ possesses Property (*).
Proof. By Lemma 3, we know that the sufficient descent condition (22) holds. Combining with Wolfe-Powell condition (24), we have It follows from (17), (35), and that Set which means that . By setting we have
For nonlinear conjugate gradient methods, Dai et al.  proposed the following general conclusion.
Lemma 5. Suppose that Assumption 2 holds. Consider any conjugate gradient method, where is a descent direction and is obtained by the strong Wolfe-Powell line search. if we have
Theorem 6. Suppose that Assumption 2 holds. Consider the WYLDL method, where $\alpha_k$ is obtained by the strong Wolfe-Powell line search with . If there exists a constant such that then and where .
Proof. Firstly, note that ; otherwise, the sufficient descent condition (22) fails. Therefore, is well defined. In addition, by relation (42) and Lemma 5 we have
Now, we divide formula into two parts as follows:
Then by (3), we have, for all , Using the identity and (47), we can obtain Using the condition , the triangle inequality, and (48), we obtain On the other hand, line search condition (24) gives Equations (22), (24), and (50) imply that It follows from the definition of , (17), (36), and (51) that So we have
Let denote the set of positive integers. For and a positive integer , denote Let denote the number of elements in . Dai and Liao pointed out that, for a conjugate gradient method which satisfies (i) Property (*), (ii) the sufficient descent condition, and (iii) Theorem 6, if (42) holds, then the small step-sizes should not be too many. This property is described as follows.
Lemma 7. Suppose that Assumption 2 holds. Consider WYLDL method, where is obtained by the strong Wolfe-Powell line search in which . Then if (42) holds, there exists such that, for any and any index , there is an index such that
Proof. It follows from Lemmas 3 and 4 and Theorem 6 that the WYLDL method satisfies the three conditions above. Hence, according to Lemma 3.5 in , Lemma 7 holds, and we omit the detailed proof.
According to the above lemmas and theorems, we can prove the following convergence theorem for WYLDL method.
Theorem 8. Suppose that Assumption 2 holds. Consider WYLDL method, if is obtained by strong Wolfe-Powell line search with , then we have
Proof. We prove this theorem by contradiction. If , then (42) must hold. Then the conditions of Theorem 6 and Lemma 7 hold. Defining , we have, for any indices , , with , (57), , and (17) give the following: Let be given by Lemma 7 and define to be the smallest integer not less than . By Theorem 6, we can find an index such that With this and , Lemma 7 gives an index such that For any index , by the Cauchy-Schwarz inequality and (59), From the relations (61) and (60) and taking in (58), we get Thus , which contradicts the definition of . The proof is completed.
4. Numerical Experiments
In this section, we report the performance of the Algorithm 1 (WYLDL) on a set of test problems. The codes were written in Fortran 77 and in double precision arithmetic. All the tests were performed on the same PC. The experiments were performed on a set of 73 nonlinear unconstrained problems in . Some of the problems are from CUTE  library. For each test problem, we have performed 10 numerical experiments with number of variables .
In order to assess the reliability of the WYLDL algorithm, we also tested the DL method and the WYL method on the same problems. All these algorithms are terminated when . We also forcibly stopped the routines if the iterations exceeded 1000 or the number of function evaluations reached 2000. In the Wolfe-Powell line search conditions (23) and (24), the parameters are , . For DL method, , which is the same with . We also test WYLDL algorithm with which is the best choice.
The comparison data include the iterations, the function and gradient evaluations, and the CPU time. To assess the performance of the WYLDL, WYL, and DL methods, we use the performance profile of Dolan and Moré as the evaluation tool.
Dolan and Moré gave a new tool to analyze the efficiency of algorithms. They introduced the notion of a performance profile as a means to evaluate and compare the performance of a set of solvers $S$ on a test set $P$. Assuming that there exist $n_s$ solvers and $n_p$ problems, for each problem $p$ and solver $s$, they defined
$t_{p,s}$ = computing cost (iterations, function and gradient evaluations, or CPU time) required to solve problem $p$ by solver $s$.
Requiring a baseline for comparisons, they compared the performance on problem $p$ by solver $s$ with the best performance by any solver on this problem; that is, using the performance ratio as follows: $r_{p,s} = \frac{t_{p,s}}{\min\{ t_{p,s} : s \in S \}}$.
Suppose that a parameter $r_M \ge r_{p,s}$ for all $p, s$ is chosen. Set $r_{p,s} = r_M$ if and only if solver $s$ does not solve problem $p$. Then they defined $\rho_s(\tau) = \frac{1}{n_p} \operatorname{size}\{ p \in P : r_{p,s} \le \tau \}$; thus $\rho_s(\tau)$ is the probability for solver $s \in S$ that a performance ratio $r_{p,s}$ is within a factor $\tau$ of the best possible ratio. The function $\rho_s$ is then the distribution function for the performance ratio. The performance profile is a nondecreasing, piecewise constant function. That is, for the subset of methods being analyzed, we plot the fraction of the problems for which any given method is within a factor $\tau$ of the best.
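The definition can be turned into a few lines of code. The sketch below computes $\rho_s(\tau)$ from a table of costs; the function name `performance_profile`, the solver labels, and the cost values are made up for illustration and are not the paper's data. As a simplification, a failure is marked with `math.inf` instead of a finite $r_M$, which gives the same profile for any finite $\tau$.

```python
import math


def performance_profile(costs, taus):
    """Return rho_s(tau) for each solver s and each tau in taus.

    costs[s][p] is the cost t_{p,s} of solver s on problem p;
    math.inf marks a failed run. Problems on which every solver fails
    are assumed to have been removed beforehand.
    """
    solvers = list(costs)
    n_p = len(costs[solvers[0]])
    best = [min(costs[s][p] for s in solvers) for p in range(n_p)]
    ratios = {s: [costs[s][p] / best[p] for p in range(n_p)] for s in solvers}
    return {
        s: [sum(r <= tau for r in ratios[s]) / n_p for tau in taus]
        for s in solvers
    }


# Hypothetical iteration counts for two solvers on three problems.
costs = {
    "A": [10.0, 20.0, math.inf],   # solver A fails on the third problem
    "B": [20.0, 10.0, 30.0],
}
profile = performance_profile(costs, taus=[1.0, 2.0, 4.0])
print(profile)
```

In this toy example solver A is best on one of three problems and never solves the third, so its profile levels off at 2/3, while solver B reaches 1 at $\tau = 2$. The value at $\tau = 1$ measures how often a solver is the fastest, and the limiting value for large $\tau$ measures robustness.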
For the test problems, if none of the three methods terminates successfully, the problem is discarded. If one method fails but other methods terminate successfully, the performance ratio of the failed method is set to $r_M$ (the maximum of the performance ratios). The performance profiles of the three methods based on iterations, function and gradient evaluations, and CPU time are plotted in Figures 1, 2, and 3, respectively.
Figure 1 plots the performance profile based on iterations. For small values of $\tau$, the DL method performs better than the WYL and WYLDL methods. As $\tau$ increases, the profiles of the WYLDL and WYL methods overtake that of the DL method. This means that, from the iteration point of view, the DL method is better than the WYL and WYLDL methods on a subset of problems, but over all the test problems the WYLDL method is more robust than the DL method.
Figure 2 plots the performance profile based on function and gradient evaluations. It can be seen that, for small values of $\tau$, the DL method performs better than the WYL and WYLDL methods. Compared with Figure 1, the difference between the methods is much smaller than in the iteration profile. One possible reason is as follows: for the WYLDL and WYL methods, the average number of function and gradient evaluations required per iteration is lower than for the DL method. From this point of view, the CPU time consumed by the WYLDL or WYL method should be less than that of the DL method, since the CPU time depends mainly on function and gradient evaluations. Figure 3 confirms this. From Figures 1 to 3, it is easy to see that the performances of the WYL method and the WYLDL method are quite similar. We think a possible reason is that the second part of $\beta_k^{WYLDL}$, namely $t \frac{g_k^T s_{k-1}}{d_{k-1}^T y_{k-1}}$, is very small compared with $\beta_k^{WYL}$. This may be related to the Wolfe line search: since the line search used in this paper is based on the strategies of Lemarechal, Fletcher, or Moré and Thuente, the directional derivative $g_k^T d_{k-1}$ may become very small.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
This research was supported by the Guangxi Universities Foundation Grant no. 2013BYB210.
Y. H. Dai and Y. Yuan, Nonlinear Conjugate Gradient Method, Shanghai Science and Technology Press, 2000.
Z. X. Wei, G. Y. Li, and L. Q. Qi, “Global convergence of the Polak-Ribière-Polyak conjugate gradient method with an Armijo-type inexact line search for nonconvex unconstrained optimization problems,” Mathematics of Computation, vol. 77, no. 264, pp. 2173–2193, 2008.
H. Huang, Z. Wei, and S. Yao, “The proof of the sufficient descent condition of the Wei-Yao-Liu conjugate gradient method under the strong Wolfe-Powell line search,” Applied Mathematics and Computation, vol. 189, no. 2, pp. 1241–1245, 2007.
H. Huang, S. Yao, and H. Lin, “A new conjugate gradient method based on HS-DY methods,” Journal of Guangxi University of Technology, vol. 4, pp. 63–66, 2008.
I. Bongartz, A. R. Conn, N. Gould, and P. L. Toint, “CUTE: constrained and unconstrained testing environment,” ACM Transactions on Mathematical Software, vol. 21, no. 1, pp. 123–160, 1995.
R. Fletcher, Practical Methods of Optimization. Vol. 1: Unconstrained Optimization, John Wiley and Sons, New York, NY, USA, 1989.