Abstract
We propose a conjugate gradient method which is based on the study of the Dai-Liao conjugate gradient method. An important property of our proposed method is that it ensures sufficient descent independent of the accuracy of the line search. Moreover, it achieves a high-order accuracy in approximating the second-order curvature information of the objective function by utilizing the modified secant condition proposed by Babaie-Kafaki et al. (2010). Under mild conditions, we establish that the proposed method is globally convergent for general functions provided that the line search satisfies the Wolfe conditions. Numerical experiments are also presented.
1. Introduction
We consider the unconstrained optimization problem
$$\min_{x \in \mathbb{R}^n} f(x), \quad (1)$$
where $f : \mathbb{R}^n \to \mathbb{R}$ is a continuously differentiable function. Conjugate gradient methods are probably the most famous iterative methods for solving problem (1), especially when the dimension $n$ is large, since they are characterized by the simplicity of their iteration and their low memory requirements. These methods generate a sequence of points $\{x_k\}$, starting from an initial point $x_0$, using the iterative formula
$$x_{k+1} = x_k + \alpha_k d_k, \quad (2)$$
where $\alpha_k > 0$ is the stepsize obtained by some line search and $d_k$ is the search direction defined by
$$d_k = \begin{cases} -g_k, & \text{if } k = 0, \\ -g_k + \beta_k d_{k-1}, & \text{otherwise}, \end{cases} \quad (3)$$
where $g_k = \nabla f(x_k)$ is the gradient of $f$ at $x_k$ and $\beta_k$ is a scalar. Well-known formulas for $\beta_k$ include the Hestenes-Stiefel (HS) [1], the Fletcher-Reeves (FR) [2], the Polak-Ribière (PR) [3], the Liu-Storey (LS) [4], the Dai-Yuan (DY) [5], and the conjugate descent (CD) [6] formulas. They are specified by
$$\beta_k^{HS} = \frac{g_k^T y_{k-1}}{d_{k-1}^T y_{k-1}}, \quad \beta_k^{FR} = \frac{\|g_k\|^2}{\|g_{k-1}\|^2}, \quad \beta_k^{PR} = \frac{g_k^T y_{k-1}}{\|g_{k-1}\|^2}, \quad \beta_k^{LS} = -\frac{g_k^T y_{k-1}}{g_{k-1}^T d_{k-1}}, \quad \beta_k^{DY} = \frac{\|g_k\|^2}{d_{k-1}^T y_{k-1}}, \quad \beta_k^{CD} = -\frac{\|g_k\|^2}{g_{k-1}^T d_{k-1}}, \quad (4)$$
respectively, where $s_{k-1} = x_k - x_{k-1}$, $y_{k-1} = g_k - g_{k-1}$, and $\|\cdot\|$ denotes the Euclidean norm. If $f$ is a strictly convex quadratic function and the performed line search is exact, all these methods are equivalent; for a general function, however, different choices of $\beta_k$ give rise to distinct conjugate gradient methods with quite different computational efficiency and convergence properties. We refer to the books [7, 8], the survey paper [9], and the references therein for the numerical performance and the convergence properties of conjugate gradient methods. During the last decade, much effort has been devoted to developing new conjugate gradient methods which are not only globally convergent for general functions but also computationally superior to classical methods; these methods can be classified into two classes.
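To make the iteration (2)-(3) concrete, the following sketch applies the FR formula to a strictly convex quadratic with an exact stepsize, in which case the scheme coincides with classical linear CG; this is an illustrative toy, not the implementation used in the experiments of Section 4.

```python
import numpy as np

def cg_quadratic(A, b, x0, tol=1e-10):
    """Iteration (2)-(3) with the Fletcher-Reeves formula for
    f(x) = 0.5 x^T A x - b^T x (A symmetric positive definite).
    On a quadratic the exact stepsize is available in closed form,
    so the method reduces to classical linear CG."""
    x = np.asarray(x0, dtype=float)
    g = A @ x - b                         # gradient g_k
    d = -g                                # d_0 = -g_0
    for _ in range(len(b)):
        if np.linalg.norm(g) < tol:
            break
        alpha = (g @ g) / (d @ A @ d)     # exact line search stepsize
        x = x + alpha * d                 # iteration (2)
        g_new = A @ x - b
        beta = (g_new @ g_new) / (g @ g)  # FR formula
        d = -g_new + beta * d             # direction update (3)
        g = g_new
    return x
```

On a strictly convex quadratic this terminates in at most $n$ steps, which is the finite-termination property behind the equivalence of the formulas mentioned above.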
The first class utilizes second-order information of the objective function to improve the efficiency and robustness of conjugate gradient methods. Dai and Liao [10] proposed a new conjugate gradient method by exploiting the new conjugacy condition $d_k^T y_{k-1} = -t\, g_k^T s_{k-1}$, based on the standard secant equation, in which $\beta_k$ in (3) is defined by
$$\beta_k^{DL} = \frac{g_k^T y_{k-1} - t\, g_k^T s_{k-1}}{d_{k-1}^T y_{k-1}}, \quad (5)$$
where $t \geq 0$ is a scalar. Moreover, Dai and Liao also suggested a modification of (5), from the viewpoint of global convergence for general functions, by restricting the first term of (5) to be nonnegative, namely,
$$\beta_k^{DL+} = \max\left\{ \frac{g_k^T y_{k-1}}{d_{k-1}^T y_{k-1}}, 0 \right\} - t\, \frac{g_k^T s_{k-1}}{d_{k-1}^T y_{k-1}}. \quad (6)$$
Along this line, many researchers [11-15] proposed variants of the Dai-Liao method based on modified secant conditions with higher orders of accuracy in the approximation of the curvature. Under proper conditions, these methods are globally convergent and sometimes competitive with classical conjugate gradient methods. However, these methods are not guaranteed to generate descent directions; therefore, the descent condition is usually assumed in their analysis and implementations.
The second class focuses on generating conjugate gradient methods which ensure sufficient descent independent of the accuracy of the line search. On the basis of this idea, Hager and Zhang [16] modified the parameter $\beta_k$ in (3) and proposed a new conjugate gradient method, called CG-DESCENT, in which the update parameter is defined as follows:
$$\bar{\beta}_k = \max\{\beta_k^{HZ}, \eta_k\}, \quad \beta_k^{HZ} = \frac{1}{d_{k-1}^T y_{k-1}} \left( y_{k-1} - 2 d_{k-1} \frac{\|y_{k-1}\|^2}{d_{k-1}^T y_{k-1}} \right)^T g_k,$$
where $\eta_k = -\left( \|d_{k-1}\| \min\{\eta, \|g_{k-1}\|\} \right)^{-1}$ and $\eta > 0$ is a constant. An important feature of the CG-DESCENT method is that the generated direction satisfies the sufficient descent condition $g_k^T d_k \leq -\frac{7}{8} \|g_k\|^2$. Moreover, Hager and Zhang [16] established that CG-DESCENT is globally convergent for general functions under the Wolfe line search conditions.
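For illustration, the Hager-Zhang update and its sufficient descent property can be checked numerically as below; this sketch omits the lower truncation safeguard used in the full method and is not the CG-DESCENT implementation benchmarked in Section 4.

```python
import numpy as np

def beta_hz(g_new, d, y):
    """Hager-Zhang update:
    beta = (y - 2 d ||y||^2 / (d^T y))^T g_new / (d^T y).
    With d_new = -g_new + beta*d, the algebraic inequality
    g_new^T d_new <= -(7/8) ||g_new||^2 holds whenever d^T y != 0."""
    dy = d @ y
    return ((y - 2.0 * (y @ y) / dy * d) @ g_new) / dy
```

The 7/8 bound follows from the inequality 2ab <= a^2 + b^2 and does not depend on the line search, which is exactly the "sufficient descent independent of the line search accuracy" property mentioned above.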
Quite recently, Zhang et al. [17] considered a different approach, modifying the search direction such that the generated direction satisfies $g_k^T d_k = -\|g_k\|^2$, independently of the line search used. More analytically, they proposed a modified FR method in which the search direction is given by
$$d_k = -\theta_k g_k + \beta_k^{FR} d_{k-1}, \quad \theta_k = \frac{d_{k-1}^T y_{k-1}}{\|g_{k-1}\|^2}.$$
This method reduces to the standard FR method in case the performed line search is exact. Furthermore, in case $\beta_k$ is specified by another existing conjugate gradient formula, the property $g_k^T d_k = -\|g_k\|^2$ is still satisfied. Along this line, many related conjugate gradient methods have been extensively studied [17-24], exhibiting strong convergence properties and good average performance.
In this work, we propose a new conjugate gradient method which has both characteristics of the two previously discussed classes. More analytically, our method ensures sufficient descent independent of the accuracy of the line search and achieves a high-order accuracy in approximating the second-order curvature information of the objective function by utilizing the modified secant condition proposed by Babaie-Kafaki et al. [11]. Under mild conditions, we establish the global convergence of our proposed method. The numerical experiments indicate that the proposed method is promising.
The remainder of this paper is organized as follows. In Section 2, we present our motivation and our proposed conjugate gradient method. In Section 3, we present the global convergence analysis of our method. The numerical experiments are reported in Section 4 using the performance profiles of Dolan and Moré [25]. Finally, Section 5 presents our concluding remarks and our proposals for future research.
Throughout this paper, we denote $f(x_k)$ and $\nabla f(x_k)$ as $f_k$ and $g_k$, respectively.
2. Algorithm
Firstly, we recall that for quasi-Newton methods, an approximation matrix $B_k$ to the Hessian of $f$ is updated so that the new matrix $B_{k+1}$ satisfies the following secant condition:
$$B_{k+1} s_k = y_k, \quad (10)$$
where $s_k = x_{k+1} - x_k$ and $y_k = g_{k+1} - g_k$. Zhang et al. [26] and Zhang and Xu [27] extended this condition and derived a class of modified secant conditions with a vector parameter, in the form
$$B_{k+1} s_k = y_k^*, \quad y_k^* = y_k + \frac{\vartheta_k}{s_k^T u_k} u_k, \quad (11)$$
where $u_k$ is any vector satisfying $s_k^T u_k \neq 0$, and $\vartheta_k$ is defined by
$$\vartheta_k = 6 (f_k - f_{k+1}) + 3 (g_k + g_{k+1})^T s_k. \quad (12)$$
Observe that this new quasi-Newton equation contains not only gradient information but also function value information at the present and the previous step. Moreover, in [26], Zhang et al. proved that if $\|s_k\|$ is sufficiently small, then
$$s_k^T \nabla^2 f(x_{k+1}) s_k - s_k^T y_k = O(\|s_k\|^3), \quad s_k^T \nabla^2 f(x_{k+1}) s_k - s_k^T y_k^* = O(\|s_k\|^4). \quad (13)$$
Clearly, these equations imply that the quantity $s_k^T y_k^*$ approximates the second-order curvature $s_k^T \nabla^2 f(x_{k+1}) s_k$ with higher precision than the quantity $s_k^T y_k$ does.
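As a quick sanity check on the modified secant condition (11)-(12), the sketch below computes $y_k^*$ with the common choice $u_k = s_k$; on a quadratic objective, $\vartheta_k$ vanishes and the modified secant vector coincides with the standard $y_k$. Function and variable names here are illustrative.

```python
import numpy as np

def modified_y(f_old, f_new, g_old, g_new, s, u=None):
    """Modified secant vector of (11): y* = y + (theta / s^T u) u,
    with theta from (12): theta = 6(f_k - f_{k+1}) + 3(g_k + g_{k+1})^T s.
    u may be any vector with s^T u != 0; u = s is a common choice."""
    y = g_new - g_old
    theta = 6.0 * (f_old - f_new) + 3.0 * (g_old + g_new) @ s
    if u is None:
        u = s
    return y + (theta / (s @ u)) * u
```

For a quadratic objective theta vanishes exactly, so y* = y and the modified condition falls back to the standard secant equation (10).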
However, for values of $\|s_k\|$ greater than one, the standard secant equation (10) is expected to be more accurate than the modified secant equation (11). Recently, Babaie-Kafaki et al. [11], in order to overcome this difficulty, considered an extension of the modified secant equation (11) as follows:
$$B_{k+1} s_k = \hat{y}_k, \quad \hat{y}_k = y_k + \rho_k \frac{\vartheta_k}{s_k^T u_k} u_k, \quad (14)$$
where the parameter $\rho_k$ is restricted to the values $\{0, 1\}$ and adaptively switches between the standard secant equation (10) and the modified secant equation (11), by setting $\rho_k = 1$ if $\|s_k\| \leq 1$ and setting $\rho_k = 0$, otherwise. In the same way as Dai and Liao [10], they obtained an expression for $\beta_k$, in the form
$$\beta_k = \frac{g_k^T \hat{y}_{k-1} - t\, g_k^T s_{k-1}}{d_{k-1}^T \hat{y}_{k-1}}, \quad (15)$$
where $t \geq 0$ and $\hat{y}_{k-1}$ is defined by (14). Furthermore, following Dai and Liao's approach, in order to ensure global convergence for general functions, they modified formula (15) as follows:
$$\beta_k^{+} = \max\left\{ \frac{g_k^T \hat{y}_{k-1}}{d_{k-1}^T \hat{y}_{k-1}}, 0 \right\} - t\, \frac{g_k^T s_{k-1}}{d_{k-1}^T \hat{y}_{k-1}}. \quad (16)$$
Motivated by the theoretical advantages of the modified secant condition (14) and the technique of the modified FR method [17], we propose a new conjugate gradient method as follows. Let the search direction $d_k$ be defined by
$$d_k = \begin{cases} -g_k, & \text{if } k = 0, \\ -g_k + \beta_k d_{k-1} - \beta_k \frac{g_k^T d_{k-1}}{\|g_k\|^2} g_k, & \text{otherwise}, \end{cases} \quad (17)$$
where $\beta_k$ is defined by (15). It is easy to see that the sufficient descent condition
$$g_k^T d_k = -\|g_k\|^2 \quad (18)$$
holds, using any line search. Moreover, if $f$ is a convex quadratic function and the performed line search is exact, then $g_k^T d_{k-1} = 0$, $g_k^T s_{k-1} = 0$, and $\hat{y}_{k-1} = y_{k-1}$; hence, the conjugate gradient method (2)-(17) is reduced to the standard conjugate gradient method, accordingly.
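The sufficient descent property $g_k^T d_k = -\|g_k\|^2$ can be verified numerically. The sketch below assembles the direction (17) with the Dai-Liao-type $\beta_k$ of (15); it is a hedged reconstruction for illustration only (the variable names and the choice t = 0.1 are ours, not the paper's tuned settings).

```python
import numpy as np

def mdl_direction(g, d_prev, s_prev, y_hat, t=0.1):
    """Sketch of direction (17) with beta from (15):
    beta = (g^T y_hat - t g^T s) / (d_prev^T y_hat),
    d = -g + beta*d_prev - beta*(g^T d_prev / ||g||^2) g.
    By construction g^T d = -||g||^2 for ANY scalar beta."""
    beta = (g @ y_hat - t * (g @ s_prev)) / (d_prev @ y_hat)
    return -g + beta * d_prev - beta * ((g @ d_prev) / (g @ g)) * g
```

Note that the descent identity holds regardless of the accuracy of the line search that produced d_prev and s_prev.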
Now, based on the above discussion, we present our proposed algorithm, called the modified Dai-Liao conjugate gradient algorithm (MDL-CG).
3. Convergence Analysis
In order to establish the global convergence analysis, we make the following assumptions on the objective function $f$.
Assumption 1. The level set $\mathcal{L} = \{x \in \mathbb{R}^n : f(x) \leq f(x_0)\}$ is bounded; namely, there exists a positive constant $B$ such that
$$\|x\| \leq B, \quad \text{for all } x \in \mathcal{L}. \quad (19)$$
Assumption 2. In some neighborhood $\mathcal{N}$ of $\mathcal{L}$, $f$ is differentiable and its gradient is Lipschitz continuous; namely, there exists a positive constant $L$ such that
$$\|\nabla f(x) - \nabla f(y)\| \leq L \|x - y\|, \quad \text{for all } x, y \in \mathcal{N}. \quad (20)$$
It follows directly from Assumptions 1 and 2 that there exists a positive constant $\gamma$ such that
$$\|\nabla f(x)\| \leq \gamma, \quad \text{for all } x \in \mathcal{L}. \quad (21)$$
In order to guarantee the global convergence of Algorithm 1, we will impose that the steplength $\alpha_k$ satisfies the Armijo condition or the Wolfe conditions. The Armijo line search determines $\alpha_k = \max\{s \rho^j : j = 0, 1, 2, \ldots\}$ such that
$$f(x_k + \alpha_k d_k) \leq f(x_k) + \delta \alpha_k g_k^T d_k, \quad (22)$$
where $s > 0$, $\rho \in (0, 1)$, and $\delta \in (0, 1)$ are constants. In the Wolfe line search the steplength $\alpha_k$ satisfies
$$f(x_k + \alpha_k d_k) \leq f(x_k) + \delta \alpha_k g_k^T d_k, \quad (23)$$
$$\nabla f(x_k + \alpha_k d_k)^T d_k \geq \sigma g_k^T d_k, \quad (24)$$
with $0 < \delta < \sigma < 1$.
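A direct check of the Wolfe conditions (23)-(24) for a candidate stepsize can be sketched as follows; this is a verification utility for illustration (real implementations, such as the Hager-Zhang line search used in Section 4, search for such a stepsize rather than merely verifying it):

```python
import numpy as np

def satisfies_wolfe(f, grad, x, d, alpha, delta=1e-4, sigma=0.9):
    """Check the Wolfe conditions for a trial stepsize alpha:
    (23) sufficient decrease: f(x + a d) <= f(x) + delta*a*g^T d
    (24) curvature:           grad(x + a d)^T d >= sigma*g^T d
    with 0 < delta < sigma < 1."""
    gd = grad(x) @ d                      # directional derivative g_k^T d_k
    x_new = x + alpha * d
    decrease = f(x_new) <= f(x) + delta * alpha * gd
    curvature = grad(x_new) @ d >= sigma * gd
    return decrease and curvature
```

For example, minimizing f(x) = ||x||^2 from x = 1 along d = -1, the stepsize 0.5 satisfies both conditions, while a very small stepsize such as 0.01 violates the curvature condition (24).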
Next, we present some lemmas which are very important for the global convergence analysis.
Lemma 1 (see [11]). Suppose that Assumptions 1 and 2 hold. For $\vartheta_k$ and $\hat{y}_k$ defined by (12) and (14), respectively, one has
$$|\vartheta_k| \leq c_1 \|s_k\|^2 \quad \text{and} \quad \|\hat{y}_k\| \leq c_2 \|s_k\|, \quad (25)$$
where $c_1$ and $c_2$ are positive constants.
Lemma 2. Suppose that Assumptions 1 and 2 hold. Let $\{x_k\}$ be generated by Algorithm MDL-CG, where the line search satisfies the Armijo condition (22). Then there exists a positive constant $c$ such that
$$\alpha_k \geq c\, \frac{\|g_k\|^2}{\|d_k\|^2} \quad (26)$$
for all $k$.
Proof. From the Armijo condition (22) and Assumptions 1 and 2, we have
$$\sum_{k \geq 0} \alpha_k \left( -g_k^T d_k \right) < \infty. \quad (27)$$
Using this together with inequality (18) implies that
$$\sum_{k \geq 0} \alpha_k \|g_k\|^2 < \infty. \quad (28)$$
We now prove (26) by considering the following cases.
Case 1. If $\alpha_k = s$, it follows from (18) that $\|g_k\| \leq \|d_k\|$, and therefore $\|g_k\|^2 / \|d_k\|^2 \leq 1$. In this case, inequality (26) is satisfied with $c = s$.
Case 2. If $\alpha_k < s$, then, by the line search, $\alpha_k / \rho$ does not satisfy (22). This implies
$$f\left(x_k + \frac{\alpha_k}{\rho} d_k\right) > f(x_k) + \delta \frac{\alpha_k}{\rho} g_k^T d_k. \quad (29)$$
By the mean-value theorem and Assumptions 1 and 2, there exists $t_k \in (0, 1)$ such that
$$f\left(x_k + \frac{\alpha_k}{\rho} d_k\right) - f(x_k) = \frac{\alpha_k}{\rho} \nabla f\left(x_k + t_k \frac{\alpha_k}{\rho} d_k\right)^T d_k \leq \frac{\alpha_k}{\rho} g_k^T d_k + L \frac{\alpha_k^2}{\rho^2} \|d_k\|^2. \quad (30)$$
Using this inequality with (18) and (29), we have
$$\alpha_k \geq \frac{\rho (1 - \delta)}{L}\, \frac{\|g_k\|^2}{\|d_k\|^2}. \quad (31)$$
Letting $c = \min\{s, \rho (1 - \delta) / L\}$, we get (26), which completes the proof.
From inequalities (26) and (28), we can easily obtain the following lemma.
Lemma 3. Suppose that Assumptions 1 and 2 hold. Let $\{x_k\}$ be generated by Algorithm MDL-CG, where the line search satisfies the Armijo condition (22). Then
$$\sum_{k \geq 0} \frac{\|g_k\|^4}{\|d_k\|^2} < \infty. \quad (32)$$
Next, we establish the global convergence theorem for Algorithm MDL-CG for uniformly convex functions.
Theorem 1. Suppose that Assumptions 1 and 2 hold and $f$ is uniformly convex on $\mathcal{L}$; namely, there exists a positive constant $\mu$ such that
$$\left( \nabla f(x) - \nabla f(y) \right)^T (x - y) \geq \mu \|x - y\|^2, \quad \text{for all } x, y \in \mathcal{L}. \quad (33)$$
Let $\{x_k\}$ and $\{d_k\}$ be generated by Algorithm MDL-CG and let $\alpha_k$ satisfy the Armijo condition (22). Then one has either $g_k = 0$ for some $k$ or
$$\lim_{k \to \infty} \|g_k\| = 0. \quad (34)$$
Proof. Suppose that $g_k \neq 0$ for all $k$. By the convexity assumption (33), we have
$$s_{k-1}^T y_{k-1} \geq \mu \|s_{k-1}\|^2.$$
Combining the previous relation with Lemma 1, we obtain a positive lower bound for $s_{k-1}^T \hat{y}_{k-1}$. Therefore, by the definition of the search direction in (17) together with the previous inequality, we obtain an upper bound for $\|d_k\|$ of the form $\|d_k\| \leq M \|g_k\|$, where $M$ is a positive constant. Inserting this upper bound for $\|d_k\|$ in (32) yields $\sum_{k \geq 0} \|g_k\|^2 < \infty$; hence (34) holds, which completes the proof.
For simplicity, in case the update parameter in Algorithm 1 is computed by (16), we refer to it as Algorithm -CG. In the following, we show that Algorithm -CG is globally convergent for general nonlinear functions under the Wolfe line search conditions (23) and (24). In the rest of this section, we assume that convergence does not occur; this implies that there exists a positive constant $\epsilon$ such that
$$\|g_k\| \geq \epsilon, \quad \text{for all } k. \quad (38)$$
Lemma 4. Suppose that Assumptions 1 and 2 hold. Let $\{x_k\}$ and $\{d_k\}$ be generated by Algorithm -CG and let $\alpha_k$ be obtained by the Wolfe line search (23) and (24). Then there exist positive constants $c_3$ and $c_4$ such that, for all $k$,
Proof. From (14), (18), and (24), we have Utilizing this with Lemma 1, Assumption 2, and relations (21) and (38), we have Letting , we see that (39) is satisfied. Furthermore, by the Wolfe condition (24), we have Also, observe that Rearranging inequality (43), we obtain , and together with (44), we obtain (45). It follows from Assumption 1 and (18), (24), (38), and (39) that Letting , we obtain (40), which completes the proof.
Next, we present a lemma which shows that, asymptotically, the search directions change slowly.
Lemma 5. Suppose that Assumptions 1 and 2 hold. Let $\{x_k\}$ and $\{d_k\}$ be generated by Algorithm -CG and let $\alpha_k$ be obtained by the Wolfe line search (23) and (24). Then $d_k \neq 0$ for all $k$, and
$$\sum_{k \geq 1} \|u_k - u_{k-1}\|^2 < \infty,$$
where $u_k = d_k / \|d_k\|$.
Proof. Firstly, note that $d_k \neq 0$, for otherwise (18) would imply $g_k = 0$. Therefore, $u_k = d_k / \|d_k\|$ is well defined. Next, we divide formula (17) into two parts as follows: where Moreover, let us define a vector and a scalar by where Therefore, from (17), for $k \geq 1$, we obtain Using this relation with the identity $\|u_k\| = \|u_{k-1}\| = 1$, we have that In addition, using this with the condition and the triangle inequality, we get Now, we evaluate the quantity It follows from the definition in (52) and (21), (39), (40), and (45) that there exists a positive constant such that From the previous relation and Lemma 3, we obtain Therefore, using this with (54), we complete the proof.
Let $\mathbb{N}$ denote the set of positive integers. For $\lambda > 0$ and a positive integer $\Delta$, we define the set
$$K_{k,\Delta}^{\lambda} = \left\{ i \in \mathbb{N} : k \leq i \leq k + \Delta - 1,\ \|s_{i-1}\| > \lambda \right\}.$$
Let $\left| K_{k,\Delta}^{\lambda} \right|$ denote the number of elements in $K_{k,\Delta}^{\lambda}$. The following lemma shows that if the gradients are bounded away from zero and Lemma 4 holds, then a certain fraction of the steps cannot be too small. This lemma is equivalent to Lemma 3.5 in [10] and Lemma 4.2 in [28].
Lemma 6. Suppose that all assumptions of Lemma 5 hold. Then there exists $\lambda > 0$ such that, for any $\Delta \in \mathbb{N}$ and any index $k_0$, there exists an index $k \geq k_0$ such that
$$\left| K_{k,\Delta}^{\lambda} \right| > \frac{\Delta}{2}.$$
Next, making use of Lemmas 4, 5, and 6, we can establish the global convergence theorem for Algorithm -CG under the Wolfe line search for general functions. Its proof is similar to that of Theorem 3.6 in [10] and Theorem 4.3 in [28]; thus, we omit it.
Theorem 2. Suppose that Assumptions 1 and 2 hold. If $\{x_k\}$ is generated by Algorithm -CG and $\alpha_k$ is obtained by the Wolfe line search (23) and (24), then one has either $g_k = 0$ for some $k$ or
$$\liminf_{k \to \infty} \|g_k\| = 0.$$
4. Numerical Experiments
In this section, we report numerical experiments performed on a set of 73 unconstrained optimization problems. These test problems, with the given initial points, can be found on Neculai Andrei's web site (http://camo.ici.ro/neculai/SCALCG/testuo.pdf). Each test function was tested with 1000, 5000, and 10000 variables, giving 219 problem instances in total.
We compare the performance of our proposed conjugate gradient method -CG with that of the CG-DESCENT method [16]. The CG-DESCENT code, coauthored by Hager and Zhang, was obtained from Hager's web page (http://www.math.ufl.edu/~hager/papers/CG/). The implementation code was written in Fortran and compiled with the Intel Fortran compiler ifort (with compiler settings -O2 -double-size 128) on a PC (2.66 GHz Quad-Core processor, 4 Gbyte RAM) running the Linux operating system. All algorithms were implemented with the Wolfe line search proposed by Hager and Zhang [16], with the parameters set to their default values. In our experiments, the termination criterion and the remaining parameters were set as in [11]. In the sequel, we focus our experimental analysis on the best value of the parameter; hence, we have tested values ranging from 0 to 1 in steps of 0.005. The detailed numerical results can be found at the web site http://www.math.upatras.gr/~livieris/Results/MDL_results.zip.
Figure 1 presents the percentage of the test problems that were successfully solved by Algorithm -CG for each choice of the parameter, and Figure 2 presents the multigraph of means with respect to function and gradient evaluations. Algorithm -CG reports the best results relative to the success rate for parameter choices in the interval . Moreover, in case , Algorithm -CG achieves the highest success rate (98.64%), solving 216 out of 219 test problems successfully. Figures 1 and 2 show that the influence of the parameter on the computational cost is more pronounced; hence, we focus our attention on the function and gradient evaluation metrics. Based on the experimental results on this limited test set, we conjecture that the optimal parameter value with respect to the computational cost belongs to the interval . Notice that in case , Algorithm -CG requires the least mean number of function and gradient evaluations. Furthermore, for values of the parameter in the intervals , and , Algorithm -CG exhibits its worst performance with respect to the computational cost. It is worth noticing that the choice exhibits the worst performance in terms of both computational cost and success rate.
We conclude our analysis by considering the performance profiles of Dolan and Moré [25] for the worst and the best parameter choices. The use of performance profiles provides a wealth of information, such as solver efficiency, robustness, and probability of success, in compact form, and eliminates both the influence of a small number of problems on the benchmarking process and the sensitivity of results associated with the ranking of solvers [25]. The performance profile plots the fraction of problems for which any given method is within a factor of the best method. The left side of each figure indicates the percentage of the test problems for which a method is the fastest (efficiency), while the right side indicates the percentage of the test problems that were successfully solved by each method (robustness). The curves in Figures 3 and 4 have the following meaning.
(i) "CG-DESCENT" stands for the CG-DESCENT method.
(ii) "" stands for Algorithm -CG with .
(iii) "" stands for Algorithm -CG with .
(iv) "" stands for Algorithm -CG with .
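The Dolan-Moré profile used in Figures 3-5 can be computed in a few lines; the sketch below is a generic illustration (the matrix T of per-problem costs is hypothetical test data, not the paper's results).

```python
import numpy as np

def performance_profile(T, tau):
    """Dolan-More performance profile. T[p, s] holds the cost
    (e.g., function evaluations or CPU time) of solver s on problem p,
    with np.inf marking a failure. Returns, for each solver, the
    fraction of problems whose ratio to the per-problem best cost
    is at most tau."""
    T = np.asarray(T, dtype=float)
    best = T.min(axis=1, keepdims=True)   # best cost per problem
    ratios = T / best                     # performance ratios r_{p,s}
    return (ratios <= tau).mean(axis=0)   # rho_s(tau)
```

At tau = 1 the profile gives each solver's efficiency (fraction of problems on which it is fastest); as tau grows it approaches the solver's robustness (fraction of problems solved at all).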
Figures 3-5 present the performance profiles of CG-DESCENT, , and relative to the function evaluations, gradient evaluations, and CPU time (in seconds), respectively. Obviously, exhibits the best overall performance, significantly outperforming all other conjugate gradient methods relative to all performance metrics. More analytically, solves about 64.4% and 66.2% of the test problems with the least number of function and gradient evaluations, respectively, while CG-DESCENT solves about 48.8% and 47% in the same situations. Moreover, is more robust than CG-DESCENT, since it solves 55.3% and 53% of the test problems with the least number of function and gradient evaluations, respectively. Regarding the CPU time metric, Figure 5 illustrates that reports the best performance, followed by . More specifically, solves 68.4% of the test problems with the least CPU time, while solves about 58% of the test problems. In terms of robustness, and CG-DESCENT exhibit the best performance, successfully solving 216 out of 219 test problems. presents the worst performance, since its curves lie below the curves of the other conjugate gradient methods for all performance metrics. In summary, based on the performance of and , we point out that the choice of the parameter crucially affects the efficiency of Algorithm -CG.
5. Conclusions and Future Research
In this paper, we proposed a conjugate gradient method which consists of a modification of the Dai-Liao method. An important property of our proposed method is that it ensures sufficient descent independent of the accuracy of the line search. Moreover, it achieves a high-order accuracy in approximating the second-order curvature information of the objective function by utilizing the modified secant condition proposed by Babaie-Kafaki et al. [11]. Under mild conditions, we established that the proposed method is globally convergent for general functions under the Wolfe line search conditions.
The preliminary numerical results show that if a good value of the parameter is chosen, our proposed algorithm performs very well. However, we have not yet theoretically established an optimal value of the parameter, which constitutes our motivation for future research. Moreover, an interesting idea is to apply our proposed method to a variety of challenging real-world problems, such as protein folding problems [29].