Abstract

We propose a conjugate gradient method based on the Dai-Liao conjugate gradient method. An important property of the proposed method is that it ensures sufficient descent independently of the accuracy of the line search. Moreover, it achieves a high order of accuracy in approximating the second-order curvature information of the objective function by utilizing the modified secant condition proposed by Babaie-Kafaki et al. (2010). Under mild conditions, we establish that the proposed method is globally convergent for general functions provided that the line search satisfies the Wolfe conditions. Numerical experiments are also presented.

1. Introduction

We consider the unconstrained optimization problem
\[
\min f(x), \quad x \in \mathbb{R}^n, \tag{1}
\]
where $f:\mathbb{R}^n \to \mathbb{R}$ is a continuously differentiable function. Conjugate gradient methods are probably the most famous iterative methods for solving problem (1), especially when the dimension is large, since they are characterized by the simplicity of their iteration and by low memory requirements. These methods generate a sequence of points $\{x_k\}$, starting from an initial point $x_0 \in \mathbb{R}^n$, using the iterative formula
\[
x_{k+1} = x_k + \alpha_k d_k, \quad k = 0, 1, \ldots, \tag{2}
\]
where $\alpha_k > 0$ is the stepsize obtained by some line search and $d_k$ is the search direction defined by
\[
d_k =
\begin{cases}
-g_0, & \text{if } k = 0; \\
-g_k + \beta_k d_{k-1}, & \text{otherwise},
\end{cases} \tag{3}
\]
where $g_k$ is the gradient of $f$ at $x_k$ and $\beta_k$ is a scalar. Well-known formulas for $\beta_k$ include those of Hestenes-Stiefel (HS) [1], Fletcher-Reeves (FR) [2], Polak-Ribière (PR) [3], Liu-Storey (LS) [4], Dai-Yuan (DY) [5], and the conjugate descent (CD) method [6]. They are specified by
\[
\beta_k^{HS} = \frac{g_k^T y_{k-1}}{d_{k-1}^T y_{k-1}}, \quad
\beta_k^{FR} = \frac{\|g_k\|^2}{\|g_{k-1}\|^2}, \quad
\beta_k^{PR} = \frac{g_k^T y_{k-1}}{\|g_{k-1}\|^2}, \quad
\beta_k^{LS} = -\frac{g_k^T y_{k-1}}{d_{k-1}^T g_{k-1}}, \quad
\beta_k^{DY} = \frac{\|g_k\|^2}{d_{k-1}^T y_{k-1}}, \quad
\beta_k^{CD} = -\frac{\|g_k\|^2}{d_{k-1}^T g_{k-1}}, \tag{4}
\]
respectively, where $s_{k-1} = x_k - x_{k-1}$, $y_{k-1} = g_k - g_{k-1}$, and $\|\cdot\|$ denotes the Euclidean norm. If $f$ is a strictly convex quadratic function and the performed line search is exact, all these methods are equivalent; for a general function, however, different choices of $\beta_k$ give rise to distinct conjugate gradient methods with quite different computational efficiency and convergence properties. We refer to the books [7, 8], the survey paper [9], and the references therein for the numerical performance and convergence properties of conjugate gradient methods. During the last decade, much effort has been devoted to developing new conjugate gradient methods which are not only globally convergent for general functions but also computationally superior to the classical methods; these can be grouped into two classes.
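Before turning to these two classes, and purely for concreteness, the classical update parameters in (4) can be written directly in code. The following Python sketch collects them as plain NumPy expressions; the function and variable names are ours and are not part of the paper.

```python
import numpy as np

def beta_classical(g, g_prev, d_prev, rule="HS"):
    """Classical conjugate gradient parameters beta_k of (4).

    g, g_prev : gradients at x_k and x_{k-1}
    d_prev    : previous search direction d_{k-1}
    """
    y = g - g_prev                      # y_{k-1} = g_k - g_{k-1}
    if rule == "HS":
        return g @ y / (d_prev @ y)
    if rule == "FR":
        return g @ g / (g_prev @ g_prev)
    if rule == "PR":
        return g @ y / (g_prev @ g_prev)
    if rule == "LS":
        return -(g @ y) / (d_prev @ g_prev)
    if rule == "DY":
        return g @ g / (d_prev @ y)
    if rule == "CD":
        return -(g @ g) / (d_prev @ g_prev)
    raise ValueError(f"unknown rule: {rule}")
```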

The first class utilizes second-order information of the objective function to improve the efficiency and robustness of conjugate gradient methods. Dai and Liao [10] proposed a new conjugate gradient method by exploiting a new conjugacy condition based on the standard secant equation, in which $\beta_k$ in (3) is defined by
\[
\beta_k^{DL} = \frac{g_k^T \left(y_{k-1} - t s_{k-1}\right)}{d_{k-1}^T y_{k-1}}, \tag{5}
\]
where $t \ge 0$ is a scalar. Moreover, Dai and Liao also suggested a modification of (5) from the viewpoint of global convergence for general functions, by restricting the first term to be nonnegative, namely,
\[
\beta_k^{DL+} = \max\left\{ \frac{g_k^T y_{k-1}}{d_{k-1}^T y_{k-1}}, 0 \right\} - t\,\frac{g_k^T s_{k-1}}{d_{k-1}^T y_{k-1}}. \tag{6}
\]
Along this line, many researchers [11–15] proposed variants of the Dai-Liao method based on modified secant conditions with higher orders of accuracy in the approximation of the curvature. Under proper conditions, these methods are globally convergent and are sometimes competitive with classical conjugate gradient methods. However, these methods are not guaranteed to generate descent directions; therefore, the descent condition is usually assumed in their analysis and implementations.
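A corresponding sketch of the Dai-Liao parameters (5) and (6) is given below; the default value t = 0.1 is only illustrative and is not a value prescribed by the paper.

```python
import numpy as np

def beta_dai_liao(g, g_prev, d_prev, s_prev, t=0.1, nonneg=True):
    """Dai-Liao parameter (5) and its nonnegative restriction (6).

    s_prev = x_k - x_{k-1}; t >= 0 is the Dai-Liao scalar (illustrative
    default only).
    """
    y = g - g_prev
    denom = d_prev @ y
    if nonneg:                              # beta^{DL+} of (6)
        return max(g @ y / denom, 0.0) - t * (g @ s_prev) / denom
    return g @ (y - t * s_prev) / denom     # beta^{DL} of (5)
```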

The second class focuses on generating conjugate gradient methods which ensure sufficient descent independently of the accuracy of the line search. On the basis of this idea, Hager and Zhang [16] modified the parameter $\beta_k$ in (3) and proposed a new conjugate gradient method, called CG-DESCENT, in which the update parameter is defined as follows:
\[
\beta_k^{HZ} = \max\left\{ \beta_k^N, \eta_k \right\}, \tag{7}
\]
where
\[
\beta_k^N = \frac{g_k^T y_{k-1}}{d_{k-1}^T y_{k-1}} - 2\,\frac{\|y_{k-1}\|^2\, g_k^T d_{k-1}}{\left(d_{k-1}^T y_{k-1}\right)^2}, \qquad
\eta_k = -\frac{1}{\|d_{k-1}\| \min\left\{\eta, \|g_{k-1}\|\right\}}, \tag{8}
\]
and $\eta > 0$ is a constant. An important feature of the CG-DESCENT method is that the generated direction satisfies $d_k^T g_k \le -(7/8)\|g_k\|^2$. Moreover, Hager and Zhang [16] established that CG-DESCENT is globally convergent for general functions under the Wolfe line search conditions.
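The CG-DESCENT parameter (7)-(8) can be sketched in the same way; here eta = 0.01 follows the default in Hager and Zhang's code, and the indexing of the truncation term follows our reconstruction above, so this is an assumption rather than a transcription of their implementation.

```python
import numpy as np

def beta_hager_zhang(g, g_prev, d_prev, eta=0.01):
    """CG-DESCENT update parameter (7)-(8), written in the paper's indexing."""
    y = g - g_prev
    dy = d_prev @ y
    beta_n = g @ y / dy - 2.0 * (y @ y) * (g @ d_prev) / dy**2
    eta_k = -1.0 / (np.linalg.norm(d_prev) * min(eta, np.linalg.norm(g_prev)))
    return max(beta_n, eta_k)
```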

Quite recently, Zhang et al. [17] considered a different approach, modifying the search direction so that the generated direction satisfies $g_k^T d_k = -\|g_k\|^2$, independently of the line search used. More analytically, they proposed a modified FR method in which the search direction is given by
\[
d_k = -\left(1 + \beta_k^{FR}\,\frac{g_k^T d_{k-1}}{\|g_k\|^2}\right) g_k + \beta_k^{FR} d_{k-1}. \tag{9}
\]
This method reduces to the standard FR method in case the performed line search is exact. Furthermore, in case $\beta_k$ is specified by another existing conjugate gradient formula, the property $g_k^T d_k = -\|g_k\|^2$ is still satisfied. Along this line, many related conjugate gradient methods have been extensively studied [17–24] with strong convergence properties and good average performance.

In this work, we propose a new conjugate gradient method which combines the characteristics of the two previously discussed classes. More analytically, our method ensures sufficient descent independently of the accuracy of the line search and achieves a high order of accuracy in approximating the second-order curvature information of the objective function by utilizing the modified secant condition proposed by Babaie-Kafaki et al. [11]. Under mild conditions, we establish the global convergence of our proposed method. The numerical experiments indicate that the proposed method is promising.

The remainder of this paper is organized as follows. In Section 2, we present our motivation and our proposed conjugate gradient method. In Section 3, we present the global convergence analysis of our method. The numerical experiments are reported in Section 4 using the performance profiles of Dolan and Moré [25]. Finally, Section 5 presents our concluding remarks and our proposals for future research.

Throughout this paper, we denote $f(x_k)$ and $\nabla^2 f(x_k)$ by $f_k$ and $G_k$, respectively.

2. Algorithm

Firstly, we recall that in quasi-Newton methods an approximation matrix $B_{k-1}$ to the Hessian $\nabla^2 f_{k-1}$ is updated so that the new matrix $B_k$ satisfies the following secant condition:
\[
B_k s_{k-1} = y_{k-1}. \tag{10}
\]
Zhang et al. [26] and Zhang and Xu [27] extended this condition and derived a class of modified secant conditions with a vector parameter, of the form
\[
B_k s_{k-1} = \tilde{y}_{k-1}, \qquad \tilde{y}_{k-1} = y_{k-1} + \frac{\theta_{k-1}}{s_{k-1}^T u}\,u, \tag{11}
\]
where $u$ is any vector satisfying $s_{k-1}^T u > 0$, and $\theta_{k-1}$ is defined by
\[
\theta_{k-1} = 6\left(f_{k-1} - f_k\right) + 3\left(g_k + g_{k-1}\right)^T s_{k-1}. \tag{12}
\]
Observe that this new quasi-Newton equation contains not only gradient information but also function value information at the present and the previous iterates. Moreover, in [26], Zhang et al. proved that if $\|s_{k-1}\|$ is sufficiently small, then
\[
s_{k-1}^T\left(G_k s_{k-1} - y_{k-1}\right) = O\left(\|s_{k-1}\|^3\right), \qquad
s_{k-1}^T\left(G_k s_{k-1} - \tilde{y}_{k-1}\right) = O\left(\|s_{k-1}\|^4\right). \tag{13}
\]
Clearly, these estimates imply that the quantity $s_{k-1}^T \tilde{y}_{k-1}$ approximates the second-order curvature $s_{k-1}^T G_k s_{k-1}$ with higher precision than the quantity $s_{k-1}^T y_{k-1}$ does.
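A small sketch of the modified secant right-hand side (11)-(12) is given below; taking u = s_{k-1} is one admissible choice (and the choice used later in the numerical section), and all names are ours.

```python
import numpy as np

def modified_secant_rhs(s, y, f_prev, f_curr, g_prev, g_curr, u=None):
    """Right-hand side y~_{k-1} of the modified secant condition (11)-(12).

    s = x_k - x_{k-1}, y = g_k - g_{k-1}; u defaults to s.
    """
    if u is None:
        u = s
    theta = 6.0 * (f_prev - f_curr) + 3.0 * (g_curr + g_prev) @ s   # (12)
    return y + (theta / (s @ u)) * u                                 # (11)
```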

However, for values of $\|s_{k-1}\|$ greater than one (i.e., $\|s_{k-1}\| > 1$), the standard secant equation (10) is expected to be more accurate than the modified secant equation (11). To overcome this difficulty, Babaie-Kafaki et al. [11] recently considered an extension of the modified secant condition (11) as follows:
\[
B_k s_{k-1} = \tilde{y}^*_{k-1}, \qquad \tilde{y}^*_{k-1} = y_{k-1} + \rho_{k-1}\,\frac{\max\left\{\theta_{k-1}, 0\right\}}{s_{k-1}^T u}\,u, \tag{14}
\]
where the parameter $\rho_{k-1} \in \{0,1\}$ adaptively switches between the standard secant equation (10) and the modified secant equation (11), with $\rho_{k-1} = 1$ if $\|s_{k-1}\| \le 1$ and $\rho_{k-1} = 0$ otherwise. In the same way as Dai and Liao [10], they obtained an expression for $\beta_k$, in the form
\[
\tilde{\beta}_k = \frac{g_k^T \left(\tilde{y}^*_{k-1} - t s_{k-1}\right)}{d_{k-1}^T \tilde{y}^*_{k-1}}, \tag{15}
\]
where $t \ge 0$ and $\tilde{y}^*_{k-1}$ is defined by (14). Furthermore, following Dai and Liao's approach, in order to ensure global convergence for general functions, they modified formula (15) as follows:
\[
\tilde{\beta}^*_k = \max\left\{ \frac{g_k^T \tilde{y}^*_{k-1}}{d_{k-1}^T \tilde{y}^*_{k-1}}, 0 \right\} - t\,\frac{g_k^T s_{k-1}}{d_{k-1}^T \tilde{y}^*_{k-1}}. \tag{16}
\]

Motivated by the theoretical advantages of the modified secant equation (14) and the technique of the modified FR method [17], we propose a new conjugate gradient method as follows. Let the search direction be defined by
\[
d_k = -\left(1 + \beta_k\,\frac{g_k^T d_{k-1}}{\|g_k\|^2}\right) g_k + \beta_k d_{k-1}, \tag{17}
\]
where $\beta_k$ is defined by (15). It is easy to see that the sufficient descent condition
\[
g_k^T d_k = -\|g_k\|^2 \tag{18}
\]
holds for any line search. Moreover, if $f$ is a convex quadratic function and the performed line search is exact, then $\theta_{k-1} = 0$, $\tilde{y}^*_{k-1} = y_{k-1}$, and $g_k^T s_{k-1} = 0$; hence, the conjugate gradient method (2)-(17) reduces to the standard conjugate gradient method.
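The following sketch assembles the proposed direction (17) from (12), (14), and (15) and checks the sufficient descent property (18) numerically; all names, the default t, and the choice u = s_{k-1} are illustrative assumptions.

```python
import numpy as np

def mdl_direction(g, g_prev, d_prev, s_prev, f_prev, f_curr, t=0.1, u=None):
    """Sketch of the proposed search direction (17) with beta from (14)-(15)."""
    if u is None:
        u = s_prev
    y = g - g_prev
    theta = 6.0 * (f_prev - f_curr) + 3.0 * (g + g_prev) @ s_prev     # (12)
    rho = 1.0 if np.linalg.norm(s_prev) <= 1.0 else 0.0
    y_star = y + rho * max(theta, 0.0) / (s_prev @ u) * u             # (14)
    beta = g @ (y_star - t * s_prev) / (d_prev @ y_star)              # (15)
    d = -(1.0 + beta * (g @ d_prev) / (g @ g)) * g + beta * d_prev    # (17)
    # sufficient descent (18): g^T d = -||g||^2 up to rounding error
    assert np.isclose(g @ d, -(g @ g))
    return d
```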

Now, based on the above discussion, we present our proposed algorithm, called the modified Dai-Liao conjugate gradient algorithm (MDL-CG), which is stated as Algorithm 1 below.

3. Convergence Analysis

In order to establish the global convergence analysis, we make the following assumptions for the objective function 𝑓.

Assumption 1. The level set $\mathcal{L} = \{x \in \mathbb{R}^n \mid f(x) \le f(x_0)\}$ is bounded; namely, there exists a positive constant $B > 0$ such that
\[
\|x\| \le B, \quad \forall x \in \mathcal{L}. \tag{19}
\]

Assumption 2. In some neighborhood $\mathcal{N}$ of $\mathcal{L}$, $f$ is differentiable and its gradient $g$ is Lipschitz continuous; namely, there exists a positive constant $L > 0$ such that
\[
\|g(x) - g(y)\| \le L \|x - y\|, \quad \forall x, y \in \mathcal{N}. \tag{20}
\]
It follows directly from Assumptions 1 and 2 that there exists a positive constant $M > 0$ such that
\[
\|g(x)\| \le M, \quad \forall x \in \mathcal{L}. \tag{21}
\]
In order to guarantee the global convergence of Algorithm 1, we require the steplength $\alpha_k$ to satisfy either the Armijo condition or the Wolfe conditions. The Armijo line search determines a steplength $\alpha_k = \max\{\lambda^j : j = 0, 1, 2, \ldots\}$ such that
\[
f\left(x_k + \alpha_k d_k\right) - f\left(x_k\right) \le \delta \alpha_k g_k^T d_k, \tag{22}
\]
where $\delta, \lambda \in (0,1)$ are constants. In the Wolfe line search, the steplength $\alpha_k$ satisfies
\[
f\left(x_k + \alpha_k d_k\right) - f\left(x_k\right) \le \sigma_1 \alpha_k g_k^T d_k, \tag{23}
\]
\[
g\left(x_k + \alpha_k d_k\right)^T d_k \ge \sigma_2 g_k^T d_k, \tag{24}
\]
with $0 < \sigma_1 < \sigma_2 < 1$.
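For reference, a minimal backtracking implementation of the Armijo rule (22) might look as follows; delta and lam are illustrative defaults, not values prescribed by the paper.

```python
def armijo_backtracking(f, grad_f, x, d, delta=1e-4, lam=0.5, max_iter=60):
    """Return the largest alpha in {lam^j} satisfying the Armijo condition (22)."""
    fx = f(x)
    gTd = grad_f(x) @ d
    alpha = 1.0
    for _ in range(max_iter):
        if f(x + alpha * d) - fx <= delta * alpha * gTd:
            return alpha
        alpha *= lam
    return alpha
```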

Algorithm 1 (MDL-CG).
1. Choose an initial point $x_0 \in \mathbb{R}^n$; set $k = 0$.
2. If $\|g_k\| = 0$, then terminate; otherwise, go to the next step.
3. Compute the descent direction $d_k$ by (15) and (17).
4. Determine a stepsize $\alpha_k$ by some line search rule.
5. Let $x_{k+1} = x_k + \alpha_k d_k$.
6. Set $k = k + 1$ and go to Step 2.
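A minimal, self-contained Python sketch of Algorithm MDL-CG is given below. It uses the Armijo backtracking rule (22) in Step 4 for simplicity (the experiments in Section 4 instead use the Wolfe line search of Hager and Zhang), takes u = s_{k-1}, and all parameter names and defaults are ours.

```python
import numpy as np

def mdl_cg(f, grad_f, x0, t=0.1, tol=1e-6, max_iter=10000, delta=1e-4, lam=0.5):
    """Sketch of Algorithm MDL-CG (Steps 1-6 above) with Armijo backtracking."""
    x = np.asarray(x0, dtype=float)
    g, fx = grad_f(x), f(x)
    d = -g                                              # Step 1: d_0 = -g_0
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:                    # Step 2
            break
        alpha, gTd = 1.0, g @ d                         # Step 4: Armijo rule (22)
        while f(x + alpha * d) - fx > delta * alpha * gTd and alpha > 1e-20:
            alpha *= lam
        x_new = x + alpha * d                           # Step 5
        g_new, f_new = grad_f(x_new), f(x_new)
        # Step 3 (for the next iteration): direction via (12), (14), (15), (17)
        s, y = x_new - x, g_new - g
        theta = 6.0 * (fx - f_new) + 3.0 * (g_new + g) @ s
        rho = 1.0 if np.linalg.norm(s) <= 1.0 else 0.0
        y_star = y + rho * max(theta, 0.0) / (s @ s) * s      # u = s_{k-1}
        beta = g_new @ (y_star - t * s) / (d @ y_star)        # (15)
        d = -(1.0 + beta * (g_new @ d) / (g_new @ g_new)) * g_new + beta * d
        x, g, fx = x_new, g_new, f_new                  # Step 6
    return x, fx

# usage sketch on a simple convex quadratic
if __name__ == "__main__":
    A = np.diag(np.arange(1.0, 11.0))
    f = lambda x: 0.5 * x @ A @ x
    grad_f = lambda x: A @ x
    x_star, f_star = mdl_cg(f, grad_f, np.ones(10))
    print(np.linalg.norm(grad_f(x_star)), f_star)
```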

Next, we present some lemmas which are very important for the global convergence analysis.

Lemma 1 (see [11]). Suppose that Assumptions 1 and 2 hold. For $\theta_{k-1}$ and $\tilde{y}^*_{k-1}$ defined by (12) and (14), respectively, one has
\[
\left|\theta_{k-1}\right| \le 3L \|s_{k-1}\|^2, \qquad \left\|\tilde{y}^*_{k-1}\right\| \le 4L \|s_{k-1}\|. \tag{25}
\]

Lemma 2. Suppose that Assumptions 1 and 2 hold. Let $\{x_k\}$ be generated by Algorithm MDL-CG, where the line search satisfies the Armijo condition (22). Then there exists a positive constant $c > 0$ such that
\[
\alpha_k \ge c\,\frac{\|g_k\|^2}{\|d_k\|^2}, \tag{26}
\]
for all $k \ge 0$.

Proof. From the Armijo condition (22) and Assumptions 1 and 2, we have
\[
\sum_{k \ge 0} -\delta \alpha_k d_k^T g_k < +\infty. \tag{27}
\]
Using this together with inequality (18) implies that
\[
\sum_{k \ge 0} \alpha_k \|g_k\|^2 = -\sum_{k \ge 0} \alpha_k d_k^T g_k < +\infty. \tag{28}
\]
We now prove (26) by considering the following cases.
Case 1. If $\alpha_k = 1$, it follows from (18) and the Cauchy-Schwarz inequality that $\|d_k\| \ge \|g_k\|$. In this case, inequality (26) is satisfied with $c = 1$.
Case 2. If $\alpha_k < 1$, then by the line search rule, $\lambda^{-1}\alpha_k$ does not satisfy (22). This implies
\[
f\left(x_k + \lambda^{-1}\alpha_k d_k\right) - f\left(x_k\right) > \delta \lambda^{-1} \alpha_k g_k^T d_k. \tag{29}
\]
By the mean-value theorem and Assumptions 1 and 2, we get
\[
f\left(x_k + \lambda^{-1}\alpha_k d_k\right) - f\left(x_k\right) \le \lambda^{-1}\alpha_k g_k^T d_k + L \lambda^{-2} \alpha_k^2 \|d_k\|^2. \tag{30}
\]
Using this inequality together with (18) and (29), we have
\[
\alpha_k \ge -\frac{(1-\delta)\lambda}{L}\,\frac{g_k^T d_k}{\|d_k\|^2} = \frac{(1-\delta)\lambda}{L}\,\frac{\|g_k\|^2}{\|d_k\|^2}. \tag{31}
\]
Letting $c = \min\{1, (1-\delta)\lambda / L\}$, we obtain (26), which completes the proof.

From inequalities (26) and (28), we can easily obtain the following lemma.

Lemma 3. Suppose that Assumptions 1 and 2 hold. Let $\{x_k\}$ be generated by Algorithm MDL-CG, where the line search satisfies the Armijo condition (22). Then
\[
\sum_{k \ge 0} \frac{\|g_k\|^4}{\|d_k\|^2} < +\infty. \tag{32}
\]

Next, we establish the global convergence theorem for Algorithm MDL-CG for uniformly convex functions.

Theorem 1. Suppose that Assumptions 1 and 2 hold and that $f$ is uniformly convex on $\mathcal{L}$; namely, there exists a positive constant $\gamma > 0$ such that
\[
\gamma \|x - y\|^2 \le \left(\nabla f(x) - \nabla f(y)\right)^T (x - y), \quad \forall x, y \in \mathcal{L}. \tag{33}
\]
Let $\{x_k\}$ and $\{d_k\}$ be generated by Algorithm MDL-CG and let $\alpha_k$ satisfy the Armijo condition (22). Then one has either $g_k = 0$ for some $k$, or
\[
\lim_{k \to \infty} \|g_k\| = 0. \tag{34}
\]

Proof. Suppose that $g_k \neq 0$ for all $k \ge 0$. By the convexity assumption (33), we have
\[
d_{k-1}^T \tilde{y}^*_{k-1} \ge d_{k-1}^T y_{k-1} \ge \gamma \alpha_{k-1}^{-1} \|s_{k-1}\|^2. \tag{35}
\]
Combining the previous relation with Lemma 1, we obtain
\[
\left|\tilde{\beta}_k\right| = \left| \frac{g_k^T \left(\tilde{y}^*_{k-1} - t s_{k-1}\right)}{d_{k-1}^T \tilde{y}^*_{k-1}} \right|
\le \frac{\|g_k\| \left( \left\|\tilde{y}^*_{k-1}\right\| + t \|s_{k-1}\| \right)}{\left|d_{k-1}^T \tilde{y}^*_{k-1}\right|}
\le \frac{4L + t}{\gamma}\,\frac{\|g_k\|}{\|d_{k-1}\|}. \tag{36}
\]
Therefore, by the definition of the search direction $d_k$ in (17) together with the previous inequality, we obtain an upper bound for $\|d_k\|$:
\[
\|d_k\| \le \|g_k\| + \left|\tilde{\beta}_k\right| \frac{\left|g_k^T d_{k-1}\right|}{\|g_k\|^2}\,\|g_k\| + \left|\tilde{\beta}_k\right| \|d_{k-1}\|
\le \|g_k\| + 2 \left|\tilde{\beta}_k\right| \|d_{k-1}\|
\le \left(1 + 2\,\frac{4L + t}{\gamma}\right) \|g_k\|. \tag{37}
\]
Inserting this upper bound for $\|d_k\|$ into (32) yields $\sum_{k \ge 0} \|g_k\|^2 < +\infty$, which completes the proof.

For simplicity, when the update parameter $\beta_k$ in Algorithm 1 is computed by (16), we refer to the method as Algorithm MDL+-CG. In the following, we show that Algorithm MDL+-CG is globally convergent for general nonlinear functions under the Wolfe line search conditions (23) and (24). In the rest of this section, we proceed by assuming that convergence does not occur, which implies that there exists a positive constant $\mu > 0$ such that
\[
\|g_k\| \ge \mu, \quad \forall k \ge 0. \tag{38}
\]

Lemma 4. Suppose that Assumptions 1 and 2 hold. Let $\{x_k\}$ and $\{d_k\}$ be generated by Algorithm MDL+-CG and let $\alpha_k$ be obtained by the Wolfe line search (23) and (24). Then there exist positive constants $C_1$ and $C_2$ such that, for all $k \ge 1$,
\[
\left|\tilde{\beta}^*_k\right| \le C_1 \|s_{k-1}\|, \tag{39}
\]
\[
\left|\tilde{\beta}^*_k\right| \frac{\left|g_k^T d_{k-1}\right|}{\|g_k\|^2} \le C_2 \|s_{k-1}\|. \tag{40}
\]

Proof. From (14), (18), and (24), we have
\[
d_{k-1}^T \tilde{y}^*_{k-1} \ge d_{k-1}^T y_{k-1} \ge \left(\sigma_2 - 1\right) g_{k-1}^T d_{k-1} = \left(1 - \sigma_2\right) \|g_{k-1}\|^2. \tag{41}
\]
Utilizing this together with Lemma 1, Assumption 2, and relations (21) and (38), we have
\[
\left|\tilde{\beta}^*_k\right| \le \left| \frac{g_k^T \tilde{y}^*_{k-1}}{d_{k-1}^T \tilde{y}^*_{k-1}} \right| + t \left| \frac{g_k^T s_{k-1}}{d_{k-1}^T \tilde{y}^*_{k-1}} \right|
\le \frac{4ML\|s_{k-1}\|}{\left(1-\sigma_2\right)\mu^2} + \frac{tM\|s_{k-1}\|}{\left(1-\sigma_2\right)\mu^2}
= \frac{(4L + t)M}{\left(1-\sigma_2\right)\mu^2}\,\|s_{k-1}\|. \tag{42}
\]
Letting $C_1 = (4L+t)M / \left(\left(1-\sigma_2\right)\mu^2\right)$, relation (39) is satisfied. Furthermore, by the Wolfe condition (24), we have
\[
g_k^T d_{k-1} \ge \sigma_2 g_{k-1}^T d_{k-1} \ge -\sigma_2 d_{k-1}^T \tilde{y}^*_{k-1} + \sigma_2 g_k^T d_{k-1}. \tag{43}
\]
Also, observe that
\[
g_k^T d_{k-1} = d_{k-1}^T y_{k-1} + g_{k-1}^T d_{k-1} \le d_{k-1}^T \tilde{y}^*_{k-1}. \tag{44}
\]
By rearranging inequality (43), we obtain $g_k^T d_{k-1} \ge -\frac{\sigma_2}{1-\sigma_2}\, d_{k-1}^T \tilde{y}^*_{k-1}$, and together with (44), we obtain
\[
\left| \frac{g_k^T d_{k-1}}{d_{k-1}^T \tilde{y}^*_{k-1}} \right| \le \max\left\{ \frac{\sigma_2}{1-\sigma_2},\, 1 \right\}. \tag{45}
\]
It follows from Assumption 1 and relations (18), (24), (38), and (39) that
\[
\left|\tilde{\beta}^*_k\right| \frac{\left|g_k^T d_{k-1}\right|}{\|g_k\|^2}
\le \frac{\|g_k\|\left(\left\|\tilde{y}^*_{k-1}\right\| + t\|s_{k-1}\|\right)}{\left|d_{k-1}^T \tilde{y}^*_{k-1}\right|}\,\frac{\left|g_k^T d_{k-1}\right|}{\|g_k\|^2}
\le \frac{4L + t}{\mu}\,\max\left\{\frac{\sigma_2}{1-\sigma_2},\,1\right\}\|s_{k-1}\|. \tag{46}
\]
Letting $C_2 = \frac{4L+t}{\mu}\max\left\{\frac{\sigma_2}{1-\sigma_2}, 1\right\}$, we obtain (40).

Next, we present a lemma which shows that, asymptotically, the search directions change slowly.

Lemma 5. Suppose that Assumptions 1 and 2 hold. Let $\{x_k\}$ and $\{d_k\}$ be generated by Algorithm MDL+-CG and let $\alpha_k$ be obtained by the Wolfe line search (23) and (24). Then $d_k \neq 0$ and
\[
\sum_{k \ge 1} \left\|w_k - w_{k-1}\right\|^2 < \infty, \tag{47}
\]
where $w_k = d_k / \|d_k\|$.

Proof. Firstly, note that $d_k \neq 0$, for otherwise (18) would imply $g_k = 0$. Therefore, $w_k$ is well defined. Next, we split the formula for $\tilde{\beta}^*_k$ into two parts as follows:
\[
\tilde{\beta}^*_k = \tilde{\beta}_k^{(1)} + \tilde{\beta}_k^{(2)}, \tag{48}
\]
where
\[
\tilde{\beta}_k^{(1)} = \max\left\{ \frac{g_k^T \tilde{y}^*_{k-1}}{d_{k-1}^T \tilde{y}^*_{k-1}},\, 0 \right\}, \qquad
\tilde{\beta}_k^{(2)} = -t\,\frac{g_k^T s_{k-1}}{d_{k-1}^T \tilde{y}^*_{k-1}}. \tag{49}
\]
Moreover, let us define a vector $r_k$ and a scalar $\delta_k$ by
\[
r_k := \frac{v_k}{\|d_k\|}, \qquad \delta_k := \tilde{\beta}_k^{(1)}\,\frac{\|d_{k-1}\|}{\|d_k\|}, \tag{50}
\]
where
\[
v_k = -\left(1 + \tilde{\beta}^*_k\,\frac{g_k^T d_{k-1}}{\|g_k\|^2}\right) g_k + \tilde{\beta}_k^{(2)} d_{k-1}. \tag{51}
\]
Therefore, from (17), for $k \ge 1$, we obtain
\[
w_k = r_k + \delta_k w_{k-1}. \tag{52}
\]
Using this relation together with the identity $\|w_k\| = \|w_{k-1}\| = 1$, we have
\[
\|r_k\| = \left\|w_k - \delta_k w_{k-1}\right\| = \left\|w_{k-1} - \delta_k w_k\right\|. \tag{53}
\]
In addition, using this with the condition $\delta_k \ge 0$ and the triangle inequality, we get
\[
\left\|w_k - w_{k-1}\right\| \le \left\|w_k - \delta_k w_{k-1}\right\| + \left\|w_{k-1} - \delta_k w_k\right\| = 2\|r_k\|. \tag{54}
\]
Now, we evaluate the quantity $v_k$. It follows from the definition of $v_k$ in (51) and relations (21), (39), (40), and (45) that there exists a positive constant $D > 0$ such that
\[
\begin{aligned}
\|v_k\| &= \left\| -\left(1 + \tilde{\beta}^*_k\,\frac{g_k^T d_{k-1}}{\|g_k\|^2}\right) g_k + \tilde{\beta}_k^{(2)} d_{k-1} \right\|
\le \left(1 + \left|\tilde{\beta}^*_k\right| \frac{\left|g_k^T d_{k-1}\right|}{\|g_k\|^2}\right) \|g_k\| + t \left| \frac{g_k^T s_{k-1}}{d_{k-1}^T \tilde{y}^*_{k-1}} \right| \|d_{k-1}\| \\
&= \left(1 + \left|\tilde{\beta}^*_k\right| \frac{\left|g_k^T d_{k-1}\right|}{\|g_k\|^2}\right) \|g_k\| + t \left| \frac{g_k^T d_{k-1}}{d_{k-1}^T \tilde{y}^*_{k-1}} \right| \|s_{k-1}\|
\le \left(1 + 2 C_2 B\right) M + 2tB \max\left\{\frac{\sigma_2}{1-\sigma_2},\,1\right\} \triangleq D.
\end{aligned} \tag{55}
\]
From the previous relation and Lemma 3, we obtain
\[
\sum_{k \ge 0} \|r_k\|^2 = \sum_{k \ge 0} \frac{\|v_k\|^2}{\|d_k\|^2} = \sum_{k \ge 0} \frac{\|v_k\|^2}{\|g_k\|^4}\,\frac{\|g_k\|^4}{\|d_k\|^2} \le \frac{D^2}{\mu^4} \sum_{k \ge 0} \frac{\|g_k\|^4}{\|d_k\|^2} < +\infty. \tag{56}
\]
Therefore, using this together with (54), we complete the proof.

Let $\mathbb{Z}^+$ denote the set of positive integers. For $\lambda > 0$ and a positive integer $\Delta$, we define the set
\[
\mathcal{K}^\lambda_{k,\Delta} := \left\{ i \in \mathbb{Z}^+ \mid k \le i \le k + \Delta - 1,\ \|s_{i-1}\| > \lambda \right\}. \tag{57}
\]
Let $|\mathcal{K}^\lambda_{k,\Delta}|$ denote the number of elements in $\mathcal{K}^\lambda_{k,\Delta}$. The following lemma shows that if the gradients are bounded away from zero and Lemma 4 holds, then a certain fraction of the steps cannot be too small. This lemma is analogous to Lemma 3.5 in [10] and Lemma 4.2 in [28].

Lemma 6. Suppose that all assumptions of Lemma 5 hold. Then there exists a constant $\lambda > 0$ such that, for any $\Delta \in \mathbb{Z}^+$ and any index $k_0$, there exists an index $k \ge k_0$ such that
\[
\left|\mathcal{K}^\lambda_{k,\Delta}\right| > \frac{\Delta}{2}. \tag{58}
\]

Next, making use of Lemmas 4, 5, and 6, we can establish the global convergence theorem for Algorithm MDL+-CG under the Wolfe line search for general functions. Its proof is similar to that of Theorem 3.6 in [10] and Theorem 4.3 in [28]; thus, we omit it.

Theorem 2. Suppose that Assumptions 1 and 2 hold. If $\{x_k\}$ is generated by Algorithm MDL+-CG and $\alpha_k$ is obtained by the Wolfe line search (23) and (24), then one has either $g_k = 0$ for some $k$, or
\[
\liminf_{k \to \infty} \|g_k\| = 0. \tag{59}
\]

4. Numerical Experiments

In this section, we report numerical experiments performed on a set of 73 unconstrained optimization problems. These test problems, with the given initial points, can be found on Andrei Neculai's web site (http://camo.ici.ro/neculai/SCALCG/testuo.pdf). For each test function, experiments were carried out with 1000, 5000, and 10000 variables, respectively.

We compare the performance of our proposed conjugate gradient method MDL+-CG with that of the CG-DESCENT method [16]. The CG-DESCENT code, coauthored by Hager and Zhang, was obtained from Hager's web page (http://www.math.ufl.edu/~hager/papers/CG/). The implementation codes were written in Fortran and compiled with the Intel Fortran compiler ifort (with compiler settings -O2 -double-size 128) on a PC (2.66 GHz Quad-Core processor, 4 Gbyte RAM) running the Linux operating system. All algorithms were implemented with the Wolfe line search proposed by Hager and Zhang [16], and the parameters were set to their default values. In our experiments, the termination criterion was $\|g_k\| \le 10^{-6}$, and we set $u = s_{k-1}$ as in [11]. In the sequel, we focus our experimental analysis on the best value of the parameter $t$; hence, we tested values of $t$ ranging from 0 to 1 in steps of 0.005. The detailed numerical results can be found at the web site http://www.math.upatras.gr/~livieris/Results/MDL_results.zip.

Figure 1 presents the percentage of the test problems that were successfully solved by Algorithm MDL+-CG for each choice of the parameter $t$, and Figure 2 presents the multigraph of means with respect to function and gradient evaluations. Algorithm MDL+-CG reports the best results relative to the success rate for choices of $t$ in the interval [0.785, 0.825]. Moreover, for $t = 0.82$, Algorithm MDL+-CG attains the highest success rate (98.64%), solving 216 out of 219 test problems successfully. Clearly, Figures 1 and 2 indicate that the influence of the parameter $t$ on the computational cost is more pronounced; hence, we focus our attention on the function and gradient evaluation metrics. Based on the experimental results on this limited test set, we conjecture that the optimal parameter $t$ with respect to computational cost belongs to the interval [0.02, 0.115]. Notice that for $t = 0.07$, Algorithm MDL+-CG exhibits the least mean number of function and gradient evaluations. Furthermore, for values of $t$ in the intervals [0.32, 0.42], [0.61, 0.70], and [0.95, 1], Algorithm MDL+-CG exhibits its worst performance with respect to computational cost. It is worth noticing that the choice $t = 0.995$ exhibits the worst performance in terms of both computational cost and success rate.

We conclude our analysis by considering the performance profiles of Dolan and Moré [25] for the worst and the best choices of the parameter $t$. Performance profiles provide a wealth of information, such as solver efficiency, robustness, and probability of success, in compact form; they also eliminate the influence of a small number of problems on the benchmarking process and the sensitivity of the results to the ranking of solvers [25]. The performance profile plots the fraction $P$ of problems for which any given method is within a factor $\tau$ of the best method. The horizontal axis of each figure gives the percentage of the test problems for which a method is the fastest (efficiency), while the vertical axis gives the percentage of the test problems that were successfully solved by each method (robustness). The curves in Figures 3 and 4 have the following meaning:
(i) "CG-DESCENT" stands for the CG-DESCENT method;
(ii) "MDL+1" stands for Algorithm MDL+-CG with $t = 0.07$;
(iii) "MDL+2" stands for Algorithm MDL+-CG with $t = 0.82$;
(iv) "MDL+3" stands for Algorithm MDL+-CG with $t = 0.995$.
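For readers who wish to reproduce such plots, a generic sketch of how Dolan-Moré performance profile values can be computed is given below; this is not the authors' benchmarking code, and the timing data in the usage example are made up.

```python
import numpy as np

def performance_profile(T, taus):
    """Dolan-More performance profile values.

    T is an (n_problems, n_solvers) array of costs (e.g., CPU time), with
    np.inf marking failures; the result P has shape (len(taus), n_solvers),
    where P[i, s] is the fraction of problems solved by solver s within a
    factor taus[i] of the best solver on each problem.
    """
    best = np.min(T, axis=1, keepdims=True)   # best cost per problem
    ratios = T / best                         # performance ratios r_{p,s}
    return np.array([(ratios <= tau).mean(axis=0) for tau in taus])

# usage sketch with made-up timings for three solvers on four problems
T = np.array([[1.0, 1.2, 2.0],
              [3.0, 2.5, np.inf],
              [0.5, 0.9, 0.7],
              [2.0, 2.0, 4.0]])
print(performance_profile(T, taus=[1.0, 2.0, 4.0]))
```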

Figures 3–5 present the performance profiles of CG-DESCENT, MDL+1, MDL+2, and MDL+3 relative to the function evaluations, gradient evaluations, and CPU time (in seconds), respectively. Clearly, MDL+1 exhibits the best overall performance, significantly outperforming the other conjugate gradient methods with respect to all performance metrics. More analytically, MDL+1 solves about 64.4% and 66.2% of the test problems with the least number of function and gradient evaluations, respectively, while CG-DESCENT solves about 48.8% and 47% in the same situations. Moreover, MDL+2 is more efficient than CG-DESCENT, since it solves 55.3% and 53% of the test problems with the least number of function and gradient evaluations, respectively. Regarding the CPU time metric, Figure 5 illustrates that MDL+1 reports the best performance, followed by MDL+2. More specifically, MDL+1 solves 68.4% of the test problems with the least CPU time, while MDL+2 solves about 58% of the test problems. In terms of robustness, MDL+2 and CG-DESCENT exhibit the best performance, successfully solving 216 out of 219 test problems. MDL+3 presents the worst performance, since its curves lie below the curves of the other conjugate gradient methods for all performance metrics. In summary, based on the performance of MDL+1, MDL+2, and MDL+3, we point out that the choice of the parameter $t$ crucially affects the efficiency of Algorithm MDL+-CG.

5. Conclusions and Future Research

In this paper, we proposed a conjugate gradient method which consists of a modification of the Dai-Liao method. An important property of the proposed method is that it ensures sufficient descent independently of the accuracy of the line search. Moreover, it achieves a high order of accuracy in approximating the second-order curvature information of the objective function by utilizing the modified secant condition proposed by Babaie-Kafaki et al. [11]. Under mild conditions, we established that the proposed method is globally convergent for general functions under the Wolfe line search conditions.

The preliminary numerical results show that if a good value of the parameter $t$ is chosen, our proposed algorithm performs very well. However, we have not yet theoretically established an optimal value of the parameter $t$, which constitutes our motivation for future research. Moreover, an interesting idea is to apply our proposed method to a variety of challenging real-world problems, such as protein folding problems [29].