Abstract
A modified PRP nonlinear conjugate gradient method for solving unconstrained optimization problems is proposed. The key property of the proposed method is that the sufficient descent property is guaranteed independently of any line search. With the Wolfe line search, the global convergence of the proposed method is established for nonconvex minimization. Numerical results show that the proposed method is effective and promising in comparison with the VPRP, CG-DESCENT, and DL+ methods.
1. Introduction
The nonlinear conjugate gradient method is one of the most efficient methods in solving unconstrained optimization problems. It comprises a class of unconstrained optimization algorithms which is characterized by low memory requirements and simplicity.
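Consider the unconstrained optimization problem
$$\min_{x \in \mathbb{R}^n} f(x), \tag{1.1}$$
where $f : \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable, and its gradient $g(x) = \nabla f(x)$ is available.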
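The iterates of the conjugate gradient method for solving (1.1) are given by
$$x_{k+1} = x_k + \alpha_k d_k, \tag{1.2}$$
where the stepsize $\alpha_k$ is positive and computed by a certain line search, and the search direction $d_k$ is defined by
$$d_k = \begin{cases} -g_k, & k = 1, \\ -g_k + \beta_k d_{k-1}, & k \ge 2, \end{cases} \tag{1.3}$$
where $g_k = \nabla f(x_k)$, and $\beta_k$ is a scalar. Some well-known conjugate gradient methods include the Polak-Ribière-Polyak (PRP) method [1, 2], the Hestenes-Stiefel (HS) method [3], the Hager-Zhang (HZ) method [4], and the Dai-Liao (DL) method [5]. The parameters $\beta_k$ of these methods are specified as follows:
$$\beta_k^{PRP} = \frac{g_k^T y_{k-1}}{\|g_{k-1}\|^2}, \quad \beta_k^{HS} = \frac{g_k^T y_{k-1}}{d_{k-1}^T y_{k-1}}, \quad \beta_k^{HZ} = \left( y_{k-1} - 2 d_{k-1} \frac{\|y_{k-1}\|^2}{d_{k-1}^T y_{k-1}} \right)^T \frac{g_k}{d_{k-1}^T y_{k-1}}, \quad \beta_k^{DL} = \frac{g_k^T (y_{k-1} - t s_{k-1})}{d_{k-1}^T y_{k-1}} \quad (t \ge 0), \tag{1.4}$$
where $\|\cdot\|$ is the Euclidean norm, $y_{k-1} = g_k - g_{k-1}$, and $s_{k-1} = x_k - x_{k-1}$. If $f$ is a strictly convex quadratic function, the above methods are equivalent when an exact line search is used. If $f$ is nonconvex, their behaviors may differ considerably.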
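As an illustration, the four parameters can be written as small functions. This is a minimal sketch in plain Python, assuming the standard forms of the PRP, HS, HZ, and DL parameters from [1–5] with $y_{k-1} = g_k - g_{k-1}$ and $s_{k-1} = x_k - x_{k-1}$; the function names and the default value of `t` are hypothetical and not taken from the paper.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def sub(u, v):
    """Componentwise difference u - v."""
    return [a - b for a, b in zip(u, v)]

def beta_prp(g, g_prev):
    # PRP: g_k^T y_{k-1} / ||g_{k-1}||^2
    y = sub(g, g_prev)
    return dot(g, y) / dot(g_prev, g_prev)

def beta_hs(g, g_prev, d_prev):
    # HS: g_k^T y_{k-1} / (d_{k-1}^T y_{k-1})
    y = sub(g, g_prev)
    return dot(g, y) / dot(d_prev, y)

def beta_hz(g, g_prev, d_prev):
    # HZ: (y - 2 d ||y||^2 / (d^T y))^T g_k / (d^T y)
    y = sub(g, g_prev)
    dy = dot(d_prev, y)
    w = [yi - 2.0 * di * dot(y, y) / dy for yi, di in zip(y, d_prev)]
    return dot(w, g) / dy

def beta_dl(g, g_prev, d_prev, s_prev, t=0.1):
    # DL: g_k^T (y - t s) / (d^T y); t = 0.1 is an assumed default
    y = sub(g, g_prev)
    return dot(g, sub(y, [t * si for si in s_prev])) / dot(d_prev, y)
```

On a strictly convex quadratic with an exact line search, $g_k^T d_{k-1} = 0$ and $g_k^T g_{k-1} = 0$, so the four functions should return the same value up to rounding, in line with the equivalence noted above.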
In the past few years, the PRP method has been regarded as the most efficient conjugate gradient method in practical computation. One remarkable property of the PRP method is that it essentially performs a restart if a bad direction occurs (see [6]). Powell [7] constructed an example which showed that the PRP method can cycle infinitely without approaching any stationary point even if an exact line search is used. This counterexample also indicates that the PRP method has a drawback: it may not be globally convergent when the objective function is nonconvex. Powell [8] suggested that the parameter $\beta_k$ should not be allowed to be negative in the PRP method and defined it as
$$\beta_k^{PRP+} = \max\{\beta_k^{PRP}, 0\}. \tag{1.5}$$
Gilbert and Nocedal [9] considered Powell's suggestion and proved the global convergence of the modified PRP method for nonconvex functions under an appropriate line search. In addition, there are many studies on the convergence properties of the PRP method (see [10–12]).
In recent years, much effort has been devoted to creating new methods which not only possess global convergence properties for general functions but are also superior to the original methods from the computational point of view. For example, Yu et al. [13] proposed a new nonlinear conjugate gradient method in which the parameter $\beta_k$ is defined on the basis of $\beta_k^{PRP}$ (in this paper, we call this method the VPRP method). They proved the global convergence of the VPRP method with the Wolfe line search. Hager and Zhang [4] discussed the global convergence of the HZ method for strongly convex functions under the Wolfe line search and the Goldstein line search. In order to prove the global convergence for general functions, Hager and Zhang modified the parameter as
$$\bar{\beta}_k^{HZ} = \max\{\beta_k^{HZ}, \eta_k\}, \tag{1.7}$$
where
$$\eta_k = \frac{-1}{\|d_{k-1}\| \min\{\eta, \|g_{k-1}\|\}}, \quad \eta > 0. \tag{1.8}$$
The method corresponding to (1.7) is the well-known CG-DESCENT method.
Dai and Liao [5] proposed a new conjugacy condition, that is,
$$d_k^T y_{k-1} = -t\, g_k^T s_{k-1}, \quad t \ge 0, \tag{1.9}$$
with $y_{k-1} = g_k - g_{k-1}$ and $s_{k-1} = x_k - x_{k-1}$. Under the new conjugacy condition, they proved the global convergence of the DL conjugate gradient method for uniformly convex functions. Following Powell's suggestion, Dai and Liao gave the modified parameter
$$\beta_k^{DL+} = \max\left\{ \frac{g_k^T y_{k-1}}{d_{k-1}^T y_{k-1}}, 0 \right\} - t\, \frac{g_k^T s_{k-1}}{d_{k-1}^T y_{k-1}}. \tag{1.10}$$
The method corresponding to (1.10) is the well-known DL+ method. Under the strong Wolfe line search, they studied the global convergence of the DL+ method for general functions. Zhang et al. [14] proposed a modified DL conjugate gradient method and proved its global convergence. Moreover, some researchers have been studying a new type of method called the spectral conjugate gradient method (see [15–17]).
This paper is organized as follows. In the next section, we propose a modified PRP method and prove its sufficient descent property. In Section 3, the global convergence of the method with the Wolfe line search is established. In Section 4, numerical results are reported. Conclusions are given in the last section.
2. Modified PRP Method
In this section, we propose a modified PRP conjugate gradient method in which the parameter is defined on the basis of as follows: in which . We introduce the modified PRP method as follows.
2.1. Modified PRP (MPRP) Method
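Step 1. Set $x_1 \in \mathbb{R}^n$, $\varepsilon \ge 0$, $k = 1$, and $d_1 = -g_1$; if $\|g_1\| \le \varepsilon$, then stop.
Step 2. Compute $\alpha_k$ by some inexact line search.
Step 3. Let $x_{k+1} = x_k + \alpha_k d_k$, $g_{k+1} = g(x_{k+1})$; if $\|g_{k+1}\| \le \varepsilon$, then stop.
Step 4. Compute $\beta_{k+1}$ by (2.1), and generate $d_{k+1}$ by (1.3).
Step 5. Set $k = k + 1$, and go to Step 2.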
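The five steps above can be sketched as a generic nonlinear conjugate gradient loop. The following plain-Python sketch is illustrative only and makes several assumptions: the rule for $\beta_k$ is passed in as a function, and the default used here is the PRP+ rule of Powell [8], not the MPRP parameter (2.1); the line search is a simple bisection scheme for the Wolfe conditions; the tolerance `eps` and the steepest-descent restart safeguard are hypothetical additions needed only because the stand-in rule does not guarantee descent. All function names are invented for this example.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def step(x, alpha, d):
    """Return x + alpha * d."""
    return [xi + alpha * di for xi, di in zip(x, d)]

def wolfe_step(f, grad, x, d, delta=1e-4, sigma=0.9, max_trials=60):
    """Bisection search for a stepsize satisfying the Wolfe conditions."""
    lo, hi, alpha = 0.0, math.inf, 1.0
    f0, slope0 = f(x), dot(grad(x), d)
    for _ in range(max_trials):
        x_new = step(x, alpha, d)
        if f(x_new) > f0 + delta * alpha * slope0:    # decrease condition fails
            hi = alpha
        elif dot(grad(x_new), d) < sigma * slope0:    # curvature condition fails
            lo = alpha
        else:
            return alpha
        alpha = 2.0 * lo if math.isinf(hi) else 0.5 * (lo + hi)
    return alpha  # best effort if no Wolfe point was found

def beta_prp_plus(g_new, g_old, d_old):
    """Stand-in rule (PRP+): max{0, g_k^T y_{k-1} / ||g_{k-1}||^2}."""
    y = [a - b for a, b in zip(g_new, g_old)]
    return max(0.0, dot(g_new, y) / dot(g_old, g_old))

def cg_minimize(f, grad, x, beta_rule=beta_prp_plus, eps=1e-6, max_iter=10000):
    g = grad(x)
    d = [-gi for gi in g]                              # Step 1
    for k in range(max_iter):
        if math.sqrt(dot(g, g)) <= eps:                # stopping test
            return x, k
        alpha = wolfe_step(f, grad, x, d)              # Step 2
        x = step(x, alpha, d)                          # Step 3
        g_new = grad(x)
        beta = beta_rule(g_new, g, d)                  # Step 4
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        if dot(g_new, d) >= 0.0:                       # assumed safeguard: restart
            d = [-gn for gn in g_new]
        g = g_new                                      # Step 5
    return x, max_iter
```

A call such as `cg_minimize(f, grad, [-1.0, 1.0])` returns the final iterate and the number of iterations used. Passing a different `beta_rule` with the same three-argument signature swaps in another conjugate gradient parameter without changing the loop.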
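In the convergence analyses and implementations of conjugate gradient methods, one often requires the inexact line search to satisfy the Wolfe conditions or the strong Wolfe conditions. The Wolfe line search is to find $\alpha_k$ such that
$$f(x_k + \alpha_k d_k) \le f(x_k) + \delta \alpha_k g_k^T d_k, \tag{2.2}$$
$$g(x_k + \alpha_k d_k)^T d_k \ge \sigma g_k^T d_k, \tag{2.3}$$
where $0 < \delta < \sigma < 1$. The strong Wolfe line search consists of (2.2) and the following strengthened version of (2.3):
$$|g(x_k + \alpha_k d_k)^T d_k| \le -\sigma g_k^T d_k. \tag{2.4}$$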
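To make the two sets of conditions concrete, the following plain-Python sketch tests whether a given trial stepsize satisfies the Wolfe or the strong Wolfe conditions in their standard form, with $0 < \delta < \sigma < 1$. The function name and the default values of `delta` and `sigma` are assumptions for illustration, not the values used in the paper.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def satisfies_wolfe(f, grad, x, d, alpha, delta=1e-4, sigma=0.9, strong=False):
    """Check the (strong) Wolfe conditions for stepsize alpha along d."""
    slope0 = dot(grad(x), d)                       # g_k^T d_k
    x_new = [xi + alpha * di for xi, di in zip(x, d)]
    slope1 = dot(grad(x_new), d)                   # g(x_k + alpha d_k)^T d_k
    # Sufficient decrease condition
    decrease = f(x_new) <= f(x) + delta * alpha * slope0
    # Curvature condition, in its weak or strengthened form
    if strong:
        curvature = abs(slope1) <= -sigma * slope0
    else:
        curvature = slope1 >= sigma * slope0
    return decrease and curvature
```

For $f(x) = x^2$ at $x = 1$ with $d = -2$, a very short step fails the curvature condition, while a step that overshoots far past the minimizer can satisfy the Wolfe conditions yet fail the strong version, which bounds the slope from both sides.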
Moreover, in most references, the sufficient descent condition
$$g_k^T d_k \le -c \|g_k\|^2, \quad c > 0, \tag{2.5}$$
is assumed, since it plays a vital role in guaranteeing the global convergence properties of conjugate gradient methods. In this paper, however, the search direction $d_k$ satisfies (2.5) without any line search.
Theorem 2.1. Consider any method (1.2)-(1.3), where . If for all , then
Proof. Multiplying (1.3) by $g_k^T$, we get
$$g_k^T d_k = -\|g_k\|^2 + \beta_k g_k^T d_{k-1}. \tag{2.7}$$
If , from (2.7), we know that the conclusion (2.6) holds. If , the proof is divided into two cases in the following.
Firstly, if , then from (2.1) and (2.7), one has
Secondly, if , then from (2.7), we also have
From the above, the conclusion (2.6) holds under any line search.
3. Global Convergence of the Modified PRP Method
In order to prove the global convergence of the modified PRP method, we assume that the objective function satisfies the following assumption.
Assumption H
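(i) The level set $\Omega = \{x \in \mathbb{R}^n \mid f(x) \le f(x_1)\}$ is bounded, that is, there exists a positive constant $B$ such that $\|x\| \le B$ for all $x \in \Omega$.
(ii) In a neighborhood $N$ of $\Omega$, $f$ is continuously differentiable and its gradient is Lipschitz continuous, namely, there exists a constant $L > 0$ such that
$$\|g(x) - g(y)\| \le L \|x - y\| \quad \text{for all } x, y \in N. \tag{3.1}$$
Under these assumptions on $f$, there exists a constant $\bar{\gamma} > 0$ such that
$$\|g(x)\| \le \bar{\gamma} \quad \text{for all } x \in \Omega. \tag{3.2}$$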
The conclusion of the following lemma, often called the Zoutendijk condition, is used to prove the global convergence properties of nonlinear conjugate gradient methods. It was originally given by Zoutendijk [18].
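Lemma 3.1. Suppose that Assumption H holds. Consider any iteration of (1.2)-(1.3), where $d_k$ satisfies $g_k^T d_k < 0$ for all $k \ge 1$ and $\alpha_k$ satisfies the Wolfe line search; then
$$\sum_{k \ge 1} \frac{(g_k^T d_k)^2}{\|d_k\|^2} < +\infty. \tag{3.3}$$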
Lemma 3.2. Suppose that Assumption H holds. Consider the method (1.2)-(1.3), where , and satisfies the Wolfe line search and (2.6). If there exists a constant , such that then one has where .
Proof. From (2.1) and (3.4), we get
By (2.6) and (3.6), we know that for each .
Define the quantities
By (1.3), one has
Since is unit vector, we get
From and the above equation, one has
By (2.1), (3.4), and (3.6), one has
From (3.3), (2.6), (3.4), and (3.11), one has
so
By (3.10) and the above inequality, one has
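Lemma 3.3. Suppose that Assumption H holds. If (3.4) holds, then $\beta_k^{MPRP}$ has property (*), that is, (1) there exists a constant $b > 1$ such that $|\beta_k^{MPRP}| \le b$; (2) there exists a constant $\lambda > 0$ such that $\|x_k - x_{k-1}\| \le \lambda$ implies $|\beta_k^{MPRP}| \le 1/(2b)$.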
Proof. From Assumption (ii), we know that (3.2) holds. By (2.1), (3.2), and (3.4), one has Define . If , then from (2.1), (3.1), (3.2), and (3.4), one has
Lemma 3.4 (see [19]). Suppose that Assumption H holds. Let and be generated by (1.2)-(1.3), in which satisfies the Wolfe line search and (2.6). If has the property (*) and (3.4) holds, then there exists , for any and , for all , such that where , denotes the number of the .
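Theorem 3.5. Suppose that Assumption H holds. Let $\{x_k\}$ and $\{d_k\}$ be generated by (1.2)-(1.3), in which $\alpha_k$ satisfies the Wolfe line search, (2.6) holds, and $\beta_k = \beta_k^{MPRP}$; then one has
$$\liminf_{k \to \infty} \|g_k\| = 0. \tag{3.18}$$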
Proof. To obtain this result, we proceed by contradiction. Suppose that (3.18) does not hold, which means that there exists such that
so, we know that Lemmas 3.2 and 3.4 hold.
We also define , then for all , one has
where , that is,
From Assumption H, we know that there exists a constant such that
From (3.21) and the above inequality, one has
Let be a positive integer and where has been defined in Lemma 3.4. From Lemma 3.2, we know that there exists such that
From the Cauchy-Schwarz inequality and (3.24), , one has
By Lemma 3.4, we know that there exists such that
It follows from (3.23), (3.25), and (3.26) that
From (3.27), one has , which is a contradiction with the definition of . Hence,
which completes the proof.
4. Numerical Results
In this section, we compare the modified PRP conjugate gradient method, denoted the MPRP method, with the VPRP method, the CG-DESCENT method, and the DL+ method under the strong Wolfe line search on test problems from [20] with the given initial points and dimensions. The parameters are chosen as follows: , , , , and . If is satisfied, we will stop the program. The program is also stopped if the number of iterations exceeds ten thousand. All codes were written in Matlab 7.0 and run on a PC with a 2.0 GHz CPU, 512 MB of memory, and the Windows XP operating system.
The numerical results of our tests with respect to the MPRP method, VPRP method, CG-DESCENT method, and DL+ method are reported in Tables 1, 2, 3, 4, respectively. In the tables, the column "Problem" gives the problem's name in [20], and "CPU," "NI," "NF," and "NG" denote the CPU time in seconds, the number of iterations, the number of function evaluations, and the number of gradient evaluations, respectively. "Dim" denotes the dimension of the test problem. If the iteration limit was exceeded, the run was stopped, and this is indicated by NaN.
In this paper, we adopt the performance profiles of Dolan and Moré [21] to compare the MPRP method with the VPRP method, CG-DESCENT method, and DL+ method with respect to CPU time, the number of iterations, function evaluations, and gradient evaluations, respectively (see Figures 1, 2, 3, 4).
Figures 1–4 show the performance of the four methods relative to CPU time, the number of iterations, the number of function evaluations, and the number of gradient evaluations, respectively. For example, the performance profile with respect to CPU time means that, for each method, we plot the fraction of problems for which the method is within a factor $\tau$ of the best time. The left side of the figure gives the percentage of the test problems for which a method is the fastest; the right side gives the percentage of the test problems that are successfully solved by each of the methods. The top curve corresponds to the method that solved the most problems in a time that was within a factor $\tau$ of the best time.
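The construction of such a profile can be sketched in a few lines. The following plain-Python example is a minimal sketch of the Dolan and Moré performance profile: for each problem it takes the ratio of a method's cost to the best cost over all methods, and reports the fraction of problems whose ratio is at most $\tau$. The function name, the dictionary layout, and the treatment of failed runs (NaN mapped to an infinite cost) are assumptions; costs are assumed to be strictly positive, and no data from the paper's tables are used.

```python
import math

def performance_profile(costs, taus):
    """costs: dict mapping method name -> list of costs, one per problem.
    A failed run is marked by NaN or infinity. Returns, for each method,
    the fraction of problems solved within a factor tau of the best cost."""
    methods = list(costs)
    n = len(costs[methods[0]])
    # Treat NaN (failure) as an infinite cost
    clean = {m: [math.inf if math.isnan(c) else c for c in costs[m]]
             for m in methods}
    # Best cost achieved on each problem by any method
    best = [min(clean[m][p] for m in methods) for p in range(n)]
    profile = {}
    for m in methods:
        ratios = [clean[m][p] / best[p] if math.isfinite(best[p]) else math.inf
                  for p in range(n)]
        profile[m] = [sum(r <= tau for r in ratios) / n for tau in taus]
    return profile
```

The value of a profile at $\tau = 1$ is the fraction of problems on which the method attains the best cost, and its limit for large $\tau$ is the fraction of problems the method solves at all, which matches the reading of the left and right sides of the figures described above.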
Figure 1 shows that the MPRP method outperforms the VPRP method, CG-DESCENT method, and DL+ method on the given test problems with respect to CPU time. Figures 2–4 show that the MPRP method also has the best performance with respect to the number of iterations and the numbers of function and gradient evaluations, since it corresponds to the top curve. So, the MPRP method is computationally efficient.
5. Conclusions
We have proposed a modified PRP method on the basis of the PRP method, which generates sufficient descent directions independently of the line search used. Moreover, we proved that the proposed method converges globally for general nonconvex functions under the Wolfe line search. The performance profiles showed that the proposed method is also efficient on the tested problems.
Acknowledgments
The authors wish to express their heartfelt thanks to the referees and Professor Piermarco Cannarsa for their detailed and helpful suggestions for revising the paper. This work was supported by The Nature Science Foundation of Chongqing Education Committee (KJ091104) and Chongqing Three Gorge University (09ZZ-060).