Abstract

This paper further studies the WYL conjugate gradient (CG) formula and presents a three-term WYL CG algorithm, which has the sufficient descent property without any line search conditions. The global convergence and the linear convergence are proved; moreover, the n-step quadratic convergence with a restart strategy is established if the initial step length is appropriately chosen. Numerical experiments for large-scale problems, including normal unconstrained optimization problems and engineering problems (Benchmark Problems), show that the new algorithm is competitive with other similar CG algorithms.

1. Introduction

Consider the following unconstrained minimization problem:
\[ \min_{x \in \mathbb{R}^n} f(x), \tag{1} \]
where $f : \mathbb{R}^n \to \mathbb{R}$ is a continuously differentiable function. The CG algorithms for (1) have the following iterative process:
\[ x_{k+1} = x_k + \alpha_k d_k, \quad k = 0, 1, 2, \ldots, \tag{2} \]
where $x_k$ is the $k$th iterate, $\alpha_k > 0$ is the step length, and $d_k$ is the search direction defined by
\[ d_k = \begin{cases} -g_k + \beta_k d_{k-1}, & k \geq 1, \\ -g_k, & k = 0, \end{cases} \tag{3} \]
where $g_k = \nabla f(x_k)$ is the gradient and the parameter $\beta_k$ is a scalar determining the different formulas (see [1–8], etc.). The PRP algorithm [6, 7] is one of the most effective CG algorithms, and its convergence analysis can be found in [7, 9, 10], etc. Powell [9] suggested that $\beta_k$ should not be less than zero; since then, many new CG formulas have been proposed (see [11–17], etc.) to ensure the scalar $\beta_k \geq 0$. At present, many results have been obtained for CG algorithms (see [11, 18–26], etc.), and a modified weak Wolfe-Powell line search technique has been presented to study open problems in unconstrained optimization (see [27, 28]). If a restart strategy is used, the PRP algorithm has n-step quadratic convergence (see [29–31]). Li and Tian [32] proved that a three-term CG algorithm has quadratic convergence with a restart strategy under some inexact line searches and suitable assumptions.

Recently, Wei et al. [21] proposed a new CG formula defined by
\[ \beta_k^{WYL} = \frac{g_k^T \tilde{y}_{k-1}}{\|g_{k-1}\|^2}, \qquad \tilde{y}_{k-1} = g_k - \frac{\|g_k\|}{\|g_{k-1}\|} g_{k-1}, \tag{4} \]
where $g_{k-1}$ and $g_k$ are the gradients of $f$ at $x_{k-1}$ and $x_k$, respectively, and $\|\cdot\|$ denotes the Euclidean norm of vectors. It is easy to deduce that $0 \leq \beta_k^{WYL} \leq 2\|g_k\|^2/\|g_{k-1}\|^2$. The global convergence of the WYL algorithm with the exact line search, the Grippo-Lucidi Armijo line search, and the Wolfe-Powell line search has been established in [21, 33–35]. By restricting the parameter $\sigma$ of the strong Wolfe-Powell line search ($\sigma < 1/4$), the WYL algorithm meets the sufficient descent property. However, the quadratic convergence of this algorithm is still open. In this paper, we further study the WYL algorithm. On the basis of [32] and [21], we propose a new three-term WYL CG formula. We show that the new CG algorithm has global convergence for general functions and has n-step quadratic convergence for uniformly convex functions with an r-step restart and the standard Armijo line search under appropriate conditions. The numerical results show that the new algorithm performs quite well. The main attributes of this algorithm are listed as follows.

(i) A new three-term WYL CG algorithm is introduced, which has the sufficient descent property automatically.

(ii) The global convergence, the linear convergence rate, and the n-step quadratic convergence are established.

(iii) Numerical results show that this algorithm is competitive with the normal algorithms for the given problems.

This paper is organized as follows. In Section 2, we review the motivation and introduce the modified WYL algorithm. We establish the global convergence and the r-linear convergence of the new algorithm with the standard Armijo line search in Section 3. In Section 4, the n-step quadratic convergence of the given algorithm is proved. In Section 5, some numerical experiments are reported.

2. Motivation and Algorithm

In this section, we give the motivation based on the WYL formula. Consider the WYL search direction
\[ d_k = \begin{cases} -g_k + \beta_k^{WYL} d_{k-1}, & k \geq 1, \\ -g_k, & k = 0, \end{cases} \tag{5} \]
for which it is known that $d_k$ is a sufficient descent direction when the parameter of the strong Wolfe-Powell line search is suitably restricted [33]. By the definition of $\beta_k^{WYL}$ in (4), for $k \geq 1$, we get
\[ g_k^T d_k = -\|g_k\|^2 + \beta_k^{WYL} g_k^T d_{k-1}. \]
In order to ensure that the sufficient descent property holds, the first term of the above equality should be maintained. So the guiding idea is to add another term to eliminate the second term of the above equality; at the same time, the conjugacy should be guaranteed. Therefore the new search direction, which defines the so-called MWYL algorithm, is given by
\[ d_k = \begin{cases} -g_k + \beta_k^{WYL} d_{k-1} - \vartheta_k \tilde{y}_{k-1}, & k \geq 1, \\ -g_k, & k = 0, \end{cases} \tag{6} \]
where $\vartheta_k = g_k^T d_{k-1} / \|g_{k-1}\|^2$ and $\tilde{y}_{k-1} = g_k - (\|g_k\|/\|g_{k-1}\|) g_{k-1}$. It is easy to see that the above search direction reduces to the normal WYL direction if the exact line search is used, since then $g_k^T d_{k-1} = 0$. It is not difficult to see from (6) that $d_k$ is a descent direction of $f$ at $x_k$; namely, we have
\[ g_k^T d_k = -\|g_k\|^2 + \beta_k^{WYL} g_k^T d_{k-1} - \vartheta_k g_k^T \tilde{y}_{k-1} = -\|g_k\|^2 \tag{7} \]
and
\[ g_k^T d_k < 0 \quad \text{whenever } g_k \neq 0. \tag{8} \]
Moreover, we obtain the sufficient descent property
\[ g_k^T d_k = -\|g_k\|^2, \quad \forall k \geq 0, \tag{9} \]
and, by the Cauchy-Schwarz inequality,
\[ \|g_k\| \leq \|d_k\| \leq \|g_k\| + \frac{2\|g_k\| \|\tilde{y}_{k-1}\| \|d_{k-1}\|}{\|g_{k-1}\|^2}. \tag{10} \]
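The computation of (6) costs only a few inner products per iteration. The following minimal sketch (in Python with NumPy; the function and variable names are ours, not the paper's) illustrates the three-term direction under the reconstructed formulas above.

```python
import numpy as np

def mwyl_direction(g, g_prev, d_prev):
    """Three-term MWYL search direction (6); g, g_prev, d_prev are 1-D arrays."""
    # tilde_y = g_k - (||g_k|| / ||g_{k-1}||) g_{k-1}, as in the WYL formula (4)
    tilde_y = g - (np.linalg.norm(g) / np.linalg.norm(g_prev)) * g_prev
    denom = np.linalg.norm(g_prev) ** 2
    beta = (g @ tilde_y) / denom       # beta_k^{WYL}
    theta = (g @ d_prev) / denom       # vartheta_k, the third-term coefficient
    d = -g + beta * d_prev - theta * tilde_y
    # sufficient descent (9) holds by construction: g @ d == -||g||^2 (up to rounding)
    return d
```

Now we list some line search techniques that will be used in the following sections.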

(i) The exact line search is to find $\alpha_k$ such that the function $f$ is minimized along the direction $d_k$; that is, $\alpha_k$ satisfies
\[ f(x_k + \alpha_k d_k) = \min_{\alpha \geq 0} f(x_k + \alpha d_k). \tag{11} \]

(ii) The Armijo line search is to find a step length $\alpha_k = \max\{\gamma \rho^i : i = 0, 1, 2, \ldots\}$ which satisfies
\[ f(x_k + \alpha_k d_k) \leq f(x_k) + \delta \alpha_k g_k^T d_k, \tag{12} \]
where $\delta \in (0, 1)$, $\rho \in (0, 1)$, and $\gamma > 0$ is the initial trial step.

(iii) The Wolfe line search conditions are
\[ f(x_k + \alpha_k d_k) \leq f(x_k) + \delta \alpha_k g_k^T d_k, \qquad g(x_k + \alpha_k d_k)^T d_k \geq \sigma g_k^T d_k, \tag{13} \]
where $0 < \delta < \sigma < 1$. (A backtracking sketch of the Armijo rule is given after this list.)
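As an illustration of rule (12), the following backtracking sketch returns the first trial step of the form $\gamma \rho^i$ accepted by the Armijo condition; the concrete parameter values are ours and only serve as an example.

```python
def armijo_line_search(f, x, g, d, delta=1e-4, rho=0.5, gamma=1.0, max_backtracks=60):
    """Armijo backtracking (12): accept the first step gamma * rho**i that
    gives sufficient decrease along the descent direction d."""
    gtd = g @ d            # negative by the sufficient descent property (9)
    fx = f(x)
    alpha = gamma
    for _ in range(max_backtracks):
        if f(x + alpha * d) <= fx + delta * alpha * gtd:
            return alpha
        alpha *= rho
    return alpha           # fallback if no trial step was accepted
```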

In the following we will give the MWYL algorithm.

Algorithm 1 (a modified WYL three-term CG algorithm, called MWYL).
Step 0: Choose an initial point $x_0 \in \mathbb{R}^n$, $\epsilon \in (0, 1)$, $\delta \in (0, 1)$, $\rho \in (0, 1)$; let $d_0 = -g_0$, $k := 0$.
Step 1: If $\|g_k\| \leq \epsilon$, stop. Otherwise go to the next step.
Step 2: Compute the step size $\alpha_k$ by the Armijo line search (12) or the Wolfe line search (13).
Step 3: Let $x_{k+1} = x_k + \alpha_k d_k$. If $\|g_{k+1}\| \leq \epsilon$, then stop.
Step 4: Compute the search direction $d_{k+1}$ by (6).
Step 5: Let $k := k + 1$. Go to Step 2.
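Putting the pieces together, a compact driver for Algorithm 1 might look as follows; this is a sketch only, reusing the illustrative helpers mwyl_direction and armijo_line_search from above and taking the Armijo option in Step 2.

```python
import numpy as np

def mwyl(f, grad, x0, eps=1e-6, max_iter=10000):
    """A sketch of Algorithm 1 (MWYL) with the Armijo line search."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g
    for _ in range(max_iter):
        if np.linalg.norm(g) <= eps:               # Step 1
            break
        alpha = armijo_line_search(f, x, g, d)     # Step 2
        x = x + alpha * d                          # Step 3
        g_new = grad(x)
        if np.linalg.norm(g_new) <= eps:
            g = g_new
            break
        d = mwyl_direction(g_new, g, d)            # Step 4: direction (6)
        g = g_new                                  # Step 5
    return x, g
```

For instance, taking $f(x) = \frac{1}{2}x^T A x - b^T x$ with grad(x) = Ax - b for a symmetric positive definite A reproduces the uniformly convex setting analyzed in the next sections.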

3. Convergence of Algorithm 1

In this part, we prove the global convergence and the r-linear convergence of the algorithm with the Armijo and Wolfe line searches. The following assumptions are required.

Assumption i. The level set $\Omega = \{x \in \mathbb{R}^n : f(x) \leq f(x_0)\}$ is bounded and, in some neighborhood $N$ of $\Omega_0$, $f$ is continuously differentiable and bounded below, and its gradient $g$ is globally Lipschitz continuous; namely, there exists a constant $L > 0$ such that
\[ \|g(x) - g(y)\| \leq L \|x - y\|, \quad \forall x, y \in N, \tag{14} \]
where $\Omega_0$ is the closed convex hull of $\Omega$.

Now we establish the global convergence of Algorithm 1.

Theorem 2. Let Assumption i hold and let the sequence $\{x_k\}$ be generated by Algorithm 1. Then the relation
\[ \liminf_{k \to \infty} \|g_k\| = 0 \tag{15} \]
holds.

Proof. We prove this theorem by contradiction. Suppose that (15) does not hold; then there exists a constant $\epsilon_0 > 0$ satisfying
\[ \|g_k\| \geq \epsilon_0, \quad \forall k \geq 0. \tag{16} \]
Using (9) and (12), since $f$ is bounded from below, it is not difficult to get
\[ \sum_{k \geq 0} \delta \alpha_k \|g_k\|^2 \leq \sum_{k \geq 0} \left( f(x_k) - f(x_{k+1}) \right) < \infty. \tag{17} \]
In particular, we have
\[ \lim_{k \to \infty} \alpha_k \|g_k\|^2 = 0. \tag{18} \]
If $\liminf_{k \to \infty} \alpha_k > 0$, by (9) and (18), we obtain
\[ \lim_{k \to \infty} \|g_k\| = 0. \tag{19} \]
This contradicts (16); then (15) holds.
Otherwise, if
\[ \liminf_{k \to \infty} \alpha_k = 0, \tag{20} \]
then there exists an infinite index set $K$ satisfying
\[ \lim_{k \in K, \, k \to \infty} \alpha_k = 0. \tag{21} \]
By Step 2 of Algorithm 1, when $k \in K$ is sufficiently large, $\rho^{-1} \alpha_k$ does not satisfy (12), which implies that
\[ f(x_k + \rho^{-1} \alpha_k d_k) - f(x_k) > \delta \rho^{-1} \alpha_k g_k^T d_k. \tag{22} \]
By (16), similar to the proof of Lemma 2.1 in [36], it is easy to deduce that there exists a constant $M_0 > 0$ such that
\[ \|d_k\| \leq M_0. \tag{23} \]
Using (21), (9), and the mean-value theorem, we have
\[ f(x_k + \rho^{-1} \alpha_k d_k) - f(x_k) = \rho^{-1} \alpha_k g(\xi_k)^T d_k \leq \rho^{-1} \alpha_k g_k^T d_k + L \rho^{-2} \alpha_k^2 \|d_k\|^2 \leq \rho^{-1} \alpha_k g_k^T d_k + L \rho^{-2} \alpha_k^2 M_0^2, \tag{24} \]
where $\xi_k$ lies on the segment between $x_k$ and $x_k + \rho^{-1} \alpha_k d_k$ and the last inequality follows from (23). Combining this with (22), for all $k \in K$ sufficiently large, we obtain
\[ (1 - \delta) \|g_k\|^2 = -(1 - \delta) g_k^T d_k < L \rho^{-1} \alpha_k M_0^2. \tag{25} \]
By (21) and $\delta \in (0, 1)$, the above inequality implies that
\[ \lim_{k \in K, \, k \to \infty} \|g_k\| = 0. \]
This is a contradiction too. The proof is complete.

Next, we prove the linear convergence of the sequence generated by the MWYL algorithm with the Armijo or Wolfe line search. The following assumption is further needed.

Assumption ii. $f$ is twice continuously differentiable and uniformly convex. In other words, there are positive constants $m \leq M$ such that
\[ m \|y\|^2 \leq y^T \nabla^2 f(x) y \leq M \|y\|^2, \quad \forall x, y \in \mathbb{R}^n, \tag{26} \]
where $\nabla^2 f(x)$ denotes the Hessian matrix of $f$ at $x$.
It is not difficult to see that, under Assumption ii, $\nabla^2 f$ is continuous, $g$ is Lipschitz continuous, and problem (1) has a unique solution $x^*$ which satisfies $g(x^*) = 0$ and
\[ \frac{m}{2} \|x - x^*\|^2 \leq f(x) - f(x^*) \leq \frac{1}{2m} \|g(x)\|^2, \quad \forall x \in \mathbb{R}^n. \tag{27} \]

Lemma 3. Let Assumption ii hold and let the sequence $\{x_k\}$ be generated by the MWYL algorithm with the Armijo or Wolfe line search; then one has
\[ \alpha_k \leq c_1 \frac{\|g_k\|^2}{\|d_k\|^2}, \tag{28} \]
where $c_1 = 2(1 - \delta)/m$ and $m$ is given in Assumption ii. In addition, if the Wolfe line search is used, the following holds:
\[ \alpha_k \geq c_2 \frac{\|g_k\|^2}{\|d_k\|^2}, \qquad c_2 = \frac{1 - \sigma}{L}. \tag{29} \]

Proof. We have from line search (12), or the first inequality of (13), that
\[ f(x_{k+1}) \leq f(x_k) + \delta \alpha_k g_k^T d_k \]
holds for any $k$. Because the objective function is uniformly convex by Assumption ii, $f$ is bounded below; together with (9), the inequality $f(x_{k+1}) \leq f(x_k)$ holds. Combining the Taylor theorem and Assumption ii, we obtain
\[ f(x_{k+1}) = f(x_k) + \alpha_k g_k^T d_k + \frac{1}{2} \alpha_k^2 d_k^T \nabla^2 f(\xi_k) d_k \geq f(x_k) + \alpha_k g_k^T d_k + \frac{m}{2} \alpha_k^2 \|d_k\|^2, \]
where $\xi_k$ belongs to the segment $[x_k, x_{k+1}]$. Therefore, we get
\[ \frac{m}{2} \alpha_k^2 \|d_k\|^2 \leq f(x_{k+1}) - f(x_k) - \alpha_k g_k^T d_k. \]
By the inequalities $f(x_{k+1}) - f(x_k) \leq \delta \alpha_k g_k^T d_k$ and (9), we get $\frac{m}{2} \alpha_k^2 \|d_k\|^2 \leq -(1 - \delta) \alpha_k g_k^T d_k = (1 - \delta) \alpha_k \|g_k\|^2$, which includes (28).
Now suppose that the Wolfe line search is used. By the second inequality of (13) and (9), it is not difficult to get
\[ (g_{k+1} - g_k)^T d_k \geq (\sigma - 1) g_k^T d_k = (1 - \sigma) \|g_k\|^2, \]
and, by the Lipschitz continuity of $g$,
\[ (g_{k+1} - g_k)^T d_k \leq L \alpha_k \|d_k\|^2. \]
Combining the two relations, we obtain $\alpha_k \geq (1 - \sigma) \|g_k\|^2 / (L \|d_k\|^2)$, which is (29). This completes the proof.

Lemma 4. Let the sequence $\{x_k\}$ be generated by the MWYL algorithm with the Armijo or Wolfe line search and let Assumption ii hold; then there is a constant $\tau > 0$ such that
\[ \alpha_k \geq \tau, \quad \forall k \geq 0. \tag{30} \]

Proof. Set $s_{k-1} = x_k - x_{k-1} = \alpha_{k-1} d_{k-1}$. By the mean-value theorem, we have
\[ \|g_k - g_{k-1}\| \leq L \|s_{k-1}\| \]
and
\[ \|\tilde{y}_{k-1}\| \leq \|g_k - g_{k-1}\| + \left| \|g_{k-1}\| - \|g_k\| \right| \leq 2 \|g_k - g_{k-1}\| \leq 2L \|s_{k-1}\|. \tag{31} \]
Therefore, by (9), (31), Lemma 3, and Assumption ii, we get
\[ |\beta_k^{WYL}| \leq \frac{2L \|g_k\| \|s_{k-1}\|}{\|g_{k-1}\|^2} \quad \text{and} \quad |\vartheta_k| \leq \frac{\|g_k\| \|d_{k-1}\|}{\|g_{k-1}\|^2}. \]
By the above conclusions, (6), (28), and the Lipschitz continuity of $g$, we get
\[ \|d_k\| \leq \|g_k\| + \frac{4L \|g_k\| \|s_{k-1}\| \|d_{k-1}\|}{\|g_{k-1}\|^2} = \|g_k\| \left( 1 + \frac{4L \alpha_{k-1} \|d_{k-1}\|^2}{\|g_{k-1}\|^2} \right) \leq (1 + 4L c_1) \|g_k\|. \tag{32} \]
If the Armijo line search is used, then, by the line search rule, if $\alpha_k < \gamma$, $\rho^{-1} \alpha_k$ does not satisfy condition (12); namely,
\[ f(x_k + \rho^{-1} \alpha_k d_k) - f(x_k) > \delta \rho^{-1} \alpha_k g_k^T d_k. \]
Using the mean-value theorem and the above inequality, there exists $\xi_k$ on the segment $[x_k, x_k + \rho^{-1} \alpha_k d_k]$ satisfying
\[ g(\xi_k)^T d_k > \delta g_k^T d_k, \quad \text{i.e.,} \quad (g(\xi_k) - g_k)^T d_k > (1 - \delta) \|g_k\|^2. \]
Since $(g(\xi_k) - g_k)^T d_k \leq L \rho^{-1} \alpha_k \|d_k\|^2$, by (32) we get
\[ \alpha_k > \frac{\rho (1 - \delta) \|g_k\|^2}{L \|d_k\|^2} \geq \frac{\rho (1 - \delta)}{L (1 + 4L c_1)^2}, \]
and, letting $\tau = \min\{\gamma, \rho (1 - \delta) / (L (1 + 4L c_1)^2)\}$, we have (30). If the Wolfe line search is used, combining the second inequality of (13), i.e., (29), with (32), we can find a lower positive bound of $\alpha_k$ in the same way; the proof is complete.

Similar to [32], it is easy to get the r-linear convergence theorem of Algorithm 1, which we now state and prove.

Theorem 5. Let Assumption ii hold, let $x^*$ be the unique solution of (1), and let the sequence $\{x_k\}$ be generated by the MWYL algorithm with the Armijo or Wolfe line search. Then there are constants $\omega \in (0, 1)$ and $c_3 > 0$ satisfying
\[ f(x_k) - f(x^*) \leq \omega^k \left( f(x_0) - f(x^*) \right), \qquad \|x_k - x^*\| \leq c_3 (\sqrt{\omega})^k. \tag{33} \]

Proof. By (12) or the first relation of (13), we get
\[ f(x_{k+1}) - f(x^*) \leq f(x_k) - f(x^*) + \delta \alpha_k g_k^T d_k = f(x_k) - f(x^*) - \delta \alpha_k \|g_k\|^2 \leq \left( 1 - 2m \delta \tau \right) \left( f(x_k) - f(x^*) \right), \]
where the equality follows from (9) and the last inequality follows from Lemma 4 and the second inequality of (27). Setting $\omega = 1 - 2m \delta \tau$ generates
\[ f(x_{k+1}) - f(x^*) \leq \omega \left( f(x_k) - f(x^*) \right) \leq \omega^{k+1} \left( f(x_0) - f(x^*) \right). \]
By the first inequality of (27) again, we have
\[ \|x_k - x^*\|^2 \leq \frac{2}{m} \left( f(x_k) - f(x^*) \right) \leq \frac{2 \left( f(x_0) - f(x^*) \right)}{m} \omega^k, \]
and this relation shows that (33) holds with $c_3 = \sqrt{2 (f(x_0) - f(x^*))/m}$. The proof is complete.

4. The Restart MWYL Algorithm’s n-Step Quadratic Convergence

Setting $\alpha_k^*$ as the exact line search step length, $g(x_k + \alpha_k^* d_k)^T d_k = 0$ holds. Thus, by the mean-value theorem,
\[ \alpha_k^* = \frac{-g_k^T d_k}{d_k^T \nabla^2 f(\xi_k) d_k}, \tag{34} \]
where $\xi_k$ lies on the segment between $x_k$ and $x_k + \alpha_k^* d_k$. It is feasible to use the initial step length $\alpha_k^0$ of the Armijo or Wolfe line search as an approximation of $\alpha_k^*$, where $\alpha_k^0$ is defined by
\[ \alpha_k^0 = \frac{-\epsilon_k g_k^T d_k}{d_k^T \left( g(x_k + \epsilon_k d_k) - g_k \right)}, \tag{35} \]
where the positive sequence $\{\epsilon_k\}$ satisfies $\epsilon_k \to 0$ as $k \to \infty$. If $f$ is a quadratic function, then $\alpha_k^0$ and $\alpha_k^*$ are consistent; namely, $\alpha_k^0 = \alpha_k^*$. The above discussion can also be found in [32]; in fact, our ideas are partly motivated by that paper. The following Theorem 6 shows that, for sufficiently large $k$, the inexact line search step $\alpha_k^0$ defined by (35) satisfies the Armijo and Wolfe conditions.
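In code, the trial step (35) needs only one extra gradient evaluation; a hedged sketch (our naming; eps_k fixed to a small illustrative value rather than a vanishing sequence) reads:

```python
def initial_step(grad, x, g, d, eps_k=1e-4):
    """Finite-difference approximation (35) of the exact step length:
    d^T nabla^2 f(x) d is approximated by d^T (g(x + eps*d) - g(x)) / eps.
    For a quadratic f this coincides with the exact line search step."""
    curvature = d @ (grad(x + eps_k * d) - g) / eps_k
    return -(g @ d) / curvature   # positive when f is uniformly convex
```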

Theorem 6. Let the sequence $\{x_k\}$ be generated by the MWYL algorithm and let Assumption ii hold. Then, when $k$ is sufficiently large, $\alpha_k^0$ satisfies the Armijo condition (12) with $\delta \in (0, 1/2)$ and the Wolfe conditions (13).

Proof. Let $\Delta_k = f(x_k + \alpha_k^0 d_k) - f(x_k) - \delta \alpha_k^0 g_k^T d_k$. Using Assumption ii and the Taylor theorem, we have
\[ \Delta_k = (1 - \delta) \alpha_k^0 g_k^T d_k + \frac{1}{2} (\alpha_k^0)^2 d_k^T \nabla^2 f(\eta_k) d_k, \]
where $\eta_k$ lies on the segment $[x_k, x_k + \alpha_k^0 d_k]$. By the mean-value theorem and (35), there exists $\zeta_k$ on the segment $[x_k, x_k + \epsilon_k d_k]$ such that $\alpha_k^0 = -g_k^T d_k / (d_k^T \nabla^2 f(\zeta_k) d_k)$, and hence
\[ \Delta_k = \frac{(g_k^T d_k)^2}{d_k^T \nabla^2 f(\zeta_k) d_k} \left( -(1 - \delta) + \frac{1}{2} \cdot \frac{d_k^T \nabla^2 f(\eta_k) d_k}{d_k^T \nabla^2 f(\zeta_k) d_k} \right). \]
Since $x_k \to x^*$ (Theorem 5), $\epsilon_k \to 0$, $\alpha_k^0 \|d_k\| \to 0$ (by (32)), and $\nabla^2 f$ is continuous, the ratio inside the bracket tends to $1$, so the bracket tends to $\delta - 1/2 < 0$. When $k$ is sufficiently large, $\Delta_k \leq 0$; that is, $\alpha_k^0$ satisfies the Armijo condition. Setting $\Lambda_k = g(x_k + \alpha_k^0 d_k)^T d_k - \sigma g_k^T d_k$, we get, again by the mean-value theorem,
\[ \Lambda_k = g_k^T d_k \left( 1 - \frac{d_k^T \nabla^2 f(\bar{\eta}_k) d_k}{d_k^T \nabla^2 f(\zeta_k) d_k} - \sigma \right), \]
with $\bar{\eta}_k$ on the segment $[x_k, x_k + \alpha_k^0 d_k]$. The bracket tends to $-\sigma < 0$ and $g_k^T d_k < 0$, so, for sufficiently large $k$, we have $\Lambda_k \geq 0$. This implies that $\alpha_k^0$ satisfies the Wolfe line search. The proof is complete.

If we use the restart strategy, the n-step quadratic convergence is attainable. Next, we use $\alpha_k^0$ as the initial step length and give the steps of the restart MWYL algorithm.

Algorithm 7 (called RWYL).
Step 0: Given $x_0 \in \mathbb{R}^n$, $\epsilon \in (0, 1)$, $\delta \in (0, 1/2)$, $\sigma \in (\delta, 1)$, and a restart constant $r$, let $d_0 = -g_0$, $k := 0$.
Step 1: If $\|g_k\| \leq \epsilon$, stop.
Step 2: If the inequality $f(x_k + \alpha_k^0 d_k) \leq f(x_k) + \delta \alpha_k^0 g_k^T d_k$ holds, where $\alpha_k^0$ is given by (35), we set $\alpha_k = \alpha_k^0$. Otherwise, we determine $\alpha_k$ satisfying the Wolfe conditions (13).
Step 3: Let $x_{k+1} = x_k + \alpha_k d_k$, and compute $g_{k+1}$.
Step 4: If $\|g_{k+1}\| \leq \epsilon$, stop.
Step 5: If the restart condition is met (every $r$ iterations), we let $d_{k+1} = -g_{k+1}$ and $k := k + 1$. Go to Step 1.
Step 6: Compute $d_{k+1}$ by (6), let $k := k + 1$, and go to Step 2.
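A compact sketch of Algorithm 7 follows; the acceptance test of Step 2 and the periodic restart rule are simplified here (restarting every r = n iterations is an assumption of this sketch, and the helpers are the illustrative ones from the previous sections).

```python
import numpy as np

def rwyl(f, grad, x0, r=None, eps=1e-6, delta=1e-4, max_iter=10000):
    """A sketch of Algorithm 7 (RWYL): MWYL with trial step (35) and restarts."""
    x = np.asarray(x0, dtype=float)
    r = r or x.size                                    # assumed restart period
    g = grad(x)
    d = -g
    for k in range(max_iter):
        if np.linalg.norm(g) <= eps:                   # Steps 1 and 4
            break
        alpha = initial_step(grad, x, g, d)            # trial step (35)
        if not (alpha > 0 and
                f(x + alpha * d) <= f(x) + delta * alpha * (g @ d)):
            alpha = armijo_line_search(f, x, g, d)     # fallback for Step 2
        x = x + alpha * d                              # Step 3
        g_new = grad(x)
        if (k + 1) % r == 0:
            d = -g_new                                 # Step 5: restart
        else:
            d = mwyl_direction(g_new, g, d)            # Step 6: direction (6)
        g = g_new
    return x, g
```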

Lemma 8. Let Assumption ii hold and let $\{x_k\}$ be generated by the RWYL algorithm. Then there exist positive numbers $b_1$, $b_2$, $b_3$, and $b_4$ such that, for all $k$,
\[ b_1 \leq \alpha_k, \qquad \alpha_k \leq b_2, \qquad \|d_k\| \leq b_3 \|g_k\|, \qquad \|\tilde{y}_{k-1}\| \leq b_4 \|s_{k-1}\|. \tag{36} \]

Proof. Considering the second inequality of (36) first: when $\alpha_k = \alpha_k^0$, by (9), (10), the mean-value theorem, and Assumption ii, we get
\[ \alpha_k^0 = \frac{-g_k^T d_k}{d_k^T \nabla^2 f(\zeta_k) d_k} = \frac{\|g_k\|^2}{d_k^T \nabla^2 f(\zeta_k) d_k} \leq \frac{\|g_k\|^2}{m \|d_k\|^2} \leq \frac{1}{m}, \]
where $\zeta_k$ lies on the segment $[x_k, x_k + \epsilon_k d_k]$; when the Wolfe conditions are invoked instead, Lemma 3 gives $\alpha_k \leq c_1$, so the second inequality holds with $b_2 = \max\{1/m, c_1\}$. For the fourth inequality, the estimate (31) gives $\|\tilde{y}_{k-1}\| \leq 2L \|s_{k-1}\|$, that is, $b_4 = 2L$. For the third inequality, by the above conclusions, (6), and the argument of Lemma 4, each iteration satisfies $\|d_k\| \leq \|g_k\| (1 + 4L b_2 \|d_{k-1}\|^2 / \|g_{k-1}\|^2)$; since the RWYL algorithm restarts with $d = -g$ at least every $r$ iterations, applying this recursion at most $r$ times yields a uniform constant $b_3$. Finally, by the mean-value theorem and Assumption ii again,
\[ \alpha_k \geq \min\left\{ \frac{\|g_k\|^2}{M \|d_k\|^2}, \, c_2 \frac{\|g_k\|^2}{\|d_k\|^2} \right\} \geq \frac{\min\{1/M, c_2\}}{b_3^2} =: b_1. \]
The proof is complete.

In the following, we prove the n-step quadratic convergence of the RWYL algorithm. We always let Assumption ii hold and let $\{x_k\}$ be generated by the RWYL algorithm. Denoting by $x^*$ the unique solution of problem (1), by Theorems 2 and 5 we have $x_k \to x^*$. The equation $\alpha_k = \alpha_k^0$ always holds provided $k$ is large enough, by Theorem 6. In order to establish this convergence of the RWYL algorithm, we further need the following assumption.

Assumption iii. In some neighborhood $N$ of $x^*$, $\nabla^2 f$ is Lipschitz continuous.
Based on Assumption iii and the above lemmas, we have the following remarks. Let $q$ be the second-order approximate (quadratic) function of $f$ in a neighborhood of the current restart point $x_k$; then we have
\[ q(x) = f(x_k) + g_k^T (x - x_k) + \frac{1}{2} (x - x_k)^T \nabla^2 f(x_k) (x - x_k). \]
Let $\{\bar{x}_i\}$ and $\{\bar{d}_i\}$ be the iterates and directions generated by the RWYL algorithm to minimize the quadratic function $q$ with the initial point $\bar{x}_0 = x_k$. Specifically, the sequence $\{\bar{x}_i\}$ is generated by the following process: $\bar{x}_{i+1} = \bar{x}_i + \bar{\alpha}_i \bar{d}_i$ and $\bar{d}_0 = -\nabla q(\bar{x}_0)$, where, for $i \geq 1$, $\bar{d}_i$ is computed by (6) with $f$ replaced by $q$. From the proof of Theorem 6, it is not difficult to see that, when $k$ is sufficiently large, the step length $\bar{\alpha}_i$ can always be found. Because $q$ is a quadratic function, $\bar{\alpha}_i$ is the same as the step length obtained by the exact line search. Consequently, we have $\nabla q(\bar{x}_{i+1})^T \bar{d}_i = 0$; moreover, there is an index $i_0 \leq n$ such that $\bar{x}_{i_0}$ is the exact minimizer of $q$.
Similar to the lemmas in [30], it is not difficult to get the following relations: for $x$, $y$ in a neighborhood of $x^*$,
\[ \|g(x) - \nabla q(x)\| \leq \frac{\tilde{L}}{2} \|x - x_k\|^2 \quad \text{and} \quad \|g(x) - g(y) - \nabla^2 f(x_k)(x - y)\| \leq \tilde{L} \left( \|x - x_k\| + \|y - x_k\| \right) \|x - y\|, \tag{37} \]
where $\tilde{L}$ is the Lipschitz constant of Assumption iii.

The following lemma shows that the parameter $\vartheta_k$ will converge to 0.

Lemma 9. For the parameter $\vartheta_k$ in (6), one has
\[ \lim_{k \to \infty} \vartheta_k = 0. \tag{38} \]

Proof. Let $b_1, \ldots, b_4$ be defined by Lemma 8, let $\alpha_k^*$ denote the exact step length (34), and recall that $\alpha_k = \alpha_k^0$ for $k$ sufficiently large (Theorem 6). We get
\[ |\alpha_k^0 - \alpha_k^*| = \|g_k\|^2 \left| \frac{1}{d_k^T \nabla^2 f(\zeta_k) d_k} - \frac{1}{d_k^T \nabla^2 f(\xi_k) d_k} \right| \leq \frac{\tilde{L} \|\zeta_k - \xi_k\| \|g_k\|^2}{m^2 \|d_k\|^2} \leq \frac{\tilde{L}}{m^2} \|\zeta_k - \xi_k\|, \]
where $\tilde{L}$ is the Lipschitz constant of $\nabla^2 f$ on the set $N$ and the last step uses (10). Then we get $\|\zeta_k - \xi_k\| \leq (\epsilon_k + \alpha_k^*) \|d_k\| \leq (\epsilon_k + b_2) \|d_k\|$. For $k$ sufficiently large, $\|d_k\| \leq b_3 \|g_k\| \to 0$, so $|\alpha_k - \alpha_k^*| \to 0$. By the mean-value theorem, we have
\[ |g_{k+1}^T d_k| = \left| g_{k+1}^T d_k - g(x_k + \alpha_k^* d_k)^T d_k \right| \leq L |\alpha_k - \alpha_k^*| \|d_k\|^2. \]
Therefore, by the definition of $\vartheta_{k+1}$ and (36), we have
\[ |\vartheta_{k+1}| = \frac{|g_{k+1}^T d_k|}{\|g_k\|^2} \leq L b_3^2 |\alpha_k - \alpha_k^*| \to 0. \]
This completes the proof.

Theorem 10. Let Assumptions ii and iii hold; then, for all $0 \leq i \leq i_0$, one gets
\[ \|x_{k+i} - \bar{x}_i\| = O(\|x_k - x^*\|^2) \tag{39} \]
and
\[ \|x_{k+i_0} - x^*\| = O(\|x_k - x^*\|^2). \tag{40} \]

Proof. First, we prove the following relations: for all $0 \leq i \leq i_0$, we have
\[ \|x_{k+i} - \bar{x}_i\| = O(\|x_k - x^*\|^2), \quad \|d_{k+i} - \bar{d}_i\| = O(\|x_k - x^*\|^2), \quad |\alpha_{k+i} - \bar{\alpha}_i| = O(\|x_k - x^*\|). \tag{41} \]
For the RWYL algorithm, $x_k = \bar{x}_0$ and $d_k = -g_k = \bar{d}_0$, so the relations (41) obviously hold for $i = 0$. Suppose that (41) holds for some index $i$; we prove that it holds for $i + 1$. Using (37), Lemma 8, Lemma 9, and the mean-value theorem, we get
\[ \|x_{k+i+1} - \bar{x}_{i+1}\| \leq \|x_{k+i} - \bar{x}_i\| + |\alpha_{k+i} - \bar{\alpha}_i| \|d_{k+i}\| + \bar{\alpha}_i \|d_{k+i} - \bar{d}_i\| = O(\|x_k - x^*\|^2), \]
where the estimate follows from (41) for index $i$, the bound $\|d_{k+i}\| = O(\|x_k - x^*\|)$ of Lemma 8, and the boundedness of $\bar{\alpha}_i$; similar estimates hold for $\|d_{k+i+1} - \bar{d}_{i+1}\|$ and $|\alpha_{k+i+1} - \bar{\alpha}_{i+1}|$. Thus (41) holds for all $0 \leq i \leq i_0$; in particular, (39) holds. Now we prove that (40) holds. Considering that $\bar{x}_{i_0}$ is the exact minimizer of the quadratic model $q$ and that $\nabla^2 f$ is Lipschitz continuous near $x^*$ (Assumption iii), we get
\[ \|\bar{x}_{i_0} - x^*\| = O(\|x_k - x^*\|^2); \]
hence, with (39),
\[ \|x_{k+i_0} - x^*\| \leq \|x_{k+i_0} - \bar{x}_{i_0}\| + \|\bar{x}_{i_0} - x^*\| = O(\|x_k - x^*\|^2). \]
Therefore (40) holds. This completes the proof.

Based on the above lemmas and similar to Theorem 4.2 of [32], we can get the n-step quadratic convergence of the RWYL algorithm. Here we only state it and omit the proof.

Theorem 11. Let Assumptions ii and iii hold; then there exists a constant $c_4 > 0$ satisfying, for every restart point $x_k$ with $k$ sufficiently large,
\[ \|x_{k+i_0} - x^*\| \leq c_4 \|x_k - x^*\|^2, \quad i_0 \leq n. \]
Namely, the RWYL algorithm is n-step quadratically convergent.

5. Numerical Results

This section reports some numerical experiments with Algorithm 7 (RWYL). In order to show the effectiveness of the given algorithm, we also test the algorithm of [15, 19] (Hager-Zhang), the algorithm of [6, 7] (PRP), and the algorithm of [21] (WYL). In all four algorithms, the Wolfe-Powell line search technique is used with the same parameter values, and RWYL additionally uses the restart constant $r$. The program is stopped if the tolerance condition on the gradient norm or that on the objective reduction holds.

5.1. Normal Unconstrained Optimization Problems

The unconstrained optimization problems with the given initial points can be found at http://camo.ici.ro/neculai/THREECG/funcname.txt,

which were collected by Neculai Andrei. The programs are written in Fortran, and the codes are downloaded from http://users.clas.ufl.edu/hager/papers/Software/,

which are written by Hager and Zhang.

All codes run on a PC with a Core 2 Duo CPU at 3.2 GHz, 2.00 GB of RAM, and the Windows 7 operating system. The dimensions of the test problems are 10000, 50000, and 100000 variables. The dimension (dim) of the variable, the CPU time in seconds (CPU), the number of iterations (NI), the number of function and gradient evaluations (NFG), the final function value, and the norm of the gradient at termination are recorded for these four algorithms. The performance profiles of Dolan and Moré [37] are used to analyze the performance data of these four algorithms: for each algorithm, the fraction of problems solved within a given factor of the best time is plotted. In a performance profile plot, the top curve corresponds to the algorithm that solves the most problems within a given factor of the best time.
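The profile curves in Figures 1–9 can be computed directly from the recorded CPU times (or NI/NFG counts). A small sketch of the Dolan-Moré construction [37], under our own array layout, is:

```python
import numpy as np

def performance_profile(T):
    """Dolan-More performance profile [37]. T is an (n_problems, n_solvers)
    array of positive costs (CPU, NI, or NFG; np.inf marks a failure).
    Returns the factors tau and the fraction rho[t, s] of problems that
    solver s solves within a factor tau[t] of the best solver."""
    ratios = T / T.min(axis=1, keepdims=True)      # performance ratios
    taus = np.unique(ratios[np.isfinite(ratios)])  # breakpoints of the profile
    rho = np.array([(ratios <= t).mean(axis=0) for t in taus])
    return taus, rho   # plot rho[:, s] against taus for each solver s
```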

Figures 1–3 show the performance of the RWYL, Hager-Zhang, PRP, and WYL algorithms with dimension 10000 in terms of NI, NFG, and CPU time, respectively. It is not difficult to see that all four algorithms successfully solve the given problems. RWYL has the best profile among the four algorithms, and the normal WYL algorithm has the worst performance.

In Figures 4–6, we use CPU time, NI, and NFG to compare the performance of the conjugate gradient codes RWYL, Hager-Zhang, PRP, and WYL on dimension 50000. These three figures indicate that, relative to CPU time, NI, and NFG, RWYL is fastest, then PRP, then the Hager-Zhang algorithm, and then WYL. These codes differ only in their choice of the search direction; we can therefore conclude that RWYL generates the best search directions for these test problems, on average.

In Figures 7–9, we use CPU time, NI, and NFG to compare the performance of the conjugate gradient codes RWYL, the Hager-Zhang algorithm, PRP, and WYL on dimension 100000. These three figures indicate that, relative to CPU time, NI, and NFG, RWYL is fastest, then the Hager-Zhang algorithm, then PRP, and then WYL.

According to these nine figures, it is easy to see that RWYL has the best performance for dimensions 10000, 50000, and 100000. The Hager-Zhang algorithm becomes more competitive as the dimension becomes large, which shows that it is very effective for large-scale problems. The PRP algorithm has stable numerical performance for all dimensions. The normal WYL can also successfully solve the optimization problems, but its efficiency is limited. To show the CPU time, NI, and NFG of these four algorithms directly, the totals for each algorithm are listed in Table 1.

5.2. Benchmark Problems (Engineering Problems)

The following Benchmark Problems can be found at http://www.cs.cmu.edu/afs/cs/project/jair/pub/volume24/ortizboyer05a-html/node6.html.

Sphere function:
\[ f(x) = \sum_{i=1}^{n} x_i^2. \]

Schwefel function:
\[ f(x) = \sum_{i=1}^{n} |x_i| + \prod_{i=1}^{n} |x_i|. \]

Schwefel’s function:
\[ f(x) = \sum_{i=1}^{n} \left( \sum_{j=1}^{i} x_j \right)^2. \]

Griewank function:
\[ f(x) = \frac{1}{4000} \sum_{i=1}^{n} x_i^2 - \prod_{i=1}^{n} \cos\left( \frac{x_i}{\sqrt{i}} \right) + 1. \]

Rastrigin function:
\[ f(x) = \sum_{i=1}^{n} \left( x_i^2 - 10 \cos(2 \pi x_i) + 10 \right). \]

Benchmark Problems come from engineering fields, and many scholars focus on the study of these problems. The given algorithm of this paper can also successfully solve them. We carry out the experiments comparing RWYL with the normal WYL and omit the other two methods (Hager-Zhang and PRP). The parameters and the stopping rule are the same as in the above subsection. The codes are written in Matlab 2017 and run on a PC with a Core 2 Duo CPU at 2.26 GHz, 6.00 GB of RAM, and the Windows 7 operating system. The dimensions are 300 and 1000 variables. The detailed numerical results are listed in Tables 2 and 3.
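For reference, minimal implementations of the five test functions above are sketched below; the two Schwefel variants are our assumptions (Problems 2.22 and 1.2), since the displayed formulas could not be recovered verbatim from the source.

```python
import numpy as np

def sphere(x):
    return np.sum(x ** 2)

def schwefel_222(x):                       # assumed variant: Schwefel Problem 2.22
    return np.sum(np.abs(x)) + np.prod(np.abs(x))

def schwefel_12(x):                        # assumed variant: Schwefel Problem 1.2
    return np.sum(np.cumsum(x) ** 2)

def griewank(x):
    i = np.arange(1, x.size + 1)
    return np.sum(x ** 2) / 4000.0 - np.prod(np.cos(x / np.sqrt(i))) + 1.0

def rastrigin(x):
    return np.sum(x ** 2 - 10.0 * np.cos(2.0 * np.pi * x) + 10.0)
```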

To see the results of Tables 2 and 3 directly, we compute the total NI and NFG and report them in Table 4.

The results of Table 4 show that the restart algorithm is more competitive than the normal algorithm for the Benchmark Problems.

5.3. Parameters Estimation of Nonlinear Muskingum Models

The basic Muskingum model consists of the continuity equation and the storage equation:
\[ \frac{dS_t}{dt} = I_t - Q_t, \qquad S_t = K \left[ x I_t + (1 - x) Q_t \right]^m, \]
where, at time $t$, $S_t$ is the channel storage, $I_t$ is the rate of inflow, $Q_t$ denotes the outflow, $K$ is the storage-time constant, $x$ is the weighting factor of the river reach, and $m$ is the exponent accounting for the nonlinearity. Discretizing the model with the generalized trapezoidal formula [38] yields the routing scheme denoted here as Model 1. To conveniently estimate the parameters $K$, $x$, and $m$ in the nonlinear Muskingum model, the objective function can be rewritten as a nonlinear least-squares problem, denoted here as Model 2. This subsection uses our RWYL to estimate the parameters of the above two Muskingum models, Model 1 and Model 2.
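As an illustration of how the calibration is posed as an instance of (1), the following hedged sketch builds a least-squares objective for the storage equation; the exact discretization of [38] is not reproduced here, and the residual definition below is an assumption for illustration only.

```python
import numpy as np

def muskingum_objective(params, I, Q):
    """Least-squares calibration of the nonlinear Muskingum model
    S = K * [x*I + (1 - x)*Q]**m (a sketch; unit time steps assumed).
    I and Q are observed inflow/outflow arrays of equal length."""
    K, x, m = params
    # storage implied by the storage equation at each observation
    S_model = K * (x * I + (1.0 - x) * Q) ** m
    # storage accumulated from the continuity equation dS/dt = I - Q,
    # integrated with the trapezoidal rule and anchored at the first point
    dS = 0.5 * ((I[1:] + I[:-1]) - (Q[1:] + Q[:-1]))
    S_cont = S_model[0] + np.concatenate(([0.0], np.cumsum(dS)))
    return float(np.sum((S_model - S_cont) ** 2))
```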

All in all, we can conclude that the restart algorithm is competitive with the normal algorithm without the restart technique and with other similar algorithms.

6. Conclusions

The nonlinear conjugate gradient algorithm is one of the most effective methods in optimization, especially for large-scale optimization problems, and many scholars have obtained interesting results in this field. This paper focuses on a modified WYL CG algorithm with a restart technique for large-scale optimization. In our opinion, there are at least seven issues that warrant further research and improvement. (i) The first issue that should be considered is the choice of the restart parameter in the RWYL algorithm; the value used here is not the only choice. (ii) The second important issue is the termination condition; better termination conditions may exist for CG algorithms, which may improve the numerical performance and convergence. (iii) Under the restart strategy, other similar CG algorithms with quadratic convergence are worth studying. (iv) It would be interesting to test the performance of the given algorithm when applied to other optimization problems that arise in the image processing field. (v) We all know that nonmonotone line search techniques are very effective. In the future, we will study the possibility of combining CG algorithms with nonmonotone techniques for large-scale optimization problems and will attempt to obtain good results. (vi) In the experiments, 59 optimization problems with 10000, 50000, and 100000 variables are tested. We also test the Benchmark Problems, which have wide applications in engineering fields. In the future, more problems and numerical experiments should be carried out to demonstrate the performance of CG algorithms. (vii) The last issue is the use of CG algorithms for nonsmooth optimization and nonlinear equations, which we consider to be important for future research. All these topics will be the focus of our future work.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (Grant no. 11661009), the Guangxi Science Fund for Distinguished Young Scholars (no. 2015GXNSFGA139001), the Guangxi Natural Science Key Fund (no. 2017GXNSFDA198046), and the Guangxi Science Fund of Young and Middle-Aged Teachers for the Basic Ability Promotion (no. 2017KY0019).