A Newton-Like Trust Region Method for Large-Scale Unconstrained Nonconvex Minimization
We present a new Newton-like method for large-scale unconstrained nonconvex minimization. A new straightforward limited memory quasi-Newton update, based on the modified quasi-Newton equation, is derived to construct the trust region subproblem; it uses both function value and gradient information to build the approximate Hessian. The global convergence of the algorithm is proved. Numerical results indicate that the proposed method is competitive and efficient on some classical large-scale nonconvex test problems.
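1. Introduction
We consider the following unconstrained optimization problem:
$$\min_{x \in \mathbb{R}^n} f(x), \tag{1}$$
where $f : \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable.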
Trust region methods [1–14] are robust, can be applied to ill-conditioned problems, and have strong global convergence properties. Another advantage is that the approximate Hessian of the trust region subproblem is not required to be positive definite. Trust region methods are therefore important and efficient for nonconvex optimization problems [6–8, 10, 12, 14]. For a given iterate $x_k$, the main computation of a trust region algorithm is solving the following quadratic subproblem:
$$\min_{d \in \mathbb{R}^n} q_k(d) = g_k^T d + \tfrac{1}{2} d^T B_k d, \quad \text{s.t. } \|d\| \le \Delta_k, \tag{2}$$
where $g_k$ is the gradient of $f$ at $x_k$, $B_k$ is the true Hessian or its approximation, $\Delta_k$ is the trust region radius, and $\|\cdot\|$ denotes the Euclidean norm on $\mathbb{R}^n$. For a trial step $d_k$ generated by solving subproblem (2), the agreement between the predicted reduction and the actual reduction of the objective function is measured by the ratio
$$\rho_k = \frac{f(x_k) - f(x_k + d_k)}{q_k(0) - q_k(d_k)}. \tag{3}$$
The trust region radius is then updated according to the value of $\rho_k$. Trust region methods that ensure at least a Cauchy (steepest-descent-like) decrease at each iteration satisfy an evaluation complexity bound of the same order as steepest descent under identical conditions. It follows that Newton's method globalized by trust region regularization satisfies the same evaluation upper bound; such a bound can also be shown to be tight, provided additionally that the Hessian is Lipschitz continuous on the path of the iterates for which pure Newton steps are taken.
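To make the roles of the quadratic model and the ratio concrete, here is a minimal pure-Python sketch. It evaluates the model of subproblem (2), takes a Cauchy (steepest-descent-like) step inside the trust region, and forms the ratio of actual to predicted reduction. The test function, the starting point, and all function names (`model`, `cauchy_step`, `reduction_ratio`) are illustrative assumptions, not taken from the paper.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(B, v):
    return [dot(row, v) for row in B]

def model(g, B, d):
    """Quadratic model q(d) = g^T d + 0.5 * d^T B d (so q(0) = 0)."""
    return dot(g, d) + 0.5 * dot(d, matvec(B, d))

def cauchy_step(g, B, delta):
    """Minimizer of the model along -g, restricted to ||d|| <= delta."""
    gnorm = math.sqrt(dot(g, g))
    gBg = dot(g, matvec(B, g))
    # Nonpositive curvature along -g: go all the way to the boundary.
    tau = 1.0 if gBg <= 0.0 else min(1.0, gnorm ** 3 / (delta * gBg))
    return [-tau * delta / gnorm * gi for gi in g]

def reduction_ratio(f, x, d, g, B):
    """Actual reduction of f divided by the reduction predicted by the model."""
    x_new = [xi + di for xi, di in zip(x, d)]
    actual = f(x) - f(x_new)
    predicted = -model(g, B, d)  # q(0) - q(d)
    return actual / predicted

# Hypothetical nonconvex test function: f(x) = x1^4 - x1^2 + x2^2.
def f(x):
    return x[0] ** 4 - x[0] ** 2 + x[1] ** 2

x = [1.0, 1.0]
g = [4 * x[0] ** 3 - 2 * x[0], 2 * x[1]]          # exact gradient at x
B = [[12 * x[0] ** 2 - 2, 0.0], [0.0, 2.0]]       # exact Hessian at x
d = cauchy_step(g, B, delta=0.5)
rho = reduction_ratio(f, x, d, g, B)
```

A ratio close to 1 indicates that the model predicted the change in the objective well on this step; a small or negative ratio indicates poor agreement.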
Newton’s method has been efficiently safeguarded to ensure its global convergence to first- and even second-order critical points in the presence of local nonconvexity of the objective, using line search, trust region, or other regularization techniques [9, 13]. Many variants of these globalization techniques have been proposed. They generally retain fast local convergence under some nondegeneracy assumptions, are often suitable for large-scale problems, and sometimes allow approximate rather than true Hessians to be employed. Solving large-scale problems is expensive in both computation and storage, so many researchers have studied limited memory techniques [15–24]. These techniques were first applied to line search methods. Liu and Nocedal [15, 16] proposed a limited memory BFGS method (L-BFGS) for unconstrained optimization and proved its global convergence. Byrd et al. gave compact representations of the limited memory BFGS and SR1 formulas, which made it possible to combine limited memory techniques with trust region methods. Observing that the L-BFGS updating formula uses only gradient information and ignores the available function value information, Yang and Xu derived a modified quasi-Newton formula with a limited memory compact representation, based on the modified quasi-Newton equation with a vector parameter. Recently, some researchers have combined limited memory techniques with trust region methods for solving large-scale unconstrained and constrained optimization problems [20–24].
In this paper, we derive a new straightforward limited memory quasi-Newton update based on the modified quasi-Newton equation, which uses both the available gradient and function value information, to construct the trust region subproblem. The corresponding trust region method is then proposed for large-scale unconstrained nonconvex minimization. The global convergence of the new algorithm is proved under some appropriate conditions.
The rest of the paper is organized as follows. In the next section, we derive the new straightforward limited memory quasi-Newton update. In Section 3, a Newton-like trust region method for large-scale unconstrained nonconvex minimization is proposed and its convergence is proved under some reasonable assumptions. Some numerical results are given in Section 4.
2. The Modified Limited Memory Quasi-Newton Formula
In this section, we derive a straightforward limited memory quasi-Newton update based on the modified quasi-Newton equation. It employs both gradients and function values to construct the approximate Hessian, which compensates for the data discarded by limited memory techniques. We then apply the derived formula in a trust region method.
Consider the following modified quasi-Newton equation : where , , , and . The quasi-Newton updating matrix constructed by (4) achieves a higher order accuracy in approximating Hessian. Based on (4), the modified BFGS (MBFGS) updating is as follows: For twice continuously differentiable function, if converges to a point at which and is positive definite, then , and then . Moreover, if is sufficiently large, the MBFGS updating approaches to the BFGS updating.
Then formula (5) can be rewritten into the straightforward formula where and . Thus, can be recursively expressed as Let , and let . Then the above formula can be simply written as Formula (8) is called the whole memory quasi-Newton formula. For a given positive integer ( usually is taken for ), if we use the last pairs at the th iteration to update the starting matrix times, according to (8), we get the following limited memory MBFGS (L-MBFGS) formula: where , ; then where and .
Since the vectors and can be obtained and saved from the previous iterations, we only need to compute the vectors and to achieve the limited memory quasi-Newton updating matrix. Suppose , the computation of needs multiplications. Then we consider the computation of . If can be saved and multiplies by directly, the process needs multiplications. In this paper, we compute the product by (9). Consider So we need multiplications to achieve . Let ; then . It takes multiplications to compute . Ignoring lower order terms, it is a total of multiplications to obtain .
It is noticed that the only difference between the limited memory quasi-Newton method and the standard quasi-Newton method is in the matrix updating. Instead of storing the matrices , we need to store pairs vectors to define implicitly. The product or is obtained by performing a sequence of inner products involving and the most recent vectors pairs .
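The idea of defining the matrix implicitly through stored vectors can be sketched as follows. This is a minimal pure-Python illustration that uses the standard BFGS update in its unrolled form, B = gamma*I + sum_i (b_i b_i^T - a_i a_i^T), rather than the modified L-MBFGS formula of this paper (which additionally uses function values). The names `build_factors` and `apply_B`, the scaling `gamma`, and the sample pairs are assumptions made for illustration only.

```python
import math

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def axpy(alpha, x, y):
    """Return alpha * x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]

def apply_B(v, a_list, b_list, gamma=1.0):
    """Product B v using only inner products with the stored vectors."""
    out = [gamma * vi for vi in v]
    for a, b in zip(a_list, b_list):
        out = axpy(dot(b, v), b, out)    # + b (b^T v)
        out = axpy(-dot(a, v), a, out)   # - a (a^T v)
    return out

def build_factors(pairs, gamma=1.0):
    """From stored (s_i, y_i) pairs, build a_i and b_i for standard BFGS.

    b_i = y_i / sqrt(y_i^T s_i), a_i = B_i s_i / sqrt(s_i^T B_i s_i),
    where B_i is the matrix defined by the factors built so far.
    """
    a_list, b_list = [], []
    for s, y in pairs:
        sy = dot(s, y)
        if sy <= 0.0:
            raise ValueError("curvature condition s^T y > 0 violated")
        Bs = apply_B(s, a_list, b_list, gamma)
        a_list.append([t / math.sqrt(dot(s, Bs)) for t in Bs])
        b_list.append([t / math.sqrt(sy) for t in y])
    return a_list, b_list

# Hypothetical stored pairs (here y = A s for A = [[2, 1], [1, 3]]).
pairs = [([1.0, 0.0], [2.0, 1.0]), ([0.0, 1.0], [1.0, 3.0])]
a_list, b_list = build_factors(pairs)
Bs_last = apply_B(pairs[-1][0], a_list, b_list)
```

With m stored pairs and n variables, each call to `apply_B` costs on the order of m*n multiplications and never forms an n-by-n matrix. In a limited memory setting only the most recent pairs would be kept, for example in a `collections.deque` with a fixed `maxlen`.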
In the following, we discuss the computation of the products and , . As the situation of (11), we need multiplications to obtain . If has been computed, we only need to solve a vector product to obtain which needs multiplications. If has not been computed, we compute directly by using (9). Consider The whole computation only requires multiplications. Thus, multiplications are saved in contrast to the previous method.
If we take , and , have been obtained and saved from the previous iteration, from (11), there are multiplications to compute ; it is a considerable improvement on computation comparing with .
Algorithm 1. Compute and save , . For ,
Step 1. Compute .
Step 2. Compute .
Step 3. Compute .
Algorithm 2. Compute , . Let be the current iteration point; the vectors , and matrices , have been obtained in the previous iteration.
Step 1. Update , .
Step 2. Compute , .
We use the form of (9) to store . Instead of updating into , we update , into , .
3. Newton-Like Trust Region Method
In this section, we present a Newton-like trust region method for large-scale unconstrained nonconvex minimization.
Algorithm 3. Step 0. Given , , , , , is a given matrix. Compute ; set .
Step 1. If , then stop.
Step 2. Solve the subproblem (2) to obtain .
Step 3. Compute
Step 4. Compute
Step 5. Update the trust region radius as the following:
Step 6. By implementing Algorithm 1 to update , into , in order to update into , set ; go to Step 1.
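The acceptance test and radius update in a trust region iteration can be sketched as below. The thresholds and scaling factors are common textbook defaults chosen for illustration; they are assumptions and are not the parameter values or the exact update rule of Algorithm 3. The function names `accept_step` and `update_radius` are likewise hypothetical.

```python
def accept_step(rho, eta0=1e-4):
    """Accept the trial step only if the actual reduction is a
    sufficient fraction of the predicted reduction."""
    return rho > eta0

def update_radius(rho, delta, step_norm,
                  eta1=0.25, eta2=0.75, shrink=0.25, grow=2.0,
                  delta_max=100.0):
    """Generic trust region radius update driven by the ratio rho."""
    if rho < eta1:
        # Poor agreement between model and objective: shrink the region.
        return shrink * delta
    if rho > eta2 and step_norm >= (1.0 - 1e-8) * delta:
        # Very good agreement and the step hit the boundary: enlarge.
        return min(grow * delta, delta_max)
    # Otherwise keep the radius unchanged.
    return delta
```

For example, a step with `rho = 0.9` that reaches the boundary of a region of radius 1.0 doubles the radius under these defaults, while `rho = 0.1` reduces it to 0.25.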
In Step 2, subproblem (2) is solved by the CG-Steihaug algorithm, which is suitable for large-scale unconstrained optimization. During the solution process, the required matrix-vector products are computed by Algorithm 2. Then the whole computation of solving the subproblem only requires multiplications.
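For reference, the following is a minimal pure-Python sketch of the truncated conjugate gradient (CG-Steihaug) iteration for subproblem (2), in its standard textbook form. It may differ in details from the variant used by the authors. The callback `hess_vec`, which returns the product of the approximate Hessian with a vector, is a stand-in for whatever product routine is available (in this paper, Algorithm 2); the function names and tolerances are assumptions.

```python
import math

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def add_scaled(x, alpha, d):
    """Return x + alpha * d."""
    return [xi + alpha * di for xi, di in zip(x, d)]

def to_boundary(z, d, delta):
    """From z (inside the region), move along d until ||z + tau*d|| = delta."""
    a, b, c = dot(d, d), 2.0 * dot(z, d), dot(z, z) - delta ** 2
    tau = (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return add_scaled(z, tau, d)

def steihaug_cg(g, hess_vec, delta, tol=1e-8, max_iter=None):
    """Approximately minimize g^T d + 0.5 d^T B d subject to ||d|| <= delta."""
    n = len(g)
    z = [0.0] * n
    r = list(g)                      # residual of B z + g at z = 0
    d = [-ri for ri in r]
    if math.sqrt(dot(r, r)) < tol:
        return z
    for _ in range(max_iter or 2 * n):
        Bd = hess_vec(d)
        dBd = dot(d, Bd)
        if dBd <= 0.0:
            # Nonpositive curvature: follow d to the boundary.
            return to_boundary(z, d, delta)
        alpha = dot(r, r) / dBd
        z_next = add_scaled(z, alpha, d)
        if math.sqrt(dot(z_next, z_next)) >= delta:
            # The CG iterate leaves the region: stop on the boundary.
            return to_boundary(z, d, delta)
        r_next = add_scaled(r, alpha, Bd)
        if math.sqrt(dot(r_next, r_next)) < tol:
            return z_next
        beta = dot(r_next, r_next) / dot(r, r)
        d = add_scaled([-ri for ri in r_next], beta, d)
        z, r = z_next, r_next
    return z

# Usage with a hypothetical diagonal matrix B = diag(2, 4).
step = steihaug_cg([-2.0, -4.0], lambda v: [2.0 * v[0], 4.0 * v[1]], delta=10.0)
```

Because the method only needs products with the matrix, it never requires the matrix to be formed or factorized, and it handles indefinite matrices by stopping on the trust region boundary.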
To give the convergence result, we need the following assumptions.
Assumption 4.
(H1) The level set is contained in a bounded convex set.
(H2) The gradient of the objective function is Lipschitz continuous in the neighborhood of ; that is, there is a constant such that
(H3) The solution of the subproblem (2) satisfies where .
(H4) The solution of subproblem (2) satisfies for .
Lemma 5. Suppose that (H1) holds and is positive definite; there exist constants such that for any with . Then matrices are uniformly bounded.
Proof. From Taylor expansion
From (19), we obtain that
It is obvious that
Since , and from (9) (in which ), we have then by (25) and being positive definite, we have
By the definition of Euclidean norm: , when is a symmetric matrix, . Obviously, is a symmetric matrix. Suppose the eigenvalues of are ; then So, is uniformly bounded.
The proof is similar to Theorem 4.7 in  and is omitted.
4. Numerical Results
In this section, we apply Algorithm 3, denoted by NLMTR, to nonconvex programming problems and report preliminary numerical results that illustrate its performance. The comparison method, denoted by NTR, is the same as NLMTR except that the approximate Hessian is updated by the BFGS formula. All tests are implemented in Matlab R2008a on a PC with a 2.00 GHz CPU and 2.00 GB of RAM. The test problems for nonconvex unconstrained minimization are taken from Moré et al. and the CUTEr collection [26, 27]. These problems are listed in Table 1.
All numerical results are listed in Table 2, in which iter stands for the number of iterations, which equals the number of gradient evaluations; nf stands for the number of objective function evaluations; Prob stands for the problem label; Dim stands for the number of variables of the tested problem; cpu denotes the CPU time for solving the problems; is the terminated gradient; and denotes the optimal value.
We compare NLMTR with NTR. The trial step is computed by the CG-Steihaug algorithm. The matrix of NLMTR is updated by the straightforward L-MBFGS formula (9). Choosing , . The matrices of NTR are updated by the BFGS formula. The iteration is terminated by or , where . The related figures are listed in Table 2.
From Table 2, we can see that for small-scale problems, the optimal values and gradient norms obtained by NTR are more accurate than those of NLMTR. For medium-scale problems, NTR is still more accurate, but NLMTR requires less CPU time. For large-scale problems, NTR requires much more CPU time than NLMTR, and NTR fails on some problems, especially when . So NLMTR is suitable for solving large-scale nonconvex problems.
This work is supported in part by the NNSF (11171003) of China, the Key Project of Chinese Ministry of Education (no. 211039), and Natural Science Foundation of Jilin Province of China (no. 201215102).
C. Cartis, N. I. M. Gould, and P. L. Toint, “On the complexity of steepest descent, Newton's and regularized Newton's methods for nonconvex unconstrained optimization problems,” SIAM Journal on Optimization, vol. 20, no. 6, pp. 2833–2852, 2010.
H. Liu and Q. Ni, “New limited-memory symmetric secant rank one algorithm for large-scale unconstrained optimization,” Transactions of Nanjing University of Aeronautics and Astronautics, vol. 25, no. 3, pp. 235–239, 2008.
H. Y. Benson, “Cute models,” http://orfe.princeton.edu/~rvdb/ampl/nlmodels/cute/.