Research Article  Open Access
Mahboubeh Farid, Wah June Leong, Lihong Zheng, "Accumulative Approach in Multistep Diagonal Gradient-Type Method for Large-Scale Unconstrained Optimization", Journal of Applied Mathematics, vol. 2012, Article ID 875494, 11 pages, 2012. https://doi.org/10.1155/2012/875494
Accumulative Approach in Multistep Diagonal Gradient-Type Method for Large-Scale Unconstrained Optimization
Abstract
This paper focuses on developing diagonal gradient-type methods that employ an accumulative approach in multistep diagonal updating to determine a better Hessian approximation at each step. An interpolating curve is used to derive a generalization of the weak secant equation, which carries information about the local Hessian. The new parameterization of the interpolating curve in the variable space is obtained by applying the accumulative approach via a norm weighting defined by two positive definite weighting matrices. We also note that the storage needed for all computations of the proposed method is only $O(n)$. Numerical results show that the proposed algorithm is efficient and superior in comparison with some other gradient-type methods.
1. Introduction
Consider the unconstrained optimization problem
\[ \min_{x \in \mathbb{R}^n} f(x), \tag{1.1} \]
where $f:\mathbb{R}^n \to \mathbb{R}$ is a twice continuously differentiable function. Gradient-type methods for solving (1.1) can be written as
\[ x_{k+1} = x_k - B_k^{-1} g_k, \]
where $g_k$ and $B_k$ denote the gradient and the Hessian approximation of $f$ at $x_k$, respectively. By considering $B_k = (1/\alpha_k) I$, Barzilai and Borwein (BB) [1] give
\[ \alpha_k = \frac{s_{k-1}^T s_{k-1}}{s_{k-1}^T y_{k-1}}, \]
which is derived by minimizing $\|\alpha^{-1} s_{k-1} - y_{k-1}\|$ with respect to $\alpha$, where $s_{k-1} = x_k - x_{k-1}$ and $y_{k-1} = g_k - g_{k-1}$. Recently, some improved one-step gradient-type methods [2–5] in the frame of the BB algorithm were proposed to solve (1.1). There, $B_k$ is taken to be a diagonal nonsingular approximation to the Hessian, and a new approximating matrix is developed based on the weak secant equation of Dennis and Wolkowicz [6],
\[ s_k^T B_{k+1} s_k = s_k^T y_k. \tag{1.4} \]
In a one-step method, data from only one previous step is used to revise the current Hessian approximation. Later, Farid and Leong [7, 8] proposed multistep diagonal gradient methods inspired by the multistep quasi-Newton methods of Ford [9, 10]. In this multistep framework, a fixed-point approach for the interpolating polynomials is derived from data of several previous iterations (not only one previous step) [7–10]. The general multistep approach is based on the measurement of distances in the variable space, where the distance of every iterate is measured from one selected iterate. In this paper, we are interested in developing a multistep diagonal updating method based on an accumulative approach for defining the new parameter values of the interpolating curve. In this approach, the distance is accumulated between consecutive iterates as they are traversed in the natural sequence. To measure the distance, we parameterize the interpolating polynomial through a norm defined by a positive definite weighting matrix, say $M$. The performance of the multistep method may therefore be significantly improved by defining the weighting matrix carefully. The rest of the paper is organized as follows. In Section 2, we discuss a new multistep diagonal updating scheme based on the accumulative approach.
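As an illustration of the BB framework described above, the following sketch implements the two-point step size gradient method of [1]. The initial step length and the curvature safeguard threshold are illustrative choices of ours, not values prescribed by the paper.

```python
import numpy as np

def bb_gradient(grad, x0, tol=1e-6, max_iter=1000):
    """Barzilai-Borwein gradient method (illustrative sketch).

    The step length alpha_k = s^T s / s^T y uses the previous step
    s = x_k - x_{k-1} and gradient change y = g_k - g_{k-1}.
    """
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    alpha = 1.0  # initial step length: an illustrative default
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        x_new = x - alpha * g          # x_{k+1} = x_k - alpha_k g_k
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        sty = s @ y
        if sty > 1e-12:                # safeguard against non-positive curvature
            alpha = (s @ s) / sty      # BB step length
        x, g = x_new, g_new
    return x
```

On strictly convex quadratics this iteration is known to converge, although the decrease in the objective is typically nonmonotone.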
In Section 3, we establish the global convergence of the proposed method. Section 4 presents numerical results, and comparisons with the BB method and the one-step diagonal gradient method are reported. Conclusions are given in Section 5.
2. Derivation of the New Diagonal Updating via Accumulative Approach
This section develops new implicit updates for the diagonal gradient-type method through an accumulative approach to determining a better Hessian approximation at each iteration. In multistep diagonal updating methods, the weak secant equation (1.4) may be generalized by means of interpolating polynomials, instead of employing data from just one previous iteration as in one-step methods. Our aim is to derive efficient strategies for choosing a suitable set of parameters to construct the interpolating curve, and to investigate the best norm for measuring the distances required to parameterize the interpolating polynomials. In general, the method obeys a recursive formula of the form
\[ x_{k+1} = x_k - \alpha_k B_k^{-1} g_k, \]
where $x_k$ is the $k$th iterate, $\alpha_k$ is a step length determined by a line search, $B_k$ is an approximation to the Hessian in diagonal form, and $g_k$ is the gradient of $f$ at $x_k$. Consider a differentiable curve $x(\tau)$ in $\mathbb{R}^n$. The derivative of $f(x(\tau))$ at a point $x(\tau^*)$ can be obtained by applying the chain rule:
\[ \frac{d}{d\tau} f(x(\tau))\Big|_{\tau=\tau^*} = \nabla f(x(\tau^*))^T \, \frac{dx}{d\tau}\Big|_{\tau=\tau^*}. \]
We wish to derive a relation that is satisfied by the diagonal Hessian approximation at $x_{k+1}$. If we assume that $x(\tau)$ passes through $x_{k+1}$ and choose $\tau^*$ so that $x(\tau^*) = x_{k+1}$, then such a relation follows. Since in this paper we use a two-step method, we use the information of the most recent points $x_{k-1}$, $x_k$, $x_{k+1}$ and their associated gradients. Consider $x(\tau)$ as the interpolating vector polynomial of degree 2 through these points. The efficient selection of the distinct scalar parameter values through the new approach is the main contribution of this paper and will be discussed later in this section. Similarly, the gradient vector is approximated by interpolation. By denoting the resulting difference vectors of iterates and of gradients appropriately, we obtain the desired relation satisfied by the diagonal Hessian approximation at $x_{k+1}$. Corresponding to this two-step approach, the weak secant equation is generalized accordingly. Then $B_{k+1}$ can be obtained by using an appropriately modified version of the diagonal updating formula in [3].
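For concreteness, the one-step ingredients that the two-step scheme generalizes can be written out. The weak secant condition and the least-change diagonal update of [3] take the following form, where in the two-step setting the pair $(s_k, y_k)$ is replaced by the interpolation-based difference vectors; the symbols $r_k$, $w_k$, and $E_k$ below are our illustrative notation, not necessarily that of the original references.

```latex
% Weak secant condition imposed on the diagonal approximation B_{k+1}
% (one-step case: r_k = s_k, w_k = y_k):
\[
  r_k^{T} B_{k+1}\, r_k \;=\; r_k^{T} w_k ,
\]
% Least-change diagonal update satisfying this condition:
\[
  B_{k+1} \;=\; B_k \;+\;
  \frac{r_k^{T} w_k - r_k^{T} B_k r_k}{\operatorname{tr}\!\left(E_k^{2}\right)}\, E_k ,
  \qquad
  E_k \;=\; \operatorname{diag}\!\left( (r_k^{(1)})^{2}, \dots, (r_k^{(n)})^{2} \right).
\]
```

Because $B_k$ and $E_k$ are diagonal, the correction costs only $O(n)$ operations per iteration.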
Now we attempt to construct an algorithm for finding the desired vectors and thereby improve the Hessian approximation. The proposed method is outlined as follows. First, we seek strategies for choosing a suitable set of parameter values. These values are chosen so as to reflect distances between iterates in $\mathbb{R}^n$ that depend on some metric of weighted-norm form. They are established via the so-called accumulative approach, where the accumulated distances (measured by the metric) between consecutive iterates are used; without loss of generality, we take the origin of the parameterization at the earliest of the three points. The parameter set can then be constructed, with the individual values depending on the chosen metric. Since the set measures distances, the interpolating polynomials need to be parameterized via a norm defined by a positive definite matrix $M$. It is necessary to choose $M$ with some care, since the quality of the Hessian approximation can be strongly influenced by this choice. Two choices for the weighting matrix are considered in this paper. In the first choice, $M = I$, the norm reduces to the Euclidean norm, and the parameter values are obtained accordingly. The second choice is to take $M = B_k$, where the current $B_k$ is the diagonal approximation to the Hessian. In this way, the measurement of the relevant distances is determined by the properties of the current quadratic approximation (based on $B_k$) to the objective function. Since $B_k$ is a diagonal matrix, these quantities are not expensive to compute at each iteration. To safeguard against the possibility of very small or very large values, we require that a boundedness condition is satisfied; if it is not, the values are replaced by prescribed bounds. Note also that the Hessian approximation might not preserve positive definiteness at each step.
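The accumulative parameterization described above can be sketched as follows. The function name, the representation of the diagonal weighting matrix as a vector, and the variable names are illustrative assumptions of ours; the two branches correspond to the Euclidean choice $M = I$ and the choice $M = B_k$.

```python
import numpy as np

def accumulative_parameters(x_prev2, x_prev1, x_curr, B=None):
    """Accumulative parameterization of the interpolating curve (sketch).

    Distances between consecutive iterates are measured in the norm
    ||z||_M = sqrt(z^T M z); M = I gives the Euclidean choice, while
    M = B_k (the current diagonal Hessian approximation) weights the
    distances by the current quadratic model.
    Taking tau_0 = 0 as the origin, tau_1 and tau_2 accumulate the
    distances between consecutive iterates in their natural sequence.
    """
    def m_norm(z):
        if B is None:                   # Euclidean weighting, M = I
            return np.linalg.norm(z)
        return np.sqrt(z @ (B * z))     # diagonal M stored as a vector

    tau0 = 0.0
    tau1 = tau0 + m_norm(x_prev1 - x_prev2)
    tau2 = tau1 + m_norm(x_curr - x_prev1)
    return tau0, tau1, tau2
```

Storing the diagonal weighting matrix as a vector keeps the cost of each distance evaluation at $O(n)$.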
One of the fundamental concepts in this paper is to determine an "improved" version of the Hessian approximation to be used even in computing the metric, since a weighting matrix that defines a norm must be positive definite. To ensure that the updates remain positive definite, the scaling strategy proposed in [7] is applied; the new updating formula incorporates this scaling, which guarantees that the updated Hessian approximation is positive definite. Finally, the new accumulative MD algorithm is outlined as follows.
2.1. Accumulative MD Algorithm
Step 1. Choose an initial point , and a positive definite matrix .
Let .
Step 2. Compute $g_k$. If $\|g_k\| \le \epsilon$ for the chosen tolerance $\epsilon$, stop.
Step 3. If , set . If set and go to Step 5.
Step 4. If and is considered, compute from (2.13).
Else if , compute from (2.14).
Compute , and , from (2.15), (2.16), (2.17), and (2.20), respectively.
If , set and .
Step 5. Compute $d_k = -B_k^{-1} g_k$, and calculate the step length $\alpha_k$ such that the Armijo condition [11] holds:
$f(x_k + \alpha_k d_k) \le f(x_k) + \sigma \alpha_k g_k^T d_k$, where $\sigma \in (0, 1)$ is a given constant.
Step 6. Let , and update by (2.19).
Step 7. Set , and return to Step 2.
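A minimal sketch of the overall loop is given below, assuming the one-step weak secant correction of [3] in place of the paper's two-step formulas (2.13)-(2.20); the safeguard thresholds, the Armijo parameters, and all function names are illustrative assumptions, not the authors' exact choices.

```python
import numpy as np

def armijo(f, x, g, d, sigma=1e-4, rho=0.5):
    """Backtracking Armijo line search: find a with
    f(x + a d) <= f(x) + sigma * a * g^T d."""
    a, fx, slope = 1.0, f(x), g @ d
    while f(x + a * d) > fx + sigma * a * slope:
        a *= rho
    return a

def accumulative_md(f, grad, x0, tol=1e-6, max_iter=1000):
    """Illustrative diagonal gradient-type loop (not the authors' exact
    method): the diagonal matrix is revised with the one-step weak
    secant correction and clamped to stay positive definite."""
    x = np.asarray(x0, dtype=float)
    B = np.ones_like(x)                  # diagonal Hessian approximation
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        d = -g / B                       # d_k = -B_k^{-1} g_k
        a = armijo(f, x, g, d)
        x_new = x + a * d
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        E = s * s                        # diag(s^2) stored as a vector
        denom = np.sum(E * E)            # tr(E^2)
        if denom > 1e-12:
            B = B + ((s @ y - s @ (B * s)) / denom) * E
        B = np.clip(B, 1e-10, 1e10)      # keep B positive definite
        x, g = x_new, g_new
    return x
```

Every quantity in the loop is a length-$n$ vector, so the per-iteration storage and work remain $O(n)$.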
3. Convergence Analysis
This section is devoted to studying the convergence of the accumulative MD algorithm when applied to the minimization of a convex function. To begin, we give the following result, due to Byrd and Nocedal [12], for the step generated by the Armijo line search algorithm. Here and elsewhere, $\|\cdot\|$ denotes the Euclidean norm.
Theorem 3.1. Assume that $f$ is a strictly convex function. Suppose the Armijo line search algorithm is employed in a way that for any with , the step length satisfies where , and . Then there exist positive constants and such that either or is satisfied.
We can apply Theorem 3.1 to establish the convergence of some Armijo-type line search methods.
Theorem 3.2. Assume that $f$ is a strictly convex function. Suppose that the Armijo line search algorithm in Theorem 3.1 is employed with the search direction chosen to obey the following conditions: there exist positive constants and such that for all sufficiently large . Then the iterates generated by the line search algorithm have the property that
Proof. By (3.4), we have that either (3.2) or (3.6) becomes for some positive constants. Since is strictly convex, it is also bounded below. Then, (3.1) implies that as . This also implies that as or at least
To prove that the accumulative MD algorithm is globally convergent when applied to the minimization of a convex function, it is sufficient to show that the sequence generated by (2.19)-(2.20) is bounded both above and below for all finite , so that its associated search direction satisfies condition (3.4). Since the approximation is diagonal, it is enough to show that each of its diagonal elements is bounded above and below by some positive constants. The following theorem gives this boundedness.
Theorem 3.3. Assume that $f$ is a strictly convex function and that there exist positive constants and such that for all . Let be a sequence generated by the accumulative MD method. Then is bounded above and below, for all finite , by some positive constants.
Proof. Let be the th element of . Suppose is chosen such that , where and are some positive constants.
Case 1. If (2.18) is satisfied, we have
By (2.18) and the definition of , one can obtain
Thus, if , then satisfies
On the other hand, if , then
where is the th component of . Letting be the largest component (in magnitude) of , that is, ; for all , then it follows that , and the property of , (3.12) becomes
Hence, is bounded above and below, for all in both occasions.
Case 2. If (2.18) is violated, then the updating formula for becomes
where is the th component of , , and .
Because also implies that , this fact, together with the convexity property (3.8) and the definition of , gives
Using a similar argument to the above, that is, by letting be the largest component (in magnitude) of , it follows that
Hence, in both cases, is bounded above and below by some positive constants. Since the upper and lower bounds are independent of , we can proceed by induction to show that is bounded for all finite .
4. Numerical Results
In this section, we examine the practical performance of our proposed algorithm in comparison with the BB method and the standard one-step diagonal gradient-type method (MD). The new algorithms are referred to as AMD1 and AMD2 when $M = I$ and $M = B_k$ are used, respectively. For all methods we employ the Armijo line search [11]. All experiments in this paper are implemented on a PC with a Core Duo CPU using Matlab 7.0. For each run, the iteration is terminated when the gradient norm falls below a prescribed tolerance, and all attempts to solve the test problems were limited to a maximum of 1000 iterations. The test problems are chosen from the Andrei [13] and Moré et al. [14] collections and are summarized in Table 1. Our experiments are performed on a set of 36 nonlinear unconstrained problems with up to 10000 variables. Figures 1, 2, and 3 present the Dolan and Moré [15] performance profiles for all algorithms with respect to the number of iterations, the number of function calls, and the CPU time.

From Figure 1, we see that the AMD2 method is the top performer, being more successful than the other methods in the number of iterations. Figure 2 shows that the AMD2 method also requires the fewest function calls. From Figure 3, we observe that the AMD2 method is faster than the MD and AMD1 methods and needs reasonable time to solve large-scale problems when compared to the BB method. At each iteration, the proposed method does not require more storage than classic diagonal updating methods. Moreover, the higher-order accuracy in approximating the Hessian matrix of the objective function allows the AMD methods to need fewer iterations and fewer function evaluations. The numerical results reported in Figures 1, 2, and 3 demonstrate clearly that the new method AMD2 yields significant improvements when compared with BB, MD, and AMD1. Generally, AMD2 performs better than AMD1, most probably because $B_k$ is a better Hessian approximation than the identity matrix $I$.
5. Conclusion
In this paper, we propose a new two-step diagonal gradient-type method for unconstrained optimization based on an accumulative approach. The new parameterization for the multistep diagonal gradient-type method is developed by employing the accumulative approach on the interpolating curves that form the basis of the multistep approach. Numerical results show that the proposed method is suitable for solving large-scale unconstrained optimization problems and is more stable than other similar methods in practical computation. The improvement that our proposed methods bring comes at a cost of only $O(n)$ per iteration, whereas the multistep quasi-Newton methods of [9, 10] require about $O(n^2)$.
References
 J. Barzilai and J. M. Borwein, "Two-point step size gradient methods," IMA Journal of Numerical Analysis, vol. 8, no. 1, pp. 141–148, 1988.
 M. A. Hassan, W. J. Leong, and M. Farid, "A new gradient method via quasi-Cauchy relation which guarantees descent," Journal of Computational and Applied Mathematics, vol. 230, no. 1, pp. 300–305, 2009.
 W. J. Leong, M. A. Hassan, and M. Farid, "A monotone gradient method via weak secant equation for unconstrained optimization," Taiwanese Journal of Mathematics, vol. 14, no. 2, pp. 413–423, 2010.
 W. J. Leong, M. Farid, and M. A. Hassan, "Scaling on diagonal quasi-Newton update for large scale unconstrained optimization," Bulletin of the Malaysian Mathematical Sciences Society, vol. 35, no. 2, pp. 247–256, 2012.
 M. Y. Waziri, W. J. Leong, M. A. Hassan, and M. Monsi, "A new Newton's method with diagonal Jacobian approximation for systems of nonlinear equations," Journal of Mathematics and Statistics, vol. 6, pp. 246–252, 2010.
 J. E. Dennis, Jr. and H. Wolkowicz, "Sizing and least-change secant methods," SIAM Journal on Numerical Analysis, vol. 30, no. 5, pp. 1291–1314, 1993.
 M. Farid, W. J. Leong, and M. A. Hassan, "A new two-step gradient-type method for large-scale unconstrained optimization," Computers & Mathematics with Applications, vol. 59, no. 10, pp. 3301–3307, 2010.
 M. Farid and W. J. Leong, "An improved multi-step gradient-type method for large scale optimization," Computers & Mathematics with Applications, vol. 61, no. 11, pp. 3312–3318, 2011.
 J. A. Ford and I. A. Moghrabi, "Alternating multi-step quasi-Newton methods for unconstrained optimization," Journal of Computational and Applied Mathematics, vol. 82, no. 1-2, pp. 105–116, 1997.
 J. A. Ford and S. Tharmlikit, "New implicit updates in multi-step quasi-Newton methods for unconstrained optimisation," Journal of Computational and Applied Mathematics, vol. 152, no. 1-2, pp. 133–146, 2003.
 L. Armijo, "Minimization of functions having Lipschitz continuous first partial derivatives," Pacific Journal of Mathematics, vol. 16, pp. 1–3, 1966.
 R. H. Byrd and J. Nocedal, "A tool for the analysis of quasi-Newton methods with application to unconstrained minimization," SIAM Journal on Numerical Analysis, vol. 26, no. 3, pp. 727–739, 1989.
 N. Andrei, "An unconstrained optimization test functions collection," Advanced Modeling and Optimization, vol. 10, no. 1, pp. 147–161, 2008.
 J. J. Moré, B. S. Garbow, and K. E. Hillstrom, "Testing unconstrained optimization software," ACM Transactions on Mathematical Software, vol. 7, no. 1, pp. 17–41, 1981.
 E. D. Dolan and J. J. Moré, "Benchmarking optimization software with performance profiles," Mathematical Programming, Series A, vol. 91, no. 2, pp. 201–213, 2002.
Copyright
Copyright © 2012 Mahboubeh Farid et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.