Research Article  Open Access
Mahboubeh Farid, Wah June Leong, Najmeh Malekmohammadi, Mustafa Mamat, "Scaled Diagonal Gradient-Type Method with Extra Update for Large-Scale Unconstrained Optimization", Abstract and Applied Analysis, vol. 2013, Article ID 532041, 5 pages, 2013. https://doi.org/10.1155/2013/532041
Scaled Diagonal Gradient-Type Method with Extra Update for Large-Scale Unconstrained Optimization
Abstract
We present a new gradient method that uses scaling and extra updating within the diagonal updating for solving unconstrained optimization problems. The new method is in the framework of the Barzilai and Borwein (BB) method, except that the Hessian matrix is approximated by a diagonal matrix rather than by a multiple of the identity matrix as in the BB method. The main idea is to design a new diagonal updating scheme that incorporates scaling to instantly reduce the large eigenvalues of the diagonal approximation and otherwise employs extra updates to increase small eigenvalues. These approaches give rapid control over the eigenvalues of the updating matrix and thus improve stepwise convergence. We show that our method is globally convergent. The effectiveness of the method is evaluated by means of a numerical comparison with the BB method and its variant.
1. Introduction
In this paper, we consider the unconstrained optimization problem
$$\min_{x \in \mathbb{R}^n} f(x), \tag{1}$$
where $f$ is a continuously differentiable function from $\mathbb{R}^n$ to $\mathbb{R}$. Given a starting point $x_0$, and using the notations $g_k = \nabla f(x_k)$ and $B_k$ for an approximation to the Hessian $\nabla^2 f(x_k)$, quasi-Newton-based methods for solving (1) are defined by the iteration
$$x_{k+1} = x_k - \alpha_k B_k^{-1} g_k, \tag{2}$$
where the stepsize $\alpha_k$ is determined through an appropriate selection. The updating matrix $B_k$ is usually required to satisfy the quasi-Newton equation
$$B_{k+1} s_k = y_k, \tag{3}$$
where $s_k = x_{k+1} - x_k$ and $y_k = g_{k+1} - g_k$. One of the most widely used quasi-Newton methods for general nonlinear minimization is the BFGS method, which uses the updating formula
$$B_{k+1} = B_k - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k} + \frac{y_k y_k^T}{s_k^T y_k}. \tag{4}$$
Numerically, this method outperforms most other optimization methods; however, it needs $O(n^2)$ storage, which makes it unsuitable for large-scale problems.
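To make the storage cost of BFGS concrete, here is a minimal pure-Python sketch of the BFGS updating formula. The helper names `dot`, `matvec`, and `bfgs_update` are hypothetical, and the sketch assumes $B_k$ is symmetric positive definite and $s_k^T y_k > 0$. The matrix is held densely as $n$ lists of $n$ floats, which is exactly the $n^2$ storage that rules BFGS out for large $n$.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product u^T v."""
    return sum(a * b for a, b in zip(u, v))


def matvec(B: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product B v (O(n^2) work)."""
    return [dot(row, v) for row in B]


def bfgs_update(B: Matrix, s: Vector, y: Vector) -> Matrix:
    """Return B_{k+1} = B - (Bs)(Bs)^T / (s^T B s) + y y^T / (s^T y).

    Uses (B s s^T B)_{ij} = (Bs)_i (Bs)_j, valid because B is symmetric.
    The result is a dense n-by-n matrix, i.e. O(n^2) storage.
    """
    Bs = matvec(B, s)
    sBs = dot(s, Bs)
    sy = dot(s, y)
    n = len(s)
    return [
        [B[i][j] - Bs[i] * Bs[j] / sBs + y[i] * y[j] / sy for j in range(n)]
        for i in range(n)
    ]
```

For example, starting from the identity with $s = (1, 2)$ and $y = (3, 1)$, the updated matrix satisfies the quasi-Newton equation $B_{k+1} s_k = y_k$ and stays symmetric.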
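On the other hand, an ingenious stepsize selection for the gradient method was proposed by Barzilai and Borwein [1], in which the updating scheme is defined by
$$x_{k+1} = x_k - \frac{s_{k-1}^T s_{k-1}}{s_{k-1}^T y_{k-1}}\, g_k, \tag{5}$$
where $s_{k-1} = x_k - x_{k-1}$ and $y_{k-1} = g_k - g_{k-1}$. Equivalently, the Hessian is approximated by the scalar multiple $(s_{k-1}^T y_{k-1} / s_{k-1}^T s_{k-1})\, I$ of the identity.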
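The BB iteration can be sketched in a few lines of pure Python. This is an illustration only: `bb_step` and `bb_quadratic_demo` are hypothetical names, the first iteration uses a plain gradient step with an assumed stepsize `alpha0` (no previous pair $(s, y)$ exists yet), and the test problem is an illustrative diagonal quadratic $f(x) = \tfrac12 \sum_i a_i x_i^2$.

```python
from typing import List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product u^T v."""
    return sum(a * b for a, b in zip(u, v))


def bb_step(x: Vector, g: Vector, s_prev: Vector, y_prev: Vector) -> Vector:
    """One BB iteration x_{k+1} = x_k - alpha_k g_k, alpha_k = s^T s / s^T y.

    Equivalently, the Hessian is modeled by the scalar multiple (s^T y / s^T s) I.
    """
    sy = dot(s_prev, y_prev)
    if sy <= 0.0:
        raise ValueError("BB step needs s_{k-1}^T y_{k-1} > 0")
    alpha = dot(s_prev, s_prev) / sy
    return [xi - alpha * gi for xi, gi in zip(x, g)]


def bb_quadratic_demo(diag_a: Vector, x0: Vector, alpha0: float = 0.05,
                      tol: float = 1e-8, max_iter: int = 200) -> Tuple[Vector, int]:
    """Run BB on the illustrative quadratic f(x) = 0.5 * sum_i a_i x_i^2."""

    def grad(x: Vector) -> Vector:
        return [a * xi for a, xi in zip(diag_a, x)]

    x_prev, g_prev = x0, grad(x0)
    # No (s, y) pair exists yet, so the first step is a plain gradient step.
    x = [xi - alpha0 * gi for xi, gi in zip(x_prev, g_prev)]
    for k in range(1, max_iter + 1):
        g = grad(x)
        if dot(g, g) ** 0.5 <= tol:  # stop on a small gradient norm
            return x, k
        s = [a - b for a, b in zip(x, x_prev)]
        y = [a - b for a, b in zip(g, g_prev)]
        x_prev, g_prev = x, g
        x = bb_step(x, g, s, y)
    return x, max_iter
```

Because no line search safeguards the BB stepsize here, $f(x_k)$ is not forced to decrease at every iteration.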
Since then, the study of new effective methods in the framework of BB-like gradient methods has become an interesting research topic across a wide range of mathematical programming; see, for example, [2–10]. However, it is well known that the BB method cannot guarantee a decrease in the objective function at each iteration, and the extent of the nonmonotonicity depends in some way on the condition number of the objective function [11]. Therefore, the performance of the BB method is greatly influenced by the conditioning of the problem (particularly, the condition number of the Hessian matrix). Some new fixed-stepsize gradient-type methods of BB kind were proposed in [12–16] to overcome these difficulties. In contrast with the BB approach, in which the stepsize is computed by means of a simple approximation of the Hessian in the form of a scalar multiple of the identity, these methods approximate the Hessian and its inverse by diagonal matrices based on the weak secant equation and the quasi-Cauchy relation, respectively (for more details, see [15, 16]). Though these diagonal updating methods are efficient, their performance can deteriorate considerably on ill-conditioned problems. Thus, there is room to improve the quality of the diagonal update formulation. Since the methods described in [15, 16] have useful theoretical and numerical properties, it is desirable to derive a new and more efficient updating framework for general functions. Our aim is therefore to improve the quality of the diagonal update when it approximates the Hessian poorly.
This paper is organized as follows. In the next section, we describe our motivation and propose our new gradient-type method. The global convergence of the method under mild assumptions is established in Section 3. Numerical evidence of the improvements due to the new approach is given in Section 4. Finally, conclusions are drawn in Section 5.
2. Scaling and Extra Updating
Assume that $B_k$ is positive definite, and let $\{s_k\}$ and $\{y_k\}$ be two sequences of vectors such that $s_k^T y_k > 0$ for all $k$. Because it is usually difficult to satisfy the quasi-Newton equation (3) with a nonsingular $B_{k+1}$ of diagonal form, one can consider satisfying it in some directions. If we project the quasi-Newton equation (3) (also called the secant equation) onto a direction $v$ such that $y_k^T v \neq 0$, then it gives
$$v^T B_{k+1} s_k = v^T y_k. \tag{6}$$
If $v = s_k$ is chosen, it leads to the so-called weak-secant relation
$$s_k^T B_{k+1} s_k = s_k^T y_k. \tag{7}$$
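Under this weak-secant equation, [15, 16] employ a variational technique to derive an updating matrix that approximates the Hessian matrix diagonally. The resulting update is the solution of the variational problem
$$\min_{B_{k+1}\ \text{diagonal}} \ \frac{1}{2}\|B_{k+1} - B_k\|_F^2 \quad \text{subject to} \quad s_k^T B_{k+1} s_k = s_k^T y_k, \tag{8}$$
whose solution is
$$B_{k+1} = B_k + \frac{s_k^T y_k - s_k^T B_k s_k}{\operatorname{tr}(F_k^2)}\, F_k, \tag{9}$$
where $F_k = \operatorname{diag}\big((s_k^{(1)})^2, \ldots, (s_k^{(n)})^2\big)$, $s_k^{(i)}$ is the $i$th component of the vector $s_k$, and $\operatorname{tr}$ denotes the trace operator.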
Note that when $s_k^T y_k < s_k^T B_k s_k$, the resulting $B_{k+1}$ is not necessarily positive definite, and it is then not appropriate for use within a quasi-Newton-based algorithm. It is therefore desirable to measure the quality of $B_k$ in terms of its Rayleigh quotient and to improve a "poor" $B_k$ before calculating $B_{k+1}$. For this purpose, we first propose a criterion to distinguish between poor and acceptable quality of $B_k$.
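The diagonal weak-secant update and its possible loss of positive definiteness can be checked numerically. The sketch below stores a diagonal matrix as the list of its diagonal entries (so $O(n)$ storage); `weak_secant_diag_update` is a hypothetical helper name, and the two example pairs $(s, y)$ are made up for illustration.

```python
from typing import List

Vector = List[float]


def weak_secant_diag_update(b: Vector, s: Vector, y: Vector) -> Vector:
    """Return the diagonal of B_{k+1} from the weak-secant diagonal update.

    b is the diagonal of B_k. The correction is a multiple of
    F_k = diag(s_1^2, ..., s_n^2), with tr(F_k^2) = sum_i s_i^4, chosen so
    that s^T B_{k+1} s = s^T y. Requires s != 0.
    """
    sy = sum(si * yi for si, yi in zip(s, y))
    sBs = sum(bi * si * si for bi, si in zip(b, s))
    tr_F2 = sum(si ** 4 for si in s)
    coef = (sy - sBs) / tr_F2
    return [bi + coef * si * si for bi, si in zip(b, s)]


# Case s^T y > s^T B s: the correction is positive, so every entry grows.
b_grow = weak_secant_diag_update([1.0, 1.0], [1.0, 2.0], [4.0, 3.0])

# Case s^T y < s^T B s: the weak-secant condition still holds, but the
# first entry becomes negative, so B_{k+1} is not positive definite.
b_bad = weak_secant_diag_update([1.0, 5.0], [1.0, 1.0], [0.25, 0.25])
# b_bad == [-1.75, 2.25]
```

In the second case $s^T y = 0.5$ while $s^T B s = 6$, so the correction coefficient is $-2.75$ and pushes the first diagonal entry below zero.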
Let us begin by considering the curvature of the objective function in the direction $s_k$, which is represented by
$$s_k^T y_k = s_k^T G_k s_k, \tag{10}$$
where $G_k = \int_0^1 \nabla^2 f(x_k + t s_k)\, dt$ is the average Hessian matrix along $s_k$. Since it is not practical to compute the eigenvalues of the Hessian matrix at each iteration, we can estimate their relative size on the basis of the scalar
$$\rho_k = \frac{s_k^T B_k s_k}{s_k^T y_k}. \tag{11}$$
If $\rho_k \le 1$, the eigenvalues of $B_k$, as approximated by its Rayleigh quotient, are relatively small compared with those of the local Hessian matrix at $x_k$. In this situation, the strategy of extra updating [17] seems useful for improving the quality of $B_k$ by rapidly increasing its eigenvalues toward those of the actual Hessian. This is done by updating $B_k$ twice, as in (12) and (13), to obtain $\hat B_k$, and then using $\hat B_k$ to obtain, finally, the updated matrix $B_{k+1}$ in (14).
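The link between the scalar $\rho_k = s_k^T B_k s_k / s_k^T y_k$ and the Hessian can be made explicit with a short derivation. It assumes, beyond the continuous differentiability stated in Section 1, that $f$ is twice continuously differentiable, so that the fundamental theorem of calculus can be applied to the gradient along the segment from $x_k$ to $x_{k+1}$:

$$
\begin{aligned}
y_k &= \nabla f(x_k + s_k) - \nabla f(x_k) = \int_0^1 \nabla^2 f(x_k + t s_k)\, s_k \, dt = G_k s_k, \\
\rho_k &= \frac{s_k^T B_k s_k}{s_k^T y_k} = \frac{s_k^T B_k s_k \,/\, s_k^T s_k}{s_k^T G_k s_k \,/\, s_k^T s_k}.
\end{aligned}
$$

Thus $\rho_k$ is a ratio of Rayleigh quotients along $s_k$, and $\rho_k \le 1$ holds exactly when the curvature that $B_k$ assigns to $s_k$ does not exceed that of the average Hessian $G_k$.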
On the other hand, when $\rho_k > 1$, the eigenvalues of $B_k$, as represented by its Rayleigh quotient, are relatively large, and we have $s_k^T y_k < s_k^T B_k s_k$. In this case, we need a strategy to counter this drawback. As noted above, the diagonal updating scheme may generate a $B_{k+1}$ that is not positive definite when $B_k$ has large eigenvalues relative to those of $G_k$, that is, when $s_k^T B_k s_k > s_k^T y_k$. By contrast, this difficulty disappears when the eigenvalues of $B_k$ are small (i.e., $s_k^T B_k s_k \le s_k^T y_k$). This suggests that scaling should be used to scale down $B_k$, that is, choosing $\sigma_k < 1$ only when $\rho_k > 1$ and taking $\sigma_k = 1$ whenever $\rho_k \le 1$. Combining these two arguments, we choose the scaling parameter $\sigma_k$ according to (15). This scaling resembles the Al-Baali [18] scaling that is applied within the Broyden family. Because $\sigma_k \le 1$ always, incorporating the scaling into $B_k$ consistently reduces the large eigenvalues of $B_k$, and consequently we can keep $B_{k+1}$ positive definite, which is an important property in descent methods. In this case, the scaled update (16) is used. Altogether, we obtain the general updating scheme (17) for $B_{k+1}$, which uses (16) when $\rho_k > 1$ and (14) when $\rho_k \le 1$, where $\hat B_k$ and $\sigma_k$ are given by (13) and (15), respectively.
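A sketch of the curvature ratio and of the scaled branch follows. It takes $\rho_k = s_k^T B_k s_k / s_k^T y_k$ and, because the exact form of (15) is not reproduced here, it assumes the Al-Baali-type choice $\sigma_k = \min\{1, 1/\rho_k\}$, which has the stated property $\sigma_k < 1$ only when $\rho_k > 1$. It also assumes that the scaled update applies the weak-secant diagonal correction to $\sigma_k B_k$. The helper names `rho` and `scaled_diag_update` are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]


def rho(b: Vector, s: Vector, y: Vector) -> float:
    """rho_k = s^T B_k s / s^T y for diagonal B_k = diag(b); needs s^T y > 0."""
    sBs = sum(bi * si * si for bi, si in zip(b, s))
    sy = sum(si * yi for si, yi in zip(s, y))
    return sBs / sy


def scaled_diag_update(b: Vector, s: Vector, y: Vector) -> Tuple[Vector, float]:
    """Scale diag(b) by sigma_k, then apply the weak-secant diagonal correction.

    ASSUMPTION: sigma_k = min(1, 1/rho_k) (Al-Baali-type); the paper's (15)
    may use a different formula with the same sigma_k <= 1 property.
    """
    sigma = min(1.0, 1.0 / rho(b, s, y))
    scaled = [sigma * bi for bi in b]
    sy = sum(si * yi for si, yi in zip(s, y))
    sBs = sum(bi * si * si for bi, si in zip(scaled, s))
    coef = (sy - sBs) / sum(si ** 4 for si in s)
    return [bi + coef * si * si for bi, si in zip(scaled, s)], sigma


# rho_k = 12 here, so sigma_k = 1/12 and all entries stay positive.
b_new, sigma = scaled_diag_update([1.0, 5.0], [1.0, 1.0], [0.25, 0.25])
```

With this particular $\sigma_k$, whenever $\rho_k > 1$ we get $\sigma_k s_k^T B_k s_k = s_k^T y_k$, so the correction vanishes and $B_{k+1} = \sigma_k B_k$, which is positive definite whenever $B_k$ is. When $\rho_k \le 1$, $\sigma_k = 1$ and the correction coefficient is nonnegative, so the entries can only grow.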
An advantage of using (17) is that the positive definiteness of $B_{k+1}$ can be guaranteed at all iterations. This property is not shared by other diagonal updating formulas such as those in [15, 16]. Note that no extra storage is required to implement our strategy, and the cost of computing $B_{k+1}$ does not increase significantly over the entire iteration. We can now state the steps of our new diagonal-gradient algorithm with a safeguarding strategy for monotonicity as follows.
2.1. ESDG Algorithm
Step 1. Choose an initial point and a positive definite matrix . Let . Set .
Step 2. Compute . If , stop.
Step 3. If , set .
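Step 4. Compute $d_k = -B_k^{-1} g_k$, and calculate $\alpha_k > 0$ such that the following condition holds:
$$f(x_k + \alpha_k d_k) \le \max_{0 \le j \le \min\{k, M\}} f(x_{k-j}) + \eta\, \alpha_k g_k^T d_k, \tag{18}$$
where $M$ is a nonnegative integer and $\eta \in (0, 1)$ is a given constant.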
Step 5. If , let and compute and by (11) and (15), respectively. If then update by (16).
Step 6. If then compute and by (12), (13), respectively, and then update as defined (14).
Step 7. Set , and return to Step 2.
In Step 4, we employ the nonmonotone line search of [19, 20] to ensure the convergence of the algorithm. However, some other line search strategies may also be used.
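The steps above can be sketched as a simplified, ESDG-like loop in pure Python. This is not the authors' implementation, and several choices are assumptions: `esdg_like` is a hypothetical name; $B_0 = I$; the memory $M = 10$ and $\eta = 10^{-4}$ are illustrative; backtracking halves $\alpha$; the update is skipped when $s_k^T y_k \le 0$; the scaling uses $\sigma_k = 1/\rho_k$ when $\rho_k > 1$; and, because the extra-update formulas (12)–(14) are not reproduced here, the $\rho_k \le 1$ branch is replaced by a single weak-secant diagonal update.

```python
import math
from collections import deque
from typing import Callable, Deque, List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def esdg_like(
    f: Callable[[Vector], float],
    grad: Callable[[Vector], Vector],
    x0: Vector,
    tol: float = 1e-6,
    max_iter: int = 1000,
    memory: int = 10,   # M in the nonmonotone condition (assumed value)
    eta: float = 1e-4,  # eta in (0, 1) (assumed value)
) -> Tuple[Vector, int]:
    x = list(x0)
    b = [1.0] * len(x)  # diagonal of B_0 = I (assumption)
    g = grad(x)
    history: Deque[float] = deque([f(x)], maxlen=memory + 1)
    for k in range(max_iter):
        if math.sqrt(dot(g, g)) <= tol:  # stopping test on ||g_k||
            return x, k
        d = [-gi / bi for gi, bi in zip(g, b)]  # d_k = -B_k^{-1} g_k
        gd = dot(g, d)
        f_ref = max(history)  # max of the last M+1 function values
        alpha = 1.0
        for _ in range(60):  # backtracking by halving; the last trial is kept if exhausted
            x_new = [xi + alpha * di for xi, di in zip(x, d)]
            f_new = f(x_new)
            if f_new <= f_ref + eta * alpha * gd:
                break
            alpha *= 0.5
        g_new = grad(x_new)
        s = [a - c for a, c in zip(x_new, x)]
        y = [a - c for a, c in zip(g_new, g)]
        sy = dot(s, y)
        if sy > 0.0:  # keep B_k unchanged when the curvature is not positive
            sBs = sum(bi * si * si for bi, si in zip(b, s))
            rho = sBs / sy
            if rho > 1.0:
                # Scale down with ASSUMED sigma_k = 1/rho_k; then s^T B s = s^T y
                # and the correction below is zero.
                b = [bi / rho for bi in b]
                sBs = sy
            # Weak-secant diagonal update; for rho <= 1 this is a PLACEHOLDER
            # for the paper's extra update, which is not reproduced here.
            coef = (sy - sBs) / sum(si ** 4 for si in s)
            b = [bi + coef * si * si for bi, si in zip(b, s)]
        x, g = x_new, g_new
        history.append(f_new)
    return x, max_iter
```

Every diagonal entry stays positive: the scaled branch multiplies by $1/\rho_k > 0$, and otherwise the correction coefficient $(s_k^T y_k - s_k^T B_k s_k)/\operatorname{tr}(F_k^2)$ is nonnegative, so $d_k$ is always a descent direction and the backtracking loop is well defined.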
3. Convergence Analysis
This section is devoted to the study of the convergence behavior of the ESDG method. We establish the convergence of the ESDG algorithm when applied to the minimization of a strictly convex function. We begin with a convergence result, due to Grippo et al. [21], for the steps generated by the nonmonotone line search algorithm. Here and elsewhere, $\|\cdot\|$ denotes the Euclidean norm.
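Theorem 1. Assume that $f$ is a strictly convex function and its gradient $g$ satisfies the Lipschitz condition. Suppose that the nonmonotone line search algorithm is employed in such a way that the steplength $\alpha_k$ satisfies
$$f(x_k + \alpha_k d_k) \le \max_{0 \le j \le m(k)} f(x_{k-j}) + \eta\, \alpha_k g_k^T d_k,$$
where $m(0) = 0$ and $0 \le m(k) \le \min\{m(k-1) + 1, M\}$, with $M$ a nonnegative integer and $\eta \in (0, 1)$, and that the search direction $d_k$ is chosen to obey the following conditions: there exist positive constants $c_1$ and $c_2$ such that
$$-g_k^T d_k \ge c_1 \|g_k\|^2, \qquad \|d_k\| \le c_2 \|g_k\| \tag{19}$$
for all sufficiently large $k$. Then the iterates generated by the nonmonotone line search algorithm have the property that
$$\liminf_{k \to \infty} \|g_k\| = 0. \tag{20}$$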
To prove that the ESDG algorithm is globally convergent, it is sufficient to show that the sequence $\{B_k\}$ generated by (17) is bounded both above and below, for all finite $k$, so that its associated search direction satisfies condition (19). Since $B_k$ is diagonal, it is enough to show that each diagonal element of $B_k$, say $B_k^{(i)}$, $i = 1, \ldots, n$, is bounded above and below by some positive constants. The following theorem gives the boundedness of $\{B_k\}$.
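Why bounded diagonal entries suffice can be checked directly. Assume condition (19) has the standard form $-g_k^T d_k \ge c_1 \|g_k\|^2$ and $\|d_k\| \le c_2 \|g_k\|$, write $\omega_1, \omega_2$ for hypothetical positive constants with $\omega_1 \le B_k^{(i)} \le \omega_2$ for all $i$, and take $d_k = -B_k^{-1} g_k$. Then

$$
\begin{aligned}
-g_k^T d_k &= \sum_{i=1}^{n} \frac{\big(g_k^{(i)}\big)^2}{B_k^{(i)}} \ \ge\ \frac{1}{\omega_2}\, \|g_k\|^2, \\
\|d_k\|^2 &= \sum_{i=1}^{n} \frac{\big(g_k^{(i)}\big)^2}{\big(B_k^{(i)}\big)^2} \ \le\ \frac{1}{\omega_1^2}\, \|g_k\|^2,
\end{aligned}
$$

so both conditions hold with $c_1 = 1/\omega_2$ and $c_2 = 1/\omega_1$.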
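Theorem 2. Assume that $f$ is a strictly convex function for which there exist positive constants $\mu_1$ and $\mu_2$ such that
$$\mu_1 \|z\|^2 \le z^T \nabla^2 f(x)\, z \le \mu_2 \|z\|^2 \tag{21}$$
for all $x, z \in \mathbb{R}^n$. Let $\{B_k\}$ be a sequence generated by the ESDG method. Then $B_k$ is bounded above and below, for all finite $k$, by some positive constants.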
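Proof. Let $B_k^{(i)}$ be the $i$th diagonal element of $B_k$. Suppose that $B_k$ is chosen such that
$$\omega_1 \le B_k^{(i)} \le \omega_2, \quad i = 1, \ldots, n, \tag{22}$$
where $\omega_1$ and $\omega_2$ are some positive constants. It follows from (17) and the definition of $\sigma_k$ in (15) that we have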
where
Moreover, by (21) and (11), we obtain
Case 1. When : by (24), one can obtain
Thus, it implies that .
Case 2. When : from (3), we have
Because also implies that , using this fact and (24) give
Let be the largest component in magnitude of , that is, . Then it follows that , and (27) becomes
Using (28) and the same argument as previously mentioned, we can also show that
Hence, in both cases, $B_{k+1}^{(i)}$ is bounded above and below by some positive constants. Since these upper and lower bounds are independent of $k$, we can proceed by induction to show that $B_k$ is bounded for all finite $k$.
4. Numerical Results
In this section, we present the results of a numerical investigation of the ESDG method on different test problems. We also compare the performance of our new method with that of the BB method and that of the MDGRAD method, which is implemented using the SMDQN of [22] with the same nonmonotone strategy as the ESDG method. Our experiments are performed on a set of 20 nonlinear unconstrained problems with dimensions ranging from 10 to $10^4$ (Table 1).

These test problems are taken from [23, 24]. The codes are written in Matlab 7.0, and all runs are performed on a PC with a Core Duo CPU. For each test, the termination condition is . The maximum number of iterations is set to 1000.
Figures 1 and 2 show the efficiency of the ESDG method compared with the MDGRAD and BB methods. Note that the ESDG method improves the quality of the Hessian approximation without increasing storage requirements. Figure 2 compares the ESDG method with the BB and MDGRAD methods using CPU time as the measure. It shows that the ESDG method is again faster than the MDGRAD method on most problems and requires a reasonable amount of time to solve large-scale problems compared with the BB method. Overall, our experimental comparisons indicate that the extension is highly beneficial to performance.
5. Conclusion
We have presented a new diagonal gradient method for unconstrained optimization, together with a numerical comparison of the proposed method against the BB and MDGRAD methods. Based on our numerical experiments, we conclude that the ESDG method is significantly preferable to the BB and MDGRAD methods. In particular, the ESDG method has proven to be a good option for large-scale problems that would otherwise require large amounts of memory. In view of the strong performance of the ESDG method, which is globally convergent and requires only $O(n)$ storage, we expect that the proposed method will be useful for large-scale unconstrained optimization problems.
References
[1] J. Barzilai and J. M. Borwein, “Two-point step size gradient methods,” IMA Journal of Numerical Analysis, vol. 8, no. 1, pp. 141–148, 1988.
[2] E. G. Birgin, J. M. Martínez, and M. Raydan, “Nonmonotone spectral projected gradient methods on convex sets,” SIAM Journal on Optimization, vol. 10, no. 4, pp. 1196–1211, 2000.
[3] Y.-H. Dai and R. Fletcher, “Projected Barzilai-Borwein methods for large-scale box-constrained quadratic programming,” Numerische Mathematik, vol. 100, no. 1, pp. 21–47, 2005.
[4] Y.-H. Dai and L.-Z. Liao, “R-linear convergence of the Barzilai and Borwein gradient method,” IMA Journal of Numerical Analysis, vol. 22, no. 1, pp. 1–10, 2002.
[5] Y.-H. Dai, W. W. Hager, K. Schittkowski, and H. Zhang, “The cyclic Barzilai-Borwein method for unconstrained optimization,” IMA Journal of Numerical Analysis, vol. 26, no. 3, pp. 604–627, 2006.
[6] G. Frassoldati, G. Zanghirati, and L. Zanni, “New adaptive stepsize selections in gradient methods,” Journal of Industrial and Management Optimization, vol. 4, no. 2, pp. 299–312, 2008.
[7] M. Raydan, “On the Barzilai and Borwein choice of steplength for the gradient method,” IMA Journal of Numerical Analysis, vol. 13, no. 3, pp. 321–326, 1993.
[8] M. Raydan, “The Barzilai and Borwein gradient method for the large scale unconstrained minimization problem,” SIAM Journal on Optimization, vol. 7, no. 1, pp. 26–33, 1997.
[9] B. Zhou, L. Gao, and Y. Dai, “Monotone projected gradient methods for large-scale box-constrained quadratic programming,” Science in China. Series A, vol. 49, no. 5, pp. 688–702, 2006.
[10] B. Zhou, L. Gao, and Y.-H. Dai, “Gradient methods with adaptive stepsizes,” Computational Optimization and Applications, vol. 35, no. 1, pp. 69–86, 2006.
[11] R. Fletcher, “On the Barzilai-Borwein method,” Tech. Rep. NA/207, Department of Mathematics, University of Dundee, Scotland, UK, 2001.
[12] M. Farid, W. J. Leong, and M. A. Hassan, “A new two-step gradient-type method for large-scale unconstrained optimization,” Computers & Mathematics with Applications, vol. 59, no. 10, pp. 3301–3307, 2010.
[13] M. Farid and W. J. Leong, “An improved multistep gradient-type method for large scale optimization,” Computers & Mathematics with Applications, vol. 61, no. 11, pp. 3312–3318, 2011.
[14] M. Farid, W. J. Leong, and L. Zheng, “Accumulative approach in multistep diagonal gradient-type method for large-scale unconstrained optimization,” Journal of Applied Mathematics, vol. 2012, Article ID 875494, 11 pages, 2012.
[15] M. A. Hassan, W. J. Leong, and M. Farid, “A new gradient method via quasi-Cauchy relation which guarantees descent,” Journal of Computational and Applied Mathematics, vol. 230, no. 1, pp. 300–305, 2009.
[16] W. J. Leong, M. A. Hassan, and M. Farid, “A monotone gradient method via weak secant equation for unconstrained optimization,” Taiwanese Journal of Mathematics, vol. 14, no. 2, pp. 413–423, 2010.
[17] M. Al-Baali, “Extra updates for the BFGS method,” Optimization Methods and Software, vol. 13, no. 3, pp. 159–179, 2000.
[18] M. Al-Baali, “Numerical experience with a class of self-scaling quasi-Newton algorithms,” Journal of Optimization Theory and Applications, vol. 96, no. 3, pp. 533–553, 1998.
[19] E. G. Birgin, J. M. Martínez, and M. Raydan, “Inexact spectral projected gradient methods on convex sets,” IMA Journal of Numerical Analysis, vol. 23, no. 4, pp. 539–559, 2003.
[20] E. G. Birgin, J. M. Martínez, and M. Raydan, “Nonmonotone spectral projected gradient methods on convex,” Encyclopedia of Optimization, pp. 3652–3659, 2009.
[21] L. Grippo, F. Lampariello, and S. Lucidi, “A nonmonotone line search technique for Newton's method,” SIAM Journal on Numerical Analysis, vol. 23, no. 4, pp. 707–716, 1986.
[22] W. J. Leong, M. Farid, and M. A. Hassan, “Scaling on diagonal quasi-Newton update for large-scale unconstrained optimization,” Bulletin of the Malaysian Mathematical Sciences Society, vol. 35, no. 2, pp. 247–256, 2012.
[23] N. Andrei, “An unconstrained optimization test functions collection,” Advanced Modeling and Optimization, vol. 10, no. 1, pp. 147–161, 2008.
[24] J. J. Moré, B. S. Garbow, and K. E. Hillstrom, “Testing unconstrained optimization software,” ACM Transactions on Mathematical Software, vol. 7, no. 1, pp. 17–41, 1981.
Copyright
Copyright © 2013 Mahboubeh Farid et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.