Accelerated Double Direction Method for Solving Unconstrained Optimization Problems
An iterative method for solving a minimization problem of unconstrained optimization is presented. This multistep curve search method uses a specific form of iteration with two direction parameters, an approximation of the Hessian by an appropriately constructed diagonal matrix, and an inexact line search procedure. It is proved that the constructed numerical process is well defined under some assumptions. Under certain conditions, the method is linearly convergent for uniformly convex and strictly convex quadratic functions. Numerical results arising from the defined algorithms are also presented and analyzed.
In this paper, we derive a first-order numerical method for solving the nonlinear unconstrained optimization problem (1), in which the objective function is twice continuously differentiable. An iteration of the form (2) is considered: the new iterative point is obtained from the previous one by means of a stepsize and two vectors that generate the search directions. Each of these directions is calculated by a particular algorithm. As in other iterative methods for unconstrained optimization, the crucial task is to find appropriate descent direction vectors and an optimal step length. The proposed iteration (2) contains two direction vectors, which motivated the name Accelerated Double Direction method (ADD method for short). The decisive point of our research is the two-direction form of the analyzed method and its implementation. A method of this particular form, but under different assumptions, was originally described in [1, 2]. The implementation of the method defined there is omitted in [1, 2], so this work extends and completes that topic. To make the method suitable for implementation, we modify the choice of the two direction vectors. Another contribution of this paper is that it obtains better numerical results, with respect to the number of iterations, than some known methods for unconstrained optimization.
Further, the usual notation is used for the gradient and the Hessian of the objective function, and for the transpose of a vector.
On the other hand, computation of the step length is also important. The common way to determine the stepsize is the inexact line search technique. The only requirement in the line search procedure is a decrease in the objective function values. In this way we calculate a step length which is appropriate enough for our iterative optimization problem (see [5–12]).
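The inexact line search described above can be sketched as a backtracking loop with a sufficient-decrease (Armijo) test. This is only an illustration under stated assumptions: the function name `backtracking` and the default values `sigma=1e-4`, `beta=0.8`, and `t0=1.0` are chosen here for the example and are not taken from the paper.

```python
def backtracking(f, x, g, d, sigma=1e-4, beta=0.8, t0=1.0, max_shrinks=60):
    """Shrink t until f(x + t*d) <= f(x) + sigma * t * g^T d.

    f: objective taking a list of floats; x: current point;
    g: gradient at x; d: a descent direction (g^T d < 0).
    sigma, beta, t0 are illustrative defaults (assumptions).
    """
    fx = f(x)
    slope = sum(gi * di for gi, di in zip(g, d))  # directional derivative g^T d
    t = t0
    for _ in range(max_shrinks):
        trial = [xi + t * di for xi, di in zip(x, d)]
        if f(trial) <= fx + sigma * t * slope:
            return t
        t *= beta  # reduce the step and try again
    return t


# Usage on a simple convex quadratic f(x) = x_1^2 + x_2^2.
def f(x):
    return sum(xi * xi for xi in x)

x = [1.0, 1.0]
g = [2.0 * xi for xi in x]   # gradient of f at x
d = [-gi for gi in g]        # steepest descent direction
t = backtracking(f, x, g, d)
x_next = [xi + t * di for xi, di in zip(x, d)]
```

The test only requires that the objective decreases by a fraction of the decrease predicted by the linear model, which is why the procedure is called inexact.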
In the present paper, we use a combination of the iterative scheme (2) and the accelerated gradient descent method from . More precisely, the first term, , in (2) is defined using the principles from the  method. The second term, , appears as the correction factor which is defined from the Taylor expansion series.
The paper is organized as follows. In Section 2, the basic motivation and idea for deriving the accelerated gradient descent method of the form (2) are explained. The algorithm of the derived Accelerated Double Direction method (ADD method for short) is presented in Section 3, where the main result of this work is also analyzed. The convergence of the ADD method is proved in Section 4. Numerical tests and comparisons of the derived method with the accelerated gradient descent method with line search (so-called method) originated in  as well as with the nonaccelerated version of the ADD method (NADD method) are given in Section 5.
The accelerated gradient descent () methods of the form use as an acceleration parameter. The first method is originated in . A type of method is considered in . This method is called method and it is derived starting from the Newton iterative method with the line search where is an appropriate approximation of the Hessian inverse, presented as a symmetric positive definite matrix. Taking , as an approximation of the inverse of the Hessian, the authors in  derived an accelerated modified Newton scheme: where step length is computed by means of the backtracking inexact line search procedure and is the length of the acceleration parameter given by the following expression:
In this paper, we likewise calculate the acceleration parameter from the Taylor expansion of the objective function. Unlike that method, however, we choose to evaluate a method which contains two direction vectors.
A multistep minimization iterative process (2) with two direction vectors is described in [1, 2]. That algorithm considers generally nondifferentiable functions and consists of three subalgorithms, each determining one of the required parameters: the two direction vectors and the stepsize. Since this work considers uniformly convex or strictly convex quadratic functions, we modify the proposed subalgorithms to suit the present conditions. In the next section, we give three supplemented algorithms based on propositions originally written in .
3. Main Algorithm
Taking into consideration the results obtained in [1, 13], we can construct a new iterative method. That process has the form given by (2), where the parameter has properties taken from method. Using the same notation as in the previous section, the process (2) becomes assuming that the direction vector is, according to method, defined by . Practically, deriving the direction vector is reduced to deriving the positive real number . Taking vector and step length defined similarly as in , we get Now, from the second-order Taylor expansion, the approximation of can be written as follows: The matrix is, like in , replaced by and the parameter fulfills the following condition: Knowing this, the expression (10) becomes From (12), is computed in the following way:
It is assumed that the computed parameter is positive; otherwise, the second-order necessary condition and the second-order sufficient condition will not be fulfilled. If in some iterative step it happens that , we take .
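The way a scalar acceleration parameter follows from a second-order Taylor expansion can be sketched for a generic displacement. The symbols below ($\Delta_k$, $\gamma_{k+1}$, $\xi$) are notation assumed for this illustration and are not necessarily those of the paper; the paper's own formula additionally depends on its two direction vectors and stepsize.

```latex
% Assumed notation: f is the objective, g_k = \nabla f(x_k),
% \Delta_k = x_{k+1} - x_k is the displacement of one iteration,
% \xi lies on the segment between x_k and x_{k+1}, I is the identity matrix.
\begin{align*}
f(x_{k+1}) &\approx f(x_k) + g_k^{T}\Delta_k
              + \tfrac{1}{2}\,\Delta_k^{T}\,\nabla^{2} f(\xi)\,\Delta_k, \\
\nabla^{2} f(\xi) \approx \gamma_{k+1} I
  \quad\Longrightarrow\quad
f(x_{k+1}) &\approx f(x_k) + g_k^{T}\Delta_k
              + \tfrac{1}{2}\,\gamma_{k+1}\,\Delta_k^{T}\Delta_k, \\
\gamma_{k+1} &= 2\,\frac{f(x_{k+1}) - f(x_k) - g_k^{T}\Delta_k}{\Delta_k^{T}\Delta_k}.
\end{align*}
```

For a strictly convex function the numerator is positive whenever $\Delta_k \neq 0$, which is consistent with the positivity requirement stated above.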
Then, the next iterative point is computed by
The main contribution of this paper is presented in Algorithm 3.
Remark 1. It is possible to compare iterations (9) proposed in the present paper with the general iterative scheme proposed in . The search direction in  is defined as a linear combination of and . On the other hand, the search direction in (9) is defined as a particular linear combination of and the vector defined in Algorithm 2.
4. Convergence Analysis
In this section, the convergence analysis of the constructed method is discussed. We first analyze the set of uniformly convex functions and afterwards a subset of strictly convex functions. We start with the following proposition and lemma, which can be found in [16, 17].
Proposition 2 (see [16, 17]). If the function is twice continuously differentiable and uniformly convex on , then (1) the function has a lower bound on , where is available; (2) the gradient is Lipschitz continuous in an open convex set which contains ; that is, there exists such that
An estimate of the decrease of a given uniformly convex function in each iteration is given in the following lemma taken from .
Proof. The proof follows directly from the proof of Lemma 4.2 in  using instead of .
The following theorem guarantees linear convergence of the Accelerated Double Direction method. The proof is the same as the proof of Theorem in .
Theorem 5 (see ). If the objective function is twice continuously differentiable and uniformly convex on and the sequence is generated by Algorithm 3, then and the sequence converges to at least linearly.
We now consider the case of strictly convex quadratic function which has the form where is real symmetric positive definite matrix and . This particular case is observed since the convergence of gradient methods is generally difficult and nonstandard. In the following analysis, we will use some known assumptions taken from [18–20]. Let be eigenvalues of the matrix . In , the -linear rate of convergence is presented for BB method under the assumption .
Lemma 6. Let be a strictly convex quadratic function given by the expression (21) which involves symmetric positive definite matrix and the gradient descent method (8). Let and be, respectively, the smallest and the largest eigenvalues of . Let the parameters , , and be determined according to (13) and Algorithm 3. Then, the following holds:
Proof. According to (21), the difference between the values of the objective strictly convex quadratic function in current and previous iterative point is
Knowing the fact that , it follows that
Substituting the equality , we obtain Further grouping gives and this leads to The symmetry of the matrix implies
Substituting the last equality into (13), becomes and further
Finally, which implies the definitive expression for : Since is a real symmetric positive definite matrix and since the previous expression for represents the Rayleigh quotient of the real symmetric matrix at the vector , it can be concluded that The fact implies the left-hand side of inequality (22). The right-hand side of the same inequality arises from the inequality which is proved in , in Lemma 4. The direct consequence of the previous expression is We know that is symmetric and . Considering these two relevant facts, we can conclude that which means that in the last expression the largest eigenvalue of matrix plays the role of the Lipschitz constant . In the backtracking algorithm, the parameters and were chosen to take the values and . As a resulting inequality, we have and this expression completes the proof.
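The Rayleigh quotient argument used in the proof relies on the fact that, for a real symmetric matrix, the quotient at any nonzero vector lies between the smallest and the largest eigenvalue. The following small check illustrates this on a hand-picked 2x2 matrix; the matrix, the sample vectors, and the helper name `rayleigh_quotient` are assumptions made for the example only.

```python
def rayleigh_quotient(A, v):
    """Return (v^T A v) / (v^T v) for a square matrix A (list of rows) and nonzero v."""
    Av = [sum(a * vj for a, vj in zip(row, v)) for row in A]
    return sum(vi * avi for vi, avi in zip(v, Av)) / sum(vi * vi for vi in v)


# Symmetric positive definite example; its eigenvalues are 1 and 3
# (eigenvectors (1, -1) and (1, 1), respectively).
A = [[2.0, 1.0],
     [1.0, 2.0]]
lam_min, lam_max = 1.0, 3.0

samples = [[1.0, 0.0], [1.0, 1.0], [1.0, -1.0], [3.0, -2.0]]
quotients = [rayleigh_quotient(A, v) for v in samples]
for v, q in zip(samples, quotients):
    print(v, q)
```

The bounds are attained exactly at the eigenvectors, while every other vector gives a value strictly between them.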
Proof. Suppose that are orthonormal eigenvectors of symmetric positive definite matrix and let be the sequence of values constructed by Algorithm 3. For some and value , . On the other hand,
for some real constants and .
From (2), it follows that which together with (41) gives To prove (38), it is enough to show that since for all . There are two cases. First, if implying (22), we can conclude the following: Now, let us examine another case . Since we have
To prove (40), the representation (41) is used: Knowing that together with the proved inequalities (38) leads us to expression (40) which is the final conclusion.
5. Numerical Results
In order to numerically confirm the acceleration property of the ADD method, we constructed the nonaccelerated version of this scheme and named it the Nonaccelerated Double Direction method (NADD method for short). For that purpose, we had to eliminate the acceleration parameter. Since this parameter yields an approximation of the inverse of the Hessian, the question was what an adequate substitute for it in iteration (9) would be. The natural choice for the nonaccelerated counterpart of the ADD method is to take the constant value 1 for the acceleration parameter in each iteration (9). Then, the Hessian is approximated by the identity matrix in each iteration. In this way the nonaccelerated form of the process is obtained and iteration (9) becomes
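The effect of replacing the acceleration parameter by a constant can be explored with a simplified, single-direction sketch. It is not the ADD or NADD algorithm: the second direction vector of iteration (9) is omitted because its definition is not reproduced here. The function name `gradient_descent`, the Taylor-based update of `gamma`, the reset to 1 when the update is not positive, the backtracking constants, and the test function are all assumptions made for illustration.

```python
def gradient_descent(f, grad, x0, accelerate, tol=1e-6, max_iter=10000,
                     sigma=1e-4, beta=0.8):
    """Gradient descent with direction -g/gamma and backtracking line search.

    accelerate=True updates gamma from a second-order Taylor model;
    accelerate=False keeps gamma = 1 (Hessian modeled by the identity).
    Returns (x, number_of_iterations).
    """
    x, gamma = list(x0), 1.0
    for k in range(max_iter):
        g = grad(x)
        if sum(gi * gi for gi in g) ** 0.5 <= tol:
            return x, k
        d = [-gi / gamma for gi in g]
        fx = f(x)
        slope = sum(gi * di for gi, di in zip(g, d))
        t = 1.0
        x_new = [xi + t * di for xi, di in zip(x, d)]
        # Backtracking: shrink t until the sufficient-decrease test holds.
        while t > 1e-16 and f(x_new) > fx + sigma * t * slope:
            t *= beta
            x_new = [xi + t * di for xi, di in zip(x, d)]
        if accelerate:
            step = [a - b for a, b in zip(x_new, x)]
            denom = sum(s * s for s in step)
            if denom > 0.0:
                lin = sum(gi * si for gi, si in zip(g, step))
                gamma = 2.0 * (f(x_new) - fx - lin) / denom
            if gamma <= 0.0:
                gamma = 1.0  # assumed safeguard for a nonpositive value
        x = x_new
    return x, max_iter


# Strictly convex quadratic with Hessian eigenvalues 1 and 10 (illustrative).
def f(x):
    return 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)

def grad(x):
    return [x[0], 10.0 * x[1]]

x_acc, it_acc = gradient_descent(f, grad, [1.0, 1.0], accelerate=True)
x_plain, it_plain = gradient_descent(f, grad, [1.0, 1.0], accelerate=False)
print("accelerated:", it_acc, "nonaccelerated:", it_plain)
```

Printing the two iteration counts gives a small-scale analogue of the comparison reported in this section, although the numbers for this toy problem say nothing about the paper's test functions.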
We tested the presented Accelerated Double Direction method (ADD method for short) on a large set of unconstrained test problems consisting of 25 functions proposed in . During execution, we record the number of iterative steps, since our primary goal is to reduce this number. Each of the 25 functions is tested in 10 numerical experiments. In order to have a more general view of the number of iterations, we choose to test cases with a large number of variables: 1000, 2000, 3000, 5000, 7000, 8000, 10000, 15000, 20000, and 30000. The ADD method is compared with method, since in  method is already compared with method and (gradient descent) method from , but for a lower number of variables (500, 1000, 2000, 3000, 5000, 7000, 8000, 10000, and 15000). In the same paper, it is proved that method outperformed and methods with respect to the number of iterative steps. Since our aim is to improve the numerical results with respect to this characteristic by using the constructed ADD method, it is enough to show that on this matter the ADD algorithm obtains better results than . The stopping criteria for Algorithm 3 are like those in  for method:
The codes that are used for testing are written in the programming language on a Workstation Intel Celeron 2.2 GHz.
The results presented in Table 1 show the enormous dominance of the ADD method with respect to the number of iterations. Among the 25 tested functions, a very large difference in the number of iterations in favor of the ADD method is obvious. Precisely, for as many as 20 of the tested functions, ADD is significantly more effective with respect to the analyzed characteristic than method.
In Table 1, the test results of the NADD method are also presented. During the test procedures, the execution time needed by the NADD method was evidently too long. That is why we defined an execution time limiter parameter as follows: if a test execution lasts longer than , we stop further testing and declare that the testing is too long. The longest execution time in testing method was obtained for the Diagonal 7 function and totaled 3287 seconds. We doubled this time, approximated it, and denoted this approximation as : With this criterion included, we were able to test only 3 of the 25 test functions by the NADD scheme for the proposed numbers of variables (1000, 2000, 3000, 5000, 7000, 8000, 10000, 15000, 20000, and 30000). The obtained results show the obvious dominance of the acceleration properties of the ADD method compared to its nonaccelerated version. The notation in Table 1 means that the execution time exceeds .
Considering the results presented in Table 1 for all 25 given functions and all 250 tests, Table 2 illustrates that the ADD method requires approximately 66 times fewer iterations than method.
To get a clearer comparison between ADD and NADD, we did additional tests for a 100 times lower number of variables: 10, 20, 30, 50, 70, 80, 100, 150, 200, and 300. Table 3 shows that in this case 9 of the 25 test functions could be tested by the NADD scheme.
Remark 8. During the testing procedures, we were able to test Generalized quartic function for larger number of variables , and , but while applying iteration for a 100 times lower set of variables on this function specially for and , time limiter parameter is . That is why this function is not displayed in Table 3.
According to the results of Table 3, obtained for the ADD and NADD methods, Table 4 displays average values over the 90 test executions of the 9 functions that satisfy the defined time limiter condition. This table confirms results as much as 1502 times better in favor of the ADD method compared to its nonaccelerated counterpart, the NADD method.
We used the proposed form of the iteration for unconstrained optimization problems from  to define, in a similar way, a double direction method for uniformly convex functions and for some strictly convex quadratic functions. The presented iterative method is an accelerated gradient descent method, constructed from the Newton method with line search by approximating the Hessian with an appropriate diagonal matrix.
The aim of constructing the Accelerated Double Direction method (ADD method for short) is to reduce the number of iterations for chosen test functions from  for a large number of parameters, and this goal is successfully achieved. Another important contribution of the ADD method is the implementation of the specific form of iteration originally introduced in .
It is proved that ADD is a linearly convergent method for uniformly convex functions and for a special subset of strictly convex quadratic functions.
In order to confirm the advantages of the acceleration properties of the ADD iteration, a nonaccelerated representation of the ADD scheme, the NADD method, is constructed. Comparative tests substantiate the enormous benefits of the ADD method. The derived accelerated method is also compared with method whose dominance among the method and method has already been proven in . The ADD algorithm generates multiple times better numerical results with respect to the number of iterations compared to method.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
D. G. Luenberger and Y. Ye, Linear and Nonlinear Programming, Springer Science, Business Media LLC, New York, NY, USA, 2008.
W. Sun and Y. X. Yuan, Optimization Theory and Methods: Nonlinear Programming, vol. 1, Springer, New York, NY, USA, 2006.
J. J. Moré and D. J. Thuente, On Line Search Algorithm with Guaranteed Sufficient Decrease, Mathematics and Computer Science Division Preprint MSC-P153-0590, Argonne National Laboratory, Argonne, Ill, USA, 1990.
M. J. D. Powell, “Some global convergence properties of a variable metric algorithm for minimization without exact line searches,” in Nonlinear Programming, vol. 9 of SIAM-AMS Proceedings, pp. 53–72, American Mathematical Society, Philadelphia, Pa, USA, 1976.
J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, London, UK, 1970.
R. T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, NJ, USA, 1970.
N. Andrei, “An unconstrained optimization test functions collection,” Advanced Modeling and Optimization, vol. 10, pp. 147–161, 2008.