A Transformation of Accelerated Double Step Size Method for Unconstrained Optimization
A reduction of the original double step size iteration to a single step length scheme is derived under a proposed condition that relates the two step lengths in the accelerated double step size gradient descent scheme. The proposed transformation is numerically tested. The obtained results confirm substantial progress in comparison with the single step size accelerated gradient descent method defined in a classical way, regarding all analyzed characteristics: number of iterations, CPU time, and number of function evaluations. Linear convergence of the derived method is proved.
1. Introduction and Background
The SM iteration from  is defined by the iterative process
$$x_{k+1} = x_k - t_k \gamma_k^{-1} g_k, \tag{1}$$
where $x_{k+1}$ is a new iterative point, $x_k$ is the previous iterative point, $g_k$ is the gradient vector (search direction), $t_k$ is a step length, and $\gamma_k$ is the acceleration parameter. In  it is verified that the accelerated gradient SM iteration (1) outperforms the gradient descent, GD, as well as Andrei's accelerated gradient descent AGD method from .
Double direction and double step size accelerated methods, denoted by ADD and ADSS methods, respectively, for solving the problems of unconstrained optimization are presented in [3, 4]. These two algorithms can be generally formulated through the next merged expression:
$$x_{k+1} = x_k + \alpha_k s_k + \beta_k d_k, \tag{2}$$
where $x_k$ is the previous iterative point and real values $\alpha_k$ and $\beta_k$ denote two step lengths while the vectors $s_k$ and $d_k$ are two vector directions. The values of step lengths are determined by backtracking line search techniques. The gradient is basically used for defining a search direction, but some new suggestions for deriving a descending vector direction are given in [3, 5]. Taking the substitutions
$$s_k = -\gamma_k^{-1} g_k, \qquad \beta_k = \alpha_k^2 \tag{3}$$
into (2) produces the ADD iterative scheme from :
$$x_{k+1} = x_k - \alpha_k \gamma_k^{-1} g_k + \alpha_k^2 d_k, \tag{4}$$
where $\gamma_k$ represents the acceleration parameter for the iteration (4). The benefits of the acceleration properties that arise from the usage of the parameter $\gamma_k$ are explained in . The so-called nonaccelerated version of the ADD method (NADD method for short) is defined in order to numerically verify the acceleration property of the parameter $\gamma_k$. Three methods, SM, ADD, and NADD, are numerically compared in . Results show the enormous efficiency of the ADD scheme in comparison with its nonaccelerated counterpart NADD. Derivation of the direction vector $d_k$ is explained by Algorithm 3.2 in . The ADD outperforms its competitor, the SM method from , with respect to the number of iterations.
By replacing the vectors $s_k$ and $d_k$ from (2) by $-\gamma_k^{-1} g_k$ and $-g_k$, respectively, the next iteration is defined as
$$x_{k+1} = x_k - \alpha_k \gamma_k^{-1} g_k - \beta_k g_k. \tag{5}$$
The previous scheme is denoted as the ADSS model and it is proposed in . In the same paper, a considerable improvement in the performance of this accelerated gradient descent method when compared to the accelerated gradient descent SM method from  is numerically confirmed.
The main contribution of the present paper is a transformation of the double step size iterative scheme (5) for unconstrained optimization into an appropriate accelerated single step size scheme (called TADSS for short). Convergence properties of the introduced method are investigated. A further contribution is the numerical confirmation that the TADSS algorithm developed from the double step size ADSS model (5) is clearly more efficient than the accelerated SM method obtained in a classical way. Surprisingly, numerical experiments show that the TADSS method also outperforms the initial ADSS method.
The paper is organized as follows. The reduction of the double step size ADSS model into the single step size iteration TADSS and the presentation of the resulting accelerated gradient descent model are given in Section 2. Section 3 contains the convergence analysis of the derived algorithm for uniformly convex and strictly convex quadratic functions. The results of numerical experiments and a comparative analysis of the developed method and its forerunners are presented in Section 4.
2. Transformation of ADSS Scheme into a Single Step Size Iteration
The very good numerical results obtained in  motivated further research on this topic. The idea is to investigate the properties of a single step size method developed as a reduction of the double step size ADSS model. This reduction is defined by an additional assumption which represents a trade-off between the two step length parameters $\alpha_k$ and $\beta_k$ in the ADSS scheme:
$$\alpha_k + \beta_k = 1. \tag{6}$$
Substituting assumption (6) into expression (5), which defines the ADSS iteration, leads to the iterative process
$$x_{k+1} = x_k - \left(\alpha_k(\gamma_k^{-1} - 1) + 1\right) g_k. \tag{7}$$
The iteration (7) is denoted as the transformed ADSS method, or TADSS method for short. The TADSS iteration represents not only a reduction of the double step size ADSS model into the corresponding single step size method, but also a modification of the single step size SM iteration from . This modification can be explained as the substitution of the product $t_k \gamma_k^{-1}$, from the SM iteration (1), by the multiplying factor $\alpha_k(\gamma_k^{-1} - 1) + 1$ of the gradient from the TADSS iteration (7).
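The reduction can be checked numerically. The following is a minimal sketch, assuming that the ADSS update has the form $x_{k+1} = x_k - \alpha_k \gamma_k^{-1} g_k - \beta_k g_k$ and that the condition relating the two step lengths is $\alpha_k + \beta_k = 1$. The function names `adss_step` and `tadss_step` and the sample values are illustrative and are not taken from the authors' implementation.

```python
def adss_step(x, g, alpha, beta, gamma):
    """Assumed ADSS update: x - (alpha / gamma) * g - beta * g."""
    return [xi - (alpha / gamma) * gi - beta * gi for xi, gi in zip(x, g)]


def tadss_step(x, g, alpha, gamma):
    """Assumed TADSS update: x - (alpha * (1 / gamma - 1) + 1) * g."""
    factor = alpha * (1.0 / gamma - 1.0) + 1.0
    return [xi - factor * gi for xi, gi in zip(x, g)]


# Arbitrary illustrative data (not from the paper's experiments).
x = [1.0, -2.0, 0.5]
g = [0.4, -1.0, 2.0]
alpha, gamma = 0.3, 0.8

# Impose the assumed relation beta = 1 - alpha in the double step size update.
from_adss = adss_step(x, g, alpha, 1.0 - alpha, gamma)
from_tadss = tadss_step(x, g, alpha, gamma)

# Largest componentwise difference between the two updates.
max_diff = max(abs(a - b) for a, b in zip(from_adss, from_tadss))
print(max_diff)
```

Under these assumptions the two updates coincide up to rounding, which is the sense in which the double step size iteration collapses to a single step size one.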
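For the sake of simplicity, we use the notation $\phi_k = \alpha_k(\gamma_k^{-1} - 1) + 1$ whenever it is possible. The value of the acceleration parameter $\gamma_{k+1}$ in the $(k+1)$th iteration can be derived by using Taylor's expansion, similarly as described in [1, 3, 4]:
$$f(x_{k+1}) \approx f(x_k) - \phi_k g_k^T g_k + \frac{1}{2}\phi_k^2 g_k^T \nabla^2 f(\xi) g_k. \tag{8}$$
The vector $\xi$ in (8) satisfies
$$\xi \in [x_k, x_{k+1}], \qquad \xi = x_k + \delta(x_{k+1} - x_k) = x_k - \delta\phi_k g_k, \qquad 0 \le \delta \le 1. \tag{9}$$
Further, it is reasonable to replace in (8) the Hessian $\nabla^2 f(\xi)$ by the diagonal matrix $\gamma_{k+1} I$, where $\gamma_{k+1}$ is an appropriately chosen real number. This replacement implies
$$f(x_{k+1}) \approx f(x_k) - \phi_k \|g_k\|^2 + \frac{1}{2}\phi_k^2 \gamma_{k+1} \|g_k\|^2. \tag{10}$$
The relation (10) allows us to compute the acceleration parameter $\gamma_{k+1}$:
$$\gamma_{k+1} = 2\,\frac{f(x_{k+1}) - f(x_k) + \phi_k \|g_k\|^2}{\phi_k^2 \|g_k\|^2}. \tag{11}$$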
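As an illustration of how such an acceleration parameter could be evaluated, the sketch below assumes the scalar form that follows from a second-order Taylor model with the Hessian replaced by $\gamma_{k+1} I$, namely $\gamma_{k+1} = 2\,(f(x_{k+1}) - f(x_k) + \phi\|g_k\|^2) / (\phi^2\|g_k\|^2)$, where $\phi$ denotes the factor multiplying the gradient in the step. The helper name `acceleration_parameter` and the one-dimensional test function are hypothetical.

```python
def acceleration_parameter(f_new, f_old, phi, grad_sq_norm):
    """Assumed formula: 2 * (f_new - f_old + phi * ||g||^2) / (phi^2 * ||g||^2)."""
    numerator = f_new - f_old + phi * grad_sq_norm
    return 2.0 * numerator / (phi * phi * grad_sq_norm)


def f(x):
    """Illustrative 1-D quadratic 3 * x^2, whose second derivative is 6."""
    return 3.0 * x * x


x_old = 1.0
g_old = 6.0 * x_old           # derivative of 3 * x^2 at x_old
phi = 0.1                     # arbitrary positive step factor
x_new = x_old - phi * g_old   # step of the form x - phi * g

gamma_next = acceleration_parameter(f(x_new), f(x_old), phi, g_old * g_old)
print(gamma_next)
```

For a quadratic the second-order model is exact, so the computed value recovers the curvature of the function up to rounding, regardless of the chosen step factor.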
Next, the natural inequality $\gamma_{k+1} > 0$ is inevitable. This condition is required in order to fulfill the second-order necessary condition and the second-order sufficient condition. The choice $\gamma_{k+1} = 1$ is reasonable in the case when the inequality $\gamma_{k+1} < 0$ appears for some $k$. This choice produces the next iterative point as
$$x_{k+2} = x_{k+1} - g_{k+1}, \tag{12}$$
which evidently represents the classical gradient descent step.
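We consider now the $(k+1)$th iteration, $x_{k+2}$, which is given by
$$x_{k+2} = x_{k+1} - \left(\alpha_{k+1}(\gamma_{k+1}^{-1} - 1) + 1\right) g_{k+1}. \tag{13}$$
Examine the function $\Phi_{k+1}(\alpha)$:
$$\Phi_{k+1}(\alpha) = f(x_{k+1}) - \left(\alpha(\gamma_{k+1}^{-1} - 1) + 1\right)\|g_{k+1}\|^2 + \frac{1}{2}\left(\alpha(\gamma_{k+1}^{-1} - 1) + 1\right)^2 \gamma_{k+1}\|g_{k+1}\|^2, \tag{14}$$
defined as the finite part of the Taylor expansion of the function
$$f\left(x_{k+1} - \left(\alpha(\gamma_{k+1}^{-1} - 1) + 1\right) g_{k+1}\right) \tag{15}$$
under the assumption $\nabla^2 f(\xi) \approx \gamma_{k+1} I$. This function is convex when $\gamma_{k+1} > 0$, and its derivative is calculated in the following way:
$$\Phi'_{k+1}(\alpha) = (\gamma_{k+1}^{-1} - 1)\left(\gamma_{k+1}\left(\alpha(\gamma_{k+1}^{-1} - 1) + 1\right) - 1\right)\|g_{k+1}\|^2. \tag{16}$$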
Since the inequality is achieved, the following is valid:Therefore, the function decreases in the case and achieves its own minimum in the case . According to the criteria given by (17), desirable values for are within the interval . Now, (7) is a kind of the gradient descent process in the case . Since , it is easy to verify the following condition for the step length :Since in the case , this fractional number is not appropriate upper bound for in this case. On the other hand, the inequality holds in the case , so that is an appropriate upper bound for in this case.
According to the previous discussion, the iterative step is derived by the backtracking line search procedure presented in Algorithm 1.
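Algorithm 1 (calculation of the step size by the backtracking line search which starts from the upper bound defined in (22)). Requirement: objective function $f(x)$, the direction $d_k$ of the search at the point $x_k$, and real numbers $0 < \sigma < 0.5$ and $\beta \in (\sigma, 1)$.
(1) Set $\alpha = 1$.
(2) While $f(x_k + \alpha d_k) > f(x_k) + \sigma\alpha g_k^T d_k$ take $\alpha := \alpha\beta$.
(3) Return $\alpha_k = \alpha$.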
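A backtracking procedure of this kind can be sketched as follows. This is not the authors' code: it assumes the standard Armijo sufficient decrease test $f(x + \alpha d) \le f(x) + \sigma\alpha g^T d$, an initial trial step of 1, and illustrative parameter values `sigma=1e-4` and `beta=0.8`. The cap `max_reductions` is an added safeguard, and the function name and test problem are hypothetical.

```python
def backtracking(f, x, g, d, sigma=1e-4, beta=0.8, alpha0=1.0, max_reductions=100):
    """Shrink alpha by the factor beta until the assumed Armijo test holds."""
    fx = f(x)
    slope = sum(gi * di for gi, di in zip(g, d))  # g^T d, negative for a descent direction
    alpha = alpha0
    for _ in range(max_reductions):
        trial = [xi + alpha * di for xi, di in zip(x, d)]
        if f(trial) <= fx + sigma * alpha * slope:
            break
        alpha *= beta
    return alpha


def f(x):
    """Illustrative badly scaled quadratic."""
    return 10.0 * x[0] ** 2 + x[1] ** 2


x = [1.0, 1.0]
g = [20.0 * x[0], 2.0 * x[1]]   # gradient of f at x
d = [-gi for gi in g]           # steepest descent direction

alpha = backtracking(f, x, g, d)
trial = [xi + alpha * di for xi, di in zip(x, d)]
slope = sum(gi * di for gi, di in zip(g, d))
armijo_ok = f(trial) <= f(x) + 1e-4 * alpha * slope
print(alpha, armijo_ok)
```

Starting from the largest admissible step and only shrinking it keeps the number of function evaluations per iteration small whenever the first trial step is already acceptable.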
Finally, the TADSS algorithm of the defined accelerated gradient descent scheme (7) is presented.
Algorithm 2 (transformed accelerated double step size method (TADSS method)). Requirement: , , , .(1)Set , compute , , and take .(2)If , then go to Step 8; else continue by the next step.(3)Find the step size applying Algorithm 1.(4)Compute using (7).(5)Determine the scalar using (11).(6)If or , then take .(7)Set ; go to Step 2.(8)Return and .
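The overall loop can be sketched in a few lines. The sketch rests on several assumptions that are not confirmed by the text above: the step has the form $x_{k+1} = x_k - \phi_k g_k$ with $\phi_k = \alpha_k(\gamma_k^{-1} - 1) + 1$, the backtracking test is the Armijo condition applied to this step, the stopping test is $\|g_k\| \le$ `tol`, and $\gamma_{k+1}$ is reset to 1 whenever it falls outside $(0, 1]$. The name `tadss`, the parameter values, and the test function are hypothetical.

```python
import math


def tadss(f, grad, x0, sigma=1e-4, beta=0.8, tol=1e-6, max_iter=1000):
    """Sketch of a TADSS-type iteration under the assumptions stated above."""
    x = list(x0)
    gamma = 1.0                                   # initial acceleration parameter
    for k in range(max_iter):
        g = grad(x)
        g_sq = sum(gi * gi for gi in g)
        if math.sqrt(g_sq) <= tol:                # assumed stopping test
            return x, k
        fx = f(x)
        alpha = 1.0                               # backtracking starts from 1
        for _ in range(60):                       # safeguard cap on reductions
            phi = alpha * (1.0 / gamma - 1.0) + 1.0
            x_new = [xi - phi * gi for xi, gi in zip(x, g)]
            f_new = f(x_new)
            if f_new <= fx - sigma * phi * g_sq:  # assumed Armijo test on the step
                break
            alpha *= beta
        # Acceleration parameter from the second-order model with Hessian ~ gamma * I.
        gamma = 2.0 * (f_new - fx + phi * g_sq) / (phi * phi * g_sq)
        if gamma <= 0.0 or gamma > 1.0:           # assumed reset rule
            gamma = 1.0
        x = x_new
    return x, max_iter


def f(x):
    """Illustrative convex quadratic with Hessian eigenvalues 0.5 and 1."""
    return 0.5 * (0.5 * x[0] ** 2 + x[1] ** 2)


def grad(x):
    return [0.5 * x[0], x[1]]


x_min, iterations = tadss(f, grad, [1.0, 1.0])
print(x_min, iterations)
```

The test function is chosen with curvature not exceeding 1 so that the assumed reset rule and the backtracking on $\alpha$ are well behaved; for other problems the sketch would need the safeguards of the full method.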
3. Convergence of TADSS Scheme
The content of this section is the convergence analysis of the TADSS method. In the first part of this section a set of uniformly convex functions is considered. The proofs of the following statements can be found in [6, 7] and have been omitted.
Proposition 3 (see [6, 7]). If the function is twice continuously differentiable and uniformly convex on then (1)the function has a lower bound on , where is available;(2)the gradient is Lipschitz continuous in an open convex set which contains ; that is, there exists such that
Lemma 4. Under the assumptions of Proposition 3 there exist real numbers , satisfying such that has a unique minimizer and
The decrease of the objective function value in each iteration is estimated by the next lemma, which is restated and proven in . The same estimate can be obtained similarly for iteration (7). Theorem 6 is proved in  and confirms linear convergence of the constructed method.
Lemma 5. For twice continuously differentiable and uniformly convex function on and for the sequence generated by Algorithm (7) the following inequality is valid:where
Theorem 6. If the objective function is twice continuously differentiable as well as uniformly convex on and the sequence is generated by Algorithm 2 thenand the sequence converges to at least linearly.
In what follows, the case of strictly convex quadratic functions is analyzed. This set of functions is given by
$$f(x) = \frac{1}{2} x^T A x - b^T x. \tag{29}$$
In the previous expression $A$ is a real symmetric positive definite matrix and $b \in \mathbb{R}^n$. It is assumed that the eigenvalues of the matrix $A$ are given and ordered as $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$. Since the convergence of most gradient methods is quite difficult to analyze, in many research articles of this kind the convergence analysis is restricted to the set of convex quadratics [8–10]. The convergence of the TADSS method is also analyzed under similar assumptions.
Lemma 7. By applying the gradient descent method defined by (7) in which parameters and are given by relation (11) and Algorithm 1 on the strictly convex quadratic function expressed by relation (29) where presents a symmetric positive definite matrix, the next inequalities hold:where and are, respectively, the smallest and the largest eigenvalues of .
Proof. Considering expression (29), the difference between function value in the current and the previous point isApplying expression (7) the following is obtained: Using the facts that the gradient of the function (29) is in conjunction with the equality , one can verify the following: Substituting (33) into (11), the parameter becomesThe last relation confirms that is the Rayleigh quotient of the real symmetric matrix at the vector , so the next inequalities hold:which combined with the fact that prove the right hand side in (30):The estimation proved in , is considered in order to prove the left hand side of (30). Using the notation adopted in this paper, expression (37) becomesInequality (38) and the facts that , and lead to the following conclusion: In the last estimation, Lipschitz constant can be replaced by . The conclusion that the eigenvalue of matrix has the property of Lipschitz constant is to be derived from the next analysis. Since matrix is symmetric and the following inequality can be provided:Substituting by , inequality (39) becomes and this proves the left hand side of inequalities (30).
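The Rayleigh quotient property used in the proof can be checked numerically. The sketch below assumes the quadratic has the standard form $f(x) = \frac{1}{2}x^T A x - b^T x$, that the step is $x_{k+1} = x_k - \phi g_k$ for some positive factor $\phi$, and that the acceleration parameter is $2\,(f(x_{k+1}) - f(x_k) + \phi\|g_k\|^2)/(\phi^2\|g_k\|^2)$. The matrix, the vectors, and the value of `phi` are arbitrary illustrative data.

```python
import math

# Illustrative symmetric positive definite matrix and right-hand side.
A = [[2.0, 1.0], [1.0, 3.0]]
b = [1.0, -1.0]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(M, v):
    return [dot(row, v) for row in M]


def f(x):
    """Assumed quadratic form 0.5 * x^T A x - b^T x."""
    return 0.5 * dot(x, matvec(A, x)) - dot(b, x)


def grad(x):
    """Gradient A x - b of the quadratic above."""
    return [ai - bi for ai, bi in zip(matvec(A, x), b)]


x = [0.7, -1.3]
g = grad(x)
g_sq = dot(g, g)
phi = 0.37                                        # arbitrary positive step factor
x_new = [xi - phi * gi for xi, gi in zip(x, g)]

# Acceleration parameter from the assumed formula.
gamma_next = 2.0 * (f(x_new) - f(x) + phi * g_sq) / (phi * phi * g_sq)

# Rayleigh quotient of A at the gradient vector.
rayleigh = dot(g, matvec(A, g)) / g_sq

# Closed-form eigenvalues of a symmetric 2x2 matrix.
mean = 0.5 * (A[0][0] + A[1][1])
radius = math.sqrt((0.5 * (A[0][0] - A[1][1])) ** 2 + A[0][1] ** 2)
lam_min, lam_max = mean - radius, mean + radius

print(gamma_next, rayleigh, lam_min, lam_max)
```

Because the Taylor expansion of a quadratic terminates after the second-order term, the two quantities agree up to rounding, and the value lies between the smallest and the largest eigenvalue of the matrix.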
Remark 8. Comparing the estimations resulting from the similarly proposed lemma in [1, 3, 4] with the estimation derived from the previous lemma, considering the TADSS method, it can be concluded that the estimation provided by the TADSS scheme involves only the eigenvalues and and not the parameter from the backtracking procedure.
Theorem 9. Let the additional assumptions for the eigenvalues of matrix be imposed and let be the strictly convex quadratic function given by (29). Assume is the orthonormal eigenvectors of symmetric positive definite matrix and suppose that is the sequence of values constructed by Algorithm 2. The gradients of convex quadratics defined by (29) are and can be expressed as for some real constants and for some integer . Then the application of the gradient descent method (7) on the goal function (29) satisfies the following two statements:
Proof. Taking into account (7) one can verify and by taking (42) we get In order to prove (43), it is enough to show that . So, two cases have to be analyzed. In the first one, it is supposed that . Applying (30) leads to In the other case, it is assumed that . From this condition arrives the following conclusion: Expression (42) impliesThe fact that the parameter , from (43), satisfies confirms expression (44).
4. Numerical Experience
Numerical results obtained by applying implementations of the TADSS, ADSS, and SM methods to test functions for unconstrained optimization, proposed in [2, 11], are presented and analyzed. We chose most of the functions from the set of test functions presented in [3, 4] and, as proposed in these papers, ran the experiments with a large number of variables in each function: 1000, 2000, 3000, 5000, 7000, 8000, 10000, 15000, 20000, and 30000. The stopping criteria are the same as in [1, 3, 4]. The backtracking procedure uses the values , of the needed parameters. Three main indicators of efficiency are observed: number of iterations, CPU time, and number of function evaluations. First, we compare the performance of the TADSS scheme with the ADSS method. The reason for this selection is obvious: the TADSS scheme is a one-step version of the ADSS method, so it is natural to examine the behavior of TADSS and compare it with its forerunner. The obtained numerical values are displayed in Table 1 and refer to the number of iterative steps, the CPU time of execution in seconds, and the number of evaluations of the objective function.
The obtained numerical results generally confirm the advantage of TADSS with respect to all three tested indicators. More precisely, regarding the number of iterative steps, TADSS shows better results for 17 out of 22 functions, ADSS outperforms TADSS for 4 out of 22, and for the extended three exponential terms function both methods require the same number of iterations. Results concerning the consumed CPU time confirm that both methods, TADSS and ADSS, are very fast. TADSS is faster than ADSS in 9 out of 22 cases, ADSS is faster than TADSS in 2 out of 22 cases, and in half of the tests the CPU time of both iterations equals zero. Regarding the number of evaluations of the objective function, TADSS improves on ADSS in 17 out of 22 tests and the opposite holds in 5 out of 22 cases. Table 2 displays average results of the tested values.
According to the results displayed in Table 2, although TADSS outperforms ADSS in 17 out of 22 tests with regard to the number of iterations, the average results show a slight advantage of ADSS in this respect. For the average number of evaluations, the situation is the opposite, in favor of TADSS. The consumed CPU time is on average three times smaller for TADSS than for ADSS. Generally, it can be concluded that the one-step variant of the ADSS method, the constructed TADSS scheme, behaves slightly better than the original ADSS iteration, especially with respect to the speed of execution.
Some additional experiments have been carried out in further numerical research. These tests compare the TADSS and the SM iterations. As mentioned before, both schemes, TADSS and SM, are accelerated gradient descent methods with one iterative step size parameter. We chose this additional numerical comparison in order to confirm that the accelerated single step size TADSS algorithm, derived from the accelerated double step size ADSS model, performs better with respect to all three analyzed aspects than the classically defined accelerated single step size SM method. Table 3, with 30 displayed test functions, verifies the previous assertion.
It can be observed from the displayed numerical outcomes that the TADSS method provides better results than the SM method regarding the number of iterations in 24 out of 30 tests, while the number of opposite cases is 5 out of 30. For the NONSCOMP test function, both models have the same number of iterations. Concerning the CPU time, both algorithms give the same results for 10 test functions. The TADSS method is faster than SM for 19 test functions, while the SM method is faster than TADSS for one test function only. The greatest progress is obtained with respect to the number of evaluations of the objective function. Here, the TADSS algorithm gives better results for 27 out of 30 test functions, while the opposite case holds for two test functions only. For the NONSCOMP test function both of the compared iterations give the same number of evaluations. From Table 3, we can also notice that for 2 out of 30 test functions the tests with the SM method last longer than the time limiter constant defined in , while for all 30 test functions the execution time of the TADSS algorithm is far less than .
The results arranged in Table 4 give an even more general view of the benefits of the TADSS method with regard to the SM method. The average values for the 28 test functions which could be tested by both methods within the constant are presented in the table.
The results presented in the previous table confirm that the TADSS method needs approximately 25 times fewer iterations and 35 times fewer evaluations of the objective function than the SM method. Finally, the testing with TADSS takes about 107 times less time than with SM.
The codes for presented numerical experiments are written in the Visual C++ programming language on a Workstation Intel 2.2 GHz.
5. Conclusion
The accelerated single step size gradient descent algorithm, called TADSS, is defined as a transformation of the accelerated double step size gradient descent model ADSS, proposed in . More precisely, the TADSS scheme is derived from the accelerated double step size gradient descent scheme ADSS by imposing relation (6) between the two step parameters in the ADSS iteration. The efficiency of the ADSS model regarding all analyzed characteristics in comparison to the accelerated gradient descent single step size SM method has been numerically confirmed in .
The method defined in this way is comparable with its double step size forerunner, the ADSS method, as well as with the single step size accelerated gradient descent SM method, which is defined in a classical manner.
Results illustrated in Tables 1 and 2 generally indicate that the TADSS method behaves similarly to the ADSS method. In terms of mean values, the ADSS scheme gives slightly better results regarding the number of iterations. On the other hand, a certain improvement regarding the number of function evaluations and the needed CPU time is obtained by applying the TADSS iterations.
Even greater advantages of the derived TADSS method are presented in Tables 3 and 4, where the comparisons between the TADSS and the SM are given. Evidently, the TADSS scheme improves on the SM method with respect to all three analyzed characteristics, which was the prime goal of the research presented herein.
Linear convergence of the TADSS method is proved for the uniformly convex and the strictly convex quadratic functions.
The obtained results motivate further investigations of possible accelerated double step size gradient descent models and their transformations into corresponding single step size variants.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
The authors gratefully acknowledge the financial support of the Serbian Ministry of Education, Science and Technological Development.
J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, London, UK, 1970.
R. T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, NJ, USA, 1970.
N. Andrei, “An unconstrained optimization test functions collection,” Advanced Modeling and Optimization, vol. 10, pp. 147–161, 2008.