Abstract
This paper presents a general and comprehensive description of Optimization Methods and Algorithms from a novel viewpoint. It is shown, in particular, that Direct Methods, Iterative Methods, and Computer Science Algorithms belong to a well-defined general class of both Finite and Infinite Procedures, characterized by suitable descent directions.
1. Introduction
The dichotomy between Computer Science and Numerical Analysis has long been the main obstacle to the development of eclectic computational tools. By this term the author means the capability of implementing algorithms that can be properly adapted to particular environmental requirements and are, therefore, optimized for this aim.
Since the formulation of a problem requires the preliminary definition of the variables and of the functions involved in the model, the antithesis between finite and continuous applied mathematics is even stronger from a computational point of view.
In Computer Science, problems are typically defined on discrete sets (graphs, integer variables and so forth) and are characterized by procedures formalized in a finite number of steps.
Direct Methods, which are classical tools of Numerical Analysis, can in fact be considered algorithms according to the standard Computer Science definitions. However, the presence of ill-conditioned matrices can seriously affect their practical implementation. On the other hand, Iterative Methods are in most cases based on the convergence of a sequence approximating the optimal solution of a problem defined over a continuous range. Proper stopping rules on the truncation error reduce the latter computational scheme to a finite process; unfortunately, in many cases the theoretical result is affected by a variety of numerical instability problems, which prevents a precise prediction of the actual number of iterations required to achieve the desired approximation.
Furthermore, Linear Programming, Convex Quadratic Programming, and the unconstrained minimization of the quadratic form associated with a symmetric positive definite matrix are continuous problems that can be solved exactly in a finite number of steps. This shows that the distinction between algorithms and infinite iterative procedures is not always determined by whether the variables of the problem range over a discrete or a continuous set.
Most Numerical Analysis methods are based upon the Fixed Point theorem, which ensures the convergence of the iterative scheme through a contraction of the distance between successive terms of the sequence approximating the optimal solution.
Gradient methods are usually considered in the literature as particular procedures in the frame of optimization techniques, for classical unconstrained or constrained problems.
The main aim of the present paper is to show that Gradient or Gradient-type methods are the fundamental computational tool for solving a wide set of continuous optimization problems, since they rest on a unitary principle that applies to both finite and infinite procedures.
Moreover, some classical discrete optimization algorithms can also be viewed in the framework of Gradient-type methods.
Hence, the gradient approach allows one to deal with problems whose variables are defined either in a continuous or in a discrete range, by using finite or infinite procedures in a quite general perspective.
It should be emphasized that ABS methods [1], which represent a remarkable class of algorithms for solving linear and nonlinear equations, are founded on a quite different approach. Roughly speaking, ABS methods construct a set of spanning matrices by performing an adaptive optimization that depends on the dimension of the subspace and on a set of parameters. In many ABS methods the choice of the optimal parameters is, in fact, crucial in order to identify, by a unified approach, the structural features of the optimization algorithms. No such parameter dependence is present in Gradient-type methods.
It is important to emphasize that the finiteness typical of Computer Science algorithms is characterized by classes of Gradient-type methods that converge to an isolated point of a suitable sequence generated by the procedure.
Furthermore, the most recent algorithms for Local Optimization can be precisely described by Gradient-type methods in a general framework. Indeed, Interior Point techniques [2, 3] and Barrier Algorithms [4] represent a wide set of Gradient-type methods for NonLinear Programming.
Moreover, a fundamental role in this new approach is played by the properties of suitable Structured matrices associated with the optimization procedures. Advanced Linear Algebra Techniques are, in fact, essential to construct low-complexity algorithms.
We point out, in particular, the techniques based on Fast Transforms and the corresponding approximations by algebras of simultaneously diagonalizable matrices [5–10].
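As a small illustration of the Fast Transform idea, the sketch below uses the best-known algebra of this kind, the circulant matrices, which are all diagonalized by the discrete Fourier transform. A circulant matrix C with first column c satisfies C = F^{-1} diag(Fc) F, so products and linear solves cost O(n log n) instead of O(n^2) or O(n^3). This is a minimal pure-Python sketch with a textbook radix-2 FFT (the length must be a power of two). The function names are illustrative and are not taken from [5–10].

```python
import cmath
from typing import List


def fft(a: List[complex], inverse: bool = False) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT (unnormalized); len(a) must be a power of 2."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def circulant_eigenvalues(c: List[float]) -> List[complex]:
    """Eigenvalues of the circulant C with first column c, since C = F^{-1} diag(F c) F."""
    return fft([complex(v) for v in c])


def circulant_matvec(c: List[float], x: List[float]) -> List[float]:
    """y = C x in O(n log n): transform x, scale by the eigenvalues, transform back."""
    n = len(c)
    lam = circulant_eigenvalues(c)
    xh = fft([complex(v) for v in x])
    y = fft([l * v for l, v in zip(lam, xh)], inverse=True)
    return [v.real / n for v in y]


def circulant_solve(c: List[float], b: List[float]) -> List[float]:
    """Solve C x = b in O(n log n), assuming every eigenvalue of C is nonzero."""
    n = len(c)
    lam = circulant_eigenvalues(c)
    bh = fft([complex(v) for v in b])
    x = fft([v / l for l, v in zip(lam, bh)], inverse=True)
    return [v.real / n for v in x]


def dense_circulant(c: List[float]) -> List[List[float]]:
    """Explicit circulant matrix C[i][j] = c[(i - j) % n], used only for comparison."""
    n = len(c)
    return [[c[(i - j) % n] for j in range(n)] for i in range(n)]
```

For c = (4, 1, 0, 1) and x = (1, 2, 3, 4), both `circulant_matvec` and the explicit dense product give (10, 12, 18, 20). The eigenvalues of this C are 6, 4, 2, 4, so `circulant_solve` recovers x from that product.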
The use of Advanced Linear Algebra Techniques in NonLinear Programming opens a new research field, leading in many cases to a significant improvement in both the efficiency and the practical applicability of Gradient-type methods for problems of operational interest [11, 12].
In Deterministic Global Optimization, structured matrices have led to remarkable results within Tunneling techniques based on the classical approach [9].
The novel results on Tensor computation [13] open a promising area of research for improving the efficiency of global optimization algorithms on large-scale problems, particularly for the effective construction of more general sets of Repeller matrices in the Tunneling phases [14, 15]. This approach can also have important consequences in Nonlinear Integer Optimization (see the pioneering work in [16]), taking into account the more recent results concerning the discretization of the problem by continuation methods (see, e.g., [17]).
Therefore, this survey also aims to establish in-depth general relationships between Local Optimization techniques and Deterministic Global Optimization algorithms within the frame of Advanced Linear Algebra Techniques.
2. The Gradient and the Gradient-Type Approach
Let
$$\min_{x \in \mathbb{R}^n} f(x) \qquad (2.1)$$
be an unconstrained problem to be solved.
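By assuming $f \in C^1(\mathbb{R}^n)$, the simplest heuristic procedure to deal with (2.1) is to determine the stationary points of $f$, that is, the points where $\nabla f = 0$, by the recursive computational scheme
$$x_{k+1} = x_k - \lambda_k \nabla f(x_k), \qquad k = 0, 1, \ldots, \qquad (2.2)$$
with $x_0$ being the initial point of the procedure, $-\nabla f(x_k)$ the direction of maximum local decrease, and $\lambda_k > 0$ the one-dimensional step size.
The step size $\lambda_k$ is computed such that and the sequence $\{x_k\}$ satisfies the condition The iterative method (2.2) is a particular case of the following general Gradient-type method:
$$x_{k+1} = x_k + \lambda_k d_k, \qquad (2.5)$$
where $d_k$ is a descent direction, that is, $\nabla f(x_k)^T d_k < 0$.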
The following theorem generalizes a well-known result shown in [18].
Theorem 2.1 (see [7]). If $d_k$ is a descent direction in $x_k$ for a function $f$, then there exists a symmetric positive definite (spd) matrix $A_k$ such that
$$d_k = -A_k \nabla f(x_k). \qquad (2.6)$$
Moreover, the following property holds:
Remark 2.2. Particular cases of descent directions can be obtained, by setting,
,
,
(see [5, 6, 18, 19]).
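The role of the spd matrix $A_k$ in $d_k = -A_k \nabla f(x_k)$ can be seen on a convex quadratic $f(x) = \frac{1}{2} x^T Q x - b^T x$. The sketch below is a minimal illustration under that assumption. It runs the general Gradient-type iteration (2.5) with an exact line search and two standard choices of $A_k$: the identity (Steepest Descent) and the inverse Hessian $Q^{-1}$ (Newton's method). We do not claim that these coincide with the particular cases listed in Remark 2.2. The names `Q`, `b` and `choose_A` are hypothetical.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(A: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def gradient_type_method(Q: Matrix, b: Vector, x0: Vector,
                         choose_A: Callable[[Vector], Matrix],
                         n_iter: int) -> Vector:
    """x_{k+1} = x_k + lam_k d_k with d_k = -A_k grad f(x_k), for
    f(x) = 0.5 x^T Q x - b^T x (Q spd); lam_k is the exact line-search step."""
    x = list(x0)
    for _ in range(n_iter):
        g = [qx - bi for qx, bi in zip(matvec(Q, x), b)]   # grad f(x) = Qx - b
        if dot(g, g) == 0.0:
            break
        d = [-v for v in matvec(choose_A(x), g)]           # d_k = -A_k g_k
        assert dot(g, d) < 0, "A_k must be spd so that d_k is a descent direction"
        lam = -dot(g, d) / dot(d, matvec(Q, d))             # argmin over lam of f(x + lam d)
        x = [xi + lam * di for xi, di in zip(x, d)]
    return x


Q = [[3.0, 1.0], [1.0, 2.0]]
b = [1.0, 1.0]
Q_inv = [[0.4, -0.2], [-0.2, 0.6]]      # inverse of Q, i.e. the (constant) inverse Hessian
identity = lambda x: [[1.0, 0.0], [0.0, 1.0]]

steepest = gradient_type_method(Q, b, [0.0, 0.0], identity, 1)
newton = gradient_type_method(Q, b, [0.0, 0.0], lambda x: Q_inv, 1)
```

With $A_k = Q^{-1}$, the first step lands on the minimizer $Q^{-1} b = (0.2, 0.4)$. With $A_k = I$, a single step only reaches $(2/7, 2/7)$, and many iterations are needed.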
It is useful to underline that the general theory of admissible directions for unconstrained optimization [20] is also a special case of (2.5). By setting, in fact, for a given :
one can obtain other Gradient-type methods described by (2.5).
The iterative scheme described by Algorithm 1 contains several ingredients of a general Gradient-type method (see [20]).
The convergence of Algorithm 1 is guaranteed by the following result (see again [20]).
Theorem 2.3. Let , open . Let . Then, evaluated by Algorithm 1: (i);(ii)if has at least an Extremal Point (EP) ;(iii)every EP of is a stationary point, that is, .
Remark 2.4. Notice that Theorem 2.3 can also be applied to classical Computer Science algorithms. Indeed, if condition (ii) is not verified, then, by definition, , implying , that is, convergence to a stationary point in a finite number of steps . Moreover, convergence to an isolated point of the sequence can be proven by contradiction by showing that We will see in Sections 3 and 4 that the convergence of a given iterative procedure in a finite number of steps can be verified in this way for both unconstrained and constrained problems.
3. Local Unconstrained Optimization
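Let $A \in \mathbb{R}^{n \times n}$ and $b \in \mathbb{R}^n$ be a spd matrix of order $n$ and an $n$-dimensional vector, respectively.
It is well known that the problem
$$\min_{x \in \mathbb{R}^n} \frac{1}{2} x^T A x - b^T x \qquad (3.1)$$
can be solved exactly in at most $n$ steps by the Conjugate Gradient (CG) method [21], which therefore represents a direct method for solving (3.1). The quadratic form associated with a spd matrix is, in fact, a convex function.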
However, it can also be proved that the procedure defined in (2.2), that is, the Steepest Descent method, always requires an infinite number of iterations, apart from the trivial case . The latter result shows that the existence of a finite procedure to solve (3.1) does not depend only on convexity, but is also the consequence of a sort of optimal matching between the problem and the corresponding algorithm, which in this case is the CG method. On the other hand, the latter method can also be interpreted as an iterative method in the family of the following fixed point procedures: with being a suitable scalar parameter. By setting, in fact, one can obtain the classical iterative scheme Since , (3.4) is convergent if the original matrix is well conditioned.
Moreover, if $x^*$ is the optimal solution of (3.1), the truncation error of the method is: In the case of the CG method, one can prove the inequality
$$\|x_k - x^*\|_A \le 2\left(\frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}\right)^k \|x_0 - x^*\|_A, \qquad (3.6)$$
where $\|v\|_A = (v^T A v)^{1/2}$ and $\kappa = \lambda_{\max}(A)/\lambda_{\min}(A)$ is the spectral condition number of $A$. Equation (3.6) shows that, if the dimension $n$ is huge and the matrix $A$ is well conditioned, it is computationally more convenient to implement the CG method as a classical iterative procedure with a stopping rule based on the above inequality.
So, once again, the distinction between Numerical Analysis direct methods (or Computer Science algorithms) and infinite procedures cannot be considered as the fundamental classification rule in computational mathematics.
In the case of the Steepest Descent method, the truncation error satisfies
$$\|x_k - x^*\|_A \le \left(\frac{\kappa - 1}{\kappa + 1}\right)^k \|x_0 - x^*\|_A. \qquad (3.7)$$
The difference between (3.6) and (3.7) clearly indicates the greater efficiency of the CG method.
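The contrast between the finite termination of CG and the infinite, though convergent, Steepest Descent and fixed point iterations can be checked numerically. The plain-Python sketch below applies three methods to the $5 \times 5$ tridiagonal spd matrix with 2 on the diagonal and $-1$ off it:

- CG;
- Steepest Descent with exact line search;
- a Richardson fixed point iteration $x_{k+1} = x_k + \alpha(b - Ax_k)$.

The Richardson scheme is assumed here as a representative of the fixed point family, with $\alpha = 2/(\lambda_{\min} + \lambda_{\max}) = 0.5$ for this matrix. The test matrix and the right-hand side are illustrative choices.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(A: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def residual_norm(A: Matrix, b: Vector, x: Vector) -> float:
    return math.sqrt(sum((bi - ai) ** 2 for bi, ai in zip(b, matvec(A, x))))


def conjugate_gradient(A: Matrix, b: Vector, max_iter: int,
                       tol: float = 1e-14) -> Tuple[Vector, int]:
    x = [0.0] * len(b)
    r = list(b)                      # r_0 = b - A x_0 with x_0 = 0
    p = list(r)
    rr = dot(r, r)
    k = 0
    while k < max_iter and math.sqrt(rr) > tol:
        Ap = matvec(A, p)
        alpha = rr / dot(p, Ap)      # exact minimization along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]   # A-conjugate direction
        rr = rr_new
        k += 1
    return x, k


def steepest_descent(A: Matrix, b: Vector, n_iter: int) -> Vector:
    x = [0.0] * len(b)
    for _ in range(n_iter):
        r = [bi - ai for bi, ai in zip(b, matvec(A, x))]   # r = -grad f(x)
        rr = dot(r, r)
        if rr == 0.0:
            break
        alpha = rr / dot(r, matvec(A, r))                    # exact line search
        x = [xi + alpha * ri for xi, ri in zip(x, r)]
    return x


def richardson(A: Matrix, b: Vector, alpha: float, n_iter: int) -> Vector:
    """Fixed point iteration x <- x + alpha (b - A x); it contracts iff rho(I - alpha A) < 1."""
    x = [0.0] * len(b)
    for _ in range(n_iter):
        x = [xi + alpha * (bi - ai) for xi, bi, ai in zip(x, b, matvec(A, x))]
    return x


n = 5
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]
b = [1.0, 0.0, 0.0, 0.0, 0.0]
x_cg, k_cg = conjugate_gradient(A, b, max_iter=n)
```

In exact arithmetic, CG reaches $x^*$ after at most $n = 5$ steps. In floating point, its residual after 5 steps is at rounding level. After the same 5 steps, Steepest Descent is still far from $x^*$ and needs many more iterations, in line with (3.6) and (3.7).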
In [22] the finite version of the CG method was extended to a family of nonquadratic functions, including the following important sets: where .
According to the classical definition, the function indicated in (3.8) is called conic. If , then .
Hence, represent a class of nonquadratic functions for which the optimal solution can be found with a finite number of steps if the matrix is spd.
As a matter of fact, the following result holds.
Theorem 3.1 (see [22] Theorem 3.1, Lemmas 3.2 and 5.1). Let be defined as in (3.9). Then the minimum problem can be solved in at most steps.
Let us now consider some generalizations of convexity, which play an important role in global optimization (see [7]).
Let $d_k$ be a descent direction in $x_k$ for a function $f$. The importance of the following definitions will be shown by the next results of this section.
Definition 3.2. A function is called algorithmically convex if evaluated by an algorithm of type (2.5), one has
Definition 3.3. A function is called weakly convex if evaluated by an algorithm (2.5), the following inequality holds:
Definition 3.4. Let be descent directions of an algorithm of type (2.5) applied to problem (2.1). Then the method is called secant if the matrix solves the secant equation:
Definition 3.2 is clearly a generalization of convexity. As a matter of fact, if then , is convex if and only if (3.11) is verified (see [23]).
Definition 3.3 is also a generalization of convexity. In [24], in fact, it is proved that if , is convex, then (3.12) is satisfied . So (3.12) is a necessary, but not sufficient, condition for a function to be convex.
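Definition 3.4 is an $n$-dimensional generalization of the classical secant iterative formula for computing the zeros of the derivative of a function $\varphi: \mathbb{R} \to \mathbb{R}$, that is,
$$t_{k+1} = t_k - \frac{t_k - t_{k-1}}{\varphi'(t_k) - \varphi'(t_{k-1})}\, \varphi'(t_k). \qquad (3.14)$$
Observe, in fact, that (3.14) can be rewritten as
$$t_{k+1} = t_k - a_k \varphi'(t_k), \qquad a_k \bigl(\varphi'(t_k) - \varphi'(t_{k-1})\bigr) = t_k - t_{k-1}.$$
Hence, the expression of $a_k$ is the 1-dimensional version of (3.13).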
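The rewriting of (3.14) can be checked with a few lines of code. The sketch below applies the secant formula to $\varphi'(t) = 0$ and stores explicitly the scalar $a_k$ that solves the one-dimensional secant equation $a_k(\varphi'(t_k) - \varphi'(t_{k-1})) = t_k - t_{k-1}$. The test function $\varphi(t) = t^4/4 - t$ and the function name are illustrative choices.

```python
from typing import Callable


def secant_minimize_1d(dphi: Callable[[float], float], t0: float, t1: float,
                       tol: float = 1e-12, max_iter: int = 100) -> float:
    """Secant iteration on phi'(t) = 0, written as t_{k+1} = t_k - a_k phi'(t_k),
    where the scalar a_k solves the 1-D secant equation
    a_k (phi'(t_k) - phi'(t_{k-1})) = t_k - t_{k-1}."""
    t_prev, t = t0, t1
    g_prev, g = dphi(t_prev), dphi(t)
    for _ in range(max_iter):
        if abs(g) < tol:
            break
        denom = g - g_prev
        if denom == 0.0:
            break                        # the secant equation has no solution; stop
        a = (t - t_prev) / denom         # 1-D analogue of the quasi-Newton matrix
        t_prev, g_prev = t, g
        t = t - a * g
        g = dphi(t)
    return t


# phi(t) = t**4 / 4 - t is convex with its unique minimizer at t = 1 (phi'(t) = t**3 - 1).
t_min = secant_minimize_1d(lambda t: t ** 3 - 1.0, 0.0, 2.0)
```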
The following result is proved in [7].
Theorem 3.5 (see also [18]). Let be descent directions of a secant method, that is, satisfying (3.13), applied to problem (2.1). Moreover, let conditions (2.7) and (3.12) be verified. Then, , such that
Remark 3.6. Theorem 3.5 shows that global convergence of a quasi-Newton secant method applied to problem (2.1) can be obtained if the function $f$ is weakly convex and the matrices approximating the Hessian of $f$ are well conditioned.
Remark 3.7. By utilizing Armijo-Goldstein-Wolfe's method [18] and setting , the step in (2.5) is such that Hence, by Definition 3.2, in this case the function is also algorithmically convex. For general descent directions , evaluated by a quasi-Newton secant method, inequality (3.11) is not always satisfied.
4. Local Constrained Optimization
Quadratic Programming (QP) is defined in the following way: with being a symmetric positive semidefinite (spsd) matrix of order and a matrix with rows and columns.
Remark 4.1. Let . The optimal solution of (4.1) can be located at any point of . Hence, (4.1) is a continuous problem that cannot be immediately reduced to a finite problem, as happens in the case , that is, Linear Programming (LP).
Let us consider, for instance, the following problems:
The optimal solution of (4.2) is the point , which lies on the boundary of but is not a vertex. On the other hand, problem (4.3) has its optimal solution at the interior point . However, QP can in general be solved in a finite number of steps by means of Frank-Wolfe's algorithm [25]. So QP can be considered a finite continuous constrained optimization problem.
The following question arises: does QP mark the boundary separating finite continuous constrained optimization problems from infinite ones? In other words, do there exist more general nonlinear constrained optimization problems that can be solved in a finite number of iterations? Since in the previous section we have shown that, in the unconstrained case, there exist nonquadratic problems that can be solved exactly in a finite number of iterations by an extension of the CG method, the answer is expected to be positive.
Given a convex function $f$, Convex Programming with Linear Constraints (CPLC) is defined as Problem (4.4) can be solved by the Reduced Gradient (RG) algorithm or by the Gradient Projection (GP) method [23, 26, 27].
Assuming with maximum rank and taking into account Remark 2.4, one can introduce the following.
Definition 4.2. Let $f$ be a convex function.
Let be a nonempty polyhedron. The corresponding CPLC problem (4.4) is a finite continuous constrained optimization problem if and only if there exist a convergent Gradient-type method (2.5) and a positive real number such that, if (2.5) required an infinite number of steps, then
Equation (4.5) clearly implies that .
The importance of Definition 4.2 can be pointed out by the next result, showing the relationship between (4.4) and a particular linear optimization problem.
Theorem 4.3. Let be an admissible solution of (4.4). Let be a descent direction in for the function . Then is an admissible descent direction for (4.4) if .
Moreover, for any fixed the optimal solution of the problem
is given by
By setting and it was proven in [28] that (4.6) is equivalent to a general LP problem, that is,
Furthermore, if , the following result holds (see [29]).
Theorem 4.4. Given a suitable integer and the function , (4.6), and hence (4.8), are equivalent to finding a point : Moreover, it is possible to determine a real number and a sequence by a GP algorithm with a suitable scaling procedure (see again [29]) such that
By Theorem 4.4 and Definition 4.2, it follows that there exists a Gradient-type method (2.5) solving LP in a finite number of steps. Hence LP is a finite continuous constrained optimization problem. It is important to underline that this result is not a consequence of the intrinsic finiteness of the set of possible optimal solutions (the vertices of a polyhedron), as in the classical simplex algorithm.
Given the convex functions , let us now consider the general Convex Programming (CP) problem: The following property is well known [23, 26].
Definition 4.5. Letting and , then the constraints of (4.12) are qualified if one of the following conditions is satisfied: If , then (4.14) is trivially satisfied .
So, from Definition 4.5 we deduce that the constraints of CPLC problem (4.4) are always qualified. Assuming in (4.4) with maximum rank, we clearly obtain a condition equivalent to (4.13).
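Definition 4.6. A set $K$ is called a convex cone if
$$\alpha x + \beta y \in K \quad \text{for all } x, y \in K \text{ and all } \alpha, \beta \ge 0.$$
The following theorem was proved in [30] in a general Hilbert space (see Theorem 2.3 of [30]).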
Theorem 4.7. Let be closed convex cones, and let denote the interior of . Assume that .
Then the corresponding conic feasibility problem
can be solved in a finite number of steps.
The technique used to prove Theorem 4.7 is based upon the so-called Method of Alternating Projections (MAP) (see [31]).
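The projection steps of MAP can be illustrated with two simple closed convex sets: the hyperplane $\{x : a^T x = \beta\}$ and the nonnegative orthant, which is a closed convex cone. This plain version is only meant to show the mechanics of the method. In general it converges asymptotically rather than finitely; here the hyperplane is not a cone, so Theorem 4.7 does not apply. On the example below, the constraint violation shrinks by a factor 2/3 per sweep. The functions and data are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]


def project_hyperplane(x: Vector, a: Vector, beta: float) -> Vector:
    """Orthogonal projection onto {x : a^T x = beta}."""
    shift = (sum(ai * xi for ai, xi in zip(a, x)) - beta) / sum(ai * ai for ai in a)
    return [xi - shift * ai for xi, ai in zip(x, a)]


def project_orthant(x: Vector) -> Vector:
    """Orthogonal projection onto the closed convex cone {x : x >= 0}."""
    return [max(xi, 0.0) for xi in x]


def alternating_projections(x: Vector, a: Vector, beta: float, tol: float = 1e-10,
                            max_iter: int = 10_000) -> Tuple[Vector, int]:
    """Alternate the two projections until the hyperplane constraint holds within tol."""
    for k in range(max_iter):
        x = project_orthant(project_hyperplane(x, a, beta))
        if abs(sum(ai * xi for ai, xi in zip(a, x)) - beta) < tol:
            return x, k + 1
    return x, max_iter


x_feas, iters = alternating_projections([2.0, -1.0, -3.0], [1.0, 1.0, 1.0], 1.0)
```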
Remark 4.8. Theorem 4.7 can be applied to CPLC problem (4.4), by assuming
Hence, explicit formulas for the projection operators of suitable classes of nonlinear convex feasibility problems, expressed in terms of the corresponding conified sets, might allow one to solve CPLC problem (4.4) in the nonquadratic case with a finite number of steps. Indeed, by using Theorem 3.1, we can prove the following important theorem.
Theorem 4.9 (see [33]). Consider the particular CPLC problem Assume that the optimal solution of problem (4.18) is such that . Then (4.18) can be converted into a convex feasibility problem by a proper modification of the Alternating Projection method, and the latter algorithm converges to the optimal solution in a finite number of steps.
Remark 4.10. Given the convex set of feasible solutions the proof of Theorem 4.9 is essentially based upon the following computational ingredients:
(a) by Theorem 4.7, one can convert the closed convex set defined in (4.19) into a closed convex cone;
(b) by Theorem 3.1, the extended version of the CG method and a suitable projection algorithm can be applied to problem (4.18), thereby obtaining convergence in a finite number of steps.
5. Global Optimization
One can prove the following global convergence theorem [7].
Theorem 5.1. Consider Problem (2.1), where .
Let be the value of the optimal solution. Assume that
If in an iterative scheme of BFGS-type,
the following conditions are satisfied:
Then
Theorem 5.1 thus points out three conditions for a BFGS-type global optimization method.
Condition (5.1) assumes an optimal matching between the BFGS-type algorithm and the function $f$ [34]. Condition (5.3) is equivalent to (3.12), that is, $f$ is weakly convex (see [24]).
Condition (5.4) can be easily satisfied by modifying the matrices through a restarting procedure, because every descent direction is associated with an spd matrix (see Theorem 2.1).
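The restarting idea can be sketched as a generic safeguard. It is not the specific BFGS-type scheme of [7]. The inverse BFGS update keeps $H_k$ spd whenever $s_k^T y_k > 0$. When this curvature condition fails, the matrix is simply reset to the identity, so $d_k = -H_k \nabla f(x_k)$ remains a descent direction (cf. Theorem 2.1). The line search is plain Armijo backtracking. The test function is an illustrative strictly convex function whose minimizer is the origin. All names are hypothetical.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def identity(n: int) -> List[List[float]]:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def bfgs_with_restart(f: Callable[[Vector], float], grad: Callable[[Vector], Vector],
                      x: Vector, tol: float = 1e-8, max_iter: int = 500,
                      c1: float = 1e-4) -> Tuple[Vector, int]:
    n = len(x)
    H = identity(n)                       # spd approximation of the inverse Hessian
    g = grad(x)
    restarts = 0
    for _ in range(max_iter):
        if math.sqrt(dot(g, g)) <= tol:
            break
        d = [-dot(row, g) for row in H]   # descent direction while H is spd
        fx, slope, lam = f(x), dot(g, d), 1.0
        while True:                       # Armijo backtracking line search
            x_new = [xi + lam * di for xi, di in zip(x, d)]
            try:
                f_new = f(x_new)
            except OverflowError:
                f_new = math.inf
            if f_new <= fx + c1 * lam * slope or lam < 1e-12:
                break
            lam *= 0.5
        g_new = grad(x_new)
        s = [p - q for p, q in zip(x_new, x)]
        y = [p - q for p, q in zip(g_new, g)]
        sy = dot(s, y)
        if sy > 1e-12 * math.sqrt(dot(s, s) * dot(y, y)):
            # Inverse BFGS update H+ = (I - rho s y^T) H (I - rho y s^T) + rho s s^T.
            rho = 1.0 / sy
            Hy = [dot(row, y) for row in H]
            coef = rho * rho * dot(y, Hy) + rho
            H = [[H[i][j] - rho * (Hy[i] * s[j] + s[i] * Hy[j]) + coef * s[i] * s[j]
                  for j in range(n)] for i in range(n)]
        else:
            H = identity(n)               # restart: discard curvature information
            restarts += 1
        x, g = x_new, g_new
    return x, restarts


def f(x: Vector) -> float:
    u = x[0] - 2.0 * x[1]
    return math.exp(x[0]) - x[0] + math.exp(x[1]) - x[1] + 0.5 * u * u


def grad(x: Vector) -> Vector:
    u = x[0] - 2.0 * x[1]
    return [math.exp(x[0]) - 1.0 + u, math.exp(x[1]) - 1.0 - 2.0 * u]


x_opt, restarts = bfgs_with_restart(f, grad, [1.0, -2.0])
```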
Let us now consider the classical “box-constrained” problem: Let denote the current box at iteration .
Set The following global convergence theorem holds (see [35, 36]).
Theorem 5.2. Consider Problem (5.6) and assume . These hypotheses imply Set Then, it follows : Moreover, : Theorem 5.2 can be immediately extended to Problem (2.1), by assuming a growth condition on the function .
In fact, we have the following.
Corollary 5.3. Given : Equation (5.14) implies Assume Then, the convergence results proved for (5.6) can be applied to (2.1).
We can fruitfully combine the results of Theorems 5.1 and 5.2, by proving the following.
Theorem 5.4. Consider Problem (5.6) and assume .
If in a BFGS-type iterative scheme
the following conditions are satisfied:
Then (5.17) is convergent to the optimal solution of (5.6).
Proof. By the assumptions it follows: .
Hence, by (5.7) we have for all :
Set
Therefore, by (5.21) and (5.22) for all :
So,
Let be a local minimum in the box .
If and , then define
Else, set
with being a local minimum evaluated by the starting point and contained in the box . Since the assumptions of Theorem 5.2 are satisfied, by the results of [7] (see Theorem 2 and Corollary 2), it follows that (5.19), (5.20) imply that :
Applying Theorem 5.2, by inequalities (5.11) and (5.27) and by setting , we have that ,
This completes the proof.
Although the local minimization phases are performed effectively by the iterative scheme (5.17), the convergence of the method to the global minimum is usually very slow, by the very nature of the approach. In particular, the number of upper bounds and of corresponding boxes required to obtain a satisfactory approximation can be unacceptable from a computational point of view. To overcome this problem, a fast determination of “good” local minima is essential.
More precisely, by using terminal repellers and tunneling techniques [14], one can build algorithms based on a sequence of cycles, where each cycle has two phases: a local optimization phase and a tunneling phase. The main aim of these procedures is to build a favourable sequence of local minima (maxima), thereby determining a set of possible candidates for the global minimum (maximum) more efficiently.
By inserting suitable “tunneling phases” into the method, one can avoid unwanted entrapment in a “bad” local minimum, that is, when the condition is verified for several iterations. For this purpose, the strength of the repellers used in the tunneling phases plays a crucial role. The classical and well-known use of scalar repellers [14, 34] is often unsuitable when the dimension of the problem reaches values of operational interest. A structured repeller matrix, based on the sum of a diagonal matrix and a low-rank one [15], can be constructed to overcome the latter difficulty.
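To make the cycle of local and tunneling phases concrete, the sketch below uses the classical scalar tunneling function of Levy and Montalvo, $T(x) = (f(x) - f^*)/|x - x^*|^{2\lambda}$. It stands in for the scalar repellers of [14, 34] and is not the structured repeller matrix of [15]. In one dimension, the sketch alternates two phases:

- a gradient-descent local phase;
- a tunneling phase that descends $T$ from $x^* \pm \delta$ until it finds a point with $f(x) < f^*$.

The pole strength $\lambda = 1.5$, the perturbation $\delta$, the step cap and the test function are illustrative assumptions.

```python
import math
from typing import Callable, Optional, Tuple


def local_min(f: Callable[[float], float], df: Callable[[float], float], x: float,
              tol: float = 1e-10, max_iter: int = 10_000) -> float:
    """Local phase: 1-D gradient descent with Armijo backtracking."""
    for _ in range(max_iter):
        g = df(x)
        if abs(g) < tol:
            break
        t = 1.0
        while f(x - t * g) > f(x) - 1e-4 * t * g * g and t > 1e-12:
            t *= 0.5
        x -= t * g
    return x


def tunneling_phase(f: Callable[[float], float], x_star: float, f_star: float,
                    pole: float = 1.5, delta: float = 0.1, max_step: float = 0.5,
                    max_iter: int = 200) -> Optional[float]:
    """Search for x with f(x) < f_star by descending the tunneling function
    T(x) = (f(x) - f_star) / |x - x_star|**(2 * pole), whose pole repels from x_star."""
    def T(x: float) -> float:
        dist = abs(x - x_star)
        return math.inf if dist < 1e-12 else (f(x) - f_star) / dist ** (2 * pole)

    h = 1e-6
    for side in (1.0, -1.0):                  # perturb x_star on both sides
        x = x_star + side * delta
        for _ in range(max_iter):
            if f(x) < f_star - 1e-12:
                return x                       # tunneled into a lower basin
            g = (T(x + h) - T(x - h)) / (2 * h)
            step = -math.copysign(min(max_step, abs(g)), g)
            if abs(step) < 1e-12:
                break                          # stationary point of T: give up this side
            t, Tx = 1.0, T(x)
            while T(x + t * step) > Tx + 1e-4 * t * step * g and t > 1e-12:
                t *= 0.5
            x += t * step
    return None


def tunneling_search(f, df, x0: float, max_cycles: int = 10) -> Tuple[float, float]:
    """Alternate local and tunneling phases until tunneling fails."""
    x_star = local_min(f, df, x0)
    for _ in range(max_cycles):
        x_t = tunneling_phase(f, x_star, f(x_star))
        if x_t is None:
            break
        x_star = local_min(f, df, x_t)
    return x_star, f(x_star)


f = lambda x: x ** 4 - 4 * x ** 2 + x      # two wells; the global minimum is near x = -1.47
df = lambda x: 4 * x ** 3 - 8 * x + 1
x_loc = local_min(f, df, 1.0)               # the local phase alone stops in the right well
x_glob, f_glob = tunneling_search(f, df, 1.0)
```

Starting from $x_0 = 1$, the local phase alone stops at the local minimizer near $x \approx 1.35$. One tunneling phase then moves into the other basin, and the next local phase reaches the global minimizer near $x \approx -1.47$. After that, tunneling fails on both sides and the search stops.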
Let be an approximation of a local minimizer for .
A matrix is called a repeller matrix for if , The repeller matrix for any given computed local minimizer can be approximated in the following way (see [15]): with being the maximal scalar repeller [34] that is, with being of the following structure: In this way, the application of a BFGS-type method can be effectively extended to the tunneling phases and hence to the whole global optimization scheme (see [9, 33]).
The structure in (5.33) can be generalized by using the recent Tensor-Train (TT)-cross approximation theory [13].
It is well known, in fact, that a rank-p matrix can be recovered exactly from a cross of p of its columns and p of its rows, provided that their p × p intersection is nonsingular. Therefore, a general matrix can be approximated by a pseudoskeleton approximation, which interpolates it on the chosen cross (see [15] and again [13]). In particular, since a repeller matrix is not arbitrary and possesses some hidden structure, it is fundamental to discover a low-parametric representation of it that can be exploited in the tunneling phases.
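The exact-recovery statement can be verified directly with the skeleton decomposition $A = C \hat{A}^{-1} R$. Here $C$ collects $p$ columns of $A$, $R$ collects $p$ rows, and $\hat{A}$ is their $p \times p$ intersection, assumed nonsingular. The plain-Python sketch below rebuilds a rank-2 matrix from two of its rows and two of its columns. It covers only the exact rank-p case, not the adaptive pseudoskeleton or TT-cross algorithms of [13, 37]. The helper names and data are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def solve(M: Matrix, B: Matrix) -> Matrix:
    """Solve M X = B (M is p x p, nonsingular) by Gauss-Jordan elimination with partial pivoting."""
    p = len(M)
    aug = [[float(v) for v in M[i]] + [float(v) for v in B[i]] for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-14:
            raise ValueError("the intersection submatrix is (numerically) singular")
        aug[col], aug[piv] = aug[piv], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(p):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[p:] for row in aug]


def skeleton(A: Matrix, rows: Sequence[int], cols: Sequence[int]) -> Matrix:
    """Rebuild A from the cross A[:, cols], A[rows, :] as C * inv(A[rows, cols]) * R."""
    C = [[A[i][j] for j in cols] for i in range(len(A))]
    R = [list(A[i]) for i in rows]
    A_hat = [[A[i][j] for j in cols] for i in rows]
    X = solve(A_hat, R)                    # X = inv(A_hat) R, shape p x n
    n_cols = len(A[0])
    return [[sum(c * X[k][j] for k, c in enumerate(crow)) for j in range(n_cols)]
            for crow in C]


u1, u2 = [1, 2, 3, 4, 5], [1, 0, 1, 0, 1]
v1, v2 = [1, 1, 2, 3], [0, 1, 1, -1]
A = [[u1[i] * v1[j] + u2[i] * v2[j] for j in range(4)] for i in range(5)]   # rank 2
A_rec = skeleton(A, rows=[0, 1], cols=[0, 1])
```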
An operational cross approximation method, evaluating large close-to-rank-p matrices in time complexity and by computing elements, was shown in [37].
6. Discrete Optimization
A well-known family of Computer Science methods is represented by the so-called Greedy algorithms. The simplest application of this type of procedure is the standard Knapsack Problem (KP), that is, The Greedy approach is essentially a generalization of the classical Dynamic Programming (DP) methods, which are based on the Bellman Principle. By using the DP computational scheme and assuming integer, problem (6.1) can be reduced to the recursive solution of the following family of problems: where indicate the vectors associated with the first components of , respectively.
Given and , let be the value of the objective function corresponding to the optimal solution of problem (6.2).
The algorithm computes by the recursive formula By (6.3), the optimal value of (6.2) is determined by a generalized discrete Steepest Descent algorithm, since is the component of the gradient of the objective function and represents, in fact, the increase associated with the choice of the object.
Therefore, formula (6.3) is based on a discrete Steepest Descent approach, and the value ensures that the corresponding solution is admissible.
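The Bellman recursion can be sketched by assuming the standard 0-1 form of (6.1): maximize $c^T x$ subject to $a^T x \le b$, $x \in \{0, 1\}^n$, with positive integer weights $a$ and integer capacity $b$. In the sketch, `F[k][y]` plays the role of the optimal value of the subproblem restricted to the first $k$ objects with capacity $y$. The recursion adds object $k$, gaining $c_k$, only when $a_k \le y$, which is the admissibility check mentioned above. The names `c`, `a`, `b`, `F` are illustrative.

```python
from typing import List, Tuple


def knapsack_01(c: List[int], a: List[int], b: int) -> Tuple[int, List[int]]:
    """0-1 knapsack by the Bellman recursion
    F[k][y] = max(F[k-1][y], F[k-1][y - a_k] + c_k), the second term only if a_k <= y."""
    n = len(c)
    F = [[0] * (b + 1) for _ in range(n + 1)]     # F[0][y] = 0: no objects available
    for k in range(1, n + 1):
        ck, ak = c[k - 1], a[k - 1]
        for y in range(b + 1):
            F[k][y] = F[k - 1][y]                  # object k not chosen
            if ak <= y:                            # admissibility: object k fits
                F[k][y] = max(F[k][y], F[k - 1][y - ak] + ck)
    # Recover an optimal x by walking the table backwards.
    x, y = [0] * n, b
    for k in range(n, 0, -1):
        if F[k][y] != F[k - 1][y]:
            x[k - 1] = 1
            y -= a[k - 1]
    return F[n][b], x


value, x = knapsack_01(c=[60, 100, 120], a=[10, 20, 30], b=50)
```

Here the optimum is 220, obtained by choosing the second and third objects, x = [0, 1, 1].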
Integer Nonlinear Programming with Linear Constraints problems (INPLCs) can be transformed into continuous GO problems over the unit hypercube [17]. In order to reduce the difficulties caused by the introduction of undesirable local minimizers, a special class of continuation methods, called smoothing methods, can be introduced [38]. These methods deform the original objective function into a function whose smoothness is controlled by a parameter. Of course, the success of the latter approach depends on the existence of a suitable smoothing function.
Hence, the Gradient-type methods for Global Optimization of Section 5 can also be applied to INPLC problems.
7. Conclusions
In this paper we have tried to demonstrate that Gradient or Gradient-type methods lead both to a general approach to optimization problems and to the construction of efficient algorithms.
In particular, we have shown that the class of problems whose optimal solution can be obtained in a finite number of steps is larger than that of canonical unconstrained Convex Quadratic problems or Convex Quadratic Programming. Moreover, we have pointed out that the classical distinction between Direct Methods and Iterative Methods cannot be considered a fundamental classification of techniques in Numerical Analysis. Many optimization problems can, in fact, be solved in a finite number of steps by suitable efficient hybrid algorithms (see [33]).
Furthermore, if the matrices involved in the computation are well conditioned, the superiority of Iterative Methods over Direct ones, which is a typical feature of the CG algorithm, can be proved in a more general context (see again [33]).
Several heuristic and ad hoc algorithms used in operational environments can, in fact, be considered particular cases of a general Gradient-type approach to the problem. In some cases, surprisingly enough, the convergence of Iterative Methods can be guaranteed only by using a special Line-Search Minimization algorithm (see, e.g., the Fletcher-Reeves method in conjunction with the Armijo-Goldstein-Wolfe procedure [18, Theorem 5.8]).
It is also important to underline that many combinatorial problems, representing a remarkable benchmark set in Computer Science, can be translated in terms of Gradient-type methods in a general framework.
Once again, we stress that the Fixed Point theorem, which is considered a milestone in Numerical Analysis and guarantees the convergence of most of classical Iterative Methods, represents the background for only a subset of Gradient-type methods.
Acknowledgment
This paper was partially supported by PRIN 2008 N. 20083KLJEZ.