Abstract

This paper presents a general and comprehensive description of Optimization Methods and Algorithms from a novel viewpoint. It is shown, in particular, that Direct Methods, Iterative Methods, and Computer Science Algorithms belong to a well-defined general class of both Finite and Infinite Procedures, characterized by suitable descent directions.

1. Introduction

The dichotomy between Computer Science and Numerical Analysis has been for many years the main obstacle to the development of eclectic computational tools. With the latter term the author indicates the capability of implementing algorithms that can be properly adapted to particular environmental requirements and, therefore, optimized for this aim.

Since the formulation of a problem requires the preliminary definition of the variables, and the functions involved in the model, the antithesis between finite and continuous applied mathematics is even stronger from a computational point of view.

In Computer Science, problems are typically defined on discrete sets (graphs, integer variables and so forth) and are characterized by procedures formalized in a finite number of steps.

Direct Methods, which are classical tools of Numerical Analysis, can be considered, in fact, algorithms according to the standard Computer Science definitions. However, the presence of ill-conditioned matrices can seriously affect the practical implementation of Direct Methods. On the other hand, Iterative Methods are based in the majority of cases on the convergence of a sequence approximating the optimal solution of a problem defined in a continuous range. Proper stopping rules on the truncation error reduce the latter computational scheme to a finite process, but, unfortunately, in many cases the theoretical result is affected by a variety of numerical instability problems, thereby preventing a precise forecast of the true number of iterations required to achieve the desired approximation.

Furthermore, Linear Programming, Convex Quadratic Programming, and the unconstrained minimization of a symmetric positive definite bilinear form are continuous problems that can be exactly solved with a finite number of steps. This proves that the distinction between algorithms and infinite iterative procedures is not always characterized by the discrete or the continuous range of the variables involved in the problem.

Most Numerical Analysis methods are based upon the application of the Fixed Point theorem, which assures the convergence of the iterative scheme by means of a contraction of the distance between successive terms of the sequence approximating the optimal solution.

Gradient methods are usually considered in the literature as particular procedures in the frame of optimization techniques, for classical unconstrained or constrained problems.

The main aim of the present paper is to show that Gradient or Gradient-type methods represent the fundamental computational tool to solve a wide set of continuous optimization problems, since they are based on a unitary principle that applies both to finite and to infinite procedures.

Moreover, some classical discrete optimization algorithms can also be viewed in the framework of Gradient-type methods.

Hence, the gradient approach allows one to deal with problems involving variables defined both in a continuous range and in a discrete one, by utilizing finite or infinite procedures in a quite general perspective.

It is essential to underline that ABS methods [1], which represent a remarkable class of algorithms for solving linear and nonlinear equations, are founded on a quite different approach. Roughly speaking, ABS methods construct, in fact, a set of spanning matrices in $R^n$ by performing an adaptive, parameter-dependent optimization associated with the dimension of the subspace. In many ABS methods the choice of the set of optimal parameters is, in fact, crucial in order to identify by a unified approach the structural features of the optimization algorithms. Parameter dependence is not present in Gradient-type methods.

It is important to emphasize that the typical finiteness of Computer Science algorithms is characterized by classes of Gradient-type methods converging to an isolated point of a suitable sequence, generated by the procedure.

Furthermore, the most recent algorithms for Local Optimization can be precisely described by Gradient-type methods in a general framework. As a matter of fact, Interior Point techniques [2, 3] and Barrier Algorithms [4] represent a wide set of Gradient-type methods for NonLinear Programming.

Moreover, a fundamental role in this new approach is played by the properties of suitable Structured matrices, associated to the optimization procedures. Advanced Linear Algebra Techniques are, in fact, essential to construct low-complexity algorithms.

We point out, in particular, the techniques based on Fast Transforms and the corresponding approximations by algebras of simultaneously diagonalized matrices [5-10].

The utilization of Advanced Linear Algebra Techniques in NonLinear Programming opens a new research field, leading in many cases to a significant improvement both in the efficiency and in the practical application of Gradient-type methods for problems of operational interest [11, 12].

In Deterministic Global Optimization, structured matrices allow remarkable results in the frame of Tunneling techniques, by using the classical $\alpha BB$ approach [9].

The novel results on Tensor computation [13] are a promising area of research to improve the efficiency of global optimization algorithms for large-scale problems, and particularly for the effective construction of more general sets of Repeller matrices in the Tunneling phases [14, 15]. This approach can also have important consequences in Nonlinear Integer Optimization (see the pioneering work in [16]), taking into account the more recent results concerning the discretization of the problem by continuation methods (see, e.g., [17]).

Therefore, this survey also has the aim of establishing in-depth general relationships between Local Optimization techniques and Deterministic Global Optimization algorithms in the frame of Advanced Linear Algebra Techniques.

2. The Gradient and the Gradient-Type Approach

Let
$$\min_{\mathbf{x}\in A} f(\mathbf{x}),\quad A\subseteq R^n, \tag{2.1}$$
be an unconstrained problem to be solved.

By assuming $f(\mathbf{x})\in C^1(A)$, the simplest heuristic procedure to deal with (2.1) is to determine the stationary points of $f(\mathbf{x})$ in $A$, that is, the points $\mathbf{x}\in A$ such that $\nabla f(\mathbf{x})=\mathbf{0}$, by the recursive computational scheme
$$\mathbf{x}^{(k+1)}=\mathbf{x}^{(k)}-\lambda_k\nabla f(\mathbf{x}^{(k)}),\quad k=0,1,\dots, \tag{2.2}$$
with $\mathbf{x}^{(0)}\in A$ the initial point of the procedure, $-\nabla f(\mathbf{x}^{(k)})$ the direction of maximum local decrease, and $\lambda_k>0$ the one-dimensional step size.

For all $k$, $\lambda_k$ is computed such that
$$\mathbf{x}^{(k+1)}\in A,\qquad f(\mathbf{x}^{(k+1)})\le f(\mathbf{x}^{(k)}), \tag{2.3}$$
and the sequence $\{\mathbf{x}^{(k)}\}$ satisfies the condition
$$\lim_{k\to\infty}\nabla f(\mathbf{x}^{(k)})=\mathbf{0}. \tag{2.4}$$
The iterative method (2.2) is a particular case of the following general Gradient-type method:
$$\mathbf{x}^{(k+1)}=\mathbf{x}^{(k)}-\lambda_k\mathbf{s}^{(k)},\quad k=0,1,\dots, \tag{2.5}$$
where $\mathbf{s}^{(k)}$ is a descent direction, that is, $\mathbf{s}^{(k)T}\nabla f(\mathbf{x}^{(k)})>0$, or equivalently $\cos\left(\nabla f(\mathbf{x}^{(k)}),\mathbf{s}^{(k)}\right)>0$.
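As an illustration, scheme (2.2) can be sketched in a few lines; the quadratic test function and the constant step size below are illustrative choices, not part of the original formulation:

```python
# Sketch of the Steepest Descent scheme (2.2) on f(x) = x1^2 + 2*x2^2,
# whose minimum is at the origin. The constant step lambda_k = 0.1 is an
# arbitrary choice for the demo.

def grad_f(x):
    # gradient of f(x1, x2) = x1^2 + 2*x2^2
    return [2.0 * x[0], 4.0 * x[1]]

def steepest_descent(x0, step=0.1, tol=1e-8, max_iter=10_000):
    x = list(x0)
    for k in range(max_iter):
        g = grad_f(x)
        if sum(gi * gi for gi in g) ** 0.5 < tol:   # stopping rule on ||grad f||
            return x, k
        x = [xi - step * gi for xi, gi in zip(x, g)]  # x_{k+1} = x_k - lambda_k grad f
    return x, max_iter

x_star, iters = steepest_descent([3.0, -2.0])
print(x_star, iters)   # x_star is close to [0, 0]
```

The stopping rule on $\|\nabla f\|$ turns the infinite iteration (2.4) into a finite process, exactly in the sense discussed in the Introduction.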

The following theorem generalizes a well-known result shown in [18].

Theorem 2.1 (see [7]). If $\mathbf{s}^{(k)}$ is a descent direction in $\mathbf{x}^{(k)}$ for a function $f(\mathbf{x})$, then
$$\mathbf{s}^{(k)}=A_k^{-1}\nabla f(\mathbf{x}^{(k)}), \tag{2.6}$$
with $A_k$ being a symmetric positive definite (spd) matrix.
Moreover, the following property holds:
$$\cos\left(\nabla f(\mathbf{x}^{(k)}),\mathbf{s}^{(k)}\right)\ge c>0\iff \mathrm{cond}(A_k)\le M,\ \forall k. \tag{2.7}$$

Remark 2.2. Particular cases of descent directions can be obtained by setting $A_k=I$ (Steepest Descent method), $A_k=\nabla^2 f(\mathbf{x}^{(k)})$ (Newton-Raphson method), $A_k\approx\nabla^2 f(\mathbf{x}^{(k)})$ (general quasi-Newton and classical BFGS methods), or $A_k\approx$ structured $\nabla^2 f(\mathbf{x}^{(k)})$ (low-complexity BFGS-type methods) (see [5, 6, 18, 19]).
It is useful to underline that the general theory of admissible directions for unconstrained optimization [20] is also a special case of (2.5). By setting, in fact, for a given $\gamma>0$,
$$D\left(\gamma,\mathbf{x}^{(k)}\right)=\left\{\mathbf{s}^{(k)}\in R^n:\ \left\|\mathbf{s}^{(k)}\right\|=1,\ \nabla f(\mathbf{x}^{(k)})^T\mathbf{s}^{(k)}\ge\gamma\left\|\nabla f(\mathbf{x}^{(k)})\right\|\right\}, \tag{2.8}$$
one can obtain other Gradient-type methods described by (2.5).

The iterative scheme described by Algorithm 1 contains several ingredients of a general Gradient-type method (see [20]).

(a) Given $\{\gamma_k\},\{\sigma_k\}$, $k=0,1,\dots$ such that $\inf\{\gamma_k\}>0$, $\inf\{\sigma_k\}>0$.
Let $\mathbf{x}^{(0)}\in R^n$ be a starting point.
For a given vector $\mathbf{s}^{(k)}\in D\left(\gamma_k,\mathbf{x}^{(k)}\right)$, set
$\mathbf{x}^{(k+1)}=\mathbf{x}^{(k)}-\lambda_k\mathbf{s}^{(k)}$, with $\lambda_k\in\left(0,\sigma_k\left\|\nabla f(\mathbf{x}^{(k)})\right\|\right)$ such that
$f(\mathbf{x}^{(k+1)})\le\min_\mu\left\{f\left(\mathbf{x}^{(k)}-\mu\,\mathbf{s}^{(k)}\right):\ 0<\mu\le\sigma_k\left\|\nabla f(\mathbf{x}^{(k)})\right\|\right\}$.
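A minimal sketch of Algorithm 1, assuming the illustrative choices $\gamma_k=1$ and $\sigma_k=1$ for all $k$ (so that $\mathbf{s}^{(k)}=\nabla f/\|\nabla f\|$ belongs to $D(\gamma_k,\mathbf{x}^{(k)})$), and with the exact minimization over $\mu$ replaced by crude sampling, a simplification not part of the original scheme:

```python
# Toy realization of Algorithm 1 with gamma_k = sigma_k = 1: the unit
# direction s_k = grad f / ||grad f|| lies in D(1, x_k), and mu is chosen
# by sampling over the admissible interval (0, sigma_k * ||grad f||].

def norm(v):
    return sum(vi * vi for vi in v) ** 0.5

def f(x):
    return (x[0] - 1.0) ** 2 + 4.0 * x[1] ** 2   # minimizer (1, 0)

def grad_f(x):
    return [2.0 * (x[0] - 1.0), 8.0 * x[1]]

def algorithm1(x0, tol=1e-6, max_iter=5000, samples=50):
    x = list(x0)
    for k in range(max_iter):
        g = grad_f(x)
        ng = norm(g)
        if ng < tol:
            return x, k
        s = [gi / ng for gi in g]          # unit descent direction
        # approximate min over mu in (0, sigma_k * ||grad f||]
        _, mu = min((f([xi - m * si for xi, si in zip(x, s)]), m)
                    for m in (ng * (j + 1) / samples for j in range(samples)))
        x = [xi - mu * si for xi, si in zip(x, s)]
    return x, max_iter

x_star, iters = algorithm1([4.0, 3.0])
print(x_star)   # approaches the minimizer (1, 0)
```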

The convergence of Algorithm 1 is guaranteed by the following result (see again [20]).

Theorem 2.3. Let $f(\mathbf{x})\in C^1(A)$, $A$ open $\subseteq R^n$. Let $K=\{\mathbf{x}\in R^n:\ f(\mathbf{x})\le f(\mathbf{x}^{(0)})\}\subset A$, with $\mathbf{x}^{(0)}\in K$. Then, for every sequence $\{\mathbf{x}^{(k)}\}$ evaluated by Algorithm 1: (i) $\mathbf{x}^{(k)}\in K$, $\forall k$; (ii) if $\mathbf{x}^{(k+1)}\ne\mathbf{x}^{(k)}$, $\forall k$, then $\{\mathbf{x}^{(k)}\}$ has at least an Extremal Point (EP) $\overline{\mathbf{x}}$; (iii) every EP $\overline{\mathbf{x}}$ of $\{\mathbf{x}^{(k)}\}$ is a stationary point, that is, $\nabla f(\overline{\mathbf{x}})=\mathbf{0}$.

Remark 2.4. Notice that Theorem 2.3 can also be applied in the case of classical Computer Science algorithms. As a matter of fact, if condition (ii) is not verified, then, by definition, $\exists\,\tilde k:\ \mathbf{x}^{(\tilde k+1)}=\mathbf{x}^{(\tilde k)}$, implying $\nabla f(\mathbf{x}^{(\tilde k)})=\mathbf{0}$, that is, the convergence to a stationary point in a finite number of steps $\tilde k$. Moreover, the convergence to an isolated point $\hat{\mathbf{x}}$ of the sequence $\{\mathbf{x}^{(k)}\}$ can be proven ab absurdo by showing that
$$\text{if }\hat{\mathbf{x}}\text{ is not an EP, then}\quad f(\mathbf{x}^{(k)})-f(\mathbf{x}^{(k+1)})>c_0,\quad \forall k>k_0,\ c_0>0. \tag{2.9}$$
We will see in Sections 3 and 4 that the convergence in a finite number of steps of a given iterative procedure can be verified in this way both for unconstrained problems and for constrained ones.

3. Local Unconstrained Optimization

Let $C$ and $\mathbf{b}$ be an spd matrix of order $n$ and an $n$-dimensional vector, respectively.

It is well known that the problem
$$\min\ \tfrac12\,\mathbf{x}^T C\mathbf{x}-\mathbf{b}^T\mathbf{x},\quad \mathbf{x}\in R^n, \tag{3.1}$$
can be exactly solved in at most $n$ steps by the Conjugate Gradient (CG) method [21], which represents a direct method to solve (3.1). The quadratic form associated to an spd matrix $C$ is, in fact, a convex function.

However, it can also be proved that the application of the procedure defined in (2.2), that is, the Steepest Descent method, always requires an infinite number of iterations, apart from the trivial case $\mathbf{x}^{(0)}=\mathbf{x}^*$. The latter result shows that the existence of a finite procedure to solve (3.1) does not depend only on the role played by convexity but is also the consequence of a sort of optimal matching between the problem and the corresponding algorithm, which is in this case the CG method. On the other hand, the latter method can also be interpreted as an iterative method in the family of the following fixed point procedures:
$$\mathbf{x}^{(k+1)}=\left(I-\frac{C}{r}\right)\mathbf{x}^{(k)}+\frac{\mathbf{b}}{r}, \tag{3.2}$$
with $r$ being a suitable scalar parameter. By setting, in fact,
$$H=I-\frac{C}{r},\qquad D=\begin{pmatrix}\frac1r&0&\cdots&0\\ 0&\frac1r&\ddots&\vdots\\ \vdots&\ddots&\ddots&0\\ 0&\cdots&0&\frac1r\end{pmatrix}, \tag{3.3}$$
one can obtain the classical iterative scheme
$$\mathbf{x}^{(k+1)}=H\mathbf{x}^{(k)}+D\mathbf{b}. \tag{3.4}$$
Since $\|H\|_s=1-1/\mathrm{cond}(C)$, (3.4) is convergent if the original matrix $C$ is well conditioned.

Moreover, if $\hat{\mathbf{x}}$ is the optimal solution of (3.1), the truncation error of the method is
$$\left\|\mathbf{x}^{(k)}-\hat{\mathbf{x}}\right\|_2\le\left(1-\frac{1}{\mathrm{cond}(C)}\right)^k\left\|\mathbf{x}^{(0)}-\hat{\mathbf{x}}\right\|_2. \tag{3.5}$$
In the case of the CG method, one can prove the inequality
$$\left\|\mathbf{x}^{(k)}-\hat{\mathbf{x}}\right\|_2\le 2\left(\frac{\sqrt{\mathrm{cond}(C)}-1}{\sqrt{\mathrm{cond}(C)}+1}\right)^k\left\|\mathbf{x}^{(0)}-\hat{\mathbf{x}}\right\|_2. \tag{3.6}$$
Inequality (3.6) shows that, if the dimension $n$ is huge and the matrix $C$ is well conditioned, from a computational point of view it is more convenient to implement the CG method as a classical iterative procedure with a stopping rule based on the above inequality.
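The finite termination of the CG method for (3.1) can be checked directly; the small spd system below is an arbitrary example, and for $n=3$ the residual vanishes (up to roundoff) in at most 3 steps:

```python
# Pure-Python sketch of the CG method for (3.1): for an n x n spd matrix C
# the exact solution of C x = b is reached in at most n steps.

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def conjugate_gradient(C, b, x0):
    x = list(x0)
    r = [bi - ci for bi, ci in zip(b, matvec(C, x))]   # residual b - Cx
    p = list(r)
    for k in range(len(b)):
        if dot(r, r) ** 0.5 < 1e-12:
            return x, k
        Cp = matvec(C, p)
        alpha = dot(r, r) / dot(p, Cp)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r_new = [ri - alpha * cpi for ri, cpi in zip(r, Cp)]
        beta = dot(r_new, r_new) / dot(r, r)
        p = [rni + beta * pi for rni, pi in zip(r_new, p)]
        r = r_new
    return x, len(b)

C = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]   # spd matrix
b = [1.0, 2.0, 3.0]
x, steps = conjugate_gradient(C, b, [0.0, 0.0, 0.0])
print(x, steps)
```

For huge, well-conditioned systems one would instead stop as soon as the bound (3.6) guarantees the desired accuracy, as noted above.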

So, once again, the distinction between Numerical Analysis direct methods (or Computer Science algorithms) and infinite procedures cannot be considered as the fundamental classification rule in computational mathematics.

In the case of the Steepest Descent method, the truncation error is
$$\left\|\mathbf{x}^{(k)}-\hat{\mathbf{x}}\right\|_2\le 2\left(\frac{\mathrm{cond}(C)-1}{\mathrm{cond}(C)+1}\right)^k\left\|\mathbf{x}^{(0)}-\hat{\mathbf{x}}\right\|_2. \tag{3.7}$$
The difference between (3.6) and (3.7) clearly indicates the major efficiency of the CG method.

In [22] the finite version of the CG method was extended to a family of nonquadratic functions, including the following important sets:
$$F(\mathbf{x})=\frac{\mathbf{x}^T C\mathbf{x}}{\left(\mathbf{c}^T\mathbf{x}\right)^2},\quad \mathbf{x}\in X, \tag{3.8}$$
$$G(\mathbf{x})=\frac{\mathbf{x}^T C\mathbf{x}}{\left(\mathbf{c}^T\mathbf{x}\right)^k},\quad k\ \text{integer},\ \mathbf{x}\in X, \tag{3.9}$$
where $X=\{\mathbf{x}\in R^n:\ \mathbf{c}^T\mathbf{x}>0\}$.

According to the classical definition, the function $F$ indicated in (3.8) is called conic. If $k=2$, then $G(\mathbf{x})\equiv F(\mathbf{x})$.

Hence, $G$ represents a class of nonquadratic functions whose optimal solution can be found in a finite number of steps if the matrix $C$ is spd.

As a matter of fact, the following result holds.

Theorem 3.1 (see [22], Theorem 3.1, Lemmas 3.2 and 5.1). Let $G(\mathbf{x})$ be defined as in (3.9). Then the minimum problem
$$\min\ G(\mathbf{x}),\quad \mathbf{x}\in X, \tag{3.10}$$
can be solved in at most $n$ steps.

Let us now consider some generalizations of convexity, which play an important role in global optimization (see [7]).

Let 𝐬(𝑘) be a descent direction in 𝐱(𝑘) for a function 𝑓(𝐱). The importance of the following definitions will be shown in the next results of this paragraph.

Definition 3.2. A function $f(\mathbf{x})\in C^1(R^n)$ is called algorithmically convex if, for all $\mathbf{x}^{(k)},\mathbf{x}^{(k+1)}$ evaluated by an algorithm of type (2.5), one has
$$\left(\mathbf{s}^{(k+1)}-\mathbf{s}^{(k)}\right)^T\left(\mathbf{x}^{(k+1)}-\mathbf{x}^{(k)}\right)\ge 0. \tag{3.11}$$

Definition 3.3. A function $f(\mathbf{x})\in C^1(R^n)$ is called weakly convex if, for all $\mathbf{x}^{(k)},\mathbf{x}^{(k+1)}$ evaluated by an algorithm (2.5), the following inequality holds:
$$\frac{\left\|\nabla f(\mathbf{x}^{(k+1)})-\nabla f(\mathbf{x}^{(k)})\right\|^2}{\left(\nabla f(\mathbf{x}^{(k+1)})-\nabla f(\mathbf{x}^{(k)})\right)^T\left(\mathbf{x}^{(k+1)}-\mathbf{x}^{(k)}\right)}\le M. \tag{3.12}$$

Definition 3.4. Let $\mathbf{s}^{(k)}=A_k^{-1}\nabla f(\mathbf{x}^{(k)})$, $\forall k$, be descent directions of an algorithm of type (2.5) applied to problem (2.1). Then the method is called secant if the matrix $A_k$ solves the secant equation
$$A_k\left(\mathbf{x}^{(k)}-\mathbf{x}^{(k-1)}\right)=\nabla f(\mathbf{x}^{(k)})-\nabla f(\mathbf{x}^{(k-1)}). \tag{3.13}$$

Definition 3.2 is clearly a generalization of convexity. As a matter of fact, if $\mathbf{s}^{(k)}=\nabla f(\mathbf{x}^{(k)})$, then $f(\mathbf{x})\in C^1(A)$, $A\subseteq R^n$, is convex if and only if (3.11) is verified for all $\mathbf{x}^{(k)},\mathbf{x}^{(k+1)}\in A$ (see [23]).

Definition 3.3 is also a generalization of convexity. In [24], in fact, it is proved that if $f(\mathbf{x})\in C^1(A)$, $A\subseteq R^n$, is convex, then (3.12) is satisfied for all $\mathbf{x}^{(k)},\mathbf{x}^{(k+1)}\in A$. So (3.12) is a necessary, but not sufficient, condition for a function $f$ to be convex.

Definition 3.4 is an $n$-dimensional generalization of the classical secant iterative formula to compute the zeroes of the derivative of a function $f(x)\in C^1(R^1)$, that is,
$$x_{k+1}=\frac{f'(x_k)\,x_{k-1}-f'(x_{k-1})\,x_k}{f'(x_k)-f'(x_{k-1})}. \tag{3.14}$$
Observe, in fact, that (3.14) can be rewritten as
$$x_{k+1}=x_k-\frac{f'(x_k)}{a_k},\qquad a_k\left(x_k-x_{k-1}\right)=f'(x_k)-f'(x_{k-1}). \tag{3.15}$$
Hence, the expression of $a_k$ is the 1-dimensional version of (3.13).
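The 1-dimensional update (3.14)-(3.15) can be sketched as follows; the nonquadratic test function is an arbitrary illustrative choice:

```python
# The secant update (3.15), used to locate a zero of f'(x) for
# f(x) = x^4 - 3x^2 + x, i.e. a stationary point of f.

def fprime(x):
    return 4.0 * x ** 3 - 6.0 * x + 1.0

def secant(x0, x1, tol=1e-12, max_iter=100):
    xk_1, xk = x0, x1
    for _ in range(max_iter):
        # a_k is the secant slope, the 1-D analogue of the matrix in (3.13)
        a_k = (fprime(xk) - fprime(xk_1)) / (xk - xk_1)
        xk_1, xk = xk, xk - fprime(xk) / a_k     # update (3.15)
        if abs(fprime(xk)) < tol:
            break
    return xk

root = secant(1.0, 1.2)
print(root, fprime(root))   # fprime(root) is ~0
```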

The following result is proved in [7].

Theorem 3.5 (see also [18]). Let $\mathbf{s}^{(k)}=A_k^{-1}\nabla f(\mathbf{x}^{(k)})$ be descent directions of a secant method, that is, satisfying (3.13), applied to problem (2.1). Moreover, let conditions (2.7) and (3.12) be verified. Then there exists a subsequence $\{\mathbf{x}^{(k_i)}\}$ such that
$$\lim_{i\to\infty}\nabla f(\mathbf{x}^{(k_i)})=\mathbf{0}. \tag{3.16}$$

Remark 3.6. Theorem 3.5 shows that a global convergence for a quasi-Newton secant method applied to problem (2.1) can be obtained if the function 𝑓(𝐱) is weakly convex and the matrices 𝐴𝑘 approximating 2𝑓(𝐱(𝑘)) are well conditioned.

Remark 3.7. By utilizing the Armijo-Goldstein-Wolfe method [18] and setting $\mathbf{s}^{(k)}=\nabla f(\mathbf{x}^{(k)})$, the step $\lambda_k$ in (2.5) is such that, for all $k$,
$$\left(\nabla f(\mathbf{x}^{(k+1)})-\nabla f(\mathbf{x}^{(k)})\right)^T\left(\mathbf{x}^{(k+1)}-\mathbf{x}^{(k)}\right)>0,\qquad f(\mathbf{x}^{(k+1)})<f(\mathbf{x}^{(k)}). \tag{3.17}$$
Hence, by Definition 3.2, in this case the function $f(\mathbf{x})$ is also algorithmically convex. For general descent directions $\mathbf{s}^{(k)}$, evaluated by a quasi-Newton secant method, inequality (3.11) is not always satisfied.

4. Local Constrained Optimization

Quadratic Programming (QP) is defined in the following way:
$$\min\ \mathbf{x}^T C\mathbf{x}+\mathbf{c}^T\mathbf{x},\qquad A\mathbf{x}=\mathbf{b},\ \mathbf{x}\ge\mathbf{0}, \tag{4.1}$$
with $C$ being a symmetric semidefinite positive (ssdp) matrix of order $n$ and $A$ a matrix with $m$ rows and $n$ columns.

Remark 4.1. Let $P=\{\mathbf{x}\in R^n:\ A\mathbf{x}=\mathbf{b},\ \mathbf{x}\ge\mathbf{0}\}$. The optimal solution of (4.1) can be located at any point of $P$. Hence, (4.1) is a continuous problem which cannot be immediately reduced to a finite problem, as in the case $F(\mathbf{x})=\mathbf{c}^T\mathbf{x}$, that is, Linear Programming (LP).
Let us consider, for instance, the following problems:
$$\min\ x_1^2-3x_2,\qquad 2x_1-x_2\ge 4,\ x_1-2x_2\le 16,\ -2x_1+4x_2\le 8,\ 7x_1+8x_2\ge 35,\ x_1\ge 0,\ x_2\ge 0, \tag{4.2}$$
$$\min\ x_1^2+4x_2^2-4x_1-24x_2+40,\qquad 7x_1-6x_2+42\ge 0,\ -5x_1+3x_2+10\ge 0,\ x_1\ge 0,\ x_2\ge 0. \tag{4.3}$$

The optimal solution of (4.2) is the point $(3,2)$, which is on the boundary of $P$ but is not a vertex. On the other hand, problem (4.3) has its optimal solution at the inner point $(2,3)$. However, QP can be solved in general in a finite number of steps by means of Frank-Wolfe's algorithm [25]. So, QP can be considered as a finite continuous constrained optimization problem.

The following question arises: does QP mark the boundary separating finite continuous constrained optimization problems from infinite ones? In other words, do there exist more general nonlinear constrained optimization problems that can be solved in a finite number of iterations? Since, in the unconstrained case, we have shown in the previous paragraph that there exist nonquadratic problems that can be exactly solved in a finite number of iterations by utilizing the CG method, the answer is expected to be positive.

Given a convex function $f(\mathbf{x})\in C^1(S)$, $S$ convex $\subseteq R^n$, Convex Programming with Linear Constraints (CPLC) is defined as
$$\min\ f(\mathbf{x}),\qquad A\mathbf{x}=\mathbf{b},\ \mathbf{x}\ge\mathbf{0}. \tag{4.4}$$
Problem (4.4) can be solved by the Reduced Gradient (RG) algorithm or by the Gradient Projection (GP) method [23, 26, 27].

Assuming that $A$ has maximum rank and taking into account Remark 2.4, one can introduce the following.

Definition 4.2. Let $f(\mathbf{x})\in C^1(S)$, $S$ convex $\subseteq R^n$, be a convex function.
Let $P=\{\mathbf{x}\in R^n:\ A\mathbf{x}=\mathbf{b},\ \mathbf{x}\ge\mathbf{0}\}$ be a nonempty polyhedron. The corresponding CPLC problem (4.4) is a finite continuous constrained optimization problem if and only if there exist a convergent Gradient-type method (2.5) and a positive real number $c_0$ such that, if (2.5) were to require an infinite number of steps, then
$$\mathbf{x}^{(k)}\in P,\ \forall k,\qquad f(\mathbf{x}^{(k)})-f(\mathbf{x}^{(k+1)})\ge c_0,\quad \forall k>k_0,\ c_0>0. \tag{4.5}$$
Equation (4.5) clearly implies that $\exists\,\overline{k}:\ f(\mathbf{x}^{(\overline{k})})=\min_{\mathbf{x}\in P}f(\mathbf{x})$.

The importance of Definition 4.2 can be pointed out by the next result, showing the relationship between (4.4) and a particular linear optimization problem.

Theorem 4.3. Let $\mathbf{x}^{(k)}$ be an admissible solution of (4.4). Let $\mathbf{s}^{(k)}$ be a descent direction in $\mathbf{x}^{(k)}$ for the function $f(\mathbf{x})$. Then $\mathbf{s}^{(k)}$ is an admissible descent direction for (4.4) if $A\mathbf{s}^{(k)}=\mathbf{0}$.
Moreover, for any fixed $\hat{\mathbf{x}}^{(k)}$, the optimal solution $\mathbf{s}^*$ of the problem
$$\min\ \nabla f\left(\hat{\mathbf{x}}^{(k)}\right)^T\mathbf{s},\qquad A\mathbf{s}=\mathbf{0},\ \|\mathbf{s}\|=1, \tag{4.6}$$
is given by
$$\mathbf{s}^*=-\frac{\left[I-A^T\left(AA^T\right)^{-1}A\right]\nabla f\left(\hat{\mathbf{x}}^{(k)}\right)}{\left\|\left[I-A^T\left(AA^T\right)^{-1}A\right]\nabla f\left(\hat{\mathbf{x}}^{(k)}\right)\right\|}. \tag{4.7}$$
By setting $\mathbf{c}=\nabla f(\hat{\mathbf{x}}^{(k)})$ and $\mathbf{x}=\mathbf{s}$, it was proven in [28] that (4.6) is equivalent to a general LP problem, that is,
$$\min\ \mathbf{c}^T\mathbf{x},\qquad A\mathbf{x}=\mathbf{b},\ \mathbf{x}\ge\mathbf{0}. \tag{4.8}$$
Furthermore, if $T=\{\mathbf{x}\in R^n_+:\ A\mathbf{x}=\mathbf{0},\ \|\mathbf{x}\|=1\}$, the following result holds (see [29]).
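For a single equality constraint ($A$ with one row $\mathbf{a}^T$), the projector $I-A^T(AA^T)^{-1}A$ appearing in (4.7) reduces to $I-\mathbf{a}\mathbf{a}^T/(\mathbf{a}^T\mathbf{a})$. The sketch below, with made-up data, verifies the defining properties of the resulting normalized projected gradient; (4.7) gives this vector up to sign, and here it is oriented so that $\mathbf{s}^T\nabla f>0$, following the descent convention of Section 2:

```python
# Projected gradient direction for one constraint row a^T: project grad f
# onto the null space of a^T with P = I - a a^T / (a^T a), then normalize.

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def projected_direction(a, g):
    coef = dot(a, g) / dot(a, a)
    pg = [gi - coef * ai for gi, ai in zip(g, a)]   # P g
    npg = dot(pg, pg) ** 0.5
    return [v / npg for v in pg]

a = [1.0, 1.0, 1.0]          # constraint row of A
g = [3.0, -1.0, 2.0]         # gradient at the current admissible point
s = projected_direction(a, g)
print(dot(a, s))             # ~0: s satisfies A s = 0
print(dot(g, s) > 0)         # True: s^T grad f > 0
```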

Theorem 4.4. Given a suitable integer $L$ and the function
$$g(\mathbf{x})=\sum_{j=1}^n\ln\frac{\mathbf{c}^T\mathbf{x}}{x_j}=n\ln\left(\mathbf{c}^T\mathbf{x}\right)-\sum_{j=1}^n\ln x_j, \tag{4.9}$$
then (4.6), and hence (4.8), are equivalent to finding a point $\mathbf{x}^*$ such that
$$\mathbf{x}^*\in T,\qquad g(\mathbf{x}^*)<-2nL. \tag{4.10}$$
Moreover, it is possible to determine a real number $c_0$ and a sequence $\{\mathbf{x}^{(k)}\}\subset T$ by a GP algorithm with a suitable scaling procedure (see again [29]) such that
$$g(\mathbf{x}^{(k+1)})<g(\mathbf{x}^{(k)})-c_0. \tag{4.11}$$

By Theorem 4.4 and Definition 4.2 it follows that there exists a Gradient-type method (2.5) solving LP in a finite number of steps. Hence, LP is a finite continuous constrained optimization problem. It is important to underline that the latter result is not a consequence of the intrinsic finiteness of the set of possible optimal solutions (the vertices of a polyhedron), as in the classical simplex algorithm.

Given the convex functions $f(\mathbf{x}),g_1(\mathbf{x}),g_2(\mathbf{x}),\dots,g_m(\mathbf{x})\in C^1(S)$, $S$ convex $\subseteq R^n$, let us now consider the general Convex Programming (CP) problem:
$$\min\ f(\mathbf{x}),\qquad g_i(\mathbf{x})\le 0,\ i=1,2,\dots,m,\ \mathbf{x}\ge\mathbf{0}. \tag{4.12}$$
The following property is well known [23, 26].

Definition 4.5. Letting $\hat{\mathbf{x}}\ge\mathbf{0}$ and $I=\{i:\ g_i(\hat{\mathbf{x}})=0\}$, the constraints of (4.12) are qualified if one of the following conditions is satisfied:
$$\exists\,\mathbf{x}\ge\mathbf{0}:\ g_i(\mathbf{x})<0,\quad i=1,2,\dots,m, \tag{4.13}$$
$$g_i(\hat{\mathbf{x}})\ \text{is locally concave at}\ \hat{\mathbf{x}},\quad \forall i\in I,\ \forall\hat{\mathbf{x}}. \tag{4.14}$$
If $g_i(\mathbf{x})=\mathbf{c}_i^T\mathbf{x}$, then (4.14) is trivially satisfied $\forall\hat{\mathbf{x}}$, $\forall i\in I$.

So, from Definition 4.5 we deduce that the constraints of the CPLC problem (4.4) are always qualified. Assuming in (4.4) that $A$ has maximum rank, we clearly obtain a condition equivalent to (4.13).

Definition 4.6. A set $C\subseteq R^n$ is called a convex cone if
$$\mathbf{x}\in C\ \Longrightarrow\ \lambda\mathbf{x}\in C,\ \forall\lambda>0;\qquad \mathbf{x}^{(1)},\mathbf{x}^{(2)}\in C,\ 0\le\lambda_1,\lambda_2\le 1\ \Longrightarrow\ \lambda_1\mathbf{x}^{(1)}+\lambda_2\mathbf{x}^{(2)}\in C. \tag{4.15}$$
The following theorem was proved in [30] in a general Hilbert space (see Theorem 2.3).

Theorem 4.7. Let $S_1,S_2\subseteq R^n$ be closed convex cones, and let $S_1^o$ denote the interior of $S_1$. Assume that $S_1^o\ne\emptyset$.
Then the corresponding conic feasibility problem
$$\text{find}\ \mathbf{x}\in S_1^o\cap S_2 \tag{4.16}$$
can be solved in a finite number of steps.

The technique utilized to prove Theorem 4.7 is based upon the so-called Method of Alternative Projections (MAP) (see [31]).

Theorem 4.7 was extended in [30] (see Proposition 2.1) by assuming $S_1$ and $S_2$ to be closed convex sets, thereby proving that a convex feasibility problem is equivalent to a conic feasibility problem. However, the open question remains of how to express explicit formulas for the projection operators to convert the algorithm from $S_1$ and $S_2$ to the conified closed sets $\mathrm{con}(S_1)$ and $\mathrm{con}(S_2)$ in the case of nonlinear and nonquadratic problems. The Linear Matrix Inequality (LMI) feasibility problem was, in fact, efficiently solved in the literature (see [32]).

Remark 4.8. Theorem 4.7 can be applied to the CPLC problem (4.4), by assuming
$$S_1=\{\mathbf{x}\in R^n:\ A\mathbf{x}=\mathbf{b},\ \mathbf{x}\ge\mathbf{0}\},\qquad S_2^{(k)}=\left\{\mathbf{x}\in R^n:\ f(\mathbf{x})\le t^{(k)}\right\},\quad t^{(k)}\in R,\ k=1,2,\dots,k_0. \tag{4.17}$$

Hence, explicit formulas for the projection operators for suitable classes of nonlinear convex feasibility problems, in terms of the corresponding conified sets, might allow one to solve the CPLC problem (4.4) in the nonquadratic case in a finite number of steps. By utilizing Theorem 3.1, we can prove, in fact, the following important theorem.

Theorem 4.9 (see [33]). Consider the particular CPLC problem
$$\min\ \frac{\mathbf{x}^T C\mathbf{x}}{\left(\mathbf{c}^T\mathbf{x}\right)^2},\quad C\ \text{spd},\qquad A\mathbf{x}=\mathbf{b},\ \mathbf{c}^T\mathbf{x}\le 0,\ \mathbf{x}\ge\mathbf{0}. \tag{4.18}$$
Assume that the optimal solution $\mathbf{x}^*$ of problem (4.18) is such that $\mathbf{c}^T\mathbf{x}^*<0$. Then (4.18) can be converted into a convex feasibility problem by utilizing a proper modification of the Alternative Projection method, and the latter algorithm converges to the optimal solution in a finite number of steps.

Remark 4.10. Given the convex set of feasible solutions
$$S_1=\left\{\mathbf{x}\in R^n:\ A\mathbf{x}=\mathbf{b},\ \mathbf{c}^T\mathbf{x}\le 0,\ \mathbf{x}\ge\mathbf{0}\right\}, \tag{4.19}$$
the proof of Theorem 4.9 is essentially based upon the following computational ingredients:

(a) by Theorem 4.7, one can convert the closed convex set defined in (4.19) into a closed convex cone; (b) by Theorem 3.1, the extended version of the CG method and a suitable projection algorithm can be applied to problem (4.18), thereby obtaining convergence in a finite number of steps.

5. Global Optimization

One can prove the following global convergence theorem [7].

Theorem 5.1. Consider Problem (2.1), where 𝑓(𝐱)𝐶2(𝑅𝑛).
Let 𝑓min be the value of the optimal solution. Assume that 𝜖𝑎+,𝜖𝑠+𝐱𝑓(𝑘)>𝜖𝑠𝑓𝐱exceptfor𝑘(𝑘)𝑓min<𝜖𝑎.(5.1) If in an iterative scheme of BFGS-type, 𝐱(𝑘+1)=𝐱(𝑘)𝜆𝑘𝐵(𝑘)1𝐱𝑓(𝑘),𝐵(𝑘)𝐵=𝜑(𝑘1),,𝑘,(5.2) the following conditions are satisfied: 𝐱𝑓(𝑘+1)𝐱𝑓(𝑘)2𝐱𝑓(𝑘+1)𝐱𝑓(k)𝑇𝜆𝑘𝐝(𝑘)=𝐲𝑘2𝐲𝑇𝑘𝐬𝑘𝑀,(5.3)𝐵cond(𝑘)𝑁.(5.4) Then 𝜖𝑎+,𝑘𝑘>𝑘𝑓𝐱(𝑘)𝑓min<𝜖𝑎.(5.5) Theorem 5.1 points out as follows three conditions for a global optimization BFGS-type method.

Condition (5.1) assumes an optimal matching between the BFGS-type algorithm and the function $f$ [34]. Condition (5.3) is equivalent to (3.12), that is, $f(\mathbf{x})$ is weakly convex (see [24]).

Condition (5.4) can be easily satisfied, by modifying the matrices 𝐵(𝑘) by a restarting procedure, because every descent direction is associated to an spd matrix (see Theorem 2.1).

Let us now consider the classical “box-constrained” problem:
$$\min\ f(\mathbf{x}),\qquad \mathbf{x_L}\le\mathbf{x}\le\mathbf{x_U}. \tag{5.6}$$
Let $\mathbf{x}^L_{c(m)}\le\mathbf{x}_{c(m)}\le\mathbf{x}^U_{c(m)}$ denote the current box at iteration $m$.

Set
$$\alpha\left(\mathbf{x}_{c(m)}\right)=\max\left\{0,\ -\frac12\min_i\lambda_i\left(\nabla^2 f\left(\mathbf{x}_{c(m)}\right)\right)\right\}, \tag{5.7}$$
$$L_{c(m)}\left(\mathbf{x}_{c(m)}\right)=f\left(\mathbf{x}_{c(m)}\right)+\alpha\left(\mathbf{x}_{c(m)}\right)\left(\mathbf{x}^L_{c(m)}-\mathbf{x}_{c(m)}\right)^T\left(\mathbf{x}^U_{c(m)}-\mathbf{x}_{c(m)}\right). \tag{5.8}$$
The following global convergence theorem holds (see [35, 36]).
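The ingredients (5.7)-(5.8) can be checked on a small example; the nonconvex quadratic below, with constant Hessian, is an illustrative choice (for it, $\alpha$ does not depend on the evaluation point):

```python
# alpha-BB ingredients on f(x) = x1^2 - 4 x1 x2 + x2^2, whose constant
# Hessian [[2, -4], [-4, 2]] has eigenvalues -2 and 6, so (5.7) gives
# alpha = max{0, -1/2 * (-2)} = 1; L in (5.8) underestimates f on the box.

def f(x):
    return x[0] ** 2 - 4.0 * x[0] * x[1] + x[1] ** 2

def alpha_from_hessian(h11, h12, h22):
    # eigenvalues of a symmetric 2x2 matrix, cf. (5.7)
    mean, det = (h11 + h22) / 2.0, h11 * h22 - h12 * h12
    lam_min = mean - (mean * mean - det) ** 0.5
    return max(0.0, -0.5 * lam_min)

def underestimator(x, xL, xU, alpha):
    # L(x) = f(x) + alpha * sum_i (x_i^L - x_i)(x_i^U - x_i), cf. (5.8)
    return f(x) + alpha * sum((l - xi) * (u - xi)
                              for xi, l, u in zip(x, xL, xU))

alpha = alpha_from_hessian(2.0, -4.0, 2.0)
xL, xU = [0.0, 0.0], [2.0, 2.0]
ok = all(underestimator([i * 0.5, j * 0.5], xL, xU, alpha)
         <= f([i * 0.5, j * 0.5]) + 1e-12
         for i in range(5) for j in range(5))
print(alpha, ok)   # 1.0 True
```

Since $(x_i^L-x_i)(x_i^U-x_i)\le 0$ inside the box and $\alpha\ge 0$, the underestimation property is immediate; the role of (5.7) is to make $L$ convex as well.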

Theorem 5.2. Consider Problem (5.6) and assume $f(\mathbf{x})\in C^2$. These hypotheses imply
$$\mathrm{cond}\left(\nabla^2 f(\mathbf{x})\right)\le c,\qquad \exists\,\overline{\alpha}_m=\max_{\mathbf{x}_{c(m)}}\alpha\left(\mathbf{x}_{c(m)}\right),\ \forall m. \tag{5.9}$$
Set
$$f^L_{c(m)}=\inf_{\mathbf{x}_{c(m)}}L_{c(m)}\left(\mathbf{x}_{c(m)}\right),\qquad f^U_{c(m)}=f\left(\frac{\mathbf{x}^L_{c(m)}+\mathbf{x}^U_{c(m)}}{2}\right). \tag{5.10}$$
Then it follows, for all $m$:
$$f^L_{c(m)}\le f^L_{c(m+1)}\le\min_{\mathbf{x}_{c(m+1)}}f\left(\mathbf{x}_{c(m+1)}\right)\le\min_{\mathbf{x}}f(\mathbf{x}), \tag{5.11}$$
$$f^U_{c(m)}\ge f^U_{c(m+1)}\ge\min_{\mathbf{x}}f(\mathbf{x})\ge f^L_{c(m)}. \tag{5.12}$$
Moreover, $\forall\epsilon_a>0$, $\exists\,\overline{m}$ such that $\forall m\ge\overline{m}$:
$$f^U_{c(m)}-f^L_{c(m)}<\epsilon_a,\qquad \left\|\mathbf{x}^U_{c(m)}-\mathbf{x}^L_{c(m)}\right\|_2\le\frac{4\epsilon_a}{\overline{c}},\quad \overline{c}\ \text{constant}. \tag{5.13}$$
Theorem 5.2 can be immediately extended to Problem (2.1), by assuming a growth condition on the function $f(\mathbf{x})$.

In fact, we have the following.

Corollary 5.3. Given $f(\mathbf{x})\in C^2(R^n)$ such that
$$\lim_{\|\mathbf{x}\|\to\infty}f(\mathbf{x})=+\infty, \tag{5.14}$$
Equation (5.14) implies
$$\exists\,K_0:\ \min_{\mathbf{x}\in R^n}f(\mathbf{x})=\min_{\mathbf{x}\in K_0}f(\mathbf{x}),\qquad \left\|\nabla^2 f(\mathbf{x})\right\|\le c_1,\ \forall\mathbf{x}\in K_0. \tag{5.15}$$
Assume
$$\left\|\nabla^2 f(\mathbf{x})^{-1}\right\|\le c_2,\quad\text{and hence}\quad \mathrm{cond}\left(\nabla^2 f(\mathbf{x})\right)\le c_1c_2. \tag{5.16}$$
Then the convergence results proved for (5.6) can be applied to (2.1).

We can fruitfully combine the results of Theorems 5.1 and 5.2, by proving the following.

Theorem 5.4. Consider Problem (5.6) and assume $f(\mathbf{x})\in C^2(R^n)$.
If, in a BFGS-type iterative scheme,
$$\mathbf{x}^{(k+1)}=\mathbf{x}^{(k)}-\mu_k\left(B^{(k)}\right)^{-1}\nabla f(\mathbf{x}^{(k)}), \tag{5.17}$$
the following conditions are satisfied:
$$\mathbf{x_L}\le\mathbf{x}^{(k)}\le\mathbf{x_U},\quad \forall k, \tag{5.18}$$
$$\mathrm{cond}\left(B^{(k)}\right)\le N, \tag{5.19}$$
$$\frac{\left\|\nabla f(\mathbf{x}^{(k+1)})-\nabla f(\mathbf{x}^{(k)})\right\|^2}{\left(\nabla f(\mathbf{x}^{(k+1)})-\nabla f(\mathbf{x}^{(k)})\right)^T\lambda_k\mathbf{d}^{(k)}}=\frac{\left\|\mathbf{y}_k\right\|^2}{\mathbf{y}_k^T\mathbf{s}_k}\le M, \tag{5.20}$$
then (5.17) is convergent to the optimal solution of (5.6).

Proof. By the assumptions it follows that $\mathrm{cond}\left(\nabla^2 f(\mathbf{x})\right)\le c$.
Hence, by (5.7), we have for all $m$:
$$\exists\,\overline{\alpha}_m=\max_{\mathbf{x}_{c(m)}}\alpha\left(\mathbf{x}_{c(m)}\right). \tag{5.21}$$
Set
$$\overline{L}_{c(m)}\left(\mathbf{x}_{c(m)}\right)=f\left(\mathbf{x}_{c(m)}\right)+\overline{\alpha}_m\left(\mathbf{x}^L_{c(m)}-\mathbf{x}_{c(m)}\right)^T\left(\mathbf{x}^U_{c(m)}-\mathbf{x}_{c(m)}\right). \tag{5.22}$$
Therefore, by (5.21) and (5.22), for all $m$:
$$\overline{L}_{c(m)}\left(\mathbf{x}_{c(m)}\right)\le L_{c(m)}\left(\mathbf{x}_{c(m)}\right),\ \forall\mathbf{x}_{c(m)};\qquad \overline{L}_{c(m)}\left(\mathbf{x}_{c(m)}\right)\ \text{convex},\ \forall\mathbf{x}_{c(m)}. \tag{5.23}$$
So,
$$f^L_{c(m)}=\inf_{\mathbf{x}_{c(m)}}L_{c(m)}\left(\mathbf{x}_{c(m)}\right)=\min_{\mathbf{x}_{c(m)}}L_{c(m)}\left(\mathbf{x}_{c(m)}\right),\quad \forall m. \tag{5.24}$$
Let $\mathbf{x}^{(\tilde k_m)}_{c(m)}$ be a local minimum in the box $c(m)$.
If $\mathbf{x}^L_{c(m+1)}\le\mathbf{x}^{(\tilde k_m)}_{c(m)}\le\mathbf{x}^U_{c(m+1)}$ and $f\left(\left(\mathbf{x}^L_{c(m+1)}+\mathbf{x}^U_{c(m+1)}\right)/2\right)\ge f\left(\mathbf{x}^{(\tilde k_m)}_{c(m)}\right)$, then define
$$f^U_{c(m+1)}=f\left(\mathbf{x}^{(\tilde k_m)}_{c(m)}\right). \tag{5.25}$$
Else, set
$$\mathbf{x}^{(0)}_{c(m+1)}=\frac{\mathbf{x}^L_{c(m+1)}+\mathbf{x}^U_{c(m+1)}}{2},\qquad f^U_{c(m+1)}=f\left(\mathbf{x}^{(\tilde k_{m+1})}_{c(m+1)}\right), \tag{5.26}$$
with $\mathbf{x}^{(\tilde k_{m+1})}_{c(m+1)}$ being a local minimum evaluated from the starting point $\mathbf{x}^{(0)}_{c(m+1)}$ and contained in the box $c(m+1)$. Since the assumptions of Theorem 5.2 are satisfied, by the results of [7] (see Theorem 2 and Corollary 2), it follows that (5.19) and (5.20) imply that $\forall\epsilon_b>0$, $\exists\,\{\mathbf{x}^{(\tilde k_{m_i})}_{c(m_i)}\}$:
$$f^U_{c(m_i)}\ge f^U_{c(m_{i+1})},\qquad \left\|\nabla f\left(\mathbf{x}^{(\tilde k_{m_i})}_{c(m_i)}\right)\right\|<\epsilon_b. \tag{5.27}$$
Applying Theorem 5.2, by inequalities (5.11) and (5.27) and by setting $\epsilon=\max\{\epsilon_a,\epsilon_b\}$, we have that $\forall\epsilon>0$, $\exists\,\{\mathbf{x}^{(\tilde k_{m_i})}_{c(m_i)}\}$:
$$\left\|\nabla f\left(\mathbf{x}^{(\tilde k_{m_i})}_{c(m_i)}\right)\right\|_2<\epsilon,\qquad \left\|\mathbf{x}^U_{c(m_i)}-\mathbf{x}^L_{c(m_i)}\right\|_2\le\frac{4\epsilon}{\overline{c}},\qquad f\left(\mathbf{x}^{(\tilde k_{m_i})}_{c(m_i)}\right)-f_{\min}\le f^U_{c(m_i)}-f^L_{c(m_i)}<\epsilon. \tag{5.28}$$
This completes the proof.

Although the local minimization phases are performed effectively by the iterative scheme (5.17), the convergence of the method to the global minimum is usually very slow, by the very nature of the $\alpha BB$ approach. In particular, the number of upper bounds $f^U_{c(m_i)}$ and corresponding boxes $c(m_i)$ required to obtain a satisfactory approximation can be unacceptable from a computational point of view. In order to overcome this problem, a fast determination of “good” local minima is essential.

More precisely, by the utilization of terminal repellers and tunneling techniques [14], one can build algorithms based on a sequence of cycles, where each cycle has two phases, that is, a local optimization phase and a tunneling one. The main aim of these procedures is to build a favourable sequence of local minima (maxima), thereby determining a set of possible candidates for the global minimum (maximum) more efficiently.

By injecting into the method suitable “tunneling phases,” one can avoid the unfair entrapment in a “bad” local minimum, that is, when the condition
$$f^U_{c(m+1)}=f\left(\mathbf{x}^{(\tilde k_{m+1})}_{c(m+1)}\right)=f\left(\mathbf{x}^{(\tilde k_m)}_{c(m+1)}\right)=f^U_{c(m)}\gg f_{\min} \tag{5.29}$$
is verified for several iterations. For this purpose, the power of the repellers utilized in the tunneling phases plays a crucial role. The classical and well-known use of scalar repellers [14, 34] is often unsuitable when the dimension $n$ of the problem assumes values of operational interest. A repeller structured matrix, based on the sum of a diagonal matrix and a low-rank one [15], can be constructed to overcome the latter difficulty.

Let $\mathbf{x}^{(\tilde k)}$ be an approximation of a local minimizer for $f(\mathbf{x})\in C^1$.

A matrix $\mathfrak{A}^{(\tilde k)}$ is called a repeller matrix for $\mathbf{x}^{(\tilde k)}$ if $\exists\,\hat{\mathbf{x}}$:
$$\hat{\mathbf{x}}=\mathbf{x}^{(\tilde k)}-\mathfrak{A}^{(\tilde k)}\nabla f(\mathbf{x}^{(\tilde k)}),\qquad f(\hat{\mathbf{x}})<f(\mathbf{x}^{(\tilde k)}). \tag{5.30}$$
The repeller matrix $\mathfrak{A}^{(\tilde k)}$ for any given computed local minimizer $\mathbf{x}^{(\tilde k)}$ can be approximated in the following way (see [15]):
$$\mathfrak{A}^{(\tilde k)}\approx\lambda^{(\tilde k)}\left[I+\left(\mu I+R\right)^{-1}\right],\qquad 2\le\mathrm{rank}(R)\le 4, \tag{5.31}$$
with $\lambda^{(\tilde k)}$ being the maximal scalar repeller [34], that is,
$$\lambda^{(\tilde k)}=\frac{\epsilon_a}{\left\|\nabla f(\mathbf{x}^{(\tilde k)})\right\|_2},\qquad \left\|\nabla f(\mathbf{x}^{(\tilde k)})\right\|\ge\epsilon_a,\quad \epsilon_a\ \text{desired precision}, \tag{5.32}$$
and with $R$ being of the following structure:
$$R=\mu_1\mathbf{p}\mathbf{p}^T+\mu_2\mathbf{q}\mathbf{q}^T+\mu_3\mathbf{p}\mathbf{r}^T+\mu_4\mathbf{r}\mathbf{q}^T,\qquad \mathbf{p},\mathbf{q},\mathbf{r}\ \text{suitable vectors},\ \mu_1,\mu_2,\mu_3,\mu_4\ \text{scalars}. \tag{5.33}$$
In this way, the application of a BFGS-type method can be effectively extended to the tunneling phases and hence to the whole global optimization scheme (see [9, 33]).
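Whatever the precise parametrization in (5.31), the computational point of the structure (5.33) is that a low-rank $R$ can be applied to a vector in $O(n)$ time without ever forming an $n\times n$ matrix. A sketch with made-up vectors $\mathbf{p},\mathbf{q},\mathbf{r}$ and scalars $\mu_i$:

```python
# Apply R = mu1 p p^T + mu2 q q^T + mu3 p r^T + mu4 r q^T to a vector g
# using only inner products, i.e. R g = mu1 p (p.g) + mu2 q (q.g)
#                                    + mu3 p (r.g) + mu4 r (q.g).

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def apply_R(p, q, r, mu, g):
    pg, qg, rg = dot(p, g), dot(q, g), dot(r, g)
    return [mu[0] * pi * pg + mu[1] * qi * qg
            + mu[2] * pi * rg + mu[3] * ri * qg
            for pi, qi, ri in zip(p, q, r)]

p, q, r = [1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [1.0, 1.0, 0.0]  # made-up data
mu = [0.5, 1.0, -0.25, 2.0]
g = [1.0, 2.0, 3.0]
res = apply_R(p, q, r, mu, g)
print(res)   # [12.75, 15.0, 10.5]
```

This is the reason why structured repeller matrices remain affordable when $n$ assumes values of operational interest.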

The structure in (5.33) can be generalized by using the recent Tensor-Train (TT)-cross approximation theory [13].

It is well known, in fact, that a rank-p matrix can be recovered from a cross of p linearly independent columns (or rows). Therefore, an arbitrary matrix can be interpolated by a pseudoskeleton approximation (see [15] and again [13]). In particular, since a repeller matrix is not arbitrary and possesses some hidden structure, it is fundamental to discover a low-parametric representation, which can be useful in the tunneling phases.

An operational cross approximation method, evaluating large close-to-rank-p matrices in 𝒪(𝑛𝑝2) time complexity and by computing 𝒪(𝑛𝑝) elements, was shown in [37].

6. Discrete Optimization

A well-known family of Computer Science methods is represented by the so-called Greedy algorithms. The simplest application of this type of procedure is to the standard Knapsack Problem (KP), that is,
$$\max\ \mathbf{c}^T\mathbf{x},\qquad \mathbf{a}^T\mathbf{x}\le b,\ \mathbf{x}\ge\mathbf{0},\ \text{integer}. \tag{6.1}$$
The Greedy approach is essentially a generalization of the classical Dynamic Programming (DP) methods, which are based on the Bellman Principle. By utilizing the DP computational scheme and assuming $y$ integer, problem (6.1) can be reduced to the recursive solution of the following family of problems:
$$\max\ \mathbf{c}(k)^T\mathbf{x}(k),\qquad \mathbf{a}(k)^T\mathbf{x}(k)\le y,\ \mathbf{x}(k)\ge\mathbf{0},\ \text{integer},\quad 1\le k\le n,\ 1\le y\le b,\ y\ \text{integer}, \tag{6.2}$$
where $\mathbf{c}(k),\mathbf{a}(k),\mathbf{x}(k)$ indicate the vectors associated to the first $k$ components of $\mathbf{c},\mathbf{a},\mathbf{x}$, respectively.

Given 𝑘 and 𝑦, let 𝜓𝑘(𝑦) be the value of the objective function corresponding to the optimal solution of problem (6.2).

The algorithm computes $\psi_k(y)$ by the recursive formula
$$\psi_k(y)=\max\left\{\psi_{k-1}(y),\ \psi_k\left(y-a_k(k)\right)+c_k(k)\right\}. \tag{6.3}$$
By (6.3), the optimal value of (6.2) is determined by a generalized discrete Steepest Descent algorithm, since $c_k(k)$ is the $k$-th component of the gradient of the objective function and represents, in fact, the increase associated with the choice of the $k$-th object.

Therefore, formula (6.3) is based on a discrete Steepest Descent approach, and the value $\psi_k\left(y-a_k(k)\right)+c_k(k)$ assures that the corresponding solution is admissible.
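The recursion (6.3) yields the usual DP table for (6.1) with integer $\mathbf{x}\ge\mathbf{0}$ (unbounded variant); the data below are illustrative:

```python
# DP table for the knapsack recursion (6.3):
# psi[k][y] = max(psi[k-1][y], psi[k][y - a_k] + c_k).

def knapsack(c, a, b):
    n = len(c)
    psi = [[0] * (b + 1) for _ in range(n + 1)]   # psi_0(y) = 0
    for k in range(1, n + 1):
        for y in range(b + 1):
            psi[k][y] = psi[k - 1][y]             # do not use object k
            if y >= a[k - 1]:                     # add one more copy of object k
                psi[k][y] = max(psi[k][y],
                                psi[k][y - a[k - 1]] + c[k - 1])
    return psi[n][b]

# objects with values c and weights a, capacity b
print(knapsack(c=[60, 100, 120], a=[10, 20, 30], b=50))   # 300
```

The admissibility check `y >= a[k - 1]` is exactly the role played by the argument $y-a_k(k)$ in (6.3).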

Integer Nonlinear Programming with Linear Constraints problems (INPLCs) can be transformed into continuous GO problems over the unit hypercube [17]. In order to reduce the difficulties caused by the introduction of undesirable local minimizers, a special class of continuation methods, called smoothing methods, can be introduced [38]. These methods deform the original objective function into a function whose smoothness is controlled by a parameter. Of course, the success of the latter approach depends on the existence of a suitable smoothing function.

Hence, the Gradient-type methods for Global Optimization of Section 5 can also be applied to INPLC.

7. Conclusions

In this paper we have tried to demonstrate that Gradient or Gradient-type methods lead both to a general approach to optimization problems and to the construction of efficient algorithms.

In particular, we have shown that the class of problems for which the optimal solution can be obtained in a finite number of steps is larger than canonical unconstrained Convex Quadratic problems or Convex Quadratic Programming. Moreover, we have pointed out that the classical distinction between Direct Methods and Iterative Methods cannot be considered as a fundamental classification of techniques in Numerical Analysis. Many optimization problems can be, in fact, solved in a finite number of steps by suitable hybrid efficient algorithms (see [33]).

Furthermore, if the matrices involved in the computation are well conditioned, the superiority of Iterative Methods with respect to Direct ones, which is a typical feature of the CG algorithm, can be proved in a more general context (see again [33]).

Several heuristic and ad hoc algorithms in operational environments can be considered, in fact, as particular cases of a general Gradient-type approach to the problem. In some cases, surprisingly enough, the convergence of Iterative Methods can be guaranteed only by utilizing a special Line-Search Minimization algorithm (see, for instance, the Fletcher-Reeves method in conjunction with the Armijo-Goldstein-Wolfe procedure [18, Theorem 5.8]).

It is also important to underline that many combinatorial problems, representing a remarkable benchmark set in Computer Science, can be translated in terms of Gradient-type methods in a general framework.

Once again, we stress that the Fixed Point theorem, which is considered a milestone in Numerical Analysis and guarantees the convergence of most of classical Iterative Methods, represents the background for only a subset of Gradient-type methods.

Acknowledgment

This paper was partially supported by PRIN 2008 N. 20083KLJEZ.