#### Abstract

To solve an unconstrained nonlinear minimization problem, we propose an optimal algorithm (OA) as well as a globally optimal algorithm (GOA), by deflecting the gradient direction to the best descent direction at each iteration step, and with an optimal parameter being derived explicitly. An invariant manifold defined for the model problem in terms of a locally quadratic function is used to derive a purely iterative algorithm and the convergence is proven. Then, the rank-two updating techniques of BFGS are employed, which result in several novel algorithms as being faster than the steepest descent method (SDM) and the variable metric method (DFP). Six numerical examples are examined and compared with exact solutions, revealing that the new algorithms of OA, GOA, and the updated ones have superior computational efficiency and accuracy.

#### 1. Introduction

The steepest descent method (SDM), which can be traced back to Cauchy (1847), is the simplest gradient method for solving unconstrained minimization problems. However, the SDM performs well during earlier stages and as approaching to a stationary point it converges very slowly. In this paper, we consider the following nonlinear minimization problem without considering constraint: where is a differentiable function.

In the iterative solution of (1), if is the current iterative point, then we denote by , by , and by , which is known to be a symmetric Hessian matrix. The second-order Taylor expansion of function at the point is where . The superscript signifies the transpose and meanwhile signifies the inner product of and .

Let , and inserting it into (2) we can obtain By requiring the minimization with respect to , we can derive Then, we have a famous steepest descent method (SDM) for solving (1).(i)Give an initial , and then compute .(ii)For , we repeat the following iteration: If , then stop; otherwise, go to step (ii).

For the minimization problem (1) we need to solve , and hence the residual means the value of . The above convergence criterion means that when the residual norm is smaller than a given error tolerance , the iterations are terminated.

In the derivation of SDM for solving (1) it is easy to see that we have transformed the global minimization problem into a model problem in terms of a quadratic minimization problem of and determined the coefficient by (4), where with being a constant to raise the level value of , and is a constant vector within each iterative step. Here for the purpose of simple notation we omit the subscript in (6), and we are going to modify the SDM by starting from the above locally quadratic function.

Several modifications of the SDM have been addressed. These modifications have led to a new interest in the SDM that the gradient vector itself is not a bad choice but rather that the original steplength leads to a slow convergence behavior. Barzilai and Borwein [1] have presented a new choice of steplength through a two-point stepsize. Although their method did not guarantee the descent of the minimum function values, Barzilai and Borwein [1] were able to produce a substantial improvement of the convergence speed for a certain test of a quadratic function. The results of Barzilai and Borwein [1] have spurred many researches on the SDM, for example, Raydan [2, 3], Friedlander et al. [4], Raydan and Svaiter [5], Dai et al. [6], Dai and Liao [7], Dai and Yuan [8], Fletcher [9], and Yuan [10]. In this paper, we will approach this problem from a quite different viewpoint of invariant manifold and propose a new strategy to modify the steplength and the descent direction. Besides the SDM, there were many modifications of the conjugate gradient method for the unconstrained minimization problems, like Birgin and Martinez [11], Andrei [12–14], Zhang [15], Babaie-Kafaki et al. [16], and Shi and Guo [17].

Also, there is another class method with the descent direction in being taken to be , where is a positive definite matrix that approximates the inverse of the Hessian matrix , which is usually named the quasi-Newton method. The earlier method of minimization of a nonlinear function by using this type approach is performed by Davidon [18], which was then simplified and reformulated by Fletcher and Powell [19] and was referred to as the variable metric method (DFP).

The remaining portions of this paper are arranged as follows. In Section 2 we describe an invariant manifold to derive the governing ordinary differential equations (ODEs). The main results are derived in Section 3, which includes the proof of convergence theorem, optimal parameter, optimal algorithm, a critical parameter, and a globally optimal algorithm. Then, in Section 4 we employ the rank-two Broyden-Fletcher-Goldfarb-Shanno (BFGS) updating techniques to update the Hessian matrix or its inversion, resulting in several novel optimal algorithms. The numerical examples are tested in Section 5 to assess the performance of the newly proposed algorithms. Finally, the conclusions are drawn in Section 6.

#### 2. An Invariant Manifold

From this section on, we focus on the local minimum problem defined in terms of in (6), rather than that of . When the novel algorithms are developed, we will return to the minimization problem (1). The present approach is different from the conventional line search method by minimizing the steplength in where is a given search direction.

At the first, we consider an iterative scheme of derived from the ordinary differential equations (ODEs) defined on an invariant manifold which is formed from : Here, we let be a function of a fictitious time variable . We do not need to specify the function a priori, of which is merely a measure of the decreasing of in time. Hence, we expect that in our algorithm if is an increasing function of , the iterative point can tend to the minimal point. We let , and is determined from the initial condition by We can suitably choose the constant and hence in (6), such that . Indeed, the different level of does not alter its minimal point.

When and , the manifold defined by (8) is continuous, and thus the following differential operation being carried out on the manifold makes sense. For the requirement of consistency condition we have which is obtained by taking the time differential of (8) with respect to , using (6), and considering . We suppose that is governed by the following ODEs: where is to be determined. Inserting (11) into (10) we can solve where

We further suppose that where is a parameter to be determined below through the solution of an optimal equation derived, and is a descent matrix to be specified. Here, we assert that the driving vector is an optimal linear combination of the gradient vector and a supplemental vector being the gradient vector times .

#### 3. Numerical Methods

##### 3.1. Convergence Theorem

Before the derivation of optimal algorithms we can prove the following convergence result.

Theorem 1. *For an iterative scheme to solve (1) by
**
which is generated from the ODEs in (11), the iterative point on the manifold (8) has the following convergence rate:
**
where
**
and is a relaxation parameter.*

*Proof. *The proof of this theorem is quite lengthy and we divide it into three parts.

(A) Inserting (12) into (11) we can obtain an evolution equation for :
In the algorithm if can be guaranteed to be an increasing function of , we might have an absolutely convergent property in finding the minimum of through the following equation:
which is obtained from (8). Here we simplify the notation of to .

(B) By applying the Euler method to (20) we can obtain the following algorithm:
where

In order to keep on the manifold defined by (21) we can insert the above into
and obtain
Thus, by (13), (21), and (6) and through some manipulations we can derive the following scalar equation:
where
of which the inequality can be achieved by taking a suitable value of and hence in (6).

(C) Let
and by (26) we can derive
From (29), we can take the solution of to be
Let
and the sufficient condition in (30) is satisfied automatically, and thus by (30) and (31) we can obtain a preferred solution of by
Here is a relaxation parameter. The inequality follows from (32), , and . Inserting the above into (22) and using (19) we can derive algorithm (16). For this algorithm we can define the local convergence rate by
which, using (28) and (21) and just proved, renders (17). This ends the proof of Theorem 1.

##### 3.2. Optimization of

In Algorithm (22) we do not yet specify how to choose the parameter . We can determine a suitable value of such that defined in (32) is minimized with respect to , because a smaller will lead to a faster convergence as shown by (17).

Thus by inserting (19) for into (32) we can write to be where as defined by (15) includes a parameter . Let , and through some algebraic operations we can solve and denote it by where the subscript signifies that is the optimal value of .

*Remark 2. *For the usual three-dimensional vectors , , , the following formula is famous:
Liu [20] has developed a Jordan algebra by extending the above formula to vectors in -dimension:

In terms of the Jordan algebra we can write
where the symmetry of was used. It can be seen that the above equation is a more symmetric form than that in (36).

##### 3.3. An Optimal Algorithm

Now we can let denote the numerical value of at the th step and go back to , to , to , and to . Thus, by using (16) we can derive an iterative algorithm: where

Therefore, we have the following optimal algorithm (OA).(i)Select , and give an initial .(ii)For , we repeat the following iterations: If , then stop; otherwise, go to step (ii).

Again we emphasize that we need to solve for the minimization problem (1), and hence the residual means the value of . The above convergence criterion means that when the residual norm is smaller than a given error tolerance , the iterations are terminated.

##### 3.4. A Critical Value for

In Sections 3.2 and 3.3 we have used (or equivalently, ) to find the optimal value of in the descent vector . Usually, this value of obtained from is not the global minimum of (or ). Here, we try another approach and attempt to derive a better value of than , such that the value of obtained in this manner is the global minimum of (or ).

In practice, we can take When is near to 0.5, the convergence speed is very fast. Inserting (15) for into the above equation and through some elementary operations we can derive a quadratic equation to solve : where

If the following condition is satisfied:
then in (44) has a* real solution*:

Inserting (45)–(47) into the* critical equation*:
we can derive an algebraic equation to determine that is the lowest bound of (48).* In this lowest bound ** is a critical value denoted by *, and for all it can satisfy (48) automatically. From (50) through some elementary operations, the critical value can be solved as
Then by inserting it for into (45) and (46) we can obtain a* critical value* for from (49):
where was used in view of (50).

By inserting (51) for into (52) and cancelling out the common term we can derive a final equation for , of which we use the same symbols for saving notations:

Here we must emphasize that, in the current descent vector , the above value is the* best one*, and the vector
is the* best descent vector. Due to its criticality, if one attempts to find a better value of the parameter ** than **, there would be no real solution of *. Furthermore, the best descent vector is also better than the optimal vector derived in Section 3.2.

##### 3.5. A Globally Optimal Algorithm

Then, we can derive the following* globally optimal algorithm* (GOA) to solve the minimization problem in (1).(i)Select and give an initial guess of .(ii)For , we repeat the following iterations:
If converges according to a given stopping criterion , then stop; otherwise, go to step (ii).

*Remark 3. *We have derived a* novel globally optimal algorithm* for solving the minimization problem in (1). In terms of the descent vector , the GOA is the best one, which leads to the* global minimum* of (or ) and hence the* largest convergence rate*. While the parameter is chosen by the user with problem dependence, the parameter is exactly given by (56). Up to here we have successfully derived a novel best descent vector algorithm, with the help from (19), (53), and (54).

*Remark 4. *At the very beginning we have set an invariant manifold in (8) as our starting point to derive the iterative optimal algorithms, which includes a locally objective function in the governing equations. However, in the final stage the terms which include can be cancelled out, and thus we have obtained the optimal algorithm in (42) and the globally optimal algorithm in (57), which are both independent of .

#### 4. The Broyden-Fletcher-Goldfarb-Shanno Updating Techniques

In the above we have derived two optimal algorithms by leaving the descent matrix to be specified by the user. First we fix to be the exact Hessian matrix , and then we can obtain two optimal algorithms OA and GOA.

We can also apply the technique of BFGS by updating the Hessian matrix and its inverse matrix . To derive this updating technique let us mention the Newton iterative scheme to solve (1): where is the optimal steplength along the Newton direction at the th step. In order to construct a matrix to approximate we can analyze the relation between and the first order derivative . We take the Taylor expansion of at a point , obtaining Then we have Let and from (60) we have Assume that the Hessian matrix is invertible and denote the inverse matrix by . Then from the above equation we have the so-called quasi-Newton condition:

When we take the descent matrix to be the inverse of the Hessian matrix , we might accelerate the convergence speed, which is however a more difficult task when the dimension of the Hessian matrix is quite large. So far, as that done in the quasi-Newton method, we can employ the following updating technique due to BFGS: The advantage by using the above updating technique is that we can obtain an approximation of the inverse of the Hessian matrix, and we do not need to really calculate .

By the same token we can also apply the technique of BFGS to update the Hessian matrix without needing the computation of the real Hessian matrix: where at the first step we can take . The advantage of this approach is that we do not need to calculate the Hessian matrix exactly, as developed independently by Broyden [21], Fletcher [22], Goldfarb [23], and Shanno [24]. It is easy to check that the updating technique in (66) satisfies the quasi-Newton condition: which is an inversion relationship of (64).

The present notations of and are the same as those and used by Dai [25, 26].

The above technique together with the OA by using generated from (66) will be named the OABFGS, and two numerical examples will be given to test its performance.

When the matrix is given exactly by the Hessian matrix in the OA and GOA, we use the above technique to compute in the OA and GOA, and the resultant iterative algorithms are named OABFGS1 and GOABFGS1, respectively. Finally, if in the OA and GOA both and are updated by (66) and (65), the resultant iterative algorithms will be named OABFGS2 and GOABFGS2, respectively.

In order to show the high performance of the optimal algorithms OABFGS1, GOABFGS1, and OABFGS2, we will also apply the DFP method [18, 19] to solve nonlinear minimization problems, of which the updating technique is For each iterative step we search the minimization of to find by solving the optimality equation by using the half-interval method. Gill et al. [27], among several others, have shown that the modification in (65) performs more efficiently than that in (68) for most problems.

#### 5. Numerical Examples

In order to evaluate the performance of the newly developed methods let us investigate the following examples. Some results are compared with those obtained from the steepest descent method (SDM) and the DFP method [18, 19]. In the above algorithms there exists a relaxation parameter , which is problem dependent. A good parameter value of can be selected easily by comparing the convergence speeds for different values of .

*Example 1. *As the first testing example of OA and GOA we use the following function given by Rosenbrock [28]:
In mathematical optimization, the Rosenbrock function is a nonconvex function used as a performance test benchmark problem for optimization algorithms. It is also known as Rosenbrock’s valley or Rosenbrock’s banana function. The minimum is zero occurring at . This function is difficult to be minimized because it has a steep sided valley following the parabolic curve . Kuo et al. [29] have used the particle swarm method to solve this problem; however, the numerical procedures are rather complex. Liu and Atluri [30] have applied a fictitious time integration method to solve the above problem under the constraints of and , whose accuracy can reach to the fifth order.

We apply the OA to this problem by starting from the initial point at and under a convergence criterion . The SDM is run 3749 steps as shown in Figures 1(a) and 1(b) by solid lines for showing the residual and the objective function . The SDM can reach a very accurate minimum value of with . The OA with converges very fast with only 6 steps, with the residual and the objective function being shown in Figures 1(a) and 1(b) by dashed lines. The OA is faster than the SDM with about 625 times, and furthermore the minimum value of can be reduced to . In Figure 1(c) we compare the solution paths generated by the SDM and OA. When the SDM is moving very slowly along the valley, the OA is moving outside the region of valley and is convergent very fast to the solution. In Figure 1(c) the red dashed line is used to represent the valley of the Rosenbrock function, which is not used to indicate the result of a numerical method. As shown in Figure 1(b), GOA is slightly better than OA, although the paths of OA and GOA seem to be identical in Figure 1(c).

The SDM usually works very well during early stages; however, as a stationary point is approached, it behaves poorly, taking small nearly orthogonal steps. On the other hand, with the help of the optimal direction the OA can fast reach to the final end point with high accuracy. For this example the GOA spends the same number of steps as the OA; however, the GOA gives very accurate minimum value of .

**(a)**

**(b)**

**(c)**

*Example 2. *Next, we consider a generalization of the Rosenbrock function [31, 32]:
This variant has been shown to have exactly one minimum for at and exactly two minima for . The global minimum happens at and a local minimum is near . This result is obtained by setting the gradient of the function equal to zero, noticing that the resulting equation is a rational function of . For small the polynomials can be determined exactly and Sturm’s theorem can be used to determine the number of real roots, while the roots can be bounded in the region of [32]. For larger this method breaks down due to the size of the coefficients involved.

We apply the OA to this problem with , starting from , and under a convergence criterion . Under the above stopping criterion, the SDM is run over 13830 steps as shown in Figure 2(a) by solid lines for showing the residual and . The SDM can reach a very accurate minimum value of with . The OA with converges with 9956 steps, with the residual and being shown in Figure 2 by dashed lines. The OA is faster than the SDM, and similarly the minimum of can be reduced to .

At the same time we show the values of the optimal in Figure 3(a) for the OA. The value of is much smaller than 1, which means that the term plays a dominant role in the descent direction; however, the influence of , although a little, is a key point to speed up the convergence.

Then we apply the GOA to this problem under the same conditions as that in OA. The GOA with converges with 9846 steps, with the residual and being shown in Figure 2 by dashed-dotted lines. The GOA is faster than the SDM, and the minimum value of can be further reduced to . We show the values of the optimal for the GOA in Figure 3(b). This problem is quite difficult, and many other algorithms are failure to solve this problem.

**(a)**

**(b)**

**(a)**

**(b)**

*Example 3. *We consider a problem due to Powell [33]:
The minimum of is zero occurring at . We apply the OA to this problem, starting from and under a convergence criterion of . The SDM converges very slowly over 50,000 steps as shown in Figure 4 by solid lines for showing , , and the residual. The SDM can reach a very accurate value of . At the same time, the OA with converges with 349 steps, with , , and the residual shown in Figure 4 by dashed lines. The OA is faster than the SDM over 130 times, and furthermore we can get a more accurate .

As shown in Figure 4(c) the residual obtained by the OABFGS is rapidly growing to the order and then fast decaying to the order , and through 1043 steps the OABFGS leads to a better solution with than that obtained by the SDM. Here the combination of the optimal algorithm with the BFGS updating technique is very useful when the exact Hessian matrix is not available or when the computation of the Hessian matrix is cumbersome.

Then we apply the GOA in Section 3.5 to the Powell problem. Under the same convergence criterion of , the GOA with converges only with 96 steps as shown in Figure 4(c), where the GOA is faster than the SDM with 500 times and than OA with 3.5 times, and furthermore we can get a very accurate minimum value . In Figure 5 we compare the values of obtained by the OA and the GOA, of which we can observe that both are quite small and may be negative.

Then, we apply the OABFGS1 and the GOABGGS1 mentioned in Section 4 to solve the Powell problem. Under the same convergence criterion of , the OABFGS1 with converges only with 29 steps as shown in Figure 6(a), where we can get very accurate value of . Similarly, the GOABFGS1 with converges with 30 steps as shown in Figure 6(a), where we can get very accurate value of . In Figure 6(b) we compare the values of obtained by the OABFGS1 and GOABFGS1, of which we can observe that both are quite large, which means that the quasi-Newton direction is a dominant factor to accelerate the convergence speed.

Then, we apply the OABFGS2 to solve the Powell problem with , which converges through 116 steps and with the value of to be , which is very accurate. In Figures 6(c) and 6(d) we show the residual and the value of obtained by the OABFGS2.

In order to reveal the high performance of the optimal algorithms OABFGS1, GOABFGS1, and OABFGS2, we apply the DFP method [18, 19] to solve the Powell problem, which converges through 131 steps as shown in Figure 6(a). For each iteration step we search the minimization of to find by solving the optimality equation by using the half-interval method, with two end points fixed by and , and the convergence criterion is given by . At the end of the iteration we obtain the minimum value of to be , which is very accurate.

**(a)**

**(b)**

**(c)**

**(a)**

**(b)**

**(c)**

**(d)**

*Example 4. *In this example we design an office block inside a structure with a curved roof given by . Suppose that the number of total cuboids is and each cuboid can have different size, and we attempt to find the dimensions of all cuboids with maximum volume which would fit inside the given roof structure; that is,
where is the height of the th cuboid.

The maximum of is tending to when is increasing. When , we apply the GOA to this problem by starting from and under a convergence criterion of . The SDM is convergent with 6868 steps as shown in Figure 7 by solid lines for showing , , residual, and . At the same time, the GOA with converges with 96 steps, with , , residual, and being shown in Figure 7 by dashed lines. Both are tending to 661.9945. The GOA is faster than the SDM with 80 times. The heights and widths of the cuboids with respect to the number of floors are plotted in Figure 8. For this problem both OA and GOA lead to the same numerical results.

Moreover, we also apply the OABFGS and the OABFGS1 as well as the DFP to this problem under the above same initial condition and convergence criterion, where for the DFP we use the half-interval method to solve the local minimization to find the best with two end points fixed by and , and the convergence criterion is given by . The values of used in the OABFGS and OABFGS1 are, respectively, 0.3 and 0.05. The residuals of these three methods are compared in Figure 9(c), from which we can observe that the OABFGS converges with 35 steps, the OABFGS1 converges with 32 steps, and the DFP converges with 46 steps. They are all better than the above OA and GOA algorithms. The value of as shown in Figure 9(a) for the OABFGS is quite small, which indicates that the descent direction is dominated by the gradient direction. The value of as shown in Figure 9(b) for the OABFGS1 is quite large, which indicates that the descent direction is dominated by the quasi-Newton direction.

**(a)**

**(b)**

**(c)**

**(d)**

**(a)**

**(b)**

**(c)**

*Example 5. *In this example we test the minimization of the Schwefel function with :
The minimum is zero at .

We apply the OA to this problem by starting from and under a convergence criterion of . The SDM does not converge within 10000 steps as shown in Figure 10 by solid lines for showing , , residual, and . At the same time, the OA with converges with 276 steps, with , , residual, and being shown in Figure 10 by dashed lines. The OA is faster than the SDM over 100 times, and is tending to which is more accurate than obtained by the SDM. When we apply the GOA with to this problem, it is convergence with 299 steps and is faster than the SDM over 100 times, and is tending to which is more accurate than SDM and OA.

**(a)**

**(b)**

**(c)**

**(d)**

*Example 6. *In this example we test the minimization of the Whitley function with :
where . The minimum is zero at . Here, we fix .

It is very difficult to minimize the Whitley function. The SDM diverges as shown in Figure 11 by solid lines. We apply the OA to this problem, starting from and under a convergence criterion of . The OA with converges with 24 steps, with residual and being shown in Figure 11 by dashed lines and tending to .

**(a)**

**(b)**

#### 6. Conclusions

By formulating the minimization problem into a continuous manifold, we can derive a governing system of ODEs for deriving the iterative algorithms which being proven convergence to find the unknown minimum point of a given nonlinear minimization function. The novel algorithm is named “an optimal algorithm (OA)," because in the local frame we have derived the optimal parameter of in the descent direction, which is a linear combination of the gradient vector and a supplemental vector. We have demonstrated a critical descent vector to derive a* globally optimal algorithm* (GOA), which can substantially accelerate the convergence speed in the numerical solution of nonlinear minimization problem. It was proven that the critical value in the critical descent vector leads to the* largest convergence rate* among all the descent vectors specified by . Due to its criticality, if one attempts to find a better descent vector than , there would be no real descent vector of . Through several numerical tests we found that the both the OA and the GOA outperformed very well. Then we have proposed novel algorithms based on OA and GOA by replacing the Hessian matrix and the descent matrix with the updating techniques of BFGS for one or for both of these two matrices. Two numerical examples were given to test the performances of these algorithms, which are faster than the original OA and GOA algorithms, due to the enhancement by using the quasi-Newton conditions on the updated matrices.

#### Conflict of Interests

This paper is a purely academic research, and the author declares that there is no conflict of interests regarding the publication of this paper.

#### Acknowledgments

The author highly appreciates the constructive comments from anonymous referees, which improved the quality of this paper. Highly appreciated are also the Project NSC-102-2221-E-002-125-MY3, the 2011 Outstanding Research Award from the National Science Council of Taiwan, and the 2011 Taiwan Research Front Award from Thomson Reuters. It is also acknowledged that the author has been promoted as being a Lifetime Distinguished Professor of National Taiwan University since 2013.