Research Article  Open Access
Partitioned Quasi-Newton Approximation for Direct Collocation Methods and Its Application to the Fuel-Optimal Control of a Diesel Engine
Abstract
The numerical solution of optimal control problems by direct collocation is a widely used approach. Quasi-Newton approximations of the Hessian of the Lagrangian of the resulting nonlinear program are also common practice. We illustrate that the transcribed problem is separable with respect to the primal variables and propose the application of dense quasi-Newton updates to the small diagonal blocks of the Hessian. This approach resolves memory limitations, preserves the correct sparsity pattern, and generates more accurate curvature information. The effectiveness of this improvement when applied to engineering problems is demonstrated. As an example, the fuel-optimal and emission-constrained control of a turbocharged diesel engine is considered. First results indicate a significantly faster convergence of the nonlinear program solver when the proposed method is used instead of the standard quasi-Newton approximation.
1. Introduction
Quasi-Newton (QN) methods have become very popular in the context of nonlinear optimisation. In particular, in nonlinear programs (NLPs) arising from the direct transcription of optimal control problems (OCPs), the Hessian of the Lagrangian often cannot be derived analytically in a convenient way. Algorithmic differentiation may fail due to unsupported operations or black-box parts in the model functions. Furthermore, both approaches are computationally expensive if the model functions are complex and yield long expressions for the second derivatives. On the other hand, numerical approximation by finite differences is inaccurate and hardly improves the computational performance.
A common approach in these cases is to approximate the Hessian by QN updates using gradient information collected during the NLP iterations. However, several limitations arise when this approach is applied to real-world OCPs. These problems often exhibit large dimensions; thus, only limited-memory versions of the approximations are applicable. Since many updates may be necessary until a good approximation of the full Hessian is obtained, the approximation remains poor when only the most recent steps are used. Furthermore, the favourable sparsity structure of the underlying discretisation scheme is generally not preserved. The resulting fill-in drastically reduces the performance of the solvers for the linear system of equations that defines the step direction during each NLP iteration.
Partial separability, a concept introduced in [1], describes a structural property of a nonlinear optimisation problem. When present, this property allows for partitioned QN updates of the Hessian of the Lagrangian (or of the objective, in the unconstrained case). For unconstrained optimisation, this approach was proposed and its convergence properties were analysed in [2]. Although only superlinear local convergence can be proven, a performance close to that obtained with exact Newton methods, which exhibit quadratic local convergence, was observed in practical experiments [3, 4].
This brief paper presents how partitioned QN updates can be applied to the NLPs resulting from direct collocation of OCPs. The concept of direct collocation to solve OCPs has been widely used and analysed [5–9]. We will show that the Lagrangian of the resulting NLPs is separable in the primal variables at each discretisation point. Due to this separability, its Hessian can be approximated by full-memory QN updates of the small diagonal blocks. This procedure increases the accuracy of the approximation, preserves the sparsity structure, and resolves memory limitations. The results are first derived for Radau collocation schemes, which include the right interval boundary as a collocation point. The adaptations to Gauss schemes, which have internal collocation points only, and to Lobatto schemes, which include both boundaries, are provided thereafter in condensed form. A consistent description of all three families of collocation is provided in [10].
The partitioned update is applied to a real-world engineering problem. The fuel consumption of a turbocharged diesel engine is minimised while limits on the cumulative pollutant emissions need to be satisfied. This problem is cast in the form of an OCP and is transcribed by Radau collocation. The resulting NLP is solved by an exact Newton method and by quasi-Newton methods. For the latter, the partitioned update achieves a faster convergence rate and a higher robustness with respect to a poor initialisation of the approximation than the full QN update. Therefore, the findings for the unconstrained case seem to transfer to the NLPs resulting from the direct collocation of OCPs.
2. Material and Methods
Consider the system of nonautonomous ordinary differential equations (ODEs) $\dot{x}(t) = f(x(t), u(t), t)$ on the interval $[t_0, t_f]$, with the state $x(t) \in \mathbb{R}^{n_x}$ and the control input $u(t) \in \mathbb{R}^{n_u}$. Radau collocation represents each element of the state vector as a polynomial, say of degree $n$. The time derivative of this polynomial is then equated to the values of $f$ at the collocation points $t_1 < t_2 < \cdots < t_n = t_f$,
$$\sum_{l=0}^{n} D_{jl}\, x_l = f_j, \quad j = 1, \ldots, n. \tag{1}$$
The notation $x_j := x(t_j)$, $f_j := f(x_j, u_j, t_j)$ is adopted. The left boundary $t_0$ is a non-collocated point in Radau collocation schemes. By introducing the appropriate matrices, this matrix equation in the nodal values can be written in short as
$$D\, X = F(X, U). \tag{2}$$
The rows of $X$ and $F$ correspond to one collocation point each. In turn, the columns of $X$ and $F$ represent one state variable and its corresponding ODE right-hand side at all collocation points. In the following, consider the notation in (2) as shorthand for stacking the transposed rows in one large column vector.
The step length of the interval is assumed to be accounted for in the differentiation matrix $D$. Lagrange interpolation by barycentric weights is used to calculate $D$ along with the vector $w$ of the quadrature weights [11]. The latter may be used to approximate the definite integral of a function $g$ as $\int_{t_0}^{t_f} g(t)\,\mathrm{d}t \approx \sum_{j} w_j\, g(t_j)$.
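As a concrete illustration of this construction, the differentiation matrix for an arbitrary set of interpolation nodes can be assembled from the barycentric weights by the standard formulas of [11]. The sketch below (function names are ours, and the scaling by the interval length as well as the column for the non-collocated boundary point are omitted for brevity):

```python
import numpy as np

def barycentric_weights(t):
    """Barycentric weights for the interpolation nodes t (cf. [11])."""
    n = len(t)
    w = np.ones(n)
    for j in range(n):
        for l in range(n):
            if l != j:
                w[j] /= (t[j] - t[l])
    return w

def differentiation_matrix(t):
    """D[j, l] = derivative of the l-th Lagrange basis polynomial at t[j]."""
    n = len(t)
    w = barycentric_weights(t)
    D = np.zeros((n, n))
    for j in range(n):
        for l in range(n):
            if l != j:
                D[j, l] = (w[l] / w[j]) / (t[j] - t[l])
        # each row of D must annihilate constant functions
        D[j, j] = -np.sum(D[j, :])
    return D
```

With three nodes, the matrix differentiates quadratic polynomials exactly, which provides a simple consistency check of the construction.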
2.1. Direct Collocation of Optimal Control Problems
We consider an OCP of the form
$$\min_{x(\cdot),\,u(\cdot)} \; \int_{t_0}^{t_f} \ell\big(x(t), u(t), t\big)\,\mathrm{d}t \tag{3a}$$
$$\text{s.t.} \quad \dot{x}(t) = f\big(x(t), u(t), t\big), \tag{3b}$$
$$\int_{t_0}^{t_f} g\big(x(t), u(t), t\big)\,\mathrm{d}t \le 0, \tag{3c}$$
$$h\big(x(t), u(t), t\big) \le 0, \tag{3d}$$
where $\ell$ is the running cost, $g$ collects the integrands of the integral inequality constraints, and $h$ collects the path constraints. Simple bounds on $x$ and $u$, equality constraints, and a fixed initial or end state can be included in the path constraints (3d). An objective term in Lagrange form is used, which is preferable over an equivalent Mayer term [12, Section 4.9].
Direct transcription discretises all functions and integrals by consistently applying an integration scheme. Here, $N$ integration intervals are used, with boundaries $t_0 = \tau_0 < \tau_1 < \cdots < \tau_N = t_f$. The number of collocation points $n^{(k)}$ can be different for each interval $k$. Summing up the collocation points throughout all integration intervals results in a total of $N_c$ discretisation points. The "linear index" $i$ thereby corresponds to collocation node $j$ in interval $k$. The transcribed OCP reads
$$\min \; \sum_{i=1}^{N_c} w_i\, \ell(x_i, u_i, t_i) \tag{5a}$$
$$\text{s.t.} \quad D^{(k)} X^{(k)} = F^{(k)}, \quad k = 1, \ldots, N, \tag{5b}$$
$$\sum_{i=1}^{N_c} w_i\, g(x_i, u_i, t_i) \le 0, \tag{5c}$$
$$h(x_i, u_i, t_i) \le 0, \quad i = 1, \ldots, N_c. \tag{5d}$$
The minimisation in (5a) is over all instances of the variables $x_i$ and $u_i$ at any applicable index $i$. The vector $w$ of the "global" quadrature weights results from stacking the vectors of the quadrature weights of each interval after removing the first element, which is zero. For the first interval, the non-collocated point is the initial state $x(t_0)$.
The Lagrangian of the NLP (5a)–(5d) is the sum of the objective (5a) and all constraints (5b), (5c), and (5d), which are weighted by the Lagrange multipliers. To clarify the notation, the Lagrange multipliers are grouped according to the problem structure. The multipliers for the discretised dynamic constraints on each integration interval $k$ are denoted by $\lambda^{(k)}$, the multipliers for the integral inequalities are stacked in $\mu$, and the multipliers for the path constraints at each discretisation point $i$ are gathered in the vector $\nu_i$.
2.2. Separability in the Primal Variables
The objective (5a), the integral inequalities (5c), and the path constraints (5d) are inherently separated with respect to time; that is, the individual terms are pairwise disjoint in the variables $x_i$ and $u_i$. We thus focus on the separability of the dynamic constraints (5b). For the derivation, we assume that $x$, $u$, and $f$ are scalar. The extension to the vector-valued case is straightforward and will be provided subsequently.
Consider the term of the Lagrangian representing the discretised dynamic constraints (5b) for interval $k$,
$$\Lambda^{(k)} = \sum_{j=1}^{n} \lambda_j \Big( \sum_{l=0}^{n} D_{jl}\, x_l - f_j \Big). \tag{6}$$
This formulation constitutes a separation in the dual variables (the Lagrange multipliers). By collecting terms at each collocation point and accounting for the terms in the previous interval, we obtain a separation in the primal variables,
$$\Lambda^{(k)} = \tilde{\lambda}_0\, x_0 + \sum_{l=1}^{n} \big( \tilde{\lambda}_l\, x_l - \lambda_l\, f_l \big), \quad \text{with } \tilde{\lambda}_l = \sum_{j=1}^{n} D_{jl}\, \lambda_j. \tag{7}$$
Since the non-collocated point $x_0$ of interval $k$ coincides with the last collocation point of interval $k-1$, its term can be attributed to that collocation point.
Each term inside the round brackets in (7) is a collocation-point separated part of the Lagrangian which stems from the dynamic constraints. We denote these terms by
$$\Lambda_i = \tilde{\lambda}_i\, x_i - \lambda_i\, f_i \tag{8}$$
and introduce the notation $z_i := (x_i, u_i)$ for the primal variables at discretisation point $i$. The gradient with respect to the primal variables is
$$\nabla_{z_i} \Lambda_i = \Big( \tilde{\lambda}_i - \lambda_i\, \frac{\partial f_i}{\partial x_i}, \; -\lambda_i\, \frac{\partial f_i}{\partial u_i} \Big). \tag{9}$$
The Hessian is simply
$$\nabla_{z_i}^2 \Lambda_i = -\lambda_i\, \nabla_{z_i}^2 f_i. \tag{10}$$
2.2.1. Vector-Valued Case and Complete Element Lagrangian
For multiple control inputs and state variables, the primal variables at each collocation point become a vector $z_i \in \mathbb{R}^{n_x + n_u}$. Consistently, we define the gradient of a scalar function with respect to $z_i$ as a row vector. Thus, the model Jacobian is a matrix in $\mathbb{R}^{n_x \times (n_x + n_u)}$, and the Hessian of each model-function element $f^{(m)}$, $m = 1, \ldots, n_x$, is a square matrix of size $n_x + n_u$. The multiplier $\lambda_i$ itself also becomes a vector in $\mathbb{R}^{n_x}$. All terms involving $f$, its Jacobian, or its Hessian therefore turn into sums over the $n_x$ elements.
The full element Lagrangian consists of the terms of the dynamic constraints as derived above, plus the contributions of the objective, the integral inequalities, and the path constraints,
$$\Lambda_i^{\mathrm{tot}} = \Lambda_i + w_i\, \ell_i + w_i\, \mu^{\mathsf{T}} g_i + \nu_i^{\mathsf{T}} h_i. \tag{12}$$
The Lagrangian of the full NLP is obtained by summing these element Lagrangians, which are separated in the primal variables. Its Hessian thus is a perfect block-diagonal matrix with uniformly sized square blocks of size $n_x + n_u$.
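To make this structure concrete, the following sketch (helper name ours, numpy only) assembles the Hessian of the full Lagrangian from the uniformly sized per-point blocks; every entry outside the diagonal blocks is zero by construction:

```python
import numpy as np

def assemble_block_diagonal(blocks):
    """Stack the per-point Hessian blocks (each of size m = n_x + n_u)
    into the block-diagonal Hessian of the full Lagrangian."""
    m = blocks[0].shape[0]
    H = np.zeros((len(blocks) * m, len(blocks) * m))
    for i, B in enumerate(blocks):
        H[i * m:(i + 1) * m, i * m:(i + 1) * m] = B
    return H
```

In a practical implementation, a sparse matrix format would of course be used for $H$; the dense array here only serves to visualise the pattern.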
2.2.2. Extension to Gauss and Lobatto Collocation
Gauss collocation does not include the right interval boundary. Thus, the terms involving the state at the interval end can be included locally in each interval, which simplifies the separation in the primal variables. However, a continuity constraint linking each interval to its successor has to be introduced. Similarly to the procedure above, this constraint can be separated. The quadrature weights are stacked in $w$ without any modification.
Lobatto collocation includes both boundaries as collocation points. Thus, the matrix $D$ in (1) and (2) has an additional "zeroth" row, and the argument of $F$ in (2) includes the left boundary point as well. An additional term arises in the element Lagrangian at the interval boundaries. Each element of $w$ corresponding to the interval boundary between any two neighbouring intervals $k$ and $k+1$ is a sum of the two weights $w_n^{(k)}$ and $w_0^{(k+1)}$.
2.3. Block-Diagonal Approximation of the Hessian
The separability of the problem with respect to the primal variables allows a perfect exploitation of the problem sparsity. In fact, the Jacobian of the objective and the constraints, as well as the Hessian of the Lagrangian, can be constructed from the first and second partial derivatives of the nonlinear model functions $f$, $\ell$, $g$, and $h$ at each discretisation point [13].
We propose to also exploit the separability when calculating QN approximations of the Hessian. These iterative updates collect information about the curvature of the Lagrangian by observing the change of its gradient along the NLP iterations. Although they perform well in practice, they exhibit several drawbacks for large problems.

(I) Loss of sparsity. QN approximations generally do not preserve the sparsity pattern of the exact Hessian, which leads to low computational performance [12, Section 4.13]. Enforcing the correct sparsity pattern results in QN schemes with poor performance [14, Section 7.3].

(II) Storage versus accuracy. Due to the loss of sparsity, the approximated Hessian needs to be stored in dense format. To resolve possible memory limitations, "limited-memory" updates can be applied, which rely on a few recent gradient samples only. However, these methods provide less accuracy than their full-memory equivalents [14, Section 7.2].

(III) Dimensionality versus sampling. When sampling the gradient of a function that lives in a high-dimensional space, many samples are required to construct an accurate approximation. In fact, to obtain an approximation that is valid along any direction, an entire spanning set needs to be sampled. Although QN methods require accurate second-order information only along the direction of the steps [15], the step direction may change fast in highly nonlinear problems such as the one considered here. In these cases, an exhaustive set of gradient samples would ensure fast convergence, which conflicts with (II).
Using approximations of the small diagonal blocks, that is, exploiting the separability illustrated in Section 2.2, resolves these problems.

(I) The exact sparsity pattern of the Hessian is preserved.

(II) Only $N_c\,(n_x + n_u)^2$ numbers have to be stored, compared to $\big(N_c\,(n_x + n_u)\big)^2$ for the full Hessian.

(III) Since the dimension of each diagonal block is small, a good approximation is already obtained after few iterations of the Hessian update [3, 4].

The partitioned QN update can be combined with the exploitation of the problem sparsity to reduce the number of model evaluations required. In fact, when these two concepts are combined, the gradients of the model functions at each collocation point are sufficient to construct an accurate and sparsity-preserving approximation of the Hessian of the Lagrangian.
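The storage figures in (II) can be verified with a quick back-of-the-envelope computation; the sizes below are made up for illustration and do not correspond to the test cases of Section 3:

```python
# Entries stored by the partitioned update vs. a dense full Hessian.
# Hypothetical sizes: N_c discretisation points, m = n_x + n_u variables each.
N_c, m = 200, 9
partitioned = N_c * m ** 2       # one dense m-by-m block per discretisation point
full = (N_c * m) ** 2            # one dense matrix over all primal variables
assert full // partitioned == N_c  # the savings factor is exactly N_c
```

The memory saving thus grows linearly with the number of discretisation points, which is precisely the quantity that grows with longer time horizons.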
2.4. Implementation
Any QN approximation operates with the difference between two consecutive iterates and the difference of the corresponding gradients of the Lagrangian. For constrained problems,
$$s = \hat{z} - z, \tag{15}$$
$$y = \nabla_z L(\hat{z}, \hat{\lambda}) - \nabla_z L(z, \hat{\lambda}). \tag{16}$$
The hat indicates the values at the current iteration, that is, the new data. In the following formulas, $B$ denotes the QN approximation. Here, the damped BFGS update is used [14, Section 18.3], which reads
$$\theta = \begin{cases} 1, & s^{\mathsf{T}} y \ge 0.2\, s^{\mathsf{T}} B s, \\[4pt] \dfrac{0.8\, s^{\mathsf{T}} B s}{s^{\mathsf{T}} B s - s^{\mathsf{T}} y}, & \text{otherwise}, \end{cases} \tag{17a}$$
$$r = \theta\, y + (1 - \theta)\, B s, \tag{17b}$$
$$\hat{B} = B - \frac{B s\, s^{\mathsf{T}} B}{s^{\mathsf{T}} B s} + \frac{r\, r^{\mathsf{T}}}{s^{\mathsf{T}} r}. \tag{17c}$$
This update scheme preserves positive definiteness, which is mandatory if a line-search globalisation is used. In a trust-region framework, indefinite approaches such as the safeguarded SR1 update [14, Section 6.2] could be advantageous since they can approximate the generally indefinite Hessian of the full or element Lagrangian more accurately.
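As a concrete sketch, the damped update of [14, Section 18.3] can be transcribed into a few lines of numpy (the function name is ours); the damping keeps the updated matrix positive definite even when the raw curvature along the step is negative:

```python
import numpy as np

def damped_bfgs_update(B, s, y):
    """Damped BFGS update of the approximation B (cf. [14], Section 18.3).

    Powell's damping replaces y by r = theta*y + (1 - theta)*(B @ s) such
    that s @ r >= 0.2 * (s @ B @ s), preserving positive definiteness.
    """
    Bs = B @ s
    sBs = s @ Bs
    sy = s @ y
    # damping factor
    theta = 1.0 if sy >= 0.2 * sBs else 0.8 * sBs / (sBs - sy)
    r = theta * y + (1.0 - theta) * Bs
    # rank-two update of the approximation
    return B - np.outer(Bs, Bs) / sBs + np.outer(r, r) / (s @ r)
```

When no damping is active ($\theta = 1$), the updated matrix satisfies the secant condition $\hat{B} s = y$ exactly, which is a convenient sanity check for any implementation.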
The Hessian block corresponding to the element Lagrangian (12) at discretisation point $i$ is approximated as follows. In the difference of the gradients, all linear terms cancel. Thus, (16) becomes
$$s_i = \hat{z}_i - z_i, \quad y_i = \nabla_{z_i} \Lambda_i^{\mathrm{tot}}(\hat{z}_i, \hat{\lambda}, \hat{\mu}, \hat{\nu}_i) - \nabla_{z_i} \Lambda_i^{\mathrm{tot}}(z_i, \hat{\lambda}, \hat{\mu}, \hat{\nu}_i). \tag{18}$$
Recall that the linear index $i$ enumerates collocation node $j$ in interval $k$. The QN update (17a)–(17c) is applied to each diagonal block individually, with $s_i$ and $y_i$ given by (18). As initialisation, one of the approaches described in [14, Chapter 6] can be used.
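The partitioned scheme then amounts to slicing the step and the gradient difference per discretisation point and updating each diagonal block on its own. The sketch below uses a plain BFGS formula with a crude curvature safeguard instead of the damped update (17a)–(17c); all names are ours:

```python
import numpy as np

def bfgs_block(B, s, y, eps=1e-12):
    """BFGS update of one diagonal block; skipped when the curvature
    condition s @ y > 0 fails (a crude safeguard for illustration)."""
    sy = s @ y
    if sy <= eps:
        return B
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / sy

def partitioned_update(blocks, s, y, m):
    """Update every diagonal block from the per-point slices (length m
    each) of the full primal step s and of the gradient difference y."""
    return [bfgs_block(B, s[i * m:(i + 1) * m], y[i * m:(i + 1) * m])
            for i, B in enumerate(blocks)]
```

Each updated block satisfies the secant condition with respect to its own slice of the data, so the curvature information enters exactly where the exact Hessian has its nonzero entries.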
2.5. Engineering Test Problem
As a real-world engineering problem, we consider the minimisation of the fuel consumption of a turbocharged diesel engine. The statutory limits for the cumulative NO$_x$ and soot emissions are imposed as integral inequality constraints. The control inputs are the position of the variable-geometry turbine, the start of the main injection, the common-rail pressure, and the fuel mass injected per cylinder and combustion cycle. The model is described and its experimental validation is provided in [16]. It features a dynamic mean-value model for the air path and physics-based models for the combustion and for the pollutant emissions. The resulting OCP is stated in [17, 18]. The desired load torque, the bounds on the actuator ranges, and mechanical and thermal limits are imposed as nonlinear and linear path constraints (3d).
The model evaluations are expensive. Therefore, QN updates are preferable over exact Newton methods to achieve a fast solution process. The main drawback is the slow local convergence rate of QN methods when applied to the large NLPs resulting from the consideration of long time horizons in the OCP [18].
3. Results and Discussion
The results presented here are generated using the NLP solver IPOPT [19, 20] with the linear solver MUMPS [21] and the fill-reducing preordering implemented in METIS [22]. Either the exact Hessian, calculated by central finite differences of the model functions, or the full or the partitioned QN update described above is supplied to the solver as a user-defined Hessian. In all cases, the first derivatives are calculated by forward finite differences.
Radau collocation at flipped Legendre nodes is applied. These collocation points are the roots of the orthogonal Legendre polynomials and have to be computed numerically [23, Section 2.3]. The resulting scheme, sometimes termed Radau IIA, exhibits a combination of advantageous properties [24], [25, Section 3.5].
Two test cases for the engineering problem outlined in Section 2.5 are considered. Case (a) is a mild driving pattern of 6 s duration which is discretised by first-order collocation with a uniform step length of 0.5 s. This discretisation results in 12 collocation points in total and 117 NLP variables. Test case (b) considers a more demanding driving pattern of 58 s duration and uses third-order collocation with a step size of 0.8 s, resulting in 216 collocation points and 1,953 NLP variables.
The performance is assessed in terms of the number of iterations required to achieve the requested tolerance. Figure 1 shows the convergence behaviour of the exact-Newton method and of the two QN methods. Starting with iteration 5, the full Newton step is always accepted. Thus, the difference between the local quadratic convergence of the exact Newton method and the local superlinear convergence of the full BFGS update becomes obvious. The partitioned update performs substantially better than the full update. Moreover, the advantage becomes more pronounced when the size (longer time horizon) and the complexity (more transient driving profile, higher-order collocation) of the problem are increased from the simple test case (a) to the more meaningful case (b).
The Hessian approximation is initialised by a multiple of the identity matrix, $B^{(0)} = \gamma\, I$. A suitable value of $\gamma$ is found for the problem at hand. Table 1 shows the number of iterations required as $\gamma$ is changed. The partitioned update is robust against a poor initialisation, whereas the full update requires a significant number of iterations to recover. This finding confirms that an accurate approximation is obtained in fewer iterations when the partitioned QN update is applied.

4. Conclusion
We illustrated the separability of the nonlinear program resulting from the application of direct collocation to an optimal control problem. Subsequently, we presented how this structure can be exploited to apply a partitioned quasi-Newton update to the Hessian of the Lagrangian. This sparsity-preserving update yields a more accurate approximation of the Hessian in fewer iterations and thus increases the convergence rate of the NLP solver.
A more accurate approximation of the second derivatives from first-order information is especially beneficial for highly nonlinear problems whose exact second derivatives are expensive to evaluate. In fact, for the real-world engineering problem used as a test case here, symbolic or algorithmic differentiation is not an expedient option due to the complexity and the structure of the model. In this situation, using a quasi-Newton approximation based on first derivatives calculated by finite differences is a valuable alternative. The numerical tests presented in this paper indicate that a convergence rate close to that of an exact Newton method can be reclaimed by the application of a partitioned BFGS update.
A self-contained implementation of the partitioned update within the framework of an NLP solver itself could fully exploit the advantages of the proposed method. Furthermore, it should be assessed whether a trust-region globalisation is able to take advantage of an indefinite but possibly more accurate quasi-Newton approximation of the diagonal blocks of the Hessian of the Lagrangian.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgment
This work was partially funded by the Swiss Innovation Promotion Agency CTI under Grant no. 10808.1 PFIWIW.
References
[1] A. Griewank and P. L. Toint, "On the unconstrained optimization of partially separable functions," in Nonlinear Optimization, M. Powell, Ed., Academic Press, New York, NY, USA, 1981.
[2] A. Griewank and P. L. Toint, "Local convergence analysis for partitioned quasi-Newton updates," Numerische Mathematik, vol. 39, no. 3, pp. 429–448, 1982.
[3] D. C. Liu and J. Nocedal, "On the limited memory BFGS method for large scale optimization," Mathematical Programming, vol. 45, no. 1–3, pp. 503–528, 1989.
[4] J. Nocedal, "Large scale unconstrained optimization," in The State of the Art in Numerical Analysis, pp. 311–338, Oxford University Press, New York, NY, USA, 1996.
[5] L. T. Biegler, "Solution of dynamic optimization problems by successive quadratic programming and orthogonal collocation," Computers and Chemical Engineering, vol. 8, no. 3–4, pp. 243–247, 1984.
[6] D. Tieu, W. R. Cluett, and A. Penlidis, "A comparison of collocation methods for solving dynamic optimization problems," Computers and Chemical Engineering, vol. 19, no. 4, pp. 375–381, 1995.
[7] A. L. Herman and B. A. Conway, "Direct optimization using collocation based on high-order Gauss-Lobatto quadrature rules," Journal of Guidance, Control, and Dynamics, vol. 19, no. 3, pp. 592–599, 1996.
[8] D. A. Benson, G. T. Huntington, T. P. Thorvaldsen, and A. V. Rao, "Direct trajectory optimization and costate estimation via an orthogonal collocation method," Journal of Guidance, Control, and Dynamics, vol. 29, no. 6, pp. 1435–1440, 2006.
[9] S. Kameswaran and L. T. Biegler, "Convergence rates for direct transcription of optimal control problems using collocation at Radau points," Computational Optimization and Applications, vol. 41, no. 1, pp. 81–126, 2008.
[10] D. Garg, M. Patterson, W. W. Hager, A. V. Rao, D. A. Benson, and G. T. Huntington, "A unified framework for the numerical solution of optimal control problems using pseudospectral methods," Automatica, vol. 46, no. 11, pp. 1843–1851, 2010.
[11] J.-P. Berrut and L. N. Trefethen, "Barycentric Lagrange interpolation," SIAM Review, vol. 46, no. 3, pp. 501–517, 2004.
[12] J. T. Betts, Practical Methods for Optimal Control and Estimation Using Nonlinear Programming, Advances in Design and Control, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2nd edition, 2010.
[13] M. A. Patterson and A. V. Rao, "Exploiting sparsity in direct collocation pseudospectral methods for solving optimal control problems," Journal of Spacecraft and Rockets, vol. 49, no. 2, pp. 364–377, 2012.
[14] J. Nocedal and S. J. Wright, Numerical Optimization, Springer Series in Operations Research and Financial Engineering, Springer, New York, NY, USA, 2nd edition, 2006.
[15] P. E. Gill and E. Wong, "Sequential quadratic programming methods," in Mixed Integer Nonlinear Programming, J. Lee and S. Leyffer, Eds., vol. 154 of The IMA Volumes in Mathematics and its Applications, pp. 301–312, Springer, New York, NY, USA, 2012.
[16] J. Asprion, O. Chinellato, and L. Guzzella, "Optimisation-oriented modelling of the NO$_x$ emissions of a diesel engine," Energy Conversion and Management, vol. 75, pp. 61–73, 2013.
[17] J. Asprion, C. H. Onder, and L. Guzzella, "Including drag phases in numerical optimal control of diesel engines," in Proceedings of the 7th IFAC Symposium on Advances in Automotive Control, pp. 489–494, Tokyo, Japan, 2013.
[18] J. Asprion, O. Chinellato, and L. Guzzella, "Optimal control of diesel engines: numerical methods, applications, and experimental validation," Mathematical Problems in Engineering, vol. 2014, Article ID 286538, 21 pages, 2014.
[19] A. Wächter and L. T. Biegler, "On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming," Mathematical Programming, vol. 106, no. 1, pp. 25–57, 2006.
[20] "IPOPT homepage," 2013, http://www.coinor.org/Ipopt.
[21] "MUMPS homepage," 2013, http://mumps.enseeiht.fr.
[22] "METIS—serial graph partitioning and fill-reducing matrix ordering," 2013, http://glaros.dtc.umn.edu/gkhome/metis/metis/overview.
[23] C. Canuto, M. Y. Hussaini, A. Quarteroni, and T. A. Zang, Spectral Methods: Fundamentals in Single Domains, Scientific Computation, Springer, Berlin, Germany, 2006.
[24] E. Hairer and G. Wanner, "Stiff differential equations solved by Radau methods," Journal of Computational and Applied Mathematics, vol. 111, no. 1–2, pp. 93–111, 1999.
[25] J. C. Butcher, Numerical Methods for Ordinary Differential Equations, John Wiley & Sons, Chichester, UK, 2003.
Copyright
Copyright © 2014 Jonas Asprion et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.