A Stochastic Optimal Regulator for a Class of Nonlinear Systems
This work investigates an optimal control problem for a class of stochastic differential bilinear systems affected by a persistent disturbance generated by a nonlinear stochastic exogenous system (nonlinear drift and multiplicative state noise). The optimal control problem aims at minimizing the average value of a standard quadratic-cost functional on a finite horizon. It is assumed that neither the state of the system nor the state of the exosystem is directly measurable (incomplete information case). The approach is based on the Carleman embedding, which allows one to approximate the nonlinear stochastic exosystem by a bilinear system (linear drift and multiplicative noise) with respect to an extended state that includes the Kronecker powers of the state up to a chosen degree. In this way, the stochastic optimal control problem can be restated in a bilinear setting, and the optimal solution is sought among all the affine transformations of the measurements. The present work is a nontrivial extension of previous work by the authors, where the Carleman approach was exploited in a framework in which only additive noise was considered for the state and for the exosystem. Numerical simulations support the theoretical results by showing that the regulator performance improves as the order of the approximation increases.
1. Introduction
Consider the following stochastic differential system described by means of the Itô formalism: where is the state of the system, is the control input, and is the measured output. and are independent standard Wiener processes with respect to a family of increasing σ-algebras , referred to a probability space . The standard assumption rank is made. The initial state is an -measurable random vector, independent of and . is a persistent disturbance generated by the following nonlinear stochastic exogenous differential system (the exosystem): where is a smooth nonlinear map. is the element of which represents a standard Wiener process with respect to , independent of the state and output noise processes and .
Linear and nonlinear exosystems have been widely exploited to model uncertainties as well as sustained perturbations, especially within applicative engineering frameworks such as missile systems, robotics, and wind turbines (one can refer to  and references therein, where an observer-based approach is exploited to estimate the unknown exosystem). Explicit noises in the exosystem dynamics could further enhance the nonlinearities of the system, and they have been considered in the recent literature for an exosystem with linear drift .
The optimal control problem investigated here refers to the following standard quadratic-cost index to be minimized on a finite horizon : where S and Q are symmetric positive-semidefinite matrices and R is a symmetric positive-definite matrix. stands for the expectation operator.
The problem under investigation falls within the context of stochastic optimal control for nonlinear systems. Even when the stochastic disturbances are neglected, the nonlinear nature of the optimal control problem does not guarantee an analytical solution via the maximum principle, because the latter requires solving a two-point boundary value problem (see, e.g., [2–4] and references therein). For stochastic systems, such investigations have usually been carried out assuming complete knowledge of the state of the system (i.e., the state is directly available, with no need for outputs providing possibly noisy or incomplete measurements of it). The usual difficulties concern the solution of the Hamilton–Jacobi–Bellman (HJB) equations associated with the optimal control problem: in , the stochastic HJB equation is solved iteratively by successive approximations; in , the infinite-time HJB equation is reformulated as an eigenvalue problem; in , a transformation approach is proposed for solving the HJB equation arising in quadratic-cost control for nonlinear deterministic and stochastic systems. Finally, in a pair of recent papers, a solution to the nonlinear HJB equation is provided by expressing it in the form of decoupled Forward and Backward Stochastic Differential Equations (FBSDEs), for an - and an -type optimal control setting (see [8, 9], respectively). As stated above, the solutions proposed in these references rely on complete knowledge of the state of the system; thus, they do not require any nonlinear state-estimation algorithm to infer information from noisy measurements.
To the best of our knowledge, the only reference that deals with stochastic optimal control problems in a nonlinear framework with incomplete knowledge of the state is , though nonlinearities are restricted only to the diffusion term where the noise affects the state dynamics in a multiplicative fashion; the state drift and the output equation providing noisy measurements are both linear.
To cope with the incomplete information case, a state-estimation algorithm is required. In this case, the optimal state estimate among all the Borel transformations of the measurements requires knowledge of the whole conditional probability density, given by the solution of the Kolmogorov equation, a nontrivial infinite-dimensional problem. Several methods to compute it can be found in the literature, based either on the numerical solution of PDEs (see, e.g., the recent finite element approaches [11, 12]) or on Monte Carlo approaches such as, among others, particle methods  or multilevel Monte Carlo methods . All these approaches share a nontrivial computational cost.
A different philosophy consists in introducing an approximation of the original setting, so that the optimization problem is restated in a form for which solutions are available in the literature. In this case, a tradeoff must be sought between the simplification provided by the approximation and its departure from the real case. For instance, the Extended Kalman Filter relies on the linearization of a stochastic nonlinear system and is among the most widely used algorithms for real-time state estimation because of its simplicity ; nonetheless, there are many applications where linearization is a very coarse approximation of reality and such filters simply do not work.
In the spirit of the aforementioned philosophy, in this note, we apply the Carleman approximation to the nonlinear exosystem. The Carleman approach consists in embedding the original nonlinear stochastic differential system into an infinite-dimensional system whose state collects the Kronecker powers of any order of the original state. With respect to such a state, the dynamics can be written in bilinear form (linear drift and multiplicative noise), and the ν-degree Carleman approximation is obtained by truncating the Kronecker powers of order higher than ν. The idea is further supported by recent results on polynomial filtering, which take advantage of the polynomial structure of the problem to achieve more accurate estimates, as in [16, 17]. Bilinear systems have attracted increasing interest since the early 1970s, when they began to be investigated as an appealing class of nearly linear systems ; according to more recent literature, they arise in different fields of engineering and the mathematical sciences, including economics, electronic circuits, and theoretical biology (see  and references therein). Moreover, suboptimal state-estimation algorithms suitably designed for stochastic bilinear systems are available [20, 21]. Within this framework, the Carleman embedding technique has been successfully applied in recent years in both discrete- and continuous-time settings to solve filtering problems by first reformulating them in bilinear form and then applying known suboptimal algorithms (see, e.g., [22–25]).
In this note, once the Carleman bilinear approximation of the exosystem is coupled to the state equations, the optimal solution of the reformulated problem is still not implementable, because no finite-dimensional algorithm is available for the optimal control. Therefore, by extending the results of , we propose the optimal linear regulator, i.e., the optimal solution among all of the -valued square-integrable affine transformations of the observations. A similar approach can also be found in  in a discrete-time framework, and in  in a continuous-time framework, where only additive noise was considered for the state and for the exosystem. The present note is a nontrivial extension of , since multiplicative noise is now considered in both the system dynamics and the exosystem dynamics.
2. Carleman Approximation of the Stochastic Exosystem
Consider the Taylor polynomial expansion around a given point for the exosystem, supposed to exist according to standard analyticity hypotheses. By properly exploiting the Kronecker formalism, defining the displacement , it follows:
The square brackets denote the Kronecker power, and the differential operator applied to a generic function is defined as follows: with and the Jacobian of the vector function ψ (see the Appendix for a quick survey on the Kronecker algebra). Thus, by taking into account (2) and (4), we have that where . We will drop hereafter the explicit dependence of in to shorten notation. The differential of the Kronecker powers of the displacement in (7) must then be computed in order to build the Carleman embedding. By standard Itô calculus , it follows, for any ,
By exploiting Lemma 1 in the Appendix, according to the definition of matrices and in (A.4), with denoting the identity matrix in and
By means of the definitions equation (8) can be written in more compact form:
Collecting the Kronecker powers , in a unique vector , one obtains the following infinite-dimensional bilinear differential equation: with and with
The ν-degree Carleman approximation consists in collecting the first ν components of vector in the finite-dimensional vector: and then describing its dynamics according to the finite-dimensional version of (13): in which , , , are the finite-dimensional matrices obtained by retaining the first ν blocks of the Carleman embedding matrices, i.e., the first rows and columns of (14)–(18). Then, we need to substitute in (1) by means of its Carleman approximation provided by . By doing this, the state dynamics no longer refer to the original ; therefore, it will be replaced by . For the same reason, we replace with .
Remark 1. It is worth noting that the Itô correction term in (8) introduces nontrivial blocks in , the dynamic matrix of the Carleman embedding, defined in (14)-(15), see also (11). These blocks could play an active role in determining the stability properties of the Carleman approximation. Clearly, such investigation gains much more importance for control problems that involve the asymptotic behavior of the system, such as in optimal control with infinite-horizon cost functionals.
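As a minimal illustration of the truncation step and of the Itô correction mentioned in Remark 1, the following sketch builds the ν-degree Carleman matrices for a hypothetical scalar exosystem dz = (a z + b z^2) dt + σ z dW, which is not the exosystem considered in this note. In the scalar case the Kronecker powers reduce to ordinary powers, and Itô's formula gives d(z^k) = [(k a + k(k−1)σ²/2) z^k + k b z^(k+1)] dt + k σ z^k dW; dropping z^(ν+1) closes the system. The function name and the parameters a, b, sigma, nu are assumptions introduced only for this example.

```python
import numpy as np

def carleman_scalar(a, b, sigma, nu):
    """Truncated Carleman matrices for dz = (a z + b z^2) dt + sigma z dW.

    The extended state is Z = [z, z^2, ..., z^nu]; the truncated dynamics read
    dZ = A Z dt + B Z dW, with the power z^(nu+1) dropped.
    """
    A = np.zeros((nu, nu))
    B = np.zeros((nu, nu))
    for k in range(1, nu + 1):
        # Drift of z^k: k*a from the linear term plus the Ito correction.
        A[k - 1, k - 1] = k * a + 0.5 * k * (k - 1) * sigma**2
        if k < nu:
            # Coupling with the next power, produced by the quadratic term.
            A[k - 1, k] = k * b
        # Multiplicative noise acting on z^k.
        B[k - 1, k - 1] = k * sigma
    return A, B

A, B = carleman_scalar(a=-1.0, b=0.5, sigma=0.2, nu=3)
```

In this toy case the Itô correction k(k−1)σ²/2 enters the diagonal of the drift matrix with a positive sign and grows with k, which gives a concrete picture of how such terms may affect the stability of the truncated system.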
Therefore, the state dynamics (1), endowed with the output measurements, are now replaced by the following equations: where Finally, with the aim of writing equation (22) in a more compact form, we define the extended state vector according to which the finite-dimensional bilinear extended system is obtained: Considering (24), the cost functional (3) becomes with
3. Optimal Linear Regulator
By means of the Carleman approximation scheme, the original nonlinear optimal control problem (1)–(3) is now restated as the problem of minimizing (29) subject to the bilinear system (24). As already stated in the Introduction, the optimal solution still cannot be obtained by a finite-dimensional state-estimation algorithm; therefore, we look for suboptimal solutions. To this end, we synthesize the solution providing the minimum of index (29) among all the -valued square-integrable affine transformations of the random variables . Such a problem (quadratic cost functional and bilinear differential system) has been formalized in , where a solution is given. It is worth noting that the solution provided in  is not directly applicable here because of a constant deterministic drift and an additive noise in the state equation of the extended system. The extension of  to such a case has been presented in : indeed, although the original nonlinear frameworks addressed in this note and in  are different, the Carleman approximation provides the same mathematical structure for both embedded systems. This fact highlights an advantage of the Carleman approximation, which allows quite different filtering/control problems to be restated in a unifying bilinear formulation. Besides, in the spirit of , the proposed results can be interpreted as a Separation Principle in a suboptimal sense, since the optimal linear filter is designed independently of the optimal regulator, which uses the state estimate in the incomplete information case. In summary, the results in  apply almost directly and are summarized in the following theorem, which resembles the Separation Principle.
Theorem 1 . Suppose a solution exists for the following backward generalized Riccati equations: with . Then, the solution to the optimal control problem of minimizing the cost criterion (29), under the differential constraints (24), with is given by with where is the optimal (in the sense of the minimum error variance) estimate of among all the -valued square-integrable affine transformations of , which is the projection of onto : (formally, the projection onto is a random variable such that the difference is orthogonal to , i.e., is uncorrelated with all random variables in ).
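The projection property stated at the end of Theorem 1 can be illustrated in a static, finite-sample setting. The sketch below is not the dynamic filter of this note: it computes the best affine estimate of a random vector from a vector of observations using sample moments, with all names and the synthetic data being assumptions made for illustration, so that the resulting estimation error can be checked to be uncorrelated with the observations.

```python
import numpy as np

def affine_projection(x, y):
    """Best affine estimate of x from y, computed from sample moments.

    x: (N, n) samples of the quantity to estimate.
    y: (N, m) samples of the observations.
    Returns the (N, n) array of estimates E[x] + Cxy Cyy^{-1} (y - E[y]).
    """
    mx, my = x.mean(axis=0), y.mean(axis=0)
    xc, yc = x - mx, y - my
    cxy = xc.T @ yc / len(x)              # cross-covariance, shape (n, m)
    cyy = yc.T @ yc / len(y)              # observation covariance, shape (m, m)
    gain = np.linalg.solve(cyy, cxy.T).T  # Cxy Cyy^{-1}, using symmetry of cyy
    return mx + yc @ gain.T

rng = np.random.default_rng(0)
x = rng.normal(size=(5000, 2))
# Hypothetical observations: one nonlinear channel, one noisy linear channel.
y = np.hstack([x[:, :1] ** 3, x[:, 1:] + 0.5 * rng.normal(size=(5000, 1))])
x_hat = affine_projection(x, y)
err = x - x_hat
```

By construction, the sample cross-covariance between `err` and `y` vanishes, which is the finite-sample counterpart of the orthogonality condition; the estimate is optimal only within the affine class, not among all Borel functions of the observations.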
Remark 2. It is worth noting that sums in equation (31) include more terms than the corresponding ones exploited in . This is because, different from , here, the multiplicative noise in the state dynamics makes nontrivial matrices for , see (26).
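To give a feeling for how a backward Riccati equation is integrated from its terminal condition, the sketch below treats the classical linear-quadratic equation −dP/dt = AᵀP + PA − PBR⁻¹BᵀP + Q with P(T) = S. This is a simplification assumed for illustration: it omits the additional sums due to multiplicative noise discussed in Remark 2, and the matrix names and the explicit Euler scheme are choices made here, not taken from the theorem.

```python
import numpy as np

def riccati_backward(A, B, Q, R, S, T, steps):
    """Integrate -dP/dt = A'P + PA - P B R^{-1} B' P + Q backward from P(T) = S.

    Explicit Euler in reversed time; returns P on the grid t_0 = 0, ..., t_steps = T.
    """
    dt = T / steps
    P = np.empty((steps + 1,) + S.shape)
    P[-1] = S
    for k in range(steps, 0, -1):
        Pk = P[k]
        rhs = A.T @ Pk + Pk @ A - Pk @ B @ np.linalg.solve(R, B.T @ Pk) + Q
        P[k - 1] = Pk + dt * rhs  # step from t_k back to t_{k-1}
    return P

# Scalar check: A = 0, B = Q = R = 1, S = 0 gives P(t) = tanh(T - t).
one = np.eye(1)
P = riccati_backward(0 * one, one, one, one, 0 * one, T=1.0, steps=2000)
gain0 = np.linalg.solve(one, one.T @ P[0])  # feedback gain R^{-1} B' P(0)
```

In the scalar check the closed-form solution is available, so the numerical value at t = 0 can be compared with tanh(1); a higher-order integrator would be preferable for stiff or large problems.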
Concerning the optimal linear filter providing in (32), the following theorem provides the equations. Its proof is a straightforward consequence of Theorem 4.2 in .
Theorem 2. Consider the stochastic system (24) with , , as in (33), and as in (34). Then, satisfies the equation: with and , where is the error covariance matrix evolving according to the equation: with and obeying the following equations:
Note that, for the optimal initialization of the filtering algorithm associated with the proposed control law, the second-order moments of the initial extended state must be finite and available, which means finite and available moments up to order 2 for and up to order for .
Remark 3. The filter proposed in Theorem 2 provides the optimal linear estimate of as a function of the observations . However, the available measurements are given by the output y (instead of ); therefore, the differential in (33) should be replaced by .
4. Numerical Simulations
Numerical simulations refer to a second-order system for (1) with scalar input and scalar output:
According to (2), two kinds of exosystems are considered:
Disregarding the noise, provides a unique asymptotically stable equilibrium at the origin. The linear approximation around the origin exhibits the same qualitative behavior as the nonlinear exosystem, since it is asymptotically stable. Consequently, the first-order Carleman approximation is expected to work well, with higher-order Carleman approximations playing a marginal role. On the other hand, with regard to , we have a unique unstable equilibrium point at the origin, and the qualitative behavior exhibits a limit cycle in the absence of noise. By applying the Carleman linearization around the origin, we find that the approximate linear exosystem is marginally stable. Indeed, disregarding the noise, both the original nonlinear system and its linear approximation exhibit sustained oscillations, but these converge to different limit cycles (the one related to the linear approximation strongly depends on the initial conditions). This fact may result in an unsatisfactory linear approximation, and the addition of noise may further worsen the performance.
The following matrices are involved in the exosystem (2):
Initial conditions are , for all the simulations, whilst initial conditions for the filtering algorithm are , . The cost functional weight matrices are , , and . The Carleman approximation of the exosystem is achieved around the origin.
One hundred numerical simulations have been run for both types of exosystems, using the Euler–Maruyama method , with integration step in the time interval . As expected, with regard to , results show increasing improvements of the performance index up to degree , although the improvements with respect to the first-order case are small (, , , ). Results are shown in Figure 1. On the other hand, with regard to , results show that the second-order Carleman approximation is enough to provide a significant improvement in the cost functional (, ), with a reduction of the average value of the cost functional of about 15%. Higher-order approximations exhibit a behavior analogous to the second-order case (the cost functional seems to approach a plateau). Results are shown in Figure 2.
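For readers unfamiliar with the Euler–Maruyama scheme, a minimal sketch follows. It simulates a hypothetical scalar equation with multiplicative noise, dz = −z dt + 0.3 z dW, chosen only as an example and unrelated to the exosystems above, and averages over 100 sample paths in the same Monte Carlo spirit as the simulations reported here. The function name, step count, and parameters are assumptions.

```python
import numpy as np

def euler_maruyama(drift, diffusion, z0, T, steps, n_paths, rng):
    """Simulate dz = drift(z) dt + diffusion(z) dW for n_paths scalar paths."""
    dt = T / steps
    z = np.full(n_paths, z0, dtype=float)
    for _ in range(steps):
        dW = rng.normal(scale=np.sqrt(dt), size=n_paths)  # Wiener increments
        z = z + drift(z) * dt + diffusion(z) * dW
    return z

rng = np.random.default_rng(1)
# Hypothetical test equation with multiplicative noise (not the paper's exosystem).
zT = euler_maruyama(lambda z: -z, lambda z: 0.3 * z, z0=1.0, T=1.0,
                    steps=1000, n_paths=100, rng=rng)
sample_mean = zT.mean()  # Monte Carlo estimate of E[z(T)]
```

With the diffusion set to zero the scheme reduces to the explicit Euler method, which offers a simple sanity check of an implementation before noise is switched on.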
5. Conclusions
A stochastic optimal control problem has been investigated for bilinear stochastic differential systems, driven by a persistent perturbation provided by an exogenous stochastic nonlinear system with multiplicative noise. This work represents a nontrivial extension of a previous work where additive noise (instead of multiplicative noise as in the present case) was considered affecting both state and exosystem. The approach followed relies on the Carleman embedding, successfully exploited in the stochastic framework in the last decade both for filtering and control purposes.
A. Kronecker Algebra
The symbol denotes the Kronecker matrix product, and the notation is used for the Kronecker power of matrix A, that is, , repeated i times. The trivial case is . Any further details can be found in .
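A short numerical sketch of the Kronecker power may help fix ideas. It relies on `numpy.kron` and assumes the usual convention that the zeroth Kronecker power equals the scalar 1; the helper name `kron_power` is introduced here for illustration only.

```python
import numpy as np
from functools import reduce

def kron_power(A, i):
    """Kronecker power A^[i] = A ⊗ A ⊗ ... ⊗ A (i factors); A^[0] = 1 by convention."""
    return reduce(np.kron, [A] * i, np.ones((1, 1)))

x = np.array([[1.0], [2.0]])  # column vector in R^2
x2 = kron_power(x, 2)         # entries x1*x1, x1*x2, x2*x1, x2*x2
```

Note that the second Kronecker power of a vector in R^2 has four entries, two of which coincide; this redundancy is the price paid for the compact matrix formalism.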
The Kronecker product is not commutative: given a pair of integers , the symbol denotes a commutation matrix, that is a matrix in such that given any two matrices and where , are defined so that, denoted their entries:
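The role of commutation matrices can be checked numerically. The sketch below uses one common convention, assumed here and possibly indexed differently from the definition above: K(m, n) is the permutation matrix mapping the column-stacking of an m×n matrix to the column-stacking of its transpose. Under this convention, for A of size m×n and B of size p×q, one has K(p, m) (A ⊗ B) K(n, q) = B ⊗ A.

```python
import numpy as np

def commutation_matrix(m, n):
    """Return K (mn x mn) such that K @ vec(A) = vec(A.T) for any m x n matrix A,
    where vec stacks columns."""
    K = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            # A[i, j] sits at position i + j*m in vec(A) and j + i*n in vec(A.T).
            K[j + i * n, i + j * m] = 1.0
    return K

rng = np.random.default_rng(0)
A = rng.normal(size=(2, 3))   # m x n
B = rng.normal(size=(4, 5))   # p x q
lhs = commutation_matrix(4, 2) @ np.kron(A, B) @ commutation_matrix(3, 5)
```

Here `lhs` coincides with `np.kron(B, A)`, showing that the two Kronecker products differ only by row and column permutations.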
The following lemma allows one to compute the first- and second-order differentials of the Kronecker power of a given vector and has been used to obtain (12) from (8). An early version is in . Here, the version of  is reported, because of the recursive computation of the matrix coefficients.
Lemma 1. For any , it results that where and are recursively computed as for , with .
Data Availability
There are no experimental data.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
References
A. E. Bryson and Y. C. Ho, Applied Optimal Control, Wiley, New York, NY, USA, 1995.
W. Bangerth and R. Rannacher, Adaptive Finite Element Methods for Differential Equations, Birkhäuser, Basel, Switzerland, 2013.
A. H. Jazwinski, Stochastic Processes and Filtering Theory, Academic Press, Cambridge, MA, USA, 1970.
F. Cacace, F. Conte, A. Germani, and G. Palombo, “Quadratic filtering for non-Gaussian and not asymptotically stable linear discrete-time systems,” in Proceedings of the 53rd IEEE Conference on Decision and Control (CDC), Los Angeles, CA, USA, December 2014.
N. S. Patil and S. N. Sharma, “On the mathematical theory of a time-varying bilinear stochastic differential system and its application to two dynamic circuits,” Transactions of the Institute of Systems, Control and Information Engineers, vol. 27, no. 12, pp. 485–492, 2014.
R. S. Liptser and A. N. Shiryayev, Statistics of Random Processes I and II, Springer, Berlin, Germany, 1977.