Stochastic Systems 2013
Relationship between Maximum Principle and Dynamic Programming for Stochastic Recursive Optimal Control Problems and Applications
This paper is concerned with the relationship between the maximum principle and dynamic programming for stochastic recursive optimal control problems. Under certain differentiability conditions, relations among the adjoint processes, the generalized Hamiltonian function, and the value function are given. A linear quadratic recursive utility portfolio optimization problem in financial engineering is discussed as an explicit illustration of the main result.
The nonlinear backward stochastic differential equation (BSDE) was introduced by Pardoux and Peng. Independently, Duffie and Epstein introduced BSDEs from an economic background, presenting a stochastic differential formulation of recursive utility. Recursive utility is an extension of the standard additive utility in which the instantaneous utility depends not only on the instantaneous consumption rate but also on the future utility. As found by El Karoui et al., the utility process can be regarded as a solution to a special BSDE. An optimal control problem in which the cost functional is described by the solution to a BSDE is called a stochastic recursive optimal control problem. In this case, the control systems become forward-backward stochastic differential equations (FBSDEs). This kind of optimal control problem has found important applications in real-world areas such as mathematical economics, mathematical finance, and engineering (see Schroder and Skiadas, El Karoui et al. [3, 5], Ji and Zhou, Williams, and Wang and Wu).
It is well known that Pontryagin's maximum principle and Bellman's dynamic programming are two of the most important tools for solving stochastic optimal control problems; see the famous reference book by Yong and Zhou for a systematic discussion. For stochastic recursive optimal control problems, Peng first obtained a maximum principle when the control domain is convex. Xu then studied the nonconvex control domain case, under the assumption that the diffusion coefficient does not contain the control variable. Ji and Zhou established a maximum principle when the forward state is constrained in a convex set at the terminal time. Wu established a general maximum principle, where the control domain is nonconvex and the diffusion coefficient depends on the control variable. The maximum principle for stochastic recursive optimal control systems with Poisson jumps, together with its applications in finance, was studied in Shi and Wu, where the control domain is convex.
Dynamic programming is another important approach to stochastic recursive optimal control problems. Peng [14] (see also Peng [15]) first obtained the generalized dynamic programming principle and introduced a generalized Hamilton-Jacobi-Bellman (HJB) equation, which is a second-order parabolic partial differential equation (PDE). It is also proved there that the value function is a viscosity solution to the generalized HJB equation. Wu and Yu extended the results of [14, 15] to an obstacle constraint, with the cost functional described by the solution to a reflected backward stochastic differential equation, and proved that the value function is the unique viscosity solution to their generalized HJB equation. Li and Peng generalized the results of [14, 15] by considering a cost functional defined by a controlled BSDE with jumps. They proved that the value function is a viscosity solution to the associated generalized HJB equation with integral-differential operators.
Hence, a natural question arises: are there any relations between these two methods? Such a topic was intuitively discussed by Bismut and Bensoussan and then studied by many researchers. Under certain differentiability conditions, the relationship between the maximum principle and dynamic programming is essentially the relationship between the derivatives of the value function and the solution to the adjoint equation along the optimal state. However, the smoothness conditions do not hold in general and are difficult to verify a priori; see Zhou for the deterministic case and Yong and Zhou for its stochastic counterpart. Zhou first obtained the relationship between the general maximum principle and dynamic programming using viscosity solution theory (see also Zhou or Yong and Zhou), without the assumption that the value function is smooth. For diffusions with jumps, the relationship between the maximum principle and dynamic programming was first given by Framstad et al. [23, 24] under certain differentiability conditions, and then Shi and Wu eliminated these restrictions within the framework of viscosity solutions. For singular stochastic optimal control problems, the relationship between the maximum principle and dynamic programming was given by Bahlali et al. in terms of the derivatives of the value function. For the Markovian regime-switching jump diffusion model, the relationship was given by Zhang et al., also in terms of the derivatives of the value function.
In this paper, we derive the relationship between the maximum principle and dynamic programming for the stochastic recursive optimal control problem. For this problem, we connect the maximum principle of [10] with the dynamic programming of [14, 15] under certain differentiability conditions. Specifically, when the value function is smooth, we give relations among the adjoint processes, the generalized Hamiltonian function, and the value function. To this end, in Section 2, we first adapt some related results of [14, 15], which in this paper are stated as a stochastic verification theorem. We also prove that, under additional convexity conditions, the necessary conditions in the maximum principle of [10] are in fact sufficient. In Section 3, we establish the relationship between the maximum principle and dynamic programming for our stochastic recursive optimal control problem, under certain differentiability conditions, via the martingale representation technique. In Section 4, we discuss a linear quadratic (LQ) recursive utility portfolio optimization problem in financial engineering. In this problem, the state feedback optimal control is obtained by both the maximum principle and the dynamic programming approach, and the relations we obtained are illustrated explicitly. Finally, we end this paper with some concluding remarks in Section 5.
Notations. Throughout this paper, we denote by R^n the n-dimensional Euclidean space, by R^{n×d} the space of n × d matrices, and by S^n the space of n × n symmetric matrices. ⟨·,·⟩ and |·| denote the scalar product and norm in the Euclidean space, respectively. The superscript ⊤ denotes the transpose of a matrix.
2. Problem Statement and Preliminaries
Let (Ω, F, P) be a complete probability space equipped with a d-dimensional standard Brownian motion W(·). For fixed t ∈ [0, T], the filtration {F_s^t}_{s≥t} is generated as F_s^t = σ{W(r) − W(t): t ≤ r ≤ s} ∨ N, where N contains all P-null sets in F and σ{·} denotes the σ-field generated by the indicated family. In particular, if t = 0, we write F_s = F_s^0.
Let T > 0 be finite and let the control domain U ⊆ R^k be nonempty and convex. For any initial time and state (t, x) ∈ [0, T] × R^n, consider the state process X^{t,x;u}(·) given by the controlled SDE (1), whose coefficients b and σ are given functions.
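In standard form (the superscript notation X^{t,x;u} and the coefficient names b, σ are assumptions consistent with the surrounding text, not taken verbatim from the source), such a controlled SDE reads:

```latex
dX^{t,x;u}(s) = b\big(s, X^{t,x;u}(s), u(s)\big)\,ds
              + \sigma\big(s, X^{t,x;u}(s), u(s)\big)\,dW(s), \quad s \in [t, T],
\qquad X^{t,x;u}(t) = x .
```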
Given t ∈ [0, T], we denote by U[t, T] the set of {F_s^t}-adapted, U-valued processes. For given t and u(·) ∈ U[t, T], an R^n-valued process X(·) is called a solution to (1) if it is an {F_s^t}-adapted process such that (1) holds. We refer to such u(·) as an admissible control and (X(·), u(·)) as an admissible pair. We assume the following.

(H1) b and σ are uniformly continuous in (s, x, u), and there exists a constant C > 0 such that, for all s ∈ [0, T], x, x̄ ∈ R^n, u, ū ∈ U,
|b(s, x, u) − b(s, x̄, ū)| + |σ(s, x, u) − σ(s, x̄, ū)| ≤ C(|x − x̄| + |u − ū|),
|b(s, 0, u)| + |σ(s, 0, u)| ≤ C.

For any u(·) ∈ U[t, T], under (H1), SDE (1) has a unique solution by the classical SDE theory (see, e.g., Yong and Zhou).
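As a numerical illustration of the state dynamics under (H1), the sketch below simulates a one-dimensional controlled SDE with the Euler-Maruyama scheme. The linear coefficients and the constant control are hypothetical examples chosen to satisfy the Lipschitz and growth bounds; they are not the coefficients of the paper.

```python
import math
import random

def euler_maruyama(b, sigma, x0, t0, T, u, n_steps, rng):
    """Simulate dX = b(s, X, u) ds + sigma(s, X, u) dW by Euler-Maruyama."""
    dt = (T - t0) / n_steps
    x, s = x0, t0
    for _ in range(n_steps):
        dW = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment ~ N(0, dt)
        x = x + b(s, x, u(s)) * dt + sigma(s, x, u(s)) * dW
        s += dt
    return x

# Hypothetical linear coefficients satisfying (H1)-type bounds.
b = lambda s, x, u: -0.5 * x + u
sigma = lambda s, x, u: 0.2 * x

rng = random.Random(0)
xT = euler_maruyama(b, sigma, 1.0, 0.0, 1.0, lambda s: 0.1, 200, rng)
```

With the diffusion and control switched off, the scheme reduces to the deterministic ODE x' = -0.5 x, giving X(1) ≈ e^{-1/2}, which provides a simple correctness check.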
Next, we introduce the controlled BSDE (3), coupled with the controlled SDE (1); its generator f and terminal function φ are given. We assume the following.

(H2) f and φ are uniformly continuous in their arguments, and there exists a constant C > 0 such that, for all s ∈ [0, T], x, x̄ ∈ R^n, y, ȳ ∈ R, z, z̄ ∈ R^d, u, ū ∈ U,
|f(s, x, y, z, u) − f(s, x̄, ȳ, z̄, ū)| ≤ C(|x − x̄| + |y − ȳ| + |z − z̄| + |u − ū|),
|f(s, 0, 0, 0, u)| + |φ(0)| ≤ C.

Then for any u(·) ∈ U[t, T] and the given unique solution X(·) to (1), under (H2), BSDE (3) admits a unique solution by the classical BSDE theory (see Pardoux and Peng or Yong and Zhou).
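In standard form (again with assumed notation: solution pair (Y, Z), generator f, terminal function φ), the controlled BSDE coupled with the forward equation reads:

```latex
-\,dY^{t,x;u}(s) = f\big(s, X^{t,x;u}(s), Y^{t,x;u}(s), Z^{t,x;u}(s), u(s)\big)\,ds
                 - Z^{t,x;u}(s)\,dW(s), \quad s \in [t, T],
\qquad Y^{t,x;u}(T) = \phi\big(X^{t,x;u}(T)\big).
```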
Given u(·) ∈ U[t, T], we introduce the cost functional (5).
Our recursive stochastic optimal control problem is the following.
Problem 1 (RSOCP). For given (t, x) ∈ [0, T) × R^n, minimize (5) subject to (1)–(3) over U[t, T].
We define the value function by (6). Any u(·) ∈ U[t, T] that achieves the above infimum is called an optimal control, and the corresponding solutions X(·) to (1) and (Y(·), Z(·)) to (3) are called the optimal state. For simplicity, we refer to (X(·), Y(·), Z(·), u(·)) as the optimal quartet.
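Under the convention common in this literature (an assumption on our part; the source's sign convention may differ), the cost functional (5) and the value function (6) take the form:

```latex
J\big(t, x; u(\cdot)\big) \triangleq Y^{t,x;u}(s)\big|_{s=t},
\qquad
V(t,x) \triangleq \inf_{u(\cdot) \in \,\mathcal{U}[t,T]} J\big(t, x; u(\cdot)\big).
```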
Remark 1. Because b, σ, f, φ are all deterministic functions, it follows from [15, Proposition 5.1] that, under (H1) and (H2), the above value function is a deterministic function of (t, x). So our definition (6) is meaningful.
We introduce the generalized HJB equation (7), where the generalized Hamiltonian function G is defined by (8).
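For reference, Peng's generalized HJB equation and generalized Hamiltonian are commonly written as follows (a sketch under assumed notation; the exact display in the source may differ, e.g., in the ordering of G's arguments):

```latex
\begin{cases}
\dfrac{\partial V}{\partial t}(t,x)
  + \inf_{u \in U} G\big(t, x, V(t,x), V_x(t,x), V_{xx}(t,x), u\big) = 0,
  & (t,x) \in [0,T) \times \mathbb{R}^n, \\
V(T,x) = \phi(x), & x \in \mathbb{R}^n,
\end{cases}
```

with

```latex
G(t, x, r, p, A, u) \triangleq
  \tfrac{1}{2}\,\mathrm{tr}\big(\sigma(t,x,u)\,\sigma(t,x,u)^{\top} A\big)
  + \langle p,\, b(t,x,u)\rangle
  + f\big(t, x, r,\, \sigma(t,x,u)^{\top} p,\, u\big).
```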
We have the following result.
Lemma 2 (stochastic verification theorem). Let (H1)-(H2) hold and let (t, x) ∈ [0, T) × R^n be fixed. Suppose that v is a classical solution to (7); then (9) holds. Furthermore, an admissible pair is optimal for Problem (RSOCP) if and only if the corresponding minimum condition holds for a.e. s ∈ [t, T], P-a.s.
Proof. For any u(·) ∈ U[t, T] with the corresponding state, applying Itô's formula to v(s, X(s)), we obtain the claimed inequality; thus (9) holds. Next, applying this inequality to an optimal pair, the desired result follows immediately from the generalized HJB equation (7). The proof is complete.
This kind of control system was studied by Peng, and a maximum principle was obtained. In order to state his result, we need the following assumption.

(H3) b, σ, f are continuously differentiable in (x, y, z, u), and φ is continuously differentiable in x. Moreover, the derivatives b_x, b_u, σ_x, σ_u, f_x, f_y, f_z, f_u are bounded, and there exists a constant C > 0 such that |φ_x(x)| ≤ C(1 + |x|) for all x.
Let (X̄(·), Ȳ(·), Z̄(·), ū(·)) be an optimal quartet. For all s ∈ [t, T], we denote, for example, b̄(s) ≡ b(s, X̄(s), ū(s)), and similar notations are used for σ, f, and all their derivatives.
We introduce the adjoint equation, which is the FBSDE (17), together with the associated Hamiltonian function H.
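For orientation, one common formulation of the adjoint FBSDE and the Hamiltonian for recursive control problems (following Peng's maximum principle; written for d = 1, and with the symbols m, p, q and the sign conventions being our assumptions rather than the source's) is:

```latex
\begin{cases}
dm(s) = \bar{f}_y(s)\, m(s)\, ds + \bar{f}_z(s)\, m(s)\, dW(s), & m(t) = 1,\\[2pt]
-dp(s) = \big[\, \bar{b}_x(s)^{\top} p(s) + \bar{\sigma}_x(s)^{\top} q(s)
        + \bar{f}_x(s)\, m(s) \,\big]\, ds - q(s)\, dW(s), & p(T) = \phi_x\big(\bar{X}(T)\big)^{\top} m(T),
\end{cases}
```

with

```latex
H(s, x, y, z, u, m, p, q) \triangleq
  \langle p,\, b(s,x,u)\rangle + \langle q,\, \sigma(s,x,u)\rangle + m\, f(s,x,y,z,u).
```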
Under (H1)–(H3), the forward equation in (17) admits an obvious unique solution, and then the backward equation in (17) admits a unique solution as well. We call these solution processes the adjoint processes. Next, the following result holds.
Lemma 3 (necessary maximum principle). Let (H1)–(H3) hold and let (t, x) be fixed. Suppose that ū(·) is an optimal control for Problem (RSOCP) and (X̄(·), Ȳ(·), Z̄(·)) is the corresponding optimal state, and let the adjoint processes be given by (17). Then the maximum condition (19) holds for a.e. s ∈ [t, T], P-a.s.
Proof. It is an immediate consequence of [10, Theorem 4.4 of Peng].
As we mentioned in the introduction, we can also prove that, under some additional convexity conditions, the above necessary condition in Lemma 3 is also sufficient.
Lemma 4 (sufficient maximum principle). Let (H1)–(H3) hold. Suppose that ū(·) is an admissible control and (X̄(·), Ȳ(·), Z̄(·)) is the corresponding state, and let the adjoint processes be given by (17). Suppose that the Hamiltonian function H is convex in (x, y, z, u). Then ū(·) is an optimal control for Problem (RSOCP) if it satisfies (19).
Proof. Let u(·) be any admissible control with corresponding state (X(·), Y(·), Z(·)). By Remark 1, the comparison of cost functionals holds for fixed (t, x). Applying Itô's formula, noting (14) and (17), and using the convexity of H together with the maximum condition (19), we conclude that ū(·) is indeed an optimal control for Problem (RSOCP). The proof is complete.
3. Relationship between Maximum Principle and Dynamic Programming
In this section, we investigate the relationship between the maximum principle and dynamic programming, that is, the connection among the value function V, the generalized Hamiltonian function G, and the adjoint processes. Our main result is the following.
Theorem 5. Let (H1)–(H3) hold and let (t, x) be fixed. Suppose that ū(·) is an optimal control for Problem (RSOCP) and (X̄(·), Ȳ(·), Z̄(·)) is the corresponding optimal state, and let the adjoint processes be given by (17). If the value function V is smooth enough, then (23) holds for a.e. s ∈ [t, T], P-a.s. Furthermore, if V is still smoother and V_tx is also continuous, then (24) holds, where the auxiliary process is given by (25).
Proof. Obviously, (25) can be obtained by solving the forward SDE in (17) directly. Now let us prove (24). By [15, Theorem 5.4], for fixed (t, x), it is easy to obtain that the value function evaluated along the optimal state coincides with the optimal recursive utility, V(s, X̄(s)) = Ȳ(s).
Define a square-integrable {F_s^t}-martingale
Thus, by the martingale representation theorem (see Yong and Zhou), there exists a unique square-integrable adapted process satisfying
where (t, x) is fixed. Then
On the other hand, applying Itô's formula to V(s, X̄(s)), we obtain
Comparing the above two equalities, we conclude that
However, by the uniqueness of the solution to BSDE (3), we have
Since V is smooth, it satisfies the generalized HJB equation (7), which implies (23). Also, by (7), the infimum in the generalized Hamiltonian is attained along the optimal pair. Consequently, if V is smoother still and V_tx is also continuous, then the corresponding first-order relation holds; recalling (8), this is equivalent to a pointwise inequality holding for all u ∈ U along the optimal state.
On the other hand, applying Itô's formula to the relevant processes, we get the corresponding dynamics. Applying Itô's formula once again, and then invoking the uniqueness of solutions to the adjoint equation (17), we obtain (24). The proof is complete.
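The martingale representation step used in the proof has a transparent discrete-time analogue on a symmetric random walk, sketched below as a toy illustration (it is not the construction used in the paper): for the martingale M_n = E[g(S_N) | F_n] generated by i.i.d. ±1 increments, every increment of M can be written as Z_n · eps_{n+1} with a predictable integrand Z_n.

```python
from itertools import product

def martingale_representation(g, N):
    """On the N-step +/-1 random walk, return M(prefix) = E[g(S_N) | F_n]
    and the predictable integrand Z with M_{n+1} - M_n = Z_n * eps_{n+1}."""
    def M(prefix):
        n, s = len(prefix), sum(prefix)
        tails = list(product([1, -1], repeat=N - n))
        return sum(g(s + sum(t)) for t in tails) / len(tails)
    def Z(prefix):
        # Discrete "Malliavin" difference: half the up/down spread of M.
        return (M(prefix + (1,)) - M(prefix + (-1,))) / 2.0
    return M, Z

# Verify the representation on every path for a quadratic terminal payoff.
N = 4
M, Z = martingale_representation(lambda s: s * s, N)
for path in product([1, -1], repeat=N):
    for n in range(N):
        prefix, eps = path[:n], path[n]
        assert abs(M(prefix + (eps,)) - M(prefix) - Z(prefix) * eps) < 1e-12
```

The identity holds because M(prefix) is the equal-weight average of its two one-step successors, so the up/down spread captures the whole increment.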
4. Applications to Financial Portfolio Optimization
In this section, we consider an LQ recursive utility portfolio optimization problem in financial engineering. In this problem, the optimal portfolio in state feedback form is obtained by both the maximum principle and the dynamic programming approach, and the relations obtained in Theorem 5 are illustrated explicitly.
Suppose the investor has two kinds of securities in the market for possible investment choice. (i) A risk-free security (e.g., a bond), whose price at time s is given by (39); here the interest rate is a bounded deterministic function. (ii) A risky security (e.g., a stock), whose price at time s is given by (40); here W(·) is a 1-dimensional Brownian motion, and the appreciation rate and volatility are bounded deterministic functions, with the volatility bounded away from zero.
Let π(s) denote the total market value of the investor's wealth invested in the risky security, which we call the portfolio. Given the initial wealth x_0 > 0 and combining (39) and (40), we can get the wealth dynamics (41). We denote by U[0, T] the set of admissible portfolios valued in U.
For any given initial wealth x_0, Kohlmann and Zhou discussed a mean-variance portfolio optimization problem; that is, the investor's objective is to find an admissible portfolio which minimizes the variance Var[X(T)] of the wealth at some future time T under the condition that the expected terminal wealth equals a given level. Using the Lagrange multiplier method, we know that this is equivalent to studying problem (42), where the multiplier is given. Using the completion-of-squares technique, an optimal portfolio in state feedback form was obtained there via a stochastic Riccati equation and a BSDE. The optimal value function was also obtained.
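The Lagrange-transformed mean-variance problem can be probed numerically: the toy sketch below grid-searches a constant portfolio minimizing a Monte Carlo estimate of E[(X(T) − c)^2] under Euler-discretized wealth dynamics. All market parameters, the target c, and the restriction to constant portfolios are hypothetical simplifications; the paper's solution is a genuine state feedback obtained in closed form.

```python
import math
import random

def terminal_wealth(pi, x0, r, mu, sig, T, dWs):
    """Euler scheme for dX = [r X + pi (mu - r)] ds + pi sig dW (amount pi in stock)."""
    dt = T / len(dWs)
    x = x0
    for dW in dWs:
        x += (r * x + pi * (mu - r)) * dt + pi * sig * dW
    return x

def mc_objective(pi, c, n_paths=1000, n_steps=50, seed=7, **dyn):
    """Monte Carlo estimate of E[(X(T) - c)^2] for a constant portfolio pi."""
    rng = random.Random(seed)  # common random numbers across candidate pi's
    dt = dyn["T"] / n_steps
    total = 0.0
    for _ in range(n_paths):
        dWs = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n_steps)]
        xT = terminal_wealth(pi, dyn["x0"], dyn["r"], dyn["mu"], dyn["sig"], dyn["T"], dWs)
        total += (xT - c) ** 2
    return total / n_paths

dyn = dict(x0=1.0, r=0.03, mu=0.08, sig=0.2, T=1.0)   # hypothetical market data
grid = [i * 0.1 for i in range(0, 21)]                 # candidate constant portfolios
best = min(grid, key=lambda pi: mc_objective(pi, c=1.2, **dyn))
```

Using the same random seed for every candidate (common random numbers) makes the grid search deterministic and reduces comparison noise.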
In this paper, we generalize the above mean-variance portfolio optimization problem to a recursive utility portfolio optimization problem. Recursive utility means that the utility at a given time is a function of the future utility (in this section, we do not consider consumption). In fact, in our framework, the recursive utility can be assumed to satisfy some controlled BSDE.
We consider a small investor, endowed with initial wealth x_0, who chooses at each time s his/her portfolio π(s). The investor wants to choose an optimal portfolio to maximize the recursive utility functional (43), whose generator involves a given constant.
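A discretized recursion makes the recursive-utility idea concrete: the utility at each date aggregates the future utility through the generator. The linear generator f(y) = -beta*y used below is a hypothetical stand-in for the paper's generator; for a deterministic terminal payoff the recursion then reproduces exponential discounting, which gives a closed-form check.

```python
import math

def recursive_utility(terminal, f, T, n_steps):
    """Backward Euler for Y_k = Y_{k+1} + f(Y_{k+1}) * dt,
    a discrete analogue of -dY = f(Y) ds with deterministic payoff (no noise)."""
    dt = T / n_steps
    y = terminal
    for _ in range(n_steps):
        y = y + f(y) * dt   # step backward in time: today's utility from tomorrow's
    return y

beta = 0.5  # hypothetical discount-type constant
y0 = recursive_utility(terminal=1.0, f=lambda y: -beta * y, T=1.0, n_steps=100000)
# For f(y) = -beta * y, the continuous-time limit is exp(-beta * T) * terminal.
```

Nonlinear generators are handled by the same recursion, which is exactly what distinguishes recursive utility from plain discounted utility.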
Remark 6. In fact, the recursive utility functional (43) defined above represents a standard additive utility of recursive type. It is a meaningful and nontrivial generalization of the classical standard additive utility and has many applications in mathematical economics and mathematical finance. For more details about utility functions, see Duffie and Epstein, Section 1.4 of El Karoui et al., or Schroder and Skiadas.
More precisely, for any admissible portfolio π(·), the investor's utility functional is defined as follows, where
In fact, in our framework, the wealth process and the recursive utility process can be regarded as the solution to the following controlled FBSDE, and our portfolio optimization problem can be rewritten equivalently in minimization form.
Since we are going to invoke dynamic programming in treating the above problem, we adopt the formulation of Section 2. Let (t, x) be given. For any admissible portfolio π(·), consider the following controlled FBSDE. Our recursive utility portfolio optimization problem (49) is then to find an optimal portfolio minimizing the recursive utility functional, and the value function is defined as the corresponding infimum.
We can check that all the assumptions in Section 2 are satisfied. Then we can use both the dynamic programming approach (Lemma 2) and the maximum principle approach (Lemmas 3 and 4) to solve problem (49).