Abstract

This paper is concerned with the relationship between the maximum principle and dynamic programming for stochastic recursive optimal control problems. Under certain differentiability conditions, relations among the adjoint processes, the generalized Hamiltonian function, and the value function are given. A linear quadratic recursive utility portfolio optimization problem in financial engineering is discussed as an explicitly worked example of the main result.

1. Introduction

The nonlinear backward stochastic differential equation (BSDE) was introduced by Pardoux and Peng [1]. Independently, Duffie and Epstein [2] introduced BSDEs from an economic background. In [2], they presented a stochastic differential formulation of recursive utility. Recursive utility is an extension of the standard additive utility in which the instantaneous utility depends not only on the instantaneous consumption rate but also on the future utility. As found by El Karoui et al. [3], the utility process can be regarded as a solution to a special BSDE. An optimal control problem in which the cost functional is described by the solution to a BSDE is called a stochastic recursive optimal control problem. In this case, the control systems become forward-backward stochastic differential equations (FBSDEs). This kind of optimal control problem has found important applications in real-world problems arising in mathematical economics, mathematical finance, and engineering (see Schroder and Skiadas [4], El Karoui et al. [3, 5], Ji and Zhou [6], Williams [7], and Wang and Wu [8]).

It is well known that Pontryagin’s maximum principle and Bellman’s dynamic programming are two of the most important tools for solving stochastic optimal control problems; see the famous reference book by Yong and Zhou [9] for a systematic discussion. For stochastic recursive optimal control problems, Peng [10] first obtained a maximum principle when the control domain is convex. Xu [11] then studied the nonconvex control domain case, under the additional assumption that the diffusion coefficient does not contain the control variable. Ji and Zhou [6] established a maximum principle when the forward state is constrained in a convex set at the terminal time. Wu [12] established a general maximum principle, where the control domain is nonconvex and the diffusion coefficient depends on the control variable. Maximum principles for stochastic recursive optimal control systems with Poisson jumps, together with their applications in finance, were studied by Shi and Wu [13], where the control domain is convex.

For dynamic programming, the other important approach to stochastic recursive optimal control problems, Peng [14] (see also Peng [15]) first obtained the generalized dynamic programming principle and introduced a generalized Hamilton-Jacobi-Bellman (HJB) equation, which is a second-order parabolic partial differential equation (PDE). It is also proved in [14] that the value function is a viscosity solution to the generalized HJB equation. Wu and Yu [16] extended the results of [14, 15] to an obstacle constraint for the cost functional described by the solution to a reflected backward stochastic differential equation and proved that the value function is the unique viscosity solution to their generalized HJB equation. Li and Peng [17] generalized the results of [14, 15] by considering the cost functional defined by a controlled BSDE with jumps. They proved that the value function is a viscosity solution to the associated generalized HJB equation with integro-differential operators.

Hence, a natural question arises: are there any relations between these two methods? Such a topic was intuitively discussed by Bismut [18] and Bensoussan [19] and then studied by many researchers. Under certain differentiability conditions, the relationship between the maximum principle and dynamic programming is essentially the relationship between the derivatives of the value function and the solution to the adjoint equation along the optimal state. However, the smoothness conditions do not hold in general and are difficult to verify a priori; see Zhou [20] for the deterministic case and Yong and Zhou [9] for its stochastic counterpart. Zhou [21] first obtained the relationship between the general maximum principle and dynamic programming using viscosity solution theory (see also Zhou [22] or Yong and Zhou [9]), without assuming that the value function is smooth. For diffusions with jumps, the relationship between the maximum principle and dynamic programming was first given by Framstad et al. [23, 24] under certain differentiability conditions, and Shi and Wu [25] then removed these restrictions within the framework of viscosity solutions. For singular stochastic optimal control problems, the relationship between the maximum principle and dynamic programming was given by Bahlali et al. [26] in terms of the derivatives of the value function. For the Markovian regime-switching jump diffusion model, the corresponding relationship was given by Zhang et al. [27], also in terms of the derivatives of the value function.

In this paper, we derive the relationship between the maximum principle and dynamic programming for the stochastic recursive optimal control problem. For this problem, we connect the maximum principle of [10] with the dynamic programming of [14, 15], under certain differentiability conditions. Specifically, when the value function is smooth, we give relations among the adjoint processes, the generalized Hamiltonian function, and the value function. To this end, in Section 2, we first adapt some related results of [14, 15], which in this paper are stated as a stochastic verification theorem. We also prove that, under additional convexity conditions, the necessary conditions in the maximum principle of [10] are in fact sufficient. In Section 3, we establish the relationship between the maximum principle and dynamic programming under certain differentiability conditions for our stochastic recursive optimal control problem, by the martingale representation technique. In Section 4, we discuss a linear quadratic (LQ) recursive utility portfolio optimization problem in financial engineering. In this problem, the state feedback optimal control is obtained by both the maximum principle and dynamic programming approaches, and the relations we obtained are illustrated explicitly. Finally, we end this paper with some concluding remarks in Section 5.

Notations. Throughout this paper, we denote by $\mathbb{R}^n$ the $n$-dimensional Euclidean space, by $\mathbb{R}^{n\times d}$ the space of $n\times d$ real matrices, and by $\mathcal{S}^n$ the space of $n\times n$ symmetric matrices. $\langle\cdot,\cdot\rangle$ and $|\cdot|$ denote the scalar product and the norm in the Euclidean space, respectively. A $\top$ appearing in the superscripts denotes the transpose of a matrix.

2. Problem Statement and Preliminaries

Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a complete probability space equipped with a $d$-dimensional standard Brownian motion $\{W_s\}_{s\ge 0}$. For fixed $t\in[0,T]$, the filtration $\{\mathcal{F}^t_s\}_{t\le s\le T}$ is generated as $\mathcal{F}^t_s := \sigma\{W_r - W_t;\ t\le r\le s\}\vee\mathcal{N}$, where $\mathcal{N}$ contains all $\mathbb{P}$-null sets in $\mathcal{F}$ and $\sigma\{\cdot\}$ denotes the $\sigma$-field generated by the indicated family of random variables. In particular, if $t = 0$, we write $\mathcal{F}_s := \mathcal{F}^0_s$.

Let $T > 0$ be finite and let $U\subseteq\mathbb{R}^k$ be nonempty and convex. For any initial time $t\in[0,T)$ and state $x\in\mathbb{R}^n$, consider the state process $X^{t,x;u}(\cdot)$ given by the following controlled SDE:
$$\begin{cases} dX^{t,x;u}_s = b\big(s, X^{t,x;u}_s, u_s\big)\,ds + \sigma\big(s, X^{t,x;u}_s, u_s\big)\,dW_s, & s\in[t,T],\\ X^{t,x;u}_t = x. \end{cases}\tag{1}$$
Here $b: [0,T]\times\mathbb{R}^n\times U\to\mathbb{R}^n$, $\sigma: [0,T]\times\mathbb{R}^n\times U\to\mathbb{R}^{n\times d}$ are given functions.

Given $t\in[0,T)$, we denote by $\mathcal{U}[t,T]$ the set of $\{\mathcal{F}^t_s\}$-adapted processes taking values in $U$. For given $u(\cdot)\in\mathcal{U}[t,T]$ and $x\in\mathbb{R}^n$, an $\mathbb{R}^n$-valued process $X^{t,x;u}(\cdot)$ is called a solution to (1) if it is an $\{\mathcal{F}^t_s\}$-adapted process such that (1) holds. We refer to such $u(\cdot)$ as an admissible control and $(X^{t,x;u}(\cdot), u(\cdot))$ as an admissible pair. We assume the following.

(H1) $b$, $\sigma$ are uniformly continuous in $(s, x, u)$, and there exists a constant $C > 0$ such that for all $s\in[0,T]$, $x, x'\in\mathbb{R}^n$, $u, u'\in U$,
$$\big|b(s,x,u) - b(s,x',u')\big| + \big|\sigma(s,x,u) - \sigma(s,x',u')\big| \le C\big(|x - x'| + |u - u'|\big),\qquad |b(s,0,u)| + |\sigma(s,0,u)| \le C.\tag{2}$$

For any $u(\cdot)\in\mathcal{U}[t,T]$, under (H1), SDE (1) has a unique solution by the classical SDE theory (see, e.g., Yong and Zhou [9]).
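Although not needed for the theory, the forward dynamics (1) are straightforward to simulate. The following is a minimal Euler-Maruyama sketch in Python, assuming a scalar state and hypothetical Lipschitz coefficients $b$, $\sigma$ (illustrative choices, not taken from this paper):

```python
import numpy as np

# Minimal Euler-Maruyama sketch for the controlled SDE (1), assuming a scalar
# state. The coefficients b, sigma and the feedback control are hypothetical
# stand-ins chosen only to satisfy (H1) and make the sketch runnable.

def b(s, x, u):          # drift: Lipschitz in (x, u), as (H1) requires
    return -0.5 * x + u

def sigma(s, x, u):      # diffusion: Lipschitz in (x, u), as (H1) requires
    return 0.2 * x + 0.1 * u

def simulate_state(t, x, T, u_feedback, n_steps=1000, seed=0):
    """Simulate X^{t,x;u} on [t, T] under a feedback control u = u_feedback(s, x)."""
    rng = np.random.default_rng(seed)
    dt = (T - t) / n_steps
    X = np.empty(n_steps + 1)
    X[0] = x
    for i in range(n_steps):
        s = t + i * dt
        u = u_feedback(s, X[i])
        dW = rng.normal(0.0, np.sqrt(dt))
        X[i + 1] = X[i] + b(s, X[i], u) * dt + sigma(s, X[i], u) * dW
    return X

path = simulate_state(t=0.0, x=1.0, T=1.0, u_feedback=lambda s, x: -x)
```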

Next, we introduce the following controlled BSDE coupled with the controlled SDE (1):
$$\begin{cases} -dY^{t,x;u}_s = f\big(s, X^{t,x;u}_s, Y^{t,x;u}_s, Z^{t,x;u}_s, u_s\big)\,ds - Z^{t,x;u}_s\,dW_s, & s\in[t,T],\\ Y^{t,x;u}_T = \phi\big(X^{t,x;u}_T\big). \end{cases}\tag{3}$$
Here $f: [0,T]\times\mathbb{R}^n\times\mathbb{R}\times\mathbb{R}^d\times U\to\mathbb{R}$, $\phi: \mathbb{R}^n\to\mathbb{R}$ are given functions. We assume the following.

(H2) $f$, $\phi$ are uniformly continuous in $(s, x, y, z, u)$ and there exists a constant $C > 0$ such that for all $s\in[0,T]$, $x, x'\in\mathbb{R}^n$, $y, y'\in\mathbb{R}$, $z, z'\in\mathbb{R}^d$, $u, u'\in U$,
$$\big|f(s,x,y,z,u) - f(s,x',y',z',u')\big| + \big|\phi(x) - \phi(x')\big| \le C\big(|x - x'| + |y - y'| + |z - z'| + |u - u'|\big),\qquad |f(s,0,0,0,u)| + |\phi(0)| \le C.\tag{4}$$

Then for any $u(\cdot)\in\mathcal{U}[t,T]$ and the given unique solution $X^{t,x;u}(\cdot)$ to (1), under (H2), BSDE (3) admits a unique solution $(Y^{t,x;u}(\cdot), Z^{t,x;u}(\cdot))$ by the classical BSDE theory (see Pardoux and Peng [1] or Yong and Zhou [9]).
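The coupled system (1) and (3) can also be approximated numerically. The sketch below uses a basic least-squares Monte Carlo backward induction; the driver $f$, terminal function $\phi$, and feedback control are hypothetical stand-ins satisfying (H1)-(H2), and the conditional expectations are approximated by polynomial regression:

```python
import numpy as np

# Least-squares Monte Carlo sketch for BSDE (3): simulate X forward, then step
# backward with Y_i ~ E[Y_{i+1} | X_i] + f(.) dt and Z_i ~ E[Y_{i+1} dW_i | X_i] / dt,
# both conditional expectations estimated by polynomial regression.

def f(s, x, y, z, u):                 # hypothetical driver of BSDE (3)
    return -0.1 * y + 0.05 * z

def phi(x):                           # terminal condition Y_T = phi(X_T)
    return x ** 2

def solve_bsde(t=0.0, x0=1.0, T=1.0, n_paths=5000, n_steps=50, deg=3, seed=0):
    rng = np.random.default_rng(seed)
    dt = (T - t) / n_steps
    dW = rng.normal(0.0, np.sqrt(dt), (n_paths, n_steps))
    u = lambda s, x: -x               # hypothetical feedback control
    X = np.empty((n_paths, n_steps + 1)); X[:, 0] = x0
    for i in range(n_steps):          # forward Euler-Maruyama pass for (1)
        s = t + i * dt
        X[:, i+1] = X[:, i] + (-0.5*X[:, i] + u(s, X[:, i]))*dt \
                    + (0.2*X[:, i])*dW[:, i]
    Y = phi(X[:, -1])                 # backward pass for (3)
    for i in reversed(range(n_steps)):
        s = t + i * dt
        basis = np.vander(X[:, i], deg + 1)
        Z = basis @ np.linalg.lstsq(basis, Y * dW[:, i] / dt, rcond=None)[0]
        Ey = basis @ np.linalg.lstsq(basis, Y, rcond=None)[0]
        Y = Ey + f(s, X[:, i], Ey, Z, u(s, X[:, i])) * dt
    return Y.mean()                   # Monte Carlo estimate of Y_t^{t,x;u}

print(solve_bsde())
```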

Given $u(\cdot)\in\mathcal{U}[t,T]$, we introduce the cost functional
$$J\big(t, x; u(\cdot)\big) := Y^{t,x;u}_s\big|_{s=t},\qquad (t,x)\in[0,T)\times\mathbb{R}^n.\tag{5}$$

Our stochastic recursive optimal control problem is the following.

Problem 1 (RSOCP). For given $(t,x)\in[0,T)\times\mathbb{R}^n$, minimize (5) subject to (1)–(3) over $\mathcal{U}[t,T]$.
We define the value function
$$V(t,x) := \inf_{u(\cdot)\in\mathcal{U}[t,T]} J\big(t,x;u(\cdot)\big),\qquad (t,x)\in[0,T)\times\mathbb{R}^n;\qquad V(T,x) = \phi(x),\quad x\in\mathbb{R}^n.\tag{6}$$
Any $\bar u(\cdot)\in\mathcal{U}[t,T]$ achieving the above infimum is called an optimal control, and the corresponding solutions $\bar X(\cdot)$ to (1) and $(\bar Y(\cdot), \bar Z(\cdot))$ to (3) are called the optimal state. For simplicity, we refer to $(\bar X(\cdot), \bar Y(\cdot), \bar Z(\cdot), \bar u(\cdot))$ as the optimal quartet.

Remark 1. Because $b$, $\sigma$, $f$, $\phi$ are all deterministic functions, from [15, Proposition 5.1], we know that under (H1) and (H2) the value function $V$ is a deterministic function of $(t, x)$. So our definition (6) is meaningful.
We introduce the following generalized HJB equation:
$$\begin{cases} v_t(t,x) + \inf_{u\in U} G\big(t, x, v(t,x), v_x(t,x), v_{xx}(t,x); u\big) = 0,& (t,x)\in[0,T)\times\mathbb{R}^n,\\ v(T,x) = \phi(x),& x\in\mathbb{R}^n, \end{cases}\tag{7}$$
where the generalized Hamiltonian function $G: [0,T]\times\mathbb{R}^n\times\mathbb{R}\times\mathbb{R}^n\times\mathcal{S}^n\times U\to\mathbb{R}$ is defined as
$$G(t, x, r, p, A; u) := \frac12\operatorname{tr}\big(\sigma(t,x,u)^\top A\,\sigma(t,x,u)\big) + \big\langle p, b(t,x,u)\big\rangle + f\big(t, x, r, p^\top\sigma(t,x,u), u\big).\tag{8}$$
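For orientation, it may help to note what (7) and (8) reduce to in the classical, nonrecursive case. The following specialization is a standard observation, recorded here for comparison:
$$f = f(t, x, u)\quad\Longrightarrow\quad G(t, x, r, p, A; u) = \frac12\operatorname{tr}\big(\sigma(t,x,u)^\top A\,\sigma(t,x,u)\big) + \big\langle p, b(t,x,u)\big\rangle + f(t, x, u),$$
so that (7) becomes the standard HJB equation of stochastic optimal control, in which the generator no longer feeds the value $r$ or the gradient term $p^\top\sigma$ back into the equation.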
We have the following result.

Lemma 2 (stochastic verification theorem). Let (H1)-(H2) hold and let $(t,x)\in[0,T)\times\mathbb{R}^n$ be fixed. Suppose that $v\in C^{1,2}([0,T]\times\mathbb{R}^n)$ is a solution to (7); then
$$v(t,x) \le J\big(t,x;u(\cdot)\big),\qquad \forall u(\cdot)\in\mathcal{U}[t,T].\tag{9}$$
Furthermore, an admissible pair $(\bar X(\cdot), \bar u(\cdot))$ is optimal for Problem (RSOCP) if and only if
$$v_t\big(s,\bar X_s\big) + G\big(s, \bar X_s, v(s,\bar X_s), v_x(s,\bar X_s), v_{xx}(s,\bar X_s); \bar u_s\big) = 0,\tag{10}$$
a.e. $s\in[t,T]$, $\mathbb{P}$-a.s.

Proof. For any $u(\cdot)\in\mathcal{U}[t,T]$ with the corresponding state $(X(\cdot), Y(\cdot), Z(\cdot))$, applying Itô’s formula to $v(s, X_s)$ and invoking the comparison theorem for BSDE (3), we obtain $v(t,x)\le Y_t = J(t,x;u(\cdot))$. Thus (9) holds. Next, applying the above inequality to $(\bar X(\cdot), \bar u(\cdot))$, we have $v(t,x) = J(t,x;\bar u(\cdot))$ if and only if (10) holds. The desired result follows immediately from the fact that
$$v_t(s,x) + G\big(s, x, v(s,x), v_x(s,x), v_{xx}(s,x); u\big) \ge 0,\qquad \forall (s,x,u)\in[t,T]\times\mathbb{R}^n\times U,$$
which is due to the generalized HJB equation (7). The proof is complete.

For convenience in stating the maximum principle, we regard the above controlled SDE (1) and BSDE (3) as a controlled FBSDE:
$$\begin{cases} dX^{t,x;u}_s = b\big(s, X^{t,x;u}_s, u_s\big)\,ds + \sigma\big(s, X^{t,x;u}_s, u_s\big)\,dW_s,\\ -dY^{t,x;u}_s = f\big(s, X^{t,x;u}_s, Y^{t,x;u}_s, Z^{t,x;u}_s, u_s\big)\,ds - Z^{t,x;u}_s\,dW_s,\qquad s\in[t,T],\\ X^{t,x;u}_t = x,\qquad Y^{t,x;u}_T = \phi\big(X^{t,x;u}_T\big). \end{cases}\tag{14}$$

This kind of control system was studied by Peng [10], and a maximum principle was obtained. In order to state his result, we need the following assumption.

(H3) $b$, $\sigma$, $f$ are continuously differentiable in $(x, y, z, u)$, and $\phi$ is continuously differentiable in $x$. Moreover, $b_x$, $b_u$, $\sigma_x$, $\sigma_u$, $f_x$, $f_y$, $f_z$, $f_u$ are bounded, and there exists a constant $C > 0$ such that
$$\big|\phi_x(x)\big| \le C\big(1 + |x|\big),\qquad \forall x\in\mathbb{R}^n.$$

Let $(\bar X(\cdot), \bar Y(\cdot), \bar Z(\cdot), \bar u(\cdot))$ be an optimal quartet. For all $s\in[t,T]$, we denote $b(s) := b(s,\bar X_s,\bar u_s)$, $\sigma(s) := \sigma(s,\bar X_s,\bar u_s)$, $f(s) := f(s,\bar X_s,\bar Y_s,\bar Z_s,\bar u_s)$, and similar notations are used for all their derivatives.

We introduce the adjoint equation, which is an FBSDE:
$$\begin{cases} dm_s = f_y(s)\,m_s\,ds + f_z(s)\,m_s\,dW_s,\qquad m_t = 1,\\ -dp_s = \big[b_x(s)^\top p_s + \sigma_x(s)^\top q_s + f_x(s)\,m_s\big]\,ds - q_s\,dW_s,\qquad p_T = \phi_x\big(\bar X_T\big)\,m_T,\end{cases}\tag{17}$$
and the Hamiltonian function $H: [0,T]\times\mathbb{R}^n\times\mathbb{R}\times\mathbb{R}^d\times U\times\mathbb{R}^n\times\mathbb{R}^{n\times d}\times\mathbb{R}\to\mathbb{R}$ is defined as
$$H(s, x, y, z, u, p, q, m) := \big\langle p, b(s,x,u)\big\rangle + \operatorname{tr}\big(q^\top\sigma(s,x,u)\big) + m\,f(s, x, y, z, u).\tag{18}$$

Under (H1)–(H3), the forward equation in (17) admits an obvious unique solution $m(\cdot)$, and then the backward equation in (17) admits a unique solution $(p(\cdot), q(\cdot))$. We call $(m(\cdot), p(\cdot), q(\cdot))$ the adjoint processes.
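Indeed, the forward equation in (17) is linear in $m$, so its unique solution is the stochastic exponential, written out explicitly below; this is exactly formula (25) of Theorem 5 in the next section:
$$m_s = \exp\bigg\{\int_t^s\Big(f_y(r) - \frac12\big|f_z(r)\big|^2\Big)\,dr + \int_t^s f_z(r)\,dW_r\bigg\},\qquad s\in[t,T].$$
In particular, $m_s > 0$ for all $s\in[t,T]$. Next, the following result holds.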

Lemma 3 (necessary maximum principle). Let (H1)–(H3) hold and let $(t,x)\in[0,T)\times\mathbb{R}^n$ be fixed. Suppose that $\bar u(\cdot)$ is an optimal control for Problem (RSOCP), and $(\bar X(\cdot), \bar Y(\cdot), \bar Z(\cdot))$ is the corresponding optimal state. Let $(m(\cdot), p(\cdot), q(\cdot))$ be the adjoint processes. Then
$$\big\langle H_u\big(s, \bar X_s, \bar Y_s, \bar Z_s, \bar u_s, p_s, q_s, m_s\big),\ u - \bar u_s\big\rangle \ge 0,\qquad \forall u\in U,\tag{19}$$
a.e. $s\in[t,T]$, $\mathbb{P}$-a.s.

Proof. It is an immediate consequence of [10, Theorem 4.4].

As we mentioned in the introduction, we can also prove that, under some additional convexity conditions, the above necessary condition in Lemma 3 is also sufficient.

Lemma 4 (sufficient maximum principle). Let (H1)–(H3) hold and let $(t,x)\in[0,T)\times\mathbb{R}^n$ be fixed. Suppose that $\bar u(\cdot)$ is an admissible control and $(\bar X(\cdot), \bar Y(\cdot), \bar Z(\cdot))$ is the corresponding state. Let $(m(\cdot), p(\cdot), q(\cdot))$ be the adjoint processes. Suppose that $\phi$ is convex and the Hamiltonian function $H$ is convex in $(x, y, z, u)$. Then $\bar u(\cdot)$ is an optimal control for Problem (RSOCP) if it satisfies (19).

Proof. Let $u(\cdot)\in\mathcal{U}[t,T]$ be arbitrary, with the corresponding state $(X(\cdot), Y(\cdot), Z(\cdot))$. By Remark 1, we have, for the fixed $(t,x)$, $J(t,x;u(\cdot)) - J(t,x;\bar u(\cdot)) = Y_t - \bar Y_t$. Applying Itô’s formula to $\langle p_s, X_s - \bar X_s\rangle$ and to $m_s(Y_s - \bar Y_s)$, noting (14), (17), and $X_t = \bar X_t = x$, $m_t = 1$, we can express $Y_t - \bar Y_t$ through increments of the Hamiltonian $H$ along the two admissible pairs. Since $H$ is convex in $(x, y, z, u)$ and $\phi$ is convex, the maximum condition (19) then yields $Y_t - \bar Y_t \ge 0$, that is, $J(t,x;u(\cdot)) \ge J(t,x;\bar u(\cdot))$. Thus $\bar u(\cdot)$ is indeed an optimal control for Problem (RSOCP). The proof is complete.

3. Relationship between Maximum Principle and Dynamic Programming

In this section, we investigate the relationship between the maximum principle and dynamic programming, that is, the connection among the value function $V$, the generalized Hamiltonian function $G$, and the adjoint processes $m(\cdot)$, $p(\cdot)$, $q(\cdot)$. Our main result is the following.

Theorem 5. Let (H1)–(H3) hold and let $(t,x)\in[0,T)\times\mathbb{R}^n$ be fixed. Suppose that $\bar u(\cdot)$ is an optimal control for Problem (RSOCP), and $(\bar X(\cdot), \bar Y(\cdot), \bar Z(\cdot))$ is the corresponding optimal state. Let $(m(\cdot), p(\cdot), q(\cdot))$ be the adjoint processes. If $V\in C^{1,2}([0,T]\times\mathbb{R}^n)$, then
$$\begin{cases} \bar Y_s = V\big(s,\bar X_s\big),\qquad \bar Z_s = \sigma\big(s,\bar X_s,\bar u_s\big)^\top V_x\big(s,\bar X_s\big),\\ V_t\big(s,\bar X_s\big) = -G\big(s,\bar X_s, V(s,\bar X_s), V_x(s,\bar X_s), V_{xx}(s,\bar X_s); \bar u_s\big)\\ \qquad\qquad\;\; = -\inf_{u\in U} G\big(s,\bar X_s, V(s,\bar X_s), V_x(s,\bar X_s), V_{xx}(s,\bar X_s); u\big), \end{cases}\tag{23}$$
a.e. $s\in[t,T]$, $\mathbb{P}$-a.s. Furthermore, if $V\in C^{1,3}([0,T]\times\mathbb{R}^n)$ and $V_{tx}$ is also continuous, then
$$p_s = V_x\big(s,\bar X_s\big)\,m_s,\qquad q_s = m_s\Big[V_{xx}\big(s,\bar X_s\big)\,\sigma\big(s,\bar X_s,\bar u_s\big) + V_x\big(s,\bar X_s\big)\,f_z(s)\Big],\tag{24}$$
a.e. $s\in[t,T]$, $\mathbb{P}$-a.s., where
$$m_s = \exp\bigg\{\int_t^s\Big(f_y(r) - \frac12\big|f_z(r)\big|^2\Big)\,dr + \int_t^s f_z(r)\,dW_r\bigg\},\qquad s\in[t,T].\tag{25}$$

Proof. Obviously, (25) can be obtained by solving the (linear) forward SDE in (17) directly. Now let us prove (23) and (24). By [15, Theorem 5.4], for fixed $s\in[t,T]$, it is easy to obtain
$$V\big(s, \bar X_s\big) = \bar Y_s.$$
Define the square-integrable $\{\mathcal{F}^t_s\}$-martingale
$$M_s := \mathbb{E}\bigg[\int_t^T f\big(r, \bar X_r, \bar Y_r, \bar Z_r, \bar u_r\big)\,dr + \phi\big(\bar X_T\big)\,\bigg|\,\mathcal{F}^t_s\bigg].$$
Thus, by the martingale representation theorem (see Yong and Zhou [9]), there exists a unique $N(\cdot)$ satisfying
$$M_s = M_t + \int_t^s N_r\,dW_r,\qquad s\in[t,T],$$
where $t$ is fixed. Then
$$\bar Y_s = M_s - \int_t^s f\big(r, \bar X_r, \bar Y_r, \bar Z_r, \bar u_r\big)\,dr,\qquad\text{i.e.,}\qquad d\bar Y_s = -f(s)\,ds + N_s\,dW_s.$$
On the other hand, applying Itô’s formula to $V(s, \bar X_s)$, we obtain
$$dV\big(s,\bar X_s\big) = \Big[V_t\big(s,\bar X_s\big) + \big\langle V_x\big(s,\bar X_s\big), b(s)\big\rangle + \frac12\operatorname{tr}\big(\sigma(s)^\top V_{xx}\big(s,\bar X_s\big)\,\sigma(s)\big)\Big]\,ds + V_x\big(s,\bar X_s\big)^\top\sigma(s)\,dW_s.$$
Comparing the above two equalities, we conclude that
$$V_t\big(s,\bar X_s\big) + \big\langle V_x\big(s,\bar X_s\big), b(s)\big\rangle + \frac12\operatorname{tr}\big(\sigma(s)^\top V_{xx}\big(s,\bar X_s\big)\,\sigma(s)\big) = -f(s).$$
Moreover, by the uniqueness of the solution to BSDE (3), we have
$$\bar Z_s = N_s = \sigma(s)^\top V_x\big(s,\bar X_s\big),\qquad\text{a.e. } s\in[t,T],\ \mathbb{P}\text{-a.s.}$$
Since $V\in C^{1,2}([0,T]\times\mathbb{R}^n)$, it satisfies the generalized HJB equation (7); combining this with the last two equalities and (8) implies (23). Also, by (7), we have
$$V_t(s,x) + G\big(s, x, V(s,x), V_x(s,x), V_{xx}(s,x); u\big) \ge 0,\qquad \forall (s, x, u)\in[t,T]\times\mathbb{R}^n\times U,$$
with equality along $(s, \bar X_s, \bar u_s)$. Consequently, if $V\in C^{1,3}([0,T]\times\mathbb{R}^n)$ and $V_{tx}$ is also continuous, then for a.e. $s\in[t,T]$, the map $x\mapsto V_t(s,x) + G\big(s, x, V(s,x), V_x(s,x), V_{xx}(s,x); \bar u_s\big)$ attains its minimum at $x = \bar X_s$, so its derivative in $x$ vanishes there. This is equivalent to (recall (8)), for a.e. $s\in[t,T]$,
$$V_{tx}\big(s,\bar X_s\big) + D_x G\big(s, x, V(s,x), V_x(s,x), V_{xx}(s,x); \bar u_s\big)\Big|_{x = \bar X_s} = 0,$$
where $D_x$ denotes the total derivative with respect to $x$.
On the other hand, applying Itô’s formula to $V_x(s, \bar X_s)$, we get
$$dV_x\big(s,\bar X_s\big) = \Big[V_{tx}\big(s,\bar X_s\big) + V_{xx}\big(s,\bar X_s\big)\,b(s) + \frac12\operatorname{tr}\big(\sigma(s)^\top V_{xxx}\big(s,\bar X_s\big)\,\sigma(s)\big)\Big]\,ds + V_{xx}\big(s,\bar X_s\big)\,\sigma(s)\,dW_s.$$
Note that $\bar Y_s = V(s,\bar X_s)$ and $\bar Z_s = \sigma(s)^\top V_x(s,\bar X_s)$. Applying again Itô’s formula to $V_x(s,\bar X_s)\,m_s$ and using the above first-order condition, we find that the pair
$$\Big(V_x\big(s,\bar X_s\big)\,m_s,\ \ m_s\big[V_{xx}\big(s,\bar X_s\big)\,\sigma(s) + V_x\big(s,\bar X_s\big)\,f_z(s)\big]\Big)$$
solves the backward equation in (17). Hence, by the uniqueness of the solutions to the adjoint equation (17), we obtain (24). The proof is complete.
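As a sanity check on Theorem 5, consider the classical case in which the generator $f$ does not depend on $(y, z)$. The following reduction is a standard observation (cf. Yong and Zhou [9], up to sign conventions), not a statement proved above:
$$f_y \equiv 0,\quad f_z \equiv 0\quad\Longrightarrow\quad m_s \equiv 1,\qquad p_s = V_x\big(s,\bar X_s\big),\qquad q_s = V_{xx}\big(s,\bar X_s\big)\,\sigma\big(s,\bar X_s,\bar u_s\big),$$
so (24) recovers the familiar identification of the adjoint pair $(p, q)$ with the first and second derivatives of the value function along the optimal state.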

4. Applications to Financial Portfolio Optimization

In this section, we consider an LQ recursive utility portfolio optimization problem in financial engineering. In this problem, the optimal portfolio in state feedback form is obtained by both the maximum principle and dynamic programming approaches, and the relations which we obtained in Theorem 5 are illustrated explicitly.

Suppose the investor has two kinds of securities in the market for possible investment choice.

(i) A risk-free security (e.g., a bond), whose price $S^0_s$ at time $s$ is given by
$$dS^0_s = r_s S^0_s\,ds,\qquad s\in[0,T],\tag{39}$$
where $r_s > 0$ is a bounded deterministic function.

(ii) A risky security (e.g., a stock), whose price $S^1_s$ at time $s$ is given by
$$dS^1_s = S^1_s\big[\mu_s\,ds + \sigma_s\,dW_s\big],\qquad s\in[0,T],\tag{40}$$
where $\{W_s\}$ is a 1-dimensional Brownian motion and $\mu_s$, $\sigma_s$ are bounded deterministic functions with $\sigma_s > 0$.

Let $\pi_s$ denote the total market value of the investor’s wealth invested in the risky security at time $s$, which we call the portfolio. Given the initial wealth $x_0 > 0$, combining (39) and (40), we can get the following wealth dynamics:
$$dX_s = \big[r_s X_s + (\mu_s - r_s)\pi_s\big]\,ds + \sigma_s\pi_s\,dW_s,\qquad X_0 = x_0.\tag{41}$$
We denote by $\mathcal{A}$ the set of admissible portfolios, that is, $\{\mathcal{F}_s\}$-adapted processes $\pi(\cdot)$ valued in $\mathbb{R}$.

For any given initial wealth $x_0 > 0$, Kohlmann and Zhou [28] discussed a mean-variance portfolio optimization problem. That is, the investor’s objective is to find an admissible portfolio which minimizes the variance $\mathrm{Var}(X_T)$ at some future time $T$ under the condition that $\mathbb{E}X_T = a$ for some given $a\in\mathbb{R}$. Using the Lagrange multiplier method, we know that it is equivalent to study the following problem:
$$\text{minimize}\quad J\big(\pi(\cdot)\big) := \frac12\,\mathbb{E}\big(X_T - c\big)^2\quad\text{over}\ \pi(\cdot)\in\mathcal{A},\tag{42}$$
where some $c\in\mathbb{R}$ is given. Using the completion-of-squares technique, [28] obtained an optimal portfolio in state feedback form via a stochastic Riccati equation and a related BSDE. The optimal value function was also obtained.

In this paper, we generalize the above mean-variance portfolio optimization problem to a recursive utility portfolio optimization problem. Recursive utility means that the utility at time $s$ is a function of the future utility (in this section, we do not consider consumption). In fact, in our framework, the recursive utility can be assumed to satisfy some controlled BSDE.

We consider a small investor, endowed with initial wealth $x_0 > 0$, who chooses at each time $s$ his/her portfolio $\pi_s$. The investor wants to choose an optimal portfolio to maximize the recursive utility functional with generator
$$f(s, x, y, z, u) = -\beta y,\tag{43}$$
where the constant $\beta \ge 0$ plays the role of a discount rate.

Remark 6. In fact, the recursive utility functional (43) defined above stands for a standard additive utility of recursive type. It is a meaningful and nontrivial generalization of the classical standard additive utility and has many applications in mathematical economics and mathematical finance. For more details about utility functions, see Duffie and Epstein [2], Section 1.4 of El Karoui et al. [3], or Schroder and Skiadas [4].
More precisely, for any $\pi(\cdot)\in\mathcal{A}$, the investor’s utility functional is defined by
$$U\big(\pi(\cdot)\big) := Y_0,\tag{44}$$
where $(Y(\cdot), Z(\cdot))$ is the unique solution to the BSDE
$$-dY_s = -\beta Y_s\,ds - Z_s\,dW_s,\quad s\in[0,T],\qquad Y_T = -\frac12\big(X_T - c\big)^2.\tag{45}$$
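With the linear generator in (45), the BSDE can be solved in closed form; the following standard linear-BSDE computation (under the formulation above) makes the "additive utility of recursive type" of Remark 6 explicit:
$$d\big(e^{-\beta s}Y_s\big) = e^{-\beta s}Z_s\,dW_s\quad\Longrightarrow\quad Y_s = \mathbb{E}\Big[e^{-\beta(T-s)}\,Y_T\,\Big|\,\mathcal{F}_s\Big] = -\frac12\,\mathbb{E}\Big[e^{-\beta(T-s)}\big(X_T - c\big)^2\,\Big|\,\mathcal{F}_s\Big].$$
Thus maximizing $Y_0$ amounts to minimizing a discounted quadratic penalty at the horizon, and the criterion (42) of [28] is recovered when $\beta = 0$.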
In fact, in our framework, the wealth process and recursive utility process can be regarded as the solution to the following controlled FBSDE:
$$\begin{cases} dX_s = \big[r_s X_s + (\mu_s - r_s)\pi_s\big]\,ds + \sigma_s\pi_s\,dW_s,\\ -dY_s = -\beta Y_s\,ds - Z_s\,dW_s,\qquad s\in[0,T],\\ X_0 = x_0,\qquad Y_T = -\frac12\big(X_T - c\big)^2,\end{cases}\tag{46}$$
and our portfolio optimization problem can be rewritten as (denoting the new pair $(Y, Z) := (-Y, -Z)$, which leaves the linear generator in (45) unchanged)
$$\text{minimize}\quad J\big(\pi(\cdot)\big) := Y_0\quad\text{subject to}\ -dY_s = -\beta Y_s\,ds - Z_s\,dW_s,\ \ Y_T = \frac12\big(X_T - c\big)^2,\ \text{over}\ \pi(\cdot)\in\mathcal{A}.\tag{47}$$
Since we are going to invoke dynamic programming in treating the above problem, we adopt the formulation of Section 2. Let $(t, x)\in[0,T)\times\mathbb{R}$ be given. For any $\pi(\cdot)\in\mathcal{A}[t,T]$, the set of $\{\mathcal{F}^t_s\}$-adapted real-valued portfolios, consider the following controlled FBSDE:
$$\begin{cases} dX^{t,x;\pi}_s = \big[r_s X^{t,x;\pi}_s + (\mu_s - r_s)\pi_s\big]\,ds + \sigma_s\pi_s\,dW_s,& s\in[t,T],\\ -dY^{t,x;\pi}_s = -\beta Y^{t,x;\pi}_s\,ds - Z^{t,x;\pi}_s\,dW_s,& s\in[t,T],\\ X^{t,x;\pi}_t = x,\qquad Y^{t,x;\pi}_T = \frac12\big(X^{t,x;\pi}_T - c\big)^2.\end{cases}\tag{48}$$
And our recursive utility portfolio optimization problem is to find an optimal portfolio $\bar\pi(\cdot)$ to minimize the recursive utility functional $J(t, x; \pi(\cdot)) := Y^{t,x;\pi}_t$. We define the value function as
$$V(t, x) := \inf_{\pi(\cdot)\in\mathcal{A}[t,T]} J\big(t, x; \pi(\cdot)\big).\tag{49}$$
We can check that all the assumptions in Section 2 are satisfied. Then we can use both the dynamic programming (Lemma 2) and maximum principle (Lemmas 3 and 4) approaches to solve the above problem (49).

4.1. Maximum Principle Approach

In this case, the Hamiltonian function (18) reduces to
$$H(s, x, y, \pi, p, q, m) = p\big[r_s x + (\mu_s - r_s)\pi\big] + q\,\sigma_s\pi - \beta m y,\tag{50}$$
and the adjoint equation (17) writes
$$\begin{cases} dm_s = -\beta m_s\,ds,\qquad m_t = 1,\\ -dp_s = r_s p_s\,ds - q_s\,dW_s,\qquad p_T = \big(\bar X_T - c\big)\,m_T.\end{cases}\tag{51}$$
Note that in this case the adjoint process $m(\cdot)$ reduces to a deterministic function because our generator does not contain the process $Z$. We immediately have
$$m_s = e^{-\beta(s - t)},\qquad s\in[t,T].\tag{52}$$
Due to the terminal condition of (51), we try a process of the form
$$p_s = \big(h_s\bar X_s + \bar h_s\big)\,m_s,\tag{53}$$
where $h$, $\bar h$ are deterministic differentiable functions with $h_T = 1$, $\bar h_T = -c$. Applying Itô’s formula to (53), we have
$$dp_s = \Big[\dot h_s\bar X_s + \dot{\bar h}_s - \beta\big(h_s\bar X_s + \bar h_s\big) + h_s\big(r_s\bar X_s + (\mu_s - r_s)\bar\pi_s\big)\Big]m_s\,ds + h_s\sigma_s\bar\pi_s m_s\,dW_s.\tag{54}$$
Comparing (51) with (54), noting (52) and (53), we get
$$\dot h_s\bar X_s + \dot{\bar h}_s - \beta\big(h_s\bar X_s + \bar h_s\big) + h_s\big(r_s\bar X_s + (\mu_s - r_s)\bar\pi_s\big) = -r_s\big(h_s\bar X_s + \bar h_s\big),\tag{55}$$
$$q_s = h_s\sigma_s\bar\pi_s m_s.\tag{56}$$

Let $\bar\pi(\cdot)$ be a candidate optimal portfolio, $(\bar X(\cdot), \bar Y(\cdot), \bar Z(\cdot))$ the corresponding solution to the controlled FBSDE (48), and $(m(\cdot), p(\cdot), q(\cdot))$ the corresponding solution to the adjoint equation (51). Now the Hamiltonian function (50) is
$$H\big(s, \bar X_s, \bar Y_s, \bar\pi_s, p_s, q_s, m_s\big) = \big[(\mu_s - r_s)p_s + \sigma_s q_s\big]\bar\pi_s + r_s\bar X_s p_s - \beta m_s\bar Y_s.\tag{57}$$
Since this is a linear expression of $\bar\pi_s$, by the maximum condition (19), we have
$$(\mu_s - r_s)p_s + \sigma_s q_s = 0.\tag{58}$$
Substituting (52), (53), and (56) into (58), we can get
$$\bar\pi_s = -\frac{(\mu_s - r_s)\big(h_s\bar X_s + \bar h_s\big)}{\sigma_s^2 h_s}.\tag{59}$$

On the other hand, (55) gives
$$\big[\dot h_s + (2r_s - \beta)h_s\big]\bar X_s + \dot{\bar h}_s + (r_s - \beta)\bar h_s + h_s(\mu_s - r_s)\bar\pi_s = 0.\tag{60}$$

Combining (59) and (60) (noting the terminal condition in (51)), we get
$$\dot h_s + \Big(2r_s - \beta - \frac{(\mu_s - r_s)^2}{\sigma_s^2}\Big)h_s = 0,\qquad h_T = 1,\tag{61}$$
$$\dot{\bar h}_s + \Big(r_s - \beta - \frac{(\mu_s - r_s)^2}{\sigma_s^2}\Big)\bar h_s = 0,\qquad \bar h_T = -c.\tag{62}$$
The solutions to these equations are
$$h_s = \exp\bigg\{\int_s^T\Big(2r_v - \beta - \frac{(\mu_v - r_v)^2}{\sigma_v^2}\Big)\,dv\bigg\},\tag{63}$$
$$\bar h_s = -c\exp\bigg\{\int_s^T\Big(r_v - \beta - \frac{(\mu_v - r_v)^2}{\sigma_v^2}\Big)\,dv\bigg\}.\tag{64}$$
With this choice of $h$ and $\bar h$, the processes $(m(\cdot), p(\cdot), q(\cdot))$ satisfy the adjoint equation (51) with $\bar\pi(\cdot)$ given by (59). With this choice of $\bar\pi(\cdot)$, the maximum condition (19) of Lemma 3 holds. Moreover, we can check that all conditions in Lemma 4 hold; then $\bar\pi(\cdot)$ given by (59) is indeed the optimal control.
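Since $\bar h_s/h_s = -c\exp\{-\int_s^T r_v\,dv\}$, the feedback (59) takes the familiar mean-variance form. The following Python sketch evaluates (63), (64), and (59) under constant market parameters (the numbers are hypothetical, for illustration only) and checks this identity numerically:

```python
import numpy as np

# Evaluate h, h_bar from (63)-(64) and the optimal feedback (59), assuming
# constant market parameters; the values below are hypothetical.
r, mu, sig, beta, c, T = 0.03, 0.08, 0.2, 0.05, 1.5, 1.0
theta2 = ((mu - r) / sig) ** 2            # squared market price of risk

def h(s):                                 # (63) with constant coefficients
    return np.exp((2*r - beta - theta2) * (T - s))

def h_bar(s):                             # (64) with constant coefficients
    return -c * np.exp((r - beta - theta2) * (T - s))

def pi_star(s, x):                        # feedback portfolio (59)
    return -(mu - r) * (h(s) * x + h_bar(s)) / (sig**2 * h(s))

# check: h_bar/h = -c e^{-r(T-s)}, so (59) equals the mean-variance form
s, x = 0.5, 2.0
assert np.isclose(pi_star(s, x),
                  -(mu - r) / sig**2 * (x - c * np.exp(-r * (T - s))))
print(pi_star(s, x))
```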

Finally, let $t = 0$ and $x = x_0$; then we can solve the initial problem (47) and give the explicit optimal portfolio in state feedback form.

Theorem 7. The optimal solution of our recursive utility portfolio optimization problem (47), when the wealth dynamics obeys (41), is given in the state feedback form by
$$\bar\pi(s, x) = -\frac{(\mu_s - r_s)\big(h_s x + \bar h_s\big)}{\sigma_s^2 h_s} = -\frac{\mu_s - r_s}{\sigma_s^2}\Big(x - c\,e^{-\int_s^T r_v\,dv}\Big),\tag{65}$$
for $s\in[0,T]$, where $h$, $\bar h$ are given by (63) and (64), respectively.

4.2. Dynamic Programming Approach

In this case, the value function should satisfy the following generalized HJB equation:
$$\begin{cases} V_t(s, x) + \inf_{\pi\in\mathbb{R}} G\big(s, x, V(s,x), V_x(s,x), V_{xx}(s,x); \pi\big) = 0,\\ V(T, x) = \frac12\big(x - c\big)^2,\end{cases}\tag{67}$$
where the generalized Hamiltonian function (8) is
$$G\big(s, x, V, V_x, V_{xx}; \pi\big) = \frac12\sigma_s^2\pi^2 V_{xx} + \big[r_s x + (\mu_s - r_s)\pi\big]V_x - \beta V.\tag{68}$$

We conjecture that $V(s, x)$ is quadratic in $x$, namely,
$$V(s, x) = \frac12 P_s x^2 + Q_s x + R_s,\tag{69}$$
for some deterministic differentiable functions $P$, $Q$, and $R$ with $P_T = 1$, $Q_T = -c$, $R_T = \frac{c^2}{2}$. Substituting (69) into (68) and using completion of squares repeatedly, we get
$$\inf_{\pi\in\mathbb{R}} G\big(s, x, V, V_x, V_{xx}; \pi\big) = -\frac{(\mu_s - r_s)^2\big(P_s x + Q_s\big)^2}{2\sigma_s^2 P_s} + r_s x\big(P_s x + Q_s\big) - \beta V(s, x),\tag{70}$$
provided that $P_s > 0$ for all $s\in[0,T]$, which we will prove later. Then we see that the optimal state feedback portfolio is given by
$$\bar\pi(s, x) = -\frac{(\mu_s - r_s)\big(P_s x + Q_s\big)}{\sigma_s^2 P_s}.\tag{71}$$
In addition, the generalized HJB equation (67) now reads
$$\frac12\dot P_s x^2 + \dot Q_s x + \dot R_s + r_s x\big(P_s x + Q_s\big) - \frac{(\mu_s - r_s)^2\big(P_s x + Q_s\big)^2}{2\sigma_s^2 P_s} - \beta\Big(\frac12 P_s x^2 + Q_s x + R_s\Big) = 0.\tag{72}$$
Then, noting (70), by comparing the quadratic terms and linear terms in $x$,
$$\dot P_s + \Big(2r_s - \beta - \frac{(\mu_s - r_s)^2}{\sigma_s^2}\Big)P_s = 0,\qquad \dot Q_s + \Big(r_s - \beta - \frac{(\mu_s - r_s)^2}{\sigma_s^2}\Big)Q_s = 0,\tag{73}$$
we recover (61) and (62), respectively. That is to say, $P$ coincides with $h$ and $Q$ coincides with $\bar h$. Then by (63), we have $P_s = h_s > 0$ for all $s\in[0,T]$, as expected before. And also we have
$$\dot R_s - \beta R_s - \frac{(\mu_s - r_s)^2\bar h_s^2}{2\sigma_s^2 h_s} = 0,\qquad R_T = \frac{c^2}{2}.\tag{74}$$
The solution to (74) is
$$R_s = \frac{c^2}{2}\,e^{-\beta(T - s)} - \int_s^T e^{-\beta(v - s)}\,\frac{(\mu_v - r_v)^2\bar h_v^2}{2\sigma_v^2 h_v}\,dv.\tag{75}$$
Then the value function is
$$V(t, x) = \frac12 h_t x^2 + \bar h_t x + R_t,\tag{76}$$
where $h$, $\bar h$, and $R$ are determined by (61), (62), and (75), respectively. By Lemma 2, we have proved the following.

Theorem 8. The optimal solution of our recursive utility portfolio optimization problem (47), when the wealth dynamics obeys (41), is given in the state feedback form by
$$\bar\pi(s, x) = -\frac{(\mu_s - r_s)\big(h_s x + \bar h_s\big)}{\sigma_s^2 h_s},\tag{77}$$
for $s\in[0,T]$, and the value function is given by (76), where $h$, $\bar h$, and $R$ are determined by (61), (62), and (75), respectively.
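The closed-form objects above are easy to evaluate. The following Python sketch computes the value function (76) under constant market parameters (hypothetical numbers, for illustration only), approximating the integral in (75) by a trapezoidal rule, and verifies numerically that the dynamic programming feedback (77) coincides with the maximum principle feedback (65), as Theorem 5 predicts:

```python
import numpy as np

# Evaluate the value function (76), computing R in (75) by quadrature, and
# check that the dynamic programming feedback (77) equals the maximum
# principle feedback (65), since P = h and Q = h_bar.
r, mu, sig, beta, c, T = 0.03, 0.08, 0.2, 0.05, 1.5, 1.0
theta2 = ((mu - r) / sig) ** 2                       # squared risk premium ratio

h     = lambda s: np.exp((2*r - beta - theta2) * (T - s))     # (63), equals P
h_bar = lambda s: -c * np.exp((r - beta - theta2) * (T - s))  # (64), equals Q

def R(s, n=4000):                                    # (75) via trapezoidal rule
    v = np.linspace(s, T, n)
    g = np.exp(-beta * (v - s)) * theta2 * h_bar(v)**2 / (2 * h(v))
    integral = np.sum((g[1:] + g[:-1]) / 2 * np.diff(v))
    return 0.5 * c**2 * np.exp(-beta * (T - s)) - integral

V = lambda s, x: 0.5 * h(s) * x**2 + h_bar(s) * x + R(s)      # (76)

pi_dp = lambda s, x: -(mu - r) * (h(s)*x + h_bar(s)) / (sig**2 * h(s))  # (77)
pi_mp = lambda s, x: -(mu - r) / sig**2 * (x - c * np.exp(-r*(T - s)))  # (65)
s, x = 0.4, 2.0
print(V(s, x), pi_dp(s, x))
assert np.isclose(pi_dp(s, x), pi_mp(s, x))          # the two approaches agree
```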

4.3. Relationship

We can now explicitly illustrate the relationships in Theorem 5. In fact, relationship (23) is obvious from (67): the value function (76) is smooth, and the infimum in (67) is attained at the feedback (71). Moreover, (52), (53), and (56) are exactly the relationships given in (25) and (24): here $V_x(s, x) = h_s x + \bar h_s$, $V_{xx}(s, x) = h_s$, and $f_z \equiv 0$, so $p_s = V_x(s, \bar X_s)\,m_s$, $q_s = m_s V_{xx}(s, \bar X_s)\,\sigma_s\bar\pi_s$, and $m_s = e^{-\beta(s - t)}$.

5. Concluding Remarks

In this paper, we have studied the relationship between maximum principle and dynamic programming for stochastic recursive optimal control problems. Under certain differentiability conditions, we give relations among the adjoint processes, the generalized Hamiltonian function, and the value function. A linear quadratic recursive utility portfolio optimization problem in the financial market is discussed as an explicitly illustrated example of our result.

An interesting and challenging problem remains open. For the stochastic recursive optimal control problem, what is the relationship between the maximum principle and dynamic programming without the restrictive differentiability conditions on the value function? This problem may be solved in the framework of nonsmooth analysis. Viscosity solution theory is certainly a promising tool (e.g., see Yong and Zhou [9]). A new result on the stochastic verification theorem for forward-backward controlled systems using viscosity solutions has been published very recently by Zhang [29]. However, at this moment, we do not have publishable results on the relationship within the framework of viscosity solutions. We hope to address this problem in future work.

Acknowledgments

The authors would like to thank the anonymous referees for many constructive comments that led to an improved version of the paper. The authors also thank the Academic Editor for his efficient handling of this paper. Finally, many thanks are due to Dr. Qingxin Meng for helpful discussions during the revision process. This work is supported by the China Postdoctoral Science Foundation Funded Project (No. 20100481278), the Postdoctoral Innovation Foundation Funded Project of Shandong Province (No. 201002026), the National Natural Science Foundation of China (Nos. 11201264 and 11101242), the Natural Science Foundation of Shandong Province (Nos. ZR2011AQ012 and ZR2010AQ004), and the Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry, China.