#### Abstract

This paper is concerned with a class of delayed stochastic linear-quadratic optimal control problems with state constraints. The control domain is not necessarily convex, and the control variable does not enter the diffusion coefficient. Necessary conditions in the form of a maximum principle, as well as sufficient conditions, are established.

#### 1. Introduction

In the classical case, many random phenomena are described by stochastic differential equations (SDEs), such as the evolution of stock prices. However, many phenomena are characterized by past dependence; that is, their present value depends not only on the present situation but also on the past history. Such models may be formulated as stochastic differential delay equations (SDDEs). SDDEs have a wide range of applications in physics, biology, engineering, economics, and finance; see [1–4] and the references therein.

A stochastic control system whose state is described by the solution of an SDDE is called a delayed stochastic system. This kind of stochastic control problem appears widely in different research fields; see, for example, [3, 5]. It is worth pointing out that the delayed responses make the system more difficult to handle, not only because the problem is infinite-dimensional, but also because no Itô formula is available for the delayed part of the trajectory.

One fundamental research direction for stochastic optimal control problems is to establish necessary optimality conditions, namely the Pontryagin maximum principle. By the duality between linear SDEs and backward stochastic differential equations (BSDEs), the stochastic maximum principle for forward, backward, and forward-backward systems has been studied by many authors, including Peng [6, 7], Wu [8, 9], Xu [10], and Yong [11]. Recently, Peng and Yang [12] introduced a new type of BSDEs, called anticipated BSDEs, in which the coefficient contains not only the present values of the solution but also its future values. A duality between linear SDDEs and anticipated BSDEs was established in [12], which gave a new way to study the maximum principle for delayed stochastic control problems. Along this line, [13] studied the maximum principle for delayed stochastic optimal control problems in which the control domain is assumed to be convex and both the control variable and its delayed part enter the diffusion coefficient. After that, [14] studied the optimal control problem in which the control system is described by a fully coupled anticipated forward-backward stochastic differential delayed equation, and then [15] generalized [13] to the case in which the system involves both continuous and impulse controls and the coefficients are random.

In practice, state constraints are sometimes unavoidable in stochastic optimal control problems; see, for example, [6, 10, 16, 17]. However, little attention has been paid to delayed stochastic control problems with state constraints by means of anticipated BSDEs.

It is well known that linear-quadratic (LQ) optimal control problems form an extremely important class of optimal control problems: they can model many problems in applications, and many nonlinear control problems can be reasonably approximated by LQ problems. This paper is concerned with a delayed stochastic LQ optimal control problem, in which the control system evolves according to a linear SDDE and the cost functional is quadratic. We assume that the control domain is not necessarily convex and the control variable does not enter the diffusion coefficient. The coefficients may be random, and the delays enter both the state and the control variables. In addition, the terminal value of the state process is required to satisfy a state constraint, stated precisely as (4) in Section 2. Making use of Ekeland's variational principle and the duality between linear SDDEs and anticipated BSDEs, we establish necessary optimality conditions of the maximum principle type. Sufficient optimality conditions are also presented, which help to find optimal controls.

The contributions of this paper are as follows. First, it provides many derivation details that are omitted in most of the existing literature. Second, when , the state constraint disappears and the results in this paper reduce to the corresponding ones without state constraints. Moreover, in [6, 10], the function is assumed to have linear growth, while is allowed to have quadratic growth in this paper. Third, we can treat the case of an unbounded control domain. However, it is worth pointing out that when we apply Ekeland's variational principle in the presence of state constraints, we need the continuity of the state process and the lower semicontinuity of a penalty functional in the control variable , which is impossible to prove when the control domain is unbounded. To overcome this difficulty, we adopt a convergence technique inspired by Tang and Li [16]. To be precise, we first study the optimal control problem with a bounded control domain, and then extend the results to the case of an unbounded control domain using a convergence technique. This method was also used in [9].

In the classical LQ optimal control problem, a state feedback form of the optimal control can be obtained by virtue of Riccati equations; for stochastic LQ problems with delays, see [18, 19]. Our approach differs in two respects. First, we use the maximum principle method to investigate necessary conditions satisfied by the optimal control, which is different from the method of Riccati equations. Second, the study of LQ problems via Riccati equations is mostly carried out under the assumption that the admissible controls take values in the whole space, while in this paper we can treat bounded as well as nonconvex control domains.

The paper is organized as follows. In Section 2, we give the formulation of the problem. Section 3 is devoted to the maximum principle when the control domain is bounded. In Section 4, we prove the maximum principle as well as the sufficient optimality condition for a general control domain.

#### 2. Formulation of the Problem

Let () be a probability space and the expectation with respect to . is a one-dimensional standard Brownian motion, and is its natural filtration augmented with the -null sets of . Let us denote by the set of real-valued -measurable random variables ’s such that . For , we denote by the set of one-dimensional progressively measurable processes such that , and by the set of one-dimensional progressively measurable processes such that .

In this paper, we only consider the one-dimensional case for simplicity; the results can be extended to the multidimensional case without difficulty. Throughout this paper, we use and to represent positive constants which can be different from line to line.

Assume that is a positive constant, and and are two nonnegative constants. Let be a nonempty set in . We denote by the set of feasible controls, which is the collection of progressively measurable processes satisfying The control system considered in this paper evolves by the following linear SDDE: where , . We assume that is continuous. The coefficients , , are bounded progressively measurable processes, which are assumed to vanish outside .

Let us mention that the initial path is independent of the control , since can affect only for . It is easy to check that SDDE (3) admits a unique solution for any (for this, one can see Theorem 2.2 in [13] or Theorem 2.1 in [15]).
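As a rough numerical illustration of a controlled linear SDDE of the kind described above, the following sketch applies the Euler-Maruyama scheme to a scalar equation with one state delay and one control delay. The specific form $dx(t) = [a\,x(t) + b\,x(t-\delta) + c\,v(t) + e\,v(t-\delta)]\,dt + [\sigma\,x(t) + \bar\sigma\,x(t-\delta)]\,dW(t)$, the constant coefficients, the deterministic control, the function name `simulate_sdde`, and all parameter names are assumptions made for illustration and need not coincide with (3). In line with the setting of this paper, the control does not enter the diffusion term.

```python
import numpy as np

def simulate_sdde(xi, v, T=1.0, delay=0.2, n_steps=100,
                  a=-0.5, b=0.3, c=1.0, e=0.5, sigma=0.2, sigma_d=0.1,
                  rng=None):
    """Euler-Maruyama path of an ASSUMED scalar linear SDDE (illustration only).

    xi : callable giving the initial path x(s) for s in [-delay, 0]
    v  : callable giving a deterministic control v(s) for s in [-delay, T]
    """
    rng = np.random.default_rng() if rng is None else rng
    dt = T / n_steps
    lag = int(round(delay / dt))              # delay expressed in grid steps
    # Grid index k corresponds to time (k - lag) * dt, so that
    # indices 0..lag carry the initial path on [-delay, 0].
    t = (np.arange(n_steps + lag + 1) - lag) * dt
    x = np.empty(n_steps + lag + 1)
    x[:lag + 1] = [xi(s) for s in t[:lag + 1]]
    u = np.array([v(s) for s in t])           # control sampled on the grid
    for k in range(lag, lag + n_steps):
        # The control enters the drift only, never the diffusion.
        drift = a * x[k] + b * x[k - lag] + c * u[k] + e * u[k - lag]
        diffusion = sigma * x[k] + sigma_d * x[k - lag]
        dW = rng.normal(0.0, np.sqrt(dt))     # Brownian increment
        x[k + 1] = x[k] + drift * dt + diffusion * dW
    return t, x
```

Because the delayed values `x[k - lag]` and `u[k - lag]` are read from the stored history, the scheme needs the whole path segment over the last `delay` units of time, which reflects the infinite-dimensional nature of the state.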

In addition, we require that the state process satisfies the following constraint: where satisfies that is -measurable for all , , and is continuously differentiable with . Under these assumptions, has a quadratic growth: .

If also satisfies the state constraint (4), then is called an admissible control. The set of admissible controls is denoted by .

The cost functional is given as follows: where , . We assume that is a nonnegative bounded -measurable random variable, and the time-varying coefficients , , are nonnegative bounded progressively measurable processes which are assumed to vanish outside . It is easy to see that the functional is well defined on .

The objective of the optimal control problem is to minimize over . An admissible control is called optimal if it satisfies . We use to denote the optimal trajectory.

Let us define a metric on by where is an indicator function, that is, , if holds, and otherwise. It is well known that is a complete metric space.
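A metric built from the indicator of the set where two controls differ can be estimated on discretized sample paths. The sketch below assumes the common choice $d(u, v) = E\int_0^T 1_{\{u(t) \neq v(t)\}}\,dt$; this explicit formula, the function name `control_distance`, and the array layout (one row per sample path, one column per time step) are assumptions for illustration.

```python
import numpy as np

def control_distance(u, v, dt):
    """Monte Carlo estimate of E[ Lebesgue measure of {t : u(t) != v(t)} ].

    u, v : arrays of shape (n_paths, n_steps) holding control values
    dt   : time step of the grid
    """
    u = np.atleast_2d(np.asarray(u))
    v = np.atleast_2d(np.asarray(v))
    if u.shape != v.shape:
        raise ValueError("u and v must have the same shape")
    differs = (u != v)                        # indicator of {u(t) != v(t)}
    # Time measure of the disagreement set per path, averaged over paths.
    return float(differs.sum(axis=1).mean() * dt)
```

Note that this distance only measures how long two controls disagree, not by how much, which is why it remains meaningful for nonconvex control domains.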

We will need the following Ekeland's variational principle.

Lemma 1. *Let be a complete metric space and lower semicontinuous and bounded from below. Assume that satisfies for some . Then for any , there exists such that , , and for any .*
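To make the statement concrete, the following sketch checks the conclusions of Ekeland's variational principle on a finite metric space, where completeness and lower semicontinuity hold automatically. It assumes the common formulation: if $F(u) \le \inf F + \varepsilon$, then for any $\lambda > 0$ there is a point $v$ with $F(v) \le F(u)$, $d(u, v) \le \lambda$, and $F(v) \le F(w) + (\varepsilon/\lambda)\,d(v, w)$ for all $w$. The function name `ekeland_point` and the greedy descent used to locate $v$ are illustrative assumptions, not part of the argument in this paper.

```python
import numpy as np

def ekeland_point(F, D, u, eps, lam):
    """Find an index v satisfying the Ekeland conditions on a finite space.

    F   : array of function values, one per point
    D   : symmetric matrix of pairwise distances
    u   : index of an eps-minimizer, i.e. F[u] <= min(F) + eps
    """
    F = np.asarray(F, dtype=float)
    D = np.asarray(D, dtype=float)
    slope = eps / lam
    v = u
    while True:
        # gain[w] > 0 means w violates F[v] <= F[w] + slope * d(v, w).
        gain = F[v] - F - slope * D[v]
        gain[v] = 0.0
        w = int(np.argmax(gain))
        if gain[w] <= 0.0:
            return v                          # no violating point remains
        v = w                                 # F strictly decreases, so this stops
```

Each move strictly decreases `F`, so the loop terminates on a finite set; summing the inequalities along the moves and using the triangle inequality gives $(\varepsilon/\lambda)\,d(u, v) \le F(u) - F(v) \le \varepsilon$, hence $d(u, v) \le \lambda$.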

#### 3. Maximum Principle for Bounded Control Domain

In this section, we only consider the case when the control domain is a bounded set in . Let us denote by the trajectory corresponding to .

The following results will play a crucial role in this section.

Lemma 2. *There exists , such that for any , , it holds that
*

*Proof.* Recall that the coefficients are bounded and the control domain is bounded. Let us first prove (7). By the basic inequality, the Cauchy-Schwarz inequality, and the Burkholder-Davis-Gundy (BDG) inequality we have, for ,
Then by a change of variables we get
So, (7) can be obtained by the Gronwall inequality. Then result (8) is obvious. Next, let us prove (9). Denote , . In the classical way, we have
Using a change of variables gives
Then applying the Gronwall inequality leads to (9). Finally we prove (10). Firstly, since
by (7) and (9), applying the Cauchy-Schwarz inequality gives
Next, since , using the Cauchy-Schwarz inequality gives
In the same way, we can use a change of variables to get
Thus, (10) can be obtained.

Let us define the following: for , where is small enough. Let us mention that the functional is defined on the feasible control set , rather than just the admissible control set . In other words, we are able to get rid of the state constraint by introducing such a penalty functional. It is obvious that and for any . Thus, . The following lemma shows that is continuous.

Lemma 3. *There exists such that holds for any , .*

*Proof.* Since for , we have
where
We first consider . On the one hand, by the growth condition of , we can use (7) to get . On the other hand, since
by (7) and (9), we can use the Cauchy-Schwarz inequality to get
Thus,
Next, from (8) it follows that , so by (10) we have
Thus . So .

Now applying Lemma 1 leads to the existence of such that In what follows, let us first derive the necessary conditions for and then take to get proper conditions for .

For any and , let us define where is small enough such that we can always assume that . It is obvious that . Let us point out that we cannot get even if . This also shows why the functional is defined on the feasible control set rather than on the admissible control set. It is easy to see that Then, by (27),
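Perturbations of spike (needle) type are the usual tool when the control domain is not convex: the reference control is replaced by another feasible control on a short time interval and left unchanged elsewhere. A minimal discretized sketch of such a perturbation follows. The function name `spike_variation`, the parameters `tau` and `rho`, and the half-open interval $[\tau, \tau + \rho)$ are assumptions for illustration.

```python
import numpy as np

def spike_variation(u, v, t_grid, tau, rho):
    """Replace u by v on [tau, tau + rho) and keep u elsewhere.

    u, v   : arrays of shape (n_steps,) or (n_paths, n_steps)
    t_grid : array of shape (n_steps,) with the grid times
    """
    u = np.asarray(u)
    v = np.asarray(v)
    t_grid = np.asarray(t_grid)
    on_spike = (t_grid >= tau) & (t_grid < tau + rho)
    # Broadcasting applies the same time window to every sample path.
    return np.where(on_spike, v, u)
```

Since the perturbed control takes only values of `u` or `v`, it stays in the control domain without any convexity assumption, and it differs from `u` on a time set of measure at most `rho`.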

Let , be the trajectories corresponding to , , respectively. We introduce the following variational equation: It is easy to check that this equation admits a unique solution . Moreover, we have the following.

Lemma 4. *There exists , which is independent of , such that
*

*Proof.* By the basic inequality, the Cauchy-Schwarz inequality, and the BDG inequality, we can use a change of variables to get, for ,
By the definition of , we have
Then by (29), applying the Cauchy-Schwarz inequality gives
Finally, the result can be derived by the Gronwall inequality applied to (33).

Let us denote . Then it is easy to check that satisfies By the existence and uniqueness of the solution for this equation, we have , a.s., a.e. That is, for a.e. , a.s.,

Lemma 5. *We have
**
where is defined by
*

*Proof.* It is obvious that , where
Firstly, , where
On the one hand, since
by (7), (32), and (37), we can use the Cauchy-Schwarz inequality to get . On the other hand, we can use the Cauchy-Schwarz inequality and the dominated convergence theorem to derive . Thus
Next we consider . On the one hand, by (32) and (37) it’s easy to check that
On the other hand, by (29) and (32) we can use the Cauchy-Schwarz inequality to derive , and thus
So, from the fact that and
we have , and thus
The proof is complete.

Let us introduce the following Hamiltonian: The following is the maximum principle for the delayed stochastic LQ control problem with bounded control domain.

Theorem 6. *Assume that is a bounded set in . Then for the optimal control , there exist satisfying
**
and the solution of the following adjoint equation:
**
such that
**
where is defined by
**
with .*

*Proof.* From (29) it follows that as . So by Lemma 3 we have as . By Lemma 5 and (30), we have
where
Besides, it's easy to check that . Therefore, there exists a subsequence, still denoted by , such that
for some , with

Let us introduce the following equation:
It is easy to check that this equation admits a unique solution which belongs to . Applying Itô's formula to and then taking expectations, we can use a change of variables to get
Then, by (53) and the definition of we have
Let be the solution of
Then, by subdividing the time interval , we can use (55) to derive
as . Consequently, considering the arbitrariness of , dividing (59) by , and then taking lead to

Now let us take . On the one hand, by (56), there exists a subsequence of , which converges to , and (49) holds. On the other hand, from (26) it follows that as , so we can use (9) to obtain
as . Consequently, we can check that
as , where is the solution of the adjoint equation (50). Let us assume without loss of generality that for . Consequently, we can take in (62) to get
In order to obtain (65), we only need to prove that the terms on the left-hand side of (62) converge to the corresponding ones in (65) along a subsequence. For this, we first prove
as . In fact, since
by the Cauchy-Schwarz inequality we have
as . With the same method and by a change of variables we can also prove
Next, since
we have
In a similar way, we can use a change of variables to get
Thus, we can derive (65).

Let us recall that is arbitrarily chosen, so result (65) holds for all . Next, we drop the expectation in (65). For any and , let us define . It is obvious that the defined is an element of . Applying this in (65) gives
Since is arbitrarily chosen, it implies
which leads to
which is just the conclusion of (51).

#### 4. Maximum Principle for General Control Domain

In this section, we study the maximum principle in the case when can be unbounded in . This case can be treated via the bounded case in Section 3 with a convergence technique.

Let us define , . Then is a bounded set in for fixed . Besides, We denote by the set of progressively measurable processes satisfying and by the collection of satisfying the state constraint (4). Then from (76) it follows that

Since , by (77) there exists such that for . Thus, is still optimal when the original admissible control set is replaced by for . So, by Theorem 6, for , there exist , satisfying and the solution of the following adjoint equation: such that By (78), there exists a subsequence of , still denoted by , such that for some , with Then, by (81) it's easy to check that as , where is the solution of Let us assume without loss of generality that for .

For any fixed , from (76) it follows that there exists such that for , and consequently we see from (80) that Then, similar to the proof of (65), by (81) and (83), taking along a subsequence in (85) leads to Note that the above inequality holds for all , and therefore we have the main result of this section.

Theorem 7. *For the optimal control , there exist satisfying (82) and the solution of the adjoint equation (84) such that
**
where is defined by
**
with .*

In what follows, let us investigate under what condition an admissible control turns out to be optimal. To this end, let us assume that for all and , where is a given function satisfying .

Let us assume that (H) and is convex or and is concave.

Theorem 8. *Assume (H). Assume that is an admissible control and is the corresponding trajectory. Let , satisfy (82), and satisfy (84). Then is an optimal control if it satisfies (87).*

*Proof.* Let us denote for . Applying Itô's formula to for and then using a change of variables leads to
On the one hand, from (87) it follows that
On the other hand, since , , and are nonnegative, by the property of convex functions we have
Thus, it follows that , so
Let us recall that holds for . Then, by (H), (92) leads to
which gives .

*Remark 9. *When , namely, there is no state constraint for , we have . In this case, Theorems 6 and 7 degenerate to the maximum principle for stochastic LQ problem with delays and without state constraints. When