Control Problems of Nonlinear Systems with Applications 2021
A BSDE Approach to Stochastic Differential Games with Regime Switching
In this paper, we study a two-player zero-sum stochastic differential game with regime switching in the framework of forward-backward stochastic differential equations on a finite time horizon. By means of backward stochastic differential equation methods, in particular the notion of stochastic backward semigroups, we prove a dynamic programming principle for both the upper and the lower value functions of the game. Based on the dynamic programming principle, the upper and the lower value functions are shown to be the unique viscosity solutions of the associated upper and lower Hamilton–Jacobi–Bellman–Isaacs equations.
A differential game concerns multiple players who make decisions in a dynamic system, each according to their own interests and in trade-off with the other players. Stochastic differential games (SDGs) have been well studied. Recently, Lv  studied two-player zero-sum SDGs in a regime switching model with an infinite horizon. Compared with the traditional diffusion model, the regime switching model has two obvious advantages. First, the underlying Markov chain can be used to model discrete events with a larger long-term impact on the system. For instance, in financial markets, a finite-state Markov chain easily captures market trends, whereas it is difficult to incorporate such dynamics into a pure diffusion model. Second, when conducting numerical experiments, regime switching models require very limited data input. In recent years, owing to their tractability and their capacity to characterize many kinds of random events, regime switching models have attracted extensive attention [1–3]. In this paper, we introduce a new method, which is different from the method in . We investigate two-player zero-sum SDGs with regime switching on a finite time horizon by using backward stochastic differential equation (BSDE) methods.
Pardoux and Peng  first introduced nonlinear BSDEs in 1990. The theory of BSDEs was originally developed by Peng  for stochastic control theory. Later, Hamadène and Lepeltier  and Hamadène et al.  introduced this theory to SDGs. Buckdahn and Li  studied a recursive SDG problem and interpreted the relationship between the controlled system and the Hamilton–Jacobi–Bellman–Isaacs (HJBI) equation. The theory of BSDEs has been well studied and applied to many fields, such as stochastic control, SDGs, mathematical finance, and partial differential equation theory (see [5–7, 9–11] for details). Readers interested in other topics in game theory are referred to [12–15].
In this paper, let be a fixed probability space on which a d-dimensional Brownian motion and a Markov chain are defined on some sample space , is the completed Borel -algebra over , and is the Wiener measure. Here, we assume that , where . Let denote the filtration generated by Brownian motion . And denote the filtration generated by the Markov chain . Assume that and are independent. The Markov chain takes values in a finite state space and is observable. And the generator of the Markov chain is given bywhere is the transition rate from market regime to , , and , for every .
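As an illustration of the regime process described above, the following minimal Python sketch simulates a finite-state continuous-time Markov chain from its generator. The names `Q`, `is_generator`, and `simulate_chain` and the two-regime example are assumptions made for illustration only; they are not notation from this paper. The sketch relies on the standard facts that a generator has nonnegative off-diagonal entries and rows summing to zero, and that the holding time in a regime is exponential with rate equal to minus the diagonal entry.

```python
import numpy as np


def is_generator(Q, tol=1e-10):
    """Check the two defining properties of a generator matrix."""
    Q = np.asarray(Q, dtype=float)
    if Q.ndim != 2 or Q.shape[0] != Q.shape[1]:
        return False
    off_diagonal = Q - np.diag(np.diag(Q))
    rows_sum_to_zero = np.allclose(Q.sum(axis=1), 0.0, atol=tol)
    return bool(np.all(off_diagonal >= 0.0) and rows_sum_to_zero)


def simulate_chain(Q, i0, T, rng):
    """Simulate one path on [0, T]; return jump times and visited regimes."""
    Q = np.asarray(Q, dtype=float)
    times, regimes = [0.0], [i0]
    t, i = 0.0, i0
    while True:
        rate = -Q[i, i]                     # total rate of leaving regime i
        if rate <= 0.0:                     # absorbing regime: no further jumps
            break
        t += rng.exponential(1.0 / rate)    # exponential holding time
        if t >= T:
            break
        jump_probs = Q[i].copy()
        jump_probs[i] = 0.0
        jump_probs /= rate                  # probability of jumping to j != i
        i = int(rng.choice(len(jump_probs), p=jump_probs))
        times.append(t)
        regimes.append(i)
    return np.array(times), np.array(regimes)


# Hypothetical two-regime example (0 = "bull", 1 = "bear").
Q_example = np.array([[-0.5, 0.5],
                      [1.0, -1.0]])
jump_times, visited = simulate_chain(Q_example, 0, 10.0, np.random.default_rng(0))
```

Each entry of `visited` is the regime entered at the corresponding entry of `jump_times`; the chain is piecewise constant between jumps.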
We will investigate a two-player zero-sum SDG with regime switching in the framework of BSDE on a finite time horizon. The dynamics of the SDG are described by the following functional stochastic differential equation (SDE): for ,where is a fixed finite time horizon, is regarded as the initial state, and . is the value of at time , and are the pair of -adapted processes, take their values in some compact metric spaces and , and are called admissible controls of the two players I and II, respectively. Precise assumptions on the coefficients and are given in the next section.
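To make the controlled dynamics concrete, the following is a minimal Euler–Maruyama sketch for a scalar state driven by a Brownian motion and a regime process. Everything named here is an assumption for illustration: the coefficient functions `b` and `sigma`, the open-loop controls `u` and `v`, and the first-order approximation of the chain on the time grid (one-step transition matrix I + Q·dt). It is not the scheme or the coefficients used in this paper.

```python
import numpy as np


def euler_regime_switching(x0, i0, T, n_steps, Q, b, sigma, u, v, rng):
    """Euler-Maruyama path of a scalar regime-switching controlled SDE."""
    Q = np.asarray(Q, dtype=float)
    dt = T / n_steps
    # First-order one-step transition matrix of the chain; rows sum to one.
    P = np.eye(Q.shape[0]) + Q * dt
    if np.any(P < 0.0):
        raise ValueError("dt is too large for the first-order chain approximation")
    xs = np.empty(n_steps + 1)
    regimes = np.empty(n_steps + 1, dtype=int)
    xs[0], regimes[0] = x0, i0
    for k in range(n_steps):
        s, x, i = k * dt, xs[k], regimes[k]
        dB = rng.normal(0.0, np.sqrt(dt))           # Brownian increment
        drift = b(s, x, i, u(s), v(s))
        diffusion = sigma(s, x, i, u(s), v(s))
        xs[k + 1] = x + drift * dt + diffusion * dB
        regimes[k + 1] = rng.choice(Q.shape[0], p=P[i])
    return xs, regimes


# Hypothetical coefficients: regime-dependent mean reversion and volatility.
Q_demo = np.array([[-0.5, 0.5], [1.0, -1.0]])
b_demo = lambda s, x, i, u, v: (1.0 + i) * (u - v - x)
sigma_demo = lambda s, x, i, u, v: 0.2 * (1.0 + i)
path, regime_path = euler_regime_switching(
    1.0, 0, 1.0, 200, Q_demo, b_demo, sigma_demo,
    lambda s: 0.5, lambda s: -0.5, np.random.default_rng(1),
)
```

Constant controls are used only to keep the sketch short; any functions of time with values in the two control sets could be passed instead.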
The cost functional is introduced by BSDE:where and are introduced in (2). The above BSDE has a unique solution . And for given control processes and , we introduce the associated cost functional:where is defined by BSDE (3). In the game, player I aims to maximize (4) and contrarily player II aims to minimize (4). We define the lower and the upper value functions and , respectively:
Precise definitions of and are given in the next section. In the case we say that the game admits a value. The main objective of this paper is to show that and are, respectively, the unique viscosity solutions of the following lower and upper HJBI equations, and both are systems consisting of m coupled equations:associated withwhere is defined asfor . If Isaacs’ condition holds, i.e., , then (5) and (6) coincide, and the uniqueness of viscosity solution implies , that is, the game admits a value.
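Isaacs' condition can be checked numerically once the two control sets are discretized. The sketch below assumes that the lower Hamiltonian is a supremum over the first player's control of an infimum over the second player's control, and the upper Hamiltonian the reverse order; this ordering and the two toy payoff tables are assumptions for illustration, not formulas taken from this paper. The example shows a separable payoff, for which the two orders agree, and a coupled payoff, for which they do not.

```python
import numpy as np


def lower_upper(H):
    """H[a, c] is the payoff at the a-th control u and the c-th control v."""
    lower = H.min(axis=1).max()   # sup over u of inf over v
    upper = H.max(axis=0).min()   # inf over v of sup over u
    return lower, upper


grid = np.linspace(-1.0, 1.0, 21)          # hypothetical discretized control sets
u_grid, v_grid = np.meshgrid(grid, grid, indexing="ij")

separable = -u_grid**2 + v_grid**2         # sum of a u-term and a v-term
coupled = (u_grid - v_grid) ** 2           # u and v interact

sep_lower, sep_upper = lower_upper(separable)
cpl_lower, cpl_upper = lower_upper(coupled)
```

The inequality `lower <= upper` always holds; equality for every argument of the Hamiltonian is what Isaacs' condition requires.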
The paper is organized as follows. In Section 2, we introduce notation and preliminaries needed in what follows. In Section 3, we establish the dynamic programming principle. In Section 4, based on the dynamic programming principle, we show that the upper and the lower value functions are the unique viscosity solutions of the associated upper and lower HJBI equations.
Let us introduce the following spaces, which will be needed in what follows.
We consider the BSDE with data :
Here is such that, for any , is -progressively measurable. We make the following assumptions: (A1) There exists a positive constant such that for all , (A2) .
Lemma 1. Let assumptions (A1) and (A2) hold; then, for any random variable , BSDE (12) has a unique solution:
We give the comparison theorem for solutions of BSDEs.
Lemma 2. (comparison theorem). Let and and satisfy (A1) and (A2). We denote by and the solutions of BSDEs with data and , respectively, and we suppose that(i).(ii).
Then, we have ., for all . Moreover, if , then , and in particular, .
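The comparison theorem can be illustrated in a degenerate special case. If the terminal value is deterministic and the driver depends only on time and on the first solution component, the martingale part of the solution vanishes and the BSDE reduces to a backward ordinary differential equation. The sketch below solves that reduced equation with an explicit backward Euler scheme; the function name `solve_backward` and the two drivers are hypothetical, and the reduction to the deterministic case is an assumption of this sketch, not the general setting of Lemma 2.

```python
import numpy as np


def solve_backward(terminal_value, driver, T, n_steps):
    """Explicit backward Euler for y(t) = xi + integral over [t, T] of driver(s, y(s))."""
    dt = T / n_steps
    y = np.empty(n_steps + 1)
    y[-1] = terminal_value
    for k in range(n_steps - 1, -1, -1):
        s_next = (k + 1) * dt
        y[k] = y[k + 1] + driver(s_next, y[k + 1]) * dt
    return y


# Two hypothetical Lipschitz drivers with driver_1 >= driver_2 everywhere,
# and terminal values ordered the same way.
driver_1 = lambda s, y: 1.0 - 0.5 * y
driver_2 = lambda s, y: -0.5 * y
y_1 = solve_backward(1.0, driver_1, T=1.0, n_steps=100)
y_2 = solve_backward(0.5, driver_2, T=1.0, n_steps=100)
```

With a Lipschitz constant L and a step size satisfying L·dt ≤ 1, the ordering of the data is preserved at every grid point, which mirrors the statement of the lemma in this simplified setting.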
With the notations in the above lemma, we assume that, for some satisfying (A1) and (A2), the drivers have the following form:
Then, we have the following lemma.
Lemma 3. The difference of the solutions and of BSDE (12) with the data and , respectively, satisfies the following estimate:where and is the Lipschitz constant in (A1).
We now consider the assumptions on the coefficients and . The coefficients and are two given functions. The mappings and satisfy the following conditions: (A3)(i)For every fixed , and are continuous with respect to .(ii)For any , and , there exists a positive constant such that
From (A3), we can get the global linear growth conditions of and , i.e., the existence of some such that, for all , , , , ,
Suppose the above assumptions hold; for any and , control system (2) has a unique solution . And we have the following estimates.
Lemma 4. Under the assumptions of the mappings and , there exists a positive constant such that, for any , and ,
Suppose that the two functions and the terminal cost satisfy the following conditions: (A4)(i)For any fixed , is continuous with respect to .(ii)There exists such that, for all , , , , , and ,(iii)There is a constant such that, for all , ,
Under the above conditions, (3) has a unique solution . And we have the following estimates.
Lemma 5. For all , , , and , there exists a constant such that,For the proof of this lemma, the readers can refer to .
Now, we introduce the admissible controls and admissible strategies. Let , be two deterministic times, and .
Definition 1. An admissible control process (resp., ) for player (resp., player ) on is a process taking values in (resp., ), progressively measurable with respect to the filtration , where is the filtration generated by and .
The set of all admissible controls for player (resp., player ) on time is denoted by (resp., ).
Definition 2. A nonanticipative strategy for player on is a mapping such that, for any -stopping time and any , if on (with the notation ). In the same way, we define a nonanticipative strategy for player on .
The set of all nonanticipative strategies for player (resp., player ) on is denoted by (resp., ).
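The nonanticipativity requirement has a simple discrete-time analogue: the response at step k may depend only on the opponent's control up to and including step k. The following sketch builds such a strategy from a hypothetical `response` rule acting on the observed history; the names `make_strategy` and `response` and the particular rule are illustrative assumptions, not objects defined in this paper.

```python
def make_strategy(response):
    """Turn a rule on observed histories into a nonanticipative strategy."""
    def strategy(control):
        # The k-th output sees control[0..k] only, never the future.
        return [response(control[: k + 1]) for k in range(len(control))]
    return strategy


# Hypothetical rule: answer with the negative running average of the opponent.
beta = make_strategy(lambda history: -sum(history) / len(history))

control_a = [1.0, 0.0, 1.0, 1.0, 0.0]
control_b = [1.0, 0.0, 1.0, -1.0, 2.0]    # agrees with control_a on the first 3 steps
answer_a = beta(control_a)
answer_b = beta(control_b)
```

Because the two controls agree on the first three steps, so do the two responses, whatever happens afterwards.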
Now we give some properties of the lower and the upper value functions and . The following lemma was established in , where the situation was slightly different. For the proof of this lemma, the readers can refer to .
Lemma 6. Under the assumptions (A3) and (A4), for all , the value functions and are deterministic functions.
Lemma 7. Under the assumptions (A3) and (A4), for all and , we have(i)(ii) is -Hölder continuous with respect to :(iii)
The same properties hold true for the function .
3. Dynamic Programming Principles
The dynamic programming principle is one of the principal and most commonly used tools for solving optimal control problems. In this section, we present the dynamic programming principle for a two-player zero-sum SDG with regime switching in the framework of BSDEs on a finite time horizon. It will be used in the next section.
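The idea behind a dynamic programming principle is easiest to see in a discrete-time, finite-state analogue. The sketch below is such an analogue and nothing more: the state grid, the one-step regime transition matrix `P`, the dynamics `x + u - v`, the running payoff `f`, and the terminal payoff `phi` are all hypothetical, and the max-min and min-max recursions stand in for the lower and upper values under the assumption that the first player maximizes. It is not the continuous-time principle proved in this section.

```python
import numpy as np


def game_values(n_x, P, controls_u, controls_v, f, phi, n_steps):
    """Backward induction for the lower (max-min) and upper (min-max) values."""
    m = P.shape[0]
    terminal = np.array([[phi(x, i) for i in range(m)] for x in range(n_x)], dtype=float)
    lower, upper = terminal.copy(), terminal.copy()
    for _ in range(n_steps):
        new_lower, new_upper = np.empty_like(lower), np.empty_like(upper)
        for x in range(n_x):
            for i in range(m):
                pay_lo = np.empty((len(controls_u), len(controls_v)))
                pay_up = np.empty_like(pay_lo)
                for a, u in enumerate(controls_u):
                    for c, v in enumerate(controls_v):
                        x_next = min(max(x + u - v, 0), n_x - 1)
                        run = f(x, i, u, v)
                        # Running payoff plus expected continuation over regimes.
                        pay_lo[a, c] = run + P[i] @ lower[x_next]
                        pay_up[a, c] = run + P[i] @ upper[x_next]
                new_lower[x, i] = pay_lo.min(axis=1).max()
                new_upper[x, i] = pay_up.max(axis=0).min()
        lower, upper = new_lower, new_upper
    return lower, upper


P_demo = np.array([[0.9, 0.1], [0.2, 0.8]])
lower_value, upper_value = game_values(
    n_x=5, P=P_demo, controls_u=[-1, 0, 1], controls_v=[-1, 0, 1],
    f=lambda x, i, u, v: -(1 + i) * (x - 2) ** 2 + 0.5 * u * v,
    phi=lambda x, i: -abs(x - 2), n_steps=4,
)
```

Each pass of the outer loop is one application of the recursion: the value at an earlier step is obtained from the value one step later, which is the discrete counterpart of passing from time t + δ back to time t.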
We first introduce the backward stochastic semigroup. For given initial state , a positive number , for admissible control processes and , and a real value random variable , we definewhere is the solution of the following BSDE with terminal time :and is the solution of SDE (2). According to the uniqueness of the solution of the BSDE, we observe that for the solution of BSDE (3), we have
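The semigroup property behind this construction can be checked numerically in a degenerate case. The sketch below assumes a deterministic terminal value and a driver that depends only on time and on the first solution component, so that the backward equation reduces to an ordinary differential equation solved by explicit backward Euler steps. The function `backward_semigroup` and the driver are hypothetical; the point is only that solving over the whole interval agrees with solving over the last part first and then feeding the result in as the terminal value for the first part.

```python
import numpy as np


def backward_semigroup(t, t_end, eta, driver, dt=0.01):
    """Value at time t of the backward equation on [t, t_end] with terminal value eta."""
    n_steps = int(round((t_end - t) / dt))
    y = float(eta)
    for j in range(n_steps):
        s = t_end - j * dt              # current time, moving backward from t_end
        y = y + driver(s, y) * dt       # one explicit backward Euler step
    return y


driver_demo = lambda s, y: np.cos(s) - 0.5 * y    # hypothetical Lipschitz driver
xi = 1.0                                          # hypothetical terminal value

whole = backward_semigroup(0.0, 1.0, xi, driver_demo)
later_part = backward_semigroup(0.4, 1.0, xi, driver_demo)
nested = backward_semigroup(0.0, 0.4, later_part, driver_demo)
```

Here `whole` and `nested` coincide up to floating-point rounding because the two computations take the same Euler steps on the same grid.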
We now introduce the dynamic programming principle for the value functions of SDGs with regime switching.
Proposition 1. Under the assumptions (A3) and (A4), the following dynamic programming principle holds: for all , , ,
We divide the proof that coincides with into the following steps.
Step 1. Let be arbitrarily fixed. Then, given a , we define as follows the restriction of to : where extends to an element of . Obviously, . And, from the nonanticipative property of we deduce that is independent of the special choice of . Thus, from the definition of , and we use the notation for some sequences such that, Let and set . Construct . Certainly, forms an -partition, and . Moreover, from the nonanticipativity of , we have . According to the existence and uniqueness of the BSDEs, it follows that for . Hence,
We now focus on the interval . Because does not depend on , we can define , for any . From , we know that belongs to . Thus, from the definition of , for any , From Lemmas 5 and 7, there exists a constant such that, for any , , We can show by approximating that To estimate the right side of the latter inequality, we note that there exists some sequence such that Let and set . Construct . Certainly, forms an -partition; moreover, . Therefore, from the nonanticipativity of , we have , and from the definition of , we know that . According to the existence and uniqueness of our BSDE, it follows that Therefore, where . From (35) and (41), Since has been arbitrarily chosen, we have (42) for all . Thus, Then, letting , we get .
Step 2. We now deal with the other case: . From the definition of , we have for some such that,
For any , we put , , and . Certainly, forms an -partition; moreover, . According to the existence and uniqueness of our BSDE, we conclude that for all . Next, for all .
We now focus on the interval . From the definition of , we deduce that, for any , there exists such that