#### Abstract

Using the robust control framework of Hansen and Sargent (2001), this paper investigates a stochastic differential game of transboundary pollution between two regions under Knightian uncertainty of stock dynamics. The regions are assumed to play either a noncooperative or a cooperative game, and the worst-case pollution accumulation processes are characterized for discrete values of the robustness parameter. Our objective is to identify both regions’ optimal output and emission levels and to analyze how Knightian uncertainty of pollution stock dynamics affects their optimization behavior. We illustrate the results with numerical examples.

#### 1. Introduction

Since the last century, pollution has become ever more serious with rapid industrialization. In particular, many kinds of pollutants can travel great distances, so they are not contained within the boundaries of any single region or nation, and “transboundary pollution” has become a problem across regions and even national borders. In each region, those who suffer from pollution wish that polluters in neighboring regions would either reduce their emissions or compensate for the damage, and there is generally a game between sufferers and polluters over pollution abatement and compensation [1].

In recent years, transboundary pollution problems have attracted increasing interest from academics and policy makers. In a dynamic framework, Breton et al. (2010) build a model to analyze how countries join international environmental agreements (IEAs). Using a dynamic control model, Li [2] studies the outcome of a pollution game between two neighboring countries. Yeung [3] and Jørgensen et al. (2010) analyze transboundary pollution problems that consider regional governments and industrial firms simultaneously. Huang et al. [4] develop a model with a Stackelberg game between industrial firms and their local government, in which governments can cooperate in transboundary pollution control. Yeung and Petrosyan [1] and Yi et al. [5] develop cooperative stochastic differential game models of transboundary industrial pollution control that take the uncertain dynamics of pollution into account.

In this paper, we extend the models of Yeung and Petrosyan [1] and Yi et al. [5] to a more general setting. In those models, the uncertain stock dynamics are actually a risk: the probabilistic structure of the evolving pollution stock can be fully captured by a single Bayesian prior. In our model, by contrast, the “uncertainty” of the evolving stock is understood more broadly, as an inability to posit a precise probabilistic structure for the stock dynamics. This notion stems from the concept of uncertainty introduced by Knight [6] to represent a situation in which a decision-maker lacks adequate information to assign probabilities to events. Knight [6] contends that this deeper kind of uncertainty is quite common in economic decision-making and thus deserves systematic study. Following Knight [6], Gilboa and Schmeidler [7] establish an axiomatic foundation for maxmin expected utility. They argue that, when the underlying uncertainty of an economic system is not well understood, it is sensible, and axiomatically compelling, to optimize the worst-case outcome. Klibanoff et al. (2009) build a “smooth ambiguity” model in which different degrees of aversion to uncertainty are explicitly parameterized in agents’ preferences. Hansen and Sargent [8] and Hansen et al. [9] extend Gilboa and Schmeidler’s insight to continuous-time dynamic optimization problems, introducing the concept of robust control to economic environments. They show how standard dynamic programming techniques can be modified to yield robust solutions to problems in which the underlying stochastic nature of the model is not perfectly known. Using the framework of Hansen and Sargent [8], Weitzman (2009) examines the effects of global warming, and Athanassoglou and Xepapadeas [10] investigate pollution control with uncertain stock dynamics.

In this paper, the framework of Hansen and Sargent [8] is applied to analyze the noncooperative and cooperative optimal emission levels of two neighboring regions and to characterize and compare the worst-case pollution accumulation processes for discrete values of the robustness parameter. We also consider how the gains from cooperation are allocated.

The paper is organized as follows. Section 2 presents the basic model. Section 3 characterizes the noncooperative outcomes. Section 4 analyzes cooperative arrangements and individual rationality. Section 5 illustrates the results with numerical examples. Section 6 concludes.

#### 2. The Basic Model

Consider a multinational economy, which is comprised of two regions. Following Saltari et al. [11], the output of region () at time () is a function of emissions . That is,where denotes the technology of region , which is given, and a concave function.

Utility is given by , where is a damage function of pollution stock , andwhere is a utility parameter of and () denotes the damage parameter of the pollution stock which evolves in accordance with the following linear differential equation:where denotes the environment’s self-cleaning capacity.

Risk is introduced to the standard model so that the stock of the pollutant accumulates according to the diffusion process:

where is a Brownian motion on an underlying probability space and is the instantaneous standard deviation. In a world without Knightian uncertainty, Region ’s objective is to maximize welfare:

where denotes the discount rate, which we assume to be the same for the two regions. Optimization problem (6) is referred to as the benchmark model.

If one does not worry about the effects of model misspecification, solving the benchmark problem (6) would be sufficient. However, under Knightian uncertainty, the probabilistic structure implied by stochastic differential equation (5) is distorted and the probability measure is replaced by another . The perturbed model is obtained by performing a change of measure and replacing in (5) by

where is a Brownian motion and is a measurable drift distortion. Thus, changes to the distribution of are parameterized as drift distortions to a fixed Brownian motion . The measurable process could correspond to any number of misspecified or omitted dynamic effects such as (i) a miscalculation of exogenous sources of emissions, (ii) a miscalculation of the natural pollution decay rate, and (iii) ignorance of a more complex dynamic structure involving irreversibility, feedback, or hysteresis effects. The distortions will be zero when and the two measures and coincide. Pollution dynamics under model misspecification are given by

The discrepancy between the two measures and is measured by relative entropy:

where is an entropy constraint. The decision-maker can control the degree of model misspecification by modifying the entropy constraint ; region ’s robust control problem over continuous time can then be defined as

where the parameter is a Lagrangian multiplier associated with the entropy constraint . Our choice of lies in an interval , where the lower bound is a breakdown point beyond which it is fruitless to seek more robustness. On the other hand, when or, equivalently, , there are no concerns about model misspecification.
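For readers unfamiliar with the Hansen and Sargent setup, the change of measure and the entropy penalty described above are often written as in the sketch below. The notation here is assumed for illustration only ($P$ for the pollution stock, $E$ for emissions, $\mu$ for the benchmark drift, $\sigma$ for the volatility, $B$ and $\hat{B}$ for the two Brownian motions, $v$ for the drift distortion, $r$ for the discount rate, $\eta$ for the entropy bound, $\theta$ for the multiplier); the symbols and the exact discounting convention may differ from those used in this paper.

```latex
\begin{align}
% Drift distortion: the benchmark Brownian motion is replaced by
B_t &= \hat{B}_t + \int_0^t v_s \, ds , \\
% so a benchmark diffusion dP = \mu(P,E) dt + \sigma dB becomes
dP_t &= \bigl[ \mu(P_t, E_t) + \sigma v_t \bigr] \, dt + \sigma \, d\hat{B}_t , \\
% A common discounted relative-entropy bound on the distortion:
\int_0^\infty e^{-rt} \, \mathbb{E}_Q\!\left[ \tfrac{1}{2} v_t^2 \right] dt &\le \eta .
\end{align}
% Multiplier form (assumed): the constraint is dropped and the penalty
% (\theta / 2) v_t^2 is added to the flow payoff that the minimizing agent faces.
```

In this form, a larger multiplier makes distortions more expensive for the minimizing agent, which is consistent with the statement that concerns about misspecification vanish in the limit.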

The Bellman-Isaacs condition of the robust control problem (10) can be expressed as

Having obtained the Bellman-Isaacs condition of the robust control problem, we next apply robust control methods to investigate the noncooperative and cooperative strategies of the two regions.

#### 3. Noncooperative Strategy

##### 3.1. Problem Solution

To obtain the noncooperative strategies of the two regions in the transboundary industrial pollution game, we first minimize the Bellman-Isaacs condition (11) with respect to , which gives

Substituting the results of (12) into (11), differential equations (13a) and (13b) are given as

Equations (13a) and (13b) are the standard HJB equations with an additional negative term. When , this negative term tends to zero and the robust control problems reduce to the benchmark control problems.
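The origin of this negative term can be seen in a short worked derivation. The sketch below assumes, as in the usual multiplier formulation, that the distortion $v$ enters the drift of the stock as $\sigma v$ and is penalized by $\tfrac{\theta}{2} v^2$, and it writes $V_P$ for the derivative of a region's value function with respect to the stock. These symbols are assumptions made for illustration and need not match the paper's notation.

```latex
\begin{align}
\min_{v} \Bigl\{ \sigma v \, V_P + \tfrac{\theta}{2} v^2 \Bigr\}
  \quad &\Longrightarrow \quad v^{*} = -\frac{\sigma V_P}{\theta} , \\
\sigma v^{*} V_P + \tfrac{\theta}{2} \, (v^{*})^2
  &= -\frac{\sigma^2 V_P^2}{\theta} + \frac{\sigma^2 V_P^2}{2\theta}
   = -\frac{\sigma^2 V_P^2}{2\theta} \;\le\; 0 , \\
\lim_{\theta \to \infty} \Bigl( -\frac{\sigma^2 V_P^2}{2\theta} \Bigr) &= 0 .
\end{align}
```

Under these assumptions, if the value function is decreasing in the stock (as Lemma 2 states for the noncooperative game), then $v^{*} > 0$, so the worst-case distortion adds upward drift to the pollution stock.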

Proposition 1 characterizes the noncooperative strategies; equilibrium quantities are identified with the subscript “”.

Proposition 1. *If both regions play a noncooperative game, their optimal instantaneous emissions can be given as follows:where , , , and are undetermined parameters related to .*

*Proof.* Setting the first-order partial derivatives of (13a) and (13b) with respect to and equal to zero, we obtain (15a) and (15b). Taking the second-order partial derivatives of (13a) and (13b) with respect to and gives (15c) and (15d):

From (15a) and (15c), since , , choosing emissions according to (15a) is optimal for Region 1. Similarly, from (15b) and (15d), Region 2 should choose emissions according to (15b) to maximize its value. Furthermore, (13a) and (13b) show that if a game equilibrium exists, it must satisfy , , where and are the response functions of Region 1 to Region 2 and of Region 2 to Region 1 when both regions play a noncooperative strategy. In other words, the equilibrium must be a Nash equilibrium.

Substituting (15a) and (15b) into (13a) and (13b), respectively, (16a) and (16b) are obtained:In order to solve the value functions (16a) and (16b), we assumewhere , , , , , and are undetermined parameters related to .

Differentiating (17a) and (17b) with respect to , respectively, we obtain

Substituting (18a)-(18d) into (16a) and (16b), the values of the undetermined parameters are given: can be determined by the implicit function (19d):

Investigating (19a)-(19f), it can be found that the value function is well-defined for and diverges for . Hence the breakdown point is equal to , and from now on we only consider .

Combining (15a), (15c), (18a), and (18b), we get Proposition 1.

This ends the proof.

Investigating the relationship between the value functions and optimal emissions of both regions, we obtain Lemma 2.

Lemma 2. *The value functions of both regions are inversely related to pollutant stocks; that is, , .*

*Proof.* From (18a) and (18c), one easily obtains

This ends the proof.

##### 3.2. Characterizing the Worst-Case Pollution Accumulation Process

Substituting (18a) and (18c) into (12), we get

Substituting (21a) and (21b) into (9), the pollution dynamics under model misspecification take the following form:

Model misspecification has two negative effects in (22). The first is an additional constant drift term equal to , which suggests the presence of exogenous sources of pollution beyond those responsible for the preindustrial pollution stock . The second is the term , which indicates that the environment’s self-cleaning capacity has been reduced by a corresponding amount.

The region () reacts to this worst-case scenario by adopting an emissions strategy and given by Proposition 1. Therefore, at optimality the worst-case pollution process, , can be given by the following stochastic differential equation:

Substituting (14a), (14b), (21a), and (21b) into (23), the latter reduces to

where and are given by

Next, we use Proposition 3 to characterize the solution of the stochastic differential equation (23).

Proposition 3. * *

(i)* has expectation and variance* (ii)* has a stationary distribution:*
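As a rough numerical illustration of the kind of result stated in Proposition 3, the sketch below assumes that the worst-case process has the linear form dP = (a - b P) dt + sigma dB with b > 0, for which the mean, variance, and Gaussian stationary distribution have standard closed forms. The coefficient names `a`, `b`, `sigma` and all numerical values are hypothetical placeholders, not the paper's parameters, and the simulation is only a sanity check on the closed forms.

```python
import math
import random


def ou_mean(p0, a, b, t):
    """E[P_t] for dP = (a - b*P) dt + sigma dB with P_0 = p0 and b > 0 (assumed form)."""
    return a / b + (p0 - a / b) * math.exp(-b * t)


def ou_variance(sigma, b, t):
    """Var[P_t] for the same assumed linear SDE, starting from a known P_0."""
    return sigma ** 2 / (2.0 * b) * (1.0 - math.exp(-2.0 * b * t))


def stationary_moments(a, b, sigma):
    """Mean and variance of the Gaussian stationary distribution (t -> infinity)."""
    return a / b, sigma ** 2 / (2.0 * b)


def simulate_terminal(p0, a, b, sigma, t, steps=200, paths=2000, seed=0):
    """Euler-Maruyama draws of P_t, used only to sanity-check the closed forms."""
    rng = random.Random(seed)
    dt = t / steps
    sqrt_dt = math.sqrt(dt)
    draws = []
    for _ in range(paths):
        p = p0
        for _ in range(steps):
            p += (a - b * p) * dt + sigma * sqrt_dt * rng.gauss(0.0, 1.0)
        draws.append(p)
    return draws


if __name__ == "__main__":
    # Hypothetical values chosen for illustration only.
    p0, a, b, sigma, t = 10.0, 1.0, 0.5, 0.5, 5.0
    draws = simulate_terminal(p0, a, b, sigma, t)
    sample_mean = sum(draws) / len(draws)
    print("closed-form mean:", ou_mean(p0, a, b, t))
    print("simulated mean:  ", sample_mean)
    print("stationary (mean, variance):", stationary_moments(a, b, sigma))
```

In this assumed form, a larger mean-reversion coefficient `b` lowers both the long-run mean `a / b` and the long-run variance `sigma**2 / (2 * b)`, which is one way to read comparative statics of the worst-case process.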

From Proposition 3, we see that the expected value and variance of the worst-case pollution levels are decreasing in , and when , we obtain the following functions:

Using Proposition 3, the entropy of the worst-case model misspecification can be given:

#### 4. Cooperative Arrangements

Now consider the case when both regions cooperate in pollution control. To uphold the cooperative scheme, both group rationality and individual rationality are required to be satisfied at any time.

##### 4.1. Group Optimality and Cooperative State Trajectory

To secure group optimality, the two participating regions seek to maximize their joint expected payoff by solving the following stochastic control problem:

The Bellman-Isaacs condition of the robust control problem (32) is given by

where , .

Minimizing the Bellman-Isaacs condition (33) with respect to and , respectively, we obtain

Substituting the results of (34) into (33), it takes the following form:

Proposition 4 characterizes the cooperative strategies of the two regions; equilibrium quantities are identified with the subscript “”.

Proposition 4. *If both regions use cooperative strategy, their optimal instantaneous emissions can be given aswhere , , and are undetermined parameters related to .*

*Proof.* Setting the first-order partial derivatives of (35) with respect to and equal to zero, we get

The second-order partial derivatives are as follows: ; ; ; . According to these second-order conditions, and maximize the cooperative value ; in other words, and are optimal for the partners.

Substituting (37a) and (37b) into (35), we get

To solve for the value function in (38), we make the following assumption:

where the undetermined parameters , , and are given by

Inspecting (40a)-(40c), we find that the value function is well-defined for and diverges for . Hence the breakdown point is equal to , and from now on we only consider .

Differentiating (39) with respect to yields

Substituting (41) into (37a) and (37b), we obtain Proposition 4.

This ends the proof.

Investigating the relationship between the value functions and optimal emissions of both regions under the cooperative strategy, we obtain the following lemma.

Lemma 5. *(i) The difference between the two regions’ optimal emissions under the cooperative strategy is equal to the difference between their utility parameters; that is, .*

*(ii) The cooperative total value function is inversely related to the pollutant stock; that is, .*

*Proof.* (i) From Proposition 4, it is easy to get

(ii) Using (41), we obtain .

This ends the proof.

##### 4.2. Characterizing the Cooperative Worst-Case Pollution Accumulation Process

Substituting (41) into (34) yields

Substituting (43) into (9), the pollution dynamics under model misspecification take the following form:

Model misspecification has two negative effects in (44). The first is an additional constant drift term equal to , which suggests the presence of exogenous sources of pollution beyond those responsible for the preindustrial pollution stock . The second is the term , which indicates that the environment’s self-cleaning capacity has been reduced by a corresponding amount.

Both regions react to this worst-case scenario by adopting an emissions strategy and given by (36a) and (36b), respectively. Therefore, at optimality the worst-case pollution process reduces to

Substituting (36a), (36b), and (43) into (45), one gets the stochastic differential equation (46):

where and are given by

Next, we rely on Proposition 6 to show the solutions of the stochastic differential equation (46).

Proposition 6. *The expectation, variance, and a stationary distribution of can be given as*

From Proposition 6, we find that the expected value and variance of the worst-case pollution levels are decreasing in , and when , we obtain the following functions:

From Proposition 6, the entropy of the worst-case model misspecification under cooperative strategy can be given:

##### 4.3. Individually Rational and Time-Consistent Imputation and Payment Distribution Mechanism

An agreed-upon optimality principle must be sought to allocate the cooperative payoff. In a dynamic framework, individual rationality has to be maintained at every instant of time within the cooperative duration along the cooperative trajectory (46). Let denote the set of realizable values of at time generated by (46). The term is used to denote an element in the set . For , let the vector denote the solution imputation (payoff under cooperation) over the period to Region given that the state is . Individual rationality along the cooperative trajectory requires

where denotes the payoff to region under noncooperation over the period. Let denote the instantaneous payoff at time for the cooperative game . We apply Proposition 7 to investigate the time-consistent imputation.

Proposition 7. *An instantaneous payment at time equalingfor and yields a subgame consistent solution to the cooperative game .*

*Proof.* Along the cooperative trajectory , we define

Note that

Expression (57) means that the extension of the solution policy to a situation with a later starting time and along the optimal trajectory remains optimal, so the condition in (57) guarantees time consistency of the solution imputations.

Since is continuously differentiable in and , we get

where and as .

Since , from (58), one can obtain

Therefore, (58) can be expressed as

The following relationship holds:

When , (61) can be expressed as

Dividing (62) throughout by , with , we have

Applying (57), we obtain , and . Then (63) can be converted to (55). Hence Proposition 7 follows.

Next, we consider time-consistent solutions under specific optimality principles. Following Yeung and Petrosyan [1], we use “the principle of equality” to build a payment distribution mechanism under which the regions’ expected gain from cooperation is shared in proportion to the relative sizes of their expected noncooperative payoffs. In accordance with the principle of equality, the imputation scheme has to satisfy the following.

In the game , at time and at time , an imputation is given by (64) and (65), respectively:

where . Applying Proposition 7, we obtain an instantaneous payoff of the cooperative game at time :

The specific payment imputation in the game can be obtained from the distribution mechanism (66) and the value functions (17a), (17b), and (39) at time , respectively:

Having obtained the cooperative and noncooperative strategies and the cooperative residual distribution mechanism that satisfies the individual rationality condition, we next use numerical examples to examine the results of the model analysis.
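The proportional sharing rule described in this subsection can be sketched in a few lines. The example below assumes that each region's share of the cooperative payoff equals its noncooperative payoff divided by the sum of noncooperative payoffs, and that all noncooperative payoffs are strictly positive. The function names and the payoff numbers are hypothetical and are not taken from the paper's numerical example.

```python
def proportional_imputation(coop_value, noncoop_values):
    """Split the cooperative payoff in proportion to noncooperative payoffs.

    Assumes every noncooperative payoff is strictly positive (a simplification
    made for this sketch).
    """
    if any(w <= 0 for w in noncoop_values):
        raise ValueError("this sketch assumes strictly positive noncooperative payoffs")
    total = sum(noncoop_values)
    return [coop_value * w / total for w in noncoop_values]


def is_individually_rational(imputation, noncoop_values, tol=1e-12):
    """True if each region receives at least its noncooperative payoff."""
    return all(x >= w - tol for x, w in zip(imputation, noncoop_values))


if __name__ == "__main__":
    # Hypothetical payoffs, for illustration only.
    noncoop = [4.0, 6.0]
    shares = proportional_imputation(12.0, noncoop)
    print(shares, is_individually_rational(shares, noncoop))
```

Under the positivity assumption, every region receives at least its noncooperative payoff exactly when the cooperative payoff is at least the sum of the noncooperative payoffs, so in this sketch group rationality and individual rationality hold together.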

#### 5. Numerical Examples

In this section, we apply the solutions obtained in Sections 3 and 4 in a numerical exercise that provides some context for the theoretical results. The parameters used in the numerical examples are presented in Table 1; in particular, we set the robust control parameter equal to 0.9, 1.9, 2.9, and 3.9, respectively. In theory, the greater the value of the robust control parameter , the more cautious the region. We use version 7.0 of the Wolfram Mathematica Matlab to obtain the numerical solutions.

First, through Figures 1 and 2, we investigate the effects of robust control parameter on the and .

From Figures 1 and 2, we find that and are convex and decreasing with time and has significant influence on and . The smaller the value of , the greater the changes in time evolution of and , which reflects that the more cautious the region, the smaller the worst-case misspecification.

Next, the effects of robust control parameter on optimal emission strategy of both regions are shown in Figures 3 and 4.

From Figures 3 and 4, we find that and are concave and decreasing with time and has significant influence on and . The smaller the value of , the greater the change in the time evolution of and which reflects that the more cautious the region, the smaller the amount of optimal emission of both regions.

Next, in Figure 5 the optimal time evolution paths of the pollution stock are shown under different robust control parameters .

Figure 5 shows that optimal pollution stock is concave and decreasing with time and has significant influence on it. The smaller the value of , the greater the change in the time evolution of the optimal pollution stock which reflects that the more cautious the region, the smaller the amount of the optimal pollution stock; that is to say, the precautious participant tends to reduce pollution emissions.

The noncooperative emissions strategy and gaming results have been shown in Figures 3–5. Next, we use Figures 6–8 to carry out numerical analysis of cooperative emissions strategy and gaming results.

Figures 6-7 show that both regions’ optimal emission levels are concave and decreasing with time under cooperation and that the optimal emission levels tend to decrease as the robust control parameter increases. Figure 8 shows that the optimal pollution stock is convex and decreasing with time .

From Figures 1–8, one can see the effects of robust control parameters on the behavior of Region , , in cooperation and noncooperation fully. In order to contrast the cooperative strategy and noncooperative strategy, we perform several analyses shown in Figures 9–12, in which the robust control parameters are fixed at 0.9.

Figures 9-10 show that, compared to noncooperative strategy, both regions’ optimal emission levels are lower when both regions play a cooperative game.

Figures 11-12 show that, at any point in time, both regions earn higher net benefits under cooperation than under the noncooperative strategy. This means that the individual rationality condition and group rationality condition are satisfied simultaneously. Therefore, our cooperative residual distribution mechanism given in Section 4 can ensure a subgame-consistent solution.

#### 6. Conclusion

In this paper, we apply the robust control framework of Hansen and Sargent [8] to investigate a stochastic differential game of transboundary pollution between two regions under Knightian uncertainty of stock dynamics. Both regions are presumed to play a Markov Nash equilibrium strategy and a cooperative strategy and the worst-case pollution accumulation processes for discrete robustness parameters are characterized. The results show the following.

(i) Under Knightian uncertainty of stock dynamics, when the regions get more cautious, the model miscalculation reduces and the time path of each region’s optimal emission levels is significantly decreasing.

(ii) Compared to noncooperative strategy, both regions’ optimal emission levels are lower when both regions play a cooperative game.

(iii) Compared to noncooperative strategy, both regions earn higher net benefits when they play a cooperative game.

(iv) Under the mechanism that the expected gain from cooperation is shared proportionally to the regions’ relative sizes of expected noncooperative payoffs, the individual rationality condition and group rationality condition can be satisfied simultaneously which ensures a subgame-consistent solution.

#### Data Availability

The data used to support the findings of this study are included within the article.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

#### Acknowledgments

The authors would like to acknowledge the financial support from the Hunan Natural Science Foundation, China (Project no. 2018JJ2335) for this paper.