#### Abstract

The Backward Stochastic Differential Equation (BSDE) has been well studied and widely applied. Its main difference from the Ordinary Stochastic Differential Equation (OSDE) is that the BSDE is designed to depend on a terminal condition, which is a key factor in some financial and ecological settings. However, to the best of my knowledge, terminal-dependent statistical inference for such a model has not been explored in the existing literature. This paper is concerned with statistical inference for the integral form of the Forward-Backward Stochastic Differential Equation (FBSDE). I use the integral form rather than the differential form because the resulting inference procedure inherits the terminal-dependent characteristic. The FBSDE is first rewritten as a regression model, and then a semiparametric estimation procedure is proposed. Because of the integral form, the new regression model is more complex than the classical one, and thus the inference methods differ somewhat from those designed for the OSDE. Even so, the statistical properties of the new method are similar to the classical ones. Simulations are conducted to demonstrate the finite-sample behavior of the proposed estimators.

#### 1. Introduction

The Backward Stochastic Differential Equation (BSDE) was first presented by Bismut [1] for the linear case and by Pardoux and Peng [2] for the general case. The solution of a BSDE consists of a pair of adapted processes $(Y_t, Z_t)$ satisfying

$$-dY_t = g(t, Y_t, Z_t)\,dt - Z_t\,dB_t, \qquad Y_T = \xi, \tag{1}$$

where $g$ is the generator, $B_t$ is the standard Brownian motion, and $\xi$ is the terminal condition. Usually the terminal condition is designed as a random variable with a given distribution. If $g$ meets certain conditions, the BSDE has a unique solution. The integral form of the BSDE can be expressed as

$$Y_t = \xi + \int_t^T g(s, Y_s, Z_s)\,ds - \int_t^T Z_s\,dB_s. \tag{2}$$

The BSDE has a relatively short history of study, but the field has progressed rapidly. In addition to its interesting mathematical nature, its extensive applications have attracted increasing attention; see, for example, Peng [3], Pardoux and Peng [4], Pardoux and Tang [5], Peng and Wu [6], Ma and Yong [7], and Nualart and Schoutens [8]. Duffie and Epstein [9] used the BSDE to describe consumer preferences in an uncertain economic environment (i.e., stochastic differential utility). El Karoui and Quenez [10] stated that, in financial markets, the prices of many important derivative securities can be obtained by solving a certain BSDE. Lin et al. [11] used an extended statistical model to describe an ecological problem. Furthermore, the BSDE is closely related to nonlinear partial differential equations and, more generally, to nonlinear semigroups and stochastic control problems. Meanwhile, this type of equation appears frequently in mathematical finance, as pointed out by Quenez [12]. Recently, Delong [13] surveyed the latest advances in BSDEs (including FBSDEs) and applied BSDEs with jumps to insurance and finance.

In terms of the backward equation, within a complete market it serves to characterize the dynamic value of a replicating portfolio with a final wealth and a special quantity that depends on the hedging portfolio. In particular, when the randomness in the BSDE comes from the state of the forward equation, the corresponding equation is a Forward-Backward Stochastic Differential Equation (FBSDE), which can be expressed as with satisfying Compared to the Ordinary Stochastic Differential Equation (OSDE), which contains an initial condition, the solution of the FBSDE is affected by the terminal condition. As is well known, a number of parametric and nonparametric methods exist for estimation and testing in the OSDE. However, these methods cannot be directly employed for inference on the BSDE and FBSDE because the two models involve a terminal condition.

For the FBSDE defined above, statistical inference was first investigated by Su and Lin [14] and Chen and Lin [15], and a related model was proposed by Lin et al. [11]. However, they did not take the terminal condition into account in the inference procedure. In the framework of the FBSDE mentioned above, the terminal condition is an additional constraint that is not nested in the equation. Thus, it is essentially difficult to use the terminal condition to refine the inference procedure. As a result, their methods do not address the full problem posed by the FBSDE.

The FBSDE can also be written in integral form: In this paper I focus only on the integral form because it contains the terminal condition as an additive term of the equation. With such a construction, a terminal-dependent inference procedure can be built. I am concerned with the semiparametric estimation of the FBSDE in this paper. Note that is usually unobservable and cannot be completely specified in the financial market. The problems of interest are therefore to provide proper estimators of both the generator and the process based on the observed data and the terminal condition . As an initial investigation, this paper only considers the model whose generator has a parametric structure; that is to say, can be written in the form of , where is an unknown parameter vector. Even so, such a simplified form is widely used in financial markets, and, furthermore, the proposed methods can be extended to more complicated forms.

It is worth mentioning that the key point of the method is the use of the integral equation rather than the differential equation. This change sets the present work apart from the existing research. Unlike in the forward equation, the cumulative error arising from the integral is not negligible; nevertheless, the resultant estimator is still asymptotically unbiased because of the attached mixing-dependence condition. Another difference from the ordinary model is that the generator contains the unobservable process , so it is necessary to estimate first. After plugging the estimator of into the generator, I can infer the generator with the newly proposed methods.

The paper is organized as follows. In Section 2, the FBSDE is first rewritten as a special regression model, and, based on this representation, the estimation procedure for the FBSDE with a linear generator is designed. The asymptotic properties are discussed in Section 3. In Section 4, a supplement to the inference procedure is suggested, and an extension to the nonlinear model is briefly discussed. A simulation study is presented in Section 5 to illustrate the methods. The proofs of the theorems are given in Section 6.

#### 2. Terminal-Dependent Semiparametric Estimation for the FBSDE

##### 2.1. Model and Its Statistical Version

I consider the integral form of the standard FBSDE: where is the Brownian motion and is a smooth function. Here the generator is a function of , , and , with being usually unobservable. Furthermore, the adapted process and the terminal condition can be expressed as functions of . The existence and uniqueness of the solution of the FBSDE have been studied extensively. This section represents the FBSDE in a statistical framework and then constructs proper estimators of and based on the observed data and the terminal condition .

To recast model (6) as a statistical model, I first examine the last term of the first equation in (6). By the properties of the Itô integral and the relation between the two equations in (6), I have Then I regard as the error and consequently rewrite the first equation of model (6) as where is the error term with mean zero and bounded variance, and the adapted process and terminal condition depend on via the second equation of (6).
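The zero-mean property of the Itô integral used above can be checked numerically. The following is a minimal sketch rather than part of the proposed procedure: it assumes, purely for illustration, that the integrand is the Brownian path itself (any adapted square-integrable integrand would serve), and it approximates the stochastic integral with left-endpoint sums. The function name and all numerical settings are hypothetical.

```python
import numpy as np

def ito_integral_left_sums(n_paths, n_steps, horizon, rng):
    """Approximate int_0^T B_s dB_s on each path with left-endpoint (Ito) sums."""
    dt = horizon / n_steps
    db = np.sqrt(dt) * rng.standard_normal((n_paths, n_steps))
    # cumsum gives B at the right endpoint of each step; subtracting the
    # increment recovers B at the left endpoint, as the Ito integral requires.
    b_left = np.cumsum(db, axis=1) - db
    return np.sum(b_left * db, axis=1)

rng = np.random.default_rng(0)
integrals = ito_integral_left_sums(n_paths=10_000, n_steps=100, horizon=1.0, rng=rng)
sample_mean = integrals.mean()  # should be close to zero
sample_var = integrals.var()    # should be close to T^2 / 2 for this integrand
print(sample_mean, sample_var)
```

The sample mean being near zero illustrates only the unconditional statement; it says nothing about the conditional expectation of the error, which Remark 1 treats separately.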

*Remark 1.* Formula (8) appears to define a regression determined by both expectation and variance structures. However, such a regression is quite unlike the classical one. In the newly defined structure, although the expectation of the error is zero, the conditional expectation of the error is nonzero. Even so, the resultant estimator is asymptotically unbiased, and thus the consistency of the estimators defined below still holds because of the mixing-dependence condition on given below; for details see the theorems below and their proofs.

Given the initial calendar time point , I record the observed time series data at the equally spaced time points . Denote for and . Note that is the distance between the last observation time and the terminal time ; it may be quite large, which would make formula (9) below inaccurate. Therefore I first assume that is small enough, that is, , and then propose an adjustment in Section 4 for the case of larger . On the other hand, since the distribution of is assumed known, I can obtain the samples for .

In this section I assume that can be expressed as the linear function , where , , and are unknown parameters. Then model (8) can be approximately rewritten as where and , satisfying and .

This is the statistical version of (8), a new regression model. It is worth mentioning that the new model (9) differs somewhat from the classical regression; that is, in addition to the mean-variance structure, the new model has a more complicated structure and contains terminal information.

##### 2.2. Semi-Parametric Estimation for the FBSDE

I now turn to estimating the unknown parameter vector in model (9). Because the generator contains the unobservable process of interest , it is first necessary to estimate and plug the estimator into the generator. After that, common parametric estimation methods can be employed to estimate the parameters.

Concerning inference on , despite the connection between and the variance of in (9), the second formula of (9) involves the weighted sum of , which makes it inconvenient to estimate by a residual-based method. I therefore adopt a difference-based method instead.
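To make the idea of a difference-based method concrete, here is a minimal sketch. It assumes that the first-order approximation takes the familiar form in which the squared increment of the backward process over one time step, divided by the step length, serves as a noisy pointwise proxy for the squared unobservable process. That form, the symbols `y`, `delta`, and `z_true`, and the constant-coefficient synthetic path are assumptions made for illustration only.

```python
import numpy as np

def squared_increment_proxy(y, delta):
    """Return (Y_{i+1} - Y_i)^2 / delta, a noisy pointwise proxy for Z_{t_i}^2."""
    y = np.asarray(y, dtype=float)
    return np.diff(y) ** 2 / delta

# Synthetic check with a constant, known diffusion coefficient (hypothetical values).
rng = np.random.default_rng(0)
delta, n, z_true, drift = 0.01, 20_000, 0.5, 0.1
increments = drift * delta + z_true * np.sqrt(delta) * rng.standard_normal(n)
y_path = np.concatenate(([0.0], np.cumsum(increments)))

# Averaging the proxies recovers z_true**2 up to a drift term of order delta.
z2_hat = squared_increment_proxy(y_path, delta).mean()
print(z2_hat, z_true ** 2)
```

In the setting of this section the proxy would be smoothed against the forward state rather than averaged globally, since the unobservable process generally varies with the state.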

To this end, consider the FBSDE model (6). Motivated by Stanton [16], for the Markov process which follows the SDE, the infinitesimal generator is defined as where the bivariate function satisfies sufficient smoothness conditions [17]. By Taylor's expansion, the conditional expectation can be expressed as which implies that when the time increment , the first-order approximation formula for can be given by In addition, I need the following generalized Feynman-Kac formula. Let be a function, and suppose there exists a constant such that, for each , and is the solution of the following system of quasilinear parabolic partial differential equations Then , a.s., where is the unique solution of the FBSDE, based on the result in Pardoux and Peng [4].

Denote by , respectively, for short. By the Taylor expansion of , I have , and Finally an approximation of can be expressed as that is,

By (19), I regard as a pointwise nonparametric regression function. For simplicity, the Nadaraya-Watson (N-W) kernel estimator is taken here as an example of a nonparametric smoother: where , is the kernel function satisfying the regularity condition given below and is the bandwidth or smoothing parameter. Similarly, if also depends on besides , the corresponding estimator could be Having calculated , I plug it into the first formula of (9), obtaining
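For readers unfamiliar with the N-W estimator, a generic sketch follows. It is a textbook kernel-weighted average, not a transcription of formula (20): the function names, the Gaussian kernel, the bandwidth value, and the synthetic regression function are all assumptions chosen for illustration.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density, used here as the smoothing kernel."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def nadaraya_watson(x_eval, x_obs, responses, h, kernel=gaussian_kernel):
    """Kernel-weighted average of `responses` at each point of `x_eval`."""
    x_eval = np.atleast_1d(np.asarray(x_eval, dtype=float))
    x_obs = np.asarray(x_obs, dtype=float)
    responses = np.asarray(responses, dtype=float)
    # weights[i, j] = K((x_eval[i] - x_obs[j]) / h)
    weights = kernel((x_eval[:, None] - x_obs[None, :]) / h)
    return (weights @ responses) / weights.sum(axis=1)

# Synthetic regression: true curve m(x) = (1 + x)^2 observed with unit-variance noise.
rng = np.random.default_rng(0)
n = 5_000
x_obs = rng.uniform(0.0, 1.0, n)
responses = (1.0 + x_obs) ** 2 + rng.standard_normal(n)
estimate = nadaraya_watson(0.5, x_obs, responses, h=0.1)[0]
print(estimate)  # should be near m(0.5) = 2.25
```

In the present context the responses would be the difference-based proxies for the squared unobservable process, and the covariate would be the observed forward state.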

From the above, it is simple to obtain the estimator of with common parametric methods, for example the least squares method, by minimizing For simplicity, denote Finally, I can write the estimator as
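A minimal sketch of this least squares step is given below. It rests on an assumption that the text reproduced here does not confirm: that the discretized regression relates the backward process minus the terminal value to backward Riemann sums of the backward process, of the plug-in estimate of the unobservable process, and of a constant. The helper names `backward_cumsum` and `fit_linear_generator`, the coefficient labels `a`, `b`, `c`, and the noise-free synthetic data are hypothetical.

```python
import numpy as np

def backward_cumsum(values, delta):
    """Riemann sum delta * sum_{j >= i} values[j] for every index i."""
    values = np.asarray(values, dtype=float)
    return delta * np.cumsum(values[::-1])[::-1]

def fit_linear_generator(y, z_hat, xi, delta):
    """Least squares for (a, b, c) in y_i - xi = sum_{j >= i} (a*y_j + b*z_j + c) * delta."""
    y = np.asarray(y, dtype=float)
    design = np.column_stack([
        backward_cumsum(y, delta),
        backward_cumsum(z_hat, delta),
        backward_cumsum(np.ones_like(y), delta),
    ])
    coef, *_ = np.linalg.lstsq(design, y - xi, rcond=None)
    return coef

# Noise-free synthetic data built backward from the terminal value.
a, b, c, xi, delta, n = 0.5, 1.0, 0.2, 1.0, 0.01, 200
t = delta * np.arange(n)
z = 1.0 + 0.5 * np.sin(3.0 * t)
y = np.empty(n)
nxt = xi
for i in range(n - 1, -1, -1):
    # Solve y_i = y_{i+1} + (a*y_i + b*z_i + c) * delta for y_i.
    y[i] = (nxt + (b * z[i] + c) * delta) / (1.0 - a * delta)
    nxt = y[i]

coef = fit_linear_generator(y, z, xi, delta)
print(coef)
```

With noisy data and an estimated plug-in process the recovered coefficients would of course no longer be exact; the sketch only shows how the cumulative regressors are assembled.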

#### 3. Asymptotic Results

The following two theorems are concerned with asymptotic properties of the estimators deduced in the previous section.

First, I introduce several conditions.

(a) are -mixing dependent; namely, the -mixing coefficients satisfy as , where with .

(b) (a.s.) uniformly for , where is a positive constant and .

(c) The continuous kernel function is symmetric about 0, with a support of interval , and

(d) As , where the matrix is nonsingular and satisfies with and being the smallest and largest eigenvalues of , respectively.

Condition (a) is commonly used for weakly dependent processes; see, for example, Rosenblatt [18, 19], Kolmogorov and Rozanov [20], Bradley and Bryc [21], Lin and Lu [22], and Su and Lin [14]. Condition (b) is also reasonable because, as shown by (19), can be regarded as the deviation between two adjacent observations. Condition (c) is standard for nonparametric kernel functions, and condition (d) is common because it describes the property of an average. Furthermore, as remarked in the previous section, to express the estimator in terms of rather than the model variables and , I impose conditions mainly on the latent variable , including the stationary -mixing Markov property used in the following theorems. In practice, the process may be nonstationary.

Theorem 2. *Besides conditions (a), (b), and (c), suppose that is a stationary -mixing Markov process with the -mixing coefficients satisfying for and has a common probability density satisfying , . Furthermore, the functions and have two continuous derivatives in a neighborhood of . As , if , and , then*

The proof is presented in Section 6. The asymptotic result in the theorem is standard for nonparametric kernel estimators, and here undersmoothing is used to eliminate the asymptotic bias.

Theorem 3. *In addition to the conditions of Theorem 2, if condition (d) holds, then as ,*

*where .*

The proof is also presented in Section 6. The result is essentially standard in the sense of asymptotic normality with the convergence rate of order . As noted in the remark in the previous section, even though the conditional mean of the model error is nonzero, the newly proposed estimator is consistent because of the mixing dependency; for details see the proof of Theorem 3. Furthermore, because of the terminal condition, the asymptotic variance is larger than that obtained without use of the terminal condition.

#### 4. Supplement and Extension

##### 4.1. Supplement

As mentioned in Section 2.1, when the last observation is far away from the terminal , the new model (9) becomes inaccurate. In this case an adjustment is needed to obtain a relatively accurate model. The main steps of the adjustment are as follows: first, I ignore the terminal condition to obtain both an accurate model and parameter estimates restricted to ; next, I estimate the unobservable variables in the interval using the model estimated in the first step; finally, I substitute the estimators for the unobservable variables in and build a relatively accurate model defined on the whole interval and related to the terminal condition.

For arbitrary , This equation is accurate and thus I can get the estimators and of and for by the methods given in Section 2. When , this method is however unsuitable for estimating because it cannot be extrapolated to the interval , so I attempt to complete the data within this interval.

Set . Discretize model (32) and write its forward linear version as Similar to formula (9), the expectation-variance structure is shown as To estimate the unobservable data for , I treat as being parameterizable. It is known from, for example, Morris [23] that the variance can be expressed as a quadratic function of the mean for several common distributions, such as the normal, gamma, binomial, negative binomial, and Poisson. For the mean-variance structure in (34), I therefore suppose the following parametric structure: for some parameters . By simple transformation and neglecting terms, I see where or , and denote .
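The quadratic mean-variance relation can be illustrated with a short sketch. It fits a variance function of the form V(m) = v0 + v1 m + v2 m^2 by ordinary least squares; for instance, the Poisson family corresponds to V(m) = m and the gamma family with a fixed shape k to V(m) = m^2 / k. The function name, the coefficient labels, and the synthetic inputs are hypothetical, and the way the paper's parametric structure maps onto these coefficients is not reproduced here.

```python
import numpy as np

def fit_quadratic_variance(mean_values, variance_proxies):
    """Least-squares fit of V(m) = v0 + v1*m + v2*m^2; returns (v0, v1, v2)."""
    m = np.asarray(mean_values, dtype=float)
    v = np.asarray(variance_proxies, dtype=float)
    design = np.column_stack([np.ones_like(m), m, m ** 2])
    coef, *_ = np.linalg.lstsq(design, v, rcond=None)
    return coef

# Exact quadratic inputs, so the fit should return the generating coefficients.
m_grid = np.linspace(0.5, 3.0, 40)
v_exact = 0.1 + 0.2 * m_grid + 0.3 * m_grid ** 2
coef = fit_quadratic_variance(m_grid, v_exact)
print(coef)
```

In practice the variance inputs would be noisy proxies such as squared increments, so the fitted coefficients would carry sampling error.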

Let and plug into (32). I can then obtain the estimators through the methods in Section 2; denote by , , and the estimators of , and , respectively. Finally I can refine the original trajectory and estimate one by one; more precisely, Iterating the above procedure, I obtain the complete data in . Consequently, the same approach as in Section 2 can be applied again, and a refined estimator of can be constructed.

##### 4.2. Extension

The semiparametric models in Section 2 have a linear structure in the sense that is linearly related to the parameters , , and . However, some generators are nonlinear in the parameters; thus the resulting model (9) will be nonlinear. For example, Constantinides [24] presented a model with the specification: For other examples, see Fan [25], Fan and Zhang [26], Chan et al. [27], and Aït-Sahalia [28].

Then, for the flexibility of modeling the above case, a nonlinear semiparametric model can be defined as where satisfying , is a given function, and is an unknown -dimensional parameter vector.

Before estimating nonlinear model (39), can be estimated similarly by (19) or (20) because its estimator is free of the structure of . Furthermore, the resulting estimator has the same asymptotic properties as in Theorem 2. Thus I only focus on the estimation of parameter vector here.

After plugging the estimator of into the first formula of (39), I can adopt a common method to obtain an estimator of , for example, by minimizing where . Under regularity conditions, I can also obtain by solving the following equation: where denotes the derivative of . By arguments similar to those used in the previous section, the resultant estimator is asymptotically normally distributed; the details are omitted here.
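Solving the estimating equation numerically is a standard nonlinear least squares problem. The sketch below uses a Gauss-Newton iteration with step halving, which is one common choice and not necessarily the solver used for the results in this paper. The exponential model, the parameter values, and the function names are hypothetical stand-ins for the given nonlinear function and its derivative.

```python
import numpy as np

def gauss_newton(residual, jacobian, theta0, n_iter=50, tol=1e-12):
    """Minimise sum(residual(theta)**2) by Gauss-Newton steps with step halving."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        r = residual(theta)
        J = jacobian(theta)
        # Linearised problem: choose step so that J @ step is closest to r.
        step, *_ = np.linalg.lstsq(J, r, rcond=None)
        scale = 1.0
        # Halve the step until the sum of squares does not increase.
        while scale > 1e-8 and np.sum(residual(theta + scale * step) ** 2) > np.sum(r ** 2):
            scale *= 0.5
        theta = theta + scale * step
        if np.linalg.norm(scale * step) < tol:
            break
    return theta

# Hypothetical nonlinear model f(x; theta) = theta_0 * exp(theta_1 * x), noise-free data.
x = np.linspace(0.0, 1.0, 50)
theta_true = np.array([2.0, -1.0])
y = theta_true[0] * np.exp(theta_true[1] * x)

def residual(theta):
    return y - theta[0] * np.exp(theta[1] * x)

def jacobian(theta):
    e = np.exp(theta[1] * x)
    return np.column_stack([e, theta[0] * x * e])  # derivative of f in theta

theta_hat = gauss_newton(residual, jacobian, theta0=[1.0, 0.0])
score = jacobian(theta_hat).T @ residual(theta_hat)  # estimating equation at the solution
print(theta_hat, score)
```

The vector `score` is the left-hand side of the first-order condition, that is, the derivative-weighted sum of residuals, and it should vanish at the solution.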

#### 5. Simulations

In this section I investigate the finite-sample behavior of the estimators by simulation. Although Theorems 2 and 3 rely on the stationarity of , I also apply the method to nonstationary processes such as geometric Brownian motion. I use the mean, standard deviation (STD), or mean square error (MSE) to evaluate the estimators, based on 300 repetitions. Naturally, the method is expected to work better when the stationarity condition holds.
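The evaluation criteria can be computed as in the following minimal sketch. The function name `summarize` and the toy inputs are hypothetical; in the actual study the input would be the 300 replicated estimates of one parameter.

```python
import numpy as np

def summarize(estimates, truth):
    """Mean, sample standard deviation, and mean square error of replicated estimates."""
    est = np.asarray(estimates, dtype=float)
    return {
        "mean": est.mean(),
        "std": est.std(ddof=1),              # spread across repetitions
        "mse": np.mean((est - truth) ** 2),  # squared bias plus variance
    }

# Toy illustration with three replicated estimates of a parameter whose true value is 2.
summary = summarize([1.0, 2.0, 3.0], truth=2.0)
print(summary)
```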

*Example 4.* Consider the Cox-Ingersoll-Ross (CIR) process:
This model describes the dynamics of the interest rate and is stationary when . On the other hand, the riskless asset, with price per unit, evolves as follows:
with being the constant short rate. Let and denote the quantities invested in bond and asset , respectively. Naturally the total wealth process satisfies . Similar to the classic self-financing FBSDE model in El Karoui et al. [29], the resulting model is

Denote parameters , , and . I set the equal time step and choose the sample size . So the time interval is . The terminal time is chosen as 122, which is quite near the end of the observation interval. Let , , , and . In the estimation procedure, I use the Gaussian kernel defined by ; the theoretically optimal bandwidths would be , and popular data-driven methods, such as CV, GCV, or the plug-in approach, can also be used. In the simulation, I set for simplicity. The simulation results with other choices are similar.
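For readers who wish to generate a forward path of the kind used in this example, a minimal sketch follows. It assumes the usual square-root form of the CIR dynamics with mean-reversion speed `kappa`, long-run level `theta`, and volatility `sigma`, and it uses an Euler scheme truncated at zero. The parameter names and every numerical value below are hypothetical and are not the settings of this simulation study.

```python
import numpy as np

def simulate_cir(x0, kappa, theta, sigma, delta, n_steps, rng):
    """Euler path of dX = kappa*(theta - X) dt + sigma*sqrt(X) dB, truncated at zero."""
    path = np.empty(n_steps + 1)
    path[0] = x0
    shocks = rng.standard_normal(n_steps)
    for i in range(n_steps):
        x = path[i]
        step = kappa * (theta - x) * delta + sigma * np.sqrt(x * delta) * shocks[i]
        path[i + 1] = max(x + step, 0.0)  # truncation keeps the square root well defined
    return path

rng = np.random.default_rng(0)
path = simulate_cir(x0=1.0, kappa=2.0, theta=1.0, sigma=0.3, delta=0.01,
                    n_steps=50_000, rng=rng)
print(path.mean())  # a long path should average close to the long-run level theta
```

The truncation at zero is a discretization device; other schemes handle the boundary differently and may be preferable when the volatility is large relative to the mean reversion.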

I present the true curves and the N-W nonparametric estimation curves for and the generator and report the mean and MSE of the estimators of in Table 1. These results show that the estimators of and work well. However, because of the plug-in estimator , the estimator of the coefficient has fairly large bias and MSE. On the other hand, Figure 1 shows that the estimated curves of drift and diffusion are close to the true ones.

Figure 1: panels (a) and (b).

*Example 5.* In this part I consider the case in which the terminal time is far away from the last observed time, as discussed in Section 4.1. The distance between and is larger than that in Example 4. I add 10 estimated points by the method given in Section 4 and employ the same model and parameters as before. Table 2 reports the simulation results, which show that the parameter estimators do not perform as well as before but are still feasible. In addition, Figure 2 presents the estimated curves for and , which also perform well, although not as well as the estimates in Example 4.

Figure 2: panels (a) and (b).

*Example 6.* I turn to the nonstationary case in this part. When the forward process does not satisfy the stationarity condition, the cumulative effect induced by the backward summation becomes more pronounced, which makes statistical inference quite challenging. In this situation, I choose a particular model and parameters to control the relative stationarity.

I consider a simple FBSDE as
where is Geometric Brownian Motion for modeling stock price satisfying
while the riskless asset is the same as formula (43), .
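A geometric Brownian motion path of this kind can be generated exactly on a time grid, as in the following minimal sketch. The drift `mu`, volatility `sigma`, initial value `s0`, step length, and number of steps are hypothetical values chosen only for illustration, not the settings reported in this example.

```python
import numpy as np

def simulate_gbm(s0, mu, sigma, delta, n_steps, rng):
    """Exact simulation of dS = mu*S dt + sigma*S dB on an equally spaced grid."""
    shocks = rng.standard_normal(n_steps)
    log_increments = (mu - 0.5 * sigma ** 2) * delta + sigma * np.sqrt(delta) * shocks
    # Cumulate the log increments and exponentiate; the path starts at s0.
    return s0 * np.exp(np.concatenate(([0.0], np.cumsum(log_increments))))

rng = np.random.default_rng(0)
path = simulate_gbm(s0=1.0, mu=0.05, sigma=0.2, delta=1.0 / 250.0, n_steps=500, rng=rng)
print(path[:5])
```

Because the log price is a Brownian motion with drift, the process has no stationary distribution, which is why this example serves as the nonstationary test case.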

First, let , , , , , and . Obviously . I choose the same kernel function and bandwidth . Table 3 reports the simulation results. The results show that the estimators of work well, but have larger bias and STD because of the plug-in estimator . The curves can still be fitted well; that is, the estimated curves of drift and diffusion are close to the true ones. Figure 3 presents the estimated curves for diffusion and drift from one simulation.

Finally, I choose relatively large , namely 0.05 and 0.12, which represent different levels of volatility. From Tables 4 and 5 and Figures 4 and 5, I can see that the performance remains acceptable, which suggests that the approach could be applied more widely.

Figure 3: panels (a) and (b).

Figure 4: panels (a) and (b).

Figure 5: panels (a) and (b).

#### 6. Proofs

*Proof of Theorem 2.* Denote . By the Taylor expansion and formula (19), I have
Furthermore,
From the conditions of Markov process and -mixing coefficient,

Note that , where , . Thus and furthermore
Notably, both the conditional expectation and the conditional variance are independent of , so the conditioning can be dropped.

From Lemma 1 of Politis and Romano [30] and the relation between the -mixing condition and the -mixing condition (e.g., Theorem of [22]), I can ensure that , is a -mixing-dependent process and the mixing coefficient, denoted by , satisfies
where is a positive constant. Finally, I use the central limit theorem for -mixing-dependent processes (e.g., Theorem of [22]) to complete the proof.

*Proof of Theorem 3.* I present the basic results for , which lead to the rate of convergence and asymptotic expansions. Similar to Cui et al. [31] or Su and Lin [14], I need the following decomposition:
By condition (d), and all eigenvalues of are bounded. Furthermore, by the uniform weak consistency of the kernel estimator for mixing-dependent variables (see, e.g., [32, 33]), I have
By the proof of Theorem 2, I have
It is easy to deduce that , , , and . Then I naturally get
Denote . Obviously . Combining the results above, I can see that
where is an unobservable sequence of independent and identically distributed random variables with mean zero and variance one.

From and the central limit theorem it follows that

On the other hand, the expectation of does not converge to zero; that is to say, I have
while . For simplicity, take the third component as an example to estimate the value. From Lemma 1 of Politis and Romano [30], I can see that , are both -mixing-dependent processes with the mixing coefficient, again denoted by , satisfying
I can easily verify that, as , .

By condition (d), the variance is bounded uniformly for . Then
Summing up the above and using the independence between and , I get
Since the asymptotic bias , it follows that
This completes the proof.

#### Acknowledgments

This paper was supported by NBRP (973 Program 2007CB814901) of China, NNSF Project (10771123) of China, RFDP (20070422034) of China, and NSF Projects (Y2006A13 and Q2007A05) of Shandong Province of China.