#### Abstract

We consider the nonparametric estimation of the generalized regression function for continuous time processes with irregular paths when the regressor takes values in a semimetric space. We establish the mean-square convergence of our estimator with the same superoptimal rate as when the regressor is real valued.

#### 1. Introduction

Since the pioneering works of [1, 2], the nonparametric estimation of the regression function has been very widely studied for real and vectorial regressors (see, e.g., [3–8]) and, more recently, the case where the regressor takes values in a semimetric space of infinite dimension has been addressed. Interest in this type of explanatory variable has increased quickly since the foundational work of Ramsay and Silverman (1997), who proposed efficient methods for linear modelling (see [9] for a reissue of this work or [10, 11] for other developments on this topic). Fully nonparametric methods were proposed later (e.g., [12–15]), but the increased generality comes at a price in terms of convergence rate: in the regression estimation framework, it is well known that the efficiency of a nonparametric estimator decreases quickly as the dimension of the regressor grows. This problem, known as the “curse of dimensionality,” is due to the sparsity of data in high-dimensional spaces. However, for continuous time processes with irregular paths, it has been shown in [16] that, even when the regressor is $\mathbb{R}^d$-valued, the regression function can be estimated with the parametric rate of convergence. This kind of superoptimal rate of convergence for nonparametric estimators is always obtained under hypotheses on the joint probability density functions of the process which are very similar to those introduced by [17]. Since there is no equivalent of the Lebesgue measure on an infinite-dimensional Hilbert space, the definition of a density is less natural in the infinite-dimensional framework and the classical techniques cannot be applied. Under hypotheses on small-ball probabilities, we show that superoptimal rates of convergence can be reached for the nonparametric estimation of the regression function when the regressor takes values in an infinite-dimensional space.

Notations and assumptions are presented in Section 2. Section 3 introduces our estimator and the main result. We comment on hypotheses and results and give some examples of processes fulfilling our hypotheses in Section 4. A numerical study can be found in Section 5. The proofs are postponed to Section 6.

#### 2. Problem and Assumptions

Let $(X_t, Y_t)_{t \geq 0}$ be a measurable continuous time process defined on a probability space $(\Omega, \mathcal{A}, P)$ and observed for $t \in [0, T]$, where $Y_t$ is real valued and $X_t$ takes values in a semimetric vectorial space $\mathcal{H}$ equipped with the semimetric $d(\cdot, \cdot)$. We suppose that the law of $(X_t, Y_t)$ does not depend on $t$ and that there exists a regular version of the conditional probability distribution of $Y_t$, given $X_t$ (see [18–20] for conditions giving the existence of the conditional probability). Throughout this paper, $\mathcal{C}$ denotes a compact set of $\mathcal{H}$. Let $\Psi$ be a real valued Borel function defined on $\mathbb{R}$ and consider the generalized regression function

$$r(x) = E\left(\Psi(Y_0) \mid X_0 = x\right). \tag{1}$$

We aim to estimate $r$ from $(X_t, Y_t)_{t \in [0, T]}$.

We gather hereafter the assumptions that are needed to establish our result.

(H1) For any and any , set . There exist three constants such that, for any and any , we have

(H2) There exist

(i) a function and three constants such that, for any and any , we have

(ii) a constant and a function integrable on such that, for any , any , and any , we have

(H3) For any , we set . There exists an integrable bounded function on such that, for any , we have

(H4) Let be the sigma-algebra generated by . There exists a constant , not depending on , such that

#### 3. Estimator and Result

We define the generalized regression function estimate by where is the indicator function on and is a bandwidth decreasing to $0$ as $T \to \infty$. Note that this estimator is the same as the one defined in [21, page 130], with the semimetric used in place of the simple difference used in the real case.
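As a minimal sketch of how such an estimate can be computed in practice, the Python function below approximates the two time integrals of a uniform-kernel (indicator) ratio estimator by Riemann sums over an equally spaced observation grid. The function names, the discretized $L^2$ semimetric, the equally spaced grid, and the value returned when no observation falls in the ball are all assumptions made for illustration; they are not taken from the paper.

```python
import math
from typing import Callable, Sequence


def l2_semimetric(u: Sequence[float], v: Sequence[float]) -> float:
    """Discretized L2 distance between two curves sampled on the same grid.

    This particular semimetric is an assumption chosen for illustration.
    """
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)) / len(u))


def uniform_kernel_regression(
    x: Sequence[float],
    X: Sequence[Sequence[float]],
    Y: Sequence[float],
    h: float,
    psi: Callable[[float], float] = lambda y: y,
    d: Callable[[Sequence[float], Sequence[float]], float] = l2_semimetric,
    default: float = 0.0,
) -> float:
    """Riemann-sum approximation of
        int psi(Y_t) 1{d(x, X_t) <= h} dt / int 1{d(x, X_t) <= h} dt
    for a path (X_t, Y_t) observed on an equally spaced time grid.

    `default` is returned when no observation lies within distance h of x
    (a hypothetical convention, not necessarily the paper's).
    """
    num = 0.0
    den = 0.0
    for X_t, Y_t in zip(X, Y):
        if d(x, X_t) <= h:  # indicator of the ball of radius h around x
            num += psi(Y_t)
            den += 1.0
    return num / den if den > 0 else default
```

On an equally spaced grid the time step cancels between numerator and denominator, so the ratio reduces to the average of `psi(Y_t)` over the observation times at which `X_t` lies within distance `h` of `x`.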

Theorem 1 explores the performance of in terms of mean-square error.

Theorem 1. *Suppose that (H1)–(H4) hold. Let be (1) and be (7) defined with . Then, one has
*

We can compare this rate of convergence with the one obtained for discrete time processes in [14], which is, with our notations, Note that, with infinite-dimensional variables, can decrease to zero at an exponential rate as tends to zero, so that has to tend to zero at a logarithmic rate.
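The link between an exponentially small ball probability and a logarithmic bandwidth can be illustrated by the following worked sketch. The notation ($\varphi$ for the small-ball probability, $h_n$ for the bandwidth, $n$ for the sample size, and the constants $c$, $\alpha$), the specific exponential form of $\varphi$, and the assumption that the variance term of the discrete-time rate is of order $1/(n\varphi(h_n))$ are all introduced here for illustration only.

```latex
% Assumed small-ball probability with exponential decay:
\varphi(h) = \exp\!\left(-\frac{c}{h^{\alpha}}\right), \qquad c > 0,\ \alpha > 0 .

% A variance term of order 1/(n \varphi(h_n)) vanishes only if
n\,\varphi(h_n) \to \infty
\iff \log n - \frac{c}{h_n^{\alpha}} \to \infty ,

% which forces, for n large enough,
h_n > \left(\frac{c}{\log n}\right)^{1/\alpha} .
```

Under these assumptions the bandwidth cannot shrink faster than a negative power of $\log n$, so a Hölder-type bias term in $h_n$ decays only logarithmically in $n$.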

#### 4. Comments and Examples

(H1) is a very classical Hölderian condition on the true regression function, but, in the infinite-dimensional framework, this condition depends on the semimetric used.

The assumption on small-ball probabilities given in (H2)-(i) is widely used in nonparametric estimation for functional data (see, e.g., the monograph [22]). However, we point out that if we define equivalence classes using the semidistance , we can construct a quotient space on which is a distance. If this quotient space is infinite-dimensional, then this condition can be satisfied only very locally, in the sense that, for any point of our compact , we can find, for any , a point and a positive number such that and : in that case, we could not extend our hypothesis to every point in an open ball (see [23] for a result on the consequences of a similar hypothesis on every point in a ball).

The most specific and restrictive assumption is (H2)-(ii), which is an adaptation to infinite-dimensional processes of the conditions on the density function introduced in [17] for real valued processes and transposed in [21, pages 135-136] to the estimation of the regression function with a vectorial regressor. Note that when and , the rate of convergence obtained in Theorem 5.3 in [21, page 136] is the same as the one we obtain here, and the condition I2 used implies (H2)-(ii). On the other hand, processes can meet (H2)-(ii) and violate the condition in [21], especially when the vectorial process does not admit a density. For real valued processes, a slightly different version of the Castellana and Leadbetter hypothesis on the joint density is given in [24], where it is shown that this hypothesis is satisfied for a wide class of diffusion processes, including the Ornstein-Uhlenbeck process: these processes are also examples of the range of applications of our result. Real continuous-time fractional ARMA processes studied in [25] are given as examples in [26]. Depending on the choice of the impulse response functions, a vector composed of such processes can fulfil (H2)-(ii) for any : using the notations of [25], if are independent processes complying with the conditions of Proposition 4 in [25] with and , then the vectorial process meets (H2)-(ii). For processes valued in infinite-dimensional spaces, we can also give the example of hidden processes: let be a nonobserved process valued in , for which the conditions of Theorem 5.3 in [21, page 136] hold for every in a compact , let be an unknown function from to a space (that can be infinite-dimensional) equipped with a semimetric , and let be an observed process. If there exist two positive constants such that for any , , then fulfills (H2) with and . Note that even if with , does not satisfy the assumptions usually imposed on vectorial processes to obtain a superoptimal rate.

There are two conditions in (H3). The condition is less restrictive than imposing that the regressor and the noise are independent. is a weak condition on the decay of dependence as the distance between observations increases, and may not be -mixing. Note that we do not require to be an irregular path process.

Finally, it is much less restrictive to impose (H4) than to suppose that is bounded. In particular, this assumption allows us to consider the model where is a bounded function, is a square integrable process, and and are independent.

On a given space, we can define many semidistances, and hypotheses (H1)-(H2), as well as the estimator itself, depend largely on the choice of this semidistance: the importance of this choice is widely discussed in [22], and a method to choose the semimetric for independent variables is proposed in [27], but this method does not ensure that (H1) holds. Actually, we can obtain a semimetric such that . It would be of interest to develop a data-driven method adapted to continuous time processes to select the semimetric.

In the statement of our theorem, we impose that where is an unknown parameter so that the adaptation to continuous time processes of the method developed in [28] to choose the bandwidth would be interesting but is not in theory necessary in our framework. In point of fact, and it is what was very surprising when Castellana and Leadbetter first obtained a superoptimal rate of convergence, the bound for the variance of the estimator does not depend on and we can choose which will always satisfy for large enough: even if this choice has no reason to be optimal, it leads to the claimed superoptimal rate of convergence.

Recently, results have been obtained in the case where the response is valued in a Banach space, which can be infinite-dimensional (see [29, 30]). Note that as long as is a real valued Borel function, there is no need to change our proofs to obtain our result if is valued in a Banach space. However, in the case where is a Banach valued variable, we could not easily adapt our proofs, and obtaining a superoptimal rate would involve very different techniques; it would be an interesting extension for future work.

#### 5. Simulations

We chose endowed with its natural norm as the functional space and simulated our process as follows.

First, we simulated an Ornstein-Uhlenbeck process, solution of the stochastic differential equation where denotes a Wiener process. Here, we took .
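A minimal sketch of such a simulation is given below. It assumes the standard form $dZ_t = -\theta Z_t\,dt + \sigma\,dW_t$ for the Ornstein-Uhlenbeck equation and uses its exact Gaussian transition over a step of length `dt`. The function name, the parameter names `theta`, `sigma`, `z0`, and their default values are hypothetical and are not the values used in the study.

```python
import math
import random
from typing import List


def simulate_ou(
    T: float,
    dt: float,
    theta: float = 1.0,   # mean-reversion speed (assumed value)
    sigma: float = 1.0,   # diffusion coefficient (assumed value)
    z0: float = 0.0,      # starting point (assumed value)
    seed: int = 0,
) -> List[float]:
    """Simulate dZ_t = -theta * Z_t dt + sigma dW_t on the grid 0, dt, 2*dt, ..., T.

    The exact transition is used:
        Z_{t+dt} = Z_t * exp(-theta*dt) + sd * N(0, 1),
        sd^2 = sigma^2 * (1 - exp(-2*theta*dt)) / (2*theta).
    """
    rng = random.Random(seed)
    n_steps = int(round(T / dt))
    decay = math.exp(-theta * dt)
    noise_sd = sigma * math.sqrt((1.0 - decay ** 2) / (2.0 * theta))
    path = [z0]
    for _ in range(n_steps):
        path.append(path[-1] * decay + noise_sd * rng.gauss(0.0, 1.0))
    return path
```

Using the exact transition rather than an Euler scheme avoids discretization bias, which matters when the path is sampled on a fine grid to mimic continuous observation.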

Denoting the floor function by , let be the function from to defined by where is the Legendre polynomial of degree and . Then we define our functional process for any setting For any square integrable function on , we define the function and set where and is a Wiener process independent of .

In order to obtain a panel of 20 points (in ) where we can evaluate the regression function, we did a first simulation with and set . Once obtained, is considered as a deterministic set. We represent these functions in Figure 1.

*Remark. *We check here that the simulated processes fulfil our hypotheses.

First, denoting by the identity function on , for any , we have and satisfies (H1) with .

The Ornstein-Uhlenbeck process satisfies the part of Condition I2 on the regressor's density in [21, page 136]. Moreover, is a bijection from to , and it can be shown that, for some constant , there exist such that for any and any , the two following implications are correct: which implies that (H2)(i)-(ii) are fulfilled when taking .

Since and are independent and if , (H3) is satisfied.

Finally, the model used in the simulation corresponds to the choice of the identity function for in (1), where is an unbounded process and is not a bounded function. However, is bounded on and so (H4) is fulfilled.

We simulated the paths of the process for different values of . Figure 2 represents the path of the process for .

We estimated the regression function at each point in , for different values of , and compared our results to those obtained when studying a discrete time functional process, that is, when we observe only for , and we use the estimator defined in [12] with the indicator function as the kernel: it corresponds to an infinite-dimensional version of the Nadaraya-Watson estimator with a uniform kernel. When working with the discrete time process, we used the data-driven way of choosing the bandwidth proposed in [28]. When working with the continuous time process, which is observed on a very fine grid, for , we chose the same bandwidth as the one used for the discrete time process and, for , we supposed to be Lipschitz (i.e., , which is the case here) and used the bandwidth . In Table 1, we give the mean square error evaluated on the functions of the panel for different , 500, and 2000.
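The two bookkeeping steps of this comparison can be sketched as follows: extracting the discrete time observations from a path stored on a fine grid, and averaging the squared errors over the panel. The helper names and the assumption that the fine grid step divides 1 exactly are hypothetical choices made for illustration.

```python
from typing import List, Sequence, TypeVar

V = TypeVar("V")


def subsample_integer_times(path: Sequence[V], dt: float) -> List[V]:
    """Keep only the observations at times 0, 1, 2, ... from a path
    stored on the grid 0, dt, 2*dt, ... (assumes 1/dt is an integer)."""
    step = int(round(1.0 / dt))
    return list(path[::step])


def panel_mse(estimates: Sequence[float], truths: Sequence[float]) -> float:
    """Mean square error between estimated and true regression values,
    averaged over the points of the panel."""
    if len(estimates) != len(truths):
        raise ValueError("estimates and truths must have the same length")
    return sum((e - t) ** 2 for e, t in zip(estimates, truths)) / len(truths)
```

The same `panel_mse` helper can be applied to the estimates obtained from the full fine-grid path and to those obtained from the subsampled path, which gives the two quantities being compared.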

We can see that, for , we already have a smaller mean square error with the estimator using the continuous time process, and when increases, the mean square error seems to decrease much more quickly when working with the continuous time process. However, the continuous time approach takes much more time and much more memory; we had to split the calculation into several parts and delete intermediate calculations to avoid saturating memory.

In Figures 3 and 4, we have in abscissa the value of the real regression function applied to each function of our panel and in ordinate the estimated value of the regression function. We represent on the left the results for the continuous time estimator and on the right the results for the discrete time estimator.

(a) Outputs for $T = 500$. (b) Outputs for $T = 2000$.

#### 6. Proofs

##### 6.1. Intermediary Results

In the sequel, we use the following notations:

Lemma 2 below studies the behavior of the bias of .

Lemma 2. *Under the conditions of Theorem 1, one has
*

Lemma 3 below provides an upper bound for the variances of and .

Lemma 3. *Under the conditions of Theorem 1, one has
*

##### 6.2. Proofs of the Intermediary Results

For the sake of conciseness, when no confusion is possible, we use the notations and .

* Proof of Lemma 2. *Observe that, for any ,
Hence,
Owing to (H1), we have . Therefore, by Jensen's inequality and , we have
This ends the proof of Lemma 2.

* Proof of Lemma 3. *For any , by Fubini's Theorem, we have
*Upper Bound of the Covariance Term.* In order to simplify the notations, we set and . Note that

Therefore, the covariance term can be expanded as follows:
Set
We have
with
The triangle inequality and Jensen's inequality yield
where
*Upper Bound for *. Using (H2)-(ii), we have
*Upper Bound for *. Owing to (H1), we have . It follows from this inequality and (H2)-(i) that
*Upper Bound for *. By techniques similar to those used in the bound for , together with (H3), we obtain
On the other hand, by (H2)-(ii),
Hence,
Therefore, setting
the obtained upper bounds for , , and yield
*Final Bound*. Combining (24) and (38) and using (H2)-(i), we have

Since and are integrable and is bounded on and , there exists a constant such that
The special choice of leads us to
This last inequality concludes the proof of Lemma 3.

* Proof of Theorem 1. *We can write
The elementary inequality: , , yields
where
*Upper Bound for *. Lemma 3 yields
*Upper Bound for *. Lemma 2 yields
*Upper Bound for *. We define, for any , the quantity:
Note that, when ,
so that

Using (H4) and Lemma 3, we get
Similarly, (H4), Lemma 3, and Chebyshev's inequality lead to
We finally obtain
Putting the obtained upper bounds for , , and together, we have
Theorem 1 is proved.

#### Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

#### Acknowledgments

The authors wish to thank the Editor and two anonymous referees for their constructive suggestions which led to some improvements in some earlier versions of the paper.