Research Article | Open Access
Superoptimal Rate of Convergence in Nonparametric Estimation for Functional Valued Processes
We consider the nonparametric estimation of the generalized regression function for continuous time processes with irregular paths when the regressor takes values in a semimetric space. We establish the mean-square convergence of our estimator with the same superoptimal rate as when the regressor is real valued.
Since the pioneering works of [1, 2], the nonparametric estimation of the regression function has been very widely studied for real and vector-valued regressors (see, e.g., [3–8]) and, more recently, the case where the regressor takes values in a semimetric space of infinite dimension has been addressed. Interest in this type of explanatory variable has increased quickly since the foundational work of Ramsay and Silverman (1997), who proposed efficient methods for linear modelling (see  for a reissue of this work or [10, 11] for other developments on this topic). Later, fully nonparametric methods were proposed (e.g., [12–15]), but the increased generality comes at a price in terms of convergence rate: in the regression estimation framework, it is well known that the efficiency of a nonparametric estimator decreases quickly as the dimension of the regressor grows. This problem, known as the “curse of dimensionality,” is due to the sparsity of data in high-dimensional spaces. However, when studying continuous time processes with irregular paths, it has been shown in  that even when the regressor is -valued, we can estimate the regression function with the parametric rate of convergence . This kind of superoptimal rate of convergence for nonparametric estimators is always obtained under hypotheses on the joint probability density functions of the process which are very similar to those introduced by . Since there is no equivalent of the Lebesgue measure on an infinite-dimensional Hilbert space, the definition of a density is less natural in the infinite-dimensional framework and the classical techniques cannot be applied. Under hypotheses on small-ball probabilities, we show that superoptimal rates of convergence can be reached for the nonparametric estimation of the regression function when the regressor takes values in an infinite-dimensional space.
Notation and assumptions are presented in Section 2. Section 3 introduces our estimator and the main result. We comment on the hypotheses and results and give some examples of processes fulfilling our hypotheses in Section 4. A numerical study can be found in Section 5. The proofs are postponed to Section 6.
2. Problem and Assumptions
Let be a measurable continuous time process defined on a probability space and observed for , where is real valued and takes values in a semimetric vectorial space equipped with the semimetric . We suppose that the law of does not depend on and that there exists a regular version of the conditional probability distribution of , given (see [18–20] for conditions giving the existence of the conditional probability). Throughout this paper, denotes a compact set of . Let be a real valued Borel function defined on and consider the generalized regression function We aim to estimate from .
We gather hereafter the assumptions that are needed to establish our result.
(H1) For any and any , set . There exist three constants such that, for any and any , we have
(H2) There exist
(i) a function and three constants such that, for any and any , we have
(ii) a constant and a function integrable on such that, for any , any , and any , we have
(H3) For any , we set . There exists an integrable bounded function on such that, for any , we have
(H4) Let be the sigma-algebra generated by . There exists a constant , not depending on , such that
3. Estimator and Result
We define the generalized regression function estimate by where is the indicator function on and is a bandwidth decreasing to when . Note that this estimator is the same as the one defined in [21, page 130], with the semimetric used instead of the simple difference of the real case.
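A minimal numerical sketch of this estimator follows. It assumes, consistently with the numerical study's description of the estimator as an infinite-dimensional Nadaraya-Watson estimator with a uniform kernel, that the estimate at x is the ratio of ∫_0^T φ(Y_t) 1{d(x, X_t) ≤ h_T} dt to ∫_0^T 1{d(x, X_t) ≤ h_T} dt. It also assumes that the path is recorded on a regular time grid of step `dt`, so that both integrals become Riemann sums, and that each X_t is a curve sampled on a common grid, with the L2 distance playing the role of the semimetric. The function names and these discretization choices are illustrative, not the authors' implementation.

```python
import numpy as np


def l2_distances(x, curves, grid):
    """L2 distances between the curve x and each row of `curves` (trapezoidal rule)."""
    sq = (curves - x) ** 2                      # shape (n, m) by broadcasting
    widths = np.diff(grid)                      # shape (m - 1,)
    return np.sqrt(np.sum((sq[:, 1:] + sq[:, :-1]) / 2.0 * widths, axis=1))


def continuous_time_estimate(x, X_path, Y_path, grid, h, dt, phi=lambda y: y):
    """Riemann-sum version of the uniform-kernel continuous-time estimate at x.

    x      : curve of shape (m,) at which the regression function is estimated
    X_path : array of shape (n, m); row k is the curve X_t at time t_k = k * dt
    Y_path : array of shape (n,); Y_path[k] is the response Y_t at time t_k
    h      : bandwidth
    phi    : real valued function applied to the responses (identity by default)
    """
    in_ball = (l2_distances(x, X_path, grid) <= h).astype(float)  # 1{d(x, X_t) <= h}
    time_in_ball = np.sum(in_ball) * dt            # approximates the denominator integral
    if time_in_ball == 0.0:
        return np.nan                              # the path never enters the ball
    weighted = np.sum(phi(Y_path) * in_ball) * dt  # approximates the numerator integral
    return weighted / time_in_ball
```

The factor `dt` cancels in the ratio. It is kept only to mirror the two time integrals.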
Theorem 1 explores the performance of in terms of mean-square error.
We can compare this rate of convergence with the one obtained for discrete time processes in , which is, with our notation, Note that, with infinite-dimensional variables, can decrease to zero at an exponential rate when tends to zero, so that have to tend to zero at a logarithmic rate.
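The sketch below shows why exponentially small ball probabilities force a logarithmic bandwidth. As a purely illustrative assumption, it takes a small-ball function of the form φ_x(h) = exp(-C h^(-α)); the exact form is not given in the text. It then requires φ_x(h_n) ≥ n^(-a) for some 0 < a < 1, so that the expected number of observations in the ball, n·φ_x(h_n), grows at least like n^(1-a). Solving gives h_n ≥ (C / (a log n))^(1/α), which the Python sketch evaluates. C, α and a are hypothetical parameters.

```python
import math


def smallest_admissible_bandwidth(n, C=1.0, alpha=2.0, a=0.5):
    """Smallest h such that exp(-C * h**(-alpha)) >= n**(-a).

    Hypothetical small-ball model: phi_x(h) = exp(-C * h**(-alpha)).
    exp(-C h^-alpha) >= n^-a  <=>  h >= (C / (a * log n)) ** (1 / alpha).
    """
    if n <= 1:
        raise ValueError("n must be larger than 1")
    return (C / (a * math.log(n))) ** (1.0 / alpha)


for n in (10**2, 10**4, 10**6, 10**12):
    print(f"n = {n:>14d}  h_n = {smallest_admissible_bandwidth(n):.3f}")
```

With the default parameters, h_n only drops from about 0.66 to about 0.27 while n grows from 10^2 to 10^12.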
4. Comments and Examples
(H1) is a very classical Hölderian condition on the true regression function, but, in the infinite-dimensional framework, this condition depends on the semimetric used.
The assumption on small-ball probabilities given in (H2)-(i) is widely used in nonparametric estimation for functional data (see, e.g., the monograph ). However, we want to point out that if we define equivalence classes using the semidistance , we can construct a quotient space on which is a distance, and if this quotient space is infinite-dimensional, then this condition can be satisfied only very locally, in the sense that for any point of our compact , we can find, for any , a point and a positive number such that and : in that case, we could not extend our hypothesis to every point in an open ball (see  for a result on the consequences of a similar hypothesis holding at every point of a ball).
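The small-ball probability appearing in (H2)-(i) is unknown in practice. The sketch below adds an assumption not made in the text, namely that the process is ergodic. Under it, a natural proxy at a point x is the proportion of time the observed path spends in the ball of radius h around x, which is the normalized time integral of the indicator kernel 1{d(x, X_t) ≤ h}. The sketch computes this occupation-time proportion on a regular time grid, using the L2 distance between curves sampled on a common grid as a stand-in for the semimetric. All names are illustrative.

```python
import numpy as np


def l2_distances(x, curves, grid):
    """L2 distances between the curve x and each row of `curves` (trapezoidal rule)."""
    sq = (curves - x) ** 2
    return np.sqrt(np.sum((sq[:, 1:] + sq[:, :-1]) / 2.0 * np.diff(grid), axis=1))


def occupation_small_ball(x, X_path, grid, radii):
    """Fraction of observation times at which d(x, X_t) <= h, for each h in `radii`.

    On a regular time grid this approximates (1/T) * integral_0^T 1{d(x, X_t) <= h} dt,
    which, under an assumed ergodicity condition, estimates P(d(x, X_0) <= h).
    """
    dists = l2_distances(x, X_path, grid)
    return np.array([np.mean(dists <= h) for h in radii])
```

Evaluating this proxy over a decreasing sequence of radii gives a rough empirical picture of how fast the small-ball probability vanishes at x.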
The most specific and restrictive assumption is (H2)-(ii), which adapts to infinite-dimensional processes the conditions on the density function introduced in  for real valued processes and transposed in [21, pages 135-136] to the estimation of the regression function with a vector-valued regressor. Note that when and , the rate of convergence obtained in Theorem 5.3 in [21, page 136] is the same as the one we obtain here, and the condition I2 used there implies (H2)-(ii). On the other hand, processes can meet (H2)-(ii) and violate the condition in , especially when the vector-valued process does not admit a density. For real valued processes, a slightly different version of the Castellana and Leadbetter hypothesis on the joint density is given in , where it is shown that this hypothesis is satisfied for a wide class of diffusion processes, including the Ornstein-Uhlenbeck process: these processes also illustrate the range of applications of our result. Real continuous-time fractional ARMA processes studied in  are given as examples in . Depending on the choice of the impulse response functions, a vector composed of such processes can fulfil (H2)-(ii) for any : using the notation of , if are independent processes complying with the conditions of Proposition 4 in  with and , then the vector-valued process meets (H2)-(ii). For processes valued in infinite-dimensional spaces, we can also give the example of hidden processes: let be an unobserved process valued in , for which the conditions of Theorem 5.3 in [21, page 136] hold for every in a compact , let be an unknown function from to a space (possibly infinite-dimensional) equipped with a semimetric , and let be an observed process. If there exist two positive constants such that for any , , then fulfills (H2) with and . Note that even if with , does not satisfy the assumptions usually imposed on vector-valued processes to obtain a superoptimal rate.
There are two conditions in (H3). The condition is less restrictive than imposing that the regressor and the noise are independent. is a weak condition on the decay of dependence as the distance between observations increases, and may not be -mixing. Note that we do not require to be an irregular path process.
Finally, it is much less restrictive to impose (H4) than to suppose that is bounded. In particular, this assumption allows us to consider the model where is a bounded function, is a square integrable process, and and are independent.
On a given space, we can define many semidistances, and hypotheses (H1)-(H2), as well as the estimator itself, depend largely on the choice of this semidistance: the importance of this choice is widely discussed in  and a method to choose the semimetric for independent variables is proposed in , but this method does not ensure that (H1) holds. Actually, we can obtain a semimetric such that . It would be of interest to develop a data-driven method adapted to continuous time processes to select the semimetric.
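The sketch below illustrates how a semimetric differs from a metric. As an example of our own choosing, it implements d(f, g) = ||f' - g'||_L2, a derivative-based distance of the kind often used with functional data. Two curves that differ by a constant are at distance zero under d, so d only separates the equivalence classes of the quotient space discussed in this section. Derivatives are approximated with `numpy.gradient` on the sampling grid, and the function names are hypothetical.

```python
import numpy as np


def l2_norm(v, grid):
    """L2 norm of a curve sampled on `grid` (trapezoidal rule)."""
    sq = v ** 2
    return np.sqrt(np.sum((sq[1:] + sq[:-1]) / 2.0 * np.diff(grid)))


def l2_metric(f, g, grid):
    """Usual L2 distance between sampled curves: a genuine metric."""
    return l2_norm(f - g, grid)


def derivative_semimetric(f, g, grid):
    """d(f, g) = ||f' - g'||_L2, derivatives approximated by finite differences.

    Only a semimetric: d(f, f + c) = 0 for every constant c.
    """
    return l2_norm(np.gradient(f - g, grid), grid)


grid = np.linspace(0.0, 1.0, 201)
f = np.sin(2 * np.pi * grid)
g = f + 1.0                                # f shifted by a constant
print(l2_metric(f, g, grid))               # approximately 1.0
print(derivative_semimetric(f, g, grid))   # approximately 0.0
```

Under such a choice, an estimator built on d cannot distinguish curves that differ only by a vertical shift. This is exactly why hypotheses (H1)-(H2) depend on the semimetric.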
In the statement of our theorem, we impose that where is an unknown parameter, so adapting to continuous time processes the bandwidth selection method developed in  would be interesting but is not necessary in theory in our framework. Indeed, and this is what was very surprising when Castellana and Leadbetter first obtained a superoptimal rate of convergence, the bound for the variance of the estimator does not depend on , and we can choose , which will always satisfy for large enough: even if this choice has no reason to be optimal, it leads to the claimed superoptimal rate of convergence.
Recently, results have been obtained in the case where the response is valued in a Banach space, which can be infinite-dimensional (see [29, 30]). Note that as long as is a real valued Borel function, there is no need to change our proofs to obtain our result if is valued in a Banach space. However, in the case where is a Banach valued variable, we could not easily adapt our proofs, and obtaining a superoptimal rate would involve very different techniques; it would be an interesting extension for future work.
We chose endowed with its natural norm as the functional space and simulated our process as follows.
First, we simulated an Ornstein-Uhlenbeck process, solution of the stochastic differential equation where denotes a Wiener process. Here, we took .
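An Ornstein-Uhlenbeck path can be generated on a fine time grid without discretization error by using its exact Gaussian transition. The sketch below assumes the standard parametrization dX_t = -θ X_t dt + σ dW_t with θ > 0. The values of θ, σ and the step `dt` are placeholders, not the values used in this study. The path starts from the stationary law N(0, σ²/(2θ)), so the law of the simulated process does not depend on t, as required in Section 2.

```python
import numpy as np


def simulate_ou(T, dt, theta=1.0, sigma=1.0, rng=None):
    """Exact simulation of dX_t = -theta X_t dt + sigma dW_t at times k * dt, 0 <= k * dt <= T.

    Uses X_{t+dt} = a X_t + N(0, s^2 (1 - a^2)) with a = exp(-theta dt) and
    s^2 = sigma^2 / (2 theta), the stationary variance.
    """
    if theta <= 0 or dt <= 0:
        raise ValueError("theta and dt must be positive")
    rng = np.random.default_rng() if rng is None else rng
    n_steps = int(round(T / dt))
    stationary_sd = sigma / np.sqrt(2.0 * theta)
    a = np.exp(-theta * dt)
    innovations = rng.normal(0.0, stationary_sd * np.sqrt(1.0 - a * a), size=n_steps)
    path = np.empty(n_steps + 1)
    path[0] = rng.normal(0.0, stationary_sd)     # start in the stationary law
    for k in range(n_steps):
        path[k + 1] = a * path[k] + innovations[k]
    return path


# Placeholder usage: a path on [0, 100] with time step 0.01 (10 001 values).
path = simulate_ou(T=100.0, dt=0.01, rng=np.random.default_rng(0))
```

The exact transition is used instead of an Euler scheme so that the grid step only affects how finely the path is recorded, not its law.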
Denoting the floor function by , let be the function from to defined by where is the Legendre polynomial of degree and . Then we define our functional process for any setting For any square integrable function on , we define the function and set where and is a Wiener process independent of .
In order to obtain a panel of 20 points (in ) where we can evaluate the regression function, we ran a first simulation with and set . Once obtained, is treated as a deterministic set. We represent these functions in Figure 1.
Remark. We check here that the simulated processes fulfil our hypotheses.
First, denoting by the identity function on , for any , we have and satisfies (H1) with .
The Ornstein-Uhlenbeck process satisfies the part of Condition I2 on the regressor's density in [21, page 136]. Moreover, is a bijection from to , and it can be shown that, for some constant , there exist such that for any and any , the following two implications hold: which implies that (H2)-(i) and (H2)-(ii) are fulfilled when taking .
Since and are independent and if , (H3) is satisfied.
Finally, the model used in the simulation corresponds to the choice of the identity function for in (1), where is an unbounded process and is not a bounded function. However, is bounded on and so (H4) is fulfilled.
We simulated the paths of the process for different values of . Figure 2 represents the path of the process for .
We estimated the regression function at each point in , for different values of , and compared our results to those obtained when studying a discrete time functional process, that is, when we observe only for , and we use the estimator defined in  with the indicator function as the kernel: it corresponds to an infinite-dimensional version of the Nadaraya-Watson estimator with a uniform kernel. When working with the discrete time process, we used the data-driven bandwidth selection method proposed in . When working with the continuous time process, which is observed on a very fine grid, for , we chose the same bandwidth as the one used for the discrete time process and, for , we assumed to be Lipschitz (i.e., , which is the case here) and used the bandwidth . In Table 1, we give the mean square error evaluated on the functions of the panel for different , 500, and 2000.
We can see that, for , we already have a smaller mean square error with the estimator using the continuous time process, and as increases, the mean square error seems to decrease much more quickly when working with the continuous time process. However, the continuous time approach requires much more computing time and memory; we had to split the calculation into several parts and delete intermediate results to avoid saturating memory.
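The comparison between the two estimators can be sketched as follows. When the continuous time path is recorded on a regular fine grid, the Riemann-sum version of the uniform-kernel continuous-time estimator reduces to the uniform-kernel Nadaraya-Watson estimator computed from all grid points. The discrete time estimator uses only a subsample of those points; here, hypothetically, every `step`-th point. The sketch evaluates the mean square error over a panel of curves with known regression values. The L2 distance, the subsampling scheme, the shared fixed bandwidth `h` and all names are illustrative assumptions.

```python
import numpy as np


def l2_distances(x, curves, grid):
    """L2 distances between the curve x and each row of `curves` (trapezoidal rule)."""
    sq = (curves - x) ** 2
    return np.sqrt(np.sum((sq[:, 1:] + sq[:, :-1]) / 2.0 * np.diff(grid), axis=1))


def uniform_kernel_estimate(x, curves, responses, grid, h):
    """Functional Nadaraya-Watson estimate with the indicator kernel 1{d <= h}."""
    in_ball = l2_distances(x, curves, grid) <= h
    return responses[in_ball].mean() if in_ball.any() else np.nan


def panel_mse(panel, true_values, curves, responses, grid, h):
    """Mean square error over the panel (NaN if some ball contains no observation)."""
    estimates = np.array([uniform_kernel_estimate(x, curves, responses, grid, h) for x in panel])
    return np.mean((estimates - true_values) ** 2)


def compare(panel, true_values, curves, responses, grid, h, step):
    """MSE using every grid point ("continuous time") versus every step-th one ("discrete time")."""
    mse_continuous = panel_mse(panel, true_values, curves, responses, grid, h)
    mse_discrete = panel_mse(panel, true_values, curves[::step], responses[::step], grid, h)
    return mse_continuous, mse_discrete
```

The memory cost reported above is visible here as well: the continuous time version computes one distance per grid point and per panel curve, that is, `step` times as many as the discrete time version.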
In Figures 3 and 4, the abscissa gives the true value of the regression function at each function of our panel and the ordinate gives the estimated value of the regression function. The results for the continuous time estimator are shown on the left and those for the discrete time estimator on the right.
(a) Outputs for = 500
(b) Outputs for = 2000
6.1. Intermediary Results
In the sequel, we use the following notations:
Lemma 2 below studies the behavior of the bias of .
Lemma 3. Under the conditions of Theorem 1, one has
6.2. Proofs of the Intermediary Results
For the sake of conciseness, when no confusion is possible, we use the notations and .
Proof of Lemma 3. For any , by Fubini's Theorem, we have
Upper Bound for the Covariance Term. To simplify notation, we set and . Note that
Therefore, the covariance term can be expanded as follows: Set We have with The triangle inequality and Jensen's inequality yield where
Upper Bound for . Using (H2)-(ii), we have
Upper Bound for . Owing to (H1), we have . It follows from this inequality and (H2)-(i) that
Upper Bound for . Using techniques similar to those used for the bound on , together with (H3), we obtain On the other hand, by (H2)-(ii), Hence, Therefore, setting the upper bounds obtained for , , and yield
Final Bound. Combining (24) and (38) and using (H2)-(i), we have
Since and are integrable and is bounded on and , there exists a constant such that The special choice of leads us to This last inequality concludes the proof of Lemma 3.
Proof of Theorem 1. We can write
The elementary inequality: , , yields
Upper Bound for . Lemma 3 yields
Upper Bound for . Lemma 2 yields
Upper Bound for . We define, for any , the quantity: Note that, when , so that
Using (H4) and Lemma 3, we get Similarly, (H4), Lemma 3, and Chebyshev's inequality lead to We finally obtain Putting the obtained upper bounds for , , and together, we have Theorem 1 is proved.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
The authors wish to thank the Editor and two anonymous referees for their constructive suggestions, which led to improvements over earlier versions of the paper.
- E. Nadaraya, “On estimating regression,” Theory of Probability and Its Applications, vol. 9, pp. 141–142, 1964.
- G. Watson, “Smooth regression analysis,” Sankhyā, vol. 26, no. 4, pp. 359–372, 1964.
- M. Rosenblatt, “Conditional probability density and regression estimators,” in Multivariate Analysis II (Proceedings of the 2nd International Symposium, Dayton, Ohio, 1968), pp. 25–31, Academic Press, New York, NY, USA, 1969.
- C. J. Stone, “Optimal global rates of convergence for nonparametric regression,” The Annals of Statistics, vol. 10, no. 4, pp. 1040–1053, 1982.
- G. Collomb and W. Härdle, “Strong uniform convergence rates in robust nonparametric time series analysis and prediction: kernel regression estimation from dependent observations,” Stochastic Processes and their Applications, vol. 23, no. 1, pp. 77–89, 1986.
- A. Krzyzak and M. Pawlak, “The pointwise rate of convergence of the kernel regression estimate,” Journal of Statistical Planning and Inference, vol. 16, no. 2, pp. 159–166, 1987.
- G. G. Roussas, “Nonparametric regression estimation under mixing conditions,” Stochastic Processes and Their Applications, vol. 36, no. 1, pp. 107–116, 1990.
- D. Bosq, “Vitesses optimales et superoptimales des estimateurs fonctionnels pour les processus à temps continu,” Comptes Rendus de l'Académie des Sciences I: Mathematics, vol. 317, no. 11, pp. 1075–1078, 1993.
- J. O. Ramsay and B. W. Silverman, Functional Data Analysis, Springer Series in Statistics, Springer, New York, NY, USA, 2nd edition, 2005.
- J. O. Ramsay and B. W. Silverman, Applied Functional Data Analysis: Methods and Case Studies, Springer Series in Statistics, Springer, New York, NY, USA, 2002.
- L. Horváth and P. Kokoszka, Inference for Functional Data with Applications, Springer Series in Statistics, Springer, New York, NY, USA, 2012.
- F. Ferraty and P. Vieu, “Nonparametric models for functional data, with application in regression, time-series prediction and curve discrimination,” Journal of Nonparametric Statistics, vol. 16, no. 1-2, pp. 111–125, 2004, The International Conference on Recent Trends and Directions in Nonparametric Statistics.
- E. Masry, “Nonparametric regression estimation for dependent functional data: asymptotic normality,” Stochastic Processes and Their Applications, vol. 115, no. 1, pp. 155–177, 2005.
- F. Ferraty, A. Mas, and P. Vieu, “Nonparametric regression on functional data: inference and practical aspects,” Australian & New Zealand Journal of Statistics, vol. 49, no. 3, pp. 267–286, 2007.
- F. Ferraty and P. Vieu, “Kernel regression estimation for functional data,” in The Oxford Handbook of Functional Data Analysis, pp. 72–129, Oxford University Press, Oxford, UK, 2011.
- D. Bosq, “Parametric rates of nonparametric estimators and predictors for continuous time processes,” The Annals of Statistics, vol. 25, no. 3, pp. 982–1000, 1997.
- J. V. Castellana and M. R. Leadbetter, “On smoothed probability density estimation for stationary processes,” Stochastic Processes and Their Applications, vol. 21, no. 2, pp. 179–193, 1986.
- M. Jirina, “Conditional probabilities on strictly separable σ-algebras,” Czechoslovak Mathematical Journal, vol. 4, no. 79, pp. 372–380, 1954.
- M. Jirina, “On regular conditional probabilities,” Czechoslovak Mathematical Journal, vol. 9, no. 84, pp. 445–451, 1959.
- R. Grunig, “Probabilités conditionnelles régulières sur des tribus de type non dénombrable,” Annales de l'institut Henri Poincaré B: Probabilités et Statistiques, vol. 2, no. 3, pp. 227–229, 1966.
- D. Bosq, Nonparametric Statistics for Stochastic Processes: Estimation and Prediction, vol. 110 of Lecture Notes in Statistics, Springer, New York, NY, USA, 2nd edition, 1998.
- F. Ferraty and P. Vieu, Nonparametric Functional Data Analysis: Theory and Practice, Springer Series in Statistics, Springer, New York, NY, USA, 2006.
- J. Azaïs and J.-C. Fort, “Remark on the finite-dimensional character of certain results of functional statistics,” Comptes Rendus Mathematique, vol. 351, no. 3-4, pp. 139–141, 2013.
- F. Leblanc, “Density estimation for a class of continuous time processes,” Mathematical Methods of Statistics, vol. 6, no. 2, pp. 171–199, 1997.
- M.-C. Viano, C. Deniau, and G. Oppenheim, “Continuous-time fractional ARMA processes,” Statistics and Probability Letters, vol. 21, no. 4, pp. 323–336, 1994.
- D. Blanke, “Sample paths adaptive density estimation,” Mathematical Methods of Statistics, vol. 13, no. 2, pp. 123–152, 2004.
- C. Timmermans, L. Delsol, and R. von Sachs, “Using Bagidis in nonparametric functional data analysis: predicting from curves with sharp local features,” Journal of Multivariate Analysis, vol. 115, pp. 421–444, 2013.
- K. Benhenni, F. Ferraty, M. Rachdi, and P. Vieu, “Local smoothing regression with functional data,” Computational Statistics, vol. 22, no. 3, pp. 353–369, 2007.
- F. Ferraty, A. Laksaci, A. Tadj, and P. Vieu, “Kernel regression with functional response,” Electronic Journal of Statistics, vol. 5, pp. 159–171, 2011.
- F. Ferraty, I. van Keilegom, and P. Vieu, “Regression when both response and predictor are functions,” Journal of Multivariate Analysis, vol. 109, pp. 10–28, 2012.
Copyright © 2014 Christophe Chesneau and Bertrand Maillot. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.