A Note on Wavelet Estimation of the Derivatives of a Regression Function in a Random Design Setting
We investigate the estimation of the derivatives of a regression function in the nonparametric regression model with random design. New wavelet estimators are developed. Their performances are evaluated via the mean integrated squared error. Fast rates of convergence are obtained for a wide class of unknown functions.
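We consider the nonparametric regression model with random design described as follows. Let $(Y_1,X_1),\ldots,(Y_n,X_n)$ be $n$ random variables defined on a probability space $(\Omega,\mathcal{A},\mathbb{P})$, where
$$Y_i=f(X_i)+\xi_i,\quad i\in\{1,\ldots,n\},\tag{1}$$
$\xi_1,\ldots,\xi_n$ are i.i.d. random variables such that $\mathbb{E}(\xi_1)=0$ and $\mathbb{E}(\xi_1^2)<\infty$, $X_1,\ldots,X_n$ are i.i.d. random variables with common density $g:[0,1]\to[0,\infty)$, and $f:[0,1]\to\mathbb{R}$ is an unknown regression function. It is assumed that $X_i$ and $\xi_i$ are independent for any $i\in\{1,\ldots,n\}$. We aim to estimate $f^{(m)}$, that is, the $m$th derivative of $f$, for any integer $m\geq0$, from $(Y_1,X_1),\ldots,(Y_n,X_n)$.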
In the literature, various estimation methods have been proposed and studied. The main ones are the kernel methods (see, e.g., [1–5]), the smoothing splines, and local polynomial methods (see, e.g., [6–9]). The object of this note is to introduce new efficient estimators based on wavelet methods. Unlike the others, they enjoy local adaptivity to discontinuities thanks to the use of a multiresolution analysis. Reviews on wavelet methods can be found in, for example, Antoniadis, Härdle et al., and Vidakovic. To the best of our knowledge, only Cai and Petsa and Sapatinas have proposed wavelet estimators for $f^{(m)}$ from (1), but defined with a deterministic equidistant design; that is, $X_i=i/n$. The consideration of a random design significantly complicates the problem, and no wavelet estimators exist in this case. This motivates our study.
In the first part, assuming that $g$ is known, we propose two wavelet estimators: the first one is linear and nonadaptive, and the second one is nonlinear and adaptive. Both use the approach of Prakasa Rao, initially developed in the context of the density estimation problem. Then we determine their rates of convergence by considering the mean integrated squared error (MISE) and assuming that $f^{(m)}$ belongs to Besov balls. In a second part, we develop a linear wavelet estimator in the case where $g$ is unknown. It is derived from the one introduced by Pensky and Vidakovic for the estimation of $f$ from (1). We evaluate its rate of convergence again under the MISE over Besov balls. The obtained rates of convergence are similar to those attained by wavelet estimators for the derivatives of a density (see, e.g., [15, 17, 18]).
The organization of this note is as follows. The next section describes some basics on wavelets and Besov balls. Our estimators and their rates of convergence are presented in Section 3. The proofs are carried out in Section 4.
This section is devoted to the presentation of the considered wavelet basis and the Besov balls.
2.1. Wavelet Basis
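We set
$$\mathbb{L}^2([0,1])=\left\{h:[0,1]\to\mathbb{R};\ \int_0^1h^2(x)\,dx<\infty\right\}.$$
We consider the wavelet basis on $[0,1]$ introduced by Cohen et al. Let $\phi$ and $\psi$ be the initial wavelet functions of the Daubechies wavelets family $\mathrm{db}2N$ with $N\geq1$. These functions are compactly supported and belong to the class $\mathcal{C}^a$ for $N>5a$. For any $j\geq0$, we set $\Lambda_j=\{0,\ldots,2^j-1\}$ and, for $k\in\Lambda_j$,
$$\phi_{j,k}(x)=2^{j/2}\phi(2^jx-k),\qquad\psi_{j,k}(x)=2^{j/2}\psi(2^jx-k).$$
With appropriate treatments at the boundaries, there exists an integer $\tau$ such that, for any integer $\ell\geq\tau$, the family $\{\phi_{\ell,k},\ k\in\Lambda_\ell;\ \psi_{j,k},\ j\geq\ell,\ k\in\Lambda_j\}$ forms an orthonormal basis of $\mathbb{L}^2([0,1])$. For any integer $\ell\geq\tau$ and $h\in\mathbb{L}^2([0,1])$, we have the following wavelet expansion:
$$h(x)=\sum_{k\in\Lambda_\ell}c_{\ell,k}\phi_{\ell,k}(x)+\sum_{j=\ell}^{\infty}\sum_{k\in\Lambda_j}d_{j,k}\psi_{j,k}(x),\quad x\in[0,1],$$
where
$$c_{j,k}=\int_0^1h(x)\phi_{j,k}(x)\,dx,\qquad d_{j,k}=\int_0^1h(x)\psi_{j,k}(x)\,dx.\tag{6}$$
These quantities are called the wavelet coefficients of $h$. See, for example, Cohen et al. and Mallat.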
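To make the index set $\Lambda_j$ and the scaling coefficients concrete, here is a minimal numerical sketch. It swaps in the Haar scaling function $\phi=\mathbf{1}_{[0,1)}$ instead of the smooth Daubechies wavelets used in this note (Haar is not differentiable, so it could not serve for derivative estimation). The function names, the test function $h(x)=x$, and the midpoint quadrature are illustrative assumptions.

```python
import numpy as np

def haar_phi(j, k, x):
    """Haar scaling function phi_{j,k}(x) = 2^{j/2} phi(2^j x - k), with phi = 1_[0,1)."""
    t = 2.0 ** j * x - k
    return 2.0 ** (j / 2) * ((t >= 0) & (t < 1))

def scaling_coefficients(h, j, grid):
    """Approximate c_{j,k} = int_0^1 h(x) phi_{j,k}(x) dx for k in Lambda_j (midpoint rule)."""
    return np.array([np.mean(h(grid) * haar_phi(j, k, grid)) for k in range(2 ** j)])

def projection(h, j, grid):
    """Evaluate sum over k in Lambda_j of c_{j,k} phi_{j,k} on the grid."""
    c = scaling_coefficients(h, j, grid)
    return sum(c[k] * haar_phi(j, k, grid) for k in range(2 ** j))

grid = (np.arange(4096) + 0.5) / 4096      # midpoints of a uniform grid on [0, 1]
h = lambda x: x                            # hypothetical test function
# Root mean squared projection error at two resolution levels.
err = {j: np.sqrt(np.mean((h(grid) - projection(h, j, grid)) ** 2)) for j in (1, 3)}
```

In this toy case the error shrinks as the level increases, which is the approximation property that the choice of resolution level trades off against variance later in the note.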
2.2. Besov Balls
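We consider the following wavelet sequential definition of the Besov balls. We say that $h\in B^s_{p,r}(M)$ with $s>0$, $p\geq1$, $r\geq1$, and $M>0$ if there exists a constant $C>0$ such that $c_{\tau,k}$ and $d_{j,k}$ (6) satisfy
$$2^{\tau(1/2-1/p)}\left(\sum_{k\in\Lambda_\tau}|c_{\tau,k}|^p\right)^{1/p}+\left(\sum_{j=\tau}^{\infty}\left(2^{j(s+1/2-1/p)}\left(\sum_{k\in\Lambda_j}|d_{j,k}|^p\right)^{1/p}\right)^r\right)^{1/r}\leq C,$$
with the usual modifications if $p=\infty$ or $r=\infty$.
The interest of Besov balls is that they contain various kinds of homogeneous and inhomogeneous functions $h$. For particular choices of $s$, $p$, and $r$, $B^s_{p,r}(M)$ corresponds to standard balls of function spaces, such as the Hölder and Sobolev balls (see, e.g., [11, 22]).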
In this section, we set the assumptions on the model, present our wavelet estimators, and determine their rates of convergence under the MISE over Besov balls.
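3.1. Assumptions
We formulate the following assumptions.
(K1) We have $f^{(q)}(0)=f^{(q)}(1)=0$ for any $q\in\{0,\ldots,m\}$.
(K2) There exists a constant $C_1>0$ such that $\sup_{x\in[0,1]}|f(x)|\leq C_1$.
(K3) There exists a constant $c_2>0$ such that $\inf_{x\in[0,1]}g(x)\geq c_2$.
(K4) There exists a constant $C_3>0$ such that $\sup_{x\in[0,1]}g(x)\leq C_3$.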
3.2. Wavelet Estimators: When $g$ Is Known
We consider the wavelet basis with $N>5m$ to ensure that $\phi$ and $\psi$ belong to $\mathcal{C}^m$.
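Linear Wavelet Estimator. We define the linear wavelet estimator $\hat f^{(m)}_1$ by
$$\hat f^{(m)}_1(x)=\sum_{k\in\Lambda_{j_0}}\hat c^{(m)}_{j_0,k}\phi_{j_0,k}(x),\quad x\in[0,1],\tag{11}$$
where
$$\hat c^{(m)}_{j,k}=\frac{(-1)^m}{n}\sum_{i=1}^n\frac{Y_i}{g(X_i)}(\phi_{j,k})^{(m)}(X_i)\tag{12}$$
and $j_0$ is an integer chosen a posteriori.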
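The definition of $\hat c^{(m)}_{j,k}$ is motivated by the following unbiasedness property: using the independence between $X_1$ and $\xi_1$, $\mathbb{E}(\xi_1)=0$, and $m$ integrations by parts with (K1), we obtain
$$\mathbb{E}\left(\hat c^{(m)}_{j,k}\right)=(-1)^m\mathbb{E}\left(\frac{f(X_1)}{g(X_1)}(\phi_{j,k})^{(m)}(X_1)\right)=(-1)^m\int_0^1f(x)(\phi_{j,k})^{(m)}(x)\,dx=\int_0^1f^{(m)}(x)\phi_{j,k}(x)\,dx,\tag{13}$$
which is the wavelet coefficient of $f^{(m)}$ associated with $\phi_{j,k}$.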
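The unbiasedness property can be checked by simulation. The following is a sketch only: it takes $m=1$, replaces the wavelet $\phi_{j,k}$ with the smooth stand-in $\phi(x)=\sin(2\pi x)$ (the identity holds for any sufficiently smooth function), and uses a hypothetical regression function, design density, and Gaussian noise, none of which come from this note.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 200_000, 1

# Hypothetical ingredients (illustration only):
f = lambda x: np.sin(np.pi * x) ** 2                    # f and f' vanish at 0 and 1
g = lambda x: 2.0 * (x + 1.0) / 3.0                     # known design density on [0, 1]
dphi = lambda x: 2.0 * np.pi * np.cos(2.0 * np.pi * x)  # derivative of phi(x) = sin(2 pi x)

# Random design: inverse-CDF sampling from g, then Y_i = f(X_i) + xi_i.
x = -1.0 + np.sqrt(1.0 + 3.0 * rng.uniform(size=n))
y = f(x) + rng.normal(0.0, 0.5, size=n)

# Empirical coefficient: (-1)^m / n * sum_i Y_i / g(X_i) * phi^(m)(X_i).
c_hat = (-1) ** m * np.mean(y / g(x) * dphi(x))

# Target: int_0^1 f'(x) phi(x) dx = pi * int_0^1 sin(2 pi x)^2 dx = pi / 2.
target = np.pi / 2
```

With this sample size, `c_hat` should land close to `target`. Dividing by $g(X_i)$ is what cancels the design density inside the expectation.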
This approach was initially introduced by Prakasa Rao for the estimation of the derivatives of a density. Its adaptation to (1) gives a suitable alternative to the wavelet methods developed by Cai and Petsa and Sapatinas in the case , especially in the treatment of the random design.
Note that, for the standard case $m=0$, this estimator has been considered and studied in Chesneau.
Theorem 1 investigates the rate of convergence attained by $\hat f^{(m)}_1$ under the MISE assuming that $f^{(m)}$ belongs to Besov balls.
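Theorem 1. Suppose that (K1), (K2), and (K3) are satisfied and that $f^{(m)}\in B^s_{p,r}(M)$ with $s>0$, $p\geq1$, $r\geq1$, and $M>0$. Let $\hat f^{(m)}_1$ be defined by (11) with $j_0$ such that
$$2^{j_0}=\left[n^{1/(2s_*+2m+1)}\right],\tag{14}$$
$s_*=s+\min(1/2-1/p,0)$, and $[a]$ denotes the integer part of $a$.
Then there exists a constant $C>0$ such that
$$\mathbb{E}\left(\int_0^1\left(\hat f^{(m)}_1(x)-f^{(m)}(x)\right)^2dx\right)\leq Cn^{-2s_*/(2s_*+2m+1)}.$$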
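As a quick numerical reading of Theorem 1, the sketch below computes a resolution level and the associated rate, assuming the level choice $2^{j_0}=[n^{1/(2s_*+2m+1)}]$ with $s_*=s+\min(1/2-1/p,0)$ and the rate $n^{-2s_*/(2s_*+2m+1)}$. The function name is hypothetical. Because the integer part of $n^{1/(2s_*+2m+1)}$ need not be a power of two, the code takes the largest $j_0$ with $2^{j_0}$ below that value, which is one possible reading of the level choice.

```python
import math

def linear_level_and_rate(n, s, p, m):
    """Resolution level j_0 and rate n^{-2 s*/(2 s* + 2m + 1)} for the linear estimator."""
    s_star = s + min(0.5 - 1.0 / p, 0.0)
    # Largest integer j_0 with 2^{j_0} <= n^{1/(2 s* + 2m + 1)} (assumed reading of the level choice).
    j0 = math.floor(math.log2(n) / (2 * s_star + 2 * m + 1))
    rate = n ** (-2 * s_star / (2 * s_star + 2 * m + 1))
    return j0, rate

j0, rate = linear_level_and_rate(1000, s=2, p=2, m=1)
```

For fixed $s$, increasing $m$ lowers both the level and the exponent of the rate, reflecting that higher derivatives are harder to estimate.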
The rate of convergence $n^{-2s_*/(2s_*+2m+1)}$ corresponds to the one obtained in the density derivative estimation framework. See, for example, Prakasa Rao and Chaubey et al. [17, 18]. For $m=0$, Theorem 1 becomes [23, Theorem ], with .
In the rest of the study, the rate of convergence obtained in Theorem 1 will be taken as a benchmark. We do not claim that it is optimal in the minimax sense, since the lower bounds are not determined. Nevertheless, heuristic considerations make it a serious candidate.
Hard Thresholding Wavelet Estimator. We define the hard thresholding wavelet estimator by , where is defined by (12), is the indicator function, is a large enough constant, is the integer satisfying and .
The construction of the hard thresholding estimator $\hat f^{(m)}_2$ is an adaptation of the hard thresholding wavelet estimator introduced by Delyon and Juditsky to the estimation of $f^{(m)}$ from (1). It uses the modern version developed by Chaubey et al. The advantage of $\hat f^{(m)}_2$ over $\hat f^{(m)}_1$ (11) is that $\hat f^{(m)}_2$ is adaptive; thanks to the thresholding in (17), its performance does not depend on the knowledge of the smoothness of $f^{(m)}$. The second thresholding in (17) enables us to relax some assumptions on the model and, in particular, to only suppose $\mathbb{E}(\xi_1^2)<\infty$ on $\xi_1$ (its density can be unknown). Basics and important results on hard thresholding wavelet estimators can be found in, for example, Donoho and Johnstone [26, 27], Donoho et al. [28, 29], and Delyon and Juditsky.
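Theorem 2 determines the rate of convergence attained by $\hat f^{(m)}_2$ under the MISE assuming that $f^{(m)}$ belongs to Besov balls.
Theorem 2. Suppose that (K1), (K2), and (K3) are satisfied and that $f^{(m)}\in B^s_{p,r}(M)$ with $r\geq1$, $\{p\geq2$ and $s>0\}$, or $\{p\in[1,2)$ and $s>(2m+1)/p\}$. Let $\hat f^{(m)}_2$ be defined by (16). Then there exists a constant $C>0$ such that
$$\mathbb{E}\left(\int_0^1\left(\hat f^{(m)}_2(x)-f^{(m)}(x)\right)^2dx\right)\leq C\left(\frac{\ln n}{n}\right)^{2s/(2s+2m+1)}.$$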
The proof is based on a general result proved in [25, Theorem 6.1]. Let us observe that, for the case $p\geq2$, $(\ln n/n)^{2s/(2s+2m+1)}$ is equal to the rate of convergence attained by $\hat f^{(m)}_1$ up to a logarithmic factor (see Theorem 1). However, for the case $p\in[1,2)$, it is significantly better in terms of power.
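The comparison of powers can be made explicit with a small sketch. It assumes the linear estimator attains $n^{-2s_*/(2s_*+2m+1)}$ with $s_*=s+\min(1/2-1/p,0)$ and the hard thresholding estimator attains $(\ln n/n)^{2s/(2s+2m+1)}$; the function names are hypothetical.

```python
def exponent(smoothness, m):
    """Exponent 2 a / (2 a + 2m + 1) for a smoothness index a."""
    return 2 * smoothness / (2 * smoothness + 2 * m + 1)

def rate_exponents(s, p, m):
    """Exponents of the linear rate (in n) and of the thresholding rate (in n / ln n)."""
    s_star = s + min(0.5 - 1.0 / p, 0.0)
    return exponent(s_star, m), exponent(s, m)

lin_sparse, thr_sparse = rate_exponents(s=1.5, p=1.0, m=0)   # case p in [1, 2)
lin_dense, thr_dense = rate_exponents(s=1.5, p=2.0, m=0)     # case p >= 2
```

For $p\geq2$ the two exponents coincide, so only the logarithmic factor separates the rates. For $p\in[1,2)$ the thresholding exponent is strictly larger, although the logarithmic factor means the asymptotic gain may only show up for large $n$.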
3.3. Wavelet Estimators: When $g$ Is Unknown
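In the case where $g$ is unknown, we propose the linear wavelet estimator $\hat f^{(m)}_3$ defined by
$$\hat f^{(m)}_3(x)=\sum_{k\in\Lambda_{j_0}}\tilde c^{(m)}_{j_0,k}\phi_{j_0,k}(x),\quad x\in[0,1],\tag{20}$$
where
$$\tilde c^{(m)}_{j,k}=\frac{(-1)^m}{a_n}\sum_{i=1}^{a_n}\frac{Y_i}{\hat g(X_i)}\mathbf{1}_{\{|\hat g(X_i)|\geq c_2/2\}}(\phi_{j,k})^{(m)}(X_i),\tag{21}$$
$a_n$ is the integer part of $n/2$, $j_0$ is an integer chosen a posteriori, $c_2$ refers to (K3), and $\hat g$ is an estimator of $g$ constructed from the random variables $X_{a_n+1},\ldots,X_n$. For instance, we can consider the linear wavelet estimator
$$\hat g(x)=\sum_{k\in\Lambda_{j_2}}\hat c_{j_2,k}\phi_{j_2,k}(x),\quad x\in[0,1],\tag{22}$$
where $\hat c_{j,k}=\frac{1}{n-a_n}\sum_{i=a_n+1}^{n}\phi_{j,k}(X_i)$ and $j_2$ is an integer chosen a posteriori.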
The estimator $\hat f^{(m)}_3$ is close to the “NES linear wavelet estimator” proposed by Pensky and Vidakovic for $m=0$. However, there are notable differences in the thresholding in (21), the partitioning of the variables, and the definition of $\hat g$, making the study of its performance under the MISE simpler (see the proof of Theorem 3 below).
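The partitioning and thresholding ideas can be sketched as follows. This is an illustration under stated assumptions, not the estimator of this note: the sample is split in half, the density is estimated from the second half with a histogram (a stand-in for the linear wavelet density estimator), and the coefficient is computed from the first half, discarding points where the density estimate falls below $c_2/2$. The test function $\phi(x)=\sin(2\pi x)$, the regression function, the density, and the noise are hypothetical, with $m=1$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, c2 = 200_000, 1, 2.0 / 3.0       # c2 is the infimum of the density used below

f = lambda x: np.sin(np.pi * x) ** 2                    # hypothetical regression function
dphi = lambda x: 2.0 * np.pi * np.cos(2.0 * np.pi * x)  # derivative of phi(x) = sin(2 pi x)

x = -1.0 + np.sqrt(1.0 + 3.0 * rng.uniform(size=n))     # design density 2 (x + 1) / 3
y = f(x) + rng.normal(0.0, 0.5, size=n)

a_n = n // 2
# Density estimate built from the second half only (histogram stand-in).
dens, _ = np.histogram(x[a_n:], bins=20, range=(0.0, 1.0), density=True)
g_hat = lambda t: dens[np.minimum((t * 20).astype(int), 19)]

# Plug-in coefficient from the first half, with the indicator {|g_hat| >= c2 / 2}.
gx = g_hat(x[:a_n])
keep = np.abs(gx) >= c2 / 2
inv = np.where(keep, 1.0 / np.where(keep, gx, 1.0), 0.0)
c_tilde = (-1) ** m * np.mean(y[:a_n] * inv * dphi(x[:a_n]))
```

Here `c_tilde` should be close to $\int_0^1 f'(x)\phi(x)\,dx=\pi/2$. Splitting the sample keeps the density estimate independent of the observations used in the average, and the indicator prevents division by a density estimate that is too close to zero.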
Theorem 3 determines an upper bound of the MISE of $\hat f^{(m)}_3$ and then exhibits its rate of convergence when $f^{(m)}$ belongs to Besov balls.
Theorem 3. Suppose that (K1), (K2), and (K3) are satisfied and that with , , , and . Let be defined by (20). Then there exists a constant such that
In addition, suppose that (K4) is satisfied, with , , , and ; consider with the estimator defined by (22) with such that and such that
Then there exists a constant such that
The first point of Theorem 3 is proved for any estimator $\hat g$ of $g$ depending on $X_{a_n+1},\ldots,X_n$. Taking $\hat g=g$, it corresponds to the upper bound of the MISE for $\hat f^{(m)}_1$ established in the proof of Theorem 1. Note that the rate of convergence described in the second point is slower than the one attained by $\hat f^{(m)}_1$ (see Theorem 1). The fact that the smoothness of $g$ influences the performance of $\hat g$ and, a fortiori, $\hat f^{(m)}_3$ seems natural. This phenomenon also appears in [16, Theorem 2.1], for $m=0$.
Remark 4. If exists but is unknown, we can define as (20) with instead of in the threshold of (21). The impact of this modification is a logarithmic term in Theorem 3; that is, Moreover, choosing such that there exists a constant such that
Remark 5. Note that the assumption (K4) is only used in the second point of Theorem 3.
Conclusion and Perspectives. We explore the estimation of $f^{(m)}$ from (1). Distinguishing the cases where $g$ is known or not, we propose wavelet methods and prove that they attain fast rates of convergence under the MISE assuming that $f^{(m)}\in B^s_{p,r}(M)$.
Perspectives of this work are
(i) to develop an adaptive wavelet estimator, such as the hard thresholding one, for the estimation of $f^{(m)}$ in the case where $g$ is unknown;
(ii) to relax assumptions on the model. Indeed, several techniques exist to relax (K3), that is, to allow $g$ to have zeros. See, for example, Kerkyacharian and Picard, Gaïffas, and Antoniadis et al. However, their adaptations to the estimation of $f^{(m)}$ are more difficult than they appear at first glance;
(iii) to consider dependent $(Y_1,X_1),\ldots,(Y_n,X_n)$.
These aspects need further investigation, which we leave for future work.
In this section, $C$ denotes any constant that does not depend on $j$, $k$, and $n$. Its value may change from one term to another and may depend on $\phi$ or $\psi$.
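Proof of Theorem 1. First of all, we expand the function $f^{(m)}$ on the wavelet basis at the level $j_0$ given by (14):
$$f^{(m)}(x)=\sum_{k\in\Lambda_{j_0}}c^{(m)}_{j_0,k}\phi_{j_0,k}(x)+\sum_{j=j_0}^{\infty}\sum_{k\in\Lambda_j}d^{(m)}_{j,k}\psi_{j,k}(x),$$
where $c^{(m)}_{j_0,k}=\int_0^1f^{(m)}(x)\phi_{j_0,k}(x)\,dx$ and $d^{(m)}_{j,k}=\int_0^1f^{(m)}(x)\psi_{j,k}(x)\,dx$.
Since the wavelet family forms an orthonormal basis of $\mathbb{L}^2([0,1])$, we get
$$\mathbb{E}\left(\int_0^1\left(\hat f^{(m)}_1(x)-f^{(m)}(x)\right)^2dx\right)=\sum_{k\in\Lambda_{j_0}}\mathbb{E}\left(\left(\hat c^{(m)}_{j_0,k}-c^{(m)}_{j_0,k}\right)^2\right)+\sum_{j=j_0}^{\infty}\sum_{k\in\Lambda_j}\left(d^{(m)}_{j,k}\right)^2.\tag{32}$$
Using the fact that $\hat c^{(m)}_{j_0,k}$ is an unbiased estimator of $c^{(m)}_{j_0,k}$ (see (13)), that $(Y_1,X_1),\ldots,(Y_n,X_n)$ are i.i.d., the inequalities $\mathbb{V}(D)\leq\mathbb{E}(D^2)$ for any random variable $D$ and $(x+y)^2\leq2(x^2+y^2)$, $(x,y)\in\mathbb{R}^2$, and (K2) and (K3), we have
$$\mathbb{E}\left(\left(\hat c^{(m)}_{j_0,k}-c^{(m)}_{j_0,k}\right)^2\right)=\frac{1}{n}\mathbb{V}\left(\frac{Y_1}{g(X_1)}(\phi_{j_0,k})^{(m)}(X_1)\right)\leq\frac{2}{n}\mathbb{E}\left(\frac{f^2(X_1)+\xi_1^2}{g^2(X_1)}\left((\phi_{j_0,k})^{(m)}(X_1)\right)^2\right)\leq\frac{2\left(C_1^2+\mathbb{E}(\xi_1^2)\right)}{c_2\,n}\int_0^1\left((\phi_{j_0,k})^{(m)}(x)\right)^2dx.\tag{33}$$
Using $(\phi_{j_0,k})^{(m)}(x)=2^{j_0/2}2^{mj_0}\phi^{(m)}(2^{j_0}x-k)$, the change of variables $y=2^{j_0}x-k$, and the fact that $\phi$ is compactly supported, we obtain
$$\int_0^1\left((\phi_{j_0,k})^{(m)}(x)\right)^2dx=2^{2mj_0}\int_{-k}^{2^{j_0}-k}\left(\phi^{(m)}(y)\right)^2dy\leq C2^{2mj_0}.\tag{34}$$
Therefore $\mathbb{E}\left(\left(\hat c^{(m)}_{j_0,k}-c^{(m)}_{j_0,k}\right)^2\right)\leq C2^{2mj_0}/n$ and, for $j_0$ satisfying (14), it holds that
$$\sum_{k\in\Lambda_{j_0}}\mathbb{E}\left(\left(\hat c^{(m)}_{j_0,k}-c^{(m)}_{j_0,k}\right)^2\right)\leq C\frac{2^{(2m+1)j_0}}{n}\leq Cn^{-2s_*/(2s_*+2m+1)}.\tag{36}$$
On the other hand, we have $f^{(m)}\in B^s_{p,r}(M)\subseteq B^{s_*}_{2,\infty}(M)$ (see [11, Corollary 9.2]), which implies
$$\sum_{j=j_0}^{\infty}\sum_{k\in\Lambda_j}\left(d^{(m)}_{j,k}\right)^2\leq C2^{-2j_0s_*}\leq Cn^{-2s_*/(2s_*+2m+1)}.\tag{37}$$
It follows from (32), (36), and (37) that
$$\mathbb{E}\left(\int_0^1\left(\hat f^{(m)}_1(x)-f^{(m)}(x)\right)^2dx\right)\leq Cn^{-2s_*/(2s_*+2m+1)}.$$
Theorem 1 is proved.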
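The trade-off behind the level choice in this proof can be illustrated numerically. The sketch below assumes a variance term of order $2^{(2m+1)j}/n$ and a bias term of order $2^{-2js_*}$, with every constant set to 1 for illustration; the function name and parameter values are hypothetical.

```python
import math

def risk_proxy(j, n, s_star, m):
    """Variance term 2^{(2m+1) j} / n plus bias term 2^{-2 j s*}, all constants set to 1."""
    return 2.0 ** ((2 * m + 1) * j) / n + 2.0 ** (-2.0 * j * s_star)

n, s_star, m = 10**6, 2.0, 1
# Level minimising the proxy over a small range of candidate levels.
best_j = min(range(0, 15), key=lambda j: risk_proxy(j, n, s_star, m))
# log2 of n^{1/(2 s* + 2m + 1)}, the level at which the two terms balance.
balanced = math.log2(n) / (2 * s_star + 2 * m + 1)
```

The minimising level sits within one unit of the balancing value, which is why a level of order $n^{1/(2s_*+2m+1)}$ yields the stated rate. The exact minimiser depends on the constants, which this proxy ignores.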
Proof of Theorem 2. Observe that, for , any integer and any , (i)using arguments similar to (13), we obtain
(ii)using arguments similar to (33) and (34), we have
Applying [25, Theorem 6.1], (presented in Appendix) with , , , , and with , , either or , we prove the existence of a constant such that Theorem 2 is proved.
Proof of Theorem 3. As in the proof of Theorem 1, we first expand the function on at the level given by (26):
Since forms an orthonormal basis of , we get Using (see [11, Corollary 9.2]), we have Let be (12) with and (26). The elementary inequality , , yields where Proceeding as in (36), we get Let us now investigate the upper bound for .
The triangle inequality gives Moreover, we have It follows from the triangle inequality, the indicator function, (K3), , and the Markov inequality that Hence where Let us now consider . For any random variable , we have the equality where denotes the expectation of conditionally on and , the variance of conditionally on . Therefore where Let us now observe that, owing to the independence of , the random variables conditionally on are independent. This remark, combined with the inequalities for any random variable and , , the independence between and , (K2), and (K3), yields Thanks to the compact support of , we have . Therefore, using , On the other hand, by the Hölder inequality for conditional expectations and arguments similar to (33) and (34), we get Hence It follows from (55), (58), and (60) that Putting (46), (48), and (61) together, we get
Combining (44), (45), and (62), we obtain
A slight adaptation of , Proposition 1, gives the following result. Suppose that (K4) is satisfied and with , , , and . Let be defined by (22) with as (25). Then there exists a constant such that
Therefore, choosing as (26) and using (63), we have Theorem 3 is proved.
Appendix
where , and is the integer satisfying Here, we suppose that there exist
(i) functions with for any ,
(ii) two sequences of real numbers and satisfying and
such that, for ,
(A1) any integer and any ,
(A2) there exist two constants, and , such that, for any integer and any ,
Let be (A.1) under (A1) and (A2). Suppose that with , or . Then there exists a constant such that
Conflict of Interests
The author declares that there is no conflict of interests regarding the publication of this paper.
The author is thankful to the reviewers for their comments which have helped in improving the presentation.
M. P. Wand and M. C. Jones, Kernel Smoothing, Chapman and Hall, London, UK, 1995.
S. Mallat, A Wavelet Tour of Signal Processing, The Sparse Way, with Contributions from Gabriel Peyré, Elsevier, Academic Press, Amsterdam, The Netherlands, 3rd edition, 2009.
Y. Meyer, Wavelets and Operators, Cambridge University Press, Cambridge, UK, 1992.