Optimal Wavelet Estimation of Density Derivatives for Size-Biased Data
Donoho et al. obtained a landmark result on wavelet density estimation in 1996 for the case in which the samples are noise-free, independent, and identically distributed (i.i.d.). In many practical applications, however, the random samples are contaminated by noise, and estimating the density derivatives is important for detecting possible bumps in the associated density. Motivated by Donoho's work, we propose new linear and nonlinear wavelet estimators for density derivatives when the random samples are size-biased. It turns out that the linear estimation for attains the optimal convergence rate when , and the nonlinear one does the same if .
Wavelet analysis plays an important role in both pure and applied mathematics, for example, in signal processing, image compression, numerical solution, and local fractional calculus [1, 2]. One application is to estimate an unknown density function based on random samples [3–8]. A landmark result was obtained by Donoho et al. , when the i.i.d. samples are noise-free. On the other hand, Besov spaces contain many function spaces (e.g., Hölder spaces and Sobolev spaces with noninteger exponents) as special cases. In some statistical models, the error is measured in norm [9–13].
In practice, it often happens that direct samples from a random variable cannot be obtained. In this paper, we want to consider the true density function . But we can only observe the samples , for the size-biased data; that is, where is the so-called bias function, .
In many cases, a linear is recommended, but, in general, the form of should be studied via additional experiments. The purpose of this paper is to estimate the derivatives of the true density functions , ; we study the optimal convergence rate of wavelet estimators in norm over Besov spaces.
Size-biased data arise when the chance of observing a sample depends on its magnitude. Several examples of model (1) can be found in the literature . For instance, in , the distribution of the concentration of alcohol in the blood of intoxicated drivers is of interest; since drunken drivers have a larger chance of being arrested, the collected data are size-biased.
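The effect of size bias can be illustrated with a small simulation. The sketch below assumes the usual form of the size-biased model, g(y) = w(y) f(y) / mu with mu = E[w(X)], since the displayed formula for model (1) is not reproduced here. The uniform density, the bias function w(x) = 1 + x, and the helper names `sample_size_biased` and `estimate_mu` are illustrative choices, not taken from the paper. Under this assumed model E[1/w(Y)] = 1/mu, which suggests the harmonic-mean estimate of mu used below.

```python
import numpy as np

def sample_size_biased(draw_x, w, w_max, n, rng):
    """Draw n size-biased samples by rejection: keep X with probability w(X) / w_max.

    Assumes 0 < w(x) <= w_max on the support of X (hypothetical setup).
    """
    out = []
    while len(out) < n:
        x = draw_x(rng, n)
        keep = rng.uniform(0.0, w_max, size=x.size) < w(x)
        out.extend(x[keep].tolist())
    return np.array(out[:n])

def estimate_mu(y, w):
    """Harmonic-mean estimate of mu = E[w(X)], based on E[1/w(Y)] = 1/mu."""
    return 1.0 / np.mean(1.0 / w(y))

# Illustration: X uniform on [0, 1), bias function w(x) = 1 + x, so mu = 1.5
rng = np.random.default_rng(0)
w = lambda x: 1.0 + x
y = sample_size_biased(lambda r, n: r.uniform(0.0, 1.0, size=n), w, 2.0, 20000, rng)
mu_hat = estimate_mu(y, w)

print("mean of biased sample:", y.mean())   # shifted above the unbiased mean 0.5
print("estimated mu:", mu_hat)
```

In this example the biased sample has theoretical mean 5/9 rather than 1/2, which shows why an estimator that ignores the bias function would target the wrong density.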
The estimation problem for biased data (1) has been discussed in some papers. In 1982, Vardi  considered the nonparametric maximum likelihood estimation for . In 1991, Jones  discussed the mean squared error properties of the kernel density estimation. In 2004, Efromovich  developed the Efromovich-Pinsker adaptive Fourier estimator. It was based on a block shrinkage algorithm and achieved the minimax rate of convergence under the risk over the Besov class .
In 2010, Ramírez and Vidakovic  proposed a linear wavelet estimator and discussed the consistency of function in in the mean integrated squared error (MISE) sense. However, the wavelet estimator in paper  contained the unknown parameter . In the same year, Chesneau  constructed a nonlinear wavelet estimator and evaluated the risk in the Besov space . To our knowledge, however, no results are available on the estimation of the density derivatives under model (1). Estimation of the derivatives of a density is very important in detecting possible bumps.
The current paper is organized as follows. In Section 2, we briefly describe the preliminaries on wavelets and Besov spaces. The linear estimator and its convergence rate are presented in Section 3. In order to discuss optimality, Section 4 is devoted to giving the lower bound for an arbitrary estimator. In Section 5, we consider the nonlinear wavelet estimator and its optimal convergence rate. Our results improve on the theorems in [10, 13, 14, 18].
2. Wavelets and Besov Spaces
In this section, we will recall some useful and well-known concepts and lemmas.
In order to construct a wavelet basis, we need a structure in which can be decomposed into a direct sum of mutually orthogonal spaces.
Definition 1 (see ). A multiresolution analysis (MRA) of is a set of increasing, closed linear subspaces , for all , called scaling spaces, satisfying(a), ;(b) if and only if all ;(c) if and only if for all ;(d)there exists a function such that is an orthogonal basis in . The function is called the scaling function of the multiresolution analysis.
With the standard notation in wavelet analysis, there exists a corresponding wavelet function , such that for fixed is an orthonormal basis of which is the orthogonal complement of the space in . For fixed , both and are orthonormal bases of .
As usual, denotes the classical Lebesgue space on the real line . Although wavelet bases are constructed for , most of them constitute unconditional bases for .
Lemma 2 (see ). Let be a compactly supported, orthonormal scaling function and the corresponding wavelet. Then for any with , the following expansion: converges to for almost everywhere , where
Lemma 3 (see ). If the scaling function satisfies , then for any sequence , one has where , , , .
Letting , , , , the Besov spaces are defined by with the associated norm , where denotes the smoothness modulus of , and
Between the different Besov spaces, the following embedding conclusions are established . Let , ; then(i), ;(ii), , ,
where denotes that the Banach space is continuously embedded in the Banach space ; that is, there exists a constant such that, for any , we have .
A scaling function is called -regular, if has continuous derivatives of order , and its corresponding wavelet has vanishing moments of order ; that is,
One advantage of wavelets is that they can characterize Besov spaces.
Lemma 4 (see ). Let be a compactly supported, -regular orthonormal scaling function with the corresponding wavelet and . If , , and , , then the following are equivalent: (i) ; (ii) , where is the projection operator to ; that is, ; (iii) . In this case,
Note 1. The notation indicates that with a positive constant , which is independent of and . If and , we write .
In this paper, the Besov balls are defined by
3. Linear Estimator
In this section, we will give a linear estimator for density derivatives in Besov spaces .
The linear wavelet estimator of the derivative of a density is defined as follows: where
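As a rough illustration of how a linear wavelet estimator corrects for size bias, the sketch below treats only the simplest case: estimating the density itself (derivative order zero) with the Haar scaling function on [0, 1). It is not the estimator (9). Haar is not regular enough for derivative estimation, and the coefficient form alpha_hat[k] = (mu_hat / n) * sum_i phi_{jk}(Y_i) / w(Y_i), the bias function w(x) = 1 + x, and the function name `haar_linear_estimate` are assumptions made for illustration.

```python
import numpy as np

def haar_linear_estimate(y, w, j):
    """Heights of a Haar linear estimate of the density on the 2**j dyadic bins of [0, 1).

    Illustrative only: derivative order 0, Haar scaling function, samples in [0, 1).
    """
    y = np.asarray(y, dtype=float)
    inv_w = 1.0 / w(y)
    mu_hat = 1.0 / inv_w.mean()                    # harmonic-mean estimate of mu
    k = np.clip(np.floor(y * 2**j).astype(int), 0, 2**j - 1)  # bin where phi_{jk}(y) != 0
    # alpha_hat[k] = (mu_hat / n) * sum_i phi_{jk}(Y_i) / w(Y_i), with phi_{jk} = 2^{j/2} on bin k
    alpha_hat = mu_hat / y.size * 2 ** (j / 2) * np.bincount(k, weights=inv_w, minlength=2**j)
    return 2 ** (j / 2) * alpha_hat                # value of sum_k alpha_hat[k] * phi_{jk} on bin k

# Biased sample with density g(y) = (1 + y) / 1.5 on [0, 1), drawn by inverting its CDF;
# the underlying (unbiased) density is uniform on [0, 1).
rng = np.random.default_rng(1)
u = rng.uniform(size=50000)
y = -1.0 + np.sqrt(1.0 + 3.0 * u)
heights = haar_linear_estimate(y, lambda t: 1.0 + t, j=3)
print(heights)                                     # each height should be close to 1
```

With the harmonic-mean choice of mu_hat the bin heights average to one, so the estimate integrates to one by construction. The resolution level j plays the role of a bandwidth: larger j lowers the bias and raises the variance.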
The following inequalities play important roles in this paper.
Lemma 5 (see  (Rosenthal inequality)). Let be independent random variables such that and ; then there exists a constant such that(i), ,(ii), .
For the coefficients defined in (9), although , we have the following estimate.
Lemma 6. If , then, for any , one has .
Proof. By the definitions of , and the triangle inequality, one observes that Since for any , one has that and . Thanks to the embedding theorem , for any , one gets . It is easy to see that , , are bounded. Using the convexity inequality, one obtains where
(i) To estimate : denote . Note that are i.i.d. samples, and . Moreover, for any given integer , one has Therefore, By Rosenthal’s inequality (Lemma 5), if , one has if , one gets
(ii) To estimate the term , since let . It is easy to see that , and, for any integer , one obtains Similarly, by Rosenthal’s inequality (Lemma 5), (a) for , , one gets (b) for , that is, , and , one has
Summarizing the above estimates of , , one obtains that .
Theorem 7. Let the scaling function be compactly supported and -regular. If is the estimator defined in (9), then for , , one has where .
Proof. First, using the triangle inequality and the convexity inequality, we decompose into the bias term and the stochastic term; that is, For the bias term , one can estimate it as follows. (i) When , Lemma 4 reduces to When , using Besov space embedding theorems , one has When , Hölder’s inequality and Lemma 4 tell us that Hence, for , one obtains that Next, we estimate the stochastic term . Clearly, due to Lemmas 3 and 6, one gets By choosing such that , one obtains that
4. Lower Bound and Optimality
Lemma 9 (Varshamov-Gilbert Lemma ). Let , ; then there exists a subset of with such that and .
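To make the combinatorial content of this lemma concrete, the sketch below assumes its usual form: for m at least 8 there is a subset of {0, 1}^m that contains the zero vector, has at least 2^(m/8) + 1 elements, and has pairwise Hamming distances of at least m/8. The symbols of Lemma 9 are not reproduced above, so this reading is an assumption, and the greedy search and the names `hamming` and `greedy_separated_subset` are illustrative only. The lemma is an existence result and is not proved this way.

```python
from itertools import product

def hamming(a, b):
    """Number of coordinates in which the binary tuples a and b differ."""
    return sum(x != y for x, y in zip(a, b))

def greedy_separated_subset(m, target):
    """Greedily collect `target` points of {0,1}^m, starting from the zero vector,
    with pairwise Hamming distance at least m / 8 (stops early once enough are found)."""
    min_dist = m / 8
    chosen = [(0,) * m]
    for cand in product((0, 1), repeat=m):
        if len(chosen) >= target:
            break
        if all(hamming(cand, c) >= min_dist for c in chosen):
            chosen.append(cand)
    return chosen

m = 24
subset = greedy_separated_subset(m, target=2 ** (m // 8) + 1)  # 9 well-separated points
for eps in subset:
    print("".join(map(str, eps)))
```

In the lower-bound argument, each such binary vector indexes one candidate density, and the separation in Hamming distance translates into separation between the corresponding functions.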
Based on the above lemmas, we have the following lower bound estimation.
Theorem 11. Let with , , and ; there exist two constants and such that . If is any estimator of with i.i.d. random samples, then
Proof. (i) Firstly, we prove
It is sufficient to construct such that and
Suppose that is a compactly supported, regular and orthonormal scaling function and is the corresponding wavelet with . Assume . Define , and
Obviously, and ; that is, . Moreover, for and . So, one gets
Clearly, satisfies for . Then, Fano’s Lemma 10 tells us that
On the other hand, one has
where . Next, one shows that .
Recall that where , . Note that, , if , one has Since and , Taking , then . One can choose such that and . Therefore One has Noting that , one gets
(ii) Next, we prove Similarly, it is sufficient to construct , such that and As in the proof of (i), suppose that , . Define , and with . Moreover, since , one knows that and By Lemma 10, . Hence . According to Lemma 9, there exist such that and Since for , this leads to and Clearly, the sets , , satisfy for . Using Fano’s Lemma 10, one has where , and one can get by arguments similar to those in (i).
Taking , then . One can choose a constant such that . By , then one obtains On the other hand, reduces to Therefore, one gets the following desired result:
Note that , if . Then we have the following corollary.
Corollary 12. If , the linear estimator (9) attains the optimal convergence rate.
5. Nonlinear Estimator
In this paper, the nonlinear wavelet estimator is defined as follows: where The hard thresholding wavelet coefficients are , where
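The hard thresholding rule itself is simple to state in code. The sketch below assumes the common convention that an empirical wavelet coefficient is kept when its absolute value strictly exceeds the threshold and is set to zero otherwise. The paper's threshold level is not reproduced above, so the numerical value of `lam` and the function name `hard_threshold` are purely illustrative.

```python
import numpy as np

def hard_threshold(beta_hat, lam):
    """Keep coefficients with |beta_hat| > lam and set the others to zero."""
    beta_hat = np.asarray(beta_hat, dtype=float)
    return np.where(np.abs(beta_hat) > lam, beta_hat, 0.0)

# Hypothetical empirical wavelet coefficients at one resolution level
beta_hat = np.array([0.9, -0.05, 0.02, -0.4, 0.1])
lam = 0.15                                  # illustrative threshold, not the paper's choice
beta_thr = hard_threshold(beta_hat, lam)
print(beta_thr)
```

Because the retained coefficients depend on the data, the resulting estimator is nonlinear in the observations. Small coefficients, which are dominated by noise, are discarded, and this lets the estimator adapt to the unknown smoothness.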
For the wavelet coefficients, we have the following lemma, whose proof is very similar to that of Lemma 6 and is omitted.
Lemma 13. If , then, for any , one has .
Lemma 14 (see  (Bernstein inequality)). Let be independent random variables such that , ; then
Lemma 15. If