Research Article | Open Access

# Optimal Wavelet Estimation of Density Derivatives for Size-Biased Data

**Academic Editor:** D. Baleanu

#### Abstract

A landmark result in wavelet density estimation was obtained by Donoho et al. in 1996 for independent and identically distributed (i.i.d.) samples without noise. In many practical applications, however, the random samples are contaminated by noise, and estimating the density derivatives is important for detecting possible bumps in the associated density. Motivated by Donoho's work, we propose new linear and nonlinear wavelet estimators for density derivatives when the random samples are size-biased. It turns out that the linear estimation for attains the optimal convergence rate when , and the nonlinear one does the same if .

#### 1. Introduction

Wavelet analysis plays important roles in both pure and applied mathematics, such as signal processing, image compression, numerical solution, and local fractional calculus [1, 2]. One of its applications is to estimate an unknown density function based on random samples [3–8]. A landmark result was obtained by Donoho et al. [9] for i.i.d. samples without noise. On the other hand, Besov spaces contain many function spaces (e.g., Hölder spaces and Sobolev spaces with noninteger exponents) as special cases. In some statistical models, the error is measured in norm [9–13].

In practice, it is often impossible to obtain a direct sample from a random variable. In this paper, we want to consider the true density function . But we can only observe the samples , for the size-biased data; that is, where is the so-called bias function, .

In many cases, a linear is recommended, but, in general, the form of should be studied via additional experiments. The purpose of this paper is to estimate the derivatives of the true density functions , ; we study the optimal convergence rate of wavelet estimators in norm over Besov spaces.

Size-biased data arise when the chance of recording an observation depends on its magnitude. Several examples of model (1) can be found in the literature [14]. For instance, in [15], the quantity of interest is the distribution of the concentration of alcohol in the blood of intoxicated drivers; since drunken drivers have a larger chance of being arrested, the collected data are size-biased.
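To make the sampling mechanism concrete, the following Python sketch simulates size-biased data by rejection. It assumes the standard form of the size-biased model, $g(y) = w(y) f(y) / \mu$ with $\mu = E[w(X)]$. The uniform density on $[0, 1]$, the linear bias function `w(y) = 1 + y`, and all function names are illustrative choices rather than anything fixed by the paper. Under this model $E_g[1/w(Y)] = 1/\mu$, which suggests a harmonic-mean estimate of $\mu$.

```python
import random


def w(y):
    """Hypothetical linear bias function, bounded on [0, 1]."""
    return 1.0 + y


def draw_size_biased(n, rng, w_max=2.0):
    """Rejection sampler for g(y) = w(y) f(y) / mu with f = Uniform(0, 1).

    A draw X from f is kept with probability w(X) / w_max, so larger
    values are more likely to be observed (w_max must bound w on [0, 1]).
    """
    sample = []
    while len(sample) < n:
        x = rng.random()
        if rng.random() < w(x) / w_max:
            sample.append(x)
    return sample


def estimate_mu(sample):
    """Harmonic-mean estimate of mu, based on E_g[1 / w(Y)] = 1 / mu."""
    return len(sample) / sum(1.0 / w(y) for y in sample)


rng = random.Random(0)
ys = draw_size_biased(20000, rng)
mu_hat = estimate_mu(ys)  # the true value here is E[1 + X] = 1.5
```

In this toy setting the observed sample mean drifts above the mean of the true density (5/9 instead of 1/2), which is the distortion that an estimator for model (1) has to undo.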

The estimation problem for biased data (1) has been discussed in some papers. In 1982, Vardi [16] considered the nonparametric maximum likelihood estimation for . In 1991, Jones [17] discussed the mean squared error properties of the kernel density estimation. In 2004, Efromovich [18] developed the Efromovich-Pinsker adaptive Fourier estimator. It was based on a block shrinkage algorithm and achieved the minimax rate of convergence under the risk over the Besov class .

In 2010, Ramírez and Vidakovic [14] proposed a linear wavelet estimator and discussed the consistency of function in under the mean integrated squared error (MISE) sense. But the wavelet estimator in paper [14] contained the unknown parameter . In the same year, Chesneau [10] constructed a nonlinear wavelet estimator and evaluated the risk in the Besov space . To our knowledge, however, no results are available on estimating the density derivatives under model (1). Estimation of the derivatives of a density is very important in detecting possible bumps.

The current paper is organized as follows. In Section 2, we briefly describe the preliminaries on wavelets and Besov spaces. The linear estimator and its convergence rate are presented in Section 3. In order to discuss optimality, Section 4 is devoted to giving the lower bound for an arbitrary estimator. In Section 5, we consider a nonlinear wavelet estimator and its optimal convergence rate. Our estimations improve the theorems in [10, 13, 14, 18].

#### 2. Wavelets and Besov Spaces

In this section, we will recall some useful and well-known concepts and lemmas.

In order to construct a wavelet basis, we need a structure in which $L^2(\mathbb{R})$ can be decomposed into a direct sum of mutually orthogonal spaces.

*Definition 1 (see [19]).* A multiresolution analysis (MRA) of is a set of increasing, closed linear subspaces , for all , called scaling spaces, satisfying
(a) , ;
(b) if and only if all ;
(c) if and only if for all ;
(d) there exists a function such that is an orthogonal basis in . The function is called the scaling function of the multiresolution analysis.

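With the standard notation in wavelet analysis, $\phi_{j,k}(x) = 2^{j/2}\phi(2^j x - k)$ and $\psi_{j,k}(x) = 2^{j/2}\psi(2^j x - k)$, there exists a corresponding wavelet function $\psi$ such that, for fixed $j$, $\{\psi_{j,k}, k \in \mathbb{Z}\}$ is an orthonormal basis of $W_j$, which is the orthogonal complement of the space $V_j$ in $V_{j+1}$. For fixed $j_0$, both $\{\phi_{j_0,k}, \psi_{j,k}, j \ge j_0, k \in \mathbb{Z}\}$ and $\{\psi_{j,k}, j, k \in \mathbb{Z}\}$ are orthonormal bases of $L^2(\mathbb{R})$.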

As usual, $L^p(\mathbb{R})$ denotes the classical Lebesgue space on the real line $\mathbb{R}$. Although wavelet bases are constructed for $L^2(\mathbb{R})$, most of them constitute unconditional bases for $L^p(\mathbb{R})$.

Lemma 2 (see [20]). *Let be a compactly supported, orthonormal scaling function and the corresponding wavelet. Then for any with , the following expansion*
*converges to for almost every , where*

Lemma 3 (see [3]). *If the scaling function satisfies , then for any sequence , one has*
*where , , , .*

Letting , , , , the Besov spaces are defined by with the associated norm , where denotes the smoothness modulus of , and

Between the different Besov spaces, the following embedding conclusions are established [3]. Let , ; then(i), ;(ii), , ,

where denotes that the Banach space is continuously embedded in the Banach space ; that is, there exists a constant such that, for any , we have .

A scaling function is called -regular, if has continuous derivatives of order , and its corresponding wavelet has vanishing moments of order ; that is,

One of the advantages of wavelets is that they can characterize Besov spaces.

Lemma 4 (see [3]). *Let be a compactly supported, -regular orthonormal scaling function with the corresponding wavelet and . If , , and , , then the following are equivalent:*
(i) *;*
(ii) *, where is the projection operator to ; that is, ;*
(iii) *.*
*In this case,*

*Note 1. *The notation indicates that with a positive constant , which is independent of and . If and , we write .

In this paper, the Besov balls are defined by

#### 3. Linear Estimator

In this section, we will give a linear estimator for density derivatives in Besov spaces .

The linear wavelet estimator of the derivative of a density is defined as follows: where
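As a rough numerical illustration of a linear wavelet estimator under size bias, the sketch below treats only the case $m = 0$ (the density itself) with the Haar scaling function, which is not regular enough to handle derivatives. It assumes coefficients of the form $\hat\alpha_{j,k} = (\hat\mu_n / n) \sum_{i} \phi_{j,k}(Y_i) / w(Y_i)$ with $\hat\mu_n = n / \sum_i 1/w(Y_i)$, a common plug-in choice for model (1). The uniform true density, the bias function `w`, the level `j = 3`, and every function name are hypothetical choices made for the example.

```python
import math
import random


def w(y):
    """Hypothetical bias function (linear, bounded on [0, 1])."""
    return 1.0 + y


def sample_size_biased(n, rng):
    """Rejection sampling from g(y) = w(y) f(y) / mu with f = Uniform(0, 1)."""
    out = []
    while len(out) < n:
        x = rng.random()
        if rng.random() < w(x) / 2.0:  # 2.0 bounds w on [0, 1]
            out.append(x)
    return out


def haar_linear_estimator(sample, j):
    """Return the Haar coefficients at level j and the density estimate."""
    n = len(sample)
    mu_hat = n / sum(1.0 / w(y) for y in sample)
    bins = 2 ** j
    height = math.sqrt(bins)  # phi_{j,k} equals 2^{j/2} on [k/2^j, (k+1)/2^j)
    alpha = [0.0] * bins
    for y in sample:
        k = min(int(bins * y), bins - 1)
        alpha[k] += height / w(y)  # accumulate phi_{j,k}(Y_i) / w(Y_i)
    alpha = [mu_hat * a / n for a in alpha]

    def f_hat(x):
        if not 0.0 <= x < 1.0:
            return 0.0
        return alpha[int(bins * x)] * height

    return alpha, f_hat


rng = random.Random(1)
ys = sample_size_biased(20000, rng)
alpha, f_hat = haar_linear_estimator(ys, j=3)
total_mass = sum(alpha) / math.sqrt(2 ** 3)  # integral of f_hat over [0, 1]
```

With this plug-in $\hat\mu_n$ the estimate integrates to one by construction, and on each dyadic bin it should stay close to the true value 1 even though the raw sample is tilted toward large values.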

The following inequalities play important roles in this paper.

Lemma 5 (see [3] (Rosenthal inequality)). *Let be independent random variables such that and ; then there exists a constant such that*(i)*, ,*(ii)*, .*

For the coefficients defined in (9), although , we have the following estimate.

Lemma 6. *If , then, for any , one has .*

*Proof.* By the definitions of , and the triangle inequality, one observes that
Since for any , one has that
and . Thanks to embedding theorem , for any , one gets . It is easy to see , , are bounded. Using the convexity inequality, one obtains
where
(i)To estimate :
denote . Note that are i.i.d. samples, and . Moreover, for any given integer , one has
Therefore,
By Rosenthal's inequality (Lemma 5), if , one has
if , one gets
(ii)To estimate the term , since
let . It is easy to see , and, for any integer , one obtains
Similarly, by Rosenthal's inequality (Lemma 5), (a) for , , one gets
(b)for , that is, , and , one has
Summarizing the above estimation about , , one obtains that .

Theorem 7. *Let the scaling function be compactly supported and -regular. If is the estimator defined in (9), then for , , one has
**
where .*

*Proof.* Firstly, using the triangle inequality and the convexity inequality, we decompose into the bias term and the stochastic term; that is,
For the bias term , one can estimate it as follows. (i) When , Lemma 4 reduces to
When , using Besov space embedding theorems , one has
When , Hölder's inequality and Lemma 4 tell us that
Hence, for , one obtains that
Next, we estimate the stochastic term . Clearly, due to Lemmas 3 and 6, one gets
By choosing such that , one obtains that
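The choice of the resolution level in the last step is a bias-variance balance, and a small calculation can make it tangible. The sketch below assumes the orders that are usual for derivative estimation: a bias term of order $2^{-j_0 s'}$ and a stochastic term of order $2^{j_0 m} (2^{j_0}/n)^{1/2}$, with $s' = s - (1/r - 1/p)_+$. These orders, the resulting exponent $s'/(2s' + 2m + 1)$, and the function names are assumptions made for illustration and should be checked against the statement of Theorem 7.

```python
import math


def effective_smoothness(s, r, p):
    """s' = s - (1/r - 1/p)_+, the smoothness left after embedding into L^p."""
    return s - max(1.0 / r - 1.0 / p, 0.0)


def balanced_level(n, s, m, r, p):
    """Real-valued level j0 solving 2^{j0} = n^{1 / (2 s' + 2 m + 1)}."""
    s_eff = effective_smoothness(s, r, p)
    return math.log2(n) / (2.0 * s_eff + 2.0 * m + 1.0)


def rate_exponent(s, m, r, p):
    """Exponent a such that the balanced risk is of order n^{-a}."""
    s_eff = effective_smoothness(s, r, p)
    return s_eff / (2.0 * s_eff + 2.0 * m + 1.0)


def bias_order(j0, s_eff):
    """Assumed order of the bias term at level j0."""
    return 2.0 ** (-j0 * s_eff)


def stochastic_order(j0, m, n):
    """Assumed order of the stochastic term at level j0."""
    return 2.0 ** (j0 * m) * math.sqrt(2.0 ** j0 / n)


# Worked example: n = 2^14 observations, s = 2, first derivative, r = p = 2.
n, s, m, r, p = 2 ** 14, 2.0, 1, 2.0, 2.0
j0 = balanced_level(n, s, m, r, p)
bias = bias_order(j0, effective_smoothness(s, r, p))
stoch = stochastic_order(j0, m, n)
rate = n ** (-rate_exponent(s, m, r, p))
```

In this example the balanced level is $j_0 = 2$, and the two terms and $n^{-2/7}$ all equal $2^{-4}$. Raising $m$ lowers the exponent, reflecting that derivatives are harder to estimate than the density itself.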

*Remark 8.* Theorem 7 can be considered as a natural extension of [14] if , . Moreover, the next part shows the optimality of our linear estimation for .

#### 4. Lower Bound and Optimality

This section is devoted to showing that the linear estimator defined in (9) attains the optimal convergence rate for . The idea of the proof is motivated by [21].

Lemma 9 (Varshamov-Gilbert Lemma [5]). *Let , ; then there exists a subset of with such that and .*
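The lemma is an existence statement. In its usual textbook form, assumed here, it says that for $m \ge 8$ the cube $\{0,1\}^m$ contains points $\omega^0 = (0, \dots, 0), \omega^1, \dots, \omega^M$ with $M \ge 2^{m/8}$ and pairwise Hamming distance at least $m/8$. The greedy search below is only an illustration for one small $m$, not the counting argument behind the lemma; the function names are hypothetical, and binary vectors are encoded as integers.

```python
import math


def hamming(a, b):
    """Hamming distance between two binary vectors encoded as integers."""
    return bin(a ^ b).count("1")


def greedy_separated_subset(m):
    """Greedily collect points of {0,1}^m that are pairwise m/8-separated.

    Stops once the zero vector plus ceil(2^{m/8}) further points are found,
    which is the cardinality asserted by the lemma.
    """
    min_dist = m / 8.0
    target = math.ceil(2 ** (m / 8.0)) + 1
    chosen = [0]
    for cand in range(1, 2 ** m):
        if all(hamming(cand, c) >= min_dist for c in chosen):
            chosen.append(cand)
            if len(chosen) == target:
                break
    return chosen


code = greedy_separated_subset(24)
min_pairwise = min(
    hamming(a, b) for i, a in enumerate(code) for b in code[i + 1:]
)
```

For $m = 24$ the search returns nine vectors (the zero vector and eight others) whose pairwise distances are all at least 3. In lower-bound proofs each such vector indexes one test density, and the separation in Hamming distance translates into separation of the corresponding functions.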

Lemma 10 (Fano's lemma [22]). *Let be probability measurable spaces and , . If for , one has
**
where stands for the complement of and stands for Kullback distance in [5].*

Based on the above lemmas, we have the following lower bound estimation.

Theorem 11. *Let with , , and ; there exist two constants and such that . If is any estimator of with i.i.d. random samples, then*

*Proof. *(i) Firstly, we prove
It is sufficient to construct such that and
Suppose that is a compactly supported, regular and orthonormal scaling function and is the corresponding wavelet with . Assume . Define , and
Obviously, and ; that is, . Moreover, for and . So, one gets
Clearly, satisfies for . Then, Fano's lemma (Lemma 10) tells us that
On the other hand, one has
Then
where . Next, one shows that .

Recall that
where , . Note that, , if , one has
Since and ,
Taking , then . One can choose such that and . Therefore
One has
Noting that , one gets
(ii)Next, we prove
Similarly, it is sufficient to construct , such that and
As in the proof of (i), suppose that , . Define , and
with . Moreover, since , one knows that and
By Lemma 10, . Hence . According to Lemma 9, there exist such that and
Since for , this leads to and
Clearly, the sets