Special Issue: Learning Theory
Research Article | Open Access
Wavelet Optimal Estimations for Density Functions under Severely Ill-Posed Noises
Motivated by Lounici and Nickl's work (2011), this paper considers the problem of estimating a density based on an independent and identically distributed sample from the model $Y = X + \varepsilon$. We show a wavelet optimal estimation for a density (function) over a Besov ball and $L^p$ risk in the presence of severely ill-posed noises. A wavelet linear estimation is first presented. Then, we prove a lower bound, which shows that our wavelet estimator is optimal. In other words, nonlinear wavelet estimations are not needed in that case. It turns out that our results extend some theorems of Pensky and Vidakovic (1999), as well as Fan and Koo (2002).
1. Introduction and Preliminary
Wavelets have made great achievements in studying the statistical model $Y = X + \varepsilon$, where $X$ stands for a real-valued random variable with unknown probability density $f$, and $\varepsilon$ denotes an independent random noise (error) with density $g$. One observes an i.i.d. sample of $Y$ and wants to estimate $f$.
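The reason deconvolution is delicate is visible on the Fourier side: the density of $Y$ is the convolution $f * g$, so the characteristic function factors as $E e^{itY} = f^{ft}(t)\,g^{ft}(t)$. A minimal numerical sketch, with Gaussian choices for both laws made purely for illustration:

```python
import numpy as np

# Illustration only: the model Y = X + eps with Gaussian signal AND Gaussian
# noise (an assumption made for this sketch, not fixed by the text).
rng = np.random.default_rng(0)
n = 200_000
X = rng.normal(0.0, 1.0, n)      # unobserved variable with density f
eps = rng.normal(0.0, 1.0, n)    # independent noise with density g
Y = X + eps                      # the observed sample

def emp_cf(sample, t):
    # Empirical characteristic function; the imaginary part averages to ~0
    # here because both densities are symmetric about 0.
    return np.mean(np.cos(t * sample))

# Convolution in space = multiplication in frequency:
# E exp(itY) = f^ft(t) * g^ft(t) = exp(-t^2/2) * exp(-t^2/2) = exp(-t^2).
print(emp_cf(Y, 1.0), np.exp(-1.0))
```

Recovering $f^{ft}$ thus requires dividing by $g^{ft}$; when $g^{ft}$ decays exponentially (the severely ill-posed case) this division amplifies the sampling error exponentially in frequency.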
In 1999, Pensky and Vidakovic [1] investigated Meyer wavelet estimation over Sobolev spaces and $L^2$ risk under moderately and severely ill-posed noises. Three years later, Fan and Koo [2] extended those works from Sobolev spaces to Besov spaces. It should be pointed out that, using a different method, Lounici and Nickl [3] studied wavelet optimal estimation over Besov spaces and sup-norm risk under both noises. In [4], we provided a wavelet optimal estimation over Besov spaces and $L^p$ risk under moderately ill-posed noise. The current paper deals with the same problem under severely ill-posed noises. It turns out that our result contains some theorems of [1, 2] as special cases. Our discussion also shows that nonlinear wavelet estimations are not needed for severely ill-posed noise, which is totally different from the moderately ill-posed case.
Let $\varphi$ and $\psi$ be a scaling function and the corresponding (mother) wavelet, respectively. Then each $f \in L^2(\mathbb{R})$ has an expansion (in the $L^2$ sense)
$$f = \sum_{k \in \mathbb{Z}} \alpha_{j_0 k}\,\varphi_{j_0 k} + \sum_{j \ge j_0}\sum_{k \in \mathbb{Z}} \beta_{jk}\,\psi_{jk},$$
with $\alpha_{j_0 k} = \langle f, \varphi_{j_0 k}\rangle$ and $\beta_{jk} = \langle f, \psi_{jk}\rangle$. Here and throughout, we use the standard notation of wavelet analysis, $h_{jk}(x) := 2^{j/2} h(2^j x - k)$. An important class of wavelets is Meyer's, whose Fourier transforms are smooth and compactly supported: $\hat\varphi$ is supported on $[-\frac{4\pi}{3}, \frac{4\pi}{3}]$, and $\hat\psi$ on $\{\xi : \frac{2\pi}{3} \le |\xi| \le \frac{8\pi}{3}\}$. In this paper, the Fourier transform of $f \in L^1(\mathbb{R})$ is defined by
$$f^{ft}(t) = \hat f(t) = \int_{\mathbb{R}} f(x)\,e^{itx}\,dx.$$
The classical method extends that definition to $L^2(\mathbb{R})$ functions.
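For concreteness, one standard normalization of the Meyer pair under the Fourier convention above is the following (the cutoff $\nu$ is any smooth function with $\nu = 0$ on $(-\infty, 0]$, $\nu = 1$ on $[1, \infty)$, and $\nu(x) + \nu(1-x) = 1$); this is the textbook construction, reproduced here as a reading aid:

```latex
\hat\varphi(\xi) =
\begin{cases}
1, & |\xi| \le \frac{2\pi}{3},\\
\cos\!\left(\frac{\pi}{2}\,\nu\!\left(\frac{3|\xi|}{2\pi} - 1\right)\right), & \frac{2\pi}{3} \le |\xi| \le \frac{4\pi}{3},\\
0, & \text{otherwise},
\end{cases}
\qquad
\hat\psi(\xi) = e^{i\xi/2}
\begin{cases}
\sin\!\left(\frac{\pi}{2}\,\nu\!\left(\frac{3|\xi|}{2\pi} - 1\right)\right), & \frac{2\pi}{3} \le |\xi| \le \frac{4\pi}{3},\\
\cos\!\left(\frac{\pi}{2}\,\nu\!\left(\frac{3|\xi|}{4\pi} - 1\right)\right), & \frac{4\pi}{3} \le |\xi| \le \frac{8\pi}{3},\\
0, & \text{otherwise}.
\end{cases}
```

In particular $\operatorname{supp}\hat\varphi = [-\frac{4\pi}{3}, \frac{4\pi}{3}]$ and $\operatorname{supp}\hat\psi \subseteq \{\xi : \frac{2\pi}{3} \le |\xi| \le \frac{8\pi}{3}\}$, which is what makes the deconvolution integrals below well defined.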
The following two lemmas are fundamental in our discussions. We use $\|f\|_p$ to denote the $L^p(\mathbb{R})$ norm of $f$, and $\|\lambda\|_{l^p}$ the $l^p$ norm of a sequence $\lambda = (\lambda_k)_k$, where $\|\lambda\|_{l^p} = \bigl(\sum_k |\lambda_k|^p\bigr)^{1/p}$ (with the usual supremum convention for $p = \infty$).
Lemma 1. Let $h$ be a scaling or a wavelet function with sufficiently fast decay. Then, there exist constants $C_2 \ge C_1 > 0$ such that for $\lambda = (\lambda_k)_k \in l^p$ with $1 \le p \le \infty$,
$$C_1\, 2^{j(1/2 - 1/p)}\|\lambda\|_{l^p} \;\le\; \Bigl\|\sum_{k}\lambda_k h_{jk}\Bigr\|_p \;\le\; C_2\, 2^{j(1/2 - 1/p)}\|\lambda\|_{l^p}.$$
One of the advantages of wavelet bases is that they can characterize Besov spaces. To introduce those spaces, we need the well-known Sobolev spaces with integer exponents,
$$W^n_p(\mathbb{R}) = \{f \in L^p(\mathbb{R}) : f^{(n)} \in L^p(\mathbb{R})\},$$
with the Sobolev norm $\|f\|_{W^n_p} = \|f\|_p + \|f^{(n)}\|_p$. Then $L^p(\mathbb{R})$ can be considered as $W^0_p(\mathbb{R})$. For $s = n + \alpha$ with $n \in \mathbb{N} \cup \{0\}$ and $\alpha \in (0, 1]$, and $p, q \in [1, \infty]$, a Besov space is defined by
$$B^s_{p,q}(\mathbb{R}) = \bigl\{f \in W^n_p(\mathbb{R}) : \|t^{-\alpha}\,\omega^2_p(f^{(n)}, t)\|_{q^*} < \infty\bigr\}$$
with the norm $\|f\|_{B^s_{p,q}} = \|f\|_{W^n_p} + \|t^{-\alpha}\,\omega^2_p(f^{(n)}, t)\|_{q^*}$, where $\omega^2_p(f, t) = \sup_{|h| \le t}\|f(\cdot + 2h) - 2f(\cdot + h) + f\|_p$ denotes the smoothness modulus of $f$ and $\|u\|_{q^*} = \bigl(\int_0^\infty |u(t)|^q\,\frac{dt}{t}\bigr)^{1/q}$ (suitably modified when $q = \infty$).
Lemma 2. Let $\varphi$ be a Meyer scaling function and $\psi$ be the corresponding wavelet. If $s > 0$, $p, q \in [1, \infty]$, $\alpha_{jk} = \langle f, \varphi_{jk}\rangle$, and $\beta_{jk} = \langle f, \psi_{jk}\rangle$, then the following assertions are equivalent:
(i) $f \in B^s_{p,q}(\mathbb{R})$;
(ii) $\{2^{js}\|P_j f - f\|_p\}_{j \ge j_0} \in l^q$, where $P_j f = \sum_k \alpha_{jk}\varphi_{jk}$;
(iii) $\|\alpha_{j_0\cdot}\|_{l^p} + \bigl\|\bigl(2^{j(s + 1/2 - 1/p)}\|\beta_{j\cdot}\|_{l^p}\bigr)_{j \ge j_0}\bigr\|_{l^q} < \infty$.
In each case,
$$\|f\|_{B^s_{p,q}} \sim \|\alpha_{j_0\cdot}\|_{l^p} + \bigl\|\bigl(2^{j(s + 1/2 - 1/p)}\|\beta_{j\cdot}\|_{l^p}\bigr)_{j \ge j_0}\bigr\|_{l^q}.$$
Here and after, $A \lesssim B$ denotes $A \le CB$ for some constant $C > 0$; $A \gtrsim B$ means $B \lesssim A$; $A \sim B$ stands for both $A \lesssim B$ and $B \lesssim A$; $\beta_{j\cdot}$ does for the sequence $(\beta_{jk})_k$.
At the end of this subsection, we make some assumptions on the noise density $g$, which will be dealt with in this current paper. For $c > 0$, $\beta > 0$, and $\alpha \in \mathbb{R}$:
(C1) $|g^{ft}(t)| \lesssim (1 + |t|)^{\alpha}\, e^{-c|t|^{\beta}}$;
(C2) $|(g^{ft})'(t)| \lesssim (1 + |t|)^{\alpha}\, e^{-c|t|^{\beta}}$;
(C3) $|g^{ft}(t)| \gtrsim (1 + |t|)^{\alpha}\, e^{-c|t|^{\beta}}$.
Clearly, the classical Cauchy density satisfies (C1)–(C3) with $\beta = 1$ and $c = 1$, and the Gaussian density does satisfy (C1)–(C3) with $\beta = 2$, $c = 1/2$, and $\alpha = 0$. It should be pointed out that those conditions (C1)–(C3) are a little different from those of the works cited above.
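The Cauchy example can be checked numerically: the standard Cauchy density $g(x) = 1/(\pi(1+x^2))$ has $g^{ft}(t) = e^{-|t|}$, i.e., exponential decay with $\beta = 1$. A small sketch (the truncation radius and grid below are arbitrary illustrative choices):

```python
import numpy as np

def cauchy_cf(t, R=2000.0, m=400_001):
    # Trapezoid rule for g^ft(t) = integral of cos(t x) / (pi (1 + x^2)) dx;
    # the sine part vanishes by symmetry, and the tail beyond R is O(1/R).
    x = np.linspace(-R, R, m)
    dx = x[1] - x[0]
    v = np.cos(t * x) / (np.pi * (1.0 + x**2))
    return (v.sum() - 0.5 * (v[0] + v[-1])) * dx

for t in (1.0, 2.0, 3.0):
    print(t, cauchy_cf(t), np.exp(-t))
```

The computed values match $e^{-|t|}$; repeating the experiment with the Gaussian density gives $e^{-t^2/2}$, i.e., $\beta = 2$.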
In the next section, we define a wavelet linear estimator and provide an upper bound estimation over Besov spaces and $L^p$ risk under the condition (C3); the third part gives a lower bound estimation, which shows that the result of Section 2 is optimal; some concluding remarks are discussed in the last part.
2. Upper Bound
To introduce the main theorem of this section, we assume that $Y_1, \dots, Y_n$ are independent and identically distributed (i.i.d.) random variables of $Y = X + \varepsilon$, the density $g$ of the random noise satisfies condition (C3), and $\varphi$ stands for the Meyer scaling function. As in [3], define
$$\hat\alpha_{jk} = \frac{1}{2\pi n}\sum_{i=1}^{n}\int \frac{\overline{\varphi^{ft}_{jk}(t)}}{g^{ft}(t)}\, e^{itY_i}\,dt,$$
as well as a linear wavelet estimator
$$\hat f_n = \sum_{k}\hat\alpha_{j_1 k}\,\varphi_{j_1 k}$$
(the positive integer $j_1$ will be given later on). Then $E\hat\alpha_{jk} = \alpha_{jk}$, $\hat f_n$ is well defined because $\varphi^{ft}$ is compactly supported, and $E\hat f_n = P_{j_1} f$.
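The coefficient estimator can be sketched numerically. Below, the band-limited Shannon scaling function ($\hat\varphi = \mathbf{1}_{[-\pi,\pi]}$) stands in for Meyer's, and the signal and noise laws are illustrative choices, not fixed by the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
X = rng.normal(0.0, 1.0, n)        # unobserved signal; f = N(0,1) density
Y = X + rng.normal(0.0, 1.0, n)    # Gaussian noise: severely ill-posed

t = np.linspace(-np.pi, np.pi, 1001)   # support of the Shannon phi-hat
dt = t[1] - t[0]
g_ft = np.exp(-t**2 / 2.0)             # Fourier transform of the noise density

# Empirical characteristic function of Y (real part; the imaginary part
# averages to ~0 here because all densities are symmetric about 0).
ecf = np.array([np.cos(tt * Y).mean() for tt in t])

def trap(v):
    # simple trapezoid rule on the fixed grid
    return (v.sum() - 0.5 * (v[0] + v[-1])) * dt

# hat alpha_{0,0} = (1/(2 pi)) * int ecf(t) / g^ft(t) dt   (k = 0)
alpha_hat = trap(ecf / g_ft) / (2.0 * np.pi)
# true alpha_{0,0} = <f, phi> = (1/(2 pi)) * int f^ft(t) dt over [-pi, pi]
alpha_true = trap(np.exp(-t**2 / 2.0)) / (2.0 * np.pi)
print(alpha_hat, alpha_true)
```

The division by $g^{ft}$ is what inflates the variance: for Gaussian noise the factor $e^{t^2/2}$ already reaches $e^{\pi^2/2} \approx 138$ at the coarsest level, which is the quantitative reason the resolution level can grow only logarithmically in $n$.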
We use $\mathrm{supp}\,f$ to stand for the support of $f$ and $|\mathrm{supp}\,f|$ to do for its length. Moreover, for $L > 0$, denote
$$B^s_{r,q}(L) = \bigl\{f : f \text{ is a density function},\ |\mathrm{supp}\,f| \le L,\ \|f\|_{B^s_{r,q}} \le L\bigr\}.$$
It is reasonable to assume $s > 1/r$ for $f \in B^s_{r,q}(L)$, since $B^s_{r,q}(\mathbb{R})$ embeds continuously into the space of bounded continuous functions in that case.
Theorem 3. Let $g$ satisfy (C3) and $\varphi$ be the Meyer scaling function. If $f \in B^s_{r,q}(L)$ with $s > 1/r$ and $r, q \in [1, \infty]$, then, with $1 \le p < \infty$, $s' = s - (1/r - 1/p)_+$, $x_+ = \max\{x, 0\}$, and $2^{j_1} \sim \bigl(\frac{\ln n}{c'}\bigr)^{1/\beta}$ for a suitable constant $c' > 0$,
$$\sup_{f \in B^s_{r,q}(L)} E\|\hat f_n - f\|_p^p \;\lesssim\; (\ln n)^{-s'p/\beta}.$$
In particular, $s'$ can be replaced by $s$, when $r \ge p$.
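Heuristically, and as our own sketch with constants suppressed rather than the paper's display, the choice of $j_1$ balances the approximation error against a stochastic term inflated by $1/g^{ft}$:

```latex
E\|\hat f_n - f\|_p^p
\;\lesssim\;
\underbrace{2^{-j_1 s' p}}_{\text{bias of } P_{j_1} f}
\;+\;
\underbrace{n^{-p/2}\, e^{c'' p\, 2^{\beta j_1}}}_{\text{stochastic term}},
\qquad
2^{j_1} \sim \left(\frac{\ln n}{c'}\right)^{1/\beta}
\;\Longrightarrow\;
E\|\hat f_n - f\|_p^p \;\lesssim\; (\ln n)^{-s'p/\beta}.
```

The exponential factor comes from $\sup_{|t| \lesssim 2^{j_1}} |g^{ft}(t)|^{-1} \lesssim e^{c\,2^{\beta j_1}}$ under (C3): any resolution level growing faster than a power of $\ln n$ makes the stochastic term explode, which is why the attainable rate is logarithmic rather than polynomial in $n$.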
Proof. When $r \le p$, $s' = s - 1/r + 1/p$. Since $B^s_{r,q}(\mathbb{R})$ is continuously embedded into $B^{s'}_{p,q}(\mathbb{R})$, Lemma 2 implies $\|f\|_{B^{s'}_{p,q}} \lesssim \|f\|_{B^s_{r,q}}$. Hence,
When , one obtains that, for some , and
In fact, the compact support of $f$ and the Hölder inequality imply that $\|f\|_p \lesssim \|f\|_r$, due to $r > p$. By the definition of the Besov norm, $\|f\|_{B^s_{p,q}} \lesssim \|f\|_{B^s_{r,q}}$. According to (13) and (14), it is sufficient to prove
for the conclusion of Theorem 3.
Recall that and . Then By and Lemma 2, To estimate the middle term of (16), one observes that , . Since is the Meyer scaling function, and On the other hand, with . Therefore and . This with Lemma 1 leads to
Now, it remains to consider : Using and Lemma 1, one knows Clearly, . Define . Then and . To apply Rosenthal's inequality (Proposition 10.2, ), one estimates and : note that due to (C3) and . Then
Because the summands are i.i.d., Rosenthal's inequality tells us that This with (21) implies that, for , . Moreover, (20) reduces to Then it follows from (16)–(19) and (23) that By the choices stated in Theorem 3, one obtains that , Finally, the desired conclusion (15) follows.
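For reference, the form of Rosenthal's inequality invoked above (the standard statement for independent, mean-zero random variables) is:

```latex
E\left|\sum_{i=1}^{n} Z_i\right|^{p}
\;\le\;
C_p \left( \sum_{i=1}^{n} E|Z_i|^{p}
\;+\; \left(\sum_{i=1}^{n} E Z_i^{2}\right)^{p/2} \right),
\qquad p \ge 2,\quad E Z_i = 0.
```

For i.i.d. summands this gives $E\bigl|\sum_i Z_i\bigr|^p \lesssim n\,E|Z_1|^p + (n\,E Z_1^2)^{p/2}$, and for large $n$ the second term is typically the dominant one.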
Remark 4. Note that the choices of and do not depend on the unknown parameters , , and . Then our linear wavelet estimator over Besov space is adaptive or implementable. The same conclusion holds for and estimations; see Theorem 2 in  and Corollary 1 in . On the other hand, when and , our Theorem 3 reduces to Theorem 4 in ; from the proof of Theorem 3, we find that, for , the assumption can be replaced by , which is the same as in . Therefore, for , Theorem 3 of  follows directly from our Theorem 3.
3. Lower Bound
In this part, we will provide a lower bound estimation, which shows Theorem 3 to be the best possible in some sense. The following lemmas are needed in the proof of our main theorem of this section.
Lemma 5. Let with , , and . Then for ( when ), there exists such that . If is the Meyer wavelet function and , then, for some small ,
Proof. It is easy to see that (for ) and by the definition of Besov space. Since
where denotes the largest integer no more than its argument; the term can be made small enough by choosing small , when . Clearly, is needed, when .
If , then because is the Meyer function. Note that . Then for some small and , Hence, (26) holds for . On the other hand, when , and . Therefore, (26) is true, when small enough. This completes the proof of Lemma 5.
The next lemma extends an estimate in the proof of Theorem 1 in .
Lemma 6. Let be the Meyer wavelet function, , defined as in Lemma 5. If satisfies (C1), (C2), and , then
Proof. As shown in the proof of Theorem 1 of , one easily finds that and therefore
By the Parseval identity, (C1) and , . Moreover, the orthonormality of yields
To estimate , one proves an inequality: Note that , , and . Then and Since , ; On the other hand, the boundedness of and implies that as well as . Hence, , which proves (32).
Define . Then and is locally absolutely continuous. Therefore, and Clearly, and thanks to (C1), (C2), and Moreover, because of (32) and the orthonormality of . This with (35), (31), and (30) leads to the desired conclusion of Lemma 6.
Two more classical theorems play important roles in our discussions. We list the first one as Lemma 7, which can be found in .
Lemma 7 (Varshamov-Gilbert). Let $\Omega = \{0, 1\}^m$ with $m \ge 8$. Then there exists a subset $\{\omega^{(0)}, \omega^{(1)}, \dots, \omega^{(M)}\}$ of $\Omega$ such that $\omega^{(0)} = (0, \dots, 0)$, $M \ge 2^{m/8}$, and for $0 \le i < j \le M$,
$$\sum_{k=1}^{m}\bigl|\omega^{(i)}_k - \omega^{(j)}_k\bigr| \;\ge\; \frac{m}{8}.$$
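The guarantee in Lemma 7 is easy to observe numerically; the following greedy pass over random candidates in $\{0,1\}^m$ comfortably exceeds the promised $2^{m/8}$ codewords (an illustration, not the lemma's combinatorial proof):

```python
import numpy as np

# Toy check of the Varshamov-Gilbert guarantee: {0,1}^m contains at least
# 2^(m/8) points with pairwise Hamming distance >= m/8.
rng = np.random.default_rng(0)
m = 24
d_min = m // 8           # required pairwise Hamming distance (= 3 here)
target = 2 ** (m // 8)   # number of codewords the lemma guarantees (= 8 here)

code = []
for _ in range(200):
    cand = rng.integers(0, 2, m)
    # keep the candidate only if it is far from every codeword kept so far
    if all(np.sum(cand != c) >= d_min for c in code):
        code.append(cand)

print(len(code), ">=", target)
```

In the lower-bound proof, the hypotheses $f_\omega$ are indexed by such well-separated binary vectors, so that distinct hypotheses stay far apart in $L^p$.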
Given two probability measures $P$ and $Q$ on a measurable space $(\Omega, \mathcal{F})$, the Kullback divergence of $P$ and $Q$ is defined by
$$K(P, Q) = \begin{cases}\displaystyle\int \ln\Bigl(\frac{dP}{dQ}\Bigr)\,dP, & P \ll Q,\\[6pt] +\infty, & \text{otherwise}.\end{cases}$$
Here, $P \ll Q$ stands for $P$ absolutely continuous with respect to $Q$. In that case, $\frac{dP}{dQ}$ denotes the density (Radon-Nikodym derivative) of $P$ with respect to $Q$. The second classical theorem is taken from .
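For two Gaussian laws the divergence has a closed form, which makes a quick numerical check of this definition possible (the parameters below are illustrative):

```python
import numpy as np

mu = 0.5
x = np.linspace(-10.0, 10.0, 20_001)
dx = x[1] - x[0]
p = np.exp(-x**2 / 2.0) / np.sqrt(2.0 * np.pi)          # density of N(0, 1)
q = np.exp(-(x - mu)**2 / 2.0) / np.sqrt(2.0 * np.pi)   # density of N(mu, 1)

# K(P, Q) = int p * ln(p / q) dx; for N(0,1) vs N(mu,1) this equals mu^2 / 2.
kl = np.sum(p * np.log(p / q)) * dx
print(kl, mu**2 / 2.0)
```

In the lower-bound argument the relevant divergences are between product measures built from the hypotheses, and $K(P^{\otimes n}, Q^{\otimes n}) = n\,K(P, Q)$ scales linearly in the sample size $n$.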
Lemma 8 (Fano). Let $(\Omega_i, \mathcal{F}_i, P_i)$, $i = 0, 1, \dots, M$, be probability measurable spaces and $A_i \in \mathcal{F}_i$, $i = 0, 1, \dots, M$. If $A_i \cap A_j = \emptyset$ for $i \ne j$, then
$$\sup_{0 \le i \le M} P_i(A_i^c) \;\ge\; \min\Bigl\{\frac{1}{2},\ \sqrt{M}\,\exp\bigl(-3e^{-1} - \kappa_M\bigr)\Bigr\},$$
where $\kappa_M = \inf_{0 \le v \le M}\frac{1}{M}\sum_{i \ne v} K(P_i, P_v)$, and $A^c$ denotes the complement of a set $A$.
Now, we are in the position to state the main theorem in this section.
Theorem 9. Let $g$ satisfy (C1) and (C2), and let $\hat f_n$ be an estimator of $f$. Then for $s > 1/r$, $r, q \in [1, \infty]$, $1 \le p < \infty$, and $s' = s - (1/r - 1/p)_+$, there exists a constant $C > 0$ independent of $n$ such that with $n$ large enough,
$$\sup_{f \in B^s_{r,q}(L)} E\|\hat f_n - f\|_p^p \;\ge\; C (\ln n)^{-s'p/\beta}.$$
Proof. Assume that is the Meyer wavelet function, then . By Lemma 2,
for . Furthermore, with the function defined in Lemma 5, there exists such that and due to that Lemma. Define
Then because and .
By Lemma 7, one finds with and such that for and , . It is easy to see that This with Lemma 1 leads to , and therefore Define for . Then , when . Clearly, is a density function because both and are density functions. Let be the probability measure on the Lebesgue space with the density . Then Lemma 8 tells that
According to Lemma 5, and . Moreover, . Since , . Combining this with , one knows Because , the above inequality reduces to thanks to Lemma 6. Hence,
Note that and take such that Then (choose small enough). Furthermore, (45) reduces to Hence, . This with (44) and (48) leads to which is the desired conclusion of Theorem 9, when ( in that case).
When , , it remains to show Similar to the proof of (50), one takes small such that satisfies , and . Clearly, and for . Since is the Meyer wavelet function, and
Define . Then and due to Lemma 8. Arguments similar to (and even simpler than) the estimation of show . Taking as in (48), one obtains that and by choosing small . Thus, (54) reduces to Moreover, . This with (53) and (48) leads to (51). This completes the proof of Theorem 9.
Remark 10. By Theorems 3 and 9, the linear wavelet estimator is optimal for a density in Besov spaces under severely ill-posed noise. Therefore, we do not need to consider nonlinear wavelet estimations in that case. This contrasts sharply with the moderately ill-posed noise case, in which nonlinear wavelet estimation improves the linear one [2, 4].
Remark 11. When and , our Theorem 9 is better than Theorem 6 in , because . Moreover, Theorems 9 and 3 lead to Theorem 3 in that paper for and . In addition, our conditions (C1) and (C2) are a little weaker than the assumptions in .
4. Concluding Remarks
This paper provides an $L^p$ risk upper bound for a linear wavelet estimator (Theorem 3), which turns out to be optimal (Theorem 9). Therefore, nonlinear estimations are not needed under severely ill-posed noises. Although we assume in Theorem 9, the proof of that theorem shows that, for , In particular, when , this above estimation reduces to a partial result of Theorem 1 in .
Note that our model assumes the noise to be severely ill-posed; that is, the density of noise satisfies