A Note on the Adaptive Estimation of a Conditional Continuous-Discrete Multivariate Density by Wavelet Methods
We investigate the estimation of a multivariate continuous-discrete conditional density. We develop an adaptive estimator based on wavelet methods. We prove its good theoretical performance by determining sharp rates of convergence under the risk with for a wide class of unknown conditional densities. A simulation study illustrates the good practical performance of our estimator.
The estimation of conditional densities is an important statistical challenge with applications in many practical problems, especially those connected with forecasting (economics, etc.). There is a vast literature in this area; we refer to the papers of Li and Racine, Akakpo and Lacour, and Chagny and the references therein. In this paper we focus on a specific problem: the estimation of a multivariate continuous-discrete conditional density. The model is described as follows. Let , , , and be positive integers and let be iid random vectors defined on the probability space . We suppose that is continuous with support and that is discrete with support . Let be the density of . We define the density function of conditionally on the event by. We aim to estimate from . The most common approach is based on the kernel methods developed by Li and Racine. Applications and recent developments of these methods are described in detail in Li and Racine.
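A toy model makes this definition concrete. The following Python sketch is purely illustrative and rests on assumptions of our own that are not part of the paper:

- the continuous component is one-dimensional on [0, 1];
- Y follows a Binomial(2, 1/2) distribution;
- given Y = m, X follows a Beta(m + 1, 2) distribution.

The joint density f(x, m) equals the conditional density times P(Y = m), so dividing by P(Y = m) recovers a function of x that integrates to one.

```python
import math
import numpy as np

# Hypothetical toy model (not from the paper): Y ~ Binomial(2, 1/2) on {0, 1, 2}
# and, given Y = m, X ~ Beta(m + 1, 2) on [0, 1].
P_Y = {m: math.comb(2, m) * 0.5 ** 2 for m in range(3)}

def beta_pdf(x, a, b):
    """Beta(a, b) density evaluated at x in [0, 1]."""
    const = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return const * x ** (a - 1) * (1 - x) ** (b - 1)

def joint_density(x, m):
    """f(x, m): joint continuous-discrete density of (X, Y)."""
    return P_Y[m] * beta_pdf(x, m + 1, 2)

def conditional_density(x, m):
    """Density of X conditionally on the event {Y = m}: f(x, m) / P(Y = m)."""
    return joint_density(x, m) / P_Y[m]

def sample(n, rng):
    """Draw n iid pairs (X_i, Y_i) from the toy model."""
    y = rng.binomial(2, 0.5, size=n)
    x = rng.beta(y + 1, 2)          # one Beta draw per observation, shape from y
    return x, y
```

Summing the joint density over m and integrating over x gives one. Each conditional density integrates to one on its own.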
In this paper we develop a new estimator based on wavelet methods. It is now well established that, in comparison with kernel methods, wavelet methods have the advantage of achieving a high degree of adaptivity over a large class of unknown functions, including functions with complex discontinuities (jumps, spikes, etc.). See, for instance, Antoniadis, Härdle et al., and Vidakovic. This motivates our interest in developing wavelet methods for the conditional density estimation problem considered here. The main ingredients in the construction of are an estimation of by a new wavelet estimator , an estimation of by an empirical estimator, and a global thresholding technique developed by Vasiliev. In particular, the considered estimator can be viewed as a multivariate (but “nonsmooth”) version of the one introduced in the univariate case, that is, , in Chesneau et al. We prove that is both adaptive and efficient: its construction does not depend on the smoothness of , and, under mild assumptions on the smoothness of (we assume that it belongs to a wide class of functions, the so-called Besov balls), it attains fast rates of convergence under the risk (with ). These theoretical guarantees are complemented by a numerical study showing the good practical performance of our estimator.
The remainder of this paper is organized as follows. Section 2 briefly describes the multidimensional wavelet bases and Besov balls under consideration. Our wavelet estimator and some of its theoretical properties are presented in Section 3. A short numerical study is given in Section 4. Finally, the proofs are gathered in Section 5.
2. Multidimensional Wavelet Bases and Besov Balls
Let be positive integers and let . First of all, we define the spaces as .
In this study, we consider a wavelet basis on built from the scaling and wavelet functions and , respectively, of the Daubechies family (see ). For any , we set where forms the set of all nonempty subsets of of cardinality greater than or equal to .
For any integer and any , we consider
Let . Then, with an appropriate treatment at the boundaries, there exists an integer such that the collection forms an orthonormal basis of . A function can be expanded into a wavelet series as where . All details about these wavelet bases, including the wavelet series expansion described above, can be found in, for example, Meyer, Daubechies, Cohen et al., and Mallat.
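For intuition, the following Python sketch builds a tensor-product basis on [0, 1]^d and checks its orthonormality numerically. It deliberately swaps the Daubechies family used in the paper for the Haar wavelet, which has a closed form and needs no boundary treatment on [0, 1]. It also assumes, as is standard, that the wavelet index u ranges over the nonempty subsets of the coordinates: ψ is used on the coordinates in u and φ on the others. The names Phi, Psi and basis are hypothetical.

```python
import itertools
import numpy as np

def phi(t):
    """Haar scaling function: indicator of [0, 1)."""
    return ((t >= 0) & (t < 1)).astype(float)

def psi(t):
    """Haar wavelet: +1 on [0, 1/2), -1 on [1/2, 1)."""
    return phi(2 * t) - phi(2 * t - 1)

def dilate(f, j, k, t):
    """Return 2^{j/2} f(2^j t - k)."""
    return 2 ** (j / 2) * f(2 ** j * t - k)

def Phi(j, k, x):
    """Tensor-product scaling function at level j, shift k (a d-tuple); x has shape (N, d)."""
    return np.prod([dilate(phi, j, k[i], x[:, i]) for i in range(x.shape[1])], axis=0)

def Psi(j, k, u, x):
    """Tensor-product wavelet: psi on the coordinates in u, phi on the others."""
    return np.prod([dilate(psi if i in u else phi, j, k[i], x[:, i])
                    for i in range(x.shape[1])], axis=0)

def nonempty_subsets(d):
    """All nonempty subsets of {0, ..., d-1} (0-based coordinate indices)."""
    return [set(c) for r in range(1, d + 1) for c in itertools.combinations(range(d), r)]

def basis(j0, j1, d):
    """Scaling functions at level j0 and wavelets at levels j0..j1 on [0, 1]^d."""
    funcs = [lambda x, k=k: Phi(j0, k, x)
             for k in itertools.product(range(2 ** j0), repeat=d)]
    for j in range(j0, j1 + 1):
        for u in nonempty_subsets(d):
            for k in itertools.product(range(2 ** j), repeat=d):
                funcs.append(lambda x, j=j, k=k, u=u: Psi(j, k, u, x))
    return funcs
```

With d = 2, j0 = 0 and j1 = 1, the sketch produces 16 functions. This matches the dimension of the space spanned by the level-2 Haar scaling functions on the unit square. On a dyadic grid of midpoints, the Gram matrix is the identity.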
Let , , , and . We say that a function belongs to the Besov ball if and only if the associated wavelet coefficients (5) satisfy and with the usual modifications for or .
These sets contain function classes of significant spatial inhomogeneity, including Sobolev balls and Hölder balls. Details about Besov balls can be found in, for example, Meyer  and Härdle et al. .
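Besov ball membership is defined on the wavelet coefficients, so it can be checked directly on a finite set of coefficients. The sketch below assumes the standard sequence-space form of this condition (as in, e.g., Meyer or Härdle et al.), for finite parameters only. Write s for the smoothness, π and r for the Besov indices and d for the dimension; this is our notation, since the symbols are not reproduced here. The condition is that

(Σ_k |α_{τ,k}|^π)^{1/π} + (Σ_{j≥τ} (2^{j(s + d/2 - d/π)} (Σ_{u,k} |β_{j,k,u}|^π)^{1/π})^r)^{1/r}

does not exceed the radius M. The function names are hypothetical.

```python
import numpy as np

def besov_sequence_norm(alpha, betas, s, pi, r, d):
    """Besov sequence norm of wavelet coefficients, for finite pi >= 1 and r >= 1.

    alpha: coefficients alpha_{tau,k} at the coarsest level tau.
    betas: dict mapping each level j >= tau to an array holding all beta_{j,k,u}.
    """
    alpha = np.asarray(alpha, dtype=float)
    norm_alpha = np.sum(np.abs(alpha) ** pi) ** (1 / pi)
    level_terms = []
    for j, b in sorted(betas.items()):
        b = np.asarray(b, dtype=float)
        weight = 2.0 ** (j * (s + d / 2 - d / pi))   # level weight 2^{j(s + d/2 - d/pi)}
        level_terms.append((weight * np.sum(np.abs(b) ** pi) ** (1 / pi)) ** r)
    return float(norm_alpha + sum(level_terms) ** (1 / r))

def in_besov_ball(alpha, betas, s, pi, r, d, M):
    """True if the coefficients satisfy the Besov ball condition with radius M."""
    return besov_sequence_norm(alpha, betas, s, pi, r, d) <= M
```

The usual modifications for π = ∞ or r = ∞, which replace the corresponding sums by maxima, are not covered by this sketch.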
3. Conditional Density Estimation
We formulate the following assumptions. There exists a known constant such that . There exists a known constant such that .
We propose the following “ratio-thresholding estimator” for : , where 1 denotes the indicator function, refers to the constant in , and is defined by where is a large enough constant, is an integer such that , and is defined by .
The estimator (10) applies hard thresholding to the wavelet coefficient estimators (12). Such a selection rule is at the heart of the adaptivity of wavelet methods, which are able to capture the most important wavelet coefficients of a function, that is, those with the largest magnitudes. We refer to Antoniadis, Härdle et al., and Vidakovic for further details. The threshold, that is, , corresponds to the universal threshold proposed by Donoho and Johnstone and Donoho et al. Its choice rests on technical considerations that ensure good convergence properties of the hard thresholding wavelet estimator (see also Theorem A.3 in the Appendix).
Note that (10) can be viewed as a nonsmooth multivariate version of the estimator proposed by Chesneau et al. Its main advantage is that it is easier to implement in practice (see Section 4 below for a numerical comparison in the univariate case). Concerning , let us mention that it is a natural unbiased estimator for with good convergence properties, which will be used in the proof of our main result.
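The unbiasedness of the empirical estimator of the probability of the conditioning event is easy to check by simulation. The sketch below assumes that this estimator is the proportion of observations with Y_i = m, which is the natural empirical estimator. It uses a hypothetical discrete distribution, Binomial(3, 0.4) with m = 1, so that P(Y = 1) = 0.432 and the variance of the estimator is P(Y = 1)(1 - P(Y = 1))/n.

```python
import numpy as np

P_TRUE = 3 * 0.4 * 0.6 ** 2   # P(Y = 1) for Y ~ Binomial(3, 0.4), i.e. 0.432

def pi_hat(Y, m):
    """Empirical estimator of P(Y = m): proportion of observations equal to m."""
    return np.mean(np.asarray(Y) == m, axis=-1)

def replicate_pi_hat(n, reps, rng, m=1):
    """reps independent copies of pi_hat(., m), each from a sample of size n."""
    Y = rng.binomial(3, 0.4, size=(reps, n))   # one sample per row
    return pi_hat(Y, m)
```

Averaging many replications recovers P(Y = 1). The empirical variance matches P(1 - P)/n, which is the source of the n^{-1/2} behaviour used in the proof.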
The global construction of (9) follows the idea proposed by Vasiliev  for other statistical contexts. Note that a control on the lower bound of is necessary; it must be large enough to ensure good statistical properties for (9).
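The ratio-thresholding construction can be sketched in a few lines for the univariate case. This Python sketch is not the authors' MATLAB code. It rests on the following assumptions, all our own:

- the Haar wavelet replaces the Daubechies family;
- the coarsest level is 0;
- the wavelet coefficients are estimated by empirical means of φ_{j,k}(X_i)1{Y_i = m} and ψ_{j,k}(X_i)1{Y_i = m};
- the threshold has the universal form κ√(ln n / n);
- the finest level j1 is the largest integer with 2^{j1} ≤ n / ln n;
- the indicator compares the empirical probability with c/2, where c is the known lower bound on P(Y = m).

All names are hypothetical.

```python
import numpy as np

def haar_phi(j, k, x):
    """Dilated/translated Haar scaling function 2^{j/2} phi(2^j x - k)."""
    t = 2.0 ** j * x - k
    return 2 ** (j / 2) * ((t >= 0) & (t < 1)).astype(float)

def haar_psi(j, k, x):
    """Dilated/translated Haar wavelet 2^{j/2} psi(2^j x - k)."""
    t = 2.0 ** j * x - k
    left = ((t >= 0) & (t < 0.5)).astype(float)
    right = ((t >= 0.5) & (t < 1)).astype(float)
    return 2 ** (j / 2) * (left - right)

def ratio_threshold_estimator(X, Y, m, c, kappa=1.0, j0=0):
    """Return a callable x -> estimate of the density of X given Y = m on [0, 1)."""
    n = len(X)
    ind = (np.asarray(Y) == m).astype(float)          # 1{Y_i = m}
    pi_hat = ind.mean()                                # empirical P(Y = m)
    lam = kappa * np.sqrt(np.log(n) / n)               # universal-type threshold
    j1 = int(np.floor(np.log2(n / np.log(n))))         # finest resolution level
    alpha = {k: np.mean(haar_phi(j0, k, X) * ind) for k in range(2 ** j0)}
    beta = {}
    for j in range(j0, j1 + 1):
        for k in range(2 ** j):
            b = np.mean(haar_psi(j, k, X) * ind)
            if abs(b) >= lam:                          # hard thresholding
                beta[(j, k)] = b

    def g_hat(x):
        x = np.asarray(x, dtype=float)
        if pi_hat < c / 2:                             # guard against a small denominator
            return np.zeros_like(x)
        f_hat = sum(a * haar_phi(j0, k, x) for k, a in alpha.items())
        f_hat = f_hat + sum(b * haar_psi(j, k, x) for (j, k), b in beta.items())
        return f_hat / pi_hat
    return g_hat
```

The ratio structure makes the role of the lower bound visible. When the empirical probability falls below c/2, the sketch returns zero instead of dividing by a small number. With j0 = 0 and data in [0, 1), the estimate integrates exactly to one.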
The following result investigates the rates of convergence attained by (9) under the risk with .
Theorem 1. Let , let , let be (1), and let be defined by (9) with a large enough (the exact condition is described in (29)). Suppose that and hold and that with , , , and . Then there exists a constant such that, for large enough, where
Theorem 1 provides theoretical guarantees on the convergence of (9) under mild assumptions on the smoothness of , and a fortiori under the risk. The obtained rates of convergence are sharp. However, since minimax lower bounds have not been established in our setting, we do not claim that these rates are optimal in the minimax sense. An important benchmark is that, up to a logarithmic term, they coincide with the minimax-optimal rates for the standard multivariate density estimation problem, which corresponds to and , that is, is constant almost surely (see ).
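The benchmark can be read numerically. The sketch below computes the exponent ρ in the classical rates for estimating a d-dimensional density in a Besov ball B^s_{π,r} under the L_p risk by hard-thresholding wavelet estimators; these rates are n^{-ρ} up to logarithmic factors. The formulas are the standard ones from the wavelet density estimation literature. They assume s > d/π and split into a dense zone when ε = πs + d(π - p)/2 > 0 and a sparse zone otherwise. They are not a restatement of Theorem 1, whose exact statement is not reproduced here, and the notation is ours.

```python
def benchmark_rate_exponent(s, pi, p, d):
    """Exponent rho such that the benchmark rate is n^(-rho), up to log factors.

    Assumes s > d / pi, pi >= 1 and p >= 1 (standard wavelet density setting).
    """
    if s <= d / pi:
        raise ValueError("requires s > d / pi")
    eps = pi * s + d * (pi - p) / 2
    if eps > 0:                                               # dense zone
        return s * p / (2 * s + d)
    return (s - d / pi + d / p) * p / (2 * (s - d / pi) + d)  # sparse zone
```

At the boundary ε = 0 the two formulas coincide, which is a useful sanity check. For example, s = 1.5, π = 1, p = 4 and d = 1 give ρ = 1.5 in both branches.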
Finally, note that the factor plays a secondary role in our study: it appears only in the presentation of the model and in the construction of , and the performance of the estimator does not depend on the value of .
4. A Short Numerical Study
In this section we investigate some practical aspects of our wavelet methods. For the sake of simplicity, we focus on the univariate case, that is, (so , , , etc.). The codes are written in MATLAB and adapted from Ramirez and Vidakovic. First, we compare the new density estimators with those proposed in our former publication, Chesneau et al., in two respects: accuracy and computation time. As in Chesneau et al., to illustrate how the errors decrease, we employ the indicator defined by where and are the sample size and the number of replications, respectively, is the true density, and is an estimator. We consider three estimators based on our statistical methodology: the linear wavelet estimator, that is, ; the hard thresholding wavelet estimator defined by (10); and the smooth version of the linear wavelet estimator obtained by local linear regression (see, e.g., ). The practical construction of this smooth version of linear wavelet estimators was proposed by Ramirez and Vidakovic, and several studies confirm that it performs well in various fields (see, e.g., [20, 21]).
We adopt a setup similar to that of Chesneau et al.: we use the compactly supported Daubechies wavelet “Daubechies 3” and take . We consider the sample sizes , and generate the data points , from the distribution. The discrete random sample is generated from Binomial(); the bivariate density function is . Table 1 gives the value of computed from simulations for different sample sizes; it should be compared with Table 1 on page 70 of Chesneau et al. The results are similar: decreases as the sample size increases, and the smooth version of the linear wavelet estimator performs best. There is no significant difference between the new estimators and the former ones in Chesneau et al.
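The error indicator and the linear wavelet estimator can be sketched as follows. The exact formula of the indicator, the distribution of the continuous data and the Binomial parameters are not reproduced in this section. This Python sketch therefore makes hypothetical choices that are not the paper's:

- the indicator is the average, over M replications, of the mean squared error at the n sample points;
- X follows a Beta(2, 2) distribution independently of Y ~ Binomial(2, 1/2);
- the Haar wavelet replaces “Daubechies 3”, so that the linear wavelet estimator at level j0 reduces to a histogram with 2^{j0} dyadic bins for each value of Y.

```python
import numpy as np

def linear_haar_joint(X, Y, j0, n_cats):
    """Linear Haar wavelet estimate of the joint density f(x, y) at level j0.

    With Haar, sum_k alpha_hat_{j0,k,y} phi_{j0,k}(x) equals
    2^{j0} * #{i : X_i in the dyadic bin of x and Y_i = y} / n.
    """
    n, nb = len(X), 2 ** j0
    bins = np.minimum((nb * X).astype(int), nb - 1)
    counts = np.bincount(Y * nb + bins, minlength=n_cats * nb).reshape(n_cats, nb)

    def f_hat(x, y):
        q = np.minimum((nb * np.asarray(x)).astype(int), nb - 1)
        return nb * counts[np.asarray(y), q] / n
    return f_hat

def true_joint(x, y):
    """Hypothetical true density: X ~ Beta(2, 2) independent of Y ~ Binomial(2, 1/2)."""
    return 6 * x * (1 - x) * np.array([0.25, 0.5, 0.25])[np.asarray(y)]

def ase_indicator(n, M, j0, rng):
    """Mean over M replications of (1/n) sum_i (f_hat(X_i, Y_i) - f(X_i, Y_i))^2."""
    errors = []
    for _ in range(M):
        X = rng.beta(2, 2, size=n)
        Y = rng.binomial(2, 0.5, size=n)
        f_hat = linear_haar_joint(X, Y, j0, n_cats=3)
        errors.append(np.mean((f_hat(X, Y) - true_joint(X, Y)) ** 2))
    return float(np.mean(errors))
```

Running ase_indicator for increasing n at a fixed level shows the same qualitative behaviour as Table 1: the indicator decreases as the sample size grows.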
Table 2 reports the computation times, in seconds, of the two groups of estimators. The codes were run on an ordinary laptop with 4.3 GB of RAM. The new estimators are much faster than the former ones: for example, when the sample size is , the computation time is about times smaller than that of the former wavelet density estimators. These differences become much larger as the sample size increases.
In the second part of this section, we show the performance of the proposed estimators of conditional density functions. Note that the conditional density function in the above examples satisfies . Figures 1 and 2 depict and , respectively. In each figure, the true conditional density function is shown as a black line, the linear wavelet estimator in blue, the hard thresholding wavelet estimator in red, and the smooth version of the linear one in green.
All the figures illustrate the good performance of our proposed linear and nonlinear estimators of conditional density functions. Recall that the hard thresholding estimator has no tuning parameter; it is entirely adaptive. The smooth version of our linear wavelet estimator has the best performance. Furthermore, Table 3 shows the impact of the sample size on the performance of our estimators and compares the three estimators; the number of replications is 500. As the sample size increases, the value of the indicator decreases, and the smooth version of the linear wavelet estimator again performs best.
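The smoothing step behind the best-performing estimator can be sketched as follows. We assume, in the spirit of Ramirez and Vidakovic but without reproducing their code, two steps:

- the linear wavelet estimate is evaluated on an equispaced grid;
- the grid values are then smoothed by local linear regression with a Gaussian kernel.

The kernel and the bandwidth h are our own choices. Local linear regression reproduces linear functions exactly and smooths out the jumps of a piecewise-constant (Haar-type) estimate.

```python
import numpy as np

def local_linear_smooth(t, v, h):
    """Local linear regression of values v on design points t (Gaussian kernel, bandwidth h)."""
    t = np.asarray(t, dtype=float)
    v = np.asarray(v, dtype=float)
    out = np.empty_like(v)
    for i, t0 in enumerate(t):
        dt = t - t0
        w = np.exp(-0.5 * (dt / h) ** 2)                 # kernel weights
        s0, s1, s2 = w.sum(), (w * dt).sum(), (w * dt ** 2).sum()
        # Intercept of the weighted least-squares line fitted around t0.
        out[i] = np.sum((s2 - s1 * dt) * w * v) / (s0 * s2 - s1 ** 2)
    return out
```

Consider a step function built from eight dyadic bin averages of a smooth density. Smoothing it with a bandwidth somewhat smaller than the bin width brings it noticeably closer to the underlying density, which is consistent with the behaviour reported in Tables 1 and 3.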
5. Proof of Theorem 1
In what follows, denotes any constant that does not depend on , , and . Its value may change from one term to another. For the sake of simplicity, we set . Observe that . Owing to , we have , implying that and . Moreover, note that and, thanks to , . It follows from the triangle inequality and the above inequalities that . By the inequality , , we obtain where . Let us now bound and in turn.
Upper Bound for . We derive an upper bound for using Theorem A.3 in the Appendix. First of all, thanks to implying that , let us expand the density on the considered wavelet basis: where and . Let us now prove that the wavelet coefficient estimators and satisfy Assumptions and of Theorem A.3.
First of all, observe that and are unbiased estimators for and , respectively: We prove similarly that .
Investigation of . Let us focus on the second inequality in ; the first one can be proved with similar arguments. For any , set . Then are zero-mean iid random variables with, by and , . It follows from the Rosenthal inequality (see the Appendix) that
Investigation of . With the same random variables as above, using , note that . It follows from the Bernstein inequality (see the Appendix) with , and, by (26), with , , that where . Taking such that , we obtain . It follows from Theorem A.3 that
Upper Bound for . For any , set . Then are zero-mean iid random variables with . It follows from the Rosenthal inequality (see the Appendix) that . Combining (22), (30), and (31), we obtain . This completes the proof of Theorem 1.
Appendix
Here we state the auxiliary results used in the proof of Theorem 1.
Lemma A.1 (Rosenthal's inequality; see ). Let be a positive integer, let , and let be zero-mean random variables such that . Then there exists a constant such that
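In its classical form, Rosenthal's inequality states that, for independent zero-mean variables and p ≥ 2,

E|Σ_i Z_i|^p ≤ C(p)(Σ_i E|Z_i|^p + (Σ_i E Z_i^2)^{p/2}).

For iid bounded variables this gives E|(1/n)Σ_i Z_i|^p ≤ C n^{-p/2}, which is how the inequality controls the moments of the estimators in Section 5. The Monte Carlo sketch below uses Rademacher variables and p = 4, both chosen by us for illustration. It shows that n^{p/2} E|(1/n)Σ_i Z_i|^p stays bounded; for p = 4 its exact value is 3 - 2/n.

```python
import numpy as np

def scaled_moment(n, p, reps, rng):
    """Monte Carlo estimate of n^{p/2} E|(1/n) sum_i Z_i|^p for iid Rademacher Z_i."""
    # The sum of n Rademacher variables equals 2 * Binomial(n, 1/2) - n.
    S = 2 * rng.binomial(n, 0.5, size=reps) - n
    return float(n ** (p / 2) * np.mean(np.abs(S / n) ** p))
```

Calling scaled_moment for n = 100, 400 and 1600 with p = 4 gives values that hover around 3. This is the n^{-p/2} scaling that Rosenthal's inequality guarantees up to a constant.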
Lemma A.2 (Bernstein's inequality; see ). Let be a positive integer and let be iid zero-mean random variables such that there exists a constant satisfying . Then, for any ,
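A classical form of Bernstein's inequality applies to iid zero-mean variables with |Z_i| ≤ M and variance σ². It reads

P(|(1/n)Σ_i Z_i| ≥ λ) ≤ 2 exp(-nλ²/(2(σ² + λM/3))).

The sketch below compares this bound with simulated tail probabilities for centred Bernoulli(0.1) variables. This is a hypothetical choice, for which σ² = 0.09 and M = 1 is a valid bound.

```python
import math
import numpy as np

def bernstein_bound(n, lam, sigma2, M):
    """Bernstein bound on P(|(1/n) sum_i Z_i| >= lam) for iid zero-mean |Z_i| <= M, Var = sigma2."""
    return 2 * math.exp(-n * lam ** 2 / (2 * (sigma2 + lam * M / 3)))

def empirical_tail(n, lam, reps, rng, p=0.1):
    """Simulated P(|(1/n) sum_i Z_i| >= lam) with Z_i = B_i - p, B_i ~ Bernoulli(p)."""
    S = rng.binomial(n, p, size=reps)                  # sum of the B_i
    return float(np.mean(np.abs(S / n - p) >= lam))
```

Heuristically, take λ of the order of √(ln n / n), as for the threshold in Section 3. The exponent then becomes a multiple of ln n, so the bound decays polynomially in n, with a power that grows with the threshold constant. This suggests why that constant must be large enough.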
Theorem A.3. We consider a general nonparametric statistical framework. Let and let be an unknown function to be estimated from observations, and consider the wavelet decomposition given by (4). Let and be estimators of and , respectively, such that there exist two constants and satisfying Assumptions and below. For any , and for any such that , , and , . For any such that , , and , . Let us define the estimator by , where is the integer satisfying .
Suppose that with , , , and . Then there exists a constant such that where
Theorem A.3 can be proved using arguments similar to those of [16, Theorem 5.1] for the bound on the -risk, together with the multidimensional framework of [17, Theorem 1] for the determination of the rates of convergence.
The authors declare that they have no competing interests.
Q. Li and J. S. Racine, Nonparametric Econometrics: Theory and Practice, Princeton University Press, Princeton, NJ, USA, 2007.
A. Antoniadis, “Wavelets in statistics: a review (with discussion),” Journal of the Italian Statistical Society Series B, vol. 6, pp. 97–144, 1997.
B. Vidakovic, Statistical Modeling by Wavelets, John Wiley & Sons, New York, NY, USA, 1999.
I. Daubechies, Ten Lectures on Wavelets, SIAM, Philadelphia, Pa, USA, 1992.
Y. Meyer, Wavelets and Operators, Cambridge University Press, Cambridge, UK, 1992.
S. Mallat, A Wavelet Tour of Signal Processing: The Sparse Way, Elsevier, Amsterdam, The Netherlands, 3rd edition, 2009, with Contributions from Gabriel Peyré.
V. V. Petrov, Limit Theorems of Probability Theory: Sequences of Independent Random Variables, vol. 4 of Oxford Studies in Probability, Clarendon Press, Oxford, UK, 1995.