Abstract

This paper considers the estimation of the common probability density of independent and identically distributed variables observed with additive measurement errors. A self-consistent estimator of the density function is constructed when the error distribution is known, and a modification of the self-consistent estimation is proposed when the error distribution is unknown. The consistency properties of the proposed estimators and upper bounds on the mean square error and mean integrated square error are established under suitable conditions. Simulation studies are carried out to assess the performance of the proposed method and to compare it with the usual deconvolution kernel method. Two real datasets are analyzed for further illustration.

1. Introduction

The statistical methodology of density estimation has been widely studied in the literature (see [15]) and remains an active topic in nonparametric statistics. For example, Park et al. [6] studied kernel-based local likelihood density estimation. Jones and Henderson [7] proposed a Gaussian copula kernel estimator. Botev et al. [8] introduced an adaptive kernel density estimation method based on the smoothing properties of linear diffusion processes. The most commonly used nonparametric method is kernel density estimation. Usually, the choice of the kernel function is not crucial, whereas the bandwidth parameter, which controls the degree of smoothing, must be chosen carefully. However, it is well known that the choice of bandwidth is difficult. Cross-validation techniques have been applied to this problem, but they are computationally expensive. Furthermore, both estimating the density and choosing the bandwidth become more difficult in the presence of measurement error.

In this paper, we consider density estimation in the presence of additive measurement error. Suppose we have available i.i.d. observations $Y_1, \ldots, Y_n$, which have the same distribution as that of $Y$, and we wish to estimate the unknown density $f_X$ of a random variable $X$, where
$$Y = X + \varepsilon \quad (1)$$
with a measurement error $\varepsilon$, and $\varepsilon$ is independent of $X$. The density function of $\varepsilon$ is denoted as $f_\varepsilon$.

When the distribution of $\varepsilon$ is known, the statistical methodology for estimating the unknown density $f_X$ has been extensively discussed in the literature. For instance, Carroll and Hall [2] and Fan [9, 10] discussed the optimal rates of convergence over a class of functions whose derivatives are Lipschitz continuous using the kernel method. Zhang [11] studied the deconvolution kernel density estimation of mixing densities and distributions and derived the optimal rates of convergence. See also Koo [12], Pensky and Vidakovic [13], Fan and Koo [14], and Comte et al. [15], among others, for earlier contributions. Recently, Butucea [4] and Butucea and Tsybakov [5] evaluated the minimax rate of convergence of the pointwise risk using the kernel method and computed upper bounds for the risk of the estimator. However, most of those papers deal only with theoretical aspects of the estimation, and very few focus on the important practical issue of choosing the bandwidth. Delaigle and Gijbels [16] proposed a bandwidth selection procedure based on bootstrap techniques and proved that the mean integrated squared error (MISE) of the bootstrap estimator for the density function converges to the exact mean integrated squared error. However, the algorithm is complex, and the quality of the density estimator depends strongly on the pilot bandwidth, which must be chosen before the bootstrap procedure.

Indeed, in most practical applications, the distribution of $\varepsilon$ cannot be perfectly known. Meister [17] studied the effect of misspecification of the error density on the MISE of the deconvolution estimator and pointed out that the limit of the MISE can be infinite in some cases. Sometimes, this problem can be solved by repeated observations of the same variable of interest, each time with an independent error; see, for example, Li and Vuong [18], Delaigle et al. [19], or Neumann [20] and the references therein. However, there are also many application fields where it is not possible to take repeated measurements of the same variable. In that case, the information about the error distribution can be drawn from an additional experiment; this means that knowledge of $f_\varepsilon$ can be replaced by observations $\varepsilon_1, \ldots, \varepsilon_m$, a noise sample with distribution $f_\varepsilon$ that is independent of $Y_1, \ldots, Y_n$. Neumann [21] and Kerkyacharian et al. [22] replaced the characteristic function of the error by its empirical version. Johannes [23] studied density deconvolution with unknown (but observed) noise and showed that the proposed estimators are asymptotically optimal in a minimax sense over a Sobolev space. The resulting estimators depend on two bandwidth-type parameters, but the data-driven selection of these bandwidths was not addressed. Recently, Comte and Lacour [24] proposed an adaptive estimator of the density based on a data-driven model selection strategy and discussed the convergence rates of the estimator. In addition, they studied the link between $m$ and $n$ that is required if one wants to preserve the rate found in the case where the distribution of $\varepsilon$ is known.

In the present paper, we study model (1), completed with a known or unknown distribution of the error $\varepsilon$, and construct self-consistent estimators of the density $f_X$ by searching for the optimal kernel when the error distribution is ordinary smooth or super smooth. The method of constructing a density estimator by means of the optimal kernel was first proposed by Watson and Leadbetter [25]. But the result cannot be used directly, since the Fourier transform of the true density function is unknown. Bernacchia and Pigolotti [26] derived the exact expression of the density estimator based on the optimal kernel and defined it as the self-consistent estimate. They showed that the self-consistent method performs better than existing methods for all the examples that they studied. But the data in their paper did not include measurement errors. The self-consistent method has several desirable features: the choice of the bandwidth-type parameter is more convenient than for kernel or spline methods, so the computing speed can be improved significantly; the resulting estimators are consistent; and the method is well suited to applications, especially to large datasets. Thus, it is of great significance to extend the self-consistent estimation method to the more general errors-in-variables case.

The paper is organized as follows. In Section 2, we propose the self-consistent method for density estimation with a known distribution of $\varepsilon$, or with a noise sample when $f_\varepsilon$ is unknown, and give the asymptotic properties. In Section 3, some simulations are carried out to illustrate the efficacy of the proposed method. Two real data examples are used for illustration in Section 4. The proofs of the main results are included in the Appendix.

2. Methodology and the Main Results

2.1. Error Distribution Is Known

For the sake of descriptive convenience, we first introduce some notation. For two real numbers $a$ and $b$, we denote $a \vee b = \max(a, b)$ and $a \wedge b = \min(a, b)$. Let $z$ be a complex number, let $\bar{z}$ denote its conjugate, and let $|z|$ denote its modulus. Let $\|g\|$ be the $L_2$-norm of a function $g$; that is, $\|g\|^{2} = \int |g(x)|^{2}\,dx$. The Fourier transform of $g$ is defined by
$$\phi_g(t) = \int e^{itx}\,g(x)\,dx. \quad (2)$$
Similarly, we denote by $\phi_\varepsilon$ the characteristic function corresponding to the known density function $f_\varepsilon$ of $\varepsilon$.

The smoothness of $f_\varepsilon$ is described by the following assumption. There exist some positive constants $d_0$, $d_1$, $\beta$, and $\gamma$ and constants $\beta_0$ and $\beta_1$ ($\beta_0 = \beta_1 = -\beta$ if $\gamma = \infty$, and $\beta_0 = \beta_1 = 0$ otherwise), such that the characteristic function of the error distribution satisfies
$$d_0\,|t|^{\beta_0}\exp\big(-|t|^{\beta}/\gamma\big) \le \big|\phi_\varepsilon(t)\big| \le d_1\,|t|^{\beta_1}\exp\big(-|t|^{\beta}/\gamma\big) \quad \text{as } |t| \to \infty. \quad (3)$$
When $\gamma < \infty$, we call the distribution of $\varepsilon$ super smooth of order $\beta$. For example, the standard normal and Cauchy distributions are super smooth with $\beta = 2$ and $1$, respectively. When $\gamma = \infty$, the exponential factor disappears and we call the distribution of $\varepsilon$ ordinary smooth of order $\beta$. For example, the gamma distribution (with shape parameter $\alpha$) and the double exponential distribution are ordinary smooth with $\beta = \alpha$ and $2$, respectively.
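As a worked check of the ordinary smooth case, take the double exponential (Laplace) density $f_\varepsilon(x) = \frac{1}{2b}e^{-|x|/b}$ (the scale parametrization is our illustrative choice). Then
$$\phi_\varepsilon(t) = \int_{-\infty}^{\infty} e^{itx}\,\frac{1}{2b}\,e^{-|x|/b}\,dx = \frac{1}{1 + b^{2}t^{2}}, \qquad \frac{1}{2b^{2}}\,|t|^{-2} \le \big|\phi_\varepsilon(t)\big| \le \frac{1}{b^{2}}\,|t|^{-2} \quad \text{for } |t| \ge 1/b,$$
so (3) holds with $\gamma = \infty$, $\beta = 2$, $d_0 = 1/(2b^{2})$, and $d_1 = 1/b^{2}$; that is, the double exponential distribution is ordinary smooth of order 2, as claimed.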

Assume that the density function $f_X$ of $X$ belongs to the following type of smoothness space:
$$\mathcal{F}(a, r, b, L) = \left\{f : \int \big|\phi_f(t)\big|^{2}\,(1+|t|)^{2b}\exp\big(2a|t|^{r}\big)\,dt \le 2\pi L\right\}, \quad (4)$$
where $a \ge 0$, $0 < r \le 2$, $b \ge 0$, $L > 0$, and $b > 1/2$ if $a = 0$. If $a > 0$, the density function is called super smooth, and ordinary smooth otherwise.

Let $f_Y$ denote the common density of the $Y_j$'s. By (1) and the independence assumption between $X$ and $\varepsilon$, we have
$$f_Y = f_X * f_\varepsilon, \qquad \phi_Y(t) = \phi_X(t)\,\phi_\varepsilon(t), \quad (5)$$
where $*$ denotes the convolution and $\phi_X$ denotes the characteristic function of $X$. Let $\hat{f}_Y$ be a kernel estimator of $f_Y$ with the kernel function $K$. Then, the Fourier transform of $\hat{f}_Y$ is defined by $\phi_K(t)\hat{\phi}_Y(t)$, where
$$\hat{\phi}_Y(t) = \frac{1}{n}\sum_{j=1}^{n} e^{itY_j} \quad (6)$$
is the empirical characteristic function of $Y_1, \ldots, Y_n$. Therefore, under the classical assumption $\phi_\varepsilon(t) \neq 0$ for any $t \in \mathbb{R}$, we can obtain an estimator of $\phi_X$ as follows:
$$\hat{\phi}_X(t) = \frac{\phi_K(t)\,\hat{\phi}_Y(t)}{\phi_\varepsilon(t)}. \quad (7)$$
By the inverse Fourier transform, we have
$$\hat{f}_X(x) = \frac{1}{2\pi}\int e^{-itx}\,\frac{\phi_K(t)\,\hat{\phi}_Y(t)}{\phi_\varepsilon(t)}\,dt. \quad (8)$$
Note that (8) does not depend on any bandwidth $h$, in contrast to the deconvolution kernel density estimation methods in the literature. Instead of fixing an arbitrary shape for the kernel and choosing an optimal bandwidth (see [1]), we rather look for an optimal shape of the kernel such that the estimator minimizes the mean integrated square error
$$\mathrm{MISE}\big(\hat{f}_X\big) = E\int\big(\hat{f}_X(x) - f_X(x)\big)^{2}\,dx. \quad (9)$$
By means of Parseval's theorem and (7), we have
$$\mathrm{MISE}\big(\hat{f}_X\big) = \frac{1}{2\pi}\int E\left|\frac{\phi_K(t)\,\hat{\phi}_Y(t)}{\phi_\varepsilon(t)} - \phi_X(t)\right|^{2} dt. \quad (10)$$
Note that $E\hat{\phi}_Y(t) = \phi_Y(t)$ and $E|\hat{\phi}_Y(t)|^{2} = 1/n + (1 - 1/n)|\phi_Y(t)|^{2}$, and the MISE can be rewritten as
$$\mathrm{MISE}\big(\hat{f}_X\big) = \frac{1}{2\pi}\int\left[\frac{|\phi_K(t)|^{2}\big(1 - |\phi_Y(t)|^{2}\big)}{n\,|\phi_\varepsilon(t)|^{2}} + \big|\phi_K(t) - 1\big|^{2}\,|\phi_X(t)|^{2}\right] dt. \quad (11)$$
Minimizing the integrand pointwise in $\phi_K(t)$, we get the optimal kernel, which in Fourier space reads
$$\phi_K^{\mathrm{opt}}(t) = \frac{n\,|\phi_Y(t)|^{2}}{1 + (n-1)\,|\phi_Y(t)|^{2}}. \quad (12)$$
Replacing $\phi_K$ in (7) by the optimal kernel (12), we have
$$\hat{\phi}_X(t) = \frac{n\,|\phi_Y(t)|^{2}}{1 + (n-1)\,|\phi_Y(t)|^{2}}\cdot\frac{\hat{\phi}_Y(t)}{\phi_\varepsilon(t)}. \quad (13)$$
Since (13) still depends on the unknown $\phi_Y$, similar to the discussion of Bernacchia and Pigolotti [26], the self-consistent estimator of $\phi_X$ replaces $\phi_Y$ in (13) by $\hat{\phi}_{sc}\,\phi_\varepsilon$, which is equivalent to solving the equation
$$\hat{\phi}_{sc}(t) = \frac{n\,|\hat{\phi}_{sc}(t)\,\phi_\varepsilon(t)|^{2}}{1 + (n-1)\,|\hat{\phi}_{sc}(t)\,\phi_\varepsilon(t)|^{2}}\cdot\frac{\hat{\phi}_Y(t)}{\phi_\varepsilon(t)}. \quad (14)$$
For each fixed $t$, (14) is a quadratic equation in $|\hat{\phi}_{sc}(t)\,\phi_\varepsilon(t)|$; taking the larger root where a real root exists, a simple calculation yields
$$\hat{\phi}_{sc}(t) = \frac{n\,\hat{\phi}_Y(t)}{2(n-1)\,\phi_\varepsilon(t)}\left[1 + \sqrt{1 - \frac{4(n-1)}{n^{2}\,|\hat{\phi}_Y(t)|^{2}}}\,\right] I_A(t), \quad (15)$$
where $I_A$ is the indicator function ($I_A(t) = 1$ if $t \in A$; $I_A(t) = 0$ if $t \notin A$) and $A = \{t : |\hat{\phi}_Y(t)|^{2} \ge 4(n-1)/n^{2}\}$ is the set of accepted frequencies (i.e., the frequencies giving a nonzero contribution to the estimate). In practical applications, when the sample has been observed, we can choose the bounded set of accepted frequencies
$$A_n = A \cap [-T, T], \quad (16)$$
where $T$ is a truncation parameter and can be chosen as follows.

Step 1. Give an initial value $T_0$. In our simulation, we take the initial value $T_0 = Y_{(n)} - Y_{(1)}$, where $Y_{(1)}$ and $Y_{(n)}$ are the minimum and maximum sample quantiles of $Y_1, \ldots, Y_n$.

Step 2. Set $T = T_0$ if the inequality condition $|\hat{\phi}_Y(t)|^{2} \ge 4(n-1)/n^{2}$ holds in one half of the interval $[-T_0, T_0]$. Otherwise, set $T_0 = T_0 + \delta$, where $\delta$ is a positive constant (a small step size in our simulation).

Step 3. Repeat Step 2 until $T_0$ satisfies the inequality condition.
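For concreteness, Steps 1–3 can be sketched in a few lines of Python. The helper names, the grid resolution, the step size `delta`, and our reading of the stopping rule (stop once the acceptance inequality holds on at most one half of the grid over $[-T, T]$) are assumptions for illustration, not the paper's exact specification:

```python
import numpy as np

def empirical_cf(y, t):
    """Empirical characteristic function of the sample y on the grid t."""
    return np.exp(1j * np.outer(t, y)).mean(axis=1)

def choose_T(y, delta=0.1, n_grid=512, max_iter=1000):
    """Iterative choice of the truncation parameter T (Steps 1-3)."""
    n = len(y)
    T = float(y.max() - y.min())            # Step 1: initial value T0
    threshold = 4.0 * (n - 1) / n ** 2      # acceptance level for |phi_Y_hat|^2
    for _ in range(max_iter):
        t = np.linspace(-T, T, n_grid)
        accepted = np.abs(empirical_cf(y, t)) ** 2 >= threshold
        if accepted.mean() <= 0.5:          # Step 3: T satisfies the condition
            return T
        T += delta                          # Step 2: enlarge T and try again
    return T
```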

From the simulation studies in Section 3, we can see that the choice of $T$ is convenient and fast, and the density estimator is not sensitive to the choice of $T$.

By (15) and the inverse Fourier transform, the self-consistent estimate of $f_X$ is defined by
$$\hat{f}_{sc}(x) = \frac{1}{2\pi}\int_{A_n} e^{-itx}\,\hat{\phi}_{sc}(t)\,dt. \quad (17)$$
Let
$$K_n(u) = \frac{1}{2\pi}\int_{A_n} e^{-itu}\,\frac{n}{2(n-1)\,\phi_\varepsilon(t)}\left[1 + \sqrt{1 - \frac{4(n-1)}{n^{2}\,|\hat{\phi}_Y(t)|^{2}}}\,\right] dt. \quad (18)$$
Then, $\hat{f}_{sc}$ can be rewritten in the kernel form
$$\hat{f}_{sc}(x) = \frac{1}{n}\sum_{j=1}^{n} K_n(x - Y_j). \quad (19)$$
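The whole construction (6)–(19) can be sketched as follows; the frequency grid, its resolution, and the discretized inverse Fourier transform are our choices, and `phi_eps` stands for the known characteristic function $\phi_\varepsilon$ passed in as a callable:

```python
import numpy as np

def sc_density_estimate(y, phi_eps, T, x_grid, n_freq=1024):
    """Self-consistent deconvolution estimate (15)-(17), sketched on a grid."""
    n = len(y)
    t = np.linspace(-T, T, n_freq)
    # empirical characteristic function of Y on the frequency grid, as in (6)
    phi_y = np.exp(1j * np.outer(t, y)).mean(axis=1)
    # set A of accepted frequencies: |phi_Y_hat|^2 >= 4(n-1)/n^2
    accepted = np.abs(phi_y) ** 2 >= 4.0 * (n - 1) / n ** 2
    # the square root in (15); clip guards against tiny negative arguments
    root = np.sqrt(np.clip(1.0 - 4.0 * (n - 1) / (n ** 2 * np.abs(phi_y) ** 2),
                           0.0, None))
    phi_sc = np.where(accepted,
                      n * phi_y * (1.0 + root) / (2.0 * (n - 1) * phi_eps(t)),
                      0.0)
    # inverse Fourier transform (17): f(x) = (1/2pi) int e^{-itx} phi_sc(t) dt
    dt = t[1] - t[0]
    return (np.exp(-1j * np.outer(x_grid, t)) @ phi_sc).real * dt / (2.0 * np.pi)
```

For double exponential errors with scale $b$, for instance, one would pass `phi_eps = lambda t: 1.0 / (1.0 + b**2 * t**2)`.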

We state the asymptotic behavior of the self-consistent estimator in the following theorem.

Theorem 1. In addition to the regularity conditions (A1)–(A3) listed in the Appendix, assume that either of the following conditions holds:
(a) the error distribution is super smooth and $T^{1-\beta}\exp\big(2T^{\beta}/\gamma\big)/n \to 0$;
(b) the error distribution is ordinary smooth and $T^{2\beta+1}/n \to 0$,
where $T = T_n$ is the truncation parameter in (16).
Then the self-consistent density estimator $\hat{f}_{sc}$ defined by (17) is consistent; that is, $E\|\hat{f}_{sc} - f_X\|^{2} \to 0$ as $n \to \infty$.

Remark 2. As with many other methods, for example, higher-order kernel estimators, spline estimators, wavelet estimators, orthogonal expansion estimators, and so forth, the resulting self-consistent estimators are not necessarily nonnegative, but this can be corrected without any accuracy cost by translating the estimate downwards until the positive part is normalized to 1 and setting the negative part to 0. For example, the modified estimator is defined by $\tilde{f}(x) = \max\{\hat{f}_{sc}(x) - \xi, 0\}$, where $\xi$ is chosen in such a way that $\int \tilde{f}(x)\,dx = 1$; see more details in Efromovich [27], Glad et al. [28], and so on.
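A minimal sketch of this correction, assuming the estimate is given on an equally spaced grid and its raw integral equals one (so that a shift $\xi \ge 0$ exists); the bisection root-finder is our choice:

```python
import numpy as np

def nonneg_correction(f_hat, dx, tol=1e-10, max_iter=200):
    """Shift the estimate down by xi and truncate at zero (Remark 2),
    choosing xi so that the corrected estimate integrates to one."""
    def mass(xi):
        # integral of the shifted positive part on the grid
        return np.maximum(f_hat - xi, 0.0).sum() * dx

    lo, hi = 0.0, float(f_hat.max())    # mass(lo) >= 1 and mass(hi) = 0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if mass(mid) > 1.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return np.maximum(f_hat - 0.5 * (lo + hi), 0.0)
```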

Theorem 3. Assume that conditions (A1)–(A3) hold and $f_X \in \mathcal{F}(0, r, b, L)$ with $b > 1/2$.
(a) When the error distribution is super smooth and $T = \big(\frac{\gamma}{4}\log n\big)^{1/\beta}$, one has
$$\mathrm{MISE}\big(\hat{f}_{sc}\big) = O\big((\log n)^{-2b/\beta}\big). \quad (20)$$
(b) When the error distribution is ordinary smooth and $T = n^{1/(2\beta+2b+1)}$, one has
$$\mathrm{MISE}\big(\hat{f}_{sc}\big) = O\big(n^{-2b/(2\beta+2b+1)}\big). \quad (21)$$

2.2. Error Distribution Is Unknown

Suppose that, in addition to the sample $Y_1, \ldots, Y_n$ from model (1), we observe a noise sample $\varepsilon_1, \ldots, \varepsilon_m$ from the unknown distribution $f_\varepsilon$. We assume that the information about $\varepsilon$ is obtained by preliminary calibration measurements before the main experiment or is provided by an independent experiment.

When the characteristic function $\phi_\varepsilon$ in (15) is unknown, we cannot estimate the density function using (17) directly. Thus, we need to estimate the characteristic function $\phi_\varepsilon$. Here, we can estimate $\phi_\varepsilon$ based on the preliminary noise sample as follows:
$$\hat{\phi}_\varepsilon(t) = \frac{1}{m}\sum_{k=1}^{m} e^{it\varepsilon_k}. \quad (22)$$
Similar to Neumann [21], we introduce the following truncated estimator:
$$\frac{1}{\tilde{\phi}_\varepsilon(t)} = \frac{I\big\{|\hat{\phi}_\varepsilon(t)| \ge m^{-1/2}\big\}}{\hat{\phi}_\varepsilon(t)}. \quad (23)$$
Then we can obtain the self-consistent estimator of the density function
$$\tilde{f}_{sc}(x) = \frac{1}{2\pi}\int_{\tilde{A}_n} e^{-itx}\,\frac{n\,\hat{\phi}_Y(t)}{2(n-1)\,\tilde{\phi}_\varepsilon(t)}\left[1 + \sqrt{1 - \frac{4(n-1)}{n^{2}\,|\hat{\phi}_Y(t)|^{2}}}\,\right] dt, \quad (24)$$
where $\tilde{A}_n = A_n \cap \big\{t : |\hat{\phi}_\varepsilon(t)| \ge m^{-1/2}\big\}$.
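A sketch of (22)–(23) in the same setting; replacing the factor $1/\phi_\varepsilon(t)$ in the known-error sketch of Section 2.1 with the truncated reciprocal below yields the estimator (24):

```python
import numpy as np

def truncated_inv_cf(eps_sample, t):
    """Truncated reciprocal of the empirical error characteristic function:
    keep 1/phi_eps_hat(t) only where |phi_eps_hat(t)| >= m^(-1/2),
    and set it to zero elsewhere, in the spirit of (23)."""
    m = len(eps_sample)
    phi_eps_hat = np.exp(1j * np.outer(t, eps_sample)).mean(axis=1)  # as in (22)
    keep = np.abs(phi_eps_hat) >= m ** (-0.5)
    inv = np.zeros_like(phi_eps_hat)
    inv[keep] = 1.0 / phi_eps_hat[keep]
    return inv
```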

Theorem 4. In addition to the regularity conditions (A1)–(A3) listed in the Appendix, assume that either of the following conditions holds:
(a) the error distribution is super smooth, $T^{1-\beta}\exp\big(2T^{\beta}/\gamma\big)/n \to 0$, and $m \ge n$;
(b) the error distribution is super smooth, $T^{1-\beta}\exp\big(2T^{\beta}/\gamma\big)/n \to 0$, and $m \to \infty$ as $n \to \infty$;
(c) the error distribution is ordinary smooth, $T^{2\beta+1}/n \to 0$, and $m \ge n$;
(d) the error distribution is ordinary smooth, $T^{2\beta+1}/n \to 0$, and $m \to \infty$ as $n \to \infty$.
Then the self-consistent density estimator $\tilde{f}_{sc}$ defined by (24) is consistent; that is, $E\|\tilde{f}_{sc} - f_X\|^{2} \to 0$ as $n \to \infty$.

Remark 5. In the cases (b) and (d), from the proof of Theorem 6 in the Appendix, one can check that the estimator defined by (24) is consistent as long as the noise sample size $m$ tends to infinity as $n$ increases, while the rate at which $m$ tends to infinity is not restricted. Therefore, in practice, the noise sample size does not need to be very large in these cases.

Theorem 6. Assume that conditions (A1)–(A3) hold and $f_X \in \mathcal{F}(0, r, b, L)$ with $b > 1/2$.
(a) When the error distribution is super smooth, $T = \big(\frac{\gamma}{4}\log n\big)^{1/\beta}$, and $m \ge n$, one has
$$\mathrm{MISE}\big(\tilde{f}_{sc}\big) = O\big((\log n)^{-2b/\beta}\big). \quad (25)$$
(b) When the error distribution is super smooth, $T = \big(\frac{\gamma}{4}\log m\big)^{1/\beta}$, and $m < n$ with $m \to \infty$, one has
$$\mathrm{MISE}\big(\tilde{f}_{sc}\big) = O\big((\log m)^{-2b/\beta}\big). \quad (26)$$
(c) When the error distribution is ordinary smooth, $T = n^{1/(2\beta+2b+1)}$, and $m \ge n$, one has
$$\mathrm{MISE}\big(\tilde{f}_{sc}\big) = O\big(n^{-2b/(2\beta+2b+1)}\big). \quad (27)$$

3. Simulation Studies

In this section, we report simulation studies that illustrate the finite-sample performance of the proposed self-consistent (SC) estimation method and compare it with the kernel (KN) method. For comparison, we compute the estimators for different signal densities and different types of noise. The following densities are considered:
(a) gamma distribution with shape parameter $\alpha$ and scale parameter $\theta$;
(b) standard normal distribution $N(0, 1)$.

Two kinds of error distributions are considered to study their effects on the MISE of the estimators:
(c) double exponential error: the density of $\varepsilon$ is given by
$$f_\varepsilon(x) = \frac{1}{2\sigma}\exp\left(-\frac{|x|}{\sigma}\right);$$
(d) normal error: the density of $\varepsilon$ is given by
$$f_\varepsilon(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{x^{2}}{2\sigma^{2}}\right).$$

For the kernel method, we choose the Gaussian kernel for the double exponential error, and for the normal error we suppose that the kernel has a Fourier transform $\phi_K(t) = (1 - t^{2})^{3}\,I_{\{|t| \le 1\}}$. For the unknown error distributions, we suppose that the noise sample comes from the above two error distributions. Throughout the simulations, the sample size $n$ and the noise sample size $m$ are set at the levels reported in the tables. The MISE of the estimators is computed from 1000 random samples, and the results are reported in Tables 1 and 2. When $f_\varepsilon$ is known or unknown, the gamma density estimator under the double exponential noise and the normal density estimator under the normal noise are shown in Figures 1 and 2, respectively.
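One cell of such a study can be sketched as follows, reusing `choose_T` and `sc_density_estimate` from Section 2.1; the gamma(2, 1) signal, the Laplace scale 0.5, the grid, and the seed are our illustrative choices, not the paper's exact settings:

```python
import numpy as np

rng = np.random.default_rng(0)
x_grid = np.linspace(-2.0, 10.0, 400)
dx = x_grid[1] - x_grid[0]
true_f = np.where(x_grid > 0, x_grid * np.exp(-x_grid), 0.0)  # gamma(2, 1) density

def laplace_cf(t, b=0.5):
    """Characteristic function of the double exponential error."""
    return 1.0 / (1.0 + b ** 2 * t ** 2)

n_rep, n = 1000, 200
mise = 0.0
for _ in range(n_rep):
    x = rng.gamma(shape=2.0, scale=1.0, size=n)
    y = x + rng.laplace(loc=0.0, scale=0.5, size=n)     # contaminated sample
    T = choose_T(y)
    f_hat = sc_density_estimate(y, laplace_cf, T, x_grid)
    mise += ((f_hat - true_f) ** 2).sum() * dx          # discretized ISE
print("estimated MISE:", mise / n_rep)
```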

Tables 1 and 2 indicate the following simulation results.
(1) When the error distribution is unknown, the estimation of the characteristic function of the error does not spoil the procedure much. It even happens that the estimation with unknown error distribution works better, which is probably due to the truncation (23).
(2) For the same observation sample, the self-consistent method performs slightly better than the kernel method. It is worth mentioning that the self-consistent method can save more time than the kernel method, because the truncation parameter $T$ can be chosen conveniently.
(3) The MISE obtained by the self-consistent and kernel methods decreases as $n$ increases, whether the error distribution is known or unknown.

Figures 1 and 2 clearly show that the self-consistent estimators perform well, no matter whether the error distribution is known or unknown. Moreover, the results show that our method is still very satisfactory when the noise sample size $m$ is small.

Next, we evaluate the sensitivity of the self-consistent density estimation procedure to the choice of the truncation parameter $T$. In this simulation, we only consider the case where the error distribution is double exponential, and the value of $T$ is fixed at $T_0$ and at values slightly below and above $T_0$, where $T_0$ is obtained by the iterative algorithm given in Section 2.1. The MISE of the self-consistent estimator is reported in Table 3. From Table 3 we can see that, for a given $n$, the performance of the proposed method does not depend sensitively on the choice of the value of $T$.
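The sensitivity experiment can be reproduced along the same lines, continuing the sketch above; the multipliers 0.8 and 1.2 are our stand-ins for the perturbed values of $T$ examined in Table 3:

```python
# Rerun the Monte Carlo with the truncation parameter fixed at c * T0.
for c in (0.8, 1.0, 1.2):
    mise_c = 0.0
    for _ in range(200):                                # fewer replications for speed
        x = rng.gamma(shape=2.0, scale=1.0, size=n)
        y = x + rng.laplace(loc=0.0, scale=0.5, size=n)
        f_hat = sc_density_estimate(y, laplace_cf, c * choose_T(y), x_grid)
        mise_c += ((f_hat - true_f) ** 2).sum() * dx
    print(f"T = {c:.1f} * T0: MISE ~ {mise_c / 200:.4f}")
```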

4. Application to Real Data

We now use two real datasets to illustrate the proposed method. The first dataset concerns the direction chosen by an ant in response to an evenly illuminated black target. The experiment with 100 ants was described by Fisher [29], who concluded that the ants tend to run toward the target with some moderate variation. For this particular example, $Y$, the point where an ant intersects a circle, can be treated as an indirect observation of the direction $X$ chosen by the ant. The estimation method of (17) is used to analyze the dataset. In Figure 3, the solid line denotes the estimate based on the assumption that the data can be accurately measured (i.e., $\sigma = 0$), and the dashed and dot-dashed lines denote the cases of measurement errors with a smaller and a larger standard deviation $\sigma$, respectively. We compare our results with Efromovich [30], who already used the orthogonal series approach to estimate this density.

From Figure 3, we find that when the data are accurately measured ($\sigma = 0$), the corresponding density estimate is rather smooth and has a pronounced mode (the direction of the target) with a relatively large deviation. Interestingly, when the standard deviation $\sigma$ increases, the density estimate becomes more peaked and two additional local modes become more significant. Therefore, the assumption of measurement error reveals the presence of three distinct groups (strata) of ants. A majority clearly tends to move toward the target with a rather small variation, and a minority tends to move in two other directions about the target, also with small variation. These findings basically agree with those discovered by Efromovich [30]. But the two directions are not symmetric here, and the variation on the right side is slightly less than that on the left side.

The second example involves estimating the density of the magnitudes of Alaskan earthquakes for the period from 1969 to 1978. The data come from the National Oceanic and Atmospheric Administration's Hypocenter Data File [31]. In this dataset, $Y$ denotes the logarithm of the seismogram amplitude of longitudinal body waves for 62 Alaskan earthquakes. There is a measurement error associated with the observations, which includes errors made in determining the amplitude of ground motion arising from such things as the orientation of a limited number of observation stations relative to the fault plane of the earthquake. In Figure 4, the solid line denotes the estimate based on the assumption that the body waves are accurately measured (i.e., error-free), and the dashed and dot-dashed lines denote the cases of Laplace measurement errors with position parameter $0$ and a smaller and a larger scale parameter, respectively.

Figure 4 clearly shows the nature of the indirect data. Under the assumption that the data are accurately measured, the corresponding density estimate is smooth and has two pronounced modes. When the Laplace scale parameter is positive, the distinctions between the density estimates become more noticeable: the estimated distribution becomes unimodal as the scale parameter increases. It is clear that accounting for the measurement error allows one to gain additional insight into the data.

Appendix

Proofs of Theorems

We begin this section by listing the regularity conditions needed in the proofs of all the theorems:
(A1) $f_X$ is square integrable and its Fourier transform $\phi_X$ is integrable;
(A2) $\phi_\varepsilon(t) \neq 0$ for any $t \in \mathbb{R}$;
(A3) $T = T_n \to \infty$ as $n \to \infty$.

In the following text, we use $C$ to represent a generic positive constant, which may take different values at different places. The proofs of the theorems rely on the following lemma.

Lemma A.1. Let $\tilde{\phi}_\varepsilon$ be the truncated estimator defined by (23); then there is a positive constant $C$ such that
$$E\left|\frac{1}{\tilde{\phi}_\varepsilon(t)} - \frac{1}{\phi_\varepsilon(t)}\right|^{2} \le C\,\min\left\{\frac{1}{m\,\big|\phi_\varepsilon(t)\big|^{4}},\;\frac{1}{\big|\phi_\varepsilon(t)\big|^{2}}\right\}.$$

Proof. Lemma A.1 is from Lemma 2.1 in Neumann [21]; the proof is omitted.

Proof of Theorem 1. By (15), note that $\hat{\phi}_{sc}(t) = 0$ for $t \notin A_n$ and $\big|1 + \sqrt{1 - 4(n-1)/(n^{2}|\hat{\phi}_Y(t)|^{2})}\big| \le 2$ for $t \in A_n$. Then, by Parseval's theorem, we have
$$E\big\|\hat{f}_{sc} - f_X\big\|^{2} = \frac{1}{2\pi}\,E\int\big|\hat{\phi}_{sc}(t) - \phi_X(t)\big|^{2}\,dt \le \frac{1}{2\pi}\,E\int_{A_n^{c}}\big|\phi_X(t)\big|^{2}\,dt + Q_1 + Q_2, \quad (\mathrm{A.1})$$
where
$$Q_1 = \frac{1}{\pi}\,E\int_{A_n}\left|\frac{\hat{\phi}_Y(t)}{\phi_\varepsilon(t)} - \phi_X(t)\right|^{2} dt, \qquad Q_2 = \frac{1}{\pi}\,E\int_{A_n}\left|\hat{\phi}_{sc}(t) - \frac{\hat{\phi}_Y(t)}{\phi_\varepsilon(t)}\right|^{2} dt. \quad (\mathrm{A.2})$$
By conditions (A1) and (A3), as $n \to \infty$, it is easy to show that $E\int_{A_n^{c}}|\phi_X(t)|^{2}\,dt \to 0$. To finish the proof of Theorem 1, we need to show that $Q_1 \to 0$ and $Q_2 \to 0$.
We first consider the case that the error distribution is super smooth ($\gamma < \infty$). For $Q_1$, by (3), we have
$$Q_1 \le \frac{1}{\pi}\int_{-T}^{T}\frac{E\big|\hat{\phi}_Y(t) - \phi_Y(t)\big|^{2}}{\big|\phi_\varepsilon(t)\big|^{2}}\,dt \le \frac{C}{n}\int_{-T}^{T}\exp\big(2|t|^{\beta}/\gamma\big)\,dt. \quad (\mathrm{A.3})$$
By the Cauchy–Schwarz inequality and the Jensen inequality, it is easy to verify that $E|\hat{\phi}_Y(t) - \phi_Y(t)|^{2} \le 1/n$. By Lemma 2 in Butucea and Tsybakov [5] and a simple calculation, we have
$$Q_1 \le \frac{C\,T^{1-\beta}\exp\big(2T^{\beta}/\gamma\big)}{n}, \quad (\mathrm{A.4})$$
where $C$ is a positive constant depending only on $d_0$, $\beta$, and $\gamma$. Hence, under condition (a), we have shown that $Q_1 \to 0$.
Now we consider $Q_2$. Let $x(t) = 4(n-1)/(n^{2}|\hat{\phi}_Y(t)|^{2})$ and $w(t) = \frac{n}{2(n-1)}\big(1 + \sqrt{1 - x(t)}\big)$, so that $\hat{\phi}_{sc}(t) = w(t)\,\hat{\phi}_Y(t)/\phi_\varepsilon(t)$ on $A_n$; we have
$$Q_2 = \frac{1}{\pi}\,E\int_{A_n}\big|w(t) - 1\big|^{2}\,\frac{\big|\hat{\phi}_Y(t)\big|^{2}}{\big|\phi_\varepsilon(t)\big|^{2}}\,dt. \quad (\mathrm{A.5})$$
By (3) and the inequality $1 - \sqrt{1 - x} \le x$ for $0 \le x \le 1$, we have
$$\big|w(t) - 1\big|^{2}\,\big|\hat{\phi}_Y(t)\big|^{2} \le \frac{C}{n} \quad \text{for } t \in A_n. \quad (\mathrm{A.6})$$
As $Q_2 \le \frac{C}{\pi n}\int_{-T}^{T}|\phi_\varepsilon(t)|^{-2}\,dt$, similar to the proof of (A.3) and (A.4), it is easy to show that $Q_2 \to 0$. Thus, we finish the proof of the super smooth case in Theorem 1.
Next we consider the case that the error distribution is ordinary smooth ($\gamma = \infty$). By (3), we can derive that
$$\int_{-T}^{T}\big|\phi_\varepsilon(t)\big|^{-2}\,dt \le C\int_{-T}^{T}\big(1 + |t|^{2\beta}\big)\,dt \le C\,T^{2\beta+1}. \quad (\mathrm{A.7})$$
Again using the Cauchy–Schwarz inequality and the Jensen inequality, we have
$$Q_1 \le \frac{C\,T^{2\beta+1}}{n}, \quad (\mathrm{A.8})$$
which implies that $Q_1 \to 0$ if $T^{2\beta+1}/n \to 0$. Hence, we obtain that $Q_1 \to 0$ under condition (b).
For $Q_2$, using the same arguments, we can obtain that
$$Q_2 \le \frac{C\,T^{2\beta+1}}{n}. \quad (\mathrm{A.9})$$
Similar to the proof of (A.8), we can show that $Q_2 \to 0$ as $T^{2\beta+1}/n \to 0$ and $n \to \infty$. This finishes the proof of Theorem 1.

Proof of Theorem 3. From (19), by Lemma 2 in Butucea and Tsybakov [5] and the Cauchy–Schwarz inequality, we have
$$\big|E\hat{f}_{sc}(x) - f_X(x)\big| \le \frac{1}{2\pi}\int_{|t|>T}\big|\phi_X(t)\big|\,dt + \frac{C}{n}\int_{-T}^{T}\big|\phi_\varepsilon(t)\big|^{-1}\,dt. \quad (\mathrm{A.10})$$
Now we compute the variance of $\hat{f}_{sc}(x)$ as follows:
$$\mathrm{Var}\big(\hat{f}_{sc}(x)\big) \le \frac{1}{n}\,E\,K_n^{2}(x - Y_1) \le \frac{C}{n}\left(\int_{-T}^{T}\big|\phi_\varepsilon(t)\big|^{-1}\,dt\right)^{2}. \quad (\mathrm{A.11})$$
When the error distribution is super smooth ($\gamma < \infty$), by (3), we have
$$\mathrm{Var}\big(\hat{f}_{sc}(x)\big) \le \frac{C\,T^{2(1-\beta)}\exp\big(2T^{\beta}/\gamma\big)}{n}. \quad (\mathrm{A.12})$$
If $T = \big(\frac{\gamma}{4}\log n\big)^{1/\beta}$, we can obtain that
$$\mathrm{Var}\big(\hat{f}_{sc}(x)\big) \le C\,n^{-1/2}\,(\log n)^{2(1-\beta)/\beta}. \quad (\mathrm{A.13})$$
Therefore, $\mathrm{Var}(\hat{f}_{sc}(x)) \to 0$ as $n \to \infty$.
When the error distribution is ordinary smooth ($\gamma = \infty$), by (3), we have
$$\mathrm{Var}\big(\hat{f}_{sc}(x)\big) \le \frac{C\,T^{2\beta+2}}{n}. \quad (\mathrm{A.14})$$
If $T = n^{1/(2\beta+2b+1)}$, we can obtain that $T^{2\beta+2}/n \to 0$ for $b > 1/2$. Therefore,
$$\mathrm{Var}\big(\hat{f}_{sc}(x)\big) = O\big(n^{(1-2b)/(2\beta+2b+1)}\big). \quad (\mathrm{A.15})$$
Next, we consider the MISE of the estimator $\hat{f}_{sc}$. By the Parseval theorem, we have
$$\mathrm{MISE}\big(\hat{f}_{sc}\big) = \frac{1}{2\pi}\,E\int\big|\hat{\phi}_{sc}(t) - \phi_X(t)\big|^{2}\,dt \le \frac{1}{2\pi}\int_{|t|>T}\big|\phi_X(t)\big|^{2}\,dt + D_n, \quad (\mathrm{A.16})$$
where $D_n$ denotes the contribution of the frequencies in $[-T, T]$. For the first term, by (4), we have
$$\int_{|t|>T}\big|\phi_X(t)\big|^{2}\,dt \le (1+T)^{-2b}\int\big|\phi_X(t)\big|^{2}\,(1+|t|)^{2b}\,dt \le 2\pi L\,(1+T)^{-2b}. \quad (\mathrm{A.17})$$
We now consider $D_n$. By some simple calculations, we can derive that
$$D_n \le \frac{C}{n}\int_{-T}^{T}\big|\phi_\varepsilon(t)\big|^{-2}\,dt. \quad (\mathrm{A.18})$$
Therefore, when the error distribution is super smooth, we have
$$\mathrm{MISE}\big(\hat{f}_{sc}\big) \le C\,(1+T)^{-2b} + \frac{C\,T^{1-\beta}\exp\big(2T^{\beta}/\gamma\big)}{n}. \quad (\mathrm{A.19})$$
Choosing $T = \big(\frac{\gamma}{4}\log n\big)^{1/\beta}$, we have
$$\mathrm{MISE}\big(\hat{f}_{sc}\big) = O\big((\log n)^{-2b/\beta}\big). \quad (\mathrm{A.20})$$
When the error distribution is ordinary smooth, we have
$$\mathrm{MISE}\big(\hat{f}_{sc}\big) \le C\,(1+T)^{-2b} + \frac{C\,T^{2\beta+1}}{n}. \quad (\mathrm{A.21})$$
Choosing $T = n^{1/(2\beta+2b+1)}$, we have
$$\mathrm{MISE}\big(\hat{f}_{sc}\big) = O\big(n^{-2b/(2\beta+2b+1)}\big). \quad (\mathrm{A.22})$$
The proof of Theorem 3 is completed.

Proof of Theorem 4. The proof of Theorem 4 can easily be derived from the proof of Theorem 6 (together with the dominated convergence argument indicated in Remark 5); we hence omit the details and only need to prove Theorem 6.

Proof of Theorem 6. By (24) and the Parseval theorem, we have
$$\mathrm{MISE}\big(\tilde{f}_{sc}\big) \le C\,(1+T)^{-2b} + \frac{C}{n}\int_{-T}^{T}\big|\phi_\varepsilon(t)\big|^{-2}\,dt + R_{n,m}, \qquad R_{n,m} = C\,E\int_{-T}^{T}\big|\hat{\phi}_Y(t)\big|^{2}\left|\frac{1}{\tilde{\phi}_\varepsilon(t)} - \frac{1}{\phi_\varepsilon(t)}\right|^{2} dt. \quad (\mathrm{A.23})$$
The first two terms are handled exactly as in the proof of Theorem 3; therefore, we only need to deal with $R_{n,m}$. Invoking Lemma A.1 and the independence of the noise sample from $Y_1, \ldots, Y_n$, it is easy to show that
$$R_{n,m} \le C\int_{-T}^{T}\Big(\big|\phi_Y(t)\big|^{2} + \frac{1}{n}\Big)\min\left\{\frac{1}{m\,\big|\phi_\varepsilon(t)\big|^{4}},\;\frac{1}{\big|\phi_\varepsilon(t)\big|^{2}}\right\} dt. \quad (\mathrm{A.24})$$
Then, when the error distribution is super smooth, by (4), we have
$$R_{n,m} \le \frac{C\exp\big(2T^{\beta}/\gamma\big)}{m} + \frac{C\,T^{1-\beta}\exp\big(4T^{\beta}/\gamma\big)}{nm}. \quad (\mathrm{A.25})$$
If $m \ge n$, it is easy to show that, with $T = \big(\frac{\gamma}{4}\log n\big)^{1/\beta}$,
$$R_{n,m} = o\big((\log n)^{-2b/\beta}\big). \quad (\mathrm{A.26})$$
Hence, by choosing $T = \big(\frac{\gamma}{4}\log n\big)^{1/\beta}$ and $m \ge n$, together with (A.24) and (A.26), we can derive that
$$\mathrm{MISE}\big(\tilde{f}_{sc}\big) = O\big((\log n)^{-2b/\beta}\big).$$
If $m < n$ and $m \to \infty$, we have, with $T = \big(\frac{\gamma}{4}\log m\big)^{1/\beta}$,
$$R_{n,m} = o\big((\log m)^{-2b/\beta}\big). \quad (\mathrm{A.27})$$
Hence, by choosing $T = \big(\frac{\gamma}{4}\log m\big)^{1/\beta}$, together with (A.24) and (A.27), we can derive that
$$\mathrm{MISE}\big(\tilde{f}_{sc}\big) = O\big((\log m)^{-2b/\beta}\big).$$
When the error distribution is ordinary smooth, we have
$$R_{n,m} \le \frac{C\,T^{2\beta}}{m} + \frac{C\,T^{4\beta+1}}{nm}. \quad (\mathrm{A.28})$$
Then, by choosing $T = n^{1/(2\beta+2b+1)}$ and $m \ge n$, together with (A.24) and (A.28), we can derive that
$$\mathrm{MISE}\big(\tilde{f}_{sc}\big) = O\big(n^{-2b/(2\beta+2b+1)}\big).$$
Thus, we finish the proof of Theorem 6.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

Junhua Zhang’s research was supported by the National Natural Science Foundation of China (11472057, 11002005), the Science and Technology Project of Beijing Municipal Education Commission (KM201511232001), and the Training Programme Foundation for the Beijing Municipal Excellent Talents (2013D005007000005). Yuping Hu’s research was supported by the National Statistical Science Research Program of China (2014LZ45). Sanying Feng’s research was supported by the National Natural Science Foundation of China (11101014, 11001118) and the Doctoral Fund of Innovation of Beijing University of Technology.