Abstract

We propose an automatic bandwidth selection for the recursive kernel estimators of a probability density function defined by the stochastic approximation algorithm introduced by Mokkadem et al. (2009a). We show that, using the selected bandwidth and the stepsize that minimizes the MISE (mean integrated squared error) of the class of recursive estimators defined in Mokkadem et al. (2009a), the recursive estimator outperforms the nonrecursive one in small sample settings, in terms of both estimation error and computational cost. We corroborate these theoretical results through a simulation study.

1. Introduction

The problem of automatically choosing the smoothing parameter has been widely studied. There are many reasons to prefer an automatic choice; one is that, in many situations, smoothing is carried out by nonexperts. In this paper we focus only on one-dimensional kernel density estimation, although the main ideas carry over to all types of nonparametric curve estimation, including regression, distribution, and time series estimation. The bandwidth selection methods studied in the literature can be divided into two broad classes: cross-validation techniques and plug-in ideas.

There are many varieties of cross-validation: pseudolikelihood cross-validation [1], least squares cross-validation [2], and biased cross-validation [3]. Reviews of these bandwidth selection methods can be found in Marron [4].

Plug-in methods [5], also called “second generation methods” [6], need a pilot bandwidth to estimate the unknown quantities. A number of approaches have been proposed for choosing this pilot bandwidth; see Jones et al. [7] for details and references. An interesting approach is the smoothed bootstrap [8]. In this paper, we develop a specific second generation bandwidth selection method for the recursive kernel estimators of a probability density function defined by the stochastic approximation algorithm introduced by Mokkadem et al. [9].

Let $X_1, \ldots, X_n$ be independent, identically distributed random variables, and let $f$ denote the probability density of $X_1$. To construct a stochastic algorithm which approximates the function $f$ at a given point $x$, Mokkadem et al. [9] define an algorithm of search of the zero of the function $y \mapsto f(x) - y$. Following Robbins-Monro's procedure, this algorithm is defined by setting an initial value $f_0(x)$ and, for all $n \geq 1$, $f_n(x) = f_{n-1}(x) + \gamma_n W_n(x)$, where $W_n(x)$ is an “observation” of the function at the point $f_{n-1}(x)$ and the stepsize $(\gamma_n)$ is a sequence of positive real numbers that goes to zero. To define $W_n(x)$, Mokkadem et al. [9] follow the approach of Révész [10, 11] and of Tsybakov [12] and introduce a kernel $K$ (i.e., a function satisfying $\int K(x)\,dx = 1$) and a bandwidth $(h_n)$ (i.e., a sequence of positive real numbers that goes to zero), and set $W_n(x) = h_n^{-1} K\!\left((x - X_n)/h_n\right) - f_{n-1}(x)$. The resulting estimator, which recursively estimates the density at the point $x$, can then be written as in (2). This estimator was introduced by Mokkadem et al. [9]; its large and moderate deviation principles were established by Slaoui [13].
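For concreteness, the following minimal sketch implements this recursion, assuming the update takes the form $f_n(x) = (1 - \gamma_n) f_{n-1}(x) + \gamma_n h_n^{-1} K\bigl((x - X_n)/h_n\bigr)$ with $f_0 = 0$; the Gaussian kernel and the choices $\gamma_n = n^{-1}$ and $h_n \propto n^{-1/5}$ are illustrative assumptions, not the data-driven choices derived later in the paper.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal kernel."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def recursive_kde(sample, grid, stepsizes, bandwidths, kernel=gaussian_kernel):
    """Recursive kernel density estimator evaluated on a grid.

    Implements the stochastic approximation update
        f_n(x) = (1 - gamma_n) f_{n-1}(x) + gamma_n * K((x - X_n)/h_n) / h_n,
    starting from f_0 = 0; `stepsizes` and `bandwidths` are the sequences
    (gamma_n) and (h_n), of the same length as `sample`.
    """
    f = np.zeros_like(grid, dtype=float)
    for x_n, gamma_n, h_n in zip(sample, stepsizes, bandwidths):
        f = (1.0 - gamma_n) * f + gamma_n * kernel((grid - x_n) / h_n) / h_n
    return f

# Illustrative use with gamma_n = 1/n and h_n = n^{-1/5}.
rng = np.random.default_rng(0)
X = rng.standard_normal(500)
k = np.arange(1, X.size + 1)
grid = np.linspace(-4.0, 4.0, 201)
f_hat = recursive_kde(X, grid, stepsizes=1.0 / k, bandwidths=k ** (-0.2))
```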

Throughout this paper, we suppose that the algorithm is initialized at $f_0(x) = 0$; then it follows from (2) that one can estimate $f$ recursively at the point $x$ by (3). Moreover, it was shown in Mokkadem et al. [9] that the bandwidth which minimizes the MISE of the recursive estimator depends on the choice of the stepsize $(\gamma_n)$; they show in particular, under some regularity conditions on $f$, which stepsizes belong to the admissible class and what the corresponding optimal bandwidth must equal. The first aim of this paper is to propose an automatic selection of such a bandwidth through a plug-in method, and the second aim is to give the conditions under which the recursive estimator outperforms the nonrecursive kernel density estimator introduced by Rosenblatt [14] (see also Parzen [15]) and defined in (5). The simulation results given in Section 3 corroborate these theoretical results. The remainder of the paper is organized as follows. In Section 2, we state our main results. Section 3 is devoted to our simulation results. We conclude the paper in Section 4.
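For comparison, Rosenblatt's nonrecursive estimator averages a scaled kernel over the whole sample, $\tilde f_n(x) = (n h_n)^{-1} \sum_{i=1}^{n} K\bigl((x - X_i)/h_n\bigr)$. A minimal sketch, reusing `gaussian_kernel`, `X`, and `grid` from the sketch above, with an illustrative bandwidth:

```python
def rosenblatt_kde(sample, grid, h, kernel=gaussian_kernel):
    """Nonrecursive Rosenblatt-Parzen estimator with a single bandwidth h."""
    # Average the scaled kernel over all observations for every grid point.
    u = (grid[:, None] - sample[None, :]) / h
    return kernel(u).mean(axis=1) / h

f_tilde = rosenblatt_kde(X, grid, h=X.size ** (-0.2))
```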

2. Assumptions and Main Results

We define the following class of regularly varying sequences.

Definition 1. Let $\gamma \in \mathbb{R}$ and let $(v_n)_{n \geq 1}$ be a nonrandom positive sequence. One says that $(v_n) \in \mathcal{GS}(\gamma)$ if
$$\lim_{n \to +\infty} n\left[1 - \frac{v_{n-1}}{v_n}\right] = \gamma. \tag{6}$$

Condition (6) was introduced by Galambos and Seneta [16] to define regularly varying sequences (see also Bojanic and Seneta [17]) and by Mokkadem and Pelletier [18] in the context of stochastic approximation algorithms. Note that the acronym $\mathcal{GS}$ stands for Galambos and Seneta [16]. Typical sequences in $\mathcal{GS}(\gamma)$ are, for $\gamma \in \mathbb{R}$, $n^{\gamma}$, $n^{\gamma} \log n$, $n^{\gamma} \log\log n$, and so on.
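As a quick check of the definition (under the form of condition (6) recalled above), the power sequence $v_n = n^{\gamma}$ indeed belongs to $\mathcal{GS}(\gamma)$:

```latex
\[
  n\left[1 - \frac{v_{n-1}}{v_n}\right]
  = n\left[1 - \left(1 - \frac{1}{n}\right)^{\gamma}\right]
  = n\left[\frac{\gamma}{n} + O\!\left(\frac{1}{n^{2}}\right)\right]
  \xrightarrow[n \to \infty]{} \gamma .
\]
```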

The assumptions to which we will refer are as follows.
(A1) $K$ is a continuous, bounded function satisfying $\int K(z)\,dz = 1$, $\int z K(z)\,dz = 0$, and $\int z^2 |K(z)|\,dz < \infty$.
(A2) (i) $(\gamma_n) \in \mathcal{GS}(-\alpha)$ with $\alpha \in (1/2, 1]$. (ii) $(h_n) \in \mathcal{GS}(-a)$ with $a \in (0, 1)$. (iii) $\lim_{n \to \infty} n\gamma_n \in \left(\min\{2a, (\alpha - a)/2\}, \infty\right]$.
(A3) $f$ is bounded, twice differentiable, and $f''$ is bounded.

The assumption on the limit of $n\gamma_n$ as $n$ goes to infinity is usual in the framework of stochastic approximation algorithms; it implies in particular that the limit of $(n\gamma_n)^{-1}$ is finite. Throughout this paper we write $\xi = \lim_{n \to \infty} (n\gamma_n)^{-1}$, $R(K) = \int K^2(z)\,dz$, and $\mu_2(K) = \int z^2 K(z)\,dz$. In order to measure the quality of our recursive estimator (3), we use the MISE. Proposition 1 in Mokkadem et al. [9] gives the leading terms of the bias and of the variance of (3), and hence of its MISE. The following corollary ensures that the bandwidth which minimizes the MISE of (3) depends on the stepsize $(\gamma_n)$, and that the corresponding minimal MISE also depends on the stepsize.
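As a reminder (standard definition, written here for the unweighted criterion; a weighted version decomposes analogously), the MISE splits into integrated squared bias and integrated variance:

```latex
\[
  \operatorname{MISE}(f_n)
  = \mathbb{E}\!\int_{\mathbb{R}} \bigl(f_n(x) - f(x)\bigr)^{2}\,dx
  = \int_{\mathbb{R}} \operatorname{Bias}^{2}\!\bigl(f_n(x)\bigr)\,dx
  + \int_{\mathbb{R}} \operatorname{Var}\!\bigl(f_n(x)\bigr)\,dx .
\]
```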

Corollary 2. Let assumptions (A1)-(A3) hold. To minimize the MISE of the recursive estimator (3), the stepsize $(\gamma_n)$ must be chosen in $\mathcal{GS}(-1)$ and the bandwidth $(h_n)$ must be taken proportional to $\gamma_n^{1/5}$; the corresponding minimal MISE is then of order $n^{-4/5}$.

The following corollary shows that, for a special choice of the stepsize $(\gamma_n)$ within this admissible class, the optimal value of the bandwidth depends on $(\gamma_n)$, and so does the corresponding MISE.

Corollary 3. Let assumptions (A1)-(A3) hold. To minimize the MISE of the recursive estimator (3), the stepsize $(\gamma_n)$ must be chosen in $\mathcal{GS}(-1)$ with $\lim_{n \to \infty} n\gamma_n$ finite, and the bandwidth must equal the corresponding optimal expression, which is proportional to $n^{-1/5}$ and yields a MISE of order $n^{-4/5}$. Moreover, the minimum of this MISE over the limit of $n\gamma_n$ is reached at $(\gamma_n) = (n^{-1})$; the bandwidth must then equal (16), and one then has the corresponding minimal MISE.
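To see what such an expression looks like, here is a back-of-the-envelope derivation, assuming the unweighted MISE criterion, the stepsize $(\gamma_n) = (n^{-1})$, and the standard expansions $\operatorname{Bias}[f_n(x)] \approx \tfrac{5}{3} \cdot \tfrac{h_n^2}{2}\, \mu_2(K)\, f''(x)$ and $\operatorname{Var}[f_n(x)] \approx \tfrac{5}{6} \cdot R(K) f(x)/(n h_n)$; these constants are assumptions evaluated from the general expansions of [9], not reproduced from the paper's displays.

```latex
\[
  \operatorname{MISE}(f_n)
  \approx \frac{25}{36}\,\mu_2(K)^{2}\!\int_{\mathbb{R}} \bigl(f''(x)\bigr)^{2}dx\; h_n^{4}
        + \frac{5}{6}\,\frac{R(K)}{n h_n},
\]
which is minimized by
\[
  h_n = \left(\frac{3}{10}\right)^{1/5}
        \left(\frac{R(K)}{\mu_2(K)^{2}\int_{\mathbb{R}} \bigl(f''(x)\bigr)^{2}dx}\right)^{1/5} n^{-1/5}.
\]
```

That is, under these assumptions the optimal recursive bandwidth is a constant multiple $(3/10)^{1/5} \approx 0.79$ of the classical Rosenblatt-optimal bandwidth recalled later in this section, which is consistent with the smaller selected bandwidths observed in the simulations of Section 3.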

In order to estimate the optimal bandwidth (16), we must estimate the unknown density functionals appearing in it. We follow the approach of Altman and Léger [19], known as the plug-in estimate, and use a kernel estimator of each of these functionals, based on a pilot kernel and an associated pilot bandwidth.

In practice, the pilot bandwidth is chosen by Silverman's rule of thumb (see Silverman [20]), based on the sample standard deviation and on the first and third sample quartiles.
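As an illustration, one common version of this rule (an assumption here; the exact constant used in the paper is not reproduced above) is $b = 0.9\,\min\{s,\ (Q_3 - Q_1)/1.34\}\, n^{-1/5}$:

```python
import numpy as np

def silverman_pilot_bandwidth(sample):
    """One common form of Silverman's rule of thumb (an assumption here,
    not necessarily the exact constant used in the paper):
        b = 0.9 * min(s, IQR / 1.34) * n^(-1/5).
    """
    n = sample.size
    s = sample.std(ddof=1)                    # sample standard deviation
    q1, q3 = np.percentile(sample, [25, 75])  # first and third quartiles
    return 0.9 * min(s, (q3 - q1) / 1.34) * n ** (-0.2)
```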

The following theorem gives the bias and the variance of this pilot estimator.

Theorem 4. Let assumptions (A1)-(A3) hold, and suppose that the pilot kernel satisfies assumption (A1) and that the associated pilot bandwidth satisfies the analogous rate conditions; one then has explicit expansions for the bias and the variance of this estimator.

The following corollary shows that the bandwidth which minimizes the MISE of this estimator depends on the stepsize, and that the corresponding MISE also depends on the stepsize.

Corollary 5. Let the assumptions of Theorem 4 hold. To minimize the MISE of this estimator, the stepsize must be chosen in the admissible class and the bandwidth must equal the corresponding optimal expression; the resulting MISE then follows.

The following corollary shows that, for a special choice of the stepsize within this admissible class, the optimal bandwidth depends on $(\gamma_n)$, and so does the corresponding MISE.

Corollary 6. Let the assumptions of Theorem 4 hold. To minimize the MISE of this estimator, the stepsize must be chosen in the admissible class with the prescribed limit of $n\gamma_n$, and the bandwidth must equal the corresponding optimal expression; the resulting MISE then follows.

Moreover, the minimum of this MISE is reached at a particular stepsize; the bandwidth must then equal the corresponding expression, from which the minimal MISE follows. Furthermore, to estimate the remaining functional, we introduce a kernel estimator whose construction uses the second-order derivative of a kernel. The bias and the variance of this estimator are computed in the following theorem.
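To fix ideas, here is one simple grid-based way to approximate the curvature functional $\int (f''(x))^2\,dx$ from a pilot kernel estimate of $f''$; it is only an illustration and not the recursive estimator analyzed in the next theorem.

```python
import numpy as np

def gaussian_kernel_second_derivative(u):
    """Second derivative of the standard normal kernel: (u^2 - 1) * phi(u)."""
    return (u**2 - 1.0) * np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def curvature_functional(sample, b, grid):
    """Grid-based approximation of int (f''(x))^2 dx, using the pilot estimate
    f_b''(x) = (n b^3)^{-1} sum_i K''((x - X_i) / b) on a uniform grid.
    Illustrative only; not the estimator whose bias and variance are studied below.
    """
    n = sample.size
    u = (grid[:, None] - sample[None, :]) / b
    f2 = gaussian_kernel_second_derivative(u).sum(axis=1) / (n * b**3)
    dx = grid[1] - grid[0]      # assumes a uniform evaluation grid
    return float(np.sum(f2**2) * dx)
```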

Theorem 7. Let assumptions (A1)-(A3) hold, and suppose that the kernel used in this estimator satisfies assumption (A1) and that the associated bandwidth satisfies the analogous rate conditions; one then has explicit expansions for the bias and the variance of this estimator.

The following corollary ensures that the bandwidth which minimizes the MISE of this estimator depends on the stepsize, and that the corresponding MISE also depends on the stepsize.

Corollary 8. Let the assumptions of Theorem 7 hold. To minimize the MISE of this estimator, the stepsize must be chosen in the admissible class and the bandwidth must equal the corresponding optimal expression; the resulting MISE then follows.

The following corollary shows that, for a special choice of the stepsize within this admissible class, the optimal bandwidth depends on $(\gamma_n)$, and so does the corresponding MISE.

Corollary 9. Let the assumptions of Theorem 7 hold. To minimize the MISE of this estimator, the stepsize must be chosen in the admissible class with the prescribed limit of $n\gamma_n$, and the bandwidth must equal the corresponding optimal expression; the resulting MISE then follows.

Moreover, the minimum of this MISE is reached at a particular stepsize; the bandwidth must then equal the corresponding expression, from which the minimal MISE follows. Combining the two plug-in estimates, the plug-in bandwidth for the recursive algorithm (3) must equal (37). Now, let us recall that the bias and the variance of Rosenblatt's estimator are given by the classical expansions; to minimize the MISE of Rosenblatt's estimator, the bandwidth must equal the classical optimal expression (41). To estimate the optimal bandwidth (41), we must estimate the unknown density functionals appearing in it. As suggested by Hall and Marron [21], we use a kernel estimator of the first of these functionals. The following lemma gives its bias and variance.
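For completeness, the classical expansions referred to above take the following standard form (with $R(K)$ and $\mu_2(K)$ as defined in Section 2):

```latex
\[
  \mathbb{E}\bigl[\tilde f_n(x)\bigr] - f(x)
    = \frac{h_n^{2}}{2}\,\mu_2(K)\,f''(x) + o\!\left(h_n^{2}\right),
  \qquad
  \operatorname{Var}\bigl[\tilde f_n(x)\bigr]
    = \frac{R(K)\,f(x)}{n h_n} + o\!\left(\frac{1}{n h_n}\right),
\]
so that the asymptotically MISE-optimal bandwidth is
\[
  h_n = \left(\frac{R(K)}{\mu_2(K)^{2}\int_{\mathbb{R}} \bigl(f''(x)\bigr)^{2}\,dx}\right)^{1/5} n^{-1/5}.
\]
```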

Lemma 10. Suppose that the kernel used in this estimator satisfies assumption (A1) and that the associated bandwidth satisfies the required rate conditions. To minimize the MISE of this estimator, the bandwidth must equal the corresponding optimal expression; the resulting MISE then follows.

Furthermore, to estimate the remaining functional, we use a kernel estimator introduced in Hall and Marron [21]. The following lemma gives its bias and variance.

Lemma 11. Suppose that the kernel used in this estimator satisfies assumption (A1) and that the associated bandwidth satisfies the required rate conditions. To minimize the MISE of this estimator, the bandwidth must equal the corresponding optimal expression; the resulting MISE then follows, together with its consequences for the plug-in bandwidth.

The plug-in estimator of the bandwidth for the nonrecursive estimator (5) must then equal (52). The following corollary gives the expected ISE of the recursive estimator (3) and of the nonrecursive estimator (5).

Corollary 12. Let the assumptions of Theorem 7 hold. Then the expected ISE of the recursive estimator (3) and that of the nonrecursive estimator (5) admit explicit expansions, with leading constants depending on the kernel and on the underlying density.

The following theorem gives the conditions under which the expected ISE of the recursive estimator is smaller than the expected ISE of the nonrecursive estimator.

Theorem 13. Let the assumptions of Theorem 7 hold, let the bandwidth equal (37), and let the stepsize be $(\gamma_n) = (n^{-1})$. One then has an explicit comparison of the two expected ISEs.

Consequently, the expected ISE of the recursive estimator defined by (3) is smaller than the expected ISE of the nonrecursive estimator defined by (5) in small sample settings.

3. Simulation

The aim of our simulation study is to compare the performance of Rosenblatt's nonrecursive estimator defined in (5) with that of the recursive estimator defined in (3).

When applying the recursive estimator (3), one needs to choose three quantities:
(i) the kernel $K$; we choose the normal kernel;
(ii) the stepsize; we take $(\gamma_n) = (n^{-1})$;
(iii) the bandwidth, which is chosen equal to (37).

When applying the nonrecursive estimator (5), one needs to choose two quantities:
(i) the kernel $K$; as in the recursive framework, we use the normal kernel;
(ii) the bandwidth, which is chosen equal to (52).
A sketch of the resulting comparison is given below.
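The following sketch illustrates the shape of such a comparison, reusing `recursive_kde` and `rosenblatt_kde` from the earlier sketches; the deterministic bandwidths of order $n^{-1/5}$ are illustrative stand-ins for the plug-in choices (37) and (52), which are not reproduced here.

```python
import numpy as np

def ise(f_hat, f_true, grid):
    """Integrated squared error approximated on a uniform grid."""
    dx = grid[1] - grid[0]
    return float(np.sum((f_hat - f_true) ** 2) * dx)

def compare_once(n, rng, grid):
    """One Monte Carlo replication of the comparison for N(0,1) data."""
    X = rng.standard_normal(n)
    k = np.arange(1, n + 1)
    f_true = np.exp(-0.5 * grid**2) / np.sqrt(2.0 * np.pi)
    f_rec = recursive_kde(X, grid, stepsizes=1.0 / k, bandwidths=k ** (-0.2))
    f_ros = rosenblatt_kde(X, grid, h=n ** (-0.2))
    return ise(f_rec, f_true, grid), ise(f_ros, f_true, grid)

rng = np.random.default_rng(1)
grid = np.linspace(-4.0, 4.0, 401)
results = np.array([compare_once(50, rng, grid) for _ in range(200)])
print("mean ISE (recursive, Rosenblatt):", results.mean(axis=0))
```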

In order to compare the two estimators, we consider two densities for the data: the standard normal distribution (see Table 1) and the exponential distribution (see Table 2). For each of these two cases, samples of several sizes were generated. For each configuration, we computed the mean and the standard deviation (over the samples) of the selected bandwidths and of the resulting errors. The plug-in estimators (37) and (52) require kernels to estimate the unknown density functionals; in both cases we use the normal kernel, with the pilot bandwidths given in (19). Both tables show that the bias and the standard deviation of the estimated quantities obtained with the recursive algorithm (3) are very similar to, or smaller than, those obtained with the nonrecursive estimator (5); that the mean and the standard deviation of the bandwidths selected by the recursive estimator (3) are always smaller than those of the bandwidths selected by the nonrecursive estimator (5); and that the mean and the standard deviation of the ISE obtained with the bandwidths selected by the recursive estimator (3) are always smaller than those obtained with the bandwidths selected by the nonrecursive estimator (5). In Tables 1 and 2, the Ref. column gives reference values for the corresponding means.

Figures 1 and 2 show boxplots of the bandwidths selected by the two algorithms (3) and (5), respectively. For all the sample sizes considered, the bandwidths selected by the recursive estimator (3) are always smaller than those selected by the nonrecursive estimator (5).

Figures 3 and 4 show boxplots of the expected ISE obtained with the two algorithms (3) and (5), respectively. For all the sample sizes considered, the expected ISE obtained with the bandwidths selected by the recursive estimator (3) is always smaller than that obtained with the bandwidths selected by the nonrecursive estimator (5).

In order to give some comparative elements with the nonrecursive estimator (5), including computational costs, we consider samples generated from a standard normal distribution; moreover, we suppose that an additional batch of observations, also generated from a standard normal distribution, is received afterwards. Performing the two methods, the running time using the recursive estimator defined by algorithm (2), with stepsize $(\gamma_n) = (n^{-1})$ and the bandwidth given in (37), was roughly 6880 s on the author's workstation, while the running time using the nonrecursive estimator defined by (5), with the bandwidth given in (52), was roughly 14080 s on the same machine.
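The computational advantage comes from the fact that, when a new batch arrives, the recursive estimate is updated observation by observation, whereas the nonrecursive estimator must be recomputed from the full enlarged sample. A sketch of such an update, reusing `gaussian_kernel` from the first sketch and the same illustrative stepsize and bandwidth sequences:

```python
def update_recursive(f_prev, n_prev, new_batch, grid, kernel=gaussian_kernel):
    """Update an existing recursive estimate with a new batch of observations,
    using gamma_n = 1/n and the illustrative bandwidth h_n = n^{-1/5}.
    Each new observation costs O(len(grid)), independently of the past sample size.
    """
    f, n = f_prev.copy(), n_prev
    for x_new in new_batch:
        n += 1
        gamma_n, h_n = 1.0 / n, n ** (-0.2)
        f = (1.0 - gamma_n) * f + gamma_n * kernel((grid - x_new) / h_n) / h_n
    return f, n

# The nonrecursive alternative, rosenblatt_kde(np.concatenate([X, new_batch]), grid, h),
# must revisit all past and new observations at every evaluation point.
```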

This simulation study shows the good performance of the recursive estimator defined by algorithm (3), with stepsize $(\gamma_n) = (n^{-1})$ and the bandwidth given in (37), in small sample settings.

4. Conclusion

In this paper we proposed an automatic bandwidth selection for the recursive kernel estimators of a probability density function defined by the stochastic approximation algorithm (2). We showed that, using the selected bandwidth and the stepsize which minimizes the MISE of the class of recursive estimators defined in Mokkadem et al. [9], the recursive estimator outperforms the nonrecursive one in small sample settings. The simulation study corroborated these theoretical results. Moreover, the simulation results indicate that the proposed recursive estimator is computationally more efficient than the nonrecursive estimator.

In conclusion, the proposed method allowed us to obtain better results than the nonrecursive estimator proposed by Rosenblatt [14] in small sample settings. In future work, we plan to extend our method to the case of a regression function, as in Härdle and Marron [22], in a recursive way (see Mokkadem et al. [23]), and to the case of time series, as in Hart and Vieu [24], also in a recursive way.

Conflict of Interests

The author declares that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The author is grateful to the editor and to the referee for their helpful comments, which have led to this substantially improved version of the paper.