#### Abstract

We propose an automatic bandwidth selection procedure for the recursive kernel estimators of a probability density function defined by the stochastic approximation algorithm introduced by Mokkadem et al. (2009a). We show that, using the selected bandwidth and the stepsize which minimizes the MISE (mean integrated squared error) within the class of recursive estimators defined in Mokkadem et al. (2009a), the recursive estimator outperforms the nonrecursive one in small-sample settings, both in estimation error and in computational cost. We corroborate these theoretical results through a simulation study.

#### 1. Introduction

The problem of automatically choosing the smoothing parameter has been widely studied. There are many reasons to prefer an automatic choice; one is that, in many situations, smoothing methods are used by nonexperts. In this paper we focus only on one-dimensional kernel density estimation, although the main ideas are useful in all types of nonparametric curve estimation, including regression, distribution, and time series estimation. The bandwidth selection methods studied in the literature can be divided into two broad classes: cross-validation techniques and plug-in methods.

There are many varieties of cross-validation: pseudolikelihood cross-validation [1], least squares cross-validation [2], and biased cross-validation [3]. Reviews of all these bandwidth selection methods can be found in Marron [4].

Plug-in methods [5], also called “second generation methods” [6], need a pilot bandwidth to estimate the unknown quantities. A number of approaches have been proposed for choosing the pilot bandwidth; see Jones et al. [7] for details and references. An interesting approach is the smoothed bootstrap [8]. In this paper, we develop a specific second-generation bandwidth selection method for the recursive kernel estimators of a probability density function defined by the stochastic approximation algorithm introduced by Mokkadem et al. [9].

Let $X_1,\ldots,X_n$ be independent, identically distributed random variables, and let $f$ denote the probability density of $X_1$. To construct a stochastic algorithm which approximates the function $f$ at a given point $x$, Mokkadem et al. [9] define an algorithm of search of the zero of the function $h\colon y\mapsto f(x)-y$. Following Robbins-Monro's procedure, this algorithm is defined by setting $f_0(x)\in\mathbb{R}$ and, for all $n\geq 1$,
$$f_n(x)=f_{n-1}(x)+\gamma_n W_n(x),\tag{1}$$
where $W_n(x)$ is an “observation” of the function $h$ at the point $f_{n-1}(x)$ and the stepsize $(\gamma_n)$ is a sequence of positive real numbers that goes to zero. To define $W_n(x)$, Mokkadem et al. [9] follow the approach of Révész [10, 11] and of Tsybakov [12] and introduce a kernel $K$ (i.e., a function satisfying $\int K(z)\,dz=1$) and a bandwidth $(h_n)$ (i.e., a sequence of positive real numbers that goes to zero), and set $W_n(x)=h_n^{-1}K\left(h_n^{-1}\left[x-X_n\right]\right)-f_{n-1}(x)$. Then, the estimator to recursively estimate the density function $f$ at the point $x$ can be written as
$$f_n(x)=\left(1-\gamma_n\right)f_{n-1}(x)+\gamma_n h_n^{-1}K\left(\frac{x-X_n}{h_n}\right).\tag{2}$$
This estimator was introduced by Mokkadem et al. [9]; its large and moderate deviation principles were established by Slaoui [13].
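As an illustration, the recursion (2) can be implemented in a few lines. The concrete choices below (Gaussian kernel, stepsize $\gamma_n=n^{-1}$, bandwidth $h_n=n^{-1/5}$) are made here for definiteness, not prescribed by the recursion itself:

```python
import numpy as np

def gauss_kernel(u):
    """Standard normal kernel K(u)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def recursive_density(x, data, a=0.2):
    """Recursive estimator f_n(x) = (1 - g_n) f_{n-1}(x) + g_n h_n^{-1} K((x - X_n)/h_n),
    with stepsize g_n = 1/n and bandwidth h_n = n^{-a} (a = 1/5 by default)."""
    f = 0.0  # f_0(x) = 0
    for n, X in enumerate(data, start=1):
        gamma_n = 1.0 / n
        h_n = float(n) ** (-a)
        f = (1.0 - gamma_n) * f + gamma_n * gauss_kernel((x - X) / h_n) / h_n
    return f
```

With $\gamma_n=n^{-1}$, the recursion unfolds exactly to the weighted average $n^{-1}\sum_{k\leq n}h_k^{-1}K((x-X_k)/h_k)$, so each new observation is absorbed without revisiting the past ones.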

Throughout this paper, we suppose that $f_0(x)=0$, and we let $\Pi_n=\prod_{j=1}^{n}\left(1-\gamma_j\right)$; then it follows from (2) that one can estimate $f$ recursively at the point $x$ by
$$f_n(x)=\Pi_n\sum_{k=1}^{n}\Pi_k^{-1}\gamma_k h_k^{-1}K\left(\frac{x-X_k}{h_k}\right).\tag{3}$$
Moreover, it was shown in Mokkadem et al. [9] that the bandwidth which minimizes the MISE of $f_n$ depends on the choice of the stepsize $(\gamma_n)$; they show in particular that, under some conditions of regularity of $f$, the sequence $(\gamma_n)=\left(n^{-1}\right)$ belongs to the set of stepsizes which minimize the MISE, and that the bandwidth must then equal
$$\left(\left[\frac{3}{10}\,\frac{\int K^2(z)\,dz}{\left(\int z^2K(z)\,dz\right)^2\int\left(f''(x)\right)^2dx}\right]^{1/5}n^{-1/5}\right).\tag{4}$$
The first aim of this paper is to propose an automatic selection of such a bandwidth through a plug-in method, and the second aim is to give the conditions under which the recursive estimator is better than the nonrecursive kernel density estimator introduced by Rosenblatt [14] (see also Parzen [15]) and defined as
$$\tilde{f}_n(x)=\frac{1}{nh_n}\sum_{i=1}^{n}K\left(\frac{x-X_i}{h_n}\right).\tag{5}$$
The simulation results given in Section 3 corroborate these theoretical results. The remainder of the paper is organized as follows. In Section 2, we state our main results. Section 3 is devoted to our simulation results. We conclude the paper in Section 4.
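For comparison, the nonrecursive Rosenblatt estimator (5) can be sketched as follows (Gaussian kernel assumed for concreteness); note that every evaluation revisits the full sample:

```python
import numpy as np

def rosenblatt_density(x, data, h):
    """Nonrecursive (Rosenblatt-Parzen) estimator: (n h)^{-1} sum_i K((x - X_i)/h)."""
    u = (x - np.asarray(data, dtype=float)) / h
    return np.mean(np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)) / h
```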

#### 2. Assumptions and Main Results

We define the following class of regularly varying sequences.

*Definition 1. *Let $\gamma\in\mathbb{R}$, and let $(v_n)_{n\geq 1}$ be a nonrandom positive sequence. One says that $(v_n)\in\mathcal{GS}(\gamma)$ if
$$\lim_{n\to+\infty}n\left[1-\frac{v_{n-1}}{v_n}\right]=\gamma.\tag{6}$$

Condition (6) was introduced by Galambos and Seneta [16] to define regularly varying sequences (see also Bojanic and Seneta [17]) and by Mokkadem and Pelletier [18] in the context of stochastic approximation algorithms. Note that the acronym $\mathcal{GS}$ stands for Galambos and Seneta [16]. Typical sequences in $\mathcal{GS}(\gamma)$ are, for $b\in\mathbb{R}$, $n^{\gamma}(\log n)^{b}$, $n^{\gamma}(\log\log n)^{b}$, and so on.
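The defining limit in condition (6), $\lim_n n\left[1-v_{n-1}/v_n\right]=\gamma$, is easy to check numerically; the helper below is illustrative only:

```python
import math

def gs_index(v, n=10**6):
    """Numerically evaluate n * (1 - v(n-1)/v(n)); for (v_n) in GS(gamma)
    this quantity converges to gamma as n grows."""
    return n * (1.0 - v(n - 1) / v(n))
```

For pure powers $v_n=n^{\gamma}$ the convergence is fast; for sequences with logarithmic factors, such as $n^{1/2}\log n$, the limit is still the power exponent, but convergence is only at rate $1/\log n$.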

The assumptions to which we will refer are as follows.

(A1) $K$ is a continuous, bounded function satisfying $\int K(z)\,dz=1$, $\int zK(z)\,dz=0$, and $\int z^2|K(z)|\,dz<\infty$.

(A2) (i) $(\gamma_n)\in\mathcal{GS}(-\alpha)$ with $\alpha\in(1/2,1]$. (ii) $(h_n)\in\mathcal{GS}(-a)$ with $a\in(0,1)$. (iii) $\lim_{n\to\infty}\left(n\gamma_n\right)\in\left(\min\left\{2a,(\alpha-a)/2\right\},\infty\right]$.

(A3) $f$ is bounded and twice differentiable, and $f''$ is bounded.

Assumption (A2)(iii) on the limit of $n\gamma_n$ as $n$ goes to infinity is usual in the framework of stochastic approximation algorithms. It implies in particular that the limit of $\left(n\gamma_n\right)^{-1}$ is finite. Throughout this paper we will use the following notations:
$$\xi=\lim_{n\to\infty}\left(n\gamma_n\right)^{-1},\qquad R(K)=\int K^2(z)\,dz,\qquad \mu_2(K)=\int z^2K(z)\,dz,\qquad I_1=\int\left(f''(x)\right)^2dx.$$
In order to measure the quality of our recursive estimator (3), we use the following quantity:
$$\mathrm{MISE}\left[f_n\right]=\mathbb{E}\int_{\mathbb{R}}\left(f_n(x)-f(x)\right)^2dx.$$
Moreover, in the case $a\in(0,\alpha/5]$, Proposition 1 in Mokkadem et al. [9] shows that
$$\mathbb{E}\left[f_n(x)\right]-f(x)=\frac{1}{2\left(1-2a\xi\right)}\mu_2(K)f''(x)h_n^2+o\left(h_n^2\right)$$
and that
$$\operatorname{Var}\left[f_n(x)\right]=\frac{1}{2-(1-a)\xi}R(K)f(x)\frac{\gamma_n}{h_n}+o\left(\frac{\gamma_n}{h_n}\right).$$
Then
$$\mathrm{MISE}\left[f_n\right]=\frac{\mu_2^2(K)I_1}{4\left(1-2a\xi\right)^2}h_n^4+\frac{R(K)}{2-(1-a)\xi}\frac{\gamma_n}{h_n}+o\left(h_n^4+\frac{\gamma_n}{h_n}\right).$$
The following corollary ensures that the bandwidth which minimizes the MISE of $f_n$ depends on the stepsize $(\gamma_n)$, and the corresponding MISE then also depends on the stepsize $(\gamma_n)$.

Corollary 2. *Let assumptions (A1)-(A3) hold. To minimize the MISE of $f_n$, the stepsize $(\gamma_n)$ must be chosen in $\mathcal{GS}(-1)$, and the bandwidth $(h_n)$ must equal
$$\left(\left[\frac{\left(1-\frac{2\xi}{5}\right)^2}{2-\frac{4\xi}{5}}\,\frac{R(K)}{\mu_2^2(K)I_1}\right]^{1/5}\gamma_n^{1/5}\right).$$
Then, one has
$$\mathrm{MISE}\left[f_n\right]=\frac{5}{4}\,\frac{R(K)^{4/5}\left(\mu_2^2(K)I_1\right)^{1/5}}{\left(1-\frac{2\xi}{5}\right)^{2/5}\left(2-\frac{4\xi}{5}\right)^{4/5}}\,\gamma_n^{4/5}+o\left(\gamma_n^{4/5}\right).$$*

The following corollary shows that, for the special choice of stepsize $(\gamma_n)=\left(\gamma_0 n^{-1}\right)$, which fulfills $(\gamma_n)\in\mathcal{GS}(-1)$ and $\xi=\gamma_0^{-1}$, the optimal value of the bandwidth depends on $\gamma_0$, and the corresponding MISE then also depends on $\gamma_0$.

Corollary 3. *Let assumptions (A1)-(A3) hold. To minimize the MISE of $f_n$, the stepsize $(\gamma_n)$ must be chosen in $\mathcal{GS}(-1)$, $(\gamma_n)=\left(\gamma_0 n^{-1}\right)$, and the bandwidth $(h_n)$ must equal
$$\left(\left[\frac{\left(1-\frac{2}{5\gamma_0}\right)^2}{2-\frac{4}{5\gamma_0}}\,\frac{R(K)}{\mu_2^2(K)I_1}\right]^{1/5}\gamma_0^{1/5}n^{-1/5}\right),$$
and one then has
$$\mathrm{MISE}\left[f_n\right]=\frac{5}{4}\,\frac{R(K)^{4/5}\left(\mu_2^2(K)I_1\right)^{1/5}}{\left(1-\frac{2}{5\gamma_0}\right)^{2/5}\left(2-\frac{4}{5\gamma_0}\right)^{4/5}}\,\gamma_0^{4/5}n^{-4/5}+o\left(n^{-4/5}\right).$$
Moreover, the minimum of the MISE is reached at $\gamma_0=1$; then the bandwidth $(h_n)$ must equal
$$\left(\left[\frac{3}{10}\,\frac{R(K)}{\mu_2^2(K)I_1}\right]^{1/5}n^{-1/5}\right),\tag{16}$$
and one then has
$$\mathrm{MISE}\left[f_n\right]=\frac{5}{4}\left(\frac{3}{5}\right)^{-2/5}\left(\frac{6}{5}\right)^{-4/5}R(K)^{4/5}\left(\mu_2^2(K)I_1\right)^{1/5}n^{-4/5}+o\left(n^{-4/5}\right).$$*
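As a numerical sanity check of the bandwidth constant in (16), one can minimize the asymptotic MISE expression by grid search. The sketch below assumes that, for $\gamma_n=n^{-1}$ and $h_n=cn^{-1/5}$, the leading MISE constant is $\mu_2^2(K)I_1c^4/(4(1-2a)^2)+R(K)/((1+a)c)$ with $a=1/5$, and it uses Gaussian-kernel constants and the normal-reference value of $I_1$ for concreteness:

```python
import numpy as np

# Gaussian-kernel constants and the normal-reference value of I1 = ∫(f'')²;
# these concrete values are assumptions made for this check only.
R_K = 1.0 / (2.0 * np.sqrt(np.pi))   # R(K) = ∫ K(z)² dz
MU2 = 1.0                            # mu_2(K) = ∫ z² K(z) dz
I1 = 3.0 / (8.0 * np.sqrt(np.pi))    # ∫ (f''(x))² dx for the standard normal f
A = 0.2                              # bandwidth exponent: h_n = c n^{-1/5}

def asym_mise_const(c):
    """Leading MISE constant (the common n^{-4/5} factor dropped), gamma_n = 1/n."""
    bias2 = (MU2**2 * I1 / (4.0 * (1.0 - 2.0 * A) ** 2)) * c**4
    var = R_K / ((1.0 + A) * c)
    return bias2 + var

# closed-form minimizer: c* = (3 R(K) / (10 mu_2(K)² I1))^{1/5}
c_star = (3.0 * R_K / (10.0 * MU2**2 * I1)) ** 0.2
```

Minimizing $Ac^4+B/c$ over $c$ gives $c=(B/4A)^{1/5}$, which reduces here to exactly the $3/10$ constant of (16).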

In order to estimate the optimal bandwidth (16), we must estimate and . We follow the approach of Altman and Léger [19], which is called the plug-in estimate, and we use the following kernel estimator of : where is a kernel and is the associated bandwidth.

In practice, we take (see Silverman [20])
$$b_n=0.9\min\left\{\hat{\sigma},\frac{Q_3-Q_1}{1.349}\right\}n^{-1/5},\tag{19}$$
with $\hat{\sigma}$ the sample standard deviation, and $Q_1$, $Q_3$ denoting the first and third quartiles, respectively.
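Assuming the usual Silverman rule of thumb with the robust scale $\min\{\hat\sigma,(Q_3-Q_1)/1.349\}$, the pilot bandwidth can be computed as:

```python
import numpy as np

def pilot_bandwidth(data):
    """Silverman-type pilot bandwidth: b_n = 0.9 min(sigma_hat, IQR/1.349) n^{-1/5}."""
    data = np.asarray(data, dtype=float)
    n = data.size
    sigma = data.std(ddof=1)                 # sample standard deviation
    q1, q3 = np.percentile(data, [25, 75])   # first and third quartiles
    return 0.9 * min(sigma, (q3 - q1) / 1.349) * n ** (-0.2)
```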

The following theorem gives the bias and variance of .

Theorem 4. *Let assumptions hold and suppose that the kernel satisfies assumption and , with ; one has
*

The following corollary shows that the bandwidth which minimizes the MISE of this estimator depends on the stepsize, and the corresponding MISE then also depends on the stepsize.

Corollary 5. *Let the assumptions of Theorem 4 hold. To minimize the MISE, the stepsize must be chosen in and the bandwidth must equal
**
Then, one has
*

The following corollary shows that, for the special choice of stepsize $(\gamma_n)=\left(\gamma_0 n^{-1}\right)$, the optimal value of the bandwidth depends on $\gamma_0$, and the corresponding MISE then also depends on $\gamma_0$.

Corollary 6. *Let the assumptions of Theorem 4 hold. To minimize the MISE, the stepsize must be chosen in , , and the bandwidth must equal
**
Then, one has
*

Moreover, the minimum of the MISE is reached at ; then the bandwidth must equal Then It follows that Furthermore, to estimate , we introduce the following kernel estimator: where is the second order derivative of a kernel . The bias and variance of this estimator are computed in the following theorem.
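One concrete way to build an estimator from the second derivative of a kernel is sketched below: it estimates $I_1=\int(f''(x))^2dx$ by numerically integrating the squared second-derivative estimate $\hat f''(x)=(nb^3)^{-1}\sum_i K''((x-X_i)/b)$ with a Gaussian kernel. This is an illustrative variant, not necessarily the exact estimator used in the paper:

```python
import numpy as np

def d2_gauss(u):
    """Second derivative of the Gaussian kernel: K''(u) = (u² - 1) phi(u)."""
    return (u**2 - 1.0) * np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def estimate_I1(data, b, lo=-8.0, hi=8.0, m=2001):
    """Estimate I1 = ∫ (f''(x))² dx by plugging in the kernel estimate of f''
    and integrating its square on a fine grid (Riemann sum)."""
    data = np.asarray(data, dtype=float)
    n = data.size
    grid, dx = np.linspace(lo, hi, m, retstep=True)
    fpp = d2_gauss((grid[:, None] - data[None, :]) / b).sum(axis=1) / (n * b**3)
    return float(np.sum(fpp**2) * dx)
```

Estimating a second derivative is harder than estimating the density itself, which is why such pilot estimators need larger bandwidths than the density estimate they serve.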

Theorem 7. *Let assumptions hold, and suppose that the kernel satisfies assumption and , with ; one has
*

The following corollary ensures that the bandwidth which minimizes the MISE of this estimator depends on the stepsize, and the corresponding MISE then also depends on the stepsize.

Corollary 8. *Let the assumptions of Theorem 7 hold. To minimize the MISE, the stepsize must be chosen in and the bandwidth must equal
**
then one has
*

The following corollary shows that, for the special choice of stepsize $(\gamma_n)=\left(\gamma_0 n^{-1}\right)$, the optimal value of the bandwidth depends on $\gamma_0$, and the corresponding MISE then also depends on $\gamma_0$.

Corollary 9. *Let the assumptions of Theorem 7 hold. To minimize the MISE, the stepsize must be chosen in , , and the bandwidth must equal
**
then one has
*

Moreover, the minimum of the MISE is reached at ; then the bandwidth must equal then we have It follows that Finally, the plug-in estimator of the bandwidth using the recursive algorithm (3) must equal Now, let us recall that the bias and variance of Rosenblatt's estimator $\tilde{f}_n$ defined in (5) are given by
$$\mathbb{E}\left[\tilde{f}_n(x)\right]-f(x)=\frac{1}{2}\mu_2(K)f''(x)h_n^2+o\left(h_n^2\right),\qquad \operatorname{Var}\left[\tilde{f}_n(x)\right]=\frac{1}{nh_n}R(K)f(x)+o\left(\frac{1}{nh_n}\right).$$
It follows that
$$\mathrm{MISE}\left[\tilde{f}_n\right]=\frac{\mu_2^2(K)I_1}{4}h_n^4+\frac{R(K)}{nh_n}+o\left(h_n^4+\frac{1}{nh_n}\right).$$
To minimize the MISE of $\tilde{f}_n$, the bandwidth must equal
$$\left(\left[\frac{R(K)}{\mu_2^2(K)I_1}\right]^{1/5}n^{-1/5}\right),\tag{41}$$
and then we have
$$\mathrm{MISE}\left[\tilde{f}_n\right]=\frac{5}{4}R(K)^{4/5}\left(\mu_2^2(K)I_1\right)^{1/5}n^{-4/5}+o\left(n^{-4/5}\right).$$
To estimate the optimal bandwidth (41), we must estimate and . As suggested by Hall and Marron [21], we use the following kernel estimator of : The following lemma gives the bias and variance of this estimator.
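A minimal sketch of the plug-in bandwidth (41) for the nonrecursive estimator, assuming a Gaussian kernel and taking an estimate of $\int(f'')^2$ as input; as a sanity check, plugging in the normal-reference value $3/(8\sqrt{\pi})$ recovers the familiar $1.06\,n^{-1/5}$ rule of thumb (for unit variance):

```python
import numpy as np

def rosenblatt_plugin_bandwidth(n, I1_hat):
    """Plug-in version of the MISE-optimal Rosenblatt bandwidth
    h_n = (R(K) / (mu_2(K)² I1))^{1/5} n^{-1/5}, with Gaussian-kernel constants."""
    R_K, mu2 = 1.0 / (2.0 * np.sqrt(np.pi)), 1.0
    return (R_K / (mu2**2 * I1_hat)) ** 0.2 * float(n) ** (-0.2)
```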

Lemma 10. *Suppose that the kernel satisfies assumption , , and assumption and satisfies assumption **
To minimize the of , the bandwidth must equal
**
then one has
*

Furthermore, to estimate , we use the following kernel estimator introduced in Hall and Marron [21]: The following lemma gives the bias and variance of .

Lemma 11. *Suppose that the kernel satisfies assumption , , and assumption and satisfies assumption **
To minimize the of , the bandwidth must equal
**
then one has
**
It follows then that
*

Then the plug-in estimator of the bandwidth using the nonrecursive estimator (5) must equal The following corollary gives the expected ISE of the recursive estimator (3) and of the nonrecursive estimator (5).

Corollary 12. *Let the assumptions of Theorem 7 hold. Then
**
where
*

The following theorem gives the conditions under which the expected ISE of the recursive estimator (3) will be smaller than the expected ISE of the nonrecursive estimator (5).

Theorem 13. *Let the assumptions of Theorem 7 hold, and let the bandwidth equal (37) and the stepsize equal $\left(n^{-1}\right)$. One has
*

Then, the expected ISE of the recursive estimator defined by (3) is smaller than the expected ISE of the nonrecursive estimator defined by (5) in small-sample settings.

#### 3. Simulation

The aim of our simulation study is to compare the performance of the nonrecursive Rosenblatt estimator defined in (5) with that of the recursive estimator defined in (3).

When applying the recursive estimator (3), one needs to choose three quantities.

(i) For the function $K$, we choose the normal kernel.

(ii) The stepsize is chosen to be equal to $\left(\gamma_n\right)=\left(n^{-1}\right)$.

(iii) The bandwidth is chosen to be equal to (37).

When applying the nonrecursive estimator (5), one needs to choose two quantities.

(i) For the function $K$, as in the recursive framework, we use the normal kernel.

(ii) The bandwidth is chosen to be equal to (52).

In order to investigate the comparison between the two estimators, we consider two densities of $X$: the standard normal distribution (see Table 1) and the exponential distribution (see Table 2). For each of these two cases, samples of sizes and were generated. For each fixed bandwidth, we computed the mean and the standard deviation (over the samples) of , , , and . The plug-in estimators (37) and (52) require two kernels to estimate and ; in both cases we use the normal kernel, with and given in (19) with equal, respectively, to and . Both tables show the following: the bias and the standard deviation of obtained with the recursive algorithm (3) are very similar to those obtained with the nonrecursive estimator (5); the bias and the standard deviation of obtained with the recursive algorithm (3) are always smaller than those obtained with the nonrecursive estimator (5); the mean and the standard deviation of the bandwidths selected by the recursive estimator (3) are always smaller than those of the bandwidths selected by the nonrecursive estimator (5); and the mean and the standard deviation of the ISE corresponding to the bandwidths selected by the recursive estimator (3) are always smaller than those corresponding to the bandwidths selected by the nonrecursive estimator (5). In Tables 1 and 2, the Ref. column can be used as a reference for the mean of and .
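A condensed sketch of this kind of experiment for a single sample (standard normal target, Gaussian kernel, and the normal-reference value of $\int(f'')^2$ used in place of the estimated plug-in quantities, so the bandwidth constants are illustrative):

```python
import numpy as np

def K(u):  # Gaussian kernel; also the N(0,1) target density
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def recursive_on_grid(grid, data, c):
    """Recursive estimator (gamma_n = 1/n, h_n = c n^{-1/5}) evaluated on a grid."""
    f = np.zeros_like(grid)
    for n, X in enumerate(data, start=1):
        h = c * n ** (-0.2)
        f = (1.0 - 1.0 / n) * f + (1.0 / n) * K((grid - X) / h) / h
    return f

def rosenblatt_on_grid(grid, data, h):
    return K((grid[:, None] - data[None, :]) / h).mean(axis=1) / h

rng = np.random.default_rng(42)
data = rng.normal(size=100)
grid, dx = np.linspace(-5.0, 5.0, 1001, retstep=True)
truth = K(grid)
I1 = 3.0 / (8.0 * np.sqrt(np.pi))          # normal-reference ∫(f'')²
R_K = 1.0 / (2.0 * np.sqrt(np.pi))
c_rec = (0.3 * R_K / I1) ** 0.2            # recursive bandwidth constant
h_ros = (R_K / I1) ** 0.2 * 100 ** (-0.2)  # Rosenblatt optimal bandwidth
ise_rec = np.sum((recursive_on_grid(grid, data, c_rec) - truth) ** 2) * dx
ise_ros = np.sum((rosenblatt_on_grid(grid, data, h_ros) - truth) ** 2) * dx
```

In the actual study, such ISE values would be averaged over many replicated samples and over both target densities.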

Figures 1 and 2 show boxplots of the bandwidths selected by the two algorithms (3) and (5), respectively. For samples of size and , the bandwidths selected by the recursive estimator (3) are always smaller than those selected by the nonrecursive estimator (5).


Figures 3 and 4 show boxplots of the expected ISE obtained with the two algorithms (3) and (5), respectively. For samples of size and , the expected ISE corresponding to the bandwidths selected by the recursive estimator (3) is always smaller than that corresponding to the bandwidths selected by the nonrecursive estimator (5).


In order to give some comparative elements with the nonrecursive estimator (5), including computational costs, we consider samples of size generated from a standard normal distribution $\mathcal{N}(0,1)$; moreover, we suppose that we receive additional samples of size , also generated from a standard normal distribution $\mathcal{N}(0,1)$. Performing the two methods, the running time using the recursive estimator defined by algorithm (2), with stepsize $\left(\gamma_n\right)=\left(n^{-1}\right)$ and the bandwidth given in (37), was roughly 6880 s, while the running time using the nonrecursive estimator defined by (5), with the bandwidth given in (52), was roughly 14080 s on the author's workstation.
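This gap reflects the fact that the recursive estimator absorbs each new observation with a single update, while the nonrecursive estimator must revisit all stored observations. A sketch of the online update (Gaussian kernel, $\gamma_n=n^{-1}$, $h_n=cn^{-1/5}$ with an illustrative constant $c$):

```python
import numpy as np

def K(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

class RecursiveKDE:
    """Maintains f_n on a fixed grid of m points: absorbing one new observation
    costs O(m), independent of how many points have been seen so far, whereas
    recomputing the nonrecursive estimator (5) from scratch costs O(n m)."""

    def __init__(self, grid, c=0.83):
        self.grid = np.asarray(grid, dtype=float)
        self.c = c                 # bandwidth constant: h_n = c n^{-1/5}
        self.n = 0
        self.f = np.zeros_like(self.grid)   # f_0 = 0

    def update(self, X):
        self.n += 1
        gamma = 1.0 / self.n       # stepsize gamma_n = 1/n
        h = self.c * self.n ** (-0.2)
        self.f = (1.0 - gamma) * self.f + gamma * K((self.grid - X) / h) / h
        return self.f
```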

This simulation study shows the good performance of the recursive estimator defined by algorithm (3), with stepsize $\left(\gamma_n\right)=\left(n^{-1}\right)$ and the bandwidth given in (37), in small-sample settings.

#### 4. Conclusion

In this paper we proposed an automatic bandwidth selection procedure for the recursive kernel estimators of a probability density function defined by the stochastic approximation algorithm (2). We showed that, using the selected bandwidth and the stepsize $\left(\gamma_n\right)=\left(n^{-1}\right)$ (the stepsize which minimizes the MISE within the class of recursive estimators defined in Mokkadem et al. [9]), the recursive estimator outperforms the nonrecursive one in small-sample settings. The simulation study corroborated these theoretical results. Moreover, the simulation results indicate that the proposed recursive estimator is computationally more efficient than the nonrecursive estimator.

In conclusion, the proposed method allowed us to obtain better results than the nonrecursive estimator proposed by Rosenblatt [14] in small-sample settings. In future work, we plan to extend our method to the case of a regression function, as in Härdle and Marron [22], in a recursive way (see Mokkadem et al. [23]), and to the case of time series, as in Hart and Vieu [24], also in a recursive way.

#### Conflict of Interests

The author declares that there is no conflict of interests regarding the publication of this paper.

#### Acknowledgments

The author is grateful to the editor and to the referee for their helpful comments, which have led to this substantially improved version of the paper.