Journal of Probability and Statistics

Volume 2015, Article ID 242683, 21 pages

http://dx.doi.org/10.1155/2015/242683

## Optimal Bandwidth Selection for Kernel Density Functionals Estimation

Department of Mathematical Sciences, The University of Memphis, Memphis, TN 38152, USA

Received 10 April 2015; Revised 19 June 2015; Accepted 21 June 2015

Academic Editor: Ricardas Zitikis

Copyright © 2015 Su Chen. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

The choice of bandwidth is crucial to kernel density estimation (KDE) and kernel-based regression. Various bandwidth selection methods for KDE and local least squares regression have been developed in the past decade. It is known that scale and location parameters are proportional to density functionals with appropriate choice of and, furthermore, that tests of equality of scale and location can be transformed into comparisons of the density functionals among populations. can be estimated nonparametrically via kernel density functionals estimation (KDFE). However, optimal bandwidth selection for the KDFE of has not been examined. We propose a method to select the optimal bandwidth for the KDFE. The idea underlying this method is to search for the optimal bandwidth by minimizing the mean square error (MSE) of the KDFE. Two practical bandwidth selection techniques for the KDFE of are provided: Normal scale bandwidth selection (namely, “Rule of Thumb”) and direct plug-in bandwidth selection. Simulation studies show that the proposed bandwidth selection methods are superior to existing density estimation bandwidth selection methods in estimating density functionals.

#### 1. Introduction

Suppose that a random variable with a probability density function (p.d.f.) belongs to a location-scale family. Let and be the location and scale parameters of , respectively. We have for some base function . If is a symmetric function, then is usually chosen from the same class of distributions with mean zero. For instance, if is the p.d.f. of a Normal distribution with mean and standard deviation , then is usually chosen to be the density of the standard Normal distribution. In the nonparametric setting, is not assumed to have any prespecified distributional form. Therefore, and are unknown and cannot be estimated by any distribution-based method such as maximum likelihood estimation (MLE). Ahmad [1] proposed nonparametric kernel estimation of location and scale parameters via density functionals estimation with known base functions. The location and scale functions are written in terms of density functionals as follows:

Apparently, the location and scale rely only on two functionals of the unknown density , namely, and , if is known. Ahmad [1] showed that the new kernel location and scale estimates have better asymptotic properties than the MLE. Simulation results in Ahmad and Amezziane [2], a subsequent work to Ahmad [1], indicated that the kernel location and scale estimators have variability comparable to that of the MLE and smaller than that of Huber’s M-estimator. However, it is usually difficult or impossible to know the base density, especially in the nonparametric setting. Moreover, can be derived in terms of and if the base density is given. In this case, the problem becomes parametric and the MLE can be considered. From this point of view, Ahmad’s scale and location estimates are not very practical in real-world applications because the base density function needs to be known first.

Chen [3] proposed kernel-based nonparametric tests of equality of scale and location parameters among populations based on the kernel scale and location estimators proposed by Ahmad [1]. Testing is equivalent to testing according to (1), where and are the scale and density function of the th population, respectively, and . Likewise, is equivalent to by (3) if a homogeneous scale is assumed. This fact motivated Chen [3] to build test statistics for equality of scale and location on the density functionals estimation of and , respectively. Chen [3] gave new life to the two kernel density functionals estimations, which were originally introduced by Ahmad [1] to estimate location and scale parameters. When comparing the scale (or location) parameters among populations, the differences in scale (or location) are completely determined by (or ), and becomes irrelevant if we assume the populations are from the same distributional family and differ only in location and/or scale. Thus the requirement in kernel scale and location estimation that the base density be known was successfully dropped. Finding good estimates of the density functionals and is our next concern.

Aubuchon and Hettmansperger [4] proposed a kernel estimator of by convolving the kernel density estimate with the empirical CDF and showed its asymptotic equivalence to Lehmann’s estimator (see Lehmann [5] for details) based on the Wilcoxon confidence interval. Ahmad [1] provided two approaches to estimating and : one is similar to that of Aubuchon and Hettmansperger [4], and the other approximates the density function with an orthogonal series expansion and then estimates the functionals of the density. Grübel [6] estimated the density functionals for known under certain conditions through the kernel density estimate of the unknown .

Choice of bandwidth (window width or smoothing parameter) is crucial for every kernel-based procedure, such as kernel density estimation and kernel regression. A vast literature has been devoted to choosing practical optimal bandwidths for techniques built on kernel estimation. Representative surveys of bandwidth selection techniques can be found in Bowman [10], Jones et al. [11], Loader [12], and Wand and Jones [13]. Jones et al. [11] grouped data-based bandwidth selection methods for density estimation into “first generation” and “second generation” methods. The first generation methods, including least squares cross-validation (LSCV) in Bowman [8] and biased cross-validation (BCV) in Scott and Terrell [14], suffer from a slow relative rate of convergence to of order . Härdle and Marron [15] applied the least squares cross-validation idea to bandwidth selection for the Nadaraya-Watson estimator. The second generation methods are mainly based on plug-in techniques. The idea of “plug-in” is to replace with a consistent estimate, an idea first proposed by Nadaraya [16] and Woodroofe [17]; however, the practical choice of pilot bandwidth was not discussed. Sheather and Jones [9] proposed a refined plug-in method, the so-called “solve-the-equation (STE)” plug-in , which has a faster rate of convergence of order than cross-validation estimators. Smoothed cross-validation (SCV), developed by Hall et al. [18], is also a plug-in type method with pilot bandwidth of the form . Müller [19] and Staniswalis [20] employed the idea in kernel regression. Hall et al. [21] constructed root- bandwidth selectors and achieved the optimal relative rate of convergence by appropriate choice of the parameters in the pilot bandwidth . Gasser et al. [22] and Ruppert et al. [23] carried the simple direct plug-in idea over to local linear regression. Fan and Gijbels [24] applied “cross-validation technique, the Normal-reference method, and the plug-in approach” for the density estimation setting to their corresponding bandwidth selectors for local polynomial regression.

However, few studies address the optimal bandwidth for the estimation of and , the two density functionals that are important for estimating location and scale parameters, as discussed above. Aubuchon and Hettmansperger [4] chose the bandwidth by removing the bias term. Grübel [6] suggested using the MISE-optimal choice of bandwidth in kernel density estimation. Chen [3] used the least squares cross-validation bandwidth selection method for density estimation. In this paper, we derive the optimal bandwidth for kernel location and scale estimation by minimizing the MSE of the kernel functionals estimation for and . We also propose two practical bandwidth selection methods and compare them with various bandwidth selectors for kernel density estimation, such as Rule-of-Thumb (ROT), direct plug-in (DPI), least squares cross-validation (LSCV), and biased cross-validation (BCV).

For simplicity of illustration, a unified format (i.e., ) of the two density functionals mentioned above will be used throughout the paper. When and , it equals and , respectively. The paper is organized as follows. The optimal bandwidth for estimation in terms of AMSE criterion is derived in Section 2.1. Two practical bandwidth selection methods for are provided in Sections 2.2 and 2.3 when and . Asymptotic distribution of direct plug-in bandwidth for kernel functionals estimation of is given in Section 2.3 as well. Section 3 conducts three simulation studies to explore the properties of proposed bandwidth selection methods and evaluate their performance compared to several classical bandwidth selection methods for kernel density estimation.

#### 2. Main Results

##### 2.1. Optimal Bandwidth Selection

Define and . Let us write and as a more general density functional . Note that and are special cases of , where is and , respectively. Suppose are independent random variables from a distribution with density function , where is unknown. Similar to Aubuchon and Hettmansperger [4] and Grübel [6], we obtain the kernel density functionals estimate of by , where is the kernel density estimate of and is the empirical CDF. Thus, a kernel density functionals estimate of is given by

where and is the kernel function (details can be found in Wand’s book). The following theorem provides the mean and variance of in (4) for fixed .
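The construction described above (a kernel density estimate integrated against the empirical CDF) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's code: it assumes the functional has the form ∫ w(x) f(x)² dx with w(x) = 1 for the scale functional and w(x) = x for the location functional, and it uses a Gaussian kernel. The weight notation `w`, the function names, and the argument names are hypothetical, since the extracted text does not show the paper's symbols.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard Normal density, used here as the kernel K (an assumption)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kdfe(x, h, weight=lambda t: np.ones_like(t)):
    """Kernel density functional estimate (1/n) * sum_i weight(X_i) * fhat(X_i).

    fhat is the kernel density estimate with bandwidth h. Averaging over the
    sample plays the role of integrating against the empirical CDF.
    The diagonal terms (i == j) are kept in the double sum.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    u = (x[:, None] - x[None, :]) / h               # n x n scaled differences
    fhat = gaussian_kernel(u).sum(axis=1) / (n * h)  # KDE evaluated at each X_i
    return float(np.mean(weight(x) * fhat))

# Usage: scale functional (weight 1) and location functional (weight x).
rng = np.random.default_rng(0)
sample = rng.normal(loc=1.0, scale=1.0, size=200)
scale_functional = kdfe(sample, h=0.4)
location_functional = kdfe(sample, h=0.4, weight=lambda t: t)
```

Because the double sum includes the diagonal terms, the estimate carries a positive contribution of K(0)/(nh) when the weight is 1, which is one reason the bandwidth that suits this estimator can differ from the one that suits the density estimate itself.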

Theorem 1. *For in (4), the expected value and variance of are given by*

*where and .*

We prove this in Appendix A. The first term in (6) is nonnegative by Jensen’s inequality. Then the MSE of can be written as follows:

Therefore, the optimal bandwidth for density functionals estimation of is , the minimizer of . To obtain a closed form of the optimal bandwidth for kernel functionals estimation of , the minimizer of the asymptotic mean square error (AMSE) of is studied instead. The optimal bandwidth for estimation of with respect to the AMSE criterion is given by

where

However, in (8) is not computable since and depend on the unknown function . A quick and simple guess at the AMSE-optimal bandwidth is the “Normal scale” bandwidth. It gives reasonable answers whenever the data are close to Normal. In the next section, Normal scale bandwidth selection is studied for and , respectively.
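For orientation, the MSE criterion used here follows the standard bias-variance decomposition. The symbols below are assumed notation, since the extracted text does not show the paper's own: θ denotes the target functional and θ̂_h its kernel estimate at bandwidth h. This is the generic identity only, not a reconstruction of the paper's displayed MSE, which expresses the same quantity through the mean and variance in Theorem 1.

```latex
\operatorname{MSE}\bigl(\hat{\theta}_h\bigr)
  = \operatorname{Var}\bigl(\hat{\theta}_h\bigr)
  + \Bigl(\operatorname{E}\bigl[\hat{\theta}_h\bigr] - \theta\Bigr)^{2},
\qquad
h_{\mathrm{MSE}} = \arg\min_{h > 0} \operatorname{MSE}\bigl(\hat{\theta}_h\bigr).
```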

##### 2.2. Normal Scale Bandwidth Selection

When , reduces to . By (8) in Section 2.1, the bandwidth that minimizes asymptotically is

where is the kernel density functionals estimation of .

Proposition 2. *If is Normal with mean and variance then the Normal scale AMSE-optimal bandwidth selector for is given by*

*where is some estimate of .*

The proof of Proposition 2 can be found in Appendix B. If the Gaussian kernel is chosen, that is, is the density of the standard Normal distribution, then and . Hence (11) simplifies to

which can be called the “Rule-of-Thumb” (ROT) bandwidth selector for kernel scale estimation.

When , becomes . By (8) in Section 2.1, the bandwidth selector that minimizes is

Proposition 3. *If is Normal with mean and variance then the Normal scale AMSE-optimal bandwidth selector for is given by*

*where is an estimate of and is an estimate of . If (sample standard deviation) and (sample mean) then (14) can be rewritten as*

*where is the coefficient of variation (CV). Particularly when for fixed , goes to infinity.*

The proof of Proposition 3 is given in Appendix C. When the kernel function is the density of the standard Normal distribution, the “Rule-of-Thumb” bandwidth selector for kernel location estimation is

Both (14) and (16) imply that the larger the location of is in absolute value, the smaller the optimal bandwidth needed. In other words, the optimal-AMSE bandwidth for goes to infinity as the location approaches zero. This fact applies not only to the Normal distribution with zero mean but also extends to any distribution whose p.d.f. is an even function (a distribution symmetric around zero).

Corollary 4. *For any distribution with even density function , that is, , the optimal-AMSE bandwidth selector is .*

*Remarks*. (1) The optimal bandwidth for estimation of is not affected by the location of , only by the scale parameter. However, the optimal bandwidth for estimation of depends not only on the scale but also on the location. This fact is illustrated by the simulation study in Section 3.1. Note that the scale parameter is determined by and the location parameter is determined by .

(2) The common choice of is the sample standard deviation, as in Silverman [7]. However, Wand and Jones [13] recommended the smaller of and the interquartile range . Janssen et al. [25] also studied other, more sophisticated estimates of .
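The robust scale estimate mentioned in Remark (2) is easy to compute. The sketch below assumes the usual form min(s, IQR/1.349), where 1.349 is the interquartile range of the standard Normal distribution. The divisor is an assumption here, because the extracted text does not show it, and the function name `robust_scale` is hypothetical.

```python
import numpy as np

def robust_scale(x):
    """Return min(sample standard deviation, IQR / 1.349).

    Dividing the interquartile range by 1.349 (assumed constant) makes it
    comparable to the standard deviation when the data are Normal.
    """
    x = np.asarray(x, dtype=float)
    s = np.std(x, ddof=1)                    # sample standard deviation
    q75, q25 = np.percentile(x, [75, 25])    # upper and lower quartiles
    return float(min(s, (q75 - q25) / 1.349))

# Usage: a single outlier inflates s but leaves the IQR-based value unchanged.
clean = robust_scale([1.0, 2.0, 3.0, 4.0, 5.0])
contaminated = robust_scale([1.0, 2.0, 3.0, 4.0, 100.0])
```

Taking the minimum guards against a standard deviation inflated by heavy tails or outliers, which would otherwise push a Normal scale bandwidth toward oversmoothing.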

##### 2.3. Direct Plug-In Bandwidth Selection

If the distribution of the ’s, that is, , departs far from the Normal distribution, then the Normal scale bandwidth selector will be problematic. Note that and in (8) are unknown and need to be estimated to obtain a practical optimal bandwidth selector. A natural estimate of is

Similarly, can be estimated by

Replacing and by and leads to the *direct plug-in* (DPI) bandwidth selector for :

Obviously, the kernel density functionals estimates in (17) and (18) rely on the choice of pilot bandwidth . Simple candidates for the pilot bandwidth are the *Normal scale* bandwidth selector proposed in Section 2.2 for or smoothing parameters for the traditional density estimate (e.g., ROT, LSCV, BCV, and DPI, surveyed in Wand and Jones [13]). The DPI bandwidth selection can be computed in practice through the following procedure.

*Step 1.* Estimate using the Normal scale bandwidth proposed in Section 2.2 (i.e., for estimation of and for estimation of ) or a bandwidth selector for density estimation (such as ROT [7], LSCV [8], BCV [14], and DPI [9]).

*Step 2.* Estimate and using in (17) and in (18).

*Step 3.* The DPI bandwidth selector for is then obtained from (19).

The performance of these pilot bandwidth selections is compared in terms of the MSE of through Monte Carlo simulation in Section 3.2 (*Simulation Study 2*). Next, we study the asymptotic distribution of . The limiting distribution of a practical bandwidth selector matters because its rate of convergence is the chief concern.

Proposition 5. *If and density function are continuous and satisfy and , then*

The proof of Proposition 5 is provided in Appendix D. Thus the direct plug-in bandwidth selection for functional density estimation has a relative convergence rate of order .

*Remarks*. Particularly, when , the DPI bandwidth selector for estimation of is , where and . Likewise, when , the DPI bandwidth selector for estimation of is , where and .

#### 3. Simulation Study

Three simulation studies are carried out to evaluate [Simulation Study 1] the accuracy of and (Normal scale bandwidths for and ) compared with and under the normality assumption; [Simulation Study 2] the optimal choices of pilot bandwidth for and in terms of the MSE of and , respectively; and [Simulation Study 3] the performance of the proposed practical optimal bandwidth selection methods (ROT and DPI, proposed in Sections 2.2 and 2.3) versus traditional (classical) bandwidth selection for the kernel density estimate in terms of the MSE of . As to the choice of kernel function , it has been shown in the literature that the choice of bandwidth overrides the effect of the choice of kernel function. So, for simplicity, we use the Gaussian kernel in all three simulation studies.

##### 3.1. Simulation Study 1

The purpose of this study is threefold: (1) to evaluate the performance of and when samples are from a Normal distribution, (2) to study and (the optimal bandwidths that minimize the MSE of and , resp.) in terms of the location parameter of the Normal distribution, and (3) to illustrate numerically that the optimal bandwidth that minimizes the MSE of goes to infinity as the location parameter gets closer to zero.

Figure 1 plots the MSE of versus the choice of bandwidth when samples of sizes 20, 50, 100, and 200 are drawn from and , respectively (the simulation result is not sensitive to the choice of scale). The blue curve in each subplot represents the MSE() as the bandwidth ranges from 0 to 2. The minimum point of the blue curve indicates . The red vertical line in each subplot represents and is computed from (10) by replacing with the p.d.f. of , where in Figure 1(a) and in Figure 1(b). is an estimate of (an asymptotic approximation of ) under the normality assumption. Simulation results in Figure 1 show that tends to have small variance and to stabilize around the true for Normal data. The optimal bandwidth does not change with the location parameter, as shown in Figure 1(a) () and Figure 1(b) () (more simulation results based on location parameters other than 0 and 1 are available upon request).
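An MSE-versus-bandwidth curve of the kind shown in Figure 1 can be approximated with a small Monte Carlo experiment. The sketch below is an illustration under stated assumptions rather than the paper's simulation code: it takes the scale functional to be ∫ f(x)² dx, whose exact value for a Normal density with standard deviation σ is 1/(2σ√π), and it uses a Gaussian kernel. The function names, the 200 replications, the sample size of 50, and the grid starting at 0.05 (to avoid dividing by zero) are all hypothetical choices.

```python
import numpy as np

def kdfe_scale(x, h):
    """Gaussian-kernel estimate of the integral of f^2 (diagonal terms kept)."""
    n = x.size
    u = (x[:, None] - x[None, :]) / h
    return np.exp(-0.5 * u**2).sum() / (np.sqrt(2.0 * np.pi) * n**2 * h)

def mc_mse_curve(n, bandwidths, reps=200, mu=0.0, sigma=1.0, seed=0):
    """Monte Carlo MSE of kdfe_scale at each bandwidth for N(mu, sigma^2) data."""
    rng = np.random.default_rng(seed)
    theta = 1.0 / (2.0 * sigma * np.sqrt(np.pi))  # exact integral of f^2
    sq_err = np.zeros(len(bandwidths))
    for _ in range(reps):
        x = rng.normal(mu, sigma, size=n)
        est = np.array([kdfe_scale(x, h) for h in bandwidths])
        sq_err += (est - theta) ** 2
    return sq_err / reps

# Usage: locate the empirical MSE-minimizing bandwidth on a grid.
bandwidths = np.linspace(0.05, 2.0, 40)
mse = mc_mse_curve(n=50, bandwidths=bandwidths)
h_best = bandwidths[np.argmin(mse)]
```

Because this estimator depends on the data only through pairwise differences, rerunning with a different `mu` and the same seed gives the same curve up to rounding, which matches the observation that the optimal bandwidth for the scale functional does not change with the location parameter.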