Journal of Probability and Statistics
Volume 2015 (2015), Article ID 242683, 21 pages
http://dx.doi.org/10.1155/2015/242683
Research Article

Optimal Bandwidth Selection for Kernel Density Functionals Estimation

Department of Mathematical Sciences, The University of Memphis, Memphis, TN 38152, USA

Received 10 April 2015; Revised 19 June 2015; Accepted 21 June 2015

Academic Editor: Ricardas Zitikis

Copyright © 2015 Su Chen. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

The choice of bandwidth is crucial to kernel density estimation (KDE) and kernel-based regression. Various bandwidth selection methods for KDE and local least squares regression have been developed in the past decade. It is known that scale and location parameters are proportional to density functionals with appropriate choice of and, furthermore, that tests of equality of scale and location can be transformed into comparisons of the density functionals among populations. can be estimated nonparametrically via kernel density functionals estimation (KDFE). However, optimal bandwidth selection for the KDFE of has not been examined. We propose a method to select the optimal bandwidth for the KDFE. The idea underlying this method is to search for the optimal bandwidth by minimizing the mean square error (MSE) of the KDFE. Two practical bandwidth selection techniques for the KDFE of are provided: Normal scale bandwidth selection (namely, the “Rule of Thumb”) and direct plug-in bandwidth selection. Simulation studies show that the proposed bandwidth selection methods are superior to existing density estimation bandwidth selection methods in estimating density functionals.

1. Introduction

Suppose that a random variable with a probability density function (p.d.f.) belongs to a location-scale family. Let and be the location and scale parameters of , respectively. We have for some base function . If is a symmetric function, then is usually chosen to be from the same class of distributions with mean zero. For instance, if is the p.d.f. of a Normal distribution with mean and standard deviation , then is usually chosen to be the density of the standard Normal distribution. In the nonparametric setting, is not assumed to have any prespecified distributional form. Therefore, and are unknown and cannot be estimated by any distribution-based method such as maximum likelihood estimation (MLE). Ahmad [1] proposed a nonparametric kernel estimation of location and scale parameters via density functionals estimation with known base functions. The location and scale parameters are written in terms of density functionals in (1)–(3). Apparently, the location and scale rely only on two functionals of the unknown density , namely, and , if is known. Ahmad [1] showed that the new kernel location and scale estimates had better asymptotic properties than the MLE. Simulation results in Ahmad and Amezziane [2], a subsequent work of Ahmad [1], indicated that the kernel location and scale estimators have variability comparable to that of the MLE and smaller than that of Huber’s M-estimator. However, it is usually difficult or impossible to know the base density, especially in the nonparametric setting. Moreover can be derived in terms of and if the base density is given. In this case, it becomes a parametric situation and the MLE can be considered. From this point of view, Ahmad’s scale and location estimates are not very practical in real-world applications because the base density function needs to be known first.

Chen [3] proposed kernel-based nonparametric tests of equality of scale and location parameters among populations based on the kernel scale and location estimators proposed by Ahmad [1]. Testing is equivalent to testing according to (1), where and are the scale and density function of the th population, respectively, and . Likewise, is equivalent to by (3) if homogeneous scale is assumed. This fact motivated Chen [3] to build test statistics for equality of scale and location on the density functionals estimation of and , respectively. Chen [3] brought new life to the two kernel density functionals estimates, which were originally introduced by Ahmad [1] to estimate location and scale parameters. When comparing the scale (or location) parameters among populations, the differences in scale (or location) are completely determined by (or ), and becomes irrelevant if we assume that the populations are from the same distributional family and differ only in location and/or scale. Thus the requirement of knowing the base density, as needed in kernel scale and location estimation, was successfully dropped. Finding a good estimate of the density functionals and becomes our next concern.

Aubuchon and Hettmansperger [4] proposed a kernel estimation of by a convolution of kernel density estimation function with the empirical CDF and showed its asymptotic equivalence to Lehmann’s estimator (see Lehmann [5] for details) based on the Wilcoxon confidence interval. Ahmad [1] provided two approaches to estimate and , one is similar to Aubuchon and Hettmansperger [4] and the other is to approximate the density function with an orthogonal series expansion, and then estimate the functionals of density. Grübel [6] estimated the density functionals for known under certain conditions through the kernel density estimate of the unknown .

Choice of bandwidth (window width or smoothing parameter) is crucial for every kernel-based procedure, such as kernel density estimation and kernel regression. A vast literature has been devoted to choosing practical optimal bandwidths for techniques built on kernel estimation. Representative surveys of bandwidth selection techniques can be found in Bowman [10], Jones et al. [11], Loader [12], and Wand and Jones [13]. Jones et al. [11] grouped data-based bandwidth selection methods for density estimation into “first generation” and “second generation” methods. The first generation methods, including least-squares cross-validation (LSCV) in Bowman [8] and biased cross-validation in Scott and Terrell [14], suffer from a slow relative rate of convergence to of order . Härdle and Marron [15] applied the least-squares cross-validation idea to bandwidth selection for the Nadaraya-Watson estimator. The second generation methods are mainly based on plug-in techniques. The idea of “plug-in” is to replace with a consistent estimate, as first proposed by Nadaraya [16] and Woodroofe [17]; however, the practical choice of pilot bandwidth was not discussed. Sheather and Jones [9] proposed a refined plug-in method, the so-called “solve-the-equation (STE)” plug-in , which has a faster rate of convergence of order than cross-validation estimators. Smoothed cross-validation (SCV), developed by Hall et al. [18], is also a plug-in type method with pilot bandwidth of the form . Müller [19] and Staniswalis [20] employed the idea in kernel regression. Hall et al. [21] constructed root- bandwidth selectors and achieved the optimal relative rate of convergence by an appropriate choice of the parameters in the pilot bandwidth . Gasser et al. [22] and Ruppert et al. [23] carried the simple direct plug-in idea over to local linear regression. Fan and Gijbels [24] adapted the “cross-validation technique, the Normal-reference method, and the plug-in approach” from the density estimation setting to the corresponding bandwidth selectors for local polynomial regression.

However, few studies address the optimal bandwidth for the estimation of and , which are two important density functionals for estimating location and scale parameters, as discussed in the preceding paragraphs. Aubuchon and Hettmansperger [4] chose the bandwidth by removing the bias term. Grübel [6] suggested using the MISE-optimal choice of bandwidth in kernel density estimation. Chen [3] used the least-squares cross-validation bandwidth selection method for density estimation. In this paper, we derive the optimal bandwidth for kernel location and scale estimation by minimizing the MSE of the kernel functionals estimates of and . We also propose two practical bandwidth selection methods and compare them with various bandwidth selectors for kernel density estimation, such as Rule-of-Thumb (ROT), direct plug-in (DPI), least-squares cross-validation (LSCV), and biased cross-validation (BCV).

For simplicity of illustration, a unified format (i.e., ) of the two density functionals mentioned above will be used throughout the paper. When and , it equals and , respectively. The paper is organized as follows. The optimal bandwidth for estimation in terms of AMSE criterion is derived in Section 2.1. Two practical bandwidth selection methods for are provided in Sections 2.2 and 2.3 when and . Asymptotic distribution of direct plug-in bandwidth for kernel functionals estimation of is given in Section 2.3 as well. Section 3 conducts three simulation studies to explore the properties of proposed bandwidth selection methods and evaluate their performance compared to several classical bandwidth selection methods for kernel density estimation.

2. Main Results

2.1. Optimal Bandwidth Selection

Define and . Let us write and in terms of a more general density functional . Note that and are special cases of , where is and , respectively. Suppose are independent random variables from a distribution with density function , where is unknown. Similar to Aubuchon and Hettmansperger [4] and Grübel [6], we obtain the kernel density functionals estimate of by , where is the kernel density estimate of and is the empirical CDF. Thus, a kernel density functionals estimate of is given by (4), where and is the kernel function (details can be found in Wand and Jones [13]). The following theorem provides the mean and variance of in (4) for fixed .
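The estimator described above, the integral of a weight function times the kernel density estimate taken against the empirical CDF, can be sketched numerically. The following is a minimal sketch and not the paper's exact formula (4), which did not survive in this copy: it assumes a Gaussian kernel, keeps the i = j terms in the double sum, and uses the hypothetical names `kdfe`, `kde_at`, and `phi` (the weight function, constant for the scale functional and the identity for the location functional).

```python
import numpy as np

def gaussian_kernel(u):
    """Standard Normal density, used as the kernel K."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kde_at(points, sample, h):
    """Kernel density estimate f_hat(x) = (1/(n h)) * sum_j K((x - X_j)/h)."""
    u = (points[:, None] - sample[None, :]) / h
    return gaussian_kernel(u).mean(axis=1) / h

def kdfe(sample, h, phi=lambda x: np.ones_like(x)):
    """Integral of phi(x) * f_hat(x) against the empirical CDF,
    i.e. (1/n) * sum_i phi(X_i) * f_hat(X_i).
    Assumption: the i = j terms are kept in the double sum."""
    sample = np.asarray(sample, dtype=float)
    return float(np.mean(phi(sample) * kde_at(sample, sample, h)))
```

For example, `kdfe(x, 0.3)` estimates the integral of the squared density, and `kdfe(x, 0.3, phi=lambda t: t)` estimates the same integral weighted by x.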

Theorem 1. For in (4), the expected value and variance of are given by (5) and (6), where and .

We prove this in Appendix A. The first term in (6) is nonnegative by Jensen’s inequality. The MSE of can then be written as in (7). Therefore, the optimal bandwidth for density functionals estimation of is , the minimizer of . To obtain a closed form of the optimal bandwidth for kernel functionals estimation of , the minimizer of the asymptotic mean square error (AMSE) of is studied instead. The optimal bandwidth for estimation of with respect to the AMSE criterion is given by (8), with the accompanying quantities defined in (9). However, in (8) is not computable since and depend on the unknown function . A quick and simple guess of the AMSE-optimal bandwidth is the “Normal scale” bandwidth. It gives reasonable answers whenever the data are close to Normal. In the next section, Normal scale bandwidth selection is studied for and , respectively.

2.2. Normal Scale Bandwidth Selection

When , reduces to . By (8) in Section 2.1, the bandwidth that minimizes asymptotically is given by (10), where is the kernel density functionals estimate of .

Proposition 2. If is Normal with mean and variance, then the Normal scale AMSE-optimal bandwidth selector for is given by (11), where is some estimate of .

The proof of Proposition 2 can be found in Appendix B. If the Gaussian kernel is chosen, that is, is the density of the standard Normal distribution, then and . Hence (11) simplifies to (12), which can be called the “Rule-of-Thumb” (ROT) bandwidth selector for kernel scale estimation.

When , becomes . The bandwidth selector that minimizes is given by (13), which follows from (8) in Section 2.1.

Proposition 3. If is Normal with mean and variance, then the Normal scale AMSE-optimal bandwidth selector for is given by (14), where is an estimate of and is an estimate of . If (sample standard deviation) and (sample mean), then (14) can be rewritten as (15), where is the coefficient of variation (CV). In particular, when for fixed , goes to infinity.

The proof of Proposition 3 is given in Appendix C. When the kernel function is the density of the standard Normal distribution, the “Rule-of-Thumb” bandwidth selector for kernel location estimation is given by (16). Both (14) and (16) imply that the larger the location of is in absolute value, the smaller the optimal bandwidth needed. In other words, the AMSE-optimal bandwidth for goes to infinity as the location approaches zero. This fact applies not only to the Normal distribution with zero mean but also extends to any distribution whose p.d.f. is an even function (a distribution symmetric around zero).

Corollary 4. For any distribution with an even density function , that is, , the optimal-AMSE bandwidth selector is .

Remarks. (1) The optimal bandwidth for estimation of is not affected by the location of but only by the scale parameter. However, the optimal bandwidth for estimation of depends not only on the scale but also on the location. This fact is illustrated by the simulation study in Section 3.1. Note that the scale parameter is determined by and the location parameter is determined by .

(2) The common choice of is sample standard deviation as in Silverman [7]. However, Wand and Jones [13] recommended the smaller value between and interquartile range . Janssen et al. [25] also studied other more sophisticated estimates of .
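The two choices of scale estimate mentioned in Remark (2) can be compared with a short sketch. The rescaling of the interquartile range by 1.349 (the interquartile range of a standard Normal) is an assumption here, since the exact expression is missing from this copy; the function name `sigma_hat` is hypothetical.

```python
import numpy as np

def sigma_hat(sample, robust=True):
    """Scale estimate for a Normal scale (Rule-of-Thumb) bandwidth.

    robust=False: sample standard deviation.
    robust=True:  the smaller of the sample standard deviation and
                  IQR / 1.349 (assumed Normal-consistent rescaling).
    """
    x = np.asarray(sample, dtype=float)
    s = x.std(ddof=1)
    if not robust:
        return float(s)
    q75, q25 = np.percentile(x, [75, 25])
    return float(min(s, (q75 - q25) / 1.349))
```

With heavy-tailed data the two estimates can differ by an order of magnitude, which carries over directly to any bandwidth proportional to the scale estimate.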

2.3. Direct Plug-In Bandwidth Selection

If the distribution of the ’s, that is, , departs far from the Normal distribution, then the Normal scale bandwidth selector will be problematic. Note that and in (8) are unknown and need to be estimated to obtain a practical optimal bandwidth selector. A natural estimate of is given by (17). Similarly, can be estimated by (18). Replacing and by and leads to the direct plug-in (DPI) bandwidth selector for given in (19).

Obviously, the kernel density functionals estimates in (17) and (18) rely on the choice of pilot bandwidth . Simple candidates for pilot bandwidth are to use Normal scale bandwidth selector proposed in Section 2.2 for or smoothing parameters for traditional density estimate (e.g., ROT, LSCV, BCV, and DPI surveyed in Wand and Jones [13]). The DPI bandwidth selection can be practically computed through the following procedures.

Step 1. Estimate using the Normal scale bandwidth proposed in Section 2.2 (i.e., for estimation of and for estimation of ) or a bandwidth selector for density estimation (such as ROT [7], LSCV [8], BCV [14], and DPI [9]).

Step 2. Estimate and using in (17) and in (18).

Step 3. The DPI bandwidth selector for is then obtained from (19).
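Step 1 allows a classical density estimation bandwidth to serve as the pilot. As an illustration only, the sketch below computes two such pilots for a Gaussian kernel: a Normal-reference Rule-of-Thumb bandwidth and a grid-search least-squares cross-validation bandwidth. The constant 1.06, the 1.349 rescaling of the interquartile range, the grid search, and all function names are assumptions of this sketch, not the paper's specification; Steps 2 and 3 are not implemented because formulas (17)–(19) are not available in this copy.

```python
import numpy as np

def rot_density_bandwidth(sample):
    """Normal-reference bandwidth for Gaussian-kernel density estimation:
    1.06 * sigma_hat * n^(-1/5), with sigma_hat = min(sd, IQR / 1.349)."""
    x = np.asarray(sample, dtype=float)
    q75, q25 = np.percentile(x, [75, 25])
    sigma = min(x.std(ddof=1), (q75 - q25) / 1.349)
    return float(1.06 * sigma * x.size ** (-0.2))

def _normal_pdf(z, s):
    """Density of N(0, s^2) evaluated at z."""
    return np.exp(-0.5 * (z / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

def lscv_score(sample, h):
    """Least-squares cross-validation criterion for a Gaussian kernel:
    integral of f_hat^2 minus twice the mean leave-one-out density."""
    x = np.asarray(sample, dtype=float)
    n = x.size
    d = x[:, None] - x[None, :]
    # Convolution of two Gaussian kernels is Gaussian with sd sqrt(2) * h.
    int_f2 = _normal_pdf(d, np.sqrt(2.0) * h).sum() / n**2
    k = _normal_pdf(d, h)
    leave_one_out = (k.sum() - np.trace(k)) / (n * (n - 1))
    return float(int_f2 - 2.0 * leave_one_out)

def lscv_bandwidth(sample, grid):
    """Bandwidth on the grid that minimizes the LSCV criterion."""
    grid = np.asarray(grid, dtype=float)
    scores = [lscv_score(sample, h) for h in grid]
    return float(grid[int(np.argmin(scores))])
```

A grid search is used instead of a numerical optimizer because the LSCV criterion can have several local minima.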

The performance of these pilot bandwidth selections is compared in terms of the MSE of through Monte Carlo simulation in Section 3.2 (Simulation Study 2). Next, we study the asymptotic distribution of . The limiting distribution of a practical bandwidth selector is important because the rate of convergence is the chief concern.

Proposition 5. If and density function are continuous and satisfy and , then

The proof of Proposition 5 is provided in Appendix D. Thus the direct plug-in bandwidth selection for functional density estimation has relative convergence rate of order .

Remarks. Particularly, when , the DPI bandwidth selector for estimation of is , where and . Likewise, when , the DPI bandwidth selector for estimation of is , where and .

3. Simulation Study

Three simulation studies are carried out to evaluate [Simulation Study 1] the accuracy of and (Normal scale bandwidths for and ) compared with and under the normality assumption; [Simulation Study 2] the optimal choices of pilot bandwidth for and in terms of the MSE of and , respectively; [Simulation Study 3] the performance of the proposed practical optimal bandwidth selection methods (ROT and DPI proposed in Sections 2.2 and 2.3) versus traditional (classical) bandwidth selection for kernel density estimation in terms of the MSE of . As to the choice of kernel function , it has been shown in the literature that the choice of bandwidth overrides the effect of the choice of kernel function. So, for simplicity, we use the Gaussian kernel in all three simulation studies.

3.1. Simulation Study 1

The purpose of this study is threefold: (1) to evaluate the performance of and when samples are from the Normal distribution, (2) to study and (the optimal bandwidths that minimize the MSE of and , resp.) in terms of the location parameter of the Normal distribution, and (3) to illustrate numerically that the optimal bandwidth that minimizes the MSE of goes to infinity when the location parameter gets closer to zero.

Figure 1 plots the MSE of versus the choice of bandwidth when samples of sizes 20, 50, 100, and 200 are drawn from and , respectively (the simulation result is not sensitive to the choice of scale). The blue curve in each subplot represents the MSE() as the bandwidth ranges from 0 to 2. The minimum point of the blue curve indicates . The red vertical line in the subplot represents and is computed from (10) by replacing with the p.d.f. of , where in Figure 1(a) and in Figure 1(b). is an estimate of (an asymptotic approximation of ) under the normality assumption. Simulation results in Figure 1 show that tends to have small variance and to stabilize around the true for Normal data. The optimal bandwidth does not change with the location parameter, as shown in Figure 1(a) () and Figure 1(b) () (more simulation results based on location parameters other than 0 and 1 are available upon request).

Figure 1: The MSE of when underlying distribution is Normal with mean (a), (b) and standard deviation . The blue curve is the MSE() versus the bandwidth . The boxplot of is based on the 100 sets of simulated samples of size from Normal distribution with mean and standard deviation . The red vertical line represents if is Normal with known mean 0 and variance .
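An MSE-versus-bandwidth curve of the kind shown in Figure 1 can be approximated by Monte Carlo. The sketch below is illustrative only: it assumes a Gaussian kernel, an estimator that keeps the i = j terms, and the standard closed form 1/(2σ√π) for the integral of the squared Normal density as the target; the names `kdfe_scale` and `mse_curve` are hypothetical.

```python
import numpy as np

def kdfe_scale(sample, h):
    """(1/n) * sum_i f_hat(X_i) with a Gaussian kernel (i = j terms kept)."""
    d = (sample[:, None] - sample[None, :]) / h
    return float(np.exp(-0.5 * d**2).mean() / (h * np.sqrt(2.0 * np.pi)))

def mse_curve(n, mu, sigma, bandwidths, n_rep=100, seed=0):
    """Monte Carlo MSE of the scale functional estimate over a bandwidth
    grid, for Normal(mu, sigma^2) samples of size n."""
    rng = np.random.default_rng(seed)
    truth = 1.0 / (2.0 * sigma * np.sqrt(np.pi))  # integral of f^2 for a Normal
    sq_err = np.zeros(len(bandwidths))
    for _ in range(n_rep):
        x = rng.normal(mu, sigma, size=n)
        est = np.array([kdfe_scale(x, h) for h in bandwidths])
        sq_err += (est - truth) ** 2
    return sq_err / n_rep
```

Because the estimate depends on the data only through pairwise differences, running `mse_curve` with the same seed and two different values of `mu` returns the same curve up to rounding, which mirrors the location invariance reported for Figure 1.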

Figure 2 plots the MSE of versus the choice of bandwidth when samples of sizes 20, 50, 100, and 200 are drawn from , , , and , respectively. Similar to Figure 1, the blue curve and red vertical line represent MSE() and . The boxplot of is based on the 100 sets of simulated samples of size from the Normal distribution with mean and standard deviation . The red vertical line disappears in Figure 2(a) due to the fact that as . Also the MSE of (blue curve) strictly decreases as rises in Figure 2(a). To conclude from Figure 2(a), the optimal bandwidth for kernel functional estimation of goes to infinity when the mean of the underlying distribution is zero, which is consistent with Proposition 3. However, the boxplot in Figure 2(a) indicates that the distribution of is right-skewed with median, 1st quartile, and 3rd quartile around one, which is far from the true value as well as . When deviates slightly from zero, as in Figures 2(b) and 2(c), tends to be less variable (and less skewed), to overlap with (red vertical line), and to get closer to (valley of the blue curve), especially as the sample size grows. When increases to 1 and above, as shown in Figure 2(d), the median of , , and coincide when the sample size is 50 or more. More simulation results can be found in the Supplementary Material available online at http://dx.doi.org/10.1155/2015/242683.

Figure 2: The bandwidth for when underlying distribution is Normal with mean (a), (b), (c), and (d) and standard deviation .

3.2. Simulation Study 2

Several candidate bandwidth selection methods are available to serve as a pilot bandwidth , such as the classical bandwidth selection methods for kernel density estimation (described in Section 2.3, Step 1) and the optimal bandwidths for ( or ) based on the Normal scale reference, namely, “” and “” in Section 2.2. This subsection aims to study the pilot bandwidth for and required to estimate in (19) (note that is simplified to and when and , resp.). Five different choices of pilot bandwidth are studied in this subsection to estimate and for under five different underlying distributions: (i) Cauchy with location ( was set to 1 in the simulation study for estimation of and 5 for estimation of ) and scale 2/3, (ii) Generalized Pareto with location , scale 2/3, and shape 1, (iii) Normal with location and scale 2/3, (iv) Mix-Normal I, and (v) Mix-Normal II. Mix-Normal I and Mix-Normal II are weighted mixtures of two Normal distributions: Mix-Normal I is and Mix-Normal II is . The Cauchy distribution is a well-known fat-tailed symmetric distribution. Generalized Pareto with shape 1 is an extremely fat-tailed asymmetric (right-skewed) distribution. Density curves of the two Mix-Normal distributions are given in Figure 3. Both Mix-Normal I and Mix-Normal II are bimodal distributions, in contrast to unimodal distributions such as Cauchy, Generalized Pareto, and Normal. Mix-Normal I is a symmetric bimodal distribution and Mix-Normal II is an asymmetric (left-skewed) bimodal distribution. The motivation behind this choice of distributions is to see whether the optimal bandwidth is sensitive to skewness, extreme outliers, and complex shape of the distribution, in contrast to the Normal case.

Figure 3: Density curve of (a) Mix-Normal I: ; (b) Mix-Normal II: .
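Samples from the families used in this study can be drawn by inverse-CDF sampling. The following sketch is an illustration under assumptions: the function names are hypothetical, and the mixture weights, means, and standard deviations must be supplied by the caller because the Mix-Normal I and II parameters are not recoverable from this copy.

```python
import numpy as np

def sample_cauchy(rng, n, loc, scale):
    """Cauchy(loc, scale) via the inverse CDF."""
    u = rng.uniform(size=n)
    return loc + scale * np.tan(np.pi * (u - 0.5))

def sample_gen_pareto(rng, n, loc, scale, shape):
    """Generalized Pareto with nonzero shape via the inverse CDF:
    x = loc + scale * ((1 - u)^(-shape) - 1) / shape."""
    u = rng.uniform(size=n)
    return loc + scale * ((1.0 - u) ** (-shape) - 1.0) / shape

def sample_normal_mixture(rng, n, weights, means, sds):
    """Weighted mixture of Normal components (parameters are caller-supplied;
    the paper's Mix-Normal settings are not reproduced here)."""
    comp = rng.choice(len(weights), size=n, p=weights)
    return rng.normal(np.asarray(means)[comp], np.asarray(sds)[comp])
```

For instance, `sample_cauchy(np.random.default_rng(0), 100, 1.0, 2/3)` draws a Cauchy sample with location 1 and scale 2/3, the setting stated for the scale functional.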

Figures 4 and 5 compare five candidate pilot bandwidth selection methods in terms of the boxplots of the MSE of and when the sample size is 100. The five candidates for pilot bandwidth selection considered in this paper are (1) the Rule-of-Thumb bandwidth for and proposed in Section 2.2 (“1-ROT(s)” means ; “1-ROT(L)” means ); (2) the Rule-of-Thumb bandwidth for kernel density estimation (KDE) proposed in Scott [26] (“2-ROT(d)” means , where is the minimum of the standard deviation and the interquartile range); (3) the least-squares cross-validation bandwidth for density estimation proposed in Bowman [8] (“3-UCV(d)” means for density estimation); (4) the biased cross-validation bandwidth for density estimation proposed in Scott and Terrell [14] (“4-BCV(d)” means for density estimation); (5) the direct plug-in bandwidth for density estimation reported by Sheather [27] (“5-DPI(d)” means for density estimation). Figures 4 and 5 show that is the worst candidate for pilot bandwidth in the density functionals estimation . The pilot bandwidth choice gives the lowest MSE() for Cauchy and Normal samples, while results in the lowest MSE() for Generalized Pareto, Mix-Normal I, and Mix-Normal II samples. Similar conclusions hold for the pilot bandwidth choice for estimation of ; however, leads to a slightly smaller MSE() than for Mix-Normal I and Mix-Normal II samples. Simulation Study 1 illustrates that the Normal reference bandwidth (including and ) is not a reliable estimate when the location parameter of the underlying distribution is close to zero. Therefore, , rather than , is recommended to serve as the pilot bandwidth in estimation of .

Figure 4: MSE of in terms of the choice of pilot bandwidth for ; “1-ROT(s)” means ; “2-ROT(d)” means ; “3-UCV(d)” means ; “4-BCV(d)” means ; “5-DPI(d)” means .
Figure 5: MSE of in terms of the choice of pilot bandwidth for ; “1-ROT(L)” means ; “2-ROT(d)” means ; “3-UCV(d)” means ; “4-BCV(d)” means ; “5-DPI(d)” means .

3.3. Simulation Study 3

This section aims to evaluate the performance of our proposed bandwidth (or ) in Section 2.2 and (or ) in Section 2.3 and to compare them with the classical bandwidth selection methods for density estimation, that is, , , and , in estimation of (or ). Simulation Study 2 recommends and as pilot bandwidths to estimate the direct plug-in bandwidth for estimation of and among the 5 candidate pilot bandwidth methods and 5 different underlying distributions. Therefore, (or ) with these two pilot bandwidth selection methods, and , is considered separately in this subsection. The direct plug-in bandwidth for estimation of with pilot bandwidth is denoted by “” in Table 1 and “2a-DPI(s)” in Figure 6, and that with pilot bandwidth is denoted by “” in Table 1 and “2b-DPI(s)” in Figure 6.

Table 1: Optimal bandwidth selection in estimation of with comparison to classical bandwidth selection methods.

The summary statistics (mean, 1st quartile, median, and 3rd quartile) for the three proposed bandwidths and three classical bandwidths are provided in Tables 1 and 2 for estimation of and , respectively. Samples of size 50 and 100 from five different underlying distributions (3 unimodal distributions and 2 bimodal distributions), as in Simulation Study 2, are considered in both simulation studies of and estimation.

Table 2: Optimal bandwidth selection in estimation of with comparison to classical bandwidth selection methods.

In general, the optimal bandwidth for kernel density functionals estimation (estimation of and in this paper) is smaller than the one for kernel density estimation under the same sample size and underlying distribution, as shown in Tables 1 and 2, except for the least-squares cross-validation bandwidth for density estimation on Generalized Pareto samples. In other words, kernel density functionals estimation requires less smoothing, which exaggerates some characteristics of the sample. For instance, the location and scale estimation will be more sensitive to outliers than density estimation.

To evaluate the performance of our proposed bandwidth selection methods against classical bandwidth selection methods for density estimation in estimation of and , the MSEs of (and ) are computed and compared. Figures 6 and 7 show boxplots of the MSE of and , respectively, for the 6 bandwidths shown in Tables 1 and 2 under five different distributions: Normal, Cauchy, Generalized Pareto, Mix-Normal I, and Mix-Normal II. Both figures illustrate that (a) the MSE of and decreases as the sample size increases from 50 to 100; (b) the MSE of and is larger for samples from asymmetric distributions than from symmetric distributions, and larger for bimodal distributions than for unimodal distributions. Figure 6 suggests that (i) Normal scale bandwidths for both estimation of () and density estimation () lead to smaller relative to the other 4 types of bandwidth selection methods for Normal samples of size 50. When the sample size goes up to 100, outperforms with a smaller MSE; (ii) for Cauchy samples with location 1 and scale 2/3, becomes the worst choice in kernel density functionals estimation of , especially for relatively large sample sizes. Our proposed bandwidth results in the smallest MSE in this case; (iii) for Generalized Pareto samples with shape 1, location 1, and scale 2/3, both and perform very poorly. However, with pilot bandwidth gives the smallest MSE for Pareto samples, which can partly be explained by the fact that gives the second smallest MSE; (iv) for bimodally distributed samples (including Mix-Normal I and Mix-Normal II), the three proposed bandwidth selection methods for estimation of completely dominate the three classical density estimation bandwidth selection methods. Among the three proposed bandwidths, with pilot bandwidth gives the minimum .

Figure 6: The MSE of in terms of bandwidth selection ; “1-ROT(s)” means ; “2a-DPI(s)” means with pilot bandwidth ; “2b-DPI(s)” means with pilot bandwidth ; “3-ROT(d)” means ; “4-UCV(d)” means ; “5-DPI(d)” means .
Figure 7: The MSE of in terms of bandwidth selection ; “1-ROT(L)” means ; “2a-DPI(L)” means with pilot bandwidth ; “2b-DPI(L)” means with pilot bandwidth ; “3-ROT(d)” means ; “4-UCV(d)” means ; “5-DPI(d)” means .

Figure 7 compares the performance of the three proposed practical bandwidth selection methods with that of three classical density estimation bandwidths in terms of kernel density functionals estimation of . It shows that with pilot bandwidth outperforms the other 5 bandwidth methods in kernel density functionals estimation of for Normal and Cauchy samples. Similar to estimation of , the direct plug-in bandwidth designed for estimation of , that is, with pilot bandwidth , beats the other candidates for Generalized Pareto samples. The optimal bandwidth selection (with three practical estimates , , and ) for estimation of proposed in this paper performs significantly better than the bandwidth selection for density estimation in density functionals estimation for Mix-Normal distributions. Among the three proposed bandwidth selection methods for estimation of , works better than the others for a mixture of 2 Normal distributions, as indicated in Figure 7.

4. Discussion

The optimal bandwidth, along with three practical bandwidth selection methods for kernel density functionals estimation of the form , is discussed in this paper. This study is needed because and are the core components of scale and location, respectively. Chen [3] shed light on a novel field of nonparametric analytic methods for experimental design relying on kernel density functionals estimation of and . Simulation studies in Chen [3] found that kernel-based equality of scale and location tests built on estimation of outperform the traditional Levene’s test of variance and ANOVA test, respectively, in particular for fat-tailed distributions such as Cauchy. Our proposed bandwidth selection methods can be directly applied to Chen’s kernel-based equality of location test, a nonparametric analog of the ANOVA test, namely, the “kernel-based ANOVA test.” As discussed in Section 1, Chen [3] used the least-squares cross-validation bandwidth, which is designed for density estimation. However, the test statistics of the scale and location tests are constructed from the kernel functionals estimation of . Our proposed bandwidth will provide the “kernel-based ANOVA” test with a better kernel functionals estimate of , which in turn may improve the performance of the test. Kernel-based ANOVA, like other group comparison methods, has broad application in various fields, such as biomedical sciences, education, and psychology, to compare the differences among 2 or more groups. A real-life example in education by Mimoto and Zitikis [28] is to compare the quantitative abilities of science and nonscience majors.

To broaden the application of this kernel-based nonparametric experimental design and accelerate the methodology development, an optimal bandwidth for kernel density functionals estimation is needed to increase the accuracy of scale or location estimation and the related hypothesis tests and confidence intervals. is a function of the density, and hence the classical bandwidth selectors for kernel density estimation, such as ROT, UCV, BCV, and DPI, can serve as a choice of bandwidth in its kernel density functionals estimation described in (4). However, our simulation study shows that the proposed optimal bandwidth for is superior to the classical bandwidth selection for KDE in estimating and , the two integrals that determine the scale and location parameters. The simulation study also suggests that, in scale estimation, the proposed Rule-of-Thumb bandwidth is highly recommended for samples from symmetric unimodal distributions, whereas the direct plug-in bandwidth with pilot bandwidth is recommended for samples from asymmetric unimodal or bimodal distributions. In location estimation, the direct plug-in bandwidth with pilot bandwidth is still the best choice for asymmetric unimodal distributions, such as Generalized Pareto, whereas the direct plug-in bandwidth with pilot bandwidth is recommended for samples from symmetric unimodal distributions, such as Normal and Cauchy. The proposed Rule-of-Thumb bandwidth is recommended for samples from bimodal distributions, such as a mixture of two Normal distributions, in location estimation or location-related testing.

5. Conclusion

Kernel smoothing has been actively studied in density estimation and local kernel regression in the past decade. It is a modern data analytic method well suited to capturing local humps and valleys in distributions or regression functions. Few studies have applied kernel smoothing techniques to the analysis of experimental designs, even for the one-way ANOVA model. Ahmad [1] proposed a purely nonparametric method to estimate the location and scale parameters of any unknown distribution using kernel methods. Chen [3] presented a nonparametric, non-rank-based version of ANOVA using the kernel-based estimate of location (namely, the “ANDFE” test) and opened a door to kernel-based nonparametric techniques for experimental design analysis. Chen [3] revealed that the smoothing parameter (i.e., bandwidth) substantially affects the size and power of the ANDFE test.

This paper derived an optimal bandwidth for nonparametric kernel density functionals estimation of the location and scale of an unknown density, and thus for the ANDFE test of equality of location and scale. Two practical bandwidth selection methods, ROT and DPI, were then proposed for location and scale estimation. Compared to traditional bandwidth selection methods designed for kernel density estimation (e.g., Jones et al. [11]) or kernel regression (e.g., Fan and Gijbels [24]), our proposed methods are more accurate in estimating either of the two density functionals. Simulation studies showed that the proposed optimal bandwidth method, designed for kernel density functionals estimation of scale and location parameters, works better than classical bandwidth selection designed for kernel density estimation, in particular for bimodal distributions.

Appendices

A. Proof for Theorem 1

The estimator is a U-statistic, and thus by applying the properties of U-statistics in Lee [29], and can be computed as follows: where To compute the first term of the variance of , is calculated as follows: where . In order to compute the second term of , that is, the covariance term, needs to be simplified in the following equations: Thus the covariance term cov is given by . Therefore, (6) is proved.

B. Proof for Proposition 2

If is Normal with mean 0 and variance , then and in (10) can be calculated in terms of the scale parameter . Note that Thus needs to be estimated since it is unknown. Let be a consistent estimate of ; then the Normal scale bandwidth for follows as in (11).
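As a worked check of how such a functional reduces to the scale parameter, consider the integrated squared density. This is an illustration under the assumption that one of the functionals involved is the integral of f^2(x); the symbols f and \sigma below are introduced only for this example. For a Normal density with mean 0 and variance \sigma^2:

```latex
\int_{-\infty}^{\infty} f^{2}(x)\,dx
  = \frac{1}{2\pi\sigma^{2}} \int_{-\infty}^{\infty} e^{-x^{2}/\sigma^{2}}\,dx
  = \frac{1}{2\pi\sigma^{2}} \cdot \sigma\sqrt{\pi}
  = \frac{1}{2\sigma\sqrt{\pi}} .
```

The functional therefore depends on the data only through \sigma, which is why a consistent estimate of the scale is enough to make a Normal scale rule usable in practice.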

C. Proof for Proposition 3

If is Normal with mean and variance , then and in (13) can be calculated in terms of the scale parameter . Note that Thus The Normal scale bandwidth for then follows as in (14) by replacing and with their corresponding estimates.
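A similar worked check applies to a location-type functional. This illustration assumes the functional is the integral of x f^2(x), with f the Normal density with mean \mu and variance \sigma^2; the symbols are introduced only for this example. Since f^2 is proportional to a Normal density with mean \mu and variance \sigma^2/2:

```latex
f^{2}(x) = \frac{1}{2\sigma\sqrt{\pi}} \cdot
           \frac{1}{\sqrt{\pi}\,\sigma}\, e^{-(x-\mu)^{2}/\sigma^{2}},
\qquad
\int_{-\infty}^{\infty} x\, f^{2}(x)\,dx = \frac{\mu}{2\sigma\sqrt{\pi}} .
```

Under this assumption the functional depends on both \mu and \sigma, which matches the need to replace both parameters with estimates.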

D. Proof for Proposition 5

The direct plug-in bandwidth in (19) can be written as where . An analysis of the errors involved in the approximation of by leads to Then the relative error of is By Taylor's theorem, where . Equation (D.4) shows that the convergence rate of depends on the density functionals estimation error of as well as the approximation error of to . and are U-statistics and thus can easily be shown to be approximately Normal. The ratio of and is also approximately Normal under certain regularity conditions, by Hayya et al. [30], with variance given as follows: If the Normal scale bandwidth selector proposed in Section 2.2 is used as the pilot bandwidth for the estimations and , then and . If the bandwidth for density estimation is used as the pilot bandwidth , then and . In both cases, the term dominates the density functionals estimation error . Thus .
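As a generic reference point, and not necessarily the exact expression used in the proof above, the usual first-order (delta-method) approximation to the variance of a ratio of two jointly approximately Normal estimators can be written as follows. The symbols A and B, their means \mu_A and \mu_B (with \mu_B \neq 0), variances \sigma_A^2 and \sigma_B^2, and covariance \sigma_{AB} are hypothetical names introduced for this illustration.

```latex
\operatorname{Var}\!\left(\frac{A}{B}\right)
  \approx \left(\frac{\mu_A}{\mu_B}\right)^{2}
  \left[
    \frac{\sigma_A^{2}}{\mu_A^{2}}
    - \frac{2\,\sigma_{AB}}{\mu_A\,\mu_B}
    + \frac{\sigma_B^{2}}{\mu_B^{2}}
  \right].
```

The approximation is reliable only when the denominator is concentrated away from zero, which is the kind of regularity condition the Normal approximation to a ratio requires.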

Conflict of Interests

The author declares that there is no conflict of interests regarding the publication of this paper.

References

  1. I. A. Ahmad, “Nonparametric estimation of the location and scale parameters based on density estimation,” Annals of the Institute of Statistical Mathematics, vol. 34, no. 1, pp. 39–53, 1982.
  2. I. A. Ahmad and M. Amezziane, “Estimation of location and scale parameters based on kernel functional estimators,” in Nonparametric Statistics and Mixture Models: A Festschrift in Honor of Thomas P. Hettmansperger, pp. 1–14, World Scientific, 2011.
  3. S. Chen, Nonparametric ANOVA using Kernel methods [Ph.D. dissertation], Department of Statistics, Oklahoma State University, 2013.
  4. J. C. Aubuchon and T. Hettmansperger, “A note on the estimation of the integral of f²(x),” Journal of Statistical Planning and Inference, vol. 9, no. 3, pp. 321–331, 1984.
  5. E. L. Lehmann, “Nonparametric confidence intervals for a shift parameter,” The Annals of Mathematical Statistics, vol. 34, pp. 1507–1512, 1963.
  6. R. Grübel, “Estimation of density functionals,” Annals of the Institute of Statistical Mathematics, vol. 46, no. 1, pp. 67–75, 1994.
  7. B. W. Silverman, Density Estimation for Statistics and Data Analysis, Chapman & Hall, London, UK, 1986.
  8. A. W. Bowman, “An alternative method of cross-validation for the smoothing of density estimates,” Biometrika, vol. 71, no. 2, pp. 353–360, 1984.
  9. S. J. Sheather and M. C. Jones, “A reliable data-based bandwidth selection method for kernel density estimation,” Journal of the Royal Statistical Society: Series B, vol. 53, no. 3, pp. 683–690, 1991.
  10. A. W. Bowman, “A comparative study of some kernel-based nonparametric density estimators,” Journal of Statistical Computation and Simulation, vol. 21, no. 3-4, pp. 313–327, 1985.
  11. M. C. Jones, J. S. Marron, and S. J. Sheather, “A brief survey of bandwidth selection for density estimation,” Journal of the American Statistical Association, vol. 91, no. 433, pp. 401–407, 1996.
  12. C. R. Loader, “Bandwidth selection: classical or plug-in?” The Annals of Statistics, vol. 27, no. 2, pp. 415–438, 1999.
  13. M. P. Wand and M. C. Jones, Kernel Smoothing, Chapman and Hall, London, UK, 1995.
  14. D. W. Scott and G. R. Terrell, “Biased and unbiased cross-validation in density estimation,” Journal of the American Statistical Association, vol. 82, no. 400, pp. 1131–1146, 1987.
  15. W. Härdle and J. S. Marron, “Optimal bandwidth selection in nonparametric regression function estimation,” The Annals of Statistics, vol. 13, no. 4, pp. 1465–1481, 1985.
  16. E. A. Nadaraya, “On the integral mean square error of some nonparametric estimates for the density function,” Theory of Probability and Its Applications, vol. 19, no. 1, pp. 133–141, 1974.
  17. M. Woodroofe, “On choosing a delta-sequence,” The Annals of Mathematical Statistics, vol. 41, pp. 1665–1671, 1970.
  18. P. Hall, J. S. Marron, and B. U. Park, “Smoothed cross-validation,” Probability Theory and Related Fields, vol. 92, no. 1, pp. 1–20, 1992.
  19. H.-G. Müller, “Empirical bandwidth choice for nonparametric kernel regression by means of pilot estimators,” Statistics & Decisions, supplement 2, pp. 193–206, 1985.
  20. J. G. Staniswalis, “Local bandwidth selection for kernel estimates,” Journal of the American Statistical Association, vol. 84, no. 405, pp. 284–288, 1989.
  21. P. Hall, S. J. Sheather, M. C. Jones, and J. S. Marron, “On optimal data-based bandwidth selection in kernel density estimation,” Biometrika, vol. 78, no. 2, pp. 263–269, 1991.
  22. T. Gasser, A. Kneip, and W. Köhler, “A flexible and fast method for automatic smoothing,” Journal of the American Statistical Association, vol. 86, no. 415, pp. 643–652, 1991.
  23. D. Ruppert, S. J. Sheather, and M. P. Wand, “An effective bandwidth selector for local least squares regression,” Journal of the American Statistical Association, vol. 90, no. 432, pp. 1257–1270, 1995.
  24. J. Fan and I. Gijbels, Local Polynomial Modelling and Its Applications, Chapman & Hall, London, UK, 1996.
  25. P. Janssen, J. S. Marron, N. Veraverbeke, and W. Sarle, “Scale measures for bandwidth selection,” Journal of Nonparametric Statistics, vol. 5, no. 4, pp. 359–380, 1995.
  26. D. W. Scott, Multivariate Density Estimation: Theory, Practice and Visualization, John Wiley & Sons, New York, NY, USA, 1992.
  27. S. J. Sheather, “The performance of six popular bandwidth selection methods on some real datasets,” Computational Statistics, vol. 7, pp. 225–250, 1992.
  28. N. Mimoto and R. Zitikis, “Czekanowski's index of overlap, its Lp-type extension, and bias reduction,” American Journal of Mathematical and Management Sciences, vol. 29, no. 1-2, pp. 229–261, 2009.
  29. A. J. Lee, U-Statistics: Theory and Practice, vol. 110, Marcel Dekker, New York, NY, USA, 1990.
  30. J. Hayya, D. Armstrong, and N. Gressis, “A note on the ratio of two normally distributed variables,” Management Science, vol. 21, no. 11, pp. 1338–1341, 1975.