Abstract

We propose a cross-validation method suitable for smoothing kernel quantile estimators. In particular, the proposed method selects the bandwidth parameter, which is known to play a crucial role in kernel smoothing, through unbiased estimation of a mean integrated squared error curve whose minimiser determines an optimal bandwidth. The method is shown to lead to asymptotically optimal bandwidth choice, and we also provide some general theory on the performance of optimal, data-based methods of bandwidth choice. The numerical performance of the proposed method is examined in simulations, and the new bandwidth selector is demonstrated to work well.

1. Introduction

The estimation of population quantiles is of great interest when one is not prepared to assume a parametric form for the underlying distribution. In addition, due to their robust nature, quantiles often arise as natural quantities to estimate when the underlying distribution is skewed [1]. Similarly, quantiles often arise in statistical inference as the endpoints of confidence intervals for an unknown quantity.

Let $X_1, X_2, \ldots, X_n$ be an independent and identically distributed sample drawn from an absolutely continuous distribution function $F$ with density $f$, and let $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ denote the corresponding order statistics. For $0 < p < 1$, the quantile function $Q(p)$ is defined by
$$Q(p) = \inf\{x : F(x) \ge p\}. \tag{1.1}$$
If $\hat{Q}(p)$ denotes the $p$th sample quantile, then $\hat{Q}(p) = X_{([np]+1)}$, where $[np]$ denotes the integer part of $np$. Because of the variability of individual order statistics, the sample quantiles suffer from a lack of efficiency. In order to reduce this variability, different approaches to estimating quantiles through weighted order statistics have been proposed; a popular class of these estimators is the class of kernel quantile estimators. Parzen [2] proposed a version of the kernel quantile estimator:
$$\hat{Q}_K(p) = \sum_{i=1}^{n} \biggl[\int_{(i-1)/n}^{i/n} K_h(t-p)\,dt\biggr] X_{(i)}, \tag{1.2}$$
where $K$ is a kernel function, $h > 0$ is a bandwidth, and $K_h(u) = h^{-1}K(u/h)$. From (1.2) one can readily observe that $\hat{Q}_K(p)$ puts most weight on the order statistics $X_{(i)}$ for which $i/n$ is close to $p$. In practice, the following approximation to $\hat{Q}_K(p)$ is often used:
$$\hat{Q}_{AK}(p) = \sum_{i=1}^{n} \bigl[n^{-1} K_h(i/n - p)\bigr] X_{(i)}. \tag{1.3}$$
Yang [3] proved that $\hat{Q}_K(p)$ and $\hat{Q}_{AK}(p)$ are asymptotically equivalent in terms of mean squared error. Similarly, Falk [4] demonstrated that, from a relative deficiency perspective, the asymptotic performance of $\hat{Q}_{AK}(p)$ is better than that of the empirical sample quantile.
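As an illustration of (1.3), the following sketch computes $\hat{Q}_{AK}(p)$ directly as a weighted sum of the order statistics. The Gaussian kernel, the function name, and the bandwidth value are our own choices for the example, not prescribed by the text:

```python
import numpy as np

def q_ak(x, p, h):
    """Approximate kernel quantile estimator Q_AK(p) of (1.3):
    a weighted sum of the order statistics with weights
    n^-1 K_h(i/n - p), where K_h(u) = K(u/h)/h and K is Gaussian."""
    n = len(x)
    order_stats = np.sort(x)
    u = (np.arange(1, n + 1) / n - p) / h
    weights = np.exp(-0.5 * u**2) / (np.sqrt(2.0 * np.pi) * n * h)
    return np.sum(weights * order_stats)

# Most weight falls on order statistics X_(i) with i/n close to p.
rng = np.random.default_rng(0)
sample = rng.normal(size=500)
est = q_ak(sample, p=0.5, h=0.1)
```

For $p = 0.5$ the estimate stays close to the sample median, while the kernel weights average out the variability of the individual order statistics.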

In this paper, we propose a cross-validation method for smoothing kernel quantile estimators. The bandwidth parameter, which is known to play a crucial role in kernel smoothing, is selected through unbiased estimation of the mean integrated squared error curve, whose minimiser determines an optimal bandwidth. The method is shown to lead to asymptotically optimal bandwidth choice, and we also provide some general theory on the performance of optimal, data-based methods of bandwidth choice. The numerical performance of the proposed method is compared with that of an existing method in a simulation study, and the new bandwidth selector is demonstrated to work well.

2. Data-Based Selection of the Bandwidth

The bandwidth plays a critical role in the practical implementation of kernel estimation. Specifically, the choice of the smoothing parameter determines the trade-off between the smoothness of the estimate and its closeness to the true distribution [5].

Several data-based methods are available for finding the asymptotically optimal bandwidth of the kernel quantile estimator $\hat{Q}_{AK}(p)$ given by (1.3). One of these methods uses derivatives of the quantile density.

Building on Falk [4], Sheather and Marron [1] gave the asymptotic MSE of $\hat{Q}_{AK}(p)$ as follows. If $f$ is not symmetric, or $f$ is symmetric but $p \ne 0.5$, then
$$\mathrm{AMSE}\bigl(\hat{Q}_{AK}(p)\bigr) = \frac{1}{4} h^4 \mu_2(K)^2\, Q''(p)^2 + p(1-p)\, Q'(p)^2\, n^{-1} - h\, R(K)\, Q'(p)^2\, n^{-1}, \tag{2.1}$$
where $R(K) = 2\int u K(u) K_1(u)\,du$, $\mu_2(K) = \int u^2 K(u)\,du$, and $K_1$ is the antiderivative of $K$.

If $Q''(p) > 0$, then
$$h_{\mathrm{opt}} = \alpha(K)\,\beta(Q)\, n^{-1/3}, \tag{2.2}$$
where $\alpha(K) = [R(K)/\mu_2(K)^2]^{1/3}$ and $\beta(Q) = [Q'(p)/Q''(p)]^{2/3}$.

There is no single optimal bandwidth minimising $\mathrm{AMSE}(\hat{Q}_{AK}(p))$ when $F$ is symmetric and $p = 0.5$. Also, if $q' = 0$, higher-order terms are needed, and the AMSE can be shown to be
$$\mathrm{AMSE}\bigl(\hat{Q}_{AK}(p)\bigr) = \frac{1}{4} h^4\, Q''(p)^2\, \mu_2(K)^2 + 2 n^{-1} h^2\, Q'(p)^2 \int t\, K(t)\, j(t)\,dt, \tag{2.3}$$
where $j(t) = \int_{-\infty}^{t} x K(x)\,dx$; see Cheng and Sun [6].

In order to obtain $h_{\mathrm{opt}}$ we need to estimate $Q' = q$ and $Q'' = q'$. It follows from (1.3) that an estimator of $Q'$ can be constructed as
$$\tilde{q}_{AK}(p) = \hat{Q}'_{AK}(p) = \sum_{i=1}^{n} X_{(i)} \Bigl[K_a\Bigl(\frac{i-1}{n} - p\Bigr) - K_a\Bigl(\frac{i}{n} - p\Bigr)\Bigr]. \tag{2.4}$$
Jones [7] derived the AMSE of $\tilde{q}_{AK}(p)$ as
$$\mathrm{AMSE}\bigl(\tilde{q}_{AK}(p)\bigr) = \frac{a^4}{4}\, \mu_2(K)^2\, q''(p)^2 + \frac{1}{na}\, q(p)^2 \int K^2(y)\,dy. \tag{2.5}$$
By minimising (2.5), we obtain the asymptotically optimal bandwidth for $\hat{Q}'_{AK}(p)$:
$$a_{\mathrm{opt}} = \Biggl[\frac{Q'(p)^2 \int K^2(y)\,dy}{n\, Q'''(p)^2\, \mu_2(K)^2}\Biggr]^{1/5}. \tag{2.6}$$
To estimate $Q'' = q'$ in (2.2), we employ the known result
$$\hat{Q}''_{AK}(p) = \frac{d}{dp}\,\hat{Q}'_{AK}(p) = \frac{1}{a^2} \sum_{i=1}^{n} X_{(i)} \Bigl[K'\Bigl(\frac{(i-1)/n - p}{a}\Bigr) - K'\Bigl(\frac{i/n - p}{a}\Bigr)\Bigr], \tag{2.7}$$
and it readily follows that
$$a'_{\mathrm{opt}} = \Biggl[\frac{3\, Q'(p)^2 \int K'(x)^2\,dx}{n\, Q''''(p)^2\, \mu_2(K)^2}\Biggr]^{1/7}, \tag{2.8}$$
which represents the asymptotically optimal bandwidth for $\hat{Q}''_{AK}(p)$. By substituting $a = a_{\mathrm{opt}}$ in (2.4) and $a = a'_{\mathrm{opt}}$ in (2.7), we can compute $\hat{h}_{\mathrm{opt}}$.
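To make (2.2) concrete, the following sketch evaluates $h_{\mathrm{opt}}$ for the Exp(1) distribution, where $Q(p) = -\log(1-p)$, $Q'(p) = (1-p)^{-1}$ and $Q''(p) = (1-p)^{-2}$ are available in closed form, so no estimation of $q$ and $q'$ is needed. The Gaussian kernel is assumed, for which $\mu_2(K) = 1$ and $R(K) = 1/\sqrt{\pi}$; the function name is ours:

```python
import math

def h_opt_exp1(p, n):
    """Plug-in bandwidth h_opt = alpha(K) * beta(Q) * n^(-1/3) of (2.2)
    for the Exp(1) distribution and the Gaussian kernel."""
    R_K = 1.0 / math.sqrt(math.pi)   # R(K) for the Gaussian kernel
    mu2_K = 1.0                      # second moment of the Gaussian kernel
    alpha = (R_K / mu2_K**2) ** (1.0 / 3.0)   # alpha(K) in (2.2)
    q1 = 1.0 / (1.0 - p)             # Q'(p) for Exp(1)
    q2 = 1.0 / (1.0 - p) ** 2        # Q''(p) for Exp(1)
    beta = (q1 / q2) ** (2.0 / 3.0)  # [Q'(p)/Q''(p)]^(2/3) = (1-p)^(2/3)
    return alpha * beta * n ** (-1.0 / 3.0)
```

The $n^{-1/3}$ rate is visible directly: increasing $n$ by a factor of 8 halves the bandwidth.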

3. Cross-Validation Bandwidth Selection

When measuring the closeness of an estimated and a true function, the mean integrated squared error (MISE), defined as
$$\mathrm{MISE}(h) = E \int_0^1 \bigl[\hat{Q}(p) - Q(p)\bigr]^2\,dp, \tag{3.1}$$
is commonly used as a global measure of performance.

The value of $h$ which minimises $\mathrm{MISE}(h)$ is the optimal smoothing parameter, and it is unknown in practice. The following averaged squared error $\mathrm{ASE}(h)$ is the discrete form of error criterion approximating $\mathrm{MISE}(h)$:
$$\mathrm{ASE}(h) = \frac{1}{n} \sum_{i=1}^{n} \Bigl[\hat{Q}\Bigl(\frac{i}{n}\Bigr) - Q\Bigl(\frac{i}{n}\Bigr)\Bigr]^2. \tag{3.2}$$

The unknown $Q(p)$ is replaced by $\hat{Q}(p)$, and a cross-validatory criterion is created as
$$\frac{1}{n} \sum_{i=1}^{n} \Bigl[\hat{Q}_{-i}\Bigl(\frac{i}{n}\Bigr) - \hat{Q}\Bigl(\frac{i}{n}\Bigr)\Bigr]^2, \tag{3.3}$$
where $\hat{Q}_{-i}(i/n)$ denotes the kernel estimator evaluated at $i/n$ but constructed from the data with observation $x_i$ omitted.

The general approach of cross-validation is to compare each observation with a value predicted by the model based on the remainder of the data. A method for density estimation was proposed by Rudemo [8] and Bowman [9]. This method can be viewed as representing each observation by a Dirac delta function $\delta(x - x_i)$, whose expectation is $f(x)$, and contrasting it with a density estimate based on the remainder of the data. In the context of distribution functions, a natural characterisation of each observation is the indicator function $I(x - x_i)$, whose expectation is $F(x)$. Accordingly, the kernel estimator of the density can be expressed as
$$\hat{f}(x) = \frac{1}{n} \sum_{i=1}^{n} K_h(x - x_i), \tag{3.4}$$
where, as $h \to 0$, $K_h(x - x_i) \to \delta(x - x_i)$.

The kernel estimator of the distribution function is
$$\hat{F}(x) = \frac{1}{n} \sum_{i=1}^{n} W_h(x - x_i), \tag{3.5}$$
where $W_h(u) = W(u/h)$, $W$ is a distribution function, and the bandwidth $h$ controls the degree of smoothing. As $h \to 0$,
$$W_h(x - x_i) \to I(x - x_i), \tag{3.6}$$
where $I(x - x_i)$ is the indicator function
$$I(x - x_i) = \begin{cases} 1, & \text{if } x - x_i \ge 0, \\ 0, & \text{otherwise}. \end{cases} \tag{3.7}$$
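The limit in (3.6) is easy to check numerically. In the sketch below, $W$ is taken to be the standard normal distribution function (our choice for illustration; function names are ours too): shrinking $h$ makes the smooth estimate (3.5) coincide with the empirical distribution function built from the indicators in (3.7).

```python
import math

def f_hat(x, data, h):
    """Kernel distribution-function estimate (3.5) with W = Phi,
    the standard normal distribution function."""
    total = 0.0
    for xi in data:
        z = (x - xi) / h
        total += 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # Phi(z)
    return total / len(data)

def ecdf(x, data):
    """Empirical CDF: average of the indicators I(x - x_i) in (3.7)."""
    return sum(1.0 for xi in data if x >= xi) / len(data)

data = [0.0, 1.0, 2.0, 3.0]
smooth = f_hat(1.5, data, h=1e-3)   # near the h -> 0 indicator limit
step = ecdf(1.5, data)              # two of the four points are <= 1.5
```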

Now, from (1.3), as $h \to 0$ the $i$th term of $\hat{Q}_{AK}(p)$ tends to
$$\delta\Bigl(\frac{i}{n} - p\Bigr) X_{(i)}, \tag{3.8}$$
and thus a cross-validation function can be written as
$$\mathrm{CV}(h) = \frac{1}{n} \sum_{i=1}^{n} \int_0^1 \Bigl[\delta\Bigl(\frac{i}{n} - p\Bigr) X_{(i)} - \hat{Q}_{-i}(p)\Bigr]^2\,dp. \tag{3.9}$$
The smoothing parameter is then chosen to minimise this function. By subtracting a term that characterises the performance of the true $Q(p)$, we have
$$H(h) = \mathrm{CV}(h) - \frac{1}{n} \sum_{i=1}^{n} \int_0^1 \Bigl[\delta\Bigl(\frac{i}{n} - p\Bigr) X_{(i)} - Q(p)\Bigr]^2\,dp, \tag{3.10}$$
where the subtracted term does not involve $h$, so minimising $H(h)$ is equivalent to minimising $\mathrm{CV}(h)$. By expanding the braces we obtain
$$H(h) = \frac{1}{n} \sum_{i=1}^{n} \int_0^1 \Bigl[\hat{Q}_{-i}^2(p) - 2\,\delta\Bigl(\frac{i}{n} - p\Bigr) X_{(i)}\, \hat{Q}_{-i}(p) + 2\,\delta\Bigl(\frac{i}{n} - p\Bigr) X_{(i)}\, Q(p) - Q^2(p)\Bigr]\,dp. \tag{3.11}$$
When $n \to \infty$, the $(np)$th order statistic $X_{(np)}$ is asymptotically normally distributed,
$$X_{(np)} \sim \mathrm{AN}\biggl(Q(p),\ \frac{p(1-p)}{n\,[f(Q(p))]^2}\biggr),$$
so that, on taking expectations, $X_{(i)}$ may be replaced by $Q(i/n)$ to leading order; integrating out the delta functions then yields
$$E\{H(h)\} = E \int_0^1 \bigl[\hat{Q}_{n-1}(p) - Q(p)\bigr]^2\,dp, \tag{3.12}$$
where the notation $\hat{Q}_{n-1}$ denotes a kernel estimator based on a sample of size $n-1$. The preceding argument demonstrates that $\mathrm{CV}(h)$ provides an asymptotically unbiased estimator of the true $\mathrm{MISE}(h)$ curve for a sample of size $n-1$. The identity (3.12) strongly suggests that cross-validation should perform well.
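In practice, the $\delta^2$ term arising from expanding the square in (3.9) does not depend on $h$ and can be dropped, while $\int_0^1 \delta(i/n - p) X_{(i)} \hat{Q}_{-i}(p)\,dp = X_{(i)} \hat{Q}_{-i}(i/n)$. A minimal computable sketch of the resulting criterion follows, assuming a Gaussian kernel and using midpoint-rule numerical integration; the function names, grid sizes and seed are our own:

```python
import numpy as np

def q_ak_grid(order_stats, p_grid, h):
    """Q_AK of (1.3) evaluated on a grid of p values."""
    n = len(order_stats)
    u = (np.arange(1, n + 1) / n - p_grid[:, None]) / h
    w = np.exp(-0.5 * u**2) / (np.sqrt(2.0 * np.pi) * n * h)
    return w @ order_stats

def cv_score(x, h, m=100):
    """Computable part of CV(h): for each i,
    int_0^1 Q_{-i}(p)^2 dp  -  2 X_(i) Q_{-i}(i/n)."""
    n = len(x)
    xs = np.sort(x)
    p_grid = (np.arange(m) + 0.5) / m        # midpoint rule on (0, 1)
    total = 0.0
    for i in range(n):
        loo = np.delete(xs, i)               # leave observation i out
        sq_int = np.mean(q_ak_grid(loo, p_grid, h) ** 2)
        at_pi = q_ak_grid(loo, np.array([(i + 1) / n]), h)[0]
        total += sq_int - 2.0 * xs[i] * at_pi
    return total / n

def cv_bandwidth(x, h_grid):
    """Choose the bandwidth in h_grid minimising the criterion."""
    scores = [cv_score(x, h) for h in h_grid]
    return h_grid[int(np.argmin(scores))]

rng = np.random.default_rng(1)
h_grid = np.linspace(0.05, 0.5, 10)
h_cv = cv_bandwidth(rng.normal(size=40), h_grid)
```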

4. Theoretical Properties

From (3.1), we can write
$$\mathrm{MISE}(h) = \int_0^1 \mathrm{bias}^2\bigl(\hat{Q}_K(p)\bigr)\,dp + \int_0^1 \mathrm{var}\bigl(\hat{Q}_K(p)\bigr)\,dp.$$

Sheather and Marron [1] have shown that
$$\mathrm{bias}\bigl(\hat{Q}_K(p)\bigr) = \frac{1}{2} h^2\, \mu_2(K)\, Q''(p) + o(h^2), \tag{4.1}$$
while Falk [4, page 263] proved that
$$\mathrm{var}\bigl(\hat{Q}_K(p)\bigr) = p(1-p)\, Q'(p)^2\, n^{-1} - h\, R(K)\, Q'(p)^2\, n^{-1} + o\bigl(h n^{-1}\bigr). \tag{4.2}$$

On combining the expressions for bias and variance, we can express the mean integrated squared error as
$$\mathrm{MISE}(h) = \frac{1}{4} h^4\, \mu_2(K)^2 \int_0^1 Q''(p)^2\,dp + n^{-1} \int_0^1 p(1-p)\, Q'(p)^2\,dp - h\, n^{-1}\, R(K) \int_0^1 Q'(p)^2\,dp + o\bigl(h^4 + h n^{-1}\bigr), \tag{4.3}$$

and, for $C_1 = \int_0^1 p(1-p)\,[Q'(p)]^2\,dp$, $C_2 = R(K) \int_0^1 [Q'(p)]^2\,dp$ and $C_3 = \mu_2(K)^2 \int_0^1 [Q''(p)]^2\,dp$, the MISE can be expressed as
$$\mathrm{MISE}(h) = C_1 n^{-1} - C_2 h n^{-1} + \frac{1}{4} C_3 h^4 + o\bigl(h^4 + h n^{-1}\bigr). \tag{4.4}$$
Therefore, the asymptotically optimal bandwidth is $h_0 = C n^{-1/3}$, where $C = \{C_2/C_3\}^{1/3}$.
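The form of $h_0$ follows by minimising the $h$-dependent part of (4.4):

```latex
\frac{d}{dh}\Bigl\{ -C_2 h n^{-1} + \tfrac{1}{4} C_3 h^4 \Bigr\}
  = -C_2 n^{-1} + C_3 h^3 = 0
\quad\Longrightarrow\quad
h_0 = \Bigl(\frac{C_2}{C_3}\Bigr)^{1/3} n^{-1/3}.
```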

We can see from (3.12) that $H(h)$ may be a good approximation to $\mathrm{MISE}(h)$, or at least to that function evaluated for a sample of size $n-1$ rather than $n$. This is particularly true if we adjust $H(h)$ by adding the quantity
$$J_n = \int_0^1 \Bigl\{\bigl[\tilde{Q}(p) - Q(p)\bigr]^2 - E\bigl[\tilde{Q}(p) - Q(p)\bigr]^2\Bigr\}\,dp, \tag{4.5}$$
where $\tilde{Q}(p) = n^{-1} \sum_{i=1}^{n} \delta(i/n - p) X_{(i)}$ is the unsmoothed, $h$-free version of the estimator. This quantity has zero mean and does not depend on $h$, which makes it attractive for obtaining a particularly good approximation to $\mathrm{MISE}(h)$.

Theorem 4.1. Suppose that $Q''(p)$ is bounded on $[0,1]$ and right-continuous at the point 0, and that $K$ is a compactly supported density, symmetric about 0. Then, for each $\delta, \varepsilon, C > 0$,
$$H(h) + J_n = \mathrm{MISE}(h) + O\bigl\{\bigl(h^2 n^{-3/2} + n^{-1} h^{3/2} + n^{-1/2} h^3\bigr)\, n^{\varepsilon}\bigr\} \tag{4.6}$$
with probability 1, uniformly in $0 \le h \le C n^{-\delta}$, as $n \to \infty$.

(An outline proof of the above theorem is in the appendix).

From the above theorem, we can conclude that minimisation of $H(h)$ produces a bandwidth that is asymptotically equivalent to the bandwidth $h_0$ that minimises $\mathrm{MISE}(h)$.

Corollary 4.2. Suppose that the conditions of Theorem 4.1 hold. If $\hat{h}$ denotes the bandwidth that minimises $\mathrm{CV}(h)$ in the range $0 \le h \le C n^{-\delta}$, for any $C > 0$ and any $0 \le \delta \le 1/3$, then
$$\frac{\hat{h}}{h_0} \to 1 \tag{4.7}$$
with probability 1 as $n \to \infty$.

5. A Simulation Study

A numerical study was conducted to compare the performance of two bandwidth selection methods: the method presented by Sheather and Marron [1] and our proposed method.

In order to account for different distributional shapes, we consider the standard normal, Exp(1), Log-normal(0,1) and double exponential distributions, and we calculate 18 quantiles ranging from $p = 0.05$ to $p = 0.95$. Throughout the numerical study the Gaussian kernel was used as the kernel function. Sample sizes of 100, 200 and 500 were used, with 100 simulations in each case. The performance of the methods was assessed through the mean squared error criterion,
$$\mathrm{MSE}(h) = E\bigl\{\hat{Q}(p) - Q(p)\bigr\}^2,$$
and the relative efficiency
$$\mathrm{R.E.} = \frac{\mathrm{MISE}_{\text{Method 2}}\bigl(\hat{h}_{\text{Method 2,opt}}\bigr)}{\mathrm{MISE}_{\text{Method 1}}\bigl(\hat{h}_{\text{Method 1,opt}}\bigr)}. \tag{5.1}$$
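A condensed version of one cell of this study can be sketched as follows, using Exp(1), whose true quantile is $Q(p) = -\log(1-p)$; the function name, number of replications and seed are our own choices, not the paper's exact settings:

```python
import numpy as np

def mc_mse(h, p, n, n_sim=200, seed=0):
    """Monte-Carlo estimate of MSE(h) = E{Q_hat(p) - Q(p)}^2 for the
    kernel quantile estimator under Exp(1) samples, Gaussian kernel."""
    rng = np.random.default_rng(seed)
    true_q = -np.log1p(-p)                   # Q(p) for Exp(1)
    i = np.arange(1, n + 1)
    errs = np.empty(n_sim)
    for s in range(n_sim):
        xs = np.sort(rng.exponential(size=n))
        u = (i / n - p) / h
        w = np.exp(-0.5 * u**2) / (np.sqrt(2.0 * np.pi) * n * h)
        errs[s] = w @ xs - true_q            # estimation error this run
    return np.mean(errs**2)

mse = mc_mse(h=0.1, p=0.5, n=100)
```

Repeating this over the grid of quantiles, sample sizes and the two bandwidth rules yields tables like Tables 1–4; the relative efficiency (5.1) is then the ratio of two such MISE estimates.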

Further, for comparison purposes, we refer to our proposed method and that of Sheather and Marron [1] as method 1 and method 2, respectively.

(a) Standard normal distribution (see Table 1 and Figure 1).

(b) Exponential distribution (see Table 2 and Figure 2).

(c) Log-normal distribution (see Table 3 and Figure 3).

(d) Double exponential distribution (see Table 4 and Figure 4).

The relative efficiency of $\hat{h}_{\text{Method 1,opt}}$ for all of the previous distributions is computed and summarised in Table 5.

From Tables 1, 2, 3, and 4, across all distributions, it can be observed that our method produces lower mean squared errors in 52.3% of cases, slightly outperforming the Sheather-Marron method.

Also, from Table 5, which reports the relative efficiency for $\hat{h}_{\text{Method 1,opt}}$, we can see that method 1 is more efficient than method 2 in all cases except the standard normal distribution with $n = 200, 500$ and the double exponential distribution with $n = 500$.

So, we may conclude that, in terms of MISE, our bandwidth selection method is more efficient than that of Sheather and Marron for skewed distributions, but not for symmetric distributions.

6. Conclusion

In this paper we have proposed a cross-validation-based rule for selecting the bandwidth of kernel quantile estimators. The bandwidth selected by the proposed method is shown to be asymptotically optimal, and, in order to assess numerical performance, we conducted a simulation study comparing it with the bandwidth proposed by Sheather and Marron [1]. Based on the four distributions considered, the proposed bandwidth selector appears to provide accurate estimates of quantiles, and thus we believe that the new method is a practically useful way to choose the bandwidth for the quantile estimator in the form (1.3).

Appendix

Step 1. Write $nH(h) = S_1 - 2S_2$, where
$$S_1 = \sum_{i=1}^{n} \int_0^1 \bigl[\hat{Q}_{-i}(p) - Q(p)\bigr]^2\,dp, \qquad S_2 = \sum_{i=1}^{n} \int_0^1 \Bigl[\delta\Bigl(\frac{i}{n} - p\Bigr) X_{(i)} - Q(p)\Bigr]\bigl[\hat{Q}_{-i}(p) - Q(p)\bigr]\,dp. \tag{A.1}$$

Step 2. With $D_i(p) = K_h(i/n - p) X_{(i)} - Q(p)$, $D_{0i}(p) = \delta(i/n - p) X_{(i)} - Q(p)$ and $\tilde{Q}(p) = n^{-1} \sum_i \delta(i/n - p) X_{(i)}$, so that $\hat{Q}(p) - Q(p) = n^{-1} \sum_i D_i(p)$ and $\tilde{Q}(p) - Q(p) = n^{-1} \sum_i D_{0i}(p)$,
$$S_1 = \frac{n^2(n-2)}{(n-1)^2} \int_0^1 \bigl[\hat{Q}(p) - Q(p)\bigr]^2\,dp + \frac{1}{(n-1)^2} \sum_{i=1}^{n} \int_0^1 D_i^2(p)\,dp,$$
$$S_2 = \frac{n^2}{n-1} \int_0^1 \bigl[\hat{Q}(p) - Q(p)\bigr]\bigl[\tilde{Q}(p) - Q(p)\bigr]\,dp - \frac{1}{n-1} \sum_{i=1}^{n} \int_0^1 D_i(p) D_{0i}(p)\,dp. \tag{A.2}$$

Step 3. This step combines Steps 1 and 2 to prove that
$$H(h) = \bigl\{1 - (n-1)^{-2}\bigr\} \int_0^1 \bigl[\hat{Q}(p) - Q(p)\bigr]^2\,dp + \frac{1}{n(n-1)^2} \sum_{i=1}^{n} \int_0^1 D_i^2(p)\,dp - 2\bigl\{1 + (n-1)^{-1}\bigr\} \int_0^1 \bigl[\hat{Q}(p) - Q(p)\bigr]\bigl[\tilde{Q}(p) - Q(p)\bigr]\,dp + \frac{2}{n(n-1)} \sum_{i=1}^{n} \int_0^1 D_i(p) D_{0i}(p)\,dp. \tag{A.3}$$

Step 4. This step establishes that
$$E\Bigl\{\int_0^1 \bigl[\hat{Q}(p) - Q(p)\bigr]^2\,dp\Bigr\}^2 + E\Bigl\{\int_0^1 \bigl[\hat{Q}(p) - Q(p)\bigr]\bigl[\tilde{Q}(p) - Q(p)\bigr]\,dp\Bigr\}^2 = O\bigl(n^{-2} + h^8\bigr),$$
$$E\Bigl\{n^{-3} \sum_{i=1}^{n} \int_0^1 D_i^2(p)\,dp\Bigr\}^2 + \mathrm{var}\Bigl\{n^{-2} \sum_{i=1}^{n} \int_0^1 D_i(p) D_{0i}(p)\,dp\Bigr\} = O\bigl(h^2 n^{-3}\bigr). \tag{A.4}$$

Step 5. This step combines Steps 3 and 4, concluding that
$$H(h) + \int_0^1 \bigl[\tilde{Q}(p) - Q(p)\bigr]^2\,dp = \int_0^1 \bigl[\tilde{Q}(p) - \hat{Q}(p)\bigr]^2\,dp + 2(n-1)^{-1} \mu(h) + O_2\bigl(h^2 n^{-3/2} + n^{-1} h^4\bigr), \tag{A.5}$$
where $\mu(h) = \int_0^1 E\{D_i(p) D_{0i}(p)\}\,dp$. Here, for a random variable $U = U(n)$ and a positive sequence $\xi = \xi(n)$, we write $U = O_2(\xi)$ to mean
$$E\bigl(U^2\bigr) = O\bigl(\xi^2\bigr). \tag{A.6}$$

Step 6. This step notes that $\int_0^1 [\tilde{Q}(p) - \hat{Q}(p)]^2\,dp = S + T$, where
$$S = n^{-2} \sum_{i \ne j} g\bigl(X_i, X_j\bigr), \qquad T = n^{-2} \sum_{i=1}^{n} g\bigl(X_i, X_i\bigr),$$
$$g\bigl(X_i, X_j\bigr) = \int_0^1 \Bigl[K_h\Bigl(\frac{i}{n} - p\Bigr) X_{(i)} - \delta\Bigl(\frac{i}{n} - p\Bigr) X_{(i)}\Bigr]\Bigl[K_h\Bigl(\frac{j}{n} - p\Bigr) X_{(j)} - \delta\Bigl(\frac{j}{n} - p\Bigr) X_{(j)}\Bigr]\,dp, \tag{A.7}$$
and that $S = S^{(1)} + S^{(2)} + (1 - n^{-1}) g_0$, where
$$S^{(1)} = n^{-2} \sum_{i \ne j} \bigl\{g\bigl(X_i, X_j\bigr) - g_1\bigl(X_i\bigr) - g_1\bigl(X_j\bigr) + g_0\bigr\}, \qquad S^{(2)} = 2 n^{-1}\bigl(1 - n^{-1}\bigr) \sum_{i=1}^{n} \bigl\{g_1\bigl(X_i\bigr) - g_0\bigr\},$$
$$g_1(x) = E\bigl\{g\bigl(x, X_1\bigr)\bigr\}, \qquad g_0 = E\bigl\{g_1\bigl(X_1\bigr)\bigr\}. \tag{A.8}$$

Step 7. This step shows that $E\{g(X_1, X_1)^2\} = O(1)$, $E\{g(X_1, X_2)^2\} = O(h^3)$, $E\{g_1(X_1)^2\} = O(h^6)$, and hence $\mathrm{var}\{T\} = O(n^{-3})$, $E\{(S^{(1)})^2\} = O(n^{-2} h^3)$ and $E\{(S^{(2)})^2\} = O(n^{-1} h^6)$.

Step 8. This step combines the results of Steps 5, 6 and 7, obtaining
$$H(h) + \int_0^1 \bigl[\tilde{Q}(p) - Q(p)\bigr]^2\,dp = E(T) + \bigl(1 - n^{-1}\bigr) g_0 + 2(n-1)^{-1} \mu(h) + O_2\bigl(h^2 n^{-3/2} + n^{-1} h^{3/2} + n^{-1/2} h^3\bigr) = \int_0^1 E\bigl[\tilde{Q}(p) - \hat{Q}(p)\bigr]^2\,dp + 2(n-1)^{-1} \mu(h) + O_2\bigl(h^2 n^{-3/2} + n^{-1} h^{3/2} + n^{-1/2} h^3\bigr). \tag{A.9}$$

Step 9. This step notes that $\mu(h) = O(h)$ and
$$\int_0^1 E\bigl[\tilde{Q}(p) - \hat{Q}(p)\bigr]^2\,dp = \int_0^1 E\bigl[\tilde{Q}(p) - Q(p)\bigr]^2\,dp + \int_0^1 E\bigl[\hat{Q}(p) - Q(p)\bigr]^2\,dp - 2 n^{-1} \mu(h). \tag{A.10}$$

Step 10. This step combines Steps 8 and 9, establishing that
$$H(h) + \int_0^1 \bigl[\tilde{Q}(p) - Q(p)\bigr]^2\,dp - \int_0^1 E\bigl[\tilde{Q}(p) - Q(p)\bigr]^2\,dp = \int_0^1 E\bigl[\hat{Q}(p) - Q(p)\bigr]^2\,dp + O_2\bigl(h^2 n^{-3/2} + n^{-1} h^{3/2} + n^{-1/2} h^3\bigr). \tag{A.11}$$
This means that
$$E\bigl\{H(h) + J_n - \mathrm{MISE}(h)\bigr\}^2 = O\bigl\{\bigl(h^2 n^{-3/2} + n^{-1} h^{3/2} + n^{-1/2} h^3\bigr)^2\bigr\}. \tag{A.12}$$