Confidence Intervals for the Mean Based on Exponential Type Inequalities and Empirical Likelihood
For independent observations, it has recently been proposed to construct confidence intervals for the mean using exponential type inequalities. Although this method requires much weaker assumptions than the classical methods, the resulting intervals are usually too wide. Still, in special cases there is some advantage in using the bounded and unbounded Bernstein inequalities. In this paper, we discuss the applicability of this approach to dependent data. Moreover, we propose to use the empirical likelihood method for inference regarding the mean in both the independent and the dependent case. The advantages of empirical likelihood are its Bartlett correctability and its rather simple extension to the dependent case. Finally, we provide simulation results comparing these methods with respect to their empirical coverage accuracy and average interval length. We conclude by applying the methods to a serial analysis of gene expression (SAGE) data example.
1. Introduction
Although the classical $t$-test is based on the assumption that the observations are normally distributed, it is well known to be robust and to work well at least for symmetric distributions (see ). However, for skewed or heavy-tailed distributions, the confidence intervals for the mean based on the inversion of the $t$-test may give poor coverage. Using exponential type inequalities (such as Bernstein’s inequality), Rosenblum and van der Laan presented a simple approach for constructing confidence intervals for the population mean. Rosenblum and van der Laan dealt with the bounded version of Bernstein’s inequality, which is applicable only to a few distributions, such as the bounded uniform distribution. To use it for general distributions, we need the unbounded version of Bernstein’s inequality, which was analyzed by Shilane et al. for the negative binomial distribution. The confidence intervals based on exponential type inequalities have a guaranteed coverage probability under much weaker assumptions than those required by the standard methods. Although the resulting confidence intervals are usually too wide, there are situations in which they give better coverage accuracy than the classical methods.
In this paper, our goal is to use the empirical likelihood method introduced by Owen [4, 5] for inference on the mean and to compare it with the method of Rosenblum and van der Laan. The empirical likelihood method is based on the nonparametric likelihood ratio statistic, which, as in the parametric case, has an asymptotic chi-square distribution. The confidence intervals obtained by the empirical likelihood method have some very appealing characteristics. There are no prespecified parametric assumptions on the distribution of the observations and no constraints on the shape of the confidence intervals. Empirical likelihood intervals are Bartlett correctable in most cases. This means that a simple correction for the mean reduces the coverage error from order $n^{-1}$ to order $n^{-2}$, where $n$ denotes the sample size (see, e.g., [6, 7]). Owen suggested (see Section 2 for more details) that for small sample sizes the $F_{1,n-1}$ distribution may provide a better approximation to the distribution of the test statistic than its chi-square limit. We found in our simulation study that this approach can provide a significant improvement (see Tables 1 and 3).
Time series data are often correlated. In this situation, either modified or different procedures of statistical inference are required. There exist many exponential type inequalities for weakly dependent data, characterized by $\alpha$-mixing or $\beta$-mixing sequences (e.g., see [8–10]). However, it is problematic to use them for practical data problems. First, such inequalities typically contain the $\alpha$- or $\beta$-mixing coefficients, which have to be estimated from the data. Only recently McDonald et al. introduced an estimation procedure for $\beta$-mixing coefficients. Second, these inequalities are usually established for bounded distributions. The applicability of exponential type inequalities in the dependent case is therefore limited. The blockwise empirical likelihood method introduced by Kitamura provides a good alternative for inference regarding the mean. It seems that the Bartlett correction can also be applied in this case.
For our simulation study, we use the negative binomial distribution, which has been analyzed by several authors (e.g., see [3, 13]) and has been widely used for applications involving insurance data. There are several ways to parametrize the negative binomial distribution. We will use the parametrization described by Hilbe ; that is, for any $\mu > 0$ and $\theta > 0$, the resulting probability mass function for the negative binomial random variable $X$ is
$$P(X = k) = \frac{\Gamma(k+\theta)}{\Gamma(\theta)\,k!}\left(\frac{\theta}{\theta+\mu}\right)^{\theta}\left(\frac{\mu}{\theta+\mu}\right)^{k}, \quad k = 0, 1, 2, \ldots$$
We also simulate some autoregressive processes representing data with a marginal negative binomial distribution. For independent data, we compare the empirical coverage accuracy of the confidence intervals based on exponential type inequalities, the $t$-test, and several empirical likelihood and bootstrap methods. For dependent data, we use a simulation study to compare the blockwise empirical likelihood method with and without the Bartlett correction. We also apply the empirical likelihood method to the serial analysis of gene expression (SAGE) data analyzed by Shilane et al.
This paper is organized as follows. Section 2 deals with the independent case. Its three subsections introduce Bernstein inequalities and the empirical likelihood method and present a simulation study. In Section 3, weakly dependent data are discussed. Its three subsections introduce the blockwise empirical likelihood method, describe two exponential type inequalities, and provide a simulation study together with an analysis of the SAGE data example. We finish with the main conclusions in Section 4.
2. Independent Case
2.1. Confidence Intervals Using Bernstein’s Inequality
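Rosenblum and van der Laan described a method for constructing confidence intervals that bounds the tails of the distribution of the sample mean $\bar{X}$ of $n$ independent, bounded random variables with mean $\mu$. This method is based on exponential type inequalities such as Bernstein’s inequality:
$$P\left(\left|\bar{X} - \mu\right| > \varepsilon\right) \le 2\exp\left(-\frac{n\varepsilon^{2}}{2\sigma^{2} + 2M\varepsilon/3}\right)$$
for all $\varepsilon > 0$, where $M$ and $\sigma^{2}$ are constants defined in Theorem 1.
Theorem 1 (Rosenblum and van der Laan ). Let $X_1, \ldots, X_n$ be independent and identically distributed (i.i.d.) random variables with mean $\mu$, and let $M$ be such that $P(|X_i - \mu| \le M) = 1$. Let $\sigma^{2}$ be a number satisfying $\operatorname{Var}(X_i) \le \sigma^{2}$. Then for the function
$$g(n, \alpha, M, \sigma^{2}) = \frac{1}{n}\left[\frac{M}{3}\log\frac{2}{\alpha} + \sqrt{\left(\frac{M}{3}\log\frac{2}{\alpha}\right)^{2} + 2n\sigma^{2}\log\frac{2}{\alpha}}\right]$$
we have $P\left(|\bar{X} - \mu| > g(n, \alpha, M, \sigma^{2})\right) \le \alpha$. In particular, $\bar{X} \pm g(n, \alpha, M, \sigma^{2})$ is the $(1-\alpha)$ confidence interval for $\mu$.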
The accuracy of coverage probabilities depends on the estimation of and , where . Rosenblum and van der Laan  suggested that a bound should be set at least as large as and proposed also to use for an estimator , where is the standard error of (for details, see ).
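As an illustration of how such an interval can be computed, the following minimal Python sketch inverts Bernstein’s inequality in its standard form, $2\exp(-n\varepsilon^{2}/(2\sigma^{2} + 2M\varepsilon/3)) = \alpha$, for the half-width $\varepsilon$. The function names are hypothetical, and the bound `M` and the variance bound `sigma2` are assumed to be supplied by the user; in practice they have to be chosen or estimated as discussed above.

```python
import math


def bernstein_half_width(n, alpha, M, sigma2):
    """Half-width eps solving 2*exp(-n*eps^2 / (2*sigma2 + 2*M*eps/3)) = alpha.

    M is an assumed almost-sure bound on |X_i - mu| and sigma2 an assumed
    upper bound on Var(X_i); both are inputs, not estimated here.
    """
    log_term = math.log(2.0 / alpha)
    a = M * log_term / 3.0
    # Positive root of n*eps^2 - 2*a*eps - 2*sigma2*log_term = 0.
    return (a + math.sqrt(a * a + 2.0 * n * sigma2 * log_term)) / n


def bernstein_interval(xs, alpha, M, sigma2):
    """Two-sided interval: sample mean plus/minus the Bernstein half-width."""
    n = len(xs)
    mean = sum(xs) / n
    eps = bernstein_half_width(n, alpha, M, sigma2)
    return mean - eps, mean + eps
```

The half-width shrinks at the rate $n^{-1/2}$ for fixed `M` and `sigma2`, but conservative choices of these two bounds directly inflate the interval, which is one reason such intervals tend to be wide.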
To examine exponential type inequalities for distributions like the negative binomial distribution, we will need a version of Bernstein’s inequality that does not require the assumption of boundedness.
Lemma 2 (Birgé and Massart ). Let be independent random variables satisfying the moments conditions for all , for some positive constants and . Then, for any positive and , where is defined by the equation .
Suppose that , , are negative binomial with parameters and , and let . In this case, Shilane et al.  obtained the confidence interval for the mean of the form , where is the solution to the equation To estimate the parameter , we can use the method of moments; that is, .
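A method of moments estimate of the dispersion parameter can be sketched as follows. This sketch assumes the mean and dispersion parametrization in which the variance equals $\mu + \mu^{2}/\theta$ (the convention documented for rnegbin in the R package MASS), so that $\hat{\theta} = \bar{X}^{2}/(S^{2} - \bar{X})$; the function name is hypothetical, and the estimator used in the cited work may differ in detail.

```python
import math
import statistics


def nb_theta_moments(xs):
    """Method-of-moments estimate of theta, assuming Var(X) = mu + mu^2 / theta.

    Returns math.inf when the sample shows no overdispersion
    (sample variance <= sample mean), since no finite theta matches it.
    """
    mean = statistics.fmean(xs)
    var = statistics.variance(xs)  # unbiased sample variance (divisor n - 1)
    if var <= mean:
        return math.inf
    return mean * mean / (var - mean)
```

For small samples the estimate can be very unstable, because the denominator is a difference of two noisy quantities.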
2.2. Empirical Likelihood for the Mean
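Let $X_1, \ldots, X_n$ be observations with the distribution function $F$ and mean $\mu$. To obtain confidence intervals for $\mu$, we define the profile empirical likelihood function
$$R(\mu) = \sup\left\{\prod_{i=1}^{n} n w_i : w_i \ge 0,\ \sum_{i=1}^{n} w_i = 1,\ \sum_{i=1}^{n} w_i X_i = \mu\right\}. \quad (8)$$
Owen [4, 5] showed that a unique value for the right-hand side of (8) exists when $\mu$ is inside the convex hull of the data points $X_1, \ldots, X_n$. This maximization problem can be solved using the Lagrange multiplier method.
Theorem 3 (Owen ). If $X_1, \ldots, X_n$ are i.i.d. random variables with the distribution function $F_0$, $\mu_0 = E(X_1)$, and $0 < \operatorname{Var}(X_1) < \infty$, then
$$-2\log R(\mu_0) \xrightarrow{d} \chi^{2}_{1} \quad \text{as } n \to \infty.$$
The empirical likelihood confidence interval for $\mu$ is of the following form:
$$\left\{\mu : -2\log R(\mu) \le \chi^{2}_{1,1-\alpha}\right\},$$
where $\chi^{2}_{1,1-\alpha}$ is the $(1-\alpha)$ quantile of the $\chi^{2}_{1}$ distribution. Owen stated that the proof of Theorem 3 and some simulations suggest that the $\chi^{2}_{1}$ quantiles can be replaced by the quantiles of the $F_{1,n-1}$ distribution. The $F$ calibration usually gives better results for small sample sizes.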
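The Lagrange multiplier computation can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used for the simulations: the optimal weights are $w_i = 1/(n(1 + \lambda(X_i - \mu)))$, where $\lambda$ solves $\sum_i (X_i - \mu)/(1 + \lambda(X_i - \mu)) = 0$, and the statistic is $2\sum_i \log(1 + \lambda(X_i - \mu))$. The function names are hypothetical, and the default critical value 3.841459 is the 0.95 quantile of the chi-square distribution with one degree of freedom; another calibration can be passed in its place.

```python
import math


def el_log_ratio(xs, mu):
    """Return -2 log R(mu) for the mean; math.inf if mu is outside the data hull."""
    z = [x - mu for x in xs]
    zmin, zmax = min(z), max(z)
    if zmin >= 0.0 or zmax <= 0.0:
        return math.inf
    n = len(z)
    # Every weight is at most 1, so 1 + lam*z_i >= 1/n at the solution;
    # this gives a bracket on which the score below is well defined.
    lo = (1.0 / n - 1.0) / zmax
    hi = (1.0 / n - 1.0) / zmin

    def score(lam):
        return sum(zi / (1.0 + lam * zi) for zi in z)

    # The score is strictly decreasing in lam, so plain bisection suffices.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log1p(lam * zi) for zi in z)


def el_interval(xs, crit=3.841459):
    """Endpoints of {mu : -2 log R(mu) <= crit}, found by bisection on each side."""
    mean = sum(xs) / len(xs)

    def endpoint(outer):
        inner = mean
        for _ in range(100):
            mid = 0.5 * (inner + outer)
            if el_log_ratio(xs, mid) <= crit:
                inner = mid
            else:
                outer = mid
        return inner

    return endpoint(min(xs)), endpoint(max(xs))
```

Because the interval is obtained by inverting the statistic, it always lies inside the range of the data and need not be symmetric about the sample mean.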
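The Bartlett correction replaces the quantile $\chi^{2}_{1,1-\alpha}$ of the limiting chi-square distribution of the test statistic by $\chi^{2}_{1,1-\alpha}(1 + a/n)$, where $a = \mu_4/(2\mu_2^{2}) - \mu_3^{2}/(3\mu_2^{3})$ and $\mu_k$ is the $k$th central moment of $X$ (see ). In practice, the moments are replaced by their sample counterparts. The Bartlett correction improves the asymptotic error rate for all coverage levels $1-\alpha$. A Bartlett-corrected empirical likelihood confidence interval has the following form:
$$\left\{\mu : -2\log R(\mu) \le \left(1 + \frac{a}{n}\right)\chi^{2}_{1,1-\alpha}\right\},$$
where $R(\mu)$ is the profile empirical likelihood function defined above.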
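A minimal Python sketch of the correction is given below. It assumes the standard Bartlett factor for the mean, $a = \mu_4/(2\mu_2^{2}) - \mu_3^{2}/(3\mu_2^{3})$, estimated by plugging in sample central moments; the function names and the default critical value (the 0.95 quantile of the chi-square distribution with one degree of freedom) are illustrative choices.

```python
def bartlett_coefficient(xs):
    """Plug-in estimate of a = m4 / (2 m2^2) - m3^2 / (3 m2^3) from central moments."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / (2.0 * m2 ** 2) - m3 ** 2 / (3.0 * m2 ** 3)


def bartlett_critical_value(xs, crit=3.841459):
    """Inflate the uncorrected critical value by the factor (1 + a / n)."""
    return crit * (1.0 + bartlett_coefficient(xs) / len(xs))
```

The corrected critical value is then used in place of the uncorrected quantile when the empirical likelihood ratio statistic is inverted.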
2.3. Simulation Study
Our simulation study is based on independent trials. In every trial, we generated random variables from the negative binomial distribution () with different sample sizes , , and . To simulate the data, we used the command rnegbin from the package MASS in the R statistical programming language.
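For readers working outside R, negative binomial samples can be generated with the Python standard library alone. The sketch below uses the gamma and Poisson mixture representation and assumes the mean and dispersion parametrization with mean $\mu$ and variance $\mu + \mu^{2}/\theta$; the function names are hypothetical, and this is not the code used in the study.

```python
import math
import random


def poisson_knuth(lam, rng):
    """One Poisson(lam) draw by Knuth's multiplication method (moderate lam only)."""
    if lam <= 0.0:
        return 0
    threshold = math.exp(-lam)  # underflows for very large lam, so keep lam moderate
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_negbin(n, mu, theta, rng=None):
    """Draw n negative binomial variates with mean mu and variance mu + mu^2/theta.

    Each draw is Poisson(lam) with lam ~ Gamma(shape=theta, scale=mu/theta).
    """
    rng = rng or random.Random()
    scale = mu / theta
    return [poisson_knuth(rng.gammavariate(theta, scale), rng) for _ in range(n)]
```

Passing a seeded `random.Random` instance makes a simulation run reproducible.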
In Table 1, the empirical coverage accuracy for nominal confidence intervals for the mean is evaluated for the following methods: the $t$-test (), the empirical likelihood method (), the empirical likelihood method with the Bartlett correction (), the empirical likelihood method with the $F$ distribution calibration (), and the empirical likelihood method with the $F$ distribution calibration and the Bartlett correction (). For comparison, we have also simulated the coverage accuracy of several bootstrap resampling procedures: the percentile (), normal (), basic (), and studentized () bootstrap methods. The package boot and the command boot.ci are used to construct the confidence intervals for all of these bootstrap methods. Although only the unbounded version of Bernstein’s inequality is valid for the negative binomial distribution, we also apply the bounded version in order to see the possible effects of violating this assumption. In Table 2, we present the average confidence interval lengths for the same methods as in Table 1.
For smaller values, the degree of skewness of the negative binomial distribution is higher. Therefore the coverage accuracy increases along with the increase of the parameter . From the results from Tables 1 and 2, we can see that the coverage accuracy is higher for the confidence intervals based on the bounded Bernstein’s inequality than for those using the unbounded version. However, the resulting confidence intervals have larger average length when the bounded version is used.
Comparing the empirical likelihood and the bootstrap methods, we can see that even the uncorrected empirical likelihood method works better than the bootstrap methods (the only exception is the studentized bootstrap, which also has one of the widest intervals). Moreover, the average lengths of the empirical likelihood based confidence intervals are only negligibly larger than those of the bootstrap methods.
When the parameter and data become much more asymmetric, the exponential type inequalities give the best coverage. However, as they are exact methods, for greater values, the empirical coverage becomes much larger than the nominal confidence level. When using the Bartlett correction and the calibration with the $F$ distribution, the empirical likelihood method provides much better coverage accuracy than the $t$-test and the bootstrap methods.
3. Dependent Case
3.1. Blockwise Empirical Likelihood for the Mean
In practice, the assumption of independence may be invalid in many situations, and by relaxing this strong assumption, we extend the scope of our research. As first shown by Kitamura , the empirical likelihood method can be applied not only for independent but also for stationary weakly dependent data characterized by some mixing coefficients.
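Let $\{X_t\}_{t \in \mathbb{Z}}$ be a real-valued strictly stationary process defined on a probability space $(\Omega, \mathcal{F}, P)$. For any two $\sigma$-fields $\mathcal{A}$ and $\mathcal{B}$, define the following coefficient of dependence:
$$\alpha(\mathcal{A}, \mathcal{B}) = \sup\left|P(A \cap B) - P(A)P(B)\right|,$$
where the supremum is taken over all $A \in \mathcal{A}$ and $B \in \mathcal{B}$. For $n \ge 1$, define $\alpha(n) = \alpha(\mathcal{F}_{-\infty}^{0}, \mathcal{F}_{n}^{\infty})$, where $\mathcal{F}_{a}^{b}$ denotes the $\sigma$-field generated by $\{X_t, a \le t \le b\}$. The process $\{X_t\}$ is called strongly mixing or $\alpha$-mixing (see, e.g., ) if $\alpha(n) \to 0$ when $n \to \infty$.
Further, we also define another coefficient for any two $\sigma$-fields $\mathcal{A}$ and $\mathcal{B}$ as follows:
$$\beta(\mathcal{A}, \mathcal{B}) = \sup \frac{1}{2}\sum_{i=1}^{I}\sum_{j=1}^{J}\left|P(A_i \cap B_j) - P(A_i)P(B_j)\right|,$$
where the supremum is taken over all partitions $\{A_1, \ldots, A_I\}$ and $\{B_1, \ldots, B_J\}$ of $\Omega$ such that $A_i \in \mathcal{A}$ for all $i$ and $B_j \in \mathcal{B}$ for all $j$. Define $\beta(n) = \beta(\mathcal{F}_{-\infty}^{0}, \mathcal{F}_{n}^{\infty})$, when $n \ge 1$. The process $\{X_t\}$ is called absolutely regular or $\beta$-mixing if $\beta(n) \to 0$ when $n \to \infty$.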
Under certain (usually mild) conditions, ARMA, GARCH, nonlinear time series models, and Markov chains are $\beta$-mixing processes (see, e.g., ). It is important to note that if a process is $\beta$-mixing, then it is also $\alpha$-mixing.
Further, we will use the notation and the results from two papers: Zhang et al.  and Kitamura . Let be a real-valued function such that , where . Let and be the integers depending on , , , and as . Set , where is the integer part of . The blocks of consecutive observations will be used for the estimation. Note that is the block length and is the separation between the block starting points. Furthermore, we will use nonoverlapping blocks with .
In order to address the dependence among the observations, Kitamura proposed to apply the profile empirical likelihood method to the blockwise sample instead of . More specifically, Kitamura  defined the blockwise empirical likelihood function for as which generates the blockwise empirical likelihood ratio statistic Under some regularity conditions, Kitamura showed that also has a limiting $\chi^{2}$ distribution.
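The blocking step itself is simple to illustrate. The following Python sketch forms the means of blocks of consecutive observations; the parameter names `block_len` (block length) and `step` (separation between block starting points) are hypothetical, and setting them equal gives nonoverlapping blocks. Under these assumptions, the block means play the role of the blockwise sample to which an ordinary empirical likelihood routine for the mean is then applied.

```python
def block_means(xs, block_len, step):
    """Means of blocks of block_len consecutive observations, starting every step points."""
    n = len(xs)
    if block_len < 1 or step < 1 or block_len > n:
        raise ValueError("need 1 <= block_len <= len(xs) and step >= 1")
    n_blocks = (n - block_len) // step + 1  # integer part of (n - block_len)/step, plus 1
    return [
        sum(xs[i * step : i * step + block_len]) / block_len
        for i in range(n_blocks)
    ]
```

For example, `block_means([1, 2, 3, 4, 5, 6], 2, 2)` yields three nonoverlapping block means, while a smaller `step` produces overlapping blocks.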
The coverage error of confidence intervals can be improved up to the order by using the Bartlett correction; that is, where and denotes the th moment of . This rate is slower than the rate obtained for the independent data due to the nonparametric treatment of dependence.
3.2. Exponential Type Inequalities
In this section, we introduce several exponential type inequalities for $\alpha$- and $\beta$-mixing sequences and discuss their applicability in practice.
Theorem 4 (Doukhan ). Let be a zero-mean real-valued process such that , for all and for all . Then for each , , and ,
Theorem 5 (Bosq ). Let be a zero-mean real-valued process such that . Then, for each integer and each ,
Both inequalities involve mixing coefficients, which in practice need to be estimated. Recently McDonald et al. proposed a method for estimating the $\beta$-mixing coefficients that uses a histogram estimator for stationary time series data. To our knowledge, this is the first result in this area that allows us to estimate such general mixing coefficients. The estimator has the following form: where is the histogram estimator of the joint density of observations and is the 2-dimensional histogram estimator of the joint density of the two sets of observations separated by time points. Typically mixing coefficients decay exponentially fast; therefore, it is possible to get reasonable estimates only for small values of .
In a recent paper, Merlevède et al.  obtained the exponential type inequality depending on the following condition on a strong mixing coefficient; that is, where is some constant. Here, we present the inequality given in Corollary 12 by Merlevède et al. .
Corollary 6 (Merlevède et al. ). Let be a sequence of centred real-valued random variables. Suppose that the sequence satisfies (22) and that there exists a positive such that . For all and , where , and is defined in (22).
Note that the inequality (23) only depends on the value of the constant . Using the $\beta$-mixing estimator (21) from McDonald et al., we propose to estimate by the linear regression estimator of the slope from the equation where is some constant. The inequalities defined in Theorems 4 and 5 can be used for practical applications after the estimation of , by exploiting the relation .
3.3. Simulation Study
We simulate from a stationary AR process defined as where is an innovation process which is weakly stationary with mean and autocovariance if and otherwise, and is the coefficient of the process. The data are generated with a marginal negative binomial distribution using the package gsarima. We used the command garsim and selected the parameter link = "identity". We used different sample sizes and selected different values for the parameter . The minimal value of was set equal to for the algorithm used by the command garsim.
In Table 3, we present the empirical coverage accuracies for the confidence intervals constructed using the blockwise empirical likelihood method with and without the Bartlett correction. We report the corresponding average lengths of the confidence intervals in Table 4. As in the independent case, the Bartlett correction provides some improvement in all cases. Furthermore, the results strongly depend on the block length parameter .
A general autoregressive process AR has the mixing property if the process defined in (26) has a continuous marginal density (see ). Thus, it might not be appropriate to use exponential type inequalities in the case of the negative binomial distribution. We performed more simulations using a continuous distribution such as the normal distribution. In this case, the confidence intervals using Bernstein type inequalities typically are too large in comparison with those produced by the blockwise empirical likelihood method.
Finally, we analyze the serial analysis of gene expression (SAGE) data from Shilane et al. This technique is used in molecular biology to estimate the relative abundance of messenger ribonucleic acid (mRNA) molecules based upon the frequency of the corresponding 14 base pair tag sequences that are extracted from a cell. Because the cost of sequencing can be quite high, the sample size is often quite small for such data.
Shilane et al. picked the 20 most frequent tags from the whole data set and constructed the confidence intervals for the mean of the corresponding counts using several methods: bounded and unbounded Bernstein type inequalities, $\chi^{2}$, Gamma, Wald, $t$-test, and bias corrected and accelerated bootstrap methods. Additionally, we report the confidence intervals obtained by the empirical likelihood method in Table 5.
Shilane et al. outlined some strengths and weaknesses of the methods from Table 5. Both the intervals based on the bounded Bernstein inequality and the Wald method include a range of negative numbers as possible values for the mean, which is unreasonable for the SAGE data. By contrast, the $\chi^{2}$, Gamma, bootstrap, and empirical likelihood methods give only nonnegative values.
All the methods used in Table 5 require the independence of the data. However, if we look at the whole data set, there are significant correlations (see Figure 1). Still in this case we can use the blockwise empirical likelihood method for statistical inference (see Table 6).
4. Conclusions
We conclude that wide confidence intervals are typically obtained when Bernstein type inequalities are used. This method can still provide good results in highly skewed situations when the sample sizes are small. For independent observations, the empirical likelihood method is a good alternative not only to the classical $t$-test but also to the bootstrap methods. The Bartlett correction and the calibration using the $F$ distribution can provide a significant improvement for small sample sizes. For dependent data, it is possible in principle to use Bernstein type inequalities for inference on the mean, but there are two obstacles. First, for correlated time series data, larger sample sizes are usually necessary. Second, for the negative binomial distribution, these inequalities are not valid for ARMA type processes. A reasonable alternative is to use the blockwise empirical likelihood method.
Conflict of Interests
The authors of the paper do not have a direct financial relation with any commercial entity mentioned in the paper that might lead to a conflict of interests for any of the authors.
Acknowledgments
Valeinis acknowledges partial support from the Project 2009/0223/1DP/126.96.36.199.0/09/APIA/VIAA/008 of the European Social Fund, and Vucane acknowledges support from ESF Grant No. 2009/0138/1DP/188.8.131.52.2/09/IPIA/VIAA/004.
References
A. DasGupta, Asymptotic Theory of Statistics and Probability, Springer Texts in Statistics, Springer, New York, NY, USA, 2008.
P. Hall and B. La Scala, “Methodology and algorithms of empirical likelihood,” International Statistical Review, vol. 58, pp. 109–127, 1990.
A. B. Owen, Empirical Likelihood, Chapman & Hall/CRC, New York, NY, USA, 2001.
P. Doukhan, P. Massart, and E. Rio, “Invariance principles for absolutely regular empirical processes,” Annales de l'Institut Henri Poincaré. Probabilités et Statistiques, vol. 31, no. 2, pp. 393–427, 1995.
F. Merlevède, M. Peligrad, and E. Rio, “Bernstein inequality and moderate deviations under strong mixing conditions,” in High Dimensional Probability V: The Luminy Volume, vol. 5 of Inst. Math. Stat. Collect., pp. 273–292, Inst. Math. Statist., Beachwood, Ohio, USA, 2009.
D. J. McDonald, C. R. Shalizi, and M. Schervish, “Estimating beta-mixing coefficients,” in Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, G. Gordon, D. Dunson, and M. Dudík, Eds., JMLR W&CP, 2011.
J. M. Hilbe, Negative Binomial Regression, Cambridge University Press, Cambridge, UK, 2007.
R. C. Bradley, Introduction to Strong Mixing Conditions, vol. 1,2,3, Kendrick Press, Heber City, Utah, USA, 2007.
P. Doukhan, Mixing: Properties and Examples, vol. 85 of Lecture Notes in Statistics, Springer, New York, NY, USA, 1994.
D. Bosq, Nonparametric Statistics for Stochastic Processes, vol. 110 of Lecture Notes in Statistics, Springer, New York, NY, USA, 1996.