Journal of Probability and Statistics

Volume 2015, Article ID 420483, 9 pages

http://dx.doi.org/10.1155/2015/420483

## Comparison of the Frequentist MATA Confidence Interval with Bayesian Model-Averaged Confidence Intervals

Department of Mathematics and Statistics, University of Otago, Dunedin 9054, New Zealand

Received 1 June 2015; Revised 22 July 2015; Accepted 13 September 2015

Academic Editor: Shesh N. Rai

Copyright © 2015 Daniel Turek. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Model averaging is a technique used to account for model uncertainty, in both Bayesian and frequentist multimodel inferences. In this paper, we compare the performance of model-averaged Bayesian credible intervals and frequentist confidence intervals. Frequentist intervals are constructed according to the model-averaged tail area (MATA) methodology. Differences between the Bayesian and frequentist methods are illustrated through an example involving cloud seeding. The coverage performance and interval width of each technique are then studied using simulation. A frequentist MATA interval performs best in the normal linear setting, while Bayesian credible intervals yield the best coverage performance in a lognormal setting. The use of a data-dependent prior probability for models improved the coverage of the model-averaged Bayesian interval, relative to that using uniform model prior probabilities. Data-dependent model prior probabilities are philosophically controversial in Bayesian statistics, and our results suggest that their use is beneficial when model averaging.

#### 1. Introduction

Historically, statistical inference has been based on a single model selected from among a set of predetermined candidate models, with no allowance made for model uncertainty. This process of model selection has been shown to produce biased estimators and result in the incorrect calculation of standard error terms [1–4]. Recently, model averaging has gained popularity as a technique to incorporate model uncertainty into the process of inference [5–7]. The use of model averaging has been studied in a variety of settings (e.g., [8, 9]), where it generally exhibits favorable results relative to traditional model selection.

Model averaging is a natural extension in the Bayesian paradigm, where the choice of model is introduced as a discrete-valued parameter. A prior probability mass function is specified for this parameter, defining the prior probability of each candidate model. Posterior model probabilities are defined by the posterior distribution of the model parameter, and the posterior distributions for model parameters are not conditional upon a particular model and hence naturally account for model uncertainty [10, 11]. In practice, Bayesian model averaging is achieved by allowing a Gibbs sampler to traverse the augmented parameter space, which generates approximations to the posterior distributions of interest. Facilitated by recent advances in computation, Bayesian model averaging has been widely applied in a variety of application domains (e.g., [12–14]).

In the frequentist setting, a model-averaged estimate is defined as the weighted sum of single-model estimates: $\hat{\theta} = \sum_{i=1}^{R} w_i \hat{\theta}_i$, where $\hat{\theta}_i$ is the estimate under model $M_i$, the model weights $w_i$ are determined from an information criterion such as AIC, and the summation is over the set of $R$ candidate models.

Several approaches to constructing frequentist model-averaged confidence intervals have been suggested. Wald intervals of the form $\hat{\theta} \pm z_{1-\alpha}\,\widehat{\mathrm{se}}(\hat{\theta})$, where $z_{1-\alpha}$ is the $(1-\alpha)$ quantile of the standard normal distribution, rely on accurate estimation of $\mathrm{se}(\hat{\theta})$, the standard error of $\hat{\theta}$. Estimation of this term is complicated by the fact that the model weights and the single-model estimates are all random quantities. Burnham and Anderson [6] have suggested a variety of forms for $\widehat{\mathrm{se}}(\hat{\theta})$, which are studied by Claeskens and Hjort [7] and by Turek and Fletcher [15]. In each of these studies, model-averaged Wald intervals of this form were found to perform poorly in terms of coverage rate.

An alternative methodology for the construction of frequentist model-averaged intervals is proposed by Turek and Fletcher [15]. Here, each confidence limit is defined as the value for which a weighted sum of the resulting single-model Wald interval error rates is equal to the desired error rate. As this involves averaging the “tail areas” of the sampling distributions of single-model estimates, this new construction is called a model-averaged tail area Wald (MATA-Wald) interval. In a simulation study by Turek and Fletcher [15], the MATA-Wald interval outperformed model-averaged intervals of the form $\hat{\theta} \pm z_{1-\alpha}\,\widehat{\mathrm{se}}(\hat{\theta})$. Fletcher and Turek [16] applied the MATA construction to profile likelihood intervals to produce a model-averaged tail area profile likelihood (MATA-PL) interval. Coverage properties of MATA confidence intervals are also studied in Kabaila et al. [17], and a transformed version of the MATA interval was proposed by Yu et al. [18].

In this paper, we compare the performance of model-averaged Bayesian credible intervals and the MATA-Wald and MATA-PL intervals of Turek and Fletcher [15] and Fletcher and Turek [16]. The effect of using various model prior probabilities and parameter prior distributions on Bayesian intervals is considered. We also study the use of several information criteria to calculate frequentist model weights. A theoretical study of the asymptotic properties of these intervals is complicated by the random nature of the model weights. For this reason, we assess the performance of these intervals through a simulation study.

In Section 2, we define the Bayesian and frequentist model-averaged intervals. The differences between these intervals are shown in Section 3, through an example involving cloud seeding. We describe the simulation study used to compare these intervals in Section 4 and present the results of this study in Section 5. We conclude with a discussion in Section 6.

#### 2. Model-Averaged Intervals

Assume a set of $R$ candidate models $\{M_1, \ldots, M_R\}$ exists, where the parameter of interest $\theta$ is common to all models. For data $y$, let model $M_i$ have likelihood function $L_i(\theta, \lambda_i \mid y)$, parameterized in terms of $\theta$ and the nuisance parameter $\lambda_i$, which may be vector-valued. We now define the Bayesian and frequentist model-averaged intervals for $\theta$.

##### 2.1. Bayesian Interval

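The model-averaged posterior distribution for $\theta$ is

$$p(\theta \mid y) = \sum_{i=1}^{R} p(M_i \mid y)\, p(\theta \mid y, M_i), \tag{1}$$

where $p(\theta \mid y, M_i)$ is the posterior distribution of $\theta$ under model $M_i$ and $p(M_i \mid y)$ is the posterior probability of $M_i$ [10]. An equal-tailed $100(1-2\alpha)\%$ model-averaged Bayesian (MAB) credible interval is defined as the $\alpha$ and $1-\alpha$ quantiles of $p(\theta \mid y)$.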

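Each posterior distribution in (1) may be expressed through integration of the joint posterior, as

$$p(\theta \mid y, M_i) = \int p(\theta, \lambda_i \mid y, M_i)\, d\lambda_i \propto \int L_i(\theta, \lambda_i \mid y)\, p(\theta, \lambda_i \mid M_i)\, d\lambda_i, \tag{2}$$

following Bayes’ theorem, where $p(\theta, \lambda_i \mid M_i)$ is the joint prior distribution for parameters $\theta$ and $\lambda_i$ under $M_i$. The posterior model probabilities in (1) may be expressed as $p(M_i \mid y) \propto p(M_i)\, p(y \mid M_i)$, where $p(M_i)$ is the prior probability of model $M_i$ and $p(y \mid M_i)$ is the integrated likelihood under $M_i$, given by

$$p(y \mid M_i) = \iint L_i(\theta, \lambda_i \mid y)\, p(\theta, \lambda_i \mid M_i)\, d\lambda_i\, d\theta. \tag{3}$$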

Evaluation of the integrals in (2) and (3) is generally difficult in practice, and Markov chain Monte Carlo (MCMC) simulation is used to approximate the posterior distributions of interest. In the multimodel case, this is implemented using the reversible jump MCMC (RJMCMC) algorithm [19].
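The last step of the Bayesian construction can be sketched in a few lines of Python. This is a minimal illustration rather than the RJMCMC procedure used in the paper: it assumes that posterior draws of the parameter of interest are already available separately for each model, and that the log integrated likelihoods have been approximated by some other means. The function names `posterior_model_probs` and `mab_interval` are hypothetical.

```python
import math

def posterior_model_probs(log_marginals, priors):
    """p(M_i | y) proportional to p(M_i) * p(y | M_i), computed on the log scale."""
    log_terms = [math.log(p) + lm for p, lm in zip(priors, log_marginals)]
    top = max(log_terms)  # subtract the maximum for numerical stability
    unnorm = [math.exp(t - top) for t in log_terms]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def mab_interval(draws_by_model, post_probs, alpha):
    """Alpha and 1 - alpha quantiles of the model-averaged posterior.

    Assumption: draws_by_model[i] holds posterior draws of theta under model i.
    Each draw from model i carries mass post_probs[i] / len(draws_by_model[i]),
    so the pooled, weighted draws approximate the mixture distribution.
    """
    weighted = sorted(
        (x, p / len(draws))
        for draws, p in zip(draws_by_model, post_probs)
        for x in draws
    )

    def quantile(q):
        cum = 0.0
        for x, mass in weighted:
            cum += mass
            if cum >= q - 1e-12:  # small tolerance for floating-point sums
                return x
        return weighted[-1][0]

    return quantile(alpha), quantile(1.0 - alpha)
```

Weighting pooled draws avoids resampling and keeps the result deterministic for a given set of draws. In an RJMCMC run the model indicator is sampled alongside the parameters, so the pooled chain already has the mixture as its target and no explicit reweighting is needed.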

##### 2.2. Frequentist Interval

The frequentist MATA intervals are constructed in a manner analogous to Bayesian model averaging. Confidence limits are defined such that the weighted sum of error rates under each single-model interval will produce the desired overall error rate. This utilizes model weights $w_i$, which are derived from an information criterion.

We initially focus on the information criterion $\mathrm{AIC} = -2\log\hat{L} + 2p$ to define model weights, where $\hat{L}$ is the maximized likelihood and $p$ is the number of parameters. Model weights are calculated as $w_i \propto \exp(-\Delta_i/2)$, where $\Delta_i = \mathrm{AIC}_i - \min_j \mathrm{AIC}_j$ and $\mathrm{AIC}_i$ is the value of the information criterion for model $M_i$ [20]. Other choices of information criteria for defining model weights are addressed in the discussion in Section 6.
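The weight calculation can be sketched as follows, assuming the standard AIC definition and weights normalized to sum to one. The function names are hypothetical, and the inputs are the maximized log-likelihood and parameter count of each candidate model.

```python
import math

def aic(max_log_lik, n_params):
    """AIC = -2 log(L_hat) + 2p for a single model."""
    return -2.0 * max_log_lik + 2.0 * n_params

def aic_weights(aic_values):
    """Weights proportional to exp(-Delta_i / 2), normalized to sum to one."""
    best = min(aic_values)
    # Delta_i is each model's AIC difference from the best (smallest) AIC.
    raw = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(raw)
    return [r / total for r in raw]

def model_averaged_estimate(estimates, weights):
    """Weighted sum of the single-model estimates."""
    return sum(w * e for w, e in zip(weights, estimates))
```

Subtracting the smallest AIC before exponentiating leaves the normalized weights unchanged and avoids underflow when the AIC values are large.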

###### 2.2.1. MATA-Wald Interval

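In the normal linear model, the confidence limits $\theta_L$ and $\theta_U$ of a single-model $100(1-2\alpha)\%$ Wald interval for $\theta$ satisfy the equations

$$1 - F_{\nu}(t_L) = \alpha, \qquad F_{\nu}(t_U) = \alpha, \tag{4}$$

where $F_{\nu}(\cdot)$ is the distribution function of the $t$-distribution with $\nu$ degrees of freedom, $\nu$ is the error degrees of freedom associated with the model, $t_L = (\hat{\theta} - \theta_L)/\widehat{\mathrm{se}}(\hat{\theta})$, $t_U = (\hat{\theta} - \theta_U)/\widehat{\mathrm{se}}(\hat{\theta})$, and $\widehat{\mathrm{se}}(\hat{\theta})$ is the estimated standard error of $\hat{\theta}$ [21, 22]. A MATA-Wald interval is constructed using a weighted sum of the single-model error rates. The lower and upper confidence limits of a MATA-Wald interval, $\theta_L$ and $\theta_U$, are defined as the values satisfying

$$\sum_{i=1}^{R} w_i \left\{1 - F_{\nu_i}(t_{L,i})\right\} = \alpha, \qquad \sum_{i=1}^{R} w_i\, F_{\nu_i}(t_{U,i}) = \alpha, \tag{5}$$

where model $M_i$ has $\nu_i$ error degrees of freedom, $t_{L,i} = (\hat{\theta}_i - \theta_L)/\widehat{\mathrm{se}}(\hat{\theta}_i)$, $t_{U,i} = (\hat{\theta}_i - \theta_U)/\widehat{\mathrm{se}}(\hat{\theta}_i)$, and $\hat{\theta}_i$ is the estimate of $\theta$ under model $M_i$.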
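As a consistency check on the construction above, consider the special case of a single candidate model with weight one. The derivation below is a sketch under assumed notation: $\alpha$ is the error rate in each tail, $t_L$ and $t_U$ are the studentized distances from the estimate $\hat{\theta}$ to the lower and upper limits, and $t_{\nu,1-\alpha}$ denotes the $(1-\alpha)$ quantile of the $t$-distribution with $\nu$ degrees of freedom. In this case the weighted tail-area equations collapse to the usual closed-form Wald limits.

```latex
% Single-model special case: R = 1 and w_1 = 1.
\begin{aligned}
1 - F_{\nu}(t_L) = \alpha
  &\;\Longrightarrow\;
  t_L = \frac{\hat{\theta} - \theta_L}{\widehat{\mathrm{se}}(\hat{\theta})} = t_{\nu,\,1-\alpha}
  \;\Longrightarrow\;
  \theta_L = \hat{\theta} - t_{\nu,\,1-\alpha}\,\widehat{\mathrm{se}}(\hat{\theta}), \\
F_{\nu}(t_U) = \alpha
  &\;\Longrightarrow\;
  t_U = \frac{\hat{\theta} - \theta_U}{\widehat{\mathrm{se}}(\hat{\theta})} = -t_{\nu,\,1-\alpha}
  \;\Longrightarrow\;
  \theta_U = \hat{\theta} + t_{\nu,\,1-\alpha}\,\widehat{\mathrm{se}}(\hat{\theta}).
\end{aligned}
```

With more than one model no closed form is available in general, and each limit is found by solving its tail-area equation numerically.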

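The MATA-Wald interval may be generalized to nonnormal data, assuming that we can specify a transformation $\phi = g(\theta)$ for which the sampling distribution of $\hat{\phi}_i = g(\hat{\theta}_i)$ is approximately normal when $M_i$ is true. For example, $g(\theta) = \operatorname{logit}(\theta)$ when $\theta$ is a probability. In this case, the MATA-Wald confidence limits $\theta_L$ and $\theta_U$ are the values satisfying the pair of equations

$$\sum_{i=1}^{R} w_i \left\{1 - \Phi(z_{L,i})\right\} = \alpha, \qquad \sum_{i=1}^{R} w_i\, \Phi(z_{U,i}) = \alpha, \tag{6}$$

where $\Phi(\cdot)$ is the standard normal distribution function, $z_{L,i} = (\hat{\phi}_i - \phi_L)/\widehat{\mathrm{se}}(\hat{\phi}_i)$, $z_{U,i} = (\hat{\phi}_i - \phi_U)/\widehat{\mathrm{se}}(\hat{\phi}_i)$, $\phi_L = g(\theta_L)$, and $\phi_U = g(\theta_U)$, as set out by Turek and Fletcher [15].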
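A minimal sketch of solving these tail-area equations numerically is given below, assuming the parameter is a probability and the logit transformation is used. The function `mata_wald_logit`, the bisection helper, and the bracketing rule (ten standard errors beyond the most extreme estimate) are illustrative assumptions, not the implementation of Turek and Fletcher [15]. Standard errors are supplied on the logit scale, and `alpha` is the error rate in each tail.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal distribution function

def logit(p):
    return math.log(p / (1.0 - p))

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def bisect(f, lo, hi, tol=1e-10):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) differ in sign."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def mata_wald_logit(p_hats, ses_logit, weights, alpha):
    """MATA-Wald limits for a probability, solved on the logit scale."""
    phis = [logit(p) for p in p_hats]
    span = 10.0 * max(ses_logit)  # assumed bracket width
    lo, hi = min(phis) - span, max(phis) + span

    def lower_tail(phi):
        # Weighted sum of upper tail areas beyond z_{L,i}, minus alpha.
        return sum(w * (1.0 - PHI((ph - phi) / se))
                   for w, ph, se in zip(weights, phis, ses_logit)) - alpha

    def upper_tail(phi):
        # Weighted sum of lower tail areas below z_{U,i}, minus alpha.
        return sum(w * PHI((ph - phi) / se)
                   for w, ph, se in zip(weights, phis, ses_logit)) - alpha

    phi_lower = bisect(lower_tail, lo, hi)
    phi_upper = bisect(upper_tail, lo, hi)
    return expit(phi_lower), expit(phi_upper)
```

Both tail-area functions are monotone in the candidate limit, so a simple bracketing root finder is sufficient. The limits are back-transformed to the probability scale only at the end.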

###### 2.2.2. MATA Profile Likelihood Interval

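Assuming a single model with likelihood function $L(\theta, \lambda \mid y)$, the limits $\theta_L$ and $\theta_U$ of a profile likelihood interval for $\theta$ satisfy

$$1 - \Phi\left(r(\theta_L)\right) = \alpha, \qquad \Phi\left(r(\theta_U)\right) = \alpha, \tag{7}$$

where $r(\theta)$ is the signed likelihood ratio statistic, defined as

$$r(\theta) = \operatorname{sign}(\hat{\theta} - \theta)\sqrt{2\left\{\log L_p(\hat{\theta}) - \log L_p(\theta)\right\}}, \tag{8}$$

and $L_p(\theta) = \max_{\lambda} L(\theta, \lambda \mid y)$ is the profile likelihood function for $\theta$ [23, pp. 126–129]. The limits $\theta_L$ and $\theta_U$ of the MATA-PL interval are defined as the values which satisfy

$$\sum_{i=1}^{R} w_i \left\{1 - \Phi\left(r_i(\theta_L)\right)\right\} = \alpha, \qquad \sum_{i=1}^{R} w_i\, \Phi\left(r_i(\theta_U)\right) = \alpha, \tag{9}$$

where $r_i(\theta)$ is defined in terms of the corresponding likelihood function $L_i(\theta, \lambda_i \mid y)$, as in (8), and as described by Fletcher and Turek [16].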
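The MATA-PL construction can be illustrated with a deliberately simple, hypothetical setting that is not taken from the paper: Poisson counts in two groups, where the parameter of interest is the mean of group 1. Under one model both groups share a common mean, so the pooled data inform the parameter. Under the other the means differ, and the group 2 mean is a nuisance parameter that drops out of the profile likelihood. Each model is therefore summarized by a (total count, sample size) pair. The function names, search bounds, and model weights below are assumptions, and `alpha` is the error rate in each tail.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal distribution function

def poisson_loglik(theta, total, n):
    """Poisson log-likelihood in the mean theta, up to an additive constant."""
    return total * math.log(theta) - n * theta

def signed_lr(theta, total, n):
    """Signed likelihood ratio statistic r(theta); assumes total > 0."""
    theta_hat = total / n
    drop = poisson_loglik(theta_hat, total, n) - poisson_loglik(theta, total, n)
    return math.copysign(math.sqrt(max(2.0 * drop, 0.0)), theta_hat - theta)

def bisect(f, lo, hi, tol=1e-10):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) differ in sign."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def mata_pl(summaries, weights, alpha, lo=1e-6, hi=1e3):
    """MATA-PL limits; summaries holds one (total, n) pair per model.

    Assumption: both limits lie inside the search bounds (lo, hi).
    """
    def lower_tail(theta):
        return sum(w * (1.0 - PHI(signed_lr(theta, s, n)))
                   for w, (s, n) in zip(weights, summaries)) - alpha

    def upper_tail(theta):
        return sum(w * PHI(signed_lr(theta, s, n))
                   for w, (s, n) in zip(weights, summaries)) - alpha

    return bisect(lower_tail, lo, hi), bisect(upper_tail, lo, hi)
```

For models with nuisance parameters that do not drop out, `signed_lr` would need an inner maximization over those parameters at each candidate value of the parameter of interest. The outer root-finding step is unchanged.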

#### 3. Example

We use a study of cloud seeding to illustrate the differences between these methods of model averaging. There is clear evidence that seeding clouds causes an increase in the mean volume of rainfall [24–26]. However, the size of this effect may depend on the pattern of motion of the clouds. As rainfall volume has agricultural impacts, the results may affect the practicality and focus of cloud seeding operations. The data we consider come from testing conducted by the Experimental Meteorology Laboratory in Florida, USA. Total rainfall volume was measured for 27 stationary clouds, 16 of which were seeded and 11 of which were unseeded. The full data set appears in Biondini [27], and the subset relevant to our analysis is presented in Table 1.