Analysis of the Behrens-Fisher Problem Based on Bayesian Evidence
The Behrens-Fisher problem concerns the inferences for the difference between the means of two normal populations without making any assumption about the variances. Although the problem has been extensively studied in the literature, researchers cannot agree on its solution at present. In this paper, we propose a new method for dealing with the Behrens-Fisher problem in the Bayesian framework. The Bayesian evidence for testing the equality of two normal means and a credible interval at a specified level for the difference between the means are derived. Simulation studies are carried out to evaluate the performance of the provided Bayesian evidence.
1. Introduction
The Behrens-Fisher problem may arise in the comparison of two treatments, products, and so forth. It concerns comparing the means of two normal distributions whose variances are unknown. Suppose that $X_1, \ldots, X_{n_1}$ and $Y_1, \ldots, Y_{n_2}$ are two independent random samples from two normal populations $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$, respectively, where both $\sigma_1^2$ and $\sigma_2^2$ are completely unspecified. We are interested in testing the hypothesis $H_0: \mu_1 = \mu_2$ and giving an interval estimate for the difference between the two means, $\delta = \mu_1 - \mu_2$.
The difficulty with the Behrens-Fisher problem is that the standard classical frequentist evidence is not available because nuisance parameters are present. Tsui and Weerahandi introduced the concept of the generalized $p$ value to deal with nuisance parameters in testing hypotheses. If the corresponding sample means and sample variances are denoted by $\bar{x}$, $\bar{y}$ and $s_1^2$, $s_2^2$, respectively, a generalized frequentist evidence for testing $H_0: \mu_1 = \mu_2$ can be formulated by the approach of the generalized $p$ value as where is an $F$-variable with 1 and degrees of freedom and is a Beta-variable with parameters and that is independent of , , and . This generalized frequentist solution is formally equivalent to the Bayesian solution given by Jeffreys or the fiducial solution given by Wallace. Meng introduced the concept of the posterior predictive $p$ value and provided posterior predictive evidence. In the case of the Behrens-Fisher problem, this test is formulated as where is an $F$-variable with and degrees of freedom, is a Beta-variable with parameters and that is independent of , and is a variable with a “combined ” distribution:
Behrens gave a confidence interval for the difference between the two means in a testing context of $H_0: \mu_1 = \mu_2$ against $H_1: \mu_1 \neq \mu_2$ based on the pivotal quantity
$$T = \frac{(\bar{x} - \bar{y}) - (\mu_1 - \mu_2)}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}, \tag{4}$$
where $\bar{x}$, $\bar{y}$ are the sample means, $s_1^2$, $s_2^2$ are the sample variances, and $n_1$, $n_2$ are the sample sizes. Bartlett revealed, from a frequentist perspective, that the coverage probability of the confidence interval given by Behrens differs from the specified confidence coefficient. Fisher derived, by the method of fiducial inference, a fiducial interval for $\mu_1 - \mu_2$ with a specified fiducial level. Neyman illustrated by calculation that an interval estimator with a fiducial level of $1 - \alpha$ is not necessarily a confidence interval with a confidence coefficient of $1 - \alpha$. Welch [8, 9] gave approximate confidence intervals, also constructed in a testing context, based on the pivotal quantity $T$. In the Bayesian framework, Jeffreys constructed a Bayesian credible interval based on the objective prior
$$\pi(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2) \propto \frac{1}{\sigma_1^2 \sigma_2^2}. \tag{5}$$
This interval is algebraically equivalent to the fiducial interval of Fisher.
In this paper, we derive the Bayesian evidence for the Behrens-Fisher problem using the procedure in Yin  for testing point null hypotheses. Based on the provided Bayesian evidence, a Bayesian credible interval at a specified credible level for the difference of the means is derived in a Bayesian testing context.
This paper is organized as follows. In Section 2, we give the main results of the Bayesian analysis of the Behrens-Fisher problem concerning the testing and interval estimation of the difference of two normal means with the variances completely unknown. Some conclusions and discussions are given in Section 3.
2. Main Results
2.1. Bayesian Evidence for the Behrens-Fisher Problem
Yin introduced a Bayesian measure of evidence for testing point null hypotheses of the form
$$H_0: \theta = \theta_0 \quad \text{versus} \quad H_1: \theta \neq \theta_0. \tag{6}$$
Let $X = (X_1, \ldots, X_n)$ be a random sample from a distribution with density $f(x \mid \theta)$, where $\theta$ is an unknown element of the parameter space $\Theta$. The Bayesian evidence against the null hypothesis $H_0$ based on a prior $\pi(\theta)$ is given by
$$P^B(x) = P\big( |\theta - E(\theta \mid x)| \geq |\theta_0 - E(\theta \mid x)| \,\big|\, x \big), \tag{7}$$
where $E(\theta \mid x)$ is the posterior expectation of $\theta$ under the prior $\pi(\theta)$ and the probability is taken over the posterior distribution of $\theta$. A smaller $P^B(x)$ means stronger evidence against the null hypothesis $H_0$. In his work, Yin illustrated that the Bayesian evidence given by (7) under the Jeffreys noninformative prior is equivalent to the corresponding frequentist evidence for many classical testing situations and showed that Lindley's paradox in Lindley can be avoided by this Bayesian method of testing point null hypotheses.
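As a minimal numerical sketch of the evidence in (7), consider an illustrative case that is not taken from the paper: a normal sample with known standard deviation `sigma` and a flat prior on the mean, so that the posterior of the mean is normal, centered at the sample mean, with standard deviation `sigma / sqrt(n)`. The observations and the function names `evidence_normal_mean` and `z_test_p_value` are assumptions made only for this example. Under these assumptions the Monte Carlo estimate of the Bayesian evidence should agree with the two-sided z-test p value up to simulation error.

```python
import math
import random
from statistics import NormalDist, mean


def evidence_normal_mean(x, theta0, sigma, draws=50000, seed=1):
    """Monte Carlo estimate of P(|theta - E(theta|x)| >= |theta0 - E(theta|x)| | x)."""
    rng = random.Random(seed)
    post_mean = mean(x)                    # posterior expectation under a flat prior
    post_sd = sigma / math.sqrt(len(x))    # posterior standard deviation (sigma known)
    threshold = abs(theta0 - post_mean)
    hits = 0
    for _ in range(draws):
        theta = rng.gauss(post_mean, post_sd)   # one draw from the posterior
        if abs(theta - post_mean) >= threshold:
            hits += 1
    return hits / draws


def z_test_p_value(x, theta0, sigma):
    """Exact two-sided z-test p value for H0: theta = theta0 with known sigma."""
    z = abs(mean(x) - theta0) * math.sqrt(len(x)) / sigma
    return 2.0 * (1.0 - NormalDist().cdf(z))


# Made-up observations, used only to exercise the two functions.
sample = [0.3, 1.1, 0.8, 1.4, 0.2, 0.9]
print(evidence_normal_mean(sample, theta0=0.0, sigma=1.0))
print(z_test_p_value(sample, theta0=0.0, sigma=1.0))
```

The two printed values should differ only by Monte Carlo noise, which shrinks as `draws` grows.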
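Now consider the Behrens-Fisher problem of testing hypotheses
$$H_0: \mu_1 = \mu_2 \quad \text{versus} \quad H_1: \mu_1 \neq \mu_2. \tag{8}$$
Note that, with $\delta = \mu_1 - \mu_2$, (8) can be reformulated as
$$H_0: \delta = 0 \quad \text{versus} \quad H_1: \delta \neq 0.$$
The posterior distribution for $\delta$ under the objective prior (5) can be obtained as
$$\delta \mid x, y \;\sim\; (\bar{x} - \bar{y}) + \frac{s_1}{\sqrt{n_1}}\, T_1 - \frac{s_2}{\sqrt{n_2}}\, T_2,$$
where $\bar{x}$, $\bar{y}$ are the sample means, $s_1^2$, $s_2^2$ are the sample variances (with divisors $n_1 - 1$ and $n_2 - 1$), and $T_1$ and $T_2$ are two independent $t$-variables with $n_1 - 1$ and $n_2 - 1$ degrees of freedom, respectively. Since the posterior expectation of $\delta$ is $\bar{x} - \bar{y}$, the Bayesian evidence under the objective prior (5) can be formulated as
$$P^B(x, y) = P\big( |\delta - (\bar{x} - \bar{y})| \geq |\bar{x} - \bar{y}| \,\big|\, x, y \big) = P\left( \left| \frac{s_1}{\sqrt{n_1}}\, T_1 - \frac{s_2}{\sqrt{n_2}}\, T_2 \right| \geq |\bar{x} - \bar{y}| \right),$$
where the first probability is taken over the posterior distribution of $\delta$ and the second one is taken over the two independent $t$-variables $T_1$ and $T_2$.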
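The evidence described above can be approximated by simulating the two independent t-variables directly. The following is a minimal sketch rather than the authors' implementation. It assumes the standard Behrens-Fisher posterior, in which the mean difference equals the difference of sample means plus `s1/sqrt(n1)` times a t-variable with `n1 - 1` degrees of freedom minus `s2/sqrt(n2)` times an independent t-variable with `n2 - 1` degrees of freedom, with sample variances using divisor `n - 1`. The names `student_t` and `bayesian_evidence`, the optional `delta0` argument, and the data are hypothetical.

```python
import math
import random
import statistics


def student_t(df, rng):
    """Draw one Student t variate as Z / sqrt(chi2_df / df)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)   # chi-square with df degrees of freedom
    return z / math.sqrt(chi2 / df)


def bayesian_evidence(x, y, delta0=0.0, draws=20000, seed=0):
    """Monte Carlo estimate of P(|delta - (xbar - ybar)| >= |delta0 - (xbar - ybar)| | data)."""
    rng = random.Random(seed)
    n1, n2 = len(x), len(y)
    centre = statistics.mean(x) - statistics.mean(y)   # assumed posterior expectation
    a = statistics.stdev(x) / math.sqrt(n1)            # s1 / sqrt(n1), divisor n1 - 1
    b = statistics.stdev(y) / math.sqrt(n2)            # s2 / sqrt(n2), divisor n2 - 1
    threshold = abs(delta0 - centre)
    hits = 0
    for _ in range(draws):
        deviation = a * student_t(n1 - 1, rng) - b * student_t(n2 - 1, rng)
        if abs(deviation) >= threshold:
            hits += 1
    return hits / draws


# Made-up samples for illustration only; they are not the data in the paper's tables.
x_obs = [10.1, 9.8, 10.3, 10.0, 9.9]
y_obs = [5.2, 4.9, 5.1, 5.0, 4.8, 5.3]
print(bayesian_evidence(x_obs, y_obs))   # clearly separated means give evidence near zero
```

With `delta0=0.0` the function estimates the evidence for testing equality of the two means. Other values of `delta0` test whether the mean difference equals that value.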
Now we carry out a simulation study to illustrate the performance of the proposed Bayesian evidence. The simulation results listed in Table 1 show that the proposed evidence is quite reasonable for testing in the Behrens-Fisher problem. For fixed values of and , the larger the difference between the two population means, the smaller the value of the evidence, which means stronger Bayesian evidence for rejecting the null hypothesis of equal means. Moreover, the evidence is more reliable and efficient when the population variances are small. The Bayesian evidence is also very close to the corresponding generalized frequentist evidence in (1) and the posterior predictive evidence in (2).
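A small simulation in the same spirit can be sketched as follows. It does not reproduce the settings or the numbers of Table 1. The sample sizes, standard deviations, mean differences, number of replicates, and the names `student_t`, `evidence`, and `average_evidence` are all assumptions chosen for illustration, and the posterior is again assumed to be the standard Behrens-Fisher form with `n1 - 1` and `n2 - 1` degrees of freedom.

```python
import math
import random
import statistics


def student_t(df, rng):
    """Draw one Student t variate as Z / sqrt(chi2_df / df)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)
    return z / math.sqrt(chi2 / df)


def evidence(x, y, rng, draws=1000):
    """Monte Carlo Bayesian evidence for H0: equal means, given two samples."""
    n1, n2 = len(x), len(y)
    centre = statistics.mean(x) - statistics.mean(y)
    a = statistics.stdev(x) / math.sqrt(n1)
    b = statistics.stdev(y) / math.sqrt(n2)
    hits = 0
    for _ in range(draws):
        if abs(a * student_t(n1 - 1, rng) - b * student_t(n2 - 1, rng)) >= abs(centre):
            hits += 1
    return hits / draws


def average_evidence(mean_diff, sigma1, sigma2, n1, n2, reps=30, seed=0):
    """Average the evidence over simulated data sets with a given true mean difference."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        x = [rng.gauss(mean_diff, sigma1) for _ in range(n1)]
        y = [rng.gauss(0.0, sigma2) for _ in range(n2)]
        total += evidence(x, y, rng)
    return total / reps


# Hypothetical settings: the average evidence should shrink as the true difference grows.
for diff in (0.0, 1.0, 2.0):
    print(diff, average_evidence(diff, sigma1=1.0, sigma2=2.0, n1=10, n2=15))
```

Increasing `reps` and `draws` reduces simulation noise at the cost of run time.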
We now apply this Bayesian evidence for the Behrens-Fisher problem to two examples. The first is included in Lehmann: the driving times from a person's house to his workplace along two different routes were measured and are listed in Table 2. The second is in Ghosh et al., where the data, listed in Table 3, are from a clinical trial conducted by Sahu to compare the improvement score of surgical treatment with that of nonsurgical treatment. If it is assumed that the two independent samples in each of Tables 2 and 3 are drawn from two normal distributions $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$, respectively, and if we are interested in the equality of the two means $\mu_1$ and $\mu_2$, each of these two examples reduces to the Behrens-Fisher problem of testing hypotheses (8). For both data sets, the Bayesian evidence and the corresponding generalized frequentist evidence and posterior predictive evidence are all nearly zero, which is very strong evidence for rejecting the null hypothesis that there is no difference between the two means. This agrees with our intuition from the observed data.
2.2. Bayesian Credible Interval
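Based on the proposed Bayesian evidence, a credible interval for the difference of means at a specified credible level can be constructed in a testing context. For the following hypothesis testing problem of comparing two normal means:
$$H_0: \delta = \delta_0 \quad \text{versus} \quad H_1: \delta \neq \delta_0, \tag{13}$$
where $\delta = \mu_1 - \mu_2$ and the variances are completely unspecified, the Bayesian evidence under the objective prior (5) is
$$P^B(\delta_0) = P\big( |\delta - (\bar{x} - \bar{y})| \geq |\delta_0 - (\bar{x} - \bar{y})| \,\big|\, x, y \big) = P\left( \left| \frac{s_1}{\sqrt{n_1}}\, T_1 - \frac{s_2}{\sqrt{n_2}}\, T_2 \right| \geq |\delta_0 - (\bar{x} - \bar{y})| \right),$$
where the first probability is taken over the posterior distribution of $\delta$ and the second one is taken over two independent $t$-variables $T_1$ and $T_2$ with $n_1 - 1$ and $n_2 - 1$ degrees of freedom, respectively.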
Theorem 1. For the Behrens-Fisher problem, let , , and . For a fixed , if satisfies then one has
By Theorem 1, we know that the credible interval for centered at can be easily obtained by . This is a Bayesian interval obtained in a testing context. Interestingly, the resulting interval by our method is just equivalent to that given by Fisher or Jeffreys.
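In fact, we have another interesting result about the interval estimation of $\delta = \mu_1 - \mu_2$ on the basis of the Bayesian evidence $P^B(\delta_0)$ for testing $H_0: \delta = \delta_0$, which shows that the $1 - \alpha$ credible interval centered at the posterior expectation for the Behrens-Fisher problem can be constructed from the $\alpha/2$ and $1 - \alpha/2$ quantiles of the posterior distribution of $\delta$. We summarize this as the following theorem.
Theorem 2. For the Behrens-Fisher problem, $P^B(\delta_0) \geq \alpha$ yields the $1 - \alpha$ credible interval for $\delta$ centered at the posterior expectation $\bar{x} - \bar{y}$ as follows:
$$[\delta_{\alpha/2},\ \delta_{1-\alpha/2}],$$
where $\delta_{\alpha/2}$ and $\delta_{1-\alpha/2}$ are, respectively, the $\alpha/2$ and $1 - \alpha/2$ quantiles of the posterior distribution of $\delta$.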
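Proof. We first prove that the Bayesian evidence for testing (13) can be expressed as
$$P^B(\delta_0) = 2 \min\big\{ P(\delta \leq \delta_0 \mid x, y),\ P(\delta \geq \delta_0 \mid x, y) \big\}.$$
In fact, if $\delta_0 \geq \bar{x} - \bar{y}$, we have
$$P^B(\delta_0) = P(\delta \geq \delta_0 \mid x, y) + P\big(\delta \leq 2(\bar{x} - \bar{y}) - \delta_0 \mid x, y\big) = 2 P(\delta \geq \delta_0 \mid x, y), \tag{23}$$
where the second equation is due to the fact that the posterior distribution of $\delta$ is symmetric about $\bar{x} - \bar{y}$. Similarly, if $\delta_0 < \bar{x} - \bar{y}$, we have
$$P^B(\delta_0) = 2 P(\delta \leq \delta_0 \mid x, y). \tag{24}$$
By the symmetry of the posterior distribution of $\delta$, $P(\delta \geq \delta_0 \mid x, y) \leq 1/2 \leq P(\delta \leq \delta_0 \mid x, y)$ in the first case and the reverse inequalities hold in the second. Together with (23) and (24), this gives
$$P^B(\delta_0) = 2 \min\big\{ P(\delta \leq \delta_0 \mid x, y),\ P(\delta \geq \delta_0 \mid x, y) \big\}.$$
It then follows that $P^B(\delta_0) \geq \alpha$ if and only if $P(\delta \leq \delta_0 \mid x, y) \geq \alpha/2$ and $P(\delta \geq \delta_0 \mid x, y) \geq \alpha/2$ hold simultaneously, which is equivalent to
$$\delta_{\alpha/2} \leq \delta_0 \leq \delta_{1-\alpha/2},$$
where $\delta_{\alpha/2}$ and $\delta_{1-\alpha/2}$ are the $\alpha/2$ and $1 - \alpha/2$ quantiles of the posterior distribution of $\delta$. Since the posterior of $\delta$ is symmetric about $\bar{x} - \bar{y}$, $[\delta_{\alpha/2},\ \delta_{1-\alpha/2}]$ is a $1 - \alpha$ credible interval centered at $\bar{x} - \bar{y}$. This completes the proof.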
Theorem 2 provides another way of constructing the credible interval for the difference of means. Moreover, it follows from the proof of Theorem 2 that the $1 - \alpha$ credible interval for the difference of means centered at the posterior expectation can be given by the $\alpha/2$ and $1 - \alpha/2$ quantiles of the posterior distribution even when other priors are used, so long as the posterior of the difference is symmetric.
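The quantile construction in Theorem 2 lends itself to a short Monte Carlo sketch: draw from the posterior of the mean difference, sort the draws, and read off the lower and upper tail quantiles. The code below is an illustration under stated assumptions, not the authors' procedure. It assumes the standard Behrens-Fisher posterior with `n1 - 1` and `n2 - 1` degrees of freedom and sample variances with divisor `n - 1`. The names `student_t` and `credible_interval` and the data are hypothetical.

```python
import math
import random
import statistics


def student_t(df, rng):
    """Draw one Student t variate as Z / sqrt(chi2_df / df)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)
    return z / math.sqrt(chi2 / df)


def credible_interval(x, y, alpha=0.05, draws=20000, seed=0):
    """Equal-tailed (1 - alpha) interval from empirical quantiles of posterior draws."""
    rng = random.Random(seed)
    n1, n2 = len(x), len(y)
    centre = statistics.mean(x) - statistics.mean(y)
    a = statistics.stdev(x) / math.sqrt(n1)
    b = statistics.stdev(y) / math.sqrt(n2)
    samples = sorted(
        centre + a * student_t(n1 - 1, rng) - b * student_t(n2 - 1, rng)
        for _ in range(draws)
    )
    lower_index = int(alpha / 2.0 * draws)                          # alpha/2 quantile
    upper_index = max(int((1.0 - alpha / 2.0) * draws) - 1, lower_index)  # 1 - alpha/2 quantile
    return samples[lower_index], samples[upper_index]


# Made-up samples for illustration only.
x_obs = [10.1, 9.8, 10.3, 10.0, 9.9]
y_obs = [5.2, 4.9, 5.1, 5.0, 4.8, 5.3]
print(credible_interval(x_obs, y_obs, alpha=0.05))
```

Because the assumed posterior is symmetric, the midpoint of the returned interval should sit close to the difference of sample means, up to simulation error.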
Now we return to the examples of comparing means of driving time and comparing improvement scores of treatments discussed above. We recommend the credible intervals of and for Lehmann's and Sahu's data, respectively, which are obtained according to our procedure. The recommended intervals are essentially equivalent to the intervals given by the method of Fisher or Jeffreys.
3. Conclusions
We carry out a Bayesian analysis of the Behrens-Fisher problem in this paper. The Bayesian evidence for testing the hypothesis $H_0: \mu_1 = \mu_2$ against $H_1: \mu_1 \neq \mu_2$ is given. Simulation results show that our evidence performs quite well and is very close to the corresponding generalized frequentist evidence and posterior predictive evidence for the Behrens-Fisher problem. Based on the proposed evidence, a method of constructing the credible interval at a specified level for the difference of means is provided in a Bayesian testing context. Interestingly, the credible interval given by our method coincides with that derived by Fisher or Jeffreys. This way of constructing the credible interval via the Bayesian testing evidence is analogous to the way of constructing the confidence interval via the frequentist evidence.
This method of analyzing the Behrens-Fisher problem gives an efficient way of dealing with the nuisance parameters that are the source of the difficulty with this problem. This is because our inferences about $\mu_1 - \mu_2$ are based on the posterior distribution of the parameter of interest, which can be easily obtained in the Bayesian framework even when nuisance parameters are present. Both the Bayesian evidence and the credible interval can be computed quite easily by the Monte Carlo method. Furthermore, even if an informative prior different from that in (5) is used, the corresponding Bayesian evidence and credible intervals can be obtained in the same way. In other words, this method provides an efficient way of combining the information contained in the prior with that contained in the samples. Further research is needed to evaluate the performance of the inferences by the proposed method when an informative prior is introduced.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
The authors thank the editors and reviewers for their kind help and valuable comments that led to significant improvement of this paper. The work was supported by the Foundation for Training Talents of Beijing (Grant no. 19000532377), the Project of Construction of Innovative Teams and Teacher Career Development for Universities and Colleges Under Beijing Municipality (Grant no. IDHT20130505), and the Research Foundation for Youth Scholars of Beijing Technology and Business University (Grant no. QNJJ2012-03).
References
H. Jeffreys, Theory of Probability, Oxford University Press, 3rd edition, 1967.
D. L. Wallace, The Behrens-Fisher and Fieller-Creasy Problems, Edited by R. A. Fisher, Springer, New York, NY, USA, 1980.
B. V. Behrens, “Ein Beitrag zur Fehlerberechnung bei wenige Beobachtungen,” Landwirtschaftliches Jahresbuch, vol. 68, pp. 807–837, 1929.
R. A. Fisher, “The fiducial argument in statistical inference,” The Annals of Eugenics, vol. 11, pp. 141–172, 1935.
B. L. Welch, “The significance of the difference between two means when the population variances are unequal,” Biometrika, vol. 29, pp. 350–362, 1938.
H. Jeffreys, Theory of Probability, Oxford University Press, 1961.
D. V. Lindley, “A statistical paradox,” Biometrika, vol. 44, pp. 187–192, 1957.
E. L. Lehmann, Nonparametrics: Statistical Methods Based on Ranks, Holden-Day, San Francisco, Calif, USA, 1975.
J. K. Ghosh, M. Delampady, and T. Samanta, An Introduction to Bayesian Analysis, Springer, New York, NY, USA, 2006.