Abstract
We propose a statistical method for constructing simultaneous confidence intervals on all linear combinations of means without assuming equal variances, a setting in which the classical Scheffé simultaneous confidence intervals no longer preserve the familywise error rate (FWER). The proposed method is useful when the number of comparisons on linear combinations of means is extremely large. The FWERs of the proposed simultaneous confidence intervals under various configurations of variances are assessed through simulations and are found to preserve the predefined nominal level very well. An example of pairwise comparisons on heteroscedastic means is given to illustrate the proposed method.
1. Introduction
Multiple comparisons on a large number of linear combinations of means are of general interest in many applications. If an inferential procedure depends on the number of comparisons, it becomes increasingly challenging as that number grows. In addition, we often cannot assume that all population variances are equal. Many authors have proposed methods for multiple comparisons on means. Scheffé [1] proposed a method to construct simultaneous confidence intervals for all linear combinations of means while keeping the Type I error under control. Because Scheffé's method covers all possible linear combinations of means, it has an advantage when dealing with a large number of comparisons. Scheffé's simultaneous confidence intervals rest on three major assumptions: (1) the samples are independent, (2) the populations are normally distributed, and (3) the populations have equal variance. The third assumption, often referred to as homoscedasticity, is the most vulnerable. Violation of homoscedasticity often results in inflation of the familywise error rate (FWER). As pointed out by Scheffé [2], his method retains a certain robustness to unequal variances when the group sample sizes are the same. However, the FWER is no longer controlled in situations where both the variances and the sample sizes are unequal. No explicit formula has so far been available for simultaneous confidence intervals on all linear combinations of means in the case of unequal variances.
The problem of comparing two means in the case of unequal population variances is known as the Behrens-Fisher problem [3]. Dunnett [4, 5] and Nel and van der Merwe [6] published simulation-based results assessing different pairwise mean comparison procedures in the unequal variance case. Kim [7] proposed a practical solution to the Behrens-Fisher problem using the geometry of confidence ellipsoids for two mean vectors. Wilcox [8] tackled the Behrens-Fisher problem via trimmed means. Christensen and Rencher [9] compared Type I error rates and power levels in the Behrens-Fisher problem. Fouladi and Yockey [10] conducted a Monte Carlo study to evaluate the performance of tests on means under normality and nonnormality. Hoover [11] discussed behavioral interventions with heterogeneous subgroup effects in clinical trials. In this paper, a method for constructing simultaneous confidence intervals on all linear combinations of means with unequal variances is proposed. Since the number of linear combinations of means is not limited, the proposed method may be used in situations where comparisons on a large number of linear combinations of means are deemed necessary. The proposed simultaneous confidence intervals, which we refer to as the generalized Scheffé confidence intervals, have an explicit form that is similar to that of their classical counterparts. The equal variance assumption is no longer needed. In addition, these simultaneous confidence intervals reduce to the classical Scheffé confidence intervals when all population variances and sample sizes are equal. Most importantly, the proposed simultaneous confidence intervals preserve the FWER in all configurations of variances and sample sizes.
2. Generalized Scheffé Confidence Intervals
Suppose that we have $k$ populations and let $\mu_i$ and $\sigma_i^2$ be the true mean and variance for population $i$, $i = 1, \ldots, k$. Let $n_i$, $\bar{X}_i$, and $S_i^2$ be the sample size, sample mean, and sample variance of the $i$th population. In the case of equal variance among populations, that is, $\sigma_1^2 = \cdots = \sigma_k^2$, Scheffé simultaneous confidence intervals on all linear combinations of means $\sum_{i=1}^{k} c_i \mu_i$ are given by:
$$\sum_{i=1}^{k} c_i \bar{X}_i \pm \sqrt{k F_{\alpha;\,k,\,N-k}}\,\sqrt{\mathrm{MSE} \sum_{i=1}^{k} \frac{c_i^2}{n_i}}, \tag{2.1}$$
where the mean squared error $\mathrm{MSE} = \sum_{i=1}^{k} (n_i - 1) S_i^2 / (N - k)$ is the pooled estimate of the common variance from the $k$ populations; $F_{\alpha;\,k,\,N-k}$ is the upper $\alpha$th quantile of the $F$ distribution with degrees of freedom $k$ and $N - k$; and $N = \sum_{i=1}^{k} n_i$ is the total sample size. If the constants satisfy $\sum_{i=1}^{k} c_i = 0$, Scheffé’s simultaneous confidence intervals on all contrasts are given by:
$$\sum_{i=1}^{k} c_i \bar{X}_i \pm \sqrt{(k-1) F_{\alpha;\,k-1,\,N-k}}\,\sqrt{\mathrm{MSE} \sum_{i=1}^{k} \frac{c_i^2}{n_i}}.$$
If pairwise comparisons are of interest, we can set one pair of the $c_i$ to be $1$ and $-1$ and the remaining $c_i$ to be zero. This is a special case of a contrast. Note that Scheffé’s intervals are useful when dealing with a large number of linear combinations of means. Once the total number of observations and the number of populations are fixed, the quantity $\sqrt{k F_{\alpha;\,k,\,N-k}}$ stays the same regardless of the number of simultaneous confidence intervals. For the Bonferroni approach, the confidence intervals become wider as the number of linear combinations of means increases. Suppose that we have 10 populations each with a sample size 10. If we have 100 simultaneous confidence intervals for the linear combinations of means, the in Scheffé's method is . If we apply Bonferroni’s approach the . This means that the width of Scheffé's intervals may be shorter than the width of Bonferroni's intervals. There is a break-even point beyond which Scheffé's intervals become shorter than Bonferroni’s intervals as the number of linear combinations of means grows. This alters the common perception that Scheffé's intervals are more conservative than Bonferroni's intervals.
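The trade-off between the two critical values can be explored numerically. The following Python sketch assumes SciPy is available and that the Bonferroni intervals are two-sided $t$ intervals with $N - k$ error degrees of freedom; the helper names (`scheffe_critical`, `bonferroni_critical`, `crossover_count`) are hypothetical and introduced only for illustration. It computes the Scheffé multiplier $\sqrt{k F_{\alpha;\,k,\,N-k}}$ for $k$ populations and total sample size $N$, the Bonferroni multiplier $t_{\alpha/(2m);\,N-k}$ for $m$ intervals, and the smallest $m$ at which the Bonferroni multiplier exceeds the Scheffé multiplier.

```python
import math
from scipy import stats

def scheffe_critical(k, N, alpha=0.05):
    """Scheffé multiplier sqrt(k * F_{alpha; k, N-k}) for all linear combinations."""
    return math.sqrt(k * stats.f.isf(alpha, k, N - k))

def bonferroni_critical(m, k, N, alpha=0.05):
    """Two-sided Bonferroni multiplier t_{alpha/(2m); N-k} for m intervals
    (assumes the pooled-variance t interval with N - k degrees of freedom)."""
    return stats.t.isf(alpha / (2 * m), N - k)

def crossover_count(k, N, alpha=0.05):
    """Smallest number of intervals m for which Bonferroni exceeds Scheffé.

    Bonferroni exceeds the Scheffé multiplier s exactly when
    alpha / (2m) < P(T > s), i.e. when m > alpha / (2 * P(T > s)).
    """
    s = scheffe_critical(k, N, alpha)
    tail = stats.t.sf(s, N - k)
    return math.floor(alpha / (2 * tail)) + 1

if __name__ == "__main__":
    k, N = 10, 100  # 10 populations with 10 observations each
    print("Scheffé multiplier:", scheffe_critical(k, N))
    print("Bonferroni multiplier for m = 100:", bonferroni_critical(100, k, N))
    print("Bonferroni exceeds Scheffé from m =", crossover_count(k, N))
```

The Scheffé multiplier does not depend on `m`, whereas the Bonferroni multiplier grows with `m`, so a crossover always exists for fixed `k` and `N`.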
We now consider the problem of constructing simultaneous intervals without assuming equal variance. Let and define
Note that and . Therefore, and are linear combinations of $\chi^2$ variables with .
Finding the exact distribution of a linear combination of $\chi^2$ variables, known as Satterthwaite’s problem, is rather difficult. Satterthwaite approximated this type of variable by a $\chi^2$ random variable divided by its degrees of freedom (see [12]). The degrees of freedom are then obtained by the method of moments. As noted in Casella and Berger [12], for a variable , we have . Hence
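Satterthwaite's moment-matching idea can be illustrated with a short Python sketch. It approximates a linear combination $\sum_i a_i S_i^2$ of independent sample variances, where $S_i^2$ has $n_i - 1$ degrees of freedom, by a scaled $\chi^2_\nu$ variable and estimates $\nu$ by the textbook Welch-Satterthwaite formula $\hat{\nu} = (\sum_i a_i S_i^2)^2 / \sum_i (a_i S_i^2)^2 / (n_i - 1)$. The function name and the example inputs are hypothetical, and the formula is the standard one, which may differ in detail from the estimators derived in this section.

```python
import numpy as np

def satterthwaite_df(a, s2, n):
    """Welch-Satterthwaite estimate of the degrees of freedom of sum(a_i * S_i^2).

    a  : weights of the linear combination
    s2 : sample variances S_i^2
    n  : sample sizes (S_i^2 carries n_i - 1 degrees of freedom)
    """
    a, s2, n = (np.asarray(x, dtype=float) for x in (a, s2, n))
    terms = a * s2
    return terms.sum() ** 2 / np.sum(terms ** 2 / (n - 1))

if __name__ == "__main__":
    # Hypothetical example: variance of a sum of sample means, weights 1 / n_i.
    n = np.array([5, 10, 20, 40])
    s2 = np.array([16.0, 9.0, 4.0, 1.0])
    print(satterthwaite_df(1.0 / n, s2, n))
```

The estimate always lies between the smallest $n_i - 1$ and $\sum_i (n_i - 1)$; it reaches the upper bound when all terms $a_i S_i^2 / (n_i - 1)$ are equal, for example in a balanced design with equal sample variances and equal weights.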
We then set , and , where and are the respective degrees of freedom for and . By applying the results above we can estimate and . First we consider , which can be found as A natural estimate of is given by . For , we have
It can be estimated by and . Furthermore, note that is independent of ; therefore, has approximately the $F$ distribution with degrees of freedom and . It turns out that has a very simple form Note that if the populations have equal variance, , we have ; additionally, if all populations have the same sample size, that is, , then .
To derive the generalized Scheffé intervals we need the following projection lemma (see [13], pages 231-232). For real numbers and all to satisfy the following inequality: the necessary and sufficient condition is . We then choose and let satisfy which constitutes the interior of a $k$-dimensional sphere centered at the point with radius . By applying the projection lemma to vector , where , we have Choosing , the quantile of an $F$ distribution with and degrees of freedom, based on the results in (2.7), we have
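The projection lemma is commonly stated in the following Cauchy-Schwarz form, which is assumed here because the exact statement is given in [13]: for a fixed vector $y$ and a constant $r > 0$, the inequality $|\sum_i c_i y_i| \le r \sqrt{\sum_i c_i^2}$ holds for all $c$ if and only if $\sum_i y_i^2 \le r^2$. Equivalently, the supremum of $|c^\top y| / \lVert c \rVert$ over $c \ne 0$ equals $\lVert y \rVert$ and is attained at $c = y$. The Python sketch below checks this numerically; the function name and the example vector are hypothetical.

```python
import numpy as np

def max_projection_ratio(y, n_draws=10_000, seed=0):
    """Largest |c'y| / ||c|| over randomly drawn coefficient vectors c."""
    rng = np.random.default_rng(seed)
    c = rng.standard_normal((n_draws, y.size))
    ratios = np.abs(c @ y) / np.linalg.norm(c, axis=1)
    return ratios.max()

y = np.array([1.0, -2.0, 0.5, 3.0])   # hypothetical vector
norm_y = np.linalg.norm(y)

# No random direction can exceed ||y|| (Cauchy-Schwarz).
sampled_max = max_projection_ratio(y)

# The bound is attained when c is proportional to y.
attained = abs(y @ y) / np.linalg.norm(y)

if __name__ == "__main__":
    print(sampled_max, attained, norm_y)
```

This equivalence is what allows a single probability statement about a sphere to be converted into simultaneous statements about every linear combination.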
Applying the projection lemma, this probability can be pivoted to give the following generalized simultaneous confidence intervals for , For the population means $\mu_i$ and their pairwise differences $\mu_i - \mu_j$, the generalized Scheffé confidence intervals are where . By comparing (2.1) with (2.11), it can be seen that the generalized Scheffé confidence intervals are very similar to their classical counterparts.
3. Assessment of Familywise Error Rate
The familywise Type I error rate in multiple comparisons is the probability of incorrectly rejecting at least one of the null hypotheses that make up the family. The validity of the proposed generalized Scheffé confidence intervals rests largely on successfully controlling the FWER at a given nominal level $\alpha$.
Two major factors, population sample sizes and variances, affect the performance of Scheffé's confidence intervals. We will show through simulation that the FWER can be inflated when population variances are unequal, particularly when the sample sizes also differ.
A variety of configurations of variances and sample sizes are selected to assess the performance of the generalized Scheffé method. To this end, the number of groups is chosen to be $k = 4$. Without loss of generality, we use 0 for all population means, that is, $\mu_1 = \cdots = \mu_4 = 0$. The specification of sample sizes and variances is given in Table 1.
Although Scheffé’s intervals apply to inference on all linear combinations, for simplicity we focus on two sets of inferences only: population means and their pairwise differences. For each configuration we conducted 5,000 simulation runs, and for each run 95% Scheffé intervals and generalized Scheffé intervals on both population means and pairwise mean differences were computed. We then recorded the coverage rate, that is, the proportion of runs in which all intervals simultaneously contained the true values, which all equal 0.
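The simulation procedure for the classical Scheffé intervals can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the code used for Table 1: it covers only the classical intervals, it applies the single multiplier $\sqrt{k F_{\alpha;\,k,\,N-k}}$ to both the means and the pairwise differences, and the sample sizes and standard deviations in the usage example are hypothetical. The function name `scheffe_coverage` is also hypothetical.

```python
import itertools
import numpy as np
from scipy import stats

def scheffe_coverage(n, sigma, runs=5000, alpha=0.05, seed=0):
    """Estimate the simultaneous coverage of classical Scheffé intervals for
    all k means and all pairwise differences when every true mean is 0."""
    n = np.asarray(n)
    sigma = np.asarray(sigma, dtype=float)
    k, N = n.size, n.sum()
    rng = np.random.default_rng(seed)
    crit = np.sqrt(k * stats.f.isf(alpha, k, N - k))

    # Coefficient vectors: unit vectors (means) and e_i - e_j (differences).
    eye = np.eye(k)
    rows = [eye[i] for i in range(k)]
    rows += [eye[i] - eye[j] for i, j in itertools.combinations(range(k), 2)]
    C = np.array(rows)                          # shape (m, k)
    scale = np.sqrt((C ** 2 / n).sum(axis=1))   # sqrt(sum c_i^2 / n_i)

    covered = 0
    for _ in range(runs):
        samples = [rng.normal(0.0, s, size=m) for m, s in zip(n, sigma)]
        xbar = np.array([x.mean() for x in samples])
        s2 = np.array([x.var(ddof=1) for x in samples])
        mse = np.sum((n - 1) * s2) / (N - k)    # pooled variance estimate
        half_width = crit * np.sqrt(mse) * scale
        # A run counts as covered only if every interval contains 0.
        covered += bool(np.all(np.abs(C @ xbar) <= half_width))
    return covered / runs

if __name__ == "__main__":
    print(scheffe_coverage([10, 10, 10, 10], [1, 1, 1, 1]))  # balanced, equal variances
    print(scheffe_coverage([5, 10, 20, 40], [4, 3, 2, 1]))   # unbalanced, unequal variances
```

The empirical FWER is one minus the returned coverage rate. Pairing the smallest sample with the largest variance, as in the second call, is the kind of configuration in which the pooled variance estimate understates the variability of the least precise mean.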
Table 1 reports the coverage rates based on both methods. Note that the empirical FWER is one minus the coverage rate. Clearly, in the case of equal variances, both methods give very similar coverage rates for balanced and unbalanced designs alike. In the unequal variance case, the coverage rate of Scheffé’s method drops. However, its FWER still stays close to the nominal level, that is, around 0.05, for balanced designs. This confirms Scheffé's observation that his method is robust to heteroscedasticity when the sample sizes from the populations are equal. We notice that the FWER is inflated when sample sizes differ among the populations. It can be found from Table 1 that when , for sample sizes the FWERs are 12.3%, 27%, 11.6%, and 26.5%, respectively. When , for sample sizes , the FWERs are 12.8%, 23.5%, 12.95%, and 27.45%, respectively. Note that these FWERs are all considerably greater than the nominal level of 0.05. It can be seen that the greater the difference in sample sizes, the larger the corresponding FWER. On the other hand, the performance of the generalized Scheffé method is much more robust. For the same configuration settings, the FWERs based on the generalized Scheffé intervals are between 0.025% and 0.038%. Although the method is conservative, its FWER stays well within the nominal level of 0.05.
It would also be interesting to see how the two types of intervals differ in width. Comparing (2.1) with (2.12), one can see that the difference between them is due to the following two terms:
The averaged and from 5,000 simulation runs are presented in Table 2.
It can be seen that they are very close to each other in the case of equal variances. However, in the case of unequal variances, becomes overoptimistically smaller than , which leads to the inflation of the FWER. Finally, Scheffé's intervals are derived from the fact that the statistic follows the distribution under a number of assumptions. When these assumptions are violated, the performance of Scheffé's intervals depends on how far the above statistic deviates from the distribution . For the generalized Scheffé intervals, the FWER largely depends on how accurately approximates . Figure 1 plots the empirical distribution functions of and the statistic in (2.13), along with their designated distributions. We selected the following four configurations of variances and sample sizes, which correspond to the homoscedastic/heteroscedastic and balanced/unbalanced cases: (1), , ,(2), , .
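Figure 1: Empirical distribution functions under the four configurations. (a) The statistic underlying the generalized Scheffé intervals, plotted with its approximating distribution. (b) The statistic underlying the classical Scheffé intervals, plotted with its designated distribution.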
Configuration (1) corresponds to equal variances for the 4 means with equal or different sample sizes. Configuration (2) corresponds to unequal variances for the 4 means with equal or different sample sizes. We calculated the empirical distribution function (edf) of , and it can be seen that it nearly overlaps with in all four configurations of variances and sample sizes (1(a)–4(a) in Figure 1). The overlap between the edf of and suggests an excellent approximation of the distribution to the ratio of and . In addition, the edf of the statistic also matches well with the distribution (1(b)–3(b) in Figure 1), except in the unbalanced heteroscedastic case, where Scheffé's method fails (4(b) in Figure 1). This explains why the FWER is inflated when both the variances and the sample sizes are unequal.
As a last comment, the above simulation results suggest that the generalized Scheffé intervals tend to be wider than the Scheffé intervals. This is our overall impression, but it does not always hold. In the simulations, we occasionally observed narrower generalized Scheffé intervals. This feature also appears in the data analysis example in the next section.
4. Example of Data Analysis
Solomon et al. [14] studied smoking behavior in pregnant women. They examined the women's determination to quit smoking while pregnant. They interviewed 349 women at their first prenatal visit, all of whom were smokers when they became pregnant, and classified them into four groups: precontemplation (PC), contemplation (C), preparation (P), and action (A). Their intention was to look at the subsequent smoking behavior of these subjects during the course of pregnancy, but one important consideration was how much these women smoked when they became pregnant. The sample sizes, means, and standard deviations of these four groups, in terms of cigarettes smoked per day when they became pregnant, are given in Table 4. Since the smallest sample size is 37, the normality assumption for the sample means is not a concern even though the response of interest is a count.
Table 3 presents the Scheffé intervals and the generalized Scheffé intervals for the four group means and their differences. Since both the sample sizes and the variances differ considerably across groups, the generalized Scheffé intervals are more reliable.
One may make a number of inferences with a joint confidence level of . For example, women in the preparation (P) group smoked on average between 26.09 and 31.51 cigarettes per day, which appears to make them the heaviest-smoking group. No significant difference is found between group P and group PC, because the confidence interval for their difference includes 0. It is also interesting to notice that the generalized Scheffé intervals are even narrower than the Scheffé intervals in this example.
5. Discussion
The Scheffé method is one of the commonly used methods for making simultaneous inference on all linear combinations of means. Scheffé intervals cover all possible linear combinations of means, which is a benefit when a large number of linear combinations need to be compared. The assumption of equal variance across populations is needed to control the Type I error. When this assumption is violated, the proposed method can be conveniently used to construct simultaneous confidence intervals for which the Type I error is controlled at a prespecified nominal level. Simulation results show that the FWER of the proposed simultaneous confidence intervals is well preserved at the nominal level, so the equal variance assumption is no longer needed.