Exact Inference for the Dispersion Matrix
We develop a novel exact permutation test for prespecified correlation structures, such as compound symmetry or sphericity, under standard assumptions. The key feature of this note is that our procedures are distribution free, which frees us from the standard and sometimes unrealistic multivariate normality assumption required by other methods.
Let $X_1, \ldots, X_n$ be an iid $p$-dimensional multivariate sample from an absolutely continuous distribution with dispersion matrix $\Sigma$. Inference about the dispersion matrix takes the general form $H_0: \Sigma = \Sigma_0$ versus $H_1: \Sigma \neq \Sigma_0$, where we assume that $\Sigma_0$ is specified in a particular manner, for example, a block diagonal matrix or a spherical type structure or simply an unstructured form.
In general, testing methods of this form assume an underlying multivariate normal distribution with associated exact and approximate tests; for a thorough overview and history of this testing problem, see Seber and the references therein. In practice it is rare that the multivariate normality assumption holds. Hence, we were motivated to develop an exact permutation approach to this problem. To the best of our knowledge no exact permutation tests have been developed or explored, with the exception of the very special case of $p = 2$ dimensions and testing $H_0: \rho = 0$; for example, see Good. Martin provides a bootstrap algorithm for testing $H_0: \rho = \rho_0$, which asymptotically can be shown to have the appropriate type I error rate. Unfortunately, the device used in the bootstrap methods given by Martin, namely first standardizing the variables and then rotating the data so as to transform the problem to the setting of testing $H_0: \rho = 0$, does not work in the permutation setting.
The permutation test for the $p = 2$ case follows by permuting the second column of the data matrix and calculating the test statistic $r$, where $r$ refers to the standard sample Pearson correlation coefficient, over all permutations. This can be done directly via a computationally expensive algorithm or via the more widely used Monte Carlo techniques. With respect to the Monte Carlo methods we generate $B$ random permutations of the data and denote the permuted value of the test statistic by $r^{*}_{b}$. Then the one-sided $p$-value for the alternative $H_1: \rho > 0$ is given as $\hat{p} = \frac{1}{B}\sum_{b=1}^{B} I(r^{*}_{b} \geq r)$, where the index $b$ corresponds to a given permutation and $I(\cdot)$ denotes the indicator function. Alternative approaches found in software packages such as SAS PROC FREQ (SAS version 9.3, Cary, NC) utilize hypergeometric probabilities, similar to how Fisher’s exact test is carried out, by treating the fixed data as discrete.
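The Monte Carlo version of the bivariate permutation test described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `perm_test_corr`, its parameters, and the simulated data are assumptions made for the example, and the one-sided alternative is taken to be a positive correlation.

```python
import numpy as np

def perm_test_corr(x, y, n_perm=2000, seed=0):
    """Monte Carlo permutation test of zero correlation (one-sided, positive alternative)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r_obs = np.corrcoef(x, y)[0, 1]  # observed Pearson correlation
    exceed = 0
    for _ in range(n_perm):
        # Permuting only the second variable breaks its pairing with the first.
        r_star = np.corrcoef(x, rng.permutation(y))[0, 1]
        exceed += r_star >= r_obs
    # Proportion of permuted correlations at least as large as the observed one.
    return r_obs, exceed / n_perm

# Illustrative simulated data (not from the paper): y depends positively on x.
rng = np.random.default_rng(1)
x = rng.normal(size=40)
y = x + 0.5 * rng.normal(size=40)
r, p_value = perm_test_corr(x, y)
print(f"r = {r:.3f}, Monte Carlo p-value = {p_value:.4f}")
```

Enumerating all permutations instead of sampling them would give the exact permutation distribution, at a cost that grows factorially in the sample size.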
In general, permutation testing is most often used for comparing two groups in the context of location differences or other features of distributions such as scale measures. Most of the theoretical work, such as that on type I error control, has been done in this setting. For a technical treatment of permutation testing see Romano, who examines theoretically the behavior of the type I error of permutation tests under exchangeability versus nonexchangeability conditions. In order to ensure true bounded type I error control in the permutation testing setting, either the null hypothesis has to be specified in such a way that exchangeability holds by definition under $H_0$, or some design feature such as randomization or matching needs to be employed. Commenges studies the more general transformation approach used to preserve exchangeability. Also, see Zhang, Huang et al., and Janssen and Pauls with respect to the inflation of type I error rates when comparing means in the two-sample setting, along with other types of comparisons. In terms of permutation testing related to correlation structure based hypotheses, very little has been accomplished. This paper represents one of the few investigations of this type to date.
In Section 2 we develop the general $p$-dimensional exact tests given a prespecified covariance structure. Special cases include testing for sphericity, compound symmetry, and block diagonality, to name a few. This presentation is followed by a simulation study in Section 3. We then apply our method in Section 4 to an example involving repeated measures mice weight data.
2. Exact Tests for Covariance Structures in $p$ Dimensions
The focus of the work in this setting is with respect to two-sided alternatives. In certain instances a subset of these tests with one-sided alternative structure may be constructed. Those tests will not be included as part of this discussion due to the specificity of their applications.
2.1. Unequal Variance Setting
Now let be an iid -dimensional multivariate sample from an absolutely continuous distribution with the first two finite central moments corresponding to each component of given as and , . Let , , , . Furthermore, denote the dispersion matrix of by where is defined to be a positive definite matrix. We represent the Cholesky decomposition of the matrix as such that is defined. The Cholesky decomposition is a key component of the permutation test we propose; however it is not unique to the problem; that is, other decomposition methods may yield similar results and alternative solutions. From a practical standpoint the Cholesky decomposition is built in to several statistical software packages, thus making our methodology more feasible for a larger group of practitioners.
Now our more general hypothesis of interest takes the form where are the hypothesized value of at (3).
Test Statistic. Let the matrix denote the transpose of with the hypothesized values as given elements from (5). Let the matrix denote the data matrix following transformation. Then the dispersion matrix corresponding to the matrix will be a diagonal matrix such that , if and only if at (5) holds true. Under these conditions testing at (5) is equivalent to the test: where the off-diagonal elements of are equal to 0 under at (6).
An exact -level permutation test of can be defined for (6) by considering the permutation of each column of and employing the Pearson correlation coefficient for each combination of columns. Towards this end let us denote the with the corresponding Pearson estimator by . The test statistic of interest in the two-sided case with respect to detecting departures from at (6) is defined as
The exactness of the test in terms of the type I error control follows from a straightforward generalization of the form of the dispersion matrix for the case, where In the case the off-diagonal elements of the dispersion matrix are given asAn examination of the covariance term corresponding to at (9) clearly indicates that it has the value of 0 if and only if at (8) is true. When testing at (8) the Pearson correlation estimate between the transformed variates through , , and serves to appropriately detect departures from . Within the permutation testing framework provides an exact -level test; that is, the covariance of and is 0 if and only if the correlation of and is 0.
We resort to a Monte Carlo approximation in order to obtain the $p$-value for testing the hypothesis defined at (5). The steps for performing the Monte Carlo approximation with respect to estimating the $p$-value are as follows.
(1) Define the hypothesized values at (5).
(2) Obtain the new random variates by applying the transformation to the observed data.
(3) Calculate the observed test statistic at (7), denoted here by $T$.
(4) Permute each column of the transformed data matrix independently to obtain a permuted matrix.
(5) Calculate the test statistic at (7) from the permuted matrix, denoted here by $T^{*}_{b}$.
(6) Repeat steps (4) and (5) a large number of times, $B$.
(7) Calculate the Monte Carlo estimated permutation $p$-value as $\hat{p} = \frac{1}{B}\sum_{b=1}^{B} I(T^{*}_{b} \geq T)$, where $I(\cdot)$ denotes the indicator function.
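The seven steps above can be sketched in a few lines of NumPy. Two points are assumptions made for this illustration rather than statements of the method at (7): the statistic is taken to be the largest absolute pairwise Pearson correlation of the transformed columns, and the transformation is taken to be multiplication by the inverse of the lower-triangular Cholesky factor of the hypothesized dispersion matrix. The names `max_abs_offdiag_corr`, `cholesky_perm_test`, and `sigma0` are hypothetical.

```python
import numpy as np

def max_abs_offdiag_corr(y):
    """ASSUMED statistic: largest absolute off-diagonal Pearson correlation."""
    r = np.corrcoef(y, rowvar=False)
    return np.max(np.abs(r[np.triu_indices_from(r, k=1)]))

def cholesky_perm_test(x, sigma0, n_perm=1000, seed=0):
    """Monte Carlo permutation p-value for a hypothesized dispersion matrix sigma0."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    # Step (1)-(2): sigma0 = chol @ chol.T; each row is mapped to chol^{-1} x_i,
    # so the transformed columns are uncorrelated when Cov(x_i) = sigma0.
    chol = np.linalg.cholesky(sigma0)
    y = np.linalg.solve(chol, x.T).T
    t_obs = max_abs_offdiag_corr(y)          # step (3)
    exceed = 0
    for _ in range(n_perm):                  # step (6)
        # Step (4): permute every column independently of the others.
        y_star = np.column_stack([rng.permutation(col) for col in y.T])
        exceed += max_abs_offdiag_corr(y_star) >= t_obs   # step (5)
    return t_obs, exceed / n_perm            # step (7)

# Illustrative simulated data: true structure is compound symmetric with rho = 0.6.
rng = np.random.default_rng(7)
sigma_true = np.full((3, 3), 0.6) + 0.4 * np.eye(3)
x = rng.multivariate_normal(np.zeros(3), sigma_true, size=100)
print(cholesky_perm_test(x, sigma_true, n_perm=300))   # correctly specified null
print(cholesky_perm_test(x, np.eye(3), n_perm=300))    # misspecified null
```

Any other statistic that grows with the off-diagonal correlations could be substituted for `max_abs_offdiag_corr` without changing the permutation loop.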
2.2. Nontransformation Special Cases
In certain special cases we can test specific forms of hypothesis (5) using our permutation approach without specifying a specific subset of the ’s or ’s. One obvious special case relative to testing hypothesis (5) is the test given for a diagonal dispersion matrix versus nondiagonal dispersion matrix such that under all . Historical tests of this form have relied on assuming -variate multivariate normality; for example, see Mudholkar et al.  for a description of a likelihood ratio approximation to this test. In this instance we have with unspecified ’s under .
In this case there is no transformation of the data required. An exact $\alpha$-level permutation test of $H_0$ can be defined simply by considering the permutation of each column of the data matrix and employing the Pearson correlation coefficient for each combination of columns. Towards this end denote the pairwise correlations by $\rho_{ij}$ with the corresponding Pearson estimators by $r_{ij}$. The test statistic of interest with respect to detecting departures from the diagonal structure is defined at (11). The Monte Carlo estimated permutation $p$-value is calculated similarly as before, as the proportion of permuted test statistics that are at least as large as the observed statistic.
Another special case where we can have a set of unspecified ’s or ’s is when we may be interested in testing for a block diagonal dispersion matrix structure such that under at (5) we now have where the partitioned matrices are given as The dispersion matrix may have different dimensions , , with unspecified ’s and ’s under .
As in the test for a diagonal dispersion matrix above, there is no transformation of the data required. An exact $\alpha$-level permutation test of $H_0$ can again be defined simply by considering the permutation of each column of the data matrix and employing the Pearson correlation coefficient for each combination of columns. The test statistic of interest with respect to detecting departures from the block diagonal structure is a slight modification of the test statistic at (11), in which an indicator function restricts the statistic to the “off-block” correlation elements at (12), which are equal to 0 under $H_0$. The Monte Carlo estimated permutation $p$-value is calculated similarly as before.
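A sketch of the block diagonal test follows, using a Boolean mask in the role of the indicator that selects the off-block correlations. As an assumption for illustration, the statistic is the largest absolute off-block Pearson correlation; the names `off_block_mask`, `off_block_stat`, and `block_diag_perm_test` are hypothetical. Columns are permuted independently, as described in the text.

```python
import numpy as np

def off_block_mask(block_sizes):
    """Boolean mask that is True for pairs of variables lying in different blocks."""
    labels = np.repeat(np.arange(len(block_sizes)), block_sizes)
    return labels[:, None] != labels[None, :]

def off_block_stat(x, mask):
    """ASSUMED statistic: largest absolute Pearson correlation among off-block pairs."""
    r = np.corrcoef(x, rowvar=False)
    return np.max(np.abs(r[mask]))

def block_diag_perm_test(x, block_sizes, n_perm=1000, seed=0):
    """Monte Carlo permutation p-value for a block diagonal dispersion structure."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)       # no transformation of the data is needed
    mask = off_block_mask(block_sizes)   # block sizes must sum to the number of columns
    t_obs = off_block_stat(x, mask)
    exceed = 0
    for _ in range(n_perm):
        x_star = np.column_stack([rng.permutation(col) for col in x.T])
        exceed += off_block_stat(x_star, mask) >= t_obs
    return t_obs, exceed / n_perm

# Illustrative simulated data: variables 1 and 2 form a block, variable 3 stands alone.
rng = np.random.default_rng(11)
z = rng.normal(size=(120, 3))
x_null = np.column_stack([z[:, 0], z[:, 0] + 0.5 * z[:, 1], z[:, 2]])
print(block_diag_perm_test(x_null, [2, 1], n_perm=300))
```

Setting every block size to 1 reduces the mask to all off-diagonal pairs, which corresponds to the test for a diagonal dispersion matrix.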
2.3. Equal Variance Setting
For the equal variance -dimensional case we have where we now define the dispersion matrix under as where is defined to be a positive definite matrix, , , under and the correlation matrix is given by The Cholesky decomposition of matrix is as In order to test the hypothesis at (15) we utilize the transformation . Note that at (15) will be sensitive to departures from and unequal marginal variances. Furthermore, explicit values for do not need to be specified within this hypothesis testing framework.
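A short derivation may help show why the common variance does not need to be specified. The notation is assumed for this sketch: $\sigma^{2}$ is the common variance, $R_0$ the hypothesized correlation matrix, $L$ its lower-triangular Cholesky factor, $X$ an observation vector, and $Y$ its transform.

```latex
\begin{aligned}
R_0 &= L L^{\top}, \qquad Y = L^{-1} X, \\
\operatorname{Cov}(Y) &= L^{-1} \operatorname{Cov}(X)\, L^{-\top}
  = L^{-1} \left( \sigma^{2} R_0 \right) L^{-\top}
  = \sigma^{2} L^{-1} L L^{\top} L^{-\top}
  = \sigma^{2} I .
\end{aligned}
```

Under the null hypothesis the components of $Y$ are therefore uncorrelated whatever the value of $\sigma^{2}$, and the Pearson correlation is unchanged by rescaling, so $\sigma^{2}$ drops out of the test. If the marginal variances differ, $\operatorname{Cov}(X) = D R_0 D$ with $D$ a diagonal matrix that is not a multiple of the identity, and $L^{-1} D R_0 D L^{-\top}$ is in general not diagonal, which is consistent with the sensitivity to unequal marginal variances noted above.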
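The steps for performing the Monte Carlo approximation with respect to estimating the $p$-value are as follows.
(1) Define the hypothesized structure at (15).
(2) Obtain the new random variates by applying the transformation to the observed data.
(3) Calculate the observed test statistic at (7), denoted here by $T$.
(4) Permute each column of the transformed data matrix independently to obtain a permuted matrix.
(5) Calculate the test statistic from the permuted matrix, denoted here by $T^{*}_{b}$.
(6) Repeat steps (4) and (5) a large number of times, $B$.
(7) Calculate the Monte Carlo estimated permutation $p$-value as $\hat{p} = \frac{1}{B}\sum_{b=1}^{B} I(T^{*}_{b} \geq T)$, where $I(\cdot)$ denotes the indicator function.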
A special case relative to testing hypothesis (15) is the test that the dispersion matrix is diagonal and all , .
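Other special cases of the test at (15) may be of interest and written in the form at (19). Examples of specific dispersion structures of importance corresponding to the test at (19) include the following, written in terms of the correlation $\rho_{ij}$ between components $i$ and $j$, $i \neq j$:
(1) sphericity: $\rho_{ij} = 0$;
(2) compound symmetry: $\rho_{ij} = \rho$;
(3) first-order autoregressive: $\rho_{ij} = \rho^{|i-j|}$;
(4) spatial power: $\rho_{ij} = \rho^{d_{ij}}$, where $d_{ij}$ denotes the distance between the $i$th and $j$th measurements.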
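The four structures listed above are easy to build numerically, which is convenient when supplying a hypothesized correlation matrix to a Cholesky routine. The sketch below uses the standard textbook definitions of these structures; the function names are hypothetical, and the spatial power form assumes one-dimensional measurement locations and a nonnegative correlation parameter.

```python
import numpy as np

def sphericity(p):
    """Identity correlation matrix: all off-diagonal correlations are 0."""
    return np.eye(p)

def compound_symmetry(p, rho):
    """Common correlation rho between every pair of components."""
    return (1.0 - rho) * np.eye(p) + rho * np.ones((p, p))

def ar1(p, rho):
    """First-order autoregressive: correlation rho ** |i - j|."""
    idx = np.arange(p)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def spatial_power(locations, rho):
    """Spatial power: correlation rho ** d_ij, d_ij = |location_i - location_j|, rho >= 0."""
    loc = np.asarray(locations, dtype=float)
    return rho ** np.abs(loc[:, None] - loc[None, :])

def is_positive_definite(m):
    """A Cholesky factorization exists only for positive definite matrices."""
    try:
        np.linalg.cholesky(m)
        return True
    except np.linalg.LinAlgError:
        return False

print(ar1(4, 0.5))
print(is_positive_definite(compound_symmetry(3, -0.4)))  # inside the admissible range
print(is_positive_definite(compound_symmetry(3, -0.6)))  # below -1/(p - 1)
```

For compound symmetry in $p$ dimensions the matrix is positive definite only when $-1/(p-1) < \rho < 1$, which restricts the correlations that can be hypothesized or simulated.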
Several other well-known spatial dispersion matrices similar to the spatial power matrix presented above fit within this same framework and will not be presented here.
3. Simulation Study
In this section we examine the test at (19), where we specify and the form of the correlation structure at (17), for example, compound symmetry. Our simulation study for the case will utilize a -variate standardized multivariate normal distribution with and a special case mixing the marginal distributions across normal, exponential, and uniform forms. Differing location and scale do not alter the general conclusions. In terms of our simulation study we set the null value of under a compound symmetry assumption and a first-order autoregressive assumption, where will take the forms: (1) compound symmetry: (2) first-order autoregressive:
Note that the special case is the same for both covariance structures and is only presented once. It should also be noted that under the assumption of multivariate normality testing under the compound symmetry or first-order autoregressive structure is a special case (equal variance assumption) of the well-known test for “complete independence”; for example, see Mudholkar et al. . Under nonnormality we are essentially testing the “complete uncorrelated” case. In this special case the methods presented here are the first exact methods developed for tackling this particular hypothesis. In terms of large sample theory around similar results see Jiang  and Xiao and Wu .
For our simulation study we used 1000 replicates for our study at and set . The covariance structure was the same under and for this set of simulations. The results are contained in Figures 1, 2, 3, 4, and 5. As anticipated we see the expected results of appropriate type I error control and monotone power functions increasing in either direction about the null value for . The range of under the alternative was dictated by the constraint that is defined to be positive definite.
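A simulation of this kind can be sketched as below. This is an illustrative setup only: the dimension, sample size, correlation values, replicate count, and number of permutations are placeholders chosen to run quickly, the data are multivariate normal, and the statistic (largest absolute pairwise correlation after a Cholesky transformation) is an assumption. All function names are hypothetical.

```python
import numpy as np

def compound_symmetry(p, rho):
    return (1.0 - rho) * np.eye(p) + rho * np.ones((p, p))

def max_abs_corr(y):
    """ASSUMED statistic: largest absolute off-diagonal Pearson correlation."""
    r = np.corrcoef(y, rowvar=False)
    return np.max(np.abs(r[np.triu_indices_from(r, k=1)]))

def perm_p_value(x, corr0, n_perm, rng):
    """Monte Carlo permutation p-value for the hypothesized correlation matrix corr0."""
    chol = np.linalg.cholesky(corr0)
    y = np.linalg.solve(chol, x.T).T      # uncorrelated columns if corr0 is correct
    t_obs = max_abs_corr(y)
    exceed = 0
    for _ in range(n_perm):
        y_star = np.column_stack([rng.permutation(col) for col in y.T])
        exceed += max_abs_corr(y_star) >= t_obs
    return exceed / n_perm

def rejection_rate(true_corr, null_corr, n, n_rep=100, n_perm=100, alpha=0.05, seed=0):
    """Fraction of simulated data sets for which the test rejects at level alpha."""
    rng = np.random.default_rng(seed)
    p = true_corr.shape[0]
    rejections = 0
    for _ in range(n_rep):
        x = rng.multivariate_normal(np.zeros(p), true_corr, size=n)
        rejections += perm_p_value(x, null_corr, n_perm, rng) <= alpha
    return rejections / n_rep

null = compound_symmetry(3, 0.5)
print("estimated type I error:", rejection_rate(null, null, n=20))
print("estimated power:", rejection_rate(compound_symmetry(3, 0.8), null, n=20))
```

With so few replicates the estimates are noisy; increasing `n_rep` and `n_perm` tightens them at a proportional cost in run time.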
For the sake of example we modified our simulation and took with marginals given by , , and with assumed to be compound symmetric under and under . The results are shown in Figure 6 and as we can see they do not differ dramatically from Figure 2 assuming multivariate normality, thus illustrating the flexibility and nonparametric nature of our methodology.
As an additional result we studied the power under the correctly specified under with differing in structure. For this study we set with set to compound symmetry under and set to the first-order autoregressive structure under . In other words, what is the power to detect a correlation structure different from the null structure given that is the true correlation? At the power to detect a different correlation structure under the alternative for and at and was 0.245, 0.539, 0.761, and 0.894 and 0.520, 0.858, 0.951, and 0.998, respectively.
4. Example
As an illustration of our method we will use phenotypic weight data from mice as contained in Table 1 from a recent unpublished study conducted within Roswell Park Cancer Institute. The estimated correlation matrix is provided in Table 2. The respective sample variance estimates were , , , , and . For example, suppose we were interested in testing for both having the compound symmetry structure or the first-order autoregressive structure as defined in (23). In this instance the test corresponding to the above hypothesis under a compound symmetry correlation structure yielded a Monte Carlo estimated $p$-value <0.0001 (), while the test corresponding to the above hypothesis under the first-order autoregressive correlation structure yielded a Monte Carlo estimated $p$-value = 0.002 (). For this example this provides some measure of evidence that the correlation structure does not fit the compound symmetry structure and that the first-order autoregressive structure assuming may be more appropriate. Similarly, the test for diagonality under the equal variances assumption (sphericity), which does not assume a value for , yielded a Monte Carlo estimated $p$-value <0.0001 (). Note that we may be rejecting $H_0$ at some specified level under at least one of 3 scenarios: unequal marginal variances, or .
Given the overall $p$-value of 0.002 from above, we may wish to examine in further detail what is driving us to reject $H_0$. In this case we can examine specific submatrices of the dispersion matrix of . For this example we could test using days 0, 1, and 2 only, or any other combination of days, such that the appropriate correlation substructure is extracted from the original hypothesized values for . For our example subtest we get indicating no strong evidence against a first-order autoregressive “substructure” with equal variances and . If we add day 3, the $p$-value is 0.04, indicating that either the correlation structure may be misspecified at this point or the variance is different. Note that further work relative to the multiple comparison problem of subtests and their relative correlation is needed. This is simply an exploratory approach to this issue relative to the example at hand.
5. Concluding Remarks
In this note we provided a method for exact testing around specific covariance structures. We employed the Cholesky decomposition for this purpose. It was noted by a reviewer that other decomposition methodologies may lead to extensions of this methodology, which we will consider in terms of future work.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This research is supported by the NIH Grant 1R03DE020851-01A1, the National Institute of Dental and Craniofacial Research. The authors wish to thank the reviewers for their time and effort.
References
G. A. F. Seber, Multivariate Observations, John Wiley & Sons, New York, NY, USA, 1984.
P. Good, “Robustness of Pearson correlation,” Interstat, vol. 15, no. 5, pp. 1–6, 2009.
M. A. Martin, “Bootstrap hypothesis testing for some common statistical problems: a critical evaluation of size and power properties,” Computational Statistics & Data Analysis, vol. 51, no. 12, pp. 6321–6342, 2007.
J. P. Romano, “On the behavior of randomization tests without a group invariance assumption,” Journal of the American Statistical Association, vol. 85, no. 411, pp. 686–692, 1990.
D. Commenges, “Transformations which preserve exchangeability and application to permutation tests,” Journal of Nonparametric Statistics, vol. 15, no. 2, pp. 171–185, 2003.
S. P. Zhang, “The split sample permutation $t$-tests,” Journal of Statistical Planning and Inference, vol. 139, no. 10, pp. 3512–3524, 2009.
Y. Huang, H. Xu, V. Calian, and J. C. Hsu, “To permute or not to permute,” Bioinformatics, vol. 22, no. 18, pp. 2244–2248, 2006.
A. Janssen and T. Pauls, “A Monte Carlo comparison of studentized bootstrap and permutation tests for heteroscedastic two-sample problems,” Computational Statistics, vol. 20, no. 3, pp. 369–383, 2005.
G. S. Mudholkar, M. C. Trivedi, and C. T. Lin, “An approximation to the distribution of the likelihood ratio statistics for testing complete independence,” Technometrics, vol. 24, no. 2, pp. 139–143, 1982.
T. Jiang, “The asymptotic distributions of the largest entries of sample correlation matrices,” The Annals of Applied Probability, vol. 14, no. 2, pp. 865–880, 2004.
H. Xiao and W. B. Wu, “Asymptotic theory for maximum deviations of sample covariance matrix estimates,” Stochastic Processes and their Applications, vol. 123, no. 7, pp. 2899–2920, 2013.