Research Article  Open Access
Carlos A. Coelho, Filipe J. Marques, Sandra Oliveira, "Near-Exact Distributions for Likelihood Ratio Statistics Used in the Simultaneous Test of Conditions on Mean Vectors and Patterns of Covariance Matrices", Mathematical Problems in Engineering, vol. 2016, Article ID 8975902, 25 pages, 2016. https://doi.org/10.1155/2016/8975902
Near-Exact Distributions for Likelihood Ratio Statistics Used in the Simultaneous Test of Conditions on Mean Vectors and Patterns of Covariance Matrices
Abstract
The authors address likelihood ratio statistics used to test simultaneously conditions on mean vectors and patterns on covariance matrices. Tests for conditions on mean vectors, whether or not a given structure is assumed for the covariance matrix, are quite common, since they may be easily implemented. On the other hand, the practical use of simultaneous tests for conditions on the mean vectors and a given pattern for the covariance matrix is usually hindered by the nonmanageability of the expressions for their exact distribution functions. The authors show the importance of being able to adequately factorize the c.f. of the logarithm of likelihood ratio statistics in order to obtain sharp and highly manageable near-exact distributions, or even the exact distribution in a highly manageable form. The tests considered are the simultaneous tests of equality or nullity of means and circularity, compound symmetry, or sphericity of the covariance matrix. Numerical studies show the high accuracy of the near-exact distributions and their adequacy for cases with very small samples and/or large numbers of variables. The exact and near-exact quantiles computed show how the common chi-square asymptotic approximation is highly inadequate for situations with small samples or large numbers of variables.
1. Introduction
Testing conditions on mean vectors is a common procedure in multivariate statistics. Often a given structure is assumed for the covariance matrix without testing it, or else the test on the covariance structure is carried out separately. This is often because the exact distribution of the test statistics used to test simultaneously conditions on mean vectors and patterns on covariance matrices is too elaborate to be used in practice. The authors show how this problem may be overcome through the development of very sharp and manageable near-exact distributions for the test statistics. These distributions may be obtained from adequate factorizations of the characteristic function (c.f.) of the logarithm of the likelihood ratio (l.r.) statistics used for these tests.
The conditions tested on mean vectors are
(i) the equality of all the means in the mean vector,
(ii) the nullity of all the means in the mean vector,
and the patterns tested on covariance matrices are
(i) circularity,
(ii) compound symmetry,
(iii) sphericity.
Let be a random vector with . The covariance matrix is said to be circular, or circulant, if , , with where , for ; .
For example, for and , we have
Besides the almost obvious area of time series analysis, there is a wealth of other areas and research fields where circular or circulant matrices arise, such as statistical signal processing, information theory and cryptography, the biological sciences, psychometry, quality control, and signal detection, as well as spatial statistics and engineering, when observations are made on the vertices of a regular polygon.
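As a concrete illustration of the pattern, a circular covariance matrix can be built so that each entry depends only on the circular distance between its two indices. The sketch below assumes a parametrization by the common covariances at each circular lag; the function name and argument layout are ours, not the paper's:

```python
import numpy as np

def circular_cov(p, s):
    """Build a p x p circular (circulant, symmetric) covariance matrix.

    s[d] is the assumed common covariance at circular lag
    d = min(|i-j|, p-|i-j|), for d = 0, ..., floor(p/2).
    """
    i, j = np.indices((p, p))
    lag = np.minimum(np.abs(i - j), p - np.abs(i - j))
    return np.asarray(s, dtype=float)[lag]

# for p = 4 there are lags 0, 1, 2
Sigma = circular_cov(4, [2.0, 0.5, 0.1])
```

Each row of `Sigma` is a cyclic shift of the previous one, which is the defining property of a circulant matrix; symmetry follows because the lag depends only on the index difference.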
We say that a positive-definite covariance matrix is compound-symmetric if we can write For example, for , we have
If, in (3), , we say that the matrix is spherical.
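A compound symmetric covariance matrix has a common variance on the diagonal and a common covariance off it, and setting the off-diagonal parameter to zero recovers a spherical matrix (a scalar multiple of the identity). A minimal sketch, with an assumed (variance, covariance) parametrization of our own:

```python
import numpy as np

def compound_symmetric(p, var, cov):
    # common variance on the diagonal, common covariance elsewhere
    return cov * np.ones((p, p)) + (var - cov) * np.eye(p)

Sigma = compound_symmetric(3, 2.0, 0.5)   # compound symmetry
Sphere = compound_symmetric(3, 2.0, 0.0)  # zero covariance: spherical matrix
```

Positive-definiteness requires var > cov > -var/(p - 1), since the eigenvalues are var + (p - 1)cov and var - cov; this is worth checking before using such a matrix as a covariance.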
The l.r. tests for equality and nullity of means, assuming circularity, and the l.r. tests for the simultaneous test of equality or nullity of means and circularity of the covariance matrix were developed by [1]. The test for equality of means, assuming compound symmetry, and the test for equality of means and compound symmetry were formulated by [2], and the test for nullity of the means, assuming compound symmetry, and the simultaneous test for nullity of the means and compound symmetry of the covariance matrix were worked out by [3]. The exact distribution for the l.r. test statistic for the simultaneous test of equality of means and circularity of the covariance matrix was obtained in [4] and is briefly reviewed in Section 2, for the sake of completeness, while near-exact distributions for the l.r. test statistic for the simultaneous test of nullity of the means and circularity of the covariance matrix are developed in Section 3. Near-exact distributions for the l.r. test statistics for the simultaneous tests of equality and nullity of the means and compound symmetry of the covariance matrix are developed in Sections 4 and 5, using a different approach from the one used in Section 3. The l.r. statistics for the tests of equality and nullity of all means, assuming sphericity of the covariance matrix, may be analyzed in Appendix C, and the l.r. statistics for the simultaneous tests of equality and nullity of all means and sphericity, together with the development of near-exact distributions for these statistics, may be examined in Sections 6 and 7.
Since, as noted above, the exact distributions of the statistics for the simultaneous tests of conditions on mean vectors and patterns of covariance matrices are too elaborate to be used in practice, the authors propose in this paper the use of near-exact distributions for these statistics. These are asymptotic distributions built using a different concept in approximating distributions, one which combines an adequately developed decomposition of the c.f. of the statistic or of its logarithm, most often a factorization, with the action of then keeping most of this c.f. unchanged and replacing the remaining smaller part by an adequate asymptotic approximation [5, 6]. All this is done in order to obtain a manageable and very well-fitting approximation, which may be used to compute near-exact quantiles or p-values. These distributions are most useful in situations where it is not possible to obtain the exact distribution in a manageable form and the common asymptotic distributions do not display the necessary precision. Near-exact distributions show very good performance for very small samples, and, when correctly developed for statistics used in Multivariate Analysis, they display a sharp asymptotic behavior both for increasing sample sizes and for increasing numbers of variables.
In Sections 3–7, near-exact distributions are obtained using different techniques and results, according to the structure of the exact distribution of the statistic.
In order to study, in each case, the proximity between the near-exact distributions developed and the exact distribution, we will use the measure with where represents the l.r. statistic, is the exact c.f. of , is the near-exact c.f., and , , , and are the exact and near-exact c.d.f.’s of and .
This measure is particularly useful, since in our cases we do not have the exact c.d.f. of or in a manageable form, but we have both the exact and near-exact c.f.’s for .
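In this line of work the measure is typically the integral of the modulus of the c.f. difference weighted by 1/|t|, which upper-bounds the maximum absolute difference between the two c.d.f.'s. The sketch below evaluates such a measure numerically; the truncation limit and the example Gamma c.f.'s are choices of ours, for illustration only:

```python
import numpy as np
from scipy.integrate import quad

def cf_gamma(t, r, lam):
    # c.f. of a Gamma(r, rate=lam) r.v.: (1 - i t / lam)^(-r)
    return (1.0 - 1j * t / lam) ** (-r)

def delta_measure(cf1, cf2, tmax=200.0):
    # Delta = (1/(2*pi)) * integral over t of |cf1(t) - cf2(t)| / |t|
    integrand = lambda t: 0.0 if t == 0 else abs(cf1(t) - cf2(t)) / abs(t)
    val, _ = quad(integrand, -tmax, tmax, points=[0.0], limit=400)
    return val / (2.0 * np.pi)

d_same = delta_measure(lambda t: cf_gamma(t, 2.0, 1.0),
                       lambda t: cf_gamma(t, 2.0, 1.0))
d_diff = delta_measure(lambda t: cf_gamma(t, 2.0, 1.0),
                       lambda t: cf_gamma(t, 2.2, 1.0))
```

The integrand stays bounded near t = 0 because both c.f.'s equal 1 there, so their difference vanishes at least linearly in t.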
2. The Likelihood Ratio Test for the Simultaneous Test of Equality of Means and the Circularity of the Covariance Matrix
Let , where . Then, for a sample of size , the th power of the l.r. statistic to test the null hypothesis is where , is the maximum likelihood estimator (m.l.e.) of , , where is the matrix with running element with , and where is the th diagonal element of , and with , where is the vector of sample means.
This test statistic was derived by [1, sec. 5.2], where the expression for the l.r. test statistic has to be slightly corrected.
According to [1], where are a set of independent r.v.’s.
From this fact we may write the c.f. of as
By adequately handling this c.f., the exact distribution of is obtained in [4] as a Generalized Integer Gamma (GIG) distribution (see [7] for the GIG distribution), since we may write for
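The GIG distribution of depth g is the distribution of a sum of g independent Gamma r.v.'s with integer shape parameters and distinct rate parameters, so its c.f. is the product of the individual Gamma c.f.'s. A small simulation check of this representation, with hypothetical shapes and rates of our choosing:

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical depth-3 GIG: integer shapes r_j and distinct rates lam_j
shapes = [2, 1, 3]
rates = [1.0, 2.5, 4.0]

# a GIG r.v. is the sum of the independent Gamma(r_j, rate=lam_j) components
w = sum(rng.gamma(shape=r, scale=1.0 / lam, size=200_000)
        for r, lam in zip(shapes, rates))

mean_exact = sum(r / lam for r, lam in zip(shapes, rates))       # sum r_j/lam_j
var_exact = sum(r / lam**2 for r, lam in zip(shapes, rates))     # sum r_j/lam_j^2
```

Because the components are independent, the exact mean and variance of the sum are just the sums of the component means and variances, which the simulated draws should reproduce closely.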
A popular asymptotic approximation for the distribution of is the chi-square asymptotic distribution with a number of degrees of freedom equal to the difference between the number of unknown parameters under the alternative hypothesis and the number of parameters under the null hypothesis, which gives for , for in (8), a chi-square asymptotic distribution with degrees of freedom. Although this is a valid approximation for large sample sizes, in practical terms it is of limited use, since it yields quantiles that are much lower than the exact ones, as may be seen from the quantiles in Table 1, namely, for small samples or when the number of variables involved is somewhat large.

From the values in Table 1 we may see that, even for quite large sample sizes and a rather small number of variables, as in the case of and , the asymptotic chi-square quantile does not even match the units digit of the exact quantile, a difference that grows even larger as the number of variables increases. The chi-square asymptotic quantiles are always smaller than the exact ones, and their use leads to an excessive number of rejections of the null hypothesis, a problem that becomes a serious one when we use smaller samples or larger numbers of variables.
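The over-rejection mechanism is simple: the test rejects when the statistic exceeds the chosen quantile, so replacing the exact quantile by a smaller asymptotic one makes the true size exceed the nominal level. A quick sketch of the asymptotic quantile computation; the degrees of freedom below are illustrative only, since in the paper they depend on the number of variables:

```python
from scipy.stats import chi2

df, alpha = 20, 0.05            # illustrative values, not the paper's
q_asym = chi2.ppf(1.0 - alpha, df)

# If the exact quantile q_exact exceeds q_asym, then under H0
# P(W > q_asym) > alpha, i.e., the test rejects too often.
```

Comparing `q_asym` against the exact (or near-exact) quantile for the same level is exactly the comparison carried out in Table 1.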
3. The Likelihood Ratio Test for the Simultaneous Test of Nullity of Means and the Circularity of the Covariance Matrix
For a sample of size , the th power of the l.r. test statistic to test the null hypothesis is where , , and , as well as the matrices and , are defined as in the previous section.
According to [1], where are a set of independent r.v.’s.
Taking and following similar steps to the ones used in [4] to handle the c.f. of , we may write the c.f. of as for given by (15).
This shows that the exact distribution of is the same as that of the sum of GIG distributions of depth with an independent distributed r.v.
But then, using the result in expression (3) of [8], we know that we can replace asymptotically a distribution by an infinite mixture of distributions , for large values of . This means that we can replace asymptotically
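The quality of this replacement can be checked on the first moment. If W = -log X with X ~ Beta(a, b), then E[W] = psi(a + b) - psi(a), while the leading Gamma(b, rate a) term of the mixture expansion has mean b/a; the two agree closely for large a. A small sketch under these assumptions:

```python
from scipy.special import digamma

a, b = 200.0, 1.5                          # large first parameter, as required
exact_mean = digamma(a + b) - digamma(a)   # E[-log X], X ~ Beta(a, b)
approx_mean = b / a                        # mean of the Gamma(b, rate=a) term
rel_err = abs(exact_mean - approx_mean) / exact_mean
```

The agreement improves as a grows, mirroring the asymptotic nature of the replacement, where the relevant parameter grows with the sample size.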
As such, in order to obtain a very sharp and manageable near-exact distribution for , we will use, as near-exact c.f. for , where the weights , , will be determined in such a way that with .
is the c.f. of a mixture of Generalized Near-Integer Gamma (GNIG) distributions of depth (see [5] for the GNIG distribution).
As such, using the notation for the p.d.f. and c.d.f. of the GNIG distribution used in Section 3 of [6], the near-exact p.d.f.’s and c.d.f.’s for and are with given by (15).
In Table 2 we may analyze values of the measure in (5) for the near-exact distributions developed in this section, for different values of and different sample sizes. We may see how these near-exact distributions display very low values of the measure , indicating an extremely good proximity to the exact distribution, even for very small sample sizes, and how they display a sharp asymptotic behavior for increasing values of and .

In Table 3 we may analyze the asymptotic quantiles for for the common chi-square asymptotic approximation for l.r. statistics, here with degrees of freedom, and the quantiles for the near-exact distributions that match 2, 6, and 10 exact moments. These quantiles are shown with 26 decimal places in order to make it possible to identify the number of correct decimal places in the quantiles of the near-exact distributions that match 2 and 6 exact moments. We should note that the quantiles of the near-exact distributions that match 10 exact moments always have many more than 26 correct decimal places. Also for the statistic in this section, we may see the lack of precision of the asymptotic chi-square quantiles.

4. The Likelihood Ratio Test for the Simultaneous Test of Equality of Means and Compound Symmetry of the Covariance Matrix
Let us assume that , with . We are interested in testing the hypothesis where represents a compound symmetric matrix, as defined in (3).
For a sample of size , the th power of the l.r. test statistic is (see [2]) where with being the sample matrix and a matrix of 1’s of dimension , with
Wilks [2] has also shown that where form a set of independent r.v.’s.
As such, the th moment of may be written as
Since the expression in (33) remains valid for any complex , we may write the c.f. of as which may be rewritten as Then, we may apply on the relation to obtain with
Expression (37) shows that the exact distribution of is the same as that of the sum of GIG distributed r.v.’s of depth with an independent sum of independent r.v.’s.
Our aim in building the nearexact distribution will be to keep unchanged and approximate asymptotically .
In order to obtain this asymptotic approximation, we will need to use a different approach from the one used in the previous section. We will use the result in sec. 5 of [9], which implies that a distribution may be asymptotically replaced by an infinite mixture of distributions.
Using a somewhat heuristic approach, we will thus approximate by a mixture of distributions where is the sum of the second parameters of the Logbeta r.v.’s in and is the common rate parameter in the mixture of two Gamma distributions that matches the first 4 moments of , that is, in
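The moment-matching step can be sketched numerically: the k-th moment of a two-component Gamma mixture with a common rate has a closed form in rising factorials, and the four parameters can be solved for so that the first four moments hit their targets. The example below is a round trip on moments generated from a known mixture; all parameter values are ours, for illustration:

```python
import numpy as np
from scipy.optimize import fsolve
from scipy.special import gamma as gamma_fn

def mixture_moments(params):
    # k-th moment of p*Gamma(r1, rate=lam) + (1-p)*Gamma(r2, rate=lam):
    #   m_k = [p*(r1)_k + (1-p)*(r2)_k] / lam**k, with (r)_k rising factorial
    p, r1, r2, lam = params
    rising = lambda r, k: gamma_fn(r + k) / gamma_fn(r)
    return np.array([(p * rising(r1, k) + (1 - p) * rising(r2, k)) / lam**k
                     for k in (1, 2, 3, 4)])

target = mixture_moments([0.4, 1.2, 3.4, 2.0])   # moments to be matched
sol = fsolve(lambda q: mixture_moments(q) - target,
             x0=[0.5, 1.0, 3.0, 1.8])             # nearby starting point
```

In the paper the targets would instead be the first four moments of the sum of Logbeta r.v.'s being approximated; the round trip here only checks that the matching step itself is well posed.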
As such, in order to build the near-exact distributions for , we will use, as near-exact c.f. for , where the weights , , will be determined in such a way that with .
The c.f. in (41) is, for integer , the c.f. of a mixture of GIG distributions of depth or, for non-integer , the c.f. of a mixture of GNIG distributions of depth , with shape parameters and rate parameters .
This will yield, for non-integer , near-exact distributions whose p.d.f.’s and c.d.f.’s for and are with given by (38). For integer , we will only have to replace in the above expressions the GNIG p.d.f. and c.d.f. by the GIG p.d.f. and c.d.f., respectively.
In Table 4, we may analyze values of the measure in (5) for the near-exact distributions developed in this section, for different values of and different sample sizes. We may see how these near-exact distributions display, once again, very low values of the measure even for very small sample sizes, indicating an extremely good proximity to the exact distribution, and how, once again, they display a sharp asymptotic behavior for increasing values of and , although for large values of , namely, for in Table 4, one may have to consider larger values of in order to observe the asymptotic behavior in terms of sample size.
