Mathematical Problems in Engineering
Volume 2016, Article ID 8975902, 25 pages
http://dx.doi.org/10.1155/2016/8975902
Research Article

Near-Exact Distributions for Likelihood Ratio Statistics Used in the Simultaneous Test of Conditions on Mean Vectors and Patterns of Covariance Matrices

1Centro de Matemática e Aplicações (CMA), FCT/UNL, 2829-516 Caparica, Portugal
2Departamento de Matemática, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, 2829-516 Caparica, Portugal
3Departamento de Economia e Gestão, Instituto Politécnico de Setúbal, 2910-761 Setúbal, Portugal

Received 31 May 2015; Accepted 23 November 2015

Academic Editor: Andrzej Swierniak

Copyright © 2016 Carlos A. Coelho et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

The authors address likelihood ratio statistics used to test simultaneously conditions on mean vectors and patterns on covariance matrices. Tests for conditions on mean vectors, whether or not a given structure is assumed for the covariance matrix, are quite common, since they may be easily implemented. On the other hand, the practical use of simultaneous tests for conditions on the mean vectors and a given pattern for the covariance matrix is usually hindered by the nonmanageability of the expressions for their exact distribution functions. The authors show the importance of being able to adequately factorize the c.f. of the logarithm of likelihood ratio statistics in order to obtain sharp and highly manageable near-exact distributions, or even the exact distribution in a highly manageable form. The tests considered are the simultaneous tests of equality or nullity of means and circularity, compound symmetry, or sphericity of the covariance matrix. Numerical studies show the high accuracy of the near-exact distributions and their adequacy for cases with very small samples and/or large numbers of variables. The exact and near-exact quantiles computed show how the common chi-square asymptotic approximation is highly inadequate in situations with small samples or large numbers of variables.

1. Introduction

Testing conditions on mean vectors is a common procedure in multivariate statistics. Often a given structure is assumed for the covariance matrix without being tested, or else the test of the covariance structure is carried out separately. This is often due to the fact that the exact distribution of the test statistics used to test simultaneously conditions on mean vectors and patterns on covariance matrices is too elaborate to be used in practice. The authors show how this problem may be overcome through the development of very sharp and manageable near-exact distributions for the test statistics. These distributions may be obtained from adequate factorizations of the characteristic function (c.f.) of the logarithm of the likelihood ratio (l.r.) statistics used for these tests.

The conditions tested on mean vectors are (i) the equality of all the means in the mean vector and (ii) the nullity of all the means in the mean vector, and the patterns tested on covariance matrices are (i) circularity, (ii) compound symmetry, and (iii) sphericity.

Let be a random vector with . The covariance matrix is said to be circular, or circulant, if ,  , with where , for ; .

For example, for and , we have
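The displayed example did not survive extraction, so, as a purely illustrative sketch of our own (not code from the paper), the following Python snippet builds a circular (circulant) covariance matrix for a hypothetical p, assuming the usual circular pattern in which the entry in position (j, k) depends only on the circular distance between the indices j and k.

import numpy as np

def circulant_cov(p, sigma):
    # sigma is a sequence of length p // 2 + 1; sigma[h] is the covariance
    # between two variables whose positions differ by h steps around the
    # circle, and sigma[0] is the common variance.
    S = np.empty((p, p))
    for j in range(p):
        for k in range(p):
            h = min(abs(j - k), p - abs(j - k))  # circular distance
            S[j, k] = sigma[h]
    return S

# Hypothetical example with p = 4, one variance and two distinct covariances:
print(circulant_cov(4, [1.0, 0.5, 0.3]))
# [[1.  0.5 0.3 0.5]
#  [0.5 1.  0.5 0.3]
#  [0.3 0.5 1.  0.5]
#  [0.5 0.3 0.5 1. ]]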

Besides the almost obvious area of time series analysis, there is a wealth of other areas and research fields where circular or circulant matrices arise, such as statistical signal processing, information theory and cryptography, biological sciences, psychometry, quality control, and signal detection, as well as spatial statistics and engineering, when observations are made on the vertices of a regular polygon.

We say that a positive-definite covariance matrix is compound symmetric if we can write it in the form (3). For example, for , we have

If, in (3), , we say that the matrix is spheric.
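As a companion illustration (ours, with arbitrary parameter values), a compound symmetric matrix has one common value on the diagonal and another common value everywhere off the diagonal, and sphericity is the special case in which the off-diagonal value is zero, so the matrix reduces to a scalar multiple of the identity.

import numpy as np

def compound_symmetric_cov(p, var, cov):
    # Common variance 'var' on the diagonal, common covariance 'cov' off it.
    # Positive definiteness requires -var / (p - 1) < cov < var.
    return cov * np.ones((p, p)) + (var - cov) * np.eye(p)

Sigma_cs = compound_symmetric_cov(4, var=2.0, cov=0.8)   # compound symmetry
Sigma_sp = compound_symmetric_cov(4, var=2.0, cov=0.0)   # sphericity
print(Sigma_cs)
print(np.allclose(Sigma_sp, 2.0 * np.eye(4)))            # True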

The l.r. tests for equality and nullity of means, assuming circularity, and the l.r. tests for the simultaneous test of equality or nullity of means and circularity of the covariance matrix were developed by [1]. The test for equality of means, assuming compound symmetry, and the test for equality of means and compound symmetry were formulated by [2], while the test for nullity of the means, assuming compound symmetry, and the simultaneous test for nullity of the means and compound symmetry of the covariance matrix were worked out by [3]. The exact distribution of the l.r. test statistic for the simultaneous test of equality of means and circularity of the covariance matrix was obtained in [4] and is briefly recalled in Section 2, for the sake of completeness, while near-exact distributions for the l.r. test statistic for the simultaneous test of nullity of the means and circularity of the covariance matrix are developed in Section 3. Near-exact distributions for the l.r. test statistics for the simultaneous tests of equality and nullity of the means and compound symmetry of the covariance matrix are developed in Sections 4 and 5, using a different approach from the one used in Section 3. The l.r. statistics for the tests of equality and nullity of all means, assuming sphericity of the covariance matrix, may be found in Appendix C, and the l.r. statistics for the simultaneous tests of equality and nullity of all means and sphericity, together with the development of near-exact distributions for these statistics, may be examined in Sections 6 and 7.

Since, as referred to above, the exact distributions of the statistics for the simultaneous tests of conditions on mean vectors and patterns of covariance matrices are too elaborate to be used in practice, the authors propose in this paper the use of near-exact distributions for these statistics. These are asymptotic distributions built using a different concept of approximating distributions, which combines an adequately developed decomposition of the c.f. of the statistic, or of its logarithm, most often a factorization, with the action of then keeping most of this c.f. unchanged and replacing the remaining smaller part by an adequate asymptotic approximation [5, 6]. All this is done in order to obtain a manageable and very well-fitting approximation, which may be used to compute near-exact quantiles or p-values. These distributions are most useful in situations where it is not possible to obtain the exact distribution in a manageable form and the common asymptotic distributions do not display the necessary precision. Near-exact distributions show very good performance for very small samples and, when correctly developed for statistics used in Multivariate Analysis, display a sharp asymptotic behavior both for increasing sample sizes and for increasing numbers of variables.

In Sections 3–7, near-exact distributions are obtained using different techniques and results, according to the structure of the exact distribution of the statistic.

In order to study, in each case, the proximity between the near-exact distributions developed and the exact distribution, we will use the measure , with where represents the l.r. statistic, is the exact c.f. of , is the near-exact c.f., and , , , and are the exact and near-exact c.d.f.’s of and .

This measure is particularly useful, since in our cases we do not have the exact c.d.f. of or in a manageable form, but we have both the exact and near-exact c.f.’s for .
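The displayed definition of the measure was lost in extraction; in the authors' related work (e.g., [5, 6]) this measure typically has the form Δ = (1/(2π)) ∫ |Φ_W(t) − Φ̄_W(t)| / |t| dt, taken over the whole real line, which upper-bounds the maximum absolute difference between the exact and approximating c.d.f.'s. Assuming that form, the sketch below (our own, with toy c.f.'s) evaluates the measure numerically from two characteristic functions supplied as callables.

import numpy as np
from scipy.integrate import quad

def delta_measure(cf_exact, cf_approx, upper=200.0):
    # Delta = (1 / (2*pi)) * integral over R of |cf_exact(t) - cf_approx(t)| / |t| dt.
    # For c.f.'s of real r.v.'s the integrand is even in t, so we integrate on
    # (0, upper) and double the result; 'upper' truncates the infinite range.
    integrand = lambda t: abs(cf_exact(t) - cf_approx(t)) / t
    value, _ = quad(integrand, 1e-10, upper, limit=500)
    return value / np.pi

# Toy usage: c.f. of a Gamma(5, rate 2) r.v. versus a Normal c.f. with the
# same mean (2.5) and variance (1.25) -- purely illustrative distributions.
cf_gamma  = lambda t: (1 - 1j * t / 2.0) ** (-5)
cf_normal = lambda t: np.exp(1j * t * 2.5 - 0.625 * t ** 2)
print(delta_measure(cf_gamma, cf_normal))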

2. The Likelihood Ratio Test for the Simultaneous Test of Equality of Means and the Circularity of the Covariance Matrix

Let , where . Then, for a sample of size , the th power of the l.r. statistic to test the null hypothesis is where , is the maximum likelihood estimator (m.l.e.) of , , where is the matrix with running element with , and where is the th diagonal element of , and with , where is the vector of sample means.

This test statistic was derived by [1, sec. 5.2], where the expression for the l.r. test statistic has to be slightly corrected.

According to [1], where are a set of independent r.v.’s.

From this fact we may write the c.f. of as

By adequately handling this c.f., the exact distribution of is obtained in [4] as a Generalized Integer Gamma (GIG) distribution (see [7] for the GIG distribution), since we may write for
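For readers less familiar with the GIG family: according to [7], a Generalized Integer Gamma distribution of depth g is the distribution of the sum of g independent Gamma r.v.'s with integer shape parameters and pairwise distinct rate parameters. The short Monte Carlo sketch below (our own, with placeholder parameters, since the exact shapes and rates for the statistic above were lost with the displayed expressions) illustrates that structure.

import numpy as np

rng = np.random.default_rng(0)

def sample_gig(shapes, rates, size=200_000):
    # GIG of depth len(shapes): sum of independent Gamma(shape_j, rate_j) r.v.'s
    # with integer shapes and pairwise distinct rates (numpy parametrizes the
    # Gamma by a scale, hence scale = 1 / rate).
    total = np.zeros(size)
    for r, lam in zip(shapes, rates):
        total += rng.gamma(shape=r, scale=1.0 / lam, size=size)
    return total

# Hypothetical depth-3 example; empirical 0.95 and 0.99 quantiles of the sum.
w = sample_gig(shapes=[2, 1, 3], rates=[0.8, 1.5, 2.3])
print(np.quantile(w, [0.95, 0.99]))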

A popular asymptotic approximation for the distribution of is the chi-square asymptotic distribution with a number of degrees of freedom equal to the difference between the number of unknown parameters under the alternative hypothesis and the number of parameters under the null hypothesis, which gives for , for in (8), a chi-square asymptotic distribution with degrees of freedom. Although this approximation is valid for large sample sizes, in practice it is of little use, since it yields quantiles that are much lower than the exact ones, as may be seen from the quantiles in Table 1, particularly for small samples or when the number of variables involved is somewhat large.
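For reference, the asymptotic quantiles reported in tables such as Table 1 come from a chi-square quantile with the appropriate number of degrees of freedom (a function of p through the parameter-count difference described above, possibly rescaled depending on which function of the statistic is tabulated; the explicit expressions were lost with the displayed math). A computation such as the following, with a placeholder value for the degrees of freedom, is all that is involved.

from scipy.stats import chi2

df = 12  # placeholder: the actual degrees of freedom depend on p as described above
print(chi2.ppf([0.95, 0.99], df))  # asymptotic 0.95 and 0.99 quantiles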

Table 1: Exact and asymptotic 0.95 and 0.99 quantiles for where for the statistic in (8), for different values of and samples of size .

From the values in Table 1 we may see that, even for quite large sample sizes and a rather small number of variables, as in the case of and , the asymptotic chi-square quantile does not even match the units digit of the exact quantile, a difference that gets even larger as the number of variables increases. The chi-square asymptotic quantiles are always smaller than the exact ones, so their use leads to an excessive number of rejections of the null hypothesis, a problem that becomes especially serious when we use smaller samples or larger numbers of variables.

3. The Likelihood Ratio Test for the Simultaneous Test of Nullity of Means and the Circularity of the Covariance Matrix

For a sample of size , the th power of the l.r. test statistic to test the null hypothesis is where , , and , as well as the matrices and , are defined as in the previous section.

According to [1], where are a set of independent r.v.’s.

Taking and following similar steps to the ones used in [4] to handle the c.f. of , we may write the c.f. of as for given by (15).

This shows that the exact distribution of is the same as that of the sum of a GIG distributed r.v. of depth with an independent distributed r.v.
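The second component in decompositions of this kind is, in the authors' framework, typically a Logbeta r.v., that is, the negative logarithm of a Beta r.v. (the distribution's name here was lost with the displayed symbols). Under that reading, the decomposition can be checked empirically by Monte Carlo; the sketch below is such a sanity check, with placeholder parameters rather than the ones from the elided expressions.

import numpy as np

rng = np.random.default_rng(1)
N = 200_000

# GIG part: sum of independent Gammas with integer shapes and distinct rates.
gig_part = rng.gamma(2, 1 / 0.9, N) + rng.gamma(1, 1 / 1.7, N)

# Logbeta part: if B ~ Beta(a, b), then -log(B) has a Logbeta(a, b) distribution.
logbeta_part = -np.log(rng.beta(3.5, 0.5, N))

w = gig_part + logbeta_part
print(np.quantile(w, [0.95, 0.99]))  # to be compared with near-exact quantiles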

But then, using the result in expression (3) of [8], we know that we can replace asymptotically a distribution by an infinite mixture of distributions , for large values of . This means that we can replace asymptotically

As such, in order to obtain a very sharp and manageable near-exact distribution for , we will use, as near-exact c.f. for , where the weights , , will be determined in such a way that with .

is the c.f. of a mixture of Generalized Near-Integer Gamma (GNIG) distributions of depth (see [5] for the GNIG distribution).
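The condition that fixes the weights did not survive extraction; in this family of near-exact constructions the weights are typically chosen so that the first few derivatives of the near-exact c.f. at t = 0 (equivalently, the first few moments) coincide with the exact ones, which, once the component distributions are fixed, is a linear system in the weights. A generic sketch of that idea follows, with the exact and component moments supplied by the caller (toy numbers below).

import numpy as np

def mixture_weights(exact_moments, component_moments):
    # Solve for weights p_1, ..., p_m such that the mixture reproduces the
    # first m - 1 exact moments and the weights sum to one.
    #   exact_moments:     length m - 1, holding E[W], E[W^2], ..., E[W^(m-1)]
    #   component_moments: (m - 1) x m, column k = moments of component k
    m = component_moments.shape[1]
    A = np.vstack([component_moments, np.ones((1, m))])
    b = np.concatenate([exact_moments, [1.0]])
    return np.linalg.solve(A, b)

# Toy usage: three components, two matched moments plus the normalization.
comp = np.array([[1.0, 2.0, 3.0],     # first moments of the three components
                 [2.5, 6.0, 12.0]])   # second moments of the three components
print(mixture_weights(np.array([2.1, 6.8]), comp))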

As such, using the notation for the p.d.f. and c.d.f. of the GNIG distribution used in Section 3 of [6], the near-exact p.d.f.’s and c.d.f.’s for and are with given by (15).

In Table 2 we may analyze values of the measure in (5) for the near-exact distributions developed in this section, for different values of and different sample sizes. We may see how these near-exact distributions display very low values of the measure , indicating an extremely good proximity to the exact distribution, even for very small sample sizes, and how they display a sharp asymptotic behavior for increasing values of and .

Table 2: Values of the measure in (5), for the near-exact distributions of the l.r. test statistic in (17), which match exact moments, for different values of and samples of size .

In Table 3 we may analyze the asymptotic quantiles for , for the common chi-square asymptotic approximation for l.r. statistics, here with degrees of freedom, and the quantiles for the near-exact distributions that match 2, 6, and 10 exact moments. These quantiles are shown with 26 decimal places in order to make it possible to identify the number of correct decimal places in the quantiles of the near-exact distributions that match 2 and 6 exact moments. We should note that the quantiles of the near-exact distributions that match 10 exact moments always have many more than 26 correct decimal places. Also for the statistic in this section, we may see the lack of precision of the asymptotic chi-square quantiles.

Table 3: Quantiles of orders and for the chi-square approximation and for the near-exact distributions that match , 6, or 10 exact moments, of for the l.r. statistic in (17), for different values of and samples of size .

4. The Likelihood Ratio Test for the Simultaneous Test of Equality of Means and Compound Symmetry of the Covariance Matrix

Let us assume that , with . We are interested in testing the hypothesis where represents a compound symmetric matrix, as defined in (3).

For a sample of size , the th power of the l.r. test statistic is (see [2]) where with being the sample matrix and a matrix of 1’s of dimension , with

Wilks [2] has also shown that where form a set of independent r.v.’s.

As such, the th moment of may be written as

Since the expression in (33) remains valid for any complex , we may write the c.f. of as which may be rewritten as . Then, we may apply on the relation to obtain with
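The first step above, passing from the moments to the c.f., rests on a standard argument: if the expression for the th moment remains valid for any complex order h and W denotes the negative logarithm of the l.r. statistic (the convention usually adopted in this line of work), then

\Phi_W(t) = \mathrm{E}\bigl[\mathrm{e}^{\mathrm{i}tW}\bigr] = \mathrm{E}\bigl[\mathrm{e}^{-\mathrm{i}t\log\Lambda}\bigr] = \mathrm{E}\bigl[\Lambda^{-\mathrm{i}t}\bigr],

that is, the c.f. of W is obtained from the moment expression simply by replacing the moment order h with -it.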

Expression (37) shows that the exact distribution of is the same as that of the sum of GIG distributed r.v.’s of depth with an independent sum of independent r.v.’s.

Our aim in building the near-exact distribution will be to keep unchanged and approximate asymptotically .

In order to obtain this asymptotic approximation, we will need to use a different approach from the one used in the previous section. We will use the result in sec. 5 of [9], which implies that a distribution may be asymptotically replaced by an infinite mixture of distributions.

Using a somewhat heuristic approach, we will thus approximate by a mixture of distributions where is the sum of the second parameters of the Logbeta r.v.’s in and is the common rate parameter in the mixture of two Gamma distributions that matches the first 4 moments of , that is, in
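The matching step just described (a mixture of two Gamma distributions with a common rate parameter, fitted to the first four moments of the quantity being approximated) can be carried out numerically. The sketch below is a generic illustration of that fit, with target moments generated from a known mixture so that the solver has an exact solution; they are not the moments of the (elided) Logbeta sum from the text.

import numpy as np
from scipy.optimize import fsolve

def gamma_raw_moment(r, lam, h):
    # E[X^h] for X ~ Gamma(shape r, rate lam): r (r+1) ... (r+h-1) / lam^h
    return np.prod(r + np.arange(h)) / lam ** h

def fit_two_gamma_mixture(target_moments):
    # Find (q, r1, r2, lam) so that q Gamma(r1, lam) + (1-q) Gamma(r2, lam)
    # reproduces the four raw moments in target_moments (in practice one then
    # checks that 0 <= q <= 1 and r1, r2, lam > 0).
    def equations(x):
        q, r1, r2, lam = x
        return [q * gamma_raw_moment(r1, lam, h)
                + (1 - q) * gamma_raw_moment(r2, lam, h) - target_moments[h - 1]
                for h in (1, 2, 3, 4)]
    return fsolve(equations, x0=[0.5, 1.0, 2.0, 1.0])

# Target moments of the mixture 0.4 Gamma(1.5, 2) + 0.6 Gamma(3, 2):
target = [1.2, 2.175, 5.15625, 14.9765625]
print(fit_two_gamma_mixture(target))   # should recover roughly (0.4, 1.5, 3, 2)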

As such, in order to build the near-exact distributions for , we will use, as near-exact c.f. for , where the weights , , will be determined in such a way that with .

The c.f. in (41) is, for integer , the c.f. of a mixture of GIG distributions of depth or, for noninteger , the c.f. of a mixture of GNIG distributions of depth , with shape parameters and rate parameters .

This will yield, for noninteger , near-exact distributions whose p.d.f.’s and c.d.f.’s for and are with given by (38). For integer , we will only have to replace in the above expressions the GNIG p.d.f. and c.d.f. by the GIG p.d.f. and c.d.f., respectively.

In Table 4, we may analyze values of the measure in (5) for the near-exact distributions developed in this section, for different values of and different sample sizes. We may see how these near-exact distributions display, once again, very low values of the measure even for very small sample sizes, indicating an extremely good proximity to the exact distribution, and how, once again, they display a sharp asymptotic behavior for increasing values of and . However, for large values of , namely, for in Table 4, one may have to consider larger values of in order to be able to observe the asymptotic behavior in terms of sample size.

Table 4: Values of the measure in (5), for the near-exact distributions of the l.r. test statistic in (26), which match exact moments, for different values of and samples of size .

The asymptotic quantiles for in Table 5, for the common chi-square asymptotic approximation for l.r. statistics, now with degrees of freedom, display again, as in the previous sections, an almost shocking lack of precision, mainly for small sample sizes and/or larger numbers of variables. On the other hand, the near-exact quantiles show a steady evolution towards the exact quantiles for increasing number of exact moments matched, with the quantiles for the near-exact distributions that match 6 exact moments displaying more than 20 correct decimal places, for the larger sample sizes.

Table 5: Quantiles of orders and for the chi-square approximation and for the near-exact distributions that match , 6, or 10 exact moments, of for the l.r. statistic in (26), for different values of and samples of size .

5. The Likelihood Ratio Test for the Simultaneous Test of Nullity of Means and Compound Symmetry of the Covariance Matrix

Let us assume now that . We are interested in testing the hypothesis

We may write where

While, for a sample of size , the th power of the l.r. statistic to test may be shown to be (see Appendix A for details) where and are given by (28) and with where is the sample matrix, the l.r. test statistic to test is shown by [2] to be again with and given by (28) and given by (27).

The l.r. test statistic to test in (44) is thus

For a sample of size , , the th power of the l.r. test statistic to test , may be shown to be distributed as (see [3] and Appendix A for details) , where and are independent, with while [2] shows that the th power of the l.r. statistic to test is distributed as , where are independent, with

Based on Theorem 5 in [10], it is then possible to show that the l.r. statistics used to test and are independent, since is independent of and is built only on this statistic, given that is the same statistic in a constrained subspace.

From this fact, we may show that the th power of the l.r. statistic to test in (44), , is distributed as (see Appendix B for details) where all r.v.’s are independent, with . We note that the r.v.’s are the same as the r.v.’s in (31) and (32).

As such, the c.f. of may be written as with given by (38).

Then, following a similar approach to the one used for and , in the previous section, we obtain near-exact distributions with a similar structure to those in that section, now with and with determined as the solution of a system of equations similar to the one in (40), with replaced by .

This will yield for and near-exact distributions with p.d.f.’s and c.d.f.’s given by (43), now with given by (57).

We should note that as it happens with and , also for and , may be either an integer or a half-integer, so that, in those cases where is an integer, the near-exact distributions are mixtures of GIG distributions, while when is noninteger, they are mixtures of GNIG distributions.

In Table 6 we may analyze values of the measure in (5) for the near-exact distributions developed in this section, for different values of and different sample sizes. We may see how these near-exact distributions display similar properties to those of the near-exact distributions developed for in the previous section.

Table 6: Values of the measure in (5), for the near-exact distributions of the l.r. test statistic in (51), which match exact moments, for different values of and samples of size .

In Table 7 we provide the quantiles for the common chi-square asymptotic approximation for l.r. statistics, now with degrees of freedom, as well as the near-exact quantiles for . Similar conclusions to those drawn in the previous sections apply here.

Table 7: Quantiles of orders and for the chi-square approximation and for the near-exact distributions that match , 6, or 10 exact moments, of for the l.r. statistic in (51), for different values of and samples of size .

6. The Likelihood Ratio Test for the Simultaneous Test of Equality of Means and Sphericity of the Covariance Matrix

If , where , and we are interested in testing the null hypothesis , we may write where where, for a sample of size , the th power of the l.r. statistic to test , versus an alternative hypothesis that assumes sphericity for the covariance matrix and no structure for the mean vector, may be shown to be (see Appendix C for details) where is the matrix in (27) and with . We have where is the sample matrix and

In (65), from standard theory on normal r.v.’s, since , independent for , and since, under , we have , i.i.d. for , under this null hypothesis . Thus, since the r.v.’s in (67) are independent for , while, from (68),

Since and are independent, given that is independent of all and is defined only from the , then

From [6, 11] and [12, sec. 10.7], the th power of the l.r. statistic to test in (60) is given by with (see [6]) where, for , are a set of independent r.v.’s.

The th power of the l.r. statistic to test in (60) is thus where, from Theorem 5 in [10], we may establish the independence of and and as such say that where all r.v.’s are independent, are the r.v.’s in (75), and is a r.v. with the same distribution as in (72).

Let us take , , and . Then we will have where, using (36), we may write, for odd , and, for even , following similar steps,

Taking for the expression for in (A.6) in [6], we may write