Mathematical Problems in Engineering


Research Article | Open Access

Volume 2016 | Article ID 8975902 | 25 pages | https://doi.org/10.1155/2016/8975902

Near-Exact Distributions for Likelihood Ratio Statistics Used in the Simultaneous Test of Conditions on Mean Vectors and Patterns of Covariance Matrices

Academic Editor: Andrzej Swierniak
Received 31 May 2015
Accepted 23 Nov 2015
Published 24 Mar 2016

Abstract

The authors address likelihood ratio statistics used to test simultaneously conditions on mean vectors and patterns on covariance matrices. Tests for conditions on mean vectors, assuming or not a given structure for the covariance matrix, are quite common, since they may be easily implemented. But, on the other hand, the practical use of simultaneous tests for conditions on the mean vectors and a given pattern for the covariance matrix is usually hindered by the nonmanageability of the expressions for their exact distribution functions. The authors show the importance of being able to adequately factorize the c.f. of the logarithm of likelihood ratio statistics in order to obtain sharp and highly manageable near-exact distributions, or even the exact distribution in a highly manageable form. The tests considered are the simultaneous tests of equality or nullity of means and circularity, compound symmetry, or sphericity of the covariance matrix. Numerical studies show the high accuracy of the near-exact distributions and their adequacy for cases with very small samples and/or large number of variables. The exact and near-exact quantiles computed show how the common chi-square asymptotic approximation is highly inadequate for situations with small samples or large number of variables.

1. Introduction

Testing conditions on mean vectors is a common procedure in multivariate statistics. Often a given structure is assumed for the covariance matrix without being tested, or else the test of the covariance structure is carried out separately. This is often because the exact distribution of the test statistics used to test simultaneously conditions on mean vectors and patterns on covariance matrices is too elaborate to be used in practice. The authors show how this problem may be overcome with the development of very sharp and manageable near-exact distributions for the test statistics. These distributions may be obtained from adequate factorizations of the characteristic function (c.f.) of the logarithm of the likelihood ratio (l.r.) statistics used for these tests.

The conditions tested on mean vectors are
(i) the equality of all the means in the mean vector,
(ii) the nullity of all the means in the mean vector,
and the patterns tested on covariance matrices are
(i) circularity,
(ii) compound symmetry,
(iii) sphericity.

Let be a random vector with . The covariance matrix is said to be circular, or circulant, if ,  , with where , for ; .

For example, for and , we have

Besides the almost obvious area of time series analysis, there is a wealth of other areas and research fields where circular or circulant matrices arise, such as statistical signal processing, information theory and cryptography, biological sciences, psychometry, quality control, and signal detection, as well as spatial statistics and engineering, when observations are made on the vertices of a regular polygon.
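As an illustration of the circular pattern, the following minimal sketch builds a symmetric circulant matrix in which the entry in row i and column j depends only on the circular distance between i and j. The function name, the parameter list, and the numerical values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def circular_covariance(params, p):
    """Symmetric circulant p x p matrix.

    Entry (i, j) depends only on the circular distance
    min(|i - j|, p - |i - j|), so floor(p/2) + 1 parameters suffice.
    """
    if len(params) != p // 2 + 1:
        raise ValueError("expected floor(p/2) + 1 parameters")
    return [[params[min(abs(i - j), p - abs(i - j))] for j in range(p)]
            for i in range(p)]


# Hypothetical values: variance first, then covariances at circular lags 1, 2, 3.
sigma = circular_covariance([4.0, 1.5, 0.8, 0.3], p=6)
for row in sigma:
    print(row)
```

Each row is the previous one shifted cyclically by one position, which is why a circular covariance matrix of order p has only floor(p/2) + 1 distinct entries.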

We say that a positive-definite covariance matrix is compound-symmetric if we can write For example, for , we have

If, in (3), , we say that the matrix is spheric.
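A compound-symmetric matrix has one common value on the diagonal and another common value off the diagonal, and the spheric case is the one in which the off-diagonal value is zero. The sketch below uses the hypothetical names `a` (diagonal) and `b` (off-diagonal), which need not agree with the parametrization in (3). It relies on the standard fact that the eigenvalues of such a matrix are a - b, with multiplicity p - 1, and a + (p - 1)b.

```python
def compound_symmetric(a, b, p):
    """p x p matrix with a on the diagonal and b everywhere else."""
    return [[a if i == j else b for j in range(p)] for i in range(p)]


def is_positive_definite_cs(a, b, p):
    """Positive definiteness check through the two distinct eigenvalues."""
    return a - b > 0 and a + (p - 1) * b > 0


# Hypothetical values, for illustration only.
sigma_cs = compound_symmetric(2.0, 0.5, 4)
sigma_spheric = compound_symmetric(2.0, 0.0, 4)   # off-diagonal value set to zero
print(is_positive_definite_cs(2.0, 0.5, 4))
```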

The l.r. tests for equality and nullity of means, assuming circularity, and the l.r. tests for the simultaneous test of equality or nullity of means and circularity of the covariance matrix were developed by [1], while the test for equality of means, assuming compound symmetry, and the test for equality of means and compound symmetry were formulated by [2] and the test for nullity of the means, assuming compound symmetry, and the simultaneous test for nullity of the means and compound symmetry of the covariance matrix were worked out by [3]. The exact distribution for the l.r. test statistic for the simultaneous test of equality of means and circularity of the covariance matrix was obtained in [4] and is briefly referred to in Section 2, for the sake of completeness, while near-exact distributions for the l.r. test statistic for the simultaneous test of nullity of the means and circularity of the covariance matrix are developed in Section 3. Near-exact distributions for the l.r. test statistics for the simultaneous test of equality and nullity of the means and compound symmetry of the covariance matrix are developed in Sections 4 and 5, using a different approach from the one used in Section 3. The l.r. statistics for the tests of equality and nullity of all means, assuming sphericity of the covariance matrix, may be analyzed in Appendix C and the l.r. statistics for the simultaneous tests of equality and nullity of all means and sphericity, together with the development of near-exact distributions for these statistics, may be examined in Sections 6 and 7.

Since, as noted above, the exact distributions of the statistics for the simultaneous tests of conditions on mean vectors and patterns of covariance matrices are too elaborate to be used in practice, the authors propose in this paper the use of near-exact distributions for these statistics. These are asymptotic distributions built using a different concept for approximating distributions: an adequately developed decomposition of the c.f. of the statistic or of its logarithm, most often a factorization, is combined with keeping most of this c.f. unchanged and replacing the remaining smaller part by an adequate asymptotic approximation [5, 6]. All this is done in order to obtain a manageable and very well-fitting approximation, which may be used to compute near-exact quantiles or p-values. These distributions are very useful in situations where it is not possible to obtain the exact distribution in a manageable form and the common asymptotic distributions do not display the necessary precision. Near-exact distributions perform very well for very small samples and, when correctly developed for statistics used in Multivariate Analysis, display a sharp asymptotic behavior both for increasing sample sizes and for increasing numbers of variables.

In Sections 3 to 7, near-exact distributions are obtained using different techniques and results, according to the structure of the exact distribution of the statistic.

In order to study, in each case, the proximity between the near-exact distributions developed and the exact distribution, we will use the measure with where represents the l.r. statistic, is the exact c.f. of , is the near-exact c.f., and , , , and are the exact and near-exact c.d.f.’s of and .

This measure is particularly useful, since in our cases we do not have the exact c.d.f. of or in a manageable form, but we have both the exact and near-exact c.f.’s for .
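A measure of this kind, built only from the two c.f.'s and bounding the distance between the two c.d.f.'s, can be written as in the sketch below. All the symbols used here are notational assumptions and may differ from those of expression (5): \Delta for the measure, W for the negative logarithm of the l.r. statistic, \Phi_W and \Phi^{*} for the exact and near-exact c.f.'s, and F_W and F_W^{*} for the corresponding c.d.f.'s.

```latex
\Delta \;=\; \frac{1}{2\pi}\int_{-\infty}^{+\infty}
\left|\frac{\Phi_W(t)-\Phi^{*}(t)}{t}\right| dt ,
\qquad
\max_{w}\,\bigl|F_W(w)-F_W^{*}(w)\bigr| \;\le\; \Delta .
```

The bound follows from the inversion formula for c.d.f.'s: the difference of the two c.d.f.'s is an integral of the difference of the c.f.'s divided by t, so its absolute value cannot exceed the integral of the absolute value of the integrand.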

2. The Likelihood Ratio Test for the Simultaneous Test of Equality of Means and the Circularity of the Covariance Matrix

Let , where . Then, for a sample of size , the th power of the l.r. statistic to test the null hypothesis is where , is the maximum likelihood estimator (m.l.e.) of , , where is the matrix with running element with , and where is the th diagonal element of , and with , where is the vector of sample means.

This test statistic was derived by [1, sec. 5.2], where the expression for the l.r. test statistic has to be slightly corrected.

According to [1], where are a set of independent r.v.’s.

From this fact we may write the c.f. of as

By adequately handling this c.f., the exact distribution of is obtained in [4] as a Generalized Integer Gamma (GIG) distribution (see [7] for the GIG distribution), since we may write for

A popular asymptotic approximation for the distribution of is the chi-square asymptotic distribution with a number of degrees of freedom equal to the difference between the number of unknown parameters under the alternative hypothesis and the number of parameters under the null hypothesis, which gives for , for in (8), a chi-square asymptotic distribution with degrees of freedom. Although this is a valid approximation for large sample sizes, in practical terms it is of little use, since it gives quantiles that are much lower than the exact ones, as may be seen from the quantiles in Table 1, particularly for small samples or when the number of variables involved is somewhat large.


Table 1: Exact quantiles and chi-square asymptotic quantiles, for different numbers of variables p and sample sizes n.

| Quantile order | p | n | Exact | Asymptotic chi-square |
|---|---|---|---|---|
| 0.95 | 10 | 11 | 184.84579506364826855487849906 | 76.77780315606147980433710659 |
| | | 60 | 82.86779631112725385496956047 | |
| | | 460 | 77.50088072977322094345813820 | |
| | 15 | 16 | 356.83946609433702153375390686 | 153.19790274395621072198817490 |
| | | 65 | 169.23132191434840041430238602 | |
| | | 465 | 155.17420633635277455721974156 | |
| | 25 | 26 | 853.62442647392551959929457598 | 379.74587752919253597245376194 |
| | | 75 | 437.12290346767321994020024210 | |
| | | 475 | 387.31925318201483716457949700 | |
| | 50 | 51 | 2983.52950629554250120199516974 | 1382.92839770564012472044120417 |
| | | 100 | 1719.07640203276757900109720368 | |
| | | 500 | 1434.09183007253302711800352147 | |
| 0.99 | 10 | 11 | 221.13637373719535956938670312 | 85.95017624510346845181671517 |
| | | 60 | 92.78317859393323169599466291 | |
| | | 460 | 86.75984117402977424037787646 | |
| | 15 | 16 | 409.92566639020778120425384446 | 165.84100085082047675645088502 |
| | | 65 | 183.23718212829228159346647123 | |
| | | 465 | 167.98095654076846741881112565 | |
| | 25 | 26 | 940.55141434060365229501805667 | 399.22970790268112734530953113 |
| | | 75 | 459.68274728064743270409254333 | |
| | | 475 | 407.19370031104569525049581690 | |
| | 50 | 51 | 3156.01716925813527651187643029 | 1419.46244733465596475819616876 |
| | | 100 | 1765.17588807596249988258749774 | |
| | | 500 | 1471.99072215013613268320536543 | |

From the values in Table 1 we may see that even for quite large sample sizes and a rather small number of variables, as in the case of p = 10 and n = 460, the asymptotic chi-square quantile does not even match the units digit of the exact quantile, a difference that gets even larger as the number of variables increases. The chi-square asymptotic quantiles are always smaller than the exact ones, so that their use leads to an excessive number of rejections of the null hypothesis, a problem that becomes a grievous one when we use smaller samples or larger numbers of variables.
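The asymptotic chi-square quantiles in Table 1 can be checked with a short standard-library sketch. The degrees-of-freedom formula below is a reconstruction from the parameter-counting rule stated above, not a formula quoted from the paper. Under the alternative there are p means and p(p + 1)/2 covariance parameters. Under the null there is one common mean and floor(p/2) + 1 parameters in the circular covariance matrix. The function names are hypothetical, and the quantile is found by bisection on a series evaluation of the chi-square c.d.f., which is adequate for the moderate degrees of freedom used here.

```python
import math


def chi2_cdf(x, df):
    """Chi-square c.d.f. via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a            # n = 0 term of sum z^n / (a (a+1) ... (a+n))
    total = term
    n = 0
    while abs(term) > 1e-17 * abs(total):
        n += 1
        term *= z / (a + n)
        total += term
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))


def chi2_quantile(prob, df):
    """Quantile by bisection; the upper bracket is generous for prob <= 0.999."""
    lo, hi = 0.0, df + 10.0 * math.sqrt(2.0 * df) + 20.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def df_equality_and_circularity(p):
    """Free parameters under the alternative minus those under the null (assumed count)."""
    alternative = p + p * (p + 1) // 2   # p means, unstructured covariance matrix
    null = 1 + (p // 2 + 1)              # one common mean, circular covariance matrix
    return alternative - null


for p in (10, 15, 25, 50):
    df = df_equality_and_circularity(p)
    print(p, df, chi2_quantile(0.95, df), chi2_quantile(0.99, df))
```

With this count the degrees of freedom for p = 10 are 58, and the computed quantiles can be compared directly with the asymptotic chi-square column of Table 1.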

3. The Likelihood Ratio Test for the Simultaneous Test of Nullity of Means and the Circularity of the Covariance Matrix

For a sample of size , the th power of the l.r. test statistic to test the null hypothesis is where , , and , as well as the matrices and , are defined as in the previous section.

According to [1], where are a set of independent r.v.’s.

Taking and following similar steps to the ones used in [4] to handle the c.f. of , we may write the c.f. of as for given by (15).

This shows that the exact distribution of is the same as that of the sum of GIG distributions of depth with an independent distributed r.v.

But then, using the result in expression (3) of [8], we know that we can replace asymptotically a distribution by an infinite mixture of distributions , for large values of . This means that we can replace asymptotically

As such, in order to obtain a very sharp and manageable near-exact distribution for , we will use, as near-exact c.f. for , where the weights , , will be determined in such a way that with .

is the c.f. of a mixture of Generalized Near-Integer Gamma (GNIG) distributions of depth (see [5] for the GNIG distribution).

As such, using the notation for the p.d.f. and c.d.f. of the GNIG distribution used in Section 3 of [6], the near-exact p.d.f.’s and c.d.f.’s for and are with given by (15).
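The way mixture weights can be fixed by matching moments may be illustrated with a toy computation. This is a sketch under stated assumptions and is not the computation carried out in the paper. The target is a Logbeta(a, 2) r.v., which is distributed as the sum of two independent exponentials with rates a and a + 1, so its raw moments are available in closed form. The approximating family is a finite mixture of Gamma(2 + k, a) distributions, k = 0, ..., m. That choice of shape and rate parameters is an assumption modeled on the replacement of a Logbeta distribution by a mixture of Gamma distributions described above. The value of a and all function names are hypothetical. Because the part of the c.f. that is kept unchanged is common to the exact and the near-exact c.f., matching the derivatives of the full c.f. at the origin is equivalent to matching the moments of the replaced part, which is what the linear system below does.

```python
import math


def gamma_moment(shape, rate, h):
    """h-th raw moment of a Gamma(shape, rate) r.v."""
    return math.exp(math.lgamma(shape + h) - math.lgamma(shape)) / rate ** h


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def mixture_weights(target_moments, r, rate):
    """Weights of Gamma(r + k, rate), k = 0..m, that sum to 1 and match moments 1..m."""
    m = len(target_moments)
    rows = [[gamma_moment(r + k, rate, h) for k in range(m + 1)]
            for h in range(m + 1)]          # row h = 0 enforces sum of weights = 1
    return solve(rows, [1.0] + list(target_moments))


a, m = 3.5, 3   # hypothetical first parameter and number of matched moments
# Raw moments of Exp(a) + Exp(a + 1), that is, of a Logbeta(a, 2) r.v.
target = [math.factorial(h) * sum(a ** -j * (a + 1) ** -(h - j) for j in range(h + 1))
          for h in range(1, m + 1)]
weights = mixture_weights(target, r=2, rate=a)
print(weights)
```

The system is linear in the weights, so matching more moments only enlarges the system; nothing in this sketch guarantees that the resulting weights are all positive.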

In Table 2 we may analyze values of the measure in (5) for the near-exact distributions developed in this section, for different values of and different sample sizes. We may see how these near-exact distributions display very low values of the measure , indicating an extremely good proximity to the exact distribution, even for very small sample sizes, and how they display a sharp asymptotic behavior for increasing values of and .


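Table 2: Values of the measure in (5) for the near-exact distributions developed in this section, with columns headed 2, 4, 6, 10, and 20 and rows for p = 10 (n = 11, 60, 460), p = 15 (n = 16, 65, 465), p = 25 (n = 26, 75, 475), and p = 50 (n = 51, 100, 500). The tabulated values are not available here.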

In Table 3 we may analyze the asymptotic quantiles for for the common chi-square asymptotic approximation for l.r. statistics, here with degrees of freedom and the quantiles for the near-exact distributions that equate 2, 6, and 10 exact moments. These quantiles are shown with 26 decimal places in order to make it possible to identify the number of correct decimal places for the quantiles of the near-exact distributions that match 2 and 6 exact moments. We should note that the quantiles of the near-exact distributions that match 10 exact moments always have much more than 26 decimal places that are correct. Also for the statistic in this section, we may see the lack of precision of the asymptotic chi-square quantiles.


Table 3: Chi-square asymptotic quantiles and quantiles of the near-exact distributions that match 2, 6, and 10 exact moments, for different numbers of variables p and sample sizes n.

| Quantile order | p | Asymptotic chi-square | n | Near-exact, 2 moments | Near-exact, 6 moments | Near-exact, 10 moments |
|---|---|---|---|---|---|---|
| 0.95 | 10 | 77.93052380523042221626519134 | 11 | 186.05876112572581565724084671 | 186.05432193047513242686020674 | 186.05432193047314525997232991 |
| | | | 60 | 84.04125105999276667186524643 | 84.04108458609231348943338211 | 84.04108458609231109537970540 |
| | | | 460 | 78.65634165506500024084199822 | 78.65633892721702104079228658 | 78.65633892721702104079204160 |
| | 15 | 154.30151616535021445238246295 | 16 | 357.97669643528506074060759290 | 357.97507556215083974984357415 | 357.97507556215080939585842368 |
| | | | 65 | 170.35237275637365647369739224 | 170.35226008885071534567845906 | 170.35226008885071540240672638 |
| | | | 465 | 156.28032082820735784475589844 | 156.28031868060615422818671849 | 156.28031868060615422818672130 |
| | 25 | 380.80932798326309965351661308 | 26 | 854.70444721767252577468152620 | 854.70394734650287271287479556 | 854.70394734650287269133917757 |
| | | | 75 | 438.20003415718340026159076324 | 438.19996662815447062882655526 | 438.19996662815447063138401124 |
| | | | 475 | 388.38496822730515375937449496 | 388.38496656458350909591459832 | 388.38496656458350909591459983 |
| | 50 | 1383.96068056800040468077204943 | 51 | 2984.56855943704772921090729785 | 2984.56844789958359734870454884 | 2984.56844789958359734870371056 |
| | | | 100 | 1720.11786940941582246751263410 | 1720.11783827238527706678307676 | 1720.11783827238527706678545071 |
| | | | 500 | 1435.12611970183181982922209152 | 1435.12611845907109035050829355 | 1435.12611845907109035050829357 |
| 0.99 | 10 | 87.16571139978756939605535555 | 11 | 222.35460048933098793717199428 | 222.34999712883572718401028732 | 222.34999712881867010107412453 |
| | | | 60 | 94.01616090177012981866436862 | 94.01591912908288010423242638 | 94.01591912908288738648975948 |
| | | | 460 | 87.97774784195631422822549370 | 87.97774379040648875818772244 | 87.97774379040648875818827763 |
| | 15 | 166.98739013667388353875528529 | 16 | 411.06647044542264851519342470 | 411.06480936217954062715053117 | 411.06480936217950312171917698 |
| | | | 65 | 184.39761554112671734787489011 | 184.39747019675240094311740291 | 184.39747019675240137066064190 |
| | | | 465 | 169.12943676298651871191909376 | 169.12943392987951859377231013 | 169.12943392987951859377239666 |
| | 25 | 400.31941006633016981735907054 | 26 | 941.63369479396272075223880115 | 941.63318710455464318301861791 | 941.63318710455464316544796384 |
| | | | 75 | 460.78287302204290413806922579 | 460.78279435481890211302276154 | 460.78279435481890211439671412 |
| | | | 475 | 408.28522573430588726946677528 | 408.28522375722866010999960584 | 408.28522375722866010999960757 |
| | 50 | 1420.50810052822545308190548728 | 51 | 3157.05746081267985238368475051 | 3157.05734831670317027719825824 | 3157.05734831670317027719755606 |
| | | | 100 | 1766.22802929701017442904996266 | 1766.22799581952383557719092088 | 1766.22799581952383557718851667 |
| | | | 500 | 1473.03793287996522918580780119 | 1473.03793152230567784068549373 | 1473.03793152230567784068549371 |

4. The Likelihood Ratio Test for the Simultaneous Test of Equality of Means and Compound Symmetry of the Covariance Matrix

Let us assume that , with . We are interested in testing the hypothesis where represents a compound symmetric matrix, as defined in (3).

For a sample of size , the th power of the l.r. test statistic is (see [2]) where with being the sample matrix and a matrix of 1’s of dimension , with

Wilks [2] has also shown that where form a set of independent r.v.’s.

As such, the th moment of may be written as

Since the expression in (33) remains valid for any complex , we may write the c.f. of as which may be rewritten as Then, we may apply on the relation to obtain with

Expression (37) shows that the exact distribution of is the same as that of the sum of GIG distributed r.v.’s of depth with an independent sum of independent r.v.’s.
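A factorization of this kind typically rests on the identity that turns a ratio of Gamma functions whose arguments differ by an integer into a finite product. The worked equation below is a sketch of that mechanism for a single Logbeta r.v. Whether it is exactly the relation invoked above is an assumption, and the symbols a, b, n, X, and W are generic rather than the paper's notation.

```latex
\frac{\Gamma(a+n)}{\Gamma(a)} = \prod_{j=0}^{n-1}(a+j), \qquad n \in \mathbb{N},
\\[1ex]
\Phi_W(t) = E\!\left[X^{-\mathrm{i}t}\right]
= \frac{\Gamma(a+b)\,\Gamma(a-\mathrm{i}t)}{\Gamma(a)\,\Gamma(a+b-\mathrm{i}t)}
= \prod_{j=0}^{b-1}(a+j)\,(a+j-\mathrm{i}t)^{-1},
\qquad W=-\log X,\; X\sim \mathrm{Beta}(a,b),\; b\in\mathbb{N}.
```

Each factor in the last product is the c.f. of an exponential r.v. with rate a + j, so for integer b a Logbeta r.v. is a sum of independent exponentials. Sums of independent Gamma r.v.'s with integer shape parameters are what the GIG distribution describes, while Logbeta factors with a noninteger second parameter are the part left to be approximated asymptotically.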

Our aim in building the near-exact distribution will be to keep unchanged and approximate asymptotically .

In order to obtain this asymptotic approximation, we will need to use a different approach from the one used in the previous section. We will use the result in sec. 5 of [9], which implies that a distribution may be asymptotically replaced by an infinite mixture of distributions.

Using a somewhat heuristic approach, we will thus approximate by a mixture of distributions where is the sum of the second parameters of the Logbeta r.v.’s in and is the common rate parameter in the mixture of two Gamma distributions that matches the first 4 moments of , that is, in

As such, in order to build the near-exact distributions for , we will use, as near-exact c.f. for , where the weights , , will be determined in such a way that with .

The c.f. in (41) is, for integer , the c.f. of a mixture of GIG distributions of depth or, for noninteger , the c.f. of a mixture of GNIG distributions of depth , with shape parameters and rate parameters .

This will yield, for noninteger , near-exact distributions whose p.d.f.’s and c.d.f.’s for and are