Advances in Decision Sciences
Volume 2010 (2010), Article ID 292013, 8 pages
doi:10.1155/2010/292013
Review Article

Generalised Score and Wald Tests

School of Mathematical and Physical Sciences, University of Newcastle, Newcastle, NSW 2308, Australia

Received 17 September 2009; Accepted 16 December 2009

Academic Editor: Chin Lai

Copyright © 2010 Paul Rippon and J. C. W. Rayner. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

The generalised score and Wald tests are described and related to their nongeneralised versions. Two interesting applications are discussed. In the first a new test for the Behrens-Fisher problem is derived. The second is testing homogeneity of variances from multiple univariate normal populations.

1. Introduction

This paper is intended to be a tutorial for those wishing to inform themselves about the generalised score and Wald tests. It extends the content of [1] and has similar objectives; that is, it focuses on the use of these tests rather than their properties. It is intended to be very accessible. Readers need only some prior knowledge of partitioned matrices and of score and Wald tests; see, for example, [1] and [2, Chapter 3].

The score test is particularly valuable when maximum likelihood (ML) estimation under the full model is not preferred, but ML estimation under the null model is. The converse holds for the Wald test. Thus when ML estimation under one of the null and full models is not preferred, the likelihood ratio test is problematic, but one of the score and Wald tests is not. Here by not preferred we mean that, for example, estimates may be calculated by some iterative scheme with dubious convergence. Other possibilities are that estimates may have a particularly convoluted expression or the finite sample properties (such as large bias) may be inappropriate for the problem of interest.

When ML estimation under both the null and full models is not preferred, we need another way forward. This is provided by the generalised score and Wald tests. These tests are especially valuable when the model may be misspecified, but that will not be the focus here.

In Section 2 the generalised score and Wald tests are described. In Section 3 this material is applied to deriving a new test for the Behrens-Fisher problem, while Section 4 looks at testing equality of variances from multiple independent normal samples.

2. M-Estimators and Generalised Score Tests

The class of M-estimators includes both ML and method of moments estimators. An M-estimator $\tilde\gamma$ satisfies

$$\sum_{j=1}^{n} \Psi(X_j, \tilde\gamma) = 0_p, \tag{2.1}$$

in which $X_1, \ldots, X_n$ are independent but not necessarily identically distributed, $\Psi$ is a known $p \times 1$ function not depending on $j$ or $n$, $\gamma$ is a $p$-dimensional parameter, and in general $0_m$ denotes an $m \times 1$ vector of zeros. The estimating function $\Psi$ must be sufficiently ‘smooth’. In particular, its derivatives up to second order, and their expectations, must exist. Hence the matrices $A$ and $B$ defined subsequently are assumed to exist. Also, the expectation of the second-order derivatives must be bounded in probability. More technical details on M-estimators may be found in [3, Chapter 5].
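As a concrete illustration of (2.1), the following minimal Python sketch uses a hypothetical estimating function for a normal mean and variance, $\Psi(x, \gamma) = (x - \mu, (x - \mu)^2 - \sigma^2)^T$. This choice, the toy data, and the names `psi` and `estimating_sum` are assumptions made for illustration and are not taken from the paper. For this $\Psi$ the M-estimator is the sample mean together with the divisor-$n$ sample variance, and the sketch checks that the estimating equations sum to zero there.

```python
def psi(x, gamma):
    """Hypothetical 2 x 1 estimating function for gamma = (mu, sigma2)."""
    mu, sigma2 = gamma
    return (x - mu, (x - mu) ** 2 - sigma2)


def estimating_sum(xs, gamma):
    """Left-hand side of (2.1): the sum of psi over the sample."""
    terms = [psi(x, gamma) for x in xs]
    return tuple(sum(t[i] for t in terms) for i in range(2))


xs = [2.0, 4.0, 4.0, 6.0]                              # toy data
mu_tilde = sum(xs) / len(xs)                           # solves the first equation
sigma2_tilde = sum((x - mu_tilde) ** 2 for x in xs) / len(xs)  # solves the second
gamma_tilde = (mu_tilde, sigma2_tilde)

# Both components should vanish (up to rounding) at the M-estimator.
print(estimating_sum(xs, gamma_tilde))
```

Any other root-finding scheme could replace the closed-form solution used here; the defining property is only that the summed estimating function vanishes.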

In our setting we assume that $\gamma = (\theta^T, \beta^T)^T$ and that we wish to test $H_0: \theta = 0_k$ against the alternative $K: \theta \neq 0_k$ with $\theta$ being the $k \times 1$ vector of primary interest, with $\beta$ a $q \times 1$ vector of nuisance parameters, and with $p = k + q$. The generalised score test is based on the partial M-estimator that satisfies

$$\sum_{j=1}^{n} \Psi_\beta(X_j, \tilde\gamma_0) = 0_q, \tag{2.2}$$

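where $\Psi$ is partitioned similarly to $\gamma$, so that $\Psi^T = (\Psi_\theta^T, \Psi_\beta^T)$, and where $\tilde\gamma_0 = (0_k^T, \tilde\beta_0^T)^T$ in which $\tilde\beta_0$ is the M-estimator of $\beta$ under the null hypothesis. Define

$$U_\theta(\gamma) = \sum_{j=1}^{n} \Psi_\theta(X_j, \gamma), \quad A(\gamma) = E_0\left[-\frac{\partial \Psi(X, \gamma)}{\partial \gamma}\right] = \begin{pmatrix} A_{\theta\theta} & A_{\theta\beta} \\ A_{\beta\theta} & A_{\beta\beta} \end{pmatrix}, \quad B(\gamma) = E_0\left[\Psi \Psi^T\right] = \begin{pmatrix} B_{\theta\theta} & B_{\theta\beta} \\ B_{\beta\theta} & B_{\beta\beta} \end{pmatrix}, \tag{2.3}$$

in which $E_0$ denotes expectation under the null hypothesis. Here $A(\gamma)$ and $B(\gamma)$ are $p \times p$ and $A_{\theta\theta}$ and $B_{\theta\theta}$ are $k \times k$. We note that $A(\gamma)$ is not necessarily symmetric while $B(\gamma)$ is. This means that the form of the generalised tests given by, for example, [4], needs to be slightly modified. The generalised score test statistic is given by

$$S_G = U_\theta^T \left\{(A^T)^{-1}\right\}_{\theta\theta} \left[\left\{A^{-1} B (A^T)^{-1}\right\}_{\theta\theta}\right]^{-1} \left(A^{-1}\right)_{\theta\theta} U_\theta \tag{2.4}$$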

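in which, as can readily be shown, $(A^{-1})_{\theta\theta} = (A_{\theta\theta} - A_{\theta\beta} A_{\beta\beta}^{-1} A_{\beta\theta})^{-1}$ and the arguments in $U_\theta$, $A$, and $B$ are suppressed; here all are $\tilde\gamma_0$. Similarly the generalised Wald test statistic is given by

$$W_G = \tilde\theta^T \left[\left\{A^{-1} B (A^T)^{-1}\right\}_{\theta\theta}\right]^{-1} \tilde\theta \tag{2.5}$$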

in which all arguments are $\tilde\gamma$. In the exposition in [4] parameters are estimated by ML but the data do not come from the parametric model: this is ML under misspecification. In [5], Kent’s definitions are given but in place of ML estimators any M-estimators are permitted. It is also noted in [4] that $A$ and $B$ can in practice be replaced by any consistent estimates.

An alternative form of $S_G$ that is more convenient for calculation is given in [2], where it is applied to the construction of generalized smooth tests of goodness of fit. This form gives

$$S_G = U_\theta^T(\tilde\gamma_0)\, \Sigma_{GS}^{-1}(\tilde\gamma_0)\, U_\theta(\tilde\gamma_0) \tag{2.6}$$

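in which

$$\Sigma_{GS}(\gamma) = B_{\theta\theta} - A_{\theta\beta} A_{\beta\beta}^{-1} B_{\beta\theta} - B_{\theta\beta} \left(A_{\beta\beta}^{-1}\right)^T A_{\theta\beta}^T + A_{\theta\beta} A_{\beta\beta}^{-1} B_{\beta\beta} \left(A_{\beta\beta}^{-1}\right)^T A_{\theta\beta}^T. \tag{2.7}$$

The equivalence of the two forms requires routine but tedious matrix algebra and is omitted here. The asymptotic distribution of both $S_G$ and $W_G$ under $H_0$ is $\chi^2_k$.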
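The equivalence of forms (2.4) and (2.6) can at least be checked numerically in the smallest case. The sketch below assumes $k = q = 1$, so that every block of $A$ and $B$ is a scalar, and uses exact rational arithmetic. The function names and the particular matrices are hypothetical choices for illustration; $A$ is deliberately non-symmetric.

```python
from fractions import Fraction as F


def sigma_gs_scalar(A, B):
    """Sigma_GS of (2.7) when theta and beta are both scalars (k = q = 1)."""
    (_, a_tb), (_, a_bb) = A
    (b_tt, b_tb), (b_bt, b_bb) = B
    r = a_tb / a_bb                       # A_{theta beta} A_{beta beta}^{-1}
    return b_tt - r * b_bt - b_tb * r + r * b_bb * r


def s_g_via_2_4(u, A, B):
    """Form (2.4), using the explicit inverse of a 2 x 2 matrix."""
    (a_tt, a_tb), (a_bt, a_bb) = A
    det = a_tt * a_bb - a_tb * a_bt
    inv_row = (a_bb / det, -a_tb / det)   # theta-row of A^{-1}
    # theta-theta entry of A^{-1} B (A^T)^{-1}
    sandwich = sum(inv_row[i] * B[i][j] * inv_row[j]
                   for i in range(2) for j in range(2))
    return u * inv_row[0] * (1 / sandwich) * inv_row[0] * u


def s_g_via_2_6(u, A, B):
    """Form (2.6): U^2 / Sigma_GS in the scalar case."""
    return u * u / sigma_gs_scalar(A, B)


A = ((F(5), F(2)), (F(1), F(3)))   # illustrative, not symmetric
B = ((F(4), F(1)), (F(1), F(2)))   # illustrative, symmetric
u = F(3)                           # illustrative value of U_theta

print(s_g_via_2_4(u, A, B), s_g_via_2_6(u, A, B))
```

The two printed values agree exactly. This is a check in one small case under the stated assumptions, not a proof of the general matrix identity.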

If $\Psi(X, \gamma)$ is the derivative of the logarithm of the likelihood, which is the usual score function, then $A = B$ is the usual (symmetric) information matrix, and $A^{-1} B (A^T)^{-1} = A^{-1}$. If ML estimation is used, then $W_G = \hat\theta^T \{A_{\theta\theta} - A_{\theta\beta} A_{\beta\beta}^{-1} A_{\beta\theta}\} \hat\theta$, the usual Wald test statistic, and $S_G = U_\theta^T \{A_{\theta\theta} - A_{\theta\beta} A_{\beta\beta}^{-1} A_{\beta\theta}\}^{-1} U_\theta$, the usual score test statistic. Both are given in this form in [1]. For more information see [5, 6].
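The reduction of the score statistic can be seen directly from (2.7). A brief sketch, assuming $A = B$ is symmetric so that $B_{\beta\theta} = A_{\beta\theta} = A_{\theta\beta}^T$ and $(A_{\beta\beta}^{-1})^T = A_{\beta\beta}^{-1}$:

$$\begin{aligned}
\Sigma_{GS} &= A_{\theta\theta} - A_{\theta\beta} A_{\beta\beta}^{-1} A_{\beta\theta} - A_{\theta\beta} A_{\beta\beta}^{-1} A_{\beta\theta} + A_{\theta\beta} A_{\beta\beta}^{-1} A_{\beta\beta} A_{\beta\beta}^{-1} A_{\beta\theta} \\
&= A_{\theta\theta} - A_{\theta\beta} A_{\beta\beta}^{-1} A_{\beta\theta}.
\end{aligned}$$

Substituting this into (2.6) gives the score statistic above. For the Wald statistic, $\{A^{-1} B (A^T)^{-1}\}_{\theta\theta} = (A^{-1})_{\theta\theta}$, whose inverse is again $A_{\theta\theta} - A_{\theta\beta} A_{\beta\beta}^{-1} A_{\beta\theta}$.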

In [5, page 328] replacing the inverse of the asymptotic covariance matrix $\Sigma_{GS}(\tilde\gamma_0)$ in $S_G$ by a generalised inverse of a consistent estimate of $\Sigma_{GS}(\tilde\gamma_0)$ is recommended. Although it may sound trivial, when calculating any of the ordinary or generalised score or Wald test statistics, we are finding $(X - E[X])^T \Sigma^{-1} (X - E[X])$ where $X$ is at least asymptotically multivariate normal and $\Sigma$ is at least asymptotically the full rank covariance matrix of $X$. Very occasionally it may be more convenient to find the exact covariance matrix rather than one that is asymptotically equivalent. If so the exact covariance matrix can be used in the above expressions; similarly when appropriate a generalised inverse of the exact or an asymptotically equivalent covariance matrix can be used.

3. The Behrens-Fisher Problem

In the Behrens-Fisher problem, $Y_1, \ldots, Y_m$ is a random sample from an $N(\mu_Y, \sigma_Y^2)$ population, and $Z_1, \ldots, Z_n$ is an independent random sample from an $N(\mu_Z, \sigma_Z^2)$ population. It is desired to test $H: \mu_Y = \mu_Z$ against $K: \mu_Y \neq \mu_Z$, with the standard deviations $\sigma_Y$ and $\sigma_Z$ being nuisance parameters. In [2, Example 3.3.2] the likelihood ratio, score, and Wald tests are derived. The score test requires the solution of an inconvenient cubic equation; so this is one situation in which the Wald statistic looks distinctly more appealing than both the likelihood ratio and score test statistics.

When the estimating function $\sum_{j=1}^{n} \Psi(X_j, \gamma)$ is the usual score function, the generalised score test is the usual score test. To conform to our notation put $(Y_1, \ldots, Y_m, Z_1, \ldots, Z_n) = X^T$, $\mu_Y - \mu_Z = 2\theta$, $\mu_Y + \mu_Z = 2\beta_1$, $\sigma_Y^2 = \beta_2$ and $\sigma_Z^2 = \beta_3$. We test $H: \theta = 0$ against $K: \theta \neq 0$, with nuisance parameters $\beta_1$, $\beta_2$ and $\beta_3$. The logarithm of the likelihood is

$$\text{constant} - \frac{m}{2} \log \beta_2 - \frac{n}{2} \log \beta_3 - (2\beta_2)^{-1} \sum_i (y_i - \beta_1 - \theta)^2 - (2\beta_3)^{-1} \sum_j (z_j - \beta_1 + \theta)^2, \tag{3.1}$$

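and therefore the score function has the following components:

$$\begin{aligned}
S_\theta(\gamma) &= \frac{\sum_i (y_i - \beta_1 - \theta)}{\beta_2} - \frac{\sum_j (z_j - \beta_1 + \theta)}{\beta_3}, \\
S_{\beta_1}(\gamma) &= \frac{\sum_i (y_i - \beta_1 - \theta)}{\beta_2} + \frac{\sum_j (z_j - \beta_1 + \theta)}{\beta_3}, \\
S_{\beta_2}(\gamma) &= -\frac{m}{2\beta_2} + \frac{\sum_i (y_i - \beta_1 - \theta)^2}{2\beta_2^2}, \\
S_{\beta_3}(\gamma) &= -\frac{n}{2\beta_3} + \frac{\sum_j (z_j - \beta_1 + \theta)^2}{2\beta_3^2}.
\end{aligned} \tag{3.2}$$

These are the partial derivatives of the logarithm of the likelihood. Under the null hypothesis the estimating equations are $S_{\beta_1}(\gamma_0) = S_{\beta_2}(\gamma_0) = S_{\beta_3}(\gamma_0) = 0$. This leads to the inconvenient cubic equation mentioned previously. If we proceed with this model, the cubic must be solved to find $\tilde\beta_{10}$, and hence $\tilde\beta_{20} = \sum_i (Y_i - \tilde\beta_{10})^2 / m$ and $\tilde\beta_{30} = \sum_j (Z_j - \tilde\beta_{10})^2 / n$. We also find

$$S_\theta(\tilde\gamma_0) = \frac{2(\bar Y - \bar Z)}{\tilde\beta_{20}/m + \tilde\beta_{30}/n}, \quad A = \begin{pmatrix} \dfrac{m}{\beta_2} + \dfrac{n}{\beta_3} & \dfrac{m}{\beta_2} - \dfrac{n}{\beta_3} & 0 & 0 \\ \dfrac{m}{\beta_2} - \dfrac{n}{\beta_3} & \dfrac{m}{\beta_2} + \dfrac{n}{\beta_3} & 0 & 0 \\ 0 & 0 & \dfrac{m}{2\beta_2^2} & 0 \\ 0 & 0 & 0 & \dfrac{n}{2\beta_3^2} \end{pmatrix} = B, \tag{3.3}$$

whence

$$\Sigma_{GS} = \frac{4}{\beta_2/m + \beta_3/n}, \tag{3.4}$$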

and the generalised score test statistic is $S_G = (\bar Y - \bar Z)^2 / (\tilde\beta_{20}/m + \tilde\beta_{30}/n)$. This is just the ordinary score test statistic.
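To make the inconvenience of the null ML estimator concrete, here is a hedged Python sketch of the ordinary score statistic. It does not solve the cubic algebraically. Instead it swaps in a different technique: a grid search over the profile log-likelihood between the two sample means (where the maximiser must lie, since the profiled $S_{\beta_1}$ is positive below both means and negative above both), followed by bisection on the profiled $S_{\beta_1}$. The function names, the grid size, and the toy data are assumptions for illustration; the sketch also assumes neither sample is degenerate.

```python
import math


def score_beta1(b, ys, zs):
    """S_{beta_1} at theta = 0 with beta_2, beta_3 replaced by their null estimates."""
    m, n = len(ys), len(zs)
    beta2 = sum((y - b) ** 2 for y in ys) / m
    beta3 = sum((z - b) ** 2 for z in zs) / n
    return m * (sum(ys) / m - b) / beta2 + n * (sum(zs) / n - b) / beta3


def profile_loglik(b, ys, zs):
    """Null log-likelihood as a function of the common mean b (constants dropped)."""
    m, n = len(ys), len(zs)
    beta2 = sum((y - b) ** 2 for y in ys) / m
    beta3 = sum((z - b) ** 2 for z in zs) / n
    return -0.5 * m * math.log(beta2) - 0.5 * n * math.log(beta3)


def null_mle_common_mean(ys, zs, grid=2000):
    """Approximate the ML estimate of beta_1 under H (grid search, then bisection)."""
    lo, hi = sorted((sum(ys) / len(ys), sum(zs) / len(zs)))
    if hi == lo:
        return lo
    h = (hi - lo) / grid
    points = [lo + i * h for i in range(grid + 1)]
    best = max(points, key=lambda b: profile_loglik(b, ys, zs))
    left, right = max(lo, best - h), min(hi, best + h)
    # Refine only if the score changes sign across the bracketing cell.
    if score_beta1(left, ys, zs) > 0 > score_beta1(right, ys, zs):
        for _ in range(100):
            mid = 0.5 * (left + right)
            if score_beta1(mid, ys, zs) > 0:
                left = mid
            else:
                right = mid
        best = 0.5 * (left + right)
    return best


def ordinary_score_statistic(ys, zs):
    """S_G = (Ybar - Zbar)^2 / (beta_20/m + beta_30/n) with null ML estimates."""
    m, n = len(ys), len(zs)
    b = null_mle_common_mean(ys, zs)
    beta2 = sum((y - b) ** 2 for y in ys) / m
    beta3 = sum((z - b) ** 2 for z in zs) / n
    return (sum(ys) / m - sum(zs) / n) ** 2 / (beta2 / m + beta3 / n)


ys = [4.1, 5.3, 6.0, 5.2, 4.9]   # toy data
zs = [6.8, 9.1, 3.9, 7.5]
print(null_mle_common_mean(ys, zs), ordinary_score_statistic(ys, zs))
```

The grid step guards against the cubic having more than one real root between the sample means, in which case plain bisection could converge to a root that is not the likelihood maximiser.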

While solving the cubic is not a great difficulty, if we modify $S_{\beta_1}(\gamma)$ so that it becomes

$$S_{\beta_1}(\gamma) = \sum_i (y_i - \beta_1 - \theta) + \sum_j (z_j - \beta_1 + \theta), \tag{3.5}$$

a possibly less efficient but certainly more convenient estimator of the common mean under the null hypothesis may be found. This estimator is the solution to $S_{\beta_1}(\gamma_0) = 0$, namely, $\tilde\beta_{10} = (m \bar Y + n \bar Z)/(m + n)$. If we also modify $S_\theta(\gamma)$ so that

$$S_\theta(\gamma) = \sum_i (y_i - \beta_1 - \theta) - \sum_j (z_j - \beta_1 + \theta), \tag{3.6}$$

while leaving the other two equations unchanged, the generalised score test is based on

$$S_\theta(\tilde\gamma_0) = \frac{2mn}{m + n} (\bar Y - \bar Z). \tag{3.7}$$

The estimators of $\beta_2$ and $\beta_3$ are slightly different from those found previously, being $\tilde\beta_{20} = \sum_i (Y_i - \tilde\beta_{10})^2 / m$ and $\tilde\beta_{30} = \sum_j (Z_j - \tilde\beta_{10})^2 / n$. Modifying the previous derivation gives

$$A = \begin{pmatrix} m + n & m - n & 0 & 0 \\ m - n & m + n & 0 & 0 \\ 0 & 0 & \dfrac{m}{2\beta_2^2} & 0 \\ 0 & 0 & 0 & \dfrac{n}{2\beta_3^2} \end{pmatrix}, \quad B = \begin{pmatrix} m\beta_2 + n\beta_3 & m\beta_2 - n\beta_3 & 0 & 0 \\ m\beta_2 - n\beta_3 & m\beta_2 + n\beta_3 & 0 & 0 \\ 0 & 0 & \dfrac{m}{2\beta_2^2} & 0 \\ 0 & 0 & 0 & \dfrac{n}{2\beta_3^2} \end{pmatrix}, \tag{3.8}$$

whence

$$\Sigma_{GS} = \frac{4mn(m\beta_3 + n\beta_2)}{(m + n)^2}, \tag{3.9}$$

and the generalised score test statistic is

$$S_G = \frac{(\bar Y - \bar Z)^2}{\tilde\beta_{20}/m + \tilde\beta_{30}/n}. \tag{3.10}$$
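The statistic in (3.10) needs only sample means and sums of squares about the pooled mean. The following Python sketch computes it and, as a consistency check, also forms $S_\theta^2 / \Sigma_{GS}$ from (3.7) and (3.9), which should give the same value. The function name, the toy data, and the use of `math.erfc` for the upper tail of the asymptotic $\chi^2_1$ distribution are our own illustrative choices.

```python
import math


def generalised_score_bf(ys, zs):
    """Generalised score statistic (3.10) for the Behrens-Fisher problem.

    Returns the statistic, the same quantity rebuilt from (3.7) and (3.9),
    and the asymptotic chi-squared(1) p-value.
    """
    m, n = len(ys), len(zs)
    ybar, zbar = sum(ys) / m, sum(zs) / n
    beta1 = (m * ybar + n * zbar) / (m + n)            # pooled mean under H
    beta2 = sum((y - beta1) ** 2 for y in ys) / m      # null estimate of var(Y)
    beta3 = sum((z - beta1) ** 2 for z in zs) / n      # null estimate of var(Z)

    stat = (ybar - zbar) ** 2 / (beta2 / m + beta3 / n)             # (3.10)

    s_theta = 2 * m * n * (ybar - zbar) / (m + n)                   # (3.7)
    sigma_gs = 4 * m * n * (m * beta3 + n * beta2) / (m + n) ** 2   # (3.9)
    check = s_theta ** 2 / sigma_gs

    # P(chi-squared_1 > s) = erfc(sqrt(s / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, check, p_value


stat, check, p_value = generalised_score_bf([1.0, 3.0], [5.0, 7.0, 9.0])
print(stat, check, p_value)
```

The samples here are far too small for the asymptotic p-value to be trusted; they are chosen only so that the arithmetic can be followed by hand.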

It may be shown that the Wald test statistic is a one-one function of this $S_G$, so that these two tests are equivalent. However, if using the asymptotic $\chi^2_1$ critical values, the generalised score test has actual test sizes much closer to the nominal sizes than the Wald test. When using simulated critical values that are virtually exact, the power of the generalised score test is within 1% of that of the entrenched test due to Welch [7]. So on this criterion the Welch and generalised score tests are virtually indistinguishable.

The Welch test is very similar to the Wald test. Using Satterthwaite’s approximation to the null distribution of the Welch test gives excellent agreement between the nominal and actual test sizes. However Satterthwaite’s approximation does not work nearly as well for $S_G$. Hence, in terms of agreement between nominal and actual test sizes using approximations and asymptotic critical values, the Welch test is to be preferred. Support for these assertions and more numerical details are available in [8].

4. Testing Equality of Variances

Suppose that we have $m$ independent random samples, with the $j$th, $j = 1, \ldots, m$, being of size $n_j$ and from a normal $N(\mu_j, \sigma_j^2)$ population. The total sample size is $n = n_1 + \cdots + n_m$. We seek to test equality of variances: $H: \sigma_1^2 = \cdots = \sigma_m^2 = \sigma^2$ say against the alternative $K$: not $H$. Popular tests include the likelihood ratio test, frequently referred to as Bartlett’s test, and Levene’s test. The former is known to be nonrobust, while the latter is more robust in that its actual levels are closer to the nominal levels. Levene’s test is less powerful than Bartlett’s when the data are consistent with normality.

We now construct a Wald test of $H$ against $K$. We could use the generalised Wald test construction with $\Psi(X, \gamma)$ being the derivative of the logarithm of the likelihood, but we leave that as an exercise for the interested reader. We could also calculate one of the forms of the asymptotic covariance matrix, but this is a case where it is simpler to calculate the exact covariance matrix. Moreover the exact covariance matrix involves an inconvenient inverse; so we instead use the Moore-Penrose inverse. This is defined in the appendix, along with some relevant useful results. This approach leads to a simpler test statistic.

Throughout this example, since we are calculating the Wald test statistic, all estimation is ML. As a consequence estimators are denoted by hats ($\hat{\ }$) instead of tildes ($\tilde{\ }$). We also use unbiased versions of the sample variances (with divisors $n - 1$ instead of $n$). These are asymptotically equivalent to the usual ML estimators, and the corresponding test statistic is asymptotically equivalent to the usual Wald test statistic.

Before proceeding with the construction, note that if $S^2$ is the unbiased sample variance from a random sample of size $n$ from a $N(\mu, \sigma^2)$ distribution, then $(n - 1)S^2/\sigma^2$ has the $\chi^2_{n-1}$ distribution. As is well known, $\operatorname{var}\{(n - 1)S^2/\sigma^2\} = 2(n - 1)$, so that

$$\operatorname{var}(S^2) = \frac{2\sigma^4}{n - 1} \quad \text{and} \quad E[S^4] = \frac{(n + 1)\sigma^4}{n - 1}. \tag{4.1}$$

From the Rao-Blackwell theorem $(n - 1)S^4/(n + 1)$ is an optimal estimator of $\sigma^4$, being the unique estimator with minimum variance in the class of unbiased estimators of $\sigma^4$. This optimality is conferred upon $2S^4/(n + 1)$ when estimating $\operatorname{var}(S^2)$. Writing $S_j^2$ for the unbiased estimator of the $j$th population variance $\sigma_j^2$, $j = 1, \ldots, m$, the optimal estimator of $\operatorname{var}(S_j^2) = 2\sigma_j^4/(n_j - 1)$ is $d_j = 2S_j^4/(n_j + 1)$ for $j = 1, \ldots, m$.
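For readers who want the intermediate steps behind (4.1) and the unbiasedness of $2S^4/(n + 1)$, a short sketch using only the variance of the $\chi^2_{n-1}$ distribution and $E[S^2] = \sigma^2$:

$$\begin{aligned}
\operatorname{var}(S^2) &= \frac{\sigma^4}{(n - 1)^2} \operatorname{var}\left\{\frac{(n - 1)S^2}{\sigma^2}\right\} = \frac{\sigma^4}{(n - 1)^2} \cdot 2(n - 1) = \frac{2\sigma^4}{n - 1}, \\
E[S^4] &= \operatorname{var}(S^2) + \left(E[S^2]\right)^2 = \frac{2\sigma^4}{n - 1} + \sigma^4 = \frac{(n + 1)\sigma^4}{n - 1}, \\
E\left[\frac{2S^4}{n + 1}\right] &= \frac{2}{n + 1} \cdot \frac{(n + 1)\sigma^4}{n - 1} = \frac{2\sigma^4}{n - 1} = \operatorname{var}(S^2).
\end{aligned}$$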

Should the null hypothesis be true, an unbiased estimator of the common population variance $\sigma^2$ is the pooled sample variance $S^2 = \sum_j w_j S_j^2$ where $w_j = (n_j - 1)/(n - m)$ for $j = 1, \ldots, m$. Note that since $\sum_j (n_j - 1) = n - m$, $\sum_j w_j = 1$. Now define

$$\sigma^2 = \sum_j w_j \sigma_j^2, \quad \phi = \left(\sigma_j^2 \sqrt{w_j}\right), \quad u = \left(\sqrt{w_j}\right), \quad C = I_m - u u^T. \tag{4.2}$$

Then $\theta = C\phi = ((\sigma_j^2 - \sigma^2)\sqrt{w_j})$. This is zero if and only if $\sigma_j^2 = \sigma^2$ for all $j$. Hence testing equality of variances is equivalent to testing $H: \theta = 0_m$ against $K: \theta \neq 0_m$. An unbiased estimator of $\theta$ is $\hat\theta = ((S_j^2 - S^2)\sqrt{w_j})$ and since $C$ is symmetric, $\hat\theta = C\hat\phi$ has covariance matrix estimated by $\widehat{\operatorname{cov}}(\hat\theta) = CDC$ where now $D = \operatorname{diag}(d_j w_j)$. Now $CDC$ is not of full rank, and in order to use results on quadratic forms of multivariate normal random variables, generalised or pseudoinverses are required. Here we use $M^+$, the Moore-Penrose inverse of the matrix $M$. See the appendix.

Because $C$ is idempotent, the Moore-Penrose inverse of $CDC$ is given by

$$(CDC)^+ = C^+ D^+ C^+ = C D^{-1} C. \tag{4.3}$$

A Wald test statistic for testing $H: \theta = 0_m$ against $K: \theta \neq 0_m$ is

$$\hat\theta^T\, \widehat{\operatorname{cov}}(\hat\theta)^+\, \hat\theta = \hat\phi^T C (CDC)^+ C \hat\phi = \hat\phi^T C\, C D^{-1} C\, C \hat\phi = \hat\phi^T C D^{-1} C \hat\phi = \hat\theta^T D^{-1} \hat\theta = \sum_{j=1}^{m} \frac{(S_j^2 - S^2)^2}{d_j} = T_{MP}, \text{ say}. \tag{4.4}$$

Since $\operatorname{rank}(CDC) = m - 1$, $T_{MP}$ should be compared with the $\chi^2_{m-1}$ distribution to assess significance. Should the test indicate significance at an appropriate level, rough pairwise comparisons can be made as in the comparison of means in the analysis of variance. To see how to do this first note that, as above, $(n - 1)S^2/\sigma^2$ has the $\chi^2_{n-1}$ distribution which, for large $n$, is approximately $N((n - 1), 2(n - 1))$. Hence $S^2$ is approximately $N(\sigma^2, 2\sigma^4/(n - 1))$ and under the null hypothesis of equality of variances for any $i \neq j$ the difference $S_i^2 - S_j^2$ is approximately $N(0, 2\sigma_i^4/(n_i - 1) + 2\sigma_j^4/(n_j - 1))$, and $\operatorname{var}(S_i^2 - S_j^2)$ can be estimated by $d_i + d_j$. A least significant difference may be constructed in the usual way.
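The final sum in (4.4) and the rough pairwise comparisons can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the function names, the toy samples, and the choice of a two-sided normal quantile for the least significant difference are assumptions. The statistic should be referred to $\chi^2_{m-1}$ critical values from tables or from a statistics package.

```python
from statistics import NormalDist, variance


def variance_homogeneity_tmp(samples):
    """T_MP of (4.4), with its degrees of freedom, the S_j^2 and the d_j."""
    sizes = [len(x) for x in samples]
    n, m = sum(sizes), len(samples)
    s2 = [variance(x) for x in samples]                  # unbiased, divisor n_j - 1
    w = [(nj - 1) / (n - m) for nj in sizes]             # weights summing to 1
    pooled = sum(wj * sj for wj, sj in zip(w, s2))       # pooled variance S^2
    d = [2 * sj ** 2 / (nj + 1) for sj, nj in zip(s2, sizes)]  # d_j = 2 S_j^4 / (n_j + 1)
    t_mp = sum((sj - pooled) ** 2 / dj for sj, dj in zip(s2, d))
    return t_mp, m - 1, s2, d


def pairwise_lsd(s2, d, alpha=0.05):
    """Rough pairwise comparisons: |S_i^2 - S_j^2| against z * sqrt(d_i + d_j)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    results = {}
    for i in range(len(s2)):
        for j in range(i + 1, len(s2)):
            diff = abs(s2[i] - s2[j])
            lsd = z * (d[i] + d[j]) ** 0.5
            results[(i, j)] = (diff, lsd, diff > lsd)
    return results


samples = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [0.0, 5.0, 10.0]]  # toy data
t_mp, df, s2, d = variance_homogeneity_tmp(samples)
print(t_mp, df)
print(pairwise_lsd(s2, d))
```

With samples this small the asymptotic approximations cannot be trusted; the data are chosen only so that $S_j^2$ and $d_j$ are easy to verify by hand.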

Appendix

The Moore-Penrose Inverse

One of several pseudo-inverses or generalised inverses is the Moore-Penrose inverse: see, for example, [9, section 8.11]. The unique Moore-Penrose inverse $B^+$ of a real symmetric matrix $B$ satisfies

$$B^+ B B^+ = B^+, \quad B B^+ B = B, \quad (B^+ B)^T = B^+ B, \quad (B B^+)^T = B B^+. \tag{A.1}$$

It is routine to show the following.

(i) If $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_r, 0, \ldots, 0)$, then $\Lambda^+ = \operatorname{diag}(\lambda_1^{-1}, \ldots, \lambda_r^{-1}, 0, \ldots, 0)$.
(ii) If $H$ is orthogonal, then $H^+ = H^T$.
(iii) If $A$ is idempotent, then $A^+ = A$.
(iv) If the subsequent matrix products are defined, then $(BC)^+ = C^+ B^+$ and $(ABC)^+ = C^+ B^+ A^+$.
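Results (i) and (iii) can be checked directly against the four conditions in (A.1). The Python sketch below does this in exact rational arithmetic for an illustrative diagonal matrix and for a matrix of the form $C = I_m - u u^T$ with $u^T u = 1$, as used in Section 4. The helper names and the particular $u$, chosen so that its entries are rational, are assumptions for illustration.

```python
from fractions import Fraction as F


def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def transpose(P):
    return [list(row) for row in zip(*P)]


def is_moore_penrose_pair(B, Bp):
    """True if Bp satisfies the four conditions of (A.1) for B."""
    BBp, BpB = matmul(B, Bp), matmul(Bp, B)
    return (matmul(BpB, Bp) == Bp          # B+ B B+ = B+
            and matmul(BBp, B) == B        # B B+ B = B
            and transpose(BpB) == BpB      # (B+ B)^T = B+ B
            and transpose(BBp) == BBp)     # (B B+)^T = B B+


# Result (i): a diagonal matrix with a zero on the diagonal.
Lam = [[F(2), F(0), F(0)], [F(0), F(5), F(0)], [F(0), F(0), F(0)]]
Lam_plus = [[F(1, 2), F(0), F(0)], [F(0), F(1, 5), F(0)], [F(0), F(0), F(0)]]

# Result (iii): C = I - u u^T with u = (1/3, 2/3, 2/3), so that u^T u = 1.
u = [F(1, 3), F(2, 3), F(2, 3)]
C = [[(1 if i == j else 0) - u[i] * u[j] for j in range(3)] for i in range(3)]

print(is_moore_penrose_pair(Lam, Lam_plus), matmul(C, C) == C,
      is_moore_penrose_pair(C, C))
```

The matrix $C$ here is symmetric as well as idempotent, which is the situation arising in Section 4.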

It is well known that if $X$ is $N_p(0, \Sigma)$ with $\operatorname{rank}(\Sigma) = r < p$, then $X^T \Sigma^+ X$ has the $\chi^2_r$ distribution where $\Sigma^+$ is a pseudoinverse of $\Sigma$. For the scenario here it is reasonable to test $H$ against $K$ using the test statistic $\hat\theta^T (\widehat{\operatorname{cov}}(\hat\theta))^+ \hat\theta$.

References

  1. J. C. W. Rayner, “The asymptotically optimal tests,” Journal of the Royal Statistical Society Series D, vol. 46, no. 3, pp. 337–346, 1997.
  2. J. C. W. Rayner, O. Thas, and D. J. Best, Smooth Tests of Goodness of Fit: Using R, John Wiley & Sons, Singapore, 2nd edition, 2009.
  3. A. W. van der Vaart, Asymptotic Statistics, vol. 3 of Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge University Press, Cambridge, UK, 1998.
  4. J. T. Kent, “Robust properties of likelihood ratio tests,” Biometrika, vol. 69, no. 1, pp. 19–27, 1982.
  5. D. Boos, “On generalised score tests,” The American Statistician, vol. 47, pp. 327–333, 1992.
  6. H. White, “Maximum likelihood estimation of misspecified models,” Econometrica, vol. 50, no. 1, pp. 1–25, 1982.
  7. B. L. Welch, “The significance of the difference between two means when the population variances are unequal,” Biometrika, vol. 29, pp. 350–362, 1937.
  8. P. Rippon, J. Rayner, and O. Thas, “A competitor for the test Welch test in the Behrens-Fisher problem,” Unpublished Report, 2008.
  9. J. L. Goldberg, Matrix Theory with Applications, McGraw-Hill, New York, NY, USA, 1991.