Abstract

The canonical correlations between subsets of OLS estimators are identified with design linkage parameters between their regressors. Known collinearity indices are extended to encompass angles between each regressor vector and remaining vectors. One such angle quantifies the collinearity of regressors with the intercept, of concern in the corruption of all estimates due to ill-conditioning. Matrix identities factorize a determinant in terms of principal subdeterminants and the canonical Vector Alienation Coefficients between subset estimators—by duality, the Alienation Coefficients between subsets of regressors. These identities figure in the study of D and 𝐷𝑠 as determinant efficiencies for estimators and their subsets, specifically, 𝐷𝑠-efficiencies for the constant, linear, pure quadratic, and interactive coefficients in eight known small second-order designs. Studies on D- and 𝐷𝑠-efficiencies confirm that designs are seldom efficient for both. Determinant identities demonstrate the propensity for 𝐷𝑠-inefficient subsets to be masked through near collinearities in overall D-efficient designs.

1. Introduction

Given {𝐘=𝐗𝜷+𝝐} of full rank with homogeneous, uncorrelated errors, the OLS estimators 𝜷̂ are unbiased with second-moment matrix 𝑉(𝜷̂)=𝜎²(𝐗′𝐗)⁻¹. Such moment matrices pervade experimental design, to include determinants as gauges of 𝐷- and 𝐷𝑠-efficiencies for estimators and their subsets. Early references trace to [1–4], and more recently to [5–10] and others. Finding 𝐷𝑠-efficient designs for polynomial models is considered in [11–20], for example. Studies examining the 𝐷𝑠-efficiencies of 𝐷-efficient designs confirm that designs are seldom efficient for both; see [13, 21–23]. From those beginnings, the study of 𝐷- and 𝐷𝑠-efficiencies continues apace. To wit, a recent key-word search in the Current Index to Statistics shows in excess of 60 listings from 2006 to 2010, and more than 100 from 2001 to 2010. Moreover, these ideas bear fruit in a widening diversity of applications, as evidenced in the following.

To fix ideas, let 𝐷 correspond to a polynomial 𝑃𝑐 of degree 𝑐, namely, 𝑔(𝜇)=∑ᶜᵢ₌₀ 𝛽ᵢ𝑡ⁱ. In toxicology studies, a two-stage experiment is proffered in [24], seeking 𝐷-efficiency in estimating the 𝑘=𝑐+1 overall parameters at the first stage, then 𝐷₁-efficiency at the second stage in estimating a critical “threshold parameter,” using quasilikelihood in nonlinear models. Coupled with this is the 𝐷_{𝑘−1}-efficiency for the remaining 𝑘−1 parameters at the second stage. In related work [25], experiments with 𝑐 chemicals in combination are to be examined along fixed-ratio rays. When restricted to a specified ray, the fundamental hypothesis of noninteracting factors can be rejected when higher-order polynomial terms are required in the total dose-response model 𝑔(𝜇)=𝛽₀+𝛽₁𝑡+∑ᶜᵢ₌₂ 𝛽ᵢ𝑡ⁱ in the linear predictor 𝑡. Here 𝐷₂ refers to [𝛽₀,𝛽₁]′ and 𝐷_{𝑐−1} to the 𝐷𝑠-efficiency in the critical estimation of [𝛽₂,…,𝛽𝑐]′, which vanish under the conjectured additivity. Moreover, in [11, 12, 21] 𝐷 refers to 𝑃_{𝑘+1}, 𝐷𝑘 to 𝑃𝑘, and 𝐷₁ to 𝛽_{𝑘+1}, the highest-order term in 𝑃_{𝑘+1}, for example. In short, users often are properly concerned with both 𝐷- and 𝐷𝑠-efficiencies, and connections between these basic criteria deserve further study, to be undertaken here.

Ill-conditioning, as near-collinearity among the columns of 𝐗, “causes crucial elements of 𝐗′𝐗 to be large and unstable,” “creating inflated variances,” and estimates that are “very sensitive to small changes in 𝐗,” having “degraded numerical accuracy”; see [26–28], for example. Diagnostics include the condition number 𝑐₁(𝐗′𝐗), the ratio of largest to smallest eigenvalues, and the Variance Inflation Factors {VIF(𝛽̂ⱼ)=𝑣ⱼⱼ𝑤ⱼⱼ; 1≤𝑗≤𝑝} with 𝐖=𝐗′𝐗 and 𝐕=(𝐗′𝐗)⁻¹, that is, ratios of actual (𝑣ⱼⱼ) to “ideal” (1/𝑤ⱼⱼ) variances had the columns of 𝐗 been orthogonal. In models with intercept, “collinearity with the intercept can quite generally corrupt the estimates of all parameters in the model whether or not the intercept is itself of interest and whether or not the data have been (mean) centered,” as noted in [29].
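
The diagnostics just described reduce to a few matrix operations. The following is a minimal sketch, assuming NumPy is available; the function name is ours.

```python
import numpy as np

def collinearity_diagnostics(X0):
    """Condition number c1(X0'X0) and uncentered VIFs for a full-rank design X0."""
    W = X0.T @ X0                       # moment matrix X0'X0
    V = np.linalg.inv(W)
    eigs = np.linalg.eigvalsh(W)        # eigenvalues in ascending order
    c1 = eigs[-1] / eigs[0]             # ratio of largest to smallest eigenvalue
    vifs = np.diag(V) * np.diag(W)      # actual (v_jj) over "ideal" (1/w_jj) variances
    return c1, vifs
```

For a design with mutually orthogonal columns, 𝑐₁ reduces to the spread of squared column lengths and every VIF is unity, the reference case against which inflation is judged.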

To the foregoing list of ills from ill-conditioning, we add that not only are designs seldom efficient for both criteria, but 𝐷𝑠-inefficient estimators may be masked in overall 𝐷-efficient designs, and conversely. This masking may be quantified in terms of structural dependencies, specifically, through determinant identities linking 𝐷- and 𝐷𝑠-efficiencies to various gauges of nonorthogonality of the data. The latter include nonvanishing inner products between columns of regressors, Hotelling’s [30] canonical correlations among OLS solutions, and VIFs. An outline follows.

Section 2 contains supporting material. Details surrounding collinearity diagnostics are topics in Section 3, to include duality of angles between subspaces of the design and parameter spaces, and their connections to VIFs. Section 4 develops basic determinant identities and inequalities of independent interest. Section 5 revisits eight small second-order designs with regard to 𝐷𝑠-efficiencies in estimating the constant, linear, pure quadratic, and interactive coefficients, to include the masking of inefficient estimators. Though in wide usage, with no apparent accounting for collinearity, these designs are seen to exhibit varying degrees of collinearity of the regressors with the constant. Since computations proceed from the design matrix itself, prospective designs can be evaluated in regard to the issues studied here before committing to an actual experiment. Section 6 concludes with a brief summary.

2. Preliminaries

2.1. Notation

Spaces of note include ℝᵏ as Euclidean 𝑘-space; ℝ₊ᵏ as its positive orthant; 𝔽𝑛×𝑘 as the real matrices of order (𝑛×𝑘); 𝕊𝑘 as the (𝑘×𝑘) real symmetric matrices; and 𝕊⁺𝑘 as their positive definite varieties. The transpose, inverse, trace, and determinant of 𝐀∈𝕊⁺𝑘 are 𝐀′, 𝐀⁻¹, tr(𝐀), and |𝐀|; and 𝐀^{1/2} is its spectral square root. Special arrays include the unit vector 𝟏𝑛=[1,1,…,1]′∈ℝⁿ, the identity 𝐈𝑛 of order (𝑛×𝑛), the block-diagonal matrix Diag(𝐀₁,𝐀₂)∈𝕊𝑘, the idempotent form 𝐁𝑛=(𝐈𝑛−𝑛⁻¹𝟏𝑛𝟏𝑛′), and 𝒪(𝑘) as the real orthogonal group of (𝑘×𝑘) matrices. For 𝐗 of order (𝑛×𝑝) and rank 𝑝≤𝑛, designate a pseudoinverse as 𝐗†, its ordered singular values as 𝜎(𝐗)={𝜉₁≥𝜉₂≥⋯≥𝜉𝑝>0}, and by 𝒮𝑝(𝐗)⊂ℝⁿ the linear span of the columns of 𝐗. Its condition number is 𝑐₂(𝐗)=𝜉₁/𝜉𝑝, specifically, 𝑐₂(𝐗)=[𝑐₁(𝐗′𝐗)]^{1/2}.

The mean, dispersion matrix, and generalized variance for a random 𝐔∈ℝᵏ are designated as 𝐸(𝐔)=𝝁∈ℝᵏ, 𝑉(𝐔)=𝚺∈𝕊⁺𝑘, and 𝐺𝑉(𝐔)=|𝚺|, respectively. To account for dimension, consider 𝐺(𝐔)=[𝐺𝑉(𝐔)]^{1/𝑘}=|𝚺|^{1/𝑘} as a function homogeneous of unit degree. The class M₀={𝐘=𝛽₀𝟏𝑛+𝐗𝜷+𝝐}, comprising models with intercept and dispersion 𝑉(𝝐)=𝜎²𝐈𝑛, is our principal focus. Unless stated otherwise, we take 𝜎²=1.0, since variance ratios are scale-invariant. A distinction is drawn between centered and uncentered VIFs, namely, VIF𝑐s and VIF𝑢s, the former from columns of 𝐗 centered to their means. The latter, designated as {VIF𝑢(𝛽̂ⱼ); 𝑗=0,1,…,𝑘}, are diagonal elements of (𝐗₀′𝐗₀)⁻¹ divided by reciprocals of the corresponding diagonals of 𝐗₀′𝐗₀ itself. These are of subsequent interest. Special distributions on ℝ₊¹ include the Snedecor-Fisher distribution 𝐹(⋅;𝜈₁,𝜈₂,𝜆) having (𝜈₁,𝜈₂) degrees of freedom and noncentrality 𝜆.

3. Collinearity Diagnostics

Ill-conditioned models {𝐘=𝐗𝜷+𝝐}, burdened with difficulties as cited, trace to nonorthogonality among columns of 𝐗. To examine aspects of near collinearity, we first establish duality between design linkage parameters among columns of 𝐗, and collinearity among the OLS solutions as quantified by Hotelling's [30] canonical correlations.

3.1. Duality Results

Partition a generic 𝐗∈𝔽𝑛×𝑝 as 𝐗=[𝐗₁,𝐗₂] with {𝐗,𝐗₁,𝐗₂} of orders {(𝑛×𝑝),(𝑛×𝑟),(𝑛×𝑠)}, respectively, having ranks {𝑝,𝑟,𝑠} such that 𝑟≤𝑠 and 𝑟+𝑠=𝑝<𝑛. Accordingly, write {𝐘=𝐗₁𝜷₁+𝐗₂𝜷₂+𝝐}, taking 𝜷=[𝜷₁′,𝜷₂′]′, and denoting by 𝒮𝑝(𝐗₁) and 𝒮𝑝(𝐗₂) the subspaces of ℝⁿ spanned by the columns of 𝐗₁ and 𝐗₂. We seek a canonical form preserving these subspaces and the linkage between (𝐗₁,𝐗₂), a geometric concept independent of the bases chosen to represent 𝒮𝑝(𝐗₁) and 𝒮𝑝(𝐗₂). Accordingly, let 𝐆₁=(𝐗₁′𝐗₁)^{−1/2}𝐏 and 𝐆₂=(𝐗₂′𝐗₂)^{−1/2}𝐐, with 𝐏∈𝒪(𝑟) and 𝐐∈𝒪(𝑠) to be stipulated. The original model becomes {𝐘=𝐙₁𝜶₁+𝐙₂𝜶₂+𝝐} with 𝐙=[𝐙₁,𝐙₂] and 𝜶=[𝜶₁′,𝜶₂′]′, such that 𝐙₁=𝐗₁𝐆₁, 𝐙₂=𝐗₂𝐆₂, 𝜶₁=𝐆₁⁻¹𝜷₁, and 𝜶₂=𝐆₂⁻¹𝜷₂.

Following [31], cosines of angles between 𝒮𝑝(𝐗₁) and 𝒮𝑝(𝐗₂) are found as singular values generated by (𝐗₁,𝐗₂), to be designated as design linkage parameters {𝛿₁,…,𝛿𝑟}. To these ends, observe that 𝐗′𝐗 in partitioned form transitions into 𝐙′𝐙 through

𝐗′𝐗 = [𝐗₁′𝐗₁  𝐗₁′𝐗₂; 𝐗₂′𝐗₁  𝐗₂′𝐗₂] → [𝐆₁′𝐗₁′𝐗₁𝐆₁  𝐆₁′𝐗₁′𝐗₂𝐆₂; 𝐆₂′𝐗₂′𝐗₁𝐆₁  𝐆₂′𝐗₂′𝐗₂𝐆₂] = [𝐈𝑟  𝐏′𝐑𝐐; 𝐐′𝐑′𝐏  𝐈𝑠] = [𝐈𝑟  𝐃; 𝐃′  𝐈𝑠] = 𝐙′𝐙, (3.1)

writing [𝐀  𝐁; 𝐂  𝐄] for a matrix in (2×2) block-partitioned form, with semicolons separating block rows. Here 𝐑=(𝐗₁′𝐗₁)^{−1/2}𝐗₁′𝐗₂(𝐗₂′𝐗₂)^{−1/2}; its singular decomposition is 𝐑=𝐏𝐃𝐐′, where 𝐃=[𝐃𝛿,𝟎]; and the elements of 𝐃𝛿=Diag(𝛿₁,…,𝛿𝑟) comprise the singular values of 𝐑. In particular, {𝜙ⱼ=arccos(𝛿ⱼ); 1≤𝑗≤𝑟} defines the design linkage angles between 𝒮𝑝(𝐗₁) and 𝒮𝑝(𝐗₂) as subspaces of ℝⁿ.
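
The design linkage parameters can be computed directly from (3.1); a sketch assuming NumPy, with the helper names ours:

```python
import numpy as np

def inv_sqrt(A):
    """Symmetric inverse square root of a positive definite matrix."""
    w, U = np.linalg.eigh(A)
    return U @ np.diag(w ** -0.5) @ U.T

def design_linkage(X1, X2):
    """Singular values of R = (X1'X1)^{-1/2} X1'X2 (X2'X2)^{-1/2}: the cosines delta_j."""
    R = inv_sqrt(X1.T @ X1) @ (X1.T @ X2) @ inv_sqrt(X2.T @ X2)
    return np.linalg.svd(R, compute_uv=False)   # delta_1 >= ... >= delta_r
```

For 𝐗₁=𝟏𝑛 and 𝐗₂ a single column 𝐭, this returns the ordinary cosine |𝟏𝑛′𝐭|/(‖𝟏𝑛‖‖𝐭‖), the familiar one-dimensional special case.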

To continue, partition 𝑉(𝜷̂)=𝚺=[𝚺ᵢⱼ] conformably with 𝜷=[𝜷₁′,𝜷₂′]′, 𝜷₁∈ℝʳ, 𝜷₂∈ℝˢ; designate their inner product space as (ℝʳ⊕ℝˢ,(⋅,⋅)_𝚺), where ℝʳ⊕ℝˢ is the direct sum and (⋅,⋅)_𝚺 their inner product, as in Eaton [32, page 409]. Denote by {𝜌₁,…,𝜌𝑟} Hotelling’s [30] canonical correlations. Then by Proposition 10.2 of [32], {𝜌₁,…,𝜌𝑟} are cosines of angles between (ℝʳ,ℝˢ) as subspaces of (ℝʳ⊕ℝˢ,(⋅,⋅)_𝚺). In keeping with earlier usage, identify {𝒮𝑝(𝜷̂₁),𝒮𝑝(𝜷̂₂)} with {ℝʳ,ℝˢ}. As Hotelling's canonical correlations are invariant under affine transformations {𝜷₁→𝐀𝜷₁+𝐜₁, 𝜷₂→𝐁𝜷₂+𝐜₂}, parameters may be redefined linearly, preserving subspaces, thus leaving the canonical correlations invariant. Retracing the steps leading to the canonical design model embodied in (3.1), but now to preserve {𝒮𝑝(𝜷̂₁),𝒮𝑝(𝜷̂₂)}, it thus suffices to begin with the canonical model {𝐘=𝐙₁𝜶₁+𝐙₂𝜶₂+𝝐}, where 𝑉(𝜶̂)=𝜎²(𝐙′𝐙)⁻¹ with 𝐙′𝐙 as the rightmost matrix of (3.1).

We next establish connections between the design linkage parameters 𝐃𝛿 from (3.1) and the corresponding canonical correlations 𝐃𝜌=Diag(𝜌₁,…,𝜌𝑟), as derived eventually from 𝚺=(𝐗′𝐗)⁻¹. A critical duality result is encoded in the following.

Theorem 3.1. Consider the design linkage parameters 𝐃𝛿 between {𝒮𝑝(𝐗₁),𝒮𝑝(𝐗₂)} as subspaces of ℝⁿ and Hotelling’s [30] canonical correlations 𝐃𝜌 between {𝒮𝑝(𝜷̂₁),𝒮𝑝(𝜷̂₂)} as subspaces of (ℝʳ⊕ℝˢ,(⋅,⋅)_𝚺). Then 𝐃𝛿 and 𝐃𝜌 coincide.

Proof. In view of the invariance of {𝜌₁,…,𝜌𝑟} under nonsingular linear transformations of 𝜷₁∈ℝʳ and of 𝜷₂∈ℝˢ, canonical correlations between (𝜷̂₁,𝜷̂₂) proceed as in expression (3.1), but beginning instead on the left with 𝑉(𝜶̂)=(𝐙′𝐙)⁻¹ in lieu of 𝐗′𝐗. Specifically, with 𝐃=[𝐃𝛿,𝟎], and using the rules for block-partitioned inverses, we have

(𝐙′𝐙)⁻¹ = [𝐈𝑟  [𝐃𝛿,𝟎]; [𝐃𝛿,𝟎]′  𝐈𝑠]⁻¹ = [(𝐈𝑟−𝐃𝛿²)⁻¹  −(𝐈𝑟−𝐃𝛿²)⁻¹[𝐃𝛿,𝟎]; −[𝐃𝛿,𝟎]′(𝐈𝑟−𝐃𝛿²)⁻¹  (𝐈𝑠−𝐃₀)⁻¹], (3.2)

where equality at the final step follows using 𝐃𝐃′=𝐃𝛿² and 𝐃′𝐃=Diag(𝐃𝛿²,𝟎)=𝐃₀. The succeeding step utilizes the factors (𝐈𝑟−𝐃𝛿²)^{1/2} and (𝐈𝑠−𝐃₀)^{1/2}, taking the principal diagonal blocks of (𝐙′𝐙)⁻¹ into (𝐈𝑟,𝐈𝑠), and its off-diagonal block into

(𝐈𝑟−𝐃𝛿²)^{1/2}(𝐈𝑟−𝐃𝛿²)⁻¹[𝐃𝛿,𝟎](𝐈𝑠−𝐃₀)^{1/2} = [𝐃𝛿,𝟎], (3.3)

since diagonal matrices commute. But the off-diagonal block, apart from signs, is precisely [𝐃𝜌,𝟎], the canonical correlations between (𝜷̂₁,𝜷̂₂), to complete our proof.
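
Theorem 3.1 lends itself to a direct numerical check: compute the singular values of 𝐑 from the partitioned 𝐗′𝐗, and the canonical correlations from 𝚺=(𝐗′𝐗)⁻¹, and compare. A sketch assuming NumPy, on a randomly generated design:

```python
import numpy as np

def cosines(A11, A12, A22):
    """Singular values of A11^{-1/2} A12 A22^{-1/2} for positive definite A11, A22."""
    def inv_sqrt(A):
        w, U = np.linalg.eigh(A)
        return U @ np.diag(w ** -0.5) @ U.T
    return np.linalg.svd(inv_sqrt(A11) @ A12 @ inv_sqrt(A22), compute_uv=False)

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))       # full rank almost surely
X1, X2 = X[:, :2], X[:, 2:]

# design linkage parameters from the partitioned X'X ...
delta = cosines(X1.T @ X1, X1.T @ X2, X2.T @ X2)
# ... and canonical correlations from Sigma = (X'X)^{-1}, partitioned conformably
S = np.linalg.inv(X.T @ X)
rho = cosines(S[:2, :2], S[:2, 2:], S[2:, 2:])
```

The two sets agree to machine precision, as the theorem asserts.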

For subsequent reference, designate 𝜹(𝐗₁∣𝐗₂)=(𝛿₁,…,𝛿𝑟) and 𝝆(𝜷̂₁∣𝜷̂₂)=(𝜌₁,…,𝜌𝑟). Moreover, the foregoing analysis applies for models 𝐗₀=[𝟏𝑛,𝐗] in M₀, where 𝑟=1 and 𝑠=𝑘 as partitioned. In short, we have the following equivalences.

Corollary 3.2. (i) Consider the design linkage parameters {cos(𝜙ⱼ)=𝛿ⱼ; 1≤𝑗≤𝑟}, gauging collinearity between {𝒮𝑝(𝐗₁),𝒮𝑝(𝐗₂)} as subspaces of ℝⁿ, and the canonical correlations {cos(𝜙ⱼ)=𝜌ⱼ; 1≤𝑗≤𝑟} between {𝒮𝑝(𝜷̂₁),𝒮𝑝(𝜷̂₂)} as subspaces of (ℝʳ⊕ℝˢ,(⋅,⋅)_𝚺). Then angles between these pairs of subspaces correspond one-to-one, that is, {𝜙ⱼ=arccos(𝛿ⱼ)=arccos(𝜌ⱼ); 1≤𝑗≤𝑟}.
(ii) For models 𝐗₀=[𝟏𝑛,𝐗] in M₀, the element 𝜹(𝟏𝑛∣𝐗)=𝛿₁ generates the angle cos(𝜙₁)=𝛿₁ between the constant vector and the span of the regressor vectors. Equivalently, this is given by cos(𝜙₁)=𝜌₁=𝝆(𝛽̂₀∣𝜷̂) from duality.

3.2. Collinearity Indices

Stewart [33] reexamined numerical aspects of ill-conditioning, to the following effects for 𝐗₀=[𝟏𝑛,𝐗]. Taking 𝐗₀†=(𝐗₀′𝐗₀)⁻¹𝐗₀′ as the pseudoinverse of note, and letting 𝐱ⱼ† be its 𝑗th row, each collinearity index in the collection

{𝜅ⱼ=‖𝐱ⱼ‖‖𝐱ⱼ†‖; 𝑗=0,1,…,𝑘} (3.4)

is constructed to be scale-invariant. Clearly ‖𝐱ⱼ†‖² is found along the principal diagonal of (𝐗₀†)(𝐗₀†)′=(𝐗₀′𝐗₀)⁻¹. In addition, the conventional VIF𝑢s are squares of the collinearity indices, that is, {VIF𝑢(𝛽̂ⱼ)=𝜅ⱼ²; 𝑗=0,1,…,𝑘}. In particular, since 𝐱₀=𝟏𝑛 in 𝐗₀, we have 𝜅₀²=𝑛‖𝐱₀†‖².

Transcending Stewart’s analysis, we connect his collinearity indices to angles between subspaces as follows. Choose a typical column 𝐱ⱼ of 𝐗₀; rearrange 𝐗₀ as [𝐱ⱼ,𝐗₍ⱼ₎] and similarly 𝜷 as [𝛽ⱼ,𝜷₍ⱼ₎′]′; and seek the elements of

𝐐ⱼ′(𝐗₀′𝐗₀)⁻¹𝐐ⱼ = [𝐱ⱼ′𝐱ⱼ  𝐱ⱼ′𝐗₍ⱼ₎; 𝐗₍ⱼ₎′𝐱ⱼ  𝐗₍ⱼ₎′𝐗₍ⱼ₎]⁻¹; 𝑗=0,1,…,𝑘, (3.5)

as reordered by each permutation matrix 𝐐ⱼ. From the clockwise rule, the (1,1) element of each inverse is

[𝐱ⱼ′(𝐈𝑛−𝐏ⱼ)𝐱ⱼ]⁻¹ = [𝐱ⱼ′𝐱ⱼ−𝐱ⱼ′𝐏ⱼ𝐱ⱼ]⁻¹ = ‖𝐱ⱼ†‖²; 𝑗=0,1,…,𝑘, (3.6)

where 𝐏ⱼ=𝐗₍ⱼ₎[𝐗₍ⱼ₎′𝐗₍ⱼ₎]⁻¹𝐗₍ⱼ₎′ is the projection operator onto the subspace 𝒮𝑝(𝐗₍ⱼ₎)⊂ℝⁿ. These relationships in turn enable us to connect {𝜅ⱼ²; 𝑗=0,1,…,𝑘} to the geometry of ill-conditioning as follows.

Theorem 3.3. For models in M₀, let {VIF𝑢(𝛽̂ⱼ)=𝜅ⱼ²; 𝑗=0,1,…,𝑘} be the conventional VIF𝑢s in terms of Stewart’s collinearity indices. These in turn quantify collinearities between subspaces through angles (in deg) as follows.
(i) Angles between [𝐱ⱼ,𝐗₍ⱼ₎] are given by 𝜙ⱼ=arccos[(1−1/𝜅ⱼ²)^{1/2}], in succession for {𝑗=0,1,…,𝑘}. (ii) Equivalently, {𝜅ⱼ²=1/[1−𝜹²(𝐱ⱼ∣𝐗₍ⱼ₎)]=1/[1−𝝆²(𝛽̂ⱼ∣𝜷̂₍ⱼ₎)]; 𝑗=0,1,…,𝑘}. (iii) In particular, 𝜙₀=arccos[(1−1/𝜅₀²)^{1/2}] quantifies the degree of collinearity between the regressor vectors and the constant vector.

Proof. From the geometry of the right triangle formed by (𝐱ⱼ,𝐏ⱼ𝐱ⱼ), the squared lengths satisfy ‖𝐱ⱼ‖²=‖𝐏ⱼ𝐱ⱼ‖²+𝑅𝑆ⱼ, where 𝑅𝑆ⱼ=‖𝐱ⱼ−𝐏ⱼ𝐱ⱼ‖² is the residual sum of squares from the projection. Accordingly, the principal angle between (𝐱ⱼ,𝐏ⱼ𝐱ⱼ) is given by

cos(𝜙ⱼ) = 𝐱ⱼ′𝐏ⱼ𝐱ⱼ/(‖𝐱ⱼ‖‖𝐏ⱼ𝐱ⱼ‖) = ‖𝐏ⱼ𝐱ⱼ‖/‖𝐱ⱼ‖ = [1−𝑅𝑆ⱼ/‖𝐱ⱼ‖²]^{1/2} = [1−1/𝜅ⱼ²]^{1/2} (3.7)

for {𝑗=0,1,…,𝑘}, to give conclusion (i), and conclusion (ii) by duality. Conclusion (iii) follows on specializing (𝐱₀,𝐏₀𝐱₀) with 𝐱₀=𝟏𝑛 and 𝐏₀=𝐗(𝐗′𝐗)⁻¹𝐗′, to complete our proof.

Remark 3.4. The foregoing developments specialize from Section 3.1 in that the partition [𝐱ⱼ,𝐗₍ⱼ₎] always has 𝑟=1 and 𝑠=𝑘, giving a single angle 𝜙ⱼ. Rules-of-thumb in common use for problematic VIFs include those exceeding 10, as in [34], or even 4 as in [35], for example. In angular measure, these correspond respectively to 𝜙ⱼ<18.435 deg and 𝜙ⱼ<30.0 deg.
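
The VIF-to-angle correspondence of Theorem 3.3(i) is a one-line conversion; a sketch assuming NumPy, which reproduces the two thresholds just cited:

```python
import numpy as np

def vif_to_angle_deg(vif):
    """Angle (deg) between a column and the span of the remaining columns, from its VIF."""
    return np.degrees(np.arccos(np.sqrt(1.0 - 1.0 / vif)))
```

Here vif_to_angle_deg(10) returns 18.435 deg and vif_to_angle_deg(4) returns exactly 30 deg, since arccos[(3/4)^{1/2}]=30 deg.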

3.3. Case Study 1

Consider the model M₀: {𝑌ᵢ=𝛽₀+𝛽₁𝑋ᵢ₁+𝛽₂𝑋ᵢ₂+𝜖ᵢ}, the design 𝐗₀=[𝟏₅,𝐗₁,𝐗₂] of order (5×3), and 𝐗₀′𝐗₀ and its inverse as in

𝐗₀ = [1 1 −1; 1 0.5 1; 1 0.5 1; 1 1 0; 1 0 0], 𝐗₀′𝐗₀ = [5 3 1; 3 2.5 0; 1 0 3], (𝐗₀′𝐗₀)⁻¹ = [0.9375 −1.1250 −0.3125; −1.1250 1.7500 0.3750; −0.3125 0.3750 0.4375]. (3.8)

Note first that VIF𝑢(𝛽̂₀)=𝜅₀²=0.9375×5=4.6875. Next apply first principles to find

𝜹²(𝟏₅∣[𝐗₁,𝐗₂]) = (5)⁻¹[3,1][2.5 0; 0 3]⁻¹[3,1]′ = 0.786666,
𝝆²(𝛽̂₀∣[𝛽̂₁,𝛽̂₂]) = (0.9375)⁻¹[−1.1250,−0.3125][1.7500 0.3750; 0.3750 0.4375]⁻¹[−1.1250,−0.3125]′ = 0.786666, (3.9)

both equal to 1−(1/𝜅₀²) as in Theorem 3.3(ii). The remaining VIF𝑢s are found directly as VIF𝑢(𝛽̂₁)=1.7500×2.5=4.3750 and VIF𝑢(𝛽̂₂)=0.4375×3=1.3125. Using duality and earlier findings, we further compute

𝜹²(𝐗₁∣[𝟏₅,𝐗₂]) = 𝝆²(𝛽̂₁∣[𝛽̂₀,𝛽̂₂]) = 1−1/𝜅₁² = 1−1/4.3750 = 0.771429,
𝜹²(𝐗₂∣[𝟏₅,𝐗₁]) = 𝝆²(𝛽̂₂∣[𝛽̂₀,𝛽̂₁]) = 1−1/𝜅₂² = 1−1/1.3125 = 0.238095, (3.10)

thereby preempting the need to undertake the singular decompositions as required heretofore.
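
The computations of Case Study 1 can be replayed from (3.8); a sketch assuming NumPy, with the sign pattern of 𝐗₂ being the unique one consistent with the printed moment matrix 𝐗₀′𝐗₀:

```python
import numpy as np

# Case Study 1: X0 = [1_5, X1, X2], reconstructed to match X0'X0 of (3.8)
X0 = np.array([[1.0, 1.0, -1.0],
               [1.0, 0.5,  1.0],
               [1.0, 0.5,  1.0],
               [1.0, 1.0,  0.0],
               [1.0, 0.0,  0.0]])
W = X0.T @ X0                    # [[5, 3, 1], [3, 2.5, 0], [1, 0, 3]]
V = np.linalg.inv(W)
vifs = np.diag(V) * np.diag(W)   # uncentered VIFs kappa_j^2: [4.6875, 4.3750, 1.3125]
rho2 = 1.0 - 1.0 / vifs          # squared correlations, by duality and Theorem 3.3(ii)
```

The closing values 0.786666, 0.771429, and 0.238095 of (3.9)-(3.10) emerge without any singular decomposition, as the text observes.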

4. Determinant Identities

4.1. Background

The generalized variance, as a design criterion for {𝐘=𝐗𝜷+𝝐}, rests in part on the geometry of ellipsoids of the type

𝑅(𝜷̂) = {𝜷∈ℝᵏ: (𝜷̂−𝜷)′𝐗′𝐗(𝜷̂−𝜷) ≤ 𝑐²}. (4.1)

Choices for 𝑐² in common usage give first (i) a confidence region for 𝜷, whose normal-theory confidence coefficient is 1−𝛼 on taking 𝑐²=𝑆²𝑐𝛼², with 𝑆² as the residual mean square and 𝑐𝛼² the 100(1−𝛼) percentage point of 𝐹(⋅;𝑘,𝑛−𝑘); and otherwise admitting a lower Chebychev bound as in [36, page 92]. The alternative choice 𝑐²=𝑘+2 gives (ii) Cramér’s [37] ellipsoid of concentration for 𝜷̂, that is, the measure uniform over 𝑅(𝜷̂) having the same mean and dispersion matrix as 𝜷̂. The generalized variance 𝐺𝑉(𝜷̂)=|𝑉(𝜷̂)| is proportional to the squared volumes of these ellipsoids, smaller volumes reflecting tighter concentrations.

4.2. Factorizations

To continue, let some 𝐓(𝐘)=𝜽̂∈ℝᵏ be random having 𝐸(𝜽̂)=𝜽 and 𝑉(𝜽̂)=𝚺∈𝕊⁺𝑘; partition 𝜽=[𝜽₁′,𝜽₂′]′ and 𝚺=[𝚺ᵢⱼ] conformably, with 𝜽₁∈ℝʳ and 𝜽₂∈ℝˢ such that 𝑟≤𝑠 and 𝑟+𝑠=𝑘; and let 𝐺(𝜽̂)=|𝚺|^{1/𝑘}. The canonical correlations [30], as singular values of 𝚺₁₁^{−1/2}𝚺₁₂𝚺₂₂^{−1/2}, are now to be designated as 𝝆(1∣2)=[𝜌₁,…,𝜌𝑟], in lieu of 𝝆(𝜷̂₁∣𝜷̂₂), and to be ordered as {𝜌₁≥𝜌₂≥⋯≥𝜌𝑟≥0}. Moreover, the quantity 𝜸(1∣2)=∏ʳᵢ₌₁(1−𝜌ᵢ²) is the Vector Alienation Coefficient of Hotelling [30]. The factorization |𝚺|=|𝚺₁₁||𝚺₂₂| for 𝚺=Diag(𝚺₁₁,𝚺₂₂) extends directly as an upper bound for any 𝚺=[𝚺ᵢⱼ], with further ramifications as follows.

Theorem 4.1. Consider 𝜽̂=[𝜽̂₁′,𝜽̂₂′]′∈ℝᵏ having 𝐸(𝜽̂)=[𝜽₁′,𝜽₂′]′ and 𝑉(𝜽̂)=[𝚺ᵢⱼ], such that 𝜽₁∈ℝʳ and 𝜽₂∈ℝˢ with 𝑟≤𝑠 and 𝑟+𝑠=𝑘.
(i) The determinant of 𝚺=[𝚺ᵢⱼ] admits the factorization

|𝚺| = |𝚺₁₁||𝚺₂₂|𝜸(1∣2) (4.2)

such that |𝚺|≤|𝚺₁₁||𝚺₂₂| and 𝜸(1∣2)=∏ʳᵢ₌₁(1−𝜌ᵢ²)≤1. (ii) If 𝚺=Diag(𝚺₁₁,𝚺₂₂), then 𝐺(𝜽̂) is the geometric mean 𝐺(𝜽̂)=[𝐺(𝜽̂₁)]^{𝑟/𝑘}[𝐺(𝜽̂₂)]^{𝑠/𝑘} of the quantities 𝐺(𝜽̂₁) and 𝐺(𝜽̂₂). (iii) Generally, for any 𝚺, the quantity 𝐺(𝜽̂) becomes

𝐺(𝜽̂) = [𝐺(𝜽̂₁)]^{𝑟/𝑘}[𝐺(𝜽̂₂)]^{𝑠/𝑘}[𝜸(1∣2)]^{1/𝑘} (4.3)

in terms of {𝐺(𝜽̂₁),𝐺(𝜽̂₂),𝜸(1∣2)}. (iv) If 𝜽̂=[𝜽̂₁′,𝜽̂₂′,𝜽̂₃′]′ and 𝚺=[𝚺ᵢⱼ; 1≤𝑖,𝑗≤3] are partitioned conformably, with 𝜽₁∈ℝʳ, 𝜽₂∈ℝˢ, and 𝜽₃∈ℝᵗ, such that 𝑟+𝑠+𝑡=𝑘, then 𝐺(𝜽̂) admits the factorization

𝐺(𝜽̂) = [𝐺(𝜽̂₁)]^{𝑟/𝑘}[𝐺(𝜽̂₂)]^{𝑠/𝑘}[𝐺(𝜽̂₃)]^{𝑡/𝑘}[𝜸(1∣23)𝜸(2∣3)]^{1/𝑘}, (4.4)

with 𝜸(1∣23) and 𝜸(2∣3) as the Vector Alienation Coefficients between {𝜽̂₁,[𝜽̂₂′,𝜽̂₃′]′} and between {𝜽̂₂,𝜽̂₃}, respectively.

Proof. As in Section 3.1 with 𝐑=𝚺₁₁^{−1/2}𝚺₁₂𝚺₂₂^{−1/2}=𝐏𝐃𝐐′ and 𝐃=[𝐃𝜌,𝟎], we have

[𝚺₁₁  𝚺₁₂; 𝚺₂₁  𝚺₂₂] = [𝐖₁  𝟎; 𝟎  𝐖₂][𝐈𝑟  𝐃; 𝐃′  𝐈𝑠][𝐖₁′  𝟎; 𝟎  𝐖₂′] (4.5)

with 𝐖₁=𝚺₁₁^{1/2}𝐏 and 𝐖₂=𝚺₂₂^{1/2}𝐐. The middle factor on the right has determinant |𝐈𝑟−𝐃𝐃′|=∏ʳᵢ₌₁(1−𝜌ᵢ²) from the clockwise rule, so that |𝚺|=|𝚺₁₁||𝚺₂₂|∏ʳᵢ₌₁(1−𝜌ᵢ²), to give conclusion (i). Conclusion (ii) follows directly from 𝐺(𝜽̂)=[𝐺𝑉(𝜽̂)]^{1/𝑘}, and conclusion (iii) on combining (i) and (ii). Conclusion (iv) now follows on applying (iii) twice, first on partitioning 𝜽̂ into {𝜽̂₁,[𝜽̂₂′,𝜽̂₃′]′}, whose canonical correlations are 𝝆(1∣23), then [𝜽̂₂′,𝜽̂₃′]′ into {𝜽̂₂,𝜽̂₃} having canonical correlations 𝝆(2∣3), to complete our proof.
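
Conclusion (i) is easily verified numerically; a sketch assuming NumPy, on a randomly generated positive definite 𝚺 with 𝑟=2 and 𝑠=4:

```python
import numpy as np

def alienation(S, r):
    """Vector Alienation Coefficient between the first r and remaining coordinates."""
    def inv_sqrt(A):
        w, U = np.linalg.eigh(A)
        return U @ np.diag(w ** -0.5) @ U.T
    rho = np.linalg.svd(inv_sqrt(S[:r, :r]) @ S[:r, r:] @ inv_sqrt(S[r:, r:]),
                        compute_uv=False)
    return np.prod(1.0 - rho ** 2)

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 6))
S = A @ A.T + 6.0 * np.eye(6)          # a positive definite Sigma, k = 6
r = 2
lhs = np.linalg.det(S)                 # |Sigma|
rhs = np.linalg.det(S[:r, :r]) * np.linalg.det(S[r:, r:]) * alienation(S, r)
```

Here lhs and rhs agree, confirming |𝚺|=|𝚺₁₁||𝚺₂₂|𝜸(1∣2); the upper bound |𝚺|≤|𝚺₁₁||𝚺₂₂| follows since 𝜸(1∣2)≤1.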

Remark 4.2. In short, Theorem 4.1 links determinants and principal subdeterminants precisely through angles between subspaces. Moreover, arguments leading to conclusion (iv) may be iterated recursively to achieve a hierarchical decomposition for four or more factors, as in the following with 𝑘=𝑟+𝑠+𝑡+𝑣, namely,

𝐺(𝜽̂) = [𝐺(𝜽̂₁)]^{𝑟/𝑘}[𝐺(𝜽̂₂)]^{𝑠/𝑘}[𝐺(𝜽̂₃)]^{𝑡/𝑘}[𝐺(𝜽̂₄)]^{𝑣/𝑘}[𝜸(1∣234)𝜸(2∣34)𝜸(3∣4)]^{1/𝑘}. (4.6)

Remark 4.3. Hotelling’s [30] Vector Alienation Coefficient 𝜸(1∣2)=∏ʳᵢ₌₁(1−𝜌ᵢ²) is a composite index of linkage between {𝒮𝑝(𝜷̂₁),𝒮𝑝(𝜷̂₂)} as subspaces of (ℝʳ⊕ℝˢ,(⋅,⋅)_𝚺), decreasing in each {𝜌ᵢ²; 1≤𝑖≤𝑟}. Equivalently, duality asserts that 𝜸(1∣2)=∏ʳᵢ₌₁(1−𝛿ᵢ²) is the identical composite index of linkage between {𝒮𝑝(𝐗₁),𝒮𝑝(𝐗₂)} as subspaces of ℝⁿ.

Theorem 4.1 anticipates that 𝐷𝑠-inefficient subset estimators may be masked in a design exhibiting good overall 𝐷-efficiency. Conversely, a 𝐷𝑠-inefficient subset may contraindicate, incorrectly, the overall 𝐷-efficiency of a design. Details are provided in case studies to follow.

5. Case Studies

5.1. The Setting

Our tools are informative in input-output studies. In particular, specify {𝐘=𝐗₀𝜷+𝝐} as a second-order model 𝑌(𝑥₁,𝑥₂,𝑥₃) in three regressors and 𝑝=10 parameters, namely,

𝑌ᵢ = 𝛽₀+𝛽₁𝑥ᵢ₁+𝛽₂𝑥ᵢ₂+𝛽₃𝑥ᵢ₃+𝛽₁₁𝑥ᵢ₁²+𝛽₂₂𝑥ᵢ₂²+𝛽₃₃𝑥ᵢ₃²+𝛽₁₂𝑥ᵢ₁𝑥ᵢ₂+𝛽₁₃𝑥ᵢ₁𝑥ᵢ₃+𝛽₂₃𝑥ᵢ₂𝑥ᵢ₃+𝜖ᵢ; 𝑖=1,2,…,𝑛. (5.1)

Next partition 𝜷=[𝛽₀,𝜷𝐿′,𝜷𝑄′,𝜷𝐼′]′ with 𝜷𝐿=[𝛽₁,𝛽₂,𝛽₃]′ as slopes; 𝜷𝑄=[𝛽₁₁,𝛽₂₂,𝛽₃₃]′ as pure quadratic terms reflecting diminishing (−) or increasing (+) returns to inputs; and 𝜷𝐼=[𝛽₁₂,𝛽₁₃,𝛽₂₃]′ as interactive terms reflecting synergistic (+) or antagonistic (−) effects for pairs of regressors in combination. Further let 𝜷𝑀=[𝜷𝐿′,𝜷𝑄′,𝜷𝐼′]′ exclusive of 𝛽₀, the latter a baseline for 𝑌(0,0,0). We proceed under conventional homogeneous and uncorrelated errors, the minimizing solution 𝜷̂=(𝐗₀′𝐗₀)⁻¹𝐗₀′𝐘 being unbiased with 𝑉(𝜷̂)=𝜎²(𝐗₀′𝐗₀)⁻¹. We take 𝜎², although unknown, to reflect natural variability in experimental materials and protocol, and thus applicable in a given setting independently of the choice of design. Accordingly, for present purposes we may standardize to 𝜎²=1.0 for reasons cited earlier.

5.2. The Designs

Early polynomial response designs made use of factorial experiments, setting levels as needed to meet the required degree. For example, the second-order model (5.1) in three regressors would require 3³=27 runs. However, in the early 1950s such designs were seen to be excessive, carrying redundant interactions beyond the pairs required in the model (5.1). In industrial and other settings where parsimony is desired, several small second-order designs have evolved, often on appending a few additional runs to two-level factorials or fractions thereof.

Eight such small designs of note here are the hybrids (H310, H311B) of [38], the small composite design SCD [39], the BBD [40], the central composite rotatable design CCD [41], and the designs ND [42], HD [43], and BDD [44]. The designs [H310, H311B, SCD, BBD, CCD, ND, HD, BDD] have numbers of runs [11, 11, 11, 13, 15, 11, 11, 11], respectively. These follow on adding a center run to all but design ND, rendering all as unsaturated, having at least one degree of freedom for error. Specifically, the design ND of [42] already has 11 runs and is unsaturated. All designs have been scaled to span the same range for each regressor, and none strictly dominates another under the positive definite dispersion ordering. All determinants as listed derive from the respective 𝑉(𝜷̂)=(𝐗₀′𝐗₀)⁻¹ and its submatrices. Subset efficiencies for {𝜷𝐿,𝜷𝑄,𝜷𝐼} were examined in [45] for selected designs using criteria other than 𝐷- and 𝐷𝑠-efficiencies. Our usage here, as elsewhere in the literature, considers 𝐺𝑉(𝜷̂) and 𝐺(𝜷̂) to be efficiency indices for 𝜷̂ specific to a particular design, to include subsets {𝜷̂ᵢ; 𝑖∈𝖨}, where smaller values reflect greater efficiencies through smaller volumes of concentration ellipsoids. On the other hand, the comparative efficiencies of two designs for estimating 𝜷 or {𝜷ᵢ; 𝑖∈𝖨} are found as ratios of these quantities.

5.3. Numerical Studies

Details for these designs are listed in the accompanying tables. Table 1 gives values 𝐺(⋅)=[𝐺𝑉(⋅)]^{1/dim} for 𝜷̂ and selected subsets, with dim as the order of the determinant. Also listed are the angles 𝜙(𝟏𝑛∣𝐗) in deg between the regressors and the constant, to be noted subsequently. Table 2 displays the squared canonical correlations 𝝆²(𝜷̂ᵢ∣𝜷̂ⱼ) between designated subsets, and Table 3 the corresponding Vector Alienation Coefficients 𝜸(𝜷̂ᵢ∣𝜷̂ⱼ)=∏ᵢ(1−𝜌ᵢ²) for specified pairs. Here {0L,QI} refers to the pair {[𝛽̂₀,𝜷̂𝐿′]′,[𝜷̂𝑄′,𝜷̂𝐼′]′}, for example. Moreover, values of the composite indices 𝜸(𝜷̂ᵢ∣𝜷̂ⱼ)=𝜸(𝐗ᵢ∣𝐗ⱼ), if much less than unity, serve to alert the user as to potential problems with ill-conditioning.

5.3.1. An Overview

To fix ideas, observe for the CCD that 𝐺(𝜷̂𝑄,𝜷̂𝐼)=1.13633, 𝐺(𝜷̂𝑄)=1.03300, and 𝐺(𝜷̂𝐼)=1.25000 from Table 1. These not only are comparable in magnitude, but are commensurate, having been adjusted for dimensions and thus homogeneous of unit degree, as are all entries in Table 1. Moreover, since (𝜷̂𝑄,𝜷̂𝐼) are uncorrelated and 𝜸(𝑄∣𝐼)=1.0 for the CCD from Table 3, 𝐺(𝜷̂𝑄,𝜷̂𝐼) is the geometric mean 1.13633=(1.03300)^{3/6}(1.25000)^{3/6} from Theorem 4.1(ii). A further rough spot check of Table 1 may be summarized as follows.

Summary Properties
(P1) Compared with 𝐺(𝜷̂), values for 𝐺(𝛽̂₀) appear excessive throughout.
(P2) Values for 𝐺(𝜷̂𝐿) are roughly comparable across designs.
(P3) The eight designs sort essentially into two groups.
(P4) Designs {H310, H311B, SCD, BBD, CCD} overall are comparatively 𝐷- and 𝐷𝑠-efficient, the noted exception being 𝐺(𝜷̂𝐼)=4.16667 for the SCD.
(P5) The designs {ND, HD, BDD} are considerably less 𝐷-efficient, with their generalized variances 𝐺𝑉(𝜷̂) being {1192.09, 4768.37, 2886.03}, respectively, in comparison with {57.342, 11.852, 74.422, 2.722, 0.523} for the remaining designs; and each of the former is burdened by unequivocal 𝐷𝑠-inefficiency for 𝜷𝑄, to be treated subsequently.

5.3.2. Further Details

We next examine Hartley’s [39] SCD in some detail, first in terms of generalized variances. Values for 𝐺𝑉(𝜷̂), 𝐺𝑉(𝛽̂₀), and 𝐺𝑉(𝜷̂𝑀) appear in the first row of Table 4, along with 𝜸(0∣𝑀)=(1.0−0.909090)=0.090909 using 𝝆²(0∣𝑀)=0.909090 from Table 2. Theorem 4.1(i) now asserts that 𝐺𝑉(𝜷̂)=𝐺𝑉(𝛽̂₀)𝐺𝑉(𝜷̂𝑀)𝜸(0∣𝑀), as verified numerically through 74.4216=10.00×(81.8638)×(0.090909). In a similar manner, 𝜷𝑀 partitions into {𝜷𝐿,[𝜷𝑄′,𝜷𝐼′]′}, where 𝐺𝑉(𝜷̂𝐿)=4.6296 and 𝐺𝑉([𝜷̂𝑄′,𝜷̂𝐼′]′)=81.8638 from Table 4. The squared canonical correlations between {𝜷̂𝐿,[𝜷̂𝑄′,𝜷̂𝐼′]′} are 𝝆²(𝐿∣𝑄𝐼)=[0.4000,0.4000,0.4000] from Table 2, so that 𝜸(𝐿∣𝑄𝐼)=0.21600 as in Table 3. Theorem 4.1(i) again recovers 𝐺𝑉(𝜷̂𝑀) as 81.8638=4.6296×(81.8638)×(0.21600), since 𝐺𝑉(𝜷̂𝐿) and 𝜸(𝐿∣𝑄𝐼) are reciprocals in this instance. Moreover, 𝐺𝑉(𝜷̂𝑄,𝜷̂𝐼)=𝐺𝑉(𝜷̂𝑄)𝐺𝑉(𝜷̂𝐼)𝜸(𝑄∣𝐼) translates into 81.8638=1.1317×(72.3379)×(1.0), where 𝜸(𝑄∣𝐼)=1.0 since elements of {𝜷̂𝑄,𝜷̂𝐼} are mutually uncorrelated from Table 2. In summary, the value 𝐺(𝜷̂) for the SCD admits the factorization of (4.6), on identifying {𝜽₁,𝜽₂,𝜽₃,𝜽₄} with {𝛽̂₀,𝜷̂𝐿,𝜷̂𝑄,𝜷̂𝐼}, respectively, given numerically from Tables 1 and 3 as

1.53876 = (10.00)^{1/10}(1.66667)^{3/10}(1.04210)^{3/10}(4.16667)^{3/10}[(0.090909)(0.21600)(1.0)]^{1/10}. (5.2)
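
The arithmetic of this paragraph, including the factorization (5.2), can be checked directly from the tabled values; a sketch in plain Python, with variable names ours:

```python
# Tabled SCD quantities: GV-values (Table 4), G-values (Table 1), and the
# alienation coefficients gamma(0|LQI), gamma(L|QI), gamma(Q|I) (Table 3).
GV_all, GV_0, GV_M = 74.4216, 10.00, 81.8638
gamma_0M = 1.0 - 0.909090                    # from rho^2(0|M) in Table 2
G0, GL, GQ, GI = 10.00, 1.66667, 1.04210, 4.16667
g1, g2, g3 = 0.090909, 0.21600, 1.0

check_4_1 = GV_0 * GV_M * gamma_0M           # Theorem 4.1(i): recovers GV_all
G_beta = (G0 ** (1 / 10) * GL ** (3 / 10) * GQ ** (3 / 10) * GI ** (3 / 10)
          * (g1 * g2 * g3) ** (1 / 10))      # the factorization (5.2)
```

Both reproduce the printed values 74.4216 and 1.53876 to rounding accuracy.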

Corresponding factorizations proceed similarly for other designs. Details are left to the reader, but values for [𝜸(0∣𝐿𝑄𝐼)𝜸(𝐿∣𝑄𝐼)𝜸(𝑄∣𝐼)]^{1/10}, the rightmost factor of (5.2), are supplied for each design as the final row of Table 3. Although the tables, together with Theorem 4.1, support other factorizations, the one featured here seems most natural in terms of the parameters {𝛽₀,𝜷𝐿,𝜷𝑄,𝜷𝐼}, together with their central roles in identifying noteworthy treatment effects in second-order models.

5.4. Masking

The 𝐷-efficiency index of the SCD, at 𝐺𝑉(𝜷̂)=74.4216, is larger than but roughly comparable to that of H310 at 𝐺𝑉(𝜷̂)=57.3418. What cannot be anticipated from these facts alone, however, is that the (3×3) determinant 𝐺𝑉(𝜷̂𝐼)=72.3379 for the SCD is comparable to its (10×10) determinant 𝐺𝑉(𝜷̂)=74.4216, despite their disparate dimensions. Adjusting for dimensions gives 𝐺(𝜷̂)=(74.4216)^{1/10}=1.53876 and 𝐺(𝜷̂𝐼)=(72.3379)^{1/3}=4.16667 for the SCD. This illustrates the masking of a remarkably inefficient estimator for 𝜷𝐼, despite the value 𝐺(𝜷̂)=1.53876 in estimating all parameters. This masking stems from the nonorthogonality of subset estimators as reflected in their canonical correlations and Vector Alienation Coefficients. In contrast are the corresponding commensurate values for the H310 design, namely, 𝐺(𝜷̂)=(57.3418)^{1/10}=1.49916 and 𝐺(𝜷̂𝐼)=(4.1768)^{1/3}=1.61045. It may be noted that the condition number 𝑐₁(𝐗₀′𝐗₀) is 21.59 for H310, with the somewhat larger value 54.01 for the SCD.

We next examine the 𝐷𝑠-inefficiencies of ND and HD for 𝜷𝑄 as noted earlier, with 𝐺(𝜷̂𝑄) taking values 7.80031 and 6.61313, respectively. Our reference for masking is 𝐺(𝜷̂𝐿,𝜷̂𝑄). These values are not listed in Table 1, but may be recovered from Tables 2 and 3 as follows. Specifically, for ND we have

[𝐺𝑉(𝜷̂𝐿)𝐺𝑉(𝜷̂𝑄)𝜸(𝐿∣𝑄)]^{1/6} = [𝐺𝑉(𝜷̂𝐿,𝜷̂𝑄)]^{1/6} = 𝐺(𝜷̂𝐿,𝜷̂𝑄), that is, (1.25000)^{3/6}(7.80031)^{3/6}(0.72428)^{1/6} = 2.95912, (5.3)

where neither 𝐺(𝜷̂𝐿)=1.25000 nor 𝐺(𝜷̂𝐿,𝜷̂𝑄)=2.95912 appears excessive. In consequence, that 𝐺(𝜷̂𝑄)=7.80031 is excessive would be masked on examining 𝐺(𝜷̂𝐿) and 𝐺(𝜷̂𝐿,𝜷̂𝑄) only. Parallel steps for HD give the factorization (1.91189)^{3/6}(6.61313)^{3/6}(0.78514)^{1/6}=3.41528, with similar conclusions in regard to masking.
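
The recovery in (5.3) is a one-line computation from Tables 1-3; a sketch in plain Python, with the function name ours:

```python
def g_linear_quadratic(G_L, G_Q, gamma_LQ):
    """G(beta_L, beta_Q) recovered from G(beta_L), G(beta_Q), and gamma(L|Q), per (5.3)."""
    return G_L ** (3 / 6) * G_Q ** (3 / 6) * gamma_LQ ** (1 / 6)

nd = g_linear_quadratic(1.25000, 7.80031, 0.72428)   # ND design: 2.95912
hd = g_linear_quadratic(1.91189, 6.61313, 0.78514)   # HD design: 3.41528
```

The exponents 3/6 on the G-values arise since 𝐺𝑉=𝐺³ for these (3×3) subsets, so that the sixth root of the product in (5.3) splits accordingly.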

5.5. Collinearity with the Constant

Advocates for these and other small designs have focused on 𝐷, 𝐷𝑠, and other efficiency criteria, as well as the parsimony of small designs and their advantage in industrial experiments. To the knowledge of this observer, none has considered prospects for ill-conditioning and its consequences, despite the fact that the columns of 𝐗 are necessarily interlinked, second-order columns deriving from first-order columns. Nonetheless, from Section 3.1 and Corollary 3.2, we may compute angles between the constant vector and the span of the regressors using duality together with the information at hand. This may prove to be critical in view of the admonition [29] that “collinearity with the intercept can quite generally corrupt the estimates of all parameters in the model.” As noted in Remark 3.4, rules-of-thumb for problematic VIFs include those exceeding 10 or 4 or, in angular measure, 𝜙<18.435 deg and 𝜙<30.00 deg. From the last row of Table 2, the angles 𝜙(𝟏𝑛∣𝐗) have been computed for each of the eight designs, as listed in the final column of Table 1. For example, arccos[(0.81990)^{1/2}]=25.1116 deg for H310, the square root entering since Table 2 lists squared correlations. It is seen that all designs are flagged as potentially problematic using the rules-of-thumb as cited. This adds yet another layer of concerns, heretofore unrecognized, in seeking further to implement these designs already in wide usage.

6. Conclusions

Duality of (i) Hotelling’s [30] canonical correlations {𝜌₁,…,𝜌𝑟} between the OLS estimators {𝜷̂₁,𝜷̂₂} and (ii) the design linkage parameters {𝛿₁,…,𝛿𝑟} between {𝐗₁,𝐗₂} is established at the outset. Stewart’s [33] collinearity indices are then extended to encompass the angles {𝜙₀,𝜙₁,…,𝜙𝑘} between each column of 𝐗₀=[𝟏𝑛,𝐗₁,…,𝐗𝑘] and the remaining columns. In particular, 𝜙₀ quantifies numerically the collinearity of the regressors with the intercept, of concern in the prospective corruption of all estimates due to ill-conditioning.

Matrix identities factorize a determinant in terms of principal subdeterminants and the Vector Alienation Coefficients of [30] between {𝜷̂₁,𝜷̂₂}. By duality, the latter also are Alienation Coefficients between {𝐗₁,𝐗₂}. These identities in turn are applied in the study of 𝐷𝑠-efficiencies for the parameters {𝛽₀,𝜷𝐿,𝜷𝑄,𝜷𝐼} in eight small second-order designs from the literature. Studies on 𝐷𝑠- and 𝐷-efficiencies, as cited in our opening paragraph, confirm that designs are seldom efficient for both. Our determinant identities support a rational explanation. In particular, these identities expose the propensity for 𝐷𝑠-inefficient subset estimators to be masked through near collinearities in overall 𝐷-efficient designs.

Finally, the evidence suggests that all eight designs are vulnerable, to varying degrees, to the corruption of all estimates due to ill-conditioning. In short, we have exposed quantitatively the structural origins of masking through Hotelling’s [30] canonical correlations, and their equivalent design linkage parameters. This analysis in turn proceeds from the design matrix itself rather than empirical estimates, so that any design can be evaluated beforehand with regard to masking and possible subset inefficiencies, rather than retrospectively after having committed to a given design in a particular experiment.