Abstract
The canonical correlations between subsets of OLS estimators are identified with design linkage parameters between their regressors. Known collinearity indices are extended to encompass angles between each regressor vector and the remaining vectors. One such angle quantifies the collinearity of regressors with the intercept, of concern in the corruption of all estimates due to ill-conditioning. Matrix identities factorize a determinant in terms of principal subdeterminants and the canonical Vector Alienation Coefficients between subset estimators, these being, by duality, the Alienation Coefficients between subsets of regressors. These identities figure in the study of D and Ds as determinant efficiencies for estimators and their subsets, specifically, Ds-efficiencies for the constant, linear, pure quadratic, and interactive coefficients in eight known small second-order designs. Studies on D- and Ds-efficiencies confirm that designs are seldom efficient for both. Determinant identities demonstrate the propensity for Ds-inefficient subsets to be masked through near collinearities in overall D-efficient designs.
1. Introduction
Given a model {Y = Xβ + ε} with X of full rank and homogeneous, uncorrelated errors, the OLS estimators β̂ = (X'X)^{-1}X'Y are unbiased with second-moment matrix V(β̂) = σ²(X'X)^{-1}. Such moment matrices pervade experimental design, to include determinants as gauges of D- and Ds-efficiencies for estimators and their subsets. Early references trace to [1–4], and more recently to [5–10] and others. Finding D-efficient designs for polynomial models is considered in [11–20], for example. Studies examining the Ds-efficiencies of D-efficient designs confirm that designs are seldom efficient for both; see [13, 21–23]. From those beginnings, the study of D- and Ds-efficiencies continues apace. To wit, a recent key-word search in the Current Index to Statistics shows in excess of 60 listings from 2006 to 2010, and more than 100 from 2001 to 2010. Moreover, these ideas bear fruit in a widening diversity of applications, as evidenced in the following.
To fix ideas, let the mean response correspond to a polynomial of specified degree in the regressors. In toxicology studies, a two-stage experiment is proffered in [24], seeking D-efficiency in estimating overall parameters at the first stage, then Ds-efficiency at the second stage in estimating a critical “threshold parameter,” using quasilikelihood in nonlinear models. Coupled with this is the Ds-efficiency for the remaining parameters at the second stage. In related work [25], experiments with chemicals in combination are examined along fixed-ratio rays. When restricted to a specified ray, the fundamental hypothesis of noninteracting factors can be rejected when higher-order polynomial terms are required in the total dose-response model for the linear predictor. Here Ds-efficiency refers to the critical estimation of those higher-order coefficients, which vanish under the conjectured additivity. Moreover, Ds in [11, 12, 21] refers to designated subsets of the polynomial coefficients, including the highest-order term, for example. In short, users often are properly concerned with both D- and Ds-efficiencies, and connections between these basic criteria deserve further study, to be undertaken here.
Ill-conditioning, as near-collinearity among the columns of X, “causes crucial elements of (X'X)^{-1} to be large and unstable,” “creating inflated variances,” and estimates that are “very sensitive to small changes in X,” having “degraded numerical accuracy;” see [26–28], for example. Diagnostics include the condition number c2(X), the ratio of largest to smallest singular values of X; and the Variance Inflation Factors {VIFj; 1 ≤ j ≤ p}, that is, ratios of actual variances to the “ideal” variances had the columns of X been orthogonal. In models with intercept, “collinearity with the intercept can quite generally corrupt the estimates of all parameters in the model whether or not the intercept is itself of interest and whether or not the data have been (mean) centered,” as noted in [29].
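Both diagnostics can be computed directly from a design matrix. The following NumPy sketch (function names are ours, for illustration) returns the condition number as a singular-value ratio and the uncentered VIFs as products of corresponding diagonals of X'X and its inverse:

```python
import numpy as np

def condition_number(X):
    """c2(X): ratio of largest to smallest singular value of X."""
    s = np.linalg.svd(X, compute_uv=False)
    return s[0] / s[-1]

def uncentered_vifs(X):
    """VIF_j = [(X'X)^{-1}]_jj * [X'X]_jj: actual variance over the
    'ideal' variance had the columns of X been orthogonal."""
    XtX = X.T @ X
    return np.diag(np.linalg.inv(XtX)) * np.diag(XtX)
```

For a design with orthogonal columns, every VIF is exactly 1, the orthogonal benchmark against which inflation is measured.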
To the foregoing list of ills from ill-conditioning, we add that not only are designs seldom efficient for both criteria, but Ds-inefficient estimators may be masked in overall D-efficient designs, and conversely. This masking may be quantified in terms of structural dependencies, specifically, through determinant identities linking D- and Ds-efficiencies to various gauges of nonorthogonality of the data. The latter include nonvanishing inner products between columns of regressors, Hotelling’s [30] canonical correlations among OLS solutions, and VIFs. An outline follows.
Section 2 contains supporting material. Details surrounding collinearity diagnostics are topics in Section 3, to include duality of angles between subspaces of the design and parameter spaces, and their connections to VIFs. Section 4 develops basic determinant identities and inequalities of independent interest. Section 5 revisits eight small second-order designs with regard to Ds-efficiencies in estimating the constant, linear, pure quadratic, and interactive coefficients, to include the masking of inefficient estimators. Though in wide usage, with no apparent accounting for collinearity, these designs are seen to exhibit varying degrees of collinearity of regressors with the constant. Since computations proceed from the design matrix itself, prospective designs can be evaluated in regard to the issues studied here before committing to an actual experiment. Section 6 concludes with a brief summary.
2. Preliminaries
2.1. Notation
Spaces of note include R^n as Euclidean n-space; R+^n as its positive orthant; F(n×p) as the real n×p matrices; S_n as the real symmetric n×n matrices; and S_n^+ as their positive definite varieties. The transpose, inverse, trace, and determinant of A are A', A^{-1}, tr(A), and |A|; and A^{1/2} is its spectral square root. Special arrays include the unit vector 1_n = [1, …, 1]'; the identity I_n of order n; the block-diagonal matrix Diag(A1, A2); the idempotent form B_n = I_n − n^{-1}1_n1_n'; and O(n) as the real orthogonal group of n×n matrices. For X ∈ F(n×p) of rank p ≤ n, designate a pseudoinverse as X†, its ordered singular values as {ξ1 ≥ ξ2 ≥ ⋯ ≥ ξp > 0}, and by S(X) ⊂ R^n the linear span of the columns of X. Its condition number is the ratio of largest to smallest singular values, specifically, c2(X) = ξ1/ξp.
The mean, dispersion matrix, and generalized variance for a random Y ∈ R^k are designated as E(Y), V(Y) = Σ, and GV(Y) = |Σ|, respectively. To account for dimension, consider |Σ|^{1/k} as a function homogeneous of unit degree. The class M0, comprising models with intercept and dispersion σ²I_n, is our principal focus. Unless stated otherwise, we take σ² = 1, since variance ratios are scale-invariant. A distinction is drawn between centered and uncentered VIFs, the former from columns of X centered to their means. The latter, designated as {VIFj; 1 ≤ j ≤ p}, are diagonal elements of (X'X)^{-1} divided by reciprocals of the corresponding diagonals of X'X itself. These are of subsequent interest. Special distributions on the positive half-line include the Snedecor-Fisher distribution F(·; ν1, ν2, λ) having degrees of freedom (ν1, ν2) and noncentrality λ.
3. Collinearity Diagnostics
Ill-conditioned models, burdened with difficulties as cited, trace to nonorthogonality among the columns of X. To examine aspects of near collinearity, we first establish duality between design linkage parameters among the columns of X, and collinearity among the OLS solutions as quantified by Hotelling's [30] canonical correlations.
3.1. Duality Results
Partition a generic X ∈ F(n×p) as X = [X1, X2], with X1 and X2 of orders n×r and n×s, respectively, having ranks r and s such that r ≤ s and r + s = p. Accordingly, write β = [β1', β2']', taking Y = X1β1 + X2β2 + ε, and denoting by S(X1) and S(X2) the subspaces of R^n spanned by the columns of X1 and X2. We seek a canonical form preserving these subspaces and the linkage between {S(X1), S(X2)}, a geometric concept independent of bases for representing S(X1) and S(X2). Accordingly, let U1 = X1G1 and U2 = X2G2, with G1 and G2 to be stipulated. The original model becomes {Y = U1θ1 + U2θ2 + ε} with θ1 = G1^{-1}β1 and θ2 = G2^{-1}β2, such that S(U1) = S(X1), S(U2) = S(X2), X1β1 = U1θ1, and X2β2 = U2θ2.
Following [31], cosines of angles between {S(X1), S(X2)} are found as singular values generated by orthonormal bases for the two subspaces, to be designated as design linkage parameters {δ1 ≥ δ2 ≥ ⋯ ≥ δr ≥ 0}. To these ends, stipulate G1 = (X1'X1)^{-1/2} and G2 = (X2'X2)^{-1/2}, so that X'X in partitioned form transitions into

    U'U = [[I_r, H], [H', I_s]].   (3.1)

Here H = U1'U2 = (X1'X1)^{-1/2}X1'X2(X2'X2)^{-1/2}; its singular decomposition is H = P Dδ Q', where {P, Q} are orthogonal; and elements of Dδ = Diag(δ1, …, δr) comprise the singular values of H. In particular, Dδ defines the design linkage angles {φi = arccos(δi); 1 ≤ i ≤ r} between {S(X1), S(X2)} as subspaces of R^n.
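The design linkage parameters can be computed without forming inverse square roots explicitly, since any orthonormal bases for the two column spans yield the same singular values. A minimal sketch, with QR factorizations supplying the bases (an implementation choice, not prescribed by the text):

```python
import numpy as np

def design_linkage(X1, X2):
    """Cosines of the principal angles between span(X1) and span(X2):
    singular values of U1'U2 for orthonormal bases U1, U2."""
    U1, _ = np.linalg.qr(X1)
    U2, _ = np.linalg.qr(X2)
    return np.linalg.svd(U1.T @ U2, compute_uv=False)
```

Orthogonal subspaces give all cosines zero; coincident subspaces give all cosines one.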
To continue, partition β̂ = [β̂1', β̂2']' conformably with X = [X1, X2]; designate their inner product space as (R^r ⊕ R^s, Cov), where ⊕ is the direct sum and Cov(·, ·) their inner product, as in Eaton [32, page 409]. Denote by {ρ1 ≥ ρ2 ≥ ⋯ ≥ ρr ≥ 0} Hotelling’s [30] canonical correlations. Then by Proposition 10.2 of [32], {ρi} are cosines of angles between {β̂1, β̂2} as subspaces of (R^r ⊕ R^s, Cov). As Hotelling's canonical correlations are invariant under affine transformations, the parameters may be redefined linearly, preserving subspaces, thus leaving the canonical correlations invariant. Retracing steps leading to the canonical design model embodied in (3.1), but now to preserve {β̂1, β̂2}, it thus suffices to begin with the canonical model {Y = U1θ1 + U2θ2 + ε}, where U'U is the rightmost matrix of (3.1).
We next establish connections between the design linkage parameters {δi} from (3.1) and the corresponding canonical correlations {ρi}, as derived eventually from (U'U)^{-1}. A critical duality result is encoded in the following.
Theorem 3.1. Consider the design linkage parameters {δ1, …, δr} between {S(X1), S(X2)} as subspaces of R^n, and Hotelling’s [30] canonical correlations {ρ1, …, ρr} between {β̂1, β̂2} as subspaces of (R^r ⊕ R^s, Cov). Then {δ1, …, δr} and {ρ1, …, ρr} coincide.
Proof. In view of invariance of {ρ1, …, ρr} under nonsingular linear transformations of β̂1 and of β̂2, canonical correlations between {β̂1, β̂2} proceed as in expression (3.1), but beginning instead on the left with V(θ̂) = (U'U)^{-1} in lieu of U'U. Specifically, with H = P Dδ Q', and using rules for block-partitioned inverses, we have

    (U'U)^{-1} = [[(I_r − HH')^{-1}, −(I_r − HH')^{-1}H], [−(I_s − H'H)^{-1}H', (I_s − H'H)^{-1}]],   (3.2)

where equality follows using the singular decomposition of H. The succeeding step utilizes the factors (I_r − HH')^{1/2} and (I_s − H'H)^{1/2}, taking the principal diagonal blocks of (3.2) into identities, and its off-diagonal block into −(I_r − HH')^{-1/2}H(I_s − H'H)^{1/2} = −P Dδ Q', since diagonal matrices commute. But the singular values of the standardized off-diagonal block are precisely the canonical correlations between {β̂1, β̂2}, to complete our proof.
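The duality of Theorem 3.1 lends itself to a direct numerical check. The sketch below, for a simulated design (our example, not one from the text), computes the design linkage parameters from orthonormal bases and the canonical correlations from the partitioned (X'X)^{-1}; the two sets coincide, as the theorem asserts:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))
X1, X2 = X[:, :2], X[:, 2:]

# Design linkage parameters: singular values of U1'U2.
U1, _ = np.linalg.qr(X1)
U2, _ = np.linalg.qr(X2)
delta = np.linalg.svd(U1.T @ U2, compute_uv=False)

# Canonical correlations between the OLS subsets via V = (X'X)^{-1}:
# squared correlations are eigenvalues of V11^{-1} V12 V22^{-1} V21.
V = np.linalg.inv(X.T @ X)
V11, V12, V22 = V[:2, :2], V[:2, 2:], V[2:, 2:]
M = np.linalg.solve(V11, V12) @ np.linalg.solve(V22, V12.T)
rho = np.sqrt(np.sort(np.linalg.eigvals(M).real)[::-1])
```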
For subsequent reference, designate Dδ = Diag(δ1, …, δr) and Dρ = Diag(ρ1, …, ρr). Moreover, the foregoing analysis applies for models in M0, on partitioning the columns with 1_n first and the regressors following. In short, we have the following equivalences.
Corollary 3.2.
(i) Consider the design linkage parameters {δ1, …, δr}, gauging collinearity between {S(X1), S(X2)} as subspaces of R^n, and the canonical correlations {ρ1, …, ρr} between {β̂1, β̂2}. Then angles between these pairs of subspaces correspond one-to-one, that is, arccos(δi) = arccos(ρi) for 1 ≤ i ≤ r.
(ii) For models in M0, the leading linkage parameter generates the angle between the regressor vectors and the constant vector. Equivalently, this is given by the leading canonical correlation from duality.
3.2. Collinearity Indices
Stewart [33] reexamined numerical aspects of ill-conditioning, to the following effects for X = [x1, …, xp]. Taking X† = (X'X)^{-1}X' as the pseudoinverse of note, and letting x†j' be its jth row, each collinearity index in the collection {κj = ‖xj‖·‖x†j‖; 1 ≤ j ≤ p} is constructed to be scale-invariant. Clearly ‖x†j‖² is found along the principal diagonal of (X'X)^{-1} = X†X†'. In addition, the conventional VIFs are squares of the collinearity indices, that is, VIFj = κj² for 1 ≤ j ≤ p. In particular, since x1 = 1_n in M0, we have κ1² = VIF(β̂0).
Transcending Stewart’s analysis, we connect his collinearity indices to angles between subspaces as follows. Choose a typical xj in X; rearrange X as [xj, X(j)] and β̂ similarly; and seek elements of the moment matrix and its inverse as reordered by each such permutation. From the clockwise rule, the (1,1) element of each inverse is (xj'xj − xj'Pjxj)^{-1}, where Pj = X(j)(X(j)'X(j))^{-1}X(j)' is the projection operator onto the subspace S(X(j)) ⊂ R^n. These relationships in turn enable us to connect {κj} to the geometry of ill-conditioning as follows.
Theorem 3.3. For models in M0, let {VIFj = κj²; 1 ≤ j ≤ p} be conventional VIFs in terms of Stewart’s collinearity indices. These in turn quantify collinearities between subspaces through angles (in degrees) as follows.
(i) Angles between {xj, S(X(j))} are given by θj = arcsin(1/κj), in succession for j = 1, 2, …, p.
(ii) Equivalently, θj = arccos[(1 − 1/κj²)^{1/2}].
(iii) In particular, θ1 quantifies the degree of collinearity between the regressor vectors and the constant vector.
Proof. From the geometry of the right triangle formed by {xj, Pjxj}, the squared lengths satisfy ‖xj‖² = ‖Pjxj‖² + Sj², where Sj² = ‖xj − Pjxj‖² is the residual sum of squares from the projection. Since κj² = ‖xj‖²/Sj², the principal angle between {xj, Pjxj} is given by θj = arcsin(Sj/‖xj‖) = arcsin(1/κj) for j = 1, 2, …, p, to give conclusion (i), and conclusion (ii) from the same triangle. Conclusion (iii) follows on specializing with x1 = 1_n and X(1) as the matrix of regressors, to complete our proof.
Remark 3.4. The foregoing developments specialize from Section 3.1 in that the partition [xj, X(j)] always has r = 1 and s = p − 1, giving a single angle θj. Rules-of-thumb in common use for problematic VIFs include those exceeding 10, as in [34], or even 4, as in [35], for example. In angular measure, these correspond respectively to θj < 18.435° and θj < 30.0°.
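In code, the correspondence between VIF thresholds and angles is a one-line conversion via sin θ = 1/κ = VIF^{-1/2} from Theorem 3.3(i) (the helper name is ours, for illustration):

```python
import numpy as np

def vif_to_angle_deg(vif):
    """Angle (degrees) between x_j and the span of the remaining columns,
    from sin(theta) = 1/kappa_j with kappa_j^2 = VIF_j."""
    return np.degrees(np.arcsin(1.0 / np.sqrt(vif)))
```

A VIF of 4 corresponds to a 30-degree angle exactly; a VIF of 10 to about 18.4 degrees.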
3.3. Case Study 1
Consider a model in M0, its design of the order noted, and X'X and its inverse as displayed. Note first the condition number c2(X). Next apply first principles to find the angle and its VIF counterpart, both equal as in Theorem 3.3(ii). The remaining VIFs are found directly in like manner. Using duality and earlier findings, we further compute the canonical correlations, thereby preempting the need to undertake singular decompositions as required heretofore.
4. Determinant Identities
4.1. Background
The generalized variance, as a design criterion, rests in part on the geometry of ellipsoids of the type

    E(c) = {β ∈ R^p : (β − β̂)'(X'X)(β − β̂) ≤ c²}.

Choices for c² in common usage give first (i) a confidence region for β, whose normal-theory confidence coefficient is 1 − α on taking c² proportional to the residual mean square and the upper percentage point of the F distribution having degrees of freedom (p, n − p); and otherwise admitting a lower Chebychev bound as in [36, page 92]. The alternative choice c² = p + 2 gives (ii) Cramér’s [37] ellipsoid of concentration for β̂, that is, the measure uniform over the ellipsoid having the same mean and dispersion matrix as β̂. The generalized variance is proportional to the squared volumes of these ellipsoids, smaller volumes reflecting tighter concentrations.
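For intuition on why the determinant gauges concentration: the volume of {b ∈ R^p : b'Ab ≤ c²} is (πc²)^{p/2}/Γ(p/2 + 1) times |A|^{-1/2}, so squared volume is proportional to |A|^{-1} = |A^{-1}|. A short sketch (helper name ours, for illustration):

```python
import math
import numpy as np

def ellipsoid_volume(A, c2):
    """Volume of {b in R^p : b'Ab <= c2} for positive definite A:
    unit-ball volume times c2^(p/2), scaled by det(A)^(-1/2)."""
    p = A.shape[0]
    ball = math.pi ** (p / 2) / math.gamma(p / 2 + 1)
    return ball * c2 ** (p / 2) / math.sqrt(np.linalg.det(A))
```

Quadrupling A (halving each semi-axis) divides the planar volume by four, mirroring the determinant scaling.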
4.2. Factorizations
To continue, let Y ∈ R^k be random having E(Y) = μ and V(Y) = Σ ∈ S_k^+; partition Y and Σ conformably, with Y = [Y1', Y2']' and Σ = [[Σ11, Σ12], [Σ21, Σ22]], such that Y1 ∈ R^r and Y2 ∈ R^s; and let r + s = k with r ≤ s. The canonical correlations [30], as singular values of Σ11^{-1/2}Σ12Σ22^{-1/2}, are now to be designated as {ρ1, …, ρr}, in lieu of earlier usage, and to be ordered as ρ1 ≥ ρ2 ≥ ⋯ ≥ ρr ≥ 0. Moreover, the quantity A(Y1, Y2) = ∏(1 − ρi²) is the Vector Alienation Coefficient of Hotelling [30]. The factorization |Σ| = |Σ11||Σ22|A(Y1, Y2) extends directly, with |Σ11||Σ22| as an upper bound for |Σ| under any such partition, with further ramifications as follows.
Theorem 4.1. Consider Y ∈ R^k having E(Y) = μ and V(Y) = Σ ∈ S_k^+, such that Y = [Y1', Y2']' and Σ are partitioned conformably, with Y1 ∈ R^r, Y2 ∈ R^s, and r + s = k.

(i) The determinant of Σ admits the factorization

    |Σ| = |Σ11||Σ22| ∏ (1 − ρi²),

such that |Σ| ≤ |Σ11||Σ22|, with equality when ρ1 = ⋯ = ρr = 0.

(ii) If ρ1 = ⋯ = ρr = 0, then |Σ|^{1/k} is the weighted geometric mean of the quantities |Σ11|^{1/r} and |Σ22|^{1/s}, with weights r/k and s/k.

(iii) Generally, for any such partition, the quantity |Σ|^{1/k} becomes

    |Σ|^{1/k} = [|Σ11|^{1/r}]^{r/k} [|Σ22|^{1/s}]^{s/k} [A(Y1, Y2)]^{1/k}

in terms of A(Y1, Y2) = ∏(1 − ρi²).

(iv) If Y2 and Σ22 are partitioned further conformably, with Y2 = [Y3', Y4']', Y3 ∈ R^t, and Y4 ∈ R^u, such that t + u = s, then |Σ| admits the factorization

    |Σ| = |Σ11||Σ33||Σ44| A(Y3, Y4) A(Y1, Y2),

with A(Y1, Y2) and A(Y3, Y4) as the Vector Alienation Coefficients between {Y1, Y2} and between {Y3, Y4}, respectively.
Proof. As in Section 3.1, we have Σ = Diag(Σ11^{1/2}, Σ22^{1/2}) G Diag(Σ11^{1/2}, Σ22^{1/2}), with G = [[I_r, R], [R', I_s]] and R = Σ11^{-1/2}Σ12Σ22^{-1/2} having singular values {ρi}. The middle factor on the right has determinant |G| = ∏(1 − ρi²) from the clockwise rule, so that |Σ| = |Σ11||Σ22|∏(1 − ρi²), to give conclusion (i). Conclusion (ii) follows directly from A(Y1, Y2) = 1, and conclusion (iii) on combining (i) and (ii). Conclusion (iv) now follows on applying (i) twice, first on partitioning Y into {Y1, Y2}, whose canonical correlations are {ρ1, …, ρr}, then Y2 into {Y3, Y4} having their own canonical correlations, to complete our proof.
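Conclusion (i) is easy to verify numerically. The sketch below builds an arbitrary positive definite Σ (our simulated example), computes the squared canonical correlations as eigenvalues of Σ11^{-1}Σ12Σ22^{-1}Σ21, and compares the two sides of the factorization:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((12, 6))
S = B.T @ B                      # a positive definite dispersion matrix
r = 2
S11, S12, S22 = S[:r, :r], S[:r, r:], S[r:, r:]

# Squared canonical correlations and the Vector Alienation Coefficient.
rho2 = np.linalg.eigvals(
    np.linalg.solve(S11, S12) @ np.linalg.solve(S22, S12.T)).real
alien = np.prod(1.0 - rho2)

lhs = np.linalg.det(S)
rhs = np.linalg.det(S11) * np.linalg.det(S22) * alien
```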
Remark 4.2. In short, Theorem 4.1 links determinants and principal subdeterminants precisely through angles between subspaces. Moreover, arguments leading to conclusion (iv) may be iterated recursively to achieve a hierarchical decomposition for four or more factors, as in the following with Y = [Y1', Y2', Y3', Y4']', namely,

    |Σ| = |Σ11||Σ22||Σ33||Σ44| A(Y3, Y4) A(Y2, [Y3', Y4']') A(Y1, [Y2', Y3', Y4']').   (4.6)
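The recursion of Remark 4.2 can be coded once for any ordered list of block sizes; the following sketch (function names ours) peels off one leading block at a time, accumulating block determinants and alienation coefficients:

```python
import numpy as np

def alienation(S, r):
    """Vector Alienation Coefficient between the first r and the
    remaining coordinates under dispersion matrix S."""
    S11, S12, S22 = S[:r, :r], S[:r, r:], S[r:, r:]
    rho2 = np.linalg.eigvals(
        np.linalg.solve(S11, S12) @ np.linalg.solve(S22, S12.T)).real
    return float(np.prod(1.0 - rho2))

def hierarchical_det(S, sizes):
    """|S| as a product of principal block determinants and alienation
    coefficients, iterating Theorem 4.1 over successive splits."""
    out, T = 1.0, S
    for r in sizes[:-1]:
        out *= np.linalg.det(T[:r, :r]) * alienation(T, r)
        T = T[r:, r:]
    return out * np.linalg.det(T)
```

With sizes [1, 3, 3, 3] this mirrors the partition into constant, linear, pure quadratic, and interactive coefficients used in Section 5.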
Remark 4.3. Hotelling’s [30] Vector Alienation Coefficient A(Y1, Y2) = ∏(1 − ρi²) is a composite index of linkage between {Y1, Y2}, decreasing in each ρi². Equivalently, duality asserts that A(Y1, Y2) is the identical composite index of linkage between {S(X1), S(X2)} as subspaces of R^n.
Theorem 4.1 anticipates that Ds-inefficient subset estimators may be masked in a design exhibiting good overall D-efficiency. Conversely, a Ds-inefficient subset may contraindicate, incorrectly, the overall D-efficiency of a design. Details are provided in case studies to follow.
5. Case Studies
5.1. The Setting
Our tools are informative in input-output studies. In particular, specify a second-order model in three regressors and p = 10 parameters, namely,

    Y = β0 + β1x1 + β2x2 + β3x3 + β11x1² + β22x2² + β33x3² + β12x1x2 + β13x1x3 + β23x2x3 + ε.   (5.1)

Next partition β = [β0, βL', βQ', βI']', with βL = [β1, β2, β3]' as slopes; βQ = [β11, β22, β33]' as pure quadratic terms reflecting diminishing or increasing returns to inputs; and βI = [β12, β13, β23]' as interactive terms reflecting synergistic or antagonistic effects for pairs of regressors in combination. Further let βR = [βL', βQ', βI']' exclusive of β0, the latter a base line for Y. We proceed under conventional homogeneous and uncorrelated errors, the minimizing solution being unbiased with V(β̂) = σ²(X'X)^{-1}. We take σ², although unknown, to reflect natural variability in experimental materials and protocol, and thus applicable in a given setting independently of the choice of design. Accordingly, for present purposes we may standardize to σ² = 1 for reasons cited earlier.
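The model matrix for (5.1) is built mechanically from the design itself; a sketch for a three-factor design with runs in rows, column order matching the partition of β (helper name ours):

```python
import numpy as np

def second_order_rows(D):
    """Model matrix rows f(x)' = [1, x1, x2, x3, x1^2, x2^2, x3^2,
    x1*x2, x1*x3, x2*x3] for a 3-factor design D (runs in rows)."""
    x1, x2, x3 = D[:, 0], D[:, 1], D[:, 2]
    return np.column_stack([np.ones(len(D)), x1, x2, x3,
                            x1**2, x2**2, x3**2,
                            x1*x2, x1*x3, x2*x3])
```

All determinants and angles in this section derive from X = second_order_rows(D) for each candidate design D.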
5.2. The Designs
Early polynomial response designs made use of factorial experiments, setting levels as needed to meet the required degree. For example, the second-order model (5.1) in three regressors would require 3³ = 27 runs as a full factorial. However, in the early 1950s such designs were seen to be excessive, in carrying redundant interactions beyond the pairs required in the model (5.1). In industrial and other settings where parsimony is desired, several small second-order designs have evolved, often on appending a few additional runs to two-level factorials or fractions thereof.
Eight such small designs of note here are the hybrids of [38], the small composite design of [39], that of [40], the central composite rotatable design of [41], and the designs of [42], [43], and [44]. The designs have numbers of runs as noted, respectively. These follow on adding a center run to all but the design of [42], rendering all as unsaturated, having at least one degree of freedom for error. Specifically, the design of [42] already has 11 runs and is unsaturated. All designs have been scaled to span the same range for each regressor, and none strictly dominates another under the positive definite dispersion ordering. All determinants as listed derive from the respective V = (X'X)^{-1} and its submatrices. Subset efficiencies were examined in [45] for selected designs using criteria other than D- and Ds-efficiencies. Our usage here, as elsewhere in the literature, considers the dimension-adjusted determinants |V|^{1/p} and |Vs|^{1/k} to be efficiency indices specific to a particular design, to include subsets of β̂, and smaller values reflect greater efficiencies through smaller volumes of concentration ellipsoids. On the other hand, the comparative efficiencies of two designs for estimating β or a subset are found as ratios of these quantities.
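The dimension-adjusted subset index is computed directly from the design's model matrix; a sketch (helper name ours), where idx selects the coefficients of interest:

```python
import numpy as np

def subset_index(X, idx):
    """Dimension-adjusted index |V_s|^(1/k) for the subset estimators
    indexed by idx, with V = (X'X)^{-1}; smaller is more efficient."""
    V = np.linalg.inv(X.T @ X)
    Vs = V[np.ix_(idx, idx)]
    return np.linalg.det(Vs) ** (1.0 / len(idx))
```

Ratios of subset_index values across two candidate designs give their comparative efficiencies for the chosen subset.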
5.3. Numerical Studies
Details for these designs are listed in the accompanying tables. Table 1 gives values of the dimension-adjusted determinants for β̂ and selected subsets, with k as the order of the determinant. Also listed are angles between the regressors and the constant, to be noted subsequently. Table 2 displays the squared canonical correlations between designated subsets, and Table 3 the corresponding Vector Alienation Coefficients for specified pairs. Here (L, Q) refers to the pair (β̂L, β̂Q), for example. Moreover, values of the composite indices, if much less than unity, serve to alert the user as to potential problems with ill-conditioning.
5.3.1. An Overview
To fix ideas, observe for the that , , and from Table 1. These not only are comparable in magnitude, but are commensurate, in having been adjusted for dimensions and thus homogeneous of unit degree, as are all entries in Table 1. Moreover, since are uncorrelated and = 1.0 for the from Table 3, is the geometric mean from Theorem 4.1(ii). A further rough spot check of Table 1 may be summarized as follows.
Summary Properties
(P1) Compared with , values for appear excessive throughout.
(P2) Values for are roughly comparable across designs.
(P3) The eight designs sort essentially into two groups.
(P4) Designs overall are comparatively D- and Ds-efficient, with the noted exception being for the .
(P5) The designs are considerably less Ds-efficient, with their generalized variances being , respectively, in comparison with , 11.852, for the remaining designs; and each of the former is burdened by unequivocal Ds-inefficiency for , to be treated subsequently.
5.3.2. Further Details
We next examine Hartley’s [39] in some detail, first in terms of generalized variances. Values for , , and appear in the first row of Table 4, along with using from Table 2. Theorem 4.1 (i) now asserts that , as verified numerically through . In a similar manner, partitions into , where and from Table 4. The squared canonical correlations between are from Table 2, so that as in Table 3. Theorem 4.1 (i) again recovers as since and are reciprocals in this instance. Moreover, translates into , where since elements of are mutually uncorrelated from Table 2. In summary, the value for the admits the factorization of (4.6), on identifying with , respectively, given numerically from Tables 1 and 3 as
Corresponding factorizations proceed similarly for other designs. Details are left to the reader, but values for , the rightmost factor of (5.2), are supplied for each design as the final row of Table 3. Although the tables, together with Theorem 4.1, support other factorizations, the one featured here seems most natural in terms of the parameters , together with their central roles in identifying noteworthy treatment effects in second-order models.
5.4. Masking
The D-efficiency index of the , at , is larger but roughly comparable to that of at . What cannot be anticipated from these facts alone, however, is that the determinant for the is comparable to its determinant , despite their disparate dimensions. Adjusting for dimensions gives and for the . This illustrates the masking of a remarkably inefficient estimator for , despite the value in estimating all parameters. This masking stems from the nonorthogonality of subset estimators as reflected in their canonical correlations and Vector Alienation Coefficients. In contrast are the corresponding commensurate values for the design, namely, and . It may be noted that the condition number c2(X) is 21.59 for , with the somewhat larger value 54.01 for the .
We next examine the Ds-inefficiencies of and for as noted earlier, with taking values 7.80031 and 6.61313, respectively. Our reference for masking is . These values are not listed in Table 1, but may be recovered from Tables 2 and 3 as follows. Specifically, for we have where neither nor appears excessive. In consequence, that is excessive would be masked on examining and only. Parallel steps for give the factorization = 3.41528, with similar conclusions in regard to masking.
5.5. Collinearity with the Constant
Advocates for these and other small designs have focused on D- and related efficiency criteria, as well as the parsimony of small designs and their advantage in industrial experiments. To the knowledge of this observer, none has considered prospects for ill-conditioning and its consequences, despite the fact that the columns of X are necessarily inter-linked through the construction of second-order from first-order terms. Nonetheless, from Section 3.1 and Corollary 3.2, we may compute angles between the constant vector and the span of the regressors using duality together with the information at hand. This may prove to be critical in view of the admonition [29] that “collinearity with the intercept can quite generally corrupt the estimates of all parameters in the model.” As noted in Remark 3.4, rules-of-thumb for problematic VIFs include those exceeding 10 or 4 or, in angular measure, angles below 18.435° and 30.0°, respectively. From the last row of Table 2, the angles have been computed for each of the eight designs, as listed in the final column of Table 1. It is seen that all designs are flagged as potentially problematic using the rules-of-thumb as cited. This adds yet another layer of concerns, heretofore unrecognized, in seeking further to implement these designs already in wide usage.
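The angle of the constant with the span of the regressors can be computed from any candidate model matrix before experimentation; a sketch via the projection residual (helper name ours, assuming the constant occupies the first column):

```python
import numpy as np

def intercept_angle_deg(X):
    """Angle between the constant column X[:, 0] and the span of the
    remaining columns, from the residual of projecting it onto them."""
    one, Z = X[:, 0], X[:, 1:]
    resid = one - Z @ np.linalg.lstsq(Z, one, rcond=None)[0]
    sin_t = np.linalg.norm(resid) / np.linalg.norm(one)
    return np.degrees(np.arcsin(min(1.0, sin_t)))
```

An angle near 90 degrees indicates regressors nearly orthogonal to the constant; angles below 30 or 18.435 degrees trip the VIF rules-of-thumb cited above.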
6. Conclusions
Duality of (i) Hotelling’s [30] canonical correlations between the subset estimators {β̂1, β̂2} and (ii) the design linkage parameters between {S(X1), S(X2)} is established at the outset. Stewart’s [33] collinearity indices are then extended to encompass angles between each column of X and the span of the remaining columns. In particular, θ1 quantifies numerically the collinearity of regressors with the intercept, of concern in the prospective corruption of all estimates due to ill-conditioning.
Matrix identities factorize a determinant in terms of principal subdeterminants and the Vector Alienation Coefficients of [30] between subset estimators. By duality, the latter also are Alienation Coefficients between the corresponding subsets of regressors. These identities in turn are applied in the study of Ds-efficiencies for the constant, linear, pure quadratic, and interactive coefficients in eight small second-order designs from the literature. Studies on D- and Ds-efficiencies, as cited in our opening paragraph, confirm that designs are seldom efficient for both. Our determinant identities support a rational explanation. In particular, these identities expose the propensity for Ds-inefficient subset estimators to be masked through near collinearities in overall D-efficient designs.
Finally, the evidence suggests that all eight designs are vulnerable, to varying degrees, to the corruption of all estimates due to ill-conditioning. In short, we have exposed quantitatively the structural origins of masking through Hotelling’s [30] canonical correlations, and their equivalent design linkage parameters. This analysis in turn proceeds from the design matrix itself rather than empirical estimates, so that any design can be evaluated beforehand with regard to masking and possible subset inefficiencies, rather than retrospectively after having committed to a given design in a particular experiment.