International Journal of Plant Genomics

Volume 2008 (2008), Article ID 584360, 16 pages

http://dx.doi.org/10.1155/2008/584360

## Statistical Analysis of Efficient Unbalanced Factorial Designs for Two-Color Microarray Experiments

Department of Animal Science, College of Agriculture and Natural Resources, Michigan State University, East Lansing, MI 48824-1225, USA

Received 2 November 2007; Revised 22 January 2008; Accepted 25 April 2008

Academic Editor: Chunguang Du

Copyright © 2008 Robert J. Tempelman. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Experimental designs that efficiently embed a fixed effects treatment structure within a random effects design structure typically require a mixed-model approach to data analyses. Although mixed model software tailored for the analysis of two-color microarray data is increasingly available, much of this software is generally not capable of correctly analyzing the elaborate incomplete block designs that are being increasingly proposed and used for factorial treatment structures. That is, optimized designs are generally unbalanced as it pertains to various treatment comparisons, with different specifications of experimental variability often required for different treatment factors. This paper uses a publicly available microarray dataset, as based upon an efficient experimental design, to demonstrate a proper mixed model analysis of a typical unbalanced factorial design characterized by incomplete blocks and hierarchical levels of variability.

#### 1. Introduction

The choice and optimization of experimental designs for two-color microarrays have been receiving increasing attention [1–13]. Interest has been particularly directed towards optimizing experiments that involve a factorial design construction [7, 9, 14] in order to study the joint effects of several factors such as, for example, genotypes, pathogens, and herbicides. It is well known by plant scientists that factorial designs are more efficient than one-factor-at-a-time studies and allow the investigation of potentially interesting interactions between two or more factors. For example, investigators may study how herbicide effects (i.e., mean differences) depend upon plant genotypes or times after application.

Two-color systems such as spotted cDNA or long oligonucleotide microarrays involve hybridizations of two different mRNA samples to the same microarray, each of the two samples being labeled with a different dye (e.g., Cy3 or Cy5; Alexa555 or Alexa647). These microarrays, also simply referred to as arrays or slides, generally contain thousands of probes with generally a few (≤4) spots per probe, and most often just one spot per probe. Each probe specifically hybridizes to a matching mRNA transcript of interest within each sample. After hybridization, microarray images are scanned at two different wavelengths as appropriate for each dye, thereby providing two different fluorescence intensity measurements for each probe. Upon further preprocessing or normalization [15], these dye-specific intensities for each probe are believed to reflect the relative mRNA abundance for the corresponding transcript within the respectively labeled samples. The normalized intensities, or the Cy3/Cy5 ratio thereof, for each spot are typically logarithmically transformed to render data that is generally characterized to be approximately normally distributed.

An increasingly unifying and indisputable message is that the heavily used common reference design is statistically inefficient [1, 9, 10, 12, 13]. Here, the same common reference sample or pool is reused as one of the two samples on every microarray, the other sample deriving from a treatment group of interest. Hence, inferences on differential expression are based only on indirect connections across arrays as samples from different treatments of interest are never directly connected or hybridized together on the same microarray. In contrast, most of the alternatively proposed efficient designs are incomplete block designs, the most popular being various deviations of the loop design as first proposed for microarrays by Kerr and Churchill [16]. In these designs, direct connections or hybridizations are typically reserved for the most important treatment comparisons with inference on other comparisons being generally as efficient as any based on the common reference design.

The intent of this review is to reemphasize the use of mixed models as the foundation for statistical analysis of efficient factorial designs for microarrays. Mixed model analysis for microarray data was first proposed by Wolfinger et al. [17]. However, this and other previous expositions on the use of mixed model analysis for microarray data have been primarily directed towards the analysis of completely balanced designs [18, 19] whereas many recently proposed designs for microarray studies are unbalanced with respect to, for example, different standard errors on all pairwise comparisons between treatment groups [10, 13]. We will review various aspects of mixed model analysis for unbalanced designs, including a demonstration on publicly available data from a recent plant genomics study [20].

#### 2. The Connection between Mixed Models and Efficient Designs

Efficient experimental designs are
typically constructed such that their factors can be broadly partitioned into
two categories: *treatment structure* factors
and *design structure* factors [21]. The treatment structure naturally includes
the factors of greatest interest; for example, herbicides, genotypes, tissues, and
so forth, whose effects are deemed to be fixed.
In other words, the levels of these *fixed
effects* factors are specifically chosen by the investigator such that mean
comparisons between such levels, for example, different treatments, are of primary
interest. These factors also include any
of whose levels are consistently reused over different experiments, such as dye
labels, for example, Cy3 versus Cy5, for two-color microarrays. On the other hand, the design structure
primarily includes *random effects* factors, whereby the levels of each such factor are considered to be randomly
chosen from a conceptually infinite set of such levels [22]. For example, the specific arrays used for a
microarray study are considered to be a random sample from a large, perhaps
hypothetically infinite, population of arrays; similar claims would be made
regarding biological replicates, for example, plants, pools thereof, or even
field plots as dependent upon the experimental design [14]. Within each random-effects factor, the
effects are typically specified to be normally, independently, and identically
distributed (NIID) with variability in effects formally quantified by a
variance component (VC).

These design structure or random
effects factors are typically further partitioned into two subcategories:
*blocking* factors and *experimental error* factors. In two-color microarray experiments, arrays
are typically blocking factors as treatments can be directly compared within
arrays, although this is not true for the common reference design as previously
noted. Blocking represents a
longstanding and efficient experimental design strategy for improving precision
of inference on treatment comparisons.
Experimental error factors, such as plants or pooled samples thereof within
treatments, are often necessary to be
included as random effects in order to properly specify
true experimental replication at the biological level rather than merely at the
measurement or technical level. Such
specifications are particularly required when multiple aliquots are derived
from the same biological specimen for use in multiple arrays
[20, 23] or when probes for each gene
transcript are spotted more than once on each array. Of course, plants may also alternatively serve
as blocking factors in some designs if different tissues are compared within
plants.

Currently, there is much software available for microarray data analysis, some of which is only suited for studies having only a treatment structure but no pure design structure. Common examples include the analysis of data generated from single channel systems (e.g., Affymetrix) or of log ratios generated from common reference designs. When no random effects are specified, other than the residuals, the corresponding statistical models are then simply fixed-effects models. Ordinary least squares (OLS) inference is then typically used to infer upon the treatment effects in these studies. OLS is appropriate if the assumption is valid that there is only one composite residual source of variability such that the residuals unique to each observation are NIID.

Conversely,
statistical analysis of efficient two-color experiments having a fully
integrated treatment and design structure needs to account for fixed and random
effects as typical of a *mixed effects* model, more often simply referred to as a mixed model. Generalized least squares (GLS) analysis,
also referred to as mixed-model analysis, has been recognized as optimal in
terms of minimizing variance of estimates for inference on treatment
comparisons. This is true not only for
efficient microarray designs [10, 17, 19, 24] but even for general plant
science and agronomy research [25–27], including recent applications
in plant genomics research [20, 23, 28]. Some of the more recently popular microarray
data analysis software has some mixed model analysis capabilities [29, 30].

Recall that some designs may be characterized by different levels of variability thereby requiring particular care in order to properly separate biological from technical replication, for example. Hence, it is imperative for the data analyst to know how to correctly construct the hypothesis test statistics, including the determination or, in some cases, the estimation of the appropriate degrees of freedom. Although, some of these issues have been discussed for balanced designs by Rosa et al. [19], they have not generally been carefully addressed for the analysis of microarray data generated from unbalanced designs. Optimally constructed experimental designs are often unbalanced with respect to inference on all pairwise treatment comparisons, such that even greater care for statistical inference is required than in completely balanced designs. For example, Wit et al. [13] proposed a method for optimizing two-color microarray designs to compare any number of treatment groups. Suppose that 9 different treatment groups are to be compared. Using the methods and software developed by Wit et al. [13], the recommended interwoven loop design that is optimized for A-optimality (lowest average squared standard errors for a particular arrangement of treatment comparisons) is provided in Figure 1. Although Figure 1 appears to be visually symmetric with respect to the treatment labels, including that all treatment groups are dye balanced, not all treatment groups are directly hybridized against each other. Hence, inferences on all pairwise comparisons between treatment groups will not be equally precise. For example, the standard errors for the inference on treatments R2 versus R8 or R8 versus R24 will not be the same as that for treatments R8 versus S24 or R8 versus M2 due to the differences in the number and/or degree of direct and indirect connections for these two sets of comparisons in Figure 1.

Even for some balanced factorial designs, where the standard errors for comparing mean differences for levels of a certain factor are the same for all pairwise comparisons, the experimental error structure can vary substantially for different factors. That is, substantial care is required in deriving the correct test statistics, particularly with split plot arrangements [14]. Of course, even when a completely balanced design is intended, data editing procedures that delete poor quality spots for certain genes would naturally result in unbalanced designs.

#### 3. Case Study

##### 3.1. Design

Zou et al. [20] present an experiment where three
different inoculate treatments were applied to soybean
(*Glycine max.*) plants 14 days after planting. The three different inoculates included
bacteria inoculation along with the avirulence gene *avrB* thereby conferring resistance (R), bacteria inoculation
without *avrB* thereby conferring
susceptibility (S), and a control group whereby the inoculate simply contained
an MgCl_{2} solution (M). Unfoliated leaves from three to
four plants were drawn and pooled for each treatment at each of three different
times after postinoculation; 2, 8, and 24 hours. Hence, the treatment structure was comprised
of a factorial, that is, 3 inoculates ×3 times, for a total of 9
groups. A 10th group involving a fourth null
inoculate with leaves harvested at 2 hours postinoculation, N2, was additionally
studied by Zou et al. [20]. The
complete dataset on gene expression data for all 27 684
genes represented on a set of three microarray platforms as used by Zou et al. [20] is available as accession
number GSE 2961 from the NCBI gene expression omnibus (GEO) repository (http://www.ncbi.nlm.nih.gov/geo/). The vast majority of the corresponding probes
were spotted only once per array or slide for each platform.

A graphical depiction of the 13 hybridizations that superimposes the design structure upon one replicate of the factorial treatment structure plus the additional 14th hybridization involving the 10th group N2 is illustrated in Figure 2. Note that at least two aliquots per each pooled sample are used, each aliquot being labeled with different dyes such that each replicate pool is used in at least two different hybridizations or arrays with opposite dye assignments. In other words, this design is characterized by technical replication such that it is imperative to explicitly model samples within inoculate by time combination as the biological replicates, that is, a set of random effects for modeling experimental error. Failing to do so would confuse pseudoreplication with true replication in the statistical analysis as each of the 2+ aliquots per each pool would then be incorrectly counted as 2+ different experimental replicates. The design in Figure 2 was replicated twice by Zou et al. [20], the second replication being of the exact same dye assignment and hybridization orientation as the first, for a total of 28 hybridizations. Hence, there were 20 samples (pools of leaves) utilized in the experiment, 2 per each of the 9 inoculate by time treatment groups plus 2 samples for the N2 control.

We
arbitrarily consider gene expression measurements for just one particular gene
based on the GEO submission from Zou et al. [20]: ID_REF #30 located in the metarow-metacolumn-row-column
location 1-1-2-14 of each array from microarray platform GPL 1013, one of three
different platforms used by Zou et al. [20] and further described in GEO. The statistical analysis of any of the
remaining 27 683
genes that were spotted once on each slide across the three different platforms
would be exactly the same as that for ID_REF #30, at least for those genes
where no observations would be edited out for poor data quality. We use the normalized Cy3 and Cy5 data,
provided as fields *S532N* and *S635N* in accession number GSE 2961 for
ID_REF #30 from GEO. Hence, for the 28
hybridizations considered for two replications of
Figure 2, there were 56 fluorescence
intensities (28 Cy3 and 28 Cy5) for each gene. The 56 fluorescence intensities
for ID_REF #30, as retrieved from GSE 2961 in GEO, are reproduced in
Table 4
.

##### 3.2. Statistical Model

For the purposes of this review, we
concentrate our attention just on the subdesign characterized by the solid arrows
in Figure 2 that connect the three primary inoculates (R,S, and M) together
within each of the 3 different times (2, 8, and 24 hours). The
remaining dashed lines in Figure 2 involve either the 10th group (N2) or
connect adjacent times (2 with 8 and 8 with 24) within each of two inoculates (R
and S); note that no hybridizations connecting any of the three times within inoculate
M were provided with GSE 2961 on GEO. Labeling
inoculate type as Factor A and time after inoculation as Factor B, the
resulting subdesign is an example of the “A-loop” design presented by
Landgrebe et al. [9] as illustrated in their Figure
2 (B), albeit for a
factorial treatment structure in their case. In other words, the only direct connections between
the 9 treatment groups within arrays involve comparisons of levels of Factor A
within levels of Factor B. Using the log
intensities as the response variables for further statistical analysis, an
appropriate linear mixed model to specify for this A-loop design would be as
follows: where is the log fluorescence intensity pertaining
to the *l*th biological replicate
assigned to the th inoculate and th time labeled
with the *k*th dye (*k* = 1 or 2), and hybridized to array
within the th time. Here, *μ* is the overall mean, is the effect of the *i*th inoculate, is the effect of the *j*th time, is the interaction effect between the *i*th inoculate and
*j*th
time, and is the effect of the *k*th dye, all of which are defined to be fixed effects. The design structure component of
(1) is
defined by the random effects of for the *l*th pool or biological
replicate (*l* = 1, 2) within the *ij*th inoculate-time combination, for the *m*th array or slide within the *j*th time, and the residual unique to the same subscript identifiers as
that for .
The typical distributional assumptions
in mixed models are such that each of the three sets of random effects are NIID
with their own VC; that is, , ,
and .
As clearly demonstrated by Dobbin et al. [31] and based on our experiences,
dye effects should be modeled in (1), even after using global normalization
procedures such as loess [15], as gene-specific dye effects
are common. Nevertheless, one would not normally anticipate interaction effects
between dye and other treatment factors (e.g., inoculate or time), and hence these
effects are not specified in (1).

It should be somewhat apparent from the A-loop design of Figure 2 why the nesting or hierarchical specifications are specified as such for the random effects. For example, although each pool or replicate is labeled twice, once with each dye, each pool is still part of or nested within the same inoculate by time combination such that samples or replicates are specified to be nested within inoculate by time. Similarly, arrays are nested within times since each array is associated with only one particular level of time; that is, different times are never directly compared or connected within arrays. Hence, one should intuitively recognize from Figure 2 that there would be greater precision for inferring upon inoculate effects than for time effects using the A-loop design. That is, the variability due to arrays is completely confounded with time differences such that it partly defines the experimental unit or replicate for time.

##### 3.3. Classical ANOVA

The complex nature of different levels of replication in the A-loop this design is further confirmed in the classical analysis of variance or ANOVA [21] for this design in Table 1. However, as demonstrated later, classical ANOVA is not necessarily equivalent to a more optimal GLS or mixed model analysis [32]; in fact, estimates of treatment effects based on classical ANOVA are simply equivalent to OLS estimates where all factors are treated as fixed. Nevertheless, the classical ANOVA table, when extended to include expected mean squares (EMS), is instructive in terms of identifying different levels of replication and hence experimental error.

Classical ANOVA is based on equating sums of squares (SS), also called quadratic forms, to their expectations; typically this involves equating mean squares (MS), being SS divided by their degrees of freedom (ν), to their EMS. For completely balanced designs, there is generally one universal manner in which these quadratic forms, and hence the ANOVA table, are constructed [19, 22]. However, for unbalanced designs, such as all of or even just the A-loop component of Figure 2, there are a number of ways of constructing different quadratic forms and hence different ways of constructing ANOVA tables for the same set of data [21, 32]. The most common ANOVA strategy is based on the use of type III quadratic forms as in Table 1 whereby the SS for each factor is adjusted for every other factor in the model. More details on type III and alternative ANOVA quadratic forms for unbalanced data can be found in Milliken and Johnson [21] and Searle [33].

Table 1 conceptually illustrates the basic components of an ANOVA
table; again, for every
term, say X, in a statistical model like (1), there is a sum of squares , degrees of freedom ,
mean square ,
and expected mean square . Generally, ANOVA tests on fixed effects
are of greatest interest; for example, inoculate, time, and inoculate by time
interaction. The correct *F* ratio test
statistic for any fixed effects term in the ANOVA table is constructed such
that its MS and a denominator MS have the same EMS if the null hypothesis is
true; that is, that there are truly no effects for that particular term. In
statistical parlance, no effects for a term , whether that pertains to the
main effects of a factor or the interaction effects between two or more
factors, is synonymous with its corresponding *noncentrality parameter * being equal to zero; that
is, there is no signal due to that model term [32].

Consider,
for example, the test for the main effects of inoculate denoted as Factor A in
Table 1. If the main effects of inoculate
are nonexistent, that is, there are no overall or marginal mean differences
between any of the inoculates, then . It should be clearly
noted that when , the EMS for inoculate matches
with the EMS for replicate within inoculate
and time, denoted as rep(inoculate^{*}time) in Table 1. In other words, rep(inoculate^{*}time) is said
to be the denominator or *error* term
for the main effects of inoculate such that rep(inoculate^{*}time) defines the
experimental unit or the biological replicate for inoculate effects. Hence, the correct *F* statistic for testing inoculate effects, as demonstrated from
Table 1, is
based
on numerator and denominator degrees of freedom. It
should also be observed that this same error term or experimental unit would be
specified as the denominator MS term for the ANOVA -test on inoculate by time
interaction effects, denoted as inoculate^{*}time in Table 1. That is, when the corresponding noncentrality
parameter , both inoculate^{*}time
and rep(inoculate^{*}time) share the same EMS such that the correct *F* statistic for
testing this interaction is based on numerator and denominator degrees of freedom.

It was previously noted from the
A-loop design of Figure 2 that inference on the main effects of time (Factor B)
should be less precise than that for the main effects of inoculate. In other
words, the size of the experimental unit should be larger for time effects since
arrays are nested within levels of time whereas levels of inoculate treatments
are directly compared within arrays. This is further demonstrated in
Table 1 by
the EMS for time with , being larger than that
for inoculate effects with , under the
corresponding true null hypotheses of no main effects for either factor. In
fact, the experimental error term for time is composite of both rep(inoculate^{*}time)
and arrays(time) such that marginal mean comparisons between the three times,
2, 8, and 24 hours, will be affected by more noise than marginal mean comparisons
between the three inoculates which were directly and indirectly connected
within arrays.

Note
that under the null hypothesis of no time effects , there is no one other
MS that shares the same EMS that would allow one to readily construct an ANOVA *F*-statistic for the main effects of time.
Satterthwaite [34] provided a solution to this
problem by proposing the “synthesis” of a denominator *MS*, call it , as being
a linear combination of *q* random
effects MS:
where are known coefficients such that has the same expectation as that for a certain model term having mean square under the null hypothesis
. Then
is approximately distributed as a random variable from a central *F* distribution with
numerator and denominator degrees of freedom, where with denoting .

In
our example, consider the synthesized
as being a linear combination of the
MS for rep(inoculate^{*}time), array(time), and residual. With reference to (2), is then a linear function of different MS with
, , and . Using the EMS for these three MS provided from
Table 1 as , ,
and ,
respectively, it should be readily seen that the expectation of
is then That is, shares the same EMS as that for time in
Table 1 when . Hence, a suitable *F* statistic
for inferring upon the main effects of time would be .

To
help further illustrate these concepts, let us conduct the ANOVA on the data
generated from the A-loop design of Figure 2 for ID_REF #30 from Zou et al. [20]; that is, using data from arrays
1–9 and 15–23 as provided in Table 4.
The classical ANOVA table using the
*method=type3* option of the popular
mixed-model software SAS PROC MIXED [35] for that particular gene is
provided in Table 2; SAS code for
all statistical analysis presented in this
paper is provided in Figure 3
and also available for download, along with the
data in Table 4, from http://www.msu.edu/~tempelma/ijpg2008.sas. As noted previously, the correct denominator
MS term for testing the main effects of inoculate is replicate within inoculate
by time. Hence, the corresponding *F* statistic =
= ,
with numerator and denominator degrees
of freedom leading to a *P*-value of 0.1172. Similarly, for the inoculate^{*}time
interaction, the appropriate F-test statistic is
= ,
with numerator and denominator degrees of freedom
leading to a *P*-value of 0.3435. Even without considering the control of
false discovery rates (FDRs) that involve the joint control of type I errors with
respect to the remaining 27 683
genes, it seems apparent that neither the main effects of inoculate nor the
interaction between inoculate and time would be statistically significant for
gene ID_REF #30.

The
synthesized denominator
for time
effects is . The estimated degrees of
freedom for this synthesized MS using (3) is then Hence, the main
effects of time, appropriate *F*-test
statistic is =
,
with numerator and denominator degrees of freedom leading to a *P*-value
of 0.0683 as also reported in the SAS
output provided in Table 2.

##### 3.4. Mixed Model Analysis

Although the classical ANOVA table is indeed instructive in terms of illustrating the different levels of variability and experimental error, it is not the optimal statistical analysis method for a mixed effects model, especially when the design is unbalanced. A mixed-model or GLS analysis more efficiently uses information on the design structure (i.e., random effects) for inferring upon the fixed treatment structure effects [27, 32].

Unfortunately, GLS, in spite of its optimality properties, is generally not attainable with real data because the VC (e.g., , , and ) must be known. Hence, the VC must generally be estimated from the data at hand. There are a number of different methods that are available for estimating VC in mixed models [22]. The classical ANOVA method is based on equating MS to their EMS in the ANOVA table. For example, using the bottom row of Table 1, the EMS of MSE is . So then using the numerical results for ID_REF #30 from Table 2, the type III ANOVA estimate of is simply . Now work up one row further in Table 1 to the term array(time). Equating from the same corresponding row in Table 2 to its EMS of using gives . Finally, work up one more (i.e., third to last) row in both tables. Equating from Table 2 to its EMS of using leads to . So array variability is estimated to be roughly four times larger than the biological variability which, in turn, is estimated to be somewhat larger than residual variability for ID_REF #30.

Recall that with unbalanced designs, quadratic forms are not unique such that ANOVA estimators of VC will not be unique either. Nevertheless, type III quadratic forms are most commonly chosen as then the SS for each term is adjusted for all other terms, as previously noted. Although ANOVA estimates of VC are unbiased, they are not efficient nor optimal in terms of estimates having minimum standard error [25]. Restricted maximum likelihood (REML) is a generally more preferred method of VC estimation [22, 36, 37] and is believed to have more desirable properties. Nevertheless, the corresponding REML estimates , and for ID_REF #30 are in some qualitative agreement with the previously provided ANOVA estimates.

Once
the VCs are
estimated, they are substituted for the true unknown VCs to provide the
“estimated” GLS or EGLS of the fixed
effects. It is important to note that typically EGLS = GLS for balanced designs,
such that knowledge of VC is somewhat irrelevant for point estimation of treatment
effects. However, the same is generally not true for unbalanced designs, such
as either the A-loop design derived from Figure 2 or even the interwoven loop
design from Figure 1. Hence, different methods of VC estimation could lead to
different EGLS estimates of treatment effects as we demonstrate later. Suppose that
it was of interest to compare the various mean responses of various inoculate
by time group combinations in the duplicated A-loop design example. Based on
the effects defined in the statistical model for this design in (1), the true
mean response for the *i*th inoculate
at the *j*th time averaged across the
two dye effects ( and ) can be written as If the levels
are, say, ordered alphanumerically, the mean difference between inoculate and at time (2
hours) is specified as . Using (6), this difference written as a
function of the model effects is then = .
Similarly, the mean difference between time (2 hours) and
time (8 hours) for inoculate could be derived as Note that these two comparisons or *contrasts* can be more elegantly written using
matrix algebra notation. A better understanding of contrasts is useful to help determine
the correct standard errors and statistics used to test these contrasts,
including how to write the corresponding SAS code. Hence, a matrix algebra approach to hypothesis
testing on contrasts is provided in Appendix 5 that complements the SAS code
provided in Figure 3. For now, however, we simply use the “hat” notation in referring to the EGLS estimates of these two contrasts as and respectively.

As
we already intuitively noted from the A-loop design of
Figure 2, inference on should be much more precise than that
for since inoculates are compared within arrays
whereas times are not. This distinction should
then be reflected in a larger standard error for than for .
Indeed, using the REML estimates of VC for EGLS inference, this is demonstrated
by whereas for ID_REF #30. However, these
standard errors are actually slightly understated since they do not take into
account the uncertainty of the VC estimates as discussed by Kackar and Harville
[38]. Kenward and Roger [39] derive a procedure to take this
uncertainty into account which is part of the SAS PROC MIXED implementation using
the option *ddfm=kr* [35] as indicated in
Figure 3. Invoking this option raises the two standard
errors accordingly, albeit very slightly, to and .

Now, the denominator degrees of
freedom for inference on these two contrasts should also differ given that the
nature of experimental error variability somewhat differs for inoculate
comparisons as opposed to time comparisons as noted previously from
Figure 2. However, with EGLS, there are no SS and hence
no corresponding MS or EMS expression for each main effects or interaction term
in the model, such that determining the correct test statistic and degrees of
freedom is somewhat less obvious than with the previously described classical
ANOVA approach [32]. Giesbrecht and Burns [40] introduced a procedure for
estimating the denominator degrees of freedom for EGLS inference which, again,
is invoked with the *ddfm=kr* option of SAS PROC MIXED. Using this option along with REML estimation
of VC for the analysis of ID_REF #30, the estimated degrees of freedom for is 5.28 whereas that for is 17.0.

Contrasts
are also used in EGLS to provide ANOVA-like *F* tests for the overall importance of various fixed effects; more details based
on the specification of contrast matrices to test these effects are provided in
Appendix 5. For example, denote the marginal or overall mean of inoculate
*i* averaged across the 3 times and 2 dyes
as . The numerator degrees of freedom hypothesis test for the main effects of inoculates
can be written as a combination of two complementary contrasts (A1) and (A2) ; that is, if both contrasts
are 0, then obviously is also true such that then is true. Similarly, let us suppose
that one wished to test the main effects of times (Factor B). Then, it could be readily demonstrated that
the corresponding hypothesis test can also be written as a combination of complementary contrasts: (B1) and (B2) , where denotes the marginal mean for the *j*th
level of Factor B; that is, the *j*th
time. If both component hypotheses (B1) and (B2) are true, then is also true thereby defining the composite
numerator degrees of freedom hypothesis test for the main effects of
Factor B.

Now the interaction between inoculate and time is a numerator degrees of freedom test as previously noted from Tables 1 and 2, suggesting that there are 4 complementary contrasts that jointly test for the interaction of the two factors. Of course, it is also well known that the interaction degrees of freedom is typically the product of the main effects degrees of freedom for the two factors considered. Two of the four degrees of freedom for the interaction involve testing whether or not the mean difference between inoculates 1 and 3 is the same within time 1 as it is within time 3, that is, (AB1) , and whether or not the mean difference between inoculates 2 and 3 is the same within time 1 as it is within time 3; that is, (AB2) . If both hypotheses (AB1) and (AB2) are true then it should be apparent that is also true; that is, the mean difference between inoculates 1 and 2 is the same within time 1 as it is within time 3. The remaining two degrees of freedom for the interaction involve testing whether or not the mean difference between inoculates 1 and 3 is the same within time 2 as it is within time 3; that is, (AB3) , and whether or not the mean difference between inoculates 2 and 3 is the same within time 2 as it is within time 3; that is, (AB4) . If both hypotheses (AB3) and (AB4) are true then is also true. Hence, contrasts AB1, AB2, AB3, and AB4 completely define the four components or numerator degrees of freedom for the interaction between Factors A and B. That is, the test for determining whether or not the mean differences between all levels of A are the same within each level of B, and vice versa, can be fully characterized by these four complementary contrasts.

The
EGLS statistics used for testing the overall importance of these main effects
or interactions are approximately distributed as *F*-random variables with the numerator degrees of freedom defined
by the number of complementary components or contrasts as previously described; refer to
Appendix 5 and elsewhere
[27, 32, 35] for more details. Now, the denominator degrees of freedom for
each contrast are dependent upon the design and can be determined based on that
using classical ANOVA as in Table 1 or by a multivariate extension of the
Satterthwaite-based procedure proposed by Fai and Cornelius [41]; again this option is
available as *ddfm=kr* using SAS PROC MIXED
(Figure 3).

Unfortunately,
much available software used for mixed model analysis of microarray data does
not carefully take into consideration that various fixed effects terms of
interest may have different denominator degrees of freedom when constructing *F* test statistics. In fact, a typical strategy of such software
is to assume that (i.e., the residual degrees of freedom) is the denominator degrees of freedom
for all tests. This strategy is denoted as the “residual” method for determining
denominator degrees of freedom by Spilke et al. [36] who demonstrated using
simulation work that the use of the
residual method can substantially inflate type I error rate for EGLS inference
on fixed effects; in other words, the number of false-positive results or genes incorrectly declared to
be differentially expressed between treatments would be unduly increased. Spilke et al. [36] further demonstrated that use
of the Kenward-Rogers’ method for degrees of freedom estimation and control for
uncertainty on VC provided best control of the nominal confidence interval coverage
and type I error probabilities.

##### 3.5. Impact of Method of Variance Component Estimation on EGLS

It was previously noted that the estimated standard errors for EGLS on two contrasts and were and , respectively, when REML was used to estimate the variance components for ID_REF #30. If the VC estimates are computed using type III ANOVA, then these estimated standard errors would differ accordingly; that is, and , respectively. What perhaps is even more disconcerting is that the estimates of and also differ between the two EGLS inferences; for example, using REML, and whereas using ANOVA and .

The overall EGLS tests for ID_REF #30 for testing
the main effects of inoculate, time and their interaction as based on the previously
characterized complementary contrasts are provided separately for ANOVA versus
REML estimates of VC in Table 3; this output is generated as type III tests using
the SAS code provided in Figure 3. From here, it should be clearly noted that conclusions upon the overall
importance of various fixed effects terms in
(1) as derived from EGLS inference
subtly depend upon the method of VC estimation; for example, the EGLS *P*-values
in Table 3 tend to be several points smaller using ANOVA compared to REML; furthermore,
note the differences in the estimated denominator degrees of freedom between
the two sets. Naturally, this begs the
question as to which method of VC estimation should be used?

In
completely balanced designs, ANOVA and REML lead to identical estimates of VC and
identical EGLS inference, provided that all ANOVA estimates of VC are positive. ANOVA estimates of VC that are negative are
generally constrained by REML to be zero, thereby causing a “ripple” effect on
REML estimates of other VC and subsequently on EGLS inference [42]. As noted previously, REML
does tend to outperform most other methods for various properties of VC
estimation [37]. Furthermore,
there is evidence that EGLS based on ANOVA leads to poorer control of type I
error rate for inference on fixed effects compared to EGLS based on REML in
unbalanced data structures [36]. However, Stroup and Littell
[42] concluded that EGLS using
REML may sometimes lead to inference on fixed effects that is too conservative
(i.e., actual error rates less than nominal type I error rate) again due to the
nonnegative REML restrictions on the VC estimates and associated ripple effects.
This issue warrants further study given that it has implications for control of
FDR which are most commonly used to control the rate of type I errors in
microarray studies [43]. Estimation of FDR inherently
depends upon the distribution of *P*-values for treatment effects across
all genes such that even mild perturbations on this distribution have potential
bias implications for control of false-positive rates.

#### 4. Other Issues for the Design Analysis Interface

##### 4.1. Log Ratio versus Log Intensity Modeling

Recent work on the optimization and comparison of various efficient microarray designs have been based on the assumption of OLS inference; that is, no random sources of variability other than residuals are considered [2, 8, 9, 13]. While this observation may seem to be counterintuitive given that the arguments laid out in this review for the need of (E)GLS to analyze efficient designs, it is important to note at least a couple of things. First, virtually all of the work on design optimization has been based on the assumption that a sample or pool is used only once; the corresponding interwoven loop designs in such cases [13] have been referred to as classical loop designs [10, 19]. However, sometimes two or more aliquots from each sample are used in separate hybridizations [20, 23] such as the A-loop design, example used in this review; the corresponding designs are connected loop designs [10, 19] that require the specification of random biological replicate effects separate from residual effects as previously noted.

Secondly,
almost all of the design optimization work has been based on the use of Cy3/Cy5
log ratios as the response variables rather than dye-specific log intensities as
used in this review. This data
reduction, that is, from two fluorescence intensities to one ratio per spot on
an array, certainly eliminates array as a factor to specify in a linear model. However, the use of log ratios can severely
limit estimability and inference efficiency of certain comparisons. Suppose that instead of using the 36 log intensities
from the duplicated A-loop design from arrays 1–9 and 15–23 of
Table 4, we used
the derivative 18 Cy3/Cy5 log ratios as the response variables. For example, the two corresponding log_{2}Cy3 and Cy5 fluorescence intensities for array 1 from
Table 4 are 13.9946 and
14.3312. The Cy3/Cy5 log ratio is then the difference or 0.3316 corresponding
to a fold change of . Using log ratios as their
response variables, Landgrebe et al. [9] concluded that it was
impossible to infer upon the main effects of Factor B (e.g., time) in the
A-loop design. However, as we demonstrated earlier, it is possible to infer
upon these effects using ANOVA or EGLS analysis on the log intensities. Jin et al.
[18] similarly illustrate the
utility of log intensity analysis in a split plot design that would not
otherwise have been possible using log ratios.
Milliken et al. [14] provide much more extensive mixed
modeling details on the utility of log intensity analysis in nested or
split-plot microarray designs
similar to the A-loop design.

The relative efficiency of some designs may be seen to depend upon the relative magnitude of biological to technical variation [10, 44]; sometimes it is only possible to separately estimate these two sources of variability using log intensities rather than log ratios thereby requiring the use of (E)GLS rather than OLS. In fact, analysis of log intensities using mixed effects model appears to be not only more flexible than log-ratio modeling but is statistically more efficient in recovering more data information [1, 45]. That is, as also noted by Milliken et al. [14], treatment effects are more efficiently estimated by combining intraarray and interarray information in a mixed model analysis when an incomplete block design is used, and arrays are explicitly included as random effects by analyzing log intensities rather than log ratios.

##### 4.2. Choosing between Efficient Experimental Designs Using Mixed Models

There are a number of different criteria that might be used to choose between different designs for two-color microarrays. We have already noted that the interwoven loop design in Figure 1 is A-optimal for pairwise comparisons between 9 treatment groups. A-optimality has been criticized for microarray studies because it chooses designs with improved efficiency for certain contrasts at the expense of other perhaps more relevant contrasts and further depends upon the parameterization of the linear model [1, 6, 9]; other commonly considered types of optimality criteria are possible and further discussed by Wit et al. [13] and Landgrebe et al. [9]. At any rate, it is somewhat possible to modify A-optimality to explicitly take into account a particular set of scientific questions [13]; furthermore, optimization with respect to one criterion will generally be nearly optimal for others.

For one particular type of optimality criterion, Landgrebe et al. [9] demonstrated that the A-loop design has the best relative efficiency compared to other designs for inference on the main effects of Factor A and the interaction effects between A and B although the main effects of Factor B could not be inferred upon using an analysis of log ratios as previously noted. How does the A-loop design of Figure 2 generally compare to the interwoven loop design of Figure 1 if a factorial treatment structure is imposed on the 9 treatments as implied by the same labels as used in Figure 2? Suppose that Figure 1 is a connected interwoven loop design [10] in the sense that the outer loop of Figure 1 (dashed arrows) connects one biological replicate for each of 9 groups whereas the inner loop of Figure 1 (solid arrows) connects a second biological replicate for each of the 9 groups. Then this design would consume 18 biological replicates and 18 arrays, thereby providing a fair comparison with the duplicated A-loop design of Figure 2.

Recall that Figure 1 is A-optimized for pairwise comparisons between all 9 groups. It is not quite clear what implications this might have for statistical efficiency for the constituent main effects of , , and the effects of their interaction ; note, incidentally, that these degrees of freedom independently sum to 8 as required for 9 groups. As duly noted by Altman and Hua [1], pairwise comparisons between all 9 groups may be not as important as various main effects or interaction contrasts with a factorial treatment structure arrangement. Although, as noted earlier, Figure 1 is symmetric with respect to the treatment labels, the classical ANOVA table for this interwoven loop design would be even more complicated (not shown) than that presented for the A-loop design since there is not one single denominator MS that would serve as the experimental error term for inoculate, time or inoculate by time effects!

One should perhaps compare two alternative experimental designs having the same factorial treatment structure, but a different design structure, for contrasts of highest priority, choosing those designs where such contrasts have the smaller standard error. Let us consider the following comparisons: , , and that is, respectively, the overall mean difference between inoculates 1 and 3, the overall mean difference between times 1 and 3, and the interaction component pertaining to the difference between inoculates 1 and 3 within time 1 versus that same difference within time 3. Recall that these contrasts were components of the EGLS tests on the two sets of main effects and the interaction and previously labeled as (A1), (B1), and (AB1), respectively.

Now the comparison of efficient designs for the relative precision of various contrasts will generally depend upon the relative magnitude of the random effects VC as noted recently by Hedayat et al. [44] and for various microarray design comparisons [10]. Suppose the “true” variance components for , , and were 0.03, 0.06, and 0.25, comparable to either set of estimates provided previously on ID_REF #30 from Zou et al. [20]. The linear mixed model for analyzing data generated from Figure 1 would be identical to that in (1) except that arrays would no longer be specified as being nested within times. For the interwoven loop design of Figure 1, the standard errors for each of the three contrasts are , , and whereas for the A-loop subdesign of Figure 2, the corresponding standard errors are , , and . So whereas the optimized design in Figure 1 using Wit et al. [13] provided a substantial improvement for the estimation of overall mean time differences, the A-loop design is indeed more efficient for inferring upon the main effects of inoculate and the interaction between inoculate and time. Hence, the choice between the two designs would reflect a matter of priority for inference on the various main effects and their interactions. It should be carefully noted as demonstrated by Tempelman [10], that designs leading to lower standard errors for certain comparisons do not necessarily translate to greater statistical power as the denominator degrees of freedom for various tests may be substantially different between the two designs.

##### 4.3. Unbalanced Designs and Shrinkage Estimation

Shrinkage or empirical Bayes (EB) estimation is known to improve statistical power for inference on differential gene expression between treatments in microarray experiments [46]. Shrinkage-based estimation is based on the well-established hierarchical modeling concept that more reliable inferences on gene-specific treatment differences are to be attained by borrowing information across all genes [47, 48]. Typically, such strategies have involved improving estimation of standard errors of gene-specific treatment differences by “shrinking” gene-specific variances towards an overall mean or other measure of central tendency. However, most shrinkage estimation procedures have been developed for fixed effects models, that is, for simple experimental designs having a treatment structure but no or very limited design structure, or even treating all design structure factors as fixed [30]. Currently popular shrinkage estimation procedures [49–51] are certainly appropriate for many designs based on one-color Affymetrix systems or for common reference designs. Other proposed shrinkage procedures have facilitated extensions to very special cases of nested designs [47], including some based on rather strong modeling assumptions such as a constant correlation of within-array replicate spots across all genes [52] or a design structure facilitating the use of permutation testing [29]. However, virtually none of the procedures proposed thus far are well adapted to handle unbalanced designs such as the A-loop design where different sizes of experimental units need to be specified for different treatment factors; hence investigators should proceed with caution when using shrinkage estimation for unbalanced mixed-model designs.

#### 5. Conclusions

We have provided an overview of the use of mixed linear model analysis for the processing of unbalanced microarray designs, given that most efficient incomplete block designs for microarrays are unbalanced with respect to various comparisons. We strongly believe that much mixed-model software currently available for the analysis of microarrays does not adequately address the proper determination of error terms and/or denominator degrees of freedom for various tests. This would be particularly true if we had chosen to analyze all of the data for ID_REF #30 in Table 4 from Zou et al. [20] based on all of the hybridizations depicted in Figure 2. Even then, the size of the standard errors and estimated degrees of freedom would still be seen to be somewhat different for estimating the main effects of times compared to estimating the main effects of inoculates given the lower degree of within-array connectivity between the various levels of time as illustrated in Figure 2. If inferences on various comparisons of interest are not conducted correctly in defining a list of differently expressed genes, all subsequent microarray analysis (e.g., FDR estimates, gene clustering, gene class analysis, etc.) are absolutely futile.

We believe that it is useful to choose proven mixed-model software (e.g., SAS) to properly conduct these tests and, if necessary, to work with an experienced statistician in order to do so. We have concentrated our attention on the analysis of a particular gene. It is, nevertheless, straightforward to use SAS to serially conduct mixed-model analysis for all genes on a microarray [53]; furthermore, SAS JMP GENOMICS (http://www.jmp.com/software/genomics/) provides an even more powerful user interface to the mixed model analysis of microarray data.

#### Appendix

#### C. Matrix Representation of The Mixed Model Analysis of The A-loop Design of Zou Et Al.

Any mixed model,
including that specified in (1), can be written in matrix algebra formml: Here ** is the vector of all data, **** is the vector of all fixed effects (e.g.,
inoculate, time, dye, and inoculate by time interaction effects), ****u** is the vector of all random effects
(e.g., arrays and sample within inoculate by time effects), and is the vector of random residual effects. Furthermore, **X** and **Z** are
corresponding incidence matrices that specify the treatment and design
structure of the experiment, thereby linking the treatment and design effects, ** and ****u**,
respectively, to **y**. Note that **y** has a dimension of for the
duplicated A-loop design of Zou et al. [20]. Now ** and ****u** can be further partitioned into the effects as specified in
(1); for our
example,
such that ** is
a vector of fixed effects; that is, there are 18 elements
in (A.2).
Furthermore, can be similarly partitioned into a
vector of random effects, ,
for replicates within inoculate by time and another vector of random
effects, ,
for arrays within time; that is, there are a total of 18 biological replicates
and 18 arrays in the study, each characterized by a random effect. Note that it
is coincidence that the row dimensions of ****, ,
and are all 18 for this particular example design.**

Again,
the distributional assumptions on the random and residual effects are specified
the same as in the paper but now written in matrix algebra notation: , ,
and with ** denoting a vector of zeros and denoting an identity matrix of dimension
. Reasonably assuming that and are pairwise independent of each other
(i.e., biological sample effects are not influenced by array effects and vice
versa), then the variance-covariance matrix ****G** of **u** is a
diagonal matrix with the first 18 diagonal elements being and the remaining 18 diagonal elements
being . The GLS estimator, ,
of ** can
be written [22, 32] as with its
variance-covariance matrix defined by such that denotes the generalized inverse of .**

Once
the VC are estimated, they are substituted for the true unknown VC in **V** to
produce which are then used to provide the “estimated”
GLS or EGLS ,
of **: As noted in the
text, typically (i.e., EGLS = GLS) for balanced designs
but not necessarily for unbalanced designs, such as those depicted in
Figures 1
or 2.**

It
was previously noted in the paper that the mean difference between inoculate and at time as could be written as a function of the model effects in (1)
as . Similarly, the mean difference between time and time for inoculate *i* could be written as . These two contrasts written in matrix
notation as and ,
respectively, where are *contrast vectors* whose coefficients
align in order with the elements of ** in (A.2).
For example, note from (A.6) that the nonzero coefficients of 1, 1, 1,
and 1 occur within the 2nd, 3rd, 8th, and 11th positions of ,
respectively. When these coefficients
are multiplied in the same order with the 2nd, 3rd, 8th, and 11th elements of **** provided in (A.2), one gets which is indeed = as specified previously. The
reader should be able to make a similar observation for in considering how the nonzero elements of (A.7)
align in position with elements of in (A.2) to produce . In Figure 3, SAS PROC MIXED is used to
provide the estimates, standard errors, and test statistics for these two
contrasts. That is, note how all of the
elements from (A.6) and (A.7) are completely reproduced in the ***estimate* statements as *“k1 contrast”* and *“k2 contrast,”*
respectively, in Figure 3.

Now,
when the VC are known, these two contrasts can be estimated by their GLS, and . Furthermore, using (A.4), the true standard
errors of these two estimates can be determined as and ,
respectively. However, as previously
noted, the VC are generally not known but must be estimated from the data such
that the two contrasts are typically estimated using and with approximate standard errors determined by and . Using the REML estimates of VC as provided
in the paper, the code from Figure 3 can be executed to provide whereas for ID_REF #30 by simply changing *method=type3* to *method=reml* and by deleting
*ddfm=kr*.
However, these standard errors are actually slightly understated since
they do not take into account the uncertainty of the VC or as an estimate of
** as discussed by Kackar and Harville
[38].**

Kenward
and Roger [39] derive a procedure to take
this uncertainty into account and which is part of the SAS PROC MIXED
implementation using the *ddfm=kr* option [35] as specified in
Figure 3. Invoking this option raises the two
standard errors accordingly, albeit very slightly, to and . Furthermore, the *ddfm=kr* option
invokes the procedure of Giesbrecht and Burns [40] to estimate the denominator
degrees of freedom for EGLS inference. Using this option and REML, the estimated degrees of freedom for is 5.28 whereas that for is 17.0 as would be noted from executing the SAS code in
Figure 3. The
corresponding SAS output will furthermore include the *t*-test statistics for the two contrasts as and .
These statistics when compared to their Student
distributions with their respective estimated degrees of freedom,
5.28 and 17.0, lead to *P*-values of 0.66
and 0.37, respectively; that is, there is no evidence that either contrast is
statistically significant.

*Contrast matrices* on can be used to derive ANOVA-like
*F* tests for the overall importance of
various fixed effects using EGLS. Recall
from the paper that the test for the main effects of inoculates can be written
as a joint function of contrasts and where with μ* _{ij.}* is defined as in (6). These two contrasts, labeled as (A1) and (A2) in the
paper, can be jointly written together as a linear function of the elements of

**in (A.2), where For example, the first row of specifies the coefficients for testing as a function of the elements of**

**using (6). In other words, matching up, in order, the first row of in (A.8) with the elements of**

**in (A.2), the corresponding contrast can be rewritten as . Similarly, the second row of in (A.8) specifies the contrast coefficients for as a function of the elements of**

**.**

Recall that the main effects of
times (Factor B) involves a joint test of contrasts and labeled as (B1) and (B2) in the paper, where . In terms of the elements of ** in (A.2), these two contrasts are jointly
specified as withThat is, (A.9) is
another 2 18 contrast matrix, just like ,
where the two rows of specify the coefficients for the contrasts and ,
respectively, as a function of the elements of
in
(6).**

Recall that the interaction between
the effects of inoculates and times was numerator degrees of freedom test based on jointly testing four complementary
and independent contrasts, suggesting that there are four rows that determine
the corresponding contrast matrix. The complete interaction contrast can then be written as where Note that the 4
rows in (A.10) specify contrast coefficients on the model effects for each of
the 4 constituent component hypotheses, (AB1), (AB2), (AB3), and (AB4) as
defined in the paper, when aligned with the coefficients of ** in
(A.2). As a sidenote, the somewhat
uninteresting contrast for dye effects could be written using a contrast vector (not shown) in order to test the overall mean
difference between the two dyes.**

The
EGLS test statistic for testing the overall importance of any fixed effects
term, say , is specified as . Here
is distributed as an *F*-random variable under with the numerator
degrees of freedom being defined by the number of rows of the contrast matrix [27, 32, 35]. The denominator degrees of
freedom for each contrast is dependent upon the design and can be determined
based on that using classical ANOVA as in Table 1
or a multivariate extension
of the Satterthwaite-based procedure from Giesbrecht and Burns
[40] as proposed by Fai and Cornelius
[41]; again this option is
available as *ddfm=kr* using SAS PROC MIXED
(Figure 3). The corresponding EGLS ANOVA output for
ID_REF #30, based on either ANOVA or REML estimation of VC, is provided in
Table 3.

#### Acknowledgment

Support from the Michigan Agricultural Experiment Station (Project MICL 1822) is gratefully acknowledged.

#### References

- N. S. Altman and J. Hua, “Extending the loop design for two-channel microarray experiments,”
*Genetical Research*, vol. 88, no. 3, pp. 153–163, 2006. View at Publisher · View at Google Scholar - F. Bretz, J. Landgrebe, and E. Brunner, “Design and analysis of two-color microarray experiments using linear models,”
*Methods of Information in Medicine*, vol. 44, no. 3, pp. 423–430, 2005. View at Google Scholar - J. S. S. Bueno Filho, S. G. Gilmour, and G. J. M. Rosa, “Design of microarray experiments for genetical genomics studies,”
*Genetics*, vol. 174, no. 2, pp. 945–957, 2006. View at Publisher · View at Google Scholar - F.-S. Chai, C.-T. Liao, and S.-F. Tsai, “Statistical designs for two-color spotted microarray experiments,”
*Biometrical Journal*, vol. 49, no. 2, pp. 259–271, 2007. View at Publisher · View at Google Scholar - K. Dobbin, J. H. Shih, and R. Simon, “Questions and answers on design of dual-label microarrays for identifying differentially
expressed genes,”
*Journal of the National Cancer Institute*, vol. 95, no. 18, pp. 1362–1369, 2003. View at Publisher · View at Google Scholar - G. F. V. Glonek and P. J. Solomon, “Factorial and time course designs for cDNA microarray experiments,”
*Biostatistics*, vol. 5, no. 1, pp. 89–111, 2004. View at Publisher · View at Google Scholar - S. Gupta, “Balanced factorial designs for cDNA microarray experiments,”
*Communications in Statistics: Theory and Methods*, vol. 35, no. 8, pp. 1469–1476, 2006. View at Publisher · View at Google Scholar - K. F. Kerr, “Efficient ${2}^{\text{k}}$ factorial designs for blocks of size 2 with microarray applications,”
*Journal of Quality Technology*, vol. 38, no. 4, pp. 309–318, 2006. View at Google Scholar - J. Landgrebe, F. Bretz, and E. Brunner, “Efficient design and analysis of two colour factorial microarray experiments,”
*Computational Statistics & Data Analysis*, vol. 50, no. 2, pp. 499–517, 2006. View at Publisher · View at Google Scholar - R. J. Tempelman, “Assessing statistical precision, power, and robustness of alternative experimental designs for two color microarray platforms based on mixed effects models,”
*Veterinary Immunology and Immunopathology*, vol. 105, no. 3-4, pp. 175–186, 2005. View at Publisher · View at Google Scholar - S.-F. Tsai, C.-T. Liao, and F.-S. Chai, “Statistical designs for two-color microarray experiments involving technical replication,”
*Computational Statistics & Data Analysis*, vol. 51, no. 3, pp. 2078–2090, 2006. View at Publisher · View at Google Scholar - V. Vinciotti, R. Khanin, D. D'Alimonte et al., “An experimental evaluation of a loop versus a reference design for two-channel microarrays,”
*Bioinformatics*, vol. 21, no. 4, pp. 492–501, 2005. View at Publisher · View at Google Scholar - E. Wit, A. Nobile, and R. Khanin, “Near-optimal designs for dual channel microarray studies,”
*Journal of the Royal Statistical Society: Series C*, vol. 54, no. 5, pp. 817–830, 2005. View at Publisher · View at Google Scholar - G. A. Milliken, K. A. Garrett, and S. E. Travers, “Experimental design for two-color microarrays applied in a pre-existing split-plot experiment,”
*Statistical Applications in Genetics and Molecular Biology*, vol. 6, no. 1, article 20, 2007. View at Publisher · View at Google Scholar - Y. H. Yang, S. Dudoit, P. Luu et al., “Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation,”
*Nucleic Acids Research*, vol. 30, no. 4, pp. 1–10, 2002. View at Publisher · View at Google Scholar - M. K. Kerr and G. A. Churchill, “Statistical design and the analysis of gene expression microarray data,”
*Genetical Research*, vol. 77, no. 2, pp. 123–128, 2001. View at Publisher · View at Google Scholar - R. D. Wolfinger, G. Gibson, E. D. Wolfinger et al., “Assessing gene significance from cDNA microarray expression data via mixed models,”
*Journal of Computational Biology*, vol. 8, no. 6, pp. 625–637, 2001. View at Publisher · View at Google Scholar - W. Jin, R. M. Riley, R. D. Wolfinger, K. P. White, G. Passador-Gurgell, and G. Gibson, “The contributions of sex, genotype and age to transcriptional variance in
*Drosophila melanogaster*,”*Nature Genetics*, vol. 29, no. 4, pp. 389–395, 2001. View at Publisher · View at Google Scholar - G. J. M. Rosa, J. P. Steibel, and R. J. Tempelman, “Reassessing design and analysis of two-colour microarray experiments using mixed effects models,”
*Comparative and Functional Genomics*, vol. 6, no. 3, pp. 123–131, 2005. View at Publisher · View at Google Scholar - J. Zou, S. Rodriguez-Zas, M. Aldea et al., “Expression profiling soybean response to
*Pseudomonas syringae*reveals new defense-related genes and rapid HR-specific downregulation of photosynthesis,”*Molecular Plant-Microbe Interactions*, vol. 18, no. 11, pp. 1161–1174, 2005. View at Publisher · View at Google Scholar - G. A. Milliken and D. E. Johnson,
*Analysis of Messy Data, Volume I: Designed Experiments*, Wadsworth, Belmont, Calif, USA, 1984. - S. R. Searle, G. Casella, and C. E. McCulloch,
*Variance Components*, John Wiley & Sons, New York, NY, USA, 1992. - M. Vuylsteke, F. van Eeuwijk, P. Van Hummelen, M. Kuiper, and M. Zabeau, “Genetic analysis of variation in gene expression in
*Arabidopsis thaliana*,”*Genetics*, vol. 171, no. 3, pp. 1267–1275, 2005. View at Publisher · View at Google Scholar - X. Cui and G. A. Churchill, “How many mice and how many arrays? Replication in mouse cDNA microarray experiments,” in
*Methods of Microarray Data Analysis III*, S. M. Lin and K. F. Johnson, Eds., pp. 139–154, Kluwer Academic Publishers, Norwell, Mass, USA, 2003. View at Google Scholar - H. P. Piepho, A. Büchse, and K. Emrich, “A hitchhiker's guide to mixed models for randomized experiments,”
*Journal of Agronomy and Crop Science*, vol. 189, no. 5, pp. 310–322, 2003. View at Publisher · View at Google Scholar - H. P. Piepho, A. Büchse, and C. Richter, “A mixed modelling approach for randomized experiments with repeated measures,”
*Journal of Agronomy and Crop Science*, vol. 190, no. 4, pp. 230–247, 2004. View at Publisher · View at Google Scholar - J. Spilke, H. P. Piepho, and X. Hu, “Analysis of unbalanced data by mixed linear models using the MIXED procedure of the SAS system,”
*Journal of Agronomy and Crop Science*, vol. 191, no. 1, pp. 47–54, 2005. View at Publisher · View at Google Scholar - D. Nettleton, “A discussion of statistical methods for design and analysis of microarray experiments for plant scientists,”
*The Plant Cell*, vol. 18, no. 9, pp. 2112–2121, 2006. View at Publisher · View at Google Scholar - X. Cui, J. T. G. Hwang, J. Qiu, N. J. Blades, and G. A. Churchill, “Improved statistical tests for differential gene expression by shrinking variance components estimates,”
*Biostatistics*, vol. 6, no. 1, pp. 59–75, 2005. View at Publisher · View at Google Scholar - G. K. Smyth, “Linear models and empirical bayes methods for assessing differential expression in microarray experiments,”
*Statistical Applications in Genetics and Molecular Biology*, vol. 3, no. 1, article 3, 2004. View at Publisher · View at Google Scholar - K. K. Dobbin, E. S. Kawasaki, D. W. Petersen, and R. M. Simon, “Characterizing dye bias in microarray experiments,”
*Bioinformatics*, vol. 21, no. 10, pp. 2430–2437, 2005. View at Publisher · View at Google Scholar - R. C. Littell, “Analysis of unbalanced mixed model data: a case study comparison of ANOVA versus REML/GLS,”
*Journal of Agricultural, Biological, and Environmental Statistics*, vol. 7, no. 4, pp. 472–490, 2002. View at Publisher · View at Google Scholar - S. R. Searle,
*Linear Models for Unbalanced Data*, John Wiley & Sons, New York, NY, USA, 1987. - F. E. Sattherthwaite, “An approximate distribution of estimates of variance components,”
*Biometrics Bulletin*, vol. 2, no. 6, pp. 110–114, 1946. View at Publisher · View at Google Scholar - R. C. Littell, G. A. Milliken, W. W. Stroup, R. D. Wolfinger, and O. Schabenberger,
*SAS for Mixed Models*, SAS Institute, Cary, NC, USA, 2nd edition, 2006. - J. Spilke, H.-P. Piepho, and X. Hu, “A simulation study on tests of hypotheses and confidence intervals for fixed effects in mixed models for blocked experiments with missing data,”
*Journal of Agricultural, Biological, and Environmental Statistics*, vol. 10, no. 3, pp. 374–389, 2005. View at Publisher · View at Google Scholar - W. H. Swallow and J. F. Monahan, “Monte Carlo comparison of ANOVA, MIVQUE, REML, and ML estimators of variance components,”
*Technometrics*, vol. 26, no. 1, pp. 47–57, 1984. View at Publisher · View at Google Scholar - R. N. Kackar and D. A. Harville, “Approximations for standard errors of estimators of fixed and random effects in mixed linear models,”
*Journal of the American Statistical Association*, vol. 79, no. 388, pp. 853–862, 1984. View at Publisher · View at Google Scholar - M. G. Kenward and J. H. Roger, “Small sample inference for fixed effects from restricted maximum likelihood,”
*Biometrics*, vol. 53, no. 3, pp. 983–997, 1997. View at Publisher · View at Google Scholar - F. G. Giesbrecht and J. C. Burns, “Two-stage analysis based on a mixed model: large-sample asymptotic theory and small-sample simulation results,”
*Biometrics*, vol. 41, no. 2, pp. 477–486, 1985. View at Publisher · View at Google Scholar - A. H.-T. Fai and P. L. Cornelius, “Approximate F-tests of multiple degree of freedom hypotheses in generalized least squares analyses of unbalanced split-plot experiments,”
*Journal of Statistical Computation and Simulation*, vol. 54, no. 4, pp. 363–378, 1996. View at Publisher · View at Google Scholar - W. W. Stroup and R. C. Littell, “Impact of variance component estimates on fixed effect inference in unbalanced linear mixed models,” in
*Proceedings of the 14th Annual Kansas State University Conference on Applied Statistics in Agriculture*, pp. 32–48, Manhattan, Kan, USA, April 2002. - J. D. Storey and R. Tibshirani, “Statistical significance for genomewide studies,”
*Proceedings of the National Academy of Sciences of the United States of America*, vol. 100, no. 16, pp. 9440–9445, 2003. View at Publisher · View at Google Scholar - A. S. Hedayat, J. Stufken, and M. Yang, “Optimal and efficient crossover designs when subject effects are random,”
*Journal of the American Statistical Association*, vol. 101, no. 475, pp. 1031–1038, 2006. View at Publisher · View at Google Scholar - J. P. Steibel,
*Improving experimental design and inference for transcription profiling experiments*, thesis, Department of Animal Science, Michigan State University, East Lansing, Mich, USA, 2007. - D. B. Allison, X. Cui, G. P. Page, and M. Sabripour, “Microarray data analysis: from disarray to consolidation and consensus,”
*Nature Reviews Genetics*, vol. 7, no. 1, pp. 55–65, 2006. View at Publisher · View at Google Scholar - I. Lönnstedt and T. Speed, “Replicated microarray data,”
*Statistica Sinica*, vol. 12, no. 1, pp. 31–46, 2002. View at Google Scholar - G. K. Robinson, “That BLUP is a good thing: the estimation of random effects,”
*Statistical Science*, vol. 6, no. 1, pp. 15–51, 1991. View at Publisher · View at Google Scholar - P. Baldi and A. D. Long, “A Bayesian framework for the analysis of microarray expression data: regularized $t$-test and statistical inferences of gene changes,”
*Bioinformatics*, vol. 17, no. 6, pp. 509–519, 2001. View at Publisher · View at Google Scholar - V. G. Tusher, R. Tibshirani, and G. Chu, “Significance analysis of microarrays applied to the ionizing radiation response,”
*Proceedings of the National Academy of Sciences of the United States of America*, vol. 98, no. 9, pp. 5116–5121, 2001. View at Publisher · View at Google Scholar - G. W. Wright and R. M. Simon, “A random variance model for detection of differential gene expression in small microarray experiments,”
*Bioinformatics*, vol. 19, no. 18, pp. 2448–2455, 2003. View at Publisher · View at Google Scholar - G. K. Smyth, J. Michaud, and H. S. Scott, “Use of within-array replicate spots for assessing differential expression in microarray experiments,”
*Bioinformatics*, vol. 21, no. 9, pp. 2067–2075, 2005. View at Publisher · View at Google Scholar - G. Gibson and R. D. Wolfinger, “Gene expression profiling using mixed models,” in
*Genetic Analysis of Complex Traits Using SAS*, A. M. Saxton, Ed., pp. 251–279, SAS Users Press, Cary, NC, USA, 2004. View at Google Scholar