Since Lander and Botstein proposed the interval mapping method for QTL mapping data analysis in 1989, tremendous progress has been made in the last many years to advance new and powerful statistical methods for QTL analysis. Recent research progress has been focused on statistical methods and issues for mapping multiple QTL together. In this article, we review this progress. We focus the discussion on the statistical methods for mapping multiple QTL by maximum likelihood and Bayesian methods and also on determining appropriate thresholds for the analysis.
1. Introduction
Quantitative genetics studies the variation of
quantitative traits and their genetic basis. When R. A. Fisher laid down the
basic theoretical foundations of quantitative genetics, the focus of study was
to partition the overall variation into genetic and environmental ones. With
the development of polymorphic markers for many species, current research
interest is to partition genetic variation to individual quantitative trait
loci (QTL) in the genome as well as interaction among them [1]. A QTL is a chromosomal region that is likely to
contain causal genetic factors for the phenotypic variation under study.
The basic principle of QTL mapping has been
established in Sax's work [2] work in beans. If there is a linkage disequilibrium
(LD) between the causal factor and a marker locus, mean values of the trait
under study will differ among subject groups with different genotypes at the
marker locus [3]. Though this idea is still directly used in certain
settings (e.g., LD-based QTL mapping in unrelated human), the advance of QTL
mapping methodology has allowed simultaneous use of multiple marker information
to improve the accuracy and power to estimate QTL locations and effects. Lander and Botstein [4] presented a likelihood-based framework for interval
mapping (IM), where the putative QTL genotype was conditional upon a pair of
flanking markers' genotypes as well as the phenotype. A least square equivalence
of IM [5] was also proposed where phenotypic values were
regressed onto expected genetic coefficients of a putative QTL. Motivated by
the conditional independency between marker genotypes, composite interval
mapping [6] proposed to introduce additional flanking markers as
covariates into the likelihood function to reduce the confounding effects from
nearby QTL when scanning the current interval. However, most of these methods
were still designed to detect a single QTL at a time based on a statistical
test that a candidate position for a QTL has significant effect or not. The
test was constructed to test each position in a genome and thus created a
genome scan for QTL analysis.
Though intuitive and widely used, these methods are
still insufficient to study the genetic architecture of complex quantitative
traits that are affected by multiple QTL. When a trait is affected by multiple
loci, it is more efficient statistically to search for those QTL together. Also
in order to study epistasis of QTL, multiple QTL need to be analyzed together.
In this setting, QTL analysis is basically a model-selection problem. In this
paper, we discuss recent research progress and outstanding statistical issues
associated with mapping multiple QTL in experimental cross populations.
2. Multiple Interval Mapping (MIM)
Multiple interval mapping is targeted to analyze
multiple QTL with epistasis together through a model selection procedure to
search for the best genetic model for the quantitative trait [1, 7, 8].
For putative causal genes for the trait, the model
of MIM is specified as
where(i) is the phenotypic value of individual , ;(ii) is the mean of the model;(iii) is the main effect of the th putative causal gene, ;(iv) is an indicator variable denoting genotype of the th putative causal gene, which follows a
multinomial distribution conditional upon flanking marker genotypes and genetic
distances;(v) is the possible epistatic effect between the th and the th putative causal genes, assuming there are such effects;(vi) is an environmental effects assumed to be
normally distributed.
As shown by Kao and Zeng [7], Kao et al. [8], given a genetic model (number, location, and
interaction of multiple QTL), this linear model suggests a likelihood function
similar to that in IM but with more complexity. An expectation/maximmization
(EM) algorithm can be used to maximize the likelihood and obtain maximum
likelihood estimates (MLE) of parameters.
The following model-selection method is used to
transverse the genetic model space in QTL cartographer [1, 9, 10].(1)Forward selection of QTL main effects
sequentially. In each cycle of selection, pick the best position of an
additional QTL, and then perform a likelihood ratio test for its main effect.
If a test statistic exceeds the critical value, this effect is retained in the
model. Stop when no more QTL can be found.(2)Search for epistatic effects between QTL main
effects included in the model, and perform likelihood ratio tests on them. If a
test statistic exceeds the critical value, the epistatic effect is retained in
the model. Repeat the process until no more significant epistatic effects can
be found.(3)Reevaluate the significance of each QTL main
effect in the model. If the test statistic for a QTL falls below the
significant threshold conditional on other retained effects, this QTL is
removed from the model. However, if a QTL is involved in a significant
epistatic effect with other QTL, it is not subject to this backward elimination
process. This process is performed stepwisely until no effects can be dropped.(4)Optimize estimates of QTL positions based on
the currently selected model. Instead of performing a multidimensional search
around the regions of current estimates of QTL positions, estimates of QTL
positions are updated in turn for each region. For the th QTL in the model, the region between its
two neighbor QTLs is scanned to find the position that maximizes the likelihood
(conditional on the current estimates of positions of other QTL and QTL
epistasis). This refinement process is repeated sequentially for each QTL
position until there is no change on estimates of QTL positions.
An important issue in model selection is the
significance level to include or eliminate effects. In regression analysis,
such threshold is usually decided based on information criteria, which has the
following general formwhere is the likelihood of data given a genetic
model with parameters, is a penalty function and can take a variety
of forms, such as,
(i),
which is the classical Bayesian information criterion (BIC);(ii), which is Akaike information criteron (AIC);(iii);(iv);(v).
When the penalty for an additional parameter in the
model is low, more QTL and epistatic effects are likely to be included in the
model. Thus it is particularly important to determine an appropriate penalty
function for model selection.
Sequential genome scans require detectable main
effects for the components of interaction effect. An alternative approach,
exhaustive search of all marker combinations, is a computational and statistical
problem even in two dimensions. From a yeast eQTL mapping data with over 6000
expression traits and 112 individuals [11], Storey et al. [12] showed that the sequential search was more powerful than
exhaustive one to detect pair-wise QTL main effects and interaction effects.
However, in a different setting using simulations under a series of
quantitative trait model assumptions, Evans et al. [13] showed that the exhaustive search can be more powerful
than the sequential one with over 1000 individuals in the mapping population and a
Bonferroni correction for 100 000 tests. The inconsistency is partially related
to sample size. A larger sample can make unadjusted values more significant. Witte et al. [14]
showed that the required sample size increases linearly as the number of tests increases logarithmically with a simple
Bonferroni correction.
3. Threshold to Claim QTL
We need to decide the threshold for declaring QTL from the profile of test statistics
across the genome. General asymptotic results for regression and likelihood
ratio tests are not directly applicable for genome scans given the large number
of correlated tests performed in the scans and the limited sample size.
3.1. Type I Error Rate Control
When markers are dense and the sample size is large, Lander and Botstein [4] showed that an appropriate threshold for LOD score
was ,
where solves the equation . is the number of chromosomes of the organism. is the length of the genetic map, measured in
Morgans. is the probability that a random variable from
a distribution is less than .
Churchill and Doerge [15] proposed a method based on permutation tests to find
an empirical threshold specifically for a QTL mapping study. Data were shuffled
by randomly pairing one individual's genotypes with another's phenotypes, in
order to simulate the null hypothesis of no intrinsic relationship between
genotypes and phenotypes. Thus, this method takes into account sample size,
genome size of the organism under study, genetic marker density, segregation
ratio distortions, and missing data.
According to Churchill and Doerge [15], the genome-wide threshold to control type I error
rate for mapping a single trait can be found in the following procedure.(1)Shuffle the data times by randomly pairing trait values with
genotypes. When there are multiple traits under study, these phenotypes should
be shuffled together to keep their correlation structure.(2)Perform mapping analysis and obtain the
maximum test statistic in each of shuffled data. This provides an empirical
distribution of the test statistic for the genome scan at
the null.(3)The percentile of will provide an estimated critical value.
This permutation procedure is equivalent to the
Bonferroni correction for multiple testing when test statistics are
independent. Suppose there are such statistics (for ) from a null distribution . ,
the distribution function of maximum of the statistics, can be expressed as .
When we find a threshold ,
such that ,
it is equivalent to require ,
or ,
the Bonferroni adjusted threshold. When test statistics are correlated, the
permutation method provides a threshold that is less than that from Bonferroni.
A related permutation procedure was also suggested by Doerge and Churchill [16] for mapping procedures that QTLs are declared sequentially
using a forward selection procedure. Two methods were suggested to find a
genome-wide threshold for the second QTL while controlling effects of the first
QTL.(i)Conditional empirical threshold (CET). Mapping
subjects are put into blocks according to the genotype of the marker identified
as (or closest to) the first QTL. Permutation is applied within each block.
Following the procedure described above by Churchill and Doerge [15], maximal test statistic of each genome scan is collected
and CET is obtained. One problem of CET is that markers linked to the first QTL
will continue to show assocation with the trait variation as in the original
data. To avoid CET being elevated by such markers, it is suggested to exclude
the complete chromosome where the first QTL is located when collecting null
statistics.(ii)Residual empirical threshold (RET). The
residues from the genetic model with the first QTL are used as new phenotypic
values to be permuted. Maximal null statistics from genome-wide scans are then
collected to find RET.
Applied to multiple interval mapping, Zeng et al. [1] also suggested to use a residual permutation or
bootstrap test to estimate appropriate threshold for the model selection in each
step of sequential test. In this test, after fitting a model with identified
QTL effects, residuals of individuals are calculated and permutated or
bootstrapped to generate a null model for the conditional test to identify
additional QTL. This threshold is more appropriate for the conditional search,
but computationally more intensive.
3.2. Score-Statistic-Based Resampling Methods
Permutation is a computationally intensive method for generating empirical threshold. Zou et al. [17] suggested that the genome-wide
threshold could be more efficiently computed based on score statistic and a resamplng method. A
score statistic can be computed at each genome position. If we multiply a score
function by a standard normal random variable with mean zero and variance one,
the resulting score statistic mimics that under the null hypothesis. Thus by
multiplying a number of standard normal variables, we can very efficiently
generate an empirical distribution of score statistic under the null. This method
is flexible and can be used to test a null hypothesis in a complex model. A
similar algorithm was also suggested by Seaman and Müller-Myhsok [18]. Conneely and Boehnke [19] extended the approach by replacing the resampling step
with a high dimensional numerical integration.
Unlike these approaches, Nyholt [20] suggested another method that addresses the multiple
testing issue: the number of independent tests across the genome can be
approximately estimated as a function of eigenvalues derived from the
correlation matrix of marker genotypes.
3.3. False Discovery Rate (FDR)
In QTL mapping, as marker coverage in a genome increases, it is less likely that a casual
variant is not in LD with any marker and may be missed. On the other hand, the
number of markers showing significant correlation with the phenotype by chance
is also expected to grow, if the type I error rate for each test is controlled
at a preset level. To handle this multiple testing problem, stringent
family-wise control of type I error is usually applied, which is designed to
control the probability of making at least one false discovery in a genome-wide
test. However, a more powerful approach may be to control false discovery rate
(FDR) [21], or to control the expected proportion of false
discoveries among all the markers passing a threshold. This essentially allows
multiple false positive declarations when many “significant” test statistics
are found. Such a relaxation is driven by the nature of the problem under
study. “It is now often up to the statistician to find as many interesting
features in a data set as possible rather than test a very specific hypothesis
on one item” [22].
According to the notation from Storey [22], Table 1 shows the possible outcomes
when hypotheses are tested. For independent tests, Benjamini and Hochberg [21] provided a procedure (known as linear step-up
procedure or BH procedure) to control expected FDR, that is, at the desired level or (since is generally unknown, a conservative up-bound
estimate was used) as follows:(i)sort values from the smallest to the largest such
that ;(ii)starting from ,
compare with ;(iii)let be the first time ,
reject all through .
Table 1: Possible outcomes from hypothesis tests.
Benjamini and Yekutieli [23] showed that “if test statistics are positively
regression dependent on each hypothesis from the subset corresponding to true
null hypotheses (PRDS), the BH procedure controls FDR at level .” For QTL mapping, PRDS can be interpreted as
follows [24]: if two markers have correlated allele frequencies
and neither is related biologically to the trait, test statistics associated
with the two markers should be positively correlated. Such positive correlation
is intuitively correct and supported by simulation results [24].
To check the performance of BH procedure on FDR
control in genome-wide QTL scan for a single trait, Sabatti et al. [24] considered a simulated case-control study in human. Three susceptibility genes were simulated to affect the disease status. The
genes were assumed to be additive and located on different chromosomes. The
results showed that the BH procedure can control the expected value of the FDR
for single-trait genome-wide scan. For multiple-trait QTL analysis, Benjamini and Yekutieli [25] considered 8 positively or negatively correlated
traits. Using simulation, they showed that BH approach seemed to work for
multiple trait analysis too.
According to Benjamini and Yekutieli [25], to control FDR for QTL analysis in each trait at
level does not always mean that the overall FDR for these
multiple traits is also :
if there are independent and nonheritable traits, the
overall FDR should be .
It is safer to control FDR for all the tests simultaneously.
Yekutieli and Benjamini [26] suggested to make use of dependency structure in
data, rather than treat them as annoying cases. They expected an increase of
testing power when using an empirical true null statistic distribution instead
of assuming some theoretical ones to get values. Empirical null distributions
are used extensively in pFDR and local FDR as discussed below.
Though BH approach is a handy and intuitive tool, it
should be used with caution when applied to QTL mapping. First, BH approach
controls the expected value of FDR. Simulation studies showed that the actual
FDR for a particular QTL mapping dataset can be higher [24]. Second, may be tricky to interpret when is far below than 1. Weller et al. [27] are the first ones to apply FDR criteria in QTL
mapping area. They claimed that of the 10 QTL declared in their study were
probably true by controlling FDR at using BH approach. Zaykin et al. [28] however objected to the interpretation because could be much higher than when was much smaller than 1. It is ,
also known as pFDR discussed in Section 3.4, that really contains the
information about the proportion of false discovery. Weller [29] further argued that if one assumes follows a Poisson distribution, should be very close to 1 when they observed .
In later FDR literature, the assumption that has been widely adopted [25, 30].
3.4. Positive Discovery Rate (pFDR)
pFDR, or ,
was considered less favorable than FDR [21] without the additional term :
we cannot decide an arbitrary threshold , , and guarantee that pFDR regardless of the actual proportion of true
null hypothesis in all tests. For example, when ,
pFDR is always 1 and cannot be controlled at a small .
In this case, however, FDR = pFDR can be controlled at by reducing the rejection region to push ,
and then FDR, towards .
Thus, Benjamini and Yekutieli [25] considered that pFDR should be only estimated after a
fixed rejection threshold was decided; or in QTL mapping, pFDR should estimate
(instead of control) the proportion of true/false QTL, after the null
hypothesis was rejected at linkage peaks by a certain rule (e.g., a type-I
error control procedure).
From the discussion in Section 3.3 we know
that allowing much smaller than 1 might bring trouble in
interpreting the results; and when ,
we might want to see an FDR measure that is close to 1 [31]. This helped to bring up the interest in pFDR. Storey [31] presented a Bayesian interpretation of pFDR based on
values. Assuming there are tests with associated values ,
and(i) denots the th null hypothesis is true, and otherwise;(ii)to model the uncertainty in each hypothesis
test, each is viewed as an identical and independent
random variable from a Bernoulli distribution with and ;(iii) follows a distribution if
and if ;
it follows a mixed distribution unconditionally: ;(iv) is the common rejection region for all , then
Notice that, when and is reasonably large, formula ((5)
is roughly equal to ,
which is the BH approach to estimate FDR. We can also see from formula
((5) that instead of assuming , a data-driven estimate of will give a smaller pFDR estimate or a more
powerful procedure if is estimated smaller than 1. Efron et al. [32] gave an upper bound of by assuming an “accept region” such that :
where are the density functions of ,
respectively. The left-hand side of this equation is equivalent with the
following estimator [30]:
These formulas lead to value, the minimal pFDR when
rejecting a hypothesis with value .
This approach is more powerful than the BH approach [21] in genomics applications. An R package “ value” is
available to convert a list of values to values [30]. The value approach can actually find while controlling a specific pFDR rate besides
finding pFDR given .
However, as discussed above, FDR control in QTL mapping should be applied with
caution.
When are identical and independent distributed,
pFDR becomes identical to the definition of proportion of false positives (PFP)
[22, 33]:PFP is also estimated similarly
as pFDR (cf. formula (5) [33]:Contrary to FDR discussed in
Section 3.3, PFP has a property that when it is controlled at for each of sets of independent tests, the
overall PFP is still . from different sets of tests can have
different distributions. In this case, PFP can be different from pFDR [33].
Currently available procedures to control or estimate
pFDR or PFP may have variable utility in various mapping designs. Chen and Storey [34] noted that the threshold to control FDR at marker
level from one-dimensional genome scan for a single trait could be “dubious”
because FDR is affected by the marker densities along the chromosomes. Since a
true discovery is to claim a QTL at a marker which is in strong LD with a
causal polymorphism, markers that are in strong LD with the true discovery can
be additional true discoveries. Thus, FDR decreases when we genotype more
markers around a true linkage peak. Using simulations, Chen and Storey [34] showed that the threshold is obtained by controlling FDR
varied with the marker density. However, in real applications, people generally
consider all markers surrounding a test statistic peak as parts of one QTL,
rather than distinct positive discoveries. On the other hand, we can still
estimate FDR from values at linkage peaks for different traits that pass
certain cutoff value. This is a common situation from an expression QTL study
where thousands of traits are analyzed together.
Zou and Zuo [35] showed that familywise error rate control via
Bonferroni correction can be more powerful than PFP control. In their
simulation, they assumed 1 to 5 true non-null hypotheses out of 1000
independent tests, corresponding to %,
which might be too pessimistic for certain QTL mapping studies. As we can see
from formula (10), when is so small, has to be very large so that its product with is considerably larger than and there is an acceptable PFP level. When the
density function of is monotonously decreasing in ,
which is quite common in reality, has to be very small to increase the ratio of
power over type-I error. Thus, it is not surprising that such a from PFP would result in a more stringent test
than that under Bonferroni correction. [36] made a similar argument and pointed out that
familywise error rate control was effective when was relatively low, and PFP or pFDR approach
can be more powerful when was high. Again, expression QTL studies stand
out as an example when pFDR is more favorable: in the yeast eQTL mapping data [11], Storey et al. [12] estimated that among the genome-wide maximal test statistic
for each expression trait.
3.5. Local FDR
The Bayesian interpretation of pFDR extends naturally to local FDR [32], denoted as FDR here,
where is the test statistic associated with ; denotes that the th null hypothesis is true, and otherwise; has a density if ,
and if ;
or when hypotheses are mixed together.
There is great similarity between formula
(3) and formula (12). Actually Efron [37] showed that value associated with is equivalent with .
Storey et al. [12] showed that FDR could be estimated within each cycle
of a sequential genome scan for thousands of expression traits; then the
average of the FDR in the rejection region was an estimate of pFDR.
The key part in estimating FDR is to estimate .
It is possible to assume certain standard distribution under the null
hypothesis as and estimate from nonparametric regression [37]. The following procedure is modified from [12, 32]:(1)permute data under null hypothesis times, and obtain test statistics , , ;(2)estimate from and (see below);(3)estimate using formula (7).
is estimated in the following way:(1)pool all and into bins;(2)create an indicator variable ,
let for each and for each ,
thus, in each bin;(3)obtain a smooth estimate of in each bin from an overall regression curve
across bins, by combining natural cubic spline with generalized linear models
for binomially distributed response variables;(4)equate with ,
and get a moment estimate of .
It is noticed that and its associated or are assumed to be from a mixture distribution
in both pFDR and FDR estimation. Thus, as pointed out by Storey [22], there is a connection between multiple hypothesis
testing and classification. For each test, the test procedure is to classify as 0 or 1, or accepted or rejected.
Classification decisions can be made based upon ,
with a rejection region :
if , we classify as 1.
4. Bayesian Methods
Bayesian QTL
mapping methods try to take full account of the uncertainties in QTL number,
location, and effects by studying their joint distributions. Such a method takes
the prior knowledge about these parameters as a prior distribution, reduces the
uncertainty by integrating the information from the data, and expresses the
remaining uncertainty as a posterior distribution of parameters.
Satagopan et al. [38] illustrated the use of Markov chain Monte Carlo
(MCMC) algorithm to generate a sequence of samples from the joint posterior
distribution of QTL parameters conditional upon the number of QTL. Gibbs
sampling algorithm approximates the joint distribution by sampling each
parameter in turn, conditional upon the current values of the rest of
parameters. Conjugate priors can be chosen so that most of these conditional
distributions have explicit distribution functions and can be sampled directly.
Otherwise, Metropolis-Hastings algorithm is to be used. Point estimates of
individual parameters are obtained by taking the averages of the corresponding
marginal distributions. Confidence intervals are obtained as high posterior
density regions.
Sen and Churchill [39] proposed to sample QTL genotypes from their posterior
distribution conditional upon marker genotypes and trait phenotypes. This
multiple imputation method offers a framework to handle several issues in QTL
mapping: nonnormal and multivariate phenotypes, covariates, and genotyping
errors. It differs from usual MCMC procedures in that a two-step approach is
used: first, QTL genotypes are sampled conditioning only on marker genotypes;
then, weights can be calculated as the likelihood of the phenotypes given a
genotype. These genotypes and weights are then used to estimate posterior
distributions.
In both studies, MCMC simulations were conditional
upon the number of QTL () underlying the phenotype. Different values
of were tried and Bayes factors were used to
decide which value of was more plausible. Bayes factor is the ratio
of the probability density of data under two hypotheses. It is a likelihood
ratio in some simple cases. In Bayesian QTL mapping, however, likelihoods are
weighted by prior distributions of parameters under different hypotheses to get
Bayes factors [38].
The development of reversible jump MCMC algorithm [40] suggested one way to treat as a parameter and generate its posterior
distribution. Satagopan and Yandell [41] introduced this method to QTL mapping: when updating
the current value of ,
a single QTL may be added or deleted from the model. Yi et al. [42] extended this method to model interacting QTL by
allowing 3 types of transdimensional exploration: a main effect or epistatic
effect, a single QTL with its effects or a pair of QTL may be added or deleted
in one updating step. However, they also observed low acceptance rates for
transdimensional jump proposals and hence slow mixing of the markov chains and
high computational demand associated with the algorithm.
Yi [43] introduced composite model space approach as a
unified Bayesian QTL mapping frame work. The approach incorporates reversible
jump MCMC as a special case, and turns the transdimensional search into a
fixed-dimensional one. The central idea is to use a fixed length vector to
record the current locations of putative QTL and another indicator vector to
record whether a QTL's main or interaction effect is included or excluded.
These two vectors decide the actual number of QTL. The fixed length sets an
upper bound for the number of detectable QTL. This approach has been
implemented in the R/qtlbim package [44, 45].
Without considering computational efficiency, the
upper bound can actually be fairly large. Unlike multiple regression analysis
based upon least square criteria, which gets in trouble when the number of
explanatory variables is larger than the number of observations, Bayesian
analysis can handle a large number of explanatory variables through a large
number of cycles within each step of Gibbs sampling. Xu [46] showed an example where markers across genome were
used simultaneously in Bayesian linear regression and suggested that the
Bayesian method could result in model selection naturally: markers with weak
effects are “shrunk” so that their posterior mean effects are around zero,
markers with strong effects remain strong.
A closely related procedure is ridge regression, where
the parameters to an usual linear model are estimated as (, , is an identity matrix). The estimator turns
out to be identical to a Bayesian one which assumes that has a prior normal distribution [47]. Ridge regression was initially applied to marker-assisted
selection by Whittaker et al. [48] to reduce the standard errors in estimation. Xu [46] found ridge regression was not applicable to
genome-wide markers simultaneously because each marker demand a different in its prior distribution variance. This
problem of ridge regression can be fixed in a generalized ridge estimator where is a diagonal matrix with different diagonal
elements [49]. However, an iterative procedure is required to find
these ,
which is similar to the MCMC sampling in Bayesian analysis.
A Bayesian QTL mapping analysis usually results in a
posterior distribution over the QTL model space. Numerical characteristics from
such a distribution provide estimates for parameters. Such an estimation is
based upon entire model space, weighted by the posterior probabilities, and
hence is more robust than usual MLE. In terms of hypothesis test, Bayes
factors, together with prior odds, are used to compare two hypotheses. Unlike
values, Bayes factors are calculated under both competing hypotheses; like
values, they have to be compared with some commonly used cutoff values (like
0.05 for values) to decide which hypothesis to prefer [50].
There is very clearly a null hypothesis and an
alternative hypothesis in single QTL scan [4]. In model selection for a multiple QTL model,
however, the number of hypotheses in the model space is huge and the data may
well support many models for a complex trait. Posterior distribution emphasizes
such uncertainty. Picking a better supported model to interpret mapping results
may not fully convey the uncertainty of inference. On the other hand, a
pragmatic mathematical model may choose to simplify the complexity and present
an inference with basic structural characteristics. See Kass and Raftery [50] for more discussion.
Bayesian QTL mapping does not have the multiple
testing issue discussed above. Bayesians agree with the idea to require very
strong evidence to call QTL from genome. However, they believe that the reason is
that the genome is so large that the prior probability that any one variant or
variant combination is causative is very low. Thus, in Bayesian QTL mapping,
the multiple testing issue is handled by the low prior density on any one
marker or low prior odds for any one hypothesis; and such a stringent
requirement is recommended even when exploring a very limited QTL model space
unless there is strong prior knowledge against that [51].
5. Conclusion
To document the
evolution of the statistical approaches for QTL mapping, we attempt to follow
some threads of methodological development on multiple QTL mapping, threshold
determination, and Bayesian QTL mapping methods. We see that this area has been
advanced greatly by the interaction between genotyping technologies and
statistical methodologies in the last several years, and will continue to be so
in the future. The availability of efficient computational algorithms/softwares
is essential to the QTL mapping community. However, it is equally important
that these tools are applied with thorough understanding of the genetic data
and the tools themselves.
Acknowledgment
This work was partially supported by the USDA Cooperative State Research, Education and
Extension Service, Grant no. 2005-00754.