Methodology Report | Open Access
EM Algorithm for Mapping Quantitative Trait Loci in Multivalent Tetraploids
Multivalent tetraploids, which include many plant species such as potato, sugarcane, and rose, are of paramount importance to agricultural production and biological research. Quantitative trait locus (QTL) mapping in multivalent tetraploids is challenged by their unique cytogenetic properties, such as double reduction. We develop a statistical method for mapping QTLs in multivalent tetraploids that takes these cytogenetic properties into account. The method is built in a mixture model-based framework and implemented with the EM algorithm. It allows the simultaneous estimation of QTL positions, QTL effects, the chromosomal pairing factor, and the degree of double reduction, as well as an assessment of the precision of these estimates. We used simulated data to examine the statistical properties of the method and validate its use. The new method and its software will provide a useful tool for QTL mapping in multivalent tetraploids that undergo double reduction.
Genetic analysis in polyploids has received considerable interest in recent years because of their biological and economic importance [1–3]. Genetic linkage maps constructed from molecular markers have been published for several major polyploids [4–10]. Statistical models for linkage analysis and map construction that consider the unique biological properties of polyploids have been developed [11–14]. For bivalent polyploids, Wu et al. [15, 16] incorporated the so-called chromosomal pairing preference [17] into the linkage analysis framework to increase the biological relevance of linkage mapping models. Several statistical models have also been developed to map quantitative trait loci (QTLs) in bivalent polyploids [18, 19].
There is also a group of polyploids, called multivalent polyploids, in which chromosomes pair among more than two homologous copies at meiosis, rather than between only two copies as in bivalent polyploids. Multivalent polyploids mostly originate from the duplication of similar genomes and, for this reason, they are called autopolyploids [20, 21]. A consequence of multivalent pairing in autopolyploids is double reduction, that is, the sorting of two sister chromatids of a chromosome into the same gamete [22]. Fisher [23] proposed a conceptual model that characterizes the probabilities of 11 different modes of gamete formation for a quadrivalent polyploid in terms of the recombination fraction between two loci and their double reductions. Wu et al. [24] used Fisher's model to derive an EM algorithm for estimating the linkage between fully informative markers. Wu and Ma [25] extended this model to analyze any type of marker, regardless of its informativeness or its dominant or codominant nature. The main advantage of the models by Wu and colleagues lies in their generality, flexibility, and robustness.
In this paper, we develop a statistical method for QTL mapping in multivalent tetraploids based on Fisher's [23] 11 classes of gamete formation. The method allows the estimation and testing not only of the QTL-marker linkage but also of the extent of double reduction at the QTL. Because classification analyses of gamete formation are inherently complex, we focus on the modeling and analysis of one-marker/one-QTL associations. We derive a two-stage hierarchical model within the maximum likelihood framework, implemented with the EM algorithm: the upper hierarchy estimates the probabilities of the gamete formation modes, and hence of double reduction, and the lower hierarchy estimates the marker-QTL recombination fraction. We apply the method to a simulated data set, and the results demonstrate its statistical properties and its analytical and biological merits.
2.1. Genetic Design
Consider a heterozygous multivalent tetraploid line crossed with a homozygous line to generate a so-called pseudotest backcross population. In such a population, the genotypes of the progeny are consistent with the genotypes of the gametes produced by the heterozygous parent; therefore, mapping models can be derived from the segregation of gametes. Suppose there are $n$ individuals in the pseudotest backcross population. A panel of codominant markers is typed for each individual, and a linkage map is constructed from these markers. All pseudotest backcross individuals are phenotyped for a quantitative trait that is assumed to be controlled by QTLs.
2.2. Tetrasomic Co-Inheritance
To simplify the analysis, we assume that the QTLs underlying the trait are mapped with single markers. Let $M_1$, $M_2$, $M_3$, and $M_4$ be the four alleles at a marker $\mathcal{M}$, and let $Q_1$, $Q_2$, $Q_3$, and $Q_4$ be the four alleles at a QTL $\mathcal{Q}$ linked with the marker. The marker and QTL are linked with a recombination fraction of $r$. Because of double reduction, the multivalent tetraploid generates 10 diploid gametes at each locus, arrayed as ($M_1M_1$, $M_2M_2$, $M_3M_3$, $M_4M_4$, $M_1M_2$, $M_1M_3$, $M_1M_4$, $M_2M_3$, $M_2M_4$, $M_3M_4$) for the marker and ($Q_1Q_1$, $Q_2Q_2$, $Q_3Q_3$, $Q_4Q_4$, $Q_1Q_2$, $Q_1Q_3$, $Q_1Q_4$, $Q_2Q_3$, $Q_2Q_4$, $Q_3Q_4$) for the QTL. In each case, the first four gametes arise from double reduction, whereas the remaining six arise from chromosome pairing. Let $\alpha$ and $\beta$ be the frequencies of double reduction at the marker and the QTL, respectively. The frequency of double reduction is constant for a given locus, with a value that depends on the locus's distance from the centromere.
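As a minimal sketch of the single-locus gamete array described above, the Python function below enumerates the 10 diploid gametes of one tetrasomic locus and attaches frequencies to them. It assumes the standard frequencies $\alpha/4$ for each double-reduction gamete and $(1-\alpha)/6$ for each gamete from chromosome pairing, where $\alpha$ is the frequency of double reduction at the locus; the function name `tetraploid_gametes` and the allele coding 1 to 4 are illustrative, not taken from the authors' software.

```python
from itertools import combinations


def tetraploid_gametes(alpha):
    """Return {gamete: frequency} for the 10 diploid gametes of one locus.

    Alleles are coded 1-4, so (1, 1) stands for A1A1 and (1, 2) for A1A2.
    alpha is the frequency of double reduction at the locus (assumed model:
    alpha / 4 per double-reduction gamete, (1 - alpha) / 6 per paired gamete).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    alleles = (1, 2, 3, 4)
    gametes = {}
    # The first four gametes arise from double reduction
    # (two sister chromatids sorted into the same gamete).
    for i in alleles:
        gametes[(i, i)] = alpha / 4
    # The remaining six gametes arise from chromosome pairing.
    for i, j in combinations(alleles, 2):
        gametes[(i, j)] = (1 - alpha) / 6
    return gametes


freqs = tetraploid_gametes(0.15)
print(len(freqs), round(sum(freqs.values()), 12))  # 10 1.0
```

Setting `alpha = 0` recovers random chromosome segregation without double reduction, in which each of the six paired gametes has frequency 1/6.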
When the marker and QTL co-segregate in a multivalent tetraploid, there are 136 diploid gamete formation mechanisms, although only 100 two-locus gamete genotypes (10 marker gametes × 10 QTL gametes) are observable. Based on the presence or absence of double reduction and the number of recombination events, Fisher [23] classified these 136 formation mechanisms into 11 gamete modes. Of these 11 modes, however, only nine can be observed, each with a frequency denoted by $g_k$ ($k = 1, \ldots, 9$). Wu et al. [24] rearranged these nine observable gamete modes in matrix form (matrix (1)), in which $g_1$ and $g_2$ are associated with double reduction at both the marker and the QTL, $g_3$ and $g_4$ with double reduction only at the QTL, $g_5$ and $g_6$ with double reduction only at the marker, and $g_7$, $g_8$, and $g_9$ with no double reduction. Most cells of matrix (1) involve a fixed number (zero, one, or two) of recombination events. Two cells, however, are each a mixture of two different gamete formation mechanisms, or configurations, whose relative proportions depend on the degree of linkage. Because the two configurations contain different numbers of recombination events, the expected number of recombination events in such a cell, that is, in an observable gamete genotype, is the weighted average of the numbers of recombination events of its configurations. Wu et al. [24] used a matrix form (matrix (2)) to count the expected number of recombination events for each observable gamete genotype. Based on matrices (1) and (2), the frequencies of double reduction ($\alpha$ and $\beta$) and the recombination fraction $r$ can be expressed in terms of the $g_k$ (expression (3)).
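The classification of the 100 observable two-locus gamete genotypes into nine cells can be made concrete with a short Python sketch. It is a reconstruction under stated assumptions, not the authors' code: the nine classes are numbered 1 to 9 following the grouping in the text (two classes with double reduction at both loci, two with double reduction only at the QTL, two only at the marker, and three with none), but this numbering is hypothetical and may not match the original matrix (1); each frequency is treated as the probability of a single cell of its class; and, when counting recombinant haplotypes, marker allele $M_i$ is assumed to lie on the same parental chromosome as QTL allele $Q_i$. Under these assumptions, $\alpha$ and $\beta$ are the total probabilities of the cells whose marker or QTL gamete, respectively, is double-reduced.

```python
from collections import Counter
from itertools import combinations

ALLELES = (1, 2, 3, 4)
# Ten gametes per locus: four from double reduction, then six from pairing.
GAMETES = [(i, i) for i in ALLELES] + list(combinations(ALLELES, 2))


def is_double_reduced(gamete):
    return gamete[0] == gamete[1]


def cell_class(mg, qg):
    """Hypothetical 1-9 label of the cell (marker gamete mg, QTL gamete qg)."""
    shared = len(set(mg) & set(qg))  # allele indices shared by the two gametes
    if is_double_reduced(mg) and is_double_reduced(qg):
        return 1 if shared else 2      # double reduction at both loci
    if is_double_reduced(qg):
        return 3 if shared else 4      # double reduction only at the QTL
    if is_double_reduced(mg):
        return 5 if shared else 6      # double reduction only at the marker
    return {2: 7, 1: 8, 0: 9}[shared]  # no double reduction


def recombinant_counts(mg, qg):
    """Numbers of recombinant haplotypes over the two ways to pair the alleles.

    Two distinct values mark a cell that mixes two configurations.
    """
    configs = [((mg[0], qg[0]), (mg[1], qg[1])),
               ((mg[0], qg[1]), (mg[1], qg[0]))]
    return sorted({sum(m != q for m, q in cfg) for cfg in configs})


CELLS = Counter(cell_class(m, q) for m in GAMETES for q in GAMETES)


def double_reduction(g):
    """alpha and beta from per-cell frequencies g = {class: frequency}."""
    alpha = sum(CELLS[c] * g[c] for c in (1, 2, 5, 6))  # marker double-reduced
    beta = sum(CELLS[c] * g[c] for c in (1, 2, 3, 4))   # QTL double-reduced
    return alpha, beta


print(sorted(CELLS.items()))
# [(1, 4), (2, 12), (3, 12), (4, 12), (5, 12), (6, 12), (7, 6), (8, 24), (9, 6)]
print(recombinant_counts((1, 2), (1, 2)), recombinant_counts((1, 2), (1, 3)))
# [0, 2] [1, 2]
```

In this labelling, the two mixture cells are classes 7 ($M_iM_jQ_iQ_j$, zero or two recombinants) and 8 ($M_iM_jQ_iQ_k$, one or two recombinants), which is why an expected recombination count for them must be a weighted average.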
2.3. Quantitative Genetic Model
For a given QTL, there are 10 different QTL gamete genotypes in the multivalent tetraploid, whose genotypic values (expression (5)) can be partitioned into additive and dominant genetic effects of different types. In this partition, $\mu$ is the overall mean; $a_2$, $a_3$, and $a_4$ are the additive genetic effects of alleles $Q_2$, $Q_3$, and $Q_4$ relative to allele $Q_1$; and $d_{12}$, $d_{13}$, $d_{14}$, $d_{23}$, $d_{24}$, and $d_{34}$ are the dominant genetic effects due to interactions between alleles $Q_1$ and $Q_2$, $Q_1$ and $Q_3$, $Q_1$ and $Q_4$, $Q_2$ and $Q_3$, $Q_2$ and $Q_4$, and $Q_3$ and $Q_4$, respectively.
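Because expression (5) relates 10 genotypic values to 10 parameters (the overall mean, three additive effects, and six dominant effects), it can be solved for the overall mean (expression (6)), the additive effects (expressions (7)–(9)), and the dominant effects (expressions (10)–(15)) in terms of the genotypic values.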
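The exact form of expression (5) is not reproduced here. As an illustration only, the Python sketch below assumes one common parametrization with the same parameter count (1 + 3 + 6 = 10): the double-reduced genotype $Q_iQ_i$ has value $\mu + 2a_i$, the genotype $Q_iQ_j$ has value $\mu + a_i + a_j + d_{ij}$, and $a_1 = 0$ because $Q_1$ is the reference allele. Under this assumption $\mu$ equals the value of $Q_1Q_1$, which may differ from how the paper defines the overall mean. The second function inverts the map, playing the role of expressions (6)–(15); both function names are hypothetical.

```python
from itertools import combinations

PAIRS = list(combinations(range(1, 5), 2))  # (1, 2), (1, 3), ..., (3, 4)


def genotypic_values(mu, a, d):
    """Genotypic values of the 10 QTL genotypes (assumed parametrization).

    mu: value of the reference genotype Q1Q1
    a:  {2: a2, 3: a3, 4: a4}, additive effects relative to Q1 (a1 = 0)
    d:  {(i, j): d_ij} for the six allele pairs i < j
    """
    add = {1: 0.0, **a}
    values = {(i, i): mu + 2 * add[i] for i in range(1, 5)}
    for i, j in PAIRS:
        values[(i, j)] = mu + add[i] + add[j] + d[(i, j)]
    return values


def solve_effects(values):
    """Invert genotypic_values: recover mu, additive and dominant effects."""
    mu = values[(1, 1)]
    a = {i: (values[(i, i)] - mu) / 2 for i in (2, 3, 4)}
    d = {(i, j): values[(i, j)] - (values[(i, i)] + values[(j, j)]) / 2
         for i, j in PAIRS}
    return mu, a, d


effects = {2: 0.5, 3: -0.3, 4: 0.2}
dominance = {(1, 2): 0.1, (1, 3): 0.2, (1, 4): -0.1,
             (2, 3): 0.05, (2, 4): 0.0, (3, 4): 0.3}
values = genotypic_values(1.0, effects, dominance)
mu, a, d = solve_effects(values)  # recovers the inputs up to rounding
```

The round trip shows why 10 genotypic values identify exactly 10 effect parameters: each additive effect is read off a double-reduced genotype, and each dominant effect is the departure of a heterozygote from the midpoint of its two double-reduced genotypes.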
2.4. EM Algorithm
Ignoring the effects of other covariates, the phenotypic value $y_i$ of individual $i$ ($i = 1, \ldots, n$, where $n$ is the sample size) in the pseudotest backcross can be expressed in terms of the QTL effect and residual error as

$$y_i = \sum_{k=1}^{10} \xi_{ik}\, \mu_k + e_i, \qquad (16)$$

where $\xi_{ik}$ is an indicator variable defined as 1 if individual $i$ has QTL genotype $k$ ($k = 1, \ldots, 10$) and 0 otherwise, $\mu_k$ is the genotypic value of QTL genotype $k$ as defined in (5), and $e_i$ is the residual error, assumed to be normally distributed with mean zero and variance $\sigma^2$. We use $\Omega$ to denote the vector of unknown parameters.
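In a QTL mapping experiment, marker genotypes are observable. Let $l_i$ denote the observed marker genotype of individual $i$ ($l_i \in \{1, \ldots, 10\}$). Within the mixture model framework, the likelihood of the phenotypic data ($\mathbf{y}$) and the marker data ($\mathbf{M}$) is

$$L(\Omega \mid \mathbf{y}, \mathbf{M}) = \prod_{i=1}^{n} \left[ \sum_{k=1}^{10} \omega_{k|l_i}\, f_k(y_i) \right], \qquad (17)$$

where $\omega_{k|l_i}$ is the conditional probability of QTL genotype $k$ given marker genotype $l_i$, and $f_k(y_i)$ is a normal density with mean $\mu_k$ (the genotypic value of QTL genotype $k$) and variance $\sigma^2$. The prior conditional probability $\omega_{k|l}$ is calculated as the frequency of the joint marker-QTL genotype formed by marker genotype $l$ and QTL genotype $k$, expressed in terms of the nine probabilities in matrix (1), divided by the frequency of marker genotype $l$. With $\alpha$ denoting the frequency of double reduction at the marker, the marker genotype frequencies are $\alpha/4$ for each of the double reduction gametes $M_1M_1$, $M_2M_2$, $M_3M_3$, and $M_4M_4$, and $(1 - \alpha)/6$ for each of the nondouble reduction gametes $M_1M_2$, $M_1M_3$, $M_1M_4$, $M_2M_3$, $M_2M_4$, and $M_3M_4$.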
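A minimal Python sketch of the mixture likelihood (17) follows: each individual contributes the log of a weighted sum of normal densities, and the weights $\omega_{k|l}$ (the conditional probability of QTL genotype $k$ given marker genotype $l$) are obtained by dividing each row of a 10 × 10 table of joint marker-QTL genotype frequencies by its row sum, which is the marker genotype frequency. The function names and the data layout (a list of `(l, y)` pairs, with marker genotype index `l` from 0 to 9) are illustrative assumptions.

```python
import math


def normal_pdf(y, mean, var):
    """Density of N(mean, var) evaluated at y."""
    return math.exp(-(y - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def conditional_probs(joint):
    """omega[l][k] = P(QTL genotype k | marker genotype l).

    joint is a 10 x 10 list of joint marker-QTL genotype frequencies; each
    row sum is the frequency of that marker genotype (assumed positive).
    """
    omega = []
    for row in joint:
        marker_freq = sum(row)
        omega.append([p / marker_freq for p in row])
    return omega


def log_likelihood(data, omega, mu, sigma2):
    """Mixture log-likelihood for data = [(l, y), ...]."""
    total = 0.0
    for l, y in data:
        # Weighted sum of the 10 normal densities for this individual.
        mixture = sum(w * normal_pdf(y, m, sigma2) for w, m in zip(omega[l], mu))
        total += math.log(mixture)
    return total


joint = [[0.01] * 10 for _ in range(10)]  # placeholder: uniform joint table
omega = conditional_probs(joint)
ll = log_likelihood([(0, 0.0)], omega, [0.0] * 10, 1.0)
```

Working on the log scale keeps the product over individuals numerically stable; the placeholder table is uniform only to make the example easy to check, not because real marker-QTL frequencies look like this.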
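The estimates of the unknown parameters that maximize the likelihood (17) can be obtained with the EM algorithm. In the E step, we calculate the posterior probability that individual $i$, with observed marker genotype $l_i$, carries QTL genotype $k$:

$$\Pi_{k|l_i} = \frac{\omega_{k|l_i}\, f_k(y_i)}{\sum_{k'=1}^{10} \omega_{k'|l_i}\, f_{k'}(y_i)}, \qquad (18)$$

where $\omega_{k|l_i}$ and $f_k(y_i)$ are the conditional probability and normal density defined in (17).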
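The E step (18) can be sketched in Python as follows: for each individual, the products $\omega_{k|l_i} f_k(y_i)$ are normalized over the 10 QTL genotypes. The name `e_step` and the inputs (a list of `(l, y)` pairs, a 10 × 10 list `omega` of conditional probabilities, the 10 genotypic values `mu`, and the residual variance `sigma2`) are illustrative assumptions.

```python
import math


def normal_pdf(y, mean, var):
    """Density of N(mean, var) evaluated at y."""
    return math.exp(-(y - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def e_step(data, omega, mu, sigma2):
    """Posterior probabilities Pi_{k|l_i} of the 10 QTL genotypes per individual.

    data:   list of (l, y) pairs, l = observed marker genotype index (0-9)
    omega:  omega[l][k] = P(QTL genotype k | marker genotype l)
    mu:     the 10 genotypic values; sigma2: residual variance
    """
    posteriors = []
    for l, y in data:
        weights = [w * normal_pdf(y, m, sigma2) for w, m in zip(omega[l], mu)]
        total = sum(weights)  # the individual's mixture density
        posteriors.append([w / total for w in weights])
    return posteriors


omega = [[0.5, 0.5] + [0.0] * 8 for _ in range(10)]
mu = [0.0, 2.0] + [5.0] * 8
post = e_step([(3, 1.0), (0, 0.0)], omega, mu, 1.0)
```

In the example, a phenotype of 1.0 lies midway between the two candidate genotypic values, so the posterior splits evenly between them; a genotype with prior probability zero always receives posterior probability zero.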
In the M step, we update the frequencies of the nine observable gamete modes from the posterior probabilities (expression (19)); these in turn yield updated estimates of the frequencies of double reduction (expression (20)).
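One way such an M step can be written is sketched below in Python. It is not a transcription of (19) and (20) and rests on explicit assumptions: each of the nine frequencies $g_1, \ldots, g_9$ is the probability of a single cell of the 10 × 10 marker-QTL table and is shared by every cell in its class; the class numbering is hypothetical (double reduction at both loci, classes 1 and 2; only at the QTL, 3 and 4; only at the marker, 5 and 6; none, 7 to 9); and the frequencies satisfy $\sum_k c_k g_k = 1$, where $c_k$ is the number of cells in class $k$. Maximizing the expected complete-data log-likelihood under this constraint gives $g_k = (\text{expected count in class } k)/(n\, c_k)$, and the double-reduction frequencies $\alpha$ (marker) and $\beta$ (QTL) follow by summing over the relevant cells.

```python
from collections import Counter
from itertools import combinations

ALLELES = (1, 2, 3, 4)
GAMETES = [(i, i) for i in ALLELES] + list(combinations(ALLELES, 2))


def cell_class(mg, qg):
    """Hypothetical 1-9 class of the cell (marker gamete mg, QTL gamete qg)."""
    shared = len(set(mg) & set(qg))
    m_dr, q_dr = mg[0] == mg[1], qg[0] == qg[1]
    if m_dr and q_dr:
        return 1 if shared else 2
    if q_dr:
        return 3 if shared else 4
    if m_dr:
        return 5 if shared else 6
    return {2: 7, 1: 8, 0: 9}[shared]


CLASS = [[cell_class(m, q) for q in GAMETES] for m in GAMETES]  # CLASS[l][k]
CELLS = Counter(c for row in CLASS for c in row)                 # c_k


def m_step_modes(markers, posteriors):
    """Update g_1..g_9 and the double-reduction frequencies alpha, beta.

    markers:    observed marker genotype index l_i (0-9) of each individual
    posteriors: posterior rows Pi_{k|l_i} from the E step
    """
    n = len(markers)
    expected = Counter()
    for l, post in zip(markers, posteriors):
        for k, p in enumerate(post):
            expected[CLASS[l][k]] += p  # expected number of individuals per class
    g = {c: expected[c] / (n * CELLS[c]) for c in CELLS}
    alpha = sum(CELLS[c] * g[c] for c in (1, 2, 5, 6))  # marker double-reduced
    beta = sum(CELLS[c] * g[c] for c in (1, 2, 3, 4))   # QTL double-reduced
    return g, alpha, beta


post0 = [1.0] + [0.0] * 9              # individual 0 carries Q1Q1
post1 = [0.0] * 4 + [1.0] + [0.0] * 5  # individual 1 carries Q1Q2
g, alpha, beta = m_step_modes([0, 4], [post0, post1])
```

Under these assumptions the updated $\alpha$ is simply the observed proportion of double-reduced marker genotypes, which is consistent with the statement that, for fully informative markers, double reduction at the marker can be estimated analytically.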
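The genotypic value of QTL genotype $k$ and the residual variance are estimated by

$$\hat{\mu}_k = \frac{\sum_{i=1}^{n} \Pi_{k|l_i}\, y_i}{\sum_{i=1}^{n} \Pi_{k|l_i}}, \qquad (21)$$

$$\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} \sum_{k=1}^{10} \Pi_{k|l_i} \left( y_i - \hat{\mu}_k \right)^2, \qquad (22)$$

where $\Pi_{k|l_i}$ is the posterior probability from (18). The algorithm iterates between the E step, (3) and (18), and the M step, (19)–(22), until the estimates stabilize. The stable estimates are taken as the maximum likelihood estimates (MLEs) of the parameters.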
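Putting the steps together, the self-contained Python sketch below iterates the E and M steps until the genotypic values and residual variance stabilize. It illustrates the structure of the iteration rather than reproducing the authors' software, and it relies on stated assumptions: a hypothetical numbering 1 to 9 of the gamete-mode classes (double reduction at both loci, 1 and 2; only at the QTL, 3 and 4; only at the marker, 5 and 6; none, 7 to 9), one shared frequency per class, uniform starting frequencies, and user-supplied starting genotypic values. Because $\omega_{k|l}$ is a cell frequency divided by the marker genotype frequency, and that divisor is the same for every $k$ in a row, the E step can normalize the cell frequencies directly.

```python
import math
import random
from collections import Counter
from itertools import combinations

ALLELES = (1, 2, 3, 4)
GAMETES = [(i, i) for i in ALLELES] + list(combinations(ALLELES, 2))


def cell_class(mg, qg):
    """Hypothetical 1-9 class of the cell (marker gamete mg, QTL gamete qg)."""
    shared = len(set(mg) & set(qg))
    m_dr, q_dr = mg[0] == mg[1], qg[0] == qg[1]
    if m_dr and q_dr:
        return 1 if shared else 2
    if q_dr:
        return 3 if shared else 4
    if m_dr:
        return 5 if shared else 6
    return {2: 7, 1: 8, 0: 9}[shared]


CLASS = [[cell_class(m, q) for q in GAMETES] for m in GAMETES]
CELLS = Counter(c for row in CLASS for c in row)


def normal_pdf(y, mean, var):
    return math.exp(-(y - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em(markers, ys, mu, sigma2, tol=1e-8, max_iter=500):
    """Hypothetical EM driver; markers are indices 0-9, mu has length 10."""
    n = len(ys)
    g = {c: 1.0 / 100 for c in CELLS}  # uniform start: every cell equally likely
    mu = list(mu)
    for _ in range(max_iter):
        # E step: within row l, g[CLASS[l][k]] is proportional to omega_{k|l}.
        posts = []
        for l, y in zip(markers, ys):
            w = [g[CLASS[l][k]] * normal_pdf(y, mu[k], sigma2) for k in range(10)]
            s = sum(w)
            posts.append([x / s for x in w])
        # M step: gamete-mode frequencies.
        expected = Counter()
        for l, post in zip(markers, posts):
            for k, p in enumerate(post):
                expected[CLASS[l][k]] += p
        g = {c: expected[c] / (n * CELLS[c]) for c in CELLS}
        # M step: genotypic values (21) and residual variance (22).
        weight = [sum(p[k] for p in posts) for k in range(10)]
        new_mu = [sum(p[k] * y for p, y in zip(posts, ys)) / weight[k]
                  if weight[k] > 0 else mu[k] for k in range(10)]
        new_s2 = sum(p[k] * (y - new_mu[k]) ** 2
                     for p, y in zip(posts, ys) for k in range(10)) / n
        done = (max(abs(u - v) for u, v in zip(new_mu, mu)) < tol
                and abs(new_s2 - sigma2) < tol)
        mu, sigma2 = new_mu, new_s2
        if done:
            break
    alpha = sum(CELLS[c] * g[c] for c in (1, 2, 5, 6))
    beta = sum(CELLS[c] * g[c] for c in (1, 2, 3, 4))
    return g, mu, sigma2, alpha, beta


rng = random.Random(1)
markers = [rng.randrange(10) for _ in range(60)]
ys = [rng.gauss(1.0 if l < 4 else 0.0, 1.0) for l in markers]
g, mu, s2, alpha, beta = em(markers, ys, [0.5 * k for k in range(10)], 1.0)
```

The toy data are arbitrary and serve only to exercise the loop. In practice, several starting values would be tried, because EM converges to a local maximum of the likelihood.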
2.5. Hypothesis Testing
Following parameter estimation, several hypotheses should be tested. The hypothesis about the presence of a QTL segregating in the pseudotest backcross is formulated as

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_{10} \quad \text{vs.} \quad H_1: \text{at least one of the equalities in } H_0 \text{ does not hold}.$$

The log-likelihood ratio, $\mathrm{LR} = -2\left[\ln L_0 - \ln L_1\right]$, is calculated from the maximized likelihoods under the null and alternative hypotheses. However, the distribution of the LR is not known because the mixture model (17) violates the usual regularity conditions. For this reason, a commonly used empirical approach, permutation testing, in which the relationship between marker genotypes and phenotypes is reshuffled, is used to determine the critical threshold for judging whether there is a QTL for the trait.
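The permutation procedure can be sketched in Python as follows. The statistic passed to `permutation_threshold` should be the LR obtained by fitting the mixture model under $H_0$ and $H_1$; to keep the sketch short and runnable, the example substitutes a simpler, hypothetical stand-in (`marker_means_lr`, a model with one mean per marker genotype against a model with a single mean), which is not the test described in the text. `null_log_likelihood` gives the maximized log-likelihood when all ten genotypic values are equal, which is the $H_0$ fit for the mixture model as well. All function names are illustrative.

```python
import math
import random


def null_log_likelihood(ys):
    """Maximized log-likelihood when all genotypic values are equal (H0)."""
    n = len(ys)
    mean = sum(ys) / n
    s2 = sum((y - mean) ** 2 for y in ys) / n  # MLE of the residual variance
    return -0.5 * n * (math.log(2 * math.pi * s2) + 1)


def lr_statistic(loglik_null, loglik_alt):
    """LR = -2 [ln L0 - ln L1]."""
    return -2.0 * (loglik_null - loglik_alt)


def marker_means_lr(markers, ys):
    """Hypothetical stand-in statistic: one mean per marker genotype vs one mean."""
    groups = {}
    for l, y in zip(markers, ys):
        groups.setdefault(l, []).append(y)
    n = len(ys)
    within = sum(sum((y - sum(v) / len(v)) ** 2 for y in v)
                 for v in groups.values()) / n
    loglik_alt = -0.5 * n * (math.log(2 * math.pi * within) + 1)
    return lr_statistic(null_log_likelihood(ys), loglik_alt)


def permutation_threshold(markers, ys, statistic, n_perm=1000, level=0.05, seed=0):
    """Empirical (1 - level) quantile of the statistic under shuffled phenotypes."""
    rng = random.Random(seed)
    shuffled = list(ys)
    null_stats = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the marker-phenotype relationship
        null_stats.append(statistic(markers, shuffled))
    null_stats.sort()
    return null_stats[math.ceil((1 - level) * n_perm) - 1]


markers = [i % 10 for i in range(100)]
ys = [float(l) + 0.1 * ((3 * i) % 7) for i, l in enumerate(markers)]
observed = marker_means_lr(markers, ys)
threshold = permutation_threshold(markers, ys, marker_means_lr, n_perm=200)
```

A QTL would be declared when the observed statistic exceeds the threshold; in a genome-wide scan, each permutation would record the maximum statistic over all markers rather than a single value.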
After a significant QTL is detected, the next hypothesis concerns the additive genetic effects of the QTL. These can be tested with the null hypothesis $H_0: a_2 = a_3 = a_4 = 0$ (the additive effects of alleles $Q_2$, $Q_3$, and $Q_4$ relative to $Q_1$), under which the genotypic values of the QTL genotypes are estimated with the EM algorithm described above, but with three constraints derived from (7), (8), and (9) imposed. Similarly, the dominant genetic effects can be tested with the null hypothesis $H_0: d_{12} = d_{13} = d_{14} = d_{23} = d_{24} = d_{34} = 0$, with the genotypic values estimated under the constraints derived from (10)–(15). Each of these genetic effects can also be tested individually.
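Under an assumed parametrization in which the double-reduced genotype $Q_iQ_i$ has genotypic value $\mu_{ii} = \mu + 2a_i$, the genotype $Q_iQ_j$ has $\mu_{ij} = \mu + a_i + a_j + d_{ij}$, and $a_1 = 0$ (an illustration, not necessarily the paper's expression (5)), the two null hypotheses become linear constraints on the genotypic values, which is the form in which they would be imposed during estimation:

```latex
\begin{aligned}
H_0^{(a)}:\; a_2 = a_3 = a_4 = 0
  &\iff \mu_{11} = \mu_{22} = \mu_{33} = \mu_{44}, \\
H_0^{(d)}:\; d_{ij} = 0 \;\; (1 \le i < j \le 4)
  &\iff \mu_{ij} = \tfrac{1}{2}\left(\mu_{ii} + \mu_{jj}\right) \;\; (1 \le i < j \le 4).
\end{aligned}
```

Under $H_0^{(a)}$, for example, the four separate estimates of $\mu_{11}, \ldots, \mu_{44}$ from (21) would be replaced by a single posterior-weighted mean pooled over the four double-reduced genotypes, while the six heterozygous genotypic values remain free.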
2.6. Application to Simulated Data
We hypothesized a pseudotest backcross for a multivalent tetraploid in which a marker is linked with a QTL that affects a quantitative trait. Marker and QTL genotypes were simulated for pseudotest backcrosses of different sample sizes over a range of double reduction frequencies (0.05, 0.15, and 0.30) and recombination fractions (0.05 and 0.25). We assumed the same frequency of double reduction at the marker and the QTL. The phenotypic value of an individual was expressed as the sum of the genotypic value of the QTL genotype carried by the individual and a normally distributed error. The genotypic values of the QTL genotypes were calculated by (5), with the overall mean set to 1 and the additive and dominant effects set to fixed values. The error variance was determined according to the specified heritability levels.
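The simulation design can be sketched in Python as follows. The residual variance is chosen so that the heritability $h^2 = V_G/(V_G + V_E)$ equals a target value, where $V_G$ is the variance of the genotypic values over the QTL genotype distribution. The function names are hypothetical, and the sketch takes the 10 × 10 table of joint marker-QTL genotype frequencies as an input rather than building it from the recombination fraction and the double-reduction frequency; the table in the usage example treats the marker and QTL as unlinked and, like the genotypic values used there, is only a placeholder.

```python
import random


def error_variance(values, probs, h2):
    """Residual variance V_E giving heritability h2 = V_G / (V_G + V_E)."""
    if not 0.0 < h2 < 1.0:
        raise ValueError("h2 must lie strictly between 0 and 1")
    mean = sum(p * v for p, v in zip(probs, values))
    vg = sum(p * (v - mean) ** 2 for p, v in zip(probs, values))
    return vg * (1.0 - h2) / h2


def simulate(joint, values, h2, n, seed=0):
    """Draw n individuals as (marker genotype index, phenotype) pairs.

    joint[l][k]: frequency of marker genotype l together with QTL genotype k
    values[k]:   genotypic value of QTL genotype k
    """
    rng = random.Random(seed)
    cells = [(l, k) for l in range(10) for k in range(10)]
    weights = [joint[l][k] for l, k in cells]
    qtl_probs = [sum(joint[l][k] for l in range(10)) for k in range(10)]
    sigma = error_variance(values, qtl_probs, h2) ** 0.5
    draws = rng.choices(cells, weights=weights, k=n)
    return [(l, values[k] + rng.gauss(0.0, sigma)) for l, k in draws]


# Placeholder usage: unlinked loci with double reduction 0.15 at both.
alpha = 0.15
single = [alpha / 4] * 4 + [(1 - alpha) / 6] * 6
joint = [[pm * pq for pq in single] for pm in single]
values = [1.0 + 0.1 * k for k in range(10)]  # arbitrary genotypic values
data = simulate(joint, values, h2=0.4, n=200)
```

Fixing the heritability rather than the error variance keeps the signal-to-noise ratio comparable across different sets of genotypic values, which is what makes results at a given heritability comparable.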
In this simulation study, fully informative markers and QTL are assumed; thus, the double reduction at the marker can be estimated analytically. Because the parameter estimators in the EM framework have closed forms, the estimates converge rapidly to stable values. We evaluated the estimation of the other parameters related to QTL segregation, effects, and position. The means of the MLEs of the QTL-related parameters and their standard errors, based on 1000 simulation replicates, are given in Tables 1, 2, and 3. Even with a small sample size (100), the double reduction of the QTL was accurately estimated, and the precision of this estimate was relatively independent of the heritability and the degree of QTL-marker linkage (Table 1). The most important factor affecting the estimate of QTL position (in terms of its recombination fraction with the marker) was the heritability, followed by the sample size and the degree of QTL-marker linkage. In general, a sample size of at least 200 was required to reliably estimate the position of a major gene that explains about 20%–30% of the phenotypic variance.
The estimation precision of the QTL effects depended on the heritability, sample size, and degree of QTL-marker linkage: as each of these increased, the estimates of the various QTL effects became more precise. Compared with the dominant genetic effects, the additive genetic effects required a larger sample size, more precise phenotypic measurements (leading to a higher heritability), and a denser linkage map (with stronger QTL-marker linkage) to be estimated well. The estimates of QTL effects were also influenced by the frequency of double reduction at the QTL: the effects were estimated more accurately at low frequencies of double reduction than at high frequencies. For a QTL undergoing strong double reduction, a sample size of at least 400 was required even when the QTL explained a large proportion (0.4) of the phenotypic variance. For a modest-sized QTL, a much larger sample size was required.
We performed a further simulation study to test how misspecifying double reduction affects the estimates of QTL-related parameters. We used traditional mapping models, which do not consider double reduction, to analyze data simulated for QTLs with different degrees of double reduction. When a QTL undergoes double reduction, these traditional models gave misleading estimates of the QTL effects and position (data not shown), and increasing the heritability and sample size did not improve the estimates. In this case, the power of QTL detection was also reduced.
We have described a statistical method for the genetic mapping of quantitative trait loci (QTLs) in multivalent tetraploids that undergo double reduction. As an important cytological characteristic of polyploids, double reduction may play a significant role in plant evolution and in the maintenance of genetic polymorphism in natural populations. Moreover, because double reduction affects the results of linkage analysis through crossing-over events between different chromosomes [24, 25], it is important to incorporate double reduction into the QTL mapping framework. The method provides a powerful tool for QTL mapping and for understanding the genetic control of quantitative traits in multivalent tetraploids.
The method builds on the 11 classes of two-locus gamete formation during multivalent tetraploid meiosis derived by Fisher [23], which have proven powerful for simultaneously estimating the frequencies of double reduction and the recombination fraction between loci. Although a few statistical approaches have been proposed to map QTLs in multivalent tetraploids [26, 27], this method is the first to incorporate Fisher's model of tetrasomic inheritance into the mapping framework, thus enhancing the cytological relevance of QTL detection. Results from the simulation studies showed that the method can map QTLs in a controlled cross of multivalent tetraploids when the mapping population is adequately large (say 400). When a QTL undergoes double reduction, traditional mapping approaches estimate the position and effects of the QTL incorrectly, with errors that grow with the degree of double reduction. The new method can estimate the double reduction of a QTL, an important parameter related to the genetic diversity and evolution of polyploids [28, 29].
Because of the high complexity of the mixture model with tetrasomic inheritance, we considered only a one-marker model for QTL mapping. Interval mapping, which localizes a QTL with two flanking markers, has proven more advantageous for parameter estimation than the one-marker model [30]. It will be worthwhile to integrate components of our model into the interval mapping framework to fully exploit the statistical merits of interval mapping for QTL mapping in multivalent tetraploids. Furthermore, the model proposed in this article assumes the segregation of fully informative codominant loci, each with 10 distinct genotypes, in a controlled cross of multivalent tetraploids. For partially informative codominant markers, a two-stage hierarchical mixture model will be needed to model the different allelic configurations underlying a phenotypically identical genotype. Although molecular marker technologies have improved in recent years, dominant markers may still be used in genetic mapping projects for some underrepresented species, including polyploids. It is therefore also important to extend our model to map QTLs with dominant markers. For partially informative loci, the number of QTL genotypes may be unknown, so a model selection procedure should be incorporated to determine the optimal number of genotypes at a QTL.
Genetic mapping in polyploids is difficult because of their complex modes of inheritance, and sophisticated statistical models are required to tackle the genetic problems hidden in polysomic inheritance. There is currently some debate on the optimal modeling of tetrasomic inheritance in linkage analysis [13, 25] and QTL mapping [18, 31], partly because of our limited knowledge of these fascinating species. Until the cytological mechanisms of meiosis in multivalent polyploids are understood in detail, this debate is likely to continue. In any case, the development of powerful statistical models for polyploid mapping remains a pressing need. Applying these models to real-world data will not only test their usefulness but also provide an unprecedented opportunity to understand the genetic differentiation among polyploid genomes and to characterize the genetic architecture of quantitatively inherited traits in this unique group of species. Software for the method described is available at http://statgen.psu.edu/.
This work was supported by joint grant DMS/NIGMS-0540745. Additional support was provided through the Office of Sciences (BER), U.S. Department of Energy, Interagency Agreement no. DE-A102-07ER64453.
- P. S. Soltis and D. E. Soltis, “The role of genetic and genomic attributes in the success of polyploids,” Proceedings of the National Academy of Sciences of the United States of America, vol. 97, no. 13, pp. 7051–7057, 2000.
- J. G. Robins, D. Luth, T. A. Campbell et al., “Genetic mapping of biomass production in tetraploid alfalfa,” Crop Science, vol. 47, no. 1, pp. 1–10, 2007.
- M. Stift, C. Berenos, P. Kuperus, and P. H. Van Tienderen, “Segregation models for disomic, tetrasomic and intermediate inheritance in tetraploids: a general procedure applied to rorippa (yellow cress) microsatellite data,” Genetics, vol. 179, no. 4, pp. 2113–2123, 2008.
- J. A. G. da Silva, M. E. Sorrells, W. L. Burnquist, and S. D. Tanksley, “RFLP linkage map and genome analysis of Saccharum spontaneum,” Genome, vol. 36, no. 4, pp. 782–791, 1993.
- R. Ming, S. C. Liu, Y. R. Lin et al., “Detailed alignment of Saccharum and Sorghum chromosomes: comparative organization of closely related diploid and polyploid genomes,” Genetics, vol. 150, no. 4, pp. 1663–1682, 1998.
- R. Ming, S. C. Liu, J. E. Irvine et al., “Comparative QTL analysis in a complex autopolyploid: candidate genes for determinants of sugar content in sugarcane,” Genome Research, vol. 11, pp. 2075–2084, 2001.
- R. C. Meyer, D. Milbourne, C. A. Hackett, J. E. Bradshaw, J. W. McNicol, and R. Waugh, “Linkage analysis in tetraploid potato and association of markers with quantitative resistance to late blight (Phytophthora infestans),” Molecular and General Genetics, vol. 259, no. 2, pp. 150–160, 1998.
- D. J. Brouwer and T. C. Osborn, “A molecular marker linkage map of tetraploid alfalfa (Medicago sativa L.),” Theoretical and Applied Genetics, vol. 99, no. 7-8, pp. 1194–1200, 1999.
- Z. W. Luo, C. A. Hackett, J. E. Bradshaw, J. W. McNicol, and D. Milbourne, “Construction of a genetic linkage map in tetraploid species using molecular markers,” Genetics, vol. 157, no. 3, pp. 1369–1385, 2001.
- B. Julier, S. Flajoulot, P. Barre et al., “Construction of two genetic linkage maps in cultivated tetraploid alfalfa (Medicago sativa) using microsatellite and AFLP markers,” BMC Plant Biology, vol. 3, article 9, 2003.
- C. A. Hackett, J. E. Bradshaw, R. C. Meyer, J. W. McNicol, D. Milbourne, and R. Waugh, “Linkage analysis in tetraploid species: a simulation study,” Genetical Research, vol. 71, no. 2, pp. 143–154, 1998.
- M. I. Ripol, G. A. Churchill, J. A. G. da Silva, and M. Sorrells, “Statistical aspects of genetic mapping in autopolyploids,” Gene, vol. 235, no. 1-2, pp. 31–41, 1999.
- Z. W. Luo, R. M. Zhang, and M. J. Kearsey, “Theoretical basis for genetic linkage analysis in autotetraploid species,” Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. 18, pp. 7040–7045, 2004.
- Z. W. Luo, Z. Zhang, L. Leach, R. M. Zhang, J. E. Bradshaw, and M. J. Kearsey, “Constructing genetic linkage maps under a tetrasomic model,” Genetics, vol. 172, no. 4, pp. 2635–2645, 2006.
- R. Wu, C. X. Ma, and G. Casella, “A bivalent polyploid model for linkage analysis in outcrossing tetraploids,” Theoretical Population Biology, vol. 62, no. 2, pp. 129–151, 2002.
- R. Wu, C. X. Ma, and G. Casella, “A mixed polyploid model for linkage analysis in outcrossing tetraploids using a pseudo-test backcross design,” Journal of Computational Biology, vol. 11, no. 4, pp. 562–580, 2004.
- J. Sybenga, “Preferential pairing estimates from multivalent frequencies in tetraploids,” Genome, vol. 37, no. 6, pp. 1045–1055, 1994.
- C. X. Ma, G. Casella, Z. J. Shen, T. C. Osborn, and R. Wu, “A unified framework for mapping quantitative trait loci in bivalent tetraploids using single-dose restriction fragments: a case study from alfalfa,” Genome Research, vol. 12, no. 12, pp. 1974–1981, 2002.
- R. Wu, C. X. Ma, and G. Casella, “A bivalent polyploid model for mapping quantitative trait loci in outcrossing tetraploids,” Genetics, vol. 166, no. 1, pp. 581–595, 2004.
- J. D. Bever and F. Felber, “The theoretical population genetics of autopolyploidy,” Oxford Surveys in Evolutionary Biology, vol. 8, pp. 185–217, 1992.
- D. V. Butruille and L. S. Boiteux, “Selection-mutation balance in polysomic tetraploids: impact of double reduction and gametophytic selection on the frequency and subchromosomal localization of deleterious mutations,” Proceedings of the National Academy of Sciences of the United States of America, vol. 97, no. 12, pp. 6608–6613, 2000.
- C. D. Darlington, “Chromosome behaviour and structural hybridity in the Tradescantiae,” Journal of Genetics, vol. 21, no. 2, pp. 207–286, 1929.
- R. A. Fisher, “The theory of linkage in polysomic inheritance,” Philosophical Transactions of the Royal Society B, vol. 233, pp. 55–87, 1947.
- S. S. Wu, R. Wu, C. X. Ma, Z. B. Zeng, M. C. K. Yang, and G. Casella, “A multivalent pairing model of linkage analysis in autotetraploids,” Genetics, vol. 159, no. 3, pp. 1339–1350, 2001.
- R. Wu and C. X. Ma, “A general framework for statistical linkage analysis in multivalent tetraploids,” Genetics, vol. 170, no. 2, pp. 899–907, 2005.
- C. A. Hackett, J. E. Bradshaw, and J. W. McNicol, “Interval mapping of quantitative trait loci in autotetraploid species,” Genetics, vol. 159, no. 4, pp. 1819–1832, 2001.
- D. Cao, B. A. Craig, and R. W. Doerge, “A model selection-based interval-mapping method for autopolyploids,” Genetics, vol. 169, no. 4, pp. 2371–2382, 2005.
- K. G. Haynes and D. S. Douches, “Estimation of the coefficient of double reduction in the cultivated tetraploid potato,” Theoretical and Applied Genetics, vol. 85, no. 6-7, pp. 857–862, 1993.
- R. Wu, M. Gallo-Meagher, R. C. Littell, and Z. B. Zeng, “A general polyploid model for analyzing gene segregation in outcrossing tetraploid species,” Genetics, vol. 159, no. 2, pp. 869–882, 2001.
- E. S. Lander and D. Botstein, “Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps,” Genetics, vol. 121, no. 1, pp. 185–199, 1989.
- D. Cao, T. C. Osborn, and R. W. Doerge, “Correct estimation of preferential chromosome pairing in autotetraploids,” Genome Research, vol. 14, no. 3, pp. 459–462, 2004.
Copyright © 2010 Jiahan Li et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.