Advanced Designs and Statistical Methods for Genetic and Genomic Studies of Complex Diseases
Research Article, Open Access
Jing Han, Yongzhao Shao, "The Transmission Disequilibrium/Heterogeneity Test with Parental-Genotype Reconstruction for Refined Genetic Mapping of Complex Diseases", Journal of Probability and Statistics, vol. 2012, Article ID 256574, 14 pages, 2012. https://doi.org/10.1155/2012/256574
The Transmission Disequilibrium/Heterogeneity Test with Parental-Genotype Reconstruction for Refined Genetic Mapping of Complex Diseases
Abstract
In linkage analysis for mapping genetic diseases, the transmission/disequilibrium test (TDT) uses the linkage disequilibrium (LD) between marker and trait loci for precise genetic mapping while avoiding confounding due to population stratification. The sib-TDT (STDT) and combined-TDT (CTDT) proposed by Spielman and Ewens can combine data from families with and without parental marker genotypes (PMGs). For some families with missing PMG, the reconstruction-combined TDT (RCTDT) proposed by Knapp may be used to reconstruct missing parental genotypes from the genotypes of their offspring to increase power and to correct for potential bias. In this paper, we propose a further extension of the RCTDT, called the reconstruction-combined transmission disequilibrium/heterogeneity (RCTDH) test, which takes into account identical-by-descent (IBD) sharing information in addition to LD information. It can effectively utilize families with missing or incomplete parental genetic marker information. An application of the proposed method to Genetic Analysis Workshop 14 (GAW14) data sets and extensive simulation studies suggest that this approach may further increase statistical power, which is particularly valuable when LD is unknown and/or when some or all PMGs are not available.
1. Introduction
Genetic linkage analysis is an important step in localizing and identifying genes on the chromosomes that underlie many human diseases and other traits of interest. A brief overview of commonly used statistical methods for linkage analysis, including recently developed model-free and model-based methods for mapping qualitative and quantitative-trait loci, can be found in Shao [1]. For more extensive discussions of linkage analysis, readers can consult Ott [2].
Mapping genes that underlie complex diseases is of great current interest. The essence of linkage analysis is to identify statistical association between the inheritance of a complex genetic disease phenotype and the inheritance of specific pieces of genetic material (called marker alleles). Many complex diseases, including cancers, have a heritable component. For marker alleles that are associated with the inheritance of complex diseases, it is common for the transmission probabilities of a marker allele of interest to vary across heterozygous parents, due to locus heterogeneity, etiological heterogeneity, and many other complexities and/or combinations of them [3, 4]. Under such transmission heterogeneity, the transmission likelihood generally takes the form of a mixture model with many parameters [4, 5]. It can be shown that the efficient score test of such a mixture likelihood includes two parts: one related to transmission disequilibrium, reflected by the existence of linkage disequilibrium (LD), and the other related to transmission heterogeneity, in the form of excess dispersion in the sharing of genetic markers as might be inferred from identical-by-descent (IBD) patterns (e.g., allele-sharing patterns among affected sib pairs).
The transmission/disequilibrium test (TDT) developed by Spielman et al. [6] uses the LD information between marker and disease loci for precise genetic mapping while avoiding confounding due to population stratification. It has been extended in multiple directions to meet the needs of mapping complex traits; see, for example, [7, 8]. In particular, missing parental genetic marker genotypes are very common in studies of late-onset diseases. The sib-TDT (STDT) and combined-TDT (CTDT) proposed by Spielman and Ewens [9] can handle families without parental marker genotypes (PMGs) and can combine them with data from families for which PMG are available. For some families with missing PMG, the reconstruction-combined TDT (RCTDT) proposed by Knapp [10, 11] may be used to reconstruct missing PMG from the genotypes of the offspring, increasing the power of the CTDT while correcting for the potential bias of using reconstructed PMG [12].
An attractive feature of the RCTDT is that it makes use of missing PMG that can be uniquely determined from the genotypes of the children, and it corrects the potential biases resulting from the use of reconstructed PMG by employing appropriate null expectations and variances, supplied in Tables 1 and 2 of Knapp [10]. Like the TDT and CTDT, however, the RCTDT is powerful only when there is strong LD. Because LD is usually unknown and difficult to measure, it is generally desirable to combine LD information with allele-sharing information obtained from IBD patterns [5, 13].
For fine mapping of complex genetic disorders, Shao [4] derived a general mixture likelihood for allele transmission under various forms of transmission disequilibrium and/or heterogeneity and further proposed a transmission disequilibrium/heterogeneity (TDH) test that efficiently combines transmission disequilibrium and heterogeneity information to maximize the power for detecting linkage using genetic data from nuclear families. The TDH test was shown to be an efficient score test of the general mixture likelihood derived in Shao [4]; its statistic is the sum of two parts, a transmission/disequilibrium test (TDT) part that utilizes the LD information and a transmission heterogeneity test (THT) part that utilizes IBD-sharing information. To see that the THT utilizes IBD-sharing information, note that the general mixture likelihood contains the mixture binomial likelihood discussed in Huang and Jiang [13] and Lo et al. [5], and that the test statistic of the classical mean test for affected sib pairs (ASPs) is a special case of the THT statistic in Shao [4]. The classical mean test for affected sib pairs is the best-known IBD-sharing-based linkage test [14]. The THT is applicable to general sibships and thus can be regarded as an extension of the classical mean test for affected sib pairs.
In practice, parental marker genotypes are often incomplete in genetic studies, particularly for late-onset diseases. Using only families with complete parental marker genotype information would discard a large portion of the useful data and can also lead to biases. It is thus crucially important to make the TDH test applicable to families with missing or incomplete parental marker genotype information. In this paper, we develop a transmission disequilibrium/heterogeneity test with parental-genotype reconstruction, which utilizes both the LD information and the IBD-sharing information and can combine families with and without PMG information.
The transmission disequilibrium/heterogeneity test with parental-genotype reconstruction (RCTDH) is introduced in the next section. In Section 3, the RCTDH test is applied to a data set from GAW14, and the results are compared with those of the RCTDT. In Section 4, simulation studies that use common genetic models [5, 15] are carried out to compare the power and the true size of the RCTDT and the RCTDH test. The numerical results suggest that the RCTDH test may greatly increase statistical power, which is particularly valuable whenever LD levels are unknown and/or whenever PMG information is missing, as in studies of diseases with a late age of onset.
It should be pointed out that the main comparison made in this paper is between the RCTDT and the RCTDH. We do not formally compare them with classical IBD-based linkage tests such as those implemented in Genehunter and other software. The main rationale is as follows. We are mainly interested in fine mapping of genetic variants that underlie complex diseases, where classical linkage tests are known to have low power because they do not utilize LD information effectively. With the rapid advancement of biotechnology, it is now feasible and affordable to use dense genetic markers, for example, single nucleotide polymorphisms (SNPs), for genome-wide linkage scans. With a large number of dense markers, some can be expected to fall into the LD block of the causal genetic variants, so LD would generally exist to some degree for many markers. The TDT and TDH tests would therefore have a power advantage over classical linkage tests, which effectively utilize only the IBD information.
2. Method
2.1. Notation
It will be assumed that there are two alleles, 1 and 2, at the marker locus, and that allele 1 is of particular interest. For family $i$, let $a_i$ denote the number of affected children, let $u_i$ denote the number of unaffected children, and let $n_i = a_i + u_i$ denote the size of the sibship. In each family, all children have been typed at the marker locus, but the PMG may or may not be available. Let $A_{i,jk}$ (or $U_{i,jk}$) be random variables denoting the number of affected (or unaffected) children with genotype $jk$ in family $i$. Lowercase letters (i.e., $a_{i,jk}$ and $u_{i,jk}$) denote the observed values of $A_{i,jk}$ and $U_{i,jk}$. Further, let $N_{i,jk} = A_{i,jk} + U_{i,jk}$ and $n_{i,jk}$ denote, respectively, the random variable and the observed number of children with genotype $jk$ in family $i$. $X_i$ denotes the number of 1 alleles in affected children (i.e., $X_i = 2A_{i,11} + A_{i,12}$). The notation introduced here is consistent with Knapp [10, 11] and Han [16].
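To make the notation concrete, the Python sketch below tallies one family's offspring into the observed counts $a_{jk}$, $u_{jk}$, $n_{jk}$ and computes $x = 2a_{11} + a_{12}$, the number of 1 alleles carried by the affected children. The string encoding of genotypes ("11", "12", "22") and the function name `family_counts` are illustrative assumptions, not part of the original method.

```python
from collections import Counter

# Genotypes are encoded as the strings "11", "12", "22" (an assumed encoding);
# allele 1 is the marker allele of interest.
GENOTYPES = ("11", "12", "22")


def family_counts(children):
    """Tally one family's offspring into the notation of Section 2.1.

    children: list of (genotype, affected) pairs, e.g. [("11", True), ("12", False)].
    Returns the observed counts a_jk, u_jk, n_jk, the totals a, u, n, and
    x = 2 * a_11 + a_12, the number of 1 alleles carried by affected children.
    """
    aff = Counter(g for g, is_affected in children if is_affected)
    unaff = Counter(g for g, is_affected in children if not is_affected)
    counts = {}
    for g in GENOTYPES:
        counts["a_" + g] = aff[g]  # Counter returns 0 for absent genotypes
        counts["u_" + g] = unaff[g]
        counts["n_" + g] = aff[g] + unaff[g]
    counts["a"] = sum(aff.values())
    counts["u"] = sum(unaff.values())
    counts["n"] = counts["a"] + counts["u"]
    counts["x"] = 2 * aff["11"] + aff["12"]
    return counts


# Example: two affected children (11 and 12) and one unaffected child (22).
example = family_counts([("11", True), ("12", True), ("22", False)])
print(example["a"], example["u"], example["x"])  # 2 1 3
```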
2.2. The TDH Test with Complete PMG
For completeness, we first consider the case in which PMG are observed along with the children's marker genotypes. Let $T_{ij}$ be the number of 1 alleles transmitted by the $j$th marker-heterozygous parent in family $i$ to the affected children. When the exact number of 1 alleles transmitted to the affected children cannot be determined, as may happen in families with two heterozygous parents, a substitute quantity is used in place of $T_{ij}$. With this substitution in families with ambiguous transmissions, both the TDT statistic and the transmission heterogeneity test (THT) statistic are computed from the $T_{ij}$ and their null moments; the moments of $T_{ij}$ under the null hypothesis, given the parental marker genotypes (PMGs), are summarized in Table 1.

The transmission disequilibrium/heterogeneity (TDH) test statistic combines the TDT part and the THT part by summation [4].
In terms of statistical optimality, it can be shown that the TDH test is the efficient score test from the mixture likelihood function under transmission disequilibrium and heterogeneity [4]. In theory, the efficient score test is known to be locally most powerful.
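The exact TDT, THT, and TDH formulas are those of Shao [4]. As a rough illustration only, the Python sketch below assumes the simplest setting: complete PMG, unambiguous transmission counts, and, under the null hypothesis, a Binomial($a$, 1/2) count of 1 alleles transmitted by each heterozygous parent to the $a$ affected children of its family. It standardizes the sum of deviations from $a/2$ (a TDT-type part) and the sum of excess squared deviations, a binomial over-dispersion score-type statistic (a THT-type part), and then combines the two with equal weights. The equal weighting, the function name, and the use of $\mathrm{Var}[(T - a/2)^2] = a(a-1)/8$ for Binomial($a$, 1/2) are assumptions of this sketch and not necessarily the exact statistics of Shao [4].

```python
import math


def tdt_tht_tdh(transmissions):
    """Illustrative TDT-type, THT-type and combined z statistics (complete PMG).

    transmissions: list of (t, a) pairs, one per marker-heterozygous parent, where t is
    the number of 1 alleles that parent transmitted to the a affected children of its
    family. Under the null hypothesis of no linkage, t ~ Binomial(a, 1/2).
    Requires at least one parent with a >= 2 for the THT-type part.
    """
    # TDT-type part: standardized sum of deviations t - a/2 (null variance a/4).
    z_tdt = sum(t - a / 2 for t, a in transmissions) / math.sqrt(
        sum(a / 4 for _, a in transmissions))
    # THT-type part: excess of (t - a/2)^2 over its null mean a/4, standardized by
    # Var[(t - a/2)^2] = a(a - 1)/8, which holds for Binomial(a, 1/2).
    z_tht = sum((t - a / 2) ** 2 - a / 4 for t, a in transmissions) / math.sqrt(
        sum(a * (a - 1) / 8 for _, a in transmissions))
    # Under the null the two parts are uncorrelated (Binomial(a, 1/2) has zero third
    # central moment), so an equally weighted sum is rescaled by sqrt(2).
    z_tdh = (z_tdt + z_tht) / math.sqrt(2)
    return z_tdt, z_tht, z_tdh


# Three heterozygous parents, each with two affected children.
print(tdt_tht_tdh([(2, 2), (2, 2), (0, 2)]))
```

Parents with a single affected child carry no dispersion information in this sketch, because the squared deviation $(t - 1/2)^2$ always equals $1/4$; only the TDT-type part uses them.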
2.3. The Reconstruction-Combined TDH (RCTDH) Test
When the PMG of at least one parent is missing, Knapp [10] proposed the reconstruction-combined TDT (RCTDT), which reconstructs PMG from the genotypes of the offspring and corrects for the biases resulting from using reconstructed PMG. To improve the power to detect linkage, we propose the reconstruction-combined TDH (RCTDH) test. Its statistic is built from $X_i$, the number of 1 alleles in the affected children of family $i$, and from $E_0(X_i)$ and $\mathrm{Var}_0(X_i)$, the appropriate null expectation and variance of $X_i$, which can be found in Tables 1 and 2 of Knapp [10]. In the RCTDH statistic, the first term is the RCTDT statistic of Knapp [10] and the second term is the reconstruction-combined THT (RCTHT) statistic, computed under the reconstruction restriction. To obtain the appropriate null expectation, we need to derive the conditional distribution of $X_i$ given the constraint for reconstruction.
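Knapp's RCTDT pools families from different categories (complete PMG, reconstructed PMG, and sibship-only) by centering each family's observed count of 1 alleles in the affected children at its category-specific null expectation. A minimal Python sketch of such a standardized sum is shown below, assuming the per-family null expectations and variances have already been obtained (for example, from the tables in Knapp [10] or from conditional distributions like those in the Appendix). The function name and input format are hypothetical, and the second (RCTHT) term is not reproduced here.

```python
import math


def reconstruction_combined_z(families):
    """Standardized sum over families of (observed - null expected) counts.

    families: iterable of (x, e0, v0) triples, one per usable family, where x is the
    observed number of 1 alleles in the affected children and e0, v0 are the null
    expectation and variance of X appropriate to that family's category.
    Families that fit no category should simply be left out.
    """
    families = list(families)
    numerator = sum(x - e0 for x, e0, _ in families)
    variance = sum(v0 for _, _, v0 in families)
    return numerator / math.sqrt(variance)


# Hypothetical per-family inputs (x, E0(X), Var0(X)).
print(reconstruction_combined_z([(3, 2.0, 0.5), (1, 1.5, 0.25), (4, 3.0, 0.25)]))  # 1.5
```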
When one parental genotype is missing but reconstructible, the conditional probabilities of $X$, the number of 1 alleles in the affected children, are listed in Table 2, with the family index dropped from the formulas. In the first column, the first parental genotype is typed and the second is reconstructed. The second column gives a necessary and sufficient condition on the observed marker genotypes of the offspring for the parental genotypes to be reconstructible. The details of the derivation are provided in Han [16].

When both parental genotypes are missing, the reconstruction condition and the conditional probabilities of $X$ (the number of 1 alleles in the affected children) are the same as in the case where one parental genotype is missing and the known parental genotype is $12$.
When at least one parental genotype is missing and cannot be reconstructed, but the condition for the STDT is satisfied (i.e., the family has at least one affected and at least one unaffected child, and not all of the children have the same genotype), the distribution of $X$ can be calculated from the genotypes of the affected and unaffected children using the hypergeometric distribution. The details are provided in the Appendix.
As in the CTDT and RCTDT, families that do not belong to any of the previous categories are ignored.
3. Application to Genetic Analysis Workshop 14 Data
The proposed RCTDH test was applied to a Genetic Analysis Workshop 14 (GAW14) data set to compare its power with that of the RCTDT. The GAW14 simulated data were generated by Dr. David Greenberg. A behavioral disorder was simulated in multiple replicates of four different populations/groups. There are 100 families in the Aipotu, Karnagar, and Danacaa data sets, and there are 100 replicates of each data set. The power comparison of the RCTDH with the RCTDT for detecting linkage between the trait b disease allele and the marker B01T0561 is presented in Table 3. This trait has incomplete penetrance. Table 3 illustrates the application of the RCTDH with 50% and 100% missing parental genotypes. Power is evaluated at a type I error level of 0.05.
 
In the analysis with 100% of PMG missing, we ignore all parental marker genotypes. In the analysis with 50% of PMG missing, we use 50% of families with parental marker genotypes and 50% of families without them.
4. Simulation
4.1. Simulation Set-Up
Simulation studies are conducted to compare the power of the proposed RCTDH test with that of the RCTDT. To attain the correct type I error rates, we directly simulated the critical values under the null hypothesis of no linkage, in which the recombination fraction $\theta$ equals 0.5. In the simulations of the null distribution, 1,000,000 replicate samples of nuclear families are generated and the empirical critical values are obtained. Based on 500 independent replicates and the empirical critical values, we estimate the power of each test as the relative frequency of simulated test statistics that exceed the empirical critical value.
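The critical-value and power calculation just described can be sketched in Python as follows. The sketch assumes a one-sided test that rejects for large values of the statistic; the function names and the tie-handling convention (at most a fraction alpha of the null draws may exceed the critical value) are illustrative choices, and the normal draws in the usage block merely stand in for simulated test statistics.

```python
import math
import random


def empirical_critical_value(null_stats, alpha):
    """Upper critical value c such that at most a fraction alpha of the simulated
    null statistics exceeds c (one-sided test rejecting for large values)."""
    ordered = sorted(null_stats)
    m = len(ordered)
    k = math.floor(alpha * m + 1e-9)  # number of null draws allowed above c
    return ordered[m - k - 1]


def empirical_power(alt_stats, critical_value):
    """Relative frequency of simulated statistics exceeding the critical value."""
    return sum(s > critical_value for s in alt_stats) / len(alt_stats)


if __name__ == "__main__":
    rng = random.Random(2012)
    # Standard normal draws stand in for statistics simulated under the null.
    null = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
    # Shifted draws stand in for 500 replicates simulated under an alternative.
    alt = [rng.gauss(2.5, 1.0) for _ in range(500)]
    crit = empirical_critical_value(null, 0.05)
    print(round(crit, 3), empirical_power(alt, crit))
```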
To generate the family-based data, as in earlier work [5], we consider two biallelic loci: one disease locus (with disease allele $D$ and normal allele $d$) and one marker locus (with alleles 1 and 2). The frequency of disease allele $D$ is $p$ and that of marker allele 1 is $q$. The linkage disequilibrium is the deviation of the frequency of haplotype $D1$ from its equilibrium value $pq$ (expected by chance), and it is summarized by a standardized parameter $\delta$. In our simulations, we assume that 1 is the marker allele in positive linkage disequilibrium with $D$. Thus, $\delta$ ranges over $[0, 1]$, where 0 indicates linkage equilibrium. There are three penetrance parameters, $f_0$, $f_1$, and $f_2$, corresponding to the three possible disease genotypes $dd$, $Dd$, and $DD$.
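To simulate two-locus data, haplotype frequencies must be derived from the disease allele frequency $p$, the marker allele 1 frequency $q$, and the LD level. The Python sketch below assumes that the LD parameter is Lewontin's normalized measure $D' = D/D_{\max}$, where $D$ is the deviation of the $D1$ haplotype frequency from $pq$. This is one common standardization consistent with a $[0, 1]$ range under positive association, but it is an assumption here, not necessarily the exact definition used in the paper; the function name is hypothetical.

```python
def haplotype_frequencies(p, q, delta):
    """Haplotype frequencies for a disease locus (alleles "D", "d") and a marker
    (alleles 1, 2), assuming delta is Lewontin's D' in [0, 1] with D positively
    associated with marker allele 1.

    p: frequency of D; q: frequency of marker allele 1.
    """
    if not 0 <= delta <= 1:
        raise ValueError("delta must lie in [0, 1]")
    d_max = min(p * (1 - q), (1 - p) * q)  # largest positive D allowed by p and q
    d = delta * d_max  # deviation of freq(D1) from its equilibrium value p*q
    return {
        ("D", 1): p * q + d,
        ("D", 2): p * (1 - q) - d,
        ("d", 1): (1 - p) * q - d,
        ("d", 2): (1 - p) * (1 - q) + d,
    }


# Example with p = 0.1, q = 0.5 and complete LD (delta = 1): D always sits on allele 1.
print(haplotype_frequencies(0.1, 0.5, 1.0))
```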
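Simulation study 1 closely followed the approach of Boehnke and Langefeld [15]. For each model, a disease prevalence of $K = 5\%$ was assumed. Assuming Hardy-Weinberg equilibrium, the disease allele frequency $p$ implied by each disease model can be calculated by solving $K = p^2 f_2 + 2p(1-p) f_1 + (1-p)^2 f_0$ for $p$, where $f_0$, $f_1$, and $f_2$ are the penetrances of the genotypes carrying 0, 1, and 2 disease alleles. The parameters used in this simulation study are summarized in Table 4.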
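The prevalence equation can be solved in closed form. The Python sketch below assumes Hardy-Weinberg equilibrium, rewrites $K = p^2 f_2 + 2p(1-p) f_1 + (1-p)^2 f_0$ as a quadratic in $p$, and returns the root in $[0, 1]$; the function name is hypothetical and the example penetrances are illustrative, not the Table 4 settings.

```python
import math


def disease_allele_frequency(prevalence, f0, f1, f2):
    """Disease-allele frequency p solving
    K = p^2 f2 + 2p(1-p) f1 + (1-p)^2 f0   (Hardy-Weinberg proportions assumed),
    where f0, f1, f2 are the penetrances of genotypes with 0, 1, 2 disease alleles."""
    # Rearranged as a*p^2 + b*p + c = 0.
    a = f2 - 2 * f1 + f0
    b = 2 * (f1 - f0)
    c = f0 - prevalence
    if abs(a) < 1e-12:  # linear case, e.g. an additive model
        if b == 0:
            raise ValueError("penetrances do not depend on genotype")
        roots = [-c / b]
    else:
        disc = math.sqrt(b * b - 4 * a * c)
        roots = [(-b + disc) / (2 * a), (-b - disc) / (2 * a)]
    valid = [r for r in roots if 0 <= r <= 1]
    if not valid:
        raise ValueError("no allele frequency in [0, 1] yields this prevalence")
    return min(valid)


# Fully penetrant dominant and recessive models with K = 0.05.
print(disease_allele_frequency(0.05, 0.0, 1.0, 1.0))  # 1 - sqrt(0.95), about 0.0253
print(disease_allele_frequency(0.05, 0.0, 0.0, 1.0))  # sqrt(0.05), about 0.224
```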

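The parameters used in simulation study 2 are summarized in Table 5. Four commonly used disease models are considered: dominant ($f_1 = f_2$), additive ($f_1 = (f_0 + f_2)/2$), multiplicative ($f_1^2 = f_0 f_2$), and recessive ($f_1 = f_0$), where $f_k$ is the penetrance of the genotype carrying $k$ disease alleles.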
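Given $f_0$ and $f_2$, each mode of inheritance fixes the heterozygote penetrance $f_1$. A small Python helper, with hypothetical names and mode labels, is sketched below; with, for instance, $f_0 = 0.01$ and $f_2 = 1$, it gives $f_1 = 1$ (dominant), $0.505$ (additive), $0.1$ (multiplicative), and $0.01$ (recessive).

```python
import math


def heterozygote_penetrance(mode, f0, f2):
    """Penetrance f1 of the heterozygous disease genotype implied by the mode of
    inheritance, given f0 (no disease allele) and f2 (two disease alleles)."""
    if mode == "dominant":
        return f2
    if mode == "recessive":
        return f0
    if mode == "additive":
        return (f0 + f2) / 2
    if mode == "multiplicative":
        return math.sqrt(f0 * f2)  # f1^2 = f0 * f2
    raise ValueError(f"unknown mode of inheritance: {mode!r}")


for mode in ("dominant", "additive", "multiplicative", "recessive"):
    print(mode, heterozygote_penetrance(mode, 0.01, 1.0))
```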

4.2. Simulation Results
Table 6 presents estimates of the critical values for the RCTDH at significance levels of .05, .01, and .001. Table 7 presents estimates of the true type I error rates at nominal significance levels of .05, .01, and .001. The simulations support the validity of approximating the null distribution of the RCTDT statistic by a standard normal distribution.
 
Note: determined on the basis of the dominant model (Scenario 4 in Table 4).
 
Determined on the basis of the dominant model (Scenario 4 in Table 4).
The results of simulation study 1 are shown in Table 8. Each disease model is labeled by a letter for the mode of inheritance (dominant, additive, or recessive) followed by “1” or “2” for the parameter value (1.0 or 0.5, respectively). The results shown come from simulations with 4 sibs per family; simulations with 2 or 6 sibs per family show the same trend. When no parental genotype information is available, applying the RCTDH instead of the RCTDT yields a consistent gain in power, especially when linkage disequilibrium is weak.
 
Model labels: mode of inheritance (dominant, recessive, additive) followed by 1 (parameter value 1.0) or 2 (parameter value 0.5). Power is estimated at a type I error rate of .05 from 500 independent replicates of 150 nuclear families. $\delta$ measures linkage disequilibrium; $\delta = 0$ means no linkage disequilibrium. In this simulation study, all parental marker genotypes are missing.
We conducted simulation study 2 to compare the power of the proposed RCTDH test with that of the RCTDT across levels of linkage disequilibrium in the scenarios of Table 5, such as tight versus weak linkage and full versus incomplete penetrance. Each simulated sample consists of families with the same number of sibs $s$ in each family (here $s = 3$), ascertained on the basis of the presence of an affected child. Each sample contains a total of 600 children in 200 families; half of the families have complete PMG, and the other half have no PMG. To assess the power of the tests, 500 replicate samples are generated under each simulation scenario, and for each replicate sample the RCTDH and RCTDT statistics are calculated.
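A single replicate family for such a study can be simulated by drawing parental haplotypes, transmitting them with recombination, assigning affection status by penetrance, and keeping only families with at least one affected child. The Python sketch below is one way to do this under stated assumptions: random mating with haplotypes drawn independently from given population frequencies, a recombination fraction $\theta$ between the two loci, affection determined only by the number of disease alleles, and parental phenotypes ignored. It is not the authors' simulation code, and all names and example parameter values are hypothetical.

```python
import random


def simulate_ascertained_family(hap_freqs, theta, penetrances, n_sibs, rng,
                                max_tries=100_000):
    """Simulate one nuclear family ascertained for at least one affected child.

    hap_freqs:   dict mapping (disease allele, marker allele) -> population frequency.
    theta:       recombination fraction between the disease and marker loci.
    penetrances: (f0, f1, f2), indexed by the child's number of "D" alleles.
    Returns (parental marker genotypes, [(child marker genotype, affected), ...]).
    """
    haps = list(hap_freqs)
    weights = [hap_freqs[h] for h in haps]
    for _ in range(max_tries):
        # Random mating: each parent carries two independently drawn haplotypes.
        parents = [rng.choices(haps, weights, k=2) for _ in range(2)]
        children = []
        for _ in range(n_sibs):
            child = []
            for pair in parents:
                i = rng.randrange(2)  # parental haplotype supplying the disease allele
                disease_allele = pair[i][0]
                # After a recombination the marker allele comes from the other haplotype.
                marker_allele = pair[1 - i][1] if rng.random() < theta else pair[i][1]
                child.append((disease_allele, marker_allele))
            n_disease = sum(allele == "D" for allele, _ in child)
            affected = rng.random() < penetrances[n_disease]
            children.append((tuple(sorted(m for _, m in child)), affected))
        if any(is_aff for _, is_aff in children):  # ascertainment criterion
            parent_markers = [tuple(sorted(m for _, m in pair)) for pair in parents]
            return parent_markers, children
    raise RuntimeError("no family with an affected child was generated")


if __name__ == "__main__":
    rng = random.Random(1)
    # Illustrative haplotype frequencies (p = 0.1, q = 0.5, complete LD) and
    # additive penetrances f0 = 0.01, f1 = 0.505, f2 = 1.
    freqs = {("D", 1): 0.1, ("D", 2): 0.0, ("d", 1): 0.4, ("d", 2): 0.5}
    print(simulate_ascertained_family(freqs, 0.01, (0.01, 0.505, 1.0), 3, rng))
```

Repeating the call 200 times with three sibs per family, and then discarding the parental genotypes of half of the families, would mimic one sample of the design described above, under the same assumptions.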
To compare the power of the RCTDH with that of the RCTDT at different LD levels, we vary the LD parameter $\delta$ between 0 and 1 and set the recombination fraction to 0.01, the frequency of disease allele $D$ to 0.1, the frequency of marker allele 1 to 0.5, the penetrance of genotype $DD$ to 1 (full penetrance), and the penetrance of genotype $dd$ to 0.01; the penetrance of genotype $Dd$ is then determined by the mode of inheritance. The results in Table 9 and Figure 1 show that power increases with $\delta$ and that the proposed RCTDH is more powerful than the RCTDT, especially when LD is weak, as in scenario 1 of Table 4.
 
In this simulation, 50% of the families have parental marker genotypes available and 50% do not.
Penetrance is the conditional probability of observing a phenotype given a specified disease genotype. In scenario 1, we set $f_2$ (the penetrance for a subject whose disease genotype is $DD$) to 1, which is an idealized penetrance. To compare the power of the proposed RCTDH with that of its competitor under different penetrances, $f_2$ is varied from full penetrance to a more realistic incomplete penetrance of 0.5. The results in Table 9 and Figure 2 show that the proposed RCTDH has better power than the RCTDT when $DD$ individuals have a penetrance of one half, as in scenario 5 of Table 5.
In summary, our simulation results show that the proposed RCTDH is generally more powerful than the RCTDT across a broad range of LD levels and degrees of linkage tightness, and across disease models.
5. Discussion
For mapping complex diseases, it is common for the transmission probabilities of a marker allele of interest to vary across heterozygous parents, due to locus heterogeneity, etiological heterogeneity, and many other complexities and/or combinations of them [3, 4]. Under such transmission heterogeneity, the transmission likelihood generally takes the form of a mixture model with many parameters, and the efficient score test has two parts, in the form of a TDH test [4]. This paper studies a TDH test that allows the inclusion of reconstructed parental marker genotype data and extends the RCTDT of Knapp [10, 11]. The proposed approach was evaluated in simulation studies and on GAW14 data sets, and the results indicate that it may improve the power of family-based linkage analysis over a broad range of LD levels. Moreover, the simulation studies indicate that the power advantage of the RCTDH test over the RCTDT holds systematically across the underlying genetic models considered (recessive, dominant, additive, and multiplicative).
Like the RCTDT, the new approach can utilize missing parental information that can be reconstructed from the children's genotypes, including some families with genotype-concordant or phenotype-concordant sibs. In addition, the proposed test is a sibship-oriented method that does not require specification of the underlying genetic model; it naturally uses multiple siblings by considering the sibship as a whole. The second part of the RCTDH statistic, the THT part, is based on IBD information. This is most evident for affected sib pairs, where the THT is essentially equivalent to the so-called mean test [4, 13].
Many other linkage tests, such as those implemented in Genehunter, have relatively low power compared with the TDT or TDH when LD is present. In reality, some degree of LD is often present, particularly when dense genetic markers (e.g., SNPs) along the genome are used; such markers are increasingly inexpensive and already very affordable. With a large number of dense genetic markers, some markers may be expected to fall into the LD block of the causal variants. When such markers are used along the genome or in candidate gene regions, we believe that the RCTDH will have a better chance than classical IBD-based linkage methods of detecting linkage signals.
As high-density SNP arrays become increasingly affordable, genome-wide linkage studies are becoming common. Our TDH test has a simple closed-form test statistic that is computationally easy, in addition to good overall power across a broad range of LD levels. Thus the proposed method would be potentially useful for genome-wide linkage analysis. In contrast, likelihood ratio tests for mixture likelihoods are generally computationally intensive [5, 17]. Many existing linkage tests and algorithms, such as the likelihood ratio test discussed in Lo et al. [5], would be too computationally intensive for genome-wide studies or when the number of genotyped markers is large.
It is possible to further extend the method to be applicable to markers with more than two alleles, which would be of great interest in studying haplotypes of multiple loci. However, our proposed tests are already applicable to the commonly used biallelic markers; for instance, the widely used single nucleotide polymorphisms (SNPs) are convenient biallelic markers.
Appendix
A. Computational Details for the RCTDH Test
When neither parent has been typed, the conditional probability has been derived in equation (A.6) of Knapp [10]. When only one parent has been typed, as $12$, the same constraint for reconstruction applies, so (A.6) of Knapp [10] also holds. Next we derive the conditional probability when only one parent has been typed, as $11$. The case in which only one parent has been typed as $22$ follows by symmetry between alleles 1 and 2.
A.1. One Parental Genotype Has Been Typed as $11$
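Note that the family index has been dropped in the following formulas.
Only one parental genotype has been typed, namely $11$, but the genotype of the missing parent can be reconstructed as $12$ if there is at least one child with genotype $11$ and at least one child with genotype $12$; that is, the condition is $n_{11} > 0$ and $n_{12} > 0$. To calculate the conditional distribution of $X = 2A_{11} + A_{12}$, we first calculate the probability of satisfying the constraint for reconstruction, $R = \{N_{11} > 0,\ N_{12} > 0\}$. Under the null hypothesis, each of the $n = a + u$ children of an $11 \times 12$ mating independently has genotype $11$ or $12$ with probability $1/2$, so
$$P(R) = 1 - P(N_{11} = 0) - P(N_{12} = 0) = 1 - 2\left(\tfrac{1}{2}\right)^{n} = 1 - \left(\tfrac{1}{2}\right)^{n-1}.$$
Then we calculate the joint probability of $X = x$ and $R$. Since every affected child has genotype $11$ or $12$, $X = a + A_{11}$, where $A_{11} \sim \mathrm{Bin}(a, 1/2)$ is independent of the genotypes of the unaffected children.
There are three cases for the calculation: case 1: $a_{11} = 0$, $a_{12} = a$; case 2: $a_{11} > 0$, $a_{12} > 0$; case 3: $a_{11} = a$, $a_{12} = 0$. In case 2, $R$ holds automatically. In case 1, $R$ additionally requires $U_{11} > 0$, and in case 3 it requires $U_{12} > 0$; each of these events has probability $1 - (1/2)^{u}$. Hence, for $x = a + a_{11}$,
$$P(X = x, R) = \binom{a}{a_{11}} \left(\tfrac{1}{2}\right)^{a} \times \begin{cases} 1 - \left(\tfrac{1}{2}\right)^{u}, & a_{11} \in \{0, a\}, \\ 1, & 0 < a_{11} < a. \end{cases}$$
Therefore the distribution of $X$ conditioned on $R$ is
$$P(X = x \mid R) = \frac{P(X = x, R)}{1 - \left(\tfrac{1}{2}\right)^{n-1}}, \qquad x = a, a + 1, \ldots, 2a.$$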
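The three-case calculation of A.1 can be checked numerically. The Python sketch below assumes that, under the null hypothesis, each child of an $11 \times 12$ mating independently has genotype $11$ or $12$ with probability 1/2 and that the reconstruction event $R$ requires at least one child of each genotype. It evaluates $P(X = x \mid R)$ by the three cases with exact rational arithmetic and compares the result with a brute-force enumeration of all $2^n$ offspring genotype vectors; the conditional mean and variance are the kind of null moments that enter reconstruction-combined statistics. Function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb


def conditional_x_dist(a, u):
    """P(X = x | R) by the three cases, for a typed 11 parent and a missing parent
    reconstructed as 12; X = 2*A_11 + A_12 = a + A_11, R = {N_11 > 0, N_12 > 0}."""
    if a < 1 or a + u < 2:
        raise ValueError("need at least one affected child and at least two children")
    half = Fraction(1, 2)
    p_r = 1 - half ** (a + u - 1)  # P(R) = 1 - 2 * (1/2)^n
    dist = {}
    for k in range(a + 1):  # k = A_11, so x = a + k
        joint = comb(a, k) * half ** a
        if k in (0, a):  # cases 1 and 3: an unaffected child must supply the other genotype
            joint *= 1 - half ** u
        dist[a + k] = joint / p_r
    return dist


def conditional_x_dist_enumerated(a, u):
    """The same distribution by enumerating all 2**n equally likely genotype vectors."""
    counts, n_r = {}, 0
    for genos in product(("11", "12"), repeat=a + u):  # the first a children are affected
        if "11" in genos and "12" in genos:  # reconstruction constraint R
            n_r += 1
            x = sum(2 if g == "11" else 1 for g in genos[:a])
            counts[x] = counts.get(x, 0) + 1
    return {x: Fraction(c, n_r) for x, c in counts.items()}


def moments(dist):
    """Mean and variance of a finite distribution given as {value: probability}."""
    mean = sum(x * p for x, p in dist.items())
    return mean, sum((x - mean) ** 2 * p for x, p in dist.items())


print(conditional_x_dist(2, 1) == conditional_x_dist_enumerated(2, 1))  # True
print(moments(conditional_x_dist(2, 1)))  # (Fraction(3, 1), Fraction(1, 3))
```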
A.2. At Least One Parental Genotype Is Missing and Cannot Be Reconstructed, but the Condition for the STDT Is Satisfied
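In a sibship with $a$ affected and $u$ unaffected sibs, the total number of sibs is $n = a + u$. Suppose that in this sibship the numbers of sibs with genotypes $11$, $12$, and $22$ are $n_{11}$, $n_{12}$, and $n_{22}$, respectively, and let $A_{11}$, $A_{12}$, and $A_{22}$ be the numbers of sibs of each genotype who are classified as affected. As discussed in Spielman and Ewens [9], given the totals $n_{11}$, $n_{12}$, $n_{22}$, $a$, and $u$, the numbers $A_{11}$ and $A_{12}$ can be regarded as two entries in a $2 \times 3$ contingency table (affection status by genotype) with marginal totals $n_{11}$, $n_{12}$, $n_{22}$, $a$, and $u$. Therefore, the distribution of $(A_{11}, A_{12})$ can be obtained from the generalized hypergeometric distribution [18, page 47]. More specifically, we have
$$P(A_{11} = a_{11}, A_{12} = a_{12}) = \frac{\binom{n_{11}}{a_{11}} \binom{n_{12}}{a_{12}} \binom{n_{22}}{a - a_{11} - a_{12}}}{\binom{n}{a}},$$
and the distribution of $X = 2A_{11} + A_{12}$ follows by summing these probabilities over all $(a_{11}, a_{12})$ with $2a_{11} + a_{12} = x$. More formulas for parental marker genotype reconstruction probabilities under various patterns of missing genotypes and constraints, as well as detailed derivations of these formulas, can be found in Han [16].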
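The hypergeometric calculation for sibship-only families can be written in a few lines. The Python sketch below (hypothetical function names, exact fractions) enumerates the possible genotype counts among the affected sibs, accumulates the null distribution of $X = 2A_{11} + A_{12}$, and returns its mean and variance. These can be checked against the closed forms implied by sampling $a$ sibs without replacement, $E(X) = a(2n_{11} + n_{12})/n$ and $\mathrm{Var}(X) = \frac{au}{n^2(n-1)}\left[4n_{11}(n - n_{11}) - 4n_{11}n_{12} + n_{12}(n - n_{12})\right]$.

```python
from fractions import Fraction
from math import comb


def stdt_x_distribution(n11, n12, n22, a):
    """Null distribution of X = 2*A_11 + A_12 given the genotype totals n11, n12, n22
    and the number a of affected sibs: the affected sibs behave like a random subset
    of size a, so (A_11, A_12, A_22) is multivariate hypergeometric."""
    n = n11 + n12 + n22
    total = comb(n, a)
    dist = {}
    for a11 in range(min(n11, a) + 1):
        for a12 in range(min(n12, a - a11) + 1):
            a22 = a - a11 - a12
            if a22 > n22:
                continue  # not enough 22 sibs to fill the affected group
            p = Fraction(comb(n11, a11) * comb(n12, a12) * comb(n22, a22), total)
            x = 2 * a11 + a12
            dist[x] = dist.get(x, 0) + p
    return dist


def moments(dist):
    """Mean and variance of a finite distribution given as {value: probability}."""
    mean = sum(x * p for x, p in dist.items())
    return mean, sum((x - mean) ** 2 * p for x, p in dist.items())


# Sibship with two 11, one 12 and one 22 sib, two of whom are affected.
print(moments(stdt_x_distribution(2, 1, 1, 2)))  # (Fraction(5, 2), Fraction(11, 12))
```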
Acknowledgments
This research was partially supported by a Stony Wold-Herbert Foundation grant, the MPD Research Consortium Project Grant (1P01 CA108671), and the New York University Cancer Center Supporting Grant (2P30 CA16087) and by the NYU NIEHS Center Grant (5P30 ES00260). The research of JH was carried out as part of her Ph.D. dissertation work at New York University.
References
[1] Y. Shao, “Linkage Analysis,” in Encyclopedia of Quantitative Risk Analysis and Assessment, John Wiley & Sons, Hoboken, NJ, USA, 2008.
[2] J. Ott, Analysis of Human Genetic Linkage, Johns Hopkins University Press, 3rd edition, 1999.
[3] E. S. Lander and N. J. Schork, “Genetic dissection of complex traits,” Science, vol. 265, no. 5181, pp. 2037–2048, 1994.
[4] Y. Shao, “Adjustment for transmission heterogeneity in mapping complex genetic diseases using mixture models and score tests,” Proceedings of the American Statistical Association, pp. 383–393, 2005.
[5] S. H. Lo, X. Liu, and Y. Shao, “A marginal likelihood model for family-based data,” Annals of Human Genetics, vol. 67, no. 4, pp. 357–366, 2003.
[6] R. S. Spielman, R. E. McGinnis, and W. J. Ewens, “Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM),” American Journal of Human Genetics, vol. 52, no. 3, pp. 506–516, 1993.
[7] H. Zhao, “Family-based association studies,” Statistical Methods in Medical Research, vol. 9, no. 6, pp. 563–587, 2000.
[8] W. J. Ewens and R. S. Spielman, “The transmission/disequilibrium test,” in Handbook of Statistical Genetics, D. J. Balding, M. Bishop, and C. Cannings, Eds., John Wiley & Sons, 2nd edition, 2003.
[9] R. S. Spielman and W. J. Ewens, “A sibship test for linkage in the presence of association: the sib transmission/disequilibrium test,” American Journal of Human Genetics, vol. 62, no. 2, pp. 450–458, 1998.
[10] M. Knapp, “The transmission/disequilibrium test and parental-genotype reconstruction: the reconstruction-combined transmission/disequilibrium test,” American Journal of Human Genetics, vol. 64, no. 3, pp. 861–870, 1999.
[11] M. Knapp, “Using exact P values to compare the power between the reconstruction-combined transmission/disequilibrium test and the sib transmission/disequilibrium test,” American Journal of Human Genetics, vol. 65, no. 4, pp. 1208–1210, 1999.
[12] D. Curtis, “Use of siblings as controls in case-control association studies,” Annals of Human Genetics, vol. 61, no. 4, pp. 319–333, 1997.
[13] J. Huang and Y. Jiang, “Linkage detection adaptive to linkage disequilibrium: the disequilibrium maximum-likelihood-binomial test for affected-sibship data,” American Journal of Human Genetics, vol. 65, no. 6, pp. 1741–1759, 1999.
[14] W. C. Blackwelder and R. C. Elston, “A comparison of sib-pair linkage tests for disease susceptibility loci,” Genetic Epidemiology, vol. 2, no. 1, pp. 85–97, 1985.
[15] M. Boehnke and C. D. Langefeld, “Genetic association mapping based on discordant sib pairs: the discordant-alleles test,” American Journal of Human Genetics, vol. 62, no. 4, pp. 950–961, 1998.
[16] J. Han, Family-based linkage analysis allowing for missing parental information [Ph.D. thesis], New York University, 2005.
[17] X. Liu and Y. Shao, “Asymptotics for likelihood ratio tests under loss of identifiability,” The Annals of Statistics, vol. 31, no. 3, pp. 807–832, 2003.
[18] W. Feller, An Introduction to Probability Theory and Its Applications, vol. 1, John Wiley & Sons, New York, NY, USA, 3rd edition, 1968.
Copyright
Copyright © 2012 Jing Han and Yongzhao Shao. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.