Abstract

Statistical power is one of the major concerns in genetic association studies. Related individuals such as twins are valuable samples for genetic studies because of their genetic relatedness. Phenotypic similarity within twin pairs provides evidence of genetic control over phenotypic variation in a population. Genetic association studies of human longevity, a complex trait influenced by both genetic and environmental factors, have been hampered by the small numbers of long-lived subjects available, which limits statistical power. Twin pairs concordant for longevity have an increased probability of carrying beneficial genes and are thus useful samples for gene-longevity association analysis. We conducted a computer simulation to estimate the power of association studies using longevity-concordant twin pairs. We observed remarkable power increases when using singletons from longevity-concordant twin pairs as cases, compared with using sporadic long-lived individuals as cases. Fraternal twins concordant for longevity would require about twice the sample size of identical twins concordant for longevity to achieve similar power, suggesting that longevity-concordant identical twins are more efficient samples than fraternal twins. We also observed an approximately 2- to 3-fold increase in the sample size needed with a longevity cutoff at age 90 compared with a cutoff at age 95. Overall, our results show the high value of twins in genetic association studies of human longevity.

1. Introduction

Complex phenotypes such as human longevity are associated with multiple genetic and environmental factors, perhaps most of them with low to modest effects [1]. Statistical power has therefore been a crucial concern in genetic association studies. Although a desired power can always be reached by increasing the sample size, many factors, including laboratory cost, can easily limit the scale of a study. This is especially true for genomic analyses that are still expensive, such as genome sequencing. Twins are special samples that have made remarkable contributions to human genetic studies because they share both genes and environment. In genetic epidemiology, the classical twin design has been widely used to estimate the genetic and environmental components of variation in diseases and traits [2]. For example, using Danish twins, the genetic contribution to human longevity has been estimated at about 25% [3, 4]. The low heritability and complex nature of human longevity leave genetic association studies of the trait underpowered. In the literature, the search for genes associated with longevity has continued for many decades, and only one gene, APOE, has been conclusively confirmed.

Because of their genetic relatedness, twin pairs concordant for longevity are enriched for beneficial genes, so association studies using singletons from longevity-concordant twin pairs should have more power than studies using sporadic long-lived individuals. This paper uses computer simulation to assess the power advantage of longevity-concordant twin pairs. The simulation is based on a proportional hazards assumption and uses recent life table data for the Danish population. Lifespan data are generated for identical or monozygotic (MZ) and fraternal or dizygotic (DZ) twin pairs, and power is compared between zygosities and across different experimental setups.

2. Materials and Methods

2.1. Experiment Design

The most popular design for genetic association studies of human longevity is the case-control design, which samples long-lived individuals (e.g., centenarians and nonagenarians) as cases and young or middle-aged individuals as controls [5]. Power for the case-control design has been investigated by Tan et al. [6]. The current simulation study focuses on the power advantage of using singleton twins from twin pairs concordant for longevity as cases (Figure 1). That is, from each concordant twin pair in which both twins reach a given longevity threshold (90 or 95 years in this simulation), only one twin is taken as a case for genotyping. Controls are collected as in ordinary case-control studies, from unrelated individuals. With this design, the final data for analysis consist of unrelated cases and controls: cases are singletons from longevity-concordant twin pairs (one from each pair), and controls are unrelated individuals aged 40–50 years. The design draws equal numbers of cases and controls.
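The sampling scheme described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' simulation code: the data layout (each twin as a hypothetical `(lifespan, genotype)` tuple), the function names `select_cases` and `select_controls`, and the rule that a control is kept if still alive at a current age drawn uniformly from 40–50 years are all our own illustrative choices.

```python
import random


def select_cases(twin_pairs, cutoff, n_cases, rng):
    """Return one genotype per longevity-concordant twin pair.

    twin_pairs: list of ((lifespan_a, genotype_a), (lifespan_b, genotype_b)),
    where genotype is the number of minor alleles (0, 1 or 2). A pair is
    concordant when BOTH twins reached `cutoff`; one twin is then picked at random.
    """
    concordant = [pair for pair in twin_pairs
                  if pair[0][0] >= cutoff and pair[1][0] >= cutoff]
    return [rng.choice(pair)[1] for pair in concordant[:n_cases]]


def select_controls(individuals, n_controls, rng, age_range=(40, 50)):
    """Return genotypes of unrelated individuals alive in the control age range.

    Hypothetical rule: each individual gets a current age drawn uniformly from
    age_range and is kept if their simulated lifespan exceeds that age.
    """
    controls = []
    for lifespan, genotype in individuals:
        if len(controls) == n_controls:
            break
        if lifespan > rng.uniform(*age_range):
            controls.append(genotype)
    return controls


# Usage: equal numbers of cases and controls, as in the design.
rng = random.Random(1)
pairs = [((97, 1), (96, 1)), ((98, 0), (80, 0)), ((95, 2), (99, 2))]
cases = select_cases(pairs, cutoff=95, n_cases=10, rng=rng)
controls = select_controls([(70, 0), (45.0, 1), (85, 1)], len(cases), rng)
print(cases, controls)
```

Only the first and third toy pairs are concordant at the 95-year cutoff, so two cases are drawn and the control sampler stops once two controls have been collected.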

2.2. The Danish Life Table Data

Our simulation of individual lifespan data is based on Danish population survival information, that is, the Danish life table data available from Statistics Denmark (http://www.statbank.dk/). The current simulation used the most recent Danish life table data, for the period 2012-2013, with life expectancies of 78 years for males and 82 years for females. The simulation uses survival rates averaged over the two sexes. Using the observed population survival rate avoids imposing any parametric form on the survival function and thus ensures that the simulated survival data follow the survival distribution of a real population (i.e., the Danish population).
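Drawing lifespans directly from a tabulated survival curve, without a parametric form, can be done by inverse transform sampling. The Python sketch below is an illustration, not the study's code: the function names are hypothetical, the within-year linear interpolation (deaths spread uniformly within each year of age) is our assumption, the heterogeneity model mentioned later is omitted, and the toy table is made up rather than Danish data.

```python
import random


def average_sexes(surv_male, surv_female):
    """Sex-averaged survival curve (mean of male and female survival at each age)."""
    return [(m + f) / 2 for m, f in zip(surv_male, surv_female)]


def lifespan_from_u(surv, u):
    """Invert a life-table survival curve at probability u in [0, 1).

    surv[x] is the probability of surviving to exact age x, with surv[0] == 1
    and non-increasing values. Returns the age at which survival drops to u,
    assuming deaths are spread uniformly within each year (linear interpolation).
    """
    for age in range(1, len(surv)):
        if surv[age] <= u:
            upper, lower = surv[age - 1], surv[age]
            return age - 1 + (upper - u) / (upper - lower)
    return float(len(surv) - 1)  # u is below the last tabulated survival value


def simulate_lifespans(surv, n, rng):
    """Draw n lifespans with P(lifespan > x) = surv[x] (inverse transform sampling)."""
    return [lifespan_from_u(surv, rng.random()) for _ in range(n)]


# Usage with a toy, made-up life table for ages 0..3 (NOT Danish data).
toy = average_sexes([1.0, 0.88, 0.45, 0.0], [1.0, 0.92, 0.55, 0.0])
draws = simulate_lifespans(toy, 5, random.Random(42))
```

Because a uniform draw u satisfies P(S(x) > u) = S(x), the returned ages reproduce the tabulated survival curve exactly at whole years.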

2.3. The Proportional Hazard Model

For a given genetic variant, for example, a single nucleotide polymorphism (SNP), we assign a frequency parameter $q$ and a relative risk parameter $r$ to the minor allele of the SNP. The observed population survival rate at any age is then a weighted average over three subpopulations carrying 0, 1, and 2 copies of the minor allele, respectively [7], that is,

$$S(x) = p_0 S_0(x) + p_1 S_1(x) + p_2 S_2(x). \qquad (1)$$

In (1), $S(x)$, $S_0(x)$, $S_1(x)$, and $S_2(x)$ are the survival rates at age $x$ for the total population and for the three subpopulations; $p_0 = (1-q)^2$, $p_1 = 2q(1-q)$, and $p_2 = q^2$ are the frequencies of the corresponding genotypes, which follow a binomial distribution in the minor allele frequency (MAF) $q$. Under the proportional hazards assumption, the hazards of death corresponding to $S_1(x)$ and $S_2(x)$ can be expressed as $\mu_1(x) = r\,\mu_0(x)$ and $\mu_2(x) = r^2\mu_0(x)$, so that $S_1(x) = S_0(x)^{r}$ and, likewise, $S_2(x) = S_0(x)^{r^2}$. With these relationships and for a given MAF $q$ and relative risk $r$, (1) can be solved numerically to obtain a nonparametric estimate of the baseline survival $S_0(x)$; $S_1(x)$ and $S_2(x)$ can then be calculated and used to generate individual lifespans. In the simulation, we introduced a heterogeneity model for the baseline survival function to account for unobserved factors that also affect individual survival [8].
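Equation (1) can be solved separately at each age, because for $r > 0$ its right-hand side increases monotonically in the baseline value $S_0(x)$. The Python sketch below uses bisection for this step; bisection is our choice of root finder, not necessarily the method used in the study. We take the additive mode to mean $\mu_1 = r\mu_0$ and $\mu_2 = r^2\mu_0$, so that $S_g(x) = S_0(x)^{k_g}$ with multipliers $(1, r, r^2)$; the dominant $(1, r, r)$ and recessive $(1, 1, r)$ multipliers are our assumptions about how the other modes of inheritance would be encoded, and the heterogeneity model [8] is omitted.

```python
def genotype_freqs(q):
    """Hardy-Weinberg genotype frequencies (p0, p1, p2) for minor allele frequency q."""
    return (1 - q) ** 2, 2 * q * (1 - q), q ** 2


# Hazard multipliers for genotypes carrying 0, 1, 2 minor alleles.
# "additive" follows mu_1 = r*mu_0, mu_2 = r^2*mu_0; the other two are assumptions.
HAZARD_MULTIPLIERS = {
    "additive": lambda r: (1.0, r, r * r),
    "dominant": lambda r: (1.0, r, r),
    "recessive": lambda r: (1.0, 1.0, r),
}


def baseline_survival(s_pop, q, r, mode="additive", tol=1e-12):
    """Solve S(x) = p0*S0 + p1*S0**k1 + p2*S0**k2 for S0(x) at every age by bisection."""
    p = genotype_freqs(q)
    k = HAZARD_MULTIPLIERS[mode](r)

    def mixture(s0):
        return sum(pi * s0 ** ki for pi, ki in zip(p, k))

    s0_curve = []
    for s in s_pop:
        lo, hi = 0.0, 1.0  # mixture(0) = 0 <= s <= 1 = mixture(1)
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if mixture(mid) < s:
                lo = mid
            else:
                hi = mid
        s0_curve.append((lo + hi) / 2)
    return s0_curve


def genotype_survival(s0_curve, r, mode="additive"):
    """Return [S0, S1, S2] with S_g(x) = S0(x) ** k_g (proportional hazards)."""
    k = HAZARD_MULTIPLIERS[mode](r)
    return [[s0 ** kg for s0 in s0_curve] for kg in k]


# Usage: a protective allele (r = 0.9) with MAF 0.3 on a toy survival curve.
s_pop = [1.0, 0.8, 0.3, 0.0]
s0, s1, s2 = genotype_survival(baseline_survival(s_pop, q=0.3, r=0.9), r=0.9)
```

Plugging the three genotype curves back into (1) recovers the population curve, and with $r < 1$ carriers of two minor alleles survive better than non-carriers at every age.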

2.4. Simulating Lifespan

To simulate lifespans using the genotype-specific survival functions $S_0(x)$, $S_1(x)$, and $S_2(x)$, a genotype is randomly assigned to each individual with binomial probabilities determined by the MAF. For MZ twin pairs, this was first done for one twin, and the same genotype was then copied to the cotwin. For DZ twin pairs, we first simulated genotypes independently for each parent of the twin pair and then assigned each twin's genotype by randomly taking one allele from each parent. This ensures that, for each parent, the two twins have a 50% chance of inheriting the same allele, that is, an allele identical by descent (IBD). Lifespans for unrelated controls were simulated by randomly assigning each control subject a genotype from the binomial distribution of the minor allele. Subjects aged 40–50 years were selected as controls. We simulated power separately for cases from concordant MZ and DZ twin pairs in order to compare power between zygosities.
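The genotype assignment described above for MZ pairs, DZ pairs, and unrelated controls can be sketched in Python as follows. The function names are hypothetical, and random mating (parental alleles drawn independently with probability $q$) is assumed; each simulated genotype would then be paired with a lifespan drawn from the matching survival curve $S_0$, $S_1$, or $S_2$.

```python
import random


def random_genotype(q, rng):
    """Unrelated individual: number of minor alleles ~ Binomial(2, q)."""
    return int(rng.random() < q) + int(rng.random() < q)


def mz_pair(q, rng):
    """MZ twins: simulate one twin, then copy the genotype to the cotwin."""
    g = random_genotype(q, rng)
    return g, g


def dz_pair(q, rng):
    """DZ twins: simulate both parents, then give each twin one random allele per parent."""
    mother = [int(rng.random() < q), int(rng.random() < q)]  # 1 = minor allele
    father = [int(rng.random() < q), int(rng.random() < q)]

    def child():
        return rng.choice(mother) + rng.choice(father)

    return child(), child()


def concordance(pairs):
    """Fraction of pairs in which both members have the same genotype."""
    return sum(a == b for a, b in pairs) / len(pairs)


# Usage: genotype concordance within pairs for a common SNP (q = 0.5).
rng = random.Random(2024)
n = 20000
mz = [mz_pair(0.5, rng) for _ in range(n)]
dz = [dz_pair(0.5, rng) for _ in range(n)]
unrelated = [(random_genotype(0.5, rng), random_genotype(0.5, rng)) for _ in range(n)]
```

At q = 0.5 the expected genotype concordance is 1 for MZ pairs, 0.59375 for DZ pairs (sharing 2, 1, or 0 alleles IBD with probabilities 1/4, 1/2, 1/4), and 0.375 for unrelated pairs, so the simulated fractions should fall close to these values.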

2.5. Statistical Testing and Power Calculation

Because the samples collected in our simulation design are case-control samples, the popular Armitage trend test [9] was applied to the simulated samples in each replicate. A similar test was used in our previous power simulations for genetic association studies of human longevity [6, 10]. Power was simulated for different combinations of mode of inheritance (additive, dominant, and recessive), minor allele frequency ($q$), and relative risk of the allele ($r$), and for different numbers of cases ($n$). For each combination, 1000 replicates were simulated and the test was applied to each replicate. With the 1000 p values $p_i$ obtained from Armitage's trend test, the corresponding power was calculated as

$$\text{power} = \frac{1}{1000}\sum_{i=1}^{1000} I(p_i < \alpha),$$

where $I(\cdot)$ is an indicator function that equals 1 if its logical expression is true and 0 otherwise, and $\alpha$ is the significance level.
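A minimal Python sketch of the test and of the power formula follows. It implements the Cochran–Armitage trend test for a 2 × 3 genotype table with scores (0, 1, 2) and a normal approximation for the two-sided p value. The score choice, the function names, and the default α = 0.05 are our assumptions, since the text does not state them.

```python
import math
from collections import Counter


def armitage_trend_p(case_genotypes, control_genotypes, weights=(0, 1, 2)):
    """Two-sided p value of the Cochran-Armitage trend test (genotypes coded 0/1/2)."""
    rc, sc = Counter(case_genotypes), Counter(control_genotypes)
    r = [rc[g] for g in range(3)]            # case counts per genotype
    s = [sc[g] for g in range(3)]            # control counts per genotype
    R, S = sum(r), sum(s)
    N = R + S
    n = [r[i] + s[i] for i in range(3)]      # column totals
    t = sum(w * (r[i] * S - s[i] * R) for i, w in enumerate(weights))
    var = sum(w * w * n[i] * (N - n[i]) for i, w in enumerate(weights))
    var -= 2 * sum(weights[i] * weights[j] * n[i] * n[j]
                   for i in range(3) for j in range(i + 1, 3))
    var *= R * S / N
    if var <= 0:                             # degenerate table, e.g. one genotype only
        return 1.0
    z = t / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p value


def empirical_power(p_values, alpha=0.05):
    """Share of replicates with p < alpha, i.e. (1/B) * sum I(p_i < alpha)."""
    return sum(p < alpha for p in p_values) / len(p_values)


# Usage on toy data (not simulated lifespans).
p_same = armitage_trend_p([0, 1, 2, 1], [0, 1, 2, 1])
p_diff = armitage_trend_p([2] * 50, [0] * 50)
power = empirical_power([0.01, 0.2, 0.03, 0.5], alpha=0.05)
```

In a full simulation, `empirical_power` would receive the 1000 p values from one parameter combination; identical case and control genotype distributions give z = 0 and p = 1.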

3. Results

In Table 1, we show the power estimates for additive-effect SNP alleles under different combinations of genetic parameters (MAF and relative risk r), different sample sizes of cases from concordant MZ twins, and different longevity cutoffs. With 800 cases aged 95+, the design is able to identify a common SNP with a small effect, a 5% reduction in the rate of death (r = 0.95). With a small sample of 200 cases aged over 95 years, the design has good power (over 0.8) to detect a common SNP that reduces the hazard of death by 10% (r = 0.90), a SNP with lower MAF and a hazard reduction of 15% (r = 0.85), and a rare SNP with a hazard reduction of 20% (r = 0.80). A small sample of 100 cases aged 95+ is able to detect a common SNP with a hazard reduction of 15% (r = 0.85). When the longevity cutoff is set to 90 years, 500 to 800 cases are required to achieve comparable power, an increase of about threefold. The power for detecting dominant-effect SNPs using MZ cases (Table 2) is nearly comparable to that for additive-effect SNPs with low MAF in Table 1, although the difference grows with increasing MAF. Note that, for dominant-effect SNPs, power starts to decline as the MAF approaches 0.5. Power is greatly reduced for recessive-effect SNPs (Table 3). However, for high-MAF SNPs, the design has good power with 500 cases aged over 95 at r = …, and comparable power can be achieved with only 200 cases at r = …. Interestingly, when comparing power estimates between the two longevity cutoffs (90 and 95 years) for defining cases, we see that the 90+ cutoff needs 2 to 3 times larger sample sizes than the 95+ cutoff to obtain comparable power, a conclusion that applies to Tables 1 to 3.

Tables 4, 5, and 6 give power estimates for DZ twins under parameter settings similar to those in Tables 1–3, except that we added a larger sample size of 1500 cases, considering that concordant DZ twin pairs are easier to sample than concordant MZ pairs. The power estimates for DZ twins show a pattern similar to that for MZ twins, but a 2- to 3-fold increase in sample size is required to obtain comparable power for the corresponding settings in Tables 1–3.

4. Discussion

Using computer simulation, we have estimated the statistical power of a case-control design using singleton cases from twin pairs concordant for longevity. Unlike ordinary case-control studies, which collect sporadic centenarians as cases, we limit our longevity cutoffs to 90 and 95 years because twin pairs concordant for extreme longevity are rare. It is interesting to compare our power estimates with those from our previous simulation study of the ordinary case-control design with sporadic nontwin centenarians as cases [6]. Even with the longevity threshold lowered to age 95, the concordant MZ twin design achieves power equivalent to that of the ordinary case-control design with centenarian cases at similar or even smaller sample sizes (compare Tables 1–3 with Tables 1–3 in Tan et al. [6]). With an age cutoff of 90 years, our power estimates can be compared with those from our previous simulation, which also estimated power using nonagenarians as cases (Tables 4–6 in Tan et al. [6]). For comparable power, the case-control design with sporadic nonagenarians as cases would need much larger sample sizes than the design using nonagenarian cases from concordant twin pairs (3- to 4-fold for MZ and about 2-fold for DZ twins). Overall, our results reveal a remarkable power advantage of longevity-concordant twins over the ordinary case-control design.

Comparing the power estimates in Tables 1–3 with those in Tables 4–6, we observe a power advantage of MZ cases over DZ cases, with the latter requiring almost double the sample size to reach equivalent power. Although they have lower power, DZ twin pairs share genes to the same degree as sibling pairs, meaning that, in practice, concordant DZ twins can be replaced by concordant sibling pairs, making sample collection easier and more feasible. On the other hand, when laboratory cost for genotyping is a major concern (as in genome-wide analysis or next-generation sequencing), MZ cases are the best choice, as they maintain good power with the smallest sample sizes.

Another interesting finding is the power difference between the two age cutoffs. For both MZ and DZ twins, approximately 2 to 3 times larger sample sizes are needed for cases aged 90+ than for cases aged 95+. For example, for an additive-effect allele with a given MAF and relative risk, the power for 300 cases aged 95+ is equivalent to that for 800 cases aged 90+ in both Tables 1 and 2. The large difference in power is understandable given the very strong survival selection operating in this age interval. Survival data from the Danish 1905 birth cohort showed that the chance of surviving from birth to age 92 was equal to the chance of surviving from age 92 to age 100 [11]. As a trade-off for the power advantage, this extremely strong survival selection also makes it harder to collect concordant twin pairs aged 95+. A study design should therefore balance sampling feasibility, power, and age cutoff.
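The cohort observation can be restated in terms of the survival function $S(x)$, the probability of surviving from birth to age $x$. Surviving from age 92 to age 100 has conditional probability $S(100)/S(92)$, so equal chances imply the relation below; the numerical instance is illustrative arithmetic only, not the cohort's actual values.

```latex
S(92) = \frac{S(100)}{S(92)}
\quad\Longrightarrow\quad
S(100) = S(92)^{2},
\qquad \text{e.g. } S(92) = 0.1 \;\Rightarrow\; S(100) = 0.01 .
```

In other words, the eight years from 92 to 100 remove as large a fraction of survivors as the first 92 years of life, which illustrates how strongly selected cases at the highest age cutoffs are.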

The power advantage of using singleton cases from longevity-concordant twin pairs is due purely to their increased likelihood of carrying longevity-associated genetic variants. For the same sample size, this design has the same genotyping cost as an ordinary case-control study but much higher power; in other words, acceptable power can be achieved at lower cost. This is especially important because current genomic techniques, such as microarrays and next-generation sequencing, are still expensive. Although this study focuses only on the power advantage of twins in longevity studies, our results should apply similarly to human disease studies that use disease-concordant twins as cases. Moreover, although the current study focuses on the advantage of concordant MZ twins in gene-longevity association studies, MZ twin pairs discordant for longevity or disease are also useful samples for identifying environmental factors. Taken together, twins are unique samples with good potential to contribute to molecular genetic studies of complex human diseases and traits.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This study was supported by Lundbeck Foundation research Grant Vort j. nr. R100-A9716. The authors thank Weilong Li, a student assistant at the Department of Mathematics and Computer Science, University of Southern Denmark, for excellent assistance with the computer simulation work and for technical support.