Abstract

Determining the meiosis I nondisjunction fraction plays an important role in identifying the characteristics of affected individuals and their mothers that can generate aneuploidies. The numbers of individuals with one-, two-, and three-peak patterns are used to obtain this information; however, these data are susceptible to misclassification. We review a misclassification model previously introduced in the literature, which assumes a common misclassification error. This paper introduces a joint prior distribution for the meiosis I nondisjunction fraction and the misclassification error. We prove that the reference prior is a proper distribution. We analyze a Brazilian Down syndrome dataset and compare the results with those obtained through Bayes-Laplace and beta prior distributions.

1. Introduction

In humans, aneuploidies are common causes of mental retardation, pregnancy losses, and fetal death. Although the causes of aneuploidies are unknown, the risk of having children with some kind of aneuploidy, such as trisomy 21 (Down syndrome), 18 (Edwards syndrome), or 13 (Patau syndrome), is known to increase with the mother’s age [1].

Trisomy 21 is the most prevalent human genetic disorder and occurs in approximately 1 out of 700 births. It is the most common cause of mental retardation of genetic origin. Down syndrome affects the cognitive abilities of the child; approximately half of affected children also have congenital heart defects and problems with hearing and vision, and they are prone to developing pulmonary hypertension. The causes of Down syndrome are unknown, but there is evidence that in trisomy of chromosome 21 the rate of non-disjunction increases with the age of the mother [2]. Women aged 35 or older have a significantly higher risk of having a child with Down syndrome. In addition, for mothers between 35 and 39 years old, the increase in the rate of non-disjunction is higher in meiosis II than in meiosis I. As a result, determining the fraction of non-disjunction in chromosomal segregation that takes place in meiosis I, for each chromosome, plays an important role in understanding aneuploidies. It is useful for identifying possible factors generating such abnormalities, for example, geography, nutrition, age, and reproductive practices. Prenatal diagnosis of aneuploidies is usually done by chromosome karyotyping, fluorescent in situ hybridization (FISH), and polymerase chain reaction (PCR)-based approaches; see [1, 3–6] for further details.

Methods to estimate this fraction using information from the affected children and their parents are presented in [7–15], among others. More recently, Bayesian and classical approaches to inference about the fraction, assuming models that do not take the parental information into account, are presented in [2, 16].

The model proposed in [16] considers that, using PCR, it is possible to type microsatellites located near the chromosomal centromere (to avoid problems due to recombination) through primers designed from the unique DNA sequence flanking the tandem repeat arrays, followed by quantitative analysis by computer-assisted laser densitometry [17]. Trisomic patients will display, in informative microsatellite loci, three fragment peaks of equal intensity, two fragments at an average 2 : 1 dosage, or one individual fragment (see an example of classification in Figure 1). The relative proportion of the three cases depends on the type of non-disjunction, although the heterozygosity level is also important. For a three-allele pattern to emerge, the non-disjunction must occur in the first meiotic division, the mother must be heterozygous at the relevant locus, and the allele carried by the sperm must differ from the two maternal alleles. A two-allele pattern can be observed when the non-disjunction occurs in either the first or the second meiotic division, depending on the combination of the chromosomes transmitted by the parents. A one-peak pattern occurs if the parents are homozygous for the allele inherited; see details in [9]. A scheme for non-disjunction in the meiotic process can be found in [16].

The model proposed in [16] brings some novelty to the analysis of trisomies in the sense that the parental information is not taken into account in the modeling. This makes the use of archive material possible, which is important in the study of rare trisomies. However, since other peaks can also be observed as a consequence of residuals generated by the preparation of the genetic material (see Figure 1), misclassification can occur and thus the non-disjunction fraction may be poorly estimated. Extensions to the model in [16] were introduced in [18, 19] to accommodate misclassification; these misclassification models include the one in [16] as a particular case. In both papers, the inference is developed under the Bayesian paradigm using beta prior distributions for the non-disjunction fraction and the misclassification error.

The specification of priors makes it possible to incorporate scientific hypotheses into the analysis and, consequently, allows us to handle complex problems and situations in which little sample information is available. Reference (or non-informative) priors are frequently used to describe prior uncertainty about a parameter. There are several methods to construct non-informative priors; we consider Jeffreys’ approach, which is widely used in the literature. The idea behind the Jeffreys prior is to provide as little prior information as possible, relative to the sample information.

In this study, we review the misclassification model for trisomies presented in [18]. The main goals are to obtain the joint Jeffreys prior for the fraction of non-disjunction taking place in meiosis I and the misclassification error, and to prove that the Jeffreys prior is proper. We implement a Metropolis-Hastings algorithm to sample from the posterior distributions. A case study is developed to analyze the sample of Brazilian individuals with Down syndrome reported in [16]. We compare the results obtained from the Jeffreys prior suggested here with those obtained in [18] via Bayes-Laplace priors and those using a similar approach under the model proposed in [16].

This paper is organized as follows. In Section 2, we briefly present the misclassification model for trisomies according to [18], build the joint Jeffreys prior for its two parameters, and present some of its properties. In Section 3, we apply the proposed method to analyze a real dataset and compare the results with those obtained in the literature. Finally, Section 4 presents the main conclusions.

2. Model Description and Inference

Here, we present the misclassification model introduced in [18] and build the Jeffreys prior for the parameters indexing the model. We also review the model proposed in [16] which ignores the presence of misclassification.

In order to build the model, we assume that the hypothesis of Hardy-Weinberg equilibrium [20] has been verified for the population; thus, we can treat as known the relative frequency of each allele in a multiallelic microsatellite locus.

Let be the number of individuals with peaks pattern, , observed in a sample of trisomic individuals; define . Denote by the probability of being , , the true number of peaks in the microsatellite locus of interest. As proved in [16], depends on the fraction of non-disjunction taking place in Meiosis I as follows: where , for all for all , and for all . Note that, under this notation, we have and .

2.1. The Misclassification Model

In order to define the misclassification model, we also consider the auxiliary random variables and denoting the true (nonobserved) and the observed number of peaks in a trisomic individual, respectively. In addition, denote by the probability of being , , the observed number of peaks in the microsatellite locus of interest.

Let for , , be the probability of misclassifying an individual. It follows from probability calculus that the vector of probabilities is given by , where and is the () matrix below Since , for each , it can be proved that . As a consequence of the previous assumptions, paper [18] establishes that ~multinomial whose probability function is with . An identifiable model is obtained by assuming equal probabilities of misclassification, that is, As a result, the likelihood function in (3) becomes where and .
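The mixing of true and observed peak probabilities described above can be illustrated numerically. The following is a minimal sketch, not the formulation of [18]: it assumes that the equal-error constraint means every off-diagonal entry of the 3 x 3 misclassification matrix equals a common value (called `alpha` here) and every diagonal entry equals `1 - 2 * alpha`. The function names and the true-probability vector `q` are hypothetical and chosen only for illustration.

```python
def misclassification_matrix(alpha):
    """Assumed 3x3 matrix: entry (i, j) is P(observed = j+1 | true = i+1)."""
    if not 0.0 <= alpha <= 0.5:
        raise ValueError("alpha must lie in [0, 0.5] so each row is a distribution")
    return [[1 - 2 * alpha if i == j else alpha for j in range(3)]
            for i in range(3)]


def observed_probs(true_probs, alpha):
    """Probabilities of observing 1, 2, 3 peaks given the true-pattern probabilities."""
    m = misclassification_matrix(alpha)
    return [sum(true_probs[i] * m[i][j] for i in range(3)) for j in range(3)]


# Illustrative true probabilities (made up, not taken from the paper)
q = [0.2, 0.6, 0.2]
p = observed_probs(q, alpha=0.05)
print(p, sum(p))
```

Under this assumed structure each observed probability reduces to `alpha + (1 - 3 * alpha) * q[j]`, so setting `alpha = 0` recovers the model without misclassification.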

Results in [18] are obtained under the Bayesian paradigm. Because some information about the non-disjunction fraction is available in the literature for other populations, this information was used to build more informative beta prior distributions for it. In the literature, a uniform prior is assumed for the misclassification error, and the error is considered independent of the non-disjunction fraction a priori. Although the beta family is very rich in form and can represent many different prior opinions well, some researchers prefer to perform an objective analysis. In the following, we consider the Jeffreys approach to build non-informative prior distributions.

2.2. The Jeffreys Prior for the Model Parameters

In this section, we introduce the joint Jeffreys prior for the parameters induced by the model in (5). We consider the two parameters to be independent a priori and obtain the marginal Jeffreys prior for one parameter while holding the other fixed. The Jeffreys prior for the other parameter is determined using the same strategy. This approach to calculating the Jeffreys prior in a multiparameter scenario is quite common and avoids some interpretation issues with the posterior results that usually arise when the full Fisher information matrix is used to build the Jeffreys prior in a multiparameter model.

Consider again the notation in (1); note that by fixing and taking the derivative of (5) with respect to , it follows that the Fisher information is Since our interest lies on constructing the prior distribution for , the terms , , are assumed as constants and they do not carry information about . As a result, the marginal Jeffreys prior for is for all and any constant values , such that . Note that if for all , we have Therefore, is a proper distribution. If for some the propriety of is also verified. As can be seen, if we assume , for instance, it follows that Consequently, there is a constant such that for all and . As a result, we have which guarantees the propriety of .

Similarly, if we fix and calculate the derivative of (5) with respect to , the Fisher information is given by Thus, since the terms in (11), depending on , do not carry any information about , the Jeffreys prior for is where and . Similar to what was observed for the prior distribution of , it can be proved that the prior for is proper.
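For orientation, the general identity behind one-parameter Jeffreys priors for multinomial data can be written as follows. This is a textbook result stated under assumed notation, not a reconstruction of the paper's expressions: $\phi$ denotes the single free parameter, $p_k(\phi)$ the cell probabilities, $K$ the number of cells, and $n$ the sample size. The priors derived in this section additionally treat some terms as constants, so their exact form may differ.

```latex
% Generic one-parameter multinomial model; \phi, p_k, K are illustrative symbols
I(\phi) \;=\; n \sum_{k=1}^{K}
  \frac{\bigl(\partial p_k(\phi)/\partial \phi\bigr)^{2}}{p_k(\phi)},
\qquad
\pi_J(\phi) \;\propto\; \sqrt{I(\phi)} .
```

The second-derivative terms vanish in the expectation because the cell probabilities sum to one for every value of $\phi$.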

Since the two parameters are assumed to be independent, the joint Jeffreys prior is obtained by multiplying expressions (7) and (12). Figure 2 presents the marginal Jeffreys prior of each parameter for three different values of the other parameter. Figure 3 shows a three-dimensional surface plot of the joint Jeffreys prior. Note that the three priors are decreasing functions and tend to infinity as the parameters tend to zero. However, since both marginal priors are proper distributions, the joint prior is also proper.

The joint posterior of the two parameters is obtained via Bayes’ theorem using the likelihood function in (5) and the priors given in (7) and (12). The posterior distribution cannot be obtained analytically; we use a Metropolis-Hastings algorithm to sample from it.

2.3. Model in Franco et al.

The model in [16] does not consider the misclassification errors that might occur when data are obtained. Conditional on the non-disjunction fraction, [16] shows that the vector of counts of individuals with one, two, and three peaks has a multinomial distribution whose cell probabilities are those indicated in (1). Note that this model is a particular case of the model presented in (5) whenever the misclassification error is assumed to be zero. As observed for the misclassification model, the posterior distribution of the non-disjunction fraction cannot be evaluated analytically even when a Bayes-Laplace prior distribution is assumed to describe the uncertainty about it. The posterior can be approximated through numerical algorithms [21, 22] or Monte Carlo methods. In Section 3, we show results related to this model which were obtained in [19].
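The multinomial probability function underlying this model can be evaluated as in the sketch below. The function name `multinomial_log_pmf` is hypothetical, and the probability vector used in the example is simply the vector of observed proportions for the counts reported in Section 3; it is not derived from equation (1).

```python
import math


def multinomial_log_pmf(counts, probs):
    """Log of the multinomial probability function for the given cell counts."""
    n = sum(counts)
    # Log multinomial coefficient n! / (n_1! n_2! ... n_K!)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    log_kernel = 0.0
    for c, p in zip(counts, probs):
        if c > 0:
            if p <= 0:
                return -math.inf  # a positive count in a zero-probability cell
            log_kernel += c * math.log(p)
    return log_coef + log_kernel


counts = (6, 22, 6)  # one-, two-, and three-peak counts from the case study
probs = [c / sum(counts) for c in counts]  # illustrative: observed proportions
print(multinomial_log_pmf(counts, probs))
```

Working on the log scale avoids overflow in the factorials and is the form usually needed inside a sampler.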

3. Case Study: Down Syndrome Data

Here, we explore the dataset reported in [16] consisting of a random sample of blood from 34 Brazilian individuals with trisomy of chromosome 21. In this data set, the observed numbers of patients with one, two, and three peaks are 6, 22, and 6, respectively. The hypothesis of the Hardy-Weinberg equilibrium is verified for the Brazilian population and six alleles are found with frequencies 0.12, 0.45, 0.09, 0.31, 0.01, and 0.02.
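Since the heterozygosity level affects the peak patterns, it may help to see what the reported allele frequencies imply. The sketch below computes the expected homozygosity and heterozygosity at the locus under Hardy-Weinberg equilibrium, using the standard sum-of-squares formula; the function names are hypothetical, and this calculation is only an illustration, not the paper's equation (1).

```python
# Allele frequencies reported for the Brazilian population
ALLELE_FREQS = [0.12, 0.45, 0.09, 0.31, 0.01, 0.02]


def expected_homozygosity(freqs):
    """Under Hardy-Weinberg equilibrium, P(two copies of the same allele)."""
    return sum(f * f for f in freqs)


def expected_heterozygosity(freqs):
    """Under Hardy-Weinberg equilibrium, P(two different alleles)."""
    return 1.0 - expected_homozygosity(freqs)


# The reported frequencies should form a probability distribution
assert abs(sum(ALLELE_FREQS) - 1.0) < 1e-9
print(round(expected_heterozygosity(ALLELE_FREQS), 4))
```

With the six frequencies above, the expected heterozygosity is about 0.68, so roughly two out of three individuals are expected to be heterozygous at this locus.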

For the Markov chain Monte Carlo (MCMC) algorithm, we run a chain of 10,000 iterations and discard the first 5,000 as a burn-in period to allow the chain to converge; as a result, the final sample from the posterior distribution has 5,000 observations. We use independent uniform distributions to generate the candidates required in the sampling process.
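A sampler of this kind can be sketched as follows. This is an illustration under several loudly stated assumptions, not the authors' implementation: `true_probs` is a made-up stand-in for equation (1), the prior is flat rather than the Jeffreys prior, the common error `alpha` is restricted to (0, 0.5), the observed probabilities are taken to be `alpha + (1 - 3 * alpha) * q`, and both parameters are updated jointly. Only the counts, the chain length, the burn-in, and the use of independent uniform candidates come from the text.

```python
import math
import random

COUNTS = (6, 22, 6)  # one-, two-, and three-peak counts from the case study


def true_probs(theta):
    """HYPOTHETICAL stand-in for equation (1); the numbers below are made up."""
    meiosis_1 = (0.1, 0.5, 0.4)  # invented pattern probabilities, meiosis I error
    meiosis_2 = (0.3, 0.7, 0.0)  # invented pattern probabilities, meiosis II error
    return [theta * a + (1 - theta) * b for a, b in zip(meiosis_1, meiosis_2)]


def log_posterior(theta, alpha, counts=COUNTS):
    """Multinomial log-kernel under an assumed flat prior, up to a constant."""
    probs = [alpha + (1 - 3 * alpha) * q for q in true_probs(theta)]
    if min(probs) <= 0:
        return -math.inf
    return sum(n * math.log(p) for n, p in zip(counts, probs))


def independence_mh(n_iter=10_000, burn_in=5_000, seed=0):
    """Metropolis-Hastings with independent uniform candidates."""
    rng = random.Random(seed)
    theta, alpha = 0.5, 0.1
    current = log_posterior(theta, alpha)
    draws = []
    for it in range(n_iter):
        cand_theta = rng.random()            # Uniform(0, 1) candidate
        cand_alpha = rng.uniform(0.0, 0.5)   # Uniform(0, 0.5) candidate (assumed range)
        cand = log_posterior(cand_theta, cand_alpha)
        # Uniform proposal densities cancel, leaving the posterior ratio
        if rng.random() < math.exp(min(0.0, cand - current)):
            theta, alpha, current = cand_theta, cand_alpha, cand
        if it >= burn_in:
            draws.append((theta, alpha))
    return draws


draws = independence_mh()
print(len(draws))
```

Replacing `true_probs` with the actual expressions in (1) and adding the log of the priors in (7) and (12) to `log_posterior` would adapt the sketch to the model studied here.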

Tables 1 and 2 present the posterior summaries for the non-disjunction fraction and the misclassification error, respectively, assuming the misclassification model described in Section 2. We consider two non-informative priors for the pair of parameters: the joint Jeffreys prior given in Section 2 and the Bayes-Laplace prior assumed in [18]. We also compare these results with those obtained by fitting Franco et al.’s model assuming the Bayes-Laplace prior for the non-disjunction fraction.

All posterior estimates for the non-disjunction fraction (Table 1) are smaller than those observed in the literature for groups of patients with Down syndrome (see [7, 10–15, 21]); the literature results have mean 0.6803 and standard deviation 0.0678. As can be seen, the estimates under the misclassification model are also smaller than those obtained for the model proposed in [16]; the maximum likelihood estimate (MLE) calculated in [16] is 0.6551 for the same data set. Comparing the misclassification model with Franco et al.’s model, we find that the uncertainty about the non-disjunction fraction is higher for the model including misclassification. We also observe that the posterior estimates tend to be smaller and the variance tends to be larger when we assume the Jeffreys prior. The posterior distributions of the misclassification error (Table 2) suggest that this error is very small. The posterior mean, median, mode, and uncertainty for the misclassification error under the Jeffreys prior are smaller than those obtained via the Bayes-Laplace prior distribution.

Figure 4 presents two histograms representing the posterior distributions of the non-disjunction fraction and the misclassification error under the proposed model. In both graphs, the distribution has a unique mode and an asymmetric shape. The posterior distribution of the non-disjunction fraction indicates higher probability mass for values around 0.5. Furthermore, with 95% probability, the meiosis I non-disjunction fraction belongs to the interval . The posterior distribution of the misclassification error puts most of its probability mass on values close to zero and, with posterior probability 97.5%, the misclassification error is smaller than 0.1614.
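Summaries such as those quoted above are typically computed directly from the retained MCMC draws. The sketch below shows one way to do so; the function `summarize` is hypothetical, the draws are simulated from a beta distribution purely as a stand-in for sampler output, and the interval is equal-tailed, which may differ from the interval type used in the tables.

```python
import random
import statistics


def summarize(draws):
    """Posterior mean, median, and equal-tailed 95% interval from a list of draws."""
    cuts = statistics.quantiles(draws, n=40)  # 39 cut points in steps of 2.5%
    return {
        "mean": statistics.fmean(draws),
        "median": statistics.median(draws),
        "interval_95": (cuts[0], cuts[-1]),  # 2.5% and 97.5% quantiles
    }


rng = random.Random(1)
# Stand-in for 5,000 retained posterior draws (simulated, not real results)
fake_draws = [rng.betavariate(5, 5) for _ in range(5000)]
summary = summarize(fake_draws)
print(summary)
```

The upper endpoint of `interval_95` is the 97.5% quantile, which is the kind of one-sided bound reported above for the misclassification error.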

4. Final Comments

In this paper, we provide a reference analysis for the misclassification model introduced in [18] to describe the uncertainty about the meiosis I non-disjunction fraction in patients with trisomy. We obtained the Jeffreys prior assuming independence between the non-disjunction fraction and the misclassification error, and we built a Metropolis-Hastings algorithm to sample from the posterior distributions. The real data application, illustrating the use of the methodology, involves a Brazilian Down syndrome data set.

In our analysis, the Jeffreys prior yields more uncertainty in the estimation of the non-disjunction fraction than the Bayes-Laplace prior, but its use leads to a more precise estimate of the misclassification error. In addition, we can see that, for the Brazilian population, Down syndrome is often a consequence of a non-disjunction in meiosis I.

The use of the Jeffreys prior and other non-informative priors permits us to fairly compare Bayesian and classical approaches to inference. It is still an open problem to find the maximum likelihood estimator under the misclassification model discussed in this paper and the other extensions discussed in [19].

Acknowledgments

The authors would like to express their gratitude to the editors for the invitation. We also thank Professor Sergio D. J. Pena and Flavia C. Parra (UFMG) for providing the dataset. R. H. Loschi has her research supported in part by CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico) of the Ministry for Science and Technology of Brazil, Grants 306085/2009-7, 304505/2006-4, 472877/2006-2. V. D. Mayrink has received financial support from the Universidade Federal de Minas Gerais through the program Doutores Recém-Contratados of the PRPq (Pro-Reitoria de Pesquisa).