Abstract

To incorporate biologically observed epidemics into multistage models of carcinogenesis, we have developed new stochastic models for human cancers. We have further incorporated genetic segregation of cancer genes into these models to derive generalized mixture models for cancer incidence. Based on these models, we have developed a generalized Bayesian approach to estimate the parameters and to predict cancer incidence via Gibbs sampling procedures. We have applied these models to fit and analyze the SEER data of human eye cancers from NCI/NIH. Our results indicate that the models not only provide a logical avenue to incorporate biological information but also fit the data much better than other models. These models would not only provide more insight into human cancers but would also provide useful guidance for their prevention and control and for the prediction of future cancer cases.

1. Introduction

It is universally recognized that each cancer tumor develops through stochastic proliferation and differentiation from a single stem cell which has sustained a series of irreversible genetic and/or epigenetic changes (Little [1]; Tan [2, 3]; Tan et al. [4, 5]; Weinberg [6]; Zheng [7]). That is, carcinogenesis is a stochastic multistage process with intermediate cells subject to stochastic proliferation and differentiation. Furthermore, the number of stages and the number of pathways of the carcinogenesis process are significantly influenced by environmental factors underlying the individuals (Tan et al. [4, 5]; Weinberg [6]).

Another important observation in human carcinogenesis is that most human cancers cluster around family members. Further, many cancer incidence data (such as the SEER data of NCI/NIH, USA) have documented that some cancers develop during pregnancy, so that some newborn babies have cancer at birth. These are referred to as pediatric cancers. Well-known examples of pediatric cancers include retinoblastoma (a pediatric eye cancer), hepatoblastoma (a pediatric liver cancer), Wilms' tumor (a pediatric kidney cancer), and medulloblastoma (a pediatric brain tumor). Epidemiological and clinical studies in oncology have also revealed that inherited cancers are very common in many adult human cancers, including lung cancer, colon cancer [8], uveal melanoma (adult eye cancer, [9]), and adult liver cancer (HCC, [10]).

Given the above results from cancer biology and human cancer epidemiology, the objective of this paper is to illustrate how to develop stochastic models of carcinogenesis incorporating these biological and epidemiological observations. Based on these models and cancer incidence data, we will then proceed to develop efficient statistical procedures to estimate unknown parameters in the model, to validate the model, and to predict cancer incidence.

In Section 2, we illustrate how to incorporate segregation of cancer genes in multistage stochastic models of carcinogenesis to account for inherited cancer cases. In Section 3, we develop stochastic equations for the state variables of the model described in Section 2. By using these stochastic equations we derive the probability distributions of the state variables (i.e., the number of intermediate cancer cells) and the probability distribution of time to detectable cancer tumors. In Section 4, assuming that we have some cancer incidence data such as the SEER data from NCI/NIH, we develop statistical models for these data from these multistage models of carcinogenesis. In Section 5, by combining the models in Sections 2–4, we develop a generalized Bayesian inference and Gibbs sampling procedure to estimate the unknown parameters, to validate the model, and to predict cancer incidence. As an example of application, in Section 6 we develop a multistage model of human eye cancer with inherited cancer cases as described in Figure 2, and illustrate the model and methods by analyzing the SEER data of human eye cancer from NCI/NIH. Finally, in Section 7, we discuss the usefulness of the model and the methods developed in this paper and point out some future research directions.

2. The Stochastic Multistage Model of Carcinogenesis with Inherited Cancer Cases

The $k$-stage multistage model of carcinogenesis views carcinogenesis as the end point of $k$ ($k \ge 2$) discrete, heritable, and irreversible events (mutations, genetic changes, or epigenetic changes) with intermediate cells subject to stochastic proliferation and differentiation (Little [1]; Tan [2, 3]; Tan et al. [4, 5]; Weinberg [6]). Let $N = I_0$ denote normal stem cells, $T$ the cancer tumors, and $I_j$ the $j$th stage initiated cells arising from the $(j-1)$th stage initiated cells ($j = 1, \ldots, k$) by some genetic and/or epigenetic changes. Then the model assumes $N \to I_1 \to I_2 \to \cdots \to I_k$ with the $I_j$ cells subject to stochastic proliferation (birth) and differentiation (death). Further, it assumes that each stem cell proceeds independently of other cells and that cancer tumors develop from primary $I_k$ cells by clonal expansion (stochastic birth and death), where primary $I_k$ cells are $I_k$ cells which arise directly from $I_{k-1}$ cells; see Yang and Chen [11].

For example, Figure 1 is a multistage pathway for squamous NSCLC (non-small cell lung cancer) as proposed by Osada and Takahashi [12] and Wistuba et al. [13]. Similarly, Figure 2 is the multistage model for uveal melanoma proposed by Landreville et al. [14] and Mensink et al. [15], while Figure 3 is the APC-β-catenin-Tcf pathway for human colon cancer (Tan et al. [8]; Tan and Yan [16]).

Remark 1. To develop stochastic multistage models of carcinogenesis, in the literature (Little [1], Tan [2], Zheng [7]) it is conveniently assumed that the cells grow instantaneously into cancer tumors as soon as they are generated. In this case, the number of tumors is equal to the number of cells and one may identify cells as tumors. It follows that the number of tumors is a Markov process and that the cells are transient cells. In these cases, one needs only to deal with and cells with . However, as shown by Yang and Chen [11], the number of tumors is much smaller than the total number of cells. Also, in many animal models and in cancer risk assessment of radiation, Klebanov et al. [17], Yakovlev and Tsodikov [18], and Fakir et al. [19] have shown that are in general not Markov.
To extend the above model to include hereditary cancers, observe that mutants of cancer genes exist in the population and that both germline cells (egg and sperm) and somatic cells may carry mutant alleles of cancer genes [2, 20]. Further, without exception, every human being develops from an embryo in the mother's womb (the embryo stage, at which time is denoted by 0), where the stem cells of different organs divide and differentiate to develop the respective organs (see Weinberg [6], Chapter 10). If both the egg and the sperm generating the embryo carry mutant alleles of the relevant cancer genes, then the individual is an $I_2$-stage person at the embryo stage; if only one of the germline cells (egg or sperm) generating the embryo carries mutant alleles of cancer genes, then at the embryo stage the individual is an $I_1$-stage person. Similarly, the individual is a normal person ($N = I_0$ person) at the embryo stage if neither the egg nor the sperm generating the embryo carries mutant alleles of cancer genes. Refer to a person in the population as an $I_i$ person if he/she is an $I_i$-stage person at the embryo stage. Then, with respect to the cancer in question, people in the population can be classified into 3 types: normal people ($I_0$ people), $I_1$ people, and $I_2$ people. Based on this classification, for normal people the stochastic model of carcinogenesis is a $k$-stage multievent model given by $N \to I_1 \to \cdots \to I_k$; for $I_1$ people it is a $(k-1)$-stage multievent model given by $I_1 \to I_2 \to \cdots \to I_k$; and for $I_2$ people it is a $(k-2)$-stage multievent model given by $I_2 \to I_3 \to \cdots \to I_k$.
To account for inherited cancer cases, let $p_1$ be the proportion of $I_1$ people in the population and $p_2$ the proportion of $I_2$ people in the population. In large human populations under steady-state conditions, one may practically assume that each $p_i$ is a constant independent of time (Crow and Kimura [21]). Then $p_0 = 1 - p_1 - p_2$ is the proportion of normal people (i.e., $I_0$ people) in the population. Let $n$ be the population size and $n_i$ the number of $I_i$ people in the population ($i = 0, 1, 2$), so that $n_0 + n_1 + n_2 = n$. Assume that $n$ is very large and that marriage between people in the population is random with respect to cancer genes; then, as shown in Crow and Kimura [21] (see also Tan [22], Chapter 2), the conditional probability distribution of $(n_1, n_2)$ given $n$ is 2-dimensional multinomial with parameters $(n; p_1, p_2)$. That is,
$$P(n_1, n_2 \mid n) = \frac{n!}{n_0!\, n_1!\, n_2!}\, p_0^{n_0}\, p_1^{n_1}\, p_2^{n_2}, \qquad n_0 = n - n_1 - n_2. \tag{1}$$
To derive the probability distribution of time to cancer under the above model, observe that during pregnancy the proliferation rates of all stem cells are quite high. Thus, with positive probability $I_1$ people in the population may acquire additional genetic and/or epigenetic changes during pregnancy to become $I_2$-stage people at birth. Similarly, $I_2$ people may acquire genetic and/or epigenetic changes during pregnancy to become $I_3$ people at birth; and, although the probability is very small, normal people at the embryo stage may acquire some genetic and/or epigenetic changes during pregnancy to become $I_1$ people at birth. Because the probability of genetic and epigenetic changes is small, one may practically assume that an $I_i$ person at the embryo stage would only give rise to $I_i$ stem cells and possibly $I_{i+1}$ stem cells at birth. This is equivalent to assuming that $I_i$ people at the embryo stage would not generate $I_j$ ($j > i + 1$) stem cells at or before birth. This model is represented schematically in Figure 4. Notice that if $k = 2$, one may practically assume that with probability one an $I_2$ person at the embryo stage would develop cancer at or before birth. If $k = 3$, then an $I_2$ person at the embryo stage would develop cancer at or before birth only with some positive probability.
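
As a rough numerical illustration of the genotype classification above, the following Python sketch draws the numbers of normal people, people carrying mutant alleles from one germline cell, and people carrying mutant alleles from both germline cells in a population of fixed size from a multinomial distribution. The function name `sample_genotype_counts` and the proportions used are hypothetical and chosen only for illustration; they are not estimates from this paper.

```python
import random
from collections import Counter


def sample_genotype_counts(n, p1, p2, seed=None):
    """Draw (n0, n1, n2) for a population of size n.

    n1 and n2 count people carrying mutant alleles from one or from both
    germline cells; n0 counts normal people. p1 and p2 are the assumed
    (constant) proportions of the two carrier types.
    """
    if p1 < 0.0 or p2 < 0.0 or p1 + p2 > 1.0:
        raise ValueError("p1 and p2 must be nonnegative with p1 + p2 <= 1")
    rng = random.Random(seed)
    p0 = 1.0 - p1 - p2
    # One categorical draw per person yields multinomial counts overall.
    draws = rng.choices((0, 1, 2), weights=(p0, p1, p2), k=n)
    counts = Counter(draws)
    return counts[0], counts[1], counts[2]


# Hypothetical values, for illustration only.
n0, n1, n2 = sample_genotype_counts(n=100_000, p1=0.01, p2=0.0001, seed=1)
print(n0, n1, n2)
```

Because the carrier proportions are small, repeated draws with different seeds show how variable the number of carriers can be even in a large population.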

3. The Stochastic Process of Carcinogenesis with Hereditary Cancer Cases and Mathematical Analysis

Because tumors are developed from primary cells, for the above stochastic model, the identifiable response variables are and , where is the number of cancer tumors at time and is the number of cells at time in people who are people at the embryo stage (see [3, 5, 8, 23], Remarks 1 and 2). For people who have genotype at the embryo stage, the stochastic model of carcinogenesis is then given by the stochastic process , where . For these processes, in the next subsections, we will derive stochastic equations for the state variables ; we will also derive the probability distributions of these state variables and the probabilities of developing cancer tumors. These are the basic approaches for modeling carcinogenesis used by the first author and his associates; see Tan [3], Tan et al. [4, 5, 8, 23], Tan and Zhou [9], Tan and Yan [16], and Tan and Chen [24, 25] and Remark 3.

Remark 2. At any time (say $t$) the total number of $I_k$ cells is equal to the total number of $I_k$ cells generated from $I_{k-1}$ cells at time $t$ plus the total number of $I_k$ cells generated by cell division from other $I_k$ cells at time $t$; the former are referred to as primary $I_k$ cells, while the latter are not primary $I_k$ cells. Since each tumor develops from a single primary $I_k$ cell through a stochastic birth and death process, each primary $I_k$ cell will generate at most one tumor. It follows that at any time the total number of $I_k$ cells is considerably greater than the number of cancer tumors (see also Yang and Chen [11]). Thus, for generating cancer tumors the only identifiable state variables are the numbers of $I_j$ cells with $j < k$ and the number of detectable cancer tumors.

Remark 3. To model stochastic multistage models of carcinogenesis, the standard traditional approach is to assume that the last stage cells (i.e., the cells in the model ) grow instantaneously into a cancer tumor as soon as they are generated and then apply the standard Markov theory to and to the state variables . This approach has been described in detail in Tan [2], Little [1], and Zheng [7]; see also Luebeck and Moolgavkar [26] and Durrett et al. [27]. However, in some cases the assumption of instantaneous growth into cancer tumors of cells may not be realistic (Klebanov et al. [17], Yakovlev and Tsodikov [18], and Fakir et al. [19]); in these cases, is not Markov so that the Markov theory method is not applicable to . To develop analytical results and to resolve many difficult issues, Tan and his associates [4, 5, 24] have proposed an alternative approach through stochastic equations and have followed Yang and Chen [11] to assume that cancer tumors develop by clonal expansion from primary last stage cells. Through probability generating function method, Tan and Chen [24] have shown that if the Markov theory is applicable to , then the stochastic equation method is equivalent to the classical Markov theory method but is more powerful. Also, through stochastic equation method we have shown in the Appendix that the classical approach provides a close approximation to discrete time model under the assumption that the primary last stage cells develop into a detectable tumor in one time unit. This provides a reasonable explanation why the traditional approach (see [2, 22]) can still work well even though the Markov assumption for may not hold. In this paper we will thus basically use the stochastic equation method and assume that cancer tumors develop from primary last stage cells through clonal expansion.

3.1. Stochastic Equations for the State Variables

Assume now that an individual is an person at the embryo stage. Then in this individual, cancer is developed by a -stage multievent model given by and the identifiable response variables are given by . To derive stochastic equations for the staging variables in in this individual, observe that for each is in general a Markov Process although may not be Markov; see Remark 1, Tan [3] and Tan et al. [4, 5], Tan and Zhou [9], and Tan and Yan [16]. It follows that derive from through stochastic birth-death processes of cells and through stochastic transition during . Let be the number of birth, the number of death of cells, and the number of transition from cells during , respectively in people who are people at the embryo stage. Let denote the number of transitions from during . Because the transition of would not affect the number of cells but only increase the number of cells (see Remark 4), by the conservation law we have the following stochastic equations for (see Tan [3], Tan et al. [4, 5, 8], Tan and Zhou [9], and Tan and Yan [16]):

Because are random variables, the above equations are basically stochastic equations. To derive probability distributions of these variables, let and denote the birth rate and the death rate at time of the cells, respectively. Let be the transition rate at time from . Then, as shown in Tan [3], for we have, to the order of ,

It follows that to the order of ,

From these distribution results, by subtracting from the random transition variables its conditional means, respectively, we obtain the following stochastic equations for the state variables : where for and where for , for .

From the above equations, by dividing both sides by and letting we obtain

In the above equations, using the distribution results in (4) it can easily be shown that the random noises have expected value zero and are uncorrelated with the staging variables and . The initial conditions at birth () for the above stochastic differential equations are .

Given the initial conditions and at birth (), the solution of the equations in (6) is given, respectively, by where

If the model is time homogeneous so that , and if if , the above solutions under the initial conditions then reduce, respectively, to where for ,

Obviously, for all . It follows that for , the expected values of for homogeneous models with if are given by where as a convention, for all .

Remark 4. Because genetic changes and epigenetic changes occur during cell division, to the order of , the probability is that one cell at time would give rise to 1 cell and 1 cell at time by genetic changes or epigenetic changes. It follows that the transition of would not affect the population size of cells but only increase the size of the population.
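
The conservation-law bookkeeping behind these stochastic equations can be sketched in a small discrete-time simulation. In the sketch below, during one short interval each cell of a given stage either divides, dies, or stays unchanged, and new cells of that stage arrive by transition from the preceding stage; the next count is the current count plus transitions plus births minus deaths. The distributional choices (one categorical draw per cell for birth and death, one Bernoulli draw per source cell for transitions), the function name `step_stage`, and all rate values are assumptions made for illustration and are not the fitted model of this paper.

```python
import random


def step_stage(prev_count, count, birth, death, mutation, dt, rng):
    """Advance one stage's cell count over a short interval of length dt.

    prev_count: cells in the preceding stage (source of transitions)
    count:      cells in the current stage
    birth, death, mutation: per-cell rates, assumed constant (hypothetical)
    Returns (new_count, births, deaths, transitions).
    """
    pb, pd, pm = birth * dt, death * dt, mutation * dt
    if pb + pd > 1.0 or pm > 1.0:
        raise ValueError("dt is too large for these rates")
    births = deaths = 0
    for _ in range(count):
        u = rng.random()
        if u < pb:
            births += 1      # cell divides: one extra cell of this stage
        elif u < pb + pd:
            deaths += 1      # cell dies or differentiates
    # A transition leaves the source cell in place and adds one cell here.
    transitions = sum(1 for _ in range(prev_count) if rng.random() < pm)
    new_count = count + transitions + births - deaths
    return new_count, births, deaths, transitions


rng = random.Random(42)
source, stage = 10_000, 50   # hypothetical starting counts
for _ in range(20):
    stage, b, d, m = step_stage(source, stage, 0.10, 0.08, 1e-3, 0.5, rng)
print(stage)
```

Keeping the source count fixed in the loop mirrors the point of Remark 4 that a transition does not reduce the size of the source population.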

3.2. Transition Probabilities and Probability Distributions of Staging Variables

Let denote the probability density function of a binomial random variable , the probability density function of a Poisson random variable , and the probability density function of a bivariate multinomial random vector . Using the stochastic equations of the staging variables given by (2) and using the probability distributions of the transition variables in (4), as in Tan et al. [4, 5], we obtain the following transition probabilities of given for : where

Define the unobservable transition variables . Then, we have for the joint probability density function of given where

Let be a column vector with in the th position () and with in other positions. Let and be column vectors of nonnegative integers. (i.e., and are nonnegative integers). Then, by using the probability distribution results in (14)–(16) it can readily be shown that

The above results imply that is a ()-dimensional birth-death process with birth rates , death rates , and cross-transition rates . (See Definition 4.1 in Tan ([22], Chapter 4)). Using these results, it can be shown that the Kolmogorov forward equation for the probabilities in the above model is given by for .

By using the above set of differential equations, one can readily compute the probabilities numerically.
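
As a minimal illustration of this kind of numerical computation, the sketch below integrates the Kolmogorov forward equation of a one-dimensional linear birth-death process with an explicit Euler scheme on a truncated state space. This is a deliberately simplified stand-in for the multidimensional system above: the single cell type, the constant rates, the truncation level, and the function name `forward_probabilities` are all assumptions for illustration.

```python
def forward_probabilities(n0, birth, death, t_end, dt=0.01, n_max=200):
    """Return [P(X(t_end) = n) for n = 0..n_max] for a linear birth-death
    process started at X(0) = n0, using explicit Euler steps.

    Births out of state n_max are suppressed so that total probability is
    conserved on the truncated state space (an approximation).
    """
    if (birth + death) * n_max * dt >= 1.0:
        raise ValueError("dt too large for a stable Euler step")
    p = [0.0] * (n_max + 1)
    p[n0] = 1.0
    for _ in range(int(round(t_end / dt))):
        new_p = p[:]
        for n in range(n_max + 1):
            if p[n] == 0.0:
                continue
            up = birth * n * dt * p[n] if n < n_max else 0.0
            down = death * n * dt * p[n]
            new_p[n] -= up + down        # probability flowing out of n
            if n < n_max:
                new_p[n + 1] += up       # birth: n -> n + 1
            if n > 0:
                new_p[n - 1] += down     # death: n -> n - 1
        p = new_p
    return p


# Hypothetical rates and initial size, for illustration only.
probs = forward_probabilities(n0=5, birth=0.10, death=0.08, t_end=10.0)
mean = sum(n * q for n, q in enumerate(probs))
print(round(sum(probs), 6), round(mean, 3))
```

For a linear birth-death process the exact mean is the initial size times the exponential of (birth rate minus death rate) times elapsed time, which gives a simple check on the numerical output.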

3.3. The Probability Distributions of the Number of Detectable Tumors and Times to Tumors

As shown by Yang and Chen [11], malignant cancer tumors arise from primary $I_k$ cells by clonal expansion, where primary $I_k$ cells are $I_k$ cells generated directly by $I_{k-1}$ cells. ($I_k$ cells derived by stochastic birth of other $I_k$ cells are not primary $I_k$ cells.) That is, cancer tumors develop from primary $I_k$ cells through stochastic birth-death processes.

To derive the probability distribution for in people in the population, let denote the probability that a primary cancer cell at time develops into a detectable cancer tumor by time . (An explicit formula for has been given in Tan [22], Chapter 8, and in Tan and Chen [24].) Then, as shown in Tan ([3, 22], Chapter 8), the conditional probability distribution of given in people is Poisson with mean , where . That is,

Let be the probability that cancer tumors develop during in people in the population. For time homogeneous models with small , is then given by where .

To derive , denote by and define the functions

Applying results of given in (11), for time homogeneous models with if we obtain ’s as follows.(1)If , then . Hence, , for and for , (2)If , then we have for and and for , where if and if .

Notice that if , then reduces to

Notice also that if for and if , then the above ’s reduce, respectively, to

4. Probability Distribution of Observed Cancer Incidence Incorporating Hereditary Cancer Cases

To estimate the unknown parameters and to validate the model, one needs real data generated from the model. For studies of carcinogenesis such data are usually given by cancer incidence. For example, in the SEER data of NCI/NIH of USA, the data are given by , where is the number of cancer cases at birth and the total number of births, and where for , is the number of cancer cases developed during the th age group of a one-year period (or 5-year periods) and is the number of noncancer people who are at risk for cancer and from whom of them have developed cancer during the th age group. Given in Table 1 are the SEER data of uveal melanoma (adult eye cancer) during the period 1973–2007. In Table 1, notice that there are some cancer cases at birth, implying some inherited cancer cases. In this section, we develop a statistical model for these types of data sets from the stochastic multistage model with hereditary cancers as given in Section 2. As in previous sections, let be the number of individuals who have genotype at the embryo stage among the people at risk for the cancer in question. Then, as shown above, . It follows that . In what follows, we let denote the random variable for unless otherwise stated.

4.1. The Probability Distribution of

As shown in Figure 4, people would only generate stage cells and stage cells at birth. Thus, for cancers to develop at or before birth, the number of stages for the stochastic model of carcinogenesis must be 3 or less. It follows that if , the appropriate model of carcinogenesis must be either a 2-stage model or a 3-stage model. Since and , the probability distribution of is therefore where

The expected number of given is if and if . Hence, for the 2-stage model (i.e., ) or the 3-stage model (i.e., ), the maximum likelihood estimate of is and the deviance from the conditional probability distribution of given is

4.2. The Probability Distribution of

To derive the probability distribution of in the th age group, let be the number of cancer cases generated by people who have genotype at the embryo stage among these cancer cases. Then and is the number of cancer cases generated by the normal people in the population. The conditional probability distribution of given is

Notice that if (a 2-stage model), then all individuals would develop tumor at or before birth. Thus, if , then for all so that if , cancer cases develop only from normal people ( people) and people. On the other hand, if , then with positive probability for all , where is the last time point in the data. Let if and if . Then, , where . Since , we have for the conditional probability density function of given where is the probability density function of and the probability density function of .

The probability density function given by (34) is a mixture of Poisson probability density functions with mixing probability density function given by the multinomial probability distribution of given . This mixing probability density function represents individuals with different genotypes at the embryo stage in the population.
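
To make the mixture structure concrete, the following sketch computes the probability of observing a given number of cancer cases in one age group as a multinomial-weighted mixture of Poisson probabilities, by brute-force enumeration of the genotype counts. It assumes that, given the genotype counts, the number of cases is Poisson with mean equal to the sum over genotypes of the count times a per-person probability of developing cancer in that age group. The names `mixture_pmf`, `p`, and `q` and all numerical values are hypothetical.

```python
from math import comb, exp, lgamma, log


def poisson_pmf(y, mean):
    """Poisson probability of y events with the given mean."""
    if mean == 0.0:
        return 1.0 if y == 0 else 0.0
    return exp(y * log(mean) - mean - lgamma(y + 1))


def mixture_pmf(y, n, p, q):
    """P(Y = y) for n people at risk.

    p = (p0, p1, p2): genotype proportions (must sum to 1).
    q = (q0, q1, q2): per-person cancer probabilities by genotype.
    Enumerates all genotype counts, so it is only practical for small n.
    """
    total = 0.0
    for n1 in range(n + 1):
        for n2 in range(n - n1 + 1):
            n0 = n - n1 - n2
            # Multinomial probability of the genotype counts (n0, n1, n2).
            weight = (comb(n, n1) * comb(n - n1, n2)
                      * p[0] ** n0 * p[1] ** n1 * p[2] ** n2)
            mean = n0 * q[0] + n1 * q[1] + n2 * q[2]
            total += weight * poisson_pmf(y, mean)
    return total


# Hypothetical inputs, for illustration only.
p = (0.90, 0.08, 0.02)
q = (0.001, 0.05, 0.40)
print([round(mixture_pmf(y, 20, p, q), 4) for y in range(4)])
```

Real incidence data involve far larger numbers at risk than this enumeration can handle, which is one motivation for introducing augmented variables and sampling rather than summing over all genotype counts.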

Let be the set of all unknown parameters (i.e., the parameters and the birth rates, the death rates, and the mutation rates of cells). Based on data , the likelihood function of is

Notice that because the mutation rates are very small, one may practically assume for . Also, because the stage-limiting genes are basically tumor suppressor genes which act recessively (see Tan [3], Weinberg [6], and Tan et al. [5, 8, 23]), one may practically assume (see Tan et al. [4, 5, 8]).

4.3. The Joint Probability Distribution of Augmented Variables and Cancer Incidence

To apply the mixture distribution of in (34) to make inferences about the unknown parameters, one needs to expand the model to include the unobservable augmented variables and derive the joint probability distribution of these variables. For these purposes, observe that for and for , the conditional probability distribution of given is

Since the conditional probability distribution of given for is Poisson with mean , we have for the joint conditional probability density function of given where and .

If , then so that . Thus, we have for ,

It follows that if , then and the joint probability density function of given is where and .

Put ,  , , . From (37) and (40), we have for the conditional joint probability density function of given

It follows that the joint conditional probability density function of given is

Notice that the above probability density function is a product of multinomial probability density functions and Poisson probability density functions. For this joint probability density function, the deviance is where where .

The joint probability density function of given by (42) will be used as the kernel for the Bayesian method to estimate the unknown parameters and to predict the state variables.

4.4. Fitting of the Model to Cancer Incidence Data

To fit the model to real data, as in Tan [35], we let correspond to a fixed time interval such as 6 months in human cancer studies. (Tan et al. [4] assumed 3 months as one time unit, while Luebeck and Moolgavkar [26] assumed one year as one time unit.) Then, because the proliferation rate of the last stage cells is quite large, one may practically assume for . Hence, noting that is usually very small (see [35]), the is approximated by where .

Under discrete time approximation, the ’s have been derived in the appendix. Using these results of expected numbers and using the result for , we obtain where and are defined in Section 3.3 and where the ’s are given by

Notice that if , then reduces to

Applying these results, for time homogeneous models with if , the ’s under discrete approximation are given as follows.(1)If , then . Hence, , for and for , (2)If , then we have and for ,

Notice that if one replaces by , the above ’s from discrete time model are exactly the same as from the continuous model, respectively, as given in equations (23)–(27) under the assumption that for . Notice that the assumption for is equivalent to assuming that the last stage cancer cells grow instantaneously into cancer tumors as soon as they are generated; see Remark 1.

5. The Fitting of the Model to Cancer Incidence and the Generalized Bayesian Inference Procedure

Given the model in Sections 2 and 3 and cancer incidence, one may use results in Section 4 to fit the model. By using this model and the distribution results in Section 4, one can readily estimate the unknown genetic parameters, predict cancer incidence, and check the validity of the model by using the generalized Bayesian inference and Gibbs sampling procedures; for more detail, see Tan [3, 22] and Tan et al. [4, 5].

The generalized Bayesian inference is based on the posterior distribution of given . This posterior distribution is derived by combining the prior distribution of with the joint probability distribution given given by (42). It follows that this inference procedure would combine information from three sources: previous information and experiences about the parameters in terms of the prior distribution of the parameters, biological information of inherited cancer cases via genetic segregation of cancer genes in the population (; see Section 2), and information from the expanded data and the observed data () via the statistical model from the system () given by (37) and (40). Because of additional information from the genetic segregation of the cancer genes, this inference procedure provides an efficient procedure to extract information of effects of genotypes of individuals at the embryo stage.

5.1. The Prior Distribution of the Parameters

For the prior distributions of , because biological information has suggested some lower bounds and upper bounds for the mutation rates and for the proliferation rates, we assume where is a positive constant if these parameters satisfy some biologically specified constraints and is equal to zero otherwise. These biological constraints are as follows.(i), , and .(ii)For , we let () and , .(iii)For the , we let , and .(iv)For the , we let and .

We will refer to the above prior as a partially informative prior which may be considered as an extension of the traditional noninformative prior given in Box and Tiao [28].

5.2. The Posterior Distribution of the Parameters Given

Denote by . From the posterior distribution , we obtain where is the parameter space of provided by the biological constraints in Section 5.1.

For computational convenience, we notice that the log of is proportional to the negative of given by (44)-(45); similarly, the log of is proportional to the negative of given by (46).

5.3. The Multilevel Gibbs Sampling Procedure For Estimating Unknown Parameters

Given the posterior probability distributions, we will use the following multilevel Gibbs sampling procedure to derive estimates of the parameters. We note that numerically, the Gibbs sampling procedure given below is equivalent to the EM-algorithm from the sampling theory viewpoint, with Steps 1 and 2 as the E-Step and with Steps 3 and 4 as the M-Step, respectively [29]. The multilevel Gibbs sampling procedure is given by the following steps.

Step 1 (Generating Given (The Data-Augmentation Step 1)). Given and given , use the multinomial distribution of given in Section 3 to generate a large sample of . Then combine this large sample with in (37) and (40) to select through the weighted bootstrap method due to Smith and Gelfand [30]. This selected is then a sample from even though the latter is unknown. (For a proof, see Tan [22], Chapter 3.) Call the generated sample .

Step 2 (Generating Given (The Data-Augmentation Step 2)). Given and given generated from Step 1, generate from the probability distribution given by (36) and (38). Call the generated sample .

Step 3 (Estimation of Given ). Given and given from Steps 1 and 2, derive the posterior mode of by maximizing the conditional posterior distribution . Under the partially informative prior, this is equivalent to maximizing the negative of the deviance given by (44)-(45) in Section 4.3 under the constraints given in Section 5.1. Denote this generated mode by .

Step 4 (Estimation of Given ). Given and given from Steps 1–3, derive the posterior mode of by maximizing the conditional posterior distribution . Under the partially informative prior, this is equivalent to maximizing the negative of the deviance in (46) under the constraints. Denote the generated mode as .

Step 5 (Recycling Step). With given above, go back to Step 1 and continue until convergence.
The proof of convergence of the above steps can be derived by using the procedure given in Tan ([22], Chapter 3). At convergence, the are the generated values from the posterior distribution of given independently of (for a proof, see Tan [22], Chapter 3). Repeating the above procedure, one then generates a random sample of from the posterior distribution of given ; one then uses the sample mean as the estimate of and the sample variances and covariances as estimates of the variances and covariances of these estimates.
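
The weighted bootstrap used in Step 1 can be illustrated in isolation. The idea, as described by Smith and Gelfand [30], is to draw a large sample from a distribution that is easy to sample, weight each draw by the ratio of the target density to the sampling density, and then resample in proportion to these weights. The sketch below is a generic one-dimensional illustration with a uniform proposal and an unnormalized target; the function `weighted_bootstrap` and the example densities are hypothetical and are not the distributions used in the procedure above.

```python
import random


def weighted_bootstrap(proposal_draw, proposal_pdf, target_pdf,
                       n_proposals, n_keep, rng):
    """Approximate draws from target_pdf (known up to a constant).

    1. Draw n_proposals values from the proposal.
    2. Weight each by target_pdf / proposal_pdf.
    3. Resample n_keep values with probability proportional to the weights.
    """
    draws = [proposal_draw(rng) for _ in range(n_proposals)]
    weights = [target_pdf(x) / proposal_pdf(x) for x in draws]
    if sum(weights) <= 0.0:
        raise ValueError("all weights are zero; proposal misses the target")
    return rng.choices(draws, weights=weights, k=n_keep)


rng = random.Random(7)
# Hypothetical example: target density proportional to x on (0, 1),
# i.e. Beta(2, 1) with mean 2/3, sampled via a Uniform(0, 1) proposal.
sample = weighted_bootstrap(
    proposal_draw=lambda r: r.random(),
    proposal_pdf=lambda x: 1.0,
    target_pdf=lambda x: x,
    n_proposals=20_000,
    n_keep=5_000,
    rng=rng,
)
print(sum(sample) / len(sample))  # should be close to 2/3
```

With more proposal draws the resampled values approximate the target more closely, at the cost of more computation.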

6. A New Multistage Stochastic Model for Adult Eye Cancer (Uveal Melanoma)—An Example

Human eye cancers consist of pediatric eye cancers and adult eye cancers. The most common pediatric eye cancer is retinoblastoma, which develops from the retinal pigment epithelium cells underlying the retina that do not form melanoma. The most common adult eye cancers are the uveal melanomas involving the iris, the ciliary body, and the choroid (collectively referred to as the uvea). These cancers develop from melanocytes (pigment cells), which reside within the uvea and give color to the eye. In Tan and Zhou [9] we developed a modified two-stage model for retinoblastoma. Based on results from molecular biology (see Landreville et al. [14], Mensink et al. [15], and Loercher and Harbour [31]), Landreville et al. [14] have proposed a three-stage model for uveal melanoma as given in Figure 2. As an example of application of this paper, in this section we apply this model of uveal melanoma to the NCI/NIH eye cancer data from the SEER project. We note that the same methods can be applied to model other human cancers as well; this will be the subject of our future research.

Given in Table 1 are the numbers of people at risk and the eye cancer cases in the age groups together with the predicted cases from the models. These data give cancer incidence at birth and incidence for 85 age groups () with each group spanning a 1-year period except the last age group (≥85 years old). For human eye cancer, the incidence at birth and for age groups from 1 to 10 years old is basically generated by the pediatric eye cancer retinoblastoma (see [9]). Therefore, to account for inherited cancer cases of uveal melanoma, the incidence for age 0 (birth) and for age periods from 1 to 10 years old in Table 1 is derived by subtracting the incidence of retinoblastoma from the SEER data (see Tan and Zhou [9]).

To fit the data, we let one time unit be 6 months after birth and let . To compare different models and to assess different assumptions, we will consider the following two 3-stage mixture models: (a) the complete 3-stage mixture model (Model-F), in which no assumptions are made on the parameters; and (b) the Type-1 3-stage mixture model (Model-1), in which we assume that and that normal people and people at the embryo stage will remain normal people and people, respectively, at birth. For comparison purposes, we also fit a 2-stage model as defined in Tan and Zhou [9]. We will apply the methods in Section 6 to fit these models to the SEER data given in Table 1.

Table 2 gives the natural logarithms of the likelihood functions, the AIC (Akaike Information Criterion), and the BIC (Bayesian Information Criterion) for these models. Table 3 gives the estimates of parameters in the 3-stage models. Figure 5 plots the predicted cancer cases from the 3-stage mixture models (Model-F and Model-1) and the 2-stage model. For comparison purposes, Table 1 also provides the numbers of predicted cancer cases from the 3-stage mixture models and the 2-stage model together with the observed cancer cases over time from SEER. From these results, we have made the following observations.

(a) As shown by the results in Table 1 and Figure 5, both Model-F and Model-1 fitted the SEER data well, although Model-1 fitted the data slightly better according to the values of AIC and BIC. The chi-square test statistics for Model-F and Model-1 are 88.43 and 94.48, respectively, giving a p-value of 0.12 () for Model-F and a p-value of 0.11 () for Model-1. On the other hand, the 2-stage model fitted the data very poorly; the chi-square statistic for the 2-stage model is 2747.69, giving a p-value less than . The AIC and BIC values of Model-1 (AIC = 2609.53, BIC = 2631.51) are slightly smaller than those of Model-F; however, the AIC and BIC values (8796.84, 8811.57) of the 2-stage model are considerably greater than those of the 3-stage models. These results suggest that uveal melanoma may best be described by a 3-stage model with an inherited component and that one may practically assume and that normal people and people at the embryo stage will remain normal people and people, respectively, at birth.

(b) From Table 3, it is observed that the estimate of is close to zero (the estimate is of order ), indicating that the phenotype of is almost identical to that of . This further confirms that the staging-limiting genes are basically tumor suppressor genes and that there is no haploinsufficiency for these tumor suppressor genes. On the other hand, the estimate of is of order , which is about times greater than those of cells with genotype .

(c) From Table 3, the estimates of are of order , respectively. Because , assuming some values of from biological observations, one can obtain a rough idea of the magnitude of . For example, if we follow Potten et al. [32] to assume , then .

(d) From Table 3, the estimates of and from the SEER data are of orders and , respectively. This indicates that in the US population, the frequency of the staging-limiting cancer gene for uveal melanoma is approximately . Table 3 also shows that the estimate of is 0.8411, indicating that most individuals with genotype would develop cancer at birth. This may help to explain why there are observed cancer incidences at birth for uveal melanoma in the SEER data even though the estimate of the frequency is of order .
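The AIC and BIC comparison used in observation (a) can be sketched with the standard definitions, AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where k is the number of free parameters and n the number of observations. The log-likelihoods, parameter counts, and sample size below are hypothetical placeholders chosen for illustration; they are not the values behind Table 2.

```python
import math


def aic(log_lik, n_params):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2.0 * n_params - 2.0 * log_lik


def bic(log_lik, n_params, n_obs):
    """Bayesian Information Criterion: k ln n - 2 ln L."""
    return n_params * math.log(n_obs) - 2.0 * log_lik


# Hypothetical (log-likelihood, number of parameters) pairs, for illustration only.
candidates = {
    "model_A": (-1300.0, 9),
    "model_B": (-4390.0, 6),
}
n_obs = 86  # assumed number of data points, not taken from the paper

scores = {
    name: (aic(ll, k), bic(ll, k, n_obs)) for name, (ll, k) in candidates.items()
}
# Smaller AIC (or BIC) indicates the preferred model.
best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_bic = min(scores, key=lambda name: scores[name][1])
```

Both criteria penalize extra parameters, so a model with more parameters is preferred only when its log-likelihood gain outweighs the penalty.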

7. Discussion and Conclusions

To account for inherited cancer cases, in this paper we have developed some general multistage models involving hereditary cancer cases. For human cancer incidence, these models are basically generalized mixture models. In these mixture models, the mixing probability is a multinomial distribution to account for genetic segregation of the staging-limiting tumor suppressor genes. This mixture model allows us to estimate for the first time the frequency of the staging-limiting tumor suppressor gene in human populations. As an example of applications, in this paper we have developed a general 3-stage stochastic multistage model of carcinogenesis for adult human eye cancer. To account for inherited cancer cases in the stochastic model of human eye cancer, we have also developed a generalized mixture model for uveal melanoma in human beings.

To fit the proposed models to cancer incidence data, in this paper we have developed a generalized Bayesian inference procedure to estimate the unknown parameters and to predict cancer cases. This inference procedure has advantages over classical sampling theory inference (i.e., the maximum likelihood method) because it combines information from three sources: (1) previous information and experience about the parameters, expressed through the prior distribution of the parameters; (2) biological information on inherited cancer cases, via the genetic segregation of staging-limiting tumor suppressor genes in the population; and (3) information from the expanded data and the observed data () via the statistical model from the system () given by (37) and (40).

To illustrate the usefulness and applications of our models and methods, we have applied them to the eye cancer SEER data of NCI/NIH. Our analysis showed that the proposed 3-stage model with inherited cancer cases fitted the data well (see Table 2 and Figure 5); on the other hand, the classical 2-stage model fitted the data very poorly (see Table 2 and Figure 5). These results are consistent with results from molecular biology indicating that human eye cancer is best described by a 3-stage model with an inherited cancer component. Notice, however, that our 3-stage model is more general than the classical 3-stage model as described in Little [1], Tan [2], and Zheng [7] in that we postulate that cancer tumors develop from primary cells by clonal expansion (see Yang and Chen [11]). (Note that the stochastic multistage models in the literature assume that cancer tumors develop from last stage cells immediately as soon as they are generated, ignoring cancer progression completely; see Remark 1.) As a matter of fact, we had assumed for a period of three months and found that the 3-stage models then did not fit the SEER data, clearly indicating that over a period of three months.

Applying our models and methods to the SEER data of human eye cancer, we have derived some useful pieces of information for the first time. Specifically, we have estimated for the first time the frequency of the staging-limiting tumor suppressor gene in the US population (). With the estimate of as , the predicted number of uveal melanoma cases at birth is by the 3-stage models with inherited cancer component (Model-F and Model-1). (The observed number of eye cancer cases at birth is 34.) The estimate of the proliferation rate () of cells using Model-F is . (The estimate is using Model-1.) This confirms that the stage-1 limiting gene is a tumor suppressor gene and that, unlike the p53 gene on chromosome 17p (see [33]), there is little or no haploinsufficiency for this gene in cells with genotype .

Using the models and methods of this paper, one can readily predict future cancer cases for human eye cancer. Thus, by comparing results from different populations, our models and methods can be used to assess cancer prevention and control procedures. This will be the topic of our future research.

Appendix

The Expected Numbers of State Variables under Discrete Time Approximation

Under discrete time, the stochastic differential equations of state variables reduce to the following stochastic difference equations of state variables, respectively: where and for .

The initial conditions at birth () for the above stochastic difference equations are if and if . The solution of the above difference equations under these initial conditions is given, respectively, by

If the model is time homogeneous so that and if , then the above solutions under the initial conditions reduce to where , .
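To illustrate how expected numbers of this kind can be computed in the time-homogeneous case, the sketch below iterates a linear recursion of the form often used for multistage models, E[I_j(t+1)] = (1 + gamma_j) E[I_j(t)] + alpha_{j-1} E[I_{j-1}(t)]. This form, the constant normal stem cell count `n0`, the zero initial conditions, and all names and numerical values are assumptions made for illustration; they are not recovered from the equations of this appendix, whose initial conditions may differ.

```python
def expected_counts(n0, alphas, gammas, steps):
    """Iterate the assumed recursion for the expected numbers of stage-j cells.

    n0     : constant number of normal stem cells (stage 0), an assumption
    alphas : alphas[j] is the per-step mutation rate from stage j to stage j+1
    gammas : gammas[j] is the per-step net proliferation rate of stage j+1 cells
    steps  : number of discrete time steps
    Returns a list of lists: history[t][j] = E[I_{j+1}(t)], starting from zeros.
    """
    k = len(gammas)
    current = [0.0] * k
    history = [list(current)]
    for _ in range(steps):
        # Expected size of the stage feeding each intermediate stage.
        feeder = [float(n0)] + current[:-1]
        current = [
            (1.0 + gammas[j]) * current[j] + alphas[j] * feeder[j] for j in range(k)
        ]
        history.append(list(current))
    return history


def closed_form_stage1(n0, alpha0, gamma1, t):
    """Geometric-sum solution of the stage-1 recursion with E[I_1(0)] = 0."""
    if gamma1 == 0:
        return n0 * alpha0 * t
    return n0 * alpha0 * ((1.0 + gamma1) ** t - 1.0) / gamma1


# Hypothetical rates for a model with two intermediate stages.
history = expected_counts(n0=1e6, alphas=[1e-6, 1e-5], gammas=[0.01, 0.02], steps=20)
```

Under these assumptions the stage-1 recursion is a geometric sum, so its iterated value can be checked against `closed_form_stage1`.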

Thus, if the model is time homogeneous and if , the ’s in discrete time models under the initial conditions are given, respectively, by