ISRN Biomathematics

Volume 2013, Article ID 954912, 19 pages

http://dx.doi.org/10.1155/2013/954912

## New Cancer Stochastic Models Involving Both Hereditary and Nonhereditary Cancer Cases: A New Approach

^{1}Department of Mathematical Sciences, The University of Memphis, Memphis, TN 38152, USA

^{2}Department of Mathematics and Statistics, Arkansas State University, State University, AR 72467, USA

Received 24 August 2012; Accepted 10 October 2012

Academic Editors: T. LaFramboise, K. M. Page, I. Rogozin, and J. M. Starobin

Copyright © 2013 Wai-Yuan Tan and Hong Zhou. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

To incorporate biologically observed epidemics into multistage models of carcinogenesis, we have developed new stochastic models for human cancers. We have further incorporated genetic segregation of cancer genes into these models to derive generalized mixture models for cancer incidence. Based on these models, we have developed a generalized Bayesian approach to estimate the parameters and to predict cancer incidence via Gibbs sampling procedures. We have applied these models to fit and analyze the SEER data of human eye cancers from NCI/NIH. Our results indicate that the models not only provide a logical avenue to incorporate biological information but also fit the data much better than other models. These models would not only provide more insight into human cancers but also offer useful guidance for their prevention and control and for the prediction of future cancer cases.

#### 1. Introduction

It is universally recognized that each cancer tumor develops through stochastic proliferation and differentiation from a single stem cell which has sustained a series of irreversible genetic and/or epigenetic changes (Little [1]; Tan [2, 3]; Tan et al. [4, 5]; Weinberg [6]; Zheng [7]). That is, carcinogenesis is a stochastic multistage process with intermediate cells subject to stochastic proliferation and differentiation. Furthermore, the number of stages and the number of pathways of the carcinogenesis process are significantly influenced by the environmental factors to which individuals are exposed (Tan et al. [4, 5]; Weinberg [6]).

Another important observation in human carcinogenesis is that most human cancers cluster within families. Further, many cancer incidence data (such as the SEER data of NCI/NIH, USA) have documented that some cancers develop during pregnancy, before birth, so that some newborn babies have cancer at birth. Such cancers have been referred to as pediatric cancers. Well-known examples of pediatric cancers include retinoblastoma (a pediatric eye cancer), hepatoblastoma (a pediatric liver cancer), Wilms' tumor (a pediatric kidney cancer), and medulloblastoma (a pediatric brain tumor). Epidemiological and clinical studies in oncology have also revealed that inherited cancers are very common in many adult human cancers including lung cancer, colon cancer [8], uveal melanomas (adult eye cancer, [9]), and adult liver cancer (HCC, [10]).

Given the above results from cancer biology and human cancer epidemiology, the objective of this paper is to illustrate how to develop stochastic models of carcinogenesis incorporating these biological and epidemiological observations. Based on these models and cancer incidence data, we will then proceed to develop efficient statistical procedures to estimate unknown parameters in the model, to validate the model, and to predict cancer incidence.

In Section 2, we illustrate how to incorporate segregation of cancer genes in multistage stochastic models of carcinogenesis to account for inherited cancer cases. In Section 3, we will develop stochastic equations for the state variables of the model described in Section 2. By using these stochastic equations we will derive probability distributions of the state variables (i.e., the number of intermediate cancer cells) and the probability distribution of time to detectable cancer tumors. In Section 4, assuming that we have some cancer incidence data such as the SEER data from NCI/NIH, we proceed to develop statistical models for these data from these multistage models of carcinogenesis. In Section 5, by combining models in Sections 2–4, we proceed to develop a generalized Bayesian inference and Gibbs sampling procedures to estimate the unknown parameters, to validate the model, and to predict cancer incidence. As an example of application, in Section 6 we proceed to develop a multistage model of human eye cancer with inherited cancer cases as described in Figure 2. We will illustrate the model and methods by analyzing the SEER data of human eye cancer from NCI/NIH. Finally in Section 7, we will discuss the usefulness of the model and the methods developed in this paper and point out some future research directions.

#### 2. The Stochastic Multistage Model of Carcinogenesis with Inherited Cancer Cases

The $k$-stage multistage model of carcinogenesis views carcinogenesis as the end point of $k$ ($k \geq 2$) discrete, heritable, and irreversible events (mutations, genetic changes, or epigenetic changes) with intermediate cells subject to stochastic proliferation and differentiation (Little [1]; Tan [2, 3]; Tan et al. [4, 5]; Weinberg [6]). Let $N = I_0$ denote normal stem cells, $T$ the cancer tumors, and $I_j$ the $j$th stage initiated cells arising from the $(j-1)$th stage initiated cells ($j = 1, \ldots, k$) by some genetic and/or epigenetic changes. Then the model assumes $N \to I_1 \to I_2 \to \cdots \to I_k$ with the $I_j$ cells subject to stochastic proliferation (birth) and differentiation (death). Further, it assumes that each stem cell proceeds independently of other cells and that cancer tumors develop from primary $I_k$ cells by clonal expansion (stochastic birth and death), where primary $I_k$ cells are $I_k$ cells which arise directly from $I_{k-1}$ cells; see Yang and Chen [11].

For example, Figure 1 is a multistage pathway for squamous NSCLC (non-small cell lung cancer) as proposed by Osada and Takahashi [12] and Wistuba et al. [13]. Similarly, Figure 2 is the multistage model for uveal melanoma proposed by Landreville et al. [14] and Mensink et al. [15], while Figure 3 is the APC-β-catenin-Tcf pathway for human colon cancer (Tan et al. [8], Tan and Yan [16]).

*Remark 1.* To develop stochastic multistage models of carcinogenesis, in the literature (Little [1], Tan [2], Zheng [7]) it is conveniently assumed that the $I_k$ cells grow instantaneously into cancer tumors as soon as they are generated. In this case, the number of tumors is equal to the number of $I_k$ cells and one may identify $I_k$ cells as tumors. It follows that the number of tumors is a Markov process and that the $I_k$ cells are transient cells. In these cases, one needs only to deal with $N$ and $I_j$ cells with $j = 1, \ldots, k-1$. However, as shown by Yang and Chen [11], the number of tumors is much smaller than the total number of $I_k$ cells. Also, in many animal models and in cancer risk assessment of radiation, Klebanov et al. [17], Yakovlev and Tsodikov [18], and Fakir et al. [19] have shown that the number of tumors $T(t)$ is in general not Markov.

To extend the above model to include hereditary cancers, observe that mutants of cancer genes exist in the population and that both germline cells (egg and sperm) and somatic cells may carry mutant alleles of cancer genes [2, 20]. Further, every human being develops from an embryo in his/her mother’s womb (embryo stage, denote time by 0), where stem cells of different organs divide and differentiate to develop the respective organs (see Weinberg [6], Chapter 10). If both the egg and the sperm generating the embryo carry mutant alleles of relevant cancer genes, then the individual is an $I_2$-stage person at the embryo stage; if only one of the germline cells (egg or sperm) generating the embryo carries mutant alleles of cancer genes, then at the embryo stage the individual is an $I_1$-stage person. Similarly, the individual is a normal person ($N = I_0$ person) at the embryo stage if neither the egg nor the sperm generating the embryo carries mutant alleles of cancer genes. Refer to a person in the population as an $I_i$ person if he/she is an $I_i$-stage person at the embryo stage. Then, with respect to the cancer development in question, people in the population can be classified into 3 types: normal people ($I_0$ people), $I_1$ people, and $I_2$ people. Based on this classification, for normal people in the population the stochastic model of carcinogenesis is a $k$-stage multievent model given by $I_0 \to I_1 \to \cdots \to I_k$; for $I_1$ people the stochastic model of carcinogenesis is a $(k-1)$-stage multievent model given by $I_1 \to I_2 \to \cdots \to I_k$; and for $I_2$ people the stochastic model of carcinogenesis is a $(k-2)$-stage multievent model given by $I_2 \to I_3 \to \cdots \to I_k$.

To account for inherited cancer cases, let $p_1$ be the proportion of $I_1$ people in the population and $p_2$ the proportion of $I_2$ people in the population. In large human populations under steady-state conditions, one may practically assume that the $p_i$ are constants independent of time (Crow and Kimura [21]). Then $p_0 = 1 - p_1 - p_2$ is the proportion of normal people (i.e., $I_0$ people) in the population. Let $n$ be the population size and $n_i$ the number of $I_i$ people in the population ($i = 0, 1, 2$) so that $n = n_0 + n_1 + n_2$. Assume that $n$ is very large and that marriage between people in the population is random with respect to cancer genes; then, as shown in Crow and Kimura [21] (see also Tan [22], Chapter 2), the conditional probability distribution of $(n_1, n_2)$ given $n$ is 2-dimensional multinomial with parameters $(n; p_1, p_2)$. That is,

$$P(n_1, n_2 \mid n) = \frac{n!}{n_0!\, n_1!\, n_2!}\, p_0^{n_0}\, p_1^{n_1}\, p_2^{n_2}, \qquad n_0 = n - n_1 - n_2 .$$
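The multinomial law for the genotype counts can be checked numerically. The following is a minimal sketch rather than the authors' code: `p1` and `p2` stand for the proportions of people carrying one or two mutant germline copies, and the function names and numerical values are illustrative assumptions.

```python
import random
from math import exp, lgamma, log


def genotype_pmf(n, n1, n2, p1, p2):
    """P(n1, n2 | n) for a multinomial with cell probabilities (p0, p1, p2).

    Assumes 0 < p1, 0 < p2 and p1 + p2 < 1.
    """
    n0 = n - n1 - n2
    p0 = 1.0 - p1 - p2
    if min(n0, n1, n2) < 0:
        return 0.0
    log_coef = lgamma(n + 1) - lgamma(n0 + 1) - lgamma(n1 + 1) - lgamma(n2 + 1)
    log_prob = n0 * log(p0) + n1 * log(p1) + n2 * log(p2)
    return exp(log_coef + log_prob)


def sample_genotypes(n, p1, p2, rng):
    """Draw (n0, n1, n2) by classifying n people independently."""
    counts = [0, 0, 0]
    for _ in range(n):
        u = rng.random()
        if u < p2:
            counts[2] += 1
        elif u < p1 + p2:
            counts[1] += 1
        else:
            counts[0] += 1
    return tuple(counts)


# Hypothetical proportions, chosen only for illustration.
p1, p2 = 0.05, 0.01
total = sum(genotype_pmf(20, a, b, p1, p2)
            for a in range(21) for b in range(21 - a))
print(total)  # should be 1 up to rounding error
print(sample_genotypes(1000, p1, p2, random.Random(1)))
```

Working on the log scale with `lgamma` keeps the factorials from overflowing when the population size is large.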

To derive the probability distribution of time to cancer under the above model, observe that during pregnancy the proliferation rates of all stem cells are quite high. Thus, with positive probability, $I_1$ people in the population may acquire additional genetic and/or epigenetic changes during pregnancy to become $I_2$-stage people at birth. Similarly, $I_2$ people may acquire genetic and/or epigenetic changes during pregnancy to become $I_3$ people at birth; although the probability is very small, normal people at the embryo stage may acquire some genetic and/or epigenetic changes during pregnancy to become $I_1$ people at birth. Because the probability of genetic and epigenetic changes is small, one may practically assume that an $I_i$ person at the embryo stage would only give rise to $I_i$ stem cells and possibly $I_{i+1}$ stem cells at birth. This is equivalent to assuming that $I_i$ people at the embryo stage would not generate $I_{i+j}$ ($j > 1$) stem cells at or before birth. This model is represented schematically in Figure 4. Notice that if $k = 2$, one may practically assume that with probability one an $I_2$ person at the embryo stage would develop cancer at or before birth ($t_0$). If $k = 3$, then with some positive probability an $I_2$ person at the embryo stage would develop cancer at or before birth.

#### 3. The Stochastic Process of Carcinogenesis with Hereditary Cancer Cases and Mathematical Analysis

Because tumors are developed from primary cells, for the above stochastic model, the identifiable response variables are and , where is the number of cancer tumors at time and is the number of cells at time in people who are people at the embryo stage (see [3, 5, 8, 23], Remarks 1 and 2). For people who have genotype at the embryo stage, the stochastic model of carcinogenesis is then given by the stochastic process , where . For these processes, in the next subsections, we will derive stochastic equations for the state variables ; we will also derive the probability distributions of these state variables and the probabilities of developing cancer tumors. These are the basic approaches for modeling carcinogenesis used by the first author and his associates; see Tan [3], Tan et al. [4, 5, 8, 23], Tan and Zhou [9], Tan and Yan [16], and Tan and Chen [24, 25] and Remark 3.

*Remark 2.* At any time (say $t$) the total number of $I_k$ cells is equal to the total number of $I_k$ cells generated from $I_{k-1}$ cells at time $t$ plus the total number of $I_k$ cells generated by cell division from other $I_k$ cells at time $t$; the former cells are referred to as primary $I_k$ cells while the latter are not primary $I_k$ cells. Since each tumor is developed from a single primary $I_k$ cell through a stochastic birth and death process, each primary $I_k$ cell will generate at most one tumor. It follows that at any time the total number of $I_k$ cells is considerably greater than the number of cancer tumors (see also Yang and Chen [11]). Thus, for generating cancer tumors the only identifiable state variables are the numbers of $I_j$ cells with $j < k$ and the number of detectable cancer tumors.

*Remark 3. *To model stochastic multistage models of carcinogenesis, the standard traditional approach is to assume that the last stage cells (i.e., the cells in the model ) grow instantaneously into a cancer tumor as soon as they are generated and then apply the standard Markov theory to and to the state variables . This approach has been described in detail in Tan [2], Little [1], and Zheng [7]; see also Luebeck and Moolgavkar [26] and Durrett et al. [27]. However, in some cases the assumption of instantaneous growth into cancer tumors of cells may not be realistic (Klebanov et al. [17], Yakovlev and Tsodikov [18], and Fakir et al. [19]); in these cases, is not Markov so that the Markov theory method is not applicable to . To develop analytical results and to resolve many difficult issues, Tan and his associates [4, 5, 24] have proposed an alternative approach through stochastic equations and have followed Yang and Chen [11] to assume that cancer tumors develop by clonal expansion from primary last stage cells. Through probability generating function method, Tan and Chen [24] have shown that if the Markov theory is applicable to , then the stochastic equation method is equivalent to the classical Markov theory method but is more powerful. Also, through stochastic equation method we have shown in the Appendix that the classical approach provides a close approximation to discrete time model under the assumption that the primary last stage cells develop into a detectable tumor in one time unit. This provides a reasonable explanation why the traditional approach (see [2, 22]) can still work well even though the Markov assumption for may not hold. In this paper we will thus basically use the stochastic equation method and assume that cancer tumors develop from primary last stage cells through clonal expansion.

##### 3.1. Stochastic Equations for the State Variables

Assume now that an individual is an person at the embryo stage. Then in this individual, cancer is developed by a -stage multievent model given by and the identifiable response variables are given by . To derive stochastic equations for the staging variables in in this individual, observe that for each is in general a Markov process although may not be Markov; see Remark 1, Tan [3], Tan et al. [4, 5], Tan and Zhou [9], and Tan and Yan [16]. It follows that derive from through stochastic birth-death processes of cells and through stochastic transition during . Let be the number of births, the number of deaths of cells, and the number of transitions from cells during , respectively, in people who are people at the embryo stage. Let denote the number of transitions from during . Because the transition of would not affect the number of cells but only increase the number of cells (see Remark 4), by the conservation law we have the following stochastic equations for (see Tan [3], Tan et al. [4, 5, 8], Tan and Zhou [9], and Tan and Yan [16]):

Because are random variables, the above equations are basically stochastic equations. To derive probability distributions of these variables, let and denote the birth rate and the death rate at time of the cells, respectively. Let be the transition rate at time from . Then, as shown in Tan [3], for we have, to the order of ,

It follows that to the order of ,

From these distribution results, by subtracting from the random transition variables its conditional means, respectively, we obtain the following stochastic equations for the state variables : where for and where − for , − for .

From the above equations, by dividing both sides by and letting we obtain

In the above equations, using the distribution results in (4) it can easily be shown that the random noises have expected value zero and are uncorrelated with the staging variables and . The initial conditions at birth () for the above stochastic differential equations are .

Given the initial conditions and at birth (), the solution of the equations in (6) is given, respectively, by where

If the model is time homogeneous so that , and if if , the above solutions under the initial conditions then reduce, respectively, to where for ,

Obviously, for all . It follows that for , the expected values of for homogeneous models with if are given by where as a convention, for all .
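The expected numbers of initiated cells in a time-homogeneous model can also be obtained numerically. The sketch below is an illustration under stated assumptions, not the authors' closed-form result: it assumes that the mean of stage `j` satisfies `dm_j/dt = gamma_j * m_j + alpha_(j-1) * m_(j-1)`, where `gamma_j` is the net growth rate (birth rate minus death rate), `alpha_(j-1)` is the transition rate into stage `j`, and the number of source cells `i0` is held constant. All parameter values are hypothetical.

```python
from math import exp


def mean_path(i0, init, gamma, alpha, t_end, steps=20000):
    """Euler integration of the assumed mean equations.

    init[j]  : mean number of cells in stage j+1 at time 0
    gamma[j] : net growth rate of stage j+1
    alpha[j] : transition rate from stage j to stage j+1
    i0       : constant number of cells in the source stage
    """
    m = list(init)
    h = t_end / steps
    for _ in range(steps):
        source = [i0] + m[:-1]  # size of the stage feeding each equation
        m = [m[j] + h * (gamma[j] * m[j] + alpha[j] * source[j])
             for j in range(len(m))]
    return m


def first_stage_closed_form(i0, m0, gamma1, alpha0, t):
    """Exact solution of dm/dt = gamma1 * m + alpha0 * i0 with m(0) = m0."""
    growth = exp(gamma1 * t)
    return m0 * growth + alpha0 * i0 * (growth - 1.0) / gamma1


# Hypothetical two-stage example.
numeric = mean_path(1e6, [0.0, 0.0], [0.05, 0.08], [1e-6, 1e-4], 10.0)
exact_first = first_stage_closed_form(1e6, 0.0, 0.05, 1e-6, 10.0)
print(numeric[0], exact_first)
```

Comparing the first stage with its closed-form solution gives a quick check on the step size before the scheme is trusted for later stages.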

*Remark 4. *Because genetic changes and epigenetic changes occur during cell division, to the order of , the probability is that one cell at time would give rise to 1 cell and 1 cell at time by genetic changes or epigenetic changes. It follows that the transition of would not affect the population size of cells but only increase the size of the population.
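The conservation-law bookkeeping described in this subsection can be illustrated with a small discrete-time simulation. This is a hedged sketch, not the authors' implementation: it assumes that in one time unit each cell independently divides, dies, mutates, or does nothing, with per-unit probabilities `b[j]`, `d[j]`, and `alpha[j]` that are hypothetical inputs. As in Remark 4, a mutating cell stays in its own stage and adds one cell to the next stage.

```python
import random


def draw_events(n_cells, b, d, alpha, rng):
    """Return (births, deaths, mutations) among n_cells in one time unit."""
    births = deaths = mutations = 0
    for _ in range(n_cells):
        u = rng.random()
        if u < b:
            births += 1
        elif u < b + d:
            deaths += 1
        elif u < b + d + alpha:
            mutations += 1
    return births, deaths, mutations


def step(state, b, d, alpha, rng):
    """Advance all stages by one time unit.

    state[j] is the number of stage-j cells. Returns the new state and
    the number of new primary last-stage cells produced by the final
    entry of state.
    """
    events = [draw_events(state[j], b[j], d[j], alpha[j], rng)
              for j in range(len(state))]
    new_state = []
    for j in range(len(state)):
        births, deaths, _ = events[j]
        inflow = events[j - 1][2] if j > 0 else 0  # mutations from stage j-1
        new_state.append(state[j] + births - deaths + inflow)
    new_primary = events[-1][2]
    return new_state, new_primary


# Hypothetical rates for a model with two tracked stages.
rng = random.Random(0)
state = [1000, 0]
for _ in range(50):
    state, primary = step(state, [0.02, 0.05], [0.02, 0.03], [0.001, 0.002], rng)
print(state, primary)
```

Note that mutations are not subtracted from the originating stage, which is exactly the point made in Remark 4.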

##### 3.2. Transition Probabilities and Probability Distributions of Staging Variables

Let denote the probability density function of a binomial random variable , the probability density function of a Poisson random variable , and the probability density function of a bivariate multinomial random vector . Using the stochastic equations of the staging variables given by (2) and using the probability distributions of the transition variables in (4), as in Tan et al. [4, 5], we obtain the following transition probabilities of given for : where

Define the unobservable transition variables . Then, we have for the joint probability density function of given where

Let be a column vector with in the th position () and with in other positions. Let and be column vectors of nonnegative integers. (i.e., and are nonnegative integers). Then, by using the probability distribution results in (14)–(16) it can readily be shown that

The above results imply that is a ()-dimensional birth-death process with birth rates , death rates , and cross-transition rates . (See Definition 4.1 in Tan ([22], Chapter 4)). Using these results, it can be shown that the Kolmogorov forward equation for the probabilities in the above model is given by for .

By using the above set of differential equations, one can readily compute the probabilities numerically.
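As an illustration of this kind of numerical computation, the sketch below integrates the Kolmogorov forward equation for a deliberately simplified case: a single cell type with linear birth rate `b` and death rate `d` and no cross-transitions. It is not the multidimensional system of the paper; the state-space cap `n_max`, the Euler scheme, and the rate values are all assumptions made for the example.

```python
from math import exp


def forward_probabilities(n_start, b, d, t_end, n_max=100, steps=4000):
    """Euler solution of dP_n/dt for a one-type linear birth-death process.

    Probability flows from state n to n+1 at rate b*n and to n-1 at
    rate d*n. Births are switched off at n_max so that the truncated
    system still conserves total probability.
    """
    h = t_end / steps
    p = [0.0] * (n_max + 1)
    p[n_start] = 1.0
    for _ in range(steps):
        new = p[:]
        for n in range(n_max + 1):
            up = b * n * p[n] if n < n_max else 0.0
            down = d * n * p[n]
            new[n] -= h * (up + down)
            if n < n_max:
                new[n + 1] += h * up
            if n > 0:
                new[n - 1] += h * down
        p = new
    return p


# Hypothetical rates; start with 5 cells and run for 2 time units.
probs = forward_probabilities(5, 0.3, 0.2, 2.0)
mean = sum(n * q for n, q in enumerate(probs))
print(sum(probs), mean, 5 * exp(0.1 * 2.0))
```

For a linear birth-death process the mean is known to be the initial size times `exp((b - d) * t)`, which gives a simple sanity check on the numerical solution.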

##### 3.3. The Probability Distributions of the Number of Detectable Tumors and Times to Tumors

As shown by Yang and Chen [11], malignant cancer tumors arise from primary $I_k$ cells by clonal expansion, where primary $I_k$ cells are $I_k$ cells generated directly by $I_{k-1}$ cells. ($I_k$ cells derived by stochastic birth of other $I_k$ cells are not primary $I_k$ cells.) That is, cancer tumors develop from primary $I_k$ cells through stochastic birth-death processes.

To derive the probability distribution for in people in the population, let denote the probability that a primary cancer cell at time develops into a detectable cancer tumor by time . (An explicit formula for has been given in Tan [22], Chapter 8, and in Tan and Chen [24].) Then, as shown in Tan ([3, 22], Chapter 8), the conditional probability distribution of given in people is Poisson with mean , where . That is,
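The Poisson structure of the tumor count can be sketched in discrete time. The example below is an assumption-laden illustration, not the paper's formula: it takes the Poisson mean to be the sum, over past time units `s`, of the number of next-to-last-stage cells at `s` times the transition rate times the probability that a primary cell born at `s` is detectable by `t`. The detection function `one_step_lag` (certain detection after one time unit) and all numerical values are hypothetical.

```python
from math import exp


def tumor_mean(cells_before_last, alpha, p_detect, t):
    """Assumed discrete-time Poisson mean of the tumor count at time t."""
    return sum(cells_before_last[s] * alpha * p_detect(s, t) for s in range(t))


def one_step_lag(s, t):
    """Hypothetical detection rule: detectable once one time unit has passed."""
    return 1.0 if t - s >= 1 else 0.0


# Hypothetical path: 100 next-to-last-stage cells in each of 5 time units.
path = [100] * 5
lam = tumor_mean(path, 0.01, one_step_lag, 5)
p_no_tumor = exp(-lam)          # Poisson probability of zero tumors
p_some_tumor = 1.0 - p_no_tumor
print(lam, p_no_tumor, p_some_tumor)
```

Passing the detection probability as a function keeps the sketch independent of any particular formula for the clonal-expansion step.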

Let be the probability that cancer tumors develop during in people in the population. For time homogeneous models with small , is then given by where .

To derive , denote by and define the functions

Applying results of given in (11), for time homogeneous models with if we obtain ’s as follows.

(1) If , then . Hence, , for and for ,

(2) If , then we have for and and for , where if and if .

Notice that if , then reduces to

Notice also that if for and if , then the above ’s reduce, respectively, to

#### 4. Probability Distribution of Observed Cancer Incidence Incorporating Hereditary Cancer Cases

To estimate unknown parameters and to validate the model, one needs real data generated from the model. For studies of carcinogenesis such data are usually given by cancer incidence. For example, in the SEER data of NCI/NIH of USA, the data are given by , where is the number of cancer cases at birth and the total number of births, and where for , is the number of cancer cases developed during the th age group of a one-year period (or 5-year periods) and is the number of noncancer people who are at risk for cancer and from whom of them have developed cancer during the th age group. Given in Table 1 are the SEER data of uveal melanoma (adult eye cancer) during the period 1973–2007. In Table 1, notice that there are some cancer cases at birth, implying some inherited cancer cases. In this section, we will develop a statistical model for these types of data sets from the stochastic multistage model with hereditary cancers as given in Section 2. As in previous sections, let be the number of individuals who have genotype at the embryo stage among the people at risk for the cancer in question. Then, as shown above, . It follows that . In what follows, we let denote the random variable for unless otherwise stated.

##### 4.1. The Probability Distribution of

As shown in Figure 4, people would only generate stage cells and stage cells at birth. Thus, for cancers to develop at or before birth, the number of stages for the stochastic model of carcinogenesis must be 3 or less. It follows that if , the appropriate model of carcinogenesis must be either a 2-stage model or a 3-stage model. Since and , the probability distribution of is therefore where

The expected number of given is if and if . Hence, for the 2-stage model (i.e., ) or the 3-stage model (i.e., ), the maximum likelihood estimate of is and the deviance from the conditional probability distribution of given is

##### 4.2. The Probability Distribution of

To derive the probability distribution of in the th age group, let be the number of cancer cases generated by people who have genotype at the embryo stage among these cancer cases. Then and is the number of cancer cases generated by the normal people in the population. The conditional probability distribution of given is

Notice that if (a 2-stage model), then all individuals would develop tumor at or before birth. Thus, if , then for all so that if , cancer cases develop only from normal people ( people) and people. On the other hand, if , then with positive probability for all , where is the last time point in the data. Let if and if . Then, , where . Since , we have for the conditional probability density function of given where is the probability density function of and the probability density function of .

The probability density function given by (34) is a mixture of Poisson probability density functions with mixing probability density function given by the multinomial probability distribution of given . This mixing probability density function represents individuals with different genotypes at the embryo stage in the population.
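The mixture structure described above can be made concrete with a small numerical example. This is a sketch under stated assumptions rather than a transcription of (34): it assumes that, given the genotype counts `(n0, n1, n2)` among `n` people at risk, the number of cases in an age group is Poisson with mean `n0*q[0] + n1*q[1] + n2*q[2]`, where `q[i]` is a hypothetical per-person probability of developing cancer in that age group for genotype `i`, and that the genotype counts are multinomial with proportions `p1` and `p2`.

```python
from math import exp, lgamma, log


def multinomial_weight(n, n1, n2, p1, p2):
    """Multinomial probability of genotype counts (n - n1 - n2, n1, n2)."""
    n0 = n - n1 - n2
    p0 = 1.0 - p1 - p2
    log_w = (lgamma(n + 1) - lgamma(n0 + 1) - lgamma(n1 + 1) - lgamma(n2 + 1)
             + n0 * log(p0) + n1 * log(p1) + n2 * log(p2))
    return exp(log_w)


def poisson_pmf(y, mean):
    """Poisson probability of y events for a strictly positive mean."""
    return exp(y * log(mean) - mean - lgamma(y + 1))


def incidence_pmf(y, n, p1, p2, q):
    """Assumed mixture: multinomial genotype counts mixing Poisson case counts."""
    total = 0.0
    for n1 in range(n + 1):
        for n2 in range(n + 1 - n1):
            n0 = n - n1 - n2
            mean = n0 * q[0] + n1 * q[1] + n2 * q[2]
            total += multinomial_weight(n, n1, n2, p1, p2) * poisson_pmf(y, mean)
    return total


# Hypothetical values, chosen only to keep the example small.
n, p1, p2 = 20, 0.05, 0.01
q = (0.001, 0.02, 0.2)
pmf = [incidence_pmf(y, n, p1, p2, q) for y in range(41)]
mixture_mean = sum(y * v for y, v in enumerate(pmf))
print(sum(pmf), mixture_mean)
```

Under these assumptions the mixture mean equals `n * (p0*q[0] + p1*q[1] + p2*q[2])`, which provides a direct check on the enumeration.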

Let be the set of all unknown parameters (i.e., the parameters and the birth rates, the death rates, and the mutation rates of cells). Based on data , the likelihood function of is

Notice that because the mutation rates are very small, one may practically assume for . Also, because the stage-limiting genes are basically tumor suppressor genes which act recessively (see Tan [3], Weinberg [6], and Tan et al. [5, 8, 23]), one may practically assume (see Tan et al. [4, 5, 8]).

##### 4.3. The Joint Probability Distribution of Augmented Variables and Cancer Incidence

To apply the mixture distribution of in (34) to make inferences about the unknown parameters, one needs to expand the model to include the unobservable augmented variables and derive the joint probability distribution of these variables. For these purposes, observe that for and for , the conditional probability distribution of given is

Since the conditional probability distribution of given for is Poisson with mean , we have for the joint conditional probability density function of given where and .

If , then so that . Thus, we have for ,

It follows that if , then and the joint probability density function of given is where and .

Put , , , . From (37) and (40), we have for the conditional joint probability density function of given

It follows that the joint conditional probability density function of given is

Notice that the above probability density function is a product of multinomial probability density functions and Poisson probability density functions. For this joint probability density function, the deviance is where where .

The joint probability density function of given by (42) will be used as the kernel for the Bayesian method to estimate the unknown parameters and to predict the state variables.

##### 4.4. Fitting of the Model to Cancer Incidence Data

To fit the model to real data, as in Tan [3–5], we let to correspond to a fixed time interval such as 6 months in human cancer studies. (Tan et al. [4] have assumed 3 months as one time unit, while Luebeck and Moolgavkar [26] have assumed one year as one time unit.) Then, because the proliferation rate of the last stage cells is quite large, one may practically assume for . Hence, noting that is usually very small (see [3–5]), the is approximated by where .

Under discrete time approximation, the ’s have been derived in the appendix. Using these results of expected numbers and using the result for , we obtain where and are defined in Section 3.3 and where the ’s are given by

Notice that if , then reduces to

Applying these results, for time homogeneous models with if , the ’s under discrete approximation are given as follows.

(1) If , then . Hence, , for and for ,

(2) If , then we have and for ,