Abstract

A new cure rate model is presented by assuming that, conditional on a random parameter, the number of competing causes of the event of interest follows the Poisson distribution, where the random parameter follows either a gamma or a generalized exponential distribution. For the time-to-event of the competing causes, we assume the recently introduced generalized truncated Nadarajah–Haghighi model. The model is parameterized directly in terms of the cure rate, and different symmetric and asymmetric link functions are used to assess the effects of covariates: logit, probit, log-log, Cauchit, Aranda-Ordaz, skewed probit, and skewed logit. Parameter estimation is based on the traditional maximum likelihood method. We conduct a simulation study to investigate the performance of the estimators under different scenarios. Finally, the model is illustrated with data sets related to two kinds of cancer (melanoma and colon cancer).

1. Introduction

Modeling the lifetime of patients and predicting their remaining lifetime are very important for the prevention, control, and management of cancer. Identifying and evaluating the variables that affect cancer prognosis is one of the key topics in survival analysis and in the modeling of cancer data. In recent years, treatment for many diseases, especially cancers, has improved significantly, so the number of patients who do not experience death has increased; as a result, cure rate models are more appropriate than conventional survival models. Cure rate models are designed to analyze survival data in the presence of long-term survivors: they assume that a proportion of individuals never experience the event of interest. As a result, the survival curve reaches a plateau. Standard survival models are therefore not appropriate, because in those models the survival function decreases to zero as $t \to \infty$. To introduce the model, let $N$ be the unobservable initial number of competing causes related to the occurrence of the event of interest for an individual in the population (for example, in a cancer context, the number of carcinogenic cells that can produce detectable cancer). Let $Z_1, Z_2, \ldots$ be a sequence of random variables denoting the times-to-event associated with each cause. Conditional on $N$, the $Z_j$ are assumed independent and identically distributed with common survival function $S(t) = 1 - F(t)$. We also assume that $Z_1, Z_2, \ldots$ are independent of $N$. If $N \geq 1$, we say the subject is uncured. If $N = 0$, we define $T = \infty$ and say the subject is cured. The time up to the event of interest is given by

$$T = \min\{Z_1, \ldots, Z_N\}. \quad (1)$$

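The population survival function is given as follows (see Tsodikov et al. [1] and Rodrigues et al. [2] for details):

$$S_{\text{pop}}(t) = P(T > t) = \sum_{n=0}^{\infty} P(N = n)\,[S(t)]^{n} = A_N(S(t)), \quad (2)$$

where $A_N(\cdot)$ denotes the probability generating function (PGF) of the number of competing causes $N$ and $S(t)$ is the common survival function of the latent event times.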
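The composition of a PGF with the latent survival function can be illustrated numerically. The following is a minimal sketch, not the authors' code: it assumes a plain Poisson count (the PCR special case) and an exponential latent survival function purely as stand-ins, and the function names `poisson_pgf`, `exponential_survival`, and `population_survival` are hypothetical.

```python
import math

def poisson_pgf(theta):
    """PGF s -> exp(-theta * (1 - s)) of a Poisson(theta) count (stand-in choice)."""
    return lambda s: math.exp(-theta * (1.0 - s))

def exponential_survival(rate):
    """Latent survival function S(t) = exp(-rate * t) (stand-in for the GeTNH model)."""
    return lambda t: math.exp(-rate * t)

def population_survival(pgf, latent_survival):
    """Population survival: the PGF of the number of causes evaluated at S(t)."""
    return lambda t: pgf(latent_survival(t))

theta, rate = 1.5, 0.8                      # illustrative values only
s_pop = population_survival(poisson_pgf(theta), exponential_survival(rate))
cure_rate = poisson_pgf(theta)(0.0)         # P(N = 0), the plateau of s_pop

for t in (0.0, 1.0, 5.0, 50.0):
    print(f"t = {t:5.1f}  S_pop(t) = {s_pop(t):.4f}")
print(f"cure rate = {cure_rate:.4f}")
```

The printed values start at 1 and level off at the cure rate instead of decaying to zero, which is the plateau behaviour that motivates cure rate models.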

In the literature, many distributions have been considered for the number of competing causes. To name a few: the negative binomial (de Castro et al. [3]), zero-modified geometric (Leão et al. [4]), geometric (Cancho et al. [5]), Mittag-Leffler (Di Crescenzo and Meoli [6]), logistic-binary (Balakrishnan et al. [7]), weighted Poisson (Balakrishnan et al. [8]), modified power series (Gallardo et al. [9]), and Flory–Schulz (Azimi et al. [10]) distributions; the Bernoulli (BeCR), geometric (GeCR), and binomial (BCR) cure rate models (D’Andrea et al. [11]); the Poisson (PCR) cure rate model (Chen et al. [12]); and the negative binomial (NBCR) cure rate model (Rodrigues et al. [2]), among others.

In this work, we present a novel cure rate model based on the Poisson, gamma, and generalized exponential distributions. Moreover, the generalized truncated Nadarajah–Haghighi distribution (GeTNH, Azimi and Esmailian [13]) is used to model the time-to-event of the competing causes. We highlight that this distribution was only recently proposed in the literature.

This manuscript is organized as follows. In Section 2, we present the Poisson-gamma (PGcr) and Poisson-generalized exponential (PGEcr) cure rate models, which are based on mixtures of Poisson distributions. In Section 3, we discuss point and interval estimation for the model using the maximum-likelihood approach in the presence of covariates with different regression structures. In Section 4, to assess the practical usefulness, applicability, and flexibility of the new model, we analyze the melanoma and colon cancer data. In Section 5, we present a simulation study to assess the performance of the estimators in finite samples. In Section 6, we summarize the main conclusions of the manuscript.

2. Modeling

Let $N$ be the random variable denoting the number of competing causes associated with the event of interest. Following Barriga et al. [14], we consider that, given $Y = y$, $N$ has a Poisson distribution with mean $\theta y$, where $\theta > 0$ is a constant and $Y$ is a random parameter with probability density function (PDF) $g(y)$, which accounts for unobserved heterogeneity or dependence from individual to individual. Thus, the probability generating function (PGF) of $N$ given $Y = y$ is as follows:

$$A_{N \mid Y = y}(s) = \exp\{-\theta y (1 - s)\}, \quad 0 \leq s \leq 1. \quad (3)$$

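Therefore, the marginal PGF of the number of competing causes $N$ is given by

$$A_N(s) = E\left[\exp\{-\theta Y (1 - s)\}\right] = M_Y\left(-\theta (1 - s)\right), \quad (4)$$

where $\theta$ is the constant in the Poisson mean, $Y$ is the random parameter, and $M_Y(\cdot)$ denotes the moment generating function (MGF) of $Y$. In the following subsections, we detail two particular cases.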

2.1. Poisson-Gamma Cure Rate Model

In this case, we consider that has the gamma distribution with parameters and , which implies the MGF of gamma distribution is (to avoid identifiability problems, we set ) , and then the PGF is shown as follows:

From equations (2) and (5), the survival function for the population is given by the following expression:where is the cumulative distribution function (CDF) of the latent event times .

Hereafter, we refer to the model in equation (6) as the Poisson-gamma cure rate model (PGcr). The PGcr model investigated by Cancho et al. [15] is also known in the literature as the negative binomial cure rate model, because the marginal distribution of the number of competing causes is negative binomial. They assumed directly that the number of competing causes has a negative binomial distribution and used its PGF to obtain the cure rate model. In this work, the PGcr model is derived by a different route, namely as a gamma mixture of Poisson distributions.
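The equivalence between the gamma mixture of Poisson distributions and the negative binomial distribution can be checked numerically through their PGFs. The sketch below is illustrative only: it assumes a gamma distribution in the shape/scale form and a Poisson mean equal to a constant `theta` times the gamma variable, which may differ from the parameterization (and the identifiability constraint) used in the paper. All function names are hypothetical.

```python
import math

def gamma_mgf(u, shape, scale):
    """MGF of a gamma(shape, scale) variable, valid for u < 1 / scale."""
    return (1.0 - scale * u) ** (-shape)

def mixed_poisson_pgf(s, theta, shape, scale):
    """PGF of N when N | Y ~ Poisson(theta * Y) and Y ~ gamma(shape, scale)."""
    return gamma_mgf(-theta * (1.0 - s), shape, scale)

def negative_binomial_pmf(n, shape, q):
    """P(N = n) = Gamma(n + shape) / (n! Gamma(shape)) * (1 - q)^shape * q^n."""
    log_p = (math.lgamma(n + shape) - math.lgamma(n + 1) - math.lgamma(shape)
             + shape * math.log1p(-q) + n * math.log(q))
    return math.exp(log_p)

def negative_binomial_pgf(s, shape, q, terms=2000):
    """PGF evaluated by summing the truncated series sum_n P(N = n) * s^n."""
    return sum(negative_binomial_pmf(n, shape, q) * s ** n for n in range(terms))

theta, shape, scale = 2.0, 1.5, 0.7            # illustrative values only
q = scale * theta / (1.0 + scale * theta)      # implied negative binomial parameter

for s in (0.0, 0.3, 0.9, 1.0):
    lhs = mixed_poisson_pgf(s, theta, shape, scale)
    rhs = negative_binomial_pgf(s, shape, q)
    print(f"s = {s:.1f}  mixture PGF = {lhs:.6f}  negative binomial PGF = {rhs:.6f}")
```

Evaluating either PGF at `s = 0` gives the probability of zero competing causes, which is the cure rate under this assumed parameterization.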

From equation (6), the cure rate immediately is given by the following expression:

Hereafter, we use the parameterization , i.e., . In this approach, covariates can be introduced in the cure rate. This has the advantage of allowing comparison among regression coefficients for different models and also parameterized in terms of the cure rate. Under this alternative parameterization, the survival function for the PGcr model for the population from equation (6) is given by the following expression:and the population probability density function (PDF) is

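Note that $S_{\text{pop}}(t)$ and $f_{\text{pop}}(t)$ are not a proper survival function and a proper PDF, respectively, because $S_{\text{pop}}(t)$ converges to the cure rate rather than to zero as $t \to \infty$. The hazard function is as follows: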

Alternatively, we assume the GeTNH model (Azimi and Esmailian [13]) for the latent variables , in equations (8)–(10). The PDF for this distribution is as follows:where, . The cumulative distribution function of GeTNH is given by the following expression:

We refer to this particular case as the PGcr/GeTNH model. Its PDF, survival function, and hazard function are given in equations (13), (14), and (15), respectively.

Figures 1 and 2 present the hazard rate function (HRF) and the survival function for the PGcr/GeTNH model. We highlight that the HRF of the model assumes different shapes depending on the parameter values: decreasing, bathtub shaped, or upside-down bathtub shaped. For this reason, the PGcr/GeTNH model is flexible enough to analyze real data.

2.2. Poisson-Generalized Exponential Cure Rate Model

In this case, we consider that has the generalized exponential (GE) distribution with parameters and , which implies the MGF of GE distribution is (Gupta and Kundu [16])

From equations (2) and (16), the survival function for the population is given by the following expression:

Hereinafter, we make reference to this model as the Poisson-generalized exponential cure rate model (PGEcr). From (17), the cure rate is as follows:

Henceforth, we adopt the parameterization , i.e., .

With this parameterization in the cure rate , the survival function of the PGEcr model for the population from equation (17) is given by

The population PDF is as follows:

The hazard function corresponding to the PDF in equation (20) is as follows:

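When the GeTNH model is assumed for the latent event times, we refer to this model as the PGEcr/GeTNH model. Its PDF, survival function, and hazard function are given in equations (22), (23), and (24), respectively.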

Figures 3 and 4 present the HRF and the survival function for this model. Note that the HRF of the PGEcr/GeTNH model is either decreasing or upside-down bathtub shaped, depending on the parameter values.

3. Estimation

In this section, the parameters of the PGcr/GeTNH and PGEcr/GeTNH models are estimated by the maximum-likelihood (ML) method. Define $T_i$ and $C_i$ as the failure and censoring times for the $i$th individual, respectively. As usual in prospective studies, we assume a right-censoring scheme, where, for $i = 1, \ldots, n$, $t_i = \min(T_i, C_i)$ and $\delta_i = I(T_i \leq C_i)$ represent the observed times and the failure indicators, respectively. Note that $\delta_i = 1$ and $\delta_i = 0$ denote a failure and a censoring time for the corresponding individual, respectively. In addition, to circumvent identifiability problems (Li et al. [17] and Hanin and Huang [18]), we consider a set of covariates for each individual, say $\mathbf{x}_i$, related to the cure rate. We also explore different link functions for the cure rate through the relation in equation (25), where the different choices of $A$ are presented in Table 1.

In Table 1, $\boldsymbol{\beta}$ is the vector of unknown regression coefficients, with one component per term of the linear predictor, and $\Phi(\cdot)$ denotes the standard normal cumulative distribution function.
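To make the role of the link function concrete, the sketch below maps a linear predictor to a cure rate in (0, 1) for four of the links named in the text. Because Table 1 is not reproduced here, the forms used are the textbook inverse links and are an assumption: the paper's versions may differ in sign or orientation (for example, log-log versus complementary log-log), and the Aranda-Ordaz and skewed links are omitted. The names `INVERSE_LINKS` and `cure_rate` are hypothetical.

```python
import math

def inv_logit(eta):
    """Logistic function, written to avoid overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)

def inv_probit(eta):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def inv_loglog(eta):
    """Inverse of the log-log link p = exp(-exp(-eta)) (assumed orientation)."""
    if eta < -700.0:          # exp(-eta) would overflow; the limit is 0
        return 0.0
    return math.exp(-math.exp(-eta))

def inv_cauchit(eta):
    """Standard Cauchy CDF."""
    return 0.5 + math.atan(eta) / math.pi

INVERSE_LINKS = {
    "logit": inv_logit,
    "probit": inv_probit,
    "log-log": inv_loglog,
    "cauchit": inv_cauchit,
}

def cure_rate(beta, x, link="logit"):
    """Cure rate for covariate vector x under the chosen link (sketch)."""
    eta = sum(b * v for b, v in zip(beta, x))
    return INVERSE_LINKS[link](eta)

beta = [-1.0, 0.5]            # hypothetical coefficients: intercept and one covariate
for name in INVERSE_LINKS:
    print(name, round(cure_rate(beta, [1.0, 1.0], link=name), 4))
```

The same linear predictor yields noticeably different cure rates across links, which is why the applications compare links by information criteria rather than fixing one in advance.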

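With the standard assumptions in survival analysis (independence between the failure and censoring times, independence among random variables related to different observations, and noninformative censoring; see Williams and Lagakos [22] for details), the log-likelihood function for the model parameters is

$$\ell = \sum_{i=1}^{n} \left[ \delta_i \log f_{\text{pop}}(t_i) + (1 - \delta_i) \log S_{\text{pop}}(t_i) \right], \quad (26)$$

where $t_i$ and $\delta_i$ are the observed time and the failure indicator of the $i$th individual, $f_{\text{pop}}$ and $S_{\text{pop}}$ are given in equations (13) and (14) for the PGcr model and in equations (19) and (20) for the PGEcr model, and the cure rate is specified through the link functions in Table 1.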
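The structure of the right-censored log-likelihood can be sketched in a few lines: failures contribute the log of the population density and censored observations contribute the log of the population survival function. The example below is an illustration under stated assumptions, not the paper's model: it uses the plain Poisson cure rate model parameterized by the cure rate, with an exponential latent distribution as a stand-in for the GeTNH model, and the names `make_poisson_cure_model` and `censored_loglik` are hypothetical.

```python
import math

def make_poisson_cure_model(p0, rate):
    """Return (f_pop, s_pop) for a Poisson cure model with cure rate p0.

    Stand-in assumptions: N ~ Poisson(-log p0) and exponential latent times,
    so that S_pop(t) = p0 ** F(t) with F(t) = 1 - exp(-rate * t).
    """
    log_p0 = math.log(p0)

    def s_pop(t):
        return math.exp(log_p0 * (1.0 - math.exp(-rate * t)))

    def f_pop(t):
        # Minus the derivative of s_pop: -log(p0) * f(t) * S_pop(t).
        return -log_p0 * rate * math.exp(-rate * t) * s_pop(t)

    return f_pop, s_pop

def censored_loglik(times, events, f_pop, s_pop):
    """Sum of log f_pop(t) over failures and log S_pop(t) over censored times."""
    total = 0.0
    for t, delta in zip(times, events):
        total += math.log(f_pop(t)) if delta == 1 else math.log(s_pop(t))
    return total

# Hypothetical toy data: observed times and failure indicators.
times = [0.4, 1.2, 2.5, 3.0, 6.0]
events = [1, 1, 0, 1, 0]
f_pop, s_pop = make_poisson_cure_model(p0=0.3, rate=0.9)
print(round(censored_loglik(times, events, f_pop, s_pop), 4))
```

In practice this function would be passed (with a sign change) to a numerical optimizer over the distribution parameters and regression coefficients.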

The estimates are obtained by numerically maximizing the log-likelihood function in (26), which can be done, for instance, with the NMaximize function in Mathematica or with the bbmle package of R [23].

4. Illustrative Example

In this section, we apply the methodology described in this study to real data: the PGcr/GeTNH and PGEcr/GeTNH models are fitted to the colon cancer and malignant melanoma data sets. We compare the PGcr/GeTNH and PGEcr/GeTNH models with logit, probit, log-log, Cauchit, Aranda-Ordaz, skewed-logit, and skewed-probit link functions for the cure rate, using Akaike's information criterion (AIC, Akaike [24]) and the Bayesian information criterion (BIC, Schwarz [25]). Lower values of these criteria suggest a preferable model. We also compare the PGcr/GeTNH and PGEcr/GeTNH models with some popular cure rate models, namely BeCR, GeCR, BCR, PCR, and NBCR, all based on the GeTNH distribution.
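Both criteria are simple functions of the maximized log-likelihood. The sketch below uses the usual definitions, AIC = 2k - 2l and BIC = k log(n) - 2l, where l is the maximized log-likelihood, k the number of estimated parameters, and n the sample size. The function names and the log-likelihood values in the example are hypothetical and are not taken from the paper's tables.

```python
import math

def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 * log-likelihood."""
    return 2.0 * n_params - 2.0 * loglik

def bic(loglik, n_params, n_obs):
    """Bayesian information criterion: k * log(n) - 2 * log-likelihood."""
    return n_params * math.log(n_obs) - 2.0 * loglik

def best_by(criterion_values):
    """Return the label with the smallest criterion value."""
    return min(criterion_values, key=criterion_values.get)

# Hypothetical fits: link name -> (maximized log-likelihood, number of parameters).
fits = {"logit": (-210.4, 8), "cauchit": (-208.9, 8)}
n_obs = 205
aic_table = {name: aic(ll, k) for name, (ll, k) in fits.items()}
bic_table = {name: bic(ll, k, n_obs) for name, (ll, k) in fits.items()}
print(best_by(aic_table), best_by(bic_table))
```

When two candidate models have the same number of parameters, as in this toy comparison of links, AIC and BIC rank them identically, so any disagreement between the criteria comes from differences in model size.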

4.1. Melanoma Data

The malignant melanoma data set records the survival time (ranging from 0.0274 to 15.25 years) from an operation to remove a malignant melanoma until the patient’s death, which is possibly censored. This data set has 205 observations and is available in the timereg package of R. For more details of this data set, see https://CRAN.R-project.org/package=timereg. The following covariates are available:
(i) sex (1 = male, 38.5%; 0 = female, 61.5%),
(ii) age, more than 55 years (0 = no, 53.7%; 1 = yes, 46.3%),
(iii) tumour thickness, more than 2.99198 cm (0 = no, 64.9%; 1 = yes, 35.1%),
(iv) ulceration (1 = present, 43.9%; 0 = absent, 56.1%).

Figure 5 presents the Kaplan–Meier estimator of the survival function for the malignant melanoma data. Note that a proportion of the patients censored at the end of the study may be immune, which suggests the existence of cured individuals in the population.

Table 2 provides the ML estimates and the corresponding AIC and BIC for the most popular cure rate models with the logit link function, using the GeTNH distribution. Table 3 provides the same information for the PGcr and PGEcr models under different link functions, again assuming the GeTNH distribution for the time-to-event of the competing causes. Comparing the cure rate models under the logit link function, the AIC and BIC in Tables 2 and 3 indicate that the PGcr model is preferable to the BeCR, BCR, GeCR, PCR, and NBCR cure rate models. Comparing the link functions, the AIC and BIC in Table 3 indicate that the PGcr/GeTNH model with the Cauchit link function has the lowest values among the PGcr and PGEcr cure rate models with different link functions. We therefore conclude that the PGcr/GeTNH model with the Cauchit link function provides the best fit to the melanoma data. From Table 3, using the PGcr/GeTNH model with the Cauchit link function, we conclude that sex, age, tumour thickness, and ulceration status have a significant effect on the cure rate.

Figure 6 presents the estimated survival function based on the PGcr/GeTNH model stratified by sex (1 = male, 0 = female) and age more than 55 years (0 = no, 1 = yes), for tumour thickness more than 2.99198 cm (1 = yes) and ulceration (1 = present). We observe that the cure rate of male patients older than 55 years with ulceration present and tumour thickness more than 2.99198 cm is greater than the cure rate of male patients younger than 55 years with ulceration present. The cure rate of female patients younger than 55 years with ulceration present and tumour thickness more than 2.99198 cm is greater than the cure rate of female patients older than 55 years with ulceration present and tumour thickness more than 2.99198 cm.

4.2. Colon Cancer Data

This data set is related to patients with colon cancer, with 50.58% censoring (938 patients), and is available in the survival package [26] of R. The response variable is the time up to the death or censoring of the patient (measured in years). The following variables are also available:
(i) node4: more than 4 positive lymph nodes (0 = no, 72.6%; 1 = yes, 27.4%),
(ii) sex: (1 = male, 47.9%; 0 = female, 52.1%),
(iii) obstruct: obstruction of colon by tumour (0 = no, 80.6%; 1 = yes, 19.4%),
(iv) surg: time from surgery to registration (0 = short, 73.4%; 1 = long, 26.6%),
(v) tr: treatment (Obs(ervation) = 1, 33.9%; Lev(amisole) = 2, 33.4%; Lev(amisole)+5-FU = 3, 32.7%),
(vi) extent: extent of local spread (1 = submucosa, 2.3%; 2 = muscle, 11.4%; 3 = serosa, 81.7%; 4 = contiguous structures, 4.6%),
(vii) adh: adherence to nearby organs (0 = no, 85.5%; 1 = yes, 14.5%),
(viii) age: more than 55 years (0 = no, 28.5%; 1 = yes, 71.5%).

Figure 7 presents the Kaplan–Meier estimator for these data. Again, a plateau in the plot suggests that a proportion of patients can be considered cured. Table 4 provides the ML estimates of the parameters and the corresponding information criteria for the colon cancer data for the most popular cure rate models with the logit link function, using the GeTNH distribution. Tables 5 and 6 (based on the PGcr model and the PGEcr model, respectively) provide the ML estimates and standard errors of the parameters and the corresponding information criteria for the colon cancer data, assuming the GeTNH distribution for the time-to-event of the competing causes. Comparing the cure rate models under the logit link function, the AIC and BIC in Tables 4–6 indicate that the PGcr model provides the best fit among the BeCR, BCR, GeCR, PCR, and NBCR cure rate models. Comparing the link functions, the AIC and BIC in Tables 5 and 6 indicate that the PGcr/GeTNH model with the Cauchit link function has the lowest values among the PGcr and PGEcr cure rate models with different link functions. We therefore conclude that the PGcr/GeTNH model with the Cauchit link function provides the best fit for the colon cancer data. From Table 5, using the PGcr/GeTNH model with the Cauchit link function, we conclude that node4, obstruct, surg, Obs(ervation), Lev(amisole), adherence (adh), and extent of local spread (submucosa, muscle, and serosa) have a significant effect on the cure rate.

Figure 8 displays the estimated survival functions of the PGcr/GeTNH model for the colon cancer data stratified by node4 (more than 4 positive lymph nodes; 0 = no, 1 = yes) and treatment (Obs(ervation) = 1, Lev(amisole) = 2), with extent: serosa, surg: short, adherence: 0, and obstruct: 0. From Figure 8, the cure rate of patients with more than 4 positive lymph nodes is lower than the cure rate of patients with 4 or fewer positive lymph nodes. Among patients with more than 4 positive lymph nodes, the cure rate under levamisole treatment is greater than the cure rate under observation. Among patients with 4 or fewer positive lymph nodes, the cure rate under observation is greater than the cure rate under levamisole treatment.

5. Simulation Study

In this section, we present a simulation study to compare several link functions and to assess the accuracy of the MLEs of the parameters of the PGcr and PGEcr models based on the GeTNH distribution with covariates. Applying an algorithm similar to that of Kutal and Qian [27], right-censored samples of size $n$ from the PGcr and PGEcr models with the GeTNH distribution can be drawn as follows.
Step 1: Fix the parameters values, , and , and the cure rate for each individual .
Step 2: Simulate , .
Step 3: Simulate asif , otherwise .
Step 4: Simulate the censoring time $C_i$ from the GeTNH model, adjusting the parameters to obtain the desired percentage of censoring.
Step 5: Compute $t_i = \min(T_i, C_i)$, where $T_i$ is the simulated failure time, and set $\delta_i = 1$ if $T_i \leq C_i$ and $\delta_i = 0$ if $T_i > C_i$. The drawn information is $(t_i, \delta_i)$.
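A right-censored sample from a mixed-Poisson cure model can also be generated directly from the competing-causes construction, which may help in checking an implementation of the steps above. The sketch below is not the algorithm of Kutal and Qian [27] and makes several stand-in assumptions: a gamma random effect in shape/scale form, exponential latent and censoring times in place of the GeTNH model (whose formulas are not reproduced in this text), and no covariates. All names and parameter values are hypothetical.

```python
import math
import random

def draw_poisson(mean, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def simulate_subject(theta, shape, scale, latent_rate, censor_rate, rng):
    """Return (observed time, failure indicator, cured flag) for one subject."""
    y = rng.gammavariate(shape, scale)            # random effect Y
    n_causes = draw_poisson(theta * y, rng)       # N | Y ~ Poisson(theta * Y)
    if n_causes == 0:
        event_time = math.inf                     # cured: the event never occurs
    else:
        event_time = min(rng.expovariate(latent_rate) for _ in range(n_causes))
    censor_time = rng.expovariate(censor_rate)
    observed = min(event_time, censor_time)
    delta = 1 if event_time <= censor_time else 0
    return observed, delta, n_causes == 0

def simulate_sample(size, theta, shape, scale, latent_rate, censor_rate, seed=0):
    """Draw an independent right-censored sample with a fixed seed."""
    rng = random.Random(seed)
    return [simulate_subject(theta, shape, scale, latent_rate, censor_rate, rng)
            for _ in range(size)]

sample = simulate_sample(5, theta=2.0, shape=1.5, scale=0.7,
                         latent_rate=1.0, censor_rate=0.3, seed=1)
for observed, delta, cured in sample:
    print(f"t = {observed:.3f}  delta = {delta}  cured = {cured}")
```

Lowering `censor_rate` reduces the share of uncured subjects that are censored, which plays the role of "adjusting the parameters to obtain the desired percentage of censoring" in Step 4.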

We consider the PGcr and PGEcr models with four covariates , , , and , where , , and , all of them are independent.

In the simulation study, the initial values of the parameters are . The initial value of , and were computed from the combinations of values of and cure rates . For , we choose (0, 0, 0, 0), (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 2, 0), and (0, 0, 1, 2). In the studies, we also consider the cure rate (0.2, 0.205, 0.196, 0.192, and 0.193). Solving the four equations resulting from the logit link function in Table 1, we obtain , , , and .
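The way regression coefficients follow from the chosen cure rates can be sketched as a small back-substitution. The code below assumes that the logit link has the form logit(cure rate) = intercept + linear combination of the four covariates, and that the listed covariate vectors pair with the listed cure rates in the order given; since Table 1 is not reproduced here, the sign convention is an assumption and the resulting values need not match those reported by the authors. The function names are hypothetical.

```python
import math

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-eta))

# Covariate vectors and cure rates as listed in the text (pairing by order is assumed).
covariates = [(0, 0, 0, 0), (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 2, 0), (0, 0, 1, 2)]
cure_rates = [0.2, 0.205, 0.196, 0.192, 0.193]

def solve_coefficients(cure_rates):
    """Back-substitute through the five logit equations (assumed link form)."""
    eta = [logit(p) for p in cure_rates]
    b0 = eta[0]                      # x = (0, 0, 0, 0) isolates the intercept
    b1 = eta[1] - b0                 # x = (1, 0, 0, 0)
    b2 = eta[2] - b0                 # x = (0, 1, 0, 0)
    b3 = (eta[3] - b0) / 2.0         # x = (0, 0, 2, 0)
    b4 = (eta[4] - b0 - b3) / 2.0    # x = (0, 0, 1, 2)
    return [b0, b1, b2, b3, b4]

beta = solve_coefficients(cure_rates)
for x, p in zip(covariates, cure_rates):
    eta = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    print(x, p, round(inv_logit(eta), 6))
```

The final loop is a round-trip check: each covariate vector should reproduce its target cure rate under the fitted coefficients.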

For the simulations, we take the sample sizes $n = 150$ and $n = 300$. We replicate the simulations 1000 times and evaluate the ML estimates, root mean square errors (RMSEs), average of the asymptotic standard errors (SEs), and coverage probability of the 95% asymptotic confidence intervals (CP). The program codes are available upon request.
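The summary measures reported for each parameter can be computed from the replicated estimates and their standard errors. The sketch below uses the usual definitions (RMSE around the true value, and coverage of Wald intervals of the form estimate plus or minus 1.96 standard errors); the function name `summarize` and the toy inputs are hypothetical and do not come from the paper's simulations.

```python
import math

def summarize(estimates, std_errors, true_value, z=1.959964):
    """Mean estimate, bias, RMSE, average SE, and coverage of Wald intervals."""
    reps = len(estimates)
    mean_est = sum(estimates) / reps
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / reps)
    avg_se = sum(std_errors) / reps
    covered = sum(1 for e, s in zip(estimates, std_errors)
                  if e - z * s <= true_value <= e + z * s)
    return {
        "mean": mean_est,
        "bias": mean_est - true_value,
        "rmse": rmse,
        "avg_se": avg_se,
        "cp": covered / reps,
    }

# Toy replications for a parameter whose true value is 1.0.
result = summarize([0.9, 1.1, 1.0, 1.6], [0.1, 0.1, 0.1, 0.1], true_value=1.0)
print(result)
```

A coverage probability well below the nominal level, together with an average SE smaller than the RMSE, would indicate that the asymptotic standard errors understate the sampling variability at that sample size.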

Based on the simulation results in Table 7, we observe that the RMSEs of the ML estimators decrease as the sample size increases. Table 7 also shows that the coverage probabilities of the confidence intervals are quite close to the nominal level of 95%. Table 8 displays a model comparison using AIC and BIC, together with the percentage of simulations in which each fitted link function attains the lowest AIC and BIC, for sample sizes $n = 150$ and $n = 300$. For the PGcr/GeTNH model with the logit as the true link, Table 8 shows that, for $n = 150$, the log-log link function has the lowest average AIC (best fit in 23.22% of the simulations) and the lowest average BIC (best fit in 23.77% of the simulations). For $n = 300$, the log-log link function has the lowest average AIC (best fit in 17.48% of the simulations), whereas the probit link function has the lowest average BIC (best fit in 18.71% of the simulations).

For the PGEcr/GeTNH model with the logit as the true link, Table 8 shows that, for $n = 150$, the logit link function has the lowest average AIC (best fit in 16.98% of the simulations), whereas the log-log link function has the lowest average BIC (best fit in 17.54% of the simulations). For $n = 300$, the log-log link function has both the lowest average AIC (best fit in 18.03% of the simulations) and the lowest average BIC (best fit in 18.63% of the simulations).

6. Conclusion

In this work, we proposed new cure rate models (the PGcr/GeTNH and PGEcr/GeTNH models) for survival data, assuming that the number of competing causes of the event of interest follows a mixed Poisson distribution. For the time to the event of interest, we adopted the GeTNH distribution, which is more flexible than the traditional models in the literature for estimating the cure rate and the effects of covariates on the cure rate. We used different link functions (logit, probit, log-log, Cauchit, Aranda-Ordaz, skewed probit, and skewed logit) to evaluate the effects of covariates on the cure rate. The ML method was applied to estimate the parameters of the new models. In the simulation study, the log-log link function most often gave the best fit among the different link functions in terms of the information criteria. In the empirical applications to the melanoma and colon cancer data, the PGcr/GeTNH model with the Cauchit link function provided the best fit compared with the traditional cure rate models in the literature.

Data Availability

Previously reported data were used to support this study and are available at https://CRAN.R-project.org/package=timereg and https://cran.r-project.org/web/packages/survival/index.html.

Conflicts of Interest

The authors declare that they have no conflicts of interest.