A Comparison between Cure Model and Recursive Partitioning: A Retrospective Cohort Study of Iranian Female with Breast Cancer
Background. Breast cancer, the most common cause of cancer death among women, has increasing incidence and mortality rates in Iran. Proper modeling correctly identifies the effects of prognostic factors on breast cancer, which may serve as a basis for health care planning. Therefore, this study aimed to apply two recently introduced statistical models and compare them as survival prediction tools for breast cancer patients. Materials and Methods. For this retrospective cohort study, the 18-year follow-up information of 539 breast cancer patients was analyzed by “Parametric Mixture Cure Model” and “Model-Based Recursive Partitioning.” Furthermore, a simulation study was carried out to compare the performance of these models in different situations. Results. “Model-Based Recursive Partitioning” presented a better description of the dataset and provided a fine separation of individuals with different risk levels. Additionally, the results of the simulation study confirmed the superiority of this recursive partitioning for nonlinear model structures. Conclusion. “Model-Based Recursive Partitioning” seems to be a potential instrument for processing complex mixture cure models. Therefore, applying this model is recommended for long-term survival patients.
Breast cancer, which is the second most prevalent cancer among Iranian females, is the most common cause of cancer death among women in the world. The Iran Ministry of Health has reported an age-standardized incidence rate of 33.21 per 100,000 female population. Iranian patients with breast cancer are younger than patients in Western countries; this earlier onset may lead to a heavier burden. Furthermore, earlier detection of breast cancer improves life expectancy, which is further evidence of the need for valid modeling to predict the patients’ hazard precisely. Proper modeling correctly identifies the effects of prognostic factors on breast cancer, which may serve as a basis for health care planning.
Cox Proportional Hazard and Weibull models are the two most widely used techniques to model the survival of breast cancer patients [6–9]. However, given today’s medical progress, there is a high probability of being cured. Because of this, the cure model is becoming a more appropriate method, especially when cure of a disease is a realistic possibility [10, 11].
Just as the mixture cure model probabilistically allocates individuals to either the cured or the uncured group, various statistical learning algorithms divide the population into homogeneous subsets. Citing their higher accuracy and lower error rates, several articles claim that these recently introduced algorithms outperform their traditional counterparts [12–15]. “Model-Based Recursive Partitioning” (MoBRP) is one of the most interpretable members of this family and provides good predictive power in nonlinear regression relationships. This model is a hybrid tree which combines traditional model fitting with the tree machine learning algorithm. Furthermore, MoBRP inherits the benefits of regression trees, such as the ability to detect complex unknown model structures and interactions.
To the best of our knowledge, there is no study modeling the survival time of Iranian breast cancer patients using “Parametric Mixture Cure Models” (PMCM), and, as far as we are aware, the only application of “Model-Based Recursive Partitioning” in survival analysis was made by Zeileis et al. to analyze the German Breast Cancer dataset. The goal of this study is therefore to compare the fit of these two statistical methods on simulated and real breast cancer datasets.
2. Materials and Methods
For this retrospective cohort study, the information of 539 breast cancer patients was obtained. Approximately 37% of the patients died of breast cancer and the remainder were censored. These patients had been referred to the Diagnostic Center of Hamedan Mahdieh Darolaytam during 1995–2013. The study entrance criteria were as follows:
(i) Patients who had undergone lumpectomy, quadrantectomy, simple or total mastectomy, or modified radical mastectomy.
(ii) Female breast cancer patients who underwent chemotherapy and radiotherapy before or after surgery.
The event of interest was death from breast cancer, and survival time was measured in days from the date of diagnosis to the date of death. Additionally, some medical prognostic factors and baseline characteristics were gathered, for example, “Human Epidermal growth factor Receptor 2” (HER2), “Progesterone Receptor Status” (PR), “Estrogen Receptor Status” (ER), and “number of involved lymph nodes.”
2.2. Mixture Cure Model
A basic assumption of almost all survival models is that, after sufficiently long follow-up, every individual in the population will eventually experience the event of interest. In practice, this assumption is violated in some situations. The mixture cure model is a flexible model that overcomes this restrictive assumption by treating a subset of the population as nonsusceptible. Nonsusceptible individuals are cured and will never experience the event of interest. For example, a patient who is cured of breast cancer is nonsusceptible to death from it.
Cured individuals appear as censored observations during the course of follow-up. Empirical evidence for the presence of nonsusceptible individuals is a long, stable plateau, usually containing heavy censoring, at the end of the Kaplan-Meier survival curve [19, 20]. Provided that follow-up is sufficient, the stable level of the survival probability at the right extreme of the Kaplan-Meier curve is a consistent estimator of the proportion of nonsusceptible (cured) patients.
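The plateau idea can be illustrated with a minimal sketch that computes the Kaplan-Meier estimate by hand and reads off its value at the right extreme. The function names and the toy data are hypothetical and are not taken from the study; the sketch also assumes follow-up is long enough for the plateau to be meaningful.

```python
import numpy as np

def kaplan_meier(time, event):
    """Return the distinct event times and the Kaplan-Meier estimate at each."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    event_times = np.unique(time[event])
    surv = []
    s = 1.0
    for t in event_times:
        at_risk = np.sum(time >= t)            # still under observation just before t
        deaths = np.sum((time == t) & event)   # events occurring exactly at t
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return event_times, np.array(surv)

def plateau_cure_fraction(time, event):
    """Kaplan-Meier value after the last event, i.e. the height of the plateau."""
    _, surv = kaplan_meier(time, event)
    return surv[-1] if surv.size else 1.0

# Hypothetical toy data: 1 = death, 0 = censored at the end of follow-up
toy_time = [1, 2, 3, 4, 5, 10, 10, 10, 10, 10]
toy_event = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print(plateau_cure_fraction(toy_time, toy_event))
```

In this toy sample half of the patients are still event-free when follow-up ends, so the curve levels off near one half.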
Let $U$ be the indicator variable that shows the status of being susceptible; $U = 1$ stands for susceptible or uncured patients, while $U = 0$ stands for cured individuals. Therefore, the cure model is defined as follows:
$$S(t \mid x, z) = \pi(z)\, S_u(t \mid x) + 1 - \pi(z),$$
where $S_u(t \mid x) = P(T > t \mid U = 1, x)$ is the conditional survival of susceptible individuals given the vector of covariates $x$; this probability can be modeled by one of the usual survival models such as Weibull, which is the most appropriate in this context [10, 11, 18, 21–25]. The survival function of nonsusceptible individuals is identically one, which is why it does not appear explicitly next to $1 - \pi(z)$ in the formula.
$\pi(z) = P(U = 1 \mid z)$ defines the probability of being susceptible and can be modeled by one of the binary regressions such as logistic, which is more common [11, 18, 23–26]; $z$ is the vector of covariates and may be the same as $x$.
Finally, $S(t \mid x, z)$ has been named marginal survival and shows the survival of the entire population.
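As a rough illustration of this structure, the sketch below evaluates the marginal survival of a Logistic-Weibull mixture cure model for one covariate profile. The function names, the Weibull shape/scale parameterization, and the numbers in the usage lines are assumptions made for illustration; they are not the fitted values of the study.

```python
import numpy as np

def weibull_survival(t, shape, scale):
    """Survival of susceptible individuals: exp(-(t / scale) ** shape)."""
    t = np.asarray(t, dtype=float)
    return np.exp(-(t / scale) ** shape)

def susceptible_probability(z, gamma):
    """Logistic model for the probability of being susceptible.

    z is a covariate vector with a leading 1 for the intercept (assumption).
    """
    return 1.0 / (1.0 + np.exp(-np.dot(z, gamma)))

def marginal_survival(t, pi, shape, scale):
    """Population survival: susceptible part plus the cured fraction (survival 1)."""
    return pi * weibull_survival(t, shape, scale) + (1.0 - pi)

# Hypothetical profile: intercept plus one covariate
pi = susceptible_probability(np.array([1.0, 2.0]), np.array([0.5, 0.3]))
print(marginal_survival([0.0, 5.0, 50.0], pi, shape=1.5, scale=5.0))
```

For long follow-up the susceptible part vanishes and the curve levels off at one minus the susceptible probability, which is the plateau seen in a Kaplan-Meier plot.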
2.3. Model-Based Recursive Partitioning
If a global model fits all observations poorly, the total population can be split so that a proper fit is obtained for each subset; this idea is the main motivation of the MoBRP technique. The partitioning is a tree in which each node is associated with a specific parametric model, and it is carried out so that the model fit within each subset is stable [16, 27]. More precisely, the algorithm for growing the tree is as follows:
(1) Fit a parametric model to the dataset.
(2) Statistically assess the stability of the estimated parameters over a set of partitioning variables.
(3) If there is overall instability in the estimated parameters, split the population along the partitioning variable responsible for the most instability. Split points are chosen so that the residual sum of squares or the negative log-likelihood is minimized.
(4) Repeat the algorithm in each terminal node.
To avoid overfitting, this kind of tree is combined with pre- and postpruning; prepruning is implemented via a Bonferroni-corrected p value for partitioning variable selection, and postpruning can be done via the “Akaike Information Criterion” or “Bayesian Information Criterion”.
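The split-point search in step (3) can be sketched as follows. This is a deliberately simplified illustration and not the MoBRP implementation: it uses a censored exponential model in each node (instead of a Weibull regression) because its maximum likelihood estimate has a closed form, and it skips the parameter-instability test of step (2). All function names and the toy data are hypothetical.

```python
import numpy as np

def exp_negloglik(time, event):
    """Negative log-likelihood of a censored exponential model at its MLE."""
    d = float(np.sum(event))        # number of observed events
    total = float(np.sum(time))     # total follow-up time
    if d == 0:
        return 0.0                  # MLE rate is 0, so the log-likelihood is 0
    rate = d / total
    return -(d * np.log(rate) - rate * total)

def best_split(time, event, z, min_size=2):
    """Cut point of z minimizing the summed negative log-likelihood of both children."""
    time, event, z = map(np.asarray, (time, event, z))
    best_cut, best_nll = None, np.inf
    for cut in np.unique(z)[:-1]:   # the largest value would leave an empty right child
        left = z <= cut
        if left.sum() < min_size or (~left).sum() < min_size:
            continue
        nll = (exp_negloglik(time[left], event[left])
               + exp_negloglik(time[~left], event[~left]))
        if nll < best_nll:
            best_cut, best_nll = cut, nll
    return best_cut, best_nll

# Hypothetical toy data: survival is long for small z and short for large z
toy_time = [10, 12, 11, 9, 1, 2, 1, 2]
toy_event = [1, 1, 1, 1, 1, 1, 1, 1]
toy_z = [1, 2, 3, 4, 5, 6, 7, 8]
print(best_split(toy_time, toy_event, toy_z))
```

On this toy sample the search separates the four long survivors from the four short survivors, which is the kind of risk-group separation the tree is meant to find.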
2.4. Simulation Study
A simulation study was planned in order to compare the performance of PMCM and MoBRP.
Data were generated from a Logistic-Weibull mixture cure model [18, 28]. In agreement with other studies [29–31], standard Normal and Uniform distributions were used for simulation. The covariates were fixed by design: one was generated from the standard Normal distribution and the others were generated from the standard Uniform distribution. The shape parameter was also held fixed.
To discover the trend of goodness of fit, the simulation was replicated 100 times at each of 36 configurations given by three levels of censoring rate (40%, 60%, and 80% of the total population), three levels of cure rate (0%, 15%, and 30%), and two levels of sample size (500 and 1000 observations). Furthermore, to examine a more complicated model structure in an additional scenario, interaction effects of one covariate with two of the others were added to the survival part of PMCM. Finally, to check the results for a different shape parameter, some extra configurations were run for simulated samples of 500 observations with a shape parameter of 0.5.
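A data-generating step of this kind might look like the sketch below. It is an assumption-laden illustration rather than the simulation code of the study: the covariate names, the coefficient values, the log-linear form of the Weibull scale, and the uniform censoring mechanism are all hypothetical, and the sketch does not tune the parameters to hit a target cure or censoring rate.

```python
import numpy as np

def simulate_mixture_cure(n, gamma, beta, shape, censor_max, seed=0):
    """Draw one sample from a Logistic-Weibull mixture cure model.

    gamma: logistic coefficients (intercept, x1, x2, x3) for being susceptible.
    beta:  coefficients of the log Weibull scale (intercept, x1, x2, x3).
    """
    rng = np.random.default_rng(seed)
    x1 = rng.standard_normal(n)                  # standard Normal covariate
    x2 = rng.uniform(size=n)                     # standard Uniform covariates
    x3 = rng.uniform(size=n)
    X = np.column_stack([np.ones(n), x1, x2, x3])

    pi = 1.0 / (1.0 + np.exp(-(X @ gamma)))      # probability of being susceptible
    susceptible = rng.uniform(size=n) < pi

    scale = np.exp(X @ beta)
    latent = scale * rng.weibull(shape, size=n)  # event time if susceptible
    latent[~susceptible] = np.inf                # cured individuals never fail

    censor = rng.uniform(0.0, censor_max, size=n)  # assumed censoring mechanism
    time = np.minimum(latent, censor)
    event = (latent <= censor).astype(int)
    return time, event, susceptible

time, event, susceptible = simulate_mixture_cure(
    500,
    gamma=np.array([1.0, 0.5, -0.5, 0.5]),
    beta=np.array([1.0, 0.2, -0.3, 0.1]),
    shape=1.5,
    censor_max=10.0,
)
print("censoring rate:", 1.0 - event.mean(), "cure rate:", 1.0 - susceptible.mean())
```

Interaction effects could be added to this sketch by appending product columns such as `x1 * x2` to the design matrix used for the Weibull scale.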
2.5. Statistical Methods
A nonparametric test for sufficiency of follow-up [20, 32] and the Kaplan-Meier method were used to check whether follow-up was sufficient and to estimate the fraction of nonsusceptible individuals. Using backward variable selection, the best-fitting PMCM was chosen for a Logistic-Weibull specification. Given the flexibility of the Weibull survival distribution, it was also used for tree node modeling.
Finally, a simulation study was designed to evaluate the performance of the two methods. It should be noted that “AIC” postpruning was applied to the MoBRPs.
3. Results and Discussion
The 5-year survival rate was 68.5%. The median lifetime from the time of diagnosis was 9.02; furthermore, the population mortality rate was 36.73%. The patients’ age at diagnosis ranged from 22 to 79 years, with a mean (SD) of 46.1 (10.8) and a median of 45 years. According to the primary information of the dataset, 256 (47.49%) individuals had two or fewer involved lymph nodes and 329 (61.04%) had a tumor size of less than two centimeters. ER+, PR+, and HER2+ were seen in 41.19%, 32.84%, and 76.44% of patients, respectively.
The nonparametric test rejected the insufficiency of follow-up time and, as can be seen from Figure 1, the Kaplan-Meier curve stabilizes at a probability of almost 0.20; this implies that about 20 percent of the population is cured and nonsusceptible. The plateau in the tail of this curve during the study period is further visual evidence of sufficient follow-up. This plot suggests “8.85 years” as the median survival time for the total population.
Table 1 shows the result of the mixture cure model fitting. According to the estimated parameters of this model, the estimated mean cure rate is about 25% of the total population. The proximity of this rate to the Kaplan-Meier estimate indicates that the crude nonparametric Kaplan-Meier method agrees fairly well with its parametric counterpart.
The estimated parameters in the logistic part of PMCM imply that a unit increase in “tumor size” and in “number of involved lymph nodes” would multiply the odds of being susceptible by 1.5 and 1.1, respectively. The negative and positive estimated parameters for ER+ and PR+, respectively, in the Weibull part of the cure model also confirmed the risk and protective effects of these factors for patients with breast cancer. Based on the model, the estimated median survival time is 6.02 years for uncured patients and 8.03 years for the total population.
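The arithmetic behind these two kinds of summaries can be sketched as follows, assuming a logistic coefficient is turned into an odds ratio by exponentiation and assuming a single covariate profile with a Weibull survival of the form exp(-(t/scale)^shape). The function names and the shape, scale, and susceptible-probability values are hypothetical, not the fitted values in Table 1.

```python
import math

def odds_ratio(coef, delta=1.0):
    """Multiplicative change in the odds for an increase of delta in a covariate."""
    return math.exp(coef * delta)

def weibull_median(shape, scale):
    """Median of the susceptible (uncured) group: solve exp(-(t/scale)**shape) = 0.5."""
    return scale * math.log(2.0) ** (1.0 / shape)

def mixture_median(pi, shape, scale):
    """Median of pi * S_u(t) + (1 - pi); not reached when pi <= 0.5."""
    if pi <= 0.5:
        return math.inf                 # more than half of the population never fails
    target = (pi - 0.5) / pi            # value S_u(t) must reach at the population median
    return scale * (-math.log(target)) ** (1.0 / shape)

# An odds ratio of 1.5 corresponds to a logistic coefficient of log(1.5)
print(odds_ratio(math.log(1.5)))              # one-unit increase
print(odds_ratio(math.log(1.5), delta=2.0))   # two-unit increase
# Hypothetical profile: 75% susceptible, shape 1.5, scale 8
print(weibull_median(1.5, 8.0), mixture_median(0.75, 1.5, 8.0))
```

Because the cured fraction keeps the population curve above the uncured curve, the median for the total population is always at least as long as the median for uncured patients, which matches the ordering of the two medians reported here.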
The fit and estimated parameters of MoBRP are shown in Figure 2. The total population was divided according to two partitioning variables, and three terminal nodes were formed. The censoring rate for the first terminal node was 86%, much higher than the censoring rates of 67% and 57% for the second and third terminal nodes, respectively. The resulting AIC of MoBRP was 3698.7, which was slightly lower than its counterpart for the mixture cure model. This difference indicated the superior performance of MoBRP from the perspective of a full-likelihood-based criterion.
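For reference, the criterion used in this comparison is AIC = 2k − 2 log L, where k is the number of estimated parameters and log L is the maximized log-likelihood. The sketch below uses made-up log-likelihoods and parameter counts purely to show the arithmetic; how many parameters a tree is charged for its split points depends on the software and is treated here as an assumption.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: smaller values indicate a better trade-off."""
    return 2.0 * n_params - 2.0 * log_likelihood

# Hypothetical values, not results from the study
single_model = aic(log_likelihood=-100.0, n_params=5)
tree_model = aic(log_likelihood=-98.0, n_params=8)
print(single_model, tree_model)
```

In this toy comparison the more flexible model fits slightly better but pays more for its extra parameters, so a lower AIC for a tree is informative only because the penalty has already been taken into account.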
For each subset of the population in the terminal tree nodes, a Kaplan-Meier plot is attached in Figure 2. The highest censoring rate was seen for the first terminal node, where the Kaplan-Meier curve stabilized at the high probability of 0.81; patients in this node were also associated with lower levels of the risk factors. The Kaplan-Meier curve for the third terminal node decreased with a steeper slope than the curves for the first and second terminal nodes. The log-rank test reported a significant difference between the survival curves of patients belonging to the first and third terminal nodes (p value = 0.012). All this evidence indicates that the Weibull-regression-based tree could divide the population into three subsets containing low-risk, moderate-risk, and high-risk patients; additionally, the low-risk terminal node with heavy censoring could be considered the node with the most cured individuals.
Tables 2–4 present the results of PMCM and MoBRP fitting for the simulated data. The simulation study showed that an increase in cure rate increased the “AIC”, whereas smaller AICs resulted from higher censoring rates. As in Weibull modelling, “AIC” values decreased as the number of observations decreased. The comparison of simulation results for models with and without interactions indicated that whether PMCM or MoBRP performs better depends on the operating conditions. A smaller “AIC” is expected for PMCM when the model structure is simple and completely known; in practice, this condition is rare in medical modeling. On the other hand, as MoBRP is capable of detecting unknown covariate relationships and interactions, it would be preferred when high-order factor effects or complex structures may exist. Finally, comparison of tables with different shape parameters indicated that the mentioned trend of “AIC” with changes in cure and censoring rates was the same for different shapes.
Although PMCM has been used previously to model Iranian patients’ survival [24, 25], this study is the first report to apply this technique to the survival of Iranian women with breast cancer. Perhaps the most similar studies are those of Jafari-Koshki et al. and Rahimzadeh et al. [33, 34], in which a Bayesian nonmixture cure model was applied to the survival of breast cancer patients. In agreement with our investigation, the study of Jafari-Koshki et al. also determined that tumor size, number of involved nodes, and ER+ have detrimental effects on life expectancy. Furthermore, the absence of PR expression is associated with breast cancer progression. The harmful and beneficial effects of the above-mentioned factors have also been verified by PMCM in many studies from parts of the world other than Iran. Rondeau et al. used a parametric mixture cure frailty model for the survival of breast cancer patients from south-western France. As in our investigation, tumor size and number of involved nodes were the significant factors for the logistic part; the protective effect of PR+ was also confirmed by their study. Forse et al. used a Weibull-Logistic PMCM and showed a significant effect of tumor size in increasing the probability of breast cancer recurrence. According to their study, ER+ and HER2 were not influential in modeling the recurrence time among susceptible patients. Faradmal et al. reached similar results using the same dataset analyzed here. A time-dependent Cox model was used in their analysis, and the use of the same dataset may be the main reason for the similarity of the factor effects. All these convergent results provide an almost complete guideline for clinicians in assessing disease progression. Prescribing an efficient treatment depends on a timely and accurate diagnosis, and we have introduced PMCM in practice to support better diagnostics.
Similar to the mixture cure model, which is composed of two parts, one for the probability of being susceptible and the other for modeling the survival of uncured individuals, MoBRP is a technique that simultaneously performs classification and survival time modeling. The only use of MoBRP in survival analysis is due to Zeileis et al., who applied this technique to model the survival of 686 German patients with breast cancer. Eight covariates were used as prognostic factors; two of them were selected for node modeling and the remaining six were considered as partitioning variables. MoBRP resulted in a tree with two terminal nodes, formed by a split on PR.
In addition to the lower AIC of MoBRP, the split points it selected for the partitioning variables are another strength. Tumor size is partitioned at 1.8 centimeters, which is close to the 2 cm cut point used empirically; the efficacy of this cut point has been assessed by many clinical investigations [11, 23, 35–40].
Although MoBRP has not been designed to account for a cure fraction, this study supports its capability to provide a fine separation of individuals with different risk levels, especially in nonlinear associations. Therefore, MoBRP seems to be a potential instrument for processing complex mixture cure models, and applying this model is recommended for long-term survival patients.
The authors declare that there are no competing interests regarding the publication of this paper.
This investigation is the consequence of the author’s constructive suggestions and is funded by the Vice Chancellor for Research and Technology of Hamadan University of Medical Sciences as a part of a Ph.D. thesis. The authors are grateful to Professor Achim Zeileis for his prompt guidance and assistance. Finally, the authors acknowledge the Diagnostic Center of Hamedan Mahdieh Darolaytam, which generously provided the dataset.
S. M. Mousavi, M. A. Mohaghegghi, A. Mousavi-Jerrahi, A. Nahvijou, and Z. Seddighi, “Burden of breast cancer in Iran: a study of the Tehran population based cancer registry,” Asian Pacific Journal of Cancer Prevention, vol. 7, no. 4, pp. 571–574, 2006.
M. Hajihosseini, J. Faradmal, and A. Sadighi-Pashaki, “Survival analysis of breast cancer patients after surgery with an intermediate event: application of illness-death model,” Iranian Journal of Public Health, vol. 44, no. 12, pp. 1677–1684, 2015.
M. Enayatrad, N. Amoori, and H. Salehiniya, “Epidemiology and trends in breast cancer mortality in iran,” Iranian Journal of Public Health, vol. 44, no. 3, pp. 430–431, 2015.
M. Ghaneie, A. Rezaie, N. R. Ghorbani, R. N. Heidari, M. Arjomandi, and M. Zare, “Designing a minimum data set for breast cancer: a starting point for breast cancer registration in Iran,” Iranian Journal of Public Health, vol. 42, no. 1, pp. 66–73, 2013.
I. Z. binti Zulkifli, H. binti Haron, H. A. binti Abd Rahman, M. S. binti Sarbandi, and N. N. binti Abdullah, “Identifying prognostic factors of breast cancer: comparison between cox proportional Hazard and Weibull model,” in Proceedings of the International Symposium on Mathematical Sciences and Computing Research (iSMSC '13), vol. 52, pp. 142–146, Ipoh, Malaysia, December 2013.
R. A. M. Al-Naggar, Z. M. Isa, S. A. Shah et al., “Eight year survival among breast cancer Malaysian women from University Kebangsaan Malaysia Medical Centre,” Asian Pacific Journal of Cancer Prevention, vol. 10, no. 6, pp. 1075–1078, 2009.
N. Fouladi, F. Amani, A. S. Harghi, and N. Nayebyazdi, “Five year survival of women with breast cancer in Ardabil, north-west of Iran,” Asian Pacific Journal of Cancer Prevention, vol. 12, no. 7, pp. 1799–1801, 2011.
M. R. Baneshi and A. Talei, “Assessment of internal validity of prognostic models through bootstrapping and multiple imputation of missing data,” Iranian Journal of Public Health, vol. 41, no. 5, pp. 110–115, 2012.
V. Rondeau, E. Schaffner, F. Corbière, J. R. Gonzalez, and S. Mathoulin-Pélissier, “Cure frailty models for survival data: application to recurrences for breast cancer and to hospital readmissions for colorectal cancer,” Statistical Methods in Medical Research, vol. 22, no. 3, pp. 243–260, 2013.
N. Süt and M. Şenocak, “Assessment of the performances of multilayer perceptron neural networks in comparison with recurrent neural networks and two statistical methods for diagnosing coronary artery disease,” Expert Systems, vol. 24, no. 3, pp. 131–142, 2007.
J. Nilsson, M. Ohlsson, L. Thulin, P. Höglund, S. A. M. Nashef, and J. Brandt, “Risk factor identification and mortality prediction in cardiac surgery using artificial neural networks,” The Journal of Thoracic and Cardiovascular Surgery, vol. 132, no. 1, pp. 12–19.e1, 2006.
H.-Y. Shi, S.-L. Hwang, K.-T. Lee, and C.-L. Lin, “In-hospital mortality after traumatic brain injury surgery: a nationwide population-based comparison of mortality predictors used in artificial neural network and logistic regression models: clinical article,” Journal of Neurosurgery, vol. 118, no. 4, pp. 746–752, 2013.
A. Zeileis, T. Hothorn, and K. Hornik, “Evaluating model-based trees in practice research report series,” Tech. Rep. 32, Vienna University of Economics and Business, Department of Statistics and Mathematics, Vienna, Austria, 2006.
L. López Segovia, Survival data analysis with heavy-censoring and long-term survivors [Ph.D. thesis], Polytechnic University of Catalonia, Barcelona, Spain, 2014.
R. A. Maller and X. Zhou, Survival Analysis with Long-Term Survivors, Wiley Series in Probability and Statistics: Applied Probability and Statistics, John Wiley & Sons, New York, NY, USA, 1996.
C. L. Forse, Y. E. Yilmaz, D. Pinnaduwage et al., “Elevated expression of podocalyxin is associated with lymphatic invasion, basal-like phenotype, and clinical outcome in axillary lymph node-negative breast cancer,” Breast Cancer Research and Treatment, vol. 137, no. 3, pp. 709–719, 2013.
M. Rasouli, M. R. Ghadimi, M. Mahmoodi, K. Mohammad, H. Zeraati, and M. Hosseini, “Survival analysis of patients with esophageal cancer using parametric cure model,” Asian Pacific Journal of Cancer Prevention, vol. 12, no. 9, pp. 2359–2363, 2011.
A. A. Akhlaghi, I. Najafi, M. Mahmoodi, A. Shojaee, M. Yousefifard, and M. Hosseini, “Survival analysis of Iranian patients undergoing continuous ambulatory peritoneal dialysis using cure model,” Journal of Research in Health Sciences, vol. 13, no. 1, pp. 32–36, 2013.
A. Zeileis, T. Hothorn, and K. Hornik, Party with the Mob: Model-Based Recursive Partitioning in R, vol. 20, Institute for Statistics and Mathematics WU Wirtschaftsuniversität Wien, Wien, Austria, 2012.
J. P. Klein and M. L. Moeschberger, Survival Analysis: Techniques for Censored and Truncated Data, Springer, New York, NY, USA, 2003.
K. Seppä, T. Hakulinen, and E. Läärä, “Regional variation in relative survival—quantifying the effects of the competing risks of death by using a cure fraction model with random effects,” Journal of the Royal Statistical Society: Series C (Applied Statistics), vol. 63, no. 1, pp. 175–190, 2014.
D. Moriña and A. Navarro, “The R package survsim for the simulation of simple and complex survival data,” Journal of Statistical Software, vol. 59, no. 2, pp. 1–21, 2014.
J. Faradmal, M. Mafi, A. Sadighi-Pashaki, M. Karami, and G. Roshanaei, “Factors affecting survival in breast cancer patients referred to the Darol Aitam-e Mahdieh center,” The Scientific Journal of Zanjan University of Medical Sciences, vol. 22, no. 93, pp. 105–115, 2014.
J. Faradmal, A. Talebi, A. Rezaianzadeh, and H. Mahjub, “Survival analysis of breast cancer patients using cox and frailty models,” Journal of Research in Health Sciences, vol. 12, no. 2, pp. 127–130, 2012.
T. Fusun, A. Semsi, U. Cem et al., “Association of HER-2/neu overexpression with the number of involved axillary lymph nodes in hormone receptor positive breast cancer patients,” Experimental Oncology, vol. 27, no. 2, pp. 145–149, 2005.
J. W. Jakub, K. Bryant, M. Huebner et al., “The number of axillary lymph nodes involved with metastatic breast cancer does not affect outcome as long as all disease is confined to the sentinel lymph nodes,” Annals of Surgical Oncology, vol. 18, no. 1, pp. 86–93, 2011.