Research Article  Open Access
Yifei Chen, Zhenyu Jia, Dan Mercola, Xiaohui Xie, "A Gradient Boosting Algorithm for Survival Analysis via Direct Optimization of Concordance Index", Computational and Mathematical Methods in Medicine, vol. 2013, Article ID 873595, 8 pages, 2013. https://doi.org/10.1155/2013/873595
A Gradient Boosting Algorithm for Survival Analysis via Direct Optimization of Concordance Index
Abstract
Survival analysis focuses on modeling and predicting the time to an event of interest. Many statistical models have been proposed for survival analysis. They often impose strong assumptions on hazard functions, which describe how the risk of an event changes over time depending on covariates associated with each individual. In particular, the prevalent proportional hazards model assumes that covariates are multiplicatively related to the hazard. Here we propose a nonparametric model for survival analysis that does not explicitly assume particular forms of hazard functions. Our nonparametric model utilizes an ensemble of regression trees to determine how the hazard function varies according to the associated covariates. The ensemble model is trained using a gradient boosting method to optimize a smoothed approximation of the concordance index, which is one of the most widely used metrics in survival model performance evaluation. We implemented our model in a software package called GBMCI (gradient boosting machine for concordance index) and benchmarked the performance of our model against other popular survival models with a large-scale breast cancer prognosis dataset. Our experiment shows that GBMCI consistently outperforms other methods based on a number of covariate settings. GBMCI is implemented in R and is freely available online.
1. Introduction
Survival analysis focuses on developing diagnostic and prognostic models to analyze the effect of covariates on the outcome of an event of interest, such as death or disease recurrence in disease studies. The analysis is often carried out using regression methods to estimate the relationship between the covariates and the time-to-event variable. In clinical trials, times to events are usually represented by survival times, which measure how long a patient with a localized disease is alive or disease-free after treatment, such as surgery or surgery plus adjuvant therapy. The covariates used in predicting survival times often include clinical features, such as age, disease status, and treatment type. More recently, molecular features, such as expression of genes, and genetic features, such as mutations in genes, are increasingly being included in the set of covariates. Survival analysis also has applications in many other fields. For instance, it is often used to model machine failure in mechanical systems. Depending on specific circumstances, survival times may also be referred to as failure times.
A major complication for survival analysis is that the survival data are often incomplete due to censoring, because of which standard statistical and machine learning tools on regression cannot be readily applied. The most common type of censoring occurring in clinical trials is right censoring, where the survival time is known to be longer than a certain value but its precise value is unknown. This can be due to multiple reasons. For instance, a patient might withdraw from a clinical trial or a clinical trial might end early such that some patients are not followed up with afterwards.
Many statistical methods have been developed for survival analysis. One major category of these methods adopts a likelihood-based approach. An essential component of the models in this category is the estimation of the hazard function \(h(t)\), defined as the event rate at time \(t\) conditional on survival up to time \(t\). Different models often impose different assumptions on the form of the hazard function. In particular, the proportional hazards (PH) model (also called the Cox model), one of the most prevalent models in survival analysis, assumes that different covariates contribute multiplicatively to the hazard function [1–4]. To relax the proportional hazards assumption and allow for more complicated relationships between covariates, parametric models based on artificial neural networks (ANN) [5–8] and ensembles of tree models based on boosting [9–12] have also been proposed. In order to handle the censored data, all these models use an approximation of the likelihood function, called the Cox partial likelihood, to train the predictive model. The partial likelihood function is computationally convenient to use; however, it is unclear how well the full likelihood can be approximated by the partial likelihood.
Many other methods aiming at optimizing a different class of objective functions rather than the partial likelihood have also been proposed. Some of these methods adapt existing regression models to estimate the relationship between survival times and covariates, by taking the censored data into account in training the models [13, 14], while others adopt a classification-based framework and train their models using only the rank information associated with the observed survival times [8, 15, 16]. Recently, random survival forests [17, 18], a new ensemble-of-trees model based upon bagging, became popular in survival analysis. They resort to predicting either the cumulative hazard function or the log-transformed survival time.
In clinical decision making, physicians and researchers are often more interested in evaluating the relative risk of a disease between patients with different covariates than the absolute survival times of these patients. For this purpose, Harrell et al. introduced the important concept of concordance index (C-index, concordance C, or simply CI) as a measure of the separation between two survival distributions [19, 20]. Given two survival distributions, the C-index computes the fraction of pairs of patients with consistent risk orders over the total number of validly comparable pairs. Because of its focus on assessing the accuracy of relative risk, the C-index is widely adopted in survival model performance evaluation, where the order of predicted survival times is compared to the order of the observed ones [21–23].
Our goal in this paper is to develop a new survival model to capture the relationship between survival times and covariates by directly optimizing the C-index between the predicted and observed survival times. Although both the Cox model based on partial likelihood and the rank-based methods mentioned above also utilize only the order information between survival times, the C-index based method provides a more principled way of combining all pairwise order information into a single metric. There have been prior attempts at directly learning the C-index for survival analysis, including a neural network based model [21] and an extension of the Cox model trained using a lower bound of the C-index [22]. However, both methods impose parametric assumptions on the effect of covariates on survival times. Our contribution here is to adopt a nonparametric approach to model the relationship between survival times and covariates by using an ensemble of trees and to train the ensemble model by learning the C-index.
In the following, we provide a detailed description of our ensemble survival model based on learning the C-index. We derive an algorithm to train the model using the gradient boosting method originally proposed by Friedman [9]. The algorithm is implemented in an R software package called GBMCI (gradient boosting machine for concordance index), which is freely available at https://github.com/ucicbcl/GBMCI. We benchmark the performance of GBMCI using a large-scale breast cancer prognosis dataset and show that GBMCI outperforms several popular survival models, including the Cox PH model, the gradient boosting PH model, and the random survival forest, in a number of covariate settings.
2. Materials and Methods
2.1. Survival Analysis
We review the basic concepts of survival analysis here. For a systematic treatment, see [24, 25]. In survival analysis, the time to event (death, failure, etc.) is typically modeled as a random variable \(T\), which follows some probability density \(f(t)\). The density can be characterized by the survival function \(S(t) = \Pr(T > t) = \int_t^{\infty} f(u)\,du\) for \(t \ge 0\). The survival function captures the probability that the event does not happen until time \(t\). A closely related concept is the hazard function \(h(t) = f(t)/S(t)\), which measures the event rate at time \(t\) conditioned on survival until \(t\). One can further show that \(S(t) = \exp\left(-\int_0^t h(u)\,du\right)\).
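The identity between the survival and hazard functions can be checked numerically. The following Python sketch (illustrative only; the paper's software is in R) integrates a constant hazard \(h(t) = \lambda\) and recovers the exponential survival function \(S(t) = e^{-\lambda t}\):

```python
import math

def survival_from_hazard(hazard, t, steps=100000):
    """Recover S(t) = exp(-H(t)) by numerically integrating the
    cumulative hazard H(t) = integral of hazard(u) du from 0 to t."""
    dt = t / steps
    cumulative = sum(hazard(i * dt) * dt for i in range(steps))
    return math.exp(-cumulative)

# Constant hazard lam gives the exponential survival function.
lam, t = 0.5, 2.0
approx = survival_from_hazard(lambda u: lam, t)
exact = math.exp(-lam * t)
```

For non-constant hazards the same quadrature applies; only the closed-form comparison changes.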
The likelihood function for right-censored survival data is expressed as

\[ L = \prod_{i \in E} f(t_i \mid x_i, \beta) \prod_{i \in C} S(t_i \mid x_i, \beta). \quad (1) \]

Note the augmentation of our notation (we will follow this convention in the following context unless otherwise stated): \(\beta\) is the set of regression parameters of the survival/hazard model; \(\delta_i\), \(i = 1, \ldots, n\), indicates whether the event happens (\(\delta_i = 1\)) or not (\(\delta_i = 0\), i.e., the data is censored); \(x_i\), \(i = 1, \ldots, n\), are the explanatory covariates that affect the survival time; \(E = \{i : \delta_i = 1\}\) is the set of data whose events are observed; and \(C = \{i : \delta_i = 0\}\) is the set of censored data. The full maximum-likelihood approach would optimize (1) over the functional space of \(f\) (or \(h\)) and the parameter space of \(\beta\). Unfortunately, this is often intractable.
2.1.1. Proportional Hazard Model
In his seminal work [1, 2], Cox introduced the proportional hazard (PH) model \(h(t \mid x) = h_0(t) \exp(\beta^T x)\), where \(h_0(t)\) is the baseline hazard function and \(\exp(\beta^T x)\) is the relative hazard, which summarizes the effect of the covariates. Cox observed that under the PH assumption, it suffices to estimate \(\beta\) without the necessity of specifying \(h_0(t)\) and optimizing the likelihood (1). Instead, he proposed to optimize the so-called Cox partial likelihood

\[ L_p(\beta) = \prod_{i \in E} \frac{\exp(\beta^T x_i)}{\sum_{j : t_j \ge t_i} \exp(\beta^T x_j)}. \]

The Cox model has become very popular in evaluating the covariates’ effect on survival data and has been generalized to handle time-varying covariates and time-varying coefficients [3, 4]. However, the proportional hazards assumption and the maximization of the partial likelihood remain its two main limitations. Nonlinear models, for example, multilayer neural networks [5–7], have been proposed to replace the linear relative hazard \(\beta^T x\). However, they still assume parametric forms of the hazard function and attempt to optimize the partial likelihood.
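The partial likelihood above can be made concrete with a toy computation. The sketch below (an illustrative Python re-implementation with a single scalar covariate, not the paper's code) accumulates, over each observed event, the relative hazard divided by the sum over its risk set:

```python
import math

def cox_partial_likelihood(times, events, x, beta):
    """Cox partial likelihood for scalar covariates x with a linear
    relative hazard exp(beta * x).  Each observed event i contributes
    exp(beta*x_i) divided by the sum over the risk set {j : t_j >= t_i}."""
    pl = 1.0
    for i in range(len(times)):
        if events[i] == 1:  # censored observations contribute no factor
            risk_set = sum(math.exp(beta * x[j])
                           for j in range(len(times)) if times[j] >= times[i])
            pl *= math.exp(beta * x[i]) / risk_set
    return pl
```

With \(\beta = 0\) every subject has equal hazard, so each event simply contributes one over the size of its risk set; this gives a quick sanity check of the implementation.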
2.1.2. Concordance Index
The C-index is a commonly used performance measure of survival models. Intuitively, it is the fraction of all pairs of patients whose predictions have correct orders over the pairs that can be ordered. Formally, the C-index is

\[ \mathrm{CI} = \frac{1}{|\mathcal{P}|} \sum_{(i,j) \in \mathcal{P}} \mathbf{1}(\hat{y}_i < \hat{y}_j), \]

where \(\mathcal{P}\) is the set of validly orderable pairs \((i, j)\), that is, pairs with \(t_i < t_j\) and \(\delta_i = 1\); \(|\mathcal{P}|\) is the number of pairs in \(\mathcal{P}\); \(\hat{y}_i\) is the prediction of survival time; and \(\mathbf{1}(\cdot)\) is the indicator function of whether the condition in parentheses is satisfied or not. In the PH setting, the predicted survival time can be equivalently represented by the negative log relative hazard. The C-index estimates the probability that the order of the predictions of a pair of comparable patients is consistent with their observed survival information.
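The definition above translates directly into code. The following Python sketch (illustrative; ties in times or predictions are not specially handled here) counts orderable pairs and the concordant fraction among them:

```python
def concordance_index(times, events, predictions):
    """Fraction of validly orderable patient pairs whose predicted
    survival times agree in order with the observed ones.  A pair
    (i, j) is orderable when t_i < t_j and patient i's event was
    observed (delta_i = 1); a censoring before the other's event
    carries no usable order information."""
    concordant, comparable = 0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if i != j and times[i] < times[j] and events[i] == 1:
                comparable += 1
                if predictions[i] < predictions[j]:
                    concordant += 1
    return concordant / comparable

# Three patients: the third is censored, so only pairs led by an
# observed event count toward the denominator.
ci = concordance_index([1, 2, 3], [1, 1, 0], [1.5, 1.0, 4.0])
```

Here three pairs are orderable and two are concordant, so the C-index is 2/3.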
2.2. Gradient Boosting Machine
The gradient boosting machine (GBM) is an ensemble learning method, which constructs a predictive model by additive expansion of sequentially fitted weak learners [9, 10]. The general problem is to learn a functional mapping \(F(x; \Theta)\) from data \(\{(y_i, x_i)\}\), where \(\Theta\) is the set of parameters of \(F\), such that some cost function \(\sum_i C(y_i, F(x_i; \Theta))\) is minimized. Boosting assumes \(F\) follows an “additive” expansion form \(F(x) = \sum_{m=0}^{M} \beta_m f(x; \theta_m)\), where \(f(x; \theta_m)\) is called the weak or base learner with a weight \(\beta_m\) and a parameter set \(\theta_m\). Accordingly, \(\{\beta_m, \theta_m\}_{m=0}^{M}\) compose the whole parameter set \(\Theta\). They are learnt in a greedy “stagewise” process: set an initial estimator \(F_0(x)\); for each iteration \(m = 1, \ldots, M\), solve

\[ (\beta_m, \theta_m) = \arg\min_{\beta, \theta} \sum_i C\big(y_i, F_{m-1}(x_i) + \beta f(x_i; \theta)\big). \]

GBM approximates this stagewise problem with two steps. First, it fits \(\theta_m\) by least squares,

\[ \theta_m = \arg\min_{\theta} \sum_i \big(g_{im} - f(x_i; \theta)\big)^2, \]

where \(g_{im} = -\big[\partial C(y_i, F(x_i)) / \partial F(x_i)\big]_{F = F_{m-1}}\) is the negative gradient of the cost function. Second, it learns \(\beta_m\) by the line search

\[ \beta_m = \arg\min_{\beta} \sum_i C\big(y_i, F_{m-1}(x_i) + \beta f(x_i; \theta_m)\big). \]

Then, it updates \(F_m(x) = F_{m-1}(x) + \beta_m f(x; \theta_m)\). In practice, however, shrinkage is often introduced to control overfitting, and the update becomes \(F_m(x) = F_{m-1}(x) + \nu \beta_m f(x; \theta_m)\), where \(0 < \nu \le 1\). If the weak learner is the regression tree, the complexity of \(f\) is determined by tree parameters, for example, the tree size (or depth) and the minimum number of samples in terminal nodes. Besides using proper shrinkage and tree parameters, one could improve the GBM performance by subsampling, that is, fitting each base learner on a random subset of the training data. This method is called stochastic gradient boosting [10].
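The stagewise procedure can be sketched in a few lines. The minimal Python example below (illustrative only; the paper relies on the regression-tree engine of the R “gbm” package) uses decision stumps as weak learners and squared-error loss, for which the negative gradient is simply the residual \(y - F(x)\), with a fixed shrinkage in place of the line search:

```python
def fit_stump(x, r):
    """Least-squares decision stump on 1-D inputs: choose the split
    minimizing the squared error of the two leaf means."""
    best = None
    for s in sorted(set(x)):
        left = [r[i] for i in range(len(x)) if x[i] <= s]
        right = [r[i] for i in range(len(x)) if x[i] > s]
        if not left or not right:
            continue
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        err = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if best is None or err < best[0]:
            best = (err, s, ml, mr)
    _, s, ml, mr = best
    return lambda z: ml if z <= s else mr

def gbm_predict(f0, learners, shrinkage, xi):
    return f0 + shrinkage * sum(h(xi) for h in learners)

def gbm_fit(x, y, n_rounds=50, shrinkage=0.3):
    f0 = sum(y) / len(y)            # initial constant estimator F_0
    learners = []
    for _ in range(n_rounds):
        pred = [gbm_predict(f0, learners, shrinkage, xi) for xi in x]
        # negative gradient of squared error = ordinary residual
        residuals = [y[i] - pred[i] for i in range(len(y))]
        learners.append(fit_stump(x, residuals))
    return f0, learners
```

On a simple step-function target the boosted stumps converge to the training labels, which illustrates the residual-fitting mechanics.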
Compared to parametric models such as generalized linear models (GLM) [26] and neural networks, GBM does not assume any functional form of \(F(x)\) but uses additive expansion to build up the model. This nonparametric approach gives more freedom to researchers. GBM combines predictions from an ensemble of weak learners and so tends to yield more robust results than a single learner. Empirically, it also works better than the bagging-based random forests [27], probably owing to its motivation as optimization in function space. However, it requires the cost function to be differentiable with respect to \(F(x)\). GBM has been implemented in the popular open-source R package “gbm” [12], which supports several regression models.
2.2.1. Boosting the Proportional Hazard Model
Ridgeway [11] adapted GBM for the Cox model. The cost function is the negative log partial likelihood

\[ C = -\sum_{i \in E} \left( F(x_i) - \log \sum_{j : t_j \ge t_i} \exp(F(x_j)) \right), \]

where \(F(x_i)\) plays the role of the log relative hazard. One can then apply the gradient-fitting and line-search steps above to learn each additive model. In the “gbm” package, this cost function corresponds to the “coxph” distribution and is further optimized by refitting terminal nodes with Newton’s method. We denote this particular GBM algorithm as GBMCOX and its implementation in the “gbm” package as “gbmcox.”
2.3. Concordance Index Learning via Gradient Boosting
We now propose a gradient boosting algorithm to learn the C-index. As the C-index is a widely used metric for evaluating survival models, previous works [21, 22] have investigated the possibility of optimizing it instead of Cox’s partial likelihood. However, these works are limited to parametric models, such as linear models or neural networks. Our key contribution is to tackle the problem from a nonparametric ensemble perspective based on gradient boosting.
Optimizing the C-index directly is difficult because of its discrete nature, that is, the summation over indicator functions in its definition. We resort to the differentiable approximation proposed in [21], which adopts the logistic sigmoid function in each term. We call it the smoothed concordance index (SCI). Specifically,

\[ \mathrm{SCI} = \frac{1}{|\mathcal{P}|} \sum_{(i,j) \in \mathcal{P}} s_{ij}, \qquad s_{ij} = \frac{1}{1 + \exp\!\big(-\alpha(\hat{y}_j - \hat{y}_i)\big)}, \]

where \(\alpha\) is a hyperparameter that controls the steepness of the sigmoid function (and, accordingly, how closely SCI approximates CI) and \(\hat{y}_i = F(x_i)\) is the prediction of survival time. Let the cost function be \(C = -\mathrm{SCI}\). Then, at each iteration \(m\) of gradient boosting, the negative gradient with respect to \(\hat{y}_k\) is

\[ g_{km} = \frac{\partial\, \mathrm{SCI}}{\partial \hat{y}_k} = \frac{\alpha}{|\mathcal{P}|} \left( \sum_{(i,k) \in \mathcal{P}} s_{ik}(1 - s_{ik}) - \sum_{(k,j) \in \mathcal{P}} s_{kj}(1 - s_{kj}) \right). \]

So the base learner can be fitted using this gradient and the least-squares step of GBM. Next, \(\beta_m\) is obtained by the line search

\[ \beta_m = \arg\max_{\beta}\, \mathrm{SCI}\big(F_{m-1}(x) + \beta f(x; \theta_m)\big). \]

Although differentiable, SCI has a complicated error surface and is neither convex nor concave. This brings two problems. First, the algorithm’s performance depends on its initialization, which may lead to different local optima; second, it is difficult to find the global solution of the line search. In our implementation, we set the initial estimation to the prediction from a fitted PH model and use line search to detect \(\beta_m\) locally. Empirically, we have found that these heuristics work well for the algorithm.
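The smoothing step is easy to verify numerically: as the steepness \(\alpha\) grows, the sigmoid terms approach the indicator functions and SCI approaches the exact C-index. The sketch below is an illustrative Python re-implementation of the SCI objective, not the package code:

```python
import math

def sigmoid(z, alpha=1.0):
    """Logistic sigmoid with steepness alpha."""
    return 1.0 / (1.0 + math.exp(-alpha * z))

def orderable_pairs(times, events):
    """Pairs (i, j) with t_i < t_j and patient i's event observed."""
    n = len(times)
    return [(i, j) for i in range(n) for j in range(n)
            if i != j and times[i] < times[j] and events[i] == 1]

def smoothed_ci(times, events, preds, alpha=1.0):
    """Smoothed concordance index: each indicator 1(pred_i < pred_j)
    is replaced by a sigmoid of the prediction difference."""
    pairs = orderable_pairs(times, events)
    return sum(sigmoid(preds[j] - preds[i], alpha) for i, j in pairs) / len(pairs)
```

With a sharp sigmoid (large \(\alpha\)) the smoothed value matches the exact C-index of the same toy data; with \(\alpha = 1\) it is a softened score strictly between 0 and 1.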
Algorithm 1, named GBMCI, summarizes our whole algorithm, which also incorporates the stochastic boosting mechanism [10]. Note that the ensemble size \(M\) is an important parameter that requires tuning, as a small \(M\) may not capture the true model, while a large \(M\) makes the algorithm apt to overfit. In practice, it is often selected by cross validation. We implement GBMCI in the “gbm” package, under a new distribution called “sci,” which shares the same regression tree engine and overall software architecture as “gbmcox.” We name our implementation of GBMCI “gbmsci.”
3. Results
3.1. Dataset and Feature Extraction
We illustrate the utility of GBMCI on a large breast cancer dataset, which was originally released by Curtis et al. [28]. The dataset was adopted by the Sage Dream Breast Cancer Challenge (BCC) [29], where it was named Metabric. It contains gene expressions, copy number variations, clinical information, and survival data of 1,981 breast cancer patients. The gene expression data consist of 49,576 microarray probes; the copy number data consist of 18,538 SNP probes; the clinical data contain 25 clinical covariates; the survival data contain the survival time and status (dead or censored). Following the convention of BCC, we reserve 1001 patients for training and the other 980 for testing. We applied several successful feature selection schemes from the top competitors in BCC. See Table 1 for details on how these features were generated.

3.2. Experimental Settings
As a boosting model, GBMCI’s main competitor is the boosted proportional hazard model GBMCOX. As the two share an identical software environment with a common regression tree engine, the comparison should be reliable and reasonable. For baseline evaluation, we investigate the performance of the PH model with a stepwise Akaike information criterion (AIC) model selection scheme (denoted “cox”). In addition, we also consider the popular random survival forest (RSF) approach by Ishwaran et al. [18], which is implemented in the R package randomSurvivalForest [30] (denoted “rsf”). We use the concordance index as the evaluation criterion. All experiments are performed in the R 2.15.1 software environment.
For “gbmsci,” the hyperparameter \(\alpha\) controls how well SCI approximates CI. Large values make the approximation good, but the gradient can be very large or even ill defined, and vice versa. In practice, we find that a value of \(\alpha\) near 1 strikes a good balance between approximability and numerical stability. The line search is performed over a small range along the gradient direction. The shrinkage in “gbm” is 0.001 by default; in our experiments, we tune it for “gbmcox.” We do not apply explicit shrinkage for “gbmsci,” because the small line-search range does not necessarily detect the globally optimal \(\beta_m\) and thus implicitly contributes to shrinkage; this is mainly for computational efficiency purposes. “gbmsci” and “gbmcox” share the other important parameter configurations: the maximum number of trees is 1500 (the actual number is automatically tuned by 5-fold cross validation); the tree depth is 6; the subsampling fraction (see Algorithm 1) is 1 or 0.5. For “rsf,” the number of trees is 1500; other parameters use the default configurations.
3.3. Empirical Comparison
Each method is tested using the five feature representations in Table 1. For “gbmsci” and “gbmcox,” as cross validation introduces randomness by partitioning the training data, we repeat the experiment 50 times. Their predictive concordance indices are shown in Figures 1 and S1 (see Figure S1 in the Supplementary Material available online at http://dx.doi.org/10.1155/2013/873595). For “cox,” the predictive concordance indices are shown in Table 2, which also summarizes the performances of “gbmsci” and “gbmcox.” For “rsf,” we also run 50 random tests because of the bootstrapping used when growing trees. The predictive concordance indices are shown in Figure S2.

Figures 1 and S1 show that “gbmsci” fairly consistently outperforms “gbmcox.” The advantage is notable when using the features of cl, clge, ge, and mt (without subsampling) and substantial when using mi. “gbmsci” performs slightly worse only when using mt (with subsampling) but is still comparable. Furthermore, all differences except mt (with subsampling) are statistically significant (Student’s t-test). We also note that subsampling generally improves the predictive power of both “gbmsci” and “gbmcox,” except when using cl. This is consistent with the theoretical argument of [10, 11].
From Table 2, one can see that “gbmsci” performs better than “cox” overall. The advantage is notable when using cl (without subsampling) and substantial when using clge, mt, and mi. For the other cases, “gbmsci” and “cox” are comparable. On the other hand, “gbmcox” performs better than or comparably to “cox” for cl, clge, and mt but does slightly worse for ge and mi. Comparing Figures 1 and S1 with Figure S2, one can see that “gbmsci” outperforms “rsf” in most cases, while “gbmcox” also performs better than “rsf” overall.
To summarize the comparative study, GBMCI outperforms GBMCOX, Cox PH, and RSF in most of the feature-subsampling settings. The results also shed light on the importance of feature representation. First, gene expression data may have potential prognostic power given well-designed feature extraction schemes, for example, the Attractor Metagene (mt). Second, combining clinical and gene features seems to provide enhanced prognostic power over using them separately. This is the case in both the original gene space (clge) and the transformed space (mi).
4. Discussion
Many machine learning techniques have been adapted and developed for survival analysis [23, 32, 33]. In particular, several important parametric models, such as neural networks and support vector regression, have been generalized to handle censored data. They provide survival studies with more comprehensive and flexible methodologies. However, ensemble methods are mostly limited to either direct adaptation of boosting to the classical PH model [11, 12] or bagging approaches such as random survival forests [17, 18]. Our proposed algorithm generalizes the gradient boosting machine to learn the C-index directly, which provides a new ensemble learning methodology for survival analysis. As the C-index is a ranking function in essence [22], our model also serves as an ensemble treatment of the ranking problem for survival data. This is novel and has not been addressed previously [14, 34, 35].
By studying the large-scale Metabric breast cancer dataset, we found that “gbmsci” overall performs better than “gbmcox,” “cox,” and “rsf” in terms of predictive C-indices. The improvement is notable and consistent across various feature representations. This study also demonstrates the enhanced prognostic power obtained when gene expression profiles and clinical variables are combined and when the gene space is remapped in the predictive model. Interestingly, “gbmsci” typically outperforms “gbmcox” and “cox” when using these informative features. This may provide useful clues for clinical decision making. Moreover, we also confirm the utility of the subsampling scheme of gradient boosting.
Although GBMCI has free parameters that require tuning, for example, \(\alpha\) and the line-search range, they empirically work well across different experiments once they have been well tuned. In addition, the algorithm still renders similar performance when \(\alpha\) is within a reasonable neighborhood of 1. One possible reason for this robustness is that both the SCI objective function and its gradient are upper- and lower-bounded (as can be shown through basic algebraic manipulations). Such bounds are not typically available when optimizing other objective functions for different regression problems, such as the partial likelihood for the Cox model, the mean absolute error for Lasso regression, and the polynomial alternative to SCI proposed by [21].
The proposed algorithm has room for improvement. First, the current initialization and line-search steps, although working well in practice, are not necessarily the globally optimal strategy. For initialization, one potential alternative is to fit PH models by subsampling or bootstrapping the training data. To better address the problem, one may have to design other initialization heuristics or adopt a global optimization technique such as Monte Carlo methods. Second, GBMCI is computationally more intensive than the other methods because of the pairwise sigmoid computations in the SCI gradient and line search. Fortunately, GBMCI is easily parallelizable, which should help in dealing with large datasets. Third, biomedical research often deals with high-throughput data, for example, microarray gene expression profiling and next-generation sequencing data, which require feature selection and dimension reduction. GBMCI does not tackle this task yet. However, as node splittings of regression trees implicitly perform feature extraction, one could either run GBMCI for several iterations and preselect informative variables as a “warm-up” step before the main learning routine or start GBMCI with all variables, iteratively rank their node-split frequencies, and refine the variable pool. These extensions would allow GBMCI to perform feature selection and concordance index learning in a unified framework.
Last but not least, we note that ensemble methods are in general more expensive than the Cox model, because of the necessity of tuning parameters, training ensemble weak learners, and cross validation. The tradeoff between predictive power and computational cost remains a question that depends on the specific case requirement. For example, given a particular prognosis analysis task, the Cox model may provide a quick baseline evaluation; ensemble methods could be applied, if higher predictive accuracy and more thorough investigation of covariates’ effect are required.
5. Conclusion
To summarize, we have developed a new algorithm (GBMCI) for survival analysis. It performs concordance index learning nonparametrically within the gradient boosting framework. It does not impose parametric assumptions on hazard functions, and it expands the ensemble learning methodologies available for survival analysis. We implemented GBMCI in an open-source R package and tested it in a comprehensive cancer prognosis study. GBMCI consistently performs better than three state-of-the-art survival models (the Cox PH model, its boosting expansion, and the random survival forest) over several feature representations. This study also illustrates the importance of feature engineering of clinical and gene expression data in cancer prognosis studies.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
The work is partially supported by Grants NSF DBI0846218, NCI UO1CA11480 (SPECS UO1), NCI UO1CA152738 (EDRN), UO11CA162147, UCI Chao Family Comprehensive Cancer Center, NCI P30CA62203, and by the U.S. Department of Defense Congressionally Mandated Research Program Grant PC120465. The Metabric dataset was accessed through Synapse (https://synapse.sagebase.org/). The authors thank Dr. WeiYi Cheng for kindly sharing the Attractor Metagene algorithm and Daniel Newkirk and Jacob Biesinger for helpful discussions.
Supplementary Materials
The supplementary material consists of two figures, which summarize the predictive performances of the GBM-based methods (with subsampling) and the RSF-based method.
References
[1] D. R. Cox, “Regression models and life-tables,” Journal of the Royal Statistical Society B, vol. 34, no. 2, pp. 187–220, 1972.
[2] D. R. Cox, “Partial likelihood,” Biometrika, vol. 62, no. 2, pp. 269–276, 1975.
[3] P. K. Andersen and R. D. Gill, “Cox's regression model for counting processes: a large sample study,” The Annals of Statistics, vol. 10, no. 4, pp. 1100–1120, 1982.
[4] T. Martinussen and T. H. Scheike, Dynamic Regression Models for Survival Data, Springer, New York, NY, USA, 2006.
[5] D. Faraggi and R. Simon, “A neural network model for survival data,” Statistics in Medicine, vol. 14, no. 1, pp. 73–82, 1995.
[6] L. Mariani, D. Coradini, E. Biganzoli et al., “Prognostic factors for metachronous contralateral breast cancer: a comparison of the linear Cox regression model and its artificial neural network extension,” Breast Cancer Research and Treatment, vol. 44, no. 2, pp. 167–178, 1997.
[7] R. M. Ripley, A. L. Harris, and L. Tarassenko, “Neural network models for breast cancer prognosis,” Neural Computing and Applications, vol. 7, no. 4, pp. 367–375, 1998.
[8] R. M. Ripley, A. L. Harris, and L. Tarassenko, “Non-linear survival analysis using neural networks,” Statistics in Medicine, vol. 23, no. 5, pp. 825–842, 2004.
[9] J. H. Friedman, “Greedy function approximation: a gradient boosting machine,” Annals of Statistics, vol. 29, no. 5, pp. 1189–1232, 2001.
[10] J. H. Friedman, “Stochastic gradient boosting,” Computational Statistics and Data Analysis, vol. 38, no. 4, pp. 367–378, 2002.
[11] G. Ridgeway, “The state of boosting,” Computing Science and Statistics, vol. 31, pp. 172–181, 1999.
[12] G. Ridgeway, “Generalized boosted models: a guide to the gbm package,” 2007.
[13] P. K. Shivaswamy, W. Chu, and M. Jansche, “A support vector approach to censored targets,” in Proceedings of the 7th IEEE International Conference on Data Mining (ICDM '07), pp. 655–660, October 2007.
[14] V. van Belle, K. Pelckmans, S. van Huffel, and J. A. K. Suykens, “Support vector methods for survival analysis: a comparison between ranking and regression approaches,” Artificial Intelligence in Medicine, vol. 53, no. 2, pp. 107–118, 2011.
[15] G. C. Cawley, N. L. C. Talbot, G. J. Janacek, and M. W. Peck, “Sparse Bayesian kernel survival analysis for modeling the growth domain of microbial pathogens,” IEEE Transactions on Neural Networks, vol. 17, no. 2, pp. 471–481, 2006.
[16] L. Evers and C.-M. Messow, “Sparse kernel methods for high-dimensional survival data,” Bioinformatics, vol. 24, no. 14, pp. 1632–1638, 2008.
[17] T. Hothorn, P. Bühlmann, S. Dudoit, A. Molinaro, and M. J. van der Laan, “Survival ensembles,” Biostatistics, vol. 7, no. 3, pp. 355–373, 2006.
[18] H. Ishwaran, U. B. Kogalur, E. H. Blackstone, and M. S. Lauer, “Random survival forests,” The Annals of Applied Statistics, vol. 2, no. 3, pp. 841–860, 2008.
[19] F. E. Harrell Jr., R. M. Califf, D. B. Pryor, K. L. Lee, and R. A. Rosati, “Evaluating the yield of medical tests,” Journal of the American Medical Association, vol. 247, no. 18, pp. 2543–2546, 1982.
[20] F. E. Harrell, K. L. Lee, and D. B. Mark, “Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors,” Statistics in Medicine, vol. 15, no. 4, pp. 361–387, 1996.
[21] L. Yan, D. Verbel, and O. Saidi, “Predicting prostate cancer recurrence via maximizing the concordance index,” in Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 479–485, ACM, August 2004.
[22] V. C. Raykar, H. Steck, B. Krishnapuram, C. Dehing-Oberije, and P. Lambin, “On ranking in survival analysis: bounds on the concordance index,” in Advances in Neural Information Processing Systems 20, J. C. Platt, D. Koller, Y. Singer, and S. Roweis, Eds., pp. 1209–1216, The MIT Press, Cambridge, Mass, USA, 2008.
[23] A. Khosla, Y. Cao, C. C.-Y. Lin, H.-K. Chiu, J. Hu, and H. Lee, “An integrated machine learning approach to stroke prediction,” in Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 183–191, ACM, July 2010.
[24] D. R. Cox and D. Oakes, Analysis of Survival Data, vol. 21, Chapman & Hall/CRC, Boca Raton, Fla, USA, 1984.
[25] P. D. Allison, Survival Analysis Using SAS: A Practical Guide, SAS Institute, 2010.
[26] P. McCullagh and J. A. Nelder, Generalized Linear Models, vol. 37, Chapman & Hall/CRC, Boca Raton, Fla, USA, 1989.
[27] J. Friedman, T. Hastie, and R. Tibshirani, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, New York, NY, USA, 2009.
[28] C. Curtis, S. P. Shah, S. F. Chin et al., “The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups,” Nature, vol. 486, no. 7403, pp. 346–352, 2012.
[29] A. A. Margolin, E. Bilal, E. Huang et al., “Systematic analysis of challenge-driven improvements in molecular prognostic models for breast cancer,” Science Translational Medicine, vol. 5, no. 181, article 181re1, 2013.
[30] H. Ishwaran and U. B. Kogalur, “Random survival forests for R,” R News, vol. 7, no. 2, pp. 25–31, 2007.
[31] W.-Y. Cheng, T.-H. Ou Yang, and D. Anastassiou, “Biomolecular events in cancer revealed by attractor metagenes,” PLoS Computational Biology, vol. 9, no. 2, Article ID e1002920, 2013.
[32] B. Zupan, J. Demšar, M. W. Kattan, J. R. Beck, and I. Bratko, “Machine learning for survival analysis: a case study on recurrence of prostate cancer,” Artificial Intelligence in Medicine, vol. 20, no. 1, pp. 59–75, 2000.
[33] M. W. Kattan, “Comparison of Cox regression with other methods for determining prediction models and nomograms,” The Journal of Urology, vol. 170, no. 6, pp. S6–S10, 2003.
[34] Y. Freund, R. Iyer, R. E. Schapire, and Y. Singer, “An efficient boosting algorithm for combining preferences,” Journal of Machine Learning Research, vol. 4, no. 6, pp. 933–969, 2004.
[35] C. J. C. Burges, K. M. Svore, P. N. Bennett, A. Pastusiak, and Q. Wu, “Learning to rank using an ensemble of lambda-gradient models,” Journal of Machine Learning Research, vol. 14, pp. 25–35, 2011.
Copyright
Copyright © 2013 Yifei Chen et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.