Research Article

A Gradient Boosting Algorithm for Survival Analysis via Direct Optimization of Concordance Index

Table 1

The five sets of features extracted from the Metabric breast cancer dataset.

| Category | Abbreviation | Explanation |
| --- | --- | --- |
| Clinical feature | cl | A subset of clinical covariates selected by fitting the Cox model with AIC in a stepwise algorithm. The frequently selected features include age at diagnosis, lymph node status, treatment type, tumor size, tumor group, and tumor grade. |
| Gene feature | ge | A subset of gene expression microarray probes (Illumina HT-12 v3 platform) whose concordance indices with the survival data rank highest (positively concordant) or lowest (negatively concordant). Examples include "ILMN_1683450," "ILMN_2392472," and "ILMN_1700337." |
| Clinical and gene feature | clge | The previously selected clinical features and gene expression features are combined and used to fit the Cox model with AIC in a stepwise algorithm, yielding a refined subset of features. |
| Metagene feature | mt | The high-dimensional gene expression data are fed into an iterative attractor-finding algorithm, yielding a few attractor metagenes that are commonly present in multiple cancer types [31]. Some multicancer attractors are strongly associated with tumor stage, grade, or lymphocyte status. |
| Clinical and metagene feature | mi | A minimal subset of metagenes with strong prognostic power for breast cancer [31], combined with several important clinical covariates, such as age at diagnosis and treatment type. |
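
The concordance-based probe ranking described for the gene feature set (ge) can be illustrated with a minimal sketch. It assumes Harrell's pairwise definition of the concordance index, ignores pairs with tied survival times, and treats a higher expression value as a higher risk score. The function names, the probe names, the toy data, and the cutoff `k` are hypothetical and are not taken from the study.

```python
def concordance_index(scores, times, events):
    """Harrell's C-index for a single feature used as a risk score.

    A pair (i, j) is comparable when subject i had an observed event
    and a strictly shorter time than subject j. The pair is concordant
    when the subject with the shorter time has the higher score; tied
    scores count as half. Tied times are ignored (a simplification).
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


def rank_probes(expression, times, events, k):
    """Return the k most positively and k most negatively concordant probes."""
    ci = {
        probe: concordance_index(values, times, events)
        for probe, values in expression.items()
    }
    ordered = sorted(ci, key=ci.get, reverse=True)
    return ordered[:k], ordered[-k:], ci


# Toy data (hypothetical): five subjects, one of them censored.
times = [2, 4, 6, 8, 10]
events = [1, 1, 0, 1, 1]
expression = {
    "probe_a": [5, 4, 3, 2, 1],  # high expression goes with short survival
    "probe_b": [1, 2, 3, 4, 5],  # high expression goes with long survival
    "probe_c": [3, 1, 2, 3, 1],  # weak association
}

top, bottom, ci = rank_probes(expression, times, events, k=1)
print(top, bottom, ci)
```

A probe with a concordance index near 1 or near 0 is informative in either direction, while a value near 0.5 indicates little association with survival, which is why both ends of the ranking are retained.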