Abstract

A new adaptive shooting regularization method for variable selection based on the Cox’s proportional hazards mode being proposed. This adaptive shooting algorithm can be easily obtained by the optimization of a reweighed iterative series of penalties and a shooting strategy of penalty. Simulation results based on high dimensional artificial data show that the adaptive shooting regularization method can be more accurate for variable selection than Lasso and adaptive Lasso methods. The results from real gene expression dataset (DLBCL) also indicate that the regularization method performs competitively.

1. Introduction

In the study of the dependence of survival time on covariances , the Cox’s proportional hazards model [1, 2] is the most widely used model in survival analysis. Suppose the dataset has a sample size of to study survival time on covariate , we use the data form of to represent the individual’s sample, where is the censoring indicator, the denotes the survival time if or otherwise censoring time.

By the Cox’s proportional hazards model, the hazard function can be defined as where baseline hazard function is unspecified or unknown and is the regression coefficient vector of variables.

The Cox’s partial log-likelihood is expressed as where denotes ordered risk set at time ; represents failure time.

In practice, not all the covariates may contribute to the prediction of survival outcomes: some components of may be zero in the true model. To select important variables under the proportional hazards model (2), Tibshirani [3], Fan and Li [4], and Zhang and Lu [5] proposed to minimize the penalized log partial likelihood function as

The standard regularization algorithm cannot directly be applied for nonlinear Cox model to obtain parameter estimates. Therefore, Tibshirani [3] and Zhang and Lu [5] proposed iterative procedure to transform the Cox’s partial log-likelihood function (2) to linear regression problem through an iteratively Newton-Raphon update. Here we follow the approach of Zhang and Lu [5]: define the gradient vector and the Hessian matrix , then apply the Choleski decomposition to obtain , and generate the pseudoresponse vector . Then Zhang and Lu [5] suggested an optimization problem with the penalty function: The Lasso penalty is , which shrinks small coefficients to zero and hence results in a sparse representation of the solution. However, estimation of large ’s may suffer from substantial bias in if chosen too big and may not be sufficiently spare if is selected too small. Hence, Fan and Li [4] proposed the smoothly clipped absolute deviation (SCAD) penalty, which avoids excessive penalties on large coefficients and enjoys the oracle properties. The adaptive penalty is , where the weights are chosen adaptively by data. The values chosen for are crucial for guaranteeing the optimality of the solution.

The above-mentioned series of Lasso methods were based on the L1 penalty. Xu et al. [6, 7] and Liang et al. [8] have proposed regularization method which has the penalty . The theoretical analyses and experiments show that the regularization is more effective than Lasso both in theory and practice. In this paper, we investigate the adaptive shooting regularization to solve the Cox model.

The rest of the paper is organized as follows. Section 2 describes an adaptive shooting regularization algorithm to obtain estimates from the Cox model. Section 3 evaluates our method by simulation studies and application to real gene expression dataset (DLBCL). Finally we give a brief discussion.

2. Adaptive Shooting Regularization Method for the Cox Model

The log partial likelihood function of the Cox model with the penalty is where is the tuning parameter.

In this section, we proposed the adaptive shooting algorithm to optimize the Cox model in an approximate linear form. The following is the complete algorithm procedure.

Step 1. Initial coefficients value and .

Step 2. Compute , , , , and based on (), define , (), and write as , where is the -dimensional vector consisting of all ’s other than , let for each .

Step 3. Solve (), using the shooting regularization approach:

Step 4. Solve (), using the modified reweighed iterative approach of the L1 shooting approach.
Step  4.1. Start with , set inner iteration count .
Step  4.2. At each iterative step , for each , update: where is the th column of . A new estimator is formed after updating all ’s and let .
Step  4.3. Update and and repeat Step until converge.

Step 5. Let and update and and repeat Steps 2, 3, and 4 until does not change.
In Steps 2 and 4.3, we modify shooting algorithm with weight based on last estimate at each iteratively step. It is possible that some become zero during the iterative procedure. So to guarantee the feasibly, we replace with when implementing, where is any fixed positive real number. Steps 3 and 4 implement the shooting strategy of penalty and the reweighed iterative strategy of penalties, respectively. Step 5 selects the minimum of , which is obtained by Steps 3 and 4, to improve the converge speed of the algorithm.
This algorithm gives exact zeros for some coefficients and it converges quickly based on our empirical experience. Similarly to Theorem 3 in Fu [9], we can show that the adaptive shooting regularization algorithm is guaranteed to converge to the global minimum of the log partial likelihood function of the Cox model (5).

3. Numerical Studies

3.1. Simulation Study for the High Dimensional Artificial Dataset

In this section, we compare the performance of the Lasso, the adaptive Lasso, and the adaptive shooting regularization method, under Cox’s proportional hazards model. The cross-validated partial likelihood (CVPL) method is used to estimate the tuning parameter in these three algorithms. In our simulation studies, we use the Gempertz model suggested by Qian et al. [10] to generate the Cox model datasets in the setting:

We considered the cases with 25% and 40% of censoring and used four samples, , 250, 300, and 350. The simulation results obtained by the three methods reported in Table 1. Since this simulation dataset has 6 relevant features (6 nonzero coefficients) in the 1000 ones, the idealized average numbers of variables selected (the Var column) and correct zeros (the Corr column) by each method are 6 and 994, respectively. From the Var and Corr columns of Table 1, the results obtained by the regularization method are obviously better than those of other methods for different sample sizes and censoring settings. For example, when and the censoring is 25%, the average numbers (Var) from the Lasso, the adaptive Lasso, and the regularization methods are 81.29, 41.06, and 17.79 (best). The correct zeros’ numbers (Corr) of the three methods are 917.29, 962.47, and 984.28 (best), respectively. The results obtained by the method are obviously close to the idealized values in the Var and Corr columns. Moreover, in the IBS (the integrated Brier score) column, the IBS’s value of the Lasso, the adaptive Lasso, and the shooting regularization method are 0.1502, 0.1474, and 0.1440. This means that the shooting regularization method performs slight better than the other two methods for the prediction accuracy. Similar results are observed for the 40% censoring case.

As shown in the Incorr columns of Table 1, the idealized average number is 0 if the method can correctly identify all relevant variables at each run, whereas its maximal value is 6 if the method incorrectly identifies all the nonzero coefficients to zero in all runs. When the sample size is relative small ( and censoring rate = 25%), the average number of the incorrect zeros from the Lasso is 0.26, from the adaptive Lasso is 0.35 and from the regularization shooting method is 0.42. The adaptive shooting regularization method performs worse than the other two methods. When increases to 350, all the three algorithms never evaluated the nonzero coefficients to zero. This means that the adaptive shooting regularization method shrinks the small effect covariates to zero more easily than the Lasso and the adaptive Lasso when the sample size is relative small. Similar results are observed for the 40% censoring case.

3.2. Experiments on the Real Gene Expression (DLBCL) Dataset

To further demonstrate the utility of the regularization shooting procedure in relating microarray gene expression data to censored survival phenotypes, we re-analyzed a published dataset of DLBCL by Rosenwald et al. [11]. This dataset contains a total of 240 patients with DLBCL, including 138 patient deaths during the followups with a median death time of 2.8 years. Rosenwald et al. [11] divided the 240 patients into a training set of 160 patients and a test set of 80 patients and built a multivariate Cox model. The variables in the Cox model included the average gene expression levels of smaller sets of genes in four different gene expression signatures together with the gene expression level of BMP6. It should be noted that in order to select the gene expression signatures, they performed a hierarchical clustering analysis for genes across all the samples (including both training and test samples). In order to compare our results with those in Rosenwald et al. [11], we used the same setting of training and test datasets in our analysis.

We applied the adaptive shooting regularization method to first build a predictive model using the training data of 160 patients and all the 7399 genes as features (predictors). Table 2 shows the GeneBank ID and a brief description of top ten genes selected by our proposed regularization method. It is interesting to note that eight of these genes belong to the gene expression signature groups defined in Rosenwald et al. [11]. These three signature groups include Germinal-center B-cell signature, MHC, and lymph-node signature. On the other hand, two genes selected by the method are not in the proliferation signature group defined by Rosenwald et al. [11].

Based on the estimated model with these genes, we estimated the risk scores using the method proposed by Gui and Li [12]. To further examine whether clinically relevant groups can be identified by the model, we used zero as a cutoff point of the risk scores and divided the test patients into two groups based on whether they have positive or negative risk scores ().

As a comparison, the Lasso, the adaptive Lasso, and the regularization methods are validated on the test dataset of 80 patients defined in Rosenwald et al. [11], and their corresponding Kaplan-Meier curves are shown in Figure 1. In Figure 1, the horizontal coordinate is the predictive survival time (years) and the vertical coordinate is the predictive survival probabilities. The value (lower the better to indicate statistical significance) of the Lasso for the test dataset is 0.0011, which is significantly larger than 0.0006 and 0.0004 of the adaptive Lasso and the regularization methods. This means that lasso method performs the worst for the survival prediction compared with other two methods.

On the other hand, in order to assess how well the model predicts the outcome, we also use the idea of the integrated Brier score (IBS) for the test dataset including censored observations as our criteria. In Table 3, the IBS’s value of the Lasso, the adaptive Lasso, and the adaptive shooting regularization method are 0.2306, 0.2026, and 0.2017. We can see that the adaptive Lasso and the adaptive shooting regularization methods perform slight better than Lasso for the prediction accuracy.

4. Discussion and Conclusion

In this paper, we have presented the novel adaptive shooting regularization method, which is used for variable selection in the Cox’s proportional hazards model. Its performance is validated by both simulation and real case studies. In the experiments, we use the high-dimensional and low-sample size dataset, with applications to microarray gene expression data (DLBCL). Results indicate that our proposed adaptive shooting regularization algorithm is very competitive in analyzing high dimensional survival data in terms of sparsity of the final prediction model and predictability. The proposed regularization procedure is very promising and useful in building a parsimonious predictive model used for classifying future patients into clinically relevant high-risk and low-risk groups based on the gene expression profile and survival times of previous patients. The procedure can also be applied to select important genes which are related to patient’s survival outcome.

Acknowledgments

This work was supported by the Macau Science and Technology Develop Funds (Grant no. 017/2010/A2) of Macau SAR of China and the National Natural Science Foundation of China (Grant nos. 11131006, 11171272).