Nonlinear Analysis: Algorithm, Convergence, and Applications 2014
Multinomial Regression with Elastic Net Penalty and Its Grouping Effect in Gene Selection
For the multiclass classification problem of microarray data, a new optimization model named multinomial regression with the elastic net penalty was proposed in this paper. The optimization model was constructed by combining the multinomial likelihood loss and the multiclass elastic net penalty, and it was proved to encourage a grouping effect in gene selection for multiclass classification.
Support vector machine, lasso, and their extensions, such as the hybrid huberized support vector machine, the doubly regularized support vector machine, the 1-norm support vector machine, the sparse logistic regression, the elastic net, and the improved elastic net, have been successfully applied to the binary classification problems of microarray data. However, these binary classification methods cannot easily be applied to multiclass classification. Hence, multiclass classification remains a difficult issue in microarray classification [9–11].
Besides improving the accuracy, another challenge for the multiclass classification problem of microarray data is how to select the key genes [9–15]. By solving an optimization formula, a new multicategory support vector machine was proposed in . It can be successfully applied to microarray classification. However, this optimization model requires additional methods to select genes. To select genes automatically while performing the multiclass classification, new optimization models [12–14], such as the norm multiclass support vector machine in , the multicategory support vector machine with sup norm regularization in , and the huberized multiclass support vector machine in , were developed.
Note that the logistic loss function not only has good statistical significance but also is second-order differentiable. Hence, the regularized logistic regression optimization models have been successfully applied to the binary classification problem [15–19]. Multinomial regression is obtained when the logistic regression is applied to the multiclass classification problem. The emergence of the sparse multinomial regression provides a reasonable approach to the multiclass classification of microarray data, which requires identifying important genes [20–22]. By using Bayesian regularization, the sparse multinomial regression model was proposed in . By adopting a data augmentation strategy with Gaussian latent variables, the variational Bayesian multinomial probit model, which can reduce the prediction error, was presented in . By using the elastic net penalty, the regularized multinomial regression model was developed in . It can be applied to the multiple sequence alignment of protein related to mutation. Although the above sparse multinomial models achieved good prediction results on real data, all of them failed to select genes (or variables) in groups.
For the multiclass classification of microarray data, this paper combined the multinomial likelihood loss function, which has an explicit probabilistic meaning, with the multiclass elastic net penalty, which selects genes in groups. It proposed a multinomial regression with elastic net penalty and proved that this model encourages a grouping effect in gene selection while performing classification.
2. Problem Formulation and Preliminary
Given a training data set of a $K$-class classification problem , where $x_i$ represents the input vector of the $i$th sample and $y_i$ represents the class label corresponding to $x_i$. For the microarray data, $n$ and $p$ represent the number of experiments and the number of genes, respectively. Restricted by the high experiment cost, only a few (fewer than one hundred) samples can be obtained, with thousands of genes in one sample. Let and , where , . Without loss of generality, it is assumed that
For the binary classification problem, the class labels are assumed to belong to . The logistic regression model represents the following class-conditional probabilities; that is, and then According to the common linear regression model, can be predicted as where represents bias and represents the parameter vector.
In this paper, we pay attention to the multiclass classification problems, which imply that . Let be the decision function, where . The multiclass classifier can be represented as Let and For convenience, we further let and represent the th row vector and th column vector of the parameter matrix . Then extending the class-conditional probabilities of the logistic regression model to -logits, we have the following formula: where represent a pair of parameters which corresponds to the sample , and , . Similarly, we can construct the th as holds if and only if . It can be easily obtained that that is, It should be noted that if . Therefore, the class-conditional probabilities of multiclass classification problem can be represented as
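As a minimal sketch of the multiclass classifier described above, assuming linear decision functions with one bias and one coefficient vector per class and the usual softmax form of the class-conditional probabilities, the following Python computes the probabilities for one sample and predicts the class with the largest decision value. The names `class_probabilities` and `predict`, the zero-based class indices, and the list-of-lists parameter layout are illustrative assumptions, not the authors' notation.

```python
import math


def class_probabilities(x, biases, weights):
    """Softmax class-conditional probabilities for one sample.

    x       : list of p feature values
    biases  : list of K intercepts, one per class (assumed layout)
    weights : list of K coefficient lists, each of length p (assumed layout)
    """
    # Linear decision value for each class: bias + inner product.
    scores = [b + sum(w_j * x_j for w_j, x_j in zip(w, x))
              for b, w in zip(biases, weights)]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(x, biases, weights):
    """Return the zero-based index of the most probable class."""
    probs = class_probabilities(x, biases, weights)
    return max(range(len(probs)), key=probs.__getitem__)
```

Because the softmax is monotone in the decision values, choosing the most probable class is the same as choosing the class with the largest decision value.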
3. Main Results
3.1. Multinomial Regression with the Multiclass Elastic Net Penalty
Following the idea of sparse multinomial regression [20–22], we fit the above class-conditional probability model by the regularized multinomial likelihood. Let . It is easily obtained that Hence, Let Then (13) can be rewritten as Note that Hence, the multinomial likelihood loss function can be defined as
In order to improve the performance of gene selection, the following elastic net penalty for the multiclass classification problem was proposed in  By combining the multiclass elastic net penalty (18) with the multinomial likelihood loss function (17), we propose the following multinomial regression model with the elastic net penalty: where represent the regularization parameter. Note that . Hence, the optimization problem (19) can be simplified as
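To make the structure of such an objective concrete, here is a hedged sketch in Python. It assumes the loss is the average negative multinomial log-likelihood and the penalty is a weighted sum of the L1 norm and the squared L2 norm of all class coefficients, with the biases left unpenalized. The exact scaling and form of (17)–(20) cannot be recovered from the text here, so the function `elastic_net_multinomial_objective` and its parameters `lam1` and `lam2` are assumptions for illustration only.

```python
import math


def log_softmax(scores):
    """Log of the softmax probabilities, computed stably."""
    m = max(scores)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_norm for s in scores]


def elastic_net_multinomial_objective(X, y, biases, weights, lam1, lam2):
    """Average negative log-likelihood plus an elastic net penalty.

    X : list of n samples, each a list of p features
    y : list of n zero-based class labels
    lam1, lam2 : assumed L1 and squared-L2 regularization parameters
    """
    n = len(X)
    nll = 0.0
    for x_i, y_i in zip(X, y):
        scores = [b + sum(w_j * x_j for w_j, x_j in zip(w, x_i))
                  for b, w in zip(biases, weights)]
        nll -= log_softmax(scores)[y_i]
    # Penalize coefficients only; biases are left unpenalized (assumption).
    l1 = sum(abs(w_j) for w in weights for w_j in w)
    l2 = sum(w_j * w_j for w in weights for w_j in w)
    return nll / n + lam1 * l1 + lam2 * l2
```

With all parameters set to zero, every class has probability 1/K, so the value reduces to log K regardless of the data.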
3.2. Grouping Effect
For microarray classification, it is very important to identify related genes in groups. In this section, we prove that the multinomial regression with elastic net penalty can encourage a grouping effect in gene selection. To this end, we must first prove the inequality shown in Theorem 1.
Theorem 1. Let be the solution of the optimization problem (19) or (20). For any new parameter pairs which are selected as , the following inequality holds, where and represent the first rows of vectors and and and represent the first rows of matrices and .
Proof. Note that the inequality holds for the arbitrary real numbers and . Hence, the following inequality
holds for any pairs , . From (22), it can be easily obtained that
Hence, from (24) and (25), we can get
Equation (26) is equivalent to the following inequality:
Hence, inequality (21) holds. This completes the proof.
Theorem 2. Given the training data set and assume that the matrix and vector satisfy (1). If the pairs () are the optimal solution of the multinomial regression with elastic net penalty (19), then the following inequality holds, where , is the th column of parameter matrix , and is the th column of parameter matrix .
Proof. First of all, we construct the new parameter pairs , where Let Since the pairs () are the optimal solution of the multinomial regression with elastic net penalty (19), it can be easily obtained that Note that the function is Lipschitz continuous. Hence, we have From (33) and (21) and the definition of the parameter pairs , we have Analogously, we have Substituting (34) and (35) into (32) gives that is, From (37), it can be easily obtained that where . This completes the proof.
According to the inequality shown in Theorem 2, the multinomial regression with elastic net penalty can assign the same parameter vectors (i.e., ) to the highly correlated predictors (i.e., ). This means that the multinomial regression with elastic net penalty can select genes in groups according to their correlation. Following the terminology in , this property is called the grouping effect in gene selection for multiclass classification. In particular, for the binary classification, that is, , inequality (29) becomes This is consistent with the results in .
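The grouping effect can be illustrated numerically on a much simpler surrogate. The sketch below is not the multinomial model of this paper: it swaps in an elastic-net-penalized least-squares problem, solved by plain cyclic coordinate descent, and feeds it two identical predictors plus one unrelated predictor. The function names, the objective scaling, and the toy data are all assumptions chosen for illustration.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net_cd(X, y, lam1, lam2, sweeps=200):
    """Cyclic coordinate descent for the surrogate problem
    (1/2n)*||y - X b||^2 + lam1*||b||_1 + (lam2/2)*||b||_2^2.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared norm of feature j
            for i in range(n):
                fit_without_j = sum(X[i][k] * b[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_without_j)
                norm += X[i][j] ** 2
            b[j] = soft_threshold(rho / n, lam1) / (norm / n + lam2)
    return b


# Toy data: columns 0 and 1 are identical, column 2 is orthogonal to both.
X_toy = [[-1.5, -1.5, 1.0],
         [-0.5, -0.5, -1.0],
         [0.5, 0.5, -1.0],
         [1.5, 1.5, 1.0]]
y_toy = [-3.0, -1.0, 1.0, 3.0]
coef = elastic_net_cd(X_toy, y_toy, lam1=0.1, lam2=0.5)
```

With a positive `lam2` the surrogate objective is strictly convex, so the two identical predictors receive equal coefficients. Setting `lam2=0.0` in the same routine removes that guarantee, and this solver then tends to leave nearly all of the weight on whichever duplicate it updates first.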
3.3. Solving Algorithm
Microarray classification is a typical small $n$, large $p$ problem. Because the number of genes in microarray data is very large, solving the proposed multinomial regression suffers from the curse of dimensionality. To improve the solving speed, Friedman et al. proposed the pathwise coordinate descent algorithm, which takes advantage of the sparsity of the solution. Therefore, we choose the pathwise coordinate descent algorithm to solve the multinomial regression with elastic net penalty. To this end, we convert (19) into the following form: Equation (40) can be easily solved by using the R package “glmnet” which is publicly available.
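For readers who want to see the inner step of this kind of solver, the following Python sketch follows the general scheme described by Friedman et al. for multinomial models: for one class at a time, form a quadratic approximation of the log-likelihood (working responses and weights), then update one coefficient by soft-thresholding. It assumes glmnet's penalty parameterization, lambda * [(1 - alpha)/2 * ||b||_2^2 + alpha * ||b||_1]. The function names and the small floor on the weights are assumptions, and this is not a transcription of the glmnet source.

```python
import math


def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def working_response(X, y, biases, weights, k):
    """Working responses z_i and weights w_i = p_ik * (1 - p_ik) for class k."""
    z, w = [], []
    for x_i, y_i in zip(X, y):
        scores = [b + sum(c * v for c, v in zip(coefs, x_i))
                  for b, coefs in zip(biases, weights)]
        p_ik = softmax(scores)[k]
        w_i = max(p_ik * (1.0 - p_ik), 1e-5)  # floor avoids division by zero (assumption)
        indicator = 1.0 if y_i == k else 0.0
        z.append(scores[k] + (indicator - p_ik) / w_i)
        w.append(w_i)
    return z, w


def update_coefficient(X, z, w, bias_k, coefs_k, j, lam, alpha):
    """One coordinate update of coefs_k[j] on the penalized weighted least squares."""
    n = len(X)
    num = 0.0
    den = 0.0
    for x_i, z_i, w_i in zip(X, z, w):
        fit_without_j = bias_k + sum(coefs_k[m] * x_i[m]
                                     for m in range(len(coefs_k)) if m != j)
        num += w_i * x_i[j] * (z_i - fit_without_j)
        den += w_i * x_i[j] ** 2
    return soft_threshold(num / n, lam * alpha) / (den / n + lam * (1.0 - alpha))
```

A full solver would cycle `update_coefficient` over all genes and classes until convergence and repeat this along a decreasing sequence of `lam` values, reusing each solution as the starting point for the next.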
4. Conclusion
By combining the multinomial likelihood loss function, which has an explicit probabilistic meaning, with the multiclass elastic net penalty, which selects genes in groups, this paper proposed the multinomial regression with elastic net penalty for the multiclass classification problem of microarray data. The proposed multinomial regression is proved to encourage a grouping effect in gene selection. In future work, we will apply this optimization model to real microarray data and verify its specific biological significance.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
This work is supported by Natural Science Foundation of China (61203293, 61374079), Key Scientific and Technological Project of Henan Province (122102210131, 122102210132), Program for Science and Technology Innovation Talents in Universities of Henan Province (13HASTIT040), Foundation and Advanced Technology Research Program of Henan Province (132300410389, 132300410390, 122300410414, and 132300410432), Foundation of Henan Educational Committee (13A120524), and Henan Higher School Funding Scheme for Young Teachers (2012GGJS-063).
J. Zhu, R. Rosset, and T. Hastie, “1-norm support vector machine,” in Advances in Neural Information Processing Systems, vol. 16, pp. 49–56, MIT Press, New York, NY, USA, 2004.
Y. Lee, Y. Lin, and G. Wahba, “Multicategory support vector machines: theory and application to the classification of microarray data and satellite radiance data,” Journal of the American Statistical Association, vol. 99, no. 465, pp. 67–81, 2004.
G. C. Cawley, N. L. C. Talbot, and M. Girolami, “Sparse multinomial logistic regression via Bayesian L1 regularization,” in Advances in Neural Information Processing Systems, vol. 19, pp. 209–216, MIT Press, New York, NY, USA, 2007.
J. Friedman, T. Hastie, and R. Tibshirani, “Regularization paths for generalized linear models via coordinate descent,” Journal of Statistical Software, vol. 33, no. 1, pp. 1–22, 2010.