Group Identification and Variable Selection in Quantile Regression

Alkenani, Ali; Msallam, Basim Shlaibah

https://doi.org/10.1155/2019/8504174

Journal of Probability and Statistics


Special Issue

Quantile Regression and Beyond in Statistical Analysis of the Data


Research Article | Open Access

Volume 2019 | Article ID 8504174 | https://doi.org/10.1155/2019/8504174


Ali Alkenani¹ and Basim Shlaibah Msallam²

Guest Editor: Himel Mallick

Received: 24 Nov 2018

Revised: 21 Feb 2019

Accepted: 28 Mar 2019

Published: 10 Apr 2019

Abstract

Using the Pairwise Absolute Clustering and Sparsity (PACS) penalty, we propose a regularized quantile regression (QR) method, QR-PACS. The PACS penalty eliminates insignificant predictors and combines predictors with indistinguishable coefficients (IC), the two issues that arise in the search for the true model. QR-PACS extends PACS from mean regression settings to QR settings. Simulations and a real-data analysis show that QR-PACS can yield promising predictive accuracy while identifying related groups of predictors.

1. Introduction

The regression model is one of the most important statistical models. Ordinary least squares (OLS) regression estimates the conditional mean function of the response. Least absolute deviation regression (LADR) estimates the conditional median function and is resistant to outliers. Quantile regression (QR) was introduced by Koenker and Bassett [1] as a generalization of LADR that estimates the conditional quantile functions of the response. Consequently, QR provides much more information about the conditional distribution of the response than mean regression does. QR has attracted a great deal of interest in the literature and has been applied in many areas, such as economics, finance, survival analysis, and growth charts.

Variable selection (VS) is a key step in model building. In many applications the number of candidate variables is large, yet keeping irrelevant variables in the model is undesirable: it makes the model harder to interpret and may degrade its predictive ability. Many penalties have been proposed to achieve VS, for example, the Lasso [2], SCAD [3], fused Lasso [4], elastic-net [5], group Lasso [6], adaptive Lasso [7], adaptive elastic-net [8], and MCP [9].
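For reference, the single-coefficient penalties named above can be written down directly. The NumPy sketch below evaluates the Lasso, elastic-net, SCAD, and MCP penalties from their standard definitions; a = 3.7 is the value recommended for SCAD by Fan and Li [3], while gamma = 3.0 for MCP is only an illustrative default, and the function names are ours. The adaptive Lasso is the Lasso with a coefficient-specific weight, and the fused and group Lasso act on several coefficients jointly, so they are not shown.

```python
import numpy as np

def lasso(t, lam):
    """Lasso penalty lam * |t|."""
    return lam * np.abs(np.asarray(t, dtype=float))

def elastic_net(t, lam1, lam2):
    """Elastic-net penalty lam1 * |t| + lam2 * t^2."""
    t = np.asarray(t, dtype=float)
    return lam1 * np.abs(t) + lam2 * t**2

def scad(t, lam, a=3.7):
    """SCAD: linear up to lam, quadratic transition up to a*lam, constant beyond."""
    t = np.abs(np.asarray(t, dtype=float))
    middle = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))
    return np.where(t <= lam, lam * t,
                    np.where(t <= a * lam, middle, (a + 1) * lam**2 / 2))

def mcp(t, lam, gamma=3.0):
    """MCP: lam*|t| - t^2/(2*gamma) up to gamma*lam, constant beyond."""
    t = np.abs(np.asarray(t, dtype=float))
    return np.where(t <= gamma * lam, lam * t - t**2 / (2 * gamma), gamma * lam**2 / 2)

# For large coefficients the Lasso keeps growing while SCAD and MCP level off.
for t in (0.5, 2.0, 10.0):
    print(t, float(lasso(t, 1.0)), float(scad(t, 1.0)), float(mcp(t, 1.0)))
```

The flat tails of SCAD and MCP are what limit the shrinkage of large coefficients, whereas the Lasso keeps penalizing them in proportion to their size.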

Under the QR framework, Koenker [10] combined the Lasso with the mixed-effects QR model to encourage shrinkage in estimating the random effects. Wang, Li, and Jiang [11] combined LADR with the adaptive Lasso penalty. Li and Zhu [12] proposed L1-norm penalized QR (PQR) by combining QR with the Lasso penalty. Wu and Liu [13] introduced PQR with the SCAD and adaptive Lasso penalties. Slawski [14] proposed the structured elastic-net regularizer for QR.

In the setting p > n, where p is the number of predictors and n is the sample size, Belloni and Chernozhukov [15] studied the theory of PQR with the Lasso penalty in high-dimensional sparse models. Wang, Wu, and Li [16] investigated PQR methodology in ultrahigh dimension for nonconvex penalties such as SCAD and MCP. Peng and Wang [17] proposed and studied a new iterative coordinate descent algorithm for solving nonconvex PQR in high dimension.

The search for the true model involves two issues: deleting irrelevant predictors and merging predictors with IC [18]. The penalties above address the first issue but not the second. Both issues can be addressed by the Pairwise Absolute Clustering and Sparsity (PACS) penalty [18]. Moreover, PACS is an oracle method for simultaneous group identification and VS.

The limitations of existing VS methods motivate this paper. Its aim is to find an effective procedure for simultaneous group identification and VS under the QR framework.

In this paper, we propose QR-PACS, which has advantages over existing PQR methods: it inherits the ability of PACS to address both issues in discovering the true model, an ability not available with the Lasso, adaptive Lasso, SCAD, MCP, elastic-net, or structured elastic-net.

The rest of the paper is organized as follows. In Section 2, penalized linear QR is reviewed briefly. QR-PACS is introduced in Section 3. The numerical results of simulations and real data are presented in Sections 4 and 5, respectively. The conclusions are reported in Section 6.

2. Penalized Linear QR

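QR is a widely used technique for describing the distribution of an outcome variable $y$ given a set of predictors $x$. Let $x_i = (x_{i1}, \dots, x_{ip})^T$ be the vector of predictors for the $i$th observation, and let $Q_{y_i}(\tau \mid x_i) = F^{-1}_{y_i}(\tau \mid x_i)$ be the inverse cumulative distribution function of $y_i$ given $x_i$. Then

$$Q_{y_i}(\tau \mid x_i) = x_i^T \beta_\tau,$$

where $\beta_\tau$ is a $p$-vector of unknown parameters and $\tau \in (0, 1)$ is the quantile level.

Koenker and Bassett [1] suggested estimating $\beta_\tau$ as

$$\hat\beta_\tau = \arg\min_{\beta} \sum_{i=1}^{n} \rho_\tau\left(y_i - x_i^T\beta\right), \qquad (1)$$

where $\rho_\tau(\cdot)$ is the check loss function, defined as

$$\rho_\tau(u) = u\,\{\tau - I(u < 0)\}. \qquad (2)$$

Under the regularization framework, Li and Zhu [12], Wu and Liu [13], Slawski [14], and Wang, Wu, and Li [16], among others, proposed penalized versions of (1) by adding different penalties:

$$\hat\beta_\tau = \arg\min_{\beta} \sum_{i=1}^{n} \rho_\tau\left(y_i - x_i^T\beta\right) + \lambda \sum_{j=1}^{p} P\left(|\beta_j|\right), \qquad (3)$$

where $\lambda \ge 0$ is the penalization parameter and $P(\cdot)$ is the penalty function.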
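To make the estimator (1) and its penalized version concrete, the following NumPy sketch implements the check loss and evaluates the penalized objective (sum of check losses plus λ times the summed penalty) for a user-supplied penalty function. The function names are ours. With an intercept-only design and λ = 0, the objective is piecewise linear with kinks at the observations, so scanning the observed values finds its minimizer, a sample τ-quantile; at τ = 0.5 this is the sample median, i.e., the LADR fit.

```python
import numpy as np

def check_loss(u, tau):
    """Check loss rho_tau(u) = u * (tau - I(u < 0))."""
    u = np.asarray(u, dtype=float)
    return u * np.where(u < 0, tau - 1.0, tau)

def penalized_qr_objective(beta, X, y, tau, lam, penalty):
    """sum_i rho_tau(y_i - x_i^T beta) + lam * sum_j P(|beta_j|)."""
    beta = np.asarray(beta, dtype=float)
    loss = np.sum(check_loss(y - X @ beta, tau))
    return float(loss + lam * np.sum(penalty(np.abs(beta))))

def intercept_only_fit(y, tau):
    """Minimize (1) with X = a column of ones by scanning the observed values."""
    y = np.asarray(y, dtype=float)
    X = np.ones((y.size, 1))
    losses = [penalized_qr_objective([c], X, y, tau, 0.0, np.abs) for c in y]
    return float(y[int(np.argmin(losses))])

y = np.array([1.0, 5.0, 2.0, 8.0, 3.0])
print(intercept_only_fit(y, 0.5))  # 3.0, the sample median
print(intercept_only_fit(y, 0.3))  # 2.0, the sample 0.3-quantile
```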

For the rest of this paper, the subscript τ is omitted for notational convenience.

3. Penalized Linear QR through PACS (QR-PACS)

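In this section, we incorporate PACS into the optimization of (1) to propose QR-PACS for simultaneous group identification and VS in QR. Under the QR setup, the predictors are standardized and the response is centered, so that the response and all predictors have sample mean zero and the predictors are on a common scale. QR-PACS encourages correlated variables to have equal coefficient values. Equality of coefficients is attained by adding a group-identification penalty on the pairwise differences and sums of coefficients. The QR-PACS estimates are defined as the minimizers of

$$\sum_{i=1}^{n} \rho_\tau\left(y_i - x_i^T\beta\right) + \lambda\Big\{\sum_{j=1}^{p} w_j|\beta_j| + \sum_{1 \le j < k \le p} w_{jk(-)}|\beta_k - \beta_j| + \sum_{1 \le j < k \le p} w_{jk(+)}|\beta_j + \beta_k|\Big\}, \qquad (4)$$

where $w_j$, $w_{jk(-)}$, and $w_{jk(+)}$ are nonnegative weights.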
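Because the check loss is piecewise linear and every PACS term is a weighted absolute value of a linear function of β, problem (4) can be solved exactly as a linear program. The sketch below uses that reformulation with SciPy's `linprog` and the HiGHS solver; it is our illustration, not necessarily the algorithm used by the authors, and the function name `qr_pacs_lp` and the weight layout (a length-p vector `w` and p × p arrays `w_minus`, `w_plus` read above the diagonal) are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def qr_pacs_lp(X, y, tau, lam, w, w_minus, w_plus):
    """Solve the QR-PACS problem (4) as a linear program (illustrative sketch).

    w has length p; w_minus[j][k] and w_plus[j][k] are read for j < k only.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    e = np.eye(p)
    rows, weights = [], []
    for j in range(p):                       # sparsity terms |beta_j|
        rows.append(e[j])
        weights.append(w[j])
    for j in range(p):
        for k in range(j + 1, p):
            rows.append(e[k] - e[j])         # |beta_k - beta_j|
            weights.append(w_minus[j][k])
            rows.append(e[j] + e[k])         # |beta_j + beta_k|
            weights.append(w_plus[j][k])
    D = np.array(rows)
    m = D.shape[0]
    # Variables z = [beta (free), u >= 0, v >= 0, t >= 0] with y - X beta = u - v,
    # so tau*u + (1 - tau)*v equals the check loss at the optimum, and t >= |D beta|.
    c = np.concatenate([np.zeros(p), np.full(n, tau), np.full(n, 1.0 - tau),
                        lam * np.asarray(weights, dtype=float)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n), np.zeros((n, m))])
    A_ub = np.vstack([np.hstack([D, np.zeros((m, 2 * n)), -np.eye(m)]),
                      np.hstack([-D, np.zeros((m, 2 * n)), -np.eye(m)])])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n + m)
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(2 * m), A_eq=A_eq, b_eq=y,
                  bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p]

# Two identical predictor columns: with equal weights, any lam > 0 forces equal coefficients.
x = np.arange(1.0, 7.0)
X = np.column_stack([x, x])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.0])
ones = np.ones((2, 2))
print(qr_pacs_lp(X, y, tau=0.5, lam=0.1, w=np.ones(2), w_minus=ones, w_plus=ones))
```

The LP has p + 2n + p² variables, which is small for data sets like the NCAA example; for much larger p a specialized algorithm would be preferable.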

The PACS penalty in (4) consists of $\sum_{j} w_j|\beta_j|$, which encourages sparsity, and $\sum_{j<k} w_{jk(-)}|\beta_k - \beta_j|$ and $\sum_{j<k} w_{jk(+)}|\beta_j + \beta_k|$, which are used for group identification and encourage equality of coefficients. The second term encourages coefficients of the same sign to be set equal, while the third term encourages coefficients of opposite sign to be set equal in magnitude.
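To see how the three parts of the penalty act, take p = 2 with weight $w_{12(-)}$ on $|\beta_2 - \beta_1|$ and $w_{12(+)}$ on $|\beta_1 + \beta_2|$. The following worked calculation is ours; it evaluates the linear predictor and the penalty when the two coefficients are equal, and when they are equal in magnitude but opposite in sign.

```latex
% Case 1: equal coefficients, \beta_1 = \beta_2 = b
x_{i1}\beta_1 + x_{i2}\beta_2 = (x_{i1} + x_{i2})\,b,
\qquad
\lambda\{w_1|b| + w_2|b| + w_{12(-)}\cdot 0 + w_{12(+)}\cdot 2|b|\}
  = \lambda\,(w_1 + w_2 + 2w_{12(+)})\,|b|

% Case 2: equal magnitude, opposite sign, \beta_1 = -\beta_2 = b
x_{i1}\beta_1 + x_{i2}\beta_2 = (x_{i1} - x_{i2})\,b,
\qquad
\lambda\{w_1|b| + w_2|b| + w_{12(-)}\cdot 2|b| + w_{12(+)}\cdot 0\}
  = \lambda\,(w_1 + w_2 + 2w_{12(-)})\,|b|
```

In both cases the pair behaves like a single predictor, $x_{i1} + x_{i2}$ or $x_{i1} - x_{i2}$, with one coefficient $b$, and the pairwise term that measures disagreement between the two coefficients is zero.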

Choosing appropriate adaptive weights is very important for PACS. In QR-PACS, we employ adaptive weights that incorporate correlations, as suggested by Sharma et al. [18], with a small modification:

$$w_j = |\tilde\beta_j|^{-1}, \qquad w_{jk(-)} = (1 - r_{jk})^{-1}\,|\tilde\beta_k - \tilde\beta_j|^{-1}, \qquad w_{jk(+)} = (1 + r_{jk})^{-1}\,|\tilde\beta_k + \tilde\beta_j|^{-1}, \qquad 1 \le j < k \le p, \qquad (5)$$

where $\tilde\beta$ is a consistent estimator of $\beta$, such as the PACS [18] estimates or other shrinkage QR estimates, and $r_{jk}$ is the biweight midcorrelation between predictors $x_j$ and $x_k$. We propose using the biweight midcorrelation [19, 20] instead of the Pearson correlation used in the adaptive weights of [18] in order to obtain robust correlations and hence robust weights.
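The correlation-adjusted adaptive weights described above can be computed as in the following NumPy sketch. The biweight midcorrelation is written in its usual ratio form with tuning constant c = 9 (see [19]); the function names and the small `eps` used to avoid division by zero when two initial estimates coincide are our own choices, not part of the method as published.

```python
import numpy as np

def biweight_midcorrelation(x, y, c=9.0):
    """Biweight midcorrelation of two samples; c = 9 is the usual tuning constant."""
    def biweight_deviations(z):
        z = np.asarray(z, dtype=float)
        med = np.median(z)
        mad = np.median(np.abs(z - med))       # assumes mad > 0
        u = (z - med) / (c * mad)
        keep = np.abs(u) < 1                    # points beyond c MADs get weight 0
        return np.where(keep, (z - med) * (1 - u**2) ** 2, 0.0)
    a, b = biweight_deviations(x), biweight_deviations(y)
    return float(np.sum(a * b) / np.sqrt(np.sum(a**2) * np.sum(b**2)))

def qr_pacs_weights(X, beta_init, eps=1e-8):
    """Adaptive weights w_j, w_jk(-), w_jk(+); eps guards against division by zero."""
    X = np.asarray(X, dtype=float)
    b = np.asarray(beta_init, dtype=float)
    p = X.shape[1]
    w = 1.0 / np.maximum(np.abs(b), eps)
    w_minus = np.zeros((p, p))
    w_plus = np.zeros((p, p))
    for j in range(p):
        for k in range(j + 1, p):
            r = biweight_midcorrelation(X[:, j], X[:, k])
            w_minus[j, k] = 1.0 / max((1.0 - r) * abs(b[k] - b[j]), eps)
            w_plus[j, k] = 1.0 / max((1.0 + r) * abs(b[k] + b[j]), eps)
    return w, w_minus, w_plus

# One gross outlier flips the sign of Pearson's r, while the biweight midcorrelation stays high.
x = np.arange(20.0)
y = x.copy()
y[0] = 1000.0
print(np.corrcoef(x, y)[0, 1], biweight_midcorrelation(x, y))
```

When two initial estimates coincide, the corresponding $w_{jk(-)}$ becomes very large, so the fit is pushed strongly toward equal coefficients for that pair.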

In this paper, ridge QR estimates were used as the initial estimates $\tilde\beta_j$ in order to obtain weights that perform well in settings with collinear predictors.
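The text does not specify how the ridge QR estimates were computed. One simple option, shown here as a sketch, is a majorize-minimize (MM) iteration: writing the check loss as |r|/2 + (τ − 1/2)r and bounding |r|/2 by a quadratic turns each step into a weighted ridge regression. The function name `ridge_qr`, the least-squares start, the floor `eps`, and the fixed iteration count are our assumptions, and the result is an approximate minimizer.

```python
import numpy as np

def ridge_qr(X, y, tau, lam, n_iter=200, eps=1e-6):
    """Approximately minimize sum_i rho_tau(y_i - x_i^T b) + lam * ||b||^2 by MM."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    p = X.shape[1]
    beta = np.linalg.lstsq(X, y, rcond=None)[0]     # least-squares starting value
    for _ in range(n_iter):
        r = y - X @ beta
        a = np.maximum(np.abs(r), eps)              # floor avoids division by zero
        # rho_tau(r) = |r|/2 + (tau - 1/2) r and |r|/2 <= r^2/(4a) + a/4,
        # so each step minimizes a weighted least-squares surrogate plus the ridge term.
        W = 1.0 / (2.0 * a)
        lhs = X.T @ (W[:, None] * X) + 2.0 * lam * np.eye(p)
        rhs = X.T @ (W * y) + (tau - 0.5) * X.sum(axis=0)
        beta = np.linalg.solve(lhs, rhs)
    return beta

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 4))
y = X @ np.array([1.0, 1.0, 0.0, -0.5]) + rng.standard_normal(100)
beta_tilde = ridge_qr(X, y, tau=0.5, lam=1.0)   # initial estimates for the adaptive weights
print(beta_tilde)
```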

4. Simulation Study

In this section, five examples are used to assess QR-PACS by comparing it with existing selection approaches under the QR setting, in terms of both prediction accuracy and model discovery. Data were generated from the linear regression model

$$y_i = x_i^T\beta + \varepsilon_i, \qquad i = 1, \dots, n.$$

In all examples, the predictors and the error term were standard normal.

We compared QR-PACS with ridge QR, QR-Lasso, QR-SCAD, QR-adaptive Lasso, and QR-elastic-net. Prediction accuracy was measured by the model error (ME), defined as $\mathrm{ME} = (\hat\beta - \beta)^T V (\hat\beta - \beta)$, where $V$ is the population covariance matrix of X, and model discovery was assessed by the complexity of the resulting model. The median and standard error (SE) of ME were reported. Selection accuracy (SA, % of true models identified), grouping accuracy (GA, % of true groups identified), and the % of both selection and grouping accuracy (SGA) were also computed and reported. Note that none of ridge QR, QR-Lasso, QR-SCAD, QR-adaptive Lasso, and QR-elastic-net performs grouping. The sample size was 100, and each simulated model was replicated 100 times. The examples are described as follows.
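The evaluation criteria above can be computed for each replication as in the following NumPy sketch. The ME formula is as defined in the text; the exact-support rule for SA and the partition-by-absolute-value rule for GA are our reading of "true models identified" and "true groups identified", so the helper names and the rounding tolerance are hypothetical.

```python
import numpy as np

def model_error(beta_hat, beta_true, V):
    """ME = (beta_hat - beta)^T V (beta_hat - beta), V = population covariance of X."""
    d = np.asarray(beta_hat, dtype=float) - np.asarray(beta_true, dtype=float)
    return float(d @ V @ d)

def selects_true_model(beta_hat, beta_true, tol=1e-8):
    """True if the estimated zero/nonzero pattern matches the true pattern exactly."""
    return bool(np.array_equal(np.abs(beta_hat) > tol, np.abs(beta_true) > tol))

def abs_value_groups(beta, decimals=6):
    """Partition the indices of nonzero coefficients by (rounded) absolute value."""
    groups = {}
    for j, value in enumerate(np.round(np.abs(np.asarray(beta, dtype=float)), decimals)):
        if value > 0:
            groups.setdefault(float(value), []).append(j)
    return sorted(groups.values())

def recovers_true_groups(beta_hat, beta_true, decimals=6):
    """Hypothetical grouping criterion: identical partitions by absolute value."""
    return abs_value_groups(beta_hat, decimals) == abs_value_groups(beta_true, decimals)

V = np.array([[1.0, 0.5], [0.5, 1.0]])
print(model_error([2.0, 2.0], [1.0, 1.0], V))  # 3.0
print(recovers_true_groups([0.5, 0.5, -0.5, 0.0, 0.2], [2.0, 2.0, -2.0, 0.0, 1.0]))  # True
```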

Example 1. In this example, we assumed the true parameters for the model of study as , . The first three predictors were highly correlated with correlation equal to 0.7 and their coefficients were equal in magnitude, while the rest were uncorrelated.

Example 2. In this example, the true coefficients were assumed as , . The first three predictors were highly correlated with correlation equal to 0.7 and their coefficients were different in magnitude, while the rest were uncorrelated.

Example 3. In this example, the true parameters were , . The first three predictors were highly correlated with correlation equal to 0.7 and their coefficients were equal in magnitude, while the second three predictors had correlation 0.3 and coefficients different in magnitude. The remaining predictors were uncorrelated.

Example 4. In this example, the true parameters were , . The first three predictors were correlated with correlation equal to 0.3 and their coefficients were equal in magnitude, while the second three predictors had correlation 0.7 and coefficients different in magnitude. The remaining predictors were uncorrelated.

Example 5. In this example, the true parameters were assumed as , . The first three and the second two predictors were highly correlated with correlation equal to 0.7 and their coefficients were different in magnitude, while the rest were uncorrelated.
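The simulation designs above, in which blocks of predictors share a common pairwise correlation and the remaining predictors are independent, can be generated as in this NumPy sketch. The true coefficient vectors of the examples are not reproduced here, so the β used below is a hypothetical placeholder; the function names and the block specification format are ours. The returned covariance matrix is the V needed for the model error.

```python
import numpy as np

def block_covariance(p, blocks):
    """Identity covariance with exchangeable correlation rho inside each block of indices."""
    V = np.eye(p)
    for idx, rho in blocks:
        for j in idx:
            for k in idx:
                if j != k:
                    V[j, k] = rho
    return V

def simulate(n, beta, blocks, rng):
    """Draw X ~ N(0, V) and y = X beta + eps with standard normal errors."""
    beta = np.asarray(beta, dtype=float)
    V = block_covariance(beta.size, blocks)
    X = rng.multivariate_normal(np.zeros(beta.size), V, size=n)
    y = X @ beta + rng.standard_normal(n)
    return X, y, V

# Hypothetical coefficients (not the paper's values): one group of three equal coefficients.
beta = np.array([2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0])
rng = np.random.default_rng(2019)
X, y, V = simulate(100, beta, blocks=[((0, 1, 2), 0.7)], rng=rng)
```

For a design like Example 3, one would pass `blocks=[((0, 1, 2), 0.7), ((3, 4, 5), 0.3)]`.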

For all the parameter settings considered, Table 1 shows that QR-PACS has the lowest ME. Although QR-elastic-net and QR-adaptive Lasso have the highest SA, none of the considered methods except QR-PACS performs grouping. QR-PACS successfully identifies the groups of predictors, as seen in the GA and SGA rows.

Table 2 reports the percentage of no-grouping (NG, no groups found) and the percentage of selection and no-grouping (SNG) instead of GA and SGA, respectively. In terms of prediction and selection, QR-PACS does not perform well, whereas QR-elastic-net, QR-adaptive Lasso, and QR-SCAD perform best, in that order. All the methods under consideration perform well in terms of not identifying spurious groups. Thus, QR-PACS is not recommended when predictors are highly correlated but the significant variables do not form a group.

Table 3 shows that QR-elastic-net and QR-adaptive Lasso have the best SA, in that order; however, QR-PACS performs better in terms of ME. QR-PACS also identifies the important group with high GA and SGA.

Table 4 shows that QR-elastic-net, QR-adaptive Lasso, and QR-SCAD have the best SA, whereas QR-PACS has the lowest ME of all the methods. QR-PACS also performs well in terms of GA and SGA.

Table 5 shows that QR-elastic-net has the best SA, while QR-PACS has excellent GA and SGA, successfully identifying the groups of predictors.

5. NCAA Sports Data

In this section, QR-PACS is compared with ridge QR, QR-Lasso, QR-SCAD, QR-adaptive Lasso, and QR-elastic-net in the analysis of the NCAA sports data [21]. We standardized the predictors and centered the response before the analysis.

The data were randomly split 100 times into a training set and a testing set containing 20% of the observations, to allow more stable comparisons; in each repetition, the models were fitted to the training set. We report the average and SE of the ratio of test error (RTE) of each method relative to unpenalized QR, and the effective model size (MZ) after accounting for equality of absolute coefficient estimates.
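The evaluation protocol can be sketched as follows. The text does not state which loss defines the test error, so using the mean check loss on the held-out 20% is our assumption, as are the helper names; the effective model size counts distinct nonzero absolute coefficient values after rounding, so coefficients fused into one group count once.

```python
import numpy as np

def check_loss(u, tau):
    """Check loss rho_tau(u) = u * (tau - I(u < 0))."""
    u = np.asarray(u, dtype=float)
    return u * np.where(u < 0, tau - 1.0, tau)

def heldout_error(beta, X_test, y_test, tau):
    """Mean check loss on held-out data (assumed error measure)."""
    return float(np.mean(check_loss(y_test - X_test @ np.asarray(beta, dtype=float), tau)))

def ratio_of_test_error(beta_method, beta_qr, X_test, y_test, tau):
    """RTE: held-out error of a method divided by that of unpenalized QR."""
    return (heldout_error(beta_method, X_test, y_test, tau)
            / heldout_error(beta_qr, X_test, y_test, tau))

def effective_model_size(beta, decimals=6):
    """Number of distinct nonzero absolute coefficient values."""
    a = np.round(np.abs(np.asarray(beta, dtype=float)), decimals)
    return int(np.unique(a[a > 0]).size)

def random_split(n, test_frac, rng):
    """Random train/test index split with round(test_frac * n) test cases."""
    perm = rng.permutation(n)
    n_test = int(round(test_frac * n))
    return perm[n_test:], perm[:n_test]

train, test = random_split(94, 0.2, np.random.default_rng(0))
print(len(train), len(test))                                   # 75 19
print(effective_model_size([0.5, -0.5, 0.5, 0.0, 1.2, 0.3]))   # 3
```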

The NCAA data were taken from a study of the effects of sociodemographic indicators and sports programs on graduation rates.

The data contain n = 94 observations and p = 19 predictors. The dataset and its description are available from the website (http://www4.stat.ncsu.edu/~boos/var.select/ncaa.html). The predictors are students in top 10% HS ($x_1$), ACT COMPOSITE ($x_2$), On living campus ($x_3$), first-time undergraduates ($x_4$), Total Enrolment/1000 ($x_5$), courses taught by TAs ($x_6$), composite of basketball ranking ($x_7$), in-state tuition/1000 ($x_8$), room and board/1000 ($x_9$), avg BB home attendance ($x_{10}$), Professor Salary ($x_{11}$), Ratio of Student/faculty ($x_{12}$), white ($x_{13}$), Assistant professor salary ($x_{14}$), population of city ($x_{15}$), faculty with PHD ($x_{16}$), Acceptance rate ($x_{17}$), receiving loans ($x_{18}$), and Out of state ($x_{19}$).

Table 6 indicates that QR-PACS does significantly better than ridge QR, QR-Lasso, QR-SCAD, QR-adaptive Lasso, and QR-elastic-net in test error. In fact, those five methods perform worse than unpenalized QR in test error. The effective model size of QR-PACS is 5, although all variables are included in its model; that is, its nonzero coefficients take only five distinct absolute values.

6. Conclusions

In this paper, QR-PACS is developed for group identification and VS under QR settings. It combines the strengths of QR with the ability of PACS to perform consistent group identification and VS, achieving both goals simultaneously, and thus extends PACS from mean regression to QR. Computationally, QR-PACS can be carried out easily with an efficient algorithm. Simulations and a real-data analysis show that QR-PACS can yield promising predictive accuracy while identifying related groups of predictors. Possible extensions of this work include QR-PACS under a Bayesian framework and a robust version of QR-PACS.

Data Availability

The data analyzed in this paper are the NCAA sports data from Mangold et al. [21], which are publicly available from the website (http://www4.stat.ncsu.edu/~boos/var.select/ncaa.html).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

References

[1] R. Koenker and G. Bassett Jr., “Regression quantiles,” Econometrica, vol. 46, no. 1, pp. 33–50, 1978.
[2] R. Tibshirani, “Regression shrinkage and selection via the lasso,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 58, no. 1, pp. 267–288, 1996.
[3] J. Fan and R. Li, “Variable selection via nonconcave penalized likelihood and its oracle properties,” Journal of the American Statistical Association, vol. 96, no. 456, pp. 1348–1360, 2001.
[4] R. Tibshirani, M. Saunders, S. Rosset, J. Zhu, and K. Knight, “Sparsity and smoothness via the fused lasso,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 67, no. 1, pp. 91–108, 2005.
[5] H. Zou and T. Hastie, “Regularization and variable selection via the elastic net,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 67, no. 2, pp. 301–320, 2005.
[6] M. Yuan and Y. Lin, “Model selection and estimation in regression with grouped variables,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 68, no. 1, pp. 49–67, 2006.
[7] H. Zou, “The adaptive lasso and its oracle properties,” Journal of the American Statistical Association, vol. 101, no. 476, pp. 1418–1429, 2006.
[8] H. Zou and H. H. Zhang, “On the adaptive elastic-net with a diverging number of parameters,” The Annals of Statistics, vol. 37, no. 4, pp. 1733–1751, 2009.
[9] C. H. Zhang, “Nearly unbiased variable selection under minimax concave penalty,” The Annals of Statistics, vol. 38, no. 2, pp. 894–942, 2010.
[10] R. Koenker, “Quantile regression for longitudinal data,” Journal of Multivariate Analysis, vol. 91, no. 1, pp. 74–89, 2004.
[11] H. Wang, G. Li, and G. Jiang, “Robust regression shrinkage and consistent variable selection through the LAD-Lasso,” Journal of Business & Economic Statistics, vol. 25, no. 3, pp. 347–355, 2007.
[12] Y. Li and J. Zhu, “L1-norm quantile regression,” Journal of Computational and Graphical Statistics, vol. 17, no. 1, pp. 163–185, 2008.
[13] Y. Wu and Y. Liu, “Variable selection in quantile regression,” Statistica Sinica, vol. 19, no. 2, pp. 801–817, 2009.
[14] M. Slawski, “The structured elastic net for quantile regression and support vector classification,” Statistics and Computing, vol. 22, no. 1, pp. 153–168, 2012.
[15] A. Belloni and V. Chernozhukov, “ℓ1-penalized quantile regression in high-dimensional sparse models,” 2011.
[16] L. Wang, Y. Wu, and R. Li, “Quantile regression for analyzing heterogeneity in ultra-high dimension,” Journal of the American Statistical Association, vol. 107, no. 497, pp. 214–222, 2012.
[17] B. Peng and L. Wang, “An iterative coordinate descent algorithm for high-dimensional nonconvex penalized quantile regression,” Journal of Computational and Graphical Statistics, vol. 24, no. 3, pp. 676–694, 2015.
[18] D. B. Sharma, H. D. Bondell, and H. H. Zhang, “Consistent group identification and variable selection in regression with correlated predictors,” Journal of Computational and Graphical Statistics, vol. 22, no. 2, pp. 319–340, 2013.
[19] R. Wilcox, Introduction to Robust Estimation and Hypothesis Testing, Academic Press, San Diego, Calif, USA, 2nd edition, 1997.
[20] A. Alkenani and K. Yu, “A comparative study for robust canonical correlation methods,” Journal of Statistical Computation and Simulation, vol. 83, no. 4, pp. 690–720, 2013.
[21] W. D. Mangold, L. Bean, and D. Adams, “The impact of intercollegiate athletics on graduation rates among major NCAA division I universities: implications for college persistence theory and practice,” Journal of Higher Education, vol. 74, no. 5, pp. 540–563, 2003.

Copyright

Copyright © 2019 Ali Alkenani and Basim Shlaibah Msallam. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
