Method for Solving LASSO Problem Based on Multidimensional Weight
In data mining, the analysis of high-dimensional data is a critical but thorny research topic. Traditional methods generally employ stepwise regression with information criteria to choose the optimal model; the LASSO (least absolute shrinkage and selection operator) avoids the limitations of this approach. The improved-LARS (least angle regression) algorithm solves the LASSO effectively. This paper presents an improved-LARS algorithm that is built on multidimensional weights and is intended to solve the LASSO problem. Specifically, to distinguish the impact of each variable in the regression, we separately introduce part of principal component analysis (Part_PCA), Independent Weight evaluation, and CRITIC into our proposal. These methods change the regression track by weighting every individual, which optimizes both the approach direction and the selection of the approach variable. As a consequence, the proposed algorithm yields better results in the promising direction. Furthermore, we illustrate the properties of the LARS algorithm based on multidimensional weight on the Pima Indians Diabetes data set. The experimental results show an attractive performance improvement over the improved-LARS algorithm when both are subjected to the same threshold value.
1. Introduction
Data mining has shown its appeal in the era of big data, and the question of how to mine useful information from massive data with mathematical statistical models has gained much attention in academia [1, 2]. In a linear model, model error is usually the result of a missing key variable. At the beginning of modeling, generally, the more variables (the attribute set) are chosen, the smaller the model error is. In the process of modeling, however, we need to find the attribute set with the greatest explanatory power for the response, that is, to improve the prediction precision and accuracy of the model through variable selection. Linear regression analysis is the most widely used of all statistical techniques; its accuracy depends mainly on the selection of variables and the values of the regression coefficients. LASSO is an estimation method that can simplify the index set. In 1996, inspired by ridge regression (Frank and Friedman, 1993) and the Nonnegative Garrote (Breiman, 1995), Tibshirani proposed a new method of variable selection. The idea of this method is to minimize the sum of squared residuals under the constraint that the sum of the absolute values of the regression coefficients is less than a constant, which acts as a penalty function that shrinks the coefficients. As a kind of shrinkage estimator, the LASSO has higher detection accuracy and better consistency of parameter convergence. Efron et al. (2002) proposed the LARS algorithm to support the solution of the LASSO, and they proposed the improved-LARS algorithm (2004) to eliminate sign reversals of the regression coefficient β and solve the LASSO better. The improved-LARS algorithm regresses stepwise; along each path, the correlation between the current residual and all the selected variables stays the same. It also satisfies the LASSO requirement that the solution keep the same direction as the current approach direction, which ensures the optimal result at low algorithmic complexity.
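Zou (2006) introduced the adaptive LASSO, which uses different tuning parameters for different regression coefficients. He suggests minimizing the following objective function:
$$\sum_{i=1}^{n}\Big(y_i - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2 + \sum_{j=1}^{p} \lambda_j |\beta_j|$$
where $\lambda_j$ is the tuning parameter for the $j$th regression coefficient.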
Keerthi and Shevade (2007) proposed a fast tracking algorithm for LASSO/LARS; it approximates the logistic regression loss by a piecewise quadratic function.
Charbonnier et al. (2010) suggest that β has an internal structure that describes classes of connectivity between the variables. They present the weighted-LASSO method to infer the parameters of a first-order vector autoregressive model that describes time course expression data generated by directed gene-to-gene regulation networks.
Because the LASSO minimizes the sum of squared residuals, it is sensitive to outliers; the least absolute deviation (LAD) estimator is a robust alternative to the OLS estimate. Building on it, Jung (2011) proposed a robust LASSO estimator that is not sensitive to outliers, heavy-tailed errors, or leverage points.
Bergersen et al. (2011) found that a large value of , the regression coefficient for variable , is subject to a larger penalty and therefore is less likely to be included in the model, and vice versa . They proposed to use weighted-LASSO with integrated relevant external information on the covariates to guide the selection towards more stable results.
Arslan (2012) found that, compared with the LAD-LASSO method, the weighted LAD-LASSO (WLAD-LASSO) method resists heavy-tailed errors and outliers in the explanatory variables.
The LASSO problem is a convex minimization problem, and the forward-backward splitting method is an important tool for solving it. Salzo and Villa (2012) proposed an accelerated version that improves the convergence of the method.
Zhou et al. (2013) proposed an alternative selection procedure based on the kernelized LARS-LASSO method. By formulating the RBF neural network as a linear-in-the-parameters model, they derived an $\ell_1$-constrained objective function for training the network.
Zhao et al. (2015) added two tuning parameters and to the wavelet-based weighted-LASSO methods. The tuning parameter controls the model sparsity. The choice of controls the optimal level of wavelet decomposition for the functional data. They improved wavelet-based LASSO by adding a prescreening step prior to the model fitting or, alternatively, by using a weighted version of wavelet-based LASSO .
Salama et al. (2016) proposed a new LASSO algorithm, the minimum variance distortionless response (MVDR) LARS-LASSO, which solves the direction-of-arrival (DOA) estimation problem in the compressed sensing (CS) framework.
In light of the superior performance achieved in the works above for solving the LASSO problem, this paper extends the idea to a multidimensional weight LARS. Our main contributions are as follows:
(i) In the solving process of LASSO, each attribute in the evaluation population has a different relative importance to the overall evaluation. This relative importance has two aspects: not all attributes influence the regression results, and each individual in the regression model has a different weight. When the improved-LARS algorithm calculates the equiangular vector, we distinguish the effects of the different attribute variables by considering the joint correlation between the regression variables and the surplus variables.
(ii) We evaluate the proposed method with experimental evidence from the Pima Indians Diabetes data set and two sets of evaluation indices.
In Section 2, we briefly introduce the LASSO problem and the improved-LARS algorithm, including theory and definitions. In Section 3, we put forward the LARS algorithm based on a multidimensional weighting model, which calculates the direction and variables from the weighted variables and accelerates the approximation process in the promising direction. In Section 4, we introduce the data set and evaluation indicators used to verify the algorithm and discuss the experimental results. Section 5 gives the summary and outlook of this paper.
2. LASSO Problem and Improved-LARS Algorithm
2.1. The Definition of LASSO
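Suppose that there are $p$ multidimensional variables $x_1, x_2, \ldots, x_p$ and a response $y$, observed on $n$ samples. Each group of $(x_{i1}, x_{i2}, \ldots, x_{ip})$ has a corresponding $y_i$. The regression coefficient $\beta$ is estimated by $\hat{\beta}$, the value at which the sum of squared residuals is minimal. The LASSO linear regression model is defined by
$$y = X\beta + \varepsilon$$
$\beta$ is a $p$-dimensional column vector, the parameter to be estimated. The error vector $\varepsilon$ satisfies $E(\varepsilon) = 0$ and $\operatorname{Var}(\varepsilon) = \sigma^2$. Suppose the model is sparse, that is, most of the regression coefficients in $\beta$ are 0. Based on the observed data, variable selection identifies which coefficients are zero and estimates the other, nonzero parameters; in other words, it looks for the parameters that build a sparse model. The problem we need to solve, in matrix form, is defined by
$$\hat{\beta} = \arg\min_{\beta} \|y - X\beta\|_2^2 \quad \text{subject to} \quad \|\beta\|_1 \le t$$
where $t$ is the threshold value of the sum of the absolute values of the regression coefficients and $\|\cdot\|_1$ and $\|\cdot\|_2$ are the two types of regularization norms ($\ell_1$ and $\ell_2$).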
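As a minimal illustration of why an $\ell_1$ constraint yields sparse coefficients, the sketch below evaluates the penalized (Lagrangian) form of the LASSO objective and applies the soft-thresholding rule, which is the exact LASSO solution only in the special case of an orthonormal design. The function names, the penalty value `lam`, and the toy data are assumptions made for illustration; they do not come from the paper.

```python
import numpy as np

def lasso_objective(X, y, beta, lam):
    """Penalized LASSO objective: 0.5 * ||y - X beta||^2 + lam * ||beta||_1."""
    residual = y - X @ beta
    return 0.5 * residual @ residual + lam * np.abs(beta).sum()

def soft_threshold(z, lam):
    """Shrink each entry of z toward zero by lam; entries within lam become exactly 0."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

# Toy orthonormal design (X.T @ X = I), for which the penalized LASSO
# solution is soft_threshold(X.T @ y, lam). Hypothetical data.
rng = np.random.default_rng(0)
X, _ = np.linalg.qr(rng.normal(size=(50, 4)))
true_beta = np.array([3.0, 0.0, -2.0, 0.0])
y = X @ true_beta + 0.01 * rng.normal(size=50)

beta_ols = X.T @ y                          # least-squares estimate for orthonormal X
beta_lasso = soft_threshold(beta_ols, lam=0.5)
```

In this toy case the two coefficients whose true value is zero are set exactly to zero, while the others are shrunk toward zero by `lam`.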
2.2. The Improved-LARS Algorithm
The improved-LARS algorithm, which is based on the Forward Selection algorithm and the Forward Gradient algorithm, solves the LASSO problem well. The improved-LARS has an appropriate forward distance, lower complexity, and greater relevance of information. Figure 1 shows the basic steps of the algorithm.
(i) The improved-LARS continually calculates the correlation between each predictor $x_j$ and the current residual and finds the individual most correlated with the response, say $x_{j_1}$. It takes the largest step possible in the direction of this individual, using $x_{j_1}$ to approximate $y$.
(ii) The step continues until some other individual, say $x_{j_2}$, has the same correlation with the current residual as $x_{j_1}$. The improved-LARS then proceeds in the equiangular direction $u$ ($u$ is the direction between the two predictors $x_{j_1}$ and $x_{j_2}$).
(iii) When a third individual $x_{j_3}$ earns its way into the “most correlated” set, the improved-LARS proceeds equiangularly between $x_{j_1}$, $x_{j_2}$, and $x_{j_3}$, that is, along the “least angle direction,” until a fourth individual enters, and so forth. In high dimensions, the equiangular direction is the bisector of the selected vectors.
(iv) The procedure stops when the residual error is less than a threshold or when all the variables are involved in the approach.
Figure 1 gives an example of a two-dimensional problem. The algorithm starts with both coefficients equal to zero and first finds the individual more correlated with the response $y$, say $x_1$. It approximates along the direction of $x_1$ until the residual has the same correlation with $x_1$ and $x_2$. Then the approximating direction changes to the equiangular direction between $x_1$ and $x_2$.
3. LARS Based on Multidimensional Weight
3.1. Algorithm Analysis
In the stepwise regression of improved-LARS, the angle regression treats all selected variables as equally important. However, each individual of $X$ has a different weight in the regression model, because the indicators have different relative importance in the overall evaluation. We therefore take the correlation between each individual $x_j$ and the surplus variables into consideration and use it, together with the correlation between $x_j$ and $y$, as the condition for selecting the approximation individual.
In Figure 3, originally, we choose as the first approximating variable for . We take the predictor ’s contribution rate to the whole system as one approximating condition; the new correlation is where is the contribution rate; we will describe the calculation method of it in detail later; are are constants.
Because of the addition condition, it will inevitably increase the range of values about judgment condition. In order to keep the stability of the system, we limit the product in .
After transformation, it may indicate the possibility that could have been chosen to be increasing. For example, Figure 2 shows that the transformed gets closer to , reaching , and the transformed gets closer to , reaching . It also may indicate the possibility that could be selected to be reducing; the transformed gets far away to y, reaching .
On the basis of Figure 1, the new correlation will significantly change the approximation process. Figure 3 shows the predictor direction when adding the multidimensional weight: the possibility is selected of two predictors’ change. It starts with ; the correlation of is the same as that of residual ; when moving to , the approximation direction changes to the equiangular direction of and ; then it moves forward ; the approximation process is complete. Approximating path changes, and the regression variables and also change; we can get the new calculated by improved regression method.
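Applying the aforementioned process to a multidimensional high-order system, the collected feature indicators and objects are expressed as
$$X = (x_1, x_2, \ldots, x_p) = (x_{ij})_{n \times p}$$
where $n$ is the number of objects and $p$ is the number of feature indicators.
The collected result response is
$$y = (y_1, y_2, \ldots, y_n)^T$$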
There are many methods for calculating the weight in the regression process. Without loss of generality, a calculation method should be objective (free of manual intervention), relative, quantitative, and independent. We use part_PCA, Independence Weight, and CRITIC to control the regression process and verify the superiority of the algorithm.
3.1.1. Part of Principal Component Analysis
PCA uses an orthogonal transformation for dimension reduction in statistics [21, 22]. It transforms the data into a new coordinate system. The largest variance of the data projection lies along the first coordinate, called the first principal component; the second-largest variance lies along the second coordinate, called the second principal component; and so on. Standard PCA keeps the low-order principal components of the data set and ignores the high-order ones after the transform: it identifies the dominating factors and keeps the top m principal components whose overall information utilization rate is higher than 85%. Inspired by the principle of PCA, we instead preserve the values of all the components simultaneously and identify the variance contribution of each component; the part_PCA algorithm steps are as follows.
The correlation coefficient matrix R is information utilization of each feature
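The variance contribution of each component can be sketched as follows, under the assumption that it is the share of each eigenvalue of the correlation coefficient matrix R in the sum of all eigenvalues. The function name is hypothetical, and how part_PCA maps these component contributions onto per-attribute weights is not recoverable from the text here, so the sketch stops at the contribution rates.

```python
import numpy as np

def pca_contribution_rates(X):
    """Return the variance contribution rate of every principal component of X.

    X is an (n_samples, n_features) array. All components are kept, rather
    than only the leading ones that reach an 85% cumulative rate.
    """
    R = np.corrcoef(X, rowvar=False)               # correlation coefficient matrix
    eigenvalues = np.linalg.eigvalsh(R)[::-1]      # eigenvalues, largest first
    eigenvalues = np.clip(eigenvalues, 0.0, None)  # guard against tiny negative round-off
    return eigenvalues / eigenvalues.sum()
```

The returned rates are sorted from the first to the last principal component and sum to 1.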
3.1.2. Independence Weight
We rank the individuals by their multiple correlation coefficients, obtained by multiple regression [23, 24]: the greater the multiple correlation coefficient of an individual is, the more its information is repeated by the others, the smaller its information utilization is, and the smaller its weight should be. The calculation is
$$R_j = \operatorname{corr}(x_j, \hat{x}_j)$$
where $\hat{x}_j$ is the least-squares fit of $x_j$ on $X_{-j}$, and $X_{-j}$ is the rest matrix in $X$ except $x_j$.
Because $R_j$ is negatively related to the weight, we take the reciprocal of the multiple correlation coefficient as the score and obtain the weighting coefficient through normalization:
$$w_j = \frac{1/R_j}{\sum_{k=1}^{p} 1/R_k}$$
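A minimal sketch of the Independence Weight calculation follows, assuming that the multiple correlation coefficient of an attribute is the correlation between that attribute and its least-squares fit on all remaining attributes (with an intercept). The function name is hypothetical.

```python
import numpy as np

def independence_weights(X):
    """Weight each column by the normalized reciprocal of its multiple correlation coefficient."""
    n, p = X.shape
    R = np.empty(p)
    for j in range(p):
        target = X[:, j]
        # Regress column j on all remaining columns plus an intercept.
        rest = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(rest, target, rcond=None)
        fitted = rest @ coef
        # Multiple correlation coefficient: correlation between target and fit.
        R[j] = np.corrcoef(target, fitted)[0, 1]
    scores = 1.0 / R   # assumes every R[j] > 0, i.e. no column is unrelated to the rest
    return scores / scores.sum()
```

An attribute that is nearly a copy of another one gets a multiple correlation coefficient close to 1 and therefore a small weight, while an attribute that the others cannot explain gets a large weight.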
3.1.3. CRITIC
Building on the idea of Independence Weight, CRITIC is an objective weighting method proposed by Diakoulaki et al. It determines the objective weight of each individual by evaluating the contrast and the conflict between indicators [26, 27]. The standard deviation indicates the spread of the schemes on the same index and thus expresses the contrast intensity. The conflict between indicators is based on the correlation between them: the weaker the correlation, the stronger the conflict.
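The quantitative conflict between indicator $j$ and the other indicators is
$$\sum_{i=1}^{p} (1 - r_{ij})$$
where $r_{ij}$ is the correlation coefficient between indicators $i$ and $j$.
The amount of information that indicator $j$ includes is
$$C_j = \sigma_j \sum_{i=1}^{p} (1 - r_{ij})$$
where $\sigma_j$ is the standard deviation of indicator $j$.
The greater $C_j$ is, the greater the amount of information and the greater the relative importance of that indicator. The objective weight of indicator $j$ is
$$w_j = \frac{C_j}{\sum_{k=1}^{p} C_k}$$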
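The CRITIC weights can be sketched as below, following the usual formulation in which the amount of information of an indicator is its standard deviation multiplied by its summed conflict with the other indicators. The function name is hypothetical, and the sketch assumes the indicators have already been scaled to comparable ranges.

```python
import numpy as np

def critic_weights(X):
    """CRITIC objective weights for the columns (indicators) of X."""
    sigma = X.std(axis=0, ddof=1)       # contrast intensity of each indicator
    r = np.corrcoef(X, rowvar=False)    # correlation between indicators
    conflict = (1.0 - r).sum(axis=0)    # sum over i of (1 - r_ij) for each indicator j
    information = sigma * conflict      # amount of information of each indicator
    return information / information.sum()
```

An indicator receives a large weight when it varies strongly across samples and is weakly or negatively correlated with the other indicators.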
3.2. Algorithm Steps
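To obtain a numerically stable solution, $X$ and $y$ in (3) are standardized and preprocessed so that the intercept can be omitted, that is, $\sum_{i=1}^{n} y_i = 0$, $\sum_{i=1}^{n} x_{ij} = 0$, and $\sum_{i=1}^{n} x_{ij}^2 = 1$ for $j = 1, 2, \ldots, p$.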
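The preprocessing can be sketched as follows, assuming the convention usual for LARS: the response is centered, and every predictor column is centered and scaled to unit Euclidean length. The function name is hypothetical.

```python
import numpy as np

def standardize(X, y):
    """Center y; center each column of X and scale it to unit Euclidean length."""
    Xc = X - X.mean(axis=0)
    norms = np.linalg.norm(Xc, axis=0)
    norms[norms == 0.0] = 1.0           # leave constant columns at zero instead of dividing by 0
    return Xc / norms, y - y.mean()
```

With this scaling, the inner product of a predictor column with the residual is proportional to their correlation, which is why the later steps can work with inner products directly.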
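For $\mathcal{A} \subseteq \{1, 2, \ldots, p\}$, define the matrix
$$X_{\mathcal{A}} = (\cdots\ s_j x_j\ \cdots)_{j \in \mathcal{A}}$$
$X_{\mathcal{A}}$ collects the column vectors selected from $X$ whose indices satisfy $j \in \mathcal{A}$, each oriented in the same direction as its current correlation, where the signs $s_j$ equal $\pm 1$. Let
$$G_{\mathcal{A}} = X_{\mathcal{A}}^T X_{\mathcal{A}}, \qquad A_{\mathcal{A}} = (1_{\mathcal{A}}^T G_{\mathcal{A}}^{-1} 1_{\mathcal{A}})^{-1/2}$$
The equiangular vector is as follows:
$$u_{\mathcal{A}} = X_{\mathcal{A}} w_{\mathcal{A}}, \qquad w_{\mathcal{A}} = A_{\mathcal{A}} G_{\mathcal{A}}^{-1} 1_{\mathcal{A}}$$
$u_{\mathcal{A}}$ is the unit vector making equal angles, less than 90°, with the columns of $X_{\mathcal{A}}$. $1_{\mathcal{A}}$ is a vector of 1's of length $|\mathcal{A}|$, the size of $\mathcal{A}$. $w_{\mathcal{A}}$ is the equiangular contribution of each attribute selected in $\mathcal{A}$. In our method, $w_{\mathcal{A}}$ is processed by weighting to change the approach direction and the selection of the approach variable.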
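The equiangular vector described above can be sketched as follows, assuming the standard LARS construction of Efron et al. from the Gram matrix of the sign-adjusted active columns. The weighting that the proposed method applies to the equiangular coefficients is not included, and the function name and arguments are hypothetical.

```python
import numpy as np

def equiangular_vector(X, active, signs):
    """Return (u, w, A) for the active columns of X with the given signs (+1 or -1).

    u is the unit equiangular vector, w the equiangular coefficients, and
    A the normalizing constant; X is assumed to have unit-length columns.
    """
    XA = X[:, active] * signs                 # sign-adjusted active columns
    G = XA.T @ XA                             # Gram matrix of the active columns
    ones = np.ones(len(active))
    G_inv_ones = np.linalg.solve(G, ones)
    A = 1.0 / np.sqrt(ones @ G_inv_ones)      # normalizing constant
    w = A * G_inv_ones                        # equiangular coefficients
    u = XA @ w                                # unit vector with equal angles to all columns of XA
    return u, w, A
```

By construction, every sign-adjusted active column has the same inner product `A` with `u`, which is what "making equal angles" means for unit-length columns.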
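We can now describe the improved-LARS based on multidimensional weight in more detail. We begin at $\hat{\mu}_0 = 0$ and build up the estimate $\hat{\mu}$ by steps. Suppose that $\hat{\mu}_{\mathcal{A}}$ is the current estimate and that
$$\hat{c} = X^T (y - \hat{\mu}_{\mathcal{A}})$$
is the current correlation between the predictors and the residual of the response vector.
The active set $\mathcal{A}$ is the set of indices corresponding to covariates with the greatest absolute current correlations,
$$\hat{C} = \max_j \{|\hat{c}_j|\}, \qquad \mathcal{A} = \{j : |\hat{c}_j| = \hat{C}\}$$
with $s_j = \operatorname{sign}\{\hat{c}_j\}$ for $j \in \mathcal{A}$; the vector $u_{\mathcal{A}}$ corresponding to $\mathcal{A}$ is the approximate direction. Let
$$a \equiv X^T u_{\mathcal{A}}$$
The length of the approach along the direction $u_{\mathcal{A}}$ is now
$$\hat{\gamma} = \min_{j \in \mathcal{A}^c}{}^{+} \left\{ \frac{\hat{C} - \hat{c}_j}{A_{\mathcal{A}} - a_j},\ \frac{\hat{C} + \hat{c}_j}{A_{\mathcal{A}} + a_j} \right\}$$
where $A_{\mathcal{A}}$ is the normalizing constant of the equiangular vector $u_{\mathcal{A}}$.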
“” indicates the calculation of the minimum of positive components within each choice of in this approximating process. Each predictor in corresponds to increase in ; we add weight to control approach direction.
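The step-length rule can be sketched as below, assuming the unweighted LARS rule of Efron et al.: among the predictors not yet in the active set, take the smallest positive step at which one of them reaches the same absolute correlation as the active predictors. The function name and argument names are hypothetical, and the weighting of the proposed method is not included. If no inactive predictor remains, the sketch returns the full step `C_max / A`, the usual convention for the last LARS step.

```python
import numpy as np

def lars_step_length(c, a, C_max, A, active):
    """Smallest positive step at which an inactive predictor ties the active correlation.

    c: current correlations X.T @ (y - mu); a: X.T @ u for the equiangular vector u;
    C_max: largest absolute current correlation; A: normalizing constant of u;
    active: indices already in the active set.
    Returns (gamma, j), where j is the index of the entering predictor (or None).
    """
    best_gamma, best_j = C_max / A, None
    for j in range(len(c)):
        if j in active:
            continue
        # Two candidates per predictor: entering with positive or negative sign.
        for num, den in ((C_max - c[j], A - a[j]), (C_max + c[j], A + a[j])):
            if den > 1e-12:
                gamma = num / den
                if 1e-12 < gamma < best_gamma:   # keep only positive candidates
                    best_gamma, best_j = gamma, j
    return best_gamma, best_j
```

After moving the estimate by `gamma` along `u`, the entering predictor and the active predictors have the same absolute correlation with the new residual.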
Part_PCA, Independent Weight evaluation, and CRITIC are added in the process of improved-LARS algorithm, respectively. Three parallel algorithms are established to calculate the weight of each attribute.
The centralized weight indicates the impact of weight on the approach direction after the aforementioned three methods.
is the same-dimension weight matrix from ; the approach direction estimate is as the following:
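Then the new active set is
$$\mathcal{A}_{+} = \mathcal{A} \cup \{\hat{j}\}$$
$\hat{j}$ is the minimizing index in (25) of $\hat{\gamma}$. To conform to the requirement of the LASSO solution that the track keep the same direction as the current approach direction, the step size at which the first opposite sign appears is
$$\tilde{\gamma} = \min_{\gamma_j > 0} \{\gamma_j\}, \qquad \gamma_j = -\hat{\beta}_j / (s_j w_{\mathcal{A}j})$$
When $\tilde{\gamma} < \hat{\gamma}$, an opposite sign occurs, and the corresponding index $\tilde{j}$ is removed from $\mathcal{A}$. Then the algorithm enters the next approximation process. Using $\hat{\mu}_{\mathcal{A}_{+}}$ instead of $\hat{\mu}_{\mathcal{A}}$, repeat the above steps until the residual error is less than a threshold or all the variables are involved in the approach. The pseudocode is shown in Pseudocode 1.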
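The opposite-sign check can be sketched as follows, assuming the LASSO modification of LARS from Efron et al.: compute, for every active coefficient, the positive step at which it would cross zero, and take the smallest one. The function name and arguments are hypothetical; the weighting of the proposed method would enter through the per-step change passed in as `d_active`.

```python
import numpy as np

def lasso_sign_change_step(beta_active, d_active):
    """First positive step at which an active coefficient would cross zero.

    beta_active: current coefficients of the active predictors;
    d_active: their change per unit step along the current direction.
    Returns (gamma_tilde, position), or (inf, None) if no coefficient crosses zero.
    """
    gamma_tilde, position = np.inf, None
    for k, (b, d) in enumerate(zip(beta_active, d_active)):
        if d != 0.0:
            gamma = -b / d                    # step at which b + gamma * d equals zero
            if 0.0 < gamma < gamma_tilde:
                gamma_tilde, position = gamma, k
    return gamma_tilde, position
```

If the returned step is smaller than the ordinary LARS step length, the move is cut short at that step and the predictor at `position` is dropped from the active set before the next iteration.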
Because the improved algorithm adds the weighting analysis, it requires more calculation steps, so the calculation time increases. However, the approach mechanism of each variable in the statistical model stays the same, so the space complexity is consistent with that of the original algorithm.
4. Experiment and Result Analysis
4.1. Introducing Data Set
Because the regression coefficients are compressed, the experimental data set should be sparse and should have one dependent variable that is easy to distinguish. We take as an example the Pima Indians Diabetes Data Set provided by the Applied Physics Laboratory of the Johns Hopkins University. The data set contains 768 records of diabetes-negative and diabetes-positive samples, each with 8 attribute variables and one classification value. In the classification value, “1” indicates that the diabetes test result is positive and “0” indicates that it is negative. We verify the algorithm with the 8 attribute variables as predictors and the classification value as the response. The goal of this test is to improve accuracy compared with the original LARS algorithm.
4.2. Verification Condition
We use the ROC curve to present the results so that the performance of the proposed method can be evaluated more intuitively. Whether a participant's diabetes test is positive or negative is a binary classification problem, and the testing results fall into the following four types:
TP (true positive): the testing result is positive and the actual condition is positive.
FP (false positive): the testing result is positive but the actual condition is negative.
TN (true negative): the testing result is negative and the actual condition is negative.
FN (false negative): the testing result is negative but the actual condition is positive.
From these four basic statistics of the ROC space, we take the following three measures as the inspection standards: ACC (accuracy), TPR (true positive rate), and NPV (negative predictive value):
$$\mathrm{ACC} = \frac{TP + TN}{TP + FP + TN + FN}, \qquad \mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{NPV} = \frac{TN}{TN + FN}$$
ACC is the proportion of correct results among all positive and negative samples. NPV is the proportion of correct detections among negative test results; that is, it measures how many of the people who tested negative actually are negative. TPR, also called the hit rate, is the proportion of people who actually are positive that are correctly detected as positive. ACC, TPR, and NPV tell us whether a result is better or worse than that of LARS.
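The three inspection standards can be computed from the four counts as in the sketch below, which uses the standard definitions of accuracy, true positive rate, and negative predictive value. The function name is hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (ACC, TPR, NPV) from the four confusion-matrix counts."""
    total = tp + fp + tn + fn
    acc = (tp + tn) / total if total else float("nan")       # correct results over all samples
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")      # detected positives over actual positives
    npv = tn / (tn + fn) if (tn + fn) else float("nan")      # true negatives over negative test results
    return acc, tpr, npv
```

Each ratio falls back to NaN when its denominator is zero, for example when no sample tests negative.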
Another criterion by which we judge the result is the SSR (sum of squared residuals). The smooth turning point of the SSR curve corresponds to the optimal regression coefficient of the predictor variable.
4.3. Experimental Result
The threshold increases from 0 to 1 with a step length of 0.01. We draw the curves of ACC, TPR, and NPV, with the negative or positive result as the dependent variable and the 8 attribute variables as the independent variables.
Figure 4 shows the accuracy and the comprehensive optimal value of three inspection standards after each cycle of LASSO with Pima Indians Diabetes.
Figure 4 shows that NPV improves in every case when the weighting is added to the LASSO solution: by 5.16% with part_PCA, 5.58% with Independence Weight, and 5.1% with CRITIC. TPR improves by 13% with part_PCA, 14% with Independence Weight, and 13% with CRITIC, because these methods change the approach direction of the algorithm. ACC improves by 0.32% with part_PCA and with Independence Weight, whereas CRITIC has no effect on ACC, which keeps its original value.
Therefore, the LASSO solutions obtained with Independence Weight are optimal, followed by those with CRITIC and part_PCA. The improved algorithm significantly increases NPV and TPR while ensuring that ACC is not reduced. Adding the weighting changes the approach direction and thereby improves the accuracy of the LASSO solutions.
Figure 5 shows the SSR (sum of squared residuals) of the response and the equiangular direction after each cycle of LASSO with the Pima Indians Diabetes data. The general trend of the SSR remains consistent, and the residual at the optimal coefficient remains almost the same, when the different weight judgments are added; adding the weighting therefore does not change the advantages of LASSO. If the optimal regression coefficient of a predictor increases significantly, the SSR changes too. Adding part_PCA increases the residual of the LASSO solution by 0.051, whereas adding CRITIC or Independence Weight reduces it by 0.021. The results show that the final regression result is closer to the real response when CRITIC or Independence Weight is added.
Synthesizing these three inspection standards, we find that adding the multidimensional weight to the approach mostly reduces the threshold value of the optimal solution (except for CRITIC). This means that the sum of the absolute values of the system's regression coefficients is bounded by a smaller threshold, so the algorithm meets the requirements in a more extreme threshold range.
Table 1 shows the regression coefficients of the original algorithm and of the variants that add part_PCA, Independence Weight, and CRITIC.
Table 2 shows the difference and innovation of the multidimensional weight LARS, improved-LARS associated with part_PCA; Independence Weight and CRITIC act on , changing the regression track and getting more accurate results.
5. Conclusion
In this paper, a method that considers both the variable selection and the approach direction of the LARS algorithm is used to solve the LASSO. We propose the LARS algorithm based on multidimensional weight to improve the accuracy of the LASSO solutions while keeping the advantages of LASSO parameter estimation: stable regression coefficients, a reduced number of parameters, and good consistency of parameter convergence. We verify the effectiveness of the algorithm with the Pima Indians Diabetes Data Set. The precision of the calculated weights suffers when the dimension of the individuals is large, so in later studies we need to further optimize the embedded weight algorithm to improve the accuracy and precision with which the weighted regression algorithm chooses the approach variable and direction.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This study was supported by the National Natural Science Foundation of China (61170192, 41271292), Chinese Postdoctoral Science Foundation (2015M580765), the Fundamental Research Funds for the Central Universities (XDJK2014C039, XDJK2016C045), Doctoral Fund of Southwestern University (swu1114033), and the Project of Scientific and Technological Research Program of Chongqing Municipal Education Commission (KJ1403106).
References
A. John and J. A. Rice, Mathematical Statistics and Data Analysis, Mathematical statistical physics Elsevier, 2006.
E. P. G. Box, S. J. Hunter, and W. G. Hunter, “Statistics for experimenters: an introduction to design, data analysis, and model building,” Statistics for Experimenters an Introduction to Design, vol. 73, no. 10, article S229, 2014.
Z. L. Ke, The Application of LASSO and Other Related Methods in Multiple Linear Regression Modle, Jiaotong University, Beijing, China, 2011.
R. Tibshirani, “Regression shrinkage and selection via the LASSO,” Journal of the Royal Statistical Society, vol. 58, no. 1, pp. 267–288, 1996.
C. Charbonnier, J. Chiquet, and C. Ambroise, “Weighted-LASSO for structured network inference from time course data,” Statistical Applications in Genetics & Molecular Biology, vol. 9, no. 1, article 15, 2010.
K. M. Jung, “Weighted least absolute deviation LASSO estimator,” Communications of the Korean Statistical Society, vol. 18, no. 6, pp. 733–739, 2011.
C. L. Bergersen, K. I. Glad, and H. Lyng, “Weighted LASSO with data integration,” Statistical Applications in Genetics & Molecular Biology, vol. 10, no. 1, pp. 1–29, 2011.
O. Arslan, “Weighted LAD-LASSO method for robust parameter estimation and variable selection in regression,” Computational Statistics & Data Analysis, vol. 56, no. 6, pp. 1952–1965, 2012.
Y. Zhao, H. Chen, and R. T. Ogden, “Wavelet-based weighted LASSO and screening approaches in functional linear regression,” Journal of Computational & Graphical Statistics, vol. 24, no. 3, 2015.
A. A. Salama, M. O. Ahmad, and M. N. Swamy, “Underdetermined DOA estimation using MVDR-weighted LASSO,” Sensors, vol. 16, no. 9, p. 1549, 2016.
I. T. Jolliffe, Principal Component Analysis, vol. 87, Springer, Berlin, Germany, 1986.
P. Zhang, Research of Comprehensive Evaluation Based on Principal Component Analysis, Nanjing University of Science and Technology, 2004.
H. Cai and W. L. Shen, The Weight of Comprehensive Benefit Evaluation in Hospital (2)—Independence Weight, Chinese Hospital Statistics, 1997.
J. He, E. S. Gao, and L. Chaohua, “The study of the weight coefficient and standardized method of the comprehensive evaluation,” Journal of Public Health in China, vol. 17, no. 11, pp. 1048–1050, 2001.
D. Diakoulaki, G. Mavrotas, and L. Papayannakis, “Determining objective weights in multiple criteria problems: the CRITIC method,” Computers & Operations Research, vol. 22, no. 7, pp. 763–770, 1995.
U. E. Choo, B. Schoner, and W. C. Wedley, “Interpretation of criteria weights in multicriteria decision making,” Computers and Industrial Engineering, vol. 37, no. 3, pp. 527–541, 1999.
V. Sigillito, UCI Machine Learning Repository, The Johns Hopkins University, Applied Physics Laboratory, School of Information and Computer Science, 1990.