Advances in Artificial Intelligence

Volume 2017 (2017), Article ID 1736389, 9 pages

https://doi.org/10.1155/2017/1736389

## Method for Solving LASSO Problem Based on Multidimensional Weight

College of Computer & Information Science, Southwest University, Chongqing, China

Correspondence should be addressed to Chen ShanXiong

Received 15 November 2016; Revised 12 February 2017; Accepted 21 March 2017; Published 4 May 2017

Academic Editor: Farouk Yalaoui

Copyright © 2017 Chen ChunRong et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

In data mining, the analysis of high-dimensional data is a critical but thorny research topic. The LASSO (least absolute shrinkage and selection operator) algorithm avoids the limitations of traditional methods, which generally employ stepwise regression with information criteria to choose the optimal model. The improved-LARS (Least Angle Regression) algorithm solves the LASSO effectively. This paper presents an improved-LARS algorithm that is constructed on the basis of multidimensional weight and is intended to solve the LASSO problem. Specifically, in order to distinguish the impact of each variable in the regression, we separately introduce part of principal component analysis (Part_PCA), Independent Weight evaluation, and CRITIC into our proposal. These methods change the regression track by weighting every individual, which optimizes both the approach direction and the selection of approach variables. As a consequence, the proposed algorithm yields better results in the promising direction. Furthermore, we illustrate the properties of the LARS algorithm based on multidimensional weight on the Pima Indians Diabetes data set. The experimental results show an attractive performance improvement of the proposed method over the improved-LARS when both are subjected to the same threshold value.

#### 1. Introduction

Data mining has shown its charm in the era of big data, and how to mine useful information from mass data with mathematical statistics models has gained much attention in academia [1, 2]. In a linear model, model error is usually the result of a missing key variable. At the beginning of modeling, generally, the more variables (attribute set) are chosen, the smaller the model error is. In the process of modeling, however, we need to find the attribute set with the largest explanatory ability for the response, that is, to improve the prediction precision and accuracy of the model through variable selection [3]. Linear regression analysis is the most widely used of all statistical techniques [4]; its accuracy mainly depends on the selection of variables and the values of the regression coefficients [5]. LASSO is an estimation method that can simplify the index set. In 1996, inspired by ridge regression (Frank and Friedman, 1993) [6] and the Nonnegative Garrote (Breiman, 1995) [7], Tibshirani proposed a new method of variable selection. The idea of this method is to minimize the sum of squared residuals under the constraint that the sum of the absolute values of the regression coefficients is less than a constant, which amounts to constructing a penalty function that shrinks the coefficients [8]. As a kind of shrinkage estimate, the LASSO method has higher detection accuracy and better parameter convergence consistency. Efron et al. (2002) proposed the LARS algorithm to support the solution of LASSO [9], and they proposed the improved-LARS algorithm (2004) to eliminate the opposite sign of the regression coefficient *β* and solve LASSO better [10]. The improved-LARS algorithm regresses stepwise; on each path it keeps the correlation between the current residual and all the selected variables the same. It also satisfies the LASSO solution along the same current approach direction, which preserves both the optimality of the results and the complexity of the algorithm.

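Zou (2006) introduced the adaptive-LASSO, which uses different tuning parameters for different regression coefficients. He suggested minimizing the following objective function [11]:

$$\Bigl\| y - \sum_{j=1}^{m} x_j \beta_j \Bigr\|_2^2 + \lambda \sum_{j=1}^{m} w_j \lvert \beta_j \rvert ,$$

where $\lambda$ is the tuning parameter and $w_j$ is the weight attached to the coefficient $\beta_j$.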

Keerthi and Shevade (2007) proposed a fast tracking algorithm for LASSO/LARS [12]; it approximates the logistic regression loss by a piecewise quadratic function.

Charbonnier et al. (2010) suggested that *β* has an internal structure that describes classes of connectivity between the variables [13]. They presented the weighted-LASSO method to infer the parameters of a first-order vector autoregressive model that describes time course expression data generated by directed gene-to-gene regulation networks.

Because the LASSO method minimizes the sum of squared residual errors, it is sensitive to outliers. Although the least absolute deviation (LAD) estimator is an alternative to the OLS estimate, Jung (2011) proposed a robust-LASSO estimator that is not sensitive to outliers, heavy-tailed errors, or leverage points [14].

Bergersen et al. (2011) noted that, for a large value of the weight $w_j$, the regression coefficient for variable $j$ is subject to a larger penalty and is therefore less likely to be included in the model, and vice versa [15]. They proposed using the weighted-LASSO with integrated relevant external information on the covariates to guide the selection towards more stable results.

Arslan (2012) found that, compared with the LAD-LASSO method, the weighted LAD-LASSO (WLAD-LASSO) method resists heavy-tailed errors and outliers in the explanatory variables [16].

The LASSO problem is a convex minimization problem, and the forward-backward splitting operator method is important for solving it. Salzo and Villa (2012) proposed an accelerated version to improve the convergence of this method [17].

Zhou et al. (2013) proposed an alternative selection procedure based on the kernelized LARS-LASSO method [18]. By formulating the RBF neural network as a linear-in-the-parameters model, they derived an $\ell_1$-constrained objective function for training the network.

Zhao et al. (2015) added two tuning parameters, $\lambda$ and $j_0$, to the wavelet-based weighted-LASSO methods. The tuning parameter $\lambda$ controls the model sparsity. The choice of $j_0$ controls the optimal level of wavelet decomposition for the functional data. They improved the wavelet-based LASSO by adding a prescreening step prior to the model fitting or, alternatively, by using a weighted version of the wavelet-based LASSO [19].

Salama et al. (2016) proposed a new LASSO algorithm, the minimum variance distortionless response (MVDR) LARS-LASSO [20], which solves the direction-of-arrival (DOA) problem in the compressed sensing (CS) framework.

In light of the superior performance achieved in [10] for solving the LASSO problem, this paper extends that idea to a multidimensional weight LARS. Our main contributions are as follows:

(i) In the solving process of LASSO, each attribute in the evaluation population has a different relative importance to the overall evaluation. This relative importance means that not all attributes influence the regression results and that each individual in the regression model has a different weight. When the improved-LARS algorithm calculates the equiangular vector, we distinguish the effects of the different attribute variables by considering the joint correlation between the regression variables and the surplus variable.

(ii) We examine the proposed method using experimental evidence from the Pima Indians Diabetes data and two sets of evaluation indices.

In Section 2, we briefly introduce the LASSO problem and the improved-LARS algorithm, including theory and definitions. In Section 3, we put forward the LARS algorithm based on a multidimensional weighting model, which calculates the direction and variables from the weighted variables and accelerates the approximation process in the promising direction. In Section 4, we introduce the data sets and evaluation indicators used to verify the algorithm and discuss the experimental results. Section 5 gives the summary and outlook of this paper.

#### 2. LASSO Problem and Improved-LARS Algorithm

##### 2.1. The Definition of LASSO

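Suppose that there are multidimensional variables $x_1, x_2, \ldots, x_m \in \mathbb{R}^{n}$ and a response $y \in \mathbb{R}^{n}$. Each group of observations $(x_{i1}, x_{i2}, \ldots, x_{im})$ has a corresponding $y_i$. The regression coefficient $\beta$ is estimated, subject to $\|\beta\|_1 \le t$, such that the sum of squared residuals is minimal. The LASSO linear regression model is defined by

$$y = X\beta + \varepsilon ,$$

where $X = (x_1, x_2, \ldots, x_m)$ is the $n \times m$ matrix of variables and $\beta$ is an $m$-dimensional column vector, the parameter to be estimated. The error vector $\varepsilon$ satisfies $E(\varepsilon) = 0$ and $\operatorname{Var}(\varepsilon) = \sigma^2$. Suppose the model is sparse, that is, most of the regression coefficients in $\beta$ are 0. Based on the observed data, variable selection identifies which coefficients are zero and estimates the other nonzero parameters; in other words, it looks for the parameters that build a sparse model. In matrix form, the problem we need to solve is defined by

$$\hat{\beta} = \arg\min_{\beta} \|y - X\beta\|_2^2 \quad \text{subject to} \quad \|\beta\|_1 \le t ,$$

where $t$ is the threshold value of the sum of the absolute values of the regression coefficients and $\|\cdot\|_1$ and $\|\cdot\|_2$ are two types of regularization norms.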
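To make the definition concrete, the following is a minimal sketch and not the method of this paper. It assumes the equivalent penalized (Lagrangian) form, $\tfrac{1}{2}\|y - X\beta\|_2^2 + \lambda\|\beta\|_1$, and solves it with cyclic coordinate descent rather than LARS. The function names, the penalty value `lam`, and the toy data are all hypothetical and chosen only for illustration.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator: shrink z toward zero by lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise 0.5 * ||y - X b||_2^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of n rows (each a list of m floats); y is a list of n floats.
    This is the penalized counterpart of the constrained problem ||b||_1 <= t.
    """
    n, m = len(X), len(X[0])
    beta = [0.0] * m
    residual = list(y)  # y - X beta, with beta = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(m)]
    for _ in range(n_iter):
        for j in range(m):
            if col_sq[j] == 0.0:
                continue
            # Inner product of column j with the partial residual (beta_j removed).
            rho = sum(X[i][j] * residual[i] for i in range(n)) + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


# Hypothetical toy data with two orthogonal columns.
X_demo = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y_demo = [3.0, 0.2, 3.0, 0.2]

beta_demo = lasso_coordinate_descent(X_demo, y_demo, lam=1.0)
print(beta_demo)  # [2.5, 0.0]
```

On this toy design the weakly related second coefficient is set exactly to zero, while the first is shrunk from its least-squares value of 3.0 to 2.5. The resulting $\|\beta\|_1 = 2.5$ plays the role of the threshold $t$ in the constrained form.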

##### 2.2. The Improved-LARS Algorithm

The improved-LARS algorithm, which is based on the Forward Selection algorithm and the Forward Gradient algorithm, solves the LASSO problem well. The improved-LARS has an appropriate forward distance, lower complexity, and uses more of the relevant information. Figure 1 shows the basic steps of the algorithm.

(i) The improved-LARS continually calculates the correlation between each variable $x_j$ and the response $y$ and finds the individual most correlated with the response, say $x_{j_1}$. It takes the largest step possible in the direction of this individual, using $x_{j_1}$ to approximate $y$.

(ii) The step continues until some other individual, say $x_{j_2}$, has the same correlation with the current residual. The improved-LARS then proceeds in an equiangular direction $u_2$ ($u_2$ is the direction that bisects the two predictors $x_{j_1}$ and $x_{j_2}$).

(iii) When a third individual $x_{j_3}$ earns its way into the “most correlated” set, the improved-LARS proceeds equiangularly between $x_{j_1}$, $x_{j_2}$, and $x_{j_3}$, that is, along the “least angle direction,” until a fourth individual enters, and so forth. In high dimensions, the equiangular direction is the bisector of the vectors involved.

(iv) The LARS procedure continues until the residual error is less than a threshold or all the variables are involved in the approach, at which point the algorithm stops.
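Steps (i) and (ii) can be sketched in a few lines. The code below is a minimal illustration of the first step of plain LARS only, assuming centred, unit-length columns. It is not the improved-LARS with the LASSO sign modification, nor the weighted variant proposed in this paper. The helper names and the toy data are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def standardize_columns(X):
    """Return the columns of X centred to mean 0 and scaled to unit length."""
    n, m = len(X), len(X[0])
    cols = []
    for j in range(m):
        col = [X[i][j] for i in range(n)]
        mean = sum(col) / n
        col = [v - mean for v in col]
        norm = math.sqrt(dot(col, col))
        cols.append([v / norm for v in col])
    return cols


def first_lars_step(cols, y):
    """Move along the most correlated column until a second column ties.

    Returns (j1, j2, gamma, mu): the entering index, the index that ties next,
    the step length, and the fitted (centred) vector after the step.
    """
    y_mean = sum(y) / len(y)
    r = [v - y_mean for v in y]            # initial residual (fit starts at 0)
    c = [dot(col, r) for col in cols]      # current correlations
    j1 = max(range(len(cols)), key=lambda j: abs(c[j]))
    C = abs(c[j1])
    sign = 1.0 if c[j1] >= 0 else -1.0
    u = [sign * v for v in cols[j1]]       # unit direction of travel
    gamma, j2 = C, None                    # gamma = C would be the full least-squares step
    for j, col in enumerate(cols):
        if j == j1:
            continue
        a = dot(col, u)
        # Smallest positive step at which |corr_j| equals C - gamma.
        for num, den in ((C - c[j], 1.0 - a), (C + c[j], 1.0 + a)):
            if den > 1e-12:
                cand = num / den
                if 1e-12 < cand < gamma:
                    gamma, j2 = cand, j
    mu = [gamma * v for v in u]
    return j1, j2, gamma, mu


# Hypothetical toy data: 5 observations, 3 variables.
X_demo = [[1, 2, 0], [2, 1, 1], [3, 5, 0], [4, 3, 1], [5, 6, 0]]
y_demo = [1.1, 2.3, 2.9, 4.2, 5.0]

cols_demo = standardize_columns(X_demo)
j1, j2, gamma, mu = first_lars_step(cols_demo, y_demo)
print("first variable:", j1, "ties with:", j2, "after step:", gamma)
```

After this step, the residual has the same absolute correlation with columns `j1` and `j2`, which is the condition under which the procedure would switch to the equiangular direction described in step (ii).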