Modified One-Parameter Liu Estimator for the Linear Regression Model
Motivated by the ridge regression (Hoerl and Kennard, 1970) and Liu (1993) estimators, this paper proposes a modified Liu estimator to solve the multicollinearity problem in the linear regression model. The modification places the estimator in the class of ridge and Liu estimators with a single biasing parameter. Theoretical comparisons, real-life applications, and simulation results show that it consistently dominates the usual Liu estimator and, under some conditions, performs better than the ridge regression estimator in the smaller-MSE sense. Two real-life datasets are analyzed to illustrate the findings of the paper, and the performances of the estimators are assessed by the MSE and the mean squared prediction error. The application results agree with the theoretical and simulation results.
The linear regression model (LRM) is y = Xβ + ε, where y is an n × 1 vector of the predictand, X is a known n × p matrix of predictor variables, β is a p × 1 vector of unknown regression parameters, ε is an n × 1 vector of errors such that E(ε) = 0 and Cov(ε) = σ²Iₙ, and Iₙ is an n × n identity matrix. The parameters in (1) are mostly estimated by the ordinary least squares (OLS) estimator defined in (2):
β̂ = (X′X)⁻¹X′y.
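As a concrete illustration of the OLS estimator in (2), the following sketch fits the model on simulated data. It is written in Python (the paper's own computations were done in R), and the design matrix, coefficients, and noise level are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data for the LRM y = X beta + eps (all values are assumptions).
n, p = 50, 3
X = rng.normal(size=(n, p))
beta = np.array([1.0, 2.0, -1.5])
y = X @ beta + rng.normal(scale=0.5, size=n)

# OLS estimator of (2): beta_hat = (X'X)^{-1} X'y
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
```

With well-conditioned predictors, as here, β̂ lands close to the true β; the instability discussed next appears only when the columns of X are nearly collinear.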
The performance of the OLS estimator is conditional on the assumption of model (1) that the predictor variables are independent. In most real-life applications, however, the predictor variables tend to grow together, which results in the problem termed multicollinearity. Multicollinearity reduces the efficiency of the OLS estimator and makes it unstable (see, for example, [1, 2]). Many methods exist in the literature to combat the multicollinearity problem. Biased estimators with one biasing parameter include the ridge regression estimator by Hoerl and Kennard and the Liu estimator by Kejian, among others.
The objective of this paper is to propose a new one-parameter Liu-type estimator for the regression parameter when the predictor variables of the model are linearly related. Since we want to compare the performance of the proposed estimator with the usual Liu and ridge regression estimators, we will give a brief description of each of them as follows.
1.1. Ridge Regression Estimator (RRE)
The ridge regression estimator (Hoerl and Kennard, 1970) is defined as
β̂_RRE(k) = (X′X + kI_p)⁻¹X′y, k > 0,
where k is the biasing (ridge) parameter.
1.2. Liu Estimator
The Liu estimator (Kejian, 1993) is defined as
β̂_d = (X′X + I_p)⁻¹(X′y + d β̂), 0 < d < 1,
where β̂ is the OLS estimator. This estimator is a linear function of the shrinkage parameter d.
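Both the ridge estimator, β̂_RRE(k) = (X′X + kI)⁻¹X′y, and the Liu estimator, β̂_d = (X′X + I)⁻¹(X′y + d β̂), are short matrix expressions. The sketch below (Python; function names are my own) implements them; a useful sanity check, verified algebraically, is that the Liu estimator collapses to OLS at d = 1 and the ridge estimator does so as k → 0.

```python
import numpy as np

def ols(X, y):
    # OLS: (X'X)^{-1} X'y
    return np.linalg.solve(X.T @ X, X.T @ y)

def ridge(X, y, k):
    # Ridge regression estimator (Hoerl and Kennard): (X'X + kI)^{-1} X'y, k > 0
    return np.linalg.solve(X.T @ X + k * np.eye(X.shape[1]), X.T @ y)

def liu(X, y, d):
    # Liu estimator (Kejian, 1993): (X'X + I)^{-1} (X'y + d * beta_ols);
    # note it is linear in the shrinkage parameter d.
    return np.linalg.solve(X.T @ X + np.eye(X.shape[1]), X.T @ y + d * ols(X, y))
```

The d = 1 identity follows from (X′X + I)⁻¹(X′y + β̂) = (X′X + I)⁻¹(X′X + I)β̂ = β̂.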
1.3. Proposed One-Parameter Liu Estimator
One of the limitations of the shrinkage parameter proposed by Kejian is that it returns a negative value most of the time, which affects the performance of the estimator [4, 5]. In this study, we augment −d β̂ = β + ε′ to the LRM. This is done by minimizing (y − Xβ)′(y − Xβ) subject to (−d β̂ − β)′(−d β̂ − β) = c, where c is a constant. The modified Liu estimator of β is
β̂_MLE(d) = (X′X + I_p)⁻¹(X′y − d β̂), 0 < d < 1.
This modification provides a substantial improvement in the performance of the modified Liu estimator and yields a positive value of the shrinkage parameter d. The estimator will always produce a smaller mean squared error than the OLS estimator.
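One consistent reading of the augmentation above gives β̂_MLE(d) = (X′X + I)⁻¹(X′y − d β̂); the sketch below assumes this form and is an illustration, not the authors' code. Since X′y = X′X β̂, the estimator can equivalently be written (X′X + I)⁻¹(X′X − dI)β̂, which the test below checks numerically.

```python
import numpy as np

def ols(X, y):
    return np.linalg.solve(X.T @ X, X.T @ y)

def modified_liu(X, y, d):
    # Modified Liu estimator (assumed form): (X'X + I)^{-1} (X'y - d * beta_ols).
    # Because X'y = X'X beta_ols, this equals (X'X + I)^{-1} (X'X - dI) beta_ols,
    # i.e., the usual Liu estimator with the sign of the shrinkage term flipped.
    return np.linalg.solve(X.T @ X + np.eye(X.shape[1]), X.T @ y - d * ols(X, y))
```

The sign flip is the whole modification: it places the estimator in the same one-parameter class as ridge and Liu while shrinking more aggressively toward zero.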
The bias, variance, and MSE matrix of the proposed estimator are, respectively, given as follows:
Bias(β̂_MLE(d)) = −(1 + d)(X′X + I_p)⁻¹β,
Var(β̂_MLE(d)) = σ²(X′X + I_p)⁻¹(X′X − dI_p)(X′X)⁻¹(X′X − dI_p)(X′X + I_p)⁻¹,
MSEM(β̂_MLE(d)) = Var(β̂_MLE(d)) + Bias(β̂_MLE(d)) Bias(β̂_MLE(d))′.
To compare the performance of the estimators, we will consider the linear regression model in canonical form, which is given as follows: y = Zα + ε, where Z = XQ, α = Q′β, and Z′Z = Q′X′XQ = Λ = diag(λ₁, …, λ_p). Here, Λ and Q are the matrices of eigenvalues and eigenvectors of X′X, respectively. The ordinary, ridge, Liu, and modified Liu estimators of α are
α̂ = Λ⁻¹Z′y,
α̂_RRE(k) = (Λ + kI_p)⁻¹Z′y,
α̂_d = (Λ + I_p)⁻¹(Z′y + d α̂),
α̂_MLE(d) = (Λ + I_p)⁻¹(Z′y − d α̂).
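The canonical transformation (spectral decomposition X′X = QΛQ′ with Z = XQ and α = Q′β) can be verified numerically. In the sketch below (illustrative data), the columns of Z are orthogonal with Z′Z = Λ, so OLS decouples into p scalar problems, and rotating α̂ back with Q recovers the OLS solution in the original coordinates.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 40, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -1.0, 0.5]) + rng.normal(size=n)

# Spectral decomposition X'X = Q Lambda Q' yields the canonical form
# y = Z alpha + eps with Z = X Q and alpha = Q' beta.
lam, Q = np.linalg.eigh(X.T @ X)
Z = X @ Q

# In canonical coordinates X'X is diagonal, so OLS decouples componentwise:
alpha_ols = (Z.T @ y) / lam
```

Working in canonical form is what makes the MSE expressions in this paper sums over the eigenvalues λᵢ rather than matrix traces.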
The following notations and lemmas will be used to prove the statistical properties of the proposed estimator.
Lemma 1. Let M be an n × n positive definite matrix and α be a nonzero n × 1 vector. Then M − αα′ is nonnegative definite if and only if α′M⁻¹α ≤ 1 .
Lemma 2. Let β̂₁ and β̂₂ be two estimators of β. Suppose that D = Cov(β̂₁) − Cov(β̂₂) > 0, where Cov(β̂₁) and Cov(β̂₂) represent the covariance matrices of β̂₁ and β̂₂, respectively. Then MSEM(β̂₁) − MSEM(β̂₂) ≥ 0 if and only if b₂′(D + b₁b₁′)⁻¹b₂ ≤ 1, where MSEM(β̂ⱼ) = Cov(β̂ⱼ) + bⱼbⱼ′ and bⱼ represent the mean squared error matrix and bias vector of β̂ⱼ .
According to Özkale and Kaçiranlar , if MSEM(β̂₁) − MSEM(β̂₂) ≥ 0, then mse(β̂₁) − mse(β̂₂) ≥ 0, where mse(β̂ⱼ) = tr(MSEM(β̂ⱼ)) is the scalar mean squared error.
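The relation between the MSE matrix and the scalar MSE used throughout the comparisons can be checked numerically: for any linear estimator b = Ay one has MSEM(b) = Cov(b) + bias(b) bias(b)′ and mse(b) = tr(MSEM(b)), which equals the expected squared Euclidean error. The sketch below does this Monte Carlo check for an illustrative ridge estimator (all parameter values are assumptions).

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, sigma, k = 30, 3, 1.0, 0.5
X = rng.normal(size=(n, p))
beta = np.array([1.0, 0.5, -0.5])

# Ridge as a linear estimator b = A y with A = (X'X + kI)^{-1} X'.
A = np.linalg.inv(X.T @ X + k * np.eye(p)) @ X.T
bias = A @ X @ beta - beta        # E(b) - beta
cov = sigma**2 * A @ A.T          # Cov(b) for Cov(y) = sigma^2 I

msem = cov + np.outer(bias, bias) # MSE matrix
scalar_mse = np.trace(msem)       # scalar MSE = tr(MSEM) = E ||b - beta||^2
```

Averaging ||b − β||² over many simulated samples reproduces `scalar_mse`, which is exactly the quantity tabulated for each estimator in Sections 3 and 4.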
The rest of the paper is organized as follows. The theoretical comparisons among the estimators and the estimation of the biasing parameter of the proposed estimator are given in Section 2. A simulation study and numerical examples are presented in Sections 3 and 4, respectively. The paper ends with some concluding remarks in Section 5.
2. Comparison among Estimators
In this section, we present theoretical comparisons among the estimators. First, we compare the proposed estimator with the OLSE.
2.1. The Proposed Estimator and OLSE
Theorem 3. Given two estimators of , , and , if , then the estimator is better than , that is, if and only if, where .
Proof. Recall that Then, We observe from equation (16) that which shows that.
2.2. The Proposed Estimator and RRE
Theorem 4. Given two estimators of , , and , if , then the estimator is better than , that is, if and only if, where and .
Proof. Recall that Therefore, From equation (18), we observe that . Hence, this shows that .
2.3. The Proposed Estimator and Liu Estimator
Theorem 5. Given two estimators of , , and , if , then the estimator is better than , that is, if and only if, where and .
Proof. Note that Therefore, From equation (20), we observe that . Hence, this shows that.
2.4. Determination of d
Taking the partial derivative of (21) with respect to d and setting it to zero, we obtain
In equation (22), we replace the unknown parameters with their unbiased estimates and obtain:
A critical look at equation (23) shows that the estimate of the shrinkage parameter will often return a positive value, since the quantities involved are always greater than zero. For practical purposes, we take the minimum value of (24) as
3. Simulation Study
Since the theoretical comparisons among the estimators in Section 2 establish only conditional dominance, a simulation study, conducted using the R 3.4.1 programming language, is carried out in this section to obtain a fuller picture of the estimators' performance.
3.1. Simulation Technique
We generated the explanatory variables following Gibbons  and Qasim et al. : x_ij = (1 − ρ²)^{1/2} z_ij + ρ z_{i,p+1}, i = 1, 2, …, n, j = 1, 2, …, p, where the z_ij are independent standard normal pseudorandom numbers and ρ² represents the correlation between any two predictor variables. The number of predictor variables is three and seven. The predictands for the regression models are generated as y_i = β₁x_i1 + β₂x_i2 + ⋯ + β_p x_ip + e_i, where the errors e_i are independent with mean zero and variance σ². β′β is constrained to unity, according to Newhouse and Oman , Lukman et al. , and Lukman et al. . We examined the performances of the estimators using the mean squared error criterion. The simulation is performed under the following conditions:
(1) Sample sizes: , 50, 100, and 200
(2) Number of replications: 1000
(3) The error variances: 1, 25, and 100
(4) The multicollinearity levels: , 0.8, 0.9, and 0.99
(5) d = 0.1, 0.2, …, 0.9, and 1
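The predictor-generation scheme above can be sketched as follows (in Python rather than the paper's R). The key property of the scheme is that any two distinct columns share the common component z_{i,p+1}, so their pairwise correlation is ρ².

```python
import numpy as np

def gen_predictors(n, p, rho, rng):
    # x_ij = sqrt(1 - rho^2) z_ij + rho z_{i,p+1}, with z iid N(0, 1)
    # (Gibbons' scheme); any two distinct columns have correlation rho^2.
    Z = rng.normal(size=(n, p + 1))
    return np.sqrt(1.0 - rho**2) * Z[:, :p] + rho * Z[:, [p]]

rng = np.random.default_rng(5)
X = gen_predictors(5000, 3, 0.99, rng)
```

With ρ = 0.99, the empirical pairwise correlations come out near 0.99² ≈ 0.98, reproducing the severe multicollinearity the study is designed to stress.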
The simulated results for and , 0.80, 0.90, and 0.99 are presented in Tables 1–3 for , 50, and 100, respectively. and , 0.80, 0.90, and 0.99 are presented in Tables 4–6 for , 50, and 100, respectively. For a better picture, we have plotted MSE vs. for , , and , 0.90, and 0.99 in Figures 1 and 2, respectively. We also plotted MSE vs. and MSE vs. in Figure 3.
3.2. Simulation Result Discussion
Results from Tables 1–8 show that increasing the sample size decreases the MSE values for each of the estimators. It is evident that the MSE values of the estimators increase as the degree of correlation and the number of explanatory variables increase. The simulation results show that the proposed estimator performed best at most levels of multicollinearity, sample size, and number of explanatory variables, with few exceptions. The only exceptions to its performance are when, and they are defined as follows:
(i) , , , and for , 5, and 10
(ii) , , and for and 10
(iii) , , and and 0.9 when 
(iv) , , , and for , 5, and 10
(v) , , , and for , 5, and 10
(vi) and for at , 5, and 10 including and 0.9 at
The instances mentioned above are the only times that ridge regression dominates the proposed estimator. The new estimator consistently dominates the OLS and Liu estimators. Also, from Tables 1–8, we observe that the values of the OLS and Liu estimators coincide when d = 1.
Consistently, when , 0.8, and 0.9, the proposed estimator performs better than the other estimators at the different sample sizes, irrespective of the values of the biasing parameters. The fact that the ridge estimator dominates the proposed estimator in some of the exceptions mentioned earlier does not show that it performs better overall; it only shows that in those intervals the performance of the new estimator drops. This necessitates the use of real-life data in the next section, where the biasing parameters will be estimated rather than chosen arbitrarily.
4. Real-Life Applications
We adopt two datasets to illustrate the theoretical findings of the paper: the Portland cement data and the French economy data.
4.1. Portland Dataset
This dataset was first used by Woods et al.  and later adopted by Kaciranlar et al.  and Li and Yang . It consists of one predictand, the heat evolved after 180 days of curing, measured in calories per gram of cement, and four predictors: X1 is tricalcium aluminate, X2 is tricalcium silicate, X3 is tetracalcium aluminoferrite, and X4 is β-dicalcium silicate. Variance inflation factors (VIFs) and the condition number are adopted to diagnose multicollinearity in the model . The VIFs are 38.50, 254.42, 46.87, and 282.51, while the condition number is approximately 424. Both diagnostics indicate that the model possesses severe multicollinearity. The regression output is available in Table 9. According to Kejian , the optimum biasing parameter is expressed as
d̂ = 1 − σ̂² [Σᵢ 1/(λᵢ(λᵢ + 1))] / [Σᵢ α̂ᵢ²/(λᵢ + 1)²].
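The multicollinearity diagnostics reported above can be computed as in the sketch below (Python). One common convention, assumed here, takes the VIFs as the diagonal of the inverse predictor correlation matrix and the condition number as the ratio of the extreme eigenvalues of X′X; conventions vary (some authors report the square root), so the exact figure of 424 quoted above depends on the paper's definition.

```python
import numpy as np

def vif(X):
    # VIF_j = 1 / (1 - R_j^2) = [R^{-1}]_{jj}, where R is the correlation
    # matrix of the predictors and R_j^2 is the R-squared from regressing
    # column j on the remaining columns.
    R = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(R))

def condition_number(X):
    # Ratio of the largest to the smallest eigenvalue of X'X
    # (some authors report the square root of this ratio instead).
    lam = np.linalg.eigvalsh(X.T @ X)
    return lam.max() / lam.min()
```

VIFs near 1 indicate independent predictors; values far above 10, as in the cement data, flag severe multicollinearity.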
Following Özkale and Kaçiranlar , we replaced with if .
The ridge biasing parameters are computed by
We also adopt leave-one-out cross-validation to validate the performance of the estimators (see ). The performance of each estimator is assessed through the mean squared prediction error (MSPE). The results are presented in Table 9.
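Leave-one-out cross-validation as used here can be sketched generically (Python; the fitting function is passed in as an argument, so the same loop validates OLS, ridge, Liu, or the proposed estimator).

```python
import numpy as np

def loocv_mspe(X, y, fit):
    # Leave-one-out CV: refit on n - 1 observations, predict the held-out
    # one, and report the mean squared prediction error (MSPE).
    n = len(y)
    errs = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        coef = fit(X[mask], y[mask])
        errs[i] = (y[i] - X[i] @ coef) ** 2
    return errs.mean()

def ols_fit(X, y):
    # Least-squares fit used as one candidate estimator.
    return np.linalg.lstsq(X, y, rcond=None)[0]
```

For example, `loocv_mspe(X, y, ols_fit)` returns the OLS validation error; substituting a ridge or Liu fitting function (with its biasing parameter re-estimated inside `fit` on each training subset) yields the MSPE values compared in Table 9.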
The theoretical results are computed as follows:
We observe that the theoretical comparisons stated in Sections 2.1, 2.2, and 2.3 are valid, since each of the estimates is less than 1. From Table 9, the regression coefficients and MSE of the OLS and Liu estimators are approximately the same because d̂ is close to unity; recall that the Liu estimator becomes OLS when d = 1. The proposed estimator possesses the smallest mean squared error and the smallest average MSE of the validation errors. Also, the performances of the estimators largely depend on their biasing parameters.
4.2. French Economy Dataset
This dataset was initially described in Chatterjee and Price  and later appears in Malinvaud  and Kejian . It comprises one predictand, imports, and three predictor variables (domestic production, stock formation, and domestic consumption) with eighteen observations. The variance inflation factors are , , and , and the condition number is 32,612. Both results indicate the presence of severe multicollinearity. We analyzed the data using the biasing parameters for each of the estimators and present the results in Table 10. The proposed estimator, with its biasing parameter, performed best in the sense of smaller MSE and MSPE; as mentioned earlier, the estimators' performance is a function of the biasing parameter. Also, the theoretical results agree with Section 2 and support the simulation and real-life findings.
The theoretical results are computed as follows:
5. Some Concluding Remarks
Both the ridge regression and Liu estimators are widely accepted in the linear regression model as alternatives to the OLS estimator for circumventing the problem of multicollinearity. In this study, we proposed a modified Liu estimator, which possesses a single biasing parameter, placing it in the class of the ridge and Liu estimators. Theoretical comparisons, a simulation study, and real-life applications show that the proposed estimator consistently dominates the existing Liu estimator and, under some conditions, the ridge regression estimator. We recommend this estimator for the linear regression model with the multicollinearity problem. We note that the proposed estimator can be extended to other regression models, for example, logistic, Poisson, ZIP, gamma, and related models; these possibilities are under current investigation.
Data Availability
Data will be made available on request.
Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this paper.
A. F. Lukman and K. Ayinde, “Review and classifications of the ridge parameter estimation techniques,” Hacettepe Journal of Mathematics and Statistics, vol. 46, no. 5, pp. 953–967, 2017.
J. P. Newhouse and S. D. Oman, An Evaluation of Ridge Estimators. A report prepared for United States Air Force project RAND, 1971.
S. Kaciranlar, S. Sakallioglu, F. Akdeniz, G. P. H. Styan, and H. J. Werner, “A new biased estimator in linear regression and a detailed analysis of the widely-analysed dataset on Portland cement,” Sankhyā: The Indian Journal of Statistics, Series B, vol. 61, no. 3, pp. 443–459, 1999.
A. F. Lukman, Classification-Based Ridge, Lambert Academic Publishing, Mauritius, 2018.
G. James, D. Witten, T. Hastie, and R. Tibshirani, An Introduction to Statistical Learning with Applications in R, vol. 103 of Springer Texts in Statistics, Springer, 2013.
S. Chatterjee and A. S. Price, Regression Analysis by Example, Wiley, New York, 1977.
E. Malinvaud, Statistical Methods of Econometrics, North Holland, Amsterdam, 3rd edition, 1980.