Abstract

Ridge estimators are widely used in the presence of multicollinearity in multiple linear regression models. They serve as an alternative to ordinary least squares under multicollinearity because they attain a lower mean squared error. Choosing the optimal value of the biasing parameter $k$ is vital in ridge regression in terms of the bias-variance trade-off. Since theoretical comparisons among ridge estimators are not possible, it is common practice to compare them through a Monte Carlo study. When the Monte Carlo designs of existing studies on ridge estimators are examined, it is seen that the performance of the ridge estimators is considered only for the same level of relationship between the independent variables. However, different levels of relationships between the independent variables are more likely to be encountered in real data sets. In this study, a new type of iterative ridge estimator is proposed based on a modified form of the estimated mean squared error function. Furthermore, a novel search algorithm is provided to obtain the estimates. The performance of the proposed estimator is compared with that of the ordinary least squares estimator and 18 existing ridge estimators through an extensive Monte Carlo design. The Monte Carlo design incorporates two data generation techniques, based on constant and on varying correlation levels between the independent variables. Two illustrative real data examples are presented. The proposed estimator outperforms the existing estimators in the sense of the mean squared error for both data generation types. Moreover, it is also superior with respect to the k-fold cross-validation method in the real data examples.

1. Introduction

Let us consider the general form of the multiple linear regression model

$$y = X\beta + \varepsilon, \qquad (1)$$

where $y$ is an $n \times 1$ vector of the response (dependent) variable, $\beta$ is a $p \times 1$ unknown regression coefficient vector, $X$ is an $n \times p$ design matrix of rank $p$, and $\varepsilon$ is an $n \times 1$ random error vector that is multivariate normally distributed with zero mean vector and variance-covariance matrix $\sigma^2 I_n$. The ordinary least squares (OLS) estimator of $\beta$ is given by $\hat{\beta} = (X'X)^{-1}X'y$, and the variance-covariance matrix of $\hat{\beta}$ is $\sigma^2 (X'X)^{-1}$ [1].
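As a minimal, hedged illustration of these quantities (the variable names and the simulated data are ours, not from the paper), the OLS estimate and its estimated variance-covariance matrix can be computed as follows:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 4                                 # arbitrary sample size and number of predictors
X = rng.standard_normal((n, p))              # design matrix
beta_true = np.ones(p)                       # assumed true coefficients for the example
y = X @ beta_true + rng.standard_normal(n)   # y = X beta + eps

# OLS estimator: beta_hat = (X'X)^{-1} X'y
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Estimated error variance and variance-covariance matrix sigma^2 (X'X)^{-1}
sigma2_hat = np.sum((y - X @ beta_ols) ** 2) / (n - p)
cov_beta = sigma2_hat * np.linalg.inv(X.T @ X)
```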

Recall that the OLS estimator is unbiased and has minimum variance among all unbiased estimators. The OLS estimator maintains its unbiasedness but loses the minimum-variance property in the presence of multicollinearity. Consequently, large standard errors lead to wider confidence intervals and serious errors in the interpretation of the model parameters. Several estimators have been proposed to solve the multicollinearity problem [1–10]. These studies have focused on reducing the variance by sacrificing unbiasedness and thus finding estimators that have a smaller mean squared error (MSE). Ridge regression aims to solve the multicollinearity problem by adding small positive values to the diagonal elements of the $X'X$ matrix. The ridge estimator given by equation (2), where $\beta$ is the unknown $p \times 1$ parameter vector, was proposed by [4]. Most of the ridge estimators were derived from transformations of previously proposed estimators. For example, one estimator is the harmonic mean of the individual ridge parameters $k_i = \hat{\sigma}^2 / \hat{\alpha}_i^2$ (see Section 2); estimators based on the arithmetic mean, geometric mean, and median of the $k_i$ were also obtained [1, 5]. Further estimators were suggested by [1] and [11], respectively. Six ridge estimators were proposed in [12] by adding a multiplier involving $\lambda_{\max}$ (the maximum eigenvalue of the $X'X$ matrix) to different quantiles of earlier estimators. These examples can be extended by studies in which new estimators were obtained by applying minimum, maximum, and square root transformations to existing estimators [8, 9].

In addition to estimators in which ridge and Liu type estimators are combined, estimators proposed in recent studies for problems involving both multicollinearity and outliers have come to the fore [13–16]. There are also studies that modify different estimation methods or use ridge estimators in different regression models. For example, the authors in [17] proposed four ridge estimators based on OLS estimates obtained by the deleted-d jackknife method, which are offered as an alternative to the ridge estimator based on the classical principal components estimator. The authors in [18] proposed an extension of the Kibria-Lukman estimator for the gamma regression model in the presence of multicollinearity.

In this study, a new estimator is proposed based on a modified form of the estimated MSE function, rather than applying transformations to existing estimators, and a new fast search algorithm is given to obtain the ridge estimates. We examined the available ridge estimators by simulation and found that the estimated MSEs of these estimators are much larger than the theoretical MSE. Furthermore, we also observed similar estimated MSEs for very large or very small ridge estimates. To overcome these shortcomings, we propose a new estimator whose estimated MSE is close to the theoretical MSE. When the Monte Carlo studies on ridge estimators are examined, it is seen that they are carried out by assuming that the relationships between all independent variables are equal. However, different levels of relationships between the variables are more likely to be encountered in real data sets. In this study, in addition to the existing artificial data generation technique, a data generation technique that produces different levels of relationships between the independent variables, as in real data sets, is used for assessing the performance of the estimators. The organization of the paper is as follows: the ridge estimators of concern and the proposed estimator are given in Section 2. In Section 3, the Monte Carlo simulation and findings on the performance evaluations of the estimators are discussed. Two real data examples and their results are presented in Section 4. Finally, some concluding remarks are given in Section 5.

2. Statistical Methodology

The ridge estimator of $\beta$ is denoted by

$$\hat{\beta}(k) = (X'X + kI_p)^{-1}X'y, \qquad k \geq 0, \qquad (2)$$

where $k$ is known as the “ridge” or “shrinkage” parameter. It is also clear that the OLS estimator $\hat{\beta}$ is a special case of $\hat{\beta}(k)$ for $k = 0$ [4].
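A minimal sketch of this estimator (the function name and arguments are our own):

```python
import numpy as np

def ridge_estimator(X, y, k):
    """Ridge estimator beta_hat(k) = (X'X + k I_p)^{-1} X'y; k = 0 reduces to OLS."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)
```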

The canonical form of equation (1) can be expressed using $Z = XQ$ and $\alpha = Q'\beta$ as

$$y = Z\alpha + \varepsilon, \qquad (3)$$

where $\Lambda = Z'Z = Q'X'XQ = \operatorname{diag}(\lambda_1, \ldots, \lambda_p)$ represents the eigenvalues of $X'X$ and $Q$ is defined as an orthogonal eigenvector matrix of size $p \times p$ such that $Q'Q = I_p$. The ridge estimator of $\alpha$ is given as follows: $\hat{\alpha}(k) = (\Lambda + kI_p)^{-1}Z'y$, where $I_p$ is the identity matrix of size $p$ and $\hat{\alpha} = \Lambda^{-1}Z'y$ is the OLS estimator of $\alpha$. The MSE of $\hat{\alpha}(k)$ is given in equation (4),

$$\operatorname{MSE}(\hat{\alpha}(k)) = \sigma^2 \sum_{i=1}^{p} \frac{\lambda_i}{(\lambda_i + k)^2} + k^2 \sum_{i=1}^{p} \frac{\alpha_i^2}{(\lambda_i + k)^2}, \qquad (4)$$

and is minimized by $k_i = \sigma^2 / \alpha_i^2$ (where $\alpha_i$ is the $i$th element of $\alpha$) [3, 4].

In real data examples, the estimated MSE given in equation (5),

$$\widehat{\operatorname{MSE}}(k) = \hat{\sigma}^2 \sum_{i=1}^{p} \frac{\lambda_i}{(\lambda_i + k)^2} + k^2 \sum_{i=1}^{p} \frac{\hat{\alpha}_i^2}{(\lambda_i + k)^2}, \qquad (5)$$

is used for performance comparisons of ridge estimators, since $\sigma^2$ and $\alpha$ are unknown [1].
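A hedged sketch of these computations (helper names are ours; equation (5) is the standard plug-in form reconstructed above):

```python
import numpy as np

def canonical_form(X, y):
    """Eigendecomposition X'X = Q Lambda Q'; returns eigenvalues, Z = XQ, and canonical OLS estimates."""
    lam, Q = np.linalg.eigh(X.T @ X)   # lam in ascending order, Q orthonormal
    Z = X @ Q
    alpha_hat = (Z.T @ y) / lam        # OLS estimate of alpha: Lambda^{-1} Z'y
    return lam, Z, alpha_hat

def estimated_mse(k, lam, alpha_hat, sigma2_hat):
    """Estimated MSE of the ridge estimator at k (equation (5), OLS plug-ins)."""
    var_term = sigma2_hat * np.sum(lam / (lam + k) ** 2)
    bias_term = k ** 2 * np.sum(alpha_hat ** 2 / (lam + k) ** 2)
    return var_term + bias_term
```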

In Table 1, some of the ridge estimators in the literature are given.

2.1. The Proposed Estimator

Generally, in the literature, the performance of ridge estimators on real data sets is compared by the least squares-based estimated MSE [1, 2, 6, 8, 9, 11, 12, 14–18]. From this perspective, we focused on the estimated MSE based on OLS and, considering the simulation results, found that this approach fails: the estimated MSEs of these estimators are much larger than the theoretical MSE. The fact that different ridge estimators give similar estimated MSEs for very large or very small ridge estimates indicates that the optimal ridge estimate cannot be obtained, considering the convex nature of the MSE function. For these reasons, a new ridge estimator is proposed, based on the optimization of a modified but still convex form of the estimated MSE function. The modified function given in equation (6) is obtained by modifying the estimated MSE in equation (5). The proposed estimator differs from equation (5) in three aspects. First, equation (6) uses the ridge estimates $\hat{\alpha}(k)$ of the canonical parameter for the given ridge parameter $k$, not the OLS estimates. All $\hat{\alpha}_i$ values in each estimator given in Table 1 are the OLS estimates of $\alpha$. Second, a multiplier is added to the variance component to take the sample size effect into account. Finally, an additional term ensures small ridge estimates; it is clear that large ridge estimates will increase the MSE.

Since the minimizer of equation (6) cannot be obtained analytically, we propose a fast search algorithm. The steps of the algorithm at the $t$th iteration are as follows (a hedged code sketch is given after the list).

(1) Set the tolerance value Tol = $10^{-6}$.
(2) Initialize the lower and upper endpoints of the search interval for $k$.
(3) Calculate the canonical ridge estimates $\hat{\alpha}(k)$ for the current candidate values of $k$.
(4) Evaluate the modified estimated MSE of equation (6) at the candidate values.
(5) If the width of the search interval is below Tol, go to step 9; otherwise go to step 6.
(6) If the objective is smaller at the lower candidate, move the upper endpoint inward and update $k$.
(7) If the objective is smaller at the upper candidate, move the lower endpoint inward and update $k$.
(8) Go to step 3.
(9) Return the candidate with the smaller objective value as the proposed ridge estimate.
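The exact form of equation (6) and the update rules in steps 2, 6, 7, and 9 are not recoverable from the extracted text, so the sketch below is only an assumption-laden illustration: it pairs a generic golden-section search (a standard scheme for minimizing a convex function on an interval) with a stand-in objective of the general shape described above (ridge-based canonical estimates, a sample-size multiplier $1/n$ on the variance component, and a small penalty $k/n$ discouraging large $k$). None of these specific choices should be read as the paper's equation (6).

```python
import numpy as np

def modified_emse(k, lam, Z, y, sigma2_hat, n):
    """Assumed stand-in for equation (6): estimated MSE evaluated at the ridge
    (not OLS) canonical estimates, with a sample-size multiplier and a penalty
    term that discourages large k. The exact form is our assumption."""
    alpha_k = (Z.T @ y) / (lam + k)                      # canonical ridge estimates
    var_term = (1.0 / n) * sigma2_hat * np.sum(lam / (lam + k) ** 2)
    bias_term = k ** 2 * np.sum(alpha_k ** 2 / (lam + k) ** 2)
    return var_term + bias_term + k / n                  # k/n: assumed penalty on large k

def golden_section_search(f, a, b, tol=1e-6):
    """Minimize a convex function f on [a, b] by golden-section search."""
    phi = (np.sqrt(5.0) - 1.0) / 2.0
    c, d = b - phi * (b - a), a + phi * (b - a)
    fc, fd = f(c), f(d)
    while (b - a) > tol:
        if fc < fd:                                      # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - phi * (b - a)
            fc = f(c)
        else:                                            # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + phi * (b - a)
            fd = f(d)
    return (a + b) / 2.0
```

With `lam`, `Z`, and `sigma2_hat` from the earlier sketches, a call such as `golden_section_search(lambda k: modified_emse(k, lam, Z, y, sigma2_hat, n), 0.0, 10.0)` returns the minimizing $k$; the upper endpoint 10.0 is likewise an arbitrary choice.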

3. Simulation Study

This section describes the Monte Carlo simulation, including the factors that have a significant impact on the estimators and the performance evaluation criterion for the estimators under consideration. We examined the structure of the correlation matrix (the type of data generation), the sample size, the degree of multicollinearity, and the variance of the error terms as the factors that affect the performance of the estimators.

3.1. Simulation Layout

Many authors have assumed that the relationships between all the variables are equal in numerical evaluations of ridge estimators [1, 2, 6–9]. These works used the data generation method proposed by [19]. We denote this type of data generation by DG and apply it as follows. The data matrix $X$ of size $n \times p$, whose $j$th column is the explanatory variable $x_j$ and whose columns have pairwise correlation $\gamma^2$, is generated by

$$x_{ij} = (1 - \gamma^2)^{1/2} z_{ij} + \gamma z_{i,p+1}, \qquad i = 1, \ldots, n, \quad j = 1, \ldots, p, \qquad (7)$$

where the $z_{ij}$ are produced from the standard normal distribution [19].
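A minimal sketch of this generator (function and symbol names are ours):

```python
import numpy as np

def generate_dg(n, p, gamma2, rng):
    """DG scheme: any two columns of X have (population) correlation gamma2."""
    z = rng.standard_normal((n, p + 1))
    return np.sqrt(1.0 - gamma2) * z[:, :p] + np.sqrt(gamma2) * z[:, [p]]
```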

We also examined data generation according to different levels of correlation between the independent variables. We denote this type of data generation by CG and apply it as follows. The equation $X = ZL'$ can be written, provided that $R$ is the correlation matrix of $X$, $L$ is the Cholesky factor of $R$ such that $R = LL'$, and $Z$ is a pseudo-random matrix with standard normal entries. The CG type data generation is examined by considering two predetermined correlation matrices. These correlation matrices are generated according to the partial correlations specified by the Vine method [20]. The correlation coefficients of the first matrix are spread over a narrow range, while those of the second matrix cover a wider range in absolute value.
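A hedged sketch of the CG generator (the example correlation matrix is our placeholder, not one of the paper's matrices):

```python
import numpy as np

def generate_cg(n, R, rng):
    """CG scheme: impose the correlation matrix R via its Cholesky factor, X = Z L'."""
    L = np.linalg.cholesky(R)                  # R = L L'
    Z = rng.standard_normal((n, R.shape[0]))
    return Z @ L.T

# Example: a valid 4x4 correlation matrix with unequal off-diagonal entries
R_example = np.array([[1.0, 0.9, 0.7, 0.5],
                      [0.9, 1.0, 0.6, 0.4],
                      [0.7, 0.6, 1.0, 0.3],
                      [0.5, 0.4, 0.3, 1.0]])
```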

The simulation study is conducted as follows. First, the four-variable ($p$ = 4) design matrix $X$ is generated for each type of data generation. For DG, the correlation levels 0.95, 0.99, and 0.999 are considered, and for CG the two predetermined correlation matrices are used. Then, for both types of data generation, the following procedure is applied in the same manner. Each column of $X$ is centered and standardized by calculating the z-score and dividing by $\sqrt{n-1}$. The dependent variable is also standardized. Thus, $X'X$ and $X'y$ are in correlation form. The true coefficient vector $\beta$ is chosen as the normalized eigenvector corresponding to the largest eigenvalue of the $X'X$ matrix. The values of the other factors in the simulation study are the sample sizes $n$ = 30, 50, and 100 and several levels of the standard deviation $\sigma$ of the error term, ranging from 0.1 to 10.
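A sketch of the standardization step and the choice of $\beta$ (helper names are ours):

```python
import numpy as np

def standardize_columns(A):
    """Center and scale so that A'A is in correlation form (unit column lengths)."""
    n = A.shape[0]
    return (A - A.mean(axis=0)) / (A.std(axis=0, ddof=1) * np.sqrt(n - 1))

def true_beta(Xs):
    """beta = normalized eigenvector of the largest eigenvalue of X'X."""
    lam, Q = np.linalg.eigh(Xs.T @ Xs)   # eigenvalues in ascending order
    return Q[:, -1]
```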

The observations of the dependent variable are generated by equation (9),

$$y_i = x_i'\beta + \varepsilon_i, \qquad (9)$$

where the $\varepsilon_i$ are an independent and identically distributed sample from the normal distribution with zero mean and variance $\sigma^2$.

For the selected data generation type and values of $n$ and $\sigma$, different samples are generated using equation (9), and the simulation is repeated 10,000 times. For each estimator, the average MSE (AMSE) given in equation (10),

$$\operatorname{AMSE} = \frac{1}{10000} \sum_{t=1}^{10000} (\hat{\beta}_t - \beta)'(\hat{\beta}_t - \beta), \qquad (10)$$

is used as the performance evaluation criterion, where $\hat{\beta}_t$ is the estimated parameter vector of the ridge estimator at the $t$th replication for the true parameter vector $\beta$.
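A condensed, hedged sketch of one simulation cell (it reuses the helpers sketched above; `choose_k` stands for whichever rule selects the ridge parameter, and a smaller `reps` can be used for quick checks):

```python
import numpy as np

def amse(n, sigma, gen_X, choose_k, reps=10000, seed=1):
    """Average MSE of a ridge estimator over Monte Carlo replications."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(reps):
        Xs = standardize_columns(gen_X(n, rng))   # helper sketched above
        beta = true_beta(Xs)                      # top eigenvector of X'X
        y = Xs @ beta + rng.normal(0.0, sigma, n) # (the paper also standardizes y; omitted here)
        k = choose_k(Xs, y)                       # any k-selection rule
        b = ridge_estimator(Xs, y, k)             # helper from the Section 2 sketch
        total += np.sum((b - beta) ** 2)
    return total / reps
```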

3.2. Results and Discussion

The results of the simulation based on DG are given in Tables 2–4. We observed from the DG type results that the proposed estimator has the lowest AMSE value for all given values of the correlation level, $n$, and $\sigma$. The OLS estimator performed the worst in terms of the AMSE criterion. We also observed from Tables 2–4 that, as the degree of correlation increases, the AMSEs of several of the estimators decrease for all sample sizes, except for $\sigma$ = 0.1 and $n$ = 30.

Generally, as the standard deviation of the error terms increases, the AMSE values also increase. However, the change in the AMSE values of some of the estimators is irregular. As the standard deviation of the error terms increases, the proposed estimator is the most homogeneous estimator in terms of AMSE values for the correlation levels 0.95 and 0.99, while it is the second most homogeneous for 0.999.

We observed from the CG type results given in Tables 5–7 that the proposed estimator has the lowest AMSE value for all values of $n$ and $\sigma$ and for both correlation structures. The OLS estimator again performed the worst among the estimators in terms of the AMSE criterion.

It is also observed that, as the number of observations increases, the AMSE values of two of the estimators increase for each correlation structure and each $\sigma$ value. However, as the number of observations increases for $\sigma$ = 0.1, 5, and 10, the AMSE of the proposed estimator decreases for both correlation structures.

Generally, we conclude that when $\sigma$ increases, the AMSE values of all the estimators increase for all sample sizes and both correlation structures. It is also observed that the AMSE values are higher when the correlation matrix has high variability.

4. Real Data Examples

In the previous section, a detailed simulation study was presented to compare the performances of the selected estimators. To see their performance on real data, two real data examples are given. The presence of collinearity is determined by the condition number (CN), defined as the square root of the ratio of the largest eigenvalue of the $X'X$ matrix to the smallest eigenvalue. Generally, CN values between 10 and 30 indicate moderate multicollinearity, and CN values greater than 30 are accepted as indicative of strong collinearity [2]. Multicollinearity is high if the CN is between 30 and 100 and severe when it is greater than 100 [13].
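A one-function sketch of this diagnostic (naming is ours):

```python
import numpy as np

def condition_number(X):
    """CN = sqrt(lambda_max / lambda_min) of X'X; values above 30 signal strong collinearity."""
    lam = np.linalg.eigvalsh(X.T @ X)
    return np.sqrt(lam.max() / lam.min())
```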

Unlike in the Monte Carlo studies, the performance of the estimators was evaluated with the k-fold cross-validation (CV) method, considering that the true values of the model parameters are not known in real data applications. In the CV method, the data set is randomly divided into a number of parts (folds), each consisting of selected rows of $X$ and $y$. For the partitions created, the data set is thus repeatedly split into training and test data, and the model estimated from the training data is used for predicting the test data [21]. The CV statistic given in equation (11),

$$\operatorname{CV} = \frac{1}{n} \sum_{i} \sum_{j} (y_{ij} - \hat{y}_{ij})^2, \qquad (11)$$

where $y_{ij}$ is the $j$th observation in the $i$th fold and $\hat{y}_{ij}$ is the prediction of the $j$th observation in the $i$th fold using the remaining folds as training data, is used for evaluating the performance of each ridge estimator.
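A hedged sketch of this procedure (it reuses `ridge_estimator` from the Section 2 sketch; the fold count and seed are arbitrary):

```python
import numpy as np

def kfold_cv(X, y, k_ridge, n_folds=5, seed=2):
    """k-fold CV: average squared prediction error over held-out folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    err = 0.0
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        b = ridge_estimator(X[train], y[train], k_ridge)
        err += np.sum((y[test] - X[test] @ b) ** 2)
    return err / len(y)
```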

4.1. Gasoline Consumption Data

Gasoline consumption and automotive data are used in the first example [22]. The variables considered in the model are miles per gallon (the response), displacement (cubic inches), torque (foot-pounds), carburetor (barrels), number of transmission speeds, and overall length (inches). Correlations between all the variables were examined; correlations ranging from moderate to high levels are observed among the independent variables. The condition number is 1132, which indicates a serious concern about multicollinearity. The results of the CV for different partitions are given in Table 8.

4.2. Stack Loss Data

We used Brownlee’s stack loss data, which contain 21 days of data on the oxidation of ammonia to nitric acid, in the second example [23]. The response variable and the independent variables considered in the model are NH3 stack loss percentage, air flow, cooling water inlet temperature, and acid concentration, respectively. Correlations between all the variables were examined, and correlations ranging from moderate to high levels are observed among the independent variables. The condition number is 57.51, which indicates a serious concern about multicollinearity. The results of the CV for two partitions are given in Table 9.

According to the real data application results in Tables 8 and 9, the proposed estimator has the smallest CV value among all estimators for both the gasoline consumption and stack loss data examples. One of the existing estimators has the largest CV in both applications for all partitions and performed the worst among all the estimators.

A total of 75 different scenarios were evaluated in the simulations, 45 for DG and 30 for CG. The proposed estimator performed best considering the number of times it placed in the top three across the scenarios in terms of AMSE. When Figure 1 is examined, it is seen that the rankings of the other estimators for both DG and CG are inconsistent. We observe that the second-best estimator after the proposed one differs between DG and CG.

5. Conclusion

In this paper, we introduced a new type of iterative ridge estimator. In previous studies, the performance of estimators was discussed only for equal relationships between the independent variables [1, 2, 6, 8, 9]. In this study, the performances of the estimators were also compared for relationships between the independent variables at different levels. We evaluated the performance of the proposed estimator via an extensive Monte Carlo simulation study and two real data examples. Results from DG, which is used as the standard data generation technique in similar studies, and from CG, which provides data more similar to real data structures, were compared. Generally, in the literature, the performance of ridge estimators on real data sets is compared via the estimated MSE [1–3, 6–10]. A comparison based on the mean squared error (MSE) is appropriate in simulation studies, since the actual values of the parameters are known there. However, since the actual values of the parameters are unknown in real data sets, a performance comparison by estimated MSE based on a single sample may cause erroneous inferences, which casts doubt on the reported performances of the existing estimators. For this reason, we compared the performance of the estimators in the real data applications with the k-fold cross-validation method. The results show that the proposed estimator outperforms the others in the sense of AMSE in the simulation study and in the sense of k-fold cross-validation in the real data examples. Therefore, we suggest that researchers who encounter the multicollinearity problem use the proposed estimator as an alternative to the other ridge estimators examined in this study.

In the future, this work may be extended by comparing the proposed estimator with other estimators for multiple linear regression. In these comparisons, the presence of multicollinearity and outliers can be evaluated simultaneously. The performance of the proposed estimator in different regression models, such as gamma regression, can be examined. The proposed estimator can also be used to obtain new estimators using approaches such as the deleted-d jackknife of [17].

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The author declares that there are no conflicts of interest.