Abstract

The general linear regression model has been one of the most frequently used models over the years, with the ordinary least squares (OLS) estimator used to estimate its parameters. The problems that affect the OLS estimator in linear regression analysis include multicollinearity and outliers, both of which lead to unfavourable results. This study proposes a two-parameter ridge-type modified M-estimator (RTMME), based on the M-estimator, to deal with the combined problem of multicollinearity and outliers. Through theoretical results, a Monte Carlo simulation, and a numerical example, we show that the proposed estimator outperforms the modified ridge-type estimator and the other existing estimators considered.

1. Introduction

A multiple linear regression model can be defined mathematically as
$$y = X\beta + \varepsilon,$$
where $y$ is an $n \times 1$ vector of observations referred to as the dependent variable; $X$ is a known $n \times p$ standardized and centered explanatory variable matrix of full column rank; $\beta$ is a $p \times 1$ vector of unknown parameters; and $\varepsilon$ is an $n \times 1$ vector of disturbances with $E(\varepsilon) = 0$ and dispersion matrix $E(\varepsilon\varepsilon') = \sigma^2 I$, where $I$ is the $n \times n$ identity matrix. The ordinary least squares estimator (OLSE) of $\beta$ is given as
$$\hat{\beta} = (X'X)^{-1}X'y.$$
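As a concrete illustration of the OLSE above, a minimal numpy sketch (the data are simulated for illustration, and `ols` is a hypothetical helper name, not from the paper):

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares via the normal equations: (X'X)^{-1} X'y."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Simulated illustration: recover known coefficients from noisy data.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta + 0.1 * rng.standard_normal(100)
beta_hat = ols(X, y)
```

With well-conditioned $X'X$ this matches a least-squares solver; the sections that follow concern what happens when $X'X$ is nearly singular.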

According to the Gauss–Markov theorem, the OLS estimator is the best linear unbiased estimator (BLUE), possessing minimum variance in the class of all unbiased linear estimators [1, 2]. However, its performance is imprecise in the presence of multicollinearity [3]. Biased estimators such as the ridge regression estimator [4], the Liu estimator [5], the Stein estimator [6], the principal component estimator [7], the modified ridge regression estimator [8], and others are often employed to tackle this problem. Another factor whose presence can negatively influence the regression coefficients of the OLS estimator is the outlier. The general practice in the literature is to adopt robust estimators as an alternative to the OLS estimator. The M-estimator is popularly used to handle outliers in the y-direction [9].

Hoerl and Kennard [4] defined the ridge estimator (RE) as
$$\hat{\beta}_R(k) = (X'X + kI)^{-1}X'y, \quad k > 0.$$
However, the RE can be sensitive to outliers in the y-direction; a remedial measure is the ridge M-estimator (RME) suggested by Silvapulle [10] as
$$\hat{\beta}_{RM}(k) = (X'X + kI)^{-1}X'X\hat{\beta}_M,$$
where $\hat{\beta}_M$ is the M-estimator of β [11].
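Both shrinkage forms are mechanical to compute; a sketch assuming the RME takes Silvapulle's form $(X'X + kI)^{-1}X'X\hat{\beta}_M$ applied to an already-computed M-estimate (the helper names are hypothetical):

```python
import numpy as np

def ridge(X, y, k):
    """Hoerl-Kennard ridge estimator: (X'X + kI)^{-1} X'y; k = 0 recovers OLS."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

def ridge_m(X, beta_m, k):
    """Ridge M-estimator: shrink a given M-estimate beta_m
    through (X'X + kI)^{-1} X'X beta_m (Silvapulle's form)."""
    p = X.shape[1]
    XtX = X.T @ X
    return np.linalg.solve(XtX + k * np.eye(p), XtX @ beta_m)
```

In canonical coordinates each coefficient is multiplied by $\lambda_i/(\lambda_i + k) < 1$, so any $k > 0$ strictly shrinks the coefficient vector.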

Dorugade [12] modified the ridge estimator by introducing d as an additional biasing parameter alongside k.

Following Dorugade [12], Lukman et al. [13] modified the ridge estimator in (3) and called it the modified ridge-type estimator (MRT). The estimator is defined as
$$\hat{\beta}_{MRT}(k, d) = (X'X + k(1+d)I)^{-1}X'y,$$
where $k > 0$ and $0 < d < 1$.
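Assuming the MRT penalty is $k(1+d)$ as in Lukman et al. [13], the estimator is a one-line modification of ridge; a sketch (`mrt` is a hypothetical helper name):

```python
import numpy as np

def mrt(X, y, k, d):
    """Modified ridge-type estimator: (X'X + k(1 + d)I)^{-1} X'y.
    Mechanically, it is an ordinary ridge estimator with penalty k(1 + d)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * (1 + d) * np.eye(p), X.T @ y)
```

Setting d = 0 recovers the ordinary ridge estimator with the same k; more generally, MRT with (k, d) coincides with ridge at penalty k(1 + d).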

The organization of the paper is as follows. We propose the new estimator in Section 2 and provide a theoretical comparison among the estimators in Section 3. We discuss the robust choice of the biasing parameters in Section 4 and conduct simulation studies in Section 5 to evaluate the performance of the proposed estimator. A real-life data set is analyzed in Section 6 to illustrate the findings of the paper, and Section 7 gives some concluding remarks.

2. A New Estimator

The presence of outliers in the y-direction affects the performance of the MRT estimator. Therefore, we suggest a ridge-type modified M-estimator (RTMME), defined as
$$\hat{\beta}_{RTMME}(k, d) = (X'X + k(1+d)I)^{-1}X'X\hat{\beta}_M,$$
where $k > 0$ and $0 < d < 1$. $\hat{\beta}_{RTMME}$ is a general estimator, which includes $\hat{\beta}_M$ (for k = 0) and $\hat{\beta}_{RM}$ (for d = 0) as special cases:
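The closed form of the RTMME is elided in the text above; the sketch below assumes it applies the $k(1+d)$ ridge-type shrinkage to a Huber M-estimate, which reduces to the M-estimator at k = 0 and to the RME at d = 0. The M-estimate is computed by iteratively reweighted least squares with a MAD scale; `huber_m` and `rtmme` are hypothetical helper names:

```python
import numpy as np

def huber_m(X, y, c=1.345, n_iter=100):
    """Huber M-estimate of beta via iteratively reweighted least squares,
    using the MAD of the residuals as the scale estimate."""
    beta = np.linalg.solve(X.T @ X, X.T @ y)          # OLS starting value
    for _ in range(n_iter):
        r = y - X @ beta
        s = np.median(np.abs(r - np.median(r))) / 0.6745   # MAD scale
        u = r / s
        w = np.ones_like(u)                            # Huber weights
        big = np.abs(u) > c
        w[big] = c / np.abs(u[big])
        Xw = X * w[:, None]
        beta = np.linalg.solve(Xw.T @ X, Xw.T @ y)     # weighted LS step
    return beta

def rtmme(X, y, k, d):
    """RTMME (assumed form): (X'X + k(1+d)I)^{-1} X'X beta_M,
    i.e., ridge-type shrinkage applied to the M-estimate."""
    beta_m = huber_m(X, y)
    p = X.shape[1]
    XtX = X.T @ X
    return np.linalg.solve(XtX + k * (1 + d) * np.eye(p), XtX @ beta_m)
```

The Huber step downweights large scaled residuals, so a single gross y-outlier has little influence on the final estimate.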

The canonical form of model (1) is written as
$$y = Z\alpha + \varepsilon,$$
where $Z = XQ$, $\alpha = Q'\beta$, and $Q$ is the orthogonal matrix whose columns contain the eigenvectors of $X'X$. Then,
$$Z'Z = Q'X'XQ = \Lambda = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_p),$$
where $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p > 0$ are the ordered eigenvalues of $X'X$.
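The diagonalization underlying the canonical form is easy to verify numerically; a short sketch on simulated data:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((50, 3))
lam, Q = np.linalg.eigh(X.T @ X)   # eigenvalues (ascending) and eigenvectors of X'X
Z = X @ Q                          # canonical regressors: Z'Z = Lambda (diagonal)
```

Note that `np.linalg.eigh` returns the eigenvalues in ascending order, whereas the paper indexes them in descending order; this is only a relabeling of the canonical coordinates.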

Let $\hat{\beta}_M$ be defined by the solution of the M-estimating equations $\sum_{i=1}^{n} x_i\,\psi\!\left((y_i - x_i'\beta)/s\right) = 0$, where $s$ is an estimator of scale for the errors and $\psi$ is some suitably chosen function [14]. The estimators presented in equations (2)–(7) can be written in canonical form as follows:
$$\hat{\alpha}_R(k) = (\Lambda + kI)^{-1}\Lambda\hat{\alpha}, \qquad \hat{\alpha}_{RM}(k) = (\Lambda + kI)^{-1}\Lambda\hat{\alpha}_M,$$
$$\hat{\alpha}_{MRT}(k, d) = (\Lambda + k(1+d)I)^{-1}\Lambda\hat{\alpha}, \qquad \hat{\alpha}_{RTMME}(k, d) = (\Lambda + k(1+d)I)^{-1}\Lambda\hat{\alpha}_M,$$
where $\hat{\alpha} = \Lambda^{-1}Z'y$, $\hat{\alpha}_M = Q'\hat{\beta}_M$, and $\Lambda = Z'Z$.

3. Superiority of the New Estimator

The mean square error (MSE) criterion is used to compare the performance of the estimators. The following conditions are imposed to present our main theorems:
(i) $\psi$ is skew-symmetric and nondecreasing;
(ii) the errors are symmetrically distributed;
(iii) $E[\psi^2]$ is finite.

Note that any estimator $\hat{\alpha}^*$ of α has a corresponding estimator $\hat{\beta}^* = Q\hat{\alpha}^*$ of β such that $MSE(\hat{\beta}^*) = MSE(\hat{\alpha}^*)$. Thus, it is sufficient to consider the canonical form only. The MSEs of the aforementioned estimators are derived to be
$$MSE(\hat{\alpha}) = \sigma^2\sum_{i=1}^{p}\frac{1}{\lambda_i}, \qquad MSE(\hat{\alpha}_M) = \sum_{i=1}^{p} c_{ii}, \qquad MSE(\hat{\alpha}_{RM}) = \sum_{i=1}^{p}\frac{\lambda_i^2 c_{ii} + k^2\alpha_i^2}{(\lambda_i + k)^2},$$
$$MSE(\hat{\alpha}_{MRT}) = \sum_{i=1}^{p}\frac{\sigma^2\lambda_i + k^2(1+d)^2\alpha_i^2}{(\lambda_i + k(1+d))^2}, \qquad MSE(\hat{\alpha}_{RTMME}) = \sum_{i=1}^{p}\frac{\lambda_i^2 c_{ii} + k^2(1+d)^2\alpha_i^2}{(\lambda_i + k(1+d))^2},$$
where $c_{ii}$ are the diagonal elements of $\operatorname{Cov}(\hat{\alpha}_M)$.

Theorem 1. If $c_{ii} < \sigma^2/\lambda_i$ for i = 1, 2, …, p, then $MSE(\hat{\alpha}_{RTMME}) < MSE(\hat{\alpha}_{MRT})$ for every k > 0, where $c_{ii}$ are the diagonal elements of $\operatorname{Cov}(\hat{\alpha}_M)$.

Proof. After some algebraic manipulation, the difference between the two MSEs gives
$$MSE(\hat{\alpha}_{RTMME}) - MSE(\hat{\alpha}_{MRT}) = \sum_{i=1}^{p}\frac{\lambda_i^2}{(\lambda_i + k(1+d))^2}\left(c_{ii} - \frac{\sigma^2}{\lambda_i}\right).$$
For equation (14) to be less than zero, we should have $c_{ii} < \sigma^2/\lambda_i$ for i = 1, 2, …, p; the difference is then negative for every k > 0.

Theorem 2. When , there exists a positive constant , where

Proof. The difference is strictly less than zero if and only if, after simplification, the following expression holds. Solving inequality (17) for k, we get the stated bound. Notice that the bound is positive under the condition of the theorem, and there is, therefore, a positive constant $k_{1i}$.

Theorem 3. A necessary condition for is

Proof. To obtain the result, note that Theorem 2 provided $k_{1i}$ such that the stated inequality holds. Besides this, in the Theorem of Silvapulle [10] (part (i), p. 321), it is indicated that there exists $k > 0$ such that the corresponding MSE inequality holds. Thus, we obtain the corollary as follows.

Corollary. There exists k > 0 such that.

4. Robust Choice of k and d for the Proposed Estimator

The optimal values of the robust biasing parameters k and d for the modified two-parameter estimator can be determined by minimizing equation (23) with respect to each of the parameters:

This can be obtained by solving

By doing this, we have

We substitute the unknown quantities in equations (24) and (25) with their corresponding estimates. We assume that $\hat{\alpha}_M$ is asymptotically normally distributed with mean α. The estimate $\hat{\sigma}^2(\psi)$ of the variance of the M-estimator, with scale estimate $s$ and residuals $e_i$, is given by Huber [9] as
$$\hat{\sigma}^2(\psi) = \frac{n}{n-p}\cdot\frac{\frac{1}{n}\sum_{i=1}^{n}\psi^2(e_i/s)}{\left[\frac{1}{n}\sum_{i=1}^{n}\psi'(e_i/s)\right]^2}\,s^2.$$
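The exact expression used in the paper is elided above; the sketch below implements one common variant of Huber's variance estimate with the $n/(n-p)$ degrees-of-freedom correction and the Huber ψ-function (the helper name is hypothetical):

```python
import numpy as np

def huber_sigma2(residuals, s, p, c=1.345):
    """One common variant of Huber's variance estimate for an M-estimator:
    (n / (n - p)) * s^2 * mean(psi(u)^2) / mean(psi'(u))^2, with Huber psi."""
    u = np.asarray(residuals) / s
    psi = np.clip(u, -c, c)                       # Huber psi function
    psi_prime = (np.abs(u) <= c).astype(float)    # derivative of psi
    n = u.size
    return (n / (n - p)) * s**2 * np.mean(psi**2) / np.mean(psi_prime)**2
```

When every scaled residual lies inside the clipping constant c, ψ(u) = u and ψ′(u) = 1, so the estimate reduces to the classical $\frac{n}{n-p}\cdot\frac{1}{n}\sum e_i^2$.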

We get the optimal estimators of k and d as

Following Kibria [15], the arithmetic and geometric mean version of k is obtained, respectively, as

The harmonic mean version is generally preferred to other versions [3]. Hence, the robust harmonic mean version of the proposed d and k from (31) and (32) is obtained as

The selection of the estimates of the parameters d and k can be obtained iteratively as follows:
Step 1: obtain an initial estimate for d.
Step 2: from (33), get $\hat{k}$ using d from Step 1.
Step 3: calculate $\hat{d}$ in (34) by using $\hat{k}$ from Step 2.
Step 4: use $\hat{d}$ in Step 1 if $\hat{d} < 0$.

5. Monte Carlo Simulation Study

We adopted the simulation design of McDonald and Galarneau [16], Kibria [15], and Lukman et al. [17]. The explanatory variables are generated using the following equation:
$$x_{ij} = (1 - \rho^2)^{1/2} z_{ij} + \rho z_{i,p+1}, \quad i = 1, 2, \ldots, n, \quad j = 1, 2, \ldots, p,$$
where ρ denotes the correlation between the explanatory variables and the $z_{ij}$ are pseudo-random numbers from the standard normal distribution. The coefficients $\beta_1, \ldots, \beta_p$ are selected as the elements of the normalized eigenvector corresponding to the largest eigenvalue of $X'X$ so that we have $\beta'\beta = 1$, which is a common restriction in simulation studies of this type [3]. The dependent variable is then determined using
$$y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i, \quad i = 1, 2, \ldots, n,$$
where the error terms $\varepsilon_i$ are generated with mean 0 and variance $\sigma^2$. We fixed the number of explanatory variables at three and seven (p = 3, 7), while the other parameters ρ, σ, and n were varied; among the values considered, the degree of correlation ρ goes up to 0.99.
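The design above can be sketched directly; with this construction any two generated regressors have correlation $\rho^2$, which is easy to check empirically (`make_X` is a hypothetical helper name):

```python
import numpy as np

def make_X(n, p, rho, rng):
    """McDonald-Galarneau design: x_ij = sqrt(1 - rho^2) z_ij + rho z_{i,p+1},
    giving pairwise correlation rho^2 between the explanatory variables."""
    Z = rng.standard_normal((n, p + 1))
    return np.sqrt(1 - rho**2) * Z[:, :p] + rho * Z[:, [p]]

rng = np.random.default_rng(5)
X = make_X(5000, 3, 0.99, rng)
# beta: normalized eigenvector of the largest eigenvalue of X'X, so beta'beta = 1
lam, V = np.linalg.eigh(X.T @ X)
beta = V[:, -1]
```

With ρ = 0.99 the pairwise correlation is ρ² ≈ 0.98, which is the severe-multicollinearity regime studied in the tables.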

We considered three different cases in this study:
Case I: no outlier.
Case II: one outlier.
Case III: two outliers.

In the case of no outliers, equation (36) is taken into consideration. For the case of one outlier, the tenth observation is modified; for the case of two outliers, the fifth and the tenth observations are modified. The experiment is replicated 2,000 times by generating new pseudo-random numbers, and the estimated MSE is calculated as
$$\widehat{MSE}(\hat{\beta}^*) = \frac{1}{2000}\sum_{r=1}^{2000}\left(\hat{\beta}^*_{r} - \beta\right)'\left(\hat{\beta}^*_{r} - \beta\right),$$
where $\hat{\beta}^*_{r}$ denotes the estimate of β obtained in the r-th replication.
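To make the replication loop concrete, the sketch below estimates the MSE by averaging $(\hat{\beta} - \beta)'(\hat{\beta} - \beta)$ over replications, here comparing only OLS and ridge under the collinear design (the case structure and outlier injection are omitted; `simulate_mse` is a hypothetical name):

```python
import numpy as np

def simulate_mse(n=50, p=3, rho=0.99, sigma=1.0, k=0.5, reps=2000, seed=6):
    """Estimated MSE over replications: the average of
    (beta_hat - beta)'(beta_hat - beta), for OLS versus ridge."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n, p + 1))
    X = np.sqrt(1 - rho**2) * Z[:, :p] + rho * Z[:, [p]]   # fixed collinear design
    lam, V = np.linalg.eigh(X.T @ X)
    beta = V[:, -1]                                        # beta'beta = 1
    sse_ols = sse_ridge = 0.0
    for _ in range(reps):
        y = X @ beta + sigma * rng.standard_normal(n)
        b_ols = np.linalg.solve(X.T @ X, X.T @ y)
        b_rid = np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)
        sse_ols += (b_ols - beta) @ (b_ols - beta)
        sse_ridge += (b_rid - beta) @ (b_rid - beta)
    return sse_ols / reps, sse_ridge / reps
```

Under strong collinearity the small eigenvalues of $X'X$ inflate the OLS variance, so the ridge MSE comes out well below the OLS MSE in this setting.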

The results of the simulation are presented in Tables 1–18. As expected, the OLSE shows the poorest performance. The following observations are also made:
(i) As the error standard deviation (σ) and the degree of multicollinearity (ρ) increase, the MSEs of the estimators increase.
(ii) As the biasing parameters k and d increase, the MSEs of the shrinkage estimators decrease.
(iii) The MSEs of the estimators decrease as the sample size increases. However, as the number of outliers increases, the MSEs also increase.
(iv) Finally, just as in the outcome of Lukman et al. [13], the MRT estimator outperforms the other considered estimators in the case of no outliers. However, when outliers are introduced, the RTMME outperforms the other considered estimators.

6. Numerical Example

The Hussein data were originally adopted by Hussein and Zari [18] and later by Lukman et al. [19, 20]. The data contain 31 observations and three explanatory variables. The variance inflation factors for the three explanatory variables are greater than 10 (VIF > 10), which indicates multicollinearity. Lukman et al. [20] identified observations 12, 14, 15, 16, 30, and 31 as outliers in the y-direction. Hence, this data set suffers from both problems considered in this study. Table 19 shows that the estimated MSE of the new estimator, RTMME, is smaller than those of the ridge, M-ridge, and MRT estimators.

The theoretical results in this study are validated through the Hussein data as follows:
(i) The condition of Theorem 1 holds for every k > 0; this implies that the RTMME outperforms the MRT estimator.
(ii) From Theorem 2, $k_{1i}$ is calculated as [0.2470, 0.0520, 0.0519, 0.0002]. For k > 1.2980 (k > $k_{1i}$), the result of Theorem 2 holds.
(iii) The necessary condition from Theorem 3 is also verified numerically.

7. Conclusion

We proposed a two-parameter ridge-type modified M-estimator to jointly handle the problems of multicollinearity and outliers in the linear regression model. Theoretically, the new estimator outperforms the existing estimators under certain conditions. The results of the simulation study and the numerical example agree with the theoretical findings. An appropriate choice of k and d also produces better estimates with the proposed estimator. Thus, in the presence of multicollinearity and outliers, this estimator can effectively replace the following estimators: the ordinary least squares estimator, the M-estimator, the ridge estimator, the M-ridge estimator, and the modified ridge-type estimator.

Data Availability

The data used to support the findings of this study are available on page 7 of [19].

Conflicts of Interest

The authors declare that they have no conflicts of interest.