A Stochastic Restricted Principal Components Regression Estimator in the Linear Model
We propose a new estimator to combat multicollinearity in the linear model when there are stochastic linear restrictions on the regression coefficients. The new estimator, called the stochastic restricted principal components (SRPC) regression estimator, is constructed by combining the ordinary mixed estimator (OME) and the principal components regression (PCR) estimator. Necessary and sufficient conditions for the superiority of the SRPC estimator over the OME and the PCR estimator are derived under the mean squared error matrix criterion. Finally, a numerical example and a Monte Carlo study are given to illustrate the performance of the proposed estimator.
In linear regression analysis, the presence of multicollinearity among regressor variables may cause highly unstable least squares estimates of the regression parameters. With multicollinear data, some coefficients may be statistically insignificant and may even have the wrong signs. Different remedial methods have been proposed to overcome this problem. One technique designed to combat collinearity is the use of biased estimators, the most notable of which are the Stein estimator by Stein, the principal components regression (PCR) estimator by Massy, the ordinary ridge regression (ORR) estimator by Hoerl and Kennard, and the Liu estimator by Liu. Another way to combat multicollinearity is to collect and use additional information, which can take the form of exact or stochastic restrictions. For stochastic linear restrictions, Durbin, Theil and Goldberger, and Theil proposed the ordinary mixed estimator (OME) by combining the sample model with the stochastic restrictions. Other important references on this subject include Li and Yang [9, 10], Xu and Yang, Yang and Cui, Yang and Wu, and Yang and Xu.
In this paper, we introduce a stochastic restricted principal components (SRPC) regression estimator, which is defined by combining the ordinary mixed estimator and the principal components regression estimator in a special way. We compare the new estimator with the PCR estimator and with the OME under the mean squared error matrix (MSEM) criterion.
The rest of the paper is organized as follows. In Section 2, the new estimator is introduced. In Section 3, some properties of the new estimator are discussed. A numerical example and a Monte Carlo simulation study are given in Section 4.
2. The New Estimator
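Let us consider the general linear model
$$y = X\beta + \varepsilon, \tag{1}$$
where $y$ is an $n \times 1$ observable random vector with expectation $X\beta$ and covariance matrix $\sigma^2 I_n$, $X$ is an $n \times p$ known design matrix of rank $p$, $I_n$ is the identity matrix of order $n$, $\beta$ is a $p \times 1$ vector of unknown parameters, and $\varepsilon$ is an $n \times 1$ vector of random errors. As is well known, the ordinary least squares estimator (OLSE) of $\beta$ is
$$\hat{\beta} = S^{-1}X'y,$$
where $S = X'X$.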
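Let $T = (t_1, \ldots, t_p)$ be a $p \times p$ orthogonal matrix such that $T'X'XT = \Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_p)$, where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p > 0$ are the eigenvalues of $X'X$. Further, let $T_r = (t_1, \ldots, t_r)$ be the remaining columns of $T$ after deleting the last $p - r$ columns, where $r \le p$. Thus, we have
$$T_r'X'XT_r = \Lambda_r,$$
where $\Lambda_r = \operatorname{diag}(\lambda_1, \ldots, \lambda_r)$. The PCR estimator of $\beta$ is then
$$\hat{\beta}_r = T_r\left(T_r'X'XT_r\right)^{-1}T_r'X'y = T_r\Lambda_r^{-1}T_r'X'y.$$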
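A minimal NumPy sketch of the PCR estimator $\hat{\beta}_r = T_r\Lambda_r^{-1}T_r'X'y$, assuming NumPy is available; the helper name `pcr_estimator` and the synthetic data are illustrative, not taken from the paper. It orders the eigenvalues of $X'X$ decreasingly, keeps the first $r$ eigenvectors, and checks two consequences of the definition: with $r = p$ the PCR estimator equals the OLSE, and in general it equals the projection $T_rT_r'\hat{\beta}$ of the OLSE.

```python
import numpy as np

def pcr_estimator(X, y, r):
    """PCR estimator T_r Lambda_r^{-1} T_r' X'y using the r largest eigenvalues of X'X."""
    S = X.T @ X
    eigvals, T = np.linalg.eigh(S)            # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]         # reorder so lambda_1 >= ... >= lambda_p
    eigvals, T = eigvals[order], T[:, order]
    T_r = T[:, :r]                            # keep the first r columns of T
    return T_r @ np.diag(1.0 / eigvals[:r]) @ T_r.T @ X.T @ y

# Synthetic example (illustrative values, not the paper's data)
rng = np.random.default_rng(0)
n, p = 50, 4
X = rng.standard_normal((n, p))
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + 0.1 * rng.standard_normal(n)

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)  # OLSE S^{-1} X'y
beta_pcr_full = pcr_estimator(X, y, r=p)      # r = p: should coincide with the OLSE
beta_pcr_2 = pcr_estimator(X, y, r=2)         # drop the two smallest components

# The PCR estimator is the projection T_r T_r' beta_ols of the OLSE
eigvals, T = np.linalg.eigh(X.T @ X)
T_2 = T[:, np.argsort(eigvals)[::-1][:2]]
projected_ols = T_2 @ T_2.T @ beta_ols
```

The projection identity follows from $T_rT_r'S^{-1} = T_r\Lambda_r^{-1}T_r'$, which is why discarding components can only shrink the estimate toward the leading eigen-directions.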
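In addition to model (1), suppose that prior information about $\beta$ is available in the form of a set of $m$ independent stochastic linear restrictions:
$$h = R\beta + e,$$
where $h$ is an $m \times 1$ vector, $R$ is an $m \times p$ matrix of rank $m$, $e$ is an $m \times 1$ vector of disturbances with $E(e) = 0$ and $\operatorname{Cov}(e) = \sigma^2 W$, and $W$ is assumed to be known and positive definite. Furthermore, the random vector $e$ is assumed to be independent of $\varepsilon$. Combining model (1) with these restrictions gives the OME
$$\hat{\beta}_{\mathrm{OME}} = \left(X'X + R'W^{-1}R\right)^{-1}\left(X'y + R'W^{-1}h\right) = \hat{\beta} + (X'X)^{-1}R'\left(W + R(X'X)^{-1}R'\right)^{-1}\left(h - R\hat{\beta}\right),$$
where $\hat{\beta}$ is the OLSE.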
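The OME $(S + R'W^{-1}R)^{-1}(X'y + R'W^{-1}h)$, with $S = X'X$ and the restrictions written as $h = R\beta + e$, can equivalently be written as $\hat{\beta} + S^{-1}R'(W + RS^{-1}R')^{-1}(h - R\hat{\beta})$, the form in which the OLSE appears explicitly. The NumPy sketch below checks this equivalence numerically; the random restriction matrix `R`, the choice `W = I`, and the helper names are illustrative assumptions, not the paper's restriction.

```python
import numpy as np

def ome_direct(X, y, R, h, W):
    """Mixed estimator (S + R'W^{-1}R)^{-1}(X'y + R'W^{-1}h)."""
    W_inv = np.linalg.inv(W)
    return np.linalg.solve(X.T @ X + R.T @ W_inv @ R, X.T @ y + R.T @ W_inv @ h)

def ome_update(X, y, R, h, W):
    """Equivalent form beta_hat + S^{-1}R'(W + R S^{-1} R')^{-1}(h - R beta_hat)."""
    S_inv = np.linalg.inv(X.T @ X)
    beta_hat = S_inv @ X.T @ y                       # OLSE
    gain = S_inv @ R.T @ np.linalg.inv(W + R @ S_inv @ R.T)
    return beta_hat + gain @ (h - R @ beta_hat)

rng = np.random.default_rng(1)
n, p, m = 40, 4, 2
X = rng.standard_normal((n, p))
beta = np.array([1.0, -0.5, 2.0, 0.3])
y = X @ beta + rng.standard_normal(n)
R = rng.standard_normal((m, p))         # hypothetical restriction matrix
W = np.eye(m)                           # hypothetical known W > 0
h = R @ beta + rng.standard_normal(m)   # h = R beta + e

b_direct = ome_direct(X, y, R, h, W)
b_update = ome_update(X, y, R, h, W)
```

The agreement is an instance of the Sherman–Morrison–Woodbury identity $(S + R'W^{-1}R)^{-1}R'W^{-1} = S^{-1}R'(W + RS^{-1}R')^{-1}$.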
Now, the stochastic restricted principal components (SRPC) regression estimator can be obtained by combining the OME and the PCR estimator. Substituting the PCR estimator for the OLSE in (8), we obtain the new estimator as follows:
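It is easy to see that the SRPC estimator is a general estimator that includes the PCR estimator and the OME as special cases: if all $p$ principal components are retained, the PCR estimator reduces to the OLSE and the SRPC estimator reduces to the OME; if no stochastic restrictions are imposed, the SRPC estimator reduces to the PCR estimator.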
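The SRPC formula itself is not reproduced in the text here, so the following is only one literal reading of "substituting the PCR estimator for the OLSE" in the update form of the OME, $\hat{\beta} + S^{-1}R'(W + RS^{-1}R')^{-1}(h - R\hat{\beta})$. The function `srpc_sketch` is hypothetical and may differ from the paper's exact definition. The sketch (NumPy, synthetic data, illustrative `R` and `W`) checks the two special cases: retaining all $p$ components reproduces the OME, and setting $R = 0$ reproduces the PCR estimator.

```python
import numpy as np

def pcr(X, y, r):
    vals, T = np.linalg.eigh(X.T @ X)
    idx = np.argsort(vals)[::-1][:r]               # indices of the r largest eigenvalues
    T_r = T[:, idx]
    return T_r @ ((T_r.T @ X.T @ y) / vals[idx])   # T_r Lambda_r^{-1} T_r' X'y

def ome(X, y, R, h, W):
    W_inv = np.linalg.inv(W)
    return np.linalg.solve(X.T @ X + R.T @ W_inv @ R, X.T @ y + R.T @ W_inv @ h)

def srpc_sketch(X, y, R, h, W, r):
    """Hypothetical SRPC: the OLSE in the OME update form is replaced by the PCR estimator."""
    S_inv = np.linalg.inv(X.T @ X)
    beta_r = pcr(X, y, r)
    gain = S_inv @ R.T @ np.linalg.inv(W + R @ S_inv @ R.T)
    return beta_r + gain @ (h - R @ beta_r)

rng = np.random.default_rng(2)
n, p, m = 40, 4, 2
X = rng.standard_normal((n, p))
beta = np.array([1.0, -0.5, 2.0, 0.3])
y = X @ beta + rng.standard_normal(n)
R = rng.standard_normal((m, p))        # hypothetical restriction matrix
W = np.eye(m)                          # hypothetical known W > 0
h = R @ beta + rng.standard_normal(m)

srpc_all = srpc_sketch(X, y, R, h, W, r=p)                      # all components retained
srpc_no_restr = srpc_sketch(X, y, np.zeros((m, p)), h, W, r=2)  # R = 0: no restriction
```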
For the sake of convenience, we list some notation and the lemmas needed in the following discussion. For an $n \times n$ matrix $M$, $M \ge 0$ means that $M$ is symmetric and positive semidefinite, and $M > 0$ means that $M$ is symmetric and positive definite.
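Note that for any estimator $\tilde{\beta}$ of $\beta$, its MSEM is defined as
$$\operatorname{MSEM}(\tilde{\beta}) = E\left[(\tilde{\beta} - \beta)(\tilde{\beta} - \beta)'\right] = \operatorname{Cov}(\tilde{\beta}) + \operatorname{Bias}(\tilde{\beta})\operatorname{Bias}(\tilde{\beta})',$$
where $\operatorname{Bias}(\tilde{\beta}) = E(\tilde{\beta}) - \beta$ is the bias of $\tilde{\beta}$.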
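To make the MSEM definition concrete, the sketch below uses the PCR estimator, whose bias is $(T_rT_r' - I)\beta$ and whose covariance is $\sigma^2T_r\Lambda_r^{-1}T_r'$, and compares $\operatorname{Cov} + \operatorname{Bias}\operatorname{Bias}'$ with a Monte Carlo average of $(\hat{\beta}_r - \beta)(\hat{\beta}_r - \beta)'$. It assumes NumPy; the design, $\beta$, $\sigma$ and $r$ are illustrative values, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, r, sigma = 30, 3, 2, 0.5
X = rng.standard_normal((n, p))
beta = np.array([0.6, -0.8, 0.0])              # illustrative true parameter

vals, T = np.linalg.eigh(X.T @ X)
order = np.argsort(vals)[::-1]                 # lambda_1 >= ... >= lambda_p
vals, T = vals[order], T[:, order]
T_r = T[:, :r]
Lambda_r_inv = np.diag(1.0 / vals[:r])
A = T_r @ Lambda_r_inv @ T_r.T @ X.T           # the PCR estimator is A @ y

# Exact MSEM = Cov + Bias Bias'
bias = T_r @ T_r.T @ beta - beta
cov = sigma**2 * T_r @ Lambda_r_inv @ T_r.T
msem_exact = cov + np.outer(bias, bias)

# Monte Carlo average of (b - beta)(b - beta)' over N simulated samples
N = 20000
Y = (X @ beta)[:, None] + sigma * rng.standard_normal((n, N))
dev = A @ Y - beta[:, None]
msem_mc = dev @ dev.T / N
```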
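Lemma 1. Let $M > 0$ be an $n \times n$ matrix and $\alpha$ an $n \times 1$ vector; then $M - \alpha\alpha' \ge 0$ if and only if $\alpha'M^{-1}\alpha \le 1$.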
Proof. See Farebrother.
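Lemma 1 is easy to check numerically: rescaling a vector $\alpha$ so that $\alpha'M^{-1}\alpha$ equals 0.5, 1 and 2 should make $M - \alpha\alpha'$ positive semidefinite exactly in the first two cases. A NumPy sketch with a randomly generated $M > 0$ (illustrative only; `is_psd` is a helper defined here):

```python
import numpy as np

def is_psd(A, tol=1e-10):
    """Positive semidefinite check via the smallest eigenvalue of the symmetric part."""
    return np.linalg.eigvalsh((A + A.T) / 2).min() >= -tol

rng = np.random.default_rng(3)
p = 4
B = rng.standard_normal((p, p))
M = B @ B.T + p * np.eye(p)                # a positive definite M
u = rng.standard_normal(p)
q_u = u @ np.linalg.solve(M, u)            # u' M^{-1} u

checks = []
for target in (0.5, 1.0, 2.0):
    alpha = u * np.sqrt(target / q_u)      # rescale so that alpha' M^{-1} alpha = target
    q = alpha @ np.linalg.solve(M, alpha)
    condition = q <= 1 + 1e-12             # condition of Lemma 1 (small rounding slack)
    checks.append((target, condition, is_psd(M - np.outer(alpha, alpha))))
```

The boundary case $\alpha'M^{-1}\alpha = 1$ makes $M - \alpha\alpha'$ singular, so the tolerances only absorb floating-point rounding.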
By Lemma 1, the following lemma is straightforward.
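Lemma 2. Let $\hat{\beta}_j = A_jy$, $j = 1, 2$, be two homogeneous linear estimators of $\beta$ such that $D = \operatorname{Cov}(\hat{\beta}_1) - \operatorname{Cov}(\hat{\beta}_2) > 0$. Then $\Delta = \operatorname{MSEM}(\hat{\beta}_1) - \operatorname{MSEM}(\hat{\beta}_2) \ge 0$ if and only if
$$b_2'\left(D + b_1b_1'\right)^{-1}b_2 \le 1,$$
where $b_j = \operatorname{Bias}(\hat{\beta}_j) = A_jX\beta - \beta$, $j = 1, 2$.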
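As an illustration of Lemma 2, the sketch below takes $\hat{\beta}_1$ to be the OLSE and $\hat{\beta}_2$ the ordinary ridge estimator $(S + kI)^{-1}X'y$ mentioned in the introduction. This pair is chosen only because $D = \sigma^2[S^{-1} - (S + kI)^{-1}S(S + kI)^{-1}] > 0$ for $k > 0$; here $b_1 = 0$ and $b_2 = -k(S + kI)^{-1}\beta$. The sketch compares the criterion $b_2'(D + b_1b_1')^{-1}b_2 \le 1$ with a direct eigenvalue check of $\Delta$, assuming NumPy and synthetic values of $X$, $\sigma^2$, $k$ and $\beta$.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, sigma2, k = 20, 3, 1.0, 1.0
X = rng.standard_normal((n, p))
S = X.T @ X
S_inv = np.linalg.inv(S)
S_k_inv = np.linalg.inv(S + k * np.eye(p))

def compare_ols_ridge(beta):
    """Lemma 2 with beta_1 = OLSE (unbiased) and beta_2 = ridge (S + kI)^{-1} X'y."""
    D = sigma2 * (S_inv - S_k_inv @ S @ S_k_inv)   # Cov(OLSE) - Cov(ridge) > 0 for k > 0
    b1 = np.zeros(p)                               # bias of the OLSE
    b2 = -k * S_k_inv @ beta                       # bias of the ridge estimator
    M = D + np.outer(b1, b1)
    criterion = b2 @ np.linalg.solve(M, b2) <= 1   # condition of Lemma 2
    delta = M - np.outer(b2, b2)                   # MSEM(beta_1) - MSEM(beta_2)
    direct = np.linalg.eigvalsh((delta + delta.T) / 2).min() >= 0
    return criterion, direct

small = compare_ols_ridge(0.01 * np.ones(p))    # small bias: ridge should be superior
large = compare_ols_ridge(100.0 * np.ones(p))   # large bias: ridge should not be superior
```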
Lemma 3. Let $A$ and $B$ be two $n \times n$ matrices with $A > 0$ and $B \ge 0$; then $A + B > 0$.
Proof. See Rao and Toutenburg.
3. The Superiority of the New Estimator
The bias vector and the covariance matrix of the SRPC estimator are given by From (15), we can obtain that
Following the above procedure, we can get where .
In order to compare the SRPC estimator with the PCR estimator and the OME in the MSEM sense, we now investigate the following differences:
In the following theorems, we will give the necessary and sufficient conditions for the new estimator to be superior to the PCR estimator and OME in the MSEM sense.
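Definition 4. Suppose that $\hat{\beta}_1$ and $\hat{\beta}_2$ are two estimators of $\beta$; then $\hat{\beta}_2$ is said to be superior to $\hat{\beta}_1$ in the MSEM sense if $\operatorname{MSEM}(\hat{\beta}_1) - \operatorname{MSEM}(\hat{\beta}_2) \ge 0$.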
Theorem 5. Assume that ; then the SRPC estimator is superior to the PCR estimator in the MSEM sense if and only if where
Theorem 6. Assume that ; then the SRPC estimator is superior to the mixed estimator in the MSEM sense if and only if
Proof. By Lemma 3, the assumption that implies that . It follows that Therefore, if and only if
4. Numerical Example and Monte Carlo Simulation
In order to illustrate the performance of the proposed estimator, we first consider the real data example discussed in Gruber; the data have also been analyzed by Akdeniz and Erol, Li and Yang, and Chang and Yang, among others. The data are assembled as follows:
Note from the theorems that the comparison results depend on the unknown parameters $\beta$ and $\sigma^2$. Consequently, we cannot guarantee that the conditions in the theorems hold, and the conclusions may change with the values of these parameters. We therefore replace $\beta$ and $\sigma^2$ by their unbiased estimators, namely, the OLS estimators. All results below were computed in R 2.8.0.
From the data, we obtain the following results:
(i) the eigenvalues of $X'X$: 312.9320, 0.7536, 0.0453, 0.0372, and 0.0019;
(ii) the OLS estimator $\hat{\beta}$ of $\beta$;
(iii) the OLS estimator $\hat{\sigma}^2$ of $\sigma^2$;
(iv) the condition number of $X'X$.
Following Chang and Yang, we fix the number of principal components and consider the following stochastic linear restriction:
The estimated MSE values of the PCR estimator, the OME, and the SRPC estimator are obtained by replacing all unknown parameters with their OLS estimates. Table 1 gives the results.
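The plug-in idea can be sketched for the PCR estimator, whose scalar MSE is $\sigma^2\sum_{j=1}^{r}\lambda_j^{-1} + \|(T_rT_r' - I)\beta\|^2$; replacing $\beta$ and $\sigma^2$ by $\hat{\beta}$ and $\hat{\sigma}^2 = (y - X\hat{\beta})'(y - X\hat{\beta})/(n - p)$ gives an estimated MSE. This is a NumPy sketch on a synthetic collinear design; the paper's data, its restriction, and the corresponding formulas for the OME and SRPC are not reproduced here, and `plugin_mse_pcr` is a hypothetical helper.

```python
import numpy as np

def plugin_mse_pcr(X, y, r):
    """Estimated scalar MSE of the PCR estimator with beta, sigma^2 replaced by OLS estimates."""
    n, p = X.shape
    S = X.T @ X
    beta_hat = np.linalg.solve(S, X.T @ y)
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (n - p)                   # unbiased estimator of sigma^2
    vals, T = np.linalg.eigh(S)
    idx = np.argsort(vals)[::-1][:r]                       # r largest eigenvalues
    T_r = T[:, idx]
    variance_part = sigma2_hat * np.sum(1.0 / vals[idx])   # sigma^2 tr(T_r Lambda_r^{-1} T_r')
    bias_hat = T_r @ (T_r.T @ beta_hat) - beta_hat         # (T_r T_r' - I) beta_hat
    return variance_part + bias_hat @ bias_hat, sigma2_hat

rng = np.random.default_rng(8)
n, p = 25, 4
Z = rng.standard_normal((n, p + 1))
X = np.sqrt(1 - 0.95**2) * Z[:, :p] + 0.95 * Z[:, [p]]     # synthetic collinear design
y = X @ np.array([0.5, 0.5, 0.5, 0.5]) + rng.standard_normal(n)

estimates = {r: plugin_mse_pcr(X, y, r)[0] for r in range(1, p + 1)}
mse_full, s2 = plugin_mse_pcr(X, y, p)
```

With $r = p$ the estimated bias vanishes and the value reduces to $\hat{\sigma}^2\operatorname{tr}(S^{-1})$, the estimated MSE of the OLSE. Because $\hat{\beta}$ replaces $\beta$ in the squared-bias term, this plug-in value is itself only an estimate.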
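To further examine the MSE performance of the new estimator, we perform a Monte Carlo simulation study. Specifically, the explanatory variables and the observations are generated by
$$x_{ij} = \left(1 - \gamma^2\right)^{1/2}z_{ij} + \gamma z_{i,p+1}, \quad i = 1, \ldots, n, \; j = 1, \ldots, p,$$
$$y_i = \beta_1x_{i1} + \cdots + \beta_px_{ip} + \varepsilon_i, \quad i = 1, \ldots, n,$$
where $z_{ij}$ are independent standard normal pseudorandom numbers, $\varepsilon_i$ are independent normal pseudorandom numbers with mean zero and variance $\sigma^2$, and $\gamma$ is specified so that the correlation between any two explanatory variables is given by $\gamma^2$. In addition, a stochastic linear constraint on the model is considered: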
In the simulation, we choose , , , , and , . Four different sets of correlations, namely, , , , and , are considered to show the weak, strong, and severe collinearity between the explanatory variables, following Liu. Following Chang and Yang, we choose the normalized eigenvector corresponding to the largest eigenvalue of $X'X$ as the true value of $\beta$. The experiment is replicated 10000 times by generating new error terms. The estimated MSE of an estimator $\tilde{\beta}$ is then calculated as
$$\widehat{\operatorname{MSE}}(\tilde{\beta}) = \frac{1}{10000}\sum_{i=1}^{10000}\left(\tilde{\beta}_{(i)} - \beta\right)'\left(\tilde{\beta}_{(i)} - \beta\right),$$
where $\tilde{\beta}_{(i)}$ is the estimate of $\beta$ in the $i$th replication of the experiment. The simulation results are summarized in Tables 2 and 3, which also report the condition number of $X'X$.
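The simulation design can be sketched in NumPy as follows: the regressors are generated as $x_{ij} = (1 - \gamma^2)^{1/2}z_{ij} + \gamma z_{i,p+1}$, $\beta$ is the normalized eigenvector of the largest eigenvalue of $X'X$, and the estimated MSE averages $(\tilde{\beta}_{(i)} - \beta)'(\tilde{\beta}_{(i)} - \beta)$ over replications that regenerate $\varepsilon$ and $e$ while $X$ stays fixed. The values of $n$, $p$, $\gamma$, $\sigma$, the restriction `R`, `W`, and the 2000 replications are illustrative assumptions, not the paper's settings, and the SRPC estimator is omitted because its formula is not reproduced in this section; any function with the same signature can be passed to `emse`.

```python
import numpy as np

def generate_design(n, p, gamma, rng):
    """x_ij = sqrt(1 - gamma^2) z_ij + gamma z_{i,p+1}; pairwise correlation gamma^2."""
    Z = rng.standard_normal((n, p + 1))
    return np.sqrt(1.0 - gamma**2) * Z[:, :p] + gamma * Z[:, [p]]

def ols(X, y, R, h, W):
    return np.linalg.solve(X.T @ X, X.T @ y)

def make_pcr(r):
    def pcr(X, y, R, h, W):
        vals, T = np.linalg.eigh(X.T @ X)
        idx = np.argsort(vals)[::-1][:r]             # r largest eigenvalues
        T_r = T[:, idx]
        return T_r @ ((T_r.T @ X.T @ y) / vals[idx])
    return pcr

def ome(X, y, R, h, W):
    W_inv = np.linalg.inv(W)
    return np.linalg.solve(X.T @ X + R.T @ W_inv @ R, X.T @ y + R.T @ W_inv @ h)

def emse(estimator, X, beta, sigma, R, W, n_rep, seed):
    """Average squared error over n_rep replications with X, R, W held fixed."""
    rng = np.random.default_rng(seed)     # same seed for every estimator: common random numbers
    n, m = X.shape[0], R.shape[0]
    L = np.linalg.cholesky(W)
    total = 0.0
    for _ in range(n_rep):
        y = X @ beta + sigma * rng.standard_normal(n)          # new epsilon
        h = R @ beta + sigma * (L @ rng.standard_normal(m))    # new e with Cov(e) = sigma^2 W
        d = estimator(X, y, R, h, W) - beta
        total += d @ d
    return total / n_rep

# Illustrative settings only
rng = np.random.default_rng(5)
n, p, gamma, sigma = 50, 4, 0.99, 1.0
X = generate_design(n, p, gamma, rng)
vals, T = np.linalg.eigh(X.T @ X)
beta = T[:, np.argmax(vals)]               # normalized eigenvector of the largest eigenvalue
R = np.array([[1.0, -1.0, 0.0, 0.0]])      # hypothetical restriction matrix
W = np.eye(1)

results = {name: emse(f, X, beta, sigma, R, W, n_rep=2000, seed=7)
           for name, f in [("OLS", ols), ("PCR, r=2", make_pcr(2)), ("OME", ome)]}

# Sanity check of the design: off-diagonal correlations should be close to gamma^2
corr = np.corrcoef(generate_design(100000, 3, 0.9, rng), rowvar=False)
```

Because $\beta$ lies in the span of the leading eigenvector here, the PCR error in each replication is a projection of the OLS error, so with common random numbers its estimated MSE cannot exceed that of the OLSE.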
The simulation results in Tables 2 and 3 show that the estimated MSE values of the three estimators generally increase as the level of multicollinearity increases. However, the proposed SRPC estimator outperforms the competing estimators in most cases, and its advantage becomes more pronounced as the collinearity becomes more severe. Therefore, the proposed estimator is recommended when the explanatory variables are moderately or severely collinear.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
The work is supported by the Natural Science Foundation of Anhui Province (Grant no. 1308085QA13), the Key Project of Natural Science Foundation of Universities in Anhui Province (Grant no. KJ2012A135), and the Key Project of Distinguished Young Talents of Universities in Anhui Province (Grant no. 2012SQRL028ZD).
C. Stein, “Inadmissibility of the usual estimator for the mean of a multivariate normal distribution,” in Proceedings of the 3rd Berkeley Symposium on Mathematical Statistics and Probability, J. Neyman, Ed., vol. 1, pp. 197–206, 1956.
A. E. Hoerl and R. W. Kennard, “Ridge regression: biased estimation for nonorthogonal problems,” Technometrics, vol. 12, no. 1, pp. 55–67, 1970.
C. R. Rao and H. Toutenburg, Linear Models: Least Squares and Alternatives, Springer, New York, NY, USA, 1995.
R. W. Farebrother, “Further results on the mean square error of ridge regression,” Journal of the Royal Statistical Society B, vol. 38, pp. 248–250, 1976.
M. H. J. Gruber, Improving Efficiency by Shrinkage: The James-Stein and Ridge Regression Estimators, Marcel Dekker, New York, NY, USA, 1998.