The Scientific World Journal
Volume 2014 (2014), Article ID 231506, 6 pages
http://dx.doi.org/10.1155/2014/231506
Research Article

A Stochastic Restricted Principal Components Regression Estimator in the Linear Model

Department of Statistics, Anhui Normal University, Wuhu 241000, China

Received 2 August 2013; Accepted 4 November 2013; Published 23 January 2014

Academic Editors: M. Blank, J. De Brabanter, and C. Neves

Copyright © 2014 Daojiang He and Yan Wu. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

We propose a new estimator to combat multicollinearity in the linear model when there are stochastic linear restrictions on the regression coefficients. The new estimator, called the stochastic restricted principal components (SRPC) regression estimator, is constructed by combining the ordinary mixed estimator (OME) and the principal components regression (PCR) estimator. Necessary and sufficient conditions for the superiority of the SRPC estimator over the OME and the PCR estimator are derived under the mean squared error matrix criterion. Finally, we give a numerical example and a Monte Carlo study to illustrate the performance of the proposed estimator.

1. Introduction

In linear regression analysis, multicollinearity among the regressor variables may lead to highly unstable least squares estimates of the regression parameters. With multicollinear data, some coefficients may be statistically insignificant and may have the wrong signs. Different remedial methods have been proposed to overcome this problem. One approach is to use biased estimators, the most notable of which are the Stein estimator by Stein [1], the principal components regression (PCR) estimator by Massy [2], the ordinary ridge regression (ORR) estimator by Hoerl and Kennard [3], and the Liu estimator by Liu [4]. Another approach is to collect and use additional information, which can take the form of exact or stochastic restrictions [5]. For stochastic linear restrictions, Durbin [6], Theil and Goldberger [7], and Theil [8] proposed the ordinary mixed estimator (OME), which combines the sample model with the stochastic restrictions. Other important references on this subject include Li and Yang [9, 10], Xu and Yang [11], Yang and Cui [12], Yang and Wu [13], and Yang and Xu [14].

In this paper, we will introduce a stochastic restricted principal components (SRPC) regression estimator, which is defined by combining in a special way the ordinary mixed estimator and the principal components regression estimator. We will compare the new estimator with the PCR estimator and the OME, respectively, in the sense of the criterion of the mean squared error matrix (MSEM).

The rest of the paper is organized as follows. In Section 2, the new estimator is introduced. In Section 3, some properties of the new estimator are discussed. A numerical example and a Monte Carlo simulation study are given in Section 4.

2. The New Estimator

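Let us consider the general linear model $$y = X\beta + \varepsilon, \tag{1}$$ where $y$ is an $n \times 1$ observable random vector with the expectation $X\beta$ and the covariance matrix $\sigma^2 I_n$, $X$ is an $n \times p$ known design matrix of rank $p$, $I_n$ is the identity matrix of order $n$, $\beta$ is a $p \times 1$ vector of unknown parameters, and $\varepsilon$ is an $n \times 1$ vector of random errors. As is well known, the ordinary least squares estimator (OLSE) of $\beta$ is $$\hat\beta = S^{-1}X'y, \tag{2}$$ where $S = X'X$.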

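Let $T = (t_1, \dots, t_p)$ be an orthogonal matrix such that $T'X'XT = \Lambda$, where $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_p)$ and $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p > 0$ are the eigenvalues of $X'X$. Further, let $T_r = (t_1, \dots, t_r)$ be the remaining columns of $T$ after having deleted the last $p - r$ columns, where $0 < r \le p$. Thus, we have $$T_r'X'XT_r = \Lambda_r, \tag{3}$$ where $\Lambda_r = \mathrm{diag}(\lambda_1, \dots, \lambda_r)$.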

In addition to model (1), let us give some prior information about $\beta$ in the form of a set of $q$ independent stochastic linear restrictions as follows: $$h = H\beta + e, \qquad e \sim (0, \sigma^2 W), \tag{4}$$ where $h$ is a $q \times 1$ vector, $H$ is a $q \times p$ matrix with rank $q$, $e$ is a $q \times 1$ vector of disturbances, and $W$ is assumed to be known and positive definite. Furthermore, it is also assumed that the random vector $e$ is independent of $\varepsilon$.

For model (1), Massy [2] introduced the PCR estimator as $$\hat\beta_r = T_r(T_r'X'XT_r)^{-1}T_r'X'y. \tag{5}$$ Xu and Yang [11] showed that the PCR estimator could be rewritten as follows: $$\hat\beta_r = T_rT_r'\hat\beta. \tag{6}$$
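The equivalence of the two forms of the PCR estimator can be checked numerically. The following is a minimal sketch on simulated data, not the authors' code: the names `pcr_estimator`, `T_r`, and `r`, as well as the sample sizes and coefficients, are illustrative assumptions. It computes the estimator once from the retained eigenvectors of $X'X$ and once by projecting the OLSE onto their span.

```python
import numpy as np

def pcr_estimator(X, y, r):
    """PCR estimate keeping the r leading eigenvectors of X'X (illustrative helper)."""
    S = X.T @ X
    eigvals, eigvecs = np.linalg.eigh(S)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]         # reorder to descending
    T_r = eigvecs[:, order[:r]]               # p x r matrix of leading eigenvectors
    # Direct form: T_r (T_r' S T_r)^{-1} T_r' X'y
    beta = T_r @ np.linalg.solve(T_r.T @ S @ T_r, T_r.T @ X.T @ y)
    return beta, T_r

rng = np.random.default_rng(0)
n, p, r = 50, 4, 2                            # assumed sizes, for illustration only
X = rng.standard_normal((n, p))
y = X @ np.array([1.0, -0.5, 0.25, 2.0]) + rng.standard_normal(n)

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)  # OLSE
beta_pcr, T_r = pcr_estimator(X, y, r)
beta_pcr_alt = T_r @ T_r.T @ beta_ols         # projection form: T_r T_r' times the OLSE

print(np.allclose(beta_pcr, beta_pcr_alt))
```

The projection form makes it plain that PCR discards the components of the OLSE along the eigenvectors with the smallest eigenvalues, which are the directions in which the OLSE is least stable.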

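For model (1) with the stochastic restrictions (4), the OME is given by $$\hat\beta_{\mathrm{OME}} = (S + H'W^{-1}H)^{-1}(X'y + H'W^{-1}h). \tag{7}$$ Özkale [15] showed that the OME could be rewritten as $$\hat\beta_{\mathrm{OME}} = \hat\beta + S^{-1}H'(W + HS^{-1}H')^{-1}(h - H\hat\beta). \tag{8}$$ Noting that , it can be shown that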
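As a numerical illustration, the sketch below compares the textbook mixed-estimation form of the OME with an update form that corrects the OLSE by the restriction residual. It assumes the restrictions are written as $h = H\beta + e$ with disturbance covariance proportional to a known matrix $W$; the symbols `H`, `W`, `h`, the dimensions, and the simulated data are our own illustrative choices. The two forms agree by the matrix inversion (Woodbury) identity.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 40, 3, 2                     # assumed sizes, for illustration only
X = rng.standard_normal((n, p))
beta = np.array([1.0, 2.0, -1.0])      # hypothetical true coefficients
y = X @ beta + rng.standard_normal(n)

H = rng.standard_normal((q, p))        # restriction matrix (assumed)
W = np.diag([0.5, 2.0])                # known positive definite matrix (assumed)
h = H @ beta + rng.multivariate_normal(np.zeros(q), W)

S = X.T @ X
W_inv = np.linalg.inv(W)
beta_ols = np.linalg.solve(S, X.T @ y)

# Mixed-estimation form: (S + H'W^{-1}H)^{-1} (X'y + H'W^{-1}h)
ome_direct = np.linalg.solve(S + H.T @ W_inv @ H, X.T @ y + H.T @ W_inv @ h)

# Update form: OLSE plus a gain applied to the restriction residual h - H beta_ols
S_inv_Ht = np.linalg.solve(S, H.T)
gain = S_inv_Ht @ np.linalg.inv(W + H @ S_inv_Ht)
ome_update = beta_ols + gain @ (h - H @ beta_ols)

print(np.allclose(ome_direct, ome_update))
```

The update form only requires inverting a $q \times q$ matrix on top of the OLS fit, and it isolates the point at which another estimator could be substituted for the OLSE.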

Now, the stochastic restricted principal components (SRPC) regression estimator can be obtained by combining the OME and the PCR estimator. Substituting the OLSE with the PCR estimator in (8), we can get the new estimator as follows:

Now, we can see that is a general estimator which includes the PCR estimator and OME as special cases: if , then ; if , then .
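The substitution idea and the two special cases can be sketched in code. This is only our reading of the construction, under the assumption that the rewritten OME has the update form "OLSE plus gain times restriction residual" and that the PCR estimator replaces the OLSE in it; the function `srpc_sketch`, the symbols `H`, `W`, `h`, and all data are hypothetical and are not taken from the paper.

```python
import numpy as np

def leading_eigvecs(S, r):
    """Eigenvectors of S for its r largest eigenvalues (illustrative helper)."""
    _, vecs = np.linalg.eigh(S)        # ascending eigenvalue order
    return vecs[:, ::-1][:, :r]

def srpc_sketch(X, y, H, W, h, r):
    """Assumed form: PCR estimate corrected by the restriction residual."""
    S = X.T @ X
    beta_ols = np.linalg.solve(S, X.T @ y)
    T_r = leading_eigvecs(S, r)
    beta_pcr = T_r @ T_r.T @ beta_ols
    S_inv_Ht = np.linalg.solve(S, H.T)
    gain = S_inv_Ht @ np.linalg.inv(W + H @ S_inv_Ht)
    return beta_pcr + gain @ (h - H @ beta_pcr)

rng = np.random.default_rng(5)
n, p, q = 40, 4, 2                     # assumed sizes
X = rng.standard_normal((n, p))
beta = np.array([0.5, -1.0, 1.5, 2.0])
y = X @ beta + rng.standard_normal(n)
H = rng.standard_normal((q, p))
W = np.eye(q)
h = H @ beta + rng.standard_normal(q)

S = X.T @ X
beta_ols = np.linalg.solve(S, X.T @ y)

# Keeping all p components should reproduce the OME.
ome = np.linalg.solve(S + H.T @ np.linalg.inv(W) @ H,
                      X.T @ y + H.T @ np.linalg.inv(W) @ h)
same_as_ome = np.allclose(srpc_sketch(X, y, H, W, h, r=p), ome)

# A zero restriction matrix should reproduce the PCR estimator.
T_2 = leading_eigvecs(S, 2)
pcr = T_2 @ T_2.T @ beta_ols
same_as_pcr = np.allclose(srpc_sketch(X, y, np.zeros((q, p)), W, h, r=2), pcr)

print(same_as_ome, same_as_pcr)
```

Under these assumptions the sketch behaves as a general estimator that contains both the OME and the PCR estimator as special cases.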

For the sake of convenience, we list some notations and important lemmas needed in the following discussions. For an $n \times n$ matrix $M$, $M \ge 0$ means that $M$ is symmetric and positive semidefinite and $M > 0$ means that $M$ is symmetric and positive definite.

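Note that for any estimator $\tilde\beta$ of $\beta$, its MSEM is defined as $$\mathrm{MSEM}(\tilde\beta) = \mathrm{Cov}(\tilde\beta) + \mathrm{Bias}(\tilde\beta)\,\mathrm{Bias}(\tilde\beta)',$$ where $\mathrm{Bias}(\tilde\beta) = E(\tilde\beta) - \beta$ is the bias of $\tilde\beta$.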

Lemma 1. Let $A > 0$, and $\alpha$ a vector; then $A - \alpha\alpha' \ge 0$ if and only if $\alpha'A^{-1}\alpha \le 1$.

Proof. See Farebrother [16].
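A quick numerical sanity check of Lemma 1 is sketched below, taking the lemma in its classical form: for a positive definite $A$ and a vector $\alpha$, the matrix $A - \alpha\alpha'$ is positive semidefinite exactly when $\alpha'A^{-1}\alpha \le 1$. The matrix, the random vectors, and the tolerance are arbitrary choices for illustration.

```python
import numpy as np

def is_psd(M, tol=1e-10):
    """True when the symmetric part of M has no eigenvalue below -tol."""
    return np.linalg.eigvalsh((M + M.T) / 2).min() >= -tol

rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4))
A = B @ B.T + np.eye(4)                # symmetric positive definite by construction

agree = True
for _ in range(1000):
    # Random vectors of varying length, so both outcomes of the test occur.
    alpha = rng.standard_normal(4) * rng.uniform(0.1, 3.0)
    matrix_condition = is_psd(A - np.outer(alpha, alpha))
    scalar_condition = alpha @ np.linalg.solve(A, alpha) <= 1.0
    agree = agree and (matrix_condition == scalar_condition)

print(agree)
```

The practical value of the lemma is visible here: a matrix inequality is replaced by a single scalar quadratic form, which is what makes the conditions in the later theorems checkable.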

By Lemma 1, the following lemma is straightforward.

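Lemma 2. Let $\hat\beta_1$, $\hat\beta_2$ be two homogeneous linear estimators of $\beta$ such that $D = \mathrm{Cov}(\hat\beta_1) - \mathrm{Cov}(\hat\beta_2) > 0$. Then $\mathrm{MSEM}(\hat\beta_1) - \mathrm{MSEM}(\hat\beta_2) \ge 0$ if and only if $$b_2'(D + b_1b_1')^{-1}b_2 \le 1,$$ where $b_1 = \mathrm{Bias}(\hat\beta_1)$, $b_2 = \mathrm{Bias}(\hat\beta_2)$.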

Lemma 3. Let $M$, $N$ be two $n \times n$ matrices with $M > 0$ and $N \ge 0$; then $M > N$ if and only if $\lambda_{\max}(NM^{-1}) < 1$.

Proof. See Rao and Toutenburg [5].

3. The Superiority of the New Estimator

The bias vector and the covariance matrix of the SRPC estimator are given by From (15), we can obtain that

Following the above procedure, we can get where .

In order to compare with and in the MSEM sense, now we investigate the following differences: where where

In the following theorems, we will give the necessary and sufficient conditions for the new estimator to be superior to the PCR estimator and OME in the MSEM sense.

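Definition 4. Suppose that $\hat\beta_1$ and $\hat\beta_2$ are two estimators of $\beta$; then $\hat\beta_2$ is said to be superior to $\hat\beta_1$ in the MSEM sense if $\mathrm{MSEM}(\hat\beta_1) - \mathrm{MSEM}(\hat\beta_2) \ge 0$.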
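To make the MSEM criterion concrete, the following sketch compares the OLSE with the PCR estimator, for which closed-form covariance and bias are standard. It assumes superiority means that the difference of the two MSEMs is positive semidefinite; the helper names `msem` and `is_superior`, the dimensions, and the two choices of the true coefficient vector are illustrative assumptions.

```python
import numpy as np

def msem(cov, bias):
    """Mean squared error matrix: covariance plus outer product of the bias."""
    return cov + np.outer(bias, bias)

def is_superior(msem_new, msem_old, tol=1e-10):
    """True when msem_old - msem_new is positive semidefinite (up to tol)."""
    diff = msem_old - msem_new
    return np.linalg.eigvalsh((diff + diff.T) / 2).min() >= -tol

rng = np.random.default_rng(4)
n, p, r, sigma2 = 30, 3, 2, 1.0        # assumed values
X = rng.standard_normal((n, p))
S = X.T @ X
lam, T = np.linalg.eigh(S)
lam, T = lam[::-1], T[:, ::-1]         # descending eigenvalues
T_r = T[:, :r]
P = T_r @ T_r.T                        # projector onto the retained directions

msem_ols = msem(sigma2 * np.linalg.inv(S), np.zeros(p))
cov_pcr = sigma2 * T_r @ np.diag(1.0 / lam[:r]) @ T_r.T

# Case 1: beta lies in the retained subspace, so PCR is unbiased.
beta_good = T[:, 0]
msem_pcr_good = msem(cov_pcr, P @ beta_good - beta_good)

# Case 2: beta has a large component along the discarded eigenvector.
beta_bad = 10.0 * T[:, -1]
msem_pcr_bad = msem(cov_pcr, P @ beta_bad - beta_bad)

print(is_superior(msem_pcr_good, msem_ols), is_superior(msem_pcr_bad, msem_ols))
```

The two cases show why superiority conditions necessarily depend on the unknown parameters: the variance reduction is fixed by the design, while the bias term grows with the part of $\beta$ that the estimator ignores.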

Theorem 5. Assume that ; then the SRPC estimator is superior to the PCR estimator in the MSEM sense if and only if where

Proof. Noting that , it follows that By Lemma 3 and the assumption that , we have . Consequently, by Lemma 2, if and only if Thus, the proof of Theorem 5 is completed.

Theorem 6. Assume that ; then the SRPC estimator is superior to the mixed estimator in the MSEM sense if and only if

Proof. By Lemma 3, the assumption that implies that . It follows that Therefore, if and only if

4. Numerical Example and Monte Carlo Simulation

In order to illustrate the performance of the proposed estimator, we first consider the real data example discussed in Gruber [17]; these data have also been analyzed by Akdeniz and Erol [18], Li and Yang [9], and Chang and Yang [19], among others. Now we assemble the data as follows:

In this experiment, the theorems show that the comparison results depend on the unknown parameters $\beta$ and $\sigma^2$. Consequently, we cannot guarantee that the conditions of the theorems hold, and the conclusions may change with the parameter values. We therefore replace the unknown parameters by their unbiased estimators, that is, the OLS estimators. The results below were all computed with R 2.8.0.

From the data, we can obtain the following results:(i)the eigenvalues of : 312.9320, 0.7536, 0.0453, 0.0372, and 0.0019.(ii)the OLS estimator of : ,(iii)the OLS estimator of : ,(iv)the condition number of : .
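As a rough cross-check, a condition number can be recomputed from the eigenvalues listed in (i). The sketch assumes the condition number is defined as the ratio of the largest to the smallest eigenvalue (some authors use the square root of this ratio), and since the listed eigenvalues are rounded to four decimals, the result is only approximate.

```python
# Eigenvalues as listed in item (i); they are rounded, so the ratio is approximate.
eigenvalues = [312.9320, 0.7536, 0.0453, 0.0372, 0.0019]

# Assumed definition: largest eigenvalue divided by smallest eigenvalue.
cond = max(eigenvalues) / min(eigenvalues)

print(f"condition number (ratio definition): {cond:.0f}")
```

A ratio of this order of magnitude, far above the commonly used threshold of 1000, indicates severe multicollinearity in the data.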

Following Chang and Yang [19], we choose the number of the principal components , and consider the following stochastic linear restriction:

The estimated MSE values of PCR, OME, and SRPC are obtained by replacing all unknown parameters by their OLS estimators, respectively. Table 1 gives the results.

Table 1: Estimated MSE values of the OME, PCR, and SRPC.

From Table 1, we can observe that the estimated MSE value of the new estimator is smaller than those of PCR and OME, which is in accordance with the theoretical findings in Theorems 5 and 6.

To further identify the MSE performance of the new estimator, we perform a Monte Carlo simulation study. Specifically, the explanatory variables and the observations are generated by $$x_{ij} = (1 - \gamma^2)^{1/2} z_{ij} + \gamma z_{i,p+1}, \qquad y_i = \sum_{j=1}^{p} \beta_j x_{ij} + \varepsilon_i, \qquad i = 1, \dots, n,\ j = 1, \dots, p,$$ where $z_{ij}$ are independent standard normal pseudorandom numbers, $\varepsilon_i$ are independent normal pseudorandom numbers with mean zero and variance $\sigma^2$, and $\gamma$ is specified so that the correlation between any two explanatory variables is given by $\gamma^2$. In addition, a stochastic linear constraint to the model is considered:

In the simulation, we choose , , , , and ,   . Four different sets of correlations, namely, , , , and , are considered to show the weak, strong, and severe collinearity between the explanatory variables following Liu [20]. We choose the normalized eigenvector corresponding to the largest eigenvalue of as the true value of following Chang and Yang [19]. The experiment is replicated 10000 times by generating new error terms. Then, the estimated MSE for an estimator is calculated as follows: where is the estimator of in the th replication of the experiment and . The simulation results are summarized in Tables 2 and 3, where the condition number of , that is, , is also given.
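The simulation loop can be sketched as follows. This is a simplified illustration rather than a reproduction of the study: it assumes the widely used design in which each regressor mixes an independent normal variate with a shared one, it compares only the OLSE and the PCR estimator, and every numerical setting (`n`, `p`, `r`, `gamma`, `sigma`, `n_rep`) is a placeholder chosen by us.

```python
import numpy as np

def make_design(n, p, gamma, rng):
    """Regressors x_ij = sqrt(1 - gamma^2) z_ij + gamma z_{i,p+1}.

    Any two columns then have population correlation gamma^2.
    """
    z = rng.standard_normal((n, p + 1))
    return np.sqrt(1.0 - gamma**2) * z[:, :p] + gamma * z[:, [p]]

def emse(estimates, beta):
    """Average squared Euclidean distance to the true beta over replications."""
    diff = estimates - beta
    return np.mean(np.sum(diff**2, axis=1))

rng = np.random.default_rng(3)
n, p, r, gamma, sigma, n_rep = 50, 4, 2, 0.99, 1.0, 2000   # placeholder settings

X = make_design(n, p, gamma, rng)
S = X.T @ X
eigvals, eigvecs = np.linalg.eigh(S)     # ascending eigenvalue order
beta = eigvecs[:, -1]                    # unit eigenvector of the largest eigenvalue
T_r = eigvecs[:, ::-1][:, :r]            # r leading eigenvectors

ols = np.empty((n_rep, p))
pcr = np.empty((n_rep, p))
for i in range(n_rep):
    # The design is held fixed; only the error terms are redrawn.
    y = X @ beta + sigma * rng.standard_normal(n)
    ols[i] = np.linalg.solve(S, X.T @ y)
    pcr[i] = T_r @ T_r.T @ ols[i]

print("EMSE OLS:", emse(ols, beta))
print("EMSE PCR:", emse(pcr, beta))
```

Because the true coefficient vector is taken along the leading eigenvector, it lies in the retained subspace, so in this sketch the PCR estimator is unbiased and its estimated MSE cannot exceed that of the OLSE. Estimators that use a stochastic restriction could be added to the loop in the same way once the restriction is specified.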

Table 2: Estimated MSE with .
Table 3: Estimated MSE with .

From the simulation results shown in Tables 2 and 3, we can see that, with the increase of the level of multicollinearity, the estimated MSE values of the three estimators increase in general. However, the proposed estimator SRPC behaves better than the competing estimators in most of the cases. In addition, the more severe the collinearity is, the more pronounced the superiority of SRPC is. Therefore, the proposed estimator is recommended when the explanatory variables are moderately or severely collinear.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The work is supported by the Natural Science Foundation of Anhui Province (Grant no. 1308085QA13), the Key Project of Natural Science Foundation of Universities in Anhui Province (Grant no. KJ2012A135), and the Key Project of Distinguished Young Talents of Universities in Anhui Province (Grant no. 2012SQRL028ZD).

References

  1. C. Stein, “Inadmissibility of the usual estimator for the mean of a multivariate normal distribution,” in Proceedings of the 3rd Berkeley Symposium on Mathematical Statistics and Probability, J. Neyman, Ed., vol. 1, pp. 197–206, 1956.
  2. W. F. Massy, “Principal components regression in exploratory statistical research,” Journal of the American Statistical Association, vol. 60, no. 309, pp. 234–266, 1965.
  3. A. E. Hoerl and R. W. Kennard, “Ridge regression: biased estimation for nonorthogonal problems,” Technometrics, vol. 12, no. 1, pp. 55–67, 1970.
  4. K. Liu, “A new class of biased estimate in linear regression,” Communications in Statistics - Theory and Methods, vol. 22, no. 2, pp. 393–402, 1993.
  5. C. R. Rao and H. Toutenburg, Linear Models: Least Squares and Alternatives, Springer, New York, NY, USA, 1995.
  6. J. Durbin, “A note on regression when there is extraneous information about one of the coefficients,” Journal of the American Statistical Association, vol. 48, no. 264, pp. 799–808, 1953.
  7. H. Theil and A. S. Goldberger, “On pure and mixed statistical estimation in economics,” International Economic Review, vol. 2, no. 1, pp. 65–78, 1961.
  8. H. Theil, “On the use of incomplete prior information in regression analysis,” Journal of the American Statistical Association, vol. 58, no. 302, pp. 401–414, 1963.
  9. Y. Li and H. Yang, “A new stochastic mixed ridge estimator in linear regression model,” Statistical Papers, vol. 51, no. 2, pp. 315–323, 2010.
  10. Y. Li and H. Yang, “A new ridge-type estimator in stochastic restricted linear regression,” Statistics, vol. 45, no. 2, pp. 123–130, 2011.
  11. J. Xu and H. Yang, “On the restricted r-k class estimator and the restricted r-d class estimator in linear regression,” Journal of Statistical Computation and Simulation, vol. 81, no. 6, pp. 679–691, 2011.
  12. H. Yang and J. Cui, “A stochastic restricted two-parameter estimator in linear regression model,” Communications in Statistics - Theory and Methods, vol. 40, no. 13, pp. 2318–2325, 2011.
  13. H. Yang and J. Wu, “A stochastic restricted k-d class estimator,” Statistics, vol. 46, no. 6, pp. 759–766, 2012.
  14. H. Yang and J. Xu, “An alternative stochastic restricted Liu estimator in linear regression,” Statistical Papers, vol. 50, no. 3, pp. 639–647, 2009.
  15. M. R. Özkale, “A stochastic restricted ridge regression estimator,” Journal of Multivariate Analysis, vol. 100, no. 8, pp. 1706–1716, 2009.
  16. R. W. Farebrother, “Further results on the mean square error of ridge regression,” Journal of the Royal Statistical Society B, vol. 38, pp. 248–250, 1976.
  17. M. H. J. Gruber, Improving Efficiency by Shrinkage: The James-Stein and Ridge Regression Estimators, Marcel Dekker, New York, NY, USA, 1998.
  18. F. Akdeniz and H. Erol, “Mean squared error matrix comparisons of some biased estimators in linear regression,” Communications in Statistics - Theory and Methods, vol. 32, no. 12, pp. 2389–2413, 2003.
  19. X. Chang and H. Yang, “Combining two-parameter and principal component regression estimators,” Statistical Papers, vol. 53, pp. 549–562, 2012.
  20. K. Liu, “Using Liu-type estimator to combat collinearity,” Communications in Statistics - Theory and Methods, vol. 32, no. 5, pp. 1009–1020, 2003.