Abstract

The analysis of misspecification was extended to the recently introduced stochastic restricted biased estimators when multicollinearity exists among the explanatory variables. The Stochastic Restricted Ridge Estimator (SRRE), Stochastic Restricted Almost Unbiased Ridge Estimator (SRAURE), Stochastic Restricted Liu Estimator (SRLE), Stochastic Restricted Almost Unbiased Liu Estimator (SRAULE), Stochastic Restricted Principal Component Regression Estimator (SRPCRE), Stochastic Restricted r-k (SRrk) class estimator, and Stochastic Restricted r-d (SRrd) class estimator were examined in the misspecified regression model due to missing relevant explanatory variables when incomplete prior information of the regression coefficients is available. Further, the superiority conditions between estimators and their respective predictors were obtained in the mean square error matrix (MSEM) sense. Finally, a numerical example and a Monte Carlo simulation study were used to illustrate the theoretical findings.

1. Introduction

Misspecification due to the omission of relevant explanatory variables occurs very often in linear regression modelling, and it causes these variables to become part of the error term. Consequently, the expected value of the error term of the model will not be zero. Also, the omitted variables may be correlated with the variables in the model. Therefore, one or more assumptions of the linear regression model will be violated when the model is misspecified, and hence the estimators become biased and inconsistent. Further, it is well known that the ordinary least squares estimator (OLSE) may not be very reliable if multicollinearity exists in the linear regression model. As a remedial measure for the multicollinearity problem, biased estimators based on the sample model with prior information, which can take the form of exact or stochastic restrictions, have received much attention in the statistical literature. The intention of this work is to examine the performance of the recently introduced stochastic restricted biased estimators in the misspecified regression model with incomplete prior knowledge about the regression coefficients when multicollinearity exists among the explanatory variables.

For biased estimation in a misspecified regression model without any restrictions on the regression parameters, Sarkar [1] discussed the consequences of excluding some important explanatory variables from a linear regression model when multicollinearity exists. Şiray [2] and Wu [3] examined the efficiency of the r-d class estimator and the r-k class estimator, respectively, over some existing estimators in the misspecified regression model. Chandra and Tyagi [4] studied the effect of misspecification due to the omission of relevant variables on the dominance of the r-(k,d) class estimator. Recently, Kayanan and Wijekoon [5] examined the performance of existing biased estimators and the respective predictors based on the sample information in a misspecified linear regression model without considering any prior information about the regression coefficients.

It is recognized that the mixed regression estimator (MRE) introduced by Theil and Goldberger [6] outperforms the ordinary least squares estimator (OLSE) when the regression model is correctly specified. Biased estimation with stochastic linear restrictions in a regression model misspecified by the inclusion of an irrelevant variable, with incorrectly specified prior information, was discussed by Teräsvirta [7]. Later, Mittelhammer [8], Ohtani and Honda [9], Kadiyala [10], and Trenkler and Wijekoon [11] discussed the efficiency of the MRE under a regression model misspecified by the exclusion of a relevant variable, with correctly specified prior information. Further, the superiority of the MRE over the OLSE under the misspecified regression model with incorrectly specified sample and prior information was discussed by Wijekoon and Trenkler [12]. Hubert and Wijekoon [13] considered the improvement of the Liu estimator (LE) under a misspecified regression model with stochastic restrictions and introduced the Stochastic Restricted Liu Estimator (SRLE).

In this paper, the performance of the recently introduced stochastic restricted estimators, namely, the Stochastic Restricted Ridge Estimator (SRRE) proposed by Li and Yang [14], the Stochastic Restricted Almost Unbiased Ridge Estimator (SRAURE) and Stochastic Restricted Almost Unbiased Liu Estimator (SRAULE) proposed by Wu and Yang [15], the Stochastic Restricted Principal Component Regression Estimator (SRPCRE) proposed by He and Wu [16], and the Stochastic Restricted r-k (SRrk) class estimator and Stochastic Restricted r-d (SRrd) class estimator proposed by Wu [17], is examined in the misspecified regression model when multicollinearity exists among the explanatory variables. Further, a generalized form to represent these estimators is also proposed.

The rest of this article is organized as follows. The model specification and the estimators are presented in Section 2. In Section 3, the mean square error matrix (MSEM) comparisons between two estimators and between their respective predictors are considered. In Section 4, a numerical example and a Monte Carlo simulation study are given to illustrate the theoretical results under the scalar mean square error (SMSE) criterion. Finally, some concluding remarks are given in Section 5. The references and appendixes are given at the end of the paper.

2. Model Specification and the Estimators

Assume that the true regression model is given by where is the vector of observations on the dependent variable, and are the and matrices of observations on the regressors, and are the and vectors of unknown coefficients, and is the vector of disturbances such that and .

Let us say that the researcher misspecifies the regression model by excluding regressors as Let us also assume that there exists prior information on in the form of where is the vector, is the given matrix with rank , is the unknown fixed vector, is the vector of disturbances such that , , where is positive definite, and
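To make the effect of the omission concrete, the setup above can be sketched in conventional notation. The symbols below ($X_1$, $X_2$, $\beta_1$, $\beta_2$, $u$, $R$, $g$, $v$, $\Omega$) are assumed here for illustration and may differ from the paper's own notation; the covariance scaling $\sigma^2\Omega$ for the prior disturbance is likewise an assumption. Under these assumptions, the disturbance of the fitted model absorbs the omitted regressors, so its mean is in general no longer zero, and the OLSE of $\beta_1$ picks up a bias term:

```latex
\begin{align*}
y &= X_1\beta_1 + X_2\beta_2 + \varepsilon, & E(\varepsilon) &= 0,\quad D(\varepsilon) = \sigma^2 I
  && \text{(true model)}\\
y &= X_1\beta_1 + u, & u &= X_2\beta_2 + \varepsilon,\quad E(u) = X_2\beta_2
  && \text{(fitted model)}\\
r &= R\beta_1 + g + v, & E(v) &= 0,\quad D(v) = \sigma^2\Omega
  && \text{(stochastic prior)}\\
E(\hat\beta_{OLSE}) &= \beta_1 + (X_1'X_1)^{-1}X_1'X_2\beta_2
\end{align*}
```

The bias term vanishes only when $X_1'X_2\beta_2 = 0$, for example when the omitted regressors are orthogonal to the included ones.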

By combining the sample model (2) and the prior information (3), Theil and Goldberger [6] proposed the mixed regression estimator (MRE). To combat multicollinearity, several researchers have introduced different types of stochastic restricted estimators in place of the MRE. Seven such estimators are the SRRE, SRAURE, SRLE, SRAULE, SRPCRE, SRrk class estimator, and SRrd class estimator, defined below, respectively: where , , and are the first columns of which is an orthogonal matrix of the standardized eigenvectors of .
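As an illustration of how sample and prior information are combined, the following is a minimal sketch of the mixed regression estimator in its textbook form, $(X'X + R'\Omega^{-1}R)^{-1}(X'y + R'\Omega^{-1}r)$. It assumes the prior disturbance has dispersion $\sigma^2\Omega$; the function name, the variable names, and the simulated data are all hypothetical and are not taken from the paper.

```python
import numpy as np

def mixed_regression_estimator(X, y, R, r, Omega):
    """Mixed regression estimate, assuming D(v) = sigma^2 * Omega (an assumption)."""
    Omega_inv_R = np.linalg.solve(Omega, R)   # Omega^{-1} R without forming the inverse
    Omega_inv_r = np.linalg.solve(Omega, r)   # Omega^{-1} r
    A = X.T @ X + R.T @ Omega_inv_R           # X'X + R' Omega^{-1} R
    b = X.T @ y + R.T @ Omega_inv_r           # X'y + R' Omega^{-1} r
    return np.linalg.solve(A, b)

# Hypothetical data, for illustration only.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.standard_normal(50)
R = np.array([[1.0, 1.0, 0.0]])               # one stochastic restriction (q = 1)
r = np.array([-1.0])                          # prior guess for beta_1 + beta_2
Omega = np.array([[1.0]])

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
beta_mre = mixed_regression_estimator(X, y, R, r, Omega)
```

Two limiting cases help in reading the formula: as `Omega` grows, the prior carries no weight and the estimate approaches the OLSE; as `Omega` shrinks, the restriction `R @ beta = r` is enforced almost exactly.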

According to Kadiyala [10], now we apply the simultaneous decomposition to the two symmetric matrices and , as where is a positive definite matrix and is a positive semidefinite matrix, is a nonsingular matrix, and is a diagonal matrix with eigenvalues for and for .

Let , , , , and ; then the models (1), (2), and (3) can be written as According to Wijekoon and Trenkler [12], the corresponding MRE is given by Hence, the respective expectation vector, bias vector, and dispersion matrix are given by In the case of misspecification, now the SRRE, SRAURE, SRLE, SRAULE, SRPCRE, SRrk, and SRrd for model (7) can be written as respectively, where , , , , , , and .

It is clear that , , , and are positive definite and , , and are nonnegative definite.

Since all these estimators can be written by incorporating , now we write a generalized form to represent SRRE, SRAURE, SRLE, SRAULE, SRPCRE, SRrk, and SRrd as given below: where is a positive definite matrix if it stands for , , , and , and it is a nonnegative definite matrix if it stands for , , and .

Now the expectation vector, bias vector, the dispersion matrix, and the mean square error matrix can be written as where and .
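The step from a generalized estimator to its moments follows from linearity. As a hedged sketch, assume the generalized estimator has the linear form $\hat\beta_G = G\hat\beta_{MRE}$, where $G$ is a fixed matrix and $\beta_1$ denotes the coefficient vector of the retained regressors. The name $G$, the symbol $\beta_1$, and this exact form are assumptions made for illustration, since the source's own notation is not reproduced here. Then:

```latex
\begin{align*}
E(\hat\beta_G) &= G\,E(\hat\beta_{MRE}),\\
\mathrm{Bias}(\hat\beta_G) &= G\,E(\hat\beta_{MRE}) - \beta_1,\\
D(\hat\beta_G) &= G\,D(\hat\beta_{MRE})\,G',\\
\mathrm{MSEM}(\hat\beta_G) &= D(\hat\beta_G) + \mathrm{Bias}(\hat\beta_G)\,\mathrm{Bias}(\hat\beta_G)'.
\end{align*}
```

Each special case then differs only in the choice of $G$, which is why a single pair of superiority conditions can cover all of the estimators.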

Based on (14), the respective bias vector, dispersion matrix, and MSEM of the MRE, SRRE, SRAURE, SRLE, SRAULE, SRPCRE, SRrk, and SRrd can easily be obtained and are given in Table B1 in Appendix B.

By using the approach of Kadiyala [10] and (3) and (4), the generalized prediction function can be defined as follows: where is the actual value and is the corresponding predictor.

The MSEM of the generalized predictor is given by Note that the predictors based on the MRE, SRRE, SRAURE, SRLE, SRAULE, SRPCRE, SRrk, and SRrd are denoted by , , , , , , , and , respectively.

3. Mean Square Error Matrix (MSEM) Comparisons

If two generalized biased estimators and are given, the estimator is said to be superior to in the MSEM sense if and only if . Also, if two generalized predictors and are given, the predictor is said to be superior to in the MSEM sense if and only if .
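The MSEM criterion can be checked numerically once the two matrices are available. The sketch below uses the usual convention that one estimator is superior to another when the difference of their MSEMs (reference minus candidate) is nonnegative definite; the function names, the tolerance, and the toy matrices are hypothetical choices for illustration.

```python
import numpy as np

def msem(dispersion, bias):
    """Mean square error matrix: dispersion + bias * bias'."""
    bias = np.asarray(bias, dtype=float).reshape(-1, 1)
    return np.asarray(dispersion, dtype=float) + bias @ bias.T

def is_superior_msem(msem_candidate, msem_reference, tol=1e-10):
    """True if msem_reference - msem_candidate is nonnegative definite."""
    diff = msem_reference - msem_candidate
    diff = (diff + diff.T) / 2.0               # symmetrize against rounding error
    return bool(np.linalg.eigvalsh(diff).min() >= -tol)

# Toy case: an unbiased estimator with large variance versus a biased
# estimator with small variance (hypothetical numbers).
msem_unbiased = msem(np.diag([4.0, 4.0]), [0.0, 0.0])
msem_biased = msem(np.diag([1.0, 1.0]), [1.0, 0.0])
biased_wins = is_superior_msem(msem_biased, msem_unbiased)
```

Checking the smallest eigenvalue of the symmetrized difference is one simple way to test nonnegative definiteness; it shows how a biased estimator can still dominate when its reduction in variance outweighs the squared bias.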

Now let , , , and .

By applying Lemma A1 (see Appendix A), the following theorem can be stated for the superiority of over with respect to the MSEM criterion.

Theorem 1. If is positive definite, then is superior to in MSEM sense when the regression model is misspecified due to excluding relevant variables if and only if

Proof. Let be a positive definite matrix. According to Lemma A1 (see Appendix A), is a nonnegative definite matrix if . This completes the proof.

The following theorem can be stated for the superiority of over with respect to the MSEM criterion.

Theorem 2. If , is superior to in MSEM sense when the regression model is misspecified due to excluding relevant variables if and only if and , where , , and stands for the column space of and is an independent choice of g-inverse of .

Proof. According to (16), we can write as After some straightforward calculation, it can be written as where and .
Due to Lemma A3 (see Appendix A), is a nonnegative definite matrix if and only if , and , where stands for the column space of and is an independent choice of g-inverse of . This completes the proof.

Based on Theorems 1 and 2, we can define Corollaries C1–C28, written in Appendix C, for the superiority conditions between two selected estimators and for the respective predictors by substituting the relevant expressions for , , , and given in Table B1 in Appendix B.

4. Illustration of Theoretical Results

4.1. Numerical Example

To illustrate the theoretical results, we considered the dataset which gives the total National Research and Development Expenditures as a Percent of Gross National Product by Country from 1972 to 1986. The dependent variable of this dataset is the percentage spent by the United States, and the four other independent variables are , , , and . The variable represents the percent spent by the former Soviet Union, that spent by France, that spent by West Germany, and that spent by Japan. The data has been analysed by Gruber [18], Akdeniz and Erol [19], and Li and Yang [14], among others. Now we assemble the data as follows: Note that the eigenvalues of are 302.96, 0.728, 0.044, and 0.035, the condition number is 93, and the Variance Inflation Factor (VIF) values are 6.91, 21.58, 29.75, and 1.79. This implies the existence of serious multicollinearity in the dataset.
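The multicollinearity diagnostics quoted above can be reproduced with a few lines of code. The sketch below assumes the condition number is defined as the square root of the ratio of the largest to the smallest eigenvalue, which is consistent with the reported value of 93. The VIF helper uses the diagonal of the inverse correlation matrix and is demonstrated on hypothetical regressors, not on the paper's data.

```python
import math
import numpy as np

# Eigenvalues reported in the text for the example data.
eigenvalues = [302.96, 0.728, 0.044, 0.035]
condition_number = math.sqrt(max(eigenvalues) / min(eigenvalues))  # about 93.04

def variance_inflation_factors(X):
    """VIFs as the diagonal of the inverse correlation matrix of the columns of X."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

# Hypothetical regressors: the third column is nearly the sum of the first two.
rng = np.random.default_rng(1)
x1 = rng.standard_normal(200)
x2 = rng.standard_normal(200)
x3 = x1 + x2 + 0.1 * rng.standard_normal(200)
vifs = variance_inflation_factors(np.column_stack([x1, x2, x3]))
```

A common rule of thumb treats VIF values above 10 as a sign of serious multicollinearity, which matches the reading of the reported values 21.58 and 29.75.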

The corresponding OLS estimator of is and the estimate of is . In this example, we consider and . The SMSE values of the estimators are summarized in Tables B2-B3 in Appendix B.

Table B2 shows the estimated SMSE values of MRE, SRRE, SRAURE, SRLE, SRAULE, SRPCRE, SRrk, and SRrd for the regression model when , , and with respect to shrinkage parameters , where denotes the number of variables in the model and denotes the number of misspecified variables. Table B3 shows the estimated SMSE values of the predictor of MRE, SRRE, SRAURE, SRLE, SRAULE, SRPCRE, SRrk, and SRrd for the regression model when , , and for some selected shrinkage parameters .

Note that when the model is correctly specified, when one variable is omitted from the model, and when two variables are omitted from the model. For simplicity, we choose shrinkage parameter values and in the range .

From Table B2, we can observe that the MRE is superior to the other estimators when and SRAULE, SRRE, SRLE, and SRAURE outperform the other estimators for , , , and , respectively, when . Similarly, SRLE and SRRE are superior to the other estimators for and , respectively, when .

From Table B3, we further observe that predictors based on SRLE and SRRE outperform the other predictors for and , respectively, when and , and predictors based on SRrd and SRrk are superior to the other predictors for and , respectively, when .

4.2. Simulation

For further clarification, a Monte Carlo simulation study was carried out at different levels of misspecification using R 3.2.5. Following McDonald and Galarneau [20], we can generate the explanatory variables as follows: where is an independent standard normal pseudorandom number and is specified so that the theoretical correlation between any two explanatory variables is given by . A dependent variable is generated by using the following equation: where is a normal pseudorandom number with mean zero and variance one. Also, we select as the normalized eigenvector corresponding to the largest eigenvalue of for which 1. Further, we choose and .
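The data-generating scheme can be sketched as follows. The code uses the commonly cited form of the McDonald and Galarneau construction, $x_{ij} = (1-\rho^2)^{1/2} z_{ij} + \rho z_{i,m+1}$, under which any two regressors have theoretical correlation $\rho^2$. The function name, sample size, number of regressors, value of $\rho$, and the absence of an intercept are assumptions for illustration. The paper's study was run in R, and this Python version is only an analogue.

```python
import numpy as np

def generate_regressors(n, m, rho, rng):
    """Collinear regressors: x_ij = sqrt(1 - rho^2) * z_ij + rho * z_i,m+1."""
    z = rng.standard_normal((n, m + 1))        # independent standard normal draws
    return np.sqrt(1.0 - rho**2) * z[:, :m] + rho * z[:, [m]]

rng = np.random.default_rng(2018)
X = generate_regressors(n=100_000, m=4, rho=0.9, rng=rng)

# Coefficient vector: normalized eigenvector of X'X for the largest eigenvalue.
eigenvalues, eigenvectors = np.linalg.eigh(X.T @ X)   # eigenvalues in ascending order
beta = eigenvectors[:, -1]

# Dependent variable with standard normal disturbances (no intercept assumed).
y = X @ beta + rng.standard_normal(X.shape[0])

empirical_corr = np.corrcoef(X, rowvar=False)  # off-diagonals should be near rho^2
```

With $\rho = 0.9$ the off-diagonal entries of `empirical_corr` should be close to $0.81$, and the sample is deliberately large so that this is easy to see.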

Then the following setup is considered to investigate the effects of different degrees of multicollinearity on the estimators:
(i) , condition number = 9.49, and VIF = .
(ii) , condition number = 34.77, and VIF = .
(iii) , condition number = 115.66, and VIF = .

Three different sets of observations are considered by selecting , , and when , where denotes the number of variables in the model and denotes the number of misspecified variables. Note that when the model is correctly specified, when one variable is omitted from the model, and when two variables are omitted from the model. For simplicity, we select values and in the range .

The simulation is repeated 2000 times by generating new pseudorandom numbers, and the simulated SMSE values of the estimators and predictors are obtained using the following equations: The simulation results are summarized in Tables B4–B9 in Appendix B.
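A simulated SMSE is usually computed as the average, over replications, of the squared Euclidean distance between the estimate and the true parameter vector. The sketch below implements that standard Monte Carlo average; it is an assumption that the source's equations take this form, and the function name and toy numbers are hypothetical. The analogous quantity for a predictor replaces the coefficient vectors with predicted and actual values.

```python
import numpy as np

def simulated_smse(estimates, beta_true):
    """Average squared distance between each replicate's estimate and beta_true.

    estimates: array of shape (replications, parameters).
    """
    diffs = np.asarray(estimates, dtype=float) - np.asarray(beta_true, dtype=float)
    return float(np.mean(np.sum(diffs**2, axis=1)))

# Toy check with two replications (hypothetical numbers):
# squared distances are 0 and 8, so the average is 4.
toy_value = simulated_smse([[1.0, 2.0], [3.0, 4.0]], [1.0, 2.0])
```

In a full study the `estimates` array would hold one row per replication, for example 2000 rows, for each estimator and each choice of shrinkage parameter.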

Tables B4, B5, and B6 show the estimated SMSE values of the estimators for the regression model when , , and and , and for the selected values of shrinkage parameters , respectively. Tables B7, B8, and B9 show the corresponding estimated SMSE values of the predictors for the above regression models, respectively.

From Table B4, we can observe that MRE and SRAULE outperform the other estimators for and , respectively, when and . Further, SRLE and SRRE are superior to the other estimators for and , respectively, when under .

From Table B5, we can observe that SRAULE, MRE, and SRAURE outperform the other estimators for , , and , respectively, when . Similarly, SRAULE, SRRE, SRLE, and SRAURE are superior to the other estimators when , , , and , respectively, when , and both SRLE and SRRE outperform the other estimators for and , respectively, when and .

The results in Table B6 indicate that MRE is superior to the other estimators when , and SRAULE, SRRE, SRLE, and SRAURE outperform the other estimators for , , , and , respectively, when . Further, SRLE and SRRE outperform the other estimators for and , respectively, when and .

From Tables B7–B9, we further observe that the predictors based on SRrd and SRrk always outperform the other predictors for and , respectively, when , , and .

The SMSE values of the selected estimators are plotted with different ρ values to demonstrate the results graphically when . Figures 1–3 show the graphical illustration of the performance of estimators in the misspecified regression model when , , and , respectively. Similarly, Figures 4–6 present the graphical illustration of the performance of predictors in the misspecified regression model when , , and , respectively.

5. Conclusion

Theorems 1 and 2 give the common form of the superiority conditions for comparing the estimators (MRE, SRRE, SRAURE, SRLE, SRAULE, SRPCRE, SRrk, and SRrd) and their respective predictors under the MSEM criterion in the misspecified linear regression model when the prior information on the regression coefficients is incomplete and multicollinearity exists among the explanatory variables.

From the simulation study, the estimators and predictors that are superior to the others under different conditions can be identified. The results obtained in this research can be used to improve parameter estimation in misspecified regression models with incomplete prior information, and they are applicable to real-world problems.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Supplementary Materials

Supplementary material contains three appendix sections named Appendices A, B, and C. Appendix A: Lemmas A1–A3, used to prove the theorems. Appendix B: Tables B1–B9, which show the stochastic properties of the estimators and the results of the numerical example and simulation study. Appendix C: Corollaries C1–C28.