Abstract
We consider two well-known facts in econometrics: (i) the failure of the orthogonality assumption (i.e., the regressors are not independent of the error term), which implies biased and inconsistent Least Squares (LS) estimates, and (ii) the consequences of using nonstationary variables, acknowledged since the seventies: LS might yield spurious estimates when the variables have a trend component, whether stochastic or deterministic. In this work, an optimistic corollary is provided: it is proven that the LS regression, applied to nonstationary and cointegrated variables for which the orthogonality assumption is not satisfied, provides estimates that converge to their true values. Monte Carlo evidence suggests that this property is maintained in samples of a practical size.
1. Introduction
Two well-known facts lie behind this work: (i) the behavior of LS estimates whenever variables are nonstationary and (ii) the failure of the orthogonality assumption between independent variables and the error term, also in an LS regression.
(1) The reappraisal of the impact of unit roots in time-series observations, initiated in the late seventies, had profound consequences for modern econometrics. It became clear that (i) insufficient attention was being paid to trending mechanisms and (ii) most macroeconomic variables are probably nonstationary; such an appraisal gave rise to an extraordinary development that substantially modified the way empirical studies in time-series econometrics are carried out. Research into nonstationarity has advanced significantly since it was reassessed in several important papers, such as those of [1–5].
(2) The orthogonality problem constitutes another significant research program in econometrics; its formal seed can be traced back to [6], where a proposal to solve the identification problem is made in the estimation of demand and supply curves (see [7]). Typically, in textbooks, the method of Instrumental Variables (IVs) is proposed as a solution to the problem of simultaneous equations and, broadly speaking, whenever there is no independence between the error term and the regressors, that is, when the orthogonality assumption is not satisfied.
This paper aims to study the consequences of using nonstationary variables in an LS regression when the regressor is related to the error term; this is done in a simple regression framework. The specification under particular scrutiny is regression (1.1). To the best of our knowledge, the asymptotics and the finite-sample properties of the combination of nonstationarity, nonorthogonality between and , and LS estimates have been scarcely studied (but see [8]). That said, we acknowledge that there are several comprehensive studies concerning the use of IV in the presence of nonstationarity; [9, 10], for example, studied the asymptotics as well as the finite-sample properties of the IV estimator in the context of a cointegrated relationship and proved that even spurious instruments (i.e., instruments not structurally related to the regressors) provide consistent estimates. Phillips [11] proved that, when there is no structural relationship between the regressand and a single regressor, that is, when there is no cointegration between and , the use of spurious instruments does not prevent the spurious regression phenomenon (this is a simple extension of [5]). We derive the asymptotic behavior of LS estimates, where the data generating processes (DGPs) consist of two cointegrated variables in which the regressor bears a relationship with the error term. In this case, LS provide consistent estimates. In other words, LS estimates of the true DGP parameters, and (see (2.4) in the next section), do not require the information on the parameters of , that is, is weakly exogenous for the estimation of and as defined by [12]. Additionally, some Monte Carlo evidence is presented to account for the adequacy of asymptotic results in finite samples.
2. Relevant DGPs
This work aims to study the asymptotic properties of LS estimates when neither the orthogonality nor the stationarity assumptions are satisfied. Our approach is twofold: we assume (i) the variable is statistically related to the innovations of , as in the problem of independent variables measured with error and (ii) the DGPs of both variables are interdependent, as in the problem of simultaneity. All the cases studied consider nonstationary and cointegrated variables (DGP (2.1) is included because it eases the comprehension of the paper) where , for , are independent white noises with zero-mean and constant variance , and is an initial condition.
We may relax the assumptions made for the innovations; for example, we could force them to obey the general level conditions in [5, Assumption 1]. Nevertheless, although the asymptotic results would still hold in this case, our primary target concerns the problem of orthogonality between the regressor and the error term, not those of autocorrelation or heteroskedasticity. These DGPs allow for an interesting variety of cases (note that the asymptotics of the LS estimates when and have been independently generated by any of the first three DGPs can be found, e.g., in [13]; notwithstanding, the authors can provide these cases as Mathematica code upon request).
(1) Bookcase no. 1: DGP of is (2.1) and DGP of is (2.4) with . When the variables are generated in this manner, we fulfill the classical assumptions made in most basic econometrics textbooks. The variables are stationary, the innovations are homoskedastic and independent, and so forth. It is straightforward to show that: , and .
(2) Bookcase no. 2: DGP of is (2.1) and DGP of is (2.4) with . These DGPs also represent a typical example of a problem of orthogonality in most basic econometrics textbooks. Although the variables are stationary and the innovations are homoskedastic and independent, the explanatory variable is related to the innovations of . It is well known that the estimates do not converge to their true value. In particular, it is straightforward to show that: , and .
(3) Bookcase no. 3: DGP of is (2.2) and DGP of is (2.4) with . These DGPs allow the relationship between and to be cointegrated à la [14]. Once again, asymptotic results have been known for a long time; obtaining them does not entail any particular difficulty: , , and .
(4) Nonstationarity and non-orthogonality case no. 1: DGP of is (2.2) and DGP of is (2.4). Notwithstanding the obvious problem of orthogonality between and the error term, the variables remain cointegrated. The artifact employed to induce the orthogonality problem can be considered as, for example, measurement errors in the explanatory variable. One should expect that, in the presence of this problem, estimates would not converge to their true value. We prove below that, contrary to expectations, this is not the case.
(5) Nonstationarity and non-orthogonality case no. 2: DGP of is (2.3) and DGP of is (2.4). As in the previous case, we have a cointegrated relationship between and ; only in this case, the problem of orthogonality between the regressor and the error term is even more explicit; the artifact employed to induce the orthogonality problem can be related to the typical simultaneous equations case. We also prove below that Least Squares (LS) provide consistent estimates.
The common belief regarding the last two cases is that the failure of the orthogonality assumption induces LS to generate inconsistent estimates, even in a cointegrated relationship. In fact, when the variables are generated as in (2.2)–(2.4), the parameter estimates converge to their true values (note that we did not consider the case where the orthogonality assumption is not satisfied because of the omission of a relevant variable; [15] studied the latter case and proved that the LS estimates do not converge to their true values). This is proven in Theorem 2.1:
Theorem 2.1. Let be generated by (2.4).
(i) Let be generated by (2.2). The innovations of both DGPs, , for , are independent white noises with zero-mean and constant variance ; use and to estimate regression (1.1) by LS. Hence, as , (a), (b), (c), (d), (e).
(ii) Let be generated by (2.3). The innovations of both DGPs, , for , are independent white noises with zero-mean and constant variance ; use and to estimate regression (1.1) by LS. Hence, as , (a), (b), (c), (d), (e).
Proof. See Appendix A.
These asymptotic results show that a relationship between the innovations of and —as stated by DGPs (2.2), (2.3), and (2.4)—does not obstruct the consistency of LS estimates when the variables are nonstationary and cointegrated (our results are in line with those of [8]). In other words, the failure of the orthogonality assumption does not preclude adequate asymptotic properties of LS. Furthermore, it can be said that is weakly exogenous for the estimation of and but not for the estimation of . The formula of the variance is noteworthy and the asymptotic expression of depends on the values of , , and .
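As an informal numerical illustration of this result, the following Python sketch contrasts a stationary regressor with a trending one when the regressor's innovation contains the regression error. The DGPs used here are assumptions made purely for illustration (a regressor equal to a linear trend plus a random walk whose increments are e_t + rho*u_t, and y_t = alpha + beta*x_t + u_t); they are not the paper's (2.1)–(2.4), and every function name and parameter value is hypothetical.

```python
import numpy as np


def ls_slope(T, beta=1.0, alpha=2.0, rho=0.8, drift=0.5,
             nonstationary=True, seed=0):
    """Return the LS slope estimate for one simulated sample.

    Assumed (hypothetical) DGPs, not the paper's:
      stationary:    x_t = drift + e_t + rho * u_t
      nonstationary: x_t = drift * t + sum_{s<=t} (e_s + rho * u_s)
      regressand:    y_t = alpha + beta * x_t + u_t
    In both cases x_t is contemporaneously correlated with u_t.
    """
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(T)   # regression error
    e = rng.standard_normal(T)   # regressor's own innovation
    shock = e + rho * u          # regressor innovation contaminated by u
    if nonstationary:
        x = drift * np.arange(1, T + 1) + np.cumsum(shock)
    else:
        x = drift + shock
    y = alpha + beta * x + u
    X = np.column_stack([np.ones(T), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]


if __name__ == "__main__":
    for T in (50, 500, 5000):
        b_stat = ls_slope(T, nonstationary=False)
        b_trend = ls_slope(T, nonstationary=True)
        print(f"T={T:5d}  stationary: {b_stat:.4f}  trending: {b_trend:.4f}")
```

Under these assumed DGPs with unit innovation variances, the stationary slope estimate settles near beta + rho/(1 + rho^2) rather than beta, whereas the estimate from the trending, cointegrated pair approaches beta as T grows.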
In order to emphasize the relevance of this result, we modified the DGPs of the variables in an effort to strengthen the link between the DGPs and the literature on simultaneous equations. The modifications are twofold and appear in the following propositions. As in Theorem 2.1, the results in both propositions are derived under the assumption that the innovations are i.i.d. processes.
Proposition 2.2. Let and be generated by where , for , are independent white noises with zero mean and variance . Let these variables be used to estimate regression (1.1) by LS. Hence, as , (1), (2), (3), (4), (5).
Proof. See Appendix A.
Proposition 2.3. Let and be generated by where , for , are independent white noises with zero mean and variance , and . Let these variables be used to estimate regression (1.1) by LS. Hence, as , (1), (2), (3), (4), (5).
Proof. See Appendix A.
The two systems, represented in (2.5) and (2.6), bear a striking resemblance to classical examples of simultaneous equations in econometrics. The fundamental variations are (i) a deterministic trend in the variable in system (2.5) and (ii) a stochastic as well as a deterministic trend in system (2.6). The asymptotics of LS estimates do not show significant differences from those in Theorem 2.1. Note, however, that is weakly exogenous for the estimation of , , and . The main result is in fact identical, that is, the failure of orthogonality between and the error term does not preclude the estimates from converging to their true values.
Asymptotic properties of LS estimators clearly provide an encouraging perspective in time-series econometrics. Notwithstanding, we should bear in mind that asymptotic properties may be a poor finite-sample approximation. In order to observe the behavior of LS estimates in finite samples, we present two Monte Carlo experiments. Firstly, we represent graphically the convergence process of towards its true value, . In accordance with asymptotic results, as . We reproduce the behavior of the latter difference in Figure 1. The variables and are generated according to (2.3) and (2.4), respectively. The sample size varies from 50 to 700 whilst goes from −5 to 5. The remaining parameters appear below the figure.
A brief glance at Figure 1 reveals that the asymptotic results stated in Theorem 2.1 approximate the finite-sample results well for . For smaller sample sizes, the difference between the parameter and its estimates usually amounts to approximately 1.5% or less of the value of the former (we tried different variables in the axis (); all of these trials produced similar figures).
The second Monte Carlo experiment is built upon the same basis. In Table 1, each cell reports the sample mean of and, below it, its estimated standard deviation (in parentheses). The number of replications is 10,000. The parameter values used in the simulation are explicit within the table. The variables, and , are generated according to (2.3) and (2.4), respectively. Sample size ranges from ; ; ; ; ; the error term is a white noise with variance .
Table 1 shows that LS estimates of a nonstationary relationship with a nonorthogonality problem quickly converge to their true value; with a sample size as small as 50 observations, the difference between and its estimate averages, at most, 0.015, and represents a deviation from the true value of 1.5%; in many other cases, the deviation is even smaller, of order $10^{-3}$ to $10^{-4}$. These differences tend to diminish further as the sample size grows. In fact, when there are 700 observations, the order of magnitude of such differences oscillates between $10^{-5}$ and $10^{-8}$. We performed the same experiment with autocorrelated disturbances with (data available upon request); using such disturbances severely deteriorates the efficiency of the LS estimates although still converges to zero; we do not focus on this issue because, as mentioned earlier, neither autocorrelation nor heteroskedasticity is under scrutiny in this work.
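The design of this second experiment can be sketched in Python as follows. This is only a rough illustration under assumed DGPs (a regressor made of a linear trend plus a random walk whose increments contain the regression error, and a linear regressand equation); it does not reproduce the paper's (2.3) and (2.4) or the figures in Table 1, and the sample-size grid, parameter values, and function name are hypothetical.

```python
import numpy as np


def mc_slope_error(T, reps=2000, beta=1.0, alpha=2.0, rho=0.8,
                   drift=0.5, seed=12345):
    """Monte Carlo mean and standard deviation of (beta_hat - beta).

    Assumed (hypothetical) DGPs, not the paper's:
      x_t = drift * t + sum_{s<=t} (e_s + rho * u_s)
      y_t = alpha + beta * x_t + u_t
    Each row of the arrays below is one replication of length T.
    """
    rng = np.random.default_rng(seed)
    u = rng.standard_normal((reps, T))
    e = rng.standard_normal((reps, T))
    x = drift * np.arange(1, T + 1) + np.cumsum(e + rho * u, axis=1)
    y = alpha + beta * x + u
    # LS slope with an intercept: demean both series, then ratio of sums.
    xd = x - x.mean(axis=1, keepdims=True)
    yd = y - y.mean(axis=1, keepdims=True)
    beta_hat = (xd * yd).sum(axis=1) / (xd ** 2).sum(axis=1)
    diff = beta_hat - beta
    return diff.mean(), diff.std(ddof=1)


if __name__ == "__main__":
    for T in (50, 100, 200, 500, 700):  # hypothetical grid
        mean_diff, sd_diff = mc_slope_error(T)
        print(f"T={T:4d}  mean(beta_hat - beta)={mean_diff: .2e}  "
              f"sd={sd_diff:.2e}")
```

Each printed row plays the role of one cell of a table such as Table 1: the mean estimation error and, next to it, its Monte Carlo standard deviation.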
3. Concluding Remarks
Using cointegrated variables in an LS regression where the regressor is not independent of the error term does not preclude the method from yielding consistent estimates. In other words, it is proven that, under these circumstances, the regressor remains weakly exogenous for the estimation of and (and for in systems (2.5) and (2.6)) as defined by [12]. Furthermore, the finite-sample evidence indicates that LS provide good estimates even in samples of a practical size.
Notwithstanding, one should note the striking resemblance between the properties of the DGPs used in the propositions and those of variables belonging to a classical simultaneous-equation model. It may be possible that the estimation of such models, even if the macroeconomic variables they are fed with are not stationary, would yield correct estimates. Of course, such a possibility presupposes the absence of structural shifts, parameter instability, omission of a relevant variable, or any other major assumption failure.
Appendix
A. Proof of Theorem 2.1 and Propositions 2.2 and 2.3
The estimated specification in Theorem 2.1 and Propositions 2.2 and 2.3 is . In all three cases, we employ the following classical LS formulae (all sums run from to unless otherwise specified): (i), (ii), (iii), (iv),
where
, , and is the element in row 2, column 2, of the matrix.
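For readers who want to see how LS quantities of this kind are assembled from the underlying sums, here is a small Python sketch of the textbook formulae for a regression with an intercept and one regressor. It is an assumption that these match the paper's expressions exactly; in particular, the degrees-of-freedom correction (T - 2) in the variance estimator and the set of returned statistics are choices made here for illustration, and the function name is hypothetical.

```python
import numpy as np


def ls_from_sums(x, y):
    """Textbook LS quantities for y_t = alpha + beta * x_t + u_t,
    built from the sums that make up X'X and X'y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    T = x.size
    # X'X and X'y for the regressor matrix X = [1, x_t]
    XtX = np.array([[T, x.sum()],
                    [x.sum(), (x ** 2).sum()]])
    Xty = np.array([y.sum(), (x * y).sum()])
    XtX_inv = np.linalg.inv(XtX)
    alpha_hat, beta_hat = XtX_inv @ Xty
    resid = y - alpha_hat - beta_hat * x
    # Assumption: residual variance with a T - 2 correction.
    sigma2_hat = (resid ** 2).sum() / (T - 2)
    # The slope variance uses the row 2, column 2 element of (X'X)^{-1}.
    t_beta = beta_hat / np.sqrt(sigma2_hat * XtX_inv[1, 1])
    r2 = 1.0 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()
    return {"alpha_hat": alpha_hat, "beta_hat": beta_hat,
            "sigma2_hat": sigma2_hat, "t_beta": t_beta, "r2": r2}
```

Writing the estimator in terms of these sums makes it clear why the asymptotic analysis reduces to finding the order in probability of each sum.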
To obtain the asymptotics of , , , , and we need to ascertain the behavior of the following expressions when : , , , , and . The behavior of these expressions varies depending on the DGP of the variables and . We present such behavior for the DGPs underlying Theorem 2.1 and Propositions 2.2 and 2.3. All of the orders in probability stated in the underbraced sums can be found in [5, 13, 16–18]. It is important to clarify that the computation of the asymptotics follows [5] and was assisted by Mathematica; we therefore reproduce the expressions below as Mathematica code.
A.1. Theorem 2.1: First Result
The expressions needed to compute the asymptotic values of , , , and are where and is an initial condition. The sums including solely the deterministic trend component are The code in this case is represented below. To understand it, a brief glossary is required and appears in Table 2.
These expressions were written as Mathematica 7.0 code.
A.2. Theorem 2.1: Second Result
The expressions needed to compute the asymptotic values of , , , and appear below. Note that , , and are identical to the ones presented in the previous appendix and have therefore been omitted. The code in this case is presented below.
A.3. Proposition 2.2
First note that DGP (2.5) can be written as where, , , and . The expressions needed to compute the asymptotic values of , , , and are The code in this case is represented below.
A.4. Proposition 2.3
As in the previous appendix, first rewrite DGP (2.6) as where, , , and . The expressions needed to compute the asymptotic values of , , , and appear below. Note that , , and are identical to the ones presented in the previous appendix and have therefore been omitted. The code in this case is presented below:
A.5. Computation of the Asymptotics
The previous four appendices provide the Mathematica code of , , , , and for different DGP combinations. We now present the code that computes the asymptotics of (1.1) in any such combination. Note that the code computes the asymptotics in the following order: the matrix , , , , , , and . Comments are enclosed between (* and *).
Acknowledgments
The authors would like to thank an anonymous referee for insightful comments. The opinions in this paper correspond to the authors and do not necessarily reflect the point of view of Banco de México.