Abstract
This paper studies a linear regression model whose errors are functional coefficient autoregressive processes. Firstly, the quasi-maximum likelihood (QML) estimators of the unknown parameters are given. Secondly, under general conditions, the asymptotic properties (existence, consistency, and asymptotic distributions) of the QML estimators are investigated. These results extend those of Maller (2003), White (1959), Brockwell and Davis (1987), and others. Lastly, the validity and feasibility of the method are illustrated by a simulation example and a real example.
1. Introduction
Consider the following linear regression model:
$$y_t = x_t^{T}\beta + \varepsilon_t, \quad t = 1, 2, \ldots, n, \tag{1.1}$$
where the $y_t$'s are scalar response variables, the $x_t$'s are explanatory variables, $\beta$ is a $d$-dimensional unknown parameter, and the $\varepsilon_t$'s are functional coefficient autoregressive processes given as
$$\varepsilon_1 = \eta_1, \qquad \varepsilon_t = f_t(\theta)\,\varepsilon_{t-1} + \eta_t, \quad t = 2, 3, \ldots, n, \tag{1.2}$$
where the $\eta_t$'s are independent and identically distributed random errors with zero mean and finite variance $\sigma^2$, $\theta$ is a one-dimensional unknown parameter, and $f_t(\theta)$ is a real valued function defined on a compact set $\Theta$ which contains the true value $\theta_0$ as an inner point and is a subset of $R^1$. The values of $\theta_0$ and $\sigma^2$ are unknown.
Model (1.1) includes many special cases, such as an ordinary linear regression model (when $f_t(\theta) \equiv 0$; see [1–11]; in the sequel, we always assume that $f_t(\theta) \neq 0$ for some $\theta \in \Theta$), a linear regression model with constant coefficient autoregressive processes (when $f_t(\theta) = \theta$; see Maller [12], Pere [13], and Fuller [14]), time-dependent and functional coefficient autoregressive processes (when $\beta = 0$; see Kwoun and Yajima [15]), constant coefficient autoregressive processes (when $f_t(\theta) = \theta$ and $\beta = 0$; see White [16, 17], Hamilton [18], Brockwell and Davis [19], and Abadir and Lucas [20]), time-dependent or time-varying autoregressive processes (when $f_t(\theta) = a_t$ and $\beta = 0$; see Carsoule and Franses [21], Azrak and Mélard [22], and Dahlhaus [23]), and so forth.
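To make the structure of model (1.1)-(1.2) concrete, the following is a minimal simulation sketch. It assumes the regression errors follow a first-order recursion whose coefficient is a user-supplied function of the time index and a scalar parameter. The function name `simulate_fcar_regression`, the argument names, the Gaussian innovations, and the start-up value (first error equal to the first innovation) are illustrative assumptions rather than part of the paper.

```python
import numpy as np

def simulate_fcar_regression(X, beta, f, theta, sigma=1.0, seed=0):
    """Simulate y_t = x_t' beta + eps_t, where eps_t = f(t, theta) * eps_{t-1} + eta_t.

    X     : (n, d) design matrix
    beta  : (d,) regression coefficients
    f     : callable f(t, theta) returning the autoregressive coefficient at index t
    theta : scalar parameter passed to f
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # i.i.d. zero-mean innovations; normality is used only for illustration
    eta = rng.normal(0.0, sigma, size=n)
    eps = np.empty(n)
    eps[0] = eta[0]  # assumed start-up value for the error process
    for t in range(1, n):
        eps[t] = f(t, theta) * eps[t - 1] + eta[t]
    y = X @ beta + eps
    return y, eps, eta
```

Passing `f = lambda t, theta: 0.0` gives the ordinary linear regression model with i.i.d. errors, and `f = lambda t, theta: theta` gives regression with constant coefficient autoregressive errors.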
Regression analysis is one of the most mature and widely applied branches of statistics, with applications in almost every field, including engineering, economics, the physical sciences, management, the life and biological sciences, and the social sciences. The linear regression model is among the most important and popular models in the statistical literature, and many statisticians have studied the estimation of its coefficients. For the ordinary linear regression model (when the errors are independent and identically distributed random variables), Bai and Guo [1], Chen [2], Anderson and Taylor [3], Drygas [4], González-Rodríguez et al. [5], Hampel et al. [6], He [7], Cui [8], Durbin [9], Hoerl and Kennard [10], Li and Yang [11], and Zhang et al. [24] used various estimation methods (least squares, robust estimation, biased estimation, and Bayes estimation) to obtain estimators of the unknown parameters in (1.1) and discussed some large or small sample properties of these estimators.
However, the independence assumption for the errors is not always appropriate in applications, especially for sequentially collected economic and physical data, which often exhibit evident dependence in the errors. Recently, linear regression with serially correlated errors has attracted increasing attention from statisticians. One case of considerable interest is that in which the errors are autoregressive processes; the asymptotic theory of the corresponding estimator was developed by Hannan and Kavalieris [25]. Fox and Taqqu [26] established its asymptotic normality in the case of long-memory stationary Gaussian observation errors. Giraitis and Surgailis [27] extended this result to non-Gaussian linear sequences. The asymptotic distribution of the maximum likelihood estimator was studied by Giraitis and Koul [28] and Koul [29] when the errors are nonlinear instantaneous functions of a Gaussian long-memory sequence. Koul and Surgailis [30] established the asymptotic normality of the Whittle estimator in linear regression models with non-Gaussian long-memory moving average errors. When the errors are Gaussian, or a function of Gaussian random variables that are strictly stationary and long range dependent, Koul and Mukherjee [31] investigated the linear model. Shiohama and Taniguchi [32] estimated the regression parameters in a linear regression model with an autoregressive process.
In addition, the (constant, functional, or random coefficient) autoregressive model has gained much attention and has been applied to many fields, such as economics, physics, geography, geology, biology, and agriculture. Fan and Yao [33], Berk [34], Hannan and Kavalieris [35], Goldenshluger and Zeevi [36], Liebscher [37], An et al. [38], Elsebach [39], Carsoule and Franses [21], Baran et al. [40], Distaso [41], and Harvill and Ray [42] used various estimation methods (the least squares method, the Yule-Walker method, the method of stochastic approximation, and robust estimation) to obtain estimators and discussed their asymptotic properties, or investigated hypothesis testing.
This paper discusses the model (1.1)-(1.2), including stationary and explosive processes. The organization of the paper is as follows. In Section 2 some estimators of and are given by the quasi-maximum likelihood method. In Section 3, under general conditions, the existence and consistency of the quasi-maximum likelihood estimators are investigated, as well as their asymptotic normality. Some preliminary lemmas are presented in Section 4. The main proofs are presented in Section 5, with some examples in Section 6.
2. Estimation Method
Write the “true” model as where , and 's are i.i.d errors with zero mean and finite variance . Define , and by (2.2) we have Thus is measurable with respect to the -field generated by , and
Assume at first that the ’s are i.i.d. . Using similar arguments to those of Fuller [14] or Maller [12], we get the log-likelihood of conditional on : At this stage we drop the normality assumption, but still maximize (2.5) to obtain QML estimators, denoted by (when they exist): Thus satisfy the following estimation equations: where
Remark 2.1. If then the above equations become the same as Maller's [12]. Therefore, we extend the QML estimators of Maller [12].
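The estimation idea of this section can be sketched numerically as follows. The sketch assumes a Gaussian quasi-log-likelihood conditional on the first observation, written in terms of the implied innovations (current regression error minus the autoregressive coefficient times the previous regression error). For a fixed autoregressive parameter, maximizing over the regression coefficients is then an ordinary least squares fit on quasi-differenced data, and the variance estimate is the mean squared innovation. All function names, the coarse grid search over the autoregressive parameter, and the convention that `f(t, theta)` may return a scalar or an array are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def transformed_data(theta, y, X, f):
    """Quasi-difference the data: y*_t = y_t - f_t y_{t-1}, x*_t = x_t - f_t x_{t-1}."""
    n = y.size
    t = np.arange(1, n)  # 0-based positions of observations 2..n
    c = np.broadcast_to(np.asarray(f(t, theta), dtype=float), (n - 1,))
    y_star = y[1:] - c * y[:-1]
    X_star = X[1:] - c[:, None] * X[:-1]
    return y_star, X_star

def quasi_loglik(beta, theta, sigma2, y, X, f):
    """Gaussian quasi-log-likelihood, conditional on the first observation."""
    y_star, X_star = transformed_data(theta, y, X, f)
    u = y_star - X_star @ beta  # implied innovations
    m = u.size
    return -0.5 * m * np.log(2.0 * np.pi * sigma2) - 0.5 * np.sum(u ** 2) / sigma2

def profile_fit(theta, y, X, f):
    """For fixed theta, maximize over beta (least squares) and sigma^2 (mean square)."""
    y_star, X_star = transformed_data(theta, y, X, f)
    beta_hat, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)
    u = y_star - X_star @ beta_hat
    return beta_hat, np.mean(u ** 2)

def grid_search(y, X, f, theta_grid):
    """Pick the grid value of theta with the largest profiled quasi-log-likelihood."""
    best = None
    for theta in theta_grid:
        beta_hat, sigma2_hat = profile_fit(theta, y, X, f)
        value = quasi_loglik(beta_hat, theta, sigma2_hat, y, X, f)
        if best is None or value > best[0]:
            best = (value, theta, beta_hat, sigma2_hat)
    return best[1], best[2], best[3]
```

With `f = lambda t, theta: theta` this reduces to regression with constant coefficient autoregressive errors, the special case covered by Remark 2.1. A grid search is only a crude starting point; a gradient-based iteration would normally refine it.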
To calculate the values of the QML estimators, we may use the grid search method, the steepest ascent method, the Newton-Raphson method, or a modified Newton-Raphson method. For the computations in Section 6, we use the popular modified Newton-Raphson method of Davidon, Fletcher, and Powell (see Hamilton [18]).
Let vector denote an estimator of that has been calculated at the th iteration, and let denote an estimate of . The new estimator is given by
for the positive scalar that maximizes where vector
and symmetric matrix
where
Once and the gradient at have been calculated, a new estimate is found from
where
Least squares estimators are known to perform well in the ordinary linear regression model, so a natural recursive procedure is to start the iteration with , which are the least squares estimators of and , respectively. Take such that . The iterations are stopped once some termination criterion is reached, for example, if
for some prechosen small number .
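The iteration described above can be sketched as a generic Davidon-Fletcher-Powell ascent routine. The sketch follows the maximization form of the update given in Hamilton [18]: the step direction is the current matrix times the gradient, and the matrix is updated from the change in the estimate and the change in the gradient. Several details are assumptions made for illustration: the gradient is approximated by central differences, the step length is chosen by simple step halving instead of an exact one-dimensional maximization, the starting matrix is the identity, and the stopping rule compares the norm of the change in the estimate (or of the gradient) with a small tolerance.

```python
import numpy as np

def num_grad(fun, x, h=1e-6):
    """Central-difference approximation to the gradient of fun at x."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (fun(x + e) - fun(x - e)) / (2.0 * h)
    return g

def dfp_maximize(fun, x0, tol=1e-8, max_iter=200):
    """Maximize fun by a DFP quasi-Newton iteration with step halving."""
    x = np.asarray(x0, dtype=float)
    A = np.eye(x.size)  # assumed initial positive definite matrix
    g = num_grad(fun, x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        d = A @ g  # ascent direction
        fx = fun(x)
        s = 1.0
        # halve the step until the objective strictly increases
        while fun(x + s * d) <= fx and s > 1e-12:
            s *= 0.5
        if s <= 1e-12:
            break  # no improving step found
        x_new = x + s * d
        g_new = num_grad(fun, x_new)
        dx = x_new - x
        dg = g_new - g
        Adg = A @ dg
        denom1 = dg @ Adg
        denom2 = dg @ dx  # negative when the objective is locally concave
        if denom1 > 1e-12 and denom2 < -1e-12:
            A = A - np.outer(Adg, Adg) / denom1 - np.outer(dx, dx) / denom2
        x, g = x_new, g_new
        if np.linalg.norm(dx) < tol:
            break  # termination criterion on the change in the estimate
    return x
```

For example, `dfp_maximize(lambda x: -(x[0] - 1.0) ** 2 - 2.0 * (x[1] + 0.5) ** 2, np.zeros(2))` moves from the origin towards the maximizer (1, -0.5). In the setting of this section, `fun` would be the quasi-log-likelihood as a function of the stacked parameter vector, started at the least squares estimates.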
Up to this point, we have obtained the values of the QML estimators when the function is known. In practice, however, the function is rarely known and has to be estimated. By (2.12) and (1.2), we obtain
Based on the dataset , we may obtain an estimate of the function by some smoothing method (see Simonoff [43], Fan and Yao [33], Green and Silverman [44], Fan and Gijbels [45], etc.).
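As one possible smoothing approach, the sketch below estimates a slowly varying autoregressive coefficient from regression residuals by kernel-weighted local least squares: at each time point, the coefficient is the weighted least squares slope of the current residual on the previous one, with Gaussian kernel weights in the time index. This is an illustrative choice and not necessarily the smoother used in the paper; the function name `local_ar_coefficient` and the `bandwidth` parameter are hypothetical.

```python
import numpy as np

def local_ar_coefficient(resid, bandwidth):
    """Kernel-weighted local least squares estimate of a time-varying AR(1) coefficient.

    resid     : (n,) residuals from a preliminary regression fit
    bandwidth : width of the Gaussian kernel, in units of the time index
    Returns an array of length n - 1 with one estimate per transition.
    """
    resid = np.asarray(resid, dtype=float)
    n = resid.size
    t = np.arange(1, n)
    num = resid[1:] * resid[:-1]  # cross products e_t * e_{t-1}
    den = resid[:-1] ** 2         # squared lagged residuals
    est = np.empty(n - 1)
    for i, t0 in enumerate(t):
        w = np.exp(-0.5 * ((t - t0) / bandwidth) ** 2)  # Gaussian weights in time
        est[i] = np.sum(w * num) / np.sum(w * den)
    return est
```

A small bandwidth tracks rapid changes at the price of higher variance, and a large bandwidth approaches the global least squares slope. The bandwidth choice is left open here.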
To obtain our results, the following conditions are sufficient.
(A1) is positive definite for sufficiently large and
where and denotes the maximum in absolute value of the eigenvalues of a symmetric matrix.
(A2) There is a constant such that
for any and .
(A3) The derivatives exist and are bounded for any and .
Remark 2.2. Maller [12] applied the condition (A1), and Kwoun and Yajima [15] used the conditions (A2) and (A3). Thus our conditions are general. (A1) delineates the class of for which our results hold in the sense required. It is further discussed by Maller in [12]. Kwoun and Yajima [15] call stable if is bounded. Thus (A2) implies that is stable. However, is not stationary. In fact, by (2.3), we obtain that
which depends on $t$.
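The non-stationarity noted in Remark 2.2 can be checked numerically. If each innovation is independent of the previous error, the variance of the error process satisfies a simple recursion: the variance at time t equals the squared autoregressive coefficient at time t times the variance at time t-1, plus the innovation variance. The sketch below evaluates this recursion, assuming the process starts with the first error equal to the first innovation; the function name is illustrative.

```python
import numpy as np

def error_variances(f_values, sigma2=1.0):
    """Variances of eps_1, ..., eps_n given coefficients f_2, ..., f_n.

    Uses Var(eps_1) = sigma2 and Var(eps_t) = f_t^2 * Var(eps_{t-1}) + sigma2.
    """
    f_values = np.asarray(f_values, dtype=float)
    var = np.empty(f_values.size + 1)
    var[0] = sigma2  # assumed start-up: eps_1 equals the first innovation
    for i, ft in enumerate(f_values, start=1):
        var[i] = ft ** 2 * var[i - 1] + sigma2
    return var
```

With a constant coefficient of 0.5 and unit innovation variance, the variances start at 1, 1.25, 1.3125 and approach 4/3, so the process is stable but its variance still changes with t. With a constant coefficient of 1 the variance grows linearly, as for a random walk.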
For ease of exposition, we introduce the following notation, which will be used later in the paper.
Define -vector , and
By (2.7) and (2.8), we get
where and the * indicates that the element is filled in by symmetry. Thus,
where
3. Statement of Main Results
Theorem 3.1. Suppose that conditions (A1)–(A3) hold. Then there is a sequence such that, for each , as , the probability Furthermore, where, for each and ; define neighborhoods
Theorem 3.2. Suppose that conditions (A1)–(A3) hold. Then
Remark 3.3. For , our results still hold.
In the following, we investigate some special cases of the model (1.1)-(1.2). Although the following results are obtained directly from Theorems 3.1 and 3.2, we state them in order to compare them with the corresponding known results.
Corollary 3.4. Let . If condition (A1) holds, then, for , (3.1), (3.2), and (3.4) hold.
Remark 3.5. These results are the same as the corresponding results of Maller [12].
Corollary 3.6. If and , then, for , where
Remark 3.7. These estimators are the same as the least squares estimators (see White [16]). For , are explosive processes. In that case, the corollary is the same as the results of White [17]. When , notice that and , and by Corollary 3.6 we obtain The result was discussed by many authors, such as Fujikoshi and Ochi [46] and Brockwell and Davis [19].
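The stationary case mentioned in Remark 3.7 can be illustrated by a small Monte Carlo experiment. For a constant coefficient autoregressive process of order one with coefficient strictly between -1 and 1, the classical result (see, e.g., Brockwell and Davis [19]) is that the least squares estimator, centered at the true coefficient and scaled by the square root of the sample size, is asymptotically normal with mean zero and variance one minus the squared coefficient. The sketch below checks this empirically. The function name, Gaussian innovations, and sample sizes are illustrative assumptions.

```python
import numpy as np

def mc_scaled_errors(theta, n, reps, seed=0):
    """Simulate sqrt(n) * (theta_hat - theta) for the least squares AR(1) estimator."""
    rng = np.random.default_rng(seed)
    eta = rng.normal(size=(reps, n))  # one row of innovations per replication
    eps = np.empty_like(eta)
    eps[:, 0] = eta[:, 0]
    for t in range(1, n):
        eps[:, t] = theta * eps[:, t - 1] + eta[:, t]
    # least squares slope of eps_t on eps_{t-1}, computed row by row
    num = np.sum(eps[:, 1:] * eps[:, :-1], axis=1)
    den = np.sum(eps[:, :-1] ** 2, axis=1)
    theta_hat = num / den
    return np.sqrt(n) * (theta_hat - theta)
```

For instance, with `theta = 0.5` the empirical variance of `mc_scaled_errors(0.5, 400, 1000)` should be close to 0.75, up to simulation noise and a small finite-sample bias.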
Corollary 3.8. Let . If conditions (A2) and (A3) hold, then where
Corollary 3.9. Let . If condition (A1) holds, then
Remark 3.10. Let . Note that and we easily obtain asymptotic normality of the (quasi-)maximum likelihood or least squares estimator in ordinary linear regression models from the corollary.
4. Some Lemmas
To prove Theorems 3.1 and 3.2, we first introduce the following lemmas.
Lemma 4.1. The matrix is positive definite for large enough with and .
Proof. It is easy to show that the matrix is positive definite for large enough . By (2.8), we have Note that and are independent of each other; thus by (2.7) and , we have Hence, from (4.1) and (4.2), By (2.8) and (2.17), we have Note that is a martingale difference sequence with so By (2.7) and (2.8) and noting that and are independent of each other, we have From (4.4)–(4.7), it follows that .
Lemma 4.2. If condition (A1) holds, then, for any , the matrix is positive definite for large enough , and
Proof. Let and be the smallest and largest roots of . Then from Rao [47, Ex 22.1], for unit vectors . Thus by (2.24), there are some and such that implies that By (4.10), we have By Rao [47, page 60] and (2.23), we have From (4.12) and ,
Lemma 4.3 (see [48]). Let be a symmetric random matrix with eigenvalues . Then
Lemma 4.4. For each , and also where
Proof. Let be a square root decomposition of . Then
Let . Then
From (2.28), (2.29), and (4.18),
Let
In the first step, we will show that, for each ,
In fact, note that
where
Let , and let . By the Cauchy-Schwarz inequality, Lemma 4.2, condition (A3), and noting that , we have that
Here for some . As in the proof of , we easily obtain that
By the Cauchy-Schwarz inequality, Lemma 4.2, condition (A3), and noting that , we have that
Hence, (4.24) follows from (4.25)–(4.29).
In the second step, we will show that
Note that
Consider
where
For and each , we have
By (4.34) and Lemma 4.2, we have
Using the Cauchy-Schwarz inequality, condition (A3), and (4.35), we obtain
Let
Then from Lemma 4.2,
By condition (A2) and (4.38), we have
Thus by Chebyshev's inequality and (4.39),
Using an argument similar to that for , we obtain that
Using an argument similar to that for , we obtain that
By the Cauchy-Schwarz inequality, (4.35), and (4.27), we get
Thus (4.30) follows immediately from (4.32), (4.36), and (4.40)–(4.43).
In the third step, we will show that
Let
Then
By (4.34), it is easy to show that
From condition (A3), (2.30), and (4.23), we obtain that
Hence, by Markov's inequality,
Using an argument similar to that for (4.40), we easily obtain that
By Markov's inequality and noting that
we have that
Using an argument similar to that for (4.6), we easily obtain that
Hence, (4.44) follows immediately from (4.46), (4.47), and (4.49)–(4.53).
This completes the proof of (4.15) from (4.21), (4.24), (4.30), and (4.44). To prove (4.16), we need to show that
This follows immediately from (2.27) and Markov's inequality.
Finally, we will prove (4.17). By (4.15) and (4.16), we have
uniformly in for each . Thus, by Lemma 4.3,
This implies (4.17).
Lemma 4.5 (see [49]). Let be a zero-mean, square-integrable martingale array with differences , and let be an a.s. finite random variable. Suppose that , for all , and . Then where the r.v. has characteristic function .
5. Proof of Theorems
5.1. Proof of Theorem 3.1
Take , let be the boundary of , and let . Using (2.27) and Taylor expansion, for each , we have where for some .
Let and . Take and , and by (5.2) we obtain that By Lemma 4.1 and Chebyshev's inequality, we obtain Let , then , and using (4.17), we have By (5.3)–(5.5), we have By Lemma 4.3, as . Hence . Moreover, from (4.17), we have This implies that is concave on . Noting this fact and (5.6), we get On the event in the brackets, the continuous function has a unique maximum in over the compact neighborhood . Hence Moreover, there is a sequence such that satisfies Thus is a QML estimator for . It is clearly consistent, and Since is a QML estimator for , is a QML estimator for from (2.9).
To complete the proof, we will show that as If , then and . By (2.12) and (2.1), we have By (2.9), (2.11), and (5.12), we have From (5.12), it follows that From (2.2), By (5.13)–(5.15), we have By the law of large numbers, Since is a martingale difference sequence with By Chebyshev's inequality, we have By Markov's inequality and noting that , we obtain that Write Noting that , we have By (4.34) and condition (A3), we have By (4.34), condition (A3), and the Cauchy-Schwarz inequality, we have By (5.21)–(5.24), we obtain From (5.16), (5.17), (5.19), (5.20), and (5.25), we have . This completes the proof of Theorem 3.1.
5.2. Proof of Theorem 3.2
It is easy to see that and is nonsingular from Theorem 3.1. By Taylor's expansion, we have Since , also . By (4.15), we have where is a symmetric matrix with . By (5.26) and (5.27), we have Similar to (5.27), we have Here . By (5.28), (5.29), and noting that and , we obtain that From (2.7) and (2.8), we have From (2.29) and (4.18), we have By (5.30)–(5.32), we have
Let with , and . Then , and we will consider the limiting distribution of the following 2-vector: By the Cramér-Wold device, it will suffice to find the asymptotic distribution of the following random variable: where with . Note that , so the sums in (5.35) are partial sums of a martingale triangular array with respect to , and we will verify the Lindeberg conditions for their convergence to normality as follows: Noting that and , we obtain that Hence, for given , there is a set whose probability approaches 1 as on which . On this event, for any , Here as . This verifies the Lindeberg conditions, and by Lemma 4.5 This completes the proof of Theorem 3.2.
6. Numerical Examples
6.1. Simulation Example
We will simulate a regression model (1.1), where and the random errors where
By the ordinary least squares method, we obtain the least squares estimators , and . So we take , , and . Therefore, using the iterative computing method, we obtain Since , the QML estimators of and are given by
These values are close to the true values, so the method performs well in this example, especially in estimating the parameters and .
6.2. Empirical Example
We use the data studied by Fuller [14]. The data pertain to the consumption of spirits in the United Kingdom from 1870 to 1938. The dependent variable is the annual per capita consumption of spirits in the United Kingdom. The explanatory variables and are per capita income and price of spirits, respectively, both deflated by a general price index. All data are in logarithms. The model suggested by Prest can be written as where 1869 is the origin for , , and assuming that is a stationary time series.
Fuller [14] obtained the estimated generalized least squares equation where is a sequence of uncorrelated random variables.
Using our method, we obtain the following models: where is a sequence of uncorrelated random variables.
For the models (6.6), the residual mean square is , which is smaller than the value obtained from the models (6.5).
These examples suggest that the method is effective and valid. However, further discussion of how to fit the function is needed in order to find a good method for practical applications.
Acknowledgments
The author would like to thank an anonymous referee for useful comments and suggestions. The paper was supported by the Key Project of Chinese Ministry of Education (no. 209078) and Scientific Research Item of Department of Education, Hubei (no. D20092207).