Abstract

This paper studies a linear regression model whose errors are functional coefficient autoregressive processes. Firstly, the quasi-maximum likelihood (QML) estimators of the unknown parameters are given. Secondly, under general conditions, the asymptotic properties (existence, consistency, and asymptotic distributions) of the QML estimators are investigated. These results extend those of Maller (2003), White (1959), Brockwell and Davis (1987), and so on. Lastly, the validity and feasibility of the method are illustrated by a simulation example and a real-data example.

1. Introduction

Consider the following linear regression model:

$y_t = x_t^{\top}\beta + u_t$, $t = 1, 2, \ldots, n$, (1.1)

where the $y_t$'s are scalar response variables, the $x_t$'s are explanatory variables, $\beta$ is a $d$-dimensional unknown parameter, and the $u_t$'s are functional coefficient autoregressive processes given as

$u_t = f_t(\theta)\,u_{t-1} + e_t$, (1.2)

where the $e_t$'s are independent and identically distributed random errors with zero mean and finite variance $\sigma^2$, $\theta$ is a one-dimensional unknown parameter, and $f_t(\theta)$ is a real-valued function defined on a compact set $\Theta$ which contains the true value $\theta_0$ as an inner point and is a subset of $\mathbb{R}$. The values of $\beta$ and $\theta$ are unknown.
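To make the setup concrete, here is a minimal simulation sketch of (1.1)-(1.2) in Python. The coefficient function $f_t(\theta) = 0.9\,\theta\cos(\pi t/n)$, the Gaussian design, and all parameter values are illustrative assumptions rather than choices made in this paper; any coefficient function satisfying the conditions of Section 2 could be substituted.

    import numpy as np

    rng = np.random.default_rng(0)

    def simulate(n, beta, theta, f, sigma=1.0):
        # y_t = x_t' beta + u_t,  u_t = f_t(theta) u_{t-1} + e_t,  with u_0 = 0
        beta = np.asarray(beta, dtype=float)
        x = rng.normal(size=(n, len(beta)))    # illustrative Gaussian design
        e = rng.normal(scale=sigma, size=n)    # i.i.d. errors, mean 0, variance sigma^2
        u = np.empty(n)
        prev = 0.0                             # u_0 = 0
        for t in range(1, n + 1):
            prev = f(theta, t / n) * prev + e[t - 1]
            u[t - 1] = prev
        return x @ beta + u, x, u

    # hypothetical coefficient function; |f_t(theta)| <= 0.9 |theta| < 1 when |theta| < 1
    f = lambda theta, s: 0.9 * theta * np.cos(np.pi * s)
    y, x, u = simulate(500, beta=[1.0, -2.0], theta=0.5, f=f)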

Model (1.1)-(1.2) includes many special cases, such as the ordinary linear regression model (when $f_t(\theta) \equiv 0$; see [1–11]), the linear regression model with constant coefficient autoregressive errors (when $f_t(\theta) \equiv \theta$; see Maller [12], Pere [13], and Fuller [14]), time-dependent and functional coefficient autoregressive processes (when $\beta = 0$; see Kwoun and Yajima [15]), constant coefficient autoregressive processes (when $\beta = 0$ and $f_t(\theta) \equiv \theta$; see White [16, 17], Hamilton [18], Brockwell and Davis [19], and Abadir and Lucas [20]), time-dependent or time-varying autoregressive processes (when $\beta = 0$ and the coefficient depends only on $t$; see Carsoule and Franses [21], Azrak and Mélard [22], and Dahlhaus [23]), and so forth. In the sequel, we always assume that for some .

Regression analysis is one of the most mature and widely applied branches of statistics, and linear regression is one of its most widely used techniques, with applications in almost every field, including engineering, economics, the physical sciences, management, the life and biological sciences, and the social sciences. The linear regression model is therefore among the most important and popular models in the statistical literature, and many statisticians have studied the estimation of its coefficients. For the ordinary linear regression model (in which the errors are independent and identically distributed random variables), Bai and Guo [1], Chen [2], Anderson and Taylor [3], Drygas [4], González-Rodríguez et al. [5], Hampel et al. [6], He [7], Cui [8], Durbin [9], Hoerl and Kennard [10], Li and Yang [11], and Zhang et al. [24] used various estimation methods (least squares estimation, robust estimation, biased estimation, and Bayes estimation) to obtain estimators of the unknown parameters in (1.1) and discussed large- or small-sample properties of these estimators.

However, the independence assumption for the errors is not always appropriate in applications, especially for sequentially collected economic and physical data, whose errors often exhibit evident serial dependence. Recently, linear regression with serially correlated errors has attracted increasing attention from statisticians. One case of considerable interest is that in which the errors are autoregressive processes; the asymptotic theory of estimation in this setting was developed by Hannan and Kavalieris [25]. Fox and Taqqu [26] established asymptotic normality in the case of long-memory stationary Gaussian errors. Giraitis and Surgailis [27] extended this result to non-Gaussian linear sequences. The asymptotic distribution of the maximum likelihood estimator was studied by Giraitis and Koul [28] and Koul [29] when the errors are nonlinear instantaneous functions of a Gaussian long-memory sequence. Koul and Surgailis [30] established the asymptotic normality of the Whittle estimator in linear regression models with non-Gaussian long-memory moving average errors. Koul and Mukherjee [31] investigated the linear model when the errors are Gaussian, or a function of Gaussian random variables, and strictly stationary and long-range dependent. Shiohama and Taniguchi [32] estimated the regression parameters in a linear regression model with autoregressive errors.

In addition, the (constant, functional, or random coefficient) autoregressive model itself has received much attention and has been applied in many fields, such as economics, physics, geography, geology, biology, and agriculture. Fan and Yao [33], Berk [34], Hannan and Kavalieris [35], Goldenshluger and Zeevi [36], Liebscher [37], An et al. [38], Elsebach [39], Carsoule and Franses [21], Baran et al. [40], Distaso [41], and Harvill and Ray [42] used various estimation methods (the least squares method, the Yule-Walker method, the method of stochastic approximation, and robust estimation methods) to obtain estimators and discuss their asymptotic properties, or investigated hypothesis tests.

This paper discusses the model (1.1)-(1.2), covering both stationary and explosive processes. The organization of the paper is as follows. In Section 2, estimators of the unknown parameters are obtained by the quasi-maximum likelihood method. In Section 3, under general conditions, the existence, consistency, and asymptotic normality of the quasi-maximum likelihood estimators are established. Some preliminary lemmas are presented in Section 4, the main proofs in Section 5, and some examples in Section 6.

2. Estimation Method

Write the “true” model as

$y_t = x_t^{\top}\beta_0 + u_t$, (2.1)

$u_t = f_t(\theta_0)\,u_{t-1} + e_t$, $t = 1, 2, \ldots, n$, (2.2)

where the $e_t$'s are i.i.d. errors with zero mean and finite variance $\sigma_0^2$. Define $u_0 = 0$, and by (2.2) we have

$u_t = \sum_{j=1}^{t} \big(\prod_{i=j+1}^{t} f_i(\theta_0)\big)\,e_j$. (2.3)

Thus $u_t$ is measurable with respect to the $\sigma$-field $\mathcal{F}_t$ generated by $e_1, \ldots, e_t$, and $E(u_t) = 0$.

Assume at first that the $e_t$'s are i.i.d. $N(0, \sigma^2)$. Using arguments similar to those of Fuller [14] or Maller [12], we obtain the log-likelihood of $y_1, \ldots, y_n$ conditional on the initial value:

$\log L_n(\beta, \theta, \sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{t=1}^{n} e_t^2(\beta, \theta)$, (2.5)

where $e_t(\beta, \theta) = (y_t - x_t^{\top}\beta) - f_t(\theta)(y_{t-1} - x_{t-1}^{\top}\beta)$. At this stage we drop the normality assumption, but still maximize (2.5) to obtain QML estimators, denoted by $(\hat\beta_n, \hat\theta_n, \hat\sigma_n^2)$ (when they exist). Setting the derivatives of (2.5) with respect to $\beta$, $\theta$, and $\sigma^2$ to zero, the QML estimators satisfy the following estimation equations:

$\sum_{t=1}^{n} e_t(\hat\beta_n, \hat\theta_n)\big(x_t - f_t(\hat\theta_n)x_{t-1}\big) = 0$, $\sum_{t=1}^{n} e_t(\hat\beta_n, \hat\theta_n)\,\dot f_t(\hat\theta_n)\big(y_{t-1} - x_{t-1}^{\top}\hat\beta_n\big) = 0$, $\hat\sigma_n^2 = \frac{1}{n}\sum_{t=1}^{n} e_t^2(\hat\beta_n, \hat\theta_n)$,

where $\dot f_t$ denotes the derivative of $f_t$ with respect to $\theta$.
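The objective (2.5) is straightforward to code. The sketch below assumes the innovation form $e_t(\beta, \theta)$ displayed above with $u_0 = 0$ and reuses the coefficient function f and data from the earlier simulation sketch; it illustrates the QML objective and the profiled variance, not the paper's exact implementation.

    import numpy as np

    def innovations(beta, theta, y, x, f):
        # e_t(beta, theta) = (y_t - x_t' beta) - f_t(theta) (y_{t-1} - x_{t-1}' beta)
        n = len(y)
        u = y - x @ beta
        ulag = np.concatenate(([0.0], u[:-1]))      # u_0 = 0
        ft = f(theta, np.arange(1, n + 1) / n)
        return u - ft * ulag

    def sigma2_hat(beta, theta, y, x, f):
        # the value of sigma^2 maximizing (2.5) for fixed (beta, theta)
        return np.mean(innovations(beta, theta, y, x, f) ** 2)

    def loglik(beta, theta, sigma2, y, x, f):
        # conditional Gaussian log-likelihood (2.5)
        e = innovations(beta, theta, y, x, f)
        return -0.5 * len(y) * np.log(2 * np.pi * sigma2) - 0.5 * np.sum(e ** 2) / sigma2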

Remark 2.1. If $f_t(\theta) \equiv \theta$, then the above equations become the same as Maller's [12]. Therefore, our QML estimators extend those of Maller [12].
To calculate the values of the QML estimators, we may use the grid search method, the steepest ascent method, the Newton-Raphson method, or a modified Newton-Raphson method. For the calculations in Section 6, we adopt the most popular modified Newton-Raphson method, proposed by Davidon, Fletcher, and Powell (see Hamilton [18]).
Let the vector $\lambda^{(m)}$ denote the estimate of $\lambda = (\beta^{\top}, \theta)^{\top}$ calculated at the $m$th iteration, and let $A^{(m)}$ denote the corresponding estimate of the inverse of the Hessian. The new estimate is given by $\lambda^{(m+1)} = \lambda^{(m)} + s\,A^{(m)} g^{(m)}$ for the positive scalar $s$ that maximizes the log-likelihood along the search direction, where $g^{(m)}$ is the gradient vector of the log-likelihood at $\lambda^{(m)}$ and $A^{(m)}$ is a symmetric matrix.
Once $\lambda^{(m+1)}$ and the gradient $g^{(m+1)}$ at $\lambda^{(m+1)}$ have been calculated, a new estimate $A^{(m+1)}$ is found from
$A^{(m+1)} = A^{(m)} + \frac{\Delta\lambda\,\Delta\lambda^{\top}}{\Delta\lambda^{\top}\Delta g} - \frac{A^{(m)}\,\Delta g\,\Delta g^{\top} A^{(m)}}{\Delta g^{\top} A^{(m)}\,\Delta g}$,
where $\Delta\lambda = \lambda^{(m+1)} - \lambda^{(m)}$ and $\Delta g = g^{(m)} - g^{(m+1)}$.
It is well known that the least squares estimators in the ordinary linear regression model are very good estimators, so a natural recursive procedure is to start the iteration at $\lambda^{(0)} = (\hat\beta_{LS}^{\top}, \hat\theta_{LS})^{\top}$, where $\hat\beta_{LS}$ and $\hat\theta_{LS}$ are least squares estimators of $\beta$ and $\theta$, respectively, and to take $A^{(0)}$ to be the identity matrix. Iterations are stopped when some termination criterion is reached, for example, when $\|\lambda^{(m+1)} - \lambda^{(m)}\| < \epsilon$ for some prechosen small number $\epsilon > 0$.
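A sketch of the whole iteration follows, written in the standard quasi-Newton form that minimizes the negative of (2.5): a central-difference gradient, a crude backtracking choice of the step $s$, the DFP update of the inverse-Hessian approximation, least squares starting values, $A^{(0)} = I$, and the termination criterion above. It reuses loglik and sigma2_hat from the previous sketch and is an illustration, not the paper's code.

    import numpy as np

    def num_grad(fun, lam, h=1e-6):
        # central-difference gradient of a scalar function
        g = np.zeros_like(lam)
        for i in range(len(lam)):
            step = np.zeros_like(lam)
            step[i] = h
            g[i] = (fun(lam + step) - fun(lam - step)) / (2 * h)
        return g

    def dfp_minimize(fun, lam0, tol=1e-6, max_iter=200):
        lam = np.asarray(lam0, dtype=float)
        A = np.eye(len(lam))                       # A^(0) = identity
        g = num_grad(fun, lam)
        for _ in range(max_iter):
            d = -A @ g                             # quasi-Newton search direction
            s = 1.0                                # crude backtracking line search
            while s > 1e-12 and fun(lam + s * d) >= fun(lam):
                s *= 0.5
            lam_new = lam + s * d
            g_new = num_grad(fun, lam_new)
            dl, dg = lam_new - lam, g_new - g
            dld, dAd = dl @ dg, dg @ A @ dg
            if abs(dld) > 1e-12 and abs(dAd) > 1e-12:
                # DFP update of the inverse-Hessian approximation
                A = A + np.outer(dl, dl) / dld - (A @ np.outer(dg, dg) @ A) / dAd
            if np.linalg.norm(dl) < tol:           # termination criterion
                return lam_new
            lam, g = lam_new, g_new
        return lam

    # start at the least squares estimate of beta (theta started at 0 for simplicity)
    beta0, *_ = np.linalg.lstsq(x, y, rcond=None)
    nll = lambda lam: -loglik(lam[:-1], lam[-1],
                              sigma2_hat(lam[:-1], lam[-1], y, x, f), y, x, f)
    lam_hat = dfp_minimize(nll, np.concatenate([beta0, [0.0]]))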
Up to this point, we have obtained the values of the QML estimators when the function $f$ is known. However, $f$ is seldom known in practice, so we have to estimate it. By (2.12) and (1.2), we obtain the residual-based relation on which the fit is based. From the resulting dataset, we may obtain an estimate of $f$ by some smoothing method (see Simonoff [43], Fan and Yao [33], Green and Silverman [44], and Fan and Gijbels [45]).
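As one concrete possibility among the smoothing methods cited above, a kernel-weighted least squares fit of $u_t$ on $u_{t-1}$ around each rescaled time point can be formed from the fitted residuals. The sketch assumes, for illustration only, that the coefficient varies smoothly with $t/n$; the Gaussian kernel and the bandwidth are arbitrary choices, not recommendations from the paper.

    import numpy as np

    def fhat(s, uhat, h=0.1):
        # kernel-weighted least squares fit of u_t on u_{t-1} near rescaled time s
        n = len(uhat)
        tt = np.arange(2, n + 1) / n               # time points of the pairs (u_t, u_{t-1})
        w = np.exp(-0.5 * ((tt - s) / h) ** 2)     # Gaussian kernel weights
        return np.sum(w * uhat[1:] * uhat[:-1]) / np.sum(w * uhat[:-1] ** 2)

    uhat = y - x @ lam_hat[:-1]                    # fitted residuals stand in for u_t
    grid = np.linspace(0.05, 1.0, 20)
    curve = [fhat(s, uhat) for s in grid]          # pointwise estimate of the coefficient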
To obtain our results, the following conditions are sufficient.
(A1) $X_n = \sum_{t=1}^{n} x_t x_t^{\top}$ is positive definite for sufficiently large $n$ and $\lambda_{\max}(X_n^{-1}) \to 0$ as $n \to \infty$, where $\lambda_{\max}(\cdot)$ denotes the maximum in absolute value of the eigenvalues of a symmetric matrix.
(A2) There is a constant $c$ with $0 < c < 1$ such that $|f_t(\theta)| \le c$ for any $t$ and $\theta \in \Theta$.
(A3) The derivatives $\dot f_t(\theta)$ and $\ddot f_t(\theta)$ exist and are bounded for any $t$ and $\theta \in \Theta$.

Remark 2.2. Maller [12] applied condition (A1), and Kwoun and Yajima [15] used conditions (A2) and (A3); thus our conditions are quite general. (A1) delineates the class of designs for which our results hold in the sense required; it is discussed further by Maller [12]. Kwoun and Yajima [15] call $\{u_t\}$ stable if its variance is bounded, so (A2) implies that $\{u_t\}$ is stable. However, $\{u_t\}$ is not stationary: in fact, by (2.3), we obtain $E(u_t^2) = \sigma_0^2 \sum_{j=1}^{t} \prod_{i=j+1}^{t} f_i^2(\theta_0)$, which depends on $t$.
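The stable-but-nonstationary point of Remark 2.2 is easy to see numerically: (2.2) implies the variance recursion $\mathrm{Var}(u_t) = f_t^2(\theta_0)\,\mathrm{Var}(u_{t-1}) + \sigma_0^2$, whose solution stays bounded under (A2) yet keeps changing with $t$. A small sketch, reusing the assumed coefficient function f from the simulation sketch in Section 1:

    import numpy as np

    n, sigma2, theta = 200, 1.0, 0.5
    v = np.empty(n)
    prev = 0.0                                     # Var(u_0) = 0
    for t in range(1, n + 1):
        prev = f(theta, t / n) ** 2 * prev + sigma2
        v[t - 1] = prev
    print(v.min(), v.max())                        # bounded, but clearly varying with t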
For ease of exposition, we now introduce some notation that will be used later in the paper.
Define the $(d+1)$-vector $\lambda = (\beta^{\top}, \theta)^{\top}$. By (2.7) and (2.8), differentiating (2.5) yields the score vector and the matrix of second derivatives of the log-likelihood, in which the * indicates that an element is filled in by symmetry; these quantities are used throughout Sections 3–5.

3. Statement of Main Results

Theorem 3.1. Suppose that conditions (A1)–(A3) hold. Then there is a sequence such that, for each , as , the probability Furthermore, where, for each and ; define neighborhoods

Theorem 3.2. Suppose that conditions (A1)–(A3) hold. Then

Remark 3.3. For , our results still hold.
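A quick Monte Carlo sketch makes the content of Theorems 3.1 and 3.2 tangible: under the illustrative model simulated in Section 1, the sampling spread of $\hat\theta_n$ should shrink at the root-$n$ rate. The sketch reuses the helpers from the earlier sketches and checks the rate only, not the exact limiting covariance.

    import numpy as np

    def theta_spread(n, reps=200):
        est = np.empty(reps)
        for r in range(reps):
            y, x, _ = simulate(n, beta=[1.0, -2.0], theta=0.5, f=f)
            beta0, *_ = np.linalg.lstsq(x, y, rcond=None)
            nll = lambda lam: -loglik(lam[:-1], lam[-1],
                                      sigma2_hat(lam[:-1], lam[-1], y, x, f), y, x, f)
            est[r] = dfp_minimize(nll, np.concatenate([beta0, [0.0]]))[-1]
        return est.std()

    # root-n rate: theta_spread(100) should be roughly twice theta_spread(400)
    print(theta_spread(100), theta_spread(400))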
In the following, we will investigate some special cases of the model (1.1)-(1.2). Although the following results are directly obtained from Theorems 3.1 and 3.2, we state them explicitly in order to facilitate comparison with the corresponding results in the literature.

Corollary 3.4. Let . If condition (A1) holds, then, for , (3.1), (3.2), and (3.4) hold.

Remark 3.5. These results are the same as the corresponding results of Maller [12].

Corollary 3.6. If and , then, for , where

Remark 3.7. These estimators are the same as the least squares estimators (see White [16]). For , the processes are explosive; in this case, the corollary coincides with the results of White [17]. When , notice that and , and by Corollary 3.6 we obtain This result was discussed by many authors, such as Fujikoshi and Ochi [46] and Brockwell and Davis [19].

Corollary 3.8. Let . If conditions (A2) and (A3) hold, then where

Corollary 3.9. Let . If condition (A1) holds, then

Remark 3.10. Let . Note that ; then the asymptotic normality of the (quasi-)maximum likelihood or least squares estimator in the ordinary linear regression model follows easily from the corollary.

4. Some Lemmas

To prove Theorems 3.1 and 3.2, we first introduce the following lemmas.

Lemma 4.1. The matrix is positive definite for large enough with and .

Proof. It is easy to show that the matrix is positive definite for large enough . By (2.8), we have Note that and are independent of each other; thus by (2.7) and , we have Hence, from (4.1) and (4.2), By (2.8) and (2.17), we have Note that is a martingale difference sequence with so By (2.7) and (2.8) and noting that and are independent of each other, we have From (4.4)–(4.7), it follows that .

Lemma 4.2. If condition (A1) holds, then, for any , the matrix is positive definite for large enough , and

Proof. Let and be the smallest and largest roots of . Then by Rao [47, Exercise 22.1], for unit vectors . Thus by (2.24), there are some and such that implies that By (4.10), we have By Rao [47, page 60] and (2.23), we have From (4.12) and ,

Lemma 4.3 (see [48]). Let be a symmetric random matrix with eigenvalues . Then

Lemma 4.4. For each , and also where

Proof. Let be a square root decomposition of . Then Let . Then From (2.28), (2.29), and (4.18), Let
In the first step, we will show that, for each , In fact, note that where
Let , and let . By the Cauchy–Schwarz inequality, Lemma 4.2, condition (A3), and noting that , we have that Here for some . Similarly to the proof of , we easily obtain that By the Cauchy–Schwarz inequality, Lemma 4.2, condition (A3), and noting that , we have that Hence, (4.24) follows from (4.25)–(4.29).
In the second step, we will show that Note that Consider where For and each , we have By (4.34) and Lemma 4.2, we have Using the Cauchy–Schwarz inequality, condition (A3), and (4.35), we obtain Let Then from Lemma 4.2, By condition (A2) and (4.38), we have Thus by the Chebyshev inequality and (4.39), Using a similar argument to that for , we obtain that Using a similar argument to that for , we obtain that By the Cauchy–Schwarz inequality, (4.35), and (4.27), we get Thus (4.30) follows immediately from (4.32), (4.36), and (4.40)–(4.43).
In the third step, we will show that Let Then By (4.34), it is easy to show that From condition (A3), (2.30), and (4.23), we obtain that Hence, by the Markov inequality, Using an argument similar to that for (4.40), we easily obtain that By the Markov inequality and noting that we have that Using an argument similar to that for (4.6), we easily obtain that Hence, (4.44) follows immediately from (4.46), (4.47), and (4.49)–(4.53).
This completes the proof of (4.15) from (4.21), (4.24), (4.30), and (4.44). To prove (4.16), we need to show that This follows immediately from (2.27) and the Markov inequality.
Finally, we will prove (4.17). By (4.15) and (4.16), we have uniformly in for each . Thus, by Lemma 4.3, This implies (4.17).

Lemma 4.5 (see [49]). Let be a zero-mean, square-integrable martingale array with differences , and let be an a.s. finite random variable. Suppose that , for all , and . Then where the r.v. has characteristic function .

5. Proof of Theorems

5.1. Proof of Theorem 3.1

Take , let be the boundary of , and let . Using (2.27) and Taylor expansion, for each , we have where for some .

Let and . Take and , and by (5.2) we obtain that By Lemma 4.1 and the Chebyshev inequality, we obtain Let ; then , and using (4.17), we have By (5.3)–(5.5), we have By Lemma 4.3, as . Hence . Moreover, from (4.17), we have This implies that is concave on . Noting this fact and (5.6), we get On the event in the brackets, the continuous function has a unique maximum in over the compact neighborhood . Hence Moreover, there is a sequence such that satisfies Thus is a QML estimator for . It is clearly consistent, and Since is a QML estimator for , is a QML estimator for from (2.9).

To complete the proof, we will show that as If , then and . By (2.12) and (2.1), we have By (2.9), (2.11), and (5.12), we have From (5.12), it follows that From (2.2), By (5.13)–(5.15), we have By the law of large numbers, Since is a martingale difference sequence with By the Chebyshev inequality, we have By the Markov inequality and noting that , we obtain that Write Noting that , we have By (4.34) and condition (A3), we have By (4.34), condition (A3), and the Cauchy–Schwarz inequality, we have By (5.21)–(5.24), we obtain From (5.16), (5.17), (5.19), (5.20), and (5.25), we have . This completes the proof of Theorem 3.1.

5.2. Proof of Theorem 3.2

It is easy to see from Theorem 3.1 that and is nonsingular. By Taylor expansion, we have Since , also . By (4.15), we have where is a symmetric matrix with . By (5.26) and (5.27), we have Similar to (5.27), we have Here . By (5.28) and (5.29), and noting that and , we obtain that From (2.7) and (2.8), we have From (2.29) and (4.18), we have By (5.30)–(5.32), we have

Let with , and . Then , and we will consider the limiting distribution of the following 2-vector: By the Cramér–Wold device, it suffices to find the asymptotic distribution of the following random variable: where with . Note that , so the sums in (5.35) are partial sums of a martingale triangular array with respect to , and we verify the Lindeberg conditions for their convergence to normality as follows: Noting that and , we obtain that Hence, for given , there is a set whose probability approaches 1 as on which . On this event, for any , Here as . This verifies the Lindeberg conditions, and the conclusion follows from Lemma 4.5. Thus we complete the proof of Theorem 3.2.

6. Numerical Examples

6.1. Simulation Example

We will simulate a regression model (1.1), where and the random errors where

By the ordinary least squares method, we obtain the least squares estimators , and . So we take , , and as starting values. Then, using the iterative method described in Section 2, we obtain Since , the QML estimators of and are given by

These values closely approximate their true values, so our method is successful, especially in estimating the parameters and .
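For readers who wish to reproduce an experiment of this kind end to end, the following sketch strings the earlier helpers together: simulate, initialize at least squares, run the DFP iteration, and recover the variance estimate. The true values below are stand-ins, not the values used in the example above.

    import numpy as np

    beta_true, theta_true = np.array([2.0, 1.0]), 0.5    # hypothetical true values
    y, x, _ = simulate(1000, beta_true, theta_true, f)

    beta_ols, *_ = np.linalg.lstsq(x, y, rcond=None)     # step 1: least squares start
    nll = lambda lam: -loglik(lam[:-1], lam[-1],
                              sigma2_hat(lam[:-1], lam[-1], y, x, f), y, x, f)
    lam = dfp_minimize(nll, np.concatenate([beta_ols, [0.0]]))   # step 2: DFP iteration
    beta_qml, theta_qml = lam[:-1], lam[-1]
    print(beta_qml, theta_qml, sigma2_hat(beta_qml, theta_qml, y, x, f))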

6.2. Empirical Example

We will use the data studied by Fuller in [14]. The data pertain to the annual consumption of spirits in the United Kingdom from 1870 to 1938. The dependent variable is the annual per capita consumption of spirits in the United Kingdom. The explanatory variables and are per capita income and the price of spirits, respectively, both deflated by a general price index. All data are in logarithms. The model suggested by Prest can be written as where 1869 is the origin for , , and is assumed to be a stationary time series.

Fuller [14] obtained the estimated generalized least squares equation where is a sequence of uncorrelated random variables.

Using our method, we obtain the following models: where is a sequence of uncorrelated random variables.

For the model (6.6), the residual mean square is , which is smaller than that calculated from the model (6.5).
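For a constant-coefficient benchmark like Fuller's fit (6.5), linear regression with AR(1) errors is available off the shelf, for example via GLSAR in the Python package statsmodels. The sketch below uses randomly generated stand-in series, since the spirits data are not reproduced here, and assumes a specification with linear and quadratic trend terms; it shows only how such a benchmark and its residual mean square would be computed.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n = 69                                    # 1870-1938: 69 annual observations
    t = np.arange(1, n + 1)                   # time index with origin 1869
    income = rng.normal(size=n)               # stand-in for log per capita income
    price = rng.normal(size=n)                # stand-in for log deflated price
    y = 2.0 + 0.7 * income - 0.8 * price - 0.01 * t + rng.normal(size=n)

    X = sm.add_constant(np.column_stack([income, price, t, t ** 2]))
    model = sm.GLSAR(y, X, rho=1)             # linear regression with AR(1) errors
    results = model.iterative_fit(maxiter=10)
    print(results.params, model.rho)          # regression and AR coefficients
    print(np.mean(results.resid ** 2))        # residual mean square, as compared above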

From the above examples, it can be seen that our method is feasible and valid. However, further study of how to fit the function $f$ is needed so that a good method can be found for practical applications.

Acknowledgments

The author would like to thank an anonymous referee for useful comments and suggestions. This work was supported by the Key Project of the Chinese Ministry of Education (no. 209078) and a Scientific Research Project of the Department of Education of Hubei Province (no. D20092207).