Wavelet-M-Estimation for Time-Varying Coefficient Time Series Models
This paper proposes wavelet-M-estimation for time-varying coefficient time series models, using a robust wavelet technique that adapts to local features of the time-varying coefficients and does not require the unknown coefficients to be smooth. The wavelet-M-estimator has the desired asymptotic properties and can be used to estimate conditional quantiles and to robustify the usual mean regression. Under mild assumptions, the Bahadur representation and the asymptotic normality of the wavelet-M-estimator are established.
1. Introduction
The analysis of nonlinear and nonstationary time series, particularly those with a time trend, has been very popular over the last two decades, because most time series data from economics and finance are nonlinear, nonstationary, or trending. Several nonlinear and nonstationary parametric, semiparametric, and nonparametric time series models have been proposed in the econometrics and statistics literature; see, for example, [1–6] and the references therein. One of the most attractive is the time-varying coefficient time series model, which is formulated as follows:
$$Y_i = X_i^{\top}\beta(t_i) + \varepsilon_i, \quad i = 1, \ldots, n, \tag{1}$$
where $Y_i$ is the response, $\beta(\cdot) = (\beta_1(\cdot), \ldots, \beta_p(\cdot))^{\top}$ is a $p$-dimensional vector of unspecified coefficient functions defined on $[0, 1]$, $X_i$ is a $p$-dimensional random vector, and $\varepsilon_i$ is the random error.
Many smoothers have been proposed to estimate the time-varying coefficient in model (1), and the large-sample theory of the resulting estimators has been studied. Robinson developed the Nadaraya–Watson method and showed the consistency and asymptotic normality of the local constant estimator under the assumptions that the covariate series is stationary and $\alpha$-mixing and that the errors are i.i.d. and independent of the covariates. Cai proposed the local linear approach and established asymptotic properties of the proposed estimators under $\alpha$-mixing conditions and without specifying the error distribution. Hoover et al. gave a smoothing spline method and a locally weighted polynomial method for longitudinal data and presented their asymptotic properties. Li et al. and Fan et al. made statistical inference for the partially time-varying coefficient model and the partially time-varying coefficient errors-in-variables model, respectively. These estimators are all based on local least squares, which is efficient for Gaussian errors but may perform poorly in the presence of extreme outliers. In addition, these methods all rely on the important assumption that the coefficient function is highly smooth. In reality, this assumption may not be satisfied. In some practical areas, such as signal and image processing, objects are frequently inhomogeneous. More robust estimation methods are required.
In this paper, we propose an M-type regression based on a wavelet technique, called wavelet-M-estimation (WME), for the time-varying coefficient time series model (1). There is a considerable literature devoted to M-estimation for nonparametric regression models. Fan et al. obtained asymptotic normality of the M-estimator for the local linear fit under independent observations. Hong established a Bahadur representation for the local polynomial estimates in nonparametric M-regression under i.i.d. random errors. Jiang and Mack and Cai and Ould-Saïd considered the local polynomial M-estimator and the local linear M-estimator for dependent observations and derived asymptotic theory for the proposed estimators. For varying coefficient models, Tang and Cheng showed asymptotic normality of local M-estimators for longitudinal data. However, the above works require smoothness of the function being estimated, for example, a continuous first or second derivative. With wavelets, such assumptions are relaxed considerably. Because wavelet bases can adapt to local features of curves in both the time and frequency domains, wavelets provide a technique for analyzing functions with discontinuities or sharp spikes. It is therefore natural to expect wavelet estimators to outperform local kernel methods in many cases. Great achievements have been made for wavelets in nonparametric models, for example, Antoniadis et al.; Donoho and Johnstone; Hall and Patil; Härdle et al.; Vidakovic; Zhou and You; Lu and Li; and Zhou et al. To the best of our knowledge, however, M-type estimation based on a wavelet technique has not been developed for time-varying coefficient models. We use a general formulation to treat mean regression, median regression, quantile regression, and robust mean regression in one setting by WME.
The article is organized as follows. Section 2 describes the wavelet analysis, $\alpha$-mixing sequences, and wavelet-M-estimation for time-varying coefficients. Section 3 presents the Bahadur representation and asymptotic normality of the WME for stationary $\alpha$-mixing time series and states some applications of the main results. Some technical lemmas and the proofs of the main results are given in Section 4.
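As a central notion in wavelet analysis, multiresolution analysis (MRA) plays an important role in constructing a wavelet basis. An MRA is a sequence of closed subspaces $V_j$, $j \in \mathbb{Z}$, of the square integrable function space $L^2(\mathbb{R})$ satisfying the following properties:
(i) $V_j \subset V_{j+1}$ for $j \in \mathbb{Z}$, where $\mathbb{Z}$ denotes the integer set
(ii) $\bigcap_{j} V_j = \{0\}$ and $\overline{\bigcup_{j} V_j} = L^2(\mathbb{R})$, where $\overline{A}$ is the closure of a set $A$
(iii) The $V_j$-spaces are self-similar: $f(x) \in V_j$ iff $f(2x) \in V_{j+1}$ for $j \in \mathbb{Z}$
(iv) There exists a scaling function $\phi \in V_0$ whose integer-translates span the space $V_0$, that is, $V_0 = \{ f \in L^2(\mathbb{R}) : f(x) = \sum_{k} c_k \phi(x - k) \}$, and for which the set $\{\phi(\cdot - k), k \in \mathbb{Z}\}$ is an orthonormal basis of $V_0$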
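By dilation and translation of $\phi$, we have $\phi_{jk}(x) = 2^{j/2}\phi(2^{j}x - k)$, $j, k \in \mathbb{Z}$. From (iii) and (iv), one gets that $\{\phi_{jk}, k \in \mathbb{Z}\}$ is an orthonormal basis of $V_j$. According to the Moore–Aronszajn theorem, $V_0$ is a reproducing kernel Hilbert space with kernel
$$E_0(t, s) = \sum_{k \in \mathbb{Z}} \phi(t - k)\,\phi(s - k).$$
For any function $f \in V_0$,
$$f(t) = \int_{\mathbb{R}} E_0(t, s) f(s)\, ds.$$
Denote the kernel of $V_m$ as $E_m(t, s)$; then $E_m(t, s) = 2^{m} E_0(2^{m}t, 2^{m}s)$. For more details, we refer to Vidakovic.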
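To make the wavelet kernel concrete, the following minimal sketch evaluates a kernel of the form $E_m(t, s) = 2^m \sum_k \phi(2^m t - k)\phi(2^m s - k)$. It assumes the Haar scaling function (the indicator of $[0, 1)$) purely for illustration; the function names `haar_phi` and `wavelet_kernel` are hypothetical, and the Haar function is not claimed to satisfy the regularity conditions imposed later on the scaling function.

```python
import numpy as np

def haar_phi(x):
    """Haar scaling function (an illustrative assumption): indicator of [0, 1)."""
    x = np.asarray(x, dtype=float)
    return ((x >= 0.0) & (x < 1.0)).astype(float)

def wavelet_kernel(t, s, m, phi=haar_phi, k_range=None):
    """Evaluate E_m(t, s) = 2^m * sum_k phi(2^m t - k) * phi(2^m s - k).

    t is a scalar in [0, 1]; s may be a scalar or an array.
    """
    if k_range is None:
        # Translates that can overlap [0, 1] when phi is supported on [0, 1).
        k_range = range(0, 2 ** m + 1)
    total = 0.0
    for k in k_range:
        total = total + phi(2 ** m * t - k) * phi(2 ** m * s - k)
    return 2 ** m * total

# For the Haar case the kernel equals 2^m when t and s share a dyadic cell, else 0.
same_cell = wavelet_kernel(0.3, 0.35, 2)
different_cell = wavelet_kernel(0.3, 0.6, 2)
```

With the Haar choice the kernel acts like a histogram window whose width $2^{-m}$ plays the role of a bandwidth, which is why $m$ is treated as a tuning parameter.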
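This motivates us to define a wavelet-M-estimator of $\beta(t)$ by
$$\hat{\beta}(t) = \arg\min_{\beta \in \mathbb{R}^{p}} \sum_{i=1}^{n} \rho\left(Y_i - X_i^{\top}\beta\right) \int_{A_i} E_m(t, s)\, ds, \tag{4}$$
where $\rho(\cdot)$ is a given convex function, and $A_i = [s_{i-1}, s_i)$ are intervals that partition $[0, 1]$, so that $t_i \in A_i$. One way of defining the intervals is by taking $s_0 = 0$, $s_n = 1$, and $s_i = (t_i + t_{i+1})/2$, $i = 1, \ldots, n-1$. As an alternative to (4), the WME of $\beta(t)$ can be defined as a solution $\hat{\beta}(t)$ of
$$\sum_{i=1}^{n} \psi\left(Y_i - X_i^{\top}\hat{\beta}(t)\right) X_i \int_{A_i} E_m(t, s)\, ds = \mathbf{0}, \tag{5}$$
where $\mathbf{0}$ is a $p$-dimensional zero vector. Equation (5) is obtained naturally by taking the partial derivatives of (4) with respect to $\beta$ when $\rho$ is continuously differentiable and setting them to zero, i.e., $\psi = \rho'$. In this paper, we apply a suitably chosen function $\psi$ in (5), which covers many interesting cases such as least-squares estimation, least absolute distance estimation, and quantile regression. See the monographs of Huber and Ronchetti and of Koenker for more details about the robustness of M-estimation and about quantile regression, respectively.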
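The following is a minimal sketch of how an estimating equation of this kind could be solved numerically, under several assumptions that are not taken from the paper: the Haar scaling function is used so that the interval weights have a closed form, the loss is Huber's with a fixed tuning constant and no scale estimate, and the equation is solved by iteratively reweighted least squares. The names `haar_weights`, `huber_psi`, and `wme` are hypothetical.

```python
import numpy as np

def haar_weights(t, design, m):
    """Weights w_i(t) = integral over A_i of E_m(t, s) ds for the Haar kernel (assumed)."""
    design = np.asarray(design, dtype=float)
    # Interval endpoints: s_0 = 0, s_n = 1, s_i = (t_i + t_{i+1}) / 2.
    s = np.concatenate(([0.0], 0.5 * (design[:-1] + design[1:]), [1.0]))
    k = min(int(np.floor(2 ** m * t)), 2 ** m - 1)  # dyadic cell containing t
    lo, hi = k / 2 ** m, (k + 1) / 2 ** m
    overlap = np.clip(np.minimum(s[1:], hi) - np.maximum(s[:-1], lo), 0.0, None)
    return 2 ** m * overlap

def huber_psi(r, c=1.345):
    """Derivative of Huber's loss; c is an illustrative tuning constant."""
    return np.clip(r, -c, c)

def wme(t, y, X, design, m, psi=huber_psi, n_iter=100, tol=1e-10):
    """Solve sum_i psi(y_i - x_i' b) x_i w_i(t) = 0 by iteratively reweighted least squares."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    w = haar_weights(t, design, m)
    keep = w > 0
    y, X, w = y[keep], X[keep], w[keep]
    # Start from the weighted least-squares fit.
    sw = np.sqrt(w)
    beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    for _ in range(n_iter):
        r = y - X @ beta
        small = np.abs(r) < 1e-12
        safe_r = np.where(small, 1.0, r)
        # IRLS multipliers psi(r) / r, set to their limit 1 at r = 0 (valid for Huber's psi).
        u = np.where(small, 1.0, psi(safe_r) / safe_r)
        sw = np.sqrt(w * u)
        new_beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta
```

Calling `wme(t0, y, X, design, m)` on a grid of `t0` values traces out an estimated coefficient curve. Because the Haar weights vanish outside the dyadic cell containing `t0`, a jump in the coefficient function only affects estimates in the cell where the jump occurs.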
Before stating the main results, we give the definition of $\alpha$-mixing dependence, which is needed to establish our asymptotic theory for trending time-varying coefficient time series models. Throughout, we assume that $\{(X_i, \varepsilon_i)\}$ is a stationary $\alpha$-mixing sequence. Recall that a sequence $\{\xi_i, i \in \mathbb{Z}\}$ is said to be $\alpha$-mixing (or strong mixing) if the mixing coefficients
$$\alpha(k) = \sup_{A \in \mathcal{F}_{-\infty}^{0},\, B \in \mathcal{F}_{k}^{\infty}} \left| P(A \cap B) - P(A)P(B) \right|$$
converge to zero as $k \to \infty$, where $\mathcal{F}_{a}^{b}$ denotes the $\sigma$-field generated by $\{\xi_i, a \le i \le b\}$. The notion of $\alpha$-mixing is widely adopted in the study of nonparametric regression models. It is reasonably weak and is known to be fulfilled by many stochastic processes, including many familiar linear and nonlinear time series models. We refer to the monographs of Doukhan and of Fan and Yao for further properties and other mixing conditions.
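As an informal illustration of weak dependence, the sketch below simulates a Gaussian AR(1) process, a standard example of a process known to be strongly mixing with geometrically decaying coefficients when the autoregressive parameter is less than one in absolute value. The sample autocorrelation printed here is only a rough proxy for decaying dependence; it is not an estimate of the mixing coefficients. The function names and parameter values are illustrative assumptions.

```python
import numpy as np

def simulate_ar1(n, phi, rng, burn_in=500):
    """Simulate X_t = phi * X_{t-1} + e_t with standard normal innovations."""
    e = rng.standard_normal(n + burn_in)
    x = np.empty(n + burn_in)
    x[0] = e[0]
    for t in range(1, n + burn_in):
        x[t] = phi * x[t - 1] + e[t]
    return x[burn_in:]  # drop the burn-in so the path is close to stationary

def sample_acf(x, max_lag):
    """Sample autocorrelations at lags 0, ..., max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[: x.size - k], x[k:]) / denom for k in range(max_lag + 1)])

rng = np.random.default_rng(0)
path = simulate_ar1(20000, phi=0.5, rng=rng)
acf = sample_acf(path, max_lag=5)
for k, r in enumerate(acf):
    # The theoretical autocorrelation of a stationary AR(1) at lag k is phi**k.
    print(f"lag {k}: sample acf = {r:.3f}, phi**k = {0.5 ** k:.3f}")
```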
3. Asymptotic Theory
We first list the regularity conditions needed in the proofs of the theorems, although some of them might not be the weakest possible.
(A1) (i) The process is a strictly stationary $\alpha$-mixing process with for some and . (ii) .
(A2) is a convex function, and is assumed to be any choice of the derivative of . Denote by the set of discontinuity points of . The common distribution function of satisfies .
(A3) satisfies the following conditions:
(i) There exists some function such that as , where is continuous in a neighborhood of with .
(ii) With probability 1, holds uniformly for in a neighborhood of , where is continuous at with .
(iii) Let and be continuous in a neighborhood of with . Furthermore, .
(iv) and are nonsingular matrices.
(A4) The time-varying coefficients for , and the scaling function in the wavelet kernel satisfy the following conditions:
(i) belongs to the Sobolev space with order .
(ii) satisfies the Lipschitz condition of order .
(iii) has a compact support, is in the Schwarz space with order , and satisfies the Lipschitz condition with order . Furthermore, as , where is the Fourier transform of .
(A5) (i) For the design points, .
(ii) For some Lipschitz function ,
(A6) The tuning parameter satisfies
(i) .
(ii) .
(iii) Let and for , for . Assume that .
Some remarks on the conditions are in order.
Remark 1. Condition (A1) states the standard requirements on the moments and the mixing coefficients of an $\alpha$-mixing sequence. It is well known that among various mixing conditions, for example, -, -, and -mixing, $\alpha$-mixing is reasonably weak and can be used to describe many stochastic processes, including many familiar linear and nonlinear time series models. (A1) (i) is a very common condition; see Cai et al.; Cai and Ould-Saïd; and Fan and Yao; among others.
Remark 2. Conditions (A2) and (A3) are often imposed to establish the large-sample theory of M-estimation in parametric or nonparametric models; see, for example, Bai et al.; Cai and Ould-Saïd; and Lin et al. They are mild and cover some well-known special cases, such as least-squares estimation, the Huber loss, and quantile regression. Some special examples are given as follows.
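For concreteness, the sketch below writes out common textbook forms of the convex losses and their derivatives for the three cases just named. The normalizations (for example, the factor 1/2 in the squared loss and the Huber tuning constant 1.345) are illustrative conventions and may differ from those used in the paper by constant factors.

```python
import numpy as np

def rho_ls(z):
    """Squared-error loss (1/2) z^2, leading to mean regression."""
    return 0.5 * np.square(np.asarray(z, dtype=float))

def psi_ls(z):
    """Derivative of the squared-error loss."""
    return np.asarray(z, dtype=float)

def rho_huber(z, c=1.345):
    """Huber loss: quadratic for |z| <= c, linear beyond."""
    z = np.asarray(z, dtype=float)
    a = np.abs(z)
    return np.where(a <= c, 0.5 * z ** 2, c * a - 0.5 * c ** 2)

def psi_huber(z, c=1.345):
    """Derivative of the Huber loss: z clipped to [-c, c]."""
    return np.clip(np.asarray(z, dtype=float), -c, c)

def rho_quantile(z, tau):
    """Check loss z * (tau - 1{z < 0}); tau = 0.5 gives half the absolute loss."""
    z = np.asarray(z, dtype=float)
    return z * (tau - (z < 0))

def psi_quantile(z, tau):
    """A choice of derivative of the check loss: tau - 1{z < 0}."""
    z = np.asarray(z, dtype=float)
    return tau - (z < 0).astype(float)
```

The squared loss has an unbounded derivative, so a single large residual can dominate the estimating equation, whereas the Huber and check losses have bounded derivatives, which is the source of their robustness to outliers.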
Remark 3. Conditions (A4) and (A5) are mild regularity conditions for wavelet smoothing, which have been adopted by Antoniadis et al.; Zhou and You; and Zhou et al. In condition (A6), acts as a tuning parameter, as the bandwidth does for standard kernel smoothers; (A6) (i) and (ii) are used for the Bahadur representation of the WME, and (A6) (i) and (iii) are used for its asymptotic normality. Condition (A6) (iii) implies (A6) (ii). There is a wide range of options that make (A6) (i) and (iii) hold. For example, if , then . Furthermore, take , then (A6) (i) and (iii) hold.
Recall that based on (1). Let , then . Set . We can write the objective function in equation (4) as
and denote
The first theorem is crucial for establishing the asymptotic properties of the WME.
Theorem 1. Under the conditions (A1)–(A5) (i) and (A6) (i) and (ii), for any compact subset , we have
where
If (A1)–(A5) (i) and (A6) (i) and (iii) hold, then
With the help of Theorem 1, we can establish the Bahadur representation of WME.
Theorem 2. Under the conditions (A1)–(A5) (i) and (A6) (i) and (ii), we have
uniformly in , where
If (A1)–(A5) (i) and (A6) (i) and (iii) hold, then
Remark 4. Theorem 2 gives the Bahadur representation of the WME for the time-varying coefficient time series model (1), which shows that the Bahadur order of the WME is . It is slightly weaker than the Bahadur order of Hong, where the bandwidth . However, Hong required strong smoothness, namely, a second-order differentiable coefficient function. We relax this assumption considerably: the degree of smoothness we require of the function is less restrictive. See conditions (A4) (i) and (ii).
With the help of Theorem 2, we can establish asymptotic normality of the WME.
Theorem 3. Under the conditions (A1)–(A5) and (A6) (i) and (iii), we have
where with , , and denotes the largest integer not greater than .
Remark 5. To obtain an asymptotic expansion of the variance and asymptotic normality, we need to consider an approximation to based on its values at dyadic points of order , as Antoniadis et al. have done. The main reason is that the variance of as a function of is unstable. This can be avoided by using dyadic points . See also Antoniadis et al. for the details. From Theorems 2 and 3, we have the uniform weak consistency of the WME:
Next, we give some special cases as corollaries of Theorem 3.
Corollary 1. Let and , which corresponds to the case of mean regression and implies , , , and . We have
Corollary 2. Let and , which corresponds to the case of quantile regression. Assume a.s. and has a continuous positive conditional density and cumulative distribution function in a neighborhood of 0 given the . Thus, and . Let . We have
Furthermore, if and are mutually independent for , we have
where is the density of .
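The link between the check loss and quantiles that underlies the quantile-regression case can be verified numerically. The sketch below, in which the function names are hypothetical, minimizes the empirical check loss over a location parameter by brute force and confirms that the minimizer is a sample quantile. It covers only the covariate-free case and is not the wavelet-weighted estimator itself.

```python
import numpy as np

def check_loss(z, tau):
    """Check loss rho_tau(z) = z * (tau - 1{z < 0})."""
    z = np.asarray(z, dtype=float)
    return z * (tau - (z < 0))

def argmin_check_loss(y, tau):
    """Minimize sum_i rho_tau(y_i - q) over q by searching the data points.

    The objective is convex and piecewise linear in q with kinks at the
    observations, so a minimizer is attained at one of the data points.
    """
    y = np.sort(np.asarray(y, dtype=float))
    losses = [check_loss(y - q, tau).sum() for q in y]
    return y[int(np.argmin(losses))]

rng = np.random.default_rng(2)
sample = rng.standard_normal(501)
q_half = argmin_check_loss(sample, 0.5)      # coincides with the sample median
q_quarter = argmin_check_loss(sample, 0.25)  # close to the lower sample quartile
```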
4. Technical Lemmas and Proofs
In what follows, is a positive constant, which may change from line to line in the proofs.
Lemma 1 (see Antoniadis et al. and Walter). Suppose that (A4) holds. We have
(i) and , where is a positive integer and is a constant depending on only
(ii)
(iii) , where is a positive constant
(iv) uniformly in , as
Lemma 2 (see Antoniadis et al.). Suppose that (A4) and (A5) (i) hold and satisfies (A4) (i) and (ii). Then,
where
Lemma 3 (see Lin and Lu ). Let be an -mixing sequence, , and with and , . Then,
Lemma 4 (see Pollard ). Let be a sequence of random convex functions defined on a convex, open subset of . Suppose is a real-valued function on for which in probability, for each fixed in . Then, for each compact subset of , in probability,
For simplicity, we introduce some notations before proving. We are interested in the asymptotic behaviors of , which can follow from the new optimization objective function:
We rewrite as
Proof of Lemma 5. From the definition of , we have
where
By the convexity of , we have
Furthermore, we have
Let . From conditions (A1) (ii), (A3) (ii), and (A4) (ii), and Lemma 1, we have
To obtain an upper bound for the second term on the right-hand side of (34), we split it into two terms as follows. Let and with specified later. We have
where is a sequence of positive integers such that as .
For , by (35) and the choice of , we have
By conditions (A1) (ii), (A3) (ii), and (A4), and Lemmas 1 and 2, we have
For , by (38), conditions (A1) (ii) and (A3) (ii), and Lemmas 1–3, we have
From (36), (37), and (39), one gets . Combining with (34) and (35), we have
uniformly in . Note that . We have
uniformly in .
Proof of Lemma 7. Recall that
Let . First, we calculate its variance-covariance matrix. Note that by condition (A3) (i). We have
By the same argument as that used in the proof of Lemma 5, we obtain
Now, we deal with the first term on the right-hand side of (46). Using the compact support and Lipschitz properties of , one can show that is Lipschitz uniformly in , so that
and the Lipschitz property of implies
where and belong to . By condition (A3) (iii), we have
Since is unstable as a function of , we need to compute it at dyadic points of order . By Lemma 6.1 in Antoniadis et al., we have
Combining with (46), (47), and (50), we have
Second, we shall show (44). Redefine
Note that , and this turns (44) into
where the definition of is similar to that of , with replaced by . As is not necessarily bounded, we employ a truncation method. Denote and as before that , and let be defined as before with replaced by . Similar to the proof of Theorem 2 in Cai et al., by using Doob’s large-block and small-block technique, we can show that
Let . To prove (54), by (55), it suffices to show
at first and then . Let be defined as before with replaced by . With the same argument as for (46), one gets
since
from condition (A3) (iii). Thus, (56) holds. This completes the proof of Lemma 7.
Proof of Theorem 1. From Lemmas 5 and 6, and (29), for fixed , we have
From Lemma 7, it is easy to see that is stochastically bounded. Since the convex function converges in probability to the convex function , it follows from the convexity lemma (Lemma 4) that for any compact set ,
Notice that the convexity lemma strengthens the pointwise result to uniform convergence on compact subsets of . This completes the proof of Theorem 1.
Proof of Theorem 2. To obtain the Bahadur representation of the WME, the idea behind the proof is to approximate by a quadratic function whose minimizing value has a known asymptotic behavior, and then to show that lies close enough to the minimizing value to share its asymptotic behavior. We have done the first step, that is, the results of Theorem 1 and Lemma 7. Let and . Now, we prove the second step. The argument will be complete if we can show for each that
The argument is similar to the proof of Theorem 1 in Pollard, whose method is extended to obtain the Bahadur representation of the WME. From Theorem 1, the compact set can be chosen to contain a closed ball with center and radius , with probability arbitrarily close to one. Thereby, it implies that
Now, we consider the behavior of outside . Suppose with and is a unit vector. Define as the boundary point of that lies on the line segment from to , that is, . Convexity of and the definition of imply
Furthermore, we have
uniformly in <