Martingale estimating functions provide a convenient and important framework for inference in nonlinear time series models. When information about the first four conditional moments of the observed process is available, however, quadratic estimating functions are more informative. In this paper, a general framework is developed for the joint estimation of conditional mean and variance parameters in time series models using quadratic estimating functions. The superiority of the approach is demonstrated by comparing the information associated with the optimal quadratic estimating function with that of other estimating functions. The method is applied to derive the optimal quadratic estimating functions of the parameters of autoregressive conditional duration (ACD) models, random coefficient autoregressive (RCA) models, doubly stochastic models, and regression models with ARCH errors. Closed-form expressions for the information gain are also discussed in some detail.

1. Introduction

Godambe [1] was the first to study inference for discrete-time stochastic processes using the estimating function method. Thavaneswaran and Abraham [2] studied nonlinear time series estimation problems using linear estimating functions. Naik-Nimbalkar and Rajarshi [3] and Thavaneswaran and Heyde [4] studied filtering and prediction problems using linear estimating functions in a Bayesian context. Chandra and Taniguchi [5], Merkouris [6], and Ghahramani and Thavaneswaran [7], among others, have studied estimation problems using estimating functions. In this paper, we study linear and quadratic martingale estimating functions and show that the quadratic estimating functions are more informative when the conditional mean and variance of the observed process depend on the same parameter of interest.

This paper is organized as follows. The rest of Section 1 presents the basics of estimating functions and information associated with estimating functions. Section 2 presents the general model for the multiparameter case and the form of the optimal quadratic estimating function. In Section 3, the theory is applied to four different models.

Suppose that $\{\mathbf{y}_t,\,t=1,\dots,n\}$ is a realization of a discrete-time stochastic process whose distribution depends on a vector parameter $\boldsymbol{\theta}$ belonging to an open subset $\Theta$ of $p$-dimensional Euclidean space. Let $(\Omega,\mathcal{F},P_{\boldsymbol{\theta}})$ denote the underlying probability space, and let $\mathcal{F}^y_t$ be the $\sigma$-field generated by $\{\mathbf{y}_1,\dots,\mathbf{y}_t\}$, $t\ge 1$. Let $\mathbf{h}_t=\mathbf{h}_t(\mathbf{y}_1,\dots,\mathbf{y}_t,\boldsymbol{\theta})$, $1\le t\le n$, be specified $q$-dimensional vectors that are martingales. We consider the class $\mathcal{M}$ of zero-mean, square-integrable $p$-dimensional martingale estimating functions of the form
\[
\mathcal{M}=\Big\{\mathbf{g}_n(\boldsymbol{\theta}):\mathbf{g}_n(\boldsymbol{\theta})=\sum_{t=1}^{n}\mathbf{a}_{t-1}\mathbf{h}_t\Big\},
\tag{1.1}
\]
where the $\mathbf{a}_{t-1}$ are $p\times q$ matrices depending on $\mathbf{y}_1,\dots,\mathbf{y}_{t-1}$, $1\le t\le n$. The estimating functions $\mathbf{g}_n(\boldsymbol{\theta})$ are further assumed to be almost surely differentiable with respect to the components of $\boldsymbol{\theta}$ and such that $\mathrm{E}[\partial\mathbf{g}_n(\boldsymbol{\theta})/\partial\boldsymbol{\theta}'\mid\mathcal{F}^y_{n-1}]$ and $\mathrm{E}[\mathbf{g}_n(\boldsymbol{\theta})\mathbf{g}_n(\boldsymbol{\theta})'\mid\mathcal{F}^y_{n-1}]$ are nonsingular for all $\boldsymbol{\theta}\in\Theta$ and for each $n\ge 1$; the expectations are always taken with respect to $P_{\boldsymbol{\theta}}$. Estimators of $\boldsymbol{\theta}$ are obtained by solving the estimating equation $\mathbf{g}_n(\boldsymbol{\theta})=\mathbf{0}$. Furthermore, the $p\times p$ matrix $\mathrm{E}[\mathbf{g}_n(\boldsymbol{\theta})\mathbf{g}_n(\boldsymbol{\theta})'\mid\mathcal{F}^y_{n-1}]$ is assumed to be positive definite for all $\boldsymbol{\theta}\in\Theta$. Then, in the class $\mathcal{M}$ of all zero-mean and square-integrable martingale estimating functions, the optimal estimating function $\mathbf{g}^*_n(\boldsymbol{\theta})$, which maximizes, in the partial order of nonnegative definite matrices, the information matrix
\[
\mathbf{I}_{\mathbf{g}_n}(\boldsymbol{\theta})
=\mathrm{E}\Big[\frac{\partial\mathbf{g}_n(\boldsymbol{\theta})}{\partial\boldsymbol{\theta}'}\,\Big|\,\mathcal{F}^y_{n-1}\Big]'\,
\mathrm{E}\big[\mathbf{g}_n(\boldsymbol{\theta})\mathbf{g}_n(\boldsymbol{\theta})'\mid\mathcal{F}^y_{n-1}\big]^{-1}\,
\mathrm{E}\Big[\frac{\partial\mathbf{g}_n(\boldsymbol{\theta})}{\partial\boldsymbol{\theta}'}\,\Big|\,\mathcal{F}^y_{n-1}\Big],
\tag{1.2}
\]
is given by
\[
\mathbf{g}^*_n(\boldsymbol{\theta})=\sum_{t=1}^{n}\mathbf{a}^*_{t-1}\mathbf{h}_t
=\sum_{t=1}^{n}\mathrm{E}\Big[\frac{\partial\mathbf{h}_t}{\partial\boldsymbol{\theta}'}\,\Big|\,\mathcal{F}^y_{t-1}\Big]'\,
\mathrm{E}\big[\mathbf{h}_t\mathbf{h}_t'\mid\mathcal{F}^y_{t-1}\big]^{-1}\mathbf{h}_t,
\tag{1.3}
\]
and the corresponding optimal information reduces to $\mathrm{E}[\mathbf{g}^*_n(\boldsymbol{\theta})\mathbf{g}^*_n(\boldsymbol{\theta})'\mid\mathcal{F}^y_{n-1}]$.

The function $\mathbf{g}^*_n(\boldsymbol{\theta})$ is also called the "quasi-score" and has properties similar to those of a score function, in the sense that $\mathrm{E}[\mathbf{g}^*_n(\boldsymbol{\theta})]=\mathbf{0}$ and $\mathrm{E}[\mathbf{g}^*_n(\boldsymbol{\theta})\mathbf{g}^*_n(\boldsymbol{\theta})']=-\mathrm{E}[\partial\mathbf{g}^*_n(\boldsymbol{\theta})/\partial\boldsymbol{\theta}']$. This result is more general in the sense that its validity does not require the true underlying distribution to belong to the exponential family. The maximum correlation between the optimal estimating function and the true (unknown) score justifies the terminology "quasi-score" for $\mathbf{g}^*_n(\boldsymbol{\theta})$. Moreover, it follows from Lindsay [8, page 916] that if we solve an unbiased estimating equation $\mathbf{g}_n(\boldsymbol{\theta})=\mathbf{0}$ to obtain an estimator, then the asymptotic variance of the resulting estimator is the inverse of the information $\mathbf{I}_{\mathbf{g}_n}$. Hence, an estimator obtained from a more informative estimating equation is asymptotically more efficient.
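To make the estimating-equation machinery concrete, the following Python sketch (not from the paper; all parameter values hypothetical) applies the optimal linear estimating function to a scalar AR(1) model with martingale difference $h_t=y_t-\theta y_{t-1}$, for which $\mathbf{a}^*_{t-1}=-y_{t-1}/\sigma^2$; solving $g^*_n(\theta)=0$ recovers the conditional least-squares estimator.

```python
import random

# Hypothetical illustration: optimal linear estimating function for the AR(1)
# model y_t = theta * y_{t-1} + e_t, using h_t = y_t - theta * y_{t-1} and
# a*_{t-1} = -y_{t-1} / sigma^2.  The root of g*_n(theta) = 0 is the
# conditional least-squares estimator.
random.seed(42)
theta_true, sigma = 0.5, 1.0
y = [0.0]
for _ in range(5000):
    y.append(theta_true * y[-1] + random.gauss(0.0, sigma))

def g_star(theta):
    # g*_n(theta) = -(1/sigma^2) * sum_t y_{t-1} * (y_t - theta * y_{t-1})
    return -sum(y[t - 1] * (y[t] - theta * y[t - 1])
                for t in range(1, len(y))) / sigma**2

# Closed-form root of the estimating equation:
num = sum(y[t - 1] * y[t] for t in range(1, len(y)))
den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
theta_hat = num / den

assert abs(g_star(theta_hat)) < 1e-6   # theta_hat solves g*_n(theta) = 0
```

Here the estimating equation has a closed-form root; in the nonlinear models of Section 3 it would typically be solved numerically.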

2. General Model and Method

Consider a discrete-time stochastic process $\{y_t,\,t=1,2,\dots\}$ with conditional moments
\[
\mu_t(\boldsymbol{\theta})=\mathrm{E}\big[y_t\mid\mathcal{F}^y_{t-1}\big],\qquad
\sigma^2_t(\boldsymbol{\theta})=\mathrm{Var}\big(y_t\mid\mathcal{F}^y_{t-1}\big),
\]
\[
\gamma_t(\boldsymbol{\theta})=\frac{1}{\sigma^3_t(\boldsymbol{\theta})}\mathrm{E}\big[(y_t-\mu_t(\boldsymbol{\theta}))^3\mid\mathcal{F}^y_{t-1}\big],\qquad
\kappa_t(\boldsymbol{\theta})=\frac{1}{\sigma^4_t(\boldsymbol{\theta})}\mathrm{E}\big[(y_t-\mu_t(\boldsymbol{\theta}))^4\mid\mathcal{F}^y_{t-1}\big]-3.
\tag{2.1}
\]
That is, we assume that the conditional skewness and excess kurtosis of the standardized variable $y_t$ do not contain any additional parameters. In order to estimate the parameter $\boldsymbol{\theta}$ based on the observations $y_1,\dots,y_n$, we consider two classes of martingale differences, $\{m_t(\boldsymbol{\theta})=y_t-\mu_t(\boldsymbol{\theta}),\,t=1,\dots,n\}$ and $\{s_t(\boldsymbol{\theta})=m^2_t(\boldsymbol{\theta})-\sigma^2_t(\boldsymbol{\theta}),\,t=1,\dots,n\}$, such that
\[
\langle m\rangle_t=\mathrm{E}\big[m^2_t\mid\mathcal{F}^y_{t-1}\big]=\mathrm{E}\big[(y_t-\mu_t)^2\mid\mathcal{F}^y_{t-1}\big]=\sigma^2_t,
\]
\[
\langle s\rangle_t=\mathrm{E}\big[s^2_t\mid\mathcal{F}^y_{t-1}\big]=\mathrm{E}\big[(y_t-\mu_t)^4+\sigma^4_t-2\sigma^2_t(y_t-\mu_t)^2\mid\mathcal{F}^y_{t-1}\big]=\sigma^4_t(\kappa_t+2),
\]
\[
\langle m,s\rangle_t=\mathrm{E}\big[m_ts_t\mid\mathcal{F}^y_{t-1}\big]=\mathrm{E}\big[(y_t-\mu_t)^3-\sigma^2_t(y_t-\mu_t)\mid\mathcal{F}^y_{t-1}\big]=\sigma^3_t\gamma_t.
\tag{2.2}
\]
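The moment identities in (2.2) can be checked by simulation for any distribution with known first four moments. The sketch below (illustrative, not from the paper) uses i.i.d. Exp(1) draws, for which $\sigma^2=1$, $\gamma=2$, and the excess kurtosis is $\kappa=6$, so that $\langle s\rangle=\sigma^4(\kappa+2)=8$ and $\langle m,s\rangle=\sigma^3\gamma=2$:

```python
import random

# Monte-Carlo sanity check (illustrative, not from the paper) of the identities
# <s>_t = sigma^4 * (kappa + 2) and <m, s>_t = sigma^3 * gamma in (2.2),
# using i.i.d. Exp(1) draws: sigma^2 = 1, gamma = 2, kappa (excess) = 6.
random.seed(0)
n = 400_000
xs = [random.expovariate(1.0) for _ in range(n)]

mu, var = 1.0, 1.0                              # exact moments of Exp(1)
m = [x - mu for x in xs]                        # m_t = y_t - mu_t
s = [mi * mi - var for mi in m]                 # s_t = m_t^2 - sigma_t^2

var_s = sum(si * si for si in s) / n            # near sigma^4*(kappa+2) = 8
cov_ms = sum(mi * si for mi, si in zip(m, s)) / n   # near sigma^3*gamma = 2
```

The sample values `var_s` and `cov_ms` approach 8 and 2 as the sample size grows, matching (2.2).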

The optimal estimating functions based on the martingale differences $m_t$ and $s_t$ are $\mathbf{g}^*_M(\boldsymbol{\theta})=-\sum_{t=1}^{n}(\partial\mu_t/\partial\boldsymbol{\theta})(m_t/\langle m\rangle_t)$ and $\mathbf{g}^*_S(\boldsymbol{\theta})=-\sum_{t=1}^{n}(\partial\sigma^2_t/\partial\boldsymbol{\theta})(s_t/\langle s\rangle_t)$, respectively. The associated informations are $\mathbf{I}_{\mathbf{g}^*_M}(\boldsymbol{\theta})=\sum_{t=1}^{n}(\partial\mu_t/\partial\boldsymbol{\theta})(\partial\mu_t/\partial\boldsymbol{\theta}')(1/\langle m\rangle_t)$ and $\mathbf{I}_{\mathbf{g}^*_S}(\boldsymbol{\theta})=\sum_{t=1}^{n}(\partial\sigma^2_t/\partial\boldsymbol{\theta})(\partial\sigma^2_t/\partial\boldsymbol{\theta}')(1/\langle s\rangle_t)$. Crowder [9] studied the optimal quadratic estimating function for independent observations. For the discrete-time stochastic process $\{y_t\}$, the following theorem establishes the optimality of the quadratic estimating function in the multiparameter case.

Theorem 2.1. For the general model in (2.1), in the class of all quadratic estimating functions of the form $\mathcal{G}_Q=\{\mathbf{g}_Q(\boldsymbol{\theta}):\mathbf{g}_Q(\boldsymbol{\theta})=\sum_{t=1}^{n}(\mathbf{a}_{t-1}m_t+\mathbf{b}_{t-1}s_t)\}$, (a) the optimal estimating function is given by $\mathbf{g}^*_Q(\boldsymbol{\theta})=\sum_{t=1}^{n}(\mathbf{a}^*_{t-1}m_t+\mathbf{b}^*_{t-1}s_t)$, where
\[
\mathbf{a}^*_{t-1}=\Big(1-\frac{\langle m,s\rangle^2_t}{\langle m\rangle_t\langle s\rangle_t}\Big)^{-1}\Big(-\frac{\partial\mu_t}{\partial\boldsymbol{\theta}}\frac{1}{\langle m\rangle_t}+\frac{\partial\sigma^2_t}{\partial\boldsymbol{\theta}}\frac{\langle m,s\rangle_t}{\langle m\rangle_t\langle s\rangle_t}\Big),\qquad
\mathbf{b}^*_{t-1}=\Big(1-\frac{\langle m,s\rangle^2_t}{\langle m\rangle_t\langle s\rangle_t}\Big)^{-1}\Big(\frac{\partial\mu_t}{\partial\boldsymbol{\theta}}\frac{\langle m,s\rangle_t}{\langle m\rangle_t\langle s\rangle_t}-\frac{\partial\sigma^2_t}{\partial\boldsymbol{\theta}}\frac{1}{\langle s\rangle_t}\Big);
\tag{2.3}
\]
(b) the information $\mathbf{I}_{\mathbf{g}^*_Q}(\boldsymbol{\theta})$ is given by
\[
\mathbf{I}_{\mathbf{g}^*_Q}(\boldsymbol{\theta})=\sum_{t=1}^{n}\Big(1-\frac{\langle m,s\rangle^2_t}{\langle m\rangle_t\langle s\rangle_t}\Big)^{-1}\Big[\frac{\partial\mu_t}{\partial\boldsymbol{\theta}}\frac{\partial\mu_t}{\partial\boldsymbol{\theta}'}\frac{1}{\langle m\rangle_t}+\frac{\partial\sigma^2_t}{\partial\boldsymbol{\theta}}\frac{\partial\sigma^2_t}{\partial\boldsymbol{\theta}'}\frac{1}{\langle s\rangle_t}-\Big(\frac{\partial\mu_t}{\partial\boldsymbol{\theta}}\frac{\partial\sigma^2_t}{\partial\boldsymbol{\theta}'}+\frac{\partial\sigma^2_t}{\partial\boldsymbol{\theta}}\frac{\partial\mu_t}{\partial\boldsymbol{\theta}'}\Big)\frac{\langle m,s\rangle_t}{\langle m\rangle_t\langle s\rangle_t}\Big];
\tag{2.4}
\]
(c) the gain in information $\mathbf{I}_{\mathbf{g}^*_Q}(\boldsymbol{\theta})-\mathbf{I}_{\mathbf{g}^*_M}(\boldsymbol{\theta})$ is given by
\[
\sum_{t=1}^{n}\Big(1-\frac{\langle m,s\rangle^2_t}{\langle m\rangle_t\langle s\rangle_t}\Big)^{-1}\Big[\frac{\partial\mu_t}{\partial\boldsymbol{\theta}}\frac{\partial\mu_t}{\partial\boldsymbol{\theta}'}\frac{\langle m,s\rangle^2_t}{\langle m\rangle^2_t\langle s\rangle_t}+\frac{\partial\sigma^2_t}{\partial\boldsymbol{\theta}}\frac{\partial\sigma^2_t}{\partial\boldsymbol{\theta}'}\frac{1}{\langle s\rangle_t}-\Big(\frac{\partial\mu_t}{\partial\boldsymbol{\theta}}\frac{\partial\sigma^2_t}{\partial\boldsymbol{\theta}'}+\frac{\partial\sigma^2_t}{\partial\boldsymbol{\theta}}\frac{\partial\mu_t}{\partial\boldsymbol{\theta}'}\Big)\frac{\langle m,s\rangle_t}{\langle m\rangle_t\langle s\rangle_t}\Big];
\tag{2.5}
\]
(d) the gain in information $\mathbf{I}_{\mathbf{g}^*_Q}(\boldsymbol{\theta})-\mathbf{I}_{\mathbf{g}^*_S}(\boldsymbol{\theta})$ is given by
\[
\sum_{t=1}^{n}\Big(1-\frac{\langle m,s\rangle^2_t}{\langle m\rangle_t\langle s\rangle_t}\Big)^{-1}\Big[\frac{\partial\mu_t}{\partial\boldsymbol{\theta}}\frac{\partial\mu_t}{\partial\boldsymbol{\theta}'}\frac{1}{\langle m\rangle_t}+\frac{\partial\sigma^2_t}{\partial\boldsymbol{\theta}}\frac{\partial\sigma^2_t}{\partial\boldsymbol{\theta}'}\frac{\langle m,s\rangle^2_t}{\langle m\rangle_t\langle s\rangle^2_t}-\Big(\frac{\partial\mu_t}{\partial\boldsymbol{\theta}}\frac{\partial\sigma^2_t}{\partial\boldsymbol{\theta}'}+\frac{\partial\sigma^2_t}{\partial\boldsymbol{\theta}}\frac{\partial\mu_t}{\partial\boldsymbol{\theta}'}\Big)\frac{\langle m,s\rangle_t}{\langle m\rangle_t\langle s\rangle_t}\Big].
\tag{2.6}
\]

Proof. We choose two orthogonal martingale differences $m_t$ and $\psi_t=s_t-(\langle m,s\rangle_t/\langle m\rangle_t)m_t=s_t-\sigma_t\gamma_t m_t$, where the conditional variance of $\psi_t$ is $\langle\psi\rangle_t=(\langle m\rangle_t\langle s\rangle_t-\langle m,s\rangle^2_t)/\langle m\rangle_t=\sigma^4_t(\kappa_t+2-\gamma^2_t)$. That is, $m_t$ and $\psi_t$ are uncorrelated with conditional variances $\langle m\rangle_t$ and $\langle\psi\rangle_t$, respectively. The optimal martingale estimating function and associated information based on the martingale differences $\psi_t$ are
\[
\mathbf{g}^*_\Psi(\boldsymbol{\theta})=\sum_{t=1}^{n}\Big(\frac{\partial\mu_t}{\partial\boldsymbol{\theta}}\frac{\langle m,s\rangle_t}{\langle m\rangle_t}-\frac{\partial\sigma^2_t}{\partial\boldsymbol{\theta}}\Big)\frac{\psi_t}{\langle\psi\rangle_t}
=\sum_{t=1}^{n}\Big(1-\frac{\langle m,s\rangle^2_t}{\langle m\rangle_t\langle s\rangle_t}\Big)^{-1}\Big[\Big(-\frac{\partial\mu_t}{\partial\boldsymbol{\theta}}\frac{\langle m,s\rangle^2_t}{\langle m\rangle^2_t\langle s\rangle_t}+\frac{\partial\sigma^2_t}{\partial\boldsymbol{\theta}}\frac{\langle m,s\rangle_t}{\langle m\rangle_t\langle s\rangle_t}\Big)m_t+\Big(\frac{\partial\mu_t}{\partial\boldsymbol{\theta}}\frac{\langle m,s\rangle_t}{\langle m\rangle_t\langle s\rangle_t}-\frac{\partial\sigma^2_t}{\partial\boldsymbol{\theta}}\frac{1}{\langle s\rangle_t}\Big)s_t\Big],
\]
\[
\mathbf{I}_{\mathbf{g}^*_\Psi}(\boldsymbol{\theta})=\sum_{t=1}^{n}\Big(\frac{\partial\mu_t}{\partial\boldsymbol{\theta}}\frac{\langle m,s\rangle_t}{\langle m\rangle_t}-\frac{\partial\sigma^2_t}{\partial\boldsymbol{\theta}}\Big)\Big(\frac{\partial\mu_t}{\partial\boldsymbol{\theta}'}\frac{\langle m,s\rangle_t}{\langle m\rangle_t}-\frac{\partial\sigma^2_t}{\partial\boldsymbol{\theta}'}\Big)\frac{1}{\langle\psi\rangle_t}
=\sum_{t=1}^{n}\Big(1-\frac{\langle m,s\rangle^2_t}{\langle m\rangle_t\langle s\rangle_t}\Big)^{-1}\Big[\frac{\partial\mu_t}{\partial\boldsymbol{\theta}}\frac{\partial\mu_t}{\partial\boldsymbol{\theta}'}\frac{\langle m,s\rangle^2_t}{\langle m\rangle^2_t\langle s\rangle_t}+\frac{\partial\sigma^2_t}{\partial\boldsymbol{\theta}}\frac{\partial\sigma^2_t}{\partial\boldsymbol{\theta}'}\frac{1}{\langle s\rangle_t}-\Big(\frac{\partial\mu_t}{\partial\boldsymbol{\theta}}\frac{\partial\sigma^2_t}{\partial\boldsymbol{\theta}'}+\frac{\partial\sigma^2_t}{\partial\boldsymbol{\theta}}\frac{\partial\mu_t}{\partial\boldsymbol{\theta}'}\Big)\frac{\langle m,s\rangle_t}{\langle m\rangle_t\langle s\rangle_t}\Big].
\tag{2.7}
\]
Then the quadratic estimating function based on $m_t$ and $\psi_t$ becomes
\[
\mathbf{g}^*_Q(\boldsymbol{\theta})=\sum_{t=1}^{n}\Big(1-\frac{\langle m,s\rangle^2_t}{\langle m\rangle_t\langle s\rangle_t}\Big)^{-1}\Big[\Big(-\frac{\partial\mu_t}{\partial\boldsymbol{\theta}}\frac{1}{\langle m\rangle_t}+\frac{\partial\sigma^2_t}{\partial\boldsymbol{\theta}}\frac{\langle m,s\rangle_t}{\langle m\rangle_t\langle s\rangle_t}\Big)m_t+\Big(\frac{\partial\mu_t}{\partial\boldsymbol{\theta}}\frac{\langle m,s\rangle_t}{\langle m\rangle_t\langle s\rangle_t}-\frac{\partial\sigma^2_t}{\partial\boldsymbol{\theta}}\frac{1}{\langle s\rangle_t}\Big)s_t\Big]
\tag{2.8}
\]
and satisfies the sufficient condition for optimality
\[
\mathrm{E}\Big[\frac{\partial\mathbf{g}_Q(\boldsymbol{\theta})}{\partial\boldsymbol{\theta}'}\,\Big|\,\mathcal{F}^y_{t-1}\Big]=\mathrm{Cov}\big(\mathbf{g}_Q(\boldsymbol{\theta}),\mathbf{g}^*_Q(\boldsymbol{\theta})\mid\mathcal{F}^y_{t-1}\big)K,\quad\forall\,\mathbf{g}_Q(\boldsymbol{\theta})\in\mathcal{G}_Q,
\tag{2.9}
\]
where $K$ is a constant matrix. Hence, $\mathbf{g}^*_Q(\boldsymbol{\theta})$ is optimal in the class $\mathcal{G}_Q$, and part (a) follows. Since $m_t$ and $\psi_t$ are orthogonal, $\mathbf{I}_{\mathbf{g}^*_Q}(\boldsymbol{\theta})=\mathbf{I}_{\mathbf{g}^*_M}(\boldsymbol{\theta})+\mathbf{I}_{\mathbf{g}^*_\Psi}(\boldsymbol{\theta})$, and part (b) follows; parts (c) and (d) then follow by subtraction. Hence, for each component $\theta_i$, $i=1,\dots,p$, neither $g^*_M(\theta_i)$ nor $g^*_S(\theta_i)$ is fully informative; that is, $I_{g^*_Q}(\theta_i)\ge I_{g^*_M}(\theta_i)$ and $I_{g^*_Q}(\theta_i)\ge I_{g^*_S}(\theta_i)$.
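In the scalar-parameter case, parts (b)-(d) of Theorem 2.1 reduce to simple algebraic identities that can be checked numerically. The sketch below (arbitrary illustrative values for $\partial\mu_t/\partial\theta$, $\partial\sigma^2_t/\partial\theta$, $\langle m\rangle_t$, $\langle s\rangle_t$, $\langle m,s\rangle_t$ at a single $t$, not from the paper) verifies that the stated gains equal $I_{g^*_Q}-I_{g^*_M}$ and $I_{g^*_Q}-I_{g^*_S}$ and are nonnegative:

```python
# Scalar numerical check (illustrative) of Theorem 2.1: for valid choices of
# <m>, <s>, <m,s> (squared conditional correlation rho2 < 1), the gains (2.5)
# and (2.6) equal I_Q - I_M and I_Q - I_S and are nonnegative.
cases = [
    # (dmu, dsig2, vm, vs, vms) = (dmu/dtheta, dsig^2/dtheta, <m>, <s>, <m,s>)
    (1.0, 2.0, 1.5, 8.0, 2.0),
    (-0.7, 0.3, 0.9, 5.0, -1.1),
    (2.0, -1.0, 2.0, 10.0, 0.0),
]
for dmu, dsig2, vm, vs, vms in cases:
    rho2 = vms**2 / (vm * vs)
    assert rho2 < 1.0                       # valid conditional correlation
    I_M = dmu**2 / vm
    I_S = dsig2**2 / vs
    I_Q = (1 - rho2)**-1 * (dmu**2 / vm + dsig2**2 / vs
                            - 2 * dmu * dsig2 * vms / (vm * vs))
    gain_M = (1 - rho2)**-1 * (dmu**2 * vms**2 / (vm**2 * vs) + dsig2**2 / vs
                               - 2 * dmu * dsig2 * vms / (vm * vs))
    gain_S = (1 - rho2)**-1 * (dmu**2 / vm + dsig2**2 * vms**2 / (vm * vs**2)
                               - 2 * dmu * dsig2 * vms / (vm * vs))
    assert abs(I_Q - I_M - gain_M) < 1e-12  # part (c)
    assert abs(I_Q - I_S - gain_S) < 1e-12  # part (d)
    assert gain_M >= -1e-12 and gain_S >= -1e-12
```

The nonnegativity is immediate once each gain is written as a perfect square, e.g. gain (2.5) equals $(1-\rho^2)^{-1}\langle s\rangle_t^{-1}\big((\partial\mu_t/\partial\theta)\langle m,s\rangle_t/\langle m\rangle_t-\partial\sigma^2_t/\partial\theta\big)^2$ in the scalar case.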

Corollary 2.2. When the conditional skewness $\gamma$ and excess kurtosis $\kappa$ are constants, the optimal quadratic estimating function and the associated information, based on the martingale differences $m_t=y_t-\mu_t$ and $s_t=m^2_t-\sigma^2_t$, are given by
\[
\mathbf{g}^*_Q(\boldsymbol{\theta})=\Big(1-\frac{\gamma^2}{\kappa+2}\Big)^{-1}\sum_{t=1}^{n}\frac{1}{\sigma^3_t}\Big[\Big(-\sigma_t\frac{\partial\mu_t}{\partial\boldsymbol{\theta}}+\frac{\gamma}{\kappa+2}\frac{\partial\sigma^2_t}{\partial\boldsymbol{\theta}}\Big)m_t+\frac{1}{\kappa+2}\Big(\gamma\frac{\partial\mu_t}{\partial\boldsymbol{\theta}}-\frac{1}{\sigma_t}\frac{\partial\sigma^2_t}{\partial\boldsymbol{\theta}}\Big)s_t\Big],
\]
\[
\mathbf{I}_{\mathbf{g}^*_Q}(\boldsymbol{\theta})=\Big(1-\frac{\gamma^2}{\kappa+2}\Big)^{-1}\Big[\mathbf{I}_{\mathbf{g}^*_M}(\boldsymbol{\theta})+\mathbf{I}_{\mathbf{g}^*_S}(\boldsymbol{\theta})-\frac{\gamma}{\kappa+2}\sum_{t=1}^{n}\frac{1}{\sigma^3_t}\Big(\frac{\partial\mu_t}{\partial\boldsymbol{\theta}}\frac{\partial\sigma^2_t}{\partial\boldsymbol{\theta}'}+\frac{\partial\sigma^2_t}{\partial\boldsymbol{\theta}}\frac{\partial\mu_t}{\partial\boldsymbol{\theta}'}\Big)\Big].
\tag{2.10}
\]

3. Applications

3.1. Autoregressive Conditional Duration (ACD) Models

There is growing interest in the analysis of intraday financial data such as transaction and quote data, which have increasingly been made available by many stock exchanges. Unlike closing prices, which are measured daily, monthly, or yearly, intraday (high-frequency) data tend to be irregularly spaced, and the durations between events are themselves random variables. The autoregressive conditional duration (ACD) process of Engle and Russell [10] was proposed to model such durations and to study the dynamic structure of the adjusted durations $x_i=t_i-t_{i-1}$, where $t_i$ is the time of the $i$th transaction. The crucial assumption underlying the ACD model is that the time dependence is described by a function $\psi_i$, where $\psi_i$ is the conditional expectation of the adjusted duration between the $(i-1)$th and the $i$th trades. The basic ACD model is defined as
\[
x_i=\psi_i\varepsilon_i,\qquad \psi_i=\mathrm{E}\big[x_i\mid\mathcal{F}^x_{i-1}\big],
\tag{3.1}
\]
where the $\varepsilon_i$ are i.i.d. nonnegative random variables with density function $f(\cdot)$ and unit mean, and $\mathcal{F}^x_{i-1}$ is the information available at the $(i-1)$th trade; $\varepsilon_i$ is assumed independent of $\mathcal{F}^x_{i-1}$. Different distributions of $\varepsilon_i$ and specifications of $\psi_i$ yield different types of ACD models. In this paper, we discuss the specific class known as the ACD($p$, $q$) model, given by
\[
x_t=\psi_t\varepsilon_t,\qquad \psi_t=\omega+\sum_{j=1}^{p}a_jx_{t-j}+\sum_{j=1}^{q}b_j\psi_{t-j},
\tag{3.2}
\]
where $\omega>0$, $a_j>0$, $b_j>0$, and $\sum_{j=1}^{\max(p,q)}(a_j+b_j)<1$. We assume that the $\varepsilon_t$ are i.i.d. nonnegative random variables with mean $\mu_\varepsilon$, variance $\sigma^2_\varepsilon$, skewness $\gamma_\varepsilon$, and excess kurtosis $\kappa_\varepsilon$. In order to estimate the parameter vector $\boldsymbol{\theta}=(\omega,a_1,\dots,a_p,b_1,\dots,b_q)'$, we use the estimating function approach. For this model, the conditional moments are $\mu_t=\mu_\varepsilon\psi_t$, $\sigma^2_t=\sigma^2_\varepsilon\psi^2_t$, $\gamma_t=\gamma_\varepsilon$, and $\kappa_t=\kappa_\varepsilon$. Let $m_t=x_t-\mu_t$ and $s_t=m^2_t-\sigma^2_t$ be the sequences of martingale differences, so that $\langle m\rangle_t=\sigma^2_\varepsilon\psi^2_t$, $\langle s\rangle_t=\sigma^4_\varepsilon(\kappa_\varepsilon+2)\psi^4_t$, and $\langle m,s\rangle_t=\sigma^3_\varepsilon\gamma_\varepsilon\psi^3_t$.
The optimal estimating function and associated information based on $m_t$ are $\mathbf{g}^*_M(\boldsymbol{\theta})=-(\mu_\varepsilon/\sigma^2_\varepsilon)\sum_{t=1}^{n}(1/\psi^2_t)(\partial\psi_t/\partial\boldsymbol{\theta})m_t$ and $\mathbf{I}_{\mathbf{g}^*_M}(\boldsymbol{\theta})=(\mu^2_\varepsilon/\sigma^2_\varepsilon)\sum_{t=1}^{n}(1/\psi^2_t)(\partial\psi_t/\partial\boldsymbol{\theta})(\partial\psi_t/\partial\boldsymbol{\theta}')$. The optimal estimating function and associated information based on $s_t$ are $\mathbf{g}^*_S(\boldsymbol{\theta})=-\big(2/(\sigma^2_\varepsilon(\kappa_\varepsilon+2))\big)\sum_{t=1}^{n}(1/\psi^3_t)(\partial\psi_t/\partial\boldsymbol{\theta})s_t$ and $\mathbf{I}_{\mathbf{g}^*_S}(\boldsymbol{\theta})=\big(4/(\kappa_\varepsilon+2)\big)\sum_{t=1}^{n}(1/\psi^2_t)(\partial\psi_t/\partial\boldsymbol{\theta})(\partial\psi_t/\partial\boldsymbol{\theta}')$. Then, by Corollary 2.2, the optimal quadratic estimating function and the associated information are
\[
\mathbf{g}^*_Q(\boldsymbol{\theta})=\frac{1}{\sigma^2_\varepsilon(\kappa_\varepsilon+2-\gamma^2_\varepsilon)}\sum_{t=1}^{n}\Big[\frac{-\mu_\varepsilon(\kappa_\varepsilon+2)+2\sigma_\varepsilon\gamma_\varepsilon}{\psi^2_t}\frac{\partial\psi_t}{\partial\boldsymbol{\theta}}m_t+\frac{\mu_\varepsilon\gamma_\varepsilon-2\sigma_\varepsilon}{\sigma_\varepsilon\psi^3_t}\frac{\partial\psi_t}{\partial\boldsymbol{\theta}}s_t\Big],
\]
\[
\mathbf{I}_{\mathbf{g}^*_Q}(\boldsymbol{\theta})=\Big(1-\frac{\gamma^2_\varepsilon}{\kappa_\varepsilon+2}\Big)^{-1}\Big[\mathbf{I}_{\mathbf{g}^*_M}(\boldsymbol{\theta})+\mathbf{I}_{\mathbf{g}^*_S}(\boldsymbol{\theta})-\frac{4\mu_\varepsilon\gamma_\varepsilon}{\sigma_\varepsilon(\kappa_\varepsilon+2)}\sum_{t=1}^{n}\frac{1}{\psi^2_t}\frac{\partial\psi_t}{\partial\boldsymbol{\theta}}\frac{\partial\psi_t}{\partial\boldsymbol{\theta}'}\Big]
=\frac{4\sigma^2_\varepsilon+\mu^2_\varepsilon(\kappa_\varepsilon+2)-4\mu_\varepsilon\sigma_\varepsilon\gamma_\varepsilon}{\sigma^2_\varepsilon(\kappa_\varepsilon+2-\gamma^2_\varepsilon)}\sum_{t=1}^{n}\frac{1}{\psi^2_t}\frac{\partial\psi_t}{\partial\boldsymbol{\theta}}\frac{\partial\psi_t}{\partial\boldsymbol{\theta}'},
\tag{3.3}
\]
the information gain in using $\mathbf{g}^*_Q(\boldsymbol{\theta})$ over $\mathbf{g}^*_M(\boldsymbol{\theta})$ is
\[
\frac{(2\sigma_\varepsilon-\mu_\varepsilon\gamma_\varepsilon)^2}{\sigma^2_\varepsilon(\kappa_\varepsilon+2-\gamma^2_\varepsilon)}\sum_{t=1}^{n}\frac{1}{\psi^2_t}\frac{\partial\psi_t}{\partial\boldsymbol{\theta}}\frac{\partial\psi_t}{\partial\boldsymbol{\theta}'},
\tag{3.4}
\]
and the information gain in using $\mathbf{g}^*_Q(\boldsymbol{\theta})$ over $\mathbf{g}^*_S(\boldsymbol{\theta})$ is
\[
\frac{\big(\mu_\varepsilon(\kappa_\varepsilon+2)-2\sigma_\varepsilon\gamma_\varepsilon\big)^2}{\sigma^2_\varepsilon(\kappa_\varepsilon+2-\gamma^2_\varepsilon)(\kappa_\varepsilon+2)}\sum_{t=1}^{n}\frac{1}{\psi^2_t}\frac{\partial\psi_t}{\partial\boldsymbol{\theta}}\frac{\partial\psi_t}{\partial\boldsymbol{\theta}'},
\tag{3.5}
\]
both of which are nonnegative definite.

When $\varepsilon_t$ follows an exponential distribution with rate $\lambda$, $\mu_\varepsilon=1/\lambda$, $\sigma^2_\varepsilon=1/\lambda^2$, $\gamma_\varepsilon=2$, and $\kappa_\varepsilon=6$ (the excess kurtosis of the exponential distribution). Then $\mathbf{I}_{\mathbf{g}^*_M}(\boldsymbol{\theta})=\sum_{t=1}^{n}(1/\psi^2_t)(\partial\psi_t/\partial\boldsymbol{\theta})(\partial\psi_t/\partial\boldsymbol{\theta}')$, $\mathbf{I}_{\mathbf{g}^*_S}(\boldsymbol{\theta})=(1/2)\sum_{t=1}^{n}(1/\psi^2_t)(\partial\psi_t/\partial\boldsymbol{\theta})(\partial\psi_t/\partial\boldsymbol{\theta}')$, and $\mathbf{I}_{\mathbf{g}^*_Q}(\boldsymbol{\theta})=\sum_{t=1}^{n}(1/\psi^2_t)(\partial\psi_t/\partial\boldsymbol{\theta})(\partial\psi_t/\partial\boldsymbol{\theta}')$; hence $\mathbf{I}_{\mathbf{g}^*_Q}(\boldsymbol{\theta})=\mathbf{I}_{\mathbf{g}^*_M}(\boldsymbol{\theta})>\mathbf{I}_{\mathbf{g}^*_S}(\boldsymbol{\theta})$.
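These exponential-case constants can be confirmed directly. The sketch below (illustrative) plugs the standard exponential moments (skewness 2, excess kurtosis 6) into the scalar coefficients that multiply $\sum_{t}(1/\psi^2_t)(\partial\psi_t/\partial\boldsymbol{\theta})(\partial\psi_t/\partial\boldsymbol{\theta}')$ in $\mathbf{I}_{\mathbf{g}^*_M}$, $\mathbf{I}_{\mathbf{g}^*_S}$, and (3.3):

```python
# Illustrative check of the exponential-ACD constants: with mu_e = sigma_e =
# 1/lam, gamma_e = 2, kappa_e = 6 (excess kurtosis of the exponential law),
# the coefficient of sum_t (1/psi^2) psi' psi'' in I_Q equals that in I_M,
# while I_S carries coefficient 1/2.
lam = 1.7                                   # arbitrary positive rate
mu_e, sig_e, gam_e, kap_e = 1 / lam, 1 / lam, 2.0, 6.0

c_M = mu_e**2 / sig_e**2                    # coefficient in I_M
c_S = 4.0 / (kap_e + 2.0)                   # coefficient in I_S
c_Q = (4 * sig_e**2 + mu_e**2 * (kap_e + 2) - 4 * mu_e * sig_e * gam_e) \
      / (sig_e**2 * (kap_e + 2 - gam_e**2))  # coefficient in I_Q, from (3.3)

assert abs(c_M - 1.0) < 1e-12
assert abs(c_S - 0.5) < 1e-12
assert abs(c_Q - c_M) < 1e-12               # I_Q = I_M for exponential durations
```

The result is invariant to the rate $\lambda$ because $\mu_\varepsilon/\sigma_\varepsilon=1$ for every exponential distribution.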

3.2. Random Coefficient Autoregressive Models

In this section, we investigate the properties of quadratic estimating functions for random coefficient autoregressive (RCA) time series models, which were first introduced by Nicholls and Quinn [11].

Consider the RCA model
\[
y_t=(\theta+b_t)y_{t-1}+\varepsilon_t,
\tag{3.6}
\]
where $\{b_t\}$ and $\{\varepsilon_t\}$ are uncorrelated zero-mean processes with variances $\sigma^2_b$ (unknown) and $\sigma^2_\varepsilon=\sigma^2_\varepsilon(\theta)$, depending on the unknown parameter $\theta$, respectively. Further, we denote the skewness and excess kurtosis of $\{b_t\}$ by $\gamma_b$ and $\kappa_b$, assumed known, and those of $\{\varepsilon_t\}$ by $\gamma_\varepsilon(\theta)$ and $\kappa_\varepsilon(\theta)$, respectively. In the model (3.6), both $\theta$ and $\beta=\sigma^2_b$ need to be estimated; letting $\boldsymbol{\theta}=(\theta,\beta)'$, we discuss the joint estimation of $\theta$ and $\beta$. In this model, the conditional mean is $\mu_t=\theta y_{t-1}$ and the conditional variance is $\sigma^2_t=\beta y^2_{t-1}+\sigma^2_\varepsilon(\theta)$; the parameter $\theta$ appears simultaneously in the mean and the variance. Let $m_t=y_t-\mu_t$ and $s_t=m^2_t-\sigma^2_t$, so that $\langle m\rangle_t=y^2_{t-1}\sigma^2_b+\sigma^2_\varepsilon$, $\langle s\rangle_t=y^4_{t-1}\sigma^4_b(\kappa_b+2)+\sigma^4_\varepsilon(\kappa_\varepsilon+2)+4y^2_{t-1}\sigma^2_b\sigma^2_\varepsilon$, and $\langle m,s\rangle_t=y^3_{t-1}\sigma^3_b\gamma_b+\sigma^3_\varepsilon\gamma_\varepsilon$. Then the conditional skewness is $\gamma_t=\langle m,s\rangle_t/\sigma^3_t$, and the conditional excess kurtosis is $\kappa_t=\langle s\rangle_t/\sigma^4_t-2$.

Since $\partial\mu_t/\partial\boldsymbol{\theta}=(y_{t-1},0)'$ and $\partial\sigma^2_t/\partial\boldsymbol{\theta}=(\partial\sigma^2_\varepsilon/\partial\theta,\,y^2_{t-1})'$, applying Theorem 2.1 shows that the optimal quadratic estimating function for $\theta$ and $\beta$ based on the martingale differences $m_t$ and $s_t$ is $\mathbf{g}^*_Q(\boldsymbol{\theta})=\sum_{t=1}^{n}(\mathbf{a}^*_{t-1}m_t+\mathbf{b}^*_{t-1}s_t)$, where
\[
\mathbf{a}^*_{t-1}=\Big(1-\frac{\langle m,s\rangle^2_t}{\langle m\rangle_t\langle s\rangle_t}\Big)^{-1}\Big(-\frac{y_{t-1}}{\langle m\rangle_t}+\frac{\partial\sigma^2_\varepsilon}{\partial\theta}\frac{\langle m,s\rangle_t}{\langle m\rangle_t\langle s\rangle_t},\;\;y^2_{t-1}\frac{\langle m,s\rangle_t}{\langle m\rangle_t\langle s\rangle_t}\Big)',
\]
\[
\mathbf{b}^*_{t-1}=\Big(1-\frac{\langle m,s\rangle^2_t}{\langle m\rangle_t\langle s\rangle_t}\Big)^{-1}\Big(y_{t-1}\frac{\langle m,s\rangle_t}{\langle m\rangle_t\langle s\rangle_t}-\frac{\partial\sigma^2_\varepsilon}{\partial\theta}\frac{1}{\langle s\rangle_t},\;\;-\frac{y^2_{t-1}}{\langle s\rangle_t}\Big)'.
\tag{3.7}
\]
Hence, the component quadratic estimating function for $\theta$ is
\[
g^*_Q(\theta)=\sum_{t=1}^{n}\Big(1-\frac{\langle m,s\rangle^2_t}{\langle m\rangle_t\langle s\rangle_t}\Big)^{-1}\Big[\Big(-\frac{y_{t-1}}{\langle m\rangle_t}+\frac{\partial\sigma^2_\varepsilon}{\partial\theta}\frac{\langle m,s\rangle_t}{\langle m\rangle_t\langle s\rangle_t}\Big)m_t+\Big(y_{t-1}\frac{\langle m,s\rangle_t}{\langle m\rangle_t\langle s\rangle_t}-\frac{\partial\sigma^2_\varepsilon}{\partial\theta}\frac{1}{\langle s\rangle_t}\Big)s_t\Big],
\tag{3.8}
\]
and the component quadratic estimating function for $\beta$ is
\[
g^*_Q(\beta)=\sum_{t=1}^{n}\Big(1-\frac{\langle m,s\rangle^2_t}{\langle m\rangle_t\langle s\rangle_t}\Big)^{-1}\Big(y^2_{t-1}\frac{\langle m,s\rangle_t}{\langle m\rangle_t\langle s\rangle_t}m_t-\frac{y^2_{t-1}}{\langle s\rangle_t}s_t\Big).
\tag{3.9}
\]
Moreover, the information matrix of the optimal quadratic estimating function for $\theta$ and $\beta$ is given by
\[
\mathbf{I}_{\mathbf{g}^*_Q}(\boldsymbol{\theta})=\begin{pmatrix}I_{\theta\theta}&I_{\theta\beta}\\ I_{\beta\theta}&I_{\beta\beta}\end{pmatrix},
\tag{3.10}
\]
where
\[
I_{\theta\theta}=\sum_{t=1}^{n}\Big(1-\frac{\langle m,s\rangle^2_t}{\langle m\rangle_t\langle s\rangle_t}\Big)^{-1}\Big[\frac{y^2_{t-1}}{\langle m\rangle_t}+\Big(\frac{\partial\sigma^2_\varepsilon}{\partial\theta}\Big)^2\frac{1}{\langle s\rangle_t}-2\frac{\partial\sigma^2_\varepsilon}{\partial\theta}y_{t-1}\frac{\langle m,s\rangle_t}{\langle m\rangle_t\langle s\rangle_t}\Big],
\tag{3.11}
\]
\[
I_{\theta\beta}=I_{\beta\theta}=\sum_{t=1}^{n}\Big(1-\frac{\langle m,s\rangle^2_t}{\langle m\rangle_t\langle s\rangle_t}\Big)^{-1}\Big(\frac{\partial\sigma^2_\varepsilon}{\partial\theta}\frac{1}{\langle s\rangle_t}-y_{t-1}\frac{\langle m,s\rangle_t}{\langle m\rangle_t\langle s\rangle_t}\Big)y^2_{t-1},
\tag{3.12}
\]
\[
I_{\beta\beta}=\sum_{t=1}^{n}\Big(1-\frac{\langle m,s\rangle^2_t}{\langle m\rangle_t\langle s\rangle_t}\Big)^{-1}\frac{y^4_{t-1}}{\langle s\rangle_t}.
\tag{3.13}
\]

In view of the parameter $\theta$ alone, the conditional least squares (CLS) estimating function and associated information are $g_{\mathrm{CLS}}(\theta)=\sum_{t=1}^{n}y_{t-1}m_t$ and $I_{\mathrm{CLS}}(\theta)=\big(\sum_{t=1}^{n}y^2_{t-1}\big)^2\big/\sum_{t=1}^{n}y^2_{t-1}\langle m\rangle_t$. The optimal martingale estimating function and associated information based on $m_t$ are $g^*_M(\theta)=-\sum_{t=1}^{n}y_{t-1}m_t/\langle m\rangle_t$ and $I_{g^*_M}(\theta)=\sum_{t=1}^{n}y^2_{t-1}/\langle m\rangle_t$. Moreover, the Cauchy-Schwarz inequality
\[
\Big(\sum_{t=1}^{n}y^2_{t-1}\langle m\rangle_t\Big)\Big(\sum_{t=1}^{n}\frac{y^2_{t-1}}{\langle m\rangle_t}\Big)\ge\Big(\sum_{t=1}^{n}y^2_{t-1}\Big)^2
\tag{3.14}
\]
implies that $I_{\mathrm{CLS}}(\theta)\le I_{g^*_M}(\theta)$; hence the optimal linear estimating function is more informative than the conditional least squares one. The optimal quadratic estimating function for $\theta$ based on the martingale differences $m_t$ and $s_t$ is given by (3.8), with information (3.11). By part (c) of Theorem 2.1, the information of $g^*_Q(\theta)$ is at least that of $g^*_M(\theta)$. Therefore, for the RCA model, $I_{\mathrm{CLS}}(\theta)\le I_{g^*_M}(\theta)\le I_{g^*_Q}(\theta)$, and hence the estimate obtained by solving the optimal quadratic estimating equation is more efficient than both the CLS estimate and the estimate obtained by solving the optimal linear estimating equation.
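The chain of inequalities can be illustrated on a simulated path. The sketch below (all parameter values hypothetical) uses Gaussian $b_t$ and $\varepsilon_t$, so $\gamma_b=\gamma_\varepsilon=0$ and $\kappa_b=\kappa_\varepsilon=0$, hence $\langle m,s\rangle_t=0$; for illustration it takes $\sigma^2_\varepsilon(\theta)=1+\theta^2$, so the variance also carries information about $\theta$:

```python
import random

# Simulation sketch (hypothetical values) for the RCA(1) model
# y_t = (theta + b_t) y_{t-1} + e_t with Gaussian b_t, e_t (gamma = kappa = 0,
# so <m,s>_t = 0) and sigma_e^2(theta) = 1 + theta^2.  Evaluates I_CLS(theta),
# I_{g*_M}(theta), and I_theta_theta of (3.11) on one path, illustrating
# I_CLS <= I_{g*_M} <= I_{g*_Q}.
random.seed(1)
theta, sig2_b, n = 0.4, 0.2, 4000
sig2_e = 1.0 + theta**2            # sigma_e^2(theta)
dsig2 = 2 * theta                  # d sigma_e^2 / d theta
y = [0.0]
for _ in range(n):
    b = random.gauss(0.0, sig2_b**0.5)
    y.append((theta + b) * y[-1] + random.gauss(0.0, sig2_e**0.5))

I_cls_num = I_cls_den = I_M = I_Q = 0.0
for t in range(1, n + 1):
    yl = y[t - 1]
    vm = yl**2 * sig2_b + sig2_e                       # <m>_t
    vs = (2 * (yl**2 * sig2_b)**2 + 2 * sig2_e**2
          + 4 * yl**2 * sig2_b * sig2_e)               # <s>_t, Gaussian case
    I_cls_num += yl**2
    I_cls_den += yl**2 * vm
    I_M += yl**2 / vm
    I_Q += yl**2 / vm + dsig2**2 / vs                  # (3.11) with <m,s>_t = 0

I_cls = I_cls_num**2 / I_cls_den
assert I_cls <= I_M <= I_Q
```

With symmetric errors the quadratic gain for $\theta$ comes entirely through $\partial\sigma^2_\varepsilon/\partial\theta$; if $\sigma^2_\varepsilon$ were free of $\theta$, $I_{g^*_Q}(\theta)$ would reduce to $I_{g^*_M}(\theta)$ here.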

3.3. Doubly Stochastic Time Series Model

The random coefficient autoregressive models discussed in the previous section are special cases of what Tjøstheim [12] refers to as doubly stochastic time series models. In the nonlinear case, these models are given by
\[
y_t=\theta_t f\big(t,\mathcal{F}^y_{t-1}\big)+\varepsilon_t,
\tag{3.15}
\]
where $\{\theta+b_t\}$ of (3.6) is replaced by a more general stochastic sequence $\{\theta_t\}$, and $y_{t-1}$ is replaced by a function $f(t,\mathcal{F}^y_{t-1})$ of the past. Suppose that $\{\theta_t\}$ is a moving average sequence of the form
\[
\theta_t=\theta+a_t+a_{t-1},
\tag{3.16}
\]
where $\{a_t\}$ consists of square-integrable independent random variables with mean zero and variance $\sigma^2_a$. We further assume that $\{\varepsilon_t\}$ and $\{a_t\}$ are independent, and we write $\sigma^2_e(\theta)$ for the variance of $\varepsilon_t$. Then $\mathrm{E}[y_t\mid\mathcal{F}^y_{t-1}]$ depends on the posterior mean $u_t=\mathrm{E}[a_t\mid\mathcal{F}^y_t]$ and variance $v_t=\mathrm{E}[(a_t-u_t)^2\mid\mathcal{F}^y_t]$ of $a_t$. Under the normality assumption on $\{\varepsilon_t\}$ and $\{a_t\}$, and the initial condition $y_0=0$, $u_t$ and $v_t$ satisfy the following Kalman-like recursive algorithms (see [13, page 439]):
\[
u_t(\theta)=\frac{\sigma^2_a f\big(t,\mathcal{F}^y_{t-1}\big)\big(y_t-(\theta+u_{t-1})f\big(t,\mathcal{F}^y_{t-1}\big)\big)}{\sigma^2_e(\theta)+f^2\big(t,\mathcal{F}^y_{t-1}\big)\big(\sigma^2_a+v_{t-1}\big)},\qquad
v_t(\theta)=\sigma^2_a-\frac{\sigma^4_a f^2\big(t,\mathcal{F}^y_{t-1}\big)}{\sigma^2_e(\theta)+f^2\big(t,\mathcal{F}^y_{t-1}\big)\big(\sigma^2_a+v_{t-1}\big)},
\tag{3.17}
\]
where $u_0=0$ and $v_0=\sigma^2_a$. Hence, the conditional mean and variance of $y_t$ are given by
\[
\mu_t(\theta)=\big(\theta+u_{t-1}(\theta)\big)f\big(t,\mathcal{F}^y_{t-1}\big),\qquad
\sigma^2_t(\theta)=\sigma^2_e(\theta)+f^2\big(t,\mathcal{F}^y_{t-1}\big)\big(\sigma^2_a+v_{t-1}(\theta)\big),
\tag{3.18}
\]
which can be computed recursively.
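The recursions (3.17)-(3.18) are straightforward to implement. The sketch below is a direct transcription, taking $f(t,\mathcal{F}^y_{t-1})=y_{t-1}$ (the RCA-type special case) and hypothetical values of $\theta$, $\sigma^2_e$, $\sigma^2_a$:

```python
# Sketch of the Kalman-like recursions (3.17)-(3.18) for the posterior mean
# u_t and variance v_t, with the hypothetical choice f(t, F_{t-1}) = y_{t-1};
# all parameter values are illustrative.
def filter_moments(y, theta, sig2_e, sig2_a):
    """Return lists of conditional means mu_t and variances sig2_t of y_t."""
    u, v = 0.0, sig2_a                 # u_0 = 0, v_0 = sigma_a^2
    mus, sig2s = [], []
    y_prev = 0.0                       # initial condition y_0 = 0
    for y_t in y:
        f = y_prev
        mus.append((theta + u) * f)    # mu_t = (theta + u_{t-1}) f
        denom = sig2_e + f * f * (sig2_a + v)
        sig2s.append(denom)            # sig2_t = sig2_e + f^2 (sig2_a + v_{t-1})
        # Kalman-like updates (3.17):
        u_new = sig2_a * f * (y_t - (theta + u) * f) / denom
        v_new = sig2_a - sig2_a**2 * f * f / denom
        u, v = u_new, v_new
        y_prev = y_t
    return mus, sig2s

mus, sig2s = filter_moments([0.5, -0.2, 0.9, 0.1], theta=0.3,
                            sig2_e=1.0, sig2_a=0.25)
assert mus[0] == 0.0 and sig2s[0] == 1.0   # y_0 = 0 makes f = 0 at t = 1
assert all(s2 >= 1.0 for s2 in sig2s)      # sig2_t >= sig2_e since v_t >= 0
```

Since $v_t=\sigma^2_a(1-\sigma^2_af^2/D_t)\ge 0$ (the denominator $D_t$ dominates $f^2\sigma^2_a$), the filtered variance $\sigma^2_t$ never falls below $\sigma^2_e$.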

Let $m_t=y_t-\mu_t$ and $s_t=m^2_t-\sigma^2_t$; then $\{m_t\}$ and $\{s_t\}$ are sequences of martingale differences. It can be shown that $\langle m,s\rangle_t=0$, $\langle m\rangle_t=\sigma^2_e(\theta)+f^2(t,\mathcal{F}^y_{t-1})(\sigma^2_a+v_{t-1}(\theta))$, and $\langle s\rangle_t=2\sigma^4_e(\theta)+4f^2(t,\mathcal{F}^y_{t-1})\sigma^2_e(\theta)(\sigma^2_a+v_{t-1}(\theta))+2f^4(t,\mathcal{F}^y_{t-1})(\sigma^2_a+v_{t-1}(\theta))^2=2\langle m\rangle^2_t$. The optimal estimating function and associated information based on $m_t$ are given by
\[
g^*_M(\theta)=-\sum_{t=1}^{n}f\big(t,\mathcal{F}^y_{t-1}\big)\Big(1+\frac{\partial u_{t-1}(\theta)}{\partial\theta}\Big)\frac{m_t}{\langle m\rangle_t},\qquad
I_{g^*_M}(\theta)=\sum_{t=1}^{n}f^2\big(t,\mathcal{F}^y_{t-1}\big)\Big(1+\frac{\partial u_{t-1}(\theta)}{\partial\theta}\Big)^2\frac{1}{\langle m\rangle_t}.
\tag{3.19}
\]
Then, the Cauchy-Schwarz inequality
\[
\Big(\sum_{t=1}^{n}f^2\big(t,\mathcal{F}^y_{t-1}\big)\Big(1+\frac{\partial u_{t-1}(\theta)}{\partial\theta}\Big)^2\langle m\rangle_t\Big)
\Big(\sum_{t=1}^{n}f^2\big(t,\mathcal{F}^y_{t-1}\big)\Big(1+\frac{\partial u_{t-1}(\theta)}{\partial\theta}\Big)^2\frac{1}{\langle m\rangle_t}\Big)
\ge\Big(\sum_{t=1}^{n}f^2\big(t,\mathcal{F}^y_{t-1}\big)\Big(1+\frac{\partial u_{t-1}(\theta)}{\partial\theta}\Big)^2\Big)^2
\tag{3.20}
\]
implies that
\[
I_{\mathrm{CLS}}(\theta)=\frac{\Big(\sum_{t=1}^{n}f^2\big(t,\mathcal{F}^y_{t-1}\big)\big(1+\partial u_{t-1}(\theta)/\partial\theta\big)^2\Big)^2}{\sum_{t=1}^{n}f^2\big(t,\mathcal{F}^y_{t-1}\big)\big(1+\partial u_{t-1}(\theta)/\partial\theta\big)^2\langle m\rangle_t}\le I_{g^*_M}(\theta),
\tag{3.21}
\]
that is, the optimal linear estimating function $g^*_M(\theta)$ is more informative than the conditional least squares estimating function $g_{\mathrm{CLS}}(\theta)$.

The optimal estimating function and associated information based on $s_t$ are given by
\[
g^*_S(\theta)=-\sum_{t=1}^{n}\Big(\frac{\partial\sigma^2_e(\theta)}{\partial\theta}+f^2\big(t,\mathcal{F}^y_{t-1}\big)\frac{\partial v_{t-1}(\theta)}{\partial\theta}\Big)\frac{s_t}{\langle s\rangle_t},\qquad
I_{g^*_S}(\theta)=\sum_{t=1}^{n}\Big(\frac{\partial\sigma^2_e(\theta)}{\partial\theta}+f^2\big(t,\mathcal{F}^y_{t-1}\big)\frac{\partial v_{t-1}(\theta)}{\partial\theta}\Big)^2\frac{1}{\langle s\rangle_t}.
\tag{3.22}
\]
Hence, by Theorem 2.1 (with $\langle m,s\rangle_t=0$, so that $g^*_Q=g^*_M+g^*_S$), the optimal quadratic estimating function is given by
\[
g^*_Q(\theta)=-\sum_{t=1}^{n}\frac{1}{\sigma^2_t(\theta)}\Big[f\big(t,\mathcal{F}^y_{t-1}\big)\Big(1+\frac{\partial u_{t-1}(\theta)}{\partial\theta}\Big)m_t+\frac{\partial\sigma^2_e(\theta)/\partial\theta+f^2\big(t,\mathcal{F}^y_{t-1}\big)\partial v_{t-1}(\theta)/\partial\theta}{2\sigma^2_t(\theta)}s_t\Big],
\tag{3.23}
\]
where $\sigma^2_t(\theta)=\sigma^2_e(\theta)+f^2(t,\mathcal{F}^y_{t-1})(\sigma^2_a+v_{t-1}(\theta))$, and the associated information, $I_{g^*_Q}(\theta)=I_{g^*_M}(\theta)+I_{g^*_S}(\theta)$, is given by
\[
I_{g^*_Q}(\theta)=\sum_{t=1}^{n}\frac{1}{\sigma^2_t(\theta)}\Big[f^2\big(t,\mathcal{F}^y_{t-1}\big)\Big(1+\frac{\partial u_{t-1}(\theta)}{\partial\theta}\Big)^2+\frac{\big(\partial\sigma^2_e(\theta)/\partial\theta+f^2\big(t,\mathcal{F}^y_{t-1}\big)\partial v_{t-1}(\theta)/\partial\theta\big)^2}{2\sigma^2_t(\theta)}\Big].
\tag{3.24}
\]
The information of $g^*_Q$ clearly dominates that of $g^*_M$ and of $g^*_S$, and hence the estimate obtained by solving the optimal quadratic estimating equation is more efficient than the CLS estimate and the estimate obtained by solving the optimal linear estimating equation. Moreover, writing $D_t(\theta)=\sigma^2_e(\theta)+f^2(t,\mathcal{F}^y_{t-1})(\sigma^2_a+v_{t-1}(\theta))$, the relations
\[
\frac{\partial u_t(\theta)}{\partial\theta}
=-\frac{f^2\big(t,\mathcal{F}^y_{t-1}\big)\sigma^2_a\big(1+\partial u_{t-1}(\theta)/\partial\theta\big)}{D_t(\theta)}
-\frac{\sigma^2_a f\big(t,\mathcal{F}^y_{t-1}\big)\big(y_t-f\big(t,\mathcal{F}^y_{t-1}\big)\big(\theta+u_{t-1}(\theta)\big)\big)\big(\partial\sigma^2_e(\theta)/\partial\theta+f^2\big(t,\mathcal{F}^y_{t-1}\big)\partial v_{t-1}(\theta)/\partial\theta\big)}{D^2_t(\theta)},
\]
\[
\frac{\partial v_t(\theta)}{\partial\theta}
=\frac{\sigma^4_a f^2\big(t,\mathcal{F}^y_{t-1}\big)\big(\partial\sigma^2_e(\theta)/\partial\theta+f^2\big(t,\mathcal{F}^y_{t-1}\big)\partial v_{t-1}(\theta)/\partial\theta\big)}{D^2_t(\theta)}
\tag{3.25}
\]
can be applied to calculate the estimating functions and associated information recursively.

3.4. Regression Model with ARCH Errors

Consider a regression model with ARCH($s$) errors $\varepsilon_t$ of the form
\[
y_t=\mathbf{x}_t\boldsymbol{\beta}+\varepsilon_t,
\tag{3.26}
\]
where $\mathbf{x}_t$ is a $1\times r$ vector of covariates, $\mathrm{E}[\varepsilon_t\mid\mathcal{F}^y_{t-1}]=0$, and $\mathrm{Var}(\varepsilon_t\mid\mathcal{F}^y_{t-1})=h_t=\alpha_0+\alpha_1\varepsilon^2_{t-1}+\cdots+\alpha_s\varepsilon^2_{t-s}$. In this model, the conditional mean is $\mu_t=\mathbf{x}_t\boldsymbol{\beta}$, the conditional variance is $\sigma^2_t=h_t$, and the conditional skewness and excess kurtosis are assumed to be constants $\gamma$ and $\kappa$, respectively. Writing $\mathbf{z}_t=(1,\varepsilon^2_{t-1},\dots,\varepsilon^2_{t-s})'$ and $\mathbf{w}_t=\sum_{j=1}^{s}\alpha_j\varepsilon_{t-j}\mathbf{x}'_{t-j}$, we have $\partial\mu_t/\partial\boldsymbol{\beta}=\mathbf{x}'_t$, $\partial h_t/\partial\boldsymbol{\beta}=-2\mathbf{w}_t$, and $\partial h_t/\partial\boldsymbol{\alpha}=\mathbf{z}_t$. It follows from Corollary 2.2 that the optimal component quadratic estimating functions for the parameter vector $\boldsymbol{\theta}=(\beta_1,\dots,\beta_r,\alpha_0,\dots,\alpha_s)'=(\boldsymbol{\beta}',\boldsymbol{\alpha}')'$ are
\[
\mathbf{g}^*_Q(\boldsymbol{\beta})=\frac{1}{\kappa+2-\gamma^2}\sum_{t=1}^{n}\frac{1}{h^2_t}\Big[\Big(-h_t(\kappa+2)\mathbf{x}'_t-2h^{1/2}_t\gamma\,\mathbf{w}_t\Big)m_t+\Big(h^{1/2}_t\gamma\,\mathbf{x}'_t+2\mathbf{w}_t\Big)s_t\Big],
\]
\[
\mathbf{g}^*_Q(\boldsymbol{\alpha})=\frac{1}{\kappa+2-\gamma^2}\sum_{t=1}^{n}\frac{1}{h^2_t}\Big(h^{1/2}_t\gamma\,\mathbf{z}_t m_t-\mathbf{z}_t s_t\Big).
\tag{3.27}
\]
Moreover, the information matrix for $\boldsymbol{\theta}=(\boldsymbol{\beta}',\boldsymbol{\alpha}')'$ is given by
\[
\mathbf{I}=\Big(1-\frac{\gamma^2}{\kappa+2}\Big)^{-1}\begin{pmatrix}\mathbf{I}_{\boldsymbol{\beta\beta}}&\mathbf{I}_{\boldsymbol{\beta\alpha}}\\ \mathbf{I}_{\boldsymbol{\alpha\beta}}&\mathbf{I}_{\boldsymbol{\alpha\alpha}}\end{pmatrix},
\tag{3.28}
\]
where
\[
\mathbf{I}_{\boldsymbol{\beta\beta}}=\sum_{t=1}^{n}\Big[\frac{\mathbf{x}'_t\mathbf{x}_t}{h_t}+\frac{4\mathbf{w}_t\mathbf{w}'_t}{h^2_t(\kappa+2)}+\frac{2\gamma\big(\mathbf{x}'_t\mathbf{w}'_t+\mathbf{w}_t\mathbf{x}_t\big)}{h^{3/2}_t(\kappa+2)}\Big],
\]
\[
\mathbf{I}_{\boldsymbol{\beta\alpha}}=\mathbf{I}'_{\boldsymbol{\alpha\beta}}=-\sum_{t=1}^{n}\Big(\frac{\gamma\,\mathbf{x}'_t}{h^{3/2}_t(\kappa+2)}+\frac{2\mathbf{w}_t}{h^2_t(\kappa+2)}\Big)\mathbf{z}'_t,\qquad
\mathbf{I}_{\boldsymbol{\alpha\alpha}}=\sum_{t=1}^{n}\frac{\mathbf{z}_t\mathbf{z}'_t}{h^2_t(\kappa+2)}.
\tag{3.29}
\]

It is of interest to note that when the $\{\varepsilon_t\}$ are conditionally Gaussian, so that $\gamma=0$, $\kappa=0$, and
\[
\mathrm{E}\Big[\frac{\mathbf{w}_t\mathbf{z}'_t}{h^2_t(\kappa+2)}\Big]=\mathbf{0},\qquad \mathbf{w}_t=\sum_{j=1}^{s}\alpha_j\varepsilon_{t-j}\mathbf{x}'_{t-j},\quad \mathbf{z}_t=\big(1,\varepsilon^2_{t-1},\dots,\varepsilon^2_{t-s}\big)',
\tag{3.30}
\]
the optimal quadratic estimating functions for $\boldsymbol{\beta}$ and $\boldsymbol{\alpha}$, based on the martingale differences $m_t=y_t-\mathbf{x}_t\boldsymbol{\beta}$ and $s_t=m^2_t-h_t$, reduce to
\[
\mathbf{g}^*_Q(\boldsymbol{\beta})=-\sum_{t=1}^{n}\frac{1}{h^2_t}\big(h_t\mathbf{x}'_tm_t-\mathbf{w}_ts_t\big),\qquad
\mathbf{g}^*_Q(\boldsymbol{\alpha})=-\frac{1}{2}\sum_{t=1}^{n}\frac{1}{h^2_t}\mathbf{z}_ts_t.
\tag{3.31}
\]
Moreover, the information matrix for $\boldsymbol{\theta}=(\boldsymbol{\beta}',\boldsymbol{\alpha}')'$ in (3.28) has $\mathbf{I}_{\boldsymbol{\beta\alpha}}=\mathbf{I}_{\boldsymbol{\alpha\beta}}=\mathbf{0}$ and
\[
\mathbf{I}_{\boldsymbol{\beta\beta}}=\sum_{t=1}^{n}\frac{h_t\mathbf{x}'_t\mathbf{x}_t+2\mathbf{w}_t\mathbf{w}'_t}{h^2_t},\qquad
\mathbf{I}_{\boldsymbol{\alpha\alpha}}=\sum_{t=1}^{n}\frac{\mathbf{z}_t\mathbf{z}'_t}{2h^2_t}.
\tag{3.32}
\]
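The Gaussian-case estimating functions are unbiased, which can be checked by simulation. The sketch below (all parameter values hypothetical; scalar covariate $x_t=1$ and ARCH(1) errors for simplicity) simulates the model, evaluates the component estimating functions at the true parameters, and confirms that their averages are near zero:

```python
import random

# Sketch for the Gaussian case: simulate y_t = x_t*beta + e_t with ARCH(1)
# errors, then evaluate the quadratic estimating-function summands for beta
# and for alpha_0 at the true parameters; since m_t and s_t are conditionally
# centered, the averages should be near zero.  Values are illustrative.
random.seed(7)
beta, a0, a1, n = 2.0, 0.5, 0.3, 20000
eps_prev, g_beta, g_alpha0 = 0.0, 0.0, 0.0
for _ in range(n):
    h = a0 + a1 * eps_prev**2                  # conditional variance h_t
    eps = random.gauss(0.0, h**0.5)
    m = eps                                    # m_t = y_t - x_t*beta
    s = m * m - h                              # s_t = m_t^2 - h_t
    w = a1 * eps_prev * 1.0                    # w_t with x_{t-1} = 1
    g_beta += -(h * 1.0 * m - w * s) / h**2    # beta summand
    g_alpha0 += -s / (2 * h**2)                # first component of alpha summand
    eps_prev = eps

assert abs(g_beta / n) < 0.08
assert abs(g_alpha0 / n) < 0.08
```

Unbiasedness ($\mathrm{E}[m_t\mid\mathcal{F}^y_{t-1}]=\mathrm{E}[s_t\mid\mathcal{F}^y_{t-1}]=0$) guarantees the zero-mean property of any such linear-in-$(m_t,s_t)$ estimating function; optimality is what singles out the weights above.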

4. Conclusions

In this paper, we use appropriate martingale differences to derive the general form of the optimal quadratic estimating function for the multiparameter case with dependent observations. We also show that the optimal quadratic estimating function is more informative than the estimating function used in Thavaneswaran and Abraham [2]. Following Lindsay [8], we conclude that the resulting estimates are in general more efficient. Examples based on ACD models, RCA models, doubly stochastic models, and the regression model with ARCH errors are discussed in some detail. For the RCA and doubly stochastic models, we have also shown the superiority of the approach over the CLS method.