Bayesian Semiparametric Double Autoregressive Modeling
This paper proposes a Bayesian semiparametric modeling approach for the return distribution in double autoregressive models. Monte Carlo investigation of finite sample properties and an empirical application are presented. The results indicate that the semiparametric model developed in this paper is valuable and competitive.
Classical Bayesian theory is parametric. Parametric models have long been applied in both classical and Bayesian statistical inference, with estimation based on the unknown parameters of the population distribution. In a review of Bayesian statistics, Lindley  identified nonparametric methods as a notable area in which Bayesian progress had been lacking. Our purpose here is to present work with mixture priors that contributes substantially to both parametric and nonparametric density estimation. With the development of statistical theory, statisticians have found it necessary to relax assumptions about the population distribution in many cases. We find that volatility estimates from nonparametric models can differ dramatically from those based on a normal return distribution when there is evidence of a heavy-tailed return distribution. When a normal or Weibull distribution is used, we generally need to assume that the population distribution is unimodal (see ). The support set must also be sufficiently large to capture the fat tail. We find that the Dirichlet process mixture (DPM) model can adapt to several frequently used distributions and accurately estimates the posterior distribution of return volatilities without assuming a specific parametric form for the underlying distribution.
Mixture models are noted for their flexibility and are widely used in the statistical literature. Combined with Bayesian approaches, nonparametric methods have matured into well-developed tools for density estimation, regression, survival analysis, hierarchical modeling, and model validation . The Dirichlet process (DP) is a nonparametric prior often used in density estimation. Ferguson  introduced the Dirichlet process as a prior on random probability measures. He noted that, if a mixture distribution is used to approximate an unknown distribution, the posterior inference must be “controlled metaphysically” and the support set must be sufficiently large. Escobar and West  were the first to propose a DP-based mixture modeling framework that accounts for uncertainty in the number of components. The wide application of DPM priors is one of the major success stories of modern Bayesian statistics.
A central property of many economic and financial time series is that their volatility varies over time. Describing the volatility of an asset is a key issue in financial economics. The most popular class of models for time-varying volatility is the GARCH family . GARCH models are commonly used to describe, estimate, and predict the dynamics of financial returns. Recent surveys of the GARCH literature can be found in Davidson  and Rombouts et al. .

As an alternative to Engle’s  ARCH model, the double autoregressive (DAR) model is also drawing attention from researchers. It is a special case of the ARMA-ARCH models in Weiss  and an example of the weak ARMA models in Francq and Zakoïan [11, 12]. Assuming normally distributed disturbances, Ling  studied the structure of this model and its maximum likelihood estimation.

In addition, stochastic volatility (SV) models have enjoyed great popularity in the analysis of financial data over the last few decades (see , etc.). Delatola and Griffin  presented a method for Bayesian nonparametric analysis of the return distribution in a stochastic volatility model. Their method allows noninformative prior distributions for the parameters and assumes that the mixing component variances are constant. They  later extended this work to a Bayesian semiparametric model with a leverage effect. This model accommodates stylized facts such as heavy-tailed returns and correlation between returns and changes in volatility. Jensen  developed a semiparametric Bayesian estimator of the long-memory stochastic volatility model based on a Markov chain Monte Carlo (MCMC) algorithm. Jensen and Maheu  extended the existing fully parametric Bayesian literature on stochastic volatility to allow for more general return distributions.
Jensen and Maheu [19, 20] proposed a Bayesian nonparametric modeling approach for the return distribution in multivariate GARCH (MGARCH) models. Virbickaitė  considered an asymmetric dynamic conditional correlation (ADCC) model for time-varying correlations, with GJR-GARCH models for the individual volatilities (denoted ADCC-GJR-GARCH). This model provides a substantially more realistic evaluation of the comovements of asset returns than standard symmetric MGARCH models. Ausín et al.  developed a Bayesian semiparametric approach that models the innovations with the class of scale mixtures of Gaussian distributions, with a DP prior on the mixing distribution. Jensen and Maheu [19, 20] extended asymmetric stochastic volatility models by modeling the return distribution nonparametrically. The novelty of their work lies in modeling this distribution as an infinite mixture of normals whose unknown mixing distribution has a DP prior, that is, a DPM prior. This motivated us to develop a DAR model with a DPM prior (DAR-DPM).
The remainder of this paper is organized as follows. In Section 2, we describe the DP mixture and introduce the DAR model with a nonparametric DPM prior for the unknown innovation distribution. Section 3 presents the MCMC scheme used to sample from the posterior distributions of the model parameters. Section 4 provides simulation studies of the performance of the proposed method. An empirical example is reported in Section 5, and Section 6 concludes the paper.
2. DAR Models with Dirichlet Process Mixture Prior
The usual structure of a DAR model  assumes that a return series, denoted by $\{y_t\}$, can be written as follows:

$$y_t=\sum_{i=1}^{p}\phi_i y_{t-i}+\eta_t\sqrt{h_t},\qquad h_t=\omega+\sum_{i=1}^{p}\alpha_i y_{t-i}^2, \qquad (1)$$

where:

- $h_t$ is the conditional variance of $y_t$ given the past history $\mathcal{F}_{t-1}$;
- $\omega>0$, and $\alpha_i\ge 0$ for $i=1,\dots,p$;
- $\{\eta_t\}$ is a sequence of independent and identically distributed (IID) random variables with zero mean and unit variance;
- $\eta_t$ is independent of $\{y_s: s<t\}$.

Ling  gave the conditions for stationarity and ergodicity of the general model. He also obtained necessary and sufficient conditions for the strict and weak stationarity of the DAR model.

Let $A_t$ be the random $p\times p$ matrix given in Ling , whose last $p-1$ rows are $(I_{p-1},\,0)$. Here $I_{p-1}$ is the $(p-1)\times(p-1)$ identity matrix and $0$ is the $(p-1)\times 1$ zero vector. To present the results, we need the Lyapunov exponent, which is defined as

$$\gamma=\lim_{t\to\infty}\frac{1}{t}E\ln\|A_tA_{t-1}\cdots A_1\|.$$

By the subadditive ergodic theorem , $\gamma=\lim_{t\to\infty}t^{-1}\ln\|A_tA_{t-1}\cdots A_1\|$ almost surely. Ling  gave a theorem stating that the necessary and sufficient condition for the existence of a strictly stationary solution of the DAR model is $\gamma<0$. When $p=1$, the condition reduces to $E\ln|\phi_1+\sqrt{\alpha_1}\,\eta_t|<0$. From the study of Lu and Jiang , the sufficient condition for the weak stationarity of the model is .
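As a numerical illustration, the following sketch checks the strict-stationarity condition for the DAR(1) model $y_t=\phi y_{t-1}+\eta_t\sqrt{\omega+\alpha y_{t-1}^2}$, namely $E\ln|\phi+\sqrt{\alpha}\,\eta_t|<0$. It estimates the expectation by Monte Carlo and simulates a path. It is a minimal sketch, not the authors' code. The standard normal choice for $\eta_t$ and the function names `dar1_lyapunov` and `simulate_dar1` are our assumptions.

```python
import math
import random
from typing import List


def dar1_lyapunov(phi: float, alpha: float, n_draws: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of E ln|phi + sqrt(alpha) * eta| with eta ~ N(0, 1).

    A negative value indicates that the DAR(1) model is strictly stationary.
    """
    rng = random.Random(seed)
    root_alpha = math.sqrt(alpha)
    total = 0.0
    for _ in range(n_draws):
        total += math.log(abs(phi + root_alpha * rng.gauss(0.0, 1.0)))
    return total / n_draws


def simulate_dar1(phi: float, omega: float, alpha: float, n: int,
                  burn_in: int = 500, seed: int = 1) -> List[float]:
    """Simulate y_t = phi*y_{t-1} + eta_t*sqrt(omega + alpha*y_{t-1}^2), eta_t ~ N(0, 1)."""
    rng = random.Random(seed)
    y_prev, path = 0.0, []
    for t in range(n + burn_in):
        h_t = omega + alpha * y_prev ** 2          # conditional variance given y_{t-1}
        y_prev = phi * y_prev + rng.gauss(0.0, 1.0) * math.sqrt(h_t)
        if t >= burn_in:                            # drop the start-up transient
            path.append(y_prev)
    return path


if __name__ == "__main__":
    # alpha = 1 implies an infinite variance, yet the model is still strictly stationary.
    print(dar1_lyapunov(0.0, 1.0))   # close to -(Euler's gamma + ln 2) / 2
    print(dar1_lyapunov(0.0, 9.0))   # positive: no strictly stationary solution
```

Strict stationarity is weaker than weak stationarity. For $p=1$, a finite variance additionally requires $\phi^2+\alpha<1$, in which case $\operatorname{Var}(y_t)=\omega/(1-\phi^2-\alpha)$. The simulated path can be used to check this.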
As noted by Ausín et al. , typical models for the innovation distribution include the Gaussian, Student-t, and Gaussian mixture distributions. The purpose of this paper is to construct a robust alternative to these distributional assumptions. In general, the unit-variance restriction on  in (1) makes semiparametric Bayesian inference difficult. To avoid this, we rewrite (1) as follows: where  is a rescaled volatility and , and  are sequences of IID random variables with mean 0 and variance .
We relax all assumptions concerning the distribution of  in (5), allowing it to be completely unknown and random. That is, the distribution is treated as an unknown in addition to the parameters and latent volatilities of the DAR model. This strategy suggests a Gaussian scale mixture model with density function  where:

- denotes the density function of the Gaussian distribution with mean zero and variance ;
- is a random probability measure with a DP prior;
- is the base probability measure;
- is a strength parameter.

The distribution of  is therefore governed by the DP prior. Employing a Lo -type DPM prior, we assume that  to model the unknown distribution, where  Inv-Gamma and  Gamma. In (7)-(9),  are normally distributed with mean zero but a random variance , distributed as . The nonnegative value  is the strength parameter, and  is the base distribution.

By Sethuraman's constructive definition of the DP ,  is almost surely a discrete distribution . Here  is a degenerate distribution (point mass) at , and the random variance  is drawn from the base distribution of the DP. To ensure conjugacy,  is a gamma distribution with parameters  and , i.e.,  follows an Inv-Gamma  distribution. The probability of  being equal to a particular  is , which satisfies  under , Beta.
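The stick-breaking representation can be sketched as follows. The sketch assumes an Inv-Gamma$(a_0, b_0)$ base distribution and writes the DP strength parameter as `M`, our own symbol, chosen to avoid a clash with the DAR coefficients. It truncates the infinite sum at a fixed number of atoms. This is a numerical approximation, not part of the model itself. The function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def stick_breaking_dp(M: float, a0: float, b0: float, n_atoms: int,
                      rng: random.Random) -> Tuple[List[float], List[float]]:
    """Truncated Sethuraman draw of G ~ DP(M, Inv-Gamma(a0, b0)).

    Returns the weights w_j and atoms lambda_j of G = sum_j w_j * delta_{lambda_j}.
    """
    weights: List[float] = []
    atoms: List[float] = []
    remaining = 1.0                                   # prod_{l<j} (1 - v_l)
    for _ in range(n_atoms):
        v = rng.betavariate(1.0, M)                   # v_j ~ Beta(1, M)
        weights.append(v * remaining)                 # w_j = v_j * prod_{l<j} (1 - v_l)
        remaining *= 1.0 - v
        # lambda_j ~ Inv-Gamma(a0, b0), i.e. 1/lambda_j ~ Gamma(shape=a0, rate=b0)
        atoms.append(1.0 / rng.gammavariate(a0, 1.0 / b0))
    return weights, atoms


def sample_innovations(weights: Sequence[float], atoms: Sequence[float], n: int,
                       rng: random.Random) -> List[float]:
    """Draw lambda_t ~ G, then eps_t | lambda_t ~ N(0, lambda_t)."""
    # random.choices draws in proportion to the weights, so the truncated
    # weights need not sum exactly to one.
    lams = rng.choices(atoms, weights=weights, k=n)
    return [rng.gauss(0.0, math.sqrt(lam)) for lam in lams]


def mixture_density(x: float, weights: Sequence[float], atoms: Sequence[float]) -> float:
    """f(x) = sum_j w_j N(x | 0, lambda_j), with the truncated weights renormalized."""
    total = sum(weights)
    return sum(w * math.exp(-0.5 * x * x / lam) / math.sqrt(2.0 * math.pi * lam)
               for w, lam in zip(weights, atoms)) / total
```

A small `M` concentrates the weight on a few atoms, giving a handful of variance regimes. A large `M` spreads it over many atoms.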
The DAR-DPM model above can also be written in terms of latent allocation variables , where  means that  equals a particular . Under the DP prior, the distribution of  is , where the probability weights  are defined by (10). Incorporating  into the definition of the DAR-DPM model, we rewrite the model defined in (5), (7), (8), and (9) as follows: Finally, to complete the Bayesian specification of the model, we also need to specify prior distributions for the parameters of the DAR-DPM-type models.
3. Bayesian Inference for DAR-DPM-Type Models
This section describes how to perform Bayesian inference for DAR-DPM-type models. Given an observed time series and the priors defined in the previous section, we consider the joint posterior distribution of the model parameters. Unfortunately, the joint posterior distribution does not have a closed analytical expression. To this end, the MCMC algorithms of the DPM model in Escobar and West , which produce samples from a Markov chain whose stationary distribution is the joint posterior distribution of the model parameters, are employed.
Let , where  and  together constitute the full set of model parameters, and let  represent the set of parameters excluding . We block the unknown parameters so that the conditional posterior distribution of each block is either known or of a tractable form. Given the current values of the other parameters and latent variables, we construct a Markov chain by sampling each block from its conditional posterior distribution in turn.
3.1. Sample from
can be sampled using a two-step method based on the Pólya urn scheme . Set , where . The posterior distribution of  can be written as where  By the rules of conditional probability,  whose density function is  Assuming  is known, the posterior distribution is  where , , are the distinct values of .
The two steps are sampled as follows.
Step 1.  is the number of distinct values among . If  is drawn to be 0, a new component is created:  increases by 1, and the new value  is drawn from (15). Otherwise,  is set to the sampled existing value .
Step 2. Given the configuration obtained in Step 1, the values  drawn in Step 1 are discarded. New distinct values , , are then drawn group by group from the following posterior distribution:
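A minimal sketch of the two steps, in the spirit of Escobar and West, is given below. It rests on our own assumptions:

- standardized residuals `eps` follow a $N(0, \lambda_t)$ kernel;
- the base distribution is Inv-Gamma$(a_0, b_0)$;
- the DP strength parameter is written `M`.

With this conjugate base, opening a new component involves a Student-t marginal, and each group's variance has an inverse-gamma full conditional. The residuals are treated as given. In a full sampler they would be recomputed from the current DAR parameters.

```python
import math
import random
from collections import Counter
from typing import List, Sequence


def log_normal0(e: float, lam: float) -> float:
    """log N(e | 0, lam)."""
    return -0.5 * math.log(2.0 * math.pi * lam) - 0.5 * e * e / lam


def log_marginal(e: float, a0: float, b0: float) -> float:
    """log of the integral of N(e | 0, lam) * Inv-Gamma(lam | a0, b0) over lam (a Student-t density)."""
    return (math.lgamma(a0 + 0.5) - math.lgamma(a0) + a0 * math.log(b0)
            - 0.5 * math.log(2.0 * math.pi) - (a0 + 0.5) * math.log(b0 + 0.5 * e * e))


def draw_inv_gamma(shape: float, rate: float, rng: random.Random) -> float:
    """Inv-Gamma(shape, rate) draw as the reciprocal of a Gamma(shape, rate) draw."""
    return 1.0 / rng.gammavariate(shape, 1.0 / rate)


def polya_urn_sweep(eps: Sequence[float], lam: List[float], M: float,
                    a0: float, b0: float, rng: random.Random) -> List[float]:
    """Step 1 (reallocate each lambda_t) then Step 2 (redraw distinct values). O(n^2) per sweep."""
    for t in range(len(eps)):
        counts = Counter(lam)
        counts[lam[t]] -= 1                              # remove observation t from its group
        values = [v for v, c in counts.items() if c > 0]
        logw = [math.log(M) + log_marginal(eps[t], a0, b0)]          # open a new value
        logw += [math.log(counts[v]) + log_normal0(eps[t], v) for v in values]
        top = max(logw)
        probs = [math.exp(x - top) for x in logw]
        pick = rng.choices(range(len(probs)), weights=probs)[0]
        if pick == 0:                                    # new value: posterior given eps_t alone
            lam[t] = draw_inv_gamma(a0 + 0.5, b0 + 0.5 * eps[t] ** 2, rng)
        else:
            lam[t] = values[pick - 1]
    # Step 2: redraw each distinct value from its inverse-gamma full conditional
    members = {}
    for t, v in enumerate(lam):
        members.setdefault(v, []).append(t)
    for idx in members.values():
        ss = sum(eps[t] ** 2 for t in idx)
        new_value = draw_inv_gamma(a0 + 0.5 * len(idx), b0 + 0.5 * ss, rng)
        for t in idx:
            lam[t] = new_value
    return lam
```

Step 2 matters for mixing. Without it, a group's shared value could change only through one observation at a time.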
3.2. Sample and
The sampling of  and  is based on the Metropolis-Hastings (MH) method . An R implementation of this method is described in a previous article , in which the proposal distribution is taken to be the conditional distribution of the parameters.
Let ,  denote the prior distribution of , assuming that  and ,  are known. In addition, write . Then, ; therefore, the posterior distribution of  is where , . Our proposal distribution for  is then . By MH sampling, the acceptance probability of the candidate ,  is , where . If the proposed value is rejected, the previous value is retained as the current draw from .
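For the autoregressive coefficient, an MH step can be sketched under our own simplifying assumptions:

- a DAR(1) in the rescaled form $y_t=\phi y_{t-1}+\varepsilon_t\sqrt{h_t}$ with $h_t=\omega+\alpha y_{t-1}^2$;
- $\varepsilon_t \mid \lambda_t \sim N(0,\lambda_t)$;
- a normal prior $N(\mu_0,\tau_0^2)$ on $\phi$.

Because $h_t$ does not involve $\phi$, the weighted least-squares normal proposal coincides with the unconstrained full conditional of $\phi$. The MH ratio is therefore 1 whenever the candidate satisfies a prior constraint. The optional `region` argument is a hypothetical stand-in for such a constraint. The names and defaults are assumptions, not the paper's.

```python
import math
import random
from typing import Callable, Optional, Sequence


def draw_phi(y: Sequence[float], lam: Sequence[float], omega: float, alpha: float,
             phi_current: float, rng: random.Random, mu0: float = 0.0, tau0: float = 10.0,
             region: Optional[Callable[[float], bool]] = None) -> float:
    """MH update of phi whose proposal is the exact (unconstrained) normal full conditional."""
    precision = 1.0 / tau0 ** 2                 # prior precision
    weighted_sum = mu0 / tau0 ** 2
    for t in range(1, len(y)):
        var_t = lam[t] * (omega + alpha * y[t - 1] ** 2)   # Var(y_t | y_{t-1}, lambda_t)
        precision += y[t - 1] ** 2 / var_t
        weighted_sum += y[t - 1] * y[t] / var_t
    mean, sd = weighted_sum / precision, 1.0 / math.sqrt(precision)
    candidate = rng.gauss(mean, sd)
    # The target is the proposal times an indicator, so the MH ratio is 1 inside
    # the region and 0 outside it (given that the current value lies inside).
    if region is None or region(candidate):
        return candidate
    return phi_current
```

With non-normal priors or a model in which $h_t$ depends on $\phi$, the same proposal would no longer be exact. The full MH ratio would then be needed.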
To sample , set  as the prior of , and let  and . Then, under , The conditional mean of  is , and the conditional variance is . Nakatsuma  suggests replacing this with . Then, the posterior distribution of  is where , . The proposal distribution for  is . A candidate  is then accepted or rejected in the same way as .
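The auxiliary-regression idea attributed to Nakatsuma can be sketched for the DAR(1) case. We assume, as in our other sketches and not necessarily as in the paper, that $y_t \mid y_{t-1},\lambda_t \sim N(\phi y_{t-1}, \lambda_t h_t)$ with $h_t=\omega+\alpha y_{t-1}^2$.

The scaled squared residual $z_t=(y_t-\phi y_{t-1})^2/\lambda_t$ then satisfies $z_t=\omega+\alpha y_{t-1}^2+w_t$, where $w_t$ has conditional mean 0 and conditional variance $2h_t^2$. Evaluating $h_t$ at the current $(\omega,\alpha)$ gives a weighted least-squares normal proposal.

Because this proposal depends on the current state, the MH ratio below includes proposal densities in both directions. A flat prior on $\omega>0$, $\alpha\ge 0$ is assumed, and all function names are hypothetical.

```python
import math
import random
from typing import Sequence, Tuple

Pair = Tuple[float, float]


def log_lik(y: Sequence[float], lam: Sequence[float], phi: float,
            omega: float, alpha: float) -> float:
    """Gaussian log-likelihood of a DAR(1) given the mixture variances lam."""
    if omega <= 0.0 or alpha < 0.0:
        return -math.inf                          # outside the flat prior's support
    total = 0.0
    for t in range(1, len(y)):
        v = lam[t] * (omega + alpha * y[t - 1] ** 2)
        total -= 0.5 * (math.log(2.0 * math.pi * v) + (y[t] - phi * y[t - 1]) ** 2 / v)
    return total


def wls_proposal(y, lam, phi, omega, alpha):
    """Normal proposal from z_t = omega + alpha*y_{t-1}^2 + w_t with Var(w_t) = 2 h_t^2.

    h_t is evaluated at the supplied (omega, alpha). Returns the mean, the precision
    matrix as (p00, p01, p11), and its determinant.
    """
    s00 = s01 = s11 = r0 = r1 = 0.0
    for t in range(1, len(y)):
        x = y[t - 1] ** 2
        z = (y[t] - phi * y[t - 1]) ** 2 / lam[t]     # scaled squared residual
        w = 1.0 / (2.0 * (omega + alpha * x) ** 2)    # weight 1 / Var(w_t)
        s00 += w
        s01 += w * x
        s11 += w * x * x
        r0 += w * z
        r1 += w * x * z
    det = s00 * s11 - s01 * s01
    mean = ((s11 * r0 - s01 * r1) / det, (s00 * r1 - s01 * r0) / det)
    return mean, (s00, s01, s11), det


def log_proposal(x: Pair, mean: Pair, prec, det: float) -> float:
    """log N(x | mean, prec^{-1}) for a 2x2 precision matrix prec = (p00, p01, p11)."""
    d0, d1 = x[0] - mean[0], x[1] - mean[1]
    quad = prec[0] * d0 * d0 + 2.0 * prec[1] * d0 * d1 + prec[2] * d1 * d1
    return -math.log(2.0 * math.pi) + 0.5 * math.log(det) - 0.5 * quad


def mh_omega_alpha(y, lam, phi, omega, alpha, rng: random.Random) -> Pair:
    """One MH step for (omega, alpha) with a state-dependent WLS normal proposal."""
    mean, prec, det = wls_proposal(y, lam, phi, omega, alpha)
    # Cholesky factor of the proposal covariance prec^{-1}
    c00, c01, c11 = prec[2] / det, -prec[1] / det, prec[0] / det
    l00 = math.sqrt(c00)
    l10 = c01 / l00
    l11 = math.sqrt(c11 - l10 * l10)
    z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    cand = (mean[0] + l00 * z0, mean[1] + l10 * z0 + l11 * z1)
    log_target_cand = log_lik(y, lam, phi, *cand)
    if log_target_cand == -math.inf:
        return omega, alpha                       # zero prior density: reject
    mean_c, prec_c, det_c = wls_proposal(y, lam, phi, *cand)
    log_ratio = (log_target_cand - log_lik(y, lam, phi, omega, alpha)
                 + log_proposal((omega, alpha), mean_c, prec_c, det_c)   # q(current | cand)
                 - log_proposal(cand, mean, prec, det))                  # q(cand | current)
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return cand
    return omega, alpha
```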
Drawing from the conditional posterior distribution of can be performed, as explained in Escobar and West , by first sampling an auxiliary variable from a beta distribution, , and then sampling from a gamma mixture, , where and .
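The Escobar-West auxiliary-variable update can be written out explicitly. The symbols here are ours:

- `M` is the DP strength parameter;
- `a`, `b` are the shape and rate of its Gamma prior;
- `k` is the current number of distinct mixture variances among the `n` observations.

The update draws $\xi\sim\text{Beta}(M+1,n)$. It then draws $M$ from $\pi\,\text{Gamma}(a+k,\,b-\ln\xi)+(1-\pi)\,\text{Gamma}(a+k-1,\,b-\ln\xi)$, with $\pi/(1-\pi)=(a+k-1)/\{n(b-\ln\xi)\}$. As a sketch, it also includes the standard prior mean of the number of distinct values under a DP.

```python
import math
import random


def update_strength(M: float, k: int, n: int, a: float, b: float,
                    rng: random.Random) -> float:
    """Escobar-West draw of the DP strength parameter M given k groups among n points.

    Prior: M ~ Gamma(shape=a, rate=b). Requires a + k - 1 > 0.
    """
    xi = rng.betavariate(M + 1.0, n)                 # auxiliary variable
    rate = b - math.log(xi)                          # b - ln(xi) > b > 0
    odds = (a + k - 1.0) / (n * rate)                # pi / (1 - pi)
    shape = a + k if rng.random() < odds / (1.0 + odds) else a + k - 1.0
    return rng.gammavariate(shape, 1.0 / rate)       # gammavariate takes a scale argument


def expected_clusters(M: float, n: int) -> float:
    """Prior mean number of distinct values among n draws: sum_{i=1}^n M / (M + i - 1)."""
    return sum(M / (M + i) for i in range(n))
```

Because the update is an exact Gibbs step for $(\xi, M)$, its draws target $p(M\mid k)\propto p(M)\,M^{k}\,\Gamma(M)/\Gamma(M+n)$. This target can be checked numerically.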
3.4. MCMC Algorithm
This part briefly discusses the steps of the MCMC used to fit the DAR-DPM model. Given the priors, we define the following MCMC algorithm:
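Overall, the scheme is a systematic-scan (Metropolis-within-Gibbs) sampler: each sweep updates every block once, in a fixed order, given the current values of all the others. A minimal, hypothetical driver is sketched below. The block updates are passed in as functions; in the DAR-DPM case these would be the mixture variances, the mean parameters, the variance parameters, and the DP strength parameter. Burn-in and thinning are handled in one place. The toy usage substitutes a two-block Gibbs sampler for a bivariate normal, which is not part of the paper.

```python
import random
from typing import Callable, Dict, List

State = Dict[str, float]
Update = Callable[[State, random.Random], State]


def run_chain(state: State, updates: List[Update], n_burn: int, n_keep: int,
              thin: int, rng: random.Random) -> List[State]:
    """Apply every block update once per sweep; store every `thin`-th sweep after burn-in."""
    draws: List[State] = []
    for sweep in range(n_burn + n_keep):
        for update in updates:                 # fixed (systematic) scan order
            state = update(state, rng)
        if sweep >= n_burn and (sweep - n_burn + 1) % thin == 0:
            draws.append(dict(state))          # copy so stored draws are not aliased
    return draws


# Toy blocks: Gibbs sampling of a standard bivariate normal with correlation RHO.
RHO = 0.8
SD = (1.0 - RHO ** 2) ** 0.5


def update_x(s: State, rng: random.Random) -> State:
    return {**s, "x": rng.gauss(RHO * s["y"], SD)}     # x | y ~ N(rho*y, 1 - rho^2)


def update_y(s: State, rng: random.Random) -> State:
    return {**s, "y": rng.gauss(RHO * s["x"], SD)}     # y | x ~ N(rho*x, 1 - rho^2)


if __name__ == "__main__":
    out = run_chain({"x": 5.0, "y": -5.0}, [update_x, update_y],
                    n_burn=30000, n_keep=30000, thin=20, rng=random.Random(0))
    print(len(out))   # 1500
```

The run length in Section 4 (30000 retained iterations and 1500 stored draws) corresponds to `thin=20` in this driver.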
4. Simulation Study
In this section, the proposed methodology is illustrated using an artificial time series of size . The simulated data are given by , where we set  and . The following three scenarios are considered: (1) ; (2) ; and (3)  (Ausín, 2014). The MCMC algorithm is run for a burn-in of 30000 iterations followed by a further 30000 iterations. Every 20th iteration is stored, which yields a total of 1500 realizations from the posterior distribution.
Sensitivity Analysis: Comparison of Different Priors. The hyperparameters are set as . We initially set this prior because it gives a fair degree of support to values near  (see ), which leads to a rather noninformative prior.
Among the seven priors in Table 1, we change only the hyperparameters  (Priors 2, 3, and 4), only the hyperparameters  (Priors 5 and 6), or all of them (Prior 7). Table 1 shows that, under every prior, the estimates of  and  are close to the true parameter values and have small standard deviations. Therefore, the parameter estimates of the DAR-DPM model do not appear to be very sensitive to the choice of hyperparameters. No significant loss of estimation accuracy is expected when the hyperparameters of Prior 7 are used in the following analysis.
Comparison of Normal, Finite Normal Mixture, and Infinite Normal Mixture Models. Table 2 shows that, when estimating , the standard normal and finite mixture models perform well. However, neither model is able to estimate . The DPM method shows a clear advantage in all cases. It accurately estimates both parameters  and  simultaneously, whether the data are generated from the standard Gaussian, finite mixture, or DPM model.
For clarity, we plot the posterior density of each parameter based on the simulated data; see Figure 1.
5. Empirical Application
This section applies the above method to real financial data. The data are US weekly observations of the 3-month Treasury bill from 1989/03 to 2019/03 (1566 observations; data source: Federal Reserve Economic Data). To obtain a more stationary series, we first transform the original data. Let  denote the weekly returns of the 90-day T-bill, and let  denote their first-order difference, i.e., ; see Figure 2.
To evaluate the performance of the different models, we first report descriptive statistics of the data before modeling, together with their autocorrelation and partial autocorrelation functions; see Table 3 and Figure 3.
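The preprocessing and diagnostics described above can be sketched with the standard library: first differencing, then the sample ACF and PACF. The sample ACF uses the usual estimator normalized by the lag-0 sum of squares. The PACF is obtained from it by the Durbin-Levinson recursion. The function names are ours, and in practice a statistics package would typically be used instead.

```python
import random
from typing import List, Sequence


def first_difference(x: Sequence[float]) -> List[float]:
    """x_t - x_{t-1} for t = 1, ..., n-1."""
    return [b - a for a, b in zip(x[:-1], x[1:])]


def sample_acf(x: Sequence[float], max_lag: int) -> List[float]:
    """r_k = sum_t (x_t - m)(x_{t+k} - m) / sum_t (x_t - m)^2 for k = 0..max_lag."""
    n = len(x)
    m = sum(x) / n
    d = [v - m for v in x]
    denom = sum(v * v for v in d)
    return [sum(d[t] * d[t + k] for t in range(n - k)) / denom for k in range(max_lag + 1)]


def sample_pacf(x: Sequence[float], max_lag: int) -> List[float]:
    """Partial autocorrelations phi_kk for k = 1..max_lag via the Durbin-Levinson recursion."""
    r = sample_acf(x, max_lag)
    pacf: List[float] = []
    prev: List[float] = []                     # phi_{k-1, 1..k-1}
    for k in range(1, max_lag + 1):
        num = r[k] - sum(prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


if __name__ == "__main__":
    # Hypothetical check on a simulated AR(1) series with coefficient 0.6.
    rng = random.Random(2)
    series, prev_value = [], 0.0
    for _ in range(5000):
        prev_value = 0.6 * prev_value + rng.gauss(0.0, 1.0)
        series.append(prev_value)
    print(sample_acf(series, 3))     # lag-1 value near 0.6
    print(sample_pacf(series, 3))    # near 0.6 at lag 1, near 0 afterwards
```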
We now illustrate the four models using the above data.
First, assuming that , we obtain the following fitted DAR-DPM model: , where the standard deviations of the coefficients are 0.0361 and 0.2460, and . Next, assuming that , the fitted model is , the standard deviations of the coefficients are 0.2162 and 0.1210, and . Finally, assuming that , we obtain , where the standard deviations of the coefficients are 0.0385 and 0.2505, and .
The predictive distribution of the series showcases the flexibility of the DPM. We computed and plotted the predictive densities for the above models; see Figure 4.
Comparing the three models obtained above, the parameter estimates of the DPM model have smaller variances and the smallest value of . For the DPM model, we calculate that  is 0.8628 and that the mean of  is 2.7956. These values indicate that the disturbances tend to form several groups, so the DPM model is well suited to these data. These results are based on 20000 MCMC iterations; see Figure 5.
Finally, we also compare our model with that of Jensen and Maheu [19, 20] using the same dataset. The fitted GARCH(1,1)-DPM model is  where . The standard deviations of the coefficients are 0.6421, 0.1288, and 0.1073, and . This model was also applied in Ausín, Galeano, and Ghosh .
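To make the comparison concrete, the following sketch contrasts the conditional-variance recursions of the two volatility specifications in their simplest first-order forms. These forms are assumptions: $h_t=\omega+\alpha y_{t-1}^2+\beta h_{t-1}$ for GARCH(1,1) and $h_t=\omega+\alpha y_{t-1}^2$ for DAR(1). The exact fitted equations above, including any mean terms, are not reproduced. GARCH feeds back its own past variance, whereas the DAR variance depends only on the lagged level of the series, which also enters the DAR conditional mean.

```python
from typing import List, Sequence


def garch11_variances(y: Sequence[float], omega: float, alpha: float, beta: float,
                      h0: float) -> List[float]:
    """h_t = omega + alpha*y_{t-1}^2 + beta*h_{t-1}, started from h_0 = h0."""
    h = [h0]
    for t in range(1, len(y)):
        h.append(omega + alpha * y[t - 1] ** 2 + beta * h[t - 1])
    return h


def dar1_variances(y: Sequence[float], omega: float, alpha: float) -> List[float]:
    """h_t = omega + alpha*y_{t-1}^2 for t >= 1 (no recursion in h)."""
    return [omega + alpha * y[t - 1] ** 2 for t in range(1, len(y))]
```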
Based on this example, the results demonstrate that the DAR-DPM model developed in this paper is valuable and competitive. As the posterior density of u plotted in Figure 6 for the T-bill data shows, the support for u is always confined to values below 0.7, where , [19, 20]. This indicates that the data strongly support a nonparametric specification with only a few mixture components.
In this article, a semiparametric Bayesian approach for DAR models has been developed. The innovation distribution is modeled as a scale mixture of Gaussian distributions with a DP prior on the mixing distribution. We constructed an MCMC algorithm to obtain samples from the posterior distributions of the model parameters. It combines Pólya urn sampling of the mixture variances with Metropolis-Hastings steps for the model parameters. The results of our experiments, in both the simulation study and the real data application, are quite encouraging.
The data used to support the findings of this study are included in the supplementary information file. Federal Reserve Economic Data can be obtained from https://www.federalreserve.gov/. In addition, the Treasury bill data used in this study are available at https://fred.stlouisfed.org/categories/116.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
This research was partially supported by NSFC (11071202&11861025) & Science and Technology Foundation of Guizhou Province (LKS05), China.
D. V. Lindley, Bayesian Statistics, A Review, SIAM, Philadelphia, PA, USA, 1972.
J. E. Griffin, “An adaptive truncation method for inference in Bayesian nonparametric models,” Statistics and Computing, vol. 26, no. 1, pp. 1–19, 2016.
E.-I. Delatola and J. E. Griffin, “Bayesian nonparametric modelling of the return distribution with stochastic volatility,” Bayesian Analysis, vol. 6, no. 4, pp. 901–926, 2011.
T. Nakatsuma, “A Markov-chain sampling algorithm for GARCH models,” Studies in Nonlinear Dynamics and Econometrics, vol. 3, pp. 107–117, 1998.