Semiparametric Gaussian Variance-Mean Mixtures for Heavy-Tailed and Skewed Data
There is a need for new classes of flexible multivariate distributions that can capture heavy tails and skewness without being so flexible as to fully incur the curse of dimensionality intrinsic to nonparametric density estimation. We focus on the family of Gaussian variance-mean mixtures, which have received limited attention in multivariate settings beyond simple special cases. By using a Bayesian semiparametric approach, we allow the data to infer about the unknown mixing distribution. Properties are considered and an approach to posterior computation is developed relying on Markov chain Monte Carlo. The methods are evaluated through simulation studies and applied to a variety of applications, illustrating their flexible performance in characterizing heavy tails, tail dependence, and skewness.
There is an increasing awareness of the importance of developing new classes of multivariate distributions that flexibly characterize heavy tails and skewness, while accommodating tail dependence. Such tail dependence arises in many applications and is a natural consequence of dependence in outlying events. Such dependence is well known to occur in financial data, communication networks, weather, and other settings, but is not adequately characterized by common approaches such as Gaussian copula models. Salmon  provides a compelling commentary on how reliance on a single measure of correlation in two variables based on a Gaussian copula may have played a substantial role in the financial crisis. We need statistical methods based on new classes of distributions that do not rely on such unrealistic assumptions but that are still tractable to apply even in moderate- to high-dimensional settings.
There is an existing literature relevant to this topic. Wang et al.  proposed a class of skew-symmetric distributions having probability density functions (pdfs) of the form , where is a continuous symmetric density and is a skewing function. Choosing as normal leads to the skew normal class [3, 4], with other special cases corresponding to skew- [5, 6], skew slash  and skew elliptical distributions . These parametric models are useful in providing computationally tractable distributions that have parameters regulating skewness and kurtosis in the data. However, choosing a specific parametric family for and can be challenging in practice, with different choices yielding potentially different results. Although one can potentially conduct model selection or averaging, this adds to the computational burden.
Alternatively, nonparametric approaches have been explored to handle heavy-tailed and skewed observations with more flexibility. For instance, mixtures of normal distributions have been widely used to approximate arbitrary distributions. Venturini et al.  uses mixtures of gamma distributions over the shape parameter to model heavy-tailed medical expenditure data. Mixtures of other heavy-tailed distributions have also been proposed. Such nonparametric density estimation approaches face substantial challenges in multivariate cases due to the curse of dimensionality. Fully nonparametric density estimation is almost too flexible in allowing densities that have arbitrary numbers of modes and complex shapes, which are difficult to estimate accurately based on available data in many cases. There has been some attempts to reduce dimensionality in multivariate density estimation using mixtures of factor analyzers  and alternative approaches, but nonetheless the curse is only partly thwarted by such efforts.
We focus on Gaussian variance-mean mixtures (GVMMs), introduced by Barndorff-Nielsen  as a flexible class of multivariate distributions induced through the following hierarchical model for , where is a location parameter, is a drift or skewness parameter, , is a mixture distribution on , , and is a positive definite matrix. Conditions such as are typically imposed to avoid an unidentifiable scale factor. This model corresponds to a generalization of the multivariate normal distribution, which is obtained in the special case in which and is a point mass.
Current literature on Gaussian variance-mean mixtures is mostly focused on univariate models in which the mixing distribution belongs to a parametric family. Some special cases corresponding to different choices of include Student’s -test, Laplace, hyperbolic, normal inverse Gaussian, and variance gamma distributions, and such models have been applied broadly (see e.g., [11–16]). However, as with skew-symmetric distributions, limiting to a particular parametric class is clearly restrictive. In addition, some appealing subfamilies of GVMM, such as generalized hyperbolic (GH) distributions, are not analytically tractable. This is partially due to the flatness of the likelihood function , which makes it hard to obtain reliable parameter estimates without prior information or some form of penalization, even with large sample sizes .
We propose Bayesian semiparametric Gaussian variance-mean mixture models, in which the mixing distribution is modeled nonparametrically to flexibly accommodate heavy tails and skewness, while letting the data inform about the appropriate distribution choice. Efficient Bayesian computational strategies are developed for reliable inference on parameters.
2. Proposed Modeling Framework
2.1. Semiparametric GVMM
Consider the -dimensional multivariate continuous heavy-tailed and/or skewed observations , (). Our goal is to obtain a flexible GVMM model for the density via modeling the mixing distribution nonparametrically. To achieve this, we use Dirichlet process mixture (DPM) of generalized log-normal (log GN) prior with an unknown order for . Mathematically, the model is represented as where is the DP precision and is the base measure for and , which we choose independent normal and inverse gamma, respectively.
The family of log GN distributions considered here was initially introduced by Vianelli , which can be obtained through the exponential transformation of a random variable that follows the generalized normal distribution [20, 21]. The pdf of a distributed random variable is given by where , , , and . log GN densities are more flexible than log-normal densities in including an additional parameter controlling tail behavior, with log-normal corresponding to the special case in which and double-exponential obtained by taking .
Although log GN distributions are appealing in providing a simple generalization of the log-normal which is more flexible in the tails, such distributions have been rarely implemented even in simpler settings due to the computational hurdles involved. Fortunately, for Bayesian posterior computation via MCMC we can rely on a data augmentation algorithm based on Fact 1, with the adaption to our setting using DPMs of generalized log-normals shown in Section 3.2.
Fact 1. Let and be two random variables such that (1), (2)then .
DPMs of Gaussians provide a highly flexible approximation to arbitrary densities. As a prior for directly, DPMs of Gaussians have appealing asymptotic properties and lead to minimax optimal adaptive rates of convergence under some conditions on the true data-generating density, with these conditions unfortunately ruling out heavy-tailed densities . As motivated above, our focus is on obtaining a flexible prior for multivariate heavy-tailed and skewed densities, which is not fully nonparametric in the sense of allowing multimodal and other very irregular density shapes but can flexibly approximate a broad class of unimodal densities without inducing restrictive tail constraints. Hence, it is not appropriate to choose a DPM of Gaussians directly for the density of the data .
Expression (2.1) instead uses a DPM of log GNs for the mixture distribution within the GVMM framework. By using such DPM prior and setting unknown, we obtain a large flexibility of the mixing distribution and its tail decay, which can adopt diverse degrees of heavy tailness for the marginal density of data, while allowing the data to fully infer about the unknown mixing distribution. After considering a broad variety of alternatives, we have found this specification to be excellent at capturing a rich variety of multivariate heavy-tailed and skewed distributions, while also allowing symmetric data. Some basic properties are detailed below.
2.2. Tail Behavior
It is important to understand the relationship between the tail behavior of the mixture distribution and the induced tail behavior of , the marginal distribution of . We start by considering the univariate case. Theorem 2.2 of Barndorff-Nielsen and Sorensen  describes the relationship in the special case in which and hence the marginal distribution is symmetric.
Theorem 2.1 (see ). Suppose is the pdf of a Gaussian variance mixture as described in (1.1) with , and that the tail of the mixing distribution with pdf satisfies where and is a function of slow variation with , for all . Then (1)if , ,(2)if , .
Observe that the tail behavior of a Gaussian variance mixture (when ) mainly depends on the tail behavior of the mixing distribution. To generalize this to arbitrary Gaussian variance-mean mixtures, we first introduce the following Lemma.
Lemma 2.2. Suppose is the pdf of a Gaussian variance mixture with mixing distribution , and let denote the moment generating function of . Then s.t. : is the pdf of a Gaussian variance-mean mixture with skewness parameter and mixing distribution with pdf satisfying , where is the pdf of G. The converse is also true.
Proofs can be found in the appendix. This lemma provides a link between the tail behavior of a Gaussian variance-mean mixture and that of a Gaussian variance mixture, via the link between tails behaviors of the two mixing distributions. This relationship is used in the following theorem.
Theorem 2.3. Suppose is the pdf of a Gaussian variance-mean mixture as described in (1.1), with skewness parameter and mixing distribution . If the tail of the mixing density satisfies where , and is a function of slow variation with , for all , then: (1), as , (2), as .
Theorem 2.3 shows that the tail behavior of a Gaussian variance-mean mixture also depends on that of the mixing distribution. Generally, heavier tails in the mixing distribution induce heavier tails of the Gaussian variance-mean mixture. Thus, by placing a DPM of generalized log-normals on the mixing distribution, we induce a prior on the density with flexible degrees of tail decay. Another observation is that if the mixing distributions have subexponential tails (such as log-concave distributions), then the Gaussian variance-mean mixture also has sub-exponential tails, which also illustrates the limitation in the flexibility of using particular parametric cases of the Gaussian variance-mean mixtures to fit data.
To compute the moments of Gaussian variance-mean mixtures, we can directly apply the law of total cumulance. Let , , , and denote the expectation, variance, skewness, and kurtosis of the Gaussian variance-mean mixtures described in (1.1), and let , and denote those of the mixing distribution. Given they all exist, we have simply
More generally, we have , where and are moment generating functions of GVMM and the corresponding mixing distribution, respectively. So clearly, the existence of moments of mixing distributions indicates the existence of moments of Gaussian variance-mean mixtures, and can control expectation, variance, and kurtosis, in addition to being a skewness parameter.
3. Bayesian Computation
In the semiparametric GVMM framework, the constraint is typically imposed to guarantee model identifiability. To improve computational efficiency in our proposed Bayesian computational algorithm, we use a parameter-expansion approach in which priors are placed on parameters in an unidentifiable working model without the constraint. We then include a postprocessing step to transform the parameters back to an identifiable inferential model, which includes the constraint. A related strategy was used in Gaussian factor models by Ghosh and Dunson . We use fairly diffuse priors for unidentifiable parameters, as an aid to mixing and because it is difficult to elicit priors for these parameters. However, we avoid completely noninformative priors because the flatness of the likelihood function in some GVMM models  can lead to unreliable inferences in the absence of some prior information or penalization. To address this problem, we propose an empirical Bayes approach to incorporate skewness information from the data in estimating hyperparameters for the skewness parameter .
In particular, we transform the original data to have a positive sample skewness and a unit sample variance. The data are first normalized and the sample skewness is calculated. If it is negative, we multiply the normalized data by a negative one. In conducting inferences, we transform back to the scale and sign of the original data. As GVMMs are closed under linear transformations, this will induce a GVMM. Because the transformed data are more likely to be right skewed or symmetric, we can more easily elicit a default weakly informative prior for the skewness parameter , which we choose to be normal with positive mean , with a gamma hyperprior placed on to improve prior robustness. For the order parameter , a truncated inverse-gamma prior is used. Diffuse priors are placed on the remaining unknowns. To summarize These priors were used in an unconstrained working model for the transformed data and induce priors on the parameters in the identifiable inferential model having the constraint.
As for the DPM of log GN prior for the mixing distribution , for ease in computation, we use a similar stick-breaking representation of the DPM as proposed by Sethuraman  but with generalized log-normal instead of normal components, and truncated at components following Ishwaran and James . Furthermore, as pointed out before, since it is hard to directly update parameters of the generalized log-normal distributions, we further utilize Fact 1 to introduce augmented data to improve the computational efficiency. To summarize, the data-augmented stick-breaking representation of the DPM of log GN prior is shown as follows: where , , , and for , we use independent normal and inverse gamma for and , respectively. Here, is augmented data representing the mixture component index for observation .
3.2. Full Conditionals and Posterior Analysis
Given the model and priors as specified by (2.1), (3.1), and (3.2), we use a data-augmented Gibbs sampler to update the unknown quantities, including parameters in the GVMM framework (, , and ), parameters in the DPM of log GN (, , and ), augmented data ( and ), hyperparameters, and the mixing variables (). The Gibbs sampler is computationally efficient and mixes rapidly as most of the full conditional distributions have closed forms, except those for , , and , which are all univariate and updated using Metropolis-Hastings steps within the Gibbs sampler. Key steps in each Gibbs sampler iteration are listed as follows.(I) Samples , , and : given normal priors for and inverse-Wishart prior for and the model that , is sampled from conditionally normal distribution, and from a conditionally inverse Wishart. (II) Samples and : given the following priors are used for and for the th log-normal component, : Sample , , from conditionally truncated normal distribution: and sample from conditionally truncated inverse-gamma distribution: where is the total number of in the mixture component. (III) Sample . , , is sample from conditionally truncated exponential distribution: (IV) Sample : given the Gamma(,) for , the full conditional distribution is sampled using Metropolis-Hastings algorithm, from the following full conditional truncated kernel: where , . , and . (V) Sampling : , is sampled from the conditionally multinomial (MN) distribution, with (VI), are updated from conditionally beta distribution as shown in Ishwaraman and James . Also update univariate , using a Metropolis-Hastings step within the Gibbs sampler.
4. Simulation Study and Real Data Analysis
4.1. Univariate Semiparametric GVMM
To test the semiparametric framework, a simulated dataset from univariate GVMM is first modeled. Specifically, observations () are generated from model (2.1), with the true values of the parameters as shown in The histogram of simulated data is shown in Figure S2(B) in Supplementary Material available online at http://dx.doi.org/10.5402/2012/345784, which shows significant heavy tailness and skewness, with sample kurtosis and skewness being 4.60 and 1.01, respectively.
For Bayesian inference, we preprocess the original simulated data and place priors as described previously. We run the MCMC for 10000 iterations with the first 5000 as burn-in. Several aspects of the posterior distributions are analyzed to evaluate the model fitting. First of all, posterior samples of and parameters allow us to reconstruct and visualize the unknown mixing distribution . As shown in Figure S1, a comparison between the true mixing distribution (Gamma(3,1), panel (B)) and 100 reconstructed mixing distributions (panel (A)) shows significant similarity, indicating that the model can effectively capture the underlying structure. This is further illustrated by the posterior distributions of and , which have 95% C.I.s being [0.8676,3.2390] and [1.2532,2.9660], and posterior means being 2.0804 and 2.1523, respectively, which also very well cover the true values. It is worth mentioning that later in the real data analysis where the true values are unknown, the posterior distribution of provides us with a useful tool to assess skewness, with significant positive values of indicating skewness.
Furthermore, we can reconstruct the dataset based on the posterior samples of model parameters and the mixing distribution and directly visualize whether the posterior predictive distribution resembles the observed one. We reconstructed 200 such datasets, each containing 5000 data points. While the reconstructed datasets resemble the observed one (shown in Figure S2), we specifically looked at the posterior quantile estimates of the fitted semiparametric GVMM, which is done by getting the 95% posterior C.I.s for a series of quantiles based on the 200 reconstructed datasets. This is shown in Table 1 and compared to the quantile estimates obtained by fitting other models, such as normal, skewed normal, and skewed -distributions (Table 2). Compared to the fact that simple Gaussian model fails to capture neither heavy tailness and skewness, our GVMM model fits the data as well as skewed-Gaussian and skewed -distributions, while maintaining unique advantages of relatively easy forms, convenient sampling, and interpretability.
4.2. Modeling the S&P 500 Returns
It is well known that stock returns do not always confirm well with a Gaussian distribution. Modeling both heavy tailness and asymmetry of returns is becoming important in economics and finance. Here, we look at daily returns of Standard & Poor's 500 Composite (S&P 500) index from 01/02/1990 to 09/13/2011. Totally 5470 observations are shown in Figure S3(A), with sample skewness being 0.189 (after preprocessing), which suggests that the return distribution may be slightly right skewed.
Similar univariate semiparametric GVMM and prior setup are applied to the dataset to access the capability of the model in capturing the return distribution. To evaluate the model fitting, we also reconstructed 200 datasets based on the posterior samples of unknown quantities (each consisting of 5470 observations), and a quick comparison (Figure S3) between the observed and reconstructed datasets shows a significant similarity, indicating that our model captures the return distribution well. We also look at posterior quantile estimates based on the 200 simulated datasets (Table 3), which further illustrate the goodness of fit of the model.
Furthermore, we specifically look at the posterior distribution of . Generally speaking, when the sample skewness is relatively small (as in this case), it is difficult to claim whether the true distribution is skewed or symmetric, because samples from symmetric heavy-tailed distributions can also exhibit significant sample skewness due to the presence of extreme values with a finite sample size. However, one feature of the Bayesian semiparametric GVMM framework is that the skewness can be simultaneously inferred by looking at the sign of the parameter, which can test the actual existence of skewness directly against an “artificial sample skewness” due to heavy tailness. To illustrate this, the histogram of 5000 posterior samples of is shown in Figure 1, which gives a 95% posterior C.I. [0.0033,0.0838] and thus claims that the distribution is slightly right skewed at the 95% confidence level. As a striking comparison, we generated 5000 samples from -distribution (, shown in the appendix). The sample skewness is 0.228, which appears to suggest a slight right skewness. However, when we fit the data using our Bayesian semiparametric GVMM and look at the posterior distribution of (shown in Figure 2, as compared to Figure 1), the posterior 95% C.I. is [-0.0371, 0.0757], with posterior mean at 0.0213, which argue against the existence of skewness and suggest that the sample skewness is due to the fact that -distribution is heavy tailed.
4.3. Modeling Multivariate Monthly Precipitation
There has also been a growing interest for flexible families of non-Gaussian distributions allowing skewness and heavy tails in environmental science and climatology, as more heavy-tailed and skewed data are observed practically. Specifically, it is well known that monthly rainfall is strongly skewed to the right with high positive values of skewness coefficients (e.g., ). Various distributions have been suggested to model the precipitation data, among which there are the exponential, gamma (e.g., [29, page 98]), log-normal (e.g., ), and log-skewed-normal/-distributions . However, most of the studies are focusing on univariate modeling, and there has been a little physical justification to why a specific distribution is used. Although skewed-Gaussian/-type distributions have been extended to the multivariate setting (see e.g., [27, 31]), they do not handle substantial skewness well . We are motivated to consider applying the Bayesian semiparametric GVMM framework to model multivariate precipitation data. One appealing feature is that the extension to multivariate case is both straightforward and interpretable.
US national and regional precipitation data are publicly available from the United States Historical Climatology Network (USHCN). For the purpose of exposition, we used monthly precipitation data measured in inches from four local stations (Albemarle, Chapel Hill, Edenton, and Elizabeth City) in the state of North Carolina, for the period from 1895 through 2010 (116 data per station for each month). Figure 3 presents the histogram of log monthly precipitation data for July every year. Data from all four stations exhibit right skewness, with sample skewness coefficients being 0.675, 1.440, 0.632, and 0.971, respectively.
We fit the semiparametric multivariate GVMM (2.1) to the data (dimension ).
We run the Markov Chain for 10000 iterations, which shows good mixing and convergence, and discarded the first 5000 as burn-in. To illustrate the model fitting, we reconstruct a precipitation dataset with 5000 observations based on posterior samples of all unknown quantities and compare the reconstructed posterior predictive distribution with the observed. Specifically, we test both whether the marginal univariate distributions of each dimension and the covariance structure are captured correctly. As shown in Figure 4, the curves represent the corresponding univariate kernel smoothing density estimates of the posterior predictive distributions based on 5000 samples from the fitted model, which show good fitting to the observed dataset shown with the histograms. The goodness of fit is further illustrated by the PP-plot between the observed and fitted distributions (Figure 5). Covariance structure is also modeled well (a comparison between Figures 6 and 7). All of these suggest that our framework effectively captures the underlying structure of the precipitation data.
As a comparison, we also fitted the log-precipitation data with multivariate skewed -distributions , which have also been used to model skewed and heavy-tailed data, with two tails behaving as polynomials. The fitted model is obtained via maximum likelihood estimation using the sn package provided by Azzalini , and both the marginal distributions and covariance structure of the fitted multivariate skewed -model are compared to those of the observed dataset. We can find that although the multivariate skewed -distribution captures most of the univariate marginal distributions with a slightly lighter right tails and heavier left tails (Figure S4), it failed to find the correct covariance structure (Figure S5 compared to Figures 6 and 7). This could result from getting stuck in a local maximum and obtaining a suboptimal solution due to the relatively complicated likelihood function of multivariate skewed -distributions. In any case, our Bayesian semiparametric multivariate GVMM seems to provide a computationally and structurally simpler yet more effective family of distributions for multivariate skewed and heavy-tailed data.
This paper proposes the use of Bayesian semiparametric Gaussian variance-mean mixtures as a flexible, interpretable, and computationally tractable model for heavy-tailed and skewed observations. The model assumes the mixing distribution in the general Gaussian variance-mean mixtures to be unknown, so the data inform about the appropriate distribution choice, the degree of skewness, heaviness of tails, and the shape of the distributions and thus provide a more flexible framework for heavy-tailed and skewed data analysis. Although we test the model with univariate and multivariate simulated and real data, the broader question of scaling the framework to higher dimensions, possibly combined with factor models for sparse modeling, is interesting and challenging. Under the same scenario, the assumption that a single mixing variable is controlling the tails and skewness of all dimensions seems restrictive, although this may be an acceptable assumption with low dimensions. Some hierarchical nonparametric models that allow multiple mixing variable/distributions may help in this case, but how to define an identifiable parametrization will definitely be an issue whenever multiple mixing variables are assumed, which is also worth more thoughts and theory to support. On the other hand, we consider time-varying semiparametric GVMM a natural extension to the GVMM framework, taking advantage of the interpretability of model parameters. However, the more general class of spatial-temporal models, where additional structures and sources of information are included, is still challenging and yet unexplored.
Proof of Lemma 2.2. Given -dimensional observation from a Gaussian variance-mean mixtures described in (1.1), let denote the pdf of the Gaussian variance-mean mixture with mixing distribution , and let denote the pdf of the Gaussian variance mixture () with mixing distribution . Without the loss of generality, we considered the case with (otherwise, a linear transformation will complete the proof) Let , with in the univariate case, and define where and are densities for and , respectively, and is the moment generating function for , with . Plug in, and we can see that
Proofs, supplementary figures and results are shown in the Supplementary Material Section.
F. Salmon, “The formula that killed wall street,” Significance, vol. 9, no. 1, pp. 16–20, 2012.View at: Google Scholar
J. Wang, J. Boyer, and M. G. Genton, “A skew-symmetric representation of multivariate distributions,” Statistica Sinica, vol. 14, pp. 1259–1270, 2004.View at: Google Scholar
M. Chen, J. Silva, J. Paisley, C. Wang, D. Dunson, and L. Carin, “Compressive sensing on manifolds using a nonparametric mixture of factor analyzers: algorithm and performance bounds,” IEEE Transactions on Signal Processing, vol. 58, no. 12, pp. 6140–6155, 2010.View at: Publisher Site | Google Scholar
O. Barndorff-Nielsen, “Exponentially decreasing distributions for the logarithm of particle size,” Proceedings of the Royal Society of London A, vol. 353, no. 1674, pp. 401–419, 1977.View at: Google Scholar
O. E. Barndorff-Nielsen, J. L. Jensen, and M. Sorensen, “Parametric modelling of turbulence,” Philosophical Transactions, vol. 332, no. 1627, pp. 439–455, 1990.View at: Google Scholar
K. Aas and I. H. Haff, “The generalized hyperbolic skew students t-distribution,” Journal of Financial Econometrics, vol. 4, no. 2, pp. 275–309, 2006.View at: Google Scholar
M. Varanasi and B. Aazhang, “Parametric generalized gaussian density estimation,” Journal of the Acoustical Society of America, vol. 86, no. 4, pp. 1404–1415, 1989.View at: Google Scholar
W. Shen and S. Ghosal, “Adaptive bayesian multivariate density estimation with dirichlet mixtures,” submitted.View at: Google Scholar
A. Azzalini, R Package sn: The Skew-Normal and Skew-t Distributions (Version 0.4–17), Universit`a di Padova, Padova, Italia, 2011.
H. Ishwaran and L. F. James, “Gibbs sampling methods for stick breaking priors,” Journal of the American Statistical Association, vol. 96, no. 453, pp. 161–173, 2001.View at: Google Scholar
D. S. Wilks, Statistical Methods in the Atmospheric Sciences, vol. 91 of International Geophysics, Academic Press, 2nd edition, 2005.
E. L. Crow and K. Shimizu, Lognormal Distributions: Theory and Applications, Marcel Dekker, 1988.
B. E. Hansen, “Autoregressive conditional density estimation,” International Economic Review, vol. 35, no. 3, pp. 705–730, 1994.View at: Google Scholar