The Scientific World Journal

Volume 2015, Article ID 909231, 16 pages

http://dx.doi.org/10.1155/2015/909231

## The Lambert Way to Gaussianize Heavy-Tailed Data with the Inverse of Tukey’s *h* Transformation as a Special Case

Department of Statistics, Carnegie Mellon University, Pittsburgh, PA 15213, USA

Received 25 July 2014; Accepted 1 October 2014

Academic Editor: Taizhong Hu

Copyright © 2015 Georg M. Goerg. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

I present a parametric, bijective transformation to generate heavy tail versions $Y$ of arbitrary random variables $X \sim F_X$. The tail behavior of this *heavy tail Lambert* W × $F_X$ random variable $Y$ depends on a tail parameter $\delta \geq 0$: for $\delta = 0$, $Y \equiv X$; for $\delta > 0$, $Y$ has heavier tails than $X$. For $X$ being Gaussian it reduces to Tukey’s *h* distribution. The Lambert W function provides an explicit inverse transformation, which can thus remove heavy tails from observed data. It also provides closed-form expressions for the cumulative distribution function (cdf) and probability density function (pdf). As a special case, these yield analytic expressions for Tukey’s *h* pdf and cdf. Parameters can be estimated by maximum likelihood, and applications to S&P 500 log-returns demonstrate the usefulness of the presented methodology. The R package LambertW implements most of the introduced methodology and is publicly available on CRAN.

#### 1. Introduction

Statistical theory and practice are both tightly linked to Normality. In theory, many methods require Gaussian data or noise: (i) regression often assumes Gaussian errors; (ii) many time series models are based on Gaussian white noise [1–3]. In such cases, a model $\mathcal{M}_{\mathcal{N}}$, its parameter estimates and their standard errors, and other properties are then studied, all based on the ideal(istic) assumption of Normality.

In practice, however, data and noise often exhibit asymmetry and heavy tails, for example, wind speed data [4], human dynamics [5], or Internet traffic data [6]. Particularly notable examples are financial data [7, 8] and speech signals [9], which almost exclusively exhibit heavy tails. Thus a model developed for the Gaussian case no longer necessarily provides accurate inference.

One way to overcome this shortcoming is to replace the Gaussian model $\mathcal{M}_{\mathcal{N}}$ with a new model $\mathcal{M}_G$, where $G$ is a heavy tail distribution, for example: (i) regression with Cauchy errors [10]; (ii) forecasting long memory processes with heavy tail innovations [11, 12], or ARMA modeling of electricity loads with hyperbolic noise [13]. See also Adler et al. [14] for a wide range of statistical applications and methodology for heavy-tailed data.

While such fundamental approaches are attractive from a theoretical perspective, they can become unsatisfactory from a practical point of view. Many successful statistical models and techniques assume Normality, their theory is very well understood, and many algorithms are implemented for the simple and often much faster Gaussian case. Thus developing models based on an entirely unrelated distribution is like throwing out the (Gaussian) baby with the bathwater.

It would be very useful to transform a Gaussian random variable to a heavy-tailed random variable and vice versa, and thus rely on knowledge and algorithms for the well-understood Gaussian case while still capturing heavy tails in the data. Ideally, such a transformation should (a) be bijective, (b) include Normality as a special case for hypothesis testing, and (c) be parametric so the optimal transformation can be estimated efficiently.

Figure 1 illustrates this pragmatic approach: researchers can make their observations $\mathbf{y}$ as Gaussian as possible before making inference based on their favorite Gaussian model $\mathcal{M}_{\mathcal{N}}$. This avoids the development of, or the data analysts waiting for, a whole new theory and new implementations based on a particular heavy-tailed distribution $G$, while still improving statistical inference from the heavy-tailed data $\mathbf{y}$. For example, consider a sample $\mathbf{y}$ from a standard Cauchy distribution in Figure 2(a): modeling heavy tails by a transformation makes it possible to Gaussianize even this Cauchy sample (Figure 2(c)). These “nice” data can then be analyzed with common techniques. For example, the location can now be estimated using the sample average (Figure 2(d)). For details see Section 6.1.
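To make the idea of a bijective, parametric "add tails / remove tails" pair concrete, the following is a minimal Python sketch, not the implementation in the LambertW R package. It assumes the standardized (zero location, unit scale) form of Tukey's *h* transformation, $z = u \exp(\frac{\delta}{2} u^2)$, and inverts it through the principal branch of the Lambert W function. The function names `lambert_w0`, `heavy_tail`, and `gaussianize` are hypothetical, and the tail parameter is fixed at an arbitrary illustrative value rather than estimated by maximum likelihood.

```python
import math
import random


def lambert_w0(x, tol=1e-12, max_iter=100):
    """Principal branch of the Lambert W function for x >= 0 (Newton's method)."""
    if x < 0:
        raise ValueError("this sketch only handles x >= 0")
    w = math.log1p(x)  # starting value; lies at or above W(x) for x >= 0
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - x) / (ew * (w + 1.0))  # Newton step for w * e^w = x
        w -= step
        if abs(step) <= tol * (1.0 + abs(w)):
            break
    return w


def heavy_tail(u, delta):
    """Forward map (assumed standardized form): z = u * exp(delta / 2 * u^2)."""
    return u * math.exp(0.5 * delta * u * u)


def gaussianize(z, delta):
    """Inverse map: u = sign(z) * sqrt(W(delta * z^2) / delta)."""
    if delta == 0.0:
        return z  # delta = 0 is the identity, so Normality is a special case
    return math.copysign(math.sqrt(lambert_w0(delta * z * z) / delta), z)


random.seed(0)
delta = 0.3  # illustrative value only, not an estimate from data
u = [random.gauss(0.0, 1.0) for _ in range(1000)]   # Gaussian input
z = [heavy_tail(ui, delta) for ui in u]             # heavy-tailed version
u_back = [gaussianize(zi, delta) for zi in z]       # tails removed again

max_err = max(abs(a - b) for a, b in zip(u, u_back))
print(f"max round-trip error: {max_err:.2e}")
print(f"largest |u|: {max(map(abs, u)):.2f}, largest |z|: {max(map(abs, z)):.2f}")
```

The round trip recovers the Gaussian input up to numerical error, which is what bijectivity means in practice, and every transformed value is at least as far from zero as its input, which is how the heavier tails arise. Applying this to real data would additionally require estimating location, scale, and the tail parameter.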