Abstract
This paper introduces a new family of generalized lambda distributions (GLDs) based on a method of doubling symmetric GLDs. The focus of the development is in the context of L-moments and L-correlation theory. As such, included is the development of a procedure for specifying double GLDs with controlled degrees of L-skew, L-kurtosis, and L-correlations. The procedure can be applied in a variety of settings such as modeling events and Monte Carlo or simulation studies. Further, it is demonstrated that estimates of L-skew, L-kurtosis, and L-correlation are substantially superior to conventional product-moment estimates of skew, kurtosis, and Pearson correlation in terms of both relative bias and efficiency when heavy tailed distributions are of concern.
1. Introduction
The conventional moment-based family of generalized lambda distributions (GLDs) is often used in various applied mathematics contexts to model and describe data by a single functional form [1, page 5]. Some examples include modeling non-log-normal security price distributions [2], biological and physical phenomena [3], and solar radiation data [4]. The GLD is also a popular tool for generating random variables for Monte Carlo or simulation studies. Some examples include studies in such areas as operations research [5], microarray research [6], and structural equation modeling [7].
The family of GLDs is based on the transformation [1, page 21], [9, 11], [10, page 127] where is uniformly distributed on the interval . The parameters and are location and scale parameters, while and are shape parameters that determine the skew and kurtosis of a GLD. The pdf and cdf associated with (1.1) can be expressed as [10, page 127] where and are parametric forms of the pdf and cdf with the mappings and with , , , and where 1 and are the regular uniform pdf and cdf, respectively. It is assumed that in (1.2) to ensure a valid pdf that is, the transformation in (1.1) is strictly increasing. An essential requirement for a valid pdf is that , , in (1.1) all have the same sign [1, page 24]. For more specific details on the parameter space and conditions related to valid GLDs, see Karian and Dudewicz [1, pages 21–47]. Provided in Figure 1 is an example of a valid GLD pdf based on (1.1) and (1.2).
Symmetric GLDs are produced for the case where in (1.1) and where the mean (), variance (), skew (), and kurtosis () can be determined from [8] Numerical solutions for and in (1.5) and (1.7) can be found in [1, Appendix B], which are associated with standardized GLDs (i.e., , ). Note that the term in (1.5) and (1.7) represents the complete beta function where the arguments cannot be negative. As such, for the th moment to exist then we must have [8, 9, 11]. Thus, the condition ensures that the first four moments exist.
We propose a new family of asymmetric GLDs based on a technique referred to herein as doubling symmetric GLDs. More specifically, a family of double GLDs can be created by selecting a pair of constants (, ) and transforming separately for and as follows: where () is the nonzero shape parameter for the left (right) side of a distribution and is the (positive) parameter that determines the height of the double GLD at . Provided in Figure 2 is an example of a double GLD pdf based on (1.2) and (1.8). Inspection of Figures 1 and 2 clearly indicates that these two GLDs are markedly different even though both distributions have the same values of skew () and kurtosis (). Note that the values of and in Figure 2 were determined based on the equations for and given in the appendix.
Conventional moment-based estimators (e.g., ) have unfavorable attributes insofar as they can be substantially biased, have high variance, or can be influenced by outliers. For example, inspection of Figure 2 indicates, on average, that the estimates of are only 67.70% and 14.70% of their associated population parameters. Note that each estimate of in Figure 2 was calculated based on samples of size and the formulae currently used by most commercial software packages such as SAS, SPSS, and Minitab, for computing skew and kurtosis.
-moment-based estimators, such as -skew () and -kurtosis (), have been introduced to address the limitations associated with conventional moment-based estimators [12, 13]. Specifically, some of the advantages that -moments have over conventional moments are that they (a) exist whenever the mean of the distribution exists, (b) are nearly unbiased for all sample sizes and distributions, and (c) are more robust in the presence of outliers. For example, the estimates of in Figure 2 are relatively much closer to their respective parameters with much smaller standard errors than their corresponding conventional moment-based analogs . More specifically, the estimates of that were simulated are, on average, 98.89% and 99.23% of their parameters.
In the context of multivariate data generation, the methodology has been developed for simulating symmetric (or asymmetric) GLDs based on (1.1) with specified Pearson correlation structures [2, 14]. This methodology is based on conventional product moments and the popular NORTA (NORmal To Anything, [15]) approach, which begins with generating multivariate standard normal deviates. However, the NORTA approach is not without its limitations. Specifically, one limitation arises because the Pearson correlation is not invariant under nonlinear strictly increasing transformations such as (1.1). As such, the NORTA approach must begin with the computation of an intermediate correlation (IC) matrix, which is different than the specified correlation matrix between the GLDs. The purpose of the IC matrix is to adjust for the effect of the transformation in (1.1) such that the resulting GLDs have their specified skew, kurtosis, and Pearson correlation matrix.
Two additional limitations associated with the NORTA approach in this context are that solutions to an IC matrix may neither (a) exist in the range of [–1, +1] as the absolute values of the ICs must be greater than (or equal to) their specified Pearson correlations nor (b) yield a positive definite IC matrix albeit the specified Pearson correlation matrix is positive definite [16]. Further, these two problems can be exacerbated when heavy tailed distributions are involved in the computation of ICs as functions performing numerical integration can more frequently either fail to converge or yield incorrect solutions. In contradistinction, it has been demonstrated in the context of the -correlation that the limitations associated with the NORTA approach are less pronounced because the solution values of an IC matrix are in closer proximity to their specified (positive definite) -correlation matrix [17].
In view of the above, the present aim is to derive the double GLD family of distributions based on (1.8) in the contexts of -moment and -correlation theory. Specifically, the purpose of this paper is to develop the methodology and a procedure for simulating double GLDs with specified -moments and -correlations. The primary advantages of the proposed procedure are that estimates of -skew, -kurtosis, and -correlation are less biased and more efficient.
The remainder of this paper is organized into four sections. The next section provides a summary of univariate -moment theory and the derivations of the system of equations and boundary conditions for generating double GLDs with specified values of -skew and -kurtosis. The section thereafter introduces the coefficient of -correlation and the equation is subsequently derived for determining ICs for specified -correlations between double GLDs. The steps for implementing the proposed -moment procedure are subsequently described. A numerical example and results of a simulation are also provided to confirm the derivations and compare the new -moment-based procedure with the conventional product-moment-based procedure. In the last section, the results of the simulation are discussed.
2. Methodology
2.1. Preliminaries
Let be random variables each with continuous pdf , cdf , order statistics denoted as , and -moments defined in terms of either linear combinations of (a) expectations of order statistics or (b) probability weighted moments (). For the purposes considered herein, the first four -moments associated with are expressed as [13, pages 20–22] where the are determined from where . The coefficients associated with in (2.5) are obtained from shifted orthogonal Legendre polynomials and are computed as shown in [13, page 20].
The -moments and in (2.1) and (2.2) are measures of location and scale and are the arithmetic mean and one-half the coefficient of mean difference, respectively. Higher-order -moments are transformed to dimensionless quantities referred to as -moment ratios defined as for , and where and are the analogs to the conventional measures of skew and kurtosis. In general, -moment ratios are bounded in the interval as is the index of -skew () where a symmetric distribution implies that all -moment ratios with odd subscripts are zero. Other smaller boundaries can be found for more specific cases. For example, the index of -kurtosis () has the boundary condition for continuous distributions of [18]
2.2. -Moments for Double GLDs
The family of double GLDs, associated with (1.8), based on -moments is less restrictive than the family based on conventional moments insofar as we may consider the nonzero parameters of for any distribution with finite order -moments rather than for the th-order conventional moment to exist. This advantage is attributed to Hosking’s Theorem 1 [12] which states that if the mean () exists, then all other -moments will have finite expectations.
As such, the family of double GLDs can be derived in the context of -moments by defining the probability weighted moments based on (2.5) in terms of in (1.1) and the regular uniform pdf and cdf as Integrating (2.7) for and using (2.1)–(2.4) yields Thus, given specified values of and , (2.10) and (2.11) can be numerically solved for the corresponding values of and . Note that the values of -skew () and -kurtosis () in (2.10) and (2.11) are independent of the value of selected in (1.8). Further, inspection of (2.10) and (2.11) indicates that interchanging values for the parameters and reverses the direction of and has no effect on .
For the special case of the double GLD is symmetric where in (2.10) and in (2.11) will simplify to the expression Differentiating (2.12) with respect to and equating the resulting expression to zero and solving for yields a minimum value of -kurtosis of where . As such, using (2.10) and (2.11) with and , provided in Figure 3 is a graph of the region for feasible combinations of and for double GLDs. Feasible combinations of and for (2.10) and (2.11) will lie in the region above the curve graphed in the , plane of Figure 3.
Provided in Figure 4 are some examples of various double GLDs. These distributions are used in the simulation portion of this study in Section 4. The next section begins with an introduction to the -correlation.
(a)
(b)
(c)
(d)
3. The -Correlation for Double GLDs
The coefficient of -correlation (see [19]) is introduced by considering two random variables and with continuous distribution functions and , respectively. The second -moments of and can alternatively be expressed as The second -comoments of toward and toward are As such, the -correlations of toward and toward are expressed as The -correlation in (3.5) or (3.6) is bounded such that where a value of () indicates a strictly increasing (decreasing) monotone relationship between the two variables. In general, we would also note that .
In the context of the -moment-based double GLD, suppose it is desired to simulate distributions of the form in (1.8) with a specified -correlation matrix and where each distribution has its own specified values of and . Let denote standard normal variables where the distribution functions and bivariate density function associated with and are expressed as Using (3.7), it follows that the th double GLD associated with (1.8) can be expressed as because . As such, using (3.5), the -correlation of toward can be evaluated using solved values of and for , a specified intermediate correlation (IC) in (3.9), and the following integral expressed as The double GLD in (3.10) is standardized by a linear transformation such that it has a mean of zero and one-half the coefficient of mean difference equal to that of the unit-normal distribution as where is the mean from (2.8) and is a constant that scales in (2.9) and in the denominator of (3.5) to as Analogously, the -correlation of toward is expressed as Note for the special case that if in (3.10) and in (3.13) have the same parameters that is, and , then . Provided in Algorithm 1 is source code written in Mathematica [20] that implements the computation of an IC () based on (3.10). The details for simulating double GLDs with specified-correlations are described in the next section.
|
4. The Procedure and Simulation Study
To implement the procedure for simulating double GLDs with specified -moments and -correlations we suggest the following six steps. (1)Specify the -moments for transformations of the form in (1.8), that is, and obtain the solutions for the parameters of and by solving (2.10) and (2.11) using the specified values of -skew () and -kurtosis () for each distribution. Specify a matrix of -correlations () for toward , where . (2)Compute the (Pearson) intermediate correlations (ICs) by substituting the solutions of and from Step (1) into (3.10) and then numerically integrate to solve for (see Algorithm 1 for an example). Repeat this step separately for all pairwise combinations of correlations. (3)Assemble the ICs into a matrix and decompose this matrix using a Cholesky factorization. Note that this step requires the IC matrix to be positive definite. (4)Use the results of the Cholesky factorization from Step (3) to generate standard normal variables correlated at the intermediate levels as follows: where are independent standard normal random variables and where represents the element in the th row and the th column of the matrix associated with the Cholesky factorization performed in Step 3. (5)Substitute from Step (4) into the following Taylor series-based expansion for the standard normal cdf [21]: where denotes the standard normal pdf and where the absolute error associated with (4.2) is less than . (6)Substitute the regular uniform deviates, , generated from Step (5) into the equations of the form in (1.8), as noted in Step (1), to generate the double GLDs with the specified -moments and -correlations.
To demonstrate the steps above and evaluate the proposed procedure, a comparison between the new -moment and conventional product moment-based procedures is subsequently described. Specifically, the distributions in Figure 4 are used as a basis for a comparison using the specified correlation matrices in Table 1 where both strong and moderate levels of correlation are considered. Tables 2 and 3 give the solved IC matrices for the conventional moment- and -moment-based procedures, respectively. See Algorithm 2 for the algorithm and an example for computing ICs for the conventional procedure. Tables 4 and 5 give the results of the Cholesky decompositions on the IC matrices, which are then used to create with the specified ICs by making use of the formulae given in (4.1) of Step (4) with . The values of are subsequently transformed to using (4.2) and then substituted into equations of the form in (1.8) to produce for both procedures.
|
In terms of the simulation, a Fortran algorithm was written for both procedures to generate 25,000 independent sample estimates for the specified parameters of (a) conventional skew (), kurtosis (), and Pearson correlation (); (b) -skew (), -kurtosis (), and -correlation (). The estimates of were based on samples of size and the estimates of were based on sample sizes of and . The estimates for were based on Fisher’s -statistics, that is, the formulae currently used by most commercial software packages such as SAS, SPSS, and Minitab, for computing indices of skew and kurtosis (where for the standard normal distribution). The formulae used for computing estimates for were Headrick’s equations and [22]. The estimate for was based on the usual formula for the Pearson product-moment of correlation statistic and the estimate for was computed based on (3.5) using the empirical forms of the cdfs in (3.1) and (3.3). The estimates for and were both transformed using Fisher’s transformation. Bias-corrected accelerated bootstrapped average estimates, confidence intervals (C.I.s), and standard errors were subsequently obtained for the estimates associated with the parameters (, , , ) using 10,000 resamples via the commercial software package Spotfire S+ [23]. The bootstrap results for the estimates of and were transformed back to their original metrics. Further, if a parameter () was outside its associated bootstrap C.I., then an index of relative bias (RB) was computed for the estimate () as . The results of the simulation are reported in Tables 6, 7, 8, 9, 10, 11 and are discussed in the next section.
5. Discussion and Conclusion
One of the advantages that -moment ratios have over conventional moment-based estimators is that they can be far less biased when sampling is from distributions with heavy tails [13, 19]. Inspection of the simulation results in Tables 6 and 7 clearly indicates that this is the case. Specifically, the superiority that estimates of -moment ratios () have over their corresponding conventional moment-based counterparts () is obvious. For example, with samples of size the estimates of skew and kurtosis for Distribution 1 were, on average, only 64.07% and 10.22% of their associated population parameters, whereas the estimates of -skew and -kurtosis were 98.6% and 99.35% of their respective parameters. It is also evident from comparing Tables 6 and 7 that -skew and -kurtosis are more efficient estimators as their standard errors are substantially smaller and more stable than the conventional moment-based estimators of skew and kurtosis.
Presented in Tables 8, 9, 10, 11 are the results associated with the conventional Pearson and -correlations. Overall inspection of these tables indicates that the -correlation is superior to the Pearson correlation in terms of relative bias. For example, for strong correlations () the relative bias for the two heavy tailed distributions (i.e., distributions 1 and 2) was 8.66% for the Pearson correlation compared to only 1.17% for the -correlation. Further, for large sample sizes (), the -correlation bootstrap C.I.s contained all of the population parameters, whereas the Pearson correlation C.I.s contained none of the parameters. It is also noted that the variability of the -correlation appears to be more stable than that of the Pearson correlation both within and across the different conditions.
In summary, the new -moment-based procedure is an attractive alternative to the traditional conventional moment-based procedure. In particular, the -moment-based double GLD procedure has distinct advantages when distributions with heavy tails are used. Finally, we note that Mathematica Version 8.0.1 [20] source code is available from the authors for implementing both the conventional and new -moment-based procedures.
Appendix
System of Conventional Moment-Based Equations for Double GLDs
The moments () based on (1.8) can be determined from The mean, variance, skew, and kurtosis are in general (e.g., [24]) In terms of the double GLD in Figure 2, setting in (A.1) for the unit normal distribution, the moments associated with the location and scale parameters in (A.2) are and the moments related to skew and kurtosis are as follows: where , and are the regularized hypergeometric, gamma, and incomplete beta functions, respectively.