Abstract

This paper introduces two families of distributions referred to as the symmetric κ and asymmetric - distributions. The families are based on transformations of standard logistic pseudo-random deviates. The primary focus of the theoretical development is in the contexts of L-moments and the L-correlation. Also included is the development of a method for specifying distributions with controlled degrees of L-skew, L-kurtosis, and L-correlation. The method can be applied in a variety of settings such as Monte Carlo studies, simulation, or modeling events. It is also demonstrated that estimates of L-skew, L-kurtosis, and L-correlation are superior to conventional product-moment estimates of skew, kurtosis, and Pearson correlation in terms of both relative bias and efficiency when moderate-to-heavy-tailed distributions are of concern.

1. Introduction

Monte Carlo investigations often require the need for univariate or multivariate nonnormal distributions with specified conventional product moments. For example, it is common practice for methodologists to investigate the Type I error and power properties associated with inferential statistics under various degrees of nonnormality. In many cases, these investigations may only require an elementary transformation on standard normal or zero-one uniform deviates to produce nonnormal distributions with specified values of conventional skew and kurtosis. More specifically, the popular power method (PM) [13] and the generalized lambda distribution (GLD) [46] are two basic transformations that could be used for this purpose.

However, conventional-moment-based estimators of skew and kurtosis have unfavorable attributes insofar as they can be substantially biased, have high variance, or can be influenced by outliers [7, 8]. In view of this, -moment-based estimators such as -skew and -kurtosis were introduced to address the limitations associated with conventional moment-based estimators. Specifically, some of the advantages that -moments have over conventional moments are that they (a) exist whenever the mean of the distribution exists, (b) are nearly unbiased for all sample sizes and distributions, and (c) are more robust in the presence of outliers [710]. In view of these advantages, both the PM and GLD transformations have been characterized in the context of -moments to enable the simulation of univariate nonnormal distributions with specified values of -skew and -kurtosis [6, 1113].

In addition to issues of a transformation’s simplicity and ease of execution, another important criterion is its ability to simulate nonnormal distributions with specified correlation structures. To meet this criterion, methodologists have extended the conventional-moment-based PM and GLD transformations from univariate to multivariate data generation in the context of the Pearson correlation [2, 3, 1417]. However, and although these product-moment-based methods are commonly used [3, pages 2-3], they would not make feasible extensions of the univariate -moment-based PM or GLD transformations because the second -moment and the -correlation are based on the coefficient of mean difference and not the variance or standard deviation [7, 8, 18].

Thus, the primary goal of this paper is to derive two families of distributions based on transformations that produce continuous symmetric and asymmetric nonnormal distributions with specified values of -skew, -kurtosis, and -correlation. These two families of distributions are referred to herein as the symmetric and asymmetric - distributions. The proposed transformations are computationally efficient because they only require the knowledge of one or two parameters (, or , ) and an algorithm that generates standard logistic pseudo-random deviates. We would note that the proposed transformations are analogs to the -moment-based Tukey and distributions [19]. However, the and - transformations can generate more leptokurtic and elongated distributions as they are based on the logistic distribution as opposed to the standard normal based or distributions.

The remainder of the paper is outlined as follows. In Section 2, the parametric forms of the probability density function, cumulative distribution function, and other properties associated with the and - distributions are derived. In Section 3, a summary of univariate -moment theory is first provided. The derivations of the systems of equations for specifying values of -skew and -kurtosis for the and - distributions are subsequently provided. The boundary region for feasible combinations of -skew and -kurtosis is also provided. In Section 4, the coefficient of -correlation is first introduced and then the equations are developed for determining intermediate correlations for specified -correlations between and (or) - distributions. In Section 5, the steps for implementing the new method are described. A numerical example and the results of a simulation are provided to confirm the derivations and compare the new method with its conventional-moment-based counterpart. In Section 6, the results of the simulation are discussed.

2. Methodology

The derivation of the probability density function (pdf) and the cumulative distribution function (cdf) associated with the and - families of distributions begins with the following definitions.

Definition 2.1. Let be a random variable that has a standard logistic distribution with pdf and cdf expressed as Let be the auxiliary variable that maps the parametric curves of (2.1) and (2.2) as

Definition 2.2. The analytical and empirical forms of the quantile functions for symmetric distributions are defined as where is a real-valued parameter that controls the tail weight of a distribution.

Definition 2.3. The analytical and empirical forms of the quantile functions for asymmetric - distributions are defined as where () is the real-valued parameter that controls the left (right) tail of a - distribution. Note that if , then (2.6) and (2.7) are equivalent to the symmetric distribution in (2.4) and (2.5).

The explicit forms of the derivatives associated with (2.4) and (2.6) are where it is assumed that , that is, the transformations in (2.4) and (2.6) are strictly increasing monotone functions and where at in (2.8).

Proposition 2.4. If the compositions and map the parametric curves of and where as then (2.9) and (2.10) are the parametric forms of the pdf and cdf associated with the quantile functions in in (2.5) and (2.7), respectively.

Proof. It is first shown that in (2.9) has the properties:

Property 1. .

Property 2. .
To prove Property 1, let be a function where . Thus, given that and in in (2.9) we have which integrates to one because is the standard logistic pdf in (2.1). To show Property 2, it is given by definition that and it is assumed that . Hence, because will be nonnegative on the support of for all and where the provided that for symmetric distributions or for the case of asymmetric distributions we must have and .

A corollary to Proposition 2.4 is stated as follows.

Corollary 2.5. The derivative of the cdf in (2.10) is the pdf in (2.9).

Proof. It follows from and in in (2.10) that and . Hence, using the parametric form of the derivative we have in (2.9). Whence, . Thus, in (2.9) and in (2.10) are the parametric forms of the pdf and cdf associated with the quantile functions in (2.5) and (2.7).

Remark 2.6. Inspection of (2.1), (2.4), (2.6), and (2.9) indicates that the height of any symmetric or asymmetric - distribution at will be .

Definition 2.7. If the monotonicity assumption () holds in (2.9) for all , which requires , then (2.9) is defined as a global pdf.

Remark 2.8. The mode associated with a global pdf in (2.9) is located at , where is the critical number that solves and globally maximizes at . It is noted that the pdf in (2.9) will have a global maximum because the standard logistic pdf in (2.1) has a global maximum and the transformation is assumed to be a strictly increasing monotone function.

Remark 2.9. The median associated with a global pdf in (2.9) is located at . This can be shown by letting and denote the coordinates in the cdf in (2.10) that are associated with the 50th percentile. In general, we must have such that holds in (2.10) for the standard logistic distribution. As such, the limit of the quantile function locates the median at

Remark 2.10. The monotonicity assumption () also holds in (2.9) for cases where and ( or ). This leads to Definition 2.11.

Definition 2.11. A symmetric or asymmetric - distribution has a local pdf if the monotonicity assumption () holds in accordance to Remark (2.5) and from (2.10) where is a specified threshold probability (e.g. ).

Figures 1 and 2 give examples of symmetric and asymmetric - distributions based on (2.9) and (2.10). Figures 1(a) and 1(b) are examples of symmetric global pdfs. Figure 2(a) is an example of an asymmetric global pdf and Figure 2(b) is an example of an asymmetric local pdf. The point where monotonicity fails for the local pdf in Figure 2(b) is located at where making use of (2.10) yields , which would not pose a serious limitation for most purposes. In the next section the -moments for and - distributions are derived and other distributional properties are discussed after a preliminary discussion of univariate -moments.

3. -Moments for and - Distributions

3.1. Preliminaries: Univariate -Moments

Let be independent and identically distributed random variables each with continuous pdf , cdf , order statistics denoted as , and -moments defined in terms of either linear combinations of (a) expectations of order statistics or (b) probability-weighted moments (). For the purposes considered herein, the first four -moments associated with are expressed as [8, pages 20–22] where the are determined from where . The coefficients associated with in (3.5) are obtained from shifted orthogonal Legendre polynomials and are computed as shown in [8, page 20] or in [11].

The -moments and in (3.1) and (3.2) are measures of location and scale and are the arithmetic mean and one-half the coefficient of mean difference, respectively. Higher order -moments are transformed to dimensionless quantities referred to as -moment ratios defined as for , and where and are the analogs to the conventional measures of skew and kurtosis. In general, -moment ratios are bounded in the interval as is the index of -skew () where a symmetric distribution implies that all L-moment ratios with odd subscripts are zero. Other smaller boundaries can be found for more specific cases. For example, the index of -kurtosis () has the boundary condition for continuous distributions of [20]:

3.2. -Moments for Symmetric and Asymmetric - Distributions

The derivation of the first four -moments associated with symmetric distributions begins by defining the probability weighted moments based on (3.5) in terms of in (2.4) and the standard logistic pdf and cdf in (2.1) and (2.2) as Integrating (3.7) for and using (3.1)–(3.4) gives , , , and as where , and . The notations and in (3.9) and (3.11) are the polygamma and harmonic number functions, respectively. The argument for must be positive in and thus a -distribution with a global pdf and defined values of , , , requires that .

The derivation of the -moments for asymmetric - distributions associated with (2.6) begins with determining the probability-weighted moments in (3.5) by separately evaluating and summing two integrals as

As such, using (3.12) to obtain ,…, and subsequently substituting these terms into (3.1)–(3.4) we have where denotes , where the polygamma functions are , , , , ,  , , and . Thus, given specified values of and , (3.15) and (3.16) can be numerically solved to obtain the corresponding values of and . Analogous to -distributions, an asymmetric - distribution with a global pdf and defined values of , , , will require both and . Note that careful inspection of (3.15) and (3.16) reveals that interchanging values for the parameters and reverse the direction of and has no effect on . Further, if , then equations (3.13)–(3.16) simplify to (3.8)–(3.11) associated with the symmetric family of distributions.

In terms of boundary conditions, the minimum value of -kurtosis for global pdfs is where , which is associated with the standard logistic pdf in (2.1). As such, using (3.15) and (3.16) with and , provided in Figure 3 is a graph of the region for feasible combinations of and for asymmetric - global pdfs. Feasible combinations of -skew and -kurtosis for (3.15) and (3.16) will lie in the region above the curve graphed in the ,   plane of Figure 3.

The conventional-moment-based system for - distributions is given in the appendix. This system was used to determine the values of skew and kurtosis associated with the distributions given in Figures 1 and 2. It is worthy to point out that the conventional moment based system has a disadvantage in terms of moment existence. That is, the integral in (A.1) of the appendix reveals that for the th moment to exist we must have and to yield appropriate polygamma functions. More specifically, if the mean, variance, skew, and kurtosis exist, then the parameters must be bounded in the range of and . The advantage that the -moment system has in this context is attributed to Hosking’s Theorem 1 [7] which states that if the mean () exists, then all other -moments will have finite expectations.

Figure 4 provides some additional examples of and - distributions. These distributions will be used in the simulation portion of this study in Section 5. Note that the distributions associated with Figures 4(a) and 4(b) are local pdfs with probabilities at the point where monotonicity fails of and , which do not pose any limitation for the purposes considered herein. In the next section we first introduce the topic of the -correlation and subsequently develop the methodology for simulating - (or ) distributions with specified -correlations.

4. -Correlations for - (or ) Distributions

4.1. Preliminaries: The -Correlation

The coefficient of -correlation [18] is introduced by considering two random variables and with distribution functions and , respectively. The second -moments of and can alternatively be expressed as The second -comoments of toward and toward are As such, the -correlations of toward and toward are expressed as The -correlation in (4.5) or (4.6) is bounded such that where a value of () indicates a strictly increasing (decreasing) monotone relationship between the two variables. In general, we would also note that .

4.2. The -Correlation for - (or ) Distributions

In the context of -moment-based - (or ) distributions, suppose it is desired to simulate a -variate distribution from quantile functions of the form in (2.7) with a specified -correlation matrix and where each distribution has its own specified values of and . Let denote standard normal variables where the distribution functions and bivariate density function associated with and are expressed as: Using (4.7), it follows that the th - distribution associated with (2.7) can be expressed as (())) where is standard logistic because . As such, using (4.5), the -correlation of (())) toward (())) can be evaluated using solved values of and for (())), a specified intermediate correlation (IC) in (4.9), and the following integral expressed as We would point out that the purpose of the IC () in (4.9) and (4.10) is to adjust for the effect of the transformation (())), which is induced by the and parameters, such that (())) has its specified -correlation () toward (())). Further, to simplify the computation, the - distribution in (4.10) is standardized by a linear transformation such that it has a mean of zero and one-half the coefficient of mean difference equal to that of the unit-normal distribution as where is the mean from (3.13) and is a constant that scales in (3.14) and in the denominator of (4.5) to as Analogously, the -correlation of (())) toward (())) is expressed as Note also for the special case that if (())) in (4.10) and (())) in (4.13) have the same parameters, that is, and , then . Provided in Algorithm 1 is Mathematica [21] source code that implements the computation of an IC () based on (4.10). The details for simulating - (or ) distributions with specified values of -skew, -kurtosis, and -correlations are described in the next section.

(*Intermediate Correlation*)
;
Needs[“MultivariateStatistics’”]
PDF[MultinormalDistribution[ ], ];
DF[NormalDistribution[ ], ];
DF[NormalDistribution[ ], ];
(*Parameters for Distribution 4(a) in Figure 4 )
;
;
og[ /( )];
;
;
(* Standardizing constants from (3.13) and from (4.12)*)
;
;
(*Compute the specified -correlation*)
NIntegrate[Piecewise[ ] , ,
, Method MultiDimensional]
0.70

5. The Method and Simulation Study

To implement the method for simulating - (or ) distributions with specified -moments and -correlations we suggest the following six steps.(1) Specify the -moments for transformations of the form in (2.7), that is, and obtain the solutions for the parameters of and by solving (3.15) and (3.16) using the specified values of -skew () and L-kurtosis () for each distribution. Specify a matrix of -correlations () for (())) toward (())), where .(2) Compute the (Pearson) intermediate correlations (ICs) by substituting the solutions of and from Step 1 into (4.10) and then numerically integrate to solve for (see Algorithm 1 for an example). Repeat this step separately for all pairwise combinations of correlations.(3) Assemble the ICs into a matrix and decompose this matrix using a Cholesky factorization. Note that this step requires the IC matrix to be positive definite.(4) Use the results of the Cholesky factorization from Step 3 to generate standard normal variables correlated at the intermediate levels as follows: where are independent standard normal random variables and where represents the element in the th row and the th column of the matrix associated with the Cholesky factorization performed in Step 3.(5) Substitute from Step 4 into the following Taylor series-based expansion for the standard normal cdf [22]: where denotes the standard normal pdf and where the absolute error associated with (5.2) is less than .(6) Substitute the zero-one uniform deviates, , generated from Step 5 into the equations of the form of (())), where is standard logistic to generate the - distributions with the specified -moments and -correlations.

To demonstrate the steps above and evaluate the proposed method, a comparison between the proposed -moment and conventional product-moment based procedures is subsequently described. Specifically, the distributions in Figure 4 are used as a basis for a comparison using the specified correlation matrices in Table 1 where both strong and moderate levels of correlation are considered. Tables 2 and 3 give the solved IC matrices for the conventional moment and -moment-based methods, respectively. See Algorithm 2 for an example for computing ICs for the conventional method. Tables 4 and 5 give the results of the Cholesky decompositions on the IC matrices, which are then used to create with the specified ICs by making use of the formulae given in (5.1) of Step 4 with . The values of are subsequently transformed to using (4.2) and then substituted into equations of the form in (2.5) and (2.7) to produce (())) for both methods.

(*Intermediate Correlation*)
;
Needs[“MultivariateStatistics’”]
DF[MultinormalDistribution[ ], ];
DF[NormalDistribution[ ], ];
DF[NormalDistribution[ ], ];
(*Parameters for Distributions 4(a) and 4(d) in Figure 4 )
0.044817709;
.1704343967;
.0;
.0;
og[ /( )];
og[ /( )];
;
;
;
;
(*Standardizing constants from (3.13) and from (A.2) from the appendix*)
;
;
;
;
(*Compute the specified conventional Pearson correlation*)
NIntegrate[Piecewise[ ] Piecewise[ ,
]*
, , , Method MultiDimensional]
0.60

In terms of the simulation, a Fortran algorithm was written for both methods to generate 25,000 independent sample estimates for the specified parameters of: (a) conventional skew (), kurtosis (), and Pearson correlation () and (b) L-skew (), -kurtosis (), and -correlation (). All estimates were based on sample sizes of and . The formulae used for computing estimates of were based on Fisher’s -statistics, that is, the formulae currently used by most commercial software packages such as SAS, SPSS, Minitab, for computing indices of skew and kurtosis (where for the standard normal distribution). The formulae used for computing estimates of were Headrick’s Equations 2.4 and 2.6 [11]. The estimate for was based on the usual formula for the Pearson product moment of correlation statistic and the estimate for was computed based on (4.5) using the empirical forms of the cdfs in (4.1) and (4.3). The estimates for and were both transformed using Fisher’s transformation. Bias-corrected accelerated bootstrapped average estimates, confidence intervals (C.I.s), and standard errors were subsequently obtained for the estimates associated with the parameters (, , , ) using 10,000 resamples via the commercial software package Spotfire S+ [23]. The bootstrap results for the estimates of and were transformed back to their original metrics. Further, if a parameter (P) was outside its associated bootstrap C.I., then an index of relative bias (RB) was computed for the estimate (E) as: . The results of the simulation are reported in Tables 611 and are discussed in the next section.

6. Discussion and Conclusion

Headrick [11] demonstrated the advantages that -moment ratios have over conventional-moment-based estimators in terms of relative bias and efficiency when sampling was from power method distributions with either moderate or heavy tails. Inspection of the simulation results in Table 6 and Table 7 of this study clearly indicates that this is also the case for the family of and - distributions. Specifically, the superiority that estimates of -moment ratios () have over their corresponding conventional-moment-based counterparts () is obvious. For example, with samples of size the estimates of skew and kurtosis for Distribution 1 were, on average, only 28% and 2.5% of their associated population parameters, whereas the estimates of -skew and -kurtosis were 88.8% and 93.7% of their respective parameters. It is also evident from comparing Tables 6 and 7 that -skew and -kurtosis are more efficient estimators as their relative standard errors RSE (standard error/estimate) 100 are substantially smaller than the conventional estimators of skew and kurtosis. For example, in terms of Distribution 1 (), inspection of Tables 6 and 7 indicates RSE measures of: RSE() % and RSE() % compared with RSE() % and RSE() %. This demonstrates that -skew and -kurtosis have more precision because they have less variance around their estimates.

Presented in Tables 811 are the results associated with the conventional Pearson and -correlations. Overall inspection of these tables indicates that the -correlation is substantially superior to the Pearson correlation in terms of relative bias. For example, in terms of strong correlations () the relative bias for the two heavier tailed distributions (distributions 1 and 2) was 12.24% for the Pearson correlation compared to only 0.99% for the -correlation. Further, for large sample sizes (), the -correlation bootstrap C.I.s contained all of the population parameters, whereas the Pearson correlation C.I.s contained none of the parameters. It is also noted that the variability of the -correlation appears to be more stable than that of the Pearson correlation both within and across the different conditions.

In summary, the new -moment-based method is an attractive alternative to the traditional conventional-moment-based procedure. In particular, the -moment based procedure has distinct advantages when distributions with moderate and heavy tails are used. Finally, we would note that Mathematica Version 8.0 [21] source code is available from the authors for implementing the -moment-based method.

Appendix

A. System of Conventional-Moment-Based Equations for - Distributions

The moments () associated with (2.7) can be determined from

The mean, variance, skew, and kurtosis are in general (e.g., [24])

The moments associated with the location and scale parameters in (A.2) are where the polygamma functions are , , , , , , , and , .

The notation is the Hurwitz zeta function. The moments related to skew and kurtosis are as follows: where, , , , , , , , , , , , . The notation in (A.5) is the zeta function. Note that the moments for symmetric distributions can be easily obtained by setting in (A.3)–(A.6) and simplifying.