Abstract
Power method polynomial transformations are commonly used for simulating continuous nonnormal distributions with specified moments. However, conventional moment-based estimators can (a) be substantially biased, (b) have high variance, or (c) be influenced by outliers. In view of these concerns, a characterization of power method transformations by L-moments is introduced. Specifically, systems of equations are derived for determining coefficients for specified L-moment ratios, which are associated with standard normal and standard logistic-based polynomials of order five and three. Boundaries for L-moment ratios are also derived, and closed-form formulae are provided for determining whether a power method distribution has a valid probability density function. It is demonstrated that L-moment estimators are nearly unbiased and have relatively small variance in the context of the power method. Examples of fitting power method distributions to theoretical and empirical distributions based on the method of L-moments are also provided.
1. Introduction
The power method (see Fleishman [1], Headrick [2], and [3–7]) is a traditional procedure used for simulating continuous nonnormal distributions. Some applications of the power method have included such topics as ANOVA [8–10], asset pricing theories [11], business-cycle features [12], cluster analysis [13], item parameter estimation [14], item response theory [15, 16], factor analysis [17–19], price risk [20], structural equation models [21–24], and toxicology [25].
In the context of univariate data generation and for the purposes considered herein, the power method can be generally summarized by the polynomial transformation as in Headrick [2, equation (2.7)] where can be either a standard normal or a standard logistic (L) random variable with probability density function (pdf) and cumulative distribution function (cdf) Setting (or ) in (1.1) gives the Fleishman [1] (or Headrick [3]) class of distributions associated with standard normal-based polynomials. The shape of in (1.1) is contingent on the coefficients , which are determined by moment-matching techniques. For example, see Headrick [2, equations (2.18)–(2.21); (2.22)–(2.25)] for determining the coefficients associated with standard normal or standard logistic-based polynomials for in (1.1).
The pdf and cdf associated with in (1.1) are given in parametric form () as in Headrick [2, equations (2.12) and (2.13)] where the derivative in (1.3), that is, the polynomial is a strictly increasing monotonic function and where and are the pdf and cdf associated with the random variable in (1.2). To demonstrate, Figure 1 gives the graphs of a pdf and cdf associated with a standard normal-based power method distribution used in a Monte Carlo study by Berkovits et al. [8] (and, similarly, Enders [21] and Olsson et al. [24]).
[Figure 1: panels (a) and (b), the pdf and cdf of a standard normal-based power method distribution]
The graphs in Figure 1 and numerical solutions for the coefficients , which are based on the shape parameters in Table 1, were obtained using (1.3), (1.4), and the software package developed by Headrick et al. [26]. Note that the parameters in Table 1, (skew), (kurtosis), , and , are standardized cumulants and are scaled such that the normal distribution would have values of .
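The parametric form referred to in (1.3) and (1.4) can be traced numerically. The following is a minimal Python sketch under stated assumptions: the base variable is standard normal, the polynomial is indexed as p(z) = c1 + c2 z + ... + cm z^(m-1), the pdf at the point p(z) is taken to be φ(z)/p′(z), and the cdf there is Φ(z). The function name `power_method_curves` and the coefficient values are hypothetical and are not the coefficients associated with Table 1.

```python
import math
import numpy as np

def power_method_curves(coeffs, z):
    """Trace (x, pdf, cdf) parametrically over the grid z.

    coeffs holds (c1, ..., cm) for p(z) = c1 + c2*z + ... + cm*z**(m-1).
    Assumes a standard normal base variable and p'(z) > 0 on the grid.
    """
    c = np.asarray(coeffs, dtype=float)
    z = np.asarray(z, dtype=float)
    powers = np.arange(len(c))
    # x = p(z): the abscissa of the transformed variable
    x = (c * z[:, None] ** powers).sum(axis=1)
    # p'(z) = sum over k >= 1 of k * c_{k+1} * z**(k-1)
    dp = (c[1:] * powers[1:] * z[:, None] ** powers[:-1]).sum(axis=1)
    if np.any(dp <= 0.0):
        raise ValueError("p(z) is not strictly increasing on this grid")
    phi = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    Phi = np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in z])
    return x, phi / dp, Phi

# Hypothetical third-order coefficients, chosen only so that p'(z) > 0.
z_grid = np.linspace(-4.0, 4.0, 801)
x_vals, pdf_vals, cdf_vals = power_method_curves((0.0, 0.9, 0.0, 0.03), z_grid)
```

Plotting `pdf_vals` and `cdf_vals` against `x_vals` gives the two curves. The monotonicity check is made only on the supplied grid, so it is a necessary condition on that grid rather than a proof that the pdf is valid everywhere.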
Conventional moment-based estimators, such as in Table 1, have unfavorable attributes to the extent that they can be substantially biased, have high variance, or can be influenced by outliers and thus may not be representative of the true population parameters (e.g., see [27, 28]). Some of these attributes are exemplified in Table 1 as the estimates of and their respective bootstrap confidence intervals attenuate below the population parameters of with increased bias and variance as the order of the estimate increases. That is, on average, the estimates of are only 83.67%, 63.81%, 38.35%, and 21.38% of their associated population parameters, respectively.
The estimates of in Table 1 were calculated based on samples of size and Fisher’s -statistics (see, e.g., [29, pages 299-300]), that is, the formulae currently used by most commercial software packages such as SAS, SPSS (PASW), Minitab, and so forth, for computing values of skew and kurtosis. Thus, it should also be pointed out that these estimates () have another undesirable property of being algebraically bounded based on the sample size, that is, , , , . As such, if a researcher was using a value of kurtosis in a Monte Carlo study, such as in Table 1, and drawing samples of size , then the largest sample estimate of kurtosis possible is or 71.43% of the parameter.
The method of L-moments is an attractive alternative to the conventional method of moments in terms of describing theoretical or empirical probability distributions, estimating parameters, and hypothesis testing (see [27, 28, 30]). More specifically, the first four L-moments are analogous to conventional moments as they describe the location, scale, skew, and kurtosis of a data set. However, L-moments have been demonstrated to be superior to conventional moments to the extent that they (a) exist whenever the mean of the distribution exists, (b) only require that a distribution have finite variance for their standard errors to be finite, (c) are nearly unbiased for all sample sizes and distributions, (d) are less subject to the deleterious effects of sampling variability, (e) are more robust in the presence of outliers, and (f) are not algebraically bounded based on sample size (see [27, 28, 30–33]). Further, it has been demonstrated that there are conditions where the method of L-moments can also yield more accurate and efficient parameter estimates than the method of maximum likelihood when sample sizes are small to moderate (see [27, 34–37]). Other advances have also been made. For example, Elamir and Seheult (see [38, 39]) introduced trimmed L-moments and derived expressions for the exact variances and covariances of sample L-moments. Further, Necir and Meraghni [40] demonstrated that L-moments and trimmed L-moments are useful for estimating L-functionals in the context of heavy-tailed distributions.
Estimates of L-moments and L-moment ratios are based on linear combinations of order statistics. Conventional moments, by contrast, are based on raising the data to successive powers, which gives greater weight to data points located farther away from the mean and thus may result in estimates () having substantial bias and/or variance. For example, the L-moment ratio estimates () in Table 2 are much closer to their respective population parameters () and have smaller variance than their corresponding conventional moment-based analogs () in Table 1. Specifically, the ratios () are, on average, 99.03%, 99.55%, 98.35%, and 99.29% of their respective parameters.
In view of the above, the present aim is to obviate the problems associated with conventional moments in the context of power method transformations of the form in (1.1) by characterizing these transformations through L-moments. The focus is on standard normal and standard logistic-based polynomials. In Section 2, the essential requisite information associated with L-moments for theoretical and empirical distributions is provided as well as the derivations for the systems of equations for computing polynomial coefficients. Closed-form formulae are provided for determining whether any particular polynomial has a valid pdf. Further, the boundary regions for valid pdfs are derived and graphed for polynomials of order three and for the symmetric case associated with polynomials of order five. In Section 3, conventional moments and L-moments are compared in terms of estimation and distribution fitting to demonstrate the superior characteristics that L-moments have in the context of the power method. In Section 4, third- and fifth-order standard normal and logistic-based polynomials are compared in terms of their upper and lower L-moment ratio boundary points.
2. Methodology
2.1. Preliminaries
L-moments are defined as linear combinations of probability weighted moments (PWMs). For a continuous theoretical distribution with a cdf denoted as , the PWMs can be generally defined as in Hosking [27] where . The L-moments can be determined by summing the PWMs as where are coefficients from shifted orthogonal Legendre polynomials. Specifically, the first six L-moments based on (2.1)–(2.3) are expressed as
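For reference, the definitions described in this paragraph are commonly written as follows in the notation of Hosking [27]. This is a restatement of the standard forms under the assumption that the notation here matches Hosking's; the symbols β_r (PWMs), λ_r (L-moments), and p*_{r,k} (shifted Legendre coefficients) are the conventional ones, and the display carries no equation numbers because it is not a reproduction of (2.1)–(2.4).

```latex
\begin{align*}
\beta_r &= \int x \,\{F(x)\}^{r} f(x)\,dx, \qquad r = 0, 1, 2, \ldots \\
\lambda_{r+1} &= \sum_{k=0}^{r} p^{*}_{r,k}\,\beta_k, \qquad
p^{*}_{r,k} = (-1)^{r-k}\binom{r}{k}\binom{r+k}{k} \\[4pt]
\lambda_1 &= \beta_0 \\
\lambda_2 &= 2\beta_1 - \beta_0 \\
\lambda_3 &= 6\beta_2 - 6\beta_1 + \beta_0 \\
\lambda_4 &= 20\beta_3 - 30\beta_2 + 12\beta_1 - \beta_0 \\
\lambda_5 &= 70\beta_4 - 140\beta_3 + 90\beta_2 - 20\beta_1 + \beta_0 \\
\lambda_6 &= 252\beta_5 - 630\beta_4 + 560\beta_3 - 210\beta_2 + 30\beta_1 - \beta_0
\end{align*}
```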
Analogous to conventional moment theory, the values of and are parameters associated with the location and scale of the distribution. More specifically, the L-mean () is the usual arithmetic mean, and L-scale () is one-half of Gini’s coefficient of mean difference (see, e.g., [29, pages 47-48]). Higher-order L-moments are transformed to dimensionless indices referred to as L-moment ratios defined as for . In general, L-moment ratios are bounded such that as is the index of L-skew () where a symmetric distribution implies that . Smaller boundaries can be found for specific cases. For example, in the context of continuous distributions, L-kurtosis and have boundaries of (see Jones [41]) which indicate that and have lower bounds of and , respectively. An example of a set of computed L-moments () and L-moment ratios () based on (2.1) through (2.4) is provided in the first column of Table 3 for a Beta distribution.
Empirical L-moments for a set of data of size are linear combinations of the sample order statistics . The unbiased estimates of the PWMs are for and where is the sample mean. The sample L-moments are obtained by substituting in place of in (2.4). The notations used for sample L-moments and L-moment ratios are for .
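The sample quantities described above can be sketched in a few lines of Python. This is an illustrative implementation, assuming the usual unbiased PWM estimator in which the i-th order statistic receives weight (i−1)(i−2)…(i−r) / [(n−1)(n−2)…(n−r)]; the function name `sample_lmoments` is hypothetical, and only the first four L-moments are computed.

```python
import numpy as np

def sample_lmoments(data):
    """Return (l1, l2, t3, t4): sample L-mean, L-scale, L-skew, L-kurtosis.

    Uses unbiased probability weighted moment estimates b0..b3 computed
    from the sample order statistics.
    """
    x = np.sort(np.asarray(data, dtype=float))
    n = x.size
    if n < 4:
        raise ValueError("at least four observations are required")
    i = np.arange(1, n + 1)          # ranks 1..n of the order statistics
    b = [x.mean()]                   # b0 is the sample mean
    w = np.ones(n)
    for r in range(1, 4):
        # Multiply in the next factor (i - r)/(n - r); the weight is zero
        # for every rank i <= r, as the estimator requires.
        w = w * (i - r) / (n - r)
        b.append(np.mean(w * x))
    b0, b1, b2, b3 = b
    l1 = b0
    l2 = 2.0 * b1 - b0
    l3 = 6.0 * b2 - 6.0 * b1 + b0
    l4 = 20.0 * b3 - 30.0 * b2 + 12.0 * b1 - b0
    return l1, l2, l3 / l2, l4 / l2

# Small illustrative data set (not from the paper).
l1, l2, t3, t4 = sample_lmoments([1.0, 2.0, 3.0, 4.0, 5.0])
```

Because the ratios divide by the sample L-scale, they are unchanged by a positive linear rescaling of the data, which is why the fitted polynomials can be standardized before location and scale are reintroduced.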
2.2. L-Moments for Standard Normal Polynomial Transformations
Using (1.1)-(1.2) and (2.1), the PWMs for power method polynomials based on the standard normal distribution are
Integrating (2.7) for and subsequently substituting these PWMs into (2.4) (and after several tedious manipulations) and simplifying yields the following system of equations for fifth-order polynomials (): where is the coefficient of mean difference. The analytical expressions for the constants in (2.11)–(2.13) are given in Appendix A.
The derivations above yield a system of six equations (2.8)–(2.13) expressed in terms of six real variables . The first two equations of this system can be standardized by setting (2.8) to zero and (2.9) to . The next four equations, (2.10)–(2.13), are set to the desired values of . Simultaneously solving this system of equations yields the solutions of . The coefficients are then substituted into (1.1) to generate , which has zero mean, an L-scale equal to one-half of the coefficient of mean difference for the unit-normal distribution, and the specified values of . If the negatives of and are desired, then inspection of (2.8)–(2.13) indicates that all that is needed are simultaneous sign reversals between , and . These sign reversals will have no effect on , or . It is worth pointing out that it is not necessary to numerically solve the system of equations in (2.8)–(2.13) as the coefficients have unique solutions which can be determined by evaluating equations (A.9)–(A.14) in Appendix A. See Figures 2(a) and 2(b) for examples of standard normal-based fifth-order power method pdfs.
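A solved set of coefficients can be checked numerically without the closed-form system. The sketch below is not the derivation in (2.8)–(2.13); it assumes only that, for a strictly increasing normal-based polynomial, the PWMs are the integrals of p(z) Φ(z)^r φ(z) over the real line, and it evaluates them with a simple Riemann sum. The function name `normal_poly_lmoments`, the grid settings, and the coefficient values are hypothetical. The last lines illustrate the sign-reversal property, assuming it refers to the coefficients of the even powers of z (c1, c3, c5 in the indexing used here).

```python
import math
import numpy as np

def normal_poly_lmoments(coeffs, half_width=10.0, n_grid=20001):
    """Return (l1, l2, t3, t4) of p(Z), Z standard normal, by quadrature.

    coeffs holds (c1, ..., cm) with p(z) = c1 + c2*z + ... + cm*z**(m-1).
    Assumes p is strictly increasing, so the cdf at p(z) equals Phi(z).
    """
    c = np.asarray(coeffs, dtype=float)
    z = np.linspace(-half_width, half_width, n_grid)
    dz = z[1] - z[0]
    phi = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    Phi = np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in z])
    p = np.polynomial.polynomial.polyval(z, c)
    # Probability weighted moments beta_0..beta_3 (Riemann sum; the
    # integrand is negligible at the ends of the grid).
    b0, b1, b2, b3 = [np.sum(p * Phi**r * phi) * dz for r in range(4)]
    l1 = b0
    l2 = 2.0 * b1 - b0
    l3 = 6.0 * b2 - 6.0 * b1 + b0
    l4 = 20.0 * b3 - 30.0 * b2 + 12.0 * b1 - b0
    return l1, l2, l3 / l2, l4 / l2

# Hypothetical coefficients with p'(z) = 0.9 + 0.2 z + 0.06 z^2 > 0.
base = (-0.1, 0.9, 0.1, 0.02, 0.0, 0.0)
# Reverse the signs of the even-power coefficients (c1, c3, c5).
flipped = (0.1, 0.9, -0.1, 0.02, 0.0, 0.0)
lm_base = normal_poly_lmoments(base)
lm_flipped = normal_poly_lmoments(flipped)
```

Under these assumptions, `lm_flipped` has the same L-scale and L-kurtosis as `lm_base` with the sign of L-skew reversed, since the flipped polynomial equals −p(−z) and Z is symmetric.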
[Figure 2: (a) normal-based pdf; (b) normal-based pdf; (c) logistic-based pdf]
Setting in (2.8)–(2.11) and simplifying yields the system of equations for the smaller class of distributions associated with third-order polynomials in (1.1) as where the solutions for the coefficients in (2.14)–(2.17) are
Examples of a fifth-order () and a third-order () power method distribution are given in Figure 3, both of which provide an approximation of the Beta () distribution. The coefficients associated with the two polynomial approximations in Figure 3 were used in (1.4) to compute the percentiles in Table 3 for the power method pdfs. Inspection of the graphs in Figure 3 and the percentiles in Table 3 indicates that the fifth-order polynomial provides a much more accurate approximation of the Beta distribution than the third-order polynomial. This is because the fifth-order system could produce a valid pdf that is based on an exact match with the Beta distribution’s L-moment ratios , whereas the third-order system was unable to produce a valid pdf that has an exact match with , and . Thus, it is important to consider the boundary conditions for valid power method pdfs.
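[Figure 3: panels (a) and (b), fifth-order and third-order power method approximations of the Beta distribution]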
2.3. L-Moment Boundaries for Standard Normal Polynomial Transformations
The restriction that in (1.3) implies that a set of solved coefficients may not necessarily produce a valid pdf. To determine if a third-order polynomial produces a valid pdf, we first set the quadratic equation and subsequently solve for as
A set of solved coefficients will produce a valid pdf if the discriminant in (2.19) is negative. That is, the complex solutions for must have nonzero imaginary parts. As such, setting will yield the point where the discriminant vanishes and thus real-valued solutions exist to .
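The discriminant criterion can be sketched as a short check. This is a minimal illustration, assuming the third-order polynomial is indexed as p(z) = c1 + c2 z + c3 z^2 + c4 z^3, so that p′(z) = c2 + 2 c3 z + 3 c4 z^2 and the discriminant of the quadratic is 4 c3^2 − 12 c2 c4. The function name is hypothetical. The sign condition on c4 is added here because a negative discriminant alone also admits a strictly decreasing polynomial, whereas the restriction in (1.3) requires a strictly increasing one.

```python
def third_order_pdf_is_valid(c2, c3, c4):
    """Check whether p'(z) = c2 + 2*c3*z + 3*c4*z**2 is positive for all z."""
    if c4 == 0.0:
        # With no cubic term, p'(z) is linear in z and is positive
        # everywhere only when it is a positive constant.
        return c3 == 0.0 and c2 > 0.0
    discriminant = 4.0 * c3**2 - 12.0 * c2 * c4
    # A negative discriminant means p'(z) has no real roots; c4 > 0 then
    # makes the quadratic positive rather than negative everywhere.
    return c4 > 0.0 and discriminant < 0.0

# Hypothetical coefficient sets, for illustration only.
ok = third_order_pdf_is_valid(0.9, 0.0, 0.03)    # no real roots
bad = third_order_pdf_is_valid(0.5, 1.0, 0.1)    # p'(z) changes sign
```

The coefficient c1 does not enter the check because it only shifts the location of the distribution.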
Standardizing (2.15), by setting the coefficient of mean difference to , and solving for give
Substituting the right-hand side of (2.20) into (2.16) and (2.17) and setting , because we only need to consider positive values of L-skew, gives
Inspection of (2.21) indicates that for real values of to exist then we must have and thus from (2.20) . Using (2.21) and (2.22), the graph of the region for valid third-order power method pdfs is given in Figure 4 along with the minimum and maximum values of and . In summary, a valid standardized nonnormal third-order pdf will have the properties of (a) , (b) , and (c) .
In terms of fifth-order polynomials, the formulae that solve the (quartic) equation must also have nonzero imaginary parts to ensure a valid pdf. Specifically, the closed-form expressions for evaluating pdfs associated with symmetric distributions are and for asymmetric distributions the expressions are The expressions for , , and in (2.24) are given in Appendix B.
The boundary for the larger class of symmetric fifth-order pdfs (i.e., ) can be viewed by making use of (2.23). Specifically, the graph of the region for valid symmetric pdfs is given in Figure 5 along with the minimum and maximum values of and . The elliptical graph in Figure 5 consists of four separate segments where in (2.11) and in (2.13) are expressed solely as functions of . See Appendix C for further details.
2.4. L-Moments and Boundaries for Standard Logistic Polynomial Transformations
The method for deriving the system of equations for power method transformations based on the standard logistic distribution is similar to the derivation of the system associated with (2.8)–(2.13). However, the derivation is more straightforward because the cdf for the logistic distribution is available in closed form. As such, the system of equations for fifth-order polynomials in (1.1) is where is the coefficient of mean difference and for standardized distributions the solutions for the coefficients are given in Appendix D. The system of equations for the smaller family of distributions based on third-order polynomials can be obtained by setting in (2.25)–(2.28), and for standardized distributions, the solutions for the coefficients are
The derivation of the L-skew () and L-kurtosis () boundary for third-order polynomials is also analogous to the steps taken in Section 2.3. As such, the graph of the boundary for valid third-order power method pdfs is given in Figure 6 along with the minimum and maximum values of and . Similarly, a valid standardized nonlogistic third-order pdf will have the properties of (a) , (b) , and (c) . In terms of fifth-order polynomials, the boundary region for symmetric pdfs and minimum and maximum values of and are given in Figure 7. The elliptical graph in Figure 7 also consists of four segments. See Appendix E for further details. Note that the expressions in (2.23) and (2.24) are also used to determine whether fifth-order pdfs are valid or not. An example of a logistic-based power method pdf is given in Figure 2(c).
3. A Comparison of Conventional Moments and L-Moments
3.1. Estimation
One of the advantages that sample L-moment ratios () have over conventional moment-based estimates, such as skew () and kurtosis (), is that they are less biased (e.g., [28]). This advantage can be demonstrated in the context of the power method by considering the simulation results associated with the indices for the percentage of relative bias (RBias%) and standard error (SE) reported in Table 4.
Specifically, a Fortran algorithm was coded to generate twenty-five thousand independent samples of size , and the estimates of and were computed for each of the (3 × 25000) samples based on the parameters and coefficients listed in Table 4. The estimates of were computed based on Fisher’s -statistics, and the estimates of were based on (2.6). Bootstrapped average estimates, confidence intervals, and SEs were obtained for and using twenty-five thousand resamples via the commercial software package Spotfire S+ [42]. The percentage of relative bias for each estimate was computed as and .
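The kind of comparison described above can be sketched on a small scale. The Python example below is not the Fortran and Spotfire S+ procedure used for Table 4. It swaps in a unit exponential distribution, whose skew (2) and L-skew (1/3) are known exactly, in place of the power method distributions; it assumes the usual definition of relative bias, 100 × (estimate − parameter)/parameter; and it uses the k-statistic-based skew, sqrt(n(n−1))/(n−2) × m3/m2^(3/2). All function names, the seed, and the sample size of 20 are hypothetical choices.

```python
import numpy as np

def relative_bias_percent(estimate, parameter):
    """Percentage of relative bias of an average estimate."""
    return 100.0 * (estimate - parameter) / parameter

def lskew_rows(samples):
    """Sample L-skew of each row, from unbiased PWM estimates b0, b1, b2."""
    x = np.sort(samples, axis=1)
    n = x.shape[1]
    i = np.arange(1, n + 1)
    b0 = x.mean(axis=1)
    b1 = (x * (i - 1) / (n - 1)).mean(axis=1)
    b2 = (x * (i - 1) * (i - 2) / ((n - 1) * (n - 2))).mean(axis=1)
    return (6.0 * b2 - 6.0 * b1 + b0) / (2.0 * b1 - b0)

def skew_rows(samples):
    """k-statistic-based sample skew of each row."""
    n = samples.shape[1]
    d = samples - samples.mean(axis=1, keepdims=True)
    m2 = (d**2).mean(axis=1)
    m3 = (d**3).mean(axis=1)
    return np.sqrt(n * (n - 1.0)) / (n - 2.0) * m3 / m2**1.5

rng = np.random.default_rng(12345)
samples = rng.exponential(size=(25000, 20))   # 25000 samples of size 20
rb_skew = relative_bias_percent(skew_rows(samples).mean(), 2.0)
rb_lskew = relative_bias_percent(lskew_rows(samples).mean(), 1.0 / 3.0)
```

Comparing `rb_skew` with `rb_lskew` gives a rough sense of the contrast reported in Table 4, although the magnitudes for an exponential distribution need not match those for the power method distributions studied here.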
The results in Table 4 demonstrate the substantial advantage that L-moment ratios have over conventional estimates in terms of both bias and error. These advantages are most pronounced in the context of smaller sample sizes and higher-order moments. For example, given a sample size of , the conventional estimates of () and () generated in the simulation were, on average, 11.40% (56.50%) and 31.93% (78.36%) less than their respective parameters. On the other hand, the amounts of relative bias associated with the L-moment ratios were essentially negligible. Further, the SEs associated with are much smaller than the corresponding SEs for .
3.2. Distribution Fitting
Presented in Figure 8 are conventional moment and L-moment-based power method pdfs superimposed on a histogram of data from the Project Match Research Group studies (see [43, 44]). The data are associated with participants who were assigned to a Twelve-Step Facilitation (TSF) treatment condition for alcoholism. Specifically, these data are the total number of drinks per ninety-day period which were determined by each participant’s reports of daily standard ethanol drinks (one drink was equal to one-half of an ounce). Drinking was assessed approximately every three months, beginning at pretreatment (baseline), immediately following treatment, and at three-month intervals over a twelve-month posttreatment follow-up period (i.e., six, nine, twelve, and fifteen months posttreatment).
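[Figure 8: panels (a) and (b), conventional moment-based and L-moment-based power method pdfs superimposed on a histogram of the data]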
The sample estimates (; ) associated with Figure 8 were based on a sample size of participants. The estimates were also used to solve for the two sets of coefficients, which produced the power method pdfs based on (1.3). Note that the two polynomials were linearly transformed using the location and scale estimates () from the data. Visual inspection of the approximations in Figure 8 and the goodness-of-fit statistics given in Table 5 indicates that the L-moment pdf provides a more accurate fit to the actual data. This is partly because the conventional moment-based power method pdf does not have an exact match with , whereas the L-moment pdf is based on an exact match with all of the sample estimates. Note also that the asymptotic P values in Table 5 are based on a chi-square distribution with degrees of freedom: = 13 (classes) − 6 (estimates) − 1 (sample size) = 6.
4. Discussion and Conclusion
This paper introduced L-moments in the context of standard normal and standard logistic-based power method polynomials. The boundaries were analytically derived for polynomials of order three and for symmetric pdfs associated with polynomials of order five. The lower boundary point associated with the standard normal third-order system is the standard normal distribution (i.e., and ). As such, it is worth pointing out that the larger family of pdfs for the standard normal fifth-order system includes the L-moment ratios associated with the regular uniform or Beta () distribution (i.e., and ). Further, the upper boundary of L-kurtosis for the third-order system of extends up to a boundary of for the fifth-order system, which is a substantial increase. That is, in terms of conventional moments, this is equivalent to increasing kurtosis from to .
In terms of the third-order logistic-based family of pdfs, it is noted that maximum L-skew is whereas the maximum value of L-skew for the standard normal-based system is . Again, this is a considerable difference, as these values correspond to values of conventional skew of and , respectively. Similarly, the maximum values of L-kurtosis for the standard logistic-based and normal-based systems are and and are equivalent to values of conventional kurtosis of and , respectively.
Finally, we note that Mathematica ([45], Version 7.0) software for computing polynomial coefficients, cumulative probabilities, and fitting (graphing) power method pdfs to data is available on request.
Appendices
A. Analytical Expressions and Polynomial Coefficients (Normal)
The expressions for in (2.11)–(2.13) and the coefficients in (2.8)–(2.13) are Equations (A.6)–(A.8) were derived by adapting the techniques used by Renner [46] and where , , and are based on the standard recurrence relation for . The coefficients in (A.9)–(A.14) are for standardized distributions (i.e., ).
B. Expressions for (2.24)
The expressions for , , and in (2.24) are
C. Symmetric Distributions Boundary (Normal)
The elliptical graph in Figure 5 is based on (2.11) and (2.13) and setting from (2.9) and then setting as follows:
D. Polynomial Coefficients (Logistic)
The coefficients for the standard logistic-based system in (2.25)–(2.30) are
E. Symmetric Distributions Boundary (Logistic)
The elliptical graph in Figure 7 is based on (2.28) and (2.30) and setting from (2.26) and then setting as follows: