Abstract

We use the quantile function to define statistical models. In particular, we present a five-parameter version of the generalized lambda distribution (FPLD). Three alternative methods for estimating its parameters are proposed and their properties are investigated and compared by making use of real and simulated datasets. It will be shown that the proposed model realistically approximates a number of families of probability distributions, has feasible methods for its parameter estimation, and offers an easier way to generate random numbers.

1. Introduction

Statistical distributions can be used to summarize, in a small number of parameters, the patterns observed in empirical work and to uncover existing features of data, which are not immediately apparent. They may be used to characterize variability or uncertainty in a dataset in a compact way. Furthermore, statistical distributions may both facilitate the mathematical analysis of the basic structure of empirical observations and interrelate information from two or more sources. The aim of this paper is to make a contribution to the use of quantile statistical methods in fitting a probability distribution to data, following the line of thought indicated by Parzen [1] and Gilchrist [2, Section ]. In particular, we adopt a five-parameter version of the generalized lambda distribution where . is the quantile function and . If and then (1.1) is a continuous and increasing function of . Here, controls, albeit not exclusively, the location of an FPLD; the parameter acts for a multiplier to the translated quantile function and is, thus, a scale parameter; influence the shape of . The limiting cases and/or must be interpreted according to L'Hopital's rule. The references [27] are illustrative regarding the history of various parametrizations of the distribution (see also [8]). The (1.1) version was proposed by Gilchrist [2, page 163] as an extension of the generalized lambda distribution (GLD) in the parametrization given by Freimer et al. [5] where the scale parameter appears as a reciprocal and .

The FPLD can be useful in problem solving when there are difficulties with specifying the precise form of an error distribution, because it can assume a wide variety of curve shapes and, more importantly, it uses only one general formula over the entire range of data. Its versatility, however, is obtained at the cost of an unusually large number of parameters if compared with the array of studies referred to in the literature on fitting distributions to data. Gilchrist [9] notes the rather strange fact that, throughout the history of statistics, researchers seem to have been quite happy to use many parameters in, for example, regression, yet they have kept to two or three parameters in most distributional models. Clearly, a high number of parameters are a less challenging task in our age of readily available computer power. Moreover, the number of individual data in a statistical sample has considerably increased from what was customary a few decades ago, so there should be no special reason for not using five- (or more) parameter distributional models.

Regrettably, the richness of shape in (1.1) is not accompanied by a comparable development in parameter estimation techniques. For instance, one of the classical methods for computing can be based on matching the first five moments of the sample observations. In this case, though, the problem of modelling the data would be restricted to distributions possessing fairly light tails because the fifth moment must be finite. In addition, the percentiles estimates (e.g., Karian and Dudewicz [10]) and methods using location and scale-free shape functionals (e.g., King and MacGillivray [11]) depend markedly on the particular choice of percentage points. On the other hand, maximum likelihood estimation is computationally demanding because of the complex and unwieldy objective function (see Tarsitano [12]). Asquith [13] as well as Karvanen and Nuutinen [14] equated sample L-moments to those of the fitted distribution. However, higher L-moments are inelastic to changes in the parameters, leading to difficulties in estimation. These questions have been the focus of intense academic research in the past few years, for example, Ramberg et al. [4], King and MacGilllivray [15], Karian and Dudewicz [7], Lakhany and Mausser [16], Rayner and MacGillivray [17], Fournier et al. [18], and Su [19].

In the present paper, we start from the fact that the vector of parameters can be divided

into linear and nonlinear parameters, so that we can apply the nonlinear least squares (NLS) method, proposed by Öztürk and Dale [20], which is driven by the minimization of the sum of squared differences between the observed order statistics and their expected values. Furthermore, we will develop a criterion driven by the minimization of the sum of the least absolute deviations between the observed order statistics and their medians. The two estimation procedures are based on a two-stage inner/outer optimization process. The first stage is the inner step in which the linear parameters are estimated for fixed values of the nonlinear parameters. The second stage is the outer step in which a controlled random search (CRS) technique is applied in order to determine the best companion value for the nonlinear parameters. The two stages are carried out consecutively, and the process is stopped when the difference between the values of the minima of two successive steps becomes smaller than a prefixed tolerance threshold.

The rest of the paper is organized as follows. The Section 2 describes the density of the FPLD random variables and underlines the advantage of its use in practical work. The Section 3 derives the nonlinear least squares for the estimation of . In Section 4, we outline a procedure for applying the criterion of least absolute deviations to the FPLD. The CRS, an optimization algorithm based on a purely heuristic, derivative-free technique is presented in the fifth section. In Section 6, the effectiveness of the proposed methods is demonstrated and compared using real and simulated data. Finally, we conclude and outline future research directions in Section 7.

2. Overview of the Five-Parameter Lambda Distribution (FPLD)

The FPLD is a highly flexible and adaptable tool for modelling empirical and theoretical distributions and, hence, might have many potential applications. In this section, we summarize its general features. The density-quantile function of a FPLD random variable is

The range of a FPLD is given by and is different for different values of ; in particular, the extremes are if ; if , ; and if , . Hence, the FPLD can model distributions whose support extends to infinity in either or both directions as well as those with finite support.

The rth moment of the linear transformation where , is The symbol with denotes the complete beta function. As a consequence, the th moment of exists and is finite if and are greater than .

The central moments of which can be derived from (2.2) are

The shape of the FPLD is sensitive to its parameter values; in fact, the moments of are determined by a combination of all of the vector elements. If and , then the FPLD is symmetric about because, in this case, (1.1) satisfies the condition . If , then (2.1) is skewed to the left (right) if (), which suggests a natural interpretation of as a parameter which is prevalently related to the asymmetry of . For a FPLD, interchanging and and simultaneously changing the sign of reverses the type of skewness. If (), then () indicates the left (right) tail flatness in the sense that the smaller () the flatter the left (right) tail of the FPLD density. In practice, and capture the tail order on the two sides of the support of .

The densities (2.1) are zeromodal if , unimodal with continuous tails if , unimodal with truncated tails if , U-shaped if , and S-shaped if . Curves corresponding to large positive values of and have extreme peakedness and short high tails. For the FPLD degenerates to a one-point distribution.

The FPLD family is a good model for use in Monte Carlo simulation and robustness studies because it contains densities which cover a large spectrum of arbitrarily shaped densities. For example, if , then converges on a potentially asymmetric form of the logistic distribution; if and , then is an exponential distribution whereas for , becomes a reflected exponential distribution. Furthermore, FPLD fits data containing extreme values well; in fact, for , the FPLD corresponds to the generalized Pareto distribution. For , the FPLD generates the power-function distribution. The rectangular random variable is present in four versions: , , , and .

The FPLD could have an important role as a general (not necessarily Gaussian) distribution if it is able to produce good approximations to many of the densities commonly encountered in applications. There are several measures of closeness and overlapping of two statistical distributions based on the distance between density functions or between distributions functions (see, e.g., [21] and [7, pages 196–201]). On the other hand, the FPLD is expressed in the quantile domain, so, for a reliable assessment of the agreement between the exact distribution and that provided by the FPLD, we should use a measure based on quantiles. Statistical literature is not very helpful in providing criteria to quantify the proximity between distributions naturally arising from a quantile framework. In the present paper we have used the Tchebycheff metric, that is, the closeness between the theoretical quantile function and the FPLD is quantified by the maximum absolute difference (Maxd) between observed and fitted quantiles: The Tchebycheff metric is attractive because it leads to the minimization of the worst case and does not imply a heavy computational burden.

The five parameters of have been estimated using the controlled random search method described in Section 5. Table 1 shows the accuracy of the FPLD fit to some standard symmetric distributions.

Our findings suggest that the FPLD fit is reasonably good for all of the models included in the table, with the only possible exception being the Cauchy random variable.

One of the major motives for generalizing the four-parameter GLD to the five-parameter FPLD is to have a more flexible, and, hence, more suitable, distribution for fitting purposes. How much better, though, is the FPLD when compared to the GLD? Are the fitting improvements that the FPLD provides significant enough to offset the complexity that is introduced through a fifth parameter? Table 2 shows the results of the fitting procedure for some asymmetric distributions. The first row presents the parameter estimates and the Tchebycheff metric for the FPLD approximation; the second row presents the same quantities for the GLD approximation using the FMKL parametrization (see Freimer et al. [5]). The values in the table indicate that the upper bound on the discrepancy between the true and the approximated distribution is systematically lower for the FPLD than it is for the GLD. The amounts appear to be large enough to justify the inclusion of an additional linear parameter in the GLD so as to form the FPLD.

The most important point to note in Tables 1 and 2 is that the FPLD is a flexible class of very rich distributional models which cover the Gaussian and other common distributions. In this sense, the FPLD is a valid candidate for being fitted to data when the experimenter does not want to be committed to the use of a particular distribution.

3. Nonlinear Least Squares Estimation

Öztürk and Dale [20] proposed a nonlinear least squares (NLS) framework to estimate vector for a four-parameter generalized distribution. The method can easily be generalized to handle a FPLD.

For a given random sample of -independent observations from , the th order statistic may be expressed as in which the ordered observations are compared with their expected values under the hypothesized distribution and is a measure of the discrepancy between the observed and modelled th value. The error terms are such that and . The heteroskedasticity in (3.1) is not the only violation of standard assumptions of the classical regression model. In fact, the error terms are not independent and do not come from a symmetrical distribution (except for the median). However, since the purpose was to obtain an approximate solution, Öztürk and Dale [20] ignored these inadequacies.

The expected value of the th order statistic from a FPLD is available in closed form where . It follows that the deterministic component of (3.1) can be written as with and

The least squares approach calls for in (3.3) to be chosen so as to minimize The is linear in but not in . According to Lawton and Sylvestre [22] the solution to (3.5) can be obtained by fixing the value of and, then, applying the ordinary least squares to solve the linear problem for , where denotes the matrix with elements given by (3.4) and is the vector of the ordered observations. The conversion of into is straightforward:Öztürk and Dale [20] used an approximation of to overcome the problems created by repeated use of the gamma function in (3.4). More specifically, they used , with the plotting positions . In this case, the pseudoregressors are Parameter estimates are then obtained as in (3.5) through (3.7) with in place of for . We can now define the reduced form of the quantile function (3.5) in which is substituted into (3.3). Clearly, the reduced quantile function only depends on the pair . We might try to find a better estimate of by using one of the function minimization techniques to solve (3.5) and then go through exactly the same procedure as the one described above. The two-stage process is repeated until the correction for becomes sufficiently small.

4. Least Absolute Deviations Estimation

Filliben [23] noted that order statistics means have three undesirable properties: no uniform technique exists for generating the for all distributions; for almost all distributions is difficult or time consuming to compute and so must be stored or approximated; for other distributions (e.g., Cauchy), the expected value of the order statistics may not always be defined. All three of these drawbacks are avoided in general by choosing to measure the location of the th order statistic by its median rather than by its mean.

The use of the median in preference to the mean leads naturally to the absolute error being the loss function instead of the squared error. Gilchrist [2] called distributional residuals the deviations of the observed data from a given quantile function . In this context, the method of nonlinear least squares discussed in the previous section is defined as distributional least squares since it involves using the ordered data as a dependent variable, whereas the deterministic component of the model is defined by a quantile function.

The least squares criterion is reasonable when data come from a normal distribution or when linear estimates of the parameters are required. While data may contain outliers or highly influential values, the method of least squares has a clear disadvantage in that it may be pulled by extremely large errors. The least absolute deviation estimation of parameters is more suitable in such cases.

In this section we consider the problem of finding a vector of constants which minimizes the objective function where is the th order statistic of the sample from the quantile function . Rao [24] observed that the solution to this problem is the vector of marginal medians, that is, the th element of is the median of the order statistics .

The quantile function (see, e.g., Gilchrist [2, page 86]) of the th order statistic for a sample of size from an FPLD is with where is the inverse of the incomplete beta function. Consequently, the median of the th order statistic can be computed using The quantities in (4.3) are a solution of (4.1) in the sense that, for a fixed value of , the following expression attains its minimum value. It follows that the estimate of that minimizes (4.4) can be obtained by the use of distributional absolutes (DLAs).

There is nothing in the previous section that cannot be extended to the criterion (4.4). In the light of the decomposition of into two groups, linear parameters and nonlinear parameters , another two-stage procedure can be devised. In the inner stage, we determine the vector which minimizes with respect of the linear parameters and for a given value of the nonlinear parameters . In this case, the pseudo regressors are

The computation of the LAD estimates for in (4.5) can be formulated as a linear programming optimization and the standard simplex method can be employed to solve this problem. For a detailed treatment of the method of least absolutes see, for example, Koenker and D'Orey [25]. The conversion of into takes place as shown in (3.7). Once have been estimated, we can obtain a new value for with the same outer step as that used in the previous section. The inner and outer steps can be alternated until a satisfactory result is found.

Least absolute deviations have, undoubtedly, greater sophistication than least squares, although two important points should be made regarding this issue. First, the annoying computation of has to be executed just once. In addition, the fact that implies that and, consequently, the number of medians to be computed is halved.

5. Controlled Random Search (CRS)

The inner/outer scheme outlined in previous sections has the merit of reducing the dimension of the nonlinear problem. To apply it, however, we must minimize two relatively complex objective functions and of two unknowns, , which generally have more than one minimum. The solutions to these equations are difficult to obtain through traditional optimization approaches. In order to perform this special task, we resorted to a direct search optimization technique (CRS) proposed by Price [26], which is suitable for searching the global minima for continuous functions that are not differentiable, or whose derivatives are difficult to compute or to approximate (see [27, 28]).

Some well-known direct search procedures such as the downhill simplex method of Nelder and Mead [29] and the pattern-search method of Hooke and Jeeves [30] are familiar to statisticians. However, these techniques are really local optimization methods; in practice, they are designed so as to converge on a single optimum point and, therefore, they, unavoidably, discard information related to all other possible optima. In contrast, a wise application of the CRS overcomes this drawback since this method is especially suited to multimodal functions. In effect, CRS procedures have more randomness in their decision process, thus increasing the chances of finding a global optimum. Of course, global optimality cannot be guaranteed in finite time.

A CRS procedure employs a storage of points, the number of which is determined by the user for the particular problem to be solved. Repeated evaluation of the function to be optimized, , is performed at points randomly chosen from the storage of points and the search region is progressively contracted by substituting the worst point with a better one. The search continues until an iteration limit is reached, or until a desired tolerance between minimum and maximum values in the stored values is attained. Apparently, these methods are less known and less frequently used in statistics (two works that have appeared in literature are those by Shelton et al. [31] as well as Křivý and Tvrdík [32]).

For practical purposes, it is necessary to confine the search within a prescribed bounded domain. Let be the set of admissible values of the nonlinear parameters of the FPLD model. The most promising region that we have found in our applications is the rectangle delimited by Furthermore, ensures that the mean of the order statistics from a FPLD always exists and the ranges of cover most of the distributional shapes commonly encountered.

The theory and practice of global optimization have progressed rapidly over the last few years, and a wide variety of modifications of the basic CRS algorithm are now available. In particular, we refer to the scheme discussed in Brachetti et al. [33].

Any CRS implies many subjective choices of constants and weights. The version used in our paper is as follows. (1)Generate a set of random admissible points in . Find the minimum and the maximum in and their function values: . (2)If , then stop. Also stop if the limit to the number of function evaluations has been reached. The point with the lowest objective function value found is returned. If required, return all the points in . (3)Choose at random, without duplication, a set of points in , include in , and compute the centroid . Define . (4)Select randomly a point . (5)Generate uniform random numbers in and compute the trial point ; thus, the exploration of the space around the centroid is based on a randomized reflection in the simplex originally used by Price [26]. (6)If then compute and go to step . If , then repeat step a fixed number of times. If is still unfeasible, then exclude from , redefine the set and, if it is not empty, repeat step for at most times; otherwise go to step . (7)If , then replace with and with and then redetermine the maximum and . (8)If , then replace with , with , with , with and and . (9)Randomly choose two distinct points in and compute for .(10)If for each , then build the following quadratic interpolation of the objective function: for . If some of the , then generate uniform random numbers , in and compute (11)If , then repeat steps - no more than times. If is still unfeasible, then return to step .(12)Execute steps with in place of and in place of and then return to step .

We experimented by using the CRS algorithm, with , , , , and . For the random component of our implementation we used quasirandom Faure sequences (Faure [34]) rather than pseudorandom numbers. This choice was made in order to cover the search region more evenly (see, e.g., Morokoff and Caflisch [35]).

The CRS procedure has several desirable features. Boundaries and objective function can be arbitrarily complex because the procedure does not require the calculation of derivatives. Shelton et al. [31] observed that, if several points within the optimization region are equally good, the procedure will make this clear to the user.

The general outline presented here should give some appreciation of the philosophy underlying the design of CRS. It is important not to forget that, although this method proved very robust in many applications, typical nonlinear problems which have a complicated bound-constrained, possibly multiextremal objective function are challenging tasks and the success of an automated algorithm is still dependent upon the researcher's guidance. In particular, intervention may be needed to contract the box constraints on the parameters in order to perform a finer search in a final stage.

6. Comparison of the Procedures

In this section we present results of some numerical experiments carried out in order to study the behavior of the FPLD model. Moreover, we perform a comparison between estimated and theoretical values to test the accuracy of the estimators proposed in the previous sections and investigate their properties through a Monte Carlo sampling process.

6.1. Fitting to Data

To illustrate the efficacy of the FPLD as a modeling device we applied it to four sets of real data. To judge the overall agreement between estimated and observed quantiles we used the correlation between the empirical values centered on the arithmetic mean and the hypothesized values where , and From another point of view, (6.1) can be considered a measurement of the linearity of the Q-Q plot for empirical versus theoretical standardized quantiles. The coefficient (6.1) will always be positive since and are both arranged in ascending order of magnitude. More specifically, takes values in the interval. The case corresponds to a noninformative model ; a value of the coefficient close to one will suggest a good fit of the FPLD. The model that causes data to be most like a straight line on its Q-Q plot is the FPLD that most closely resembles the underlying distribution of the data. Correlations that are too small indicate a lack of fit. Finally, means that , that is, the observed data correspond exactly to their expected values.

Table 3 reports our findings on the following datasets:(i)recorded annual drought of Iroquois River recorded near Chebanse (IL), , Ghosh [36],(ii)times in seconds to serve light vans of the Severn Bridge River crossing in Britain, , Gupta and Parzen [37],(iii)peak concentrations of accidental releases of toxic gases, , Hankin and Lee [38],(iv)observations on overall length of bolts, , Pal [39].

The symbol denotes the number of function evaluations required to find the global optimum of the criterion. The acronym OD refers to the plotting positions proposed by Öztürk and Dale [20].

The best overall performance is achieved by NLS; comparably good results are also obtained for DLA and they are only slightly inferior for OD. Based on these few values, the computational difficulties encountered in NLS may be worth the effort. The estimates provided by DLA can be very different from the others because they are based on a different distance metric. The complexity of DLA is higher than that of NLS or OD, due to its more intricate search. The limited impact on the modelling adequacy does not compensate for the extra energy expended for the DLA procedure.

6.2. Simulations

In this section we carry out a simulation study to evaluate the performance of the proposed methods mainly with respect to their biases and squared errors for different sample sizes. In particular, we consider the parameter combination that allows the FPLD to approximate to the standard Gaussian distribution: The vector will act as our reference point. The reason we use the Gaussian model is that the normality assumption is one of the most extensively used in statistics and a new method should at least work in this default case.

Two simple coefficients of performance have been considered for comparing The mean relative bias (MRB) quantifies the average relative magnitude of the accuracy for each parameter and the root mean squared error (RMSE) represents the mean Euclidean distance between simulation and measurement. Methods that yield estimates which are closer to the true parameter value have lower bias, higher precision, and lower RMSEs.

The statistics in (6.4) were calculated by generating different random samples of size from . The results are summarized in Table 4

For all methods, MRB and RMSE decrease as the sample size increases, an indication that all of the estimators are consistent. The sign of the bias for the exponential parameters is almost always negative showing a systematic underestimation of probably due to the fact that the Gaussian has an infinite range (which would imply a FPLD with negative values of the exponential parameters), but it is well approximated by a FPLD with a limited support.

As might have been expected, the bias and the standard error of the exponential parameters yielded little or no discernible differences, particularly for the largest samples. The bias of the NLS technique is generally smaller than those of the other methods. This effect is particularly noticeable in the exponential parameters. The standard error of OD and DLA are generally larger than those for the NLS so that this method can be considered as the best of those looked at in this paper. However, for moderate sample sizes (), OD is nearly as good as NLS, but takes less time to compute, so the choice of method depends on the particular dataset. The DLA solution does not seem to be a real improvement over methods based on the least squares criterion.

7. Conclusions

Statistics professionals are often faced with the problem of fitting a set of data by means of a stochastic approximation. For this purpose, it is essential to select a single parametric family of distributions that offers simple but still realistic descriptions of most of the potential models of the underlying phenomenon. A very useful and tractable parametric model is the five-parameter generalized lambda distribution. By means of the FPLD model we have obtained some satisfactory fitting of empirical data and at least one of the many subordinate models can provide a good approximation to several common distributions.

More specifically, we have analyzed three methods for estimating the parameters of the FPLD (two of which are completely new). The common denominator of all of the methods is the replacement of linear parameters by their least squares or least absolutes estimates, given the value of the nonlinear parameters that, in turn, are determined by using a controlled random search technique. The proposed procedures are simple to apply as they only require a derivative-free optimization over a bounded region. Thus, many obstacles to the use of the FPLD are resolved. The efficiency of the new algorithms has been verified by applications to real and simulated data.

Our findings indicate that NLS estimators outperform least absolute deviation estimators with respect to bias and mean square error. For large sample, however, the difference between NLS and the approximated, but more manageable, scheme OD proposed by Öztürk and Dale [20] becomes imperceptible.

As for future research, we plan to compare different diagnostic tools used to analyze the fit of a quantile model. It would also be interesting to incorporate in the methods described in this paper a weighting scheme (e.g., weights proportional to the local density) so that the tails and/or the middle portion of the samples become more detectable.