The Use of Generalized Means in the Estimation of the Weibull Tail Coefficient
Due to the specificity of the Weibull tail coefficient, most of the estimators available in the literature are based on the log excesses and are consequently quite similar to the estimators used for the estimation of a positive extreme value index. The good performance of estimators based on generalized means leads us to base the estimation of the Weibull tail coefficient on the power mean-of-order-$p$. Consistency and asymptotic normality of the estimators under study are established. Their finite sample performance is illustrated through a Monte Carlo simulation. It is always possible to find a negative value of $p$ (contrarily to what happens with the mean-of-order-$p$ estimator of the extreme value index) such that, for adequate values of the threshold, there is a reduction in both bias and root mean square error.
1. Introduction and Preliminaries
Statistics of extremes, either univariate or multivariate, has recently faced many different challenges, which have enabled a better understanding of the complexity of extreme events in the most diverse areas of application, like biostatistics, dynamical systems, environment, finance, insurance, and structural engineering, among other fields. Risky events usually lie in the tails of the underlying distribution, where there are typically only a few observations. Consequently, and thinking only of the univariate situation, estimates either well above the observed maximum or below the observed minimum are often required. It is thus necessary to consider models for the tails, and those models are most of the time based on asymptotic results.
Let us assume that, possibly after an adequate transformation, the available sample, $(X_1,\ldots,X_n)$, can be regarded as a sample of size $n$ of independent, identically distributed (IID) random variables (RVs) from a cumulative distribution function (CDF) $F$. More generally, $(X_1,\ldots,X_n)$ can be assumed to be a sample of stationary weakly dependent RVs from $F$. Let us use the notation $X_{1:n}\le X_{2:n}\le\cdots\le X_{n:n}$ for the associated ascending order statistics (OSs). Further assume that there exist sequences of real constants $\{a_n>0\}$ and $\{b_n\}$ such that the linearly normalized maximum, $(X_{n:n}-b_n)/a_n$, converges weakly to a nondegenerate RV. Then (see Gnedenko), the limiting CDF is necessarily of the type of the general extreme value (GEV) CDF, given by
$$G_\xi(x)=\begin{cases}\exp\{-(1+\xi x)^{-1/\xi}\}, & 1+\xi x>0, & \text{if } \xi\neq 0,\\ \exp\{-\exp(-x)\}, & x\in\mathbb{R}, & \text{if } \xi=0.\end{cases}\tag{1}$$
The CDF $F$ is then said to belong to the max-domain of attraction of $G_\xi$, and the notation $F\in\mathcal{D}_M(G_\xi)$ is used. The GEV model, in (1), is perhaps the most relevant univariate asymptotic model in statistical extreme value theory (EVT). For other relevant asymptotic models and different approaches to statistics of univariate extremes, see the reasonably recent overviews in [2–5].
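As a small numerical companion to (1), the Python sketch below evaluates $G_\xi$ and, as an illustrative case not taken from the paper, compares it with the exact CDF of the normalized maximum of $n$ IID standard exponential RVs, for which $a_n=1$ and $b_n=\log n$ lead to the Gumbel limit ($\xi=0$). The function names are hypothetical.

```python
import math

def gev_cdf(x: float, xi: float) -> float:
    """GEV CDF G_xi(x) in (1); returns 0 or 1 outside the support."""
    if xi == 0.0:
        return math.exp(-math.exp(-x))          # Gumbel case
    t = 1.0 + xi * x
    if t <= 0.0:
        # below the lower endpoint (xi > 0) or above the upper endpoint (xi < 0)
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

def exp_max_cdf(x: float, n: int) -> float:
    """Exact P(X_{n:n} - log n <= x) for n IID standard exponential RVs (x > -log n)."""
    return (1.0 - math.exp(-x) / n) ** n

for n in (10, 100, 10_000):
    print(n, exp_max_cdf(1.0, n), gev_cdf(1.0, 0.0))
```

For increasing $n$, the exact values should approach $G_0(1)$.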
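The parameter $\xi$ is the extreme value index (EVI), one of the most relevant parameters of large events. This parameter measures the heaviness of the right-tail function $\overline{F}:=1-F$: the heavier the right tail, the larger $\xi$ is. Heavy-tailed models, i.e., Pareto-type underlying CDFs, with a positive EVI, belong to $\mathcal{D}_M^{+}:=\mathcal{D}_M(G_\xi)_{\xi>0}$, with $G_\xi$ defined in (1). Note that, in a univariate framework, with $RV_a$ denoting the class of regularly varying functions at infinity with an index of regular variation $a$, i.e., positive measurable functions $g(\cdot)$ such that $\lim_{t\to\infty}g(tx)/g(t)=x^{a}$, for all $x>0$ (see Bingham et al. for details on regular variation), and with the notation $U(t):=F^{\leftarrow}(1-1/t)$, $t\ge 1$, where $F^{\leftarrow}(y):=\inf\{x: F(x)\ge y\}$ is the generalized inverse of $F$, the following equivalences hold:
$$F\in\mathcal{D}_M^{+}\iff\overline{F}\in RV_{-1/\xi}\iff U\in RV_{\xi}.$$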
As an example of a CDF in $\mathcal{D}_M^{+}$, among many others, we mention the Fréchet CDF, $\Phi_\xi(x)=\exp(-x^{-1/\xi})$, $x>0$ ($\xi>0$).
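The equivalences above can be checked numerically for the Fréchet CDF. The sketch below (helper names are hypothetical; $\xi=0.5$, $x=2$ and $t=10^4$ are arbitrary choices) compares $\overline{F}(tx)/\overline{F}(t)$ with $x^{-1/\xi}$ and $U(tx)/U(t)$ with $x^{\xi}$, using $U(t)=\{-\log(1-1/t)\}^{-\xi}$ for this model.

```python
import math

def frechet_tail(x: float, xi: float) -> float:
    """Right tail 1 - Phi_xi(x) = 1 - exp(-x**(-1/xi)), computed with expm1 for accuracy."""
    return -math.expm1(-x ** (-1.0 / xi))

def frechet_U(t: float, xi: float) -> float:
    """Tail quantile function U(t) = Phi_xi^{<-}(1 - 1/t) = (-log(1 - 1/t))**(-xi)."""
    return (-math.log1p(-1.0 / t)) ** (-xi)

xi, x, t = 0.5, 2.0, 1e4
print(frechet_tail(t * x, xi) / frechet_tail(t, xi), x ** (-1.0 / xi))  # tail in RV_{-1/xi}
print(frechet_U(t * x, xi) / frechet_U(t, xi), x ** xi)                # U in RV_xi
```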
In this paper, our interest lies essentially in the estimation of the Weibull tail coefficient (WTC), another relevant parameter of extreme events. Regularly varying cumulative hazard functions will thus be considered. Indeed, the WTC is the parameter $\theta>0$ in a right-tail function of the type
$$\overline{F}(x)=1-F(x)=\exp\{-H(x)\},\qquad H(x):=-\log\{1-F(x)\}\in RV_{1/\theta}.\tag{4}$$
The class of models with a Weibull-type tail is quite broad and includes, among others, the normal, the gamma, the Weibull, and the logistic distributions. Models of this type are useful in several areas of application, such as hydrology, meteorology, environmental sciences, and nonlife insurance (see de Wet et al.). Further note that condition (4) is equivalent to assuming that the inverse cumulative hazard function, $H^{\leftarrow}(t):=\inf\{x: H(x)\ge t\}$, is a regularly varying function with index $\theta$. Thus,
$$H^{\leftarrow}(x)=x^{\theta}\,\ell(x),\tag{5}$$
with $\ell$ a slowly varying function, i.e., $\ell\in RV_0$.
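To make (4) concrete, the sketch below recovers $\theta$ from a cumulative hazard through the regular variation ratio $H(\lambda x)/H(x)\approx\lambda^{1/\theta}$. It assumes two illustrative models: a unit-scale Weibull with shape $\tau=0.5$, for which $H(x)=x^{\tau}$ and $\theta=1/\tau=2$ exactly, and the half-normal, whose tail $1-F(x)=\mathrm{erfc}(x/\sqrt{2})$ gives $\theta=1/2$ only asymptotically, because of its slowly varying factor. The function names are hypothetical.

```python
import math

def wtc_from_hazard(H, x, lam=2.0):
    """theta approximated from H in RV_{1/theta}: H(lam x)/H(x) ~ lam**(1/theta)."""
    return math.log(lam) / math.log(H(lam * x) / H(x))

def H_weibull(x, tau=0.5):
    """Cumulative hazard of the unit-scale Weibull, 1 - F(x) = exp(-x**tau): theta = 1/tau."""
    return x ** tau

def H_half_normal(x):
    """Cumulative hazard of |Z|, Z standard normal: 1 - F(x) = erfc(x / sqrt(2)), theta = 1/2."""
    return -math.log(math.erfc(x / math.sqrt(2.0)))

print(wtc_from_hazard(H_weibull, 10.0))      # 2, up to rounding
print(wtc_from_hazard(H_half_normal, 15.0))  # close to, but not exactly, 0.5
```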
1.1. Semiparametric Estimators of the WTC
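Regarding the estimation of the WTC, one of the first WTC estimators in the literature was based on record values (Berred). The use of the largest order statistics in the sample was considered in Broniatowski, Beirlant et al. [10, 11], and Dierckx et al. Most WTC estimators are based either on the relative excesses
$$U_{ik}:=\frac{X_{n-i+1:n}}{X_{n-k:n}},\qquad 1\le i\le k<n,\tag{6}$$
or on the log excesses
$$V_{ik}:=\log U_{ik}=\log X_{n-i+1:n}-\log X_{n-k:n},\qquad 1\le i\le k<n.\tag{7}$$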
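The relative and log excesses in (6) and (7) depend only on the $k+1$ largest observations. A minimal Python sketch, assuming a sample with $X_{n-k:n}>0$ (the function name `excesses` is hypothetical):

```python
import math

def excesses(sample, k):
    """Relative excesses U_ik, (6), and log excesses V_ik, (7), for i = 1, ..., k."""
    x = sorted(sample)                  # x[j] is X_{j+1:n}
    n = len(x)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    threshold = x[n - k - 1]            # X_{n-k:n}
    if threshold <= 0:
        raise ValueError("relative excesses need X_{n-k:n} > 0")
    top = x[n - k:][::-1]               # X_{n:n}, X_{n-1:n}, ..., X_{n-k+1:n}
    u = [xi / threshold for xi in top]
    v = [math.log(ui) for ui in u]
    return u, v

u, v = excesses([1.0, 2.0, 4.0, 8.0], k=2)
print(u)  # [4.0, 2.0]
print(v)  # approximately [1.386, 0.693]
```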
Indeed, Beirlant et al. proposed a WTC estimator, given in (8), whose functional form is based on the log excesses.
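Weighted versions of this estimator can be found in Gardes and Girard and Goegebeur et al. The following Hill-type WTC estimator was studied in Gardes and Girard:
$$\hat\theta^{\rm GG}_{k}:=\log(n/k)\,H_k,\tag{10}$$
with $H_k$ being the classical Hill (H) EVI estimator for heavy-tailed models, which can be written as the average of the log excesses, i.e.,
$$H_k:=\frac{1}{k}\sum_{i=1}^{k}V_{ik},\tag{11}$$
with $V_{ik}$ defined in (7). Consistency of the Hill estimator for $\xi>0$ holds if $k$ is an intermediate sequence, i.e., if
$$k=k_n\to\infty\quad\text{and}\quad k_n/n\to 0,\qquad\text{as } n\to\infty.\tag{12}$$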
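A minimal sketch of the Hill estimator in (11) and of the Hill-type WTC estimator in (10), assuming a positive IID sample. The function names and the simulated exponential sample ($\theta=1$) are illustrative choices, not part of the original study.

```python
import math
import random

def hill(sample, k):
    """Hill estimator H_k, (11): average of the k log excesses V_ik."""
    x = sorted(sample)
    n = len(x)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    log_thr = math.log(x[n - k - 1])               # log X_{n-k:n}
    return sum(math.log(xi) - log_thr for xi in x[n - k:]) / k

def wtc_gg(sample, k):
    """Hill-type WTC estimator (10): log(n/k) * H_k."""
    return math.log(len(sample) / k) * hill(sample, k)

rng = random.Random(2022)
expo = [rng.expovariate(1.0) for _ in range(100_000)]   # exponential parent, theta = 1
print(wtc_gg(expo, 1000))
```

With these settings the estimate usually falls below $\theta=1$, which is consistent with the negative bias of this estimator reported in Section 3.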
The good performance of most of the EVI estimators based on generalized means (GMs) leads to the consideration of a simple generalization of the H EVI estimators, in (11), studied in Brilhante et al., and almost simultaneously in Paulauskas and Vaiciulis and in Beran et al. (see also Segers). Such a generalization leads to the so-called power mean-of-order-$p$ (MOP) EVI estimators. Indeed, on the basis of (11), it is possible to write
$$H_k=\sum_{i=1}^{k}\log U_{ik}^{1/k}=\log\Big(\prod_{i=1}^{k}U_{ik}\Big)^{1/k}.\tag{13}$$
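Since the H EVI estimators are the logarithm of the geometric mean (or mean-of-order-0) of $U_{ik}$, $1\le i\le k$, defined in (6), the mean-of-order-$p$ of $U_{ik}$, for any real $p$ (see Gomes and Caeiro and Caeiro et al., among others),
$$A_k(p):=\begin{cases}\Big(\frac{1}{k}\sum_{i=1}^{k}U_{ik}^{p}\Big)^{1/p}, & p\neq 0,\\ \Big(\prod_{i=1}^{k}U_{ik}\Big)^{1/k}, & p=0,\end{cases}$$
can be more generally considered. This leads to the mean-of-order-$p$ EVI estimators:
$$H_k(p):=\begin{cases}\dfrac{1-A_k^{-p}(p)}{p}, & p\neq 0,\\ \log A_k(0)=H_k, & p=0.\end{cases}\tag{14}$$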
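The estimators in (14) can be sketched in a few lines. The example assumes a strict Pareto sample with $\xi=0.5$ (simulated as $V^{-\xi}$, with $V$ uniform on $(0,1]$), for which $H_k(p)$ is consistent for $p<1/\xi$; the names and settings are hypothetical.

```python
import math
import random

def mop_evi(sample, k, p):
    """Mean-of-order-p EVI estimator H_k(p), (14); p = 0 gives the Hill estimator."""
    x = sorted(sample)
    n = len(x)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    thr = x[n - k - 1]                              # X_{n-k:n}
    u = [xi / thr for xi in x[n - k:]]              # relative excesses U_ik
    if p == 0:
        return sum(math.log(ui) for ui in u) / k    # log of the geometric mean
    mean_up = sum(ui ** p for ui in u) / k          # A_k(p)**p
    return (1.0 - 1.0 / mean_up) / p                # (1 - A_k(p)**(-p)) / p

rng = random.Random(1)
xi = 0.5
pareto = [(1.0 - rng.random()) ** (-xi) for _ in range(20_000)]   # strict Pareto, EVI = 0.5
for p in (-1.0, 0.0, 0.5):
    print(p, mop_evi(pareto, 500, p))
```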
As mentioned above, GMs have recently been used with high success in the estimation of a positive EVI, allowing one to obtain reduced-bias estimators of $\xi$. An adequate choice of $p$, in (14), enables such bias reduction for the mean-of-order-$p$ EVI estimators. Due to the specificity of the WTC, its relevance, and its deep link to a positive EVI, the GMs in (14) will now be used for the estimation of the WTC, through
$$\hat\theta_k(p):=\log(n/k)\,H_k(p),\tag{15}$$
with $H_k(p)$ defined in (14), for any real $p$. Notice that the estimator in (10) is a particular case of $\hat\theta_k(p)$. Indeed, $\hat\theta_k(0)=\log(n/k)\,H_k$, the estimator in (10).
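A sketch of $\hat\theta_k(p)$ in (15), obtained by multiplying the mean-of-order-$p$ statistic by $\log(n/k)$. The half-normal sample ($\theta=1/2$), the seed, $n=5000$ and $k=200$ are arbitrary illustrative choices; which value of $p$ performs best depends on the model and on $k$ (see Section 3), so no particular output is claimed here.

```python
import math
import random

def wtc_mop(sample, k, p):
    """Mean-of-order-p WTC estimator (15): log(n/k) * H_k(p); p = 0 gives (10)."""
    x = sorted(sample)
    n = len(x)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    thr = x[n - k - 1]                              # X_{n-k:n}
    u = [xi / thr for xi in x[n - k:]]              # relative excesses U_ik
    if p == 0:
        h = sum(math.log(ui) for ui in u) / k       # Hill estimator H_k
    else:
        h = (1.0 - k / sum(ui ** p for ui in u)) / p  # (1 - A_k(p)**(-p)) / p
    return math.log(n / k) * h

rng = random.Random(7)
half_normal = [abs(rng.gauss(0.0, 1.0)) for _ in range(5_000)]   # theta = 1/2
for p in (-2.0, -1.0, 0.0, 1.0):
    print(p, round(wtc_mop(half_normal, 200, p), 3))
```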
In Section 2 of this paper, after a few comments on the role of the WTC and some preliminary results, a few details on the asymptotic behaviour of the WTC estimators in (8), (9), and (15) are provided. As in EVI estimation, a high variance for small values of $k$ and a high bias for large values of $k$ can appear, and thus it is necessary to reduce bias and/or to properly choose the tuning parameters in play. Section 3 is dedicated to an extensive Monte Carlo simulation of the WTC estimators under study. Regarding the mean-of-order-$p$ estimation, it was always possible to find a value of $p$ (negative, contrarily to what happens with the mean-of-order-$p$ EVI estimation) such that, for adequate values of the threshold, there is a reduction in both bias and root mean square error (RMSE). Finally, in Section 4, a few overall conclusions are drawn. One of the main points of the article is that, since even asymptotically equivalent estimators may exhibit very diverse finite sample properties, it is always sensible to work, in practice, with a few WTC estimators, possibly dependent on tuning parameters, which make them more flexible.
2. Asymptotic Properties
2.1. Preliminary Results
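To study the nondegenerate asymptotic behaviour of the estimators, a second-order condition is required to specify the bias term. This condition can be expressed in terms of the slowly varying function $\ell$ in (5). Let us assume that the rate of convergence of $\log\{\ell(\lambda x)/\ell(x)\}$ towards 0 is ruled by a function $b(\cdot)$, with $b(x)\to 0$ as $x\to\infty$. Then, there exists $\rho\le 0$ such that $|b|\in RV_\rho$ and
$$\lim_{x\to\infty}\frac{\log\{\ell(\lambda x)/\ell(x)\}}{b(x)}=K_\rho(\lambda):=\int_1^{\lambda}u^{\rho-1}\,du,\qquad\text{for all }\lambda>0.\tag{16}$$
This second-order parameter $\rho$ quantifies the rate of convergence of $\log\{\ell(\lambda x)/\ell(x)\}$ to 0: the closer $\rho$ is to 0, the slower the convergence.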
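A hypothetical slowly varying function helps to see condition (16) at work. Assuming $\ell(x)=1+c\,x^{\rho}$ with $\rho<0$ (an illustrative choice, not one of the models used later), one may take $b(x)=\rho\,c\,x^{\rho}$, and the ratio in (16) approaches $K_\rho(\lambda)=(\lambda^{\rho}-1)/\rho$. The function names are hypothetical.

```python
import math

def K(lam: float, rho: float) -> float:
    """K_rho(lam) = int_1^lam u**(rho - 1) du."""
    return math.log(lam) if rho == 0 else (lam ** rho - 1.0) / rho

def second_order_ratio(x: float, lam: float, rho: float, c: float = 1.0) -> float:
    """log(ell(lam x)/ell(x)) / b(x) for ell(x) = 1 + c x**rho and b(x) = rho c x**rho."""
    log_ratio = math.log1p(c * (lam * x) ** rho) - math.log1p(c * x ** rho)
    return log_ratio / (rho * c * x ** rho)

for x in (1e2, 1e4, 1e6):
    print(x, second_order_ratio(x, lam=2.0, rho=-1.0), K(2.0, -1.0))
```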
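Remark 1. In the context of EVT, both the EVI and the second-order parameter are null for the models in (4). The associated tails are then in the max-domain of attraction of Gumbel's law, $G_0$ in (1), and exhibit a penultimate behaviour, looking more similar either to Weibull or to Fréchet tails, according to $\theta<1$ or $\theta>1$, respectively. For details on penultimate behaviour, see Gomes [30, 31] and Gomes and de Haan, among others. Indeed, notice that $U(t)=H^{\leftarrow}(\log t)=(\log t)^{\theta}\,\ell(\log t)$, and consequently,
$$\frac{U(tx)}{U(t)}=\Big(1+\frac{\log x}{\log t}\Big)^{\theta}\,\frac{\ell(\log t+\log x)}{\ell(\log t)},\qquad x>0.$$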
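Moreover, employing the Taylor expansion $(1+\log x/\log t)^{\theta}=1+\theta\log x/\log t+O\{(\log t)^{-2}\}$ to the first term, and (16) to the second one, we get
$$\lim_{t\to\infty}\frac{U(tx)-U(t)}{a(t)}=\log x,\qquad a(t):=\frac{\theta\,U(t)}{\log t},\quad x>0,$$
i.e., $F\in\mathcal{D}_M(G_0)$, so that $\xi=0$. Moreover, the remainder in this limit is ruled by functions of the order of $1/\log t$ and $b(\log t)$, which are slowly varying in $t$, i.e., the second-order parameter of EVT is also null.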
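The penultimate behaviour mentioned in Remark 1 can be quantified in the simplest case. Assuming a pure Weibull tail, $H(x)=x^{1/\theta}$ (i.e., $\ell\equiv 1$), the classical penultimate shape parameter is the derivative of the reciprocal hazard $1/H'$ at the level $U(n)=(\log n)^{\theta}$, which equals $(\theta-1)/\log n$: negative (Weibull-like) for $\theta<1$ and positive (Fréchet-like) for $\theta>1$. The sketch below checks this by a central finite difference; the helper name is hypothetical.

```python
import math

def penultimate_index(theta: float, n: float, eps: float = 1e-6) -> float:
    """Central-difference value of (1/h)'(U(n)) for the pure Weibull tail H(x) = x**(1/theta)."""
    def reciprocal_hazard(x: float) -> float:
        # h(x) = H'(x) = x**(1/theta - 1) / theta, so 1/h(x) = theta * x**(1 - 1/theta)
        return theta * x ** (1.0 - 1.0 / theta)
    level = math.log(n) ** theta            # U(n) = H^{<-}(log n) = (log n)**theta
    step = eps * level
    return (reciprocal_hazard(level + step) - reciprocal_hazard(level - step)) / (2.0 * step)

for theta in (0.5, 1.0, 2.0):
    print(theta, penultimate_index(theta, 1e6), (theta - 1.0) / math.log(1e6))
```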
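Next, we provide some information regarding the distributional behaviour of $V_{ik}$, defined in (7). Suppose that $Y_{1:n}\le\cdots\le Y_{n:n}$ are the order statistics generated by $n$ independent standard Pareto random variables with CDF $F_Y(y)=1-1/y$, $y\ge 1$. Then, $X_{i:n}\stackrel{d}{=}U(Y_{i:n})=H^{\leftarrow}(E_{i:n})$, $1\le i\le n$, where $E_{i:n}:=\log Y_{i:n}$ are the order statistics of a sample of $n$ IID standard exponential RVs. If $k$ is intermediate, the following distributional representation holds:
$$E_{n-k:n}\stackrel{d}{=}\log(n/k)+O_p\big(k^{-1/2}\big).$$
Hence, since $E_{n-i+1:n}-E_{n-k:n}\stackrel{d}{=}E_{k-i+1:k}$, $1\le i\le k$, where $E_{1:k}\le\cdots\le E_{k:k}$ are the order statistics of $k$ independent, identically exponentially distributed RVs with mean one, independent of $E_{n-k:n}$, it follows from (5) that, for $1\le i\le k$,
$$V_{ik}\stackrel{d}{=}\theta\log\Big(1+\frac{E_{k-i+1:k}}{E_{n-k:n}}\Big)+\log\frac{\ell(E_{n-k:n}+E_{k-i+1:k})}{\ell(E_{n-k:n})}.\tag{22}$$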
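Results for $U_{ik}$ can be easily deduced due to the relation $U_{ik}=e^{V_{ik}}$. Thus, we get, for $1\le i\le k$,
$$U_{ik}\stackrel{d}{=}\Big(1+\frac{E_{k-i+1:k}}{E_{n-k:n}}\Big)^{\theta}\,\frac{\ell(E_{n-k:n}+E_{k-i+1:k})}{\ell(E_{n-k:n})}.\tag{23}$$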
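The representation of the log excesses can be simulated directly, without generating the $n$ observations. Assuming a pure Weibull tail ($\ell$ constant, so the slowly varying log ratio vanishes), $V_{ik}\stackrel{d}{=}\theta\log(1+E_{k-i+1:k}/E_{n-k:n})$, where $E_{n-k:n}=-\log U_{k+1:n}$, with $U_{k+1:n}\sim\mathrm{Beta}(k+1,n-k)$ the $(k+1)$-th smallest of $n$ uniform order statistics, is independent of the $k$ exponential order statistics. The sketch uses this to replicate the estimator in (10); the names and settings are illustrative.

```python
import math
import random

def simulate_log_excesses(n, k, theta, rng):
    """One draw of (V_1k, ..., V_kk) for a pure Weibull tail (ell constant)."""
    # E_{n-k:n} = log Y_{n-k:n} = -log U_{k+1:n}, with U_{k+1:n} ~ Beta(k + 1, n - k)
    e_thr = -math.log(rng.betavariate(k + 1, n - k))
    # E_{k:k} >= ... >= E_{1:k}: order statistics of k IID standard exponentials
    e_top = sorted((rng.expovariate(1.0) for _ in range(k)), reverse=True)
    return [theta * math.log1p(e / e_thr) for e in e_top]

rng = random.Random(3)
n, k, theta = 10**6, 1000, 2.0
reps = [math.log(n / k) * sum(simulate_log_excesses(n, k, theta, rng)) / k
        for _ in range(50)]
print(sum(reps) / len(reps))   # replicated mean of the estimator in (10); true theta = 2
```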
2.2. Asymptotic Behaviour of the Estimators
The next theorem establishes the limit distribution of the estimator in (10).
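Theorem 2. For intermediate values of $k$, as in (12), the estimator in (10) is consistent for the estimation of $\theta$. More than that, the distributional representation
$$\log(n/k)\,H_k\stackrel{d}{=}\theta+\frac{\theta Z_k}{\sqrt{k}}+b\big(\log(n/k)\big)-\frac{\theta}{\log(n/k)}+o_p\Big(\frac{1}{\sqrt{k}}\Big)+o_p\big(b(\log(n/k))\big)+o_p\Big(\frac{1}{\log(n/k)}\Big)\tag{24}$$
holds, with $Z_k$ an asymptotically standard normal RV.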
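A deterministic check of the $-\theta/\log(n/k)$ bias term is possible under simplifying assumptions: a pure Weibull tail (so the term in $b$ vanishes), $E_{n-k:n}$ replaced by $L=\log(n/k)$, and $k\to\infty$, so that the average of the log excesses becomes $\theta\,E\{\log(1+Y/L)\}$, with $Y$ standard exponential. The resulting limit mean, $\theta L\,E\{\log(1+Y/L)\}=\theta\{1-1/L+O(L^{-2})\}$, is computed below by Simpson's rule (a hand-written helper, not a library routine).

```python
import math

def simpson(f, a, b, m=20_000):
    """Composite Simpson rule on [a, b] with an even number m of subintervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for j in range(1, m):
        total += (4.0 if j % 2 else 2.0) * f(a + j * h)
    return total * h / 3.0

def limit_mean_gg(theta, L):
    """theta * L * E[log(1 + Y/L)], Y ~ Exp(1); the integral is truncated at y = 60."""
    return theta * L * simpson(lambda y: math.log1p(y / L) * math.exp(-y), 0.0, 60.0)

for L in (5.0, 10.0, 20.0):
    print(L, limit_mean_gg(1.0, L), 1.0 - 1.0 / L)   # limit mean vs theta * (1 - 1/L)
```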
The asymptotic behaviour of the new class of mean-of-order-$p$ WTC estimators, in (15), is stated and proven next.
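Theorem 3. Under the validity of the conditions in (4) and (16), with $k$ being a sequence of intermediate values, as in (12), the asymptotic distributional representation
$$\hat\theta_k(p)\stackrel{d}{=}\theta+\frac{\theta Z_k}{\sqrt{k}}+b\big(\log(n/k)\big)-\frac{\theta}{\log(n/k)}+o_p\Big(\frac{1}{\sqrt{k}}\Big)+o_p\big(b(\log(n/k))\big)+o_p\Big(\frac{1}{\log(n/k)}\Big)\tag{26}$$
holds for the mean-of-order-$p$ WTC estimator, $\hat\theta_k(p)$, in (15), for any real $p$, with $Z_k$ being the asymptotically standard normal RV in (24).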
Proof. It is only necessary to prove equation (26) for $p\neq 0$, since the case $p=0$ was already derived in Theorem 2.
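By using (23) and a second-order Taylor expansion of $U_{ik}^{p}$, we obtain a distributional representation of $A_k^{p}(p)=\frac{1}{k}\sum_{i=1}^{k}U_{ik}^{p}$. Consequently, under the validity of (12), and using (22) and the same results used in the proof of Theorem 2, it is possible to expand $1-A_k^{-p}(p)$ up to terms of order $1/\log^{2}(n/k)$; in this expansion, the terms of order $p^{2}/\log^{2}(n/k)$ cancel. Then, from the definition of $H_k(p)$ in (14), it follows that, up to the order considered, $H_k(p)$ has the same distributional representation as $H_k$, for any real $p$. The result in (26) follows straightforwardly from the definition of $\hat\theta_k(p)$, in (15).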
Remark 4. Under the conditions of Theorem 3, the asymptotic distributional representations of the WTC estimators in (10) and (15) are the same. Since the representation in (26) does not depend on the real tuning parameter $p$ of the mean-of-order-$p$, it cannot be used to determine an optimal value of $p$, i.e., the value of $p$ that cancels the asymptotic bias or minimizes the RMSE of the mean-of-order-$p$ WTC estimator. However, dependence on $p$ can appear if higher-order terms are considered in the expansion of the tail quantile function.
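Remark 4 can be illustrated under simplifying assumptions: a pure Weibull tail, $E_{n-k:n}$ replaced by $L=\log(n/k)$, and $k\to\infty$, so that the mean-of-order-$p$ statistic converges to $\{1-1/E[(1+Y/L)^{p\theta}]\}/p$, with $Y$ standard exponential. A short expansion suggests that the limit of $\hat\theta_k(p)-\hat\theta_k(0)$ then behaves like $-p\,\theta^2/L^2$, of smaller order than the $-\theta/L$ bias term; the sketch below checks this numerically for $p=-1$. It is an illustration under these assumptions, not part of the proof, and the helper names are hypothetical.

```python
import math

def simpson(f, a, b, m=20_000):
    """Composite Simpson rule on [a, b] with an even number m of subintervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for j in range(1, m):
        total += (4.0 if j % 2 else 2.0) * f(a + j * h)
    return total * h / 3.0

def limit_wtc_mop(theta, L, p):
    """Large-k limit of (15) for a pure Weibull tail, with E_{n-k:n} replaced by L = log(n/k)."""
    if p == 0:
        m1 = simpson(lambda y: math.log1p(y / L) * math.exp(-y), 0.0, 60.0)   # E[log(1 + Y/L)]
        return theta * L * m1
    mp = simpson(lambda y: (1.0 + y / L) ** (p * theta) * math.exp(-y), 0.0, 60.0)  # E[U^p]
    return L * (1.0 - 1.0 / mp) / p

for L in (10.0, 20.0, 40.0):
    d = limit_wtc_mop(1.0, L, -1.0) - limit_wtc_mop(1.0, L, 0.0)
    print(L, d, L * L * d)   # L**2 * d should stay of order one
```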
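Proof of Corollary 5. From the distributional representations in (24) and (26), it is possible to write
$$\sqrt{k}\,\big(\hat\theta_k(p)-\theta\big)\stackrel{d}{=}\theta Z_k+\sqrt{k}\,b\big(\log(n/k)\big)-\frac{\theta\sqrt{k}}{\log(n/k)}+o_p(1)+o_p\big(\sqrt{k}\,b(\log(n/k))\big)+o_p\Big(\frac{\sqrt{k}}{\log(n/k)}\Big).$$
Assuming that $\sqrt{k}\,b(\log(n/k))\to\lambda_1$ and $\sqrt{k}/\log(n/k)\to\lambda_2$, both finite (see …, Proposition 2.1), with $Z_k$ being the asymptotically standard normal RV in (24), the result follows.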
Proof of Proposition 6. For the proof of the first limit result, we refer to Theorem 3.2 of Beirlant et al., with some trivial modifications. The second limit result is a particular case of Theorem 1 in Gardes and Girard.
Remark 7. Although Corollary 5 and Proposition 6 provide similar asymptotic distributions for the WTC estimators considered in this work, the same cannot be guaranteed for their finite sample performance. It is known that asymptotically equivalent estimators of the WTC can behave differently for small sample sizes (see Goegebeur et al., p. 3697). A similar comment applies to estimators of any parameter of rare events.
3. Monte Carlo Simulation Study
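In this section, the finite sample performance of the class of estimators $\hat\theta_k(p)$, in (15), is evaluated through a Monte Carlo simulation study. For comparative purposes, the WTC estimators in (8) and (9) were also included in the study. The values of the parameter $p$ were selected from a preliminary simulation study. The value $p=0$ was always used, since it provides the estimator in (10). A positive value of $p$ was also used to illustrate the effect of a positive value of the parameter. The following Weibull-type models were considered:
(1) Exponential distribution, $\mathcal{E}(\lambda)$, $\lambda>0$, with CDF $F(x)=1-\exp(-\lambda x)$, $x\ge 0$. The WTC is $\theta=1$.
(2) Gamma distribution, $\mathcal{G}(\alpha,\lambda)$, $\alpha,\lambda>0$, with density $f(x)=\lambda^{\alpha}x^{\alpha-1}\exp(-\lambda x)/\Gamma(\alpha)$, $x>0$, for which $\theta=1$. Illustration is provided for a single value of $\alpha$.
(3) Weibull distribution, $\mathcal{W}(\tau,\lambda)$, $\tau,\lambda>0$, with density $f(x)=\tau\lambda^{\tau}x^{\tau-1}\exp\{-(\lambda x)^{\tau}\}$, $x>0$. Illustration is provided for a single value of $\tau$. The WTC is $\theta=1/\tau$.
(4) Half-normal distribution: the absolute value of a standard normal RV. For this model, the WTC is $\theta=1/2$.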
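The four parent models can be sampled with Python's standard `random` module (`expovariate`, `gammavariate`, `weibullvariate`, `gauss`). In this sketch all scales are set to one and the common `shape` value is a hypothetical default, not the value used in the study; the WTC does not depend on the scale.

```python
import random

def sample_model(model, n, rng, shape=2.0):
    """Draw n values from one of the Weibull-type models; return (sample, WTC theta).

    'shape' is the gamma shape alpha or the Weibull shape tau (hypothetical default).
    """
    if model == "exponential":
        return [rng.expovariate(1.0) for _ in range(n)], 1.0
    if model == "gamma":
        return [rng.gammavariate(shape, 1.0) for _ in range(n)], 1.0          # (shape, scale)
    if model == "weibull":
        return [rng.weibullvariate(1.0, shape) for _ in range(n)], 1.0 / shape  # (scale, shape)
    if model == "half_normal":
        return [abs(rng.gauss(0.0, 1.0)) for _ in range(n)], 0.5
    raise ValueError(f"unknown model: {model}")

rng = random.Random(11)
for name in ("exponential", "gamma", "weibull", "half_normal"):
    xs, theta = sample_model(name, 1000, rng)
    print(name, theta, max(xs))
```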
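For each model, $r=20000$ samples of each size $n$, from a set of sample sizes that includes $n=200$, 500, 1000, 2000, and 5000, were simulated. Next, for each model and sample size $n$, the simulated values $\hat\theta_k^{(j)}$, $k=1,\ldots,n-1$, $j=1,\ldots,r$, of each estimator under study, provided by the $j$-th simulated sample, were computed. Then, the Monte Carlo estimates of the mean value ($E$) and of the RMSE,
$$E(\hat\theta_k)=\frac{1}{r}\sum_{j=1}^{r}\hat\theta_k^{(j)},\qquad \mathrm{RMSE}(\hat\theta_k)=\sqrt{\frac{1}{r}\sum_{j=1}^{r}\big(\hat\theta_k^{(j)}-\theta\big)^{2}},$$
were obtained. In addition, the simulated optimum levels,
$$k_0^{E}:=\arg\min_k\big|E(\hat\theta_k)-\theta\big|,\qquad k_0^{\mathrm{RMSE}}:=\arg\min_k\mathrm{RMSE}(\hat\theta_k),$$
were computed.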
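The Monte Carlo summaries above, including the optimal sample fractions used in Tables 1–4, can be sketched as follows. The sketch computes $\hat\theta_k(p)$ for all $k$ with running sums, uses a far smaller number of replicates than the 20000 of the study, and its names and half-normal example are illustrative assumptions.

```python
import math
import random

def wtc_mop_all_k(sample, p):
    """theta_hat_k(p), (15), for k = 1, ..., n - 1, from one sort and running sums."""
    x = sorted(sample, reverse=True)    # x[i] = X_{n-i:n}; x[0] is the maximum
    n = len(x)
    out, acc = [], 0.0
    for k in range(1, n):
        thr = x[k]                      # X_{n-k:n}
        if p == 0:
            acc += math.log(x[k - 1])   # sum of log X_{n-i+1:n}, i = 1, ..., k
            h = acc / k - math.log(thr)
        else:
            acc += x[k - 1] ** p        # sum of X_{n-i+1:n}**p, i = 1, ..., k
            h = (1.0 - k * thr ** p / acc) / p   # mean of U_ik**p is acc / (k thr**p)
        out.append(math.log(n / k) * h)
    return out

def monte_carlo(sampler, theta, n, p, r, seed=0):
    """Simulated mean value, RMSE and OSFs of theta_hat_k(p), k = 1, ..., n - 1."""
    rng = random.Random(seed)
    sums = [0.0] * (n - 1)
    sq_err = [0.0] * (n - 1)
    for _ in range(r):
        for j, e in enumerate(wtc_mop_all_k(sampler(n, rng), p)):
            sums[j] += e
            sq_err[j] += (e - theta) ** 2
    mean = [s / r for s in sums]
    rmse = [math.sqrt(s / r) for s in sq_err]
    k0_mean = 1 + min(range(n - 1), key=lambda j: abs(mean[j] - theta))  # smallest |bias|
    k0_rmse = 1 + min(range(n - 1), key=lambda j: rmse[j])               # smallest RMSE
    return mean, rmse, k0_mean / n, k0_rmse / n                          # OSF = k0 / n

def half_normal(n, rng):
    return [abs(rng.gauss(0.0, 1.0)) for _ in range(n)]

mean, rmse, osf_mean, osf_rmse = monte_carlo(half_normal, 0.5, n=200, p=-1.0, r=200)
print(osf_mean, osf_rmse, min(rmse))
```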
Figures 1–4 show the behaviour of the aforementioned class of WTC estimators, $\hat\theta_k(p)$, as a function of $k$. On the left, the simulated mean values are presented, and, on the right, the corresponding RMSE estimates are provided. The horizontal solid line in the left plot indicates the true WTC value. A good performance is characterized by a flat mean value curve close to the true value of $\theta$ and by a small RMSE in that flat region.
Although the simulation is limited to these selected Weibull-type models, the following comments can be made:
(i) The estimator in (10) ($\hat\theta_k(0)$, in (15)) always has a negative bias, which makes it very sensitive to the choice of the threshold $k$;
(ii) In all the simulated cases, it was always possible to find a negative value of $p$ that drastically reduces the absolute bias and the RMSE. This is the opposite of what typically happens with the mean-of-order-$p$ EVI estimator, for which both bias and RMSE are reduced for positive values of $p$. For such a value of $p$, $\hat\theta_k(p)$ strongly beats the estimator considered by Gardes and Girard;
(iii) The estimators in (8) and (9) beat the class of mean-of-order-$p$ WTC estimators, in terms of bias and RMSE, for the exponential and Weibull parents under study. For these two parents, the best estimator was the one proposed by Girard;
(iv) For the gamma and half-normal parents considered here, it is always possible to find a value of $p$ such that $\hat\theta_k(p)$ outperforms both estimators in (8) and (9), in bias and in RMSE;
(v) Algorithms for the choice of the tuning parameters $k$ and $p$ are still under development but can be easily devised, similarly to what is done in Caeiro and Gomes or Gomes et al.
In Tables 1–4, the simulated values of the optimal sample fraction (OSF, the optimal level divided by the sample size), together with the corresponding mean value ($E$) and RMSE of the estimators under study, are presented. For each model, the mean value closest to the target value and the smallest RMSE are written in bold. Observe that, to reach the smallest absolute bias or the smallest RMSE, it is necessary to use a larger sample fraction than the one required by $\hat\theta_k(0)$. The smallest absolute bias and RMSE are always achieved by $\hat\theta_k(p)$, with a negative value of $p$, for the gamma and half-normal models. Also, the OSF decreases as the sample size increases. For large sample sizes, specific values of $p$, which depend on the model, seem to provide an overall good performance for the exponential, gamma, Weibull, and half-normal models. For the exponential and Weibull models, the smallest absolute bias and RMSE are always achieved by the estimator proposed by Girard.
4. Conclusions
In this paper, the estimation of the WTC, a parameter of high interest when working with Weibull-type models, is the main topic under discussion. Due to the similarity between WTC estimation and EVI estimation, and to the good performance of the EVI estimators based on GMs, a new class of WTC estimators, based on the power mean-of-order-$p$, was introduced. The consistency and asymptotic normality of the new class of estimators were obtained under adequate conditions. The finite sample behaviour of the estimators was evaluated through a Monte Carlo simulation study applied to some selected Weibull-type models. The dependence on the tuning parameter $p$ makes the new class highly flexible when compared to the classical WTC estimators in (8), (9), and (10) available in the literature. For the new class of WTC estimators, it is always possible to find a negative value of the tuning parameter $p$ that enables a sharp bias and RMSE reduction for the gamma and half-normal models. For the exponential and Weibull models, the WTC estimators in (8) and (9) outperform the new class of WTC estimators proposed in this paper, with the estimator proposed by Girard being the one providing the smallest bias and RMSE. A possible improvement of this estimator could be achieved by replacing $H_k$ by $H_k(p)$, in (14). This topic should be addressed in future work. Anyway, looking at the simulated values, it is possible that a choice of $p$ different from the ones considered in the Monte Carlo simulations would provide the smallest bias in most situations. Algorithms for the choice of the tuning parameters $k$ and $p$ are under development and are out of the scope of this paper.
Data Availability
The simulated data used in this study are available from the authors upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This research was partially supported by national funds through FCT (Fundação para a Ciência e a Tecnologia), through projects UIDB/00297/2020 (CMA/UNL), UIDB/04674/2020 (CIMA), and UIDB/00006/2020 (CEA/UL).
References
J. Beirlant, F. Caeiro, and M. I. Gomes, “An overview and open research topics in statistics of univariate extremes,” Revstat–Statistical Journal, vol. 10, no. 1, pp. 1–31, 2012.
C. Scarrott and A. MacDonald, “A review of extreme value threshold estimation and uncertainty quantification,” Revstat–Statistical Journal, vol. 10, no. 1, pp. 33–60, 2012.
N. H. Bingham, C. M. Goldie, and J. L. Teugels, Regular Variation, vol. 27 of Encyclopedia of Mathematics and its Applications, Cambridge University Press, 1989.
M. Berred, “Record values and the estimation of the Weibull tail-coefficient,” Comptes rendus de l’Académie des sciences. Série 1, Mathématiques, vol. 312, no. 12, pp. 943–946, 1991.
J. Beirlant, J. L. Teugels, and P. Vynckier, Practical Analysis of Extreme Values, Leuven University Press, 1996.
L. Gardes and S. Girard, “Comparison of Weibull tail-coefficient estimators,” Revstat–Statistical Journal, vol. 4, no. 2, pp. 163–188, 2006.
F. Caeiro, M. I. Gomes, and L. Henriques-Rodrigues, “Estimation of the Weibull Tail Coefficient through the Power Mean-of-Order-p,” Recent Developments in Statistics and Data Science, in press, 2022.
M. I. Gomes and F. Caeiro, “Efficiency of partially reduced-bias mean-of-order-p versus minimum-variance reduced-bias extreme value index estimation,” in Proceedings of COMPSTAT 2014, International Association for Statistical Computing, pp. 289–298, Geneva, Switzerland, 2014.
M. I. Gomes, Some Probabilistic and Statistical Problems in Extreme Value Theory, Ph.D. thesis, University of Sheffield, 1978.
M. I. Gomes, “Penultimate behaviour of the extremes,” in Extreme Value Theory and Applications, pp. 403–418, Springer, 1994.
F. Caeiro and M. I. Gomes, “Threshold selection in extreme value analysis,” in Extreme Value Modeling and Risk Analysis: Methods and Applications, pp. 71–89, Chapman-Hall/CRC, Boca Raton, FL, USA, 2015.
M. I. Gomes, F. Caeiro, L. Henriques-Rodrigues, and B. Manjunath, “Bootstrap methods in statistics of extremes,” in Extreme Events in Finance: A Handbook of Extreme Value Theory and its Applications, vol. 6, pp. 117–138, John Wiley & Sons, Hoboken, NJ, USA, 2016.