Abstract

Box-Cox transformations are used to make data more suitable for statistical analysis. It is known from the literature that such a transformation can increase the rate of convergence of the tail of the distribution to the generalized extreme value distribution and, as a byproduct, reduce the bias of the estimation procedure. The reduction of the bias of the Hill estimator has been widely addressed in the extreme value theory literature. Several techniques have been used to achieve it, either by removing the main component of the bias of the Hill estimator of the extreme value index (EVI) or by constructing new estimators based on generalized means or norms that generalize the Hill estimator. We study the Box-Cox Hill estimator introduced by Teugels and Vanroelen in 2004, proving its consistency and asymptotic normality and addressing the choice and estimation of the power and shift parameters of the Box-Cox transformation for EVI estimation. The finite-sample performance of the estimators under study is illustrated through small-scale Monte Carlo simulation studies.

1. Introduction and Preliminaries

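In extreme value theory, one is often interested in the estimation of the extreme value index (EVI), $\xi$, a parameter controlling the heaviness of the right tail $\bar{F} = 1 - F$, with $F$ the distribution function (d.f.) of the process under study. In a univariate independent and identically distributed (i.i.d.) setup, let us assume that we have access to a random sample $(X_1, \dots, X_n)$ of size $n$ and use the notation $X_{i:n}$ to denote the $i$-th ascending order statistic, $1 \le i \le n$. Similar to the central limit theorem, the extremal types theorem states that if there exist norming sequences $a_n > 0$ and $b_n \in \mathbb{R}$ such that the normalized maximum has a nondegenerate limit distribution,
\[ \lim_{n \to \infty} P\left( \frac{X_{n:n} - b_n}{a_n} \le x \right) = G(x), \tag{1} \]
then $G$ is an extreme value (EV) d.f. of the type:
\[ G(x) \equiv EV_\xi(x) = \begin{cases} \exp\left(-(1 + \xi x)^{-1/\xi}\right), & 1 + \xi x > 0, & \text{if } \xi \neq 0, \\ \exp(-\exp(-x)), & x \in \mathbb{R}, & \text{if } \xi = 0. \end{cases} \tag{2} \]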

From Equations (1) and (2), the d.f. $F$ is said to pertain to the domain of attraction for maxima of $EV_\xi$, with $\xi \in \mathbb{R}$. Such a fact is denoted by $F \in \mathcal{D}_M(EV_\xi)$. The distribution in Equation (2) encompasses three types of distributions: if $\xi < 0$, the right tail is light; i.e., $F$ has a finite right endpoint ($x^F := \sup\{x : F(x) < 1\} < +\infty$); if $\xi > 0$, the right tail is heavy, of a negative polynomial type, and $F$ has an infinite right endpoint; if $\xi = 0$, the right tail is of an exponential type and the right endpoint can be either finite or infinite.

In this paper, we are going to work with heavy-tailed models, i.e., models with $\xi > 0$ that belong to the so-called Fréchet domain of attraction, denoted by $\mathcal{D}_M^+ := \mathcal{D}_M(EV_\xi)_{\xi > 0}$. These types of models are quite common in areas like biostatistics, computer science, finance, insurance, and statistical quality control, among others.

1.1. Box-Cox Transformations in Extreme Value Theory

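Let $X$ be a positive random variable (r.v.) with d.f. $F$ and probability density function $f$. The Box-Cox (BC) transformation [1] of $X$ is a function of the parameter $\lambda \in \mathbb{R}$ given by the following:
\[ Y = \begin{cases} (X^\lambda - 1)/\lambda, & \lambda \neq 0, \\ \ln X, & \lambda = 0, \end{cases} \]
where $\lambda = 1$ corresponds to a mere change in location, $\lambda = 1/2$ to the square root transformation, and $\lambda = -1$ to the reciprocal transformation.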
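As a minimal sketch of the transformation just described, assuming the usual one-parameter Box-Cox form $(x^\lambda - 1)/\lambda$ with the logarithm as the limiting case $\lambda = 0$ (the function name `box_cox` is illustrative, not taken from the paper):

```python
import math


def box_cox(x: float, lam: float) -> float:
    """Box-Cox transform of a positive value x with power parameter lam."""
    if x <= 0:
        raise ValueError("the Box-Cox transformation needs x > 0")
    if lam == 0:
        return math.log(x)  # limiting case lam -> 0
    return (x ** lam - 1.0) / lam


print(box_cox(9.0, 1.0))   # lam = 1: a mere change in location, x - 1
print(box_cox(9.0, 0.5))   # lam = 1/2: square-root type, 2 * (sqrt(x) - 1)
print(box_cox(4.0, -1.0))  # lam = -1: reciprocal type, 1 - 1/x
```

The three calls mirror the three special cases named in the text; the case $\lambda = 0$ is handled separately so that the map stays continuous in $\lambda$.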

The use of BC transformations in EV theory has been quite diverse. Bali [2] proposed the BC-GEV distribution to model the empirical distribution of returns as a generalization of the GEV distribution in Equation (2) and of the generalized Pareto distribution, with d.f. . The d.f. of the BC-GEV distribution is as follows:

Under a semiparametric framework, Teugels and Vanroelen [3] provided the theoretical framework through the use of regular variation (see Seneta [4] and Bingham et al. [5] for details on regular variation) to determine the optimal values of $\lambda$ that maximize the rate of convergence of the second-order condition (to be detailed in Section 1.2) for values of . Another application of the BC transformation was presented in the work of Eastoe and Tawn [6], under a Bayesian framework. Wadsworth et al. [7] used the BC transformation in the inference procedure to account parametrically for the uncertainty surrounding the scale extrapolation in physical processes. Paulauskas and Vaiciulis [8, 9] used the BC transformation and considered block-scheme estimators built from a family of functions of order statistics.

In the next theorem, the link between the domain of attraction of and of the transformed variable , with d.f. , is established.

Theorem 1 (Wadsworth et al. [7]). Under the conditions such that convergence in Equation (1) holds, with norming sequences and , then with the d.f. in Equation (2) holds for some finite where and we can say that . Furthermore, if is twice differentiable for sufficiently large , then the limiting shape parameter takes the form: with the reciprocal hazard function of the parent distribution . For any such distribution which has , , due to fact that tends then to zero, as .

Remark 2. Note that if , then and is in the same domain of attraction as . If , i.e., if we are in the Fréchet domain of attraction, can be different from and may no longer belong to the Fréchet domain of attraction.

1.2. First- and Second-Order Frameworks

With $F^{\leftarrow}(y) := \inf\{x : F(x) \ge y\}$ denoting the generalized inverse function of $F$, let $U(t) := F^{\leftarrow}(1 - 1/t)$, $t \ge 1$, be the associated (reciprocal) quantile function, and let $RV_a$ stand for the class of regularly varying functions at infinity with index $a$, i.e., positive measurable functions $g$ such that $\lim_{t \to \infty} g(tx)/g(t) = x^a$, for all $x > 0$.

In this setup, we have the validity of the first-order condition (de Haan [10]):
\[ F \in \mathcal{D}_M^+ \iff \bar{F} = 1 - F \in RV_{-1/\xi} \iff U \in RV_{\xi}. \tag{8} \]

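To develop the asymptotic nondegenerate properties of the estimators under study, we need a second-order condition (SOC). We assume the existence of an auxiliary function $A$, with $|A| \in RV_\rho$ [11], where $\rho \le 0$ is a shape second-order parameter, such that, for every $x > 0$,
\[ \lim_{t \to \infty} \frac{\ln U(tx) - \ln U(t) - \xi \ln x}{A(t)} = \begin{cases} (x^\rho - 1)/\rho, & \rho < 0, \\ \ln x, & \rho = 0. \end{cases} \tag{9} \]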

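Slightly more restrictively, we are going to work with models in the Fréchet domain of attraction ($\xi > 0$) that belong to the Hall-Welsh class of models [12], i.e., models with the second-order right tail expansion:
\[ 1 - F(x) = \left(\frac{x}{C}\right)^{-1/\xi} \left( 1 + \frac{\beta}{\rho} \left(\frac{x}{C}\right)^{\rho/\xi} + o\left(x^{\rho/\xi}\right) \right), \quad x \to \infty, \tag{10} \]
with $\beta \neq 0$, $\rho < 0$, and $C > 0$, the first-order scale parameter. For these models, the second-order tail quantile function is as follows:
\[ U(t) = C t^{\xi} \left( 1 + \frac{A(t)}{\rho} + o(t^{\rho}) \right), \quad t \to \infty, \tag{11} \]
with $A(t) = \xi \beta t^{\rho}$. If the SOC in Equation (9) holds with , then . Most common heavy-tailed distributions, like the Fréchet, the generalized Pareto, the Burr and Student's $t$, belong to the class in Equation (11). The log-gamma and the log-Pareto models ($\rho = 0$ in Equation (9)) do not belong to the class in Equation (11).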
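To make the Hall-Welsh expansion concrete, the sketch below compares an exact tail quantile function with its second-order approximation $C t^{\xi}(1 + \xi \beta t^{\rho}/\rho)$. It assumes the Burr parametrization $F(x) = 1 - (1 + x^{-\rho/\xi})^{1/\rho}$, $x \ge 0$, for which a direct expansion gives $C = 1$ and $\beta = 1$; the function names are illustrative only.

```python
def burr_quantile(t: float, xi: float, rho: float) -> float:
    """U(t) = F^{-1}(1 - 1/t) for the assumed Burr d.f.
    F(x) = 1 - (1 + x**(-rho/xi))**(1/rho), x >= 0, with xi > 0 and rho < 0."""
    return (t ** (-rho) - 1.0) ** (-xi / rho)


def hall_welsh_quantile(t: float, xi: float, rho: float,
                        beta: float, c: float) -> float:
    """Second-order approximation C * t**xi * (1 + A(t)/rho),
    with A(t) = xi * beta * t**rho (remainder term dropped)."""
    return c * t ** xi * (1.0 + xi * beta * t ** rho / rho)


xi, rho = 0.5, -1.0
for t in (10.0, 100.0, 1000.0):
    exact = burr_quantile(t, xi, rho)
    approx = hall_welsh_quantile(t, xi, rho, beta=1.0, c=1.0)
    # The relative error shrinks as t grows, as the o(t**rho) term suggests.
    print(t, exact, approx, abs(approx / exact - 1.0))
```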

Remark 3. Under a parametric framework, Wadsworth et al. [7] showed that for models in Hall-Welsh class, a BC transformation of the data increases the rate of convergence of the tail of the distribution to the GEV distribution if , and as a byproduct, the bias of the estimation procedure is reduced.
Let us consider a BC transformation such that and denote by the transformed sample, with d.f. , associated with the original sample , where , with . The first-order condition associated with the is as follows: Under the validity of Equation (12) and for some , we have .
Teugels and Vanroelen [3] used the theory of extended regular variation (see de Haan and Ferreira [13]) to determine the optimal values of that maximize the rate of convergence of the second-order condition for values of and proved in Theorems 4.1, 4.2, and 4.3 that when and , a BC transformation has no effect on the SOC of unless ; if and , the BC transformation can have a negative or positive effect on the SOC of and the positive effect happens when . If and , for any value other than , there is no improvement in the speed of convergence of the SOC. Thus, the value that maximizes the rate of convergence of the SOC that will be considered in this work is as follows: This choice will enhance the bias reduction in the estimation procedure.

1.3. Reduced Bias EVI Estimation

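For heavy right tails, i.e., whenever Equation (8) holds, the classical extreme value index (EVI) estimator is the Hill (H) estimator [14], with functional form
\[ H(k) := \frac{1}{k} \sum_{i=1}^{k} \left\{ \ln X_{n-i+1:n} - \ln X_{n-k:n} \right\}, \quad 1 \le k < n, \tag{14} \]
the average of the log-excesses, and consequently also the average of the scaled log-spacings $U_i := i \left\{ \ln X_{n-i+1:n} - \ln X_{n-i:n} \right\}$, $1 \le i \le k < n$.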
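The two equivalent readings of the Hill estimator, as an average of log-excesses over the threshold $X_{n-k:n}$ and as an average of scaled log-spacings, can be checked numerically. The following is a minimal standard-library sketch; the function names are illustrative.

```python
import math
from typing import Sequence


def hill(sample: Sequence[float], k: int) -> float:
    """Hill estimator: average of the k log-excesses over X_{n-k:n}."""
    xs = sorted(sample)
    n = len(xs)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    log_threshold = math.log(xs[n - k - 1])  # ln X_{n-k:n}
    return sum(math.log(xs[n - i]) - log_threshold
               for i in range(1, k + 1)) / k


def hill_from_spacings(sample: Sequence[float], k: int) -> float:
    """Same estimator, written as the average of the scaled log-spacings
    U_i = i * (ln X_{n-i+1:n} - ln X_{n-i:n})."""
    xs = sorted(sample)
    n = len(xs)
    return sum(i * (math.log(xs[n - i]) - math.log(xs[n - i - 1]))
               for i in range(1, k + 1)) / k


data = [1.0, math.e, math.e ** 2, math.e ** 3]
print(hill(data, 2), hill_from_spacings(data, 2))
```

Both functions need the $k + 1$ largest observations to be positive, since only those enter the logarithms.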

But the Hill EVI estimator, in Equation (14), exhibits the usual trade-off between bias and variance, and therefore, the suitable accommodation of the asymptotic bias has been widely discussed among EVT practitioners. The first papers are due to Peng [15], Beirlant et al. [16], Feuerverger and Hall [17], and Gomes et al. [18], among others. Such a bias reduction usually provided more stable sample trajectories and a lower mean square error at the optimal level, but at the expense of an increase in variance, which did not allow the estimators proposed in the aforementioned papers to surpass the classical estimators for small values. Further details can be found in Gomes and Guillou [19], among others.

For heavy-tailed models, Caeiro et al. [20, 21] and Gomes et al. [22, 23], among others, were able to overcome the bias-variance trade-off, keeping the asymptotic variance of the proposed reduced bias EVI estimators equal to $\xi^2$, the asymptotic variance of the Hill EVI estimator. For that reason, these estimators were coined minimum variance reduced bias (MVRB) EVI estimators. These classes of estimators depend upon the adequate consistent estimation of the vector of second-order parameters, $(\beta, \rho)$, at a high level $k_1$ of larger order than $k$, the number of top order statistics used in the EVI estimation, and outperform the classical EVI estimators for all $k$ values.

Practical issues related to the estimation of $(\beta, \rho)$, together with algorithms, are presented by Gomes and Pestana [24], among others. Among several classes of MVRB EVI estimators, we refer to the class introduced in Caeiro et al. [20], the corrected Hill (CH) estimator:
\[ CH(k) := H(k) \left( 1 - \frac{\hat{\beta}}{1 - \hat{\rho}} \left( \frac{n}{k} \right)^{\hat{\rho}} \right), \tag{15} \]
with $(\hat{\beta}, \hat{\rho})$ a set of adequate estimators of the second-order parameters (see Section 3). The estimator in Equation (15) will be used in Section 4 for comparison with the new class of MVRB EVI estimators introduced in this paper. Other techniques have been used in the literature to reduce the bias of the Hill estimator, in Equation (14), and new estimators based on generalized means or norms that generalize the Hill estimator have been proposed (see Brilhante et al. [25], Caeiro et al. [26-29], Paulauskas and Vaiciulis [8, 9], Beran et al. [30], and Beirlant et al. [31], among others).
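The corrected Hill estimator can be sketched as a multiplicative correction of the Hill estimator. The block below assumes the form commonly given for this estimator in the MVRB literature, $CH(k) = H(k)\,(1 - \hat{\beta}\,(n/k)^{\hat{\rho}}/(1 - \hat{\rho}))$, and takes the second-order estimates as plain arguments; how they are obtained is a separate step, and the function names are illustrative.

```python
import math
from typing import Sequence


def hill(sample: Sequence[float], k: int) -> float:
    """Hill estimator based on the k largest observations."""
    xs = sorted(sample)
    n = len(xs)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    log_threshold = math.log(xs[n - k - 1])
    return sum(math.log(xs[n - i]) - log_threshold
               for i in range(1, k + 1)) / k


def corrected_hill(sample: Sequence[float], k: int,
                   beta_hat: float, rho_hat: float) -> float:
    """Assumed CH form: H(k) * (1 - beta_hat * (n/k)**rho_hat / (1 - rho_hat)),
    with rho_hat < 0 supplied by the caller."""
    n = len(sample)
    correction = beta_hat * (n / k) ** rho_hat / (1.0 - rho_hat)
    return hill(sample, k) * (1.0 - correction)


data = [1.0, math.e, math.e ** 2, math.e ** 3]
print(hill(data, 2), corrected_hill(data, 2, beta_hat=1.0, rho_hat=-1.0))
```

Because $(n/k)^{\hat{\rho}}$ is small when $k$ is small relative to $n$, the correction matters mostly for large $k$, where the Hill bias dominates.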

In Section 2 of this paper, we present the BC Hill estimator and derive the consistency and asymptotic normality of the estimator under study assuming that the BC parameters are known. Next, in Section 3, we address the estimation of the BC parameters, and in Section 4, we present the finite sample behaviour of the BC Hill estimators and compare it with the corrected Hill estimator. Finally, in Section 5, some concluding remarks and future work are put forward.

2. The Box-Cox Hill Estimator

The Box-Cox Hill (BC Hill) estimator introduced by Teugels and Vanroelen [3] has the functional form: where , so that is strictly positive, , as in Equation (13) and, , in Equation (14). The estimators $H(k)$, in Equation (14), and , in Equation (16), are both consistent for the estimation of $\xi$, for intermediate sequences of integers $k = k_n$, i.e., such that
\[ k = k_n \to \infty \quad \text{and} \quad k/n \to 0, \quad \text{as } n \to \infty. \tag{17} \]

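Under the second-order framework in Equation (9), the asymptotic distributional representation
\[ H(k) \overset{d}{=} \xi + \frac{\xi Z_k}{\sqrt{k}} + \frac{A(n/k)}{1 - \rho} \left( 1 + o_p(1) \right) \]
holds true [32], where $Z_k = \sqrt{k} \left( \sum_{i=1}^{k} E_i / k - 1 \right)$, with $\{E_i\}$ a sequence of i.i.d. standard exponential r.v.'s, is an asymptotically standard normal r.v.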

Remark 4. The BC Hill estimator is also a generalization of the EVI estimator introduced by Gomes and Oliveira [33], where the shift imposed on the data is a tuning parameter. The authors studied the adequate choice of the shift that enabled the reduction of the main component of the bias of the Hill estimator, in Equation (14). The results obtained in the simulation studies for models with are in line with the ones provided in Section 4.

2.1. Asymptotic Behaviour of the Box-Cox Hill Estimator When and Are Known

In this section, we start by deriving the asymptotic behaviour of the BC Hill estimator assuming that the BC parameters, and , are known. We then get an optimal value for that allows us to obtain a new class of MVRB EVI estimators.

Theorem 5. In Hall-Welsh class of models in Equation (11), for intermediate , i.e., for values such that Equation (17) holds, we have the validity of following distributional representation, If , possibly nonnull, we can guarantee that with denoting a standard normal r.v. If we further choose then

Proof. From the universal uniform transformation and the definition of the reciprocal quantile function in Equation (11), we get , with a unit Pareto r.v. In what follows, we shall use the following relations: for , , , where denotes a standard exponential r.v., and , whenever Equation (8) holds. For the second term on the right-hand side, we know that under the validity of Equations (8) and (11), we can write and . Then, Since , for , Using the parametrizations and , the distributional representation in Equation (20) holds true. We then get an optimal value, the value of in Equation (22), such that the dominant component of the bias of the BC Hill estimator is null, and Equation (23) follows.

Corollary 6. Under the first-order condition in Equation (12), and for the heavy-tailed models in Hall and Welsh class, in Equation (11), such that , with and real constants, , the BC Hill estimator in Equation (16) is asymptotically unbiased for any , not necessarily intermediate.

Proof. This result comes straightforwardly from the definition of the Box-Cox Hill estimator in Equation (16) and from the fact that we are working with heavy-tailed models in Hall and Welsh class, with quantile function , and real constants, .

Remark 7. The heavy-tailed models in Hall and Welsh class that satisfy the property in Corollary 6 are the , the , and the parents. This behaviour will be illustrated in Section 4.

In Table 1, we show for some heavy-tailed models the optimal values of and , in Equations (13) and (22), respectively, where , with the complete beta function.

Remark 8. Teugels and Vanroelen [3] considered two examples to show the bias reduction of the BC Hill estimator. In the first one, a Burr model was chosen. The power of the BC transformation is , and the authors used , which is, according to Equation (22) and Table 1, the optimal value for bias reduction. In the second example, the authors considered a Cauchy model (, ), with . It was noticed that the choice of is very important and a value around 0.5 was suggested as it was the value that had the strongest effect of minimizing the bias of the BC Hill estimator. We now know, from Equation (22) and Table 1, that the optimal value is .

3. Adaptive Estimation of the Box-Cox Parameters

The BC parameters depend on the estimation of the first-order scale parameter $C$ and on the estimation of the second-order shape, $\rho$, and scale, $\beta$, parameters. In this work, we are going to consider a misspecification of the first-order scale parameter, taking $C = 1$.

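For the estimation of the second-order shape parameter, $\rho$, we consider the class of estimators introduced in [34] defined by the following:
\[ \hat{\rho}_\tau(k) := - \left| \frac{3 \left( T_n^{(\tau)}(k) - 1 \right)}{T_n^{(\tau)}(k) - 3} \right|, \tag{27} \]
with $\tau$ a real tuning parameter [35]:
\[ T_n^{(\tau)}(k) := \begin{cases} \dfrac{\left( M_n^{(1)}(k) \right)^{\tau} - \left( M_n^{(2)}(k)/2 \right)^{\tau/2}}{\left( M_n^{(2)}(k)/2 \right)^{\tau/2} - \left( M_n^{(3)}(k)/6 \right)^{\tau/3}}, & \tau \neq 0, \\[2ex] \dfrac{\ln M_n^{(1)}(k) - \frac{1}{2} \ln \left( M_n^{(2)}(k)/2 \right)}{\frac{1}{2} \ln \left( M_n^{(2)}(k)/2 \right) - \frac{1}{3} \ln \left( M_n^{(3)}(k)/6 \right)}, & \tau = 0, \end{cases} \qquad M_n^{(j)}(k) := \frac{1}{k} \sum_{i=1}^{k} \left\{ \ln X_{n-i+1:n} - \ln X_{n-k:n} \right\}^{j}, \quad j = 1, 2, 3. \]

Note that $M_n^{(1)}(k) \equiv H(k)$, in Equation (14), and the results obtained in Fraga Alves et al. [34] and in Gomes and Martins [36] advise considering $\tau = 0$ for values of $\rho \in [-1, 0)$ and $\tau = 1$ for values of $\rho \in (-\infty, -1)$.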
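A minimal sketch of this class of $\rho$-estimators follows. It assumes the usual form from the second-order estimation literature: a statistic $T$ built from the first three moments $M^{(j)}(k)$ of the log-excesses, followed by $\hat{\rho} = -|3(T - 1)/(T - 3)|$. The helper `burr_sample`, the Burr parametrization it uses, the seed, and the chosen level are illustrative assumptions, not values taken from the paper.

```python
import math
import random
from typing import List, Sequence


def rho_hat(sample: Sequence[float], k: int, tau: float = 0.0) -> float:
    """Second-order shape estimator based on the k largest observations."""
    xs = sorted(sample)
    n = len(xs)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    log_threshold = math.log(xs[n - k - 1])
    excesses = [math.log(xs[n - i]) - log_threshold for i in range(1, k + 1)]
    m1 = sum(excesses) / k                    # equals the Hill estimator
    m2 = sum(e ** 2 for e in excesses) / k
    m3 = sum(e ** 3 for e in excesses) / k
    if tau == 0:
        a = math.log(m1)
        b = math.log(m2 / 2.0) / 2.0
        c = math.log(m3 / 6.0) / 3.0
    else:
        a = m1 ** tau
        b = (m2 / 2.0) ** (tau / 2.0)
        c = (m3 / 6.0) ** (tau / 3.0)
    t = (a - b) / (b - c)
    return -abs(3.0 * (t - 1.0) / (t - 3.0))


def burr_sample(n: int, xi: float, rho: float, seed: int) -> List[float]:
    """Inverse-transform draws from the assumed Burr d.f.
    F(x) = 1 - (1 + x**(-rho/xi))**(1/rho)."""
    rng = random.Random(seed)
    return [((1.0 - rng.random()) ** rho - 1.0) ** (-xi / rho)
            for _ in range(n)]


sample = burr_sample(5000, xi=1.0, rho=-1.0, seed=1)
k_high = int(len(sample) ** 0.999)  # a high level, chosen here for illustration
print(rho_hat(sample, k_high, tau=0.0), rho_hat(sample, k_high, tau=1.0))
```

The outer `-abs(...)` forces the estimate to be nonpositive, which matches the constraint $\rho \le 0$ on the second-order parameter.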

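For the estimation of the second-order scale parameter, $\beta$, we are going to consider the estimator introduced in Gomes and Martins [36], given by the following:
\[ \hat{\beta}_{\hat{\rho}}(k) := \left( \frac{k}{n} \right)^{\hat{\rho}} \frac{d_k(\hat{\rho}) \, D_k(0) - D_k(\hat{\rho})}{d_k(\hat{\rho}) \, D_k(\hat{\rho}) - D_k(2\hat{\rho})}, \tag{29} \]
where for any $\alpha \le 0$,
\[ d_k(\alpha) := \frac{1}{k} \sum_{i=1}^{k} \left( \frac{i}{k} \right)^{-\alpha}, \qquad D_k(\alpha) := \frac{1}{k} \sum_{i=1}^{k} \left( \frac{i}{k} \right)^{-\alpha} U_i, \]
with $U_i = i \left\{ \ln X_{n-i+1:n} - \ln X_{n-i:n} \right\}$, $1 \le i \le k$, the scaled log-spacings, and $\hat{\rho}$ the estimator in Equation (27).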
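The $\beta$-estimator can be sketched as a ratio of weighted averages of the scaled log-spacings. The block below assumes the form usually attributed to Gomes and Martins, $\hat{\beta} = (k/n)^{\hat{\rho}}\,[d_k(\hat{\rho}) D_k(0) - D_k(\hat{\rho})]/[d_k(\hat{\rho}) D_k(\hat{\rho}) - D_k(2\hat{\rho})]$, and takes $\hat{\rho}$ as an input. It is checked on noise-free synthetic spacings rather than on real data, and all names and numerical values are illustrative.

```python
import math
from typing import List, Sequence


def scaled_log_spacings(sample: Sequence[float], k: int) -> List[float]:
    """U_i = i * (ln X_{n-i+1:n} - ln X_{n-i:n}), i = 1, ..., k."""
    xs = sorted(sample)
    n = len(xs)
    return [i * (math.log(xs[n - i]) - math.log(xs[n - i - 1]))
            for i in range(1, k + 1)]


def beta_hat(spacings: Sequence[float], n: int, rho: float) -> float:
    """Assumed second-order scale estimator built on k = len(spacings)."""
    k = len(spacings)

    def d(alpha: float) -> float:
        return sum((i / k) ** (-alpha) for i in range(1, k + 1)) / k

    def big_d(alpha: float) -> float:
        return sum((i / k) ** (-alpha) * u
                   for i, u in enumerate(spacings, start=1)) / k

    numerator = d(rho) * big_d(0.0) - big_d(rho)
    denominator = d(rho) * big_d(rho) - big_d(2.0 * rho)
    return (k / n) ** rho * numerator / denominator


# Noise-free spacings following U_i ~ xi * (1 + beta * (n/k)**rho * (i/k)**(-rho)).
xi, rho, beta, n, k = 0.5, -1.0, 1.0, 100_000, 1_000
a = beta * (n / k) ** rho
synthetic = [xi * (1.0 + a * (i / k) ** (-rho)) for i in range(1, k + 1)]
print(beta_hat(synthetic, n, rho))  # close to beta = 1, up to a term of order a
```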

Asymptotic considerations as well as simulated results led several authors to consider the estimation of the second-order parameters at a high level $k_1$, given by
\[ k_1 = \lfloor n^{1 - \epsilon} \rfloor, \]
where $\epsilon > 0$ is small and $\lfloor x \rfloor$ denotes, as usual, the integer part of $x$. In the simulation studies in Section 4, the value was used to compute the estimates of the second-order parameters. Under adequate general conditions, the classes of estimators in Equations (27) and (29) are asymptotically normally distributed and show highly stable sample paths as functions of $k$, the number of top order statistics used, for a wide range of large $k$ values.
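A short sketch of the high-level choice, assuming it takes the form of the integer part of $n^{1-\epsilon}$ for a small $\epsilon > 0$. The default `epsilon=0.001` is an assumption based on common practice in this literature, not a value confirmed by the text above.

```python
import math


def high_level(n: int, epsilon: float = 0.001) -> int:
    """Integer part of n**(1 - epsilon); epsilon is an assumed small tuning constant."""
    return int(math.floor(n ** (1.0 - epsilon)))


for n in (500, 1000, 2000, 5000):
    # The level stays just below n, so almost all order statistics are used.
    print(n, high_level(n))
```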

Theorem 9 (Fraga Alves et al. [34] and Gomes and Martins [36]). If the SOC in Equation (9) holds, for models in Hall and Welsh class with , , if is intermediate, i.e., if Equation (17) holds and if , then converges in probability towards , as , and converges in probability towards , as . Moreover, with , , , and ,

As stated earlier, the estimates of and are computed at the high level and the following results in Caeiro and Gomes [37] hold true.

Theorem 10 (Caeiro and Gomes [37]). Under the conditions of the previous theorem, with for any and such that is consistent for the estimation of . Moreover, for models in Hall and Welsh class and with , we may further say that , at most of the order of .

In addition to the classes presented in Equations (27) and (29), we also refer to the classes introduced in Goegebeur et al. [38], Ciuperca and Mercadier [39], Worms and Worms [40], de Wet et al. [41], and Henriques-Rodrigues et al. [42], among others, for the estimation of $\rho$, and to the classes introduced in Caeiro and Gomes [35] and Gomes et al. [43], for the estimation of $\beta$.

In this paper, we are going to work with adaptive estimates of the BC parameters in Equations (13) and (22). So, for the estimation of , we consider the following: with the estimator in Equation (27), the high level used in the estimation of the second-order parameters, and the Hill estimator computed at the level , with [44], given by the following:

For the estimation of and with the misspecification , we are going to consider: with the estimator in Equation (29). The associated estimators of , and , considering , i.e. , which are consistent for the estimation of and , respectively, for sequences of values such that Equation (17) holds. The BC Hill estimator, in Equation (16), assuming that the BC parameters are adaptively estimated has the functional form: with and presented in Equations (33) and (35), respectively. The consistency of the BCH estimator in Equation (36) for the estimation of the EVI, , is presented in the next theorem.

Theorem 11. In Hall-Welsh class of models in Equation (11), for intermediate , i.e., for values such that Equation (17) holds, with and presented in Equations (33) and (35), consistent estimators of and , respectively, then .

Proof. The delta method is not going to be directly applied since the partial derivatives of the estimator , in Equation (36), in order to are complex. We first note that , , with a consistent estimator of and a consistent estimator of , and defining , with (a consistent estimator of ), then If , we can choose , and if , the value of depends upon the underlying model in study. So, we can write the following: The estimator in the upper bound is the Hill estimator, , in Equation (14), consistent for the estimation of . For the estimator in the lower bound, the delta method leads to the following: with The estimator is of the same type of the estimator , in Equation (19), and is therefore consistent for the estimation of as proven in Section 2 of Gomes and Oliveira [33]. The term can be written as follows: Let us next use the same kind of arguments referred to in the proof of Theorem 5: , with a unit Pareto r.v. and the quantile function in Equation (11), for , , , where denotes a standard exponential r.v., and , whenever Equation (8) holds. Then, we have the following: and we can write the following: Working with values such that Equation (17) holds and given that is a consistent estimator of and is consistent for the estimation of , the estimator is also consistent for the estimation of and the consistency of follows.

Remark 12. In the paper by Gomes and Oliveira [33], the tuning parameter of the estimator presented in Remark 4 was estimated by the order statistic . According to the authors, the estimator is consistent for the estimation of , with asymptotic distributional properties similar to those of the Hill estimator, for models in with finite left endpoint.

Remark 13. The estimator has the same functional form of the shifted Hill estimator introduced in Aban and Meerschaert [45]. The shifted Hill estimator was obtained for shifted Pareto parents of the type and is a conditional maximum likelihood estimator of . The estimator is consistent for the estimation of even though is not consistent for the estimation of under a semiparametric framework (see Remark 2.1 of Gomes et al. [46]).

4. Finite Sample Behaviour of the Estimators

To assess the finite sample behaviour of the BC Hill estimators, in Equations (16) and (36), we have implemented a multisample Monte Carlo simulation experiment of size (see Gomes and Oliveira [47], for further details on multisample simulation) for sample sizes , 500, 1000, 2000, and 5000, from the following models:
(1) The Burr model with d.f. , , , and ,
(2) The generalized Pareto model, GP, with d.f. , , and ,
(3) The Cauchy model, with d.f. , , and ,
(4) The Student model, with d.f. , , and , where denotes the (complete) gamma function, ().
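Samples from the first three models can be drawn by inverse transform, evaluating the tail quantile function $U(t) = F^{\leftarrow}(1 - 1/t)$ at unit Pareto draws. The sketch below assumes the standard parametrizations (Burr: $1 - F(x) = (1 + x^{-\rho/\xi})^{1/\rho}$; GP: $1 - F(x) = (1 + \xi x)^{-1/\xi}$; standard Cauchy), which may differ in detail from those of the study. The Student model is left out because its quantile function has no closed form, and the parameter values and seed are illustrative.

```python
import math
import random
from typing import Callable, List


def burr_quantile(t: float, xi: float, rho: float) -> float:
    return (t ** (-rho) - 1.0) ** (-xi / rho)


def gp_quantile(t: float, xi: float) -> float:
    return (t ** xi - 1.0) / xi


def cauchy_quantile(t: float) -> float:
    # Solves 1/2 + atan(x)/pi = 1 - 1/t for x.
    return math.tan(math.pi * (0.5 - 1.0 / t))


def simulate(quantile: Callable[..., float], n: int,
             rng: random.Random, **params: float) -> List[float]:
    """Draw n values as U(Y), with Y = 1/(1 - V) unit Pareto and V uniform on [0, 1)."""
    return [quantile(1.0 / (1.0 - rng.random()), **params) for _ in range(n)]


rng = random.Random(2024)
burr = simulate(burr_quantile, 1000, rng, xi=0.25, rho=-0.5)
gp = simulate(gp_quantile, 1000, rng, xi=0.5)
cauchy = simulate(cauchy_quantile, 1000, rng)
print(max(burr), max(gp), max(cauchy))
```

Cauchy samples contain negative values, so only the positive upper order statistics can enter log-based estimators such as the Hill estimator.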

In Figures 1-3, we present the mean values, $E$, and mean squared errors, MSE, for the models under study provided by the Hill estimator, $H$, in Equation (14), the BC Hill r.v. in Equation (16), assuming that the BC parameters are known and denoted by BCH in the figures, the BC Hill estimator, , in Equation (36), assuming that the BC parameters are adaptively estimated by Equations (33) and (35), and the MVRB EVI estimator, $CH$, in Equation (15) for comparison. From Figures 1-3, we can state that:
(i) The BCH estimator beats the Hill estimator for all simulated models and for all values, as it presents smaller absolute bias and the minimum MSE is smaller than the minimum MSE of the Hill estimator;
(ii) For the simulated Burr models, the BCH estimator underestimates the true value of for values larger than 400. The MSE exhibits the usual shape, with minimum values smaller than those of the CH estimator. Note that if has a Burr distribution, then has a strict Pareto distribution and this could justify the excellent results of the BCH statistic;
(iii) For the GP models with , the mean values of the CH and BCH estimators almost overlap for a wide region of values (, when and , when ), and from that value forward, the bias of the BCH estimator is smaller than the one of the CH estimator. When , the BCH estimator underestimates for almost all values, whereas the CH estimator overestimates for the same values. Regarding the minimum MSE, both estimators (CH and BCH) present similar values;
(iv) For the Cauchy model, the behaviour of the CH and BCH estimators is similar to the behaviour observed for the parent;
(v) For the Student models (), the sample paths of the mean values of the CH and BCH estimators overlap from small up to moderate values of , and from that value forward, the bias of the BCH estimator is smaller. Regarding the minimum MSE, both estimators (CH and BCH) present similar values.
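The remark in item (ii) about Burr parents can be verified directly. Assuming the Burr parametrization $1 - F(x) = (1 + x^{-\rho/\xi})^{1/\rho}$ and the power $\lambda = -\rho/\xi$, the variable $1 + X^{\lambda}$ has tail $y^{1/\rho}$, i.e., it is strict Pareto with index $-\rho$. The sketch below checks this identity on the quantile scale and then applies the Hill estimator to the transformed sample, rescaled by $1/\lambda$. This illustrates the mechanism only; it is not claimed to reproduce the functional form of the BC Hill estimator, and the parameter values, level, and seed are arbitrary.

```python
import math
import random
from typing import Sequence


def burr_quantile(t: float, xi: float, rho: float) -> float:
    return (t ** (-rho) - 1.0) ** (-xi / rho)


def hill(sample: Sequence[float], k: int) -> float:
    xs = sorted(sample)
    n = len(xs)
    log_threshold = math.log(xs[n - k - 1])
    return sum(math.log(xs[n - i]) - log_threshold
               for i in range(1, k + 1)) / k


xi, rho = 0.25, -0.5
lam = -rho / xi  # assumed power of the transformation

# Quantile-scale identity: 1 + U(t)**lam equals t**(-rho), a strict Pareto quantile.
for t in (2.0, 10.0, 1e3, 1e6):
    print(t, 1.0 + burr_quantile(t, xi, rho) ** lam, t ** (-rho))

rng = random.Random(11)
sample = [burr_quantile(1.0 / (1.0 - rng.random()), xi, rho) for _ in range(5000)]
transformed = [1.0 + x ** lam for x in sample]
k = 2000
# Hill on the Pareto-type sample estimates -rho; dividing by lam returns to the xi scale.
print(hill(sample, k), hill(transformed, k) / lam)
```

On the transformed scale the sample is exactly Pareto, so the rescaled estimate carries no second-order bias, whatever the level $k$.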

In Table 2, we present the estimates of the BC parameters and , as described in Section 3, for the aforementioned models. Information on the 95% confidence intervals (CIs), computed on the basis of 10 replicates with 1000 runs each, is also provided. The simulated data used to support the findings of this study are available from the corresponding author upon request.

Next, and based on the 1000 runs, we computed the simulated optimum level, i.e., the level , with , and present in Tables 3-5 the simulated values of the optimal sample fraction (the optimal level divided by the number of positive elements in the sample), the mean value of each estimator computed at , and the MSE of each estimator also computed at . For each model, the smallest absolute bias and the smallest MSE are written in bold. Information on the 95% CIs, computed on the basis of 10 replicates with 1000 runs each, is also provided.
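The idea of a simulated optimum level, read here as the level that minimizes the simulated MSE over $k$, can be sketched for the Hill estimator as follows. The Burr parametrization, the number of runs, the sample size, and the seed are illustrative assumptions and are much smaller than in the study.

```python
import math
import random
from typing import List, Tuple


def burr_sample(n: int, xi: float, rho: float, rng: random.Random) -> List[float]:
    return [((1.0 - rng.random()) ** rho - 1.0) ** (-xi / rho) for _ in range(n)]


def hill_path(sample: List[float]) -> List[float]:
    """Hill estimates H(k), k = 1, ..., n-1, via running sums of the
    log-sample sorted in descending order."""
    logs = sorted((math.log(x) for x in sample), reverse=True)
    path, running = [], 0.0
    for k in range(1, len(logs)):
        running += logs[k - 1]
        path.append(running / k - logs[k])
    return path


def simulated_optimal_level(n: int, runs: int, xi: float, rho: float,
                            seed: int = 0) -> Tuple[int, float, float]:
    """Return (k0, mean of H(k0), MSE of H(k0)), with k0 minimizing the simulated MSE."""
    rng = random.Random(seed)
    total = [0.0] * (n - 1)
    sq_err = [0.0] * (n - 1)
    for _ in range(runs):
        for j, h in enumerate(hill_path(burr_sample(n, xi, rho, rng))):
            total[j] += h
            sq_err[j] += (h - xi) ** 2
    mse = [s / runs for s in sq_err]
    best = min(range(n - 1), key=mse.__getitem__)
    return best + 1, total[best] / runs, mse[best]


k0, mean_at_k0, mse_at_k0 = simulated_optimal_level(n=200, runs=100, xi=0.5, rho=-1.0, seed=1)
print(k0, k0 / 200, mean_at_k0, mse_at_k0)
```

Dividing `k0` by the sample size gives the simulated optimal sample fraction; the same loop applies to any other estimator that can be evaluated along the whole path of levels.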

Table 2 shows that the estimation of works well for models with first-order scale parameter, , such as the Burr and , as expected and that the estimation of could be improved in some parents. However, from Tables 4 and 5, we noticed that the precise estimation of has more impact on the behaviour of the BCH estimator in some models, like the and than on other models.

At optimal levels, the BCH estimator is always better than the CH estimator for all simulated Burr parents, and in this case, the misspecification has no impact on the BCH estimator since (see Table 1). The simulated mean values, at optimal levels, of the CH estimator are better than the ones of the BCH estimator for all GP parents, but the BCH estimator has competitive MSEs for the and parents. For the Cauchy and Student and parents, the BCH estimator is better than the CH estimator for almost all sample sizes, even with the misspecification .

5. Concluding Remarks and Future Work

In this paper, we have studied the BC Hill estimator introduced by Teugels and Vanroelen in 2004. We proved the consistency and asymptotic normality of the estimators, and with the appropriate choice of the BC parameters we were able to obtain a new class of MVRB EVI estimators. We considered the adaptive estimation of the BC parameters, and the finite sample behaviour of the estimators was illustrated by a Monte Carlo simulation study. From the simulation study, we conclude that the BC Hill estimator performs better than the corrected Hill estimator for several heavy-tailed models in terms of simulated mean values and simulated MSEs. Furthermore, the estimation procedure shows some robustness to the estimation of the BC parameters, as can be seen from the misspecification of the first-order scale parameter, $C = 1$. Note that several heavy-tailed models satisfy this criterion. The estimation of the BC parameters could be improved by deriving the maximum likelihood estimators, a topic under development and out of the scope of this paper. Other topics of interest for future research, also out of the scope of this paper, are the determination of the optimal level used to compute the BC EVI estimates and the use of the BC Hill estimator to derive new classes of estimators for other parameters of extreme events, such as high quantiles and the second-order shape parameter, $\rho$.

Data Availability

The simulated data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no potential conflicts of interest.

Acknowledgments

This research was partially supported by national funds through FCT, Fundação para a Ciência e a Tecnologia, projects UIDB/04674/2020 (CIMA) and UIDB/0006/2020 (CEA/UL).