In this paper, we introduce reduced-bias estimators of the tail index of Pareto-type distributions. The estimators are obtained by applying regularised weighted least squares to an exponential regression model for the log-spacings of top-order statistics. The asymptotic properties of the proposed estimators are investigated analytically, and the estimators are found to be asymptotically unbiased, consistent, and asymptotically normally distributed. The finite-sample behaviour of the estimators is studied through a simulation study, in which the proposed estimators yield low bias and mean square error. In addition, the proposed estimators are illustrated by estimating the tail index of the underlying distribution of claims from the insurance industry.

1. Introduction

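Pareto-type distributions are often encountered in applications in finance [1–3], reinsurance [4–6], risk management [7–9], and telecommunication [10, 11]. This distribution type has tail function

$$1 - F(x) = x^{-1/\gamma}\,\ell_F(x), \quad x > 0, \tag{1}$$

or equivalently upper tail quantile function

$$U(x) = x^{\gamma}\,\ell_U(x), \quad x > 1. \tag{2}$$

The components $\ell_F$ and $\ell_U$ are slowly varying functions, that is, each satisfies

$$\lim_{x \to \infty} \frac{\ell(tx)}{\ell(x)} = 1 \quad \text{for all } t > 0. \tag{3}$$

The parameter $\gamma$ is strictly positive for Pareto-type distributions and is also known as the tail index.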

Suppose $X_1, \ldots, X_n$ denote independent and identically distributed (i.i.d.) random variables drawn from a distribution belonging to the maximum domain of attraction of the Pareto family of distributions. Then, for some auxiliary sequences of constants $a_n > 0$ and $b_n$ [12],

$$\lim_{n \to \infty} P\left(\frac{X_{n,n} - b_n}{a_n} \le x\right) = \exp\left(-(1 + \gamma x)^{-1/\gamma}\right), \tag{4}$$

where $X_{n,n} = \max(X_1, \ldots, X_n)$ and $1 + \gamma x > 0$. The estimation of $\gamma$ continues to receive considerable attention in the statistics of extremes, as all inferences in extreme value analysis depend on the tail index. In practice, we seek estimators with as little variance and bias as possible. A parametric or semiparametric approach can be employed to estimate the tail index [13–16]. In this paper, we employ the semiparametric approach to develop estimators with reduced bias.

Under the semiparametric framework, the tail index estimators depend on the $k$ largest observations, with the following assumptions about $k$:

Assumption 1. $k \to \infty$ as $n \to \infty$.

Assumption 2. $k/n \to 0$ as $n \to \infty$.
The most widely used semiparametric tail index estimator is the Hill estimator [17]. The author in [17] approximates the top order statistics with a Pareto distribution and estimates $\gamma$ using a maximum likelihood estimator (MLE). The Hill estimator has the minimum asymptotic variance among the semiparametric estimators, but it is very sensitive to the choice of $k$ [18]. This drawback makes the estimator challenging to use in practice, especially in the selection of the tail fraction, $k/n$. The author in [17] defined the tail index estimator as

$$H_{k,n} = \frac{1}{k}\sum_{j=1}^{k} \log X_{n-j+1,n} - \log X_{n-k,n}, \tag{5}$$

where $X_{1,n} \le \cdots \le X_{n,n}$ denote the order statistics of the sample. Owing to its popularity, the Hill estimator has received several generalisations (see, for example, [19–27]). Even though these estimators possess desirable properties of a good estimator, they are sensitive to changes in $k$. In practice, this poses a problem in choosing the value of $k$ for the estimation of the tail index and other extreme value parameters. In view of this, several authors have looked at the selection of optimal values of $k$ (see [28] for a comprehensive review of threshold selection).
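As a point of reference for the discussion above, the following minimal Python sketch computes the Hill estimator in its usual textbook form (the mean of the top $k$ log-excesses over the threshold $X_{n-k,n}$) and evaluates it for several values of $k$ to show the kind of sensitivity described. The function names `hill_estimator` and `hill_path`, the simulated strict Pareto sample, and the use of Python are illustrative assumptions; they are not taken from the authors' R code.

```python
import numpy as np

def hill_estimator(x, k):
    """Hill estimate of the tail index from the k largest observations."""
    x = np.sort(np.asarray(x, dtype=float))       # ascending order statistics
    n = x.size
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    if x[n - k - 1] <= 0:
        raise ValueError("the threshold X_{n-k,n} must be positive")
    top_logs = np.log(x[n - k:])                   # logs of the k largest values
    return float(np.mean(top_logs) - np.log(x[n - k - 1]))

def hill_path(x, k_values):
    """Hill estimates over a grid of k (hypothetical helper for plotting)."""
    return np.array([hill_estimator(x, k) for k in k_values])

# Illustrative data: strict Pareto sample with tail index 0.5 (an assumption).
rng = np.random.default_rng(0)
gamma_true = 0.5
sample = (1.0 - rng.random(5000)) ** (-gamma_true)  # uniforms in (0, 1]
print(hill_path(sample, [50, 200, 1000]))
```

For a strict Pareto sample the estimates stay close to the true value for every $k$; for distributions with a non-constant slowly varying part, the same path typically drifts as $k$ grows, which is the sensitivity the text refers to.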
The authors in [29] proposed the bias-corrected Hill estimators (a refinement of the Hill estimator) to address, to some extent, the challenges of the Hill estimator. Specifically, the authors proposed two approaches for reducing the bias of the Hill estimator while maintaining its asymptotic variance. Empirically, the bias-corrected Hill estimator yields more stable tail index estimates than the Hill estimator, i.e., it is less sensitive to the choice of $k$. However, this estimator appears to be unstable for large values of $k$.
In this study, we propose alternative tail index estimators that yield much more stable estimates across the number of top-order statistics, $k$, and attain the minimum asymptotic variance of the Hill estimator under some conditions. The former property is important in the extreme value literature because it addresses the problem of selecting the tail sample fraction for semiparametric estimators of the tail index and of extreme events. The proposed method employs regularised weighted least squares, which weights the data to account for their variability and penalises the bias term. This technique minimises the bias reduction effect for smaller $k$, resulting in bias-reduced estimators that attain the Hill estimator's asymptotic variance.

2. Estimation Methods

We let denote a sequence of i.i.d random variables drawn from a population with distribution function and the associated tail quantile function . Let be the order statistics associated with the sample. Using equation (2), the order statistics can be jointly expressed aswhere , represent the order statistics of the standard uniform distribution. Using equation (6), the authors in [30] demonstrated thatwhere and also obtained a more refined expression of equation (7) by imposing a second-order assumption on the rate of convergence to equation (3). This is stated as an assumption as follows:

Assumption 3. There exists a real constant and a rate function that is regularly varying, with index , i.e., for some satisfying equation (3) for all :with [30].
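Under Assumption 3, the authors in [30] showed that the weighted log-spacings of order statistics

$$Z_j = j\left(\log X_{n-j+1,n} - \log X_{n-j,n}\right), \quad j = 1, \ldots, k, \tag{9}$$

are approximately exponentially distributed. In particular, they obtained the expression

$$Z_j \overset{d}{\approx} \left(\gamma + b_{n,k}\left(\frac{j}{k+1}\right)^{-\rho}\right) E_j, \quad j = 1, \ldots, k, \tag{10}$$

where $b_{n,k} \to 0$ as $k, n \to \infty$, $E_1, \ldots, E_k$ are i.i.d. exponentially distributed with unit mean, and $\rho < 0$ is a second-order parameter. The authors employed MLE to estimate the parameters in equation (10).
Using equation (10) and Assumptions 1 and 2, the authors in [31] demonstrated that $Z_j$ can be further approximated by the regression model

$$Z_j = \gamma + b_{n,k}\, c_j + \varepsilon_j, \quad j = 1, \ldots, k, \tag{11}$$

where $b_{n,k}$ is the slope, $c_j = (j/(k+1))^{-\rho}$ is the covariate, $\gamma$ is the intercept, and $\varepsilon_j$ are the error terms with asymptotic mean 0 and variance $\gamma^2$.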
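The regression view can be illustrated with a short Python sketch. It assumes the commonly used form of the weighted log-spacings, $Z_j = j(\log X_{n-j+1,n} - \log X_{n-j,n})$, the covariate $c_j = (j/(k+1))^{-\rho}$, and a fixed illustrative value $\rho = -1$; the function names `log_spacings` and `ls_tail_index` are hypothetical. The intercept of an ordinary least squares fit of $Z_j$ on $c_j$ then serves as a tail index estimate, in the spirit of the least squares approach attributed to [31].

```python
import numpy as np

def log_spacings(x, k):
    """Weighted log-spacings Z_j = j (log X_{n-j+1,n} - log X_{n-j,n}), j = 1..k."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    logs = np.log(x[n - k - 1:])[::-1]        # log X_{n,n}, ..., log X_{n-k,n}
    j = np.arange(1, k + 1)
    return j * (logs[:-1] - logs[1:])

def ls_tail_index(x, k, rho=-1.0):
    """OLS fit of Z_j = gamma + b * c_j + error; returns (gamma_hat, b_hat)."""
    if k < 2:
        raise ValueError("k must be at least 2 to fit two parameters")
    z = log_spacings(x, k)
    j = np.arange(1, k + 1)
    c = (j / (k + 1.0)) ** (-rho)             # covariate; rho is assumed known
    design = np.column_stack([np.ones(k), c])
    (gamma_hat, b_hat), *_ = np.linalg.lstsq(design, z, rcond=None)
    return float(gamma_hat), float(b_hat)

# Illustrative data: strict Pareto sample with tail index 0.5 (an assumption).
rng = np.random.default_rng(1)
sample = (1.0 - rng.random(5000)) ** (-0.5)
print(ls_tail_index(sample, k=1000))
```

A useful sanity check is that the plain average of the $Z_j$ coincides with the Hill estimator, so the regression intercept can be read as a Hill-type estimate corrected for the trend in $c_j$.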
The authors in [31] proposed the ordinary least squares estimator for the estimation of $\gamma$ in equation (11). Furthermore, based on equation (11), the authors in [13] introduced the ridge regression estimator for estimating $\gamma$. In this paper, we propose regularised weighted least squares estimators for estimating $\gamma$ in equation (11).

2.1. The Proposed Estimators

In order to estimate , the loss function of the regularised weighted least squares for equation (11) is defined as

Here, is the weight function defined as

where . Thus, and decreases linearly with respect to . is employed due to its ability to reduce the variability in the estimator. The exponent is chosen such thatIn this study, we consider . Thus, we would define such that . Note that, is random through , and when the exponent is 0, is deterministic. In particular, when , we obtain the weight function , as introduced in [32]. Nevertheless, we can approximate the weight as a limit of the current result by allowing to approach 0.

We minimize the loss function in equation (12) with respect to and to obtain jointly estimates and andwhile,and

We substitute equation (13) into equation (17) to obtain an explicit expression for equation (17) as

The main result underlying the proof of equation (20) is Kolmogorov's strong law of large numbers for independent random variables; for brevity, we present only the results.

The parameter is estimated externally using the minimum variance approach introduced in [13].

In addition, the penalty parameter in equation (12) regulates the bias coefficient. The loss function minimises the weighted sum of squared residuals while also controlling the size of the bias coefficient. The penalty term shrinks the bias coefficient towards 0 as the penalty parameter increases. Thus, the larger the penalty parameter, the higher the contribution of the penalty term to the loss function and the stronger the regularisation. To obtain an estimator for the penalty parameter, we minimise the asymptotic mean squared error (AMSE) of the proposed estimator (see, for example, [13]).
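The shrinkage effect of the penalty can be seen in a small Python sketch. It assumes, purely for illustration, that the loss takes the usual penalised form $\sum_j w_j (Z_j - \gamma - b\,c_j)^2 + \lambda b^2$, in which only the bias coefficient $b$ is penalised. The function name `rwls_fit`, the argument name `lam`, the synthetic responses, and the linearly decreasing weights are all hypothetical choices and not the authors' exact specification.

```python
import numpy as np

def rwls_fit(z, c, w, lam):
    """Minimise sum_j w_j (z_j - gamma - b c_j)^2 + lam * b^2 in closed form."""
    z, c, w = (np.asarray(a, dtype=float) for a in (z, c, w))
    if lam < 0 or np.any(w < 0):
        raise ValueError("lam and the weights must be non-negative")
    # Normal equations of the penalised loss (the intercept is not penalised).
    lhs = np.array([[w.sum(),       (w * c).sum()],
                    [(w * c).sum(), (w * c * c).sum() + lam]])
    rhs = np.array([(w * z).sum(), (w * c * z).sum()])
    gamma_hat, b_hat = np.linalg.solve(lhs, rhs)
    return float(gamma_hat), float(b_hat)

# Synthetic illustration (all values below are assumptions).
rng = np.random.default_rng(2)
k = 200
j = np.arange(1, k + 1)
c = j / (k + 1.0)                                 # covariate with rho = -1
z = (0.5 + 0.3 * c) * rng.exponential(size=k)     # exponential-regression-type responses
w = 1.0 - j / (k + 1.0)                           # hypothetical linearly decreasing weights
for lam in (0.0, 1.0, 100.0):
    print(lam, rwls_fit(z, c, w, lam))
```

With `lam = 0` the fit reduces to weighted least squares, and as `lam` grows the estimated bias coefficient is pulled towards 0 while the intercept approaches the weighted mean of the responses.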

Note that, since the weight function depends on the ’s, the estimators in equations (15) and (16) also depend on the ’s through the weight function. Therefore, we would find the AMSE by conditioning on the ’s. From equation (15) and (16), the AMSE for is obtained as


We now derive an expression for the penalty term, . Minimising in equation (21) over , the optimal value of is obtained by solving the equationWe obtain,

In order to estimate , we assume the slowly varying function in (8) is constant [13]. Thus, we havefor some and we estimate via the estimator proposed in [33]. It follows from equation (28) and (29) that

The penalty term, is required to be non-negative; therefore, we define . We then obtain a penalty term and estimators which do not depend on ’s, by averaging over the ’s, as follows:

and we defined the proposed estimator of by

2.2. Asymptotic Properties of the Proposed Estimators

Unbiasedness, consistency, and normality are desirable properties of a good estimator. In this section, we investigate these desirable properties of the proposed estimators.

We shall summarise the asymptotic behaviour of the statistics used to build the AMSE of the proposed estimator in Lemma 1. These properties will be required in the proof of the asymptotic consistency and sampling distribution of the proposed estimator. Henceforth, anytime we use the term it is with respect to the law of the i.i.d sequence .

Lemma 1. Assume that is estimated by a consistent estimator and (14) holds, then as and ;(i).(ii).(iii).(iv).

Lemma 2. Suppose and are estimated by their respective consistent estimators and , then as and ,

It follows from Lemma 2 that the regularised weighted least squares estimator, , is asymptotically unbiased. That is, as , . The bias of the proposed estimator is given by where

Since the term in the bracket converges to 0 almost surely as by Lemma 2, the expectation of the term will converge to 0 as . Therefore, which gives . Similarly, we can use Lemma 1 and Lemma 2 to show that as . We write

Now, observe from the Jensen’s inequality (applied to the expectation taken with respect to the law of ) and the Fubini’s theorem thatand therefore, we haveThis implies that the proposed estimator is asymptotically consistent under some conditions.

Theorem 3. Suppose equations (2), (8), and (14) are satisfied. Assume also that is estimated by a consistent estimator with Then, if Assumptions 1 and 2 hold, and , we have

Theorem 3 establishes the asymptotic normality of defined in equation (16). To prove Theorem 3, we require the following additional properties.

We write

Lemma 4. Let and . Then, as ,

Lemma 4 is required in the proof of Lemma 5.

Lemma 5. Let be independent random variables from an exponential distribution with mean , for all . Then, for any ,where is defined by equation (15).

Remark 1. Lemma 5 shows the statistics as and .
The next lemma establishes that the Lyapunov condition of the central limit theorem is satisfied. Lyapunov's variant of the central limit theorem assumes the existence of a finite moment of order higher than two.

Lemma 7. Suppose that are independent random variables such that and , then, there exists such thatwhere .

Remark 2. Setting the penalty term to 0 reduces the regularised weighted least squares estimator to a weighted least squares estimator. This estimator differs from the one introduced in [32] in that it has a smaller asymptotic variance, owing to the introduction of randomness into the weight function. The resulting weighted least squares estimator is also asymptotically unbiased, consistent, and asymptotically normally distributed with mean 0 and variance .

3. Simulation Study

In the previous section, we proposed the regularised weighted least squares estimators under the semiparametric setting to estimate the tail index of the underlying distribution of a given dataset from the class of Pareto-type distributions. In this section, we perform a simulation study to compare the performance of the proposed estimators with that of existing semiparametric tail index estimators. In particular, we compare the regularised weighted least squares (RWLS), the reduced-bias weighted least squares with modified weight function (WLS), the ridge regression (RR) [13], the least squares (LS) [31], the Hill (HILL) [17], and the bias-corrected Hill (BCHILL) [29] estimators in the case of Pareto-type distributions.

3.1. Simulation Design

We consider the Fréchet and Burr XII from the Pareto-type distributions as shown in Table 1. For each distribution , we generate 1000 repetitions of samples of size , and 2000. For the Fréchet distributions, we consider , and 1.0; and for the Burr XII we consider the mixtures.(i),(ii) and(iii)

These mixtures are chosen to obtain the tail index values and 1.0, respectively. We study the finite-sample behaviour of the proposed estimators, RWLS and WLS, and compare them with RR, LS, BCHILL, and HILL. The mean square error (MSE) and bias are plotted as functions of the number of top-order statistics, $k$, to investigate the estimators' sample path behaviour.
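The design above can be sketched in Python as follows. The parameterisations are assumptions made for illustration (Table 1 may use different ones): the Fréchet distribution is taken as $F(x) = \exp(-x^{-1/\gamma})$ and the Burr XII distribution as $1 - F(x) = (\eta/(\eta + x^{\tau}))^{\lambda}$ with tail index $1/(\lambda\tau)$. The helper names, the sample size of 500, the 200 repetitions, and the use of the Hill estimator as the estimator under study are likewise illustrative.

```python
import numpy as np

def sample_frechet(n, gamma, rng):
    """Inverse-transform draws from F(x) = exp(-x**(-1/gamma))."""
    u = rng.uniform(low=np.finfo(float).tiny, high=1.0, size=n)  # keep u inside (0, 1)
    return (-np.log(u)) ** (-gamma)

def sample_burr(n, eta, tau, lam, rng):
    """Inverse-transform draws from 1 - F(x) = (eta / (eta + x**tau))**lam."""
    u = rng.uniform(low=np.finfo(float).tiny, high=1.0, size=n)  # u plays the role of 1 - F(x)
    return (eta * (u ** (-1.0 / lam) - 1.0)) ** (1.0 / tau)

def hill(x, k):
    """Hill estimator, used here only as an example estimator."""
    xs = np.sort(x)
    return np.mean(np.log(xs[-k:])) - np.log(xs[-k - 1])

def bias_and_mse(sampler, estimator, gamma, k, reps):
    """Monte Carlo bias and MSE of estimator(sample, k) around the true gamma."""
    est = np.array([estimator(sampler(), k) for _ in range(reps)])
    err = est - gamma
    return float(np.mean(err)), float(np.mean(err ** 2))

rng = np.random.default_rng(2022)
gamma = 0.5
samplers = {
    "Frechet": lambda: sample_frechet(500, gamma, rng),
    # eta = 1, tau = 2, lam = 1 gives tail index 1 / (lam * tau) = 0.5
    "Burr XII": lambda: sample_burr(500, 1.0, 2.0, 1.0, rng),
}
for name, sampler in samplers.items():
    bias, mse = bias_and_mse(sampler, hill, gamma, k=50, reps=200)
    print(f"{name}: bias={bias:.4f}, MSE={mse:.4f}")
```

Repeating the last loop over a grid of $k$ values gives the bias and MSE curves that are plotted against $k$ in a study of this kind.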

In the case of the weight function, the ’s will be replaced with their point estimate, in this case, the mean of a standard uniform distribution. In the case of , we select such that , as . This choice of is made because in practice we have observed that it yields much more stable estimates compared to when is selected such that in application.

3.2. Discussion of Simulation Results

In this section, we discuss the behaviour of RWLS and WLS relative to RR, LS, HILL, and BCHILL. The MSE and bias are the performance measures in the simulation studies. The simulation results for the Burr distribution with different tail indexes are shown in Figures 1–3, and Figures 4–6 present the simulation results for the Fréchet distribution with varying tail indexes.

From these figures, the plots of WLS and RWLS follow the same sample path for , i.e., their performance is essentially the same on that interval. WLS and LS are very close to each other, although WLS generally slightly outperforms LS in terms of MSE and bias. Thus, WLS can generally be considered the most appropriate estimator of the tail index among the regression-based estimators (i.e., RR, LS, WLS, and RWLS), since it mostly has the smallest bias and MSE across all samples.

Additionally, the MSE plots of the proposed estimators are low and near constant over the central part of , except in the case of Burr XII with . With the exception of the HILL estimator (which globally has the highest MSE), the MSE curves of the estimators are mostly close to each other in the central region, especially in the case of the Fréchet distribution. This implies that the proposed estimators are competitive with the existing estimators. However, the proposed estimators, WLS and RWLS, generally attain the lowest bias for small samples, i.e., . Furthermore, for medium to large values of , the sample paths of RWLS in the MSE and bias plots are between HILL and RR. Even though the BCHILL estimator mostly has the smallest MSE and bias, the proposed estimators (RWLS and WLS) outperform it for large values of .

Hence, from the simulation results, WLS and RWLS are appropriate alternatives for the estimation of the tail index of the Pareto-type distributions in terms of MSE and bias.

In addition, no single tail index estimator under investigation was found to be universally the best in terms of the MSE and bias. Finally, the R codes for the simulation and the application studies are available at https://github.com/kikiocran/RegularisedTail.

4. Applications

In this section, we estimate the tail index of the underlying distribution of two datasets from the insurance industry. The first is the SOA Group Medical Insurance dataset, which consists of over 170,000 claims recorded from 1991 to 1992. In this study, we consider the 1991 dataset, which comprises 75,789 claims and has been studied widely in the extreme value context (see, for example, [4, 18]). Considering the large size of this dataset, we focus on the extreme tail of the data and hence consider the top 10% of the data points (i.e., ). The SOA dataset is available at https://lstat.kuleuven.be/Wiley/Data/soa.txt.

The second is an automobile insurance dataset from Ghana, which consists of 452 claims from July 7, 2020, to May 11, 2021, and can be found at https://github.com/kikiocran/RegularisedTail. We refer to this dataset as the GH claims in this study. To the best of our knowledge, this dataset has never been used in the extreme value theory literature.

The scatter plots of the SOA and GH claims are shown in Figure 7. We observe that two claims in the SOA data and one claim in the GH data appear to be far detached from the bulk of the data. These observations also deviate from linearity in the Pareto Q-Q plots and lie far from the bulk of the points in the exponential Q-Q plots of the two datasets (Figure 8). Such large observations are suspected outliers and may significantly influence the tail index estimates (see, for example, [4]). The convex curvature of the exponential Q-Q plots and the near linearity of the Pareto Q-Q plots suggest that the datasets belong to the class of Pareto-type distributions.
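The diagnostic plots mentioned above can be built from a few lines of Python. The sketch below uses the standard plotting positions $j/(n+1)$: the exponential Q-Q plot pairs the standard exponential quantiles $-\log(1 - j/(n+1))$ with the ordered data, and the Pareto Q-Q plot pairs the same quantiles with the logarithms of the ordered data. The function names `qq_coordinates` and `upper_slope` and the synthetic input are hypothetical; the data are assumed to be strictly positive.

```python
import numpy as np

def qq_coordinates(x):
    """Return (exponential quantiles, ordered data, log ordered data)."""
    xs = np.sort(np.asarray(x, dtype=float))
    if xs[0] <= 0:
        raise ValueError("data must be strictly positive for the Pareto Q-Q plot")
    n = xs.size
    p = np.arange(1, n + 1) / (n + 1.0)        # plotting positions j / (n + 1)
    exp_q = -np.log(1.0 - p)                   # standard exponential quantiles
    return exp_q, xs, np.log(xs)

def upper_slope(exp_q, y, k):
    """Least squares slope through the k largest points of a Q-Q plot."""
    slope, _intercept = np.polyfit(exp_q[-k:], y[-k:], 1)
    return float(slope)

# Synthetic illustration: exact Pareto quantiles with tail index 0.5 (an assumption).
p = np.arange(1, 1001) / 1001.0
x = (1.0 - p) ** (-0.5)
exp_q, ordered, log_ordered = qq_coordinates(x)
print(upper_slope(exp_q, log_ordered, k=100))   # slope of the upper Pareto Q-Q plot
```

Plotting `ordered` against `exp_q` gives the exponential Q-Q plot and `log_ordered` against `exp_q` gives the Pareto Q-Q plot. An approximately linear upper part of the latter points to a Pareto-type tail, and its slope gives a rough tail index estimate.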

Figure 9 shows the sample paths of the tail index estimators for the underlying distributions of the two datasets. The plot of HILL diverges as $k$ increases, i.e., it is very sensitive to changes in $k$. Hence, it is not an appropriate estimator of the tail index for these datasets. The other estimators exhibit some form of stability; however, the sample paths of the proposed estimators (i.e., RWLS and WLS) are smooth, that is, these estimators are less sensitive to changes in $k$. All the tail index estimators considered are very unstable for small values of $k$ due to the small number of exceedances. A specific tail index estimate can be obtained from the plots of WLS and RWLS for both datasets.

5. Conclusion

In this paper, we proposed tail index estimators for Pareto-type distributions using the regression model. In addition to the existing ordinary least squares and ridge regression estimators, we proposed the regularised weighted least squares and the weighted least squares estimators as alternative regression-based reduced-bias estimators. The tail index estimates from the proposed estimators are generally stable and smooth across a broader range of $k$. The characteristics of the proposed estimators are as follows: (i) They are asymptotically consistent, asymptotically unbiased, and asymptotically normally distributed with mean 0 and variance . (ii) The MSE curves are low and flat over the central part of $k$. (iii) The plots of their tail index estimates are more stable, smooth, and near horizontal than those of the Hill, ordinary least squares, and bias-corrected Hill estimators.

In conclusion, the proposed estimators are competitive with the existing estimators and can be considered appropriate estimators of the tail index in terms of MSE and bias and in real-life applications.

6. Proofs

Proof of Lemma 1. (i)From equation (20) and (22), we haveIt follows thatwhere and hence, as , we have .(ii)From equation (23),Using equation (20), the first term can be written asIt follows thatwhere . Therefore,as . That is, as .(iii)The expression can also be written asTherefore, as , .(iv) can also be expressed asIt also follows that, as , .

Proof of Lemma 2. The proof follows easily from Lemma 1.

Proof of Lemma 4. We observe that Therefore, we have , as , which completes the proof.

Proof of Lemma 5. The proof requires the use of large deviation principles (LDP). From equations (15) and (33), . Given , is exponentially distributed with mean Therefore, we have Using equation (36) and calculations similar to those in [32], the moment generating function of given the law of is It follows that Now, using the bound on (see Lemma 4) and the squeeze theorem, we obtain Hence, by the Gärtner–Ellis theorem, conditional on , the statistic follows a large deviation principle with speed and a rate function defined as where . This implies that, for every , we have where . The typical behaviour of the rate function is when ; therefore, Thus, as , and this ends the proof.

Proof of Lemma 7. We observe that are independent but not identically distributed random variables. (i) where and is the Hill estimator. Hence, we have (ii) Let be the probability density function of given , and we observe that where for . Therefore, we have that Now define , and note that Hence,

Proof of Theorem 3. We prove Theorem 3 using Lemma 5 and Lemma 7. It has been established in Lemma 5 that as . Lemma 7 also establishes that Lyapunov's condition for the central limit theorem holds; hence, by Lyapunov's central limit theorem, Therefore, all we need to complete the proof is to specify the parameters of the normal distribution.
Recall from (16) that Also, from Lemma 5, the second term on the right-hand side vanishes asymptotically, so we concentrate on the first term of the expression only. Let The expected value of is given as Hence, as . Recall that , , and by assumption and are independent; therefore, the variance of is given by Using the assumption , as , we have , which completes the proof of Theorem 3.

Data Availability

The SOA dataset is available at https://lstat.kuleuven.be/Wiley/Data/soa.txt. The automobile insurance data from Ghana can be found at https://github.com/kikiocran/TailEstimators.


A draft of this article previously appeared online on arxiv.org as a preprint (see [34]).

Conflicts of Interest

The authors declare that they have no conflicts of interest.


Ocran, E. would like to thank the University of Ghana Building a New Generation of Academics in Africa (BANGA-Africa) Project (funded by Carnegie Corporation of New York) for providing financial support for this Ph.D research work.