Making use of the peaks over threshold (POT) estimation method, we propose a semiparametric estimator for the renewal function of interoccurrence times of heavy-tailed insurance claims with infinite variance. We prove that the proposed estimator is consistent and asymptotically normal, and we carry out a simulation study to compare its finite-sample behavior with that of the nonparametric one. Our results provide actuaries with confidence bounds for the renewal function of dangerous risks.

1. Introduction

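Let $X_1, X_2, \ldots$ be independent and identically distributed (iid) positive random variables (rvs), representing claim interoccurrence times of an insurance risk, with common distribution function (df) $F$ having finite mean $\mu$ and variance $\sigma^2$. Let
$$S_n := X_1 + \cdots + X_n, \quad n \ge 1, \tag{1.1}$$
be the claim occurrence times, and define the number of claims recorded over the time interval $[0, t]$ by
$$N(t) := \max\{n \ge 0 : S_n \le t\}. \tag{1.2}$$
The corresponding renewal function is defined by
$$R(t) := E[N(t)] = \sum_{k=1}^{\infty} F^{(k)}(t), \quad t > 0, \tag{1.3}$$
where $F^{(k)}$ is the $k$-fold convolution of $F$ for $k \ge 1$.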
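As an illustration of the renewal function as the expected number of claims recorded up to a given time, the following Python sketch approximates it by Monte Carlo simulation. It assumes, purely for illustration, Pareto interoccurrence times with tail $x^{-\alpha}$ on $[1, \infty)$; the function names, the number of replicates, and the seed are hypothetical choices and are not taken from the paper.

```python
import random


def pareto_sample(alpha, rng):
    """Draw one Pareto variate on [1, inf) with tail x**(-alpha) by inverse transform."""
    u = 1.0 - rng.random()  # shift [0, 1) to (0, 1] so the power below is well defined
    return u ** (-1.0 / alpha)


def count_renewals(t, alpha, rng):
    """Return N(t), the number of partial sums S_n = X_1 + ... + X_n with S_n <= t."""
    n = 0
    s = pareto_sample(alpha, rng)
    while s <= t:
        n += 1
        s += pareto_sample(alpha, rng)
    return n


def renewal_function_mc(t, alpha, replicates=2000, seed=0):
    """Monte Carlo approximation of R(t) = E[N(t)] (illustrative assumption: Pareto times)."""
    rng = random.Random(seed)
    total = sum(count_renewals(t, alpha, rng) for _ in range(replicates))
    return total / replicates
```

For instance, `renewal_function_mc(30.0, 1.5)` approximates the renewal function at time 30 for a Pareto tail exponent of 1.5, a case with finite mean and infinite variance.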

Renewal theory has proved to be a powerful tool in stochastic modeling in a wide variety of applications such as reliability theory, where a renewal process is used to model the successive repairs of a failed machine (see [1]), risk theory, where a renewal process is used to model the successive occurrences of risks (see [2, 3]), inventory theory, where a renewal process is used to model the successive times between demand points (see [4]), manpower planning, where a renewal process is used to model the sequence of resignations from a given job (see [5]), and warranty analysis, where a renewal process is used to model the successive purchases of a new item following the expiry of a free-replacement warranty (see [6]). Therefore, the need for renewal function estimates is pressing in many practical problems. For a summary of renewal theory, we refer to Feller [7], Asmussen [8], and Resnick [9].

Statistical estimation of the renewal function has been considered in several ways. Using a nonparametric approach, Frees [10] introduced two estimators based on the empirical counterparts of and by suitably truncating the sum in (1.3). Zhao and Subba Rao [11] proposed an estimation method based on the kernel estimate of the density and the renewal equation. A histogram-type estimator, resembling the second estimator of Frees, was given by Markovich and Krieger [12].

When Sgibnev [13] gave an asymptotic approximation of (1.3) as follows: with being the tail of

By replacing by its empirical counterpart in (1.4) Bebbington et al. [14] recently proposed a nonparametric estimator for in the case where is of infinite variance, given by where and , respectively, represent the first and second sample moments of Their main result says that whenever belongs to the domain of attraction of a stable law with (see, e.g., [15]), the df of converges, for suitable normalizing constants, to This result provides confidence bounds for with respect to the quantiles of

In general, practitioners prefer estimators that have simple formulas and are asymptotically normal, since this facilitates confidence interval construction. From this point of view, the estimator may not be as satisfactory to users as it should be, and an alternative estimator to would be more useful in practice. Our task is to use extreme value theory tools to construct such an alternative estimator.

Indeed, an important class of models having infinite second-order moments is the set of heavy-tailed distributions (e.g., Pareto, Burr, and Student). A df is said to be heavy-tailed with tail index if for , and some real constant with a slowly varying function at infinity, that is, as for any For details on these functions, see Chapter in Resnick [16] or Seneta [17]. Notice that when we have and In this case, an asymptotic approximation of the renewal function is given in (1.4).

Prior to Sgibnev [13], Teugels [18] obtained an approximation of when is heavy-tailed with tail index :

Extreme value theory allows for an accurate modeling of the tails of any unknown distribution, making (semiparametric) statistical inference more accurate for heavy-tailed distributions. Indeed, the semiparametric approach permits extrapolating beyond the largest value of a given sample, while the nonparametric one does not, since the empirical df vanishes outside the sample. This is a serious handicap for those dealing with heavy-tailed data.

Extreme value theory has two aspects. The first one consists in approximating the tail distribution by the generalized extreme value (GEV) distribution, thanks to the Fisher-Tippett theorem (see [19, 20]). The second aspect (commonly known as the POT method) is based on the Balkema-de Haan result, which says that the distribution of the excesses over a fixed threshold is approximated by the generalized Pareto distribution (GPD) (see [21, 22]). Those interested in extreme value theory and its applications are referred to the textbooks of de Haan and Ferreira [23] and Embrechts et al. [24]. In our situation, we have a fixed threshold equal to the horizon (see Section 3). Therefore, the POT method would be the appropriate choice to derive an estimator for by exploiting the heavy-tail property of df used in approximation (1.4). The asymptotic normality of our estimator is established under suitable assumptions.

The remainder of the paper is organized as follows. In Section 2, we introduce the GPD approximation, commonly known as the POT method. A new estimator of the renewal function is proposed in Section 3, along with two main results on its limiting behavior. Section 4 is devoted to a simulation study. The proofs are postponed until Section 5.

2. GPD Approximation

The distribution of the excesses, over a “fixed” threshold pertaining to df is defined by It is shown, in Balkema and de Haan [21] and Pickands [22], that is approximated by a generalized Pareto distribution (GPD) function with shape parameter and scale parameter in the following sense: where as for any The GPD function is a two-parameter df defined by for if and if
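To make the GPD approximation concrete, the following Python sketch evaluates the GPD df in its standard two-parameter form, $1 - (1 + \gamma y/\sigma)^{-1/\gamma}$ for a nonzero shape $\gamma$ and $1 - e^{-y/\sigma}$ for $\gamma = 0$, and compares it with the exact excess df of a Pareto distribution. The Pareto model and the function names are illustrative assumptions. For a Pareto tail $x^{-\alpha}$ on $[1, \infty)$ and a threshold $t \ge 1$, the excess df is exactly a GPD with shape $1/\alpha$ and scale $t/\alpha$, so the approximation error is zero in this special case.

```python
import math


def gpd_cdf(y, gamma, sigma):
    """Generalized Pareto df with shape gamma and scale sigma > 0 (standard form)."""
    if y <= 0:
        return 0.0
    if gamma == 0:
        return 1.0 - math.exp(-y / sigma)
    z = 1.0 + gamma * y / sigma
    if z <= 0:  # beyond the finite right endpoint -sigma/gamma when gamma < 0
        return 1.0
    return 1.0 - z ** (-1.0 / gamma)


def pareto_excess_cdf(y, t, alpha):
    """Exact excess df P(X - t <= y | X > t) for a Pareto tail x**(-alpha), x >= 1, t >= 1."""
    return 1.0 - ((t + y) / t) ** (-alpha)
```

For example, `gpd_cdf(3.0, 1 / 1.5, 10 / 1.5)` and `pareto_excess_cdf(3.0, 10.0, 1.5)` agree up to rounding error.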

Let be iid rvs with exact GPD It is well known by standard arguments (see, e.g., [25, Chapter 9]) that there exists, with probability as tends to infinity, a local maximum for the Log-Likelihood of 's density based on the sample In this case, by Theorem page 447 in the work of Lehmann and Casella [26], we infer that and are consistent estimators of and Moreover, these estimators are asymptotically normal provided that The extension to was investigated by Smith [27].

Suppose now that are drawn not from but from In view of the asymptotic approximation (2.2) Smith [27] has proposed estimates for via the Maximum Likelihood approach. The obtained estimators are solutions of the following system: where is a realization of
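The maximum likelihood step can be sketched numerically as follows. This is not the procedure used in the paper: it is a minimal illustration that assumes a positive shape parameter, reparametrizes the GPD log-likelihood through $\theta = \gamma/\sigma$, and maximizes the resulting profile log-likelihood by repeated grid refinement. For a fixed $\theta$, the maximizing shape is the mean of $\log(1 + \theta y_i)$. All function names, the search range, and the grid sizes are hypothetical choices.

```python
import math
import random


def gpd_sample(n, gamma, sigma, rng):
    """Draw n GPD variates (gamma > 0) by inverse transform; used only for illustration."""
    return [sigma / gamma * ((1.0 - rng.random()) ** (-gamma) - 1.0) for _ in range(n)]


def gpd_profile_loglik(theta, ys):
    """Profile log-likelihood in theta = gamma / sigma, and the maximizing gamma."""
    n = len(ys)
    gamma = sum(math.log1p(theta * y) for y in ys) / n  # maximizer in gamma for fixed theta
    return -n * math.log(gamma / theta) - n * gamma - n, gamma


def fit_gpd(ys, lo=1e-4, hi=1e3, rounds=4, points=60):
    """Approximate ML estimates (gamma_hat, sigma_hat) from positive excesses ys."""
    a, b = math.log(lo), math.log(hi)
    best = a
    for _ in range(rounds):
        step = (b - a) / (points - 1)
        grid = [a + step * i for i in range(points)]
        best = max(grid, key=lambda g: gpd_profile_loglik(math.exp(g), ys)[0])
        a, b = best - step, best + step  # zoom in around the current best grid point
    theta = math.exp(best)
    gamma = gpd_profile_loglik(theta, ys)[1]
    return gamma, gamma / theta
```

A grid search is used here only to keep the sketch dependency-free; in practice a dedicated optimizer or an extreme-value package would be a more natural choice.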

Letting as and and making use of (2.2) Smith [28] established, in Theorem , the asymptotic normality of as follows: where provided that as and is nonincreasing near infinity. In the case the limiting distribution in (2.5) is biased. Here denotes convergence in distribution and stands for the bivariate normal distribution with mean vector and covariance matrix

3. Estimating the Renewal Function in Infinite Time

Since we are interested in the renewal function in infinite time, we must assume that time is large enough and for asymptotic considerations, we will assume that depends on the sample size That is, with as Relation (1.7) suggests that in order to construct an estimator of we need to estimate and Let be the number of s, which are observed on horizon and denoted by the number of exceedances over with being the cardinality of set Notice that is a binomial rv with parameters and for which the natural estimator is

Select, from the sample only those observations that exceed The excesses are iid rvs with common df As seen in Section 2, the maximum likelihood estimators are solutions of the following system: where is an observation of and the vector a realization of

Regarding the distribution mean we know that, for has finite variance and therefore could naturally be estimated by the sample mean which, by the Central Limit Theorem (CLT), is asymptotically normal. By contrast, for has infinite variance, in which case the CLT is no longer valid. This case is frequently met in real insurance data (see, e.g., [29]).

Using the GPD approximation, Johansson [30] has proposed an alternative estimator for For each we write as the sum of two components: Johansson [30] defined his estimator of by estimating both and as follows: where is the empirical df based on the sample and is an estimate of obtained from the relation which implies that

Approximation (2.2) motivates us to estimate by Hence, an estimate of is By integrating (3.5), we get with with large probability. Here, denotes the indicator function of set

Substituting , and for and , respectively, in (1.7) yields the following estimator for the renewal function The asymptotic behavior of is given by the following two theorems.

Theorem 3.1. Let be a df fulfilling (1.6) with Suppose that is locally bounded in for and is nonincreasing near infinity, for some Then, for any with one has

Theorem 3.2. Let be as in Theorem 3.1. Then for any with we have where with , and
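The decomposition of the mean into a body component and a tail component, described earlier in this section, can be sketched in Python as follows. The sketch assumes that the body is estimated empirically from the observations not exceeding the threshold, and that the tail is estimated by the proportion of exceedances multiplied by the threshold plus the mean $\sigma/(1-\gamma)$ of the fitted GPD, which is finite only for a shape below 1. This form follows from integrating a GPD tail and is an assumption about the construction, not a transcription of the paper's displayed equations; the function and argument names are hypothetical.

```python
def semiparametric_mean(sample, t, gamma_hat, sigma_hat):
    """Mean estimate: empirical body on [0, t] plus a GPD-based tail beyond t."""
    if gamma_hat >= 1.0:
        raise ValueError("the fitted GPD has a finite mean only for a shape below 1")
    n = len(sample)
    # Body: empirical estimate of E[X 1{X <= t}].
    body = sum(x for x in sample if x <= t) / n
    # Proportion of exceedances over t, the natural estimate of the tail probability.
    p_hat = sum(1 for x in sample if x > t) / n
    # Tail: p_hat * (t + mean excess), with the mean excess taken from the fitted GPD.
    tail = p_hat * (t + sigma_hat / (1.0 - gamma_hat))
    return body + tail
```

With the toy inputs `semiparametric_mean([1.0, 2.0, 3.0, 10.0], 5.0, 0.5, 2.0)`, the body is 6/4 = 1.5 and the tail is 0.25 × (5 + 4) = 2.25.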

4. Simulation Study

In this section, we carry out a simulation study (by means of the statistical software R, see [31]) to illustrate the performance of our estimation procedure, through its application to sets of samples taken from two distinct Pareto distributions (with tail indices and ). We fix the threshold at , which is a value above the intermediate statistic corresponding to the optimal fraction of upper-order statistics in each sample. The latter is obtained by applying the algorithm of Cheng and Peng [32]. For each sample size, we generate independent replicates. Our overall results are then taken as the empirical means of the values in the repetitions.

A comparison with the nonparametric estimator is done as well. In the graphical illustration, we plot both estimators versus the sample size ranging from to .

Figure 1 suggests that the new estimator behaves consistently and that, over the range of sample sizes considered, it performs better than the nonparametric one. For the numerical investigation, we take samples of sizes and . In each case, we compute the semiparametric estimate as well as the nonparametric estimate . We also provide the bias and the root mean squared error (rmse).

The results are summarized in Tables 1 and 2 for and respectively. We notice that, regardless of the tail index value and the sample size, the semiparametric estimation procedure is more accurate than the nonparametric one.
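The bias and rmse reported in such a study are summary statistics over the independent replicates. A minimal Python sketch of that computation is given below; the function name is hypothetical, and the true value of the target quantity is assumed to be known or approximated separately.

```python
import math


def bias_and_rmse(estimates, true_value):
    """Empirical bias and root mean squared error of replicated estimates."""
    n = len(estimates)
    bias = sum(estimates) / n - true_value
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / n)
    return bias, rmse
```

Each entry of `estimates` would be the value of one estimator computed on one simulated sample, so the same helper can be applied to the semiparametric and the nonparametric estimates in turn.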

5. Proofs

The following tools will be instrumental for our needs.

Proposition 5.1. Let be a df fulfilling (1.6) with , and some real Suppose that is locally bounded in for Then for large enough and for any one has where , and are those defined in Theorem 3.2.

Lemma 5.2. Under the assumptions of Theorem 3.2, one has, for any real numbers and where

Proof of Proposition 5.1. We will only prove the second result; the other ones are straightforward from (1.6). Let be such that for Then for large enough, we have Recall that hence Making use of the proposition assumptions, we get and and therefore

Proof of Lemma 5.2. See Johansson [30].

Proof of Theorem 3.1. We may readily check that for all large where Johansson [30] proved that there exists a bounded sequence such that hence The first result of the proposition yields that Since then On the other hand, by the CLT we have then Moreover, Smith [28] yields it follows that, therefore Thus, Therefore, for all large we get as sought.

Proof of Theorem 3.2. From the proof of Theorem 3.1, for all large it is easy to verify that where Multiplying by and using the proposition and the lemma together with the continuous mapping theorem, we find that On the other hand, from Johansson [30], we have for all large This enables us to rewrite into In view of Lemma 5.2, we infer that for all large the previous quantity is where are standard normal rvs with for every with except for Therefore, the rv is Gaussian with mean zero and asymptotic variance Observe now that where is that in (3.12). This completes the proof of Theorem 3.2.

6. Conclusion

In this paper, we have proposed a new estimator for the renewal function of heavy-tailed claim interoccurrence times, via a semiparametric approach. Our considerations are based on one aspect of extreme value theory, namely, the POT method. We have proved that our estimator is consistent and asymptotically normal. Moreover, simulations show that it is more accurate than the nonparametric estimator given by Bebbington et al. [14].


The authors are grateful to the referees whose suggestions led to an improvement of the paper.