Abstract

In this paper, we propose a family of heavy tailed distributions, called the arcsine exponentiated-X family, that is constructed by incorporating a trigonometric function. Based on the proposed approach, a three-parameter extension of the Weibull distribution called the arcsine exponentiated-Weibull (ASE-W) distribution is studied in detail. Maximum likelihood is used to estimate the model parameters, and its performance is evaluated by two simulation studies. Actuarial measures including Value at Risk and Tail Value at Risk are derived for the ASE-W distribution. Furthermore, a numerical study of these measures indicates that the proposed ASE-W distribution has a heavier tail than the baseline Weibull distribution. These actuarial measures are also estimated from real insurance claims data for the ASE-W and other competing distributions. The usefulness and flexibility of the proposed model are illustrated by analyzing a real-life heavy tailed insurance claims dataset. We construct a modified chi-squared goodness-of-fit test based on the Nikulin–Rao–Robson statistic to verify the validity of the proposed ASE-W model. The modified test shows that the ASE-W model is a good candidate for analyzing heavy tailed insurance claims data.

1. Introduction

Heavy tailed distributions play a significant role in modeling data in applied sciences, particularly in risk management, banking, economics, finance, and actuarial science. However, the quality of the procedures primarily depends upon the assumed probability model of the phenomenon under consideration. Insurance datasets are usually positive [1], right-skewed [2], unimodal [3], and heavy tailed [4]. Right-skewed data may be adequately modeled by skewed distributions [5]. Therefore, a number of unimodal positively skewed parametric distributions have been employed to model such datasets [6, 7].

The heavy tailed distributions are those whose right tail probabilities are heavier than the exponential one, that is,

$$\lim_{x \to \infty} \frac{1 - F(x)}{e^{-\lambda x}} = \infty \quad \text{for every } \lambda > 0, \tag{1}$$

where $F(x)$ is the cdf of a baseline distribution. More information can be found in Resnick [8] and Beirlant et al. [9].

Dutta and Perry [10] performed an empirical analysis of loss distributions to estimate the risk via different approaches. They rejected the idea of using the exponential, gamma, and Weibull models because of their poor results and concluded that one would need to use a model that is flexible enough in its structure. These results encouraged researchers to propose new flexible models providing greater accuracy in data fitting. Therefore, a number of approaches have been proposed to obtain new distributions with heavier tails than the exponential distribution, such as (i) the transformation method [11, 12], (ii) composition of two or more distributions [13], (iii) compounding of distributions [14, 15], and (iv) finite mixtures of distributions [16, 17].

The abovementioned approaches are very useful in deriving new flexible distributions; however, they are still subject to some deficiencies. For example, (i) the transformation approach is simple to apply, but its inferences become difficult and much computational work is required to derive the distributional characteristics [18]. (ii) The composition of two or more distributions uses fixed or a priori known mixing weights and hence can be very restrictive [19]. To overcome this problem, Scollnik [20] used unrestricted mixing weights. (iii) The density obtained by the compounding approach may not always have a closed-form expression, which makes the estimation more cumbersome [21]. (iv) Finite mixture models represent a further approach to define very flexible distributions which are also able to capture, for instance, multimodality of the underlying distribution. The price to pay for this greater flexibility is a more complicated and computationally challenging inference [22].

To overcome the problems associated with the above methods, many authors have proposed new families of distributions; see, for example, Al-Mofleh [23], Jamal and Nasir [24], Nasir et al. [25], Ahmad et al. [26], Afify et al. [27], Cordeiro et al. [28], Ahmad et al. [29], and Afify and Alizadeh [30], among many others. Therefore, bringing flexibility to the existing distributions by adding additional parameter(s) is a desirable feature and an interesting research topic.

In this regard, Mudholkar and Srivastava [31] introduced the exponentiated family of distributions by adding a shape parameter to obtain a more flexible version of the existing distributions. A random variable X is said to follow the exponentiated family if its cumulative distribution function (cdf) is given by

$$G(x; a, \boldsymbol{\xi}) = F(x; \boldsymbol{\xi})^{a}, \quad a > 0, \; x \in \mathbb{R}, \tag{2}$$

where $F(x; \boldsymbol{\xi})$ is the cdf of the baseline distribution depending on the parameter vector $\boldsymbol{\xi}$ and $a$ is an additional shape parameter. Using equation (2), the exponentiated versions of the existing distributions have been proposed in the literature.

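Furthermore, Cordeiro and de Castro [32] proposed another approach known as the Kumaraswamy-generalized (Ku-G) family by adding two additional shape parameters. The cdf of the Ku-G family is

$$G(x; a, b, \boldsymbol{\xi}) = 1 - \left[1 - F(x; \boldsymbol{\xi})^{a}\right]^{b}, \quad a, b > 0, \; x \in \mathbb{R}. \tag{3}$$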

From equation (3), it is clear that, for b = 1, the Ku-G family reduces to the exponentiated family. For contributed work based on equation (3), we refer to Ahmad et al. [33], Mead and Afify [34], Afify et al. [35], and Mansour et al. [36].

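In this paper, we enrich the branch of distribution theory by introducing the heavy tailed arcsine exponentiated-X (ASE-X) family of distributions. A random variable X belongs to the proposed ASE-X family if its cdf is

$$G(x; a, \boldsymbol{\xi}) = \frac{2}{\pi} \arcsin\left(F(x; \boldsymbol{\xi})^{a}\right), \quad a > 0, \; x \in \mathbb{R}, \tag{4}$$

where $F(x; \boldsymbol{\xi})$ is the baseline cdf with a parameter vector $\boldsymbol{\xi}$ and an additional shape parameter $a$.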

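The probability density function (pdf) corresponding to equation (4) is given by

$$g(x; a, \boldsymbol{\xi}) = \frac{2a}{\pi} \, \frac{f(x; \boldsymbol{\xi}) \, F(x; \boldsymbol{\xi})^{a-1}}{\sqrt{1 - F(x; \boldsymbol{\xi})^{2a}}}, \quad x \in \mathbb{R}, \tag{5}$$

where $f(x; \boldsymbol{\xi})$ is the baseline pdf.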

The new pdf is most tractable when $F(x; \boldsymbol{\xi})$ and $f(x; \boldsymbol{\xi})$ have simple analytical expressions. Henceforth, a random variable X with pdf equation (5) is denoted by $X \sim \text{ASE-X}(a, \boldsymbol{\xi})$. Moreover, the key motivations for using the ASE-X family in practice are the following:
(i) To improve the characteristics and flexibility of the existing distributions: the special models of this family can provide left-skewed, right-skewed, unimodal, reversed J-shaped, and symmetric densities, as well as decreasing, increasing, bathtub, upside-down bathtub, and reversed-J hazard rates (see Figures 1 and 2)
(ii) To provide, through a very simple and convenient method of adding an additional parameter, extended heavy tailed distributions which are very useful in modeling data from the insurance field (see Sections 6 and 7)
(iii) To introduce the extended version of a baseline distribution with closed forms for the cdf and hazard rate function (hrf), so that the special submodels of this family can be used in analyzing censored datasets
(iv) To obtain special cases that are capable of modeling heavy tailed datasets in actuarial science and that compare favorably with existing competing models (see Sections 6 and 7)

Using the new cdf in equation (4), a number of new flexible distributions can be obtained. Some new contributed models based on the ASE-X approach are presented in Table 1.

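The survival function (sf) and hrf of the proposed family are, respectively, given by

$$S(x; a, \boldsymbol{\xi}) = 1 - \frac{2}{\pi} \arcsin\left(F(x; \boldsymbol{\xi})^{a}\right), \qquad h(x; a, \boldsymbol{\xi}) = \frac{2a \, f(x; \boldsymbol{\xi}) \, F(x; \boldsymbol{\xi})^{a-1}}{\sqrt{1 - F(x; \boldsymbol{\xi})^{2a}} \, \left[\pi - 2 \arcsin\left(F(x; \boldsymbol{\xi})^{a}\right)\right]}. \tag{6}$$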
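The four functions of the family can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: it assumes the ASE-X cdf has the form (2/π) arcsin(F(x)^a), since the displayed equations are not legible in this text, and it uses a Weibull baseline written as F(x) = 1 − exp(−rate · x^shape). The function names and the parameter names `shape` and `rate` are hypothetical, as are the numerical values.

```python
import math


def weibull_cdf(x, shape, rate):
    """Assumed baseline cdf: F(x) = 1 - exp(-rate * x**shape) for x > 0."""
    return -math.expm1(-rate * x ** shape) if x > 0 else 0.0


def weibull_pdf(x, shape, rate):
    """Baseline density matching weibull_cdf."""
    if x <= 0:
        return 0.0
    return rate * shape * x ** (shape - 1) * math.exp(-rate * x ** shape)


def ase_cdf(x, a, base_cdf):
    """Assumed ASE-X cdf: (2/pi) * arcsin(F(x)**a)."""
    return (2.0 / math.pi) * math.asin(base_cdf(x) ** a)


def ase_pdf(x, a, base_cdf, base_pdf):
    """Derivative of ase_cdf; only valid where 0 < F(x) < 1."""
    F = base_cdf(x)
    return (2.0 * a / math.pi) * base_pdf(x) * F ** (a - 1) / math.sqrt(1.0 - F ** (2 * a))


def ase_sf(x, a, base_cdf):
    """Survival function, 1 - cdf."""
    return 1.0 - ase_cdf(x, a, base_cdf)


def ase_hrf(x, a, base_cdf, base_pdf):
    """Hazard rate function, pdf / sf."""
    return ase_pdf(x, a, base_cdf, base_pdf) / ase_sf(x, a, base_cdf)


# Illustrative parameter values only (not taken from the paper).
F = lambda x: weibull_cdf(x, shape=1.2, rate=0.8)
f = lambda x: weibull_pdf(x, shape=1.2, rate=0.8)
for x in (0.5, 1.0, 2.0):
    print(x, ase_cdf(x, 1.5, F), ase_pdf(x, 1.5, F, f), ase_hrf(x, 1.5, F, f))
```

Passing the baseline cdf and pdf as callables keeps the sketch generic, so any other baseline could be substituted in the same way.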

The paper is outlined as follows. In Section 2, we define the ASE-W distribution and present some plots for its density and hazard functions. We provide some mathematical properties of the ASE-X family in Section 3. The maximum likelihood estimators (MLEs) of the model parameters are obtained in Section 4. Two Monte Carlo simulation studies to assess the performance of the MLEs are discussed in Section 5. In Section 6, we derive two important risk measures, value at risk and tail value at risk, for the ASE-W distribution and perform a simulation study indicating that the ASE-W distribution has a heavier tail than the baseline Weibull distribution. In Section 7, the ASE-W distribution is applied to a real heavy tailed insurance claims dataset to illustrate its potential. Furthermore, the value at risk and tail value at risk measures are estimated for all competing models based on the insurance claims data. A modified goodness-of-fit test using the Nikulin–Rao–Robson statistic is presented in Section 8. Finally, in Section 9, we provide some concluding remarks.

2. The ASE-W Distribution

In this section, we introduce the ASE-W distribution and investigate the behavior of its density and hazard functions, for selected values of the parameters. Consider the cdf of the two-parameter Weibull distribution, . Then, a random variable X is said to follow the ASE-W distribution if its cdf takes the form

The pdf associated with equation (7) has the form

For , the ASE-W distribution reduces to the ASE-exponential distribution with parameter , and for , it reduces to the ASE-Rayleigh distribution with parameter .

For different values of the model parameters, plots of the pdf and hrf of the ASE-W distribution are sketched in Figures 1 and 2. The two figures reveal that the ASE-W distribution can provide left-skewed, right-skewed, unimodal, reversed J-shaped, and symmetric densities, as well as decreasing, increasing, bathtub, upside-down bathtub, and reversed-J hazard rate shapes.

3. Basic Mathematical Properties

In this section, some statistical properties of the ASE-X family are derived.

3.1. Quantile Function

Let X be an ASE-X random variable with pdf equation (5). The quantile function (qf) of X, say Q(u), reduces to

$$Q(u) = F^{-1}\left(\left[\sin\left(\frac{\pi u}{2}\right)\right]^{1/a}\right), \tag{9}$$

where u has the uniform distribution on the interval (0, 1). From the expression in equation (9), it is clear that the ASE-X family has a closed-form quantile function, which makes generating random numbers very simple.

The qf of the ASE-W model follows as
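Inverse-transform sampling from the ASE-W model can be sketched as below. The code is an illustration under stated assumptions, not the authors' routine: the ASE-X cdf is taken to be (2/π) arcsin(F(x)^a), the Weibull baseline is written as F(x) = 1 − exp(−rate · x^shape), and all function and parameter names are hypothetical.

```python
import math
import random

_P_MAX = 1.0 - 2.0 ** -53  # largest float below 1; keeps log1p(-p) finite


def weibull_quantile(p, shape, rate):
    """Inverse of the assumed baseline F(x) = 1 - exp(-rate * x**shape)."""
    p = min(p, _P_MAX)  # clamping truncates only the most extreme tail
    return (-math.log1p(-p) / rate) ** (1.0 / shape)


def ase_w_quantile(u, a, shape, rate):
    """Q(u) = F^{-1}(sin(pi*u/2)**(1/a)) under the assumed ASE-X cdf."""
    p = math.sin(math.pi * u / 2.0) ** (1.0 / a)
    return weibull_quantile(p, shape, rate)


def ase_w_cdf(x, a, shape, rate):
    """Assumed ASE-W cdf, used here to check the quantile function."""
    F = -math.expm1(-rate * x ** shape)
    return (2.0 / math.pi) * math.asin(F ** a)


def ase_w_sample(n, a, shape, rate, rng):
    """Draw n variates by applying the quantile function to uniforms."""
    return [ase_w_quantile(rng.random(), a, shape, rate) for _ in range(n)]


rng = random.Random(2020)  # fixed seed so the draws are reproducible
draws = ase_w_sample(5, a=1.5, shape=1.2, rate=0.8, rng=rng)
print(draws)
```

Because the quantile function is available in closed form, no root finding is needed; each uniform draw maps directly to one ASE-W variate.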

3.2. Moments

Moments are very important and play an essential role in statistical analysis. They help to capture important features and characteristics of the distribution (e.g., central tendency, dispersion, skewness, and kurtosis). The rth moment of the ASE-X family is

$$\mu_r' = E(X^r) = \int_{-\infty}^{\infty} x^r \, g(x; a, \boldsymbol{\xi}) \, dx. \tag{11}$$

Substituting equation (5) in equation (11), we obtain

Using the binomial expansion, we have

By replacing with , in equation (13), we obtain

By inserting equation (14) in equation (12), we obtainwhere .

The moment generating function of the ASE-X class has the form

The effects of different values of the parameters and on the mean, variance, skewness, and kurtosis of the ASE-W distribution with are illustrated in Figures 3 and 4.
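The moment-based summaries can also be approximated numerically, which gives a quick check on any series expression. The sketch below uses the identity E[X^r] = ∫₀¹ Q(u)^r du with a midpoint rule. It is not the series derived in this section; it assumes the ASE-X cdf (2/π) arcsin(F(x)^a) with a Weibull baseline F(x) = 1 − exp(−rate · x^shape), and the names and parameter values are hypothetical.

```python
import math


def ase_w_quantile(u, a, shape, rate):
    """Assumed ASE-W quantile: Weibull inverse applied to sin(pi*u/2)**(1/a)."""
    p = math.sin(math.pi * u / 2.0) ** (1.0 / a)
    return (-math.log1p(-p) / rate) ** (1.0 / shape)


def raw_moment(quantile_fn, r, m=20000):
    """Midpoint-rule approximation of E[X^r] = integral over (0, 1) of Q(u)**r.

    Accuracy degrades for large r because Q(u) grows without bound as u -> 1.
    """
    return sum(quantile_fn((k + 0.5) / m) ** r for k in range(m)) / m


def moment_summary(quantile_fn, m=20000):
    """Mean, variance, skewness, and kurtosis from the first four raw moments."""
    m1, m2, m3, m4 = (raw_moment(quantile_fn, r, m) for r in (1, 2, 3, 4))
    var = m2 - m1 ** 2
    skew = (m3 - 3 * m1 * m2 + 2 * m1 ** 3) / var ** 1.5
    kurt = (m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4) / var ** 2
    return {"mean": m1, "variance": var, "skewness": skew, "kurtosis": kurt}


# Illustrative parameter values only.
print(moment_summary(lambda u: ase_w_quantile(u, a=1.5, shape=1.2, rate=0.8)))
```

The same routine accepts any quantile function, so it can be validated on a distribution with known moments before being applied to the ASE-W model.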

4. Maximum Likelihood Estimation

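In this section, we consider the estimation of the unknown parameters of the ASE-X distribution from complete samples only via the maximum likelihood method. Let $X_1, \ldots, X_n$ be a random sample from the ASE-X family with observed values $x_1, \ldots, x_n$. The log-likelihood function is

$$\ell(a, \boldsymbol{\xi}) = n \log\left(\frac{2a}{\pi}\right) + \sum_{i=1}^{n} \log f(x_i; \boldsymbol{\xi}) + (a - 1) \sum_{i=1}^{n} \log F(x_i; \boldsymbol{\xi}) - \frac{1}{2} \sum_{i=1}^{n} \log\left[1 - F(x_i; \boldsymbol{\xi})^{2a}\right]. \tag{17}$$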

The log-likelihood function can be maximized either directly by using R (AdequacyModel package), SAS (PROC NLMIXED), or the Ox program (subroutine MaxBFGS), or by solving the nonlinear likelihood equations which are obtained by differentiating equation (17) as follows:

The log-likelihood function for the ASE-W model reduces to

The nonlinear likelihood equations can be obtained by differentiating the last equation as follows:
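For direct numerical maximization, only the log-likelihood itself is needed. The sketch below evaluates it for the ASE-W model under the assumption that the pdf is (2a/π) f(x) F(x)^(a−1) / sqrt(1 − F(x)^(2a)) with Weibull baseline F(x) = 1 − exp(−rate · x^shape). The function name, the parameter names, and the sample values are hypothetical.

```python
import math


def ase_w_loglik(data, a, shape, rate):
    """Log-likelihood of a positive sample under the assumed ASE-W pdf.

    Returns -inf for inadmissible parameters or non-positive observations,
    which lets a generic optimizer step away from the boundary.
    """
    if min(a, shape, rate) <= 0 or min(data) <= 0:
        return -math.inf
    total = len(data) * math.log(2.0 * a / math.pi)
    for x in data:
        t = rate * x ** shape
        # log of the Weibull baseline density f(x)
        log_f = math.log(rate * shape) + (shape - 1.0) * math.log(x) - t
        # log F(x), written with log1p for accuracy when F(x) is near 1
        log_F = math.log1p(-math.exp(-t))
        # log(1 - F(x)**(2a)), written with expm1 for the same reason
        log_tail = math.log(-math.expm1(2.0 * a * log_F))
        total += log_f + (a - 1.0) * log_F - 0.5 * log_tail
    return total


# Hypothetical observations, for illustration only.
sample = [0.2, 0.7, 1.3, 2.4]
print(ase_w_loglik(sample, a=1.5, shape=1.2, rate=0.8))
```

The negative of this function is what would be handed to a general-purpose minimizer; the `log1p` and `expm1` forms are a design choice to avoid taking the logarithm of zero for large observations.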

5. Simulation Results

5.1. Monte Carlo Simulation Study

In this section, we perform a comprehensive simulation study to assess the behavior of the MLEs of the ASE-W parameters. Random numbers are generated via the inverse cdf method. The inversion and the MLEs are computed using the optim() R function with the argument method = "L-BFGS-B". We generate N = 1000 samples of size n = 25, 100, 300, 600, 900, 1000 from the ASE-W distribution with true parameter values. In this simulation study, we empirically calculate the mean, bias, and mean square error (MSE) of the MLEs for different parameter combinations and each sample size.

Coverage probabilities (CPs) are also calculated at the confidence interval (C.I.). The simulation results are provided in Tables 2 and 3. Based on the results listed in Tables 2 and 3, the MLEs behave as expected, that is, the MSE values and the estimated biases decrease as n increases. Furthermore, the mean values of the estimates tend to the true values as n increases, showing the consistency property of the MLEs.
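The summary quantities reported for each sample size can be computed as in the sketch below. The definitions used (bias as mean estimate minus true value, MSE as mean squared deviation from the true value, coverage as the share of Wald-type intervals containing the true value) are the standard ones and are assumed here rather than taken from the paper. The default z = 1.96 corresponds to a 95% level, which is also an assumption, and the function names are hypothetical.

```python
def summarize_estimates(estimates, true_value):
    """Empirical mean, bias, and MSE of replicated estimates of one parameter."""
    n = len(estimates)
    mean = sum(estimates) / n
    bias = mean - true_value
    mse = sum((e - true_value) ** 2 for e in estimates) / n
    return {"mean": mean, "bias": bias, "mse": mse}


def coverage_probability(estimates, std_errors, true_value, z=1.96):
    """Share of intervals estimate +/- z * std_error that contain true_value."""
    hits = sum(
        1
        for e, s in zip(estimates, std_errors)
        if e - z * s <= true_value <= e + z * s
    )
    return hits / len(estimates)


# Hypothetical replicated estimates of a parameter whose true value is 1.0.
est = [1.1, 0.9, 1.2, 0.8]
se = [0.1, 0.1, 0.05, 0.05]
print(summarize_estimates(est, 1.0))
print(coverage_probability(est, se, 1.0))
```

In a full study these helpers would be applied once per parameter and per sample size, with one estimate and one standard error per simulated sample.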

5.2. Simulations Using the Barzilai-Borwein Algorithm

In this section, we provide the results of a simulation study for the ASE-W distribution using the Barzilai-Borwein (BB) algorithm [37]. Initial values for the parameters ( = 1.6,  = 0.6, and  = 1.9) are selected and random sample of sizes n = 50, 100, 200, and 400 are obtained. Repetitions are made 10,000 times and the averages of the simulated values of the MLEs (, , ) along with their MSEs are calculated. The simulation results are provided in Table 4.

From the simulation results provided in Table 4, we can see that the maximum likelihood estimates of the ASE-W parameters converge. Plots of the maximum likelihood estimates of the ASE-W parameters are provided in Figure 5.

From Figure 5, it is clear that all the parameters estimates of the ASE-W distribution converge faster than . Therefore, we conclude that the MLEs of the ASE-W parameters are consistent.

6. Actuarial Measures

One of the most important tasks of financial and actuarial institutions is to evaluate the exposure to market risk in a portfolio of instruments, which arises from changes in underlying variables such as equity prices, interest rates, or exchange rates. In this section, we derive some important risk measures, including value at risk (VaR) and tail value at risk (TVaR), of the ASE-W distribution, which play a crucial role in portfolio optimization under uncertainty.

6.1. Value at Risk

The VaR is widely used by practitioners as a standard financial market risk measure. It is also called the quantile premium principle or quantile risk measure. The VaR is always specified with a given degree of confidence, say q (typically 90%, 95%, or 99%), and it represents the percentage loss in portfolio value that will be equaled or exceeded only 100(1 − q)% of the time. The VaR of a random variable X is the qth quantile of its cdf [38]. Hence, the VaR of the ASE-W distribution is defined as

6.2. Tail Value at Risk

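Another important measure is TVaR, which quantifies the expected value of the loss given that an event outside a given probability level has occurred. If X follows the ASE-W distribution, then its TVaR can be defined as

$$\text{TVaR}_q(X) = \frac{1}{1 - q} \int_{\text{VaR}_q}^{\infty} x \, g(x) \, dx, \tag{22}$$

where $g(x)$ is the ASE-W pdf in equation (8).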

Substituting equation (8) in equation (22), we obtain

Finally, the TVaR of the ASE-W model takes the form
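Both risk measures can be evaluated numerically from the quantile function alone, using VaR_q = Q(q) and the equivalent representation TVaR_q = (1/(1 − q)) ∫_q^1 Q(u) du for continuous distributions. The sketch below is not the closed-form TVaR expression of this section; it assumes the ASE-X cdf (2/π) arcsin(F(x)^a) with Weibull baseline F(x) = 1 − exp(−rate · x^shape), and the names and parameter values are hypothetical.

```python
import math

_P_MAX = 1.0 - 2.0 ** -53  # largest float below 1; keeps log1p(-p) finite


def weibull_quantile(p, shape, rate):
    """Inverse of the assumed baseline F(x) = 1 - exp(-rate * x**shape)."""
    p = min(p, _P_MAX)
    return (-math.log1p(-p) / rate) ** (1.0 / shape)


def ase_w_quantile(u, a, shape, rate):
    """Assumed ASE-W quantile function."""
    return weibull_quantile(math.sin(math.pi * u / 2.0) ** (1.0 / a), shape, rate)


def value_at_risk(quantile_fn, q):
    """VaR at level q is the q-th quantile."""
    return quantile_fn(q)


def tail_value_at_risk(quantile_fn, q, m=20000):
    """Midpoint-rule approximation of (1/(1-q)) * integral_q^1 Q(u) du."""
    h = (1.0 - q) / m
    return sum(quantile_fn(q + (k + 0.5) * h) for k in range(m)) / m


# Illustrative comparison at the 95% level with shared baseline parameters.
weib = lambda u: weibull_quantile(u, shape=1.2, rate=0.8)
asew = lambda u: ase_w_quantile(u, a=1.5, shape=1.2, rate=0.8)
for name, qf in (("Weibull", weib), ("ASE-W", asew)):
    print(name, value_at_risk(qf, 0.95), tail_value_at_risk(qf, 0.95))
```

Under the assumed cdf, sin(πu/2)^(1/a) is at least u whenever a ≥ 1, so with the same baseline parameters the ASE-W quantiles are at least as large as the Weibull ones, which is the behavior the numerical study in this section reports.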

6.3. Numerical Study of the Actuarial Measures

In this section, we provide some numerical results for the VaR and TVaR for the Weibull and ASE-W distributions for different sets of parameters. The process is described below:
(i) Random samples of size n are generated from the Weibull and ASE-W distributions, and the parameters are estimated via the maximum likelihood method.
(ii) 1000 repetitions are made to calculate the VaR and TVaR of the two distributions.

The simulation results of the VaR and TVaR for the Weibull and ASE-W models are provided in Tables 5 and 6. Furthermore, the results in these tables are depicted graphically in Figures 6 and 7, respectively.

The simulation is performed for the Weibull and ASE-W distributions for selected values of their parameters. A model with higher values of VaR and TVaR is said to have a heavier tail. The simulated results in Tables 5 and 6 and the plots in Figures 6 and 7 show that the proposed ASE-W model has higher values of these risk measures than the Weibull model. Hence, the proposed ASE-W model has a heavier tail than the Weibull distribution and can be used effectively to model heavy tailed insurance data.

7. Modeling Heavy Tailed Insurance Claims Data

In this section, we demonstrate the flexibility of the ASE-W distribution by using heavy tailed insurance claims data. Furthermore, we calculate the actuarial measures of the ASE-W and other competing distributions using this real dataset.

7.1. Application of the ASE-W Distribution to Insurance Claims Data

In this section, we consider a dataset from the insurance field. This dataset represents the unemployment insurance initial claims per month from 1971 to 2018, and it is available at https://data.worlddatany-govns8z-xewg. We compare the goodness-of-fit results of the proposed distribution with some other well-known competing distributions including the Weibull, exponentiated exponential (EE), exponentiated Weibull (EW), exponentiated Lomax (EL), Kumaraswamy Weibull (Ku-W), beta Weibull (BW), and new Weibull Burr-XII (NWB-XII) distributions. The distribution functions of these competing distributions are given by
(1) Weibull distribution:
(2) EE distribution:
(3) EW distribution:
(4) EL distribution:
(5) Ku-W distribution:
(6) BW distribution:
(7) NWB-XII distribution:

The competing models can be compared using the following discrimination measures:
(i) The Akaike information criterion (AIC)
(ii) The Bayesian information criterion (BIC)
(iii) The Hannan–Quinn information criterion (HQIC)
(iv) The consistent Akaike information criterion (CAIC)
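Three of these criteria have widely agreed definitions in terms of the maximized log-likelihood ℓ, the number of parameters k, and the sample size n, and can be computed as in the sketch below. The formulas are the standard textbook ones and are assumed to match those used here. The CAIC is left out on purpose, because more than one definition is in use and this text does not show which one is meant. The function name and input values are hypothetical.

```python
import math


def information_criteria(loglik, k, n):
    """AIC, BIC, and HQIC from a maximized log-likelihood.

    loglik: maximized log-likelihood, k: number of parameters, n: sample size.
    Smaller values indicate a better trade-off between fit and complexity.
    """
    return {
        "AIC": 2 * k - 2 * loglik,
        "BIC": k * math.log(n) - 2 * loglik,
        "HQIC": 2 * k * math.log(math.log(n)) - 2 * loglik,
    }


# Hypothetical values for a three-parameter model fitted to 50 observations.
print(information_criteria(loglik=-100.0, k=3, n=50))
```

Since all three criteria share the −2ℓ term, they differ only in how strongly each additional parameter is penalized.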

In addition to the discrimination measures, we further consider the following test statistics:
(i) Anderson–Darling (AD)
(ii) Cramér–von Mises (CM)
(iii) Kolmogorov–Smirnov (KS) with its p-value

The formulae for these measures can be explored in Afify et al. Table 7 gives the MLEs and their standard errors. The analytical measures are provided in Tables 8 and 9. The results in these tables indicate that the ASE-W distribution provides better fits than other competing models and could be chosen as an adequate model to analyze the heavy tailed insurance claims data.

Figure 8 displays the fitted pdf and cdf of the proposed distribution, which shows that the ASE-W distribution fits the right-skewed heavy tailed data very well. The probability-probability (PP) plot and the Kaplan–Meier survival plot are sketched in Figure 9.

7.2. Estimation of VaR and TVaR Measures Using the Insurance Claims Data

In this section, we compute the VaR and TVaR measures of the ASE-W and other competing distributions using the parameter values estimated from the insurance claims data. The numerical results for all fitted distributions are reported in Table 10. The results in Table 10 are displayed graphically in Figure 10.

As mentioned earlier, a distribution with higher values of the risk measures is said to have a heavier tail. The values in Table 10 and Figure 10 illustrate that the ASE-W distribution has the highest values of VaR and TVaR among all competing models, indicating that it has a heavier tail than its competitors for the insurance claims data.

8. Validation of the ASE-W Distribution

Goodness-of-fit tests indicate whether or not it is reasonable to assume that a random sample comes from a specific distribution. Statistical techniques often rely on observations obtained from a population that has a distribution of a specific form. Selection of a suitable model is of great importance in all types of statistical analysis. For this purpose, many goodness-of-fit tests have been proposed. Nikulin [39, 40] proposed a modification of the standard Pearson chi-squared test for a continuous distribution. Rao and Robson [41] obtained the same result for the exponential family, and this statistic has since been widely adopted under the name Rao–Robson–Nikulin (RRN) test, also referred to as the Nikulin–Rao–Robson (NRR) test.

In this section, we use another goodness-of-fit test to show the validity of the ASE-W distribution for heavy tailed insurance data. For this purpose, we use the NRR test statistic to show the utility of the ASE-W distribution in insurance and financial sciences.

8.1. Nikulin–Rao–Robson Test Statistic

So far in the literature, a number of methods have been proposed to verify the adequacy and goodness-of-fit of the statistical models to data. Since the seventies of the last century, researchers have shown a deep interest to propose new modifications of goodness-of-fit test. In this regard, Nikulin [42] and Rao and Robson [41] separately proposed a modification of the Pearson statistic for complete data known as Nikulin–Rao–Robson (NRR) statistic. To test the hypothesis ,where represents the vector of unknown parameters, the NRR statistic is denoted by , and it is defined as follows.

Suppose observations are grouped in subintervals , mutually disjoint:

The limits of the intervals are obtained such thatwhere

Ifis the vector of frequencies obtained by the grouping of data in these intervals,

The NRR statistic is given bywhereand is the information matrix for the grouped data defined bywithwhere represents the estimated Fisher information matrix and is the MLE of the parameter vector. The statistic follows a chi square distribution with degrees of freedom.
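The grouping step behind the statistic can be sketched as follows: interval limits are placed at the j/r quantiles of the fitted model so that every cell has probability 1/r, and the observations are then counted per cell. The sketch computes only the classical Pearson sum over these equiprobable cells. It does not include the correction term built from the information matrices, so it is not the full NRR statistic. The ASE-W quantile function assumes the cdf (2/π) arcsin(F(x)^a) with Weibull baseline F(x) = 1 − exp(−rate · x^shape), and all names and values are hypothetical.

```python
import bisect
import math


def ase_w_quantile(u, a, shape, rate):
    """Assumed ASE-W quantile function."""
    p = math.sin(math.pi * u / 2.0) ** (1.0 / a)
    return (-math.log1p(-p) / rate) ** (1.0 / shape)


def equiprobable_limits(quantile_fn, r):
    """Interior limits a_1 < ... < a_{r-1} such that each cell has mass 1/r."""
    return [quantile_fn(j / r) for j in range(1, r)]


def grouped_frequencies(data, limits):
    """Counts per cell, with cells of the form (a_{j-1}, a_j]."""
    counts = [0] * (len(limits) + 1)
    for x in data:
        counts[bisect.bisect_left(limits, x)] += 1
    return counts


def pearson_sum(counts):
    """Sum over cells of (observed - expected)**2 / expected, expected = n / r."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


# Illustrative use with hypothetical parameter values and an evenly spread sample.
qf = lambda u: ase_w_quantile(u, a=1.5, shape=1.2, rate=0.8)
data = [qf((i + 0.5) / 100) for i in range(100)]
counts = grouped_frequencies(data, equiprobable_limits(qf, r=5))
print(counts, pearson_sum(counts))
```

In an actual application the limits would be computed at the MLEs, which is why the Pearson sum alone does not follow the usual chi-squared law and the additional NRR term is needed.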

8.2. Modified Chi-Squared Test for the ASE-W Distribution

A modified chi-squared goodness-of-fit test is constructed by fitting the statistic developed in the previous section to verify if a sample is distributed according to the ASE-W model, , with unknown parameters . The MLEs of the unknown parameters of the ASE-W distribution are computed using the insurance claims data. The statistic does not depend on the parameters, we can, therefore, use the estimated Fisher information matrix .

To test the null hypothesis that the insurance claims data came from the ASE-W distribution, we use the statistic. To conduct the analysis, we use the BB algorithm in R software to compute the maximum likelihood estimates given by , and . For the insurance claims data, the estimated Fisher information matrix is

The value of the NRR statistic is given by , whereas the critical value is .

We can see that the value of statistic is less than the critical value. Therefore, we conclude that the insurance claims data follow the ASE-W model.

8.3. Simulation Study of the ASE-W Distribution Using Statistic

To test the null hypothesis that the sample comes from the ASE-W model, we calculate for simulated samples with sample sizes n = 50, n = 100, n = 200, and n = 400, respectively. For different significance levels , we calculate the average of the nonrejections of the null hypothesis, i.e., . We present the results of the corresponding empirical and theoretical levels in Table 11. As can be shown, the values of the empirical levels calculated are very close to those of their corresponding theoretical levels. Thus, we conclude that the proposed test provides a good fit to the ASE-W distribution.

8.4. Simulated Distribution of the Statistic for the ASE-W Model

The statistic follows in the limit chi-squared distribution with degrees of freedom. For demonstrating this fact, we compute times the simulated distribution of under the null hypothesis with different values of parameters and intervals. We sketch the plots of the chi-squared distribution with degree of freedom to see the visual representation. The histograms of the statistic versus the chi-squared distribution with degree of freedom are presented in Figures 11 and 12.

From Figures 11 and 12, we observe that the distribution of with different values of parameters and different numbers of grouping cells for different number of equiprobable grouping intervals and different values of parameters in the limit follows a chi-squared distribution with degrees of freedom within the statistical errors of simulation. Therefore, we can say that the limiting distribution of the generalized chi-squared statistic for ASE-W model is distribution free.

9. Concluding Remarks

In this paper, we used a trigonometric function to introduce a new family of heavy tailed distributions called the arcsine exponentiated-X (ASE-X) family of distributions. The ASE-X family provides good fits to heavy tailed insurance data. We defined a special submodel called the ASE-Weibull (ASE-W) distribution. The maximum likelihood method is used to estimate the ASE-W parameters. The simulation results are obtained using the inversion and Barzilai-Borwein algorithms, assessing the performance of the maximum likelihood estimators. We derived two important risk measures, value at risk and tail value at risk, for the ASE-W distribution and performed a simulation study indicating that the ASE-W distribution has a heavier tail than the baseline Weibull distribution. A heavy tailed insurance dataset is analyzed, showing that the ASE-W distribution provides better fits than some other competing models. Furthermore, the value at risk and tail value at risk measures are estimated for all competing models based on the insurance claims data, indicating that the ASE-W distribution performs better than its competitors. We also constructed a modified chi-squared goodness-of-fit test statistic for the ASE-W distribution, based on the NRR statistic, to show its validity in modeling financial data. The special cases of Table 1 can be studied in future work. Furthermore, different classical and Bayesian methods can be employed to estimate the unknown parameters of these special submodels.

Data Availability

This work is mainly a methodological development and has been applied to secondary insurance data; the data will be provided if required.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The first author also acknowledges “Project of Humanities and Social Sciences Research in Jiangxi Universities in 2018: Risk Control of P2P Network Loan Platform Based on Block Chain Technology (JJ18219).”