Abstract

In this study, a new loss distribution, called the exponentiated Fréchet loss distribution, is developed and studied. Plots of the density function show that the distribution can exhibit different shapes, including right-skewed and decreasing shapes, and various degrees of kurtosis. Several properties of the distribution are obtained, including the moments, mean excess function, limited expected value function, value at risk, tail value at risk, and tail variance. The parameters of the distribution are estimated by the maximum likelihood, maximum product spacing, ordinary least squares, and weighted least squares methods, and the performance of the estimators is investigated through simulation studies. The results indicate that the estimators are consistent. The new distribution is also extended into a regression model. The usefulness and applicability of the new distribution and its regression model are demonstrated using actuarial data sets. The results show that the new loss distribution can be used as an alternative for modelling actuarial data.

1. Introduction

In actuarial practice, data sets need to be modelled appropriately. Doing so leads to accurate calculation of risk measures and insurance premiums and hence to optimal capital allocation, which is essential for risk management. For this reason, probability distributions play a central role in actuarial practice. Several distributions have been used, including the Pareto, gamma, beta, Fréchet, exponential, and Weibull distributions. However, given the nature of actuarial data, specifically loss data, some of these distributions cannot model such data adequately. For instance, loss data are typically heavy-tailed and require distributions with this property [1, 2]. Thus, several new distributions have been developed and studied over the decades for modelling loss data.

Because loss data are typically right-skewed, distributions that exhibit right skewness, such as extreme value distributions and their generalizations, are used to model them. The Fréchet distribution, also known as the type II extreme value distribution, is a special case of the generalized extreme value distribution. It has applications in several fields, including actuarial science, finance, hydrology, and biological studies (see [3]). Owing to its usefulness, several families of distributions have been developed to generalize the Fréchet distribution. These include the odd Fréchet family [4], extended odd Fréchet family [5, 6], transmuted odd Fréchet family [7], generalized odd Fréchet family [8], and exponentiated Fréchet family [9]. Specific generalizations of the Fréchet distribution include the exponentiated Fréchet (EF) [10], beta Fréchet (BF) [11], gamma extended Fréchet [12], Kumaraswamy Fréchet (KF) [13], Weibull Fréchet [14], modified Fréchet [15], beta exponentiated Fréchet (BEF) [16], Burr X Fréchet [17], modified Kies-Fréchet [18], and extended Weibull Fréchet [19] distributions.

In this study, a new extension of the EF distribution, called the exponentiated Fréchet loss (EFL) distribution, is developed and studied. The new distribution is obtained from the family of loss distributions proposed by Ahmad et al. [20]. Regression models are essential for relating a response variable to one or more independent variables. Letting the response variable follow the EFL distribution yields an EFL regression model, which is also developed in this study and whose application is demonstrated.

The rest of the article is organized as follows. Section 2 presents the EFL distribution. Some statistical properties, including the moments and moment generating function, are presented in Section 3. Section 4 presents some actuarial properties, including the mean excess function, limited expected value function, value at risk, tail value at risk, and tail variance. Section 5 presents four methods for estimating the parameters of the distribution. Monte Carlo simulation studies assessing the performance of the estimators are carried out in Section 6. A new regression model based on the EFL distribution is given in Section 7. The usefulness of the new distribution and its regression model is demonstrated on real data sets in Section 8. Section 9 concludes the study.

2. Exponentiated Fréchet Loss Distribution

Let the random variable X follow the family of loss distributions proposed by Ahmad et al. [20]. Its cumulative distribution function (CDF), F(x), is given by equation (1), where G(x) is the CDF of the baseline distribution. In this study, the EF distribution is used as the baseline distribution; its CDF, given by equation (2), has one scale parameter and two shape parameters. Substituting equation (2) into equation (1) gives the CDF of the EFL distribution as follows:

Differentiating equation (3) with respect to x gives the probability density function (PDF) of the EFL distribution as follows:

The hazard rate function of the EFL distribution, defined as $h(x) = f(x)/[1 - F(x)]$, is obtained as follows:

It should be noted that the Fréchet loss distribution is obtained as a special case of the EFL distribution when the shape parameter introduced by exponentiation equals one. To show the flexibility of the EFL distribution, plots of its PDF and hazard rate function for selected parameter values are shown in Figure 1. The PDF can exhibit right-skewed, decreasing, and approximately symmetric shapes, and various degrees of kurtosis. The hazard rate function exhibits decreasing, increasing, and reverse-J shapes.
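As an illustrative sketch only, the construction of the CDF, PDF, and hazard rate can be coded in Python under two assumptions that the extracted text does not confirm: the EF baseline is taken in the Nadarajah–Kotz form $G(x) = 1 - [1 - \exp\{-(\sigma/x)^{\lambda}\}]^{\alpha}$, and the Ahmad et al. generator is taken to be $F(x) = G(x)\exp\{G(x) - 1\}$, a hypothetical choice whose inverse involves the Lambert W function, as the quantile function in Section 3.1 does. The parameter names sigma, lam, and alpha are hypothetical.

```python
import math

# ASSUMPTION: EF baseline in the Nadarajah-Kotz form, scale sigma, shapes lam and alpha.
def ef_cdf(x, sigma, lam, alpha):
    z = (sigma / x) ** lam
    return 1.0 - (1.0 - math.exp(-z)) ** alpha

def ef_pdf(x, sigma, lam, alpha):
    # Derivative of ef_cdf with respect to x.
    z = (sigma / x) ** lam
    return alpha * lam * z * math.exp(-z) * (1.0 - math.exp(-z)) ** (alpha - 1.0) / x

# ASSUMPTION (hypothetical generator): F = G * exp(G - 1), hence f = g * exp(G - 1) * (1 + G).
def efl_cdf(x, sigma, lam, alpha):
    g_cdf = ef_cdf(x, sigma, lam, alpha)
    return g_cdf * math.exp(g_cdf - 1.0)

def efl_pdf(x, sigma, lam, alpha):
    g_cdf = ef_cdf(x, sigma, lam, alpha)
    return ef_pdf(x, sigma, lam, alpha) * math.exp(g_cdf - 1.0) * (1.0 + g_cdf)

def efl_hazard(x, sigma, lam, alpha):
    # Hazard rate h(x) = f(x) / [1 - F(x)]
    return efl_pdf(x, sigma, lam, alpha) / (1.0 - efl_cdf(x, sigma, lam, alpha))

for x in (0.5, 1.0, 2.0, 4.0):
    print(x, efl_cdf(x, 1.0, 2.0, 1.5), efl_hazard(x, 1.0, 2.0, 1.5))
```

Evaluating the hazard on a grid of x values for several parameter sets reproduces the kind of curves shown in Figure 1, provided the assumed forms match the paper's equations.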

2.1. Expansion of PDF

The expansion of the PDF of the EFL distribution is obtained in this subsection. It is useful for deriving quantities that involve integrals of the PDF or its functions, such as the moments and the moment generating function. Using the generalized binomial expansion $(1 - z)^{b} = \sum_{k=0}^{\infty} (-1)^{k} \binom{b}{k} z^{k}$, valid for $|z| < 1$ and real $b$, we have

Also, using the binomial expansion $(1 - z)^{n} = \sum_{k=0}^{n} (-1)^{k} \binom{n}{k} z^{k}$ for a non-negative integer $n$, we have

Substituting equations (6) and (7) into the PDF of the EFL distribution in equation (4) gives

Using the expansion , where [21, 22], we have

Substituting equations (9) and (10) into equation (8), and after some algebraic manipulations, gives the PDF of the EFL distribution as follows:where , , and
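The generalized binomial expansion used in this subsection converges for $|z| < 1$ and any real exponent. As a small numerical illustration, not part of the paper's derivation, a truncated series can be compared with the closed form; the helper names are hypothetical.

```python
def gen_binom(b, k):
    """Generalized binomial coefficient C(b, k) for real b and integer k >= 0."""
    coef = 1.0
    for j in range(k):
        coef *= (b - j) / (j + 1)
    return coef

def binomial_series(z, b, terms=200):
    """Truncated sum of (-1)^k C(b, k) z^k, which approximates (1 - z)^b for |z| < 1."""
    return sum((-1) ** k * gen_binom(b, k) * z ** k for k in range(terms))

print(binomial_series(0.3, 1.7), (1 - 0.3) ** 1.7)
```

The truncation level needed in practice grows as |z| approaches 1, which is worth keeping in mind when the expanded PDF is evaluated numerically.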

3. Statistical Properties

In this section, some statistical properties of the EFL distribution are obtained. These include the quantile function, ordinary and incomplete moments, and moment generating function.

3.1. Quantile Function

The quantile function of a distribution can be used to characterize the distribution, to generate random numbers from it, and to obtain quantile-based measures such as skewness and kurtosis. The quantile function is the inverse of the CDF, that is, $Q(u) = F^{-1}(u)$ for $0 < u < 1$. For the EFL distribution, the quantile function, equation (12), is expressed in terms of the Lambert W function $W(\cdot)$. Substituting $u = 0.5$ into the quantile function gives the median of the EFL distribution as follows:

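Moors' kurtosis and Bowley's skewness can be defined in terms of the quantile function, respectively, as follows:

$$M = \frac{Q(7/8) - Q(5/8) + Q(3/8) - Q(1/8)}{Q(6/8) - Q(2/8)}, \qquad B = \frac{Q(3/4) + Q(1/4) - 2Q(1/2)}{Q(3/4) - Q(1/4)}.$$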

Plots of the kurtosis and skewness of the EFL distribution are obtained and shown in Figure 2 for , , and a range of values for and . It can be observed that the EFL distribution can assume various degrees of kurtosis and also assume both negative and positive skewness.
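A minimal sketch of how the quantile function, median, and the quantile-based measures can be computed, under the same unconfirmed assumptions as the construction sketch (Nadarajah–Kotz EF baseline and the hypothetical generator $F = G\,e^{G-1}$). Under that generator, $u = G e^{G-1}$ gives $G = W(u e)$, and the baseline is then inverted in closed form. The Newton-based Lambert W solver is this sketch's own helper, not a library routine, and the parameter values are hypothetical.

```python
import math

def lambert_w0(y, tol=1e-14, max_iter=100):
    """Principal branch of the Lambert W function for y >= 0 (Newton's method)."""
    w = math.log1p(y)  # lies at or above the root, so the iterates decrease monotonically
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - y) / (ew * (w + 1.0))
        w -= step
        if abs(step) < tol:
            break
    return w

# ASSUMPTION: Nadarajah-Kotz EF baseline and the hypothetical generator F = G * exp(G - 1).
def ef_cdf(x, sigma, lam, alpha):
    z = (sigma / x) ** lam
    return 1.0 - (1.0 - math.exp(-z)) ** alpha

def efl_cdf(x, sigma, lam, alpha):
    g_cdf = ef_cdf(x, sigma, lam, alpha)
    return g_cdf * math.exp(g_cdf - 1.0)

def ef_quantile(p, sigma, lam, alpha):
    """Closed-form inverse of the assumed EF baseline CDF."""
    z = -math.log(1.0 - (1.0 - p) ** (1.0 / alpha))
    return sigma * z ** (-1.0 / lam)

def efl_quantile(u, sigma, lam, alpha):
    """u = G exp(G - 1)  <=>  G exp(G) = u e  <=>  G = W(u e); then invert the baseline."""
    return ef_quantile(lambert_w0(u * math.e), sigma, lam, alpha)

def bowley_skewness(q):
    """Bowley (quartile) skewness from a quantile function q(u)."""
    q1, q2, q3 = q(0.25), q(0.5), q(0.75)
    return (q3 + q1 - 2.0 * q2) / (q3 - q1)

def moors_kurtosis(q):
    """Moors (octile) kurtosis from a quantile function q(u)."""
    return (q(7 / 8) - q(5 / 8) + q(3 / 8) - q(1 / 8)) / (q(6 / 8) - q(2 / 8))

efl_q = lambda u: efl_quantile(u, 1.0, 2.0, 1.5)  # hypothetical parameter values
print("median:", efl_q(0.5))
print("Bowley:", bowley_skewness(efl_q), "Moors:", moors_kurtosis(efl_q))
```

Feeding uniform random numbers to efl_quantile gives inverse-transform samples, which is how step (i) of the simulation in Section 6 generates data.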

3.2. Moments and Moment Generating Function

The ordinary moments, incomplete moments, and moment generating function (MGF) of the EFL distribution are given in this section. These properties are useful for characterizing the distribution and for obtaining some other properties of the distribution such as variance, coefficients of skewness and kurtosis, and mean excess function.

3.2.1. Ordinary Moments

The rth ordinary moment of a distribution with PDF $f(x)$ is defined as $\mu'_r = E(X^r) = \int_0^\infty x^r f(x)\,dx$. Thus, the rth ordinary moment of the EFL distribution is obtained by substituting the PDF of the distribution in equation (11) into this definition. This is given as follows:

Letting and in the first and second integrals, respectively, in equation (16), and after some algebraic manipulation, gives the rth ordinary moment of the EFL distribution in equation (17), where $\Gamma(\cdot)$ is the gamma function and and are as defined in equation (11). The mean of the EFL distribution is obtained by letting $r = 1$ in equation (17). Thus, the mean of the EFL distribution is given as follows:

Important measures such as the standard deviation ($\sigma$), coefficient of variation (CV), coefficient of skewness (CS), and coefficient of kurtosis (CK) of the EFL distribution can be obtained from its ordinary moments $\mu'_r = E(X^r)$. With $\mu = \mu'_1$, the standard deviation and CV, which are measures of risk, are defined as $\sigma = \sqrt{\mu'_2 - \mu^2}$ and $\mathrm{CV} = \sigma/\mu$, respectively. Also, CS and CK are defined, respectively, as $\mathrm{CS} = (\mu'_3 - 3\mu\mu'_2 + 2\mu^3)/\sigma^3$ and $\mathrm{CK} = (\mu'_4 - 4\mu\mu'_3 + 6\mu^2\mu'_2 - 3\mu^4)/\sigma^4$. Table 1 shows the first four moments, $\sigma$, CV, CS, and CK of the EFL distribution for three sets of parameter values. Again, the EFL distribution can exhibit various degrees of kurtosis and skewness, including negative skewness.
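The conversion from the first four raw moments to $\sigma$, CV, CS, and CK is mechanical, so a short helper makes the formulas easy to check. This is a generic sketch: it assumes the raw moments are finite, which for a Fréchet-type distribution holds only for orders below the tail index. The standard exponential, with raw moments 1, 2, 6, 24, serves as a check.

```python
import math

def moment_summary(m1, m2, m3, m4):
    """SD, CV, CS and CK from the first four raw moments mu'_1..mu'_4."""
    var = m2 - m1 ** 2
    sd = math.sqrt(var)
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3                     # third central moment
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4  # fourth central moment
    return {"sd": sd, "cv": sd / m1, "cs": mu3 / sd ** 3, "ck": mu4 / var ** 2}

print(moment_summary(1, 2, 6, 24))  # exponential(1): sd=1, cv=1, cs=2, ck=9
```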

3.2.2. Incomplete Moments

The rth incomplete moment of a distribution with PDF $f(x)$ is defined as $m_r(t) = \int_0^t x^r f(x)\,dx$. Substituting the PDF of the EFL distribution in equation (11) into this definition gives

After a substitution similar to that used for the ordinary moments and some algebraic manipulation, the rth incomplete moment of the EFL distribution is obtained in equation (20), where $\Gamma(a, z) = \int_z^\infty y^{a-1} e^{-y}\,dy$ is the upper incomplete gamma function. The first incomplete moment of the EFL distribution is given as follows:
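Closed forms such as the incomplete moment above can be cross-checked by direct quadrature of $m_r(t) = \int_0^t x^r f(x)\,dx$. The sketch below uses the composite Simpson rule, which is this sketch's own choice. Because a Fréchet-type density contains $\sigma/x$, a small positive lower bound (the `lower` argument, a hypothetical parameter) avoids evaluating the density at zero. The standard exponential, with $m_1(t) = 1 - e^{-t}(1 + t)$, serves as a check.

```python
import math

def simpson(func, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of sub-intervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3.0

def incomplete_moment(pdf, r, t, lower=0.0, n=2000):
    """m_r(t) = integral of x**r * f(x) over (lower, t); use lower > 0 if f(0) is undefined."""
    return simpson(lambda x: x ** r * pdf(x), lower, t, n)

# Check against the standard exponential, where m_1(t) = 1 - exp(-t) * (1 + t).
print(incomplete_moment(lambda x: math.exp(-x), 1, 2.0), 1 - math.exp(-2.0) * 3.0)
```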

3.3. Moment Generating Function

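The moment generating function (MGF) of a distribution is defined as $M_X(t) = E(e^{tX})$ and is useful for obtaining the moments of the distribution. Using a Taylor series expansion, the MGF can be written as $M_X(t) = \sum_{r=0}^{\infty} \frac{t^r}{r!}\mu'_r$. Because the Fréchet baseline has a polynomially decaying right tail, however, $\mu'_r$ is finite only for orders r below the tail index, so $E(e^{tX})$ diverges for every $t > 0$ and the resulting series should be read as a formal expansion. Substituting the ordinary moment in equation (17) into this expansion gives the MGF of the EFL distribution as follows: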

4. Actuarial Properties

In this section, some actuarial properties of the EFL distribution are obtained. These include mean excess function, limited expected value function, value at risk, tail value at risk, and tail variance.

4.1. Mean Excess Function

The mean excess function is useful in many fields. It is also known as the mean residual life function or the complete expectation of life. In an insurance context, for a policy with a fixed deductible t, the mean excess function is the expected payment per claim exceeding the deductible, since losses below t are not paid. In a mortality context, it is the expected remaining lifetime of an individual who has attained age t. The mean excess function is defined as follows:

$$e(t) = E(X - t \mid X > t) = \frac{1}{1 - F(t)}\int_t^\infty x f(x)\,dx - t.$$

Using the PDF of the EFL distribution given in equation (11), we have

Letting and , and after some algebraic manipulations, we havewhere . Substituting equation (25) into equation (23) gives the mean excess function of the EFL distribution as follows:

Figure 3 shows some plots of the mean excess function for three sets of parameter values of the EFL distribution. It can be observed that the mean excess function generally increases and can also assume both linear and nonlinear shapes.
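The mean excess function can be evaluated numerically through the identity $\int_t^\infty x f(x)\,dx = E(X) - m_1(t)$, which requires a finite mean. The sketch below is generic: it takes the density, CDF, and mean as inputs, and the exponential distribution, whose mean excess function is constant and equal to its mean, stands in for the EFL distribution as a check.

```python
import math

def simpson(func, a, b, n=4000):
    """Composite Simpson rule on [a, b] with an even number n of sub-intervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3.0

def mean_excess(pdf, cdf, mean, t, lower=0.0, n=4000):
    """e(t) = [E(X) - m_1(t)] / [1 - F(t)] - t, where m_1(t) is integrated over (lower, t)."""
    m1 = simpson(lambda x: x * pdf(x), lower, t, n)
    return (mean - m1) / (1.0 - cdf(t)) - t

# Exponential with rate 0.5 (mean 2): e(t) = 2 for every t.
pdf = lambda x: 0.5 * math.exp(-0.5 * x)
cdf = lambda x: 1.0 - math.exp(-0.5 * x)
print([round(mean_excess(pdf, cdf, 2.0, t), 6) for t in (0.5, 1.0, 3.0)])
```

For large t the numerator is a small difference of two nearly equal numbers, so integrating $x f(x)$ over $(t, \infty)$ directly may be more accurate in the far tail.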

4.2. Limited Expected Value Function

Given a policy limit, or a deductible from a reinsurance perspective, say u, the limited loss random variable is defined as follows:

$$X \wedge u = \min(X, u) = \begin{cases} X, & X < u,\\ u, & X \ge u. \end{cases}$$

The limited expected value function is the expectation of the limited loss random variable, $E[\min(X, u)] = m_1(u) + u[1 - F(u)]$, where $m_1(u)$ is the first incomplete moment given in equation (21). Substituting equations (3) and (21) into this definition gives the limited expected value function of the EFL distribution as follows:
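The definition $E[\min(X, u)] = m_1(u) + u[1 - F(u)]$ translates directly into a numerical check that can be applied to any fitted density and CDF. In this generic sketch the Simpson quadrature is the sketch's own choice, and the standard exponential, for which $E[\min(X, u)] = 1 - e^{-u}$, is a stand-in for the EFL distribution.

```python
import math

def simpson(func, a, b, n=4000):
    """Composite Simpson rule on [a, b] with an even number n of sub-intervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3.0

def limited_expected_value(pdf, cdf, u, lower=0.0, n=4000):
    """E[min(X, u)] = m_1(u) + u * [1 - F(u)]."""
    return simpson(lambda x: x * pdf(x), lower, u, n) + u * (1.0 - cdf(u))

pdf = lambda x: math.exp(-x)
cdf = lambda x: 1.0 - math.exp(-x)
print(limited_expected_value(pdf, cdf, 1.5), 1.0 - math.exp(-1.5))
```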

4.3. Value at Risk

Value at risk (VaR) is a commonly used risk measure. VaR is the loss that will not be exceeded with a given probability. Mathematically, given a probability q, $P(X \le \mathrm{VaR}_q) = q$. Thus, VaR is also known as a quantile risk measure and, for a continuous distribution, is given by $\mathrm{VaR}_q = Q(q) = F^{-1}(q)$. The VaR of the EFL distribution at probability q is therefore obtained from its quantile function as follows:
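When a closed-form quantile is inconvenient, $\mathrm{VaR}_q$ can also be obtained by solving $F(x) = q$ numerically. This generic sketch uses bisection on any continuous CDF, expanding the upper bracket until it contains the root; the exponential CDF, with $\mathrm{VaR}_{0.95} = -\log 0.05$, is a stand-in for checking.

```python
import math

def var_by_bisection(cdf, q, lo=1e-12, hi=1.0, tol=1e-12, max_iter=500):
    """Solve F(x) = q for x (the VaR at level q) by bisection; requires F(lo) < q."""
    while cdf(hi) < q:  # expand the bracket until it contains the quantile
        hi *= 2.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < q:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol * max(1.0, hi):
            break
    return 0.5 * (lo + hi)

exp_cdf = lambda x: 1.0 - math.exp(-x)
print(var_by_bisection(exp_cdf, 0.95), -math.log(0.05))
```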

4.4. Tail Value at Risk

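Tail value at risk (TVaR) is the expected loss given that the loss exceeds VaR; that is, it measures the expected size of losses in the worst $100(1-q)\%$ of cases. Given $\mathrm{VaR}_q$, TVaR is defined as follows:

$$\mathrm{TVaR}_q = E(X \mid X > \mathrm{VaR}_q) = \frac{1}{1-q}\int_{\mathrm{VaR}_q}^{\infty} x f(x)\,dx.$$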

Substituting equation (25), with $t = \mathrm{VaR}_q$, into this definition gives the TVaR of the EFL distribution as follows:

Figure 4 shows plots of simulated values of VaR and TVaR for different parameter values over a range of confidence levels. Higher confidence levels are associated with higher VaR and TVaR. This is consistent with practice: a company that wants protection at a higher confidence level must allocate more capital for risk management purposes.
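For a continuous distribution with finite mean, TVaR can also be written as $\mathrm{TVaR}_q = \frac{1}{1-q}\int_q^1 Q(u)\,du$, which gives a simple way to reproduce curves like those in Figure 4 from any quantile function. The generic sketch below uses the midpoint rule, which never evaluates $Q$ at $u = 1$. For heavy-tailed EFL parameter sets convergence is slower, and TVaR is infinite when the tail index is at most one. The exponential quantile, for which $\mathrm{TVaR}_q = \mathrm{VaR}_q + 1$, is a stand-in.

```python
import math

def tvar(quantile, q, n=100_000):
    """TVaR_q = (1 / (1 - q)) * integral of Q(u) over (q, 1), by the midpoint rule."""
    h = (1.0 - q) / n
    return sum(quantile(q + (k + 0.5) * h) for k in range(n)) / n

exp_q = lambda u: -math.log1p(-u)  # standard exponential quantile function
for level in (0.90, 0.95, 0.99):
    print(level, exp_q(level), tvar(exp_q, level))  # VaR and TVaR both rise with the level
```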

4.5. Tail Variance

TVaR measures the expectation of losses exceeding VaR but not their variability. Tail variance (TV) measures the conditional variance of losses given that they exceed VaR at a given probability. TV at probability q is defined as follows:

$$\mathrm{TV}_q = E(X^2 \mid X > \mathrm{VaR}_q) - (\mathrm{TVaR}_q)^2.$$

Using the PDF of the EFL distribution in equation (11), we have

With the necessary substitutions and algebraic manipulations, we havewhere . Substituting equation (35) into equation (33) gives the TV of the EFL distribution as follows:
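The tail variance can be checked numerically in the same way as TVaR, using $E(X^2 \mid X > \mathrm{VaR}_q) = \frac{1}{1-q}\int_q^1 Q(u)^2\,du$ for a continuous distribution with finite second moment. In this generic sketch (midpoint rule, exponential stand-in), the exceedance over VaR is again standard exponential by memorylessness, so $\mathrm{TV}_q = 1$.

```python
import math

def tail_variance(quantile, q, n=200_000):
    """TV_q = E(X^2 | X > VaR_q) - TVaR_q^2, both terms by the midpoint rule on (q, 1)."""
    h = (1.0 - q) / n
    points = [quantile(q + (k + 0.5) * h) for k in range(n)]
    first = sum(points) / n                # TVaR_q
    second = sum(x * x for x in points) / n  # E(X^2 | X > VaR_q)
    return second - first ** 2

exp_q = lambda u: -math.log1p(-u)  # standard exponential quantile function
print(tail_variance(exp_q, 0.95))
```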

5. Parameter Estimation Methods

Estimators of the parameters of the EFL distribution are presented in this section. Four estimation methods are considered: maximum likelihood, maximum product spacing, ordinary least squares, and weighted least squares.

5.1. Maximum Likelihood Estimation

Let $x_1, \dots, x_n$ be a random sample of size n from the EFL distribution with parameter vector $\Theta$. Using the density in equation (4), the log-likelihood function, $\ell(\Theta) = \sum_{i=1}^{n} \log f(x_i; \Theta)$, is obtained as follows:

Setting the score functions, obtained by differentiating equation (37) with respect to each parameter, equal to zero and solving the resulting equations simultaneously gives the maximum likelihood estimates (MLEs) of the parameters of the EFL distribution. Because the score equations have no closed-form solution, numerical methods are used to obtain the estimates.
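As a language-neutral sketch of the numerical step, the code below minimizes a negative log-likelihood with a small hand-written Nelder–Mead routine. The optimizer and the log-parametrization that keeps positive parameters positive are this sketch's own choices, not the paper's. An exponential log-density stands in for the EFL density so that the result can be checked against the closed-form MLE, 1/mean.

```python
import math

def nelder_mead(func, x0, step=0.5, tol=1e-12, max_iter=10_000):
    """Minimize func over R^n with the Nelder-Mead simplex method."""
    dim = len(x0)
    simplex = [list(x0)] + [
        [x0[j] + (step if j == i else 0.0) for j in range(dim)] for i in range(dim)
    ]
    values = [func(p) for p in simplex]
    for _ in range(max_iter):
        order = sorted(range(dim + 1), key=values.__getitem__)
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        if values[-1] - values[0] < tol:
            break
        centroid = [sum(p[j] for p in simplex[:-1]) / dim for j in range(dim)]
        worst = simplex[-1]

        def toward(coef):
            # Point centroid + coef * (centroid - worst).
            return [c + coef * (c - w) for c, w in zip(centroid, worst)]

        reflected = toward(1.0)
        f_ref = func(reflected)
        if f_ref < values[0]:
            expanded = toward(2.0)
            f_exp = func(expanded)
            simplex[-1], values[-1] = (expanded, f_exp) if f_exp < f_ref else (reflected, f_ref)
        elif f_ref < values[-2]:
            simplex[-1], values[-1] = reflected, f_ref
        else:
            contracted = toward(-0.5)
            f_con = func(contracted)
            if f_con < values[-1]:
                simplex[-1], values[-1] = contracted, f_con
            else:  # shrink every vertex towards the best one
                best = simplex[0]
                simplex = [best] + [[b + 0.5 * (v - b) for b, v in zip(best, p)] for p in simplex[1:]]
                values = [values[0]] + [func(p) for p in simplex[1:]]
    best_index = min(range(dim + 1), key=values.__getitem__)
    return simplex[best_index]

def negative_log_likelihood(log_params, data, logpdf):
    """-sum log f(x_i; theta), with theta = exp(log_params) to keep parameters positive."""
    theta = [math.exp(v) for v in log_params]
    return -sum(logpdf(x, *theta) for x in data)

# Usage with an exponential log-density as a stand-in for the EFL density.
data = [1.0, 2.0, 3.0]
exp_logpdf = lambda x, rate: math.log(rate) - rate * x
log_rate_hat = nelder_mead(lambda p: negative_log_likelihood(p, data, exp_logpdf), [0.0])
rate_hat = math.exp(log_rate_hat[0])  # closed-form MLE is 1 / mean = 0.5
print(rate_hat)
```

For the EFL distribution, the same pattern applies with a three-element (or larger) starting vector and the EFL log-density; standard errors would then come from the inverse of a numerically differentiated Hessian.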

5.2. Maximum Product Spacing Estimation

The maximum product spacing (MPS) method is an alternative to maximum likelihood for estimating parameters. Let $x_{(1)} < x_{(2)} < \dots < x_{(n)}$ be the ordered sample from the EFL distribution with CDF given in equation (3). The uniform spacings are defined as $D_i = F(x_{(i)}; \Theta) - F(x_{(i-1)}; \Theta)$, $i = 1, \dots, n+1$, where $F(x_{(0)}; \Theta) = 0$ and $F(x_{(n+1)}; \Theta) = 1$. The MPS estimates of the parameters are obtained by maximizing

$$\Delta(\Theta) = \frac{1}{n+1}\sum_{i=1}^{n+1} \log D_i$$

with respect to each parameter.
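The MPS criterion only needs the fitted CDF at the ordered sample, so it is short to code. In this generic sketch, the floor on zero spacings (which occur with tied observations) is the sketch's own safeguard. The function returns the quantity to be maximized, so its negative can be passed to any numerical minimizer, such as R's nlminb. A uniform CDF with evenly spaced data gives a hand-checkable value.

```python
import math

def mps_objective(cdf, sample, tiny=1e-300):
    """(1/(n+1)) * sum of log D_i, with D_i = F(x_(i)) - F(x_(i-1)), F(x_(0)) = 0, F(x_(n+1)) = 1."""
    u = [0.0] + [cdf(x) for x in sorted(sample)] + [1.0]
    spacings = [max(b - a, tiny) for a, b in zip(u, u[1:])]  # guard against zero spacings
    return sum(math.log(d) for d in spacings) / (len(u) - 1)

# Uniform(0, 1) CDF as a stand-in: all four spacings equal 0.25.
print(mps_objective(lambda x: x, [0.75, 0.25, 0.5]))
```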

5.3. Ordinary and Weighted Least Squares Estimation

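Let $x_{(1)} < x_{(2)} < \dots < x_{(n)}$ be the ordered sample from the EFL distribution. The ordinary least squares (OLS) estimates of its parameters are obtained by minimizing

$$\mathrm{LS}(\Theta) = \sum_{i=1}^{n}\left[F(x_{(i)}; \Theta) - \frac{i}{n+1}\right]^2$$

with respect to the parameters of the distribution. Also, the weighted least squares (WLS) estimates are obtained by minimizing the following function with respect to the parameters of the distribution:

$$\mathrm{WLS}(\Theta) = \sum_{i=1}^{n}\frac{(n+1)^2(n+2)}{i(n-i+1)}\left[F(x_{(i)}; \Theta) - \frac{i}{n+1}\right]^2.$$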
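Both least squares criteria compare the fitted CDF at each order statistic with its expected value $i/(n+1)$; WLS weights each term by the reciprocal of the variance of a uniform order statistic. The generic sketch below returns both criteria for a given CDF, with a uniform CDF and two data points as a hand-checkable stand-in (OLS = 1/180, WLS = 0.1).

```python
def ls_objectives(cdf, sample):
    """Return the (LS, WLS) criteria for a CDF evaluated at the ordered sample."""
    x_sorted = sorted(sample)
    n = len(x_sorted)
    ls = wls = 0.0
    for i, x in enumerate(x_sorted, start=1):
        resid = cdf(x) - i / (n + 1)
        weight = (n + 1) ** 2 * (n + 2) / (i * (n - i + 1))
        ls += resid ** 2
        wls += weight * resid ** 2
    return ls, wls

print(ls_objectives(lambda x: x, [0.6, 0.3]))  # uniform(0, 1) CDF as a stand-in
```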

6. Simulation Studies

Simulation studies are carried out in this section to assess the performance of the parameter estimators. The simulations are run in R, using the nlminb function, which performs box-constrained optimization with the PORT routines, for the numerical optimization. The simulation procedure is as follows:
(i) Generate samples of size n from the EFL distribution using its quantile function in equation (12).
(ii) Compute the MLE, MPS, OLS, and WLS parameter estimates from the samples obtained in step (i).
(iii) For each parameter estimate, compute the average estimate (AE), absolute bias (AB), and root mean square error (RMSE).
(iv) Repeat steps (i) to (iii) for two parameter sets.
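The steps above can be sketched in a few lines. Two assumptions are specific to this illustration: AB is taken as the average absolute deviation of the estimates from the true value (the paper's exact definition is not recoverable here), and the closed-form MLE of an exponential rate, n divided by the sample sum, stands in for the numerically computed EFL estimators. Samples are drawn by inverse-transform sampling, as in step (i).

```python
import math
import random

def summarize(estimates, true_value):
    """AE, AB (average absolute deviation) and RMSE over Monte Carlo replications."""
    n_rep = len(estimates)
    ae = sum(estimates) / n_rep
    ab = sum(abs(e - true_value) for e in estimates) / n_rep
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / n_rep)
    return ae, ab, rmse

def simulate_exponential_mle(rate, n, n_rep, seed=1):
    """Inverse-transform sampling from Exp(rate) and the closed-form MLE n / sum(x)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_rep):
        sample = [-math.log(1.0 - rng.random()) / rate for _ in range(n)]
        estimates.append(n / sum(sample))
    return summarize(estimates, rate)

for n in (20, 100, 500):
    print(n, simulate_exponential_mle(rate=2.0, n=n, n_rep=500))
```

As in Tables 2 and 3, consistency shows up as AE approaching the true value while AB and RMSE shrink as n grows.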

Tables 2 and 3 show the simulation results. The results indicate that all the estimation methods are consistent: as the sample size increases, the AEs approach the true parameter values, while the ABs and RMSEs decrease towards zero. For smaller sample sizes, WLS generally performed better for and , while MPS and MLE performed better for and , respectively, in both sets of simulations. For larger sample sizes, MLE and MPS generally performed better for all the parameters in both simulations. Owing to the desirable asymptotic properties of MLE, it is used to estimate the parameters of the distribution in the applications.

7. EFL Regression Model

Regression analysis plays an important role in data analysis in most fields, including actuarial science. In this section, a new regression model in which the response variable follows the EFL distribution is given. Consider the regression structure $g(\vartheta_i) = \mathbf{x}_i^\top \boldsymbol{\beta}$, where $\vartheta_i$ is the linked quantity for the ith observation, $\mathbf{x}_i$ is the vector of independent variables, and $\boldsymbol{\beta}$ is the vector of regression parameters. The function $g(\cdot)$ is a link function that links the response variable to the independent variables. Generally, the response variable is linked to the independent variables through the mean, but it can also be linked through a quantile or a model parameter. When a model parameter is used, it is a scale or shape parameter [2]. In this study, the shape parameter is used, together with the log link function, so that $\vartheta_i = \exp(\mathbf{x}_i^\top \boldsymbol{\beta})$. This gives a response variable that follows the EFL distribution with the chosen shape parameter replaced by $\vartheta_i$. The PDF of the EFL regression model is given as follows:

The parameters of the EFL regression model can be obtained via the maximum likelihood method by maximizing the log-likelihood function given by

For practical purposes, after a model is fitted, residual analysis is used to diagnose it and assess its adequacy. In this study, Cox–Snell [23] residuals are employed. They are defined as $r_i = -\log[1 - F(y_i; \hat{\Theta})]$, $i = 1, \dots, n$, where $\hat{\Theta}$ is the vector of estimated parameters. If the model fits the data well, the Cox–Snell residuals approximately follow the standard exponential distribution, so model adequacy can also be checked graphically.
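Given the fitted CDF values $F(y_i; \hat{\Theta})$, computing Cox–Snell residuals and the coordinates of a P-P plot against the standard exponential distribution takes only a few lines. In this sketch the empirical plotting position $i/(n+1)$ is an assumption, since the plotting position used for Figure 9 is not stated.

```python
import math

def cox_snell_residuals(cdf_values):
    """r_i = -log(1 - F(y_i; fitted parameters)), computed from fitted CDF values."""
    return [-math.log1p(-u) for u in cdf_values]

def exponential_pp_points(residuals):
    """P-P pairs (i/(n+1), 1 - exp(-r_(i))) against the standard exponential distribution."""
    r_sorted = sorted(residuals)
    n = len(r_sorted)
    return [((i + 1) / (n + 1), 1.0 - math.exp(-r)) for i, r in enumerate(r_sorted)]

res = cox_snell_residuals([1.0 - math.exp(-1.0), 0.2, 0.7])
print(res)
print(exponential_pp_points(res))  # points near the diagonal indicate an adequate fit
```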

7.1. Simulation Studies

A Monte Carlo simulation study is carried out to assess the maximum likelihood estimators of the parameters of the EFL regression model. Three independent variables are considered in this simulation, so the regression structure used is

The simulation procedure is as follows:
(i) Generate 3,000 samples of size n from the EFL distribution using its quantile function, with the three independent variables generated from a uniform distribution.
(ii) Obtain the MLEs of the parameters.
(iii) Compute the AE, AB, and RMSE of the parameter estimates.
(iv) Repeat steps (i) to (iii) for two sets of true parameter values.

The results of the simulation study are given in Table 4. It can be observed that the estimators of the parameters are consistent as the AE gets closer to the true parameter values, while AB and RMSE decrease with increasing sample size.

8. Applications to Real Data

In this section, the applications of the EFL distribution and EFL regression model to real data are demonstrated.

8.1. Application of EFL Distribution

In this subsection, the application of the EFL distribution is considered. Its performance is compared with that of several other distributions using the Cramér–von Mises (CVM) and Anderson–Darling (AD) goodness-of-fit statistics; the distribution with the smallest values of these statistics is considered the best. The distributions compared with the EFL distribution are the Fréchet (F), exponentiated Fréchet (EF), beta exponentiated Fréchet (BEF), Kumaraswamy Fréchet (KF), Weibull (W), and Weibull loss (WL) [20] distributions.
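Both goodness-of-fit statistics depend on the data only through the fitted CDF values at the ordered observations. The sketch below computes the standard, unmodified $W^2$ and $A^2$ statistics; the paper may use modified versions (for example, those with small-sample adjustments), so values need not match Tables 7 and 10 exactly.

```python
import math

def cvm_ad(cdf_values):
    """Cramér-von Mises W^2 and Anderson-Darling A^2 from fitted CDF values F(x_i)."""
    u = sorted(cdf_values)
    n = len(u)
    w2 = sum((u[i] - (2 * i + 1) / (2 * n)) ** 2 for i in range(n)) + 1 / (12 * n)
    a2 = -n - sum(
        (2 * i + 1) * (math.log(u[i]) + math.log1p(-u[n - 1 - i])) for i in range(n)
    ) / n
    return w2, a2

print(cvm_ad([0.1, 0.35, 0.5, 0.8, 0.95]))  # illustrative fitted CDF values
```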

8.1.1. Data 1: Catastrophe Data

The first data set consists of the costs associated with natural catastrophes in Australia from 1967 to 2014. The normed cost, in millions of 2014 Australian dollars (AUD), computed as the cost inflated using the consumer price index, is used. The data can be found in the CASdatasets package [24] of the R program under the name auscathist. Table 5 shows the descriptive statistics of the data. The data set has 206 observations with a wide range of values. The median is less than the mean, which suggests that the data are right-skewed.

Figure 5 shows the histogram and box plot of the data. Both plots confirm that the data are right-skewed, which suggests that a right-skewed model such as the EFL distribution may be suitable for the data.

The parameter estimates and their corresponding standard errors, in brackets, of the EFL distribution and the other competing distributions are shown in Table 6.

Table 7 shows the goodness-of-fit measures of the distributions. The EFL distribution fits the data better than the competing distributions, as it has the smallest values of all the goodness-of-fit measures, with large corresponding p-values.

Figure 6 shows the PDF plot superimposed on the histogram of the data, the CDF, and probability-probability (P-P) plots of the EFL distribution. It can be observed that the EFL distribution fits the data.

8.1.2. Data 2: Automobile Collision Data

The second data set consists of claim severity, the inflation-adjusted average claim amount (in pounds sterling), for automobile collisions in the United Kingdom. The data can be found in the insuranceData package [25] of the R program under the name AutoCollision. Table 8 shows the descriptive statistics of the data. The data set consists of 31 observations and is positively skewed, as its mean is greater than its median.

Figure 7 shows the histogram and box plot of the data. Both plots show that the data are positively skewed, consistent with the descriptive statistics in Table 8.

Table 9 shows the parameter estimates of the EFL distribution and the other competing distributions with their standard errors in brackets.

The goodness-of-fit measures of the fitted distributions are shown in Table 10 with their corresponding p-values. The EFL distribution has the smallest values of the measures and the largest p-values.

Figure 8 shows the histogram of the data with the fitted PDF, the CDF, and P-P plots of the EFL distribution. It can be observed that the EFL distribution can be used to model the automobile collision data.

8.2. Application of EFL Regression Model

This subsection presents an application of the EFL regression model to a real data set. The data are obtained from the insuranceData package [25] in the R program under the name dataOhlsson and come from the former Swedish insurance company Wasa. The data set contains aggregated data on all insurance policies and claims from 1994 to 1998. The variables used are the claim cost (in units of 10,000 Swedish kronor), vehicle age, and MC class, a classification by the so-called EV ratio, defined as (engine power in kW × 100)/(vehicle weight in kg + 75), rounded down to the nearest integer. The 75 kg represents the average driver weight. The EV ratios are divided into seven classes. This data set was analyzed in a regression context by Gündüz and Genç [2]. The descriptive statistics of the claims and the frequencies of the MC classes are given in Table 11. There are 670 observations with a positive number of claims. MC class 6 has the highest number of occurrences, and class 7 the lowest.

The independent variable MC class is a categorical variable with seven levels and is coded using indicator variables for the regression model. For a categorical variable with k levels, k − 1 indicator variables are introduced, and one of the categories is chosen as the reference level. Usually, the level with the highest frequency is used as the reference level. Similar to Gündüz and Genç [2], level 6 of the MC class is chosen as the reference level because it has the highest number of occurrences, as shown in Table 11. Indicator variables are therefore defined for the remaining levels, namely MC classes 1, 2, 3, 4, 5, and 7. Hence, the regression model considered is given as follows:

The performance of the EFL regression model is compared with the EF regression model with parametrization I as defined by Gündüz and Genç [2].
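The EV ratio, the reference-level indicator coding of the MC class, and the log link described above can be sketched as follows. The mapping from EV ratio to the seven MC classes is not given in the text and is not implemented, and the coefficient values in the usage line are hypothetical.

```python
import math

LEVELS = [1, 2, 3, 4, 5, 6, 7]
REFERENCE = 6  # most frequent MC class, used as the reference level

def ev_ratio(engine_kw, weight_kg):
    """EV ratio: (engine power in kW * 100) / (vehicle weight in kg + 75), rounded down."""
    return math.floor(engine_kw * 100 / (weight_kg + 75))

def mc_indicators(mc_class):
    """k - 1 = 6 indicator variables, one per non-reference level (classes 1-5 and 7)."""
    return [1 if mc_class == level else 0 for level in LEVELS if level != REFERENCE]

def linked_parameter(covariates, beta):
    """Log link: parameter_i = exp(x_i' beta), where x_i starts with 1 for the intercept."""
    return math.exp(sum(x * b for x, b in zip(covariates, beta)))

# Hypothetical design row: intercept, vehicle age 3, MC class 3.
row = [1.0, 3.0] + mc_indicators(3)
print(ev_ratio(10, 125), row, linked_parameter([1.0, 2.0], [0.5, -0.25]))
```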

Table 12 shows the parameter estimates of the EFL and EF regression models with their corresponding standard errors (SE) and p-values. The average marginal effects (AME), which measure the average contribution of each independent variable, are also presented in Table 12, together with the negative log-likelihood (−ℓ), the Akaike information criterion (AIC), and the Bayesian information criterion (BIC).
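The information criteria follow from the minimized negative log-likelihood, the number of estimated parameters k, and the number of observations n: $\mathrm{AIC} = 2k - 2\ell$ and $\mathrm{BIC} = k\log n - 2\ell$. The values in the usage line are hypothetical and are not taken from Table 12.

```python
import math

def information_criteria(neg_loglik, n_params, n_obs):
    """AIC = 2k + 2(-l) and BIC = k log(n) + 2(-l), from the minimized negative log-likelihood."""
    aic = 2 * n_params + 2 * neg_loglik
    bic = n_params * math.log(n_obs) + 2 * neg_loglik
    return aic, bic

print(information_criteria(neg_loglik=100.0, n_params=3, n_obs=670))  # hypothetical values
```

Because BIC penalizes each extra parameter by log n rather than 2, it favours the more parsimonious model more strongly once n exceeds about 7.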

It can be observed from Table 12 that vehicle age and MC class 3 are significant at the 5% level in both regression models, with MC class 3 significantly different from MC class 6. For the EFL regression model, both variables have a negative impact on the claims, as shown by their AMEs. In the EF regression model, however, vehicle age contributes positively to the claims, while MC class 3 contributes negatively. Finally, the EFL regression model fits the data better than the EF model, as it has the smallest −ℓ, AIC, and BIC values.

Cox–Snell residual analysis is performed on the fitted models to evaluate their fit. Figure 9 shows P-P plots of the empirical probabilities of the residuals against the theoretical probabilities of the standard exponential distribution. The points for the EFL regression model lie closer to the diagonal than those for the EF regression model, which supports the conclusion that the EFL regression model fits the data better than the EF regression model.

9. Conclusion

A new loss distribution, called the exponentiated Fréchet loss distribution, is developed and studied. Various statistical properties, including the quantile function, moments, and moment generating function, are obtained, as are actuarial properties such as the value at risk, tail value at risk, and tail variance. Four estimation methods are used to estimate the parameters of the distribution, and simulation studies are performed to assess the performance of the estimators. The new loss distribution is also extended into a regression model. The usefulness of the new loss distribution and its regression model is demonstrated using real data sets. The results show that the exponentiated Fréchet loss distribution and its regression model can serve as an alternative for modelling loss data.

Data Availability

The data used for the analysis are openly available, and the sources are stated in the manuscript.

Conflicts of Interest

The author declares that there are no conflicts of interest with respect to this research.