In this paper, the three-parameter skew lognormal distribution is proposed to model actuarial data concerning losses. This distribution yields a satisfactory fit to empirical data in the whole range of the empirical distribution as compared to other distributions used in the actuarial statistics literature. To the best of our knowledge, this distribution has not been used in insurance context and it might be suitable for computing reinsurance premiums in situations where the right tail of the empirical distribution plays an important role. Furthermore, a regression model can be simply derived to explain the response variable as a function of a set of explanatory variables.

1. Introduction

The modelling of large claims is a topic of relevant importance in general insurance and reinsurance, particularly in the field of mathematical risk theory, see, for instance, McNeil [1]; Beirlant and Teugels [2]; Beirlant et al. [3]; and Embrechts et al. [4]. For a comprehensive study about reinsurance, see the recent book of Albrecher et al. [5]. It is our interest to find simple statistical distributions appropriate for modelling both, smaller and medium-size losses with a high frequency and large losses with a low frequency. This is an issue of particular interest in the context of reinsurance and premium calculation principles. In this sense, the classical Pareto distribution [68] has been traditionally considered as a suitable claims’ size distribution in relation to rating problems. Concerning this, the single parameter Pareto distribution not only has nice statistical properties but also provides a good description of the random behaviour of large losses (e.g., the right tail of the distribution). Particularly, when calculating deductibles and excess-of-loss levels of reinsurance, the simple Pareto distribution has been proved to be convenient (see, for example, [9, 10]). As an alternative to the classical Pareto distribution, other models have been recently introduced in the actuarial literature by Sarabia et al. [1113] and Ghitany et al. [14].

On the contrary, there exist many situations where the empirical data show slight or marked asymmetry. This is frequently the case, for example, with actuarial and financial data that also have heavy tails reflecting the existence of extreme values. These two features imply that the data cannot be adequately modelled by the Gaussian or normal distribution.

Let and , respectively, be the probability density function (pdf) and the cumulative distribution function (cdf) of a symmetric distribution. A random variable is said to have a skew distribution if its pdf is given by

This family of distributions has been widely studied as an extension of the normal distribution via a shape parameter, , that accounts for the skewness of the model. When and are replaced in (1) by and , i.e., the standard normal density and distribution function, respectively, the resulting model is called the skew normal distribution.

In this paper, special attention is paid to the generalized skew normal density provided in Henze [15] and also studied by Arnold and Beaver [16]. Its pdf is given by

For multivariate extensions, see for instance Azzalini and Valle [17], Azzalini and Capitanio [18], and Arnold and Beaver [16]. For an exhaustive and comprehensive study of the skew-normal distribution, see the recent book of Azzalini [19].

This paper is organized as follows. In Section 2, the proposed distribution is studied and some interesting properties are given. Numerical applications are provided in Section 3, and conclusions are drawn in Section 4.

2. Modelling the Size of Losses

The basic skew lognormal distribution has been studied by Lin and Stoyanov [20] (see also [19]; Chap. 2, [2123]). Nevertheless, in this paper, we will pay special attention to the distribution arising from exponentiation of (2) in the following sense: if is a random variable with density (2), , we consider a new random variable . Also, if taking and using a linear transformation, we allow for more flexible location and scale parameters in our model. Thus, we get the pdf given bywhere

Here, , , and . Observe that when expression (3) reduces to the classical lognormal distribution. It can be seen that the parameter regulates the shape of the distribution. Furthermore, Lin and Stoyanov [20] established that the distribution has heavy tails and therefore it is a suitable distribution to be incorporated to the wide catalogue of heavy tails (see [5]). Additionally Lin and Stoyanov [20] provided a stochastic representation of this distribution and determined that the distribution can be obtained as the product of two independent random variables, one of which is lognormal and the other one a logarithmic half-normal. Simple calculations show that density (3) can be also written aswhere is expression (1) with , , , and .

It is straightforward to show that the distribution is unimodal with the mode satisfying the equation:

The mean value and the second order moment of the random variable with density (3), , are given by

Let denote the cdf of . By applying directly result B.21 in Azzalini [19], it is not difficult to observe that this cdf is given bywhich is satisfied whenever and . Here, represents Owen’s function (see [24]) given by

If , we deal with the standard normal distribution, and . The latter function can be easily computed by using the Mathematica software package.

The Acceptance-Rejection method of simulation can be used to generate random variates from (3). We begin by simulating a value from a lognormal distribution with parameters and . Then, having chosen an alternative random variable that follows a lognormal probability distribution to simulate from, we define a constant in the following way:where is given by (3) and is the standard lognormal density. The algorithm for simulating a random variate from the LSN distribution is as follows:(1)Generate a random variate from the lognormal distribution, .(2)Generate a random number .(3)If , then set the simulated value from (3) equal to . Otherwise, return to step 1.

By writing aspdf (3) has mean value equal to and variance provided bywhere

Therefore, a regression model can be derived from the LSN distribution. In this model, is the mean of the response variable and can be interpreted as a precision parameter in a way such that, for fixed , the larger the value of , the larger the variance of .

Let us now consider that the random variable is related to a vector of covariates associated with the th observation, where . This vector assigns a weight of observable features is related to through a log link function and is a vector of regressors with for . The regression model takes the form

Observe that log link ensures that falls within the interval .

On the contrary, sometimes it is desirable to deal with a lower value in the range of by shifting this random variable, say with , and therefore a shifted version of the previous distribution is obtained. In this case, it is convenient to work with the pdf given bywhere

Figure 1 shows pdf (15) as compared to the Pareto and PAT (Pareto ArcTan distribution) (this generalization of the Pareto distribution was proposed by Gómez-Déniz and Calderín-Ojeda [13]), all of them with the same mean, say (22), and a threshold of . It is discernible that the tail of the LSN density seems larger than those ones of the other models.

By takingit is guaranteed that (15) has mean equal to . In this case, for the associated regression model, we will use the link function:which guarantees that falls within the interval .

It is already known (see Lemma 2 in [25]) that

Then, it is straightforward to see that if , we have

This can be used to easily obtain asymptotic values of the pdf given in (15). In fact, if , we have the following approximation:

3. Empirical Illustrations

In this section, the versatility of the proposed skew lognormal model, as compared with the Pareto distribution and the Pareto ArcTan Distribution [13], is tested using three datasets. The first dataset is the danishuni that can be found in the R package CASdatasets collected at Copenhagen Reinsurance and comprises 2167 fire losses over the period 1980 to 1990, adjusted for inflation to reflect 1985 values and are expressed in millions of Danish Krone. The second dataset is norfire comprises 9181 fire losses over the period 1972 to 1992 from an unknown Norwegian company. A priority of 500 thousands of Norwegian Krone (NKR) (if this amount is exceeded, the reinsurer becomes liable to pay) was applied to obtain this dataset. The third dataset considers hospital costs in the state of Wisconsin. It can be found in the personal web page of Professor E. Frees. It examines the impact of several predictors on hospital charges obtained from the Office of Health Care Information, Wisconsin’s Department of Health and Human Services, in the year 1989. The response variable is the logarithm of total hospital charges per number of discharges, and the explanatory variables considered are the health service area, hospital discharge costs, the type of payer, size of the area population, number of hospital beds, and average income within the area. For comparison reasons some alternative distributions have been considered in this paper, their pdf’s are provided in Appendix A and B. Some descriptive statistics of the three variables of interest are shown in Table 1.

Parameter estimation for all the models considered in this paper has been carried out by the method of maximum likelihood using Mathematica v.12.0® and also confirmed by using WinRATS v.7.0. Both moments and the maximum likelihood methods are suitable to estimate the vector of parameters of the distribution via sample observations, as shown in Appendix A and B. Codes are available from the authors upon request. For details about software, see Ruskeepaa [26] and Brooks [27].

Since closed expressions are not available for the maximum likelihood estimates and the computation of the global maximum of the logarithm function of the likelihood is not guaranteed, thus it is advisable to use several seed points as the starting value. It is also prudent to use different optimization methods (Newton–Raphson, Broyden–Fletcher–Goldfarb–Sanno, BGGS) (this is an iterative method for solving unconstrained nonlinear optimization problems which can be seen in Broyden [28, 29] and Shanno [30]) that ensure that the same solution is obtained from any of these methods. The standard errors of the estimates can be calculated by inverting the Hessian matrix. In this sense, both Mathematica and WinRats have at least two methods to reach it. The first is to retrieve them from the Cholesky factors (this package is available on the web upon request). The second, faster, is to obtain them by finite differentiation. Furthermore, WinRats package also offers the possibility to directly compute the maximum of the log-likelihood providing the elements of the Fisher information matrix. In fact, for the examples considered in this section, these two packages were used to quickly compute the maximum likelihood estimates. Commands for fitting the skew normal and log-skew normal distributions are also available in stata (see [31]).

Parameter estimates and their standard errors (in brackets), negative of the maximum of the log likelihood function, and AIC for the three datasets considered are shown in Tables 2 and 3. Model assessment is derived through the following information criteria. Negative log likelihood (NLL) is calculated by taking the negative of the value of the log-likelihood evaluated at the ML estimates; Akaike information criterion (AIC) is computed as twice the NLL and evaluated at the ML estimates, plus twice the number of estimated parameters; Consistent Akaike Information Criteria (CAIC), a corrected version of the AIC is proposed by Bozdogan [32] to overcome the tendency of the AIC overestimating the complexity of the underlying model as it lacks the asymptotic property of consistency. In order to calculate the CAIC, a correction factor based on the sample size is used to compensate for the overestimating nature of AIC. The CAIC is defined as twice the NLL plus , where is the number of free parameters and refers to the sample size. Note that a model with a lower statistics value is preferred to one with a higher value. All these results are shown in Table 2. Furthermore, we also include the Kolmogorov–Smirnov test (KS) and the Anderson–Darling test (AD) to express the fit of the model to the data in terms of the distribution function. For these statistics, smaller values indicate a better fit of the model to the data. Note that they not only provide a way to measure the fit in terms of distribution functions but also allow us to perform hypothesis testing for model selection purposes. The value of the test statistics, computed using the Monte Carlo method by using 1000 simulations, is shown in brackets. An extremely small value may lead to a confident rejection of the null hypothesis that the data comes from the proposed model. It can be seen that LSN distribution is not rejected at the usual significance levels for both tests.

Graphs of histogram of the data and superimposed fitted densities are given in Figure 2. As can be seen, the proposed distribution provides a good fit for the empirical data.

In Table 4, the LSN distribution is fitted to the Wisconsin hospital costs’ dataset. From left to right the parameter estimates, standard errors (SE), and the corresponding values calculated based on the -Wald statistic are displayed. Besides, AIC and BIC values for each model are provided in the last two rows of the table. The estimates of the regressors associated with the explanatory variables, number of beds and income, are not significant at the usual nominal level.

Finally, in order to choose a model that provides an acceptable description of the loss process for the Danish and Norwegian datasets, we should verify that the population limited expected values, computed numerically byare close to the empirical ones. As is well known, (22) is the expected amount per claim retained by the insured on a policy with a fixed amount deductible of . In this case, the empirical limited expected value function was calculated based on . Obviously, when tends to infinity, and approach and the sample mean, respectively.

Table 5 below exhibits the limited expected value for different values of the policy limit considered for the hospital costs’ dataset. It is observed that the values obtained from the LSN distribution adheres closely to the observed empirical limited expected values obtained from the Pareto and PAT distributions. Similarly, in Figure 3, empirical and fitted limited expected value function for this dataset and also for the Danish and Norwegian data are shown. As can be observed, the LSN distribution stays closer to the empirical limited expected values for the three datasets considered.

3.1. Out-of-Sample Validation of LSN Distribution

We are interested now in showing the power of the LSN distribution to predict the number of claims out-of-sample via quantile-quantile plots (QQ-plots) of randomized quantile residuals. For this reason, the dataset danishuni is randomly partitioned into two disjoint subsets of different sizes. The first subset (dataset A) is used for fitting the models, whereas the second subset (dataset B) is used for graphing the quantile-quantile plots of the randomized quantile residuals to check for normality. The residuals for the out-of-sample datasets are exhibited in Figure 4 for different in-sample sizes, i.e., size of dataset A. For comparison purposes, we have we have plotted the residuals for the whole dataset (top-left graph) with sample size 2167. A perfect alignment with the line implies the residuals are normally distributed. It is observable that the residuals for LSN distribution adheres reasonably well to the line throughout the residual distribution. However, for all different in-sample sizes considered, the model tends to overestimate the lower quantiles and underestimate the upper quantiles. In general, the predictive power of the model is acceptable.

4. Conclusions and Extensions

In this paper, the use of the skew lognormal distribution has been proposed as a suitable model for describing claims’ size empirical data. To the best of our knowledge, this probabilistic family has not been previously considered in the actuarial literature. The advantages of this distribution are twofold. On the one hand, it includes heavy tails which makes it an interesting model to describe severity data and; on the other hand, the distribution can be rewritten to allow for the incorporation of predictors to explain a response variable.

Finally, it is worthy to point out that the distribution introduced in this paper could be extended to other areas of risk theory. For example, the LSN distribution could be employed as the secondary distribution (the distribution of the claims size) in the compound collective risk model. In particular, this distribution could be used as an approximation of the distribution of the total claims’ size instead of the traditionally considered normal distribution. Furthermore, calculation of reinsurance premiums based on this distribution is an issue that deserves to be studied.


A. PDF of the Alternative Distributions

The pdf of the classical Pareto and PAT distributions used in this work are given byrespectively.

B. Normal Equations

Let us firstly consider the case of the model without covariates and let be a sample obtained from distribution (3). By denoting as the vector of parameters to be estimated, the log-likelihood is proportional to

The required normal equations to compute the maximum likelihood estimates are

Data Availability

The csv data used to support the findings of this study can be found in the R repository CASdatasets and also at https://instruction.bus.wisc.edu/jfrees/jfreesbooks/Regression%20Modeling/BookWebDec2010/data.html.

Conflicts of Interest

The authors declare that they have no conflicts of interest.


EGD would like to thank the Ministerio de Economía y Competitividad (project ECO2017-85577-P) for partial support of this work (Ministerio de Economía y Competitividad, Spain).