Abstract

Over the past few years, statistical distributions have been widely used in applied areas such as reliability engineering and the medical and financial sciences. In this context, we come across a diverse range of statistical distributions for modeling heavy-tailed data sets. Well-known distributions include the log-normal, log-$t$, various versions of the Pareto, log-logistic, Weibull, gamma, exponential, Rayleigh and its variants, and generalized beta of the second kind distributions, among others. In this paper, we supplement the distribution theory literature with a new model, called the new extended Weibull distribution. The proposed distribution is very flexible and exhibits desirable properties. Maximum likelihood estimators of the model parameters are obtained, and a Monte Carlo simulation study is conducted to assess the behavior of these estimators. Finally, we provide a comparative study of the proposed distribution and some existing distributions by analyzing three real data sets from different disciplines: reliability engineering, medical sciences, and financial sciences. The proposed distribution outperforms the well-known distributions on the basis of model selection criteria.

1. Introduction

In the practice of statistical theory, particularly in the engineering, medical, and financial sciences, data modeling is an interesting research topic. In this context, statistical distributions are valuable for modeling such data sets. The most frequently used statistical distributions are the exponential, Rayleigh, Weibull, beta, gamma, log-normal, Pareto, Lomax, and Burr, among others. However, these traditional distributions are not flexible enough to handle complex forms of data. For example, in reliability engineering and biomedical sciences, the data sets are usually unimodal and skewed to the right; see the studies by Demicheli et al. [1], Lai and Xie [2], Zajicek [3], and Almalki and Yuan [4]. Hence, in such cases, the exponential, Rayleigh, Weibull, or Lomax distributions may not be a suitable choice. On the other hand, the gamma, beta, and log-normal distributions do not have closed forms for the cumulative distribution function (cdf), which causes difficulties in estimating the parameters.

Furthermore, in financial and actuarial risk management problems, the data sets are usually unimodal, skewed to the right, and possess a thick right tail; for details, see the studies by Cooray and Ananda [5] and Eling [6], among others. Distributions that exhibit such characteristics can be used quite effectively to model insurance loss data and to estimate the business risk level. The distributions commonly used in the literature include the Pareto by Cooray and Ananda [5], Lomax by Scollnik [7], Burr by Nadarajah and Bakar [8], and Weibull by Bakar et al. [9], which are particularly appropriate for modeling insurance losses, financial returns, file sizes on network servers, etc. Unfortunately, these distributions have some deficiencies. For example, the Pareto distribution, due to the monotonically decreasing shape of its density, does not provide the best fit in many applications, whereas the Weibull model is capable of covering the behavior of small losses but fails to cover the behavior of large losses.

Moreover, Dutta and Perry [10] provided an empirical study on loss distributions using exploratory data analysis and other empirical approaches to estimate the risk. They rejected the idea of using the exponential, gamma, and Weibull distributions due to their poor results and pointed out that one would need a model that is flexible enough in its structure. Hence, only a few probability distributions are capable of modeling heavy-tailed data sets, and none of them is flexible enough to provide greater accuracy in fitting complex forms of data.

To address the problems stated above, researchers have shown an increased interest in defining new families of distributions by incorporating one or more additional parameters into the well-known distributions. The new families have been defined through many different approaches that introduce additional location, scale, shape, and transmuted parameters to generalize the existing distributions. These generalizations are mainly based on, but not limited to, the following approaches: (i) transformation of the variable and (ii) compounding of two or more models; for details, we refer interested readers to the studies by Tahir and Cordeiro [11], Bhati and Ravi [12], and Ahmad et al. [13].

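One of the most interesting methods of adding a shape parameter to existing distributions is exponentiation. The exponentiated family, pioneered by Mudholkar and Srivastava [14], is defined by the following cdf:

$$F(x; a, \xi) = G(x; \xi)^{a}, \quad a > 0, \; x \in \mathbb{R}, \tag{1}$$

where $a$ is the additional shape parameter.

Marshall and Olkin [15] pioneered a new simple approach of introducing a single scale parameter to a family of distributions. The cdf of the Marshall-Olkin (MO) family is given by

$$F(x; \sigma, \xi) = \frac{G(x; \xi)}{1 - (1 - \sigma)\,[1 - G(x; \xi)]}, \quad \sigma > 0, \; x \in \mathbb{R}, \tag{2}$$

where $\sigma$ is the additional scale parameter.

Cordeiro and Castro (2010) proposed the Kumaraswamy-G family defined by

$$F(x; a, b, \xi) = 1 - \left[1 - G(x; \xi)^{a}\right]^{b}, \quad a, b > 0, \; x \in \mathbb{R}, \tag{3}$$

where $a$ and $b$ are the additional shape parameters.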
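To make the three generators concrete, the following Python sketch applies them to a Weibull baseline. It assumes the standard forms of these families ($G^{a}$ for the exponentiated family, $G / [1 - (1 - \sigma)(1 - G)]$ for the MO family, and $1 - (1 - G^{a})^{b}$ for the Kumaraswamy family) and the Weibull parameterization $G(x) = 1 - e^{-\gamma x^{\alpha}}$; the function names and parameter values are illustrative and are not taken from the paper.

```python
import math


def weibull_cdf(x, alpha, gamma):
    """Baseline cdf G(x) = 1 - exp(-gamma * x**alpha); zero for x <= 0."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-gamma * x ** alpha)


def exponentiated_cdf(g, a):
    """Exponentiated family: F = G**a, with shape parameter a > 0."""
    return g ** a


def marshall_olkin_cdf(g, sigma):
    """Marshall-Olkin family: F = G / (1 - (1 - sigma) * (1 - G)), sigma > 0."""
    return g / (1.0 - (1.0 - sigma) * (1.0 - g))


def kumaraswamy_cdf(g, a, b):
    """Kumaraswamy family: F = 1 - (1 - G**a)**b, with a, b > 0."""
    return 1.0 - (1.0 - g ** a) ** b


# Illustrative (hypothetical) parameter values.
for x in (0.5, 1.0, 2.0):
    g = weibull_cdf(x, alpha=2.0, gamma=0.5)
    print(x, g, exponentiated_cdf(g, 2.0),
          marshall_olkin_cdf(g, 0.5), kumaraswamy_cdf(g, 2.0, 3.0))
```

Each generator reduces to the baseline when its extra parameters equal one, which is a quick sanity check on any implementation.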

So far in the literature, mostly either a scale or a shape parameter has been introduced to propose a new family of distributions. Introducing both scale and shape parameters to a family of distributions may increase the level of flexibility. However, the number of parameters increases, and the estimation of the parameters and the computation of many mathematical properties become complicated.

In view of the above, a new attempt has been made to introduce more flexible probability distributions by introducing a single additional parameter which serves as a scale as well as a shape parameter and provides greater accuracy in fitting real-life data in applied fields such as reliability engineering, medical, and financial sciences. Hence, in this paper, a new method is proposed to introduce new statistical distributions. The proposed family may be named the new extended-X (NE-X) family. A random variable is said to follow the proposed family if its cdf is given by

The introduction of the additional parameter $\theta$ in expression (4) adds greater distributional flexibility to the baseline distributions with cdf $G(x; \xi)$, which may depend on the parameter vector $\xi$. The additional parameter $\theta$ plays the role of both a scale and a shape parameter. The probability density function (pdf) corresponding to (4) is

We focus on a special submodel of the proposed family, called the new extended Weibull (NE-W) distribution.

Finally, we direct our attention to the results related to the NE-W model with real life data in three different disciplines. The first data set is taken from biomedical field, and the results of the proposed model are compared with five other competitive models including (i) two-parameter Weibull distribution and (ii) three-parameter models such as flexible Weibull extended (FWE), alpha power transformed Weibull (APTW), Marshall-Olkin Weibull (MOW), and modified Weibull (MW) distributions. The second data set is taken from reliability engineering, and the results of the proposed model are compared with three other well-known distributions such as (i) the three-parameter extended alpha power transformed Weibull (Ex-APTW), (ii) four-parameter Kumaraswamy Weibull (Ku-W), and (iii) beta Weibull (BW) distributions. The third data set is taken from financial sciences, and the results of the proposed model are compared with the Weibull and other heavy tailed models including Lomax and Burr-XII (B-XII) distributions.

The rest of the paper is organized as follows: in Section 2, a special case of the proposed family is introduced and the shapes of its density and hazard functions are investigated. Some mathematical properties of the proposed family are derived in Section 3. Maximum likelihood estimators of the model parameters are obtained in Section 4. In the same section, a Monte Carlo simulation study is conducted. Practical applications are analyzed in Section 5. Here, the NE-W distribution is compared with the models mentioned above under different measures of discrimination and other goodness of fit measures. Finally, some concluding remarks are given in the last section.

2. Model Description

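In this section, we introduce the NE-W distribution. Consider the two-parameter Weibull distribution with shape parameter $\alpha$ and scale parameter $\gamma$, whose cdf and pdf are given by $G(x) = 1 - e^{-\gamma x^{\alpha}}$ and $g(x) = \alpha \gamma x^{\alpha - 1} e^{-\gamma x^{\alpha}}$, respectively, where $x \geq 0$ and $\alpha, \gamma > 0$. Then, the cdf of the NE-W distribution is given by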

The density function of the NE-W distribution is

Some possible shapes for the density and hazard functions of the NE-W distribution are sketched in Figures 1 and 2, respectively.

In Figure 1, we plotted different shapes for the density of NE-W distribution. When , then the density of the proposed model behaves like exponential distribution. But as the value of these parameters increases, the proposed model captures the characteristics of the Rayleigh and Weibull distributions. However, the proposed model has certain advantages over these distributions, since it provides the best fit to data in different disciplines as shown in Section 5. The hrf is plotted in Figure 2. The hazard function of the proposed model is very flexible in accommodating different shapes, namely, decreasing, increasing, unimodal, and bathtub; hence, the NE-W distribution becomes an important model to fit several real lifetime data in applied areas such as reliability, survival analysis, economics, and finance.
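As a minimal sketch of how a hazard rate function is evaluated, the following Python snippet computes $h(x) = f(x) / [1 - F(x)]$ for the two-parameter Weibull baseline only, assuming the parameterization $G(x) = 1 - e^{-\gamma x^{\alpha}}$. The NE-W cdf and pdf are not reproduced here; the same generic `hazard` helper would apply to them once they are coded. All function names and parameter values are illustrative.

```python
import math


def weibull_cdf(x, alpha, gamma):
    """Weibull cdf G(x) = 1 - exp(-gamma * x**alpha) for x > 0."""
    return 1.0 - math.exp(-gamma * x ** alpha)


def weibull_pdf(x, alpha, gamma):
    """Weibull pdf g(x) = alpha * gamma * x**(alpha - 1) * exp(-gamma * x**alpha)."""
    return alpha * gamma * x ** (alpha - 1.0) * math.exp(-gamma * x ** alpha)


def hazard(x, pdf, cdf):
    """Generic hazard rate h(x) = f(x) / (1 - F(x))."""
    return pdf(x) / (1.0 - cdf(x))


def weibull_hazard(x, alpha, gamma):
    """Hazard of the Weibull baseline, computed through the generic helper."""
    return hazard(x,
                  lambda t: weibull_pdf(t, alpha, gamma),
                  lambda t: weibull_cdf(t, alpha, gamma))


# Decreasing (alpha < 1), constant (alpha = 1), and increasing (alpha > 1) shapes.
for alpha in (0.5, 1.0, 2.0):
    print(alpha, [round(weibull_hazard(x, alpha, 1.0), 4) for x in (0.5, 1.0, 2.0)])
```

The baseline Weibull hazard is monotone for every choice of $\alpha$, which is the limitation that extended models with unimodal or bathtub hazards aim to overcome.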

3. Mathematical Properties of the NE-X Distributions

In this section, we study some mathematical properties of the NE-X distributions such as the quantile function, moment, and moment generating function.

3.1. Quantile Function

The quantile function of the NE-X distributions is given by where . From expression (8), we can see that the proposed model has a closed form solution of the quantile function which makes it easier to generate random numbers for the subcase of the NE-X family.
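The practical value of a closed-form quantile function is that random numbers follow directly from uniform draws. The sketch below illustrates this inverse-cdf idea in Python for the Weibull baseline, assuming $G(x) = 1 - e^{-\gamma x^{\alpha}}$, whose quantile function is $Q(u) = [-\log(1 - u) / \gamma]^{1/\alpha}$. It is a stand-in, not the NE-X quantile function in (8); names and parameter values are illustrative.

```python
import math
import random


def weibull_cdf(x, alpha, gamma):
    """Weibull cdf G(x) = 1 - exp(-gamma * x**alpha)."""
    return 1.0 - math.exp(-gamma * x ** alpha)


def weibull_quantile(u, alpha, gamma):
    """Closed-form inverse of the Weibull cdf for 0 <= u < 1."""
    return (-math.log(1.0 - u) / gamma) ** (1.0 / alpha)


def sample_weibull(n, alpha, gamma, seed=0):
    """Draw n values by applying the quantile function to uniform draws."""
    rng = random.Random(seed)
    # rng.random() lies in [0, 1), so log(1 - u) is always finite.
    return [weibull_quantile(rng.random(), alpha, gamma) for _ in range(n)]


sample = sample_weibull(5, alpha=2.0, gamma=0.5)
print(sample)
```

For a family whose quantile function has no closed form, the same sampler would need a numerical root finder in place of `weibull_quantile`.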

3.2. Moments

This subsection deals with the derivation of th moment of the NE-X distributions. The th moment of the NE-X distributions is derived as

Using (5) in (9), we have

Using the expansion (https://math.stackexchange.com/questions/1624974/series-expansion-1-1-xn) and using and in (11), we get

Also using the series representation and using and in (13), we get

Using (12) and (14) in (10), we have where

Numerical values for the mean, variance, skewness (Sk), and kurtosis (Kur) of the NE-W distribution for some selected values of the parameters are given in Tables 1 and 2. To check the effect of the additional parameter on Sk and Kur, (i) we kept the parameters $\alpha$ and $\gamma$ constant and allowed $\theta$ to vary, and then (ii) we kept the parameters $\theta$ and $\gamma$ constant and allowed $\alpha$ to vary.

From the numerical results provided in Table 1, it is clear that as the additional parameter $\theta$ increases, the mean and variance decrease, whereas Sk and Kur increase, showing that the proposed distribution is leptokurtic, unimodal, and skewed to the right. From the results provided in Table 1, we can also see that an increase in the parameter $\theta$ produces skewness to the right, indicating a heavy right tail. Also, from the results in Table 2, we can see that as the parameter $\alpha$ increases, the distribution remains skewed to the right, but the impact on skewness and kurtosis is small. Hence, from the numerical results presented in Tables 1 and 2, we conclude that the introduction of the additional parameter to the Weibull model brings more flexibility to the skewness and kurtosis of the NE-W distribution.
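Summary measures of this kind can be obtained from the first four raw moments. The Python sketch below shows one way to do so by numerical integration, using the Weibull baseline pdf as a stand-in because the NE-W pdf is not reproduced here. It assumes that Kur denotes the non-excess kurtosis (equal to 3 for the normal distribution); the midpoint rule, the truncation point `upper`, and all names are illustrative choices.

```python
import math


def weibull_pdf(x, alpha, gamma):
    """Weibull pdf g(x) = alpha * gamma * x**(alpha - 1) * exp(-gamma * x**alpha)."""
    return alpha * gamma * x ** (alpha - 1.0) * math.exp(-gamma * x ** alpha)


def raw_moment(pdf, r, upper, steps=20000):
    """Midpoint-rule approximation of E[X**r] on [0, upper]."""
    h = upper / steps
    return h * sum(((i + 0.5) * h) ** r * pdf((i + 0.5) * h) for i in range(steps))


def moment_summary(pdf, upper, steps=20000):
    """Return (mean, variance, skewness, non-excess kurtosis)."""
    m1, m2, m3, m4 = (raw_moment(pdf, r, upper, steps) for r in (1, 2, 3, 4))
    var = m2 - m1 ** 2
    skew = (m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3) / var ** 1.5
    kurt = (m4 - 4.0 * m1 * m3 + 6.0 * m1 ** 2 * m2 - 3.0 * m1 ** 4) / var ** 2
    return m1, var, skew, kurt


# Illustrative parameter values; upper must be large enough that the
# neglected tail mass is negligible.
print(moment_summary(lambda x: weibull_pdf(x, 2.0, 0.5), upper=20.0))
```

With $\alpha = 1$ and $\gamma = 1$ the stand-in is the unit exponential distribution, for which the mean, variance, skewness, and kurtosis are 1, 1, 2, and 9, a convenient check of the integration.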

The moment generating function (mgf), say $M_X(t)$, of the NE-W distribution can be obtained as follows:

Using (15) in (17), we get the mgf of the NE-W distributions.

4. Maximum Likelihood Estimation and Simulation Study

This section presents the maximum likelihood estimators of the model parameters and provides a Monte Carlo simulation study to assess the behavior of these estimators.

4.1. Maximum Likelihood Estimation

Numerous approaches for estimating the unknown parameters have been proposed in the literature. Among them, maximum likelihood estimation is the most prominent and commonly employed method to obtain point estimators. The maximum likelihood estimators (MLEs) possess desirable properties and can be utilized for constructing confidence intervals and other statistical tests. From the MLEs, various statistics are built for assessing the goodness of fit of a model, such as the maximum log-likelihood ($\ell$), Akaike information criterion (AIC), and Bayesian information criterion (BIC), as given in the next section. The normal approximation of the MLEs can easily be treated either numerically or analytically. In this subsection, we consider the estimation of the unknown parameters of the NE-X family from complete samples only by the method of maximum likelihood. Suppose $x_1, x_2, \ldots, x_n$ form an observed random sample from the NE-X family with pdf (5). Let $(\theta, \xi)$ be the parameter vector. The log-likelihood function corresponding to (5) is given by

The log-likelihood function can be maximized either directly, for example by using SAS (PROC NLMIXED), or by solving the nonlinear likelihood equations obtained by differentiating (18). The partial derivatives of (18) are as follows:

Equating the nonlinear system of equations and to zero and solving these expressions simultaneously yield the MLEs and , respectively. From expressions (19), it is clear that these expressions are not in explicit forms. Therefore, computer software can be used to solve these expressions numerically. We use R-function with the argument to obtain the maximum likelihood estimators. The expression (18) can be used to obtain the MLEs for any subcase of the proposed family. For the NE-W distribution, the expressions for the MLEs are derived in the appendix.
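To illustrate what solving likelihood equations numerically involves, the following Python sketch fits the two-parameter Weibull baseline, assuming $G(x) = 1 - e^{-\gamma x^{\alpha}}$; it is not the NE-W likelihood, whose expressions are given in the appendix. For this baseline, the equation for $\gamma$ has the explicit solution $\hat{\gamma} = n / \sum x_i^{\alpha}$, and substituting it back leaves a single decreasing score equation in $\alpha$, solved here by bisection. The function names, the bisection scheme, and the simulated data are illustrative assumptions.

```python
import math
import random


def weibull_mle(data, tol=1e-10):
    """MLEs (alpha, gamma) for G(x) = 1 - exp(-gamma * x**alpha), data > 0."""
    n = len(data)
    logs = [math.log(x) for x in data]
    mean_log = sum(logs) / n
    max_log = max(logs)

    def profile_score(alpha):
        # Weights proportional to x_i**alpha, rescaled to avoid overflow.
        w = [math.exp(alpha * (l - max_log)) for l in logs]
        weighted = sum(wi * l for wi, l in zip(w, logs)) / sum(w)
        return 1.0 / alpha + mean_log - weighted

    lo, hi = 1e-6, 1.0
    while profile_score(hi) > 0.0:      # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:                # bisection on the decreasing score
        mid = 0.5 * (lo + hi)
        if profile_score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    alpha = 0.5 * (lo + hi)
    gamma = n / sum(x ** alpha for x in data)
    return alpha, gamma


# Simulated data with hypothetical true values alpha = 2 and gamma = 0.5.
rng = random.Random(1)
data = [(-math.log(1.0 - rng.random()) / 0.5) ** 0.5 for _ in range(2000)]
print(weibull_mle(data))
```

A model with three or more parameters generally does not reduce to one equation in this way, so a general-purpose optimizer applied to the log-likelihood is the usual route.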

4.2. Monte Carlo Simulation Study

In this subsection, we investigate the performance of the maximum likelihood estimators of the proposed distribution. For simulation purposes, the NE-W distribution is considered. We use the inverse cdf method for generating random numbers from the NE-W distribution: if $u$ is uniformly distributed on $(0, 1)$ and $G$ has an inverse function, then the quantile function (8) evaluated at $u$ is a random variable with the NE-W distribution. The random numbers are generated in R. The simulation process is based on the following steps:
(i) Generate 750 samples of size $n$ from the NE-W distribution with fixed values of the parameters $\alpha$, $\theta$, and $\gamma$
(ii) Compute the maximum likelihood estimators of $\alpha$, $\theta$, and $\gamma$ for each sample
(iii) Compute the biases and mean square errors (MSEs) of the estimators of the model parameters
(iv) Repeat steps (i)–(iii) for a range of sample sizes $n$

The simulation results are provided in Figures 3–6, indicating that
(i) the estimates are quite stable and, more importantly, are close to the true values for these sample sizes
(ii) the estimated biases decrease when the sample size increases
(iii) the estimated MSEs decay toward zero when the sample size increases
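The structure of such a study can be sketched in a few lines of Python. The version below is a deliberately simplified stand-in, not the NE-W study itself: it uses the Weibull baseline $G(x) = 1 - e^{-\gamma x^{\alpha}}$ with the shape $\alpha$ treated as known, so that the MLE of $\gamma$ has the closed form $n / \sum x_i^{\alpha}$. The 750 replications follow the text; the sample sizes, parameter values, and seed are hypothetical.

```python
import math
import random


def sample_weibull(n, alpha, gamma, rng):
    """Inverse-cdf sampling from G(x) = 1 - exp(-gamma * x**alpha)."""
    return [(-math.log(1.0 - rng.random()) / gamma) ** (1.0 / alpha)
            for _ in range(n)]


def gamma_mle_known_shape(data, alpha):
    """Closed-form MLE of gamma when the shape alpha is known."""
    return len(data) / sum(x ** alpha for x in data)


def simulate(n, alpha, gamma, replications=750, seed=2024):
    """Return (bias, MSE) of the gamma estimator for samples of size n."""
    rng = random.Random(seed)
    estimates = [gamma_mle_known_shape(sample_weibull(n, alpha, gamma, rng), alpha)
                 for _ in range(replications)]
    bias = sum(estimates) / replications - gamma
    mse = sum((e - gamma) ** 2 for e in estimates) / replications
    return bias, mse


# Steps (i)-(iii) repeated over a grid of (hypothetical) sample sizes.
results = {n: simulate(n, alpha=2.0, gamma=0.5) for n in (25, 50, 100, 200)}
for n, (bias, mse) in results.items():
    print(n, round(bias, 5), round(mse, 5))
```

Replacing `gamma_mle_known_shape` with a numerical maximizer of the full log-likelihood, and the sampler with the quantile function of the model under study, would give the multi-parameter version of the same loop.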

5. Comparative Study

As mentioned earlier, researchers have been developing new distributions to provide the best fit to real-life data in applied areas such as reliability engineering, medical, actuarial, and financial sciences. Therefore, in this section, we consider three real-life applications from different applied disciplines: medical, engineering, and financial sciences. For each data set, the NE-W distribution is compared with different well-known distributions, and we observe that the proposed distribution outperforms its competitors.

To decide about the goodness of fit among the applied distributions, we consider certain analytical measures. In this regard, we consider two discrimination measures: the Akaike information criterion (AIC) introduced by Akaike [16] and the Bayesian information criterion (BIC) of Schwarz [17] and Scollnik [18]. These measures are given as follows:
(i) The AIC is given by $$\mathrm{AIC} = 2k - 2\ell$$
(ii) The BIC is given by $$\mathrm{BIC} = k \log(n) - 2\ell$$
where $\ell$ denotes the log-likelihood function evaluated at the MLEs, $k$ is the number of model parameters, and $n$ is the sample size.

In addition to the discrimination measures, we further consider other goodness of fit measures such as the Anderson-Darling (AD) test statistic, Cramer-von Mises (CM) test statistic, and Kolmogorov-Smirnov (KS) test statistic with the corresponding $p$-values. These measures are given as follows:
(i) The AD test statistic is given by $$\mathrm{AD} = -n - \frac{1}{n} \sum_{i=1}^{n} (2i - 1) \left[ \log F(x_i) + \log\{1 - F(x_{n-i+1})\} \right]$$ where $F$ is the fitted cdf, $n$ is the sample size, and $x_i$ is the $i$th observation when the data are sorted in ascending order
(ii) The CM test statistic is given by $$\mathrm{CM} = \frac{1}{12n} + \sum_{i=1}^{n} \left[ \frac{2i - 1}{2n} - F(x_i) \right]^{2}$$
(iii) The KS test statistic is given by $$\mathrm{KS} = \sup_{x} \left| F_n(x) - F(x) \right|$$ where $F_n(x)$ is the empirical cdf and $\sup_x$ is the supremum of the set of distances
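These measures are straightforward to compute once a fitted cdf and the maximized log-likelihood are available. The Python sketch below assumes the standard definitions $\mathrm{AIC} = 2k - 2\ell$ and $\mathrm{BIC} = k \log(n) - 2\ell$ and the usual computing formulas for the AD, CM, and KS statistics; it does not compute $p$-values. The function names and the toy data are illustrative.

```python
import math


def aic(loglik, k):
    """Akaike information criterion: 2k - 2 * loglik."""
    return 2.0 * k - 2.0 * loglik


def bic(loglik, k, n):
    """Bayesian information criterion: k * log(n) - 2 * loglik."""
    return k * math.log(n) - 2.0 * loglik


def gof_statistics(data, cdf):
    """Return (AD, CM, KS) for a fitted cdf; needs 0 < cdf(x) < 1 on the data."""
    x = sorted(data)
    n = len(x)
    u = [cdf(v) for v in x]  # fitted cdf at the ordered observations
    ad = -n - sum((2 * i - 1) * (math.log(u[i - 1]) + math.log(1.0 - u[n - i]))
                  for i in range(1, n + 1)) / n
    cm = 1.0 / (12.0 * n) + sum(((2 * i - 1) / (2.0 * n) - u[i - 1]) ** 2
                                for i in range(1, n + 1))
    # The supremum is attained at a jump of the empirical cdf, so both
    # one-sided distances are checked at every observation.
    ks = max(max(i / n - u[i - 1], u[i - 1] - (i - 1) / n)
             for i in range(1, n + 1))
    return ad, cm, ks


# Toy check: uniform(0, 1) cdf evaluated on evenly spaced points.
toy = [0.125, 0.375, 0.625, 0.875]
print(gof_statistics(toy, lambda v: v))
print(aic(-100.0, 3), bic(-100.0, 3, 128))
```

Because AIC and BIC penalize the number of parameters, they allow models with different numbers of parameters to be ranked on a common scale.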

A distribution with lower values of these analytical measures is considered to be a good candidate model among the applied distributions for the underlying data sets. Using these statistical tools, we observe that the NE-W distribution provides the best fit compared to the other distributions, because the values of all of the selected goodness of fit criteria are smaller for the proposed distribution.

5.1. A Real Life Application of Biomedical Analysis

Bladder cancer is the ninth most frequently diagnosed malignancy worldwide [19] and one of the most prevalent, representing 3% of cancers diagnosed globally [20]. Bladder cancer accounts for an estimated 386,000 new diagnoses and 150,000 related deaths annually. Early detection of bladder cancer remains one of the most urgent issues in research. The first data set is taken from Lee and Wang [21]; the authors studied the remission times (in months) of a random sample of 128 bladder cancer patients. They rejected the hypothesis of using the exponential and Weibull distributions for modeling medical sciences data having a nonmonotonic hazard function. The authors observed that the extended versions of these classical distributions can be used quite effectively to model this type of data. The proposed NE-W model is applied to this data set in comparison with other well-known competitors. The distribution functions of the competitive models are as follows: (1) FWE distribution (2) APTW distribution (3) Marshall-Olkin Weibull (MOW) distribution (4) MW distribution

The maximum likelihood estimators, with standard errors (in parentheses), of the models for the analyzed data are presented in Table 3. The discrimination measures along with the goodness of fit measures of the proposed and other competitive models are provided in Table 4. From the results provided in Table 4, it is clear that the proposed distribution has lower values of these measures than the other models. The fitted cdf and Kaplan-Meier survival plots of the proposed model for the analyzed data set are plotted in Figure 7. The PP plot of the proposed model and the box plot of the data set are sketched in Figure 8. From Figure 7, it is clear that the proposed model fits the estimated cdf and Kaplan-Meier survival plots very closely. A box plot is a tool for graphically depicting the data; it gives a good indication of how the values in the data are spread out. From Figure 8, we can easily see that the data have a heavy tail skewed to the right (box plot) and that the proposed model closely follows the PP plot.
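The quantities behind such diagnostic plots are simple to compute. The Python sketch below returns the coordinates of a PP plot and the empirical survival function, assuming complete (uncensored) data, in which case the Kaplan-Meier estimator reduces to the proportion of observations exceeding $t$. The plotting positions $(i - 0.5)/n$ are one common convention and an assumption here, as are the function names and the toy data.

```python
def pp_points(data, cdf):
    """Pairs (empirical probability, fitted probability) for a PP plot."""
    x = sorted(data)
    n = len(x)
    # Plotting position (i - 0.5) / n for the i-th ordered observation.
    return [((i - 0.5) / n, cdf(v)) for i, v in enumerate(x, start=1)]


def empirical_survival(data, t):
    """Proportion of observations strictly greater than t.

    For uncensored data this coincides with the Kaplan-Meier estimate.
    """
    return sum(1 for v in data if v > t) / len(data)


# Toy check with the uniform(0, 1) cdf: the PP points lie on the diagonal.
toy = [0.125, 0.375, 0.625, 0.875]
print(pp_points(toy, lambda v: v))
print(empirical_survival(toy, 0.5))
```

A fitted model is judged adequate when the PP points stay close to the diagonal and the fitted survival curve tracks the empirical one.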

5.2. A Real Life Application from Reliability Engineering

Here, we investigate the NE-W distribution by analyzing a reliability engineering data set taken from Algamal [22], representing the failure times of a coating machine. To show the potential of the proposed method, the proposed model and other competitive distributions are applied to this data set, and it is observed that the NE-W model again outperforms the well-known distributions. The distribution functions of the competitive models selected for the second data set are as follows: (1) Ex-APTW distribution (2) Ku-W distribution (3) BW distribution

For the second data set, the estimated values of the model parameters are reported in Table 5. The analytical measures of the proposed and other competitive models are provided in Table 6. The estimated cdf and Kaplan-Meier survival plots are sketched in Figure 9, which shows that the proposed distribution fits the estimated cdf and Kaplan-Meier survival plots very closely. The PP plot and box plot are sketched in Figure 10. From the box plot of the second data set, it is also clear that the data set has a heavy tail.

5.3. A Real Life Application from Insurance Sciences

The third data set was taken from the insurance sciences and represents vehicle insurance losses; it is available at http://www.businessandeconomics.mq.edu.au/our_departments/Applied_Finance_and_Actuarial_Studies/research/books/GLMsforInsuranceData. We fitted the proposed model in comparison with the other models. The distribution functions of the competitive models are as follows: (1) Lomax distribution (2) Burr-XII (B-XII) distribution

For the third data set, parameter values are reported in Table 7, and the analytical measures are presented in Table 8. The estimated cdf and Kaplan-Meier survival plots are sketched in Figure 11. The PP plot and Box plot are sketched in Figure 12. From Figures 11 and 12, it is clear that the data set has a heavier tail and the proposed model fits the estimated cdf and Kaplan-Meier survival plots very well.

6. Concluding Remarks

The importance of extended distributions was first realized in the financial sciences and later in other applied fields such as engineering and medical sciences. To cater to data in those fields, a number of methods have been introduced. In this context, we have proposed a versatile three-parameter distribution, called the new extended Weibull distribution, using a new approach that allows closed-form expressions for some basic mathematical and other related properties. The applicability of the proposed family has been illustrated via three data sets from medical, engineering, and financial sciences, and the model performs reasonably well compared to some well-known distributions.

This new development, which offers a promising approach for data modeling in these fields, may be very useful for practitioners who handle such data sets. For that reason, it can be regarded as an alternative to the Weibull and other well-known competitors.

Appendix

Using and in (18), we get the expression of the log-likelihood function for the NE-W distribution, given by where . The partial derivatives of (A.1) are as follows:

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

There is no competing interest regarding the publication of this paper.

Acknowledgments

The first three authors acknowledge the support of (i) the National Social Science Fund of China (17BTJ010) and (ii) the Fund for Shanxi “1331 Project” Key Innovative Research Team.