Meta-Heuristic Techniques for Solving Computational Engineering Problems 2021
A Novel Generalized Family of Distributions for Engineering and Life Sciences Data Applications
In this paper, a new method is proposed to expand the family of lifetime distributions. The suggested method is named the Khalil new generalized family (KNGF) of distributions. A special submodel, termed the Khalil new generalized Pareto (KNGP) distribution, is investigated from the family with one shape and two scale parameters. A number of mathematical properties of the submodel are derived, including moments, the moment-generating function, the quantile function, entropy measures, order statistics, and the mean residual life function, and its parameters are estimated by the method of maximum likelihood. The proposed distribution is very flexible, covering several hazard rate shapes (symmetric and asymmetric). A simulation study is carried out to examine the bias and mean squared error of the maximum likelihood estimates, and the practicality of the proposed family is illustrated with real datasets. Finally, we hope that the new flexible KNGF may produce useful models for fitting monotonic and nonmonotonic data in survival analysis and reliability analysis.
A prominent subject in distribution theory is the development of novel methods to expand the existing families of lifetime distributions. Many distributions have been extensively used for modeling data in fields such as actuarial sciences, biological sciences, demography, social sciences, engineering, and medical sciences, and a number of lifetime models are available in the literature. However, in many situations, the classical distributions are not suitable for describing and predicting real-world phenomena. For this reason, new techniques have been proposed for creating distributions by adding one more parameter to the baseline model. Generally, an extra parameter is included by means of generators, or existing distributions are combined to obtain more flexible models.
The justification for these amendments is to bring more flexibility to the conventional models: to introduce skewness in symmetric distributions, to obtain reliably better fits than other proposed distributions, to create heavy-tailed distributions for modeling varied real datasets, and to obtain flexible models for fitting all types of hazard rate functions.
Therefore, several researchers adopted numerous ideas. For example, Azzalini defined skewness in the normal distribution by adding a skewness parameter $\lambda$ as follows:
$$f(x; \lambda) = 2\,\phi(x)\,\Phi(\lambda x), \quad -\infty < x < \infty,$$
where $\phi(x)$ and $\Phi(x)$ are the probability density function (PDF) and cumulative distribution function (CDF) of the standard normal distribution. Its shape can be positively or negatively skewed depending on the value of $\lambda$, and for $\lambda = 0$, it becomes standard normal. Mudholkar and Srivastava presented the idea of the exponentiated family by raising the CDF $G(x)$ of an existing distribution to the power of an extra parameter $\alpha > 0$. In particular, they derived the exponentiated Weibull distribution, a more flexible distribution, using the expression
$$F(x) = [G(x)]^{\alpha}.$$
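As a quick numerical illustration of the skew-normal construction, the following minimal sketch evaluates the density 2φ(x)Φ(λx) and checks by simple trapezoidal integration that it integrates to one. The function names, the integration range, and the value λ = 3 are illustrative assumptions, not taken from the paper.

```python
import math


def std_normal_pdf(x):
    """PDF of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x):
    """CDF of the standard normal distribution via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def skew_normal_pdf(x, lam):
    """Skew-normal density 2 * phi(x) * Phi(lam * x); lam = 0 gives the normal."""
    return 2.0 * std_normal_pdf(x) * std_normal_cdf(lam * x)


def trapezoid(f, a, b, n=20000):
    """Composite trapezoidal rule on [a, b] with n subintervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


lam = 3.0  # illustrative skewness parameter (assumption)
area = trapezoid(lambda x: skew_normal_pdf(x, lam), -10.0, 10.0)
mean = trapezoid(lambda x: x * skew_normal_pdf(x, lam), -10.0, 10.0)
print("total probability:", area)
print("mean:", mean)
```

A positive λ shifts mass to the right (positive mean), a negative λ to the left, and λ = 0 recovers the standard normal density.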
Furthermore, this idea was adopted by other researchers to obtain different versions of exponentiated distributions including exponentiated Pareto, exponentiated gamma, exponentiated Rayleigh, exponentiated Lomax, exponentiated beta, exponentiated Kumaraswamy, exponentiated Gompertz, exponentiated Lindley, and exponentiated half-normal distribution.
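Marshall and Olkin proposed a procedure for adding a new parameter to existing distributions by using the generator
$$\bar{F}(x) = \frac{\alpha\,\bar{G}(x)}{1 - (1 - \alpha)\,\bar{G}(x)}, \quad \alpha > 0,$$
where $\bar{G}(x) = 1 - G(x)$ is the survival function of the baseline distribution.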
This generator was initially used to modify exponential and Weibull distributions. Later on, it was used for other well-known distributions as well, for example, Pareto, normal, Lomax, gamma, Lindley, Fréchet, extended Weibull, Rayleigh, beta, and extended generalized Rayleigh.
Eugene et al. suggested the beta-G distributions with the following generator:
$$F(x) = \int_{0}^{G(x)} b(t)\,dt,$$
where $G(x)$ is the CDF of the parent distribution and $b(t)$ is the PDF of the beta distribution. Since its introduction, several distributions have been converted into the beta-generated family, for instance, normal, Gumbel, exponential, Fréchet, Weibull, Pareto, modified Weibull, Laplace, Burr XII, generalized Pareto, Cauchy, and extended Weibull.
Jones used a beta random variable to introduce a general family of univariate distributions. The new distribution family holds greater flexibility for fitting skewed and symmetric models. Zografos and Balakrishnan presented the idea of gamma-generated distributions based on the proposal of Jones. They used the gamma distribution as the generator and the CDF of any random variable as the parent distribution.
Furthermore, for continuous distributions, Alzaatreh et al. presented the idea of the transformed-transformer family of distributions, known as the T-X family, in which the PDF of some continuous random variable is integrated up to a function $W[G(x)]$ of the baseline CDF, satisfying some conditions, instead of the original CDF $G(x)$. The generator is given as
$$F(x) = \int_{a}^{W[G(x)]} r(t)\,dt,$$
where $r(t)$ is the PDF of any continuous random variable $T$ supported on $[a, b]$. Particularly, they derived subfamilies which include the new Weibull-X family, logistic-X family, weighted T-X family, and modified T-X family.
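Mahdavi and Kundu proposed a novel idea, known as the alpha power (AP) transformation, which introduces an additional parameter $\alpha$ to the underlying continuous distribution with CDF $F(x)$. The AP transformation is represented as
$$F_{\mathrm{AP}}(x) = \begin{cases} \dfrac{\alpha^{F(x)} - 1}{\alpha - 1}, & \alpha > 0,\ \alpha \neq 1, \\ F(x), & \alpha = 1. \end{cases}$$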
Specifically, the generator was used for the transformation of the exponential distribution with one parameter into the AP exponential distribution with two parameters. Later on, the generator has been used by many researchers to obtain the AP generalized exponential, AP Weibull, AP transformed Pareto, AP transformed inverse Lindley, and alpha power exponentiated inverse Rayleigh distributions.
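The alpha power construction can be sketched in a few lines: given any baseline CDF F, the transformed CDF is (α^F(x) − 1)/(α − 1) for α > 0, α ≠ 1, and F itself for α = 1. The sketch below applies it to a one-parameter exponential baseline, as described in the text. The function names and the parameter values are illustrative assumptions.

```python
import math


def alpha_power_cdf(base_cdf, alpha):
    """Return the alpha power transformed CDF built from base_cdf."""
    if alpha <= 0.0:
        raise ValueError("alpha must be positive")
    if alpha == 1.0:
        # The transformation reduces to the baseline when alpha = 1.
        return base_cdf

    def cdf(x):
        return (alpha ** base_cdf(x) - 1.0) / (alpha - 1.0)

    return cdf


def exponential_cdf(rate):
    """Return the CDF of an exponential distribution with the given rate."""

    def cdf(x):
        return 1.0 - math.exp(-rate * x) if x > 0.0 else 0.0

    return cdf


# Two-parameter AP exponential CDF: rate 1.5 and alpha 4.0 are hypothetical.
base = exponential_cdf(1.5)
ap_exponential = alpha_power_cdf(base, alpha=4.0)

for x in (0.0, 0.5, 1.0, 2.0):
    print(x, base(x), ap_exponential(x))
```

Because the map u ↦ (α^u − 1)/(α − 1) is increasing from 0 to 1 on [0, 1], the result is again a valid CDF for any baseline.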
Cordeiro et al. derived the Ku–Weibull distribution and discussed several of its properties. Later on, Cordeiro and de Castro proposed the Kumaraswamy-G family of distributions, defined as follows:
$$F(x) = 1 - \left\{1 - [G(x)]^{a}\right\}^{b},$$
where $a > 0$ and $b > 0$ are the shape parameters. Several distributions have been introduced using (8) such as Kumaraswamy transmuted exponentiated additive Weibull, Kumaraswamy generalized power Weibull, Kumaraswamy flexible Weibull extension, and Kumaraswamy exponentiated inverse Rayleigh. Ahmad introduced a new general family of distributions based on the proposal of Shaw and Buckley as
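Another recent development in the distribution families is the approach of the cubic rank transmuted family, proposed by Granzotto et al. and given by
$$F(x) = \lambda_{1} G(x) + (\lambda_{2} - \lambda_{1})\,[G(x)]^{2} + (1 - \lambda_{2})\,[G(x)]^{3}.$$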
Hence, to provide a better fit to the data, it is common practice to amend the classical probability models so that they can capture monotonic and nonmonotonic hazard functions. One such amendment is to construct a generator and apply it to existing models to develop new probability classes. The aim behind the development of these generators is to eliminate some of the difficulties found in the existing probability models. For this purpose, a new method is proposed in this manuscript, termed the Khalil new generalized family (KNGF) of distributions. Compared with the existing probability distributions available in the literature, the proposed model is intended to increase flexibility, produce a better fit, and model both monotonic and nonmonotonic hazard rate functions.
2. The Khalil New Generalized Family (KNGF)
In this section, we propose a new method for deriving new continuous probability distributions, termed as Khalil new generalized family (KNGF).
Let a random variable X follow Weibull distribution with the CDF as having and as the scale and shape parameters, respectively. Now, replacing in the Weibull density, we have . Henceforth, the CDF of the KNGF is defined aswhere is the considered baseline cumulative distribution function of the baseline model. The probability density function, survival function, hazard rate function, and reversed hazard rate function of the KNGF with scale and shape parameters are, respectively, defined as follows:
Therefore, a random variable x following PDF (12) is defined as the Khalil new generalized family of distributions, denoted as , where are the scale and shape parameters and is the parameter of the baseline model.
This paper is organized as follows: Section 3 considers a special submodel of the proposed family, based on the basic Pareto distribution, and derives various statistical properties of the submodel. Section 4 estimates the parameters using the maximum likelihood method. To demonstrate the practicality of the suggested model, simulations and real-life datasets are used in Section 5. Section 6 concludes the paper.
3. The Khalil New Generalized Pareto (KNGP) Distribution
Specifically, in this segment, we consider a submodel of the KNG family of distributions, termed as Khalil new generalized Pareto (KNGP) distribution, using the basic Pareto distribution  as a baseline model. The cumulative distribution function of the basic Pareto distribution isand its PDF is
Then, a random variable X following the KNGP distribution is denoted as with probability density function, cumulative distribution function, and reliability and hazard rate functions given in the following:
Probability density function is
Figures 1 and 2 illustrate the plots of the density function and cumulative distribution function with scale and shape parameters and scale parameter of the KNGP distribution, respectively. The density function varies significantly and changes with respect to scale and shape parameters . Keeping the scale parameter fixed , the PDF becomes more and more symmetric with increasing scale and shape parameter values (Figure 1). Similarly, Figure 3 demonstrates the hazard rate patterns of the proposed model. The hazard function is nonincreasing for any value of and . As the shape parameter values increase, the hazard function first increases and then decreases gradually (nonmonotonic).
3.1. Statistical Properties
The statistical properties of the KNGP distribution are given in the following sections.
3.1.1. Moments
If a random variable X has the KNGP distribution, then the rth moment of the KNGP distribution, say , takes the following form:
Proof. By definition,Using the exponent series ,Use the binomial expansion in the above expression to haveFor r = 1, we can have the mean of the KNGP distribution given by
3.1.2. Moment-Generating Function (MGF)
If a random variable X has KNGP , then the MGF of X is acquired using
The Taylor series yields the following simplified expression:
3.1.3. Entropy Measures
Entropy is a very important measure in reliability analysis. It quantifies the amount of variation and uncertainty in the dataset: a small entropy value indicates less uncertainty in the data. Hence, for measuring the amount of uncertainty of a random variable X following the KNGP distribution, the Rényi and the Havrda and Charvát entropies are considered. The entropies are as follows:
Proof. By definition, the Rényi entropy and q-entropy are
The Rényi entropy of the KNGP distribution, for , is as follows:
Use the binomial expansion in the above expression to have
On simplification of the above integrals, the Rényi entropy yields the following result:
The q-entropy or -entropy, introduced by Havrda and Charvát, is defined as
Consider the integral
Substituting the result of the above integral in , it reduces to
3.1.4. Mean Residual Function
The mean residual function of KNGP is defined byand solving the integral in (35) for the KNGP model,
Substituting the above result of the integral in (35), we get
The mean residual life function is thus obtained as follows:
3.1.5. Quantile Function
Let X follow the KNGP distribution with the PDF given in (15). Then, the quantile function of X, denoted by Q(u), is as follows:
where u follows the uniform distribution over the interval [0, 1]. When u is replaced by q, the median, 1st quartile, and 3rd quartile can be obtained from (39) by simply substituting q = 0.5, 0.25, and 0.75, respectively.
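When a closed-form quantile function is inconvenient, quantiles can also be obtained by inverting the CDF numerically. The following is a minimal sketch of that idea using bisection. Since the KNGP CDF is not restated here, the sketch uses a stand-in CDF, G(x) = 1 − x^(−θ) for x ≥ 1, which is assumed to be the form of the basic Pareto baseline. The function names and θ = 2.5 are hypothetical.

```python
def quantile_by_bisection(cdf, u, lo, hi, tol=1e-10, max_iter=200):
    """Solve cdf(x) = u for x by bisection, for a nondecreasing cdf.

    lo must satisfy cdf(lo) <= u; hi (positive) is doubled until cdf(hi) >= u.
    """
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie strictly between 0 and 1")
    while cdf(hi) < u:
        hi *= 2.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < u:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


theta = 2.5  # hypothetical shape parameter of the stand-in model


def stand_in_cdf(x):
    """Assumed basic Pareto CDF, 1 - x**(-theta) on x >= 1 (a stand-in)."""
    return 1.0 - x ** (-theta) if x >= 1.0 else 0.0


# First quartile, median, and third quartile of the stand-in model.
quartiles = {q: quantile_by_bisection(stand_in_cdf, q, lo=1.0, hi=2.0)
             for q in (0.25, 0.5, 0.75)}
for q, value in quartiles.items():
    # The stand-in has the closed form (1 - q)**(-1/theta) for comparison.
    print(q, value, (1.0 - q) ** (-1.0 / theta))
```

The same routine could be pointed at any continuous CDF, and feeding it uniform random numbers gives inverse-transform samples.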
3.1.6. Order Statistics
Let a random sample of size k from the KNGP distribution have the corresponding order statistics denoted by . Then, the PDF of the order statistics is given by
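For reference, the density of the ith order statistic in a sample of size k from any continuous distribution with PDF f and CDF F is k!/((i−1)!(k−i)!) f(x) F(x)^(i−1) [1 − F(x)]^(k−i). The sketch below implements this general formula. The PDF and CDF plugged in are a stand-in (an assumed basic Pareto form with hypothetical shape θ = 2), not the KNGP expressions.

```python
import math


def order_statistic_pdf(pdf, cdf, i, k):
    """Density of the i-th smallest value in a random sample of size k."""
    if not 1 <= i <= k:
        raise ValueError("require 1 <= i <= k")
    coef = math.factorial(k) / (math.factorial(i - 1) * math.factorial(k - i))

    def density(x):
        big_f = cdf(x)
        return coef * pdf(x) * big_f ** (i - 1) * (1.0 - big_f) ** (k - i)

    return density


theta = 2.0  # hypothetical shape parameter of the stand-in model


def stand_in_pdf(x):
    """Assumed basic Pareto PDF on x >= 1 (a stand-in)."""
    return theta * x ** (-(theta + 1.0)) if x >= 1.0 else 0.0


def stand_in_cdf(x):
    """Assumed basic Pareto CDF on x >= 1 (a stand-in)."""
    return 1.0 - x ** (-theta) if x >= 1.0 else 0.0


k = 5
minimum_pdf = order_statistic_pdf(stand_in_pdf, stand_in_cdf, 1, k)
maximum_pdf = order_statistic_pdf(stand_in_pdf, stand_in_cdf, k, k)
for x in (1.2, 1.7, 3.0):
    print(x, minimum_pdf(x), maximum_pdf(x))
```

For this stand-in, the sample minimum is again of the same form with shape kθ, which gives a simple check of the implementation.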
4. Parameter Estimation of the KNGP Distribution
The subsequent section presents the parametric estimation of the KNGP distribution using the maximum likelihood method.
4.1. Maximum Likelihood Estimation
Consider a random sample from the KNGP distribution. Then, its likelihood function is
The log-likelihood function of the KNGP distribution is obtained by taking the logarithm on both sides of (42) as follows:
For the MLE of unknown parameters of the distribution, the above nonlinear equation is simplified by taking its derivative w.r.t , respectively, and setting , , and . The normal equations are as follows:
The asymptotic confidence interval can be derived for the unknown parameters with the assumption that the MLEs of these parameters are approximately normal with mean and inverse Fisher information observed covariance matrix , defined as
Hence, the asymptotic confidence intervals for the parameters can be obtained as
where is the upper percentile of the standard normal distribution.
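The steps above (maximize the log-likelihood, then use the inverse observed information to form a normal-theory interval) can be illustrated on a simpler stand-in model. The sketch assumes a one-parameter basic Pareto density θx^(−(θ+1)) on x ≥ 1, for which the score equation n/θ − Σ log x_i = 0 has a closed-form root. For the three-parameter KNGP model the score equations would generally have to be solved numerically. The data are dataset I from Section 5.2, used here only as convenient input. This is not the KNGP fit reported in the paper.

```python
import math
from statistics import NormalDist

# Dataset I from the text (failure times of air conditioning equipment).
data = [74, 57, 48, 29, 502, 12, 70, 21, 29, 386, 59, 27, 153, 26, 326]


def log_likelihood(theta, xs):
    """Log-likelihood of the stand-in density theta * x**(-(theta + 1))."""
    return len(xs) * math.log(theta) - (theta + 1.0) * sum(math.log(x) for x in xs)


n = len(data)
sum_log = sum(math.log(x) for x in data)

# Root of the score equation n / theta - sum(log x) = 0.
theta_hat = n / sum_log

# Observed information is n / theta**2, so the standard error is theta / sqrt(n).
std_error = theta_hat / math.sqrt(n)

# Upper 2.5% point of the standard normal for a 95% interval.
z = NormalDist().inv_cdf(0.975)
lower = theta_hat - z * std_error
upper = theta_hat + z * std_error

print("theta_hat:", theta_hat)
print("95% asymptotic CI:", (lower, upper))
print("maximized log-likelihood:", log_likelihood(theta_hat, data))
```

With only 15 observations the normal approximation is rough, so the interval should be read as illustrative.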
5. Simulation and Applications
A simulation study and real datasets are used in this section to show the practicality of the suggested KNGP model.
5.1. Simulation Study
To study the behavior of the MLEs, 1000 samples are generated from the KNGP distribution with various sets of parameter values using the Monte Carlo simulation method. Various sample sizes (n = 50, 100, 500, and 1000) have been considered. The average estimates of the parameters, mean square errors, and biases are reported in Table 1. It can be observed from the results presented in Table 1 that, as the sample size increases, the estimated values of the parameters get closer to the assumed parameter values, which is in line with the consistency property of the MLEs.
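The design of such a study can be sketched as follows: draw repeated samples by inverse-transform sampling, compute the MLE on each sample, and summarize the average estimate, bias, and mean squared error for each sample size. The sketch uses the same sample sizes and number of replications as the text but a stand-in model (an assumed basic Pareto with quantile (1 − u)^(−1/θ) and closed-form MLE), so its output does not reproduce Table 1. The true value θ = 2 and the seed are arbitrary assumptions.

```python
import math
import random


def sample_stand_in(theta, n, rng):
    """Inverse-transform sampling from the stand-in model on x >= 1."""
    return [(1.0 - rng.random()) ** (-1.0 / theta) for _ in range(n)]


def mle_theta(xs):
    """Closed-form MLE of theta for the stand-in model."""
    return len(xs) / sum(math.log(x) for x in xs)


def monte_carlo_summary(theta, n, reps, seed):
    """Return (average estimate, bias, MSE) over reps simulated samples."""
    rng = random.Random(seed)
    estimates = [mle_theta(sample_stand_in(theta, n, rng)) for _ in range(reps)]
    average = sum(estimates) / reps
    bias = average - theta
    mse = sum((e - theta) ** 2 for e in estimates) / reps
    return average, bias, mse


true_theta = 2.0  # hypothetical true parameter value
results = {n: monte_carlo_summary(true_theta, n, reps=1000, seed=2021)
           for n in (50, 100, 500, 1000)}

for n, (average, bias, mse) in results.items():
    print(f"n={n:5d}  average={average:.4f}  bias={bias:+.4f}  MSE={mse:.5f}")
```

Bias and MSE shrinking as n grows is the pattern one expects from a consistent estimator.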
5.2. Real Data Application
The performance of the new model is investigated using three real datasets. Different submodels of the Pareto distribution are considered for comparison: Pareto (P), basic Pareto (BP), generalized Pareto (GP), alpha power Pareto (APP), and exponentiated generalized Pareto (EGP). Dataset I describes the times between consecutive failures of the air conditioning equipment in a Boeing 720 airplane, dataset II is a random sample of average daily wind speed data collected in 2015 by the Turkish meteorological services, and dataset III is taken from Bader and Priest and describes the strength measured in GPa for 1000 carbon fiber microcomposite specimens.
5.2.1. Dataset I (Failures of Air Conditioning Equipment Data)
74, 57, 48, 29, 502, 12, 70, 21, 29, 386, 59, 27, 153, 26, 326.
5.2.2. Dataset II (Average Daily Wind Speed Data)
2.8, 1.8, 3.2, 5.0, 2.4, 4.8, 2.9, 2.9, 2.3, 3.2, 2.3, 2.0, 1.9, 3.3, 4.4, 6.7, 4.3, 1.9, 2.2, 3.3, 2.1, 4.0, 2.0, 3.1, 3.8, 3.1, 3.2, 3.4, 2.8, 2.1, 3.1.
5.2.3. Dataset III (Carbon Fiber Microcomposite Specimens’ Data)
1.312, 1.314, 1.479, 1.552, 1.700, 1.803, 1.861, 1.865, 1.944, 1.958, 1.966, 1.997, 2.006, 2.021, 2.027, 2.055, 2.063, 2.098, 2.14, 2.179, 2.224, 2.240, 2.253, 2.270, 2.272, 2.274, 2.301, 2.301, 2.359, 2.382, 2.382, 2.426, 2.434, 2.435, 2.478, 2.490, 2.511, 2.514, 2.535, 2.554, 2.566, 2.57, 2.586, 2.629, 2.633, 2.642, 2.648, 2.684, 2.697, 2.726, 2.770, 2.773, 2.800, 2.809, 2.818, 2.821, 2.848, 2.88, 2.954, 3.012, 3.067, 3.084, 3.090, 3.096, 3.128, 3.233, 3.433, 3.585, 3.585.
Tables 2–4 contain the MLEs, Kolmogorov–Smirnov (K–S) test results, and p values for datasets I, II, and III, respectively. The KNGP distribution has the smallest K–S test statistic among the competing submodels, and the test is not significant (p value > 0.05) for the three datasets, indicating a better fit. Tables 5–7 display the goodness-of-fit results under various selection criteria: AIC (Akaike’s information criterion), CAIC (corrected Akaike’s information criterion), BIC (Bayesian information criterion), and HQIC (Hannan–Quinn information criterion). The KNGP distribution has the smallest values of AIC, CAIC, BIC, and HQIC in comparison with BP, P, GP, APP, and EGP for datasets I, II, and III, demonstrating the superiority of the suggested model. Hence, KNGP performs better on the real datasets considered and on simulated data than the other variants of the Pareto distribution. Figures 4–6 illustrate the fitted empirical and theoretical PDFs and CDFs for datasets I, II, and III, respectively. Furthermore, the P-P and Q-Q plots are presented in Figures 7–9 for datasets I, II, and III. All the figures indicate that the KNGP distribution is the best-fitting model among those considered and support the results in Tables 5–7.
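For readers who want to compute comparable summaries for their own fits, the sketch below gives the usual definitions of the selection criteria and of the K–S statistic. It assumes that CAIC denotes the small-sample corrected AIC, AIC + 2k(k + 1)/(n − k − 1), which matches the expansion given in the text; other sources use CAIC for a different criterion. The numeric inputs in the usage lines are hypothetical and are not values from Tables 2–7.

```python
import math


def information_criteria(loglik, k, n):
    """AIC, corrected AIC, BIC, and HQIC for a model with k parameters.

    loglik is the maximized log-likelihood and n the sample size.
    """
    aic = 2.0 * k - 2.0 * loglik
    caic = aic + 2.0 * k * (k + 1.0) / (n - k - 1.0)  # assumed meaning of CAIC
    bic = k * math.log(n) - 2.0 * loglik
    hqic = 2.0 * k * math.log(math.log(n)) - 2.0 * loglik
    return {"AIC": aic, "CAIC": caic, "BIC": bic, "HQIC": hqic}


def ks_statistic(xs, cdf):
    """Kolmogorov-Smirnov distance between the empirical CDF and cdf."""
    ordered = sorted(xs)
    n = len(ordered)
    distance = 0.0
    for i, x in enumerate(ordered, start=1):
        fitted = cdf(x)
        # Compare the fitted CDF with the empirical CDF just after and
        # just before its jump at x.
        distance = max(distance, i / n - fitted, fitted - (i - 1) / n)
    return distance


# Hypothetical inputs: a 3-parameter model fitted to 15 observations.
criteria = information_criteria(loglik=-100.0, k=3, n=15)
print(criteria)

# Toy check of the K-S distance against a uniform(0, 1) CDF.
toy_distance = ks_statistic([0.1, 0.4, 0.7], lambda x: min(max(x, 0.0), 1.0))
print("K-S distance:", toy_distance)
```

Smaller values of each criterion and of the K–S distance favor a model, which is the comparison rule applied in the text.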
6. Conclusion
A new family of distributions, termed the Khalil new generalized family (KNGF) of distributions, was suggested in this study. For the purpose of applications, the Pareto distribution was used as the baseline model in the KNGF, which resulted in a new distribution named the KNGP distribution. A number of statistical properties of this submodel have been derived. A simulation study was performed by generating data from KNGP, and the maximum likelihood estimates of the unknown parameters were obtained. The simulation results showed consistency of the parameter estimates of the KNGP model. The suggested model was also fitted to three real (monotonic and nonmonotonic) datasets to show its usefulness. KNGP provided a better fit to the datasets than the basic Pareto, Pareto, generalized Pareto, alpha power Pareto, and exponentiated generalized Pareto distributions. Hence, on the basis of these findings, it is concluded that the proposed distribution is a more efficient model than the other distributions considered here for modeling real-life datasets.
Data Availability
The data used in this manuscript are available from the authors upon request, provided that this paper is cited.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
The authors are thankful to the Deanship of Scientific Research at King Khalid University for funding the project (ID: RGP. 2/190/42) titled Advance Computational Methods for Solving Complex Computer Science and Mathematical Engineering Problems. The authors would also like to thank Suan Sunandha Rajabhat University for the scholarship support.
References
N. L. Johnson, S. Kotz, and N. Balakrishnan, Continuous Univariate Distributions, Wiley, New York, NY, USA, 1994.
M. Alizadeh, F. Merovci, and G. G. Hamedani, “Generalized transmuted family of distributions: properties and applications,” Hacettepe Journal of Mathematics and Statistics, vol. 46, no. 4, pp. 645–667, 2017.
A. Azzalini, “A class of distributions which includes the normal ones,” Scandinavian Journal of Statistics, vol. 12, pp. 171–178, 1985.
M. A. Selim and A. M. Badr, “The Kumaraswamy generalized power Weibull distribution,” Mathematical Theory and Modeling, vol. 6, no. 2, pp. 110–124, 2016.
M. A. El-Damcese, A. Mustafa, B. S. El-Desouky, and M. E. Mustafa, “The Kumaraswamy flexible Weibull extension,” International Journal of Mathematics And Its Applications, vol. 55, p. 7, 2016.
M. A. Ul Haq, “Kumaraswamy exponentiated inverse Rayleigh distribution,” Mathematical Theory and Modeling, vol. 6, no. 3, pp. 93–104, 2016.
Z. Ahmad, “A new generalized class of distributions: properties and estimation based on type-I censored samples,” Annals of Data Science, pp. 1–4, 2018.
M. M. Rahman, B. Al-Zahrani, and M. Q. Shahbaz, “Cubic transmuted Pareto distribution,” Annals of Data Science, vol. 14, pp. 1–8, 2018.
S. W. Philbrick, “A practical guide to the single parameter Pareto distribution,” PCAS LXXII, pp. 44–85, 1985.
A. Rényi, On Measures of Entropy and Information, Hungarian Academy of Sciences, Budapest, Hungary, 1961.
J. Havrda and F. Charvát, “Quantification method of classification processes: concept of structural α-entropy,” Kybernetika, vol. 3, pp. 30–35, 1967.
N. L. Johnson and S. Kotz, Continuous Univariate Distributions-2, Houghton Mifflin, Boston, MA, USA, 1970.
S. Lee and J. H. Kim, “Exponentiated generalized Pareto distribution: properties and applications towards extreme value theory,” Communications in Statistics-Theory and Methods, vol. 48, pp. 1–25, 2018.
D. Aydin, “The new weighted inverse Rayleigh distribution and its application,” Mathematics and Informatics, vol. 34, no. 3, pp. 511–523, 2019.
M. G. Bader and A. M. Priest, “Statistical aspects of fiber and bundle strength in hybrid composites,” in Progress in Science and Engineering of Composites, T. Hayashi, K. Kawata, and S. Umekawa, Eds., pp. 1129–1136, ICCM-IV, Tokyo, Japan, 1982.