Abstract

Statistical probability distributions are commonly used by data analysts and statisticians to describe and analyze their data. In many situations, however, data do not fit the existing classical distributions. A new distribution is therefore required to accommodate the complexities of different data shapes and improve the goodness of fit. A novel model called the new generalized exponentiated Fréchet–Weibull distribution is proposed in this paper by combining two methods, the transformed transformer method and the new generalized exponentiated method. This approach is capable of modeling complex data structures in a wide range of applications. Some statistical properties of the new distribution are derived. The parameters are estimated using the method of maximum likelihood, and several simulation studies are conducted to assess the behavior of the estimators. The performance of the proposed distribution is investigated by means of applications to three real datasets. Further, a new regression model is proposed through reparametrization of the new generalized exponentiated Fréchet–Weibull distribution using the log-location-scale technique. The effectiveness of the proposed regression model is also investigated with two simulation studies and three real censored datasets. The results demonstrate the superiority of the proposed models over the competing models considered.

1. Introduction

Statistical distributions are extremely useful in describing many real-world phenomena. In particular, finding an appropriate distribution is a fundamental requirement for analyzing and interpreting data properly, since a suitable choice of distribution leads to valid inference and correct conclusions. Many statistical distributions have been proposed and applied to fit real data in fields such as education, physics, chemistry, demography, management, and engineering. However, in many of these areas, data may display complex patterns that cannot be adequately fitted by the classical distributions. This complexity has created a need for statistical distributions that are more flexible, practical, and accurate. Recently, several studies have attempted to extend the classical models and generate new families of distributions (for a good review of these methods, see [1]).

Alzaatreh et al. [2] introduced a new general method for generating families of continuous distributions, called the transformed transformer (T-X) method. This method generalizes the beta-G [3] and Kumaraswamy-G [4] families by replacing the beta distribution and Kumaraswamy distribution with any continuous distribution for a random variable T defined on [a, b]. In particular, the cumulative distribution function (cdf) of the T-X family can be defined as

$$F(x) = \int_{a}^{W[G(x)]} r(t)\,dt, \qquad (1)$$

where $r(t)$ is the density of T, $a$ is a real number, and $W[G(x)]$ is a function of the cdf $G(x)$ of any random variable X.

Many new statistical distributions have been proposed using this method, such as the gamma-normal distribution in [5], the odds generalized exponential-exponential distribution in [6], the new Weibull–Pareto distribution in [7], the Weibull–Burr type X distribution in [8], the Lindley–Pareto distribution in [9], the odd log-logistic logarithmic normal distribution in [10], the odd Lindley–Burr XII distribution in [11], the Topp–Leone-exponential Poisson distribution in [12], the Lomax–Gumbel (Fréchet) distribution in [13], the Weibull-gamma distribution in [14], the Topp–Leone generalized odd log-logistic Weibull distribution in [15], the Burr III-Marshall–Olkin–Weibull distribution in [16], the Hjorth uniform distribution in [17], the Xgamma-Lindley distribution in [18], the Fréchet–Topp–Leone–Kumaraswamy distribution in [19], and the Marshall–Olkin–Weibull exponential distribution in [20].

Abd-Elmonem et al. [21] applied the T-X method to introduce a new extended distribution, called Fréchet–Weibull distribution which is based on the Fréchet distribution as a generator. The cdf and the probability density function (pdf) of Fréchet–Weibull distribution with four parameters, namely, as scale parameters and as shape parameters, respectively, are given as

On the other hand, Gupta et al. [22] introduced the exponentiated method, in which an existing distribution is generalized by adding an extra shape parameter. Subsequently, Cordeiro et al. [23] proposed a new class that generalizes the exponentiated method by adding two extra shape parameters to an existing distribution. Recently, Rezaei et al. [24] introduced a more general method by adding three extra shape parameters to an existing distribution. The cdf and pdf of this new exponentiated family are defined, respectively, as

where G(x) and g(x) are the cdf and pdf of any statistical distribution and a, b, and a third shape parameter are positive real numbers. The exponentiated generalized half logistic Fréchet distribution introduced in [25] and the exponentiated generalized exponential Dagum distribution proposed in [26] can be regarded as members of this family. Another member is the exponentiated generalized extended Gompertz distribution in [27], which generalizes the Gompertz distribution.
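To make the idea of adding shape parameters concrete, the following is a minimal Python sketch of the two earlier constructions only: the one-parameter exponentiated form $F(x) = G(x)^{a}$ and the two-parameter exponentiated generalized form $F(x) = \{1 - [1 - G(x)]^{a}\}^{b}$. The Weibull baseline, the function names, and the parameter values are illustrative assumptions, and the three-parameter class of [24] is not reproduced here.

```python
import math


def weibull_cdf(x, shape, scale):
    """Baseline cdf G(x) of a Weibull distribution (an illustrative choice)."""
    if x <= 0.0:
        return 0.0
    return 1.0 - math.exp(-((x / scale) ** shape))


def exponentiated_cdf(g, a):
    """Exponentiated form: F = G**a, with one extra shape parameter a > 0."""
    return g ** a


def exponentiated_generalized_cdf(g, a, b):
    """Exponentiated generalized form: F = (1 - (1 - G)**a)**b, with a, b > 0."""
    return (1.0 - (1.0 - g) ** a) ** b


# Evaluate both constructions at one point of the baseline.
g = weibull_cdf(1.5, shape=2.0, scale=1.0)
f_exp = exponentiated_cdf(g, a=3.0)
f_eg = exponentiated_generalized_cdf(g, a=1.0, b=3.0)
```

With `a = 1` the two-parameter form collapses to the one-parameter form, which is the nesting behavior that makes such families convenient for comparing submodels.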

The purpose of this paper is to increase the flexibility of some existing distributions in order to accommodate the complexity of certain data. To that end, we combine the classical Fréchet–Weibull distribution with the new generalized exponentiated distribution class, yielding the new generalized exponentiated Fréchet–Weibull distribution (NGEFWD). The proposed distribution can be used as an alternative to several existing distributions in different applications. Another goal is related to the significance of regression modeling. Specifically, real data are frequently explained by other variables, which are referred to as explanatory variables or covariates. Hence, researchers have shown an increasing interest in investigating these relationships through regression analysis. Many regression models have recently been constructed in the literature based on different distributions of the response variable. In particular, log-location-scale regression models have been considered by many authors. Among these, Silva et al. [28] studied the log-Burr XII regression model, Carrasco et al. [29] introduced the log-modified Weibull regression model, Ortega et al. [30] proposed the log generalized modified Weibull regression model, Pescim et al. [31] developed a log-linear regression model based on the odd log-logistic generalized half-normal distribution, Altun et al. [32] suggested the log Zografos–Balakrishnan BXII distribution, Korkmaz et al. [33] proposed the log odd power Lindley–Weibull regression model, Baharith et al. [34] introduced the log odds exponential Pareto IV regression model, Cordeiro et al. [18] discussed the log-Xgamma Weibull regression model, Eliwa et al. [35] proposed the log odd Lindley half logistic regression model, Altun et al. [36] proposed the log additive odd log-logistic odd Weibull-Weibull regression model, Shama et al. [37] suggested the log gamma Gompertz regression model, and Anwaar Dhiaa and Sunbul Rasheed [38] provided two regression models derived from the Burr XII family of distributions. A further objective of this paper is therefore to introduce a new regression model based on the NGEFWD.

This paper is organized as follows. In Section 2, the NGEFWD is introduced and some plots for the pdf and hazard rate function (hrf) of NGEFWD are provided. In Section 3, we derive the expansion of the pdf for the NGEFWD. In Section 4, we discuss some of the statistical properties of the new distribution. The maximum likelihood estimates (MLEs) of the model parameters are determined in Section 5. Section 6 discusses the simulation results. In Section 7, the NGEFWD is applied to three real datasets. In Section 8, we propose the log-NGEFW regression model and estimate the model parameters using the maximum likelihood estimation. Section 9 presents some simulation studies to estimate log-NGEFW regression model parameters. In Section 10, three real datasets are investigated to show the flexibility of the new regression model. Finally, Section 11 offers some concluding remarks.

2. The New Generalized Exponentiated Fréchet–Weibull Distribution

The NGEFWD can be obtained by replacing G(x) in equation (4) by the cdf in equation (2) and g(x) in equation (5) by the pdf in equation (3). That is, a random variable X is said to have the NGEFWD with seven parameters, comprising shape parameters and scale parameters, if its cdf and pdf are defined, respectively, as

The reliability function and hrf of NGEFWD can be obtained, respectively, as

For various values of the distribution’s parameters, Figures 1 and 2 illustrate the shapes of the NGEFWD’s pdf and hrf, respectively. It can be seen that the NGEFWD can exhibit left-skewed, symmetrical, right-skewed, and reversed-J shaped densities. Its hazard rate can be decreasing, upside-down bathtub, reversed bathtub, or reversed-J shaped. Accordingly, the NGEFWD can be considered an appropriate model for fitting a variety of lifetime data in applied areas.
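The reliability function and hrf follow from any cdf and pdf through the standard relations $R(x) = 1 - F(x)$ and $h(x) = f(x)/R(x)$. The sketch below applies these relations generically in Python; a two-parameter Weibull is used purely as a stand-in because the NGEFWD expressions are not reproduced here, and all function names are hypothetical.

```python
import math


def weibull_pdf(x, shape, scale):
    """Stand-in pdf f(x); any other pdf could be plugged in instead."""
    z = (x / scale) ** shape
    return (shape / scale) * (x / scale) ** (shape - 1.0) * math.exp(-z)


def weibull_cdf(x, shape, scale):
    """Stand-in cdf F(x) matching weibull_pdf."""
    return 1.0 - math.exp(-((x / scale) ** shape))


def reliability(cdf, x):
    """Reliability (survival) function R(x) = 1 - F(x)."""
    return 1.0 - cdf(x)


def hazard(pdf, cdf, x):
    """Hazard rate h(x) = f(x) / R(x)."""
    return pdf(x) / reliability(cdf, x)


# Hazard of the stand-in model on a small grid, for two shape values.
grid = [0.5, 1.0, 1.5, 2.0]
decreasing = [hazard(lambda x: weibull_pdf(x, 0.5, 1.0),
                     lambda x: weibull_cdf(x, 0.5, 1.0), x) for x in grid]
increasing = [hazard(lambda x: weibull_pdf(x, 2.0, 1.0),
                     lambda x: weibull_cdf(x, 2.0, 1.0), x) for x in grid]
```

A Weibull hazard is only ever monotone, so this stand-in cannot show the bathtub-type shapes described above; it only illustrates how the hrf is computed once a pdf and cdf are available.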

3. Expansion of pdf for NGEFWD

In the following, we express the pdf of NGEFWD in equation (7) in an expanded form using the binomial expansion, defined for a positive real power $b$ as

$$(1 - z)^{b} = \sum_{j=0}^{\infty} (-1)^{j} \binom{b}{j} z^{j}, \qquad (9)$$

for $|z| < 1$ and $b > 0$ ($j$ is a nonnegative integer).

Specifically, applying the binomial expansion in equation (9) three times, the pdf of the NGEFWD can be rewritten aswhere

4. Statistical Properties

In this section, we derive some useful statistical properties of the NGEFWD.

4.1. The Quantile Function and Median

The quantile function of the NGEFWD is defined as

where u is a uniformly distributed random variable on (0, 1). Setting u = 0.25 or u = 0.75 gives the first quartile or the third quartile of the NGEFWD, respectively.
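When a closed-form quantile function is not at hand, quantiles can also be obtained by inverting the cdf numerically. The following Python sketch does this by bisection for a distribution on the positive half-line; the Weibull cdf is only a stand-in with a known quantile to check against, and the function names and the bracketing strategy are assumptions made for illustration.

```python
import math


def weibull_cdf(x, shape, scale):
    """Stand-in cdf on (0, inf) with a known closed-form quantile."""
    if x <= 0.0:
        return 0.0
    return 1.0 - math.exp(-((x / scale) ** shape))


def quantile_by_bisection(cdf, u, upper=1.0, iterations=200):
    """Approximate Q(u) = inf{x > 0 : F(x) >= u} for 0 < u < 1."""
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie strictly between 0 and 1")
    lower = 0.0
    # Grow the bracket until it contains the quantile.
    while cdf(upper) < u:
        upper *= 2.0
    # Halve the bracket a fixed number of times.
    for _ in range(iterations):
        middle = 0.5 * (lower + upper)
        if cdf(middle) < u:
            lower = middle
        else:
            upper = middle
    return 0.5 * (lower + upper)


def stand_in_cdf(x):
    return weibull_cdf(x, shape=1.5, scale=2.0)


quartiles = [quantile_by_bisection(stand_in_cdf, u) for u in (0.25, 0.50, 0.75)]
```

The middle entry of `quartiles` is the median, and the outer entries are the first and third quartiles of the stand-in distribution.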

The median of the NGEFWD is given aswhere .

4.2. The Galton Skewness and Moors Kurtosis

The Galton skewness (GS) measures the degree of the long tail (towards the left if GS < 0 or towards the right if GS > 0). It is defined in [39] as

$$GS = \frac{Q(6/8) - 2Q(4/8) + Q(2/8)}{Q(6/8) - Q(2/8)},$$

and the Moors kurtosis (MK) measures the degree of tail heaviness (if MK increases, the tail of the distribution becomes heavier). It is defined in [40] as

$$MK = \frac{Q(7/8) - Q(5/8) + Q(3/8) - Q(1/8)}{Q(6/8) - Q(2/8)},$$

where $Q(\cdot)$ is the quantile function in equation (12).

From Figure 3, the NGEFWD can be right skewed, and for fixed , the MK is a decreasing function of .
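Both measures depend on the distribution only through its quantile function, so they are easy to compute for any model whose quantiles are available. The Python sketch below uses the standard octile-based definitions of the Galton skewness and Moors kurtosis; the exponential and logistic quantile functions are stand-ins chosen because their skewness is known, and the function names are hypothetical.

```python
import math


def galton_skewness(q):
    """GS = [Q(6/8) - 2 Q(4/8) + Q(2/8)] / [Q(6/8) - Q(2/8)]."""
    return (q(0.75) - 2.0 * q(0.5) + q(0.25)) / (q(0.75) - q(0.25))


def moors_kurtosis(q):
    """MK = [Q(7/8) - Q(5/8) + Q(3/8) - Q(1/8)] / [Q(6/8) - Q(2/8)]."""
    numerator = q(7 / 8) - q(5 / 8) + q(3 / 8) - q(1 / 8)
    return numerator / (q(6 / 8) - q(2 / 8))


def exponential_quantile(u):
    """Quantile of a unit-rate exponential distribution (right skewed)."""
    return -math.log(1.0 - u)


def logistic_quantile(u):
    """Quantile of a standard logistic distribution (symmetric)."""
    return math.log(u / (1.0 - u))


gs_exponential = galton_skewness(exponential_quantile)
gs_logistic = galton_skewness(logistic_quantile)
mk_exponential = moors_kurtosis(exponential_quantile)
```

A positive GS, as obtained for the exponential stand-in, indicates a longer right tail, while the symmetric logistic stand-in gives a GS of essentially zero.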

4.3. The Moment

The moment of the NGEFWD can be obtained aswhere is defined in equation (11).

Then, the mean and variance of the NGEFWD are, respectively, given aswhere is defined in equation (11).

4.4. The Moment Generating Function and Characteristic Function

Based on the expansion of , the moment generating function can be calculated based on the moment of the NGEFWD aswhere is defined in equation (11).

Similarly, we can obtain the characteristic function based on moment of the NGEFWD aswhere is defined in equation (11).

4.5. Order Statistics

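Suppose $X_1, X_2, \ldots, X_n$ is a random sample from NGEFWD, where $X_{i:n}$ is the $i$th order statistic; then, the pdf of this order statistic is defined as

$$f_{i:n}(x) = \frac{n!}{(i-1)!\,(n-i)!}\, f(x)\,[F(x)]^{i-1}\,[1 - F(x)]^{n-i},$$

where f(x) and F(x) are the pdf and cdf of NGEFWD defined, respectively, in equations (7) and (6). By using the binomial expansion, we can write

$$[1 - F(x)]^{n-i} = \sum_{j=0}^{n-i} (-1)^{j} \binom{n-i}{j} [F(x)]^{j}.$$

Thus,

$$f_{i:n}(x) = \frac{n!}{(i-1)!\,(n-i)!} \sum_{j=0}^{n-i} (-1)^{j} \binom{n-i}{j}\, f(x)\,[F(x)]^{i+j-1}.$$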

Substituting by equations (6) and (7) and applying the binomial expansion in equation (9) four times, we obtainwhere

4.6. Rényi Entropy

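The Rényi entropy of a random variable represents a measure of variation of the uncertainty, and it is defined as

$$I_R(\rho) = \frac{1}{1-\rho} \log\left( \int_{-\infty}^{\infty} f^{\rho}(x)\,dx \right), \qquad \rho > 0,\ \rho \neq 1.$$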

The Rényi entropy of the NGEFWD can be given as

By using the binomial expansion in equation (9) three times, we getwhere

Thus,

Then, the Rényi entropy of the NGEFWD can be obtained as

5. Estimation of the NGEFWD Parameters

This section provides the estimation of the unknown parameters using the maximum likelihood technique, which is the most widely used estimation method. Let be a random sample from the NGEFWD with unknown parameters ; then, the log likelihood function is given as

By taking the first partial derivatives of the log likelihood function with respect to , we obtainandwhere .

The MLEs of the unknown parameters can then be obtained by solving the system of nonlinear equations (32)–(38) numerically. Alternatively, equation (31) can be maximized directly using an optimization routine in any software, such as the R statistical software.
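The idea of solving likelihood equations numerically can be illustrated on a much smaller problem. The Python sketch below is not the seven-equation NGEFWD system; it assumes a two-parameter Weibull model as a stand-in, profiles out the scale, and solves the remaining score equation for the shape by bisection. The function name, the search interval, and the simulated data are all assumptions made for illustration.

```python
import math
import random


def weibull_mle(sample, lo=1e-3, hi=50.0, iterations=200):
    """Return (shape, scale) MLEs for a two-parameter Weibull sample.

    Assumes the shape estimate lies between `lo` and `hi`.
    """
    n = len(sample)
    logs = [math.log(x) for x in sample]
    mean_log = sum(logs) / n

    def profile_score(shape):
        # Score equation for the shape once the scale is profiled out;
        # it is increasing in `shape`, so it has a single root.
        powers = [x ** shape for x in sample]
        weighted = sum(p * l for p, l in zip(powers, logs))
        return weighted / sum(powers) - 1.0 / shape - mean_log

    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if profile_score(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    shape = 0.5 * (lo + hi)
    # Closed-form scale estimate given the shape.
    scale = (sum(x ** shape for x in sample) / n) ** (1.0 / shape)
    return shape, scale


# Simulated data: scale 2.0 and shape 1.5 (weibullvariate takes scale, then shape).
random.seed(1)
sample = [random.weibullvariate(2.0, 1.5) for _ in range(2000)]
shape_hat, scale_hat = weibull_mle(sample)
```

For models with many parameters, such as the one considered here, a one-dimensional root search is no longer enough, and a general-purpose optimizer applied to the log likelihood is usually the more practical route.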

6. Simulation Studies for the NGEFWD

In this section, some simulation studies are performed to examine the accuracy of the MLEs of the NGEFWD. The results were obtained by generating samples from the NGEFWD with different sample sizes, , and 500, and with various cases for the true parameter values asCase I: .Case II: .Case III: .

The quantile function in equation (12) is applied to generate random samples from the NGEFWD where is uniformly distributed. The mean square error (MSE) and the root mean square error (RMSE) were computed for each parameter in order to investigate its accuracy using the following relations:where .where is the number of generated samples, is the size for each sample, is the MLE, and is the true value of each parameter.

From Table 1, it can be seen that when the sample size n increases, the MLEs become closer to the true value of parameters, and hence the MSE and RMSE decrease and tend to zero. The results demonstrate that the maximum likelihood method provides an accurate estimation of the parameters for the NGEFWD.
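The structure of such a simulation study can be sketched in a few lines of Python. The example below assumes a one-parameter exponential model as a stand-in for the NGEFWD: samples are drawn by applying the quantile function to uniform random numbers, the MLE is computed for each sample, and the MSE and RMSE are taken as the average squared deviation from the true value and its square root. The sample sizes, the number of replications, and the function names are illustrative choices, not those of the study.

```python
import math
import random


def exponential_quantile(u, rate):
    """Quantile function of the stand-in exponential model."""
    return -math.log(1.0 - u) / rate


def simulate(rate, n, replications, rng):
    """Return (mean estimate, MSE, RMSE) of the MLE over simulated samples."""
    estimates = []
    for _ in range(replications):
        # Inverse transform sampling: apply the quantile function to uniforms.
        sample = [exponential_quantile(rng.random(), rate) for _ in range(n)]
        estimates.append(n / sum(sample))  # MLE of the exponential rate
    mean_estimate = sum(estimates) / replications
    mse = sum((e - rate) ** 2 for e in estimates) / replications
    return mean_estimate, mse, math.sqrt(mse)


rng = random.Random(2024)
results = {n: simulate(2.0, n, 1000, rng) for n in (25, 100, 500)}
```

Each entry of `results` holds the mean estimate, MSE, and RMSE for one sample size, which is the same kind of summary that a table of simulation results reports.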

7. Applications for the NGEFWD

In this section, some applications of the NGEFWD are provided to illustrate its usefulness, using three real datasets. The goodness of fit of the NGEFWD is compared with some of its submodels and a related distribution. Specifically, the fit of NGEFWD is compared to the following distributions.(i)The Weibull distribution (WD) with pdf as(ii)The Fréchet–Weibull distribution (FWD) with pdf in equation (3).(iii)The exponentiated Fréchet–Weibull distribution (EFWD) with pdf as(iv)The exponentiated generalized Fréchet–Weibull distribution (EGFWD) with pdf as(v)The Kumaraswamy–Weibull–Burr XII distribution (KWBXIID) in [41] with pdf as

The comparison is based on several criteria, namely, the negative log likelihood function, the Akaike information criterion (AIC), the consistent Akaike information criterion (CAIC), the Bayesian information criterion (BIC), the Hannan–Quinn information criterion (HQIC), and the Kolmogorov–Smirnov (KS) statistic with its p-value.

The best model for the data is the one with the lowest values of AIC, CAIC, BIC, HQIC, and KS and the highest p-value. The MLEs of the model parameters were computed using the “optim” function in R. Furthermore, the observed frequencies for the data are plotted and compared with the expected frequencies for each model. Tables 2–4 summarize each dataset, while the results of the analyzed datasets are reported in Tables 5–7 and Figures 4–6.
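As a rough illustration of how such criteria are computed from a fitted model, the Python sketch below uses the commonly quoted forms AIC = 2k − 2ℓ, BIC = k log n − 2ℓ, and HQIC = 2k log(log n) − 2ℓ, where ℓ is the maximized log likelihood, k the number of parameters, and n the sample size, together with the one-sample KS distance. The CAIC is left out because its definition varies between authors and the form used here is not shown; the function names and the numbers in the example are hypothetical.

```python
import math


def information_criteria(loglik, k, n):
    """AIC, BIC, and HQIC from a maximized log likelihood."""
    return {
        "AIC": 2.0 * k - 2.0 * loglik,
        "BIC": k * math.log(n) - 2.0 * loglik,
        "HQIC": 2.0 * k * math.log(math.log(n)) - 2.0 * loglik,
    }


def ks_statistic(sample, cdf):
    """Largest gap between the empirical cdf and a fitted cdf."""
    xs = sorted(sample)
    n = len(xs)
    distance = 0.0
    for i, x in enumerate(xs, start=1):
        fitted = cdf(x)
        # Compare the fitted cdf with the empirical cdf just after and
        # just before the jump at x.
        distance = max(distance, i / n - fitted, fitted - (i - 1) / n)
    return distance


# Hypothetical fit: log likelihood -100 with 2 parameters and 50 observations.
criteria = information_criteria(loglik=-100.0, k=2, n=50)
ks = ks_statistic([0.1, 0.5, 0.9], cdf=lambda x: x)
```

Because every criterion adds a penalty to −2ℓ, a model with more parameters is preferred only when its gain in log likelihood outweighs that penalty.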

7.1. First Dataset

We will consider the dataset discussed in [42], which represents the ages of 155 patients suffering from breast tumors from June to October in 2014.

7.2. Second Dataset

This dataset, discussed in [43], contains the sums of skin folds of 202 athletes collected at the Australian Institute of Sport.

7.3. Third Dataset

Data from [44], representing the fatigue times of 6061-T6 aluminum coupons comprising 101 observations with maximum stress per cycle of 31,000 psi, is considered.

From Tables 5–7, it can be seen that the NGEFWD is the best model for all of the considered datasets, since it has the smallest AIC, CAIC, BIC, HQIC, and KS and the largest p-value. Also, from Figures 4–6, it is clear that the NGEFWD is the closest to the empirical distribution of each dataset. Thus, the NGEFWD can be considered the best of the compared models for all real datasets considered.

8. The Log New Generalized Exponentiated Fréchet–Weibull Regression Model

Assume that X is a random variable from the NGEFWD given in equation (7) and let . Then, the cdf and pdf of the log new generalized exponentiated Fréchet–Weibull (LNGEFW) regression model with the transformation parameters and can be expressed, respectively, aswhere are the scale parameters, are the shape parameters, and is the location parameter.

The survival function of the LNGEFW regression model is given as

The standardized random variable has the following pdf:with survival function as

Based on the LNGEFW regression model, a linear regression model can be defined aswhere is the random error with pdf in equation (47), , , , , , , and are unknown parameters, and is the explanatory variable vector. The parameter is the location of . Then, the location parameter vector is defined as a linear model , where is a known model matrix.

8.1. Maximum Likelihood Estimation of the LNGEFW Regression Model

For the right-censored lifetime data, let be random sample of n observations where each random response variable is obtained as . Let be the log-lifetime and be the log-censoring time which are independent and random; then, the likelihood function for the parameter vector is given aswhere is the indicator random variable. Then, the log likelihood function can be obtained aswhere and are given in equations (47) and (48), respectively. Thus, we havewhere denotes the number of uncensored observations and .

The MLE of the parameter vector can be obtained by maximizing the log likelihood function in equation (52). The “optim” function in R can be used to obtain the MLEs.
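The general shape of a right-censored log likelihood, in which uncensored observations contribute the log density and censored observations contribute the log survival function, can be sketched as follows in Python. The example assumes an exponential lifetime model on the original time scale as a stand-in, since the LNGEFW density and survival function are not reproduced here; the data values and function names are hypothetical.

```python
import math


def censored_loglik(times, events, log_pdf, log_surv):
    """Sum log f(t) over observed lifetimes and log S(t) over censored ones."""
    total = 0.0
    for t, event in zip(times, events):
        total += log_pdf(t) if event == 1 else log_surv(t)
    return total


def exponential_censored_loglik(rate, times, events):
    """Right-censored log likelihood of the stand-in exponential model."""
    return censored_loglik(
        times,
        events,
        log_pdf=lambda t: math.log(rate) - rate * t,
        log_surv=lambda t: -rate * t,
    )


# Hypothetical data: event = 1 for an observed lifetime, 0 for a censored time.
times = [2.0, 3.5, 1.2, 4.0, 0.7]
events = [1, 0, 1, 0, 1]

# For the exponential stand-in the maximizer has a closed form:
# number of observed lifetimes divided by the total time at risk.
rate_hat = sum(events) / sum(times)
loglik_hat = exponential_censored_loglik(rate_hat, times, events)
```

For a model without a closed-form maximizer, the same `censored_loglik` function would be passed, with its sign reversed, to a numerical optimizer.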

8.2. Residual Analysis

After the regression model has been formulated, it is important to perform a residual analysis to evaluate the adequacy of the fitted model and to detect outlying observations. In this study, we conducted residual analysis based on the martingale residual and the deviance residual.

8.2.1. Martingale Residual

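Barlow and Prentice [45] introduced the martingale residual as

$$r_{M_i} = \delta_i + \log S(y_i),$$

where $\delta_i$ is the censor indicator, with $\delta_i = 1$ if the observation is a lifetime and $\delta_i = 0$ if the observation is censored, and $S(\cdot)$ is the survival function.

The martingale residual of the LNGEFW regression model is obtained by evaluating this expression with the fitted survival function of the LNGEFW regression model. It takes values between $-\infty$ and $+1$ and has a skewed distribution.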

8.2.2. Deviance Residual

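Therneau et al. [46] defined the deviance residual, which reduces the skewness of the martingale residual and is more symmetrically distributed around zero, as

$$r_{D_i} = \operatorname{sign}(r_{M_i}) \left\{ -2 \left[ r_{M_i} + \delta_i \log\left(\delta_i - r_{M_i}\right) \right] \right\}^{1/2},$$

where $r_{M_i}$ is given in equation (54). The deviance residual of the LNGEFW regression model is obtained by substituting the martingale residual of the LNGEFW regression model into this expression.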
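For reference, both residuals can be computed from the censoring indicator and the fitted survival probability of each observation, using the standard forms $r_M = \delta + \log S(y)$ and $r_D = \operatorname{sign}(r_M)\{-2[r_M + \delta \log(\delta - r_M)]\}^{1/2}$. The Python sketch below is model-agnostic: it takes the fitted survival probability as an input rather than evaluating the LNGEFW survival function, and the function names and example values are hypothetical.

```python
import math


def martingale_residual(event, surv):
    """r_M = delta + log S(y), for a fitted survival probability 0 < surv < 1."""
    return event + math.log(surv)


def deviance_residual(event, surv):
    """r_D = sign(r_M) * sqrt(-2 * [r_M + delta * log(delta - r_M)])."""
    r_m = martingale_residual(event, surv)
    # The logarithmic term vanishes for censored observations (delta = 0).
    log_term = math.log(event - r_m) if event == 1 else 0.0
    sign = (r_m > 0) - (r_m < 0)
    # max(...) guards against tiny negative values caused by rounding.
    return sign * math.sqrt(max(0.0, -2.0 * (r_m + log_term)))


# Hypothetical fitted survival probabilities for three observations.
observations = [(1, 0.9), (1, 0.2), (0, 0.5)]
residuals = [
    (martingale_residual(d, s), deviance_residual(d, s)) for d, s in observations
]
```

Deviance residuals computed this way are typically plotted against the observation index, and values far outside the bulk of the residuals are flagged for a closer look.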

9. Simulation Studies for the Log New Generalized Exponentiated Fréchet–Weibull Regression Model

We conduct Monte Carlo simulation studies for various values of sample size , parameter values, and different censoring percentages to investigate the accuracy of the MLE in the LNGEFW regression model. The lifetimes are sampled from the NGEFWD in equation (7) considering the following reparametrization: and , and by taking , where is the explanatory variable generated from a standard uniform distribution. The considered values for the parameters areCase I: Case II:

Noninformative censoring is commonly used in different studies. The censoring times are generated from a uniform distribution , where the indicator random variable is given as

This is adjusted until the censoring percentages of 0.1, 0.3, and 0.5 are reached. The lifetimes considered in each fit are calculated as . This simulation was repeated times, and for each parameter, the mean estimate, MSE, and RMSE are calculated. The results are listed in Table 8.
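The adjustment of the censoring distribution to reach a target censoring percentage can be sketched as follows in Python. The example assumes exponential lifetimes as a stand-in, censoring times uniform on (0, τ), and a bisection search on τ with the underlying uniform draws held fixed; the function names, sample size, and seed are illustrative assumptions rather than the settings of the study.

```python
import random


def censoring_fraction(lifetimes, uniforms, tau):
    """Share of observations whose censoring time tau * u precedes the lifetime."""
    censored = sum(1 for t, u in zip(lifetimes, uniforms) if tau * u < t)
    return censored / len(lifetimes)


def calibrate_tau(lifetimes, uniforms, target, iterations=60):
    """Find an upper limit tau whose censoring fraction is close to target."""
    lo, hi = 0.0, 1.0
    # A larger tau means later censoring times and therefore less censoring.
    while censoring_fraction(lifetimes, uniforms, hi) > target:
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if censoring_fraction(lifetimes, uniforms, mid) > target:
            lo = mid
        else:
            hi = mid
    return hi


rng = random.Random(7)
lifetimes = [rng.expovariate(1.0) for _ in range(2000)]
uniforms = [rng.random() for _ in range(2000)]

tau = calibrate_tau(lifetimes, uniforms, target=0.3)
# Observed time is the smaller of lifetime and censoring time.
observed = [min(t, tau * u) for t, u in zip(lifetimes, uniforms)]
events = [1 if t <= tau * u else 0 for t, u in zip(lifetimes, uniforms)]
achieved = 1.0 - sum(events) / len(events)
```

Holding the uniform draws fixed makes the censoring fraction a monotone function of τ, which is what allows a simple bisection to hit the target percentage.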

From Table 8, it is shown that when sample sizes increase, the MSE and RMSE of estimates decrease and the estimates tend to the true values of the parameters. Also, when censoring levels increase, the MSE and RMSE of parameter estimates increase for the same sample size. The results indicate that the maximum likelihood method provides consistent estimation for the parameters of the LNGEFW regression model.

10. Applications for the Log New Generalized Exponentiated Fréchet–Weibull Regression Model

In this section, three real datasets are analyzed to illustrate the usefulness of the LNGEFW regression model. For the three applications, the maximum likelihood method is applied to obtain the estimates of the parameters of the LNGEFW regression model. The estimates and their standard errors (SEs) are calculated along with the AIC, CAIC, BIC, and HQIC to compare the LNGEFW regression model with some competitive models, namely, the log-Burr XII (LBXII) regression model in [28], the log Topp–Leone–Fréchet (LTLF) regression model in [47], and the log Topp–Leone generated Weibull (LTLGW) regression model in [48]. The estimates and their SEs are reported in Tables 9–11, while Tables 12–14 summarize the information criteria for each analyzed dataset.

10.1. Voltage Data

Lawless [49] introduced an experiment in which specimens of solid epoxy electrical insulation were considered in an accelerated voltage life test. The sample size of data is n = 60 with a percentage of 10% censored observations, and it has three levels of voltage: 52.5, 55.0, and 57.5 kV. The variables considered in the study are as follows: : failure times for epoxy insulation specimens, : censoring indicator (0 = censoring, 1 = lifetime observed), and : voltage (kV). The results are presented by the fitting modelwhere follows the NGEFWD.

10.2. Leukemia Data

Leukemia data are presented in [49]. These data contain information of 33 patients who were diagnosed with leukemia. The variables involved in the study are as follows: : survival time, : log survival time, : censoring indicator (0 = censoring, 1 = lifetime), : white blood cell characteristics test (0 = negative, 1 = positive), and : white blood cell count. The fit of the regression model is described aswhere follows the NGEFWD.

10.3. Stanford Heart Transplant Data

Kalbfleisch and Prentice [50] considered the Stanford heart transplant dataset. The dataset contains the survival time of 103 patients since acceptance into transplant program to death. The following variables are displayed in the study: : log survival time, : censoring indicator (0 = censoring, 1 = dead), : the age of patients, : the previous surgery (0 = No, 1 = Yes) and : the transplant (0 = No, 1 = Yes). The model fitted can be written aswhere follows the NGEFWD.

The results in Tables 12–14 show that the LNGEFW regression model has the smallest values of AIC, CAIC, BIC, and HQIC for the voltage, leukemia, and Stanford heart transplant data compared to the other competitive models. Therefore, the LNGEFW regression model provides the best fit to the three datasets among the models considered. Figures 7–9 show the deviance residuals against the index of the observations for all datasets. It can be noted that all observations fall within the interval (−3, 3), except observation 26 in Figure 9. Thus, observation 26 in Figure 9 is a possible outlier. In addition, from these figures, it can be seen that all points lie inside the envelope, which indicates that the LNGEFW regression model provides a good fit to all datasets.

11. Conclusions

Introducing flexible distributions is of great importance for modeling different real datasets more accurately. In addition, regression models are needed to analyze the effect of covariates on the data in numerous practical applications. Thus, in this article, the NGEFWD is proposed in order to accommodate the complex patterns of some datasets. Some useful statistical properties of the new distribution are derived. The maximum likelihood method is applied to estimate the model’s parameters. To examine these MLEs, some Monte Carlo simulation studies are conducted for different cases. The results indicate that the proposed estimators perform well and that the estimates improve as the sample size increases. This behavior agrees with the well-known asymptotic properties of the MLE, namely consistency, asymptotic normality, and efficiency. The suggested distribution can be applied in different areas, such as engineering, reliability, and many other fields involving real-life data. Hence, the usefulness of the new distribution is examined by analyzing three real datasets. It has been observed that the NGEFWD consistently provides a better and more accurate fit than the competing models considered. Moreover, based on the NGEFWD, the log-location-scale technique is applied to introduce the LNGEFW regression model. The maximum likelihood method for right-censored data is used to estimate the parameters of the LNGEFW regression model. Some simulation studies with various values of parameters, sample size, and censoring percentage are considered to assess the performance of these estimators. Based on the results of two Monte Carlo simulation studies conducted for the LNGEFW regression model, the MLEs provided consistently good results. The LNGEFW regression model performed very well when applied to three real-world datasets and provided the best fits among the competitor regression models based on the information criteria. Therefore, it can be considered the most appropriate model among those compared. Hence, the NGEFWD and its extension, the LNGEFW regression model, are expected to attract attention in various applied sciences due to their suitability and flexibility. Further studies could be conducted by using other methods of estimation, such as the method of moments, and different regression techniques, such as quantile regression.

Data Availability

The references for the data used to support the findings of this study are cited within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.