Abstract

In recent years, the development of new families of probability distributions has received greater attention because of the desirable properties they exhibit in the modelling of data sets. The Harmonic Mixture Weibull-G family of distributions was developed in this study. The statistical properties were comprehensively presented and five special distributions were developed from the family. The hazard functions of the special distributions were shown to exhibit various monotone and nonmonotone shapes. The applications of the developed family to real data sets in medical studies revealed that one special distribution (the Harmonic mixture Weibull Weibull distribution) provided a better fit to the data sets than other competing models. A location-scale regression model was developed from the family and its application was demonstrated using survival time data of hypertensive patients.

1. Introduction

Advances in the field of medicine are critical to the well-being of humankind. To this end, the need for the use of appropriate and very efficient probability distributions in the modelling of medical data is fundamentally important. The efficient modelling of medical data is useful in providing good understanding of the distribution of disease incidence and prevalence in medical studies.

In medical and biological studies, several phenotypic traits including chronic conditions such as cancer, diabetes, hypertension, and cardiovascular diseases among others are usually encountered. Appropriate knowledge about the distribution of disease incidence and prevalence in a population enhances the development of appropriate hypotheses about underlying mechanisms of health and disease [1]. This is profoundly important in advancing the course of medicine.

In medical and biological studies, the Weibull distribution is, among numerous classical distributions, a widely applied model for analyzing data with monotone hazard rate shapes. For complex biological phenotypic traits with nonmonotone hazard rate shapes, the Weibull distribution does not have the flexibility to model such data. Consequently, new families of distributions in the form of extended or modified versions of the Weibull distribution have been introduced in the literature in an attempt to increase its flexibility. Some examples include the following: the Marshall-Olkin Weibull generated family [2], the exponentiated power generalized Weibull power series family of distributions [3], the complementary generalized power Weibull power series family of distributions [4], the Burr-Weibull power series family [5], the extended Weibull-G family [6], the Weibull Burr X–G family of distributions [7], the Weibull Marshall–Olkin family [8], the gamma-Weibull-G family [9], the generalized odd Weibull generated family [10], the beta Weibull-G family [11], the Kumaraswamy Weibull-generated family [12], the generalized extended Weibull power series family of distributions [13], the inverse Weibull power series family [14], the Marshall-Olkin extended Weibull family of distributions [15], and the extended Weibull power series family [16].

In this study, we propose a novel generalization of the Weibull-G family, called the Harmonic mixture Weibull-G family, by combining the Harmonic mixture-G [17] and the Weibull-G [18] families. The major motivations behind generating this family include the following: to develop special distributions capable of modelling medical data that are characterized by bimodality; to generate distributions capable of modelling medical data that are characterized by both monotone and nonmonotone hazard rate shapes; to produce special distributions that generalize some well-known models in the literature; to generate more flexible distributions that take into consideration skewness, kurtosis, and tail variations in the modelling of medical data; to develop alternative distributions that give better parametric fits to data in medical studies than existing classical distributions; and to develop a location-scale regression model for studying the relationship between a response variable and a set of covariates.

The remainder of the article is structured as follows: Section 2 presents the development of the new family. Section 3 presents some statistical properties of the family. Section 4 presents the maximum likelihood estimation of the parameters. Section 5 presents some special distributions. Section 6 presents the location-scale regression model. Section 7 presents simulation results. Section 8 presents the applications of the developed family. Section 9 finally presents the conclusions of the study.

2. Development of the Harmonic Mixture Weibull-G Family

Suppose that the continuous random variable follows the Weibull-G family of distributions. Then, according to Bourguignon et al. [18], the cumulative distribution function (CDF) and probability density function (PDF) are, respectively, given by and

If the random variable follows the Harmonic mixture-G (HM-G) family, then according to Kharazmi et al. [17], the CDF and PDF are, respectively, given by and

The Harmonic mixture Weibull generated (HMW-G) family of distributions is developed in this section by combining the CDFs of the HM-G and Weibull-G families. Suppose that the random variable follows the HMW-G family of distributions, the CDF of the HMW-G family is given by where and are scale parameters, is a shape parameter, and is a vector of parameters. When , the HMW-G family reduces to the Marshall-Olkin Weibull-G family of distributions. The corresponding PDF of the HMW-G family is the first derivative of its CDF. Thus, the PDF is given by

The hazard rate function of the family is given by

Lemma 1. The mixture representation of the density function of the HMW-G family is where

Proof. Using the binomial series expansion, Thus, the PDF of the HMW-G family can be written as Using the Taylor series, we have Applying the binomial series expansion, Thus, This completes the proof.

3. Statistical Properties

In this section, statistical properties of the HMW-G family of distributions are presented.

3.1. Quantile Function

The quantile function plays an important role in simulating random samples from a given distribution. For a given distribution, the characteristics such as median, kurtosis, and skewness can also be described using the quantile function.

Proposition 2. The quantile function of the HMW-G family for is given by

Proof. Using the CDF of the HMW-G family defined in equation (5), let be a random variable having the uniform distribution on the interval [0, 1]. Then, the quantile, denoted by is obtained such that, This implies that Hence, the quantile is obtained as the solution of This completes the proof.
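The role of the quantile function in simulation and in quantile-based shape measures can be illustrated with a minimal sketch. Because the closed form of the HMW-G quantile is not reproduced here, the sketch assumes a two-parameter Weibull baseline as a stand-in; the function names `weibull_quantile`, `sample_by_inversion`, and `quantile_skewness` are hypothetical and are not taken from the paper. The same pattern would apply to any family whose quantile function can be evaluated.

```python
import math
import random


def weibull_quantile(u, shape, scale):
    """Quantile of a two-parameter Weibull (assumed stand-in for the HMW-G quantile)."""
    if not 0.0 <= u < 1.0:
        raise ValueError("u must lie in [0, 1)")
    return scale * (-math.log1p(-u)) ** (1.0 / shape)


def sample_by_inversion(quantile, n, rng, **params):
    """Draw n variates by evaluating the quantile function at Uniform(0, 1) draws."""
    return [quantile(rng.random(), **params) for _ in range(n)]


def quantile_skewness(quantile, **params):
    """Bowley skewness, a shape measure computed from the three quartiles only."""
    q1, q2, q3 = (quantile(p, **params) for p in (0.25, 0.5, 0.75))
    return (q3 - 2.0 * q2 + q1) / (q3 - q1)


rng = random.Random(2024)
draws = sample_by_inversion(weibull_quantile, 5000, rng, shape=1.5, scale=2.0)

# The median follows directly from the quantile function at u = 0.5.
median_exact = weibull_quantile(0.5, shape=1.5, scale=2.0)
median_sample = sorted(draws)[len(draws) // 2]
skewness = quantile_skewness(weibull_quantile, shape=1.5, scale=2.0)
```

The sample median of the simulated draws should lie close to the exact median returned by the quantile function, which is a quick sanity check on any inversion sampler.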

3.2. Moments

In this section, the expression for the order moment of the HMW-G family of distributions is derived. It can be used to compute measures of dispersion, kurtosis, and skewness of data sets in medical studies.

Proposition 3. The noncentral moment of the HMW-G family of distributions is

Proof. The noncentral moment is defined as Substituting the mixture representation of the density function into the definition, we have This implies that This completes the proof.

3.3. Incomplete Moment

In this section, the expression for the incomplete moment of the HMW-G family of distributions is derived. It can be used to determine the mean deviation or median deviation of data sets in medical studies.

Proposition 4. The incomplete moment of the HMW-G family of distributions is

Proof. By definition, the incomplete moment is given by Substituting the mixture representation of the density function into the definition, we have This implies that

3.4. Moment Generating Function

In this section, the expression for the moment generating function (MGF) of the HMW-G family of distributions is presented. The MGF is useful in finding the moments of a random variable. The MGF of a random variable having the HMW-G distribution if it exists is given by the following proposition.

Proposition 5. The MGF of the HMW-G family of distribution is given by

Proof. By definition, the MGF is given by Using the Taylor series expansion, This implies that Substituting into equation (30) gives This completes the proof.

3.5. Mean Residual Life

The Mean Residual Life (MRL) function is a function that characterizes the distribution function uniquely [19]. It describes the average survival time of a component after it exceeds a specific time. The MRL function plays a key role in survival analysis when analyzing the event time of a given phenotypic trait in medical studies.

Proposition 6. If is a random variable representing the life time of a component with distribution function , then the MRL of the HMW-G family of distributions is where .

Proof. By definition, the MRL is given by Hence, substituting the first incomplete moment, into equation (33) gives This completes the proof.
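As a numerical cross-check of an MRL expression, the standard identity m(t) = E[X − t | X > t] = (1/S(t)) ∫_t^∞ S(x) dx can be evaluated by quadrature. The sketch below is only illustrative: it assumes a user-supplied survival function, truncates the integral at a finite upper limit, and uses exponential and Weibull survival functions as stand-ins for the HMW-G survival function. The name `mean_residual_life` is hypothetical.

```python
import math


def mean_residual_life(survival, t, upper, steps=20000):
    """Approximate m(t) = (1 / S(t)) * integral from t to `upper` of S(x) dx.

    The trapezoidal rule is used, and `upper` must be large enough that the
    survival function is negligible beyond it (an assumption of this sketch).
    """
    h = (upper - t) / steps
    area = 0.5 * (survival(t) + survival(upper))
    area += sum(survival(t + i * h) for i in range(1, steps))
    return area * h / survival(t)


# Exponential stand-in: the MRL is constant and equal to 1 / rate.
rate = 0.5
mrl_exponential = mean_residual_life(lambda x: math.exp(-rate * x), t=3.0, upper=103.0)

# Weibull stand-in with shape 2 (increasing hazard): the MRL decreases with t.
mrl_at_0 = mean_residual_life(lambda x: math.exp(-x ** 2), t=0.0, upper=20.0)
mrl_at_1 = mean_residual_life(lambda x: math.exp(-x ** 2), t=1.0, upper=20.0)
```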

3.6. Identifiability

The identifiability property of the HMW-G family is studied in this section. The identifiability property of the model is essential to ensure that precise inferences are possible.

Proposition 7. Let be HMW-G family random variable with CDF and be HMW-G family random variable with CDF . Then, the HMW-G family is identifiable if and .

Proof. For the HMW-G family of distributions to be identifiable, Hence, If and , Hence, the identifiability condition is satisfied. This completes the proof.

4. Parameter Estimation

In this section, the maximum likelihood estimation (MLE) procedure is presented for the estimation of the unknown parameters of the HMW-G family. Let be a random sample of size from the HMW-G family of distributions with an unknown parameter vector where is a parameter vector for the baseline distribution. Under these settings, the total log-likelihood function is

The score vectors of the likelihood function are obtained by taking partial derivatives of (35) with respect to the parameters as

By setting the score vectors to zero, the simultaneous solution of the system of nonlinear equations gives the maximum likelihood estimates of the parameters. However, this nonlinear system of equations does not have a closed form. Thus, we apply numerical optimization to maximize the log-likelihood function directly using R software.
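The idea of maximizing a log-likelihood numerically, rather than solving the score equations in closed form, can be sketched as follows. This is not the authors' R code: it assumes a two-parameter Weibull model as a stand-in for the HMW-G family, profiles out the scale parameter (which has a closed-form maximizer for a fixed shape), and maximizes the resulting one-dimensional profile log-likelihood by golden-section search. All function names are hypothetical.

```python
import math
import random


def weibull_profile_loglik(shape, data):
    """Weibull log-likelihood with the scale replaced by its closed-form maximizer."""
    n = len(data)
    mean_pow = sum(x ** shape for x in data) / n      # equals scale ** shape at the maximum
    sum_log = sum(math.log(x) for x in data)
    return n * math.log(shape) - n * math.log(mean_pow) + (shape - 1.0) * sum_log - n


def golden_section_max(func, lo, hi, tol=1e-8):
    """Maximize a unimodal function on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = func(c), func(d)
    while b - a > tol:
        if fc > fd:                 # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = func(c)
        else:                       # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = func(d)
    return (a + b) / 2.0


def fit_weibull(data):
    """Return (shape, scale, maximized log-likelihood) for positive data."""
    shape = golden_section_max(lambda k: weibull_profile_loglik(k, data), 0.05, 20.0)
    scale = (sum(x ** shape for x in data) / len(data)) ** (1.0 / shape)
    return shape, scale, weibull_profile_loglik(shape, data)


rng = random.Random(7)
# random.weibullvariate takes (scale, shape) in that order.
data = [rng.weibullvariate(2.0, 1.5) for _ in range(3000)]
shape_hat, scale_hat, loglik_hat = fit_weibull(data)
```

For a family with several parameters and no profiling shortcut, a general-purpose multivariate optimizer would take the place of the one-dimensional search, with the negative log-likelihood as the objective.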

5. Special Distributions

In this section, some special cases of the HMW-G family of distributions are developed and studied.

5.1. Harmonic Mixture Weibull Burr III Distribution

Suppose that the baseline model of the HMW-G family is the Burr III distribution with CDF and PDF, respectively, defined by Burr [20] as and . Then, the PDF and hazard function of the HMW-Burr III (HMWBIII) distribution are given, respectively, by where , and

The density plot of the HMWBIII distribution exhibited a variety of shapes, such as reverse J-shape, J-shape, right skewed, various forms of symmetric, and left skewed shapes, as shown in Figure 1. The hazard rate function also showed varied shapes, such as upside-down bathtub, monotone decreasing, and various forms of monotone increasing failure rates, for some selected parameter values.

The quantile function for the HMWBIII distribution is given by

5.2. Harmonic Mixture Weibull Lomax Distribution

Considering the Lomax distribution as the baseline model with CDF and PDF, respectively, defined by Lomax [21] as and , the PDF of the Harmonic mixture Weibull Lomax (HMWL) distribution is given by where . The hazard function is given by

The density function of the HMWL distribution exhibited a wide variety of shapes, which include J-shape, reverse J-shape, right skewed, left skewed, and different forms of symmetric shapes, as shown in Figure 2. The hazard rate function also exhibited varied shapes, such as upside-down bathtub, monotone decreasing, and different forms of monotone increasing failure rates.

The quantile function for the HMWL distribution is given by

5.3. Harmonic Mixture Weibull Weibull Distribution

Consider that the Weibull distribution is a baseline model with CDF and PDF, respectively, defined by Weibull [22] as and . Then, the PDF of the Harmonic mixture Weibull Weibull (HMWW) distribution is given by where . The hazard function is given by

The density plot of the HMWW distribution showed a wide variety of shapes, such as reverse J-shape, J-shape, right skewed, left skewed, and symmetric (with various levels of kurtosis), as shown in Figure 3. The hazard rate function also showed varying shapes, such as bathtub, upside-down bathtub, monotone decreasing, and various forms of monotone increasing failure rates, for some selected parameter values.

The quantile function for the Harmonic mixture Weibull Weibull distribution is given by

5.4. Harmonic Mixture Weibull Fréchet Distribution

Considering the Fréchet distribution as a baseline model with CDF and PDF, respectively, defined by Fréchet [23] as and , the PDF of the Harmonic mixture Weibull Fréchet (HMWF) distribution is given by where . The hazard function is given by

The density plot of the HMWF distribution exhibited a wide variety of shapes, such as J-shape, reverse J-shape, right skewed, left skewed, and various forms of symmetric shapes, as shown in Figure 4. The hazard rate function also showed varying shapes, such as upside-down bathtub, monotone decreasing, and different forms of monotone increasing failure rates, for some selected parameter values.

The quantile function for the Harmonic mixture Weibull Fréchet distribution is given by

5.5. Harmonic Mixture Weibull Normal Distribution

Consider that the Normal distribution is a baseline model with the CDF and PDF, respectively, given by and , , , . Then, the PDF of the Harmonic mixture Weibull Normal (HMWN) distribution is given by where . The hazard function is given by

The density plot of the HMWN distribution exhibited a very wide variety of shapes, such as unimodal right skewed, unimodal left skewed, symmetric (with different levels of kurtosis), J-shape, reverse J-shape, bimodal (with different levels of kurtosis), bimodal left skewed, and N-shapes, as shown in Figure 5. The hazard rate function also showed a wide variety of flexible shapes, such as bathtub, upside-down bathtub, various forms of modified upside-down bathtubs, monotone decreasing, and different forms of monotone increasing failure rates, for some selected parameter values, as shown in Figure 6.

The quantile function for the Harmonic mixture Weibull Normal distribution is given by

6. Log-HMWW Location-Scale Regression

In this section, the log-HMWW regression model is presented. Suppose the random variable follows the HMWW distribution, then follows the log Harmonic mixture Weibull Weibull (LHMWW) distribution. Let Following the given reparameterization, the density function of the LHMWW distribution is where is the location parameter, , , and are the scale parameters and is the shape parameter. The density plot of the LHMWW distribution exhibited varying shapes, such as J-shape, reverse J-shape, right skewed, symmetric, and left skewed shapes, as shown in Figure 7. Given these properties, the LHMWW distribution is capable of modelling right skewed, symmetric, and left skewed dependent variables with covariates in medical studies.

The corresponding survival function to (54) is given by

Suppose that is the standardized random variable, then the PDF is written as

By using the LHMWW density, we develop the LHMWW location-scale regression model with the following regression structure where is the location parameter which depends on a particular set of covariates, is a parameter vector for the regression model, is the set of covariates, and is the error term that follows the LHMWW distribution. The unknown parameters of the LHMWW regression model are estimated using the maximum likelihood estimation procedure. The log-likelihood function of the LHMWW regression model is given by where is the number of observations. By maximizing the log-likelihood function in (58), the parameter estimates of the LHMWW regression model are obtained. The adequacy of the regression model is evaluated by using the Cox-Snell residuals [24]. The Cox-Snell residuals of the LHMWW regression model are given by , where is defined as in (55). The Cox-Snell residuals plots are expected to follow the standard exponential distribution if the LHMWW regression model gives a good fit to a data set.
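The Cox-Snell check can be illustrated with a small sketch. Since the LHMWW survival function is not reproduced here, the sketch assumes the smallest extreme value (log-Weibull) distribution as a stand-in error law in a location-scale model with one binary covariate; the coefficient values and all function names are hypothetical. The residual is computed as minus the logarithm of the fitted survival function, and under a correctly specified model these residuals behave like a standard exponential sample (mean close to 1).

```python
import math
import random


def log_weibull_survival(y, mu, sigma):
    """Survival function of log T when T is Weibull (assumed stand-in for the LHMWW)."""
    return math.exp(-math.exp((y - mu) / sigma))


def cox_snell_residuals(y, x, beta0, beta1, sigma):
    """r_i = -log S(y_i | x_i) under the location model mu_i = beta0 + beta1 * x_i."""
    return [
        -math.log(log_weibull_survival(yi, beta0 + beta1 * xi, sigma))
        for yi, xi in zip(y, x)
    ]


# Simulate from the assumed model: y = beta0 + beta1 * x + sigma * log(E), E ~ Exp(1).
rng = random.Random(11)
beta0, beta1, sigma = 3.0, -0.4, 0.5          # hypothetical values for illustration
x = [rng.randint(0, 1) for _ in range(2000)]  # binary covariate
y = [beta0 + beta1 * xi + sigma * math.log(rng.expovariate(1.0)) for xi in x]

residuals = cox_snell_residuals(y, x, beta0, beta1, sigma)
mean_residual = sum(residuals) / len(residuals)
```

In practice the fitted parameter estimates would replace the true values used here, and the ordered residuals would be compared with standard exponential quantiles in a P-P or Q-Q plot.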

7. Simulation

In this section, the finite sample properties of the maximum likelihood estimators of the parameters are investigated using Monte Carlo simulations. The simulations were performed using the estimators of the HMWBIII distribution, and the quantile function of the HMWBIII distribution was used to generate the random samples. The simulation experiment was replicated 1000 times for each of the sample sizes 50, 100, 200, 300, and 600 with parameter values I: (0.8, 0.3, 0.7, 0.5, 0.1), II: (0.9, 0.2, 0.6, 0.5, 0.2), III: (0.1, 0.3, 0.8, 1.2, 0.3), and IV: (0.9, 0.2, 0.5, 0.9, 0.1). The average estimate (AE), the average bias (AB), the root mean square error (RMSE), and the coverage probability (CP) were used to assess the performance of the estimators of the parameters. Generally, the AE values converge to the actual parameter values and the RMSE decreases as the sample size increases. The AB values also converge to zero as the sample size increases, as shown in Table 1 and Table 2. The CP values for most of the estimators lie close to the nominal value of 0.975. These characteristics indicate that the maximum likelihood method works effectively in estimating the parameters of the developed family. They also suggest that the estimators of the developed family are asymptotically consistent, efficient, and unbiased.
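The four performance measures can be made concrete with a short sketch. To keep it self-contained, the sketch assumes an exponential model, whose rate has a closed-form maximum likelihood estimator, as a stand-in for the HMWBIII distribution. It also assumes a Wald interval with z = 1.959964 (a 95% nominal level), which is a choice of this sketch and can be changed through the `z` argument. The function name `simulate_mle_performance` is hypothetical.

```python
import math
import random


def simulate_mle_performance(true_rate, n, replications, seed, z=1.959964):
    """Monte Carlo summary (AE, AB, RMSE, CP) for the exponential-rate MLE."""
    rng = random.Random(seed)
    estimates, covered = [], 0
    for _ in range(replications):
        sample = [rng.expovariate(true_rate) for _ in range(n)]
        rate_hat = n / sum(sample)              # closed-form MLE of the rate
        std_err = rate_hat / math.sqrt(n)       # from the information n / rate**2
        if rate_hat - z * std_err <= true_rate <= rate_hat + z * std_err:
            covered += 1
        estimates.append(rate_hat)
    ae = sum(estimates) / replications                   # average estimate
    ab = ae - true_rate                                  # average bias
    rmse = math.sqrt(sum((e - true_rate) ** 2 for e in estimates) / replications)
    cp = covered / replications                          # coverage probability
    return {"AE": ae, "AB": ab, "RMSE": rmse, "CP": cp}


# One summary per sample size, 1000 replications each.
results = {n: simulate_mle_performance(2.0, n, 1000, seed=n) for n in (50, 200, 600)}
```

With a setup of this kind, the RMSE is expected to shrink as the sample size grows, and the CP should stay near the nominal level chosen for the interval.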

8. Applications of the HMW-G Family

The applications of the special distributions (HMWL and HMWW) of the HMW-G family to real data sets in medical studies are illustrated in this section. To this end, the special distributions of the family are fitted to real data sets and their performances are compared with those of other competing distributions, including the generalized inverse Weibull (GIW) distribution [25], the odd generalized exponential Weibull (OGEW) distribution [26], the generalized odd inverse exponential Weibull (GOIEW) and generalized odd inverse exponential Lomax (GOIEL) distributions [27, 28], and the exponentiated Lomax (E-Lx) distribution [29]. The total time on test (TTT) plot due to Aarset [30] is used in assessing the applicability of the special distributions to the real data sets. Goodness of fit tests such as the Anderson-Darling (AD) test, the Cramér-von Mises (CVM) test, and the Kolmogorov-Smirnov (K-S) test, as well as the Akaike information criterion (AIC), corrected AIC (AICc), Bayesian information criterion (BIC), and the log-likelihood, are used to assess the performances of the fitted distributions. The p-values of the K-S test are also provided. The model with the smallest values of the goodness of fit measures and the highest value of the log-likelihood is taken as the best-fitting model for the data set.
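The three information criteria are simple functions of the maximized log-likelihood, the number of estimated parameters k, and the sample size n: AIC = 2k − 2ℓ, AICc = AIC + 2k(k + 1)/(n − k − 1), and BIC = k ln(n) − 2ℓ. A minimal sketch follows; the function name and the numerical inputs are hypothetical and are not values from the paper's tables.

```python
import math


def information_criteria(loglik, n_params, n_obs):
    """Return AIC, AICc, and BIC for a fitted model (smaller is better)."""
    if n_obs - n_params - 1 <= 0:
        raise ValueError("AICc requires n_obs > n_params + 1")
    aic = 2.0 * n_params - 2.0 * loglik
    aicc = aic + (2.0 * n_params * (n_params + 1)) / (n_obs - n_params - 1)
    bic = n_params * math.log(n_obs) - 2.0 * loglik
    return {"AIC": aic, "AICc": aicc, "BIC": bic}


# Hypothetical inputs: a five-parameter model fitted to 40 observations.
criteria = information_criteria(loglik=-100.0, n_params=5, n_obs=40)
```

Because AICc and BIC penalize additional parameters more heavily than AIC for small samples, a richer model must improve the log-likelihood noticeably to be preferred on all three criteria.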

8.1. First Application

The first data set represents the ordered survival times of blood cancer patients. The data are reported in Abouammoh et al. [31] and can also be found in Amadu [27] and Amadu et al. [28]. The ordered survival times for 40 patients are given in Table 3.

The TTT plot in Figure 8 indicates that the blood cancer data exhibit an increasing failure rate and hence, the HMW-G family is appropriate to fit the data set.
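For readers who wish to reproduce a TTT plot, the scaled TTT transform at i/n is the sum of the i smallest observations plus (n − i) times the i-th order statistic, divided by the sum of all observations. A curve that is concave and lies above the diagonal points to an increasing failure rate. The sketch below computes the plotting coordinates only; the function name `scaled_ttt` and the toy data are hypothetical and are not the paper's data.

```python
def scaled_ttt(data):
    """Return the points (i / n, scaled TTT value) for i = 1, ..., n."""
    ordered = sorted(data)
    n = len(ordered)
    total = sum(ordered)
    points = []
    running = 0.0
    for i, value in enumerate(ordered, start=1):
        running += value                      # sum of the i smallest observations
        points.append((i / n, (running + (n - i) * value) / total))
    return points


# Toy data for illustration only.
ttt_points = scaled_ttt([3.0, 1.0, 2.0])
```

Plotting these points together with the 45-degree line gives the usual TTT display.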

In Table 4, the maximum likelihood parameter estimates of the fitted distributions are presented.

Table 5 presents the goodness of fit measures of the fitted distributions on the blood cancer data. The results generally show that the HMWW and HMWL distributions provide better fits to the blood cancer data than the other competing models with the HMWW distribution being the overall best fitted model.

Figure 9 shows the density and CDF plots of the fitted models. The plots confirm that the HMWW distribution provides a better fit to the data than the other competing models.

8.2. Second Application

The second data set represents the survival times (life lengths in years) until onset of diabetes from a random sample of 105 patients obtained from the Bolgatanga Regional Hospital in the Upper East region of Ghana. The data set is shown in Table 6.

The TTT plot in Figure 10 indicates that the diabetes data exhibit an increasing failure rate and hence, the HMW-G family is appropriate to fit the data set.

In Table 7, the maximum likelihood estimates of the parameters of the fitted distributions are presented.

Table 8 presents the goodness of fit measures of the fitted models. The results show that the special distributions (HMWW and HMWL) of the HMW-G family generally provide better fits to the diabetes data than the other competing models with the HMWW distribution being the overall best fitted model.

Figure 11 shows the density and CDF plots of the fitted models. The plots confirm that the HMWW and HMWL distributions of the HMW-G family generally provide better fits to the data than the other competing models.

8.3. Third Application

The third data set represents the survival times (life lengths in years) until onset of hypertension from a random sample of 119 patients obtained from the Bolgatanga Regional Hospital in the Upper East region of Ghana. The data set is shown in Table 9.

The TTT plot in Figure 12 indicates that the hypertension data exhibit an increasing failure rate and therefore, the HMW-G family is appropriate to fit the data set.

Table 10 shows the maximum likelihood estimates of the parameters of the fitted distributions.

The goodness of fit measures of the fitted models on the hypertension data are given in Table 11. The results show that the HMWW and HMWL distributions of the HMW-G family generally provide better fits to the data than the other competing models with the HMWW distribution being the overall best fitted model.

The density and CDF plots of the fitted models are shown in Figure 13. The plots confirm that the HMWW distribution provides a better fit to the data than the other competing models.

8.4. Fourth Application

In this section, the application of the LHMWW location-scale regression model is demonstrated by modelling a real data set. The data set obtained from the Bolgatanga Regional Hospital in the Upper East Region of Ghana represents the survival times (life lengths in years) until onset of hypertension from a random sample of 119 patients with gender as a covariate. The gender (, ) is presented in brackets for each survival time. The data set is given in Table 12.

The dependent variable, time until the onset of hypertension , is modelled with gender (, ) as the covariate. To this end, the following regression model is fitted to the data set where follows the LHMWW distribution. The performance of the LHMWW regression model was assessed by comparing it with the log Marshall-Olkin Weibull Weibull (LMOWW) regression model. The parameter estimates of the regression models are presented in Table 13. The goodness of fit measures of the regression models show that the LHMWW regression model performs better than the LMOWW regression model. From the parameter estimates of the LHMWW regression model, gender is statistically significant at the 5% level of significance. Thus, the LHMWW regression results show that the time until the onset of hypertension is not the same in males and females, as evidenced by the significant influence of gender on the survival times. This points to gender differences in the time until the onset of hypertension. The finding is useful and consistent with that of Paresh et al. [32].

The likelihood ratio test (LRT) was also performed to compare the LHMWW regression model and the LMOWW regression model. The LRT statistic of 7.8791 with a p-value of 0.0050 showed that the LHMWW regression model performs better than the LMOWW regression model.
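The reported tail probability can be checked with a few lines of code. The sketch assumes that the two regression models differ by a single parameter restriction, so that the LRT statistic is referred to a chi-square distribution with one degree of freedom; this degree of freedom is an assumption of the sketch and is not stated explicitly in the text. For one degree of freedom, the upper-tail probability is erfc(sqrt(s/2)). The function names are hypothetical.

```python
import math


def lrt_statistic(loglik_full, loglik_reduced):
    """Twice the gain in maximized log-likelihood of the full over the reduced model."""
    return 2.0 * (loglik_full - loglik_reduced)


def lrt_p_value_one_df(statistic):
    """Upper-tail chi-square(1) probability: P(Z**2 > s) = erfc(sqrt(s / 2))."""
    if statistic < 0.0:
        raise ValueError("the LRT statistic must be non-negative")
    return math.erfc(math.sqrt(statistic / 2.0))


# Statistic reported in the text; one degree of freedom is assumed.
p_value = lrt_p_value_one_df(7.8791)
```

Under this assumption the computed tail probability rounds to 0.0050, in line with the value quoted above.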

The Cox-Snell residuals were used to assess the adequacy of the LHMWW regression model. The P-P plot results in Figure 14 show that the LHMWW regression model provides a very good fit to the data set and therefore can be adequately applied for modelling real life data in medical studies.

9. Conclusion

A new family of probability distributions called the Harmonic mixture Weibull-G (HMW-G) family of distributions is introduced in this work. The statistical properties of the family including quantile function, moments, incomplete moment, moment generating function, and mean residual life were comprehensively derived. Five special distributions (HMWBIII, HMWL, HMWW, HMWF, and HMWN) of the family were developed and studied. The density plots of the special distributions showed a wide variety of very attractive shapes making them very suitable for modelling bimodal data sets as well as left skewed, right skewed, and symmetric data sets in medical studies. The hazard function plots also showed a wide variety of shapes making the family very suitable for modelling data with both monotone and nonmonotone failure rates. The maximum likelihood method was used in estimating the parameters of the HMW-G family. The performance of the maximum likelihood estimators was assessed using Monte-Carlo simulation studies. The LHMWW location-scale regression model was developed to investigate the effect of covariates on a response variable that follows the LHMWW distribution. The usefulness of the HMW-G family was demonstrated with applications to real data sets in medicine. The applications empirically showed that the HMWW distribution provides a better fit to the given data sets than the other competing models. Finally, the application of the regression model showed that the LHMWW regression model provided a very good fit to the given data and hence can be adequately applied for modelling data in medical studies. As part of our future studies, sensitivity analysis of the regression model will be performed.

Data Availability

The (blood cancer, diabetes, and hypertension) data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this work.