Abstract

Statistical distributions play a prominent role in applied sciences, particularly in biomedical sciences. Medical data sets are generally skewed to the right, and skewed distributions can model such data effectively. In the present study, we therefore propose a new family of distributions for modeling right-skewed medical data sets, named the flexible reduced logarithmic-X family. The proposed family is obtained by reparameterizing the exponentiated Kumaraswamy G-logarithmic family and the alpha logarithmic family of distributions. A special submodel of the proposed family, called the flexible reduced logarithmic-Weibull distribution, is discussed in detail. Some mathematical properties of the proposed family and certain related characterization results are presented. The maximum likelihood estimators of the model parameters are obtained, and a brief Monte Carlo simulation study is carried out to evaluate their performance. Finally, for illustrative purposes, three applications from biomedical sciences are analyzed, and the goodness of fit of the proposed distribution is compared with that of some well-known competitors.

1. Introduction

The statistical analysis and modeling of lifetime phenomena are essential in almost all areas of applied sciences, particularly in biomedical sciences. A number of parametric continuous distributions for modeling lifetime data sets have been proposed in the literature, including the exponential, Rayleigh, gamma, lognormal, and Weibull distributions, among others. The exponential, Rayleigh, and Weibull distributions are more popular than the gamma and lognormal distributions because the survival functions of the gamma and lognormal distributions cannot be expressed in closed form, and hence both require numerical integration to arrive at their mathematical properties. The exponential and Rayleigh distributions are commonly used in lifetime analysis. These distributions, however, are not flexible enough to accommodate complex forms of data. For example, the exponential distribution can model data only with a constant failure rate function, whereas the Rayleigh distribution can model data only with an increasing failure rate function. The Weibull distribution, also known as the super exponential distribution, is more flexible than the aforementioned distributions. It offers the characteristics of both the exponential and Rayleigh distributions and is capable of modeling data with a monotonic (increasing, decreasing, or constant) hazard rate function. Unfortunately, the Weibull distribution is not capable of modeling data with a nonmonotonic (unimodal, modified unimodal, or bathtub-shaped) failure rate function. In some medical situations, for example, neck cancer, bladder cancer, and breast cancer, the hazard rate has been shown to have a unimodal or modified unimodal shape. In particular, the hazard rates for neck, bladder, and breast cancer recurrence after surgical removal have been observed to have a unimodal shape.
In the very initial phase, the hazard rate for cancer recurrence begins at a low level, increases gradually for a finite period after the surgical removal until it reaches a peak, and then decreases. Another example of the unimodal shape is the hazard of infection with some new viruses, which increases in the early stages from a low level until it reaches a peak and then decreases; for details, see [1]. In view of the importance of the unimodal failure rate function in biomedical sciences, a series of papers has appeared proposing new distributions capable of modeling medical data with a unimodal failure rate function [2–8]. In recent years, researchers have tended to propose new families of distributions to obtain more flexible models. In this regard, [9] introduced the Marshall-Olkin generated (MOG) family by introducing an extra parameter to the Weibull distribution. The cumulative distribution function (cdf) of the MOG family is given by

$$G(x;\sigma,\xi) = \frac{F(x;\xi)}{1-(1-\sigma)\{1-F(x;\xi)\}}, \quad \sigma>0,\ x\in\mathbb{R}, \tag{1}$$

where $\sigma$ is an additional parameter and $F(x;\xi)$ is the cdf of the baseline model, which may depend on the parameter vector $\xi$. [10] proposed another method of constructing new lifetime distributions, known as the alpha power transformation approach, via the cdf

$$G(x;\alpha,\xi) = \frac{\alpha^{F(x;\xi)}-1}{\alpha-1}, \quad \alpha>0,\ \alpha\neq 1,\ x\in\mathbb{R}. \tag{2}$$

Using (2), [10] and [11] introduced the alpha power exponential (APE) and alpha power transformed Weibull (APTW) distributions, respectively. We carry this branch of distribution theory further and introduce a new flexible class of distributions that can be used in modeling unimodal medical care data sets. Tahir and Cordeiro [12] proposed the exponentiated Kumaraswamy G-logarithmic (EKuG-L) class of distributions given by the cdf

where and . For the EKuG-L family of distributions, the parametric space of is restricted to (0, 1). Because of this restriction, the EKuG-L family may not be flexible enough to accommodate complex forms of data. Note that the expression (3) is not true for . Furthermore, the EKuG-L family has four additional parameters, and with this many parameters, the estimation of the parameters as well as the computation of many distributional characteristics becomes very difficult. Therefore, in this paper, an attempt has been made to propose a more flexible class of distributions, called the flexible reduced logarithmic-X (FRL-X) family, by reparameterizing (3). The new family is introduced for (to reduce the number of parameters and thereby avoid the difficulties in computing the mathematical properties) and by reparameterizing (to relax the upper limit of the parametric space of ), where . Because its upper bound is unrestricted, the proposed distribution is expected to be quite flexible in modeling complex forms of data. Thus, the motivation for proposing the FRL-X family is to reduce the number of parameters and to relax the boundary conditions on the parameter values, so that the hazard rate function can take more flexible shapes than the classical monotone ones. In addition, the extra parameter enriches the description of the data and provides more information about the behavior of the hazard rate function in the tail.
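A random variable X is said to have the FRL-X distribution if its cdf is given by

$$G(x;\sigma,\xi) = 1-\frac{\log\{1+\sigma-\sigma F(x;\xi)\}}{\log(1+\sigma)}, \quad \sigma>0,\ x\in\mathbb{R}, \tag{4}$$

where $F(x;\xi)$ is the cdf of the baseline random variable depending on the parameter vector $\xi$, and $\sigma$ is an additional parameter. The expression (4) is also true for $\sigma=1$. The probability density function (pdf) corresponding to (4) is given by

$$g(x;\sigma,\xi) = \frac{\sigma f(x;\xi)}{\log(1+\sigma)\{1+\sigma-\sigma F(x;\xi)\}}, \quad x\in\mathbb{R}, \tag{5}$$

where $f(x;\xi)$ is the baseline pdf.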

The new pdf is most tractable when and have simple analytical expressions. Henceforth, a random variable X with pdf (5) is denoted by . Furthermore, for the sake of simplicity, the dependence on the vector of the parameters is omitted and will be used. Moreover, the key motivations for using the FRL-X family in practice are
(1) A very simple and convenient method of adding an additional parameter to modify the existing distributions
(2) To improve the characteristics and flexibility of the existing distributions
(3) To introduce the extended version of the baseline distribution having closed forms for cdf, sf, and hrf
(4) To provide better fits than the competing modified models
(5) To introduce new distributions having nonmonotonic shaped hazard rate functions
(6) To provide best fit to unimodal medical care data sets

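The FRL-X family can also be obtained by reparameterizing the alpha logarithmic family (ALF) proposed by [13]. The cdf of the ALF family is given by

$$G(x;\alpha,\xi) = 1-\frac{\log\{\alpha-(\alpha-1)F(x;\xi)\}}{\log\alpha}, \quad \alpha>0,\ \alpha\neq 1,\ x\in\mathbb{R}. \tag{6}$$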

The problem with the ALF family is that $\alpha\neq 1$, and consequently, the parametric space of $\alpha$ is restricted. The FRL-X family addresses this problem by reparameterizing $\alpha$ as $\alpha=1+\sigma$. The advantage of the FRL-X family over the ALF is that $\sigma=1$ is acceptable, and its parametric space is not restricted. Furthermore, for $\sigma=1$, the FRL-X reduces to the logarithmic transformed family of [14] given by

$$G(x;\xi) = 1-\frac{\log\{2-F(x;\xi)\}}{\log 2}, \quad x\in\mathbb{R}. \tag{7}$$

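The survival function (sf) and hazard rate function (hrf) of the FRL-X family are given, respectively, by

$$S(x;\sigma,\xi) = \frac{\log\{1+\sigma-\sigma F(x;\xi)\}}{\log(1+\sigma)}$$

and

$$h(x;\sigma,\xi) = \frac{\sigma f(x;\xi)}{\{1+\sigma-\sigma F(x;\xi)\}\log\{1+\sigma-\sigma F(x;\xi)\}},$$

where $F(x;\xi)$ and $f(x;\xi)$ are the baseline cdf and pdf and $\sigma>0$ is the additional parameter.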

The rest of this article is organized as follows. In Section 2, a special submodel of the proposed family is discussed. Some mathematical properties are obtained in Section 3. The characterization results are presented in Section 4. Maximum likelihood estimates of the model parameters are obtained in Section 5. A comprehensive Monte Carlo simulation study is conducted in Section 6. Section 7 is devoted to analyzing three real-life applications. A framework for future work is discussed in Section 8. Finally, concluding remarks are provided in the last section.

2. Submodel Description

This section offers a special submodel of the FRL-X family, called the flexible reduced logarithmic-Weibull (FRL-W) distribution. Let and be cdf and pdf of the two-parameter Weibull distribution given by , and , respectively, where . Then, the cdf of the FRL-W distribution has the following expression:

The density function corresponding to (9) is given by

Plots of the pdf of the FRL-W distribution are sketched in Figure 1 for selected values of the model parameters.
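As a rough numerical companion to the FRL-W definitions, the following Python sketch evaluates the cdf, pdf, survival function, and hazard rate function on a few points. It rests on two assumptions that are not spelled out in this text: that the FRL-X cdf has the form G(x) = 1 - log{1 + sigma - sigma F(x)} / log(1 + sigma), and that the Weibull baseline uses the shape-scale form F(x) = 1 - exp{-(x/scale)^shape}. The function names and the parameter values are illustrative only.

```python
import math


def weibull_cdf(x, shape, scale):
    """Assumed baseline cdf F(x) = 1 - exp(-(x/scale)**shape) for x > 0."""
    if x <= 0:
        return 0.0
    return -math.expm1(-((x / scale) ** shape))


def weibull_pdf(x, shape, scale):
    """Density of the assumed Weibull baseline."""
    if x <= 0:
        return 0.0
    z = (x / scale) ** shape
    return (shape / x) * z * math.exp(-z)


def frlw_sf(x, sigma, shape, scale):
    """Survival function log{1 + sigma - sigma*F(x)} / log(1 + sigma)."""
    F = weibull_cdf(x, shape, scale)
    return math.log1p(sigma * (1.0 - F)) / math.log1p(sigma)


def frlw_cdf(x, sigma, shape, scale):
    """Cdf of the assumed FRL-W form, i.e. one minus the survival function."""
    return 1.0 - frlw_sf(x, sigma, shape, scale)


def frlw_pdf(x, sigma, shape, scale):
    """Pdf obtained by differentiating the assumed cdf with respect to x."""
    F = weibull_cdf(x, shape, scale)
    f = weibull_pdf(x, shape, scale)
    return sigma * f / (math.log1p(sigma) * (1.0 + sigma * (1.0 - F)))


def frlw_hrf(x, sigma, shape, scale):
    """Hazard rate function, the ratio pdf / sf."""
    return frlw_pdf(x, sigma, shape, scale) / frlw_sf(x, sigma, shape, scale)


# Illustrative parameter values (hypothetical, not taken from the source).
for x in (0.25, 0.5, 1.0, 2.0):
    print(x, round(frlw_pdf(x, 2.0, 1.5, 1.0), 4), round(frlw_hrf(x, 2.0, 1.5, 1.0), 4))
```

Evaluating `frlw_hrf` on a finer grid for several parameter choices is one way to explore which hazard shapes this assumed form can produce.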

3. Basic Mathematical Properties

In this section, some statistical properties of the FRL-X family are derived.

3.1. Quantile Function

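Let X be the FRL-X random variable with cdf (4). The quantile function of X, say $Q(u)$, is obtained by inverting (4) and is given by

$$x = Q(u) = F^{-1}\left(\frac{1+\sigma-(1+\sigma)^{1-u}}{\sigma}\right), \tag{11}$$

where $u\in(0,1)$ and $F^{-1}$ is the quantile function of the baseline distribution. From the expression (11), it is clear that the FRL-X family has a closed-form quantile function whenever the baseline quantile function is available in closed form, which makes it easy to generate random numbers.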
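The inversion idea can be sketched in a few lines of Python. The sketch assumes the FRL-X cdf G(x) = 1 - log{1 + sigma - sigma F(x)} / log(1 + sigma), so that solving G(x) = u gives F(x) = {1 + sigma - (1 + sigma)^(1 - u)} / sigma, and it assumes a shape-scale Weibull baseline. The names `frl_quantile`, `weibull_quantile`, and `sample_frlw` are hypothetical helpers, not part of the source.

```python
import math
import random


def frl_quantile(u, sigma, baseline_quantile):
    """Quantile of the assumed FRL-X cdf: map u to the baseline probability t."""
    t = (1.0 + sigma - (1.0 + sigma) ** (1.0 - u)) / sigma
    # Guard against t rounding to exactly 1 when u is extremely close to 1.
    t = min(t, 1.0 - 2.0 ** -53)
    return baseline_quantile(t)


def weibull_quantile(t, shape, scale):
    """Quantile of the assumed baseline F(x) = 1 - exp(-(x/scale)**shape)."""
    return scale * (-math.log1p(-t)) ** (1.0 / shape)


def sample_frlw(n, sigma, shape, scale, rng):
    """Draw n values by inverse-transform sampling with uniform inputs."""
    return [
        frl_quantile(rng.random(), sigma, lambda t: weibull_quantile(t, shape, scale))
        for _ in range(n)
    ]


rng = random.Random(2020)  # fixed seed so the draws are reproducible
draws = sample_frlw(5, sigma=2.0, shape=1.5, scale=1.0, rng=rng)
print(draws)
```

Passing the baseline quantile as a function keeps the sketch usable for any baseline with an explicit inverse cdf, not only the Weibull.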

3.2. Moments

Moments are very important and play an essential role in statistical analysis, especially in applications. They help to capture the important features and characteristics of a distribution (e.g., central tendency, dispersion, skewness, and kurtosis). The rth moment of the FRL-X family is derived as

Using (5) in (12), we have

Using the series representation

For in (14), we arrive at

Using (15) in (13), we obtain

where .

Furthermore, a general expression for the moment generating function (mgf) of the FRL-X family is given by

3.3. Residual and Reverse Residual Life

The residual life offers wider applications in reliability theory and risk management. The residual lifetime of FRL-X, denoted by , is

Additionally, the reverse residual life of the FRL-X random variable, denoted by , is

4. Characterization Results

This section is devoted to the characterizations of the FRL-X distribution based on a simple relationship between two truncated moments. It should be mentioned that for this characterization the cdf is not required to have a closed form. The first characterization result employs a theorem due to [15]; see Theorem 1 below. Note that the result holds also when the interval H is not closed. Moreover, as shown in [23], this characterization is stable in the sense of weak convergence.

Theorem 1. Let (Ω, ℱ, P) be a given probability space and let be an interval for some ( might as well be allowed). Let X: Ω → H be a continuous random variable with the distribution function G and let and be two real functions defined on H such that

is defined with some real function . Assume that and G is a twice continuously differentiable and strictly monotone function on the set H. Finally, assume that the equation has no real solution in the interior of H. Then G is uniquely determined by the functions , and , particularly

where the function is a solution of the differential equation and C is the normalization constant, such that .

Remark. The goal in Theorem 1 is to have as simple as possible.

Proposition 1. Let be a continuous random variable and let and for . The random variable X has pdf (5) if and only if the function defined in Theorem 1 is of the form

Proof. Let X be a random variable with pdf (5), then

and finally

Conversely, if is given as above, then

and hence

Now, in view of Theorem 1, X has density (5).

Corollary 1. Let be a continuous random variable and let be as in Proposition 1. The random variable X has pdf (5) if and only if there exist functions and defined in Theorem 1 satisfying the following differential equation:

Corollary 2. The general solution of the differential equation in Corollary 1 is

where D is a constant. We would like to point out that one set of functions satisfying the above differential equation is given in Proposition 1 with . Clearly, there are other triplets which satisfy the conditions of Theorem 1.

5. Estimation

In this section, the method of maximum likelihood estimation is used to estimate the model parameters. Furthermore, the robustness is also discussed.

5.1. Maximum Likelihood Estimation

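In this subsection, the maximum likelihood estimators (MLEs) of the parameters $\sigma$ and $\xi$ of the FRL-X family from complete samples are derived. Let $X_1, X_2, \ldots, X_n$ be a simple random sample from the FRL-X family with observed values $x_1, x_2, \ldots, x_n$. With $F(x;\xi)$ and $f(x;\xi)$ denoting the baseline cdf and pdf, the log-likelihood function for this sample is

$$\ell(\sigma,\xi) = n\log\sigma - n\log\{\log(1+\sigma)\} + \sum_{i=1}^{n}\log f(x_i;\xi) - \sum_{i=1}^{n}\log\{1+\sigma-\sigma F(x_i;\xi)\}. \tag{29}$$

Obtaining the partial derivatives of (29), we have

$$\frac{\partial\ell}{\partial\sigma} = \frac{n}{\sigma} - \frac{n}{(1+\sigma)\log(1+\sigma)} - \sum_{i=1}^{n}\frac{1-F(x_i;\xi)}{1+\sigma-\sigma F(x_i;\xi)},$$

$$\frac{\partial\ell}{\partial\xi} = \sum_{i=1}^{n}\frac{\partial f(x_i;\xi)/\partial\xi}{f(x_i;\xi)} + \sigma\sum_{i=1}^{n}\frac{\partial F(x_i;\xi)/\partial\xi}{1+\sigma-\sigma F(x_i;\xi)}.$$

Setting $\partial\ell/\partial\sigma$ and $\partial\ell/\partial\xi$ equal to zero and solving these expressions numerically and simultaneously yields the MLEs of $(\sigma,\xi)$.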
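To make the numerical step concrete, the sketch below maximizes an FRL-W log-likelihood in Python using only the standard library. Several things here are assumptions rather than the source's method: the pdf is taken as sigma f(x) / [log(1 + sigma) {1 + sigma - sigma F(x)}], the Weibull baseline uses a shape-scale form, the data are simulated with hypothetical parameter values, and a simple derivative-free compass search stands in for a quasi-Newton optimizer. Parameters are optimized on the log scale so that they stay positive.

```python
import math
import random


def neg_loglik(theta, data):
    """Negative FRL-W log-likelihood; theta holds logs of (sigma, shape, scale)."""
    try:
        sigma, shape, scale = (math.exp(t) for t in theta)
        total = len(data) * (math.log(sigma) - math.log(math.log1p(sigma)))
        for x in data:
            log_z = shape * math.log(x / scale)
            z = math.exp(log_z)  # z = (x/scale)**shape
            total += math.log(shape) - math.log(x) + log_z - z  # log f(x)
            total -= math.log1p(sigma * math.exp(-z))  # log{1 + sigma - sigma F(x)}
        return -total
    except (OverflowError, ValueError, ZeroDivisionError):
        return math.inf  # treat numerically invalid points as infinitely bad


def compass_search(fun, start, step=0.5, tol=1e-4, max_iter=2000):
    """Derivative-free minimizer: try +/- step on each coordinate, halve on failure."""
    x, fx = list(start), fun(start)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = list(x)
                trial[i] += delta
                f_trial = fun(trial)
                if f_trial < fx:
                    x, fx, improved = trial, f_trial, True
                    break
        if not improved:
            step *= 0.5
    return x, fx


def sample_frlw(n, sigma, shape, scale, rng):
    """Inverse-transform sampling from the assumed FRL-W cdf."""
    out = []
    for _ in range(n):
        u = 1.0 - rng.random()  # uniform on (0, 1]
        # Same as {1 + sigma - (1 + sigma)**(1 - u)} / sigma, written with expm1.
        t = -(1.0 + sigma) * math.expm1(-u * math.log1p(sigma)) / sigma
        t = min(t, 1.0 - 2.0 ** -53)
        out.append(scale * (-math.log1p(-t)) ** (1.0 / shape))
    return out


rng = random.Random(1)
data = sample_frlw(200, sigma=2.0, shape=1.5, scale=1.0, rng=rng)  # simulated data
start = [0.0, 0.0, 0.0]  # sigma = shape = scale = 1 on the original scale
theta_hat, nll_hat = compass_search(lambda th: neg_loglik(th, data), start)
sigma_hat, shape_hat, scale_hat = (math.exp(t) for t in theta_hat)
print(sigma_hat, shape_hat, scale_hat, -nll_hat)
```

In practice a gradient-based routine with box constraints would usually converge faster; the compass search is used here only to keep the example dependency-free.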

5.2. M-Estimator as a Robust Estimation

Robust statistics are statistics that perform well for data drawn from a wide range of probability distributions, especially nonnormal distributions. Robust statistical approaches have been developed for many common problems, such as estimating location, scale, and regression parameters. One motivation is to produce statistical methods that are not unduly affected by outliers. Another is to provide methods that perform well under small departures from the parametric distribution. For example, robust methods work well for mixtures of two normal distributions with different standard deviations; under this model, nonrobust methods such as a t-test work poorly. Historically, several approaches to robust estimation were proposed, including R-estimators and L-estimators. However, M-estimators now appear to dominate the field as a result of their generality, high breakdown point, and efficiency. M-estimators are a generalization of the maximum likelihood estimators (MLEs). With MLEs, we try to maximize $\prod_{i=1}^{n} f(x_i)$ or, equivalently, minimize $\sum_{i=1}^{n} -\log f(x_i)$. [16] proposed to generalize this to the minimization of $\sum_{i=1}^{n}\rho(x_i)$, where $\rho$ is some function. MLEs are therefore a special case of M-estimators. Minimizing $\sum_{i=1}^{n}\rho(x_i)$ can often be done by differentiating $\rho$ and solving $\sum_{i=1}^{n}\psi(x_i)=0$, where $\psi(x)=d\rho(x)/dx$; for further detail, we refer the interested readers to [17, 18].
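The estimating-equation view of M-estimators can be illustrated with a small Python sketch. It is not specific to the FRL-X family: it estimates a location parameter with Huber's psi function, solving the sum of psi values equal to zero by iteratively reweighted averaging. The tuning constant 1.345, the use of the scaled median absolute deviation as a fixed scale, and the toy data are conventional or illustrative choices, not taken from the source.

```python
import statistics


def huber_psi(r, c=1.345):
    """Huber's psi: identity in the middle, clipped to [-c, c] in the tails."""
    return max(-c, min(c, r))


def huber_location(data, c=1.345, tol=1e-12, max_iter=200):
    """Solve sum(psi((x - mu) / scale)) = 0 for mu by iterative reweighting."""
    med = statistics.median(data)
    mad = statistics.median([abs(x - med) for x in data])
    scale = 1.4826 * mad or 1.0  # fall back to 1.0 if the MAD is zero
    mu = med
    for _ in range(max_iter):
        weights = []
        for x in data:
            r = abs(x - mu) / scale
            # weight psi(r)/r: full weight for small residuals, downweighted otherwise
            weights.append(1.0 if r <= c else c / r)
        new_mu = sum(w * x for w, x in zip(weights, data)) / sum(weights)
        converged = abs(new_mu - mu) < tol
        mu = new_mu
        if converged:
            break
    return mu, scale


# Toy sample with one gross outlier (illustrative values only).
sample = [1.0, 1.2, 0.9, 1.1, 1.05, 0.95, 10.0]
mu_hat, scale_hat = huber_location(sample)
print("mean:", statistics.mean(sample), "Huber estimate:", mu_hat)
```

The sample mean is pulled toward the outlier, whereas the Huber estimate stays close to the bulk of the observations, which is the behavior described in the paragraph above.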

6. Monte Carlo Simulation Study

This section offers a comprehensive simulation study to assess the behavior of the MLEs. The FRL-X family is easily simulated by inverting (4), and this inversion can be used to simulate any special submodel of the FRL-X family. Here, we consider the FRL-W distribution to assess the behavior of the MLEs of the proposed method. We simulate the FRL-W distribution for two sets of parameters (Set 1: , , , and Set 2: , , ). The simulation is performed in the statistical software R through the command mle, with 1000 Monte Carlo replications. To maximize the log-likelihood function, we use the L-BFGS-B algorithm with the optim function. The estimators are evaluated via the following quantities for each sample size. The empirical mean squared errors (MSEs) are calculated in R from the Monte Carlo replications. The MLEs are determined for each simulated data set, say, for ; the biases and MSEs are then computed by

For , we consider the sample sizes n = 25, 50, 100, 200, 400, 600, 800, 900, and 1000. The empirical results are given in Tables 1 and 2. Corresponding to Tables 1 and 2, the simulation results are graphically displayed in Figures 2–5. From the simulation results, we conclude that
(i) Biases for all parameters are positive
(ii) The parameters tend to be stabilized
(iii) Estimated biases decrease when the sample size n increases
(iv) Estimated MSEs decay toward zero when the sample size n increases
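The structure of such a simulation study can be sketched in Python as follows. This is a scaled-down illustration under stated assumptions, not a reproduction of the study: it assumes the FRL-W cdf 1 - log{1 + sigma - sigma F(x)} / log(1 + sigma) with a shape-scale Weibull baseline, uses hypothetical true values (2.0, 1.5, 1.0), replaces the L-BFGS-B optimizer with a plain compass search, and runs far fewer replications than the 1000 reported above. Bias is taken as the average of (estimate - true value) and MSE as the average of its square.

```python
import math
import random


def sample_frlw(n, sigma, shape, scale, rng):
    """Inverse-transform sampling from the assumed FRL-W cdf."""
    out = []
    for _ in range(n):
        u = 1.0 - rng.random()  # uniform on (0, 1]
        t = -(1.0 + sigma) * math.expm1(-u * math.log1p(sigma)) / sigma
        t = min(t, 1.0 - 2.0 ** -53)
        out.append(scale * (-math.log1p(-t)) ** (1.0 / shape))
    return out


def neg_loglik(theta, data):
    """Negative log-likelihood; theta holds logs of (sigma, shape, scale)."""
    try:
        sigma, shape, scale = (math.exp(t) for t in theta)
        total = len(data) * (math.log(sigma) - math.log(math.log1p(sigma)))
        for x in data:
            log_z = shape * math.log(x / scale)
            z = math.exp(log_z)
            total += math.log(shape) - math.log(x) + log_z - z
            total -= math.log1p(sigma * math.exp(-z))
        return -total
    except (OverflowError, ValueError, ZeroDivisionError):
        return math.inf


def compass_search(fun, start, step=0.5, tol=1e-4, max_iter=500):
    """Derivative-free minimizer used here in place of L-BFGS-B."""
    x, fx = list(start), fun(start)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = list(x)
                trial[i] += delta
                f_trial = fun(trial)
                if f_trial < fx:
                    x, fx, improved = trial, f_trial, True
                    break
        if not improved:
            step *= 0.5
    return x, fx


def simulate(n, reps, true_params, seed=0):
    """Return per-parameter (bias, mse) lists over `reps` simulated samples of size n."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        data = sample_frlw(n, *true_params, rng)
        theta_hat, _ = compass_search(lambda th: neg_loglik(th, data), [0.0, 0.0, 0.0])
        estimates.append([math.exp(t) for t in theta_hat])
    bias, mse = [], []
    for j, true in enumerate(true_params):
        errors = [est[j] - true for est in estimates]
        bias.append(sum(errors) / reps)
        mse.append(sum(e * e for e in errors) / reps)
    return bias, mse


if __name__ == "__main__":
    for n in (25, 50):  # small sizes and few replications, for speed only
        print(n, simulate(n, reps=10, true_params=(2.0, 1.5, 1.0)))
```

With so few replications the printed values are noisy; larger `reps` and `n` are needed before any pattern in bias or MSE can be read off.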

7. Comparative Study

In this section, we illustrate the flexibility of the proposed model via three biomedical data sets. We also compare the proposed model with other well-known models. The distribution functions of the competing models are
(i) Weibull
(ii) APTW distribution
(iii) Marshall-Olkin Weibull (MOW) distribution

To determine the optimum model, we compute the Cramér–von Mises (CM) test statistic, the Anderson–Darling (AD) test statistic, and the Kolmogorov–Smirnov (KS) test statistic with the corresponding p-values. These values are calculated as follows, where $G$ denotes the cdf of the fitted model:
(iv) The AD test statistic

$$AD = -n - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\left[\log G(x_i) + \log\{1-G(x_{n-i+1})\}\right],$$

where n is the sample size and $x_i$ is the ith sample, calculated when the data is sorted in ascending order.
(v) The CM test statistic

$$CM = \frac{1}{12n} + \sum_{i=1}^{n}\left[\frac{2i-1}{2n} - G(x_i)\right]^2.$$

(vi) The KS test statistic

$$KS = \sup_x\left|G_n(x) - G(x)\right|,$$

where $G_n$ is the empirical cdf and $\sup_x$ is the supremum of the set of distances.

A distribution with lower values of these measures is considered a good candidate model among the applied distributions for the underlying data sets. Using these statistical tools, we observed that the FRL-W distribution provides the best fit among the competitors, since the values of all selected goodness-of-fit criteria are smaller for the proposed distribution.
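The three goodness-of-fit statistics can be computed directly from a sorted sample and a fitted cdf. The Python sketch below uses the standard textbook forms of the Anderson–Darling, Cramér–von Mises, and Kolmogorov–Smirnov statistics; the function name `gof_statistics` and the uniform example are illustrative, and p-values are not computed.

```python
import math


def gof_statistics(data, cdf):
    """Return (AD, CM, KS) for a sample and a fully specified cdf."""
    x = sorted(data)
    n = len(x)
    u = [cdf(v) for v in x]  # fitted cdf at the order statistics
    ad = -n - sum(
        (2 * i - 1) * (math.log(u[i - 1]) + math.log(1.0 - u[n - i]))
        for i in range(1, n + 1)
    ) / n
    cm = 1.0 / (12 * n) + sum(
        ((2 * i - 1) / (2 * n) - u[i - 1]) ** 2 for i in range(1, n + 1)
    )
    # Largest gap between the empirical cdf (just before and at each point) and the fitted cdf.
    ks = max(
        max(i / n - u[i - 1], u[i - 1] - (i - 1) / n) for i in range(1, n + 1)
    )
    return ad, cm, ks


# Small check against a uniform(0, 1) cdf, whose values equal the data themselves.
ad, cm, ks = gof_statistics([0.75, 0.25], lambda v: v)
print(ad, cm, ks)
```

To compare competing models on the same data, one would call `gof_statistics` once per fitted cdf and prefer the model with the smaller values, as described above.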

7.1. Data 1: The Remission Times (in Months) of a Random Sample of 128 Bladder Cancer Patients

The first data set represents the remission times (in months) of a random sample of 128 bladder cancer patients; see [19]. The FRL-W and the competing distributions are applied to this data set. The maximum likelihood estimates of the model parameters are presented in Table 3, whereas the goodness-of-fit measures of the proposed and competing models are provided in Table 4. From Table 4, it is clear that the proposed distribution has lower values than the other models in the comparison. The box plot and Time Scale TTT plot of the first data set are presented in Figure 6. The fitted pdf and cdf of the proposed model are plotted in Figure 7, whereas the PP and Kaplan–Meier survival plots of the proposed model for the first data set are sketched in Figure 8. From the Time Scale TTT plot (Figure 6), we can see that the first data set exhibits unimodal behavior. Also, from the box plot in Figure 6, we can observe that the bladder cancer data set is positively skewed. From Figure 7, it is clear that the proposed model fits the estimated pdf and cdf very closely. From Figure 8, we can see that the proposed model closely follows the PP plot, which is an empirical tool for identifying the best candidate model.

7.2. Data 2: The Survival Times of Neck Cancer Patient Data

The second data set, taken from [20], consists of 44 observations representing the survival times of a group of patients suffering from head and neck cancer and treated using a combination of radiotherapy. This data set was also used by [21]. We applied the FRL-W and the other selected distributions to the second data set. Again, we observe that the proposed model outperforms the competitors. For data 2, the estimates of the model parameters are presented in Table 5, and the analytical measures of the proposed and competing models are provided in Table 6. The box plot of the second data set and the corresponding Time Scale TTT plot of FRL-W are presented in Figure 9. The estimated pdf and cdf are sketched in Figure 10, whereas the PP and Kaplan–Meier survival plots are presented in Figure 11. From the Time Scale TTT plot (Figure 9), we can see that the second data set exhibits unimodal behavior. Also, from the box plot in Figure 9, we can observe that the neck cancer data set is positively skewed. The proposed model provides the best fit to the neck cancer data (see Table 6), and the proposed distribution fits the estimated pdf, cdf, and Kaplan–Meier survival plots very closely.

7.3. Data 3: The Guinea Pigs Infected Data

The third data set, taken from [22], consists of 72 observations on guinea pigs infected with virulent tubercle bacilli. Again, the FRL-W and the other competitors are applied to this data set. Analyzing the third data set, we observe that the proposed model provides a better fit than the other competitors. For data 3, the estimates of the model parameters are presented in Table 7, and the analytical measures of the proposed and competing models are provided in Table 8. The box plot of the third data set and the corresponding Time Scale TTT plot of the FRL-W are presented in Figure 12. The estimated pdf and cdf are sketched in Figure 13, whereas the PP and Kaplan–Meier survival plots are provided in Figure 14. Figures 12–14 reveal that the FRL-W distribution provides a superior fit to the guinea pig infection data.

8. Discussion and Future Framework

Statistical decision theory addresses the state of uncertainty and provides a rational framework for dealing with the problems of medical decision-making. Medical data sets are generally skewed to the right, and positively skewed distributions are reasonably competitive for describing unimodal medical data. The traditional distributions are not flexible enough to accommodate complex forms of data, such as medical data with a nonmonotonic failure rate function. In view of the importance of statistical distributions in applied sciences, a number of papers have appeared in the literature aiming to improve the characteristics of the existing distributions. Unfortunately, these proposals have increased the number of parameters, so that the estimation of the parameters and the derivation of mathematical properties become complicated. Furthermore, because of the restricted parametric space, some distributions may not be flexible enough to provide an adequate fit to many real data sets. To provide a better description of medical data, in this study, an attempt has been made to introduce a new family of statistical distributions by reducing the number of parameters and reparameterizing the existing distributions to relax the boundary conditions of the additional parameter. A special submodel of the proposed family provided the best fit in modeling data with a nonmonotonic hazard rate function. The maximum likelihood method is adopted to estimate the model parameters, and a comprehensive Monte Carlo simulation study is carried out to evaluate the behavior of the estimators. To show the usefulness of the proposed method in medical sciences, three real-life examples are discussed. The first example considers the bladder cancer patient data set, the second represents the neck cancer data, and the third represents the guinea pig infection data.
The analysis of these three real-life examples showed that the proposed model performs better than the other competing distributions. From the above discussion, it is clear that researchers are always in search of new flexible distributions. Therefore, to bring further flexibility to the proposed model, we suggest introducing its extended versions. The proposed method can be extended by introducing a shape parameter to the model.
(i) A random variable X is said to follow the extended version of the FRL-X family if its cdf is given by

where is the additional shape parameter. For the expression (38) reduces to (4). The new proposal may be named the flexible reduced logarithmic exponentiated-X (FRLE-X) family. For illustrative purposes, one may consider its special case, which may be named the flexible reduced logarithmic exponentiated-Weibull (FRLE-W) distribution, defined by the cdf:

Due to the introduction of the additional shape parameter, the suggested extension may be more flexible in modeling data in medical sciences and other related fields.
(ii) Another extension of the FRL-X family is given by

where is the additional shape parameter. For the expression (40) reduces to (4). The model defined in (40) may be named the extended flexible reduced logarithmic-X (EFRL-X) family.
(iii) Another generalized version of the FRL-X can be introduced via

where and are the additional shape parameters. Clearly, for the expression (41) reduces to (38). For the expression (41) reduces to (40), whereas for the expression (41) reduces to (4). The model introduced in (41) may be named the extended flexible reduced logarithmic exponentiated-X (EFRLE-X) family.

9. Concluding Remarks

In this study, we introduced a new family of continuous distributions called the flexible reduced logarithmic-X family. Some mathematical properties of the proposed family are obtained. The maximum likelihood method is used to estimate the unknown model parameters. Three applications to real-life medical data sets are given to illustrate empirically the flexibility of the proposed model. The proposed model is compared with some well-known lifetime distributions, namely the Weibull, Marshall-Olkin Weibull, and alpha power transformed Weibull distributions. The comparison is made on the basis of well-known goodness-of-fit measures, including the Cramér–von Mises test statistic, the Anderson–Darling test statistic, and the Kolmogorov–Smirnov test statistic with the corresponding p-values. Empirical findings indicate that the proposed model provides better fits than the other well-known competing models.

Data Availability

This work is mainly a methodological development and has been applied to secondary data related to cancer patients; the data will be provided if required.

Conflicts of Interest

The authors declare that they have no conflicts of interest.