Abstract

Finite mixture models provide a flexible tool for handling heterogeneous data. This paper introduces a new mixture model, the mixture of Lindley and lognormal distributions (MLLND). First, the model is formulated and some of its statistical properties are studied. Next, maximum likelihood estimation of the model parameters is considered, and the performance of the estimators is evaluated via simulation. The flexibility of the proposed mixture distribution is then demonstrated by showing that it fits a well-known real data set of 128 bladder cancer patients better than several mixture and nonmixture distributions. The Kolmogorov-Smirnov test and some information criteria are used to compare the fitted models. Finally, the results are verified using several graphical methods.

1. Introduction

In most reliability applications, data are modeled by a single parametric distribution. However, in many situations, a population can be divided into a number of subpopulations, each representing a different type of failure. Finite mixture models play an important role in modeling such heterogeneous data. Mixture models are applied especially in clustering and classification; see, for example, Everitt and Hand [1], McLachlan and Peel [2], McLachlan and Basford [3], Titterington et al. [4], Lindsay [5], McLachlan and Krishnan [6], Al-Moisheer et al. [7, 8], and Al-Moisheer [9, 10]. In this paper, we introduce a finite mixture of Lindley and lognormal distributions (MLLND). The motivation for this mixture comes from the importance of its component distributions. The one-parameter Lindley distribution was introduced by Lindley [11, 12], and Ghitany et al. [13] later illustrated its importance in life testing and reliability applications. The lognormal distribution has found important applications in a wide variety of fields (see Kim and Yum [14] and Lin et al. [15]). In the literature, work has been done on mixture models having the Lindley distribution as one of their components; see, for example, Al-Moisheer et al. [16, 17] for the mixture of two one-parameter Lindley distributions and the mixture of Lindley and inverse Weibull distributions, respectively. Also, Daghestani et al. [18] considered the mixture of Lindley and Weibull distributions. This paper is organized as follows. In Section 2, we obtain the new model and derive some of its properties. In Sections 3 and 4, we derive the probability density function of the order statistics and the equations required to obtain the maximum likelihood estimates of the model parameters. In Section 5, the flexibility of the proposed model is illustrated by showing its ability to provide the best fit for a well-known real data set compared to six competitive models. Finally, in Section 6, we draw a conclusion.

2. Model Formulation and Some Properties

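The MLLND has the following probability density function (pdf):

$$f(x;\Theta) = p\,f_1(x;\theta) + (1-p)\,f_2(x;\mu,\sigma), \quad x > 0, \tag{1}$$

where $\Theta = (p,\theta,\mu,\sigma)$ and the pdf of the Lindley component is given by

$$f_1(x;\theta) = \frac{\theta^2}{1+\theta}\,(1+x)\,e^{-\theta x}, \quad x > 0. \tag{2}$$

The pdf of the lognormal component is given by

$$f_2(x;\mu,\sigma) = \frac{1}{x\sigma\sqrt{2\pi}}\exp\left[-\frac{(\ln x-\mu)^2}{2\sigma^2}\right], \quad x > 0, \tag{3}$$

with $0 \le p \le 1$, $\theta, \sigma > 0$, and $-\infty < \mu < \infty$.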
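As a quick sanity check on the model definition, the following minimal Python sketch evaluates the mixture density and verifies numerically that it integrates to one. It assumes the standard one-parameter Lindley density $\theta^2(1+x)e^{-\theta x}/(1+\theta)$ and the usual lognormal density; the parameter values and the helper `integrate` are illustrative assumptions, not values or code from the paper.

```python
import math

def lindley_pdf(x, theta):
    """One-parameter Lindley density (assumed standard form)."""
    if x <= 0:
        return 0.0
    return theta ** 2 / (1.0 + theta) * (1.0 + x) * math.exp(-theta * x)

def lognormal_pdf(x, mu, sigma):
    """Lognormal density with log-mean mu and log-sd sigma."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))

def mllnd_pdf(x, p, theta, mu, sigma):
    """Mixture density: p * Lindley + (1 - p) * lognormal."""
    return p * lindley_pdf(x, theta) + (1.0 - p) * lognormal_pdf(x, mu, sigma)

def integrate(f, a, b, n=20000):
    """Composite trapezoidal rule on [a, b] (hypothetical helper)."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

# Illustrative parameter values only (not taken from the paper).
params = dict(p=0.4, theta=1.5, mu=1.0, sigma=0.5)
area = integrate(lambda x: mllnd_pdf(x, **params), 1e-12, 60.0)
print("numerical integral of the pdf:", area)
```

The upper limit 60 is chosen so that the neglected tail mass is negligible for these particular parameter values; other values may need a wider range.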

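Evidently, the cumulative distribution function (cdf) of the MLLND is given by

$$F(x;\Theta) = p\,F_1(x;\theta) + (1-p)\,F_2(x;\mu,\sigma), \tag{4}$$

where

$$F_1(x;\theta) = 1 - \frac{1+\theta+\theta x}{1+\theta}\,e^{-\theta x}, \qquad F_2(x;\mu,\sigma) = \Phi\left(\frac{\ln x-\mu}{\sigma}\right),$$

with $\Phi(\cdot)$ referring to the cdf of the standard normal distribution.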
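The mixture cdf can be sketched in a few lines of Python, assuming the standard Lindley cdf $1-(1+\theta+\theta x)e^{-\theta x}/(1+\theta)$ and writing the standard normal cdf through `math.erf`. The parameter values are illustrative assumptions.

```python
import math

def std_normal_cdf(z):
    """Standard normal cdf expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lindley_cdf(x, theta):
    """Lindley cdf (assumed standard form)."""
    if x <= 0:
        return 0.0
    return 1.0 - (1.0 + theta + theta * x) / (1.0 + theta) * math.exp(-theta * x)

def lognormal_cdf(x, mu, sigma):
    """Lognormal cdf: Phi((ln x - mu) / sigma)."""
    if x <= 0:
        return 0.0
    return std_normal_cdf((math.log(x) - mu) / sigma)

def mllnd_cdf(x, p, theta, mu, sigma):
    """Mixture cdf: p * F1 + (1 - p) * F2."""
    return p * lindley_cdf(x, theta) + (1.0 - p) * lognormal_cdf(x, mu, sigma)

# Illustrative parameter values only (not taken from the paper).
params = dict(p=0.4, theta=1.5, mu=1.0, sigma=0.5)
grid = [0.5 * k for k in range(41)]          # 0.0, 0.5, ..., 20.0
values = [mllnd_cdf(x, **params) for x in grid]
for x, v in zip(grid[:6], values[:6]):
    print(f"F({x:.1f}) = {v:.4f}")
```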

Ghitany et al. [13] and Shanker et al. [19] presented some properties of the LD in (2), while properties of the LND in (3) were given, for example, by Crow and Shimizu [20] and Johnson et al. [21]. In this section, we derive some properties of the MLLND by combining the corresponding results for the LD and LND.

2.1. Mean and Variance

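The mean of the MLLND in (1) is simply given by

$$E(X) = p\,\frac{\theta+2}{\theta(\theta+1)} + (1-p)\,e^{\mu+\sigma^2/2},$$

while the variance is given by

$$\mathrm{Var}(X) = p\,\frac{2(\theta+3)}{\theta^2(\theta+1)} + (1-p)\,e^{2\mu+2\sigma^2} - [E(X)]^2.$$

Also, the $r$th moment of the MLLND is given by

$$E(X^r) = p\,\frac{r!\,(\theta+r+1)}{\theta^r(\theta+1)} + (1-p)\,e^{r\mu+r^2\sigma^2/2}, \quad r = 1, 2, \ldots$$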
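The moment formulas can be illustrated with a short sketch. It assumes the well-known raw moments of the components, $r!(\theta+r+1)/[\theta^r(\theta+1)]$ for the Lindley distribution and $e^{r\mu+r^2\sigma^2/2}$ for the lognormal distribution; the function names and parameter values are hypothetical.

```python
import math

def lindley_raw_moment(r, theta):
    """r-th raw moment of the Lindley distribution."""
    return math.factorial(r) * (theta + r + 1.0) / (theta ** r * (theta + 1.0))

def lognormal_raw_moment(r, mu, sigma):
    """r-th raw moment of the lognormal distribution."""
    return math.exp(r * mu + 0.5 * r * r * sigma * sigma)

def mllnd_raw_moment(r, p, theta, mu, sigma):
    """Raw moments of a mixture are the weighted component moments."""
    return (p * lindley_raw_moment(r, theta)
            + (1.0 - p) * lognormal_raw_moment(r, mu, sigma))

def mllnd_mean_var(p, theta, mu, sigma):
    """Mean and variance from the first two raw moments."""
    m1 = mllnd_raw_moment(1, p, theta, mu, sigma)
    m2 = mllnd_raw_moment(2, p, theta, mu, sigma)
    return m1, m2 - m1 * m1

# Illustrative parameter values only (not taken from the paper).
mean, var = mllnd_mean_var(p=0.4, theta=1.5, mu=1.0, sigma=0.5)
print("mean:", mean, "variance:", var)
```

Setting `p=1` or `p=0` recovers the pure Lindley or pure lognormal moments, which gives a convenient check of the implementation.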

2.2. Mode and Median

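It can be shown that the modes of the MLLND are the solutions of $f'(x;\Theta) = 0$, that is,

$$p\,\frac{\theta^2}{1+\theta}\,(1-\theta-\theta x)\,e^{-\theta x} - (1-p)\,\frac{\sigma^2+\ln x-\mu}{\sigma^2 x}\,f_2(x;\mu,\sigma) = 0,$$

where $f_2$ is the lognormal pdf in (3), and the median $m$ is the solution of

$$p\left[1 - \frac{1+\theta+\theta m}{1+\theta}\,e^{-\theta m}\right] + (1-p)\,\Phi\left(\frac{\ln m-\mu}{\sigma}\right) = \frac{1}{2}.$$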

Figure 1(a) shows the pdf of the MLLND in the unimodal case, for a choice of parameters giving mode 1.1623 and median 1.2712. Figure 1(b) shows the shape of the pdf in the bimodal case, for a choice of parameters giving modes 1.4370 and 2.1430 and median 1.4542. For plotting the pdfs of the LD and LND in R, we use the functions dlindley() and dlnorm(), respectively. The R package rootSolve is used to compute the modes and median of the MLLND.
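The mode and median equations have no closed-form solution, so they are solved numerically. The sketch below is a plain-Python stand-in for that step (it is not the rootSolve-based code used for Figure 1): it scans the derivative of the mixture pdf for sign changes from positive to negative and refines each bracket by bisection, and it finds the median by bisection on $F(m) - 1/2$. The parameter values are assumptions, so the output does not reproduce the values quoted for Figure 1.

```python
import math

def lognormal_pdf(x, mu, sigma):
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))

def mllnd_pdf(x, p, theta, mu, sigma):
    lind = theta ** 2 / (1.0 + theta) * (1.0 + x) * math.exp(-theta * x)
    return p * lind + (1.0 - p) * lognormal_pdf(x, mu, sigma)

def mllnd_cdf(x, p, theta, mu, sigma):
    f1 = 1.0 - (1.0 + theta + theta * x) / (1.0 + theta) * math.exp(-theta * x)
    f2 = 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))
    return p * f1 + (1.0 - p) * f2

def mllnd_pdf_slope(x, p, theta, mu, sigma):
    """Analytic derivative of the mixture pdf (left side of the mode equation)."""
    d1 = theta ** 2 / (1.0 + theta) * (1.0 - theta - theta * x) * math.exp(-theta * x)
    d2 = -lognormal_pdf(x, mu, sigma) * (sigma ** 2 + math.log(x) - mu) / (sigma ** 2 * x)
    return p * d1 + (1.0 - p) * d2

def bisect(g, lo, hi, tol=1e-10):
    """Bisection for a root of g on [lo, hi], assuming a sign change."""
    g_lo = g(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if (g_lo < 0) == (g_mid < 0):
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def find_modes(params, upper=50.0, step=0.01):
    """Interior local maxima: slope changes from positive to non-positive."""
    slope = lambda x: mllnd_pdf_slope(x, **params)
    modes, x, prev = [], step, slope(step)
    while x < upper:
        nxt = slope(x + step)
        if prev > 0 and nxt <= 0:
            modes.append(bisect(slope, x, x + step))
        x, prev = x + step, nxt
    return modes

# Illustrative parameter values only (not taken from the paper).
params = dict(p=0.4, theta=1.5, mu=1.0, sigma=0.5)
modes = find_modes(params)
median = bisect(lambda x: mllnd_cdf(x, **params) - 0.5, 1e-9, 50.0)
print("interior modes:", modes)
print("median:", median)
```

The scan only detects interior modes; when $\theta \ge 1$ the Lindley component is decreasing, so the density may also have a boundary maximum as $x \to 0$ that this sketch does not report.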

2.3. Reliability and Failure Rate Functions

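The reliability function of the MLLND is given by

$$R(x) = 1 - F(x) = p\,R_1(x) + (1-p)\,R_2(x),$$

where $R_1(x) = \dfrac{1+\theta+\theta x}{1+\theta}\,e^{-\theta x}$ and $R_2(x) = 1 - \Phi\left(\dfrac{\ln x-\mu}{\sigma}\right)$ are the reliability functions of the LD and LND, respectively.

Using the pdf in (1) and the reliability function above, the hazard rate function (HRF) of the MLLND is given by

$$h(x) = \frac{f(x)}{R(x)} = \frac{p\,f_1(x) + (1-p)\,f_2(x)}{p\,R_1(x) + (1-p)\,R_2(x)}, \tag{12}$$

which can be written, by using the result in AL-Hussaini and Sultan [22], as

$$h(x) = w(x)\,h_1(x) + [1-w(x)]\,h_2(x), \tag{13}$$

where

$$w(x) = \frac{p\,R_1(x)}{R(x)}, \qquad h_j(x) = \frac{f_j(x)}{R_j(x)}, \quad j = 1, 2.$$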

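The HRF of the MLLND in (12) satisfies the following limits.

Lemma 1.

$$\lim_{x\to 0^+} h(x) = \frac{p\,\theta^2}{1+\theta}, \tag{15}$$

$$\lim_{x\to\infty} h(x) = 0. \tag{16}$$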

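Proof. To prove the first limit, using equation (13) with $w(x) = pR_1(x)/R(x)$ and component hazard rates $h_j(x) = f_j(x)/R_j(x)$, we get

$$\lim_{x\to 0^+} h(x) = \lim_{x\to 0^+} w(x)\,h_1(x) + \lim_{x\to 0^+}[1-w(x)]\,h_2(x).$$

Since $R_1(0) = R_2(0) = 1$, it follows that $w(0) = p$, $h_1(0) = f_1(0) = \theta^2/(1+\theta)$, and $h_2(0^+) = f_2(0^+) = 0$. Then, we have

$$\lim_{x\to 0^+} h(x) = \frac{p\,\theta^2}{1+\theta},$$

and thus (15) is proved.

Also, to prove (16), note that

$$\lim_{x\to\infty} h_1(x) = \lim_{x\to\infty}\frac{\theta^2(1+x)}{1+\theta+\theta x} = \theta, \qquad \lim_{x\to\infty} h_2(x) = 0.$$

Moreover, by L'Hôpital's rule, $\lim_{x\to\infty} R_1(x)/R_2(x) = \lim_{x\to\infty} f_1(x)/f_2(x) = 0$, because the Lindley density decays exponentially while the lognormal density decays more slowly, so that $w(x) \to 0$ for $p < 1$.

It follows that

$$\lim_{x\to\infty} h(x) = 0\cdot\theta + 1\cdot 0 = 0.$$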

For more details, see Sultan [23], Sultan and Al-Moisheer [24], and Al-Moisheer et al. [16].
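The limiting behaviour of the HRF can be checked numerically. The sketch below evaluates $h(x) = f(x)/R(x)$ close to zero and for large $x$, and also confirms that the weighted form $w(x)h_1(x) + [1-w(x)]h_2(x)$ with $w(x) = pR_1(x)/R(x)$ agrees with the direct ratio. It assumes the standard Lindley and lognormal forms, under which the mixture hazard tends to $p\theta^2/(1+\theta)$ as $x \to 0^+$ and to $0$ as $x \to \infty$; the parameter values are illustrative.

```python
import math

def lindley_pdf(x, theta):
    return theta ** 2 / (1.0 + theta) * (1.0 + x) * math.exp(-theta * x)

def lindley_sf(x, theta):
    """Lindley reliability (survival) function."""
    return (1.0 + theta + theta * x) / (1.0 + theta) * math.exp(-theta * x)

def lognormal_pdf(x, mu, sigma):
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))

def lognormal_sf(x, mu, sigma):
    """Lognormal reliability; erfc keeps accuracy far in the upper tail."""
    z = (math.log(x) - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def mllnd_hazard(x, p, theta, mu, sigma):
    """Direct form: h(x) = f(x) / R(x)."""
    f = p * lindley_pdf(x, theta) + (1.0 - p) * lognormal_pdf(x, mu, sigma)
    r = p * lindley_sf(x, theta) + (1.0 - p) * lognormal_sf(x, mu, sigma)
    return f / r

def mllnd_hazard_weighted(x, p, theta, mu, sigma):
    """Weighted form: w(x) h1(x) + (1 - w(x)) h2(x), w = p R1 / R."""
    r1, r2 = lindley_sf(x, theta), lognormal_sf(x, mu, sigma)
    w = p * r1 / (p * r1 + (1.0 - p) * r2)
    h1 = lindley_pdf(x, theta) / r1
    h2 = lognormal_pdf(x, mu, sigma) / r2
    return w * h1 + (1.0 - w) * h2

# Illustrative parameter values only (not taken from the paper).
params = dict(p=0.4, theta=1.5, mu=1.0, sigma=0.5)
limit_at_zero = params["p"] * params["theta"] ** 2 / (1.0 + params["theta"])
for x in (1e-9, 1e-3, 1.0, 1e2, 1e4, 1e6):
    print(f"h({x:g}) = {mllnd_hazard(x, **params):.6g}")
print("p * theta^2 / (1 + theta) =", limit_at_zero)
```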

2.4. Skewness, Kurtosis, and the Coefficient of Variation

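The coefficient of skewness (Sk), the coefficient of kurtosis (Ku), and the coefficient of variation (Cv) of the MLLND are given by, respectively,

$$\mathrm{Sk} = \frac{\mu_3}{\mu_2^{3/2}}, \qquad \mathrm{Ku} = \frac{\mu_4}{\mu_2^{2}}, \qquad \mathrm{Cv} = \frac{\sqrt{\mu_2}}{E(X)},$$

where $\mu_k = E\{[X - E(X)]^k\}$, $k = 2, 3, 4$, are the central moments of the MLLND, which are obtained from its raw moments $E(X^r)$.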

Some values of the mean, standard deviation, coefficient of variation, coefficient skewness, and coefficient kurtosis for the MLLND distributions are obtained for the two choices of the parameter and different values of the parameter . From the results which are presented in Tables 1 and 2, we note that as increases, both the mean and the standard deviation increase, whereas the values of the other measures remain fairly stable.
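The quantities reported in such tables can be computed directly from the raw moments. The following sketch assumes the usual moment-based definitions Sk $= \mu_3/\mu_2^{3/2}$, Ku $= \mu_4/\mu_2^2$ (not the excess kurtosis), and Cv $= \sqrt{\mu_2}/E(X)$, together with the standard raw moments of the Lindley and lognormal components. The parameter values are hypothetical, so the output does not reproduce Tables 1 and 2.

```python
import math

def mllnd_raw_moment(r, p, theta, mu, sigma):
    """r-th raw moment of the mixture from the component raw moments."""
    lind = math.factorial(r) * (theta + r + 1.0) / (theta ** r * (theta + 1.0))
    logn = math.exp(r * mu + 0.5 * r * r * sigma * sigma)
    return p * lind + (1.0 - p) * logn

def mllnd_shape(p, theta, mu, sigma):
    """Mean, sd, Cv, Sk and Ku computed from raw moments of order 1 to 4."""
    m1, m2, m3, m4 = (mllnd_raw_moment(r, p, theta, mu, sigma) for r in (1, 2, 3, 4))
    mu2 = m2 - m1 ** 2                                   # variance
    mu3 = m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3             # third central moment
    mu4 = m4 - 4.0 * m1 * m3 + 6.0 * m1 ** 2 * m2 - 3.0 * m1 ** 4
    sd = math.sqrt(mu2)
    return {"mean": m1, "sd": sd, "cv": sd / m1,
            "sk": mu3 / mu2 ** 1.5, "ku": mu4 / mu2 ** 2}

# Illustrative parameter values only (not taken from the paper).
for p in (0.2, 0.5, 0.8):
    summary = mllnd_shape(p, theta=1.5, mu=1.0, sigma=0.5)
    print(p, {k: round(v, 4) for k, v in summary.items()})
```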

3. Order Statistics

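Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ selected from a distribution with pdf $f(x)$ and cdf $F(x)$, and let $X_{1:n} \le X_{2:n} \le \cdots \le X_{n:n}$ be the corresponding order statistics. The pdf of the $r$th order statistic, say $X_{r:n}$, is given by

$$f_{r:n}(x) = \frac{n!}{(r-1)!\,(n-r)!}\,[F(x)]^{r-1}\,[1-F(x)]^{n-r}\,f(x), \tag{25}$$

and the corresponding cdf is given by

$$F_{r:n}(x) = \sum_{j=r}^{n}\binom{n}{j}\,[F(x)]^{j}\,[1-F(x)]^{n-j}. \tag{26}$$

Therefore, using (25) and (26), the pdf and cdf of the $r$th order statistic of the MLLND are, respectively, given by

$$f_{r:n}(x) = \frac{n!}{(r-1)!\,(n-r)!}\,[pF_1(x)+(1-p)F_2(x)]^{r-1}\,[pR_1(x)+(1-p)R_2(x)]^{n-r}\,[pf_1(x)+(1-p)f_2(x)],$$

$$F_{r:n}(x) = \sum_{j=r}^{n}\binom{n}{j}\,[pF_1(x)+(1-p)F_2(x)]^{j}\,[pR_1(x)+(1-p)R_2(x)]^{n-j},$$

where $R_j(x) = 1 - F_j(x)$, and $f_j$ and $F_j$, $j = 1, 2$, are the pdfs and cdfs of the LD and LND, respectively.

Accordingly, the density functions of the minimum and maximum order statistics, respectively, are given by

$$f_{1:n}(x) = n\,[pR_1(x)+(1-p)R_2(x)]^{n-1}\,f(x), \qquad f_{n:n}(x) = n\,[pF_1(x)+(1-p)F_2(x)]^{n-1}\,f(x).$$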
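As an illustration, the order-statistic formulas can be coded directly from the general expressions $f_{r:n}(x) = \frac{n!}{(r-1)!(n-r)!}F^{r-1}(1-F)^{n-r}f$ and $F_{r:n}(x) = \sum_{j=r}^{n}\binom{n}{j}F^{j}(1-F)^{n-j}$, with $f$ and $F$ taken as the mixture pdf and cdf. The sketch assumes the standard Lindley and lognormal forms and uses hypothetical parameter values.

```python
import math

def mllnd_pdf(x, p, theta, mu, sigma):
    lind = theta ** 2 / (1.0 + theta) * (1.0 + x) * math.exp(-theta * x)
    z = (math.log(x) - mu) / sigma
    logn = math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))
    return p * lind + (1.0 - p) * logn

def mllnd_cdf(x, p, theta, mu, sigma):
    f1 = 1.0 - (1.0 + theta + theta * x) / (1.0 + theta) * math.exp(-theta * x)
    f2 = 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))
    return p * f1 + (1.0 - p) * f2

def order_stat_pdf(x, r, n, params):
    """pdf of the r-th order statistic in a sample of size n."""
    big_f = mllnd_cdf(x, **params)
    const = math.factorial(n) / (math.factorial(r - 1) * math.factorial(n - r))
    return const * big_f ** (r - 1) * (1.0 - big_f) ** (n - r) * mllnd_pdf(x, **params)

def order_stat_cdf(x, r, n, params):
    """cdf of the r-th order statistic: at least r observations are <= x."""
    big_f = mllnd_cdf(x, **params)
    return sum(math.comb(n, j) * big_f ** j * (1.0 - big_f) ** (n - j)
               for j in range(r, n + 1))

# Illustrative parameter values only (not taken from the paper).
params = dict(p=0.4, theta=1.5, mu=1.0, sigma=0.5)
n, x = 5, 2.0
print("pdf of the minimum at x:", order_stat_pdf(x, 1, n, params))
print("pdf of the maximum at x:", order_stat_pdf(x, n, n, params))
```

For $r = 1$ and $r = n$ the general formula reduces to $n[1-F]^{n-1}f$ and $nF^{n-1}f$, which is a convenient check.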

4. Maximum Likelihood Estimation

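The likelihood function (LF) for the MLLND in (1) is given by

$$L(\Theta;\mathbf{x}) = \prod_{i=1}^{n}\left[p\,f_1(x_i;\theta) + (1-p)\,f_2(x_i;\mu,\sigma)\right],$$

where $\Theta = (p,\theta,\mu,\sigma)$ and $\mathbf{x} = (x_1, x_2, \ldots, x_n)$. By differentiating the log-likelihood $\ell = \ln L$ with respect to the model parameters $p$, $\theta$, $\mu$, and $\sigma$, respectively, and equating the derivatives to zero, we get the following equations:

$$\begin{aligned}
\frac{\partial\ell}{\partial p} &= \sum_{i=1}^{n}\frac{f_1(x_i)-f_2(x_i)}{f(x_i)} = 0,\\
\frac{\partial\ell}{\partial\theta} &= p\sum_{i=1}^{n}\frac{f_1(x_i)}{f(x_i)}\left[\frac{2+\theta}{\theta(1+\theta)} - x_i\right] = 0,\\
\frac{\partial\ell}{\partial\mu} &= (1-p)\sum_{i=1}^{n}\frac{f_2(x_i)}{f(x_i)}\,\frac{\ln x_i-\mu}{\sigma^2} = 0,\\
\frac{\partial\ell}{\partial\sigma} &= (1-p)\sum_{i=1}^{n}\frac{f_2(x_i)}{f(x_i)}\left[\frac{(\ln x_i-\mu)^2}{\sigma^3} - \frac{1}{\sigma}\right] = 0,
\end{aligned} \tag{33}$$

where $f$, $f_1$, and $f_2$ are as in (1), (2), and (3), respectively. The MLEs of the parameters can be obtained by solving the system of nonlinear equations in (33) using the R package nleqslv.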
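For readers without access to a nonlinear equation solver, the likelihood can also be maximized with the EM algorithm. The sketch below is an alternative route to the same MLEs, not the nleqslv-based procedure described above. It assumes the standard Lindley and lognormal densities, uses the closed-form Lindley MLE $\hat\theta = \{-(\bar x-1)+\sqrt{(\bar x-1)^2+8\bar x}\}/(2\bar x)$ with a weighted mean in the M-step, and draws Lindley variates as a mixture of Exp($\theta$) and Gamma($2,\theta$). The true parameter values, sample size, seed, and starting values are all illustrative assumptions.

```python
import math
import random

def lindley_pdf(x, theta):
    return theta ** 2 / (1.0 + theta) * (1.0 + x) * math.exp(-theta * x)

def lognormal_pdf(x, mu, sigma):
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))

def sample_mllnd(n, p, theta, mu, sigma, rng):
    """Draw n mixture variates; Lindley = Exp/Gamma(2) mixture."""
    out = []
    for _ in range(n):
        if rng.random() < p:
            if rng.random() < theta / (1.0 + theta):
                out.append(rng.expovariate(theta))
            else:
                out.append(rng.gammavariate(2.0, 1.0 / theta))
        else:
            out.append(rng.lognormvariate(mu, sigma))
    return out

def log_likelihood(data, p, theta, mu, sigma):
    return sum(math.log(p * lindley_pdf(x, theta)
                        + (1.0 - p) * lognormal_pdf(x, mu, sigma)) for x in data)

def em_fit(data, p, theta, mu, sigma, iters=200):
    """EM iterations for the two-component mixture (fixed iteration count)."""
    logs = [math.log(x) for x in data]
    n = len(data)
    for _ in range(iters):
        # E-step: posterior probability that each point is from the Lindley part.
        w = []
        for x in data:
            a = p * lindley_pdf(x, theta)
            b = (1.0 - p) * lognormal_pdf(x, mu, sigma)
            w.append(a / (a + b))
        sw = sum(w)
        sv = n - sw
        # M-step: weighted closed-form updates for each component.
        p = sw / n
        xbar = sum(wi * x for wi, x in zip(w, data)) / sw
        theta = (-(xbar - 1.0) + math.sqrt((xbar - 1.0) ** 2 + 8.0 * xbar)) / (2.0 * xbar)
        mu = sum((1.0 - wi) * l for wi, l in zip(w, logs)) / sv
        sigma = math.sqrt(sum((1.0 - wi) * (l - mu) ** 2 for wi, l in zip(w, logs)) / sv)
    return p, theta, mu, sigma

rng = random.Random(1)
data = sample_mllnd(500, 0.4, 1.5, 1.0, 0.5, rng)   # hypothetical true values
log_data = [math.log(x) for x in data]
mean_log = sum(log_data) / len(log_data)
sd_log = math.sqrt(sum((l - mean_log) ** 2 for l in log_data) / len(log_data))
start = (0.5, 1.0, mean_log, sd_log)                # crude starting values
ll_start = log_likelihood(data, *start)
fit = em_fit(data, *start)
ll_fit = log_likelihood(data, *fit)
print("estimates (p, theta, mu, sigma):", fit)
print("log-likelihood: start", ll_start, "fit", ll_fit)
```

Each EM iteration cannot decrease the log-likelihood, so comparing `ll_fit` with `ll_start` gives a simple correctness check; at convergence the estimates satisfy the likelihood equations.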

The simulation results are reported in Table 3 for two different combinations of the parameters. The first corresponds to a unimodal distribution, whereas the second corresponds to a bimodal distribution.

In each case, the averages of the MLEs, the biases, the mean squared errors (MSEs), and the lower and upper limits of the confidence intervals (CIs) for the parameters are computed at different sample sizes.

It is clear from Table 3 that the MSE decreases as the sample size increases for all estimated parameters. Furthermore, the bias decreases, and the width of the CIs for the parameters narrows as the sample size increases. The number of replications in the simulation is 10,000.
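The way the bias and MSE are computed in such a study can be sketched on a reduced problem. To keep the example fast and self-contained, the code below simulates only the Lindley component, whose MLE has a closed form, and uses 300 replications rather than 10,000; it is therefore an illustration of the Monte Carlo summary, not a reproduction of Table 3. The true value of $\theta$, the sample sizes, and the seed are assumptions.

```python
import math
import random

def sample_lindley(n, theta, rng):
    """Lindley variates as a mixture of Exp(theta) and Gamma(2, theta)."""
    out = []
    for _ in range(n):
        if rng.random() < theta / (1.0 + theta):
            out.append(rng.expovariate(theta))
        else:
            out.append(rng.gammavariate(2.0, 1.0 / theta))
    return out

def lindley_mle(data):
    """Closed-form MLE of theta for the one-parameter Lindley distribution."""
    xbar = sum(data) / len(data)
    return (-(xbar - 1.0) + math.sqrt((xbar - 1.0) ** 2 + 8.0 * xbar)) / (2.0 * xbar)

def simulate(theta, n, reps, rng):
    """Average estimate, bias and MSE over `reps` simulated samples of size n."""
    estimates = [lindley_mle(sample_lindley(n, theta, rng)) for _ in range(reps)]
    average = sum(estimates) / reps
    bias = average - theta
    mse = sum((e - theta) ** 2 for e in estimates) / reps
    return average, bias, mse

rng = random.Random(2024)
true_theta = 1.5                      # hypothetical true value
results = {n: simulate(true_theta, n, 300, rng) for n in (25, 100, 400)}
for n, (average, bias, mse) in results.items():
    print(f"n={n:4d}  average={average:.4f}  bias={bias:+.4f}  MSE={mse:.5f}")
```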

5. Application

The flexibility of the proposed model is illustrated by applying it to a real data set given in Shanker et al. [25]. The data represent the remission times (in months) of a sample of 128 bladder cancer patients, as reported in Lee and Wang [26]. These data were previously analyzed by Daghestani et al. [18], who compared their proposed model, the mixture of Lindley and Weibull distributions (MLWD), with two other models: the mixture of two one-parameter Lindley distributions (MTLD) and the mixture of two Weibull distributions (MTWD). They showed that their model provides the best fit, as it has the lowest values of the KS statistic and the AIC and the highest p-value.

In this paper, we compare our proposed model with six other models, including the models given in Daghestani et al. [18] and other models, namely, the mixture of Lindley and inverse Weibull distributions (MLIWD), the mixture of inverse Weibull and Weibull distributions (MIWWD), the one-component Lindley distribution, and the one-component lognormal distribution. All seven models are fitted to the real data, and the results are listed in Table 4. Table 4 shows the MLEs of the parameters of the seven models with their standard errors, together with the values of the KS statistic, which is used to assess the agreement between the data and the fitted distributions. In R, the packages MASS and fitdistrplus are used to calculate the KS statistics and their corresponding p-values for the seven distributions. The log-likelihood and some criteria that measure the quality of the fitted models, such as the AIC and BIC, are also computed. Table 5 gives the variance-covariance matrices for the competitive models, calculated by the function vcov() in R applied to the output of fitdist(). Lower and upper limits of the confidence intervals (CIs) for the parameters of the different distributions are also provided.

Figure 2 displays the pdfs of the seven fitted models superimposed on the histogram of the real data set, produced with the function denscomp() in R. The figure shows that the MLLND provides a very good fit for these data compared to the other mixture and one-component models. Figure 3 compares the theoretical cdfs of the fitted distributions with the empirical cdf of the data using the function cdfcomp() in R. Again, the cdf of the MLLND is closer to the empirical distribution than that of any other model. Figures 4 and 5 show the pp plots and qq plots of the compared models for the real data, obtained with the functions ppcomp() and qqcomp() in R, respectively. The plots show the adequacy of the proposed model for fitting the real data compared to the other models. In short, all the above figures indicate that the MLLND provides the best fit for the real data set among the competitive models.
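The goodness-of-fit measures used in the comparison follow standard definitions and can be sketched as below: the KS statistic is the largest distance between the empirical and fitted cdfs, AIC $= 2k - 2\ell$, and BIC $= k\ln n - 2\ell$, where $k$ is the number of parameters and $\ell$ the maximized log-likelihood. The example fits only a one-component Lindley model to a small made-up sample (`toy_data` is hypothetical and is not the bladder cancer data), so the numbers have no connection to Table 4.

```python
import math

def ks_statistic(data, cdf):
    """Kolmogorov-Smirnov distance between the ECDF and a fitted cdf."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        fx = cdf(x)
        d = max(d, i / n - fx, fx - (i - 1) / n)
    return d

def aic(loglik, k):
    """Akaike information criterion."""
    return 2.0 * k - 2.0 * loglik

def bic(loglik, k, n):
    """Bayesian information criterion."""
    return k * math.log(n) - 2.0 * loglik

def lindley_pdf(x, theta):
    return theta ** 2 / (1.0 + theta) * (1.0 + x) * math.exp(-theta * x)

def lindley_cdf(x, theta):
    return 1.0 - (1.0 + theta + theta * x) / (1.0 + theta) * math.exp(-theta * x)

# Hypothetical toy sample, for illustration only.
toy_data = [0.8, 1.9, 0.4, 3.2, 2.5, 1.1, 5.6, 0.6]
xbar = sum(toy_data) / len(toy_data)
theta_hat = (-(xbar - 1.0) + math.sqrt((xbar - 1.0) ** 2 + 8.0 * xbar)) / (2.0 * xbar)
loglik = sum(math.log(lindley_pdf(x, theta_hat)) for x in toy_data)
d = ks_statistic(toy_data, lambda x: lindley_cdf(x, theta_hat))
print("theta_hat:", theta_hat)
print("KS:", d, "AIC:", aic(loglik, 1), "BIC:", bic(loglik, 1, len(toy_data)))
```

Lower values of the KS statistic, AIC, and BIC indicate a better fit, which is the ordering used when ranking the competing models.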

6. Concluding Remarks

This paper introduces a new mixture model, the MLLND, for handling heterogeneous data. The model was proposed because of the importance of the Lindley and lognormal distributions and their wide range of applications, so that mixing these two distributions was expected to lead to a more flexible model than its component distributions. Some properties of the MLLND were obtained, such as the mean, variance, mode(s), median, reliability function, HRF, skewness, kurtosis, and coefficient of variation. The pdfs of the minimum and maximum order statistics of the MLLND were presented. Maximum likelihood estimation of the model parameters was discussed, and the estimators were evaluated via simulation with 10,000 replications. The main objective of this paper was to illustrate the applicability of the proposed distribution compared to six competitive distributions. This was achieved by showing that the model fits a well-known real data set better than the compared models, using a formal test based on the KS statistic, information criteria, and graphical procedures such as plots of theoretical and empirical cdfs, pp plots, and qq plots.

Abbreviations

MLLND: Mixture of Lindley and lognormal distributions
MLWD: Mixture of Lindley and Weibull distributions
MTLD: Mixture of two Lindley distributions
MTWD: Mixture of two Weibull distributions
MLIWD: Mixture of Lindley and inverse Weibull distributions
MIWWD: Mixture of inverse Weibull and Weibull distributions
LD: One-component Lindley distribution
LND: One-component lognormal distribution
MLEs: Maximum likelihood estimates
LF: Likelihood function
pdf: Probability density function
cdf: Cumulative distribution function
HRF: Hazard rate function
Sk: Coefficient of skewness
Ku: Coefficient of kurtosis
Cv: Coefficient of variation
CIs: Confidence intervals
AIC: Akaike information criterion
BIC: Bayesian information criterion
KS: Kolmogorov-Smirnov
ECDF: Empirical cumulative distribution function
pp plot: Probability-probability plot
qq plot: Quantile-quantile plot
Std. Error: Standard error.

Data Availability

The data used to support the findings of this study are available in Shanker et al. [25] (doi:10.15406/bbij.2016.03.00061).

Conflicts of Interest

The author declares that there are no conflicts of interest regarding the publication of this paper.