The Beta-Lindley Distribution: Properties and Applications
We introduce a new continuous distribution, the beta-Lindley distribution, which extends the Lindley distribution. We provide a comprehensive mathematical treatment of this distribution. We derive the moment generating function and the rth moment, thus generalizing some results in the literature. Expressions for the density, moment generating function, and rth moment of the order statistics are also obtained. Further, we discuss estimation of the unknown model parameters in both the classical and Bayesian setups. The usefulness of the new model is illustrated by means of two real data sets. We hope that the new distribution proposed here will serve as an alternative to other models available in the literature for modelling positive real data in many areas.
In many applied sciences, such as medicine, engineering, and finance, modelling and analysing lifetime data are crucial. Several lifetime distributions have been used to model such data. The quality of the procedures used in a statistical analysis depends heavily on the assumed probability model or distribution. Because of this, considerable effort has been expended in the development of large classes of standard probability distributions, along with relevant statistical methodologies. However, there remain many important problems where real data do not follow any of the classical or standard probability models.
Several beta-generalized distributions have been discussed in recent years. Eugene et al., Nadarajah and Gupta, Nadarajah and Kotz, and Nadarajah and Kotz proposed the beta-normal, beta-Gumbel, beta-Fréchet, and beta-exponential distributions, respectively. Jones discussed this general beta family, motivated by its order statistics, and showed that it has interesting distributional properties and potential for exciting statistical applications.
Recently, Barreto-Souza et al.  proposed the beta-generalized exponential distribution, Pescim et al.  introduced the beta-generalized half-normal distribution, and Cordeiro et al.  defined the beta-generalized Rayleigh distribution with applications to lifetime data.
In this paper, we present a new generalization of the Lindley distribution, called the beta-Lindley distribution. The Lindley distribution was originally proposed by Lindley in the context of Bayesian statistics, as a counterexample to fiducial statistics.
Definition 1. A random variable $X$ is said to have the Lindley distribution with parameter $\theta > 0$ if its probability density function is defined as
$f(x;\theta) = \dfrac{\theta^{2}}{\theta+1}\,(1+x)\,e^{-\theta x}, \quad x > 0.$ (1)
The corresponding cumulative distribution function (CDF) is
$F(x;\theta) = 1 - \dfrac{\theta + 1 + \theta x}{\theta + 1}\,e^{-\theta x}, \quad x > 0.$ (2)
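As a quick numerical sketch, the density (1) and CDF (2) can be coded directly; the function names below are ours, chosen for illustration.

```python
import numpy as np

def lindley_pdf(x, theta):
    """Lindley density f(x; theta) = theta^2/(theta+1) * (1+x) * exp(-theta*x), x > 0."""
    return theta**2 / (theta + 1) * (1 + x) * np.exp(-theta * x)

def lindley_cdf(x, theta):
    """Lindley CDF F(x; theta) = 1 - (theta+1+theta*x)/(theta+1) * exp(-theta*x)."""
    return 1 - (theta + 1 + theta * x) / (theta + 1) * np.exp(-theta * x)
```

A simple sanity check is that the numerical derivative of the CDF reproduces the density.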
Ghitany et al. have discussed various statistical properties of the Lindley distribution and shown its applicability over the exponential distribution, finding that the Lindley distribution performs better than the exponential model. One of the main reasons to prefer the Lindley distribution over the exponential distribution is its time-dependent (increasing) hazard rate. Over the last decade, the Lindley distribution has been widely used in different setups by many authors.
The rest of the paper is organized as follows. In Section 2, we introduce the beta-Lindley distribution and demonstrate its flexibility by showing the wide variety of shapes of the density, distribution, and hazard rate functions. The moments and order statistics of the beta-Lindley distribution are derived in Sections 3 and 4, respectively. In Section 5, the maximum likelihood and least squares estimators, as well as Bayes estimators, are constructed for the unknown parameters of the beta-Lindley distribution. To demonstrate the applicability of the proposed distribution, two real data sets are considered in Section 6. A simulation algorithm for generating random samples from the beta-Lindley distribution is also provided in Section 6. The paper is concluded in Section 7.
2. Beta-Lindley Distribution
Let $F(x)$ denote the cumulative distribution function (CDF) of a random variable $X$. The cumulative distribution function for a generalized class of distributions for the random variable $X$, as defined by Eugene et al., is generated by applying the inverse CDF to a beta-distributed random variable, which gives
$G(x) = I_{F(x)}(a,b) = \dfrac{1}{B(a,b)} \int_{0}^{F(x)} w^{a-1}(1-w)^{b-1}\,dw, \quad a, b > 0.$ (3)
The corresponding probability density function for $G$ is
$g(x) = \dfrac{f(x)}{B(a,b)}\, F(x)^{a-1}\, [1-F(x)]^{b-1},$ (4)
where $f(x)$ is the parent density function and $B(a,b)$ is the beta function. We now introduce the three-parameter beta-Lindley (BL) distribution by taking $F$ in (3) to be the CDF (2). The CDF of the BL distribution is then
$G(x) = I_{1-\frac{\theta+1+\theta x}{\theta+1}e^{-\theta x}}(a,b), \quad x > 0.$ (5)
The PDF of the new distribution is given by
$g(x) = \dfrac{\theta^{2}(1+x)e^{-\theta x}}{(\theta+1)\,B(a,b)} \left[1 - \dfrac{\theta+1+\theta x}{\theta+1}e^{-\theta x}\right]^{a-1} \left[\dfrac{\theta+1+\theta x}{\theta+1}e^{-\theta x}\right]^{b-1}.$ (6)
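Since the regularized incomplete beta function $I_{F(x)}(a,b)$ is exactly the beta CDF evaluated at $F(x)$, the BL distribution can be sketched in a few lines (illustrative function names are ours):

```python
import numpy as np
from scipy import stats

def lindley_cdf(x, theta):
    return 1 - (theta + 1 + theta * x) / (theta + 1) * np.exp(-theta * x)

def lindley_pdf(x, theta):
    return theta**2 / (theta + 1) * (1 + x) * np.exp(-theta * x)

def bl_cdf(x, a, b, theta):
    """BL CDF: the regularized incomplete beta function I_{F(x)}(a, b)."""
    return stats.beta.cdf(lindley_cdf(x, theta), a, b)

def bl_pdf(x, a, b, theta):
    """BL density: f(x) * F(x)^(a-1) * (1-F(x))^(b-1) / B(a, b)."""
    F = lindley_cdf(x, theta)
    return lindley_pdf(x, theta) * stats.beta.pdf(F, a, b)
```

Setting $a = b = 1$ recovers the parent Lindley distribution, which gives a convenient consistency check.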
Figure 1(a) illustrates some of the possible shapes of the PDF of the beta-Lindley distribution for selected values of the parameters $a$, $b$, and $\theta$.
(a) PDF of BL
(b) Hazard function of BL
Lemma 4. The limit of the beta-Lindley density as $x \to \infty$ is $0$; the limit as $x \to 0^{+}$ is $0$ if $a > 1$, $b\theta^{2}/(\theta+1)$ if $a = 1$, and $\infty$ if $a < 1$.
Proof. It is straightforward to show the above from the beta-Lindley density in (6).
The reliability function $R(t)$, which is the probability of an item not failing prior to some time $t$, is defined by $R(t) = 1 - G(t)$. The reliability function of the beta-Lindley distribution is given by
$R(t) = 1 - I_{F(t;\theta)}(a,b), \quad t > 0,$
where $F(\cdot;\theta)$ is the Lindley CDF (2). Another characteristic of interest of a random variable is the hazard rate function, defined by $h(t) = g(t)/(1 - G(t))$, an important quantity characterizing life phenomena. It can be loosely interpreted as the conditional probability of failure, given survival up to time $t$. The hazard rate function for a beta-Lindley random variable is given by
$h(t) = \dfrac{g(t)}{1 - I_{F(t;\theta)}(a,b)},$
with $g$ as in (6).
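The reliability and hazard rate functions above can be sketched numerically as follows (function names are ours; `stats.beta.sf` computes $1 - I_{F}(a,b)$ directly):

```python
import numpy as np
from scipy import stats

def lindley_cdf(x, theta):
    return 1 - (theta + 1 + theta * x) / (theta + 1) * np.exp(-theta * x)

def lindley_pdf(x, theta):
    return theta**2 / (theta + 1) * (1 + x) * np.exp(-theta * x)

def bl_reliability(t, a, b, theta):
    """R(t) = 1 - I_{F(t)}(a, b): probability of surviving beyond time t."""
    return stats.beta.sf(lindley_cdf(t, theta), a, b)

def bl_hazard(t, a, b, theta):
    """h(t) = g(t) / R(t)."""
    F = lindley_cdf(t, theta)
    return lindley_pdf(t, theta) * stats.beta.pdf(F, a, b) / stats.beta.sf(F, a, b)
```

For $a = b = 1$ the hazard reduces to the Lindley hazard $\theta^{2}(1+t)/(\theta+1+\theta t)$, which serves as a check.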
Figure 1(b) illustrates some of the possible shapes of the hazard function of the beta-Lindley distribution for selected values of the parameters $a$, $b$, and $\theta$.
3. Moments and Generating Function
Theorem 5. The $r$th moment of the beta-Lindley distributed random variable $X$, if $a$ and $b$ are real nonintegers, is given as
Proof. See the appendix.
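Any closed-form moment expression can be verified against direct numerical integration of the density (6). The following sketch (our own illustrative helper) computes raw moments by quadrature:

```python
import numpy as np
from scipy import integrate, stats

def lindley_cdf(x, theta):
    return 1 - (theta + 1 + theta * x) / (theta + 1) * np.exp(-theta * x)

def bl_pdf(x, a, b, theta):
    f = theta**2 / (theta + 1) * (1 + x) * np.exp(-theta * x)
    return f * stats.beta.pdf(lindley_cdf(x, theta), a, b)

def bl_moment(r, a, b, theta):
    """Numerical r-th raw moment E[X^r] of the BL distribution by quadrature."""
    val, _ = integrate.quad(lambda x: x**r * bl_pdf(x, a, b, theta), 0, np.inf)
    return val
```

As a check, for $a = b = 1$ the distribution reduces to the Lindley law, whose mean is $(\theta+2)/(\theta(\theta+1))$, e.g. $3/2$ when $\theta = 1$.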
4. Order Statistics
The $k$th order statistic of a sample is its $k$th smallest value. For a sample of size $n$, the $n$th order statistic (the largest order statistic) is the maximum; that is,
$X_{(n)} = \max\{X_{1}, X_{2}, \ldots, X_{n}\}.$
The sample range is the difference between the maximum and the minimum. It is clearly a function of the order statistics:
$W = X_{(n)} - X_{(1)}.$
We know that if $X_{(k)}$ denotes the $k$th order statistic of a random sample $X_{1}, \ldots, X_{n}$ from a continuous population with CDF $G(x)$ and PDF $g(x)$, then the PDF of $X_{(k)}$ is given by
$g_{X_{(k)}}(x) = \dfrac{n!}{(k-1)!\,(n-k)!}\, g(x)\, [G(x)]^{k-1}\, [1-G(x)]^{n-k},$
for $k = 1, 2, \ldots, n$. The PDF of the $k$th order statistic for the beta-Lindley distribution follows by substituting the CDF (5) and the PDF (6) into this expression.
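The order-statistic density above can be sketched directly (illustrative names are ours; the combinatorial constant uses $\binom{n}{k} k = n!/((k-1)!(n-k)!)$):

```python
import numpy as np
from scipy import integrate, special, stats

def lindley_cdf(x, theta):
    return 1 - (theta + 1 + theta * x) / (theta + 1) * np.exp(-theta * x)

def bl_cdf(x, a, b, theta):
    return stats.beta.cdf(lindley_cdf(x, theta), a, b)

def bl_pdf(x, a, b, theta):
    f = theta**2 / (theta + 1) * (1 + x) * np.exp(-theta * x)
    return f * stats.beta.pdf(lindley_cdf(x, theta), a, b)

def bl_order_stat_pdf(x, k, n, a, b, theta):
    """Density of the k-th order statistic of an i.i.d. BL(a, b, theta) sample of size n:
    n!/((k-1)!(n-k)!) * G^(k-1) * (1-G)^(n-k) * g."""
    G = bl_cdf(x, a, b, theta)
    c = special.comb(n, k, exact=True) * k  # = n! / ((k-1)! (n-k)!)
    return c * G**(k - 1) * (1 - G)**(n - k) * bl_pdf(x, a, b, theta)
```

A useful check is that each order-statistic density integrates to one.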
5.1. Maximum Likelihood Estimates
The maximum likelihood estimates (MLEs) of the parameters of the beta-Lindley distribution are obtained as follows. The likelihood function of the observed sample $x_{1}, \ldots, x_{n}$ of size $n$ drawn from the density (6) is
$L(a, b, \theta) = \prod_{i=1}^{n} g(x_{i}; a, b, \theta).$
The corresponding log-likelihood function is given by
$\ell = 2n\log\theta - n\log(\theta+1) - n\log B(a,b) + \sum_{i=1}^{n}\log(1+x_{i}) - \theta\sum_{i=1}^{n}x_{i} + (a-1)\sum_{i=1}^{n}\log F(x_{i};\theta) + (b-1)\sum_{i=1}^{n}\log[1-F(x_{i};\theta)].$
Setting the partial derivatives of $\ell$ to zero yields, in particular,
$\dfrac{\partial \ell}{\partial a} = -n[\psi(a)-\psi(a+b)] + \sum_{i=1}^{n}\log F(x_{i};\theta) = 0, \qquad \dfrac{\partial \ell}{\partial b} = -n[\psi(b)-\psi(a+b)] + \sum_{i=1}^{n}\log[1-F(x_{i};\theta)] = 0,$
together with a similar (longer) equation in $\theta$, where $\psi(\cdot)$ is the digamma function. The MLEs of $a$, $b$, and $\theta$ are obtained by solving this nonlinear system of equations. It is usually more convenient to use a nonlinear optimization algorithm, such as a quasi-Newton algorithm, to numerically maximize the sample log-likelihood. Applying the usual large-sample approximation, the MLE $(\hat{a}, \hat{b}, \hat{\theta})$ can be treated as approximately trivariate normal with mean $(a, b, \theta)$ and variance-covariance matrix equal to the inverse of the expected information matrix, the limiting variance-covariance matrix of the MLEs. The elements of this matrix can be estimated from the observed information.
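In practice the log-likelihood above is maximized numerically. A minimal sketch, assuming synthetic positive data for illustration (the real data sets of Section 6 are not reproduced here):

```python
import numpy as np
from scipy import optimize, special

def bl_negloglik(params, x):
    """Negative log-likelihood of the BL(a, b, theta) model."""
    a, b, theta = params
    F = 1 - (theta + 1 + theta * x) / (theta + 1) * np.exp(-theta * x)
    logf = 2 * np.log(theta) - np.log(theta + 1) + np.log1p(x) - theta * x
    return -np.sum(logf + (a - 1) * np.log(F) + (b - 1) * np.log1p(-F)
                   - special.betaln(a, b))

# Illustrative fit on a synthetic positive sample:
rng = np.random.default_rng(1)
data = rng.exponential(2.0, size=100)
x0 = np.array([1.0, 1.0, 1.0])
res = optimize.minimize(bl_negloglik, x0=x0, args=(data,),
                        method="L-BFGS-B", bounds=[(1e-6, None)] * 3)
a_hat, b_hat, theta_hat = res.x
```

Standard errors then come from the inverse of the (numerical) Hessian at the optimum, as described below.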
The elements of the Hessian matrix corresponding to the function in (17) are given in the appendix.
Approximate two-sided $100(1-\gamma)\%$ confidence intervals for $a$, $b$, and $\theta$ are, respectively, $\hat{a} \pm z_{\gamma/2}\sqrt{\widehat{\mathrm{var}}(\hat{a})}$, $\hat{b} \pm z_{\gamma/2}\sqrt{\widehat{\mathrm{var}}(\hat{b})}$, and $\hat{\theta} \pm z_{\gamma/2}\sqrt{\widehat{\mathrm{var}}(\hat{\theta})}$, where $z_{\gamma/2}$ is the upper $(\gamma/2)$th quantile of the standard normal distribution. Using the MLEs, we can easily compute the Hessian matrix and its inverse and hence the standard errors and asymptotic confidence intervals.
We can compute the maximized unrestricted and restricted log-likelihood functions to construct the likelihood ratio (LR) test statistic for testing some of the beta-Lindley submodels. For example, we can use the LR test statistic to check whether the beta-Lindley distribution for a given data set is statistically superior to the Lindley distribution. In general, hypothesis tests of the type $H_{0}\colon \varphi = \varphi_{0}$ versus $H_{1}\colon \varphi \neq \varphi_{0}$ can be performed using the LR test statistic $w = 2(\ell(\hat{\Theta}) - \ell(\hat{\Theta}_{0}))$, where $\hat{\Theta}$ and $\hat{\Theta}_{0}$ are the MLEs under $H_{1}$ and $H_{0}$, respectively. The statistic $w$ is asymptotically (as $n \to \infty$) distributed as $\chi^{2}_{k}$, where $k$ is the length of the parameter vector of interest. The LR test rejects $H_{0}$ if $w > \chi^{2}_{k;\gamma}$, where $\chi^{2}_{k;\gamma}$ denotes the upper $100\gamma\%$ quantile of the $\chi^{2}_{k}$ distribution.
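For the Lindley submodel ($a = b = 1$) the restricted MLE of $\theta$ is available in closed form as the positive root of $\bar{x}\theta^{2} + (\bar{x}-1)\theta - 2 = 0$, which makes the LR comparison easy to sketch (function names are ours):

```python
import numpy as np
from scipy import stats

def lindley_loglik(theta, x):
    return np.sum(2 * np.log(theta) - np.log(theta + 1) + np.log1p(x) - theta * x)

def lindley_mle(x):
    """Closed-form Lindley MLE: positive root of xbar*t^2 + (xbar-1)*t - 2 = 0."""
    m = np.mean(x)
    return (-(m - 1) + np.sqrt((m - 1)**2 + 8 * m)) / (2 * m)

def lr_test(ll_full, ll_null, df=2):
    """w = 2(ll_full - ll_null), asymptotically chi-square with df d.o.f. under H0."""
    w = 2.0 * (ll_full - ll_null)
    return w, stats.chi2.sf(w, df)
```

Here `df=2` reflects the two restrictions $a = 1$ and $b = 1$.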
5.2. Least Squares Estimators
In this section, we provide the regression-based estimators of the unknown parameters of the beta-Lindley distribution, a method originally suggested by Swain et al. to estimate the parameters of beta distributions; it can also be used in other cases. Suppose $X_{1}, \ldots, X_{n}$ is a random sample of size $n$ from a distribution function $G(\cdot)$ and $X_{(j)}$, $j = 1, \ldots, n$, denotes the ordered sample. The proposed method uses the distribution of $G(X_{(j)})$. For a sample of size $n$, we have
$E\big[G(X_{(j)})\big] = \dfrac{j}{n+1}, \qquad V\big[G(X_{(j)})\big] = \dfrac{j(n-j+1)}{(n+1)^{2}(n+2)};$
see Johnson et al. Using these expectations and variances, the least squares method can be applied.
The estimators are obtained by minimizing
$\sum_{j=1}^{n}\Big(G(X_{(j)}) - \dfrac{j}{n+1}\Big)^{2}$
with respect to the unknown parameters. Therefore, in the case of the BL distribution, the least squares estimators of $a$, $b$, and $\theta$, say $\hat{a}_{LSE}$, $\hat{b}_{LSE}$, and $\hat{\theta}_{LSE}$, can be obtained by minimizing
$\sum_{j=1}^{n}\Big(G(x_{(j)}; a, b, \theta) - \dfrac{j}{n+1}\Big)^{2}$
with respect to $a$, $b$, and $\theta$.
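The least squares criterion above can be minimized numerically; a minimal sketch on synthetic data (names and starting values are ours):

```python
import numpy as np
from scipy import optimize, stats

def bl_cdf(x, a, b, theta):
    F = 1 - (theta + 1 + theta * x) / (theta + 1) * np.exp(-theta * x)
    return stats.beta.cdf(F, a, b)

def lse_objective(params, x_sorted):
    """Sum of squared deviations of G(x_(j)) from its expectation j/(n+1)."""
    params = np.asarray(params)
    if np.any(params <= 0):          # keep the search inside the parameter space
        return np.inf
    a, b, theta = params
    n = len(x_sorted)
    p = np.arange(1, n + 1) / (n + 1)
    return np.sum((bl_cdf(x_sorted, a, b, theta) - p) ** 2)

rng = np.random.default_rng(2)
x_sorted = np.sort(rng.exponential(1.0, size=80))
res = optimize.minimize(lse_objective, x0=np.array([1.0, 1.0, 1.0]),
                        args=(x_sorted,), method="Nelder-Mead")
```

Nelder-Mead is used here because the objective is cheap to evaluate and no gradient is required.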
5.3. Bayes Estimation
In this section, we develop the Bayes procedure for estimating the unknown model parameters based on an observed sample from the beta-Lindley distribution. In addition to the likelihood function, the Bayesian approach needs a prior distribution for the parameters, which quantifies the uncertainty about them before the data are observed. In many situations, existing knowledge may be difficult to summarise in the form of an informative prior; in such cases, it is better to consider noninformative priors for the Bayesian analysis (for more details on the use of noninformative priors, see the references). We take independent noninformative priors $g_{1}(a)$, $g_{2}(b)$, and $g_{3}(\theta)$ for $a$, $b$, and $\theta$; the particular choices of hyperparameters are unimportant. The joint posterior distribution of $a$, $b$, and $\theta$ is then
$\pi(a, b, \theta \mid x) = k^{-1}\, L(a, b, \theta)\, g_{1}(a)\, g_{2}(b)\, g_{3}(\theta),$
where $k$ is the normalizing constant. Under squared error loss, the Bayes estimates of $a$, $b$, and $\theta$ are the means of their marginal posteriors. These estimates are not available in closed form, so numerical approximation techniques are needed. We therefore propose the use of Markov chain Monte Carlo (MCMC) techniques, namely, the Gibbs sampler and the Metropolis-Hastings (MH) algorithm; see [17–19]. Since the conditional posteriors of the parameters cannot be obtained in any standard form, we use a hybrid MCMC strategy for drawing samples from the joint posterior: each full conditional posterior of $a$, $b$, and $\theta$ is sampled with an MH step within the Gibbs cycle. The simulation algorithm we followed is given by the following.
Step 1. Set starting points, say $a^{(0)}$, $b^{(0)}$, and $\theta^{(0)}$; then, at the $j$th stage:
Step 2. Using the MH algorithm, generate $a^{(j)} \sim \pi(a \mid b^{(j-1)}, \theta^{(j-1)}, x)$.
Step 3. Using the MH algorithm, generate $b^{(j)} \sim \pi(b \mid a^{(j)}, \theta^{(j-1)}, x)$.
Step 4. Using the MH algorithm, generate $\theta^{(j)} \sim \pi(\theta \mid a^{(j)}, b^{(j)}, x)$.
Step 5. Repeat Steps 2–4, $N$ times, to get samples of size $N$ from the corresponding posteriors of interest.
Step 6. Obtain the Bayes estimates of $a$, $b$, and $\theta$ using the formulae
$\hat{a}_{B} = \dfrac{1}{N-N_{0}} \sum_{j=N_{0}+1}^{N} a^{(j)}, \qquad \hat{b}_{B} = \dfrac{1}{N-N_{0}} \sum_{j=N_{0}+1}^{N} b^{(j)}, \qquad \hat{\theta}_{B} = \dfrac{1}{N-N_{0}} \sum_{j=N_{0}+1}^{N} \theta^{(j)},$
respectively, where $N_{0}$ is the burn-in period of the generated Markov chains.
Step 7. Obtain the HPD credible intervals for $a$, $b$, and $\theta$ by applying the sample-based methodology of the cited work: sort the retained draws of each parameter and, among all intervals of the form $\big(a^{(j^{*})},\, a^{(j^{*}+\lfloor (1-\gamma)(N-N_{0})\rfloor)}\big)$ (and similarly for $b$ and $\theta$), choose $j^{*}$ to minimize the interval width. Here, $\lfloor \cdot \rfloor$ denotes the largest integer less than or equal to its argument.
Note that several proposal densities have been suggested for the target posterior in the implementation of the MH algorithm. By reparameterizing the posterior on the entire real line, [16, 21] suggested using a normal approximation of the posterior as the proposal candidate in the MH algorithm. Alternatively, it is also reasonable to use a truncated normal distribution without reparameterizing the original parameters. We therefore propose the truncated normal distribution as the proposal kernel for the target posterior.
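A compact sketch of this hybrid scheme follows. Since the exact prior forms $g_{i}$ are not reproduced here, flat (improper) priors on $(0,\infty)$ are assumed purely for illustration, and all names are ours. Note the proposal-density correction in the acceptance ratio: a truncated normal random walk is not symmetric.

```python
import numpy as np
from scipy import stats, special

def log_post(params, x):
    """Log-posterior up to a constant; flat priors on (0, inf) assumed for illustration."""
    a, b, theta = params
    if np.any(params <= 0):
        return -np.inf
    F = 1 - (theta + 1 + theta * x) / (theta + 1) * np.exp(-theta * x)
    logf = 2 * np.log(theta) - np.log(theta + 1) + np.log1p(x) - theta * x
    return (np.sum(logf + (a - 1) * np.log(F) + (b - 1) * np.log1p(-F))
            - len(x) * special.betaln(a, b))

def tn_logpdf(y, mu, s):
    """Log-density of a N(mu, s^2) proposal truncated to (0, inf)."""
    return stats.norm.logpdf(y, mu, s) - stats.norm.logsf(0.0, mu, s)

def mh_sampler(x, n_iter=2000, step=0.2, seed=0):
    rng = np.random.default_rng(seed)
    cur = np.array([1.0, 1.0, 1.0])          # (a, b, theta) starting point
    lp = log_post(cur, x)
    chain = np.empty((n_iter, 3))
    for t in range(n_iter):
        for j in range(3):                   # one-at-a-time (Gibbs-style) MH updates
            prop = cur.copy()
            lo = (0.0 - cur[j]) / step       # standardized lower truncation bound
            prop[j] = stats.truncnorm.rvs(lo, np.inf, loc=cur[j], scale=step,
                                          random_state=rng)
            lp_new = log_post(prop, x)
            # MH ratio with correction for the asymmetric truncated-normal proposal
            log_r = (lp_new - lp
                     + tn_logpdf(cur[j], prop[j], step)
                     - tn_logpdf(prop[j], cur[j], step))
            if np.log(rng.uniform()) < log_r:
                cur, lp = prop, lp_new
        chain[t] = cur
    return chain
```

Posterior means over the post-burn-in part of `chain` give the Bayes estimates of Step 6.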
6.1. Real Data Applications
In this section, we use two real data sets to show that the beta-Lindley distribution can be a better model than one based on the Lindley distribution. The description of the data is as follows.
Data Set 1. The data set 1 represents an uncensored data set corresponding to remission times (in months) of a random sample of 128 bladder cancer patients reported by Lee and Wang .
Data Set 2. The data set 2 represents the survival times (in days) of 72 guinea pigs infected with virulent tubercle bacilli, observed and reported by Bjerkedal . The survival times of 72 guinea pigs are as follows.
The variance-covariance matrix of the MLEs under the beta-Lindley distribution for data set 1 is computed numerically; the variances of the MLEs of $a$, $b$, and $\theta$ are its diagonal elements, from which the approximate confidence intervals for $a$, $b$, and $\theta$ follow.
In order to compare the two distribution models, we consider criteria such as $-2\log L$, AIC, and AICC for each data set. The better distribution corresponds to smaller $-2\log L$, AIC, and AICC values.
The LR test statistic for testing $H_{0}\colon a = b = 1$ (Lindley) versus $H_{1}\colon H_{0}$ is false (beta-Lindley) for data set 1 exceeds the critical value, so we reject the null hypothesis.
The variance-covariance matrix of the MLEs under the beta-Lindley distribution for data set 2 is computed in the same way, yielding the variances of the MLEs of $a$, $b$, and $\theta$ and the corresponding confidence intervals.
The LR test statistic for testing $H_{0}\colon a = b = 1$ versus $H_{1}\colon H_{0}$ is false for data set 2 likewise leads us to reject the null hypothesis. Tables 1 and 2 show the parameter MLEs for each of the two fitted distributions for data sets 1 and 2, together with the values of $-2\log L$, AIC, and AICC. The values in Tables 1 and 2 indicate that the beta-Lindley distribution is a strong competitor to the other distribution used here for fitting data sets 1 and 2. A density plot compares the fitted densities of the models with the empirical histogram of the observed data (Figures 2(a) and 2(b)). The fitted density for the beta-Lindley model is closer to the empirical histogram than the fit of the Lindley model.
(a) Data set 1
(b) Data set 2
The Bayes estimates and the corresponding HPD credible intervals for the parameters , , and are summarised in Table 3.
6.2. Simulated Data
In this subsection, we provide an algorithm to generate a random sample from the beta-Lindley distribution for given values of its parameters $a$, $b$, and $\theta$ and sample size $n$. The simulation process consists of the following steps.
Step 1. Set the sample size $n$ and the parameter values $a$, $b$, and $\theta$.
Step 2. Set an initial value $x_{0}$ for the random starting point.
Step 3. Set $i = 1$.
Step 4. Generate $U \sim \mathrm{Uniform}(0, 1)$.
Step 5. Update $x_{0}$ by using Newton's formula,
$x^{*} = x_{0} - \dfrac{G(x_{0}; a, b, \theta) - U}{g(x_{0}; a, b, \theta)}.$
Step 6. If $|x^{*} - x_{0}| \leq \epsilon$ (a very small tolerance limit), then $x^{*}$ is the desired draw $x_{i}$ from the BL$(a, b, \theta)$ distribution.
Step 7. If $|x^{*} - x_{0}| > \epsilon$, then set $x_{0} = x^{*}$ and go to Step 5.
Step 8. Repeat Steps 4–7 for $i = 1, 2, \ldots, n$ to obtain the sample $x_{1}, x_{2}, \ldots, x_{n}$.
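The steps above invert the BL CDF by Newton iteration. An equivalent sketch uses the structure of the model directly: if $U \sim \mathrm{Uniform}(0,1)$ and $V$ is the beta$(a,b)$ quantile of $U$, then solving $F_{\text{Lindley}}(x;\theta) = V$ gives a BL draw. Here a bracketing root-finder (`brentq`) stands in for the Newton step; names are ours.

```python
import numpy as np
from scipy import stats, optimize

def lindley_cdf(x, theta):
    return 1 - (theta + 1 + theta * x) / (theta + 1) * np.exp(-theta * x)

def bl_sample(n, a, b, theta, seed=0):
    """Inverse-CDF sampling from BL(a, b, theta)."""
    rng = np.random.default_rng(seed)
    v = stats.beta.ppf(rng.uniform(size=n), a, b)   # beta(a, b) quantiles
    out = np.empty(n)
    hi = 1.0
    for i, vi in enumerate(v):
        while lindley_cdf(hi, theta) < vi:          # grow the bracket as needed
            hi *= 2.0
        out[i] = optimize.brentq(lambda x: lindley_cdf(x, theta) - vi, 0.0, hi)
    return out
```

By construction, mapping each draw back through $I_{F(x)}(a,b)$ recovers the original uniforms, which is an easy correctness check.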
Using the previous algorithm, we generated a sample of size 30 from the beta-Lindley distribution for arbitrary values of $a$, $b$, and $\theta$. The simulated sample (Data 3) is given by
The maximum likelihood estimates and Bayes estimates, with the corresponding confidence/credible intervals, are calculated from the simulated sample. The MLEs of $a$, $b$, and $\theta$ and their asymptotic confidence intervals are obtained as described in Section 5.1. For the Bayes estimates and the corresponding credible intervals based on the simulated data, see Table 3.
Here, we propose a new model, the beta-Lindley distribution, which extends the Lindley distribution in the analysis of data with positive real support. An obvious reason for generalizing a standard distribution is that the generalized form provides greater flexibility in modelling real data. We derive expansions for the moments and for the moment generating function. Estimation of the parameters is approached by maximum likelihood and Bayesian methods, and the information matrix is derived. We use the likelihood ratio statistic to compare the model with its baseline model. Two applications of the beta-Lindley distribution to real data show that the new distribution can be used quite effectively to provide better fits than the Lindley distribution.
Proof of Theorem 5. One has
The elements of Hessian matrix. One has
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
S. Nadarajah and S. Kotz, “The beta exponential distribution,” Reliability Engineering & System Safety, vol. 91, pp. 689–697, 2005.
R. R. Pescim, C. G. Demétrio, G. M. Cordeiro, E. M. Ortega, and M. R. Urbano, “The beta generalized half-normal distribution,” Computational Statistics and Data Analysis, vol. 54, no. 4, pp. 945–957, 2010.
G. M. Cordeiro, C. T. Cristino, E. M. Hashimoto, and E. M. M. Ortega, “The beta generalized Rayleigh distribution with applications to lifetime data,” Statistical Papers, vol. 54, no. 1, pp. 133–161, 2013.
J. Swain, S. Venkatraman, and J. Wilson, “Least squares estimation of distribution function in Johnson's translation system,” Journal of Statistical Computation and Simulation, vol. 29, pp. 271–297, 1988.
N. L. Johnson, S. Kotz, and N. Balakrishnan, Continuous Univariate Distributions, vol. 2, John Wiley & Sons, 2nd edition, 1995.
S. K. Upadhyay, N. Vasishta, and A. F. M. Smith, “Bayes inference in life testing and reliability via Markov chain Monte Carlo simulation,” Sankhya: The Indian Journal of Statistics A, vol. 63, no. 1, pp. 15–40, 2001.
T. Bjerkedal, “Acquisition of resistance in Guinea pigs infected with different doses of virulent tubercle bacilli,” The American Journal of Epidemiology, vol. 72, no. 1, pp. 130–148, 1960.