Abstract

In this article, we study the geometric distribution under randomly censored data. Maximum likelihood estimators and confidence intervals based on the Fisher information matrix are derived for the unknown parameters. Bayes estimators are also developed using beta priors under generalized entropy and LINEX loss functions, and Bayesian credible and highest posterior density (HPD) credible intervals are obtained for the parameters. Expected time on test and reliability characteristics are also analyzed. A Monte Carlo simulation study is carried out to compare the various estimates developed in the article. Finally, a randomly censored real data set is discussed for illustration.

1. Introduction

Lifetime experiments are conducted to collect data on items under study. The data are used for fitting a suitable lifetime model and then inferring about the statistical properties and survival/reliability characteristics of the items. These experiments may be expensive in terms of both cost and time. To save the cost and time, lifetime experiments may be censored intentionally or censoring may occur in an experiment naturally. Many types of censoring schemes have been studied in literature such as type-I, type-II, progressive, hybrid, and random censoring schemes.

Random censoring is a situation in which an item under study is lost or removed randomly from the experiment before its failure. In other words, some subjects in the study have not experienced the event of interest at the end of the study. For example, in a clinical trial or a medical study, some patients may still be untreated and leave the course of treatment before its completion. In a social study, some subjects are lost to follow-up in the middle of the survey. In reliability engineering, an electrical or electronic device on test, such as a bulb, may break before its failure. In such cases, the exact survival time (or time to the event of interest) of the subjects is unknown; therefore, they are called randomly censored observations.

Random censoring was introduced in the literature by Gilbert [1]. Thereafter, Breslow and Crowley [2], Koziol and Green [3], and Csörgő and Horváth [4] also discussed randomly censored data in their work. Kim [5] studied chi-square goodness of fit tests for randomly censored data. Recent studies on randomly censored data from the exponential distribution include Friesl and Hurt [6] and Saleem and Raza [7]. The Rayleigh model with randomly censored data was analyzed by Ghitany [8] and Saleem and Aslam [9]; the Burr Type XII model was analyzed by Ghitany and Al-Awadhi [10]; generalized exponential and Weibull models were analyzed, respectively, by Danish and Aslam [11, 12]. Krishna et al. [13] studied the Maxwell distribution with randomly censored samples.

In all the above cases, survival time or failure time has been assumed to be a continuous variable. However, sometimes it is impossible or inconvenient to measure the life length of a device on a continuous scale.

In real life experiments we come across situations where failure time data are discrete, either through the grouping of continuous data due to imprecise measurement or because time itself is discrete, for example, days, weeks, or months. In such circumstances, one measures the life of a device on a discrete scale. A discrete lifetime model may also consider the number of successful cycles, trials, or operations before failure of a device. Among discrete lifetime models, the one-parameter geometric distribution has an important position. The geometric distribution can be used as a discrete failure model to investigate the ability of electronic tubes to withstand successive voltage overloads and the performance of electric switches, which are repeatedly turned on and off.

In reliability theory, the geometric distribution has been considered as a lifetime model by Yaqub and Khan [14], Bhattacharya and Kumar [15], Maiti [16], Krishna and Jain [17], Sarhan and Kundu [18], and so forth. In most of the above studies, a complete sample or right censoring is considered. No literature is available on random censoring in any discrete distribution.

In view of the above, this paper considers classical and Bayes estimation of the unknown parameters with some reliability characteristics for geometric distribution under randomly censored data. In Section 2, mathematical formulation for randomly censored data with failure and censoring times following geometric distributions is given. Section 3 deals with the maximum likelihood estimation (MLE) for the unknown parameters along with their variances and confidence intervals. Section 4 describes the expected time on test and observed time on test. In Section 5 we obtain the Bayes estimators for the unknown parameters under generalized entropy and LINEX loss functions using beta priors. In Section 6, we consider a Monte Carlo simulation study to explore the properties of various estimates developed in the above sections. Finally, Section 7 deals with a real data example to study the applications of random censoring in geometric distribution. It is essential to mention here that we have used statistical software R [19] for computation purposes throughout the paper.

2. The Model and Its Assumptions

In a life testing experiment or a clinical trial, $n$ items or patients are subjected to test. Let $X_1, X_2, \ldots, X_n$ be their discrete failure or survival times. Assume that the $X_i$'s are independently and identically distributed (i.i.d.) random variables (r.v.) with probability mass function (p.m.f.) $f_X(x)$ and cumulative distribution function (c.d.f.) $F_X(x)$. Further, suppose that these items may be censored before their failure at times $T_1, T_2, \ldots, T_n$. Again, assume the $T_i$'s to be i.i.d. discrete random variables with p.m.f. $g_T(t)$ and c.d.f. $G_T(t)$. It is also assumed that the $X_i$'s and $T_i$'s are independent. In a randomly censored experiment, the minimum of the $X_i$'s and $T_i$'s, that is, $Y_i = \min(X_i, T_i)$, will actually be observed for $i = 1, 2, \ldots, n$. Let $D_i$ be an indicator variable defined by $D_i = 1\,(0)$ if $X_i \le T_i$ $(X_i > T_i)$. Here, $D_i$ is a Bernoulli random variable with probability $p = P[X_i \le T_i]$.

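In the present article, we consider the survival and censoring time variables to follow geometric distributions Geo(θ) and Geo(λ), respectively, with p.m.f.s
$$f_X(x) = (1-\theta)\,\theta^{x}, \quad x = 0, 1, 2, \ldots;\ 0 < \theta < 1,$$
$$g_T(t) = (1-\lambda)\,\lambda^{t}, \quad t = 0, 1, 2, \ldots;\ 0 < \lambda < 1.$$
The probability of failure of an item on test before its censoring is given by
$$p = P[X \le T] = \sum_{x=0}^{\infty} (1-\theta)\,\theta^{x}\lambda^{x} = \frac{1-\theta}{1-\theta\lambda}.$$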

The above probability of failure is tabulated in Table 1 for various values of θ and λ. From Table 1 we observe that on increasing the value of lifetime parameter θ, the probability of failure decreases. However, as we increase the value of censoring parameter λ, the probability of failure increases.

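Now, for $d = 0, 1$, the Bernoulli p.m.f. of $D$ is
$$P[D = d] = p^{d}(1-p)^{1-d} = \left(\frac{1-\theta}{1-\theta\lambda}\right)^{d}\left(\frac{\theta(1-\lambda)}{1-\theta\lambda}\right)^{1-d}.$$
Also, the c.d.f. of $Y = \min(X, T)$ is
$$F_Y(y) = 1 - (\theta\lambda)^{y+1}, \quad y = 0, 1, 2, \ldots,$$
which is a geometric distribution with parameter (θλ). In the geometric case, the independence of $X$ and $T$ implies the independence of $Y$ and $D$. Therefore, the joint p.m.f. of $(Y, D)$ is given by
$$f_{Y,D}(y, d) = (\theta\lambda)^{y}\,(1-\theta)^{d}\,[\theta(1-\lambda)]^{1-d},$$
with $y = 0, 1, 2, \ldots$ and $d = 0, 1$.

Also, note that, for Geo(θ), the reliability characteristics with mission time $t$ are given by
(i) survival function $S(t) = P[X \ge t] = \theta^{t}$;
(ii) mean life time $\text{MTSF} = \theta/(1-\theta)$;
(iii) failure rate $h(t) = P[X = t]/P[X \ge t] = 1-\theta$.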
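The model above can be checked numerically. The following is a minimal sketch, assuming the geometric p.m.f. on {0, 1, 2, ...} with $P[X = x] = (1-\theta)\theta^{x}$ and a failure indicator $d = 1$ when $x \le t$; the function names `draw_geometric`, `censored_sample`, and `prob_failure` are hypothetical and not taken from the paper.

```python
import random

def draw_geometric(theta, rng):
    """Draw X with P[X = x] = (1 - theta) * theta**x, x = 0, 1, 2, ..."""
    x = 0
    while rng.random() < theta:  # continue counting with probability theta
        x += 1
    return x

def censored_sample(n, theta, lam, seed=0):
    """Return (ys, ds) with y_i = min(x_i, t_i) and d_i = 1 if x_i <= t_i else 0."""
    rng = random.Random(seed)
    ys, ds = [], []
    for _ in range(n):
        x = draw_geometric(theta, rng)  # failure time ~ Geo(theta)
        t = draw_geometric(lam, rng)    # censoring time ~ Geo(lam)
        ys.append(min(x, t))
        ds.append(1 if x <= t else 0)
    return ys, ds

def prob_failure(theta, lam):
    """Probability that an item fails before (or at) its censoring time."""
    return (1 - theta) / (1 - theta * lam)
```

For a large sample, the observed proportion of failures should be close to `prob_failure(theta, lam)`, and the sample mean of the observed times should be close to $\theta\lambda/(1-\theta\lambda)$, the mean of Geo(θλ).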

3. Maximum Likelihood Estimation

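The likelihood function of the randomly censored sample data $(y_i, d_i)$, $i = 1, 2, \ldots, n$, from geometric distributions as discussed in Section 2, is given by
$$L(\theta, \lambda) = \prod_{i=1}^{n} (\theta\lambda)^{y_i}(1-\theta)^{d_i}[\theta(1-\lambda)]^{1-d_i} = \theta^{\sum y_i + n - \sum d_i}\,(1-\theta)^{\sum d_i}\,\lambda^{\sum y_i}\,(1-\lambda)^{n - \sum d_i}.$$

Taking the log, differentiating with respect to θ and λ, and equating to zero, we get the MLEs of θ and λ as
$$\hat\theta = \frac{\sum_{i=1}^{n} y_i + n - \sum_{i=1}^{n} d_i}{\sum_{i=1}^{n} y_i + n} \quad\text{and}\quad \hat\lambda = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} y_i + n - \sum_{i=1}^{n} d_i}.$$

By the invariance property of MLEs, the MLEs of the reliability characteristics of Section 2 are obtained by replacing θ with $\hat\theta$, that is, $\hat S(t) = \hat\theta^{\,t}$, $\widehat{\text{MTSF}} = \hat\theta/(1-\hat\theta)$, and $\hat h(t) = 1-\hat\theta$.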
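The closed-form MLEs can be sketched in a few lines. This is an illustrative sketch, assuming the likelihood implied by Section 2 with geometric p.m.f. $(1-\theta)\theta^{x}$ on {0, 1, 2, ...}; the function name `geometric_censored_mle` is hypothetical.

```python
def geometric_censored_mle(ys, ds):
    """MLEs of (theta, lam) from observed times ys and failure indicators ds.

    Assumes y_i = min(x_i, t_i) and d_i = 1 when the failure is observed.
    """
    n = len(ys)
    total_time = sum(ys)   # sum of observed times
    failures = sum(ds)     # number of uncensored observations
    denom = total_time + n - failures
    if denom == 0:
        # Every item failed at time 0; lam is not estimable from such data.
        raise ValueError("lam is not estimable: all observations are failures at 0")
    theta_hat = denom / (total_time + n)
    lam_hat = total_time / denom
    return theta_hat, lam_hat
```

For example, with `ys = [2, 0, 3, 1]` and `ds = [1, 1, 0, 1]` the sums are 6 and 3 with $n = 4$, so the sketch returns $\hat\theta = 7/10$ and $\hat\lambda = 6/7$.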

The MLEs can be viewed as the Bayes estimators under the 0-1 loss function and the uniform prior. Also, the estimates derived by the method of moments coincide with the above MLEs in this case.

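The Fisher information matrix is given by
$$I(\theta, \lambda) = \begin{pmatrix} \dfrac{n}{\theta(1-\theta)(1-\theta\lambda)} & 0 \\ 0 & \dfrac{n\theta}{\lambda(1-\lambda)(1-\theta\lambda)} \end{pmatrix},$$
so that the variances of the estimates are
$$V(\hat\theta) = \frac{\theta(1-\theta)(1-\theta\lambda)}{n}, \qquad V(\hat\lambda) = \frac{\lambda(1-\lambda)(1-\theta\lambda)}{n\theta}.$$

The estimates of the above variances can be obtained by replacing (θ, λ) by $(\hat\theta, \hat\lambda)$. The geometric distribution belongs to the exponential family of distributions; therefore, most of the properties of MLEs are valid in this case. The asymptotic sampling distribution of $(\hat\theta - \theta)/\sqrt{\hat V(\hat\theta)}$ is standard normal, $N(0, 1)$. Thus, a two-sided $100(1-\alpha)\%$ confidence interval for θ becomes $\hat\theta \pm z_{\alpha/2}\sqrt{\hat V(\hat\theta)}$. Similarly, for λ the confidence interval is given by $\hat\lambda \pm z_{\alpha/2}\sqrt{\hat V(\hat\lambda)}$. Here, $z_{\alpha/2}$ is the upper $100(\alpha/2)$th percentile of the standard normal distribution for $0 < \alpha < 1$.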
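The asymptotic intervals can be sketched as follows. This is an illustrative sketch, assuming the plug-in variances $\hat\theta(1-\hat\theta)(1-\hat\theta\hat\lambda)/n$ and $\hat\lambda(1-\hat\lambda)(1-\hat\theta\hat\lambda)/(n\hat\theta)$ that follow from the likelihood of Section 3 under the p.m.f. $(1-\theta)\theta^{x}$; the function name `wald_intervals` is hypothetical, and the intervals are not truncated to (0, 1).

```python
import math
from statistics import NormalDist

def wald_intervals(theta_hat, lam_hat, n, alpha=0.05):
    """Plug-in variances and two-sided normal-approximation intervals."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 normal quantile
    common = 1 - theta_hat * lam_hat
    var_theta = theta_hat * (1 - theta_hat) * common / n
    var_lam = lam_hat * (1 - lam_hat) * common / (n * theta_hat)
    ci_theta = (theta_hat - z * math.sqrt(var_theta),
                theta_hat + z * math.sqrt(var_theta))
    ci_lam = (lam_hat - z * math.sqrt(var_lam),
              lam_hat + z * math.sqrt(var_lam))
    return {"var_theta": var_theta, "var_lam": var_lam,
            "ci_theta": ci_theta, "ci_lam": ci_lam}
```

With small samples the limits may fall outside (0, 1); how to handle that case is a design choice left open here.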

4. Expected Time on Test

In life testing experiments, expected time on test (ETT) is beneficial to have an idea about the expected duration of the experiment. Since the time required for completing an experiment has a direct impact on the cost, this information is important for an experimenter to choose an appropriate sampling plan. Krishna et al. [13] developed ETT for Maxwell distribution under random censoring for the first time. In this section, we develop the mathematical formulation of ETT for randomly censored geometric distribution.

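Let $Y_{(r)}$ be the $r$th order statistic in a randomly censored sample of size $n$, denoting the time to observe the $r$th failure or censoring time. Since $Y$ follows Geo(θλ) with c.d.f. $F_Y(y) = 1 - (\theta\lambda)^{y+1}$, the c.d.f. of $Y_{(r)}$ is given by
$$F_{(r)}(y) = \sum_{j=r}^{n} \binom{n}{j} [F_Y(y)]^{j}[1 - F_Y(y)]^{n-j}, \quad y = 0, 1, 2, \ldots.$$
For randomly censored data, the experiment ends at $Y_{(n)}$, and the ETT is given by
$$\text{ETT (RCS)} = E[Y_{(n)}] = \sum_{y=0}^{\infty} \left[1 - \{1 - (\theta\lambda)^{y+1}\}^{n}\right].$$
Similarly, in the complete case, $X_{(r)}$ denotes the time to observe the $r$th failure in a sample of size $n$. Then, the ETT for a complete sample is given by
$$\text{ETT (CS)} = E[X_{(n)}] = \sum_{x=0}^{\infty} \left[1 - \{1 - \theta^{x+1}\}^{n}\right].$$
By the invariance property of MLEs, the MLE of ETT for randomly censored data is obtained by replacing (θ, λ) with their MLEs $(\hat\theta, \hat\lambda)$ in ETT (RCS).

Also, we can obtain the observed time on test OBTT = $y_{(n)}$, the largest observed time, which is the actual observed experimental time. Sometimes, OBTT is also proposed to estimate ETT.

One can compute the ratio of expected experiment time (REET) for comparison purposes, as follows:
$$\text{REET} = \frac{\text{ETT (RCS)}}{\text{ETT (CS)}}.$$

For various values of θ, λ, and $n$, ETT (RCS) and ETT (CS) are computed and simulated in Table 2. From the table, it is observed that, for both the randomly censored sample and the complete sample, ETT increases with an increase in θ, λ, and $n$. The OBTT estimates ETT quite satisfactorily.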
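The expected time on test can be evaluated numerically. The sketch below assumes that ETT is the expected largest observation, $E[Y_{(n)}] = \sum_{y \ge 0} [1 - \{1 - q^{y+1}\}^{n}]$, with $q = \theta\lambda$ for a randomly censored sample and $q = \theta$ for a complete sample; this reading of ETT and the function names `expected_max_geometric` and `reet` are assumptions, not taken from the paper.

```python
import math

def expected_max_geometric(n, q, tol=1e-12):
    """E[max of n i.i.d. Geo(q) variables on {0, 1, 2, ...}], by tail summation."""
    total, y = 0.0, 0
    while True:
        tail = q ** (y + 1)                         # P[Y > y]
        term = -math.expm1(n * math.log1p(-tail))   # 1 - (1 - tail)**n = P[max > y]
        if term < tol:
            break
        total += term
        y += 1
    return total

def reet(n, theta, lam):
    """Ratio of expected experiment times: censored sample over complete sample."""
    return expected_max_geometric(n, theta * lam) / expected_max_geometric(n, theta)
```

For $n = 1$ the sum reduces to the geometric mean $q/(1-q)$, which gives a quick sanity check. Because $\theta\lambda < \theta$, the ratio is below 1 under this reading.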

5. Bayesian Estimation

In Bayes estimation, the prior knowledge is updated by conducting an experiment and estimators are constructed to make inferences about the characteristics of interest. This technique provides valid alternatives to traditional estimation methods. Bayesian analysis is an important technique carried out with various types of loss functions and a variety of prior distributions. In this paper, we use the natural conjugate beta priors for the unknown parameters under generalized entropy and linear exponential loss functions.

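Let the unknown parameters θ and λ follow beta distributions of the first kind with parameters $(a_1, b_1)$ and $(a_2, b_2)$, respectively. Here, θ and λ are regarded as random variables having the marginal prior distributions
$$\pi_1(\theta) = \frac{\theta^{a_1-1}(1-\theta)^{b_1-1}}{B(a_1, b_1)}, \quad 0 < \theta < 1, \qquad \pi_2(\lambda) = \frac{\lambda^{a_2-1}(1-\lambda)^{b_2-1}}{B(a_2, b_2)}, \quad 0 < \lambda < 1.$$

By assuming that the prior distributions of θ and λ are independent, we have a joint prior that is combined with the likelihood to yield the following joint posterior distribution:
$$\pi(\theta, \lambda \mid y, d) = \frac{\theta^{A_1-1}(1-\theta)^{B_1-1}}{B(A_1, B_1)} \cdot \frac{\lambda^{A_2-1}(1-\lambda)^{B_2-1}}{B(A_2, B_2)},$$
where $0 < \theta < 1$; $0 < \lambda < 1$, and $A_1 = \sum y_i + n - \sum d_i + a_1$, $B_1 = \sum d_i + b_1$, $A_2 = \sum y_i + a_2$, $B_2 = n - \sum d_i + b_2$.

The posterior distributions of θ and λ are again independent beta distributions, $\text{Beta}(\sum y_i + n - \sum d_i + a_1,\ \sum d_i + b_1)$ and $\text{Beta}(\sum y_i + a_2,\ n - \sum d_i + b_2)$, respectively.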

5.1. Bayes Estimates under Generalized Entropy Loss Function

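The generalized entropy loss function (GELF) was proposed by Calabria and Pulcini [20] as
$$L(\hat\theta, \theta) = \gamma\left[\left(\frac{\hat\theta}{\theta}\right)^{c} - c\log\left(\frac{\hat\theta}{\theta}\right) - 1\right], \quad \gamma > 0,\ c \neq 0,$$
where $\hat\theta$ is the estimate of θ. The constant γ cancels out when the numerator is divided by the denominator in the procedure to obtain the Bayes estimate. Thus, without loss of generality, we assume γ = 1. For $c > 0$, a positive error has a more serious effect than a negative error, and for $c < 0$, a negative error has a more serious effect than a positive error. In Bayes estimation, we choose the value of $\hat\theta$ that minimizes the risk function
$$R(\hat\theta) = E_{\theta}[L(\hat\theta, \theta)] = \int_0^1 L(\hat\theta, \theta)\,\pi(\theta \mid y, d)\,d\theta.$$
We get the Bayes estimator of θ by differentiating $R(\hat\theta)$ with respect to $\hat\theta$ and equating to zero. The Bayes estimator and the corresponding risk function of θ are given by
$$\hat\theta_{GE} = \left[E(\theta^{-c})\right]^{-1/c}, \qquad R(\hat\theta_{GE}) = \log E(\theta^{-c}) + c\,E(\log\theta).$$
From (16), the marginal posterior distribution of θ is the beta distribution
$$\pi(\theta \mid y, d) = \frac{\theta^{A_1-1}(1-\theta)^{B_1-1}}{B(A_1, B_1)}, \quad 0 < \theta < 1,$$
with $A_1 = \sum y_i + n - \sum d_i + a_1$ and $B_1 = \sum d_i + b_1$. Thus, using (18) and (20), the Bayes estimator of θ under GELF is obtained as
$$\hat\theta_{GE} = \left[\frac{\Gamma(A_1 - c)\,\Gamma(A_1 + B_1)}{\Gamma(A_1)\,\Gamma(A_1 + B_1 - c)}\right]^{-1/c}, \quad A_1 > c.$$
Now, using (19), the posterior risk function under GELF can be derived as
$$R(\hat\theta_{GE}) = \log\left[\frac{\Gamma(A_1 - c)\,\Gamma(A_1 + B_1)}{\Gamma(A_1)\,\Gamma(A_1 + B_1 - c)}\right] + c\left[\psi(A_1) - \psi(A_1 + B_1)\right],$$
where $\psi(\cdot)$ is the digamma function defined as $\psi(x) = \frac{d}{dx}\log\Gamma(x)$. Similarly, we can obtain the Bayes estimate and the risk function for the parameter λ.

Particular Cases of GELF. The above estimates reduce as particular cases to the Bayes estimates under other loss functions, such as the following:
(a) For $c = -1$, they give the estimates under the popular squared error loss function (SELF).
(b) For $c = 1$, they coincide with the estimates under the entropy loss function (ELF).
(c) For $c = -2$, they reduce to the estimates under the precautionary loss function (PLF).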
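The posterior update and the GELF estimator can be sketched together. This is an illustrative sketch, assuming the beta posteriors that follow from beta priors and the likelihood of Section 3, and the standard GELF estimator $[E(\theta^{-c})]^{-1/c}$; the function names `posterior_params` and `gelf_estimate` are hypothetical.

```python
import math

def posterior_params(ys, ds, a1, b1, a2, b2):
    """Beta posterior parameters for theta and lam under independent beta priors."""
    n, total_time, failures = len(ys), sum(ys), sum(ds)
    theta_post = (total_time + n - failures + a1, failures + b1)
    lam_post = (total_time + a2, n - failures + b2)
    return theta_post, lam_post

def gelf_estimate(A, B, c):
    """Bayes estimate [E(theta**-c)]**(-1/c) for theta ~ Beta(A, B); needs A > c, c != 0."""
    if c == 0 or A <= c:
        raise ValueError("requires c != 0 and A > c")
    # log E(theta**-c) = log[Gamma(A-c) Gamma(A+B) / (Gamma(A) Gamma(A+B-c))]
    log_moment = (math.lgamma(A - c) + math.lgamma(A + B)
                  - math.lgamma(A) - math.lgamma(A + B - c))
    return math.exp(-log_moment / c)
```

Setting `c = -1`, `c = 1`, and `c = -2` gives the SELF, ELF, and PLF estimates, respectively. With all hyperparameters set to zero, the SELF estimate of θ equals the posterior mean $A/(A+B)$, which coincides with the MLE.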

5.2. LINEX Loss Function

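Varian [21] and various other authors have used the linear exponential (LINEX) loss function in different estimation problems. Under the assumption that the minimal loss occurs at $\hat\theta = \theta$, the LINEX loss function can be expressed as
$$L(\Delta) = b\left[e^{a\Delta} - a\Delta - 1\right], \quad \Delta = \hat\theta - \theta,\ a \neq 0,\ b > 0.$$
Without loss of generality, we assume that $b = 1$. Under the LINEX loss function the Bayes estimator and posterior risk are defined as
$$\hat\theta_{L} = -\frac{1}{a}\log E\left(e^{-a\theta}\right), \qquad R(\hat\theta_{L}) = a\left[E(\theta) - \hat\theta_{L}\right],$$
where the expectations are taken with respect to the posterior distribution of θ.

Sometimes, a situation occurs where no formal prior information is available. Then, an improper joint prior is combined with the likelihood to yield a joint posterior distribution. By setting $a_1 = b_1 = a_2 = b_2 = 0$ in the beta priors, the Bayes estimates of θ and λ are obtained in the case of a noninformative prior.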
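The LINEX estimate requires the posterior expectation $E(e^{-a\theta})$, which for a beta posterior is a confluent hypergeometric function. The following is a minimal sketch, assuming the standard form $\hat\theta_L = -(1/a)\log E(e^{-a\theta})$ with $b = 1$ and a power-series evaluation that is adequate for moderate $|a|$; the function names `beta_mgf` and `linex_estimate` are hypothetical.

```python
import math

def beta_mgf(A, B, s, tol=1e-15, max_terms=500):
    """E[exp(s * theta)] for theta ~ Beta(A, B), via the series of Kummer's 1F1(A; A+B; s)."""
    term, total = 1.0, 1.0
    for k in range(1, max_terms + 1):
        term *= (A + k - 1) / (A + B + k - 1) * s / k  # ratio of successive terms
        total += term
        if abs(term) < tol * abs(total):
            break
    return total

def linex_estimate(A, B, a):
    """LINEX Bayes estimate and posterior risk (b = 1) for theta ~ Beta(A, B), a != 0."""
    log_m = math.log(beta_mgf(A, B, -a))
    estimate = -log_m / a
    risk = a * A / (A + B) + log_m  # equals a * (posterior mean - estimate)
    return estimate, risk
```

By Jensen's inequality the estimate lies below the posterior mean for $a > 0$ and above it for $a < 0$. For large $|a|$ the alternating series loses accuracy, and a numerical library routine for the confluent hypergeometric function would be a safer choice.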

5.3. Bayesian Credible and HPD Credible Intervals

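If $\pi(\theta \mid y, d)$ is the marginal posterior distribution of the parameter θ, the $100(1-\alpha)\%$ credible interval $(\theta_L, \theta_U)$ for θ is obtained by
$$\int_{\theta_L}^{\theta_U} \pi(\theta \mid y, d)\,d\theta = 1 - \alpha,$$
and for two-sided equal-tail credible intervals, we take
$$\int_{0}^{\theta_L} \pi(\theta \mid y, d)\,d\theta = \frac{\alpha}{2} = \int_{\theta_U}^{1} \pi(\theta \mid y, d)\,d\theta.$$
Similarly, the credible interval for λ is obtained.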

Chen and Shao [22] proposed a procedure for calculating a highest posterior density (HPD) credible interval for θ when the posterior distribution of θ is unimodal. In the present case, the posterior distribution of θ is beta distribution which is a unimodal distribution; therefore we can apply the following Chen and Shao algorithm.

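Step 1. Obtain an MCMC sample $\theta_j$, $j = 1, 2, \ldots, M$, from $\pi(\theta \mid y, d)$.

Step 2. Sort $\theta_j$; $j = 1, 2, \ldots, M$, to obtain the ordered values as $\theta_{(1)} \le \theta_{(2)} \le \cdots \le \theta_{(M)}$.

Step 3. For $j = 1, 2, \ldots, M - [(1-\alpha)M]$, compute the $100(1-\alpha)\%$ credible intervals
$$\left(\theta_{(j)},\ \theta_{(j + [(1-\alpha)M])}\right).$$
Here, $[x]$ denotes the integer part of $x$.

Step 4. The $100(1-\alpha)\%$ HPD credible interval is the one with the smallest interval width among all credible intervals obtained in Step 3.

Similarly, we can obtain the HPD credible interval for the parameter λ. The above algorithm can be computed with the boa package of R.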
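Steps 1 to 4 can be sketched as follows. This is an illustrative sketch, not the boa implementation: because the posterior here is a beta distribution, the draws are taken directly with `random.betavariate` rather than by MCMC, and the equal-tail interval is approximated from the same sorted draws. The function names `hpd_interval` and `equal_tail_interval` are hypothetical.

```python
import random

def hpd_interval(draws, alpha=0.05):
    """Shortest interval among (s[j], s[j + k]) with k = int((1 - alpha) * M)."""
    s = sorted(draws)
    m = len(s)
    k = int((1 - alpha) * m)  # number of steps spanned by each candidate interval
    if k >= m:
        raise ValueError("need more draws or a larger alpha")
    best = min(range(m - k), key=lambda j: s[j + k] - s[j])
    return s[best], s[best + k]

def equal_tail_interval(draws, alpha=0.05):
    """Sample-based equal-tail interval spanning the same number of draws."""
    s = sorted(draws)
    m = len(s)
    k = int((1 - alpha) * m)
    j0 = min(int(alpha / 2 * m), m - k - 1)  # lower index at the alpha/2 quantile
    return s[j0], s[j0 + k]

def posterior_draws(A, B, size, seed=0):
    """Direct draws from Beta(A, B), used here in place of an MCMC sample."""
    rng = random.Random(seed)
    return [rng.betavariate(A, B) for _ in range(size)]
```

Since the equal-tail interval is one of the candidates scanned in Step 3, the HPD interval returned by this sketch is never wider than the sample-based equal-tail interval.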

6. Simulation Study

Since the performance of the different estimation methods cannot be compared theoretically in the present case, we perform a Monte Carlo simulation study to compare the estimates obtained from maximum likelihood and Bayes estimation under various loss functions. The simulation procedure is as follows:
(i) Choose the values of the hyperparameters $(a_1, b_1)$ and $(a_2, b_2)$ of the informative prior distributions for a fixed sample size $n$. For noninformative priors take $a_1 = b_1 = a_2 = b_2 = 0$.
(ii) Take the values of the parameters equal to the means of the prior distributions, that is, $\theta = a_1/(a_1 + b_1)$ and $\lambda = a_2/(a_2 + b_2)$, respectively.
(iii) Calculate the actual value of MTSF = $\theta/(1-\theta)$. Taking the mission time $t$ as the integer part of MTSF/2, calculate the values of the reliability characteristics $S(t)$ and $h(t)$.
(iv) Generate a randomly censored sample of size $n$ from the models given in (3) and (4).
(v) Calculate the maximum likelihood estimates, their variances, confidence intervals, and coverage probabilities for the parameters, and the MLEs of the reliability characteristics.
(vi) Obtain the Bayes estimates under GELF and LINEX loss functions for the parameters and the reliability characteristics. Derive the associated credible and HPD credible intervals for the parameters along with their average length (AL) and coverage probability (CP).
(vii) Repeat steps (iv) to (vi) over many replications, for different combinations of the parametric values. Compute the average values (AV) and mean square errors (MSE) of the estimates obtained in steps (v) and (vi).

All computations were performed using the software R. The main results of the simulation study are shown in Tables 4–10. From the tables, we conclude the following:
(i) The maximum likelihood estimation method for the parameters and reliability characteristics gives very good results in terms of both average values and MSEs.
(ii) The Bayes estimates are also very good with respect to bias and MSE under the SELF and LINEX loss functions, but the Bayes estimates under LINEX are better than those under SELF with respect to bias. Under ELF, the estimates show underestimation, and under PLF, they show overestimation.
(iii) The coverage probabilities of the parameters attain their nominal levels for the confidence intervals based on MLEs, the credible intervals, and the HPD credible intervals. Sometimes, however, HPD credible intervals give better coverage than the others.
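The maximum likelihood part of the simulation procedure above can be sketched as follows. This is a minimal sketch, assuming the geometric p.m.f. $(1-\theta)\theta^{x}$ on {0, 1, 2, ...} and the closed-form MLEs it implies; the function names, the number of replications, and the seed are illustrative choices, not the settings used in the paper.

```python
import random

def draw_geometric(theta, rng):
    """Draw X with P[X = x] = (1 - theta) * theta**x, x = 0, 1, 2, ..."""
    x = 0
    while rng.random() < theta:
        x += 1
    return x

def simulate_mle(theta, lam, n, reps, seed=0):
    """Average value (AV) and MSE of the MLEs over `reps` censored samples of size n."""
    rng = random.Random(seed)
    theta_hats, lam_hats = [], []
    for _ in range(reps):
        total_time = failures = 0
        for _ in range(n):
            x = draw_geometric(theta, rng)   # failure time
            t = draw_geometric(lam, rng)     # censoring time
            total_time += min(x, t)
            failures += 1 if x <= t else 0
        denom = total_time + n - failures
        if denom == 0:
            continue  # lam is not estimable in this (very rare) replicate
        theta_hats.append(denom / (total_time + n))
        lam_hats.append(total_time / denom)

    def summarize(estimates, truth):
        av = sum(estimates) / len(estimates)
        mse = sum((e - truth) ** 2 for e in estimates) / len(estimates)
        return av, mse

    return {"theta": summarize(theta_hats, theta), "lam": summarize(lam_hats, lam)}
```

A call such as `simulate_mle(0.8, 0.9, n=50, reps=2000, seed=7)` returns the (AV, MSE) pairs for both parameters; the MSE of $\hat\theta$ should be of the same order as the asymptotic variance $\theta(1-\theta)(1-\theta\lambda)/n$.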

7. Real Data Example

In this section, we present a real data example to illustrate the utility of our model. The data set given in Lee and Wang [23, p. 231] consists of the remission times in months of 137 cancer patients. These remission times are a subset of the data from a bladder cancer study and are used here only for illustration purposes. The data set is given below in complete months.

Remission Times (Months) of 137 Cancer Patients. 4, 32, 3, 13, 19, 4, 3+, 5, 14, 5, 19+, 7, 7, 7, 20, 5, 5, 3, 46, 4+, 2, 4, 5, 2, 9, 1, 0, 8, 3, 9, 36, 14, 26, 79, 10, 8, 4+, 2, 4, 11, 2, 4, 5, 1, 0, 11, 16, 4, 21, 10, 12, 34, 0+, 10, 6, 2, 0, 12, 0, 17, 3, 2, 1, 12, 43, 14, 2, 7, 0, 1, 7, 4, 3, 8, 1, 13, 1, 14, 6, 23, 24+, 5, 8+, 3, 10+, 18, 7, 7, 17, 25, 3, 2, 15, 17, 9, 3, 7, 1, 0, 2, 4, 9, 5, 2, 22, 6, 2, 11, 2, 7, 2, 5, 3, 4+, 8, 6, 4, 3, 0, 10, 6, 3, 5, 13, 8, 3, 5, 7, 5, 11, 2, 2, 6, 7, 4, 25, 12.

Note. Censored observations are indicated by plus sign (+).

To test the goodness of fit, we consider chi-square test and derive the following results:

Chi-square observed value = 19.34912, chi-square tabulated value = 37.65248, and $p$ value = 0.2199.

Here, the observed value is less than the tabulated value at the 5% level of significance, and the $p$ value is also quite large in this case. Thus, the data set is fitted well by our model.

Also, we obtain the empirical c.d.f. and the maximum likelihood estimate of the c.d.f. to compare the behavior of the data graphically. In Figure 1, the two curves are close to each other. Hence, we conclude that this data set is fitted very well by this model.

First of all, we obtain the estimates of the unknown parameters by maximum likelihood estimation, 95% confidence intervals, and Bayes estimation under various types of loss functions with informative priors as beta distributions having parameters $(a_1, b_1)$ and $(a_2, b_2)$ for chosen values of the hyperparameters. We also estimate the parameters for noninformative priors by taking $a_1 = b_1 = a_2 = b_2 = 0$, assuming no prior information is available. The expected time on test and the observed time on test are also calculated for this data set. The estimates of the unknown parameters are listed in Table 3.

By using all the above criteria, we see that the maximum likelihood estimates are quite close to Bayes estimates for informative as well as noninformative priors under various loss functions. Confidence limits also show a good coverage of the estimates.

Concluding Remarks. The present paper deals with the estimation of parameters and reliability characteristics with a randomly censored sample from geometric distribution. Maximum likelihood estimates along with their variances and confidence intervals for the parameters are derived. Bayes estimates under generalized entropy and LINEX loss functions are obtained. The estimates are shown to be almost unbiased and efficient through simulation study. The concept of expected time on test in random censoring is also considered. A real data example is given for explaining the methods developed in the paper.

Competing Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.