Abstract

We propose a new family of lifetime distributions, namely, the complementary exponentiated exponential geometric distribution. This new family arises in a latent competing risk scenario, where the lifetime associated with a particular risk is not observable and only the maximum lifetime value among all risks is observed. The properties of the proposed distribution are discussed, including a formal proof of its probability density function and explicit algebraic formulas for its survival and hazard functions, moments, rth moment of the ith order statistic, mean residual lifetime, and modal value. Inference is implemented via a straightforward maximum likelihood procedure. The practical importance of the new distribution is demonstrated in three applications where our distribution outperforms several existing lifetime distributions, such as the exponential, the exponential-geometric, the Weibull, the modified Weibull, and the generalized exponential-Poisson distribution.

1. Introduction

Several new classes of models grounded in the simple exponential distribution have been introduced in recent years. The main idea is to propose lifetime distributions that can accommodate practical applications where the underlying hazard function is nonconstant but monotone, since the exponential distribution does not provide a reasonable fit in such situations. For instance, in [1] a variation of the exponential distribution with decreasing hazard function was proposed, the exponential geometric (EG) distribution. In [2] the exponentiated exponential distribution was introduced as a generalization of the usual exponential distribution that can accommodate data with increasing and decreasing hazard functions, and in [3] a generalized exponential distribution with the same flexibility was proposed. In [4] the exponentiated type distributions were proposed, extending the Fréchet, gamma, Gumbel, and Weibull distributions. In [5] another modification of the exponential distribution with decreasing hazard function was proposed, and [6] generalizes the distribution of [5] by including a power parameter, which accommodates increasing, decreasing, and unimodal hazard functions. Finally, [7] proposed the Poisson-exponential distribution and [8] proposed the complementary exponential geometric distribution, which is complementary to the exponential geometric distribution proposed by [1]. The last two distributions accommodate increasing hazard functions.

In this paper, following [8], we propose a new distribution family by compounding the exponentiated exponential distribution [2] with a geometric distribution, hereafter the complementary exponentiated exponential geometric distribution or simply the CE2G distribution. The genesis of the new distribution is based on a complementary risk problem [9] in the presence of latent risks, in the sense that there is no information about which factor was responsible for the component failure and only the maximum lifetime value among all risks is observed. This family has two shape parameters and one scale parameter and accommodates increasing, decreasing, and bathtub failure rates.

The paper is organized as follows. In Section 2 we introduce the new CE2G distribution, derive the expressions for the probability density, survival, hazard, and quantile functions, and present its genesis. In Sections 3 to 7 we present some of its properties, such as its characteristic function, rth raw moment, mean and variance, order statistics, rth moment of the ith order statistic, Rényi entropy, stress-strength reliability, mean residual lifetime, and modal value. In Section 8 we present the inferential procedure, and in Section 9 a simulation study. In Section 10 the practical importance of the new distribution is demonstrated in three applications where our distribution outperforms several existing lifetime distributions, such as the exponential, the exponential-geometric, the Weibull, the modified Weibull, and the generalized exponential Poisson distribution. Some final comments in Section 11 conclude the paper.

2. The CE2G Model

Let be a nonnegative random variable denoting the lifetime of a component in some population. The random variable is said to have a CE2G distribution with parameters , , and if its probability density function (pdf) is given by where is a scale parameter of the distribution, and and are shape parameters. Figure 1(a) shows the CE2G probability density function for , , and and we can see that the function can be decreasing or unimodal.

The survival function of a CE2G distributed random variable is given by where, , , and .

From (2) and (1), the failure rate function, according to the relationship , is given by

The initial value is not finite if and otherwise is given by if or if and the long-term hazard function value is . The failure rate (3) can be increasing, decreasing, or bathtub as shown in Figure 1(b), which shows some failure rate function shapes to , , and .
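The density, survival, and failure rate functions discussed above can be written out explicitly from the construction in Proposition 1. The block below is a sketch under assumed notation, not a transcription: the symbols X (lifetime), alpha (shape parameter of the exponentiated exponential), lambda (scale parameter), and theta (geometric parameter, taken in (0, 1) with probability mass theta(1 - theta)^(m-1) on m = 1, 2, ...) are assumptions introduced here.

```latex
% Assumed notation (not taken from the text): lifetime X, shape \alpha > 0,
% scale \lambda > 0, geometric parameter 0 < \theta < 1.
\begin{align*}
f(x) &= \frac{\alpha\lambda\theta\, e^{-\lambda x}\,\bigl(1-e^{-\lambda x}\bigr)^{\alpha-1}}
             {\bigl[1-(1-\theta)\bigl(1-e^{-\lambda x}\bigr)^{\alpha}\bigr]^{2}},
      \qquad x>0, \\
S(x) &= \frac{1-\bigl(1-e^{-\lambda x}\bigr)^{\alpha}}
             {1-(1-\theta)\bigl(1-e^{-\lambda x}\bigr)^{\alpha}}, \\
h(x) &= \frac{f(x)}{S(x)}
      = \frac{\alpha\lambda\theta\, e^{-\lambda x}\,\bigl(1-e^{-\lambda x}\bigr)^{\alpha-1}}
             {\bigl[1-\bigl(1-e^{-\lambda x}\bigr)^{\alpha}\bigr]\,
              \bigl[1-(1-\theta)\bigl(1-e^{-\lambda x}\bigr)^{\alpha}\bigr]}, \\
h(0^{+}) &=
  \begin{cases}
    \infty,        & \alpha<1,\\
    \lambda\theta, & \alpha=1,\\
    0,             & \alpha>1,
  \end{cases}
\qquad
\lim_{x\to\infty} h(x) = \lambda .
\end{align*}
```

Under these assumptions, setting alpha = 1 gives the density lambda theta e^(-lambda x) / [theta + (1 - theta) e^(-lambda x)]^2, and letting theta tend to 1 recovers the exponentiated exponential distribution.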

The th quantile of the CE2G distribution is given by where has the uniform distribution and is the distribution function of .
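As an illustration of how the distribution function can be inverted to obtain quantiles, the following is a minimal Python sketch. It assumes the parameterization in which the exponentiated exponential cdf is G(x) = (1 - exp(-lam x))^alpha and the CE2G cdf is F(x) = theta G(x) / (1 - (1 - theta) G(x)); the names `alpha`, `lam`, `theta` and all function names are hypothetical and do not come from the text.

```python
import math

def _ee_cdf(x, alpha, lam):
    # Exponentiated exponential cdf G(x) = (1 - exp(-lam * x)) ** alpha.
    return (-math.expm1(-lam * x)) ** alpha

def ce2g_cdf(x, alpha, lam, theta):
    # Assumed CE2G cdf: F(x) = theta * G / (1 - (1 - theta) * G).
    g = _ee_cdf(x, alpha, lam)
    return theta * g / (1.0 - (1.0 - theta) * g)

def ce2g_survival(x, alpha, lam, theta):
    # S(x) = 1 - F(x) = (1 - G) / (1 - (1 - theta) * G).
    g = _ee_cdf(x, alpha, lam)
    return (1.0 - g) / (1.0 - (1.0 - theta) * g)

def ce2g_pdf(x, alpha, lam, theta):
    # Density for x > 0 (x = 0 is excluded because of the power alpha - 1).
    base = -math.expm1(-lam * x)
    ee_pdf = alpha * lam * math.exp(-lam * x) * base ** (alpha - 1.0)
    return theta * ee_pdf / (1.0 - (1.0 - theta) * base ** alpha) ** 2

def ce2g_hazard(x, alpha, lam, theta):
    # Failure rate h(x) = f(x) / S(x).
    return ce2g_pdf(x, alpha, lam, theta) / ce2g_survival(x, alpha, lam, theta)

def ce2g_quantile(u, alpha, lam, theta):
    # Solve F(x) = u in two steps: first for G, then invert G.
    g = u / (theta + (1.0 - theta) * u)
    return -math.log1p(-g ** (1.0 / alpha)) / lam

# Example: the median for one illustrative parameter choice.
median = ce2g_quantile(0.5, alpha=2.0, lam=1.5, theta=0.4)
```

Feeding uniform draws on (0, 1) into `ce2g_quantile` gives a simple inverse-cdf sampler under the same assumptions.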

Suppose that in a reliability study we can observe only the maximum lifetime of each component among all risks. On many occasions, information about which risk caused the failure of the component under analysis is not available, or the true cause of failure cannot be specified. Complementary risks (CR) problems arise in several areas and an extensive literature is available. Interested readers can see [10–12].

Then, in this context, our model can be derived as follows. Let be a random variable denoting the number of failure causes, with geometric probability distribution given by where and .

Also consider , realizations of a random variable denoting the failure times, that is, the time-to-event due to the th CR and, from [2], has an exponentiated exponential probability distribution with parameters and , given by where and are the pdf and df, respectively, of the exponential distribution with parameter .

In the latent complementary risks scenario, the number of causes and the lifetime associated with a particular cause are not observable (latent variables), but only the maximum lifetime among all causes is usually observed. So, we only observe the random variable given by

The following result shows that the random variable has probability density function given by (1).

Proposition 1. If the random variable is defined as (7), then, considering (5) and (6), is distributed according to a CE2G distribution, with probability density function given by (1).

Proof. The conditional density function of (7) given is given by
Then, the marginal probability density function of is given by
This completes the proof.
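The two steps of the proof can be sketched as follows. The notation is assumed rather than taken from the text: M is the number of causes with P(M = m) = theta(1 - theta)^(m-1), m = 1, 2, ..., and g and G denote the pdf and cdf of the exponentiated exponential distribution.

```latex
% Assumed notation: M ~ Geometric(\theta) on {1, 2, ...};
% g and G are the exponentiated exponential pdf and cdf.
\begin{align*}
f(x \mid M = m) &= m\, g(x)\, [G(x)]^{m-1}
  && \text{(maximum of $m$ i.i.d. lifetimes)}, \\
f(x) &= \sum_{m=1}^{\infty} \theta (1-\theta)^{m-1}\, m\, g(x)\, [G(x)]^{m-1} \\
     &= \theta\, g(x) \sum_{m=1}^{\infty} m \bigl[(1-\theta) G(x)\bigr]^{m-1} \\
     &= \frac{\theta\, g(x)}{\bigl[1-(1-\theta) G(x)\bigr]^{2}}
  && \Bigl(\text{using } \textstyle\sum_{m\ge 1} m z^{m-1} = (1-z)^{-2},\ |z|<1\Bigr).
\end{align*}
```

Substituting g(x) = alpha lambda e^(-lambda x) (1 - e^(-lambda x))^(alpha-1) and G(x) = (1 - e^(-lambda x))^alpha gives the density in closed form.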

3. Some Properties

Many of the most important features and characteristics of a distribution can be studied through its moments, such as the mean and variance. A general expression for the rth ordinary moment of the CE2G distribution is hard to obtain, so we summarize the mean and variance as follows.

The moment generating function of the variable with density function given by (1) can be obtained analytically, if we consider the expression, given in [13, page 329, Equation ].

For any real number , let be the characteristic function of , that is, , where denotes the imaginary unit. With the preceding notations, we state the following.

Proposition 2. For the random variable with CE2G distribution, we have that its characteristic function is given by where .

Proof. Consider the following: where the last equality follows from the change of variable .
Comparing the last integral with (10), we obtain , , , and ; making the appropriate substitutions completes the proof.

Proposition 3. A random variable with density given by (1) has mean and variance given, respectively, by
where is known as the PsiGamma function.

Proof. The first result follows from the relationship . From the literature, and , and the results follow after a little algebra.

Skewness is a measure of the asymmetry of the probability distribution. The skewness value can be positive or negative, or even undefined. Qualitatively, a negative skew indicates that the tail on the left side of the probability density function is longer than the right side and the bulk of the values lie to the right of the mean. A positive skew indicates that the tail on the right side is longer than the left side and the bulk of the values lie to the left of the mean. The skewness of a random variable , say , is given by the third standardized moment

Kurtosis is any measure of the “peakedness” of the probability distribution of a real-valued random variable. In a similar way to the concept of skewness, kurtosis is a descriptor of the shape of a probability distribution. It is common practice to use the kurtosis to provide a comparison of the shape of a given distribution to that of the normal distribution. One common measure of kurtosis, originating with Karl Pearson, say , is based on a scaled version of the fourth moment, given by

Algebraic expressions for the kurtosis and skewness are too lengthy to show, because they require the algebraic expressions of the moments up to order four. These moments can be obtained by algebraic manipulation to determine , , , and in (14) and (15) through (11). Figure 2 shows the kurtosis () and skewness () of the CE2G distribution for with , and for with , .
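For reference, the skewness and Pearson kurtosis can be expressed through the first four raw moments by the standard identities below. The symbols mu'_k = E[X^k], mu = mu'_1, gamma_1, and gamma_2 are notation assumed here for illustration.

```latex
% Assumed notation: \mu'_k = E[X^k], \mu = \mu'_1, \sigma^2 = \mu'_2 - \mu^2.
\begin{align*}
\gamma_1 &= \frac{E\bigl[(X-\mu)^3\bigr]}{\sigma^{3}}
          = \frac{\mu'_3 - 3\mu\,\mu'_2 + 2\mu^{3}}{\sigma^{3}}, \\
\gamma_2 &= \frac{E\bigl[(X-\mu)^4\bigr]}{\sigma^{4}}
          = \frac{\mu'_4 - 4\mu\,\mu'_3 + 6\mu^{2}\mu'_2 - 3\mu^{4}}{\sigma^{4}}.
\end{align*}
```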

4. Order Statistics

Order statistics are among the most fundamental tools in nonparametric statistics and inference. Let be a random sample taken from the CE2G distribution and denote the corresponding order statistics. Then, the pdf of the ith order statistic is given by

The rth moment of the ith order statistic can be obtained from the following result due to [14]:

Consider the binomial series expansion given by where is a Pochhammer symbol, given and if the series converge, and

Proposition 4. For the random variable with CE2G distribution, the rth moment of the ith order statistic is given by

Proof. From (2) and (18), we have that
Using the change of variable and the expansion (18) results in the kernel of the gamma distribution function as Now considering (22) in (17) and the property (19), the result follows.
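For reference, the density of the ith order statistic in a sample of size n has the standard general form shown first below. The second line specializes it under the assumed CE2G parameterization F = theta G / (1 - (1 - theta) G), where g and G are the exponentiated exponential pdf and cdf; that parameterization and the symbols are assumptions, not taken from the text.

```latex
% General form, then the specialization under the assumed CE2G cdf
% F(x) = \theta G(x) / [1 - (1-\theta) G(x)].
\begin{align*}
f_{i:n}(x) &= \frac{n!}{(i-1)!\,(n-i)!}\; f(x)\,[F(x)]^{i-1}\,[1-F(x)]^{n-i}, \\
f_{i:n}(x) &= \frac{n!}{(i-1)!\,(n-i)!}\;
   \frac{\theta^{i}\, g(x)\, [G(x)]^{i-1}\, [1-G(x)]^{n-i}}
        {\bigl[1-(1-\theta) G(x)\bigr]^{n+1}} .
\end{align*}
```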

5. Entropy

The entropy of a random variable is a measure of the variation of uncertainty. A popular entropy measure is the Rényi entropy [15].

If has the probability density function (1), then the Rényi entropy is defined by where and .
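The usual definition of the Rényi entropy of order gamma, and the integrand it produces under the assumed CE2G density, can be sketched as follows. The symbols gamma, alpha, lambda, and theta are assumed notation.

```latex
% Standard definition; \gamma > 0, \gamma \neq 1.
% The second line uses the assumed CE2G density with parameters \alpha, \lambda, \theta.
\begin{align*}
I_R(\gamma) &= \frac{1}{1-\gamma}\,\log \int_{0}^{\infty} [f(x)]^{\gamma}\, dx, \\
[f(x)]^{\gamma} &= \frac{(\alpha\lambda\theta)^{\gamma}\, e^{-\gamma\lambda x}\,
                         \bigl(1-e^{-\lambda x}\bigr)^{\gamma(\alpha-1)}}
                        {\bigl[1-(1-\theta)\bigl(1-e^{-\lambda x}\bigr)^{\alpha}\bigr]^{2\gamma}} .
\end{align*}
```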

Proposition 5. If the random variable is defined as (7), then the Rényi entropy is given by

Proof. From (23), we can calculate
So, using (25) in , the result follows.

6. Reliability

In the context of reliability, the stress-strength model describes the life of a component which has a random strength that is subjected to a random stress . The component fails at the instant that the stress applied to it exceeds the strength, and the component will function satisfactorily whenever . So, is a measure of component reliability. In the area of stress-strength models there has been a large amount of work on the estimation of the reliability when and are independent random variables belonging to the same univariate family of distributions.

Proposition 6. If the random variable is defined as (7), then the reliability for and i.i.d. is given by

Proof. For and i.i.d. CE2G r.v.'s where is the stress and is the strength, the reliability is given by This completes the proof.

7. Residual Lifetime Distribution

Given that there was no failure prior to time , the residual lifetime distribution of a random variable , distributed as CE2G distribution, has the survival function given by

The mean residual lifetime of a continuous distribution with survival function is given by

Proposition 7. For the random variable with CE2G distribution, we have that the mean residual lifetime is given by

Proof. From (29) and using given by (2), we have that Now using (18) and making a binomial expansion in a manner similar to the proof of Proposition 4 on (22), the result follows.
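The mean residual lifetime can also be checked numerically from its definition as the integral of the survival function beyond t divided by S(t). The sketch below uses a plain trapezoid rule and assumes the survival function S(x) = (1 - G(x)) / (1 - (1 - theta) G(x)) with G(x) = (1 - exp(-lam x))^alpha; the parameter names, function names, and the truncation point of the integral are illustrative assumptions.

```python
import math

def ce2g_survival(x, alpha, lam, theta):
    # Assumed CE2G survival function; G is the exponentiated exponential cdf.
    g = (-math.expm1(-lam * x)) ** alpha
    return (1.0 - g) / (1.0 - (1.0 - theta) * g)

def mean_residual_life(t, alpha, lam, theta, tail=60.0, steps=20000):
    """Approximate m(t) = (1 / S(t)) * integral of S(u) du over u >= t."""
    # Truncate the integral at t + tail / lam: S decays like exp(-lam * u),
    # so the neglected tail is of relative order exp(-tail).
    upper = t + tail / lam
    h = (upper - t) / steps
    s_t = ce2g_survival(t, alpha, lam, theta)
    total = 0.5 * (s_t + ce2g_survival(upper, alpha, lam, theta))
    for k in range(1, steps):
        total += ce2g_survival(t + k * h, alpha, lam, theta)
    return h * total / s_t

# Sanity check: with alpha = 1 and theta = 1 the assumed survival function
# reduces to exp(-lam * x), whose mean residual lifetime is 1 / lam for all t.
mrl_exponential = mean_residual_life(0.7, alpha=1.0, lam=2.0, theta=1.0)
```

A closed-form series such as the one in Proposition 7 could be compared against this kind of numerical value for a few parameter choices.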

8. Inference

Assuming the lifetimes are independently distributed and are independent of the censoring mechanism, the maximum likelihood estimates (MLEs) of the parameters are obtained by direct maximization of the log-likelihood function given by where is a censoring indicator, which is equal to 0 or 1, respectively, if the observation is censored or observed. The advantage of this procedure is that it runs immediately using existing statistical packages. We used the optim routine of R [16].
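The censored log-likelihood described above adds log f(t) for observed lifetimes and log S(t) for censored ones. The following Python sketch evaluates it under the assumed CE2G density and survival function with parameters named `alpha`, `lam`, `theta` (hypothetical names). It only evaluates the function on simulated data; it is not the authors' R code and does not perform the maximization.

```python
import math
import random

def ce2g_loglik(params, times, observed):
    """Censored log-likelihood; observed[i] is 1 for a failure, 0 if censored."""
    alpha, lam, theta = params
    total = 0.0
    for t, d in zip(times, observed):
        base = -math.expm1(-lam * t)              # 1 - exp(-lam * t)
        g_cdf = base ** alpha                     # exponentiated exponential cdf
        denom = 1.0 - (1.0 - theta) * g_cdf
        if d == 1:
            # Contribution log f(t) under the assumed density.
            total += (math.log(alpha * lam * theta) - lam * t
                      + (alpha - 1.0) * math.log(base) - 2.0 * math.log(denom))
        else:
            # Contribution log S(t) under the assumed survival function.
            total += math.log(1.0 - g_cdf) - math.log(denom)
    return total

def ce2g_quantile(u, alpha, lam, theta):
    # Inverse of the assumed cdf, used here only to simulate data.
    g = u / (theta + (1.0 - theta) * u)
    return -math.log1p(-g ** (1.0 / alpha)) / lam

# Simulated example with administrative censoring at time c.
rng = random.Random(1)
true_params = (2.0, 1.0, 0.5)
raw = [ce2g_quantile(rng.random(), *true_params) for _ in range(2000)]
c = 3.0
times = [min(t, c) for t in raw]
observed = [1 if t <= c else 0 for t in raw]

ll_true = ce2g_loglik(true_params, times, observed)
ll_off = ce2g_loglik((2.0, 3.0, 0.5), times, observed)  # badly misspecified scale
```

In practice the negative of `ce2g_loglik` would be passed to a general-purpose optimizer, which is the role played by optim in the text.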

Large-sample inference for the parameters is based on the MLEs and their estimated standard errors. For , we consider the observed Fisher information matrix given by where the elements of the matrix are given in the appendix.

Under conditions that are fulfilled for the parameters , and in the interior of the parameter space, the asymptotic distribution of , as , is trivariate normal with zero mean and variance-covariance matrix .

In order to compare different distributions, we follow several authors in the literature, for example, [6, 17–19], who use the Akaike information criterion (AIC) and Bayesian information criterion (BIC), which are defined, respectively, by and , where is the log-likelihood evaluated at the MLE vector for the respective distribution, is the number of parameters estimated, and is the sample size. The best distribution corresponds to the lowest AIC and BIC values.
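The two criteria are simple functions of the maximized log-likelihood. A minimal Python sketch, using the standard definitions AIC = -2 loglik + 2k and BIC = -2 loglik + k log(n), with hypothetical argument names:

```python
import math

def aic(loglik, k):
    # Akaike information criterion: -2 * loglik + 2 * k,
    # where k is the number of estimated parameters.
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n):
    # Bayesian information criterion: -2 * loglik + k * log(n),
    # where n is the sample size.
    return -2.0 * loglik + k * math.log(n)

def rank_models(fits, n):
    """Sort (name, loglik, k) tuples by AIC, smallest (best) first."""
    return sorted(
        ((name, aic(ll, k), bic(ll, k, n)) for name, ll, k in fits),
        key=lambda row: row[1],
    )
```

For example, `rank_models([("A", -100.0, 3), ("B", -104.0, 1)], n=100)` orders two hypothetical fits by AIC; the model names and log-likelihood values here are made up for illustration.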

9. Simulation Study

To assess the performance of the MLEs in the estimation process, a study was performed based on one hundred datasets generated from the CE2G distribution with six different sets of parameters for , , , , , and . In order to have unbounded parameters, we consider the following transformations of the parameters in the estimation process. For the parameter , we considered the transformation , where , and for and we consider an exponential transformation. By the invariance property of the MLEs, we can return to the original parameters through the transformations. To calculate their variances, we use the delta method. The values were used as the initial values for all numerical simulations since , , and .
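One way to generate CE2G data for such a study is to mimic the latent complementary risks construction directly: draw a geometric number of causes, draw one exponentiated exponential lifetime per cause, and keep the maximum. The sketch below does this in Python and compares the empirical cdf with the closed form F = theta G / (1 - (1 - theta) G). The parameter names, the seed, the sample size, and the parameter values are illustrative assumptions and are not the settings used in the paper.

```python
import math
import random

def draw_ce2g_latent(rng, alpha, lam, theta):
    """One draw as the maximum of M exponentiated exponential lifetimes."""
    # M ~ Geometric(theta) on {1, 2, ...}: count trials until the first success.
    m = 1
    while rng.random() >= theta:
        m += 1
    # Inverse-cdf draw from G(t) = (1 - exp(-lam * t)) ** alpha for each cause.
    lifetimes = [-math.log1p(-rng.random() ** (1.0 / alpha)) / lam
                 for _ in range(m)]
    return max(lifetimes)

def ce2g_cdf(x, alpha, lam, theta):
    # Closed-form cdf implied by the construction above.
    g = (-math.expm1(-lam * x)) ** alpha
    return theta * g / (1.0 - (1.0 - theta) * g)

def max_cdf_gap(sample, alpha, lam, theta):
    """Largest gap between the empirical cdf and the closed-form cdf."""
    xs = sorted(sample)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs):
        f = ce2g_cdf(x, alpha, lam, theta)
        gap = max(gap, abs(f - i / n), abs(f - (i + 1) / n))
    return gap

rng = random.Random(2013)
sample = [draw_ce2g_latent(rng, 2.0, 1.0, 0.3) for _ in range(5000)]
gap = max_cdf_gap(sample, 2.0, 1.0, 0.3)
```

A small value of `gap` indicates that the simulated maxima follow the closed-form distribution function, which is the content of Proposition 1.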

The results are summarized in Table 1, which shows the averages of the MLEs, Av, together with the coverage probability of the 95% confidence intervals for the parameters of the CE2G distribution, , the bias, the mean squared error, MSE, and the standard deviation, Sd. These results suggest that the MLEs performed adequately. The standard deviation of the MLEs decreases as the sample size increases. The empirical coverage probabilities are close to the nominal coverage level, particularly as the sample size increases.

10. Applications

In this section, we compare the fit of the CE2G distribution with that of several usual lifetime distributions on three datasets extracted from the literature. The first dataset, , refers to the serum-reversal time (days) of children contaminated with HIV from vertical transmission at the university hospital of the Ribeirão Preto School of Medicine (Hospital das Clínicas da Faculdade de Medicina de Ribeirão Preto) from 1986 to 2001 [20]. Serum reversal can occur in children born to mothers infected with HIV.

The second dataset, , consists of the lifetimes in hours of 417 forty-watt, 110-volt internally frosted incandescent lamps taken from 42 weekly quality control tests [21]. Survival times, in days, are given for the control group of lamps in the original dataset.

The third dataset, , gives the survival times for laboratory mice, which were exposed to a fixed dose of radiation at an age of 5 to 6 weeks. The cause of death for each mouse was determined after autopsy to be one of three possibilities: thymic lymphoma (C1), reticulum cell sarcoma (C2), or other causes (C3) [22]. Here we consider the mice that died from C3 in the control group.

Firstly, in order to identify the shape of a lifetime data failure rate function, we will consider a graphical method based on the TTT plot [23]. In its empirical version, the TTT plot is given by , where and , represent the order statistics of the sample. It has been shown that the failure rate function is increasing (decreasing) if the TTT plot is concave (convex). Figure 3(a) shows concave TTT plots for the , , and datasets, indicating increasing failure rate functions.
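The empirical TTT plot can be computed directly from the ordered sample. The Python sketch below uses the usual scaled total-time-on-test form, in which the point at r/n is (sum of the r smallest observations + (n - r) times the rth smallest) divided by the sample total; the function name is hypothetical.

```python
def ttt_points(sample):
    """Return the empirical scaled TTT plot as a list of (r/n, G(r/n)) pairs."""
    ys = sorted(sample)
    n = len(ys)
    total = sum(ys)
    points = []
    running = 0.0
    for r, y in enumerate(ys, start=1):
        running += y  # sum of the r smallest observations
        points.append((r / n, (running + (n - r) * y) / total))
    return points

# Tiny illustration with three lifetimes.
pts = ttt_points([3.0, 1.0, 2.0])
```

Plotting the second coordinate against the first and comparing the curve with the diagonal gives the visual check described above: a concave curve lying above the diagonal suggests an increasing failure rate, a convex curve below it a decreasing one.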

We compare the CE2G distribution fit with those of the exponential distribution with probability density function given by , the exponentiated exponential distribution, EE, with probability density function given by , the EG distribution [1] with probability density function given by , the Weibull distribution with probability density function given by , where the shape parameter is and scale parameter is , the gamma distribution with probability density function given by , with shape parameter and scale parameter , the modified Weibull (MW) distribution with probability density function given by , where and , the generalized exponential Poisson (GEP) distribution [6] with probability density function given by , the generalized Birnbaum-Saunders (BS-G) distribution [24] with probability density function given by , where is the probability density function of the standard normal distribution, and the Birnbaum-Saunders (BS) distribution. The BS distribution is obtained by considering in the BS-G probability density function.

Table 2 provides the AIC and BIC values for each distribution. They provide evidence in favor of our CE2G distribution for the datasets and under all of the comparison criteria. For the dataset , the CE2G distribution provides a fit similar to those of the Weibull and MW distributions, implying that the CE2G distribution is a competitor to the usual survival distributions. These results are corroborated by the empirical Kaplan-Meier survival functions and the fitted survival functions shown in Figure 3(b). The MLEs (and their corresponding standard errors in parentheses) of the parameters , , and of the CE2G distribution are given, respectively, by 3.7469 (0.5688), 41.4860 (9.7659), and 17536.46 (7.1814) for , by 5.1765 (19.4159), 0.2625 (0.9915), and 94.6676 (3.8720) for , and by 0.0018180 (0.9818), 0.0698 (0.3770), and 78.7704 (11.5084) for .

11. Concluding Remarks

In this paper, a new lifetime distribution is provided and discussed. The CE2G distribution accommodates increasing, decreasing, and bathtub failure rate functions and arises in a latent complementary risks scenario, where the lifetime associated with a particular risk is not observable and only the maximum lifetime value among all risks is observed. The properties of the proposed distribution are discussed, including a formal proof of its probability density function and explicit algebraic formulas for its survival and hazard functions, moments, rth moment of the ith order statistic, mean residual lifetime, modal value, and the observed Fisher information matrix. Maximum likelihood inference is implemented straightforwardly. The practical importance of the new distribution was demonstrated in three applications where the CE2G distribution provided the best fit in comparison with several other existing lifetime distributions.

Appendix

In this appendix, we show the values of the elements of the observed Fisher information matrix in (33). From (32), we obtain

where , , , and .

Acknowledgments

V. Marchi and F. Louzada are supported by the Brazilian organizations CAPES and CNPq, respectively. The authors are grateful to Dr. Gauss Cordeiro, Editor of this special issue, as well as to the anonymous referees for their comments, criticisms, and suggestions, which led to important improvements.