Abstract

The Weibull distribution is widely used in the parametric analysis of lifetime data. In place of the Weibull distribution, it is often more convenient to work with the equivalent extreme value distribution, obtained by taking the logarithm of Weibull-distributed data. The main advantage of working with the extreme value distribution is that, unlike the Weibull distribution, it has location and scale parameters. This paper is devoted to a discussion of statistical inference for the extreme value distribution with censored data. Numerical simulations are performed to examine the finite-sample behavior of the estimators of the parameters. The procedures are then applied to real-world data.

1. Introduction

In medical research, data documenting the time until the occurrence of a particular event, such as the death of a patient, are frequently encountered. Such data are called time-to-event data, also referred to as lifetime, survival time, or failure time data, and generally have a right-skewed distribution. For this reason, the Weibull distribution is widely used. In place of the Weibull distribution, it is often more convenient to work with the equivalent extreme value distribution, in which the data are the logarithms of those taken from the Weibull distribution (Lawless [1]). Specifically, if $Y$ has a Weibull distribution with density
$$f(y)=\lambda\beta(\lambda y)^{\beta-1}\exp\{-(\lambda y)^{\beta}\},\qquad y>0,\tag{1.1}$$
where $\lambda>0$ and $\beta>0$ are parameters, then $T=\log Y$ has an extreme value distribution with $b=1/\beta$ and $u=-\log\lambda$. The main convenience in working with the extreme value distribution is that, unlike the Weibull distribution, it has location and scale parameters. Excellent reviews of extreme value distributions can be found in Coles [2] and Kotz and Nadarajah [3].
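The Weibull-to-extreme-value correspondence is easy to verify by simulation. The following sketch, with illustrative parameter values $\lambda=2$ and $\beta=1.5$ (not taken from the paper), draws Weibull lifetimes by inverse-CDF sampling and compares the distribution of their logarithms against the extreme value distribution function $F(t)=1-\exp\{-e^{(t-u)/b}\}$ with $b=1/\beta$, $u=-\log\lambda$.

```python
import numpy as np

rng = np.random.default_rng(0)
lam, beta = 2.0, 1.5               # illustrative Weibull parameters
u, b = -np.log(lam), 1.0 / beta    # implied extreme value parameters

# Draw Weibull lifetimes via the inverse CDF: F(y) = 1 - exp{-(lam*y)^beta}
p = rng.uniform(size=100_000)
y = (-np.log(1.0 - p)) ** (1.0 / beta) / lam
t = np.log(y)                      # T = log Y should be extreme value (u, b)

# Compare the empirical CDF of T with F(t) = 1 - exp{-e^{(t-u)/b}}
# at a few standardized points z = (t - u)/b.
for q in (-1.0, 0.0, 1.0):
    emp = np.mean(t <= u + b * q)
    theo = 1.0 - np.exp(-np.exp(q))
    print(f"z={q:+.1f}: empirical {emp:.4f}  theoretical {theo:.4f}")
```

The empirical and theoretical values agree to within simulation error, confirming that the log transform of Weibull data follows the extreme value law.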

A common feature of lifetime data is that the data points may be censored. For example, the event of interest may not have happened to all patients: a patient undergoing cancer therapy might die in a road accident, so that the observation period is cut off before the event of interest occurs. In such a case, the data are said to be censored, and it would be incorrect to treat the time to death as the lifetime. When data are censored (as in the case of the cancer patient who dies in a road accident), conventional statistical methods cannot be applied directly; instead, special statistical methods are necessary to handle such data. Censored data have been studied by many authors. Kaplan and Meier [4] proposed the so-called product-limit estimate of the survival function, Cox [5] introduced the still widely used proportional hazards model, Buckley and James [6] investigated the censored regression model, and Prentice [7] studied two-sample censored data. More recent works include, among others, Tsiatis [8], Wei et al. [9], Wei and Gail [10], and Lee and Yang [11]. Our objective in this paper is to systematically study inference procedures for the extreme value distribution with censored data. Based on the criteria of empirical coverage probability and confidence interval length in the numerical studies, we observe that the log transformation of the estimate improves on the usual normal approximation, and that the likelihood ratio method is effective for small sample sizes when censoring is heavy. Ideally, the empirical coverage probabilities should be close to the theoretical coverage probability, and the empirical mean lengths should be relatively short. Similar work on maximum likelihood estimation for the Weibull distribution with censored data can be found in Cohen [12].

This paper is organized as follows: Section 2 describes the maximum likelihood estimates and their confidence intervals when the lifetime data includes some censored observations. Numerical results and a graphical method for checking the model are presented in Section 3. This section also illustrates the procedures using a vaginal cancer data set for rats. Section 4 states the conclusion.

2. Procedures

2.1. Maximum Likelihood Estimation

The probability density function for the extreme value distribution considered here is
$$f(t)=\frac{1}{b}\,e^{(t-u)/b}\exp\{-e^{(t-u)/b}\},\qquad -\infty<t<\infty,\tag{2.1}$$
where $-\infty<u<\infty$ and $b>0$ are location and scale parameters, respectively. Suppose that the data follow the random censoring model, and consider a random sample of $n$ observations. Let $T_i$, $i=1,\dots,n$, be lifetimes that are independent random variables with density function $f(t)$, distribution function $F(t)$, and survival function $S(t)=P(T_i>t)$. Let $C_i$, $i=1,\dots,n$, be censoring variables, independent of the $T_i$, with survival function $G(t)=P(C_i>t)$. We observe $T_i$ only if $T_i\le C_i$, so the available data consist of the pairs $(t_i,\delta_i)$, $i=1,\dots,n$, where
$$t_i=\min(T_i,C_i),\qquad \delta_i=\begin{cases}1&\text{if }t_i=T_i,\\ 0&\text{if }t_i=C_i.\end{cases}\tag{2.2}$$
Notice that $\delta_i=I(T_i\le C_i)$ is the censoring indicator, equal to 1 when $T_i\le C_i$ and 0 otherwise. Let $f(t)=dF(t)/dt$, and let $g(t)$ denote the density of the $C_i$. Then we have
$$P(t_i=t,\delta_i=1)=P(T_i=t,\,T_i\le C_i)=f(t)G(t),\qquad P(t_i=t,\delta_i=0)=P(C_i=t,\,T_i>C_i)=g(t)S(t).\tag{2.3}$$
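The random censoring model above can be sketched in a few lines of code. The sketch below generates extreme value lifetimes by inverse-CDF sampling and pairs them with normal censoring times shifted by an offset $c$, anticipating the simulation design of Section 3; the values $n=50$ and $c=0.33$ are illustrative choices, not prescriptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, u, b = 50, 0.0, 1.0

# Lifetimes T_i from the extreme value distribution via the inverse CDF:
# S(t) = exp{-e^{(t-u)/b}}  =>  T = u + b*log(-log(1 - P)),  P ~ Uniform(0,1)
T = u + b * np.log(-np.log(1.0 - rng.uniform(size=n)))

# Independent censoring times C_i (normal with offset c, as in Section 3)
c = 0.33                        # shift controlling the censoring proportion
C = c + rng.standard_normal(n)

t = np.minimum(T, C)            # observed times  t_i = min(T_i, C_i)
delta = (T <= C).astype(int)    # censoring indicators  delta_i = I(T_i <= C_i)
print("censoring proportion:", 1.0 - delta.mean())
```

The pairs `(t, delta)` are exactly the data $(t_i,\delta_i)$ that enter the likelihood below.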

The above probabilities can be combined into the single expression
$$P(t_i=t,\delta_i)=\left[f(t_i)G(t_i)\right]^{\delta_i}\left[g(t_i)S(t_i)\right]^{1-\delta_i}.\tag{2.4}$$

This yields the sampling distribution of $(t_i,\delta_i)$, $i=1,\dots,n$:
$$\prod_{i=1}^{n}\left[f(t_i)G(t_i)\right]^{\delta_i}\left[g(t_i)S(t_i)\right]^{1-\delta_i}=\prod_{i=1}^{n}G(t_i)^{\delta_i}g(t_i)^{1-\delta_i}\prod_{i=1}^{n}f(t_i)^{\delta_i}S(t_i)^{1-\delta_i}.\tag{2.5}$$

Since $g(t)$ and $G(t)$ do not involve any parameter of interest, the likelihood function is defined as
$$L(u,b)=\prod_{i=1}^{n}f(t_i)^{\delta_i}S(t_i)^{1-\delta_i}.\tag{2.6}$$

It is easily shown that for the extreme value distribution the survival function is
$$S(t)=\exp\{-e^{(t-u)/b}\}.\tag{2.7}$$

Hence, the likelihood function can be written as
$$L(u,b)=\prod_{i=1}^{n}\left[\frac{1}{b}\exp\left\{\frac{t_i-u}{b}-e^{(t_i-u)/b}\right\}\right]^{\delta_i}\left[\exp\{-e^{(t_i-u)/b}\}\right]^{1-\delta_i}.\tag{2.8}$$
To estimate $(u,b)$, we find the values that maximize $\log L(u,b)$ by setting the partial derivatives of $\log L(u,b)$ with respect to $u$ and $b$ equal to zero and solving.

From (2.8), the log likelihood function is
$$\log L(u,b)=\sum_{i=1}^{n}\delta_i\left[-\log b+\frac{t_i-u}{b}-e^{(t_i-u)/b}\right]-\sum_{i=1}^{n}(1-\delta_i)\,e^{(t_i-u)/b},\tag{2.9}$$

which is equivalent to
$$\log L(u,b)=-r\log b+\sum_{i=1}^{n}\delta_i\,\frac{t_i-u}{b}-\sum_{i=1}^{n}e^{(t_i-u)/b},\tag{2.10}$$
with $r=\sum_{i=1}^{n}\delta_i$. Differentiating (2.10) with respect to $u$ and $b$ in turn and equating to zero, we obtain the estimating equations
$$\frac{\partial\log L}{\partial u}=\frac{1}{b}\left[\sum_{i=1}^{n}e^{z_i}-r\right]=0,\qquad \frac{\partial\log L}{\partial b}=\frac{1}{b}\left[-r-\sum_{i=1}^{n}\delta_i z_i+\sum_{i=1}^{n}z_i e^{z_i}\right]=0,\tag{2.11}$$
where $z_i=(t_i-u)/b$.

These equations can be solved by numerical techniques such as Newton-Raphson iteration or random search to locate the estimates $\hat u$ and $\hat b$ of $u$ and $b$, respectively. In this paper, random search was used for its simplicity, although it is computationally intensive.

From (2.11), the maximum likelihood equations can be written as
$$\sum_{i=1}^{n}e^{\hat z_i}=r,\qquad \sum_{i=1}^{n}\hat z_i e^{\hat z_i}-\sum_{i=1}^{n}\delta_i\hat z_i=r,\tag{2.12}$$

which are equivalent to
$$e^{\hat u}=\left[\frac{1}{r}\sum_{i=1}^{n}e^{t_i/\hat b}\right]^{\hat b},\tag{2.13}$$
$$\frac{\sum_{i=1}^{n}t_i e^{t_i/\hat b}}{\sum_{i=1}^{n}e^{t_i/\hat b}}-\hat b-\frac{1}{r}\sum_{i=1}^{n}\delta_i t_i=0,\tag{2.14}$$
respectively. Equation (2.14) can be solved iteratively for $\hat b$, and $\hat u$ is then calculated from (2.13).
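Solving (2.14) for $\hat b$ and then (2.13) for $\hat u$ takes only a few lines of code. The sketch below uses simple bisection on (2.14) rather than the paper's random search, and assumes the root is bracketed by the interval `[lo, hi]`; exponents are shifted by $\max_i t_i$ to avoid overflow.

```python
import numpy as np

def ev_mle(t, delta, lo=1e-3, hi=10.0, tol=1e-10):
    """Maximum likelihood estimates (u_hat, b_hat) of the extreme value
    parameters from right-censored data, via equations (2.13)-(2.14)."""
    t = np.asarray(t, float)
    delta = np.asarray(delta, int)
    r = delta.sum()                       # number of uncensored lifetimes
    tmax = t.max()                        # shift exponents to avoid overflow

    def eq_214(b):                        # left-hand side of equation (2.14)
        w = np.exp((t - tmax) / b)
        return (t * w).sum() / w.sum() - b - (delta * t).sum() / r

    # Bisection for the root of (2.14); assumes a sign change on [lo, hi].
    a, c, fa = lo, hi, eq_214(lo)
    while c - a > tol:
        m = 0.5 * (a + c)
        fm = eq_214(m)
        if fa * fm > 0:
            a, fa = m, fm
        else:
            c = m
    b_hat = 0.5 * (a + c)

    # Equation (2.13): u_hat = b_hat * log[(1/r) * sum_i exp(t_i / b_hat)],
    # written with the same overflow shift.
    u_hat = b_hat * np.log(np.exp((t - tmax) / b_hat).sum() / r) + tmax
    return u_hat, b_hat
```

A quick sanity check on any data set: at the returned estimates, equations (2.12) should hold, that is, $\sum_i e^{\hat z_i}=r$ and $\sum_i\hat z_i e^{\hat z_i}-\sum_i\delta_i\hat z_i=r$.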

To make inferences about $(u,b)$, we can use the fact that, by the usual large-sample theory, the joint distribution of $\hat u$ and $\hat b$ is approximately bivariate normal with mean $(u,b)$ and covariance matrix $I^{-1}(u,b)$. Symbolically,
$$(\hat u,\hat b)\sim N\left((u,b),\,I^{-1}(u,b)\right),\tag{2.15}$$

where $I$ is the Fisher information matrix, defined as
$$I(u,b)=\begin{pmatrix}-E\!\left[\dfrac{\partial^2\log L}{\partial u^2}\right]&-E\!\left[\dfrac{\partial^2\log L}{\partial u\,\partial b}\right]\\[2mm]-E\!\left[\dfrac{\partial^2\log L}{\partial u\,\partial b}\right]&-E\!\left[\dfrac{\partial^2\log L}{\partial b^2}\right]\end{pmatrix}.\tag{2.16}$$

It is often difficult to evaluate the expectations in $I(u,b)$, so a natural procedure is to use the approximation
$$(\hat u,\hat b)\sim N\left((u,b),\,I_0^{-1}\right),\tag{2.17}$$

where $I_0$ is the observed information matrix
$$I_0=\begin{pmatrix}-\dfrac{\partial^2\log L}{\partial u^2}&-\dfrac{\partial^2\log L}{\partial u\,\partial b}\\[2mm]-\dfrac{\partial^2\log L}{\partial u\,\partial b}&-\dfrac{\partial^2\log L}{\partial b^2}\end{pmatrix}\Bigg|_{(u,b)=(\hat u,\hat b)}.\tag{2.18}$$
With $z_i=(t_i-u)/b$, we have
$$\frac{\partial^2\log L}{\partial u^2}\bigg|_{(\hat u,\hat b)}=-\frac{r}{\hat b^2},\qquad \frac{\partial^2\log L}{\partial b^2}\bigg|_{(\hat u,\hat b)}=-\frac{1}{\hat b^2}\left[r+\sum_{i=1}^{n}\hat z_i^2 e^{\hat z_i}\right],\qquad \frac{\partial^2\log L}{\partial u\,\partial b}\bigg|_{(\hat u,\hat b)}=-\frac{1}{\hat b^2}\sum_{i=1}^{n}\hat z_i e^{\hat z_i}.\tag{2.19}$$
These yield
$$I_0=\frac{1}{\hat b^2}\begin{pmatrix}r&\displaystyle\sum_{i=1}^{n}\hat z_i e^{\hat z_i}\\ \displaystyle\sum_{i=1}^{n}\hat z_i e^{\hat z_i}&r+\displaystyle\sum_{i=1}^{n}\hat z_i^2 e^{\hat z_i}\end{pmatrix}.\tag{2.20}$$
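The entries of (2.20) translate directly into code. The helper below is a sketch that assumes the maximum likelihood estimates $\hat u$ and $\hat b$ have already been obtained (for example, by Newton-Raphson or random search) and returns $I_0$ as a 2x2 array.

```python
import numpy as np

def observed_information(t, delta, u_hat, b_hat):
    """Observed information matrix I_0 of equation (2.20), evaluated at the
    maximum likelihood estimates (u_hat, b_hat)."""
    z = (np.asarray(t, float) - u_hat) / b_hat
    r = np.asarray(delta).sum()
    s1 = (z * np.exp(z)).sum()            # sum of z_i * e^{z_i}
    s2 = (z ** 2 * np.exp(z)).sum()       # sum of z_i^2 * e^{z_i}
    return np.array([[r, s1], [s1, r + s2]]) / b_hat ** 2
```

Inverting this matrix with `np.linalg.inv` gives the estimated covariance matrix $I_0^{-1}=(d_{ij})$ used in the confidence intervals of Section 2.2.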

2.2. Inference Procedure

From the usual large-sample theory, we have
$$(\hat u,\hat b)-(u,b)\sim N\left((0,0),\,I_0^{-1}\right).\tag{2.21}$$

Thus,
$$\frac{\hat u-u}{\sqrt{d_{11}}}\sim N(0,1),\qquad \frac{\hat b-b}{\sqrt{d_{22}}}\sim N(0,1),\tag{2.22}$$

where $I_0^{-1}=(d_{ij})$, $i,j=1,2$.

From the asymptotic normality of $(\hat u,\hat b)-(u,b)$ and the estimated variances, inference procedures for $(u,b)$ are easily obtained. For example, asymptotic $(1-\alpha)100\%$ confidence intervals for $u$ and $b$ are
$$\left(\hat u-z_{\alpha/2}\sqrt{d_{11}},\;\hat u+z_{\alpha/2}\sqrt{d_{11}}\right),\qquad \left(\hat b-z_{\alpha/2}\sqrt{d_{22}},\;\hat b+z_{\alpha/2}\sqrt{d_{22}}\right),\tag{2.23}$$

respectively, where $z_{\alpha/2}$ is the $(1-\alpha/2)$-quantile of the standard normal distribution. However, such an interval for the scale parameter $b$ may contain negative values; recall that $-\infty<u<\infty$ while $b>0$. As in Lawless [1], this problem can be remedied by a log transformation. Let $\xi=\log b$ and $\hat\xi=\log\hat b$. A standard argument shows that $\log\hat b-\log b$ is asymptotically equivalent to $\hat b/b-1$. From this, we have
$$\begin{pmatrix}\hat u-u\\ \hat\xi-\xi\end{pmatrix}=\begin{pmatrix}\hat u-u\\ \log\hat b-\log b\end{pmatrix}\approx\begin{pmatrix}\hat u-u\\ (\hat b-b)/\hat b\end{pmatrix},\tag{2.24}$$

which is equivalent to
$$A\begin{pmatrix}\hat u-u\\ \hat b-b\end{pmatrix}\tag{2.25}$$

with
$$A=\begin{pmatrix}1&0\\ 0&1/\hat b\end{pmatrix}.\tag{2.26}$$

Therefore, we have
$$(\hat u,\hat\xi)-(u,\xi)\sim N\left((0,0),\,A I_0^{-1}A'\right),\tag{2.27}$$

where $A'$ is the transpose of $A$. Let the $(2,2)$th entry of $A I_0^{-1}A'$ be $m_{22}$. Then an asymptotic $(1-\alpha)100\%$ confidence interval for $\xi$ is
$$\left(\hat\xi-z_{\alpha/2}\sqrt{m_{22}},\;\hat\xi+z_{\alpha/2}\sqrt{m_{22}}\right).\tag{2.28}$$

Hence, since $b=e^{\xi}$, an asymptotic $(1-\alpha)100\%$ confidence interval for $b$ is
$$\left(e^{\hat\xi-z_{\alpha/2}\sqrt{m_{22}}},\;e^{\hat\xi+z_{\alpha/2}\sqrt{m_{22}}}\right).\tag{2.29}$$
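The two normal-approximation intervals, (2.23) and its log-transformed variant (2.29), can be sketched together. The code below assumes $\hat u$ and $\hat b$ are already available, builds $I_0$ from (2.20), and uses scipy only for the standard normal quantile $z_{\alpha/2}$; note that $m_{22}=d_{22}/\hat b^2$, the $(2,2)$ entry of $A I_0^{-1}A'$ with $A=\mathrm{diag}(1,1/\hat b)$.

```python
import numpy as np
from scipy.stats import norm

def normal_approx_intervals(t, delta, u_hat, b_hat, alpha=0.05):
    """(1-alpha) intervals for u and b: the plain normal approximation
    (2.23), and for b also the log-transformed interval (2.29)."""
    z = (np.asarray(t, float) - u_hat) / b_hat
    r = np.asarray(delta).sum()
    s1 = (z * np.exp(z)).sum()
    s2 = (z ** 2 * np.exp(z)).sum()
    I0 = np.array([[r, s1], [s1, r + s2]]) / b_hat ** 2   # equation (2.20)
    D = np.linalg.inv(I0)                                 # (d_ij) of (2.22)
    za = norm.ppf(1.0 - alpha / 2.0)                      # z_{alpha/2}

    ci_u = (u_hat - za * np.sqrt(D[0, 0]), u_hat + za * np.sqrt(D[0, 0]))
    ci_b = (b_hat - za * np.sqrt(D[1, 1]), b_hat + za * np.sqrt(D[1, 1]))

    # Log transformation: m22 = d22 / b_hat^2, the (2,2) entry of A I0^{-1} A'
    m22 = D[1, 1] / b_hat ** 2
    xi = np.log(b_hat)
    ci_b_log = (np.exp(xi - za * np.sqrt(m22)), np.exp(xi + za * np.sqrt(m22)))
    return ci_u, ci_b, ci_b_log
```

The log-transformed interval `ci_b_log` is guaranteed positive, while `ci_b` from (2.23) may extend below zero.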

Note that the interval always lies in the positive half of the axis.

The procedures based on the normal approximation are appropriate for fairly large sample sizes. An appealing alternative is to use likelihood ratio procedures: the chi-squared ($\chi^2$) approximation to the distribution of the likelihood ratio statistic is often found to be adequate even for small sample sizes. We include the confidence intervals from these procedures for a comparative study. The procedures are discussed below.

Consider the test problem $H_0:b=b_0$ versus $H_a:b\ne b_0$. The likelihood ratio test with level $\alpha$ rejects $H_0$ when
$$-2\log\frac{\max_{H_0}L(u,b)}{\max_{H_a}L(u,b)}>\chi^2_{1,\alpha},\tag{2.30}$$

where $\max_{H_0}L(u,b)=\max_u L(u,b_0)$ is achieved at the value $\tilde u$ maximizing $\log L(u,b_0)$. This yields
$$\tilde u=b_0\log\left[\frac{1}{r}\sum_{i=1}^{n}e^{t_i/b_0}\right],\tag{2.31}$$

at which
$$\frac{\partial\log L}{\partial u}\bigg|_{(\tilde u,b_0)}=0.\tag{2.32}$$

Note that $\max_{H_a}L(u,b)=L(\hat u,\hat b)$ with probability 1. A $(1-\alpha)100\%$ confidence interval for $b$ is
$$\left\{b_0:\;-2\log\frac{L(\tilde u,b_0)}{L(\hat u,\hat b)}\le\chi^2_{1,\alpha}\right\}.\tag{2.33}$$

Similarly, a $(1-\alpha)100\%$ confidence interval for $u$ is
$$\left\{u_0:\;-2\log\frac{L(u_0,\tilde b)}{L(\hat u,\hat b)}\le\chi^2_{1,\alpha}\right\},\tag{2.34}$$

where $\max_{H_0}L(u,b)=\max_b L(u_0,b)$ is achieved at $\tilde b$, obtained from $\partial\log L/\partial b\,|_{(u_0,\tilde b)}=0$.
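The likelihood ratio interval (2.33) for $b$ can be sketched by scanning a grid of candidate values $b_0$, profiling out $u$ via (2.31) at each one. This sketch assumes $\hat u$ and $\hat b$ have already been computed, and the grid limits (0.2 to 5 times $\hat b$) are an illustrative choice that should bracket the interval in typical cases.

```python
import numpy as np
from scipy.stats import chi2

def loglik(t, delta, u, b):
    """Log likelihood (2.10) of the extreme value model with censoring."""
    z = (t - u) / b
    return -delta.sum() * np.log(b) + (delta * z).sum() - np.exp(z).sum()

def lr_interval_b(t, delta, u_hat, b_hat, alpha=0.05, grid=None):
    """Likelihood ratio interval (2.33) for b: keep every b0 whose profile
    likelihood L(u_tilde, b0), with u_tilde from (2.31), lies within the
    chi-square(1) cutoff of the maximized likelihood."""
    t = np.asarray(t, float)
    delta = np.asarray(delta, int)
    r = delta.sum()
    cutoff = chi2.ppf(1.0 - alpha, 1)          # chi^2_{1, alpha}
    if grid is None:
        grid = np.linspace(0.2 * b_hat, 5.0 * b_hat, 2000)
    lmax = loglik(t, delta, u_hat, b_hat)
    keep = [b0 for b0 in grid
            if 2.0 * (lmax - loglik(t, delta,
                                    b0 * np.log(np.exp(t / b0).sum() / r),
                                    b0)) <= cutoff]          # u_tilde: (2.31)
    return keep[0], keep[-1]
```

The interval for $u$ in (2.34) can be obtained the same way, profiling out $b$ at each fixed $u_0$ with a one-dimensional numerical maximization.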

3. Numerical Studies

Several simulation experiments were carried out to assess the performance of the confidence intervals discussed in Section 2. We report the simulation results based on the criteria of the empirical coverage probability and the empirical mean length of the confidence interval. For the simulation study, the survival data were taken from the extreme value distribution with $u=0$ and $b=1$. The censoring distributions are normal, with $C_i=c+\mathrm{Normal}(0,1)$, where $c$ was chosen to produce various censoring percentages in the samples. For sample sizes of $n=20$, 50, and 100, results were based on 500 repetitions. For the censored data, censoring proportions of 20%, 30%, 40%, 50%, 60%, and 70% were used, obtained by varying $c$ over the settings 0.75, 0.33, −0.08, −0.47, −0.87, −1.3, and −1.87. Simulation results based on these settings are summarized in Table 1.

It should be noted that although the normal approximation procedures are adequate for quite large samples, the approximations on which they are based can be rather poor for small samples (Lawless [1]). One possible remedy is a transformation. For example, the simulation results presented in Table 1 show that the log transformation appears to alleviate this problem: treating $\log\hat b$ as approximately normal is preferable to treating $\hat b$ as approximately normal. In Table 1, Method 1 denotes the confidence interval results from the large-sample normal approximation, and Method 2, in parentheses, indicates the confidence interval results from the log transformation of the maximum likelihood estimate $\hat b$. In the table, ECP is the empirical coverage probability, and EML is the empirical mean length of the confidence interval. The likelihood ratio method is denoted by LR; it is often found to be appropriate for small sample sizes.

We now look at the results for the censored-data case presented in Table 1. Overall, the empirical coverage probabilities obtained by Method 1 are noticeably improved by Method 2 and LR for all sample sizes at every censoring proportion considered. Restricting attention to the scale parameter $b$ in Table 1, the performance of the confidence interval obtained by Method 1 is improved by the log transformation (Method 2) for the small sample size at all censoring proportions. It is also observed that the LR method outperforms Method 2, which in turn is superior to Method 1. The 95% confidence intervals from the 500 simulations yield 500 independent Bernoulli trials, where a success occurs when the true parameter is covered by the confidence interval and the probability of success is 95%. Thus the 95% error margin for the empirical coverage probability is $1.96\sqrt{0.95(1-0.95)/500}\approx 0.0191$, so the empirical coverage probability is expected to fall within the interval (0.9309, 0.9691). For the sample size of $n=20$, Method 1 fails to provide confidence intervals whose coverage lies in (0.9309, 0.9691), indicating that it does not achieve the theoretical coverage probability of 95%. The confidence intervals by LR fall within (0.9309, 0.9691), whereas Method 2 is close but still does not achieve the theoretical coverage probability. For moderate and large sample sizes, no single method dominates the others: the confidence intervals by Methods 1, 2, and LR all approximately achieve the theoretical coverage probability of 95%. However, LR tends to be superior to Method 1 when censoring is heavy. As the sample size increases, the confidence interval length decreases at all censoring proportions. The confidence interval lengths by Method 1 are slightly shorter than those by Method 2 and LR, but the differences are not substantial.
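The error-margin calculation above is a one-liner:

```python
import math

# Half-width of a 95% band for an empirical coverage probability based on
# 500 replications, around the nominal 0.95 level.
margin = 1.96 * math.sqrt(0.95 * (1 - 0.95) / 500)
print(round(margin, 4))   # 0.0191, giving the band (0.9309, 0.9691)
```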

In the case of the location parameter $u$, Methods 1, 2, and LR show nearly identical performance for all sample sizes when the data are not heavily censored. However, for small sample sizes with heavily censored data, LR appears to outperform Method 1, achieving the theoretical coverage probability of 95%, while Method 1 fails to achieve the nominal level. The differences in confidence interval lengths are not substantial, although Method 1 gives slightly shorter intervals than LR.

We also discuss a graphical method for checking the adequacy of the distribution. The extreme value survival function satisfies $\log[-\log S(t)]=(t-u)/b$, so that $t=u+b\log[-\log S(t)]$. Therefore, $\log[-\log S(t)]$ is a linear function of $t$, and a plot of $\log[-\log\hat S(t)]$ versus $t$ should be roughly linear if the extreme value distribution is reasonable. When data are censored, the most widely used estimate of $S(t)$ is the Kaplan-Meier estimate (Kaplan and Meier [4]), also referred to as the product-limit estimate of the survival function, defined as
$$\hat S(t)=\prod_{j:\,t_j<t}\left(1-\frac{d_j}{n_j}\right),\tag{3.1}$$

where $d_j$ is the number of deaths at time $t_j$, and $n_j$ is the number of individuals at risk (alive and uncensored) just prior to $t_j$. In this paper, $\hat S(t)$ was used in place of $S(t)$. Figure 1, obtained for a sample size of $n=100$ under the same settings as the preceding simulations, shows a roughly linear relationship between $t$ and $\log[-\log\hat S(t)]$, although slight deviations from the linear trend are detected under heavy censoring.
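A minimal implementation of the product-limit estimate (3.1) is straightforward. This sketch returns $\hat S$ evaluated just after each distinct uncensored time, which is what the plot of $\log[-\log\hat S(t)]$ versus $t$ requires; points where $\hat S$ is 0 or 1 must be dropped before taking logarithms.

```python
import numpy as np

def kaplan_meier(t, delta):
    """Product-limit estimate (3.1): at each distinct uncensored time t_j,
    S_hat is multiplied by (1 - d_j/n_j), where d_j is the number of deaths
    at t_j and n_j the number at risk just before t_j."""
    t = np.asarray(t, float)
    delta = np.asarray(delta, int)
    times = np.unique(t[delta == 1])            # distinct uncensored times
    surv, s = [], 1.0
    for tj in times:
        nj = (t >= tj).sum()                    # at risk just before t_j
        dj = ((t == tj) & (delta == 1)).sum()   # deaths at t_j
        s *= 1.0 - dj / nj
        surv.append(s)
    return times, np.array(surv)

# Model check: with (times, surv) = kaplan_meier(t, delta), plot
# log(-log(surv)) against times and look for a roughly linear trend
# (dropping points where surv is 0 or 1).
```

With no censoring, this reduces to the empirical survival function, which provides a convenient correctness check.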

The procedures were also applied to a real data set. Pike [13] gives the results of a laboratory experiment concerning vaginal cancer in female rats. In this experiment, 19 rats were painted with the carcinogen DMBA, and the number of days until the appearance of a carcinoma was observed. At the end of the study, 17 of the 19 rats had developed a carcinoma, so two of the times are censored; the censoring proportion is $2/19\approx 0.105$. See Pike [13] and Lawless [1] for details. To check the adequacy of an extreme value model, we consider a plot of $\log[-\log\hat S(t)]$ versus $t$. Figure 2 shows this to be roughly linear, suggesting that the extreme value distribution is reasonable. Another graphical approach employed by Lawless [1] for the vaginal cancer data also confirms that the model is plausible. The estimation procedures under the extreme value distribution give $\hat u=5.4565$ and $\hat b=0.1649$, and the interval estimation results are summarized in Table 2. In this study, the sample size is small and the censoring level is low; therefore, among the three confidence intervals, the interval constructed by the likelihood ratio method would be the most reliable, especially for the scale parameter of the model.

4. Concluding Remarks

In this paper, we have investigated the inference procedures for the extreme value distribution with censored observations. The extreme value distribution is a useful model in the parametric analysis of lifetime data. Through numerical studies, the inference procedures, based on the maximum likelihood estimates, were examined. The usual normal approximation procedures were enhanced by means of the log transformation and the likelihood ratio method. By analysis of the empirical coverage probabilities and the empirical mean lengths of the confidence intervals, we have found that the likelihood ratio method is very effective for small sample sizes when data are heavily censored. A graphical method for checking the adequacy of the distribution was also discussed. The procedures were then applied to a real-world data set.