Special Issue: Variational Inequalities and Vector Optimization
Research Article | Open Access
Weighted Wilcoxon-Type Rank Test for Interval Censored Data
Interval censored (IC) failure time data are often observed in medical follow-up studies and clinical trials where subjects can only be followed periodically, and the failure time can only be known to lie in an interval. In this paper, we propose a weighted Wilcoxon-type rank test for the problem of comparing two IC samples. Under a very general sampling technique developed by Fay (1999), the mean and variance of the test statistics under the null hypothesis can be derived. Through simulation studies, we find that the performance of the proposed test is better than that of the two existing Wilcoxon-type rank tests proposed by Mantel (1967) and R. Peto and J. Peto (1972). The proposed test is illustrated by means of an example involving patients in AIDS cohort studies.
1. Introduction
Interval censored (IC) failure time data often arise from medical studies such as AIDS cohort studies and leukemic blood cancer follow-up studies. In these studies, patients are divided into two groups according to treatment. For example, in leukemic cancer studies, one group of patients was treated with radiotherapy alone, and the other group was treated with initial radiotherapy along with adjuvant chemotherapy. The two groups of patients were examined every month, and the failure time of interest is the time until the appearance of leukemia retraction; the objective is to test the difference of the failure times between the two treatments. Some of the patients missed some successive scheduled examinations and came back later with a changed clinical status, and they contributed IC observations. For convenience, we assume that in such a medical study the underlying survival function can be either discrete or continuous and that there are only finitely many scheduled examination times.
IC data provide only partial information about the lifetime of the subject and are therefore one kind of incomplete data. To deal with such incomplete data, Turnbull [1] introduced a self-consistent algorithm to compute the maximum likelihood estimate of the survival function for arbitrarily censored and truncated data. For IC data, there have been some related studies in the literature as well. For example, Mantel [2] extended Gehan’s [3, 4] generalized Wilcoxon [5] test to interval censored data, and R. Peto and J. Peto [6] developed a different version. Sun [7] applied Turnbull’s algorithm to estimate the number of failures and risks of IC data and then proposed a log-rank type test.
Fay [8], Sun [7], Zhao and Sun [9], Sun et al. [10], and Huang et al. [11] extended the log-rank test to interval censored data. Petroni and Wolfe [12] and Lim and Sun [13] generalized Pepe and Fleming’s [14] weighted Kaplan-Meier (WKM) [15] test to interval censored data.
For the purpose of comparing the power of the test statistics, Fay [8] proposed a model for generating interval censored observations. A similar selection scheme can also be seen in the urn model of Lee [16] and the mixed case model of Schick and Yu [17]. In this paper, we propose a Wilcoxon-type weighted rank test and compare it with the two existing Wilcoxon-type rank tests proposed by Mantel [2] and R. Peto and J. Peto [6]. We restrict ourselves to the Wilcoxon-type rank tests because these tests are simple to use and are robust in the sense that their powers are fairly stable under different lifetime distributions.
This paper is organized as follows. In Section 2, we review Turnbull’s [1] algorithm and introduce Fay’s [8] selection model for generating interval censored data. This selection model can be extended to a more general one, and the consistency property can be found in Schick and Yu [17]. In Section 3, we introduce Mantel’s [2] and R. Peto and J. Peto’s [6] generalized Wilcoxon-type rank tests and propose our weighted rank test. In Section 4, a simulation study is conducted to compare the performance of the three tests under different configurations. Finally, an application to an AIDS cohort study is presented in Section 5.
2. Data Treatment
Assume that is the lifetime random variable of a survival study, measured in discrete units and taking values . Let be the collection of all admissible intervals, and define , where , so that , and . Note that the observed failure time data in a clinical trial can be discretized if the underlying variable is continuous.
2.1. Turnbull’s Algorithm
Suppose that there is a sample of i.i.d. observations of , . Here, is the IC observation of the th individual in the sample, where , and . The case is to denote that the failure time of the th subject occurs after the last examination time . Turnbull  proposed an algorithm to estimate the unknown probabilities . The algorithm can be described by the following four steps.
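Step 1. Start with initial values $p_j^{(0)}$.
Step 2. Obtain improved estimates $p_j^{(1)}$ by setting
$$p_j^{(1)} = \frac{1}{n} \sum_{i=1}^{n} \frac{\alpha_{ij}\, p_j^{(0)}}{\sum_{k} \alpha_{ik}\, p_k^{(0)}},$$
where $n$ is the sample size, $p_j^{(0)}$ is the current estimate of the $j$th unknown probability, and $\alpha_{ij} = 1$ if the $j$th mass point lies in the $i$th observed interval and $\alpha_{ij} = 0$ otherwise.
Step 3. Return to Step 2 with $p^{(1)}$ replacing $p^{(0)}$.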
Step 4. Stop when the required accuracy has been achieved.
The algorithm is simple and converges fairly rapidly. The estimate yielded from the iteration is in fact the unique maximum likelihood estimate of and is a self-consistent estimate.
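The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `turnbull`, the arguments `points`, `tol`, and `max_iter`, the uniform starting values, and the maximum-change stopping rule are all choices made here for the example. Observations are treated as half-open intervals (L, R], and `float("inf")` stands for a failure after the last examination.

```python
def turnbull(intervals, points, tol=1e-10, max_iter=10000):
    """Self-consistency iteration for interval censored data.

    intervals: list of (L, R) pairs, each read as the interval (L, R].
    points:    candidate mass points (assumed input for this sketch).
    Returns the estimated probability mass at each point.
    """
    n, m = len(intervals), len(points)
    # alpha[i][j] = 1 when mass point j lies inside the i-th interval
    alpha = [[1.0 if left < t <= right else 0.0 for t in points]
             for (left, right) in intervals]
    p = [1.0 / m] * m  # Step 1: uniform initial values (an assumption)
    for _ in range(max_iter):
        new_p = [0.0] * m
        for row in alpha:
            denom = sum(a * pj for a, pj in zip(row, p))
            if denom == 0.0:
                raise ValueError("an interval covers no mass point")
            for j in range(m):
                # Step 2: share each observation among the points it covers
                new_p[j] += row[j] * p[j] / denom
        new_p = [v / n for v in new_p]
        # Step 4: stop once successive estimates agree to within tol
        if max(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return new_p
        p = new_p  # Step 3: iterate again with the improved estimates
    return p


inf = float("inf")
# Each interval covers exactly one mass point, so the estimate is
# simply the vector of empirical proportions.
print(turnbull([(0, 1), (0, 1), (1, 2), (2, inf)], [1, 2, inf]))
# prints [0.5, 0.25, 0.25]
```

Each pass distributes every observation over the mass points it covers in proportion to the current estimates, so the updated vector always sums to one.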
2.2. Return Probability Model
To mimic periodic clinical inspection, Fay [8] proposed a simulation model for generating IC data. He assumed that the indicators of whether a patient returns to the clinic for inspection at the time points are i.i.d. Bernoulli random variables ; that is, , , , . means that the patient returned to the clinic at the inspection time , and means that the patient missed the inspection. In our model, we always assume that . The failure time is independent of , and the observable random interval is
2.2.1. Model Consistency
Under Fay’s [8] selection model, the consistency property has been proved. This selection model can be generalized to the case in which the return probability may differ across examination time points; say that , . To demonstrate the generalized return model, we set and , , and . The selection probabilities for all admissible intervals are shown in Tables 1 and 2.
It is not difficult to see that the selection probability of the interval is where , , and . For instance, the interval may be selected under two possibilities. First, the true value of is , and the patient who missed the inspection at then goes to inspection at ; in this case, the interval is selected with probability . Second, the true value of is , and the patient missed the inspection at then goes to inspection at ; in this case, the interval is selected with probability , and therefore .
The generalized return probability model can be viewed as a special case of the mixed case model in Schick and Yu [17]; under very mild conditions, the estimate of computed by Turnbull’s algorithm is still consistent.
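As a worked illustration of how selection probabilities arise under a return model with visit-specific return probabilities, the sketch below enumerates every admissible interval. It rests on assumptions that are not taken from the text: the failure time places mass `q[k-1]` on inspection time k for k = 1, ..., m, with the last entry of `q` holding the mass beyond the final inspection; time 0 and "infinity" always act as endpoints; and a failure is detected at the first attended inspection at or after it. The function name `selection_probabilities` and its arguments are hypothetical.

```python
from math import prod


def selection_probabilities(q, r):
    """Selection probability of each admissible interval (t_i, t_j].

    q: masses of the failure time on t_1..t_m plus, as last entry,
       the mass beyond t_m (assumed discretization).
    r: return probabilities at t_1..t_m (may differ by visit).
    Returns a dict keyed by index pairs (i, j), with 0 <= i < j <= m + 1.
    """
    m = len(r)
    if len(q) != m + 1:
        raise ValueError("q needs one more entry than r")
    # Index 0 (time zero) and index m + 1 (infinity) are always endpoints.
    attend = [1.0] + list(r) + [1.0]
    probs = {}
    for i in range(m + 1):
        for j in range(i + 1, m + 2):
            # every inspection strictly between t_i and t_j was missed
            missed = prod(1.0 - attend[l] for l in range(i + 1, j))
            # the failure fell somewhere in (t_i, t_j]
            mass = sum(q[k - 1] for k in range(i + 1, j + 1))
            probs[(i, j)] = attend[i] * missed * attend[j] * mass
    return probs


probs = selection_probabilities([0.2, 0.3, 0.4, 0.1], [0.5, 0.7, 0.9])
print(round(sum(probs.values()), 12))  # the probabilities sum to one
```

With these inputs, the interval ending at the first inspection has probability 0.5 × 0.2 = 0.1, because the patient must attend that inspection and the failure must already have occurred.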
3. Wilcoxon-Type Rank Tests for Interval Censored Data
The two-sample Wilcoxon rank test is a well-known method for testing whether two samples of exact data come from the same population. The method ranks the pooled samples and assigns an appropriate rank to each observation. However, this ranking technique is in general not applicable to intervals. In this section, we discuss how to generalize the ranking technique and then propose a Wilcoxon-type rank test for IC data to compare with the two existing rank tests proposed by Mantel [2] and R. Peto and J. Peto [6]. Suppose that two samples of IC data for and are, respectively, , and , . Testing whether these two samples come from the same population is equivalent to testing the equality of the survival functions $S_1$ and $S_2$; that is, $H_0\colon S_1(t) = S_2(t)$ for all $t$.
3.1. Mantel’s Test
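Mantel [2] extended Gehan’s [3, 4] generalized Wilcoxon test to interval censored data by defining the score $U_i$ of the $i$th observation in the pooled sample as the number of observations that are definitely greater than the $i$th observation minus the number of observations that are definitely less than the $i$th observation. He proposed the test statistic
$$W = \sum_{i \in \text{sample 1}} U_i.$$
Under $H_0$, the test statistic is approximately normally distributed with mean 0 and variance
$$\operatorname{Var}(W) = \frac{n_1 n_2}{N(N-1)} \sum_{i=1}^{N} U_i^2,$$
where $n_1$ and $n_2$ are the two sample sizes and $N = n_1 + n_2$.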
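The scoring rule described above can be sketched as follows. The sketch makes several assumptions of its own: observations are intervals (L, R]; one observation is "definitely greater" than another when its left endpoint is at least the other's right endpoint; the statistic is the sum of the scores in the first sample; and its variance is the usual permutation variance n1·n2/(N(N−1)) times the sum of squared scores. The function name `mantel_test` is hypothetical.

```python
from math import sqrt


def mantel_test(sample1, sample2):
    """Gehan/Mantel-type score test for two samples of intervals (L, R]."""
    pooled = list(sample1) + list(sample2)
    n1, n2 = len(sample1), len(sample2)
    N = n1 + n2
    scores = []
    for (L_i, R_i) in pooled:
        # observations lying entirely to the right of observation i
        greater = sum(1 for (L_k, R_k) in pooled if L_k >= R_i)
        # observations lying entirely to the left of observation i
        less = sum(1 for (L_k, R_k) in pooled if R_k <= L_i)
        scores.append(greater - less)
    W = sum(scores[:n1])  # sum of scores in the first sample
    var = n1 * n2 / (N * (N - 1)) * sum(u * u for u in scores)
    z = W / sqrt(var) if var > 0 else 0.0
    return W, var, z


W, var, z = mantel_test([(0, 1), (0, 1)], [(1, 2), (1, 2)])
print(W, var, z)
```

In this toy example each interval in the first sample has two observations definitely greater than it and none definitely less, so its score is 2, and W = 4. Overlapping intervals contribute nothing to each other's scores, which is why heavy censoring reduces the information this test can use.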
3.2. R. Peto and J. Peto’s Test
Unlike Mantel’s generalized version, R. Peto and J. Peto [6] defined the score of the $i$th observation as where is the estimated survival function, ; hence, . They proposed the test statistic Under $H_0$, the test statistic is approximately distributed as $\chi^2_1$.
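A score test of this kind can be sketched as below. The score used here, S(L) + S(R) − 1 for an observation (L, R], is the form commonly cited for Peto-type Wilcoxon scores with interval censored data; it is an assumption of this sketch and is not recovered from the text above. The chi-square statistic built from the permutation variance is likewise one common construction. The function name `peto_test` and the callable `surv` (an estimate of P(T > t) from the pooled sample, for instance via Turnbull's algorithm) are hypothetical.

```python
def peto_test(sample1, sample2, surv):
    """Peto-type score test for two samples of intervals (L, R].

    surv: callable returning the estimated survival probability
          P(T > t); it must return 0 for t = infinity.
    """
    pooled = list(sample1) + list(sample2)
    n1, n2 = len(sample1), len(sample2)
    N = n1 + n2
    # assumed score: S(L) + S(R) - 1 for the interval (L, R]
    scores = [surv(L) + surv(R) - 1.0 for (L, R) in pooled]
    stat = sum(scores[:n1])
    var = n1 * n2 / (N * (N - 1)) * sum(c * c for c in scores)
    chi2 = stat * stat / var if var > 0 else 0.0
    return scores, chi2


# Toy survival estimate on the grid {0, 1, 2}; the values are illustrative only.
toy_surv = {0: 1.0, 1: 0.5, 2: 0.0}.get
scores, chi2 = peto_test([(0, 1), (0, 1)], [(1, 2), (1, 2)], toy_surv)
print(scores, chi2)
```

The resulting statistic would be compared with a chi-square distribution with one degree of freedom. Unlike counting scores, these scores use the estimated survival function, so overlapping intervals still contribute information.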
3.3. Our Proposed Wilcoxon-Type Weighted Rank Test
To transform IC data into exact data, we first assign each inspection time a primary rank ; for instance, . Rewrite any observation, say , as , where , and . Then, we associate the observation with the weighted rank
Let , be, respectively, the average weighted ranks of the and samples, so that
To test whether two IC samples come from the same population, we propose the test statistic
Under $H_0$, the central limit theorem implies that W.R.T is approximately distributed as a standard normal random variable. However, the mean and variance of and may depend on the probability space on which they are defined; that is, different selection probabilities for IC intervals in (4) lead to different means and variances of and . We therefore consider only the selection model of Fay defined in Section 2.2. In this model, the selection probability of an IC interval falls into one of the following categories:
Consider the probability space (), where the probability measure is defined in Section 2. To compute the variance of and , we define a random variable on this space by assigning value to the interval in , where
The value can be viewed as the weighted rank of . If , are chosen as in the Wilcoxon test for exact data, then our proposed test statistic W.R.T is a Wilcoxon-type weighted rank test. Under this probability space, the expectation can be simplified as in the following theorem.
Theorem 1. Suppose that is the random variable defined on the probability space according to (17). Then, the expectation of , , can be simplified as which is independent of the choice of .
Proof. It is obvious that can be written as , where the coefficients , are to be determined. The theorem is, hence, proved if we can show that all the coefficients are ones.
Consider first. An interval contributes in if and only if it contains the point . Therefore, it must be of the form , . For intervals , , the probabilities are defined in (13). For interval , the probability is defined in (14). Therefore, the coefficient is
Next, consider the coefficient for . An interval contributes if and only if it contains the point . Therefore, it must be of the form , where . It is necessary to study the contribution of the interval to in four different categories.
(i) . By (13), this category contributes .
(ii) . By (14), the interval contributes .
(iii) . By (15), this category contributes .
(iv) . By (16), this category contributes .
Consequently, the coefficient of is Finally, the proof for the case is
The variance of , , is where and are the selected probability and the weighted rank of the th admissible interval of , respectively, .
From formulas (13)–(16), the selection probability depends on and ; therefore, the likelihood function can be written as where , and are positive integers determined by the sample. Since the probability can be estimated by Turnbull’s [1] algorithm discussed in Section 2.1, and can also be estimated by trivially.
For demonstration, we set , inspection times , , and the true lifetime is exponentially distributed with . For different sample sizes , 100, and 150, different return probabilities of inspection , , and 0.3, and simulation with 100 replications, Table 3 presents the estimates of and the sample variance and sample standard deviation of . To show the normality of W.R.T, we assume that the two populations (sample size ) come from the same exponential (1/5) distribution. By simulation with 10000 replications and different return probabilities of inspection , , and 0.3, Table 4 presents the quantiles of W.R.T and . Figure 1 shows the CDF plots of and W.R.T with .
4. Simulation Study
In this section, we carry out simulation studies to compare the performance of the W.R.T test with Mantel’s [2] and Peto’s [6] tests. In the study, we assume that the failure time random variable is exponentially distributed, total sample sizes are and 200, and each sample has subjects. The interval censored data are generated by the following steps.
Step 1. Generate a failure time from some distribution.
Step 2. Create a 0, 1 sequence with probabilities , , and .
Step 3. The observation is , if , and .
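The generation steps above can be sketched as follows. Several details are assumptions of this sketch rather than statements from the text: the failure time is exponential with hazard `rate`; each inspection in `times` is attended independently with the same probability `p`; the left endpoint is the last attended inspection before the failure (or 0 if none); and the right endpoint is the first attended inspection at or after the failure (or infinity if none). The function name `generate_ic_observation` and its arguments are hypothetical.

```python
import random


def generate_ic_observation(rate, times, p, rng=random):
    """Draw one interval censored observation (L, R] under a return model.

    rate:  hazard of the exponential failure time (assumed distribution).
    times: scheduled inspection times in increasing order.
    p:     probability of attending each inspection.
    Returns ((L, R), t), where t is the unobserved true failure time.
    """
    t = rng.expovariate(rate)                     # Step 1: failure time
    attended = [rng.random() < p for _ in times]  # Step 2: 0/1 sequence
    left, right = 0.0, float("inf")
    for s, came in zip(times, attended):          # Step 3: bracket t
        if not came:
            continue
        if s < t:
            left = s    # latest attended inspection before the failure
        else:
            right = s   # first attended inspection at or after the failure
            break
    return (left, right), t


rng = random.Random(2013)  # fixed seed so the sketch is reproducible
sample = [generate_ic_observation(1 / 3, [1, 2, 3, 4, 5, 6], 0.5, rng)[0]
          for _ in range(5)]
print(sample)
```

Lower values of `p` produce wider intervals on average, which is the mechanism by which the return probability controls the severity of censoring in the simulation.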
We consider three return probabilities, , , and 0.3, two sets of inspection time points, , and 1000 replications at significance level 0.05.
In the case of , 6 return points, we set the hazards 1/3 for population 1 and for population 2. Figure 2 shows the density plot of exponential distribution with , −0.2, 0, 0.2, 0.4. In the case of , 10 return points, we set the hazards 1/4 for population 1 and for population 2. Figure 3 shows the density plot of exponential distribution with , −0.3, 0, 0.3, 0.6. Tables 5 and 6 present the powers of the three tests with sample size and 200. The simulation results show that, when the failure times come from the exponential distribution, our proposed test W.R.T is the most powerful of the three tests.
5. An Application to AIDS Cohort Study
Consider the data of 262 hemophilia patients in De Gruttola and Lagakos [18]. Among them, 105 patients received at least 1,000 μg/kg of blood factor for at least one year between 1982 and 1985, and the other 157 patients received less than 1,000 μg/kg in each year. In this medical study, patients were treated between 1978 and 1988, and the IC observations for the 262 patients are based on a discretization of the time axis into 6-month intervals. The failure time of interest is the time of HIV seroconversion. The objective is to test the difference of the failure times between the two treatments. Applying our proposed test (W.R.T) and Mantel’s [2] and Peto’s [6] tests to this data set, the values of the three test statistics are −7.815, −7.352, and 56.476, respectively. All three $p$-values are less than 0.001, and the three tests lead to the same conclusion: the time to HIV seroconversion differs significantly between the two groups of patients.
References
[1] B. W. Turnbull, “The empirical distribution function with arbitrarily grouped, censored and truncated data,” Journal of the Royal Statistical Society B, vol. 38, no. 3, pp. 290–295, 1976.
[2] N. Mantel, “Ranking procedures for arbitrarily restricted observation,” Biometrics, vol. 23, no. 1, pp. 65–78, 1967.
[3] E. A. Gehan, “A generalized Wilcoxon test for comparing arbitrarily singly-censored samples,” Biometrika, vol. 52, no. 1-2, pp. 203–223, 1965.
[4] E. A. Gehan, “A generalized two-sample Wilcoxon test for doubly censored data,” Biometrika, vol. 52, no. 3-4, pp. 650–653, 1965.
[5] F. Wilcoxon, “Individual comparisons by ranking methods,” Biometrics Bulletin, vol. 1, no. 6, pp. 80–83, 1945.
[6] R. Peto and J. Peto, “Asymptotically efficient rank invariant test procedures,” Journal of the Royal Statistical Society A, vol. 135, no. 2, pp. 185–206, 1972.
[7] J. Sun, “A non-parametric test for interval-censored failure time data with application to AIDS studies,” Statistics in Medicine, vol. 15, no. 13, pp. 1387–1395, 1996.
[8] M. P. Fay, “Comparing several score tests for interval censored data,” Statistics in Medicine, vol. 18, no. 3, pp. 273–285, 1999.
[9] Q. Zhao and J. Sun, “Generalized log-rank test for mixed interval-censored failure time data,” Statistics in Medicine, vol. 23, no. 10, pp. 1621–1629, 2004.
[10] J. Sun, Q. Zhao, and X. Zhao, “Generalized log-rank tests for interval-censored failure time data,” Scandinavian Journal of Statistics, vol. 32, no. 1, pp. 49–57, 2005.
[11] J. Huang, C. Lee, and Q. Yu, “A generalized log-rank test for interval-censored failure time data via multiple imputation,” Statistics in Medicine, vol. 27, no. 17, pp. 3217–3226, 2008.
[12] G. R. Petroni and R. A. Wolfe, “A two-sample test for stochastic ordering with interval-censored data,” Biometrics, vol. 50, no. 1, pp. 77–87, 1994.
[13] H.-J. Lim and J. Sun, “Nonparametric tests for interval-censored failure time data,” Biometrical Journal, vol. 45, no. 3, pp. 263–276, 2003.
[14] M. S. Pepe and T. R. Fleming, “Weighted Kaplan-Meier statistics: a class of distance tests for censored survival data,” Biometrics, vol. 45, no. 2, pp. 497–507, 1989.
[15] E. L. Kaplan and P. Meier, “Nonparametric estimation from incomplete observations,” Journal of the American Statistical Association, vol. 53, no. 282, pp. 457–481, 1958.
[16] C. Lee, “An urn model in the simulation of interval censored failure time data,” Statistics & Probability Letters, vol. 45, no. 2, pp. 131–139, 1999.
[17] A. Schick and Q. Yu, “Consistency of the GMLE with mixed case interval-censored data,” Scandinavian Journal of Statistics, vol. 27, no. 1, pp. 45–55, 2000.
[18] V. De Gruttola and S. W. Lagakos, “Analysis of doubly-censored survival data, with application to AIDS,” Biometrics, vol. 45, no. 1, pp. 1–11, 1989.
Copyright © 2013 Ching-fu Shen et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.