Power Loss of Stratified Log-Rank Test in Homogeneous Samples
We study the loss of power of the stratified log-rank test (SLRT) relative to the unstratified log-rank test (ULRT) when there are many strata, each with a relatively small sample size. We do this by comparing the asymptotic distributions of the two test statistics under local alternatives. The SLRT tends to lose information through overstratification. It is therefore advisable to test the homogeneity of the strata before using the stratified log-rank test.
It is well known in survival analysis that the (unstratified) log-rank test (ULRT) is the most efficient invariant test under contiguous alternatives in the proportional hazards model [1, 2]. Gill [2] gave an elegant proof of this result using the Cauchy-Schwarz inequality.
In multicenter clinical trials with time to event as the primary outcome variable, we want to compare the effects of two or more treatments. In the example in Section 6, patients were randomized to two treatment groups in a fixed ratio. The primary outcome is the time to death from any cause during the study. Individuals from different centers are assumed to be independent. Even if the treatment effect can be assumed to be the same across centers, each center may have its own characteristics that make the baseline hazard functions differ from center to center. For this kind of data, the stratified log-rank test (SLRT) can be used to account for the differences in baseline hazards.
Previous work has examined the power loss of the log-rank test. Akazawa et al. evaluated through simulation the loss of power of the stratified log-rank test due to heterogeneity in clinical trials. In general, the power of the stratified log-rank test decreases for two reasons: (i) the stratum sizes may be too small, and (ii) the individuals within a stratum may be heterogeneous. Their simulation shows that the loss of power is substantial when the stratum size is very small and “the total number of failures and the treatment effect are fixed”. The stratified Cox regression model shows why: a small stratum with censored data contributes little to the overall test, which may decrease the power of the stratified log-rank test. Note that the stratified log-rank test is the score test from the stratified partial likelihood (see Section 3).
In this paper, we consider the case where there is a large number of strata, but each stratum has a relatively small sample size. We assume that patients are homogeneous within each treatment group. For this kind of data, we can construct both the stratified and unstratified log-rank tests. We derive a variance relation between the SLRT and ULRT and quantify the power loss due to unnecessary stratification by this relation. We illustrate our approach with data from a multi-center clinical trial (MADIT II) to test the treatment effect of an implantable defibrillator on survival of patients with reduced left ventricular function after myocardial infarction.
This paper is organized as follows. Data and notation are described in Section 2. We derive the SLRT and ULRT from the Cox proportional hazards regression model, together with their asymptotic distributions under local alternatives, in Sections 3 and 4. A relation between the SLRT and ULRT is developed in Section 5. We apply the approach to the MADIT II study data in Section 6 and offer concluding remarks in Section 7.
2. Data Structure and Notation
Suppose there are $K$ centers, with $n_{ki}$ patients randomized to the treatment group ($i = 1$) or the control group ($i = 0$) in center $k$, $k = 1, \dots, K$. Assume that $(n_{k0}, n_{k1})$, $k = 1, \dots, K$, are independent and identically distributed (iid) pairs of positive integer random variables with finite second moments.
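The underlying survival times $T_{kij}$, $j = 1, \dots, n_{ki}$, are subject to random censoring with censoring times $C_{kij}$. Here we assume that the $C_{kij}$ are independent of the $T_{kij}$ and of $(n_{k0}, n_{k1})$. We further assume that the $C_{kij}$ are iid. Due to the censoring, the observable data are
$$(X_{kij}, \delta_{kij}), \qquad X_{kij} = \min(T_{kij}, C_{kij}), \qquad \delta_{kij} = I(T_{kij} \le C_{kij}).$$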
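Define the stochastic processes $N_{kij}(t)$ and $Y_{kij}(t)$ by
$$N_{kij}(t) = I(X_{kij} \le t,\ \delta_{kij} = 1), \qquad Y_{kij}(t) = I(X_{kij} \ge t).$$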
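Therefore, $N_{kij}(t) = 1$ if and only if the event has been observed by time $t$, and $Y_{kij}(t) = 1$ if and only if the patient is still at risk immediately before $t$. Note that $Y_{kij}(t)\,dN_{kij}(t) = 1$ if and only if $Y_{kij}(t) = 1$ and $dN_{kij}(t) = 1$. The latter means $X_{kij} = t$ and $\delta_{kij} = 1$, which already implies $Y_{kij}(t) = 1$. Therefore, $Y_{kij}(t)\,dN_{kij}(t) = dN_{kij}(t)$ for all $t$. Define
$$N_{ki}(t) = \sum_{j=1}^{n_{ki}} N_{kij}(t), \qquad Y_{ki}(t) = \sum_{j=1}^{n_{ki}} Y_{kij}(t), \qquad i = 0, 1.$$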
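To make these definitions concrete, the following Python sketch evaluates $N_{kij}(t)$, $Y_{kij}(t)$ and their group sums for a toy group of three patients. The `Subject` class, the helper names and the data are hypothetical and serve only as an illustration. The sketch also checks the identity $Y_{kij}(t)\,dN_{kij}(t) = dN_{kij}(t)$ at the observed times.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    x: float    # observed time X = min(T, C)
    delta: int  # 1 if the event was observed (T <= C), 0 if censored


def N(s: Subject, t: float) -> int:
    """Counting process: 1 if the event has been observed by time t."""
    return int(s.x <= t and s.delta == 1)


def Y(s: Subject, t: float) -> int:
    """At-risk indicator: 1 if the subject is still at risk just before t."""
    return int(s.x >= t)


def dN(s: Subject, t: float) -> int:
    """Jump of N at t: equals 1 only at an observed event time."""
    return int(s.x == t and s.delta == 1)


def group_counts(subjects, t):
    """Group-level processes (N_ki(t), Y_ki(t)) for one center and arm."""
    return sum(N(s, t) for s in subjects), sum(Y(s, t) for s in subjects)


# Hypothetical toy group: events at 2.0 and 5.0, one censoring at 3.5.
group = [Subject(2.0, 1), Subject(3.5, 0), Subject(5.0, 1)]
for t in (1.0, 2.0, 3.5, 5.0, 6.0):
    print(t, group_counts(group, t))

# Y(t) dN(t) = dN(t): an observed jump always happens while the subject is at risk.
assert all(Y(s, t) * dN(s, t) == dN(s, t) for s in group for t in (2.0, 3.5, 5.0))
```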
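Suppose that the hazard functions of $T_{kij}$ are of the form
$$\lambda_{ki}(t) = \lambda_{0k}(t)\,e^{\beta i}, \qquad i = 0, 1,$$
where $\lambda_{0k}(t)$ is the baseline hazard function in center $k$ and $\beta$ is the log hazard ratio of treatment versus control.
For homogeneous samples, $\lambda_{0k}(t) = \lambda_0(t)$ for all $k$.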
3. Stratified Log-Rank Test
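The stratified log partial likelihood is
$$\ell_s(\beta) = \sum_{k=1}^{K} \int_0^\infty \Big[\beta\, dN_{k1}(t) - \log\{Y_{k0}(t) + Y_{k1}(t)e^{\beta}\}\, dN_k(t)\Big],$$
where $N_k(t) = N_{k0}(t) + N_{k1}(t)$. The first two derivatives of the log partial likelihood are
$$U_s(\beta) = \ell_s'(\beta) = \sum_{k=1}^{K} \int_0^\infty \Big[dN_{k1}(t) - \frac{Y_{k1}(t)e^{\beta}}{Y_{k0}(t) + Y_{k1}(t)e^{\beta}}\, dN_k(t)\Big],$$
$$-\ell_s''(\beta) = \sum_{k=1}^{K} \int_0^\infty \frac{Y_{k0}(t)Y_{k1}(t)e^{\beta}}{\{Y_{k0}(t) + Y_{k1}(t)e^{\beta}\}^2}\, dN_k(t).$$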
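These derivative formulas can be checked numerically. The sketch below assumes a hypothetical layout in which each stratum is a list of `(time, status, group)` tuples with no tied event times. It evaluates $\ell_s(\beta)$, $U_s(\beta)$ and $-\ell_s''(\beta)$ by summing over observed events and compares the last two with central finite differences of $\ell_s$. The data are made up for illustration.

```python
import math

# Hypothetical toy data: each stratum is a list of (time, status, group) tuples,
# status 1 = event, 0 = censored; group 1 = treatment, 0 = control. No tied times.
strata = [
    [(1.0, 1, 1), (2.0, 1, 0), (3.0, 0, 1), (4.0, 1, 0)],
    [(0.5, 1, 0), (1.5, 1, 1), (2.5, 1, 1)],
]


def at_risk(stratum, t):
    """Return (Y_k0(t), Y_k1(t)): numbers at risk just before t in each group."""
    y0 = sum(1 for x, _, g in stratum if x >= t and g == 0)
    y1 = sum(1 for x, _, g in stratum if x >= t and g == 1)
    return y0, y1


def stratified_pl(beta, strata):
    """Return (l_s(beta), U_s(beta), -l_s''(beta)) for the stratified partial likelihood."""
    eb = math.exp(beta)
    ll = score = info = 0.0
    for stratum in strata:
        for x, status, g in stratum:
            if status != 1:
                continue  # censored subjects only enter through the risk sets
            y0, y1 = at_risk(stratum, x)
            denom = y0 + y1 * eb
            ll += beta * g - math.log(denom)
            score += g - y1 * eb / denom
            info += y0 * y1 * eb / denom ** 2
    return ll, score, info


beta, h = 0.3, 1e-4
ll, score, info = stratified_pl(beta, strata)
ll_plus = stratified_pl(beta + h, strata)[0]
ll_minus = stratified_pl(beta - h, strata)[0]
fd_score = (ll_plus - ll_minus) / (2 * h)          # approximates l_s'(beta)
fd_info = -(ll_plus - 2 * ll + ll_minus) / h ** 2  # approximates -l_s''(beta)
print(f"U_s: analytic {score:.8f}, finite difference {fd_score:.8f}")
print(f"-l_s'': analytic {info:.6f}, finite difference {fd_info:.6f}")
```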
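Under the null hypothesis $H_0: \beta = 0$ (no treatment effect),
$$U_s(0) = \sum_{k=1}^{K} \int_0^\infty \Big[dN_{k1}(t) - \frac{Y_{k1}(t)}{Y_k(t)}\, dN_k(t)\Big] = \sum_{k=1}^{K} \int_0^\infty \frac{Y_{k0}(t)\,dN_{k1}(t) - Y_{k1}(t)\,dN_{k0}(t)}{Y_k(t)},$$
where $Y_k(t) = Y_{k0}(t) + Y_{k1}(t)$. Let
$$M_{ki}(t) = N_{ki}(t) - \int_0^t Y_{ki}(u)\,\lambda_{0k}(u)\,du, \qquad i = 0, 1,$$
be the counting-process martingales under $H_0$, so that $U_s(0) = \sum_{k} \int_0^\infty \{Y_{k0}(t)\,dM_{k1}(t) - Y_{k1}(t)\,dM_{k0}(t)\}/Y_k(t)$.
The predictable variation of $U_s(0)$ is
$$\langle U_s(0)\rangle = \sum_{k=1}^{K} \int_0^\infty \frac{Y_{k0}(t)Y_{k1}(t)}{Y_k(t)}\,\lambda_{0k}(t)\,dt.$$
Under the null hypothesis, $K^{-1/2}U_s(0) \xrightarrow{d} N(0, \sigma_s^2)$ as $K \to \infty$, where $\sigma_s^2$, the probability limit of $K^{-1}\langle U_s(0)\rangle$, can be consistently estimated by $\hat\sigma_s^2 = -K^{-1}\ell_s''(0) = K^{-1}\sum_{k}\int_0^\infty Y_{k0}(t)Y_{k1}(t)Y_k(t)^{-2}\,dN_k(t)$. The stratified log-rank test is defined as
$$\mathrm{SLRT} = \frac{K^{-1/2}U_s(0)}{\hat\sigma_s}.$$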
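As a minimal sketch of this definition, the function below computes $U_s(0)$, $-\ell_s''(0)$ and $\mathrm{SLRT} = U_s(0)/\sqrt{-\ell_s''(0)}$. The factors of $K$ cancel in this ratio. It assumes the same hypothetical `(time, status, group)` layout as before and handles tied event times within a stratum by grouping events at each distinct time, using the Breslow-type information above. The third toy stratum has no events and, as expected, contributes nothing.

```python
import math
from collections import defaultdict


def stratified_logrank(strata):
    """Stratified log-rank test under H0: beta = 0.

    strata: list of strata; each stratum is a list of (time, status, group)
    tuples (status 1 = event, 0 = censored; group 1 = treatment, 0 = control).
    Returns (U_s(0), -l_s''(0), SLRT).
    """
    score = info = 0.0
    for stratum in strata:
        # events[t] = [control events, treatment events] at distinct time t
        events = defaultdict(lambda: [0, 0])
        for x, status, g in stratum:
            if status == 1:
                events[x][g] += 1
        for t, (d0, d1) in events.items():
            y0 = sum(1 for x, _, g in stratum if x >= t and g == 0)
            y1 = sum(1 for x, _, g in stratum if x >= t and g == 1)
            y, d = y0 + y1, d0 + d1
            score += d1 - d * y1 / y      # observed minus expected treatment events
            info += d * y0 * y1 / y ** 2  # contribution to -l_s''(0)
    slrt = score / math.sqrt(info) if info > 0 else float("nan")
    return score, info, slrt


# Hypothetical toy data; the last stratum has no events.
strata = [
    [(1.0, 1, 1), (2.0, 1, 0), (3.0, 0, 1), (4.0, 1, 0)],
    [(0.5, 1, 0), (1.5, 1, 1), (2.5, 1, 1)],
    [(1.0, 0, 0), (2.0, 0, 1)],
]
U_s, info_s, slrt = stratified_logrank(strata)
print(U_s, info_s, slrt)
```

On these toy data, the first stratum contributes $1/6$ to the score and $17/36$ to the information. The second contributes $-2/3$ and $2/9$, giving $\mathrm{SLRT} = -0.5/\sqrt{25/36} = -0.6$.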
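The asymptotic distribution of SLRT under the local alternatives $\beta_K = b/\sqrt{K}$, for a fixed constant $b$, is
$$\mathrm{SLRT} \xrightarrow{d} N(b\sigma_s, 1), \qquad K \to \infty,$$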
which is derived in the appendix.
4. Unstratified Log-Rank Test
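As with the SLRT, the ULRT can be derived from the Cox proportional hazards model. Pooling all centers, write $N_{\cdot i}(t) = \sum_k N_{ki}(t)$, $Y_{\cdot i}(t) = \sum_k Y_{ki}(t)$, $N(t) = N_{\cdot0}(t) + N_{\cdot1}(t)$ and $Y(t) = Y_{\cdot0}(t) + Y_{\cdot1}(t)$. The log partial likelihood function is
$$\ell_u(\beta) = \int_0^\infty \Big[\beta\, dN_{\cdot1}(t) - \log\{Y_{\cdot0}(t) + Y_{\cdot1}(t)e^{\beta}\}\, dN(t)\Big].$$
The first- and second-order derivatives of the log partial likelihood are
$$U_u(\beta) = \ell_u'(\beta) = \int_0^\infty \Big[dN_{\cdot1}(t) - \frac{Y_{\cdot1}(t)e^{\beta}}{Y_{\cdot0}(t) + Y_{\cdot1}(t)e^{\beta}}\, dN(t)\Big],$$
$$-\ell_u''(\beta) = \int_0^\infty \frac{Y_{\cdot0}(t)Y_{\cdot1}(t)e^{\beta}}{\{Y_{\cdot0}(t) + Y_{\cdot1}(t)e^{\beta}\}^2}\, dN(t).$$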
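Under the null hypothesis $H_0: \beta = 0$,
$$U_u(0) = \int_0^\infty \frac{Y_{\cdot0}(t)Y_{\cdot1}(t)}{Y(t)}\Big\{\frac{dN_{\cdot1}(t)}{Y_{\cdot1}(t)} - \frac{dN_{\cdot0}(t)}{Y_{\cdot0}(t)}\Big\}, \tag{18}$$
where (18) is the general form of the log-rank statistic used in the literature. Under $H_0$ in homogeneous samples, let
$$M_{\cdot i}(t) = N_{\cdot i}(t) - \int_0^t Y_{\cdot i}(u)\,\lambda_0(u)\,du, \qquad i = 0, 1.$$
The predictable variation of $U_u(0)$ is
$$\langle U_u(0)\rangle = \int_0^\infty \frac{Y_{\cdot0}(t)Y_{\cdot1}(t)}{Y(t)}\,\lambda_0(t)\,dt.$$
Under the null hypothesis, $K^{-1/2}U_u(0) \xrightarrow{d} N(0, \sigma_u^2)$, where $\sigma_u^2$, the probability limit of $K^{-1}\langle U_u(0)\rangle$, can be consistently estimated by $\hat\sigma_u^2 = -K^{-1}\ell_u''(0) = K^{-1}\int_0^\infty Y_{\cdot0}(t)Y_{\cdot1}(t)Y(t)^{-2}\,dN(t)$. The unstratified log-rank test is defined as
$$\mathrm{ULRT} = \frac{K^{-1/2}U_u(0)}{\hat\sigma_u}.$$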
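The ULRT simply ignores the center labels. The sketch below pools all strata into a single risk set and applies the same computation, again assuming the hypothetical `(time, status, group)` layout. On the toy data used earlier, the pooled information exceeds the stratified one. This is a single small example, not evidence of the general result.

```python
import math
from collections import defaultdict


def unstratified_logrank(strata):
    """Unstratified log-rank test: pool all strata, then compute U_u(0), -l_u''(0), ULRT."""
    pooled = [obs for stratum in strata for obs in stratum]
    events = defaultdict(lambda: [0, 0])  # events[t] = [control, treatment] events at t
    for x, status, g in pooled:
        if status == 1:
            events[x][g] += 1
    score = info = 0.0
    for t, (d0, d1) in events.items():
        y0 = sum(1 for x, _, g in pooled if x >= t and g == 0)
        y1 = sum(1 for x, _, g in pooled if x >= t and g == 1)
        y, d = y0 + y1, d0 + d1
        score += d1 - d * y1 / y      # observed minus expected treatment events
        info += d * y0 * y1 / y ** 2  # contribution to -l_u''(0)
    ulrt = score / math.sqrt(info) if info > 0 else float("nan")
    return score, info, ulrt


# Same hypothetical toy strata as in the SLRT sketch.
strata = [
    [(1.0, 1, 1), (2.0, 1, 0), (3.0, 0, 1), (4.0, 1, 0)],
    [(0.5, 1, 0), (1.5, 1, 1), (2.5, 1, 1)],
    [(1.0, 0, 0), (2.0, 0, 1)],
]
U_u, info_u, ulrt = unstratified_logrank(strata)
print(U_u, info_u, ulrt)
```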
With the same method as in the appendix, the asymptotic distribution of ULRT under the local alternatives $\beta_K = b/\sqrt{K}$ is
$$\mathrm{ULRT} \xrightarrow{d} N(b\sigma_u, 1). \tag{22}$$
5. A Relation between the Asymptotic Variances of the SLRT and ULRT
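From martingale theory, the predictable covariation of $U_s(0)$ and $U_u(0)$ under $H_0$ in homogeneous samples is
$$\langle U_s(0), U_u(0)\rangle = \sum_{k=1}^{K}\int_0^\infty \frac{Y_{k0}(t)Y_{k1}(t)}{Y_k(t)}\cdot\frac{Y_{\cdot0}(t)+Y_{\cdot1}(t)}{Y(t)}\,\lambda_0(t)\,dt = \langle U_s(0)\rangle.$$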
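The covariation stated above follows from writing both null scores as stochastic integrals against the same stratum-level martingales. The derivation below is a sketch under $H_0$ with a common baseline hazard $\lambda_0$ (homogeneous samples). It writes $M_{ki}$ for the stratum-level counting-process martingales and uses the pooled notation $Y_{\cdot i} = \sum_k Y_{ki}$, $Y = Y_{\cdot0} + Y_{\cdot1}$, $Y_k = Y_{k0} + Y_{k1}$. The compensator terms cancel in both scores, and the $M_{ki}$ are orthogonal with $\langle M_{ki}\rangle = \int Y_{ki}\lambda_0\,dt$.

```latex
\begin{align*}
M_{ki}(t) &= N_{ki}(t) - \int_0^t Y_{ki}(u)\,\lambda_0(u)\,du, \qquad i = 0, 1,\\
U_s(0) &= \sum_{k=1}^{K}\int_0^\infty \frac{Y_{k0}(t)\,dM_{k1}(t) - Y_{k1}(t)\,dM_{k0}(t)}{Y_k(t)},\\
U_u(0) &= \sum_{k=1}^{K}\int_0^\infty \frac{Y_{\cdot0}(t)\,dM_{k1}(t) - Y_{\cdot1}(t)\,dM_{k0}(t)}{Y(t)},\\
\langle U_s(0), U_u(0)\rangle
  &= \sum_{k=1}^{K}\int_0^\infty \Big\{\frac{Y_{k0}}{Y_k}\,\frac{Y_{\cdot0}}{Y}\,Y_{k1}
     + \frac{Y_{k1}}{Y_k}\,\frac{Y_{\cdot1}}{Y}\,Y_{k0}\Big\}\lambda_0\,dt\\
  &= \sum_{k=1}^{K}\int_0^\infty \frac{Y_{k0}Y_{k1}}{Y_k}\cdot\frac{Y_{\cdot0}+Y_{\cdot1}}{Y}\,\lambda_0\,dt
   = \langle U_s(0)\rangle .
\end{align*}
```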
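This means that the asymptotic covariance $\sigma_{su}$ of $K^{-1/2}U_s(0)$ and $K^{-1/2}U_u(0)$ satisfies
$$\sigma_{su} = \sigma_s^2. \tag{23}$$
From the Cauchy-Schwarz inequality,
$$\sigma_s^4 = \sigma_{su}^2 \le \sigma_s^2\,\sigma_u^2, \tag{24}$$
and hence
$$\sigma_s^2 \le \sigma_u^2. \tag{25}$$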
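The covariance identity and the variance inequality of this section can be illustrated by a small Monte Carlo experiment. This is a sketch under assumptions of our own choosing, not taken from the paper. There are $K = 100$ homogeneous strata with two patients per arm, exponential survival with hazard 1 in both arms (so $H_0$ holds), and independent uniform censoring on $(0, 2)$. Continuous times rule out ties, so after sorting, the risk set at the $i$-th time is everything from position $i$ on. The sample covariance of $K^{-1/2}U_s(0)$ and $K^{-1/2}U_u(0)$ should be close to the sample variance of $K^{-1/2}U_s(0)$, and the average $\hat\sigma_s^2$ should fall clearly below the average $\hat\sigma_u^2$.

```python
import random


def score_and_info(obs):
    """Log-rank score U(0) and information -l''(0) for one risk-set population.

    obs: list of (time, status, group) tuples. Assumes continuous times (no ties),
    so after sorting, the risk set at the i-th smallest time is obs[i:].
    """
    obs = sorted(obs)
    n = len(obs)
    treated_at_risk = [0] * (n + 1)  # treated_at_risk[i] = number treated in obs[i:]
    for i in range(n - 1, -1, -1):
        treated_at_risk[i] = treated_at_risk[i + 1] + obs[i][2]
    score = info = 0.0
    for i, (_, status, g) in enumerate(obs):
        if status == 1:
            p = treated_at_risk[i] / (n - i)  # Y_1(t) / Y(t) at this event time
            score += g - p
            info += p * (1.0 - p)
    return score, info


def simulate_strata(K, n_per_arm, rng, cens_max=2.0):
    """Homogeneous samples under H0: every subject has hazard 1; uniform censoring."""
    strata = []
    for _ in range(K):
        stratum = []
        for g in (0, 1):
            for _ in range(n_per_arm):
                t = rng.expovariate(1.0)
                c = rng.uniform(0.0, cens_max)
                stratum.append((min(t, c), int(t <= c), g))
        strata.append(stratum)
    return strata


def one_replication(K, n_per_arm, rng):
    strata = simulate_strata(K, n_per_arm, rng)
    u_s = i_s = 0.0
    for stratum in strata:  # SLRT: sum stratum-level contributions
        u, i = score_and_info(stratum)
        u_s += u
        i_s += i
    u_u, i_u = score_and_info([o for s in strata for o in s])  # ULRT: pool all strata
    return u_s, i_s, u_u, i_u


def cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


rng = random.Random(12345)
K, n_per_arm, reps = 100, 2, 400
rows = [one_replication(K, n_per_arm, rng) for _ in range(reps)]
zs = [r[0] / K ** 0.5 for r in rows]           # K^{-1/2} U_s(0)
zu = [r[2] / K ** 0.5 for r in rows]           # K^{-1/2} U_u(0)
sig2_s = sum(r[1] for r in rows) / (K * reps)  # average hat sigma_s^2
sig2_u = sum(r[3] for r in rows) / (K * reps)  # average hat sigma_u^2
print(f"hat sigma_s^2 = {sig2_s:.3f}, hat sigma_u^2 = {sig2_u:.3f}")
print(f"Var(Z_s) = {cov(zs, zs):.3f}, Cov(Z_s, Z_u) = {cov(zs, zu):.3f}")
```

With only two patients per arm, the last subject at risk in each stratum contributes nothing to $\hat\sigma_s^2$. That is one concrete way the stratified test discards information.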
Lemma 1. Suppose $Z \sim N(0, 1)$. Then for any $c > 0$, $P(|Z + \mu| > c)$ is an increasing function of $|\mu|$.
This lemma is readily checked. From (22), (25), (A.4), and Lemma 1, the ULRT is asymptotically at least as powerful as the SLRT in homogeneous samples, and strictly more powerful whenever $\sigma_s^2 < \sigma_u^2$. The loss of power of the SLRT is due to the loss of information from unnecessary stratification.
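For a two-sided test with critical value $c$, $P(|Z + \mu| > c) = \Phi(\mu - c) + \Phi(-\mu - c)$. Its derivative in $\mu$ is $\phi(\mu - c) - \phi(\mu + c)$, which is positive for $\mu > 0$ because $|\mu - c| < \mu + c$. This is one way to check Lemma 1. The sketch below turns this into an asymptotic power comparison, with $\mu = b\sigma_s$ for the SLRT and $\mu = b\sigma_u$ for the ULRT. The values of $b$, $\sigma_s^2$ and $\sigma_u^2$ are hypothetical and chosen only for illustration. They are not estimates from MADIT-II.

```python
from math import erf, sqrt


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def asymptotic_power(mu: float, c: float = 1.959963984540054) -> float:
    """P(|Z + mu| > c) for Z ~ N(0, 1); c defaults to the two-sided 5% critical value."""
    return norm_cdf(mu - c) + norm_cdf(-mu - c)


# Hypothetical values for illustration only.
b = 2.5
sigma2_s, sigma2_u = 0.6, 1.0
power_s = asymptotic_power(b * sqrt(sigma2_s))
power_u = asymptotic_power(b * sqrt(sigma2_u))
print(f"SLRT power ~ {power_s:.3f}, ULRT power ~ {power_u:.3f}")

# Lemma 1: the power increases with |mu| (checked here on a grid).
grid = [0.25 * i for i in range(17)]
powers = [asymptotic_power(m) for m in grid]
assert all(p < q for p, q in zip(powers, powers[1:]))
```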
6. A Real Example
We study the SLRT and ULRT in a multicenter clinical trial [5]. The Multicenter Automatic Defibrillator Implantation Trial II (MADIT-II) was designed to evaluate the potential survival benefit of a prophylactically implanted defibrillator in coronary patients with a prior myocardial infarction and advanced left ventricular dysfunction (ejection fraction …). The trial started in July 1997 and enrolled 1232 patients from 76 hospital centers (71 in the US and 5 in Europe). The patients were randomized in a fixed ratio to receive either an implantable defibrillator or conventional medical therapy. We first tested the homogeneity of the strata with the log-rank test (p-value …). The estimated asymptotic variances of the SLRT and ULRT score statistics are … and …, respectively, which shows that the ULRT is asymptotically slightly more powerful than its SLRT counterpart. Across the 76 centers, the number of patients ranges from … to …, with a mean of …. There were … centers without any event by the end of the study; they did not contribute to the stratified log-rank test.
7. Concluding Remarks
In this paper, we studied the loss of power of the stratified log-rank test in multicenter clinical trials with a large number of centers but relatively small stratum sizes, assuming homogeneous strata. Our results show that the asymptotic variance of the SLRT score statistic is no larger than that of the ULRT, which makes the SLRT less powerful. Overstratification may therefore incur a loss of information compared with the unstratified log-rank test. However, our study has some limitations. First, we assumed that the strata are homogeneous; in that case, the unstratified log-rank test should be the best choice. In practice, it is important to test the homogeneity of the strata before using the stratified log-rank test. Second, we considered the case of a large number of strata with small stratum sizes. Another case of interest is a small number of strata with large stratum sizes. Although (23) always holds, the local alternatives in that case cannot be specified in terms of the number of strata. We are currently investigating whether (25) still holds there.
Appendix: The Asymptotic Distribution of SLRT under Local Alternatives
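To study the asymptotic distribution of SLRT under alternatives, we consider the local alternatives $\beta_K = b/\sqrt{K}$ for a fixed constant $b$. Then, from a Taylor expansion,
$$U_s(\beta_K) = U_s(0) + \beta_K\,\ell_s''(\beta^*), \tag{A.1}$$
where $\beta^*$ lies between $0$ and $\beta_K$. Note that $K^{-1}\ell_s'''(\beta)$ is bounded in probability uniformly for $\beta$ in a neighborhood of $0$, so $-K^{-1}\ell_s''(\beta^*) = -K^{-1}\ell_s''(0) + o_p(1) = \sigma_s^2 + o_p(1)$. Then
$$K^{-1/2}U_s(0) = K^{-1/2}U_s(\beta_K) + b\,\sigma_s^2 + o_p(1). \tag{A.2}$$
Under $\beta_K$, the score at the true parameter satisfies $K^{-1/2}U_s(\beta_K) \xrightarrow{d} N(0, \sigma_s^2)$, so
$$K^{-1/2}U_s(0) \xrightarrow{d} N(b\sigma_s^2, \sigma_s^2). \tag{A.3}$$
Since the variance estimator in the denominator of SLRT remains consistent for $\sigma_s^2$ under contiguous alternatives, this means that
$$\mathrm{SLRT} \xrightarrow{d} N(b\sigma_s, 1). \tag{A.4}$$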
The authors gratefully thank Dr. Arthur J. Moss (PI of MADIT-II) for allowing them to use the MADIT-II data in this paper. This research was supported by Grant 5U19AI056390-05 from the National Institutes of Health, USA.
[1] R. Peto and J. Peto, “Asymptotically efficient rank invariant test procedures (with discussion),” Journal of the Royal Statistical Society. Series A, vol. 135, no. 2, pp. 185–207, 1972.
[2] R. D. Gill, Censoring and Stochastic Integrals, vol. 124 of Mathematical Centre Tracts, Mathematisch Centrum, Amsterdam, The Netherlands, 1980.
[3] D. R. Cox, “Regression models and life-tables (with discussion),” Journal of the Royal Statistical Society. Series B, vol. 34, pp. 187–220, 1972.
[4] P. K. Andersen, Ø. Borgan, R. D. Gill, and N. Keiding, Statistical Models Based on Counting Processes, Springer Series in Statistics, Springer, New York, NY, USA, 1993.
[5] A. J. Moss, W. Zareba, W. J. Hall et al., “Prophylactic implantation of a defibrillator in patients with myocardial infarction and reduced ejection fraction,” The New England Journal of Medicine, vol. 346, no. 12, pp. 877–883, 2002.