Abstract

The main objective of cancer phase I clinical trials is to determine a maximum tolerated dose (MTD) of a new experimental treatment. In practice, most of these trials are designed so that three patients per cohort are treated at the same dose level. In this paper, we compare the safety and efficiency of trials using the escalation with overdose control (EWOC) scheme designed with three or only one patient per cohort. We show through simulations that the number of patients per cohort does not impact the proportion of patients given therapeutic doses, safety of the trial, and efficiency of the estimate of the MTD. Additionally, we present guidelines and tabulated values on the number of patients needed to design a phase I cancer clinical trial using EWOC to achieve a given accuracy of the estimate of the MTD.

1. Introduction

Cancer phase I clinical trials are small studies whose main objective is to determine a maximum tolerated dose (MTD) of a new experimental drug or combination of known drugs for use in a phase II trial. Patients are typically accrued to the trial sequentially in cohorts of size $m$, and the dose level assigned to a given cohort depends on the dose levels and toxicity outcomes of the previously treated cohorts. A large number of statistical methods that account for the sequential nature of the data generated by such designs have been proposed in the literature; see [1, 2] for a comprehensive review. In particular, the continual reassessment method (CRM) proposed in [3] and its modifications [4–8], and escalation with overdose control (EWOC) described in [9–15], are Bayesian adaptive designs that produce consistent sequences of doses and can be easily implemented in practice using published tutorials and free interactive software; see, for example, [16–19].

The work we present here was motivated by frequent questions from clinicians and review committees at the cancer centers where the authors have worked regarding (1) the number of patients that should be included in each cohort and (2) the number of patients required to conduct a phase I cancer clinical trial. Denote by $D_m$ a design that treats patients in successive cohorts of size $m$, with all patients in a cohort treated simultaneously at the same dose level. For a given fixed number of patients in the trial, an advantage of a $D_m$ design with $m > 1$ over a $D_1$ design is a shorter time to completion of the trial. However, it is not clear how the two designs compare with respect to the safety of the trial and the efficiency of the estimate of the MTD when EWOC is used. Goodman et al. [5] argue for the use of more than one patient per dose level in a modified version of the CRM to reduce the duration of the trial and the incidence of toxicity associated with the original CRM. In this paper, we compare a $D_3$ design with a $D_1$ design in terms of the proportion of patients given therapeutic doses, that is, doses in a neighborhood of the “true” MTD. In addition, we compare the safety of the trial and the efficiency of the estimate of the MTD using extensive simulations.

The number of patients enrolled in a cancer phase I clinical trial is typically between 12 and 40, and the trial duration depends on the study design and on the length of the study cycle needed to resolve each toxicity outcome. An increasing number of the clinicians we work with ask how many patients they need to accrue in order to estimate the MTD with an acceptable degree of accuracy. We are not aware of any published methodology for sample size determination (SSD) in cancer phase I clinical trials based on power calculations or on the precision of Bayes estimates, using either frequentist or Bayesian adaptive designs. In fact, most sample size recommendations are based on prespecified stopping rules; see, for example, [20] on selecting the number of patients by considering different stopping rules with the CRM. Lin and Shih [21] and Ivanova [22] describe sample size recommendations based on the expected number of patients allocated to each dose in a set of prespecified dose levels.

In this paper, we address the SSD problem using a traditional approach: we determine the sample size needed to achieve a desired accuracy of the Bayes estimate on average. Specifically, we seek the smallest number of patients such that the posterior variance of the MTD, averaged over all possible trials, is no more than a specified margin. This procedure is not based on a specific stopping rule and consequently preserves the coherence of EWOC; see [15] for the coherence properties of EWOC.

This paper is organized as follows. Section 2 describes dose escalation with overdose control using cohorts of size $m$. In Section 3, we present two criteria for sample size determination in this Bayesian setting. Comparisons of designs that treat cohorts of three patients simultaneously with designs that treat one patient at a time are presented in Section 4. In that section, we also give tabulated values relating the number of patients in the trial to the corresponding average posterior variance and length of the highest posterior density interval. Section 5 contains concluding remarks and discussion.

2. Design Using EWOC

EWOC is a Bayesian adaptive design permitting precise determination of the MTD while directly controlling the likelihood of an overdose. It is the first statistical method to directly incorporate formal safety constraints into the design of cancer phase I clinical trials. Zacks et al. [10] and Tighiouart and Rogatko [15] discuss the statistical properties and coherence of the method, and a comparison of EWOC with alternative phase I design methods is given in [9]. Babb and Rogatko [11] provide a summary of Bayesian phase I design methods, and Tighiouart et al. [12] studied the performance of EWOC under a richer class of prior distributions for the model parameters. The defining property of EWOC is that the expected proportion of patients treated at doses above the MTD is equal to a specified value $\alpha$, the feasibility bound. This value is selected by the clinician and reflects his or her level of concern about overdosing. Zacks et al. [10] showed that, among designs with this defining property, EWOC minimizes the average amount by which patients are underdosed. This means that EWOC approaches the MTD as rapidly as possible while keeping the expected proportion of overdosed patients no greater than $\alpha$. Zacks et al. [10] also showed that, as a trial progresses, the dose sequence defined by EWOC approaches the MTD (i.e., the sequence of recommended doses converges in probability to the MTD). Eventually, all patients beyond a certain point in the trial would be treated at doses sufficiently close to the MTD.

EWOC has been used to design over a dozen phase I studies approved by the Research Review Committee and the Institutional Review Board of the Fox Chase Cancer Center, Philadelphia; the Winship Cancer Institute, Atlanta; and Cedars-Sinai Medical Center, Los Angeles (see [23–29] for some of the published trials).

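We adopt the logistic model to represent the dose-toxicity relationship:
$$P(\text{DLT} \mid \text{dose} = x) = F(\beta_0 + \beta_1 x), \tag{2.1}$$
where $F(u) = 1/(1 + e^{-u})$ is the logistic distribution function and $\beta_1 > 0$, so that the probability of dose limiting toxicity (DLT) is an increasing function of dose. The MTD $\gamma$ is defined as the dose expected to produce DLT in a specified proportion $\theta$ of patients, that is, $F(\beta_0 + \beta_1 \gamma) = \theta$. Let $\rho_0 = F(\beta_0 + \beta_1 X_{\min})$ be the probability of a DLT at the starting dose $X_{\min}$. To facilitate interpretation of the model parameters by clinicians, we reparameterize model (2.1) in terms of $(\rho_0, \gamma)$; see [9, 12] for more details.

Suppose we plan to enroll $n$ patients in the trial in cohorts of size $m$. Dose levels in the trial are selected in the interval $[X_{\min}, X_{\max}]$, and a design $D_m$ with cohorts of size $m$ proceeds as follows. We first specify prior distributions for $\rho_0$ and $\gamma$. Then, the first cohort of $m$ patients receives the dose $x_1 = X_{\min}$. Let $y_1$ be the number of DLTs observed among these first $m$ patients. The likelihood given the data observed thus far, $\mathcal{D}_1 = \{(x_1, y_1)\}$, is
$$L(\rho_0, \gamma \mid \mathcal{D}_1) = \binom{m}{y_1} \pi_1^{y_1} (1 - \pi_1)^{m - y_1}, \tag{2.2}$$
where
$$\pi_1 = F(\beta_0 + \beta_1 x_1), \tag{2.3}$$
with $\beta_0 = \dfrac{\gamma F^{-1}(\rho_0) - X_{\min} F^{-1}(\theta)}{\gamma - X_{\min}}$ and $\beta_1 = \dfrac{F^{-1}(\theta) - F^{-1}(\rho_0)}{\gamma - X_{\min}}$. Let $\Pi_1(\cdot)$ be the marginal posterior cumulative distribution function (cdf) of the MTD given $\mathcal{D}_1$. The second cohort of $m$ patients receives the dose $x_2 = \Pi_1^{-1}(\alpha)$, so that the posterior probability of exceeding the MTD is equal to the feasibility bound $\alpha$. In general, the likelihood of the data after observing the toxicity outcomes of the $k$th cohort of patients is
$$L(\rho_0, \gamma \mid \mathcal{D}_k) = \prod_{i=1}^{k} \binom{m}{y_i} \pi_i^{y_i} (1 - \pi_i)^{m - y_i}, \tag{2.4}$$
where $x_i$ is the dose assigned to the $i$th cohort of patients, $\pi_i$ is given by (2.3) with $x_1$ replaced by $x_i$, and $\mathcal{D}_k = \{(x_i, y_i),\ i = 1, \ldots, k\}$. The $(k+1)$st cohort of patients receives the dose
$$x_{k+1} = \Pi_k^{-1}(\alpha), \tag{2.5}$$
where $\Pi_k(\cdot)$ is the marginal posterior cdf of $\gamma$ given $\mathcal{D}_k$. This process is repeated until a total of $n/m$ cohorts have been enrolled in the trial, which completes the description of a $D_m$ design. For a given sample size $n$, we compare the performance of a $D_3$ design with that of a $D_1$ design by estimating the percentage of patients treated within a neighborhood of the true MTD. Other comparisons include the safety and the efficiency of the estimate of the MTD under the two designs.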
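The dose-assignment rule (2.5) can be illustrated with a minimal grid-approximation sketch in Python. It assumes independent uniform priors $\rho_0 \sim U(0, \theta)$ and $\gamma \sim U(X_{\min}, X_{\max})$, approximates the joint posterior on a $k \times k$ midpoint grid instead of the exact computations referenced in [9, 18], drops the binomial coefficients (they do not depend on the parameters), and returns the smallest grid value of $\gamma$ at which the marginal posterior cdf reaches $\alpha$. The function names and the grid size are hypothetical.

```python
import math


def expit(z):
    """Numerically stable logistic cdf F(z) = 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def logit(p):
    """Inverse logistic cdf F^{-1}(p)."""
    return math.log(p / (1.0 - p))


def dlt_prob(x, rho0, gamma, theta, x_min=0.0):
    """P(DLT | dose x) from model (2.1), written in terms of (rho0, gamma)."""
    beta1 = (logit(theta) - logit(rho0)) / (gamma - x_min)
    beta0 = logit(rho0) - beta1 * x_min  # algebraically equal to the beta_0 formula
    return expit(beta0 + beta1 * x)


def next_dose(data, theta, alpha, x_min=0.0, x_max=1.0, k=60):
    """EWOC dose: alpha-quantile of the marginal posterior cdf of gamma.

    data: list of (dose, cohort_size, number_of_DLTs), one entry per cohort.
    Assumes flat priors on (0, theta) x (x_min, x_max), tabulated on a k x k grid.
    """
    rhos = [theta * (i + 0.5) / k for i in range(k)]
    gammas = [x_min + (x_max - x_min) * (j + 0.5) / k for j in range(k)]
    # Log posterior = binomial log likelihood (2.4) + constant; one row per gamma.
    logpost = []
    for g in gammas:
        row = []
        for r in rhos:
            ll = 0.0
            for x, m, y in data:
                p = min(max(dlt_prob(x, r, g, theta, x_min), 1e-12), 1.0 - 1e-12)
                ll += y * math.log(p) + (m - y) * math.log(1.0 - p)
            row.append(ll)
        logpost.append(row)
    top = max(max(row) for row in logpost)
    # Marginal posterior weight of each gamma value (rho0 summed out).
    marg = [sum(math.exp(v - top) for v in row) for row in logpost]
    total = sum(marg)
    cdf = 0.0
    for g, w in zip(gammas, marg):
        cdf += w / total
        if cdf >= alpha:
            return g  # smallest grid value with Pi_k(g) >= alpha
    return gammas[-1]
```

For example, `next_dose([(0.0, 3, 0)], theta=0.33, alpha=0.25)` gives the recommended dose for the second cohort of a three-patient-per-cohort design after no DLTs at the starting dose (the values of $\theta$ and $\alpha$ here are illustrative only). A finer grid or a Monte Carlo sampler would be needed for production use.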

3. Sample Size Determination

An increasing number of clinicians ask how many patients they need to accrue in a cancer phase I trial to achieve a specific goal. Sample size recommendations based on the expected number of patients treated at each dose level have been studied for rule-based designs in [21, 22]. However, these methods apply to a prespecified set of discrete doses, and it is not clear how they can be applied to continuous doses. Unlike in the frequentist setting, there is no consensus on a specific Bayesian method for the SSD problem; see Adcock [30] for a review of Bayesian approaches. In this paper, we present numerical results based on the posterior variance of the MTD and on the highest posterior density (HPD) interval; see [31].

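Denote by $\mathrm{Var}(\gamma \mid \mathcal{D}_n)$ the posterior variance of the MTD given the data $\mathcal{D}_n$ observed on the $n$ patients accrued to the trial. The first criterion is to find the smallest $n$ that satisfies
$$E_{\mathcal{D}_n}\left[\mathrm{Var}(\gamma \mid \mathcal{D}_n)\right] \le \epsilon, \tag{3.1}$$
where the expectation is taken with respect to the marginal distribution of the data $\mathcal{D}_n$ and the tolerance $\epsilon$ is specified by the clinician. In other words, we require an estimate of the MTD within a given accuracy, as measured by the posterior variance averaged over all possible trials. In the second criterion, we seek the smallest $n$ such that
$$E_{\mathcal{D}_n}\left[\ell(\mathcal{D}_n)\right] \le l, \tag{3.2}$$
where $\ell(\mathcal{D}_n)$ is the length of the HPD interval $[a(\mathcal{D}_n), a(\mathcal{D}_n) + \ell(\mathcal{D}_n)]$ determined by the constraint on the coverage probability
$$\int_{a(\mathcal{D}_n)}^{a(\mathcal{D}_n) + \ell(\mathcal{D}_n)} \pi(\gamma \mid \mathcal{D}_n)\, d\gamma = 1 - \delta, \tag{3.3}$$
and $\pi(\gamma \mid \mathcal{D}_n)$ denotes the marginal posterior density of the MTD.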

This is also known as the average length criterion (ALC) because, for each realization $\mathcal{D}_n$ of a trial, the corresponding HPD interval is determined by (3.3), and the lengths of these HPD intervals are averaged with respect to the marginal distribution of the data in (3.2). The tolerance $l$ on the average length of the HPD interval and the coverage probability $1 - \delta$ are prespecified by the clinician. Since both the posterior distribution of the MTD and the marginal distribution of the data are intractable, Monte Carlo averages were used to estimate the left-hand sides of (3.1) and (3.2). Details on the computation of $\mathrm{Var}(\gamma \mid \mathcal{D}_n)$ and $\ell(\mathcal{D}_n)$ can be found in [9, 18].
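The quantities entering (3.1)–(3.3) can be sketched for a posterior of the MTD that has been tabulated on an equally spaced grid (for example, the marginal weights from a grid approximation). In this hypothetical helper set, the HPD length is obtained by adding grid cells in order of decreasing density until the coverage $1 - \delta$ is reached; this equals the HPD interval length when the posterior is unimodal and gives the total length of the HPD region otherwise. The last function applies the "smallest $n$" rule to Monte Carlo averages indexed by sample size.

```python
def posterior_var(grid, weights):
    """Var(gamma | data) for a posterior tabulated as unnormalized weights on grid."""
    total = sum(weights)
    probs = [w / total for w in weights]
    mean = sum(g * p for g, p in zip(grid, probs))
    return sum(p * (g - mean) ** 2 for g, p in zip(grid, probs))


def hpd_length(grid, weights, coverage=0.90):
    """Length of the HPD set with posterior probability >= coverage (= 1 - delta).

    Assumes an equally spaced grid; cells are added from highest to lowest density.
    """
    total = sum(weights)
    step = grid[1] - grid[0]
    mass, cells = 0.0, 0
    for w in sorted(weights, reverse=True):
        mass += w / total
        cells += 1
        if mass >= coverage:
            break
    return cells * step


def smallest_n(avg_by_n, tolerance):
    """Smallest sample size whose Monte Carlo average satisfies (3.1) or (3.2).

    avg_by_n maps n -> estimated left-hand side; returns None if no n qualifies.
    """
    for n in sorted(avg_by_n):
        if avg_by_n[n] <= tolerance:
            return n
    return None
```

Criterion (3.1) would call `smallest_n` with averaged posterior variances and tolerance $\epsilon$; criterion (3.2) with averaged HPD lengths and tolerance $l$.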

4. Numerical Results

The simulation results presented below all assume a fixed feasibility bound $\alpha$ and dose levels standardized so that the starting dose for each trial is $X_{\min} = 0$ and all subsequent dose levels are selected from the unit interval, that is, $X_{\max} = 1$. Independent uniform prior distributions were put on the parameters $\rho_0$ and $\gamma$ on the intervals $(0, \theta)$ and $(0, 1)$, respectively.

4.1. Comparison of Designs with $m = 1$ and $m = 3$

We simulate trials under different scenarios corresponding to different values of $\rho_0$ and $\gamma$. For the design with one patient per cohort ($D_1$), the first patient receives dose 0, the patient's DLT outcome is generated from the logistic model (2.3) at the true values of $(\rho_0, \gamma)$, and the next dose is determined as described in Section 2. The second response is then generated from model (2.3) at this new dose. This process is repeated until a trial of $n$ patients is generated. The same process applies to the design with three patients per cohort ($D_3$), except that 3 patients are given the same dose at each stage of the trial and 3 responses are generated independently from model (2.3) instead of 1. Since $0 < \rho_0 < \theta$ and $0 < \gamma < 1$, we considered 12 scenarios corresponding to combinations of three values of $\rho_0$ with four values of $\gamma$: 0.2, 0.4, 0.6, and 0.8. We refer to the three values of $\rho_0$, in increasing order, as low, intermediate, and high, respectively. Similarly, $\gamma = 0.2$ and 0.4 will be referred to as low values of the MTD and 0.6 and 0.8 as high values. The same value of $\theta$ was used in all simulations. For each design, each sample size $n = 12, 18, 24, 30$, and each combination of $(\rho_0, \gamma)$, we simulated 5000 trials and calculated the proportion of patients given therapeutic doses, that is, doses in an $\eta$-neighborhood of the true MTD, for several values of $\eta$.
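One simulated trial of the kind described above can be sketched as follows, for any cohort size $m$ (1 or 3 here). The sketch assumes standardized doses with $X_{\min} = 0$, independent uniform priors on $(0, \theta) \times (0, 1)$ approximated on a grid, continuous dose assignment by the $\alpha$-quantile rule, and defines the $\eta$-neighborhood as $|x - \gamma| \le \eta$; that definition, the grid size, and all function names are assumptions for illustration.

```python
import math
import random


def expit(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def logit(p):
    return math.log(p / (1.0 - p))


def dlt_prob(x, rho0, gamma, theta):
    """Model (2.3) with standardized doses, X_min = 0."""
    return expit(logit(rho0) + (logit(theta) - logit(rho0)) * x / gamma)


def simulate_trial(m, n, rho0, gamma, theta, alpha, rng, k=40):
    """Doses received by n patients under a cohort-size-m EWOC design (grid sketch)."""
    rhos = [theta * (i + 0.5) / k for i in range(k)]
    gammas = [(j + 0.5) / k for j in range(k)]
    logpost = [[0.0] * k for _ in range(k)]  # flat prior: log density 0 everywhere
    dose, doses = 0.0, []                    # first cohort at X_min = 0
    for _ in range(n // m):
        p_true = dlt_prob(dose, rho0, gamma, theta)
        y = sum(rng.random() < p_true for _ in range(m))  # m independent responses
        doses.extend([dose] * m)
        for j, g in enumerate(gammas):                    # binomial log-likelihood update
            for i, r in enumerate(rhos):
                p = min(max(dlt_prob(dose, r, g, theta), 1e-12), 1.0 - 1e-12)
                logpost[j][i] += y * math.log(p) + (m - y) * math.log(1.0 - p)
        top = max(max(row) for row in logpost)
        marg = [sum(math.exp(v - top) for v in row) for row in logpost]
        total, cdf, dose = sum(marg), 0.0, gammas[-1]
        for g, w in zip(gammas, marg):                    # alpha-quantile, as in (2.5)
            cdf += w / total
            if cdf >= alpha:
                dose = g
                break
    return doses


def prop_near_mtd(doses, gamma, eta):
    """Fraction of patients whose dose lies within eta of the true MTD (assumed definition)."""
    return sum(abs(x - gamma) <= eta for x in doses) / len(doses)
```

To compare the two designs for one scenario, one would average `prop_near_mtd` over many replicate trials (5000 in the text) for $m = 1$ and for $m = 3$ with the same $(\rho_0, \gamma, n)$, and take the difference of the two averages.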

Table 1 gives the estimated proportions of patients given doses in an $\eta$-neighborhood of the true MTD under designs $D_1$ and $D_3$ (one and three patients per cohort, respectively), together with the difference in these proportions between the two designs, for low values of the true MTD and different sample sizes. Table 2 gives the corresponding estimates for high values of the true MTD, and Table 3 gives the average of these estimates across the 12 combinations of $(\rho_0, \gamma)$. For low values of the true MTD, one design generally assigns more patients to doses near the MTD than the other, and the difference can be as high as 16% for a particular combination of $\rho_0$, $\gamma$, and $n$. For high values of the MTD, Table 2 shows that one design always assigns more patients to doses near the MTD than the other, with a largest difference of about 16% for a particular combination of $\rho_0$, $\gamma$, and $n$. Averaged across the 12 scenarios, for each value of $\eta$ and each sample size, the proportion of patients given therapeutic doses under one design is always greater than the corresponding proportion under the other, and the largest of these differences is about 5%. The practical impact of this difference is negligible given the relatively small number of patients in phase I cancer clinical trials; for example, 5% of a 30-patient trial amounts to 1.5 patients. In Tables 4 and 5, we present the differences between the $D_1$ and $D_3$ designs in (i) the proportion of patients exhibiting DLT, (ii) the proportion of patients given doses above the “true” MTD, (iii) the bias, and (iv) the mean square error of the estimate of the MTD. Table 6 gives the values of these statistics averaged across the 12 scenarios for $(\rho_0, \gamma)$. Based on (i) and (ii), the results indicate that the two designs are equally safe, and according to (iii) and (iv), neither design achieves a practical gain in the efficiency of the estimate of the MTD. From an ethical point of view, we recommend the $D_1$ design, since it avoids the possibility of three simultaneous DLTs that exists under the $D_3$ design.
This choice should be discussed with the clinician after assessing the importance of the length of the trial.
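The four statistics compared in Tables 4–6 can be computed from simulated trial output with a few lines of Python. This is a sketch: the MTD estimate passed to `bias_mse` is assumed to be whatever point estimate is reported at the end of each trial (for example, the posterior median of $\gamma$, which is an assumption here), and the function names are hypothetical. The table entries would then be differences of these statistics between the two designs, averaged over replicate trials.

```python
def safety_metrics(doses, dlts, gamma):
    """(i) proportion of patients with DLT and (ii) proportion dosed above the true MTD.

    doses: dose received by each patient; dlts: 0/1 DLT indicator per patient.
    """
    n = len(doses)
    return sum(dlts) / n, sum(x > gamma for x in doses) / n


def bias_mse(estimates, gamma):
    """(iii) bias and (iv) mean square error of MTD estimates across simulated trials."""
    errors = [est - gamma for est in estimates]
    bias = sum(errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    return bias, mse
```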

4.2. Sample Size Determination

In this section, we present tabulated values of the average posterior standard deviation of the MTD and the average length of the HPD interval that are achieved for even sample sizes and selected values of $\theta$, the target probability of DLT. Table 7 summarizes the results for a single selected value of $\theta$. For a given sample size $n$, each entry in the table was calculated according to the following algorithm:

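Set $i = 1$.
(i) Generate $\rho_0^{(i)}$ and $\gamma^{(i)}$ independently from their uniform prior distributions.
(ii) Simulate a trial of $n$ patients according to the EWOC algorithm described in Section 4.1, with $(\rho_0^{(i)}, \gamma^{(i)})$ as the true model parameters, yielding the data $\mathcal{D}_n^{(i)}$.
(iii) Calculate the posterior variance $\mathrm{Var}(\gamma \mid \mathcal{D}_n^{(i)})$ and the length $\ell(\mathcal{D}_n^{(i)})$ of the HPD interval using (3.3).
(iv) Repeat steps (i)–(iii) for $i = 2, \ldots, M$.

The left-hand sides of (3.1) and (3.2) are estimated by
$$\frac{1}{M} \sum_{i=1}^{M} \mathrm{Var}(\gamma \mid \mathcal{D}_n^{(i)}) \quad \text{and} \quad \frac{1}{M} \sum_{i=1}^{M} \ell(\mathcal{D}_n^{(i)}),$$
respectively.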
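Steps (i)–(iv) can be sketched end to end in Python. This is an illustration under stated assumptions, not the computation used for Table 7: it assumes one patient per cohort, standardized doses on $[0, 1]$ with $X_{\min} = 0$, priors $\rho_0 \sim U(0, \theta)$ and $\gamma \sim U(0, 1)$ approximated on a $40 \times 40$ grid, and an HPD length computed by accumulating grid cells (exact for a unimodal posterior). All names are hypothetical.

```python
import math
import random


def expit(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def logit(p):
    return math.log(p / (1.0 - p))


def dlt_prob(x, rho0, gamma, theta):
    """Model (2.3) with standardized doses (X_min = 0, X_max = 1)."""
    return expit(logit(rho0) + (logit(theta) - logit(rho0)) * x / gamma)


def gamma_marginal(logpost):
    """Normalized marginal posterior of gamma (rows) after summing out rho0 (columns)."""
    top = max(max(row) for row in logpost)
    w = [sum(math.exp(v - top) for v in row) for row in logpost]
    s = sum(w)
    return [v / s for v in w]


def run_trial(n, rho0, gamma, theta, alpha, rng, k=40):
    """Step (ii): one EWOC trial, one patient per cohort; returns grid and posterior of gamma."""
    rhos = [theta * (i + 0.5) / k for i in range(k)]
    gammas = [(j + 0.5) / k for j in range(k)]
    logpost = [[0.0] * k for _ in range(k)]  # flat priors U(0, theta) x U(0, 1)
    post, dose = gamma_marginal(logpost), 0.0
    for _ in range(n):
        y = rng.random() < dlt_prob(dose, rho0, gamma, theta)
        for j, g in enumerate(gammas):
            for i, r in enumerate(rhos):
                p = min(max(dlt_prob(dose, r, g, theta), 1e-12), 1.0 - 1e-12)
                logpost[j][i] += math.log(p) if y else math.log(1.0 - p)
        post = gamma_marginal(logpost)
        cdf, dose = 0.0, gammas[-1]
        for g, q in zip(gammas, post):  # next dose: alpha-quantile of gamma
            cdf += q
            if cdf >= alpha:
                dose = g
                break
    return gammas, post


def var_and_hpd(gammas, post, coverage):
    """Step (iii): posterior variance and HPD length (unimodal posterior assumed)."""
    mean = sum(g * q for g, q in zip(gammas, post))
    var = sum(q * (g - mean) ** 2 for g, q in zip(gammas, post))
    mass, cells = 0.0, 0
    for q in sorted(post, reverse=True):
        mass += q
        cells += 1
        if mass >= coverage:
            break
    return var, cells * (gammas[1] - gammas[0])


def ssd_criteria(n, theta, alpha, M, coverage=0.90, seed=0):
    """Monte Carlo estimates of the left-hand sides of (3.1) and (3.2)."""
    rng = random.Random(seed)
    sum_var = sum_len = 0.0
    for _ in range(M):  # step (iv): repeat M times
        rho0 = theta * (1.0 - rng.random())  # step (i): rho0 ~ U(0, theta], never 0
        gamma = 1.0 - rng.random()           # gamma ~ U(0, 1], never 0
        v, length = var_and_hpd(*run_trial(n, rho0, gamma, theta, alpha, rng), coverage)
        sum_var += v
        sum_len += length
    return sum_var / M, sum_len / M
```

A table in the style of Table 7 would call `ssd_criteria` for each even $n$ with $M$ large enough for the Monte Carlo error to be negligible; averaging posterior standard deviations rather than variances only requires replacing `v` by `math.sqrt(v)` in the accumulation.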

The numerical results presented here are based on $M$ simulated trials per sample size. For the value of $\theta$ used in Table 7, the table shows that with 6 patients, the MTD can be estimated with an average posterior standard deviation equal to 25% of the dose range, and that increasing the sample size from 6 to 40 patients decreases the average posterior standard deviation by 17%. Similarly, the average length of the 90% HPD interval is 74% of the dose range when 6 patients are enrolled in the phase I trial, and this length is reduced by 16% when the number of patients increases from 6 to 40. Figures 1 and 2 show the average posterior standard deviation and the average length of the 95% HPD interval as functions of the sample size $n$ and the target probability of DLT $\theta$.
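Reading the 17% and 16% reductions reported for Table 7 as relative changes (an interpretation, consistent with the average posterior standard deviation of about one-fifth of the dose range quoted for 20 patients in Section 4.3), the values at 40 patients work out to roughly

$$0.25 \times (1 - 0.17) \approx 0.21, \qquad 0.74 \times (1 - 0.16) \approx 0.62,$$

that is, an average posterior standard deviation of about 21% of the dose range and an average 90% HPD length of about 62% of the dose range.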

4.3. Illustrative Example

A randomized phase I clinical trial of the combination of bortezomib and melphalan as conditioning for autologous stem cell transplant in patients with multiple myeloma was designed using EWOC, and its results were published in [27]. Patients are randomized to arm A, where a fixed dose of melphalan (100 mg/m2) is given before bortezomib, or to arm B, where the same fixed dose of melphalan is given after bortezomib. The available doses of bortezomib are 0.4, 0.7, 1.0, 1.3, and 1.6 mg/m2, with the first patient in either arm receiving 1.0 mg/m2. For each arm, the MTD is defined as the dose of bortezomib that, when administered in combination with 100 mg/m2 of melphalan (either before or after), results in a probability $\theta$ that a dose limiting toxicity will be manifest. In this trial, the feasibility bound $\alpha$ starts at a smaller value and is increased in small increments of 0.05 up to its final value, this final value being a compromise between the therapeutic benefit of bortezomib and its toxic side effects. Since the doses in this trial are discrete, the dose allocated to the next patient is obtained by rounding down the dose recommended by the EWOC algorithm to the nearest available discrete dose; see [9, 15] on how to conduct a trial with a prespecified set of discrete doses.
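The rounding-down rule can be written as a small helper. This sketch assumes the EWOC recommendation is already expressed in mg/m2; the helper name and its behavior for recommendations below 0.4 mg/m2 (mapped to the lowest level) are assumptions, not part of the published design.

```python
import bisect

# Bortezomib dose levels (mg/m2) listed for the trial.
BORTEZOMIB_LEVELS = [0.4, 0.7, 1.0, 1.3, 1.6]


def round_down_dose(recommended, levels=BORTEZOMIB_LEVELS):
    """Largest available dose level not exceeding the EWOC-recommended dose.

    Assumption: a recommendation below the lowest level is mapped to the lowest
    level; a real protocol may instead specify a stopping or de-escalation rule.
    """
    ordered = sorted(levels)
    i = bisect.bisect_right(ordered, recommended)  # count of levels <= recommended
    return ordered[max(i - 1, 0)]
```

For instance, a recommendation of 1.25 mg/m2 would be rounded down to 1.0 mg/m2, while a recommendation of exactly 1.3 mg/m2 would be used as is.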

Figure 3 shows all possible dose sequences that could be realized for the first four patients, assuming that only one patient is treated at each dose, together with a selected scenario for patient 5. The principal investigator (PI) wanted to determine the number of patients to accrue in each arm so that the average posterior standard deviation of the MTD is no more than one-fifth of the range of the dose levels. This statistical constraint, combined with logistical considerations such as the resources available to the PI, the number of patients available, and limits on the duration of the trial, led us to select 20 patients per arm. Indeed, a sample size of 20 results in an average posterior standard deviation just below one-fifth of the range of dose levels 0.4–1.6 mg/m2, that is, just below 0.24 mg/m2.

5. Concluding Remarks

The objectives of this paper are to provide a rationale for the choice of cohort size and of the number of patients to accrue in a phase I cancer clinical trial when the Bayesian adaptive design EWOC is used. In these trials, patients are typically enrolled in cohorts of size three for no apparent reason other than consistency with the traditional “3 + 3” design and a shorter trial duration. We have shown through simulations that designs with one and three patients per cohort are equally safe and that no practical gain is achieved in terms of the efficiency of the estimate of the MTD. Depending on how important the length of the trial is to the clinician and the institution, we recommend using one patient per dose level to avoid simultaneous toxic events when a group of patients is treated at the same dose level, as happened in a recent phase I trial of the drug TGN1412; see [32]. In that trial, six volunteers were given what was believed to be a safe dose of TGN1412, an anti-inflammatory drug. Shortly afterwards, all six were admitted to intensive care with severe reactions, including swelling of the head and neck.

The simulation results were obtained by generating the toxicity responses from the logistic model (2.3). This assumption may not hold in practice, and the operating characteristics of EWOC may be sensitive to model misspecification. However, for the purpose of comparing designs with one and three patients per cohort, any misspecification of the model for the probability of a toxic response would affect the two designs in the same way.

In the second part of the paper, we addressed the SSD problem by giving tabulated values relating the number of patients to accrue in a cancer phase I clinical trial to the posterior standard deviation and the length of the HPD interval of the MTD, averaged over all possible trials. Although this aspect of the trial has received little emphasis in the literature because of the relatively small number of patients and the logistical issues associated with such trials, we felt that providing a measure of the accuracy of the MTD estimate that can be achieved for a given sample size would help clinicians understand what can and cannot be achieved during this phase of drug development. Our results show that, in general, there is a 17% decrease in the average posterior standard deviation of the MTD when the sample size increases from 6 to 40 patients, and that for a sample size of 20 patients, the average posterior standard deviation of the MTD is about one-fifth of the range of the dose levels. Although this decrease may seem modest, it depends on the prior distribution used for the MTD: a more informative prior based on past data would result in smaller average posterior standard deviations and narrower HPD intervals.

Acknowledgments

This work is supported in part by the National Center for Research Resources, Grant UL1RR033176, now at the National Center for Advancing Translational Sciences, Grant UL1TR000124 (M. Tighiouart and A. Rogatko), Grant 5P01CA098912-05 (A. Rogatko), and Grant P01 DK046763 (A. Rogatko). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.