Abstract

The clinical trial, a prospective study to evaluate the effect of interventions in humans under prespecified conditions, is a standard and integral part of modern medicine. Many adaptive and sequential approaches have been proposed for use in clinical trials to allow adaptations or modifications to aspects of a trial after its initiation without undermining the validity and integrity of the trial. The application of adaptive and sequential methods in clinical trials has significantly improved the flexibility, efficiency, therapeutic effect, and validity of trials. To further advance the performance of clinical trials and convey the progress of research on adaptive and sequential methods in clinical trial design, we review significant research that has explored novel adaptive and sequential approaches and their applications in Phase I, II, and III clinical trials and discuss future directions in this field of research.

1. Clinical Trials

Medicine is of paramount importance for human healthcare. Development of novel successful medicines is a lengthy, difficult, and expensive process which consists of laboratory experimentation, animal studies, clinical trials (Phase I, II, and III), and postmarket followup (Phase IV). Clinical trials are FDA-approved studies conducted in human beings to demonstrate the safety and efficacy of new drugs for health interventions under pre-specified conditions. A clinical trial is conducted in a sampled small population and the conclusions reached will be applied to a whole target population; therefore, statistics is an indispensable and critical component of clinical trial development and analysis, which has become increasingly important in contemporary clinical trials. As the gold standard for the evaluation of a new drug, every contemporary clinical trial must be well designed according to its specific purpose and conducted properly under governmental regulations. The major roles of a statistician in a clinical trial are to design an efficient trial with minimum cost and length and maximum therapeutic effect for patients in the trial, and to draw convincing conclusions by applying appropriate cutting edge statistical knowledge. In the past several decades, numerous groundbreaking novel statistical methodologies have been developed and applied to clinical trials and have significantly improved their performance. Consequently, clinical trials have evolved from simple observation studies to hypothesis-driven and well-designed prospective studies. At present, contemporary clinical trials have become the most important part of modern medicine.

2. Adaptive and Sequential Methods

Classical clinical trials are usually designed with a fixed sample size and schedule and do not use the information obtained from the ongoing trial. However, it has become increasingly common to modify the trial and/or its statistical procedures while the trial is under way. Modifiable elements include the patient eligibility and evaluation criteria, drug or treatment dosage and schedule, laboratory testing or clinical diagnosis, study endpoints, measurement of clinical response, formulation of study objectives into statistical hypotheses, choice of study design according to study purpose, calculation of minimum sample size, participant randomization, study monitoring with interim/futility analysis, the statistical data analysis plan, and the process of reaching conclusions. The purpose of such modifications is to improve the performance of a trial by promptly using the data accumulating within the trial as well as newly emerging related information from the literature.

Recently, adaptive and sequential clinical trials have become increasingly popular. The sequential method is a frequentist approach in which data are evaluated sequentially as they accumulate and the study is monitored so that it can be stopped as soon as a conclusion is supported by enough evidence. Adaptive design refers to the modification of aspects of the trial according to data accumulating during the progress of the trial, while preserving the integrity and validity of the trial. The modifiable aspects of adaptive trials include, but are not limited to, (a) sample size, (b) addition or removal of a study arm, (c) dose modification, and (d) treatment switch [1]. There are two types of adaptive methods in clinical trials, Bayesian and frequentist approaches [2]. The frequentist approach modifies the trial while controlling type I and type II errors. The Bayesian approach allows adaptation according to the predicted probability. The common characteristic of sequential and adaptive clinical trials is that the trial and/or statistical procedures are modified during the conduct of the trial according to the data accumulating during the trial. The sequential method mainly refers to sequentially monitoring the stopping criteria for futility and efficacy, while adaptive methods include modification of many more aspects of the trial as listed above, in addition to the decision of whether to stop the ongoing trial. Considerable novel statistical research has been conducted on the development of sequential and adaptive methods, especially for Phase I and II clinical trials. However, only some of these methods have actually been applied in the daily practice of real clinical trials. In the next three sections, we review significant sequential and adaptive methods that have been applied to Phase I, II, and III clinical trials and have had a high impact on the field of clinical trials.

3. Statistical Methodology of Phase I Clinical Trials

A Phase I trial is one of the most important steps in a drug’s development and is the first clinical trial in human subjects after laboratory and animal studies of a therapeutic agent have shown a potential curative effect on the disease. The sample size of a Phase I clinical trial is relatively small, typically in the range of twenty to eighty. It is a widely accepted assumption that the therapeutic effect of a drug depends on its toxicity and increases monotonically with its dosage level. Higher doses are associated with both more severe toxicity and better therapeutic effect. Therefore, a balance must be achieved between toxicity level and therapeutic benefit. To achieve the best therapeutic benefit, a patient should be treated, under close monitoring, with the maximum dosage of the drug whose associated toxicities the patient can tolerate. Among all toxicities patients experience, some are so severe that they limit dose escalation. These toxicities are called dose limiting toxicities (DLTs). In the National Cancer Institute (NCI) Common Toxicity Criteria, DLT is defined as a group of grade 3 or higher nonhematologic toxicities and grade 4 hematologic nontransient toxicities. The grades of all toxicities are classified as follows:
grade 0: no toxicity;
grade 1: mild toxicity;
grade 2: moderate toxicity;
grade 3: severe toxicity;
grade 4: life-threatening toxicity;
grade 5: death.

The main goals of a Phase I trial are to determine the dose-toxicity relationship of a new therapeutic agent and estimate the maximum tolerated dose (MTD) of the agent given the specified tolerable toxicity level. The highest acceptable DLT level is usually defined as a target toxicity level (TTL). It can be said that the TTL determines the MTD of the new therapeutic agent. A careful and thoughtful approach to the design of Phase I trials and accurate MTD estimation are essential for the fate of the new drug in subsequent clinical trials.

In a Phase I clinical trial, the well accepted assumption is that the probability of toxicity increases monotonically with increasing drug dose, although a decrease in the probability of toxicity at high dose levels could happen in some special cases, which are not common and not considered here. The toxicity-dose relationship can be described nonparametrically or parametrically. In the nonparametric description, the only assumption is that toxicity is nondecreasing with dose. In the parametric description, a distribution with some parameters is adopted to model the toxicity-dose curve. From a biological point of view, the human body has stabilization and self-salvage systems that protect the person from mild toxicity when the drug dose is below a certain threshold, but once these systems have been overcome the probability of toxicity increases at an accelerated speed, rapidly reaches the worst condition, death, and then levels off. Therefore a sigmoid-shaped curve is an appropriate model to describe the relationship between toxicity probability and dose. Many statistical designs have been proposed for Phase I clinical trials; the most commonly used are summarized and compared in Table 1. According to their algorithms, Phase I clinical trial designs can be grouped into two major categories, rule based designs and model based designs [3].

3.1. Rule Based Phase I Designs

All rule based designs follow a sequential approach. In rule based designs, a non-decreasing dose-toxicity relationship is the only assumption required. Therefore rule based designs are well suited for first-in-human clinical trials in which the dose-toxicity relationship is not well understood. Common rule based designs include the 3 + 3 design [4], isotonic design [5], and accelerated titration design [6].

The 3 + 3 designs are rule based up-and-down methods used in the Phase I protocol templates of the Cancer Therapy Evaluation Program (CTEP), whose mission is to improve the lives of cancer patients by sponsoring clinical trials to evaluate new anticancer agents, with a particular emphasis on translational research to elucidate molecular targets and mechanisms of drug effects. While 3 + 3 designs have become standard practice among many Phase I clinical trialists, they are not designed with the intention of producing accurate estimates of a target quantile. Rather, they are designed to screen drugs quickly and identify a dose level that does not exhibit too much toxicity in a very small group of patients. These designs fall into two categories, without dose de-escalation (Figure 1) and with dose de-escalation (Figure 2). In the design without dose de-escalation, three patients are assigned to the first dose level. If no DLT is observed, the trial proceeds to the next dose level and another cohort of three patients is enrolled. If at least two of the three patients experience a DLT, then the previous dose level is considered the MTD; if only one patient experiences a DLT, then three additional patients are enrolled at the same dose level. If at least one of the three additional patients experiences a DLT, then the previous dose is considered the MTD; otherwise, the dose is escalated. The design with dose de-escalation allows three new patients to be treated at a previous dose level if only three patients were treated at that level previously. Dose reduction continues until a dose level is reached at which six patients have been treated and at most one DLT is observed among the six. The MTD is defined as the highest dose level at which at most one of six patients experiences a DLT and for which the next higher dose level has at least two patients who experience DLTs. 
If the first dose is not tolerable, then the MTD cannot be established within the confines of the study. Hence, the MTD is identified from the data and is a statistic rather than a parameter. Storer (1989) was probably the first to examine the characteristics of the 3 + 3 design from the standpoint of the statistician [7]. The operating characteristics of the design were discussed in Lin and Shih (2001) [4]. Note that any design with sampling that is asymmetric about the MTD will yield a biased result; thus the standard 3 + 3 design, and all other designs that approach the MTD from below, will tend to yield a low estimate of the MTD. The 3 + 3 designs are simple and can usually determine a reasonable MTD and are thus the most widely used methods for Phase I clinical trials. However, they also have many shortcomings: the methods are not designed around a quantile of interest, not all toxicity data are used to determine the MTD, and the MTD is not a dose with any particular probability of toxicity. These disadvantages led to the exploration of extended isotonic design for Phase I clinical trials.
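The escalation rule without dose de-escalation described above (cohorts of three, commonly called the 3 + 3 design) can be sketched as a small simulation. This is only an illustrative sketch: the function names, the example toxicity probabilities, and the convention of reporting the highest dose as the MTD when escalation never stops are assumptions made here, not part of the published designs.

```python
import random


def three_plus_three(true_dlt_probs, rng):
    """Simulate one trial using the cohort-of-three rule without de-escalation.

    true_dlt_probs: assumed true DLT probability at each dose level (hypothetical input).
    Returns the index of the dose declared the MTD, or None when even the
    first dose is not tolerated (the MTD cannot be established).
    """
    level = 0
    last = len(true_dlt_probs) - 1
    while True:
        p = true_dlt_probs[level]
        dlts = sum(rng.random() < p for _ in range(3))  # first cohort of three
        if dlts == 1:
            # exactly one DLT: enroll three more patients at the same dose
            extra = sum(rng.random() < p for _ in range(3))
            too_toxic = extra >= 1
        else:
            too_toxic = dlts >= 2
        if too_toxic:
            return level - 1 if level > 0 else None
        if level == last:
            # assumption: report the top dose if it is never found too toxic
            return level
        level += 1


def mtd_distribution(true_dlt_probs, n_trials=2000, seed=1):
    """Tabulate how often each dose index is declared the MTD."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_trials):
        mtd = three_plus_three(true_dlt_probs, rng)
        counts[mtd] = counts.get(mtd, 0) + 1
    return counts


if __name__ == "__main__":
    print(mtd_distribution([0.05, 0.10, 0.20, 0.35, 0.50]))
```

Running the tabulation for several assumed toxicity curves is one way to see the tendency, noted above, for designs that approach the MTD from below to select doses on the low side.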

Leung and Wang (2001) were the first to introduce a semiparametric Phase I design, called isotonic design, in which a non-decreasing dose-toxicity relationship is the only required assumption [5]. In their isotonic design, the pool-adjacent-violators algorithm (PAVA) and isotonic regression are used to update the probability of DLT at each dose level after the toxicity response of each newly treated cohort has been obtained. The dose allocation rationale is to treat each new cohort at the dose level whose estimated probability of DLT is closest to the pre-specified target acceptable toxicity level. The trial stops when the same dose has been tested consecutively for a certain number of cohorts or a maximum number of patients have been treated. The MTD is the dose level that would be recommended for the next cohort based on all data available when the trial stops. Through simulation studies, the isotonic design was demonstrated to perform substantially better than the 3 + 3 design and comparably to the continual reassessment method (CRM) [8], Storer’s up-and-down designs, and the escalation with overdose control (EWOC) design [9]. Moreover, the isotonic design is model-free and especially appropriate in cases where the parametric dose-toxicity relationship is not well understood.
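To illustrate the isotonic step, the following is a minimal sketch of the pool-adjacent-violators algorithm applied to observed DLT counts, followed by the choice of the dose whose pooled estimate is closest to the target. It is not the full design of [5]: the function names and the example counts are hypothetical, every dose level is assumed to have at least one treated patient, and the published escalation and stopping rules are not reproduced.

```python
def pava_dlt_rates(dlts, treated):
    """Non-decreasing DLT rate estimates via pool-adjacent-violators.

    dlts[i] and treated[i] are the DLT count and number of patients at dose i
    (all treated[i] are assumed to be positive).
    """
    blocks = []  # each block: [total DLTs, total patients, dose levels pooled]
    for y, n in zip(dlts, treated):
        blocks.append([y, n, 1])
        # pool while the previous block's rate exceeds the current block's rate
        while len(blocks) > 1 and (
            blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]
        ):
            y2, n2, k2 = blocks.pop()
            blocks[-1][0] += y2
            blocks[-1][1] += n2
            blocks[-1][2] += k2
    rates = []
    for y, n, k in blocks:
        rates.extend([y / n] * k)
    return rates


def closest_dose(rates, target):
    """Index of the dose whose estimated DLT rate is closest to the target."""
    return min(range(len(rates)), key=lambda i: abs(rates[i] - target))


if __name__ == "__main__":
    # raw rates 0/3, 2/6, 1/6, 3/6 violate monotonicity at doses 1 and 2
    est = pava_dlt_rates([0, 2, 1, 3], [3, 6, 6, 6])
    print(est, closest_dose(est, target=0.25))
```

In the example, the second and third dose levels are pooled into a single estimate of 3/12, which restores the non-decreasing order without assuming any parametric curve.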

There are many other rule based designs. All rule based designs can estimate a reasonable MTD using a stopping rule based either on observed DLTs or on convergence criteria. Additional dose levels can also be added ad hoc when needed without any impact on their robustness. Most rule based designs are simple and easy to implement in practice. At present, 3 + 3 designs are still the most popular in Phase I clinical trials.

3.2. Model Based Designs

In model based designs, three parametric dose-toxicity functions (logistic model, hyperbolic model, and power function) are usually employed to depict the relationship between dose and toxicity. Model based designs often fail to find an MTD in first in human studies that are based on observed DLTs. The most common model based designs are CRM and EWOC. Their algorithms are illustrated in Figure 3.
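For concreteness, the three functional forms named above are often written as follows in the CRM literature. The notation is an assumption introduced here for illustration only: d denotes the (possibly rescaled) dose, p(d) the probability of DLT at dose d, and a and b unknown parameters constrained so that p(d) increases with d.

\[
\text{logistic: } p(d) = \frac{\exp(a + b\,d)}{1 + \exp(a + b\,d)}, \quad b > 0
\]
\[
\text{hyperbolic tangent: } p(d) = \left(\frac{\tanh d + 1}{2}\right)^{a}, \quad a > 0
\]
\[
\text{power: } p(d) = d^{\,a}, \quad 0 < d < 1,\ a > 0
\]

In each case the MTD corresponds to the dose at which p(d) equals the target toxicity level, so estimating the parameters is equivalent to estimating the MTD.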

O’Quigley et al. (1990) originally introduced the CRM, a Bayesian approach that fully and efficiently uses all data and prior information available in a Phase I study [8]. As in rule based designs, a TTL is specified and the goal is to estimate the dose associated with the TTL. A parametric model depicting the dose-toxicity relationship and a prior distribution for each unknown parameter of the model are required to implement CRM. The posterior mean of each parameter is computed from its prior and all available toxicity data and is then used to estimate the probability of toxicity at each dose level. This computation is repeated, and the estimated probability of toxicity at each dose level is updated with the cumulative toxicity data, whenever a new patient is recruited. The main idea of CRM is to treat each patient at the dose level whose estimated probability of toxicity is closest to the TTL. The MTD is defined as the dose level of the last patient treated in the trial. In the originally proposed CRM, a one-parameter dose-toxicity model and single-patient cohorts are used. Furthermore, in the original CRM the first patient is treated at a dose level determined purely by a guess, which makes the method impractical. Therefore, Korn et al. (1994) proposed a modified CRM in which the trial starts at the lowest dose level, no dose level can be skipped during dose escalation, and the trial stops when the same dose has been recommended for a new patient a fixed number of consecutive times [10]. However, patients may still be treated at excessively toxic doses in the modified CRM because each cohort consists of a single patient, and the study is still very long because the toxicity outcomes of all treated patients must be obtained before the dose level for the next patient can be calculated. In addition to the modification of Korn et al. (1994) [10], Faries (1994) [11], in his modified CRM, added another rule that no dose escalation is allowed for the next patient when the last patient has a DLT. 
This rule can avoid treating patients at overly toxic doses compared with the traditional 3 + 3 design. In order to address the ethical requirement that the probability of a patient being treated at an overdose stays below a pre-specified value, Babb et al. (1998) introduced an adaptive dose escalation scheme called EWOC [9]. The constraint on overdosing is an advantage of EWOC over the CRM, and its theoretical foundation was further elaborated by Zacks et al. (1998) [12]. A two-parameter logistic model was first used to depict the relationship between dose and DLT, and the joint posterior for its two parameters was then transformed to a joint posterior for the MTD and the probability of DLT at the lowest dose level. In addition to the overdose constraint, EWOC is designed to approach the MTD rapidly, so it starts from the lowest dose level and uses a single patient per cohort. After the toxicity response of the last enrolled patient has been obtained, the joint posterior for the MTD and the probability of DLT at the lowest dose level is updated using all the available information, and the next patient is treated at the 25th percentile of the marginal posterior for the MTD. The trial stops after a fixed number of patients have been treated, and the MTD is then computed as its posterior mean or estimated by minimizing the posterior expected loss under a loss function. For safety and to shorten the trial, no dose level can be skipped during the dose escalation procedure, and multiple-patient cohorts can be used instead in EWOC. Through simulation studies, EWOC has been shown to be effective in overdose control and to estimate the MTD with accuracy comparable to that of CRM. In EWOC, fewer patients are treated at nonoptimal dose levels, resulting in fewer DLTs, and the estimated MTD has smaller average bias and mean squared error than in some other nonparametric designs, such as four up-and-down designs and two stochastic approximation methods [9]. 
It seems that EWOC is a promising alternative design for Phase I clinical trials, especially when the ethical and safety requirement of overdose control is a particular concern. Both CRM and EWOC belong to adaptive dose finding designs in which a Bayesian approach is usually employed and the dose level for the new incoming cohort is adaptive based on the toxicity responses of the previously treated patients in the ongoing trial. Another adaptive dose design is the nonparametric adaptive urn design approach for estimating a dose-response curve [13].
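The Bayesian updating behind CRM can be sketched with a one-parameter power model, in which the probability of DLT at dose i is a prior guess (often called the skeleton) raised to the power a. The sketch below is illustrative only: the exponential prior with mean 1, the skeleton values, the grid integration, and the function name are assumptions chosen for simplicity, and the no-skipping and stopping rules of the modified CRM are not enforced.

```python
import math


def crm_recommend(skeleton, target, doses_given, dlt_observed,
                  n_grid=2000, a_max=10.0):
    """One CRM update under p_i(a) = skeleton[i] ** a with prior a ~ Exp(1).

    doses_given: dose index received by each treated patient.
    dlt_observed: 1 if that patient had a DLT, else 0.
    Returns (posterior mean of a, plug-in DLT estimates, recommended dose index).
    """
    step = a_max / n_grid
    grid = [(k + 0.5) * step for k in range(n_grid)]  # midpoint rule on (0, a_max)
    weights = []
    for a in grid:
        log_w = -a  # log prior density of Exp(1), up to a constant
        for d, y in zip(doses_given, dlt_observed):
            p = skeleton[d] ** a
            log_w += math.log(p) if y else math.log(1.0 - p)
        weights.append(math.exp(log_w))
    total = sum(weights)
    a_hat = sum(a * w for a, w in zip(grid, weights)) / total
    p_hat = [s ** a_hat for s in skeleton]
    best = min(range(len(skeleton)), key=lambda i: abs(p_hat[i] - target))
    return a_hat, p_hat, best


if __name__ == "__main__":
    skeleton = [0.05, 0.10, 0.20, 0.30, 0.50]  # hypothetical prior guesses
    # three patients treated at dose index 2, one of whom had a DLT
    print(crm_recommend(skeleton, 0.20, [2, 2, 2], [0, 1, 0]))
```

EWOC differs in the final step: instead of choosing the dose closest to the target, it selects a lower percentile of the posterior distribution of the MTD so that the posterior probability of overdosing stays below a pre-specified bound.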

All rule based designs are robust and simple to implement and usually give a reasonable MTD under certain rules. Applying a model such as isotonic regression to the data can improve the accuracy of the MTD. Model based designs require a parametric model of the dose-toxicity relationship and may greatly improve the probability of estimating the correct MTD compared with rule based designs when certain assumptions are satisfied. However, model based designs are not robust and should not be used unless their underlying assumptions can be met with confidence. The accuracy of the estimated MTD depends substantially on the number of observed DLTs, and the sample size is also an important factor. Overall, different designs, whether rule based or model based, usually perform similarly when they are similar in sample size and aggressiveness. Thus, simple designs, especially standard 3 + 3 designs, are still very popular in Phase I clinical trial practice.

The design of Phase I clinical trials can involve one or two stages. Rule based or model based designs can be implemented in each stage of two-stage designs. There are other critical issues in Phase I clinical trial designs, such as the operating characteristics of a design in terms of expected toxicity level [14], two-stage or multiple-stage Phase I designs, within-patient dose escalation, late toxicity, combination of multiple agents, balance between toxicity and efficacy, individual MTD, and full utilization of all toxicities [15, 16]. Some outstanding research studies have been conducted on these topics, which will not be elaborated on herein due to space constraints but have been described in several comprehensive review articles [3, 17–19].

4. Statistical Methodology of Phase II Clinical Trials

After the safety and MTD of an experimental drug have been established in a Phase I clinical trial, the drug enters Phase II clinical trials, which initially evaluate the drug’s therapeutic effects at the recommended MTD. Phase II trials are sometimes further classified as Phase IIa and IIb studies. Phase IIa trials screen the promising novel experimental agent for significant antidisease activity, and Phase IIb trials focus on the drug’s improved therapeutic effectiveness over the standard treatment. Phase II studies provide critical information for deciding whether further testing of the experimental drug in a large confirmatory Phase III trial is warranted. The surrogate endpoint used in Phase II clinical trials needs to be obtainable in a short time and should be able to assess the treatment’s primary benefit. For cancer trials, the experimental drug’s antitumor activity and the progression-free survival (PFS) of treated patients are often used as surrogates of the drug’s efficacy. The drug’s antitumor activity is measured as clinical response within a short period of time following the treatment and is classified as complete response (CR), partial response (PR), stable disease (SD), or progressive disease (PD). PFS, which is estimated as the time elapsed from the date of treatment to the date of disease progression or death, resembles the outcome (overall survival) of the subsequent Phase III clinical trial and is also widely used when it can be measured in a short time.

4.1. Single Arm Phase II Designs

The most commonly used Phase II clinical trial designs are summarized in Table 2. Phase II trials can involve either a single arm, which compares the new treatment with the standard response rate reported by historical data, or two or more arms with patients randomized among different treatments. In a single arm Phase II trial, two or multistage designs may be used to improve the trial efficiency and save resources with early termination of a futile trial. The interim analysis between the consecutive stages examines the accumulated data and decides whether the trial should stop as suggested by the early evidence of futility or should continue to next stage. The earliest two stage Phase II design was proposed by Gehan et al. in 1961 [20], in which a trial is terminated for futility when no patients enrolled in the first stage show any response or continues with the second stage, enrolling an additional number of patients to estimate a more accurate response rate with additional patient data. This design provides interim monitoring and can rule out ineffective drug with minimized sample size. This design is only appropriate for binary outcomes, which differ from the overall survival endpoint used in the following Phase III trial. Moreover, this design has no statistical testing on agents showing some promise and is not optimized. Therefore, Simon (1989) proposed an optimized two stage Phase II design by controlling both type I and type II errors as well as optimizing the sample sizes in both stages [21]. This design can quickly screen out agents without effectiveness while testing further agents with some promise. The design has two subtypes, optimal and minimax. The optimal subtype minimizes the expected overall sample size with the probability of the trial stopping after only the first stage so that it is appropriate for experimental drugs with a high probability of failure after the first stage. 
The minimax subtype minimizes the maximum possible sample size when the trial stops after completion of two stages so that it is better for highly promising experimental drugs. As with Gehan’s design, Simon’s two stage designs are only appropriate for binary outcomes. Other investigators have further proposed to conduct multiple interim analyses in Phase II clinical trials by using multistages. For example, Fleming (1982) [22] and Chang et al. (1987) [23] studied multiple testing and group sequential methods for Phase II trial designs. But the issue of inflating overall type I error needs to be considered in these kinds of Phase II designs.
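The operating characteristics of a two-stage design of this kind can be computed directly from binomial probabilities. The sketch below evaluates a given design rather than searching for the optimal or minimax one. The parameterization (stop for futility if at most r1 responses are seen among n1 patients; declare the drug unpromising if at most r responses are seen among all n patients) follows common usage, and the function names and the example response rates of 0.10 and 0.30 are assumptions made for illustration.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); zero when k is negative."""
    if k < 0:
        return 0.0
    return sum(binom_pmf(j, n, p) for j in range(0, min(k, n) + 1))


def two_stage_characteristics(r1, n1, r, n, p):
    """Return (probability of early termination, expected sample size,
    probability of declaring the drug promising) at true response rate p."""
    n2 = n - n1
    pet = binom_cdf(r1, n1, p)
    # continue only if stage 1 responses exceed r1; succeed if the total exceeds r
    p_promising = sum(
        binom_pmf(x1, n1, p) * (1.0 - binom_cdf(r - x1, n2, p))
        for x1 in range(r1 + 1, n1 + 1)
    )
    expected_n = n1 + (1.0 - pet) * n2
    return pet, expected_n, p_promising


if __name__ == "__main__":
    # design r1/n1 = 1/10, r/n = 5/29, evaluated at response rates 0.10 and 0.30
    print(two_stage_characteristics(1, 10, 5, 29, 0.10))
    print(two_stage_characteristics(1, 10, 5, 29, 0.30))
```

Evaluated at the uninteresting response rate, the third value is the type I error; evaluated at the desirable response rate, it is the power. The optimal subtype is the design that minimizes the expected sample size at the uninteresting rate among all designs meeting the error constraints, while the minimax subtype minimizes n.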

Among the single arm Phase II designs, another major group is Bayesian Phase II design. For example, Thall and Simon (1994) [24] proposed a Bayesian Phase II design which continuously examines the results after each new enrolled patient and determines whether the trial can stop with a solid decision on the efficacy of the experimental drug or should continue to enroll more patients and obtain enough data for making a decision. Lee and Liu (2008) [25] proposed a Bayesian approach called predictive probability Phase II design. This novel Bayesian design provides a flexible monitoring schedule for Phase II clinical trials which becomes more efficient and robust, but at the cost of intensive computation, and relies heavily on the statistician during the trial. Yin et al. (2011) further coupled the methods of predictive probability monitoring and adaptive randomization in a randomized Phase II trial and extensively compared this hybrid Bayesian approach with group sequential methods [26].
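The idea of predictive probability monitoring can be sketched with a beta-binomial model: given the responses observed so far, one averages over all possible future outcomes and adds up the probability of those outcomes that would lead to a positive conclusion at the end of the trial. The code below is a simplified illustration and not the full design of [25]: the Beta(1, 1) prior, the restriction to integer prior parameters (which allows the beta tail probability to be computed from a binomial sum), the threshold values, and the function names are all assumptions introduced here.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binom_pmf(i, m, a, b):
    """P(Y = i) when Y | p ~ Binomial(m, p) and p ~ Beta(a, b)."""
    return comb(m, i) * exp(log_beta(a + i, b + m - i) - log_beta(a, b))


def beta_tail(a, b, p0):
    """P(p > p0) for p ~ Beta(a, b) with positive integer a and b,
    using the identity with a Binomial(a + b - 1, p0) distribution."""
    n = a + b - 1
    return sum(comb(n, j) * p0 ** j * (1 - p0) ** (n - j) for j in range(a))


def predictive_probability(x, n, n_max, p0, theta_t, a0=1, b0=1):
    """Predictive probability of concluding efficacy at n_max patients,
    given x responses among the first n patients."""
    m = n_max - n                      # patients still to be enrolled
    a, b = a0 + x, b0 + n - x          # current posterior parameters
    pp = 0.0
    for i in range(m + 1):             # i = possible future responses
        if beta_tail(a + i, b + m - i, p0) > theta_t:
            pp += beta_binom_pmf(i, m, a, b)
    return pp


if __name__ == "__main__":
    # hypothetical interim looks: 3 and 8 responses among 20 of 40 patients
    print(predictive_probability(3, 20, 40, p0=0.20, theta_t=0.90))
    print(predictive_probability(8, 20, 40, p0=0.20, theta_t=0.90))
```

A trial monitored this way would typically stop for futility when the predictive probability falls below a small lower bound and might stop for efficacy when it exceeds a high upper bound; both bounds are design choices that are calibrated by simulation.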

4.2. Two or More Arm Phase II Designs

Some Phase II clinical trials may have two arms and randomization is frequently used to generate a reliable concurrent control arm and reduce biases. This kind of randomized Phase II trial is more similar to a Phase III trial. Randomized Phase II trials may reduce the so-called trial effect which often arises due to different patient populations, physician preferences, and medical environments between current and previous studies. But the sample size, trial length, and cost increase about 4-fold.
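The roughly 4-fold increase can be seen from a standard normal-approximation argument, sketched here under assumptions not stated in the text: a binary response with rate near \bar{p} in both settings, a target improvement \delta, one-sided significance level \alpha, power 1 - \beta, and equal allocation between two arms. In the single-arm trial the historical rate is treated as a known constant, whereas the randomized trial must estimate the control rate as well, which doubles the variance of the estimated difference.

\[
n_{\text{single}} \approx \frac{(z_{\alpha} + z_{\beta})^{2}\,\bar{p}(1-\bar{p})}{\delta^{2}},
\qquad
n_{\text{per arm}} \approx \frac{2\,(z_{\alpha} + z_{\beta})^{2}\,\bar{p}(1-\bar{p})}{\delta^{2}} = 2\,n_{\text{single}}
\]
\[
N_{\text{randomized}} = 2\,n_{\text{per arm}} \approx 4\,n_{\text{single}}
\]

Here z_{\alpha} and z_{\beta} denote the upper quantiles of the standard normal distribution. Each arm needs about twice as many patients as the single-arm trial, and there are two arms, giving about four times the total.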

There are several multiple-arm Phase II designs [27]. The Phase II “pick the winner” design is one in which each experimental regimen is compared with a historical control. No formal statistical comparison between groups is conducted, and the arm with the best observed outcome is simply selected as the winner of the trial. This design provides an efficient and effective way of comparing two or more experimental regimens but is not appropriate for evaluating the addition of an experimental agent to a standard regimen.

Phase II screening design is another Phase II design with multiple arms in which all experimental arms are compared with the standard treatment arm and all the experimental arms beating the standard treatment arm are winners. Therefore this design limits the sample size required for a randomized Phase II comparison and it is appropriate for testing the effect of adding an experimental agent to a standard regimen. However, it provides no statistical comparison between the selected (winning) arms.

Some investigators have proposed a novel Phase II randomized discontinuation design in which all patients receive the same treatment for a period of time and those with stable disease are randomized to continue or discontinue. This design is particularly appropriate when the treatment is known to have better therapeutic effects and it is ethical for all participants to benefit from it, or when the potential subgroup of patients who can benefit from the treatment is unknown before receiving it. However, this design requires a large number of patients to be treated with a treatment not effective for them. Therefore this design has specific applications but is not widely used.

Conventionally, Phase II and III trials are conducted separately in a sequential order and only an experimental drug that has successfully passed a Phase II trial can enter a Phase III trial. The resulting gap between trials and time lag may be unnecessary under certain circumstances. Therefore, a seamless Phase II/III design has been proposed, which uses Phase II data in a Phase III trial and minimizes delay in starting up the Phase III study [28, 29]. Usually the Phase II part is a randomized Phase II trial which uses a concurrent control. This nonstop Phase II/III design is particularly useful for new drugs showing efficacy. It usually requires large sample sizes and requires a Phase III infrastructure to be developed even if it stops early.

4.3. Other Advanced Topics in Phase II Designs

Categorical tumor response has been the most common endpoint in Phase II clinical trial designs. However, from a statistical standpoint, categorizing a continuous percentage change in tumor size into a tumor response with 4 levels results in a loss of study power because not all available data are fully used. Several publications have extensively studied the direct use of continuous tumor shrinkage as the primary endpoint for the measurement of drug efficacy in Phase II clinical trials [30–32]. The success rate of Phase III oncology trials remains very low (e.g., 50–60%) despite the success demonstrated in the preceding Phase II trials [30]. The relationship between tumor response/tumor shrinkage percentage and overall survival as the gold standard for drug efficacy has been revisited [33]. PFS has the advantage of short follow-up time [34] and has been confirmed as the best estimate of overall survival [35], so PFS is recommended as the primary endpoint over categorical tumor response in Phase II clinical trials when feasible.

5. Statistical Methodology of Phase III Clinical Trials

If an experimental agent exhibits adequate short-term therapeutic effects in a Phase II trial, the drug is moved forward to a Phase III study for confirmatory testing of its long-term effectiveness. The typical endpoint in a Phase III trial is a time-to-event measurement, such as progression-free survival or overall survival. Phase III trials are large scale in terms of sample size, resources, efforts, and costs. This phase collects a large amount of data over a long period of follow-up to evaluate the ultimate therapeutic effect of a new drug. The design of Phase III clinical trials has become a very important research field in order to improve the performance of these critical clinical trials. The most commonly used Phase III clinical trial designs are summarized in Table 3.

5.1. Randomization

The earliest design of Phase III clinical trials is a single-arm study design using historical controls from the literature, existing databases, or medical charts. This kind of Phase III design accommodates ethical considerations and can increase enrollment because patients are assured of receiving the new therapy. In addition, such trials take less time and cost less, making this type of trial a good choice for the initial testing of new treatments, or when disease diagnosis is clearly established, prognosis is well known, or the disease is highly fatal. This Phase III design, however, provides no comparison with concurrent control group data and is vulnerable to biases because disease and mortality rates change over time and literature controls are particularly poor. Phase III trials conducted using this design tend to exaggerate the value of a new treatment. In order to avoid bias and eliminate time trends, a nonrandomized design with a concurrent control was then proposed and implemented for Phase III clinical trials. In this design, treatment selection is not constrained by randomization. It is easier to select a group to receive the intervention and to select controls matching key characteristics. Therefore, this design can reduce costs and is relatively simple and easily acceptable to both the investigator and the participant. However, in this Phase III design the intervention and control groups may not be comparable because of selection bias and differences between the group populations. It is difficult to prove comparability because it is impractical to have information on all important prognostic factors and to match on several factors. Whether unknown or unmeasured factors exist in large studies is also uncertain. Post hoc covariance analysis is not adequate to offset the imbalance between groups.

To eliminate bias, facilitate the masking of treatments, and permit the use of statistical theory, randomization has been employed widely in Phase III clinical trials [36]. There are two major types of randomization approaches: nonadaptive and adaptive. Simple randomization, block randomization, and stratified randomization are nonadaptive. Simple randomization is robust against both selection and accidental bias, but it can produce imbalanced group sizes in small trials and is therefore considered appropriate for randomized controlled trials (RCTs) with more than 200 subjects [37]. Block randomization guarantees balanced group sizes by pre-specifying the block size and allocation ratio and allocating subjects randomly within each block [33]. Block randomization is often used with “stratified randomization” in small RCTs. There are several adaptive randomization approaches: adaptive biased coin, covariate adaptive, and response adaptive [33]. Adaptive biased-coin randomization decreases the probability of assignment to an overrepresented group and increases the probability of assignment to an underrepresented group; it thereby reduces imbalance in group size and is less affected by selection bias than permuted-block randomization. Randomization can also adapt to covariates so that the groups are balanced in size across the levels of several covariates. The most common covariate-adaptive randomization approaches are Taves’s method [38], the Pocock and Simon method [39], and Frane’s method [40] for both continuous and categorical covariates. Overall, covariate-adaptive randomization can reduce imbalance further, and handle more covariates simultaneously, than the combination of block and stratified randomization [41]. Randomization can also adapt to response or outcome in order to increase the therapeutic effect of the trial, taking ethical considerations into account. 
Response-adaptive randomization assigns more patients to the better treatment by skewing the allocation probability toward the group showing the more favorable response as trial data accumulate, while maintaining a specified study power [41]. The most common approaches to response-adaptive randomization are the urn model, the biased coin design, and the Bayesian approach [34]. Each randomization approach has its own merits and limitations, and the choice of method depends on the specific study purpose.
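The block and biased-coin ideas described above can be sketched in a few lines of Python. This is a minimal illustration only, not an implementation taken from the cited references: the function names, the two arm labels, the block size of 4, and the favoring probability of 2/3 (the value classically associated with Efron's biased coin) are all assumptions chosen for the example.

```python
import random

def permuted_block_assignments(n_subjects, block_size=4, arms=("A", "B"), rng=None):
    """Permuted-block randomization with equal allocation across arms.

    Hypothetical helper: every complete block contains each arm equally often,
    so group sizes are balanced at the end of each block.
    """
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = rng or random.Random()
    per_arm = block_size // len(arms)
    assignments = []
    while len(assignments) < n_subjects:
        block = list(arms) * per_arm   # balanced block, e.g. A B A B
        rng.shuffle(block)             # random order within the block
        assignments.extend(block)
    return assignments[:n_subjects]

def biased_coin_assignments(n_subjects, p_favor=2 / 3, rng=None):
    """Biased-coin randomization for two arms (assumed Efron-type rule).

    When the arms are balanced a fair coin is used; otherwise the
    underrepresented arm is chosen with probability p_favor.
    """
    rng = rng or random.Random()
    counts = {"A": 0, "B": 0}
    assignments = []
    for _ in range(n_subjects):
        if counts["A"] == counts["B"]:
            p_a = 0.5
        elif counts["A"] < counts["B"]:
            p_a = p_favor
        else:
            p_a = 1 - p_favor
        arm = "A" if rng.random() < p_a else "B"
        counts[arm] += 1
        assignments.append(arm)
    return assignments
```

Passing a seeded `random.Random` instance makes an allocation list reproducible, which is convenient for checking a scheme before use; a real trial would typically rely on a validated randomization system rather than a sketch like this.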

5.2. Randomized Controlled Phase III Trials

Randomization removes potential bias in group allocation. Used together, randomization and a concurrent control produce comparable groups and make conclusions more convincing. Blinding, where feasible, minimizes bias after randomization. At present, the standard form of a Phase III trial is a double-blind, randomized, placebo-controlled clinical trial (RCT). The control arm may be a placebo or the standard of care; a placebo is acceptable only if no better or standard therapy is available. Interim monitoring is also often considered for a long-term confirmatory RCT. Because it guarantees the validity of statistical tests and comparisons, the RCT is generally regarded as the “gold standard” for verifying the efficacy of new drugs. However, RCTs still have limitations: for example, subjects may not represent the general patient population, sample size and cost increase substantially, the randomization process may not be widely accepted, and the administrative process may be complex. Besides the conventional fixed-sample Phase III trial, in which a single data analysis is conducted at the end of the study, RCT designs with additional analyses before the final analysis can be divided, according to their statistical algorithms and characteristics, into two distinct categories: sequential RCT designs and Bayesian adaptive RCT designs.

5.2.1. Group Sequential RCT Design

The scheme of the group sequential design is summarized in Figure 4. In this design, type I and type II errors are explicitly controlled while the study hypotheses are tested, and patients continue to be enrolled and randomized until a decision on the primary hypothesis can be reached. To design a Phase III clinical trial with the group sequential method, the total number of stages, the sample size, and the stopping criterion at each stage for testing the null hypothesis must be pre-specified before the trial starts, along with the usual specifications of a conventional Phase III clinical trial. At each interim stage, all data accumulated up to that point are analyzed and the test statistic is compared with the critical values generated from the sequential design to determine whether the trial should stop or continue. If the trial passes all interim analyses, a conclusion on the primary hypothesis must be reached at the final stage.
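The monitoring procedure described above can be sketched as a simple loop. This is an illustrative outline under stated assumptions, not the procedure of any particular trial: it assumes a continuous outcome, equal per-arm sample sizes at each look, a large-sample two-sample z statistic, two-sided efficacy boundaries only, and critical values supplied by the user from whichever sequential design was chosen. All function and argument names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def z_statistic(treatment, control):
    """Two-sample z statistic (large-sample approximation, unequal variances)."""
    se = sqrt(stdev(treatment) ** 2 / len(treatment)
              + stdev(control) ** 2 / len(control))
    return (mean(treatment) - mean(control)) / se

def group_sequential_monitor(treatment, control, looks, boundaries):
    """Analyze accumulated data at each look and stop at the first crossing.

    looks:      per-arm sample sizes at which analyses take place (last = final)
    boundaries: critical |z| value for each look, taken from the chosen design
    """
    if not looks or len(looks) != len(boundaries):
        raise ValueError("looks and boundaries must be non-empty and equal in length")
    z = None
    for stage, (n, crit) in enumerate(zip(looks, boundaries), start=1):
        # Use all data accumulated up to this look, not just the newest group.
        z = z_statistic(treatment[:n], control[:n])
        if abs(z) >= crit:
            return {"stage": stage, "z": z, "decision": "reject H0"}
    # All interim looks were passed and the final boundary was not crossed.
    return {"stage": len(looks), "z": z, "decision": "fail to reject H0"}
```

In practice a futility boundary and safety review would usually sit alongside the efficacy boundary; they are omitted here to keep the sketch short.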

Multiple testing during a sequential trial may inflate the type I error, which can be controlled using the Pocock approach [42], the O’Brien-Fleming approach [43], or an alpha spending function [44]. The Pocock approach was the first method for group sequential testing with a given overall type I error and power; it applies the same nominal significance level at every interim and final analysis. For example, in a clinical trial with 2 interim analyses and 1 final analysis and an overall two-sided significance level of 0.05, the Pocock procedure uses the same cut-off for all three analyses, and the trial can stop and claim a positive outcome if the P value is less than 0.022 at any of the analysis times. One obvious drawback of the Pocock approach is its relatively high probability of stopping the trial early. To make early stopping harder and to keep the final P value threshold close to the overall significance level, such as 0.05, the O’Brien-Fleming approach [43] uses a very strict cut-off at the beginning and then relaxes it over time. In the trial above, the P value thresholds for the first and second interim analyses are 0.0005 and 0.014, respectively, and the threshold for the final analysis is 0.045, which is close to 0.05. Both the Pocock and O’Brien-Fleming approaches maintain the overall type I error by paying a penalty at the final analysis, but the O’Brien-Fleming method involves much less of a penalty at the planned conclusion of the study because it requires stricter standards earlier. Both methods have limitations: both require a pre-specified maximum number of patients, a pre-specified number of interim analyses, and equal increments of information between interim stages. Therefore, DeMets and Lan [44] (1994) introduced a spending function approach to relax the requirement of equal increments of information. 
The approach spends the allowable type I error rate over time according to a chosen spending principle and the amount of information accrued, and it allows an interim analysis to be dropped or added during the conduct of the trial. Several types of spending functions have been proposed in the literature. Besides the Pocock-type and O’Brien-Fleming-type error spending functions proposed by Lan and DeMets, the gamma error spending function [45] proposed by Hwang, Shih, and DeCani and the power error spending function [46] proposed by Jennison and Turnbull are also commonly used in clinical trials. Because the conclusions drawn at the interim and final analyses depend heavily on the pre-specified boundaries, the choice of spending function is very important and depends on the specific purpose of the trial and its associated clinical program. In addition to efficacy, the safety profile of the drug is also an important factor when considering early stopping of a trial.
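As a rough numerical companion to the boundaries and spending functions discussed above, the sketch below does two things. First, it converts boundary constants into nominal two-sided P value thresholds; the constants themselves (for example 2.289 for Pocock and 2.004 for O’Brien-Fleming with three equally spaced analyses at an overall two-sided level of 0.05) are taken here as tabulated inputs and are not derived. Second, it evaluates the four spending functions named in the text in their commonly quoted forms. The function names and the default values of `gamma` and `rho` are assumptions made for illustration.

```python
from math import exp, log, sqrt
from statistics import NormalDist

N = NormalDist()  # standard normal distribution

def two_sided_p(z):
    """Nominal two-sided P value corresponding to a critical value z."""
    return 2 * (1 - N.cdf(abs(z)))

def pocock_nominal_p(c, n_looks):
    """Pocock: the same critical value c, hence the same P threshold, at every look."""
    return [two_sided_p(c)] * n_looks

def obrien_fleming_nominal_p(c, n_looks):
    """O'Brien-Fleming: critical value c * sqrt(K / k) at look k of K."""
    return [two_sided_p(c * sqrt(n_looks / k)) for k in range(1, n_looks + 1)]

# Cumulative type I error spent at information fraction t (0 < t <= 1).
def spend_obf(t, alpha=0.05):
    """O'Brien-Fleming-type spending function of Lan and DeMets."""
    return 2 * (1 - N.cdf(N.inv_cdf(1 - alpha / 2) / sqrt(t)))

def spend_pocock(t, alpha=0.05):
    """Pocock-type spending function of Lan and DeMets."""
    return alpha * log(1 + (exp(1) - 1) * t)

def spend_gamma(t, alpha=0.05, gamma=-4.0):
    """Gamma family of Hwang, Shih, and DeCani (gamma = 0 is linear spending)."""
    if gamma == 0:
        return alpha * t
    return alpha * (1 - exp(-gamma * t)) / (1 - exp(-gamma))

def spend_power(t, alpha=0.05, rho=3.0):
    """Power family: alpha * t ** rho."""
    return alpha * t ** rho
```

Each spending function returns the full `alpha` at `t = 1`. Turning the spent error into actual stopping boundaries at unequally spaced looks requires numerical integration over the joint distribution of the test statistics, which is outside the scope of this sketch.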

The major advantages of the group sequential RCT design are its ability to prevent unnecessary exposure of patients to an unsafe or ineffective new drug or to a placebo treatment, and to save time and resources by stopping the trial early for efficacy, futility, or safety. The sequential RCT design is suitable for acute responses, paired subjects, and continuous testing. It is especially appropriate for dichotomous (yes/no) decisions because the result of the trial is declared significant or not according to a pre-specified significance level (type I error). Although the sequential RCT is the most widely used design in Phase III clinical trials, it has some limitations. A sequential RCT may require a larger sample size than a Bayesian adaptive RCT as a result of additional variability and the comparison of multiple treatments with similar efficacies. A sequential RCT is somewhat adaptive through its interim monitoring and stopping rules, but it requires prespecification of all possible study outcomes, which inhibits full adaptation to, and use of, newly accumulated data from the ongoing trial.

5.2.2. Bayesian RCT Design

Bayesian randomized clinical trials are trials in which Bayesian approaches are applied extensively to some or all of the trial processes, including randomization, monitoring, interim and futility analysis, final analysis, and adaptive decisions. Berry and Kadane [47] proposed optimal Bayesian randomization in 1997, and the practical uses of Bayesian adaptive randomization in clinical trials have been reviewed by Thall and Wathen [48]. Bayesian monitoring has been used frequently in some Phase III clinical trials, especially in those with failure time endpoints [49]. Bayesian analysis in clinical trials has become increasingly common because it can borrow strength from outside the study [50]. Bayesian adaptive decisions in clinical trials can be based on a posterior probability or predictive probability of trial success or on the result of the Bayesian final analysis. Bayesian adaptive decisions have been compared with frequentist sequential approaches [51], and some studies [52-54] have proposed Bayesian decision-theoretic approaches for optimizing designs under various settings.
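To make the idea of a decision based on a posterior probability concrete, here is a small sketch. It is not drawn from the cited studies and rests on several assumptions: a binary response endpoint (rather than the failure time endpoints mentioned above), independent Beta(1, 1) priors on the two response rates, Monte Carlo evaluation of the posterior probability, and illustrative cut-offs of 0.99 for efficacy and 0.05 for futility. All names are hypothetical.

```python
import random

def posterior_prob_superior(resp_t, n_t, resp_c, n_c,
                            prior=(1.0, 1.0), draws=20000, seed=0):
    """Monte Carlo estimate of P(p_treatment > p_control | data).

    With a Beta(a, b) prior and r responders out of n patients, the posterior
    for each response rate is Beta(a + r, b + n - r).
    """
    rng = random.Random(seed)
    a0, b0 = prior
    wins = 0
    for _ in range(draws):
        p_t = rng.betavariate(a0 + resp_t, b0 + n_t - resp_t)
        p_c = rng.betavariate(a0 + resp_c, b0 + n_c - resp_c)
        wins += p_t > p_c
    return wins / draws

def interim_decision(prob_superior, efficacy_cut=0.99, futility_cut=0.05):
    """Map a posterior probability to an interim recommendation (assumed cut-offs)."""
    if prob_superior >= efficacy_cut:
        return "stop for efficacy"
    if prob_superior <= futility_cut:
        return "stop for futility"
    return "continue"
```

For example, `interim_decision(posterior_prob_superior(30, 40, 10, 40))` evaluates an interim look with 30 of 40 responders on treatment and 10 of 40 on control. In a real design the cut-offs would be calibrated by simulation so that the frequentist operating characteristics of the trial are acceptable.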

Bayesian RCT design is adaptive by nature, in the sense of dynamic learning: it prespecifies how to combine all available data accumulated during the study, calculate probabilistic estimates of uncertainty, control the probability of false-positive and false-negative conclusions, and change the study design accordingly [55]. A Bayesian adaptive RCT design can not only compare multiple active treatments but also allow the ongoing trial to add newly emerging effective interventions, discontinue interventions shown by accumulated within-trial data to be less effective, or focus on patient subgroups, identified by certain biomarkers, for whom interventions are more (or less) effective. As a result, the trial tests the most current interventions, improves clinical relevance, and targets biomarkers that predict response to alternative interventions. Using existing external data from previous studies at the design stage, and accumulated within-trial data to update the design, results in a smaller sample size, shorter duration, and reduced cost for a Bayesian adaptive RCT [56]. However, Bayesian RCTs may be criticized as being too subjective, not well planned, or too complicated.

Both Bayesian and sequential RCT designs have their advantages and disadvantages. Rather than favoring either Bayesian or sequential methods as a matter of principle, statisticians and investigators should choose the Phase III design that best fits the goals of the trial and is most likely to provide the best performance.

5.2.3. Adaptive Sample Size Calculation and Adaptive Stopping

In the planning stage of a Phase III clinical trial, sample size is one of the most important factors to consider because the trial budget depends on the minimum required sample size. The sample size is usually fixed, but adaptive clinical trials often use an adaptive sample size calculation in which the sample size is adjusted based on the data observed at the interim analysis [1]. Sample size determination depends on the expected treatment difference and its standard deviation; however, accumulating data from the ongoing trial or from other newly completed studies often suggest that the initial estimates were too large or too small. In this case, keeping the original sample size would lead to an underpowered or overpowered trial, so the sample size should be adjusted according to the updated effect size. There are several approaches to sample size adjustment, based on the criteria of treatment effect size, conditional power, and/or reproducibility probability [57-61]. The observed treatment effect and estimated standard deviation at the interim analysis are based on a limited number of subjects and may not be statistically significant. Therefore, these estimates should not be weighted too heavily, and the targeted clinically meaningful difference should always be fully considered in the adaptive sample size calculation.
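A simple version of this adjustment can be sketched with the standard normal-approximation formula for comparing two means, n per arm = 2 (z_{1-alpha/2} + z_{power})^2 sigma^2 / delta^2. The sketch keeps the targeted clinically meaningful difference fixed and updates only the standard deviation from interim data, in line with the caution above about not over-weighting the interim effect estimate. The function names, the rule of never going below the planned size, and the cap `n_max` are assumptions for illustration rather than a method from the cited references.

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(delta, sigma, alpha=0.05, power=0.80):
    """Per-arm sample size for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) * sigma / delta)^2
    """
    z = NormalDist().inv_cdf
    return ceil(2 * ((z(1 - alpha / 2) + z(power)) * sigma / delta) ** 2)

def reestimate_n(delta_target, sigma_interim, n_planned, n_max,
                 alpha=0.05, power=0.80):
    """Recompute the per-arm sample size using the interim standard deviation.

    Assumed rules for this sketch: the target difference is not changed,
    the sample size is never reduced below the planned value, and it is
    capped at n_max for feasibility.
    """
    n_new = n_per_arm(delta_target, sigma_interim, alpha, power)
    return min(max(n_new, n_planned), n_max)
```

With a target difference of 0.5 and a planning standard deviation of 1.0, `n_per_arm(0.5, 1.0)` gives the planned size; calling `reestimate_n(0.5, 1.2, n_planned, n_max)` then shows how a larger interim standard deviation increases the requirement. Methods based on conditional power or reproducibility probability, as mentioned in the text, need additional safeguards to preserve the type I error and are not covered here.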

The fate of an ongoing Phase III trial is determined at meetings of its data monitoring committee (DMC), which makes recommendations based on the available data according to the stopping rules in the statistical guidelines. The common factors considered in stopping rules are safety, efficacy, futility, benefit-risk ratio, the weighting of short-term against long-term treatment effects, and conditional power or predictive power [1]. Current tools for monitoring Phase III trials are stopping boundaries, conditional and predictive power, the futility index, repeated confidence intervals, and Bayesian monitoring tools. Even though stopping rules are usually stipulated at the design stage, adaptive stopping is becoming more and more common because of unpredicted events during the conduct of the trial, such as a change in the DMC meeting date when committee members are unavailable, patient accrual that differs from the plan, and deviations from the analysis schedule. Moreover, the true variability of the parameters used to construct the stopping boundaries is never known, and the preliminary results of ongoing trials very commonly show that the initial estimates of variability and treatment effect made at the design stage were inaccurate. These deviations can substantially affect the stopping boundaries, so adaptive stopping becomes especially desirable in these cases. To stop a trial prematurely under an adaptive stopping algorithm, the thresholds for the number of subjects randomized must be met and the relevant rules (such as utility rules, futility rules, etc.), expressed as boundaries, must be satisfied.
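Conditional power, one of the monitoring tools listed above, can be illustrated with the Brownian-motion formulation of the interim test statistic. The sketch below is a simplified illustration under stated assumptions: a one-sided test at level 0.025, a final critical value taken from a fixed-sample design (ignoring any adjustment for interim looks), and, by default, the "current trend" assumption that the effect observed so far continues. The function name and arguments are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

N = NormalDist()  # standard normal distribution

def conditional_power(z_interim, info_frac, alpha=0.025, drift=None):
    """Probability of crossing the final critical value given interim data.

    z_interim: z statistic observed at the interim analysis
    info_frac: fraction t of the total information accrued (0 < t < 1)
    drift:     assumed expected final z value; if None, the current trend
               estimate z_interim / sqrt(t) is used
    """
    if not 0 < info_frac < 1:
        raise ValueError("info_frac must lie strictly between 0 and 1")
    c = N.inv_cdf(1 - alpha)              # final critical value (assumed fixed-design)
    b_t = z_interim * sqrt(info_frac)     # B(t) = Z(t) * sqrt(t)
    if drift is None:
        drift = b_t / info_frac           # current trend: theta_hat = B(t) / t
    # B(1) - B(t) is normal with mean drift * (1 - t) and variance 1 - t.
    mean_final = b_t + drift * (1 - info_frac)
    return 1 - N.cdf((c - mean_final) / sqrt(1 - info_frac))
```

A low conditional power under both the current trend and the design alternative is a typical argument for stopping for futility, whereas `drift=0` gives the conditional power under the null hypothesis. Any threshold used in practice would be set in the DMC charter; none is implied here.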

6. Concluding Remarks

Clinical trials remain an indispensable component of new drug development. Novel statistical approaches have been applied to clinical trials and have significantly improved their performance in every step from design, conduct, and monitoring to data analysis and drawing final conclusions. As modern medicine progresses, increasingly complex requirements and factors need to be considered in clinical trials, which in turn create new challenges for statisticians. In the future, more novel statistical approaches, frequentist and Bayesian, should be developed to enhance the performance of clinical trials in terms of therapeutic effect, safety, accuracy, efficiency, simplicity, and validity of conclusions and to expedite the development of effective new drugs to improve human healthcare.

Acknowledgments

This work is supported in part by NIH/NCI Grants no. 1 P01 CA116676 (Z. Chen), P30 CA138292-01 (Z. Chen and J. Kowalski), and 5 P50 CA128613 (Z. Chen), and by NSA Grant H98230-12-1-0209 (Y. Zhao).