Abstract

If there are no carryover effects, AB/BA crossover designs are more efficient than parallel (A/B) and extended parallel (AA/BB) group designs. This study extends these results in that (a) optimal instead of equal treatment allocation is examined, (b) allowance is made for treatment-dependent outcome variances, and (c) in addition to treatment effects, treatment by period interaction effects are examined. Starting from a linear mixed model analysis, the optimal allocation requires knowledge of the intraclass correlations in A and B, which typically is rather vague. To solve this, maximin versions of the designs are derived, which guarantee a power level across plausible ranges of the intraclass correlations at the lowest research costs. For the treatment effect, an extensive numerical evaluation shows that if the treatment costs of A and B are equal, or if the sum of the costs of one treatment and one measurement per person is less than the remaining subject-specific costs (e.g., recruitment costs), the maximin crossover design is most efficient for ranges of intraclass correlations starting at 0.15 or higher. For other cost scenarios, the maximin parallel or extended parallel design can also become most efficient. For the treatment by period interaction, the maximin AA/BB design can be proven to be the most efficient. A simulation study supports these asymptotic results for small samples.

1. Introduction

The standard design of a randomized clinical trial is the parallel group design: subjects are randomly assigned to one of two treatments, say A or B. An alternative, well-known design is the AB/BA crossover trial, in which subjects receive both treatments, A and B, but the order of the treatments is opposite for two randomly allocated groups [1, 2]. An AB/BA crossover trial is considered most suited for examining treatments for chronic or ongoing diseases, such as rheumatism, chronic obstructive pulmonary disease, or (frequent) heartburn. In these cases, there is no real possibility that the disease will be cured, and the aim is to moderate its effects [2]. A third design that we will consider involves the treatment sequences AA and BB. This design extends the parallel design across two treatment periods, allows for testing treatment by time interaction effects, and is a realistic alternative to the AB/BA design when the treatment regimen should be continued.

If the outcome variable is continuous and (approximately) normally distributed, the data can be analyzed by mixed effects regression [3]. Of primary interest is testing the treatment effect of, for instance, a new medication for chronic obstructive pulmonary disease. A relevant issue then is which design is the most efficient in estimating the treatment effect, thereby yielding maximum power for testing this effect. Such optimality has already been examined when comparing crossover and parallel designs [2] and when comparing all three designs introduced above [4, 5]. If there are no carryover effects and no dropouts, and if the sample sizes are equal and subjects are equally allocated to the treatments, an AB/BA design yields more efficient estimates of the treatment effect than a parallel and an extended parallel design and, consequently, more power to test this effect.

The present study extends results on the relative efficiencies of these designs in that (a) optimal instead of equal treatment allocation is examined, (b) allowance is made for treatment-dependent outcome variances, and (c) in addition to treatment effects, treatment by period interaction effects are examined. Outcome variances may differ between treatments [6, 7]; this is also to be expected if treatments differ in their effectiveness. Furthermore, since research costs and outcome variances may differ between treatments, equal allocation to treatments may not be the most efficient. The issue then is how to allocate subjects to treatments such that a design's efficiency is optimized, and how the designs compare in efficiency under such optimal allocation. Optimal allocation requires a priori knowledge of the parameters of the analysis model, that is, of the intraclass correlations in the mixed effects model that we consider. Since this knowledge typically is rather vague, optimal allocations and corresponding efficiencies will be derived for maximin versions of the (extended) parallel design and the crossover design. These maximin designs guarantee a power level across plausible ranges of the intraclass correlations at the lowest research costs.

In designs where treatments are successively given to the same group of subjects, carryover may occur. For the AB/BA trial, it may be that, in the AB sequence, treatment A still has an effect on the outcome when B has been given and the second measurement is taken. If, in the BA sequence, the effect of B is still present once A has been administered, and this carryover effect differs from that in the AB sequence, differential carryover occurs. The present study assumes that differential carryover can be safely excluded or is negligible, so that this effect does not need to be estimated in analyzing the data.

The paper is structured as follows. Section 2 will present the linear mixed model for analyzing data from each of the three designs. Section 3 will introduce the efficiency criterion and will provide asymptotic expressions for this criterion in the case of maximum likelihood estimation of the treatment effect. Starting from a flexible cost function, optimal allocations to treatments will be derived, as well as the resulting design efficiencies. Since the efficiencies depend on the intraclass correlations and knowledge of these parameters is often limited, we will derive maximin designs in Section 4. Section 5 will show to what extent the asymptotic efficiencies translate into desired power levels for small sample sizes. Section 6 will give an application of the results, and Section 7 will discuss some issues for further research.

2. Linear Mixed Effects Models

In each of the three designs, subjects are randomly allocated to one of two arms: to treatment A or treatment B in a parallel design, to treatment sequence AA or BB in an extended parallel design, and to treatment sequence AB or BA in a crossover trial. We consider a quantitative outcome variable, denoted as $y_{ij}$ for person $j$ ($j = 1, \ldots, N$) at measurement occasion $i$, and assume that $y_{ij}$ is (approximately) normally distributed.

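For a parallel design and outcome variances that differ between treatments A and B, simple linear regression with heterogeneous variances may be an adequate tool for data analysis:

$$y_{1j} = \beta_0 + \beta_1\,\mathrm{treatment}_{j} + \varepsilon_j, \qquad \varepsilon_j = \begin{cases} \varepsilon_{Aj} & \text{for treatment A,}\\ \varepsilon_{Bj} & \text{for treatment B,} \end{cases} \qquad (1)$$

where treatment is coded 0 for persons having treatment A and 1 for persons having treatment B, and $\varepsilon_{Aj}$ and $\varepsilon_{Bj}$ are normally distributed, with mean 0 and variances $\sigma^2_A$ and $\sigma^2_B$, respectively. The random terms $\varepsilon_{Aj}$ and $\varepsilon_{Bj}$ can be thought of as consisting of a random person (between-subject) effect, $u_j$, and a treatment-dependent random error (within-subject) effect, $e_{Aj}$ and $e_{Bj}$. In formula, $\varepsilon_{Aj} = u_j + e_{Aj}$ and $\varepsilon_{Bj} = u_j + e_{Bj}$. These two sources of random variation cannot be separated in a single-period parallel trial.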

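For a crossover AB/BA design and an extended, two-period, parallel AA/BB design, however, the variances of $u_j$ and of $e_{Aij}$ and $e_{Bij}$ can be identified. The linear regression model can then be extended with a random intercept as well as a fixed effect of time, yielding the following mixed effects model:

$$y_{ij} = \beta_0 + \beta_1\,\mathrm{treatment}_{ij} + \beta_2\,\mathrm{time}_{ij} + u_j + e_{ij}, \qquad e_{ij} = \begin{cases} e_{Aij} & \text{if A is given at occasion } i,\\ e_{Bij} & \text{if B is given at occasion } i. \end{cases} \qquad (2)$$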

In (2), time is coded 0 for observations at the first measurement and 1 for observations at the second measurement. The random terms $u_j$, $e_{Aij}$, and $e_{Bij}$ are independently normally distributed, with mean 0 and variances $\sigma^2_u$, $\sigma^2_{eA}$, and $\sigma^2_{eB}$, respectively. Their relation with the variances in (1) is $\sigma^2_A = \sigma^2_u + \sigma^2_{eA}$ for treatment A and $\sigma^2_B = \sigma^2_u + \sigma^2_{eB}$ for treatment B.

If we want to examine whether there is an interaction between treatment and period, the model in (2) is extended as follows:

$$y_{ij} = \beta_0 + \beta_1\,\mathrm{treatment}_{ij} + \beta_2\,\mathrm{time}_{ij} + \beta_3\,\mathrm{treatment}_{ij}\times\mathrm{time}_{ij} + u_j + e_{ij}, \qquad (3)$$

where $\beta_3$ represents the treatment by period interaction effect. The parameters in (1)–(3) can be estimated through maximum likelihood (ML). In what follows, we are interested in optimally estimating $\beta_1$ in (1) and (2), which will be denoted as $\delta$, and in optimally estimating $\beta_3$ in (3), which will be denoted as $\tau$. A relevant concept is the intraclass correlation, which is the between-subject variation in the outcome relative to the total outcome variation. For the models in (2) and (3), this can be expressed as $\rho_A = \sigma^2_u/(\sigma^2_u + \sigma^2_{eA})$ and $\rho_B = \sigma^2_u/(\sigma^2_u + \sigma^2_{eB})$ for treatments A and B, respectively. The larger the person (between-subject) variance relative to the error (within-subject) variance, the larger the intraclass correlations. Note that we assume a common between-subject variance but allow for treatment-dependent within-subject variances, leading to treatment-dependent intraclass correlations. We also define a variance ratio $\varphi = \sigma^2_A/\sigma^2_B$, which can be expressed as a ratio of the intraclass correlations, $\varphi = \rho_B/\rho_A$.
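As a small numerical illustration of these definitions, the sketch below computes the total variances, the intraclass correlations, and the variance ratio from the three variance components of (2) and (3). It assumes the variance ratio is taken as treatment A over treatment B, so that it equals the ratio of the intraclass correlations of B over A; the function name is hypothetical.

```python
def variance_summary(s2_u, s2_eA, s2_eB):
    """Total variances, intraclass correlations and variance ratio implied by
    a common between-subject variance s2_u and treatment-specific
    within-subject variances s2_eA and s2_eB (models (2) and (3))."""
    s2_A = s2_u + s2_eA      # total outcome variance under treatment A
    s2_B = s2_u + s2_eB      # total outcome variance under treatment B
    rho_A = s2_u / s2_A      # intraclass correlation under A
    rho_B = s2_u / s2_B      # intraclass correlation under B
    phi = s2_A / s2_B        # variance ratio, taken here as A over B
    return {"s2_A": s2_A, "s2_B": s2_B, "rho_A": rho_A, "rho_B": rho_B, "phi": phi}


# A noisier treatment B lowers the intraclass correlation under B.
summary = variance_summary(s2_u=1.0, s2_eA=1.0, s2_eB=3.0)
print(summary)  # rho_A = 0.5, rho_B = 0.25, phi = 0.5 (= rho_B / rho_A)
```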

3. Optimal Allocations and Corresponding Design Efficiencies

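Let $\operatorname{var}(\hat\theta \mid d)$ denote the variance of the estimator $\hat\theta$ of $\theta$, where $\theta$ is the treatment effect $\delta$ in (1) or (2) or the treatment by period interaction effect $\tau$ in (3), given a design $d$. The efficiency of an estimator of $\theta$ is defined as the inverse of its variance, that is, $\operatorname{eff}(\hat\theta \mid d) = 1/\operatorname{var}(\hat\theta \mid d)$. In the sequel, we will consider the efficiency of one design, $d_1$, versus another design, $d_2$, which is defined as $\operatorname{RE}(d_1, d_2) = \operatorname{eff}(\hat\theta \mid d_1)/\operatorname{eff}(\hat\theta \mid d_2) = \operatorname{var}(\hat\theta \mid d_2)/\operatorname{var}(\hat\theta \mid d_1)$ and denoted as the relative efficiency. Since no closed-form expressions are available for the variances of the ML estimator, asymptotic variances of the ML estimator were derived (Appendices A.1 and A.2).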

The optimal allocation to treatments minimizes the variance of the estimator of $\delta$ in (1) or (2) and of $\tau$ in (3), given a fixed research budget. Note that changing the coding of the treatment factor or the time factor in (1)–(3), for instance into 1 versus −1 instead of 1 versus 0, will not affect the optimal allocation. Such a change of coding leads to a linear transformation of $\delta$ or $\tau$, which changes the variance of their estimators only by a multiplicative constant. This implies that allocations that minimize the variance of the estimators do not depend on the coding of treatment and time.

To derive the optimal allocations under a budget restriction, we need to define a budget function. Let the costs involved with each subject in the parallel design be csp euros, in an extended parallel design be csep euros, and in a crossover design be csc euros. These costs may represent financial rewards given to subjects for participating in the trial but also the (average) costs of recruiting a subject. Furthermore, for treatments A and B there are, for each subject, costs cA and cB, respectively, and each measurement may involve ct euros. Finally, attached to each treatment sequence, there may be administration costs cts.

With allocation proportions pA for treatment A and pB = 1 − pA for treatment B in a parallel design having $N_p$ subjects, the following budget is required:

$$B = N_p\,[\,c_{sp} + p_A(c_A + c_t) + p_B(c_B + c_t)\,] + 2c_{ts}. \qquad (4)$$

For the designs that we consider, this budget function can be reparametrized such that it is the same as the cost function given by Yuan and Zhou [8], thereby generalizing the cost function proposed by Brown [9] and Berger and Wong [4].

For an AB/BA crossover design involving $N_c$ subjects, with allocation proportions pAB for treatment sequence AB and pBA = 1 − pAB for treatment sequence BA, and noting that each subject receives both treatments A and B and is measured twice, the following budget is required:

$$B = N_c\,(c_{sc} + c_A + c_B + 2c_t) + 2c_{ts}. \qquad (5)$$

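Finally, the required budget for an AA/BB design involving $N_{ep}$ subjects, with allocation proportions pAA and pBB = 1 − pAA for the treatment sequences AA and BB, respectively, is as follows:

$$B = N_{ep}\,[\,c_{sep} + 2p_{AA}(c_A + c_t) + 2p_{BB}(c_B + c_t)\,] + 2c_{ts}. \qquad (6)$$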

Note that, for the functions in (4)–(6), the budget may simply be the total number of observations involved in a study, by setting ct = 1 and the other costs to 0. It can also represent the total number of subjects involved, by setting csp = csep = csc = 1 and the remaining costs to 0.

In what follows, we assume that the subject-specific costs of the two two-period designs are the same; that is, csep = csc = cs_2p. Since subjects in these designs receive two treatments and a washout period may be involved, these costs are very likely larger than those of a parallel design. We also assume that the subject-specific costs of the two-period designs do not exceed twice the subject-specific costs of the parallel design, so that csp ≤ cs_2p ≤ 2csp. Finally, since each design involves two treatment sequences, the administration costs are the same for each of the three designs, and thus the budgets that are available for the remaining costs are identical; that is, the budget C = B − 2cts, where B is the total budget, is the same for each design.
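The budget functions (4)–(6) can be coded directly. This is a sketch under our reading of those functions: the treatment costs cA and cB are charged per administration (so a subject in sequence AA incurs 2cA), ct per measurement, and cts once per treatment sequence; all function and argument names are hypothetical.

```python
def budget_parallel(n, p_A, c_sp, c_A, c_B, c_t, c_ts):
    """Budget (4): n subjects, fraction p_A on A, one treatment and one measurement each."""
    p_B = 1.0 - p_A
    return n * (c_sp + p_A * (c_A + c_t) + p_B * (c_B + c_t)) + 2 * c_ts


def budget_crossover(n, c_sc, c_A, c_B, c_t, c_ts):
    """Budget (5): every subject receives A and B and is measured twice,
    so the split between AB and BA does not affect the costs."""
    return n * (c_sc + c_A + c_B + 2 * c_t) + 2 * c_ts


def budget_extended_parallel(n, p_AA, c_sep, c_A, c_B, c_t, c_ts):
    """Budget (6): subjects in AA (BB) receive A (B) twice and are measured twice."""
    p_BB = 1.0 - p_AA
    return n * (c_sep + 2 * p_AA * (c_A + c_t) + 2 * p_BB * (c_B + c_t)) + 2 * c_ts


# With c_t = 1 and all other costs 0, each budget counts measurements.
print(budget_parallel(60, 0.25, 0, 0, 0, 1, 0))           # 60 measurements
print(budget_crossover(60, 0, 0, 0, 1, 0))                # 120 measurements
print(budget_extended_parallel(60, 0.25, 0, 0, 0, 1, 0))  # 120 measurements
```

Setting the subject-specific cost to 1 and everything else to 0 instead makes each function return the number of subjects, the other special case mentioned in the text.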

3.1. Treatment Effect

For treatment effect estimation, the optimal allocations to the treatment sequences are derived in Appendix B. The optimal allocations and corresponding (asymptotic) variances of the treatment effect estimators are shown in the second and third columns of Table 1, respectively. The optimal allocation ratios of the parallel and the extended parallel design depend on the costs and intraclass correlations: the more expensive treatment A (or the cheaper treatment B), and the larger the intraclass correlation in treatment A (or the smaller the intraclass correlation in treatment B), the more subjects should be assigned to treatment B. The optimal allocation ratio for a crossover design is 1, as may be expected, since both groups receive both treatments A and B.
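The qualitative statements above can be reproduced with closed forms. These are our own derivation under models (1) and (2) and the budgets (4)–(6), using the Cauchy–Schwarz inequality for the two-group designs, and are not a transcription of Table 1: for AA/BB the treatment effect is estimated from subject means over the two periods, and for AB/BA from the within-subject differences. Function names and the example costs are hypothetical.

```python
import math


def optimal_two_group(s2_1, s2_2, k_1, k_2, budget):
    """Minimize s2_1/(N p) + s2_2/(N (1 - p)) subject to N (p k_1 + (1 - p) k_2) = budget.

    By the Cauchy-Schwarz inequality the optimum is p/(1 - p) = sqrt(s2_1 k_2 / (s2_2 k_1)),
    with minimum variance (sqrt(s2_1 k_1) + sqrt(s2_2 k_2))**2 / budget."""
    r = math.sqrt(s2_1 * k_2 / (s2_2 * k_1))
    p = r / (1.0 + r)
    var = (math.sqrt(s2_1 * k_1) + math.sqrt(s2_2 * k_2)) ** 2 / budget
    return p, var


def treatment_effect_designs(s2_u, s2_eA, s2_eB, c_sp, c_s2p, c_A, c_B, c_t, C):
    """(allocation to A / AA / AB, variance of the treatment-effect estimator) per design.

    C is the budget net of the administration costs, i.e. B - 2 c_ts."""
    s2_A, s2_B = s2_u + s2_eA, s2_u + s2_eB
    # A/B: one measurement per subject, so each arm contributes its total variance.
    parallel = optimal_two_group(s2_A, s2_B, c_sp + c_A + c_t, c_sp + c_B + c_t, C)
    # AA/BB: analyse subject means over the two periods, Var = s2_u + s2_e / 2.
    extended = optimal_two_group(s2_u + s2_eA / 2, s2_u + s2_eB / 2,
                                 c_s2p + 2 * (c_A + c_t), c_s2p + 2 * (c_B + c_t), C)
    # AB/BA: equal allocation; beta_1-hat is half the difference between the mean
    # within-subject differences of AB and BA, so Var = (s2_eA + s2_eB) / N.
    n_c = C / (c_s2p + c_A + c_B + 2 * c_t)
    crossover = (0.5, (s2_eA + s2_eB) / n_c)
    return {"A/B": parallel, "AA/BB": extended, "AB/BA": crossover}


# Hypothetical example: treatment A costs 4 units per administration, B costs nothing.
for design, (p, var) in treatment_effect_designs(1.0, 1.0, 1.0, 1, 1, 4, 0, 0, 100.0).items():
    print(f"{design:6s} allocation to first arm/sequence = {p:.3f}, variance = {var:.4f}")
```

In the example, the parallel and extended parallel designs put fewer subjects on the expensive treatment A, while the crossover allocation stays at one half, in line with the text.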

3.2. Treatment by Period Interaction Effect

When the treatment by period interaction effect is of primary interest, the optimal allocations can be derived along lines similar to the derivations for the treatment effect (Appendix B). The allocations and corresponding optimal variances are displayed in Table 1. Note that, as for treatment effect estimation, the allocation ratio for a crossover design is 1, whereas the allocation ratio for an extended parallel design depends on the treatment costs and intraclass correlations: more persons are allocated to treatment sequence AA if the intraclass correlation of A decreases, the intraclass correlation of B increases, the costs of treatment A decrease, or the costs of treatment B increase.
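The four monotonicity statements can be checked numerically. The sketch below rests on our own derivation under model (3) with 0/1 coding (not a transcription of Table 1): in AA/BB the interaction is the difference between the mean within-subject changes of BB and AA, and in AB/BA it is the difference between the mean subject sums of AB and BA. Names and parameter values are hypothetical.

```python
import math


def interaction_designs(s2_u, s2_eA, s2_eB, c_s2p, c_A, c_B, c_t, C):
    """(allocation, variance of tau-hat) for AA/BB and AB/BA under model (3), 0/1 coding."""
    # AA/BB: tau-hat = mean(y2 - y1 | BB) - mean(y2 - y1 | AA); Var(y2 - y1) = 2 s2_e.
    k_AA, k_BB = c_s2p + 2 * (c_A + c_t), c_s2p + 2 * (c_B + c_t)
    r = math.sqrt(s2_eA * k_BB / (s2_eB * k_AA))          # optimal p_AA / p_BB
    p_AA = r / (1.0 + r)
    var_ext = 2 * (math.sqrt(s2_eA * k_AA) + math.sqrt(s2_eB * k_BB)) ** 2 / C
    # AB/BA: tau-hat = mean(y1 + y2 | AB) - mean(y1 + y2 | BA), equal allocation;
    # Var(y1 + y2) = 4 s2_u + s2_eA + s2_eB in both sequences.
    n_c = C / (c_s2p + c_A + c_B + 2 * c_t)
    var_cross = 4 * (4 * s2_u + s2_eA + s2_eB) / n_c
    return {"AA/BB": (p_AA, var_ext), "AB/BA": (0.5, var_cross)}


base = interaction_designs(1.0, 1.0, 1.0, 1, 1, 1, 0, 100.0)["AA/BB"][0]
# Each change listed in the text should move subjects towards sequence AA.
changes = {
    "lower rho_A (larger s2_eA)": interaction_designs(1.0, 2.0, 1.0, 1, 1, 1, 0, 100.0),
    "higher rho_B (smaller s2_eB)": interaction_designs(1.0, 1.0, 0.5, 1, 1, 1, 0, 100.0),
    "cheaper treatment A": interaction_designs(1.0, 1.0, 1.0, 1, 0.5, 1, 0, 100.0),
    "more expensive treatment B": interaction_designs(1.0, 1.0, 1.0, 1, 1, 2, 0, 100.0),
}
for label, result in changes.items():
    print(f"{label}: p_AA = {result['AA/BB'][0]:.3f} (baseline {base:.3f})")
```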

4. Maximin Designs

Choosing the optimal allocation requires knowledge of the intraclass correlations $\rho_A$ and $\rho_B$ (recall that the variance ratio is fixed once $\rho_A$ and $\rho_B$ are given). Commonly, there is only limited knowledge of these parameters. A possible solution is the maximin strategy [4], which consists of two steps: (1) for each design, determine the minimum efficiency of the effect estimator across the plausible ranges for the intraclass correlations $\rho_A$ and $\rho_B$, and (2) choose the design that maximizes this minimum efficiency. Such a design optimizes a worst-case scenario and is called a maximin design. Equivalently, the maximin strategy chooses the design that minimizes the maximum variance of the estimator of the effect of interest. In determining sample sizes, choosing the values of $\rho_A$ and $\rho_B$ within their plausible ranges (and thus a variance ratio within its plausible range) for which the variance is maximal guarantees the desired power level for all other values of these parameters as well. Moreover, the maximin design guarantees this power level at the lowest research costs. In what follows, we will refer to ranges of $\rho_A$ and $\rho_B$ with lower bounds $\rho_{A,\min}$ and $\rho_{B,\min}$ and upper bounds $\rho_{A,\max}$ and $\rho_{B,\max}$, respectively.
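The two-step strategy can be approximated on a grid. The sketch below is ours, not the derivation of Appendix C. It assumes that variances are scaled so that $(\sigma^2_A + \sigma^2_B)/2 = 1$, and it uses the variance expressions of our own derivation for the treatment effect (A/B on single scores, AA/BB on subject means, AB/BA on within-subject differences). For each design it finds the allocation whose worst case over the intraclass correlation grid is smallest (step 1), then picks the design with the smallest worst case (step 2). Grid sizes, ranges, and names are hypothetical.

```python
import itertools


def scaled_variances(rho_A, rho_B):
    """Variance components for given ICCs, scaled so that (s2_A + s2_B) / 2 = 1."""
    s2_A = 2 * rho_B / (rho_A + rho_B)
    s2_B = 2 * rho_A / (rho_A + rho_B)
    s2_u = rho_A * s2_A                       # common between-subject variance
    return s2_A, s2_B, s2_u, s2_A - s2_u, s2_B - s2_u


def budget_times_variance(design, p, rho_A, rho_B, cost):
    """C * var(treatment-effect estimator) for allocation p to A, AA or AB."""
    c_sp, c_s2p, c_A, c_B, c_t = cost
    s2_A, s2_B, s2_u, s2_eA, s2_eB = scaled_variances(rho_A, rho_B)
    if design == "A/B":
        v1, v2 = s2_A, s2_B
        k1, k2 = c_sp + c_A + c_t, c_sp + c_B + c_t
    elif design == "AA/BB":
        v1, v2 = s2_u + s2_eA / 2, s2_u + s2_eB / 2
        k1, k2 = c_s2p + 2 * (c_A + c_t), c_s2p + 2 * (c_B + c_t)
    else:  # "AB/BA": half the difference of the mean within-subject differences
        v1 = v2 = (s2_eA + s2_eB) / 4
        k1 = k2 = c_s2p + c_A + c_B + 2 * c_t
    return (v1 / p + v2 / (1 - p)) * (p * k1 + (1 - p) * k2)


def grid(lo, hi, n=11):
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]


def maximin_designs(range_A, range_B, cost, n_grid=11):
    """Grid approximation of the two-step maximin strategy."""
    rhos = list(itertools.product(grid(*range_A, n_grid), grid(*range_B, n_grid)))
    allocations = [i / 100 for i in range(1, 100)]
    worst = {}
    for design in ("A/B", "AA/BB", "AB/BA"):
        # Step 1: worst case over the ICC ranges for each allocation; keep the
        # allocation whose worst case is smallest.
        worst[design] = min(
            (max(budget_times_variance(design, p, ra, rb, cost) for ra, rb in rhos), p)
            for p in allocations
        )
    # Step 2: the maximin design has the smallest worst-case variance.
    best = min(worst, key=lambda d: worst[d][0])
    return best, worst


# Hypothetical ranges; costs chosen so that the budget counts subjects.
best, worst = maximin_designs((0.1, 0.7), (0.3, 0.9), cost=(1, 1, 0, 0, 0))
print(best, worst)
```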

4.1. Treatment Effect

From the asymptotic variances in Table 1, one can derive for which values of $\rho_A$ and $\rho_B$ (and thus for which value of the variance ratio) the variance of the treatment effect estimator is maximized. These derivations are given in Appendix C. The maximin parameter values and corresponding variances for the treatment effect estimator under optimal allocation to the treatments are shown in Table 2. The corresponding optimal allocations for the maximin designs are obtained by substituting the maximin parameter values of Table 2 into the allocation ratios given in Table 1.

If, for a parallel design, the maximin value of the variance ratio $\sigma^2_A/\sigma^2_B$, namely $(c_{sp} + c_A + c_t)/(c_{sp} + c_B + c_t)$, lies within the plausible range of this ratio, that is, within $[\rho_{B,\min}/\rho_{A,\max},\ \rho_{B,\max}/\rho_{A,\min}]$, and cs_2p ≤ 2csp, then a parallel design is always less efficient than a maximin crossover design. If, for an extended parallel design, the maximin value of one of the intraclass correlations is within the plausible range of that intraclass correlation, then this design is also less efficient than a maximin crossover design. For other scenarios, the relations between the maximin designs are more complicated, depending on the ranges for $\rho_A$ and $\rho_B$ and on the costs of treatments, subject recruitment, and measurement.
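The first statement can be checked with a short derivation. This is our own sketch, not the derivation of Appendix C. It assumes the variances are scaled by their mean $\bar\sigma^2 = (\sigma^2_A + \sigma^2_B)/2$, writes $\varphi = \sigma^2_A/\sigma^2_B = \rho_B/\rho_A$, and uses the shorthand $k_A = c_{sp} + c_A + c_t$, $k_B = c_{sp} + c_B + c_t$, and $k_c = c_{s,2p} + c_A + c_B + 2c_t$ (so that $C = N_c k_c$ in the crossover design).

```latex
\begin{aligned}
\operatorname{var}_{A/B}(\hat\delta)
  &= \frac{\bigl(\sqrt{\sigma_A^2 k_A}+\sqrt{\sigma_B^2 k_B}\bigr)^2}{C}
   = \frac{2\bar\sigma^2}{C}\,\frac{\bigl(\sqrt{\varphi k_A}+\sqrt{k_B}\bigr)^2}{1+\varphi}
   \;\le\; \frac{2\bar\sigma^2\,(k_A+k_B)}{C},
   \quad\text{with equality iff } \varphi=\varphi^{*}=k_A/k_B,\\[4pt]
\operatorname{var}_{AB/BA}(\hat\delta)
  &= \frac{(\sigma_{eA}^2+\sigma_{eB}^2)\,k_c}{C}
   = \frac{(2\bar\sigma^2-2\sigma_u^2)\,k_c}{C}
   \;<\; \frac{2\bar\sigma^2\,k_c}{C}
   \;\le\; \frac{2\bar\sigma^2\,(k_A+k_B)}{C}
   \quad\text{if } c_{s,2p}\le 2c_{sp}.
\end{aligned}
```

The first line is the Cauchy–Schwarz inequality applied to $(\sqrt{\varphi}, 1)$ and $(\sqrt{k_A}, \sqrt{k_B})$. At $\varphi^{*}$ the optimal allocation is 1:1, and under 1:1 allocation the parallel variance equals the right-hand bound for every $\varphi$. So when $\varphi^{*}$ is plausible, this bound is the maximin variance of the parallel design, and every crossover variance stays below it.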

A systematic numerical evaluation was done to examine under which conditions the crossover design is the best choice in terms of efficiency. For $\rho_A$ and $\rho_B$, we consider ranges of width 0.10 (small), 0.30 (medium), and 0.60 (large). The lower bounds were {…}, where the largest possible lower bound was determined by the width of the range under consideration. For instance, if the range is 0.30 (medium), the largest lower bound of the intraclass correlation is 0.70. All combinations of small, medium, and large ranges for $\rho_A$ and $\rho_B$ were considered. The values of the variance ratio thus considered vary from 1/100 to 100. Since in most crossover trials the intraclass correlation exceeds 0.30 [13, 10, 11], ranges with lower bounds of 0.30 or higher are empirically most relevant. Empirical evidence on the costs cA, cB, ct, csp, and cs_2p is scarce, and we therefore chose costs covering a wide range of scenarios. Let CRA = (cA + ct)/csp, CRB = (cB + ct)/csp, and CRp = cs_2p/csp (note that the relative efficiencies of the maximin designs depend only on these cost ratios). CRA and CRB take on the values 100, 20, 10, 1, 0.1, 0.05, and 0.01. For CRp, we consider 1 and 2.

When the treatment costs are identical between the treatment arms, that is, CRA = CRB, the crossover maximin design turns out to be most efficient for most scenarios examined. For CRp = 1 and CRA = CRB ≤ 1, the crossover design is always the most efficient. For CRp = 1 and CRA = CRB > 1, or for CRp = 2, the parallel design can become most efficient only if the lower bound of one of the intraclass correlations is 0.05 or lower and the ranges of the intraclass correlations do not overlap. Since in most empirical studies the intraclass correlations will exceed 0.05, this implies that, for equal treatment costs, the crossover maximin design will almost always be the most efficient design.

When the treatment costs differ and CRA ≤ 1 and CRB ≤ 1, the parallel or the extended parallel maximin design can become most efficient only if the lower bound of one of the intraclass correlations is 0.10 or smaller. The extended parallel design can only become most efficient if CRp = 1. Hence, in all scenarios with unequal treatment costs and CRA ≤ 1 and CRB ≤ 1, the maximin crossover design is most efficient for ranges of intraclass correlations with lower bounds of 0.15 or higher.

When the treatment costs differ and CRA > 1 or CRB > 1, the maximin crossover design is less often most efficient. For these cost scenarios, the maximin parallel and extended parallel designs may become more efficient, even for ranges of intraclass correlations with lower bounds exceeding 0.15. This especially occurs if the costs of treatment A and the lower bound of the range of $\rho_A$ are both larger (or both smaller) than the costs of treatment B and the lower bound of the range of $\rho_B$, respectively. The efficiency gain is large if treatment A is much more expensive than treatment B and if the costs of treatments and measurements are large compared with the subject-related costs. This is illustrated in Figure 1. The top row shows that if the costs of treatment A are larger than the costs of treatment B and the lower bound of $\rho_A$ is larger than the lower bound of $\rho_B$, a parallel design is most efficient, even up to an upper bound 1 of  if CRA = 100. As can also be seen, the upper bound of  is not very relevant in terms of the relative efficiencies. The left plot of the middle row of Figure 1 shows that if the lower bounds of $\rho_A$ and $\rho_B$ are equal, then for almost all upper bounds of , the crossover design is most efficient. Again, as can be seen in the rightmost plot of the middle row, if the lower bound of  is higher than the lower bound of , then for higher upper bounds of , the parallel design is most efficient, but to a lesser extent than for a smaller lower bound of . As is evident from the four subplots in the top and middle rows, when the ratio CRA/CRB increases, the crossover design becomes less efficient compared with the other two designs. The subplots of the bottom row furthermore show that the crossover design also becomes less efficient compared with the other designs if CRA and CRB increase while the ratio CRA/CRB remains constant.
This illustrates that the efficiency of the other designs relative to the crossover design increases when the costs of treatments and measurements are large compared with the subject-related costs. To summarize, however, if the treatment costs differ and CRA > 1 or CRB > 1, no simple rules of thumb emerge, and the most reliable way to choose the most efficient design is to calculate the maximin variances given in Table 2.

Finally, if CRp = 2, the maximin parallel design is consistently more efficient than the maximin extended parallel design (as is illustrated in Figure 1). If CRp = 1, the maximin extended parallel design can also become more efficient than the maximin parallel design.

4.2. Treatment by Period Interaction Effect

The maximin parameter values and corresponding variances of the estimator of the treatment by period interaction effect are shown in Table 3. The derivations of these results follow lines similar to those for the treatment effect estimator (Appendix C). The optimal allocation for the extended parallel design is obtained by substituting the maximin parameter values into the expression for the allocation ratio in Table 1. For a crossover design, the allocation ratio is 1. The maximin efficiency of an extended parallel design is always higher than that of a crossover design if the maximin value of $\rho_A$ is within the plausible range for $\rho_A$. This follows from an upper bound on the maximin variance of the extended parallel design, whose right-hand side is in turn smaller than the variance of a maximin crossover design (Table 3). The higher maximin efficiency of the extended parallel design can also be shown to hold if the maximin value of $\rho_B$ is within the plausible range for $\rho_B$. Furthermore, if the variance-maximizing values of $\rho_A$ and $\rho_B$ are outside the plausible ranges for $\rho_A$ and $\rho_B$, respectively, then values that coincide with one of the borders of the corresponding ranges should be chosen as the values that maximize the variance. In that case, even smaller variances result for the extended parallel design.
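A compact way to see why AA/BB wins for the interaction is the following sketch. It is based on our own derivation under model (3) with 0/1 coding, not on Appendix C. In AA/BB, $\hat\tau$ is the difference between the mean within-subject changes $y_2 - y_1$ of sequences BB and AA (variance $2\sigma^2_{eA}$ or $2\sigma^2_{eB}$ per subject). In AB/BA, it is the difference between the mean subject sums $y_1 + y_2$ of sequences AB and BA (variance $4\sigma^2_u + \sigma^2_{eA} + \sigma^2_{eB}$ per subject). With $k_c = c_{s,2p} + c_A + c_B + 2c_t$, even a 1:1 allocation in AA/BB, which costs $k_c$ per subject on average, gives, for every $\rho_A$ and $\rho_B$,

```latex
\operatorname{var}_{AA/BB}\bigl(\hat\tau \mid p_{AA}=\tfrac12\bigr)
  = \frac{4\,(\sigma_{eA}^2+\sigma_{eB}^2)\,k_c}{C}
  \;<\; \frac{4\,(4\sigma_u^2+\sigma_{eA}^2+\sigma_{eB}^2)\,k_c}{C}
  = \operatorname{var}_{AB/BA}\bigl(\hat\tau \mid p_{AB}=\tfrac12\bigr).
```

Because the inequality holds pointwise, the worst case over any plausible ranges is also smaller for the 1:1 AA/BB design, and choosing the maximin allocation instead can only lower the worst-case variance of AA/BB further.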

4.3. Maximin Designs That Minimize the Number of Subjects and Number of Measurements

As noted in Section 3, by setting csp = csep = csc = 1 and the remaining costs to 0 in (4)–(6), the budget is simply the total number of subjects involved in a study, and by setting ct = 1 and the other costs to 0, the budget reduces to the total number of measurements. When the budget is the total sample size and interest is in estimating the treatment effect, it can be proven, based on the formulas in Table 2, that a maximin crossover design requires fewer subjects than a maximin parallel design. According to an extensive numerical evaluation analogous to that of Section 4.1, a maximin crossover design also appears to require fewer subjects than a maximin extended parallel design.

When the number of measurements is minimized, the numerical evaluation again shows that the maximin crossover design is the best choice, provided the lower bounds of both intraclass correlations are 0.10 or higher. In other cases, a maximin parallel design may also minimize the total number of measurements. Since in most crossover trials the intraclass correlation exceeds 0.30 [13, 10, 11], in practice, this implies that the maximin crossover trial is also the best choice when minimizing the number of measurements.

When interest is in the treatment by period interaction, Section 4.2 showed a maximin extended parallel design to be more efficient, and thus to require a smaller budget, than a maximin crossover trial. In the special cases where the number of subjects or the total number of measurements is minimized, the maximin extended parallel design will therefore also outperform the maximin crossover design.

5. Monte Carlo Evaluation of the Power of Maximin Designs

The efficiencies derived for the maximin designs are based on the asymptotic variance of the ML estimator, $\operatorname{var}(\hat\theta)$. For sufficiently large numbers of subjects, the relation between this asymptotic variance and the power level 1 − γ to detect an effect $\theta$ in a two-tailed test with type I error rate α can be approximated as follows:

$$\operatorname{var}(\hat\theta) = \left(\frac{\theta}{z_{1-\alpha/2} + z_{1-\gamma}}\right)^2, \qquad (8)$$

where $z_{1-\alpha/2}$ and $z_{1-\gamma}$ are the 100(1 − α/2) and 100(1 − γ) percentiles of the standard normal distribution. For small sample sizes calculated by (8), corrections are needed [12, 13]. These corrections will be applied for each of the three designs. We will examine to what extent the differences between designs in asymptotic efficiencies translate into corresponding differences in power levels for small samples. Also, when planning sample sizes based on the asymptotic variances, we can check whether the commonly used power levels of 80% or 90% are realized for small samples.
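Relation (8) is easily inverted into a sample size. The sketch below uses the standard-library `statistics.NormalDist` for the normal percentiles. It assumes that the variance of the estimator scales as 1/N at a fixed allocation, which holds for the designs considered here. The helper names are ours, and the example values (a unit variance of 1.7 and an effect of 0.5) are illustrative.

```python
import math
from statistics import NormalDist


def z_term(alpha=0.05, power=0.80):
    """(z_{1-alpha/2} + z_{1-gamma})^2 with power = 1 - gamma."""
    z = NormalDist().inv_cdf
    return (z(1 - alpha / 2) + z(power)) ** 2


def required_subjects(unit_variance, effect, alpha=0.05, power=0.80):
    """Smallest N for which var = unit_variance / N satisfies relation (8).

    unit_variance is N * var(theta_hat), i.e. the variance 'per subject'."""
    return math.ceil(unit_variance * z_term(alpha, power) / effect ** 2)


def normal_power(variance, effect, alpha=0.05):
    """Approximate power of a two-sided z-test (ignores the opposite tail)."""
    nd = NormalDist()
    return nd.cdf(abs(effect) / math.sqrt(variance) - nd.inv_cdf(1 - alpha / 2))


n = required_subjects(1.7, 0.5)
print(n, normal_power(1.7 / n, 0.5))   # 54 subjects, power just above 0.80
```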

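For the treatment effect estimator, the following expression for the required number of subjects results for a crossover design with optimal allocation:

$$N_c = \frac{(z_{1-\alpha/2} + z_{1-\gamma})^2\,(\sigma^2_{eA} + \sigma^2_{eB})}{\delta^2}. \qquad (9)$$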

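If we let $ES = \delta/\sqrt{(\sigma^2_A + \sigma^2_B)/2}$ be the effect size based on the outcome variances in the treatment and control arms (cf. [14]), then (9) can be rewritten as follows:

$$N_c = \frac{2\,(z_{1-\alpha/2} + z_{1-\gamma})^2\,(\rho_A + \rho_B - 2\rho_A\rho_B)}{(\rho_A + \rho_B)\,ES^2}. \qquad (10)$$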

Note that, in the case of a maximin design, the expression is the same as (10), but with the maximin values of $\rho_A$ and $\rho_B$ substituted for $\rho_A$ and $\rho_B$, respectively. Similar rewritings of the sample sizes in terms of the effect size are possible for the parallel and extended parallel design, respectively:

The choices to be made for $\rho_A$ and $\rho_B$ in the maximin versions of the parallel and extended parallel design are determined by the conditions formulated in Table 2.
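The rewriting of the crossover sample size in terms of the effect size rests on $\sigma^2_{eA} = \sigma^2_A(1-\rho_A)$, $\sigma^2_{eB} = \sigma^2_B(1-\rho_B)$, $\sigma^2_A = 2\rho_B\bar\sigma^2/(\rho_A+\rho_B)$, and $\sigma^2_B = 2\rho_A\bar\sigma^2/(\rho_A+\rho_B)$, with $\bar\sigma^2 = (\sigma^2_A + \sigma^2_B)/2$. The sketch below checks numerically that the two forms $N_c = (z_{1-\alpha/2}+z_{1-\gamma})^2(\sigma^2_{eA}+\sigma^2_{eB})/\delta^2$ and $N_c = 2(z_{1-\alpha/2}+z_{1-\gamma})^2(\rho_A+\rho_B-2\rho_A\rho_B)/((\rho_A+\rho_B)ES^2)$ agree. It also shows that the second form decreases in both intraclass correlations, so for the crossover design the lower bounds of the ranges act as maximin values. Function names and the value of the squared z term are illustrative.

```python
import math
import random


def n_crossover_variances(s2_eA, s2_eB, delta, zz):
    """Required subjects from the within-subject variances (form (9))."""
    return zz * (s2_eA + s2_eB) / delta ** 2


def n_crossover_icc(rho_A, rho_B, es, zz):
    """The same quantity in terms of the ICCs and the effect size (form (10))."""
    return 2 * zz * (rho_A + rho_B - 2 * rho_A * rho_B) / ((rho_A + rho_B) * es ** 2)


rng = random.Random(2024)
zz, delta = 7.85, 0.4                       # illustrative (z + z)^2 and effect
for _ in range(5):
    s2_u, s2_eA, s2_eB = (rng.uniform(0.1, 3.0) for _ in range(3))
    rho_A, rho_B = s2_u / (s2_u + s2_eA), s2_u / (s2_u + s2_eB)
    es = delta / math.sqrt((2 * s2_u + s2_eA + s2_eB) / 2)   # sqrt((s2_A + s2_B) / 2)
    print(n_crossover_variances(s2_eA, s2_eB, delta, zz), n_crossover_icc(rho_A, rho_B, es, zz))
```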

In the case of the treatment by period interaction effect, an expression (12) for the required number of subjects can be derived for a crossover design with optimal allocation, where ES now denotes the effect size of the treatment by period interaction. In the case of a maximin design, the expression is the same as (12), but with the maximin values of $\rho_A$ and $\rho_B$ substituted for $\rho_A$ and $\rho_B$, respectively. The expression for the sample size of an extended parallel design under optimal allocation can be written as follows:

The choices to be made for $\rho_A$ and $\rho_B$ in the case of a maximin extended parallel design are determined by the conditions formulated in Table 3.

Since maximin designs only require information on plausible ranges of model parameters, they are more practical than optimal designs. In what follows, we will therefore examine through a Monte Carlo simulation the power for maximin designs in the case of small sample sizes. First, we will discuss the factors that are varied and motivate the choices made for these factors in determining the simulation scenarios.

5.1. Choice of Ranges/Values for Relevant Factors
5.1.1. Effect Sizes

The effect size, ES, is commonly categorized as small (0.2), medium (0.5), or large (0.8) [14]. Since we are primarily interested in the small-sample performance of (9)–(13), we will only consider ES = 0.8, which leads to the smallest sample sizes.

5.1.2. Costs

The empirical evidence on costs is rather scarce, but we will choose the costs such that they imply minimizing the sample size of a study (i.e., cA = cB = ct = 0 and csp = cs_2p = 1).

5.1.3. Intraclass Correlations

The ranges for $\rho_A$ and $\rho_B$ are identical to the ranges of the numerical evaluation of Section 4.1. Since we are interested in small-sample performance, for each design we consider the pair of ranges, across all combinations of ranges for the intraclass correlations (i.e., small-small, medium-medium, large-large, small-medium, small-large, and medium-large), that leads to the smallest sample size. Since this pair always turns out to come from the small-small category, we additionally selected, in the same way, the pair leading to the smallest sample size among the combinations of medium and large ranges, which will be used more often in practice. For each design, the two resulting pairs of ranges of intraclass correlations are displayed in the two leftmost columns of Table 4 and of the table in Appendix D.

5.1.4. Power Level and Type I Error Rate

In sample size planning, commonly used power levels are 80% and 90% in a two-tailed test with either a 5% or a 1% type I error rate. Focusing on small-sample performance, we will consider 80% power in a two-tailed test with a 5% type I error rate. For small sample sizes derived from the standard normal distribution (as in (9)–(13)), corrections are needed that turn out to depend on the type I error rate [12, 13]. For this reason, we will also study a 1% type I error rate.

5.2. Simulation Procedure and Testing Methods

For each of the 20 simulation scenarios, 25,000 data sets were generated. To distinguish chance deviations of the simulated power from systematic deviations, a 95% predictive interval was calculated. For a nominal power π, this is defined as $[\pi + z_{0.025}\sqrt{\pi(1-\pi)/N_{sim}},\ \pi + z_{0.975}\sqrt{\pi(1-\pi)/N_{sim}}]$, where z0.025 and z0.975 are the 2.5 and 97.5 percentiles of the standard normal distribution and Nsim is the number of simulations. With a nominal power π = 0.80 and Nsim = 25,000, the 95% predictive interval is [0.795, 0.805].
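The interval above is a normal approximation to the binomial distribution of the number of rejections. The short sketch below reproduces [0.795, 0.805], using $z_{0.025} = -z_{0.975}$. The function name is ours.

```python
import math
from statistics import NormalDist


def predictive_interval(nominal_power, n_sim, level=0.95):
    """Interval within which the simulated power falls with probability `level`
    if the true power equals `nominal_power` (normal approximation to the binomial)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)          # z_{0.975} for level 0.95
    half_width = z * math.sqrt(nominal_power * (1 - nominal_power) / n_sim)
    return nominal_power - half_width, nominal_power + half_width


lo, hi = predictive_interval(0.80, 25_000)
print(f"[{lo:.3f}, {hi:.3f}]")   # [0.795, 0.805]
```

A simulated power, i.e., the fraction of the 25,000 data sets in which the test rejects, that falls above this interval indicates a systematically conservative sample size rather than chance.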

For a test of the treatment effect, the data generated for the crossover design were analyzed with a two-sample t-test on the difference scores obtained by subtracting the two measurements for each subject. The model in (2) implies homogeneity of variances for these difference scores, so that a pooled variance t-test was applied. The data generated for the parallel design were simply analyzed by a two-sample t-test on the original scores, whereas the data for the extended parallel design were analyzed with a two-sample t-test on the scores averaged across both measurements. For these parallel designs, (1) and (2) imply that the analyzed scores may have variances differing between groups, so that an unpooled variance t-test was applied. For the treatment by period interaction effect, the data generated for the crossover design were analyzed with a two-sample (pooled variance) t-test on the scores averaged for each subject across both measurements, whereas the data generated for the extended parallel design were analyzed with a two-sample (unpooled variance) t-test on the differences between the two measurements (3). These different t-tests follow for each of the designs (involving equal numbers of measurements per subject) from the analysis models in (1)–(3) and do not require asymptotic assumptions.
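To make these analyses concrete, the sketch below simulates data from model (2) with $\beta_0 = 0$ for the sequences AB, BA, AA, and BB, and computes the test statistics described above. It uses a pooled t statistic on the crossover difference scores and an unpooled (Welch) t statistic on the AA/BB subject means. The single-period parallel design would use the Welch statistic on the single scores. The sketch stops at the statistics, since p-values would need a t-distribution CDF (for example from SciPy), which we do not assume. The paper's own computations were done in R. Parameter values and function names here are hypothetical, and only the Python standard library is used.

```python
import math
import random
import statistics


def pooled_t(x, y):
    """Two-sample t statistic with a pooled variance estimate."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * statistics.variance(x) + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    return (statistics.mean(x) - statistics.mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))


def welch_t(x, y):
    """Unpooled (Welch) t statistic and Welch-Satterthwaite degrees of freedom."""
    vx, vy = statistics.variance(x) / len(x), statistics.variance(y) / len(y)
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


def simulate_subject(first, second, b1, b2, sd_u, sd_e, rng):
    """Two measurements of one subject under model (2) with beta_0 = 0;
    `first`/`second` are the treatments ('A' or 'B') in periods 1 and 2."""
    u = rng.gauss(0.0, sd_u)
    y1 = b1 * (first == "B") + u + rng.gauss(0.0, sd_e[first])
    y2 = b1 * (second == "B") + b2 + u + rng.gauss(0.0, sd_e[second])
    return y1, y2


def analyse(n_per_seq, b1=0.5, b2=0.2, s2_u=1.0, s2_eA=1.0, s2_eB=2.0, seed=1):
    rng = random.Random(seed)
    sd_u, sd_e = math.sqrt(s2_u), {"A": math.sqrt(s2_eA), "B": math.sqrt(s2_eB)}
    data = {seq: [simulate_subject(seq[0], seq[1], b1, b2, sd_u, sd_e, rng)
                  for _ in range(n_per_seq)] for seq in ("AB", "BA", "AA", "BB")}
    # Crossover: pooled t-test on the differences y2 - y1; beta_1 is half the
    # difference between the mean differences of AB and BA.
    d_ab = [y2 - y1 for y1, y2 in data["AB"]]
    d_ba = [y2 - y1 for y1, y2 in data["BA"]]
    cross = ((statistics.mean(d_ab) - statistics.mean(d_ba)) / 2, pooled_t(d_ab, d_ba))
    # Extended parallel: Welch t-test on subject means over the two periods.
    m_bb = [(y1 + y2) / 2 for y1, y2 in data["BB"]]
    m_aa = [(y1 + y2) / 2 for y1, y2 in data["AA"]]
    ext = (statistics.mean(m_bb) - statistics.mean(m_aa), welch_t(m_bb, m_aa))
    return cross, ext


print(analyse(200))
```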

In calculating the required sample size, (9)–(11) were used when interest is in testing the treatment effect, and (12) and (13) were used when interest is in testing the treatment by period interaction. The optimal allocations for each design are given in Table 1, taking the maximin values of $\rho_A$ and $\rho_B$ as determined from Tables 2 and 3 for the treatment main effect and the treatment by period interaction effect, respectively. Since the outcome variances are unknown, the calculated sample sizes were subsequently corrected. For an unpooled variance t-test, if the number of persons per arm is 8 or more, sufficient corrections for two-tailed tests are 2 extra persons per arm if α = 0.05 and 4 extra persons per arm if α = 0.01. With fewer than 8 persons in one or both arms, sufficient corrections are 3 extra persons per arm if α = 0.05 and 4 extra persons per arm if α = 0.01 ([13], p. 568). For a pooled variance t-test, only 1 person per arm needs to be added if α = 0.05 (two-tailed) and 2 persons per arm if α = 0.01 (two-tailed) ([12], pp. 1216–1217). These corrections are sufficient for planned powers of 80% and 90%. The simulations, statistical tests, and power calculations were done in R, version 3.1.3 [15].
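The correction rules above can be written as a small lookup. This is a sketch with a hypothetical function name. It rounds fractional per-arm sizes up before applying the rules and covers only the two type I error rates for which the rules are stated.

```python
import math


def corrected_arm_sizes(arm_sizes, alpha, pooled):
    """Add the small-sample corrections to per-arm sizes obtained from the
    normal approximation (fractional sizes are rounded up first)."""
    if alpha not in (0.05, 0.01):
        raise ValueError("corrections are only stated for alpha = 0.05 and 0.01")
    sizes = [math.ceil(n) for n in arm_sizes]
    if pooled:                       # pooled-variance t-test ([12])
        extra = 1 if alpha == 0.05 else 2
    elif min(sizes) >= 8:            # unpooled t-test, every arm has 8 or more
        extra = 2 if alpha == 0.05 else 4
    else:                            # unpooled t-test, at least one arm below 8
        extra = 3 if alpha == 0.05 else 4
    return [n + extra for n in sizes]


print(corrected_arm_sizes([27, 27], 0.05, pooled=True))    # [28, 28]
print(corrected_arm_sizes([10, 6], 0.05, pooled=False))    # [13, 9]
```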

5.3. Results

As can be seen in Table 4 and Appendix D, for all sample size-design combinations that should yield 80% power, the simulated powers were either within or above the 95% predictive intervals. This indicates that the asymptotic results, supplemented with simple correction rules for the use of the standard normal distribution, yield sample sizes that guarantee the desired level of power. The realized power levels are generally higher than 80%, since the small-sample corrections are sufficient and, in some cases, smaller corrections would have been appropriate [12, 13].
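The following sketch shows how such a check might be carried out. It is an assumption: the text does not state how the 95% predictive intervals were constructed. The sketch uses the usual normal approximation to the binomial proportion of rejections, and the number of replications is a hypothetical value.

```python
from math import sqrt
from statistics import NormalDist


def predictive_interval(target_power, n_reps, level=0.95):
    """Normal-approximation interval within which a simulated power estimate
    should fall if the true power equals target_power (binomial standard error)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sqrt(target_power * (1 - target_power) / n_reps)
    return target_power - half_width, target_power + half_width


def classify(simulated_power, target_power, n_reps):
    """Report whether a simulated power lies below, within, or above the interval."""
    lo, hi = predictive_interval(target_power, n_reps)
    if simulated_power < lo:
        return "below"
    return "above" if simulated_power > hi else "within"


# Hypothetical number of replications; the text above does not state it.
print(predictive_interval(0.80, 10_000))  # approximately (0.792, 0.808)
print(classify(0.815, 0.80, 10_000))      # above
```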

The power differences between the designs can become rather large and are in line with the asymptotic relative efficiencies. For the examples in Table 4, the crossover design is always the most efficient and, in the simulation, also has the highest power. Additional simulations show that similar conclusions can be drawn for ranges of intraclass correlations for which the crossover design is not the most efficient. As expected, when testing the treatment by period interaction, the extended parallel design has more power than the crossover design (Appendix D).

6. Application in Planning a Trial

Suppose one would like to perform a randomized trial on the effectiveness of indacaterol versus tiotropium among subjects suffering from chronic obstructive pulmonary disease, similar to Donohue et al. [16]. After 12 weeks of treatment, one plans to evaluate the effect of 18 μg of tiotropium versus 300 μg of indacaterol on bronchodilator efficacy, measured as the 24 h postdose forced expiratory volume in 1 s (FEV1, in mL). In the study by Donohue et al. [16], the variances of FEV1 in the indacaterol and tiotropium conditions differed significantly from each other, with a ratio of 1.58. No information was available on the intraclass correlations or the research costs. Suppose that reasonable guesses are available for the ranges of the intraclass correlations for indacaterol and for tiotropium. If one aims at minimizing the total sample size (so we set cA = cB = ct = 0 and csp = cs_2p = 1), the maximin crossover design is more efficient than the AA/BB and A/B designs. It requires only 48% and 43%, respectively, of the number of subjects needed for these designs. The aim is to detect a medium effect (ES = 0.5, see below (9)) with 80% power in a two-tailed test with a 5% type I error rate. Taking the lower bounds of these ranges, 0.10 and 0.30, as the maximin values of the intraclass correlations in (10), 54 subjects are needed. The sample size calculation in (10) is based on the standard normal distribution, whereas the test statistic follows a t-distribution. We therefore add 1 subject to each treatment sequence [12]. This yields a total sample size of 56 subjects, with 28 subjects allocated to each of the two treatment sequences of the crossover design.
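The final step of this example can be checked with a few lines. The normal-based total of 54 is taken from the text, because formula (10) is not reproduced here. The helper name is hypothetical.

```python
import math


def crossover_total_n(n_normal_total, alpha=0.05):
    """Split a normal-based total equally over AB and BA, then apply the
    pooled-variance t correction (+1 per sequence at alpha = 0.05, +2 at 0.01)."""
    if alpha not in (0.05, 0.01):
        raise ValueError("corrections are only given for alpha = 0.05 and 0.01")
    per_sequence = math.ceil(n_normal_total / 2)
    per_sequence += 1 if alpha == 0.05 else 2
    return per_sequence, 2 * per_sequence


print(crossover_total_n(54))  # (28, 56)
```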

7. Conclusion and Discussion

We examined the asymptotic efficiency of the ML estimators of the treatment effect and the treatment by period interaction effect for three two-treatment designs: a parallel, an extended parallel, and a crossover design. For a flexible cost function, the optimal allocations to the treatment sequences and the corresponding optimal efficiencies were derived. The intraclass correlations for each of the treatments and the ratio of treatment-dependent variances are commonly not precisely known. We therefore also derived maximin designs, which guarantee a power level across plausible ranges of values for the intraclass correlations at the lowest costs.

When interest is in testing the main effects of the treatments, the relations between the efficiencies of the maximin versions of the A/B, AB/BA, and AA/BB designs depend on the assumed ranges of the intraclass correlations. They also depend on the costs of the treatments and on the costs of recruiting and measuring subjects. A numerical investigation covered two cost scenarios: A and B are equally expensive, or the sum of the costs of one treatment and one measurement per person is less than the remaining subject-specific costs (such as recruitment costs). In both scenarios, the crossover design is most efficient for ranges of intraclass correlations starting at 0.15 or higher. In other cost scenarios, the parallel design or its extended version may become most efficient, even for ranges of the intraclass correlations above 0.15. In these scenarios, the efficiency relations are complicated, and the most efficient design is best determined from the results in Table 2. For the treatment by period interaction, however, the maximin AA/BB trial is proven to be more efficient than the maximin AB/BA design.

Since the efficiency comparisons of the maximin designs were based on asymptotic variances, a Monte Carlo simulation study was done for small samples. After applying correction factors in sample size planning based on the standard normal distribution, it was shown that (a) the asymptotic relative efficiencies translate into corresponding relative power levels and (b) power levels targeted in sample size planning are realized. This illustrates the practical utility of these results for sample size calculation.

If prerandomization measurements of the outcome variable are available, these could be included as covariates in the analysis [1]. In a randomized trial, adding covariates will not change the treatment effect of interest. It will, however, reduce the intercept variance and thus the intraclass correlations [3]. The results of the present study still apply, provided that the costs of prerandomization measurements are the same for all designs and there are no missing values on these covariates.

The present study did not consider carryover in deriving optimal and maximin designs. Self-carryover is carryover from a treatment onto itself. It implies that steady state has not yet been reached in the first period. The relevant effect would then be the total treatment effect, that is, the direct effect in the first period plus the carryover effect in the second period [5]. In that case, one-period designs and the AB/BA design are not suitable, as they do not allow for estimating the total treatment effect. The AA/BB design remains the only suitable option. There may also be steady-state carryover, which can only occur if treatments are switched [17]. Such carryover would affect the efficiency of the crossover design. Carryover is commonly avoided where possible. Nevertheless, the extent to which steady-state carryover affects the relative efficiency of the maximin crossover design would be an interesting issue for further research.

Appendix

A.1. Asymptotic Variance for the ML Estimator of the Treatment Effect

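Let the vector of observed scores on the dependent variable for person $j$ in a two-period design be denoted as $\mathbf{y}_j$. The linear mixed model for the scores can be expressed as follows:

$$\mathbf{y}_j = \mathbf{X}_j\boldsymbol{\beta} + u_j\mathbf{1} + \mathbf{Q}\mathbf{e}_{Aj} + (\mathbf{I} - \mathbf{Q})\mathbf{e}_{Bj}, \qquad \text{(A.1)}$$

where $\mathbf{X}_j$ is the design matrix for subject $j$, $\boldsymbol{\beta}$ is the vector of regression coefficients, $u_j$ is the random person effect, $\mathbf{e}_{Aj}$ is the vector of residual scores under treatment A, and $\mathbf{e}_{Bj}$ is the vector of residual scores under treatment B. In (A.1), $\mathbf{1} = (1, 1)^\top$, $\mathbf{I}$ is the $2 \times 2$ identity matrix, and $\mathbf{Q}$ is a matrix which indicates whether treatment A has been given in a particular period. For a person $j$ with treatment sequence AB, we have $\mathbf{Q} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$, indicating that treatment A has been given in period 1 and treatment B in period 2.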

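Let $\mathbf{J}$ be the $2 \times 2$ matrix of ones. Let $\sigma_u^2$ denote the variance of $u_j$, and let $\sigma_{eA}^2$ and $\sigma_{eB}^2$ denote the residual variances under A and B, with residuals independent of $u_j$ and of each other. The variance-covariance matrix of $\mathbf{y}_j$ can then be derived as follows:

$$\mathbf{V}_j = \sigma_u^2\mathbf{J} + \sigma_{eA}^2\mathbf{Q} + \sigma_{eB}^2(\mathbf{I} - \mathbf{Q}). \qquad \text{(A.2)}$$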

For the AA sequence, we have Q = I, and (A.2) can be rewritten as follows:

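We apply the result that, for an $n \times n$ matrix $a\mathbf{I} + b\mathbf{J}$ with $a \neq 0$ and $a \neq -nb$, the inverse is $\frac{1}{a}\left(\mathbf{I} - \frac{b}{a + nb}\mathbf{J}\right)$ ([18], p. 443). With this result, the inverse of the matrix in (A.3) can be written as follows: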

For the BB sequence, we have I − Q = I, and in a similar way, we obtain

Finally, for the AB sequence and BA sequence, we can derive the following equations, respectively:

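The information matrix of the ML estimator of $\boldsymbol{\beta}$ can be written as

$$\mathcal{I}(\boldsymbol{\beta}) = \sum_{j} \mathbf{X}_j^\top \mathbf{V}_j^{-1} \mathbf{X}_j, \qquad \text{(A.7)}$$

where $\mathbf{V}_j$ is the variance-covariance matrix of $\mathbf{y}_j$.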

Taking the inverse of the information matrix yields the asymptotic variance-covariance matrix of the ML estimators of $\boldsymbol{\beta}$.
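The closed-form inverse of $a\mathbf{I} + b\mathbf{J}$ used above for the AA and BB sequences can be verified exactly with rational arithmetic. The 2 × 2 covariance matrix in (A.3) has this form with $n = 2$. The sketch also checks larger $n$. The values of a and b are arbitrary illustrations.

```python
from fractions import Fraction


def cs_matrix(n, a, b):
    """n x n matrix a*I + b*J, with J the matrix of ones."""
    return [[a + b if i == j else b for j in range(n)] for i in range(n)]


def cs_inverse(n, a, b):
    """Closed form (1/a) * (I - b / (a + n*b) * J); requires a != 0 and a != -n*b."""
    if a == 0 or a == -n * b:
        raise ValueError("a*I + b*J is singular")
    c = b / (a + n * b)
    return [[((1 if i == j else 0) - c) / a for j in range(n)] for i in range(n)]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


a, b = Fraction(3, 4), Fraction(1, 2)
for n in (2, 3, 5):
    # Exact check: (aI + bJ) times the closed-form inverse equals I.
    print(n, matmul(cs_matrix(n, a, b), cs_inverse(n, a, b)) == identity(n))
```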

Different treatment sequences lead not only to different $\mathbf{X}_j$ matrices but also to different $\mathbf{V}_j$ matrices. As an example, we consider the crossover design. In total, we have $N$ subjects, and $p$ is the proportion of persons allocated to the AB sequence. Let $\boldsymbol{\beta} = (\beta_0, \beta_1, \beta_2)^\top$, where the regression coefficients represent the intercept, the treatment effect, and the time effect, respectively. For persons allocated to the AB sequence, we have , where . For persons allocated to the BA sequence, we have , where . We can now elaborate the matrix formulation in (A.7). For persons in the AB sequence, the term $\mathbf{X}_j^\top \mathbf{V}_j^{-1} \mathbf{X}_j$ is given by (A.8), and for persons in the BA sequence, it is given by (A.9).

The information matrix in (A.7) can be obtained by summing the expressions in (A.8) and (A.9) across all persons in each of the treatment sequences. The asymptotic variance-covariance matrix of the ML estimators then results from taking the inverse of this matrix. We are interested in the variance of the treatment effect estimator, which is the entry in row 2 and column 2 of the resulting variance-covariance matrix. This variance can be rewritten in terms of the model parameters as the expression given in Table 5. Along similar lines, the variance of the treatment effect estimator for a parallel and an extended parallel design, as shown in Table 5, can be derived.
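A numerical counterpart of these steps is sketched below. For each sequence, it forms $\mathbf{X}_j^\top \mathbf{V}_j^{-1} \mathbf{X}_j$, sums over subjects, inverts the result, and reads off entry (2, 2). The 0/1 coding of the design matrices is a hypothetical choice, because the original matrices are not shown here. The covariance matrix follows the random-intercept structure with treatment-specific residual variances. This sketch does not reproduce the closed form in Table 5.

```python
def inverse(m):
    """Gauss-Jordan inverse with partial pivoting for a small nonsingular matrix."""
    n = len(m)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def transpose(m):
    return [list(col) for col in zip(*m)]


# Hypothetical 0/1 coding of beta = (intercept, treatment A vs B, period 2 vs 1).
DESIGN = {"AB": [[1, 1, 0], [1, 0, 1]], "BA": [[1, 0, 0], [1, 1, 1]]}


def covariance(sequence, var_u, var_ea, var_eb):
    """var_u * J plus a diagonal holding the residual variance of the treatment given."""
    resid = [var_ea if trt == "A" else var_eb for trt in sequence]
    return [[var_u + (resid[i] if i == j else 0.0) for j in range(2)] for i in range(2)]


def var_treatment(n, p, var_u, var_ea, var_eb):
    """Entry (2, 2) of the inverse information matrix sum_j X_j' V_j^{-1} X_j,
    with n * p subjects in AB and n * (1 - p) subjects in BA."""
    info = [[0.0] * 3 for _ in range(3)]
    for seq, count in (("AB", n * p), ("BA", n * (1 - p))):
        x = DESIGN[seq]
        term = matmul(matmul(transpose(x), inverse(covariance(seq, var_u, var_ea, var_eb))), x)
        info = [[info[i][j] + count * term[i][j] for j in range(3)] for i in range(3)]
    return inverse(info)[1][1]


print(round(var_treatment(100, 0.5, 0.5, 1.0, 1.0), 6))  # 0.02
grid = [k / 20 for k in range(1, 20)]
print(min(grid, key=lambda p: var_treatment(100, p, 0.3, 1.2, 0.7)))  # 0.5
```

With equal residual variances σ², the difference scores alone carry the information on the treatment effect. The variance is then σ²/(2Np(1 − p)), which equals 0.02 in the first call. The grid search is consistent with the equal allocation p = 1/2 derived for the crossover design in Appendix B.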

A.2. Asymptotic Variance for the ML Estimator of the Treatment by Period Interaction Effect

The derivation of the variance of the treatment by period interaction estimator is similar. It starts, however, from another model with corresponding design matrices for the fixed regression coefficients. Now $\boldsymbol{\beta} = (\beta_0, \beta_1, \beta_2, \beta_3)^\top$, where the regression coefficients represent the intercept, the treatment effect, the time effect, and the treatment by period interaction effect, respectively. Consider again a crossover trial as an example. For persons assigned to the AB sequence, we have , where . For persons assigned to the BA sequence, we have , where . For persons in the AB sequence, the term $\mathbf{X}_j^\top \mathbf{V}_j^{-1} \mathbf{X}_j$ is given by (A.11), and for persons in the BA sequence, it is given by (A.12).

The information matrix in (A.7) is obtained by summing the expressions in (A.11) and (A.12) across all persons in both treatment sequences. The asymptotic variance-covariance matrix of the ML estimators then results from taking the inverse of this matrix. We are interested in the variance of the treatment by period interaction effect estimator, which is the entry in row 4 and column 4 of the variance-covariance matrix. This variance can be rewritten in terms of the model parameters as the expression in Table 5. Along similar lines, the variance of the treatment by period interaction effect estimator for an extended parallel design, as shown in Table 5, can be derived.
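The same numerical route can be used for the interaction, again under a hypothetical 0/1 coding of the design matrices. The sketch reads off entry (4, 4) for the AB/BA and AA/BB designs under equal allocation only. It therefore illustrates the comparison and does not reproduce the maximin designs or Table 5.

```python
def inverse(m):
    """Gauss-Jordan inverse with partial pivoting for a small nonsingular matrix."""
    n = len(m)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def transpose(m):
    return [list(col) for col in zip(*m)]


# Hypothetical 0/1 coding: beta = (intercept, treatment A, period 2, treatment x period).
DESIGNS = {
    "AB/BA": {"AB": [[1, 1, 0, 0], [1, 0, 1, 0]], "BA": [[1, 0, 0, 0], [1, 1, 1, 1]]},
    "AA/BB": {"AA": [[1, 1, 0, 0], [1, 1, 1, 1]], "BB": [[1, 0, 0, 0], [1, 0, 1, 0]]},
}


def covariance(sequence, var_u, var_ea, var_eb):
    """var_u * J plus a diagonal holding the residual variance of the treatment given."""
    resid = [var_ea if trt == "A" else var_eb for trt in sequence]
    return [[var_u + (resid[i] if i == j else 0.0) for j in range(2)] for i in range(2)]


def var_interaction(design, n, p, var_u, var_ea, var_eb):
    """Entry (4, 4) of the inverse of sum_j X_j' V_j^{-1} X_j; a fraction p of the
    n subjects receives the first listed sequence and 1 - p the second."""
    info = [[0.0] * 4 for _ in range(4)]
    for (seq, x), count in zip(DESIGNS[design].items(), (n * p, n * (1 - p))):
        term = matmul(matmul(transpose(x), inverse(covariance(seq, var_u, var_ea, var_eb))), x)
        info = [[info[i][j] + count * term[i][j] for j in range(4)] for i in range(4)]
    return inverse(info)[3][3]


for design in DESIGNS:
    print(design, round(var_interaction(design, 100, 0.5, 0.3, 1.2, 0.7), 6))
```

Both designs are saturated in the four sequence-by-period means, so the values can be checked by hand. Under equal allocation, AB/BA gives 4(σ²eA + σ²eB + 4σ²u)/N = 0.124, and AA/BB gives 4(σ²eA + σ²eB)/N = 0.076. In this sketch AA/BB is therefore more efficient whenever the person-effect variance is positive.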

B. Derivation of Optimal Allocations under a Budget Constraint

The asymptotic variance of the treatment effect estimator is minimized as a function of p, the allocation proportion, given a fixed budget C. For a crossover, parallel, and extended parallel design, p is the proportion allocated to the sequence AB, A, and AA, respectively. The variance for a crossover design as given in Table 5 can be rewritten in terms of the research costs and budget C, employing the cost function in (5) and noting that csc = cs_2p:

It is easy to see that this expression is minimized if $p(1-p)$ is maximized, which is the case if $p = 1/2$.

For a parallel design, the variance of the treatment effect estimator as given in Table 5 can be rewritten in terms of the cost function in (4) as follows:

Taking the derivative of (B.2) with respect to p yields the following expression:

Setting this derivative to zero and solving for p gives two solutions. One of them turns out to give a minimum, because the second derivative of the variance as a function of p is positive at this value:

Finally, for an extended parallel design, the variance of the treatment effect estimator as given in Table 5 can be rewritten in terms of the cost function in (6) noting that csep = cs_2p:

Taking the derivative with respect to p yields the following expression:

Setting (B.6) to zero and solving for p gives two solutions. One of them turns out to give a minimum, because the second derivative of the variance as a function of p is positive at this value:

The optimal allocation $p = 1/2$ for the crossover design and the allocations in (B.4) and (B.7) can be substituted into the corresponding expressions for the variance of the treatment effect estimator. This yields the optimal variances given in Table 1 of the main text. Derivations along similar lines can be done if interest is in minimizing the variance of the treatment by period interaction effect estimator. These result in the allocations and variances that are also presented in Table 1 of the main text.
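The derivative step used for the parallel design can be illustrated in a deliberately simplified setting. The variance is taken as σ²A/nA + σ²B/nB, with a hypothetical budget C = N(p·cA + (1 − p)·cB) and no other costs. This is not the cost function (4) or the variance in Table 5. With t = p/(1 − p), the first-order condition gives t = ±r, with r = (σA/σB)√(cB/cA), and only the positive root is admissible. This mirrors the "two solutions, one of which gives a minimum" step above. The sketch checks the admissible root against a grid search.

```python
from math import sqrt


def scaled_variance(p, var_a, var_b, cost_a, cost_b):
    """C * Var(mean_A - mean_B) for a parallel design under the simplified
    (hypothetical) budget C = N * (p * cost_a + (1 - p) * cost_b)."""
    return (var_a / p + var_b / (1 - p)) * (p * cost_a + (1 - p) * cost_b)


def optimal_p(var_a, var_b, cost_a, cost_b):
    """Admissible root of the first-order condition p / (1 - p) = r,
    with r = (sd_a / sd_b) * sqrt(cost_b / cost_a); the root p / (1 - p) = -r
    is not admissible."""
    r = sqrt(var_a / var_b) * sqrt(cost_b / cost_a)
    return r / (1 + r)


p_star = optimal_p(1.58, 1.0, 2.0, 1.0)
grid = [k / 10_000 for k in range(1, 10_000)]
p_grid = min(grid, key=lambda p: scaled_variance(p, 1.58, 1.0, 2.0, 1.0))
print(round(p_star, 4), p_grid)
```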

C. Derivation of Maximin Designs and Variances of the Effect Estimators

Maximin designs optimize the allocation to the treatment sequences under the worst case. The worst case is the maximum variance of the treatment effect estimator across plausible ranges of the model parameters, here the intraclass correlations $\rho_A$ and $\rho_B$.

For crossover designs, the variance of the treatment effect estimator under optimal allocation to the treatments is (see Table 1 of the main text)

By taking the derivatives with respect to the intraclass correlations $\rho_A$ and $\rho_B$, it can be shown that this expression decreases as a function of both parameters. This implies that the worst case occurs at the lower bounds of the ranges for $\rho_A$ and $\rho_B$. This yields the expression for the variance of the treatment effect estimator in the case of a maximin crossover design in Table 2 of the main text.
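A brute-force grid search is a simple way to check where the worst case lies for a given variance expression. The variance function below is a hypothetical stand-in that decreases in both intraclass correlations. It is not the expression from Table 1, so it only illustrates that the maximum then falls at the lower bounds. The same grid check may also be helpful for the parallel and extended parallel cases below, where the worst case can lie in the interior of the ranges.

```python
def box_grid(lo, hi, steps):
    """Equally spaced points from lo to hi, both endpoints included."""
    return [lo + (hi - lo) * k / steps for k in range(steps + 1)]


def worst_case(variance, rho_a_range, rho_b_range, steps=50):
    """Point (rho_a, rho_b) on a grid over the plausible ranges that maximizes variance."""
    points = [(ra, rb) for ra in box_grid(*rho_a_range, steps)
              for rb in box_grid(*rho_b_range, steps)]
    return max(points, key=lambda pt: variance(*pt))


def stand_in_variance(rho_a, rho_b, ratio=1.58):
    """Hypothetical stand-in, decreasing in both intraclass correlations."""
    return ratio * (1 - rho_a) + (1 - rho_b)


print(worst_case(stand_in_variance, (0.10, 0.40), (0.30, 0.60)))  # (0.1, 0.3)
```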

Similarly, for a parallel design, we have to examine for which values of the intraclass correlations $\rho_A$ and $\rho_B$ the following expression is maximized:

For derivation purposes, it is more convenient to rewrite (C.2) in terms of :

Taking the derivative of (C.3) with respect to , we find that the variance increases as a function of as long as and decreases as a function of if . So, if we can choose and from their plausible ranges such that , this maximizes (C.2). On the other hand, if even the lower bound of , , and the upper bound of , , yield , then choose and . Furthermore, if , then choose and . Substituting the maximin values of and into (C.2) will result in the variances of the treatment effect estimator as displayed in Table 2 of the main text.

For the extended parallel design, the variance for optimal allocation to the treatment sequences is given in Table 1 of the main text:

Taking the derivative of (C.4) with respect to shows that the variance of the treatment effect estimator increases as a function of as long as where and decreases if . Taking into account a feasible range for , the value for maximizing the variance in (C.4), , therefore is

Taking the derivative of (C.4) with respect to shows that the variance increases as a function of as long as and decreases if . So, also taking into account a feasible range for , we have as maximin value for :

For each of the intraclass correlations, $\rho_A$ and $\rho_B$, there are three possible values that maximize the variance of the treatment effect estimator. Not all values can co-occur. (C.5) and (C.8) imply that , which in turn is true only if , an unrealistic condition that we will not consider. Furthermore, if then (C.10), which implies that . If at the same time , then also . The latter two conditions for λ imply that , which is never true. On the other hand, if , then it is possible that . It can also be shown that if , then it is only possible that . So, all this implies that we can rewrite the “if” conditions in (C.5)–(C.7), with replaced by , and the “if” conditions in (C.8)–(C.10), with replaced by .

Finally, we can show that the combination and cannot occur. This implies the inequalities in (C.7) and (C.10), which in turn imply that and , respectively, which imply and , respectively, which are incompatible. All other combinations of the maximin values and can be shown to be possible. These results lead to the procedure for determining the maximin design as delineated in Table 2 of the main text for an extended parallel design.

Similar derivations can be given for the maximin versions of the crossover and extended parallel designs when interest is in the treatment by period interaction effect. These derivations are available from the author upon request.

D. Powers from the Monte Carlo Simulations for Maximin Designs in the Case of the Treatment by Period Interaction

Table 6 provides the simulated powers for maximin designs in the case of the treatment by period interaction. For each pair of ranges of the intraclass correlations, the asymptotic efficiency of each design versus the most efficient design is given within brackets.

Data Availability

This study is not based on empirical data. However, the R programs that are used in this paper are available upon request from the corresponding author.

Conflicts of Interest

The author declares that there are no conflicts of interest regarding the publication of this paper.