Abstract

In adaptive optimal procedures, the design at each stage is an estimate of the optimal design based on all previous data. Asymptotics for regular models with a fixed number of stages are straightforward if one assumes the sample size of each stage goes to infinity with the overall sample size. However, it is not uncommon for a small pilot study of fixed size to be followed by a much larger experiment. We study the large sample behavior of such studies. For simplicity, we assume a nonlinear regression model with normal errors. We show that the distribution of the maximum likelihood estimate converges to a scale mixture family of normal random variables. Then, for a one parameter exponential mean function, we derive the asymptotic distribution of the maximum likelihood estimate explicitly and present a simulation to compare the characteristics of this asymptotic distribution with some commonly used alternatives.

1. Introduction

Elfving [1] introduced a geometric approach for determining a c-optimal design for linear regression models. Kiefer and Wolfowitz [2] developed the celebrated equivalence theorem which provides an efficient method for verifying if a design is D-optimal, again for a linear model. These two results were generalized by Chernoff [3] and White [4] to include nonlinear models, respectively. See Bartroff [5], O’Brien and Funk [6], and references therein for extensions to the geometric and equivalence approaches. Researchers in optimal design have built an impressive body of theoretical and practical tools for linear models based on these early results. However, advances for nonlinear models have not kept pace.

One reason for the prevalence of the linear assumption in optimal design is that the problem can be explicitly described. The goal of optimal design is to determine precise experiments. Define an approximate design, proposed by Kiefer and Wolfowitz [7], as , where is a probability measure on consisting of support points and corresponding design weights ; are rational and defined on the interval and . Then the optimal design problem is to find the design that maximizes precision for a given experimental interest. Typically, this precision is achieved by maximizing some concave function, , of Fisher's information matrix. For example, when the estimation of all the parameters is the primary interest, the D-optimality criterion, where is equal to the determinant of the inverse of Fisher's information, is the most popular choice. See Pukelsheim [8] for a detailed discussion of common optimality criteria.

The basic principles for nonlinear models are the same as for linear models except Fisher's information will be a function of the model parameters. As a result, optimal designs depend on the parameters and thus are only optimal in the neighborhood of the true parameters. The term locally optimal design is commonly used for nonlinear optimal designs to reflect this dependence on the parameters of interest.
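As a concrete illustration of this parameter dependence, the following minimal sketch assumes a one parameter exponential mean function exp(-theta * x) with normal errors. The functional form, the function names, and the numeric values are assumptions made for illustration only. Under that assumption the per-observation Fisher information is x^2 exp(-2 theta x) / sigma^2, so the information-maximizing design point moves with theta.

```python
from math import exp

def info_per_obs(x, theta, sigma=1.0):
    """Per-observation Fisher information for y ~ N(exp(-theta*x), sigma^2).

    Assumed model: (d/dtheta exp(-theta*x))^2 / sigma^2
                 = x^2 * exp(-2*theta*x) / sigma^2.
    """
    return (x * exp(-theta * x)) ** 2 / sigma ** 2

def locally_optimal_point(theta, a, b, steps=10_000):
    """Grid search for the x in [a, b] that maximizes the information."""
    grid = [a + (b - a) * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda x: info_per_obs(x, theta))

if __name__ == "__main__":
    # The maximizer changes with theta, which is what "locally optimal" means.
    for theta in (0.5, 1.0, 2.0):
        print(theta, locally_optimal_point(theta, a=0.1, b=5.0))
```

A grid search is used here only to keep the sketch free of calculus; for this assumed mean function the interior maximizer is 1/theta, truncated to the design space when it falls outside [a, b].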

To overcome this dependence Fisher [9] and Chernoff [3] suggest using expert knowledge to approximate the locally optimal design. Ford et al. [10] suggest optimal designs in nonlinear problems are to be used to provide a benchmark or to construct sequential or adaptive designs. Atkinson et al. [11] suggest using a polynomial expansion to approximate the nonlinear model with a linear one.

Stein [12] provides the earliest two-stage procedure in which the information from the first stage is used to determine design features for the second stage. In this paper we examine a two-stage adaptive optimal design procedure. An adaptive optimal design uses the data from all previous stages to estimate the locally optimal design of the current stage. Many, including Box and Hunter [13], Fedorov [14], White [15], and Silvey [16], have suggested using such designs. Recently, Lane et al. [17], Dragalin et al. [18], Fedorov et al. [19], Yao and Flournoy [20], and others have investigated the properties and performance of these procedures.

Lane et al. [17] show that the optimal stage-one sample size is of the order , where is the overall sample size, in a two-stage regression model. Luc Pronzato obtains this relationship for a more general model (personal communication, 2012). However, in certain experiments, for example, early phase clinical trials or bioassay studies, it is common to use designs with very small stage-one sample sizes. The current literature has characterized the adaptive optimal design procedure only under the assumption that both stage-one and stage-two sample sizes are large.

In this paper we characterize the asymptotic distribution of the maximum likelihood estimate (MLE) when the stage-one sample size is fixed. The distribution for a nonlinear regression model with normal errors and a one parameter exponential mean function is derived explicitly. Then, for a specific numeric example, the distribution obtained under a fixed stage-one sample size is compared with other candidate approximate distributions.

2. Adaptive Optimal Procedure for a Two-Stage Nonlinear Regression Model with Normal Errors

2.1. The Model

Let be observations from a two-stage experiment, where is the number of observations and is the single-dose level used for the th stage, . Assume that where is some nonlinear mean function. In most practical examples it is necessary to consider a bounded design space, that is, , . It is assumed that are independent conditional on treatment , where is fixed and is selected adaptively. Denote the adaptive design by , where .

The likelihood for model (2.1) is where are the stage specific sample means, and the total score function is where represents the score function for the th stage.

2.2. The Adaptive Optimal Procedure

Fix the first stage design point and let represent an estimate based on the first-stage complete sufficient statistic . The locally optimal design point for the second stage is which is commonly estimated by for use in stage 2. Because the adaptive optimal design literature assumes is large, the MLE of the second stage design point, , where is the MLE of based on the first stage data, is traditionally used to estimate .

However, when is small the bias of the MLE can be considerable. Therefore, for some mean functions using a different estimate would be beneficial. In general, the adaptively selected stage two treatment is

2.3. Fisher's Information

Since , a bounded design space, but , there is a positive probability that will equal or . Denote these probabilities as and , respectively. Then the per subject information can be written as where is the random variable defined by the onto transformation (2.5) of .
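The boundary probabilities can be computed directly once a mean function is fixed. The sketch below is a hedged illustration, not the paper's derivation: it assumes the exponential mean function exp(-theta * x), a first-stage mean that is normal with standard deviation sigma / sqrt(n1), and the first-stage MLE as the plug-in estimate, so that the unrestricted stage-two point is -x1 / log(ybar1). All names and numeric values are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def stage_two_point(ybar1, x1, a, b):
    """Plug-in stage-two design point, truncated to the design space [a, b].

    Assumed model: mean exp(-theta*x), so the first-stage MLE of theta is
    -log(ybar1)/x1 and the estimated optimal point is -x1/log(ybar1).
    """
    if ybar1 <= 0.0:      # estimate of theta diverges, optimal point -> 0
        return a
    if ybar1 >= 1.0:      # estimate of theta -> 0, optimal point -> infinity
        return b
    return min(max(-x1 / log(ybar1), a), b)

def boundary_probabilities(theta, x1, n1, sigma, a, b):
    """P(x2 = a) and P(x2 = b) under the assumed normal first-stage mean."""
    ybar1 = NormalDist(mu=exp(-theta * x1), sigma=sigma / sqrt(n1))
    p_a = ybar1.cdf(exp(-x1 / a))          # ybar1 small enough to hit a
    p_b = 1.0 - ybar1.cdf(exp(-x1 / b))    # ybar1 large enough to hit b
    return p_a, p_b

if __name__ == "__main__":
    print(boundary_probabilities(theta=1.0, x1=1.0, n1=5, sigma=0.5,
                                 a=0.25, b=4.0))
```

With a small first-stage sample both probabilities are visibly positive, which is why the distribution of the stage-two design point has point masses at the ends of the design space.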

3. Asymptotic Properties

We examine three different ways of deriving an asymptotic distribution of the final MLE which may be used for inference at the end of the study. The first is under the assumption that both and are large. The second considers the data from the second stage alone. The third assumes a fixed first-stage sample size and a large second-stage sample size.

3.1. Large Stage-1 and Stage-2 Sample Sizes

If is bounded and continuous, and provided common regularity conditions hold, as and , where . This result is used to justify the common practice of using to estimate in order to make inferences about . However, if is not bounded and continuous then it is very difficult to obtain the result in (3.1), and for certain mean functions the result will not hold. In such cases the asymptotic variance in (3.1) must be replaced with . Lane et al. [17] examine using the exact Fisher's information for an adaptive design , , instead of in (3.1) to obtain an alternative approximation of the variance of the MLE .

3.2. Distribution of the MLE If Only Second-Stage Data Are Considered

Often pilot data are discarded after being used to design a second experiment. The derivation of the distribution of the MLE using only the second-stage data then takes to be fixed: as , where . The estimate will likely perform poorly in comparison to if and are of roughly the same size but conceivably may perform quite well when is much smaller than . For this reason it represents an informative benchmark distribution.

3.3. Fixed First-Stage Sample Size; Large Second-Stage Sample Size

When the first-stage sample size is fixed and the second stage is large we have the following result.

Theorem 3.1. For model (2.1) with as defined in (2.5) if for all , , is an onto function of , and provided common regularity conditions, as , where and is a random function of .

Proof. As in classical large sample theory (cf. Ferguson [21] and Lehmann [22]): since can be expanded around as where is the true value of the parameter and . Solving for gives It can be shown that is consistent for if and which gives the result in (3.4).
Now, decompose the right hand side of (3.4) as As , , , and as . Thus, the first term in (3.7) goes to 0 as . Write the second term in (3.7) as Further as , and , The first term in (3.9) goes to 0. To evaluate the second term, it is important to recognize that and are independent and thus are independent. Because of this independence, where is a random function of and as determined by . Now, with as the result follows from an application of Slutsky's theorem.
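The scale mixture structure can be illustrated by simulation. The sketch below is only an illustration under stated assumptions: it takes the exponential mean function exp(-theta * x), uses the first-stage MLE to pick the stage-two point, and takes the mixing scale to be the inverse square root of the per-observation information at that point, which is our reading of the limit in the theorem rather than a formula quoted from it. Function names and parameter values are hypothetical.

```python
import random
from math import exp, log, sqrt

def stage_two_point(ybar1, x1, a, b):
    """Plug-in stage-two point -x1/log(ybar1), truncated to [a, b]."""
    if ybar1 <= 0.0:
        return a
    if ybar1 >= 1.0:
        return b
    return min(max(-x1 / log(ybar1), a), b)

def draw_limit(theta, x1, n1, sigma, a, b, rng):
    """One draw from the assumed limit U * Z of the normalized final MLE."""
    ybar1 = rng.gauss(exp(-theta * x1), sigma / sqrt(n1))  # fixed-size stage 1
    x2 = stage_two_point(ybar1, x1, a, b)
    u = sigma / (x2 * exp(-theta * x2))  # assumed scale: information^(-1/2)
    z = rng.gauss(0.0, 1.0)              # independent standard normal
    return u * z

def excess_kurtosis(xs):
    """Sample excess kurtosis; it is 0 for a normal distribution."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3.0

if __name__ == "__main__":
    rng = random.Random(0)
    draws = [draw_limit(1.0, 1.0, 5, 0.5, 0.25, 4.0, rng) for _ in range(20_000)]
    print(excess_kurtosis(draws))
```

Because the scale is random, the draws have heavier tails than a single normal distribution, so the excess kurtosis is positive. This is the qualitative difference between the fixed first-stage limit and a normal approximation with a fixed variance.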

Remark 3.2. Provided is bounded and continuous is the asymptotic distribution of as . The important case for this exposition is presented in Theorem 3.1. However, the two other potential cases can be shown easily.

Case 1. , and . As , which implies that , a constant, and thus converges to the asymptotic distribution of given in (3.1).

Case 2. fixed, and . Just as in Case 1, , where . Note that differs from which depends on and . Therefore . Look back at (3.7) in the proof, but now take to be fixed; and and the only term left is Consider the following: and as . Therefore, as which is equivalent to .

4. Example: One Parameter Exponential Mean Function

In model (2.1) let , where , and . The simplicity of the exponential mean model facilitates our illustration, but it is also important in its own right. For example, Fisher [9] used a variant of this model to examine the information in serial dilutions. Cochran [23] further elaborated on Fisher's application using the same model.

For this illustration we use the MLE of the first-stage data to estimate the second-stage design point. Here, The adaptively selected second-stage treatment as given by (2.5) is Thus, the exact per subject Fisher information is For this example as . For more detailed information on the derivations of (4.1), (4.2), and (4.3) see Lane et al. [17].

The asymptotic distributions of the MLE in Sections 3.1 and 3.2 can be derived easily. For the asymptotic distribution of the MLE in Section 3.3 consider the following corollary. For details on the functions , , and see the proof of the corollary.

Corollary 4.1. If in model (2.1) then as , where is defined by where is the standard normal cumulative distribution function. Let and . Then if , If , then

Proof. First, we find the distribution of where and the random variable is defined by Figure 1 illustrates the map from to where , , , and .
Lambert's product log function (cf. Corless et al. [24]) is defined as the solutions to for some constant . Denote the solutions to (4.9) by . Let Then The function is real valued on , single valued at , and double valued on . , , . Therefore is real valued for all . For simplicity, define and for a given c.
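For readers who want to evaluate the two real branches numerically, the following is a minimal sketch that solves w * exp(w) = c by bisection. It is an illustrative stand-in and not the method used in the paper; the function name and the `branch` argument are hypothetical. Bisection works because w * exp(w) is increasing on [-1, infinity) and decreasing on (-infinity, -1].

```python
from math import e, exp

def lambert_w(c, branch=0, iters=200):
    """Real solution w of w*exp(w) = c on branch 0 (w >= -1) or -1 (w <= -1)."""
    if c < -1.0 / e:
        raise ValueError("no real solution for c < -1/e")

    def f(w):
        return w * exp(w)

    if branch == 0:
        lo, hi = -1.0, 1.0
        while f(hi) < c:          # expand until the root is bracketed
            hi *= 2.0
        for _ in range(iters):    # f is increasing on [-1, hi]
            mid = 0.5 * (lo + hi)
            if f(mid) < c:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    if branch == -1:
        if c >= 0.0:
            raise ValueError("branch -1 is real only for -1/e <= c < 0")
        lo, hi = -2.0, -1.0
        while f(lo) < c:          # f rises toward 0 as w -> -infinity
            lo *= 2.0
        for _ in range(iters):    # f is decreasing on [lo, -1]
            mid = 0.5 * (lo + hi)
            if f(mid) > c:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    raise ValueError("branch must be 0 or -1")

if __name__ == "__main__":
    c = -0.1
    print(lambert_w(c, 0), lambert_w(c, -1))  # two real solutions for -1/e < c < 0
```

For production use a library routine such as `scipy.special.lambertw` is preferable; the bisection version is shown only because it makes the single-valued and double-valued regions explicit.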
We present the proof for the cumulative distribution function (CDF) of and the CDF of for the case where and . The derivation of the distributions under alternative cases is tedious and does not differ greatly from this case.
Note in this case the domain of is . If , then If , then If , then However, since , If , then Thus, Figure 2 plots the CDF of U for , , , , , and . The distribution is a piecewise function with discontinuities at the boundary points and .
Now consider the distribution of . Recall and and are independent. If , then The distribution is symmetric, thus the derivation of the CDF if is analogous.

4.1. Comparisons of Asymptotic Distributions

First, consider the distribution described in (3.1) using in place of and the distribution described in (3.2). When is significantly smaller than , and can differ significantly as a function of . This is primarily because is a function of , whereas is an average over . Through simulation it can be seen that a is a better approximate distribution of than for only a small interval of , and this interval has a very small probability. For these reasons the distribution of the MLE using only the second stage data as described in Section 3.2 is not considered further.

Now for a set of numeric examples consider three distributions: (3.1), (3.1) using in place of and the distribution of defined in (3.3). An asymptotic distribution can be justified in inference if it is approximately equal to the true distribution. In this case the true distribution is that of . However, does not have a closed form and thus its distribution cannot be obtained analytically or numerically. To approximate this distribution 10,000 Monte Carlo simulations have been completed for each example to create a benchmark distribution.
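A Monte Carlo benchmark of this kind can be sketched as follows. This is a hedged illustration rather than the simulation code used for the paper: it assumes the exponential mean function exp(-theta * x), draws the stage means directly from their normal distributions, and finds the MLE by a grid search followed by a local ternary search on the least-squares criterion. The search bounds, grid size, and parameter values are hypothetical choices.

```python
import random
from math import exp, log, sqrt
from statistics import median

def stage_two_point(ybar1, x1, a, b):
    """Plug-in stage-two point -x1/log(ybar1), truncated to [a, b]."""
    if ybar1 <= 0.0:
        return a
    if ybar1 >= 1.0:
        return b
    return min(max(-x1 / log(ybar1), a), b)

def mle(ybar1, ybar2, x1, x2, n1, n2, lo=1e-3, hi=20.0, grid=400):
    """Normal-theory MLE of theta from the two stage means (least squares)."""
    def sse(t):
        return (n1 * (ybar1 - exp(-t * x1)) ** 2
                + n2 * (ybar2 - exp(-t * x2)) ** 2)
    pts = [lo + (hi - lo) * i / grid for i in range(grid + 1)]
    k = min(range(grid + 1), key=lambda i: sse(pts[i]))
    left, right = pts[max(k - 1, 0)], pts[min(k + 1, grid)]
    for _ in range(60):  # ternary search inside the bracketing grid cells
        m1 = left + (right - left) / 3.0
        m2 = right - (right - left) / 3.0
        if sse(m1) < sse(m2):
            right = m2
        else:
            left = m1
    return 0.5 * (left + right)

def simulate(theta, x1, n1, n2, sigma, a, b, reps, seed=0):
    """Monte Carlo draws of the final MLE from the two-stage procedure."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        ybar1 = rng.gauss(exp(-theta * x1), sigma / sqrt(n1))
        x2 = stage_two_point(ybar1, x1, a, b)        # adaptive design step
        ybar2 = rng.gauss(exp(-theta * x2), sigma / sqrt(n2))
        draws.append(mle(ybar1, ybar2, x1, x2, n1, n2))
    return draws

if __name__ == "__main__":
    est = simulate(theta=1.0, x1=1.0, n1=5, n2=500, sigma=0.5,
                   a=0.25, b=4.0, reps=1000)
    print(median(est))
```

The empirical distribution of the returned draws, suitably centered and scaled, plays the role of the benchmark; increasing `reps` to 10,000 matches the number of replicates reported in the text at the cost of a longer run.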

Figure 3 plots the three different candidate approximate distributions, found exactly using numerical methods, together with the distribution of approximated using Monte Carlo simulations, for , , , , , , and . Note the y-axis represents , , where is , is , and is . When it is difficult, graphically, to determine if or provides a better approximation for . It seems that if the distribution is preferable to ; however, when the opposite appears to be the case. It is fairly clear that for this example performs poorly.

When , it is clear that is much closer to than both and . Further, comparing the two plots one can see how the distribution of has nearly converged to but still differs significantly from and , as predicted by Theorem 3.1 and Corollary 4.1.

Using only graphics it is difficult to assess which of , , and is nearest for a variety of cases. To get a better understanding, the integrated absolute difference of the CDFs of , , and versus that of for , , , , , and is presented in Table 1. First, consider the table where . The locally optimal stage-1 design point is when ; as a result this scenario is the most generous to distribution . However, even for this ideal scenario outperforms and for all values of . In many cases the difference between and is quite severe. In this scenario outperforms ; however, the differences are not great.
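An integrated absolute difference between two CDFs can be approximated numerically. The sketch below is one plausible way to compute such a summary, assuming a trapezoidal rule on a finite interval; the integration limits, step count, and function names are hypothetical, and the exact definition used for the tables may differ.

```python
from bisect import bisect_right
from statistics import NormalDist

def ecdf(sample):
    """Empirical CDF of a sample, returned as a function of t."""
    xs = sorted(sample)
    n = len(xs)
    return lambda t: bisect_right(xs, t) / n

def integrated_abs_diff(cdf_f, cdf_g, lo, hi, steps=20_000):
    """Trapezoidal approximation of the integral of |F(t) - G(t)| on [lo, hi]."""
    h = (hi - lo) / steps
    vals = [abs(cdf_f(lo + i * h) - cdf_g(lo + i * h)) for i in range(steps + 1)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

if __name__ == "__main__":
    # Sanity check on two normal CDFs that differ by a unit shift in location.
    f = NormalDist(0.0, 1.0).cdf
    g = NormalDist(1.0, 1.0).cdf
    print(integrated_abs_diff(f, g, -10.0, 11.0))
```

In practice `cdf_f` would be the empirical CDF of the Monte Carlo draws, built with `ecdf`, and `cdf_g` one of the candidate approximate distributions; smaller values indicate a closer approximation.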

Next, examine the results for and . Once again outperforms and in all but 2 cases, and in many cases its advantage is substantial. Also note that outperforms about half the time when and the majority of the time when . This supports our observation that when the distance between and increases, the performance of compared with and worsens, which indicates a lack of robustness for the commonly used distribution . This lack of robustness is not evident for and .

One final comparison is motivated by the fact that if , , , and have the same asymptotic distribution. Although our method is motivated by the scenario where is a small pilot study, there is no theoretical reason that will not perform competitively when is large. Table 2 presents the integrated differences for the distributions and from for , , , , , , and . is not included in the table due to the lack of robustness; it can perform better or worse than the other two distributions based on the value of . Even with larger values of , performs slightly better when and 100 and only slightly worse when indicating that using is robust for moderately large .

5. Discussion

Assuming a finite first-stage sample size and a large second-stage sample size, we have shown for a general nonlinear one parameter regression model with normal errors that the asymptotic distribution of the MLE is a scale mixture distribution. We considered only one parameter for simplicity and clarity of exposition.

For the one parameter exponential mean function, the distribution of the adaptively selected second-stage treatment and the asymptotic distribution of the MLE were derived assuming a finite first-stage sample size and a large second-stage sample size. Then the performance of the normalized asymptotic distribution of the MLE, , was analyzed and compared to popular alternatives for a set of simulations.

The distribution of was shown to represent a considerable improvement over the other proposed distributions when was considerably smaller than . This was true even when was moderately large.

Since the optimal choice of was shown to be of the order for this model in Lane et al. [17], the usefulness of these findings could have significant implications for many combinations of and .

Suppose it is desired that , where is the desired confidence level and is the true parameter. If one were to use the large sample approximate distribution given in (3.1), and , and therefore , cannot be determined until after stage 1. However, using (3.1) with in place of , or by using , one can compute the overall sample size necessary to solve for and before stage one is initiated. One could determine initially using (3.1) with or and then update this calculation after stage-1 data are available. Such sample size recalculation requires additional theoretical justification and investigation of its practical usefulness.

We have not, in this paper, addressed the efficiency of the estimate . One additional way to improve inference would be to find bias-adjusted estimates that are superior to for finite samples. We have not investigated the impact on inference of estimating the variances in the distributions of , , , and . Instead, the distributions themselves are compared. For some details on the question of estimation and consistency see Lane et al. [17] and Yao and Flournoy [20].

Acknowledgment

The authors would like to thank the reviewers for their helpful comments and suggestions.