Journal of Probability and Statistics


Research Article | Open Access

Volume 2019 | Article ID 3249097 | 8 pages

Biostatistical Assessment of Mutagenicity Studies: A Stepwise Confidence Procedure

Academic Editor: Anna Karczewska
Received 13 Feb 2019
Revised 26 Mar 2019
Accepted 28 Mar 2019
Published 14 Apr 2019


The paper addresses the identification of the maximum safe dose in noninferiority trials that include several doses of a toxicological compound. Statistical methodology for identifying the maximum safe dose is available for three-arm noninferiority designs with only one experimental treatment. Extensions of this methodology to several experimental groups exist, but they require multiplicity adjustment. However, if the experimental (treatment) groups can be ordered a priori according to their treatment effect, multiplicity adjustment is unnecessary. Assuming normality and homogeneous variances across dose groups, we employ the generalized Fieller confidence interval method in a stepwise multiple comparison procedure that incorporates the partitioning principle in order to control the familywise error rate (FWER). Simulation results show that the procedure controls the FWER in the strong sense. The power of the procedure increases with the sample size and with the ratio of mean differences. We illustrate the procedure with a mutagenicity dataset from a published micronucleus assay.

1. Introduction

Assessing an investigational substance for mutagenic activity is one of the central concerns of genetic toxicologists, because declaring a substance nonmutagenic when it is in fact mutagenic is unacceptable. Hence, the objective of a mutagenicity assay in regulatory toxicology is to decide whether an investigational substance is mutagenic or nonmutagenic (Hothorn et al. [1]). Since every statistical procedure carries the possibility of a false decision, it is important to adopt a reliable biostatistical procedure that controls the familywise error rate (FWER) in the strong sense. A typical one-way design for genotoxicity assessment in this assay consists of $k + 2$ groups: a negative (vehicle) control, $k$ increasing doses $D_1 < D_2 < \cdots < D_k$ of the investigational substance, and a positive control. In this setup, we have two objectives. First, we assess the sensitivity of the experiment, and thereby the validity of the study, by comparing the positive control with the negative control. Second, we compare each of the doses with the negative control.

Statistical decisions in this setting involve multiple comparisons and stepwise procedures: individual inferences are made stepwise when the hypotheses follow a specific order, as in Stefansson et al. [2], Cao et al. [3], Chen [4], and Adjabui et al. [5]. Some simultaneous inference procedures dispense with multiplicity adjustment by invoking the partitioning principle of Finner and Strassburger [6]: the parameter space is partitioned into disjoint subsets, exactly one of which contains the true parameter, so that the FWER is properly controlled. In the literature, mutagenicity data have been assessed as a proof of safety using the concept of the maximum safe dose (Hothorn and Hauschke [7]) by numerous authors, among them Hauschke and Hothorn [8], Hauschke et al. [9], and Hothorn and Bretz [10].

Accordingly, this article discusses the design and analysis of a stepwise confidence set-based procedure for identifying the maximum safe dose, that is, the highest experimental dose with no biologically relevant increase in the response in comparison with the negative control (Hothorn and Hauschke [7]). The article is organized as follows. In Section 2, we introduce the testing and confidence-interval notation needed to construct the proposed stepwise confidence procedure. In Section 3, we propose a stepwise confidence interval procedure for identifying the maximum safe dose for normally distributed data with equal variances across dose groups. In Section 4, we report simulation studies of the FWER and power of the procedure. In Section 5, we apply the procedure to a real dataset, and Section 6 concludes.

2. Preliminaries

2.1. Testing Procedure

Let $Y_{i1}, \ldots, Y_{in_i}$ be a random sample of observations from group $i$. Consider the one-way model
$$Y_{ij} = \mu_i + \varepsilon_{ij}, \quad i = 0, 1, \ldots, k, P, \quad j = 1, \ldots, n_i, \qquad (1)$$
where $Y_{ij}$ represents the genetic response of the $j$th experimental unit in the $i$th treatment group, $i = 0$ denotes the negative control group, and $i = P$ denotes the positive control group. Suppose that the random variables are mutually independent and normally distributed with means $\mu_0, \mu_1, \ldots, \mu_k, \mu_P$ and respective sample sizes $n_0, n_1, \ldots, n_k, n_P$, which need not be equal. The random errors satisfy $\varepsilon_{ij} \sim N(0, \sigma^2)$, where $\sigma^2$ is an unknown constant variance. Without loss of generality, assume that larger values of $Y_{ij}$ indicate a stronger mutagenic response, so that safety corresponds to small values of $\mu_i - \mu_0$.

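The test problem is formulated as
$$H_{0i}: \mu_i - \mu_0 \ge \delta \quad \text{versus} \quad H_{1i}: \mu_i - \mu_0 < \delta, \quad i = 1, \ldots, k, \qquad (2)$$
where $\delta > 0$ is a relevant safety threshold. Practitioners, that is, genetic toxicologists, are often reluctant to define $\delta$ as an absolute value. Hauschke et al. [12] therefore express $\delta$ as a fraction of the difference between the positive and negative control means, $\delta = \theta(\mu_P - \mu_0)$ for $0 < \theta < 1$. When a negative control group can ethically be included in the trial, the testing problem can then be written as
$$H_{0i}: \psi_i \ge \theta \quad \text{versus} \quad H_{1i}: \psi_i < \theta, \quad i = 1, \ldots, k, \qquad (3)$$
where $\psi_i$ is the ratio of differences in means,
$$\psi_i = \frac{\mu_i - \mu_0}{\mu_P - \mu_0}. \qquad (4)$$
Equation (4) is meaningful only if $\mu_P > \mu_0$; this condition is indispensable and must be established in the first step of the stepwise procedure in order to assess the sensitivity of the trial. We can rearrange (3) as
$$H_{0i}: \mu_i - \theta\mu_P - (1 - \theta)\mu_0 \ge 0 \quad \text{versus} \quad H_{1i}: \mu_i - \theta\mu_P - (1 - \theta)\mu_0 < 0. \qquad (5)$$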

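Let the sample mean estimates be
$$\bar{Y}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} Y_{ij}, \quad i = 0, 1, \ldots, k, P.$$
The unknown common variance $\sigma^2$ can be estimated by the pooled estimator
$$S^2 = \frac{1}{\nu}\left[(n_0 - 1)S_0^2 + \sum_{i=1}^{k}(n_i - 1)S_i^2 + (n_P - 1)S_P^2\right], \quad \nu = n_0 + \sum_{i=1}^{k} n_i + n_P - (k + 2),$$
where $S_i^2$ ($i = 1, \ldots, k$), $S_P^2$, and $S_0^2$ denote the sample variances of the dose groups and of the positive and negative control groups, respectively. Then the random variables
$$T_i = \frac{\bar{Y}_i - \theta\bar{Y}_P - (1 - \theta)\bar{Y}_0}{S\sqrt{\dfrac{1}{n_i} + \dfrac{\theta^2}{n_P} + \dfrac{(1 - \theta)^2}{n_0}}}, \quad i = 1, \ldots, k,$$
are the test statistics for the testing problem in (3); at the boundary of $H_{0i}$, each $T_i$ follows a central $t$ distribution with $\nu$ degrees of freedom. Pigeot et al. [13] showed that safety can be claimed if $T_i < -t_{1-\alpha,\nu}$, where $t_{1-\alpha,\nu}$ is the $(1-\alpha)$-percentile of the central $t$ distribution with $\nu$ d.f. There are two approaches to solving the problem in (2): the p-value approach and the confidence interval approach. The confidence interval approach is generally preferred in the literature because it conveys the magnitude of the effect as well as the test decision. Therefore, in this study, we construct a confidence set-based approach for $\psi_i$, $i = 1, \ldots, k$, that requires no multiplicity adjustment. The maximum safe dose (MSD) for the proof of safety was defined by Hothorn and Hauschke [7] as
$$\mathrm{MSD} = \max\{i \in \{1, \ldots, k\} : H_{0j} \text{ is rejected for all } j \le i\},$$
which means that $H_{0j}$ is rejected at a given level $\alpha$ for every $j \le \mathrm{MSD}$. Safety can then be concluded for doses $1, \ldots, \mathrm{MSD}$.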
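As a minimal sketch of the pooled variance $S^2$ and the test statistic $T_i$ above, the following Python code computes both from raw group data. The function names and the toy data are hypothetical; the critical value $t_{1-\alpha,\nu}$ is deliberately left as an input (for example, `scipy.stats.t.ppf(1 - alpha, nu)` would supply it).

```python
import math
from typing import Sequence


def group_mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def sample_variance(xs: Sequence[float]) -> float:
    """Unbiased sample variance S_i^2 of one group."""
    m = group_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def pooled_variance(groups: Sequence[Sequence[float]]) -> tuple[float, int]:
    """Pooled S^2 and its degrees of freedom nu = N - (number of groups)."""
    nu = sum(len(g) for g in groups) - len(groups)
    ss = sum((len(g) - 1) * sample_variance(g) for g in groups)
    return ss / nu, nu


def safety_statistic(y_dose, y_neg, y_pos, theta: float, s2: float) -> float:
    """T_i for H0i: mu_i - theta*mu_P - (1 - theta)*mu_0 >= 0."""
    est = group_mean(y_dose) - theta * group_mean(y_pos) - (1 - theta) * group_mean(y_neg)
    var_factor = 1 / len(y_dose) + theta**2 / len(y_pos) + (1 - theta) ** 2 / len(y_neg)
    return est / math.sqrt(s2 * var_factor)


# Hypothetical toy data: negative control, a single dose group, positive control.
y_neg, y_dose, y_pos = [1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [10.0, 11.0, 12.0]
s2, nu = pooled_variance([y_neg, y_dose, y_pos])
t_stat = safety_statistic(y_dose, y_neg, y_pos, theta=0.5, s2=s2)
# Safety would be claimed for this dose if t_stat < -t_{1-alpha, nu}.
```

With a single dose group the design has three groups, so $\nu = N - 3$ here; in general the code uses $N$ minus the number of groups, which equals $N - (k + 2)$.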

In solving the testing problem in (3), we construct simultaneous confidence sets using the intersection-union principle formulated by Berger [14]: the global null hypothesis is expressed as the union of the individual null hypotheses, and the alternative as the intersection of the individual alternative hypotheses, that is,
$$H_0 = \bigcup_{i=1}^{k} H_{0i} \quad \text{versus} \quad H_1 = \bigcap_{i=1}^{k} H_{1i}.$$
If $H_0$ is rejected, then $H_{01}, \ldots, H_{0k}$ are all rejected in a stepwise fashion, and no multiplicity adjustment is needed. Note that these hypotheses are ordered a priori according to their importance and the investigator's interests and beliefs, but no order restrictions are imposed on the parameters.

2.2. Fieller’s Confidence Interval

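We employ the generalized Fieller theorem [15] to construct a confidence interval for $\psi_i$, $i = 1, \ldots, k$. The confidence limits are the roots of the quadratic equation
$$\left[\bar{Y}_i - \bar{Y}_0 - \psi_i(\bar{Y}_P - \bar{Y}_0)\right]^2 = t^2 S^2\left(\frac{1}{n_i} + \frac{\psi_i^2}{n_P} + \frac{(1 - \psi_i)^2}{n_0}\right), \quad t = t_{1-\alpha,\nu},$$
that is, $A\psi_i^2 - 2B_i\psi_i + C_i = 0$, where, adapting the notation of Hasler et al. [11],
$$A = (\bar{Y}_P - \bar{Y}_0)^2 - t^2 S^2\left(\frac{1}{n_P} + \frac{1}{n_0}\right), \quad B_i = (\bar{Y}_i - \bar{Y}_0)(\bar{Y}_P - \bar{Y}_0) - \frac{t^2 S^2}{n_0}, \quad C_i = (\bar{Y}_i - \bar{Y}_0)^2 - t^2 S^2\left(\frac{1}{n_i} + \frac{1}{n_0}\right),$$
thus yielding the upper confidence bounds
$$\psi_{U,i} = \frac{B_i + \sqrt{B_i^2 - A C_i}}{A}, \quad i = 1, \ldots, k.$$
By Fieller's theorem [15], this bound is valid (finite) only as long as $A > 0$, which, together with $\bar{Y}_P > \bar{Y}_0$, is equivalent to $(\bar{Y}_P - \bar{Y}_0)/(S\sqrt{1/n_P + 1/n_0}) > t_{1-\alpha,\nu}$. The one-sided $(1-\alpha)$ confidence intervals are
$$(-\infty, \psi_{U,i}], \quad i = 1, \ldots, k,$$
for the parameters $\psi_1, \ldots, \psi_k$.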
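The Fieller bound can be computed directly from summary statistics. The sketch below is a hypothetical helper (not code from the article) that forms $A$, $B_i$, $C_i$ as defined above and returns the larger root; it assumes equal variances and takes $t_{1-\alpha,\nu}$ as an input. It refuses to return a bound when $\bar{Y}_P - \bar{Y}_0 \le 0$ or $A \le 0$, since the confidence set is then unbounded.

```python
import math


def fieller_upper_bound(a: float, b: float, s2: float, n_i: int, n_0: int,
                        n_p: int, t_crit: float) -> float:
    """One-sided upper bound psi_U,i for psi_i = (mu_i - mu_0) / (mu_P - mu_0).

    a = ybar_i - ybar_0, b = ybar_P - ybar_0, s2 = pooled variance S^2,
    t_crit = t_{1-alpha, nu}. Solves A*psi^2 - 2*B*psi + C = 0 and returns
    the larger root.
    """
    q = t_crit**2 * s2
    A = b**2 - q * (1 / n_p + 1 / n_0)
    B = a * b - q / n_0
    C = a**2 - q * (1 / n_i + 1 / n_0)
    if b <= 0 or A <= 0:
        # Fieller's condition fails: mu_P > mu_0 is not established at level
        # alpha, so the confidence set for psi_i is unbounded.
        raise ValueError("assay sensitivity not established; bound is unbounded")
    disc = max(B**2 - A * C, 0.0)  # guard against tiny negative rounding errors
    return (B + math.sqrt(disc)) / A


# Hypothetical summary values, not taken from the article's data.
ub = fieller_upper_bound(a=1.0, b=10.0, s2=1.0, n_i=5, n_0=5, n_p=5, t_crit=2.0)
```

When $A > 0$ the quadratic is negative at the point estimate $\hat{\psi}_i = a/b$, so two real roots always exist and the discriminant guard only absorbs rounding error. With `t_crit=0` the bound collapses to $\hat{\psi}_i$ itself.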

3. The Proposed Procedure

3.1. Stepwise Confidence Interval for Identifying Maximum Safe Dose Based on Ratio of Mean Differences

We identify the maximum safe dose via the stepwise confidence set procedure of Hsu and Berger [16]. In the first step, we establish the assay sensitivity of the experiment by showing that $\mu_P > \mu_0$. If this fails, the procedure stops, indicating that the sensitivity of the experiment is inadequate. In the second step, we compute the upper confidence limits $\psi_{U,i}$, $i = 1, \ldots, k$, where $k$ is the total number of doses to be tested. In the third step, we screen the doses for safety, starting with the lowest dose ($i = 1$) and proceeding sequentially in ascending order, without adjusting the level $\alpha$ at any step, to find the first integer $M$, if it exists, such that $\psi_{U,i} < \theta$ for $i = 1, \ldots, M - 1$ and $\psi_{U,M} \ge \theta$ (this identifies the first unsafe dose, that is, the first dose that is inferior to the reference). In this setup, the maximum safe dose is estimated as $\widehat{\mathrm{MSD}} = M - 1$: the highest dose declared noninferior to the reference, such that all lower doses are also noninferior. If no such $M$ exists, all $k$ doses are declared safe.
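The screening in the third step can be sketched as follows. This is a minimal illustration, assuming the bounds $\psi_{U,i}$ are supplied by a callable so that, as the procedure requires, bounds beyond the first unsafe dose are never evaluated. The name `stepwise_msd`, the callable interface, and the numbers in the usage lines are hypothetical.

```python
from typing import Callable, Optional


def stepwise_msd(k: int, upper_bound: Callable[[int], float], theta: float,
                 assay_sensitive: bool) -> Optional[int]:
    """Fixed-sequence screening for the maximum safe dose.

    Returns None when step 1 (assay sensitivity) fails; otherwise returns
    M - 1, the index of the highest dose declared safe (0 if no dose is
    safe, k if every dose is safe). upper_bound(i) must return psi_U,i and
    is called only for doses 1..M.
    """
    if not assay_sensitive:          # step 1: mu_P > mu_0 not shown
        return None
    for i in range(1, k + 1):        # step 3: lowest dose first, full alpha each time
        if upper_bound(i) >= theta:  # first unsafe dose M = i
            return i - 1
    return k


# Hypothetical bounds and threshold, chosen only to exercise the logic.
bounds = [0.10, 0.30, 0.60, 0.20]
calls = []


def lazy_bound(i: int) -> float:
    calls.append(i)                  # record which doses were actually examined
    return bounds[i - 1]


msd = stepwise_msd(k=4, upper_bound=lazy_bound, theta=0.5, assay_sensitive=True)
# msd == 2 and calls == [1, 2, 3]: dose 4 is never examined, even though its bound is small.
```

Passing a callable rather than a precomputed list mirrors the remark that bounds for doses above $M$ need not be computed; a dose above the first unsafe one cannot be declared safe even if its own bound falls below $\theta$.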

Once the first unsafe dose $M$ has been identified, the upper confidence bounds for doses $M + 1, \ldots, k$ are not needed and should not be computed. A notable property of this procedure is that it is theoretically more powerful than the Bonferroni-Holm step-down procedure (Holm [17]), provided the a priori ordering of the doses is correct. This is because our procedure does not split $\alpha$: the full level $\alpha$ is used at every step without multiplicity adjustment, whereas the Bonferroni-Holm step-down procedure distributes $\alpha$ across the hypotheses, testing the smallest p-value at level $\alpha/k$, which makes it conservative. The resulting loss of power can be substantial, especially when $k$ is large. The partitioning principle employed in our procedure overcomes this conservativeness.
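To make the contrast with Bonferroni-Holm concrete, the sketch below applies both rules to the unadjusted p-values reported for the example in Section 5 (Table 4), using the p-value form of the same fixed-sequence rule. The level $\alpha = 0.025$ is an assumption (it is the nominal level used in the simulations), and the function names are illustrative.

```python
def holm_rejections(pvals: list[float], alpha: float) -> list[int]:
    """Bonferroni-Holm step-down: the j-th smallest p-value is compared with alpha/(k - j + 1)."""
    k = len(pvals)
    order = sorted(range(k), key=lambda i: pvals[i])
    rejected = []
    for j, i in enumerate(order):          # j = 0 uses alpha / k
        if pvals[i] <= alpha / (k - j):
            rejected.append(i)
        else:
            break
    return sorted(rejected)


def fixed_sequence_rejections(pvals: list[float], alpha: float) -> list[int]:
    """A priori ordered testing: full alpha at each step until the first non-rejection."""
    rejected = []
    for i, p in enumerate(pvals):
        if p > alpha:
            break
        rejected.append(i)
    return rejected


# Unadjusted p-values for 30, 50, 75, 100 mg/kg (Table 4); alpha = 0.025 is assumed.
p = [0.0088, 0.0182, 0.0288, 0.09639]
print(holm_rejections(p, 0.025))            # []
print(fixed_sequence_rejections(p, 0.025))  # [0, 1]
```

Holm compares the smallest p-value, 0.0088, with $0.025/4 = 0.00625$ and therefore declares no dose safe, while the ordered procedure declares the two lowest doses safe.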

3.2. Validity of the Stepwise Procedure

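To construct and validate simultaneous confidence sets for estimating the MSD with the above procedure, each individual confidence interval must have confidence level $1 - \alpha$. For the parameter space $\Theta$ of $\psi = (\psi_1, \ldots, \psi_k)$, each null hypothesis $H_{0i}$ and its alternative $H_{1i}$ correspond to complementary subsets of $\Theta$. We can construct a simultaneous confidence set for the parameter vector $\psi$ by employing the partitioning principle (Bretz et al. [18]). For identifying the MSD, the parameter space can be decomposed into nonempty disjoint subsets as follows:
$$\Theta_j = \{\psi \in \Theta : \psi_i < \theta \text{ for } i < j, \ \psi_j \ge \theta\}, \quad j = 1, \ldots, k, \qquad \Theta_{k+1} = \{\psi \in \Theta : \psi_i < \theta \text{ for } i = 1, \ldots, k\}. \qquad (12)$$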

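Therefore, $\Theta_1, \ldots, \Theta_{k+1}$ partition the entire parameter space $\Theta$; that is, $\Theta = \bigcup_{j=1}^{k+1}\Theta_j$ with $\Theta_j \cap \Theta_{j'} = \emptyset$ for $j \ne j'$. Each of the subsets $\Theta_1, \ldots, \Theta_k$ is tested at local level $\alpha$, knowing that the true parameter lies in one and only one of the subsets. This construction leads to a multiple comparison procedure that controls the familywise error rate in the strong sense. In terms of the hypotheses, (12) can be rewritten as
$$\Theta_j = \left(\bigcap_{i=1}^{j-1} H_{1i}\right) \cap H_{0j}, \quad j = 1, \ldots, k, \qquad \Theta_{k+1} = \bigcap_{i=1}^{k} H_{1i}. \qquad (13)$$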
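The partition (12) can be checked mechanically. This sketch uses a hypothetical helper `in_subset` and an arbitrary grid of $\psi$ values (including values exactly at $\theta$) to confirm that every $\psi$ falls in exactly one $\Theta_j$.

```python
from itertools import product


def in_subset(psi: tuple[float, ...], j: int, theta: float) -> bool:
    """Membership of psi in Theta_j from (12); j = k + 1 is the 'all doses safe' subset."""
    k = len(psi)
    if j == k + 1:
        return all(p < theta for p in psi)
    return all(p < theta for p in psi[: j - 1]) and psi[j - 1] >= theta


# Check on a hypothetical grid that every psi lies in exactly one Theta_j.
theta, k = 0.5, 3
for psi in product([0.2, 0.5, 0.8], repeat=k):
    memberships = [j for j in range(1, k + 2) if in_subset(psi, j, theta)]
    assert len(memberships) == 1, (psi, memberships)
```

The index $j$ of the subset containing the true $\psi$ is simply the first dose with $\psi_j \ge \theta$, which is why a single level-$\alpha$ test per subset suffices.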

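Theorem 1. Suppose that $\psi_{U,1}, \ldots, \psi_{U,k}$ are upper confidence bounds for $\psi_1, \ldots, \psi_k$, respectively, each with confidence level $1 - \alpha$, and let $C$ denote the confidence set produced by the stepwise procedure. Then, for all $\psi \in \Theta$, we have
$$P_\psi(\psi \in C) \ge 1 - \alpha.$$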

The proof of Theorem 1 is a direct application of Theorem 1 of Hsu and Berger [16].

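Proof. Case 1. Let $M = 1$ be the step at which the procedure stops. In this situation, no safety claim is made (the same holds if the assay sensitivity of the experiment is not established), so the confidence set is the whole parameter space $\Theta$ and the statement holds trivially.
Case 2. $M \ge 2$: For $j = 1, \ldots, k + 1$, let $\Theta_j$ be defined as in (12). Then the parameter space is partitioned by $\Theta_1, \ldots, \Theta_{k+1}$. Moreover, for $j \le k$, rejecting $\Theta_j$ when $\psi_{U,j} < \theta$ is a level-$\alpha$ test, because if $\psi \in \Theta_j$, then $\psi_j \ge \theta$ and
$$P_\psi(\psi_{U,j} < \theta) \le P_\psi(\psi_{U,j} < \psi_j) \le \alpha.$$
In this setup, the stepwise confidence set can be decomposed as follows (with $M = k + 1$ if every dose is declared safe):
$$C = \bigcup_{j=M}^{k+1} \Theta_j = \{\psi \in \Theta : \psi_i < \theta, \ i = 1, \ldots, M - 1\}.$$
Finally, if $\psi \in \Theta_j$, then $\psi \notin C$ only if $j < M$, which requires $\psi_{U,j} < \theta$; hence, for all $\psi \in \Theta$,
$$P_\psi(\psi \in C) \ge 1 - \alpha.$$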

Remark 2. The proof of Theorem 1 guarantees strong control of the FWER at level $\alpha$.

For this reason, we state and prove the following proposition.

Proposition 3. The stepwise simultaneous inference procedure for the ratio of differences in means strongly controls the FWER at level $\alpha$.

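Proof. Let $I \subseteq \{1, \ldots, k\}$ be the unknown index set of true null hypotheses, $I = \{i : \psi_i \ge \theta\}$. If $I = \emptyset$, no type I error can occur. Thus, assume that $I \ne \emptyset$, and, without loss of generality, let $j = \min I$, so that $\psi_i < \theta$ for $i < j$ and $\psi_j \ge \theta$, that is, $\psi \in \Theta_j$. Because the procedure tests the hypotheses in a fixed order and stops at the first nonrejection, any rejection of a true null hypothesis $H_{0i}$, $i \in I$, requires that $H_{0j}$ be rejected. Hence
$$\mathrm{FWER} \le P_\psi(\psi_{U,j} < \theta) \le P_\psi(\psi_{U,j} < \psi_j) \le \alpha.$$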

Remark 4. Proposition 3 guarantees that the FWER is controlled at the prespecified nominal level $\alpha$. This is a critical requirement of the Food and Drug Administration (FDA) for statistical procedures in dose-finding studies.

To confirm these theoretical results, we carried out the simulation studies reported in Section 4.

4. Simulation Studies

4.1. FWER

We conducted simulation studies to investigate the FWER of the procedure. Without loss of generality, we set the nominal level to $\alpha = 0.025$ and kept the threshold $\theta$ fixed across scenarios. Observations were generated from normal distributions with 1 million replications under the assumption of equal variances across dose groups; this case is labeled HOMO in Table 1. To assess the effect of violating this assumption, we also generated data with unequal variances, labeled HETRO in Table 1. We used the means configuration of Hasler et al. [11], with equal group variances for HOMO and unequal group variances for HETRO. In the simulation study, we considered only one experimental treatment. Table 1 indicates that the FWER is properly controlled at the nominal level when the variances are equal. When the variances are unequal, however, the simulated FWER departs markedly from the nominal level of 0.025: it exceeds it for the smallest sample size (0.0299) and falls well below it, to about 0.01, as the sample size increases, so the FWER is poorly controlled.


Table 1: Simulated FWER under equal (HOMO) and unequal (HETRO) variances (nominal level 0.025).

Sample size   HOMO              HETRO
5             0.0252 (0.0250)   0.0299 (0.0305)
7             0.0249 (0.0248)   0.0184 (0.0177)
9 (10)        0.0251 (0.0251)   0.0109 (0.0160)
11 (12)       0.0249 (0.0247)   0.0157 (0.0153)
13 (14)       0.0244 (0.0249)   0.0149 (0.0114)
15 (16)       0.0247 (0.0248)   0.0141 (0.0136)
17 (18)       0.0249 (0.0250)   0.0129 (0.0128)
19 (20)       0.0249 (0.0248)   0.0124 (0.0119)
21 (22)       0.0249 (0.0250)   0.0117 (0.0115)
23 (24)       0.0205 (0.0247)   0.0110 (0.0109)
25 (26)       0.0250 (0.0249)   0.0106 (0.0106)
27 (28)       0.0251 (0.0245)   0.0100 (0.0009)
29 (30)       0.0250 (0.0248)   0.0096 (0.0093)
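A small Monte Carlo sketch of the HOMO setting, with a single dose placed on the boundary $\psi_1 = \theta$, can be written with the standard library alone. The means, $\sigma$, $\theta = 0.5$, $n = 10$ per group, and 20,000 replications are hypothetical choices (far fewer replications than the 1 million used above), and the critical value $t_{0.975,27} \approx 2.052$ is hard-coded rather than computed. The assay-sensitivity step is omitted; since it can only remove rejections, omitting it can only raise the estimated FWER.

```python
import math
import random


def _mean(xs):
    return sum(xs) / len(xs)


def _var(xs):
    m = _mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def simulate_fwer(n: int, theta: float, t_crit: float, reps: int = 20_000,
                  sigma: float = 1.0, mu0: float = 0.0, mup: float = 4.0,
                  seed: int = 1) -> float:
    """Monte Carlo FWER for one dose on the boundary psi_1 = theta.

    Equal sample sizes n and equal variances (HOMO); mu0 and mup are
    hypothetical, not the configuration of Hasler et al.
    """
    rng = random.Random(seed)
    mu1 = mu0 + theta * (mup - mu0)              # boundary of H01: psi_1 = theta
    var_factor = (1 + theta**2 + (1 - theta) ** 2) / n
    errors = 0
    for _ in range(reps):
        y0 = [rng.gauss(mu0, sigma) for _ in range(n)]
        y1 = [rng.gauss(mu1, sigma) for _ in range(n)]
        yp = [rng.gauss(mup, sigma) for _ in range(n)]
        s2 = (_var(y0) + _var(y1) + _var(yp)) / 3    # pooled variance for equal n
        est = _mean(y1) - theta * _mean(yp) - (1 - theta) * _mean(y0)
        if est / math.sqrt(s2 * var_factor) < -t_crit:  # safety wrongly claimed
            errors += 1
    return errors / reps


# n = 10 per group gives nu = 27; t_{0.975, 27} is about 2.052.
print(simulate_fwer(n=10, theta=0.5, t_crit=2.052))  # expected to be near 0.025
```

At the boundary, $T_1$ has an exact central $t_{27}$ distribution, so the estimate should scatter around 0.025 with a Monte Carlo standard error of roughly 0.001.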

4.2. Power Estimation

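Power estimation is essential for a well-designed clinical study. There are many definitions of power for multiple comparison procedures; in this study, we define power in terms of the maximum safe dose. Suppose that the true maximum safe dose is dose $m$, that is, $\psi_i < \theta$ for $i = 1, \ldots, m$ and $\psi_{m+1} \ge \theta$ (the last condition is dropped if $m = k$). Then
$$\text{Power} = P(\text{reject } H_{01}, \ldots, H_{0m}). \qquad (25)$$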

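Hence, in this setting, power is the probability of rejecting all of the false null hypotheses. This power concept is directly related to the all-pairs power introduced by Ramsey [19]. Therefore, (25) can be rewritten as
$$\text{Power} = P\left(T_1 < -t_{1-\alpha,\nu}, \ldots, T_m < -t_{1-\alpha,\nu}\right). \qquad (26)$$
Expression (26) can be calculated from an $m$-variate noncentral $t$ distribution with $\nu$ degrees of freedom and noncentrality parameters, for $i = 1, \ldots, m$,
$$\delta_i = \frac{\mu_i - \theta\mu_P - (1 - \theta)\mu_0}{\sigma\sqrt{\dfrac{1}{n_i} + \dfrac{\theta^2}{n_P} + \dfrac{(1 - \theta)^2}{n_0}}}. \qquad (27)$$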

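It is possible to express the common standard deviation as a fraction of the difference $\mu_P - \mu_0$, that is, $\sigma = \tau(\mu_P - \mu_0)$, $\tau > 0$. Since $\mu_i - \theta\mu_P - (1 - \theta)\mu_0 = (\psi_i - \theta)(\mu_P - \mu_0)$, the noncentrality parameter can be expressed in terms of the ratio of mean differences as
$$\delta_i = \frac{\psi_i - \theta}{\tau\sqrt{\dfrac{1}{n_i} + \dfrac{\theta^2}{n_P} + \dfrac{(1 - \theta)^2}{n_0}}}, \quad i = 1, \ldots, m. \qquad (28)$$
From (28), the expected power is a function of $\theta$, the ratios of mean differences $\psi_i$, $\tau$, and the sample sizes: for fixed $\theta$, $|\delta_i|$ grows with the sample sizes and as $\psi_i$ falls further below $\theta$, and it shrinks as $\tau$ increases. Table 2 shows that power increases with increasing sample size, which is consistent with the results of Pigeot et al. [13].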
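The dependence of power on the quantities in (28) can be explored numerically. The sketch below computes $\delta_i$ from (28) and, for a single dose ($m = 1$) with large $\nu$, approximates power by $\Phi(-t - \delta_1)$, treating $T_1$ as normal with mean $\delta_1$. This normal approximation is a simplification of ours, not the $m$-variate noncentral $t$ computation described above, and the scenario values are hypothetical.

```python
import math
from statistics import NormalDist


def noncentrality(psi_i: float, theta: float, tau: float,
                  n_i: int, n_0: int, n_p: int) -> float:
    """delta_i from (28); negative values favour the safety claim T_i < -t."""
    scale = math.sqrt(1 / n_i + theta**2 / n_p + (1 - theta) ** 2 / n_0)
    return (psi_i - theta) / (tau * scale)


def approx_power_single_dose(delta: float, t_crit: float) -> float:
    """Large-nu normal approximation: P(T < -t) ~ Phi(-t - delta), for m = 1 only."""
    return NormalDist().cdf(-t_crit - delta)


# Hypothetical scenario: psi_1 = 0.2, theta = 0.5, tau = 0.5, equal group sizes.
for n in (5, 10, 20):
    d = noncentrality(0.2, 0.5, 0.5, n, n, n)
    print(n, round(d, 3), round(approx_power_single_dose(d, 1.96), 3))
```

Because the normal approximation ignores the variability of $S$, it overstates power for small samples; the exact computation would use the noncentral $t$ distribution with $\nu$ degrees of freedom.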



5. Example

To illustrate our procedure, we used raw data published by Adler and Kliesch [20] from a micronucleus assay in which hydroquinone (Hydro) was applied at doses of 30 mg/kg, 50 mg/kg, 75 mg/kg, and 100 mg/kg, with 25 mg/kg cyclophosphamide as the positive control. Their primary interest was to determine whether the substance induces chromosome damage or interacts with the spindle apparatus. The results for male mice at the 24 h sampling time are given in Table 3, and a summary of the tests for the micronucleus assay data from Hasler et al. [11] is given in Table 4.

Table 3: Micronucleus assay, male mice, 24 h sampling time.

Experimental group   Mean   Standard deviation   Sample size
Vehicle control      2.57   1.27                 7
Positive control     25     8.91                 4

Table 4: Summary of the tests for the micronucleus assay data (Hasler et al. [11]).

Treatment group   Unadjusted p-value   Upper bound
30 mg/kg          0.0088               0.24
50 mg/kg          0.0182               0.35
75 mg/kg          0.0288               0.74
100 mg/kg         0.09639              1.04

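In evaluating the mutagenicity data from Table 3, with the level $\alpha$ and the safety threshold $\theta$ fixed in advance, the upper bounds in Table 4 give the following results:
$$\psi_{U,1} = 0.24 < \theta, \quad \psi_{U,2} = 0.35 < \theta, \quad \psi_{U,3} = 0.74 \ge \theta.$$

The procedure therefore stops at step $M = 3$ (75 mg/kg), and it is unnecessary to proceed to the higher dose.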

From this analysis, the doses 30 mg/kg and 50 mg/kg are declared safe, while 75 mg/kg and 100 mg/kg are not declared safe at level $\alpha$. Since $M = 3$, 50 mg/kg (dose $M - 1 = 2$) is recommended as the maximum safe dose, that is, the highest dose that is noninferior to the reference at level $\alpha$. Note that 30 mg/kg is also noninferior to the reference but is a lower dose.
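The article does not report the numerical value of $\theta$ used in this example, so the sketch below treats $\theta$ as a hypothetical input. It reruns the ascending screening on the Table 4 upper bounds, showing that any $\theta$ in $(0.35, 0.74]$ yields 50 mg/kg, and applies the same rule to the unadjusted p-values under the assumption $\alpha = 0.025$. The helper name `highest_safe_dose` is illustrative.

```python
# Upper bounds and unadjusted p-values from Table 4, keyed by dose in mg/kg.
upper = {30: 0.24, 50: 0.35, 75: 0.74, 100: 1.04}
pvals = {30: 0.0088, 50: 0.0182, 75: 0.0288, 100: 0.09639}


def highest_safe_dose(values: dict[int, float], cutoff: float):
    """Screen doses in ascending order and stop at the first value >= cutoff.

    Returns the last dose passed before stopping, or None if the lowest dose fails.
    """
    safe = None
    for dose in sorted(values):
        if values[dose] >= cutoff:
            break
        safe = dose
    return safe


print(highest_safe_dose(pvals, 0.025))            # 50, assuming alpha = 0.025
for theta in (0.36, 0.5, 0.74):                   # hypothetical thresholds in (0.35, 0.74]
    print(theta, highest_safe_dose(upper, theta)) # 50 each time
```

A threshold of 0.35 or below would stop at 50 mg/kg and leave 30 mg/kg as the maximum safe dose, while a threshold above 0.74 would also admit 75 mg/kg.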

6. Conclusion

In this paper, we have proposed a stepwise confidence set approach for identifying the maximum safe dose within the framework of noninferiority clinical trials. The classical three-arm noninferiority trial involves only one experimental treatment, but some therapeutic situations require comparisons of several experimental compounds. The proposed design therefore extends the three-arm noninferiority trial from one treatment to multiple treatments without multiplicity adjustment. Our simulation results showed strong control of the familywise type I error rate when equal variances were assumed across dose groups for normally distributed data, in agreement with the theoretical guarantee provided by the partitioning principle.

Data Availability

The data used for illustration were taken from the published literature (Adler and Kliesch [20]).

Conflicts of Interest

The authors declare that they have no conflicts of interest.


  1. L. A. Hothorn, M. Hayash, and D. Siedel, “Dose-response relationship in mutagenicity assay including an appropriate positive control group; a multiple test approach,” Environmental and Ecological Statistics, vol. 7, pp. 27–42, 2000.
  2. G. Stefansson, W. C. Kim, and J. C. Hsu, “On confidence sets in multiple comparisons,” in Statistical Decision Theory and Related Topics IV, S. Gupta and J. O. Berger, Eds., vol. 2, pp. 89–104, Springer, New York, NY, USA, 1988.
  3. L. Cao, J. Tao, N.-Z. Shi, and W. Liu, “A stepwise confidence interval procedure under unknown variance based on an asymmetric loss function for toxicological evaluation,” Australian & New Zealand Journal of Statistics, vol. 57, no. 1, pp. 73–98, 2015.
  4. J. T. Chen, “A nonparametric coherent confidence procedure,” Communications in Statistics—Theory and Methods, vol. 45, no. 11, pp. 3397–3409, 2016.
  5. M. Adjabui, N. Howard, and A. Luguterah, “Nonparametric stepwise procedure for identification of maximum safe dose (MSD),” Asian Research Journal of Mathematics, vol. 6, no. 3, pp. 1–12, 2017.
  6. H. Finner and K. Strassburger, “The partitioning principle: a powerful tool in multiple decision theory,” The Annals of Statistics, vol. 30, no. 4, pp. 1194–1213, 2002.
  7. L. A. Hothorn and D. Hauschke, “Identification of maximum safe dose: a multiple test approach,” Journal of Biopharmaceutical Statistics, vol. 10, pp. 15–30, 2000.
  8. D. Hauschke and L. A. Hothorn, “Two-stage testing of safety: a statistical view,” ATLA Alternatives to Laboratory Animals, vol. 31, no. 1, pp. 77–80, 2003.
  9. D. Hauschke, R. Slacik-Erben, S. Hensen, and R. Kaufmann, “Biostatistical assessment of mutagenicity studies by including the positive control,” Biometrical Journal, vol. 47, no. 1, pp. 82–87, 2005.
  10. L. A. Hothorn and F. Bretz, “Dose-response and threshold in mutagenicity studies: a statistical testing approach,” ATLA Alternatives to Laboratory Animals, vol. 31, no. 1, pp. 97–103, 2003.
  11. M. Hasler, R. Vonk, and L. A. Hothorn, “Assessing non-inferiority of a new treatment in a three-arm clinical trial in the presence of heteroscedasticity,” Statistics in Medicine, vol. 27, no. 4, pp. 490–503, 2008.
  12. D. Hauschke, T. Hothorn, and J. Schäfer, “The role of control group in mutagenicity studies: matching biological and statistical relevance,” Alternatives to Laboratory Animals, vol. 31, no. 1, pp. 65–75, 2003.
  13. I. Pigeot, J. Schäfer, J. Röhmel, and D. Hauschke, “Assessing non-inferiority of new treatment in a three-arm clinical trial including a placebo,” Statistics in Medicine, vol. 22, no. 6, pp. 883–899, 2003.
  14. R. L. Berger, “Multiparameter hypothesis testing and acceptance sampling,” Technometrics, vol. 24, no. 4, pp. 295–300, 1982.
  15. E. C. Fieller, “Some problems in interval estimation,” Journal of the Royal Statistical Society Series B, vol. 16, pp. 175–185, 1954.
  16. J. C. Hsu and R. L. Berger, “Stepwise confidence intervals without multiplicity adjustment for dose-response and toxicity studies,” Journal of the American Statistical Association, vol. 94, no. 446, pp. 468–482, 1999.
  17. S. Holm, “A simple sequential rejective multiple test procedure,” Scandinavian Journal of Statistics, vol. 6, no. 2, pp. 65–70, 1979.
  18. F. Bretz, L. A. Hothorn, and J. C. Hsu, “Identifying effective and/or safe doses by stepwise confidence intervals for ratios,” Statistics in Medicine, vol. 22, no. 6, pp. 847–858, 2003.
  19. P. H. Ramsey, “Power differences between pairwise multiple comparisons,” Journal of the American Statistical Association, vol. 73, no. 363, pp. 479–485, 1978.
  20. I.-D. Adler and U. Kliesch, “Comparison of single and multiple treatments in the mouse bone marrow micronucleus assay for hydroquinone and cyclophosphamide,” Mutation Research/Environmental Mutagenesis and Related Subjects, vol. 234, no. 3-4, pp. 115–123, 1990.

Copyright © 2019 Michael J. Adjabui et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
