Computational and Mathematical Methods in Medicine


Research Article | Open Access

Volume 2018 | Article ID 8134132

Cynthia Kpekpena, Saman Muthukumarana, "Bayesian Equivalence Testing and Meta-Analysis in Two-Arm Trials with Binary Data", Computational and Mathematical Methods in Medicine, vol. 2018, Article ID 8134132, 8 pages, 2018.

Bayesian Equivalence Testing and Meta-Analysis in Two-Arm Trials with Binary Data

Academic Editor: Kazuhisa Nishizawa
Received 15 Jan 2018
Revised 09 Jun 2018
Accepted 24 Jun 2018
Published 08 Aug 2018


We consider a Bayesian approach for assessing hypotheses of equivalence in two-arm trials with binary data. We discuss the development of the likelihood, the prior, and the posterior distributions of the parameters of interest. We then examine the suitability of a normal approximation to the posterior distribution obtained via a Taylor series expansion. Bayesian inference is carried out using Markov chain Monte Carlo (MCMC) methods. We illustrate the methods using actual data arising from two-arm clinical trials on preventing mortality after myocardial infarction.

1. Introduction

Consider a clinical trial in which a pharmaceutical company wants to test a new drug against a currently existing drug. In such studies, the clinical trial end point may be the success or failure of the treatment. A binary outcome is an outcome whose unit can take only two possible states, “0” and “1.” Examples of such success/failure responses include heart disease (Yes/No), patient condition (Good/Critical), and how often a patient feels depressed (Never/Often). The natural distribution for modeling these types of binary data is the binomial distribution, given by

$$P(X = x) = \binom{n}{x} p^{x} (1 - p)^{n - x}, \qquad x = 0, 1, \dots, n. \qquad (1)$$

The mean and variance of the binomial random variable are $np$ and $np(1 - p)$, respectively. In (1), it is assumed that there are only two outcomes (denoted “success” and “failure”) and a fixed number of trials ($n$). The trials are independent, each with the same probability of success $p$.
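As a quick numerical check of equation (1) and its moments, the sketch below evaluates the binomial probability mass function with Python's standard library and recovers the mean $np$ and variance $np(1-p)$ by direct summation over $x = 0, \dots, n$. The values $n = 20$ and $p = 0.3$ are arbitrary illustrative choices, and the function names are hypothetical.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binomial_moments(n: int, p: float) -> tuple[float, float]:
    """Mean and variance obtained by summing over the pmf for x = 0, ..., n."""
    mean = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
    second_moment = sum(x * x * binomial_pmf(x, n, p) for x in range(n + 1))
    return mean, second_moment - mean**2


mean, var = binomial_moments(20, 0.3)
print(round(mean, 6), round(var, 6))  # 6.0 4.2, i.e. np and np(1 - p)
```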

The main objective of this type of clinical trial is to determine whether there is a significant difference between the active treatment (new drug) and the reference treatment (current drug). Tests of significance alone have generally been argued to be insufficient: if the $p$ value of a test of significance leads to nonrejection of the null hypothesis, this is not proof that the null hypothesis holds. The clinician may instead want to test a null hypothesis of equivalence against an alternative hypothesis stating that there is a sufficient difference between the two drugs.

Equivalence testing is widely used when a choice is to be made between a drug (or a treatment) and an alternative. In the statistical sense, the term equivalence refers to a weak pattern displayed by the data under study regarding the underlying population distribution. Equivalence tests are designed to show the absence of a relevant difference between two treatments. It is known that Fisher’s one-sided exact test is the same as the test for equivalence in the frequentist approach [1]. This testing procedure is similar to the classical two-sided test procedure but involves an equivalence zone determined by a margin known as the equivalence margin ($\delta$).

The equivalence margin ($\delta$), which represents a margin of clinical indifference, is usually estimated from previous studies and is therefore based primarily on clinical criteria as well as on statistical principles. Although it is informed by statistical principles, the margin depends largely on the interests of the experimenter and the research questions clinicians wish to answer. Consequently, the statistical method and the study design must ensure that the margin is not too restrictive to capture the bounds of the research question. For a test of equivalence of two binomial proportions, the equivalence margin is discussed in [2].

The frequentist approach to equivalence testing is the two one-sided tests (TOST) procedure. Under TOST, equivalence is established at significance level $\alpha$ if a $100(1 - 2\alpha)\%$ confidence interval for the difference in treatment means is contained within the interval $(-\delta, \delta)$, where $\delta$ is the equivalence margin.
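For two binomial proportions $p_t$ (treatment) and $p_c$ (control), the TOST decision rule can be sketched as follows. This is a minimal illustration of the frequentist procedure, not the Bayesian method of this paper: it assumes a simple Wald interval for $p_t - p_c$ (other intervals are often preferred for small samples), and the function name `tost_two_proportions` and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def tost_two_proportions(x_t, n_t, x_c, n_c, delta, alpha=0.05):
    """Frequentist TOST for p_t - p_c via a 100(1 - 2*alpha)% Wald interval.

    Returns (lower, upper, equivalent); equivalence is declared when the
    whole interval lies inside (-delta, delta).
    """
    p_t, p_c = x_t / n_t, x_c / n_c
    diff = p_t - p_c
    se = sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    z = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value, about 1.645 for alpha = 0.05
    lower, upper = diff - z * se, diff + z * se
    return lower, upper, (-delta < lower and upper < delta)


# Hypothetical counts: 700/1000 successes under treatment, 690/1000 under control.
print(tost_two_proportions(700, 1000, 690, 1000, delta=0.05)[2])  # True
print(tost_two_proportions(700, 1000, 690, 1000, delta=0.03)[2])  # False
```

With these counts the 90% interval is roughly $(-0.024, 0.044)$, so equivalence is declared for $\delta = 0.05$ but not for $\delta = 0.03$.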

The motivation for this paper is that, for a given disease, there are likely to be many substitute or new drugs that can be used to treat patients. These drugs may not all cost the same; some may have adverse side effects, and others may be complex to administer. For these reasons, we perform equivalence testing to see whether two different drugs can be regarded as equivalent in terms of their treatment effect. Recent literature offers a variety of approaches to this problem; see Wellek [1], Albert [2], Gamalo et al. [3], Rahardja and Zhao [4], and Zaslavsky [5] for comprehensive details on recent developments. We remark that Gamalo et al. [3] consider a Bayesian approach to proportions in noninferiority trials. In this paper, we consider a Bayesian approach focusing on equivalence tests. We also construct a simple normal approximation and provide a mechanism for missing data analysis.

The remaining sections of this article are organized as follows. In Section 2, the Bayesian inferential procedure for binary data is discussed. Section 3 presents a normal approximation to the posterior distribution obtained via a Taylor series expansion and examines its suitability. We discuss a Gibbs sampling mechanism for estimating missing data in Section 4. In Section 5, we analyze a dataset published by Carlin [6] and Yusuf et al. [7], which consists of 22 treatment-control trials to prevent mortality after myocardial infarction; Section 5 also presents a meta-analysis of between-study variation and a simulation study. We conclude with a discussion of the approach in Section 6.

2. Bayesian Inferential Procedure

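Let $x_t$ be the number of individuals with positive exposure out of a total of $n_t$ patients in the treatment group, with proportion $p_t$. Accordingly, let $x_c$ denote the number of individuals with positive exposure out of a total of $n_c$ patients in the control group, with proportion $p_c$. Then,

$$x_t \mid p_t \sim \text{Binomial}(n_t, p_t), \qquad x_c \mid p_c \sim \text{Binomial}(n_c, p_c). \qquad (2)$$

The independent priors on the parameters $p_t$ and $p_c$ are given by

$$p_t \sim \text{Beta}(a_t, b_t), \qquad p_c \sim \text{Beta}(a_c, b_c). \qquad (3)$$

Then the posterior distributions of $p_t$ and $p_c$ are given by

$$p_t \mid x_t \sim \text{Beta}(x_t + a_t,\, n_t - x_t + b_t), \qquad p_c \mid x_c \sim \text{Beta}(x_c + a_c,\, n_c - x_c + b_c). \qquad (4)$$

For Bayesian inference about the treatment effect, a test is required to determine whether the posterior distribution of the difference $p_t - p_c$ lies within the bounds of the equivalence margin, $(-\delta, \delta)$. There is therefore a need to sample from the posterior distribution of $p_t - p_c$. The marginal posteriors of $p_t$ and $p_c$ are Beta distributions, so the posterior of $p_t - p_c$ is not in an analytically tractable form. Instead, $p_t^{(1)}, \dots, p_t^{(M)}$ are generated from the posterior of $p_t$ in (4), and $p_c^{(1)}, \dots, p_c^{(M)}$ are independently generated from the posterior of $p_c$, because $p_t$ and $p_c$ are independent a posteriori. Then $p_t^{(j)} - p_c^{(j)}$, $j = 1, \dots, M$, can be treated as a random sample from the posterior distribution of $p_t - p_c$.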
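The sampling scheme just described can be sketched in Python using only the standard library. It assumes the same Beta($a$, $b$) prior in both arms, with $a = b = 1$ (a uniform prior) as one illustrative noninformative choice; the function names, the number of draws $M$ (`m`), and the counts are hypothetical. The estimated posterior probability that $|p_t - p_c| \le \delta$ is the fraction of sampled differences that fall inside the margin.

```python
import random


def posterior_difference_draws(x_t, n_t, x_c, n_c, a=1.0, b=1.0, m=20000, seed=1):
    """Monte Carlo draws of p_t - p_c from the independent Beta posteriors.

    Assumes the same Beta(a, b) prior in both arms (uniform when a = b = 1).
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(m):
        p_t = rng.betavariate(x_t + a, n_t - x_t + b)  # draw from p_t | x_t
        p_c = rng.betavariate(x_c + a, n_c - x_c + b)  # independent draw from p_c | x_c
        draws.append(p_t - p_c)
    return draws


def posterior_prob_equivalence(draws, delta):
    """Monte Carlo estimate of P(|p_t - p_c| <= delta | data)."""
    return sum(abs(d) <= delta for d in draws) / len(draws)


# Hypothetical data: 70/100 successes under treatment, 68/100 under control.
draws = posterior_difference_draws(70, 100, 68, 100)
print(posterior_prob_equivalence(draws, delta=0.1))
```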

3. Normal Approximation to the Beta Posterior Distribution

Note that the posterior distributions of $p_t$ and $p_c$ are Beta distributions. Following Kpekpena [8], a normal approximation to these posteriors can be obtained using a Taylor series expansion of the log Beta posterior density around its mode. By applying a Taylor series expansion with the first three terms, it can be shown that, for $x_t$ successes out of $n_t$ under a Beta($a_t$, $b_t$) prior, $p_t \mid x_t$ is approximately distributed as $N(\hat p_t, \hat\sigma_t^2)$, where

$$\hat p_t = \frac{x_t + a_t - 1}{n_t + a_t + b_t - 2}, \qquad \hat\sigma_t^2 = \frac{\hat p_t (1 - \hat p_t)}{n_t + a_t + b_t - 2}. \qquad (5)$$

Similarly, the approximation of $p_c \mid x_c$ can be obtained. The details of this construction are given in the Appendix. We provide some approximations based on this development in Table 1 and Figures 1 and 2. The approximation works well once the posterior Beta parameters are reasonably large; however, it is not suitable when the Beta posterior parameters are less than 10.
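To see how the approximation behaves, the sketch below compares the exact Beta($u$, $v$) posterior CDF with the CDF of its normal approximation, where $u$ and $v$ denote the posterior Beta parameters (for the treatment arm, $u = x_t + a_t$ and $v = n_t - x_t + b_t$). The mode and variance follow the Taylor (Laplace-type) construction above; the exact CDF is computed by Simpson's rule, which is an implementation choice for this illustration rather than part of the method. The function names and parameter pairs are hypothetical.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist


def laplace_normal(u, v):
    """Mode and variance of the normal approximation to Beta(u, v); needs u, v > 1."""
    mode = (u - 1) / (u + v - 2)
    return mode, mode * (1 - mode) / (u + v - 2)


def beta_cdf(q, u, v, steps=2000):
    """Exact Beta(u, v) CDF at q in (0, 1) by composite Simpson's rule (steps even)."""
    log_norm = lgamma(u + v) - lgamma(u) - lgamma(v)

    def pdf(p):
        if p <= 0.0 or p >= 1.0:
            return 0.0  # the density vanishes at the endpoints when u, v > 1
        return exp(log_norm + (u - 1) * log(p) + (v - 1) * log(1 - p))

    h = q / steps
    total = pdf(0.0) + pdf(q)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    return total * h / 3


def max_cdf_gap(u, v, grid=50):
    """Largest |exact CDF - approximate CDF| over grid points in (0, 1)."""
    mode, var = laplace_normal(u, v)
    approx = NormalDist(mode, sqrt(var))
    points = [(k + 0.5) / grid for k in range(grid)]
    return max(abs(beta_cdf(q, u, v) - approx.cdf(q)) for q in points)


for u, v in [(3, 3), (10, 10), (100, 100)]:
    print(u, v, round(max_cdf_gap(u, v), 4))
```

The largest CDF gap shrinks markedly from Beta(3, 3) to Beta(100, 100), in line with the observation that the approximation is unreliable for small posterior parameters.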

Table 1: Exact distribution versus approximation.

4. Estimating Missing Data in Arms

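Missing data are easily handled in Bayesian inference by treating them as an additional set of parameters, which are estimated conditional on the observed data. For example, let $y_1, \dots, y_n$ be a binary random sample from Bernoulli($p$) in an arm, and suppose that $y_n$ is missing. Here $p$ represents $p_t$ in the treatment arm and $p_c$ in the control group. Let $\mathbf{y}_{\text{obs}} = (y_1, \dots, y_{n-1})$ and $s = \sum_{j=1}^{n-1} y_j$. Then the likelihood of the observed data is

$$L(p \mid \mathbf{y}_{\text{obs}}) = p^{s} (1 - p)^{n - 1 - s}. \qquad (6)$$

With a Beta($a$, $b$) prior on $p$, the joint posterior of $p$ and $y_n$ based on the complete data is

$$\pi(p, y_n \mid \mathbf{y}_{\text{obs}}) \propto p^{s + y_n + a - 1} (1 - p)^{n - s - y_n + b - 1}, \qquad 0 < p < 1,\; y_n \in \{0, 1\}. \qquad (7)$$

The full conditionals of $p$ and $y_n$ are

$$p \mid y_n, \mathbf{y}_{\text{obs}} \sim \text{Beta}(s + y_n + a,\, n - s - y_n + b), \qquad y_n \mid p, \mathbf{y}_{\text{obs}} \sim \text{Bernoulli}(p). \qquad (8)$$

It is straightforward to sample from these full conditionals, so $p$ and $y_n$ can be estimated using Gibbs sampling.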
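A minimal Gibbs sampler for this missing-data setup alternates the two full conditionals. It assumes a single missing observation, a Beta($a$, $b$) prior with $a = b = 1$, and a fixed burn-in of 1000 iterations; the function name, the seed, and the example arm (3 successes among 37 observed patients) are hypothetical. Because $y_n$ can be summed out analytically, the marginal posterior of $p$ is Beta($s + a$, $n - 1 - s + b$), which gives a convenient check on the sampler.

```python
import random


def gibbs_missing_binary(y_obs, a=1.0, b=1.0, iters=20000, burn_in=1000, seed=2):
    """Gibbs sampler for (p, y_n) when one of n Bernoulli(p) outcomes is missing.

    Alternates the full conditionals
        p   | y_n, y_obs ~ Beta(s + y_n + a, n - s - y_n + b)
        y_n | p          ~ Bernoulli(p)
    and returns the post-burn-in draws of p and y_n.
    """
    rng = random.Random(seed)
    n = len(y_obs) + 1         # complete-data sample size (observed plus the missing one)
    s = sum(y_obs)             # number of observed successes
    y_mis = rng.randint(0, 1)  # arbitrary starting value for the missing outcome
    p_draws, y_draws = [], []
    for t in range(iters):
        p = rng.betavariate(s + y_mis + a, n - s - y_mis + b)
        y_mis = 1 if rng.random() < p else 0
        if t >= burn_in:
            p_draws.append(p)
            y_draws.append(y_mis)
    return p_draws, y_draws


# Hypothetical arm: 3 successes among 37 observed patients, one outcome missing.
y_obs = [1] * 3 + [0] * 34
p_draws, y_draws = gibbs_missing_binary(y_obs)
print(sum(p_draws) / len(p_draws), sum(y_draws) / len(y_draws))
```

For this hypothetical arm, both printed averages should be close to $(s + a)/(n - 1 + a + b) = 4/39 \approx 0.10$, so the missing observation is most likely 0.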

5. Data Analysis

We apply our approach to the data analyzed in [6, 7]. The data include 22 trials of beta-blockers to prevent mortality after myocardial infarction. For each of the 22 trials, a test of equivalence is performed to ascertain whether the treatment proportion is equivalent to the control proportion, using the inferential procedure for binary data discussed in Sections 2 and 3. For each arm, the number of cases $x_t$ out of a total of $n_t$ patients is treated as the number of successes in $n_t$ binomial trials. Similarly, the number of cases $x_c$ in the control group is treated as a binomial outcome independent of the treatment group. The equivalence margin $\delta$ is chosen to be as small as possible, such that if $|p_t - p_c| < \delta$, we can say that the two proportions are equivalent. For demonstration purposes, we fix a practically meaningful value of the equivalence margin $\delta$. We use noninformative priors for the parameters $p_t$ and $p_c$. The hypotheses for a test of equivalence between the treatment group of study $i$ and its control group are

$$H_0: |p_t - p_c| \le \delta \quad \text{versus} \quad H_1: |p_t - p_c| > \delta. \qquad (9)$$

We perform the equivalence test in (9) using the Bayes factor [9]. Table 2 gives the results of the equivalence tests. The first column is the study label. Columns 2 and 3 give the treatment proportion and the control proportion, respectively. Columns 4 and 6 give the posterior probabilities that $H_0$ is true under the exact Beta posterior distributions and under the normal approximation to the Beta posterior, respectively. Column 5 is the Bayes factor $B$ for the exact posterior, and column 7 is the Bayes factor based on the normal approximation. For study 1, the Bayes factor for the exact posterior is 7.3822, whereas that for the normal approximation is 7.5466. Both Bayes factors are above 1, which implies that $H_1$, the hypothesis that the treatment proportion is not equivalent to the control proportion, is more likely to be true. We remark that classical hypothesis tests give one hypothesis a preferred status and consider only evidence against it, which is not the case for Bayesian tests. The results also indicate that the approximation and the exact computation lead to the same conclusion in every study, indicating that the approximation is suitable for these data.
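The posterior probabilities and Bayes factors reported in Table 2 can be sketched as follows. Several details are assumptions, since the text does not spell them out: the Bayes factor is taken as the posterior odds of nonequivalence ($H_1$) to equivalence ($H_0$) divided by the corresponding prior odds, both arms use Beta(1, 1) priors, the prior probability of $H_0$ is estimated by sampling the priors (under uniform priors it equals $1 - (1 - \delta)^2$), and the margin `delta = 0.05` and the counts are illustrative rather than the values behind Table 2. The normal-approximation version replaces each Beta posterior with $N(\hat p, \hat\sigma^2)$, so $p_t - p_c$ is normal with the two variances added.

```python
import random
from math import sqrt
from statistics import NormalDist


def prob_h0_exact(x_t, n_t, x_c, n_c, delta, a=1.0, b=1.0, m=20000, seed=3):
    """Monte Carlo P(|p_t - p_c| <= delta) under Beta(x + a, n - x + b) in each arm.

    With x = n = 0 this samples the Beta(a, b) priors, giving the prior P(H0).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(m):
        d = rng.betavariate(x_t + a, n_t - x_t + b) - rng.betavariate(x_c + a, n_c - x_c + b)
        hits += abs(d) <= delta
    return hits / m


def prob_h0_normal(x_t, n_t, x_c, n_c, delta, a=1.0, b=1.0):
    """Same probability when each arm uses the normal approximation N(mode, var)."""
    def mode_var(x, n):
        u, v = x + a, n - x + b  # posterior Beta parameters; both must exceed 1
        mode = (u - 1) / (u + v - 2)
        return mode, mode * (1 - mode) / (u + v - 2)

    m_t, v_t = mode_var(x_t, n_t)
    m_c, v_c = mode_var(x_c, n_c)
    diff = NormalDist(m_t - m_c, sqrt(v_t + v_c))  # difference of independent normals
    return diff.cdf(delta) - diff.cdf(-delta)


def bayes_factor_10(post_h0, prior_h0):
    """Posterior odds of H1 to H0 divided by prior odds of H1 to H0 (needs 0 < post_h0 < 1)."""
    return ((1 - post_h0) / post_h0) / ((1 - prior_h0) / prior_h0)


delta = 0.05  # illustrative margin only
prior_h0 = prob_h0_exact(0, 0, 0, 0, delta)
# Hypothetical counts for one study, not taken from Table 2.
print(bayes_factor_10(prob_h0_exact(70, 100, 68, 100, delta), prior_h0))
print(bayes_factor_10(prob_h0_normal(70, 100, 68, 100, delta), prior_h0))
```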


We now consider missing data in an arm. As an example, suppose an observation is missing in the treatment group of study 1. We estimate this missing value using the Gibbs sampler derived in Section 4. The posterior distributions of $p_t$ and of the missing observation, based on 20000 MCMC iterations, are given in Figure 3. According to Figure 3, the missing observation is most likely 0. The trace plot in Figure 4 shows good mixing, and the autocorrelation plot shows no large spikes after lag 0. This is consistent with convergence of the Markov chain.
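The autocorrelation check described above can be reproduced numerically. The sketch below computes the usual sample autocorrelation of a chain at lags 0 to 20; values after lag 0 that stay within roughly $\pm 1.96/\sqrt{N}$ (for a chain of length $N$) are what one would expect from nearly independent draws. The function name and the test chain of independent uniforms are illustrative assumptions; in practice the input would be the stored MCMC draws.

```python
import random


def autocorrelation(chain, max_lag=20):
    """Sample autocorrelation of a (non-constant) chain at lags 0, 1, ..., max_lag."""
    n = len(chain)
    mean = sum(chain) / n
    centered = [x - mean for x in chain]
    c0 = sum(c * c for c in centered) / n  # lag-0 autocovariance
    return [
        sum(centered[t] * centered[t + k] for t in range(n - k)) / n / c0
        for k in range(max_lag + 1)
    ]


# Illustrative chain of independent draws; replace with stored MCMC draws in practice.
rng = random.Random(0)
chain = [rng.random() for _ in range(5000)]
acf = autocorrelation(chain)
band = 1.96 / len(chain) ** 0.5
print([round(r, 3) for r in acf[:4]], "approximate band: +/-", round(band, 3))
```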

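We also consider a meta-analysis of binary data in two-arm trials in order to assess between-study variation. Let $y_i$ be the estimate of the true effect size $\theta_i$ corresponding to the $i$th study. Then the random effects model is given as

$$y_i = \theta_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma_i^2), \qquad i = 1, \dots, k. \qquad (10)$$

As developed in Muthukumarana and Tiwari [10], we consider a hierarchical Dirichlet process formulation for the $\theta_i$, in which $\theta_i \mid G \sim G$ and the unknown distribution $G$ is given a Dirichlet process prior; some hyperparameters of this formulation are treated as known.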
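The specific hierarchy of [10] is not reproduced here, but the idea of a Dirichlet process prior on the study effects can be illustrated with the truncated stick-breaking construction of $G \sim \text{DP}(c, G_0)$. This is a hedged sketch: the concentration parameter $c = 0.5$, the standard normal base distribution $G_0$, the truncation at 200 atoms, and the function names are all assumptions for illustration. Because a draw of $G$ is discrete, several studies can share the same effect value, which is how a DP formulation clusters studies.

```python
import random


def stick_breaking_dp(concentration, base_draw, truncation=200, seed=4):
    """Truncated stick-breaking draw of G ~ DP(concentration, G0).

    Returns (weights, atoms): G puts mass weights[j] on atoms[j]; atoms come from base_draw.
    """
    rng = random.Random(seed)
    weights, atoms, remaining = [], [], 1.0
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, concentration)  # fraction of the remaining stick
        weights.append(remaining * v)
        atoms.append(base_draw(rng))
        remaining *= 1.0 - v
    weights.append(remaining)  # last atom takes the leftover mass so weights sum to 1
    atoms.append(base_draw(rng))
    return weights, atoms


def sample_effects(weights, atoms, k, seed=5):
    """Draw k study effects theta_1, ..., theta_k independently from the discrete G."""
    rng = random.Random(seed)
    return rng.choices(atoms, weights=weights, k=k)


# Assumed base distribution G0 = N(0, 1) on the effect scale.
weights, atoms = stick_breaking_dp(0.5, lambda r: r.gauss(0.0, 1.0))
effects = sample_effects(weights, atoms, k=42)
print(len(set(effects)), "distinct effect values among 42 studies")
```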

We analyze the dataset published by Nissen and Wolski [11] to assess between-study heterogeneity. This dataset contains 42 trials, including 15565 diabetes patients who received rosiglitazone (treatment group) and 12282 diabetes patients assigned to medication that does not contain rosiglitazone (control group). The outcomes of interest are myocardial infarction and death associated with rosiglitazone as a treatment for diabetes. We use the odds ratio as the treatment effect. The model parameters are estimated using a Gibbs sampling algorithm implemented in R, and their estimates are given in Table 3.
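Study-level odds ratios are commonly analyzed on the log scale. The sketch below computes the log odds ratio of treatment versus control and its usual large-sample variance, $1/a + 1/b + 1/c + 1/d$, from the four cell counts of a trial. The log scale and the 0.5 continuity correction for trials with a zero cell (which arise in sparse event data) are assumptions, not details stated in the text; the function name and counts are hypothetical.

```python
from math import log


def log_odds_ratio(x_t, n_t, x_c, n_c, correction=0.5):
    """Log odds ratio (treatment vs control) and its large-sample variance.

    Adds `correction` to every cell when any cell of the 2 x 2 table is zero.
    """
    a, b = x_t, n_t - x_t  # treatment: events, non-events
    c, d = x_c, n_c - x_c  # control: events, non-events
    if 0 in (a, b, c, d):
        a, b, c, d = (cell + correction for cell in (a, b, c, d))
    y = log((a * d) / (b * c))
    var = 1 / a + 1 / b + 1 / c + 1 / d
    return y, var


# Hypothetical trial: 0/50 events under treatment and 1/50 under control.
print(log_odds_ratio(0, 50, 1, 50))
```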

Table 3: Parameter estimates and standard deviations.


We also conducted a simulation study to assess the validity of the approach. In each simulated study, the numbers of cases in the treatment group and the control group are generated as independent binomial random variables. We first generate twenty binomial counts using R's rbinom random generator, with $n = 200$ in each case and the success probability fixed at 0.7. This setting is similar to administering a treatment in twenty hospitals with 200 patients in each hospital. Fixing the success probability at 0.7 produces numbers of cases that do not vary much from one another, which is confirmed by the nonsignificant chi-square test for heterogeneity. Another set of twenty counts is then generated from the binomial distribution, this time with heterogeneity induced by varying the success probability of each trial, for instance, rbinom(1, 200, 0.86), rbinom(1, 200, 0.10), rbinom(1, 200, 0.55), ….
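The simulation design can be sketched in Python as below. Here `binomial_draw` is a hypothetical stand-in for R's `rbinom(1, n, p)`, the heterogeneous success probabilities are drawn uniformly from (0.05, 0.95) rather than the specific values used in the study, and heterogeneity is measured with the Pearson chi-square statistic for equal proportions across the 20 groups (19 degrees of freedom, whose 95th percentile is about 30.14). The seed is arbitrary.

```python
import random


def binomial_draw(rng, n, p):
    """One Binomial(n, p) draw as a sum of n Bernoulli(p) trials (stand-in for rbinom(1, n, p))."""
    return sum(rng.random() < p for _ in range(n))


def chi_square_heterogeneity(counts, n):
    """Pearson chi-square statistic for equal success probabilities across groups of size n.

    Under homogeneity it is approximately chi-square with len(counts) - 1 degrees of freedom.
    """
    k = len(counts)
    p_bar = sum(counts) / (k * n)  # pooled success proportion
    return sum((x - n * p_bar) ** 2 for x in counts) / (n * p_bar * (1 - p_bar))


rng = random.Random(6)
homogeneous = [binomial_draw(rng, 200, 0.7) for _ in range(20)]
# Heterogeneous setting: a different success probability for each simulated hospital.
heterogeneous = [binomial_draw(rng, 200, rng.uniform(0.05, 0.95)) for _ in range(20)]

print(chi_square_heterogeneity(homogeneous, 200))
print(chi_square_heterogeneity(heterogeneous, 200))
```

With these settings the homogeneous statistic is typically near its expected value of 19, while the heterogeneous statistic lies far above 30.14.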

Our interest is in comparing the posterior treatment means of the heterogeneous studies with those of the studies that are not heterogeneous. Table 4 compares the posterior treatment means of the 20 studies with heterogeneity with those of 20 other studies without heterogeneity. Column 1 gives the posterior treatment means of the nonheterogeneous studies, whereas column 2 gives those of the heterogeneous studies. The treatment means in column 1 are mostly 0.68 or just slightly below or above it; when the responses are similar, the treatment effects should estimate a common treatment mean. In contrast, the treatment means in column 2 differ substantially from one another, indicating that the model can assess between-study variation.



6. Discussion

We have considered a Bayesian analysis of binary data for testing hypotheses of equivalence. Such tests are popular in clinical trials, and our approach is relatively simple and easy to perform. Within this Bayesian formulation, we observed that the normal approximation to the Beta posterior can be used for moderately large sample sizes. We also presented a mechanism for estimating missing data in arms, which is useful when data are partially missing in some arms. Finally, we considered a meta-analytic approach for assessing between-study variation.

There are two directions we would like to pursue from the methods discussed in this article. First, we are interested in extending the method to accommodate additional covariates in the presence of multiple outcomes in an arm. Second, because the incorporation of covariates makes the Bayes factor inappropriate, we would like to examine other model selection criteria in place of the Bayes factor.


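Appendix

Let the best estimate $\hat p$ of $p$ be the value of $p$ at which the posterior density $\pi(p \mid x)$ attains its maximum. That is,

$$\hat p = \arg\max_{0 < p < 1} \pi(p \mid x), \qquad \text{so that} \qquad \left.\frac{d}{dp} \log \pi(p \mid x)\right|_{p = \hat p} = 0.$$

The Taylor series expansion of a function $f(p)$ at $\hat p$ is

$$f(p) = f(\hat p) + f'(\hat p)(p - \hat p) + \frac{f''(\hat p)}{2!}(p - \hat p)^2 + \cdots.$$

Let the posterior be $\text{Beta}(u, v)$, with $u = x + a$ and $v = n - x + b$ for $x$ successes in $n$ trials under a $\text{Beta}(a, b)$ prior, and let the log of the posterior distribution be

$$L(p) = \log \pi(p \mid x) = \text{const} + (u - 1)\log p + (v - 1)\log(1 - p).$$

By applying a Taylor series expansion to $L(p)$ at $\hat p$ with the first three terms, and using $L'(\hat p) = 0$,

$$L(p) \approx L(\hat p) + \frac{1}{2} L''(\hat p)(p - \hat p)^2, \qquad \text{where} \quad -L''(p) = \frac{u - 1}{p^2} + \frac{v - 1}{(1 - p)^2}.$$

By taking the exponential of $L(p)$,

$$\pi(p \mid x) \approx K \exp\left\{-\frac{1}{2}\left[-L''(\hat p)\right](p - \hat p)^2\right\},$$

where $K$ is a normalizing constant.

Let $\hat p = \dfrac{u - 1}{u + v - 2}$, the solution of $L'(p) = 0$, and $\sigma^2 = \left[-L''(\hat p)\right]^{-1} = \dfrac{(u - 1)(v - 1)}{(u + v - 2)^3} = \dfrac{\hat p(1 - \hat p)}{u + v - 2}$. This gives

$$p \mid x \;\dot\sim\; N(\hat p, \sigma^2).$$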
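As a worked illustration of the final formulas, consider a hypothetical posterior Beta($u$, $v$) with $u = 11$ and $v = 31$ (for example, $x = 10$ successes out of $n = 40$ under a uniform prior). The normal approximation and the exact Beta moments are then:

```latex
\hat p = \frac{u-1}{u+v-2} = \frac{10}{40} = 0.25, \qquad
\sigma^2 = \frac{\hat p(1-\hat p)}{u+v-2} = \frac{0.25 \times 0.75}{40} = 0.0046875,
\\[4pt]
E(p \mid x) = \frac{u}{u+v} = \frac{11}{42} \approx 0.2619, \qquad
\operatorname{Var}(p \mid x) = \frac{uv}{(u+v)^2(u+v+1)} = \frac{341}{75852} \approx 0.0044956.
```

The approximate standard deviation (about 0.068) is close to the exact value (about 0.067), with the mode slightly below the posterior mean.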

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.


Acknowledgments

Saman Muthukumarana has been partially supported by a grant from the Natural Sciences and Engineering Research Council of Canada.


References

  1. S. Wellek, Testing Statistical Hypotheses of Equivalence and Noninferiority, CRC Press, Boca Raton, FL, USA, 2010.
  2. J. Albert, “Teaching inference about proportions using Bayes and discrete models,” Journal of Statistics Education, vol. 3, no. 3, 1995.
  3. M. Gamalo, R. Wu, and R. Tiwari, “Bayesian approach to noninferiority trials for proportions,” Journal of Biopharmaceutical Statistics, vol. 21, no. 5, pp. 902–919, 2011.
  4. D. Rahardja and Y. D. Zhao, “Bayesian inference of a binomial proportion using one-sample misclassified binary data,” Model Assisted Statistics and Applications, vol. 7, no. 1, pp. 17–22, 2012.
  5. B. G. Zaslavsky, “Bayesian hypothesis testing in two-arm trials with dichotomous outcomes,” Biometrics, vol. 69, no. 1, pp. 157–163, 2013.
  6. J. B. Carlin, “Meta-analysis for 2 × 2 tables: a Bayesian approach,” Statistics in Medicine, vol. 11, no. 2, pp. 141–158, 1992.
  7. S. Yusuf, R. Peto, J. Lewis, R. Collins, and P. Sleight, “Beta blockade during and after myocardial infarction: an overview of the randomized trials,” Progress in Cardiovascular Diseases, vol. 27, no. 5, pp. 335–371, 1985.
  8. C. Kpekpena, “Bayesian analysis of binary and count data in two-arm trials,” University of Manitoba, Winnipeg, MB, Canada, 2014.
  9. R. E. Kass and A. E. Raftery, “Bayes factors,” Journal of the American Statistical Association, vol. 90, no. 430, pp. 773–795, 1995.
  10. S. Muthukumarana and R. C. Tiwari, “Meta-analysis using Dirichlet process,” Statistical Methods in Medical Research, vol. 25, no. 1, pp. 352–365, 2016.
  11. S. Nissen and K. Wolski, “Effect of rosiglitazone on the risk of myocardial infarction and death from cardiovascular causes,” New England Journal of Medicine, vol. 356, no. 24, pp. 2457–2471, 2007.

Copyright © 2018 Cynthia Kpekpena and Saman Muthukumarana. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
