Special Issue

Applications of Frontiers and Complexity in Optimisation Theory and Algorithms

Research Article | Open Access

Keyi Mou, Zhiming Li, "Homogeneity Test of Many-to-One Risk Differences for Correlated Binary Data under Optimal Algorithms", Complexity, vol. 2021, Article ID 6685951, 29 pages, 2021.

Homogeneity Test of Many-to-One Risk Differences for Correlated Binary Data under Optimal Algorithms

Academic Editor: Baogui Xin
Received 03 Jan 2021
Revised 17 Feb 2021
Accepted 16 Mar 2021
Published 08 Apr 2021


In clinical studies, it is important to investigate the effectiveness of different therapeutic designs, especially designs that compare multiple treatment groups with one control group. This paper studies the homogeneity test of many-to-one risk differences for correlated binary data under optimal algorithms. Under Donner’s model, several algorithms for obtaining global and constrained MLEs are compared in terms of accuracy and efficiency. Further, likelihood ratio, score, and Wald-type statistics are proposed to test whether many-to-one risk differences are equal based on the optimal algorithms. Monte Carlo simulations evaluate the performance of these algorithms through the total averaged estimation error, SD, MSE, and convergence rate. The score statistic is more robust and has satisfactory power. Two real examples are given to illustrate the proposed methods.

1. Introduction

Binary data are often encountered for paired organs (e.g., eyes and ears) in medical clinical studies. The responses of each patient are collected and recorded as paired data at the end of the study. The outcome for each patient can be no cure, a unilateral cure, or a bilateral cure. Data from all patients can be summarized in a contingency table. The correlation between responses from paired parts should be taken into account to avoid biased or misleading results.

Some probability models have been proposed for analyzing correlated paired data. Rosner introduced a constant $R$ model under the assumption that the conditional probability of a response at one side of the paired body parts, given a response at the other side, was $R$ times the unconditional probability [1]. Under Rosner’s model, asymptotic and exact tests were discussed [2–5]. However, Dallal pointed out that Rosner’s model could give a poor fit if the characteristic was almost certain to occur bilaterally with widely varying group-specific prevalence [6]. He assumed that each group had a constant conditional probability and derived the likelihood ratio statistic to test prevalence equality. M’lan and Chen presented three objective Bayesian methods for bilateral data under Dallal’s model [7]. Donner proposed an alternative model by assuming that the correlation coefficient was a fixed constant in each of the groups [8]. Thompson proved that Donner’s model could make full use of single and two-organ data to optimize the power of the study [9]. Pei et al. applied the model to stratified paired data and assumed that the correlation coefficients of responses were the same for all subjects in the two groups of each stratum [10].

Testing homogeneity has received considerable attention for bilateral binary data. In ophthalmologic studies, Rosner [1] proposed two statistics to test the equality of rate difference in (in)dependence models. Tang et al. [2] developed exact and approximate unconditional procedures for the aforementioned statistics in small sample designs or sparse data structures. Further, Tang et al. [11] developed several statistics for testing the equality of cure rates, including likelihood ratio, score, and two Wald-type statistics in the (in)dependence models. Ma et al. [3] extended these tests to multigroup cases and investigated whether the response rates of the groups were identical under Rosner’s model. From the above results, we note that it is crucial to derive the global and constrained maximum likelihood estimates (MLEs) under the hypotheses. However, there are usually no closed-form solutions for the MLEs. Under Donner’s model, Ma and Liu [12] used a two-step algorithm to obtain MLEs and developed several tests for the proportion equality among groups. Liu et al. [13] also used the method for constrained MLEs. Peng et al. [14] constructed confidence intervals (CIs) of the proportion ratio under Rosner’s model. They introduced the Fisher scoring algorithm for constrained MLEs. Many algorithms have been proposed to obtain MLEs for correlated binary data. However, there are few studies comparing different algorithms for MLEs in multigroup binary designs.

Under Donner’s model, this paper aims to provide several algorithms for calculating global and constrained MLEs and extends the homogeneity tests of Tang et al. [11] to many-to-one case under optimal algorithms. Fisher scoring algorithm, two-step method, and generalized expectation-maximization (GEM) algorithm are taken into account, since they are widely used in calculating MLEs. Optimal algorithms for MLEs required by the objective test can be found through comparing these algorithms. The rest of this article is organized as follows. In Section 2, we review data structure and establish Donner’s model for multigroup correlated binary data. Global and constrained MLEs are derived by various algorithms in Section 3. Based on the optimal algorithms, the likelihood ratio, score, and Wald-type statistics are constructed for testing the equality of many-to-one risk differences. The performance of algorithms is compared by the total averaged estimation error, SD of the averaged estimation error, MSE, and convergence rate in Section 4. Monte Carlo simulations show the empirical type I error rate and power of these tests. Two real examples are provided to illustrate the proposed methods in Section 5. Conclusions and further work are given in Section 6.

2. Preliminaries

Suppose there are $g$ groups involving $N$ individuals in the clinical trial, where the first group is the control group and the other groups are treatment groups. Let $m_{li}$ be the number of patients with $l$ responses ($l = 0, 1, 2$) in the $i$-th group ($i = 1, \dots, g$) and $m_{+i} = m_{0i} + m_{1i} + m_{2i}$ be the total number of patients in the $i$-th group, which is assumed to be fixed. The data structure is shown in Table 1.

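Table 1: Data structure.

Number of responses    Group 1    Group 2    ...    Group g    Total
0                      m_{01}     m_{02}     ...    m_{0g}     m_{0+}
1                      m_{11}     m_{12}     ...    m_{1g}     m_{1+}
2                      m_{21}     m_{22}     ...    m_{2g}     m_{2+}
Total                  m_{+1}     m_{+2}     ...    m_{+g}     N
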
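Let $p_{0i}$, $p_{1i}$, and $p_{2i}$ be the probabilities of none, unilateral, and bilateral response(s) in the $i$-th group, where $p_{0i} + p_{1i} + p_{2i} = 1$ for any fixed $i$. Denote $\mathbf{m}_i = (m_{0i}, m_{1i}, m_{2i})^{T}$ and $\mathbf{p}_i = (p_{0i}, p_{1i}, p_{2i})^{T}$. For the $i$-th group, $\mathbf{m}_i$ follows a trinomial distribution. Thus, the probability density of $\mathbf{m} = (\mathbf{m}_1, \dots, \mathbf{m}_g)$ is expressed as follows:

$$f(\mathbf{m} \mid \mathbf{p}_1, \dots, \mathbf{p}_g) = \prod_{i=1}^{g} \frac{m_{+i}!}{m_{0i}!\, m_{1i}!\, m_{2i}!}\; p_{0i}^{m_{0i}}\, p_{1i}^{m_{1i}}\, p_{2i}^{m_{2i}}.$$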

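Let $Z_{ijk}$ be an indicator of the $k$-th organ’s response for the $j$-th patient in the $i$-th group ($k = 1, 2$; $j = 1, \dots, m_{+i}$; $i = 1, \dots, g$). If there is a response, then $Z_{ijk} = 1$, and 0 otherwise. Suppose that $\Pr(Z_{ijk} = 1) = \pi_i$, $0 \le \pi_i \le 1$, and $\mathrm{Corr}(Z_{ij1}, Z_{ij2}) = \rho$ under Donner’s model. Thus, the probabilities can be obtained by

$$p_{0i} = \rho(1-\pi_i) + (1-\rho)(1-\pi_i)^2, \qquad p_{1i} = 2\pi_i(1-\rho)(1-\pi_i), \qquad p_{2i} = \rho\pi_i + (1-\rho)\pi_i^2,$$

for $i = 1, \dots, g$. Based on the observed data $\mathbf{m}$, the log-likelihood function can be given by

$$l(\pi_1, \dots, \pi_g, \rho) = \sum_{i=1}^{g} \Big[ m_{0i}\log\big(\rho(1-\pi_i) + (1-\rho)(1-\pi_i)^2\big) + m_{1i}\log\big(2\pi_i(1-\rho)(1-\pi_i)\big) + m_{2i}\log\big(\rho\pi_i + (1-\rho)\pi_i^2\big) \Big] + \log C,$$

where $C$ is a constant.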
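The cell probabilities and log-likelihood under Donner’s model can be sketched in a few lines of Python. This is a minimal illustration, not the authors’ code: the function names `donner_probs` and `log_likelihood`, the data layout (one `(m0, m1, m2)` tuple per group), and the example counts are all assumptions made for this sketch, and the additive constant of the log-likelihood is dropped.

```python
import math

def donner_probs(pi, rho):
    """Return (p0, p1, p2): probabilities of no, unilateral, and bilateral
    response for marginal response rate pi and intraclass correlation rho."""
    p0 = rho * (1 - pi) + (1 - rho) * (1 - pi) ** 2
    p1 = 2 * pi * (1 - rho) * (1 - pi)
    p2 = rho * pi + (1 - rho) * pi ** 2
    return p0, p1, p2

def log_likelihood(counts, pis, rho):
    """Log-likelihood kernel (additive constant dropped).

    counts: list of (m0, m1, m2) tuples, one per group (hypothetical layout).
    pis:    list of per-group response rates, same length as counts.
    """
    total = 0.0
    for (m0, m1, m2), pi in zip(counts, pis):
        p0, p1, p2 = donner_probs(pi, rho)
        total += m0 * math.log(p0) + m1 * math.log(p1) + m2 * math.log(p2)
    return total

# Hypothetical counts for one control group and two treatment groups.
example_counts = [(30, 10, 10), (25, 12, 13), (20, 15, 15)]
example_value = log_likelihood(example_counts, [0.30, 0.38, 0.45], 0.5)
```

Two boundary cases give a quick sanity check: with `rho = 0` the two organs are independent and the probabilities reduce to the binomial ones, and with `rho = 1` the two organs always agree, so the unilateral probability is zero.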

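Let $\delta_i = \pi_i - \pi_1$ ($i = 2, \dots, g$), where $\delta_i$ is the risk difference between the first group and the $i$-th group. We are interested in testing the hypotheses below:

$$H_0: \delta_2 = \delta_3 = \cdots = \delta_g = \delta \quad \text{versus} \quad H_1: \delta_2, \dots, \delta_g \text{ are not all equal}.$$

Under $H_0$, that is, $\pi_i = \pi_1 + \delta$ ($i = 2, \dots, g$), the log-likelihood function can be rewritten as

$$l_0(\pi_1, \delta, \rho) = \sum_{l=0}^{2} m_{l1}\log p_l(\pi_1, \rho) + \sum_{i=2}^{g}\sum_{l=0}^{2} m_{li}\log p_l(\pi_1 + \delta, \rho) + \log C,$$

where

$$p_0(\pi, \rho) = \rho(1-\pi) + (1-\rho)(1-\pi)^2, \qquad p_1(\pi, \rho) = 2\pi(1-\rho)(1-\pi), \qquad p_2(\pi, \rho) = \rho\pi + (1-\rho)\pi^2,$$

and $C$ is a constant.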

3. Test Methods

In this section, the global and constrained MLEs are first derived by various algorithms. Then, likelihood ratio, score, and Wald-type tests are constructed based on the optimal algorithms.

3.1. Global MLEs

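Let $\hat{\pi}_i$ ($i = 1, \dots, g$) and $\hat{\rho}$ be the global MLEs of $\pi_i$ and $\rho$. For the unknown parameters $\pi_i$ and $\rho$, their global MLEs are the solutions of the following equations:

$$\frac{\partial l}{\partial \pi_i} = 0 \quad (i = 1, \dots, g), \qquad \frac{\partial l}{\partial \rho} = 0,$$

where

$$\frac{\partial l}{\partial \pi_i} = -\frac{m_{0i}\big[\rho + 2(1-\rho)(1-\pi_i)\big]}{\rho(1-\pi_i) + (1-\rho)(1-\pi_i)^2} + \frac{m_{1i}(1-2\pi_i)}{\pi_i(1-\pi_i)} + \frac{m_{2i}\big[\rho + 2(1-\rho)\pi_i\big]}{\rho\pi_i + (1-\rho)\pi_i^2},$$

$$\frac{\partial l}{\partial \rho} = \sum_{i=1}^{g}\left[\frac{m_{0i}\,\pi_i}{\rho + (1-\rho)(1-\pi_i)} - \frac{m_{1i}}{1-\rho} + \frac{m_{2i}(1-\pi_i)}{\rho + (1-\rho)\pi_i}\right].$$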

However, there are no closed-form solutions for the above equations. Thus, we need to obtain the global MLEs $\hat{\pi}_i$ and $\hat{\rho}$ by different algorithms.

3.1.1. Global MLEs Based on Fisher Scoring Algorithm

The initial values of and can be given by

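The $(t+1)$-th approximates $\pi_i^{(t+1)}$ and $\rho^{(t+1)}$ can be obtained by Fisher scoring algorithm:

$$\boldsymbol{\theta}^{(t+1)} = \boldsymbol{\theta}^{(t)} + I^{-1}\big(\boldsymbol{\theta}^{(t)}\big)\, \frac{\partial l}{\partial \boldsymbol{\theta}}\bigg|_{\boldsymbol{\theta} = \boldsymbol{\theta}^{(t)}},$$

where $\boldsymbol{\theta} = (\pi_1, \dots, \pi_g, \rho)^{T}$ and $I(\boldsymbol{\theta})$ is a Fisher information matrix (see Appendix A). Repeat the process until the result converges.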
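The Fisher scoring iteration can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors’ implementation: the expected information is computed from the general multinomial identity $I_{ab} = \sum_i m_{+i} \sum_l (\partial p_{li}/\partial\theta_a)(\partial p_{li}/\partial\theta_b)/p_{li}$ instead of the closed forms of Appendix A; the starting values (a moment-type estimate for each $\pi_i$ and $\rho = 0.5$) are assumed here and are not necessarily those of equation (9); and iterates are clipped to the open interval (0, 1), which restricts $\rho$ to be positive. All function names are hypothetical.

```python
def donner_terms(pi, rho):
    """Cell probabilities and their partial derivatives under Donner's model."""
    p = (rho * (1 - pi) + (1 - rho) * (1 - pi) ** 2,
         2 * pi * (1 - rho) * (1 - pi),
         rho * pi + (1 - rho) * pi ** 2)
    dp_dpi = (-rho - 2 * (1 - rho) * (1 - pi),
              2 * (1 - rho) * (1 - 2 * pi),
              rho + 2 * (1 - rho) * pi)
    dp_drho = (pi * (1 - pi), -2 * pi * (1 - pi), pi * (1 - pi))
    return p, dp_dpi, dp_drho

def score_and_info(counts, pis, rho):
    """Score vector and the blocks of the expected information matrix.

    The information has an "arrow" shape: a diagonal block d for the pi_i,
    a border c linking each pi_i to rho, and a scalar r for rho.
    """
    u_pi, d, c = [], [], []
    u_rho = r = 0.0
    for m, pi in zip(counts, pis):
        n = sum(m)
        p, a, b = donner_terms(pi, rho)
        u_pi.append(sum(m[l] * a[l] / p[l] for l in range(3)))
        u_rho += sum(m[l] * b[l] / p[l] for l in range(3))
        d.append(n * sum(a[l] ** 2 / p[l] for l in range(3)))
        c.append(n * sum(a[l] * b[l] / p[l] for l in range(3)))
        r += n * sum(b[l] ** 2 / p[l] for l in range(3))
    return u_pi, u_rho, d, c, r

def fisher_scoring(counts, rho0=0.5, tol=1e-10, max_iter=200):
    """Global MLEs (pi_1..pi_g, rho) by Fisher scoring; starts are assumptions."""
    def clip(x):
        return min(max(x, 1e-8), 1 - 1e-8)

    pis = [clip((m[1] + 2 * m[2]) / (2 * sum(m))) for m in counts]
    rho = rho0
    for _ in range(max_iter):
        u_pi, u_rho, d, c, r = score_and_info(counts, pis, rho)
        # Solve I * step = U with a Schur complement on the rho entry.
        schur = r - sum(ci * ci / di for ci, di in zip(c, d))
        step_rho = (u_rho - sum(ci * ui / di
                                for ci, ui, di in zip(c, u_pi, d))) / schur
        step_pi = [(ui - ci * step_rho) / di for ui, ci, di in zip(u_pi, c, d)]
        new_pis = [clip(p + dp) for p, dp in zip(pis, step_pi)]
        new_rho = clip(rho + step_rho)
        change = max([abs(new_rho - rho)] +
                     [abs(x - y) for x, y in zip(new_pis, pis)])
        pis, rho = new_pis, new_rho
        if change < tol:  # difference between two successive iterations
            break
    return pis, rho
```

For a single group the model is saturated, so the result can be checked by hand: with counts `(30, 10, 10)` the MLEs are $\hat{\pi} = 0.3$ and $\hat{\rho} = 1 - 0.2/0.42 = 11/21$. Exploiting the arrow structure avoids a general matrix inverse and keeps the sketch free of external dependencies.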

3.1.2. Global MLEs Based on Two-Step Method

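The two-step method combines the real root of a third-order polynomial with the Newton–Raphson algorithm. The detailed procedure is provided below.
(i) Take the initial value $\rho^{(0)}$. Moreover, equation (8) can be simplified as a third-order polynomial in $\pi_i$. Put $\rho^{(t)}$ into the polynomial and solve its real root to obtain the $(t+1)$-th approximate of $\pi_i$, denoted by $\pi_i^{(t+1)}$.
(ii) Update the $(t+1)$-th approximate of $\rho$ by Newton–Raphson algorithm:

$$\rho^{(t+1)} = \rho^{(t)} - \left(\frac{\partial^2 l}{\partial \rho^2}\right)^{-1} \frac{\partial l}{\partial \rho}\,\Bigg|_{\pi_i = \pi_i^{(t+1)},\ \rho = \rho^{(t)}},$$

where

$$\frac{\partial l}{\partial \rho} = \sum_{i=1}^{g}\left[\frac{m_{0i}\,\pi_i}{\rho + (1-\rho)(1-\pi_i)} - \frac{m_{1i}}{1-\rho} + \frac{m_{2i}(1-\pi_i)}{\rho + (1-\rho)\pi_i}\right]$$

and

$$\frac{\partial^2 l}{\partial \rho^2} = -\sum_{i=1}^{g}\left[\frac{m_{0i}\,\pi_i^2}{\big(\rho + (1-\rho)(1-\pi_i)\big)^2} + \frac{m_{1i}}{(1-\rho)^2} + \frac{m_{2i}(1-\pi_i)^2}{\big(\rho + (1-\rho)\pi_i\big)^2}\right].$$

Repeat (i)-(ii) until convergence.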

3.1.3. Global MLEs Based on GEM Algorithm

According to equation (1), we have

Suppose patients with 0 response can be divided into two parts, whose responsibilities are and , respectively. Let latent variables and be their total numbers. Observable variable can also be split into two latent variables and . Suppose the probability of result happening in (or ) individuals is (or ). When , and are given, and follow binomial distributions:

Denote and . Then, are complete data and observable data are incomplete data. Based on complete data, we have

Thus, the log-likelihood function about complete data iswhere

The initial values of are defined in equation (9). The process of GEM algorithm is described by the expectation (E) and maximization (M) steps as follows.(i)E Step. Given and the current approximates . DenoteLet be the expected value of the log-likelihood function of , with respect to the current conditional distribution of as follows:where(ii)M Step. Update and in order to successively maximize expected value of the log-likelihood function in E step. The new approximate of parameters can be obtained by maximizing when other parameters are given as their latest approximates. Repeat E and M steps until the result converges.
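An EM-type iteration for the global MLEs can be sketched as follows. The latent split used here is an assumption of this sketch and may differ in detail from the authors’ construction: Donner’s probabilities are read as a mixture in which, with probability $\rho$, both organs share a single Bernoulli($\pi_i$) outcome and, with probability $1-\rho$, they respond independently. This reading requires $0 \le \rho \le 1$. Patients with 0 (or 2) responses are split into a shared part with probability $\rho(1-\pi_i)$ (or $\rho\pi_i$) and an independent part with probability $(1-\rho)(1-\pi_i)^2$ (or $(1-\rho)\pi_i^2$). Under this split the M step has a closed form, so the iteration is a plain EM, which is a special case of GEM. The names `em_donner`, `a`, and `b` and the starting values are hypothetical.

```python
import math

def loglik(counts, pis, rho):
    """Observed-data log-likelihood kernel (additive constant dropped)."""
    total = 0.0
    for (m0, m1, m2), pi in zip(counts, pis):
        total += m0 * math.log(rho * (1 - pi) + (1 - rho) * (1 - pi) ** 2)
        total += m1 * math.log(2 * pi * (1 - rho) * (1 - pi))
        total += m2 * math.log(rho * pi + (1 - rho) * pi ** 2)
    return total

def em_donner(counts, rho0=0.5, tol=1e-12, max_iter=100000):
    """EM for (pi_1..pi_g, rho); assumes 0 < rho < 1 and interior pi_i."""
    pis = [(m1 + 2 * m2) / (2 * (m0 + m1 + m2)) for m0, m1, m2 in counts]
    rho = rho0
    n_total = sum(sum(m) for m in counts)
    history = [loglik(counts, pis, rho)]
    for _ in range(max_iter):
        # E step: expected number of "shared-outcome" patients among those
        # with 0 responses (a) and among those with 2 responses (b).
        a = [m[0] * rho / (rho + (1 - rho) * (1 - pi))
             for m, pi in zip(counts, pis)]
        b = [m[2] * rho / (rho + (1 - rho) * pi)
             for m, pi in zip(counts, pis)]
        # M step: closed-form maximizers of the expected complete-data
        # log-likelihood (rho and each pi_i separate).
        new_rho = (sum(a) + sum(b)) / n_total
        new_pis = [(m[1] + 2 * m[2] - bi) / (2 * sum(m) - ai - bi)
                   for m, ai, bi in zip(counts, a, b)]
        change = max([abs(new_rho - rho)] +
                     [abs(x - y) for x, y in zip(new_pis, pis)])
        pis, rho = new_pis, new_rho
        history.append(loglik(counts, pis, rho))
        if change < tol:
            break
    return pis, rho, history
```

The returned `history` holds the observed-data log-likelihood after each iteration; it should never decrease, which gives a simple diagnostic. Convergence is linear and therefore needs many more iterations than Fisher scoring for the same tolerance.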

3.2. Constrained MLEs

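Let $\tilde{\pi}_1$, $\tilde{\delta}$, and $\tilde{\rho}$ be the constrained MLEs of $\pi_1$, $\delta$, and $\rho$ under $H_0$. Under $H_0$: $\delta_2 = \cdots = \delta_g = \delta$, it is obvious that $\pi_i = \pi_1 + \delta$ ($i = 2, \dots, g$). Thus, the constrained MLEs satisfy the following equations:

$$\frac{\partial l_0}{\partial \pi_1} = 0, \qquad \frac{\partial l_0}{\partial \delta} = 0, \qquad \frac{\partial l_0}{\partial \rho} = 0.$$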

However, these equations have no closed-form solutions. Thus, we introduce the Fisher scoring algorithm, the two-stage procedure, and the GEM algorithm to obtain the constrained MLEs $\tilde{\pi}_1$, $\tilde{\delta}$, and $\tilde{\rho}$.

3.2.1. Constrained MLEs Based on Fisher Scoring Algorithm

The initial values of are defined in equation (9), and . The -th updates of , , and can be calculated by Fisher scoring algorithm as follows:where is a Fisher information matrix (see Appendix B).

3.2.2. Constrained MLEs Based on Two-Stage Procedure

The two-stage procedure is different from the two-step method in Section 3.1.2. Firstly, the MLE of is given by Newton–Raphson algorithm. Then, and are obtained by Fisher scoring algorithm under given MLE . The detailed process is described as follows.(i)The initial values and are defined in the equation (9), and . The -th approximate of is obtained by Newton–Raphson algorithm:where(ii)Given , the -th approximates of and can be calculated aswhere is a Fisher information matrix of and (see Appendix B).

3.2.3. Constrained MLEs Based on GEM Algorithm

Similar to global MLEs, the GEM algorithm is used to calculate constrained MLEs under . The initial values of are defined in equation (9), and . The detailed process is shown as follows.(i)E Step. Letwhere(ii)M Step. Similar to Section 3.1.3, we choose parameters so that increases. Repeat E and M steps until the result converges.

3.3. Likelihood Ratio Test

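The likelihood ratio test statistic can be constructed from the global and constrained MLEs as follows:

$$T_L = 2\big[\, l(\hat{\pi}_1, \dots, \hat{\pi}_g, \hat{\rho}) - l_0(\tilde{\pi}_1, \tilde{\delta}, \tilde{\rho}) \,\big],$$

where $(\hat{\pi}_1, \dots, \hat{\pi}_g, \hat{\rho})$ are the global MLEs and $(\tilde{\pi}_1, \tilde{\delta}, \tilde{\rho})$ are the constrained MLEs.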
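A likelihood ratio statistic of this form can be computed with the short sketch below. It relies on two assumptions that are not taken from the paper. First, because the null hypothesis forces all treatment groups to share one response rate, the constrained fit is obtained here by pooling the treatment-group counts and fitting a two-group model, which maximizes the same constrained log-likelihood. Second, a simple EM iteration (valid for $0 < \rho < 1$) is used as a stand-in optimizer; any of the algorithms of Sections 3.1 and 3.2 could replace it. The function names and the example counts are hypothetical.

```python
import math

def loglik(counts, pis, rho):
    """Log-likelihood kernel under Donner's model (constant dropped)."""
    total = 0.0
    for (m0, m1, m2), pi in zip(counts, pis):
        total += m0 * math.log(rho * (1 - pi) + (1 - rho) * (1 - pi) ** 2)
        total += m1 * math.log(2 * pi * (1 - rho) * (1 - pi))
        total += m2 * math.log(rho * pi + (1 - rho) * pi ** 2)
    return total

def max_loglik(counts, rho=0.5, tol=1e-12, max_iter=100000):
    """Maximized log-likelihood via EM (stand-in optimizer, 0 < rho < 1)."""
    pis = [(m1 + 2 * m2) / (2 * (m0 + m1 + m2)) for m0, m1, m2 in counts]
    n_total = sum(sum(m) for m in counts)
    for _ in range(max_iter):
        a = [m[0] * rho / (rho + (1 - rho) * (1 - pi))
             for m, pi in zip(counts, pis)]
        b = [m[2] * rho / (rho + (1 - rho) * pi)
             for m, pi in zip(counts, pis)]
        new_rho = (sum(a) + sum(b)) / n_total
        new_pis = [(m[1] + 2 * m[2] - bi) / (2 * sum(m) - ai - bi)
                   for m, ai, bi in zip(counts, a, b)]
        change = max([abs(new_rho - rho)] +
                     [abs(x - y) for x, y in zip(new_pis, pis)])
        pis, rho = new_pis, new_rho
        if change < tol:
            break
    return loglik(counts, pis, rho)

def lr_statistic(counts):
    """T_L = 2 * (global maximum - constrained maximum).

    counts[0] is the control group; counts[1:] are the treatment groups.
    """
    pooled_treatment = tuple(sum(col) for col in zip(*counts[1:]))
    constrained = max_loglik([counts[0], pooled_treatment])
    return 2.0 * (max_loglik(counts) - constrained)

# Hypothetical data: one control group and two treatment groups (g = 3).
t_l = lr_statistic([(30, 10, 10), (35, 8, 7), (15, 15, 20)])
```

With $g = 3$ groups the reference distribution has $g - 2 = 1$ degree of freedom, so at the 5% level `t_l` would be compared with the chi-square critical value 3.841. When the treatment groups have identical counts, the global and constrained fits coincide and the statistic is zero up to rounding.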

3.4. Score Test

Note that $H_0$ is equivalent to $\pi_2 = \pi_3 = \cdots = \pi_g$. The homogeneity test of many-to-one risk differences can therefore be achieved by testing the equality of risks in the treatment groups. The score test statistic is the quadratic form

$$T_{SC} = U^{T} I^{-1} U,$$

where $U$ is the score vector of the log-likelihood and $I$ is the Fisher information matrix (see Appendix C for more information), both evaluated at the constrained MLEs under $H_0$.

3.5. Wald-Type Test

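Let $\boldsymbol{\beta} = (\pi_1, \dots, \pi_g, \rho)^{T}$ and let $C$ be the $(g-2) \times (g+1)$ contrast matrix

$$C = \begin{pmatrix} 0 & 1 & -1 & 0 & \cdots & 0 & 0 & 0 \\ 0 & 0 & 1 & -1 & \cdots & 0 & 0 & 0 \\ \vdots & & & & \ddots & & & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 1 & -1 & 0 \end{pmatrix}.$$

The null hypothesis $H_0$ is equivalent to $C\boldsymbol{\beta} = \mathbf{0}$. Thus, the Wald-type statistic is

$$T_W = (C\boldsymbol{\beta})^{T}\big(C I^{-1} C^{T}\big)^{-1}(C\boldsymbol{\beta})\,\Big|_{\boldsymbol{\beta} = \hat{\boldsymbol{\beta}}},$$

where $\hat{\boldsymbol{\beta}}$ denotes the global MLEs and the Fisher information matrix $I$ is the same as that of the score test (see Appendix D).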

Under $H_0$, the test statistics $T_L$, $T_{SC}$, and $T_W$ are asymptotically distributed as a chi-square distribution with $g-2$ degrees of freedom. Thus, $H_0$ should be rejected if the value of the test statistic is larger than $\chi^2_{g-2,\,1-\alpha}$ at the significance level $\alpha$, where $\chi^2_{g-2,\,1-\alpha}$ is the $100(1-\alpha)$ percentile of the chi-square distribution with $g-2$ degrees of freedom.

4. Monte Carlo Simulations

In this section, the performance of several algorithms is compared with respect to the average errors of MLEs, the number of iterations, and time cost. For convenience, we denote Fisher scoring algorithm, two-step method, and GEM algorithm for global MLEs as FSA, TSM, and GEM and Fisher scoring algorithm, two-stage procedure, and GEM algorithm for constrained MLEs as FSA, TSP, and GEM in tables and figures. Then, we investigate the type I error rates (TIEs) and power of the likelihood ratio, score, and Wald-type tests. In simulations, and are arranged as shown in Table 2, where scenarios 4, 8, and 12 are unbalanced designs.





4.1. Selection of Algorithms

Under or , we randomly select 1000 sets of and for each scenario in Table 2. Further, 10,000 samples are randomly produced for each parameter setting.

4.1.1. Evaluation of MLEs

Let be averaged error among estimators and true values of parameters from the 10000 random samples under the -th parameter setting. Denote in each scenario, which is the total averaged error for 1000 parameter settings. The dispersion of in each scenario can be reflected by standard deviation (SD) value of . The MSEs of the global MLEs can be evaluated by the differences from the true parameters to the corresponding estimated values under various parameter settings. The global MLEs can be calculated based on Fisher scoring algorithm, two-step method, and GEM algorithm. The constrained MLEs are obtained by Fisher scoring algorithm, two-stage procedure, and GEM algorithm. The random samples for the former are generated under and the latter are generated under . The convergence accuracy is defined by the differences from two close iterations and fixed as . The MSEs of the three algorithms for global MLEs show no significant difference, as shown in Tables 3–5. That is to say, these algorithms give essentially identical global MLEs. In Tables 6–8, the values of , SD, and MSEs from Fisher scoring algorithm are usually smaller than those of the other two algorithms for constrained MLEs. So, Fisher scoring algorithm has higher accuracy for constrained MLEs. All MSEs become smaller and closer to each other when the sample size increases. Algorithms for MLEs have better MSEs in balanced designs than in unbalanced designs.

Index    MLE    Scenario 1    Scenario 2    Scenario 3    Scenario 4