International Scholarly Research Notices

N. Rao Chaganty, Roy Sabo, Yihao Deng, "Alternatives to Mixture Model Analysis of Correlated Binomial Data", International Scholarly Research Notices, vol. 2012, Article ID 896082, 10 pages, 2012.

Alternatives to Mixture Model Analysis of Correlated Binomial Data

Academic Editor: A. Hutt
Received 28 Feb 2012
Accepted 29 Mar 2012
Published 28 May 2012


While univariate instances of binomial data are readily handled with generalized linear models, cases of multivariate or repeated measure binomial data are complicated by the possibility of correlated responses. Likelihood-based estimation can be applied by using mixture distribution models, though this approach can present computational challenges. The logistic transformation can be used to bypass these concerns and allow for alternative estimating procedures. One popular alternative is the generalized estimating equation (GEE) method, though systematic errors can lead to infeasible correlation estimates or nonconvergence problems. Our approach couples the quasi-least squares (QLS) method with a rarely used matrix factorization, which yields a simpler estimation platform than the mixture model approach and does not suffer from the convergence problems of the GEE method. A noncontrived example is provided that shows the mechanical breakdown of GEE in several statistical software packages and highlights the usefulness of the QLS approach.

1. Introduction

Binomial data occur when observations on a given subject consist of a fixed series of Bernoulli trials, resulting in a proportional outcome. Maximum likelihood estimation is readily available in a generalized linear modeling framework when subjects consist of univariate measures (i.e., one Bernoulli or binomial trial per subject). However, estimation becomes more complicated when several Bernoulli or binomial trials are observed for each subject. In this case subject responses could be multivariate (consisting of several series of separately defined trials) or repeated measure (where the sets of trials are defined similarly), and in both instances the intrasubject responses may be correlated. Here we use the term "subject" for convenience, but it could be an item, store, location, plot in an agricultural experiment, and so on. Real-life situations in which correlated proportions arise include (1) bankers interested in the proportion of customers making the $i$th type of transaction at the $j$th bank branch; (2) the proportion of CD deposits at the $i$th branch in the $j$th month of a year; (3) retail managers interested in the proportion of customers purchasing the $i$th item at the $j$th store; (4) marketers interested in the proportion of subjects viewing the $i$th advertisement type on the $j$th website; (5) information technology specialists interested in the proportion of students who use the $i$th computer program in the $j$th computer lab; (6) biologists interested in the proportion of hatched English sole eggs kept in solutions at different temperatures and salinity levels. We will discuss the last example later in this paper. Other examples where correlated binomial data occur are the seed testing experiments described in Gilliland et al. [1].

Data layout of the aforementioned examples is presented in Table 1. In all of these examples the proportions within each row are correlated but the rows can be assumed to be independent. The within-row correlation, while complicating matters, must be accounted for in order to obtain proper variance estimates and inference for any regression parameters representing the associations between the vector of proportions and covariates. Thus, the problem is to estimate the parameters of interest within the ensemble of all parameters. In this context one could use a likelihood-based approach utilizing mixture-distribution models.

Table 1

           Rep. 1      Rep. 2      ⋯    Rep. $t$
Sub. 1     $p_{11}$    $p_{12}$    ⋯    $p_{1t}$
Sub. 2     $p_{21}$    $p_{22}$    ⋯    $p_{2t}$
Sub. 3     $p_{31}$    $p_{32}$    ⋯    $p_{3t}$
⋮          ⋮           ⋮           ⋯    ⋮
Sub. $m$   $p_{m1}$    $p_{m2}$    ⋯    $p_{mt}$

In the case of binomial data the mixture model would consist of both binomial and logit-normal components. However, parameter estimation in the mixture model could experience convergence problems due to the multitude of marginal means, regression, and correlation parameters. A simpler alternative is to transform the variable-specific proportions for each subject so as to simplify the assumed probability distribution. The logit of the proportions transforms the outcome scale from $(0,1)$ to $(-\infty,\infty)$, which could make a multivariate normal-based methodology appropriate. One such procedure is the generalized estimating equations (GEE) method proposed by Liang and Zeger [2]. Though it is a popular methodology for estimating regression parameters in longitudinal or repeated measure data, this procedure has problems estimating correlations. As will be seen in subsequent sections, the GEE method can fail to converge even for continuous data, which is the situation when the logit transformation is applied to binomial data.

Coull and Agresti [3] discussed random effects models for logit-transformed correlated binomial data. Here we suggest the method of quasi-least squares (QLS), developed by Chaganty [4] and Chaganty and Shults [5]. While QLS is generally seen as an alternative way of solving the maximum likelihood score equation for correlation parameters in the case of Gaussian data (Sabo and Chaganty [6]), the estimation of correlation in the QLS procedure can also be supplemented with a little-known matrix factorization that makes it distinct from the maximum likelihood method. In this sense the QLS procedure is applicable to correlated continuous data, which makes it appropriate for logit-transformed binomial proportions.

The rest of this paper is outlined as follows. The likelihood-based mixture-model approach is discussed in Section 2, while the logit transformation of binomial data and the GEE methodology are discussed in Section 3. We briefly outline the QLS estimating procedure in Section 4, while also highlighting the matrix factorization used in estimating the correlation. A noncontrived example is given in Section 5 that shows the usefulness of the QLS approach, as well as the convergence problems experienced by several statistical software packages in implementing the GEE method. A brief conclusion follows in Section 6.

2. Maximum Likelihood Estimation Using Mixture Distribution Models

For $i=1,\ldots,m$ subjects, let $\mathbf{y}_i=(y_{i1},\ldots,y_{it})'$ be a vector of $t$ possibly dependent binomial random variables, where $y_{ij}$ is the number of successes out of $n_{ij}$ trials with success probability $p_{ij}$ for the $j$th variable of the $i$th subject. Also assume that $\mathbf{x}_{ij}=(x_{ij1},\ldots,x_{ijk})$ is the vector of $k$ covariates corresponding to the $j$th variable in the $i$th subject, such that $\mathbf{X}_i=(\mathbf{x}'_{i1},\ldots,\mathbf{x}'_{it})'$ is the $t\times k$ matrix of all covariates for the $i$th subject.

The general mixture distribution model for binomial data is given by

$$f(\mathbf{y}_i)=\int_{[0,1]^t}\prod_{j=1}^{t}\binom{n_{ij}}{y_{ij}}\,p_{ij}^{y_{ij}}\,(1-p_{ij})^{n_{ij}-y_{ij}}\,\mathbf{G}(d\mathbf{p}_i), \tag{2.1}$$

where $\mathbf{G}$ is a multivariate cumulative distribution function with support in $[0,1]^t$ and $\mathbf{p}_i=(p_{i1},\ldots,p_{it})$. Basically, we assume that $\mathbf{p}_i$ is distributed as $\mathbf{G}(\cdot)$, and, given $\mathbf{p}_i$, the vector $\mathbf{y}_i$ consists of $t$ independent binomial variables. Then the marginal distribution of $\mathbf{y}_i$ is given by (2.1). A popular choice for $\mathbf{G}$ is the multivariate logit-normal distribution; that is, the distribution obtained under the assumption that $\operatorname{logit}(\mathbf{p}_i)=(\log(p_{i1}/(1-p_{i1})),\ldots,\log(p_{it}/(1-p_{it})))$ is multivariate normal with mean $\boldsymbol{\mu}_i=(\mu_{i1},\ldots,\mu_{it})$ and covariance matrix $\Sigma$. Here $\mu_{ij}=\mathbf{x}'_{ij}\boldsymbol{\beta}$ represents the mean as a function of the covariates and a $k$-dimensional regression parameter vector $\boldsymbol{\beta}$. To make model (2.1) identifiable we make the common assumption that $\Sigma=\phi\mathbf{R}$, where $\mathbf{R}$ is a correlation matrix and $\phi$ is a scale parameter (Joe [7], page 219). This condition is necessary for model identification, as the following simple example shows. Suppose $t=2$ and that the vector $\mathbf{p}$ is multivariate logit-normal distributed with mean $\boldsymbol{\mu}=\mathbf{0}$ and covariance matrix

$$\Sigma=\begin{pmatrix}\sigma_1^2 & 0.3\,\sigma_1\sigma_2\\ 0.3\,\sigma_1\sigma_2 & \sigma_2^2\end{pmatrix}.$$

It is easy to verify that two sets of choices for the variances $\sigma_1^2$ and $\sigma_2^2$ can result in an identical binary distribution for $\mathbf{y}$, as shown in Table 2.

Table 2

$(y_1, y_2)$    Joint probability of $\mathbf{y}$
                $\sigma_1=3.0,\ \sigma_2=2.9$    $\sigma_1=3.8,\ \sigma_2=4.0$
(1, 1)          0.2877                           0.2877
(1, 0)          0.2123                           0.2123
(0, 1)          0.2877                           0.2877
(0, 0)          0.2123                           0.2123

If $\boldsymbol{\beta},\mathbf{R},\phi$ are the only parameters of interest, we can obtain maximum likelihood estimates by maximizing the likelihood $L(\boldsymbol{\beta},\mathbf{R},\phi)=\prod_{i=1}^{m}f(\mathbf{y}_i)$. However, if the $E(y_{ij}/n_{ij})=p_{ij}$'s are also of interest, we can obtain estimates of these parameters using either the empirical Bayes (EB) method or the EM algorithm, considering the full likelihood

$$L(\mathbf{p}_i,\boldsymbol{\beta},\mathbf{R},\phi)=\prod_{i=1}^{m}\left[\prod_{j=1}^{t}\binom{n_{ij}}{y_{ij}}\,p_{ij}^{y_{ij}}\,(1-p_{ij})^{n_{ij}-y_{ij}}\right]h(\mathbf{p}_i,\boldsymbol{\beta},\mathbf{R},\phi). \tag{2.2}$$

Equation (2.2) is the full specification of (2.1) with covariates, regression parameters, correlation, and variance described in $h(\cdot)$, the multivariate logit-normal density function. One quickly notices that the number of parameters in the likelihood (2.2) increases with $m$, and that maximizing (2.2) requires finding the roots of complex nonlinear equations. These considerations may make the full likelihood approach subject to computational difficulties and convergence problems. Further, such specific definitions for the components of the mixture model may affect estimator robustness.
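
As a rough illustration of what evaluating (2.1) involves, the marginal probability $f(\mathbf{y}_i)$ under a logit-normal $\mathbf{G}$ can be approximated by Monte Carlo integration. The sketch below is not the authors' code: the function name `mixture_pmf`, the number of draws, and all numeric inputs are hypothetical values chosen only for illustration.

```r
# Monte Carlo approximation of the marginal pmf (2.1) under a logit-normal G.
# Hypothetical helper; all numeric inputs further down are illustration values.
mixture_pmf <- function(y, n, mu, Sigma, draws = 20000, seed = 1) {
  set.seed(seed)                                # same draws on every call
  n_rep <- length(y)
  z <- matrix(rnorm(draws * n_rep), nrow = draws)
  u <- sweep(z %*% chol(Sigma), 2, mu, "+")     # rows are draws from N(mu, Sigma)
  p <- plogis(u)                                # inverse logit: draws of p_i
  dens <- rep(1, draws)
  for (j in seq_len(n_rep)) {
    # conditional on p_i, the y_ij are independent binomials
    dens <- dens * dbinom(y[j], n[j], p[, j])
  }
  mean(dens)                                    # average over the draws of p_i
}

n     <- c(2, 2)                                # trials per repeated measure
mu    <- c(0.5, -0.2)                           # means on the logit scale
Sigma <- 1.5 * matrix(c(1, 0.3, 0.3, 1), 2, 2)  # Sigma = phi * R

# Evaluate the pmf on every possible outcome (y1, y2)
grid  <- expand.grid(y1 = 0:2, y2 = 0:2)
probs <- apply(grid, 1, function(y) mixture_pmf(y, n, mu, Sigma))
print(cbind(grid, prob = round(probs, 4)))
print(sum(probs))
```

Because every call reuses the same simulated draws, the approximated probabilities add up to one across the grid. A likelihood-based fit would have to repeat an integration of this kind for every subject at every step of the optimizer, which gives a sense of the computational burden described above.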

3. Alternatives to Likelihood-Based Estimation

For the reasons outlined earlier, it makes sense to consider the vector of logit-transformed proportions $\hat{\mathbf{u}}_i=(\hat{u}_{i1},\ldots,\hat{u}_{it})$, where $\hat{u}_{ij}=\operatorname{logit}(\hat{p}_{ij})=\log[\hat{p}_{ij}/(1-\hat{p}_{ij})]$ and $\hat{p}_{ij}=y_{ij}/n_{ij}$. Note that $\hat{\mathbf{u}}_i$ is distributed as multivariate normal with parameters $E(\hat{\mathbf{u}}_i)=\boldsymbol{\mu}_i=\mathbf{X}_i\boldsymbol{\beta}$ and $\operatorname{Cov}(\hat{\mathbf{u}}_i)=\phi\mathbf{R}$. The focus on these normally distributed random variables, rather than the mixture-distribution-based binomial random variables, allows us to relax distributional assumptions and utilize distribution-free methodologies for parameter estimation such as the generalized estimating equations (GEE). This methodology is a two-stage process, in which the estimate of the regression parameter $\boldsymbol{\beta}$ is updated by a residual-based moment estimate of $\mathbf{R}$. Specifically, estimation is iterated between the two equations

$$\sum_{i=1}^{m}\left(\frac{\partial\boldsymbol{\mu}_i}{\partial\boldsymbol{\beta}}\right)'\hat{\mathbf{R}}^{-1}\left(\hat{\mathbf{u}}_i-\boldsymbol{\mu}_i\right)=\mathbf{0},\qquad \hat{\mathbf{R}}=\frac{\mathbf{Z}}{\hat{\phi}}, \tag{3.1}$$

until convergence. Here $\mathbf{Z}=\sum_{i=1}^{m}\mathbf{z}_i\mathbf{z}'_i$ and $\hat{\phi}=\sum_{i=1}^{m}(\mathbf{z}'_i\mathbf{z}_i)/(mt-k)$, where $\mathbf{z}_i=\hat{\mathbf{u}}_i-\boldsymbol{\mu}_i(\hat{\boldsymbol{\beta}})$. The problem with this methodology is that the diagonal elements of $\mathbf{Z}$ are not necessarily equal to $\hat{\phi}$, implying that the diagonal elements of $\hat{\mathbf{R}}=\mathbf{Z}/\hat{\phi}$ are not necessarily unity. However, the GEE methodology, as implemented in software packages, forces the diagonal elements of $\hat{\mathbf{R}}$ to unity (i.e., it changes the values from whatever they are to 1), and thus the matrix $\hat{\mathbf{R}}$ is not guaranteed to be positive definite. This can lead to (most harmlessly) convergence problems, but it can also lead to artificially deflated estimator variances for the regression parameters, and hence to improper or incorrect inference.
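
The effect of overwriting the diagonal can be seen in a small numerical sketch. It is a simplified stand-in for the moment estimator in (3.1), not a reproduction of any package's internals: the matrix `S` plays the role of an averaged residual cross-product matrix, the pooled scale is taken as the mean of its diagonal, and all numbers are hypothetical.

```r
# Hypothetical averaged residual cross-product matrix for t = 3 repeated
# measures: standard deviations 2, 2, sqrt(0.1) and common correlation 0.9.
sds <- c(2, 2, sqrt(0.1))
R_true <- matrix(0.9, 3, 3)
diag(R_true) <- 1
S <- diag(sds) %*% R_true %*% diag(sds)

# Pooled scale estimate: a single variance shared by all three measures
phi_hat <- mean(diag(S))

# Moment-style working correlation: divide by the pooled scale,
# then overwrite the diagonal with ones
R_forced <- S / phi_hat
diag(R_forced) <- 1

# For comparison: standardize each measure by its own variance
R_own <- cov2cor(S)

print(round(R_forced, 3))
print(eigen(R_forced, symmetric = TRUE)$values)
print(eigen(R_own, symmetric = TRUE)$values)
```

In this constructed case the variances are unequal, so dividing by one pooled scale pushes the first off-diagonal entry above one, and the forced-diagonal matrix acquires a negative eigenvalue. Standardizing by the individual variances recovers a valid correlation matrix.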

4. Quasi-Least Squares

The quasi-least squares (QLS) approach, on the other hand, provides an alternative estimate of $\mathbf{R}$ and does not experience the convergence problems exhibited by GEE. A further benefit of this method is that it does not require an assumption of normality for either the joint distribution of the responses or their marginal distributions. The initial step for estimation of $\mathbf{R}$ is to minimize $\operatorname{tr}(\mathbf{R}^{-1}\mathbf{Z})$ over the set of correlation matrices. Since the diagonal elements of $\mathbf{R}$ are restricted to be one, introducing a diagonal matrix of Lagrange multipliers $\Lambda$, we can verify that the point of minimum $\tilde{\mathbf{R}}$ factors the matrix $\mathbf{Z}$ as

$$\mathbf{Z}=\tilde{\mathbf{R}}\,\Lambda\,\tilde{\mathbf{R}}. \tag{4.1}$$

Whittle [8] has shown that for a positive definite matrix $\mathbf{Z}$ the factorization (4.1) is unique. Further, $\tilde{\mathbf{R}}=\Lambda^{-1/2}(\Lambda^{1/2}\mathbf{Z}\Lambda^{1/2})^{1/2}\Lambda^{-1/2}$, and the diagonal matrix $\Lambda$ satisfies the fixed-point equation $\Lambda=\operatorname{diag}\{(\Lambda^{1/2}\mathbf{Z}\Lambda^{1/2})^{1/2}\}$, which can be solved using a simple fixed-point iterative scheme (Olkin and Pratt [9], Chaganty [4]). Next, using the first-step correlation estimate $\tilde{\mathbf{R}}$, we can obtain a consistent correlation estimate as

$$\hat{\mathbf{R}}=\tilde{\mathbf{R}}\,\Delta\,\tilde{\mathbf{R}}, \tag{4.2}$$

where $\Delta=\operatorname{diag}(\boldsymbol{\nu})$, $\boldsymbol{\nu}=(\tilde{\mathbf{R}}\circ\tilde{\mathbf{R}})^{-1}\mathbf{e}$, $\mathbf{e}$ is a vector of ones, and $\circ$ denotes the Hadamard product. It is possible that the correlation matrix (4.2) is not positive definite, in which case Chaganty and Shults [5] have recommended the estimate

$$\hat{\mathbf{R}}=\operatorname{diag}(\mathbf{Z})^{-1/2}\,\mathbf{Z}\,\operatorname{diag}(\mathbf{Z})^{-1/2}. \tag{4.3}$$

See equation (3.2) in Chaganty and Shults [5]. The quasi-least squares method uses the estimate (4.2) of $\mathbf{R}$ if it is positive definite and otherwise uses (4.3), which is clearly a positive definite correlation matrix, to update the estimate of $\boldsymbol{\beta}$ until convergence. Code for fitting this model using the R statistical software is provided in the Appendix.
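
The two-step correlation estimate can be tried on a small matrix. The following is a minimal sketch of the fixed-point scheme for (4.1) followed by the adjustment (4.2), under stated assumptions: the helper names `sqrtm_sym` and `qls_factor`, the tolerance, the iteration cap, and the matrix `Z` are all hypothetical, and the fallback (4.3) is omitted for brevity.

```r
# Symmetric square root of a positive definite matrix via its eigendecomposition
sqrtm_sym <- function(A) {
  e <- eigen(A, symmetric = TRUE)
  e$vectors %*% diag(sqrt(e$values), nrow(A)) %*% t(e$vectors)
}

# Fixed-point iteration for Lambda = diag{(Lambda^(1/2) Z Lambda^(1/2))^(1/2)}
qls_factor <- function(Z, tol = 1e-14, max_iter = 500) {
  lam <- rep(1, nrow(Z))                  # start from Lambda = identity
  for (iter in seq_len(max_iter)) {
    D <- diag(sqrt(lam), nrow(Z))
    lam_new <- diag(sqrtm_sym(D %*% Z %*% D))
    converged <- sum((lam_new - lam)^2) < tol
    lam <- lam_new
    if (converged) break
  }
  D     <- diag(sqrt(lam), nrow(Z))
  D_inv <- diag(1 / sqrt(lam), nrow(Z))
  Rtilde <- D_inv %*% sqrtm_sym(D %*% Z %*% D) %*% D_inv
  list(Rtilde = Rtilde, Lambda = diag(lam, nrow(Z)))
}

# Hypothetical positive definite cross-product matrix for t = 3
Z <- matrix(c(4.0, 2.4, 0.8,
              2.4, 3.0, 0.6,
              0.8, 0.6, 1.0), 3, 3)

fit    <- qls_factor(Z)
Rtilde <- fit$Rtilde                                  # first-step estimate
nu     <- solve(Rtilde * Rtilde, rep(1, nrow(Z)))     # (Rtilde o Rtilde)^{-1} e
Rhat   <- Rtilde %*% diag(nu, nrow(Z)) %*% Rtilde     # second-step estimate (4.2)

print(round(Rtilde, 4))
print(round(Rhat, 4))
```

By construction the product `Rtilde %*% fit$Lambda %*% Rtilde` reproduces `Z`, and the diagonal of `Rhat` equals one because of how `nu` is defined. Whether `Rhat` is positive definite still has to be checked in practice, which is why the method keeps (4.3) as a fallback.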

5. Example

We now provide an example from Alderdice and Forrester [10], who modeled the effects of salinity and temperature on the proportion of hatched English sole eggs. In this study, the number of hatched eggs was recorded at seven salinity and five temperature levels. Measurements were taken in four separate tanks for each combination of salinity and temperature, and for each tank we have recordings of the number of fish eggs and the number hatched. Thus, the tanks represent the repeated measure component for this binomial data set. The data, as given on page 6 of Lindsey [11], is reproduced in Table 3.

Table 3

                   Tank 1         Tank 2         Tank 3         Tank 4
Temp.  Salinity    Hatch  Total   Hatch  Total   Hatch  Total   Hatch  Total
15     4           236    666     203    724     183    764     212    723
15     8           600    656     697    747     615    746     641    703
15     12          407    566     343    603     365    560     302    394
25     4           203    717     177    782     155    852     138    590
25     8           591    621     564    640     714    754     532    570
25     12          475    622     465    645     506    608     415    532
35     4           1      738     3      655     10     742     3      763
35     8           526    616     419    467     410    484     374    606
35     12          272    362     352    478     392    590     382    459
10     10          303    681     329    710     262    611     301    700
10     6           277    757     234    681     263    647     287    801
40     10          387    450     389    553     388    564     318    604
40     6           276    662     247    542     248    527     149    591
20     10          351    391     559    650     527    603     476    548
20     6           585    643     620    671     437    497     667    771
30     10          447    491     462    530     475    545     499    556
30     10          522    573     615    680     539    581     517    561
30     6           563    666     600    704     562    656     615    723

The goal of the analysis is to study the dependence of the proportion of eggs hatched on the temperature and salinity. After calculating $\hat{u}_{ij}=\operatorname{logit}(y_{ij}/n_{ij})$, where $y_{ij}$ is the number of eggs hatched out of the total $n_{ij}$ in the $j$th tank at the $i$th combination of temperature and salinity, the Shapiro-Wilk test for normality was performed on the transformed responses for each replicate. The results ($P$-values $<0.05$) indicate a departure from normality, so that maximum likelihood methods for continuous data are not applicable. The data were analyzed using GEE in several statistical software packages, with an unstructured working correlation matrix to account for the correlation between the four replications of the solution in the four tanks. The results using PROC GENMOD in SAS version 9.2, a GEE module in TIBCO Spotfire S+ version 8.2, and the xtgee procedure in STATA version 11 are shown in parts (i), (ii), and (iii) of Table 4. The warning message from PROC GENMOD read "WARNING: Iteration limit exceeded." Here we see that in each case the estimates failed to converge. The 0.999 correlation estimates in part (i) represent model breakdown, in that programmers often use this value to indicate nonconvergence.

Table 4

(i) GEE estimates using SAS GENMOD procedure
Parm.   Est.    SE      95% C.I.        Z      Pr > |Z|   Working correlation
                                                          1.000  0.999  0.999  0.999

(ii) GEE estimates using TIBCO Spotfire S+
Parm.   Est.    SE      95% C.I.        Z      Pr > |Z|   Working correlation
                                                          1.000  1.049  0.890  0.978
Temp.   0.379   0.141   0.103  0.656    2.69   0.007      1.000

(iii) GEE estimates using STATA 11
Parm.   Est.    SE      95% C.I.        Z      Pr > |Z|   Eigenvalues
                                                          3.664  0.259  0.114  −0.038

(iv) QLS estimates using R 2.14.1
Parm.   Est.    SE      95% C.I.        Z      Pr > |Z|   Working correlation
                                                          1.000  0.968  0.920  0.940

The warning message from TIBCO Spotfire S+ software read (sic) "Warning messages: 1: at convergence at the correlation estimate 1 is outside of the range [−1,1] in cgeefit (gee.model) 2: correlation matrix is not full rank, 2<4 in: cgeefit (gee.model)." Note that the correlation between the first and second tanks in part (ii) is greater than one, clearly violating the most liberal of correlation boundaries. The warning message from xtgee in STATA version 11 read "convergence not achieved." Also, the fourth eigenvalue in part (iii) is negative, indicating that the estimated correlation matrix is not positive definite. The results of the QLS analysis are given in part (iv) of Table 4, which show that the estimated correlation matrix is positive definite.
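
The logit transformation and normality check described in this section can be sketched in R for a single replicate. The counts are the Tank 1 columns of Table 3. The variable names are illustrative, and because the text does not say which software was used for the Shapiro-Wilk test, the use of `shapiro.test` here is an assumption.

```r
# Tank 1 counts copied from Table 3 (hatched eggs and total eggs)
hatch <- c(236, 600, 407, 203, 591, 475, 1, 526, 272,
           303, 277, 387, 276, 351, 585, 447, 522, 563)
total <- c(666, 656, 566, 717, 621, 622, 738, 616, 362,
           681, 757, 450, 662, 391, 643, 491, 573, 666)

p_hat <- hatch / total      # observed proportions y_ij / n_ij
u_hat <- qlogis(p_hat)      # logit transform: log(p / (1 - p))

# Shapiro-Wilk test on the logit-transformed Tank 1 responses
sw <- shapiro.test(u_hat)
print(round(u_hat, 3))
print(sw)
```

The same steps would be repeated for Tanks 2 to 4 to obtain one test per replicate.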

6. Discussion

The logit transformation was originally applied on mortality rates in univariate bioassays (Berkson, [12]), though the idea also generalizes nicely into the cases of correlated repeated-measure, longitudinal, or multivariate binomial data discussed here. Doing so allows the data analyst to bypass complicated, parametrically saturated mixture distributions and utilize methods for correlated continuous data. Interestingly, even after the logit transformation is applied, the GEE method still experiences convergence difficulties and problems with correlation parameter estimation. Potential causes for these problems are explained in Section 3. The QLS method, on the other hand, does not experience these difficulties and handles the simultaneous estimation of both regression and correlation parameters with relative ease. This was made possible by incorporating the little-known and rarely used matrix factorization given in (4.1).

Note that the probit transformation had an earlier origin and similar function to the logit transformation (Bliss, [13]) and can also be used in place of the logit transformation shown here. However, likelihood estimation of correlated binomial data using a latent multivariate distribution has already been established for the probit link function (Ashford and Sowden, [14]) and has been compared favorably to the GEE method when analyzed on real data (Sabo and Chaganty [15]).


Appendix

For more details see Algorithm 1, the R program listed below.

# R (ver 2.14.1) program to compute QLS estimates for the mixture model #

# Function to estimate the correlation matrix
# between the repeated measurements
correlation.est <- function(residuals, tol = 1e-10) {
  t <- ncol(residuals)
  Z <- t(residuals) %*% residuals
  # start the decomposition algorithm with an identity matrix
  Lambda0 <- diag(t)
  ev <- eigen(Z)
  # symmetric square root of Z; kept in M so it is defined even if
  # the loop below is never entered
  M <- ev$vectors %*% diag(sqrt(ev$values)) %*% t(ev$vectors)
  Lambdak <- diag(diag(M))
  Diff <- diag(Lambdak - Lambda0)
  # fixed-point iteration for the diagonal matrix Lambda in (4.1)
  while (sum(Diff^2) > tol) {
    Lambda0 <- Lambdak
    ev <- eigen(sqrt(Lambda0) %*% Z %*% sqrt(Lambda0))
    M <- ev$vectors %*% diag(sqrt(ev$values)) %*% t(ev$vectors)
    Lambdak <- diag(diag(M))
    Diff <- diag(Lambdak - Lambda0)
  }
  # first-step estimate, then the consistent estimate (4.2)
  Rtilde <- solve(sqrt(Lambdak)) %*% M %*% solve(sqrt(Lambdak))
  Rhat <- Rtilde %*% diag(as.vector(solve(Rtilde * Rtilde) %*% rep(1, t))) %*% Rtilde
  # fall back to (4.3) if (4.2) is not positive definite
  ev <- eigen(Rhat)
  if (any(ev$values < 0)) {
    Rhat <- solve(sqrt(diag(diag(Z)))) %*% Z %*% solve(sqrt(diag(diag(Z))))
  }
  return(Rhat)
}

# Function to calculate the regression parameter beta.
regression.est <- function(x, y, t, Rhat) {
  mt <- nrow(x)
  Sigma <- solve(kronecker(diag(mt/t), Rhat))
  XRinvX <- t(x) %*% Sigma %*% x
  XRinvY <- t(x) %*% Sigma %*% y
  betahat <- solve(XRinvX) %*% XRinvY
  return(betahat)
}

# The main program starts here
d <- read.table("c:/hatch-eggs.txt", header = TRUE)
proportion <- d$Hatch/d$Total
y <- log(proportion/(1 - proportion))
x <- model.matrix(~Salinity + Temperature, data = d)
tol <- 1e-10
t <- length(d$ID)/length(unique(d$ID))
mt <- nrow(x)
m <- mt/t
k <- ncol(x)
Rhatinit <- diag(t)
betahat <- regression.est(x, y, t, Rhatinit)
residuals <- matrix(y - x %*% betahat, ncol = t, byrow = TRUE)
Rhat <- correlation.est(residuals)
betanew <- regression.est(x, y, t, Rhat)
# Iterate until beta stabilizes. The loop header is missing from the
# listing; this stopping rule based on tol is an assumption.
while (sum((betanew - betahat)^2) > tol) {
  betahat <- betanew
  residuals <- matrix(y - x %*% betahat, ncol = t, byrow = TRUE)
  Rhat <- correlation.est(residuals)
  betanew <- regression.est(x, y, t, Rhat)
}

# Calculate the scale parameter
residuals <- matrix(y - x %*% betahat, ncol = t, byrow = TRUE)
Z <- t(residuals) %*% residuals
Rhat <- correlation.est(residuals)
scale <- sum(diag(solve(Rhat) %*% Z))/(mt - k)

# Calculate model based standard errors and z-scores for betas
Sigma <- solve(kronecker(diag(m), Rhat))
Covbeta <- scale * solve(t(x) %*% Sigma %*% x)
stderrbeta <- sqrt(diag(Covbeta))
zstat <- betanew/stderrbeta

# Prepare and print the output
output <- cbind(betanew, stderrbeta, zstat, 2 * (1 - pnorm(abs(zstat))))
colnames(output) <- c("Estimate", "Std. Error", "z value", "P-value")
list(scale = scale, Rhat = Rhat, beta = output)


References

  1. D. Gilliland, O. Schabenberger, and H. Liu, "Intercluster correlations for binomial data: an application to seed testing," Journal of Agricultural, Biological, and Environmental Statistics, vol. 7, no. 1, pp. 95-106, 2002.
  2. K. Y. Liang and S. L. Zeger, "Longitudinal data analysis using generalized linear models," Biometrika, vol. 73, no. 1, pp. 13-22, 1986.
  3. B. A. Coull and A. Agresti, "Random effects modeling of multiple binomial responses using the multivariate binomial logit-normal distribution," Biometrics, vol. 56, no. 1, pp. 73-80, 2000.
  4. N. R. Chaganty, "An alternative approach to the analysis of longitudinal data via generalized estimating equations," Journal of Statistical Planning and Inference, vol. 63, no. 1, pp. 39-54, 1997.
  5. N. R. Chaganty and J. Shults, "On eliminating the asymptotic bias in the quasi-least squares estimate of the correlation parameter," Journal of Statistical Planning and Inference, vol. 76, no. 1-2, pp. 145-161, 1999.
  6. R. T. Sabo and N. R. Chaganty, "Estimation methods for an autoregressive familial correlation structure," Communications in Statistics. Theory and Methods, vol. 39, no. 6, pp. 973-991, 2010.
  7. H. Joe, Multivariate Models and Dependence Concepts, vol. 73, Chapman & Hall, London, UK, 1997.
  8. P. Whittle, "A multivariate generalization of Tchebichev's inequality," The Quarterly Journal of Mathematics, vol. 9, pp. 232-240, 1958.
  9. I. Olkin and J. W. Pratt, "A multivariate Tchebycheff inequality," Annals of Mathematical Statistics, vol. 29, pp. 226-234, 1958.
  10. D. F. Alderdice and C. R. Forrester, "Some effects of salinity and temperature on early development and survival of the English sole (Parophrys vetulus)," Journal of the Fisheries Research Board of Canada, vol. 25, pp. 495-521, 1968.
  11. J. K. Lindsey, "Likelihood analyses and tests for binary data," Journal of Applied Statistics, vol. 24, no. 1, pp. 1-16, 1975.
  12. J. Berkson, "Application of the logistic function to bio-assay," Journal of the American Statistical Association, vol. 39, pp. 357-365, 1944.
  13. C. I. Bliss, "The calculation of the dosage mortality curve," Annals of Applied Biology, vol. 22, pp. 134-167, 1935.
  14. J. R. Ashford and R. R. Sowden, "Multi-variate probit analysis," Biometrics, vol. 26, no. 3, pp. 535-546, 1970.
  15. R. T. Sabo and N. R. Chaganty, "What can go wrong when ignoring correlation bounds in the use of generalized estimating equations," Statistics in Medicine, vol. 29, no. 24, pp. 2501-2507, 2010.

Copyright © 2012 N. Rao Chaganty et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
