Sufficient Sample Size and Power in Multilevel Ordinal Logistic Regression Models

Ali, Sabz; Ali, Amjad; Khan, Sajjad Ahmad; Hussain, Sundas

doi:https://doi.org/10.1155/2016/7329158

Computational and Mathematical Methods in Medicine

On this page

Abstract Introduction Materials and Methods Results and Discussion Conclusion References Copyright Related Articles

Research Article | Open Access

Volume 2016 | Article ID 7329158 | https://doi.org/10.1155/2016/7329158

Sufficient Sample Size and Power in Multilevel Ordinal Logistic Regression Models

Sabz Ali,¹Amjad Ali,¹Sajjad Ahmad Khan,²and Sundas Hussain³

Academic Editor: Zoran Bursac

Received13 Jun 2016

Revised12 Aug 2016

Accepted24 Aug 2016

Published22 Sept 2016

Abstract

For most of the time, biomedical researchers have been dealing with ordinal outcome variable in multilevel models where patients are nested in doctors. We can justifiably apply multilevel cumulative logit model, where the outcome variable represents the mild, severe, and extremely severe intensity of diseases like malaria and typhoid in the form of ordered categories. Based on our simulation conditions, Maximum Likelihood (ML) method is better than Penalized Quasilikelihood (PQL) method in three-category ordinal outcome variable. PQL method, however, performs equally well as ML method where five-category ordinal outcome variable is used. Further, to achieve power more than 0.80, at least 50 groups are required for both ML and PQL methods of estimation. It may be pointed out that, for five-category ordinal response variable model, the power of PQL method is slightly higher than the power of ML method.

1. Introduction

Data collected from hospitals and educational institutions are mostly multilevel or hierarchical data. This type of data is frequently used by researchers to construct statistical models such as multilevel models, hierarchical models, or mixed effects models [1, 2]. As the observations in these nested data structures become dependent, the classical methods and models like analysis of variance (ANOVA) and linear regression cannot be applied because these models assume independence. Hence, the use of alternative multilevel models is warranted to analyze the nested data structure.

It is really challenging to decide about an appropriate sample size for multilevel ordinal logistic models. In the contemporary literature, only [3] discusses the issue of sample size in multilevel ordinal logistic model by using PQL method of estimation. The researcher uses three-category multilevel ordinal logistic models. Apart from this, there is no existing research on sample size and power issues in multilevel ordinal logistic models. Unlike [3], the study of [4] compares both PQL and ML methods in small group sizes. However, the study of [4] does not provide any results about power analysis. In the present study, the focus of researchers is not only to compare ML and PQL estimation methods of estimation in larger group sizes but also to provide guidelines about optimum sample size needed for multilevel ordinal logistic models.

2. Materials and Methods

2.1. Multilevel Logistic Regression Model

A very popular concept is used in social sciences to develop a dichotomous multilevel logistic model through a latent continuous variable model [5]. The same idea can be extended to three or more ordered categories through a threshold parameters. A threshold concept is used that the latent continuous variable underlies the observed variable . A simple two level ordinal logistic model can be written as where corresponds to level 1 explanatory variable, represents level 2 explanatory variable, and level 1 coefficients denoted by and ’s are the fixed effects. If it is assumed that follows a normal distribution, that is, , then the resulting multilevel model is termed as multilevel probit model. Similarly, if , then the model is said to be multilevel logistic model [6]. Level 2 random effects are more often assumed to have a normal distribution asAccording to [7], ICC is the proportion of group level variance compared to the total variance, represented by is the group level or level 2 variance and is the individual level or level 1 variance so the total variance = (level 2 variance) + (level 1 variance) = .

That is why the squared sigma appeared in both numerator and denominator.

Now can be linked with the observed variable through a threshold model. The threshold model for categories, , can be written as where are the threshold parameters.

For identification purpose, it is common to set the first threshold to zero and to allow an intercept in the model [8]. We will assume a proportional odds model which means that the effect of explanatory variables will remain the same across categories [9–12]. The equivalent to the cumulative logit model above in generalized linear models context is

2.2. Simulation Design

The fixed effect parameters were set from the previous study by [4]. , , , and . The reason behind is that multilevel ordinal logistic model cannot identify the overall model intercept and all threshold parameters jointly. There are two ways: a researcher can estimate the intercept by setting the first threshold parameter to zero or equate the intercept to zero and estimate the threshold parameters. That is why we put . There are three groups conditions , three group sizes , and three values of ICC . These values of ICC correspond to an intercept variance , and 3.4. Similarly, random slope values were taken as , and 0.62, while , and 0.30. For each scenario, the number of simulation R was set to be 1000. We used GLIMMIX (ML) with adaptive quadrature procedure and GLLIMIX (PQL) procedures in SAS for data generation and analyses.

2.3. Procedure for the Parameter Estimations

The accuracy of different fixed effect and random effect parameters estimates was calculated through the relative parameter bias, that is,while estimate is the value produced by ML or PQL method of estimations and parameter values are those taken in the simulation design. Average relative biases for Tables 1 and 2 are obtained by using (6).

Similarly, power was computed asEmpirical powers were computed for , , and by using (7). The power was calculated as the number of replications in which , was correctly rejected at 5 percent level of significance divided by 1000 as we used 1000 replications for each condition. Power values between 0.80 and 1.0 were considered excellent and below 0.80 as inadequate [13]. For brevity, the threshold estimates are excluded as researchers are not interested in their values.

Empirical coverage rates of 95% confidence intervals were used to judge the accuracy of the standard errors of estimated parameters [14]. Tables 4–11 were obtained by using The 95% confidence intervals coverage rates were computed in each condition as the proportion of replications in which the true parameter is captured by the 95% confidence interval. Reference [15] recommended acceptable coverage rates as 92.5% to 97.5%. A separate logistic regression was used to assess the impact of simulation conditions on empirical coverage rates of estimates. values in Tables 4 to 11 are obtained by

3. Results and Discussion

Tables 1 and 2 are estimated for the purpose of checking the accuracy of both fixed and random effects estimates through relative bias. Absolute relative bias <5% is negligible under both ML and PQL method. Estimates that have absolute relative bias close to zero are considered as unbiased estimates; that is, estimates and parameters become identical.

Tables 4 to 11 are computed to check the accuracy of estimates standard errors through 95% confidence interval. Those estimates which achieve acceptable coverage rates (92.5% to 97.5%) are considered best. Similarly, Table 3 is computed to get power rates under both ML and PQL methods.

Table 1 represents the impact of various simulation conditions on average relative bias of the fixed effect estimates under two estimation methods ML and PQL. Estimates were substantially biased when the number of groups was 30, group size was 5, and under ML method. The estimates relative bias was less than 5% when the number of groups was 50 under ML method. With 100 groups, estimates were unbiased under ML method. Estimates relative bias was negligible when the number of groups was 100 under ML method. Relative bias of estimates was highly influenced by the number of groups under ML method. On the contrary, estimates were underestimated under PQL method. Substantial bias of estimates was noted when group size was 5 and under PQL method. Estimates relative bias was less than 5% when group size was 50 under PQL method. On average, ML method fixed effects estimates average relative bias was smaller than that of PQL method in absolute terms. Moreover, group size impact was larger on estimates relative bias under PQL method.

Similarly, ML method random effects estimates were substantially biased when the number of groups was 30 as shown in Table 3. With 100 groups, random effects estimates bias was less than 5% consistently. ML method performed well when the number of groups was large. On the other hand, random effects estimates were substantially biased under PQL method when group size was 5 and . Like fixed effects estimates, ML method random effects bias was smaller than that of PQL method random effects estimates bias in absolute terms.

Table 3 reflects the power pattern of estimates under both ML and PQL methods in five-category ordinal outcome variable model. Further, to achieve more than 0.80 power rates for estimates, at least 50 groups are mandatory for both estimation methods. However, the power of PQL method estimates was slightly higher than that of ML method estimates in five-category ordinal outcome variable model.

Table 4 lists empirical coverage rates for estimates of the multilevel ordinal logistic model under ML method when response variable had three categories and distribution type was symmetrical. Under the three group conditions, the fixed effect estimates achieve the acceptable coverage rates (92.5 to 97.5) defined by [13]. However, random effect estimates coverage rates were smaller than the nominal coverage rates. The effect of the number of groups was significant on estimates coverage rates. However, group size and ICC effect was minimal on estimates coverage rates. Table 4 results suggest that a large number of groups should be used rather than larger group size under ML method.

Table 5 lists empirical coverage rates for estimates of the multilevel ordinal logistic model under ML method when response variable had three categories and distribution type was skewed. Again, group size and ICC effect was minimal on estimates empirical coverage rates. The impact of the number of groups was again significant and dominant on estimates empirical coverage rates. Moreover, to achieve unbiased standard errors of estimates, large number of groups will be better than the larger group size under ML method.

Table 6 lists empirical coverage rates for estimates of the multilevel ordinal logistic model under ML method when response variable had five categories and distribution type was symmetrical. Like three-category response model, the influence of the number of groups was significant on estimates coverage rates under ML method. More groups should be used to achieve the desired results under ML method.

Table 7 lists empirical coverage rates for estimates of the multilevel ordinal logistic model under ML method when response variable had five categories and distribution type was skewed. Like five-category response model, the influence of the number of groups was again significant on estimates coverage rates under ML method. On the other hand, group size and ICC effect was insignificant on estimates empirical coverage rates under ML method.

Table 8 lists empirical coverage rates for estimates of the multilevel ordinal logistic model under PQL method when response variable had three categories and distribution type was symmetrical. With group size 5, fixed effects estimates coverage rates were unacceptable. However, with group size of 30 and 50, respectively, fixed estimates coverage rates were acceptable. Furthermore, random effects estimates coverage rates were unacceptable across all conditions. Additionally, ICC had also a significant effect on estimates coverage rates; that is, coverage rates decreased with the larger ICC values under PQL method. On the other hand, groups effect was little under PQL method on estimates coverage rates.

Table 9 lists empirical coverage rates for estimates of the multilevel ordinal logistic model under PQL method when response variable had three categories and distribution type was skewed. Group size factor had a significant effect on estimates coverage rates. However, coverage rates consistently decreased with the increase in ICC values under PQL method. Fixed effects achieved acceptable coverage rates across the last two levels of group size factor, that is, 30 and 50, respectively. Groups had a significant effect on coverage rates. Larger group sizes seem to be good under PQL method.

Table 10 lists empirical coverage rates for estimates of the multilevel ordinal logistic model under PQL method when response variable had five categories and distribution type was symmetrical. Fixed effects achieved acceptable coverage rates across all group sizes, definitely reflecting improved performance of PQL method over three-category response model. The performance of PQL method was as good as ML method in five-category response model. However, estimates coverage rates decreased with the larger values of ICC. Again, the effect of the number of groups was insignificant on coverage rates.

Table 11 lists empirical coverage rates for estimates of the multilevel ordinal logistic model under PQL method when response variable had five categories and distribution type was skewed. Fixed effect coverage rates were all acceptable across all group sizes. However, random effects coverage was unacceptable and smaller than that of ML coverage rates. Group size again showed a significant effects on all estimates coverage rates while groups effect was insignificant.

4. Conclusion

In both symmetrical and skewed distribution shapes of category responses, , , and were overestimated in majority of the conditions under ML method of estimation when three-category and five-category multilevel ordinal logistic models were used. Moreover, random effects estimates, that is, random intercept and random slope estimates, were more biased than fixed effects estimates on average. The random effects estimates were not substantially biased under ML method of estimation, especially when the number of groups was large and random effects were not small. On the contrary, , , and were underestimated almost across all conditions under PQL method of estimation when three-category and five-category multilevel ordinal logistic models were used. Like ML method of estimation, random effects estimates relative biases were higher on average than those of the fixed effects estimates under PQL method. Both fixed effects and random effects estimates were not substantially biased under PQL method of estimation, particularly when population random effects were small (, , and ) and group sizes were large. In general, the absolute relative bias of ML method estimates was consistently smaller than that of PQL method estimates.

The accuracy of standard errors of the estimates (excluding threshold estimates) was judged through empirical coverage rates. The influence of the number of groups was significant on the accuracy of both three-category and five-category multilevel ordinal logistic models estimates standard errors under ML method of estimation. On the contrary, the group size factor and ICC had an insignificant effect on estimates standard errors under ML method of estimation. Estimates standard errors were least biased when number of groups was 100. On the other hand, the influence of group size factor was highly significant on the accuracy of estimates standard errors under PQL method of estimation. Furthermore, ICC also influenced estimates standard errors; that is, standard errors were substantially biased when population random effects were medium (, , and ) and large (, , and ). The impact of the number of groups on estimates standard errors in majority of the conditions was minimal under PQL method. However, five-category multilevel ordinal logistic model estimates standard errors were as good as that of ML method.

The power rates of PQL estimates were slightly higher than that of ML estimates when ordinal response variable had five categories, which indicate that PQL standard errors were least biased due to increases in the number of categories of ordinal response variable.

In general, ML method performed well in terms of estimates small bias, high coverage rates, and high power rates when ordinal response variable had three categories. However, in five-category ordinal response variable model, PQL method performances were comparable to those of ML method. PQL estimates were poor when population random effects (ICC) were medium and large while ML estimates were poor in small population random effects. In addition, ML estimates and estimates standard errors benefitted from larger number of groups while PQL estimates and estimates standard errors benefitted from larger group sizes. We recommend at least 100 groups and 30 units per group to achieve accurate multilevel ordinal logistic model estimates and estimates standard errors when method of estimation is ML. Furthermore, 100 groups and at least 50 units per group are essential for accurate multilevel ordinal logistic model estimates and estimates standard errors when method of estimation is PQL. Similarly, at least 50 groups are essential to achieve 0.80 or more power for both ML and PQL methods of estimation. On the basis of this study, it is recommended that PQL method may be avoided when group sizes are small, number of groups are large, random effects are medium and large, and outcome variable has three categories. In such conditions, ML method is the best option. However, when the outcome variable has five or more categories, random effects are small, group sizes are large, and number of groups is small, PQL method may be better option.

Competing Interests

The authors declare that there is no conflict of interests regarding publication of this paper.

References

S. W. Raudenbush and A. S. Bryk, Hierarchical Linear Models: Applications and Data Analysis Methods, vol. 1, Sage, Thousand Oaks, Calif, USA, 2nd edition, 2002.
H. Goldstein, Multilevel Statistical Models, Edward Arnold, London, UK, 2nd edition, 1995.
A. M. Timberlake, Sample size in ordinal logistic hierarchical linear modeling [Ph.D. thesis], Department of Educational Policy Studies, Georgia State University, 2011.
D. J. Bauer and S. K. Sterba, “Fitting multilevel models with ordinal outcomes: performance of alternative specifications and methods of estimation,” Psychological Methods, vol. 16, no. 4, pp. 373–390, 2011.
View at: Publisher Site | Google Scholar
J. S. Long, Regression Models for Categorical and Limited Dependent Variables, Sage Publications, Thousand Oaks, Calif, USA, 1997.
A. Agresti, Categorical Data Analysis, Wiley, New York, NY, USA, 1990.
View at: MathSciNet
J. J. Hox, Multilevel Analysis: Techniques and Applications, Lawrence Erlbaum Associates, Mahwah, NJ, USA, 2002.
J. J. Hox, Applied Multilevel Analysis, TT-Publikaties, Amsterdam, The Netherlands, 1995.
P. McCullagh and J. A. Nelder, Generalized Linear Models, Chapman & Hall, London, UK, 2nd edition, 1989.
View at: Publisher Site | MathSciNet
D. Hedeker and R. D. Gibbons, “A random-effects ordinal regression model for multilevel analysis,” Biometrics, vol. 50, no. 4, pp. 933–944, 1994.
View at: Publisher Site | Google Scholar
A. Agresti and J. B. Lang, “A proportional odds model with subject-specific effects for repeated ordered categorical responses,” Biometrika, vol. 80, no. 3, pp. 527–534, 1993.
View at: Publisher Site | Google Scholar | MathSciNet
P. McCullagh, “Regression models for ordinal data,” Journal of the Royal Statistical Society. Series B. Methodological, vol. 42, no. 2, pp. 109–142, 1980.
View at: Google Scholar | MathSciNet
J. Cohen, “Quantitative methods in psychology: a power primer,” Psychological Bulletin, vol. 112, no. 1, pp. 155–159, 1992.
View at: Publisher Site | Google Scholar
W. Chang, Sufficient sample sizes for the multivariate multilevel regression model [Ph.D. dissertation], Faculty of the Graduate School, The University of Texas at Austin, 2015.
J. V. Bradley, “Robustness,” British Journal of Mathematical and Statistical Psychology, vol. 16, no. 2, pp. 144–152, 1978.
View at: Google Scholar

Copyright

Copyright © 2016 Sabz Ali et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

PDF Download Citation

Download other formats

Order printed copies

Views

8454

Downloads

2185

Citations