Computational and Mathematical Methods in Medicine

Volume 2016 (2016), Article ID 7329158, 8 pages

http://dx.doi.org/10.1155/2016/7329158

## Sufficient Sample Size and Power in Multilevel Ordinal Logistic Regression Models

^{1}Department of Statistics, Islamia College University, Peshawar, Pakistan

^{2}Department of Statistics, Abdul Wali Khan University Mardan, Khyber Pakhtunkhwa, Pakistan

^{3}Department of Statistics, Shaheed Benazir Bhutto Women University, Peshawar, Pakistan

Received 13 June 2016; Revised 12 August 2016; Accepted 24 August 2016

Academic Editor: Zoran Bursac

Copyright © 2016 Sabz Ali et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Biomedical researchers frequently deal with ordinal outcome variables in multilevel models where patients are nested within doctors. A multilevel cumulative logit model is appropriate when the outcome variable records the intensity of a disease such as malaria or typhoid in ordered categories, for example mild, severe, and extremely severe. Under our simulation conditions, the Maximum Likelihood (ML) method performs better than the Penalized Quasilikelihood (PQL) method for a three-category ordinal outcome variable. The PQL method, however, performs as well as the ML method when a five-category ordinal outcome variable is used. Further, to achieve power greater than 0.80, at least 50 groups are required for both the ML and PQL methods of estimation. For the five-category ordinal response model, the power of the PQL method is slightly higher than that of the ML method.

#### 1. Introduction

Data collected from hospitals and educational institutions are mostly multilevel or hierarchical. Researchers frequently analyze such data with multilevel models, also known as hierarchical models or mixed effects models [1, 2]. Because observations within the same group are dependent, classical methods such as analysis of variance (ANOVA) and linear regression, which assume independence, cannot be applied. Hence, multilevel models are warranted for analyzing nested data structures.

Choosing an appropriate sample size for multilevel ordinal logistic models is challenging. In the contemporary literature, only [3] discusses the issue of sample size in the multilevel ordinal logistic model, using the PQL method of estimation and three-category outcomes. Apart from this, there is no existing research on sample size and power issues in multilevel ordinal logistic models. Unlike [3], the study of [4] compares both the PQL and ML methods in small group sizes, but it does not provide any results on power. In the present study, our aim is not only to compare the ML and PQL estimation methods in larger group sizes but also to provide guidelines on the sample size needed for multilevel ordinal logistic models.

#### 2. Materials and Methods

##### 2.1. Multilevel Logistic Regression Model

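A popular approach in the social sciences is to develop a dichotomous multilevel logistic model through a latent continuous variable model [5]. The same idea can be extended to three or more ordered categories through threshold parameters: a latent continuous variable $Y_{ij}^{*}$ is assumed to underlie the observed ordinal variable $Y_{ij}$ for individual $i$ in group $j$. A simple two-level ordinal logistic model can be written as

$$
\begin{aligned}
Y_{ij}^{*} &= \beta_{0j} + \beta_{1j} X_{ij} + \varepsilon_{ij}, \\
\beta_{0j} &= \gamma_{00} + \gamma_{01} Z_{j} + u_{0j}, \\
\beta_{1j} &= \gamma_{10} + \gamma_{11} Z_{j} + u_{1j},
\end{aligned}
\tag{1}
$$

where $X_{ij}$ is the level 1 explanatory variable, $Z_{j}$ is the level 2 explanatory variable, the level 1 coefficients are denoted by $\beta_{0j}$ and $\beta_{1j}$, and the $\gamma$'s are the fixed effects. If $\varepsilon_{ij}$ is assumed to follow a standard normal distribution, that is, $\varepsilon_{ij} \sim N(0, 1)$, the resulting model is termed the multilevel probit model. Similarly, if $\varepsilon_{ij} \sim \text{logistic}(0, \pi^{2}/3)$, the model is said to be the multilevel logistic model [6]. The level 2 random effects are most often assumed to be normally distributed:

$$
\begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix}
\sim N\!\left(
\begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} \sigma_{u0}^{2} & \sigma_{u01} \\ \sigma_{u01} & \sigma_{u1}^{2} \end{pmatrix}
\right).
\tag{2}
$$

According to [7], the intraclass correlation coefficient (ICC) is the proportion of the group level variance in the total variance,

$$
\rho = \frac{\sigma_{u0}^{2}}{\sigma_{u0}^{2} + \sigma_{e}^{2}},
\tag{3}
$$

where $\sigma_{u0}^{2}$ is the group level (level 2) variance and $\sigma_{e}^{2}$ is the individual level (level 1) variance, so the total variance = (level 2 variance) + (level 1 variance) = $\sigma_{u0}^{2} + \sigma_{e}^{2}$. This is why the squared sigma appears in both the numerator and the denominator. In the multilevel logistic model, the level 1 variance is that of the standard logistic distribution, $\sigma_{e}^{2} = \pi^{2}/3$.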
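The ICC definition can be turned into a small calculation. The following is a minimal sketch, assuming the level 1 variance of the logistic model is $\pi^{2}/3$ (the variance of the standard logistic distribution). The function names and the ICC values in the loop are illustrative assumptions and are not taken from the study.

```python
import math

# Level 1 variance of the standard logistic distribution (assumed here).
LOGISTIC_VARIANCE = math.pi ** 2 / 3


def icc_from_variance(intercept_variance: float) -> float:
    """ICC = level 2 variance / (level 2 variance + level 1 variance)."""
    if intercept_variance < 0.0:
        raise ValueError("intercept_variance must be non-negative")
    return intercept_variance / (intercept_variance + LOGISTIC_VARIANCE)


def variance_from_icc(icc: float) -> float:
    """Invert the ICC formula to obtain the level 2 (intercept) variance."""
    if not 0.0 <= icc < 1.0:
        raise ValueError("icc must lie in [0, 1)")
    return icc * LOGISTIC_VARIANCE / (1.0 - icc)


if __name__ == "__main__":
    # Hypothetical ICC values, chosen only to illustrate the mapping.
    for icc in (0.1, 0.3, 0.5):
        variance = variance_from_icc(icc)
        print(f"ICC = {icc:.1f} -> intercept variance = {variance:.2f}")
```

Inverting the formula is convenient in a simulation design, where the ICC is usually chosen first and the intercept variance is derived from it.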

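Now the latent variable $Y_{ij}^{*}$ can be linked with the observed variable $Y_{ij}$ through a threshold model. The threshold model for $C$ categories, $c = 1, 2, \ldots, C$, can be written as

$$
Y_{ij} =
\begin{cases}
1 & \text{if } Y_{ij}^{*} \le \tau_{1}, \\
2 & \text{if } \tau_{1} < Y_{ij}^{*} \le \tau_{2}, \\
\;\vdots & \\
C & \text{if } Y_{ij}^{*} > \tau_{C-1},
\end{cases}
\tag{4}
$$

where $\tau_{1} < \tau_{2} < \cdots < \tau_{C-1}$ are the threshold parameters.

For identification purposes, it is common to set the first threshold to zero and to allow an intercept in the model [8]. We will assume a proportional odds model, which means that the effect of the explanatory variables remains the same across categories [9–12]. The equivalent of the cumulative logit model above in the generalized linear models context is

$$
\operatorname{logit}\left[P(Y_{ij} \le c)\right]
= \log\frac{P(Y_{ij} \le c)}{1 - P(Y_{ij} \le c)}
= \tau_{c} - \left(\beta_{0j} + \beta_{1j} X_{ij}\right),
\quad c = 1, \ldots, C - 1,
\tag{5}
$$

where $\beta_{0j} + \beta_{1j} X_{ij}$ is the linear predictor of the latent variable model, with group-specific intercept $\beta_{0j}$, group-specific slope $\beta_{1j}$, and level 1 explanatory variable $X_{ij}$.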
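To make the threshold and proportional odds ideas concrete, the sketch below computes the category probabilities implied by a cumulative logit model. It assumes the parameterization in which the cumulative probability of category $c$ or lower is the logistic CDF evaluated at the threshold minus the linear predictor. The function names and the example thresholds are hypothetical.

```python
import math
from typing import List


def logistic_cdf(x: float) -> float:
    """Standard logistic CDF, the inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-x))


def category_probabilities(eta: float, thresholds: List[float]) -> List[float]:
    """Return P(Y = c) for c = 1..C given the linear predictor eta.

    thresholds holds the C - 1 strictly increasing cut points, and the
    cumulative probability is P(Y <= c) = logistic_cdf(threshold_c - eta).
    """
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Cumulative probabilities for categories 1..C (the last one is 1).
    cumulative = [logistic_cdf(t - eta) for t in thresholds] + [1.0]
    probabilities = []
    previous = 0.0
    for value in cumulative:
        # Each category probability is a difference of adjacent cumulatives.
        probabilities.append(value - previous)
        previous = value
    return probabilities


if __name__ == "__main__":
    # Hypothetical thresholds for a three-category outcome.
    for eta in (0.0, 1.0):
        print(eta, [round(p, 3) for p in category_probabilities(eta, [0.0, 1.5])])
```

Because the same linear predictor is subtracted from every threshold, a larger predictor shifts probability toward the higher categories by the same amount on the logit scale at each cut point, which is the proportional odds property.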

##### 2.2. Simulation Design

The fixed effect parameters were taken from the previous study [4]: , , , and . The overall intercept is set to zero because a multilevel ordinal logistic model cannot identify the overall model intercept and all threshold parameters jointly. There are two ways to resolve this: a researcher can estimate the intercept by setting the first threshold parameter to zero, or set the intercept to zero and estimate all the threshold parameters. We follow the second option. There are three conditions for the number of groups , three group sizes , and three values of ICC . These values of ICC correspond to intercept variances of , and 3.4. Similarly, random slope values were taken as , and 0.62, while , and 0.30. For each scenario, the number of simulation replications $R$ was set to 1000. We used the GLIMMIX procedure in SAS, with ML estimation by adaptive quadrature and with PQL estimation, for data generation and analyses.
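The data generating step of such a simulation can be sketched as follows. This is an illustration in Python, not the SAS code used in the study. The function name, the standard normal covariates, and every numeric value in the usage example are assumptions made for the sketch.

```python
import math
import random
from typing import List, Tuple


def simulate_two_level_ordinal(
    n_groups: int,
    group_size: int,
    gammas: Tuple[float, float, float, float],
    thresholds: List[float],
    var_u0: float,
    var_u1: float,
    cov_u01: float,
    seed: int = 0,
) -> List[Tuple[int, float, float, int]]:
    """Return rows (group, x, z, y) with y in 1..len(thresholds) + 1."""
    if var_u0 <= 0.0:
        raise ValueError("var_u0 must be positive")
    rng = random.Random(seed)
    g00, g01, g10, g11 = gammas
    sd_u0 = math.sqrt(var_u0)
    # Cholesky factor of the 2x2 random effects covariance matrix.
    load_u1 = cov_u01 / sd_u0
    resid_var = var_u1 - load_u1 ** 2
    if resid_var < 0.0:
        raise ValueError("covariance matrix is not positive semidefinite")
    resid_sd = math.sqrt(resid_var)

    rows = []
    for j in range(n_groups):
        z = rng.gauss(0.0, 1.0)  # level 2 covariate (assumed standard normal)
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        u0 = sd_u0 * e1                      # random intercept
        u1 = load_u1 * e1 + resid_sd * e2    # correlated random slope
        b0 = g00 + g01 * z + u0
        b1 = g10 + g11 * z + u1
        for _ in range(group_size):
            x = rng.gauss(0.0, 1.0)  # level 1 covariate (assumed standard normal)
            p = rng.random()
            while p == 0.0:          # avoid log(0) in the inverse CDF
                p = rng.random()
            eps = math.log(p / (1.0 - p))    # standard logistic error
            latent = b0 + b1 * x + eps
            # Category = 1 + number of thresholds lying below the latent value.
            y = 1 + sum(latent > t for t in thresholds)
            rows.append((j, x, z, y))
    return rows


if __name__ == "__main__":
    # Hypothetical settings for illustration only.
    data = simulate_two_level_ordinal(
        n_groups=30, group_size=5, gammas=(0.0, 0.5, 0.5, 0.5),
        thresholds=[0.0, 1.5], var_u0=1.0, var_u1=0.5, cov_u01=0.2, seed=1,
    )
    print(len(data), data[0])
```

Repeating this step for each combination of number of groups, group size, and ICC, and fitting the model to every replicate, gives the full simulation design.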

##### 2.3. Procedure for the Parameter Estimations

The accuracy of the fixed effect and random effect parameter estimates was assessed through the relative parameter bias, that is,

$$
\text{Relative bias} = \frac{\text{Estimate} - \text{Parameter}}{\text{Parameter}},
\tag{6}
$$

where the estimate is the value produced by the ML or PQL method of estimation and the parameter value is the one set in the simulation design. The average relative biases in Tables 1 and 2 are obtained by using (6).
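As a minimal sketch of this calculation, the relative bias can be computed per replication and then averaged. The function names and the estimates in the usage example are hypothetical and are not results from the study.

```python
from statistics import mean
from typing import Sequence


def relative_bias(estimate: float, parameter: float) -> float:
    """Relative bias = (estimate - parameter) / parameter."""
    if parameter == 0.0:
        raise ValueError("relative bias is undefined for a zero parameter")
    return (estimate - parameter) / parameter


def average_relative_bias(estimates: Sequence[float], parameter: float) -> float:
    """Average the relative bias over the simulation replications."""
    return mean(relative_bias(e, parameter) for e in estimates)


if __name__ == "__main__":
    # Hypothetical estimates from four replications of one scenario.
    estimates = [0.52, 0.47, 0.55, 0.49]
    print(average_relative_bias(estimates, parameter=0.5))
```

Note that the relative bias is undefined for a parameter whose true value is zero, so such a parameter needs a different accuracy measure, for example the absolute bias.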