Anemia

Volume 2018, Article ID 3087354, 13 pages

https://doi.org/10.1155/2018/3087354

## Multilevel Analysis of Determinants of Anemia Prevalence among Children Aged 6–59 Months in Ethiopia: Classical and Bayesian Approaches

^{1}Department of Statistics, Madda Walabu University, Robe, Ethiopia

^{2}School of Mathematical and Statistical Sciences, Hawassa University, Hawassa, Ethiopia

^{3}Department of Statistics, Dilla University, Dilla, Ethiopia

Correspondence should be addressed to Zeytu G. Asfaw; zeytugashaw@yahoo.com

Received 21 November 2017; Revised 12 April 2018; Accepted 22 April 2018; Published 3 June 2018

Academic Editor: Duran Canatan

Copyright © 2018 Kemal N. Kawo et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

*Background*. Anemia is a widespread public health problem that affects individuals at all levels, but its distribution varies considerably across regions. *Objective*. This study aimed to assess and model the determinants of the prevalence of anemia among children aged 6–59 months in Ethiopia. *Data*. Cross-sectional data from the Ethiopian Demographic and Health Survey were used for the analysis. The survey was implemented by the Central Statistical Agency from 27 December 2010 through June 2011 using a multistage sampling technique. *Method*. Statistical models suited to hierarchical data, namely, the variance components model, the random intercept model, and the random coefficients model, were used to analyze the data. Likelihood and Bayesian approaches were used to estimate both fixed effects and random effects in the multilevel analysis. *Result*. The prevalence of anemia among children aged 6–59 months in the country was around 42.8%. Multilevel binary logistic regression analysis was performed to investigate the predictors of anemia prevalence among these children and its variation across regions. The number of children under five in the household, wealth index, age of child, mother’s current working status, education level, receipt of iron pills, size of child at birth, and source of drinking water were found to have a significant effect on the prevalence of anemia. The variances of the random terms were statistically significant, implying that the prevalence of anemia varies across regions. From the methodological aspect, the random intercept model fitted the data better than the other two models. Bayesian analysis gave estimates consistent with those of the respective multilevel models and, in addition, provided the posterior distributions of the parameters. *Conclusion*. The current study confirmed that anemia among children aged 6–59 months in Ethiopia was a severe public health problem, with 42.8% of them anemic. Stakeholders should therefore pay attention to all significant factors identified in this study; household wealth (income) and the availability of clean drinking water are the most influential factors and should be prioritized.

#### 1. Introduction

Anemia is a condition characterized by a low level of hemoglobin in the blood [1]. Anemia is a widespread public health problem, and severe anemia is a significant cause of childhood mortality [2]. The World Health Organization (WHO) considers anemia prevalence over 40% as a major public health problem, between 20 and 40% as a medium-level public health problem, and between 5% and 20% as a mild public health problem [1]. High prevalence of anemia and its consequences on children’s health, especially for their growth and development, have made anemia an important public health problem, given the difficulty in implementing effective measures for controlling it [3]. Therefore, it is important to understand the scope and strength of individual risk factors for anemia in populations where anemia is common to design more effective interventions [3].

According to [4], more than 1.62 billion people were anemic worldwide, and approximately two-thirds of preschool children in Africa and South East Asia were anemic. According to the WHO report, more than half of the world’s preschool-age children (56.3%) reside in countries where anemia is a major public health problem [5]. In Sub-Saharan Africa, the national prevalence in most countries is estimated to be above 40% among this group [6]. In Ethiopia, more than four out of ten children under five (44%) were anemic [7]: about 21% of children were mildly anemic, 20% were moderately anemic, and 3% were severely anemic. Although the national anemia prevalence estimate dropped by 19 percent, from 54 percent in 2005 to 44 percent in 2011, anemia remained a major public health problem according to the WHO criteria.

Comparisons of multilevel models using classical and Bayesian approaches are rare in the literature. However, [8] examined the distribution of weighted anemia prevalence across different groups and performed logistic regression to assess the association of anemia with different factors, based on BDHS (2011) data on hemoglobin (Hb) concentration among children aged 6–59 months. In addition, [9] studied the determinants of anemia among children aged 6–59 months living in Kilte Awulaelo Woreda, Northern Ethiopia, using bivariate and multivariate logistic regression analyses to identify factors related to anemia. In this paper we consider a multilevel analysis of the determinants of anemia prevalence among children aged 6–59 months in Ethiopia using classical and Bayesian approaches.

The main aims of the authors were to identify the major determinants and assess the prevalence of anemia among children aged 6–59 months in Ethiopia using classical and Bayesian approaches, that is,

(i) to identify significant predictors of a high prevalence of anemia among children aged 6–59 months in Ethiopia through classical and Bayesian approaches,

(ii) to analyze the within- and between-region variation in the prevalence of anemia among children aged 6–59 months in Ethiopia,

(iii) to compare models and suggest an appropriate model for analyzing anemia prevalence in Ethiopia.

#### 2. Materials and Methods

##### 2.1. Data

This study used the data collected in the Ethiopian Demographic and Health Survey (EDHS) [7]. The survey was conducted by the Central Statistical Agency (CSA) under the auspices of the Ministry of Health from 27 December 2010 through June 2011. The sampling frame used for the EDHS was the Population and Housing Census conducted by the CSA in 2007; during that census, each of the kebeles was subdivided into convenient areas called census enumeration areas (EAs). The EDHS sample was selected using a stratified, two-stage cluster design, with EAs as the sampling units for the first stage. For the 2011 EDHS, a nationally representative sample of approximately 17,817 households from 624 clusters was selected. In the first stage, 624 clusters (187 urban and 437 rural) were selected from the list of enumeration areas in the sampling frame. In the second stage, a complete listing of households was carried out in each selected cluster. Of the 9,157 children with hemoglobin (Hb) data, 5,507 were included in the analysis of this study after all incomplete observations had been deleted.

##### 2.2. Ethical Approval

Our study was wholly based on an analysis of existing public domain health survey data sets obtained from EDHS 2011, which is freely available online with all identifier information removed. The EDHS 2011 was reviewed and approved by the ICF Macro Institutional Review Board and the National Research Ethics Committee of the Ethiopian Medical Research Council.

##### 2.3. Variables in the Study

The variables considered in this study were selected based on previous studies conducted at the global level. Potential determinants expected to be correlated with anemia status were included as variables of the study.

##### 2.4. Response Variable

Hemoglobin is necessary for transporting oxygen to tissues and organs in the body. Hemoglobin analysis was carried out onsite using a battery-operated portable HemoCue analyzer, which gives unadjusted hemoglobin values. Parents of children with a hemoglobin level under 11 g/dl were instructed to take the child to a health facility for follow-up care. Given that hemoglobin requirements differ substantially depending on altitude, an adjustment to sea-level equivalents was made before classifying children by level of anemia. The prevalence of anemia is therefore based on altitude-adjusted hemoglobin levels measured in grams per decilitre (g/dl) [7]. The response variable of this study was the anemia status of children aged 6–59 months in Ethiopia. For the current analysis, the response variable (anemia status) was dichotomized to indicate whether a child is anemic or not.

##### 2.5. Explanatory Variables

The explanatory variables that might determine the anemia status of children aged 6–59 months were socioeconomic, demographic, health, and environmental factors. From the data source we considered the following variables: region, place of residence, number of children under 5 in the household, wealth index, marital status, child’s age in months, sex of child, husband/partner’s education level, given iron pills/syrup, source of drinking water, mother’s current working status, and child’s size at birth.

##### 2.6. Common Techniques for Dealing with Missing Data

Missing data are a common problem in almost every health survey and complicate statistical analyses. The first step in dealing with missing data is determining the missing data mechanism. The work in [10] distinguishes three types of missing data mechanism; here we assume that the data are missing completely at random (MCAR), which means that missingness is not related to the variables under study. To handle missing values we used listwise deletion, a common and easy-to-perform approach that deletes all incomplete observations from the analysis. The results can be unbiased when data are MCAR. The disadvantage of this method is the reduction in sample size.
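
As a minimal sketch of listwise deletion, the snippet below drops every record that has at least one missing field. The records, field names, and the use of `None` as the missing-value marker are hypothetical and are not taken from the EDHS data.

```python
# Hypothetical records; None marks a missing value (not EDHS data).
records = [
    {"child_id": 1, "hb": 10.2, "wealth": "poor"},
    {"child_id": 2, "hb": None, "wealth": "rich"},
    {"child_id": 3, "hb": 12.1, "wealth": None},
    {"child_id": 4, "hb": 11.4, "wealth": "middle"},
]


def listwise_delete(rows):
    """Keep only the rows in which every field is observed."""
    return [row for row in rows if all(v is not None for v in row.values())]


complete = listwise_delete(records)
print(len(records), "records before,", len(complete), "after listwise deletion")
```

The shrinking record count illustrates the sample-size cost mentioned above.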

#### 3. Methods of Statistical Analysis

Multilevel models allow the relationships between explanatory variables at different levels and a dependent variable at the lowest level to be estimated. They also allow the extent of variation in the outcome of interest to be measured at each level of the model, both before and after the explanatory variables are included.

Two levels of data hierarchy were considered in the multilevel logistic regression model: individual children of households (level one) and regions (level two). Units at one level are nested within units at the next higher level. In this study the basic data structure of the two-level logistic regression is a collection of $N$ groups (regions) and, within each group $j$, a random sample of $n_j$ level-one units (individual children of households). The response variable is denoted by

$$Y_{ij} = \begin{cases} 1, & \text{if the } i\text{th child in the } j\text{th region is anemic}, \\ 0, & \text{otherwise}, \end{cases} \tag{1}$$

with probability $P_{ij} = P(Y_{ij} = 1)$ being the probability of any anemia for the $i$th child (household) in the $j$th region and $1 - P_{ij}$ being the probability that this child is nonanemic (normal). Here, $Y_{ij}$ follows a Bernoulli distribution.

##### 3.1. The Variance Components Model

The variance components two-level model for a dichotomous outcome variable refers to a population of groups (level-two units, here regions) and specifies the probability distribution of the group-dependent probabilities $P_j$ without taking further explanatory variables into account. We focus on the model that specifies the transformed probabilities $f(P_j)$ to have a normal distribution. This is expressed, for a general link function $f(P)$, by the formula

$$f(P_j) = \beta_0 + U_{0j}, \tag{2}$$

where $\beta_0$ is the population average of the transformed probabilities and $U_{0j}$ is the random deviation from this average for group $j$. The intraclass correlation coefficient (ICC) represents the proportion of the total variance that is attributable to between-group differences and provides an assessment of whether significant between-group variation exists. The ICC at the region level is given by

$$\mathrm{ICC} = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}, \tag{3}$$

where $\sigma_u^2$ is the between-group variance, which can be estimated by $\hat{\sigma}_u^2$, and $\sigma_e^2$ is the within-group variance [11].
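
The ICC can be computed directly from the two variance components. The sketch below assumes the latent-variable convention for logistic models, in which the level-one (within-group) variance is fixed at $\pi^2/3 \approx 3.29$; the text does not state which value was used, so this default is an assumption, and the between-region variance in the example is hypothetical.

```python
import math


def icc_logistic(var_between, var_within=math.pi ** 2 / 3):
    """Share of total variance attributable to between-group differences.

    var_within defaults to pi^2 / 3, the variance of the standard logistic
    distribution (an assumed convention, not a value reported in the paper).
    """
    return var_between / (var_between + var_within)


# Hypothetical between-region variance on the logit scale.
print(round(icc_logistic(0.25), 3))
```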

##### 3.2. The Random Intercept Model

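The random intercept model is used to model unobserved heterogeneity in the overall response by introducing random effects. In the random intercept model the intercept is the only random effect, meaning that the groups differ with respect to the average value of the response variable, but the relation between the explanatory and response variables cannot differ between groups. The random intercept model expresses the log odds, i.e., the logit of $P_{ij}$, as a sum of linear functions of the explanatory variables. That is,

$$\operatorname{logit}(P_{ij}) = \log\left(\frac{P_{ij}}{1 - P_{ij}}\right) = \beta_{0j} + \sum_{h=1}^{k} \beta_h x_{hij}, \tag{4}$$

where the intercept term $\beta_{0j}$ is assumed to vary randomly and is given by the sum of an average intercept $\beta_0$ and group-dependent deviations $U_{0j}$; that is, $\beta_{0j} = \beta_0 + U_{0j}$.

As a result we have

$$\operatorname{logit}(P_{ij}) = \beta_0 + \sum_{h=1}^{k} \beta_h x_{hij} + U_{0j}. \tag{5}$$

Solving for $P_{ij}$,

$$P_{ij} = \frac{\exp\left(\beta_0 + \sum_{h=1}^{k} \beta_h x_{hij} + U_{0j}\right)}{1 + \exp\left(\beta_0 + \sum_{h=1}^{k} \beta_h x_{hij} + U_{0j}\right)}. \tag{6}$$

Equation (6) does not include a level-one residual because it is an equation for the probability $P_{ij}$ rather than for the outcome $Y_{ij}$. Here $\beta_0 + \sum_{h=1}^{k} \beta_h x_{hij}$ is the fixed part of the model, and the remaining term $U_{0j}$ is called the random or stochastic part of the model. It is assumed that the residuals $U_{0j}$ are mutually independent and normally distributed with mean zero and variance $\sigma_u^2$ [12].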
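
To make the inverse-logit step concrete, the following sketch evaluates the probability of anemia under a random intercept model for one child. The coefficient values, covariates, and region effects are hypothetical; only the functional form follows the model described above.

```python
import math


def inv_logit(eta):
    """Numerically stable inverse of the logit function."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def random_intercept_prob(beta0, betas, x, u0j):
    """P_ij = inv_logit(beta0 + sum_h beta_h * x_hij + U_0j)."""
    eta = beta0 + sum(b * xi for b, xi in zip(betas, x)) + u0j
    return inv_logit(eta)


# Hypothetical child with two covariates, in two hypothetical regions.
x = [1.0, 0.0]
betas = [0.4, -0.3]
print(random_intercept_prob(-0.5, betas, x, u0j=0.3))
print(random_intercept_prob(-0.5, betas, x, u0j=-0.3))
```

Two children with identical covariates get different probabilities only through the region-specific intercept deviation, which is the sense in which the groups differ in their average response.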

##### 3.3. The Random Coefficients Model

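In the random coefficients model both the intercepts and the slopes are allowed to differ across regions. Suppose that there are $k$ level-one explanatory variables $X_1, \dots, X_k$, and consider the model where all $X$-variables have varying slopes and a random intercept. That is,

$$\operatorname{logit}(P_{ij}) = \beta_{0j} + \sum_{h=1}^{k} \beta_{hj} x_{hij}, \tag{7}$$

where $\beta_{0j} = \beta_0 + U_{0j}$ and $\beta_{hj} = \beta_h + U_{hj}$, $h = 1, 2, \dots, k$. Substituting these expressions gives

$$\operatorname{logit}(P_{ij}) = \beta_0 + \sum_{h=1}^{k} \beta_h x_{hij} + U_{0j} + \sum_{h=1}^{k} U_{hj} x_{hij}. \tag{8}$$

Here the first part of (8), $\beta_0 + \sum_{h=1}^{k} \beta_h x_{hij}$, is called the fixed part of the model and the second part, $U_{0j} + \sum_{h=1}^{k} U_{hj} x_{hij}$, is called the random part of the model.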
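
The split into a fixed part and a random part can be sketched as below. All numeric values are hypothetical, and the function name `linear_predictor` is introduced here only for illustration.

```python
def linear_predictor(beta0, betas, u0j, u_slopes, x):
    """Return (fixed part, random part) of the logit for one child.

    fixed  = beta0 + sum_h beta_h * x_h
    random = U_0j  + sum_h U_hj   * x_h
    """
    fixed = beta0 + sum(b * xi for b, xi in zip(betas, x))
    random_part = u0j + sum(u * xi for u, xi in zip(u_slopes, x))
    return fixed, random_part


# Hypothetical values: one covariate with a region-specific slope deviation.
fixed, random_part = linear_predictor(1.0, [2.0], 0.5, [0.25], [2.0])
print(fixed, random_part, fixed + random_part)
```

Setting every slope deviation in `u_slopes` to zero leaves only the intercept deviation in the random part, which recovers the random intercept model.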

##### 3.4. Parameter Estimation of Multilevel Model

###### 3.4.1. Likelihood Method

The maximum likelihood (ML) method is a general estimation procedure that produces estimates of the population parameters that maximize the probability of observing the data actually observed. Assuming that the responses $Y_{ij}$ are conditionally independent of each other given the random effect $U_j$, the conditional density of $Y_{ij}$ is given by

$$f(y_{ij} \mid U_j) = P_{ij}^{y_{ij}} (1 - P_{ij})^{1 - y_{ij}}.$$

For the two-level logistic Bernoulli response model, where the random effects are assumed to be multivariate normal and independent across units, the marginal likelihood function is given by

$$L(\beta, \Omega) = \prod_{j} \int \prod_{i} P_{ij}^{y_{ij}} (1 - P_{ij})^{1 - y_{ij}} \, f(U_j; \Omega) \, dU_j,$$

where $\Omega$ is the variance-covariance matrix of the random effects. $f(U_j; \Omega)$ is typically assumed to be the multivariate normal density, which can be written in the form $U_j \sim N(\mathbf{0}, \Omega)$.
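
The integral over the random effect has no closed form, so it is evaluated numerically in practice. The sketch below approximates the marginal likelihood contribution of a single region under a random intercept only, using a plain trapezoid rule. This is a simple substitute chosen for illustration, not the estimation routine used by the software in this study; the grid width, number of grid points, and the data are assumptions.

```python
import math


def inv_logit(eta):
    """Numerically stable inverse logit."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def normal_pdf(u, sd):
    """Density of N(0, sd^2) at u."""
    return math.exp(-0.5 * (u / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def conditional_likelihood(y, etas, u):
    """prod_i P_ij^y_ij (1 - P_ij)^(1 - y_ij), given random intercept u."""
    like = 1.0
    for yi, eta in zip(y, etas):
        p = inv_logit(eta + u)
        like *= p if yi == 1 else 1.0 - p
    return like


def marginal_likelihood(y, etas, sd, n_points=2001, width=8.0):
    """Trapezoid-rule integral of the conditional likelihood times the
    N(0, sd^2) density over u in [-width * sd, width * sd]."""
    lo = -width * sd
    step = 2.0 * width * sd / (n_points - 1)
    total = 0.0
    for k in range(n_points):
        u = lo + k * step
        weight = 0.5 if k in (0, n_points - 1) else 1.0
        total += weight * conditional_likelihood(y, etas, u) * normal_pdf(u, sd)
    return total * step


# Hypothetical region: three children with fixed-part logits and outcomes.
print(marginal_likelihood([1, 0, 1], [0.2, -0.4, 0.1], sd=0.5))
```

The full marginal likelihood is the product of such contributions over regions; with random slopes the integral becomes multidimensional and a one-dimensional grid is no longer sufficient.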

##### 3.5. Bayesian Modeling

Bayesian inference involves creating a complete probability model over all data and parameters of interest, fitting the model to the observed data, and then reasoning about either the fitted parameters or new data while taking into account the uncertainty in the fitted parameters. In a Bayesian formulation the uncertainty about the value of each parameter can be represented by a probability distribution, if prior knowledge can be quantified [13]. In the Bayesian approach, either the mean or the median of the posterior samples for each parameter of interest is reported as a point estimate. The 2.5% and 97.5% percentiles of the posterior samples for each parameter give a 95% posterior credible interval (the interval within which the parameter lies with probability 0.95).
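
The posterior summaries described above can be computed from a set of posterior draws as sketched below. The draws here are simulated stand-ins for MCMC output, not results from this study, and the linear-interpolation percentile rule is one of several common conventions.

```python
import random
import statistics


def percentile(sorted_vals, q):
    """Percentile by linear interpolation between order statistics, q in [0, 100]."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac


def summarize(draws):
    """Posterior mean, median, and 95% credible interval from sampled draws."""
    s = sorted(draws)
    return {
        "mean": statistics.fmean(s),
        "median": statistics.median(s),
        "ci95": (percentile(s, 2.5), percentile(s, 97.5)),
    }


# Hypothetical stand-in for posterior draws of one coefficient.
rng = random.Random(0)
draws = [rng.gauss(0.5, 0.1) for _ in range(5000)]
summary = summarize(draws)
print(summary)
```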

##### 3.6. The Likelihood Function

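The key ingredients of a Bayesian analysis are the likelihood function, which reflects the information about the parameters contained in the data, and the prior distribution, which quantifies what is known about the parameters before observing the data. The prior distribution and the likelihood are combined to form the posterior distribution, which represents the total knowledge about the parameters after the data have been observed. Bayesian multilevel logistic analysis specifies a dichotomous dependent variable as a function of a set of explanatory variables. The likelihood contribution from the $i$th subject in the $j$th group is Bernoulli:

$$L_{ij} = P_{ij}^{y_{ij}} (1 - P_{ij})^{1 - y_{ij}},$$

where $P_{ij}$ represents the probability of the event for subject $i$ in group $j$ with covariate vector $x_{ij}$ and $y_{ij}$ indicates the presence or absence of the event for that subject. In multilevel logistic regression, we know that

$$P_{ij} = \frac{\exp(\eta_{ij})}{1 + \exp(\eta_{ij})}, \qquad \eta_{ij} = \beta_0 + \sum_{h=1}^{k} \beta_h x_{hij} + U_{0j} + \sum_{h=1}^{k} U_{hj} x_{hij},$$

where $\beta_0 + \sum_{h=1}^{k} \beta_h x_{hij}$ is the fixed part of the model and $U_{0j} + \sum_{h=1}^{k} U_{hj} x_{hij}$ is the random part of the model, with $h = 1, 2, \dots, k$.

$P_{ij}$ is the probability of the $i$th child in the $j$th group (region) being anemic, so that the likelihood contribution for the $i$th subject in the $j$th region is

$$L_{ij} = \left[\frac{\exp(\eta_{ij})}{1 + \exp(\eta_{ij})}\right]^{y_{ij}} \left[\frac{1}{1 + \exp(\eta_{ij})}\right]^{1 - y_{ij}}.$$

Since individual subjects in a group are assumed to be independent of each other, the likelihood function over a data set of $n = \sum_j n_j$ subjects in the 11 regions is then

$$L = \prod_{j=1}^{11} \prod_{i=1}^{n_j} \left[\frac{\exp(\eta_{ij})}{1 + \exp(\eta_{ij})}\right]^{y_{ij}} \left[\frac{1}{1 + \exp(\eta_{ij})}\right]^{1 - y_{ij}}.$$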
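
On the log scale the product over regions and children becomes a sum, which is the form usually evaluated in practice. The sketch below computes the Bernoulli log-likelihood for grouped data; for brevity it assumes a random intercept only (no random slopes), and the data layout, region labels, and parameter values are hypothetical.

```python
import math


def log_likelihood(data, beta0, betas, u):
    """Bernoulli log-likelihood summed over regions and children.

    data: {region: [(y, x_vector), ...]}, u: {region: random intercept}.
    Uses log(P^y (1 - P)^(1 - y)) = y * eta - log(1 + exp(eta)).
    """
    total = 0.0
    for region, rows in data.items():
        for y, x in rows:
            eta = beta0 + sum(b * xi for b, xi in zip(betas, x)) + u[region]
            # Stable evaluation of log(1 + exp(eta)).
            softplus = max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))
            total += y * eta - softplus
    return total


# Hypothetical data for two regions with one covariate.
data = {
    "A": [(1, [1.0]), (0, [0.0])],
    "B": [(0, [1.0]), (1, [1.0])],
}
print(log_likelihood(data, beta0=-0.2, betas=[0.5], u={"A": 0.3, "B": -0.1}))
```

Adding the log of a prior density for the parameters to this quantity gives the unnormalized log posterior that an MCMC sampler explores.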

##### 3.7. Prior Distribution

The prior distribution is a probability distribution that represents the prior information associated with the parameters of interest. There are two types of prior distribution: informative priors and noninformative priors.

##### 3.8. Model Comparison

In this study Akaike information criterion (AIC) and Bayesian information criterion (BIC) were used for model comparison. A model with a lower AIC and BIC is preferred over a model with a larger AIC and BIC.
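
For reference, both criteria follow from the maximized log-likelihood, the number of estimated parameters, and the sample size. The sketch below uses the standard definitions $\mathrm{AIC} = 2k - 2\ln L$ and $\mathrm{BIC} = k\ln n - 2\ln L$; the log-likelihoods and parameter counts are hypothetical, and taking $n$ to be the number of level-one observations is an assumption, since the text does not say which sample size enters the BIC.

```python
import math


def aic(log_lik, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2.0 * k - 2.0 * log_lik


def bic(log_lik, k, n):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return k * math.log(n) - 2.0 * log_lik


# Hypothetical fits of two candidate models to n = 5507 observations.
candidates = {
    "model_1": {"log_lik": -3600.0, "k": 12},
    "model_2": {"log_lik": -3598.5, "k": 20},
}
for name, fit in candidates.items():
    print(name, aic(fit["log_lik"], fit["k"]), bic(fit["log_lik"], fit["k"], 5507))
```

In this made-up example the small gain in log-likelihood does not offset the eight extra parameters, so both criteria favor the smaller model.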

##### 3.9. Software Used

The statistical software packages used in this study were SPSS version 20, STATA (StataCorp, Texas 77845, USA), and WinBUGS14. SPSS was used for the descriptive analysis, STATA for the multilevel analysis, and WinBUGS14 for the Bayesian analysis.

#### 4. Results and Discussions

The results of the analysis are divided into descriptive analysis and multilevel binary logistic models from categorical data. Results and their discussions are presented in the following sections.

##### 4.1. Descriptive Analysis

Descriptive statistics are brief figures that summarize a given data set, here the entire sample. A total of 5,507 children aged 6–59 months were included in this study. Among these, 2,358 (42.8%) were anemic (Hb < 11.0 g/dl) while 3,149 (57.2%) were not anemic at the date of the survey.
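
The reported percentages can be checked from the counts, and the result placed on the WHO scale quoted in the Introduction. In the sketch below the counts come from the text, while the handling of the category boundaries (for example, treating exactly 40% as medium) is an assumption.

```python
def prevalence_pct(cases, total):
    """Prevalence as a percentage of the sample."""
    return 100.0 * cases / total


def who_category(pct):
    """WHO public health significance of anemia prevalence.

    Thresholds follow the Introduction: over 40% major, 20-40% medium,
    5-20% mild. Boundary handling is an assumption of this sketch.
    """
    if pct > 40.0:
        return "major"
    if pct >= 20.0:
        return "medium"
    if pct >= 5.0:
        return "mild"
    return "below mild threshold"


anemic, total = 2358, 5507
pct = prevalence_pct(anemic, total)
print(round(pct, 1), round(prevalence_pct(total - anemic, total), 1), who_category(pct))
```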

##### 4.2. Bivariate Analysis between Response and Predictors

This section reports the association between the response variable and each predictor variable. The bivariate analysis, based on Pearson’s chi-square statistic, provides a preliminary insight into the association between each selected independent variable and the dependent variable. A high value of the Pearson chi-square statistic for a given independent variable indicates a strong association between that variable and the dependent variable, without adjusting for the effect of the other factors. The hypotheses tested are as follows: $H_0$: there is no association between the dependent variable and the particular independent variable; $H_1$: there is an association between the dependent variable and the particular independent variable.

The decision was based on the chi-square value and p value at 0.05 level of significance.
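
A minimal sketch of the Pearson chi-square test for a contingency table is given below. The 2×2 counts are hypothetical, and the p-value helper covers only the one-degree-of-freedom case, where the chi-square tail probability reduces to the complementary error function; larger tables need the general chi-square distribution.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


def p_value_df1(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical 2 x 2 table: rows = predictor level, columns = anemic / not anemic.
table = [[10, 20], [20, 10]]
stat, df = chi_square(table)
p = p_value_df1(stat)
print(stat, df, p, "reject H0" if p < 0.05 else "do not reject H0")
```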

Basic descriptive information that summarizes the association between predictors and response variable is presented in Table 1. The results in Table 1 show the row percentage and count of anemic/not anemic status of children aged 6–59 months with respect to the categorical covariates.