#### Abstract

Small area models have become popular methods for producing reliable estimates for sub-populations (small geographic areas in this study). Small area modeling may be carried out via model-assisted approaches within the model-based approaches or design-based paradigm. When there are medium or large samples, a model-assisted approach may be reliable. However, when data are scarce, a model-based technique may be required. Model-based Bayesian analysis is popular for its ability to combine information from several sources as well as taking account uncertainties in the analysis and spatial prediction of spatial data. Nevertheless, things become more complex when the geographic boundaries of interest are misaligned. Some authors have addressed the problem of misalignment under hierarchical Bayesian approach. In this study, we developed non-trivial extension of existing hierarchical Bayesian model for a binary outcome variable under spatial misalignment with three contributions. First, the model uses unit-level survey data and area-level auxiliary data to predict the posterior mean proportion spatially at the second geographic area level. Second, the linking model is changed to logit-normal model in the proposed model. Lastly, the mean process was considered to overcome the multicollinearity between the true predictors and the spatial random effect. Sensitivity analysis was also done via simulation.

#### 1. Introduction

Sample surveys are widely used as a cost-effective way to gather information on variables of interest in target populations. Sample survey data are used to obtain precise estimates of parameters for the whole population and for large sub-populations (domains). However, a particular widespread demand from policy makers is for estimates at a finer level of domains.

In sample survey, when the domains were not originally planned, they are usually poorly represented or even not represented at all. These domains are called small areas, for which reliable (precise and accurate) estimates based on only domain-specific sample data cannot be produced [1]. Small areas can be based on geographic boundaries or based on a combination of socio-demographic characteristics like urban/rural, male/female, age group, economic status, and so on. The term “small area” in this study context refers to a sub-population based on geographical classification, i.e., small geographic areas.

In this case, small area estimation (SAE) is a technique for improving estimation accuracy and providing accurate parameter estimates for small areas [2–4]. This has led to the development of a variety of models that combine survey data with data from outside the survey, often related to recent censuses and current administrative data, for the target small areas to improve precision.

Statistical models in small area estimation can be utilised in either a model-assisted or model-based approach. The model-assisted estimators [5] employ working models, in which a model is specified yet desired design-based properties are retained even when the model is misspecified.

When the sample size from the target population is large enough and sampling plan is well, according to a cautious viewpoint [6], model-assisted inference performs satisfactorily. Despite the fact that a survey’s sampling strategy and sample size are normally well planned at the population level, due to probability sampling and budget constraints, the sample size in small areas can be small or even zero. In this case, model-assisted estimators for small area parameters cannot achieve desired accuracy and thus become inaccurate, necessitating the use of a model-based approach.

There now exist a wide range of model-based approaches depending on the nature of the measurement (e.g., binary, count, or continuous) and on the availability of auxiliary data (area level or unit level).

On average, unit-level estimates are expected to be more precise than area-level estimates [7]. When unit-level data are available, we can use a model for more in-depth analysis. However, access to unit-level auxiliary data might be challenging. Hence, a fusion model incorporating both unit and area-level data can be considered to accommodate this challenge. The fusion small area model exploits the area-level auxiliary data and individual-level survey data, which are effectively applied by other authors under the Bayesian approach [8].

Bayesian approach of modeling has the ability to combine information from several sources using melding or data fusion [9]. The approach also simplifies computation of measures of accuracy in the SAE, which produces realizations of the posterior distribution of the target quantities [10]. Under the hierarchical Bayesian framework, Ghosh et al. [11] proposed a generalized linear model with or without spatial correlation structure. Furthermore, Bakar et al. [8] proposed a hierarchical Bayesian spatial model for several categorical data under spatial misalignment.

Misaligned data models have recently been presented using the hierarchical Bayesian framework to solve the challenge of making inferences for small areas that differ from those for which survey data are available [8, 12, 13]. Following this, Trevisani et al. [14] proposed a model to handle spatial misalignment for count data. For several categorical variables under spatial misalignment, Bakar et al. [8] proposed a hierarchicthe posterior mean proportional Bayesian model to handle this issue.

Furthermore, Bakar et al. [15] proposed to estimate a binary outcome variable under purely spatial setting for secondary geography where survey data were not collected, where the primary and secondary geographic areas are spatially misaligned. However, the study by Bakar et al. [15] considered area-level data.

The aim of this research was to develop a hierarchical Bayesian spatial fusion small area model for binary data with spatial misalignment. The proposed model is a non-trivial extension of Bakar et al. [8] for the context of binary outcome variable under spatial misalignment with three contributions. First, the model uses unit-level survey data and area-level auxiliary data to predict the posterior mean proportion spatially at the second geographic area level. The individual process attempts to make better use of all the information gathered at the individual level. The area-level process, including the spatial random effect, which is estimated from the primary geographic areal units, is reused for prediction at the second area level. Second, the linking model is changed to , logit-normal model, in the proposed model. The logit-normal model facilitates easy computation of the parameters in the model. Last, the mean process was considered to overcome the multicollinearity between the true predictors and the spatial random effect. Unlike Bakar et al. [8], considering the mean process accounts for the hierarchical nature of the modeling.

The following is an outline of the remainder of this paper. Section 2 presents our modeling technique as well as the underlying distribution theory, backed up by relevant literature. Section 3 demonstrates the details of inference making under a number of sub-sections. Section 3’s first two sub-sections are all about the details based on the primary geographic area, while the last two sub-sections are about the secondary geographic area. Section 4 highlights a simulation study illustrating how the proposed model can be put into practice. Concluding remarks are given in Section 5. Finally, appendices A–C present proofs and derivations of joint posterior, full conditional, and predictive distributions.

#### 2. Model Development

This section introduces a spatial hierarchical Bayesian statistical small area model for the binary response variable. Let be a binary response variable for the individual in the area, for , , and where denotes the number of small areas and is the number of individuals in area.

Consider taking values one (with probabilities ) and zero (with probabilities ), where follow Bernoulli distribution, that is, . Hence, is a Bernoulli response at area and a logit model with probability defined as follows:

This may also be written in a generalized linear mixed model form, where is considered as a logit link function, as follows:

Assuming equal unit-level error variance [16], , and taking as the area-level matrix of auxiliary covariates, then the first term of the right hand side of equation (2) becomeswhere is the auxiliary covariate at the area level and is its respective regression coefficient with , where L refers to the number of auxiliary covariates in the model. We assume equal error variance, [16], to make the posterior distributions proper.

The proposed hierarchical Bayesian spatial model for the binary data contains an underlying process whose predictive distribution on the probability scale is less uncertain than the observed proportions. The statistical dependencies caused by geographic proximity are modeled explicitly, and Bayesian conclusions account for them in a cohesive and natural way.

Therefore, embedding spatially correlated process,where the mean process, , is given in equation (3) and is assumed not have a column for intercept, for the intercept to be implicitly present in spatial component [17]. This will not affect the model as the collinearity between the spatial random effects and the intercept is not of interest. Hence, we let the spatial random component remain unconstrained.

Furthermore, the term , the spatial component, is a spatially correlated process. Defining and understanding the spatial process is typically essential for generating the full conditional distribution of .

Intrinsic conditional autoregressive (ICAR) and conditional autoregressive (CAR) models are popular methods used to model areal spatial random effects [18, 19], commonly considered in the Besag–York–Mollie (BYM) model. However, applying the CAR/ICAR technique to sparse area-level data is sometimes problematic since areas without observations are frequently neighbored by areas with observations. It is quite difficult to get spatial predictions in an overlapping area using the CAR modeling method. Another approach to model spatial random effect component is to use spatial basis functions. Moran’s I basis function for a discrete Gaussian spatial field may be utilised to alleviate the sparseness and geographically overlapped predictions [20] by evaluating a latent spatial surface process formed by the basis.

Moran’s I basis function [21] is one of the prominent techniques for modeling areal spatial process. In contrast to the CAR models, the process is updated using a multivariate distribution, which adds the big-n challenge stated in spatial literature [22]. As a result, for dimension reduction, *m* eigenvectors corresponding to the first *m* positive eigenvalues of a Moran’s I operator matrix were used where [20]. Moran’s I basis function, a discretely indexed areal spatial process model in general, which implements the same weights in geographical areas with large and small geographical border sizes, is frequently not appropriate [23]. However, due to the position and nature of the large area, it is feasible that the large area may contribute to a relatively minor spatial correlation with other surrounding areas. The other spatial basis function, the multiresolution basis function, similarly, could overcome the big-n problem.

The multiresolution spatial basis function is utilised for point-referenced data [24, 25]. In multiresolution spatial basis functions, *m* knot points are defined across the spatial domain [25] by utilising reduced rank approximation [26, 27]. Recently, a number of authors used spatial basis functions that included both the multiresolution basis function and Moran’s I basis function [8, 15]. This integrates data from a neighborhood matrix of areas and distances between their centroids as well as allows non-stationarity [8].

This study used a spatial basis function for defining the spatial component, which included the multiresolution and Moran’s I spatial basis functions for the spatial process. This study also developed a model-based strategy for handling the spatial predictive issue by employing a spatial basis function that tackles the sparse data problem via a latent spatial field and also accommodates the change of support problem. The spatial basis used in this study is a mix of Moran’s I basis [21], which is used for area-level modeling, and the multiresolution spatial basis defined via the bi-square function, which is used for point-referenced spatial data [24, 28]. Such a basis function was also developed by Bakar [23] for a Gaussian model. However, this study extends further to the spatial generalized linear models.

Therefore, the spatial component of the model in equation (4), which can be expressed in terms of reduced spatial component and the spatial basis function , is written aswhere is a vector of *m* dimension and is a spatial basis that comprises Moran’s I basis function and multiresolution basis function such that

This is a mix of the multiresolution spatial basis and eigenvectors of Moran’s I operator matrix . The term in equation (5) is an identically and independently distributed error process used to capture the remaining random component so as to take the uncertainties arising from dimension reduction, .

To derive spatial basis functions, there is a need to first derive both Moran’s I basis function and multiresolution basis functions. For deriving Moran’s I basis functions, Moran’s I operator matrix, , from Moran’s I statistic is derived. Moran’s I statistic is a measure of spatial association, which equals a weighted sum of squares where the weights are called [21]. For this, reparameterization is considered a gateway.

The reparameterization is used to demonstrate that the random effects can increase the variance of the posterior distribution of due to the confounding between the spatial random effects and the fixed effect predictors [17]. Reich et al. [17] defined the reparameterization as smoothing orthogonal to the fixed effects. Furthermore, Griffith et al. [29] augmented a model with selected eigenvectors of to reveal the structure of missing spatial covariates, where A is neighborhood weight matrix, 1 is the n-vector of 1s, and I is the identity matrix. Observing that missing covariates are not of primary interest, rather in smoothing orthogonal to . Hence, replace with , where is the orthogonal projection onto orthogonal complement, that is, with . The resulting operator may be found in the numerator of Moran’s I statistic, which is a popular non-parametric measure of spatial dependence [30].

Recently, the approach of dimension reduction by Reich et al. [17] was adapted by Bakar et al. [8, 15]. Bakar et al. [8] used mean process for the case of modeling several categorical variables simultaneously unlike design matrix, . In this study, the mean process generated from the predictors was used to construct the basis so as to alleviate the well-known problem of collinearity [17].

Considering the mean process, , generated from the predictors and neighborhood weight matrix, , Moran’s I operator matrix, is explicitly defined aswhere , is an identity matrix, and is nearest neighbor weight matrix containing .

Instead of using the first eigenvectors of , which correspond to the positive eigenvalues [21], eigenvectors to define were employed [8, 17] for compatibility of matrix algebra. This may be accomplished by performing a spectral decomposition of Moran’s operator matrix [20].

Set *m* knot points to define the multiresolution spatial basis . The selection of the position and number of knots is a problem that is connected to . For the position of knot points, a regular grid or its modification is generally a common choice [31, 32]. Then, of order is produced using the distances between the knot points and the centroid of the areas, while a bi-square function is used to define the multiresolution spatial basis function, . Therefore, for basis functions of the resolution, the bi-square basis function is defined in aswhere is an indicator function, is the number of basis functions at the resolution, is the Euclidean norm, is the center of the basis function at the resolution, and is the radius of its spatial support (also called the aperture).

The choice of determines the multiple resolutions, which are used to capture different dependence scales [24, 25, 27]. Several basis functions with centers beyond the research region are incorporated to account for boundary effects [33].

When knot-point sites are equidistant across the region, the value of is a multiple of the shortest distance between knot-point sites. Cressie et al. [24] utilised a constant range for all knot points with a particular resolution. To accommodate knot locations that may not be equidistant, the ranges of the knot points were enabled to vary. Hence, was defined as a multiple of the shortest distance between knot points and all the other knot points. In practice, is defined as the shortest Euclidean distance between basis function centers of the same resolution multiplied by 1.5 [24]. As a result, is a spatial basis that reflects the identification of the neighborhood phenomenon of the geographically specified areal borders as well as their distances.

Furthermore, we model , which is a random vector of spatial effect, by assuming a Gaussian distribution with covariance matrix of lower dimensional and mean zero. Here comprises smoothing parameter and a spatial covariance , that is,where , is an matrix with diagonal entries and non-diagonal entries. The diagonal entry is the number of region i’s neighbors, and non-diagonal entries if areas *j* and *i* are neighbors, and 0 otherwise [17]. This allows one to incorporate neighborhood information into the covariance matrix, . It can be computed using the neighborhood weight matrix, , as , and 1 is a vector of 1s.

Since *Q* is singular [34], a spectral decomposition of was used to derive the full conditional distribution of the spatial process .

Thus, we can writewhere is a diagonal matrix and is an orthogonal matrix of order [20]. Following Hughes et al. [21], in this study, eigenvectors of are assumed to be known with eigenvalues known up to a multiplicative constant.

Graphical representation of hierarchical Bayesian spatial small area models can be helpful for the readers in understanding structure and building complex models. Hence, the overall scheme of the model is given in Figure 1.

#### 3. Bayesian Inference

##### 3.1. Posterior Distribution

This section describes the hierarchical derivation of the posterior distribution. The hierarchical specification for the desired joint model is in terms of data models, process models, and parameter models [35]. Such specification is likened to a simple fact from specifications of the Bayes theorem [35] (see Appendix A).

In Bayesian statistics, the posterior distribution has to be an appropriate distribution. It is vital to ensure that the sufficient conditions, in terms of the parameters’ priors, are satisfied so that the resultant posterior is suitable. As a result, we applied proper priors to all unknown parameters to ensure that the posterior distribution is correct [36]. In choosing a prior belonging to a particular distributional family, some choices may be more computationally convenient than others. In particular, it may be possible to choose a prior distribution which is conjugate to the likelihood, in which case the prior and posterior distribution will be in the same family and have the same distributional form. Accordingly, a conjugate normal prior distribution for the elements of and inverse gamma prior distribution for the variance parameters and smoothing parameter are used.

Direct assessment of the posterior distribution necessitates high-dimensional numerical integration, which is computationally infeasible. Therefore, the Markov Chain Monte Carlo (MCMC) algorithm was used to generate samples from the posterior distributions [37]. The full conditional distributions are required to execute MCMC sampling under the model.

##### 3.2. Full Conditional Distributions

The full conditional distributions are derived by abstracting only those elements including the parameter of interest from the posterior distribution, in this case from equation (A.5), and treating other components as constants [38]. Accordingly, we obtained the full conditional distributions for the variance parameters , smoothing parameter , and the coefficients of the fixed effects of the model as follows:

Similarly, the conditional distribution of can be denoted as , i.e.,where

The proof for , , and is straightforward. However, the corresponding proof for the derivation of conditional distribution of and necessitates some algebra manipulation (see proofs in Appendix B). Therefore, since all the full conditional distributions have a closed form solution, samples are generated from the posterior distributions using the Gibbs sampling method [39].

##### 3.3. Predictive Distributions

Using estimates of the parameters under the primary geographic classification , we need to predict at area level with overlapping geographic boundaries. We assume that the areal relationships from are largely maintained for the prediction area as they cover the same population and the same whole areas. This led to reusing all the parameters with few adjustment of spatial correlation according to the new geographical classification.

Let be the second geographical area where the ultimate prediction is required. Here we have a new neighborhood weight matrix based on prediction areas. Let also be the known matrix of area-level auxiliary covariates based on second geographical boundaries. Hence, the mean process component, the spatial random component, and the individual-level error component corresponding to equation (4) for the prediction area will be computed separately and joined finally. Specifically, change of support is used to derive the probability distribution of the mean process component at prediction areas (second geographical boundaries).

The change of support for the mean process in prediction area is expressed aswhere denotes the area of the areal unit . However, the integral in equation (14) is difficult to compute, and there are approximation techniques available to resolve this problem [18]. However, the approximations are best suited for situations when . These approximation techniques are derived using the concept of the posterior predictive distribution in the Bayesian inference approach (see Appendix C).

Thus, using the auxiliary covariates , the posterior predictive distribution of is written as

On the other hand, for the spatial random effect, a new closest neighbor weight matrix based on *s* neighborhood in Moran’s I-matrix and a new orthogonal multiresolution spatial basis based on the distances between knot points and *s* centroids are used. Thus, is obtained by incorporating in equation (5), where is the basis function containing the neighborhood weight matrix generated from at . and , respectively, are the multiresolution basis and the eigenvectors of Moran’s I operator matrix based on area , i.e., SA2s.

Similarly, the term is derived from the predictive conditional Gaussian distribution with variance and mean 0. The predictive distribution of is therefore produced using the composition of draws and MCMC samples from the posterior distributions and posterior predictive distribution. Hence, the predictive equation for MCMC sample may be stated in terms of area as

##### 3.4. Estimation and Evaluation Metrics

Measures of precision play crucial role in small area estimation. For as the number of MCMC samples after removing the burn-in period followed by thinning, by the ergodic theorem for Markov chains [40], converges to and to as . Hence, the estimate of and its corresponding posterior variance for the area are obtained directly from the predictive distribution of . Accordingly,

We also considered mean absolute error (MAE) and root mean square error (RMSE) to evaluate the model performance. These metrics were defined using the posterior mean of the proportion, , of a small area and iteration from the true population proportion, , in two ways: for an individual small area, and an average across all individual small areas. So, the most accurate estimate will lead to the smallest RMSE and/or MAE value going to zero.

The mean square error was computed as the arithmetic mean of squared deviations of from , and hence the was the square root of .

Similarly, MAE is given as

#### 4. Illustrative Example

A simulation study was conducted as an illustrative example to make the ideas described in the proposed model clear and understand how the model may be put into practice. We started it by describing the simulation setup. The sensitivity analysis of the predictive posterior distribution for the changes in the values of hyperparameters was done. Furthermore, precision measures of the estimates from the proposed model were demonstrated along with the direct estimates and population values.

##### 4.1. Simulation Setup

A regular grids observation locations on a unit square, with cells and cells, were considered as primary geographic areas with survey data and prediction area (secondary geographic area overlapping with primary geographic area), respectively (Figure 2(a)). A binary outcome variable and two auxiliary covariates with 3000 population observations were generated. Ten percent of the population was taken as sample observations.

**(a)**

**(b)**

Values for the parameters of the priors (i.e., hyperparameters) are assumed to reflect a fairly vague knowledge of the prior distributions. In case of inverse gamma distribution, smaller hyperparameters reflect vague knowledge. However, as noted in [41], inverse gamma priors with extremely small hyperparameters lack a proper limiting posterior distribution. As a result, we considered proper and weakly informative inverse gamma priors with reasonable values of the shape hyperparameter and the rate hyperparameter , which is frequently recommended in literature [42, 43]. On the other hand, hyperparameters for the normal priors were considered under sensitivity analysis.

We generated an MCMC sample of simulated datasets from the full conditionals of the hierarchical model. Bi-square basis function centers with two resolutions: resolution one containing basis functions and resolution two containing basis functions, were considered (Figure 2(b)). Therefore, in the simulation, there were spatial basis functions in total.

##### 4.2. Sensitivity Analysis

Sensitivity to the hyperparameter specification is a crucial part in a hierarchical Bayesian model. Assessing “complete factorial” sensitivity to our prior specifications is exceedingly difficult because of the model’s complexity and size. However, it is suggested that sensitivity studies of one parameter at a time be performed by rerunning the Gibbs sampler with various values for each parameter [44]. Hence, we investigated the sensitivity analysis for the normal prior. The sensitivities were mainly assessed by visual inspection of diagnostic plots (Figure 3) as well as using mean absolute error (MAE) and root mean square error (RMSE) (Table 1).

The model with the lowest RMSE and MAE is thought to perform better. RMSE and MAE were computed using different hyper-prior values from the table above, ranging from informative to vague (non-informative). At and , the RMSE and MAE values are and 0.40, respectively. Moving in the opposite direction, the next hyper-prior values are and , with 0.02 and 0.03 increments, respectively. The step that follows produces the largest change increments of 0.04 and 0.06 for RMSE and MAE, respectively. After that, the increment becomes nearly constant. The RMSE and MAE values revealed that our model is not very sensitive to the hyperparameter selection; in other words, the observed changes in RMSE and MAE values were of a similar magnitude.

Visual inspection of the trace plots was also considered to complement analysis of RMSE and MAE (Figure 3). From this figure, we can observe that plots corresponding all parameters except and showed similar convergence. Furthermore, the plots corresponding to a prior shows better convergence. This visual inspection was supported by Geweke’s test of convergence for each parameter under each scenario (Table 2). Hence, we considered a normal prior with mean zero and variance ten, , for our estimation and measures of precision.

##### 4.3. Estimates and Precision Measures

Using samples from the MCMC algorithm and posterior predictive distribution along with the new information from SA2, estimates of probability and all precision measures were computed, including but not limited to 95% credible intervals (CIs). For comparison purpose, the direct estimate and the population values for each small areas were demonstrated.

From Figures 4 and 5, we can see that our proposed model is better than direct estimate by considering several perspectives. Firstly, it enabled us to obtain estimates where we could not have from direct estimate, i.e., the line representing the direct estimates is broken for some areas. Secondly, if we look at the overall pattern of the lines, the deviation of the SAE line is less than the deviation of direct line from the population line. Hence, our proposed model yields estimates for areas where we could not have direct estimates and the estimates from our proposed model are precise as compared to the direct estimates.

**(a)**

**(b)**

**(c)**

From Figures 6 and 7, we can understand that the proposed model yields precise estimates as compared to direct estimates.

Different scenarios for the coefficients of auxiliary variables were considered to check the sensitivity of hyperparameters in the proposed model, and we found that weakly informative priors were better in the proposed model. However, it is recommended to conduct further studies assessing all combinations of hyperparameters in the proposed model. In this perspective, further research is needed.

#### 5. Concluding Remarks

In this study, we have developed a spatial hierarchical Bayesian small area model for a binary response variable under spatial misalignment as non-trivial extension of existing hierarchical Bayesian small area model for binary data under spatial misalignment. Fusion model (considering both area-level and unit-level data) was considered. The process models derived from the predictors were used to construct the basis as a means of alleviating the issue of collinearity between the fixed effects and the spatial random effects. All the full conditionals have closed form. Sensitivity analysis was done via simulation. The developed model can be applied in many broad areas including but not limited to health sciences, public health, agriculture, and economics.

#### Appendix

#### A. Posterior Distribution

For ease of expression, hereafter we denote , , , and . Hence, the conditional distributions under data models, process models, and parameter models used in posterior distribution are given as follows.

Data models:

Process models:

Parameter models:where .

Accordingly, the posterior distribution is written aswhich is proportional to a product of the kernels only after removing terms that do not involve the parameter of interest that can be written as

#### B. Derivation of Full Conditionals

(1)The full conditionals for . The full conditionals for can be proved by taking all terms with from equation (A.5) as which is the kernel of inverse gamma distribution with shape and scale parameters respectively.(2)The full conditionals for . The full conditionals for can be proved by taking all terms with from equation (A.5) as which has the form of inverse gamma distribution with a shape parameter of and scale parameter of(3)The full conditionals for . The full conditionals for can be proved by taking all terms with from equation (A.5) as Hence, the full conditional distribution of has the form of inverse gamma distribution with scale parameter and shape parameter(4)The full conditionals for . The full conditionals for can be proved by taking all terms with from the full joint posterior distribution equation (A.5) as Here, applying properties of determinant and inverse of a matrix as well as the relationship , we get implying which is inverse gamma distribution with parameters(5)The full conditionals for . The corresponding proof for the conditional distribution of is as follows. Taking all terms with the from the full joint posterior distribution equation (A.5), which is multivariate normal distribution with covariance and mean vector#### C. Predictive Distribution

The posterior predictive distribution is the distribution of potential unobserved values conditional on the observed values. The posterior predictive distribution for unobserved values conditional on the observed values is defined as follows:

This can be derived from the Bayes rules:

We have

Now integrating out the nuisance variable on both the most left and right sides in equation (C.3) yields

The right hand side formula in equation (C.4) appears to have a Markov-type assumption, . Alternatively, by the law of total expectation and Fubini’s theorem [45], applied to any bounded measurable function defined on the relevant sample space , we observe that

Obviously, the under-braced components of the left hand side and the right hand side in equation (C.5) are equal, and hence it is equivalent to equation (C.4). That is, for all bounded measurable functions, we conclude that

Now, replacing by , by and in equation (C.6), we find the predictive posterior distribution of .

#### Abbreviations

BYM: | Besag–York–Mollie |

CAR: | Conditional autoregressive |

CI: | Credible interval |

ICAR: | Intrinsic conditional autoregressive |

MAE: | Mean absolute error |

MCMC: | Markov Chain Monte Carlo |

RMSE: | Root mean square error |

SAE: | Small area estimation. |

#### Data Availability

The data used to support the findings of this study are included within the article.

#### Disclosure

The views expressed in this publication are those of the authors. The funder had no role in manuscript writing, editing, approval, or decision to publish.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest.

#### Authors’ Contributions

KFM and SMM were responsible for conceptualization and software. KFM, SMM, and AKW were responsible for methodology, validation, investigation, original draft preparation, review and editing, and visualization. SMM and AKW were responsible for supervision. KFM was responsible for formal analysis, resources, data curation, project administration, and funding acquisition. All authors have read and agreed to the published version of the manuscript.

#### Acknowledgments

We would like to express our appreciation to Pan African University Institute for Basic Sciences, Technology and Innovation for the support. We are grateful for the feedback and comments obtained from conference presentations including the th Sub-Saharan African Network (SUSAN) Conference of the International Biometric Society, the BIG4small Small Area Estimation Conference 2021, and the 62nd Annual Conference of the South African Statistical Association.