Research Article | Open Access
Ye-Mao Xia, Jian-Wei Gou, "Assessing Heterogeneity for Factor Analysis Model with Continuous and Ordinal Outcomes", Journal of Applied Mathematics, vol. 2016, Article ID 1648462, 12 pages, 2016. https://doi.org/10.1155/2016/1648462
Assessing Heterogeneity for Factor Analysis Model with Continuous and Ordinal Outcomes
Factor analysis models with continuous and ordinal responses are a useful tool for assessing relations between the latent variables and mixed observed responses. These models have been successfully applied to many different fields, including behavioral, educational, and social-psychological sciences. However, within the Bayesian analysis framework, most developments are constrained within parametric families, of which the particular distributions are specified for the parameters of interest. This leads to difficulty in dealing with outliers and/or distribution deviations. In this paper, we propose a Bayesian semiparametric modeling for factor analysis model with continuous and ordinal variables. A truncated stick-breaking prior is used to model the distributions of the intercept and/or covariance structural parameters. Bayesian posterior analysis is carried out through the simulation-based method. Blocked Gibbs sampler is implemented to draw observations from the complicated posterior. For model selection, the logarithm of pseudomarginal likelihood is developed to compare the competing models. Empirical results are presented to illustrate the application of the methodology.
Owing to its wide applications in behavioral and social science researches, analysis of factor analysis models with mixed data structure has received a lot of attention; see [1–6]. However, most of these methods are mainly developed within particular parametric distribution families such as the exponential family or normal scale mixture family, which have a limited role in dealing with the distributional deviations, in particular heterogeneity or multimodality of the data. Though some robust methods are developed to downweight the influence of the outliers [7–12], most of them are still confined to dealing with unimodality and are less effective for the asymmetric and/or multimodal problems.
Recently, some authors focused on the Bayesian semiparametric modeling for latent variables model. For multivariate categorical data analysis, Kottas et al.  extended the traditional multivariate probit model [14–16] to a flexible underlying prior probability model. The usual single multivariate normal model for the latent variables is replaced by a mixture of normal priors with infinite number of components. And, for the latent variable model with fixed covariates and continuous responses, Lee et al.  established the semiparametric Bayesian hierarchal model for the structural equation models (SEMs) by relaxing the common normal distribution of exogenous factors to follow a finite-dimensional Dirichlet process . Song et al.  developed a semiparametric Bayesian procedure for analyzing the latent variable model with unordered categorical data. For some recent advances in semiparametric analysis for factor analysis model, see [20–23] among others.
In this paper, we developed a Bayesian semiparametric approach for analyzing factor analysis model with mixed continuous and ordinal responses. The methods are twofold. Firstly, we extended Kottas, Müller, and Quintana’s model to a more general multivariate model which contains factor variables. This extension aims to interpret the relationships between measurements and latent variables and explore correlations among the multiple manifest variables. Moreover, we treat the threshold parameters as unknown and estimate them simultaneously with other model parameters, thus providing a more flexible approach to fit the data. Secondly, we introduce the truncated Dirichlet process prior as the prior of the mean vector and variance-covariance parameters of unique errors and latent variables. This facilitates the interpretation of heterogeneity in the mean and/or covariance structure across the subjects.
This paper is organized as follows. We first introduce the Bayesian semiparametric modeling framework for factor analysis model with continuous and ordinal variables. We then present the Markov chain Monte Carlo procedure for parameters estimation and model selection. Simulation studies and a real example are provided to illustrate the performance of the proposed procedure. We close with some remarks and concluding comments.
2. Model Description
2.1. Factor Analysis Model with Continuous and Ordinal Responses
Suppose that a -dimensional mixed observed vector contains continuous variables and ordinal variables with taking an integral value in for , . We assume that the observed ordinal vector is related to the unobserved continuous vector throughwhere , is a set of unknown threshold parameters that define the categories: . Hence, for the th variable , there are categories.
Let denote the vector of continuous observed measurements and unobserved variables. For subject , we formulate the dependence among ’s through the following measurement model:where is a intercept vector, is a factor loading matrix, is an vector of latent variables, and is a vector of measurement errors which is independent of . In many applications, may represent the hypothesized factors underlying manifest responses and/or unobserved heterogeneity not explained by covariates.
The latent variable model with mixed continuous and ordinal responses defined by (1) and (2) faces two sources of identification problems. The first one is associated with the determinacy of latent variables in modeling of categorical variables, and the second one is related to the uniqueness of the factor loadings matrix. To solve the first problem, we use the common method  to fix endpoints and () at preassigned values. For the second problem, we follow the usual practice in structural equation modeling to identify the covariance matrix of by fixing appropriate elements in at preassigned values.
Let be the parametric vector formed by the unknown parameters contained in and let denote the free parameters contained in factor loading matrices and with . Based on the assumptions of (2), the conditional distribution of given is a normal distribution with mean vector and covariance matrix .
Note that the latent factors here play an important role in characterizing the associations between the observed variables. It can be seen clearly that and are dependent when is integrated out. The marginal density of is given by within which is the standard normal cumulative distribution function.
2.2. Bayesian Semiparametric Hierarchical Modeling
Let be the conditional density of given and denote by a prior distribution function of . Suppose that is proper; we define the following mixture density: in which is the conditional distribution of given . By taking a prior for and restricting to be a parametric family of distributions indexed by , we complete the Bayesian parametric model specification. However, this restriction severely constrains the estimation of and produces estimators that shrink data values toward the same points. A more flexible modeling for is to treat as random and assign a prior for it. For this end, we introduce a latent variable vector and assume that, given , ’s are conditionally independent and drawn from . Furthermore, we suppose that ’s are independent and identically distributed (i.i.d.) according to with a prior on it. As a result, we break the mixture model into where “ind” means “independent” and is a prior of .
We consider the following truncated version of Dirichlet process for :in which denotes a discrete probability measure concentrated on atom and , independent of , are random weights constructed through the following stick-breaking procedure:with ; ’s are i.i.d with common distribution .
Truncated Dirichlet process prior (7) can be considered as a truncation version of Dirichlet process [25–30] in the nonparametric Bayesian analysis. It can be shown that, under (7) and (8), for any Borel set in ,This indicates that can be served as the starting point or guess of and determines the concentration of the prior around . In practice, the value of is either set to a large, predetermined value (e.g., ) or chosen empirically. For instance, Ishwaran and Zarepour  suggested that the adequacy of the truncation level, , can be assessed by evaluating moments of the tail probability. Our simulation results have shown that is more than adequate for the model considered in the present context.
Now, we specify the distribution . Recalling that by convention is the collection of , hence, we assume thatwhere , and are hyperparameters, is a diagonal matrix with the th diagonal element , and is an positive definite matrix; refers to the inverse gamma distribution with shaper parameters and scale parameters , respectively, and denotes the inverse Wishart distribution .
3. Parameters Estimation and Model Selection
3.1. Prior Specifications and Estimation via Blocked Gibbs Sampler
Let . To implement Bayesian analysis, blocked Gibbs sampler is used to simulate observations from the posterior. The key for blocked Gibbs sampler is to recast model (11) completely by introducing the cluster variables such that Consequently, the semiparametric hierarchical model (11) can be reformulated as the following framework:where is a prior of , is the stick-breaking prior given by (8) with , and is the joint distribution of given byin which is given in (10).
For the Bayesian analysis, we need to specify priors for the parameters involved in the model. The whole parameters can be divided into two parts: parametric component part and nonparametric component part . For the parametric components, we assume that withwhere is a column vector that contains unknown parameters in the th row of .
Let , , and . Posterior analysis in relation to the complex is carried out through the data augmentation technique . Specifically, we treat the latent quantities as missing data and augment them with the observed data. A sequence of random observations is generated from the joint posterior distribution by the blocked Gibbs sampler [31, 34], coupled with the Metropolis-Hastings algorithm [35, 36]: given at the th iteration draw from , , , , , , , draw from , , , , , , draw from , , , , , , draw from , , , , , , , draw from , , , , , , ,and form . It can be shown that as tends to infinity, the empirical distribution of converges to at any geometrical rate. The full conditional distributions and the implementation of the above algorithm are given in the Appendix.
3.2. Model Selection
Model selection is an important issue in Bayesian semiparametric modeling for latent variable model since it is of practical interest to compare different modelings for factor analytic models. Formal Bayesian model selection is accomplished by comparing the marginal predictive distribution of data across models. Consider the problem of comparing competing models and . Let and denote the marginal density of data under and , respectively. A popular choice for selecting models is achieved via Bayes factor (BF) (e.g., [37–39]). However, in view of the fact that computing BF involves the high-dimensional density which is hard to estimate well, we prefer comparing the following logarithm of pseudomarginal likelihood (LPML) [40, 41]:where is known as the conditional predictive ordinate (CPO) defined asHere, is the data set with removed. Obviously, from (17), we can see that is the marginal posterior predictive density of given and can be interpreted as the height of this marginal density at . Thus, small values of LPML imply that does not support the model.
Based on MCMC sample already available in the estimation, a consistent estimate for LPML can be obtained via ergodic average given by
It is noted that, under our proposed model, which is complicated due to the existence of and . This can be solved by Monte Carlo approximation. Specifically, given the current values at the th iteration, we draw (i) from and (ii) from for and then evaluate at the observation through Obviously, the distributions involved in (i) and (ii) are standard and sampling is rather straightforward and fast.
4. A Simulation Study
In this section, a simulation study to evaluate the performance of the proposed procedure is conducted. The goal is to assess the accuracy of estimates under parametric, partly exchangeable, and semiparametric modelings when data take on the multimodality or heterogeneity. We consider the situation in which each observed vector consists of three-dimensional continuous vector and three-dimensional ordinal vector with threshold values . We generate by first generating with from the mixture of two factor analytic models with weights 0.45 and 0.55 and then transforming into () via (1) to create the ordinal observations, where represents a observed continuous random vector and is a latent continuous random vector. Each component in the mixture model is specified through the following measurement model: for ,The parameters involved in the components of mixture model are taken as , , , ,in which is a vector with all elements equal to one and is a identity matrix. The elements with asterisks involved in loading matrix and threshold parameters are treated as fixed for identifying model (see Section 2.1). Based on these settings, random sample with size 500 is generated and 100 replications are completed for each combination.
Prior inputs in the prior distributions involved in the parametric components (see (14)) are as follows: and are diagonal matrices with the diagonal elements 1.0, and elements in are equal to the true values, while prior inputs in the prior distribution of superparameter (see (15)) are , , , , , , and . Note that these values ensure approaching noninformative priors.
A few test runs are conducted to explore the effect of truncated levels on the estimates of unknown parameters and the convergence of the blocked Gibbs sampler. We take , , , , , , , and 300 and calculate the total sum of the root mean square (RMS) of estimates (see below for details). The resulting values are 1.9830, 1.7382, 1.6582, 1.5548, 1.4194, 1.4128, 1.4108, and 1.4101, respectively. It can be seen that the total sum of the root mean square (RMS) becomes rather stable when . In the following analysis, we set in our data analysis. For the threshold parameters , we choose (see Appendix) in MH algorithm to produce the acceptance rate about 0.40. Figure 1 gives the plots of EPSR (estimated potential scale reduction ) values of unknown free parameters in , , and against iterations for three groups of different starting values. It can be seen that the estimates converge in less than 1000 iterations. To be conservative, in the following analysis, we collect 3000 observations after 2000 “burn-in”s deleted to take posterior analysis. We first consider the performance of the proposed LMPL in model comparison. We compare the proposed model with the parametric model (denoted by PARA) and the partly exchangeable model (denoted by PAEX), which approximately correspond to and under our proposal, respectively. The parametric model is defined by The priors of the parameters are given by , , and
The partly exchangeable model is given by where are i.i.d. with distribution given in (5); the priors for the unknown parameter vector and hyperparametric vector are, respectively, given in (14) and (15).
Under the foregoing settings for the hyperparameters, observations obtained through the blocked Gibbs sampler are used to compute the values of LPML for each scenario across 100 replications. For the parametric and partly exchangeable model, computing values of LPML is very straightforward and standard. For the semiparametric model, we draw 50 observations for approximating . The values of LPML under parametric model, semiparametric model, and partly exchangeable model are, respectively, −6684.740, −6255.553, and −8487.259 with standard deviations 62.509, 151.480, and 147.742. Based on the LPML criteria, semiparametric model is selected, which is consistent with the fact that the true model takes on the multimodes. Moreover, according to our empirical results, the correct rates of LPML selecting the true model across 100 replications are about 0.93.
Table 1 gives the biases (BIAS), root mean squares (RMS), and standard deviations (SD) of estimates of unknown parameters across 100 replications under semiparametric models and parametric and partly exchangeable model, respectively. The measures BIAS, RMS, and SD are given as where is the number of replications. It can be seen that estimates obtained through the proposed approach are reasonably accurate. The values of under our approach are smaller than those under parametric and exchangeable modelings in terms of the absolute values of BIAS and RMS. The results show that ignoring heterogeneity among the data may lead to biased estimates and incorrect interpretation of the analyzed phenomena. This also reflects that the factor loadings are not robust against the distributional deviations of inceptor, variance of unique errors, and covariances of latent factors.
Further simulation study is conducted to assess the performance of the proposed model and parametric model as well as the partly exchangeable model when data are generated from a single normal distribution. The population values of parameters are taken as , , and The values of factor loadings and threshold points are the same as those in previous mixture model. As usual, we take for truncated levels. The sample size is set to 62 which is analogous to the real example. The inputs for superparameters involved in priors are set the same as that in mixture model. The results based on 100 replications are summarized in Table 2.
Based on Table 2, it can be found that the results obtained from our proposal are rather reasonable when compared to normal model, while partly exchangeable model gives serious biases. Moreover, we consider different inputs of superparameters in priors and find that the estimates are rather robust.
5. A Real Example
To illustrate the proposed procedure with a real example, a political-economic risk data set  was analyzed, which is adopted from Henisz’s  political constraint index data set (POLCON), Marshall et al.  state failure problem sets (PITF), and Alvarez et al.’s  ACLP Political and Economic Database (ACLP). The data set is formed by the two economic indicators and three political variables from 62 countries. The first index is the log black market premium (BMP). This is a continuous variable which is usually used as a proxy for illegal economic activity. The second index is log real gross domestic product (GDP). It is used to measure the productivity of a country. The third variable is a measure of independence of the national judiciary. This is a binary variable: it takes 1 if the judiciary is judged to be independent and 0 otherwise. The next measurement, measuring the level of lack of expropriation risk threat (LE), is an ordered categorical variable coded with 0, 1, 2, 3, 4, and 5. The last variable is an expert judgment of measuring lack of corruption (LC). It is also an ordered categorical variable scaled with 0 to 5. The total sample size is 62 and the frequencies of each category occurring are equal to , , and , respectively. To unify scales of the continuous variables, the corresponding raw data were standardized.
Let = (, , IJ, LE, LC) be the vector of the observed variables. Based on the objective of this example, it is natural to group (i) to an endogenous latent variable that can be interpreted as “economic factor, ” and (ii) to an exogenous genotype latent variable that can be interpreted as “political factor, .” Hence, the following loading matrix in the measurement equation with is considered: in which the ones and zeros are treated as known. Although other structures of could be used, here we consider a nonoverlapped structure for clear interpretations of the latent variables: measures the effect of on the observed variable . Since the third variable is binary and the last two variables are measured on a six-point scale with each involving six thresholds, for model identification, we fix and endpoints of thresholds , and at −1.8486, 0.7527, −1.4007, and 1.0574, respectively. These fixed threshold values were chosen via , where are observed marginal proportions of the categories with .
By primary data analysis, we find that the skewness and kurtosis of the first two variables are and , respectively. We also evaluate the predictive density function for continuous variables. Figure 2 gives the contours of posterior predictive density of pair under parametric model and semiparametric model (see below) based on grids. It can be seen that the data for pair are heavy-tailed and the predictive density under semiparametric model captures the high frequency region successfully while parametric model fails. For model comparison, we consider the following competing models: The following two types of prior inputs are, respectively, used for the hyperparameters involved in the parametric components and semiparametric components: (I) , , , , , , , , , , , and , where denotes the maximum likelihood estimates of under parametric model from analysis of a “control-group” sample and is the polychoric correlation matrix obtained on the basis of single confirmatory factor analysis model; (II) , , , , , , , , , , and Note that prior (I) gives more information than prior (II) since it partly takes advantage of information from sample.
The proposed Bayesian semiparametric approach with was applied to calculate the values of CPO and LPML. We draw 100,000 effective observations from the corresponding posteriors via the blocked Gibbs sampler and divide them into 100 batches equally. Table 3 gives the means and standard deviations of LPML under priors (I) and (II). The following facts can be found. (i) The values of LPML under prior (I) are larger than those under prior (II). This indicates that the LPML tends to choose the model with informative prior. (ii) give the largest value. Among the posited models, is selected. We also compute the values of LPML for parametric model. They are and under priors (I) and (II) with standard deviations 6.287 and 5.065, respectively. Therefore, the data support the semiparametric model instead of parametric model.
Table 4 presents the estimates of factor loading as well as their standard deviations with semiparametric and parametric model under prior (I). The factor loading estimates in the measurement equation can be interpreted according to a standard confirmatory factor analysis model. The difference between the two approaches is obvious: the estimates of and under parametric model are only half of those under semiparametric model. Moreover, the standard deviations of estimates with parametric method are uniformly larger than that of semiparametric model. Since we identify illegal economic activity with economic factor () and independent of judiciary with political factor (), respectively, the level of economic factor has a negative effect on real gross domestic product, while the level of political factor has positive effect on lack of expropriation risk threat and lack of corruption. The estimate indicates that a one-unit increase in the level of economic factor leads to 0.123-unit decrease in the magnitude of gross domestic product. The interpretation of and is similar. The differences of estimates between parametric and semiparametric methods illustrate the effects of heavy tails of the data on the estimates.
6. Concluding Remarks
Parametric modeling for latent variable model with mixed data structure has long dominated Bayesian inference work, typically developed within the standard exponential family. Such modeling is often confused with handling the multimodal and unknown heterogeneous problems. In dealing with multimodality or increased heterogeneity in data, one naturally resorts to the finite mixture model [47, 48] which is more flexible and feasible to implement due to advances in simulation-based model fitting.
Rather than handling the large number of parameters resulting from the finite mixture models with a large number of components, we consider, in this paper, the finite-dimensional Dirichlet process mixture model for latent variable model with continuous and ordinal responses. The core of our proposal is to model the mean vector and/or variance-covariance parameters of unique errors and latent variables into the finite-dimensional stick-breaking priors. This will help to reveal the local dependence structure such as classification groups and clustering among the data. The blocked Gibbs sampler developed by Ishwaran and Zarepour , which takes advantage of the block updating and accelerates mixing in Gibbs sampling, is adapted here to cope with the posterior inference.
Full Conditional Distributions
(1) Full Conditional Distribution , , , , , . To draw from , , , , , , we implement it by (i) drawing from , , , , , , and (ii) drawing from , , , , , , . The underlying reason is that drawing from the joint conditional distribution as proposed here is more efficient than drawing and separately from the corresponding marginal conditional distribution (see Liu , Nandram and Chen , and Song and Lee ).
It can be shown that , , , , , , , not involving and , is given by where , , and . Further,where is the cumulative distribution function of . It is difficult to sample from since this target distribution is nonstandard. We follow Cowles’ routines  and use Metropolis-Hasting (MH) algorithm to sample observations from this complex conditional distribution. Specifically, given the current values at the th iteration, generate a candidate vector from the following truncated normal distribution:Accept this candidate as with the probability , where As pointed out by Cowles (1996) , the quantities should be chosen carefully such that the average acceptance probability is about 0.30 or more.
For , without loss of generality, we assume that the elements in are all free. Let . Under the prior distributions given in (14), we havein which
(2) Full Conditional Distribution , , , , , , , . It can be shown that the conditional distribution of is given by where .
(3) The Full Conditional Distribution , , , , , , . It is clear thatLet be the number of equal to , for . It can be shown that is a generalized Dirichlet distribution, with , , which is constructed by where
For , , , , , , , , , , , let be the unique set of , , and corresponding to those values in with excluded. Then, Let , , and , and note that . The components of are easy to sample based on (14). Further, which can be implemented by drawing: for in which
(4) Full Conditional Distribution , , , , , , . It can be shown that where and is a normalized constant such that .
(5) Full Conditional Distribution , , , , , , , . Based on the priors given in (15), the full conditional distributions for components of hyperparameters are given as follows: