Abstract
Current procedures for estimating compensatory multidimensional item response theory (MIRT) models using Markov chain Monte Carlo (MCMC) techniques are inadequate in that they do not directly model the interrelationship between latent traits. This limits the implementation of the model in various applications and further prevents the development of other types of IRT models that offer advantages not realized in existing models. In view of this, an MCMC algorithm is proposed for MIRT models so that the actual latent structure is directly modeled. It is demonstrated that the algorithm performs well in modeling parameters as well as intertrait correlations and that the MIRT model can be used to explore the relative importance of a latent trait in answering each test item.
1. Introduction
Item response theory (IRT) is a popular approach used for describing probabilistic relationships between correct responses on a set of test items and continuous latent traits (see [1–4]). IRT models have also been used in other areas of applied mathematics and statistical research. Some examples include US Supreme Court decision-making processes [5], alcohol disorder analysis [6–9], nicotine dependency [10–12], multiple-recapture population estimation [13], psychiatric epidemiology [14–16], longitudinal data analysis [17, 18], latent regression models [19, 20], and missing data analysis [21].
IRT has the advantage of allowing the influence that items and persons have on the responses to be modeled by distinct sets of parameters. As a result, a primary concern in IRT research has been parameter estimation, which provides the basis for the theoretical advantages of IRT. Of particular concern are the statistical complexities that often arise when item and person parameters are estimated simultaneously (see [1, 22–24]). More recent attention has focused on fully Bayesian estimation using Markov chain Monte Carlo (MCMC) simulation techniques (e.g., [25, 26]). Over the past decade, MCMC has been applied to IRT models that assume one latent trait (e.g., [3, 27–29]) as well as to models that consider multiple traits (e.g., [30–36]). For a thorough review of the historical and current developments of MCMC in IRT, see [37].
The compensatory multidimensional IRT (MIRT; [38]) model assumes that each item measures multiple latent traits. It differs from some other dichotomous models insofar as it has an additional source of model indeterminacy that creates difficulties when using MCMC. Some techniques have been developed to approach this problem by imposing a special structure that constrains the item slope parameters [30, 36, 39]. However, these approaches do not directly model the actual interrelation between the distinct latent traits and, thus, are limited in certain applications. In view of the above, the present aim is to derive an efficient MCMC algorithm via Gibbs sampling [40] that (a) obviates the additional source of model indeterminacy associated with the MIRT model and (b) directly models the underlying latent trait structure. The MIRT model considered herein is presented in normal ogive form as more complicated MCMC procedures would have to be adopted for the logistic form (e.g., [3, 28, 35, 36]). Further, given that parametric probability functions of correct responses are usually modeled by a normal ogive or a logistic function and noting that the logistic and normal ogive forms of the IRT models are essentially indistinguishable in terms of model fit or parameter estimates (given proper scaling, see [41]), MCMC procedures for logistic models are not considered.
The remainder of this paper is organized as follows. In Section 2, the two-parameter normal ogive (2PNO) MIRT model is outlined. In Section 3, the Gibbs sampler is derived, and the prior specifications for the model parameters are described. Section 4 gives examples of implementing the Gibbs sampling algorithm in the context of simulated and real data to demonstrate the proposed methodology.
2. Preliminaries
The MIRT model is introduced by considering a test that consists of dichotomous items with each measuring latent traits. Let denote a matrix of responses to the items where () if the th person answers the th item correctly (incorrectly) for and .
Definition 2.1. The probability of the th person obtaining a correct response on the th item is defined for the 2PNO MIRT model as The vector denotes latent trait parameters associated with the th person, and the vector denotes nonnegative slope parameters where larger values of have more influence on determining a success on the th item. The intercept parameter denotes the location in the latent space where the th item is maximally informative, and denotes the unit normal cdf. The model in (2.1) is also referred to as a compensatory MIRT model [38] because a low level of in one dimension can be compensated by a high level of in another dimension.
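As a minimal sketch of the compensatory property described in Definition 2.1, the following Python snippet evaluates a normal ogive response probability. The function name `mirt_2pno_prob`, the argument names, the numerical values, and the sign convention (the intercept is subtracted from the slope-weighted sum of the traits) are assumptions made for illustration only.

```python
from statistics import NormalDist


def mirt_2pno_prob(theta, alpha, gamma):
    """Probability of a correct response under an assumed 2PNO MIRT form:
    Phi(sum_v alpha_v * theta_v - gamma)."""
    eta = sum(a * t for a, t in zip(alpha, theta)) - gamma
    return NormalDist().cdf(eta)


alpha = [1.0, 1.0]  # hypothetical slopes for two latent traits
gamma = 0.0         # hypothetical intercept

# A person at the origin of the latent space.
p_balanced = mirt_2pno_prob([0.0, 0.0], alpha, gamma)
# Low on trait 1, with no offsetting strength on trait 2.
p_low = mirt_2pno_prob([-1.0, 0.0], alpha, gamma)
# Low on trait 1 but equally high on trait 2: the deficit is compensated.
p_compensated = mirt_2pno_prob([-1.0, 1.0], alpha, gamma)
```

With equal slopes, the deficit on the first trait is offset exactly by the surplus on the second, so `p_compensated` equals `p_balanced`, whereas `p_low` is smaller.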
Remark 2.2. If the vector of slope parameters in (2.1) is such that , then the MIRT model reduces to the 2PNO multi-unidimensional model as
where the test involves multiple parameters of and where each item measures one of these latent variables (see [31, 32]). The difference between (2.1) and (2.2) is analogous to the distinction made between factor analysis and that with a rotation to achieve a simple structure [42]. As such, (2.2) can be viewed as a special case of (2.1) where each item measures only one of the several latent traits. Further, the two models differ in that (2.1) is exploratory whereas (2.2) is confirmatory in nature.
The unidimensional IRT model, which has a systematic component form of , has a well-known identification problem in terms of location and scale invariance (e.g., [43]). Common practices of resolving this problem are to impose some constraint on the item parameters, that is, and , or select some specific values for the location and scale parameters for the prior normal distribution of , for example, (see, e.g., [3, 27–29, 43]). Further, Bafumi et al. [5] proposed using a parameter transformation to approach the identification problem in the context of unidimensional IRT models. More specifically, the model parameters are transformed using a normalization procedure after estimation is completed. Bafumi et al. [5] noted that this transformation procedure obviates the problem of elusive convergence that results from highly correlated samples.
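The location and scale invariance can be made explicit with a short worked equation. The notation below is an assumption adopted for illustration: the systematic component is taken to be $\alpha_j\theta_i - \gamma_j$, and $c > 0$ and $d$ are arbitrary constants.

```latex
\begin{aligned}
\theta_i^{*} &= c\,\theta_i + d, \qquad
\alpha_j^{*} = \frac{\alpha_j}{c}, \qquad
\gamma_j^{*} = \gamma_j + \frac{\alpha_j d}{c}, \\
\alpha_j^{*}\theta_i^{*} - \gamma_j^{*}
&= \frac{\alpha_j}{c}\,(c\,\theta_i + d) - \gamma_j - \frac{\alpha_j d}{c}
 = \alpha_j\theta_i - \gamma_j .
\end{aligned}
```

Under this assumed form, every choice of $c$ and $d$ yields the same response probabilities, so the location and scale of the latent trait must be fixed by a constraint or by the prior.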
In terms of the multi-unidimensional IRT model in (2.2), Lee [31] extended Tsutakawa’s [43] approach by adopting a constrained covariance matrix for the latent traits and modeling the constrained covariance matrix indirectly. Lee’s [31] method not only solves the model indeterminacy problem, but also appropriately estimates the interrelationship between multiple latent traits (see also [32, 44]).
The more general MIRT model, as defined in (2.1), involves a new source of model indeterminacy called rotational invariance and is statistically more complicated than the unidimensional or multi-unidimensional models. As such, a Gibbs sampler is subsequently derived based on the ideas suggested in [5, 31] to address the general MIRT model identification problems and to model the latent structure directly.
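To illustrate the rotational invariance mentioned above, the following minimal sketch applies the same orthogonal rotation to a hypothetical trait vector and a hypothetical slope vector in two dimensions. It assumes a systematic component of the form sum of slope times trait minus intercept; the function names and numerical values are illustrative only.

```python
import math


def rotate(vec, angle):
    """Rotate a 2-vector counterclockwise by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [c * vec[0] - s * vec[1], s * vec[0] + c * vec[1]]


def linear_predictor(theta, alpha, gamma):
    """Assumed systematic component: sum_v alpha_v * theta_v - gamma."""
    return sum(a * t for a, t in zip(alpha, theta)) - gamma


theta = [0.8, -0.3]  # hypothetical latent traits of one person
alpha = [1.2, 0.5]   # hypothetical slopes of one item
gamma = 0.4          # hypothetical intercept
angle = math.pi / 6  # arbitrary rotation angle

eta_original = linear_predictor(theta, alpha, gamma)
eta_rotated = linear_predictor(rotate(theta, angle), rotate(alpha, angle), gamma)
```

Because the inner product is unchanged by an orthogonal rotation, `eta_original` and `eta_rotated` agree up to rounding error, so the responses alone cannot distinguish the two parameterizations without further constraints.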
It is noted that in an effort to develop computer software, Sheng [45] has shown that the approaches based on [5, 31] are useful for the 2PNO additive MIRT model, whose systematic component for modeling takes the form . The model assumes that each item measures two latent traits: , a common latent trait that all items measure, and , a latent trait that is specific for items in the th subtest. The difference between the model in [45] and the general MIRT model presented herein is comparable to that between a bifactor model (see [46]) and a general factor analysis model. The two models assume different latent structures, and hence the approaches for resolving their model indeterminacies are not the same.
3. The Gibbs Sampler
The derivation of the Gibbs sampler associated with the MIRT model defined in (2.1) begins by considering a multivariate distribution for and a linear transformation on it, which will be based on the following definitions.
Definition 3.1. Let , where is a constrained covariance matrix or a correlation matrix, with 1’s on the diagonal and with correlations (between and ) on the off-diagonal.
Definition 3.2. Let , where and , where is an diagonal matrix. Note that this variance-correlation decomposition of [47] makes the interpretation easier [48] and is essential for modeling the correlation matrix indirectly while solving the model indeterminacy in the context of the MIRT model.
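The variance-correlation decomposition can be sketched in a few lines of Python. The helper name `cov_to_corr` and the example covariance matrix are hypothetical; the snippet only illustrates how an unconstrained covariance matrix separates into standard deviations and a correlation matrix with 1's on the diagonal.

```python
import math


def cov_to_corr(cov):
    """Split a covariance matrix into standard deviations and a correlation
    matrix, so that cov[i][j] = sd[i] * corr[i][j] * sd[j]."""
    m = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(m)]
    corr = [[cov[i][j] / (sd[i] * sd[j]) for j in range(m)] for i in range(m)]
    return sd, corr


cov = [[4.0, 1.2],
       [1.2, 1.0]]  # hypothetical unconstrained covariance matrix
sd, corr = cov_to_corr(cov)
```

For this example the standard deviations are 2 and 1, and the off-diagonal correlation is 1.2 / (2 × 1) = 0.6.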
From Definitions 3.1 and 3.2, it can be shown that
where can be transformed from using
for . To obviate the identification problem associated with the unconstrained parameters, let be related with the item parameters ( and ) so that the likelihoods are preserved given
where the item parameters ( and ) will have to be constrained such that and . This leads us to the following proposition.
Proposition 3.3. If are constrained such that , then
Proof. It follows from (3.1) that , and thus, substituting into (3.3) gives Using (3.5), we can subsequently derive Setting in (3.6) and subsequently multiplying the left-hand side yields which leads to for . Hence, given the constraint that , each nonzero element in is .
To implement Gibbs sampling for the MIRT model in (2.1), a latent variable is introduced such that (see, e.g., [27, 49]). Further, from Definition 3.1, we assume that to ensure unique scaling for , which precludes the identification problem associated with such models (see [45]). Furthermore, for the unconstrained covariance matrix , we assume that . Thus, if with assumed prior distributions, then the joint posterior distribution of () is where is the likelihood function, with being the model probability function as defined in (2.1).
The proposed Gibbs sampler involves the following five steps:
(1) sampling of the augmented parameters from
(2) sampling of the latent variable (person) parameters from where and ,
(3) sampling of the item parameters from where , assuming uniform priors and , or from where and , assuming conjugate normal priors , ,
(4) sampling of the unconstrained covariance matrix from where is an inverse Wishart distribution, , and where is derived from (3.4),
(5) a transformation from to .
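As an illustration of step (1) only, the sketch below draws an augmented variable from a unit-variance normal distribution truncated according to the observed response, using the inverse-cdf method. It assumes the usual data augmentation setup for normal ogive models, in which the augmented variable has mean equal to the systematic component and is positive exactly when the response is correct. The function name `draw_augmented`, its arguments, and the numerical guard are hypothetical and are not taken from the paper.

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def draw_augmented(eta, y, rng):
    """Draw Z ~ N(eta, 1) truncated to Z > 0 if y == 1 and to Z <= 0 if y == 0
    (assumed augmentation scheme), via the inverse-cdf method."""
    p_nonpositive = _STD_NORMAL.cdf(-eta)  # P(Z <= 0) when Z ~ N(eta, 1)
    if y == 1:
        # Uniform draw restricted to the upper part of the cdf scale.
        u = p_nonpositive + (1.0 - p_nonpositive) * rng.random()
    else:
        # Uniform draw restricted to the lower part of the cdf scale.
        u = p_nonpositive * rng.random()
    # Numerical guard: inv_cdf is undefined at exactly 0 or 1.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return eta + _STD_NORMAL.inv_cdf(u)


rng = random.Random(0)
z_correct = draw_augmented(eta=0.3, y=1, rng=rng)    # draw for a correct response
z_incorrect = draw_augmented(eta=0.3, y=0, rng=rng)  # draw for an incorrect response
```

The inverse-cdf approach is chosen here because it needs no rejection step; for a systematic component far in the tails, the guard on `u` makes the draw approximate.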
In view of the additional model indeterminacy that results from the additive nature of , the parameters are further normalized after each Markov transition step is completed [5, 45]. More specifically, , , and are transformed () to the following normalized parameters: , and , where and represent the mean and standard deviation of . This rescaling preserves the likelihood because , while allowing the computation to proceed more efficiently [50]. Further, the transformation also assists in terms of speeding up the convergence of the Markov chains by reducing the posterior correlation in the posterior probability densities [51].
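A rescaling of this kind can be sketched as follows. The exact transformation used in the paper is not reproduced here; the sketch assumes a systematic component of the form sum of slope times trait minus intercept, standardizes each latent dimension by its mean and standard deviation across persons, and adjusts the item parameters so that the systematic component is unchanged. The function names and numerical values are hypothetical.

```python
from statistics import mean, pstdev


def linear_predictor(theta_i, alpha_j, gamma_j):
    """Assumed systematic component: sum_v alpha_jv * theta_iv - gamma_j."""
    return sum(a * t for a, t in zip(alpha_j, theta_i)) - gamma_j


def normalize_draw(theta, alpha, gamma):
    """Standardize each trait dimension and rescale the item parameters so
    that every linear predictor is preserved.

    theta: list of persons, each a list of m trait values
    alpha: list of items, each a list of m slopes
    gamma: list of item intercepts
    """
    m = len(theta[0])
    mu = [mean([row[v] for row in theta]) for v in range(m)]
    sd = [pstdev([row[v] for row in theta]) for v in range(m)]
    theta_n = [[(row[v] - mu[v]) / sd[v] for v in range(m)] for row in theta]
    alpha_n = [[a[v] * sd[v] for v in range(m)] for a in alpha]
    gamma_n = [g - sum(a[v] * mu[v] for v in range(m))
               for a, g in zip(alpha, gamma)]
    return theta_n, alpha_n, gamma_n


theta = [[0.5, -1.0], [1.5, 0.2], [-0.7, 0.9], [0.1, 2.0]]  # hypothetical draws
alpha = [[1.0, 0.3], [0.4, 1.2]]
gamma = [0.2, -0.5]
theta_n, alpha_n, gamma_n = normalize_draw(theta, alpha, gamma)
```

The population standard deviation (`pstdev`) is an arbitrary choice in this sketch; the linear predictor is preserved for any positive scaling constant.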
Thus, with starting values of , , and , the observations (i.e., , , , , and ) can be drawn or transformed iteratively from (3.10), (3.11), (3.12), (3.14), and (3.2) (or (3.13) in lieu of (3.12)), respectively. This iterative process continues until a sufficient number of samples has been collected after the posterior distributions reach stationarity (the initial phase before stationarity is commonly referred to as burn-in). The posterior means of the samples collected after the burn-in stage are taken as estimates of the model parameters (, ) and the hyperparameter ().
4. Numerical Examples
To demonstrate the methodology presented above, the proposed Gibbs sampler was implemented using both simulated and real data. In terms of simulated data, tests that measure two latent traits were considered. In particular, three (i.e., , , and ) dichotomous data matrices were simulated from the 2PNO MIRT model where the population correlation between the two latent traits was set to , 0.4, 0.6, respectively. The item parameters were generated randomly from uniform distributions so that , . Gibbs sampling was subsequently implemented to recover the model parameters assuming informative normal (i.e., and ) or uniform priors for . Convergence was evaluated using the Gelman and Rubin [52] statistic for each item parameter. While the usual practice is to use multiple Markov chains from different starting points, a single chain can also be divided into subchains so that convergence is assessed by comparing the between and within subchain variances (see [53]). In view of the fact that a single chain is more economical in the number of iterations needed, the latter approach was adopted. The posterior estimates of item parameters (), the intertrait correlation hyperparameter, and the associated Gelman-Rubin statistics were obtained and are listed in Tables 1, 2, and 3 (note that is denoted as in these tables).
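A data-generating step of the kind described above can be sketched as follows for two correlated latent traits. The function name `simulate_2pno_mirt`, the sample size, the number of items, and the uniform ranges for the item parameters are placeholders chosen for illustration; the systematic component is assumed to be the slope-weighted sum of the traits minus the intercept, and the traits are assumed to be standard bivariate normal with correlation `rho`.

```python
import math
import random
from statistics import NormalDist


def simulate_2pno_mirt(n_persons, alpha, gamma, rho, seed=0):
    """Simulate dichotomous responses from an assumed two-trait 2PNO MIRT model.

    Returns (thetas, responses): the latent trait pairs and the 0/1 matrix.
    """
    rng = random.Random(seed)
    phi = NormalDist().cdf
    thetas, responses = [], []
    for _ in range(n_persons):
        # Correlated standard normal traits built from independent draws.
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        theta = (z1, rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        row = []
        for a, g in zip(alpha, gamma):
            p = phi(a[0] * theta[0] + a[1] * theta[1] - g)
            row.append(1 if rng.random() < p else 0)
        thetas.append(theta)
        responses.append(row)
    return thetas, responses


item_rng = random.Random(42)
n_items = 10  # placeholder test length
# Placeholder ranges; the ranges used in the paper are not reproduced here.
alpha = [[item_rng.uniform(0.0, 2.0), item_rng.uniform(0.0, 2.0)]
         for _ in range(n_items)]
gamma = [item_rng.uniform(-1.0, 1.0) for _ in range(n_items)]
thetas, responses = simulate_2pno_mirt(1000, alpha, gamma, rho=0.4, seed=1)
```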
The Gelman-Rubin statistic provides a numerical measure for assessing convergence for each item parameter. Because the values are close to 1, the Markov chains are judged to have reached stationarity with a run length of 10,000 iterations and a burn-in period of 5,000 iterations. The posterior estimates of the item parameters and of the intertrait correlation hyperparameter are fairly close to the specified parameters, suggesting that the algorithm performs well in recovering these parameters when the latent dimensions have a low to medium correlation. Further, the two sets of posterior estimates, resulting from different prior distributions, differ only slightly from each other, indicating that the posterior estimates are not sensitive to the choice of noninformative or informative priors for the slope and intercept parameters.
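The single-chain variant of the convergence check can be sketched as follows: the retained draws of one parameter are divided into consecutive subchains, and the between-subchain and within-subchain variances are combined into a potential scale reduction factor. The function name `split_chain_rhat` and the default number of subchains are hypothetical, and the formula shown is the commonly used form of the statistic rather than one quoted from the paper.

```python
import math
import random
from statistics import mean, variance


def split_chain_rhat(draws, n_sub=5):
    """Potential scale reduction factor from one chain split into
    `n_sub` consecutive subchains of equal length."""
    n = len(draws) // n_sub
    subs = [draws[i * n:(i + 1) * n] for i in range(n_sub)]
    within = mean([variance(s) for s in subs])          # W
    between_over_n = variance([mean(s) for s in subs])  # B / n
    var_hat = (n - 1) / n * within + between_over_n
    return math.sqrt(var_hat / within)


rng = random.Random(0)
# A stationary toy chain: independent standard normal draws.
stationary = [rng.gauss(0.0, 1.0) for _ in range(5000)]
# A drifting toy chain: the same kind of noise around a steadily rising mean.
drifting = [i / 1000.0 + rng.gauss(0.0, 1.0) for i in range(5000)]

rhat_stationary = split_chain_rhat(stationary)
rhat_drifting = split_chain_rhat(drifting)
```

For the stationary toy chain the statistic is close to 1, whereas the drifting chain gives a clearly larger value, which is the pattern the diagnostic is meant to detect.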
In the context of real data, a subset of the College Basic Academic Subjects Examination (CBASE; [54]) English data was used to demonstrate the methodology. Specifically, these data contain independent binary responses of 1,200 college students to 41 multiple-choice items. The English test is further organized into two subtests, namely, reading and writing, so that 25 items are in the reading subtest and 16 are in the writing subtest. It is noted that the test was designed to conform to the multi-unidimensional model, as each item measures one of the two latent traits. However, one may use the more general MIRT model to explore the latent structure and, in particular, to assess individual test items (i.e., to determine whether the trait mainly involved in answering each item agrees with the one that it is supposed to measure). This can be accomplished by examining the estimated slope parameters, as a larger corresponds to a latent dimension that is more important in determining a person’s success on the item. Hence, assuming uniform priors for , Gibbs sampling was implemented to fit the MIRT model to the CBASE data with a run length of 10,000 iterations and a burn-in period of 5,000, which was sufficient for the chains to converge. An examination of the posterior estimates of shown in Table 4 suggests that all 16 items in the writing subtest rely on the second dimension (writing) more than on the first dimension (reading). However, some items in the reading subtest, such as items 17, 19–26, 28, and 30, require further attention and modification, as they do not seem to measure mainly reading as the rest of the items do.
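The item-level check described above amounts to comparing, for each item, the dimension with the largest estimated slope against the dimension the item was designed to measure. The sketch below uses made-up slope estimates (not the values in Table 4), zero-based indices, and hypothetical function names.

```python
def dominant_dimension(slopes):
    """Index of the latent dimension with the largest slope for one item."""
    return max(range(len(slopes)), key=lambda v: slopes[v])


def flag_items(alpha_hat, intended):
    """Zero-based indices of items whose dominant dimension differs from
    the dimension they were designed to measure."""
    return [j for j, (slopes, target) in enumerate(zip(alpha_hat, intended))
            if dominant_dimension(slopes) != target]


# Made-up slope estimates for four items on two dimensions
# (dimension 0 = reading, dimension 1 = writing in this toy example).
alpha_hat = [[0.9, 0.2], [0.3, 0.8], [0.1, 0.7], [0.6, 0.5]]
intended = [0, 0, 1, 0]  # designed dimension of each item

flagged = flag_items(alpha_hat, intended)
```

In this toy example only the second item (index 1) is flagged, because it was designed for the first dimension but loads more heavily on the second.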
In summary, the proposed MCMC algorithm provides computationally efficient and accurate estimation in the context of both simulated and real data examples. The algorithm not only models the item parameters appropriately but also models the intertrait correlations efficiently for the compensatory MIRT model, which provides an exploratory approach for examining the latent structure of a test and detecting items that do not measure the trait they are designed to measure.
5. Concluding Remarks
The MCMC algorithm presented in this paper offers solutions for directly modeling the underlying structure of IRT models with multiple continuous latent traits. The algorithm works well when the actual intertrait correlation is low to moderate (less than 0.8), as a high correlation tends to result in high collinearity, which makes it difficult to distinguish among multiple latent traits and estimate them. With model parameters being accurately estimated, the compensatory MIRT model can be used to explore the relative importance of a latent trait in answering each test item. This is particularly useful when the underlying structure is not known, or when it is desirable to confirm the structure by examining the performance of individual items.