Abstract

Current procedures for estimating compensatory multidimensional item response theory (MIRT) models using Markov chain Monte Carlo (MCMC) techniques are inadequate in that they do not directly model the interrelationship between latent traits. This limits the implementation of the model in various applications and further prevents the development of other types of IRT models that offer advantages not realized in existing models. In view of this, an MCMC algorithm is proposed for MIRT models so that the actual latent structure is directly modeled. It is demonstrated that the algorithm performs well in modeling parameters as well as intertrait correlations and that the MIRT model can be used to explore the relative importance of a latent trait in answering each test item.

1. Introduction

Item response theory (IRT) is a popular approach used for describing probabilistic relationships between correct responses on a set of test items and continuous latent traits (see [14]). IRT models have also been used in other areas of applied mathematics and statistical research. Some examples include US Supreme Court decision-making processes [5], alcohol disorder analysis [69], nicotine dependency [1012], multiple-recapture population estimation [13], psychiatric epidemiology [1416], longitudinal data analysis [17, 18], latent regression models [19, 20], and missing data analysis [21].

IRT has the advantage of allowing the inference of what the items and persons have on the responses to be modeled by distinct sets of parameters. As a result, a primary concern associated with IRT research has been on parameter estimation, which offers the basis for the theoretical advantages of IRT. Specifically, of concern are the statistical complexities that can often arise when item and person parameters are simultaneously estimated (see [1, 2224]). More recent attention has focused on fully Bayesian estimation where Markov chain Monte Carlo (MCMC) simulation techniques are used (e.g., [25, 26]). Over the past decade, MCMC has been implemented in the context of IRT models where one latent trait is assumed (e.g., [3, 2729]) as well as to models where multiple traits are considered (e.g., [3036]), for a thorough review on the historical and current developments of MCMC in terms of IRT, see [37].

The compensatory multidimensional IRT (MIRT; [38]) model assumes that each item measures multiple latent traits. It differs from some other dichotomous models insofar as it has an additional source of model indeterminacy that creates difficulties when using MCMC. Some techniques have been developed to approach this problem by imposing a special structure that constrains the item slope parameters [30, 36, 39]. However, these approaches do not directly model the actual interrelation between the distinct latent traits and, thus, are limited in certain applications. In view of the above, the present aim is to derive an efficient MCMC algorithm via Gibbs sampling [40] that (a) obviates the additional source of model indeterminacy associated with the MIRT model and (b) directly models the underlying latent trait structure. The MIRT model considered herein is presented in normal ogive form as more complicated MCMC procedures would have to be adopted for the logistic form (e.g., [3, 28, 35, 36]). Further, given that parametric probability functions of correct responses are usually modeled by a normal ogive or a logistic function and noting that the logistic and normal ogive forms of the IRT models are essentially indistinguishable in terms of model fit or parameter estimates (given proper scaling, see [41]), MCMC procedures for logistic models are not considered.

The remainder of this paper is organized as follows. In Section 2, the two-parameter normal ogive (2PNO) MIRT model is outlined. In Section 3, the Gibbs sampler is derived, and the prior specifications for the model parameters are described. Section 4 gives examples of implementing the Gibbs sampling algorithm in the context of simulated and real data to demonstrate the proposed methodology.

2. Preliminaries

The MIRT model is introduced by considering a test that consists of 𝑘 dichotomous items with each measuring 𝑚 latent traits. Let 𝑦=[𝑦𝑖𝑗]𝑛×𝑘 denote a matrix of 𝑛 responses to the 𝑘 items where 𝑦𝑖𝑗=1 (𝑦𝑖𝑗=0) if the 𝑖th person answers the 𝑗th item correctly (incorrectly) for 𝑖=1,,𝑛 and 𝑗=1,,𝑘.

Definition 2.1. The probability of the 𝑖th person obtaining a correct response on the 𝑗th item is defined for the 2PNO MIRT model as 𝑃𝑦𝑖𝑗=1𝜽𝑖,𝜶𝑗,𝛽𝑗𝜶=Φ𝑗𝜽𝑖𝛽𝑗=Φ𝑚𝑙=1𝛼𝑙𝑗𝜃𝑙𝑖𝛽𝑗𝜂=Φ𝑖𝑗.(2.1) The vector 𝜽𝑖=(𝜃1𝑖,,𝜃𝑙𝑖,,𝜃𝑚𝑖) denotes latent trait parameters associated with the 𝑖th person, and the vector 𝜶𝑗=(𝛼1𝑗,,𝛼𝑙𝑗,,𝛼𝑚𝑗) denotes nonnegative slope parameters where larger values of 𝛼𝑙𝑗 have more influence on determining a success on the 𝑗th item. The intercept parameter 𝛽𝑗 denotes the location in the latent space where the 𝑗th item is maximally informative, and Φ denotes the unit normal cdf. The model in (2.1) is also referred to as a compensatory MIRT model [38] because a low level of 𝜃𝑙𝑖 in one dimension can be compensated by a high level of 𝜃𝑙𝑖 in another dimension.

Remark 2.2. If the vector of slope parameters in (2.1) is such that 𝜶𝑗=(0,,0,𝛼𝑙𝑗,0,,0), then the MIRT model reduces to the 2PNO multi-unidimensional model as 𝑃𝑦𝑙𝑖𝑗=1𝜃𝑙𝑖,𝛼𝑙𝑗,𝛽𝑙𝑗𝛼=Φ𝑙𝑗𝜃𝑙𝑖𝛽𝑙𝑗,(2.2) where the test involves multiple parameters of 𝜃𝑙𝑖 and where each item measures one of these latent variables (see [31, 32]). The difference between (2.1) and (2.2) is analogous to the distinction made between factor analysis and that with a rotation to achieve a simple structure [42]. As such, (2.2) can be viewed as a special case of (2.1) where each item measures only one of the several latent traits. Further, the two models differ in that (2.1) is exploratory whereas (2.2) is confirmatory in nature.
The unidimensional IRT model, which has a systematic component form of 𝛼𝑗𝜃𝑖𝛽𝑗, has a well-known identification problem in terms of location and scale invariance (e.g., [43]). Common practices of resolving this problem are to impose some constraint on the item parameters, that is, 𝛼𝑗=1 and 𝛽𝑗=0, or select some specific values for the location and scale parameters for the prior normal distribution of 𝜃𝑖, for example, 𝜃𝑖𝑁(0,1) (see, e.g., [3, 2729, 43]). Further, Bafumi et al. [5] proposed using a parameter transformation to approach the identification problem in the context of unidimensional IRT models. More specifically, the model parameters are transformed using a normalization procedure after estimation is completed. Bafumi et al. [5] noted that this transformation procedure obviates the problem of elusive convergence that results from highly correlated samples.
In terms of the multi-unidimensional IRT model in (2.2), Lee [31] extended Tsutakawa’s [43] approach by adopting a constrained covariance matrix for the latent traits and modeling the constrained covariance matrix indirectly. Lee’s [31] method not only solves the model indeterminacy problem, but also appropriately estimates the interrelationship between multiple latent traits (see also [32, 44]).
The more general MIRT model, as defined in (2.1), involves a new source of model indeterminacy called rotational invariance and is statistically more complicated than the unidimensional or multi-unidimensional models. As such, a Gibbs sampler is subsequently derived based on the ideas suggested in [5, 31] to address the general MIRT model identification problems and to model the latent structure directly.
It is noted that in an effort to develop computer software, Sheng [45] has shown that the approaches based on [5, 31] are useful for the 2PNO additive MIRT model, whose systematic component for modeling 𝑦𝑙𝑖𝑗 takes the form 𝛼0𝑙𝑗𝜃0𝑖+𝛼𝑙𝑗𝜃𝑙𝑖𝛽𝑙𝑗. The model assumes that each item measures two latent traits: 𝜃0𝑖, a common latent trait that all items measure, and 𝜃𝑙𝑖, a latent trait that is specific for items in the 𝑙th subtest. The difference between the model in [45] and the general MIRT model presented herein is comparable to that between a bifactor model (see [46]) and a general factor analysis model. The two models assume different latent structures, and hence the approaches for resolving their model indeterminacies are not the same.

3. The Gibbs Sampler

The derivation of the Gibbs sampler associated with the MIRT model defined in (2.1) begins by considering a multivariate distribution for 𝜽𝑖 and a linear transformation on it, which will be based on the following definitions.

Definition 3.1. Let 𝜽𝑖𝑁𝑚(𝟎,𝐏), where 𝐏 is a constrained covariance matrix or a correlation matrix, with 1s on the diagonal and with correlations 𝜌𝑠𝑡 (between 𝜃𝑠𝑖 and 𝜃𝑡𝑖) on the off-diagonal.

Definition 3.2. Let 𝜽𝑖𝑁𝑚(𝝁,𝚺), where 𝚺=[𝜎𝑙𝑙]𝑚×𝑚 and 𝚺=𝐷𝐏𝐷, where 𝐷 is an 𝑚×𝑚 diagonal matrix. Note that this variance-correlation decomposition of 𝚺 [47] makes the interpretation easier [48] and is essential for modeling the correlation matrix indirectly while solving the model indeterminacy in the context of the MIRT model.
From Definitions 3.1 and 3.2, it can be shown that 𝜽𝑖𝝁=𝐷𝜽𝑖,(3.1) where 𝐏 can be transformed from 𝚺 using 𝜌𝑠𝑡=𝜎𝑠𝑡𝜎𝑠𝑠𝜎𝑡𝑡(3.2) for 𝑠𝑡. To obviate the identification problem associated with the unconstrained parameters, let 𝜽𝑖 be related with the item parameters (𝜶𝑗 and 𝛽𝑗) so that the likelihoods are preserved given 𝜶𝑗𝜽𝑖𝛽𝑗=𝜶𝑗𝜽𝑖𝛽𝑗,(3.3) where the item parameters (𝜶𝑗 and 𝛽𝑗) will have to be constrained such that 𝑗𝛼𝑙𝑗=1 and 𝑗𝛽𝑗=0. This leads us to the following proposition.

Proposition 3.3. If 𝜶𝑗 are constrained such that 𝑗𝛼𝑙𝑗=1, then 𝐷=diag𝑗𝛼1𝑗1/𝑘,,𝑗𝛼𝑚𝑗1/𝑘.(3.4)

Proof. It follows from (3.1) that 𝜽𝑖=𝐷𝜽𝑖+𝝁, and thus, substituting 𝐷𝜽𝑖+𝝁 into (3.3) gives 𝜶𝑗𝐷𝜽𝑖+𝝁𝛽𝑗=𝜶𝑗𝜽𝑖𝛽𝑗.(3.5) Using (3.5), we can subsequently derive 𝜶𝑗𝐷=𝜶𝑗.(3.6) Setting 𝐷=diag(𝑑1,,𝑑𝑚) in (3.6) and subsequently multiplying the left-hand side yields 𝛼𝑙𝑗𝑑𝑙=𝛼𝑙𝑗,(3.7) which leads to 𝑗𝛼𝑙𝑗𝑑𝑙=𝑗𝛼𝑙𝑗,(3.8) for 𝑙=1,,𝑚. Hence, given the constraint that 𝑗𝛼𝑙𝑗=1, each nonzero element in 𝐷 is 𝑑𝑙=(𝑗𝛼𝑙𝑗)1/𝑘.

To implement Gibbs sampling for the MIRT model in (2.1), a latent variable 𝑍 is introduced such that 𝑍𝑖𝑗𝑁(𝜂𝑖𝑗,1) (see, e.g., [27, 49]). Further, from Definition 3.1, we assume that 𝜽𝑖𝑁𝑚(𝟎,𝐏) to ensure unique scaling for 𝜽, which precludes the identification problem associated with such models (see [45]). Furthermore, for the unconstrained covariance matrix 𝚺, we assume that 𝑝(𝚺)=|𝚺|(𝑚+2)/2. Thus, if 𝝃𝑗=(𝜶𝑗,𝛽𝑗) with assumed prior distributions, then the joint posterior distribution of (𝜽,𝝃,𝐙,𝚺) is𝑝(𝜽,𝝃,𝐙,𝚺)𝑓(𝐲𝐙)𝑝(𝐙𝜽,𝝃)𝑝(𝝃)𝑝(𝜽𝐏)𝑝(𝚺),(3.9) where 𝑓(𝐲𝐙)=𝑛𝑖=1𝑘𝑗=1𝑝𝑦𝑖𝑗𝑖𝑗(1𝑝𝑖𝑗)1𝑦𝑖𝑗 is the likelihood function, with 𝑝𝑖𝑗 being the model probability function as defined in (2.1).

The proposed Gibbs sampler involves the following five steps:(1)sampling of the augmented parameters from𝑍𝑖𝑗𝑁(0,)𝜂𝑖𝑗,1if𝑦𝑖𝑗𝑁=1,(,0)𝜂𝑖𝑗,1if𝑦𝑖𝑗=0,(3.10)(2)sampling of the latent variable (person) parameters 𝜽𝑖 from 𝜽𝑖𝑁𝑚𝐀𝐀+𝐏1𝐀𝐀𝐁,𝐀+𝐏1,(3.11) where 𝐀=𝜶1𝜶𝑘𝑘×𝑚 and 𝐁=𝑍𝑖1+𝜷1𝑍𝑖𝑘+𝜷𝑘𝑘×1,(3)sampling of the item parameters 𝝃𝑗 from 𝝃𝑗𝑁𝑚𝐱𝐱+𝐈1𝐱𝐙𝑗,𝐱𝐱+𝐈1𝐼𝛼𝑙𝑗>0,(3.12) where 𝐱=[𝜽,1], assuming uniform priors 𝛼𝑙𝑗>0 and 𝑝(𝛽𝑗)1, or from 𝝃𝑗𝑁𝑚𝐱𝐱+𝚺𝝃11𝐱𝐙𝑗+𝚺𝝃1𝝁𝝃,𝐱𝐱+𝚺𝝃11𝐼𝛼𝑙𝑗>0,(3.13) where 𝝁𝝃=(𝜇𝛼1,,𝜇𝛼𝑚,𝜇𝛽) and 𝚺𝝃=diag(𝜎2𝛼1,,𝜎2𝛼𝑚,𝜎2𝛽), assuming conjugate normal priors 𝛼𝑙𝑗𝑁(0,)(𝜇𝛼𝑙,𝜎2𝛼𝑙), 𝛽𝑗𝑁(𝜇𝛽,𝜎2𝛽),(4)sampling of the unconstrained covariance matrix 𝚺 from 𝚺𝑊1𝐒1,,𝑛(3.14) where 𝑊1 is an inverse Wishart distribution, 𝐒=𝑛𝑖=1(𝐷𝜽𝑖)(𝐷𝜽𝑖), and where 𝐷 is derived from (3.4),(5)a transformation from 𝚺 to 𝐏.

In view of the additional model indeterminacy that results from the additive nature of 𝜃𝑙𝑗, the parameters are further normalized after each Markov transition step is completed [5, 45]. More specifically, 𝜃𝑙𝑖, 𝛼𝑙𝑗, and 𝛽𝑗 are transformed (𝑡) to the following normalized parameters: 𝜃𝑡𝑙𝑖=(𝜃𝑙𝑖𝜃𝑙)/𝑠𝜃𝑙, 𝛼𝑡𝑙𝑗=𝛼𝑙𝑗𝑠𝜃𝑙 and 𝛽𝑡𝑗=𝛽𝑗𝑙𝛼𝑙𝑗𝜃𝑙, where 𝜃𝑙 and 𝑠𝜃𝑙 represent the mean and standard deviation of 𝜃𝑙𝑗. This rescaling preserves the likelihood because 𝑙𝛼𝑡𝑙𝑗𝜃𝑡𝑙𝑖𝛽𝑡𝑗=𝑙𝛼𝑙𝑗𝜃𝑙𝑖𝛽𝑗, while allowing the computation to proceed more efficiently [50]. Further, the transformation also assists in terms of speeding up the convergence of the Markov chains by reducing the posterior correlation in the posterior probability densities [51].

Thus, with initial starting values of 𝜽(0), 𝝃(0), and 𝐏(0), the observations (i.e., 𝐙(), 𝜽(), 𝝃(), 𝚺(), and 𝐏()) can be drawn or transformed iteratively from (3.10), (3.11), (3.12), (3.14), and (3.2) (or (3.13) in lieu of (3.12)), respectively. This iterative process continues for a sufficient number of samples after the posterior distributions reach stationarity (i.e., a phase commonly referred to as burn-in). The posterior means of all the samples collected after the burn-in stage are considered to be estimates of the model parameters (𝜽, 𝝃) and the hyperparameter (𝐏).

4. Numerical Examples

To demonstrate the methodology presented above, the proposed Gibbs sampler was implemented using both simulated and real data. In terms of simulated data, tests that measure two latent traits were considered. In particular, three 1000×18 (i.e., 𝑚=2, 𝑛=1000, and 𝑘=18) dichotomous data matrices were simulated from the 2PNO MIRT model where the population correlation between the two latent traits was set to 𝜌𝜃1𝑖,𝜃2𝑖=0.2, 0.4, 0.6, respectively. The item parameters were generated randomly from uniform distributions so that 𝛼𝑙𝑗𝑈(0,2), 𝛽𝑗𝑈(2,2). Gibbs sampling was subsequently implemented to recover the model parameters assuming informative normal (i.e., 𝜇𝛼1=𝜇𝛼2=𝜇𝛽=0 and 𝜎2𝛼1=𝜎2𝛼2=𝜎2𝛽=1) or uniform priors for 𝝃𝑗. Convergence was evaluated using the Gelman and Rubin [52] 𝑅 statistic for each item parameter. While the usual practice is to use multiple Markov chains from different starting points, a single chain can also be divided into subchains so that convergence is assessed by comparing the between and within subchain variances (see [53]). In view of the fact that a single chain is more economical in the number of iterations needed, the latter approach was adopted. The posterior estimates of item parameters (𝛼1,𝛼2,𝛽), the intertrait correlation hyperparameter, and the associated Gelman-Rubin 𝑅 statistics were obtained and are listed in Tables 1, 2, and 3 (note that 𝜌𝜃1𝑖,𝜃2𝑖 is denoted as 𝜌12 in these tables).

The Gelman-Rubin 𝑅 statistic provides a numerical measure for assessing convergence for each item parameter. With values close to 1, it is determined that in the implementation of the Gibbs sampler, Markov chains reached stationarity with a run length of 10,000 iterations and a burn-in period of 5,000 iterations. The posterior estimates of the item parameters as well as the intertrait correlation hyperparameter are fairly close to the specified parameters, suggesting that the algorithm performs well in recovering these parameters when the latent dimensions have a low to medium correlation. Further, the two sets of posterior estimates, resulting from different prior distributions, differ only slightly from each other, signifying that the posterior estimates are not sensitive to the choice of noninformative or informative priors for the slope and intercept parameters.

In the context of real data, a subset of the College Basic Academic Subjects Examination (CBASE; [54]) English data was used to demonstrate the methodology. Specifically, these data contain independent binary responses of 1,200 college students to 41 multiple-choice items. The English test is further organized to have two subtests, namely, reading and writing, so that 25 items are in the reading subtest and 16 are in the writing subtest. It is noted that the test was designed in such a manner that it conforms to the multi-unidimensional model, as each item measures one of the two latent traits. However, one may use the more general MIRT model to explore the latent structure, and in particular, to assess individual test items (i.e., to determine if the trait mainly involved in answering each item agrees with the one that it is supposed to measure). This can be accomplished by examining the estimated slope parameters, as a larger 𝛼𝑙𝑗 corresponds to a latent dimension that is more important in determining a person’s success on the item. Hence, assuming uniform priors for 𝝃𝑗, Gibbs sampling was implemented to fit the MIRT model to the CBASE data with a run length of 10,000 iterations and a burn-in period of 5,000, which was sufficient for the chains to converge. An examination of the posterior estimates of 𝛼 shown in Table 4 suggests that all 16 items in the writing subtest relies on the second dimension writing more than the first dimension reading. However, some items in the reading subtest, such as items 17, 19–26, 28, and 30, require further attention and modification, as they do not seem to measure mainly reading as the rest of the items do.

In summary, the proposed MCMC algorithm provides computationally efficient and accurate estimation in the context of both simulated and real data examples. Not only does the algorithm appropriately model parameters, but also the algorithm efficiently models the intertrait correlations for the compensatory MIRT model, which provides an exploratory approach for examining the latent structure of a test and detecting items that do not measure the trait they are designed to measure.

5. Concluding Remarks

The MCMC algorithm presented in this paper offers solutions for directly modeling the underlying structure of IRT models with multiple continuous latent traits. The algorithm works well when the actual intertrait correlation is low to moderate (less than 0.8), as a high correlation tends to result in high collinearity, which makes it difficult to distinguish among multiple latent traits and estimate them. With model parameters being accurately estimated, the compensatory MIRT model can be used to explore the relative importance of a latent trait in answering each test item. This is particularly useful when the underlying structure is not known, or when it is desirable to confirm the structure by examining the performance of individual items.