Abstract

IRT models are widely used but often rely on distributional assumptions about the latent variable. For a simple class of IRT models, the Rasch models, conditional inference is feasible. This enables consistent estimation of item parameters without reference to the distribution of the latent variable in the population. Traditionally, specialized software has been needed for this, but conditional maximum likelihood estimation can be done using standard software for fitting generalized linear models. This paper describes a SAS macro %rasch_cml that fits polytomous Rasch models. The macro estimates item parameters using conditional maximum likelihood (CML) estimation and person locations using maximum likelihood estimation (MLE) and Warm's weighted likelihood estimation (WLE). Graphical presentations are included: plots of item characteristic curves (ICCs) are produced, and a graphical goodness-of-fit test is also provided.

1. Introduction

Item response theory (IRT) models were developed to describe probabilistic relationships between correct responses on a set of test items and continuous latent traits [1]. In addition to educational and psychological testing, IRT models have also been used in other areas of research, for example, in health status measurement and evaluation of Patient-Reported Outcomes (PROs) such as physical functioning and psychological well-being. Traditional applications in education often use dichotomous (correct/incorrect) item scoring, but polytomous items are common in other applications.

Formally, IRT models deal with the situation where several questions (called items) are used for ordering of a group of subjects with respect to a unidimensional latent variable. Before the ordering of subjects can be done in a meaningful way, a number of requirements must be met: (i) items should measure only one latent variable, (ii) items should increase with the underlying latent variable, (iii) items should be sufficiently different to avoid redundancy, and (iv) items should function in the same way in any subpopulation. These requirements are standard in educational tests where (i) items should deal with only one subject (e.g., not being a mixture of math and language items), (ii) the probability of a correct answer should increase with ability, (iii) items should not ask the same thing twice, and (iv) the difficulty of an item should depend only on the ability of the student; for example, an item should not have features that make it easier for boys than for girls at the same level of ability.

Let $\theta$ denote the latent variable, and let $X = (X_1, \ldots, X_I)$ denote the vector of item responses. The first two requirements can be written as follows: (i) $\theta$ is a scalar, and (ii) $P(X_i \geq x \mid \theta)$ is increasing in $\theta$ for all items $i$. One would expect two similar items to be highly correlated and to have an even higher correlation than what the underlying latent variable accounts for, and it is usual to impose the requirement of local independence (iii), $P(X = x \mid \theta) = \prod_{i=1}^{I} P(X_i = x_i \mid \theta)$ for all $x$. This requirement is related to the requirement of nonredundancy. The fourth requirement can be written as (iv) $P(X_i = x \mid \theta, Y) = P(X_i = x \mid \theta)$ for all items $i$ and all variables $Y$. The requirements (i)–(iv) are referred to as unidimensionality, monotonicity, local independence, and absence of differential item functioning (DIF), respectively. Fitting observed data to an IRT model enables us to test whether these requirements are met. Evaluation of model fit is crucial and many fit statistics exist [2], but the issue of fit can also be addressed graphically.

This paper describes a SAS macro %rasch_cml that fits an IRT model, the polytomous Rasch model [3, 4]. The SAS macro is available from biostat.ku.dk/~kach/index.html#cml. It estimates item parameters, plots item characteristic curves, estimates person locations, and produces graphical tests of fit.

2. The Polytomous Rasch Model

Consider $I$ items, where item $i$ has $m_i + 1$ response categories represented by the numbers $0, 1, \ldots, m_i$. Let $X_i$ be the response to item $i$ with realization $x_i$. For items $i = 1, \ldots, I$, the polytomous Rasch model is given by the probabilities

$$P(X_i = x \mid \theta) = \frac{\exp(x\theta + \eta_{ix})}{K_i(\theta)}, \quad (1)$$

where $\eta_i = (\eta_{i0}, \ldots, \eta_{im_i})$ is the vector of item parameters for item $i$, $\eta_{i0} = 0$, and $K_i(\theta) = \sum_{l=0}^{m_i} \exp(l\theta + \eta_{il})$ a normalizing constant. An alternative way of parameterizing is in terms of the thresholds

$$\beta_{ix} = \eta_{i,x-1} - \eta_{ix} \quad \text{for } x = 1, \ldots, m_i \quad (2)$$

that are easily interpreted, since $\beta_{ix}$ is the location on the latent scale where the probability, for item $i$, of choosing category $x$ equals the probability of choosing category $x - 1$. This model was originally proposed by Andersen [5], see also [6]. Masters [7] called this model the Partial Credit Model and derived the probabilities (1) from the requirement that the conditional probabilities $P(X_i = x \mid X_i \in \{x-1, x\}, \theta)$, for $x = 1, \ldots, m_i$, fit a dichotomous Rasch model:

$$P(X_i = x \mid X_i \in \{x-1, x\}, \theta) = \frac{\exp(\theta - \beta_{ix})}{1 + \exp(\theta - \beta_{ix})}. \quad (3)$$

Using the assumption (iii) of local independence, the vector $X = (X_1, \ldots, X_I)$ with realization $x = (x_1, \ldots, x_I)$ has

$$P(X = x \mid \theta) = \frac{\exp\left(r\theta + \sum_{i=1}^{I} \eta_{ix_i}\right)}{K(\theta)}, \quad (4)$$

where $r = \sum_{i=1}^{I} x_i$ and $K(\theta) = \prod_{i=1}^{I} K_i(\theta)$. By Neyman’s factorization theorem, it is clear from (4) that the sum $R = \sum_{i=1}^{I} X_i$ of item responses is sufficient for $\theta$. The joint log likelihood for a sample of persons $v = 1, \ldots, N$ is given by

$$\ell(\theta_1, \ldots, \theta_N, \eta) = \sum_{v=1}^{N} \left( r_v \theta_v + \sum_{i=1}^{I} \eta_{ix_{vi}} \right) - \sum_{v=1}^{N} \log K(\theta_v), \quad (5)$$

where $r_v = \sum_{i=1}^{I} x_{vi}$. Jointly estimating all parameters from (5) does not provide consistent estimates, since the number of parameters increases with the sample size. If our interest is in estimating the item parameters, the person parameters $\theta_1, \ldots, \theta_N$ can be interpreted as incidental or nuisance parameters [8].
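To make the threshold parameterization concrete, the following Python sketch computes the category probabilities of a single partial-credit item from its thresholds (the function name and the example values are illustrative only; the macro itself works in SAS):

```python
import math

def pcm_probs(theta, thresholds):
    """Category probabilities P(X = x | theta) for one item in the
    polytomous Rasch (partial credit) model.  thresholds[x-1] is the
    threshold beta_x where categories x-1 and x are equally likely."""
    weights = []
    cum = 0.0
    for x in range(len(thresholds) + 1):
        if x > 0:
            cum += thresholds[x - 1]    # eta_x = -(beta_1 + ... + beta_x)
        weights.append(math.exp(x * theta - cum))
    K = sum(weights)                    # normalizing constant K(theta)
    return [w / K for w in weights]
```

At a person location equal to a threshold, the two adjacent categories are equally likely, which is the defining property of the threshold parameterization.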

3. Conditional Maximum Likelihood Estimation

The joint log likelihood function (5) can be written as

$$\ell(\theta_1, \ldots, \theta_N, \eta) = \sum_{v=1}^{N} r_v \theta_v + \sum_{i=1}^{I} \sum_{x=1}^{m_i} \eta_{ix} t_{ix} - \sum_{v=1}^{N} \log K(\theta_v), \quad (6)$$

where $r_v = \sum_{i=1}^{I} x_{vi}$ is the total score observed for person $v$, and the $t_{ix}$ defined by $t_{ix} = \#\{v \mid x_{vi} = x\}$ are the item margins. Note that from (5) it can be seen that the total score $r_v$ is sufficient for the person location $\theta_v$ and that for each $i$ the item margins $(t_{ix})_{x=1,\ldots,m_i}$ are sufficient for the item parameter vector $\eta_i$.

Restrictions are needed to ensure that the model (5) is identified, since from (1) it is clear that $P(X_i = x \mid \theta, \eta_i) = P(X_i = x \mid \theta^*, \eta_i^*)$ for all $x$, for $\theta^* = \theta + c$ and $\eta_i^*$ defined by $\eta_{ix}^* = \eta_{ix} - xc$ for $x = 0, \ldots, m_i$.

To obtain consistent item parameter estimates, marginal [9] or conditional [10] maximum likelihood estimation is used. The marginal approach to item parameter estimation assumes that the latent variables are sampled from a population and introduces an assumption about the distribution of the latent variable. The sufficiency property can also be used to overcome the problem of inconsistency of item parameter estimates. This can be done by conditioning on the sum of the entire response vector, yielding conditional maximum likelihood (CML) inference. For a vector $X = (X_1, \ldots, X_I)$ from the Rasch model, the distribution of the score $R = \sum_{i=1}^{I} X_i$ is given by the probabilities

$$P(R = r \mid \theta) = \sum_{x \in A_r} P(X = x \mid \theta), \quad (7)$$

where summation is over the set $A_r = \{x \mid \sum_{i=1}^{I} x_i = r\}$ of all response vectors with sum $r$. The probability can be written as

$$P(R = r \mid \theta) = \frac{\exp(r\theta)}{K(\theta)} \sum_{x \in A_r} \exp\left(\sum_{i=1}^{I} \eta_{ix_i}\right). \quad (8)$$

Let the last sum be denoted by

$$\gamma_r = \sum_{x \in A_r} \exp\left(\sum_{i=1}^{I} \eta_{ix_i}\right). \quad (9)$$

The score $R$ is sufficient for $\theta$, and the item parameters can be estimated consistently using the conditional distribution of the responses given the scores. The conditional distribution of the vector of item responses given the score is given by the probabilities

$$P(X = x \mid R = r) = \frac{\exp\left(\sum_{i=1}^{I} \eta_{ix_i}\right)}{\gamma_r}. \quad (10)$$

These do not depend on the value of $\theta$, and the conditional likelihood function is the product

$$L_C(\eta) = \prod_{v=1}^{N} P(X_v = x_v \mid R_v = r_v). \quad (11)$$

Again a linear restriction on the parameters is needed in order to ensure that the model is identified. Maximizing this likelihood yields item parameter estimates which are conditionally consistent. If, for each possible response vector $x$, we let $n_x$ denote the number of persons with this response vector and, for each possible score $r$, we let $n_r$ denote the observed number of persons with this value of the score, this likelihood function can be written as

$$L_C(\eta) = \frac{\prod_{x} \exp\left(\sum_{i=1}^{I} \eta_{ix_i}\right)^{n_x}}{\prod_{r} \gamma_r^{n_r}}, \quad (12)$$

and using the indicator functions $\mathbf{1}(x_{vi} = x)$, this likelihood function can be rewritten, yielding the conditional log likelihood function

$$\ell_C(\eta) = \sum_{i=1}^{I} \sum_{x=1}^{m_i} \eta_{ix} t_{ix} - \sum_{r} n_r \log \gamma_r, \quad (13)$$

where $t_{ix} = \sum_{v=1}^{N} \mathbf{1}(x_{vi} = x)$ are the sufficient statistics for the item parameters. These sufficient statistics, called item margins, are the numbers of persons giving the response $x$ to item $i$.
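As an illustration of why conditioning removes the person parameter, the conditional probabilities of a response vector given the score can be computed by brute-force enumeration for a small item set (a Python sketch with made-up thresholds, not the macro's implementation):

```python
import itertools
import math

def cat_weight(thresholds, x):
    # theta-free part of the category weight:
    # exp(eta_x) = exp(-(beta_1 + ... + beta_x))
    return math.exp(-sum(thresholds[:x]))

def gammas(items):
    """gamma_r: sum over all response vectors with score r of the product
    of the theta-free category weights (brute-force enumeration)."""
    g = {}
    for vec in itertools.product(*[range(len(t) + 1) for t in items]):
        w = math.prod(cat_weight(t, x) for t, x in zip(items, vec))
        g[sum(vec)] = g.get(sum(vec), 0.0) + w
    return g

def cond_prob(items, vec):
    """P(X = vec | R = sum(vec)) -- theta appears nowhere."""
    w = math.prod(cat_weight(t, x) for t, x in zip(items, vec))
    return w / gammas(items)[sum(vec)]
```

Summing cond_prob over all response vectors with a fixed score gives 1, confirming that these conditional probabilities form a distribution that is free of the person parameter.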
The item parameters in this model can be estimated by solving the likelihood equations that equate the sufficient statistics to their expected values conditional on the observed value of the vector of scores. These expected values have the form

$$E[t_{ix} \mid r_1, \ldots, r_N] = \sum_{v=1}^{N} P(X_{vi} = x \mid R_v = r_v), \quad (14)$$

and for an item $i$, these can be written in terms of the probabilities of having a score of $r - x$ on the remaining items, yielding

$$E[t_{ix} \mid r_1, \ldots, r_N] = \sum_{v=1}^{N} \frac{\exp(\eta_{ix}) \, \gamma_{r_v - x}^{(-i)}}{\gamma_{r_v}}, \quad (15)$$

where $\gamma^{(-i)}$ denotes the $\gamma$-polynomial based on all items except item $i$. Because these likelihood equations have the same form as those in a generalized linear model [11–13], the item parameters can be estimated using standard software like SPSS [14] or SAS [15].
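The conditional item-category probabilities that enter these likelihood equations can be sketched in the same brute-force style (illustrative Python under the threshold parameterization; the γ-polynomials for "all items except item i" are computed by enumeration):

```python
import itertools
import math

def _gammas(items):
    # gamma_r by brute-force enumeration; the theta-free weight of
    # category x on an item is exp(-(beta_1 + ... + beta_x))
    g = {}
    for vec in itertools.product(*[range(len(t) + 1) for t in items]):
        w = math.prod(math.exp(-sum(t[:x])) for t, x in zip(items, vec))
        g[sum(vec)] = g.get(sum(vec), 0.0) + w
    return g

def item_given_score(items, i, x, r):
    """P(X_i = x | R = r): weight of category x on item i times the
    gamma of score r - x on the remaining items, divided by gamma_r."""
    rest = items[:i] + items[i + 1:]
    num = math.exp(-sum(items[i][:x])) * _gammas(rest).get(r - x, 0.0)
    return num / _gammas(items)[r]
```

For any feasible score $r$, these probabilities sum to 1 over the categories of item $i$, which is the convolution identity behind the likelihood equations.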

4. Estimation of Person Locations

There are various ways of estimating the person locations. An important feature of the Rasch model is that the sum score $R = \sum_{i=1}^{I} X_i$ is sufficient for $\theta$ and consequently that the likelihood function for estimating $\theta$ is proportional to the probabilities

$$P(R = r \mid \theta) = \sum_{x \in A_r} P(X = x \mid \theta), \quad (16)$$

where, as before, summation is over the set $A_r$ of all response vectors with the sum $r$. Now, define the $\gamma$-polynomials

$$\gamma_r = \sum_{x \in A_r} \exp\left(\sum_{i=1}^{I} \eta_{ix_i}\right) \quad (17)$$

to obtain the expression

$$P(R = r \mid \theta) = \frac{\gamma_r \exp(r\theta)}{K(\theta)}. \quad (18)$$

Note from this that the normalizing constant can be written as a function of the $\gamma$'s,

$$K(\theta) = \sum_{r=0}^{R_{\max}} \gamma_r \exp(r\theta). \quad (19)$$

Calculation of the $\gamma$'s is thus essential for estimation of the person locations. A recursion formula is described in what follows. Let $\gamma_r^{(i)}$ denote the $\gamma$-polynomial based on the first $i$ items. It is then possible to calculate $\gamma_r^{(i)}$ by the recursion formula

$$\gamma_r^{(i)} = \sum_{x} \exp(\eta_{ix}) \, \gamma_{r-x}^{(i-1)}, \quad (20)$$

since a total score of $r$ on the items $1, \ldots, i$ must be obtained by scoring $x$ on item $i$ and $r - x$ on the items $1, \ldots, i - 1$. The values of $x$ in the summation in the formula above must be chosen in such a way that the score $r - x$ on the first $i - 1$ items is at most $\sum_{j < i} m_j$ and the score on item $i$ is at most $m_i$, implying that $x$ cannot exceed $\min(r, m_i)$. That is, (20) becomes

$$\gamma_r^{(i)} = \sum_{x = \max(0,\, r - \sum_{j<i} m_j)}^{\min(r,\, m_i)} \exp(\eta_{ix}) \, \gamma_{r-x}^{(i-1)}. \quad (21)$$

Person locations can be estimated using maximum likelihood estimation or Bayes modal estimation. A special case of the latter is so-called weighted likelihood estimation. Since the $\gamma$'s do not depend on $\theta$, (18) is an exponential family where the likelihood equation for estimating $\theta$ is

$$r = E[R \mid \theta] = \sum_{r'=0}^{R_{\max}} r' \, \frac{\gamma_{r'} \exp(r'\theta)}{K(\theta)}, \quad (22)$$

and the maximum likelihood estimator (MLE) can be obtained by the Newton-Raphson algorithm. The probabilities (18) show that the expected score is increasing as a function of $\theta$. For individuals who have obtained a score of zero or the largest possible score, the likelihood based on (18) attains its maximum when $\theta$ is $-\infty$ and $\infty$, respectively. The Bayes modal estimator (BME) of $\theta$ is obtained by choosing a prior density $\pi$ for the latent parameter and then maximizing the posterior density $\pi(\theta) P(R = r \mid \theta)$ with respect to $\theta$, keeping item parameters and the observations fixed. The MLE described above is a special case corresponding to $\pi \equiv 1$. Choosing the prior as the square root of the Fisher information $I(\theta)$ results in the weighted maximum likelihood estimator (WLE) [16]. With this prior, one obtains an estimator with minimal bias and the same asymptotic distribution as the MLE. The equation to be solved in order to obtain the WLE is

$$r - E[R \mid \theta] + \frac{I'(\theta)}{2 I(\theta)} = 0, \quad (23)$$

and the Newton-Raphson algorithm can be used for this.
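The γ-recursion and the Newton-Raphson iteration for the person location can be sketched together in Python (an illustrative reimplementation under the threshold parameterization; the macro performs these computations in SAS, and the WLE sketch uses only an approximate Newton step):

```python
import math

def gamma_recursion(items):
    """gamma_0, ..., gamma_Rmax by adding one item at a time: a total of r
    on items 1..i is reached by scoring x on item i and r - x before."""
    g = [1.0]                                   # zero items: score 0, weight 1
    for thresholds in items:
        psis = [math.exp(-sum(thresholds[:x]))  # exp(eta_x) for x = 0..m_i
                for x in range(len(thresholds) + 1)]
        new = [0.0] * (len(g) + len(thresholds))
        for r, g_r in enumerate(g):
            for x, psi in enumerate(psis):
                new[r + x] += g_r * psi
        g = new
    return g

def score_moments(theta, gammas):
    """Mean, variance, and third central moment of the score given theta."""
    ws = [g * math.exp(r * theta) for r, g in enumerate(gammas)]
    K = sum(ws)
    mu = sum(r * w for r, w in enumerate(ws)) / K
    var = sum((r - mu) ** 2 * w for r, w in enumerate(ws)) / K
    k3 = sum((r - mu) ** 3 * w for r, w in enumerate(ws)) / K
    return mu, var, k3

def person_estimate(score, gammas, weighted=False, tol=1e-10):
    """Newton-Raphson for the person location.  weighted=False solves the
    ML equation score = E[R | theta]; weighted=True adds Warm's
    correction I'(theta)/(2 I(theta)) = k3/(2 var) to the score function."""
    theta = 0.0
    for _ in range(200):
        mu, var, k3 = score_moments(theta, gammas)
        f = score - mu + (k3 / (2.0 * var) if weighted else 0.0)
        step = f / var            # d E[R | theta] / d theta = var
        theta += step
        if abs(step) < tol:
            return theta
    raise RuntimeError("no convergence")
```

Note that for extreme scores (zero or the maximum) the ML equation has no finite solution, while the weighted version typically does.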

5. Implementation in SAS

The SAS macro %rasch_cml uses PROC GENMOD to estimate item parameters and PROC NLMIXED to estimate person locations. It writes person locations estimated by maximum likelihood estimation (MLE) and by weighted likelihood estimation (WLE) and their asymptotic standard errors to a data set. Furthermore, a copy of the input data set with an added column containing the maximum likelihood estimates is created.

6. Simulation

Evaluation of model fit can be done by comparing what has been observed with simulated data describing what could have been observed under the model. The SAS macro %rasch_cml simulates data sets under the model. These are obtained by first simulating person scores from the empirical score distribution and then simulating item responses. Let $\{0, 1, \ldots, R_{\max}\}$ denote the set of possible scores and for $r = 0, \ldots, R_{\max}$ define $p_r = n_r / N$, where $n_r$ denotes the number of persons with score $r$. First simulate scores $r_1^*, \ldots, r_N^*$ with probabilities $(p_r)_{r=0,\ldots,R_{\max}}$, and next simulate a data matrix using the conditional probabilities of the item responses given the score, which do not depend on $\theta$. This procedure is repeated a number of times, yielding data matrices $x^{*(1)}, \ldots, x^{*(B)}$.
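The simulation step can be sketched in Python: given the simulated scores, each person's response vector is drawn from the conditional distribution given the score, which is free of θ (brute-force enumeration, illustrative only; not the macro's SAS implementation):

```python
import itertools
import math
import random

def simulate_matrix(scores, items, rng=random.Random(1)):
    """One simulated data matrix: for each score r, draw a response
    vector from the conditional distribution given R = r."""
    def weight(vec):              # theta-free weight of a response vector
        return math.prod(math.exp(-sum(t[:x])) for t, x in zip(items, vec))
    by_score = {}                 # group all response vectors by score
    for vec in itertools.product(*[range(len(t) + 1) for t in items]):
        by_score.setdefault(sum(vec), []).append(vec)
    matrix = []
    for r in scores:
        vecs = by_score[r]
        ws = [weight(v) for v in vecs]
        matrix.append(list(rng.choices(vecs, weights=ws)[0]))
    return matrix
```

By construction each simulated row reproduces its score exactly, so the simulated data agree with the observed score distribution while the item responses vary under the model.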

7. Graphics

Three graphical representations are made by the SAS macro %rasch_cml: item characteristic curves (ICCs) that display the response probabilities along the latent continuum, and two item fit plots. Let $n_{rix}$ denote the number of persons with total score $r$ giving the answer $x$ to item $i$. For each combination $(r, i, x)$, the macro plots the observed proportions $n_{rix}/n_r$ as solid black dots and the expected proportions (the probabilities $P(X_i = x \mid R = r)$) as solid blue lines along with 95% confidence limits as dashed green lines. These plots are illustrated in Figure 2 and are closely related to plots of the ICCs because $R$ is sufficient for $\theta$.

The observed mean score function for item $i$ is

$$\bar{x}_{ir} = \frac{1}{n_r} \sum_{\{v \mid r_v = r\}} x_{vi}.$$

The simulated mean score function is obtained by simulating item responses as described in Section 6 and calculating

$$\bar{x}^{*(b)}_{ir} = \frac{1}{n_r} \sum_{\{v \mid r_v = r\}} x^{*(b)}_{vi},$$

where $x^{*(b)}_{vi}$ is the response of person $v$ to item $i$ in the $b$th simulated data matrix.
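The observed mean score function is straightforward to compute from the data; a small Python sketch (with a hypothetical layout of one total score and one response row per person):

```python
def mean_score_function(scores, responses, item):
    """Observed mean response to one item within each total-score group."""
    sums, counts = {}, {}
    for r, row in zip(scores, responses):
        sums[r] = sums.get(r, 0) + row[item]
        counts[r] = counts.get(r, 0) + 1
    return {r: sums[r] / counts[r] for r in sorted(sums)}
```

Applying the same function to each simulated data matrix gives the simulated mean score functions plotted by the macro.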

8. The SAS Macro

The Hospital Anxiety and Depression Scale (HADS) was designed as a brief instrument used to assess symptoms of anxiety and depression [17] and contains 14 items often scored as two seven-item subscales: “depression” (even numbered items) and “anxiety” (odd numbered items). The SAS macro is illustrated using data reported by Pallant and Tennant [18]. The first step is to create a data set

data inames;
  input item_name $ item_text $ max Group @@;
  cards;
AHADS1 anx1 3 1 AHADS3 anx3 3 2
AHADS5 anx5 3 3 AHADS7 anx7 3 4
AHADS9 anx9 3 5 AHADS11 anx11 3 6
AHADS13 anx13 3 7
;
run;

that describes the items: item_name is the name of the items, item_text are text strings attached to the items, max is the maximum item score for each item, and Group contains integers defining groups of items that have the same item parameters. Thus, all HADS items are scored 0, 1, 2, 3, and they all have their own vector of item parameters. The macro is called using the statement

%rasch_cml(DATA=work.HADS,
           ITEM_NAMES=inames,
           OUT=HADSTEST);

where DATA= specifies the data set to be analyzed, ITEM_NAMES= is the data set that describes the items, and OUT= specifies a prefix for all output data sets generated by the macro (the default value is CML).

The SAS macro creates six data sets. The data set CML_logl contains the maximum value of the conditional log likelihood function. The data sets CML_par and CML_par_ci contain item parameter estimates, and the difference between them is illustrated by the (edited) output

item     beta1  beta2  beta3
AHADS1   3.75   0.17   0.82
  :
AHADS9   0.93   1.51   2.43

from CML_par; Table 1 shows the corresponding output from CML_par_ci. Note that the threshold parameters (the $\beta$'s) are the same.

The data sets CML_pp_regr and CML_regr are copies of the input data set with added variables useful for latent regression [19]. The data set CML_theta contains MLE and WLE estimates of person locations and their standard errors.

Further options can be specified: ICC=YES yields a plot of the item characteristic curves for each item. The ICCs for HADS item 9 are shown in Figure 1. Specifying plotcat=YES creates plots of observed and expected item category frequencies stratified by the total score. This yields a plot for each item, as exemplified in Figure 2.

Using the option plotmean=YES makes the macro plot item means against raw scores as solid black lines along with item means simulated under the model plotted as gray-dashed lines. The default number of simulations is 30, but this can be changed using the NSIMU= option. Figure 3 shows an example.

The plot shows that the mean scores increase with the total score, in agreement with requirement (ii), and that the variation observed in the data is well within the range of what would be expected under the model.

9. Discussion

Several proprietary software packages for fitting Rasch models exist, the most widely used being RUMM [20], ConQuest [21], and WINSTEPS [22]. With the increasing use of IRT and Rasch models in new research areas where access to specialized proprietary software is limited, it is important to provide implementations in standard statistical software such as R and SAS. The R package eRm [27] is a flexible tool for these analyses, and SAS macros for Rasch models already exist: the macros %anaqol [23] and %irtfit [24] encompass a wide range of IRT models. The SAS macro %anaqol computes Cronbach’s coefficient alpha [25], produces several useful graphical representations, and estimates the parameters of any of five IRT models (the dichotomous Rasch model [3, 4], the Birnbaum (2PL) model [26], the OPLM, the partial credit model [7], and the rating scale model [6]) using marginal maximum likelihood. It is very useful, but some features are only available for dichotomous items, and the implemented plots of empirical and theoretical ICCs do not show confidence limits. The SAS macro %irtfit produces a variety of indices for testing the fit of IRT models to dichotomous and polytomous item response data; it does not estimate item parameters but requires that these have been estimated using other IRT software.

It has previously been discussed how to implement a conditional estimation in SAS [15], but no software was provided. The macro described in this paper uses these ideas to provide a user-friendly tool for item analysis, with focus on graphics.

Because the macro uses the contingency table of item responses, no responses may be missing; if the estimation procedure fails to converge, a warning or error message is printed. The plots of observed and expected counts in each score group can be interpreted as empirical versions of the item characteristic curves. However, when many score groups are small, as is often the case in applications, these plots are not helpful. Therefore, the macro produces a single item-level goodness-of-fit plot. Furthermore, it extends previously implemented macros in that the output and features are the same for dichotomous and polytomous item response formats and in that it presents more graphics, specifically a new goodness-of-fit plot where observed item means are compared to item means simulated under the model.