Psychometric Limitations of the Center for Epidemiologic Studies-Depression Scale for Assessing Depressive Symptoms among Adults with HIV/AIDS: A Rasch Analysis
Table 1
Overview of the analytic process using a Rasch model approach.
Rating scale functioning: does the rating scale function consistently across items? (substantive validity)
(i) Average measures for each step category and threshold on each item should advance monotonically (ii) z-values < 2.0 in outfit mean square (MnSq) values for step category calibrations [42]
Rating scale met criteria
Rating scale met criteria
Rating scale met criteria
2
Internal scale validity: how well do the actual item responses match the expected responses from the Rasch model? (content validity)
One item failed to meet the criterion: item 8: MnSq = 1.53
3
Internal scale validity: is the scale unidimensional (i.e., does it measure a single construct)? (structural validity)
Principal component analysis (i) ≥50% of total variance explained by first component (depressive symptoms) [44] (ii) Any additional component explains < 5% of the remaining variance after removing first component [44]
(i) First component explained 32.5% of total variance (ii) Second component explained 9.4% of total variance
(i) First component explained 37.9% of total variance (ii) Second component explained 7.4% of total variance
(i) First component explained 34.2% of total variance (ii) Second component explained 12.0% of total variance
4
Person-response validity: how well do the individual responses match expected responses from the Rasch model? (substantive validity)
Person goodness-of-fit statistics (i) MnSq values < 1.5 and z-value ≤ 2.0 (ii) ≤5% of sample fails to demonstrate acceptable goodness-of-fit values [45]
52 respondents (15.0% of sample) failed to demonstrate acceptable goodness-of-fit values
36 respondents (10.3% of sample) failed to demonstrate acceptable goodness-of-fit values
38 respondents (11.0% of sample) failed to demonstrate acceptable goodness-of-fit values
5
Person-separation reliability: can the scale distinguish ≥3 distinct groups of depression in the sample tested? (reliability)
Differential item functioning (DIF): are item difficulty calibrations stable in relation to key demographic variables? (generalizability validity)
Mantel-Haenszel statistic (i) with Bonferroni correction [47] (ii) 1 item with DIF out of 20 may occur by chance and is deemed acceptable
Items with DIF (i) Gender: 15 & 17 (ii) Race: 20, 19, 18, 16 & 6 (iii) Antidepressant use: 20 (iv) AIDS diagnosis: 8
Items with DIF (i) Gender: 14 & 17 (ii) Race: 20, 19, 18 & 6
Not evaluated
8
Differential test functioning (DTF): how consistent are the scores for the original CES-D and reduced-item scales?
(i) ≤5% of z-scores of the differences between the two test scores exceed ±1.96 (ii) Pearson correlation and
Not applicable
(i) 6 measures (1.7%) had z-scores exceeding ±1.96 (ii) ,
(i) 1 measure had a z-score exceeding ±1.96 (ii) r = 0.942,
Note. After initial evaluation of the original 20-item CES-D, a stepwise process was used whereby items failing to meet criteria were removed one at a time, and only those meeting criteria in earlier steps advanced to subsequent steps. If more than one item failed to meet a criterion, the item with the worst fit was removed and the step was repeated with the remaining items. The last column reports the 15-item version, which omits misfitting items 2 (appetite), 4 (as good as others), 8 (hopeful), 11 (restless sleep), and 16 (enjoyed life). The five misfitting items did not all demonstrate misfit in the first iteration; some emerged in subsequent iterations. Items are listed in the order of removal, and the MnSq values shown reflect the iteration prior to each item's removal.
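The stepwise removal described in the note can be sketched as a simple loop. This is an illustrative outline only, not the authors' procedure or software: `fit_fn` is a hypothetical stand-in for refitting the Rasch model and returning an outfit MnSq per item, `max_mnsq` is a caller-supplied cutoff (the value 1.4 below is an assumption, not taken from the table), and the toy fit values are made up.

```python
from typing import Callable, Dict, List, Tuple


def stepwise_item_removal(
    items: List[int],
    fit_fn: Callable[[List[int]], Dict[int, float]],
    max_mnsq: float,
) -> Tuple[List[int], List[Tuple[int, float]]]:
    """Remove the worst-fitting item one at a time until all remaining items fit.

    fit_fn is a hypothetical callable standing in for a Rasch refit; it returns
    {item: outfit MnSq} for the item set it is given.
    """
    remaining = list(items)
    removed: List[Tuple[int, float]] = []
    while remaining:
        mnsq = fit_fn(remaining)  # refit on the current item set
        misfits = {i: v for i, v in mnsq.items() if v > max_mnsq}
        if not misfits:
            break  # every remaining item meets the criterion
        worst = max(misfits, key=misfits.get)  # item with the worst fit
        removed.append((worst, misfits[worst]))  # MnSq from the iteration before removal
        remaining.remove(worst)
    return remaining, removed


def toy_fit(current: List[int]) -> Dict[int, float]:
    # Made-up values: item 3 only misfits once item 2 has been removed,
    # mimicking misfit that emerges in a later iteration.
    values = {1: 1.0, 2: 1.6, 3: 1.2, 4: 0.9}
    if 2 not in current:
        values[3] = 1.5
    return {i: values[i] for i in current}


kept, dropped = stepwise_item_removal([1, 2, 3, 4], toy_fit, max_mnsq=1.4)
print(kept)     # [1, 4]
print(dropped)  # [(2, 1.6), (3, 1.5)]
```

Recording the MnSq at the moment of removal mirrors the note's statement that the reported values reflect the iteration prior to each item's removal.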