Depression Research and Treatment

Research Article

Psychometric Limitations of the Center for Epidemiologic Studies-Depression Scale for Assessing Depressive Symptoms among Adults with HIV/AIDS: A Rasch Analysis

Table 1

Overview of the analytic process using a Rasch model approach.


Step	Psychometric property	Statistical approach and criteria	Results: original 20-item CES-D	Results: reduced 15-item CES-D (omits items with poor fit)	Results: Zhang et al. 10-item CES-D [41]

1	Rating scale functioning: does the rating scale function consistently across items? (substantive validity)	(i) Average measures for each step category and threshold on each item should advance monotonically (ii) z-values < 2.0 in outfit mean square (MnSq) values for step category calibrations [42]	Rating scale met criteria	Rating scale met criteria	Rating scale met criteria

2	Internal scale validity: how well do the actual item responses match the expected responses from the Rasch model? (content validity)	Item goodness-of-fit statistics MnSq values < 1.3 [43]	5 items failed to meet criterion: (i) Item 4: MnSq = 1.58 (ii) Item 8: MnSq = 1.47 (iii) Item 11: MnSq = 1.33 (iv) Item 2: MnSq = 1.36 (v) Item 16: MnSq = 1.36	All items met criterion	One item failed to meet the criterion: item 8: MnSq = 1.53

3	Internal scale validity: is the scale unidimensional (i.e., does it measure a single construct)? (structural validity)	Principal component analysis (i) ≥50% of total variance explained by first component (depressive symptoms) [44] (ii) Any additional component explains < 5% of the remaining variance after removing first component [44]	(i) First component explained 32.5% of total variance (ii) Second component explained 9.4% of total variance	(i) First component explained 37.9% of total variance (ii) Second component explained 7.4% of total variance	(i) First component explained 34.2% of total variance (ii) Second component explained 12.0% of total variance

4	Person-response validity: how well do the individual responses match expected responses from the Rasch model? (substantive validity)	Person goodness-of-fit statistics (i) MnSq values < 1.5 and z-value ≤ 2.0 (ii) ≤5% of sample fails to demonstrate acceptable goodness-of-fit values [45]	52 respondents (15.0% of sample) failed to demonstrate acceptable goodness-of-fit values	36 respondents (10.3% of sample) failed to demonstrate acceptable goodness-of-fit values	38 respondents (11.0% of sample) failed to demonstrate acceptable goodness-of-fit values

5	Person-separation reliability: can the scale distinguish ≥3 distinct groups of depression in the sample tested? (reliability)	Person-separation index ≥2.0 [46]	2.04	1.90	1.42

6	Internal consistency: are item responses consistent with each other? (reliability)	Cronbach's alpha coefficient >0.80 [46]	0.88	0.88	0.78

7	Differential item functioning (DIF): are item difficulty calibrations stable in relation to key demographic variables? (generalizability validity)	Mantel-Haenszel statistic (i) with Bonferroni correction [47] (ii) 1 item with DIF out of 20 may occur by chance and is deemed acceptable	Items with DIF (i) Gender: 15 & 17 (ii) Race: 20, 19, 18, 16 & 6 (iii) Antidepressant use: 20 (iv) AIDS diagnosis: 8	Items with DIF (i) Gender: 14 & 17 (ii) Race: 20, 19, 18 & 6	Not evaluated

8	Differential test functioning (DTF): how consistent are the scores for the original CES-D and reduced-item scales?	(i) ≤5% of z-scores of the differences between the two test scores exceed ±1.96 (ii) Pearson correlation and	Not applicable	(i) 6 measures (1.7%) had z-scores exceeding ±1.96 (ii) ,	(i) 1 measure had a z-score exceeding ±1.96 (ii) = 0.942,
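
Step 5 pairs a person-separation index of at least 2.0 with the ability to distinguish at least three groups. The following is a minimal sketch of one conventional way to relate the two, assuming the commonly used strata formula H = (4G + 1)/3 and the reliability conversion R = G²/(1 + G²). Neither formula is stated in the table, and the function names are hypothetical.

```python
def strata_from_separation(g: float) -> float:
    """Statistically distinct strata implied by separation index G.

    Assumes the conventional formula H = (4G + 1) / 3, which the
    table itself does not state.
    """
    return (4.0 * g + 1.0) / 3.0


def reliability_from_separation(g: float) -> float:
    """Separation reliability implied by G, assuming R = G^2 / (1 + G^2)."""
    return g ** 2 / (1.0 + g ** 2)


# Person-separation indices reported in step 5 of the table.
reported = {
    "Original 20-item CES-D": 2.04,
    "Reduced 15-item CES-D": 1.90,
    "Zhang et al. 10-item CES-D": 1.42,
}

for label, g in reported.items():
    h = strata_from_separation(g)
    print(f"{label}: G = {g:.2f}, strata = {h:.2f}, meets G >= 2.0: {g >= 2.0}")
```

Under these assumptions, G = 2.0 corresponds to exactly three strata, so the reported indices of 2.04, 1.90, and 1.42 imply roughly 3.05, 2.87, and 2.23 strata, respectively.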

Note. After initial evaluation of the original 20-item CES-D, a stepwise process was used whereby items failing to meet criteria were removed one at a time, and only those meeting criteria in earlier steps advanced to subsequent steps. If more than one item failed to meet a criterion, the item with the worst fit was removed and the step was repeated with the remaining items. The reduced 15-item version omits misfitting items 2 (appetite), 4 (as good as others), 8 (hopeful), 11 (restless sleep), and 16 (enjoyed life).
The five misfitting items did not all demonstrate misfit in the first iteration; some emerged in subsequent iterations. Items are listed in the order of removal, and the MnSq values shown reflect the iteration immediately prior to each item's removal.
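
The stepwise removal described in the note can be sketched as a simple loop. This is an illustrative outline only, not the authors' software: `fit_fn` is a hypothetical callable standing in for a Rasch re-estimation that returns an item-fit MnSq value for each remaining item, and `toy_fit` returns made-up values for placeholder items "a" to "d" (not CES-D data) purely to show how misfit can emerge in a later iteration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical signature: given the remaining items, re-fit the model
# and return one MnSq value per item.
FitFn = Callable[[List[str]], Dict[str, float]]


def stepwise_remove_misfits(
    items: List[str], fit_fn: FitFn, max_mnsq: float = 1.3
) -> Tuple[List[str], List[Tuple[str, float]]]:
    """Remove the worst-fitting item one at a time until all MnSq < max_mnsq."""
    remaining = list(items)
    removed: List[Tuple[str, float]] = []
    while remaining:
        mnsq = fit_fn(remaining)  # re-estimate fit on the current item set
        misfits = {i: v for i, v in mnsq.items() if v >= max_mnsq}
        if not misfits:
            break  # every remaining item meets the criterion
        worst = max(misfits, key=misfits.get)
        # Record the MnSq from the iteration just before removal.
        removed.append((worst, misfits[worst]))
        remaining.remove(worst)
    return remaining, removed


def toy_fit(remaining: List[str]) -> Dict[str, float]:
    """Made-up fit values; item 'c' only misfits once 'b' is gone."""
    base = {"a": 1.00, "b": 1.60, "c": 1.25, "d": 0.90}
    out = {i: base[i] for i in remaining}
    if "b" not in remaining and "c" in out:
        out["c"] = 1.35
    return out


kept, dropped = stepwise_remove_misfits(["a", "b", "c", "d"], toy_fit)
print("kept:", kept)
print("removed in order:", dropped)
```

Recording each removed item together with its MnSq from the preceding iteration mirrors how the table reports the five misfitting items, in removal order with their pre-removal fit values.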