Computational and Mathematical Methods in Medicine

Review Article

Medical Diagnostic Tests: A Review of Test Anatomy, Phases, and Statistical Treatment of Data

Table 10

Metrics for global test accuracy evaluation or comparisons of performances of two tests.


Statistic (Abb)	Method	Remarks

Area under the ROC curve (AUC)	(i) Nonparametric (no distributional assumptions): the empirical method (the estimated AUC is biased if the curve has only a few points) and smoothed-curve methods such as the kernel density method (not reliable near the extremes of the ROC curve) (ii) Parametric (the test results of cases and controls are assumed to be normally distributed): binormal method (tighter asymptotic confidence bounds for samples of fewer than 100)	(i) AUC = 1 ⟶ perfect diagnostic test (perfect accuracy) (ii) AUC ∼ 0.5 ⟶ random classification (iii) 0.9 < AUC ≤ 1 ⟶ excellent accuracy (iv) 0.8 < AUC ≤ 0.9 ⟶ good accuracy (v) 0.7 < AUC ≤ 0.8 ⟶ fair accuracy

Partial area under the curve (pAUC)	(i) Nonparametric (no distributional assumptions) (ii) Parametric: using the binormal assumption	(i) Considers the portion of the AUC over a predefined range of interest (ii) Depends on the scale of possible values in the range of interest (iii) Has less statistical precision than the AUC

Diagnostic odds ratio (DOR)	(i) Must use the same fixed cutoff (ii) Most useful in a meta-analysis when two or more tests are compared	(i) DOR = 1 ⟶ test does not discriminate between subjects with and without the disease (ii) DOR increases as the ROC curve moves closer to the top left-hand corner of the ROC plot (iii) The same DOR can be obtained for different combinations of Se and Sp

TP fraction for a given FP fraction (TPF_FPF)	(i) Requires the same false-positive fraction for both tests	(i) Useful for comparing two different tests at a specific FPF (chosen on clinical grounds), especially when the ROC curves cross

Comparison of two tests	(i) Comparison of the AUCs of two different tests (ii) Absolute difference (Se_A − Se_B) or ratio (Se_A/Se_B), where A is one diagnostic test and B is another diagnostic test	(i) Apply the proper statistical test; each AUC must be estimated relative to the “gold-standard” test (ii) Test A is better than test B if the absolute difference is > 0 or the ratio is > 1

Abb = abbreviation; all indicators are reported with associated 95% confidence intervals; patient-centered indicator; Se = sensitivity; Sp = specificity; TP = true positive; FP = false positive; FN = false negative; and TN = true negative.
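
Three of the metrics in Table 10 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a validated implementation: the function names (`empirical_auc`, `tpf_at_fpf`, `diagnostic_odds_ratio`) and the scores are hypothetical, higher scores are assumed to indicate disease, a subject is called positive when the score is strictly above the cutoff, and no confidence intervals are computed.

```python
from typing import Sequence


def empirical_auc(cases: Sequence[float], controls: Sequence[float]) -> float:
    """Empirical (nonparametric) AUC as the Mann-Whitney statistic.

    Counts the case/control pairs in which the case scores higher;
    ties count as half a pair.
    """
    wins = 0.0
    for x in cases:
        for y in controls:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def tpf_at_fpf(cases: Sequence[float], controls: Sequence[float],
               max_fpf: float) -> float:
    """Empirical TP fraction at the lowest cutoff whose FP fraction <= max_fpf."""
    for cutoff in sorted(set(cases) | set(controls)):
        fpf = sum(y > cutoff for y in controls) / len(controls)
        if fpf <= max_fpf:
            return sum(x > cutoff for x in cases) / len(cases)
    return 0.0  # reached only if max_fpf < 0


def diagnostic_odds_ratio(tp: int, fp: int, fn: int, tn: int) -> float:
    """DOR = (TP * TN) / (FP * FN) at one fixed cutoff."""
    if fp == 0 or fn == 0:
        # Zero cells need special handling, which this sketch omits.
        raise ValueError("DOR is undefined when FP or FN is zero")
    return (tp * tn) / (fp * fn)


# Hypothetical test scores, for illustration only.
cases = [0.9, 0.8, 0.7, 0.6, 0.4]
controls = [0.5, 0.4, 0.3, 0.2, 0.1]

auc = empirical_auc(cases, controls)
tpf = tpf_at_fpf(cases, controls, max_fpf=0.2)

# Two hypothetical 2x2 tables with different Se and Sp but the same DOR:
dor_a = diagnostic_odds_ratio(tp=80, fp=20, fn=20, tn=80)  # Se = 0.80, Sp = 0.80
dor_b = diagnostic_odds_ratio(tp=90, fp=36, fn=10, tn=64)  # Se = 0.90, Sp = 0.64

print(auc, tpf, dor_a, dor_b)
```

The two 2x2 tables illustrate the remark that the same DOR can arise from different combinations of Se and Sp, which is why the DOR alone does not show where on the ROC curve a test operates.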