Abstract

We summarize standard and novel statistical methods for evaluating the classification accuracy of DNA methylation markers. The appropriate method depends on the type of marker studied (qualitative or quantitative), the number of markers, and the type of outcome (time-invariant or time-varying). At least two error rates are needed to assess marker accuracy: the true-positive fraction and the false-positive fraction. Measures of association computed from a combination of these error rates, such as the odds ratio or relative risk, do not by themselves convey classification accuracy. We present an example of a DNA methylation marker that is strongly associated with time to death (log-rank p = 0.0003) yet is a poor classifier when judged by its true-positive and false-positive fractions. Finally, we emphasize the importance of study design. Markers can behave differently in different groups of individuals, so it is important to know which factors may affect a marker's accuracy and in which subpopulations the marker may be more accurate. Such an understanding is essential when comparing marker accuracy between two groups of subjects.
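To illustrate why a single measure of association cannot describe classification accuracy, the following minimal Python sketch uses hypothetical 2×2 counts (invented for illustration, not data from this study) for three binary markers, each evaluated in 100 cases and 100 controls. All three have the same odds ratio of 9, yet their true-positive and false-positive fractions differ substantially. The class and marker names are illustrative assumptions only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from a case-control evaluation of a binary marker (hypothetical).

    tp: cases with a positive marker      fn: cases with a negative marker
    fp: controls with a positive marker   tn: controls with a negative marker
    """
    tp: int
    fn: int
    fp: int
    tn: int

    @property
    def tpf(self) -> float:
        # True-positive fraction (sensitivity): P(marker positive | case)
        return self.tp / (self.tp + self.fn)

    @property
    def fpf(self) -> float:
        # False-positive fraction (1 - specificity): P(marker positive | control)
        return self.fp / (self.fp + self.tn)

    @property
    def odds_ratio(self) -> float:
        # Odds of a positive marker among cases divided by the odds among
        # controls; equivalently [TPF/(1-TPF)] / [FPF/(1-FPF)].
        return (self.tp * self.tn) / (self.fn * self.fp)


# Hypothetical counts chosen so that every marker has an odds ratio of 9.
markers = {
    "A": TwoByTwo(tp=90, fn=10, fp=50, tn=50),
    "B": TwoByTwo(tp=75, fn=25, fp=25, tn=75),
    "C": TwoByTwo(tp=50, fn=50, fp=10, tn=90),
}

for name, t in markers.items():
    print(f"marker {name}: OR = {t.odds_ratio:.1f}, "
          f"TPF = {t.tpf:.2f}, FPF = {t.fpf:.2f}")
# marker A: OR = 9.0, TPF = 0.90, FPF = 0.50
# marker B: OR = 9.0, TPF = 0.75, FPF = 0.25
# marker C: OR = 9.0, TPF = 0.50, FPF = 0.10
```

Marker A detects 90% of cases but also flags half of all controls, whereas marker C flags only 10% of controls but misses half of the cases; the common odds ratio hides this trade-off, which is why both fractions need to be reported.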