Discovering the Unknown: Improving Detection of Novel Species and Genera from Short Reads
Table 1
Sensitivity, specificity, and detector accuracy of each detector for accepting or rejecting reads as “known” on a 275-strain test set. Under 5-fold cross-validation, the maximum standard deviation is 1%. Rejecting all fragments would yield 66% accuracy at the species level and 37% accuracy at the genus level; PhymmBL+Detector achieves 15–30% above these baselines. SOrt-ITEMS did not classify any fragment below the genus level, so its species-level entries are marked N/A. WebCarma, run on 500 bp fragments, achieved 20.1% sensitivity, 86.9% specificity, and 54% detector accuracy at the species level, and 23% sensitivity, 85% specificity, and 40.3% detector accuracy at the genus level. WebCarma classified only about 10 K of the 27.5 K reads. Because of this poor performance, we did not include it in the table.
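The three rates reported in the caption can be illustrated with a minimal sketch. It assumes the standard confusion-matrix definitions with “known” as the positive class (an assumption; the caption does not state which class is positive). The names `DetectorCounts` and `reject_all` are hypothetical, and the read counts in the usage lines are illustrative, chosen only to reproduce the 66% reject-all baseline quoted for the species level.

```python
from dataclasses import dataclass


@dataclass
class DetectorCounts:
    """Confusion counts, assuming "known" is the positive class."""
    tp: int  # known reads accepted as known
    fn: int  # known reads rejected as novel
    tn: int  # novel reads rejected as novel
    fp: int  # novel reads accepted as known


def sensitivity(c: DetectorCounts) -> float:
    # Fraction of known reads that the detector accepts.
    total = c.tp + c.fn
    return c.tp / total if total else 0.0


def specificity(c: DetectorCounts) -> float:
    # Fraction of novel reads that the detector rejects.
    total = c.tn + c.fp
    return c.tn / total if total else 0.0


def accuracy(c: DetectorCounts) -> float:
    # Fraction of all reads for which the accept/reject decision is correct.
    total = c.tp + c.fn + c.tn + c.fp
    return (c.tp + c.tn) / total if total else 0.0


def reject_all(n_known: int, n_novel: int) -> DetectorCounts:
    # Trivial detector that rejects every read: only novel reads are correct.
    return DetectorCounts(tp=0, fn=n_known, tn=n_novel, fp=0)


# Illustrative counts only: 34 known and 66 novel reads per 100.
baseline = reject_all(n_known=34, n_novel=66)
print(sensitivity(baseline), specificity(baseline), accuracy(baseline))
```

Under these definitions, the accuracy of the reject-all detector equals the fraction of novel reads in the test set, which is why it serves as the floor against which the detectors are compared.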