Research Article

Discovering the Unknown: Improving Detection of Novel Species and Genera from Short Reads

Table 1

Sensitivity, specificity, and detector accuracy of each detector for accepting or rejecting reads as “known” on a 275-strain test set. Under 5-fold cross-validation, the maximum standard deviation is 1%. If all fragments were rejected, accuracy would be 66% at the species level and 37% at the genus level; PhymmBL+Detector achieves 15–30% above this baseline. SOrt-ITEMS did not classify any fragment below the genus level, so its species-level entries are marked N/A. With 500 bp fragments, WebCarma achieved 20.1% sensitivity, 86.9% specificity, and 54% detector accuracy at the species level, and 23% sensitivity, 85% specificity, and 40.3% detector accuracy at the genus level. WebCarma classified only about 10 K of the 27.5 K reads; because of this poor performance, it is not included in the table.

NBC detector
Species Genus
Fragment length Sensitivity Specificity Accuracy Sensitivity Specificity Accuracy

500 bp 53.7% 96.3% 81.9% 32.9% 99.9% 58.0%
100 bp 62.2% 95.5% 84.3% 39.3% 99.5% 61.8%
25 bp 77.4% 89.6% 85.5% 61.7% 76.6% 67.3%

PhymmBL detector
Species Genus
Fragment length Sensitivity Specificity Accuracy Sensitivity Specificity Accuracy

500 bp 84.0% 88.3% 86.8% 58.5% 97.4% 73.0%
100 bp 79.9% 92.0% 87.9% 52.5% 98.3% 69.6%
25 bp 77.2% 86.8% 83.5% 51.2% 92.6% 66.7%

MEGAN as a detector
Species Genus
Fragment length Sensitivity Specificity Accuracy Sensitivity Specificity Accuracy

500 bp 83.3% 60.0% 68.1% 76.6% 66.5% 72.8%
100 bp 79.5% 71.4% 74.2% 66.9% 76.8% 70.6%
25 bp 71.0% 74.5% 73.2% 55.3% 73.4% 62.1%

SOrt-ITEMS as a detector
Species Genus
Fragment length Sensitivity Specificity Accuracy Sensitivity Specificity Accuracy

500 bp N/A N/A N/A 57.1% 96.5% 71.2%
100 bp N/A N/A N/A 44.8% 97.9% 64.5%
25 bp N/A N/A N/A 6.1% 98.7% 40.5%
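
The relationship between the three reported rates and the reject-all baseline can be illustrated with a short sketch. It rests on two assumptions that the caption implies but does not state: that "known" is treated as the positive class, and that the fraction of truly known reads equals one minus the reject-all accuracy (34% at the species level, 63% at the genus level). The function name `detector_accuracy` and the constants are hypothetical and are not taken from the study's code.

```python
def detector_accuracy(sensitivity, specificity, known_fraction):
    """Accuracy as a prevalence-weighted mix of sensitivity and specificity.

    Assumption: "known" reads are the positive class, so sensitivity is
    weighted by the fraction of known reads and specificity by the rest.
    """
    return sensitivity * known_fraction + specificity * (1.0 - known_fraction)


# Assumption: rejecting every read gives sensitivity 0 and specificity 1,
# so the reject-all accuracy equals the fraction of unknown reads.
SPECIES_KNOWN = 1.0 - 0.66
GENUS_KNOWN = 1.0 - 0.37

# Reject-all baseline at the species level.
print(round(100 * detector_accuracy(0.0, 1.0, SPECIES_KNOWN), 1))      # 66.0

# PhymmBL detector, 500 bp, species level (table reports 86.8%).
print(round(100 * detector_accuracy(0.840, 0.883, SPECIES_KNOWN), 1))  # 86.8
```

Under these assumptions the computed values agree with the tabulated accuracies to within a few tenths of a percent; small differences are to be expected from rounding of the reported rates.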