Research Article

Discovering the Unknown: Improving Detection of Novel Species and Genera from Short Reads

Figure 1

The datasets used in this paper consist of a classifier training set (database), a set of novel genomes used to train the detector, and a separate novel-genome test set. The blue areas represent the percentage of genomes whose genus and species are both “known”; the green areas represent the percentage of genomes that are “known” at the genus level but “unknown” at the species level; the red areas represent the percentage of genomes that are “unknown” at both the species and genus levels.
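The three categories in the figure can be illustrated with a minimal sketch. It assumes that "known" means present in the classifier training database, and that each genome carries a genus name and a full binomial species name. The function names, category labels, and example organisms are hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical labels for the three areas of the figure.
CATEGORIES = ("known", "novel_species", "novel_genus")


def novelty_category(genus, species, known_genera, known_species):
    """Label one genome relative to an assumed training database."""
    if genus in known_genera and species in known_species:
        return "known"            # blue: genus and species both known
    if genus in known_genera:
        return "novel_species"    # green: genus known, species unknown
    return "novel_genus"          # red: genus (and therefore species) unknown


def category_percentages(genomes, known_genera, known_species):
    """Return the percentage of genomes that fall in each category."""
    if not genomes:
        raise ValueError("genomes must not be empty")
    counts = Counter(
        novelty_category(genus, species, known_genera, known_species)
        for genus, species in genomes
    )
    total = len(genomes)
    return {name: 100.0 * counts[name] / total for name in CATEGORIES}


# Illustrative data only; not the datasets used in the paper.
example_genomes = [
    ("Escherichia", "Escherichia coli"),
    ("Escherichia", "Escherichia albertii"),
    ("Vibrio", "Vibrio cholerae"),
    ("Escherichia", "Escherichia coli"),
]
example_percentages = category_percentages(
    example_genomes,
    known_genera={"Escherichia"},
    known_species={"Escherichia coli"},
)
```

Species are keyed by full binomial name so that the same epithet in two different genera is not mistaken for a known species.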