Research Article

Discovering the Unknown: Improving Detection of Novel Species and Genera from Short Reads

Table 4

(a) The 10 most abundant species by number of matched reads assigned by PhymmBL, and (b) the 8 most abundant species by number of reads that passed the species-resolution detectors, for the Red Soudan acid mine drainage dataset, using the 635-genome training database.
(a)

PhymmBL
Organism Matched reads

Gramella forsetii 4102
Marinobacter hydrocarbonoclasticus 3885
Flavobacterium johnsoniae 3480
Dinoroseobacter shibae 3402
Ruegeria pomeroyi 3119
Polaromonas naphthalenivorans 3116
Aeromonas salmonicida 2899
Rhodobacter sphaeroides 2616
Rhizobium leguminosarum 2541
Paracoccus denitrificans 2533

(b)

NBC detector
Organism Matched reads

Marinobacter hydrocarbonoclasticus 31
Dinoroseobacter shibae 18
Ruegeria sp. TM1040 17
Rhodobacter sphaeroides 15
Shewanella sp. ANA-3 11
Shewanella baltica 5
Desulfotalea psychrophila 4
Paracoccus denitrificans 4

PhymmBL detector
Organism Matched reads

Dinoroseobacter shibae 85
Marinobacter hydrocarbonoclasticus 62
Rhodobacter sphaeroides 24
Ruegeria pomeroyi 24
Ruegeria sp. TM1040 22
Paracoccus denitrificans 20
Shewanella baltica 17
Shewanella sp. ANA-3 14