Research Article

Metagenome Fragment Classification Using N-Mer Frequency Profiles

Table 3

Comparison of the top-ranked strains (by number of assigned reads) from the naive Bayes classifier (NBC) analysis of the Sargasso Sea data set using 9-mers and 15-mers, with a side-by-side comparison to MEGAN results. Seven strains are common to the 9-mer and 15-mer lists, which substantiates their presence in the sample. Not all NBC best matches are found by MEGAN (indicated by “None”); this can be due either to MEGAN returning no hits or to the strain being absent from the database. A notable NBC finding is that Trichodesmium erythraeum makes up 0.6% of the sample. This organism has been widely found in the Sargasso Sea, yet no prior method has shown its presence in the Sargasso Sea data set.

| 9-mers: strain with high content in sample (genome size, both strands) | No. of reads | No. of MEGAN reads | 15-mers: strain with high content in sample (genome size, both strands) | No. of reads | No. of MEGAN reads |
|---|---:|---:|---|---:|---:|
| Burkholderia 383 (9.3 M) | 693 | 514 | Burkholderia 383 (9.3 M) | 2044 | 514 |
| Burkholderia cenocepacia AU 1054 (14.6 M) | 684 | 13 | Clostridium beijerinckii NCIMB 8052 (12 M) | 1698 | 2 |
| Clostridium beijerinckii NCIMB 8052 (12 M) | 623 | 2 | Shewanella ANA-3 (10.3 M) | 989 | 186 |
| Shewanella ANA-3 (10.3 M) | 562 | 186 | Trichodesmium erythraeum IMS101 (15.6 M) | 584 | 2 |
| Trichodesmium erythraeum IMS101 (15.6 M) | 533 | 2 | Flavobacterium johnsoniae UW101 (12.2 M) | 481 | 10 |
| Burkholderia xenovorans LB400 (19.6 M) | 404 | None | Sorangium cellulosum So ce56 (26 M) | 309 | None |
| Shewanella MR-4 (9.4 M) | 329 | 14 | Shewanella oneidensis MR-1 (10.4 M) | 297 | 78 |
| Burkholderia ambifaria/cepacia AMMD (15 M) | 265 | 91 | Shewanella MR-4 (9.4 M) | 245 | 14 |
| Alkaliphilus metalliredigens QYMF (9.8 M) | 261 | None | Burkholderia cenocepacia HI2424 (15.5 M) | 219 | 102 |
| Shewanella MR-7 (9.6 M) | 250 | 26 | Shewanella MR-7 (9.6 M) | 206 | 26 |
| Acidobacteria bacterium Ellin345 (11.6 M) | 187 | None | Burkholderia xenovorans LB400 (19.6 M) | 198 | None |
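The overlap claim in the caption (seven strains shared by the 9-mer and 15-mer lists) can be checked directly from the table. The short Python sketch that follows intersects the two strain lists transcribed from Table 3, with spelling normalized so that identical strains compare equal; the variable and function names (`nbc_9mer`, `nbc_15mer`, `common_strains`) are illustrative only and are not part of the original study.

```python
# Strain names transcribed from Table 3 (NBC top-ranked strains).
# Spellings are normalized so the same strain matches across both lists.
nbc_9mer = [
    "Burkholderia 383",
    "Burkholderia cenocepacia AU 1054",
    "Clostridium beijerinckii NCIMB 8052",
    "Shewanella ANA-3",
    "Trichodesmium erythraeum IMS101",
    "Burkholderia xenovorans LB400",
    "Shewanella MR-4",
    "Burkholderia ambifaria/cepacia AMMD",
    "Alkaliphilus metalliredigens QYMF",
    "Shewanella MR-7",
    "Acidobacteria bacterium Ellin345",
]
nbc_15mer = [
    "Burkholderia 383",
    "Clostridium beijerinckii NCIMB 8052",
    "Shewanella ANA-3",
    "Trichodesmium erythraeum IMS101",
    "Flavobacterium johnsoniae UW101",
    "Sorangium cellulosum So ce56",
    "Shewanella oneidensis MR-1",
    "Shewanella MR-4",
    "Burkholderia cenocepacia HI2424",
    "Shewanella MR-7",
    "Burkholderia xenovorans LB400",
]


def common_strains(a, b):
    """Return the strains present in both lists, in the order they appear in a."""
    in_b = set(b)
    return [strain for strain in a if strain in in_b]


shared = common_strains(nbc_9mer, nbc_15mer)
print(len(shared))  # 7
for strain in shared:
    print(strain)
```

Only strain identity matters for the overlap, so genome sizes and read counts are left out of the lists.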