Research Article

Metagenome Fragment Classification Using N-Mer Frequency Profiles

Table 4

MEGAN's top-ten strains for the Sargasso Sea dataset, their respective read counts, and a comparison with the NBC 9-mer and 15-mer methods. N/A means the strain is not in our training set (its genome is unfinished, so it cannot be found). Burkholderia and Shewanella, which were also found by Venter et al. [33], also have high matches in the NBC. The NBC's detection of Candidatus Pelagibacter changes drastically from to .

High-content strain in sample (genome size, both strands) | MEGAN # of reads | NBC 9-mer # of reads | NBC 15-mer # of reads

Burkholderia 383 (9.3 M) | 5146932044
Candidatus Pelagibacter ubique HTCC1062 (2.6 M) | 32313111
Shewanella ANA-3 (10.3 M) | 186484989
Prochlorococcus marinus MIT 9312 (3.4 M) | 1252824
Psychroflexus torquis ATCC 700755 (8.6 M) | 119 | N/A | N/A
Burkholderia cenocepacia HI2424 (15.52 M) | 102106219
Burkholderia vietnamiensis G4 (16.8 M) | 1019392
Burkholderia ambifaria/cepacia AMMD (15.06 M) | 91265127
Shewanella oneidensis MR-1 (10.32 M) | 7879297
Synechococcus sp. WH8102 (4.86 M) | 756882
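The NBC columns count reads assigned by a naive Bayes classifier over N-mer frequency profiles. As a rough illustration only, the Python sketch below shows one common form of such a classifier: each genome's profile is built from both strands (as the doubled genome sizes in the table suggest), a read is scored by the sum of smoothed log N-mer probabilities under each profile with equal priors, and the read goes to the best-scoring genome. The function names, the add-one pseudocount, and the toy genomes are hypothetical choices for illustration, not the authors' implementation.

```python
import math
from collections import Counter

# Complement table for A/C/G/T; other symbols (e.g. N) are not handled in this sketch.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def nmer_counts(seq: str, n: int) -> Counter:
    """Count every overlapping n-mer in seq."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def train_profile(genome: str, n: int) -> tuple[Counter, int]:
    """Build an n-mer frequency profile from both strands of a genome.

    Counting the reverse complement as well doubles the effective
    training length (assumption mirroring the 'both strands' sizes).
    """
    counts = nmer_counts(genome, n) + nmer_counts(reverse_complement(genome), n)
    return counts, sum(counts.values())


def log_likelihood(read: str, profile: tuple[Counter, int], n: int,
                   pseudocount: float = 1.0) -> float:
    """Naive Bayes score: sum of smoothed log P(n-mer | genome),
    treating the read's n-mers as independent."""
    counts, total = profile
    vocab = 4 ** n  # number of possible n-mers over ACGT
    return sum(
        math.log((counts[kmer] + pseudocount) / (total + pseudocount * vocab))
        for kmer in (read[i:i + n] for i in range(len(read) - n + 1))
    )


def classify(read: str, profiles: dict, n: int) -> str:
    """Assign the read to the genome with the highest score (equal priors)."""
    return max(profiles, key=lambda name: log_likelihood(read, profiles[name], n))


if __name__ == "__main__":
    # Hypothetical toy genomes; a real run would train on finished genomes.
    genomes = {
        "genome_A": "ATATATATATGCGCATATATATAT",
        "genome_B": "GGGCCCGGGCCCTTTAAAGGGCCC",
    }
    n = 3
    profiles = {name: train_profile(seq, n) for name, seq in genomes.items()}
    print(classify("ATATATAT", profiles, n))  # prints genome_A
```

With n = 3 the script prints genome_A, because the read's ATA and TAT 3-mers occur only in that toy genome. Setting n to 9 or 15 would correspond to the two NBC settings compared in the table, although such long N-mers only carry useful signal when the profiles are trained on real genomes.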