Research Article

Metagenome Fragment Classification Using 𝑁-Mer Frequency Profiles

Figure 6

Accuracy of the naive Bayes classifier versus N-mer length and fragment length for strain-level classification across the 635 completed microbial genomes. The graph shows that accuracy improves when longer N-mers are used in the scoring function. As expected, 500 bp fragments performed best, reaching 88.8% strain-level accuracy, compared with 82.5% for 100 bp fragments. Surprisingly, accuracy for the 25 bp fragments also improved when 15-mers were used, reaching 75.8%. There is a jump in accuracy around a certain range of N-mer lengths, which provides insight into the N-mer order needed for classification.
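To make the scoring idea in the caption concrete, the following is a minimal sketch of naive Bayes classification over N-mer frequency profiles. It is not the authors' implementation: the function names, the add-one smoothing, the equal class priors, and the toy genomes are all assumptions made for illustration.

```python
import math
from collections import Counter


def nmer_counts(seq, n):
    """Count overlapping N-mers in a DNA sequence."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def train_profile(genome, n):
    """Build a log-probability N-mer profile for one genome.

    Add-one smoothing is an assumption of this sketch, used so that
    N-mers absent from the genome do not get zero probability.
    """
    counts = nmer_counts(genome, n)
    total = sum(counts.values())
    vocab = 4 ** n  # number of possible N-mers over A, C, G, T
    log_probs = {nmer: math.log((c + 1) / (total + vocab))
                 for nmer, c in counts.items()}
    unseen = math.log(1 / (total + vocab))
    return log_probs, unseen


def score(fragment, profile, n):
    """Sum the log-probabilities of the fragment's N-mers (naive independence)."""
    log_probs, unseen = profile
    return sum(log_probs.get(fragment[i:i + n], unseen)
               for i in range(len(fragment) - n + 1))


def classify(fragment, profiles, n):
    """Return the genome whose profile gives the highest score (equal priors assumed)."""
    return max(profiles, key=lambda name: score(fragment, profiles[name], n))


# Hypothetical toy genomes; real profiles would be built from complete genomes.
N = 3
genomes = {
    "strain_A": "ACGTACGTACGTACGTACGT",
    "strain_B": "GGGCCCGGGCCCGGGCCCGG",
}
profiles = {name: train_profile(seq, N) for name, seq in genomes.items()}
prediction = classify("ACGTACG", profiles, N)
```

In this toy setup the fragment `ACGTACG` is assigned to `strain_A`, because all of its 3-mers occur in that genome and none occur in `strain_B`. Raising `N` makes each N-mer more specific to its genome, but it also leaves fewer N-mers per fragment, which is the trade-off the figure explores across fragment lengths.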