Metagenome Fragment Classification Using N-Mer Frequency Profiles
Figure 6
Accuracy of the naive Bayes classifier as a function of N-mer length and fragment length for
strain-level classification of the 635 completed microbial genomes. The graph
shows that accuracy improves when longer N-mers are used in the scoring function. As
expected, 500 bp fragments performed best, reaching 88.8% strain-level accuracy,
compared with 82.5% for 100 bp fragments. Surprisingly, the 25 bp fragments
also improved when 15-mers were used, reaching 75.8%. There is a jump in
accuracy at around the range which provides insight into the order
needed for classification.
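
As an illustration of the kind of scoring function the caption refers to, the following is a minimal sketch of naive Bayes classification over N-mer counts. It is not the authors' implementation: the function names, the add-one pseudocount, and the uniform prior over genomes are all assumptions made for this example, and the toy sequences are invented.

```python
import math
from collections import Counter


def nmer_counts(seq, n):
    """Count every overlapping N-mer of length n in seq."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def train_profile(genome, n, pseudocount=1.0):
    """Build an N-mer profile for one genome.

    Returns the raw counts and a smoothed denominator. The pseudocount
    (an assumption of this sketch) keeps unseen N-mers from having
    zero probability.
    """
    counts = nmer_counts(genome, n)
    total = sum(counts.values()) + pseudocount * 4 ** n
    return counts, total


def log_score(fragment, profile, n, pseudocount=1.0):
    """Sum of log-probabilities of the fragment's N-mers under a profile.

    Treating N-mers as independent is the naive Bayes assumption.
    """
    counts, total = profile
    return sum(
        math.log((counts.get(fragment[i:i + n], 0) + pseudocount) / total)
        for i in range(len(fragment) - n + 1)
    )


def classify(fragment, profiles, n):
    """Return the genome name with the highest score (uniform prior assumed)."""
    return max(profiles, key=lambda name: log_score(fragment, profiles[name], n))


# Toy usage with two invented "genomes"
n = 3
profiles = {
    "genome_a": train_profile("ACGT" * 50, n),
    "genome_b": train_profile("GGCC" * 50, n),
}
label = classify("ACGTACGTACGT", profiles, n)
```

In this sketch, longer N-mers make each profile more specific to its genome, at the cost of a table with 4^N possible entries, which is one way to read the trade-off between N-mer length and accuracy shown in the figure.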