Research Article

Metagenome Fragment Classification Using 𝑁-Mer Frequency Profiles

Figure 4

A log-log plot of the 𝑁-mer frequencies versus 𝑁-mers in ranked order for various E. coli strains (K12 is the commensal strain, O157H7 is highly pathogenic, and HS is the commensal isolate from the human gastrointestinal tract). All of the E. coli strains share a characteristic curve in this domain. The curve is compared with Zipf's law, which states that 𝑁-mer frequency is proportional to the inverse of its rank. While E. coli generally obeys this law, the deviation of the curve from a straight line shows that higher-ranking words have a higher normal frequency.
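As a rough illustration of the rank-frequency comparison described in this caption, the following Python sketch counts overlapping 𝑁-mers in a sequence, sorts their normalized frequencies by rank, and fits a straight line in log-log space. Under an ideal Zipf's law (frequency proportional to 1/rank) the fitted slope would be -1. The function names `nmer_rank_frequency` and `loglog_slope` and the toy sequence are assumptions made for illustration only; they are not taken from the article and do not reproduce the E. coli data in the figure.

```python
from collections import Counter
import math


def nmer_rank_frequency(sequence, n):
    """Return normalized N-mer frequencies, ordered from rank 1 (most common) down."""
    # Count every overlapping window of length n.
    counts = Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))
    total = sum(counts.values())
    return [count / total for _, count in counts.most_common()]


def loglog_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank).

    Needs at least two ranks; an ideal Zipf distribution gives a slope of -1.
    """
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical toy sequence, used only to exercise the functions.
toy = "ATGCGATACGCTTGAGCATGCATGCCGATAGCTAGCTAGGATCCGATCGATTACG"
freqs = nmer_rank_frequency(toy, 2)
print("distinct 2-mers:", len(freqs))
print("fitted log-log slope:", loglog_slope(freqs))
```

A single fitted slope summarizes only the straight-line part of the relationship; curvature of the kind the caption describes would appear as systematic residuals around that line, so plotting `freqs` against rank on log-log axes remains the more informative check.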