Research Article

Metagenome Fragment Classification Using 𝑁-Mer Frequency Profiles

Table 2

A total of 63 500 fragments of 25 bp, 100 from each genome, are BLASTed and compared to the NBC. BLAST gives 66% of them unique top-scoring hits, all of which are correct. Almost 34% of the reads have ambiguous top-scoring hits, meaning that multiple organisms share the top score and E-value. Also, even though the exact string or its complement exists in the database, 287 fragments receive no hit from BLAST with an E-value of 3000. NBC correctly identifies 71% of those. Because any one of the multiple top-scoring genomes can be chosen at random as the top hit, we can compare directly how often BLAST would identify the correct genome with how often the NBC does. Taking this and the unique top hits into consideration, NBC classified 48118 (75.8%) fragments correctly, while BLAST matched 47889 (75.4%) fragments correctly.

63 500 fragments

BLAST category | Interpretation of BLAST results | NBC's results for the BLAST category
No. of reads that had unique top-scoring hits in BLAST | No. that BLAST got correct | No. that NBC got correct
41641 | 41641 | 41211
No. of reads that had multiple top-scoring hits in BLAST | BLAST hits for reads where the multiple top-scoring list contained the correct one / no. of unique top hits BLAST would get by chance from ambiguous hits | No. for which NBC got a correct, unique top hit
21572 | 21559 / 6248 | 6702
Reads that had no hits in BLAST (E-value of 3000) | Could not be assigned in BLAST | No. that NBC got correct
287 | 0 | 205
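
The totals quoted in the caption can be rechecked from the table cells. The following is a minimal sketch, not the authors' code: the dictionary layout and the helper names `totals` and `percent` are assumptions made for illustration, and the chance-level BLAST count for ambiguous hits (6248) is taken from the table as given rather than recomputed, since the per-read tie lists are not shown here.

```python
# Counts copied from Table 2; the structure and names are illustrative only.
TABLE_2 = {
    # reads: fragments in the BLAST category
    # blast_correct: fragments BLAST is credited with (by chance for ties)
    # nbc_correct: fragments NBC classified correctly
    "unique_top_hit":    {"reads": 41641, "blast_correct": 41641, "nbc_correct": 41211},
    "multiple_top_hits": {"reads": 21572, "blast_correct": 6248,  "nbc_correct": 6702},
    "no_hit":            {"reads": 287,   "blast_correct": 0,     "nbc_correct": 205},
}


def totals(table):
    """Sum reads, BLAST-correct and NBC-correct counts over all categories."""
    reads = sum(row["reads"] for row in table.values())
    blast = sum(row["blast_correct"] for row in table.values())
    nbc = sum(row["nbc_correct"] for row in table.values())
    return reads, blast, nbc


def percent(count, total):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


if __name__ == "__main__":
    reads, blast, nbc = totals(TABLE_2)
    print(f"fragments: {reads}")
    print(f"BLAST correct: {blast} ({percent(blast, reads)}%)")
    print(f"NBC correct:   {nbc} ({percent(nbc, reads)}%)")
    for name, row in TABLE_2.items():
        share = percent(row["reads"], reads)
        print(f"{name}: {row['reads']} reads ({share}% of all fragments)")
```

Summing the three categories recovers the 63 500 fragments, and the per-category shares match the roughly 66% and 34% figures quoted in the caption.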