Research Article

Unsupervised Two-Way Clustering of Metagenomic Sequences

Table 1

Performance of the Gaussian mixture model (without word grouping) on datasets containing more than two species at various abundance ratios, using reads of length 500 bp. AR stands for abundance ratio.

Species                   AR   Number of reads   Accuracy (%)

T. thermophilus           1    50000             87.51
A. vinelandii             3
N. meningitidis           2

E. coli 536               1    50000             97.01
S. acidocaldarius         2
H. salinarum R1           2

C. jejuni RM1221          3    60000             96.61
H. salinarum R1           2
E. coli                   1
P. horikoshii OT3         3

S. erythraea              1    60000             90.28
M. thermoautotrophicum    1
B. burgdorferi ZS7        1
E. coli 536               1

B. burgdorferi ZS7        1    75000             85.04
C. jejuni RM1221          1
E. coli 536               1
H. salinarum R1           1
P. horikoshii OT3         1