Erratum

An erratum for this article has been published.

Journal of Biomedicine and Biotechnology
Volume 2012 (2012), Article ID 153647, 11 pages
http://dx.doi.org/10.1155/2012/153647
Research Article

Unsupervised Two-Way Clustering of Metagenomic Sequences

Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA 16802, USA

Received 15 December 2011; Accepted 26 January 2012

Academic Editor: Wei Wang

Copyright © 2012 Shruthi Prabhakara and Raj Acharya. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. K. Chen and L. Pachter, “Bioinformatics for whole-genome shotgun sequencing of microbial communities,” PLoS Computational Biology, vol. 1, no. 2, article e24, pp. 0106–0112, 2005.
  2. M. S. Rappé and S. J. Giovannoni, “The uncultured microbial majority,” Annual Review of Microbiology, vol. 57, pp. 369–394, 2003.
  3. M. Pop, “Genome assembly reborn: recent computational challenges,” Briefings in Bioinformatics, vol. 10, no. 4, pp. 354–366, 2009.
  4. C. K. K. Chan, A. L. Hsu, S. K. Halgamuge, and S. L. Tang, “Binning sequences using very sparse labels within a metagenome,” BMC Bioinformatics, vol. 9, article no. 215, 2008.
  5. S. Nasser, A. Breland, F. Harris, and M. Nicolescu, “A fuzzy classifier to taxonomically group DNA fragments within a metagenome,” in Proceedings of the Annual Meeting of the North American Fuzzy Information Processing Society (NAFIPS '08), pp. 1–6, New York, NY, USA, 2008.
  6. D. H. Huson, A. F. Auch, J. Qi, and S. C. Schuster, “MEGAN analysis of metagenomic data,” Genome Research, vol. 17, no. 3, pp. 377–386, 2007.
  7. A. C. McHardy, H. G. Martín, A. Tsirigos, P. Hugenholtz, and I. Rigoutsos, “Accurate phylogenetic classification of variable-length DNA fragments,” Nature Methods, vol. 4, no. 1, pp. 63–72, 2007.
  8. A. Brady and S. L. Salzberg, “Phymm and PhymmBL: metagenomic phylogenetic classification with interpolated Markov models,” Nature Methods, vol. 6, no. 9, pp. 673–676, 2009.
  9. S. Nasser, A. Breland, F. Harris, and M. Nicolescu, “Metagenome fragment classification using n-mer frequency profiles,” in Proceedings of the Annual Meeting of the North American Fuzzy Information Processing Society (NAFIPS '08), pp. 1–6, New York, NY, USA, 2008.
  10. S. Chatterji, I. Yamazaki, Z. Bai, and J. A. Eisen, “CompostBin: a DNA composition-based algorithm for binning environmental shotgun reads,” Lecture Notes in Computer Science, vol. 4955, pp. 17–28, 2008.
  11. F. Gori, G. Folino, M. S. M. Jetten, and E. Marchiori, “MTR: taxonomic annotation of short metagenomic reads using clustering at multiple taxonomic ranks,” Bioinformatics, vol. 27, no. 2, pp. 196–203, 2011.
  12. L. Krause, N. N. Diaz, A. Goesmann et al., “Phylogenetic classification of short environmental DNA fragments,” Nucleic Acids Research, vol. 36, no. 7, pp. 2230–2239, 2008.
  13. S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, “Basic local alignment search tool,” Journal of Molecular Biology, vol. 215, no. 3, pp. 403–410, 1990.
  14. H. Teeling, J. Waldmann, T. Lombardot, M. Bauer, and F. O. Glöckner, “TETRA: a web-service and a stand-alone program for the analysis and comparison of tetranucleotide usage patterns in DNA sequences,” BMC Bioinformatics, vol. 5, no. 1, article no. 163, 2004.
  15. M. Bailly-Bechet, A. Danchin, M. Iqbal, M. Marsili, and M. Vergassola, “Codon usage domains over bacterial chromosomes,” PLoS Computational Biology, vol. 2, no. 4, article e37, pp. 263–275, 2006.
  16. S. D. Bentley and J. Parkhill, “Comparative genomic structure of prokaryotes,” Annual Review of Genetics, vol. 38, pp. 771–792, 2004.
  17. A. Kislyuk, S. Bhatnagar, J. Dushoff, and J. S. Weitz, “Unsupervised statistical clustering of environmental shotgun sequences,” BMC Bioinformatics, vol. 10, article no. 316, 2009.
  18. D. R. Kelley and S. L. Salzberg, “Clustering metagenomic sequences with interpolated Markov models,” BMC Bioinformatics, vol. 11, no. 1, article no. 544, 2010.
  19. Y.-W. Wu and Y. Ye, “A novel abundance-based algorithm for binning metagenomic sequences using l-tuples,” Lecture Notes in Computer Science, vol. 6044, pp. 535–549, 2010.
  20. S. Karlin, I. Ladunga, and B. E. Blaisdell, “Heterogeneity of genomes: measures and values,” Proceedings of the National Academy of Sciences of the United States of America, vol. 91, no. 26, pp. 12837–12841, 1994.
  21. G. Reinert, S. Schbath, and M. S. Waterman, “Probabilistic and statistical properties of words: an overview,” Journal of Computational Biology, vol. 7, no. 1-2, pp. 1–46, 2000.
  22. H. Teeling, A. Meyerdierks, M. Bauer, R. Amann, and F. O. Glöckner, “Application of tetranucleotide frequencies for the assignment of genomic fragments,” Environmental Microbiology, vol. 6, no. 9, pp. 938–947, 2004.
  23. V. Brendel, J. S. Beckmann, and E. N. Trifonov, “Linguistics of nucleotide sequences: morphology and comparison of vocabularies,” Journal of Biomolecular Structure and Dynamics, vol. 4, no. 1, pp. 11–21, 1986.
  24. P. J. Deschavanne, A. Giron, J. Vilain, G. Fagot, and B. Fertil, “Genomic signature: characterization and classification of species assessed by chaos game representation of sequences,” Molecular Biology and Evolution, vol. 16, no. 10, pp. 1391–1399, 1999.
  25. S. Robin, F. Rodolphe, and S. Schbath, DNA, Words and Models: Statistics of Exceptional Words, Cambridge University Press, 2005.
  26. Y. Song, Z. Zhuang, H. Li et al., “Real-time automatic tag recommendation,” in Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '08), pp. 515–522, ACM, New York, NY, USA, 2008.
  27. J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann, 1988.
  28. J. Li and H. Zha, “Two-way Poisson mixture models for simultaneous document classification and word clustering,” Computational Statistics and Data Analysis, vol. 50, no. 1, pp. 163–180, 2006.
  29. M. Qiao and J. Li, “Two-way Gaussian mixture models for high dimensional classification,” Statistical Analysis and Data Mining, vol. 3, no. 4, pp. 259–271, 2010.
  30. W. Feller, An Introduction to Probability Theory and Its Applications, vol. 1, Wiley, 1968.
  31. K. Mavromatis, N. Ivanova, K. Barry et al., “Use of simulated data sets to evaluate the fidelity of metagenomic processing methods,” Nature Methods, vol. 4, no. 6, pp. 495–500, 2007.
  32. R. Tibshirani and G. Walther, “Cluster validation by prediction strength,” Journal of Computational and Graphical Statistics, vol. 14, no. 3, pp. 511–528, 2005.
  33. J. R. Cole, B. Chai, R. J. Farris et al., “The Ribosomal Database Project (RDP-II): sequences and tools for high-throughput rRNA analysis,” Nucleic Acids Research, vol. 33, supplement 1, pp. D294–D296, 2005.
  34. B. Liu, T. Gibbons, M. Ghodsi, and M. Pop, “Metaphyler: taxonomic profiling for metagenomic sequences,” in Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine (BIBM '10), T. Park, S. K.-W. Tsui, L. Chen, M. K. Ng, L. Wong, and X. Hu, Eds., p. 95, IEEE Computer Society, 2010.
  35. F. Schreiber, P. Gumrich, R. Daniel, and P. Meinicke, “Treephyler: fast taxonomic profiling of metagenomes,” Bioinformatics, vol. 26, no. 7, Article ID btq070, pp. 960–961, 2010.
  36. J. Josse, A. D. Kaiser, and A. Kornberg, “Enzymatic synthesis of deoxyribonucleic acid. VIII. Frequencies of nearest neighbor base sequences in deoxyribonucleic acid,” The Journal of Biological Chemistry, vol. 236, pp. 864–875, 1961.
  37. G. J. Russell, P. M. B. Walker, R. A. Elton, and J. H. Subak-Sharpe, “Doublet frequency analysis of fractionated vertebrate nuclear DNA,” Journal of Molecular Biology, vol. 108, no. 1, pp. 1–20, 1976.
  38. A. Campbell, J. Mrázek, and S. Karlin, “Genome signature comparisons among prokaryote, plasmid, and mitochondrial DNA,” Proceedings of the National Academy of Sciences of the United States of America, vol. 96, no. 16, pp. 9184–9189, 1999.
  39. G. W. Tyson, J. Chapman, P. Hugenholtz et al., “Community structure and metabolism through reconstruction of microbial genomes from the environment,” Nature, vol. 428, no. 6978, pp. 37–43, 2004.