Advances in Bioinformatics
Volume 2015, Article ID 635437, 9 pages
http://dx.doi.org/10.1155/2015/635437
Research Article

Developing of the Computer Method for Annotation of Bacterial Genes

1Bioinformatics Laboratory, Centre of Bioengineering, Russian Academy of Sciences, Prospekt 60-letiya Oktyabrya 7/1, Moscow 117312, Russia
2Cybernetics Department, National Research Nuclear University “MEPhI”, Kashirskoe shosse 31, Moscow 115409, Russia

Received 15 September 2015; Revised 16 November 2015; Accepted 18 November 2015

Academic Editor: Bhaskar Dasgupta

Copyright © 2015 Mikhail A. Golyshev and Eugene V. Korotkov. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. F. Eisenhaber, “A decade after the first full human genome sequencing: when will we understand our own genome?” Journal of Bioinformatics and Computational Biology, vol. 10, no. 5, Article ID 1271001, 2012.
  2. M. Janitz, “Assigning functions to genes—the main challenge of the post-genomics era,” Reviews of Physiology, Biochemistry and Pharmacology, vol. 159, pp. 115–129, 2007.
  3. A. Saghatelian and B. F. Cravatt, “Assignment of protein function in the postgenomic era,” Nature Chemical Biology, vol. 1, no. 3, pp. 130–142, 2005.
  4. M. Y. Galperin and E. V. Koonin, “From complete genome sequence to ‘complete’ understanding?” Trends in Biotechnology, vol. 28, no. 8, pp. 398–406, 2010.
  5. E. J. Richardson and M. Watson, “The automatic annotation of bacterial genomes,” Briefings in Bioinformatics, vol. 14, no. 1, Article ID bbs007, pp. 1–12, 2013.
  6. W. Li, J. Freudenberg, and M. Oswald, “Principles for the organization of gene-sets,” Computational Biology and Chemistry, 2015.
  7. S. B. Pandit, S. Balaji, and N. Srinivasan, “Structural and functional characterization of gene products encoded in the human genome by homology detection,” IUBMB Life, vol. 56, no. 6, pp. 317–331, 2004.
  8. I. Friedberg, “Automated protein function prediction—the genomic challenge,” Briefings in Bioinformatics, vol. 7, no. 3, pp. 225–242, 2006.
  9. S. B. Needleman and C. D. Wunsch, “A general method applicable to the search for similarities in the amino acid sequence of two proteins,” Journal of Molecular Biology, vol. 48, no. 3, pp. 443–453, 1970.
  10. T. F. Smith and M. S. Waterman, “Identification of common molecular subsequences,” Journal of Molecular Biology, vol. 147, no. 1, pp. 195–197, 1981.
  11. S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, “Basic local alignment search tool,” Journal of Molecular Biology, vol. 215, no. 3, pp. 403–410, 1990.
  12. W. R. Pearson and D. J. Lipman, “Improved tools for biological sequence comparison,” Proceedings of the National Academy of Sciences of the United States of America, vol. 85, no. 8, pp. 2444–2448, 1988.
  13. K. D. Pruitt, T. Tatusova, and D. R. Maglott, “NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins,” Nucleic Acids Research, vol. 33, pp. D501–D504, 2005.
  14. D. A. Benson, M. Cavanaugh, K. Clark et al., “GenBank,” Nucleic Acids Research, vol. 41, no. 1, pp. D36–D42, 2013.
  15. M. Kanehisa, S. Goto, S. Kawashima, Y. Okuno, and M. Hattori, “The KEGG resource for deciphering the genome,” Nucleic Acids Research, vol. 32, pp. D277–D280, 2004.
  16. The UniProt Consortium, “Ongoing and future developments at the universal protein resource,” Nucleic Acids Research, vol. 39, supplement 1, pp. D214–D219, 2011.
  17. A. Bairoch and R. Apweiler, “The SWISS-PROT protein sequence data bank and its supplement TrEMBL in 1999,” Nucleic Acids Research, vol. 27, no. 1, pp. 49–54, 1999.
  18. R. D. Finn, J. Mistry, J. Tate et al., “The Pfam protein families database,” Nucleic Acids Research, vol. 38, no. 1, Article ID gkp985, pp. D211–D222, 2009.
  19. D. H. Haft, J. D. Selengut, and O. White, “The TIGRFAMs database of protein families,” Nucleic Acids Research, vol. 31, no. 1, pp. 371–373, 2003.
  20. S. F. Altschul, T. L. Madden, A. A. Schäffer et al., “Gapped BLAST and PSI-BLAST: a new generation of protein database search programs,” Nucleic Acids Research, vol. 25, no. 17, pp. 3389–3402, 1997.
  21. X. Deng and H. Ali, “A hidden Markov model for gene function prediction from sequential expression data,” in Proceedings of the IEEE Computational Systems Bioinformatics Conference (CSB '04), pp. 670–671, August 2004.
  22. R. L. Tatusov, M. Y. Galperin, D. A. Natale, and E. V. Koonin, “The COG database: a tool for genome-scale analysis of protein functions and evolution,” Nucleic Acids Research, vol. 28, no. 1, pp. 33–36, 2000.
  23. S. Hunter, P. Jones, A. Mitchell et al., “InterPro in 2011: new developments in the family and domain prediction database,” Nucleic Acids Research, vol. 40, no. 1, pp. D306–D312, 2012.
  24. M. Ashburner, C. A. Ball, J. A. Blake et al., “Gene ontology: tool for the unification of biology,” Nature Genetics, vol. 25, no. 1, pp. 25–29, 2000.
  25. E. Quevillon, V. Silventoinen, S. Pillai et al., “InterProScan: protein domains identifier,” Nucleic Acids Research, vol. 33, no. 2, pp. W116–W120, 2005.
  26. V. M. Markowitz, I.-M. A. Chen, K. Palaniappan et al., “IMG: the integrated microbial genomes database and comparative analysis system,” Nucleic Acids Research, vol. 40, no. 1, pp. D115–D122, 2012.
  27. D. M. Tanenbaum, J. Goll, S. Murphy et al., “The JCVI standard operating procedure for annotating prokaryotic metagenomic shotgun sequencing data,” Standards in Genomic Sciences, vol. 2, no. 2, pp. 229–237, 2010.
  28. R. K. Aziz, D. Bartels, A. Best et al., “The RAST Server: rapid annotations using subsystems technology,” BMC Genomics, vol. 9, article 75, 2008.
  29. F. Meyer, A. Goesmann, A. C. McHardy et al., “GenDB—an open source genome annotation system for prokaryote genomes,” Nucleic Acids Research, vol. 31, no. 8, pp. 2187–2195, 2003.
  30. G. F. Weiller, “Phylogenetic profiles: a graphical method for detecting genetic recombinations in homologous sequences,” Molecular Biology and Evolution, vol. 15, no. 3, pp. 326–335, 1998.
  31. M. Pellegrini, E. M. Marcotte, M. J. Thompson, D. Eisenberg, and T. O. Yeates, “Assigning protein functions by comparative genome analysis: protein phylogenetic profiles,” Proceedings of the National Academy of Sciences of the United States of America, vol. 96, no. 8, pp. 4285–4288, 1999.
  32. M. Pellegrini, “Using phylogenetic profiles to predict functional relationships,” in Methods in Molecular Biology, J. Helden, A. Toussaint, and D. Thieffry, Eds., vol. 804, pp. 167–177, 2012.
  33. P. R. Kensche, V. Van Noort, B. E. Dutilh, and M. A. Huynen, “Practical and theoretical advances in predicting the function of a protein by its phylogenetic distribution,” Journal of the Royal Society Interface, vol. 5, no. 19, pp. 151–170, 2008.
  34. S. C. Rastogi, N. Mendiratta, and P. Rastogi, Bioinformatics Methods and Applications: Genomics, Proteomics and Drug Discovery, PHI Learning, 2006.
  35. D. E. Raeside, “Monte Carlo principles and applications,” Physics in Medicine and Biology, vol. 21, no. 2, pp. 181–197, 1976.
  36. J. A. Eisen, “Phylogenomics: improving functional predictions for uncharacterized genes by evolutionary analysis,” Genome Research, vol. 8, no. 3, pp. 163–167, 1998.
  37. W. Feller, An Introduction to Probability Theory and Its Applications, Wiley, New York, NY, USA, 3rd edition, 1968.
  38. J. J. Shuster, “Hypergeometric distribution,” in Encyclopedia of Biostatistics, 2005.
  39. P. Kharchenko, L. Chen, Y. Freund, D. Vitkup, and G. M. Church, “Identifying metabolic enzymes with multiple types of association evidence,” BMC Bioinformatics, vol. 7, article 177, 2006.
  40. R. Jothi, T. M. Przytycka, and L. Aravind, “Discovering functional linkages and uncharacterized cellular pathways using phylogenetic profile comparisons: a comprehensive assessment,” BMC Bioinformatics, vol. 8, article 173, 2007.
  41. S. V. Date and E. M. Marcotte, “Discovery of uncharacterized cellular systems by genome-wide analysis of functional linkages,” Nature Biotechnology, vol. 21, no. 9, pp. 1055–1062, 2003.
  42. F. E. Frenkel and E. V. Korotkov, “Classification analysis of triplet periodicity in protein-coding regions of genes,” Gene, vol. 421, no. 1-2, pp. 52–60, 2008.