Table of Contents
ISRN Biomathematics
Volume 2013 (2013), Article ID 615630, 14 pages
http://dx.doi.org/10.1155/2013/615630
Review Article

An Overview of Multiple Sequence Alignments and Cloud Computing in Bioinformatics

1Department of Biological Sciences, Cork Institute of Technology, Rossa Avenue, Bishopstown, Cork, Ireland
2Department of Computing, Cork Institute of Technology, Rossa Avenue, Bishopstown, Cork, Ireland

Received 24 May 2013; Accepted 23 June 2013

Academic Editors: M. Glavinovic and X.-Y. Lou

Copyright © 2013 Jurate Daugelaite et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. C. Kemena and C. Notredame, “Upcoming challenges for multiple sequence alignment methods in the high-throughput era,” Bioinformatics, vol. 25, no. 19, pp. 2455–2465, 2009. View at Publisher · View at Google Scholar · View at Scopus
  2. R. C. Edgar and S. Batzoglou, “Multiple sequence alignment,” Current Opinion in Structural Biology, vol. 16, no. 3, pp. 368–373, 2006. View at Publisher · View at Google Scholar · View at Scopus
  3. C. Notredame, “Recent evolutions of multiple sequence alignment algorithms,” PLoS Computational Biology, vol. 3, no. 8, article e123, 2007. View at Publisher · View at Google Scholar · View at Scopus
  4. Human Genome Project Information, 2013, http://web.ornl.gov/sci/techresources/Human_Genome/home.shtml.
  5. “Home, 1000 Genomes,” 2013, http://www.1000genomes.org/.
  6. G. K. C. O. Scientists, “Genome 10K: a proposal to obtain whole-genome sequence for 10, 000 vertebrate species,” Journal of Heredity, vol. 100, no. 6, pp. 659–674, 2009. View at Google Scholar
  7. 454 Life Sciences, a Roche Company, 2013, http://www.454.com/.
  8. Illumina, Inc, 2013, https://www.illumina.com/.
  9. SOLiDTM 4 System, 2013, http://www.appliedbiosystems.com/absite/us/en/home/applications-technologies/solid-next-generation-sequencing/next-generation-systems/solid-4-system.html?CID=FL-091411_solid4.
  10. H. Li and N. Homer, “A survey of sequence alignment algorithms for next-generation sequencing,” Briefings in Bioinformatics, vol. 11, no. 5, pp. 473–483, 2010. View at Publisher · View at Google Scholar · View at Scopus
  11. SourceForge.net: jnomics, 2013, http://sourceforge.net/apps/mediawiki/jnomics/index.php?title=Jnomics.
  12. C. B. Do and K. Katoh, “Protein multiple sequence alignment,” Methods in Molecular Biology, vol. 484, pp. 379–413, 2008. View at Publisher · View at Google Scholar · View at Scopus
  13. R. C. Edgar, “MUSCLE: a multiple sequence alignment method with reduced time and space complexity,” BMC Bioinformatics, vol. 5, article 113, 2004. View at Publisher · View at Google Scholar · View at Scopus
  14. S. B. Needleman and C. D. Wunsch, “A general method applicable to the search for similarities in the amino acid sequence of two proteins,” Journal of Molecular Biology, vol. 48, no. 3, pp. 443–453, 1970. View at Google Scholar · View at Scopus
  15. T. F. Smith and M. S. Waterman, “Identification of common molecular subsequences,” Journal of Molecular Biology, vol. 147, no. 1, pp. 195–197, 1981. View at Google Scholar · View at Scopus
  16. I. M. Wallace, G. Blackshields, and D. G. Higgins, “Multiple sequence alignments,” Current Opinion in Structural Biology, vol. 15, no. 3, pp. 261–266, 2005. View at Publisher · View at Google Scholar · View at Scopus
  17. K. Katoh and H. Toh, “Recent developments in the MAFFT multiple sequence alignment program,” Briefings in Bioinformatics, vol. 9, no. 4, pp. 286–298, 2008. View at Publisher · View at Google Scholar · View at Scopus
  18. D.-F. Feng and R. F. Doolittle, “Progressive sequence alignment as a prerequisitetto correct phylogenetic trees,” Journal of Molecular Evolution, vol. 25, no. 4, pp. 351–360, 1987. View at Publisher · View at Google Scholar · View at Scopus
  19. W. J. Wilbur and D. J. Lipman, “Rapid similarity searches of nucleic acid and protein data banks,” Proceedings of the National Academy of Sciences of the United States of America, vol. 80, no. 3, pp. 726–730, 1983. View at Google Scholar · View at Scopus
  20. R. C. Edgar, “MUSCLE: multiple sequence alignment with high accuracy and high throughput,” Nucleic Acids Research, vol. 32, no. 5, pp. 1792–1797, 2004. View at Publisher · View at Google Scholar · View at Scopus
  21. F. Sievers, A. Wilm, D. Dineen et al., “Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega,” Molecular Systems Biology, vol. 7, article 539, 2011. View at Publisher · View at Google Scholar · View at Scopus
  22. N. Saitou and M. Nei, “The neighbor-joining method: a new method for reconstructing phylogenetic trees,” Molecular Biology and Evolution, vol. 4, no. 4, pp. 406–425, 1987. View at Google Scholar · View at Scopus
  23. I. Gronau and S. Moran, “Optimal implementations of UPGMA and other common clustering algorithms,” Information Processing Letters, vol. 104, no. 6, pp. 205–210, 2007. View at Publisher · View at Google Scholar · View at Scopus
  24. J. D. Thompson, D. G. Higgins, and T. J. Gibson, “CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice,” Nucleic Acids Research, vol. 22, no. 22, pp. 4673–4680, 1994. View at Google Scholar · View at Scopus
  25. K. Katoh and D. M. Standley, “MAFFT multiple sequence alignment software version 7: improvements in performance and usability,” Molecular Biology and Evolution, vol. 30, no. 4, pp. 772–780, 2013. View at Google Scholar
  26. T. Lassmann and E. L. L. Sonnhammer, “Kalign—an accurate and fast multiple sequence alignment algorithm,” BMC Bioinformatics, vol. 6, article 298, 2005. View at Publisher · View at Google Scholar · View at Scopus
  27. U. Roshan and D. R. Livesay, “Probalign: multiple sequence alignment using partition function posterior probabilities,” Bioinformatics, vol. 22, no. 22, pp. 2715–2721, 2006. View at Publisher · View at Google Scholar · View at Scopus
  28. B. Morgenstern, “DIALIGN: multiple DNA and protein sequence alignment at BiBiServ,” Nucleic Acids Research, vol. 32, supplement 2, pp. W33–W36, 2004. View at Publisher · View at Google Scholar · View at Scopus
  29. A. Löytynoja and N. Goldman, “Phylogeny-aware gap placement prevents errors in sequence alignment and evolutionary analysis,” Science, vol. 320, no. 5883, pp. 1632–1635, 2008. View at Publisher · View at Google Scholar · View at Scopus
  30. R. K. Bradley, A. Roberts, M. Smoot et al., “Fast statistical alignment,” PLoS Computational Biology, vol. 5, no. 5, Article ID e1000392, 2009. View at Publisher · View at Google Scholar · View at Scopus
  31. P. Di Tommaso, S. Moretti, I. Xenarios et al., “T-Coffee: a web server for the multiple sequence alignment of protein and RNA sequences using structural information and homology extension,” Nucleic Acids Research, vol. 39, supplement 2, pp. W13–W17, 2011. View at Publisher · View at Google Scholar · View at Scopus
  32. C. Notredame, D. G. Higgins, and J. Heringa, “T-coffee: a novel method for fast and accurate multiple sequence alignment,” Journal of Molecular Biology, vol. 302, no. 1, pp. 205–217, 2000. View at Publisher · View at Google Scholar · View at Scopus
  33. C. B. Do, M. S. P. Mahabhashyam, M. Brudno, and S. Batzoglou, “ProbCons: probabilistic consistency-based multiple sequence alignment,” Genome Research, vol. 15, no. 2, pp. 330–340, 2005. View at Publisher · View at Google Scholar · View at Scopus
  34. Y. Liu, B. Schmidt, and D. L. Maskell, “MSAProbs: multiple sequence alignment based on pair hidden Markov models and partition function posterior probabilities,” Bioinformatics, vol. 26, no. 16, pp. 1958–1964, 2010. View at Publisher · View at Google Scholar · View at Scopus
  35. D. W. Mount, “Using iterative methods for global multiple sequence alignment,” Cold Spring Harbor Protocols, vol. 4, no. 7, Article ID pdb.top44, 2009. View at Publisher · View at Google Scholar · View at Scopus
  36. O. Gotoh, “Optimal alignment between groups of sequences and its application to multiple sequence alignment,” Computer Applications in the Biosciences, vol. 9, no. 3, pp. 361–370, 1993. View at Google Scholar · View at Scopus
  37. C. Notredame and D. G. Higgins, “SAGA: sequence alignment by genetic algorithm,” Nucleic Acids Research, vol. 24, no. 8, pp. 1515–1524, 1996. View at Publisher · View at Google Scholar · View at Scopus
  38. J. D. Thompson, F. Plewniak, and O. Poch, “A comprehensive comparison of multiple sequence alignment programs,” Nucleic Acids Research, vol. 27, no. 13, pp. 2682–2690, 1999. View at Publisher · View at Google Scholar · View at Scopus
  39. A. M. Lesk and C. Chothia, “How different amino acid sequences determine similar protein structures: the structure and evolutionary dynamics of the globins,” Journal of Molecular Biology, vol. 136, no. 3, pp. 225–270, 1980. View at Google Scholar · View at Scopus
  40. O. O'Sullivan, K. Suhre, C. Abergel, D. G. Higgins, and C. Notredame, “3DCoffee: combining protein sequences and structures within multiple sequence alignments,” Journal of Molecular Biology, vol. 340, no. 2, pp. 385–395, 2004. View at Publisher · View at Google Scholar · View at Scopus
  41. F. Armougom, S. Moretti, O. Poirot et al., “Expresso: automatic incorporation of structural information in multiple sequence alignments using 3D-Coffee,” Nucleic Acids Research, vol. 34, supplement 2, pp. W604–W608, 2006. View at Publisher · View at Google Scholar · View at Scopus
  42. X. Xia, S. Zhang, Y. Su, and Z. Sun, “MICAlign: a sequence-to-structure alignment tool integrating multiple sources of information in conditional random fields,” Bioinformatics, vol. 25, no. 11, pp. 1433–1434, 2009. View at Publisher · View at Google Scholar · View at Scopus
  43. Z. Zhang, A. A. Schäffer, W. Miller et al., “Protein sequence similarity searches using patterns as seeds,” Nucleic Acids Research, vol. 26, no. 17, pp. 3986–3990, 1998. View at Google Scholar · View at Scopus
  44. M. C. Frith, N. F. W. Saunders, B. Kobe, and T. L. Bailey, “Discovering sequence motifs with arbitrary insertions and deletions,” PLoS Computational Biology, vol. 4, no. 4, Article ID e1000071, 2008. View at Google Scholar · View at Scopus
  45. H. Li, J. Ruan, and R. Durbin, “Mapping short DNA sequencing reads and calling variants using mapping quality scores,” Genome Research, vol. 18, no. 11, pp. 1851–1858, 2008. View at Publisher · View at Google Scholar · View at Scopus
  46. R. Li, Y. Li, K. Kristiansen, and J. Wang, “SOAP: short oligonucleotide alignment program,” Bioinformatics, vol. 24, no. 5, pp. 713–714, 2008. View at Publisher · View at Google Scholar · View at Scopus
  47. B. Langmead, C. Trapnell, M. Pop, and S. L. Salzberg, “Ultrafast and memory-efficient alignment of short DNA sequences to the human genome,” Genome Biology, vol. 10, no. 3, article R25, 2009. View at Publisher · View at Google Scholar · View at Scopus
  48. “A Definition of The Cloud at Last?—Web Performance Watch,” 2013, http://blogs.keynote.com/the_watch/2010/05/for-as-long-as-ive-been-involved-in-writing-about-the-cloud-and-its-related-applications-ive-seen-the-basic-question.html.
  49. “Virtualization is a key enabler of cloud computing,” 2013, http://www.itnewsafrica.com/2010/03/virtualization-is-a-key-enabler-of-cloud-computing/.
  50. D. Arthur and S. Vassilvitskii, “k-means++: the advantages of careful seeding,” in Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, Society for Industrial and Applied Mathematics, 2007.
  51. J. Söding, “Protein homology detection by HMM-HMM comparison,” Bioinformatics, vol. 21, no. 7, pp. 951–960, 2005. View at Publisher · View at Google Scholar · View at Scopus
  52. F. Sievers, D. Dineen, A. Wilm, and D. G. Higgins, “Making automated multiple alignments of very large numbers of protein sequences,” Bioinformatics, vol. 29, no. 8, pp. 989–995, 2013. View at Google Scholar
  53. K. Katoh, K. Misawa, K.-I. Kuma, and T. Miyata, “MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform,” Nucleic Acids Research, vol. 30, no. 14, pp. 3059–3066, 2002. View at Google Scholar · View at Scopus
  54. “What cloud computing really means,” 2013, http://www.infoworld.com/d/cloud-computing/what-cloud-computing-really-means-031.
  55. “MIT credits Irish-based entrepreneur with co-coining term “cloud computing”,” 2013, http://www.siliconrepublic.com/cloud/item/24280-mit-credits-irish-based-ent.
  56. “The Benefits Of Data Center Virtualization For Businesses, CloudTweaks,” 2013, http://www.cloudtweaks.com/2012/03/the-benefits-of-data-center-virtualization-for-businesses/.
  57. “VMware Virtualization Software for Desktops, Servers & Virtual Machines for Public and Private Cloud Solutions,” 2013, http://www.vmware.com/.
  58. Main Page—KVM, 2013, http://www.linux-kvm.org/page/Main_Page.
  59. “Accelerating high-performance computing applications using parallel computing,” 2013, http://embedded-computing.com/articles/accelerating-using-parallel-computing/.
  60. “Cloud 101: What the heck do IaaS, PaaS and SaaS companies do?” 2013, http://venturebeat.com/2011/11/14/cloud-iaas-paas-saas/.
  61. Amazon Web Services, “Cloud Computing: Compute, Storage, Database,” 2013, http://aws.amazon.com/.
  62. Microsoft Home Page, Devices and Services, 2013, http://www.microsoft.com/en-us/default.aspx.
  63. Rackspace, 2013, http://www.rackspace.com/.
  64. Ensembl Genome Browser, 2013, http://www.ensembl.org/index.html.
  65. GenBank Home, 2013, http://www.ncbi.nlm.nih.gov/genbank.
  66. CRM—The Enterprise Cloud Computing Company—Salesforce.com Europe, 2013, http://www.salesforce.com/eu/.
  67. V. Choudhary, “Software as a service: implications for investment in software development,” in Proceedings of the 40th Annual Hawaii International Conference on System Sciences (HICSS '07), January 2007. View at Publisher · View at Google Scholar · View at Scopus
  68. G. Lawton, “Developing software online with platform-as-a-service technology,” Computer, vol. 41, no. 6, pp. 13–15, 2008. View at Publisher · View at Google Scholar · View at Scopus
  69. J. Dean and S. Ghemawat, “MapReduce: simplified data processing on large clusters,” Communications of the ACM, vol. 51, no. 1, pp. 107–113, 2008. View at Publisher · View at Google Scholar · View at Scopus
  70. T. Nguyen, W. Shi, and D. Ruden, “CloudAligner: a fast and full-featured MapReduce based tool for sequence mapping,” BMC Research Notes, vol. 4, article 171, 2011. View at Publisher · View at Google Scholar · View at Scopus
  71. M. C. Schatz, “CloudBurst: highly sensitive read mapping with MapReduce,” Bioinformatics, vol. 25, no. 11, pp. 1363–1369, 2009. View at Publisher · View at Google Scholar · View at Scopus
  72. L. Pireddu, S. Leo, and G. Zanetti, “Seal: a distributed short read mapping and duplicate removal tool,” Bioinformatics, vol. 27, no. 15, pp. 2159–2160, 2011. View at Publisher · View at Google Scholar · View at Scopus
  73. “Blastreduce: high performance short read mapping with mapreduce,” 2013, http://www.cbcb.umd.edu/software/blastreduce.
  74. B. Langmead, M. C. Schatz, J. Lin, M. Pop, and S. L. Salzberg, “Searching for SNPs with cloud computing,” Genome Biology, vol. 10, no. 11, article R134, 2009. View at Publisher · View at Google Scholar · View at Scopus
  75. M. C. Schatz, D. Sommer, D. Kelley, and M. Pop, “De novo assembly of large genomes using cloud computing,” in Proceedings of the CSHL Biology of Genomes Conference, 2010.
  76. Y.-J. Chang, C. C. Chen, C. L. Chen, and J. M. Ho, “A de novo next generation genomic sequence assembler based on string graph and mapreduce cloud computing framework,” BMC Genomics, vol. 13, supplement 7, p. S28, 2012. View at Google Scholar
  77. B. Langmead, K. D. Hansen, and J. T. Leek, “Cloud-scale RNA-sequencing differential expression analysis with Myrna,” Genome Biology, vol. 11, no. 8, p. R83, 2010. View at Publisher · View at Google Scholar · View at Scopus
  78. D. Hong, A. Rhie, S.-S. Park et al., “FX: an RNA-seq analysis tool on the cloud,” Bioinformatics, vol. 28, no. 5, pp. 721–723, 2012. View at Publisher · View at Google Scholar · View at Scopus
  79. L. Jourdren, M. Bernard, M. A. Dillies, and S. Le Crom, “Eoulsan: a cloud computing-based framework facilitating high throughput sequencing analyses,” Bioinformatics, vol. 28, no. 11, pp. 1542–1543, 2012. View at Google Scholar
  80. M. Niemenmaa, A. Kallio, A. Schumacher, P. Klemelä, E. Korpelainen, and K. Heljanko, “Hadoop-BAM: directly manipulating next generation sequencing data in the cloud,” Bioinformatics, vol. 28, no. 6, pp. 876–877, 2012. View at Publisher · View at Google Scholar · View at Scopus
  81. B. D. O'Connor, B. Merriman, and S. F. Nelson, “SeqWare Query Engine: storing and searching sequence data in the cloud,” BMC Bioinformatics, vol. 11, supplement 12, article S2, 2010. View at Publisher · View at Google Scholar · View at Scopus
  82. A. McKenna, M. Hanna, E. Banks et al., “The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data,” Genome Research, vol. 20, no. 9, pp. 1297–1303, 2010. View at Publisher · View at Google Scholar · View at Scopus
  83. S. J. Matthews and T. L. Williams, “MrsRF: an efficient MapReduce algorithm for analyzing large collections of evolutionary trees,” BMC Bioinformatics, vol. 11, supplement 1, article S15, 2010. View at Publisher · View at Google Scholar · View at Scopus
  84. M. E. Colosimo, M. W. Peterson, S. Mardis, and L. Hirschman, “Nephele: genotyping via complete composition vectors and MapReduce,” Source Code for Biology and Medicine, vol. 6, article 13, 2011. View at Publisher · View at Google Scholar · View at Scopus
  85. P. D. Vouzis and N. V. Sahinidis, “GPU-BLAST: using graphics processors to accelerate protein sequence alignment,” Bioinformatics, vol. 27, no. 2, pp. 182–188, 2011. View at Publisher · View at Google Scholar · View at Scopus
  86. C.-M. Liu, T. Wong, E. Wu et al., “SOAP3: ultra-fast GPU-based parallel alignment tool for short reads,” Bioinformatics, vol. 28, no. 6, pp. 878–879, 2012. View at Publisher · View at Google Scholar · View at Scopus
  87. S. Lewis, A. Csordas, S. Killcoyne et al., “Hydra: a scalable proteomic search engine which utilizes the Hadoop distributed computing framework,” BMC Bioinformatics, vol. 13, article 324, 2012. View at Google Scholar
  88. A. Matsunaga, M. Tsugawa, and J. Fortes, “CloudBLAST: combining MapReduce and virtualization on distributed resources for bioinformatics applications,” in Proceedings of the 4th IEEE International Conference on eScience (eScience '08), pp. 222–229, December 2008. View at Publisher · View at Google Scholar · View at Scopus
  89. X. Feng, R. Grossman, and L. Stein, “PeakRanger: a cloud-enabled peak caller for ChIP-seq data,” BMC Bioinformatics, vol. 12, article 139, 2011. View at Publisher · View at Google Scholar · View at Scopus
  90. L. Zhang, S. Gu, Y. Liu, B. Wang, and F. Azuaje, “Gene set analysis in the cloud,” Bioinformatics, vol. 28, no. 2, pp. 294–295, 2012. View at Publisher · View at Google Scholar · View at Scopus
  91. S. Leo, F. Santoni, and G. Zanetti, “Biodoop: bioinformatics on hadoop,” in Proceedings of the 38th International Conference Parallel Processing Workshops (ICPPW '09), pp. 415–422, September 2009. View at Publisher · View at Google Scholar · View at Scopus
  92. H. Huang, S. Tata, and R. J. Prill, “BlueSNP: R package for highly scalable genome-wide association studies using Hadoop clusters,” Bioinformatics, vol. 29, no. 1, pp. 135–136, 2013. View at Google Scholar
  93. D. R. Kelley, M. C. Schatz, and S. L. Salzberg, “Quake: quality-aware detection and correction of sequencing errors,” Genome Biology, vol. 11, no. 11, article R116, 2010. View at Publisher · View at Google Scholar · View at Scopus
  94. Apache Hadoop, 2013, http://hadoop.apache.org/.
  95. How Hadoop Makes Short Work Of Big Data—Forbes, 2013, http://www.forbes.com/sites/netapp/2012/09/24/hadoop-big-data/.
  96. Public Data Sets on Amazon Web Services (AWS), 2013, http://aws.amazon.com/publicdatasets/.
  97. P. M. Kasson, “Computational biology in the cloud: methods and new insights from computing at scale,” in Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing, World Scientific, 2013.
  98. S. Vijayakumar, A. Bhargavi, U. Praseeda, and S. A. Ahamed, “Optimizing sequence alignment in cloud using hadoop and MPP database,” in Proceedings of the 5th IEEE International Conference on Cloud Computing (CLOUD '12), 2012.
  99. R. D. Sleator, “Proteins: form and function,” Bioeng Bugs, vol. 3, no. 2, pp. 80–85, 2012. View at Google Scholar