BioMed Research International
Volume 2013 (2013), Article ID 791051, 16 pages
http://dx.doi.org/10.1155/2013/791051
Research Article

Streaming Support for Data Intensive Cloud-Based Sequence Analysis

1 Center for Informatics Sciences, Nile University, Giza, Egypt
2 IBM Innovation Center, Zurich, Switzerland
3 Center for Biomedical Informatics, Harvard Medical School, Boston, MA, USA
4 Department of Biology, University of Bern, Bern, Switzerland
5 Systems and Biomedical Engineering Department, Faculty of Engineering, Cairo University, Giza, Egypt

Received 10 September 2012; Revised 26 December 2012; Accepted 17 February 2013

Academic Editor: Ming Ouyang

Copyright © 2013 Shadi A. Issa et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. AWS (Amazon Web Services), http://aws.amazon.com/.
  2. Windows Azure, http://www.microsoft.com/windowsazure/.
  3. IBM Smart Cloud Enterprise, http://www.ibm.com/cloud-computing/.
  4. Rackspace, http://www.rackspace.com/.
  5. “Magellan—a cloud for Science,” http://magellan.alcf.anl.gov/.
  6. DIAG-Data Intensive Academic Grid, http://diagcomputing.org/.
  7. E. Pennisi, “Will computers crash genomics?” Science, vol. 331, no. 6018, pp. 666–668, 2011.
  8. M. C. Schatz, B. Langmead, and S. L. Salzberg, “Cloud computing and the DNA data race,” Nature Biotechnology, vol. 28, no. 7, pp. 691–693, 2010.
  9. A. Bateman and M. Wood, “Cloud computing,” Bioinformatics, vol. 25, no. 12, p. 1475, 2009.
  10. J. T. Dudley and A. J. Butte, “In silico research in the era of cloud computing,” Nature Biotechnology, vol. 28, no. 11, pp. 1181–1185, 2010.
  11. L. D. Stein, “The case for cloud computing in genome informatics,” Genome Biology, vol. 11, no. 5, article 207, 2010.
  12. V. Fusaro, P. Patil, E. Gafni, D. Wall, and P. Tonellato, “Biomedical cloud computing with Amazon web services,” PLoS Computational Biology, vol. 7, no. 8, Article ID e1002147, 2011.
  13. B. Langmead, M. C. Schatz, J. Lin, M. Pop, and S. L. Salzberg, “Searching for SNPs with cloud computing,” Genome Biology, vol. 10, no. 11, article R134, 2009.
  14. D. Wall, P. Kudtarkar, V. Fusaro, R. Pivovarov, P. Patil, and P. Tonellato, “Cloud computing for comparative genomics,” BMC Bioinformatics, vol. 11, article 259, 2010.
  15. B. Langmead, K. D. Hansen, and J. T. Leek, “Cloud-scale RNA-sequencing differential expression analysis with Myrna,” Genome Biology, vol. 11, no. 8, article R83, 2010.
  16. M. C. Schatz, “CloudBurst: highly sensitive read mapping with MapReduce,” Bioinformatics, vol. 25, no. 11, pp. 1363–1369, 2009.
  17. B. Langmead, C. Trapnell, M. Pop, and S. L. Salzberg, “Ultrafast and memory-efficient alignment of short DNA sequences to the human genome,” Genome Biology, vol. 10, no. 3, article R25, 2009.
  18. C. Rapier and B. Bennett, “High speed bulk data transfer using the SSH protocol,” in Proceedings of the 15th ACM Mardi Gras Conference: From Lightweight Mash-Ups to Lambda Grids: Understanding the Spectrum of Distributed Computing Requirements, Applications, Tools, Infrastructures, Interoperability, and the Incremental Adoption of Key Capabilities (MG '08), vol. 11, pp. 1–11, ACM.
  19. T. Oinn, M. Addis, J. Ferris et al., “Taverna: a tool for the composition and enactment of bioinformatics workflows,” Bioinformatics, vol. 20, no. 17, pp. 3045–3054, 2004.
  20. D. Hull, K. Wolstencroft, R. Stevens et al., “Taverna: a tool for building and running workflows of services,” Nucleic Acids Research, vol. 34, pp. W729–W732, 2006.
  21. B. Giardine, C. Riemer, R. C. Hardison et al., “Galaxy: a platform for interactive large-scale genome analysis,” Genome Research, vol. 15, no. 10, pp. 1451–1455, 2005.
  22. StarCluster, http://web.mit.edu/stardev/cluster/.
  23. Vappio, http://vappio.sf.net/.
  24. E. Afgan, D. Baker, N. Coraor, B. Chapman, A. Nekrutenko, and J. Taylor, “Galaxy CloudMan: delivering cloud compute clusters,” BMC Bioinformatics, vol. 11, supplement 12, article S4, 2010.
  25. J. Goecks, A. Nekrutenko, J. Taylor, and The Galaxy Team, “Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences,” Genome Biology, vol. 11, no. 8, article R86, 2010.
  26. G. Graefe, “Query evaluation techniques for large databases,” ACM Computing Surveys, vol. 25, no. 2, pp. 73–170, 1993.
  27. D. Logothetis, C. Trezzo, K. Webb, and K. Yocum, “In-situ MapReduce for log processing,” in Proceedings of the USENIX Conference on USENIX Annual Technical Conference (USENIXATC'11), pp. 9–9, USENIX Association, 2011.
  28. N. Backman, K. Pattabiraman, and U. Cetintemel, “C-MR: a continuous MapReduce processing model for low-latency stream processing on multi-core architectures,” 2010.
  29. R. Kienzler, R. Bruggmann, A. Ranganathan, and N. Tatbul, “Large-scale DNA sequence analysis in the cloud: a stream-based approach,” in Proceedings of the Euro-Par VHPC Workshop, 2011.
  30. R. Kienzler, R. Bruggmann, A. Ranganathan, and N. Tatbul, “Stream as you go: the case for incremental data access and processing in the cloud,” in Proceedings of the ICDE DMC Workshop, 2012.
  31. S. M. Rumble, P. Lacroute, A. V. Dalca, M. Fiume, A. Sidow, and M. Brudno, “SHRiMP: accurate mapping of short color-space reads,” PLoS Computational Biology, vol. 5, no. 5, Article ID e1000386, 2009.
  32. s3fs, “FUSE-based file system backed by Amazon S3,” http://code.google.com/p/s3fs/.
  33. J. Dean and S. Ghemawat, “MapReduce: simplified data processing on large clusters,” in Proceedings of the 6th Conference on Symposium on Operating Systems Design and Implementation (OSDI '04), vol. 6, pp. 10–10, USENIX Association, 2004.
  34. B. Linke, R. Giegerich, and A. Goesmann, “Conveyor: a workflow engine for bioinformatic analyses,” Bioinformatics, vol. 27, no. 7, Article ID btr040, pp. 903–911, 2011.
  35. B. Ludäscher, I. Altintas, C. Berkley et al., “Scientific workflow management and the Kepler system,” Concurrency and Computation: Practice and Experience, vol. 18, no. 10, pp. 1039–1065, 2006.
  36. I. Taylor, M. Shields, I. Wang, and A. Harrison, “Visual grid workflow in Triana,” Journal of Grid Computing, vol. 3, no. 3-4, pp. 153–169, 2005.
  37. I. Taylor, M. Shields, I. Wang, and A. Harrison, “The Triana workflow environment: architecture and applications,” in Workflows for e-Science, pp. 320–339, Springer, New York, NY, USA, 2007.
  38. E. Deelman, G. Singh, M. H. Su et al., “Pegasus: a framework for mapping complex scientific workflows onto distributed systems,” Scientific Programming, vol. 13, no. 3, pp. 219–237, 2005.
  39. S. P. Shah, D. Y. M. He, J. N. Sawkins et al., “Pegasys: software for executing and integrating analyses of biological sequences,” BMC Bioinformatics, vol. 5, article 40, 2004.
  40. P. Rice, L. Longden, and A. Bleasby, “EMBOSS: the European Molecular Biology Open Software Suite,” Trends in Genetics, vol. 16, no. 6, pp. 276–277, 2000.
  41. H. Li, B. Handsaker, A. Wysoker et al., “The Sequence Alignment/Map format and SAMtools,” Bioinformatics, vol. 25, no. 16, pp. 2078–2079, 2009.
  42. FASTX-Toolkit, http://hannonlab.cshl.edu/fastx_toolkit/.
  43. S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, “Basic local alignment search tool,” Journal of Molecular Biology, vol. 215, no. 3, pp. 403–410, 1990.
  44. S. F. Altschul, T. L. Madden, A. A. Schäffer et al., “Gapped BLAST and PSI-BLAST: a new generation of protein database search programs,” Nucleic Acids Research, vol. 25, no. 17, pp. 3389–3402, 1997.
  45. Z. Zhang, S. Schwartz, L. Wagner, and W. Miller, “A greedy algorithm for aligning DNA sequences,” Journal of Computational Biology, vol. 7, no. 1-2, pp. 203–214, 2000.
  46. M. Eriksen, “Trickle: a userland bandwidth shaper for Unix-like systems,” in Proceedings of the Annual Conference on USENIX Annual Technical Conference (ATEC '05), pp. 43–43, 2005.