BioMed Research International
Volume 2013 (2013), Article ID 791051, 16 pages
Streaming Support for Data Intensive Cloud-Based Sequence Analysis
1 Center for Informatics Sciences, Nile University, Giza, Egypt
2 IBM Innovation Center, Zurich, Switzerland
3 Center for Biomedical Informatics, Harvard Medical School, Boston, MA, USA
4 Department of Biology, University of Bern, Bern, Switzerland
5 Systems and Biomedical Engineering Department, Faculty of Engineering, Cairo University, Giza, Egypt
Received 10 September 2012; Revised 26 December 2012; Accepted 17 February 2013
Academic Editor: Ming Ouyang
Copyright © 2013 Shadi A. Issa et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.