BioMed Research International
Volume 2013 (2013), Article ID 791051, 16 pages
http://dx.doi.org/10.1155/2013/791051
Research Article

Streaming Support for Data Intensive Cloud-Based Sequence Analysis

¹ Center for Informatics Sciences, Nile University, Giza, Egypt
² IBM Innovation Center, Zurich, Switzerland
³ Center for Biomedical Informatics, Harvard Medical School, Boston, MA, USA
⁴ Department of Biology, University of Bern, Bern, Switzerland
⁵ Systems and Biomedical Engineering Department, Faculty of Engineering, Cairo University, Giza, Egypt

Received 10 September 2012; Revised 26 December 2012; Accepted 17 February 2013

Academic Editor: Ming Ouyang

Copyright © 2013 Shadi A. Issa et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Cloud computing offers a promising solution to the genomics data deluge resulting from the advent of next-generation sequencing (NGS) technology. Based on the concepts of “resources-on-demand” and “pay-as-you-go”, scientists with little or no infrastructure can access scalable and cost-effective computational resources. However, the large size of NGS data causes significant data transfer latency from the client’s site to the cloud, and this latency is a bottleneck for using cloud computing services. In this paper, we present a streaming-based scheme to overcome this problem, in which the NGS data are processed while they are being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks in which the sequences can be processed independently of one another. We also provide the elastream package, which supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and reduces both the time and the cost of computation.
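The benefit of processing data while it is still being transferred can be illustrated with a simple timing model. The sketch below is a hypothetical illustration, not the elastream API: it assumes the input is split into chunks of independent sequences, that chunks are uploaded one after another over a single link, and that a single worker processes them. The function names `sequential_makespan` and `streaming_makespan` are invented for this example. Because each chunk can be analyzed independently, the worker may start on a chunk as soon as that chunk has arrived, overlapping computation with the remaining transfer.

```python
def sequential_makespan(transfer, compute):
    """Total time when all chunks are uploaded first and only then processed.

    transfer[i] and compute[i] are the (hypothetical) upload and processing
    times of chunk i; there is no overlap between the two phases.
    """
    return sum(transfer) + sum(compute)


def streaming_makespan(transfer, compute):
    """Total time when each chunk is processed as soon as it has arrived.

    Assumes chunks are uploaded back to back over one link and processed by
    one worker; this is only valid because the chunks are independent.
    """
    arrived = 0.0  # time at which the current chunk finishes uploading
    done = 0.0     # time at which the worker finishes its current chunk
    for t_up, t_cpu in zip(transfer, compute):
        arrived += t_up                    # uploads are serialized
        done = max(done, arrived) + t_cpu  # wait for the data and a free worker
    return done


if __name__ == "__main__":
    up = [10] * 4   # four chunks, 10 time units each to upload
    cpu = [6] * 4   # 6 time units each to process
    print(sequential_makespan(up, cpu))  # 64
    print(streaming_makespan(up, cpu))   # 46.0
```

With four chunks that each take 10 units to upload and 6 to process, the upload-then-compute approach needs 64 units, whereas streaming finishes after 46: once the transfer dominates, almost all of the computation is hidden behind it, and only the processing of the last chunk remains exposed.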