Streaming Support for Data Intensive Cloud-Based Sequence Analysis

Table 2

Description of the datasets used. All datasets consist of NGS sequences (reads). The third column gives the number of sequences in millions, and the fourth column gives the data size in GB. There are four African human genome datasets of different sizes. The last one (130 G) is the complete original dataset; the other three (1 G, 10 G, 40 G) are subsets of it.

| Description | Source | No. of sequences | Size |
|---|---|---|---|
| E. coli genome | [13] | 9 million | 1.4 GB |
| 1 G African human genome | Study SRP000239 | 7 million | 1 GB |
| 10 G African human genome | Study SRP000239 | 72 million | 10 GB |
| 40 G African human genome | Study SRP000239 | 437 million | 40 GB |
| 130 G African human genome | Study SRP000239 | 1419 million | 130 GB |