Research Article

A Practical and Scalable Tool to Find Overlaps between Sequences

Table 2

Data sets used in experiments.

| Data set | Size | Number of strings |
|---|---|---|
| Generated randomly using a uniform distribution | 10 MB–50 GB | 10^4–66 × 10^7 |
| First fully public female human genome (SRR098909) | 32.7 GB | 162 M |
| Illumina whole human genome (SRR866986) | 9.8 GB | 53 M |
| A study in rat genome (ERR125766) | 5 GB | 97 M |
| Homo sapiens exome (SRR500004) | 1.1 GB | 15 M |
| EST of C. elegans | 167 MB | 334,465 |
| EST of Citrus clementina | 104 MB | 118,365 |
| EST of Citrus sinensis | 154 MB | 208,909 |
| EST of Citrus trifoliata | 46 MB | 62,344 |
| EST of Atta cephalotes | 278 MB | 2,835 |