Research Article
A Practical and Scalable Tool to Find Overlaps between Sequences
Table 2
Data sets used in experiments.
| Data set | Size | Number of strings |
|---|---|---|
| Generated randomly using a uniform distribution | 10 MB–50 GB | 10⁴–66 × 10⁷ |
| First fully public female human genome (SRR098909) | 32.7 G | 162 M |
| Illumina whole human genome (SRR866986) | 9.8 G | 53 M |
| A study in rat genome (ERR125766) | 5 G | 97 M |
| Homo sapiens Exome (SRR500004) | 1.1 G | 15 M |
| EST of C. elegans | 167 MB | 334,465 |
| EST of Citrus clementina | 104 MB | 118,365 |
| EST of Citrus sinensis | 154 MB | 208,909 |
| EST of Citrus trifoliata | 46 MB | 62,344 |
| EST of Atta cephalotes | 278 MB | 2,835 |