Research Article

A Practical and Scalable Tool to Find Overlaps between Sequences

Table 5

Running SOF on large random and real data sets using a 16-core AWS machine. Number of strings is given in millions (M).

Data set     Total size   Number of strings   Time (minutes)   Space
Random       10 GB        100 M               30               15 GB
Random       20 GB        200 M               41               31 GB
Random       30 GB        300 M               76               46 GB
Random       50 GB        660 M               110              96 GB
SRR500004    1.1 GB       15 M                3                2.2 GB
ERR125766    5 GB         97 M                11               12 GB
SRR866986    10 GB        53 M                12               10 GB
SRR098909    32 GB        162 M               119              31.2 GB
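To help compare memory use across rows of Table 5, the short sketch below computes the ratio of reported space to input size for each data set. The numbers are transcribed from the table as read here (the time and space cells were run together in the extracted source, so the split is an interpretation); the helper name `space_ratio` is illustrative only and is not part of SOF.

```python
# Values as read from Table 5: (data set, total size in GB,
# number of strings in millions, time in minutes, space in GB).
ROWS = [
    ("Random", 10.0, 100, 30, 15.0),
    ("Random", 20.0, 200, 41, 31.0),
    ("Random", 30.0, 300, 76, 46.0),
    ("Random", 50.0, 660, 110, 96.0),
    ("SRR500004", 1.1, 15, 3, 2.2),
    ("ERR125766", 5.0, 97, 11, 12.0),
    ("SRR866986", 10.0, 53, 12, 10.0),
    ("SRR098909", 32.0, 162, 119, 31.2),
]


def space_ratio(size_gb: float, space_gb: float) -> float:
    """Return reported space divided by input size (hypothetical helper)."""
    return space_gb / size_gb


if __name__ == "__main__":
    for name, size, strings, minutes, space in ROWS:
        label = f"{name} {size:g} GB"
        print(f"{label:18s} space/input = {space_ratio(size, space):.2f}")
```

On the random data sets the ratio stays near 1.5 up to 30 GB and rises to about 1.9 at 50 GB, which is the row with the largest number of strings.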