Research Article

SCBI_MapReduce, a New Ruby Task-Farm Skeleton for Automated Parallelisation and Distribution in Chunks of Sequences: The Implementation of a Boosted Blast+

Table 1

SCBI_MAPREDUCE performance tests using three different datasets on the “x86 upgraded” cluster. Execution times are expressed in seconds. The number immediately before X indicates the number of reads grouped in a chunk for every parallel task.

Cores   Integer       Real-world sequences (b)        Artificial sequences (c)
        dataset (a)   1X      100X    250X    2000X   1X      100X    250X    2000X

1 (d)   1264          13608   13849   13424   19328   23264   22124   24185   33393
2       635           8824    7903    7584    10251   11462   11554   11776   15302
4       322           4363    4507    4167    5890    6776    6507    5881    7503
8       164           2182    2194    2231    3132    3403    3337    3371    4874
16      81            1097    1098    1121    1633    1901    1797    1817    2602
32      41            568     549     569     899     921     888     915     1339
64      21            293     282     295     532     506     449     466     755
128     12            173     153     179     352     268     233     245     464

(a) Integers were subjected to futile, intensive calculations that took at least 1 s on every object.
(b) The dataset of real-world sequences consisted of 261 304 sequence reads (mean: 276 nt; mode: 263 nt; coefficient of variation: 11%) obtained from a 454/FLX sequencer downloaded from the SRA database (AC# SRR069473).
(c) The dataset of artificial sequences consisted of 425 438 sequences obtained using the software ART with a 2× coverage simulating a 454/FLX sequencing from the Danio rerio chromosome 1 (AC# NC_007112.5).
(d) Using one core is equivalent to a linear job without any parallelisation or distribution; it acts as control reference.
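
Because the one-core run serves as the control reference, the execution times in Table 1 can be converted into speed-up and parallel efficiency. The following is a minimal Ruby sketch of that arithmetic for the integer dataset only; the constant INTEGER_TIMES and the helpers speedup and efficiency are illustrative names introduced here and are not part of SCBI_MapReduce.

```ruby
# Execution times in seconds for the integer dataset, copied from Table 1.
# Keys are core counts; the 1-core entry is the control reference.
INTEGER_TIMES = {
  1 => 1264, 2 => 635, 4 => 322, 8 => 164,
  16 => 81, 32 => 41, 64 => 21, 128 => 12
}.freeze

# Hypothetical helper: speed-up relative to the one-core control, T(1) / T(n).
def speedup(times, cores)
  times.fetch(1).to_f / times.fetch(cores)
end

# Hypothetical helper: parallel efficiency, i.e. speed-up divided by core count.
def efficiency(times, cores)
  speedup(times, cores) / cores
end

INTEGER_TIMES.each_key do |cores|
  puts format('%3d cores: speed-up %6.2f, efficiency %.2f',
              cores,
              speedup(INTEGER_TIMES, cores),
              efficiency(INTEGER_TIMES, cores))
end
```

The same two helpers could be applied to any other column of the table by supplying a hash of its times keyed by core count.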