Computational Biology Journal

Research Article

SCBI_MapReduce, a New Ruby Task-Farm Skeleton for Automated Parallelisation and Distribution in Chunks of Sequences: The Implementation of a Boosted Blast+

Table 1

SCBI_MAPREDUCE performance tests using three different datasets on the “x86 upgraded” cluster. Execution times are expressed in seconds. The number preceding X is the number of reads grouped into each chunk sent to a parallel task (for example, 100X means chunks of 100 reads).


Cores	Integer dataset^a	Real-world^b 1X	Real-world^b 100X	Real-world^b 250X	Real-world^b 2000X	Artificial^c 1X	Artificial^c 100X	Artificial^c 250X	Artificial^c 2000X

1^d	1264	13608	13849	13424	19328	23264	22124	24185	33393
2	635	8824	7903	7584	10251	11462	11554	11776	15302
4	322	4363	4507	4167	5890	6776	6507	5881	7503
8	164	2182	2194	2231	3132	3403	3337	3371	4874
16	81	1097	1098	1121	1633	1901	1797	1817	2602
32	41	568	549	569	899	921	888	915	1339
64	21	293	282	295	532	506	449	466	755
128	12	173	153	179	352	268	233	245	464

^aIntegers were subjected to futile, intensive calculations that took at least 1 s on every object.
^bThe dataset of real-world sequences consisted of 261 304 sequence reads (mean: 276 nt; mode: 263 nt; coefficient of variation: 11%) obtained from a 454/FLX sequencer downloaded from the SRA database (AC# SRR069473).
^cThe dataset of artificial sequences consisted of 425 438 sequences obtained using the software ART with a 2× coverage, simulating 454/FLX sequencing of the Danio rerio chromosome 1 (AC# NC_007112.5).
^dUsing one core is equivalent to a linear job without any parallelisation or distribution; it acts as control reference.
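
The execution times in Table 1 can be turned into speedup and parallel efficiency figures, which makes the effect of core count and chunk size easier to compare. The following is a minimal sketch in plain Ruby, not part of SCBI_MapReduce: the names CORES, TIMES, speedup, and efficiency are illustrative assumptions, and only three of the table's columns are copied in. Speedup is taken as S(n) = T(1) / T(n), using the one-core run as the control, and efficiency as E(n) = S(n) / n.

```ruby
# Execution times in seconds, copied from three columns of Table 1.
# CORES, TIMES, speedup and efficiency are illustrative names (assumptions),
# not part of the SCBI_MapReduce API.
CORES = [1, 2, 4, 8, 16, 32, 64, 128]
TIMES = {
  "integer"    => [1264, 635, 322, 164, 81, 41, 21, 12],
  "real_100X"  => [13849, 7903, 4507, 2194, 1098, 549, 282, 153],
  "real_2000X" => [19328, 10251, 5890, 3132, 1633, 899, 532, 352]
}

# Speedup relative to the single-core control run: S(n) = T(1) / T(n).
def speedup(times)
  times.map { |t| times.first.to_f / t }
end

# Parallel efficiency: E(n) = S(n) / n, where n is the number of cores.
def efficiency(times, cores)
  speedup(times).zip(cores).map { |s, n| s / n }
end

# Report the figures for the largest run (128 cores) of each column.
TIMES.each do |name, times|
  s = speedup(times).last
  e = efficiency(times, CORES).last
  puts format("%-12s speedup at 128 cores: %6.1f  efficiency: %.2f", name, s, e)
end
```

An efficiency close to 1 means the run scaled almost linearly with the number of cores; lower values show where adding cores or enlarging chunks pays off less.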