BioMed Research International

Research Article

Handling Big Data Scalability in Biological Domain Using Parallel and Distributed Processing: A Case of Three Biological Semantic Similarity Measures

Enhanced SORA average time in the distributed system (2, 3 and 4 slaves) with input data divided equally.


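Number of Gene Pairs	Original SORA Average Time (ns)	Threaded SORA Average Time (ns), Input Data Divided Equally: 2 Slaves	Threaded SORA Average Time (ns), Input Data Divided Equally: 3 Slaves	Threaded SORA Average Time (ns), Input Data Divided Equally: 4 Slaves	% Threaded SORA Average Time (Input Data Divided Equally) vs. Original SORA Average Time: 2 Slaves	% Threaded SORA Average Time (Input Data Divided Equally) vs. Original SORA Average Time: 3 Slaves	% Threaded SORA Average Time (Input Data Divided Equally) vs. Original SORA Average Time: 4 Slaves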
10	4.14E+07	6.03E+08	5.27E+12	1.06E+13	4.82E+08	2.13E+11	1.84E+11
100	1.23E+08	7.06E+07	3.20E+11	6.63E+11	4.42E+07	1.63E+11	4.60E+10
1000	1.11E+08	6.15E+07	3.09E+10	6.40E+10	6.61E+07	6.75E+09	6.01E+09
10000	3.51E+09	1.83E+09	6.37E+09	3.56E+09	3.31E+08	6.69E+08	3.46E+08
100000	X	X	X	X	X	X	X
1000000	X	X	X	X	X	X	X
Average	9.47E+08	6.41E+08	1.41E+12	2.83E+12	2.31E+08	9.59E+10	5.91E+10

X marks runs for which no time is reported: because of limited memory, the system required many hours to compute the similarity of some of the gene pairs.
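The Average row appears to be the arithmetic mean of each column over the four input sizes that completed (10 to 10,000 gene pairs), with the X rows left out. This is an inference from the numbers, not something the source states. A minimal sketch that recomputes the four time columns from the values transcribed above follows; the names `rows`, `labels`, and `column_means` are illustrative only.

```python
# Average times in ns, transcribed from the table above.
# Columns: Original SORA, then 2, 3 and 4 slaves (input data divided equally).
# The 100000 and 1000000 rows are marked X in the table and are excluded here
# (assumption: the published Average row ignores them as well).
rows = {
    10: [4.14e07, 6.03e08, 5.27e12, 1.06e13],
    100: [1.23e08, 7.06e07, 3.20e11, 6.63e11],
    1000: [1.11e08, 6.15e07, 3.09e10, 6.40e10],
    10000: [3.51e09, 1.83e09, 6.37e09, 3.56e09],
}
labels = ["Original", "2 Slaves", "3 Slaves", "4 Slaves"]


def column_means(table):
    """Return the arithmetic mean of each column across all rows."""
    n = len(table)
    return [sum(col) / n for col in zip(*table.values())]


means = column_means(rows)
for label, mean in zip(labels, means):
    print(f"{label}: {mean:.2e} ns")
```

The recomputed means agree with the published Average row to within rounding of the three-significant-figure inputs. Because the row is an unweighted mean, it is dominated by the largest entries in each column, here the 10-pair runs for 3 and 4 slaves.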