BioMed Research International

Research Article

Handling Big Data Scalability in Biological Domain Using Parallel and Distributed Processing: A Case of Three Biological Semantic Similarity Measures

Average time obtained using a distributed system (2, 3, and 4 slaves) with Enhanced SORA and input data divided by their similarity.


Number of Gene Pairs	Original SORA Average Time (ns)	Threaded SORA Average Time, 2 Slaves (ns)	Threaded SORA Average Time, 3 Slaves (ns)	Threaded SORA Average Time, 4 Slaves (ns)	% Difference in Average Time vs. Original SORA, 2 Slaves	% Difference in Average Time vs. Original SORA, 3 Slaves	% Difference in Average Time vs. Original SORA, 4 Slaves

10	4.14E+07	4.82E+08	2.13E+11	1.84E+11	1065.48	514936.11	444813.82
100	1.23E+08	4.42E+07	1.63E+11	4.60E+10	-64.12	132207.19	37238.23
1000	1.11E+08	6.61E+07	6.75E+09	6.01E+09	-40.35	5991.10	5323.33
10000	3.51E+09	3.31E+08	6.69E+08	3.46E+08	-90.57	-80.94	-90.14
100000	X	X	X	X	X	X	X
1000000	X	X	X	X	X	X	X
Average	9.47E+08	2.31E+08	9.59E+10	5.91E+10	217.61	1.63E+05	1.22E+05

X indicates that, due to limited memory, the system required many hours to compute the similarity of some gene pairs, so no average time is reported.
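The percentage columns appear to give the relative difference of each threaded time from the original time, (threaded - original) / original × 100. In the 3- and 4-slave columns, the percentage entries of the Average row appear to be the mean of the four per-size percentages. Both readings are assumptions, checked against the table only up to rounding. The minimal Python sketch below recomputes these quantities from the rounded times. The helper names `pct_diff` and `summarize` are hypothetical and are not part of SORA.

```python
from statistics import mean

# Average times (ns) transcribed from the table above. The 100000 and
# 1000000 rows are marked X (no measurement) and are therefore omitted.
GENE_PAIRS = [10, 100, 1000, 10000]
ORIGINAL = [4.14e7, 1.23e8, 1.11e8, 3.51e9]
THREADED = {  # keyed by number of slaves
    2: [4.82e8, 4.42e7, 6.61e7, 3.31e8],
    3: [2.13e11, 1.63e11, 6.75e9, 6.69e8],
    4: [1.84e11, 4.60e10, 6.01e9, 3.46e8],
}


def pct_diff(new: float, base: float) -> float:
    """Relative change of `new` with respect to `base`, in percent.

    Assumed to match the table's % columns; negative values mean the
    threaded run was faster than the original SORA run.
    """
    return (new - base) / base * 100.0


def summarize(original, threaded):
    """For each slave count, return a tuple of
    (per-size % differences, mean of those percentages,
     % difference between the mean threaded and mean original times)."""
    summary = {}
    for slaves, times in threaded.items():
        pcts = [pct_diff(t, o) for t, o in zip(times, original)]
        summary[slaves] = (pcts, mean(pcts), pct_diff(mean(times), mean(original)))
    return summary


if __name__ == "__main__":
    for slaves, (pcts, mean_of_pcts, pct_of_means) in summarize(ORIGINAL, THREADED).items():
        rounded = [round(p, 2) for p in pcts]
        print(f"{slaves} slaves: per-size {rounded}, "
              f"mean of % = {mean_of_pcts:.2f}, % of means = {pct_of_means:.2f}")
```

Because the sketch works from the rounded times printed in the table, its percentages can differ slightly from the published ones. For example, it gives about 1064.25 rather than 1065.48 for 10 gene pairs with 2 slaves. It also shows why the two kinds of average diverge. For 2 slaves, the mean of the per-size percentages is roughly +217%, driven by the 10-pair case. The percentage change of the mean times, by contrast, is roughly -76%.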