BioMed Research International

Research Article

Handling Big Data Scalability in Biological Domain Using Parallel and Distributed Processing: A Case of Three Biological Semantic Similarity Measures

Total time obtained using a distributed system (2, 3, and 4 slaves) with Enhanced SORA and input data divided by their similarity.


Number of Gene Pairs	Original SORA Total Time (ns)	Threaded SORA Total Time (ns), 2 Slaves	Threaded SORA Total Time (ns), 3 Slaves	Threaded SORA Total Time (ns), 4 Slaves	% Change vs. Original SORA, 2 Slaves	% Change vs. Original SORA, 3 Slaves	% Change vs. Original SORA, 4 Slaves
10	1708984548	482298616	359402373	399416852	-71.78	-78.97	-76.63
100	12714096406	2840539191	2788018116	2459561951	-77.66	-78.07	-80.65
1000	1.11216E+11	28551368161	23437947918	18670965654	-74.33	-78.93	-83.21
10000	3.51063E+13	1.22567E+12	1.03491E+12	1.24612E+12	-96.51	-97.05	-96.45
100000	X	X	X	X	X	X	X
1000000	X	X	X	X	X	X	X
Average	8.80799E+12	3.14385E+11	2.65373E+11	3.16912E+11	-80.07	-83.25	-84.24

X indicates that no time is reported because, with limited memory, the system required many hours to compute the similarity of some of the pairs.
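
The percentage columns appear to be the relative change of the distributed run against Original SORA, that is, (distributed time - original time) / original time × 100, and the Average row appears to be the mean of the per-row percentages. This reading is an assumption, but it reproduces the printed values. A minimal Python sketch for the 2-slave column follows; the function and variable names are illustrative and do not come from the article, and the 1000- and 10000-pair entries use the rounded figures printed in the table.

```python
from statistics import mean


def percent_change(original_ns: float, distributed_ns: float) -> float:
    """Relative change of the distributed run versus the original run, in percent."""
    return (distributed_ns - original_ns) / original_ns * 100.0


# Total times in nanoseconds, copied from the table (Original SORA vs. 2 slaves).
original_ns = {10: 1708984548, 100: 12714096406, 1000: 1.11216e11, 10000: 3.51063e13}
two_slaves_ns = {10: 482298616, 100: 2840539191, 1000: 28551368161, 10000: 1.22567e12}

# Per-row percentage change, negative when the distributed run is faster.
changes = {
    pairs: percent_change(original_ns[pairs], two_slaves_ns[pairs])
    for pairs in original_ns
}

# Assumed meaning of the Average row: mean of the per-row percentages.
average_change = mean(changes.values())

for pairs, change in changes.items():
    print(f"{pairs} gene pairs: {change:.2f}%")
print(f"Average: {average_change:.2f}%")
```

Note that averaging the per-row percentages weights every input size equally. Taking the ratio of the averaged times instead would be dominated by the 10000-pair row, whose times are orders of magnitude larger than the others.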