Research Article

Handling Big Data Scalability in Biological Domain Using Parallel and Distributed Processing: A Case of Three Biological Semantic Similarity Measures

Table 24

Total time reduction obtained using Threaded SORA in a distributed system with input data divided by their similarity versus input data divided equally.

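| Number of Gene Pairs | Improvement Percentage (IP), 2 Slaves | Improvement Percentage (IP), 3 Slaves | Improvement Percentage (IP), 4 Slaves |
|---|---|---|---|
| 10 | -20.07 | -15.78 | -69.18 |
| 100 | -38.13 | -28.80 | -3.77 |
| 1000 | -22.94 | 1.69 | -10.74 |
| 10000 | -90.087 | -91.41 | -83.00 |
| 100000 | X | X | X |
| 1000000 | X | X | X |
| Average | -42.80820245 | -33.57 | -41.67 |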

X indicates that, due to limited memory, the system required many hours to find the similarity of some of the pairs.
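
The Average row appears to be the arithmetic mean of each column over the four input sizes that have numeric entries, with the X rows excluded. This is an inference from the tabulated values rather than something the table states. A minimal Python sketch of that arithmetic follows; the names `ip_by_pairs` and `column_averages` are hypothetical and chosen only for illustration.

```python
# Improvement Percentage (IP) values transcribed from Table 24.
# Keys are the number of gene pairs; each tuple holds (2 slaves, 3 slaves, 4 slaves).
# The 100000 and 1000000 rows are omitted because the table reports X for them.
ip_by_pairs = {
    10: (-20.07, -15.78, -69.18),
    100: (-38.13, -28.80, -3.77),
    1000: (-22.94, 1.69, -10.74),
    10000: (-90.087, -91.41, -83.00),
}


def column_averages(rows):
    """Return the arithmetic mean of each column (assumed averaging rule)."""
    columns = zip(*rows.values())
    return [sum(col) / len(col) for col in columns]


averages = column_averages(ip_by_pairs)
for slaves, avg in zip((2, 3, 4), averages):
    print(f"{slaves} slaves: average IP = {avg:.2f}")
```

Under this assumption, the computed means agree with the reported Average row to within 0.01 for all three columns.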