Research Article

Handling Big Data Scalability in Biological Domain Using Parallel and Distributed Processing: A Case of Three Biological Semantic Similarity Measures

Table 23

Average time reduction obtained using Threaded SORA in a distributed system with input data divided by their similarity versus input data divided equally.

Number of Gene Pairs    Improvement Percentage (IP)
                        2 Slaves     3 Slaves     4 Slaves

10                      -20.07       -95.96       -98.26
100                     -37.39       -49.06       -93.06
1000                    7.48         -78.16       -90.61
10000                   -81.91       -89.50       -90.28
100000                  X            X            X
1000000                 X            X            X
Average                 -3.30E+01    -7.82E+01    -9.31E+01

X indicates that, due to limited memory, the system required many hours to find the similarity of some of the pairs.
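
The Average row can be checked against the per-row entries. The short Python sketch below assumes that each average is the plain arithmetic mean of the four rows that report a numeric value, with the rows marked X left out; the table itself does not state this. The names ip_by_pairs and column_averages are illustrative only and do not come from the article.

```python
# IP values transcribed from Table 23, keyed by number of gene pairs.
# Each tuple holds the entries for 2, 3, and 4 slaves.
# Rows marked X (100000 and 1000000 pairs) are omitted: no value is reported.
ip_by_pairs = {
    10: (-20.07, -95.96, -98.26),
    100: (-37.39, -49.06, -93.06),
    1000: (7.48, -78.16, -90.61),
    10000: (-81.91, -89.50, -90.28),
}


def column_averages(rows):
    """Return the arithmetic mean of each column (assumed averaging rule)."""
    columns = zip(*rows.values())
    return [sum(col) / len(col) for col in columns]


averages = column_averages(ip_by_pairs)
# Format in the same scientific notation the table uses for its Average row.
formatted = [f"{a:.2E}" for a in averages]
print(formatted)
```

Under this assumption the formatted values can be compared directly with the Average row for 2, 3, and 4 slaves.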