Research Article

Handling Big Data Scalability in Biological Domain Using Parallel and Distributed Processing: A Case of Three Biological Semantic Similarity Measures

Table 18

Average time obtained using a distributed system (2, 3, and 4 slaves) with Enhanced SORA and input data divided by their similarity.

| Number of Gene Pairs | Original SORA Average Time (ns) | Threaded SORA Average Time (ns), Input Data Divided by Their Similarity: 2 Slaves | 3 Slaves | 4 Slaves | % Threaded SORA Average Time (Input Data Divided by Their Similarity) vs. Original SORA Total Time: 2 Slaves | 3 Slaves | 4 Slaves |
|---|---|---|---|---|---|---|---|
| 10 | 4.14E+07 | 4.82E+08 | 2.13E+11 | 1.84E+11 | 1065.48 | 514936.11 | 444813.82 |
| 100 | 1.23E+08 | 4.42E+07 | 1.63E+11 | 4.60E+10 | -64.12 | 132207.19 | 37238.23 |
| 1000 | 1.11E+08 | 6.61E+07 | 6.75E+09 | 6.01E+09 | -40.35 | 5991.10 | 5323.33 |
| 10000 | 3.51E+09 | 3.31E+08 | 6.69E+08 | 3.46E+08 | -90.57 | -80.94 | -90.14 |
| 100000 | X | X | X | X | X | X | X |
| 1000000 | X | X | X | X | X | X | X |
| Average | 9.47E+08 | 2.31E+08 | 9.59E+10 | 5.91E+10 | 1065.48 | 1.63E+05 | 1.22E+05 |

X indicates that, due to limited memory, the system required many hours to find the similarity of some of the pairs.
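
The percentage columns of Table 18 appear consistent with a relative change of the distributed time against the Original SORA time, that is, (distributed − original) / original × 100. This is an inference from the tabulated values, not a formula stated in the table. The following minimal Python sketch illustrates the arithmetic for the 2-slave column; the function name `percent_change` and the dictionaries are hypothetical and introduced only for illustration.

```python
def percent_change(original_ns: float, distributed_ns: float) -> float:
    """Relative change of the distributed time versus the original time, in percent.

    Assumed formula (inferred from Table 18, not stated by the authors):
    (distributed - original) / original * 100.
    """
    if original_ns <= 0:
        raise ValueError("original time must be positive")
    return (distributed_ns - original_ns) / original_ns * 100.0


# Rounded average times (ns) copied from Table 18.
original = {10: 4.14e7, 100: 1.23e8, 1000: 1.11e8, 10000: 3.51e9}
two_slaves = {10: 4.82e8, 100: 4.42e7, 1000: 6.61e7, 10000: 3.31e8}

for pairs in original:
    change = percent_change(original[pairs], two_slaves[pairs])
    # A negative value means the distributed run was faster than Original SORA.
    print(f"{pairs:>6} gene pairs: {change:+.2f}%")
```

Because the times in the table are rounded to three significant figures, values recomputed this way can differ slightly from the published percentages. Under this reading, a negative percentage indicates that the distributed configuration was faster than Original SORA, and a positive one that it was slower.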