Research Article
Handling Big Data Scalability in Biological Domain Using Parallel and Distributed Processing: A Case of Three Biological Semantic Similarity Measures
Table 18
Average time obtained using a distributed system (2, 3, and 4 slaves) with Enhanced SORA and input data divided by their similarity.
| Number of Gene Pairs | Original SORA Average Time (ns) | Threaded SORA Average Time (ns), 2 Slaves | Threaded SORA Average Time (ns), 3 Slaves | Threaded SORA Average Time (ns), 4 Slaves | % Threaded SORA Average Time vs. Original SORA Total Time, 2 Slaves | % Threaded SORA Average Time vs. Original SORA Total Time, 3 Slaves | % Threaded SORA Average Time vs. Original SORA Total Time, 4 Slaves |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 10 | 4.14E+07 | 4.82E+08 | 2.13E+11 | 1.84E+11 | 1065.48 | 514936.11 | 444813.82 |
| 100 | 1.23E+08 | 4.42E+07 | 1.63E+11 | 4.60E+10 | -64.12 | 132207.19 | 37238.23 |
| 1000 | 1.11E+08 | 6.61E+07 | 6.75E+09 | 6.01E+09 | -40.35 | 5991.10 | 5323.33 |
| 10000 | 3.51E+09 | 3.31E+08 | 6.69E+08 | 3.46E+08 | -90.57 | -80.94 | -90.14 |
| 100000 | X | X | X | X | X | X | X |
| 1000000 | X | X | X | X | X | X | X |
| Average | 9.47E+08 | 2.31E+08 | 9.59E+10 | 5.91E+10 | 1065.48 | 1.63E+05 | 1.22E+05 |

All threaded SORA columns use input data divided by their similarity.

X indicates that, due to limited memory, the system required many hours to find the similarity of some of the pairs.
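The percentage columns appear to be the relative change of the distributed time against the original SORA time, with negative values meaning the distributed run was faster. The sketch below assumes that formula, which is inferred from the reported numbers rather than stated in the table. The function name `percent_change` and the `rows` dictionary are illustrative only. Because the table shows times rounded to three significant figures, the recomputed percentages match the reported ones only approximately.

```python
def percent_change(original_ns: float, distributed_ns: float) -> float:
    """Relative change of the distributed time vs. the original time, in percent.

    Assumed formula: (distributed - original) / original * 100.
    Negative results mean the distributed run took less time.
    """
    return (distributed_ns - original_ns) / original_ns * 100.0


# Times in nanoseconds, copied from the table:
# gene pairs -> (original, 2 slaves, 3 slaves, 4 slaves)
rows = {
    10: (4.14e7, 4.82e8, 2.13e11, 1.84e11),
    100: (1.23e8, 4.42e7, 1.63e11, 4.60e10),
    1000: (1.11e8, 6.61e7, 6.75e9, 6.01e9),
    10000: (3.51e9, 3.31e8, 6.69e8, 3.46e8),
}

for pairs, (original, *slaves) in rows.items():
    # One percentage per slave configuration (2, 3, and 4 slaves).
    changes = [round(percent_change(original, t), 2) for t in slaves]
    print(pairs, changes)
```

Read this way, only the 10,000-pair row shows a reduction in time for all three slave configurations.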