Research Article
Handling Big Data Scalability in Biological Domain Using Parallel and Distributed Processing: A Case of Three Biological Semantic Similarity Measures
Table 12
Enhanced SORA average time in the distributed system (2, 3 and 4 slaves) with input data divided equally.
| Number of Gene Pairs | Original SORA Average Time (ns) | Threaded SORA Average Time (ns) (Input Data Divided Equally), 2 Slaves | Threaded SORA Average Time (ns) (Input Data Divided Equally), 3 Slaves | Threaded SORA Average Time (ns) (Input Data Divided Equally), 4 Slaves | % Threaded SORA Average Time (Input Data Divided Equally) vs. Original SORA Average Time, 2 Slaves | % Threaded SORA Average Time (Input Data Divided Equally) vs. Original SORA Average Time, 3 Slaves | % Threaded SORA Average Time (Input Data Divided Equally) vs. Original SORA Average Time, 4 Slaves |
|---|---|---|---|---|---|---|---|
| 10 | 4.14E+07 | 6.03E+08 | 5.27E+12 | 1.06E+13 | 4.82E+08 | 2.13E+11 | 1.84E+11 |
| 100 | 1.23E+08 | 7.06E+07 | 3.20E+11 | 6.63E+11 | 4.42E+07 | 1.63E+11 | 4.60E+10 |
| 1000 | 1.11E+08 | 6.15E+07 | 3.09E+10 | 6.40E+10 | 6.61E+07 | 6.75E+09 | 6.01E+09 |
| 10000 | 3.51E+09 | 1.83E+09 | 6.37E+09 | 3.56E+09 | 3.31E+08 | 6.69E+08 | 3.46E+08 |
| 100000 | X | X | X | X | X | X | X |
| 1000000 | X | X | X | X | X | X | X |
| Average | 9.47E+08 | 6.41E+08 | 1.41E+12 | 2.83E+12 | 2.31E+08 | 9.59E+10 | 5.91E+10 |
X indicates that, due to limited memory, the system required many hours to find the similarity of some of the pairs.