Table 5: Time taken by the Hadoop cluster to eliminate redundancy.

Number of tags  Cluster size
                1 server    2 servers   3 servers   4 servers   10 servers  15 servers  20 servers

30              4 sec       7 sec       7 sec       7 sec       7 sec       7 sec       7 sec
50000           5 sec       8 sec       7 sec       7 sec       4 sec       2 sec       1 sec
500000          16 sec      18 sec      14 sec      12 sec      6 sec       3 sec       2 sec
1.2 million     33 sec      24 sec      22 sec      19 sec      10 sec      6 sec       3 sec
45 million      52 min      48 min      42 min      41 min      21 min      11 min      5 min