Research Article

Handling Data Skew in MapReduce Cluster by Using Partition Tuning

Table 1

Application characteristics.

Application  Data type     Input data size (GB)  Frequency of variation of the keys  Average variation in key distribution  Method
II-1         Wikipedia     4                     61%                                 33%                                    Hadoop, PTSH
II-2         Wikipedia     4                     156%                                108%                                   Hadoop, PTSH
WC-1         RandomWriter  7.5                   42%                                 136%                                   Hadoop, PTSH
WC-2         RandomWriter  7.5                   125%                                211%                                   Hadoop, PTSH
WC-3         RandomWriter  7.5                   116%                                130%                                   Hadoop, Closer, LEEN, PTSH