Research Article

Improving I/O Efficiency in Hadoop-Based Massive Data Analysis Programs

Table 1

Statistics of the selected dataset from TPC-H [24].

Table name    Data size     Number of rows
Customer      S × 24 MB     S × 150,000
Orders        S × 171 MB    S × 1,500,000
Lineitem      S × 759 MB    S × 6,001,215

S: scale factor.
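
As a rough illustration of how the entries in Table 1 scale, the following sketch applies the table's linear formulas literally for a chosen scale factor S. The names `TABLE_STATS` and `scaled_stats` are hypothetical and are not taken from the paper; the per-unit values are copied from Table 1.

```python
# Per-unit data size (MB) and row count for each table, as listed in Table 1.
# These are the values at scale factor S = 1.
TABLE_STATS = {
    "Customer": {"size_mb": 24, "rows": 150_000},
    "Orders": {"size_mb": 171, "rows": 1_500_000},
    "Lineitem": {"size_mb": 759, "rows": 6_001_215},
}


def scaled_stats(scale_factor):
    """Return {table name: (size in MB, row count)} for scale factor S.

    Assumption: both quantities grow linearly with S, exactly as the
    formulas in Table 1 are written.
    """
    return {
        name: (scale_factor * stats["size_mb"], scale_factor * stats["rows"])
        for name, stats in TABLE_STATS.items()
    }


if __name__ == "__main__":
    # Print the approximate size and row count of each table at S = 10.
    for name, (size_mb, rows) in scaled_stats(10).items():
        print(f"{name}: {size_mb} MB, {rows:,} rows")
```

Under these assumptions, the three tables together occupy roughly S × 954 MB, which gives a quick estimate of the input volume for a given scale factor.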