Research Article

Query Execution Optimization in Spark SQL

Table 2

Parameters in the cost model.

Parameter   Meaning

X           Size of read/written data
C0          Seek time and rotational delay
C1          Time required to transmit 1 MB of data
α           Proportion of non-local data to total data
T           Number of I/O occurrences
|Din|       Size of stage input data
|Dout|      Size of stage output data
tr          Time to read 1 MB of data locally
tw          Time to write 1 MB of data locally
tb          Time to transfer 1 MB of data over the network
B           Buffer size of a Spark task
m           Number of tasks in the stage
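
To make the roles of these parameters concrete, the following minimal sketch shows one way they could combine. It is an illustration only: the table lists the parameters but not the formulas, so both expressions below are assumptions rather than the article's cost model. The first assumes a conventional seek-plus-transfer disk model built from C0, C1, T, and X; the second assumes that a fraction α of the stage input |Din| is read over the network at tb per MB and the rest locally at tr per MB. The function names and the choice of MB and seconds as units are hypothetical.

```python
def disk_io_cost(x_mb: float, c0: float, c1: float, t: int) -> float:
    """Assumed seek-plus-transfer model (not taken from the article).

    Each of the T I/O occurrences pays the seek and rotational delay C0,
    and the X MB of data are moved at C1 time units per MB.
    """
    return c0 * t + c1 * x_mb


def stage_read_cost(d_in_mb: float, alpha: float, t_r: float, t_b: float) -> float:
    """Assumed read cost for a stage (not taken from the article).

    A fraction alpha of |Din| is non-local and costs t_b per MB over the
    network; the remaining (1 - alpha) is local and costs t_r per MB.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return d_in_mb * ((1.0 - alpha) * t_r + alpha * t_b)


if __name__ == "__main__":
    # Hypothetical values chosen only to exercise the functions.
    print(disk_io_cost(x_mb=100, c0=0.01, c1=0.02, t=5))
    print(stage_read_cost(d_in_mb=200, alpha=0.25, t_r=0.01, t_b=0.04))
```

Under these assumptions, raising α shifts the read cost toward the network rate tb, which is why data locality matters when tb is larger than tr.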