Research Article
Query Execution Optimization in Spark SQL
Algorithm 1
Algorithm to construct a variable-width distribution histogram.
Construct variable-width distribution histogram by frequency distribution histogram
input: Hfre = {<attr1, freq1>, <attr2, freq2>, …, <attrn, freqn>}
output: Hwidth = {<start1, end1, times1>, <start2, end2, times2>, …, <startm, endm, timesm>}
procedure
    i ⟵ 1; Hwidth ⟵ {}
    start ⟵ attr1; end ⟵ attr1;
    max ⟵ freq1; T ⟵ freq1
    while i ≤ n do
        i ⟵ i + 1
        if |max − freqi| / freqi < 0.05 then
            end ⟵ attri
            T ⟵ T + freqi
            if freqi > max then
                max ⟵ freqi
            end if
        else
            Hwidth ⟵ Hwidth + <start, end, T>
            start ⟵ attri; end ⟵ attri
            max ⟵ freqi; T ⟵ freqi
        end if
    end while
end procedure
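As an illustration, Algorithm 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `build_variable_width_histogram` and the `tolerance` parameter are hypothetical, and the input is assumed to be a list of `(attr, freq)` pairs already sorted by attribute value. Two details go beyond the printed pseudocode and are assumptions: the loop visits elements 2 through n only (the printed loop increments i before reading freqi, which would read past freqn), and the last open bucket is appended after the loop (the pseudocode shows no final append). A zero frequency is treated as starting a new bucket to avoid division by zero.

```python
from typing import List, Tuple


def build_variable_width_histogram(
    h_fre: List[Tuple[float, int]],
    tolerance: float = 0.05,  # the 0.05 threshold from Algorithm 1
) -> List[Tuple[float, float, int]]:
    """Merge adjacent attribute values with similar frequencies into buckets.

    Returns a list of (start, end, times) triples, where times is the
    summed frequency of all attribute values in the bucket.
    """
    if not h_fre:
        return []

    h_width: List[Tuple[float, float, int]] = []
    start = end = h_fre[0][0]
    max_freq = total = h_fre[0][1]

    # Assumption: iterate over elements 2..n only.
    for attr, freq in h_fre[1:]:
        # Test from Algorithm 1: |max - freq_i| / freq_i < 0.05.
        # Assumption: freq == 0 fails the test and opens a new bucket.
        if freq > 0 and abs(max_freq - freq) / freq < tolerance:
            end = attr
            total += freq
            if freq > max_freq:
                max_freq = freq
        else:
            # Close the current bucket and start a new one at this value.
            h_width.append((start, end, total))
            start = end = attr
            max_freq = total = freq

    # Assumption: flush the last open bucket, which the pseudocode omits.
    h_width.append((start, end, total))
    return h_width


if __name__ == "__main__":
    sample = [(1, 100), (2, 102), (3, 104), (4, 200), (5, 205), (6, 50)]
    print(build_variable_width_histogram(sample))
    # prints [(1, 3, 306), (4, 5, 405), (6, 6, 50)]
```

In the sample, attribute values 1 to 3 have frequencies within 5% of the running maximum and collapse into one bucket, while the jumps to 200 and then down to 50 each open a new bucket.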