Research Article

Query Execution Optimization in Spark SQL

Algorithm 1

Construction of a variable-width distribution histogram from a frequency distribution histogram.
input: Hfre = {<attr1, freq1>, <attr2, freq2>, …, <attrn, freqn>}
output: Hwidth = {<start1, end1, times1>,<start2, end2, times2>, …, <startm, endm, timesm>}
procedure
i ⟵ 1; Hwidth ⟵ {}
start ⟵ attr1; end ⟵ attr1
max ⟵ freq1; T ⟵ freq1
while i < n do
  i ⟵ i + 1
  if |max-freqi|/freqi < 0.05 then
   end ⟵ attri
   T ⟵ T + freqi
   if freqi > max then
    max ⟵ freqi
   end if
  else
   Hwidth ⟵ Hwidth + <start, end, T>
   start ⟵ attri; end ⟵ attri
   max ⟵ freqi; T ⟵ freqi
  end if
end while
Hwidth ⟵ Hwidth + <start, end, T>
end procedure
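
The procedure in Algorithm 1 can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation: the function name `build_variable_width_histogram`, the `threshold` parameter (defaulting to the 0.05 used in the pseudocode), and the sample data are assumptions introduced here. The sketch also assumes that every frequency is strictly positive, since the merge test divides by the current frequency, and it appends the last open bucket once the input is exhausted so that the final range is not lost.

```python
from typing import List, Tuple

# (attribute value, frequency) pairs, assumed sorted by attribute value
FreqHistogram = List[Tuple[int, int]]
# (start, end, total frequency) buckets
WidthHistogram = List[Tuple[int, int, int]]


def build_variable_width_histogram(
    h_fre: FreqHistogram, threshold: float = 0.05
) -> WidthHistogram:
    """Merge adjacent attribute values with similar frequencies into buckets.

    Hypothetical helper following Algorithm 1; assumes all frequencies > 0.
    """
    if not h_fre:
        return []

    h_width: WidthHistogram = []
    first_attr, first_freq = h_fre[0]
    start = end = first_attr
    max_freq = total = first_freq

    for attr, freq in h_fre[1:]:
        # Relative gap between the bucket's peak frequency and this one
        if abs(max_freq - freq) / freq < threshold:
            # Similar enough: extend the current bucket
            end = attr
            total += freq
            if freq > max_freq:
                max_freq = freq
        else:
            # Too different: close the current bucket and open a new one
            h_width.append((start, end, total))
            start = end = attr
            max_freq = total = freq

    # Close the bucket that is still open when the input ends
    h_width.append((start, end, total))
    return h_width


# Illustrative data only (not taken from the article)
sample = [(1, 100), (2, 102), (3, 98), (4, 50), (5, 51), (6, 200)]
print(build_variable_width_histogram(sample))
# → [(1, 3, 300), (4, 5, 101), (6, 6, 200)]
```

In the sample run, values 1 to 3 have frequencies within 5% of the bucket's running maximum and are merged into one range, values 4 and 5 form a second range, and value 6 is left as a bucket of its own.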