Research Article
Query Execution Optimization in Spark SQL
Algorithm 2
Estimating the number of tuples produced by a join operation using the histogram method.
| Estimate the size of join operation by histogram method |
| --- |
| Input: HR = {h1r, h2r, …, hnr}, HS = {h1s, h2s, …, hms} |
| Output: total number of tuples Sum after the join |
| procedure |
| i ⟵ 1; j ⟵ 1; Sum ⟵ 0 |
| while i ≤ n and j ≤ m do |
| if hi and hj have overlap then |
| Overlap ⟵ overlap of the two histogram buckets |
| templeft ⟵ hi.times ∗ Overlap/(hi.end − hi.start) |
| tempright ⟵ hj.times ∗ Overlap/(hj.end − hj.start) |
| Sum ⟵ Sum + templeft ∗ tempright/Overlap |
| if hi.end < hj.end then |
| i ⟵ i + 1 |
| else |
| j ⟵ j + 1 |
| end if |
| else |
| if hi.end < hj.start then |
| i ⟵ i + 1 |
| else |
| j ⟵ j + 1 |
| end if |
| end if |
| end while |
| end procedure |

Here hi denotes the i-th bucket of HR and hj the j-th bucket of HS.
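To make the bucket arithmetic of Algorithm 2 concrete, the following is a minimal Python sketch of the same two-pointer sweep. It is an illustration under stated assumptions, not the implementation used in the article: the `Bucket` class and the function name `estimate_join_size` are hypothetical, buckets in each histogram are assumed to be sorted by range with non-zero width, values are assumed to be uniformly distributed inside a bucket, and two buckets that merely touch at a boundary are treated as non-overlapping (hence `<=` in the non-overlap branch) so that the division by the overlap width is always defined.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Bucket:
    """One histogram bucket (hypothetical helper type, not from the article)."""
    start: float  # lower bound of the bucket's value range
    end: float    # upper bound of the bucket's value range
    times: float  # number of tuples whose join key falls in the bucket


def estimate_join_size(hr: List[Bucket], hs: List[Bucket]) -> float:
    """Estimate the tuple count of R JOIN S from two sorted histograms."""
    i, j, total = 0, 0, 0.0
    while i < len(hr) and j < len(hs):
        hi, hj = hr[i], hs[j]
        # Width of the value range shared by the two buckets.
        overlap = min(hi.end, hj.end) - max(hi.start, hj.start)
        if overlap > 0:
            # Tuples of each bucket that fall inside the shared range,
            # assuming a uniform distribution within the bucket.
            left = hi.times * overlap / (hi.end - hi.start)
            right = hj.times * overlap / (hj.end - hj.start)
            # Same update as in Algorithm 2: Sum + templeft * tempright / Overlap.
            total += left * right / overlap
            # Advance the bucket that ends first.
            if hi.end < hj.end:
                i += 1
            else:
                j += 1
        else:
            # Assumption: touching buckets count as disjoint, so use <= here.
            if hi.end <= hj.start:
                i += 1
            else:
                j += 1
    return total


if __name__ == "__main__":
    hr = [Bucket(0, 10, 100), Bucket(10, 20, 50)]
    hs = [Bucket(5, 15, 80)]
    print(estimate_join_size(hr, hs))
```

In this example the bucket [5, 15] of the second histogram overlaps each bucket of the first histogram over a range of width 5, contributing 50 ∗ 40/5 and 25 ∗ 40/5 tuples respectively. Because each index only moves forward, the sweep visits at most n + m bucket pairs.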