Research Article
A Type-Based Blocking Technique for Efficient Entity Resolution over Large-Scale Data
Algorithm 2. Attribute clustering algorithm.
Input: Map<attribute name, List> // the attributes of one same-type block
       δ: the threshold for attribute similarity
Output: C, the set of attribute name clusters
(a) connects ← {};
(b) noOfAttributes ← the size of Map<attribute name, List>;
(c) For (int i = 0; i < noOfAttributes; i++) do
(d)     For (int j = i + 1; j < noOfAttributes; j++) do
(e)         sim ← attribute[i].getSimilarAttribute(attribute[j]); // refer to formulas (1)~(4)
(f)         If (sim > δ) then connects ← connects.add(new Connect(i, j));
(g)     End For (j)
(h) End For (i)
(i) cons ← computeTransitiveClosure(connects);
(j) C ← getConnectedComponents(cons);
(k) For each c ∈ C do
(l)     If (c.size() == 1) then C.remove(c);
(m) End For (c);
(n) Return C;
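The steps of Algorithm 2 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: formulas (1)~(4) are not reproduced in this section, so the `jaccard` function is a stand-in similarity measure chosen only for the example, and the names `cluster_attributes` and `attr_values` are hypothetical. A union-find structure replaces the explicit transitive closure and connected-components steps, since it yields the same clusters.

```python
from itertools import combinations


def jaccard(a, b):
    """Stand-in similarity (an assumption): Jaccard overlap of two value sets.

    The paper's formulas (1)~(4) would be used here instead.
    """
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_attributes(attr_values, delta, similarity=jaccard):
    """Group attribute names whose similarity exceeds delta, transitively.

    attr_values maps each attribute name to the list of its values within
    one same-type block. Singleton clusters are dropped, as in steps (k)-(m).
    """
    names = list(attr_values)
    parent = {name: name for name in names}

    def find(x):
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Steps (c)-(h): compare every unordered pair and link similar attributes.
    for a, b in combinations(names, 2):
        if similarity(attr_values[a], attr_values[b]) > delta:
            parent[find(a)] = find(b)

    # Steps (i)-(j): attributes sharing a root form one connected component.
    components = {}
    for name in names:
        components.setdefault(find(name), set()).add(name)

    # Steps (k)-(n): discard clusters that contain a single attribute.
    return [c for c in components.values() if len(c) > 1]
```

In this sketch, two attributes that are never directly similar can still end up in the same cluster when a third attribute links them, which is the effect of the transitive closure in step (i).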