Research Article

A Type-Based Blocking Technique for Efficient Entity Resolution over Large-Scale Data

Algorithm 2. Attribute clustering algorithm.

Input: Map<attribute name, List>  // built for one same-type block
     δ: the threshold for attribute similarity
Output: C: the set of attribute-name clusters
(a)   connects ←{};
(b)   noOfAttributes ← the size of Map<attribute name, List>;
(c)   For (int i = 0; i < noOfAttributes; i++) do
(d)     For (int j = i + 1; j < noOfAttributes; j++) do
(e)      sim ← attribute[i].getSimilarAttribute(attribute[j]); // refer to formulas (1)~(4)
(f)      If (sim > δ) then connects.add(new Connect(i, j));
(g)     End For (j)
(h)  End For (i)
(i)   cons ← computeTransitiveClosure(connects);
(j)   C←getConnectedComponents(cons);
(k)   For each cluster c ∈ C do
(l)      If (c.size() == 1) then C.remove(c);
(m)   End For (c);
(n)     Return C;
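The steps of Algorithm 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The similarity measure of formulas (1)~(4) is not reproduced here, so a hypothetical Jaccard overlap on attribute values stands in for it and can be replaced through the `similarity` parameter. The names `cluster_attributes` and `jaccard` and the sample data are likewise illustrative. The transitive closure and connected-components steps are merged into a single union-find pass, which yields the same clusters.

```python
from itertools import combinations
from typing import Callable, Dict, List, Set


def jaccard(a: List[str], b: List[str]) -> float:
    """Stand-in similarity (an assumption, NOT formulas (1)~(4) of the paper)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def cluster_attributes(
    attributes: Dict[str, List[str]],
    delta: float,
    similarity: Callable[[List[str], List[str]], float] = jaccard,
) -> List[Set[str]]:
    """Cluster attribute names whose pairwise similarity exceeds delta."""
    names = list(attributes)
    parent = {name: name for name in names}

    def find(x: str) -> str:
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Steps (c)-(h): connect every pair whose similarity exceeds delta.
    for a, b in combinations(names, 2):
        if similarity(attributes[a], attributes[b]) > delta:
            parent[find(a)] = find(b)

    # Steps (i)-(j): names sharing a root form one connected component.
    clusters: Dict[str, Set[str]] = {}
    for name in names:
        clusters.setdefault(find(name), set()).add(name)

    # Steps (k)-(m): discard clusters that contain a single attribute.
    return [c for c in clusters.values() if len(c) > 1]


# Hypothetical sample data for one same-type block.
example = {
    "name": ["alice", "bob", "carol"],
    "full_name": ["alice", "bob", "dave"],
    "author": ["bob", "dave", "erin"],
    "year": ["1999", "2004"],
}
clusters = cluster_attributes(example, delta=0.4)
print(clusters)
```

With this sample, `name` and `author` are not directly similar enough (overlap 0.2), but both connect to `full_name` (overlap 0.5), so the three end up in one cluster through transitivity, while `year` forms a singleton and is removed.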