Input: sd: dataset; |
δ: the threshold on the ratio of distinct values to sampled values (n/λ) below which an attribute is treated as an enumeration; |
λ: the number of values randomly sampled from sd for each attribute; |
Output: Map < BT (block type), list of attribute names >, where BT ∈ {NUME, STRING, DATE, ENUM} |
(a) Map < attribute, List < v1, v2,…, vλ> > mediateData ← sd; // A Map stores each attribute and its sampled values. |
(b) blockMap← new HashMap < String, List>; // blockMap is used to store the return value; |
(c) For each attribute in mediateData Do |
(d) valuesNoRep← Remove duplicate elements from List < v1,v2,…,vλ>; |
(e) n← valuesNoRep.size(); |
(f) If ((double)n/λ < δ) then {//the type of this attribute is enumeration |
(g) List enumAttributes ← blockMap.get(“ENUM”); |
(h) If (enumAttributes == null) then {enumAttributes ← new ArrayList; |
(i) blockMap.put(“ENUM”, enumAttributes);} |
(j) enumAttributes.add(attribute name); |
(k) }Else if (the elements of valuesNoRep conform to the date type rules) then { |
(l) List dateAttributes ← blockMap.get(“DATE”); |
(m) If (dateAttributes == null) then {dateAttributes ← new ArrayList; |
(n) blockMap.put(“DATE”, dateAttributes);} |
(o) dateAttributes.add(attribute name); |
(p) }Else if (the elements of valuesNoRep conform to the numerical type rules) then { |
(q) List numericAttributes ← blockMap.get(“NUME”); |
(r) If (numericAttributes == null) then {numericAttributes ← new ArrayList; |
(s) blockMap.put(“NUME”, numericAttributes);} |
(t) numericAttributes.add(attribute name); |
(u) } Else { |
(v) //other attributes will be treated as string type |
(w) List stringAttributes ← blockMap.get(“STRING”); |
(x) If (stringAttributes == null) then {stringAttributes ← new ArrayList; |
(y) blockMap.put(“STRING”, stringAttributes);} |
(z) stringAttributes.add(attribute name); |
(aa) } |
(bb) End For each attribute in mediateData |
(cc) return blockMap; |
(Note: δ and λ should be adjusted according to the size of the dataset) |
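The steps above can be sketched in Java roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `AttributeBlocker` and the helpers `isDate` and `isNumeric` are hypothetical, the sampling step is assumed to have already produced mediateData (attribute name mapped to its λ sampled, non-null values), and the "date type rules" and "numerical type rules" are not specified in the listing, so ISO-8601 parsing and `Double.parseDouble` stand in for them here.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical class name; sketches steps (a) to (cc) of the listing.
final class AttributeBlocker {

    // ASSUMPTION: the date rule is "parses as an ISO-8601 date (yyyy-MM-dd)".
    static boolean isDate(String v) {
        try {
            LocalDate.parse(v);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    // ASSUMPTION: the numeric rule is "parses as a Java double".
    static boolean isNumeric(String v) {
        try {
            Double.parseDouble(v);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // mediateData: attribute name -> its lambda sampled values (assumed non-null).
    // delta: threshold on the distinct-value ratio n / lambda.
    static Map<String, List<String>> classify(Map<String, List<String>> mediateData,
                                              double delta) {
        Map<String, List<String>> blockMap = new HashMap<>();
        for (Map.Entry<String, List<String>> entry : mediateData.entrySet()) {
            List<String> sampled = entry.getValue();
            if (sampled.isEmpty()) {
                continue; // nothing sampled, so no evidence to classify on
            }
            // Steps (d) and (e): drop duplicates and count distinct values.
            Set<String> valuesNoRep = new LinkedHashSet<>(sampled);
            int n = valuesNoRep.size();
            int lambda = sampled.size();

            String type;
            if ((double) n / lambda < delta) {
                type = "ENUM";      // few distinct values relative to the sample
            } else if (valuesNoRep.stream().allMatch(AttributeBlocker::isDate)) {
                type = "DATE";
            } else if (valuesNoRep.stream().allMatch(AttributeBlocker::isNumeric)) {
                type = "NUME";
            } else {
                type = "STRING";    // everything else is treated as a string
            }
            // Replaces the repeated "get, null check, put, add" pattern.
            blockMap.computeIfAbsent(type, k -> new ArrayList<>()).add(entry.getKey());
        }
        return blockMap;
    }
}
```

For instance, with δ = 0.5 and six sampled values per attribute, an attribute whose sample contains only "M" and "F" has a ratio of 2/6 and falls into ENUM, while attributes with six distinct ISO dates, six distinct numbers, or six distinct names fall into DATE, NUME, and STRING respectively. The enumeration test runs first, as in the listing, so a date or numeric column with few distinct values is still classified as ENUM.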