Research Article

Parallel Attribute Reduction Algorithm for Complex Heterogeneous Data Using MapReduce

Algorithm 2

Hash-Reduce function.
Input: <KEYHM, VALUEHM>
Output: <KEYHR, VALUEHR> // KEYHR is the set of distinct hash values key', and VALUEHR is the set of sample-ID subsets value', each containing the samples that share the hash value key'.
begin
  <KEYHR, VALUEHR> = ∅
  for <key, value> in <KEYHM, VALUEHM> do
    if key does not appear in <KEYHR, VALUEHR>
      <key', value'> = <key, value>
    else
      if key = key'_k
        <KEYHR, VALUEHR> = <KEYHR, VALUEHR> - <key', value'>
        value'_k = value'_k ∪ value // combine samples with the same hash value to obtain the hash bucket
      end if
    end if
    <KEYHR, VALUEHR> = <KEYHR, VALUEHR> ∪ <key', value'>
  end for // write multiple output files; each file is named after a hash value and holds one hash bucket
end
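
The grouping step of Algorithm 2 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `hash_reduce`, the use of strings for hash values, and the use of integer sets for sample IDs are assumptions made for illustration, and the multi-file output is replaced by an in-memory dictionary in which each entry stands for one hash bucket.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple


def hash_reduce(
    pairs: Iterable[Tuple[Hashable, Iterable[int]]]
) -> Dict[Hashable, Set[int]]:
    """Merge sample-ID subsets that share a hash value into hash buckets.

    `pairs` plays the role of <KEYHM, VALUEHM>; the returned dictionary
    plays the role of <KEYHR, VALUEHR> (hypothetical in-memory stand-in
    for the one-file-per-bucket output described in the algorithm).
    """
    buckets: Dict[Hashable, Set[int]] = {}
    for key, value in pairs:
        # New key: open a new bucket. Known key: take the union of the
        # existing bucket and the incoming sample IDs.
        buckets.setdefault(key, set()).update(value)
    return buckets


# Example with made-up hash values and sample IDs.
pairs = [("h1", {1}), ("h2", {2}), ("h1", {3, 4}), ("h2", {5})]
buckets = hash_reduce(pairs)
print(buckets["h1"] == {1, 3, 4}, buckets["h2"] == {2, 5})
```

A dictionary keyed by hash value makes the membership test and the union in the loop a single average constant-time operation per pair, whereas the pseudocode expresses the same update as removing and re-adding the pair.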