Research Article
Partition Selection for Large-Scale Data Management Using KNN Join Processing
| Input: k, R, S |
| Output: s.predictLabel // prediction label of s |
| map: <row number, s> |
| initialize distance[0..k-1] = +∞ and trainLabel[0..k-1] = NULL; |
| foreach r ∈ R do |
|   dis = dis(r, s); // Euclidean distance between r and s |
|   m = index of the largest value in distance[0..k-1]; |
|   if dis < distance[m] then // keep the k smallest distances |
|     distance[m] = dis; |
|     trainLabel[m] = r.label; |
| for j = 0 to k - 1 do |
|   output(s, trainLabel[j]); |
| reduce: <s, Labels> |
| hmp = new HashMap(); // create a HashMap object hmp |
| foreach label ∈ Labels do // count the occurrences of each label |
|   if hmp.get(label) != NULL then // the label already exists in hmp |
|     hmp.put(label, hmp.get(label) + 1); // increment its count |
|   else // the label does not exist in hmp |
|     hmp.put(label, 1); // insert the label with a count of 1 |
| predictLabel = the label with the largest count in hmp; // prediction label |
| output(s, predictLabel); |
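
The map and reduce phases of this algorithm can be illustrated with a minimal single-process sketch in Python. It is not the authors' implementation and does not run on a MapReduce cluster. The function names (`euclidean`, `knn_map`, `knn_reduce`, `knn_classify`) and the toy data are hypothetical, the row-number key of the map input is omitted, and `heapq.nsmallest` stands in for the explicit `distance`/`trainLabel` arrays. Test points are assumed to be hashable tuples so that they can serve as shuffle keys.

```python
import heapq
import math
from collections import Counter, defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_map(s, R, k):
    """Map phase: emit (s, label) for the k training records nearest to s.

    R is a sequence of (features, label) pairs; s is a feature tuple.
    """
    nearest = heapq.nsmallest(k, R, key=lambda r: euclidean(r[0], s))
    for _, label in nearest:
        yield s, label


def knn_reduce(s, labels):
    """Reduce phase: majority vote over the labels emitted for s."""
    counts = Counter(labels)
    predict_label, _ = counts.most_common(1)[0]
    return s, predict_label


def knn_classify(R, S, k):
    """Run map, group-by-key (the shuffle step), and reduce in one process."""
    grouped = defaultdict(list)
    for s in S:
        for key, label in knn_map(s, R, k):
            grouped[key].append(label)
    return dict(knn_reduce(s, labels) for s, labels in grouped.items())


# Hypothetical toy data: two clusters labelled "a" and "b".
R = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((1.0, 1.0), "b"),
     ((0.9, 1.1), "b"), ((1.2, 0.8), "b")]
S = [(0.05, 0.1), (1.0, 0.9)]
predictions = knn_classify(R, S, k=3)
```

In this sketch, ties in the vote are resolved in favour of the label emitted first, which is an arbitrary choice; a real deployment would also partition R across mappers and merge the per-partition candidates before voting.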