Research Article

Distributed Learning over Massive XML Documents in ELM Feature Space

Algorithm 2

Reducer of DXRC.
Input: ⟨term, list(⟨docID, element, times, sum⟩)⟩
Output: training samples matrix in the form of ⟨position, tfidf⟩
(1) Initiate HashMap mapDocEleTF;
(2) Initiate HashMap mapTDocs;
(3) totalDocsNum = DistributedCache.get(“totalDocsNum”);
(4) weights = DistributedCache.get(“elementWeightsVector”);
(5) foreach itr ∈ list do
(6)  weightedDocEleTF = weights[docID, element] * itr.times/itr.sum;
(7)  mapDocEleTF.put(⟨docID, element⟩, weightedDocEleTF);
(8)  if mapTDocs.containsKey(docID) then
(9)    newTimes = mapTDocs.get(docID) + itr.times;
(10)    mapTDocs.put(docID, newTimes);
(11)  else
(12)    mapTDocs.put(docID, itr.times);
(13) docsNumber = mapTDocs.size();
(14) idf = log(totalDocsNum/docsNumber);
(15) foreach itrDocEleTF ∈ mapDocEleTF do
(16)  position = itrDocEleTF.getKey();
(17)  tfidf = itrDocEleTF.getValue() * idf;
(18)  emit(position, tfidf);
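
The reducer logic can be illustrated outside Hadoop with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name reduce_term and its parameters are hypothetical, the element weights are assumed to be looked up by the (docID, element) pair, and idf is assumed to be the natural logarithm of the total number of documents divided by the number of documents containing the term.

```python
import math
from collections import defaultdict


def reduce_term(records, total_docs, weights):
    """Compute weighted tf-idf values for a single term.

    records    : iterable of (doc_id, element, times, total) tuples, where
                 `times` is the term count in that element and `total` is
                 the element's total term count (the listing's `sum`).
    total_docs : total number of documents in the collection.
    weights    : dict mapping (doc_id, element) to an element weight
                 (assumed lookup scheme, see the lead-in).
    Returns a list of ((doc_id, element), tfidf) pairs.
    """
    weighted_tf = {}                   # (doc_id, element) -> weighted tf
    times_per_doc = defaultdict(int)   # doc_id -> occurrences of the term

    for doc_id, element, times, total in records:
        key = (doc_id, element)
        weighted_tf[key] = weights[key] * times / total
        times_per_doc[doc_id] += times

    if not weighted_tf:
        return []                      # no records, nothing to emit

    docs_number = len(times_per_doc)   # documents containing the term
    idf = math.log(total_docs / docs_number)  # assumed idf definition

    return [(position, tf * idf) for position, tf in weighted_tf.items()]


# Hypothetical input: one term seen in two elements of d1 and one of d2.
example_records = [
    ("d1", "title", 2, 4),
    ("d1", "body", 1, 10),
    ("d2", "body", 3, 6),
]
example_weights = {
    ("d1", "title"): 2.0,
    ("d1", "body"): 1.0,
    ("d2", "body"): 1.0,
}
example_output = reduce_term(example_records, 4, example_weights)
```

Each emitted pair corresponds to one cell of the training samples matrix: the key gives the position (document and element) and the value gives the weighted tf-idf for the term handled by this reducer call.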