Mathematical Problems in Engineering

Research Article

Distributed Learning over Massive XML Documents in ELM Feature Space

Reducer of DXRC.

Input: term, list(⟨docId, element, times, sum⟩)
Output: training samples matrix in the form of ⟨position, tfidf⟩ pairs
(1) Initiate HashMap mapDocEleTF;
(2) Initiate HashMap mapTDocs;
(3) totalDocsNum = DistributedCache.get(“totalDocsNum”);
(4) weights = DistributedCache.get(“elementWeightsVector”);
(5) foreach itr in list do
(6) weightedDocEleTF = weights[itr.element] * itr.times / itr.sum;
(7) mapDocEleTF.put(⟨itr.docId, itr.element⟩, weightedDocEleTF);
(8) if mapTDocs.containsKey(itr.docId) then
(9) newTimes = mapTDocs.get(itr.docId) + itr.times;
(10) mapTDocs.put(itr.docId, newTimes);
(11) else
(12) mapTDocs.put(itr.docId, itr.times);
(13) docsNumber = mapTDocs.size();
(14) idf = log(totalDocsNum / docsNumber);
(15) foreach itrDocEleTF in mapDocEleTF do
(16) position = itrDocEleTF.getKey();
(17) tfidf = itrDocEleTF.getValue() * idf;
(18) emit(position, tfidf);
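The reducer logic above can be simulated outside Hadoop with a short Python sketch. It is a minimal illustration under stated assumptions, not the authors' implementation. The function name `dxrc_reduce` is hypothetical. `sum` is assumed to be the total term count of the given element. The element weights are assumed to be a dictionary keyed by element name. The IDF is assumed to take the standard form log(N / df), where df is the number of distinct documents that contain the term.

```python
import math
from collections import defaultdict


def dxrc_reduce(term, postings, total_docs_num, element_weights):
    """Simulate the DXRC reducer for a single term (hypothetical helper).

    term: the reducer key; as in the pseudocode, it is implicit in the output.
    postings: iterable of (doc_id, element, times, total) tuples, where
      `times` is how often `term` occurs in `element` of `doc_id` and
      `total` is assumed to be the total term count of that element.
    element_weights: dict mapping element name -> weight (assumed layout).
    Returns a list of ((doc_id, element), tfidf) pairs, i.e. emit(position, tfidf).
    """
    doc_ele_tf = {}               # plays the role of mapDocEleTF
    term_docs = defaultdict(int)  # plays the role of mapTDocs: doc_id -> summed times

    # First pass, steps (5)-(12): weighted TF per (doc, element) and per-doc counts.
    for doc_id, element, times, total in postings:
        doc_ele_tf[(doc_id, element)] = element_weights[element] * times / total
        term_docs[doc_id] += times

    # Step (13): document frequency = number of distinct documents containing the term.
    docs_number = len(term_docs)
    if docs_number == 0:
        return []  # no postings, so nothing to emit

    # Step (14): assumed standard IDF form (natural logarithm).
    idf = math.log(total_docs_num / docs_number)

    # Second pass, steps (15)-(18): scale every weighted TF by the term's IDF.
    return [(position, tf * idf) for position, tf in doc_ele_tf.items()]


if __name__ == "__main__":
    # Toy input: one term found in three (doc, element) pairs of a 4-document corpus.
    postings = [("d1", "title", 2, 4), ("d1", "body", 1, 10), ("d2", "body", 3, 6)]
    weights = {"title": 2.0, "body": 1.0}
    for position, tfidf in dxrc_reduce("xml", postings, 4, weights):
        print(position, round(tfidf, 4))
```

The sketch buffers every ⟨docId, element⟩ entry before scaling, because the document frequency, and therefore the IDF, is known only after the first loop has seen all postings of the term. This mirrors the two-pass structure of steps (5) to (18).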