Research Article
Distributed Learning over Massive XML Documents in ELM Feature Space
Input: <term, list(<docID, element, times, sum>)>
Output: training samples matrix in the form of <position, tfidf>
(1) Initiate HashMap mapDocEleTF;
(2) Initiate HashMap mapTDocs;
(3) N = DistributedCache.get("totalDocsNum");
(4) weights = DistributedCache.get("elementWeightsVector");
(5) foreach itr in list do
(6)     weightedDocEleTF = weights[docID, element] * itr.times / itr.sum;
(7)     mapDocEleTF.put(<docID, element>, weightedDocEleTF);
(8)     if mapTDocs.containsKey(docID) then
(9)         newTimes = mapTDocs.get(docID) + itr.times;
(10)        mapTDocs.put(docID, newTimes);
(11)    else
(12)        mapTDocs.put(docID, itr.times);
(13) docsNumber = mapTDocs.size();
(14) idf = log(N / docsNumber);
(15) foreach itrDocEleTF in mapDocEleTF do
(16)     position = itrDocEleTF.getKey();
(17)     tfidf = itrDocEleTF.getValue() * idf;
(18)     emit(position, tfidf);
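The reduce step listed above can be illustrated with a small single-process sketch. It is not the authors' Hadoop implementation: the function name `reduce_term`, the plain-dictionary stand-ins for the two `DistributedCache` entries, the assumption that the weight depends only on the element (as the key `elementWeightsVector` suggests), and the choice of the natural logarithm for the inverse document frequency are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple

# One posting for a term: (doc_id, element, times, total).
# "times" is how often the term occurs in that element of the document;
# "total" plays the role of "sum" in the listing (assumed here to be the
# number of terms in that element).
Posting = Tuple[str, str, int, int]
Position = Tuple[str, str]  # (doc_id, element)


def reduce_term(
    postings: List[Posting],
    total_docs_num: int,
    element_weights: Dict[str, float],
) -> List[Tuple[Position, float]]:
    """Hypothetical stand-in for the reduce function: one call per term."""
    if not postings:
        return []

    map_doc_ele_tf: Dict[Position, float] = {}  # weighted TF per (doc, element)
    map_t_docs: Dict[str, int] = {}             # term occurrences per document

    for doc_id, element, times, total in postings:
        # Weighted term frequency of the term inside this element.
        weighted_tf = element_weights[element] * times / total
        map_doc_ele_tf[(doc_id, element)] = weighted_tf
        # Accumulate occurrences so each document is counted once below.
        map_t_docs[doc_id] = map_t_docs.get(doc_id, 0) + times

    # Number of distinct documents that contain the term.
    docs_number = len(map_t_docs)
    # Assumption: idf = ln(N / docsNumber); the log base is not given.
    idf = math.log(total_docs_num / docs_number)

    # One (position, tfidf) pair per (doc, element) entry.
    return [(position, tf * idf) for position, tf in map_doc_ele_tf.items()]


# Illustrative input: one term occurring in two of four documents.
sample_postings = [
    ("d1", "title", 2, 4),
    ("d1", "body", 1, 10),
    ("d2", "body", 3, 6),
]
sample_weights = {"title": 2.0, "body": 1.0}
for position, tfidf in reduce_term(sample_postings, 4, sample_weights):
    print(position, tfidf)
```

Because every pair emitted for a term shares one idf value, the second loop only scales the weighted term frequencies that were stored during the first pass.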