Research Article
A Dynamic Ensemble Framework for Mining Textual Streams with Class Imbalance
Algorithm 1
Clustering forest.
Input:
  the data chunk at each time stamp;
  the number of clustering trees (CTs) in the clustering forest (CF);
  the maximum number of CTs selected in the clustering forest;
  the current ensemble model;
  the misclassified subset at the current time stamp;
  the rare-class subset at the current time stamp;
  the rare-class subset as a proportion of the training chunk;
  the threshold of ….
Output: the class label of each testing instance.
(1) the data chunk at the current time stamp
(2) build the training set by Algorithm 2
(2) create a new CT using the training set
(3) if …
(4)     …
(5) Endif
(6) if …
(7)     …
(8) Endif
(9) compute the accuracy weight of each clustering tree
(10) UPDATE(…) based on the adaptive selection method
(11) obtain the misclassified subset and the rare-class subset
(12) for each testing instance do
(13)     PREDICT(…) by the voting method
(14) Endfor
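The per-chunk loop of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under explicit assumptions, not the authors' implementation. The names `CentroidModel`, `accuracy`, `vote`, `process_chunk`, `carry`, and `max_trees` are hypothetical. A nearest-centroid classifier stands in for a clustering tree. The training set is assumed to be the current chunk plus the misclassified and rare-class instances carried over from the previous chunk, as a hypothetical substitute for Algorithm 2. The accuracy weight is assumed to be accuracy on the current chunk. "Adaptive selection" is simplified to keeping the `max_trees` most accurate members. The conditional steps (3) to (8) are not modeled.

```python
from collections import Counter


class CentroidModel:
    """Hypothetical stand-in for one clustering tree (CT): a nearest-centroid
    classifier. It is NOT the paper's CT construction."""

    def __init__(self, X, y):
        self.centroids = {}
        for label in set(y):
            pts = [x for x, t in zip(X, y) if t == label]
            # Mean of each feature column for this class.
            self.centroids[label] = [sum(col) / len(pts) for col in zip(*pts)]

    def predict(self, x):
        # Label of the closest centroid under squared Euclidean distance.
        return min(self.centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(x, self.centroids[c])))


def accuracy(model, X, y):
    return sum(model.predict(x) == t for x, t in zip(X, y)) / len(y)


def vote(ensemble, x):
    # Weighted majority vote: each member adds its accuracy weight to its label.
    scores = Counter()
    for model, weight in ensemble:
        scores[model.predict(x)] += weight
    return scores.most_common(1)[0][0]


def process_chunk(ensemble, X, y, carry, max_trees):
    """One time stamp of a simplified loop; ensemble is a list of (model, weight)."""
    # Assumed training set (hypothetical substitute for Algorithm 2):
    # the current chunk plus (instance, label) pairs carried from the last chunk.
    train_X = X + [x for x, _ in carry]
    train_y = y + [t for _, t in carry]
    members = [m for m, _ in ensemble] + [CentroidModel(train_X, train_y)]
    # Accuracy weight of each member, measured on the current chunk.
    weighted = [(m, accuracy(m, X, y)) for m in members]
    # Simplified adaptive selection: keep the max_trees most accurate members.
    weighted.sort(key=lambda mw: mw[1], reverse=True)
    ensemble = weighted[:max_trees]
    # Misclassified subset and rare-class subset of the current chunk.
    counts = Counter(y)
    rare_label = min(counts, key=counts.get)
    misclassified = [(x, t) for x, t in zip(X, y) if vote(ensemble, x) != t]
    rare = [(x, t) for x, t in zip(X, y) if t == rare_label]
    return ensemble, misclassified, rare


# Toy stream of two chunks; class 1 is the rare class.
X1, y1 = [[0, 0], [0, 1], [1, 0], [5, 5]], [0, 0, 0, 1]
X2, y2 = [[1, 1], [0, 0], [6, 5], [0, 2]], [0, 0, 1, 0]
ens, mis, rare = process_chunk([], X1, y1, [], max_trees=2)
ens, mis, rare = process_chunk(ens, X2, y2, mis + rare, max_trees=2)
print(len(ens), vote(ens, [4, 4]))  # 2 1
```

Carrying the rare-class instances forward means each new stand-in model sees more minority examples than a single chunk provides. This is one plausible reading of why the algorithm collects these subsets in step (11).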