Biomedical Text Categorization Based on Ensemble Pruning and Optimized Topic Modelling
Table 9. Macro-averaged F-measure results obtained by conventional algorithms and the proposed diversity-based ensemble pruning (with LDA-based representation, k = 50).
| Classification algorithm | oh5 | oh10 | oh15 | ohscal | ohsumed |
|---|---|---|---|---|---|
| NB | 0.76 | 0.68 | 0.72 | 0.61 | 0.30 |
| SVM | 0.78 | 0.81 | 0.86 | 0.73 | 0.35 |
| Bagging+NB | 0.77 | 0.70 | 0.72 | 0.61 | 0.30 |
| Bagging+SVM | 0.85 | 0.78 | 0.81 | 0.73 | 0.37 |
| AdaBoost+NB | 0.74 | 0.69 | 0.72 | 0.61 | 0.31 |
| AdaBoost+SVM | 0.85 | 0.78 | 0.80 | 0.74 | 0.36 |
| RandomSubspace+NB | 0.76 | 0.68 | 0.70 | 0.59 | 0.29 |
| RandomSubspace+SVM | 0.79 | 0.71 | 0.73 | 0.69 | 0.33 |
| Stacking | 0.84 | 0.80 | 0.81 | 0.72 | 0.38 |
| ESM | 0.80 | 0.81 | 0.81 | 0.74 | 0.39 |
| BES | 0.81 | 0.82 | 0.83 | 0.75 | 0.41 |
| LibD3C | 0.84 | 0.85 | 0.86 | 0.76 | 0.42 |
| CDM | 0.86 | 0.86 | 0.87 | 0.78 | 0.45 |
| DEP (Genetic clustering) | 0.82 | 0.84 | 0.86 | 0.76 | 0.45 |
| DEP (PSO clustering) | 0.82 | 0.83 | 0.85 | 0.75 | 0.47 |
| DEP (Firefly clustering) | 0.87 | 0.88 | 0.88 | 0.79 | 0.49 |
| DEP (Cuckoo clustering) | 0.86 | 0.85 | 0.88 | 0.78 | 0.47 |
| DEP (Bat clustering) | 0.85 | 0.86 | 0.84 | 0.74 | 0.45 |
NB: Naïve Bayes algorithm, SVM: support vector machines, ESM: ensemble selection from libraries of models, BES: Bagging ensemble selection, LibD3C: hybrid ensemble pruning based on k-means and dynamic selection, CDM: ensemble pruning based on combined diversity measures, and DEP: the proposed diversity-based ensemble pruning.
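For readers unfamiliar with the metric reported in Table 9, the following minimal sketch shows one common way to compute a macro-averaged F-measure: the F1 score is computed separately for each class and the per-class scores are averaged with equal weight. This is an illustration under stated assumptions, not the evaluation code used in the study. The function name `macro_f_measure`, the choice to average per-class F1 scores (rather than taking the F-measure of macro-averaged precision and recall), and the convention of scoring a class with no true or predicted instances as 0 are all assumptions made here.

```python
from collections import Counter


def macro_f_measure(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (assumed definition).

    Classes are taken from the union of true and predicted labels.
    A class whose F1 denominator is zero is scored as 0.0 (assumption).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1   # correct prediction for class t
        else:
            fp[p] += 1   # class p was predicted wrongly
            fn[t] += 1   # class t was missed

    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        # F1 = 2*TP / (2*TP + FP + FN), equivalent to the harmonic
        # mean of precision and recall for class c.
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Because every class contributes equally regardless of its size, this measure penalizes poor performance on rare categories more heavily than a micro-averaged score would.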