Research Article
HSDP: A Hybrid Sampling Method for Imbalanced Big Data Based on Data Partition
Table 3
F-measure values of KNN + various sampling methods.
| Dataset | KNN | SMOTE + KNN | ADASYN + KNN | Borderline-SMOTE + KNN | HSDP (proposed) + KNN |
| --- | --- | --- | --- | --- | --- |
| Pima | 0.5543 | 0.5766 | 0.5824 | 0.5726 | 0.6105 |
| Yeast3 | 0.6343 | 0.6301 | 0.6294 | 0.6528 | 0.6698 |
| Abalone19 | 0.2019 | 0.3355 | 0.2801 | 0.2438 | 0.3045 |
| Segment0 | 0.8653 | 0.8734 | 0.8697 | 0.8843 | 0.8928 |
| Page-blocks0 | 0.7146 | 0.7301 | 0.7099 | 0.7401 | 0.6988 |
| Glass5 | 0.7132 | 0.7102 | 0.7129 | 0.7265 | 0.6827 |
| Ecoli4 | 0.5401 | 0.5268 | 0.6011 | 0.5701 | 0.6698 |
| Haberman | 0.2922 | 0.3027 | 0.3417 | 0.3285 | 0.3636 |