Research Article
HSDP: A Hybrid Sampling Method for Imbalanced Big Data Based on Data Partition
Table 4
G-means values of KNN + various sampling methods.
| Dataset | KNN | SMOTE + KNN | ADASYN + KNN | Borderline-SMOTE + KNN | HSDP (proposed) + KNN |
| --- | --- | --- | --- | --- | --- |
| Pima | 0.6503 | 0.6673 | 0.6721 | 0.6635 | 0.6932 |
| Yeast3 | 0.7536 | 0.7786 | 0.7798 | 0.7824 | 0.8034 |
| Abalone19 | 0.3165 | 0.6028 | 0.6037 | 0.4366 | 0.5437 |
| Segment0 | 0.9154 | 0.9379 | 0.9440 | 0.9261 | 0.9528 |
| Page-blocks0 | 0.8153 | 0.8629 | 0.8601 | 0.8421 | 0.8757 |
| Glass5 | 0.7887 | 0.7857 | 0.7898 | 0.7890 | 0.7715 |
| Ecoli4 | 0.7012 | 0.7401 | 0.7928 | 0.7429 | 0.8438 |
| Haberman | 0.4502 | 0.4810 | 0.5211 | 0.4983 | 0.5201 |
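
For readers who want to reproduce a metric of this kind, the following is a minimal sketch of a G-mean computation. It assumes the common binary definition, the square root of sensitivity times specificity; this section does not restate the exact formula used for Table 4, so that definition is an assumption here. The function name `g_mean` and its `positive` parameter are hypothetical and are not taken from the paper.

```python
import math


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for binary labels.

    Assumed definition: sqrt(TPR * TNR). `positive` marks the minority class.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # minority sample correctly detected
            else:
                fn += 1  # minority sample missed
        else:
            if pred == positive:
                fp += 1  # majority sample flagged as minority
            else:
                tn += 1  # majority sample correctly rejected
    # Guard against empty classes so the function never divides by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Toy labels (illustrative only, not data from the paper).
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1]
score = g_mean(y_true, y_pred)
```

Under this definition, a classifier that always predicts the majority class scores 0 regardless of its overall accuracy, which is why the metric is often preferred over accuracy on imbalanced datasets.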