Research Article
HSDP: A Hybrid Sampling Method for Imbalanced Big Data Based on Data Partition
Table 6
G-means values of RF + various sampling methods.
| Dataset | RF | SMOTE + RF | ADASYN + RF | Borderline-SMOTE + RF | HSDP (proposed) + RF |
| --- | --- | --- | --- | --- | --- |
| Pima | 0.6173 | 0.6412 | 0.6315 | 0.6385 | 0.6479 |
| Yeast3 | 0.8243 | 0.8502 | 0.8473 | 0.8362 | 0.8532 |
| Abalone19 | 0.2782 | 0.6352 | 0.6529 | 0.4564 | 0.7001 |
| Segment0 | 0.9344 | 0.9501 | 0.9313 | 0.9421 | 0.9567 |
| Page-blocks0 | 0.8348 | 0.8990 | 0.9078 | 0.8846 | 0.8815 |
| Glass5 | 0.8366 | 0.8201 | 0.8302 | 0.8172 | 0.8401 |
| Ecoli4 | 0.6490 | 0.7442 | 0.7751 | 0.7403 | 0.7239 |
| Haberman | 0.4659 | 0.5686 | 0.5982 | 0.5743 | 0.5991 |
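For readers unfamiliar with the metric in Table 6, the sketch below shows how a G-mean score is commonly computed for a binary problem: the square root of sensitivity (minority-class recall) times specificity (majority-class recall). This is the standard definition, not code from the paper; the function name `g_mean` and the confusion-matrix counts are hypothetical and chosen only for illustration.

```python
import math


def g_mean(tp: int, fn: int, tn: int, fp: int) -> float:
    """Geometric mean of sensitivity and specificity (binary case).

    Assumption: the positive class is the minority class.
    """
    # Recall on the positive (minority) class; 0.0 if that class is absent.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Recall on the negative (majority) class; 0.0 if that class is absent.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Hypothetical counts, not taken from any dataset in Table 6.
score = g_mean(tp=8, fn=2, tn=90, fp=10)

# A classifier that never predicts the minority class scores zero,
# even though its plain accuracy here would be 95%.
majority_only = g_mean(tp=0, fn=5, tn=95, fp=0)
```

Because the two recalls are multiplied, the score drops sharply when the minority class is poorly recognized, which is why the metric is often preferred over plain accuracy on imbalanced data.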