Research Article
HSDP: A Hybrid Sampling Method for Imbalanced Big Data Based on Data Partition
Table 5
F-measure values of RF + various sampling methods.
| Dataset / Algorithm | RF | SMOTE + RF | ADASYN + RF | Borderline-SMOTE + RF | HSDP (proposed) + RF |
|---|---|---|---|---|---|
| Pima | 0.5891 | 0.5625 | 0.5632 | 0.5710 | 0.6000 |
| Yeast3 | 0.7326 | 0.7411 | 0.7214 | 0.7366 | 0.7611 |
| Abalone19 | 0.3054 | 0.3123 | 0.3652 | 0.2864 | 0.3912 |
| Segment0 | 0.9102 | 0.9375 | 0.8865 | 0.9292 | 0.9301 |
| Page-blocks0 | 0.7443 | 0.7745 | 0.7586 | 0.7723 | 0.7438 |
| Glass5 | 0.8148 | 0.7826 | 0.7931 | 0.7725 | 0.8201 |
| Ecoli4 | 0.5701 | 0.5737 | 0.5723 | 0.5802 | 0.5623 |
| Haberman | 0.3402 | 0.4011 | 0.4386 | 0.4154 | 0.4489 |