Research Article

Consensus Clustering-Based Undersampling Approach to Imbalanced Learning

Table 1

Descriptive information for the datasets [12, 24].

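| Dataset | Number of data samples | Number of features | Imbalance ratio |
|---|---|---|---|
| Small-scale datasets | | | |
| Abalone9-18 | 731 | 8 | 16.68 |
| Abalone19 | 4174 | 8 | 128.87 |
| Ecoli-0_vs_1 | 220 | 7 | 1.86 |
| Ecoli-0-1-3-7_vs_2-6 | 281 | 7 | 39.15 |
| Ecoli1 | 336 | 7 | 3.36 |
| Ecoli2 | 336 | 7 | 5.46 |
| Ecoli3 | 336 | 7 | 8.19 |
| Ecoli4 | 336 | 7 | 13.84 |
| Glass0 | 214 | 9 | 3.19 |
| Glass0123vs456 | 192 | 9 | 10.29 |
| Glass016vs2 | 184 | 9 | 19.44 |
| Glass016vs5 | 214 | 9 | 1.82 |
| Glass1 | 214 | 9 | 10.39 |
| Glass2 | 214 | 9 | 15.47 |
| Glass4 | 214 | 9 | 22.81 |
| Glass5 | 214 | 9 | 22.81 |
| Glass6 | 214 | 9 | 6.38 |
| Haberman | 306 | 3 | 2.68 |
| Iris0 | 150 | 4 | 2 |
| New-thyroid1 | 215 | 5 | 5.14 |
| New-thyroid2 | 215 | 5 | 4.92 |
| Page-blocks0 | 5472 | 10 | 8.77 |
| Page-blocks13vs2 | 472 | 10 | 15.85 |
| Pima | 768 | 8 | 1.9 |
| Segment | 2308 | 19 | 6.01 |
| Shuttle0vs4 | 1829 | 9 | 13.87 |
| Shuttle2vs4 | 129 | 9 | 20.5 |
| Vehicle0 | 846 | 18 | 3.23 |
| Vehicle1 | 846 | 18 | 2.52 |
| Vehicle2 | 846 | 18 | 2.52 |
| Vehicle3 | 846 | 18 | 2.52 |
| Vowel0 | 988 | 13 | 10.1 |
| Wisconsin | 683 | 9 | 1.86 |
| Yeast05679vs4 | 528 | 8 | 9.35 |
| Yeast1 | 1484 | 8 | 2.46 |
| Yeast1vs7 | 459 | 8 | 13.87 |
| Yeast1289vs7 | 947 | 8 | 30.56 |
| Yeast1458vs7 | 693 | 8 | 22.1 |
| Yeast2vs4 | 514 | 8 | 9.08 |
| Yeast2vs8 | 482 | 8 | 23.1 |
| Yeast3 | 1484 | 8 | 8.11 |
| Yeast4 | 1484 | 8 | 28.41 |
| Yeast5 | 1484 | 8 | 32.78 |
| Yeast6 | 1484 | 8 | 39.15 |
| Large-scale datasets | | | |
| Breast cancer | 102294 | 117 | 163.19 |
| Protein homology prediction | 145751 | 74 | 111.46 |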
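
The imbalance ratio reported in Table 1 is not defined in this excerpt. A minimal sketch, assuming the common convention that the imbalance ratio is the number of majority-class samples divided by the number of minority-class samples, is given below. The function name `imbalance_ratio` and the label vector are hypothetical and serve only as an illustration; they are not taken from the article.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Return majority-class count divided by minority-class count.

    Assumption: this is the conventional definition of the imbalance
    ratio; the article excerpt does not state its own formula.
    """
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes to compute a ratio")
    return max(counts.values()) / min(counts.values())


# Hypothetical label vector: 150 samples, 100 in the majority class
# and 50 in the minority class.
labels = [0] * 100 + [1] * 50
print(imbalance_ratio(labels))  # 2.0
```

Under this assumption, a dataset with 150 samples and an imbalance ratio of 2, as listed for Iris0, would contain 100 majority-class and 50 minority-class samples.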