Research Article

An Empirical Study on the Performance of Cost-Sensitive Boosting Algorithms with Different Levels of Class Imbalance

Table 1

Summary of the characteristics of the data sets used.

Data set    Samples  Minority/majority class  No. of minority/majority  Imbalance ratio (majority/minority)
LetterA     20000    Class 1/rest             789/19211                 24.35
Cbands      12000    Class 1/rest             500/11500                 23.00
Pendigits   10992    Class 5/rest             1055/9937                 9.42
Satimage    6435     Class 4/rest             626/5809                  9.28
Optdigits   5620     Class 8/rest             554/5066                  9.14
Mfeat_kar   2000     Digit 9/rest             200/1800                  9.00
Mfeat_zer   2000     Digit 9/rest             200/1800                  9.00
Segment     2310     Class 5/rest             330/1980                  6.00
Scrapie     3113     Class 1/class 0          531/2582                  4.86
Vehicle     846      van/rest                 212/634                   2.99
Haberman    306      Class 2/class 1          81/225                    2.78
Yeast       1484     Class 2/rest             429/1055                  2.46
Breast      336      Class 1/class 0          81/196                    2.42
Phoneme     5404     Class 1/class 0          1586/3818                 2.41
German      1000     Class 2/class 1          300/700                   2.33
Pima        768      Class 1/class 0          268/500                   1.87
Spambase    4601     Class 1/class 0          1813/2788                 1.54
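
The imbalance ratio values in Table 1 are consistent with dividing the number of majority samples by the number of minority samples and rounding to two decimals. The following is a minimal sketch under that assumption; the function name `imbalance_ratio` and the `counts` dictionary are hypothetical and introduced only for illustration, with the counts copied from a few rows of the table.

```python
def imbalance_ratio(n_minority: int, n_majority: int) -> float:
    """Return majority/minority, the assumed definition of the imbalance ratio."""
    if n_minority <= 0:
        raise ValueError("the minority class must contain at least one sample")
    return n_majority / n_minority


# (minority count, majority count) pairs copied from Table 1.
counts = {
    "LetterA": (789, 19211),
    "Segment": (330, 1980),
    "Pima": (268, 500),
    "Spambase": (1813, 2788),
}

for name, (n_min, n_maj) in counts.items():
    # Two-decimal formatting matches the precision used in the table.
    print(f"{name}: {imbalance_ratio(n_min, n_maj):.2f}")
```

Under this definition a perfectly balanced data set has a ratio of 1.00, and larger values indicate a rarer minority class.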