Research Article

Improved Feature-Selection Method Considering the Imbalance Problem in Text Categorization

Table 8

Comparison of nine existing feature-selection methods and their improved versions with respect to AUC on WebKB, for naïve Bayes (NB) and support vector machines (SVM). Bold values indicate the best performance of each classifier across the feature-selection methods.

Feature selection  Naïve Bayes                             Support vector machines
                   400     800     1200    1600    2000    400     800     1200    1600    2000

IG                 0.8051  0.8253  0.8369  0.8399  0.8442  0.8933  0.9044  0.9093  0.9128  0.9120
IGX                0.8055  0.8242  0.8357  0.8398  0.8444  0.8973  0.9080  0.9134  0.9153  0.9151

CHI                0.8214  0.8312  0.8365  0.8402  0.8423  0.9090  0.9114  0.9095  0.9131  0.9133
CHIX               0.8345  0.8416  0.8478  0.8521  0.8520  0.9130  0.9137  0.9158  0.9191  0.9170

MI                 0.5009  0.5030  0.5113  0.5419  0.6082  0.6357  0.6618  0.7137  0.7478  0.7620
MIX                0.5237  0.5505  0.5925  0.6359  0.6593  0.5681  0.6359  0.6653  0.7024  0.7310

DF                 0.8041  0.8242  0.8344  0.8404  0.8422  0.8952  0.9078  0.9157  0.9140  0.9143
DFX                0.8051  0.8262  0.8382  0.8432  0.8439  0.8981  0.9103  0.9181  0.9174  0.9168

GINI               0.8177  0.8358  0.8463  0.8476  0.8492  0.9105  0.9144  0.9171  0.9155  0.9138
GINIX              0.8229  0.8420  0.8480  0.8534  0.8533  0.9123  0.9179  0.9229  0.9184  0.9174

DIA                0.6252  0.6372  0.6058  0.6192  0.6465  0.7867  0.8151  0.8420  0.8575  0.8696
DIAX               0.5700  0.5931  0.6111  0.6279  0.6511  0.5791  0.6142  0.6473  0.6842  0.7052

CMFS               0.8132  0.8216  0.8354  0.8397  0.8430  0.9067  0.9098  0.9123  0.9181  0.9152
CMFSX              0.8183  0.8351  0.8445  0.8475  0.8513  0.9079  0.9123  0.9165  0.9185  0.9190

OCFS               0.8189  0.8349  0.8436  0.8453  0.8490  0.9106  0.9041  0.9119  0.9160  0.9158
OCFSX              0.8244  0.8401  0.8480  0.8488  0.8508  0.9151  0.9112  0.9141  0.9168  0.9191

DFPFS              0.7504  0.7507  0.7511  0.7517  0.7525  0.8735  0.8750  0.8743  0.8741  0.8729
DFPFSX             0.7723  0.7770  0.7772  0.7782  0.7786  0.8815  0.8824  0.8819  0.8806  0.8809