Research Article

Improved Feature-Selection Method Considering the Imbalance Problem in Text Categorization

Table 7

Comparison of nine existing feature-selection methods and their improved versions (suffix X) in terms of the micro-F1 measure (%) on WebKB, using the naïve Bayes (NB) and support vector machine (SVM) classifiers, for 400 to 2000 selected features. Bold values indicate the best performance of each classifier across the feature-selection methods.

        Naïve Bayes                          Support vector machines
Method  400    800    1200   1600   2000     400    800    1200   1600   2000

IG      71.36  74.17  75.84  76.61  77.46    83.52  85.66  86.10  86.80  86.77
IGX     71.64  73.96  75.64  76.57  77.48    84.10  86.29  86.76  87.27  87.33

CHI     71.92  74.16  74.65  76.19  76.93    85.88  86.52  86.08  86.89  86.89
CHIX    74.98  76.28  77.39  78.57  78.83    86.69  86.97  87.26  87.63  87.34

MI      33.96  34.63  36.32  40.71  49.58    44.96  48.60  57.23  61.75  64.44
MIX     33.69  38.31  44.49  51.00  54.30    38.37  46.86  50.56  56.63  60.85

DF      71.04  73.74  75.82  76.87  77.23    83.84  85.87  87.27  87.04  87.05
DFX     71.93  74.48  76.57  77.52  77.55    84.51  86.28  87.73  87.64  87.59

GINI    72.13  75.58  77.68  77.84  78.42    86.17  86.81  87.09  87.13  86.90
GINIX   73.75  76.96  78.06  78.80  78.97    86.61  87.52  88.20  87.56  87.44

DIA     47.75  48.40  46.06  48.77  51.00    62.66  69.13  74.84  76.95  79.03
DIAX    50.11  53.23  60.99  63.86  63.68    56.42  63.42  69.97  71.73  71.36

CMFS    72.14  73.22  75.62  76.81  77.33    85.64  86.06  86.72  87.63  87.17
CMFSX   73.37  75.89  77.63  78.06  78.70    86.04  86.66  87.29  87.54  87.75

OCFS    73.07  75.23  76.46  77.09  78.02    84.39  85.64  86.49  86.84  87.04
OCFSX   73.79  76.52  78.05  78.27  78.53    87.19  86.86  86.85  87.44  87.78

DFPFS   65.17  64.74  64.65  64.66  65.33    80.69  81.46  81.27  81.26  80.28
DFPFSX  67.96  68.81  68.37  68.58  68.83    81.83  82.08  82.11  82.00  82.01
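Micro-averaged F1, the measure reported in Table 7, pools true positives, false positives and false negatives over all categories before precision and recall are computed, so frequent categories weigh more than rare ones. The Python function below is a minimal sketch of that computation, assuming single-label classification (each document has exactly one true and one predicted category). The function name `micro_f1` and the toy labels are hypothetical illustrations, not the paper's code or data.

```python
from collections import Counter


def micro_f1(y_true, y_pred):
    """Micro-averaged F1 (in percent) for single-label, multi-class predictions.

    Hypothetical helper for illustration; assumes one true and one
    predicted label per document.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Per-category counts of true positives, false positives and false negatives.
    tp, fp, fn = Counter(), Counter(), Counter()
    for true_label, pred_label in zip(y_true, y_pred):
        if true_label == pred_label:
            tp[true_label] += 1
        else:
            fp[pred_label] += 1  # the predicted category gains a false positive
            fn[true_label] += 1  # the true category gains a false negative

    # Micro-averaging: pool the counts over all categories first.
    total_tp = sum(tp.values())
    total_fp = sum(fp.values())
    total_fn = sum(fn.values())

    precision = total_tp / (total_tp + total_fp) if total_tp + total_fp else 0.0
    recall = total_tp / (total_tp + total_fn) if total_tp + total_fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 100.0 * 2 * precision * recall / (precision + recall)


# Toy labels for illustration only (not data from the paper).
y_true = ["course", "faculty", "student", "student", "project"]
y_pred = ["course", "student", "student", "faculty", "project"]
print(round(micro_f1(y_true, y_pred), 2))  # 60.0
```

With exactly one true and one predicted label per document, every error adds one false positive and one false negative, so micro-precision, micro-recall and micro-F1 all equal accuracy; here 3 of 5 documents are correct, giving 60.0.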