Research Article

Improved Feature-Selection Method Considering the Imbalance Problem in Text Categorization

Table 3

Comparison of nine improved feature-selection methods and their existing counterparts in terms of the micro-F1 measure on 20-Newsgroups, using the Naïve Bayes (NB) and support vector machine (SVM) classifiers. Bold values indicate the best performance of each classifier across the feature-selection methods.

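| Method | NB 400 | NB 800 | NB 1200 | NB 1600 | NB 2000 | SVM 400 | SVM 800 | SVM 1200 | SVM 1600 | SVM 2000 |
|---|---|---|---|---|---|---|---|---|---|---|
| IG | 49.55 | 57.76 | 62.88 | 65.99 | 68.34 | 58.15 | 62.53 | 65.17 | 67.28 | 68.45 |
| IGX | 49.54 | 57.76 | 62.85 | 65.92 | 68.33 | 59.01 | 63.51 | 66.06 | 67.91 | 69.32 |
| CHI | 67.29 | 73.26 | 75.46 | 76.58 | 77.61 | 73.67 | 76.07 | 76.26 | 76.18 | 75.60 |
| CHIX | 69.48 | 74.66 | 76.39 | 77.46 | 78.35 | 75.14 | 77.09 | 77.22 | 77.27 | 76.96 |
| MI | 23.13 | 33.80 | 51.88 | 55.47 | 60.50 | 27.46 | 38.24 | 54.21 | 57.02 | 60.62 |
| MIX | 27.39 | 39.06 | 53.78 | 59.97 | 63.87 | 29.17 | 41.18 | 56.71 | 61.61 | 64.19 |
| DF | 59.23 | 66.37 | 70.32 | 72.78 | 74.07 | 63.57 | 66.69 | 69.57 | 71.23 | 72.06 |
| DFX | 60.88 | 67.35 | 70.97 | 73.62 | 74.94 | 65.71 | 68.17 | 70.37 | 71.92 | 72.97 |
| GINI | 73.79 | 76.68 | 77.83 | 78.21 | 78.74 | 77.13 | 75.58 | 74.50 | 73.99 | 74.26 |
| GINIX | 74.45 | 77.17 | 78.22 | 78.94 | 79.18 | 77.13 | 76.75 | 75.68 | 74.97 | 75.16 |
| DIA | 15.45 | 22.97 | 25.70 | 28.51 | 30.96 | 33.53 | 42.42 | 46.56 | 50.22 | 52.19 |
| DIAX | 47.90 | 59.59 | 66.08 | 73.45 | 76.23 | 59.57 | 67.94 | 72.12 | 76.22 | 77.80 |
| CMFS | 73.05 | 75.68 | 77.13 | 78.57 | 79.17 | 76.91 | 77.50 | 76.80 | 76.20 | 75.75 |
| CMFSX | 74.24 | 76.84 | 78.36 | 79.56 | 80.07 | 77.37 | 77.96 | 77.55 | 77.24 | 77.00 |
| OCFS | 46.50 | 55.37 | 60.64 | 64.72 | 66.87 | 54.87 | 60.30 | 63.59 | 66.36 | 67.61 |
| OCFSX | 55.49 | 62.33 | 66.25 | 68.36 | 69.89 | 63.08 | 66.51 | 68.60 | 69.45 | 70.21 |
| DFPFS | 67.36 | 71.41 | 73.11 | 74.14 | 74.77 | 69.97 | 71.21 | 72.00 | 72.50 | 72.92 |
| DFPFSX | 56.74 | 61.90 | 64.15 | 64.94 | 65.40 | 62.15 | 65.11 | 66.22 | 66.91 | 67.31 |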
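
For readers unfamiliar with the measure reported in this table, the following is a minimal sketch of how a micro-averaged F1 score can be computed, assuming the measure in the caption is the standard micro-F1. The function name `micro_f1`, the class labels, and the counts are hypothetical illustrations and are not taken from the article or its experiments.

```python
from typing import Dict, Tuple


def micro_f1(counts: Dict[str, Tuple[int, int, int]]) -> float:
    """Micro-averaged F1 from per-class (true positive, false positive, false negative) counts.

    Micro-averaging pools the counts over all classes first and then
    computes precision, recall, and F1 once on the pooled totals.
    """
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    if tp == 0:
        # No correct positive predictions: define the score as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for two classes (not data from the article).
example = {"class_a": (8, 2, 1), "class_b": (5, 1, 3)}
score = micro_f1(example)
print(f"micro-F1 = {100 * score:.2f}")
```

Because the counts are pooled before averaging, classes with many documents weigh more heavily in this score than rare classes, which is worth keeping in mind when reading results on imbalanced collections.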