Research Article

Improved Feature-Selection Method Considering the Imbalance Problem in Text Categorization

Table 5

Comparison of the nine improved feature-selection methods and their existing counterparts with respect to the micro-F1 measure on Reuters-21578 for NB and SVM, respectively. Bold values indicate the best performance of the classifier across the various feature-selection methods.

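| Feature selection | Naïve Bayes, 400 | Naïve Bayes, 800 | Naïve Bayes, 1200 | Naïve Bayes, 1600 | Naïve Bayes, 2000 | Support vector machines, 400 | Support vector machines, 800 | Support vector machines, 1200 | Support vector machines, 1600 | Support vector machines, 2000 |
|---|---|---|---|---|---|---|---|---|---|---|
| IG | 62.09 | 64.60 | 64.76 | 65.11 | 65.22 | 61.31 | 62.19 | 62.66 | 62.59 | 62.88 |
| IGX | 62.07 | 64.61 | 64.77 | 65.10 | 65.22 | 64.82 | 65.89 | 66.56 | 66.66 | 67.01 |
| CHI | 62.89 | 64.02 | 64.96 | 64.92 | 65.35 | 62.85 | 63.12 | 62.96 | 62.93 | 62.73 |
| CHIX | 64.66 | 65.70 | 66.00 | 66.07 | 65.94 | 67.36 | 67.29 | 67.28 | 67.13 | 66.82 |
| MI | 34.65 | 54.87 | 59.71 | 61.58 | 62.77 | 43.02 | 51.67 | 56.94 | 58.98 | 60.11 |
| MIX | 39.55 | 51.46 | 59.64 | 61.84 | 63.43 | 39.47 | 51.35 | 61.41 | 63.44 | 64.59 |
| DF | 62.41 | 64.09 | 64.99 | 65.36 | 65.51 | 61.07 | 62.75 | 62.69 | 62.70 | 62.82 |
| DFX | 63.99 | 65.38 | 65.65 | 65.92 | 65.94 | 65.90 | 66.65 | 67.05 | 67.01 | 67.58 |
| GINI | 65.13 | 65.82 | 66.64 | 66.16 | 65.78 | 63.42 | 63.17 | 63.07 | 62.95 | 62.71 |
| GINIX | 66.43 | 66.45 | 66.39 | 66.54 | 66.49 | 67.44 | 67.18 | 66.99 | 67.11 | 67.20 |
| DIA | 30.10 | 30.82 | 31.49 | 32.85 | 37.60 | 45.06 | 48.87 | 51.05 | 51.85 | 53.04 |
| DIAX | 48.11 | 57.71 | 63.69 | 64.23 | 64.95 | 49.79 | 59.15 | 65.73 | 66.39 | 66.96 |
| CMFS | 66.03 | 66.52 | 66.38 | 66.66 | 66.64 | 63.73 | 63.52 | 63.14 | 63.06 | 62.97 |
| CMFSX | 66.94 | 67.21 | 66.87 | 66.84 | 66.59 | 67.48 | 67.41 | 67.10 | 67.27 | 67.45 |
| OCFS | 60.90 | 63.43 | 64.41 | 64.39 | 64.96 | 60.28 | 61.49 | 62.69 | 62.52 | 62.69 |
| OCFSX | 63.91 | 65.62 | 65.56 | 65.65 | 65.85 | 65.82 | 66.11 | 66.19 | 66.55 | 66.92 |
| DFPFS | 57.20 | 57.04 | 57.03 | 57.05 | 57.03 | 61.56 | 61.39 | 61.27 | 61.44 | 61.17 |
| DFPFSX | 62.55 | 63.71 | 63.67 | 63.40 | 63.40 | 64.92 | 65.42 | 65.37 | 65.46 | 65.53 |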
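
For readers who want to reproduce scores of this kind, the following is a minimal sketch of a micro-averaged F1 computation, assuming the measure reported in the table is the standard micro-F1 (true positives, false positives, and false negatives pooled over all categories before precision and recall are computed). The function name `micro_f1` and the `(tp, fp, fn)` input layout are illustrative assumptions, not taken from the article, and the result is a fraction in [0, 1] rather than a percentage.

```python
from typing import Iterable, Tuple


def micro_f1(counts: Iterable[Tuple[int, int, int]]) -> float:
    """Micro-averaged F1 from per-category (tp, fp, fn) counts.

    Hypothetical helper: the input layout is an assumption for illustration.
    """
    tp = fp = fn = 0
    # Pool the counts over all categories (this is what "micro" means).
    for c_tp, c_fp, c_fn in counts:
        tp += c_tp
        fp += c_fp
        fn += c_fn
    if tp == 0:
        # No correct positive predictions: define the score as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Because the counts are pooled, categories with many documents dominate the score, which is why micro-averaged results are usually read alongside a macro-averaged measure on imbalanced corpora.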