Research Article

Improved Feature-Selection Method Considering the Imbalance Problem in Text Categorization

Table 9

Accuracy comparison of ECE with nine feature-selection algorithms when the NB classifier is used on 20-Newsgroups. The numbers in parentheses give the accuracy of the corresponding feature-selection algorithm minus that of ECE.

Feature selections   400              800              1200             1600             2000

ECE                  70.81 (—)        74.54 (—)        75.51 (—)        76.73 (—)        77.16 (—)
CHI                  66.44 (−4.37)    72.36 (−2.18)    74.54 (−0.97)    75.62 (−1.11)    76.66 (−0.50)
DF                   56.01 (−14.80)   63.02 (−11.52)   67.74 (−7.77)    70.64 (−6.09)    72.10 (−5.06)
GINI                 72.81 (+2.00)    75.79 (+1.25)    76.78 (+1.27)    77.22 (+0.49)    77.76 (+0.60)
IG                   46.47 (−24.34)   53.99 (−20.55)   59.42 (−16.09)   62.92 (−13.81)   65.55 (−11.61)
MI                   20.77 (−50.04)   29.38 (−45.16)   46.90 (−28.61)   50.70 (−26.03)   56.21 (−20.95)
DIA                  15.86 (−54.95)   24.76 (−49.78)   26.02 (−49.49)   27.14 (−49.59)   28.64 (−48.52)
CMFS                 72.19 (+1.38)    74.76 (+0.22)    76.20 (+0.69)    77.72 (+0.99)    78.43 (+1.27)
OCFS                 43.10 (−27.71)   51.05 (−23.49)   56.82 (−18.69)   61.41 (−15.32)   63.87 (−13.29)
DFPFS                65.33 (−5.48)    69.56 (−4.98)    71.48 (−4.03)    72.66 (−4.07)    73.39 (−3.77)
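
The parenthesized values in Table 9 can be recomputed from the accuracies themselves. The following is a minimal sketch of that arithmetic, not code from the article: the accuracy values are transcribed from the table, while the names `ACCURACY`, `FEATURE_COUNTS`, `differences_from_baseline`, and `methods_beating_baseline` are hypothetical and introduced only for illustration.

```python
# Accuracy values transcribed from Table 9 (NB on 20-Newsgroups).
# All identifiers below are illustrative assumptions, not from the article.
FEATURE_COUNTS = [400, 800, 1200, 1600, 2000]

ACCURACY = {
    "ECE":   [70.81, 74.54, 75.51, 76.73, 77.16],
    "CHI":   [66.44, 72.36, 74.54, 75.62, 76.66],
    "DF":    [56.01, 63.02, 67.74, 70.64, 72.10],
    "GINI":  [72.81, 75.79, 76.78, 77.22, 77.76],
    "IG":    [46.47, 53.99, 59.42, 62.92, 65.55],
    "MI":    [20.77, 29.38, 46.90, 50.70, 56.21],
    "DIA":   [15.86, 24.76, 26.02, 27.14, 28.64],
    "CMFS":  [72.19, 74.76, 76.20, 77.72, 78.43],
    "OCFS":  [43.10, 51.05, 56.82, 61.41, 63.87],
    "DFPFS": [65.33, 69.56, 71.48, 72.66, 73.39],
}


def differences_from_baseline(accuracy, baseline="ECE"):
    """Return each method's accuracy minus the baseline's, per column."""
    base = accuracy[baseline]
    return {
        method: [round(a - b, 2) for a, b in zip(values, base)]
        for method, values in accuracy.items()
        if method != baseline
    }


def methods_beating_baseline(diffs):
    """Names of methods whose difference is positive in every column."""
    return sorted(m for m, d in diffs.items() if all(x > 0 for x in d))


if __name__ == "__main__":
    diffs = differences_from_baseline(ACCURACY)
    print("method", *FEATURE_COUNTS)
    for method, d in diffs.items():
        # Signed, two-decimal formatting mirrors the table's parentheses.
        print(method, " ".join(f"{x:+.2f}" for x in d))
    print("Above ECE in every column:", methods_beating_baseline(diffs))
```

Run on the values above, the recomputed differences agree with the parenthesized entries in the table, and only GINI and CMFS exceed ECE in every column.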