The Scientific World Journal, 2014, Research Article
Improved Feature-Selection Method Considering the Imbalance Problem in Text Categorization

Table 4: Comparison of nine improved and existing feature-selection methods with respect to AUC on 20-Newsgroups for Naïve Bayes (NB) and support vector machines (SVM). Bold values indicate the best performance of each classifier across the feature-selection methods.
| Feature selection | NB 400 | NB 800 | NB 1200 | NB 1600 | NB 2000 | SVM 400 | SVM 800 | SVM 1200 | SVM 1600 | SVM 2000 |
|---|---|---|---|---|---|---|---|---|---|---|
| IG | 0.7183 | 0.7579 | 0.7864 | 0.8048 | 0.8187 | 0.7746 | 0.7998 | 0.8144 | 0.8257 | 0.8318 |
| IGX | 0.7182 | 0.7578 | 0.7863 | 0.8045 | 0.8187 | 0.7774 | 0.8038 | 0.8183 | 0.8284 | 0.8361 |
| CHI | 0.8234 | 0.8545 | 0.8660 | 0.8717 | 0.8771 | 0.8485 | 0.8671 | 0.8702 | 0.8707 | 0.8690 |
| CHIX | 0.8346 | 0.8621 | 0.8710 | 0.8770 | 0.8815 | 0.8589 | 0.8726 | 0.8747 | 0.8763 | 0.8753 |
| MI | 0.5830 | 0.6283 | 0.7205 | 0.7405 | 0.7695 | 0.6165 | 0.6733 | 0.7569 | 0.7721 | 0.7905 |
| MIX | 0.6020 | 0.6597 | 0.7315 | 0.7677 | 0.7902 | 0.6269 | 0.6894 | 0.7700 | 0.7961 | 0.8089 |
| DF | 0.7685 | 0.8054 | 0.8302 | 0.8454 | 0.8531 | 0.8043 | 0.8224 | 0.8380 | 0.8464 | 0.8509 |
| DFX | 0.7763 | 0.8111 | 0.8338 | 0.8501 | 0.8585 | 0.8146 | 0.8297 | 0.8416 | 0.8500 | 0.8557 |
| GINI | 0.8569 | 0.8726 | 0.8778 | 0.8801 | 0.8829 | 0.8711 | 0.8678 | 0.8631 | 0.8609 | 0.8625 |
| GINIX | 0.8603 | 0.8749 | 0.8800 | 0.8843 | 0.8858 | 0.8720 | 0.8735 | 0.8690 | 0.8660 | 0.8672 |
| DIA | 0.5571 | 0.6040 | 0.6107 | 0.6165 | 0.6245 | 0.6331 | 0.6838 | 0.7086 | 0.7302 | 0.7419 |
| DIAX | 0.7429 | 0.7933 | 0.8219 | 0.8577 | 0.8728 | 0.7519 | 0.8038 | 0.8324 | 0.8585 | 0.8690 |
| CMFS | 0.8536 | 0.8671 | 0.8748 | 0.8828 | 0.8865 | 0.8689 | 0.8769 | 0.8747 | 0.8723 | 0.8705 |
| CMFSX | 0.8598 | 0.8736 | 0.8821 | 0.8887 | 0.8915 | 0.8722 | 0.8798 | 0.8788 | 0.8776 | 0.8766 |
| OCFS | 0.7083 | 0.7375 | 0.7892 | 0.8017 | 0.8113 | 0.7962 | 0.8164 | 0.8260 | 0.8301 | 0.8351 |
| OCFSX | 0.7550 | 0.7882 | 0.8091 | 0.8205 | 0.8290 | 0.7984 | 0.8193 | 0.8313 | 0.8365 | 0.8408 |
| DFPFS | 0.8175 | 0.8398 | 0.8499 | 0.8561 | 0.8600 | 0.8377 | 0.8446 | 0.8492 | 0.8521 | 0.8543 |
| DFPFSX | 0.7591 | 0.7873 | 0.8000 | 0.8056 | 0.8086 | 0.7952 | 0.8119 | 0.8181 | 0.8218 | 0.8240 |
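To make the paired comparison in Table 4 easier to read, the following Python sketch computes the AUC difference between each existing method and its counterpart carrying the "X" suffix, using only the Naïve Bayes values from the 2000 column. It assumes that the "X" suffix marks the improved variant and that the column headers 400 to 2000 denote feature-set sizes; neither is stated in the table itself. The names `nb_auc_2000` and `auc_gain` are hypothetical and chosen for illustration only.

```python
# NB AUC values from the 2000 column of Table 4.
# Each entry is (existing method, "X" variant); treating the "X" variant
# as the improved method is an assumption based on the caption.
nb_auc_2000 = {
    "IG": (0.8187, 0.8187),
    "CHI": (0.8771, 0.8815),
    "MI": (0.7695, 0.7902),
    "DF": (0.8531, 0.8585),
    "GINI": (0.8829, 0.8858),
    "DIA": (0.6245, 0.8728),
    "CMFS": (0.8865, 0.8915),
    "OCFS": (0.8113, 0.8290),
    "DFPFS": (0.8600, 0.8086),
}


def auc_gain(pairs):
    """Return the AUC difference (X variant minus existing) per method, rounded to 4 places."""
    return {
        name: round(improved - existing, 4)
        for name, (existing, improved) in pairs.items()
    }


gains = auc_gain(nb_auc_2000)

# List the methods from largest to smallest difference.
for name, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:6s} {gain:+.4f}")
```

The same dictionary layout could be filled with any other column of the table (for example the SVM values) to repeat the comparison for a different classifier or feature-set size.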