The Scientific World Journal, 2014, Research Article
Improved Feature-Selection Method Considering the Imbalance Problem in Text Categorization

Table 6: Comparison of nine improved and existing feature-selection methods with respect to AUC on Reuters-21578 for Naïve Bayes (NB) and support vector machines (SVM). The bold values indicate the best performance of each classifier when the various feature-selection methods are used.
| Feature selection | NB 400 | NB 800 | NB 1200 | NB 1600 | NB 2000 | SVM 400 | SVM 800 | SVM 1200 | SVM 1600 | SVM 2000 |
|---|---|---|---|---|---|---|---|---|---|---|
| IG | 0.8978 | 0.9068 | 0.9073 | 0.9088 | 0.9093 | 0.9005 | 0.9050 | 0.9065 | 0.9079 | 0.9086 |
| IGX | 0.8977 | 0.9068 | 0.9073 | 0.9087 | 0.9093 | 0.9083 | 0.9124 | 0.9143 | 0.9159 | 0.9168 |
| CHI | 0.8988 | 0.9055 | 0.9093 | 0.9091 | 0.9112 | 0.9053 | 0.9071 | 0.9066 | 0.9070 | 0.9079 |
| CHIX | 0.8864 | 0.9058 | 0.9074 | 0.9101 | 0.9116 | 0.9090 | 0.9126 | 0.9143 | 0.9144 | 0.9138 |
| MI | 0.6923 | 0.8521 | 0.8799 | 0.8902 | 0.8979 | 0.7957 | 0.8541 | 0.8810 | 0.8904 | 0.8974 |
| MIX | 0.7282 | 0.8256 | 0.8815 | 0.8928 | 0.9007 | 0.7357 | 0.8341 | 0.8897 | 0.8970 | 0.9070 |
| DF | 0.8977 | 0.9075 | 0.9098 | 0.9107 | 0.9107 | 0.9012 | 0.9060 | 0.9091 | 0.9086 | 0.9090 |
| DFX | 0.9008 | 0.9091 | 0.9105 | 0.9111 | 0.9116 | 0.9103 | 0.9138 | 0.9159 | 0.9169 | 0.9167 |
| GINI | 0.9082 | 0.9119 | 0.9135 | 0.9138 | 0.9123 | 0.9083 | 0.9068 | 0.9075 | 0.9088 | 0.9083 |
| GINIX | 0.9112 | 0.9129 | 0.9139 | 0.9135 | 0.9135 | 0.9165 | 0.9147 | 0.9152 | 0.9157 | 0.9164 |
| DIA | 0.7005 | 0.7146 | 0.7249 | 0.7334 | 0.7565 | 0.8501 | 0.8673 | 0.8750 | 0.8790 | 0.8811 |
| DIAX | 0.7884 | 0.8654 | 0.9040 | 0.9066 | 0.9088 | 0.7954 | 0.8778 | 0.9124 | 0.9143 | 0.9159 |
| CMFS | 0.9109 | 0.9141 | 0.9133 | 0.9139 | 0.9134 | 0.9095 | 0.9094 | 0.9084 | 0.9087 | 0.9094 |
| CMFSX | 0.9071 | 0.9135 | 0.9143 | 0.9147 | 0.9144 | 0.9150 | 0.9171 | 0.9153 | 0.9156 | 0.9162 |
| OCFS | 0.8983 | 0.9074 | 0.9088 | 0.9092 | 0.9104 | 0.9027 | 0.9029 | 0.9046 | 0.9057 | 0.9075 |
| OCFSX | 0.8914 | 0.9065 | 0.9091 | 0.9095 | 0.9102 | 0.9068 | 0.9094 | 0.9111 | 0.9127 | 0.9136 |
| DFPFS | 0.8159 | 0.8157 | 0.8159 | 0.8161 | 0.8162 | 0.8875 | 0.8869 | 0.8865 | 0.8871 | 0.8863 |
| DFPFSX | 0.8828 | 0.8882 | 0.8884 | 0.8880 | 0.8883 | 0.9032 | 0.9066 | 0.9064 | 0.9062 | 0.9067 |