Research Article
Distance Variance Score: An Efficient Feature Selection Method in Text Classification
Table 2
Accuracy of classifiers (unit for mean and std: %).
(a) DBWorld data set

| | LS, DT(C5.0): Mean std | LS, DT(C5.0): Sig. | LS, SVM: Mean std | LS, SVM: Sig. | DVS, DT(C5.0): Mean std | DVS, DT(C5.0): Sig. | DVS, SVM: Mean std | DVS, SVM: Sig. |
|---|---|---|---|---|---|---|---|---|
| 3721 | 91.30 7.54 | — | 85.73 6.73 | — | 91.30 7.54 | — | 85.73 6.73 | — |
| 2721 | 82.64 8.17 | .004 | 84.14 7.06 | .505 | 89.50 6.59 | .736 | 85.22 6.10 | .807 |
| 1721 | 70.27 8.22 | .000 | 80.73 6.68 | .023 | 89.61 5.57 | .779 | 87.02 8.92 | .291 |
| 721 | 58.53 10.52 | .000 | 63.97 11.09 | .000 | 91.74 6.07 | .433 | 84.15 8.67 | .129 |
| 521 | 56.49 6.94 | .000 | 62.66 9.16 | .000 | 89.07 7.08 | .185 | 85.10 7.57 | .584 |
| 321 | 53.58 10.79 | .000 | 57.75 8.83 | .000 | 92.16 4.58 | .454 | 83.55 7.19 | .192 |
| 121 | 52.44 10.34 | .000 | 51.92 10.91 | .000 | 92.74 6.61 | .179 | 90.56 6.16 | .098 |
| 21 | 58.03 6.73 | .000 | 57.38 9.53 | .000 | 88.26 7.68 | .332 | 91.40 6.41 | .722 |

(b) CNAE data set

| | LS, DT(C5.0): Mean std | LS, DT(C5.0): Sig. | LS, SVM: Mean std | LS, SVM: Sig. | DVS, DT(C5.0): Mean std | DVS, DT(C5.0): Sig. | DVS, SVM: Mean std | DVS, SVM: Sig. |
|---|---|---|---|---|---|---|---|---|
| 856 | 87.59 2.25 | — | 92.93 1.58 | — | 87.59 2.25 | — | 92.93 1.58 | — |
| 656 | 86.67 2.21 | .910 | 82.19 4.52 | .000 | 87.64 1.77 | .924 | 83.41 2.02 | .000 |
| 456 | 86.87 2.62 | .394 | 84.42 2.26 | .000 | 88.25 1.86 | .338 | 85.18 2.73 | .000 |
| 256 | 73.11 2.77 | .000 | 66.53 2.09 | .000 | 87.93 2.31 | .700 | 84.53 2.12 | .000 |
| 156 | 17.60 1.48 | .000 | 15.08 2.48 | .000 | 86.54 2.46 | .146 | 82.77 2.33 | .000 |
| 56 | 8.86 0.93 | .000 | 5.33 0.95 | .000 | 81.99 1.90 | .000 | 77.47 2.33 | .000 |