Research Article

Unbiased Feature Selection in Learning Random Forests for High-Dimensional Data

Table 5

Test accuracy (%) of random forest models, GRRF(0.1), varSelRF, and LASSO logistic regression on gene datasets. Values are averages over 100 repetitions; higher is better. The last two columns give the number of genes in the strong group and the weak group used in xRF.

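| Dataset | xRF | RF | wsRF | GRRF | varSelRF | LASSO | Strong-group genes (xRF) | Weak-group genes (xRF) |
|---|---|---|---|---|---|---|---|---|
| colon | 87.65 | 84.35 | 84.50 | 86.45 | 76.80 | 82.00 | 245 | 317 |
| srbct | 97.71 | 95.90 | 96.76 | 97.57 | 96.50 | 99.30 | 606 | 546 |
| Leukemia | 89.25 | 82.58 | 84.83 | 87.25 | 89.30 | 92.40 | 502 | 200 |
| Lymphoma | 99.30 | 97.15 | 98.10 | 99.10 | 97.80 | 99.10 | 1404 | 275 |
| breast.2.class | 78.84 | 62.72 | 63.40 | 71.32 | 61.40 | 63.40 | 194 | 631 |
| breast.3.class | 65.42 | 56.00 | 57.19 | 63.55 | 58.20 | 60.00 | 724 | 533 |
| nci | 74.15 | 58.85 | 59.40 | 63.05 | 58.20 | 60.40 | 247 | 1345 |
| Brain | 81.93 | 70.79 | 70.79 | 74.79 | 76.90 | 74.10 | 1270 | 1219 |
| Prostate | 92.56 | 88.71 | 90.79 | 92.85 | 91.50 | 91.20 | 601 | 323 |
| Adenocarcinoma | 90.88 | 84.04 | 84.12 | 85.52 | 78.80 | 81.10 | 108 | 669 |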