The Scientific World Journal

Research Article

Unbiased Feature Selection in Learning Random Forests for High-Dimensional Data

Table 5

Test accuracy (%) of the random forest models, GRRF(0.1), varSelRF, and LASSO logistic regression on the gene datasets. Each value is the average over 100 repetitions; higher values are better. The last two columns give the number of genes in the strong group and the weak group used by xRF.


Dataset	xRF	RF	wsRF	GRRF	varSelRF	LASSO	Strong genes (xRF)	Weak genes (xRF)

colon	87.65	84.35	84.50	86.45	76.80	82.00	245	317
srbct	97.71	95.90	96.76	97.57	96.50	99.30	606	546
Leukemia	89.25	82.58	84.83	87.25	89.30	92.40	502	200
Lymphoma	99.30	97.15	98.10	99.10	97.80	99.10	1404	275
breast.2.class	78.84	62.72	63.40	71.32	61.40	63.40	194	631
breast.3.class	65.42	56.00	57.19	63.55	58.20	60.00	724	533
nci	74.15	58.85	59.40	63.05	58.20	60.40	247	1345
Brain	81.93	70.79	70.79	74.79	76.90	74.10	1270	1219
Prostate	92.56	88.71	90.79	92.85	91.50	91.20	601	323
Adenocarcinoma	90.88	84.04	84.12	85.52	78.80	81.10	108	669
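
The comparison in Table 5 can be summarized by counting, for each method, the datasets on which it reaches the highest test accuracy. The following is a minimal Python sketch of that tally, not part of the original study; the names `METHODS`, `ACCURACY`, and `best_method` are hypothetical and introduced only for illustration, and the values are copied from the table.

```python
# Test accuracies (%) copied from Table 5; the order of values in each row
# follows the method columns of the table.
METHODS = ["xRF", "RF", "wsRF", "GRRF", "varSelRF", "LASSO"]
ACCURACY = {
    "colon":          [87.65, 84.35, 84.50, 86.45, 76.80, 82.00],
    "srbct":          [97.71, 95.90, 96.76, 97.57, 96.50, 99.30],
    "Leukemia":       [89.25, 82.58, 84.83, 87.25, 89.30, 92.40],
    "Lymphoma":       [99.30, 97.15, 98.10, 99.10, 97.80, 99.10],
    "breast.2.class": [78.84, 62.72, 63.40, 71.32, 61.40, 63.40],
    "breast.3.class": [65.42, 56.00, 57.19, 63.55, 58.20, 60.00],
    "nci":            [74.15, 58.85, 59.40, 63.05, 58.20, 60.40],
    "Brain":          [81.93, 70.79, 70.79, 74.79, 76.90, 74.10],
    "Prostate":       [92.56, 88.71, 90.79, 92.85, 91.50, 91.20],
    "Adenocarcinoma": [90.88, 84.04, 84.12, 85.52, 78.80, 81.10],
}


def best_method(scores):
    """Return the method with the highest accuracy in one table row.

    If two methods tie for the top score, the one listed first wins.
    """
    best_index = max(range(len(METHODS)), key=lambda i: scores[i])
    return METHODS[best_index]


# Count how often each method is the top performer across the datasets.
wins = {method: 0 for method in METHODS}
for dataset, scores in ACCURACY.items():
    wins[best_method(scores)] += 1

for method, count in wins.items():
    print(f"{method}: best on {count} of {len(ACCURACY)} datasets")
```

On the values in Table 5, this tally gives xRF the highest accuracy on seven of the ten datasets, LASSO on two (srbct and Leukemia), and GRRF on one (Prostate).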