Research Article

Multiple Suboptimal Solutions for Prediction Rules in Gene Expression Data

Table 1

A taxonomy of feature selection techniques, as summarized by Saeys et al. [8]. Three major types of feature selection are addressed: filter, wrapper, and embedded. The filter and wrapper types are each divided into subcategories. For every entry, the advantages, disadvantages, and example methods are shown.

Model search: Filter (univariate)
Advantages: Fast; scalable; independent of the classifier
Disadvantages: Ignores feature dependencies; ignores interaction with the classifier
Examples: Euclidean distance; t-test; information gain

Model search: Filter (multivariate)
Advantages: Models feature dependencies; independent of the classifier; better computational complexity than wrapper methods
Disadvantages: Slower than univariate techniques; less scalable than univariate techniques; ignores interaction with the classifier
Examples: Correlation-based feature selection; Markov blanket filter; fast correlation-based feature selection

Model search: Wrapper (deterministic)
Advantages: Simple; interacts with the classifier; models feature dependencies; less computationally intensive than randomized methods
Disadvantages: Risk of overfitting; more prone than randomized algorithms to getting stuck in a local optimum; classifier-dependent selection
Examples: Sequential forward selection; sequential backward selection

Model search: Wrapper (randomized)
Advantages: Less prone to local optima; interacts with the classifier; models feature dependencies
Disadvantages: Computationally intensive; classifier-dependent selection; higher risk of overfitting than deterministic algorithms
Examples: Simulated annealing; randomized hill climbing; genetic algorithms; estimation of distribution algorithms

Model search: Embedded
Advantages: Interacts with the classifier; better computational complexity than wrapper methods; models feature dependencies
Disadvantages: Classifier-dependent selection
Examples: Decision trees; weighted naive Bayes; RFE-SVM
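
To make the contrast between the filter and wrapper rows concrete, the following is a minimal sketch in Python using only the standard library. It is not taken from the article or from Saeys et al. [8]; the toy data set, the nearest-centroid classifier, the leave-one-out scoring, and all function names (welch_t, filter_rank, loo_accuracy, forward_select) are assumptions chosen for illustration. The filter ranks each feature by a t-statistic without consulting any classifier, whereas sequential forward selection greedily adds the feature that most improves the accuracy of the chosen classifier.

```python
import math
from statistics import mean, variance

# Hypothetical toy data: 8 samples, 3 features; only feature 1 separates the classes.
X = [[1.0, 0.1, 3.0], [2.0, 0.3, 1.0], [3.0, 0.2, 2.0], [4.0, 0.4, 4.0],
     [1.0, 2.1, 2.0], [2.0, 2.3, 4.0], [3.0, 2.2, 3.0], [4.0, 2.4, 1.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

def welch_t(a, b):
    """Welch's t statistic for two samples (unequal variances allowed)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

def filter_rank(X, y):
    """Univariate filter: rank features by |t|, independent of any classifier."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(welch_t(pos, neg)))
    return sorted(range(len(scores)), key=lambda j: -scores[j])

def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier on `features`."""
    correct = 0
    for i in range(len(X)):
        dist = {}
        for label in (0, 1):
            # Class centroid computed without the held-out sample i.
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == label]
            centroid = [mean(r[j] for r in rows) for j in features]
            dist[label] = sum((X[i][j] - c) ** 2
                              for j, c in zip(features, centroid))
        correct += min(dist, key=dist.get) == y[i]
    return correct / len(X)

def forward_select(X, y, k):
    """Deterministic wrapper: greedy sequential forward selection of k features."""
    chosen = []
    while len(chosen) < k:
        remaining = [j for j in range(len(X[0])) if j not in chosen]
        # Each candidate is scored by re-evaluating the classifier itself.
        best = max(remaining, key=lambda j: loo_accuracy(X, y, chosen + [j]))
        chosen.append(best)
    return chosen

print("filter ranking:", filter_rank(X, y))
print("forward selection (k=1):", forward_select(X, y, 1))
```

The filter evaluates each feature once, which is why the table lists it as fast and scalable. The wrapper re-runs the classifier for every candidate subset, so it is slower and its result depends on the classifier, and because it is greedy it can settle on a locally optimal subset.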