Table of Contents Author Guidelines Submit a Manuscript
Journal of Applied Mathematics
Volume 2013, Article ID 590614, 18 pages
Research Article

Selecting Optimal Feature Set in High-Dimensional Data by Swarm Search

1Department of Computer and Information Science, University of Macau, Macau
2Faculty of Science and Technology, Middlesex University, UK
3Department of Computer Science and Engineering, Cambridge Institute of Technology, Ranchi, India

Received 22 July 2013; Revised 1 October 2013; Accepted 8 October 2013

Academic Editor: Zong Woo Geem

Copyright © 2013 Simon Fong et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Selecting the right set of features from data of high dimensionality for inducing an accurate classification model is a tough computational challenge. It is almost a NP-hard problem as the combinations of features escalate exponentially as the number of features increases. Unfortunately in data mining, as well as other engineering applications and bioinformatics, some data are described by a long array of features. Many feature subset selection algorithms have been proposed in the past, but not all of them are effective. Since it takes seemingly forever to use brute force in exhaustively trying every possible combination of features, stochastic optimization may be a solution. In this paper, we propose a new feature selection scheme called Swarm Search to find an optimal feature set by using metaheuristics. The advantage of Swarm Search is its flexibility in integrating any classifier into its fitness function and plugging in any metaheuristic algorithm to facilitate heuristic search. Simulation experiments are carried out by testing the Swarm Search over some high-dimensional datasets, with different classification algorithms and various metaheuristic algorithms. The comparative experiment results show that Swarm Search is able to attain relatively low error rates in classification without shrinking the size of the feature subset to its minimum.