Computational and Mathematical Methods in Medicine
Volume 2015 (2015), Article ID 193406, 11 pages
http://dx.doi.org/10.1155/2015/193406
Research Article

Genetic Programming Based Ensemble System for Microarray Data Classification

1 Software School of Xiamen University, Xiamen, Fujian 361005, China
2 Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Kowloon 999077, Hong Kong
3 Baidu Inc., Beijing 100000, China
4 School of Computer Engineering, Jimei University, Xiamen, Fujian 361021, China

Received 22 October 2014; Revised 1 January 2015; Accepted 19 January 2015

Academic Editor: John Mitchell

Copyright © 2015 Kun-Hong Liu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Machine learning techniques are increasingly applied to microarray data analysis. This study proposes a new genetic programming (GP) based ensemble system, named GPES, for classifying different types of cancer. Decision trees serve as the base classifiers in this ensemble framework, and they are combined with three operators: Min, Max, and Average. Each GP individual is itself an ensemble system, and the individuals become more accurate as evolution proceeds. Feature selection and balanced subsampling are applied to increase the diversity within each ensemble system. The final ensemble committee is selected by a forward search algorithm, which is shown to be capable of fitting data automatically. The performance of GPES is evaluated on five binary-class and six multiclass microarray datasets, and the results show that it outperforms some other ensemble systems in most cases. Using more elaborate base classifiers or other sampling techniques may further improve the performance of GPES.
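To make the structure of a GP individual concrete, the following minimal Python sketch shows one way an ensemble tree built from the Min, Max, and Average operators could be evaluated. It is an illustration under stated assumptions, not the implementation used in GPES: the abstract does not say how the operators are applied, so the sketch assumes that each base classifier outputs a vector of per-class scores and that the operators act element-wise on those vectors. The names `OPERATORS`, `evaluate`, `predict`, and `tree_a` to `tree_c`, as well as all score values, are hypothetical.

```python
from statistics import mean

# Hypothetical operator table. Assumption: each operator combines the scores
# that its children assign to the same class (element-wise combination).
OPERATORS = {"Min": min, "Max": max, "Average": mean}


def evaluate(node, leaf_outputs):
    """Return the per-class scores produced by an ensemble tree.

    A node is either a leaf name (str) that indexes `leaf_outputs`, or a
    tuple of the form (operator_name, child, child, ...).
    """
    if isinstance(node, str):
        return leaf_outputs[node]
    op_name, *children = node
    child_scores = [evaluate(child, leaf_outputs) for child in children]
    combine = OPERATORS[op_name]
    # zip(*child_scores) groups the children's scores class by class.
    return [combine(per_class) for per_class in zip(*child_scores)]


def predict(node, leaf_outputs):
    """Return the index of the class with the highest combined score."""
    scores = evaluate(node, leaf_outputs)
    return max(range(len(scores)), key=scores.__getitem__)


# Made-up per-class scores of three base decision trees for one sample.
leaf_outputs = {
    "tree_a": [0.75, 0.25],
    "tree_b": [0.25, 0.75],
    "tree_c": [0.5, 0.5],
}

# One hypothetical GP individual: Average(Min(a, b), Max(b, c)).
ensemble = ("Average", ("Min", "tree_a", "tree_b"), ("Max", "tree_b", "tree_c"))

print(evaluate(ensemble, leaf_outputs))  # [0.375, 0.5]
print(predict(ensemble, leaf_outputs))   # 1
```

In this toy case, Min yields [0.25, 0.25] and Max yields [0.5, 0.75], so their average favours class 1. In a GP setting, crossover and mutation would rearrange such trees, and a fitness measure such as classification accuracy would guide selection. Those parts are omitted here.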