BioMed Research International
Volume 2013 (2013), Article ID 148014, 9 pages
http://dx.doi.org/10.1155/2013/148014
Research Article

Biomarker Selection and Classification of “-Omics” Data Using a Two-Step Bayes Classification Framework

1Department of Pharmacology, Faculty of Pharmacy, Mahidol University, 447 Sri-Ayuthaya Road, Rajathevi, Bangkok 10400, Thailand
2Department of Electrical and Computer Engineering, Faculty of Engineering, Thammasat University, 99 Phahonyothin Road, Khlong Nueng, Khlong Luang, Pathum Thani 12120, Thailand
3National Center for Genetic Engineering and Biotechnology, 113 Thailand Science Park, Phahonyothin Road, Khlong Nueng, Khlong Luang, Pathum Thani 12120, Thailand
4Department of Electrical and Computer Engineering, King Mongkut University of Technology North Bangkok, 1518 Piboonsongkarm Road, Bangkok 10800, Thailand
5Language and Semantic Technology Laboratory, National Electronic and Computer Technology Center, 112 Thailand Science Park, Phahonyothin Road, Khlong Nueng, Khlong Luang, Pathum Thani 12120, Thailand

Received 22 April 2013; Revised 4 July 2013; Accepted 6 August 2013

Academic Editor: Florencio Pazos

Copyright © 2013 Anunchai Assawamakin et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Identification of suitable biomarkers for accurate prediction of phenotypic outcomes is a key goal of personalized medicine. However, many current machine learning approaches are either too complex or perform poorly. Here, a novel two-step machine learning framework is presented to address this need. First, a Naïve Bayes estimator is used to rank the features; the top-ranked features are the most likely to include those that are most informative for predicting the underlying biological classes. Second, these top-ranked features are used to train a Hidden Naïve Bayes classifier, which builds a classification model from the filtered attributes. To obtain the smallest set of the most informative biomarkers, the bottom-ranked features are removed one at a time from the Naïve Bayes-filtered feature list, and the classification accuracy of the Hidden Naïve Bayes classifier is checked for each pruned feature set. The performance of the proposed two-step Bayes classification framework was tested on several types of “-omics” datasets, including gene expression microarray, single nucleotide polymorphism microarray (SNP array), and surface-enhanced laser desorption/ionization time-of-flight (SELDI-TOF) proteomic data. The proposed framework performed as well as, and in some cases better than, other classification methods in terms of prediction accuracy, number of classification markers required, and computational time.
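The abstract does not specify the ranking score, the Hidden Naïve Bayes details, or the evaluation protocol, so the following is only a minimal Python sketch of the two-step idea under stated assumptions, not the authors' implementation. It assumes the features are already discretized into small integer codes. `nb_rank` scores each feature by the training accuracy of a one-feature Naïve Bayes classifier (an assumed ranking criterion). `HiddenNaiveBayes` follows the usual Hidden Naïve Bayes form, scoring a class as P(c) ∏ Σ_{j≠i} W_ij P(a_i | a_j, c), where each weight W_ij is proportional to the conditional mutual information I(A_i; A_j | C); the uniform-weight fallback when all mutual information is zero is an added assumption. Pruning is scored on a held-out validation split (also an assumption). All function names and the `top_k` parameter are hypothetical.

```python
import math
from collections import Counter


def nb_rank(X, y):
    """Step 1 (assumed criterion): score each discretized feature by the training
    accuracy of a one-feature Naive Bayes classifier; return indices, best first."""
    classes = sorted(set(y))
    class_count = Counter(y)
    scored = []
    for i in range(len(X[0])):
        joint = Counter((row[i], c) for row, c in zip(X, y))
        n_vals = len({row[i] for row in X})

        def predict(v):
            # argmax_c P(c) * P(v | c) with Laplace smoothing; the constant 1/n is dropped
            return max(classes, key=lambda c: class_count[c] * (joint[(v, c)] + 1)
                       / (class_count[c] + n_vals))

        acc = sum(predict(row[i]) == c for row, c in zip(X, y)) / len(y)
        scored.append((-acc, i))  # ties broken by original feature index
    return [i for _, i in sorted(scored)]


class HiddenNaiveBayes:
    """Sketch of Hidden Naive Bayes for discrete attributes: each attribute's hidden
    parent mixes P(a_i | a_j, c) over j != i, weighted by I(A_i; A_j | C)."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.n, self.m = len(y), len(X[0])
        self.cc = Counter(y)
        self.nvals = [len({row[i] for row in X}) for i in range(self.m)]
        self.c1 = Counter()  # counts of (i, a_i, c)
        self.c2 = Counter()  # counts of (i, a_i, j, a_j, c)
        for row, c in zip(X, y):
            for i in range(self.m):
                self.c1[(i, row[i], c)] += 1
                for j in range(self.m):
                    if j != i:
                        self.c2[(i, row[i], j, row[j], c)] += 1
        self.w = [[0.0] * self.m for _ in range(self.m)]
        for i in range(self.m):
            cmi = [self._cmi(i, j) if j != i else 0.0 for j in range(self.m)]
            total = sum(cmi)
            for j in range(self.m):
                if j != i:  # assumed fallback: uniform weights when all CMI are zero
                    self.w[i][j] = cmi[j] / total if total > 0 else 1.0 / (self.m - 1)
        return self

    def _cmi(self, i, j):
        # Empirical conditional mutual information I(A_i; A_j | C), clamped at 0
        s = 0.0
        for (a, ai, b, aj, c), nij in self.c2.items():
            if a == i and b == j:
                s += nij / self.n * math.log(
                    nij * self.cc[c] / (self.c1[(i, ai, c)] * self.c1[(j, aj, c)]))
        return max(0.0, s)

    def _p2(self, i, ai, j, aj, c):
        # Laplace-smoothed P(a_i | a_j, c)
        return (self.c2[(i, ai, j, aj, c)] + 1) / (self.c1[(j, aj, c)] + self.nvals[i])

    def predict_one(self, x):
        def log_score(c):
            s = math.log((self.cc[c] + 1) / (self.n + len(self.classes)))
            for i in range(self.m):
                if self.m == 1:  # a single attribute has no parent: plain Naive Bayes
                    p = (self.c1[(i, x[i], c)] + 1) / (self.cc[c] + self.nvals[i])
                else:
                    p = sum(self.w[i][j] * self._p2(i, x[i], j, x[j], c)
                            for j in range(self.m) if j != i)
                s += math.log(p)
            return s
        return max(self.classes, key=log_score)


def project(X, feats):
    return [[row[i] for i in feats] for row in X]


def accuracy(model, X, y):
    return sum(model.predict_one(x) == c for x, c in zip(X, y)) / len(y)


def select_markers(Xtr, ytr, Xva, yva, top_k):
    """Step 2: keep the top_k ranked features, drop the bottom-ranked one at a time,
    score HNB on each pruned set, and return the smallest set with the best accuracy."""
    ranked = nb_rank(Xtr, ytr)[:top_k]
    best_acc, best_set = -1.0, ranked
    for k in range(len(ranked), 0, -1):
        feats = ranked[:k]
        model = HiddenNaiveBayes().fit(project(Xtr, feats), ytr)
        acc = accuracy(model, project(Xva, feats), yva)
        if acc >= best_acc:  # ">=" prefers the smaller set on ties
            best_acc, best_set = acc, feats
    return best_set, best_acc
```

On a toy dataset in which feature 0 equals the class label and features 1 and 2 are noise, the sketch keeps only feature 0: `select_markers(X, y, X, y, 3)` returns `([0], 1.0)`. Because ties favor the smaller set, the loop reports the fewest markers that reach the best validation accuracy, which mirrors the "minimum set of the most informative biomarkers" goal described above.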