The Scientific World Journal
Volume 2012, Article ID 278352, 10 pages
Research Article

Effects of Pooling Samples on the Performance of Classification Algorithms: A Comparative Study

1Institute for Bioinformatics and Translational Research, UMIT, 6060 Hall in Tyrol, Austria
2Faculty of Chemistry and Pharmacy, Leopold-Franzens-University Innsbruck, 6020 Innsbruck, Austria
3Institute of Electrical and Biomedical Engineering, UMIT, 6060 Hall in Tyrol, Austria
4Novartis Pharmaceuticals Corporation, Oncology Biomarkers and Imaging, One Health Plaza, East Hanover, NJ 07936, USA

Received 18 December 2011; Accepted 10 January 2012

Academic Editor: Zhenqiang Su

Copyright © 2012 Kanthida Kusonmano et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


A pooling design can be a powerful strategy to compensate for limited sample amounts or high biological variation. In this paper, we perform a comparative study to model and quantify the effects of virtual pooling on the performance of five widely applied classifiers: support vector machines (SVMs), random forest (RF), k-nearest neighbors (k-NN), penalized logistic regression (PLR), and prediction analysis for microarrays (PAMs). We evaluate a variety of experimental designs using mock omics datasets with varying pool sizes, and we also consider the effects of feature selection. Our results show that feature selection significantly improves classifier performance for both non-pooled and pooled data. All investigated classifiers yield lower misclassification rates with smaller pool sizes. RF outperforms the other investigated algorithms in most settings, while the remaining classifiers reach comparable accuracy levels. From these results we derive guidelines for identifying a pooling scheme that provides adequate predictive power and thereby supports a study design that best meets the experimental objectives and budgetary conditions, including time constraints.
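To make the idea of virtual pooling concrete, the following is a minimal sketch and not the authors' procedure. It assumes that a pool is formed by taking the arithmetic mean of `pool_size` randomly grouped samples from the same class and that leftover samples are discarded. The function name `virtual_pool`, the mock data, and all parameters are hypothetical and chosen only for illustration.

```python
import numpy as np

def virtual_pool(X, y, pool_size, seed=0):
    """Average random groups of `pool_size` same-class samples (assumed pooling model)."""
    rng = np.random.default_rng(seed)
    pooled_X, pooled_y = [], []
    for label in np.unique(y):
        # Shuffle the indices of this class so pool membership is random.
        idx = rng.permutation(np.flatnonzero(y == label))
        n_pools = len(idx) // pool_size  # samples that do not fill a pool are dropped
        for p in range(n_pools):
            members = idx[p * pool_size:(p + 1) * pool_size]
            pooled_X.append(X[members].mean(axis=0))  # one pooled profile per group
            pooled_y.append(label)
    return np.asarray(pooled_X), np.asarray(pooled_y)

# Mock dataset (illustrative only): 40 samples x 5 features, two classes of 20.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 5))
y = np.repeat([0, 1], 20)
X[y == 1, 0] += 1.0  # shift one feature so the classes differ

Xp, yp = virtual_pool(X, y, pool_size=4)
print(Xp.shape, yp)
```

With a pool size of 4, the 40 individual samples collapse into 10 pooled samples, which illustrates the trade-off studied here: pooling averages out variation within each pool but leaves fewer samples for training a classifier.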