The Scientific World Journal
Volume 2012, Article ID 278352, 10 pages
http://dx.doi.org/10.1100/2012/278352
Research Article

Effects of Pooling Samples on the Performance of Classification Algorithms: A Comparative Study

1Institute for Bioinformatics and Translational Research, UMIT, 6060 Hall in Tyrol, Austria
2Faculty of Chemistry and Pharmacy, Leopold-Franzens-University Innsbruck, 6020 Innsbruck, Austria
3Institute of Electrical and Biomedical Engineering, UMIT, 6060 Hall in Tyrol, Austria
4Novartis Pharmaceuticals Corporation, Oncology Biomarkers and Imaging, One Health Plaza, East Hanover, NJ 07936, USA

Received 18 December 2011; Accepted 10 January 2012

Academic Editor: Zhenqiang Su

Copyright © 2012 Kanthida Kusonmano et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. R. Clarke, H. W. Ressom, A. Wang et al., “The properties of high-dimensional data spaces: implications for exploring gene and protein expression data,” Nature Reviews Cancer, vol. 8, no. 1, pp. 37–49, 2008.
  2. F. Molina, M. Dehmer, P. Perco et al., “Systems biology: opening new avenues in clinical research,” Nephrology Dialysis Transplantation, vol. 25, no. 4, pp. 1015–1018, 2010.
  3. D. Agrawal, T. Chen, R. Irby et al., “Osteopontin identified as lead marker of colon cancer progression, using pooled sample expression profiling,” Journal of the National Cancer Institute, vol. 94, no. 7, pp. 513–521, 2002.
  4. R. A. Jolly, K. M. Goldstein, T. Wei et al., “Pooling samples within microarray studies: a comparative analysis of rat liver transcription response to prototypical toxicants,” Physiological Genomics, vol. 22, pp. 346–355, 2005.
  5. C. M. Kendziorski, Y. Zhang, H. Lan, and A. D. Attie, “The efficiency of pooling mRNA in microarray experiments,” Biostatistics, vol. 4, no. 3, pp. 465–477, 2003.
  6. X. Peng, C. L. Wood, E. M. Blalock, K. C. Chen, P. W. Landfield, and A. J. Stromberg, “Statistical implications of pooling RNA samples for microarray experiments,” BMC Bioinformatics, vol. 4, article no. 26, 2003.
  7. J. H. Shih, A. M. Michalowska, K. Dobbin, Y. Ye, T. H. Qiu, and J. E. Green, “Effects of pooling mRNA in microarray class comparisons,” Bioinformatics, vol. 20, no. 18, pp. 3318–3325, 2004.
  8. S. D. Zhang and T. W. Gant, “Effect of pooling samples on the efficiency of comparative studies using microarrays,” Bioinformatics, vol. 21, no. 24, pp. 4378–4383, 2005.
  9. W. Zhang, A. Carriquiry, D. Nettleton, and J. C. M. Dekkers, “Pooling mRNA in microarray experiments and its effect on power,” Bioinformatics, vol. 23, no. 10, pp. 1217–1224, 2007.
  10. J. A. Cruz and D. S. Wishart, “Applications of machine learning in cancer prediction and prognosis,” Cancer Informatics, vol. 2, pp. 59–77, 2006.
  11. J. Hayward, S. A. Alvarez, C. Ruiz, M. Sullivan, J. Tseng, and G. Whalen, “Machine learning of clinical performance in a pancreatic cancer database,” Artificial Intelligence in Medicine, vol. 49, no. 3, pp. 187–195, 2010.
  12. M. Netzer, G. Millonig, M. Osl et al., “A new ensemble-based algorithm for identifying breath gas marker candidates in liver disease using ion molecule reaction mass spectrometry,” Bioinformatics, vol. 25, no. 7, pp. 941–947, 2009.
  13. E. K. Lee, “Machine learning framework for classification in medicine and biology,” in Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems, Springer, Berlin, Germany, 2009.
  14. A. Telaar, G. Nürnberg, and D. Repsilber, “Finding biomarker signatures in pooled sample designs: a simulation framework for methodological comparisons,” Advances in Bioinformatics, vol. 2010, Article ID 318573, 8 pages, 2010.
  15. C. Kendziorski, R. A. Irizarry, K. S. Chen, J. D. Haag, and M. N. Gould, “On the utility of pooling biological samples in microarray experiments,” Proceedings of the National Academy of Sciences of the United States of America, vol. 102, no. 12, pp. 4252–4257, 2005.
  16. T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, 2009.
  17. T. M. Mitchell, Machine Learning, McGraw-Hill, 1997.
  18. V. N. Vapnik, Statistical Learning Theory, John Wiley & Sons, New York, NY, USA, 1998.
  19. W. S. Noble, “What is a support vector machine?” Nature Biotechnology, vol. 24, no. 12, pp. 1565–1567, 2006.
  20. K. Kusonmano, M. Netzer, B. Pfeifer, C. Baumgartner, K. R. Liedl, and A. Graber, “Evaluation of the impact of dataset characteristics for classification problems in biological applications,” in Proceedings of the International Conference on Bioinformatics and Biomedicine, pp. 741–745, Venice, Italy, 2009.
  21. L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
  22. M. Y. Park and T. Hastie, “Penalized logistic regression for detecting gene interactions,” Biostatistics, vol. 9, no. 1, pp. 30–50, 2008.
  23. J. Zhu and T. Hastie, “Classification of gene microarrays by penalized logistic regression,” Biostatistics, vol. 5, no. 3, pp. 427–443, 2004.
  24. R. Tibshirani, T. Hastie, B. Narasimhan, and G. Chu, “Diagnosis of multiple cancer types by shrunken centroids of gene expression,” Proceedings of the National Academy of Sciences of the United States of America, vol. 99, no. 10, pp. 6567–6572, 2002.
  25. C. Baumgartner and A. Graber, “Data mining and knowledge discovery in metabolomics,” in Successes and New Directions in Data Mining, P. Poncelet, F. Masseglia, and M. Teisseire, Eds., pp. 141–166, IGI Global, 2008.
  26. Y. Saeys, I. Inza, and P. Larrañaga, “A review of feature selection techniques in bioinformatics,” Bioinformatics, vol. 23, no. 19, pp. 2507–2517, 2007.
  27. B. Wu, T. Abbott, D. Fishman et al., “Comparison of statistical methods for classification of ovarian cancer using mass spectrometry data,” Bioinformatics, vol. 19, no. 13, pp. 1636–1643, 2003.
  28. Y. Guo, A. Graber, R. N. McBurney, and R. Balasubramanian, “Sample size and statistical power considerations in high-dimensionality data settings: a comparative study of classification algorithms,” BMC Bioinformatics, vol. 11, article no. 447, 2010.
  29. J. Quackenbush, “Microarray data normalization and transformation,” Nature Genetics, vol. 32, no. 5, pp. 496–501, 2002.
  30. M. Slawski, M. Daumer, and A. L. Boulesteix, “CMA—a comprehensive Bioconductor package for supervised classification with high dimensional data,” BMC Bioinformatics, vol. 9, article no. 439, 2008.
  31. W. Pan, “A comparative review of statistical methods for discovering differentially expressed genes in replicated microarray experiments,” Bioinformatics, vol. 18, no. 4, pp. 546–554, 2002.
  32. A. Statnikov, L. Wang, and C. F. Aliferis, “A comprehensive comparison of random forests and support vector machines for microarray-based cancer classification,” BMC Bioinformatics, vol. 9, article no. 319, 2008.
  33. R. Díaz-Uriarte and S. Alvarez de Andrés, “Gene selection and classification of microarray data using random forest,” BMC Bioinformatics, vol. 7, article no. 3, 2006.
  34. R. Polikar, “Ensemble based systems in decision making,” IEEE Circuits and Systems Magazine, vol. 6, no. 3, Article ID 1688199, pp. 21–44, 2006.
  35. R. N. McBurney, W. M. Hines, L. S. Von Tungeln et al., “The liver toxicity biomarker study: phase I design and preliminary results,” Toxicologic Pathology, vol. 37, no. 1, pp. 52–64, 2009.