BioMed Research International
Volume 2013, Article ID 239628, 13 pages
http://dx.doi.org/10.1155/2013/239628
Research Article

Recognition of Multiple Imbalanced Cancer Types Based on DNA Microarray Data Using Ensemble Classifiers

1School of Computer Science and Engineering, Jiangsu University of Science and Technology, No. 2 Mengxi Road, Zhenjiang 212003, China
2Department of Radiology, Carver College of Medicine, The University of Iowa, Iowa City, IA 52242, USA
3School of Biology and Chemical Engineering, Jiangsu University of Science and Technology, No. 2 Mengxi Road, Zhenjiang 212003, China

Received 7 April 2013; Revised 8 July 2013; Accepted 17 July 2013

Academic Editor: Alexander Zelikovsky

Copyright © 2013 Hualong Yu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. T. Puelma, R. A. Gutierrez, and A. Soto, “Discriminative local subspaces in gene expression data for effective gene function prediction,” Bioinformatics, vol. 28, no. 17, pp. 2256–2264, 2012.
  2. F. De Longueville, D. Surry, G. Meneses-Lorente et al., “Gene expression profiling of drug metabolism and toxicology markers using a low-density DNA microarray,” Biochemical Pharmacology, vol. 64, no. 1, pp. 137–149, 2002.
  3. S. Bates, “The role of gene expression profiling in drug discovery,” Current Opinion in Pharmacology, vol. 11, no. 5, pp. 549–556, 2011.
  4. M. Kabir, N. Noman, and H. Iba, “Reverse engineering gene regulatory network from microarray data using linear time-variant model,” BMC Bioinformatics, vol. 11, no. 1, article S56, 2010.
  5. G. Chalancon, C. N. J. Ravarani, S. Balaji et al., “Interplay between gene expression noise and regulatory network architecture,” Trends in Genetics, vol. 28, no. 5, pp. 221–232, 2012.
  6. T. R. Golub, D. K. Slonim, P. Tamayo et al., “Molecular classification of cancer: class discovery and class prediction by gene expression monitoring,” Science, vol. 286, no. 5439, pp. 531–537, 1999.
  7. C. L. Nutt, D. R. Mani, R. A. Betensky et al., “Gene expression-based classification of malignant gliomas correlates better with survival than histological classification,” Cancer Research, vol. 63, no. 7, pp. 1602–1607, 2003.
  8. X. Wang and R. Simon, “Microarray-based cancer prediction using single genes,” BMC Bioinformatics, vol. 12, article 391, 2011.
  9. S. Ghorai, A. Mukherjee, S. Sengupta, and P. K. Dutta, “Cancer classification from gene expression data by NPPC ensemble,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 3, pp. 659–671, 2011.
  10. T. D. Pham, C. Wells, and D. I. Crane, “Analysis of microarray gene expression data,” Current Bioinformatics, vol. 1, no. 1, pp. 37–53, 2006.
  11. K. Yang, Z. Cai, J. Li, and G. Lin, “A stable gene selection in microarray data analysis,” BMC Bioinformatics, vol. 7, article 228, 2006.
  12. G.-Z. Li, H.-H. Meng, and J. Ni, “Embedded gene selection for imbalanced microarray data analysis,” in Proceedings of the 3rd International Multi-Symposiums on Computer and Computational Sciences (IMSCCS '08), pp. 17–24, Shanghai, China, October 2008.
  13. A. H. M. Kamal, X. Zhu, and R. Narayanan, “Gene selection for microarray expression data with imbalanced sample distributions,” in Proceedings of the International Joint Conference on Bioinformatics, Systems Biology and Intelligent Computing (IJCBS '09), pp. 3–9, Shanghai, China, August 2009.
  14. R. Blagus and L. Lusa, “Class prediction for high-dimensional class-imbalanced data,” BMC Bioinformatics, vol. 11, article 523, 2010.
  15. M. Wasikowski and X.-W. Chen, “Combating the small sample class imbalance problem using feature selection,” IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1388–1400, 2010.
  16. H. L. Yu, J. Ni, and J. Zhao, “ACOSampling: an ant colony optimization-based undersampling method for classifying imbalanced DNA microarray data,” Neurocomputing, vol. 101, no. 2, pp. 309–318, 2013.
  17. W. J. Lin and J. J. Chen, “Class-imbalanced classifiers for high-dimensional data,” Briefings in Bioinformatics, vol. 14, no. 1, pp. 13–26, 2013.
  18. R. Blagus and L. Lusa, “Evaluation of SMOTE for high-dimensional class-imbalanced microarray data,” in Proceedings of the 11th International Conference on Machine Learning and Applications, pp. 89–94, Boca Raton, Fla, USA, 2012.
  19. S. Wang and X. Yao, “Multiclass imbalance problems: analysis and potential solutions,” IEEE Transactions on Systems, Man, and Cybernetics B, vol. 42, no. 4, pp. 1119–1130, 2012.
  20. M. J. Abdi, S. M. Hosseini, and M. Rezghi, “A novel weighted support vector machine based on particle swarm optimization for gene selection and tumor classification,” Computational and Mathematical Methods in Medicine, vol. 2012, Article ID 320698, 7 pages, 2012.
  21. A. C. Lorena, A. C. P. L. F. De Carvalho, and J. M. P. Gama, “A review on the combination of binary classifiers in multiclass problems,” Artificial Intelligence Review, vol. 30, no. 1–4, pp. 19–37, 2008.
  22. C.-H. Yeang, S. Ramaswamy, P. Tamayo et al., “Molecular classification of multiple tumor types,” Bioinformatics, vol. 17, supplement 1, pp. S316–S322, 2001.
  23. L. Shen and E. C. Tan, “Reducing multiclass cancer classification to binary by output coding and SVM,” Computational Biology and Chemistry, vol. 30, no. 1, pp. 63–71, 2006.
  24. S. J. Joseph, K. R. Robbins, W. Zhang, and R. Rekaya, “Comparison of two output-coding strategies for multi-class tumor classification using gene expression data and latent variable model as binary classifier,” Cancer Informatics, vol. 9, pp. 39–48, 2010.
  25. A. Statnikov, C. F. Aliferis, I. Tsamardinos, D. Hardin, and S. Levy, “A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis,” Bioinformatics, vol. 21, no. 5, pp. 631–643, 2005.
  26. L. Nanni, S. Brahnam, and A. Lumini, “Combining multiple approaches for gene microarray classification,” Bioinformatics, vol. 28, no. 8, pp. 1151–1157, 2012.
  27. A. Bertoni, R. Folgieri, and G. Valentini, “Classification of DNA microarray data with Random Projection Ensembles of Polynomial SVMs,” in Proceedings of the 18th Italian Workshop on Neural Networks, pp. 60–66, Vietri sul Mare, Italy, 2008.
  28. Y. Chen and Y. Zhao, “A novel ensemble of classifiers for microarray data classification,” Applied Soft Computing Journal, vol. 8, no. 4, pp. 1664–1669, 2008.
  29. K.-J. Kim and S.-B. Cho, “An evolutionary algorithm approach to optimal ensemble classifiers for DNA microarray data analysis,” IEEE Transactions on Evolutionary Computation, vol. 12, no. 3, pp. 377–388, 2008.
  30. A. Anand, G. Pugalenthi, G. B. Fogel, and P. N. Suganthan, “An approach for classification of highly imbalanced data using weighting and undersampling,” Amino Acids, vol. 39, no. 5, pp. 1385–1391, 2010.
  31. T. G. Dietterich and G. Bakiri, “Solving multiclass learning problems via error-correcting output codes,” Journal of Artificial Intelligence Research, vol. 2, pp. 263–286, 1995.
  32. B. Kijsirikul and N. Ussivakul, “Multiclass support vector machines using adaptive directed acyclic graph,” in Proceedings of the International Joint Conference on Neural Networks (IJCNN '02), pp. 980–985, Honolulu, Hawaii, USA, May 2002.
  33. H. L. Yu, S. Gao, B. Qin et al., “Multiclass microarray data classification based on confidence evaluation,” Genetics and Molecular Research, vol. 11, no. 2, pp. 1357–1369, 2012.
  34. J.-H. Hong and S.-B. Cho, “A probabilistic multi-class strategy of one-vs.-rest support vector machines for cancer classification,” Neurocomputing, vol. 71, no. 16–18, pp. 3275–3281, 2008.
  35. A. Krogh and J. Vedelsby, “Neural network ensembles, cross validation, and active learning,” Advances in Neural Information Processing Systems, vol. 7, pp. 231–238, 1995.
  36. L. Breiman, “Bagging predictors,” Machine Learning, vol. 24, no. 2, pp. 123–140, 1996.
  37. X. Li, L. Wang, and E. Sung, “AdaBoost with SVM-based component classifiers,” Engineering Applications of Artificial Intelligence, vol. 21, no. 5, pp. 785–795, 2008.
  38. T. K. Ho, “The random subspace method for constructing decision forests,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 8, pp. 832–844, 1998.
  39. L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
  40. M. Gao, X. Hong, S. Chen, and C. J. Harris, “A combined SMOTE and PSO based RBF classifier for two-class imbalanced problems,” Neurocomputing, vol. 74, no. 17, pp. 3456–3466, 2011.
  41. R. Akbani, S. Kwek, and N. Japkowicz, “Applying support vector machines to imbalanced datasets,” in Proceedings of the 15th European Conference on Machine Learning (ECML '04), vol. 3201 of Lecture Notes in Computer Science, pp. 39–50, September 2004.
  42. P. Phoungphol, Y. Zhang, and Y. Zhao, “Robust multiclass classification for learning from imbalanced biomedical data,” Tsinghua Science and Technology, vol. 17, no. 6, pp. 619–628, 2012.
  43. S. A. Armstrong, J. E. Staunton, L. B. Silverman et al., “MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia,” Nature Genetics, vol. 30, no. 1, pp. 41–47, 2002.
  44. A. I. Su, J. B. Welsh, L. M. Sapinoso et al., “Molecular classification of human carcinomas by use of gene expression signatures,” Cancer Research, vol. 61, no. 20, pp. 7388–7393, 2001.
  45. J. Khan, J. S. Wei, M. Ringnér et al., “Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks,” Nature Medicine, vol. 7, no. 6, pp. 673–679, 2001.
  46. S. L. Pomeroy, P. Tamayo, M. Gaasenbeek et al., “Prediction of central nervous system embryonal tumour outcome based on gene expression,” Nature, vol. 415, no. 6870, pp. 436–442, 2002.
  47. A. Bhattacharjee, W. G. Richards, J. Staunton et al., “Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses,” Proceedings of the National Academy of Sciences of the United States of America, vol. 98, no. 24, pp. 13790–13795, 2001.
  48. S. Ramaswamy, P. Tamayo, R. Rifkin et al., “Multiclass cancer diagnosis using tumor gene expression signatures,” Proceedings of the National Academy of Sciences of the United States of America, vol. 98, no. 26, pp. 15149–15154, 2001.
  49. A. Özgür, L. Özgür, and T. Güngör, “Text categorization with class-based and corpus-based keyword selection,” Lecture Notes in Computer Science, vol. 3733, pp. 606–615, 2005.
  50. N. Japkowicz and S. Stephen, “The class imbalance problem: a systematic study,” Intelligent Data Analysis, vol. 6, no. 5, pp. 429–449, 2002.
  51. X.-Y. Liu, J. Wu, and Z.-H. Zhou, “Exploratory undersampling for class-imbalance learning,” IEEE Transactions on Systems, Man, and Cybernetics B, vol. 39, no. 2, pp. 539–550, 2009.