Journal of Artificial Evolution and Applications

Special Issue

Artificial Evolution Methods in the Biological and Biomedical Sciences

Research Article | Open Access

Volume 2009 | Article ID 848532 | 13 pages

Classification of Oncologic Data with Genetic Programming

Academic Editor: Jason Moore
Received 14 Nov 2008
Revised 02 Apr 2009
Accepted 13 Jun 2009
Published 12 Aug 2009


Discovering the models that explain the hidden relationship between genetic material and tumor pathologies is one of the most important open challenges in biology and medicine. Given the large amount of data made available by DNA microarray technology, Machine Learning is becoming a popular tool for this kind of investigation. In the last few years, we have been particularly involved in the study of Genetic Programming for mining large sets of biomedical data. In this paper, we compare four variants of Genetic Programming on the classification of two oncologic datasets: the first contains data from healthy colon tissues and colon tissues affected by cancer; the second contains data from patients affected by two kinds of leukemia (acute myeloid leukemia and acute lymphoblastic leukemia). We report experimental results obtained using two different fitness criteria: the receiver operating characteristic and the percentage of correctly classified instances. These results, and their comparison with those obtained by three nonevolutionary Machine Learning methods (Support Vector Machines, MultiBoosting, and Random Forests) on the same data, suggest that Genetic Programming is a promising technique for this kind of classification.
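The two fitness criteria named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact definitions used in the paper, so the code below makes several assumptions: each evolved program is taken to return one real-valued score per sample, the ROC criterion is summarized as the area under the ROC curve (computed here as the probability that a positive sample outscores a negative one), and the classification threshold of 0.0 is arbitrary. The function names `accuracy` and `roc_auc` are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def accuracy(scores: Sequence[float], labels: Sequence[int],
             threshold: float = 0.0) -> float:
    """Fraction of samples whose thresholded score matches the 0/1 label.

    Assumption: a score above `threshold` is read as class 1.
    """
    if len(scores) != len(labels) or len(scores) == 0:
        raise ValueError("scores and labels must be non-empty and equal in length")
    correct = sum((s > threshold) == (y == 1) for s, y in zip(scores, labels))
    return correct / len(labels)


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive sample receives
    a higher score than a randomly chosen negative one; ties count as 0.5.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must be equal in length")
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy scores from a hypothetical evolved program; labels are 1 (tumor) / 0 (healthy).
toy_scores = [0.9, 0.4, -0.2, -0.7]
toy_labels = [1, 0, 1, 0]
toy_accuracy = accuracy(toy_scores, toy_labels)
toy_auc = roc_auc(toy_scores, toy_labels)
```

On this toy input the two criteria disagree (accuracy 0.5, area under the curve 0.75), because the ROC-based measure depends only on how samples are ranked, whereas accuracy depends on the chosen threshold.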



Copyright © 2009 Leonardo Vanneschi et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
