Computational and Mathematical Methods in Medicine
Volume 2017 (2017), Article ID 7907163, 18 pages
https://doi.org/10.1155/2017/7907163
Research Article

A Multicriteria Approach to Find Predictive and Sparse Models with Stable Feature Selection for High-Dimensional Data

Department of Statistics, TU Dortmund University, 44221 Dortmund, Germany

Correspondence should be addressed to Andrea Bommert; bommert@statistik.tu-dortmund.de

Received 22 February 2017; Revised 3 May 2017; Accepted 5 June 2017; Published 1 August 2017

Academic Editor: Benjamin Hofner

Copyright © 2017 Andrea Bommert et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. M. Lang, H. Kotthaus, P. Marwedel, C. Weihs, J. Rahnenführer, and B. Bischl, “Automatic model selection for high-dimensional survival analysis,” Journal of Statistical Computation and Simulation, vol. 85, no. 1, pp. 62–76, 2015.
  2. A. Kalousis, J. Prados, and M. Hilario, “Stability of feature selection algorithms: a study on high-dimensional spaces,” Knowledge and Information Systems, vol. 12, no. 1, pp. 95–116, 2007.
  3. Z. He and W. Yu, “Stable feature selection for biomarker discovery,” Computational Biology and Chemistry, vol. 34, no. 4, pp. 215–225, 2010.
  4. L. Lausser, C. Müssel, M. Maucher, and H. A. Kestler, “Measuring and visualizing the stability of biomarker selection techniques,” Computational Statistics, vol. 28, no. 1, pp. 51–65, 2013.
  5. S. Nogueira and G. Brown, “Measuring the stability of feature selection,” in Machine Learning and Knowledge Discovery in Databases, vol. 9852 of Lecture Notes in Computer Science, pp. 442–457, Springer International Publishing, Cham, 2016.
  6. S. Alelyani, Z. Zhao, and H. Liu, “A dilemma in assessing stability of feature selection algorithms,” in Proceedings of the 2011 IEEE International Conference on High Performance Computing and Communications, pp. 701–707, September 2011.
  7. H. Wang, T. M. Khoshgoftaar, R. Wald, and A. Napolitano, “A novel dataset-similarity-aware approach for evaluating stability of software metric selection techniques,” in Proceedings of the 2012 IEEE 13th International Conference on Information Reuse and Integration (IRI 2012), pp. 1–8, August 2012.
  8. N. Meinshausen and P. Bühlmann, “Stability selection,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 72, no. 4, pp. 417–473, 2010.
  9. A.-L. Boulesteix and M. Slawski, “Stability and aggregation of ranked gene lists,” Briefings in Bioinformatics, vol. 10, no. 5, pp. 556–568, 2009.
  10. S. Lee, J. Rahnenführer, M. Lang et al., “Robust selection of cancer survival signatures from high-throughput genomic data using two-fold subsampling,” PLoS ONE, vol. 9, no. 10, Article ID e108818, 2014.
  11. W. Awada, T. M. Khoshgoftaar, D. Dittman, R. Wald, and A. Napolitano, “A review of the stability of feature selection techniques for bioinformatics data,” in Proceedings of the 2012 IEEE International Conference on Information Reuse and Integration, pp. 356–363, August 2012.
  12. T. Abeel, T. Helleputte, Y. Van de Peer, P. Dupont, and Y. Saeys, “Robust biomarker identification for cancer diagnosis with ensemble feature selection methods,” Bioinformatics, vol. 26, no. 3, pp. 392–398, 2010.
  13. C. A. Davis, F. Gerick, V. Hintermair et al., “Reliable gene signatures for microarray classification: assessment of stability and performance,” Bioinformatics, vol. 22, no. 19, pp. 2356–2363, 2006.
  14. N. Dessì, E. Pascariello, and B. Pes, “A comparative analysis of biomarker selection techniques,” BioMed Research International, vol. 2013, Article ID 387673, 10 pages, 2013.
  15. D. Dittman, T. M. Khoshgoftaar, R. Wald, and H. Wang, “Stability analysis of feature ranking techniques on biological datasets,” in Proceedings of the 2011 IEEE International Conference on Bioinformatics and Biomedicine, pp. 252–256, November 2011.
  16. A. Haury, P. Gestraud, and J. Vert, “The influence of feature selection methods on accuracy, stability and interpretability of molecular signatures,” PLoS ONE, vol. 6, no. 12, Article ID e28210, 2011.
  17. H. W. Lee, C. Lawton, Y. J. Na, and S. Yoon, “Robustness of chemometrics-based feature selection methods in early cancer detection and biomarker discovery,” Statistical Applications in Genetics and Molecular Biology, vol. 12, no. 2, pp. 207–223, 2013.
  18. Y. Saeys, T. Abeel, and Y. Van de Peer, “Robust feature selection using ensemble feature selection techniques,” Lecture Notes in Computer Science, vol. 5212, pp. 313–325, 2008.
  19. L.-R. Schirra, L. Lausser, and H. A. Kestler, Analysis of Large and Complex Data, Springer, 2016.
  20. P. Jaccard, “Étude comparative de la distribution florale dans une portion des Alpes et du Jura,” Bulletin de la Société Vaudoise des Sciences Naturelles, vol. 37, pp. 547–579, 1901.
  21. L. R. Dice, “Measures of the amount of ecologic association between species,” Ecology, vol. 26, no. 3, pp. 297–302, 1945.
  22. A. Ochiai, “Zoogeographic studies on the soleoid fishes found in Japan and its neighbouring regions,” Bulletin of the Japanese Society of Scientific Fisheries, vol. 22, no. 9, pp. 526–530, 1957.
  23. M. Zucknick, S. Richardson, and E. Stronach, “Comparing the characteristics of gene expression profiles derived by univariate and multivariate classification methods,” Statistical Applications in Genetics and Molecular Biology, vol. 7, no. 1, pp. 1–34, 2008.
  24. J. L. Lustgarten, V. Gopalakrishnan, and S. Visweswaran, “Measuring stability of feature selection in biomedical datasets,” AMIA Annual Symposium Proceedings, vol. 2009, pp. 406–410, 2009.
  25. J. Novovicová, P. Somol, and P. Pudil, “A new measure of feature selection algorithms' stability,” in Proceedings of the 2009 IEEE International Conference on Data Mining Workshops (ICDMW 2009), pp. 382–387, December 2009.
  26. P. Somol and J. Novovicová, “Evaluating stability and comparing output of feature selectors that optimize feature subset cardinality,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 11, pp. 1921–1939, 2010.
  27. L. I. Kuncheva, “A stability index for feature selection,” in Proceedings of the 25th IASTED International Conference on Artificial Intelligence and Applications (AIA '07), pp. 390–395, February 2007.
  28. C. Sammut and G. I. Webb, Encyclopedia of Machine Learning, Springer, New York, NY, USA, 2011.
  29. H. Peng, F. Long, and C. Ding, “Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 8, pp. 1226–1238, 2005.
  30. B. Hofner, A. Mayr, N. Robinzonov, and M. Schmid, “Model-based boosting in R: a hands-on tutorial using the R Package mboost,” Computational Statistics, vol. 29, no. 1-2, pp. 3–35, 2014.
  31. P. Bühlmann and B. Yu, “Boosting with the L2 loss: regression and classification,” Journal of the American Statistical Association, vol. 98, no. 462, pp. 324–339, 2003.
  32. G.-X. Yuan, C.-H. Ho, and C.-J. Lin, “An improved GLMNET for L1-regularized logistic regression,” Journal of Machine Learning Research (JMLR), vol. 13, no. 1, pp. 1999–2030, 2012.
  33. A. J. Izenman, Modern Multivariate Statistical Techniques: Regression, Classification, and Manifold Learning, Springer, New York, NY, USA, 2nd edition, 2013.
  34. K. Miettinen, Nonlinear Multiobjective Optimization, Kluwer Academic Publishers, Norwell, Mass, USA, 4th edition, 2004.
  35. G. Stiglic and P. Kokol, “Stability of ranked gene lists in large microarray analysis studies,” Journal of Biomedicine and Biotechnology, vol. 2010, Article ID 616358, 2010.
  36. J. Vanschoren, J. N. van Rijn, B. Bischl, and L. Torgo, “OpenML: networked science in machine learning,” ACM SIGKDD Explorations Newsletter, vol. 15, no. 2, pp. 49–60, 2013.
  37. The Cancer Genome Atlas Research Network, “Comprehensive molecular characterization of gastric adenocarcinoma,” Nature, vol. 513, pp. 202–209, 2014.
  38. R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, 2016.
  39. B. Bischl, M. Lang, L. Kotthoff et al., “mlr: machine learning in R,” Journal of Machine Learning Research (JMLR), vol. 17, no. 170, pp. 1–5, 2016.
  40. B. Bischl, M. Lang, O. Mersmann, J. Rahnenführer, and C. Weihs, “BatchJobs and BatchExperiments: abstraction mechanisms for using R in batch environments,” Journal of Statistical Software, vol. 64, no. 11, pp. 1–25, 2015.
  41. M. Lang, “fmrmr: Fast mRMR,” R package version 0.1, 2015.
  42. A. Karatzoglou, K. Hornik, A. Smola, and A. Zeileis, “kernlab—an S4 package for kernel methods in R,” Journal of Statistical Software, vol. 11, no. 9, pp. 1–20, 2004.
  43. T. Helleputte and P. Gramme, “LiblineaR: Linear Predictive Models Based on the LIBLINEAR C/C++ Library,” R package version 1.94-2, 2015.
  44. T. Hothorn, P. Bühlmann, T. Kneib, M. Schmid, and B. Hofner, “mboost: Model-Based Boosting,” R package version 2.6-0, 2015.
  45. M. N. Wright and A. Ziegler, “ranger: a fast implementation of random forests for high dimensional data in C++ and R,” Journal of Statistical Software, vol. 77, no. 1, 2017.
  46. T. Sing, O. Sander, N. Beerenwinkel, and T. Lengauer, “ROCR: visualizing classifier performance in R,” Bioinformatics, vol. 21, no. 20, pp. 3940–3941, 2005.