Computational and Mathematical Methods in Medicine
Volume 2017 (2017), Article ID 1421409, 8 pages
https://doi.org/10.1155/2017/1421409
Research Article

Probing for Sparse and Fast Variable Selection with Model-Based Boosting

1Department of Statistics, LMU München, München, Germany
2Department of Medical Informatics, Biometry and Epidemiology, FAU Erlangen-Nürnberg, Erlangen, Germany
3Department of Medical Biometry, Informatics and Epidemiology, University Hospital Bonn, Bonn, Germany

Correspondence should be addressed to Tobias Hepp; tobias.hepp@uk-erlangen.de

Received 9 February 2017; Accepted 13 April 2017; Published 31 July 2017

Academic Editor: Yuhai Zhao

Copyright © 2017 Janek Thomas et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
