BioMed Research International
Volume 2015 (2015), Article ID 320385, 10 pages
http://dx.doi.org/10.1155/2015/320385
Research Article

A Robust Supervised Variable Selection for Noisy High-Dimensional Data

1 Institute of Computer Science of the Czech Academy of Sciences, Pod Vodárenskou Vĕží 2, 182 07 Prague 8, Czech Republic
2 Department of Biomedical Informatics, Faculty of Biomedical Engineering, Czech Technical University in Prague, Náměstí Sítná 3105, 272 01 Kladno, Czech Republic

Received 14 November 2014; Accepted 7 April 2015

Academic Editor: Rosalyn H. Hargraves

Copyright © 2015 Jan Kalina and Anna Schlenker. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
