Computational and Mathematical Methods in Medicine
Volume 2015, Article ID 370640, 14 pages
http://dx.doi.org/10.1155/2015/370640
Research Article

A Novel Hybrid Dimension Reduction Technique for Undersized High Dimensional Gene Expression Data Sets Using Information Complexity Criterion for Cancer Classification

1Department of Statistics, Faculty of Science, Firat University, 23119 Elazig, Turkey
2Department of Business Analytics and Statistics, The University of Tennessee, Knoxville, TN 37996, USA

Received 24 December 2014; Accepted 18 February 2015

Academic Editor: David A. Winkler

Copyright © 2015 Esra Pamukçu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. S. Raychaudhuri, J. M. Stuart, and R. B. Altman, “Principal components analysis to summarize microarray experiments: application to sporulation time series,” in Proceedings of the Pacific Symposium on Biocomputing, pp. 455–466, 2000.
  2. K. Y. Yeung and W. L. Ruzzo, “Principal component analysis for clustering gene expression data,” Bioinformatics, vol. 17, no. 9, pp. 763–774, 2001.
  3. D. S. Huang and C. H. Zheng, “Independent component analysis-based penalized discriminant method for tumor classification using gene expression data,” Bioinformatics, vol. 22, no. 15, pp. 1855–1862, 2006.
  4. X. Chen, L. Wang, J. D. Smith, and B. Zhang, “Supervised principal component analysis for gene set enrichment of microarray data with continuous or survival outcomes,” Bioinformatics, vol. 24, no. 21, pp. 2474–2481, 2008.
  5. L.-W. Yang, E. Eyal, I. Bahar, and A. Kitao, “Principal component analysis of native ensembles of biomolecular structures (PCA_NEST): insights into functional dynamics,” Bioinformatics, vol. 25, no. 5, pp. 606–614, 2009.
  6. S. Ma and M. R. Kosorok, “Identification of differential gene pathways with principal component analysis,” Bioinformatics, vol. 25, no. 7, pp. 882–889, 2009.
  7. G. Nyamundanda, L. Brennan, and I. C. Gormley, “Probabilistic principal component analysis for metabolomic data,” BMC Bioinformatics, vol. 11, article 571, 2010.
  8. Z. Liu and H. Bozdogan, “Kernel PCA for feature extraction with information complexity,” in Statistical Data Mining & Knowledge Discovery, pp. 309–322, Chapman & Hall/CRC, Boca Raton, Fla, USA, 2004.
  9. C. Liberati, J. A. Howe, and H. Bozdogan, “Data adaptive simultaneous parameter and kernel selection in kernel discriminant analysis using information complexity,” Journal of Pattern Recognition Research, vol. 4, no. 1, pp. 119–132, 2009.
  10. I. T. Jolliffe, Principal Component Analysis, Springer Series in Statistics, Springer, New York, NY, USA, 2nd edition, 2002.
  11. M. Scholz, Approaches to analyse and interpret biological profile data [Ph.D. dissertation], Potsdam University, Potsdam, Germany, 2006.
  12. D. G. Fiebig, “On the maximum-entropy approach to undersized samples,” Applied Mathematics and Computation, vol. 14, no. 3, pp. 301–312, 1984.
  13. M. E. Tipping and C. M. Bishop, “Probabilistic principal component analysis,” Tech. Rep. NCRG/97/10, 1997.
  14. M. E. Tipping and C. M. Bishop, “Probabilistic principal component analysis,” Journal of the Royal Statistical Society Series B: Statistical Methodology, vol. 61, no. 3, pp. 611–622, 1999.
  15. H. Akaike, “Information theory and an extension of the maximum likelihood principle,” in Information Theory: Proceedings of the 2nd International Symposium, B. N. Petrov and F. Csaki, Eds., pp. 267–281, Academiai Kiado, Budapest, Hungary, 1973.
  16. H. Bozdogan, “Model selection and Akaike's information criterion (AIC): the general theory and its analytical extensions,” Psychometrika, vol. 52, no. 3, pp. 345–370, 1987.
  17. H. Bozdogan, “On the information-based measure of covariance complexity and its application to the evaluation of multivariate linear models,” Communications in Statistics. Theory and Methods, vol. 19, no. 1, pp. 221–278, 1990.
  18. H. Bozdogan, Information Complexity and Multivariate Learning in High Dimensions with Applications, Forthcoming Book, 2015.
  19. R. Bellman, Adaptive Control Processes: A Guided Tour, Princeton University Press, Princeton, NJ, USA, 1961.
  20. A. Shurygin, “The linear combination of the simplest discriminator and Fisher's one,” in Applied Statistics, Nauka, Ed., Nauka, Moscow, Russia, 1983.
  21. W. James and C. Stein, “Estimation with quadratic loss,” in Proceedings of the 4th Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 361–379, Berkeley, Calif, USA, 1961.
  22. C. Stein, “Estimation of a covariance matrix, Rietz Lecture,” in Proceedings of the 39th Annual Meeting IMS, Atlanta, Ga, USA, 1975.
  23. H. Bozdogan and J. A. Howe, “Misspecified multivariate regression models using the genetic algorithm and information complexity as the fitness function,” European Journal of Pure and Applied Mathematics, vol. 5, no. 2, pp. 211–249, 2012.
  24. L. R. Haff, “Empirical Bayes estimation of the multivariate normal covariance matrix,” The Annals of Statistics, vol. 8, no. 3, pp. 586–597, 1980.
  25. H. Theil and K. Laitinen, “Singular moment matrices in applied econometrics,” in Multivariate Analysis, P. R. Krishnaiah, Ed., pp. 629–649, North-Holland, Amsterdam, The Netherlands, 1980.
  26. H. Theil and D. G. Fiebig, Exploiting Continuity: Maximum Entropy Estimation of Continuous Distributions, Ballinger Publishing Company, Cambridge, Mass, USA, 1984.
  27. O. Ledoit and M. Wolf, “Improved estimation of the covariance matrix of stock returns with an application to portfolio selection,” Journal of Empirical Finance, vol. 10, no. 5, pp. 603–621, 2003.
  28. O. Ledoit and M. Wolf, “A well-conditioned estimator for large-dimensional covariance matrices,” Journal of Multivariate Analysis, vol. 88, no. 2, pp. 365–411, 2004.
  29. S. Press, “Estimation of a normal covariance matrix,” Tech. Rep., University of British Columbia, 1975.
  30. M. Chen, “Estimation of covariance matrices under a quadratic loss function,” Research Report S-46, Department of Mathematics, Stony Brook University, Albany, NY, USA, 1976.
  31. H. Bozdogan, Shrinkage Covariance Estimators, Unpublished Lecture Notes, 2010.
  32. C. E. Thomaz, Maximum entropy covariance estimate for statistical pattern recognition [Ph.D. thesis], Department of Computing Imperial College, University of London, London, UK, 2004.
  33. H. Bozdogan, “Intelligent statistical data mining with information complexity and genetic algorithms,” in Statistical Data Mining and Knowledge Discovery, H. Bozdogan, Ed., pp. 15–56, Chapman & Hall/CRC, Boca Raton, Fla, USA, 2004.
  34. M. H. van Emden, An Analysis of Complexity, Mathematical Centre Tracts, Amsterdam, The Netherlands, 1971.
  35. H. Bozdogan, “Choosing the number of component clusters in the mixture-model using a new informational complexity criterion of the inverse-fisher information matrix,” in Information and Classification, O. Opitz, B. Lausen, and R. Klar, Eds., Studies in Classification, Data Analysis and Knowledge Organization, pp. 40–54, Springer, Berlin, Germany, 1993.
  36. H. Bozdogan, “Mixture-model cluster analysis using a new informational complexity and model selection criteria,” in Multivariate Statistical Modeling: Proceedings of the 1st US/Japan Conference on the Frontiers of Statistical Modeling: An Informational Approach, H. Bozdogan, Ed., pp. 69–113, Kluwer Academic Publishers, Dordrecht, The Netherlands, 1994.
  37. H. Bozdogan and D. M. A. Haughton, “Informational complexity criteria for regression models,” Computational Statistics & Data Analysis, vol. 28, no. 1, pp. 51–76, 1998.
  38. H. Bozdogan, “Akaike's information criterion and recent developments in information complexity,” Journal of Mathematical Psychology, vol. 44, no. 1, pp. 62–91, 2000.
  39. H. Bozdogan, “A new class of information complexity (ICOMP) criteria with an application to customer profiling and segmentation,” Istanbul University Journal of the School of Business Administration, vol. 39, no. 2, pp. 370–398, 2010.
  40. H. Bozdogan and J. A. Howe, “Dimension reduction with probabilistic principal component analysis with the genetic algorithm and misspecification-resistant multivariate regression,” Under review in Statistical Methodology.
  41. A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” Journal of the Royal Statistical Society Series B: Methodological, vol. 39, no. 1, pp. 1–38, 1977.
  42. M. Dettling, “BagBoosting for tumor classification with gene expression data,” Bioinformatics, vol. 20, no. 18, pp. 3583–3593, 2004.
  43. T. R. Golub, D. K. Slonim, P. Tamayo et al., “Molecular classification of cancer: class discovery and class prediction by gene expression monitoring,” Science, vol. 286, no. 5439, pp. 531–537, 1999.
  44. U. Alon, N. Barka, D. A. Notterman et al., “Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays,” Proceedings of the National Academy of Sciences of the United States of America, vol. 96, no. 12, pp. 6745–6750, 1999.
  45. D. Singh, P. G. Febbo, K. Ross et al., “Gene expression correlates of clinical prostate cancer behavior,” Cancer Cell, vol. 1, no. 2, pp. 203–209, 2002.
  46. A. A. Alizadeh, M. B. Eisen, R. E. Davis et al., “Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling,” Nature, vol. 403, no. 6769, pp. 503–511, 2000.
  47. J. Khan, J. S. Wei, M. Ringnér et al., “Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks,” Nature Medicine, vol. 7, no. 6, pp. 673–679, 2001.
  48. S. L. Pomeroy, P. Tamayo, M. Gaasenbeek et al., “Prediction of central nervous system embryonal tumour outcome based on gene expression,” Nature, vol. 415, no. 6870, pp. 436–442, 2002.