Disease Markers
Volume 35 (2013), Issue 5, Pages 513–523
http://dx.doi.org/10.1155/2013/613529
Research Article

A Semiautomated Framework for Integrating Expert Knowledge into Disease Marker Identification

1 Computational Biology and Bioinformatics, Pacific Northwest National Laboratory, Richland, WA 99352, USA
2 Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA
3 Knowledge Discovery and Informatics, Pacific Northwest National Laboratory, Richland, WA 99352, USA
4 Department of Internal Medicine, University of Utah School of Medicine, Salt Lake City, UT 84132, USA
5 Department of Biochemistry and Molecular Biology, University of Texas Medical School, Houston, TX 77030, USA

Received 19 March 2013; Accepted 13 August 2013

Academic Editor: Sheng Pan

Copyright © 2013 Jing Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. D. Ghosh and L. M. Poisson, “‘Omics’ data and levels of evidence for biomarker discovery,” Genomics, vol. 93, no. 1, pp. 13–16, 2009.
  2. B. P. Bradley, “Finding biomarkers is getting easier,” Ecotoxicology, vol. 21, no. 3, pp. 631–636.
  3. Z. Feng, R. Prentice, and S. Srivastava, “Research issues and strategies for genomic and proteomic biomarker discovery and validation: a statistical perspective,” Pharmacogenomics, vol. 5, no. 6, pp. 709–719, 2004.
  4. J. E. McDermott, J. Wang, H. D. Mitchell et al., “Challenges in biomarker discovery: combining expert insights with statistical analysis of complex omics data,” Expert Opinion on Medical Diagnostics, vol. 7, no. 1, pp. 37–51, 2013.
  5. T. Wei, B. Liao, L. Ackermann et al., “Data-driven analysis approach for biomarker discovery using molecular-profiling technologies,” Biomarkers, vol. 10, no. 2-3, pp. 153–172, 2005.
  6. L. Chen, C. Wang, I.-M. Shih et al., “Biomarker identification by knowledge-driven multi-level ICA and motif analysis,” in Proceedings of the 6th International Conference on Machine Learning and Applications (ICMLA '07), pp. 560–566, December 2007.
  7. Z. Zhang, Y. Yu, F. Xu et al., “Combining multiple serum tumor markers improves detection of stage I epithelial ovarian cancer,” Gynecologic Oncology, vol. 107, no. 3, pp. 526–531, 2007.
  8. S. M. Hill, R. M. Neve, N. Bayani et al., “Integrating biological knowledge into variable selection: an empirical Bayes approach with an application in cancer biology,” BMC Bioinformatics, vol. 13, article 94, pp. 94–109, 2012.
  9. D. L. Hoyert, K. D. Kochanek, and S. L. Murphy, “Deaths: final data for 1997,” National Vital Statistics Reports, vol. 47, no. 19, pp. 1–104, 1999.
  10. A. D. Lopez and C. C. Murray, “The global burden of disease, 1990–2020,” Nature Medicine, vol. 4, no. 11, pp. 1241–1243, 1998.
  11. S. R. Rosenberg and R. Kalhan, “Biomarkers in chronic obstructive pulmonary disease,” Translational Research, vol. 159, no. 4, pp. 228–237, 2012.
  12. Y. Zhou, D. J. Schneider, and M. R. Blackburn, “Adenosine signaling and the regulation of chronic lung disease,” Pharmacology and Therapeutics, vol. 123, no. 1, pp. 105–116, 2009.
  13. M. R. Blackburn, S. K. Datta, and R. E. Kellems, “Adenosine deaminase-deficient mice generated using a two-stage genetic engineering strategy exhibit a combined immunodeficiency,” Journal of Biological Chemistry, vol. 273, no. 9, pp. 5093–5100, 1998.
  14. H. Zhong, J. L. Chunn, J. B. Volmer, J. R. Fozard, and M. R. Blackburn, “Adenosine-mediated mast cell degranulation in adenosine deaminase-deficient mice,” Journal of Pharmacology and Experimental Therapeutics, vol. 298, no. 2, pp. 433–440, 2001.
  15. M. R. Blackburn, J. B. Volmer, J. L. Thrasher et al., “Metabolic consequences of adenosine deaminase deficiency in mice are associated with defects in alveogenesis, pulmonary inflammation, and airway obstruction,” Journal of Experimental Medicine, vol. 192, no. 2, pp. 159–170, 2000.
  16. M. R. Blackburn, C. G. Lee, H. W. Young et al., “Adenosine mediates IL-13-induced inflammation and remodeling in the lung and interacts in an IL-13-adenosine amplification pathway,” Journal of Clinical Investigation, vol. 112, no. 3, pp. 332–344, 2003.
  17. A. V. Sauer, I. Brigida, N. Carriglio et al., “Autoimmune dysregulation and purine metabolism in adenosine deaminase deficiency,” Frontiers in Immunology, vol. 3, pp. 1–19, 2012.
  18. H. Jin, B.-J. Webb-Robertson, E. S. Peterson et al., “Smoking, COPD, and 3-nitrotyrosine levels of plasma proteins,” Environmental Health Perspectives, vol. 119, no. 9, pp. 1314–1320, 2011.
  19. M. M. Matzke, J. N. Brown, M. A. Gritsenko et al., “A comparative analysis of computational approaches to relative protein quantification using peptide peak intensities in label-free LC-MS proteomics experiments,” Proteomics, vol. 13, no. 3-4, pp. 493–503, 2013.
  20. B.-J. M. Webb-Robertson, L. A. McCue, K. M. Waters et al., “Combined statistical analyses of peptide intensities and peptide occurrences improves identification of significant peptides from MS-based proteomics data,” Journal of Proteome Research, vol. 9, no. 11, pp. 5748–5756, 2010.
  21. B.-J. M. Webb-Robertson, M. M. Matzke, J. M. Jacobs, J. G. Pounds, and K. M. Waters, “A statistical selection strategy for normalization procedures in LC-MS proteomics experiments through dataset-dependent ranking of normalization scaling factors,” Proteomics, vol. 11, no. 24, pp. 4736–4741, 2011.
  22. M. M. Matzke, K. M. Waters, T. O. Metz et al., “Improved quality control processing of peptide-centric LC-MS proteomics data,” Bioinformatics, vol. 27, no. 20, Article ID btr479, pp. 2866–2872, 2011.
  23. P. Wang, H. Tang, H. Zhang, J. Whiteaker, A. G. Paulovich, and M. Mcintosh, “Normalization regarding non-random missing values in high-throughput mass spectrometry data,” Pacific Symposium on Biocomputing, pp. 315–326, 2006.
  24. A. D. Polpitiya, W.-J. Qian, N. Jaitly et al., “DAnTE: A statistical tool for quantitative analysis of -omics data,” Bioinformatics, vol. 24, no. 13, pp. 1556–1558, 2008.
  25. T. Schneider, “Analysis of incomplete climate data: estimation of mean values and covariance matrices and imputation of missing values,” Journal of Climate, vol. 14, no. 5, pp. 853–871, 2001.
  26. M. Ashburner, C. A. Ball, J. A. Blake et al., “Gene ontology: tool for the unification of biology. The Gene Ontology Consortium,” Nature Genetics, vol. 25, no. 1, pp. 25–29, 2000.
  27. C. Posse, A. Sanfilippo, B. Gopalan et al., “Cross-ontological analytics: combining associative and hierarchical relations in the gene ontologies to assess gene product similarity,” in Computational Science, Lecture Notes in Computer Science, pp. 871–878, 2006.
  28. G. Yu, F. Li, Y. Qin, X. Bo, Y. Wu, and S. Wang, “GOSemSim: an R package for measuring semantic similarity among GO terms and gene products,” Bioinformatics, vol. 26, no. 7, Article ID btq064, pp. 976–978, 2010.
  29. D. H. von Seggern, CRC Standard Curves and Surfaces With Mathematica, Applied Mathematics & Nonlinear Science, Chapman and Hall/CRC, London, UK, 2nd edition, 2006.
  30. D. Hanisch, A. Zien, R. Zimmer, and T. Lengauer, “Co-clustering of biological networks and gene expression data,” Bioinformatics, vol. 18, supplement 1, pp. S145–S154, 2002.
  31. J. H. Ward, “Hierarchical grouping to optimize an objective function,” Journal of the American Statistical Association, vol. 58, no. 301, pp. 236–244, 1963.
  32. J. E. McDermott, H. Shankaran, A. J. Eisfeld et al., “Conserved host response to highly pathogenic avian influenza virus infection in human cell culture, mouse and macaque model systems,” BMC Systems Biology, vol. 5, article 190, pp. 190–212, 2011.
  33. B.-J. M. Webb-Robertson, L. A. McCue, N. Beagley et al., “A Bayesian integration model of high-throughput proteomics and metabolomics data for improved early detection of microbial infections,” Pacific Symposium on Biocomputing, pp. 451–463, 2009.
  34. M. Ahdesmäki and K. Strimmer, “Feature selection in omics prediction problems using cat scores and false nondiscovery rate control,” Annals of Applied Statistics, vol. 4, no. 1, pp. 503–519, 2010.
  35. A. F. Atiya, “Estimating the posterior probabilities using the K-nearest neighbor rule,” Neural Computation, vol. 17, no. 3, pp. 731–740, 2005.
  36. P. McCullagh and J. A. Nelder, Generalized Linear Models, Monographs on Statistics and Applied Probability, Chapman and Hall/CRC, London, UK, 1989.
  37. T. Mitchell, B. Buchanan, G. Dejong et al., “Machine learning,” Annual Review of Computer Science, vol. 4, pp. 417–433, 1989.
  38. N. Beagley, K. G. Stratton, and B.-J. M. Webb-Robertson, “VIBE 2.0: visual integration for Bayesian evaluation,” Bioinformatics, vol. 26, no. 2, pp. 280–282, 2010.
  39. S. Oh, D. D. Kang, G. N. Brock, and G. C. Tseng, “Biological impact of missing-value imputation on downstream analyses of gene expression profiles,” Bioinformatics, vol. 27, no. 1, Article ID btq613, pp. 78–86, 2011.
  40. E. Younesi, L. Toldo, B. Muller et al., “Mining biomarker information in biomedical literature,” BMC Medical Informatics and Decision Making, vol. 12, article 148, pp. 148–160, 2012.
  41. R. Nugent and M. Meila, “An overview of clustering applied to molecular biology,” Methods in Molecular Biology, vol. 620, pp. 369–404, 2010.
  42. W. S. Noble, “What is a support vector machine?” Nature Biotechnology, vol. 24, no. 12, pp. 1565–1567, 2006.
  43. M. G. Schrauder, R. Strick, R. Schulz-Wendtland et al., “Circulating micro-RNAs as potential blood-based markers for early stage breast cancer detection,” PLoS ONE, vol. 7, no. 1, Article ID e29770, 2012.
  44. C. Kingsford and S. L. Salzberg, “What are decision trees?” Nature Biotechnology, vol. 26, no. 9, pp. 1011–1013, 2008.
  45. R. Díaz-Uriarte and S. Alvarez de Andrés, “Gene selection and classification of microarray data using random forest,” BMC Bioinformatics, vol. 7, article 3, 2006.
  46. M. H. Zweig and G. Campbell, “Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine,” Clinical Chemistry, vol. 39, no. 4, pp. 561–577, 1993.
  47. D. M. V. Powers, “Evaluation: from precision, recall and F-measure to ROC, informedness, markedness & correlation,” Journal of Machine Learning Technologies, vol. 2, no. 1, pp. 37–63, 2011.