Disease Markers
Volume 35 (2013), Issue 5, Pages 513–523
Research Article

A Semiautomated Framework for Integrating Expert Knowledge into Disease Marker Identification

¹Computational Biology and Bioinformatics, Pacific Northwest National Laboratory, Richland, WA 99352, USA
²Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA
³Knowledge Discovery and Informatics, Pacific Northwest National Laboratory, Richland, WA 99352, USA
⁴Department of Internal Medicine, University of Utah School of Medicine, Salt Lake City, UT 84132, USA
⁵Department of Biochemistry and Molecular Biology, University of Texas Medical School, Houston, TX 77030, USA

Received 19 March 2013; Accepted 13 August 2013

Academic Editor: Sheng Pan

Copyright © 2013 Jing Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Background. The availability of large, complex data sets generated by high-throughput technologies has enabled the recent proliferation of disease biomarker studies. However, a recurring problem in deriving biological information from large data sets is how best to incorporate expert knowledge into the biomarker selection process. Objective. To develop a generalizable framework that can incorporate expert knowledge into data-driven processes in a semiautomated way while providing a metric for optimization in a biomarker selection scheme. Methods. The framework, identification of signatures from integrated clustering (ISIC), was implemented as a pipeline consisting of five components. Expert knowledge was integrated into the biomarker identification process by combining two distinct approaches: a distance-based clustering approach and an expert knowledge-driven functional selection. Results. The utility of ISIC was demonstrated on proteomics data from a study of chronic obstructive pulmonary disease (COPD). Biomarker candidates were identified in a mouse model using ISIC and validated in a study of a human cohort. Conclusions. Expert knowledge can be introduced into a biomarker discovery process in different ways to enhance the robustness of selected marker candidates. Developing strategies for extracting orthogonal and robust features from large data sets increases the chances of success in biomarker identification.