Journal of Biomedicine and Biotechnology
Volume 2009 (2009), Article ID 632786, 7 pages
doi:10.1155/2009/632786
Research Article
Developing Prognostic Systems of Cancer Patients by Ensemble Clustering
1Division of Epidemiology and Biostatistics, Uniformed Services University of the Health Sciences, Bethesda, MD 20814, USA
2Department of Computer Science, The George Washington University, Washington DC 20052, USA
3The George Washington University Cancer Institute, The George Washington University, Washington DC 20037, USA
4Department of Mathematics, Drexel University, Philadelphia, PA 19104, USA
5Department of Pathology, The George Washington University Medical Center, Washington DC 20037, USA
Received 7 January 2009; Accepted 27 March 2009
Academic Editor: Zhenqiu Liu
Copyright © 2009 Dechang Chen et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
Accurate prediction of survival rates of cancer patients is often key to stratifying patients for prognosis and treatment. Survival prediction is commonly based on the TNM system, which involves only three factors: tumor extent, lymph node involvement, and metastasis. Prediction from the TNM system is limited because other potential prognostic factors are not used. With large cancer datasets now available, it is possible to build more powerful prediction systems using machine learning procedures and statistical methods. In this paper, we present an ensemble clustering-based approach to developing prognostic systems for cancer patients. Our method groups combinations that are formed from the levels of the factors recorded in the data. The dissimilarity measure between combinations is obtained from a sequence of data partitions produced by repeated application of the PAM (partitioning around medoids) algorithm. This dissimilarity measure is then used with a hierarchical clustering method to find clusters of combinations. Survival is predicted using the survival function derived from each cluster. Our approach admits multiple factors and provides a practical tool for outcome prediction in cancer patients. We demonstrate the proposed method on lung cancer patients.
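The pipeline outlined in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the ensemble dissimilarity is computed as one minus the fraction of partitions in which two combinations share a cluster (a common co-association choice), a simplified alternating k-medoids routine stands in for full PAM, average linkage is used for the hierarchical step, and the input values in `toy` are made up. All function names are hypothetical.

```python
import random

def k_medoids(dist, k, rng, max_iter=100):
    """Simplified alternating k-medoids; a stand-in for PAM, not PAM itself."""
    n = len(dist)
    medoids = rng.sample(range(n), k)
    for _ in range(max_iter):
        # Assign every item to its nearest medoid.
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                # Keep the old medoid if a cluster happens to be empty.
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total dissimilarity to the rest.
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels

def ensemble_dissimilarity(dist, ks, runs_per_k, seed=0):
    """Assumed co-association measure: 1 - (share of runs placing i and j together)."""
    rng = random.Random(seed)
    n = len(dist)
    together = [[0] * n for _ in range(n)]
    total = 0
    for k in ks:
        for _ in range(runs_per_k):
            labels = k_medoids(dist, k, rng)
            total += 1
            for i in range(n):
                for j in range(n):
                    if labels[i] == labels[j]:
                        together[i][j] += 1
    return [[1.0 - together[i][j] / total for j in range(n)] for i in range(n)]

def average_linkage(diss, n_clusters):
    """Agglomerative clustering with average linkage, stopped at n_clusters."""
    clusters = [[i] for i in range(len(diss))]

    def avg(a, b):
        return sum(diss[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        pairs = [(avg(clusters[a], clusters[b]), a, b)
                 for a in range(len(clusters))
                 for b in range(a + 1, len(clusters))]
        _, a, b = min(pairs)          # merge the closest pair of clusters
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return sorted(sorted(c) for c in clusters)

# Made-up summary value per combination (illustrative only, not data from the paper).
toy = [10, 11, 12, 40, 41, 42]
initial = [[abs(a - b) for b in toy] for a in toy]   # assumed initial dissimilarity
ensemble = ensemble_dissimilarity(initial, ks=[2, 3], runs_per_k=20)
groups = average_linkage(ensemble, n_clusters=2)
print(groups)
```

On this toy input the two well-separated sets of combinations end up in different clusters. In a real application, each resulting cluster would then be summarized by its own estimated survival function, and the initial dissimilarity would come from survival data rather than from a single made-up number per combination.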