Table of Contents Author Guidelines Submit a Manuscript
BioMed Research International
Volume 2014, Article ID 459203, 15 pages
Research Article

Breast Cancer Prognosis Risk Estimation Using Integrated Gene Expression and Clinical Data

School of Information Technology, Deakin University, 221 Burwood Highway, Melbourne, VIC 3125, Australia

Received 2 November 2013; Revised 11 January 2014; Accepted 2 March 2014; Published 14 May 2014

Academic Editor: Brian Oliver

Copyright © 2014 Ashish Saini et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Background. Novel prognostic markers are needed so newly diagnosed breast cancer patients do not undergo any unnecessary therapy. Various microarray gene expression datasets based studies have generated gene signatures to predict the prognosis outcomes, while ignoring the large amount of information contained in established clinical markers. Nevertheless, small sample sizes in individual microarray datasets remain a bottleneck in generating robust gene signatures that show limited predictive power. The aim of this study is to achieve high classification accuracy for the good prognosis group and then achieve high classification accuracy for the poor prognosis group. Methods. We propose a novel algorithm called the IPRE (integrated prognosis risk estimation) algorithm. We used integrated microarray datasets from multiple studies to increase the sample sizes (∼2,700 samples). The IPRE algorithm consists of a virtual chromosome for the extraction of the prognostic gene signature that has 79 genes, and a multivariate logistic regression model that incorporates clinical data along with expression data to generate the risk score formula that accurately categorizes breast cancer patients into two prognosis groups. Results. The evaluation on two testing datasets showed that the IPRE algorithm achieved high classification accuracies of 82% and 87%, which was far greater than any existing algorithms.