Special Issue: Machine Learning and Network Methods for Biology and Medicine
Research Article | Open Access
A Five-Gene Signature Predicts Prognosis in Patients with Kidney Renal Clear Cell Carcinoma
Kidney renal clear cell carcinoma (KIRC) is one of the most common cancers worldwide and carries high mortality. Many studies have proposed that gene expression could be used to predict prognosis in KIRC. In this study, RNA expression data from next-generation sequencing and clinical information for 523 patients, downloaded from The Cancer Genome Atlas (TCGA), were analyzed to identify the relationship between gene expression level and the prognosis of KIRC patients. A set of five genes significantly associated with overall survival time was identified, and a model containing these five genes was constructed by Cox regression analysis. Kaplan-Meier and Receiver Operating Characteristic (ROC) analyses confirmed that the model had good sensitivity and specificity. In summary, the five-gene model is associated with the prognosis of KIRC patients and may have clinical significance.
1. Introduction
In recent years, the incidence and mortality of kidney cancer have been rising throughout the world. In 2013, nearly 58,000 new cases occurred and about 13,000 patients died of kidney cancer in the United States. Kidney renal clear cell carcinoma (KIRC) is the most common histological subtype and accounts for 70%–80% of renal cancer cases. KIRC tissue is resistant to traditional chemotherapeutic drugs, and patient outcomes vary widely. Although KIRC has been studied extensively, the clinical prognosis of KIRC patients remains very poor; the survival time of 90% of patients with metastatic KIRC is less than 5 years. Therefore, there is an urgent need to find molecular prognostic biomarkers for KIRC, which is one of the most important steps toward predicting patient prognosis.
Messenger RNA is one of the most common molecular markers. Many studies have suggested that genes are involved in the biological processes of many cancers and are related to patient survival time. For instance, SIPL1 (Shank-Interacting Protein-Like 1) has been reported to be overexpressed during breast cancer tumorigenesis, and inhibiting its expression may help to inhibit breast cancer. PLA2G16 has been shown to be an important prognostic factor in primary osteosarcoma patients. Dicer1 has been found to be expressed at low levels in nasopharyngeal carcinoma tissues at both the gene and the protein level, and it could also be a novel prognostic biomarker. As for KIRC, several studies have sought gene expression signatures that may provide diagnostic and prognostic information [10–12]. Ge et al. identified a signature of 22 miRNAs as a novel independent predictor of patient outcomes. Yu et al. found that the expression of CIDE (cell death-inducing DFF45-like effector) is a novel predictor of prognosis. However, detailed analyses of the associations between gene expression level and patient survival time in KIRC remain limited.
The goal of this paper is to identify genes related to the overall survival time of KIRC patients by analyzing high-throughput RNA sequencing data downloaded from TCGA. In brief, the main goals are to (1) identify genes that can predict the survival time of KIRC patients and construct a model; (2) evaluate the prognostic value, sensitivity, and specificity of the model; and (3) investigate the independence and universality of the gene marker across KIRC stages.
2. Materials and Methods
2.1. KIRC Gene Expression Data from TCGA
Up to January 2015, the TCGA database (https://tcga-data.nci.nih.gov/tcga/) contained 533 KIRC patient samples. Gene expression profiling was performed on Illumina HiSeq platforms (Illumina Inc., San Diego, CA, USA). After excluding patients without survival status information, UNC RNASeqV2 level 3 expression data covering 20,531 human genes, together with the corresponding clinical data, were downloaded for 523 patients. The 523 KIRC samples were then randomly divided into a training set (n = 262) and a testing set (n = 261). Specimen IDs in the two sets are listed in Supplemental Table S1 (in Supplementary Material available online at http://dx.doi.org/10.1155/2015/842784). The training set was used to identify the gene expression signature, and the testing set was used for validation.
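The random division into training and testing sets can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `split_samples`, the fixed seed, and the placeholder sample IDs are assumptions; only the total of 523 samples and the training-set size of 262 (reported in Section 3.2) come from the text.

```python
import random


def split_samples(sample_ids, n_train, seed=0):
    """Randomly partition sample IDs into a training list and a testing list."""
    rng = random.Random(seed)  # fixed seed (an assumption) makes the split reproducible
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder IDs standing in for the 523 TCGA specimens (not real barcodes).
ids = [f"sample_{i:03d}" for i in range(523)]
train, test = split_samples(ids, n_train=262)
print(len(train), len(test))  # 262 261
```

Fixing the seed is a design choice that lets the same split be regenerated later; the actual specimen assignment used in the study is the one listed in Supplemental Table S1.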
2.2. Statistical Analysis
First, the RNA-seq expression values were transformed for normalization. Subsequently, following previous reports [17, 18], genes significantly (P < 0.001) related to patient survival were identified by Cox regression analysis and the random survival forests-variable hunting (RSFVH) algorithm. Because a model with fewer genes is generally of greater practical value, we performed Cox proportional-hazards regression analysis on models containing one to five genes, seeking a better model for predicting survival. Then, based on the Cox regression analysis, a risk score formula was built to calculate the risk score for each patient. As described by Margolin et al. and Meng et al., the survival differences between the low-risk and high-risk groups were evaluated, and the sensitivity and specificity of the model in survival prediction were also assessed.
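For reference, the standard form of the Cox proportional-hazards model, and of a risk score derived from it, can be written as below. The symbols ($h_0$, $\beta_i$, $x_i$, $k$) are notation chosen here for illustration and do not appear in the original text.

```latex
h(t \mid x_1, \dots, x_k) = h_0(t)\,\exp\!\left(\sum_{i=1}^{k} \beta_i x_i\right),
\qquad
\text{risk score} = \sum_{i=1}^{k} \beta_i x_i
```

Here $h_0(t)$ is the baseline hazard, $x_i$ is the expression value of gene $i$, and $\beta_i$ is its fitted coefficient. A positive coefficient means that higher expression raises the risk score, and a negative coefficient means that it lowers it.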
3. Results
3.1. Patient Characteristics
All 523 patients in this study were clinically and pathologically diagnosed with KIRC. Tumors were classified into clinical stages I to IV according to the tumor-node-metastasis (TNM) staging system. There were 260 patients in stage 1, 57 in stage 2, 125 in stage 3, and 81 in stage 4. The average age and average survival time of these 523 patients were 61 years and 902 days, respectively. All the statistical information is summarized in Table 1.
3.2. Detection of Genes Associated with Overall Survival Time of KIRC Patients in Training Set
To identify genes potentially associated with the overall survival time of KIRC patients, univariable Cox regression analysis (see Materials and Methods) of the gene expression data was conducted in the training set. At a significance level of 0.001, a total of 3,849 genes were identified (Table S2). Subsequently, the 100 genes with the largest importance values in random survival forests analysis with default parameters [22, 23] were selected. Then, 1–5 genes were chosen from these 100 genes as covariates by enumeration, and 79,375,495 models were fitted by multivariate Cox regression analysis. After comparing the models, the best one (as measured by AUROC) was found to contain five genes (CKAP4, ISPD, MAN2A2, OTOF, and SLC40A1), and the risk score formula for this model was (0.422 × expression value of CKAP4) + (−0.443 × expression value of ISPD) + (0.551 × expression value of MAN2A2) + (0.330 × expression value of OTOF) + (−0.369 × expression value of SLC40A1). Information on these five genes is shown in Table 2, and their functions are summarized in Table 3. In addition, the error rate (27.27%) and variable importance values of these five genes were obtained with RSFVH (Figure 1). Figure 1 shows that the five genes have relatively large importance values and that CKAP4 is more important than the other predictors.
Taking the median risk score as the cut-off, the 262 KIRC patients in the training set were separated into a low-risk group (n = 131) and a high-risk group (n = 131). Survival analysis was performed using the Kaplan-Meier method with a log-rank test. The Kaplan-Meier curves indicated that patients in the high-risk group had a significantly worse prognosis than those in the low-risk group (Figure 2(a)).
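The model count and the risk score formula above can be checked with a short sketch. The coefficients and the figure of 79,375,495 models come from the text; the function names, the rule of assigning scores strictly above the median to the high-risk group, and the four example patients with their expression values are hypothetical and only illustrate the arithmetic.

```python
from math import comb
from statistics import median

# Number of candidate models: every subset of 1 to 5 genes out of 100.
n_models = sum(comb(100, k) for k in range(1, 6))
print(n_models)  # 79375495

# Coefficients of the five-gene model as reported in the text.
COEFFICIENTS = {
    "CKAP4": 0.422,
    "ISPD": -0.443,
    "MAN2A2": 0.551,
    "OTOF": 0.330,
    "SLC40A1": -0.369,
}


def risk_score(expression):
    """Sum of expression values weighted by the Cox regression coefficients."""
    return sum(beta * expression[gene] for gene, beta in COEFFICIENTS.items())


def split_by_median(scores):
    """Label a patient 'high' if the score exceeds the cohort median, else 'low'."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical expression values for four made-up patients (not study data).
patients = {
    "P1": {"CKAP4": 11.0, "ISPD": 6.0, "MAN2A2": 10.0, "OTOF": 3.0, "SLC40A1": 9.0},
    "P2": {"CKAP4": 12.0, "ISPD": 5.0, "MAN2A2": 11.0, "OTOF": 4.0, "SLC40A1": 8.0},
    "P3": {"CKAP4": 10.0, "ISPD": 7.0, "MAN2A2": 9.0, "OTOF": 2.0, "SLC40A1": 10.0},
    "P4": {"CKAP4": 11.5, "ISPD": 5.5, "MAN2A2": 10.5, "OTOF": 3.5, "SLC40A1": 8.5},
}
scores = {pid: risk_score(expr) for pid, expr in patients.items()}
groups = split_by_median(scores)
print(groups)
```

The sum C(100,1) + C(100,2) + C(100,3) + C(100,4) + C(100,5) reproduces the reported number of models exactly, which confirms that every subset of one to five genes was enumerated.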
3.3. Verification of Survival-Associated Genes in Testing Set
To determine the prognostic potential of the five-gene signature, Kaplan-Meier survival analysis was performed in the testing set. As in the training set, patients in the testing set were divided into low-risk and high-risk groups based on their individual risk scores, and Kaplan-Meier analysis was used to compare survival between the groups. Statistically significant differences between the high-risk and low-risk groups were observed; in other words, a higher risk score was related to shorter survival time (Figure 2(b)). This agrees with the result in the training set and suggests that the five-gene signature may play an important role in predicting the survival of KIRC patients.
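The Kaplan-Meier estimate used in this comparison can be sketched in a few lines. This is a generic textbook implementation of the product-limit estimator, not the authors' code; the function name and the five follow-up records are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one observed death.

    times  : follow-up time for each patient
    events : 1 if death was observed at that time, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored patients both leave the risk set
    return curve


# Hypothetical follow-up data in days (not study data).
times = [100, 200, 200, 300, 400]
events = [1, 0, 1, 1, 0]
curve = kaplan_meier(times, events)
print([(t, round(s, 3)) for t, s in curve])  # [(100, 0.8), (200, 0.6), (300, 0.3)]
```

Computing one such curve for the high-risk group and one for the low-risk group gives the two curves that the log-rank test then compares.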
To further confirm the clinical performance of the five-gene model as a prognostic biomarker, Receiver Operating Characteristic (ROC) analysis was performed to estimate how well the gene signature predicts patient survival. The corresponding AUROC was calculated using three years as the cut-off point. The AUROC was 0.783 (Figure 3), showing that the five-gene model has high sensitivity and specificity and could be used as a biomarker to predict patient survival.
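An AUROC with a three-year cut-off can be illustrated with the simplified sketch below. It is an assumption-laden approximation: patients censored before the cut-off are simply dropped, whereas time-dependent ROC methods for censored data (such as the one cited in the reference list) handle them properly. The function name, the 365-day year, and the example data are hypothetical.

```python
def auroc_at_cutoff(scores, times, events, cutoff_days=3 * 365):
    """AUROC for predicting death before `cutoff_days` from a risk score.

    Simplification: patients censored before the cut-off are excluded because
    their status at the cut-off is unknown.
    """
    positives, negatives = [], []
    for score, t, e in zip(scores, times, events):
        if t <= cutoff_days and e == 1:
            positives.append(score)  # died before the cut-off
        elif t > cutoff_days:
            negatives.append(score)  # known to be alive at the cut-off
    if not positives or not negatives:
        raise ValueError("need at least one patient in each class")
    # Probability that a random positive outranks a random negative (ties count 0.5).
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical risk scores and follow-up data in days (not study data).
scores = [7.3, 4.5, 5.2, 3.0, 6.5, 4.0]
times = [300, 900, 1500, 2000, 400, 1200]
events = [1, 1, 0, 0, 0, 1]
auc = auroc_at_cutoff(scores, times, events)
print(round(auc, 3))  # 0.833
```

An AUROC of 0.5 corresponds to a score with no discriminative ability and 1.0 to perfect separation, which is the scale on which the reported 0.783 should be read.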
3.4. The Independence and Universality of the Five-Gene Model
Studies have shown that age and clinical stage are also related to patient survival [5, 13, 21]. To examine whether the five-gene signature could distinguish high-risk from low-risk patients when age and stage were taken into account, multivariate Cox proportional hazards analyses were performed in both the training and testing sets. The results confirmed that the risk score of the five genes is independent of age and stage (Table 4). Whether the five-gene signature remained informative in different KIRC stages was also investigated using Kaplan-Meier and ROC analysis. In stage 3 and stage 4, the survival time of patients differed markedly between the high-risk and low-risk groups (Figure S1). Moreover, the AUROC values in stage 2, stage 3, and stage 4 were 0.761, 0.718, and 0.715, respectively (Figure S2), further indicating that the five-gene signature has predictive value in different clinical stages.
4. Discussion
KIRC is one of the most common primary renal malignancies, with high morbidity and mortality. However, the understanding of KIRC is incomplete, and there are no clinical tools for predicting patient outcome apart from the traditional clinical parameters. Accurate data from the clinical examination of KIRC specimens could help doctors to decide on appropriate treatment for patients. Therefore, the identification and validation of novel biomarkers form an important part of practical KIRC research. In this study, we identified a five-gene signature that was significantly related to patient survival in KIRC, based on genome-wide RNA profiling of 523 KIRC patients from the TCGA database. In addition, we confirmed that the five-gene signature can be regarded as an independent predictor of survival after accounting for variables including age and stage, and that it applies across different stages.
Many previous studies of genes in KIRC have mainly considered known cancer-associated genes. For instance, Wei et al. found, using qRT-PCR and immunohistochemistry, that high expression of pituitary tumor-transforming gene-1 (PTTG1) in KIRC patients was associated with poor prognosis. Peters et al. showed that low expression levels of GATA1 and GATA2 were related to tumor aggressiveness and short survival time in KIRC. All five genes identified in this study have also been reported to be associated with cancer. CKAP4 could be used to distinguish primary salivary oncocytic lesions from metastatic RCC in dubious cases with 100% accuracy and has been related to lymphatic metastasis [30, 31]. Mutations in OTOF, which triggers membrane fusion and exocytosis, may provide a link between calcium signaling and cancer [22, 32, 33]. SLC40A1 is a cell membrane protein that mediates cellular iron efflux [23, 34] and contributes to the invasive phenotype. Mutations in ISPD may cause Walker-Warburg syndrome [36, 37]. MAN2A2 was downregulated in human glioblastomas. However, these markers have not previously been analyzed in KIRC patients, and no molecular study of these genes in KIRC has been reported. Our research showed that the expression of these genes was related to patient survival time. The ROC curve gave an AUROC of approximately 0.8; because a larger AUROC usually implies a better predictive model [6, 39], this suggests that the five-gene signature is a novel prognostic marker with good accuracy and potential clinical significance. Furthermore, the five-gene signature was an independent predictor that held across different stages.
Across stages, ROC analysis showed high sensitivity and specificity (AUROC > 0.7) except in stage 1. This is possibly because stage 1 tumors are slow-growing, the cancer cells are not yet invasive or metastatic, and fewer patients died of KIRC than in other stages. We found that the average age of patients who died in stage 1 was more than 67, which is higher than in other stages, suggesting that age at diagnosis may influence KIRC prognosis and that part of the deaths can be attributed to the increased risk of mortality with increasing age. These results suggest that the five-gene signature is clinically important. However, the functional mechanisms of these genes remain unclear, and the five-gene signature has not yet been tested in a clinical trial. Experimental studies of these genes and further well-designed studies should be conducted to verify our findings and to provide a better understanding of their roles in predicting KIRC prognosis.
5. Conclusions
In summary, a five-gene signature strongly associated with patient survival was identified by Cox regression analysis and Kaplan-Meier analysis in the training set. Kaplan-Meier and ROC analysis in the testing set further indicated that the five-gene signature could be used as a novel biomarker to predict the outcome of KIRC patients. Additionally, multivariate Cox regression analysis revealed that the five-gene signature was an independent predictor. These results suggest that the five-gene signature could help to predict survival, with significant clinical implications.
Abbreviations
KIRC: Kidney renal clear cell carcinoma
TCGA: The Cancer Genome Atlas
RSFVH: Random survival forests-variable hunting
ROC: Receiver Operating Characteristic
Conflict of Interests
The authors declare that they have no competing interests.
Authors' Contribution
Yueping Zhan conceived and designed the study, carried out data analysis, and interpreted the entire results. Wenna Guo carried out data analysis and drafted the paper. Ying Zhang and Qiang Wang helped to draft the paper. Xin-jian Xu carried out data analysis and helped to draft the paper. Liucun Zhu participated in the design of the study and interpreted the results. All authors read and approved the final paper. Yueping Zhan and Wenna Guo contributed equally to this work.
Acknowledgments
This work was supported by Shanghai Province Science Foundation for Youths (Grant no. 12ZR1444200), National Natural Science Foundation of China (Grant no. 31471200), Foundation for the Author of National Excellent Doctoral Dissertation of PR China (201134), Doctor Gathering Scheme of Jiangsu Province, and the High Performance Computing Platform of Shanghai University.
Supplementary Materials
Table S1: The sample list and information of patients in the training set and testing set.
Table S2: Univariable Cox regression analysis with a significance level of 0.001 reveals significant relations between gene expression and survival time.
Figure S1: Kaplan-Meier curve analysis of the different clinical stages in the testing set. The two-sided log-rank test was used to determine the survival differences.
Figure S2: Receiver operating characteristic (ROC) analysis of the sensitivity and specificity of the five-gene model in predicting survival time of patients with different clinical stages in the testing set.
References
[1] R. Siegel, D. Naishadham, and A. Jemal, “Cancer statistics, 2012,” CA: A Cancer Journal for Clinicians, vol. 62, no. 1, pp. 10–29, 2012.
[2] R. Siegel, D. Naishadham, and A. Jemal, “Cancer statistics, 2013,” CA: A Cancer Journal for Clinicians, vol. 63, no. 1, pp. 11–30, 2013.
[3] L. Zhang, B. Xu, S. Chen et al., “The complex roles of microRNAs in the metastasis of renal cell carcinoma,” Journal of Nanoscience and Nanotechnology, vol. 13, no. 5, pp. 3195–3203, 2013.
[4] E. A. Singer, G. N. Gupta, D. Marchalik, and R. Srinivasan, “Evolving therapeutic targets in renal cell carcinoma,” Current Opinion in Oncology, vol. 25, no. 3, pp. 273–280, 2013.
[5] J. R. Karamchandani, M. Y. Gabril, R. Ibrahim et al., “Profilin-1 expression is associated with high grade and stage and decreased disease-free survival in renal cell carcinoma,” Human Pathology, vol. 46, no. 5, pp. 673–680, 2015.
[6] L. A. Tse, J. Dai, M. Chen et al., “Prediction models and risk assessment for silicosis using a retrospective cohort study among workers exposed to silica in China,” Scientific Reports, vol. 5, Article ID 11059, 2015.
[7] J. De Melo and D. Tang, “Elevation of SIPL1 (SHARPIN) increases breast cancer risk,” PLoS ONE, vol. 10, no. 5, Article ID e0127546, 2015.
[8] S. Liang, Z. Ren, X. Han et al., “PLA2G16 expression in human osteosarcoma is associated with pulmonary metastasis and poor prognosis,” PLoS ONE, vol. 10, no. 5, Article ID e0127236, 2015.
[9] S. Xu and X. Jiang, “Reduced expression of Dicer1 is associated with poor prognosis in patients with nasopharyngeal carcinoma,” Lin Chung Er Bi Yan Hou Tou Jing Wai Ke Za Zhi, vol. 29, no. 2, pp. 126–131, 2015.
[10] A. R. Brannon, S. M. Haake, K. E. Hacker et al., “Meta-analysis of clear cell renal cell carcinoma gene expression defines a variant subgroup and identifies gender influences on tumor biology,” European Urology, vol. 61, no. 2, pp. 258–268, 2012.
[11] A. R. Brannon, A. Reddy, M. Seiler et al., “Molecular stratification of clear cell renal cell carcinoma by consensus clustering reveals distinct subtypes and survival patterns,” Genes and Cancer, vol. 1, no. 2, pp. 152–163, 2010.
[12] H. Zhao, B. Ljungberg, K. Grankvist, T. Rasmuson, R. Tibshirani, and J. D. Brooks, “Gene expression profiling predicts survival in conventional renal cell carcinoma,” PLoS Medicine, vol. 3, no. 1, article e13, 2006.
[13] Y.-Z. Ge, R. Wu, H. Xin et al., “A tumor-specific microRNA signature predicts survival in clear cell renal cell carcinoma,” Journal of Cancer Research and Clinical Oncology, vol. 141, no. 7, pp. 1291–1299, 2015.
[14] M. Yu, H. Wang, J. Zhao et al., “Expression of CIDE proteins in clear cell renal cell carcinoma and their prognostic significance,” Molecular and Cellular Biochemistry, vol. 378, no. 1-2, pp. 145–151, 2013.
[15] The Cancer Genome Atlas Research Network, “Comprehensive molecular characterization of clear cell renal cell carcinoma,” Nature, vol. 499, no. 7456, pp. 43–49, 2013.
[16] Y. Li, J. M. Krahn, G. P. Flake, D. M. Umbach, and L. Li, “Toward predicting metastatic progression of melanoma based on gene expression data,” Pigment Cell & Melanoma Research, vol. 28, no. 4, pp. 453–463, 2015.
[17] A. E. Zou, J. Ku, T. K. Honda et al., “Transcriptome sequencing uncovers novel long noncoding and small nucleolar RNAs dysregulated in head and neck squamous cell carcinoma,” RNA, vol. 21, no. 6, pp. 1122–1134, 2015.
[18] J. Meng, P. Li, Q. Zhang, Z. Yang, and S. Fu, “A four-long non-coding RNA signature in predicting breast cancer survival,” Journal of Experimental & Clinical Cancer Research, vol. 33, article 84, 2014.
[19] H. Ishwaran and U. B. Kogalur, “Consistency of random survival forests,” Statistics & Probability Letters, vol. 80, no. 13-14, pp. 1056–1064, 2010.
[20] A. A. Margolin, E. Bilal, E. Huang et al., “Systematic analysis of challenge-driven improvements in molecular prognostic models for breast cancer,” Science Translational Medicine, vol. 5, no. 181, Article ID 181re1, 2013.
[21] S. J. Nam, C. Lee, J. H. Park, and K. C. Moon, “Decreased PBRM1 expression predicts unfavorable prognosis in patients with clear cell renal cell carcinoma,” Urologic Oncology: Seminars and Original Investigations, vol. 33, no. 8, pp. 340.e9–340.e16, 2015.
[22] M. Yildirim-Baylan, G. Bademci, D. Duman, H. Ozturkmen-Akay, S. Tokgoz-Yilmaz, and M. Tekin, “Evidence for genotype-phenotype correlation for OTOF mutations,” International Journal of Pediatric Otorhinolaryngology, vol. 78, no. 6, pp. 950–953, 2014.
[23] N. Montalbetti, A. Simonin, G. Kovacs, and M. A. Hediger, “Mammalian iron transporters: families SLC11 and SLC40,” Molecular Aspects of Medicine, vol. 34, no. 2-3, pp. 270–287, 2013.
[24] J. R. Srigley, B. Delahunt, J. N. Eble et al., “The International Society of Urological Pathology (ISUP) Vancouver classification of renal neoplasia,” The American Journal of Surgical Pathology, vol. 37, no. 10, pp. 1469–1489, 2013.
[25] H. Moch, J. Srigley, B. Delahunt, R. Montironi, L. Egevad, and P. H. Tan, “Biomarkers in renal cancer,” Virchows Archiv, vol. 464, no. 3, pp. 359–365, 2014.
[26] S. Gulati, P. Martinez, T. Joshi et al., “Systematic evaluation of the prognostic impact and intratumour heterogeneity of clear cell renal cell carcinoma biomarkers,” European Urology, vol. 66, no. 5, pp. 936–948, 2014.
[27] C. Wei, X. Yang, J. Xi et al., “High expression of pituitary tumor-transforming gene-1 predicts poor prognosis in clear cell renal cell carcinoma,” Molecular and Clinical Oncology, vol. 3, no. 2, pp. 387–391, 2015.
[28] I. Peters, N. Dubrowinskaja, H. Tezval et al., “Decreased mRNA expression of GATA1 and GATA2 is associated with tumor aggressiveness and poor outcome in clear cell renal cell carcinoma,” Targeted Oncology, vol. 10, no. 2, pp. 267–275, 2015.
[29] J. B. McHugh, A. P. Hoschar, M. Dvorakova, A. V. Parwani, E. L. Barnes, and R. R. Seethala, “p63 immunohistochemistry differentiates salivary gland oncocytoma and oncocytic carcinoma from metastatic renal cell carcinoma,” Head and Neck Pathology, vol. 1, no. 2, pp. 123–131, 2007.
[30] M.-H. Li, L.-W. Dong, S.-X. Li et al., “Expression of cytoskeleton-associated protein 4 is related to lymphatic metastasis and indicates prognosis of intrahepatic cholangiocarcinoma patients after surgery resection,” Cancer Letters, vol. 337, no. 2, pp. 248–253, 2013.
[31] J. Zhang, S. L. Planey, C. Ceballos, S. M. Stevens Jr., S. K. Keay, and D. A. Zacharias, “Identification of CKAP4/p63 as a major substrate of the palmitoyl acyltransferase DHHC2, a putative tumor suppressor, using a novel proteomics method,” Molecular and Cellular Proteomics, vol. 7, no. 7, pp. 1378–1388, 2008.
[32] X. Jiao, L. D. Wood, M. Lindman et al., “Somatic mutations in the notch, NF-KB, PIK3CA, and hedgehog pathways in human breast cancers,” Genes, Chromosomes and Cancer, vol. 51, no. 5, pp. 480–489, 2012.
[33] M. Padmanarayana, N. Hams, L. C. Speight, E. J. Petersson, R. A. Mehl, and C. P. Johnson, “Characterization of the lipid binding properties of otoferlin reveals specific interactions between PI(4,5)P2 and the C2C and C2F Domains,” Biochemistry, vol. 53, no. 30, pp. 5023–5033, 2014.
[34] M.-I. Moreno-Carralero, J.-A. Muñoz-Muñoz, N. Cuadrado-Grande et al., “A novel mutation in the SLC40A1 gene associated with reduced iron export in vitro,” American Journal of Hematology, vol. 89, no. 7, pp. 689–694, 2014.
[35] S. Weissmueller, E. Manchado, M. Saborowski et al., “Mutant p53 drives pancreatic cancer metastasis through cell-autonomous PDGF receptor beta signaling,” Cell, vol. 157, no. 2, pp. 382–394, 2014.
[36] T. Willer, H. Lee, M. Lommel et al., “ISPD loss-of-function mutations disrupt dystroglycan O-mannosylation and cause Walker-Warburg syndrome,” Nature Genetics, vol. 44, no. 5, pp. 575–580, 2012.
[37] T. Roscioli, E.-J. Kamsteeg, K. Buysse et al., “Mutations in ISPD cause Walker-Warburg syndrome and defective glycosylation of α-dystroglycan,” Nature Genetics, vol. 44, no. 5, pp. 581–585, 2012.
[38] R. A. Kroes, G. Dawson, and J. R. Moskal, “Focused microarray analysis of glyco-gene expression in human glioblastomas,” Journal of Neurochemistry, vol. 103, supplement 1, pp. 14–24, 2007.
[39] P. J. Heagerty, T. Lumley, and M. S. Pepe, “Time-dependent ROC curves for censored survival data and a diagnostic marker,” Biometrics, vol. 56, no. 2, pp. 337–344, 2000.
[40] Q. Liu, P. F. Su, S. Zhao, and Y. Shyr, “Transcriptome-wide signatures of tumor stage in kidney renal clear cell carcinoma: connecting copy number variation, methylation and transcription factor activity,” Genome Medicine, vol. 6, no. 12, p. 117, 2014.
Copyright © 2015 Yueping Zhan et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.