Biomarkers for Urological MalignanciesView this Special Issue
Prognostic Value of DNA Methylation-Driven Genes in Clear Cell Renal Cell Carcinoma: A Study Based on Methylation and Transcriptome Analyses
Background. Few previous studies have comprehensively explored the level of DNA methylation and gene expression in ccRCC. The purpose of this study was to identify the key clear cell renal cell carcinoma- (ccRCC-) related DNA methylation-driven genes (MDG) and to build a prognostic model based on the level of DNA methylation. Methods. RNA-seq transcriptome data and DNA methylation data were obtained from The Cancer Genome Atlas. Based on the MethylMix algorithm, we obtain ccRCC-related MDG. The univariate and multivariate Cox regression analyses were employed to investigate the correlation between patient overall survival and the methylation level of each MDG. Finally, a prognosis risk score was established based on a linear combination of the regression coefficient derived from the multivariate Cox regression model () multiplied with the methylation level of the gene. Results. 19 ccRCC-related MDG were identified. Three MDG (NCKAP1L, EVI2A, and BATF) were further screened and integrated into a prognostic risk score model, . The risk model was independent from conventional clinical characteristics as a prognostic factor for ccRCC (, 95% confidence interval: 1.063–1.402, and ). The joint survival analysis showed that the gene expression and methylation levels of the prognostic genes EVI2A and BATF were significantly related with prognosis. Conclusion. This study provided an important bioinformatics foundation for in-depth studies of ccRCC DNA methylation.
Clear cell renal cell carcinoma (ccRCC) is the most common pathological type of renal cancer [1, 2]. Approximately 350,000 new patients with RCC are confirmed globally every year, and RCC causes more than 140,000 deaths each year . ccRCC has complex biological characteristics and is not sensitive to radiotherapy and chemotherapy; radical or partial nephrectomy is the main method for treating RCC [4, 5]. Although the surgical effect is definite, 20%-40% of patients experience local recurrence or distant metastasis after surgery , and the mortality rate is very high for advanced renal cell carcinoma . In the last ten years, with the development of molecular biology, several ccRCC driver genes such as von Hippel–Lindau (VHL) and PBRM1 have been discovered [8, 9], and targeted drugs, such as tyrosine kinase inhibitors (TKIs) , have been developed to treat these advanced ccRCC patients; however, not all of these patients can benefit from these drugs. Therefore, the identification of new ccRCC-related driver genes and the establishment of risk models through bioinformatics analysis are very important for patient prognosis evaluation.
Epigenetic changes are thought to be closely related to cancer progression, and aberrant DNA methylation is one of the most crucial and routine epigenetic modifications . DNA methylation is an epigenetic modification that plays a crucial role in regulating the growth, development, gene expression pattern, and stability of the genome without changing the DNA sequence [12, 13]. Many studies in recent years have found that abnormal DNA methylation is closely related to tumorigenesis, development, and cell canceration [14–16]. In particular, alterations in DNA methylation can supply crucial information for the early diagnosis and prognosis of cancer [17–20].
Few previous studies have comprehensively explored the level of DNA methylation and gene expression in ccRCC. The purpose of this study was to identify the key ccRCC-related methylation-driven genes and to build a prognostic model based on the level of DNA methylation.
2. Materials and Methods
2.1. Data Acquisition and Integrative Analysis
In this study, RNA-seq transcriptome data and DNA methylation data were obtained from The Cancer Genome Atlas (TCGA) website (https://portal.gdc.cancer.gov/repository). Of them, the DNA methylation data was using the Illumina Infinium HumanMethylation450 platform and 27 platform, and beta values, ranged from 0 to 1, were quantified to indicate the levels of DNA methylation. Based on R software and packages , we analyzed the above data to obtain differentially methylated genes and differentially expressed genes. In addition, clinicopathological data including survival status, survival time, age, sex, the International Society of Urological Pathology (ISUP) grade, and American Joint Committee on Cancer (AJCC) stage were obtained from TCGA. Based on the MethylMix package , we performed an analysis by combining the differentially methylated genes and differentially expressed genes. MethylMix is an algorithm for identifying the hypermethylation and hypomethylation of genes in diseases . MethylMix is based on a -mixture model to identify the methylation status and compares this status with the normal DNA methylation state. The correlation was computed between the gene methylation level and the gene expression level. Finally, the exported result of MethylMix is methylation-driven genes (MDG).
2.2. MDG Set Enrichment Analysis
Gene functional enrichment analyses were conducted to discover the main biological characteristics of the MDG, including Gene Ontology (GO) analyses with molecular function, biological process, and cellular component analyses . In our study, the biological process category was selected for GO analysis through the enrichGO function in the clusterProfiler package (version 3.6) and the original database was acquired from the “org.Hs.eg.db” package .
2.3. Definition of the MDG-Related Prognostic Model
We randomly divided the tumor samples obtained from TCGA into two cohorts: a training cohort (244 patients, Table S1) and a validation cohort (242 patients, Table S2). We used the training cohort to build the Cox regression risk model, and the validation cohort was adopted to verify the performance of the model. First, we performed univariate Cox regression analysis to identify six methylation-driven genes related to overall survival in patients by considering value < 0.01 as significant. Then, multivariate Cox regression analysis was used to build a prognostic risk model. The risk score was figured out using the regression coefficients () from the multivariate Cox regression model to weight the methylation values of the selected genes. . According to the median risk score, the training cohort was separated into the low-risk and high-risk groups. To assess the predictive power of the risk score, we performed receiver operating characteristic (ROC) curve analysis. The same method was adopted in the testing cohort to test the performance of the model.
2.4. Independence of the Risk Model from Traditional Clinical Features and Building a Nomogram
To verify whether the risk score was independent of other clinical variables in ccRCC patients (including age, sex, ISUP grade, and AJCC stage), the univariate and multivariate Cox regression analyses were performed on the entire TCGA cohort. According to results of multivariate Cox regression analysis, we constructed a nomogram with R package “rms” to assess the overall survival for ccRCC. The predictive performance was evaluated by the index and calibration curve.
2.5. Joint Survival Analysis of the Gene Expression Levels and Methylation Levels of the MDG
To further identify the key genes related to the prognosis of ccRCC patients, based on the survival R package, a joint survival analysis was performed by combining the methylation levels of MDG with the corresponding gene expression levels.
2.6. Statistical Analysis
All statistical analyses were performed using R software. was considered statistically significant. The MethylMix package was used to identify DNA methylation-driven genes. We performed the univariate and multivariate Cox regression analyses to determine whether the risk score had prognostic value independent of various clinicopathological characteristics. We conducted the log-rank test and Kaplan-Meier survival analysis to evaluate the predictive ability of the risk model.
3.1. Data Analysis and Acquisition of DMG in ccRCC
All data were acquired from TCGA. The methylation data were downloaded from 542 cancer tissues and 357 noncancer tissues. In addition, the mRNA expression data were retrieved from 611 samples, consisting of 72 normal samples and 539 cancer samples. On the basis of the LIMMA software package, we analyzed the downloaded data to obtain differentially methylated genes and differentially expression genes. We performed an integrative analysis via the R package MethylMix, and the analysis required three datasets, including normal DNA methylation data, cancer DNA methylation data, and matched gene expression data (Tables S3, S4, and S5). , , and were adopted for screening MDG. Finally, 19 MDG were obtained (Table 1). Heat maps of the ccRCC-related aberrant MDG are shown in Figure 1.
3.2. MDG Set Enrichment Analysis in ccRCC
Gene Ontology analysis showed that MDG were mainly enriched in the regulation of interferon-gamma production, interferon-gamma production, regulation of the inflammatory response to antigenic stimuli, lymphocyte-mediated immunity, myeloid dendritic cell activation, regulation of lymphocyte-mediated immunity, negative regulation of interferon-gamma production, and negative regulation of lymphocyte-mediated immunity (, Figure 2).
Above MDG set enrichment analysis revealed that MDG were significantly linked to inflammatory response regulation and immune regulation.
3.3. Construction of the MDG-Related Prognostic Model
The training cohort was used to construct the risk model. Firstly, we conducted a univariate Cox regression analysis of the methylation level of nineteen MDG. We identified six genes (NCKAP1L, EVI2A, BATF, CD96, IL20RB, and CTSZ) related to overall survival in the training cohort by considering value < 0.01 as significant (Figure 3(a), value < 0.01). To better predict the relationships between the methylation levels of the methylation-driven genes and overall survival of ccRCC, we further analyzed them by stepwise multivariate Cox regression analysis. Finally, three genes were selected to build a predictive model (Figure 3(b), value < 0.05). The risk score was figured out using a linear combination of the methylation levels of the three selected MDG weighted by their specific regression coefficients (). . On the basis of the median risk score, the patients in the training cohort were classified as a high-risk group and a low-risk group. The risk score distribution of the patients according to the prognostic model is shown in Figure 4(a). Survival status scatter plots for the patients according to the prognostic model are shown in Figure 4(b), which shows that the high-risk subgroup contained a higher number of patients who died than the low-risk subgroup. We observed a significant difference in overall survival between the two groups (Figure 4(c), ), and the AUCs at three and five years were 0.719 and 0.689, respectively (Figures 4(d) and 4(e)).
3.4. Testing of the Methylation-Driven Gene-Related Prognostic Model
To confirm the performance of the model, the testing cohort was analyzed. First, we used the selected methylation-driven genes (NCKAP1L, EVI2A, and BATF) to compute the risk score of each patient in the validation cohort. Similar results were observed in the testing cohort, and the risk score distribution of the patients according to the prognostic model is shown in Figure 5(a). Survival status scatter plots for the patients according to the prognostic model are shown in Figure 5(b). Significant survival differences were observed in the testing cohort (Figure 5(c), ). The AUCs at three and five years were 0.674 and 0.659, respectively (Figures 5(d) and 5(e)).
3.5. Independence of the Prognostic Model from Other Clinical Characteristics and Building a Nomogram Based on the Risk Score and Other Clinical Characteristics
We conducted the univariate and multivariate Cox regression analyses to determine whether the risk score was a prognostic factor of ccRCC independent from traditional clinical features (including age, sex, ISUP grade, and AJCC stage). The results revealed that the risk score was independent of conventional clinical features in the entire TCGA cohort (Figure 6), with a hazard ratio (HR) of 1.221 (95% confidence interval: 1.063–1.402; ).
Then, a nomogram predicting the overall survival in ccRCC was constructed based on the risk score and clinical characteristics (Figure 7(a)). The index was 0.776. The calibration curve showed that the nomogram performed well (Figures 7(b) and 7(c)). Both the index and the calibration curve suggested a good predictive performance.
3.6. MethylMix Model of Three Selected DNA MDG
3.7. Joint Prognostic Assessment of MDG in ccRCC
To further identify the key genes related to the prognosis of ccRCC patients, a joint survival analysis was conducted by combining the methylation levels of the MDG with the corresponding gene expression levels. The results are shown in Figures 9(d) and 9(e). The gene expression and methylation levels of the prognostic genes EVI2A and BATF were significantly related to prognosis.
ccRCC is the most common pathological type of kidney cancer . In-depth molecular pathogenesis research and the early detection of tumor prognostic biomarkers and specific driven-genes may have great significance for improving the prognosis of patients [27–29]. In recent years, with the deepening of epigenetic research, the epigenetic regulatory mechanisms of renal cancer, especially DNA methylation, are often used to predict the survival time of renal cancer, including DAB2IP , RCVRN , CRHBP , AR , and CDO1 . However, the heterogeneity of ccRCC limits single-gene methylation in predicting ccRCC outcomes, so it is important to establish a multigene prediction model.
Related studies have shown that abnormal DNA MDG may cause transcriptional disorders, causing some gene expression mistakes and cell differentiation mistakes , and tumor suppressor genes and DNA repair genes were silenced due to hypermethylation. DNA methylation alterations at precancerous stages may determine tumor aggressiveness and patient prognosis . Therefore, identifying abnormal DNA MDG can provide new insights for the risk assessment and prognosis of patients.
In this study, we identified key ccRCC-related MDG and constructed a prognostic model based on the level of DNA methylation. First, we screened nineteen methylation-driven genes by analyzing methylation and transcriptome data. Then, to discover the main biological characteristics of these methylation-driven genes, a series of gene functional enrichment analyses were performed. The gene functional enrichment analyses revealed that MDG were significantly linked to inflammatory response regulation and immune regulation (, Figure 2).
In addition, a risk score-based prognostic model was constructed for these methylation-driven genes, and the prediction ability of this model was validated in the testing dataset. Three methylation-driven genes (NCKAP1L, EVI2A, and BATF) were identified and used to construct a prognostic risk model for ccRCC. We performed the univariate and multivariate Cox regression analyses on the entire TCGA cohort to verify that the risk model was a prognostic factor for ccRCC independent from other clinicopathological data (age, sex, ISUP grade, and AJCC stage) (, 95% confidence interval: 1.063–1.402, and ). In addition, we constructed a nomogram to predict overall survival in patients with ccRCC. The index and calibration curve indicated that the predictive performance of the nomogram was good. Finally, we performed joint survival analysis by combining the methylation levels of the methylation-driven genes with the corresponding gene expression levels. The prognostic genes EVI2A and BATF were significantly related to prognosis.
The gene EVI2A, as the human homolog of mouse genes, may be associated with other proteins in the membrane as a part of a cell surface receptor complex ; EVI2A has been shown to be an oncogene ; EVI2A is highly expressed in oral tongue squamous cell carcinoma  and osteosarcoma  and is a risk factor for cancer prognosis. In this study, we found that EVI2A is one of the MDG of ccRCC, and its hypomethylation leads to poor prognosis (Figure 9(d)).
The BATF gene acts a pivotal part in the development of different types of cancer, including colon cancer, lymphoma, and multiple myeloma [41–43]. In addition, Feng et al. found that BATF acted as an oncogene in non-small-cell lung cancer . In this study, the joint survival analyses revealed that the BATF gene was closely related with poor prognosis in patients with ccRCC (Figure 9(e)).
There are some limitations in this study. The specific mechanisms of EVI2A and BATF in ccRCC were not investigated in the previous literature. Therefore, future studies may focus on the molecular mechanisms underlying the potential interactions of EVI2A and BATF in ccRCC.
In conclusion, we identified key ccRCC-related MDG and constructed a prognostic model based on the level of DNA methylation. In addition, we verified that the risk model was an independent prognostic factor for overall survival in ccRCC. Although further experimental verification is needed, this study provided an important bioinformatics foundation for in-depth studies of ccRCC DNA methylation.
The data included in the current study was available in TCGA database (https://cancergenome.nih.gov/).
Because the dataset in the current study was downloaded from TCGA, and data acquiring and application complied with TCGA publication guidelines and data access policies, additional approval by an ethics committee and consent to participate were not needed.
Conflicts of Interest
The authors declared that they have no conflict of interest.
MH and JX carried out the extraction of data and interpretation of results and wrote the initial manuscript. MH and HH performed the data analyses. ML and JW provided feedback on the manuscript and contributed to the study idea. All authors read and approved the final manuscript. Maolin Hu and Jiangling Xie contributed equally to this work.
This study was supported by the Beijing Hospital Clinical Research 121 Project (BJ-2018-090) and Chinese Academy of Medical Sciences Innovation Fund for Medical Sciences (2018-I2M-1-002).
Table S1 a training cohort. Table S2 a validation cohort. Table S3 normal DNA methylation data. Table S4 cancer DNA methylation data. Table S5 matched gene expression data. (Supplementary Materials)
J. Teng, Y. Gao, M. Chen et al., “Prognostic value of clinical and pathological factors for surgically treated localized clear cell renal cell carcinoma,” Chinese Medical Journal, vol. 127, no. 9, pp. 1640–1644, 2014.View at: Google Scholar
E. Arai and Y. Kanai, “Genetic and epigenetic alterations during renal carcinogenesis,” International Journal of Clinical and Experimental Pathology, vol. 4, no. 1, pp. 58–73, 2010.View at: Google Scholar
H. Tezval, N. Dubrowinskaja, I. Peters et al., “Tumor specific epigenetic silencing of corticotropin releasing hormone -binding protein in renal cell carcinoma: association of hypermethylation and metastasis,” PLoS One, vol. 11, no. 10, article e0163873, 2016.View at: Publisher Site | Google Scholar
R. M. Cawthon, P. O'Connell, A. M. Buchberg et al., “Identification and characterization of transcripts from the neurofibromatosis 1 region: the sequence and genomic structure of EVI2 and mapping of other transcripts,” Genomics, vol. 7, no. 4, pp. 555–565, 1990.View at: Publisher Site | Google Scholar
S. Li, F. Yang, Y.-K. Yang, and Y. Zhou, “Increased expression of ecotropic viral integration site 2A indicates a poor prognosis and promotes osteosarcoma evolution through activating MEK/ERK pathway,” Journal of Receptor and Signal Transduction Research, vol. 39, no. 4, pp. 368–372, 2019.View at: Publisher Site | Google Scholar