Identification of Novel Gene Signature Associated with Cell Glycolysis to Predict Survival in Hepatocellular Carcinoma Patients
Purpose. Because hepatocellular carcinoma (HCC) is a complex disease, it is hard to classify with a single biomarker. This study used data from TCGA to build a gene signature for predicting the prognosis of HCC patients. Methods. mRNA expression profiles from a cohort of HCC patients (n = 424) in TCGA were analyzed. Gene set enrichment analysis was performed to identify gene sets that differed significantly between HCC and normal tissues. Cox proportional hazards regression models were used to identify genes significantly associated with overall survival, and multivariate Cox regression analysis was then used to build and evaluate a prognostic risk parameter. Kaplan–Meier survival estimates and the log-rank test were used to assess the prognostic significance of the risk parameter. Results. Eight genes (PAM, NUP155, GOT2, KDELR3, PKM, NSDHL, ENO1, and SRD5A3) were significantly associated with overall survival. The 377 HCC patients were divided into high-risk and low-risk subgroups on the basis of the eight-gene signature. The prognostic ability of the signature was independent of several clinical factors. Conclusion. An eight-gene signature associated with cellular glycolysis was identified that predicts the survival of patients with HCC. The findings shed light on cellular glycolysis processes and on the identification of HCC patients with poor prognosis.
Hepatocellular carcinoma (HCC) is the most prevalent type of primary liver cancer [1, 2]. It is a heterogeneous tumor arising from multiple genetic and epigenetic events and is typically associated with particular risk factors such as hepatitis B or C infection, excessive alcohol consumption, hemochromatosis, or nonalcoholic fatty liver disease induced by insulin resistance and obesity. It is also the world’s fifth most prevalent cancer and the second leading cause of cancer mortality. HCC accounts for 70–85 percent of the overall liver cancer burden [4, 5]. Although its clinical diagnosis and treatment have improved in recent years, rates of metastasis and recurrence after radical resection remain high. Specific diagnostic criteria and clinical targets are therefore urgently needed for this disease.
In the modern age of “omics,” the advent of innovative techniques such as sequencing and microarrays has accelerated the search for biomarkers [6–8]. Many biomarkers of HCC have been identified, such as α-fetoprotein (AFP) and des-γ-carboxy prothrombin (DCP). Database mining has yielded thousands of candidate biomarkers that may correlate with the prognosis of tumor patients. Because HCC is a complex disease, it is difficult to characterize with a single biomarker. Studies have shown that signatures combining several genes can improve the prediction of prognosis [10, 11]. The polygenic prognostic characteristics of a primary tumor biopsy can guide specific treatment strategies. Recent research has also investigated polygenic markers in HCC to determine prognosis and identify patients with high-risk disease.
Aerobic glycolysis is one of the important characteristics of tumors and gives them a survival advantage. Malignant tumors are now widely regarded not only as genetic diseases but also as diseases of energy metabolism [10, 12]. Many glycolytic enzymes can stimulate cancer cell growth, and this “Warburg effect” has been reported in various tumor types [13, 14]. A new gene signature related to glycolysis may therefore help predict the prognosis of HCC. Genes were selected in this study using gene set enrichment analysis (GSEA). GSEA is a computational method that reveals broader trends in the data rather than only detecting expression differences in individual genes. This approach thus strengthens the statistical analysis of gene expression in its biological context.
In this study, we analyzed glycolysis-related gene sets in 424 HCC cases with complete mRNA expression data from the TCGA database. We identified the key glycolysis-related mRNAs and built an eight-gene risk signature that can predict patient prognosis accurately. Notably, this glycolysis-related risk signature can identify patients in the high-risk group, who have a poor prognosis.
2.1. Patients’ Clinical and mRNA Expression Data Collection
We collected clinical data and mRNA expression profiles of hepatocellular carcinoma patients from TCGA (https://portal.gdc.cancer.gov/). The study included clinical data from 377 patients, covering age, gender, grade, stage, tumor topography (T), distant metastasis (M), and lymph node status (N) (Table 1).
2.2. Gene Set Enrichment Analysis
We selected the five gene sets most closely related to glycolysis for GSEA (http://www.broadinstitute.org/gsea/index.jsp) to determine whether the selected gene sets differed significantly between the HCC group and the normal group. First, the expression levels of 56753 mRNAs in HCC and adjacent normal tissues were examined. Gene sets were then selected for follow-up analysis on the basis of their normalized p values.
2.3. Data Processing and Calculation of Risk Parameters
The expression value of each mRNA was normalized by log2 transformation. Univariate Cox regression analysis was used to identify genes correlated with overall survival (OS), followed by multivariate Cox regression to confirm the prognostic genes and obtain their coefficients. The selected mRNAs were then divided into a risk type (hazard ratio, HR > 1) and a protective type (0 < HR < 1). Risk parameter = ∑ (βn × expression of gene n). The 377 patients were grouped into high-risk and low-risk subgroups using the median risk parameter as the cutoff.
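The risk parameter defined above can be written out more explicitly. The notation below is an assumption made for illustration (the source gives only the summation in words): $x_{i,n}$ denotes the log2-transformed expression of gene $n$ in patient $i$, and $\beta_n$ the coefficient of gene $n$ from the multivariate Cox model.

```latex
\[
  \text{Risk parameter}_i \;=\; \sum_{n=1}^{N} \beta_n \, x_{i,n},
  \qquad
  \mathrm{HR}_n \;=\; e^{\beta_n}
\]
\[
  \beta_n > 0 \iff \mathrm{HR}_n > 1 \ \text{(risk type)},
  \qquad
  \beta_n < 0 \iff 0 < \mathrm{HR}_n < 1 \ \text{(protective type)}
\]
```

Under a Cox model the hazard ratio per unit of expression is the exponential of the coefficient, so the sign of $\beta_n$ determines whether a gene falls in the risk or the protective type.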
2.4. Statistical Analysis
Kaplan–Meier survival curves were used to assess the prognostic value of the risk parameter. Multivariate Cox analysis and stratified analysis were performed to check whether the risk parameter was independent of clinical characteristics such as age, gender, grade, stage, tumor topography (T), distant metastasis status (M), and lymph node status (N). P values below the prespecified threshold were considered statistically significant. R software (v3.6.1) was used for all statistical analyses.
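To make the Kaplan–Meier estimate concrete, the following is a minimal sketch in plain Python. It is not the authors' code (the study used R v3.6.1); the function name `kaplan_meier` and the follow-up data are hypothetical and chosen only for illustration.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one death.

    times  -- follow-up time of each patient
    events -- 1 if death was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # every patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]  # remove deaths and censored patients alike
    return curve


# Hypothetical follow-up data (years); not taken from the study
times = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t = {t}: S(t) = {s:.4f}")
```

Censored patients contribute to the number at risk until their last follow-up but never trigger a drop in the curve, which is why the estimate changes only at observed death times.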
3.1. Initial Gene Screening Using GSEA
From the TCGA database, we obtained the clinical features of 377 HCC patients along with expression data for 56753 mRNAs. The glycolysis-related gene sets were obtained from the MSigDB database. GSEA was performed to determine whether the selected gene sets differed significantly between tumor tissues and normal tissues. Of the five gene sets most closely related to glycolysis, three (the Hallmark, Reactome, and Reactome regulation of glycolysis gene sets) were significantly enriched, with normalized p values below 5% (Table 2 and Figures 1 and 2).
3.2. Identification of Survival-Associating mRNAs Related to Glycolysis
First, univariate Cox regression analysis was carried out on 226 genes for initial screening, and 201 genes were retained. Multivariate Cox regression analysis was then performed to further explore the relation between the expression profiles of the 201 mRNAs and patient survival, using stepwise elimination to identify the most relevant mRNAs. Thirty-one mRNAs were confirmed, and eight of these 31 genes were validated as independent prognostic markers of HCC (Table 3). The filtered mRNAs were classified into a risk type (PAM, NUP155, KDELR3, NSDHL, ENO1, and SRD5A3), with HR > 1 associated with poorer survival, and a protective type (GOT2 and PKM), with HR < 1 associated with better survival (Table 3).
Alterations in the eight filtered genes were then evaluated by analyzing 377 HCC samples in the cBioPortal database (http://cbioportal.org). The queried genes were altered in 35 (9.3%) of the sequenced cases. The PAM gene carried missense mutations in 3 samples. The NUP155 gene carried missense mutations in 3 samples and a splice mutation in 1 sample. The KDELR3 gene was altered in 1.1% of cases. The PKM and SRD5A3 genes were each altered in 0.3% of cases, and the NSDHL and ENO1 genes were altered in 1.4% and 2.8% of cases, respectively (Figure 3(a)).
The expression of the 8 genes was also compared between adjacent normal tissues and HCC tissues. The expression levels of the 8 genes were significantly increased or decreased in HCC tissues (Figures 3(b) and 3(c)).
3.3. Construction of an 8-mRNA Signature to Predict Patient Outcomes
The prognostic score model was built as a linear combination of the expression levels weighted by the regression coefficients from the multivariate Cox regression analysis: risk score = 0.2193 × expression of PAM + 0.4542 × expression of NUP155 − 0.2835 × expression of GOT2 + 0.1396 × expression of KDELR3 − 0.1785 × expression of PKM + 0.3203 × expression of NSDHL + 0.1829 × expression of ENO1 + 0.3133 × expression of SRD5A3. We calculated the risk score of each patient and divided the patients into high-risk and low-risk groups by the median risk score (Figure 4(a)). The survival time of every patient (in years) is shown in Figure 4(b); the high-risk patients showed higher mortality than the low-risk patients. In addition, a heatmap (Figure 4(c)) was generated to show the expression profiles of the 8 mRNAs, with patients assigned to the low-risk or high-risk group using the median 8-mRNA risk score as the cutoff. The area under the ROC curve (AUC) was 0.717 (Figure 5), indicating that the 8-mRNA signature had good predictive performance for the metastasis and survival of HCC patients. The expression of the risk-type mRNAs (PAM, NUP155, KDELR3, NSDHL, ENO1, and SRD5A3) was higher in the high-risk group than in the low-risk group. In contrast, the expression of the protective-type mRNAs (GOT2 and PKM) was lower in the high-risk group than in the low-risk group.
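The scoring rule above can be sketched in a few lines of Python. The coefficients are copied from the formula in Section 3.3; everything else (the function names, the four patients and their expression values, and the choice to place a score equal to the median in the low-risk group) is a hypothetical assumption for illustration, not the authors' implementation.

```python
from statistics import median

# Coefficients copied from the risk score formula in Section 3.3
COEFFICIENTS = {
    "PAM": 0.2193, "NUP155": 0.4542, "GOT2": -0.2835, "KDELR3": 0.1396,
    "PKM": -0.1785, "NSDHL": 0.3203, "ENO1": 0.1829, "SRD5A3": 0.3133,
}


def risk_score(expression):
    """Weighted sum of the expression values of the eight genes."""
    return sum(beta * expression[gene] for gene, beta in COEFFICIENTS.items())


def split_by_median(scores):
    """Label each patient 'high' or 'low' using the median score as cutoff.

    Assumption: a score exactly equal to the median is labelled 'low'.
    """
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical expression values (arbitrary units); not data from the study
baseline = {gene: 1.0 for gene in COEFFICIENTS}
patients = {
    "A": dict(baseline),
    "B": {**baseline, "GOT2": 3.0, "PKM": 3.0},  # protective genes raised
    "C": {**baseline, "NUP155": 3.0},            # a risk gene raised
    "D": {**baseline, "ENO1": 2.0},              # another risk gene raised
}
scores = {pid: risk_score(expr) for pid, expr in patients.items()}
groups = split_by_median(scores)
print(scores)
print(groups)
```

Raising a protective-type gene (negative coefficient) lowers the score, while raising a risk-type gene increases it, which matches the expression pattern described for the two risk groups.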
3.4. Risk Parameter Derived from 8-mRNA Signature is an Independent Prognostic Indicator
We compared the prognostic value of the risk parameter with that of the clinicopathological parameters through univariate and multivariate analyses (Table 1). Samples with well-established clinical data were chosen. For the 377 HCC cases, the mean age was 65. Among the 377 patients, 255 (67.6%) were male and 122 (32.4%) were female. Among 372 patients, 235 (63.2%) had grade I-II tumors and 137 (36.8%) had grade III-IV tumors. Moreover, 262 (74.2%) of 353 HCC patients had stage I-II disease and the remaining 91 (25.8%) had stage III-IV disease. The risk parameter and stage were identified as independent prognostic indicators because both were significant in the univariate and the multivariate analyses (Table 4). In particular, the risk parameter showed significant prognostic value (HR = 1.770) (Figure 6).
3.5. Validation of Eight mRNA Markers for Survival Prediction by Kaplan–Meier Curve Analysis
Kaplan–Meier survival analysis showed that high-risk patients had a poor prognosis (Figure 7(a)). Univariate Cox regression analysis of OS identified several prognostic clinicopathologic parameters in HCC, including age, gender, grade, stage, tumor topography, distant metastasis status, and lymph node status. We then used Kaplan–Meier survival curves to test these findings, which gave consistent results: poor prognosis was associated with stage III-IV disease and with tumor topography T3-4 (Figures 7(b) and 7(c)). These results further confirmed the reliability of the analysis.
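The comparison between high-risk and low-risk survival curves relies on the log-rank test named in the abstract. The following is a minimal two-group sketch in plain Python, assuming the standard hypergeometric variance and a chi-square reference distribution with one degree of freedom; the function name `logrank_test` and the follow-up data are hypothetical, and this is not the authors' R code.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p value).

    events are 1 for an observed death and 0 for censoring.
    Assumes at least one death occurs while both groups are still at risk,
    otherwise the variance is zero and the statistic is undefined.
    """
    all_times = times_a + times_b
    all_events = events_a + events_b
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if e and x == t)
        d_b = sum(1 for x, e in zip(times_b, events_b) if e and x == t)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += n_a * n_b * d * (n - d) / (n * n * (n - 1))

    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical follow-up data (years); not taken from the study
high_times, high_events = [1, 2, 3], [1, 1, 1]
low_times, low_events = [4, 5, 6], [1, 0, 1]
chi2, p_value = logrank_test(high_times, high_events, low_times, low_events)
print(f"chi-square = {chi2:.3f}, p = {p_value:.4f}")
```

The statistic compares the deaths observed in one group with the number expected if both groups shared the same hazard at every death time.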
A further stratified analysis was also performed. As shown by the K–M curves, irrespective of age, class, or grade (e.g., grade I-II or grade III-IV), the eight-mRNA signature was a reliable prognostic marker that identified high-risk HCC patients with poor prognosis (Figure 8). However, the risk parameter could not be used as an independent predictor in the stage III-IV and T3-4 subgroups. This may be because these subgroups contain relatively few samples, a point that calls for further exploration.
In recent years, studies have shown that one or a few clinical features cannot accurately evaluate the prognosis of tumors. More and more studies therefore focus on mRNA and regard it as a biological marker of tumor progression and prognosis. For example, AFP-L3 is considered a specific biomarker of HCC. In the early stage of hepatocarcinogenesis, the expression of squamous cell carcinoma antigen (SCCA) complexed with IgM is increased, which may make it an important serum biomarker for the early detection of HCC [19, 20]. Because the expression of a single gene is easily influenced by many factors, it often cannot serve as a reliable and independent prognostic indicator. Moreover, HCC is a complex disease, so it is difficult to characterize with a single biomarker. The study of combined biomarkers may therefore provide a valuable reference for diagnosis and prognosis. To improve prediction, a mathematical model is used that combines the markers of several related genes, each weighted by its predictive contribution. Such a model determines the prognosis of tumor patients more reliably than a single biomarker and is therefore commonly used [21, 22].
Current high-throughput sequencing technology can extract a large amount of genomic data from a single sample to identify new diagnostic, prognostic, or pharmacological biomarkers. With the advance of gene marker technologies, mathematical models have been used to estimate the prognosis of certain cancers. In patients with lung adenocarcinoma, for example, a new signature for predicting metastasis and survival was identified by means of Cox regression and ROC analysis. In this study, we used GSEA to identify three functional glycolysis-related gene sets. From these, we screened for the genes most strongly associated with glucose metabolism and survival in cancer patients. The prognostic value of the 8-gene combination was determined by univariate and multivariate Cox regression analyses. Compared with other known prognostic indicators, the selected risk model may be a more targeted and powerful method of prognostic evaluation and may serve as a more effective classification tool for patients with HCC.
We used the TCGA HCC dataset to collect glycolysis-related genes and compare their expression between normal and HCC tissues. Kaplan–Meier survival analysis then showed that patients with a low risk parameter had a relatively good prognosis. Among the eight genes, NUP155 expression in HCC is considered to be part of the p53 regulatory network. GOT2 has been shown to be involved in the energy metabolism of tumor cells. KDELR3 is one of the genes in an 11-gene prognostic signature for uveal melanoma. PKM can promote anabolism and regulate glycolysis, and it promotes tumorigenesis through glycolysis and the control of gene expression. NSDHL is believed to be closely related to cholesterol metabolism. In studies of glioma and lung cancer, ENO1 has been proposed as a promoter of tumor metabolism that gives tumor cells with high ENO1 expression a growth advantage. SRD5A3 is considered a therapeutic target in prostate cancer. However, we did not find reports linking PAM to metabolism or tumors. Traditional prognostic systems often fail to estimate risk stratification and clinical outcomes accurately. Thus, a prediction method based on the 8-mRNA signature may help predict metastasis and prognosis of HCC better than a single widely used biomarker.
Aerobic glycolysis, also known as the “Warburg effect,” is one of the most important characteristics of tumors and gives them a survival advantage. This seemingly uneconomical mode of energy supply is necessary for tumor cells: it provides not only energy for their growth but also raw materials for their biosynthesis. Zuo et al. reported that PGC1-α can inhibit the Warburg effect by downregulating pyruvate dehydrogenase kinase isoenzyme 1 (PDK1), so PGC1-α is considered a potential prognostic factor and therapeutic target in liver cancer patients. In terms of treatment, Ma et al. found that SRC-3 was highly expressed in patients with liver cancer and interacted with c-myc, the central regulator of the Warburg effect, to promote its recruitment to glycolytic gene promoters; SRC-3 may therefore be a potential target for treating sorafenib-resistant liver cancer. Three key allosteric enzymes control aerobic glycolysis: hexokinase (HK), phosphofructokinase (PFK), and pyruvate kinase (PK). HK catalyzes the first irreversible step of glycolysis and is associated with poor prognosis in cancer patients. Xu et al. found that miR-885-5p can act on the 3’ UTR of hexokinase 2 (HK2), significantly reducing glucose uptake and lactate production. PFK is highly expressed and activated in human cancers, including HCC, where it generates additional energy to support tumor growth. PFK is an important potential target because inhibiting it can deprive cancer cells of the energy and substrates needed for macromolecular synthesis and proliferation while allowing normal cells to survive. The last key enzyme in glycolysis is PK. Among the three key enzymes, PK is the most important because it controls the final conversion of phosphoenolpyruvate to pyruvate.
There are four subtypes of PK (L, R, M1, and M2). PKM2 has been found to be significantly increased in hepatoma cells and to play a key role in the regulation of glycolysis. It has been reported that targeting PKM2 can enhance the efficacy of HCC treatment. In general, these glycolytic enzymes play an important role in the growth and treatment of HCC, and glycolysis may therefore be involved in the occurrence and development of HCC. We first identified a glycolysis-related gene signature (PAM, NUP155, KDELR3, NSDHL, ENO1, SRD5A3, GOT2, and PKM) and then demonstrated its prognostic value for HCC.
In conclusion, we report an eight-gene risk signature associated with glycolysis that can help predict survival and prognosis in HCC patients. The higher the risk score, the worse the prognosis. This finding may help prospective studies discover potential HCC treatments and provide further genomic targets for HCC patients.
The datasets used to support the findings of this study are available from the TCGA database (https://portal.gdc.cancer.gov/repository).
Conflicts of Interest
The authors declare that there are no conflicts of interest.
M.W and F.J contributed equally to this work. CY.W and GY.Y designed the study; K.W and M.W collected the data; F.J and CY.W performed the statistical analysis; F.J and EL.M prepared the manuscript draft; GY.Y revised the manuscript. All the authors approved the final manuscript.
Chuyan Wu was supported by the China Scholarship Council for 1 year of study at Johns Hopkins University.
Y. Du, S. Lu, J. Ge et al., “ROCK2 disturbs MKP1 expression to promote invasion and metastasis in hepatocellular carcinoma,” American Journal of Cancer Research, vol. 10, no. 3, pp. 884–896, 2020.
T. IuS, “Detection of embryo-specific alpha-globulin in the blood serum of a patient with primary liver cancer,” Voprosy Meditsinskoi Khimii, vol. 10, pp. 90-91, 1964.