Abstract

Background. The epithelial-mesenchymal transition (EMT) is a pivotal process for fibrotic disease, embryonic development, and wound healing. Moreover, some evidence has proven that the disorder of EMT also plays an important role in carcinogenesis, especially invasion and metastasis of various tumors (Ritchie et al., 2015). Additionally, gastric adenocarcinoma (GAC) is a common gastrointestinal malignancy which is the fourth most commonly diagnosed tumor. Our study is aimed at identifying the prognostic value of EMT-related genes in gastric adenocarcinoma. Methods. Firstly, high-throughput and clinical data were downloaded from The Cancer Genome Atlas (TCGA) database. 99 differentially expressed EMT-related genes (ERGs) were obtained in these gastric adenocarcinoma data. Secondly, GO and KEGG enrichment analyses show that EMT may promote gastric carcinogenesis. Next, 10 ERGs associated with prognosis of gastric adenocarcinoma patients are screened out by univariate Cox regression, and 6 pivotal prognostic ERGs (MMP8, MMP11, TFDP3, MYB, F2, and CNTN1) are identified through multivariate Cox regression. These 6 genes are confirmed with significant prognostic value in gastric adenocarcinoma through overall survival (OS) analysis. Finally, a risk score formula is constructed and tested in another gastric adenocarcinoma cohort from GEO. Results. 99 differentially expressed EMT-related genes (ERGs) and their enriched pathways are identified. 10 ERGs are strongly related to the prognosis of GAC patients. A risk score formula of 6 prognosis-related ERGs used to predict the prognosis of gastric adenocarcinoma patients is identified and tested (). Conclusion. This study provides a potential prognostic signature for predicting prognosis of gastric adenocarcinoma patients and molecular insights of EMT in gastric adenocarcinoma, and the formula focusing on the prognosis of gastric adenocarcinoma can be effective.

1. Introduction

Gastric adenocarcinoma (GAC) is a common gastrointestinal malignancy which is the fourth most commonly diagnosed tumor and the second pivotal cause of cancer-related death toll worldwide [1]. The morbidity of GAC exceeds 400,000 cases, accounting for more than 40% of cancer rates worldwide, and the death toll has reached as high as 300,000 [2]. Over the past 20 years, there has been great progress in the diagnosis and treatment of GAC. However, the detection rate for GAC patients in the early stage remains poor, and most GAC patients are already in the progressive stage when diagnosed. Additionally, the prognosis of GAC patients in advanced stage remains poor, with a 5-year survival rate of less than 30% because of the high recurrence rate after surgery and chemotherapy resistance [3]. Hence, understanding the interaction mechanisms of critical molecules behind GAC occurrence and progression is significant, which will make great aid in the precaution of GAC and in the identification of novel effective therapeutic targets.

The epithelial-mesenchymal transition (EMT) is a pivotal process for fibrotic disease, embryonic development, and wound healing [4, 5]. Moreover, some evidence has proven that the disorder of EMT also plays an important role in carcinogenesis, especially invasion, and metastasis of various tumors, including gastric cancer [68]. Even though the profound mechanism that EMT acts in carcinogenesis has been studied a lot, the prognostic value of EMT-related genes and the biological function of these prognosis-related genes in EMT process remain rudimentary and inconclusive. To sum up, studies on the correlations and underlying biological process between EMT and gastric tumorigenesis should be valuable to develop some new ideas for cancer diagnostic, prognostic, and treatment.

In our present study, the correlation between EMT-related gene (ERG) expression and clinical data in 375 gastric adenocarcinoma (GAC) patients downloaded from The Cancer Genome Atlas (TCGA) was identified, and risk score (RS) was calculated as an independent index for overall survival (OS) prognosis based on ERGs. These findings indicate some pivotal EMT-related genes in gastric adenocarcinoma and can guide the following research on EMT of cancer. Furthermore, these research results could also raise an effective formula of risk score that could monitor EMT and predict prognosis in gastric adenocarcinoma patients.

2. Materials and Methods

2.1. Data Acquisition

The EMT-related gene (ERGs) list was downloaded from epithelial-mesenchymal transition gene database (http://dbemt.bioinfo-minzhao.org/download.cgi) which was used to collect human EMT-related gene. High-throughput sequencing mRNA expression data of ERGs and corresponding clinical data of Stomach Adenocarcinoma (STAD) cohort were obtain from the UCSC Xena datasets (https://xenabrowser.net/datapages/).

2.2. Differentially Expressed ERGs and Enrichment Analysis

The differentially expressed EMT-related genes (ERGs) in mRNA expression data of STAD cohort were identified by the limma package in R software (, ) [9]. Visualization (with volcano plots and heatmaps) was executed with the ggrepel, ggplot, and pheatmap packages in R software. Biological processes (BP), cellular components (CC), and biological pathways in which these ERGs were significantly enriched were identified by gene ontology (GO) enrichment and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses and visualized with the goplot package.

2.3. Establishing an Individualized Prognostic Index according to ERGs

After combining clinical data and mRNA expression data of ERGs, univariate Cox regression analyses were used to screen the differentially expressed ERGs with remarkable prognostic value. Then, multivariate Cox regression analysis was executed in these screened genes to filter out an independent prognostic predictor in ERGs. The clinical feature was identified, and the survival plot of these ERGs were drawn based on STAD-TCGA cohort using log-rank test. At last, the risk score (RS) for these prognostic ERGs was developed. The formula based on multiplied Cox regression coefficients can obtain the relative weight of genes in multiple Cox analysis. The clinical data of GAC patients were separated into high- and low-risk groups by the best RS value as the risk cutoff value. The survival curves were plotted by Kaplan–Meier (KM) method, and differences in the survival rates between high- and low-risk groups were assessed using log-rank test. Finally, these findings are tested in another GAC cohort in GEO datasets through survival analysis and ROC curve analysis.

2.4. Statistical Analysis

All statistical analyses and plots were implemented through R software 3.6.1 (https://www.r-project.org/). In detail, the relations between the mRNA expression of each ERGs and corresponding clinical data were analyzed through univariate Cox regression and independent prognostic factor, and their risk score (RS) was identified through multivariate Cox proportional hazards regression. The diagnostic and prognostic value of RS was calculated through receiver operating characteristic (ROC) curve in each dataset by “survival ROC” package in R software. Factors with value < 0.05 was considered with statistical significance.

3. Result

3.1. Differentially Expressed ERGs

Hierarchical clustering of differentially expressed ERG expression levels.

RNA-seq and clinical follow-up data from TCGA-STAD datasets was downloaded, which contains 375 cancer samples and 32 normal samples.

1015 EMT-related genes (ERGs) are acquired from Human EMT Database. Then, the transcriptional expression of these 1015 ERGs in RNA-seq data was extracted and compared between normal samples and GAC samples through limma package in R software (, ). The results indicated that there are 78 genes significantly upregulated and 21 genes significantly downregulated in GAC (Figures 1(a) and 1(b)). The specific logFC and FDR of differentially expressed genes are showed in Table 1.

3.2. Biological Functions and Significant Pathway Analysis Involved in the Expression of ERGs

Biological functions and significant pathways analysis were executed of these 99 differentially expressed ERGs. Gene ontology (GO) enrichment terms are shown in Figure 2. The result of GO enrichment shows that the screened ERGs positively regulate the growth of GAC.

KEGG pathway enrichment of these genes are shown in Figure 3. These results indicate that the differentially expressed ERGs can promote cancer-related pathways, including PI3k-Akt, IL-17, and JAK-STAT pathways, and may lead to transcriptional misregulation in cancer (Figure 4).

3.3. Identification of Prognostic ERGs

The correlations between the transcriptional expression of these 99 differentially expressed ERGs and clinical data were analyzed using univariate Cox regression ( value < 0.05 is considered a significant criterion). 10 genes were screened with significant prognostic value in GAC (Figure 3).

Then, multivariate Cox regression analysis of these 10 genes was implement, and MMP8, MMP11, MYB, TFDP3, F2, and CNTN1 are screened out which could be the independent prognostic predictor in GAC () (Table 2). The relationships between these screened genes and clinical indexes are as follows () (Figures 5(a)5(f)). Moreover, 5 of these 6 genes, including CNTN1, MMP8, MMP11, MYB, and F2, are identified with significant prognostic value in gastric adenocarcinoma (Figures 6(a)6(e)).

4. Construction and Definition of the RS

According to the result of multivariate Cox regression analysis as Table 1, the formula of the risk score (RS) is as follows: risk score = 0.448115expression value of MMP8 + 0.378892expression value of MMP11 − 0.3226expression value of MYB + 1.322812expression value of TFDP3 + 0.325063expression value of F2 + 0.334197 expression value of CNTN 1. In this formula, MYB has the only negative coefficient, indicating that high expression level of MYB is positively related with favorable prognosis while upregulated expression of MMP8, MMP11, TFDP3, F2, and CNTN1 is negatively related with the survival time of GAC patients.

Furthermore, we investigated the correlations for clinical index and risk score (RS) (Figures 7(a)7(c)). The results of independent sample -test analysis revealed that RS differentially expressed between male and female in GAC patients. Additionally, RS is also higher in patients with distant metastasis and pathological grades.

To validate the pivotal role of RS in predicting the prognosis of GAC, survival analysis between the low-risk group and high-risk group divided by best cutoff was executed in STAD-TCGA cohort. Heatmap shows MMP11 and MYB are significantly downregulated while others are upregulated in samples from high RS patients (Figure 8(a)). The results of survival analysis reveal that GAC patients with high-risk score have significantly worse prognosis than GAC patients with low-risk score (; Figures 8(b)8(d) and 8(f)). As visualized in Figure 8(e), ROC (receiver operating characteristic) curve analysis indicates that RS has considerable diagnostic and prognostic efficacy in GAC patients ().

In addition, the reliability of RS in predicting the prognosis for GAC patients is further verified through univariate Cox analysis and multivariate Cox analyses in STAD-TCGA cohorts (Figures 9(a) and 9(b)). The results indicate that RS remained as an independent prognostic predictor in GAC patients in both univariate Cox regression analysis (, , ) and multivariate Cox regression analysis (, , ), adjusting with other clinical characteristic such as age, gender, tumor histological grade (grade), tumor pathologic stage (stage), and TNM staging (T, N, and M).

Finally, the formula and RS index above were tested in GSE84433 cohort from GEO datasets with 357 GAC tumor cases and corresponding follow-up information. Figure 10(a) shows the significantly differential distribution of high-score group and low-score group through survival analysis (). Figure 10(b) verified the prognostic and diagnostic value of RS (). The result indicated that RS can be a reliable predictor for predicting the prognosis of GAC patients.

5. Discussion

Gastric adenocarcinoma (GAC) is a major fatal type of cancer all around the world with a lack of reliable diagnostic and prognostic biomarker and therapeutic target [10]. It is of great importance for researchers to explore the new diagnostic and therapeutic strategy of GAC on these aspects. On the other hand, it has been widely proved that EMT is an important biological process in cancer, including gastric cancer. Specifically, EMT is characterized by downregulation of the epithelial cell properties and activation of the mesenchymal characteristic. Epithelial and endothelial cells are able to acquire a mesenchymal phenotype through EMT. The expression of certain genes, including those for cadherins, integrins, and many transcription factors, is reprogrammed in the EMT. EMT is thought to be one of the major mechanisms that determine invasion and metastasis of cancer cell [11]. However, most of the studies focus on the function and mechanism of some EMT-related genes and the diagnostic and prognostic value of overall EMT-related genes(ERGs) has not been widely elucidated in GAC.

In this study, we collected the high-throughput data about the transcriptional expression of 375 GAC samples and 32 nontumor sample with their corresponding clinical data from TCGA database. Among these, we screened out the differentially expressed ERGs between GAC samples and nontumor gastric samples. Then, the differentially expressed ERGs are listed and analyzed, and the pivotal functional clusters and pathways of the differential expressed ERGs are identified. Interestingly, major differentially expressed ERGs in the cluster associated with negative regulation of cell aging are upregulated, indicating that EMT process promotes the cellular immortalization of GAC. Another novel finding is DNA H3-K4 methylation, in which the MAPK pathway, JAK-STAT pathway, PI3K-Akt pathway, and IL-17 pathway are involved in EMT process of GAC. The enrichment analysis verifies the affinity between EMT and these enrichment pathways, which reveals some therapeutic target of GAC in EMT process and may lead further research about the internal mechanism. In previous studies, H3K4 methylation has been found widely existing in EMT of various cancers and our finding verified it through high-throughput data [12, 13]. Interestingly, there was no previous study that reported that the IL-17 pathway participates in EMT process while our study revealed that the IL-17 pathway may play an important role in EMT process. The researches about IL-17 pathways are needed in further practice. Univariate Cox regression analysis with survival data indicates that 10 ERGs were related to prognosis of GAC patients in STAD-TCGA cohort. Among these 10 genes, 6 genes, including MMP8, MMP11, MYB, TFDP3, F2, and CNTN1, are screened out as independent prognostic predictor, and a formula about risk score (RS) was raised through multivariate Cox regression analysis. MMP8 and MMP11 are two members of matrix metalloproteinases (MMPs), which are used to degrade extracellular matrix (ECM) components and deeply participated in EMT [14]. MYB is another critical transcriptional factor for EMT. Tao et al. reported that b-Myb regulates snail expression to promote EMT in breast cancer [15]. Qu et.al reported c-Myb promotes development of colorectal cancer through EMT [16]. Our study has revealed the significant role of MYB in GAC tumorigenesis, and further studies are needed to identify the related protein and protein interaction about MYB in GAC. TFDP3 is another MET-related gene which can also induce apoptosis and autophagy in breast cancer and prostate cancer [1719]. F2 and CNTN1 has been reported to promote EMT in gastric cancer [20]. As for our present study, we found that upregulation of MMP8, MMP11, TFDP3, F2, and CNTN1 adds the risk score (RS), and overexpression of MYB could decreases RS. Additionally, upregulation of CNTN1 is significantly related with worse pathologic grade. Overexpression of MMP8 is significantly associated with higher clinical stage, overexpression of MMP11 is significantly associated with bigger tumor size, and low expression of MYB is significantly related to worse pathologic grade and metastasis. The validation gastric cancer cohort from GEO database shows basic consistence with our findings in STAD-TCGA cohort. To sum up, we speculate that MMP8, MMP11, TFDP3, F2, and CNTN1 can promote the progression of GAC while MYB could be a tumor suppressor in GAC.

So far, most of the cancer-related genes identified in bioinformatic methods are analyzed individually by single, which cannot be a comprehensive reflection of carcinogenesis process, and the ability of these single genes remains poor in diagnosis and prognostic prediction [2124]. Moreover, most studies research on cancer of an organ without focusing one specific type of cancer or one specific process in carcinogenesis. In our point of view, it is more useful to identify a cluster of genes with clinical value in one specific cancer-related process, and this can be the highlight in the present study. However, this study also has some imperfection. Firstly, the study can be more valuable if further basic experiments are done on these 6 screened genes to explore their deep mechanism in EMT. Secondly, the validation group in our study is based on GEO database. A cohort based on PCR results from samples of clinical patients and corresponding follow-up can enhance the quality of our present study. The study on these flaws can be our next goal about precision therapy research in GAC.

In conclusion, our study indicates some related genes and pathways in the EMT process of GAC, and MMP8, MMP11, TFDP3, MYB, F2, and CNTN1 can be pivotal ERGs in GAC. The risk score formula can be a sensitive diagnostic and prognostic predictor in GAC. Risk score = 0.448115expression value of MMP8 + 0.378892expression value of MMP11 − 0.3226expression value of MYB + 1.322812expression value of TFDP3 + 0.325063expression value of F2 + 0.334197expression value of CNTN1.

Data Availability

The data used to support the findings of this study are included within the article. The gene expression data can be accessed on The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO).

Conflicts of Interest

No conflicts of interest, financial or otherwise, are declared by the authors.

Acknowledgments

This project was funded by the National Natural Science Foundation of China (No. 81870454 to Bingrong Liu).

Supplementary Materials

The data of risk score of GAC patients related to EMT (Figure 8). (Supplementary Materials)