Abstract

Objective. Epithelial-mesenchymal transition (EMT) plays a key role in cancer initiation and progression. Herein, we aimed to develop an EMT-based prognostic signature in gastric cancer. Methods. Gene expression profiles of gastric cancer were obtained from the TCGA dataset as a training set and from the GSE66229 and GSE84437 datasets as validation sets. Using LASSO regression and Cox regression analyses, key prognostic EMT-related genes were screened to develop a risk score (RS) model. Potential small molecular compounds were predicted from the CMap database based on the RS model. GSEA was employed to explore signaling pathways associated with the RS. ESTIMATE and seven algorithms (TIMER, CIBERSORT, CIBERSORT-ABS, QUANTISEQ, MCPCOUNTER, XCELL, and EPIC) were applied to assess the relationship between the RS and the immune microenvironment. Results. This study developed an EMT-related gene signature comprising SERPINE1, PCOLCE2, MATN3, and DKK1. High-RS patients displayed poorer survival outcomes than those with low RS. ROC curves demonstrated the robustness of the model in predicting prognosis. After external validation, the RS model was an independent risk factor for gastric cancer. Several compounds were predicted for gastric cancer treatment based on the RS model. ECM receptor interaction, focal adhesion, pathways in cancer, TGF-beta, and WNT pathways were distinctly activated in high-RS samples. Also, high RS was significantly associated with increased stromal and immune scores and increased infiltration of CD4+ T cells, CD8+ T cells, cancer-associated fibroblasts, and macrophages in gastric cancer tissues. Conclusion. Our findings suggest that the EMT-related gene model may robustly predict gastric cancer prognosis, which could improve the efficacy of personalized therapy.

1. Introduction

Gastric cancer is a common aggressive malignancy and a frequent cause of cancer-related deaths globally owing to its rapid progression to advanced stages and its highly metastatic characteristics [1]. The incidence and prevalence of gastric cancer vary geographically [2]. Despite improvements in clinical outcomes achieved by standard D2 lymphadenectomy and by the development of chemotherapy and targeted therapy, the overall survival rate of gastric cancer patients remains <30% [3]. Because gastric cancer is a heterogeneous malignancy [4], survival outcomes may vary greatly even among patients with similar clinical characteristics and therapy regimens, indicating that traditional clinicopathologic characteristics are inadequate for prognosis prediction and risk stratification [5]. Hence, it is important to develop novel clinical tools for predicting the prognosis of gastric cancer.

Epithelial-mesenchymal transition (EMT), a well-characterized embryological process, is a critical molecular step in distant metastasis [6–8]. Clinically, EMT is associated with unfavorable survival outcomes in gastric cancer [9]. During the EMT process, gastric cancer cells lose the expression of cellular adhesion proteins such as E-cadherin and tight junction proteins and gain the expression of mesenchymal markers such as N-cadherin, Vimentin, and ZEB1 [10]. The mesenchymal phenotype may also increase resistance to chemotherapy and contribute to an undesirable prognosis [11]. Therefore, an in-depth understanding of the mechanisms of the EMT process in gastric cancer is required to advance specific treatment strategies. Because various large datasets are now easily accessible, studies exploring the gene signatures underlying gastric cancer have flourished [12–14]. Despite extensive research on the mechanisms of EMT in gastric cancer, the prognostic value of EMT-related genes remains inconclusive. Hence, this study constructed an EMT-based signature for predicting survival outcomes of gastric cancer patients. After external verification, this signature might serve as a robust prognostic prediction tool and assist clinical decision-making.

2. Materials and Methods

2.1. Gene Expression Profiles and Data Processing

RNA-sequencing (RNA-seq) profiles of 32 normal samples and 350 gastric cancer samples were downloaded from The Cancer Genome Atlas (TCGA) via Genomic Data Commons (GDC; https://portal.gdc.cancer.gov/), together with the matched clinical information. RNA-seq data were converted to transcripts per million (TPM) values. This dataset was used as the training set. From the Gene Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/geo/), microarray expression profiles and clinical information of 400 cases of gastric cancer were retrieved from the GSE66229 dataset on the GPL570 platform ([HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array) [15]. Furthermore, expression profiles and clinical features of 433 gastric cancer cases were obtained from the GSE84437 dataset on the GPL6947 platform (Illumina HumanHT-12 V3.0 expression beadchip) [16]. The raw microarray data were background corrected, normalized, and log transformed. The GSE66229 and GSE84437 datasets were employed as the validation sets. The “HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION” gene set was retrieved from the Gene Set Enrichment Analysis (GSEA) database (http://software.broadinstitute.org/gsea/index.jsp) [17] (Supplementary Table 1).
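
For readers unfamiliar with the TPM scale, the following minimal Python sketch shows the usual conversion from read counts for a single sample. It is an illustration only: the input format (raw counts plus transcript lengths in kilobases) and the example numbers are assumptions, and the conversion actually applied to the TCGA data may have started from a different unit.

```python
def counts_to_tpm(counts, lengths_kb):
    """Convert read counts to TPM for one sample.

    counts:     dict mapping gene -> read count (hypothetical input)
    lengths_kb: dict mapping gene -> transcript length in kilobases
    """
    # Step 1: reads per kilobase, which removes the gene-length bias.
    rpk = {gene: counts[gene] / lengths_kb[gene] for gene in counts}
    # Step 2: rescale so that the values of one sample sum to one million.
    per_million = sum(rpk.values()) / 1e6
    return {gene: value / per_million for gene, value in rpk.items()}


# Hypothetical counts and lengths, for illustration only.
example_counts = {"SERPINE1": 500, "DKK1": 100, "MATN3": 400}
example_lengths = {"SERPINE1": 2.0, "DKK1": 1.0, "MATN3": 4.0}
tpm = counts_to_tpm(example_counts, example_lengths)
```

Because every sample sums to the same total, TPM values can be compared across samples more directly than length-normalized counts that are not rescaled.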

2.2. Differential Expression Analysis

The expression of EMT-related genes in 350 gastric cancer tissue specimens was compared with 32 normal tissues in TCGA dataset using the limma package [18]. The and adjusted were set as cutoff criteria. Differentially expressed EMT-related genes were visualized into volcano plots and heatmaps.
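
Under the default settings of the topTable function in limma, adjusted values are Benjamini-Hochberg adjusted P values. Assuming that default was used here (the method is not stated in the text), the adjustment can be sketched in Python as follows; the raw P values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that adjusted values
    # never increase as the raw P value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw P values, for illustration only.
raw_p = [0.001, 0.008, 0.039, 0.041, 0.60]
adj_p = benjamini_hochberg(raw_p)
```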

2.3. Functional and Pathway Enrichment Analysis

Biological functions of differentially expressed EMT-related genes were analyzed via the clusterProfiler package, containing Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis [19]. Terms with were significantly enriched.

2.4. Small Molecular Compound Prediction

Differentially expressed genes with and adjusted were screened between the high- and low-RS groups. Then, up- and downregulated tags were separately uploaded onto Connectivity Map (CMap) [20]. The match between these genes and small molecular compounds from CMap was evaluated through a connectivity score from −1 to 1. Positive scores denote stimulative effects of compounds on the query signatures. Meanwhile, negative scores implicate inhibitory effects of compounds on the query signatures.

2.5. Generation and Verification of a Risk Score (RS) Model

In the TCGA dataset, differentially expressed EMT-related genes with prognostic value were filtered via univariate Cox regression analyses. Genes passing the significance cutoff were included in least absolute shrinkage and selection operator (LASSO) Cox regression analysis using the glmnet package [21]. The penalized Cox regression model with the LASSO penalty was employed to achieve shrinkage and variable selection. Tenfold cross-validation was performed to determine the optimal value of the penalty parameter λ. Based on this λ value, factors with the corresponding coefficients were chosen. The RS of each patient was determined from the expression levels of the selected genes and their coefficients. According to the median value, patients were split into the high- and low-RS groups. Kaplan-Meier curves and the log-rank test were employed to analyze the overall survival (OS) difference between the high- and low-RS groups. Receiver operating characteristic (ROC) analysis was conducted to evaluate the accuracy of the RS model in predicting prognosis. Furthermore, the RS model was externally validated in the GSE66229 and GSE84437 datasets.
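
A risk score of this kind is conventionally a linear combination of gene expression values weighted by the LASSO Cox coefficients. The sketch below assumes that form and illustrates the median split in Python; the coefficients, patient identifiers, and expression values are hypothetical and are not the fitted values from this study.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Sum of coefficient * expression over the signature genes."""
    return sum(coefficients[g] * expression[g] for g in coefficients)


def split_by_median(scores):
    """Label each patient 'high' or 'low' relative to the median score."""
    cutoff = median(scores.values())
    # Scores equal to the median are assigned to the low group here;
    # this tie-breaking rule is an assumption.
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical coefficients, NOT the fitted values from this study.
coefs = {"SERPINE1": 0.2, "PCOLCE2": 0.1, "MATN3": 0.3, "DKK1": 0.1}
# Hypothetical expression values for four patients.
patients = {
    "P1": {"SERPINE1": 1.0, "PCOLCE2": 2.0, "MATN3": 0.0, "DKK1": 1.0},
    "P2": {"SERPINE1": 5.0, "PCOLCE2": 1.0, "MATN3": 2.0, "DKK1": 0.0},
    "P3": {"SERPINE1": 0.0, "PCOLCE2": 0.0, "MATN3": 1.0, "DKK1": 1.0},
    "P4": {"SERPINE1": 3.0, "PCOLCE2": 3.0, "MATN3": 3.0, "DKK1": 3.0},
}
scores = {pid: risk_score(expr, coefs) for pid, expr in patients.items()}
groups = split_by_median(scores)
```

Applying the same fixed coefficients to an external cohort, and then recomputing the median within that cohort, mirrors the validation procedure described in this section.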

2.6. Screening Independent Prognostic Factors

Univariate Cox regression analysis was applied to evaluate the significance of the RS model and clinical characteristics in predicting gastric cancer patients’ OS. Factors passing the significance cutoff in the univariate analysis were included in multivariable Cox regression analysis, and confounding factors were excluded. The hazard ratio (HR) and 95% confidence interval (CI) were calculated. The results were visualized as a forest plot.

2.7. Subgroup Analysis

To evaluate the predictive sensitivity of the RS model in gastric cancer OS, patients were split into subgroups based on clinical features, as follows: age (>65 and ≤65), gender (female and male), M (M0 and M1), N (N0 and N1-3), T (T1-2 and T3-4), and stage (I-II and III-IV). The survival difference between the high- and low-RS samples was compared in each subgroup.
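
The survival comparison within each subgroup rests on Kaplan-Meier estimation. As a minimal sketch of the product-limit estimator, the Python function below returns the estimated survival probability at each observed death time; the follow-up times and event indicators are hypothetical, and the log-rank test itself is not shown.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  follow-up time of each patient
    events: 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set.
        at_risk -= removed
    return curve


# Hypothetical follow-up data (months), for illustration only.
km_times = [5, 8, 8, 12, 20]
km_events = [1, 1, 0, 1, 0]
km_curve = kaplan_meier(km_times, km_events)
```

Running the estimator separately on the high- and low-RS patients of a subgroup yields the two curves that are compared in each panel.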

2.8. Development of a Prognostic Nomogram

RS and traditional clinicopathological characteristics were included in the nomogram through the rms package. To assess the performance of the nomogram in predicting 1-, 3-, and 5-year OS time, nomogram-predicted OS probability was compared with actual survival time by calibration curves. Furthermore, the predictive efficacy of this nomogram was externally verified in the GSE66229 and GSE84437 datasets.

2.9. GSEA

The GSEA method was applied for exploring the potential KEGG pathways activated in high-RS gastric cancer samples. The reference gene set was retrieved from “c2.cp.kegg.v7.1.symbols” file. The significantly enriched pathways were screened with .
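
The core of GSEA is a running-sum enrichment score computed over a ranked gene list. The Python sketch below implements only an unweighted version of that statistic; the published GSEA method additionally weights hits by their ranking metric and assesses significance by permutation, neither of which is shown. The gene names are placeholders.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score (simplified GSEA statistic).

    ranked_genes: genes ordered from most up- to most downregulated
    gene_set:     set of genes belonging to the pathway of interest
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must cover some but not all ranked genes")
    running = 0.0
    best = 0.0
    for is_hit in hits:
        # Step up on a pathway gene, step down otherwise.
        running += 1 / n_hit if is_hit else -1 / n_miss
        # Keep the maximum deviation from zero, preserving its sign.
        if abs(running) > abs(best):
            best = running
    return best


# Placeholder gene names, for illustration only.
ranked = ["A", "B", "C", "D", "E", "F"]
es_top = enrichment_score(ranked, {"A", "B"})
es_bottom = enrichment_score(ranked, {"E", "F"})
```

A positive score indicates that the pathway genes cluster at the top of the ranking (here, toward the high-RS group), and a negative score indicates clustering at the bottom.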

2.10. Estimation of Immune Score, Stromal Score, and Tumor Purity

The immune score, stromal score, and tumor purity were estimated in gastric cancer tissue specimens via the Estimation of STromal and Immune cells in MAlignant Tumor tissues using Expression data (ESTIMATE) algorithm [22].

2.11. Analysis of Immune Cell Infiltrations

To reveal the associations of the risk score and diverse tumor-infiltrating immune cells, seven algorithms including TIMER, CIBERSORT, CIBERSORT-ABS, QUANTISEQ, MCPCOUNTER, XCELL, and EPIC were applied for quantifying the infiltration levels. Differences in immune-infiltrating cell fractions were estimated between the high- and low-risk groups.

2.12. Statistical Analysis

All statistical analyses were conducted using R software (version 3.6.2; https://www.r-project.org/). Comparisons between groups were carried out with Student’s t-test and the Wilcoxon rank-sum test. The Spearman correlation test was applied to assess the correlation between immune cells. P values < 0.05 were considered statistically significant.

3. Results

3.1. Identification of Dysregulated EMT-Related Genes and Their Functions in Gastric Cancer

Following the comparison of expression of EMT-related genes between gastric cancer and normal tissues, 79 differentially expressed EMT-related genes with and adjusted were identified (Supplementary Table 2). Among them, 67 EMT-related genes were upregulated and 12 were downregulated in gastric cancer (Figures 1(a) and 1(b)). GO enrichment analyses were conducted to elucidate the functional characteristics of these differentially expressed EMT-related genes. Our data showed that these genes were markedly enriched in extracellular matrix (ECM) organization, extracellular structure organization, and collagen fibril organization (Figure 1(c)). Meanwhile, these genes were distinctly related to several key pathways like focal adhesion, ECM-receptor interaction, PI3K-Akt signaling pathway, and proteoglycans in cancer (Figure 1(d)). Hence, it is required to illustrate their clinical implications in gastric cancer.

3.2. Generation of a Prognostic EMT-Related RS Model for Gastric Cancer

By the mRNA expression profiling of TCGA dataset, we screened 35 EMT-related genes associated with OS of gastric cancer with univariable Cox regression analysis (Figure 2(a); Table 1). These genes were further analyzed using LASSO Cox regression model analysis. As a result, we generated a 4-EMT-related gene model for gastric cancer (Figures 2(b) and 2(c)). The RS was determined for each gastric cancer, as follows: . Because the median RS was convenient for clinical application, this study set the median value as the cutoff value, and patients were split into the high- and low-RS groups (Figure 2(d)). We compared the survival status between groups. In Figure 2(e), more deaths occurred in the high-RS group. Furthermore, for each patient, high RS was indicative of an unfavorable prognosis (; Figure 2(f)). However, there was no significant difference in clinical characteristics between the high- and low-RS groups (Table 2). The area under the curve (AUC) of the RS model was 0.763, indicating good performance in predicting patients’ OS (Figure 2(g)). Our univariate Cox regression analysis showed that age (), stage (), N (), and RS () were distinctly associated with a poor prognosis (Figure 2(h)). Under multivariate Cox regression analysis, age (), stage (), and RS () were independent risk factors for the gastric cancer prognosis (Figure 2(i)).
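
To make the AUC easier to interpret, the sketch below computes it in Python as the fraction of (deceased, surviving) patient pairs in which the deceased patient has the higher score. This is a simplification that treats death as a binary label and ignores follow-up time and censoring, whereas ROC analyses of survival data are usually time-dependent; the scores and labels are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: risk score of each patient
    labels: 1 if the patient died, 0 otherwise (censoring is ignored)
    """
    positives = [s for s, lab in zip(scores, labels) if lab == 1]
    negatives = [s for s, lab in zip(scores, labels) if lab == 0]
    if not positives or not negatives:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # correctly ordered pair
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical scores and outcomes, for illustration only.
example_scores = [2.1, 1.0, 1.7, 0.5]
example_labels = [1, 1, 0, 0]
example_auc = auc(example_scores, example_labels)
```

An AUC of 0.5 corresponds to random ordering and 1.0 to perfect separation of the two outcome groups.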

3.3. Subgroup Analysis of the Prognostic Value of the EMT-Related RS Model

SERPINE1, PCOLCE2, MATN3, and DKK1 expression was compared between the high- and low-RS groups. In Figure 3(a), there were increased expression levels in the high- than low-RS groups. To assess whether the EMT-related RS model could sensitively predict gastric cancer patients’ prognosis, we carried out subgroup analysis. Our data showed that high RS was predictive of undesirable survival outcomes compared with low RS in each subgroup including (; Figure 3(b)) and (; Figure 3(c)), female (; Figure 3(d)) and male (; Figure 3(e)), M0 (; Figure 3(f)) and M1 (; Figure 3(g)), N0 (; Figure 3(h)) and N1-3 (; Figure 3(i)), T1-2 ( Figure 3(j)) and T3-4 (; Figure 3(k)), stage I-II (; Figure 3(l)) and stage III-IV (; Figure 3(m)).

3.4. External Validation of the EMT-Related RS Model

The predictive efficacy of the EMT-related RS model was externally verified in the GSE66229 and GSE84437 datasets. With the same formula, we calculated the RS of each patient. In the GSE66229 dataset, patients were split into the high- and low-RS groups based on the median value (Figure 4(a)). As expected, more deaths were found in the high-RS group (Figure 4(b)). The clinical features between groups were compared, and we found that high RS was in relation to late stage, T, and M (Table 3). Furthermore, high-RS patients exhibited more undesirable survival outcomes (; Figure 4(c)). AUC of the RS model was 0.675 (Figure 4(d)). Similarly, we split patients in the GSE84437 dataset into the high- and low-RS groups (Figure 4(e)). There were more patients with dead status in the high-RS group (Figure 4(f)). In Figure 4(g), high RS was distinctly related to poor prognosis (). And AUC of the model was 0.637 (Figure 4(h)). Consistent with TCGA dataset, increased SERPINE1, PCOLCE2, MATN3, and DKK1 expression was detected in the high-RS group than the low-RS group in GSE66229 (Figure 5(a)) and GSE84437 (Figure 5(b)) datasets. Following univariate (Figure 5(c)) and multivariate (Figure 5(d)) Cox regression analyses, the RS model was markedly correlated with gastric cancer prognosis in the GSE66229 dataset. Consistently, in the GSE84437 dataset, the RS model was also a risk factor for prognosis according to univariate (Figure 5(e)) and multivariate (Figure 5(f)) Cox regression analyses. Collectively, the EMT-related RS model displayed good generalizability in clinical practice.

3.5. Development of a Prognostic Nomogram Based on the EMT-Related RS Model

Independent risk factors were included in the prognostic nomogram for gastric cancer. In the TCGA dataset, a nomogram including age, stage, and RS was constructed for predicting patients’ survival duration (Figure 6(a)). The calibration curves confirmed that the nomogram-predicted 1-, 3-, and 5-year survival probabilities were in accord with the observed survival (Figures 6(b)–6(d)). Similarly, the nomogram was developed in the GSE66229 dataset (Figure 6(e)). Its good predictive efficacy was verified by the calibration curves (Figures 6(f)–6(h)). Meanwhile, the nomogram was validated in the GSE84437 dataset (Figures 6(i)–6(l)).

3.6. Prediction of Underlying Small Molecular Compounds for Gastric Cancer Based on Dysregulated EMT-Related Genes

In total, 209 differentially expressed genes were identified between the high- and low-RS groups (Supplementary Table 3). Based on these genes, candidate compounds were predicted from the CMap database, as listed in Table 4. A mechanism of action analysis was then conducted to investigate the mechanisms shared among the compounds. As shown in Figure 7(a), the estrogen receptor agonist mechanism was shared by dienestrol and diethylstilbestrol.

3.7. Identification of the EMT-Related Gene Model Associated Signaling Pathways

In TCGA dataset, ECM receptor interaction (), focal adhesion (), pathway in cancer (), TGF-beta signaling pathway (), and Wnt signaling pathway () were markedly activated in high-RS gastric cancer specimens (Figure 7(b)). The above activated pathways were confirmed in the GSE66229 (Figure 7(c)) and GSE84437 (Figure 7(d)) datasets.

3.8. Associations between the EMT-Related RS Model and Immune Microenvironment of Gastric Cancer

Using the ESTIMATE algorithm, we estimated the stromal score, immune score, and tumor purity of gastric cancer tissues from the TCGA dataset and analyzed their relationships with the RS. Our data showed that high RS was distinctly related to increased stromal and immune scores as well as lower tumor purity in gastric cancer (Figure 8(a)). Seven algorithms (TIMER, CIBERSORT, CIBERSORT-ABS, QUANTISEQ, MCPCOUNTER, XCELL, and EPIC) were employed to estimate immune cell infiltration in each sample. We compared immune cell infiltration between the high- and low-RS groups. As shown in Figure 8(b), higher infiltration levels of CD4+ T cells, CD8+ T cells, cancer-associated fibroblasts, and macrophages were found in the high-RS group than in the low-RS group.

4. Discussion

EMT-based gene signatures have been developed in bladder cancer [23], glioma [24], and colorectal cancer [25]. EMT is closely associated with gastric cancer progression and prognosis. Increased motility and invasiveness mediated by the EMT process are key events during the initiation of cancer metastasis. However, no studies have reported the prognostic value of EMT-based signatures in gastric cancer. Here, we developed an EMT-related RS model comprising SERPINE1, PCOLCE2, MATN3, and DKK1 in gastric cancer via the LASSO method, which may classify gastric cancer patients into high- and low-risk categories. The LASSO method has been widely applied to high-dimensional data; it can select features with robust prognostic potential and weak correlations among them, thereby reducing overfitting [26].

Alterations in gene expression are associated with the carcinogenic process. Here, we screened 67 upregulated and 12 downregulated EMT-related genes in gastric cancer. These genes were distinctly enriched in ECM organization, extracellular structure organization, and collagen fibril organization as well as in several cancer-related pathways such as focal adhesion, ECM-receptor interaction, PI3K-Akt signaling pathway, and proteoglycans in cancer, highlighting their critical implications in gastric cancer pathogenesis. By the LASSO method, we generated an EMT-based signature containing SERPINE1, PCOLCE2, MATN3, and DKK1. After validation, this signature was independently predictive of survival outcomes. Previously, SERPINE1 upregulation was found in gastric cancer and was associated with unfavorable prognoses [27]. Furthermore, it was tightly correlated with the EMT process in gastric cancer [28]. As an oncogene, it may facilitate tumor cell proliferation, migration, and invasion in gastric cancer by mediating the EMT process [29]. Roles of SERPINE1 in angiogenesis and metastasis in gastric cancer have also been reported [30]. MATN3 was aberrantly methylated and dysregulated in gastric cancer and was related to an undesirable prognosis [31]. DKK1, an inhibitor of Wnt signaling, was also associated with survival outcomes of gastric cancer [32]. Nevertheless, more research is needed to investigate the roles of PCOLCE2 in gastric cancer progression. To facilitate personalized prediction of a patient’s prognosis, we generated a nomogram incorporating the RS model and traditional clinicopathological characteristics. The nomogram-predicted survival probabilities were highly consistent with actual survival probabilities.

Several small molecular compounds, such as puromycin, trolox C, cloxacillin, indoprofen, diethylstilbestrol, and caffeic acid, were predicted for treating gastric cancer based on the RS model. In future studies, we will experimentally verify the therapeutic effects of these compounds against gastric cancer. Our GSEA demonstrated that ECM receptor interaction, focal adhesion, pathways in cancer, TGF-beta signaling pathway, and Wnt signaling pathway were markedly activated in high-RS gastric cancer, indicating that the model was associated with these pathways. The immune microenvironment plays a key role in tumor progression. Our further analysis found tight associations between the model and the immune microenvironment. This indicated that EMT might participate in reshaping the immune microenvironment of gastric cancer, which will be validated in our future research.

5. Conclusion

Collectively, our study established an EMT-based signature that may robustly predict gastric cancer prognosis and improve the efficacy of personalized therapy. The predictive performance will be verified in a larger cohort of gastric cancer.

Abbreviations

EMT:Epithelial-mesenchymal transition
RNA-seq:RNA-sequencing
TCGA:The Cancer Genome Atlas
GDC:Genomic Data Commons
TPM:Transcripts per million
GEO:Gene Expression Omnibus
GSEA:Gene Set Enrichment Analysis
GO:Gene Ontology
KEGG:Kyoto Encyclopedia of Genes and Genomes
RS:Risk score
LASSO:Least absolute shrinkage and selection operator
OS:Overall survival
ROC:Receiver operating characteristic
HR:Hazard ratio
CI:Confidence interval
CMap:Connectivity Map
ESTIMATE:Estimation of STromal and Immune cells in MAlignant Tumor tissues using Expression data

Data Availability

The data used to support the findings of this study are included within the supplementary information files.

Conflicts of Interest

The authors declare no conflicts of interest.

Authors’ Contributions

Huiyong Xu and Huilai Wan contributed equally to this work.

Acknowledgments

This research was supported by the Medical and Health Guidance Project of Xiamen (3502Z20184041 and 3502Z20184042).

Supplementary Materials

Supplementary 1. Supplementary Table 1: a list of EMT-related gene signatures.

Supplementary 2. Supplementary Table 2: differentially expressed EMT-related genes in gastric cancer.

Supplementary 3. Supplementary Table 3: differentially expressed genes between high- and low-RS groups.