Abstract

Objective. Esophageal cancer (ESCA) is one of the most aggressive malignancies globally with an undesirable five-year survival rate. Here, this study was conducted for determining specific functional genes linked with ESCA initiation and progression. Methods. Gene expression profiling of ESCA was curated from TCGA (containing 160 ESCA and 11 nontumor specimens) and GSE38129 (30 paired ESCA and nontumor tissues) datasets. Differential expression analysis was conducted between ESCA and nontumor tissues with adjusted value <0.05 and |log2fold-change|>1. Weighted gene coexpression network analysis (WGCNA) was conducted for determining the ESCA-specific coexpression modules and genes. Thereafter, ESCA-specific differentially expressed genes (DEGs) were intersected. Functional enrichment analysis was then presented with clusterProfiler package. Protein-protein interaction was conducted, and hub genes were determined. Association of hub genes with pathological staging was evaluated, and survival analysis was presented among ESCA patients. Results. This study determined 91 ESCA-specific DEGs following intersection of DEGs and ESCA-specific genes in TCGA and GSE38129 datasets. They were remarkably linked to cell cycle progression and carcinogenic pathways like the p53 signaling pathway, cellular senescence, and apoptosis. Ten ESCA-specific hub genes were determined, containing ASPM, BUB1B, CCNA2, CDC20, CDK1, DLGAP5, KIF11, KIF20 A, TOP2A, and TPX2. They were prominently associated with pathological staging. Among them, KIF11 upregulation was in relation to undesirable prognosis of ESCA patients. Conclusion. Collectively, we determined ESCA-specific coexpression modules and hub genes, which offered the foundation for future research concerning the mechanistic basis of ESCA.

1. Introduction

Esophageal cancer (ESCA) ranks the eighth major cancer type as well as the sixth major cause of cancer-relevant deaths across the globe [1]. Tobacco and alcohol consumption are the main environmental risk factors of ESCA. The five-year survival rate is nearly 15% [2]. It mainly contains two histological subtypes: esophageal squamous cell carcinoma (approximately 90%) and esophageal adenocarcinoma (around 10%) [3]. Patients’ advanced clinical presentation is linked to locally late and distant metastasis, which contributes to undesirable survival outcome. Additionally, because of tumor heterogeneity and acquired drug resistance, inherent resistance to radiotherapy and chemotherapy triggers therapeutic failure and unfavorable survival rate [2]. ESCA therapy depends upon patients’ and tumors’ features, especially the tumor, node, metastasis (TNM) staging system [4]. In the early stage, patients are suitable for endoscopic resection, while those in the advanced stage receive surgical resection, chemotherapy, chemoradiotherapy, or their combination [4]. For patients with unresectable ESCA, systemic chemotherapy is applied. Additionally, immunotherapy has emerged as a therapeutic option for advanced or metastatic patients [5]. Although the therapeutic options have been steadily increasing, the molecular mechanisms underlying ESCA remain indistinct.

The pathogenesis of ESCA is a multistep process, involving distinct stages until eventually cancers [6]. Hence, to focus on the molecular mechanisms underlying the initiation and progression of ESCA may assist uncover underlying diagnostic markers or treatment targets. Weighted gene coexpression network analysis (WGCNA) is a reliable systematic biological algorithm, which may emphasize coexpression genomic modules and effectively evaluate the interactions between coexpression modules and clinical phenotypes [7]. This algorithm has been widely utilized for discovering cancer-specific modules and hub genes like bladder cancer [7], hepatocellular carcinoma [8], and lung cancer [9]. Limited studies have applied the WGCNA method to uncover the pathogenesis of ESCA. For instance, Nangraj et al. identified hub genes shared between Barrett’s esophagus and esophageal adenocarcinoma through integrated analysis of protein-protein interaction (PPI) and WGCNA [10]. Through WGCNA, miR-92b-3p was determined as a pathogenic gene in ESCA [11]. Integrated analysis of WGCNA and network pharmacology deciphered the molecular mechanisms of compound Kushen injection in ESCA treatment [12]. Here, this study adopted the WGCNA algorithm for determining specific functional modules and genes in ESCA, offering the foundation for future research concerning the mechanistic basis of ESCA.

2. Materials and Methods

2.1. Data Collection and Preprocessing

The RNA-seq data of ESCA were retrieved from the Cancer Genome Atlas (TCGA) GDC Application Programming Interface. Gene expression profiling data (read counts) were processed and transformed into gene ID Ensembl (version 90). In total, 160 ESCA and 11 normal tissues were included. Microarray expression profiling of 30 ESCC tumors and adjacent normal tissues was curated from the GSE38129 dataset [13] in the Gene Expression Omnibus (GEO; https://www.ncbi.nlm.nih.gov/gds/) repository. This dataset was in accordance with the GPL571 platform ((HG-U133 A_2) Affymetrix Human Genome U133A 2.0 Array). The raw expression profiling was background-corrected and normalized by quantile utilizing the robust multiarray average (RMA) method.

2.2. Differential Expression Analysis

Differentially expressed genes (DEGs) were selected utilizing the linear models for microarray data (limma; version 3.50.0) package through comparison of the expression profiling between ESCA and normal tissues [14]. The matched values of gene symbols following the t-test were calculated, and adjusted value <0.05 and |log2fold-change|>1 were set as the selection criteria. The volcano and heatmap of the DEGs were drawn.

2.3. WGCNA

Coexpression networks were separately established in TCGA and GSE38129 datasets utilizing WGCNA package (version 1.69) [15]. The genes with the first 25% standard deviation were chosen as the input genes. For constructing a scale-free network, the optimal soft threshold power value (β; ranging from 1 to 20) was determined with the “pickSoftThreshold” function through calculation of the scale-free fit index. Pearson’s correlation matrix was conducted for evaluating the similarity among the pairwise genes utilizing the “cor” function. Thereafter, the adjacency was determined in accordance with β and Pearson’s correlation matrix utilizing the “TOMsimilarity” function. Meanwhile, the corresponding dissimilarity (dissTOM) was determined. The modules were segmented with a dynamic cut tree algorithm, and similar modules were merged into one. Module eigengenes (MEs) that were the first principal component of gene expression patterns within a specific module were identified for each module.

2.4. Identification of ESCA-Relevant Coexpression Models

In this study, the most crucial critical feature was tissue type that was designated as ESCA tumor and normal specimens. Pearson correlation between MEs and clinical feature was analyzed. Modules that possessed the strongest correlation coefficient were determined as the ESCA-relevant coexpression models. Module membership indicates the intramodule connectivity of any gene within a given module. The higher the absolute value of module membership, the higher the negative or positive correlation between the gene with the module eigengenes. Gene significance was utilized for incorporating external information to the coexpression network. The higher the absolute value of gene significance, the higher the biological significance of a gene for tissue type. ESCA-relevant genes within the ESCA-relevant coexpression models were determined in accordance with module membership >0.8 and gene significance >0.5.

2.5. Identification of ESCA-Specific DEGs

For achieving the intersection of DEGs and coexpressed genes, an online web tool (http://bioinfogp.cnb.csic.es/tools/venny/index.html) was adopted for plotting Venn diagram.

2.6. Function Enrichment Analysis

Functional annotation of ESCA-specific DEGs was presented with the clusterProfiler package (version 4.2.0), containing Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis [16]. GO terms comprised of the biological process (BP), cellular component (CC), and molecular function (MF).

2.7. Protein-Protein Interaction (PPI) Analysis

The PPI network of ESCA-specific DEGs was conducted on the basis of the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING; version 11.0; https://string-db.org) online tool [17]. The CytoHubba plugin [18] of Cytoscape software (version 3.7.2) [19] was adopted for selecting the hub genes within the PPI network [18]. Herein, the first 10 genes were determined as hub genes.

2.8. Survival Analysis

In accordance with the optimal cutoff value determined by survival package, ESCA patients were stratified into high and low expression groups of the 10 ESCA-specific hub genes. Kaplan–Meier curves of overall survival were conducted between groups, and log-rank tests were utilized for comparing the survival differences.

2.9. Statistical Analysis

All the analyses in this study were implemented utilizing R software (version 3.5.1). Student’s t test or Wilcoxon test was adopted for comparisons between groups. Spearman correlation analysis was carried out to evaluate the interactions of the 10 ESCA-specific hub genes with pathological staging of ESCA patients. value<0.05 indicated the statistical significance.

3. Results

3.1. Exploration of DEGs in ESCA

For investigating the genetic alterations during the progression from normal to ESCA, we conducted differential expression analysis between ESCA tumors and normal tissues both in TCGA and GSE38129 datasets. In TCGA cohort, compared with 11 normal tissues, 1221 genes presented remarkable downregulation while 1169 genes displayed prominent upregulation in 160 ESCA tumors in accordance with adjusted value <0.05 and |log2fold-change|>1 (Figures 1(a) and 1(b); Supplementary Table 1). With the same selection criteria, in the GSE38129 dataset, we determined 360 upregulated and 376 downregulated genes in 30 ESCA tumors in comparison to 30 nontumor tissues (Figures 1(c) and 1(d); Supplementary Table 2).

3.2. Establishment of a Coexpression Network and Discovery of ESCA-Specific Coexpression Module in the TCGA Dataset

We first curated gene expression matrix of ESCA patients from TCGA cohort and chose the genes with the top 25% variances for subsequent analysis. No outlier sample was found, and we conducted a sample clustering tree, as shown in Figure 2(a). Thereafter, the soft threshold power value was set as 10 (scale-free topology R2 = 0.90) for constructing a scale-free network (Figure 2(b)). The adjacency matrix and the topological overlap matrix were separately developed. In total, 9 coexpression modules were clustered in accordance with the average hierarchical clustering and dynamic cutting tree (Figure 2(c)). The association of coexpression modules with clinical trait was analyzed. In Figure 2(d), the yellow module displayed the strongest correlation to tissue type, indicating that this module was strongly linked to ESCA progression. In line with module membership >0.8 and gene significance >0.5, we determined ESCA-specific genes (Figures 2(e) and 2(f)).

3.3. Development of a Coexpression Network and Discovery of ESCA-Specific Coexpression Module in the GSE38129 Cohort

The coexpression network was also developed in the GSE38129 dataset. In accordance with the mRNA expression matrix, we selected the genes with the top 25% variances. As shown in Figure 3(a), there was no outlier sample among 30 paired ESCA tumors and nontumors. Afterwards, we established a scale-free network in line with the soft threshold power value = 20 (scale-free topology R2 = 0.90; Figure 3(b)). Following construction of the adjacency matrix and the topological overlap matrix, we determined 7 coexpression modules on the basis of the average hierarchical clustering and dynamic cutting tree (Figure 3(c)). In Figure 3(d), the turquoise module presented the strongest association with tissue type, demonstrating that this module was strongly linked to ESCA progression. Following module membership >0.8 and gene significance >0.5, ESCA-specific genes were determined (Figures 3(e) and 3(f)).

3.4. Identification of ESCA-Specific DEGs and Their Biological Significance

For determining ESCA-specific DEGs, we intersected the DEGs and the ESCA-specific genes in TCGA and GSE38129 cohorts. As a result, 91 ESCA-specific DEGs were finally identified (Figure 4(a) and Table 1). Their biological significance was further evaluated through GO and KEGG enrichment analysis. In Figure 4(b) and Table 2, we noted that the ESCA-specific DEGs were remarkably linked to cell cycle progression like chromosome segregation, nuclear division, mitotic nuclear division, and sister chromatid segregation. Additionally, the ESCA-specific DEGs were in relation to ESCA progression-relevant KEGG pathways like cell cycle, DNA replication, cellular senescence, base excision repair, mismatch repair, p53 signaling pathway, homologous recombination, nucleotide excision repair, and apoptosis (Figure 4(c) and Table 3).

3.5. Establishment of a PPI Network and Discovery of ESCA-Specific Hub Genes

For uncovering the interactions of the ESCA-specific DEGs, we conducted a PPI network in accordance with the STRING online tool. As shown in Figure 5(a), there were close interactions of proteins derived from the ESCA-specific DEGs. Utilizing CytoHubba plugin, we further determined the 10 ESCA-specific hub genes among them, containing TOP2A (score = 4.45 E + 23), ASPM (score = 4.45 E + 23), CDK1 (score = 4.45 E + 23), CDC20 (score = 4.45 E + 23), CCNA2 (score = 4.45 E + 23), KIF20 A (score = 4.45 E + 23), KIF11 (score = 4.45 E + 23), DLGAP5 (score = 4.45 E + 23), TPX2 (score = 4.45 E + 23), and BUB1B (score = 4.45 E + 23; Figure 5(b)). These ESCA-specific hub genes might exert crucial roles in ESCA progression.

3.6. Association of the ESCA-Specific Hub Genes with Pathological Staging of ESCA

Further analysis was carried out for evaluating the associations of the ten ESCA-specific hub genes with diverse pathological staging of ESCA patients in TCGA cohort. Our results demonstrated that ASPM, BUB1B, CCNA2, CDC20, CDK1, DLGAP5, KIF11, KIF20 A, TOP2A, and TPX2 presented the different expression in diverse pathological stages across ESCA patients (Figures 6(a)6(j)). This indicated that the 10 ESCA-specific hub genes were remarkably linked to pathological staging of ESCA.

3.7. Association of the ESCA-Specific Hub Genes with ESCA Patients’ Prognosis

In accordance with the optimal cutoff value of the expression of the ESCA-specific hub genes, we stratified ESCA patients in TCGA cohort into high and expression groups of ASPM, BUB1B, CCNA2, CDC20, CDK1, DLGAP5, KIF11, KIF20A, TOP2A, and TPX2 (Figures 7(a)7(j)). Among them, we noted that ESCA patients in the high expression of the KIF11 group presented more undesirable overall survival outcome in comparison to those in the low expression of the KIF11 group.

4. Discussion

High-throughput sequencing technologies have improved our understanding about the heterogeneity and molecular basis underlying ESCA. At present, available biomarkers for prediction of ESCA patients’ survival outcome remain nonsufficiently sensitive and specific. Hence, this study was conducted for discovering novel biomarkers for efficiently predicting ESCA patients’ prognosis through the WGCNA algorithm, eventually lowering patients’ morbidity and mortality.

Combining the DEGs and ESCA-specific genes in TCGA and GSE38129 cohorts, we determined 91 ESCA-specific DEGs. Our functional enrichment analyses uncovered that the ESCA-specific DEGs were remarkably linked to cell cycle progression and carcinogenic pathways like the p53 signaling pathway, cellular senescence, and apoptosis. This indicated that the ESCA-specific DEGs exerted crucial roles in ESCA progression. Additionally, there were prominent interactions between proteins derived from the ESCA-specific DEGs in accordance with the PPI network. Among them, the 10 ESCA-specific hub genes were finally determined, containing ASPM, BUB1B, CCNA2, CDC20, CDK1, DLGAP5, KIF11, KIF20A, TOP2A, and TPX2.

The tumorigenic roles of ASPM have been proposed in diverse cancer types. For instance, ASPM triggers prostate carcinoma stemness and progression through enhancing the Wnt-Dvl-3-beta-catenin pathway [20]. It is predictive of undesirable prognosis and modulates cellular proliferation in bladder carcinoma [21]. Its upregulation accelerates glioblastoma growth through modulating G1 restriction point progression as well as the Wnt-beta-catenin pathway [22]. Aberrantly expressed ASPM regulated by transcriptional factor FoxM1 triggers the malignant progression of gliomas [23]. Additionally, it is linked to poor survival outcome as well as induces carcinogenesis in diffuse large B cell lymphoma [24]. Abnormally expressed ASPM induces the progression of lung squamous cell carcinoma through modulating CDK4 [25]. Increasing evidences demonstrate the crucial role of ASPM in cancer progression. For example, BUB1B accelerates prostate carcinoma proliferation through transcriptionally modulating MELK [26]. It triggers hepatocellular carcinoma development through activating mTORC1 signaling [27]. It can facilitate extrahepatic cholangiocarcinoma development through JNK/c-Jun signaling [28]. Moreover, it participates in the tumorigenicity and radioresistance of glioblastoma [29]. For CCNA2, it can be suppressed by miR-219-5p, thereby affecting cellular proliferation and cell cycle progression in ESCA [30]. A previous study has proposed that CDC20 modulates E2F1 degradation and thymidylate synthase expression, thereby triggering ESCA chemoresistance [31]. Furthermore, CDK1 has been considered as an underlying diagnostic and cancer progression biomarker as well as a drug target for ESCA [32]. Previous bioinformatics and experimental evidences have demonstrated the tumorigenic role of DLGAP5 in ESCA [33]. KIF11 is essential for spheroid formation of ESCA cells [34]. ScRNA-seq and qPCR analysis uncovered that KIF20A possesses the potential to diagnose and predict ESCA patients’ prognosis [35]. For TOP2A, experimental data demonstrate that it can affect the resistance of ESCA cells to paclitaxel [36]. Targeting TPX2 relieves ESCA progression through weakening tumor growth and invasion [37, 38]. Additionally, its upregulation is mediated by LINC00337 and triggers autophagy and resistance to cisplatin in ESCA cells [39]. On the basis of previously published literature and our findings, ASPM, BUB1B, CCNA2, CDC20, CDK1, DLGAP5, KIF11, KIF20A, TOP2A, and TPX2 play crucial roles in ESCA progression.

Currently, approach of predicting ESCA patients’ prognosis primarily depends on the conventional TNM staging system. Although conventional TNM staging is crucial for diagnosis and treatment interventions, it cannot roundly uncover the intrinsic biological processes and pathological development due to the high heterogeneity in tumor microenvironment and individual discrepancy. Our results demonstrated that the 10 ESCA-specific hub genes (ASPM, BUB1B, CCNA2, CDC20, CDK1, DLGAP5, KIF11, KIF20A, TOP2A, and TPX2) presented the remarkable associations with pathological staging, indicating that their roles in ESCA progression. Among them, KIF11 upregulation was indicative of an unfavorable survival outcome of ESCA patients, indicative of the potential of KIF11 as a prognostic indicator of ESCA.

However, there are certain drawbacks in our study. First, the influence of expression alteration of the ESCA-specific hub genes upon patients’ prognosis remains to be explored. Hence, in our future, the interactions of the ESCA-specific hub genes with patients’ prognosis will be monitored and verified in the large-scale clinical data. Additionally, it is of importance to consider statistical bias because the sample size is relatively small. Moreover, in-depth investigation will be presented for validating the biological significance of the ESCA-specific hub genes through in vitro and in vivo experiments.

5. Conclusion

Overall, this study determined the 10 ESCA-specific hub genes as novel markers for ESCA with the WGCNA algorithm based on distinct datasets, which offered promising targets for ESCA precision medicine. Nevertheless, in-depth exploration is required for validating the biological function of the specific hub genes in large‐scale clinical cohorts.

Abbreviations

ESCA:Esophageal cancer
TNM:Tumor, node, metastasis
WGCNA:Weighted gene coexpression network analysis
TCGA:The Cancer Genome Atlas
GEO:Gene Expression Omnibus
DEGs:Differentially expressed genes
GO:Gene Ontology
KEGG:Kyoto Encyclopedia of Genes and Genomes
BP:Biological process
CC:Cellular component
MF:Molecular function
PPI:Protein-protein interaction
STRING:Search Tool for the Retrieval of Interacting Genes/Proteins.

Data Availability

The data used to support the findings of this study are included within the supplementary information files.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

Wei Xu and Jian Xu contributed equally to this work.

Acknowledgments

This work was funded by Chongqing Medical Scientific Research Project (Joint Project of Chongqing Health Commission and Science and Technology Bureau) (2021MSXM287) and The Planning Decision Consultation Project of Shapingba Science and Technology Bureau (Jcd202018).

Supplementary Materials

Supplementary Table 1. The detailed information of DEGs between ESCA tumors and normal tissues in TCGA cohort. Supplementary Table 2. The detailed information of DEGs between ESCA tumors and normal tissues in the GSE38129 dataset. . (Supplementary Materials)