Integrated Bioinformatics Analysis for Identifying the Significant Genes as Poor Prognostic Markers in Gastric Adenocarcinoma
Gastric adenocarcinoma (GAC) is the most common histological type of gastric cancer and imposes a considerable health burden globally. The purpose of this study was to identify significant genes and key pathways involved in the initiation and progression of GAC. Four datasets (GSE13911, GSE19826, GSE54129, and GSE79973), comprising 171 GAC and 77 normal tissues, were collected from the Gene Expression Omnibus (GEO) database and analyzed. Through integrated bioinformatics analysis, we obtained 69 differentially expressed genes (DEGs) common to the four datasets, including 20 upregulated and 49 downregulated genes. The prime module in the protein-protein interaction network of the DEGs, comprising ADAMTS2, COL10A1, COL1A1, COL1A2, COL8A1, BGN, and SPP1, was enriched in protein digestion and absorption, ECM-receptor interaction, focal adhesion, PI3K-Akt signaling pathway, and amoebiasis. Furthermore, expression and survival analyses showed that all seven hub genes were highly expressed in GAC tissues and that six of them (all except SPP1) predicted poor prognosis in GAC. Finally, we verified the high expression of these six hub genes in GAC tissues via immunohistochemistry, Western blot, and RNA quantification analysis. Altogether, using integrated bioinformatics methods, we identified six significantly upregulated DEGs as poor prognostic markers in GAC, which could be potential molecular markers and therapeutic targets for GAC patients.
Gastric adenocarcinoma (GAC), the predominant histological type of gastric cancer, is the fifth most common cancer, ranking after lung, breast, colorectal, and prostate cancers [1, 2]. GAC, also known as stomach adenocarcinoma (STAD), accounted for more than 1,000,000 new cases and more than 768,000 deaths worldwide in 2020. Although endoscopic, surgical, and systemic treatments have improved over recent decades, the mortality rate of GAC is still high and global 5-year survival rates remain unsatisfactory [1, 4]. Thus, GAC still imposes a considerable health burden globally.
Although global 5-year survival rates are relatively low, the rates in Japan and South Korea are considerably higher [5, 6], owing to early detection and screening efforts in these Asian countries. Furthermore, the reported 5-year survival rate of early-stage T1 GAC (according to the TNM classification of malignant tumors) is ∼95%, whereas advanced-stage GAC that cannot be treated surgically has a median survival of ∼9-10 months [8, 9], which further emphasizes the critical importance of early detection and diagnosis.
Molecular markers are vital for early detection of cancer [10–12]. To date, several biomarkers have been used for the diagnosis and clinical staging of GAC. Among them, carcinoembryonic antigen (CEA), carbohydrate antigen 19-9 (CA19-9), and erb-b2 receptor tyrosine kinase 2 (HER2) are the most frequently used biomarkers for GAC in the clinical setting [13, 14]. However, because the current markers lack sufficient specificity and sensitivity, novel specific and sensitive molecular markers are still urgently needed, especially for early diagnosis and prognosis [13–15]. Bioinformatics analysis is a powerful and comprehensive tool for analyzing gene expression data from multiple datasets and is well suited to mining potential molecular markers from Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) datasets. Therefore, in the current study, we focused on the differentially expressed genes (DEGs) common to different GEO datasets. Gene ontology (GO) and KEGG enrichment analyses were conducted to identify the functions and key pathways enriched in the common DEGs. A protein-protein interaction (PPI) network of the DEGs was constructed, and core genes were determined via the Cytoscape Molecular Complex Detection (MCODE) app. In addition, DAVID, GEPIA, and Kaplan–Meier plotter were applied to re-analyze the pathway enrichment, expression, and survival information of the core genes, respectively [16, 17]. Finally, immunohistochemistry, Western blot, and RNA quantification analysis were performed to validate the expression of the identified genes in GAC tissue samples.
2. Materials and Methods
2.1. Microarray Data Information
NCBI-GEO is a free public database that provides gene expression profiles for numerous cancers. The following criteria were used to screen the datasets and ensure that relevant data were included: (I) the samples include gastric adenocarcinoma and normal tissues; (II) the study type is expression profiling by array; (III) the species is limited to Homo sapiens; (IV) access to raw data is allowed. We obtained the gene expression profiles of GSE13911, GSE19826, GSE54129, and GSE79973 in gastric adenocarcinoma and paired normal tissues. The microarray data of GSE13911, GSE19826, GSE54129, and GSE79973 were all based on the GPL570 platform ([HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array) and included a total of 171 GAC tissues and 77 normal tissues.
2.2. DEGs Identification
Background correction and normalization were conducted using the robust multi-array average (RMA) and Microarray Suite (MAS) approaches. The GEO2R online tool was used to identify DEGs between the GAC specimens and normal specimens with |log2FC| > 2 and an adjusted P value < 0.05 [16–18]. Then, the resulting DEG lists were compared using online Venn software to identify the DEGs common to all four datasets. DEGs with log2FC > 0 were considered upregulated genes, while DEGs with log2FC < 0 were considered downregulated genes [16, 17].
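As an illustration, the thresholding and intersection steps described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the GEO2R or Venn software implementation: the gene names (GENE_A to GENE_D), the numeric values, and the function names are hypothetical, and the requirement that a common DEG has the same direction in every dataset is an assumption made here for clarity.

```python
from typing import Dict, List, Tuple

# Hypothetical per-dataset results: gene -> (log2FC, adjusted P value).
# All names and numbers are invented for illustration only.
Results = Dict[str, Tuple[float, float]]


def select_degs(results: Results, lfc_cutoff: float = 2.0,
                padj_cutoff: float = 0.05) -> Dict[str, str]:
    """Return {gene: 'up' | 'down'} for genes with |log2FC| > cutoff and adj. P < cutoff."""
    degs = {}
    for gene, (log2fc, padj) in results.items():
        if abs(log2fc) > lfc_cutoff and padj < padj_cutoff:
            degs[gene] = "up" if log2fc > 0 else "down"
    return degs


def common_degs(per_dataset: List[Results]) -> Dict[str, str]:
    """Keep genes that pass the thresholds, with the same direction, in every dataset."""
    selected = [select_degs(r) for r in per_dataset]
    shared = set(selected[0])
    for degs in selected[1:]:
        shared &= set(degs)
    first = selected[0]
    return {g: first[g] for g in shared if all(d[g] == first[g] for d in selected)}


dataset_1 = {"GENE_A": (3.1, 0.001), "GENE_B": (-4.2, 0.0005),
             "GENE_C": (2.5, 0.2), "GENE_D": (1.5, 0.001)}
dataset_2 = {"GENE_A": (2.4, 0.01), "GENE_B": (-3.0, 0.002),
             "GENE_C": (2.6, 0.01), "GENE_D": (2.2, 0.01)}

common = common_degs([dataset_1, dataset_2])
print(sorted(common.items()))
```

In this toy input, GENE_C fails the adjusted P value cutoff and GENE_D fails the fold-change cutoff in the first dataset, so neither is retained as a common DEG.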
2.3. GO and Pathway Enrichment Analysis
The functional and pathway enrichment of candidate DEGs was analyzed using DAVID (the Database for Annotation, Visualization and Integrated Discovery, https://david.ncifcrf.gov/), an online bioinformatics tool that integrates GO and pathway enrichment analysis [20, 21]. Through DAVID, we identified the biological properties of the common DEGs and visualized their enrichment in molecular function (MF), cellular component (CC), biological process (BP), and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways [16, 17].
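To make the idea of term enrichment concrete, the following sketch computes a one-sided hypergeometric P value for the overlap between a gene list and an annotation term. It is an illustrative assumption only: DAVID reports its own modified Fisher exact statistic (the EASE score), which this plain hypergeometric test does not reproduce, and the function name and the counts used below are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided P(X >= k) for term over-representation.

    N: background genes, K: background genes annotated to the term,
    n: genes in the submitted list, k: list genes annotated to the term.
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the hypergeometric probabilities of observing k or more annotated genes.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical counts: 5 of 20 list genes hit a term covering 200 of 20000 genes.
p_value = hypergeom_enrichment_p(k=5, n=20, K=200, N=20000)
print(f"enrichment P value: {p_value:.3e}")
```

A small P value indicates that the term is hit more often than expected for a random list of the same size; in practice, a multiple-testing correction across terms would also be applied.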
2.4. Integration of PPI Network and Modular Analysis
The STRING (Search Tool for the Retrieval of Interacting Genes, https://cn.string-db.org/) online tool was used to evaluate the PPI information of the DEGs. Then, the STRING app in Cytoscape was employed to examine the potential correlations between these DEGs (maximum number of interactors = 0 and confidence score ≥ 0.4). In addition, the MCODE app in Cytoscape was used to identify the modules and hub genes of the PPI network (degree cutoff = 2, max. depth = 100, k-core = 2, and node score cutoff = 0.2) [16, 17]. PPI network properties, namely node degree and betweenness centrality, were visualized by shape size and label font size, respectively.
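Two of the parameters above, the confidence score cutoff and the k-core value, can be illustrated with a small graph sketch. This is not the MCODE algorithm, which additionally scores and expands clusters from seed nodes; it only shows, under simplifying assumptions, how edges below a confidence score are dropped and how a 2-core is obtained by repeatedly pruning nodes with fewer than two neighbors. The protein names (P1 to P5) and scores are hypothetical.

```python
from collections import defaultdict


def build_graph(scored_edges, min_score=0.4):
    """Build an undirected adjacency map, keeping edges with score >= min_score."""
    adj = defaultdict(set)
    for a, b, score in scored_edges:
        if score >= min_score and a != b:
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)


def k_core(adj, k=2):
    """Return the k-core: repeatedly remove nodes with fewer than k neighbors."""
    core = {node: set(nbrs) for node, nbrs in adj.items()}
    changed = True
    while changed:
        changed = False
        low_degree = [node for node, nbrs in core.items() if len(nbrs) < k]
        for node in low_degree:
            # Detach the node from its remaining neighbors before dropping it.
            for nbr in core.pop(node):
                core[nbr].discard(node)
            changed = True
    return core


# Hypothetical scored interactions (protein_a, protein_b, confidence score).
edges = [("P1", "P2", 0.9), ("P2", "P3", 0.8), ("P1", "P3", 0.7),
         ("P3", "P4", 0.5), ("P4", "P5", 0.3)]

graph = build_graph(edges, min_score=0.4)
degree = {node: len(nbrs) for node, nbrs in graph.items()}
core = k_core(graph, k=2)
print(degree)
print(sorted(core))
```

Here P5 never enters the network because its only edge falls below the score cutoff, and P4 is pruned from the 2-core because it has a single neighbor.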
2.5. Survival and RNA Sequencing Expression Analysis
Kaplan–Meier plotter (https://kmplot.com/) is a widely used web tool that contains expression and survival information for several cancers, including breast and gastric cancer. Survival analysis was conducted with Kaplan–Meier plotter, and the log-rank P value and hazard ratio (HR) with 95% confidence intervals were computed and shown on the plot. To validate the expression patterns of these DEGs, we used the GEPIA (Gene Expression Profiling Interactive Analysis, https://gepia.cancer-pku.cn/) website to analyze RNA sequencing expression data from the GTEx project and TCGA datasets.
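For readers unfamiliar with the estimator behind such survival curves, a minimal Kaplan–Meier sketch is given below. It is not the Kaplan–Meier plotter implementation and omits the log-rank test and hazard ratio; the follow-up times, event flags, and the split into high- and low-expression groups are hypothetical values chosen only to show the calculation.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    events: 1 = death observed, 0 = censored at that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical follow-up (months) for two expression groups.
high_curve = kaplan_meier([5, 8, 8, 12, 15], [1, 1, 0, 1, 0])
low_curve = kaplan_meier([10, 20, 25, 30], [1, 0, 1, 0])
print(high_curve)
print(low_curve)
```

Each step multiplies the running survival probability by the fraction of at-risk patients who survive that event time; censored patients leave the risk set without lowering the curve.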
2.6. Immunohistochemical (IHC) Staining
IHC staining was performed according to standard protocols to detect the protein levels of the selected genes in GAC and normal human tissue samples, using the following antibodies: anti-ADAMTS2 (bs-5858R, 1 : 500), anti-COL10A1 (BA2023, 1 : 400), anti-COL1A1 (BA0325, 1 : 400), anti-COL1A2 (BM4017, 1 : 100), anti-COL8A1 (bs-7529R, 1 : 500), and anti-BGN (bs-7552R, 1 : 500).
2.7. Western Blot
GAC and adjacent normal tissue samples were ground and lysed with RIPA buffer supplemented with protease inhibitor cocktail. Protein concentrations of the extracts were measured with the BCA assay. Western blot analysis was performed according to standard protocols using the above antibodies.
2.8. RNA Quantification
Total RNA was extracted from GAC and adjacent normal tissues with TRIzol reagent (Invitrogen) and reverse-transcribed using PrimeScript™ RT reagent kit (Takara). Quantitative real-time PCR analysis was performed on LightCycler (Roche) with TB Green® Premix Ex Taq™ II (Takara). Data were normalized to GAPDH expression. The primers used for real-time PCR were as follows: GAPDH (forward: 5′-GGA GCG AGA TCC CTC CAA AAT-3′, reverse: 5′-GGC TGT TGT CAT ACT TCT CAT GG-3′), ADAMTS2 (forward: 5′-GTG CAT GTG GTG TAT CGC C-3′, reverse: 5′-AGG ACC TCG ATG TTG TAG TCA-3′), COL10A1 (forward: 5′-CAT AAA AGG CCC ACT ACC CAA C-3′, reverse: 5′-ACC TTG CTC TCC TCT TAC TGC-3′), COL1A1 (forward: 5′-GAG GGC CAA GAC GAA GAC ATC-3′, reverse: 5′-CAG ATC ACG TCA TCG CAC AAC-3′), COL1A2 (forward: 5′-GGC CCT CAA GGT TTC CAA GG-3′, reverse: 5′-CAC CCT GTG GTC CAA CAA CTC-3′), COL8A1 (forward: 5′-GCT GCC ACC TCA AAT TCC TC-3′, reverse: 5′-CTT TCT TGG GTA CGG CTT CCT-3′), and BGN (forward: 5′-GAG ACC CTG AAT GAA CTC CAC C-3′, reverse: 5′-CTC CCG TTC TCG ATC ATC CTG-3′).
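Normalization to a reference gene is commonly expressed as a relative fold change. The sketch below uses the 2^(-ΔΔCt) calculation as an assumption, since the text states only that data were normalized to GAPDH and does not name the quantification method; the function name and the Ct values are hypothetical.

```python
def relative_expression(ct_target_tumor: float, ct_ref_tumor: float,
                        ct_target_normal: float, ct_ref_normal: float) -> float:
    """Fold change of a target gene in tumor vs. normal tissue (2^-ddCt, assumed method)."""
    delta_tumor = ct_target_tumor - ct_ref_tumor      # dCt in tumor, target minus reference
    delta_normal = ct_target_normal - ct_ref_normal   # dCt in normal, target minus reference
    return 2 ** -(delta_tumor - delta_normal)


# Hypothetical Ct values: target gene vs. GAPDH in tumor and adjacent normal tissue.
fold_change = relative_expression(22.0, 18.0, 25.0, 18.0)
print(fold_change)
```

With these invented values, the target gene reaches threshold three cycles earlier in tumor tissue after GAPDH normalization, which corresponds to an eightfold higher relative expression.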
3. Results
3.1. Identification of DEGs in Gastric Adenocarcinoma
The GEO2R online tool was used to determine the DEGs from GSE13911, GSE19826, GSE54129, and GSE79973, which yielded 484, 388, 971, and 524 DEGs, respectively (Figure 1(a)) (|log2FC| > 2 and adjusted P value < 0.05). Then, the DEGs common to the four datasets were identified with Venn diagram software. A total of 69 common DEGs were identified, including 20 upregulated genes (log2FC > 2) and 49 downregulated genes (log2FC < −2) in GAC tissues (Figures 1(b) and 1(c) and Table 1).
3.2. GO and KEGG Analysis of DEGs in Gastric Adenocarcinoma
To examine the biological properties of the 69 DEGs, DAVID was used to conduct GO and KEGG analysis. GO analysis indicated that (1) for biological process (BP), upregulated DEGs were particularly enriched in endodermal cell differentiation, collagen fibril organization, protein heterotrimerization, skin morphogenesis, and cell adhesion, and downregulated DEGs in digestion, potassium ion import, oxidation-reduction process, and bicarbonate transport (Figure 2(a)); (2) for cellular component (CC), upregulated DEGs were significantly enriched in extracellular space, collagen trimer, collagen type I trimer, and proteinaceous extracellular matrix, and downregulated DEGs in extracellular space, lysosome, apical plasma membrane, integral component of plasma membrane, and integral component of membrane (Figure 2(b)); (3) for molecular function (MF), upregulated DEGs were enriched in extracellular matrix structural constituent and heparin binding, and downregulated DEGs in inward rectifier potassium channel activity, hydrogen:potassium-exchanging ATPase activity, and G-protein activated inward rectifier potassium channel activity (Figure 2(c) and Table 2).
KEGG analysis showed that upregulated DEGs were particularly enriched in ECM-receptor interaction, focal adhesion, protein digestion and absorption, PI3K-Akt signaling pathway, amoebiasis, and platelet activation (Figure 3(a)), while downregulated DEGs were enriched in gastric acid secretion, retinol metabolism, chemical carcinogenesis, collecting duct acid secretion, glycolysis/gluconeogenesis, drug metabolism-cytochrome P450, metabolism of xenobiotics by cytochrome P450, and metabolic pathways (Figure 3(b) and Table 3).
3.3. PPI Network and Modular Analysis of DEGs
The PPI network of the DEGs was constructed with Cytoscape. The network contained 44 DEGs, including 16 upregulated and 28 downregulated genes, connected by 75 edges (Figure 4(a)). The remaining 25 DEGs were not included in the PPI network. We then applied the Cytoscape MCODE app to extract the prime module, which comprised ADAMTS2, COL10A1, COL1A1, COL1A2, COL8A1, BGN, and SPP1 among the 44 nodes. All seven hub nodes were upregulated genes (Figure 4(b)).
3.4. Re-Analysis of Seven Hub Genes by KEGG Pathway Enrichment
To further understand the pathways in which the seven hub DEGs are enriched, KEGG pathway enrichment was re-analyzed via DAVID. The seven core genes were significantly enriched in several cancer-related pathways. In detail, COL1A2, COL1A1, and COL10A1 were enriched in protein digestion and absorption; COL1A2, COL1A1, and SPP1 were enriched in ECM-receptor interaction, focal adhesion, and PI3K-Akt signaling pathway; COL1A2 and COL1A1 were further enriched in amoebiasis (Figure 5 and Table 4).
3.5. Analysis of Hub Genes via the GEPIA and Kaplan–Meier Plotter
To further validate the significance of the seven hub genes, the GEPIA and Kaplan–Meier plotter online tools were used to examine expression data and survival data, respectively. GEPIA expression data showed that all seven hub genes were highly expressed in GAC tissues compared to normal tissues (Figure 6 and Table 5). Kaplan–Meier plotter survival data showed that high expression of ADAMTS2, COL10A1, COL1A1, COL1A2, COL8A1, and BGN was associated with a significantly worse survival probability, while high expression of SPP1 showed no significant effect on patient survival (Figure 7 and Table 6).
3.6. Validation of the Expression Levels of Six Core Genes in GAC Patients
Finally, we detected the expression levels of the above six genes in GAC specimens and adjacent normal specimens by immunohistochemistry (Figure 8(a)), Western blot (Figure 8(b)), and RNA quantification (Figure 8(c)) analysis. Results showed that ADAMTS2, COL10A1, COL1A1, COL1A2, COL8A1, and BGN were highly expressed in GAC tissues compared to adjacent normal tissues (Figure 8), consistent with the GEPIA expression data.
Gastric adenocarcinoma is a lethal malignancy. In this study, we applied bioinformatics methods to four gene expression profile datasets to identify more useful prognostic molecular markers in GAC. A total of 171 GAC specimens and 77 normal specimens were included. First, we identified a total of 69 common DEGs via GEO2R and Venn software (|log2FC| > 2 and adjusted P value < 0.05), including 20 upregulated and 49 downregulated DEGs. Second, GO and KEGG pathway enrichment analysis showed that the 20 upregulated genes were enriched in endodermal cell differentiation, protein heterotrimerization, ECM-receptor interaction, focal adhesion, protein digestion and absorption, PI3K-Akt signaling pathway, amoebiasis, and platelet activation, while the 49 downregulated genes were enriched in digestion, potassium ion import, oxidation-reduction process, bicarbonate transport, inward rectifier potassium channel activity, hydrogen:potassium-exchanging ATPase activity, gastric acid secretion, retinol metabolism, and metabolic pathways. Third, a PPI network of 44 nodes and 75 edges was constructed with Cytoscape, and prime module analysis identified 7 hub genes (ADAMTS2, COL10A1, COL1A1, COL1A2, COL8A1, BGN, and SPP1), all of which were upregulated and significantly enriched in several cancer-related pathways. Furthermore, GEPIA analysis showed that all seven hub genes were highly expressed in GAC tissues. In addition, Kaplan–Meier plotter analysis showed that high expression of ADAMTS2, COL10A1, COL1A1, COL1A2, COL8A1, and BGN was associated with a significantly worse survival probability (P < 0.05), while SPP1 showed no significant association. Finally, the 6 highly expressed core genes were validated via immunohistochemistry, Western blot, and RNA quantification analysis in tissue samples.
Altogether, we identified six significantly upregulated genes as poor prognostic markers in gastric adenocarcinoma via bioinformatics analysis, which could be potential new molecular markers and effective targets for early detection and further research.
The hub genes in the main module of the PPI network of the common DEGs are mainly associated with protein digestion and absorption, ECM-receptor interaction, focal adhesion, PI3K-Akt signaling pathway, and amoebiasis. The family of collagen genes (COL10A1, COL1A1, COL1A2, etc.) is tightly clustered and participates in the above cancer-related pathways. Furthermore, studies have demonstrated a close relation between gastric adenocarcinoma and collagen genes, including COL10A1, COL1A1, COL1A2, and COL8A1. Moreover, it is well known that the PI3K-Akt signaling pathway (COL1A2, COL1A1, etc.) plays a vital role in the cell cycle and is activated in various cancers, including GAC. ADAMTS2, a member of the ADAMTS family, is a procollagen N-proteinase. Studies have shown that ADAMTS2 participates in major biological pathways and human disorders, but the relation between ADAMTS2 and GAC has rarely been studied. Furthermore, BGN, a key member of the small leucine-rich proteoglycan family, has been shown to participate in many cancers, including gastric adenocarcinoma, and is associated with poor prognosis in cancer patients. Together, our results and these related studies support an association between the hub genes, their enriched pathways, and GAC.
Expression and survival analyses demonstrated that ADAMTS2, COL10A1, COL1A1, COL1A2, COL8A1, and BGN are all highly expressed in GAC and that their high expression is associated with significantly worse survival. Previous studies have also shown that abnormal expression levels of the six hub genes could be indicators of the initiation, progression, and clinical outcome of GAC. To date, little is known about the exact mechanisms by which the six genes act in GAC initiation and progression. Using integrated bioinformatics methods, our study provides additional information and direction for future studies of GAC, which may offer new perspectives and clues for the early detection and diagnosis of GAC.
Altogether, our bioinformatics analysis identified six upregulated DEGs (ADAMTS2, COL10A1, COL1A1, COL1A2, COL8A1, and BGN) between gastric adenocarcinoma and normal tissues based on four different microarray datasets. These six genes were poor prognostic markers and may play key roles in the initiation and progression of GAC. The data presented in this study may provide new perspectives and clues for the early detection of GAC and for identifying therapeutic targets. However, further experiments are needed to verify these predictions and to clarify the underlying mechanisms.
GEO: Gene Expression Omnibus
DEGs: Differentially expressed genes
DAVID: The Database for Annotation, Visualization and Integrated Discovery
KEGG pathways: Kyoto Encyclopedia of Genes and Genomes pathways
STRING: Search Tool for the Retrieval of Interacting Genes
MCODE: Molecular Complex Detection
GEPIA: Gene Expression Profiling Interactive Analysis
TCGA: The Cancer Genome Atlas
ADAMTS2: ADAM metallopeptidase with thrombospondin type 1 motif 2
COL10A1: Collagen type X alpha 1 chain
COL1A1: Collagen type I alpha 1 chain
COL1A2: Collagen type I alpha 2 chain
COL8A1: Collagen type VIII alpha 1 chain
SPP1: Secreted phosphoprotein 1
The dataset supporting our findings is available at the following website: https://www.ncbi.nlm.nih.gov/geo/. All data generated or analyzed during this study are available from the corresponding author upon reasonable request.
This study was approved by the ethics committee of the General Hospital of Western Theater Command (APPROVAL NUMBER/ID: 2021ky141-1), and informed consent was exempted.
Conflicts of Interest
The authors have declared that there are no conflicts of interest.
Yamei Li and Yan Luo contributed equally to this work.
This study was supported by the Key Research and Development Project of Sichuan Province (Grant no. 2022YFS0259, 2022YFS0332) and General Hospital of Western Theater Command (Grant no. 2021-XZYG-C01).
H. Sung, J. Ferlay, R. Siegel et al., “Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries,” CA: A Cancer Journal for Clinicians, vol. 71, no. 3, pp. 209–249, 2021.