Screening Driving Transcription Factors in the Processing of Gastric Cancer
Background. Construction of the transcriptional regulatory network can provide additional clues on the regulatory mechanisms and therapeutic applications in gastric cancer. Methods. Gene expression profiles of gastric cancer were downloaded from the GEO database for integrated analysis. All DEGs were analyzed by GO enrichment and KEGG pathway enrichment. Transcription factors were then identified and a global transcriptional regulatory network was constructed. Results. By integrated analysis of the six eligible datasets (340 cases and 43 controls), a total of 2327 DEGs were identified, including 2100 upregulated and 227 downregulated DEGs. Functional enrichment analysis of DEGs showed that digestion was a significantly enriched GO term for biological process. Moreover, there were two important enriched KEGG pathways: cell cycle and homologous recombination. Furthermore, a total of 70 differentially expressed TFs were identified and the transcriptional regulatory network was constructed, which consisted of 566 TF-target interactions. The top ten TFs regulating the most downstream target genes were BRCA1, ARID3A, EHF, SOX10, ZNF263, FOXL1, FEV, GATA3, FOXC1, and FOXD1. Most of them are involved in the carcinogenesis of gastric cancer. Conclusion. The transcriptional regulatory network can help researchers to further clarify the underlying regulatory mechanisms of gastric cancer tumorigenesis.
1. Introduction
As one of the most common malignant tumors, gastric cancer is the third leading cause of cancer-related mortality worldwide, largely because of late presentation. Its incidence is affected by various genetic and environmental factors and shows a characteristic geographical distribution. Eastern Asia, Central and Eastern Europe, and South America are higher-risk areas, whereas Northern America and most parts of Africa are low-risk areas. Therefore, it is urgent to uncover the underlying regulatory mechanism of gastric cancer tumorigenesis and to identify useful targets for early diagnosis and treatment.
Many researchers have studied the pathogenesis of gastric cancer and searched for potential targets for diagnosis and treatment. At present, several factors, such as HER2, VEGF, FEGFR, and mammalian target of rapamycin (mTOR), have been considered as targets of therapy for gastric cancer. Using a bioinformatics approach, Jian and Chen suggested that two potentially critical transcription factors, E2F1 and STAT1, may play vital roles in the progression of gastric cancer. Recently, a TF-miRNA coregulatory network was constructed, which provided the first evidence that an altered gene network is associated with gastric cancer invasion.
To date, there are still no definitive tools for the diagnosis of gastric carcinoma, because the regulatory mechanism of gastric cancer remains unclear. Integrating multiple microarray studies may provide additional evidence for understanding this mechanism. Herein, we conducted an integrated analysis of gastric cancer microarray data and identified more candidate differentially expressed genes (DEGs) between gastric cancer and normal control tissues. Moreover, the significantly enriched functions of these genes were screened and analyzed to discover the biological processes and signaling pathways associated with gastric cancer. A transcriptional regulatory network was further constructed.
2. Materials and Methods
2.1. Microarray Data
The Gene Expression Omnibus (GEO) database is a public functional genomics data repository (http://www.ncbi.nlm.nih.gov/geo/). The database was searched with the following key words: (“gastric cancer” [MeSH Terms] OR gastric cancer [All Fields]) AND “Homo sapiens” [porgn] AND “gse” [Filter]. The study type was defined as “expression profiling by array.” All the expression profiles were measured on the Affymetrix Human Genome U133 Plus 2.0 Array platform. All the cancer and normal adjacent gastric tissues were obtained by resection during surgery and immediately frozen in liquid nitrogen.
2.2. Identifying DEGs by Information Theoretic Analysis
Firstly, the six datasets were preprocessed by background correction and normalization. The Limma package is a widely used method for the analysis of DEGs, and it was used to compare the gastric cancer and normal samples in order to identify the DEGs between the two tissue types. The P value was determined in R using the two-tailed Student’s t-test, and the false discovery rate (FDR) was then calculated. Genes with FDR < 0.01 were considered DEGs.
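The FDR filtering step can be illustrated with a short sketch. The text does not say which FDR procedure was used, so the Benjamini-Hochberg adjustment below is an assumption, and the gene names and P values are hypothetical placeholders rather than data from this study. The sketch also starts from precomputed P values and does not reproduce the Limma model itself.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


def select_degs(gene_p_values, fdr_cutoff=0.01):
    """Keep genes whose adjusted P value is below the cutoff (FDR < 0.01 here)."""
    genes = list(gene_p_values)
    fdr = benjamini_hochberg([gene_p_values[g] for g in genes])
    return {g: q for g, q in zip(genes, fdr) if q < fdr_cutoff}


# Hypothetical P values for five placeholder genes (assumed, not study data).
toy_p = {"geneA": 0.0001, "geneB": 0.004, "geneC": 0.03,
         "geneD": 0.2, "geneE": 0.001}
degs = select_degs(toy_p)
print(sorted(degs))
```

The cutoff is exposed as a parameter so that the same function can be rerun with a looser or stricter threshold when checking how sensitive the DEG list is to that choice.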
2.3. Functional Annotation of DEGs
In order to assess the changes in DEGs occurring at the cellular level and the functional clustering of DEGs, the enrichment analysis tool GeneCodis3 (http://genecodis.cnb.csic.es/analysis/) was used to uncover the biological meaning of groups of genes, including Gene Ontology (GO) categories and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway annotation.
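The idea behind this kind of enrichment analysis can be sketched as an over-representation test. The text does not state which statistical test was run in GeneCodis3, so the hypergeometric test below is an assumption, and the counts in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_term, n_degs, n_overlap):
    """Probability of seeing at least n_overlap annotated genes among n_degs
    genes drawn without replacement from n_background genes, of which
    n_term carry the annotation (upper-tail hypergeometric P value)."""
    total = comb(n_background, n_degs)
    upper = min(n_term, n_degs)
    tail = sum(
        comb(n_term, k) * comb(n_background - n_term, n_degs - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 background genes, 5 annotated with a term,
# 4 DEGs, 3 of which carry the annotation.
p_value = hypergeom_enrichment_p(20, 5, 4, 3)
print(p_value)
```

In practice one P value is computed per GO term or KEGG pathway, and the resulting list is then corrected for multiple testing before terms are reported as enriched.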
2.4. Screening the Target Sites of Potential Transcription Factors (TFs)
DEGs between gastric cancer and normal tissues could be activated or repressed by TFs. All the TFs in the human genome and the motifs of their genomic binding sites were downloaded from the TRANSFAC database. The corresponding position weight matrices (PWMs) were also downloaded for gene promoter scanning. The target sites of potential TFs were then identified. These predicted targets were intersected with the DEGs obtained from the integrated analysis to screen the differentially expressed targets. Finally, the transcriptional regulatory network was constructed and visualized with Cytoscape software.
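Promoter scanning with a PWM can be sketched as follows. This is a minimal illustration, not the procedure actually used: the count matrix is a hypothetical toy motif rather than a TRANSFAC entry, the pseudocount, uniform background, and score threshold are assumptions, and only the forward strand is scanned.

```python
from math import log2


def pwm_from_counts(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log-odds scores."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: log2(((column.get(base, 0) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def scan_promoter(sequence, pwm, threshold):
    """Return (position, score) for each window scoring at or above threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows that contain ambiguous bases
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


# Hypothetical count matrix for a toy motif "TGCA" (assumed for illustration).
toy_counts = [
    {"A": 0, "C": 0, "G": 0, "T": 8},
    {"A": 0, "C": 0, "G": 8, "T": 0},
    {"A": 0, "C": 8, "G": 0, "T": 0},
    {"A": 8, "C": 0, "G": 0, "T": 0},
]
toy_pwm = pwm_from_counts(toy_counts)
hits = scan_promoter("AATGCATTTGCAGG", toy_pwm, threshold=4.0)
print(hits)
```

A gene whose promoter contains at least one hit would be recorded as a potential target of the TF, and that list would then be intersected with the DEG list.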
2.5. Online Validation of Differentially Expressed TFs
The online tool Cancer Browser (https://genome-cancer.ucsc.edu/proj/site/hgHeatmap/) was used to verify the expression of the top ten differentially expressed TFs, that is, those regulating the most downstream target genes. We selected the TCGA stomach adenocarcinoma (STAD) gene expression dataset generated by RNAseq (Illumina HiSeq), which comprised 421 samples, including 384 cases and 37 normal controls. The dataset ID was TCGA_STAD_exp_HiSeq. We entered the names of the top ten TFs in the “Genes” field at the top of the screen and clicked the “Go” button; the resulting heat map represented the expression level of the TFs in the different samples.
3. Results
3.1. Identification of DEGs in Gastric Cancer
According to the inclusion criteria, we downloaded six gene expression profiles of gastric cancer from microarray experiments. The GEO IDs were GSE13911, GSE19826, GSE34942, GSE35809, GSE51105, and GSE57303. In total, there were 340 tumor samples and 43 normal gastric tissue samples. The types of samples were as follows: GSE13911 (26 intestinal + 6 diffuse + 4 mixed + 2 unclassified), GSE19826 (unknown Lauren subtype), GSE34942 (39 intestinal + 11 diffuse + 6 unclassified), GSE35809 (34 intestinal + 30 diffuse + 6 unclassified), GSE51105 (49 intestinal + 35 diffuse + 10 mixed), and GSE57303 (Lauren subtype not further provided). The characteristics of the eligible datasets are summarized in Table 1.
Integrated analysis of the six microarray datasets yielded expression data for 17481 genes. Using FDR < 0.01 as the statistical significance threshold, a total of 2327 DEGs were identified, including 2100 upregulated DEGs and 227 downregulated DEGs. The top ten upregulated and downregulated DEGs between gastric cancer and normal tissues are listed in Table 2.
3.2. Functional Enrichment Analysis of DEGs
GO enrichment analysis of DEGs was performed to understand their biological functions. In the present study, the three GO categories (biological process, cellular component, and molecular function) were each analyzed using the web-based software GeneCodis3. The results of enrichment analysis showed that the significantly enriched GO terms for biological process were multicellular organismal process (GO:0032501) and digestion (GO:0007586) (Table 3). Moreover, extracellular space (GO:0005615) was the significantly enriched GO term for cellular component. Notably, the significantly enriched GO term for molecular function was extracellular matrix structural constituent (GO:0005201).
Moreover, the KEGG pathway enrichment analysis indicated that cell cycle was significantly enriched (Table 4). Furthermore, several other pathways that may be closely related to transcription and translation were also significantly enriched, including spliceosome, RNA transport, ribosome biogenesis in eukaryotes, and homologous recombination.
3.3. Building Up TFs-Target Genes Regulatory Network for Gastric Cancer
In order to build the TF-target gene regulatory network for gastric cancer, we used TRANSFAC to query TFs and their potential target genes and then selected the TFs and potential target genes that were differentially expressed in gastric cancer tissues. We found a total of 70 differentially expressed TFs (54 upregulated and 16 downregulated) and 470 potential differentially expressed target genes in gastric cancer (Table 5). The transcriptional regulatory network was subsequently constructed from them. The network contained 63 TFs (49 upregulated and 14 downregulated) and 566 TF-target interactions in the context of gastric cancer (Figure 1). No differentially expressed target genes were found for the other seven TFs. In the network, the top ten TFs regulating the most downstream target genes were BRCA1, ARID3A, EHF, SOX10, ZNF263, FOXL1, FEV, GATA3, FOXC1, and FOXD1. The three hub TFs were BRCA1 (degree = 49), ARID3A (degree = 47), and EHF (degree = 42).
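Ranking TFs by the number of target genes they regulate can be sketched from a list of TF-target interactions. The edge list below is hypothetical, and the sketch assumes that degree means the number of distinct downstream targets of a TF; the text does not state whether incoming edges were also counted.

```python
from collections import Counter


def tf_out_degree(edges):
    """Count distinct target genes per TF from (tf, target) pairs."""
    unique_edges = set(edges)  # drop duplicated interactions
    return Counter(tf for tf, _ in unique_edges)


def top_hubs(edges, n=3):
    """Return the n TFs with the most targets as (tf, degree) pairs."""
    degree = tf_out_degree(edges)
    # Sort by descending degree, then by name so that ties are deterministic.
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))[:n]


# Hypothetical TF-target interactions (placeholder names, not study data).
toy_edges = [
    ("TF1", "g1"), ("TF1", "g2"), ("TF1", "g3"),
    ("TF2", "g1"), ("TF2", "g2"),
    ("TF3", "g3"),
    ("TF1", "g1"),  # duplicate interaction, counted once
]
hubs = top_hubs(toy_edges, n=2)
print(hubs)
```

The same edge list can be exported as a two-column table and loaded into Cytoscape for visualization.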
3.4. Online Validation of Differentially Expressed TFs
The top ten differentially expressed TFs were selected for validation. The online validation showed that the expression patterns of the top ten TFs were similar to those found in the integrated analysis: SOX10 and FEV were downregulated, while BRCA1, ARID3A, EHF, ZNF263, FOXL1, GATA3, FOXC1, and FOXD1 were upregulated in primary gastric adenocarcinoma compared with normal gastric tissue (Figure 2).
4. Discussion
Gastric cancer has few symptoms during the early stages, and most patients are diagnosed after the cancer has progressed to an advanced stage, which results in short survival times. The high mortality rate therefore underlines the need for early diagnosis and effective medical treatments. The transcriptional regulatory network may help to clarify the underlying regulatory mechanisms and provide additional evidence for therapeutic applications.
In this study, integrated analysis of six microarray datasets for gastric cancer identified 2327 DEGs (2100 upregulated and 227 downregulated). We also observed that digestion (GO:0007586) was a significantly enriched GO term for biological process. The homologous recombination pathway was also significantly enriched, which is in accordance with a previous study showing that homologous recombination deficiency directly compromises genomic stability and predisposes to cancer formation. Gastric carcinogenesis is a multistep and multifactorial process in which the dynamic balance between cell proliferation and apoptosis in the gastric mucosa is disrupted. When the tumor suppressor gene p53 is repressed, gastric epithelial cells proliferate excessively and the apoptosis signal cannot be initiated. We found that DEGs were significantly enriched in the p53 signaling pathway, and various pathways related to cell proliferation were also enriched, such as cell cycle, DNA replication, pyrimidine metabolism, purine metabolism, and mismatch repair. Our results suggest that these pathways may drive the tumorigenesis of gastric cancer.
Moreover, 70 differentially expressed TFs were identified and a transcriptional regulatory network was constructed. In the network, the top ten TFs regulating the most downstream target genes were BRCA1, ARID3A, EHF, SOX10, ZNF263, FOXL1, FEV, GATA3, FOXC1, and FOXD1. Most of them are involved in the progression of gastric cancer.
BRCA1 is an important tumor suppressor, which plays an essential role in maintaining genomic stability and integrity. BRCA1 was previously suggested as a good prognostic factor for gastric cancer. It was reported that downregulation of BRCA1 nuclear expression was associated with advanced stage and perineural invasion in sporadic gastric cancer. The loss of BRCA1 expression may serve as a predictive factor for the progression of gastric cancer.
ARID3A is a member of the ARID family of DNA-binding proteins. The expression of ARID3A was markedly increased in colon cancer tissue compared with matched normal colonic mucosa. A previous study suggested that strong expression of ARID3A may predict a good prognosis in patients with colorectal carcinoma, and Song et al. noted that it remains controversial whether ARID3A acts as an oncogene or a tumor suppressor. In our study, we found that ARID3A was upregulated in gastric cancer compared with normal tissues. Therefore, we speculate that ARID3A may act as an oncogene in the development of gastric cancer.
Abnormalities of SOX factors have been shown to play critical roles in cancer formation and development. SOX10 was identified as a methylated gene in digestive cancers [20, 21]. It was also reported that SOX10 exhibits tumor suppressor activity in digestive cancers by inducing tumor cell apoptosis, inhibiting invasion, and regulating epithelial-to-mesenchymal transition (EMT) through suppression of the Wnt/β-catenin signaling pathway. Consistent with that, our results also indicated that SOX10 was significantly downregulated in gastric cancer, implying that reduced SOX10 expression could be a good predictor for gastric cancer.
The expression of FOXC1 is relevant to the development, progression, and metastasis of gastric cancer, and overexpression of FOXC1 may serve as a useful marker for predicting the outcome of patients with gastric cancer. Moreover, by comparative transcriptome analysis, Feng et al. found that FOXD1 was an important differentially expressed TF between metastatic and nonmetastatic gastric cancer. Our results showed that FOXC1 and FOXD1 were upregulated in gastric cancer compared with normal tissues, which provides additional evidence for their roles as potential biomarkers.
It was reported that FOXL1 was also upregulated in pancreatic intraepithelial neoplasia. An integrated analysis of SNPs and gene expression profiles revealed that FOXL1 regulated the most important DEGs with risk-associated SNP loci, namely, IRX1, SOX1, and MSX1, which may serve as candidate biomarkers for the diagnosis and prognosis of gastric cancer. Through FOX family members such as FOXL1, hedgehog signals can induce upregulation of WNT5A, a cancer-associated gene involved in the invasion and metastasis of gastric cancer. Another study reported that FOXL1 was the first mesenchymal Modifier of Min and plays a key role in gastrointestinal tumorigenesis. We found that FOXL1 was upregulated in gastric cancer compared with normal tissues, suggesting that FOXL1 may be an important driver in the progression of gastric cancer.
The expression level of GATA3 was significantly increased in patients with gastric cancer. Another study reported that GATA3 plays an important role in tumor progression of gastric adenocarcinoma and that the downregulation of GATA3 is associated with unfavorable prognosis in primary gastric adenocarcinoma. Specifically, Keshari et al. found that low GATA3 expression was associated with deeper tumor invasion, higher lymph node metastatic status, distant metastases, and a later TNM stage. In our study, GATA3 was upregulated in gastric cancer compared with normal tissues, which may be partly related to the TNM stage of the patients. Our results suggest that GATA3 may play important roles at different developmental stages of gastric cancer. Further functional experiments are necessary to better understand the function of GATA3 in gastric cancer.
5. Conclusions
Taken together, our integrated analysis identified a set of DEGs in gastric cancer. Moreover, the results of functional enrichment analysis revealed that some biological functions or pathways may be closely related to the development of gastric cancer, including digestion, cell cycle, and homologous recombination. The constructed transcriptional regulatory network may help to further clarify the underlying regulatory mechanism of gastric cancer. The ten TFs regulating the most downstream target genes were BRCA1, ARID3A, EHF, SOX10, ZNF263, FOXL1, FEV, GATA3, FOXC1, and FOXD1.
DEGs: Differentially expressed genes
GEO: Gene Expression Omnibus
KEGG: Kyoto Encyclopedia of Genes and Genomes
The authors declare that they have no competing interests.