Colorectal cancer (CRC) is one of the most common and deadly malignancies in the world. In China, the morbidity rate of CRC has increased during the period 2000 to 2011. Biomarker detection for early CRC diagnosis can effectively reduce the mortality of patients with CRC. To explore the underlying mechanisms of effective biomarkers and identify more of them, we performed weighted correlation network analysis (WGCNA) on a GSE68468 dataset generated from 378 CRC tissue samples. We screened the gene set (module), which was significantly associated with CRC histology, and analyzed the hub genes. The key genes were identified by obtaining six colorectal raw data (i.e., GSE25070, GSE44076, GSE44861, GSE21510, GSE9348, and GSE21815) from the GEO database (https://www.ncbi.nlm.nih.gov/geo). The robust differentially expressed genes (DEGs) in all six datasets were calculated and obtained using the library “RobustRankAggreg” package in R 3.5.1. An integrated analysis of CRC based on the top 50 downregulated DEGs and hub genes in the red module from WGCNA was conducted, and the intersecting genes were screened. The Kaplan–Meier plot was further analyzed, and the genes associated with CRC prognosis based on patients from the TCGA database were determined. Finally, we validated the candidate gene in our clinical CRC specimens. We postulated that the candidate genes screened from the database and verified by our clinical pathological data may contribute to understanding the molecular mechanisms of tumorigenesis and may serve as potential biomarkers for CRC diagnosis and treatment.

1. Introduction

Colorectal cancer (CRC) is a malignant tumor that ranks third in terms of incidence and second in terms of mortality worldwide [1]. Similarly, the incidence and mortality of CRC rank fifth in China [2]. Despite dramatic reduction in the overall CRC incidence and mortality [3, 4], the morbidity rate in China is still rising from 2000 to 2011 [2]. Hence, further research is desperately needed to elucidate the causes for the increasing burden of CRC and to advance treatments for tumor subtypes with low response rates to current therapies. Treatment according to the distinctive tumor biology is an effective means to reduce the mortality of patients with CRC, as well as the detection of biomarkers that enable the stratification of patients with CRC into different prognostic subgroups and in relation therapeutic response [5]. Advances in RNA sequencing technologies and bioinformatics analysis provide novel potential biomarkers and drug targets for tumor treatment [6]. Weighted gene coexpression network analysis (WGCNA), as a systems biology algorithm, can enable the identification of highly coexpressed gene clusters (modules) [7]. Then, such interest modules and hub genes related to clinical traits can be screened out and used to identify candidate biomarkers [8]. The robust rank aggregation (RRA) method can be used to integrate multiple sets of chip data gene lists and to perform comprehensive reordering to find the most significant difference genes [9]. RRA prevents cross-platform standardization processing, and the number of samples per chip has no strict limit, which is of great significance for the effective evaluation of the results of different gene expression profiles [10].

In our study, we performed WGCNA on GSE68468 and screened out the key gene modules and hub genes significantly associated with the CRC histology. Additionally, we conducted RRA on six raw data (i.e., GSE25070, GSE44076, GSE44861, GSE21510, GSE9348, and GSE21815) and calculated the top 50 robust DEGs in all the six data by using the library “RobustRankAggreg” package. We analyzed the integrated genes between DEGs and hub genes in the key module by using Kaplan–Meier analysis in the TCGA database and obtained the candidate genes associated with OS. Finally, we validated the candidate gene in our clinical CRC specimens. The candidate genes screened from the database and verified using our clinical pathological data might have significant clinical implications for CRC diagnosis, treatment, and prognosis prediction.

2. Results

2.1. Weighted Gene Coexpression Network Construction and Module Detection

We performed WGCNA to find the highly correlated genes. The sample dendrogram and trait heatmap are shown in Figure S1A. As shown in Figure S1B, power value 4 was set to guarantee the high-scale independence and low mean connectivity of 13515 genes. We set the dissimilarity as 0.25 to merge similar modules (Figure S1C), and 22 modules were generated (Figure 1(a)). Furthermore, the interaction relationship network of 22 modules was plotted (Figure 1(b)). From those obtained modules, the red module had the deepest association with tumor histology (cor = −0.76, ), which was selected for further analysis (Figure 1(c)). Additionally, the module memberships in the red module and the gene significance had a high correlation (0.89) and a high value (<1e − 200) (Figure 1(d)), suggesting its suitability for identifying the hub genes associated with CRC occurrence and metastasis.

2.2. Coexpression Network Construction and Hub Gene Identification

In our study, we obtained 616 genes in the red modules. The hub genes in the red module were filtered by a condition of the weight value being greater than 0.15, and a total of 37 edges were obtained and are visualized in Figure 2. The top 10 hub genes were CA2, MS4A12, DHRS11, GUCA2A, ETHE1, CLCA4, TSPAN1, HSD11B2, AQP8, and CHP2.

2.3. RRA Analysis

We calculated DEGs expressed between the cancerous and adjacent tissues in each dataset and displayed the results by using volcano maps (Figures S2af). Then, DEGs in all six data were recalculated and reorganized using the library “RobustRankAggreg” package. A total of 464 robust DEGs were identified, including 176 upregulated genes and 288 downregulated genes (Table 1), by using and |logFC| ≥ 1 as the cutoff criteria. The 50 most prominent up- and downregulated genes were screened and are visualized in Figures 3(a) and 3(b).

2.4. Survival Analysis in the TCGA Database and Validation in the GEO Database

By taking the intersection of the 50 downregulated DEGs and hub genes in the red module from WGCNA (Figure 4(a)), we obtained 19 interacting genes.

K-M analysis was conducted to evaluate the relationship between gene expression and the overall survival (OS) of CRC, and only GUCA2A was clearly related to the OS of CRC patients in the TCGA database. Patients with a lower GUCA2A expression had a significantly shorter OS than those with a higher expression () (Figure 4(b)). Obviously, GUCA2A was abnormally expressed in tumor tissues and was significantly different in TCGA and GEO databases (Figures 4(c) and 4(d)). We further validated the aberrant expression of GUCA2A in GSE68468, which contains both CRC tissues and cellular RNA-Seq data. Compared with adjacent normal tissues, the expression of GUCA2A in tumor and metastatic tissues was significantly low, and the normal liver and lung tissues had the lowest expression value () (Figure 4(e)). The colonic epithelial cell (NCM460) has the highest GUCA2A expression compared with other CRC cells (Figure 4(f)).

2.5. Human Tissue Samples and Quantitative Real-Time PCR

We performed real-time PCR to examine the levels of GUCA2A in 31 paired CRC/adjacent tissues to further validate the dysregulated expression of GUCA2A (Figure 4(g)). Then, we further evaluated the diagnostic value of GUCA2A for CRC tissues and normal counterparts using ROC curve analyses. The results showed that the area under the ROC curve (AUC) was 0.835 (; sensitivity: 0.806; specificity: 0.903). These results suggest that GUCA2A downregulation may play an important role in colorectal tumorigenesis and has a potential diagnostic value for CRC patients.

2.6. Pathway Analysis

The pathway enrichment analysis of GUCA2A coexpressed genes was conducted in the ConsensusPathDB human database. From the 16 pathways shown in Figure 5, the transport of small molecules and metabolism are prominent.

3. Discussion

Early detection and complete resection before metastasis are considered the only curative therapy for CRC [11]. The five-year survival rate of CRC patients was obviously better when the localized disease was detected at an early stage than that of CRC patients with distant metastasis. Cancer is a molecularly heterogeneous disease, and none of the currently identified biomarkers are sensitive and specific enough for reliable CRC screening in the clinical setting. Thus, identifying novel molecular biomarkers has significant clinical benefits.

In our study, we first performed WGCNA for GSE68468 and screened the pathologically related hub genes. We conducted RRA analysis for six datasets and screened the top 100 robust DEGs, which had a high or low expression in all expression profiles. By taking the intersection, we obtained 19 candidate genes, and only the expression of GUCA2A was associated with the OS of CRC patients in the TCGA database. We found that GUCA2A was prominent and top-ranking in both the hub gene network (Figure 2) and robust DEGs (Figure 3(b)), indicating its value in tumorigenesis.

Guanylin (GUCA2A) and uroguanylin (GUCA2B) are two peptide hormones that function as paracrine endogenous ligands for the guanylate cyclase-C (GUCY2C) receptor [12]. A previous study indicated the role of GUCY2C signaling in facilitating mucosal wounding and inflammation mediation, in part, through the control of resistin-like molecule β production [13]. GUCA2A, GUCA2B, and GUCY2C are downregulated in inflammatory bowel disease [14], which may have implications in inflammatory bowel disease pathogenesis. A recent study suggests that GUCY2C can suppress tumor progress [15] in the intestine, and the loss of the GUCY2C signaling cascade increases CRC susceptibility [16]. Intestinal homeostasis disruption and CRC tumorigenesis are associated with a fairly common loss of GUCA2A and GUCA2B [1719]. Bashir et al. revealed the possibility of GUCA2A loss silencing GUCY2C, which leads to microsatellite instability tumor [20].

Consistent with the expression of GUCA2A in our study, the expression of GUCA2B was significantly downregulated in CRC tissues and had a relative high weight in the red module, which further indicates that GUCA2A and GUCA2B may play a consistent role in CRC neoplasia.

In conclusion, we revealed that GUCA2A was downregulated in CRC tissues. Aberrantly expressed GUCA2A can be a candidate marker of poor prognosis in patients with CRC, which may be a therapeutic target for precision medicine in cervical cancer. However, further studies are still needed to explore the underlying molecular mechanism through which GUCA2A plays a role in CRC. However, future in vivo and in vitro experiments are still required to explore the mechanisms underlying the roles of GUCA2A in CRC.

4. Methods

4.1. WGCNA Construction and Module Detection

We performed WGCNA to microarray data GSE68468 generated from 378 CRC tissue samples, and the “WGCNA” package in R 3.5.1 was used to construct a coexpression network. WGCNA seeks to identify modules of densely interconnected genes by searching for genes with similar patterns of connection strengths or high topological overlap. For each dataset, Pearson correlation coefficients were calculated for all pairwise comparisons of expressed genes across all samples. Genes with similar expression profiles were classified into modules based on the TOM dissimilarity with a minimum size of 30 for the gene dendrogram and visualized via hierarchical clustering [7]. Then, the modules whose eigengenes were highly correlated (correlation above 0.7) were merged. The gene network was visualized with randomly selected 400 genes.

The resulting Pearson correlation matrix was transformed into a matrix of connection strengths (that is, an adjacency matrix) by using a power function (connection). All the modules were assigned to the corresponding color. The relationships of modules and clinical traits (i.e., disease status, histology, and organism part) were calculated. Among these clinical traits, pathology, including normal mucosa, polyps, CRC tissue, CRC with metastases, and normal lung/liver tissues, can reflect the occurrence and metastasis of CRC. The associations of individual genes with the clinical trait, namely, gene significance (GS), and the module eigengenes, namely, module membership (MM), were evaluated. Then, the correlation between GS and MM was calculated, and the highly correlated interest module can be used to construct the coexpression network and identify the hub genes.

4.2. Coexpression Network Construction and Hub Gene Identification

The genes in the key modules screened from WGCNA were further analyzed.

We filtered the hub genes by a condition of the weight value (>0.15) and visualized them using Cytoscape 3.6.0 [21].

4.3. Robust Rank Aggregation (RRA) Analysis

To further increase the reliability of the results and screen out the ideal candidate, we enrolled six published colorectal microarray data (i.e., GSE25070, GSE44076, GSE44861, GSE21510, GSE9348, and GSE21815) [2231] from the GEO database, which have 530 CRC tissues and 50 normal tissues in our study (Table 2). After screening DEGs ( and |logFC| ≥ 2) in each dataset by using the “limma” package [32], the RRA method was used to identify significantly robust DEGs by using the library “RobustRankAggreg” package in R 3.5.1. The statistically significant DEGs were defined to have and |logFC| ≥ 1. Finally, the top 50 up- and downregulated DEGs were selected and visualized by using a heatmap.

4.4. Survival Analysis in the TCGA Database

In order to increase the reliability of the results, the intersection of hub genes in the red module from WGCNA and 50 downregulated robust DEGs was performed and analyzed. Kaplan–Meier (K-M) analysis was conducted to evaluate the relationship between gene expression and the overall survival (OS) of 617 CRC patients in The Cancer Genome Atlas (TCGA) database (https://cancergenome.nih.gov/). Patients were classified into high- or low-expression groups according to the median value. Then, genes associated with CRC survival were screened.

4.5. Human Tissue Samples and Quantitative Real-Time PCR

Overall, the 31 CRC and adjacent normal tissues obtained at Jiangsu Cancer Hospital (Nanjing, China) were frozen immediately after surgical resection and kept at 80°C until further analysis. Tumor histopathology was classified according to the World Health Organization Classification of Tumors system. The present study was done with the approval of the local ethics committee.

RNA isolation (Takara, Dalian, China) was performed according to the manufacturer’s instructions. Reverse transcription was conducted with a PrimeScript RT reagent kit (Takara, Kusatsu City, Shiga, Japan) and the fluorescent quantitative experiment with ABI qPCR 7300 (Torrance, CA). The PCR reactant mix consisted of 2 µl cDNA solution, 10 μl 2× PowerUp™ SYBR™ Green Master Mix (Thermo Fisher Scientific, Carlsbad, CA), 0.5 μl of 10 μM forward and reverse PCR primer, and 7 μl DNA template Nuclease-Free Water. The PCR conditions were set as follows: denaturation at 50°C for 2 min, 95°C for 2 min, 95°C for 15 s, and 60°C for 1 min with 40 cycles. A GAPDH primer set was used as an internal control.

4.6. Pathway Analysis

We performed pathway enrichment analysis for the coexpressed genes to explore the possible mechanism of the candidate gene in CRC. The coexpressed genes were obtained from the TCGA database, and the top 100 genes with the highest Pearson correlation coefficient were considered to be significantly coexpressed (Table 3). Pathway enrichment analysis was performed by using the ConsensusPathDB human database (http://cpdb.molgen.mpg.de/) [33]. The overrepresentation gene set analysis was utilized, and the following pathway databases were enrolled in our analysis: Manual upload, NetPath, SignaLink, PID, EHMN, HumanCyc, INOH, KEGG, BioCarta, WikiPathways, SMPDB, and PharmGKB. Minimum overlap input list >5 and value cutoff <0.01 were considered significant enrichment.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.


This study was supported in part by grants from the Jiangsu Provincial Key Research Development Program (BE2016794 to Jifeng Feng and BE2016795 to Jianzhong Wu).

Supplementary Materials

S1: sample dendrogram and soft-thresholding value estimation. (A) Sample dendrogram and trait heatmap. The three traits correspond to the disease status, histology, and organism part, respectively. (B) Scale independence and mean connectivity of soft-thresholding values (β). (C) Clustering of module eigengenes. The dissimilarity was set as 0.25 to merge the similar modules. S2: volcano plots of DEGs between cancerous and adjacent tissues in six microarray data. (Supplementary Materials)