A 6-Gene Risk Signature Predicts Survival of Glioblastoma Multiforme
Background. This study aims to develop novel signatures for glioblastoma multiforme (GBM). Methods. GBM expression profiles from The Cancer Genome Atlas (TCGA) were downloaded and DEGs between tumor and normal samples were identified by differential expression analysis (DEA). A risk signature was developed by applying weighted gene coexpression network analysis (WGCNA) and Cox regression analysis. Patients were divided into high and low risk group, followed by evaluating the performance of the signature via Kaplan-Meier curve analysis. In addition, the prognostic significance of the signature was further validated using an independent validation dataset from Chinese Glioma Genome Atlas (CGGA). DEGs between high and low risk group were subjected to functional annotation. Results. A total of 748 DEGs were identified between primary tumor and normal samples. Following WGCNA and Cox regression analysis, 6 DEGs were identified and used to construct a risk signature. The signature showed high performance in both training and validation dataset. Subsequently, 397 DEGs were identified between high and low risk group. These DEGs were mainly enriched in terms related to calcium signaling, cAMP-mediated signaling, and synaptic transmission. Conclusions. The risk signature may contribute to GBM diagnosis in future clinical practice.
As the most malignant and frequently occurring tumor of the central nervous system (CNS), glioblastoma multiforme (GBM) has been considered as a Grade IV glioma according to World Health Organization (WHO) classification [1–3]. Prominent features of GBM include enhanced tumor cell proliferation, migration, and invasion . Recently, the prognosis of GBM has been gradually improved as a result of advances in surgical resection, radiotherapy, and adjuvant chemotherapy . However, GBM remains a deadly tumor with a median survival of only 15 months [2, 5].
Gene expression aberrations are universal events in cancers and may contribute to cancer development and progression . For example, amplification and overexpression of epidermal growth factor receptor (EGFR) is found in more than 30% of GBM . It has been shown that GBM tumor cells with EGFR amplification have higher infiltration ability and inhibition of EGFR activity suppresses tumor cell growth . In addition, the expression level of inhibitor of growth family member 4 (ING4), which may inhibit tumor cell growth by suppressing nuclear factor kappa B (NF-κB) signaling, has been significantly reduced in GBM .
Recent advancements in bioinformatics and high-throughput sequencing have led to the identification of numerous tumor biomarkers, which may allow for more accurate outcome prediction and better management of GBM [10, 11]. Sreekanthreddy et al. identified serum osteopontin (OPN) as a biomarker of GBM . High level of OPN was confirmed as an indicator of poor outcome of GBM . Colman et al. proposed a 9-gene signature as a predictor of GBM outcome, which showed a close association with markers of glioma stem like cells, including nestin and CD13 . Besides, Arimappamagan et al. established a 14-gene prognostic signature with high accuracy in distinguishing high risk GBM patients from low risk patients . These markers may be integrated into state-of-the-art diagnosis and decision-making processes in future clinical practice. Despite these progressions, more robust prognostic predictors are still needed for GBM treatment.
In our study, we analyzed the expression data of GBM from The Cancer Genome Atlas (TCGA) and Chinese Glioma Genome Atlas (CGGA) and identified differentially expressed genes (DEGs) by differential expression analysis. Subsequently, a 6-gene signature was identified by weighted gene coexpression network analysis (WGCNA) and Cox regression analysis. The signature showed high performance in predicting GBM clinical outcome and may serve as a novel predictor of GBM outcome in future clinical practice.
2. Materials and Methods
2.1. Data Source
The GBM expression dataset (Illumina HiSeq 2000 RNA Sequencing) from TCGA (https://portal.gdc.cancer.gov/) was downloaded in February, 2019. A total of 173 samples (154 primary tumors, 14 recurrent tumors, and 5 solid tissue normal samples) were included in the dataset. Primary tumor and solid tissue normal samples were selected as training samples for further study. Another GBM expression dataset under accession code “part D”  were downloaded from CGGA (http://cgga.org.cn/). The dataset included a total of 272 GBM samples, 138 of which were used as validation samples.
2.2. Screening of DEGs
DEGs between primary tumor and solid tissue normal groups were screened using the limma package (version 3.34.7, https://bioconductor.org/packages/release/bioc/html/limma.htm) of R3.4.1 . The selection criteria were fold discovery rate (FDR) < 0.5 and > 0.5. Based on the expression values of the DEGs, two-way hierarchical clustering analysis was performed using pheatmap (version 1.0.8, https://cran.r-project.org/web/packages/pheatmap/index.html) .
WGCNA is a bioinformatic method based on high throughput expression data, which is used for the construction of coexpression network . All the expressed genes from TCGA dataset were subjected to WGCNA (version 1.61, https://cran.r-project.org/web/packages/WGCNA/) . Specifically, Pearson coefficients were generated for all pairwise comparisons of genes and the resulting coexpression matrix was transformed to an adjacency matrix using soft threshold power. The soft threshold power was the value where the square of the correlation coefficient between (k, the number of connected nodes) and (p(k), the probability of k connected nodes) reached 0.9. Subsequently, in order to group genes into different modules, the dissimilarities between genes in the adjacency matrix were generated and hierarchical clustering was performed using the dynamic hybrid tree-cutting method (cutHeight = 0.9; the minimum module size = 50).
The DEGs were then mapped to the WGCNA modules. A hypergeometric algorithm was used to calculate the value of fold enrichment , which was defined aswhere indicated the total number of genes analyzed by WGCNA, indicated the number of genes in each module, indicated the number of DEGs and indicated the number of DEGs mapped to each module. The modules with fold enrichment > 1 and p < 0.05 were selected as disease associated modules.
2.4. Identification of GBM Associated Gene Signature
In order to identify a GBM related signature genes, univariate and multivariate Cox regression analysis were performed using the survival package (version 2.41-1, https://cran.r-project.org/web/packages/survival/index.html) of R3.4.1. The selection criterion was log-rank p < 0.05. Based on the expression levels of the signature genes, a risk signature was then formulated as where indicated the coefficient of derived from multivariate Cox regression whereas indicated the expression level of .
The risk score of each sample was calculated according to the above formula. The median of risk score value was used as the threshold to divide the training samples into high and low risk group. The prognostic significance of the risk signature was assessed by Kaplan-Meier curve analysis using the survival package. Subsequently, the prognostic significance of the risk signature was validated using the CGGA dataset with the same procedure.
2.5. Functional Characterization of the Different Prognosis
DEGs between high and low risk group of the training dataset were further screened by limma package. The selection criteria for DEGs were FDR < 0.5 and > 0.5. The resulting DEGs were subjected to functional annotation analysis using the clusterProfiler package (http://bioconductor.org/packages/release/bioc/html/clusterProfiler.html) of R3.4.1 . The selection criterion for GO biological processes and KEGG pathways was FDR < 0.05.
3.1. DEGs between Tumor and Normal Samples
Following differential expression analysis, a total of 748 DEGs (218 upregulated and 530 downregulated) (Figure 1(a)) were identified between primary tumor and solid tissue normal samples. Then the specificity of the DEGs was evaluated by two-way hierarchical clustering analysis. According our results, primary tumor and solid tissue normal samples were divided into two clusters, which showed completely different overall expression patterns (Figure 1(b)).
3.2. Disease Related WGCNA Modules and Genes
All the expressed genes from TCGA dataset were used as input for WGCNA. The soft threshold power used for matrix transformation was determined as 18, where the square of the correlation coefficient between and reached 0.9 and the mean connectivity of the co-expression network was 1.0 (Figures 2(a) and 2(b)). According to the WGCNA results, totally 10 different disease related modules were obtained, except for the grey module (Figure 2(c)). The correlations between WGCNA modules and disease were shown as heatmap in Figure 2(d).
A total of 745 overlap genes were obtained after mapping DEGs to WGCNA modules (Table 1). According to hypergeometric algorithm analysis, DEGs were significantly enriched in blue and yellow module (fold enrichment > 1 and p < 0.05), each containing 150 and 136 DEGs, respectively (Table 1). The expression level of these DEGs in primary tumor and solid tissue normal samples were shown as heatmaps in Figure 2(e).
3.3. GBM Associated Risk Signature
In total, 152 of the primary tumor samples had clinical prognosis information and were used for subsequent screening of signature genes from DEGs in the blue and yellow module. Univariate and multivariate Cox regression analysis were performed sequentially. A total of 27 DEGs were obtained by univariate Cox regression, among which 6 DEGs were further identified as prognosis associated genes by multivariate Cox regression (Table 2). The 6 DEGs were BPI fold containing family B member 2 (BPIFB2), Xin actin binding repeat containing 2 (XIRP2), leucine rich repeat containing 10 (LRRC10), short chain dehydrogenase/reductase family 16C member 5 (SDR16C5), Homeobox A13 (HOXA13), and neural EGFL like 1 (NELL1).
The 6 DEGs were signature genes and were used to develop a tumor associated risk signature. According to the β values (Table 2) and the expression levels of these genes, risk score could be defined as Risk score of each sample was calculated based on the above formulation. The median risk score was used as the cutoff to separate samples in TCGA dataset into high and low risk group. According to the Kaplan-Meier survival curve, the prognosis of low risk group was significantly better than that of high risk group (Figure 3(a); p < 0.05). The performance of the signature in predicting prognosis was further validated in the validation dataset using the same procedure. Consistent with the result of training dataset, low risk group also showed a significantly better prognosis than high risk group in the validation dataset (Figure 3(b)). Moreover, the validation dataset also showed a high area under the receiver operating characteristic curve (AUC), close to that of the training dataset (Figure 3(c)).
3.4. Functional Annotation of Feature Genes
A total of 397 DEGs (371 upregulated and 26 downregulated) between low and high risk group of the training dataset were screened by differential expression analysis (Figure 4(a)). The expression patterns of these DEGs were shown as heatmap in Figure 4(b).
In order to interpret the biological functions and pathways perturbated by the signature genes, the DEGs between low and high risk group were used as inputs for GO and KEGG analysis. The enriched KEGG pathways were “hsa04080: Neuroactive ligand-receptor interaction” and “hsa04020: Calcium signaling pathway” (Figure 4(c)). The main enriched GO biological processes included “GO:0019933~cAMP−mediated signaling” “GO:0006811~ion transport” “GO:0007268~synaptic transmission” and “GO:0019226~transmission of nerve impulse” (Figure 4(c)).
GBM is the most malignant brain tumor, efficient management of which requires robust biomarkers . In the present study, we first analyzed the expression dataset of GBM and identified DEGs between primary tumor and solid tissue normal samples. Based on the DEGs, we successfully developed a risk signature, which was efficient and reliable in predicting the clinical outcome of GBM.
Signatures composed of multiple genes are generally more robust and more accurate than single-gene biomarker in predicting tumor outcome . The risk signature developed in our study consisted of 6 genes, including BPIFB2, XIRP2, LRRC10, SDR16C5, HOXA13, and NELL1. GBM samples could be divided into high and low risk group by applying the signature. As expect, our results indicated that low risk groups showed significantly better overall survival than high risk groups in both training and validation dataset. Consequently, the 6-gene signature may provide useful information for clinical practice if incorporated into future decision-making processes.
Among the signature genes, HOXA13 is a Homeobox gene overexpressed in multiple cancers and has been shown to be associated with the progression of hepatocellular carcinoma, pancreatic cancer, esophageal squamous cell carcinoma and GBM . It has been reported that HOXA13 may promote GBM progression through activation of Wnt/beta-catenin and TGF-β signaling pathway, whereas downregulation of HOXA13 may suppress GBM cell invasion and decrease tumor growth . Though few studies have investigated roles of BPIFB2, XIRP2 and NELL1 in GBM, all the three genes may have potential roles in other cancers. BPIFB2 has been reported to be overexpressed in gastric cancer, which may result in expression alteration of epithelial-mesenchymal transition (EMT) markers , and XIRP2 has been shown to be frequently mutated in lung adenocarcinoma . As a gene encoding a secreted protein regulating skeletal ossification , NELL1 has also been proposed to be a tumor suppressor gene in colon cancers . Therefore, dysregulation of the BPIFB2, XIRP2 and NELL1 may also underly GBM tumorigenesis. While the roles of the remaining two signature genes in GBM tumorigenesis remains unclear, both genes have important physiological roles. LRRC10 is required for early heart development [27, 28] and SDR16C5 encodes a retinol dehydrogenase, which may be essential for retinoic acid biosynthesis . Considering their important physiological roles, they may also potentially be involved in GBM tumorigenesis.
Dysregulation of cancer related pathways and functions is common in cancers . Our functional annotation analysis showed that DEGs between high and low risk group were significantly enriched in pathways and functions related to synaptic transmission, indicating that synaptic function in GBM patients may be deregulated. In addition, DEGs were also enriched in calcium signaling and cAMP-mediated signaling. Ca2+ is an essential regulator for neurogenesis and synaptic transmission, and the deregulation of Ca2+ signaling may advance GBM progression [31, 32]. It has been proposed that manipulating Ca2+ signaling may benefit the management of GBM . Suppression of cAMP signaling pathway has been shown to be a common feature in tumorigenesis and activation of cAMP signaling in GBM may inhibit tumor cell growth and induce cell apoptosis . Considering the prognostic differences between high and low risk group, we speculated that deregulation of calcium signaling and cAMP-mediated signaling may play important roles in the development and progression of GBM.
One main advantage of our study was that a robust GBM risk signature was developed through a combination of WGCNA and Cox regression analysis. The signature was efficient and reliable for both training and validation dataset when applied in outcome prediction. In addition, we also identified BPIFB2, XIRP2, NELL1, LRRC10, and SDR16C5 as novel GBM biomarkers, as they have never been reported to be associated with GBM development and progression. However, we also noticed some limitations in our study. The samples included in our study was insufficient and more samples are required in future studies. Additionally, experimental studies should be designed and performed to confirm the involvement of the novel biomarkers in GBM and to provide an insight into corresponding molecular mechanisms.
In conclusion, we analyzed the expression profiles of GBM and identified DEGs between primary tumor and solid tissue normal samples. A 6-gene risk signature consisting of BPIFB2, XIRP2, LRRC10, SDR16C5, HOXA13, and NELL1 was further developed for outcome prediction. The signature may contribute to future decision-making processes in clinical practice.
No data were used to support this study.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Jingwei Zhao and Le Wang contributed equally to this work.
F. Hanif, K. Muzaffar, K. Perveen, S. M. Malhi, and S. U. Simjee, “Glioblastoma multiforme: A review of its epidemiology and pathogenesis through clinical presentation and treatment,” Asian Pacific Journal of Cancer Prevention, vol. 18, no. 1, pp. 3–9, 2017.View at: Google Scholar
A. F. Tamimi and M. Juweid, “Epidemiology and outcome of glioblastoma,” in Glioblastoma, S. De Vleeschouwer, Ed., Brisbane, 2017.View at: Google Scholar
F. Foucault, F. Kiefer, B. Zeitouni et al., “Abstract 1203: Whole-exome sequencing analysis across 23 histotypes of patient-derived tumor xenografts reveals their similarities with TCGA patient tumors,” in Proceedings of the Proceedings: AACR Annual Meeting 2014; April 5-9, 2014; San Diego, CA, pp. 1203–1203.View at: Publisher Site | Google Scholar
O. V. Belyaeva, S. Lee, M. K. Adams, C. Chang, and N. Y. Kedishvili, “Short chain dehydrogenase/reductase rdhe2 is a novel retinol dehydrogenase essential for frog embryonic development,” The Journal of Biological Chemistry, vol. 287, no. 12, pp. 9061–9071, 2012.View at: Publisher Site | Google Scholar