Bioinformatic Identification of Neuroblastoma Microenvironment-Associated Biomarkers with Prognostic Value
The microenvironment plays a vital role in the tumor recurrence of neuroblastoma. This research aimed at exploring prognostic genes that are involved in neuroblastoma microenvironment. We used “estimate” R package to calculate the immune/stromal/ESTIMATE scores of each sample of ArrayExpress dataset E-MTAB-8248 based on the ESTIMATE algorithm. Then we found that immune/stromal/ESTIMATE scores were not correlated with age/chromosome 11q, but tumor stage, MYCN gene amplifications, and chromosome 1p. Samples were then divided into high- and low-score groups, and 280 common differentially expressed genes (DEGs) were identified. 64 potential prognostic genes were harvested through overall survival analysis from the common DEGs. 14 prognostic genes (ABCA6, SEPP1, SLAMF8, GPR171, ABCA9, ARHGAP15, IL7R, HLA-DPB1, GZMA, GPR183, CCL19, ITK, FGL2, and CD1C) were obtained after screening in two independent cohorts. GO and KEGG analysis discovered that common DEGs and 64 potential prognostic genes are mainly involved in T-cell activation, lymphocyte activation regulation, leukocyte migration, and the interaction of cytokines and cytokine receptors. Correlation analysis showed that all prognostic genes were negatively correlated with MYCN amplification. Cox analysis identified 5 independent prognostic genes (ARHGAP15, ABCA9, CCL19, SLAMF8, and CD1C).
Neuroblastoma is a cancer that develops from immature nerve cells found in multiple parts of the body, including neuroblastomas, ganglioblastomas, and ganglion neuromas . Neuroectodermal cells containing neuroblastomas originate from the neural crest during fetal development and are destined for the adrenal medulla and sympathetic nervous system . Neuroblastoma accounts for 97% of all neuroblastic tumors, is heterogeneous, and differs in location, histopathological appearance, and biological characteristics . Neuroblastoma comprises 6% to 10% of all childhood cancers and causes 15% of all pediatric cancer deaths . Although its molecular basis is still unknown, clinical diversity is closely related to numerous clinical and biological factors, including patient age, tumor stage and histology, and genetic and chromosomal abnormalities . Better learning the molecular mechanism of neuroblastoma could provide a crucial message related to prognosis .
The tumor microenvironment is the environment around a tumor, including the surrounding blood vessels, immune cells, fibroblasts, signaling molecules, and the extracellular matrix. The tumor and the surrounding microenvironment are closely related and interact constantly . The tumor microenvironment has been recognized as a complex milieu where cancer cells interact with immune and stromal cells via numerous biochemical and physical signals that are crucial for cancer progression and metastasis [4–6]. The activities of immune cells and stromal cells have been demonstrated owing to the capacity to predict cancer outcomes .
ESTIMATE is a method that uses gene expression characteristics to infer the ratio of stromal cells to immune cells in a tumor sample. ESTIMATE scores are correlated with tumor purity based on DNA copy number in samples from 11 different tumor types. These samples have been profiled on Agilent and Affymetrix platforms, or based on RNA sequencing, which can be obtained through The Cancer Genome Atlas. The use of 3,809 transcription profiles provided elsewhere in the public domain further confirmed the accuracy of the prediction. The ESTIMATE method can be used to evaluate the presence of stromal cells and immune cell infiltration in tumor samples using gene expression data . According to the ESTIMATE algorithm, researchers have performed a prognostic assessment and are exploring the genetic changes of many malignant tumors [9–11]. But the immune and stromal scores of neuroblastoma have not yet been elucidated. No research has clearly shown whether the ESTIMATE algorithm can be used to predict the prognosis of neuroblastoma.
To understand the molecular pathogenesis of neuroblastoma involved in the tumor microenvironment, in this work, we used the ESTIMATE algorithm in conjunction with multiple cohorts to explore underlying genetic factors in the tumor microenvironment of neuroblastoma and identify prognostic genes.
2. Materials and Methods
2.1. ArrayExpress Microarray Data
We searched the ArrayExpress (https://www.ebi.ac.uk/arrayexpress/) database  using “neuroblastoma” as the keyword and filtered the results by checking “Homo sapiens” as the “Organism” and selecting “ArrayExpress data only.” In the filtered results, we dug into each detailed description of each dataset to see if they contain survival data. Finally, dataset E-MTAB-8248 that comes from the platform of Agilent-020382 Human Custom Microarray 44k (Feature Number version) was selected for this study. The processed data of the annotation table were obtained from the ArrayExpress website.
2.2. Pretreatment of Microarray Data
To begin with, the annotation procedure was performed by using the downloaded annotation table. Then, “estimate” R package was used to figure immune, stromal, and ESTIMATE scores of the dataset by ESTIMATE algorithm .
2.3. Distributions of Immune/Stromal/ESTIMATE Scores in Different Age/Tumor Stage/MYCN Gene Amplifications/Chromosome 1p/Chromosome 11q/Prognosis
All cases were grouped according to their status of age/tumor stage/MYCN gene amplifications/chromosome 1p/chromosome 11q. Wilcoxon signed-rank test (two groups) or Kruskal–Wallis H test (more than two groups) was utilized to assess the distributions of scores.
2.4. Differentially Expressed Gene Analysis
Patients were then stratified according to the median of immune and stromal scores. Differentially expressed genes (DEGs) were explored between high and low immune/stromal score groups using the “limma” R package . The cutoff was set as (1) |log2 (fold-change) | > 1 and (2) false discovery rate (FDR) < 0.05. The heatmaps of DEGs were generated from “pheatmap” R package.
2.5. Enrichment Analysis of Common DEGs
The “clusterProfiler” R package was applied to perform functional enrichment analysis of common DEGs, including Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG).
2.6. Identification of Potential Prognostic Genes
Kaplan–Meier analysis was conducted by the “survival” R package. The Kaplan–Meier curve illustrated the overall survival difference between low and high expressions of each common DEG gene. This relationship was examined by the log-rank test.
2.7. Establishment of the Protein-Protein Interaction (PPI) Network of Potential Prognostic Genes
To better study the potential prognostic genes, we constructed a PPI network using the STRING online database (http://string-db.org) and the Cytoscape software (http://www.cytoscape.org/). The degree distribution was calculated using “cytoHubba” plug-in and Network Analyzer from Cytoscape. The clusters (highly interconnected regions) in this PPI network were discovered by a Cytoscape plug-in named “MCODE.”
2.8. Screening for Prognostic Genes in Two Independent Cohorts
Neuroblastoma patients with expression and survival data were available in the datasets of GSE85047 and GSE49710 from the Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/) database. The gene expression profiles were measured experimentally using platforms of GPL5175 and GPL16876, respectively (Table 1). We screened the identified potential prognostic genes using the two cohorts by overall survival. value <0.05 (log-rank test) was considered statistically significant. The genes that are in the intersection of the above two results were considered prognostic genes.
2.9. Identification of the Independent Prognostic Value of Prognostic Genes
MYCN amplification was strongly correlated with a poor prognosis in neuroblastoma cases . The status of MYCN and the expression of prognostic genes were extracted from GSE49710, and their correlation was checked using the Spearman test. In addition, to explore the independence of the prognostic genes, we performed univariate and multivariate Cox analysis on prognostic genes and MYCN in the GSE49710 cohort. In the end, we reviewed previous studies related to crucial prognostic genes and neuroblastoma.
3.1. Characteristics of Cohorts Included in This Study
E-MTAB-8248 (n = 223) was chosen for identifying potential prognostic genes in the TME of neuroblastoma. GSE85047 (n = 283) and GSE49710 (n = 498) were taken as screening tools to obtain prognostic genes from potential genes. The clinical characteristics of the cohorts included in this study are shown in Table 2.
3.2. Distributions of Immune/Stromal/ESTIMATE Scores in Different Age/Tumor Stage/MYCN Gene Amplifications/Chromosome 1p/Chromosome 11q/Prognosis
The distributions of immune/stromal/ESTIMATE scores did not vary with age/chromosome 11q (Figure 1). However, in the tumor stage section, the distribution of stromal scores was significantly different in the stages, but immune and ESTIMATE scores not. In the distribution of scores of MYCN gene amplifications, immune and ESTIMATE scores mattered, but stromal score not. The same happened in the distribution of chromosome 1p scores, the stromal score could not differentiate the status of chromosome 1p, but immune and ESTIMATE scores could.
3.3. Immune/Stromal/ESTIMATE Scores and Prognosis
223 neuroblastoma patients were divided into high- and low-score groups according to their median scores. Kaplan–Meier survival curves showed that high immune score patients had a better trend of overall survival than that the low-score group ( value = 0.091 in the log-rank test) (Figure 2(a)). Analogously, as shown in Figures 2(b) and 2(c), patients with high stromal/ESTIMATE scores achieved a better trend on overall outcomes than those with low scores ( value >0.216). Not noticeable significant statistical results simply predict that fewer microenvironment-related genes may be found in the next steps and do not negate the role of the microenvironment in neuroblastoma. So, we still will focus on the genes related to both the immune and stromal scores.
3.4. Identification of Common DEGs from Immune and Stromal Scores
223 neuroblastoma patients were grouped based on their median scores. As shown in Figure 3(a), 601 DEGs were found among immune score groups (Table S1). 503 DEGs among stromal score groups were discovered, which are shown in Figure 3(b) and Table S2. 279 common upregulated DEGs (Figure 3(c)) and 1 common downregulated DEG (Figure 3(d)) were identified then by comprehensive analysis. The 280 common DEGs (Table S3) then entered into the next steps.
GO enrichment analysis (Table S4) showed top GO terms identified in the 280 common DEGs mainly involved with lymphocyte differentiation, T-cell activation, regulation of lymphocyte activation, external side of the plasma membrane, collagen-containing extracellular matrix, G protein-coupled receptor binding, and the peptide binding (Figure 3(e)). KEGG analysis (Figure 3(f) and Table S5) of 280 common DEGs was mainly observed for cytokine-cytokine receptor interaction, viral protein interaction with cytokine and cytokine receptor, Staphylococcus aureus infection, Th17 cell differentiation, and chemokine signaling pathway.
3.5. Identification of Potential Prognostic Genes
Kaplan–Meier survival curves were built for the 280 common DEGs based on overall survival. According to the log-rank test ( value <0.05), 64 common DEGs had survival predictive ability (all were upregulated in TME), of which 60 and 4 were positively and negatively correlated with survival, respectively (Figure 4 and Table S3).
3.6. Functional Enrichment Analysis and Protein-Protein Interaction Construction of Potential Prognostic Genes
GO enrichment was conducted using the 64 prognostic genes, and the enriched items in GO were involved largely in T-cell activation and the regulation of lymphocyte activation, MHC class II protein complex and the MHC protein complex, and antigen binding and the cytokine activity (Figure 5(a)). KEGG pathways were mainly enriched for hematopoietic cell lineage, herpes simplex virus 1 infection, influenza A, and cytokine-cytokine receptor interaction (Figure 5(b)).
A PPI network was constructed by the online STRING tool and Cytoscape software. After hiding disconnected nodes, there were 50 nodes and 239 edges in this network (Figure 5(c)). All genes positively correlated with the overall outcomes of neuroblastoma cases. CD69, CD3D, CD2, CCL5, CCR2, CD48, LCK, IL7R, IRF8, and HLA-DRB1 were the top ten genes in this network sorted by degree. RGS1 had the strongest predictive ability in this network for overall survival; however, FYB had the weakest ability to do so. The interaction between HLA-DRA and CD74 was the tightest in this PPI network, with a combine score of 0.999. The loosest relationship happened between CD52 and CCR2, with a combine score of 0.409.
Then, we explored the PPI network using the Cytoscape “MCODE” plug-in and found 3 main clusters. The top two modules had more than 9 nodes, which were plotted by us (Figures 5(d) and 5(e)). In cluster one, CCL5, HLA-DRA, HLA-DRB1, IRF8, and CD74 were the top five genes sorted by degree, and HLA-DMA, IGHV4-38-2, HLA-DPB1, CCL5, and HLA-DMB were the top five genes sorted by prognostic ability. In cluster two, CD69, CCR2, CD48, LCK, and TLR8 were on the top five sorted by degree distribution, and CXCR6, CXCL14, CD69, TLR8, and GPR183 were on the top five sorted by prognostic ability.
3.7. Screening in Two GEO Cohorts
We then screened the 64 potential prognostic genes described above using the GSE85047 and GSE49710 cohorts from the GEO database. value <0.05 in the log-rank test was set as the cutoff. 14 genes were screened from GSE85047 cohort (Table S6), 56 genes were found from GSE49710 (Table S7), and 14 in the intersection were identified as prognostic genes (Table 1).
3.8. Identification of the Independent Prognostic Value of Prognostic Genes
MYCN amplification was strongly correlated with a poor prognosis in neuroblastoma cases . The status of MYCN and the expression of prognostic genes were extracted from GSE49710, and their correlation was checked using the Spearman test. The correlation results shown in Table 3 indicated that all prognostic genes were negatively correlated with the amplification of MYCN. In univariate Cox analysis, all prognostic genes and MYCN amplification were related to prognosis. In multivariate Cox analysis, MYCN amplification, ARHGAP15, ABCA9, CCL19, SLAMF8, and CD1C could still predict prognosis (Table 4). The above results suggested that ARHGAP15, ABCA9, CCL19, SLAMF8, and CD1C were potential independent prognostic factors for neuroblastoma. Finally, we did a literature review, finding that, among 14 prognostic genes identified in the present study, 3 had been reported having experimental evidence involving in the progression of neuroblastoma, none of them had been previously experimentally proofed affecting neuroblastoma prognosis (Table 5).
In this study, we attempted to identify tumor microenvironment-related genes that contribute to neuroblastoma outcome from an ArrayExpress dataset. By comparing global gene expression between high- and low-score patients, 280 common DEGs were found. Then 64 genes were identified owning potential prognostic value via survival analysis. Importantly, we screened the 64 genes in two independent GEO cohorts, identifying 14 genes (ABCA6, SEPP1, SLAMF8, GPR171, ABCA9, ARHGAP15, IL7R, HLA-DPB1, GZMA, GPR183, CCL19, ITK, FGL2, and CD1C) as prognostic genes. Moreover, ARHGAP15, ABCA9, CCL19, SLAMF8, and CD1C were found via Cox analysis having independent prognostic value (Figure 6).
We deployed the ESTIMATE algorithm to figure the immune/stromal/ESTIMATE scores of neuroblastoma cases and found these scores were not correlated with age/chromosome 11q, but tumor stage, MYCN gene amplifications, and chromosome 1p. The MYCN, a member of the MYC family, has been associated with high-risk disease and poor prognosis in approximately 25% of cases of neuroblastoma. Currently, the amplification of MYCN is still one of the most risky genetic markers in neuroblastoma . Neuroblastoma tumor cells lacking parts of chromosome 1 or 11 (known as 1p deletion or 11q deletion) may be predictive of poor prognosis . In our study, we found that the scores were related to the MYCN gene amplifications and chromosome 1p, which may reveal a new understanding of this disease, but need further research. And we also found the scores were related to the prognosis of neuroblastoma. Immune cells and stromal cells are essential components active in the tumor microenvironment, which affect the survival, proliferation, and treatment resistance of neuroblastoma [20–24]. The crosstalk between tumor and microenvironment influences the inflammatory response: cancer cells interact with both the innate and the adaptive immune system and use immune cells for tumor survival and protection from immunological attacks [21, 24–26].
GO analysis showed the common DEGs were largely enriched in T-cell activation, external side of the plasma membrane, and G protein-coupled receptor binding. Besides, the KEGG analysis demonstrated that cytokine-cytokine receptor interaction, viral protein interaction with cytokine and cytokine receptor, and Staphylococcus aureus infection were mostly enriched. Previous studies have shown that the biological processes of the immune system are critical to the formation of a complex tumor microenvironment, which is consistent with our findings [21, 22, 24]. In the last few years, the immunological characteristics of neuroblastoma have been more deeply understood, and the development of effective neuroblastoma immunotherapy strategies has attracted widespread attention [27–29].
Overall survival analysis of the common DEGs found 64 genes owned potential prognosis capacity in neuroblastoma cases. In the PPI network of these 64 potential prognostic genes, we are particularly interested in CCL5 in cluster one (Figure 5(d)) and CD69 and TLR8 in cluster two (Figure 5(e)), of whom hold the top five degree and prognostic ability in their own cluster. CCL5 is an inflammatory chemokine that is widely secreted from natural killer cells, T cells, fibroblasts, epithelial cells, and platelets  and promotes chemotaxis on cells involved in the immune/inflammatory response . Studies demonstrated that CCL5 is related to certain tumor cells, such as malignant melanoma cells , ovarian , prostate , and breast cancer cells . The exact functions of CCL5 in tumor biology are still unclear. CCL5 production is relevant to inducing proper immune responses against tumors . The current research showed that CCL5 promotes tumor migration and invasion. CCL5 secreted by tumor cells and other cells in the breast cancer tumor microenvironment can recruit tumor-associated macrophages into the tumor microenvironment . Evidence on the direct relationship between CCL5 and neuroblastoma is still lacking and requires further study. CD69, a type II glycoprotein known to regulate inflammation through T-cell migration and retention in tissues, plays an important role in inducing the exhaustion of tumor-infiltrating T cells . CD69 expression is readily upregulated upon activation in most leukocytes, which mainly underlies its widespread use as a marker of activated lymphocytes and NK cells . Mita et al. reported that the use of anti-CD69 monoclonal antibodies to treat tumor-bearing mice significantly reduced tumor-infiltrating T-cell failure and enhanced protection against metastasis, indicating the efficacy of anti-CD69 antibodies in the treatment of malignant tumors . CD69 expression is associated with hypoxia in the tumor microenvironment . The research on the relationship between CD69 and tumors is still scarce, is still in infancy, and needs more in-depth research. TLR7 and TLR8 are phylogenetically and structurally closely related members of the TLR family; together with TLR9, they constitute one of the six major TLR clades . TLR8 has been identified as a natural receptor for single-stranded RNA and is thought to act as an effective activator of the innate immune response after viral infection [41–43]. TLR8 is the only TLR that has been shown to be necessary and sufficient to reverse the suppressive function of Treg cells, resulting in strong tumor-suppressive effects . Numerous reports have described the Toll-like receptor (TLR) expression in the tumor microenvironment as it relates to cancer progression, as well as their involvement in inflammation . Inflammation triggered by TLR signaling can directly influence tumor-induced senescence in tumor microenvironments, and the effects are variable depending on different TLR signaling and tumor types .
In this study, GSE85047 and GSE49710 cohorts were used for the screening tools for prognostic genes. We screened the prognostic value of these 64 genes based on the cohorts, of which 14 genes were identified as prognostic genes. ARHGAP15, ABCA9, CCL19, SLAMF8, and CD1C could predict prognosis independently. We then conducted a small-scale review of these 14 genes, finding IL7R, CCL19, and GZMA have been demonstrated having experimental evidence with the progression of neuroblastoma by the previous research [15–17], while no gene has been experimentally confirmed having roles in prognosis to neuroblastoma. Several gene expression studies indicate that IL7 has increased expression in NB specifically in tumors with a better prognosis, making IL7 a key candidate for this soluble factor. Prasad’s finding implicated IL7 within the Schwannian stroma of the neuroblastoma tumor architecture as having a paracrine signaling effect on neighboring neuroblasts which may provide the antiproliferative and differentiation signals postulated . Walker and colleagues found that changes in the CCL19 signal transduction pathway can lead to defects in the migration of dendritic cells in neuroblastoma, which in turn affects tumor progression . Yarmarkovich demonstrated that GZMA has a strong correlation with T-cell factors, and GZMA is very active in the neuroblastoma tumor microenvironment . Most of the prognostic genes obtained in our study have not been clarified the mechanism in neuroblastoma, and more efforts need to be implemented in the future.
In summary, by applying ESTIMATE algorithm, and mining from ArrayExpress dataset and other two independent GEO cohorts, we got 14 prognostic genes (ABCA6, SEPP1, SLAMF8, GPR171, ABCA9, ARHGAP15, IL7R, HLA-DPB1, GZMA, GPR183, CCL19, ITK, FGL2, and CD1C) having the capacity to illustrate the prognosis of neuroblastoma patients. Besides, ARHGAP15, ABCA9, CCL19, SLAMF8, and CD1C had independent prognostic value. Interestingly, only 3 of the 14 genes have been previously experimentally identified to be involved in the progression of neuroblastoma. It would be anticipated to test whether this set of genes can provide better survival prediction capabilities than individual genes. More efforts on the further research of the 14 genes may reveal a potential relationship among tumor microenvironment and neuroblastoma prognosis in a novel and comprehensive means.
Conflicts of Interest
The authors declare no conflicts of interest.
Yi Wang organized and wrote the manuscript. Huan Luo and Jing Cao contributed to the literature search for the manuscript. Chao Ma designed and produced the figures and revised the manuscript. All authors reviewed the manuscript and approved the manuscript for publication.
Table S1: 601 DEGs found between immune score groups (FDR < 0.05). Table S2: 503 DEGs found between stromal score groups (FDR < 0.05). Table S3: the performance of 280 common DEGs in Kaplan–Meier analysis from the E-MTAB-8248 cohort with value tested in the log-rank test. Table S4: the gene ontology term enrichment of 280 common differentially expressed genes with .adjust <0.05. Table S5: the Kyoto Encyclopedia of Genes and Genomes pathway enrichment of 280 common differentially expressed genes with .adjust <0.05. Table S6: the list of 14 prognostic genes screened from the GSE85047 cohort with value <0.05 in the log-rank test. Table S7: the list of 56 prognostic genes screened from the GSE49710 cohort with value < 0.05 in the log-rank test. Table S8: the main scripts used in this study. (Supplementary Materials)
L. Prasad, A. Gayagay, A. Charlton et al., “Expression of interleukin-7 and its signalling intermediates in human neuroblastoma tumours,” in Proceedings of the AACR 103rd Annual Meeting 2012, March 2012, Chicago, IL, USA.View at: Google Scholar
M. Yarmarkovich, M. Di Marco, O. Padovan et al., “MHC class I immunogenicity and novel tumor antigen discovery in neuroblastoma,” in Proceedings of the AACR Annual Meeting 2017, Washington, DC, USA, April 2017.View at: Google Scholar
Y. Wang, H. Yang, L. Wang et al., “Development of a novel highly potent TLR8 selective agonist for cancer immunotherapy,” in Proceedings of the American Association for Cancer Research Annual Meeting 2017, Washington, DC, USA, July 2017.View at: Google Scholar