Abstract

Background. Immune checkpoint inhibitors are a promising therapeutic strategy for breast cancer (BRCA) patients. The tumor microenvironment (TME) can downregulate the immune response to cancer therapy. Our study is aimed at finding a TME-related biomarker to identify patients who might respond to immunotherapy. Method. We downloaded raw data from several databases including TCGA and MDACC to identify TME hub genes associated with overall survival (OS) and the progression-free interval (PFI) by WGCNA. Correlations between hub genes and either tumor-infiltrating immune cells or immune checkpoints were conducted by ssGSEA. Result. TME-related green and black modules were selected by WGCNA to further screen hub genes. Random forest and univariate and multivariate Cox regressions were applied to screen hub genes (MYO1G, TBC1D10C, SELPLG, and LRRC15) and construct a nomogram to predict the survival of BRCA patients. The -index for the nomogram was 0.713. A DCA of the predictive model revealed that the net benefit of the nomogram was significantly higher than others and the calibration curve demonstrated a good performance by the nomogram. Only TBC1D10C was correlated with both OS and the PFI (both values < 0.05). TBC1D10C also had a high positive association with tumor-infiltrating immune cells and common immune checkpoints (PD-1, CTLA-4, and TIGIT). Conclusion. We constructed a TME-related gene signature model to predict the survival probability of BRCA patients. We also identified a hub gene, TBC1D10C, which was correlated with both OS and the PFI and had a high positive association with tumor-infiltrating immune cells and common immune checkpoints. TBC1D10C may be a new biomarker to select patients who may benefit from immunotherapy.

1. Introduction

Breast cancer (BRCA) is the leading cause of death by female malignancy tumors, and the morbidity is increasing in urban areas each year [1, 2]. With advances in multidiscipline therapies including immunotherapy, the prognosis of patients with BRCA has improved dramatically. However, due to the significant variability in tumor heterogeneity, almost 62,667 patients died of BRCA in 2018 [3].

Immunotherapy is a promising strategy for cancer therapy that has a significant survival benefit in some BRCA patients. Immune checkpoint inhibitors such as anti-PD-1 and anti-PD-L1 that were approved as therapeutics for some malignant tumors have participated in various clinical trials, but only some patients responded well [4]. Selecting BRCA patients who will respond to immunotherapy is a critical topic right now.

The tumor microenvironment (TME) consists of tumor cells, immune cells, fibroblasts, extracellular matrix, chemokines, cytokines, etc. However, the immune and stromal cells in the TME are the primary nontumor components [5, 6]. Research on the TME demonstrates that it can downregulate the immune response to cancer therapy, which reduces the infiltration of dendritic cells and inhibits effector T cell activation [7, 8]. However, which factors regulate the components of the TME and influence the immune response to immunotherapy is unclear. Patients with melanoma, colon, or lung cancer along with a high tumor mutational burden (TMB) could benefit from immunotherapy [911]. However, the association between the TMB and tumor immunogenicity in BRCA patients is also unclear. Therefore, understanding the relationship between prognosis, the TME, and the TMB is crucial to identifying potentially effective immunotherapies.

Advances in bioinformatics and machine learning have enabled the widespread screening of cancer therapy biomarkers using high-throughput sequencing data [12]. The weighted gene coexpression network analysis (WGCNA) performed on genetic clusters and constructed coexpression modules provides the relationship between genes and modules, enabling an association between modules and the phenotypic traits of tumors [13, 14]. Therefore, we have the tools to identify TME-related genes related to the phenotypic traits of tumors for further study.

Overall survival (OS) is an appropriate endpoint for many clinical studies, especially for research on glioblastoma multiform [15]. However, for studies involving the least aggressive breast cancer subtype, luminal A, the progression-free interval (PFI) is suitable [16]. After integrating TCGA Pan-Cancer clinical database, Liu et al. [16] recommended OS and the PFI as the best endpoint events for TCGA analysis. In our study, we used both OS and the PFI as endpoints for further study of BRCA patients.

In the present study, we adapted bioinformatics and machine learning to construct a nomogram to predict the survival probability of BRCA patients and screening hub gene related with TME, which could be a new biomarker for selecting patients who might be a more likely response to immune checkpoint blockade therapies.

2. Materials and Methods

2.1. Data Source

The RNA-Seq data and related clinical phenotypes were downloaded from TCGA database (http://cancergenome.nih.gov/). The PFI information was obtained from a previous study [17]. The immune score and stromal score of breast cancer samples were downloaded from MDACC (MD Anderson Cancer Center). The TMB data were obtained from the “tmb_data” source for the R package “UCSCXenaShiny.” Hallmark gene sets applied to the gene set enrichment analysis (GSEA) were sourced from the “msigdbr” package. Incomplete TNM stage samples were excluded, and 940 breast cancer samples were screened for further study. However, only 898 matched samples had PFI information available, which was required for a clinical study. Potential mRNA sequences were selected based on an RNA-Seq expression level greater than zero in at least half of the analyzed BRCA tissues (see Figure 1 for a workflow schematic).

2.2. Construction of the Weighted Gene Coexpression Network and Screening for Hub Genes

The R package “WCGNA” was used for the gene coexpression network analysis [18]. We set 200 as the height cut-off for the sample clustering analysis and log-transformed (log(count+1)) the RNA-Seq count value for further gene clustering analysis. We next calculated the optimal soft threshold for the adjacency computation. Twelve modules were screened for the WGCNA package one-step process. Gene significance (GS) was defined as the correlation between each trait and the gene expression level. The module membership (MM) was defined as the correlation between each module Eigengene (ME) and the gene expression level [13, 14]. The criteria for screening genes in relevant modules were a and a (). We also applied univariate and random forest analyses to screen for hub genes.

2.3. Construction and Calibration of the Nomogram

First, the stepwise multivariate Cox regression analysis was performed to construct the risk score formula, and the R package “survival” was applied (a model was chosen by AIC using a stepwise model selection) [3, 19, 20]. The following four-gene signature formula was constructed based on gene expression levels and coefficients ():

Second, the risk score was stratified into high-risk and low-risk groups based on the median. The risk score, M stage, N stage, tumor size, TNM stage, TMB, immune score, and age were selected as variables to construct the nomogram with the R package “rms.” We used the decision curve analysis (DCA), an emerging method for predicting the effectiveness of a model, to evaluate the discrimination of our nomogram [21]. The bootstrap method, a fast-development method based on random sampling with replacement, was used to internally validate the nomogram.

2.4. Hub Gene Pathway Enrichment Analysis and Correlation with Tumor-Infiltrating Immune Cells

We made a gene list to rank the correlation between a hub gene and others. Hallmark gene sets from the R package “msigdbr” were applied. A gene set enrichment analysis (GSEA) was then conducted with the “clusterProfiler” package [22]. We also performed a single-sample GSEA (ssGSEA) to identify the score of tumor-infiltrating immune cells in each sample using the “GSVA” package [23]. We next calculated the differential expression and correlation of tumor-infiltrating immune cells with the hub genes. We used the R package “circlize” to visualize the association between the hub genes and common immune checkpoint inhibitors.

2.5. Statistical Analysis

The Mann-Whitney -test was used to compare the relationship of continuous variables in two groups; otherwise, the Kruskal-Wallis -test was used. If the variables were categorical, the test (or Fisher’s exact test) was applied. All data and figures were analyzed and plotted with R (version 3.6.3).

3. Results

3.1. Association between the TME, the TMB, and Prognosis in BRCA Patients

The immune scores were divided into high-immune and low-immune score groups according to the optimal cut-off value. The K-M survival analysis and log-rank test were performed to identify the relationship between either OS or the PFI with the immune score. The survival probability of the high-immune score group was significantly higher than the low-immune score group, with a value of 0.0028 (Figure 2(a)). The probability of a progression-free interval was also significantly higher for the high-immune score group than the low-immune score group () (Figure 2(d)). Similarly, the stromal score and the TMB were classified into high-stromal and low-stromal score groups and high-TMB and low-TMB groups, respectively. The K-M survival analysis and log-rank test demonstrated that patients with a high-stromal score had no statistical difference with either OS (Figure 2(b)) or the PFI (Figure 2(e)) compared to patients with a low-stromal score. While patients with high-TMB have a lower survival probability than patients with low-TMB () (Figure 2(c)), patients with high-TMB had no statistical difference with PFI compared to patients with low-TMB () (Figure 2(f)). Patients in the high-immune score group had a significantly higher stromal score and TMB than patients in the low-immune score group (Figure 3).

3.2. Construction of the WGCNA and Identification of Corresponding Modules

Setting the criterion of protein-coding genes expressed in at least half of BRCA tissues, a total of 13721 mRNA sequences were screened for the WGCNA. The screen identified 12 modules (Figure 4(a)). The association between modules and traits was constructed and identified correlations between the green module and immune score (0.94) and the black module and stromal-score (0.77). Both values were less than 0.001 (Figure 4(b)). In the present study, eight was defined as a set point for the soft-threshold power, and the related scale-free topology index was 0.9 (Figure 4(c)). Therefore, we further analyzed the green and black modules. The relationships between GS and MM were identified for both the green and black modules; the correlation was 0.99 and 0.85, respectively ( values < 0.001) (Figures 4(d) and 4(e)). In the green and black modules, 753 and 180 mRNA sequences were identified, respectively. We defined the cut-off as a and a , narrowing our results for further study to 191 and 40 mRNA sequences in the green and black modules, respectively.

3.3. Calculation of the Risk Score by Random Forest, Univariate, and Multivariate Cox Analysis

Random forest was performed for a survival analysis of the 191 mRNA sequences from the green module and 40 mRNA sequences from the black module, respectively. A univariate Cox analysis was used to examine the relationships between the expression of the 191 mRNA sequences in the green module, and the expression of the 40 mRNA sequences in the black module, with OS. Based on these results, 37 mRNA sequences (Figure 5(a)) were selected for a stepwise multivariate Cox analysis. In addition, a four-mRNA risk score formula was established (Figure 5(b)). . Based on the median value, the risk score was classified into high-risk and low-risk groups. A K-M survival analysis and log-rank test demonstrated that the high-risk group had a lower probability of survival than the low-risk group ( value < 0.001) (Figure 5(c)).

3.4. Development of the Nomogram for Predicting Survival Probability

A multivariate Cox regression was performed to construct the prediction model (Figure 6(a)). The nomogram illustrates that the M stage made a robust contribution to the prediction of survival probability, followed by age, the tumor size, and the risk score. The score for each variable subtype was presented on a point scale. By calculating the total score and identifying it on the scale for the total possible points, we can easily obtain the survival probability of a patient. The DCA (decision curve analysis) is an emerging method to evaluate the discrimination of a predictive model [21]; it is widely used in many top journals, such as BMJ, JAMA, and NATURE. In our study, DCA to evaluate the predictability of our model demonstrated that the net benefit of our nomogram was significantly higher than others (Figure 6(b)). The -index for the nomogram was 0.713, and the calibration curve demonstrated good performance by the nomogram (Figure 6(c)).

3.5. Relationship between Four TME Genes and BRCA Endpoints

We divided four genes (MYO1G, TBC1D10C, SELPLG, and LRRC15) into high and low groups based on the median expression value. The K-M analysis and log-rank test were applied. SELPLG and MYO1G expression had statistically significant correlations with OS in both the high and low expression groups, but not with the PFI (Figures 7(a), 7(b), 7(e), and 7(f)). The differential expression of LRRC15 was not correlated with either OS or the PFI (Figures 7(c) and 7(g)). The TBC1D10C high-expression group had a higher positive association with OS and the PFI than the low-expression group (Figures 7(d) and 7(h)). These data demonstrated that TBC1D10C is a protective marker. Meanwhile, we found there is a significant association between breast cancer subtypes and the level of TBC1D10C expression () (Figure 8).

We next calculated the differential expression of all four genes in tumor tissue from TCGA database and normal tissue from the GTEx database. These data indicated that all four genes were more highly expressed in tumor tissues than in normal tissue (Figure 9).

3.6. Enrichment Pathway Analysis of TBC1D10C

Considering the differential prognosis for differential expression of TBC1D10C, we made a gene list to rank the correlation between TBC1D10C and other genes. GSEA indicated that it primarily correlated with, and thereby might be involved in, the allograft rejection, interferon-gamma response, inflammatory response, K-Ras signaling, TNFA signaling via the NFKB, and complement pathways (Figure 10(a)). The top 6 of the high enrichment score pathway is displayed in Figure 10(b).

3.7. Tumor-Infiltrating Immune Cells and Immune Checkpoints Correlation with TBC1D10C in BRCA Patients

We next evaluated the relationship of TBC1D10C expression levels with tumor-infiltrating immune cells in the breast cancer microenvironment using ssGSEA (Figure 11(a)). Only 3 of 28 immune cell types (central memory CD8 T cells, memory B cells, and neutrophils) that were differentially expressed had no statistical significance. The tumor-infiltrating immune cells, e.g., activated CD8 T cells and activated B cells, had a robust correlation with TBC1D10C ( value < 0.05) (Figure 11(b)).

Immune checkpoint blockade therapies are emerging and effective strategies for treating cancer [24]. We next explored the associations between TBC1D10C and immune checkpoints including PD1, PD-L1, TIGIT, CTLA-4, TIM-3, and LAG-3. The chord chart indicated that TBC1D10C had a robust positive correlation with PD1, TIGIT, and CTLA-4 (Figure 11(c)).

We further analyzed the relationship between the clinical characteristics of BRCA patients and TBC1D10C expression levels. These data indicated that the differential expression of TBC1D10C had a statistically significant difference according to the PAM50 subtype, M stage, N stage, stromal score, and immune score (Table 1).

4. Discussion

Recently, the relationship between the TME and immune therapy has become an increasingly hot issue. Differential quantities of tumor-infiltrating immune cells result in diverse responses to immunotherapy [25, 26]. We applied the immune and stromal scores downloaded from MD, which were calculated by the ESTIMATE algorithm [27], to represent the status of the TME. We next performed WGCNA to identify modules correlated with immune and stromal scores. Finally, four hub genes (MYO1G, TBC1D10C, SELPLG, and LRRC15) that correlated with the TME were selected to construct a nomogram to predict the survival probability using a univariate and multivariate Cox regression and random forest. The only gene to have a clear correlation with both OS and the PFI was TBC1D10C. We further focused on the function of TBC1D10C and its association with tumor-infiltrating immune cells and common immune checkpoints.

Overall survival (OS) is a vital endpoint with less ambiguity in defining an overall survival event. However, OS as an endpoint can attenuate a clinical study since noncancer death also qualifies as an endpoint event [28, 29]. The progression-free interval (PFI), characterized as the minimum follow-up time needed, is also used in many clinical trials [28], as patients with disease recurrence or progression usually have a long lifespan. In the present study, both OS and the PFI as endpoint events were included in our study to screen OS-related and PFI-related hub genes. Only the gene TBC1D10C had a clear correlation with both OS and the PFI. This indicates that TBC1D10C may be a potential biomarker in cancer recurrence and progression.

We also found that TBC1D10C is an immune-related gene using the Immport database (https://www.immport.org) [30]. TBC1D10C, also known as Carabin or EPI64C, is overexpressed in blood leukocytes and the spleen and negatively regulates the NF-κB signaling pathway via activity as a Ras GTPase-activating protein [31]. Moreover, TBC1D10C physically interacts with CaN T cells and H-Ras in addition to inhibiting Ras/MAPK signaling [31, 32]. Our GSEA indicated that TBC1D10C6 regulated the K-Ras-signaling pathway, which is consistent with the existing research [33]. Moreover, ssGSEA revealed that TBC1D10C has a high positive correlation with B cells and T cells, which is consistent with the research of Jiang et al. [31] but contradicts Schickel et al. [32]. The opposing results may be due to study differences; one focused on lymphoid disease while the other was interested in myocardial disease.

TMB is a vital promising biomarker that plays a key role in predicting the response to immune checkpoint blockade therapy in several cancer types [34]; however, few studies have focused on the significance of TMB in BRCA. Mei et al. [35] revealed that high TMBs occur at a low frequency in BRCA. Narang et al. [36] demonstrated that triple-negative breast cancers have the highest TMB, followed by the HER2-positive subtype. In our study, higher TMBs have a lower survival probability than lower TMB patients but have no statistical significance with the PFI. Conflicting results may be due to high tumor heterogeneity in BRCA and varying therapies.

Tumor immune checkpoint inhibitors have been proven as effective therapies for many malignant tumors. BRCA, however, has highly heterogeneous tumors, preventing many patients from benefiting from immunotherapy [37]. Misidentifying patients who cannot respond to immunotherapy is potentially fatal. TBC1D10C was highly correlated with three common immune checkpoints (PD-1, CTLA-4, and TIGIT) in our study and associated with OS and the PFI. These data suggest that TBC1D10C may be a new immune checkpoint.

There are still many gaps left to be understood. First, we only conducted an internal validation of the nomogram with bootstrapping. However, the lack of external validation limits the generalization of our results. Second, TCB1D10C was overexpressed in tumor tissue from TCGA database compared with normal tissue from the GTEx database; however, studies, such as RT-PCR and Western blot, are needed to validate TCB1D10C expression. Third, the function of TCB1D10C and the mechanism by which it regulates the K-Ras signaling pathway needs further exploration.

5. Conclusions

By performing WGCNA and machine learning, we constructed a TME-related gene signature model for predicting the survival probability of BRCA patients. Subsequently, we identified a hub gene, TBC1D10C, that correlated with OS and the PFI and had a high positive association with tumor-infiltrating immune cells and three common immune checkpoints (PD-1, CTLA-4, and TIGIT). TBC1D10C may be a new biomarker for identifying patients that would benefit from immunotherapy.

Data Availability

The RNA-Seq data and related clinical phenotypes were downloaded from TCGA database (http://cancergenome.nih.gov/). The PFI information was obtained from this study (doi:10.1016/j.celrep.2016.12.019). The immune score and stromal score of breast cancer samples were downloaded from MDACC (MD Anderson Cancer Center). The TMB data were obtained from the “tmb_data” source for the R package “UCSCXenaShiny.”

Conflicts of Interest

The authors declare that they have no competing interests.

Authors’ Contributions

HQ and RL designed the study and wrote the manuscript. YP, XZ, and ZY contributed to statistical analysis. WQZ and WZ reviewed and funded the manuscript. Huiying Qiao and Rong Lv contributed equally to this work.

Acknowledgments

We would like to thank TopEdit (http://www.topeditsci.com/) for English language editing of this manuscript. We would like to thank Research Square for presenting the preprint (https://www.researchsquare.com/article/rs-314139/v1). This work was supported by grants from the Science and Education Foundation of Wujiang District (wwk201906), Suzhou Science and Technology Development Plan (SKJYD2021028), and National Natural Science Foundation of China (82072908).