Background. The purpose of this study was to investigate the regulatory mechanisms of ceRNAs in breast cancer (BC) and construct a new five-mRNA prognostic signature. Methods. The ceRNA network was constructed by different RNAs screened by the edgeR package. The BC prognostic signature was built based on the Cox regression analysis. The log-rank method was used to analyse the survival rate of BC patients with different risk scores. The expression of the 5 genes was verified by the GSE81540 dataset and CPTAC database. Results. A total of 41 BC-adjacent tissues and 473 BC tissues were included in this study. A total of 2,966 differentially expressed lncRNAs, 5,370 differentially expressed mRNAs, and 359 differentially expressed miRNAs were screened. The ceRNA network was constructed using 13 lncRNAs, 267 mRNAs, and 35 miRNAs. Kaplan-Meier (K-M) methods showed that two lncRNAs (AC037487.1 and MIR22HG) are related to prognosis. Five mRNAs (VPS28, COL17A1, HSF1, PUF60, and SMOC1) in the ceRNA network were used to establish a prognostic signature. Survival analysis showed that the prognosis of patients in the low-risk group was significantly better than that in the high-risk group (). ROC analysis showed that this signature has a good diagnostic ability (). Compared with clinical features, this signature was also an independent prognostic factor (HR: 1.206, 95% CI 1.108−1.311; ). External verification results showed that the expression of the 5 mRNAs differed between the normal and tumour groups at the chip and protein levels (). Conclusions. These ceRNAs may play a key role in the development of BC, and the new 5-mRNA prognostic signature can improve the prediction of survival for BC patients.

1. Introduction

Breast cancer (BC) is a highly invasive and metastatic malignant tumour with a high incidence, which poses a serious threat to women’s health and quality of life [1]. In recent years, the incidence and mortality of BC have increased significantly, affecting progressively younger patients [2]. Currently, surgery, chemotherapy, and radiotherapy are the main treatment options for BC [3]. However, due to heterogeneity, breast tumours tend to show high recurrence rates and drug resistance, and the therapeutic effects and prognosis of the disease are not satisfactory [4]. There is an urgent need to develop individualized BC treatment strategies, such as identifying potential biomarkers and therapeutic targets. Therefore, screening for BC biomarkers is important for early diagnosis, improvement of prognosis, and reduction of mortality in BC.

Since Salmena et al. proposed the competitive endogenous RNA (ceRNA) hypothesis, an increasing number of studies have confirmed that ceRNA regulatory networks play an important role in the development of various tumours [5]. Liu et al. demonstrated that the long noncoding RNA (lncRNA) LINC00909 can interact with miR-194 to upregulate the expression of MUCl-C, thereby promoting the proliferation and invasion of glioma cells [6]. Zhao et al. showed that the lncRNA PVT1-214-induced inhibition of miR-128 regulated TrkC and further promoted gastric cancer cell proliferation [7]. Jiang et al. stated that lncRNA TDRG1 regulated the expression of MAPK1 in cervical cancer by targeting miR-326 and promoted the proliferation, migration, and invasion of cervical cancer cells [8]. In addition, studies have shown that pseudogene transcripts also can be used as ceRNAs and participate in tumour regulation. For example, lncRNA PTENP1 can compete with miR-106b to promote PTEN expression and inhibit cell proliferation, thereby inhibiting the development of cervical cancer [9]. The BRAF pseudogene (BRAFP1) can act as an RNA sponge to induce lymphoma in vivo [10]. The regulatory network of ceRNAs involving pseudogenes of the HMGA1 gene plays an important role in the occurrence of breast cancer [11].

In our study, we systematically analysed RNA-seq data on breast cancer (BC) from TCGA database and constructed a BC-related ceRNA network. Finally, a 5-mRNA prognostic signature was constructed based on the mRNAs in the network. The signature-related genes showed stable expression at the chip and protein levels, indicating that this model has value in clinical applications and can be used as a supplement to TNM staging.

2. Methods

2.1. Data Acquisition and Screening of Differentially Expressed Genes

BC data from 514 samples (41 paracancerous tissues and 473 cancer tissues) were downloaded from TCGA website, starting with gene data. The protein-encoded mRNAs and lncRNAs were isolated from the gene data using reannotation, and the differences between mRNAs and lncRNAs were analysed based on and . This step was performed using the edgeR package [12]. Similarly, miRNA data from 458 BC samples (8 paracancerous tissues and 450 cancer tissues) were downloaded from TCGA website using the same criteria and methods for differential expression screening. The workflow is shown in Figure 1.

2.2. ceRNA Network Construction

To further explore the detailed regulatory mechanism between lncRNAs and miRNAs, we used the miRcode database to compare DElncRNAs and DEmiRNAs, which were selected according to the above criteria. Next, TargetScan, miRTarBase, and miRDB were used to predict the target genes of the miRNAs in the lncRNA-miRNA network. The obtained target genes, DEmiRNAs obtained by the edgeR intersection, and the necessary mRNA components of the lncRNA-miRNA network were determined. Finally, Cytoscape was used to visualize the lncRNA-miRNA regulatory network, which is termed the ceRNA network.

2.3. GO Enrichment Analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) Pathway Analysis

Biological process (BP), cellular components (CC), and molecular function (MF) are usually used as the main domains of Gene Ontology (GO), as they can specifically identify the function of genes in the body. KEGG was used to describe gene participation in signalling pathways to determine the biological functions of different genes in the ceRNA network and their associated signalling pathways. In the present study, the clusterProfiler package was used for GO and KEGG enrichment analyses [13]. The GO terms and KEGG pathway with an adjusted were considered statistically significant.

2.4. Identification of Survival-Related RNAs

To accurately explore the mRNAs in the ceRNA network that play an important role in the prognosis of BC patients, we analysed all of the mRNAs by univariate analysis. In univariate analysis, those with , which are likely related to prognosis, will be included in the multivariate Cox regression analysis. Furthermore, the mRNA prognostic signature was constructed based on the multivariate regression results. The prognostic value of the lncRNAs and miRNAs in the network was determined using the K-M method.

2.5. Validation of Signature

To determine whether the prognosis of patients with different risks, according to mRNA signature stratification, was significant, we used log-rank analysis to compare the high-risk and low-risk groups. Multivariate regression was used to evaluate the value of the risk score compared with other clinical features. In addition, to verify whether the core genes of the signature are significantly expressed at the chip and protein levels, we used the GSE81540 dataset from bc-GeneExMiner V4.4 (http://bcgenex.centregauducheau.fr) and the CPTAC online database (https://proteomics.cancer.gov/data-portal) for validation.

3. Results

3.1. Screening of Differentially Expressed RNAs

A total of 514 breast tissue samples were downloaded from TCGA website for follow-up analysis. Of these, 41 samples were paracancerous tissues and 473 were cancerous tissues. Following standardization with edgeR, the differences between the lncRNAs and mRNAs were analysed based on and . A total of 2,966 differentially expressed genes, 2,146 upregulated genes, and 820 downregulated genes were identified (Figures 2(a) and 2(b)). The top 10 differentially expressed lncRNAs are shown in Table 1 (part of the lncRNAs). The same method was used to further process the miRNA data from 458 BC samples downloaded from TCGA database (8 paracancerous tissues and 450 cancerous tissues). A total of 359 differentially expressed miRNAs were screened, of which 210 were upregulated and 149 were downregulated (Figures 2(c) and 2(d)). The top 10 differentially expressed miRNAs are shown in Table 1 (part of the miRNAs). The results revealed 5,370 differentially expressed genes, 3,031 upregulated genes, and 2,339 downregulated genes (Figures 2(e) and 2(f)). The top 10 differentially expressed mRNAs are also shown in Table 1 (part of the mRNAs).

3.2. ceRNA Regulatory Networks

The ceRNA network plays an important role in the occurrence and development of breast cancer; thus, elucidation of the internal regulatory relationship of the network is critical. First, 2,966 differentially expressed lncRNAs were compared with 359 differentially expressed miRNAs using 4 databases (miRcode, TargetScan, miRTarBase, and miRDB). The results showed that 13 lncRNAs were compared to 35 miRNAs. The above database was also used to predict the target genes of miRNAs; the predicted results intersected with the differentially expressed mRNAs, and 477 mRNAs were included in the intersection. Next, the database comparison was performed again, and the results showed that 35 miRNAs were compared with 267 mRNAs. Based on the above results, the ceRNA regulatory network was successfully constructed.

3.3. mRNA Functional Enrichment Analysis of the ceRNA Networks

Different genes have different important functions in the occurrence and development of diseases, so it is particularly important to understand the function of genes and the signalling pathways in which they may be involved. The clusterProfiler R package was used to analyse the enrichment of differentially expressed genes by GO and KEGG. BP analysis indicated that the genes were enriched in the regulatory system process, circulatory system process, and blood circulation (Figure 3(a)). CC analysis indicated that the genes were enriched in a proteinaceous extracellular matrix, apical cell region, and transporter complex (Figure 3(b)). Regarding MM, these genes mainly participated in passive transmembrane transporter and channel and metal ion transmembrane transporter activities (Figure 3(c)). According to the KEGG pathway analysis, the genes were significantly enriched in neuroactive ligand-receptor interaction, protein digestion and absorption, cytokine-cytokine receptor interaction, and calcium signalling pathway (Figure 3(d)).

3.4. Construction of the 5-mRNA Signature and Survival Analysis of ceRNAs

To accurately identify the potential biological targets of BC and the prognosis-related genes, we combined the mRNAs obtained from the ceRNA networks with prognostic information. In total, 18 genes were found to be significantly associated with overall survival (OS; ). These genes were subsequently subjected to stepwise multivariate Cox regression analysis. Finally, 5 independent genes were selected, and a gene-based prognostic model was established to estimate the survival of patients using the following equation: (Table 2). The prognostic effect of lncRNAs and miRNAs on the network cannot be ignored, so we analysed the effects of lncRNAs and miRNAs in the network on prognosis. The results showed that 2 lncRNAs (AC037487.1 and MIR22HG) were associated with OS. High expression of AC037487.1 indicated a poorer prognosis than low expression (Figure 4(a)). In contrast, patients with high expression of MIR22HG often obtained a satisfactory prognosis (Figure 4(b)).

3.5. Signature Performance and Verification

To assess the performance of the model in predicting overall patient survival, we assigned a risk score to each patient using a prognostic model based on the 5 genes. The patients were classified as high risk or low risk based on the median value of the risk score. Kaplan-Meier analysis showed significant differences in the OS curves between the two groups (; Figure 5(a)). ROC curve analysis of the 10-year survival rate was performed to evaluate the predictive potential of the 5 genes. The area under the curve (AUC) for the 5-gene signature-based prognostic model was 0.77 at an OS of 120 months (Figure 5(b)). In addition, the expression levels of these 5 genes from each patient were analysed (Figure 5(c)); green indicates decreased expression and red indicates increased expression. Furthermore, compared with other clinical features, the 5-mRNA signature showed excellent prognostic value and was an independent prognostic factor for BC patients (HR: 1.206, 95% CI 1.108−1.311; ).

3.6. Expression of the 5 mRNAs at the Chip and Protein Levels

To verify whether the 5 genes also have consistent expression results in external databases, we used the GSE81540 dataset and CPTAC to verify expression at the chip and protein levels. The results showed that, regardless of the chip or protein levels, VPS28, HSF1, and PUF60 showed higher expression in tumour tissues than in normal tissues (Figures 6(a), 6(c), and 6(d); Figures 7(a), 7(c), and 7(d)), while COL17A1 and SMOC1 showed significantly higher expression in the adjacent tissues compared with the tumour tissues (Figures 6(b) and 6(e); Figures 7(b) and 7(e)). In addition, according to Figure 6, we found that VPS28 was significantly highly expressed in the luminal B subtype, while the expression pattern of SMOC1 in this subtype was the opposite. However, we also observed that the expression of HSF1 in the luminal A subtype was significantly lower than that of other subtypes, and PUF60 showed a higher expression trend in the TNBC (basal-like) subtype. Notably, compared with that in the TNBC (basal-like) and luminal A subtypes, the expression of COL17A in the HER2+ subtype was significantly reduced. This result implies that these five mRNAs are related to the different BC subtypes. Deregulation of these five mRNAs may become a new breakpoint for the treatment of different BC subtypes.

4. Discussion

Analysis of ceRNA networks and gene signatures in tumours is ongoing. However, research has generally focused on one topic, either ceRNAs or a signature. Few studies have investigated both topics. We innovatively built a 5-mRNA signature based on the ceRNA network, and this prediction model is highly accurate and stable.

The mechanism underlying progression and metastasis of BC remains unclear, but with the development of high-throughput sequencing technology, our understanding of this disease is deepening. In a previous study, Lattrich et al. found that systemic miR-195 is a BC-specific biomarker that distinguishes BC from other types of cancer with a sensitivity of 88% and specificity of 91% [14]. Through mouse and human cell experiments, Ma et al. demonstrated that miR-10b is highly expressed in metastatic BC cells and can mediate cell migration in the process of forward transfer and invasion [15]. In addition, Zhang et al. revealed that the miR-191/425 cluster promotes the growth, invasion, and metastasis of breast tumours in vivo [16]. These results indicated that miRNAs play an important role in the development of BC. In the present study, a total of 35 miRNAs were included in the ceRNA network, suggesting that these 35 miRNAs may be involved in the development of BC. Unfortunately, when the association between these 35 miRNAs and the OS of patients was evaluated, no miRNAs affecting the prognosis of patients were found in the network. We examined the cause of this result. In Kaplan-Meier analysis, there were too many competitive lncRNAs and downstream mRNAs, which weakened the prognostic value of the miRNAs.

Multiple studies have revealed a close association between lncRNAs and the development of BC [17, 18]. lncRNAs are considered to be potential therapeutic targets for BC and effective predictors of prognosis. Jiang et al. demonstrated that the overexpression of the new lncRNA NLIPMT reduced the phosphorylation of glycogen synthase kinase 3β, thereby inhibiting BC metastasis [19]. Shima et al. found that high expression of lncRNA H19 is usually associated with poor prognosis in BC patients, particularly in triple-negative BC [20]. Yu et al. showed that lncRNA HOTAIR regulates the growth, migration, invasion, and apoptosis of BC cells [21]. The ceRNA network explored in the present study contained a total of 13 abnormally expressed lncRNAs. Among them, MIR22HG and ACO37487.1 may be key oncogenes and prognostic markers of BC progression, as they are involved not only in the construction of key nodes in the ceRNA network but also in the OS of BC patients. This conclusion was also confirmed by other researchers. Studies have shown that a high expression of MIR22HG is positively correlated with the OS of patients with hepatocellular carcinoma following hepatectomy, and a high expression of MIR22HG could suppress gastric cancer progression by attenuating NOTCH2 signalling [22, 23]. However, there are no reports of cancer types associated with ACO37487.1, which indicates that ACO37487.1 may be a new target for the diagnosis and treatment of BC.

Due to the differences among individuals, the traditional TNM staging system cannot accurately predict the clinical prognosis of patients [24]. Researchers have attempted to build a new prognostic model to improve the accuracy and sensitivity of survival predictions for BC patients [25, 26]. In the present study, a prognostic risk assessment model for BC was constructed based on Cox multivariate regression results. An AUC of 0.77 indicated that the genetic model has a certain sensitivity and specificity. Five mRNAs in the signature were screened: VPS28, COL17A1, HSF1, PUF60, and SMOC1. Among them, COL17A1, HSF1, and PUF60 are reportedly associated with cancer, as discussed below. COL17A1 encodes the alpha chain of type XVII collagen. COL17A1, as a downstream target of p53, was shown to affect cell migration and invasion, and patients with high expression of this gene often have a better prognosis than those with low expression [27]. Heat shock transcription factor 1 (HSF1), a member of the HSF family of transcription factors, regulates abnormal signals of tumour cells and promotes metastasis and metabolism of BC tumour cells. This finding suggests that HSF1 plays an important role in the occurrence and development of breast tumours [28]. PUF60, an oncogene, is highly expressed in BC tissue samples, and its high expression is closely related to the high lymph node metastasis rate and TNM stage of breast cancer [29]. Nevertheless, the relationships among VPS28, SMOC1, and breast cancer are not currently known, and thus, further investigation in BC patients is required.

There is always an inevitable bias in the data of a single centre. To eliminate this bias and verify the stability of the model, we used an external database to verify the expression of the 5 mRNAs in the signature. The results showed that at both the chip and protein levels, the 5 mRNAs showed stable expression. Thus, these proteins can be detected in BC patients. By applying these results to our model, researchers could obtain a reasonable risk stratification while avoiding excessive medical treatment for low-risk patients and provide a systematic and comprehensive diagnosis and treatment for high-risk patients.

In conclusion, these ceRNAs may play a key role in the development of BC, and the new 5-mRNA prognostic signature was valid and may improve the prediction of survival for BC patients.

Data Availability

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Conflicts of Interest

The authors declare that they have no competing interests.

Authors’ Contributions

RZ and WS designed the experiment, provided financial support, revised the manuscript, and gave final approval of the version to be published. WS and DH performed the statistical analysis and wrote the paper. SL made substantive contributions to the work, including data collecting and manuscript revising. Wenjie Shi and Daojun Hu contributed equally to this work.


We thank the American Journal Experts (AJE) for their language editing support.