Research Article | Open Access
Wenqing Zhou, Yongkui Pang, Yunmin Yao, Huiying Qiao, "Development of a Ten-lncRNA Signature Prognostic Model for Breast Cancer Survival: A Study with the TCGA Database", Analytical Cellular Pathology, vol. 2020, Article ID 6827057, 9 pages, 2020. https://doi.org/10.1155/2020/6827057
Development of a Ten-lncRNA Signature Prognostic Model for Breast Cancer Survival: A Study with the TCGA Database
Long noncoding RNA (lncRNA) plays a critical role in the development of tumors. The aim of our study was to construct a lncRNA signature model to predict breast cancer (BRCA) patient survival. We downloaded RNA-seq data and relevant clinical information from the Cancer Genome Atlas (TCGA) database. Differentially expressed lncRNA were identified using the “edgeR” package and subjected to univariate and multivariate Cox regression analysis. Corresponding protein-coding genes were used for Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis. In total, 521 differentially expressed lncRNA were obtained. We constructed a ten-lncRNA signature model (LINC01208, RP5-1011O1.3, LINC01234, LINC00989, RP11-696F12.1, RP11-909N17.2, CTC-297N7.9, CTA-384D8.34, CTC-276P9.4, and MAPT-IT1) to predict BRCA patient survival using the multivariate Cox proportional hazard regression model. The C-index was 0.712, and the 5-year AUC scores of the training, test, and entire sets were 0.746, 0.717, and 0.732, respectively. Univariate Cox regression analysis indicated that age, tumor status, N status, M status, and risk score were significantly related to overall survival in patients with BRCA. Further, the multivariate analysis showed that risk score and M status were significant independent prognostic factors. GO function and KEGG pathway analyses showed enrichment primarily in immune response, receptor binding, external side of the plasma membrane, signal transduction, cytokine-cytokine receptor interaction, and cell adhesion molecules (CAMs). In conclusion, we constructed a ten-lncRNA signature model that can serve as an independent prognostic model to predict BRCA patient survival.
1. Introduction
Breast cancer (BRCA) is the most commonly diagnosed cancer and the leading cause of cancer death among women worldwide. Although BRCA mortality is declining due to early detection and improved treatment, significant variability in patient outcomes remains. An estimated 626,679 people died of this malignancy in 2018. The prognosis of BRCA is affected by multiple factors, including age, tumor size, grade, lymph node involvement, histology, hormone-receptor status, HER-2 status, and positive margins. Many clinical models for predicting patient prognosis and disease-free survival have been proposed, mainly focusing on age at diagnosis; postmenopausal status; ER, HER-2, and Ki-67 status; tumor size; lymph node involvement; metastasis; and therapeutic strategy [4–7]. These models are difficult to implement in clinical practice due to incomplete diagnostic characteristics and model limitations.
Next-generation sequencing (NGS) allows continuing identification of biomarkers for tumor diagnosis and prognosis. Prognostic models built on these data increasingly focus on mRNA, miRNA, and lncRNA. Li et al. used NGS data available in several databases, including the Gene Expression Omnibus (GEO), the Cancer Genome Atlas (TCGA), the Human Protein Atlas, and the International Cancer Genome Consortium, to construct a six-gene model (SQSTM1, AHSA1, VNN2, SMG5, SRXN1, and GLS) to assist clinicians in selecting personalized treatment for patients with hepatocellular carcinoma. Shi et al., using the TCGA, identified a five-lncRNA signature model (AC069513.4, AC003092.1, CTC-205M6.2, RP11-507K2.3, and U91328.21) to inform the prognosis of patients with clear cell renal cell carcinoma. Lv et al. constructed a six-mRNA signature model (TMEM252, PRB2, SMCO1, IVL, SMR3B, and COL9A3) that may aid prognosis for patients with triple-negative BRCA. Zhu et al. built a four-lncRNA signature model (PVT1, MAPT-AS1, LINC00667, and LINC00938) to predict BRCA survival, but the AUC value for the time-dependent receiver operating characteristic (ROC) curve was only 0.64. To date, few studies of multi-lncRNA signatures in BRCA are available, and the functions and mechanisms of lncRNA in BRCA have yet to be explored.
In this study, we deeply mined and analyzed high-throughput sequencing data and clinical characteristics from the TCGA. We subsequently developed a ten-lncRNA signature model that effectively predicts BRCA survival and demonstrated its independence from clinical factors.
2. Materials and Methods
2.1. Data Source
The RNA-seq data and corresponding clinical information were obtained from TCGA (http://cancergenome.nih.gov/). As of December 2019, 1164 clinical samples and relevant gene expression information were available. BRCA samples with repeated or incomplete prognostic information were eliminated. Ultimately, 1035 BRCA samples and 111 normal samples were collected for the construction of the model and coexpression analysis. TCGA is an open public database, and ethics approval was not needed for the study.
2.2. Identification of Differentially Expressed lncRNA
RNA count data were obtained from the TCGA data portal, and expression levels of lncRNA and mRNA were determined using Reads Per Kilobase of exon per Million mapped reads (RPKM). Potential lncRNA were identified based on three criteria: (1) transcriptome sequences mapped to lncRNA rather than protein-coding regions, (2) transcriptome sequences annotated in GENCODE, and (3) transcripts expressed in at least half of BRCA tissues. The lncRNA expression profile was defined across the 1146 samples (1035 BRCA and 111 normal). In total, 6129 lncRNA were included. Differentially expressed lncRNA were identified using the R package “edgeR.”
2.3. Statistical Analysis and Definition of the lncRNA-Related Prognostic Model
BRCA samples (n = 1035) were randomly divided into a training set and a test set. Univariate Cox regression analysis was used to examine associations between lncRNA expression levels and overall survival (OS) in the training set. lncRNA were considered significant when P values were <0.05. These lncRNA were then entered into a stepwise multivariate Cox regression analysis using the R package “survival” (model selection by AIC in a stepwise algorithm) [12–14]. Based on expression levels and coefficients from the multivariate Cox proportional hazards regression analysis, a novel ten-lncRNA-based prognostic risk score formula was defined [13–16]. The risk score formula was as follows:
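As a sketch of the general form only: a risk score built this way is a linear combination of the expression levels of the signature lncRNA, each weighted by its multivariate Cox coefficient. The symbols used here are assumed notation, with beta_i the coefficient of the i-th lncRNA and x_i its expression level in a given patient; the fitted coefficient values are not reproduced.

```latex
\text{Risk score} = \sum_{i=1}^{10} \beta_i \, x_i
```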
A risk score for each patient in the training set was then calculated. BRCA patients were divided into high-risk and low-risk groups using the median risk score as the cutoff. The Kaplan–Meier survival curve and the log-rank test were used to assess the prognostic value of the risk score using the R package “survival.”
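The scoring and median split described above can be illustrated with a minimal Python sketch. The lncRNA names, coefficients, and expression values are hypothetical placeholders, not the fitted model, and assigning scores equal to the median to the low-risk group is an assumption made here.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Weighted sum of expression levels, one Cox coefficient per lncRNA."""
    return sum(coefficients[name] * expression[name] for name in coefficients)


def split_by_median(scores):
    """Label each patient 'high' or 'low' relative to the median score.

    Assumption: a score exactly equal to the median is labeled 'low'.
    """
    cutoff = median(scores.values())
    groups = {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}
    return cutoff, groups


# Hypothetical coefficients and expression values, for illustration only.
coefficients = {"lncA": 0.5, "lncB": -0.25}
patients = {
    "p1": {"lncA": 2.0, "lncB": 4.0},
    "p2": {"lncA": 4.0, "lncB": 0.0},
    "p3": {"lncA": 3.0, "lncB": 2.0},
    "p4": {"lncA": 0.0, "lncB": 4.0},
}

scores = {pid: risk_score(expr, coefficients) for pid, expr in patients.items()}
cutoff, groups = split_by_median(scores)
print(cutoff, groups)
```

For validation, the cutoff returned for the training set would be reused unchanged on the test and entire sets rather than recomputed.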
Time-dependent receiver operating characteristic (ROC) curve analysis at 3 and 5 years, performed with the R package “survivalROC,” was used to assess the sensitivity and specificity of survival predictions. Subsequently, we compared model predictions with traditional clinical risk factors (age, risk, stage, metastasis, tumor size, and lymph node involvement) using univariate and multivariate Cox analysis. We also assessed the relationship between risk level and clinical characteristics using the chi-square test. P values were used to assess statistical significance. All data were analyzed using R scripts (version 3.6.1). All figures were plotted with ggplot2 (version 3.2.1).
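To make the chi-square comparison of risk level against a clinical characteristic concrete, the following Python sketch computes Pearson's chi-square statistic for a 2x2 table. It is an illustration under stated assumptions, not the study's R code: no continuity correction is applied, the P value formula holds only for one degree of freedom, and the counts are invented.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test for a 2x2 table, without continuity correction.

    table is ((a, b), (c, d)); rows could be risk groups and columns the two
    levels of a clinical characteristic.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column must contain at least one count")
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Invented counts: high-risk / low-risk patients by a binary characteristic.
statistic, p_value = chi_square_2x2(((10, 20), (20, 10)))
print(round(statistic, 3), round(p_value, 4))
```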
2.4. Functional Enrichment Analysis
To explore functional implications of the lncRNA, Spearman’s correlation coefficients were calculated between the signature lncRNA and protein-coding genes. Correlated protein-coding genes were screened for functional enrichment analysis. We subsequently performed GO analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis of these protein-coding genes using the Database for Annotation, Visualization, and Integrated Discovery (DAVID version 6.7).
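Spearman's correlation is the Pearson correlation of the ranks of the two variables. The Python sketch below shows that computation with average ranks for ties; the expression vectors are invented, and the function names are illustrative rather than taken from the study's code.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented expression values for one lncRNA and one protein-coding gene.
lncrna = [1.0, 2.0, 3.0, 4.0]
gene = [1.0, 8.0, 27.0, 64.0]
print(spearman(lncrna, gene))
```

Because only ranks enter the calculation, any monotonic relationship yields a coefficient of magnitude one, which is why the measure suits skewed expression data.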
3. Results
3.1. Differentially Expressed lncRNA and Clinical Characteristics
We analyzed baseline clinical characteristics of the 1035 BRCA patients (Table 1). We selected lncRNA expression profiles from raw RNA-seq expression data, and differentially expressed lncRNA between BRCA samples and normal samples were then identified using fold-change and false discovery rate (FDR) cutoffs. This analysis identified 406 upregulated lncRNA and 115 downregulated lncRNA (Figure 1).
Table 1 notes: Age, age of the patient at diagnosis; OS time, overall survival time.
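Filtering differential expression results by fold change and FDR, as in Section 3.1, can be sketched in Python as follows. The Benjamini-Hochberg procedure is used here as one common way to obtain FDR-adjusted P values; whether the study used exactly this adjustment is an assumption. The lncRNA names, P values, and cutoff values are placeholders, not the study's thresholds.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_differential(results, min_abs_log2fc, max_fdr):
    """Keep entries passing both cutoffs and label their direction.

    results maps a name to (log2 fold change, raw P value).
    """
    names = list(results)
    fdr = benjamini_hochberg([results[name][1] for name in names])
    selected = {}
    for name, q in zip(names, fdr):
        log2fc = results[name][0]
        if abs(log2fc) >= min_abs_log2fc and q <= max_fdr:
            selected[name] = "up" if log2fc > 0 else "down"
    return selected


# Placeholder results and cutoffs, for illustration only.
results = {
    "lncA": (3.0, 0.001),
    "lncB": (-2.5, 0.002),
    "lncC": (0.3, 0.0005),
    "lncD": (2.8, 0.6),
}
print(select_differential(results, min_abs_log2fc=2.0, max_fdr=0.01))
```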
3.2. Identification of lncRNA Associated with the OS of Patients in the Training Set and Validation in the Test and Entire Sets
The training set was first analyzed to identify possible prognostic lncRNA; then, the test and entire sets were used for validation. We performed univariate and multivariate Cox regression analysis on the training set to identify associations between differentially expressed lncRNA and OS of BRCA patients. In the univariate Cox regression analysis, 32 of the 521 differentially expressed lncRNA were found to be associated with survival time (Table 2). A prognostic model composed of 10 lncRNA was then established using a stepwise multivariate Cox proportional hazard regression model (Figure 2). The C-index for the model was 0.712 (CI 0.686–0.740), and the calibration curve showed good performance for the prognostic model (Figure 3).
Table 2 abbreviations: HR, hazard ratio; Coef, coefficient.
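The C-index reported in Section 3.2 measures how often the model ranks pairs of patients correctly. A minimal Python sketch of Harrell's concordance index follows; it assumes that a higher risk score should correspond to shorter survival, skips pairs with tied survival times, and uses invented data.

```python
def concordance_index(times, events, scores):
    """Harrell's C-index for right-censored survival data.

    A pair (i, j) is comparable when patient i had an observed event
    (events[i] is 1) and a strictly shorter time than patient j.
    The pair is concordant when i also has the higher risk score;
    tied scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Invented follow-up times (years), event indicators, and risk scores.
times = [1.0, 2.0, 3.0, 4.0]
events = [1, 1, 0, 1]
scores = [2.1, 1.4, 0.9, 0.3]
print(concordance_index(times, events, scores))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which places the reported 0.712 between the two.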
Patients in the training set were classified into high-risk and low-risk groups using the median risk score (0.938) as the cutoff. The survival rate of the high-risk group was significantly lower than that of the low-risk group according to Kaplan–Meier analysis and the log-rank test (Figure 4(a)). Subsequently, the prognostic ability of the model was assessed by calculating the AUC of the time-dependent ROC curve. Generally, an AUC from 0.7 to 0.9 is deemed reliable. The AUC value of the training set was 0.760 at 3 years and 0.746 at 5 years (Figure 4(d)), indicating that the ten-lncRNA signature model shows good sensitivity and specificity.
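The Kaplan–Meier curves compared between risk groups are product-limit estimates of the survival function. The Python sketch below computes such an estimate for one group; it is an illustration with invented follow-up data, not the R “survival” implementation used in the study.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of survival at each distinct event time.

    times are follow-up times; events are 1 for death and 0 for censoring.
    Returns a list of (time, survival probability) pairs.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Invented follow-up data for one risk group (years; 0 means censored).
print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
```

Censored patients leave the risk set without lowering the curve, which is why the step at year 3 in this example is larger than the step at year 1.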
We next validated our ten-lncRNA signature model with the test and entire data sets. We calculated risk scores for all patients in the test and entire sets and divided them into high- and low-risk groups using the same threshold as in the training set. Again, we found that the survival rate of the high-risk group was significantly lower than that of the low-risk group in both the test and entire sets (Figures 4(b) and 4(c)). Time-dependent ROC curve analysis for the ten-lncRNA signature model achieved an AUC score of 0.717 at both 3 and 5 years for the test set, and the AUC score of the entire set was 0.741 at 3 years and 0.732 at 5 years (Figures 4(e) and 4(f)).
3.3. Independence of the Ten-lncRNA Signature and Other Clinical Variables
We assessed whether survival prediction based on the ten-lncRNA signature model was independent of clinical characteristics. Univariate Cox regression analysis indicated that age, tumor status, N status, M status, and risk score were significantly related to OS in the entire set. The multivariate Cox analysis indicated that risk score and M status had significant independent prognostic value (Table 3). Further, based on chi-square tests, the risk level had no association with age, stage, metastasis, tumor size, or lymph node involvement (Table 4). Collectively, these results indicate that the ten-lncRNA signature prognostic model is a robust tool for predicting the prognosis of BRCA patients.
Table 3 abbreviations: HR, hazard ratio; CI, confidence interval.
Table 4 note: the chi-square test was used.
3.4. Functional Characteristics of the Ten Prognostic lncRNA
The biological functions of lncRNA remain unclear, but lncRNA expression is strongly correlated with that of neighboring protein-coding genes. We obtained expression profiles of protein-coding genes from raw RNA-seq data and identified protein-coding genes correlated with the ten lncRNA. Applying the Spearman’s correlation cutoffs yielded 1178 protein-coding genes for functional enrichment analysis.
The GO function and KEGG pathway enrichment analyses of the protein-coding genes were performed with DAVID Bioinformatics Resources 6.7. Biological process (BP) results showed that the protein-coding genes were enriched for signal transduction, immune response, inflammatory response, and positive regulation of transcription from RNA polymerase II promoter (Figure 5(a)). Enrichment in molecular function (MF) was primarily in transmembrane signaling receptor activity, protein binding, protein homodimerization activity, calcium ion binding, and receptor binding (Figure 5(b)). For cellular component (CC) analysis, genes were enriched in integral component of plasma membrane, integral component of membrane, plasma membrane, extracellular exosome, and external side of plasma membrane (Figure 5(c)). In the KEGG pathway analysis, genes were enriched for cytokine-cytokine receptor interaction, cell adhesion molecules (CAMs), calcium signaling pathway, transcriptional misregulation in cancer, and HTLV-I infection (Figure 5(d)).
4. Discussion
High-throughput sequencing technology produces increasing amounts of sequencing data for studies of cancer diagnosis, therapy, and prognosis. Current studies focus on ncRNA associated with cancer, especially lncRNA. Some studies confirm that lncRNA plays a crucial role in the occurrence and progression of tumors, such as gastric, colon, and breast cancers.
In the present study, we downloaded RNA-seq data and clinically relevant information related to BRCA from the TCGA database. We obtained 1164 clinical samples and corresponding gene expression information. A total of 521 differentially expressed lncRNA involved in BRCA were identified from the TCGA data, including 406 upregulated and 115 downregulated lncRNA. Subsequently, univariate and multivariate Cox regression analyses identified associations between differentially expressed lncRNA and OS in a training set. These associations were used to establish a risk model for predicting BRCA prognosis. A ten-lncRNA signature risk prediction model (LINC01208, RP5-1011O1.3, LINC01234, LINC00989, RP11-696F12.1, RP11-909N17.2, CTC-297N7.9, CTA-384D8.34, CTC-276P9.4, and MAPT-IT1) was produced. Patients were subdivided into high- and low-risk groups based on the median risk score. Three-year AUC values for the time-dependent ROC curve in the training, test, and entire sets were 0.760, 0.717, and 0.741, respectively, indicating good discriminative performance in survival prediction. Recently, Zhu et al. and Li et al. each proposed a breast cancer prognosis model based on RNA-seq data, with three-year AUC values of 0.641 and 0.711 in the training set, respectively. By this measure, our model outperforms both. We also compared the risk model and clinical parameters (age, stage, metastasis, tumor size, and lymph node involvement) using univariate and multivariate Cox analysis. The prognostic value of the model was independent of other clinical factors in BRCA, but the functional relationship between risk score and tumor development remains unclear.
Previously, Liao et al. showed that LINC01234 knockdown suppressed cell proliferation, migration, and invasion of colorectal cancer cells, while blocking the cell cycle and inducing cell apoptosis. Chen et al. found that, in gastric cancer, LINC01234 functioned as a ceRNA for miR-204-5p, resulting in derepression of its endogenous target CBFB. In addition, Chen et al. found that LINC01234 expression is increased in non-small-cell lung cancer tissues, and its upregulation is associated with metastasis and shorter survival. LINC01234 serves as a competing endogenous RNA for miR-340-5p and miR-27b-3p, and its downregulation impairs cell migration and invasion in vitro and inhibits metastasis in vivo; LINC01234 also interacts with the RNA-binding proteins LSD1 and EZH2, leading to histone modification and transcriptional repression of the antiproliferative gene BTG2 [24, 25]. In another study, Wang et al. found that the expression of RP11-909N17.2 is positively associated with colorectal cancer outcomes and prognosis; in our study, RP11-909N17.2 is a protective factor in BRCA prognosis. This discrepancy requires further study.
LINC00989 and MAPT-IT1 are associated with congenital diseases [26, 27], and their relationship with cancer remains unclear. No studies have reported associations between cancer and LINC01208, RP5-1011O1.3, RP11-696F12.1, CTC-297N7.9, CTA-384D8.34, or CTC-276P9.4, but we speculate that these lncRNA may be involved in BRCA tumorigenesis. More research is necessary to test this hypothesis.
Several issues remain to be addressed. First, we only downloaded data from the TCGA database. More data are available in other databases that could prove valuable for the risk model. Second, lncRNA play important roles in the occurrence and progression of tumors, but the functions of the lncRNA in the signature are unclear. Additional experimental study of these lncRNA may clarify their functional mechanisms and thus the biological basis of the ten-lncRNA signature for BRCA prognosis.
5. Conclusions
We identified differentially expressed lncRNA associated with the pathogenesis of breast cancer and constructed a ten-lncRNA prognostic model to predict the prognosis of patients with BRCA. The prognostic model performed well in 3- and 5-year OS prediction. The functional mechanisms of these lncRNA have not yet been investigated. Prospective studies are needed to further validate the utility of the ten-lncRNA prognostic model.
Data Availability
The raw data and relevant R code used to support the findings of this study are available at https://github.com/zhouwenqing789/TCGA-MODEL-LCNRNA-BRCA.git
Conflicts of Interest
The authors declare that they have no competing interests.
Authors’ Contributions
Wenqing Zhou and Yongkui Pang contributed equally to this work.
Acknowledgments
This work was supported by the Science and Education Foundation of Wujiang District (wwk201906). We would like to thank Professor Wei Zhu from the Zhongshan Hospital of Fudan University for her technical guidance.
References
[1] C. W. S. Tong, M. Wu, and W. C. S. Cho, “Recent advances in the treatment of breast cancer,” Frontiers in Oncology, vol. 8, 2018.
[2] F. Bray, J. Ferlay, I. Soerjomataram, R. L. Siegel, L. A. Torre, and A. Jemal, “Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries,” CA: a Cancer Journal for Clinicians, vol. 68, no. 6, pp. 394–424, 2018.
[3] E. Senkus, S. Kyriakides, S. Ohno et al., “Primary breast cancer: ESMO Clinical Practice Guidelines for diagnosis, treatment and follow-up,” Annals of Oncology, vol. 26, Supplement 5, no. 26, pp. v8–30, 2015.
[4] F. J. Candido dos Reis, G. C. Wishart, E. M. Dicks et al., “An updated PREDICT breast cancer prognostication and treatment benefit prediction model with independent validation,” Breast Cancer Research, vol. 19, no. 1, 2017.
[5] V. Guarneri, F. Piacentini, G. Ficarra et al., “A prognostic model based on nodal status and Ki-67 predicts the risk of recurrence and death in breast cancer patients with residual disease after preoperative chemotherapy,” Annals of Oncology, vol. 20, no. 7, pp. 1193–1198, 2009.
[6] P. M. Ravdin, L. A. Siminoff, G. J. Davis et al., “Computer program to assist in making decisions about adjuvant therapy for women with early breast cancer,” Journal of Clinical Oncology, vol. 19, no. 4, pp. 980–991, 2001.
[7] D. Liu, D. Wang, C. Wu et al., “Prognostic significance of serum lactate dehydrogenase in patients with breast cancer: a meta-analysis,” Cancer Management and Research, vol. 11, pp. 3611–3619, 2019.
[8] W. Li, J. Lu, Z. Ma, J. Zhao, and J. Liu, “An integrated model based on a six-gene signature predicts overall survival in patients with hepatocellular carcinoma,” Frontiers in Genetics, vol. 10, 2019.
[9] D. Shi, Q. Qu, Q. Chang, Y. Wang, Y. Gui, and D. Dong, “A five-long non-coding RNA signature to improve prognosis prediction of clear cell renal cell carcinoma,” Oncotarget, vol. 8, no. 35, pp. 58699–58708, 2017.
[10] X. Lv, M. He, Y. Zhao et al., “Identification of potential key genes and pathways predicting pathogenesis and prognosis for triple-negative breast cancer,” Cancer Cell International, vol. 19, no. 1, 2019.
[11] M. Zhu, Q. Lv, H. Huang, C. Sun, and J. W. Da Pang, “Identification of a four-long non-coding RNA signature in predicting breast cancer survival,” Oncology Letters, vol. 19, no. 1, pp. 221–228, 2020.
[12] X. Wang, J. Zhou, M. Xu et al., “A 15-lncRNA signature predicts survival and functions as a ceRNA in patients with colorectal cancer,” Cancer Management and Research, vol. 10, pp. 5799–5806, 2018.
[13] Z.-R. Zhou, W.-W. Wang, Y. Li et al., “In-depth mining of clinical data: the construction of clinical prediction model with R,” Annals of Translational Medicine, vol. 23, no. 7, 2019.
[14] P. Andersen and R. Gill, “Cox's regression model for counting processes: a large sample study,” Annals of Statistics, vol. 10, no. 4, pp. 1100–1120, 1982.
[15] H. J. Cho, A. Yu, S. Kim, and J. Kang, “Robust likelihood-based survival modeling with microarray data,” Journal of Statistical Software, vol. 29, no. 1, pp. 1–16, 2009.
[16] S. Abe, S. Zhang, Y. Tomata, T. Tsuduki, Y. Sugawara, and I. Tsuji, “Japanese diet and survival time: the Ohsaki Cohort 1994 study,” Clinical Nutrition, vol. 39, no. 1, pp. 298–303, 2020.
[17] A. E. Zou, J. Ku, T. K. Honda et al., “Transcriptome sequencing uncovers novel long noncoding and small nucleolar RNAs dysregulated in head and neck squamous cell carcinoma,” RNA, vol. 21, no. 6, pp. 1122–1134, 2015.
[18] Y. X. Song, J. X. Sun, J. H. Zhao et al., “Non-coding RNAs participate in the regulatory network of CLDN4 via ceRNA mediated miRNA evasion,” Nature Communications, vol. 8, no. 1, 2017.
[19] J. X. Liu, W. Li, J. T. Li, F. Liu, and L. Zhou, “Screening key long non-coding RNAs in early-stage colon adenocarcinoma by RNA-sequencing,” Epigenomics, vol. 10, no. 9, pp. 1215–1228, 2018.
[20] D. Wang, J. Li, F. Cai et al., “Overexpression of MAPT-AS1 is associated with better patient survival in breast cancer,” Biochemistry and Cell Biology, vol. 97, no. 2, pp. 158–164, 2019.
[21] H. Li, C. Gao, L. Liu et al., “7-lncRNA assessment model for monitoring and prognosis of breast cancer patients: based on Cox regression and co-expression analysis,” Frontiers in Oncology, vol. 9, 2019.
[22] X. Liao, W. Zhan, J. Zhang et al., “Long noncoding RNA LINC01234 promoted cell proliferation and invasion via miR-1284/TRAF6 axis in colorectal cancer,” Journal of Cellular Biochemistry, pp. 1–15, 2020.
[23] X. Chen, Z. Chen, S. Yu et al., “Long noncoding RNA LINC01234 functions as a competing endogenous RNA to regulate CBFB expression by sponging miR-204-5p in gastric cancer,” Clinical Cancer Research, vol. 24, no. 8, pp. 2002–2014, 2018.
[24] Z. Chen, X. Chen, B. Lu et al., “Up-regulated LINC01234 promotes non-small-cell lung cancer cell metastasis by activating VAV3 and repressing BTG2 expression,” Journal of Hematology & Oncology, vol. 13, no. 1, 2020.
[25] X. Jiang, Q. Zhu, P. Wu, F. Zhou, and J. Chen, “Upregulated long noncoding RNA LINC01234 predicts unfavorable prognosis for colorectal cancer and negatively correlates with KLF6 expression,” Annals of Laboratory Medicine, vol. 40, no. 2, pp. 155–163, 2020.
[26] Z. Ye, H. Luo, B. Gong et al., “Evaluation of four genetic variants in Han Chinese subjects with high myopia,” Journal of Ophthalmology, vol. 2015, Article ID 729463, 6 pages, 2015.
[27] G. R. Nascimento, I. P. Pinto, A. V. de Melo et al., “Molecular characterization of Koolen de Vries syndrome in two girls with idiopathic intellectual disability from Central Brazil,” Molecular Syndromology, vol. 8, no. 3, pp. 155–160, 2017.
Copyright © 2020 Wenqing Zhou et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.