Abstract

Background. Colorectal cancer (CRC) has been the 3rd most commonly malignant tumor of the gastrointestinal tract in the world. 5-Methylcytosine (m5C) and long noncoding RNAs (lncRNAs) have an essential role in predicting the prognosis and immune response for CRC patients. Therefore, we built a m5C-related lncRNA (m5CRlncRNA) model to investigate the prognosis and treatment methods for CRC patients. Methods. Firstly, we secured the transcriptome and clinical data for CRC from The Cancer Genome Atlas (TCGA). Then, m5CRlncRNAs were recognized by coexpression analysis. Then, univariate Cox, least absolute shrinkage and selection operator (LASSO), and multivariate Cox regression analyses were utilized to build m5C-related prognostic characteristics. Besides, Kaplan-Meier analysis, ROC, PCA, -index, enrichment analysis, and nomogram were performed to investigate the model. Additionally, immunotherapy responses and antitumor medicines were explored for CRC patients. Results. A total of 8 m5C-related lncRNAs (AC093157.1, LINC00513, AC025171.4, AC090948.2, ZEB1-AS1, AC109449.1, AC009041.3, and LINC02516) were adopted to construct a risk model to investigate survival and prognosis for CRC patients. CRC samples were separated into low- and high-risk groups, with the latter having a worse prognosis. The m5C-related lncRNA model helps us to better distinguish immunotherapy responses and IC50 of antitumor medicines in different groups of CRC patients. Conclusion. The research may give new perspectives on tailored therapy approaches as well as novel theories for forecasting the prognosis of CRC patients.

1. Introduction

In terms of cancer-related deaths, colorectal cancer (CRC) is the third most frequent malignant tumor worldwide [1]. The recent epidemiological surveys showed that CRC contributes to 10% of all diagnosed cancers and 9.4% of cancer-related deaths [2]. The high incidence and low survival rate of CRC imposed a heavy economic burden and enormous public health pressure on the government. At present, the main clinical treatment strategies for CRC include surgery, chemotherapy, and radiotherapy, but with poor prognosis, easy recurrence, and significant side effects [3]. In order to better understand CRC, it is urgently needed to select key CRC-related genes, elucidate the potential pathogenesis of CRC, and develop novel diagnostic and therapeutic strategies for CRC.

Numerous studies have found that RNA modifications in epigenetic changes are intimately related to the progression of different types of tumors [4, 5]. At present, more than 150 RNA modifications have been recognized, such as N1-methyladenosine (m1A), 5-methylcytosine (m5C), N6-methyladenosine (m6A), 7-methylguanosine (m7G), microRNA, and long noncoding RNA (lncRNA) [6]. With the in-depth study of RNA modification, m5C has received increasing attention from scholars around the world. As a widespread RNA modification of noncoding and coding RNAs, m5C has a crucial function in the regulation of physiological and pathological processes in the organism [7]. A study demonstrated that m5C regulators were linked to the occurrence and progression of cancer [8]. In bladder cancer, the m5C modification writer NSUN2 modulates HDGF expression in a m5C-dependent manner in order to promote cancer development [9].

lncRNA is the nonprotein-coding RNA fraction of over 200 nucleotides in length that cannot be translated into protein [10]. It has been shown that RNA methylation of lncRNAs could impact cancer progression [11]. With the advancement of sequencing technology, m5C was found to be extensively distributed in lncRNAs. However, the utility of m5C in lncRNAs is still uncertain. Therefore, identifying m5C-related lncRNAs (m5CRlncRNAs) in CRC pathogenesis may help provide a rational basis for targeted therapy and prognosis.

In this study, bioinformatics analysis was used to examine the potential contribution of m5CRlncRNAs to CRC. The Cancer Genome Atlas was used to obtain a database of m5C genes and lncRNAs (TCGA). Then, using Pearson’s correlation analysis, we were able to identify the m5CRlncRNA. Additionally, a brand-new risk model for the m5CRlncRNA was developed to forecast overall survival (OS) in CRC patients. We also created a nomogram incorporating clinical data to predict the overall survival of CRC patients. Finally, we looked for the connection between immunotherapy responses.

2. Materials and Methods

2.1. Data Acquisition

TCGA database was utilized to retrieve RNA transcriptome data, relevant clinical information, and mutation data from CRC samples. We used the R package to process the downloaded files. To reduce statistical bias, we excluded CRC patients with absent OS values and short OS values (<30 days).

2.2. Identification m5C Genes and m5CRlncRNAs

Based on previous publications [12, 13], we extracted 17 m5C regulators from TCGA-CRC, including expression data on 11 writers, 2 readers, and 4 erasers (Supplementary Table 1). Then, we screened m5CRlncRNA by Pearson correlation analysis, and we derived 2,028 m5CRlncRNA. and were the threshold criteria.

2.3. Construction of a Risk Model

TCGA dataset was randomly distributed into a training set and a testing set (ratio: 0.7 : 0.3; sample: 355 : 148). We used the entire set to construct a m5CRlncRNA risk model, and the training set and testing set were employed to verify the risk model. No significant differences were found in the clinical features of CRC patients between the two sets (Table 1). We utilized univariate Cox analysis of the filtered 14 m5CRlncRNA in combination with CRC survival information (). Besides, we adopted the least absolute shrinkage and selection operator (LASSO) and Cox regression analyses to construct a risk assessment model that consisted of 8 m5CRlncRNAs via the R package “glmnet” [14]. According to median risk scores, the CRC patients were assigned to low- and high-risk groups [15]. And the risk score was calculated as follows: .

2.4. Validation of the Risk Signature

By using the “survminer” and “survive” packages in the R programming language, Kaplan-Meier survival analysis was employed to compare the clinical outcomes of the two groups. The time-dependent receiver-operating characteristic curves (ROC) and the area under the curve (AUC) were employed to confirm the accuracy of the model. We also grouped patients according to clinical characteristics to assess the ability of the model to predict prognosis across clinical characteristics. Principal component analysis (PCA) was employed for effective dimension reduction, model recognition, and exploratory visualization of high-dimensional data of the whole gene expression profiles, m5C genes, m5CRlncRNAs, and a risk model on the basis of the expression patterns of the m5CRlncRNAs. A consistency index (-index) was applied to determine the accuracy of the model compared to the traditional clinical features.

2.5. Construction of Predictive Nomogram

We developed a nomogram to predict the clinical features for the 1-, 3-, and 5-year OS of CRC patients via the R package of “rms.”.

2.6. Evaluation of Enrichment Analysis

A clusterProfiler R package was used to perform GO enrichment analysis and KEGG pathway analysis to explore possible biological functions. indicated that the functional pathways had significant enrichment.

2.7. Assessment of the Prognostic Features in the Tumor Immune Microenvironment

Studying how the model interacts with the tumor microenvironment, we measured the infiltration values for TCGA-CRC dataset samples on the basis of these algorithms: XCELL [16], TIMER [17], QUANTISEQ [18], MCPCOUNTER [19], EPIC [20], CIBERSORT-ABS [21], and CIBERSORT [22]. Additionally, we adopted single-sample GSEA (ssGSEA) for scoring CRC-infiltrating immune cells to quantify the tumor-infiltrating immune cells between different groups. Furthermore, we also evaluated the immune checkpoint activation among different groups.

2.8. Investigation of Immunotherapy Response

The mutation data was assessed and summarized by the “maftools” of R package. Based on tumor-specific mutated genes, we calculated the tumor mutational burden (TMB). In addition, the tumor immune dysfunction and exclusion (TIDE) algorithm was performed to estimate the probability of an immunotherapeutic response.

2.9. Exploration of Antitumor Agents

To predict therapeutic response, the “pRRophetic” R package was employed to determine the half-maximal inhibitory concentration (IC50) of commonly used antitumor drugs in different risk groups.

3. Results

3.1. Screen of m5CRlncRNAs in CRC Patients

A total of 17 m5C genes and 16,876 lncRNAs were selected from TCGA datasets. We found 2,028 lncRNAs that were strongly linked to one of the 17 m5C genes ( and ) (Supplementary Table 2). As shown in Figure 1(a), the m5C-lncRNA expression network was visualized via the Sankey diagram. Figure 1(b) depicts the relationship between m5C genes and m5CRlncRNAs in TCGA datasets.

3.2. Construction and Validation of a Risk Model

We adopted univariate Cox regression analysis to select 14 prognostic m5CRlncRNAs (Supplementary Figure S1A). The LASSO-Cox regression algorithm was performed to construct the risk signature, including 8 m5CRlncRNAs (AC093157.1, LINC00513, AC025171.4, AC090948.2, ZEB1-AS1, AC109449.1, AC009041.3, and LINC02516) in CRC (Figures 2(a)2(c)). In addition, Kaplan-Meier analysis revealed a significant difference between distinct groups (, Figure 2(d)). In Figure 2(e), the 1-, 3-, and 5-year AUC values were 0.746, 0.717, and 0.792, which demonstrated that CRC patients have a better prognosis. Furthermore, the AUC value of the signature was 0.792, which was notably higher than that of clinicopathological characteristics, including age (0.646), gender (0.481), and stage (0.737; Figure 2(f)). The Kaplan-Meier analyses and ROC curves of training set and testing set indicated that the prediction accuracy of the model is satisfactory (Supplementary Figure S1B-E).

Next, we studied the differences in clinical outcomes among distinct groups stratified by clinical characteristics. Kaplan-Meier survival analysis demonstrated that our model can be applied to a variety of clinical characteristics (Figure 3(a)). The PCA analysis showed that the distributions of the two groups were relatively dispersed, which indicated diverse groups had different distributions on the basis of the signature (Figure 3(b)). And the -index of the model was superior to the clinicopathological features, indicating that this model could better predict the prognosis of CRC patients (Figure 4(a)).

3.3. Construction of Nomogram and Calibration in CRC Patients

As shown in Figure 4(b), the calibration curves of a nomogram revealed good accordance between the predicted 1-, 3-, and 5-year OS rates and actual observations. And the nomogram was established to demonstrate the superior predictive power of m5C compared to clinical features (Figure 4(c)).

3.4. Functional Enrichment Analysis

The GO analysis (Figure 5(a)) showed that the terms were mainly enriched in the signaling receptor activator activity and receptor ligand activity of biological processes (BP), the apical part of cell and presynapse of cellular component (CC), and the epidermis development and skin development of molecular function (MF) (Supplementary Table 3). The KEGG analysis showed that the terms were mainly enriched in hsa04978, hsa04972, hsa04020, hsa00040, hsa05226, and hsa04390 (Figure 5(b)) (Supplementary Table 4).

3.5. Evaluation of Tumor Immune Microenvironment

To further explore whether the m5CRlncRNA was related to the TIME, we assessed the relationship between the signature and tumor-infiltrating immune cells. Significant correlations were noted between the abundance of these tumor-infiltrating immune cells and increased CRC risk (Figure 6(a)) (Supplementary Table 5). The ssGSEA results showed that HLA, type I IFN response, and type II IFN response of patients in the low-risk group were lower compared to high-risk group (, Figure 6(b)). We further investigated the immune checkpoints, and the results revealed significant differences in the distribution of immune checkpoint-related molecule expression among different groups. We examined the expression levels of 46 immune checkpoint genes, 14 of which differed in expression in the high- and low-risk groups. The immune checkpoints of TNFRSF15 and HHLA2 in the low-risk group were higher () (Figure 6(c)). The above findings might imply that the low-risk group was more immunologically active and might be more sensitive to immunotherapy.

3.6. Evaluation of Immunotherapy

On the basis of the somatic mutational data from TCGA, we calculated the mutation frequency among different groups. And the mutation frequencies of the different groups were depicted by the waterfall chart. It found that 223 of 232 (96.12%) CRC samples in the high-risk group and 219 of 234 (93.59%) CRC samples in the low-risk group displayed genetic mutations, and missense mutation was the most common variant classification. In addition, APC had high genetic alterations (72%), which was only lower than that of TP53 (60%) in the high-risk group (Figure 7(a)). APC had the highest genetic alterations (79%) (Figure 7(b)). Then, we evaluated the relationship between different groups and immunotherapy biomarkers. As exhibited in Figures 7(c) and 7(d), we observed that the high-risk group was more sensitive to immunotherapy, suggesting that the m5C-based classifier index may be a predictor of tumor mutation burden and TIDE. Finally, we used Kaplan-Meier curve analysis of patient OS based on TMB. As displayed in Figure 7(e), a significant difference was observed between patients in the high TMB and low TMB groups (). We further investigated whether the m5CRlncRNA model could predict OS outcomes greater than TMB alone. Compared with other groups, the low-risk group with high TMB had the best prognosis among those of the other three groups () (Figure 7(f)). In summary, the signature of m5CRlncRNA has greater prognostic implications than that of TMB.

3.7. Selection Potential Antitumor Drugs by m5CRlncRNA Model

To identify for potential drugs targeting the m5CRlncRNA model for the treatment of CRC patients, the pRRophetic algorithm was implemented to assess therapeutic efficacy according to the IC50 of each data. In addition, we noticed that the sensitivity of the two groups differed significantly for 23 compounds by predicting the potential therapeutic agents. As shown in Figure 8, we detected that the low-risk group was closely connected with chemotherapeutic agents with higher IC50, suggesting that low-risk patients were more responsive to chemotherapeutic drugs.

4. Discussion

Many experts and scholars have concentrated on exploring the pathogenesis and treatment strategies of CRC in recent years [23]. Despite the fact that surgery, chemotherapy, radiotherapy, and targeted therapy were used for CRC patients, treatment outcomes were poor, and 5-year survival rates were low [24]. In recent years, research has demonstrated that cancer patients with different clinical characteristics and subgroups are likely to have a different prognosis and response to treatment [25, 26]. Thus, it is vital to investigate effective and personalized treatment options for the prognosis and management of CRC.

Firstly, we downloaded the lncRNA data of CRC patients from TCGA database. According to univariate Cox, LASSO, and multivariate Cox regression analysis, 8 m5CRlncRNAs (AC093157.1, LINC00513, AC025171.4, AC090948.2, ZEB1-AS1, AC109449.1, AC009041.3, and LINC02516) were determined to be significant prognostic factors to explore the prognostic function in CRC. In recent years, lncRNAs have been linked to cancer survival and development in many studies [27, 28]. And ZEB1-AS1 was found to be a tumor-related lncRNA prognostic factor in CRC [29]. In addition, a study revealed that regulation of LINC00513 lncRNA expression can affect disease susceptibility in systemic lupus erythematosus [30]. Besides, other lncRNAs were screened for the first time as prognostic markers for CRC. Based on 8 m5CRlncRNAs, we built a risk assessment model to further investigate the association between m5CRlncRNA and CRC. Next, CRC patients were divided into different risk groups based on median scores, and the high-risk group had a lower clinical survival rate. There were similar results found in the analysis of subgroups categorized by gender, age, and tumor stage. Additionally, PCA analysis also supported the grouping ability of m5C. As part of the present study, we developed a nomogram with clinical characteristics and m5CRlncRNAs. These results suggest that OS was shorter in the high-risk group, with better concordance between 1-year, 3-year, and 5-year observation and predicted OS rates.

Furthermore, we investigated the associations between risk groups and TMB and TIME. The TMB is regarded as the total number of somatic cell-encoded mutations that lead to neoantigens being generated that trigger antitumor immunity [31]. A large number of studies have proven that TMB is a powerful biomarker for predicting the efficacy of checkpoint inhibitors in cancer [32, 33]. In the high-risk group, TMB appeared to be significantly higher. Additionally, TIDE is a computational framework for simulating tumor immune evasion and is usually applied to forecast the effects of immune checkpoint inhibition therapy [34]. Our results demonstrated that the low-risk group was predicted to have a superior response to immunotherapy. To probe the therapeutic potential of the identified m5CRlncRNAs for CRC patients, we analyzed their sensitivity to different drugs. And we discovered that the low-risk group was significantly related to chemotherapeutic agents with higher IC50. Altogether, the above results help us to further predict the prognosis of CRC and elucidate the molecular biological mechanism between m5CRlncRNAs and CRC.

However, there are several issues with the research. First off, we lack external datasets to validate the prediction accuracy of the risk model since there are not any lncRNA-related CRC datasets in the Gene Expression Omnibus (GEO) database. Second, because of the study’s limited sample size, there could be some bias in the data analysis. Thirdly, there is no experimental validation of analytical findings in the research to demonstrate the use of the risk model in clinical treatment. Therefore, we will try to verify the validity of the signature utilizing animal and cellular tests.

Overall, our study did provide not only new insights into individualized treatment strategies for patients with CRC but also new ideas for predicting the prognosis of these patients. In addition, this study might contribute to further exploring the biological functions of m5C-regulated lncRNAs.

Data Availability

The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Authors’ Contributions

MS and PZ designed the study. PZ, TTZ, DGC, and LG performed data analysis. PZ, TTZ, DGC, and LG drafted the manuscript. TTZ, LG, and MS revised the manuscript. All authors read and approved the final manuscript. Peng Zhang, Tingting Zhang, and Denggang Cheng contributed equally to this work.

Acknowledgments

This research was supported by the National Natural Science Foundation of China (81902498), Hubei Provincial Natural Science Foundation (2019CFB177), Natural Science Foundation of Hubei Provincial Department of Education (Q20182105), Chen Xiao-Ping Foundation for the development of science and technology of Hubei Provincial (CXPJJH11800001-2018333), The Foundation of Health and Family Planning Commission of Hubei Province (WJ2021Q007), Innovation and Entrepreneurship Training Program (201810929005, 201810929009, 201810929068, 201813249010, S201910929009, S201910929045, S202013249005, S202013249008, and 202010929009), and The Scientific and Technological Project of Taihe Hospital (2021JJXM009).

Supplementary Materials

Supplementary 1. Supplementary Figure S1: (a) the prognostic m5CRlncRNAs by univariate Cox regression analysis. (b–e) Kaplan-Meier curves and the 1-, 3-, and 5-year ROC curves of the training and testing sets.

Supplementary 2. Supplementary Table S1: the list of m5C genes.

Supplementary 3. Supplementary Table S2: the details of m5CRlncRNAs.

Supplementary 4. Supplementary Table S3: the details of GO enrichment analysis.

Supplementary 5. Supplementary Table S4: the details of KEGG enrichment analysis.

Supplementary 6. Supplementary Table S5: the abundance of these tumor-infiltrating immune cells and increased CRC risk.