Prognostic Models for Nonmetastatic Triple-Negative Breast Cancer Based on the Pretreatment Serum Tumor Markers with Machine Learning

Chen, Huihui; Wu, Shijie; Hu, Jun; Zhang, Kun; Hu, Kaimin; Lu, Yuexin; He, Jiapan; Pan, Tao; Chen, Yiding

doi:https://doi.org/10.1155/2021/6641421

Journal of Oncology

Special Issue

Prognostic Models Based on Machine Learning for Clinical Cancer Research

Research Article | Open Access

Volume 2021 | Article ID 6641421 | https://doi.org/10.1155/2021/6641421

Huihui Chen,^1,2 Shijie Wu,^1,2 Jun Hu,^1,2,3 Kun Zhang,^1,2 Kaimin Hu,^1,2 Yuexin Lu,^1,2 Jiapan He,^1,2 Tao Pan,^1,2 and Yiding Chen^1,2

Academic Editor: Zhixiong Liu

Received 10 Dec 2020

Accepted 03 May 2021

Published 17 May 2021

Abstract

Purpose. Triple-negative breast cancer (TNBC) is a heterogeneous and aggressive disease with a poorer prognosis than other subtypes. We aimed to investigate the prognostic efficacy of multiple tumor markers and to construct a prognostic model for stage I-III TNBC patients. Patients and Methods. We included stage I-III TNBC patients whose serum tumor marker levels were measured prior to treatment. The optimal cut-off value of each tumor marker was determined by X-tile. We then adopted two survival models (the lasso Cox model and the random survival forest model) to build the prognostic model and calculated AUC values of the time-dependent receiver operating characteristic (ROC) curves. The Kaplan-Meier method was used to plot the survival curves, and the log-rank test was used to test whether there was a significant difference between the predicted high-risk and low-risk groups. We used univariable and multivariable Cox analysis to identify independent prognostic factors and performed a further subgroup analysis for the lasso Cox model. Results. We included 258 stage I-III TNBC patients. CEA, CA125, and CA211 showed independent prognostic value for DFS when using the optimal cut-off values; their HRs (95% CIs) were 1.787 (1.056–3.226), 2.684 (1.200–3.931), and 2.513 (1.567–4.877), respectively. The AUC values of the lasso Cox model and the random survival forest model for DFS at 60 months were 0.740 and 0.663, respectively. Both the lasso Cox model and the random survival forest model demonstrated excellent prognostic value. According to the tumor marker risk scores (TMRS) computed by the lasso Cox model, the high TMRS group had worse DFS (HR = 3.138, 95% CI: 1.711–5.033) and OS (HR = 3.983, 95% CI: 1.637–7.214) than the low TMRS group. Furthermore, subgroup analysis of N₀-N₁ patients in the lasso Cox model indicated that TMRS still had a significant prognostic effect on DFS (HR = 2.278, 95% CI: 1.189–4.346) and OS (HR = 2.982, 95% CI: 1.110–7.519). Conclusions. Our study indicated that pretreatment levels of serum CEA, CA125, and CA211 had independent prognostic significance for TNBC patients. Both the lasso Cox model and the random survival forest model that we constructed based on tumor markers could strongly predict survival risk. Higher TMRS was associated with worse DFS and OS in both stage I-III and N₀-N₁ TNBC patients.

1. Introduction

Breast cancer is the most common malignancy among women worldwide, with the highest morbidity and mortality among female cancers. According to the global cancer statistics report released by the World Health Organization, an estimated 2.08 million new female breast cancer cases were diagnosed and more than 0.62 million patients died of the disease in 2018 [1]. Triple-negative breast cancer (TNBC) is characterized by the absence of estrogen receptor (ER) and progesterone receptor (PR) expression and of human epidermal growth factor receptor-2 (HER-2) amplification, and it accounts for 10%-20% of all breast cancers [2–4]. Compared with non-TNBC patients, TNBC patients usually have more unfavorable histopathologic features, such as more rapid proliferation, larger tumor size, higher grade, and lymph node positivity [5, 6]. TNBC patients cannot benefit from endocrine therapy or anti-HER-2 therapy because the targets are missing, which makes chemotherapy the current mainstay of systemic treatment.

Notorious for its heterogeneity, aggressiveness, and limited treatment options, TNBC is thought to have the poorest prognosis of all subtypes. Although TNBC patients are reported to be sensitive to chemotherapy, as demonstrated by higher pathologic complete response (pCR) rates than other subtypes after neoadjuvant chemotherapy [7, 8], a considerable number of patients still do not achieve pCR, and those with residual lesions have significantly worse survival than non-TNBC patients [7]. In addition, TNBC carries a higher risk of relapse and disease progression after surgery and chemotherapy [9, 10]. Montagna et al. further evaluated the outcome of breast cancer patients after locoregional recurrence (LRR) and found that patients with TNBC at LRR had a higher risk of subsequent relapse and death [11]. Recently, a retrospective analysis based on the SEER database also revealed that, compared with non-TNBC patients, TNBC patients had worse overall survival (OS) and breast cancer cause-specific survival (BCSS) in every stage and substage [12]. Survival after distant metastasis is also shorter in TNBC than in other subtypes, which can be explained by the predilection of TNBC for brain and lung metastasis, whereas ER-positive breast cancers are more likely to relapse in bone or skin [4, 13, 14]. Therefore, it is important to identify effective and easily measured prognostic markers for evaluating the risk of postoperative recurrence and the likelihood of survival.

Apart from the extensively documented clinicopathological risk factors such as lymph node status, tumor size, grade, and the level of Ki-67, there are still no prognostic biomarkers suitable for clinical use in TNBC [15, 16]. The prognostic value of serum tumor markers has been investigated in breast cancer for several years and carcinoembryonic antigen (CEA) and cancer antigen 15-3 (CA15-3) are the most widely used tumor markers in clinical practice [17–21]. However, the prognostic efficacy of preoperative levels of serum tumor markers such as CEA and CA15-3 in breast cancer remains controversial. Several previous studies suggested that elevated preoperative CEA and CA15-3 levels are associated with tumor burden and poor prognosis [17, 22, 23]. In contrast, there are also some reports that failed to support this conclusion, showing no prognostic significance of CEA or CA15-3 [21, 24]. Although the European Group on Tumor Markers has recommended the use of CEA and CA15-3 for assessing prognosis and early detection of disease progression in breast cancer since 2005 [25], the American Society of Clinical Oncology (ASCO) and National Comprehensive Cancer Network (NCCN) guidelines have not recommended the routine utilization of CEA and CA15-3 [26, 27]. Additionally, most studies have been based on breast cancer overall; the association of these tumor markers and different subtypes of breast cancer such as TNBC remains to be clarified.

In recent years, machine learning methods have been widely applied to disease prognosis and prediction [28–30]. These techniques are used to identify informative factors and to model the progression of cancer. Park et al. compared three classification models, namely, support vector machines (SVM), artificial neural networks (ANN), and semisupervised learning models (SSLM), for the prediction of breast cancer survivability based on 16 features, including tumor size, the number of nodes, and age [28]. However, SVM, ANN, and SSLM are designed for classification and are not suitable for time-to-event data. The lasso Cox regression model and the random survival forest model are commonly used machine learning algorithms for survival data. For example, Zheng et al. developed a novel scoring system based on hypoxia and immune status using the lasso Cox regression model [30].

In this study, we investigated the prognostic efficacy of multiple tumor markers and constructed prognostic models for stage I-III TNBC patients with machine learning algorithms, based on the pretreatment levels of six tumor markers (CEA, CA19-9, CA125, CA242, CA211, and CA15-3), so as to help identify early-stage patients with a high risk of recurrence and death.

2. Patients and Methods

2.1. Study Population

We conducted a retrospective analysis of stage I-III TNBC patients who were admitted to The Second Affiliated Hospital of Zhejiang University, School of Medicine, between January 2011 and December 2017 and whose serum tumor marker levels (CEA, CA19-9, CA125, CA242, CA211, and CA15-3) were measured prior to surgery or neoadjuvant chemotherapy. TNBC was defined as ER and PR negative (or <1% if the percentage was specified) with a HER-2 status of 0 or 1+ by immunohistochemistry, or 2+ with negative fluorescence in situ hybridization [31, 32]. Patients with any missing receptor information or a missing pathology report were excluded from the analysis. Patients were also excluded if they met any of the following criteria: (1) carcinoma in situ; (2) male sex; (3) stage IV disease with distant metastasis at first diagnosis; (4) history of other malignant tumors. All data, including clinical and pathological information, treatment modality, serum tumor markers, and details of outcomes, were collected. TNM stage was based on the Eighth American Joint Committee on Cancer Criteria. Written informed consent was obtained from each patient or the patient’s guardian, and the study was approved by the Ethics Committee of The Second Affiliated Hospital of Zhejiang University, School of Medicine.

2.2. Tumor Markers Detection

Peripheral blood samples (5 mL) were collected from all patients before treatment. Serum was then separated by centrifugation and kept at −80°C for later detection. The serum CEA, CA19-9, CA125, CA242, CA211, and CA15-3 levels were measured using the chemiluminescence immunoassay method (ARCHITECT i2000; Abbott Laboratories Inc). The clinical cut-off values separating normal from elevated levels were 5 ng/mL for CEA, 37 U/mL for CA19-9, 35 U/mL for CA125, 20 U/mL for CA242, 5 ng/mL for CA211, and 30 U/mL for CA15-3.

2.3. Follow-Up and Study Endpoints

Patients were followed up every 3 months during the first 2 years, every 6 months during years 3–5, and once a year thereafter, with the date of surgery considered the first day of follow-up. The primary study endpoints were disease-free survival (DFS) and overall survival (OS). DFS was defined as the time from the date of surgery to the date of locoregional recurrence, distant metastasis, a second primary cancer, or death before recurrence, or to the date of the last follow-up. OS was defined as the time from the date of surgery to death from any cause or to the date of the last follow-up.

2.4. Lasso Cox Model and Random Survival Forest Model

The least absolute shrinkage and selection operator (lasso) Cox regression model analysis was performed by using the “glmnet” package [33]. Partial likelihood deviance was selected as the loss function, and the optimal values of penalty parameter λ were determined through twenty-fold cross-validation [34]. Regression coefficients of each tumor marker were calculated with the optimal λ value, and tumor marker risk scores (TMRS) of patients were then calculated based on the levels of serum tumor markers and their associated regression coefficients accordingly.
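For reference, the penalized objective behind a lasso Cox fit and the resulting risk score can be written as follows. This is a sketch of the standard formulation, not a formula taken from the methods above: the symbols (covariate vector x_i, event indicator δ_i, risk set R(t_i), sample size n, number of markers p) are notation introduced here, tied event times are ignored, and the 2/n scaling of the log partial likelihood is an assumption about the convention used by the fitting software.

```latex
\ell(\beta) = \sum_{i:\,\delta_i = 1} \left[ x_i^{\top}\beta - \log \sum_{k \in R(t_i)} \exp\!\left(x_k^{\top}\beta\right) \right],
\qquad
\hat{\beta} = \arg\min_{\beta} \left\{ -\frac{2}{n}\,\ell(\beta) + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \right\},
\qquad
\mathrm{TMRS}_i = \sum_{j=1}^{p} \hat{\beta}_j\, x_{ij}.
```

Under this formulation, a marker whose coefficient is shrunk exactly to zero drops out of the risk score, which is how the penalty performs variable selection.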

Random survival forest (RSF) is an extension of Breiman’s random forest method designed for the analysis of right-censored time-to-event data [35]. We fitted an RSF model to build the predictive model using the “randomForestSRC” package [35]. Two tuning parameters were optimized by a grid search to minimize the out-of-bag (OOB) error: node size, the number of samples in a terminal node, and mtry, the number of randomly selected candidate variables at each parent node. TMRS of the RSF model were calculated using the “predict” function of the “stats” package. With the median TMRS as the cut-off value, all TNBC patients were split into high TMRS and low TMRS groups in both models.
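The median split described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name, the patient identifiers, and the example scores are hypothetical, and assigning scores equal to the median to the low group is an assumption, since the handling of ties is not specified in the text.

```python
from statistics import median


def split_by_median(risk_scores):
    """Split patients into 'high' and 'low' groups at the cohort median score.

    risk_scores: dict mapping a patient id to its risk score (TMRS).
    Assumption: a score equal to the median is assigned to the 'low' group.
    """
    cut = median(risk_scores.values())
    groups = {pid: ("high" if score > cut else "low")
              for pid, score in risk_scores.items()}
    return cut, groups


# Hypothetical scores for four patients (not study data).
example_scores = {"P1": 0.2, "P2": 0.9, "P3": 0.5, "P4": 1.3}
cut, groups = split_by_median(example_scores)
print(cut, groups)
```

The same function applies to either model, because it only needs one numeric score per patient.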

2.5. Statistical Analysis

Tumor marker levels at different stages were compared using one-way analysis of variance (ANOVA) with Tukey’s post hoc test or the nonparametric Kruskal-Wallis test, depending on the distribution of the data and the homogeneity of variances. X-tile 3.6.1 software (Yale University, New Haven, CT, USA) was used to determine the optimal prognostic cut-off value of each tumor marker in TNBC patients [36]. The sensitivity and specificity of the survival prediction based on the TMRS were depicted by a time-dependent receiver operating characteristic (ROC) curve, and the area under the ROC curve (AUC) was quantified using the “timeROC” package [37]. All packages were run in R (version 3.4.2). GraphPad Prism 6 was used to plot Kaplan-Meier survival curves; group differences in survival time were tested using the log-rank test, and hazard ratios (HRs) with 95% confidence intervals (CIs) were calculated. Differences between proportions were evaluated by the chi-square or Fisher’s exact test as appropriate. Univariable and multivariable Cox proportional hazards analyses were performed to identify independent prognostic factors for DFS. All tests were 2-sided and statistical significance was set at . Other statistical analyses were performed using SPSS 24.0 and GraphPad Prism 6.
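To make the Kaplan-Meier method concrete, the following Python sketch computes the product-limit survival estimate from follow-up times and event indicators. It is a minimal illustration, not the GraphPad Prism procedure used in the study; the function name and the example data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times:  follow-up times (e.g. months from surgery)
    events: 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            surv *= 1.0 - n_events / n_at_risk
            curve.append((t, surv))
        # Events and censored subjects both leave the risk set after time t.
        n_at_risk -= n_leaving
    return curve


# Hypothetical data for five patients (not study data).
curve = kaplan_meier([6, 12, 12, 20, 30], [1, 0, 1, 1, 0])
for t, s in curve:
    print(t, round(s, 3))
```

Censored patients contribute to the number at risk until their last follow-up, which is why the estimate differs from a simple proportion of event-free patients.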

3. Results

3.1. Patient Characteristics and Follow-Up

A total of 258 stage I-III TNBC patients met the criteria for inclusion in the study. The clinicopathological characteristics of the patients are shown in Table 1. The median age at diagnosis was 51.5 years (range 25–87 years), and most patients (68.2%) were diagnosed between 40 and 60 years of age. Tumor laterality was nearly balanced (left 50.8%, right 48.8%). One patient was diagnosed with bilateral breast cancer (left invasive ductal carcinoma and right carcinoma in situ), with both sides negative for ER, PR, and HER-2. The pathological classification of 203 cases (78.7%) was nonspecific invasive cancer. In total, 110 patients (42.6%) were classified as histologic grade III, and the expression of Ki-67 was ≥30% (high expression) in 193 cases (74.8%). As for the TNM stage, there were 100 cases (38.8%) in stage I, 111 cases (43.0%) in stage II, and 36 cases (14.0%) in stage III. In addition, 178 patients (69.0%) underwent total mastectomy, and 80 (31.0%) received breast-conserving surgery. Chemotherapy (adjuvant or neoadjuvant) was given to 236 patients (91.5%), and 114 patients (44.2%) received postoperative radiotherapy. During follow-up, 53 patients (20.5%) had a DFS event: 16 had locoregional recurrence (6.2%), 31 had distant metastasis (12.0%), 3 had a second primary cancer (1.2%), and 3 died of other causes (1.2%). Moreover, 28 patients (10.8%) died, 23 of them of breast cancer.

Kaplan-Meier survival curves of DFS and OS in all included TNBC patients are shown in Figure 1. The median follow-up time of our study population was 41.25 months for DFS and 49.25 months for OS. The 5-year DFS and OS were 76.5% and 86.7%, respectively (Figures 1(a) and 1(b)).

3.2. The Levels of Pretreatment Serum Tumor Markers

Figure 2 shows the distribution of each tumor marker among patients at different stages. In these early-stage TNBC patients, only a few had elevated serum tumor marker levels. For example, only 10 (3.9%), 17 (6.6%), and 10 (3.9%) patients showed elevated levels of CEA, CA19-9, and CA15-3, respectively. However, when stages I-III were compared, elevations of four markers (CEA, CA125, CA211, and CA15-3) tended to be found more often in the more advanced stages (stage II or III). As shown in Figures 2(a) and 2(e), the serum levels of CEA and CA211 were significantly higher in stage III patients than in stage I and stage II patients. For CA15-3, both stage II and stage III TNBC patients showed higher levels than stage I patients (Figure 2(f)). In contrast, there was no obvious correlation between TNM stage and the serum levels of CA19-9 or CA242 (Figures 2(b) and 2(d)).

Figure 2

The levels of each serum tumor marker in stage I-III TNBC patients. (a) CEA. (b) CA19-9. (c) CA125. (d) CA242. (e) CA211. (f) CA15-3. A scatter represents a patient, and the cut-off value of each scatter plot is the clinical upper limit, in which higher than the cut-off is indicated by red scatter while the lower is blue. The comparison between different stages was performed using one-way ANOVA and Tukey’s post hoc test or the nonparametric Kruskal-Wallis test as appropriate. , , indicated a significant difference. CEA: carcinoembryonic antigen; CA: cancer antigen; TNBC: triple-negative breast cancer; NS: not significant.

On the other hand, we also compared the levels of tumor markers among patients without recurrence evidence, with locoregional recurrence and with distant metastasis, respectively (Figure S1). The results suggested that for those with different DFS status, their pretreatment levels of serum tumor markers had no significant difference.

3.3. The Optimal Cut-Off Values Determined by X-Tile and Their Prognostic Role

Although stage II or III patients showed higher levels of tumor markers than stage I patients, only a few patients had levels above the clinical upper limit. We therefore considered the clinical cut-off value inappropriate as a prognostic cut-off for early-stage TNBC patients and used X-tile to determine the optimal prognostic cut-off value of each tumor marker. As shown in Table 2, the optimal cut-off values of CEA, CA19-9, CA125, CA242, CA211, and CA15-3 were 2.15 ng/mL, 17.30 U/mL, 9.05 U/mL, 8.85 U/mL, 1.15 ng/mL, and 16.00 U/mL, respectively.
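The general idea of an outcome-based cut-off search can be sketched in Python as follows: each candidate cut-off splits patients into high and low groups, and the cut-off with the largest two-group log-rank chi-square statistic is kept. This is a simplified illustration and not the X-tile implementation; the function names, the minimum group size, and the example data are assumptions, and no correction for testing many candidate cut-offs is applied here.

```python
def logrank_chi2(times, events, groups):
    """Two-group log-rank chi-square statistic (1 degree of freedom).

    groups: 1 = marker above the candidate cut-off, 0 = at or below it.
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for x in times if x >= t)  # at risk overall
        n1 = sum(1 for x, g in zip(times, groups) if x >= t and g == 1)
        d = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        d1 = sum(1 for x, e, g in zip(times, events, groups)
                 if x == t and e == 1 and g == 1)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of the event count in the high group.
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / variance if variance > 0 else 0.0


def best_cutoff(levels, times, events, min_group=2):
    """Scan observed marker levels and return (cut-off, chi-square)."""
    best = (None, 0.0)
    for c in sorted(set(levels)):
        groups = [int(x > c) for x in levels]
        n_high = sum(groups)
        if n_high < min_group or len(groups) - n_high < min_group:
            continue  # skip splits that leave a group too small
        chi2 = logrank_chi2(times, events, groups)
        if chi2 > best[1]:
            best = (c, chi2)
    return best


# Hypothetical marker levels and outcomes for eight patients (not study data).
levels = [0.5, 0.8, 1.0, 1.2, 2.5, 3.0, 3.5, 4.0]
times = [60, 55, 58, 50, 10, 12, 8, 15]
events = [0, 0, 0, 0, 1, 1, 1, 1]
print(best_cutoff(levels, times, events))
```

Because the cut-off is chosen to maximize separation in the same data, the resulting statistic is optimistic, which is one reason such cut-offs are usually checked in an independent cohort.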

Based on the newly determined cut-off values, we plotted the Kaplan-Meier survival curve for each tumor marker, as shown in Figure 3. Compared with lower tumor marker levels, higher CEA, CA125, and CA211 levels were clearly associated with poor DFS; the corresponding HRs (95% CIs) were 1.787 (1.056–3.226), 2.684 (1.200–3.931), and 2.513 (1.567–4.877), respectively (Figures 3(a), 3(c), and 3(e)). CA19-9 (HR = 1.743, 95% CI: 0.975–3.759), CA242 (HR = 1.558, 95% CI: 0.779–3.612), and CA15-3 (HR = 1.759, 95% CI: 0.939–4.143) did not reach statistical significance, but patients with high levels of these markers still tended to have a poorer prognosis (Figures 3(b), 3(d), and 3(f)). Thus, we aimed to evaluate patients’ prognosis according to the levels of all six tumor markers.

Figure 3

Kaplan-Meier survival curves of TNBC patients DFS based on the optimal cut-off value. (a) CEA. (b) CA19-9. (c) CA125. (d) CA242. (e) CA211. (f) CA15-3. The group differences in survival time were tested using the log-rank test; HRs with 95% CIs and (p) value were shown in the figure. HR and its 95% CI larger than 1 indicated a poorer prognosis of high-level tumor marker. reported the significant differences. TNBC: triple-negative breast cancer; DFS: disease-free survival; CEA : carcinoembryonic antigen; CA: cancer antigen; HR: hazard ratio; CI: confidence interval.

Construction of the Prognosis Prediction Model for TNBC Patients by Lasso Cox Model and Random Survival Forest Model

Each serum tumor marker was scored 1 if its level was higher than the optimal cut-off value and 0 otherwise. Based on these scores, the lasso Cox model identified a risk signature that was significantly associated with DFS at the optimal λ value of 0.0234 (Figure 4(a)). The lasso algorithm is a shrinkage estimator that adds a penalty function to the model and thereby yields a relatively refined model [34]. In our study, the regression coefficient of CA242 shrank to zero, while the remaining tumor markers were retained in the simplified lasso Cox model (Table 3). The TMRS of each patient was then calculated from these regression coefficients and the levels of the five retained tumor markers. Time-dependent ROC curve analysis showed that the AUC of TMRS was 0.678 at 36 months and 0.740 at 60 months for DFS, and 0.737 at 36 months and 0.702 at 60 months for OS (Figures 4(b) and 4(c)).
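The scoring step can be illustrated with a short Python sketch that dichotomizes each marker at the optimal cut-offs reported in Section 3.3 and sums the weighted scores. The coefficients below are hypothetical placeholders chosen for illustration (the fitted values are those in Table 3), with CA242 set to zero to mirror its removal by the lasso; the function names and the example patient are also hypothetical.

```python
# Optimal cut-offs from Section 3.3 (ng/mL for CEA and CA211, U/mL otherwise).
CUTOFFS = {
    "CEA": 2.15, "CA19-9": 17.30, "CA125": 9.05,
    "CA242": 8.85, "CA211": 1.15, "CA15-3": 16.00,
}

# HYPOTHETICAL coefficients, for illustration only (not the Table 3 values).
# CA242 is zero, mirroring a marker dropped by the lasso penalty.
EXAMPLE_COEFS = {
    "CEA": 0.30, "CA19-9": 0.20, "CA125": 0.50,
    "CA242": 0.0, "CA211": 0.60, "CA15-3": 0.10,
}


def dichotomize(levels):
    """Score each marker 1 if its level is above the cut-off, else 0."""
    return {m: int(levels[m] > CUTOFFS[m]) for m in CUTOFFS}


def tmrs(levels, coefs=EXAMPLE_COEFS):
    """Tumor marker risk score: weighted sum of the 0/1 marker scores."""
    scores = dichotomize(levels)
    return sum(coefs[m] * scores[m] for m in CUTOFFS)


# Hypothetical patient (not study data).
patient = {"CEA": 3.0, "CA19-9": 10.0, "CA125": 12.0,
           "CA242": 9.0, "CA211": 0.8, "CA15-3": 20.0}
print(dichotomize(patient))
print(round(tmrs(patient), 3))
```

In this example the patient's CA242 level is above its cut-off, yet it adds nothing to the score because its coefficient is zero.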

Figure 4

The tuning parameter plot and time-dependent ROC curves of the lasso Cox model. (a) The tuning parameter plot. The x-axis represents log-transformed lambda values, and the y-axis represents the partial likelihood deviance. The vertical dashed line indicates the minimal partial likelihood deviance. (b) tdROC curve for DFS at 36 months, with an AUC of 0.678 and at 60 months, with an AUC of 0.740. (c) tdROC curve for OS at 36 months, with an AUC of 0.737 and at 60 months, with an AUC of 0.702. tdROC: time-dependent receiver operating characteristic; AUC: area under the ROC curve; DFS: disease-free survival; OS: overall survival.

We then used a second machine learning method, the RSF model, to build a predictive model. As Figure 5(a) shows, the OOB error was lowest when mtry was 1 and node size was 65, which identified the best RSF model. The recurrence risk of each patient was computed with this model, and time-dependent ROC curves were then plotted. As shown in Figures 5(b) and 5(c), the AUC values were 0.637 and 0.663 at 36 and 60 months for DFS and 0.777 and 0.659 at 36 and 60 months for OS, respectively.

Figure 5

The tuning parameter plot and time-dependent ROC curves of the random survival forest model. (a) The tuning parameter plot. Node size is the number of samples in the terminal node and mtry is the number of randomly selected candidate variables in each parent node; OOB error is the out-of-bag error. A darker color indicates a larger OOB error, while a lighter color indicates smaller OOB error, suggesting a better RSF model. (b) tdROC curve for DFS at 36 months, with an AUC of 0.637 and at 60 months, with an AUC of 0.663. (c) tdROC curve for OS at 36 months, with an AUC of 0.777 and at 60 months, with an AUC of 0.659. RSF: random survival forest; tdROC: time-dependent receiver operating characteristic; AUC: area under the ROC curve; DFS: disease-free survival; OS: overall survival.

3.4. Prognostic Value of TMRS Groups in Two Survival Models and Subgroup Analysis

The median TMRS was used as the threshold to divide all TNBC patients into high-risk and low-risk groups in both models, and survival analyses of the total study population were performed according to TMRS group. According to the risk scores calculated by the lasso Cox model, higher TMRS was significantly associated with worse DFS (HR = 3.138, 95% CI: 1.711–5.033) and OS (HR = 3.983, 95% CI: 1.637–7.214). The 5-year DFS and OS of the low TMRS group vs. the high TMRS group were 88.5% vs. 64.4% and 93.3% vs. 79.9%, respectively (Figures 6(a) and 6(b)). The RSF model also showed predictive value for nonmetastatic TNBC patients. The survival analysis indicated that patients in the high-risk group had a significantly higher recurrence risk (HR = 2.454, 95% CI: 1.395–4.107) and mortality risk (HR = 2.857, 95% CI: 1.290–5.694) than those in the low-risk group (Figures 6(c) and 6(d)).

Figure 6

Kaplan-Meier survival curves of lasso Cox and random survival forest model in total TNBC patients. (a) DFS of lasso Cox model in all TNBC patients. (b) OS of lasso Cox model in all TNBC patients. (c) DFS of RSF model in all TNBC patients. (d) OS of RSF model in all TNBC patients. The group differences in survival time were tested using the log-rank test; HRs with 95% CIs and (p) value were shown in the figure. HR and its 95% CI larger than 1 indicated a poorer prognosis of high TMRS. reported the significant differences. TNBC: triple-negative breast cancer; TMRS: tumor marker risk score; HR: hazard ratio; CI: confidence interval; DFS: disease-free survival; OS: overall survival; RSF: random survival forest.

We chose the lasso Cox model for the subgroup analysis because it had a larger AUC value and better prognostic significance than the RSF model and is easier to interpret as a survival model. Univariable analysis showed that T-stage, N-stage, and TMRS group were potential prognostic factors for DFS (Table 4). Multivariable analysis including these factors demonstrated that, besides TMRS group, the traditional clinicopathological factor N-stage also had independent prognostic value for DFS in TNBC patients (Table 4). When stratified by lymph node status (N-stage), N₂-stage (HR = 2.767, 95% CI: 1.218–6.288) and N₃-stage (HR = 4.980, 95% CI: 2.081–11.917) patients showed a poorer prognosis than N₀-stage patients, while N₁-stage patients showed no significant difference (HR = 0.658, 95% CI: 0.263–1.650) (Table 4). Hence, we treated N₀-N₁ patients as the low recurrence risk subgroup and plotted Kaplan-Meier survival curves according to TMRS group. In this subgroup, TMRS group again showed prognostic value: N₀-N₁ patients with higher TMRS had significantly worse DFS (HR = 2.278, 95% CI: 1.189–4.346) and OS (HR = 2.982, 95% CI: 1.110–7.519) than those with lower TMRS (Figures 7(a) and 7(b)).

Figure 7

Kaplan-Meier survival curves of lasso Cox model in N₀-N₁ TNBC patients. (a) DFS in N₀-N₁ TNBC patients. (b) OS in N₀-N₁ TNBC patients. The group differences in survival time were tested using the log-rank test; HRs with 95% CIs and (p) value were shown in the figure. HR and its 95% CI larger than 1 indicated a poorer prognosis of high TMRS. reported the significant differences. TNBC: triple-negative breast cancer; TMRS: tumor marker risk score; HR: hazard ratio; CI: confidence interval; DFS: disease-free survival; OS: overall survival.

4. Discussion

The independent prognostic value of serum tumor markers, such as CEA and CA15-3, has been reported in several previous studies [17, 20, 22]. However, these studies seldom discussed the molecular subtypes of breast cancer, and few explored the prognostic value of multiple tumor markers. In the current study, we used X-tile to determine the best prognostic cut-off value of each tumor marker based on the idea of an “optimal cut-off value” [36] and confirmed the significant prognostic role of CEA, CA125, and CA211. In addition, we combined six tumor markers to construct a prognostic model for stage I-III TNBC patients, providing a method to assist in predicting prognosis.

Among the six tumor markers included in our study, CEA and CA15-3 were mostly demonstrated and their elevated levels were closely related to poor prognosis in breast cancer patients [17, 20, 22, 23, 38]. Wu SG et al. found that elevated levels of CEA and CA15-3 had no significant effect on local recurrence-free survival but were significantly associated with the decrease of distant metastasis-free survival, DFS, and OS in the Chinese breast cancer cohort [23]. The correlation analysis between molecular subtypes and tumor markers indicated that there was only 1 case (1.6%) in TNBC with elevated CEA, much less than other subtypes, while the proportion of CA15-3 (14.3%) was similar to others [23]. Although two additional studies confirmed the significant prognostic value of CEA and CA15-3 for DFS and OS in overall breast cancer patients, subgroup analysis of molecular subtype showed inconsistent results [20, 38]. The study of Nam SE et al. suggested no correlation between the levels of CEA, CA15-3, and OS of TNBC patients, while another research indicated that in basal-like subtype, which had an overlap of approximately 70–80% TNBC patients, elevated CEA conferred reduction for breast cancer-specific survival (BCSS), but without association observed for DFS [20, 38]. Different from our study, the studies mentioned above all were performed based on the clinical upper limit as the prognostic cut-off. The negative evidence in the TNBC subtype suggested that perhaps we should screen an optimal cut-off used for prognosis. Our results confirmed the prognostic value of CEA in early-stage TNBC patients when using the cut-off selected by X-tile. CA125, which is mostly used in ovarian cancer, was found to increase significantly in metastatic breast cancer patients [39, 40]. In Li JX’s study, there was no relevance found between CA125 and breast cancer outcomes, including BCSS and DFS [38]. 
However, another study of young breast cancer patients indicated that a high level of CA125 was associated with worse DFS and OS when 19.38 U/mL was used as the cut-off value [41], providing further evidence for selecting an optimal cut-off value. Although no study has explored the prognostic significance of CA125 in different molecular subtypes, CA125 levels were shown to be higher in TNBC patients than in non-TNBC patients, suggesting that elevated CA125 may predict a poor outcome in TNBC patients [42]. According to our results, CA125 showed significant prognostic value when 9.05 U/mL was used as the cut-off. As for CA19-9, CA242, and CA211, only a few studies have explored their relationship with breast cancer. Some researchers have investigated the diagnostic value of CA19-9 in breast cancer [43], but its role in predicting prognosis remains unknown. CA242 and CA211 were discovered later than the other tumor markers. Owing to their low specificity, most studies have explored their application in the diagnosis or prognosis of pancreatic or gastrointestinal cancer [44, 45]. This is the first report of a significant prognostic value of CA211 in breast cancer patients, suggesting that its role in breast cancer is worthy of further study.

In the present study, we found that both the lasso Cox model and the RSF model based on tumor markers could help stratify the recurrence and mortality risk of stage I-III TNBC patients. Numerous previous studies adopted the Cox proportional hazards model with lasso penalization for survival data [30, 46] because its ability to reduce the number of variables gives it wide applicability. Our results also suggested that the lasso Cox model had a larger AUC value and better prognostic significance than the RSF model. Therefore, we developed a prognostic model involving five tumor markers (all except CA242), which are easily measured in clinical practice, to calculate TMRS with machine learning algorithms for predicting the outcome of TNBC patients. The role of tumor markers was further validated in our study. When comparing clinicopathological characteristics, we found that the high TMRS group had more advanced disease, with more lymph node metastases (Table S1), which raises the possibility of estimating stage from tumor markers. In addition to being associated with tumor burden, TMRS group also showed excellent prognostic significance for TNBC patients. The multivariable analysis confirmed that TMRS was an independent prognostic factor. Thus, our study further strengthened the evidence for the prognostic value of combining multiple tumor marker levels. Although the ASCO has not recommended therapeutic decisions based on serum tumor marker status [26], we still think that elevated serum tumor markers could be useful in identifying high-risk groups, a hypothesis that should be verified.

This study has several limitations that should be considered. First, we did not verify the model on a validation set. Because public datasets that report patients’ serum tumor marker levels are not available, we could not perform further analysis on an external dataset. Second, this is a single-center study with a limited number of patients, all of whom were Chinese, so multicenter prospective studies are needed to confirm the validity of this prognostic model. In addition, due to the generally good prognosis of breast cancer, the number of cases with recurrence or death was small. Given this limitation, longer-term follow-up will be needed to update the results. Moreover, we did not evaluate prognosis by comparing changes in serum tumor marker concentrations before and after surgery, which is another strategy for using tumor markers. Finally, whether the prognostic model is suitable for metastatic patients and other molecular subtypes deserves further exploration.
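As a sketch of how the model could be checked once a validation cohort is available, discrimination can be summarized with Harrell's concordance index. Note that this is a different metric from the time-dependent AUC used in this study; it is shown here only because it can be written in a few lines without external libraries. The function name and argument layout are assumptions made for illustration.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[bool],
    scores: Sequence[float],
) -> float:
    """Harrell's C-index for right-censored survival data.

    A pair (i, j) is comparable when patient i had an observed event
    strictly before patient j's follow-up time. The pair is concordant
    when the patient with the earlier event has the higher risk score;
    tied scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored patients cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs; C-index is undefined")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk. With few events, as in the present cohort, the estimate rests on few comparable pairs and should be reported with a confidence interval.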

In conclusion, our study indicated that pretreatment levels of serum CEA, CA125, and CA211 had significant prognostic value for TNBC patients when the optimal cut-off values determined by X-tile were used. TMRS, which was calculated from tumor markers using the lasso Cox model, was also an independent prognostic factor. A higher TMRS was associated with worse DFS and OS in both stage I-III and N₀-N₁ TNBC patients. We hope that further studies will confirm the validity of these findings and clarify how tumor markers can inform therapeutic decision-making in clinical practice.

Abbreviations:

TNBC:	Triple-negative breast cancer
ER:	Estrogen receptor
PR:	Progesterone receptor
HER-2:	Human epidermal growth factor receptor-2
pCR:	Pathologic complete response
LRR:	Locoregional recurrence
DFS:	Disease-free survival
OS:	Overall survival
BCSS:	Breast cancer cause-specific survival
CEA:	Carcinoembryonic antigen
CA:	Cancer antigen
ASCO:	American Society of Clinical Oncology
NCCN:	National Comprehensive Cancer Network
HR:	Hazard ratio
CI:	Confidence interval
TMRS:	Tumor marker risk score
ROC:	Receiver operating characteristic
AUC:	Area under the ROC curve
RSF:	Random survival forest.

Data Availability

The data underlying the findings of this study are available from the corresponding author on request.

Ethical Approval

All procedures performed in this study were in accordance with GCP, the ethical standards of the national research committee, and the 1964 Helsinki Declaration and its later amendments or comparable ethical standards. The study was approved by the Ethics Committee of The Second Affiliated Hospital of Zhejiang University, School of Medicine. Written informed consent was acquired from each breast cancer patient or patient’s guardian.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

Chen HH, Wu SJ, Hu J, Lu YX, and He JP collected the patients’ clinicopathological characteristics and follow-up information. Chen HH and Wu SJ performed the statistical analysis and Chen HH wrote the manuscript. Hu KM and Zhang K provided advice on data analysis and paper writing. Chen YD and Pan T supervised the study and Chen YD led a scientific discussion. All authors read and approved the final manuscript for publication.

Acknowledgments

This study was funded by the National Natural Science Foundation of China (Grant no. 82072900), the Key Program of the Natural Science Foundation of Zhejiang Province (Grant no. LZ16H160002), the Zhejiang Provincial Program for the Cultivation of High-Level Innovative Health Talents, and the Preclinical and Multi-center Basket Clinical Trial of the Multi-kinase Inhibitor TT-00420 (2019ZX09301158). The authors thank Dr. Jiaojiao Zhou of The Second Affiliated Hospital, Zhejiang University, School of Medicine, for polishing the English of the whole article.

Supplementary Materials

Figure S1: Levels of each serum tumor marker in patients without an event, with local recurrence, and with distant metastasis. (A) CEA. (B) CA19-9. (C) CA125. (D) CA242. (E) CA211. (F) CA15-3. Each point represents a patient, and the cut-off value shown in each scatter plot is the clinical upper limit. Tumor marker levels were compared between groups using one-way ANOVA with Tukey’s post hoc test or the nonparametric Kruskal-Wallis test, as appropriate (CEA: carcinoembryonic antigen; CA: cancer antigen; TNBC: triple-negative breast cancer; NS: not significant). Table S1: Clinicopathological characteristics of the patients according to TMRS.

References

F. Bray, J. Ferlay, I. Soerjomataram, R. L. Siegel, L. A. Torre, and A. Jemal, “Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries,” CA: A Cancer Journal for Clinicians, vol. 68, no. 6, pp. 394–424, 2018.
P. Boyle, “Triple-negative breast cancer: epidemiological considerations and recommendations,” Annals of Oncology, vol. 23, no. 6, pp. vi7–vi12, 2012.
W. D. Foulkes, I. E. Smith, and J. S. Reis-Filho, “Triple-negative breast cancer,” New England Journal of Medicine, vol. 363, no. 20, pp. 1938–1948, 2010.
P. Kumar and R. Aggarwal, “An overview of triple-negative breast cancer,” Archives of Gynecology and Obstetrics, vol. 293, no. 2, pp. 247–269, 2016.
R. Dent, M. Trudeau, K. I. Pritchard et al., “Triple-negative breast cancer: clinical features and patterns of recurrence,” Clinical Cancer Research, vol. 13, no. 15, pp. 4429–4434, 2007.
F. Z. Mouh, M. El Mzibri, M. Slaoui, and M. Amrani, “Recent progress in triple negative breast cancer research,” Asian Pacific Journal of Cancer Prevention, vol. 17, no. 4, pp. 1595–1608, 2016.
C. Liedtke, C. Mazouni, K. R. Hess et al., “Response to neoadjuvant therapy and long-term survival in patients with triple-negative breast cancer,” Journal of Clinical Oncology, vol. 26, no. 8, pp. 1275–1281, 2008.
G. Von Minckwitz, M. Untch, J.-U. Blohmer et al., “Definition and impact of pathologic complete response on prognosis after neoadjuvant chemotherapy in various intrinsic breast cancer subtypes,” Journal of Clinical Oncology, vol. 30, no. 15, pp. 1796–1804, 2012.
K. D. Voduc, M. C. U. Cheang, S. Tyldesley, K. Gelmon, T. O. Nielsen, and H. Kennecke, “Breast cancer subtypes and the risk of local and regional relapse,” Journal of Clinical Oncology, vol. 28, no. 10, pp. 1684–1691, 2010.
D. S. P. Tan, C. Marchió, R. L. Jones et al., “Triple negative breast cancer: molecular profiling and prognostic impact in adjuvant anthracycline-treated patients,” Breast Cancer Research and Treatment, vol. 111, no. 1, pp. 27–44, 2008.
E. Montagna, V. Bagnardi, N. Rotmensz et al., “Breast cancer subtypes and outcome after local and regional relapse,” Annals of Oncology, vol. 23, no. 2, pp. 324–331, 2012.
X. Li, J. Yang, L. Peng et al., “Triple-negative breast cancer has worse overall survival and cause-specific survival than non-triple-negative breast cancer,” Breast Cancer Research and Treatment, vol. 161, no. 2, pp. 279–287, 2017.
H. Kennecke, R. Yerushalmi, R. Woods et al., “Metastatic behavior of breast cancer subtypes,” Journal of Clinical Oncology, vol. 28, no. 20, pp. 3271–3277, 2010.
A. Soni, Z. Ren, O. Hameed et al., “Breast cancer subtypes predispose the site of distant metastases,” American Journal of Clinical Pathology, vol. 143, no. 4, pp. 471–478, 2015.
E. S. Stovgaard, D. Nielsen, E. Hogdall, and E. Balslev, “Triple negative breast cancer - prognostic role of immune-related factors: a systematic review,” Acta Oncologica, vol. 57, no. 1, pp. 74–82, 2018.
A. S. Coates, E. P. Winer, A. Goldhirsch et al., “Tailoring therapies--improving the management of early breast cancer: St Gallen International Expert Consensus on the Primary Therapy of Early Breast Cancer 2015,” Annals of Oncology, vol. 26, no. 8, pp. 1533–1546, 2015.
J. S. Lee, S. Park, J. M. Park, J. H. Cho, S. I. Kim, and B.-W. Park, “Elevated levels of preoperative CA 15-3 and CEA serum levels have independently poor prognostic significance in breast cancer,” Annals of Oncology, vol. 24, no. 5, pp. 1225–1231, 2013.
R. Molina, J. M. Auge, B. Farrus et al., “Prospective evaluation of carcinoembryonic antigen (CEA) and carbohydrate antigen 15.3 (CA 15.3) in patients with primary locoregional breast cancer,” Clinical Chemistry, vol. 56, no. 7, pp. 1148–1157, 2010.
M. T. Sandri, M. Salvatici, E. Botteri et al., “Prognostic role of CA15.3 in 7942 patients with operable breast cancer,” Breast Cancer Research and Treatment, vol. 132, no. 1, pp. 317–326, 2012.
S. e. Nam, W. Lim, J. Jeong et al., “The prognostic significance of preoperative tumor marker (CEA, CA15-3) elevation in breast cancer patients: data from the Korean Breast Cancer Society Registry,” Breast Cancer Research and Treatment, vol. 177, no. 3, pp. 669–678, 2019.
A. Rasmy, W. Abozeed, S. Elsamany et al., “Correlation of preoperative Ki67 and serum CA15.3 levels with outcome in early breast cancers a multi institutional study,” Asian Pacific Journal of Cancer Prevention: APJCP, vol. 17, no. 7, pp. 3595–3600, 2016.
B.-W. Park, J.-W. Oh, J.-H. Kim et al., “Preoperative CA 15-3 and CEA serum levels as predictor for breast cancer outcomes,” Annals of Oncology, vol. 19, no. 4, pp. 675–681, 2008.
S.-g. Wu, Z.-y. He, J. Zhou et al., “Serum levels of CEA and CA15-3 in different molecular subtypes and prognostic value in Chinese breast cancer,” The Breast, vol. 23, no. 1, pp. 88–93, 2014.
K. V. Albuquerque, M. R. Price, R. A. Badley et al., “Pre-treatment serum levels of tumour markers in metastatic breast cancer: a prospective assessment of their role in predicting response to therapy and survival,” European Journal of Surgical Oncology (EJSO), vol. 21, no. 5, pp. 504–509, 1995.
R. Molina, V. Barak, A. van Dalen et al., “Tumor markers in breast cancer – European group on tumor markers recommendations,” Tumor Biology, vol. 26, no. 6, pp. 281–293, 2005.
L. Harris, H. Fritsche, R. Mennel et al., “American society of clinical oncology 2007 update of recommendations for the use of tumor markers in breast cancer,” Journal of Clinical Oncology, vol. 25, no. 33, pp. 5287–5312, 2007.
2020, NCCN Clinical Practice Guidelines in Oncology: Breast Cancer https://www.nccn.org/professionals/physician_gls/pdf/breast_blocks.pdf.
K. Park, A. Ali, D. Kim, Y. An, M. Kim, and H. Shin, “Robust predictive model for evaluating breast cancer survivability,” Engineering Applications of Artificial Intelligence, vol. 26, no. 9, pp. 2194–2205, 2013.
K. Kourou, T. P. Exarchos, K. P. Exarchos, M. V. Karamouzis, and D. I. Fotiadis, “Machine learning applications in cancer prognosis and prediction,” Computational and Structural Biotechnology Journal, vol. 13, pp. 8–17, 2015.
S. Zheng, Y. Zou, J. y. Liang et al., “Identification and validation of a combined hypoxia and immune index for triple‐negative breast cancer,” Molecular Oncology, vol. 14, no. 11, pp. 2814–2833, 2020.
M. E. H. Hammond, D. F. Hayes, M. Dowsett et al., “American Society of Clinical Oncology/College of American Pathologists guideline recommendations for immunohistochemical testing of estrogen and progesterone receptors in breast cancer,” Journal of Clinical Oncology, vol. 28, no. 16, pp. 2784–2795, 2010.
A. C. Wolff, M. E. H. Hammond, D. G. Hicks et al., “Recommendations for human epidermal growth factor receptor 2 testing in breast cancer: American Society of Clinical Oncology/College of American Pathologists clinical practice guideline update,” Journal of Clinical Oncology, vol. 31, no. 31, pp. 3997–4013, 2013.
J. Friedman, T. Hastie, and R. Tibshirani, “Regularization paths for generalized linear models via coordinate descent,” Journal of Statistical Software, vol. 33, no. 1, pp. 1–22, 2010.
R. Tibshirani, “The lasso method for variable selection in the Cox model,” Statistics in Medicine, vol. 16, no. 4, pp. 385–395, 1997.
H. Ishwaran, U. B. Kogalur, E. H. Blackstone, and M. S. Lauer, “Random survival forests,” The Annals of Applied Statistics, vol. 2, no. 3, pp. 841–860, 2008.
R. L. Camp, M. Dolled-Filhart, and D. L. Rimm, “X-Tile,” Clinical Cancer Research, vol. 10, no. 21, pp. 7252–7259, 2004.
A. N. Kamarudin, T. Cox, and R. Kolamunnage-Dona, “Time-dependent ROC curve analysis in medical research: current methods and applications,” BMC Medical Research Methodology, vol. 17, no. 1, p. 53, 2017.
J. Li, L. Liu, Z. Feng et al., “Tumor markers CA15-3, CA125, CEA and breast cancer survival by molecular subtype: a cohort study,” Breast Cancer, 2020.
T. Meyer and G. J. Rustin, “Role of tumour markers in monitoring epithelial ovarian cancer,” British Journal of Cancer, vol. 82, no. 9, pp. 1535–1538, 2000.
D. Baskic, P. Ristic, S. Matic, D. Bankovic, S. Popovic, and N. Arsenijevic, “Clinical evaluation of the simultaneous determination of CA 15-3, CA 125 and sHER2 in breast cancer,” Biomarkers, vol. 12, no. 6, pp. 657–667, 2007.
X. Li, D. Dai, B. Chen et al., “Prognostic values of preoperative serum CEA and CA125 levels and nomograms for young breast cancer patients,” OncoTargets and Therapy, vol. 12, pp. 8789–8800, 2019.
C. Fang, Y. Cao, X. Liu, X.-T. Zeng, and Y. Li, “Serum CA125 is a predictive marker for breast cancer outcomes and correlates with molecular subtypes,” Oncotarget, vol. 8, no. 38, pp. 63963–63970, 2017.
W. Wang, X. Xu, B. Tian et al., “The diagnostic value of serum tumor markers CEA, CA19-9, CA125, CA15-3, and TPS in metastatic breast cancer,” Clinica Chimica Acta, vol. 470, pp. 51–55, 2017.
O. Nilsson, C. Johansson, B. Glimelius et al., “Sensitivity and specificity of CA242 in gastro-intestinal cancer. A comparison with CEA, CA50 and CA 19-9,” British Journal of Cancer, vol. 65, no. 2, pp. 215–221, 1992.
H. Dou, G. Sun, and L. Zhang, “CA242 as a biomarker for pancreatic cancer and other diseases,” Progress in Molecular Biology and Translational Science, vol. 162, pp. 229–239, 2019.
Z. Liu, M. Li, Q. Hua, Y. Li, and G. Wang, “Identification of an eight-lncRNA prognostic model for breast cancer using WGCNA network analysis and a Cox proportional hazards model based on L1-penalized estimation,” International Journal of Molecular Medicine, vol. 44, no. 4, pp. 1333–1343, 2019.

Copyright

Copyright © 2021 Huihui Chen et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
