Abstract

Background. Inflammatory and nutritional variables have been shown to be associated with poor breast cancer survival. However, some studies do not include these variables because of missing data. We constructed a novel inflammatory-nutritional prognostic scoring (INPS) system with machine learning and investigated its predictive potential. Methods. This retrospective analysis included 249 patients with malignant breast tumors undergoing neoadjuvant chemotherapy (NAC). After comparing seven machine learning models, the best model, XGBoost, was applied to construct the INPS system. Kaplan–Meier (K-M) survival curves and the log-rank test were employed to compare overall survival (OS) and disease-free survival (DFS). Univariate and multivariate analyses were carried out with the Cox regression model. Additionally, we compared the predictive power of INPS, inflammatory, and standard nutritional variables using the test. Results. After comparing seven machine learning models, it was determined that the XGBoost model had the best OS and DFS performance (, respectively). For overall survival (OS, ) and disease-free survival (), all patients were divided into two groups by the INPS. Those with low INPS had higher 5-year OS and DFS rates (77.2% vs. 50.0%, ; and 59.6% vs. 32.1%, , respectively) than patients with high INPS. For OS and DFS, the INPS exhibited the highest AUC compared to the other inflammatory and nutritional variables (, ; , , respectively). Conclusion. The INPS was an independent predictor of OS and DFS and exhibited better predictive ability than BMI, PNI, and MLR. For breast cancer patients who did not achieve a pathological complete response (nonpCR) after NAC, INPS was an important and comprehensive biomarker. It may also help predict individual survival in breast cancer patients with low HER-2 expression.

1. Introduction

Breast cancer, which has overtaken lung cancer as the most commonly diagnosed malignancy worldwide, is the leading cause of cancer-related death in women globally [1]. As breast cancer treatment continues to evolve, neoadjuvant chemotherapy (NAC) plays an increasingly important role in determining patient prognosis [2]. For patients with locally advanced breast cancer, NAC is an excellent option for lowering the clinical stage and increasing the likelihood of breast-conserving surgery. Additionally, the response to NAC may allow physicians to adjust treatment based on drug-sensitivity information [3].

A pathological complete response (pCR) is defined as the absence of invasive cancer in the breast and lymph nodes on postoperative pathology, although carcinoma in situ of the breast is allowed [4]. Specifically, for the TNBC and HER-2 positive subtypes, achieving a pCR with neoadjuvant therapy predicts an excellent outcome and long-term survival [4, 5]. In contrast, patients with nonpCR breast malignant tumors have a poor prognosis [6]. The CREATE-X trial’s findings demonstrated that adjuvant capecitabine therapy could considerably increase OS and DFS in HER-2-negative breast cancer patients who did not achieve a pCR following NAC, with the TNBC group benefiting the most [7]. The KATHERINE study revealed that the 3-year invasive disease-free survival (iDFS) rate of the T-DM1 group was considerably higher than that of the trastuzumab group among patients who did not achieve a pCR following 6-8 cycles of neoadjuvant therapy [8]. Although these drugs have improved the prognosis of breast cancer patients with a nonpCR, it is worth identifying patients who respond poorly to both NAC and the standard adjuvant agents so that different treatment strategies can be implemented.

Traditional biomarkers found to be closely connected to a pCR include tumor-infiltrating lymphocytes (TILs), p53, human epidermal growth factor receptor-2 (HER-2), Ki-67 index, estrogen receptor (ER), and progesterone receptor (PR) [9, 10]. However, few biomarkers are specifically designed to predict the outcome of breast cancer patients with a nonpCR. Thus, it is essential and meaningful to construct a novel and convenient biomarker for patients with a nonpCR.

Several studies have recently examined the relationships among inflammation, nutrition, and malignant tumors [11]. Different inflammatory and nutritional parameters, such as body mass index (BMI), prognostic nutrition index (PNI), albumin to globulin ratio (AGR), neutrophil to lymphocyte ratio (NLR), platelet to lymphocyte ratio (PLR), monocyte to lymphocyte ratio (MLR), systemic immune-inflammation index (SII), and systemic inflammation response index (SIRI), as well as their combinations, have all been shown to be vital predictors for breast cancer patients [7–10, 12–15]. However, a single variable can only provide limited information. Compared to models based on one or a few inflammatory indices, prognostic models combining multiple indicators can offer improved prediction accuracy [16, 17].

Biomedicine has embraced machine learning techniques for predictive modeling and decision-making as an alternative to conventional statistical methods, since they can produce prediction models by conducting extensive searches across the parameter space [18]. Machine learning methods have been reported to be more accurate than traditional logistic regression across various subject areas [19].

The above inflammatory and nutritional parameters have attracted extensive attention. However, few studies have comprehensively explored the relationship between these variables and prognosis, particularly for breast cancer patients receiving neoadjuvant chemotherapy. Therefore, our study aimed to construct a novel inflammatory-nutritional prognostic scoring (INPS) system based on machine learning models and to investigate its relationship with the outcomes of breast cancer patients with a nonpCR. We then compared its predictive ability with that of commonly used inflammatory and nutritional variables. Additionally, we conducted an exploratory analysis of the relationship between INPS and the HER-2 low expression subtype, as an increasing number of studies suggest that patients with HER-2 low expression breast cancer may have a different prognosis than those with HER-2 negative and positive breast cancer.

2. Materials and Methods

2.1. Patients

All 249 patients with invasive, malignant breast tumors who underwent NAC and surgery at Harbin Medical University Cancer Hospital between January 2012 and March 2016 were included in the final retrospective analysis. This study was approved by the hospital’s ethics committee and complied with the original 1964 Declaration of Helsinki by the World Medical Association and any updated versions. Prior to receiving treatment, each patient signed an informed consent form.

The inclusion criteria were as follows: (1) a pathological diagnosis of an invasive malignant breast tumor by core needle biopsy before NAC; (2) neoadjuvant chemotherapy and surgery at our hospital; (3) available clinical, pathological, and follow-up data; and (4) a postoperative pathology report indicating that the patient did not achieve a pCR.

The exclusion criteria included (1) achieving a pCR according to the postoperative pathology report; (2) being diagnosed with bilateral breast cancer or other particular types of breast cancer; (3) having distant metastasis; and (4) having an acute or chronic inflammatory disease, such as dermatomyositis.

2.2. Classification of Variables

Peripheral venous blood samples were collected seven days before the first cycle of NAC, and the electronic medical records provided all of the patients’ clinical and pathological data. The status of nonpCR was evaluated based on the postoperative pathological report.

Patients were divided into groups based on their median age and BMI (according to Chinese standards) [20]. This study used the eighth edition of the TNM staging system from the American Joint Committee on Cancer [21]. Breast cancer is classified into four main subtypes: luminal A, luminal B, HER-2 overexpression (HER2-OE), and triple-negative breast cancer (TNBC) [22]. A HER-2 IHC score of 1+ or 2+ with negative in situ hybridization (ISH) is considered low expression, a HER-2 IHC score of 0 is considered negative, and 3+ or 2+ with positive ISH is considered HER-2 positive [16].
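The HER-2 grouping rule described above can be written as a small lookup. The following is a minimal sketch, not the authors' code: the function name `classify_her2` and its arguments are hypothetical, and it assumes that an IHC score of 1+ is classed as low expression without requiring an ISH result.

```python
from typing import Optional


def classify_her2(ihc_score: int, ish_positive: Optional[bool] = None) -> str:
    """Map an IHC score (0, 1, 2, 3) and an optional ISH result to a HER-2 group.

    Assumption: IHC 1+ is treated as low expression regardless of ISH.
    """
    if ihc_score == 0:
        return "negative"
    if ihc_score == 1:
        return "low"
    if ihc_score == 2:
        # IHC 2+ is ambiguous on its own and is resolved by ISH.
        if ish_positive is None:
            raise ValueError("IHC 2+ requires an ISH result")
        return "positive" if ish_positive else "low"
    if ihc_score == 3:
        return "positive"
    raise ValueError(f"unexpected IHC score: {ihc_score}")
```

For example, `classify_her2(2, ish_positive=False)` falls in the low expression group, while `classify_her2(2, ish_positive=True)` is HER-2 positive.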

The following parameters were calculated: PNI = serum ALB (g/L) + 5 × total lymphocyte count (10^9/L); AGR = albumin/globulin; NLR = neutrophil count (10^9/L)/lymphocyte count (10^9/L); PLR = platelet count (10^9/L)/lymphocyte count (10^9/L); MLR = monocyte count (10^9/L)/lymphocyte count (10^9/L); SII = (neutrophil count [10^9/L] × platelet count [10^9/L])/total lymphocyte count (10^9/L); and SIRI = (neutrophil count [10^9/L] × monocyte count [10^9/L])/total lymphocyte count (10^9/L).
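As an illustration of these definitions, the sketch below computes the seven derived indices from routine blood values. The function name, argument names, and the sample values in the usage note are hypothetical; albumin and globulin are assumed to be in g/L and all cell counts in 10^9/L.

```python
def inflammatory_nutritional_indices(alb, glob, neut, lymph, mono, plt):
    """Return PNI, AGR, NLR, PLR, MLR, SII, and SIRI for one patient.

    alb, glob: serum albumin and globulin (g/L)
    neut, lymph, mono, plt: cell counts (10^9/L)
    """
    if lymph <= 0 or glob <= 0:
        raise ValueError("lymphocyte count and globulin must be positive")
    return {
        "PNI": alb + 5 * lymph,
        "AGR": alb / glob,
        "NLR": neut / lymph,
        "PLR": plt / lymph,
        "MLR": mono / lymph,
        "SII": neut * plt / lymph,
        "SIRI": neut * mono / lymph,
    }
```

With illustrative inputs of albumin 45 g/L, globulin 30 g/L, neutrophils 4.0, lymphocytes 2.0, monocytes 0.5, and platelets 250 (all 10^9/L), this gives PNI 55, AGR 1.5, NLR 2.0, PLR 125, MLR 0.25, SII 500, and SIRI 1.0.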

With OS and DFS as the state variables, the maximally selected rank statistics were used to determine the best cutoff values for PNI, AGR, NLR, PLR, MLR, SII, SIRI, lymphocytes (L), neutrophils (N), monocytes (M), hemoglobin (Hb), platelets (P), albumin (ALB), and globulin (GLOB). Then, they were divided into low and high groups according to the following cutoff values: OS.PNI (60.4), OS.AGR (1.24), OS.NLR (2.72), OS.PLR (104), OS.MLR (0.33), OS.SII (672), OS.SIRI (1.4), OS.L (1.48), OS.N (5.9), OS.M (0.36), OS.Hb (132), OS.P (313), OS.ALB (46.3), OS.GLOB (34.8), DFS.PNI (60.4), DFS.AGR (1.24), DFS.NLR (2.47), DFS.PLR (122), DFS.MLR (0.33), DFS.SII (672), DFS.SIRI (1.19), DFS.L (1.39), DFS.N (5.47), DFS.M (0.36), DFS.Hb (132), DFS.P (304), DFS.ALB (43), and DFS.GLOB (28.5).
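The grouping step can be sketched as follows, using the OS cutoff values listed above for the seven derived indices. This is only an illustration: the names `OS_CUTOFFS` and `dichotomize` are hypothetical, and the treatment of a value exactly equal to its cutoff (scored 0 here) is an assumption, since the text does not specify it.

```python
# OS cutoffs for the derived indices, as reported in the text.
OS_CUTOFFS = {
    "PNI": 60.4, "AGR": 1.24, "NLR": 2.72, "PLR": 104,
    "MLR": 0.33, "SII": 672, "SIRI": 1.4,
}


def dichotomize(values, cutoffs):
    """Score each variable 1 if above its cutoff, otherwise 0.

    Assumption: a value equal to the cutoff is scored 0.
    """
    return {name: int(values[name] > cutoffs[name]) for name in cutoffs}
```

A patient with PLR 125 would fall in the high OS.PLR group (cutoff 104), whereas the same patient with NLR 2.0 would fall in the low OS.NLR group (cutoff 2.72).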

2.3. Follow-Up

Patients were followed up every three months after surgery for the first two years and then every six months for the following three years. Follow-up was up to five years after surgery or the date of death from any cause. OS was defined as the time between the date of operation and the date of death from any cause or last follow-up, and DFS was defined as the time from the date of surgery to the date of metastasis to distant organs, local recurrence, or death from any cause.
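These endpoint definitions translate into a time and an event indicator per patient. The sketch below is a hedged illustration: the helper names are hypothetical, and converting days to months with an average month length of 30.44 days is an assumption not stated in the text.

```python
from datetime import date

DAYS_PER_MONTH = 30.44  # assumption: average month length


def first_event(*dates):
    """Earliest non-missing date among candidate events, or None if none occurred."""
    observed = [d for d in dates if d is not None]
    return min(observed) if observed else None


def survival_time(surgery, event_date, last_follow_up):
    """Return (time in months, event flag); censored at last follow-up if no event."""
    end = event_date if event_date is not None else last_follow_up
    return (end - surgery).days / DAYS_PER_MONTH, int(event_date is not None)
```

For OS, `event_date` would be the date of death from any cause. For DFS, it would be `first_event(recurrence_date, metastasis_date, death_date)`, so whichever occurs first defines the event.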

2.4. Machine Learning, Inflammatory and Nutritional Variables

Seven machine learning models were used to predict OS and DFS: logistic regression (LR), support vector classification (SVC), k-nearest neighbor classification (KNN), extreme gradient boosting (XGBoost), random forests (RF), light gradient boosting machine (LightGBM), and adaptive boosting (AdaBoost). This study adopted the hold-out method (simple cross-validation) to address the overfitting issue brought on by the small sample size. The performance of each model was compared through the area under the curve (AUC) of the receiver operating characteristic (ROC). The most effective machine learning model was used to determine the importance of the inflammatory and nutritional variables as features.
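The evaluation scheme can be illustrated without any particular model library. The sketch below shows a single hold-out split and a rank-based AUC calculation in plain Python. The 30% test fraction and the random seed are assumptions, as the split ratio is not reported, and the function names are hypothetical.

```python
import random


def holdout_split(n_samples, test_fraction=0.3, seed=0):
    """Shuffle sample indices once and return (train_indices, test_indices)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def roc_auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

Each candidate model would be fitted on the training indices, and its predicted probabilities on the held-out indices would be passed to `roc_auc`; the model with the highest held-out AUC is then retained. In practice, library routines such as scikit-learn's `train_test_split` and `roc_auc_score` serve the same purpose.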

2.5. Statistical Analysis

Statistical analyses were conducted with Python (version 3.9), R software (version 3.6.1), and MedCalc software (version 19.0.7). The cutoff values of the INPS and hematological variables were determined by the maximally selected rank statistics through the maxstat.test function of the “maxstat” package in R software [17], with an initial score of 1 assigned to variables above the cutoff value and an initial score of 0 to variables below it. Frequencies and percentages (%) were used to describe the categorical variables, and the chi-squared test or Fisher’s exact test was used to assess differences. Continuous variables are presented as the median with the interquartile range (IQR). Multicollinearity among INPS and the inflammatory and nutritional variables was tested by multiple linear regression analysis via the variance inflation factor (VIF), with a considered noncollinear [23]. The Kaplan–Meier method was employed to estimate the survival curves, which were then compared by the log-rank test. The independent prognostic factors were determined with the Cox proportional hazards model, and the proportional hazards (PH) assumption was checked by the log minus log (LML) survival function. The test was used to compare different groups’ predictive functions, with a P value <0.05 indicating statistical significance.
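To make the survival estimation step concrete, the following is a minimal sketch of the Kaplan–Meier product-limit estimator in plain Python. It is illustrative only and is not the software routine used in the study. It assumes the usual convention that a patient censored at a given time is still counted as at risk for events at that same time.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times: follow-up times; events: 1 if the event occurred, 0 if censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events > 0:
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_removed
    return curve
```

For five hypothetical patients with follow-up times of 6, 12, 12, 20, and 30 months and event indicators 1, 0, 1, 1, and 0, the estimated survival drops to 0.8 at 6 months, 0.6 at 12 months, and 0.3 at 20 months.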

3. Results

3.1. Construction of INPS

Multiple linear regression analysis was conducted to test the possibility of multicollinearity between the inflammatory and nutritional variables, which showed that all of the variables had a . Eight inflammatory and nutritional variables were included in the seven machine learning models to predict OS and DFS. The XGBoost model exhibited the highest AUC compared to the other models for predicting OS or DFS (, respectively, Figures 1(a) and 1(b)). Then, the relative importance of the inflammatory and nutritional variables for predicting OS and DFS was calculated using the XGBoost model (Figures 1(c) and 1(d)). Variables below the respective cutoff value were scored 0, and those above the cutoff value were scored 1. The INPS was calculated as follows: (), DFS. () (Figure 2). According to the maxstat.test function, all of the patients were divided into low and high groups with a cutoff value for OS.INPS (0.3917) and DFS.INPS (0.4896).
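One plausible reading of this construction is that each binary score is weighted by the variable's relative importance and the products are summed. The sketch below illustrates that reading only. The weighted-sum form is an assumption, and the weights in `EXAMPLE_WEIGHTS` are hypothetical placeholders, not the coefficients used in the study. Only the OS.INPS cutoff of 0.3917 is taken from the text.

```python
# HYPOTHETICAL weights for illustration only; they are not the study's coefficients.
EXAMPLE_WEIGHTS = {
    "BMI": 0.10, "PNI": 0.15, "AGR": 0.10, "NLR": 0.15,
    "PLR": 0.10, "MLR": 0.10, "SII": 0.15, "SIRI": 0.15,
}

OS_INPS_CUTOFF = 0.3917  # reported cutoff for OS.INPS


def inps(binary_scores, weights):
    """Weighted sum of 0/1 variable scores (assumed form of the score)."""
    return sum(weights[name] * binary_scores[name] for name in weights)


def inps_group(score, cutoff):
    """Assign 'high' if the score exceeds the cutoff, otherwise 'low'."""
    return "high" if score > cutoff else "low"
```

Under these placeholder weights, a patient scored 1 for NLR, SII, and SIRI and 0 for the other five variables would have a score of about 0.45 and would be assigned to the high OS.INPS group.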

3.2. Differences in Clinical and Pathologic Variables for Different INPS Groups

All 249 nonpCR breast malignant tumor patients were divided into two groups by the cutoff values of OS.INPS (0.3917) and DFS.INPS (0.4896). Patient ages ranged from 22 to 72 years (median: 49 years). Regardless of whether the state variable was OS or DFS, there were 193 (77.5%) cases in the low INPS group and 56 (22.5%) cases in the high INPS group. A total of 217 (87.1%) patients had clinical TNM stage III disease, and 117 (47.0%) patients had the luminal B subtype. Clinical T stage, Ki-67 index, L, N, M, PNI, NLR, PLR, MLR, SII, and SIRI were correlated with OS.INPS status (), while clinical T stage, HER-2 status, Ki-67 index, P53, L, N, M, PNI, NLR, PLR, MLR, SII, and SIRI were correlated with DFS.INPS status () (Tables 1 and 2).

3.3. Univariable and Multivariable Cox Regression Analysis for OS and DFS

The multicollinearity between INPS and the inflammatory and nutritional variables was tested prior to the Cox analysis. OS.INPS, OS.NLR, OS.SII, and OS.SIRI had a VIF value of >2 for the state variable of OS. DFS.INPS, DFS.NLR, DFS.SII, and DFS.SIRI had a VIF of >2 for the state variable of DFS. Additionally, the INPS was constructed based on these inflammatory and nutritional variables. Therefore, the Cox regression analysis excluded BMI, PNI, AGR, NLR, PLR, MLR, SII, and SIRI. The relationship between the inflammatory and nutritional variables and OS and DFS is illustrated in Table S1. Meanwhile, the PH assumption was checked using the log minus log (LML) survival function, and the Cox regression model was appropriate for the study data. In univariable Cox analysis, parturition, OS.INPS, OS.N, and clinical T stage were predictors of OS, while parturition, DFS.INPS, DFS.N, and DFS.M were predictors of DFS. Variables with were included in the multivariate analysis, demonstrating that parturition, OS.INPS and clinical T stage were independently associated with OS (, HR: 0.41, 95% CI: 0.22-0.77; , HR: 2.41, 95% CI: 1.45-4.01; , HR: 5.70, 95% CI: 1.43-22.8, respectively, Table 3). Only parturition and DFS.INPS were independently associated with DFS (, HR: 0.45, 95% CI: 0.27-0.76; , HR: 1.84, 95% CI: 1.20-2.83, respectively, Table 4). Compared to the high OS.INPS and high DFS.INPS groups, the low OS.INPS and low DFS.INPS groups exhibited higher 5-year OS and DFS rates (77.2% vs. 50.0%, ; 59.6% vs. 32.1%, , respectively, Figures 3(a) and 3(b)). In addition, the mean OS and DFS in the low INPS groups were significantly longer than those in the high INPS groups (54 vs. 43 months, ; 46 months vs. 35 months, , respectively, Figures 3(a) and 3(b)).

3.4. Relationships among OS, DFS, and INPS in Breast Cancer Patients with Different Clinical T Stages

Tables 1 and 2 reveal that clinical T stage, HER-2 status, Ki-67, and P53 were significantly related to INPS. Therefore, we conducted an exploratory analysis in these subgroups to identify the predictive ability of INPS for OS and DFS.

In all of the nonpCR breast cancer patients, compared to the clinical T3 + T4 group, patients with clinical T1 + T2 stage disease showed higher 5-year OS and DFS rates (73.6% vs. 61.5%, , ; 55.3% vs. 46.2%, , , respectively, Figures 4(a) and 4(b)). In the clinical T1 + T2 subgroup, patients with low INPS had significantly higher 5-year OS and DFS rates than those with high INPS (77.4% vs. 57.9%, , ; 59.1% vs. 39.5%, , , respectively, Figures 4(c) and 4(d)). In the clinical T3 + T4 subgroup, patients with low INPS also had significantly improved 5-year OS and DFS rates (76.5% vs. 33.3%, , ; 61.8% vs. 16.7%, , , respectively, Figures 4(e) and 4(f)).

3.5. Relationships among OS, DFS, and INPS in Breast Cancer Patients with Different HER-2 Statuses

In all breast cancer patients, there was no distinct difference in the 5-year OS and DFS rates among the HER-2-negative, low expression, and positive subgroups (69.7% vs. 68.0% vs. 76.0%, , ; 57.6% vs. 52.0% vs. 49.3%, , ; Figures 5(a) and 5(b)). In the HER-2-negative subgroup, patients in the low INPS group had significantly higher 5-year OS and DFS rates than those in the high INPS group (75.0% vs. 47.4%, , ; 63.9% vs. 25.0%, , , respectively, Figures 5(c) and 5(d)). In the HER-2 low expression subgroup, patients in the low INPS groups also had significantly higher 5-year OS and DFS rates (82.4% vs. 37.5%, , ; 60.8% vs. 33.3%, , , respectively, Figures 5(e) and 5(f)). In the HER-2-positive subgroup, there was no distinct difference in OS and DFS between the low and high INPS groups (75.8% vs. 76.9%, , ; 52.5% vs. 37.5%, , , respectively, Figures 5(g) and 5(h)).

3.6. Relationships among OS, DFS, and INPS in Breast Cancer Patients with Different Ki-67 Indices

In all breast cancer patients, no distinct difference was observed in the 5-year OS and DFS rates between the Ki-67<20% and Ki-67 ≥ 20% groups (75.7% vs. 67.6%, , ; 57.9% vs. 50.0%, , , respectively, Figures 6(a) and 6(b)). However, in the Ki-67<20% subgroup, patients with low OS.INPS had a higher 5-year OS rate than the high OS.INPS group (79.1% vs. 47.2%, , ; Figure 6(c)), with no difference in the 5-year DFS rate between the low and high DFS.INPS groups (60.2% vs. 42.9%, , ; Figure 6(d)). In the Ki-67 ≥ 20% group, patients with low INPS had significantly higher 5-year OS and DFS than those with high INPS (75.5% vs. 47.5%, , ; 59.0% vs. 28.6%, , , respectively, Figures 6(e) and 6(f)).

3.7. Relationships among OS, DFS, and INPS in Breast Cancer Patients with Different P53 Statuses

In all malignant breast cancer patients, there was no difference in 5-year OS and DFS rates between the P53-negative and P53-positive groups (73.1% vs. 65.7%, , ; 52.7% vs. 55.2%, , , respectively, Figures 7(a) and 7(b)). In the P53-negative group, patients with a low INPS showed significantly higher 5-year OS and DFS rates (79.5% vs. 47.2%, , ; 58.8% vs. 26.5%, , , respectively, Figures 7(c) and 7(d)). However, in the P53-positive group, there was no difference in the 5-year OS rate between the low and high OS.INPS groups (70.2% vs. 55.0%, , ; Figure 7(e)), while patients with low DFS.INPS had a higher 5-year DFS rate than those in the high DFS.INPS group (62.2% vs. 40.9%, , ; Figure 7(f)).

3.8. Comparison of the Predictive Capacity of INPS, Inflammatory and Nutritional Variables

The AUC was compared using the test to evaluate the prognostic significance of the INPS and the inflammatory and nutritional variables. Whether the state variable was OS or DFS, INPS had the highest AUC compared with the other inflammatory and nutritional variables (, ; , , respectively, Table 5, Figure 8). Meanwhile, the differences in AUC between OS.INPS and OS.BMI (, 95% CI: 0.007-0.202, ), OS.INPS and OS.PNI (, 95% CI: 0.017-0.150, ), OS.INPS and OS.MLR (, 95% CI: 0.019-0.133, ), OS.NLR and OS.MLR (, 95% CI: 0.016-0.125, ), and DFS.INPS and DFS.PNI (, 95% CI: 0.007-0.126, ) were statistically significant (Table 6). There were no significant differences between any other pairs ().

4. Discussion

This study investigated the clinical significance of a novel inflammatory-nutritional prognostic scoring (INPS) system based on BMI, PNI, AGR, NLR, PLR, MLR, SII, and SIRI through machine learning for breast cancer patients with a nonpCR after undergoing NAC and surgery. Low INPS was significantly associated with prolonged OS and DFS. This study also compared the predictive ability of INPS with the common inflammatory and nutritional variables, revealing that INPS was a better predictor for OS and DFS. The exploratory analysis demonstrated that INPS was a promising biomarker for HER-2 negative and low expression breast cancer patients.

Studies have shown that malignant tumors are related to systemic inflammation [24, 25]. Cancer-related inflammation occurs when cancer and inflammatory responses are entangled, resulting in a dramatically poor prognosis and a failure to respond to cancer therapy [11]. Among the inflammatory parameters, neutrophils may promote proliferation and metastasis by releasing inflammatory mediators [26]. Monocytes are also correlated with the metastasis and progression of malignant tumors [27]. In contrast, lymphocytes are essential for the antitumor effect [28]. Additionally, malnutrition is associated with cancer progression, as it may cause a poor immune response [29]. Low serum albumin, a manifestation of malnutrition, is associated with poor survival [30].

As a holistic variable that incorporates many common inflammatory and nutritional variables, the INPS has been explored in other malignant tumors. Wang et al. found that preoperative INPS is an independent predictor of outcomes for stage III gastric cancer (GC) patients [31]. Hua et al. demonstrated that patients with high INPS had significantly worse survival than those with low INPS [32]. In that research, the authors chose the LASSO regression model to establish the INPS. However, because LASSO is only one type of machine learning algorithm, selection bias could not be avoided despite the large study sample. Therefore, in our study, we compared seven standard machine learning algorithms and selected the best model, XGBoost (, respectively, Figures 1(a) and 1(b)), to construct the INPS for OS and DFS. The multivariable Cox analysis demonstrated that OS.INPS and DFS.INPS were both independent predictors of outcomes for nonpCR breast cancer patients undergoing NAC and surgery (, HR: 2.41, 95% CI: 1.45-4.01; , HR: 1.84, 95% CI: 1.20-2.83; Tables 3 and 4, respectively).

Many studies have demonstrated that inflammatory and nutritional parameters are associated with survival; however, some of their results are inconsistent. According to a meta-analysis, a high NLR was significantly correlated with a poor pathological response in breast malignant tumor patients, with no association found with DFS or OS [33]. In contrast, another meta-analysis found that patients with high NLR and PLR had short OS and an increased risk of recurrence [34]. In addition, compared with NLR, which could only offer limited clinical information, our results noted that SII, an inflammatory parameter composed of neutrophils, platelets, and lymphocytes, was a better predictor of OS [9]. Therefore, we hypothesized that a biomarker integrating various inflammatory and nutritional parameters would be more accurate than an individual biomarker. Our results showed that the INPS had a higher AUC for OS and DFS than the other inflammatory and nutritional variables. Pairwise comparisons of INPS and the inflammatory and nutritional variables, together with the results of the test, revealed that OS.INPS had a significantly larger AUC than OS.BMI, OS.PNI, and OS.MLR, and that DFS.INPS had a significantly larger AUC than DFS.PNI.

We also conducted an exploratory analysis in patients with different clinical T stages, HER-2 statuses, Ki-67 indices, and P53 levels. Although significant survival differences could not be found among the above subgroups, patients with different INPSs showed considerable differences in OS and DFS. In particular, among the HER-2 status subgroups, patients with low INPS had better OS and DFS in the HER-2 negative and low expression subgroups, with no difference observed in the HER-2 positive group. More recent studies have shown that breast cancer patients with low HER-2 expression have improved 3-year OS and DFS compared to HER-2-negative patients [35]. However, it is unclear whether low HER-2 expression is correlated with the long-term prognosis in breast cancer patients. Thus, the INPS may be a promising biomarker for HER-2 low breast cancer patients.

Although comprehensive and novel, this study had some limitations. First, it was a retrospective analysis conducted in a single center, and validation with data from additional centers may be necessary. Second, a more extended follow-up period is necessary to identify the long-term clinical significance of INPS. Last, the dynamic changes in INPS should be explored to identify its predictive ability more fully.

5. Conclusions

For nonpCR breast cancer patients receiving NAC, the INPS based on eight common inflammatory and nutritional variables is an independent predictor of survival. As a comprehensive parameter, it is superior to BMI, PNI, and MLR in predicting survival time. Additionally, it may be a promising biomarker for breast cancer with low HER-2 expression.

Data Availability

Due to the nature of this research, participants of this study did not agree for their data to be shared publicly, so supporting data is not available.

Ethical Approval

The ethics committee of Harbin Medical University Cancer Hospital approved this research. It complies with the World Medical Association Declaration of Helsinki in 1964 and its later amendments. All patients signed informed consent before each treatment.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

Cong Jiang and Yuanxi Huang conceptualized and designed the work. Yuting Xiu, Shiyuan Zhang, and Xiao Yu collected the data. Cong Jiang and Kun Qiao drafted and analyzed the manuscript. All authors contributed to the article and approved the submitted version. Cong Jiang and Yuting Xiu contributed equally to this work.

Acknowledgments

We thank Harbin Medical University Cancer Hospital for the data support. This work was supported by grants from the Haiyan Foundation of Harbin Medical University Cancer Hospital (Grant Number: JJQN2022-01). The funder had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Supplementary Materials

Table S1: the relationship between OS, DFS, and the hematological parameters included in the Cox regression analysis. Certificate of English Editing: the first certificate of English editing. AJE editing certificate: the second certificate of English editing.