Abstract

Lymph node metastasis (LNM) is an important factor in determining the optimal treatment for early gastric cancer (EGC). This study aimed to develop and validate a nomogram to predict LNM in patients with EGC. A total of 842 cases from the Surveillance, Epidemiology, and End Results (SEER) database were divided into training and testing sets at a 6:4 ratio for model development, and clinical data from 494 hospital patients were used for external validation. Univariate and multivariate logistic regression analyses of the training set were used to identify predictors. Logistic regression, LASSO regression, ridge regression, and elastic-net regression were used to construct candidate models, and model performance was quantified by the area under the receiver operating characteristic curve (AUC) with 95% confidence intervals (CIs). T stage, tumor size, and tumor grade were independent predictors of LNM in EGC patients. In the testing set, the AUC of the logistic regression model was 0.766 (95% CI, 0.709–0.823), slightly higher than that of the other models; in external validation, however, its AUC was 0.625 (95% CI, 0.537–0.678). A nomogram based on the logistic regression model was drawn to predict LNM in EGC patients. Further validation stratified by gender, age, and grade showed that, in the external cohort, the model performed well in patients with grade III tumors, with an AUC of 0.803 (95% CI, 0.606–0.999). Our nomogram showed good predictive ability in the SEER data and may provide a tool for clinicians to predict LNM in EGC patients.

1. Introduction

Gastric cancer, the third leading cause of cancer death worldwide, accounts for more than 1 million new cases each year [1]. Its incidence and mortality are higher in East Asia, Eastern Europe, and South America [2–5], and approximately half of the estimated gastric cancer deaths in 2018 occurred in China [1]. Early gastric cancer (EGC) is defined as gastric cancer confined to the mucosa (including the lamina propria) or submucosa, regardless of tumor size or the presence of regional lymph node metastasis (LNM) [6]. LNM is the most common form of gastric cancer metastasis and a major contributor to its high mortality. In the TNM staging system for gastric cancer, nodal status guides treatment planning, and prognosis is predicted from the number of pathologically positive lymph nodes and the overall stage of the disease [7].

The main treatments for EGC include endoscopic mucosal resection (EMR), endoscopic submucosal dissection (ESD), wedge resection, laparoscopically assisted gastrectomy, and open gastrectomy [8, 9]. Compared with the other approaches, EMR and ESD preserve gastric function and maintain quality of life [10, 11]. However, the absence of LNM is a prerequisite for EMR and ESD [12]. A tool that predicts LNM in EGC patients would therefore be valuable for selecting the treatment approach and for estimating prognosis. Several studies have established nomograms for LNM in patients with EGC [6, 13, 14], but they had limitations such as small sample sizes, single-center designs, and a lack of external validation. In addition, few studies have examined how well LNM prediction models perform in different populations of EGC patients.

Herein, we selected predictors of LNM in EGC patients using the Surveillance, Epidemiology, and End Results (SEER) database, developed a nomogram to predict LNM in EGC patients, and performed external validation to assess how well the model fit an independent hospital cohort.

2. Methods

2.1. Study Design and Population

Data were extracted from the SEER database, a population-based cancer registry program of the National Cancer Institute that covers approximately 28% of the US population. All patients with gastric adenocarcinoma diagnosed from 2015 to 2020 were extracted. For external validation, 494 patients diagnosed with EGC at Xiangya Hospital, Central South University, between January 2012 and December 2019 were included. Tumors were staged according to the American Joint Committee on Cancer (AJCC) Cancer Staging Manual, 7th edition, and EGC in this study included Tis, T1a, and T1b [15]. This study was approved by the Institutional Review Board of Xiangya Hospital, Central South University (approval number: 2019030510), and all patients provided written informed consent.

2.2. Inclusion and Exclusion Criteria

Patients were eligible if they met all of the following criteria: (1) age ≥18 years; (2) histopathologically confirmed stage Tis, T1a, or T1b gastric adenocarcinoma; and (3) complete baseline and pathological data. The exclusion criteria were as follows: (1) no surgical resection or no microscopic evaluation of lymph nodes; (2) radiotherapy or chemotherapy before surgery; (3) distant metastasis at diagnosis; (4) other gastric tumors (neuroendocrine tumors, gastrointestinal stromal tumors, or metastatic disease); and (5) a history of other malignancies.

2.3. Data Collection

Demographic and clinical data included age, gender, T stage, primary site, tumor size, tumor grade, and LNM status. T stage was categorized as Tis, T1a, or T1b, and tumor size was categorized as <1 cm, 1–2 cm, 2–3 cm, 3–4 cm, or ≥4 cm. LNM was the outcome variable.
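The size categories do not state how boundary values are handled (is a 2.0 cm tumor in the 1–2 cm or the 2–3 cm group?). A minimal Python sketch that makes one convention explicit, assuming left-closed intervals; the function and constant names are hypothetical. If sizes are recorded in millimeters, as SEER tumor size fields typically are, divide by 10 first.

```python
import bisect

# Lower edges (cm) of every category after the first. Assumption: intervals are
# closed on the left, so a 2.0 cm tumor falls into "2-3 cm".
SIZE_EDGES_CM = [1.0, 2.0, 3.0, 4.0]
SIZE_LABELS = ["<1 cm", "1-2 cm", "2-3 cm", "3-4 cm", ">=4 cm"]


def size_category(size_cm: float) -> str:
    """Map a tumor diameter in cm to one of the five study categories."""
    if size_cm < 0:
        raise ValueError("tumor size must be non-negative")
    # bisect_right returns how many edges are <= size_cm, i.e. the label index
    return SIZE_LABELS[bisect.bisect_right(SIZE_EDGES_CM, size_cm)]
```

For example, size_category(1.9) returns "1-2 cm" and size_category(4.0) returns ">=4 cm". Choosing right-closed intervals instead would only require bisect_left.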

2.4. Statistical Analysis

Data were extracted from the SEER database using SEER*Stat software (version 8.3.2). The SEER data were divided into a training set and a testing set at a 6:4 ratio, and the clinical data were used for external validation. Continuous variables with a normal or approximately normal distribution were expressed as mean ± standard deviation (SD) and compared between groups with the t-test. Non-normally distributed variables were expressed as median (Q1, Q3) and compared with the Wilcoxon rank-sum test. Categorical variables were expressed as numbers and percentages and compared with the χ2 test or Fisher's exact test.
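The 6:4 split can be sketched in standard-library Python as below. The text does not say whether the split was stratified by LNM status or which random seed was used; stratifying on the outcome is an assumption here (it keeps the LNM rate near 20.9% in both sets), and stratified_split and the seed value are hypothetical.

```python
import random


def stratified_split(labels, train_frac=0.6, seed=2023):
    """Return (train_idx, test_idx). Each outcome class is shuffled and cut
    separately, so the LNM rate is nearly identical in both sets."""
    rng = random.Random(seed)  # fixed seed -> reproducible split
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(len(idx) * train_frac)
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return sorted(train), sorted(test)
```

With the SEER counts from the Results (842 cases, 176 with LNM), this gives 506 training and 336 testing cases, of which 106 and 70 have LNM.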

Univariate and multivariate logistic regression analyses were used to select predictor variables. Logistic regression, LASSO regression, ridge regression, and elastic-net regression were then used to construct prediction models. A nomogram was drawn for the prediction model, and its calibration was assessed with the Hosmer-Lemeshow goodness-of-fit test. Model performance was quantified by the AUC with 95% confidence intervals (CIs), together with sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV).
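To make these metrics concrete, here is a minimal standard-library Python sketch (not the authors' R code). The AUC is computed as the Mann-Whitney probability that a patient with LNM receives a higher predicted risk than a patient without LNM, and the threshold metrics come from a 2×2 confusion matrix. The cutoff used to dichotomize predictions is not reported, so it is left as an argument; the function names are hypothetical.

```python
def auc_mann_whitney(y, score):
    """AUC = P(score of a random LNM case > score of a random non-LNM case),
    counting ties as 1/2."""
    pos = [s for s, t in zip(score, y) if t == 1]
    neg = [s for s, t in zip(score, y) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def _ratio(num, den):
    return num / den if den else float("nan")


def threshold_metrics(y, score, cutoff):
    """Confusion-matrix metrics when score >= cutoff is called LNM-positive."""
    tp = sum(1 for s, t in zip(score, y) if s >= cutoff and t == 1)
    fp = sum(1 for s, t in zip(score, y) if s >= cutoff and t == 0)
    fn = sum(1 for s, t in zip(score, y) if s < cutoff and t == 1)
    tn = sum(1 for s, t in zip(score, y) if s < cutoff and t == 0)
    return {
        "accuracy": _ratio(tp + tn, tp + fp + fn + tn),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }
```

For y = [1, 1, 0, 0, 0] and scores [0.9, 0.4, 0.6, 0.1, 0.2], the AUC is 5/6, and a cutoff of 0.5 gives sensitivity 0.5 and specificity 2/3. Lowering the cutoff raises sensitivity and NPV at the cost of specificity and PPV.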

All statistical analyses and figures were produced with R software (version 4.0.2). The caret package was used to normalize the data, and the λ grid for the penalized models was lambdas <- seq(0.0001, 0.01, length.out = 200). The glmnet package was used to construct the LASSO, ridge, and elastic-net regression models, and threefold cross-validation was performed. Other R packages, including compareGroups, ResourceSelection, rms, and pROC, were also used. All tests were two-sided, with a significance level of α = 0.05.
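In glmnet's parameterization, the penalized logistic models minimize (1/n)·(negative log-likelihood) + λ[(1 − α)/2·‖β‖² + α‖β‖₁], where α = 1 gives LASSO, α = 0 gives ridge, and 0 < α < 1 gives elastic net. glmnet solves this by coordinate descent; the standard-library Python sketch below uses proximal gradient descent instead, so it illustrates the objective, the λ grid above, and λ selection by threefold cross-validation rather than re-implementing glmnet. The elastic-net α, the CV criterion (held-out binomial deviance here), the step size, and the iteration count are assumptions, since the text does not report them; all names are hypothetical.

```python
import math
import random


def r_seq(start, stop, length_out):
    """Python analogue of R's seq(start, stop, length.out = n)."""
    step = (stop - start) / (length_out - 1)
    return [start + i * step for i in range(length_out)]


LAMBDAS = r_seq(0.0001, 0.01, 200)  # the grid quoted in the Methods


def _sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _soft_threshold(v, t):
    # proximal operator of t * |v|
    return math.copysign(max(abs(v) - t, 0.0), v)


def fit_enet_logistic(X, y, lam, alpha, lr=0.1, n_iter=500):
    """Proximal gradient descent on
    (1/n) * NLL + lam * ((1 - alpha)/2 * ||b||^2 + alpha * ||b||_1).
    The intercept is not penalized; X is assumed already standardized."""
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        resid = [_sigmoid(b0 + sum(bj * xj for bj, xj in zip(beta, x))) - yi
                 for x, yi in zip(X, y)]
        grad = [sum(r * x[j] for r, x in zip(resid, X)) / n for j in range(p)]
        b0 -= lr * sum(resid) / n
        # gradient step on the smooth part (NLL + ridge), then L1 shrinkage
        beta = [_soft_threshold(beta[j] - lr * (grad[j] + lam * (1 - alpha) * beta[j]),
                                lr * lam * alpha)
                for j in range(p)]
    return b0, beta


def _deviance(X, y, b0, beta):
    dev = 0.0
    for x, yi in zip(X, y):
        pr = _sigmoid(b0 + sum(bj * xj for bj, xj in zip(beta, x)))
        pr = min(max(pr, 1e-12), 1 - 1e-12)
        dev -= 2 * (yi * math.log(pr) + (1 - yi) * math.log(1 - pr))
    return dev


def cv_select_lambda(X, y, lambdas, alpha, k=3, seed=1):
    """k-fold CV: return the lambda with the lowest summed held-out deviance."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    best_lam, best_dev = None, float("inf")
    for lam in lambdas:
        dev = 0.0
        for fold in folds:
            held = set(fold)
            tr = [i for i in idx if i not in held]
            b0, beta = fit_enet_logistic([X[i] for i in tr], [y[i] for i in tr], lam, alpha)
            dev += _deviance([X[i] for i in fold], [y[i] for i in fold], b0, beta)
        if dev < best_dev:
            best_lam, best_dev = lam, dev
    return best_lam
```

Scanning all 200 grid values with cv_select_lambda is slow in pure Python for a cohort of several hundred patients; the sketch is meant for reading, not for production fitting. Categorical predictors such as T stage would be dummy-coded before fitting.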

3. Results

3.1. Baseline Characteristics

In total, 842 cases from the SEER database and 494 cases from clinical practice were included in this study (Figure 1). Among the 842 SEER patients, the mean age was 69.4 ± 11.3 years, and 485 (57.60%) were male. Tumors were located mostly in the lower (54.04%) and middle (35.99%) parts of the stomach. LNM was present in 176 (20.9%) patients in the SEER dataset and 133 (26.92%) patients in the clinical dataset. Detailed characteristics are shown in Table 1.

3.2. Differences in Characteristics of Patients with and without LNM

Table 2 shows the characteristics of patients with and without LNM. T stage, tumor size, and tumor grade differed significantly between the two groups (all P < 0.05). The incidence of LNM was higher in patients with T1b tumors than in those with T1a tumors, higher in tumors larger than 2 cm than in smaller tumors, and higher in tumors with a grade above II.
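These group comparisons use the χ2 test described in the Methods. For a 2×2 table such as LNM (yes/no) by T stage (T1a/T1b), a minimal Python sketch follows; the counts in the usage note are hypothetical, not taken from Table 2. R's chisq.test applies Yates' continuity correction to 2×2 tables by default, so the sketch exposes it as an option. Fisher's exact test, the alternative for small expected counts, is not implemented here, and chi2_2x2 is a hypothetical name.

```python
import math


def chi2_2x2(a, b, c, d, yates=False):
    """Pearson chi-square test for the 2x2 table [[a, b], [c, d]] (df = 1).
    yates=True applies the continuity correction the way R's chisq.test does."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    observed = ((a, b), (c, d))
    diffs = [abs(observed[i][j] - rows[i] * cols[j] / n)
             for i in range(2) for j in range(2)]
    corr = min(0.5, min(diffs)) if yates else 0.0
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (abs(observed[i][j] - expected) - corr) ** 2 / expected
    # survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return stat, math.erfc(math.sqrt(stat / 2))
```

For the hypothetical table [[10, 90], [30, 70]], chi2_2x2(10, 90, 30, 70) returns a statistic of 12.5 (P ≈ 0.0004); with yates=True the statistic drops to 11.28125.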

3.3. Factors Associated with LNM in EGC Patients

The univariate and multivariate logistic regression analyses are shown in Table 3. The multivariate analysis indicated that T stage, tumor size, and tumor grade were positively associated with LNM in EGC patients. The odds of LNM in patients with T1b tumors were 3.84 times those in patients with T1a tumors (OR = 3.84; 95% CI, 2.04–7.21). Compared with tumors <1 cm, tumors of 2–3 cm, 3–4 cm, and ≥4 cm had ORs of 3.07 (95% CI, 1.10–8.61), 4.73 (95% CI, 1.58–14.13), and 5.75 (95% CI, 2.08–15.92), respectively; that is, the odds of LNM increased by 207%, 373%, and 475%. The odds of LNM in patients with grade III tumors were 3.19 times those in patients with grade I tumors (OR = 3.19; 95% CI, 1.27–8.00).
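Because logistic-regression coefficients are log odds ratios, the reported ORs and 95% CIs can be converted back into β = ln(OR) and an approximate standard error, SE ≈ [ln(upper) − ln(lower)] / (2 × 1.96). The Python sketch below does this for the adjusted ORs above and also shows that an OR of 3.07 corresponds to a 207% increase in the odds relative to the reference category. The results are approximate because the published values are rounded, and the function names are hypothetical.

```python
import math

Z975 = 1.959963984540054  # 97.5th percentile of the standard normal


def coef_from_or(odds_ratio, ci_low, ci_high):
    """Back-calculate beta = ln(OR), its SE, and a two-sided Wald p-value
    from a reported OR and 95% CI (approximate, because of rounding)."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z975)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, z, p_value


def pct_change_in_odds(odds_ratio):
    """Percentage change in the odds relative to the reference category."""
    return (odds_ratio - 1) * 100


REPORTED = {  # adjusted ORs and 95% CIs from Section 3.3
    "T1b vs T1a": (3.84, 2.04, 7.21),
    "2-3 cm vs <1 cm": (3.07, 1.10, 8.61),
    "3-4 cm vs <1 cm": (4.73, 1.58, 14.13),
    ">=4 cm vs <1 cm": (5.75, 2.08, 15.92),
    "grade III vs I": (3.19, 1.27, 8.00),
}

if __name__ == "__main__":
    for name, (or_, lo, hi) in REPORTED.items():
        beta, se, z, p = coef_from_or(or_, lo, hi)
        print(f"{name}: beta={beta:.3f} SE={se:.3f} p~{p:.4f} "
              f"odds +{pct_change_in_odds(or_):.0f}%")
```

For T1b versus T1a this gives β ≈ 1.345 and SE ≈ 0.32. The same β values are what a nomogram's points scale is built from.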

3.4. Model Comparison and Selection

Logistic regression, LASSO regression, ridge regression, and elastic-net regression models were established, and Table 4 presents their AUCs in the training and testing sets. In the testing set, the AUCs of the logistic, LASSO, ridge, and elastic-net regression models were 0.766 (95% CI, 0.709–0.823), 0.740 (95% CI, 0.681–0.799), 0.737 (95% CI, 0.676–0.797), and 0.749 (95% CI, 0.691–0.807), respectively. The AUCs did not differ significantly between the models (P > 0.05). Because the logistic regression model had a slightly higher AUC than the other models and its results are easier to interpret clinically, it was chosen as the final model.
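The text does not name the test used to compare the AUCs. For AUCs estimated on the same testing-set patients, one common choice (assumed here) is DeLong's test for correlated ROC curves, which pROC provides through roc.test. The standard-library Python sketch below computes DeLong's placement values and the z statistic for the difference between two models; delong_paired and its helpers are hypothetical names, and at least two patients with and two without LNM are required.

```python
import math


def _psi(x, y):
    return 1.0 if x > y else 0.5 if x == y else 0.0


def _placements(y, score):
    """DeLong placement values: V10 for each positive, V01 for each negative."""
    pos = [s for s, t in zip(score, y) if t == 1]
    neg = [s for s, t in zip(score, y) if t == 0]
    v10 = [sum(_psi(p, n) for n in neg) / len(neg) for p in pos]
    v01 = [sum(_psi(p, n) for p in pos) / len(pos) for n in neg]
    return v10, v01


def _cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_paired(y, score_a, score_b):
    """DeLong test for two correlated AUCs computed on the same cases.
    Returns (auc_a, auc_b, z, two_sided_p)."""
    v10a, v01a = _placements(y, score_a)
    v10b, v01b = _placements(y, score_b)
    auc_a, auc_b = sum(v10a) / len(v10a), sum(v10b) / len(v10b)
    m, n = len(v10a), len(v01a)  # numbers of positive and negative cases
    var_a = _cov(v10a, v10a) / m + _cov(v01a, v01a) / n
    var_b = _cov(v10b, v10b) / m + _cov(v01b, v01b) / n
    cov_ab = _cov(v10a, v10b) / m + _cov(v01a, v01b) / n
    var_diff = var_a + var_b - 2 * cov_ab
    if var_diff <= 0:  # identical rankings: no detectable difference
        return auc_a, auc_b, 0.0, 1.0
    z = (auc_a - auc_b) / math.sqrt(var_diff)
    return auc_a, auc_b, z, math.erfc(abs(z) / math.sqrt(2))
```

The covariance term matters: two models scored on the same patients tend to rank them similarly, so treating their AUCs as independent would overstate the variance of the difference.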

3.5. Nomogram for Prediction of LNM in EGC Patients

Table 5 displays the performance of the logistic regression model. In the testing set, its AUC, accuracy, sensitivity, specificity, PPV, and NPV were 0.766 (95% CI, 0.709–0.823), 0.588 (95% CI, 0.533–0.641), 0.899 (95% CI, 0.802–0.958), 0.507 (95% CI, 0.446–0.569), 0.320 (95% CI, 0.255–0.390), and 0.951 (95% CI, 0.902–0.980), respectively. The Hosmer-Lemeshow goodness-of-fit test indicated good calibration of this prediction model (χ2 = 3.916, P > 0.05). However, in external validation with the clinical practice data, the AUC of the model was 0.625 (95% CI, 0.537–0.678), indicating that the model did not transfer well to the external cohort (Figure 2, Table 5).
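The Hosmer-Lemeshow statistic compares observed and expected LNM counts within groups of patients ordered by predicted risk. A minimal Python sketch follows, assuming g = 10 groups (df = g − 2 = 8), which is the default of hoslem.test in the ResourceSelection package listed in the Methods; the paper does not report g. The sketch forms equal-size groups after sorting, whereas hoslem.test cuts at quantiles of the fitted values, so results can differ slightly when predicted probabilities are tied. The chi-square tail probability uses the closed form available for even degrees of freedom, and the function names are hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Exact P(X > x) for a chi-square variable with an even number of df:
    exp(-x/2) * sum_{i < df/2} (x/2)^i / i!."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(y, prob, g=10):
    """Hosmer-Lemeshow statistic using g (nearly) equal-size groups formed after
    sorting cases by predicted probability; returns (stat, df, p_value)."""
    pairs = sorted(zip(prob, y), key=lambda t: t[0])  # stable: ties keep order
    n = len(pairs)
    stat = 0.0
    for k in range(g):
        grp = pairs[round(k * n / g):round((k + 1) * n / g)]
        if not grp:
            continue
        exp1 = sum(p for p, _ in grp)  # expected number with LNM
        obs1 = sum(t for _, t in grp)  # observed number with LNM
        exp0, obs0 = len(grp) - exp1, len(grp) - obs1
        for o, e in ((obs1, exp1), (obs0, exp0)):
            if e > 0:
                stat += (o - e) ** 2 / e
    df = g - 2
    return stat, df, chi2_sf_even_df(stat, df)
```

For example, chi2_sf_even_df(3.916, 8) is about 0.86, which would be the P value for the reported statistic if ten groups were used.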

A nomogram for predicting LNM in EGC patients was then drawn based on the logistic regression model (Figure 3(a)). Each variable is assigned a score on the points scale, and the sum of these scores is read off against the probability axis to give the predicted probability of LNM. For example, a randomly selected patient from the SEER database had a grade III, T1b tumor of ≥4 cm. The nomogram gave this patient a total score of 243 points, corresponding to a predicted LNM probability of 0.472. This patient was confirmed to have LNM, consistent with the relatively high predicted probability (Figure 3(b)).
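The points scale can be reproduced from the model coefficients. A minimal Python sketch follows, assuming the usual rms::nomogram scaling, in which the predictor with the widest range of β values spans 0 to 100 points and every other level is scaled by the same factor. The coefficients are illustrative: β = ln(OR) for the adjusted ORs reported in Section 3.3, with 0.0 placeholders for levels whose ORs the text does not report (Tis, 1–2 cm, grade II). The intercept is a made-up value because the paper does not give one. COEF, INTERCEPT, points, and probability are hypothetical names.

```python
import math

# Illustrative coefficients: beta = ln(OR) for the ORs reported in Section 3.3.
# 0.0 entries for Tis, 1-2 cm and grade II are placeholders (ORs not reported).
COEF = {
    "T stage": {"T1a": 0.0, "Tis": 0.0, "T1b": math.log(3.84)},
    "size": {"<1 cm": 0.0, "1-2 cm": 0.0, "2-3 cm": math.log(3.07),
             "3-4 cm": math.log(4.73), ">=4 cm": math.log(5.75)},
    "grade": {"I": 0.0, "II": 0.0, "III": math.log(3.19)},
}
INTERCEPT = -4.0  # hypothetical: the intercept is not reported


def _max_range(coef):
    return max(max(lv.values()) - min(lv.values()) for lv in coef.values())


def points(coef, patient):
    """The widest-range variable spans 0-100 points; each level scores
    (beta - lowest beta of its variable) times the same factor."""
    scale = 100.0 / _max_range(coef)
    return sum((coef[var][lvl] - min(coef[var].values())) * scale
               for var, lvl in patient.items())


def probability(coef, total_points, intercept):
    """Map total points back to the linear predictor, then to a probability."""
    lp = intercept + sum(min(lv.values()) for lv in coef.values())
    lp += total_points * _max_range(coef) / 100.0
    return 1.0 / (1.0 + math.exp(-lp))
```

With these values the example patient (grade III, T1b, ≥4 cm) totals about 243 points, close to the 243 points read from Figure 3(b). The predicted probability depends on the unreported intercept, so the 0.472 in the text cannot be reproduced here.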

3.6. Further Validation Based on Different Populations

Further validation was performed in subgroups defined by gender, age, and tumor grade (Table 6). In the testing set, the model showed good discrimination in males, females, patients aged ≥65 years, patients aged <65 years, and patients with grade I or grade III tumors, with AUCs of 0.793 (95% CI, 0.720–0.866), 0.729 (95% CI, 0.635–0.822), 0.755 (95% CI, 0.688–0.821), 0.794 (95% CI, 0.681–0.907), 0.722 (95% CI, 0.583–0.861), and 0.713 (95% CI, 0.647–0.815), respectively. In the external validation data, the model performed well in patients with grade III tumors, with an AUC of 0.803 (95% CI, 0.606–0.999).
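The external grade III subgroup has a wide interval (0.606–0.999), which is typical of small subgroups. The Hanley-McNeil approximation, sketched below in Python, shows how the standard error of an AUC grows as the numbers of patients with and without LNM shrink. The paper does not report subgroup sizes or state its CI method (pROC's ci.auc defaults to DeLong's method), so the counts in the usage note are hypothetical and the function name is ours.

```python
import math


def hanley_mcneil_ci(auc, n_pos, n_neg, z=1.959963984540054):
    """Approximate 95% CI for an AUC from the Hanley-McNeil standard error,
    clipped to [0, 1]. n_pos / n_neg: patients with / without LNM."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    var = (auc * (1 - auc) + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    se = math.sqrt(var)
    return max(0.0, auc - z * se), min(1.0, auc + z * se)
```

For an AUC of 0.80, hanley_mcneil_ci(0.80, 10, 30) gives roughly (0.62, 0.98), whereas hanley_mcneil_ci(0.80, 100, 300) gives roughly (0.74, 0.86): a tenfold larger subgroup narrows the interval about threefold.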

4. Discussion

In this study, a nomogram for LNM in EGC patients was established based on the SEER database and externally validated with clinical practice data. T stage, tumor size, and tumor grade, the factors associated with LNM in EGC patients, were included in the nomogram. In the testing set, the AUC, sensitivity, and NPV of the prediction model were 0.766, 0.899, and 0.951, respectively. However, the AUC in the external validation data was only 0.625, indicating a poor fit for the external population. Subgroup validation showed that the model nonetheless performed well in patients with grade III tumors, with an AUC of 0.803.

Predicting LNM is of great significance for EGC patients, especially for the choice of treatment. Several models have been developed to predict the probability of LNM in gastric cancer [6, 16]. Chen et al. established a nomogram to predict LNM in patients with gastric cancer using variables such as Borrmann type, preoperative CA19-9 level, T stage, and N stage, with an AUC of 0.786 [17]. Eom et al. showed that conventional models based on tumor size, histological type, lymphatic and blood vessel invasion, and depth of invasion performed insufficiently, and that adding biomarkers such as CD44v6 and α1 catenin significantly improved their predictive performance [18]. However, most prediction models were developed in small samples, lacked external validation, or included patients with advanced gastric cancer. Our prediction model was established using the SEER database and externally validated with clinical practice data. Its AUC of 0.766 indicated good predictive performance; unfortunately, the nomogram fit the external population poorly. Further validation in different subgroups, however, showed that the nomogram performed well in patients with grade III tumors, with an AUC of 0.803.

Our results showed that LNM was associated with T stage, tumor size, and tumor grade. Similarly, Pokala et al. found that tumor stage, grade, and size were independent predictors of LNM [13]. Previous studies have also identified T stage as an independent risk factor for LNM [19–21]. Many studies have shown that tumor size is a risk factor for LNM in gastric cancer, with a larger tumor size associated with a higher probability of LNM [16, 22, 23]. In our study, the odds of LNM in patients with tumors ≥4 cm were 5.75 times those in patients with tumors <1 cm. Together, T stage, size, and grade can be used to estimate the incidence of LNM in patients with early gastric adenocarcinoma and to help discuss the risks of different treatment modalities [13, 24].

Previous studies have shown that the prevalence of LNM in EGC patients ranges from 7.7% to 19.4% [21, 25, 26], so most patients underwent excessive surgery and experienced the associated morbidity [27]. Accurate pretreatment assessment of LNM status can therefore help avoid the high morbidity and mortality of lymphadenectomy in overtreated patients [28], and a nomogram that predicts LNM in patients with EGC has important clinical significance. Pokala et al. suggested that patients with early gastric adenocarcinoma be counseled on appropriate treatment options, weighing the risk of adverse oncological outcomes after endoscopic treatment against the surgical morbidity and quality-of-life impact of major organ resection [13].

We developed a nomogram to predict LNM in patients with EGC based on the SEER database and externally validated it with clinical practice data. Because the nomogram fit the external validation data poorly, we conducted further validation in different subgroups. This tool for predicting the likelihood of LNM in EGC patients may help clinicians make surgical decisions. However, this study has some limitations. First, the nomogram fit the external validation data poorly, possibly because of racial differences between the SEER population and our hospital cohort. Second, tumor ulceration [6, 29], lymphovascular invasion [6, 29], and lymph node involvement on endoscopic ultrasound [30] have been associated with LNM in some studies, but these variables are not available in the SEER database.

5. Conclusion

A nomogram to predict LNM in patients with EGC was developed based on the SEER database. Patients with a higher T stage, higher tumor grade, and larger tumor size were more likely to develop LNM. This tool can estimate the probability of LNM in EGC patients and may help clinicians make surgical decisions.

Data Availability

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.