Abstract
Background. Preventing in-hospital mortality in patients with ST-segment elevation myocardial infarction (STEMI) is a crucial step. Objectives. The objective of our research was to develop and externally validate a diagnostic model of in-hospital mortality in acute STEMI patients using artificial intelligence methods. Methods. We nonrandomly divided the American population with acute STEMI into a training set, a test set, and a validation set. We converted the unbalanced data into balanced data. We used artificial intelligence methods to develop and externally validate several diagnostic models. We used the confusion matrix combined with the area under the receiver operating characteristic curve (AUC) to evaluate the strengths and weaknesses of these models. Results. The strongest predictors of in-hospital mortality were age, gender, cardiogenic shock, atrial fibrillation (AF), ventricular fibrillation (VF), third degree atrioventricular block, in-hospital bleeding, percutaneous coronary intervention (PCI) during hospitalization, coronary artery bypass grafting (CABG) during hospitalization, hypertension history, diabetes history, and myocardial infarction history. The F2 score of logistic regression in the training set, the test set, and the validation dataset was 0.81, 0.6, and 0.59, respectively. The AUC of logistic regression in the training set, the test set, and the validation dataset was 0.77, 0.78, and 0.8, respectively. The diagnostic model built by logistic regression was the best. Conclusion. The strongest predictors of in-hospital mortality were age, gender, cardiogenic shock, AF, VF, third degree atrioventricular block, in-hospital bleeding, PCI during hospitalization, CABG during hospitalization, hypertension history, diabetes history, and myocardial infarction history.
We used artificial intelligence methods to develop and externally validate several diagnostic models of in-hospital mortality in acute STEMI patients. The diagnostic model built by logistic regression was the best. We registered this study with the WHO International Clinical Trials Registry Platform (ICTRP) under the registration number ChiCTR1900027129 on 1 November 2019.
1. Introduction
In the United States, an estimated 605000 acute myocardial infarction (AMI) events occur each year [1]. In Europe, the in-hospital mortality of patients with ST-segment elevation myocardial infarction (STEMI) is between 4% and 12% [2]. Coronary heart disease, including STEMI, remains the main cause of death [1]. Preventing in-hospital mortality of STEMI is a crucial step, and a tool is needed for early detection of patients at increased risk of in-hospital mortality. The Global Registry of Acute Coronary Events (GRACE) risk score can be accessed via mobile devices and is well regarded by users. The Thrombolysis in Myocardial Infarction (TIMI) risk score predicts 30-day mortality at presentation in patients with fibrinolytic-eligible STEMI [3]. The ACTION (acute coronary treatment and intervention outcomes network) score [4] was established in 2011 as a model for predicting in-hospital mortality, using 65668 AMI patients for development and 16336 AMI patients for validation. The ACTION model updated in 2016 used more patients and added cardiac arrest as a risk factor [5]. Xiang Li used a machine learning method to build a prediction model of in-hospital mortality for STEMI patients [6]. Kwon JM used deep learning to establish a prediction model of in-hospital mortality in STEMI patients, which performed better than the GRACE score and the TIMI score [7].
The current prediction models have the following problems. In-hospital mortality datasets are unbalanced, but this has been insufficiently recognized, and the unbalanced data were not converted into balanced data. No confusion matrix was reported, so evaluating a prediction model by the area under the receiver operating characteristic curve (AUC) or C statistic alone was not comprehensive. Traditional statistical methods struggle to handle these problems; artificial intelligence methods are needed.
The objective of our research was to develop and externally validate a diagnostic model of in-hospital mortality in acute STEMI patients using artificial intelligence methods.
2. Methods
The training dataset was 44996 patients with acute STEMI from January 2016 to December 2016 in the United States. The test dataset was 43581 hospitalized patients with acute STEMI from January 2017 to December 2017 in the United States. The validation dataset came from 40498 hospitalized patients with acute STEMI from January 2018 to December 2018 in the United States. Data from the National (Nationwide) Inpatient Sample (NIS) were used for this study.
Inclusion criteria were as follows: STEMI patients who were hospitalized and who were over 18 years of age. There were no exclusion criteria. This was a retrospective analysis, and informed consent was waived by the Ethics Committee of Beijing Anzhen Hospital Capital Medical University. The outcome of interest was in-hospital mortality, defined as cardiogenic or noncardiogenic death during hospitalization. The presence or absence of in-hospital mortality was determined from the medical record, blinded to the predictor variables. We selected 14 predictor variables according to clinical relevance. The fourteen candidate variables were age, gender, cardiogenic shock, atrial fibrillation (AF), ventricular fibrillation (VF), first degree atrioventricular block, second degree atrioventricular block, third degree atrioventricular block, in-hospital bleeding, percutaneous coronary intervention (PCI) during hospitalization, coronary artery bypass grafting (CABG) during hospitalization, hypertension history, diabetes history, and myocardial infarction history. All of them were based on the medical record and blinded to the predictor variables. AF was defined as all types of AF during hospitalization. VF was defined as all types of VF during hospitalization. In-hospital bleeding was defined as all types of bleeding during hospitalization.
We kept all continuous variables continuous and on their original scale. We used univariable and multivariable logistic regression models to identify the correlates of in-hospital mortality. We entered all variables of Table 1 into the univariable logistic regression. Using the variables that were significant in the univariable logistic regression, we constructed a multivariable logistic regression model with the backward variable selection method. We used the Akaike information criterion (AIC) and Bayesian information criterion (BIC) to select predictors. The AIC accounts for model fit while penalizing the number of parameters being estimated and corresponds to using α = 0.157 [8].
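The correspondence between AIC-based selection and α = 0.157 can be checked numerically. Removing one parameter lowers the AIC penalty by 2, so a single-parameter predictor is retained only when its likelihood-ratio chi-square statistic exceeds 2, and the upper-tail probability of a chi-square distribution with 1 degree of freedom at 2 is about 0.157. The following is a minimal sketch of that calculation, assuming one degree of freedom per predictor; the function name is illustrative and is not taken from the study code.

```python
import math


def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with 1 df.

    With 1 degree of freedom, X = Z**2 for a standard normal Z, so
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    if x < 0:
        raise ValueError("x must be non-negative")
    return math.erfc(math.sqrt(x / 2.0))


# AIC = -2 * log-likelihood + 2 * k, so each extra parameter costs 2 units.
AIC_PENALTY_PER_PARAMETER = 2.0

# Significance level implied by requiring a chi-square improvement above 2.
alpha_equivalent = chi2_sf_1df(AIC_PENALTY_PER_PARAMETER)
print(f"Implied significance level: {alpha_equivalent:.3f}")  # 0.157
```

For comparison, the same function evaluated at 3.841 returns approximately 0.05, the conventional threshold, which shows that AIC-based selection is more liberal about retaining predictors.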
In the training dataset, 5169 out of 44966 hospitalized patients (11.5%) experienced in-hospital mortality, which represented an imbalanced dataset. We evaluated the effect of common sampling methods, including downsampling. The downsampling technique was therefore applied to the original dataset to create one balanced dataset: we randomly selected 13% of the surviving patients as the control group. This ultimately yielded 2 datasets: original and downsampled.
To ensure reliability of the data, we excluded patients who had missing information on predictors. Discrimination is the ability of the diagnostic model to differentiate between patients with and without in-hospital mortality. This measure was quantified by calculating the AUC [8] and the confusion matrix.
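The downsampling step described above can be sketched as follows. This is an illustrative example on synthetic records, not the study code: it uses plain Python lists rather than Pandas so that the logic is visible, and the function name, the `id` field, and the random seed are assumptions. The `DIED` label follows the variable name listed in the Supplementary Materials, and the class counts match those reported for the training set.

```python
import random


def downsample_majority(records, label_key="DIED", fraction=0.13, seed=0):
    """Keep every death and a random `fraction` of the survivors."""
    deaths = [r for r in records if r[label_key] == 1]
    survivors = [r for r in records if r[label_key] == 0]

    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    n_keep = round(fraction * len(survivors))
    controls = rng.sample(survivors, n_keep)  # sampling without replacement

    balanced = deaths + controls
    rng.shuffle(balanced)  # avoid an ordering that separates the classes
    return balanced


# Synthetic cohort with the reported class counts:
# 5169 deaths and 44966 - 5169 = 39797 survivors. No real patient data.
toy_cohort = [{"id": i, "DIED": 1 if i < 5169 else 0} for i in range(44966)]

balanced = downsample_majority(toy_cohort)
n_deaths = sum(r["DIED"] for r in balanced)
n_controls = len(balanced) - n_deaths
print(n_deaths, n_controls)  # 5169 5174
```

Sampling 13% of the 39797 survivors gives 5174 controls, close to the 5169 deaths, which is why this fraction yields an approximately 1:1 class ratio.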
Predictive classifiers were developed based on data from the training set using 5 supervised artificial intelligence methods: logistic regression, random forest, extreme gradient boosting (XGBoost), K nearest neighbour classification model, and multilayer perceptron. The metrics derived from the confusion matrix include accuracy, sensitivity, specificity, precision, F1 score, and F2 score. TP, true positive; FN, false negative; FP, false positive; TN, true negative. Accuracy = (TP + TN)/(TP + TN + FP + FN); sensitivity = recall = TP/(TP + FN); specificity = TN/(TN + FP); precision = TP/(TP + FP); F1 score = 2 × precision × recall/(precision + recall); F2 score = 5 × precision × recall/(4 × precision + recall). The F1 score is defined as the harmonic mean of precision and recall. In addition to the F1 score, the F2 score and the F0.5 score are also widely used in statistics. In the F2 score, recall is weighted more heavily than precision, and in the F0.5 score, precision is weighted more heavily than recall. For mortality in STEMI patients, recall matters more than precision. We therefore used the F2 score combined with the AUC to evaluate the strengths and weaknesses of the above models.
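The definitions above can be written as a short worked example. This is a minimal sketch using the general F-beta formula, of which F1, F2, and F0.5 are special cases; the function name and the confusion-matrix counts are hypothetical and are not results from this study.

```python
def confusion_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Compute the metrics defined in the text from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)  # sensitivity
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)

    def f_beta(beta: float) -> float:
        # F_beta = (1 + beta^2) * P * R / (beta^2 * P + R).
        # beta = 1 gives F1, beta = 2 gives 5PR / (4P + R).
        b2 = beta ** 2
        return (1 + b2) * precision * recall / (b2 * precision + recall)

    return {
        "accuracy": accuracy,
        "sensitivity": recall,
        "specificity": specificity,
        "precision": precision,
        "f1": f_beta(1.0),
        "f2": f_beta(2.0),
        "f0.5": f_beta(0.5),
    }


# Hypothetical counts chosen only to illustrate the arithmetic.
metrics = confusion_metrics(tp=80, fn=20, fp=40, tn=60)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts, recall (0.800) exceeds precision (0.667), so the F2 score (0.769) is higher than the F1 score (0.727), which in turn is higher than the F0.5 score (0.690). This ordering reflects the heavier weight that the F2 score places on recall.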
We performed statistical analyses with STATA version 15.1 (StataCorp, College Station, TX).
We performed artificial intelligence statistical analysis using Python 3.8.5, Pandas 1.2.1, Sklearn 0.0, NumPy 1.19.2, and Keras 2.4.3.
3. Results
The baseline characteristics of the patients in the original and downsampled datasets are given in Table 1. Twelve variables (age, gender, cardiogenic shock, AF, VF, third degree atrioventricular block, in-hospital bleeding, PCI during hospitalization, CABG during hospitalization, hypertension history, diabetes history, and myocardial infarction history) differed significantly between the two groups of patients. After application of the backward variable selection method, AIC, and BIC, all of them remained significant independent predictors of in-hospital mortality. Results are given in Table 2. In the test set, 4895 out of 43581 hospitalized patients (11.2%) experienced in-hospital mortality. In the validation dataset, 4001 out of 40498 hospitalized patients (9.9%) experienced in-hospital mortality. The baseline characteristics of the patients in both datasets are given in Table 1. Comparing the F2 scores and AUCs in Table 3, the diagnostic models built on the downsampled dataset were better than those built on the original dataset, and the diagnostic model built by logistic regression was better than the diagnostic models built by decision tree, XGBoost, multilayer perceptron, and K nearest neighbour. We therefore used the diagnostic model built by logistic regression on the downsampled dataset (modellog.m Supplementary Materials).
The code for using the diagnostic model can be seen in code 1 Supplementary Materials. We entered the following URL in the browser:
https://127.0.0.1:8000/ml/predict?AGE=60&FEMALE=1&HBP=1&VF=0&AF=1&OMI=1&CSHOCK=1&IIIAVB=1&DM=1&PCI=1&CABG=0&BLEEDING=1
The result was as follows:
({"features": {"AF": 1.0, "AGE": -1.0, "BLEEDING": 1.0, "CABG": 0.0, "CSHOCK": 1.0, "DM": 1.0, "FEMALE": 1.0, "HBP": 1.0, "IIIAVB": 1.0, "OMI": 1.0, "PCI": 1.0, "VF": 0.0}, "result": 0}, {"message": "1 = death,0 = alive"})
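The same request can also be assembled programmatically rather than typed into a browser. The following is a minimal sketch that only builds and checks the query URL; it does not contact the service, which runs locally and is described in the Supplementary Materials. The endpoint and parameter names are those quoted above, while the variable names in the code are illustrative.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Local endpoint quoted in the text; it is only reachable where the
# model service from the Supplementary Materials is running.
BASE_URL = "https://127.0.0.1:8000/ml/predict"

# Example patient from the text (1 = present, 0 = absent; AGE in years).
patient = {
    "AGE": 60, "FEMALE": 1, "HBP": 1, "VF": 0, "AF": 1, "OMI": 1,
    "CSHOCK": 1, "IIIAVB": 1, "DM": 1, "PCI": 1, "CABG": 0, "BLEEDING": 1,
}

# urlencode joins the key=value pairs with "&" and escapes them if needed.
query_url = f"{BASE_URL}?{urlencode(patient)}"
print(query_url)

# Round-trip check: parsing the query string recovers the same predictors.
parsed = {
    key: int(values[0])
    for key, values in parse_qs(urlsplit(query_url).query).items()
}
print(parsed == patient)  # True
```

Building the query from a dictionary avoids stray spaces around `=` and `&`, which would otherwise change the parameter names received by the service.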
4. Discussion
In this study, we investigated the predisposing factors of in-hospital mortality in patients with acute STEMI. Age, gender, cardiogenic shock, AF, VF, third degree atrioventricular block, in-hospital bleeding, PCI during hospitalization, CABG during hospitalization, hypertension history, diabetes history, and myocardial infarction history were the significant independent predictors of in-hospital mortality.
The F2 score of logistic regression in the training set, the test set, and the validation dataset was 0.8, 0.6, and 0.6, respectively. The AUC of logistic regression in the training set, the test set, and the validation dataset was 0.77, 0.78, and 0.8, respectively. The diagnostic model built by logistic regression was the best, so we used it.
Granger CB et al. observed that age, Killip class, systolic blood pressure, ST-segment deviation, cardiac arrest during presentation, serum creatinine level, positive initial cardiac enzyme findings, and heart rate were the independent predictors of in-hospital mortality among 11389 patients in the GRACE [9]. Karen S. Pieper et al. generated the updated GRACE risk model and a nomogram [10]. The GRACE risk model has since been upgraded again [11] and simplified [12]. The TIMI risk score predicted 30-day mortality at presentation in fibrinolytic-eligible patients with STEMI [3]. C-ACS [13] was a simple four-variable score developed to enable risk stratification at first medical contact. The ACTION score [4] used 65668 patients to develop and 16336 patients to validate a model to predict in-hospital mortality. The ACTION model updated in 2016 used more patients (243440) and added cardiac arrest as a risk factor [5]. This was a form of internal validation because the cohorts were randomly created [8]. Xiang Li used a machine learning method to build a prediction model of in-hospital mortality for STEMI patients [6]. Kwon JM used deep learning to establish a prediction model of in-hospital mortality in STEMI patients [7].
So far, clinicians and researchers usually use GRACE or TIMI scores to guide treatment decisions. Our diagnostic model of in-hospital mortality was built upon these studies in several ways. We converted the unbalanced data into balanced data. We used the confusion matrix combined with AUC to evaluate the pros and cons of the models.
Our study has several important limitations, including its retrospective nature. We used PCI and CABG during hospitalization as baseline variables and predictors of in-hospital mortality. There may therefore be selection bias towards survivors, and the model may not be clinically useful as a predictor of survival (e.g., for clinical use to identify high-risk patients for more intensive treatment). Information such as the number of hospitals, PCI vs. non-PCI centers, location of hospitals (i.e., rural vs. academic centers), and primary vs. rescue PCI was not included in the analysis. Variables such as electrocardiographic parameters, heart rate, presenting blood pressure, Killip class, creatinine, chronic renal failure, acute kidney injury, time to presentation, weight, biomarker levels, and location of culprit lesions were not included in the analysis either. The F2 score and AUC of logistic regression in the training set, the test set, and the validation dataset were modest.
5. Conclusion
The strongest predictors of in-hospital mortality were age, gender, cardiogenic shock, AF, VF, third degree atrioventricular block, in-hospital bleeding, PCI during hospitalization, CABG during hospitalization, hypertension history, diabetes history, and myocardial infarction history. We used artificial intelligence methods to develop and externally validate several diagnostic models of in-hospital mortality in acute STEMI patients. The diagnostic model built by logistic regression was the best.
Abbreviations
AF: Atrial fibrillation
AMI: Acute myocardial infarction
AUC: Area under the receiver operating characteristic curve
FN: False negative
FP: False positive
MI: Myocardial infarction
NIS: National (Nationwide) Inpatient Sample
ROC: Receiver operating characteristic
STEMI: ST-elevation myocardial infarction
TN: True negative
TP: True positive
Data Availability
The data generated or analysed during this study are included within the article (and its supplementary information files).
Ethical Approval
This study was approved by the Ethics Committee of Beijing Anzhen Hospital Capital Medical University (2019039X). The author registered this study with the WHO International Clinical Trials Registry Platform (ICTRP) (registration number: ChiCTR1900027129; registered date: 1 November 2019) (https://www.chictr.org.cn/edit.aspx?pid=44888&htm=4). All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
Disclosure
A preprint has previously been published [14].
Conflicts of Interest
The author declares that there are no conflicts of interest.
Authors’ Contributions
YL contributed to the generation of the study data, analysed and interpreted the study data, drafted the manuscript, and revised the manuscript. YL was responsible for the overall content as guarantor. The author read and approved the final manuscript.
Supplementary Materials
The data are demographic and clinical characteristics of patients with acute STEMI. AGE, age; AF, atrial fibrillation; BLEEDING, in-hospital bleeding; CABG, underwent coronary artery bypass grafting during hospitalization; CSHOCK, cardiogenic shock; DIED, in-hospital mortality; DM, diabetes history; FEMALE, female; HBP, history of hypertension; OMI, history of myocardial infarction; PCI, percutaneous coronary intervention during hospitalization; VF, ventricular fibrillation. Data from the National (Nationwide) Inpatient Sample (NIS) from January 2016 to December 2018 in the United States were used for this study (https://www.hcup-us.ahrq.gov/). (Supplementary Materials)