Special Issue: Advanced Scientific Programming Methods for Health Informatics
Deep Learning-Based Arrhythmia Detection in Electrocardiograph
Abstract
This study explored the diagnosis of arrhythmia from the electrocardiograph (ECG) using a deep convolutional neural network (DCNN). ECGs were classified and recognized with the DCNN. The specificity (Spe), sensitivity (Sen), accuracy (Acc), and area under the curve (AUC) of the DCNN were evaluated on the Chinese Cardiovascular Disease Database (CCDD) and the Massachusetts Institute of Technology-Beth Israel Hospital (MIT-BIH) arrhythmia database. On the CCDD small sample set, the original model had an Acc of 82.78% and an AUC of 0.882, while the translated model had an Acc of 85.69% and an AUC of 0.893, a significant difference (P < 0.05). On the CCDD large sample set, the Acc of the original model and the translated model was 80.12% and 82.63%, respectively, also a significant difference (P < 0.05). On the MIT-BIH database, the Acc for the normal (N) heartbeat (HB) (99.38%) was higher than that for the atrial premature beat (APB) (87.45%) (P < 0.05). In conclusion, the DCNN could improve the Acc of ECG classification and recognition, so it is well suited to ECG signal classification.
1. Introduction
With the rapid development of society and the economy in recent years, the pace of life has accelerated, and growing everyday pressure has raised the prevalence of cardiovascular disease (CVD) year by year, seriously threatening human life and health. According to the Chinese Report on Cardiovascular Health and Disease (2019), the prevalence of CVD in China is on the rise, and about 330 million people currently have CVD [2, 3]. At present, CVD is the leading cause of death among urban and rural residents. Therefore, prevention and treatment of CVD are very important. Most CVD patients also have arrhythmia, and most of them suffer from chronic diseases. Therefore, effective detection and diagnosis of arrhythmia and early prevention of CVD are of great clinical significance.
The ECG is an important indicator of the periodic activity of the heart, and it is widely used in clinical practice. At present, the ECG remains an essential means of diagnosing arrhythmia. Traditional ECG analysis relies mainly on visual inspection by clinicians, drawing on personal experience and existing theoretical knowledge, which may lead to misjudgment and cause serious consequences for patients and hospitals [6, 7]. Automatic ECG analysis and diagnosis technology has emerged in response. With the development of computer technology and artificial intelligence (AI), automatic ECG classification methods have gradually matured, and many scholars have proposed automatic ECG classification algorithms. Xu et al. (2018) proposed a framework combining the improved frequency slice wavelet transform (FSWT) and a convolutional neural network (CNN), which could automatically and accurately distinguish atrial fibrillation from nonatrial fibrillation on the ECG. The research process could be divided into four parts: ECG signal acquisition, signal denoising, signal feature extraction, and ECG signal classification and recognition. The CNN is a discriminative deep learning model used for image and speech signal processing. It has shown excellent results in image and speech applications and has become a hotspot in the field of speech recognition [10, 11]. The CCDD contains 12-lead ECG records, feature annotation information, and heartbeat diagnosis results; more importantly, it introduces morphological feature parameters of great diagnostic value. The MIT-BIH database is the first widely used data set containing standard test material for evaluating the performance of arrhythmia detectors, and more than 500 sites worldwide have used it for that purpose and for basic research on cardiac dynamics.
Therefore, a deep learning method was adopted in this study to diagnose arrhythmia, and a CNN was used to classify and recognize ECG signals automatically. Experiments were conducted on the CCDD and the MIT-BIH arrhythmia database, and the experimental results were evaluated with the Spe, Sen, Acc, and AUC.
2. Materials and Methods
2.1. Selection of Experimental Data
The four internationally recognized standard ECG databases are the MIT-BIH arrhythmia database, the American Heart Association (AHA) database, the QT database, and the Common Standards for Electrocardiography (CSE) database. The MIT-BIH arrhythmia database includes 15 types of arrhythmias, including APB, atrial flutter (AFL), ventricular premature beat (VPB), right bundle branch block (RBBB), and left bundle branch block (LBBB). All data in this database can be downloaded and used for free, and it is the most widely used database. In addition, many existing ECG-related studies have used it as their experimental data and testing standard. However, these databases are, to varying degrees, insufficient in scale and representativeness. The CCDD is a 12-lead database with a sampling rate of 500 Hz, containing more than 170,000 records derived from hospital clinical data from different individuals. Therefore, data from both the CCDD and the MIT-BIH arrhythmia database were adopted for classification comparison in this study.
2.2. Deep Convolutional Neural Network
In general, a convolutional neural network (CNN) consists mainly of convolutional layers, pooling layers, and fully connected layers. A loss function is used to optimize training and update the parameters. The convolutional layer is the most critical basic structure in the CNN and performs the convolution operation on the data. For a single input sample vector, the output signal C of the convolutional layer is given by equation (1).
In equation (1), C is the output signal calculated by the convolutional layer, t represents the index of the layer, f represents the activation function that introduces nonlinearity into the layer, d represents the bias term of the nth feature map, S refers to the convolution kernel size, and hs represents the weight value of the nth feature map and the sth filter. The pooling layer is also called the subsampling layer. Pooling is divided into maximum pooling and average pooling, which are calculated with equations (2) and (3), respectively.
The fully connected layer connects each neuron in the output layer to the preceding input layer to integrate the features. Its calculation is given by equation (4).
In equation (4), Q refers to the fully connected layer, and the equation also involves the weight values of that layer.
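The three layer types described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `conv1d`, `pool1d`, and `fully_connected` are hypothetical, the default ReLU activation is an assumption (the text does not name the activation function), and no padding is applied.

```python
def relu(z):
    """Assumed activation; the text does not specify which one was used."""
    return max(0.0, z)


def conv1d(x, kernel, bias=0.0, stride=1, activation=relu):
    """Valid (unpadded) 1D convolution of one feature map.

    Each output is activation(sum_s kernel[s] * x[i*stride + s] + bias),
    i.e. the kernel slides over x without being flipped, as is usual in CNNs.
    """
    size = len(kernel)
    n_out = (len(x) - size) // stride + 1
    out = []
    for i in range(n_out):
        window = x[i * stride : i * stride + size]
        out.append(activation(sum(h * v for h, v in zip(kernel, window)) + bias))
    return out


def pool1d(x, size, stride, mode="max"):
    """Maximum or average pooling (subsampling) over sliding windows."""
    if mode not in ("max", "mean"):
        raise ValueError("mode must be 'max' or 'mean'")
    n_out = (len(x) - size) // stride + 1
    windows = [x[i * stride : i * stride + size] for i in range(n_out)]
    if mode == "max":
        return [max(w) for w in windows]
    return [sum(w) / size for w in windows]


def fully_connected(x, weights, biases):
    """Dense layer: one weight row and one bias per output neuron."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
```

For example, passing a 300-sample signal through `conv1d` with a width-5 kernel and stride 3, then through `pool1d` with size 3 and stride 2, yields 99 and then 49 values.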
2.3. Evaluation Indicators
Spe, Sen, Acc, and AUC were adopted to quantitatively evaluate the performance of the classification model in this study.
Spe, calculated with equation (5), is the proportion of correctly classified samples among all negative samples:
$$\mathrm{Spe} = \frac{TN}{TN + FP} \tag{5}$$
Sen, calculated with equation (6), is the proportion of correctly classified samples among all positive samples:
$$\mathrm{Sen} = \frac{TP}{TP + FN} \tag{6}$$
Acc, calculated with equation (7), is the proportion of correctly classified samples among all samples:
$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN} \tag{7}$$
In equations (5)–(7), Spe, Sen, and Acc represent the specificity, sensitivity, and accuracy, respectively; TN is the number of true negative samples, that is, actual negative samples classified as negative; FP is the number of false positive samples, that is, actual negative samples classified as positive; TP is the number of true positive samples, that is, actual positive samples classified as positive; and FN is the number of false negative samples, that is, actual positive samples classified as negative. The AUC ranges between 0 and 1. The closer the AUC is to 1, the better the model is.
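The indicators defined above can be computed directly from the four confusion counts. The sketch below is illustrative only: `confusion_metrics` and `auc_from_scores` are hypothetical helper names, and the AUC is computed with the standard pairwise (rank-based) definition, which is an assumption because the text does not say how the AUC was obtained.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Return specificity, sensitivity, and accuracy as fractions in [0, 1]."""
    return {
        "Spe": tn / (tn + fp),                   # correct among actual negatives
        "Sen": tp / (tp + fn),                   # correct among actual positives
        "Acc": (tp + tn) / (tp + tn + fp + fn),  # correct among all samples
    }


def auc_from_scores(pos_scores, neg_scores):
    """AUC as the probability that a positive sample outscores a negative one.

    Ties count as half a win. This O(n*m) loop is meant for small examples.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

Applied to the overall MIT-BIH counts reported in Section 3.3 (TP = 48,729, TN = 49,260, FP = 574, FN = 576), `confusion_metrics` gives a Sen, Spe, and Acc of about 98.83%, 98.85%, and 98.84%, matching the reported values.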
2.4. Experiment on CCDD
Increasing the number of training samples reduces the number of functions that can fit the training samples, making it easier to obtain the true decision function and improving the prediction Acc of the fitted function. However, the ECG database contained too few samples for the training set to be enlarged indefinitely, and normal ECGs outnumbered ECGs with any given disease, which affected the training of the decision function. In this study, the training set was enlarged by translating the starting position of the ECG. When ECGs were collected from the ECG database, the starting point of each ECG was uncertain and could be any position within a HB. ECGs taken from different starting points of the same person form completely different sequences. Thus, translating the position of the starting point could increase the number of training samples effectively.
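The translation scheme can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `translate_windows` is hypothetical; the window of 1,900 points, the 20-point shift, and the six copies are the values reported for the CCDD experiment in this section; the start offsets 0, 20, ..., 100 are an assumption, since the exact offsets are not stated.

```python
def translate_windows(record, window=1900, shift=20, copies=6):
    """Cut several shifted windows out of one multi-lead ECG record.

    record: list of leads, each a list of samples of equal length.
    Returns a list of `copies` records, each with the same leads but
    starting `shift` samples later than the previous one.
    """
    length = len(record[0])
    last_start = (copies - 1) * shift
    if last_start + window > length:
        raise ValueError("record too short for the requested windows")
    return [
        [lead[start : start + window] for lead in record]
        for start in range(0, copies * shift, shift)
    ]
```

With 2,000-point records the last window starts at sample 100 and ends exactly at sample 2,000, so 12,320 records become 12,320 × 6 = 73,920 training samples.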
Each downsampled record contained 2,000 points over 10 s. From each record, 1,900 points (9.5 s) were selected, and the starting point was translated by 20 points each time. In this way, the data were expanded sixfold, so the number of training samples increased from 12,320 to 73,920. The CCDD included more than 170,000 records in total: 12,320 ECG records were selected to train the deep network, 12,320 ECG records were selected as the small sample test set, and the remaining 150,000 were included in the large sample test set. Figure 1 shows the flowchart of the experiment on the CCDD. The 8-lead ECG was treated as a two-dimensional image. The dimension of the input layer is 8 × 1900. Convolutional layer A contains 32 feature surfaces, each of which uses a 1 × 5 convolution kernel with a step size of 3, and outputs 32 feature surfaces of size 8 × 662 (where 8 = 8 − 1 + 1, 662 = (1900 − 5)/3 + 1). Sampling layer A adopts a 1 × 3 sampling kernel with a step size of 2, generating 32 feature surfaces of size 8 × 330 (8 = 8/1, 330 = (662 − 3)/2 + 1). Convolutional layer B consists of 64 feature surfaces, using a 1 × 5 kernel with a step size of 3, and generates 64 feature surfaces of size 8 × 109. Sampling layer B uses a 1 × 3 kernel with a step size of 2 to generate 64 feature surfaces of size 8 × 54. Convolutional layer C uses a 1 × 4 convolution kernel with a step size of 3, resulting in 64 feature surfaces of size 8 × 17. Sampling layer C also uses a 1 × 3 kernel with a step size of 2, generating 64 feature surfaces of size 8 × 8. Finally, two fully connected layers are adopted for binary classification output with a multilayer perceptron.
2.5. Experiments on MIT-BIH Database
A total of 48 records are available in the MIT-BIH database, and each record contains two leads. Modified limb lead II was the most common lead, so its data were selected for the experiment. The 5 most frequent types of HBs were extracted from the 48 records: 73,509 N HBs, 7,130 VPBs, 2,546 APBs, 7,259 RBBBs, and 8,075 LBBBs. Half of the data were used to train the CNN, and the other half were used for testing. Figure 2 shows the flowchart of the experiment on the MIT-BIH database. The parameters are defined as follows: the dimension of the input layer is 1 × 300; convolutional layer A contains 32 feature surfaces, each using a 1 × 5 convolution kernel with a step size of 3, and outputs 32 feature surfaces of size 1 × 99 (1 = 1 − 1 + 1, 99 = (300 − 5)/3 + 1); sampling layer A adopts a 1 × 3 sampling kernel with a step size of 2, generating 32 feature surfaces of size 1 × 49 (1 = 1/1, 49 = (99 − 3)/2 + 1); convolutional layer B uses a 1 × 3 kernel with a step size of 3, generating 64 feature surfaces of size 1 × 16; sampling layer B uses a 1 × 3 kernel with a step size of 2, generating 64 feature surfaces of size 1 × 7. Finally, the feature surfaces are fully connected for five-class output with a multilayer perceptron.
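The feature-surface widths quoted above follow from the usual unpadded output-length rule, floor((n − k)/s) + 1, which is the convention implied by the worked values in the text. The sketch below traces the widths through a stack of layers; the names `out_len`, `trace_widths`, and `MIT_BIH_LAYERS` are hypothetical and introduced only for illustration.

```python
def out_len(n, kernel, stride):
    """Output width of an unpadded convolution or pooling layer."""
    return (n - kernel) // stride + 1


def trace_widths(n, layers):
    """Apply out_len through a list of (kernel, stride) pairs.

    Returns the width after every layer, in order.
    """
    widths = []
    for kernel, stride in layers:
        n = out_len(n, kernel, stride)
        widths.append(n)
    return widths


# (kernel width, step size) for conv A, sampling A, conv B, sampling B
MIT_BIH_LAYERS = [(5, 3), (3, 2), (3, 3), (3, 2)]

if __name__ == "__main__":
    print(trace_widths(300, MIT_BIH_LAYERS))
```

For the MIT-BIH network this reproduces the reported widths 99, 49, 16, and 7. Applied to the CCDD network from the 662-column stage onward, it reproduces 330, 109, 54, 17, and 8. Under the same floor convention, a 1,900-column input with a width-5, stride-3 kernel would give 632 columns rather than the reported 662, so the first CCDD layer may rely on a setting that is not stated in the text.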
2.6. Statistical Methods
Data were analyzed with SPSS 19.0 statistical software. Measurement data were expressed as mean ± standard deviation (x̄ ± s), and count data were expressed as percentages (%). Pairwise comparisons were performed with analysis of variance. Differences were considered statistically significant at P < 0.05.
3. Results
3.1. Experimental Results on CCDD
The ECGs of different leads were classified using the DCNN structure. The experimental results showed that the Acc values of leads I, II, V1, V2, V3, V4, V5, and V6 were 73.96%, 75.02%, 75.89%, 75.13%, 76.39%, 77.26%, 81.97%, and 79.53%, respectively (as shown in Figure 3); the Spe values were 82.78%, 83.10%, 82.24%, 82.21%, 83.12%, 85.67%, 88.07%, and 85.98%, respectively (as shown in Figure 4); and the Sen values were 54.79%, 57.77%, 61.59%, 57.66%, 59.68%, 63.15%, 71.04%, and 67.21%, respectively. Figure 3 reveals that the Acc of lead V5 was significantly higher than that of the other leads (P < 0.05). The AUC values of the 8 leads were 0.781, 0.793, 0.739, 0.737, 0.746, 0.831, 0.861, and 0.827, respectively (Figure 4).
The Acc, Sen, and Spe of lead V5 were higher than those of the other leads, so this lead could reflect more of the patient's basic information. When analyzing an ECG, doctors should pay attention to the information from this lead, as illustrated in Figure 5.
The results of all leads in the small sample set experiment were collected and analyzed. The average Acc, Spe, Sen, and AUC of all 8 leads were 82.78%, 88.93%, 74.23%, and 0.882, respectively, as illustrated in Figure 6. The average AUC of 0.882 was close to 1, indicating good classification performance.
3.2. Experimental Results of 8 Leads under Different Models
Before the starting point of the data was translated to enlarge the training set, the Acc of the DCNN on the small sample set was 82.78%. After the training data were expanded sixfold by translation, the Acc increased to 85.69% (Figure 7), a statistically significant difference (P < 0.05). Figure 8 shows that the AUC changed from 0.882 to 0.893, which was also a statistically significant difference (P < 0.05).
The experimental results on the large sample set showed that the Acc, Spe, and Sen of the original model without translation were 80.12%, 82.21%, and 78.97%, respectively, while those of the translated model were 82.63%, 84.16%, and 80.15%, respectively. Figure 9 suggests that the Acc of the translated model was significantly higher than that of the original model (P < 0.05).
3.3. Experimental Results on MIT-BIH Database
The features of single-beat ECGs were extracted layer by layer with the DCNN on the MIT-BIH database. The Sen, Spe, and Acc of N HB were 99.02%, 99.75%, and 99.38%, respectively; those of VPB were 98.67%, 95.72%, and 97.15%, respectively; those of APB were 94.82%, 81.39%, and 87.45%, respectively; those of RBBB were 98.39%, 98.72%, and 98.55%, respectively; and those of LBBB were 98.65%, 99.08%, and 98.87%, respectively. Figure 10 suggests that the classification Sen of N HB (99.02%) was significantly higher than that of APB (94.82%) (P < 0.05); the classification Spe of N HB (99.75%) was significantly higher than that of APB (81.39%) (P < 0.05); and the classification Acc of N HB was the highest (99.38%) and differed significantly from that of APB (87.45%) (P < 0.05).
The results for N HB and the four types of arrhythmias were pooled. There were 49,260 records in the training sample and 49,459 records in the test sample, and the numbers of true negative (TN), true positive (TP), false positive (FP), and false negative (FN) records were 49,260, 48,729, 574, and 576, respectively. Figure 11 suggests that the Sen, Spe, and Acc were 98.83%, 98.85%, and 98.84%, respectively.
3.4. Arrhythmia ECG Diagnosis
Figure 12 shows the ECG of a 37-year-old male patient with arrhythmia. The QRS complex in the figure was abnormal, and the cycle no longer repeated regularly, so the case was determined to be VPB with atrioventricular block (AVB).
4. Discussion
In recent years, there have been many studies on the automatic classification of ECG, but the existing algorithms are not effective in clinical practice, so the Acc of automatic ECG classification needs to be improved. CVDs now seriously threaten people's lives and health. With socioeconomic development and changes in lifestyle, the number of people suffering from CVDs has increased continuously. To prevent heart disease, it is very important to detect and identify the ECG effectively. Traditional ECG interpretation relies on manual identification and analysis, which may deviate from the actual situation and lacks real-time performance, with consequences for the patient [16, 17]. At present, there are relatively few studies on the application of deep learning to arrhythmia, and they lack a reliable theoretical foundation and data support. As a result, deep learning algorithms have not been widely used in the diagnosis of arrhythmia, and in-depth research on DCNN algorithms is needed.
In recent years, DCNNs have gradually become an important means of medical image analysis and have also been applied in physiological signal analysis. In this study, the DCNN method was adopted to classify and recognize the ECG. Experiments were conducted on the CCDD and the MIT-BIH database, and the Acc, Spe, and Sen were analyzed and compared to evaluate the results. The experiment on the CCDD showed that lead V5 had the highest Acc (81.97%) and could better reflect the patient's basic information; the average Acc, Spe, Sen, and AUC of all 8 leads on the small sample set were 82.78%, 88.93%, 74.23%, and 0.882, respectively; and the AUC was close to 1, indicating good classification performance. The Acc of the original model on the small sample set was 82.78%, the Acc of the model after translation increased to 85.69%, and the AUC increased from 0.882 to 0.893, a significant difference (P < 0.05). On the large sample set, the Acc of the original model without translation was 80.12%, which differed significantly from that of the translated model (82.63%) (P < 0.05). This was similar to the results of Xie et al. (2018). The experiment on the MIT-BIH database showed that the Acc of N HB was the highest (99.38%), whereas the classification Acc of APB was only 87.45% (P < 0.05), which may be because APB accounted for the smallest proportion of the beats. The average classification Sen, Spe, and Acc of all HBs were 98.83%, 98.85%, and 98.84%, respectively, which were similar to the results of Kamaleswaran et al. (2018), indicating that the DCNN could achieve high Acc in the classification and recognition of arrhythmia.
5. Conclusion
In this study, arrhythmia was classified and recognized from the ECG with the DCNN algorithm. Experiments were conducted on the CCDD and the MIT-BIH database, and the Acc, Spe, and Sen were analyzed to evaluate the results. The Acc reached 82.78% on the CCDD and 98.84% on the MIT-BIH database. However, because of the limited amount of arrhythmia data, only four types of arrhythmia were selected for analysis, which may introduce some bias into the results. Future work will consider a wider range of arrhythmia types and further explore the application of the DCNN algorithm to ECG. In short, the DCNN algorithm could improve the Acc of automatic ECG classification and provides a theoretical basis for its application in the automatic classification and recognition of ECG.
Data Availability
No data were used to support this study.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
References
M. Falter, W. Budts, K. Goetschalckx, V. Cornelissen, and R. Buys, “Accuracy of apple watch measurements for heart rate and energy expenditure in patients with cardiovascular disease: cross-sectional study,” JMIR Mhealth Uhealth, vol. 7, no. 3, Article ID e11889, 2019.
H. Tracer and Y. T. Jadotte, “Screening for cardiovascular disease risk with electrocardiography,” American Family Physician, vol. 98, no. 6, pp. 375-376, 2018.
Z. I. Attia, P. A. Noseworthy, F. Lopez-Jimenez et al., “An artificial intelligence-enabled ECG algorithm for the identification of patients with atrial fibrillation during sinus rhythm: a retrospective analysis of outcome prediction,” The Lancet, vol. 394, no. 10201, pp. 861–867, 2019.
P. Xie, G. Wang, C. Zhang et al., “Bidirectional recurrent neural network and convolutional neural network (BiRCNN) for ECG beat classification,” Annual International Conference of the IEEE Engineering in Medicine and Biology Society, vol. 2018, pp. 2555–2558, 2018.
R. Kamaleswaran, R. Mahajan, and O. Akbilgic, “A robust deep convolutional neural network for the classification of abnormal cardiac rhythm using single lead electrocardiograms of variable length,” Physiological Measurement, vol. 39, no. 3, Article ID 035006, 2018.