The era of IoT enabled smart cars is expected to begin in the near future as the convergence between cars and IT accelerates. Current smart cars can provide various information and services needed by the occupants via wearable devices or a Vehicle to Everything (V2X) communication environment. In order to provide such services, a system that analyzes wearable device information on the smart car platform needs to be designed. This paper studies a real time user recognition method using 2D images of the ECG (Electrocardiogram), a biometric signal that can be obtained from wearable devices. The ECG signal can be segmented either by the fiducial point method, which uses feature point detection, or by the nonfiducial point method, which segments the signal by time. In the proposed algorithm, a CNN based ensemble network was designed to improve performance by overcoming problems such as overfitting that occur in a single network. Test results show that 2D ECG image based user recognition accuracy improved by 1%~1.7% for the fiducial point method and by 0.9%~2% for the nonfiducial point method. For classes that show similar characteristics, where the recognition rate of a single network drops, the proposed method performed up to 13% better than the single network, which demonstrates its capability for use in a smart vehicle platform based user recognition system that requires reliability.

1. Introduction

The recent active convergence between IT and industry has resulted in various cutting edge IT being grafted onto cars, and this has led to progress in the development of smart cars with improved driver safety and convenience. A smart car applies IoT (Internet of Things) technology to maximize safety while pursuing driver convenience [1]. While the concept of car IoT has not yet been clearly fixed, it can be defined as a mobile connectivity based computing environment on which various customized user service industries can be created, in addition to promoting traffic safety and reducing congestion. Along with wearable devices, the smart car industry is part of the next generation growth industry and supports state of the art technology such as advanced driver-assistance systems (ADAS) and connectivity. In addition, as shown in Figure 1, the expansion of IoT and wearable technology, whose improvement is based on user information, has allowed related technology to be applied to smart cars. If user specific information gathered from the wearable device worn by the driver can be shared with the smart car, it will be possible to provide more convenience functions in the car. For example, in addition to executing commands via user voice recognition, a drowsiness detection service can alert the driver to fatigue and the risk of drowsy driving by checking biometric signals and vehicle information. In order to provide such services, a system to analyze wearable device information on the smart car platform needs to be designed [2].

In this paper, we study a deep learning based user recognition method in order to apply biometric signals obtained from wearable devices to a smart car platform. Deep learning extends the multilayer perceptron used in existing machine learning with many additional layers and outputs optimum values from this deepened structure. Deep Neural Networks [3], Convolutional Neural Networks [4], Recurrent Neural Networks [5], Deep Belief Networks [6], and Deep Q-Networks [7] are some representative examples. In particular, the Convolutional Neural Network, which has displayed outstanding performance in various areas including recognition, classification, and prediction, is a neural network designed to learn feature extraction filters and to classify automatically. Whereas existing methods require feature extraction to be designed by humans and use machine learning only for classification, the Convolutional Neural Network carries out both feature extraction and classification automatically.

Existing user recognition methods that analyze physical features like the face, fingerprint, and iris were perceived to be safe as well as convenient. However, these methods use anatomical and physical information that is displayed externally, and they require cooperation from the user and face to face user intervention. Additionally, since such information can be forged or altered, problems can occur if these methods are applied to services that require a high level of security like fintech, smart medicine, health-care, and smart car services. Research on user recognition using biometric signals like ECG, EEG (Electroencephalogram), and EMG (Electromyogram) is being carried out to solve such problems. The important advantages of user recognition using biometric signals are as follows. First, unlike methods that use externally displayed anatomical and physical information like the face and fingerprint, this approach uses signals that occur internally and are hard to forge. Second, the signals can be obtained from all living people. Third, they include information on the clinical and psychological state of the user. Finally, repeated recognition of the same user is easy to carry out since the waveform does not change much over time [8]. Since ECG is a biometric signal that is nonreactive and hard to alter, it is being studied as a next generation user recognition technology. As shown in Figure 2, the ECG is different for each individual according to factors like the location, size, and structure of the heart, age, and gender. A normal ECG signal contains specific feature points that can be used to extract features for user recognition. Therefore, an individual can be identified by features unique to that individual, and the user can be recognized by utilizing the ECG signal, which can be measured no matter where the individual is located.

In this paper, existing research on ECG based user recognition will be analyzed in Section 2 and the proposed deep learning based ensemble network using ECG data will be explained in Section 3. Section 4 will contain analysis of performance results for the proposed method and the conclusion and suggestion for future research will be presented in Section 5.

2. Related Work

Israel proposed an ECG based recognition system using temporal features. After noise was removed from the input ECG signal, the P, QRS, and T waveforms were detected, 15 features were extracted, and classification was carried out using Linear Discriminant Analysis. A test on 29 subjects showed a 100% subject recognition rate and an 82% ECG beat recognition rate [9]. Biel proposed a classification method that compares test data consisting of 30 extracted fiducial point features with the trained group data and selects the best matching class using a Soft Independent Modeling of Class Analogy (SIMCA) classifier. The test was carried out with a total of 20 subjects and a 100% recognition rate was achieved. Even though this method showed a high recognition rate during testing, it is unsuitable for a real time recognition system since it extracts many features during the classification stage [10]. Wang proposed a method that uses a combination of time, amplitude, and R waveform features from the ECG signal. Fiducial point detection was carried out on the preprocessed ECG signal in order to measure the time and amplitude distances, and morphological characteristics were extracted from the main components using Linear Discriminant Analysis. When classification was carried out after the two types of features were combined, a 100% subject recognition rate and a 98.9% recognition rate were achieved for 13 subjects. While a relatively high recognition result was obtained, it came from a small sample of subjects [11]. Shen compared the overall recognition rate between past data and data obtained by measuring the same 23 subjects again. Template matching, which compared the morphological difference of the ECG signal by extracting 17 fiducial features, showed recognition rates of 98.5% and 87.7%. Even though the matching result decreased, the recognition system using the ECG signal was found to display high performance even after some time has passed if calibration is carried out [12].

Chan proposed a feature extraction framework using a distance measurement set that includes the wavelet transform distance. Data was obtained from 50 subjects through electrodes placed between the fingers. By applying the wavelet transform distance to user recognition, an 89% recognition rate was achieved. Even though the test was carried out on more subjects than previous tests, it showed a low recognition rate [13]. Chiu proposed an individual identification method using the wavelet transform and a Euclidean classifier. After the ECG signal was acquired from 45 subjects, features were extracted using the wavelet transform and the Euclidean distance was utilized [14]. While the process was relatively simple, it showed a low recognition result of 90.5%. Loong used Linear Predictive Coding (LPC) parameters for ECG signal classification [15].

As shown in Figure 3, among studies that applied the ECG signal to deep learning, Rajpurkar developed a new network composed of 34 layers that predicts 12 types of arrhythmia from a single lead ECG signal [16], and Chauhan developed a neural network structure that stacks multiple LSTM layers in order to detect irregular ECG signals [17]. Rahha studied a neural network structure consisting of a feature representation layer and a Softmax regression layer. In order to remove accumulated noise from the ECG signal, the feature representation layer was trained with autoencoders, an unsupervised learning method, and the ECG signal was then classified through the Softmax regression layer [18]. As shown in Table 1, recent research on user recognition methods using deep learning has shown excellent performance in various technology fields such as recognition, classification, and prediction.

However, an existing neural network composed of a single network has limited performance, since one network cannot learn data that is difficult to recognize. If training is forced to fit even such difficult data, performance degrades due to the overfitting that occurs during the training process [19]. This paper proposes a deep learning based ensemble network that relearns the good features output from multiple networks for data that is difficult for a single network to learn. The proposed algorithm designs an ensemble network in order to solve the recognition rate reduction caused by similar features between classes and the overfitting that occurs in an existing single network. Relearning is carried out by blending the good features that are output from each single network. In addition, the feasibility of use in a wearable device based real time recognition system is verified by applying 2D ECG images to the CNN model, which has shown superior performance for image recognition and classification. The ECG data used in testing was obtained from 18 test subjects in the MIT-BIH Normal Sinus Rhythm Database (NSRDB). 80% of the data was used as training data and the remaining 20% was used as test data. For 2D ECG images segmented by the fiducial point method, the 3 networks designed for the ensemble achieved recognition rates of 97.3%, 97.9%, and 97.2% when used as single networks, while the ensemble network achieved 98.9%, an increase in performance of up to 1.7%. For the nonfiducial point method, the single networks displayed recognition rates of 97.2%, 96.6%, and 97.7% while the ensemble network achieved 98.6%. The best results were achieved with the ensemble network for both the fiducial point method and the nonfiducial point method.
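The blending step described above can be illustrated with a minimal Python sketch. It assumes that "blending" means concatenating the feature vectors produced by each single network for one sample, and that the relearning stage ends in a dense layer followed by Softmax. The function names, the feature values, and the weights are hypothetical; in practice the weights would be learned during relearning rather than fixed by hand.

```python
import math


def blend_features(per_network_features):
    """Concatenate the feature vectors that each single network produced for one sample."""
    blended = []
    for features in per_network_features:
        blended.extend(features)
    return blended


def softmax(logits):
    """Numerically stable Softmax over a list of class scores."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(blended, weights, biases):
    """One dense layer plus Softmax on the blended vector; weights has one row per class."""
    logits = [sum(w * x for w, x in zip(row, blended)) + b
              for row, b in zip(weights, biases)]
    probs = softmax(logits)
    label = max(range(len(probs)), key=lambda k: probs[k])
    return label, probs


# Hypothetical features from three single networks for the same ECG image.
features = [[0.2, 0.9], [0.1], [0.7, 0.3, 0.5]]
blended = blend_features(features)

# Hypothetical (untrained) weights for a two-class toy problem.
weights = [[0.0] * 6, [1.0] * 6]
biases = [0.0, 0.0]
label, probs = classify(blended, weights, biases)
print(blended, label, probs)
```

Concatenation keeps every network's view of the sample, so the final classifier can rely on whichever network separates two similar classes best.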

3. Proposed Algorithms for Ensemble Networks Based on ECG Data

Figure 4 shows the flowchart for the ensemble network based user recognition system using the ECG signal that is proposed in this paper. First, noise that occurs during ECG signal acquisition is removed through frequency filtering for DB formation. After the R waveform peak is detected using the Pan&Tompkins method, the signal is segmented into periods that each include the P, QRS, and T waveforms. Because baseline fluctuation noise caused by the respiration of the person being measured is not removed by the frequency filter and median filter, the partial baseline is estimated using first order regression analysis, and the ECG image is then acquired by projecting the signal onto 2D space.
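The R peak detection and per-period segmentation in this flow can be sketched in Python as follows. This is a simplified illustration in the spirit of the Pan&Tompkins method (derivative, squaring, moving window integration, thresholding), not the full algorithm: it assumes the signal is already band pass filtered, uses a simple first difference, and replaces the adaptive thresholds with a fixed fraction of the maximum. The function names, the window lengths, and the synthetic test signal are assumptions made for illustration.

```python
def detect_r_peaks(signal, fs, window_s=0.15, refractory_s=0.2, threshold_ratio=0.5):
    """Simplified Pan&Tompkins style R peak detector on a filtered ECG signal."""
    # 1) First difference emphasises the steep slopes of the QRS complex.
    diff = [signal[i + 1] - signal[i] for i in range(len(signal) - 1)]
    # 2) Squaring makes all values positive and amplifies large slopes.
    squared = [d * d for d in diff]
    # 3) Moving window integration over the last `win` samples.
    win = max(1, int(window_s * fs))
    integrated = []
    running = 0.0
    for i, value in enumerate(squared):
        running += value
        if i >= win:
            running -= squared[i - win]
        integrated.append(running / win)
    # 4) Fixed threshold (the original algorithm adapts it over time).
    threshold = threshold_ratio * max(integrated)
    refractory = int(refractory_s * fs)
    peaks = []
    i, n = 0, len(integrated)
    while i < n:
        if integrated[i] > threshold:
            j = i
            while j < n and integrated[j] > threshold:
                j += 1
            # The R peak is the largest sample near the region above the threshold.
            start = max(0, i - win)
            stop = min(len(signal), j + 1)
            r = max(range(start, stop), key=lambda k: signal[k])
            if not peaks or r - peaks[-1] > refractory:
                peaks.append(r)
            i = j
        else:
            i += 1
    return peaks


def segment_beats(signal, r_peaks, before, after):
    """Cut one fixed length segment around each R peak, skipping peaks near the edges."""
    beats = []
    for r in r_peaks:
        if r - before >= 0 and r + after <= len(signal):
            beats.append(signal[r - before:r + after])
    return beats


# Synthetic 4 s signal at 128 Hz with three triangular "R waves" (illustrative only).
fs = 128
ecg = [0.0] * 512
for p in (100, 228, 356):
    ecg[p - 1], ecg[p], ecg[p + 1] = 0.5, 1.0, 0.5

r_peaks = detect_r_peaks(ecg, fs)
beats = segment_beats(ecg, r_peaks, before=40, after=60)
print(r_peaks, len(beats))
```

The `before` and `after` lengths decide how much of the P and T waveforms each segment contains, so they would need to be chosen from the actual heart rate of the recordings.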

3.1. ECG Data Preprocessing

Preprocessing consisting of noise removal and calibration is carried out to remove noise and distortion before the 1D ECG signal is transformed into a 2D image. ECG signal noise is removed via frequency filtering, R waveform detection, and median filtering, as shown in Figure 5 [20]. A band pass filter was used for frequency filtering in order to remove the power line noise, muscle noise, and electrode contact noise that occur during ECG measurement. The R waveform peak was detected from the band pass filtered ECG signal using the Pan&Tompkins algorithm. Using the R waveform peak as a fiducial, noise was removed by applying the median filter to the remaining interval, excluding the QRS complex interval, which includes the physical feature information unique to each individual.
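The median filtering step can be sketched as follows, assuming that the QRS complex interval is approximated by a fixed number of samples on each side of every detected R peak. The half width, the kernel size, and the function name are illustrative assumptions rather than values taken from this paper.

```python
from statistics import median


def median_filter_outside_qrs(signal, r_peaks, qrs_half_width, kernel=5):
    """Median filter every sample except those within qrs_half_width of an R peak."""
    protected = set()
    for r in r_peaks:
        start = max(0, r - qrs_half_width)
        stop = min(len(signal), r + qrs_half_width + 1)
        protected.update(range(start, stop))

    half = kernel // 2
    filtered = list(signal)
    for i in range(len(signal)):
        if i in protected:
            continue  # keep the QRS complex untouched
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        filtered[i] = median(signal[lo:hi])
    return filtered


# Toy signal: an impulsive noise spike at index 3 and an R peak at index 12.
noisy = [0.0] * 20
noisy[3] = 5.0
noisy[12] = 9.0
cleaned = median_filter_outside_qrs(noisy, r_peaks=[12], qrs_half_width=2)
print(cleaned[3], cleaned[12])
```

Protecting the QRS interval matters because a median filter would otherwise flatten the narrow R wave, which carries much of the individual specific information.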

However, a calibration process is needed since baseline fluctuation noise caused by user respiration is not removed by the frequency filter and median filter. To remove the baseline fluctuation noise, the partial baseline was estimated using first order regression analysis. The regression in (1) is calculated from two values: the median value between the ECG T waveform and P waveform, and the median value between the ECG P waveform and Q waveform.

Finally, the 1D ECG signal is transformed into a 2D image through projection and a linear equation for application to the 2D-CNN. The 1D ECG signal is projected in (2) using its amplitude as it varies over time, which maps the ECG amplitude value at time t to an ECG pixel position. Since the 1D ECG signal is a sequence of discrete voltage values in the time domain, data loss occurs between pixels when it is projected onto 2D space. Therefore, after data loss was minimized through the linear equation, the fiducial point based ECG signal was classified by projecting the 1D ECG signal onto 2D image space using (3), as shown in Figure 6(a).

Nonfiducial point based ECG classification is carried out with a fixed length sliding window, as shown in Figure 6(b). For t test subjects, n signals to be classified for each test subject, and signal length d, data of size t × n × d can be obtained. Since the entire ECG for one person is longer than d, it can be divided into many ECG signals, each indexed by i, with the samples inside a signal indexed by j. Since the nonfiducial point based ECG uses all acquired signals, it was possible to construct a large amount of data compared to the fiducial point based ECG classification, which only uses the data between the P, QRS, and T waveforms.
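The three operations in this subsection can be sketched in Python as follows. The sketch is an illustration under stated assumptions and does not reproduce (1) to (3): the baseline is a least squares line through baseline anchor points supplied by the caller, the projection scales the samples to pixel coordinates and joins neighbouring samples by linear interpolation, and the nonfiducial segmentation is a plain sliding window. All function names, the image size, and the step length are hypothetical.

```python
def fit_line(xs, ys):
    """Least squares first order fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def remove_linear_baseline(beat, anchors):
    """Subtract a line fitted through (index, value) baseline anchors from one beat."""
    slope, intercept = fit_line([a[0] for a in anchors], [a[1] for a in anchors])
    return [v - (slope * i + intercept) for i, v in enumerate(beat)]


def project_to_image(beat, height, width):
    """Rasterise a 1D beat onto a height x width binary image (row 0 is the top)."""
    low, high = min(beat), max(beat)
    span = (high - low) or 1.0
    pixels = []
    for i, v in enumerate(beat):
        col = round(i * (width - 1) / (len(beat) - 1))
        row = round((high - v) / span * (height - 1))
        pixels.append((row, col))
    image = [[0] * width for _ in range(height)]
    # Join consecutive samples so that no gap is left between pixels.
    for (r0, c0), (r1, c1) in zip(pixels, pixels[1:]):
        steps = max(abs(r1 - r0), abs(c1 - c0), 1)
        for s in range(steps + 1):
            r = round(r0 + (r1 - r0) * s / steps)
            c = round(c0 + (c1 - c0) * s / steps)
            image[r][c] = 1
    return image


def sliding_windows(signal, d, step):
    """Fixed length windows of d samples for the nonfiducial point method."""
    return [signal[s:s + d] for s in range(0, len(signal) - d + 1, step)]


drifting = [0.1 * i for i in range(10)]            # pure baseline drift
flat = remove_linear_baseline(drifting, [(0, 0.0), (9, 0.9)])
image = project_to_image([0.0, 1.0, 0.0], height=3, width=3)
windows = sliding_windows(list(range(10)), d=4, step=2)
print(flat, image, windows)
```

With a step smaller than d the windows overlap, which is one way the nonfiducial point method can yield far more samples than the beat based segmentation.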

3.2. Ensemble Architecture Based on Deep Convolutional Neural Networks

Figure 7 shows the entire structure of the ensemble based CNN model proposed in this paper. A CNN learns by extracting features autonomously, applying fixed size filters to the information that is continuously input. Rather than using the input information as is for feature extraction, only meaningful specific information is extracted and learned, thereby reducing the volume of complex computation that occurs during the training process. Many features can be extracted by varying the filter size, and the features extracted using one or many filters make up one layer. In addition, data augmentation is applied in order to increase the limited amount of data, since recognition performance improves as the amount of data used in learning increases. Since data loss and deformation are severe when left right reversal, scaling, and color alteration are applied to the 2D ECG image, the amount of data was instead increased by translating the image up, down, left, and right with the original image as the fiducial, as shown in Figure 8.
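The translation based augmentation can be sketched as follows. The shift size, the fill value for vacated pixels, and the function names are assumptions for illustration; the paper does not state how far the images were shifted.

```python
def translate(image, dy, dx, fill=0):
    """Shift a 2D image by dy rows (down is positive) and dx columns (right is positive)."""
    height, width = len(image), len(image[0])
    shifted = [[fill] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            nr, nc = r + dy, c + dx
            if 0 <= nr < height and 0 <= nc < width:
                shifted[nr][nc] = image[r][c]
    return shifted


def augment_by_translation(image, shift):
    """Return the original image plus copies shifted up, down, left, and right."""
    offsets = [(0, 0), (-shift, 0), (shift, 0), (0, -shift), (0, shift)]
    return [translate(image, dy, dx) for dy, dx in offsets]


# Toy 3 x 3 image with a single foreground pixel in the centre.
toy = [[0, 0, 0],
       [0, 1, 0],
       [0, 0, 0]]
augmented = augment_by_translation(toy, shift=1)
for variant in augmented:
    print(variant)
```

Translation keeps the waveform shape and its time direction intact, which is why it is safer for ECG images than flipping or scaling.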

The recently introduced ResNet [21] and DenseNet [22] were proposed in order to solve the problem in which the initial features of the image are lost at the final output stage as the depth of the CNN model is increased [23]. However, unlike a regular image that contains complex patterns, the ECG image consists of a simple background and waveforms. Therefore, there is no need to design deep layer stages for the CNN model; on the contrary, performance can degrade as the number of parameters increases. The proposed ensemble network therefore consists of 3 single CNN models that differ in the number of layers and parameter values. The good features extracted from each network are blended into one data set and used for training. Nonlinear activation functions like Rectified Linear Units (ReLU), Leaky Rectified Linear Units (LReLU), and Exponential Linear Units (ELU) are applied to the weighted kernel outputs in the CNN model. Since ReLU, the activation most widely used in CNNs, converts negative numbers to 0, ELU, which allows the use of negative numbers, was used. The cost function measures the learning state of the CNN model and expresses the difference between the expected output and the output for the sample used for training. Adam, a gradient descent based optimization function, was used to minimize the cost function. ConvNet-1 and ConvNet-2, two of the single networks used in the ensemble network, are made up of identical layer stages consisting of 3 Convolution Layers, 3 Max-Pooling Layers, and 3 Fully Connected Layers, and they are differentiated by learning rates of 0.001 and 0.01, respectively. ConvNet-3 consists of 2 Convolution Layers, 2 Max-Pooling Layers, and 3 Fully Connected Layers, and 0.001 was used as its learning rate.
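The difference between the three activation functions named above can be seen in a short Python sketch using their standard definitions. The slope of 0.01 for the leaky variant and alpha of 1.0 for ELU are common defaults assumed here, not values reported in this paper.

```python
import math


def relu(x):
    """Rectified Linear Unit: negative inputs become 0."""
    return max(0.0, x)


def leaky_relu(x, slope=0.01):
    """Leaky ReLU: negative inputs are scaled by a small slope."""
    return x if x > 0 else slope * x


def elu(x, alpha=1.0):
    """Exponential Linear Unit: negative inputs saturate smoothly towards -alpha."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


for value in (-2.0, -0.5, 0.0, 3.0):
    print(value, relu(value), leaky_relu(value), elu(value))
```

For an input of -2.0, ReLU returns 0 while ELU returns about -0.865, so negative activations still carry information and a nonzero gradient during training.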

4. Experimental Results

The ECG database was built from the MIT-BIH Normal Sinus Rhythm Database (NSRDB), an ECG DB sampled at 128 Hz, using recordings from 18 subjects consisting of 5 males (ages 26~45) and 13 females (ages 20~50). In this paper, 4,500 training images, 2,700 validation images, and 1,800 test images were used for the fiducial point based ECG image data, and 90,000 training images, 54,000 validation images, and 36,000 test images were used for the nonfiducial point based ECG image data. For performance analysis of each class, Precision, Recall, F1-score, and Accuracy, the performance evaluation standards used in pattern recognition, were used. Precision, Recall, F1-score, and Accuracy for each class were calculated by applying TP (True Positive), TN (True Negative), FP (False Positive), and FN (False Negative) in (7), (8), (9), and (10). Figures 9 and 10 show the confusion matrices of user recognition performance for the fiducial point and nonfiducial point based 2D ECG images applied to the ensemble network. They were obtained from the values estimated by the trained neural network model for 1,800 fiducial point based ECG test images and 36,000 nonfiducial point based ECG test images.

The columns display the Ground Truth and the rows display the number of ECG data for the proposed method. The following example will help in understanding the confusion matrix. In the 15th row in Figure 9(b), which is O-class, a total of 100 ECG data are divided into 89 ECG data (True Positive) that were recognized accurately as O-class and 11 ECG data (False Negative) that were falsely recognized as F-class and Q-class. In addition, 1,708 non-O-class ECG data (True Negative) were accurately recognized as classes other than O-class, and 2 ECG data from B-class and C-class (False Positive) were falsely recognized as O-class.
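The per-class evaluation can be sketched in Python with the standard definitions, which (7) to (10) are assumed to follow: Precision = TP / (TP + FP), Recall = TP / (TP + FN), F1-score = 2 · Precision · Recall / (Precision + Recall), and Accuracy = (TP + TN) / (TP + TN + FP + FN). The matrix layout (one row per true class, one column per predicted class), the function name, and the 3 class toy matrix are assumptions for illustration.

```python
def per_class_metrics(confusion, k):
    """Precision, recall, F1-score, and accuracy for class k.

    confusion[i][j] is assumed to count samples of true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp                         # class k predicted as something else
    fp = sum(confusion[i][k] for i in range(n)) - tp    # other classes predicted as k
    tn = total - tp - fn - fp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / total
    return precision, recall, f1, accuracy


# Hypothetical 3 class confusion matrix (10 test samples per class).
toy_confusion = [[8, 1, 1],
                 [0, 9, 1],
                 [2, 0, 8]]
precision, recall, f1, accuracy = per_class_metrics(toy_confusion, 0)
print(precision, recall, f1, accuracy)
```

Applied to the O-class counts quoted above (TP = 89, FN = 11, FP = 2), the same definitions give a recall of 89/100 = 0.89 and a precision of 89/91, or about 0.978.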

Using this analysis method, Table 2 shows the user recognition performance obtained with the fiducial point based 2D ECG image. The user recognition accuracy of the ensemble network proposed in this paper improved by 1%~1.7% compared to single network based user recognition. Table 3 shows the user recognition performance obtained with the nonfiducial point based 2D ECG image. As with the fiducial point based result, the user recognition accuracy of the ensemble network improved by 0.9%~2% compared to single network based user recognition.
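The improvement ranges can be checked with a few lines of Python using the recognition rates reported in this paper for the three single networks and the ensemble network. The gains are differences in percentage points; the function name is an illustrative choice.

```python
def improvement_range(single_rates, ensemble_rate):
    """Smallest and largest gain of the ensemble over the single networks."""
    gains = [round(ensemble_rate - rate, 1) for rate in single_rates]
    return min(gains), max(gains)


# Recognition rates (%) of the three single networks and of the ensemble network.
fiducial = improvement_range([97.3, 97.9, 97.2], 98.9)
nonfiducial = improvement_range([97.2, 96.6, 97.7], 98.6)
print(fiducial)      # (1.0, 1.7)
print(nonfiducial)   # (0.9, 2.0)
```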

In particular, since N-class and O-class ECG data displayed similar waveforms, with the smallest differences in P, QRS, and T waveform distance values when compared to other classes, a low recognition rate was shown when a single network was used. However, the highest recognition rate of 96.5% was shown when the ensemble network model was applied, as shown in Figure 11. Compared to the single networks, this is a difference in recognition performance of 4.5%~13%.

5. Conclusions

In this paper, an ECG data based ensemble network method for user recognition in a smart car platform environment was proposed. Recognition performance was verified by applying 2D ECG images to the CNN model, which has displayed excellent performance in the image recognition and classification fields. The training data was segmented with the fiducial point method, which uses feature point detection, and the nonfiducial point method, which segments the signal by time, and it was transformed into 2D images after noise removal and signal segmentation for each period. In addition, an ensemble network that retrains with the good feature data extracted from each single network was designed in order to process data that is difficult to train in a single network, and it was applied to the user recognition system. When the performance of the proposed method was analyzed, the accuracy of ensemble network based user recognition using 2D fiducial point ECG images improved by 1%~1.7% compared to single network based user recognition. For 2D nonfiducial point ECG images, ensemble network based user recognition improved by 0.9%~2% compared to single network based user recognition. In addition, by using the proposed ensemble network to solve the problem inherent in single networks, which display a low recognition rate because they cannot distinguish similar waveforms, the potential for application in a smart car platform that requires high reliability was verified. In the future, we plan to conduct research on user recognition for real life applications by acquiring multiple proprietary DBs that reflect state changes such as the ECG measurement location and the user's condition during workouts, during sleep, and after drinking. In addition, we will compare and analyze the performance of 1 dimensional ECG signals applied to the same network without converting them into 2D images. We also plan to improve the recognition performance by designing and combining various networks based on user change.

Data Availability

The Normal Sinus Rhythm DB data used to support the findings of this study have been deposited in the MIT-BIH repository PhysioNet.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.


Acknowledgments

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (No. 2018R1A2B6001984) and by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. 2017R1A6A1A03015496).