Abstract

This paper presents a novel method that uses the electrocardiogram (ECG) signal as a biometric for individual identification. The ECG characterization is performed using an automated approach consisting of analytical and appearance methods. The analytical method extracts the fiducial features from heartbeats, while the appearance method extracts the morphological features from the ECG trace. We linearly project the extracted features into a subspace of lower dimension using an orthogonal basis that represents the most significant features for distinguishing heartbeats among the subjects. The results demonstrate that the proposed characterization of the ECG signal and the subsequently derived eigenbeat features are insensitive to signal variations and nonsignal artifacts. The proposed system achieves best identification rates of 85.7% for the subjects of the MIT-BIH arrhythmia database and 92.49% for the healthy subjects of our IIT (BHU) database. These results are significantly better than the classification accuracies of 79.55% and 84.9% obtained using a support vector machine on the tested subjects of the MIT-BIH arrhythmia database and our IIT (BHU) database, respectively.

1. Introduction

Many body parts, signaling methods, and behavioral characteristics have been suggested and used for biometrics. These include facial characteristics, digital fingerprints, retinal scans, gait, voice patterns, and handwritten signatures [1]. Biometric identifiers are distinctive to an individual and are considered more reliable and capable than traditional possession-based or knowledge-based technologies in differentiating between an authorized and a fraudulent person. Biometric technology is becoming increasingly popular, but it raises concerns that include the reproduction of falsified credentials from an original biometric sample, the removal of biometric features to prevent the establishment of a true identity, and the presentation of an original biometric sample by an illegitimate subject. Conventional biometrics are not robust enough against falsification because their characteristics are neither confidential nor secret to an individual. For example, faces are publicly visible, iris patterns can be observed wherever people look, fingerprints are left on everything people touch, voices can be recorded, and handwritten signatures can be falsely replicated [2].

To meet the needs of a practical biometric system, such as a low error rate to achieve a high security level and the detection of fake biometric samples using liveness testing, the search for new biometric modalities is of great interest [3]. In recent studies, the electrocardiogram (ECG) has been proposed as a novel biometric for human recognition [4–14]. A review of most of the methods that have been applied to use the ECG for biometric recognition is given in [15]. The ECG is a physiological low-frequency signal that has a one-dimensional data representation. It measures the electrical manifestation of the ionic potential of the human heart. Differences among individuals in heart structure, such as chest geometry, size, and position, manifest as a unique rhythm in the heartbeats. The ECG has an intrinsic real-time vitality characteristic that signifies life signs, which is evidence that the biometric sample is being collected from a live and legitimate individual. In addition, the ECG as a biometric offers other advantages to an individual identification system. The ECG information is intrinsic to an individual, so it is highly secure and confidential; it is hard to steal and impossible to mimic. The ECG is universally present among all living persons. The ECG can also be combined with different and independent biometric modalities as supplementary information in a multimodal system to support secure and accurate individual identification [16].

The challenge of using the ECG signal as a biometric is the variability of an individual’s heartbeats at different instances of time [17]. These variations may result from muscle flexure and changes in the mental or emotional state of an individual. Signal variation may also be caused by changes in sensor position and long-term baseline shifts. This paper presents a novel method to analyze heartbeat features for individual identification that is largely insensitive to signal variations and nonsignal artifacts. We perform the ECG characterization using a hybrid framework consisting of analytical and appearance methods. The analytical method extracts the fiducial features from heartbeats, including temporal features computed from the dominant fiducials, while the appearance method extracts the morphological features from the heartbeats. The advantage of the analytical features is that they capture local information of the heartbeats. The drawback of the analytical method is that some heartbeats in the ECG signal cannot be analyzed robustly and are left out. To overcome this limitation, the appearance method of ECG analysis is also considered. It captures heartbeat features from the ECG signal in a holistic manner so that its complete information is preserved. The analytical features can be used as supplementary information and combined with the morphological features for improved classification accuracy. To overcome the effects of sudden changes in the signal, the ECG characterization process selects a sequence of heartbeats such that a beat is analyzed only if its predecessor and successor beats are segmented correctly.

The analysis of the ECG using the appearance method utilizes a two-stage procedure for the extraction of heartbeat morphological features. In the first stage, morphological features are extracted from the segmented heartbeats that have shown consistent features. In the second stage, fixed-interval morphological features are extracted from the scaled signal, where the scaling is done using the Pareto technique [18]. The purpose of scaling the ECG signal is to minimize the effect of the noise components that contaminate it. To make the features insensitive to nonsignal artifacts, they are linearly projected into a lower-dimensional feature subspace. The projection uses principal component analysis (PCA), also known as the Karhunen-Loeve method [19], for dimensionality reduction and yields projection directions that maximize the scatter across all traces of an individual ECG. The identity classification is performed using a nearest neighbor classifier. The classification results are validated using a subset of the MIT-BIH arrhythmia database [20] and our IIT (BHU) database, which contains the ECG recordings of healthy subjects. In the sections that follow, we report the dependency of classification performance on the number of principal components of the projected samples in both databases. We found substantially better results at lower dimensions of the feature subspace than the overall classification performance of a strong classifier such as the support vector machine (SVM) [21].

In summary, this paper makes the following contributions.
(1) It examines the feasibility of the ECG signal for biometric application. A hybrid framework consisting of analytical and appearance methods is presented that is robust enough to identify both healthy and arrhythmia subjects.
(2) An approach for automated identification of individuals based on their ECG using heartbeat segmentation is developed. Our approach is insensitive to signal variations and muscle flexure. It performs ECG characterization using fiducial features (i.e., temporal features) and the morphological features of the heartbeats.
(3) For better discrimination among the subjects, the method utilizes the linear projection of features into a low-dimensional subspace using the information of the most significant features. It minimizes the effect of noise and nonsignal artifacts present in the data and reduces the complexity of handling a high-dimensional feature set.
(4) The performance of the proposed ECG biometric method is benchmarked on the publicly available MIT-BIH arrhythmia database [20] and the one-lead ECG database of IIT (BHU). The IIT (BHU) database contains multisession recordings of healthy subjects acquired over a period of 15 months, with each record prepared robustly.

The rest of the paper is organized as follows. Prior work and the state of the art in using the ECG signal for biometric applications are reviewed in Section 2. Section 3 presents the method of characterizing the ECG signal for data representation and feature selection. A schematic description of the individual identification system using the ECG signal is presented in Section 4. The experimental results that demonstrate the efficacy of the proposed biometric system on the publicly available database and our database are presented in Section 5. Some conclusions are drawn in Section 6.

2. Prior Work

The use of the ECG as a candidate biometric has been previously reported by other investigators (e.g., [4–12]) using a variety of features to represent the heartbeats. Biel et al. [4] were among the first to demonstrate the biometric applications of the ECG signal. They conducted an identity recognition experiment on ECG features prepared using the analytical method and discriminated 20 subjects using a multivariate classification technique. The method generated the ECG attributes using specific equipment named SIMCA from multilead ECG traces within a single session, which limits the scope of applications.

Kyoso and Uchiyama [5] utilized the temporal durations of the ECG waveforms, such as P wave duration, PQ interval, QRS duration, and QT interval, to identify a person from a registered ECG. These features were identified on the pulses by applying a threshold to the second-order derivative. The subject with the smallest Mahalanobis distance computed on each pair of the four feature parameters was selected as the output. The classifier achieved an accuracy of 94.2% at best after combining the QRS duration and QT interval features in an experiment on nine subjects. Shen et al. [6] described a method for verifying individuals using temporal and appearance features of the heartbeats. The features extracted from the QRS complex were stable with changes in heart rate, whereas the QT interval varied with the heart rate. Template matching and decision-based neural network approaches were used for identity verification and achieved accuracies of 95% and 80%, respectively. For a population of size 20, the performance of the system rose to 100% after the classifiers were combined.

Israel et al. [7] showed experimentally that the ECG signal of an individual exhibits unique patterns. They analyzed the ECG signal for quality and proposed quantifiable metrics for classifying heartbeats among individuals. The method used fifteen intrabeat features extracted from each heartbeat and performed the classification using linear discriminant analysis. The experiment concluded that the extracted features were independent of the electrode positions, invariant to the individual’s state of anxiety, and unique to an individual. Although the method facilitated automatic recognition, the reported identification accuracy was low because the extracted features did not represent the heartbeats sufficiently. Wang et al. [8] proposed a two-step fiducial detection framework that incorporated analytical and appearance features of the heartbeats. They presented a data integration scheme that combined the fiducial features as complementary characteristics with the appearance features. The feature extraction method used a combination of autocorrelation (AC) and discrete cosine transform (DCT). The AC/DCT method achieved a recognition accuracy between 94.47% and 97.8%.

Lourenço et al. [9] explored the feasibility of a nonintrusive ECG biometric system in which the ECG signals were acquired from a minimally intrusive one-lead setup at the fingers. They performed time-domain signal processing on the noisy data, comprising filtering, peak detection, and heartbeat segmentation followed by the detection of features from the locations of the P-QRS-T complexes. The experiment resulted in a 94.3% recognition rate in subject identification and an equal error rate (EER) of 13% in subject authentication when test patterns were compared with the enrollment database using a simple minimum distance criterion. The EER improved to 10.1% at best using a user-tuned threshold method when 16 subjects were examined. Singh and Gupta [10–12] explored the feasibility of the ECG signal to aid in human identification. Their experiment used signal processing methods to first delineate the ECG waveforms from each heartbeat. Next, the delineated fiducials were used along with the QRS complex to extract features of the following classes: time interval, amplitude, and angle features from the clinically dominant fiducials of the heartbeats. They performed subject classification using template matching and correlation-based criteria, and the QT database [20] was used for validation. The individuals were classified with an accuracy of 99% in a population of size 50.

The fiducial features used by existing methods for ECG biometric recognition are extracted mainly from temporal durations, amplitude differences, angles, and R-R intervals. The accurate detection of the dominant fiducials of the ECG waveform is a difficult task due to inter- and intrasubject variability of the cardiac rhythm. The effectiveness of these methods relies heavily on the accuracy of the detected fiducials, which is a challenge due to the lack of a standardized definition for localizing the ECG waveform boundaries [17]. This paper generalizes the prior art by combining fiducial features with the morphological features of the heartbeats as supplementary information for generating the most discriminatory features from an individual ECG signal.

3. Methods

3.1. ECG Preamble

The ECG is a noninvasive tool used to record the electrical manifestation of the contractile and relaxation activity of the heart. It can be recorded with surface electrodes placed on the limbs and chest. ECG devices use a varying number of electrodes, ranging from 3 to 12, for signal acquisition, while systems using more than 12 and up to 120 electrodes are also available [22]. Each normal cycle of an ECG signal contains P, QRS, and T waves (see, for instance, Figure 1). The P wave represents the contraction of the atrial muscle and has a duration of 60–100 milliseconds (ms). It has a low-amplitude morphology of 0.1–0.25 millivolts (mV) and is usually found at the beginning of the heartbeat. The QRS complex is the result of depolarization of the ventricles. It is a sharp biphasic or triphasic wave of 80–120 ms duration and shows a significant amplitude deflection that varies from person to person. The time taken for the ionic potential to spread from the sinus node through the atrial muscle and enter the ventricles is 120–200 ms and is known as the PR interval. The ventricles have a relatively long ionic potential duration of 300–420 ms, known as the QT interval. The plateau part of the ionic potential lasts 80–120 ms after the QRS complex and is known as the ST segment. The return of the ventricular muscle to its resting ionic state causes the T wave, which has an amplitude of 0.1–0.5 mV and a duration of 120–180 ms. The duration from the resting of the ventricles to the beginning of the next cycle of atrial contraction is known as the TP segment, which is a long plateau of negligible elevation.

3.2. ECG Processing

Prior to using the ECG signal in subsequent stages of processing for heartbeat segmentation and feature extraction, the signal is passed through median filters to remove the baseline wander. The signal is first passed to a median filter of 200 ms width. The task of this median filter is to suppress the P waves and the QRS complexes from the ECG signal. The signal is then passed to a median filter of 600 ms width to suppress the T waves from the signal. The length of median filters is set heuristically according to the morphology of the P-QRS-T complexes such that these complexes can be completely covered. The signal resulting from the second median filter contains the baseline information, which is then subtracted from the original signal to produce the baseline corrected ECG signal [23].
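The two-stage median filtering described above can be sketched as follows. This is a minimal illustration rather than the implementation used in this work: the function names, the edge-padding strategy, and the synthetic test signal are assumptions introduced here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_filter(signal, width):
    """Sliding median with an odd window; edges are padded by repetition."""
    if width % 2 == 0:
        width += 1  # force an odd window so it is centred on each sample
    half = width // 2
    padded = np.pad(signal, half, mode="edge")
    return np.median(sliding_window_view(padded, width), axis=1)


def remove_baseline(ecg, fs):
    """Estimate the baseline with 200 ms and 600 ms medians and subtract it."""
    stage1 = median_filter(ecg, int(0.2 * fs))       # suppresses P waves and QRS
    baseline = median_filter(stage1, int(0.6 * fs))  # suppresses T waves
    return ecg - baseline, baseline


# Hypothetical test data: one-sample "beats" riding on a slow drift.
fs = 250
t = np.arange(0, 10, 1 / fs)
drift = 0.5 * np.sin(2 * np.pi * 0.1 * t)
spikes = np.zeros_like(t)
spikes[fs // 2::fs] = 1.0
corrected, baseline = remove_baseline(drift + spikes, fs)
```

On this synthetic input the estimated baseline follows the slow drift closely, so the subtraction leaves the narrow beats on a nearly flat line.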

3.3. Heartbeat Detection

The QRS complex delineator is used to detect the heartbeats from the ECG signal. The method proposed by Pan and Tompkins [24] is used for QRS complex detection with some improvements. It uses digital analysis of slope, amplitude, and width information of the ECG waveforms. The high level description of the processing steps of the QRS complex delineator implemented in this work is given as follows.

Step 1. We compute the time-limited estimate of energy in the QRS frequency band. The signal is first passed through a second-order lowpass filter whose input and output are the data samples at each discrete time instance. The signal is then passed through a highpass filter to reduce the edge effect.

Step 2. Next, the absolute value of the derivative of the smoothed signal obtained from Step 1 is computed. The derivative approximation is implemented as a time-domain difference equation, and the absolute value of each resulting data sample is then taken.

Step 3. The signal is then passed through a moving average system implemented as a time-domain difference equation.

Step 4. Finally, decision rules are applied to distinguish QRS peaks from noise peaks using a dynamic threshold criterion.
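Steps 1 to 3 can be illustrated with the sketch below. The filter coefficients of this work are not reproduced here, so the moving-average lowpass and highpass stages, the first-difference derivative, and the 80 ms default window are assumptions chosen only to show the shape of the pipeline; all function and variable names are hypothetical.

```python
import numpy as np


def moving_average(x, width):
    """Centred moving average of `width` samples, same length as x."""
    return np.convolve(x, np.ones(width) / width, mode="same")


def qrs_energy(ecg, fs, window_ms=80):
    """Slope-energy signal whose bumps mark candidate QRS complexes."""
    # Step 1 (assumed filters): a short average acts as a lowpass, and
    # subtracting a long average acts as a crude highpass.
    low = moving_average(ecg, max(1, int(0.02 * fs)))
    band = low - moving_average(low, max(1, int(0.2 * fs)))
    # Step 2: first difference, then absolute value.
    slope = np.abs(np.diff(band, prepend=band[0]))
    # Step 3: moving average about as wide as a typical QRS complex.
    return moving_average(slope, max(1, int(window_ms * fs / 1000)))


# Hypothetical test signal: triangular "QRS" pulses once per second.
fs = 250
ecg = np.zeros(5 * fs)
beats = [fs, 2 * fs, 3 * fs, 4 * fs]
for centre in beats:
    for k in range(-10, 11):
        ecg[centre + k] += 1.0 - abs(k) / 10.0
energy = qrs_energy(ecg, fs)
```

The output is nonnegative and peaks close to each pulse, which is the property the decision rules of Step 4 rely on.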

The aims of the filtering functions are twofold: the signal is corrected for nonsignal artifacts, and the signal-to-noise ratio is improved, which enhances the QRS complex characteristics. The advantage of computing the absolute value of the first difference of the signal is that it makes the QRS detector less gain sensitive and improves the algorithm performance (Pan and Tompkins used a squaring function instead of the absolute value function, which causes nonlinear amplification, whereas the absolute value function makes the QRS detector less gain sensitive). The function of the moving average system is to capture the most prominent waveform of the ECG signal, that is, the QRS complex. In this work, the size of the moving window is set near the width of a typical QRS complex (<100 ms). In the original algorithm of Pan and Tompkins, the window is 150 ms wide, which accommodates the wider QRS complexes produced by premature ventricular contractions (PVC) and the merging of QRS complexes with T waves. In [25], it has been shown that a smaller window, about the width of a QRS complex, produces better results, which is also verified by this work.

The smoothed signals are summed and tested on rectangular pulses using decision rules to detect the R peaks. The rules used to detect a peak can be summarized as follows. Let us assume that a true QRS peak is recognized. First, all peaks that precede a larger peak by less than 200 ms are ignored. From the physiology of the normal QRS complex, the delay between two normal heartbeats is much larger than 200 ms; therefore, setting a refractory period of 200 ms avoids false detections. Second, if a peak occurs, the raw signal is checked for both positive and negative slopes of nearly the same magnitude within a 200 ms window; if this is not the case, the peak represents noise. Next, a peak is detected in the regions where the signal rises above the threshold. The threshold is estimated dynamically as a value between the levels of the recently detected QRS peaks and the non-QRS peaks. Finally, if no QRS complex is detected within a specified multiple of the RR interval, then a peak that exceeds a specified fraction of the threshold and follows the previous detection by at least 360 ms is classified as a QRS complex.
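A reduced version of these rules is sketched below, covering only the 200 ms refractory period and the dynamic threshold placed between the running QRS and noise levels. The slope check and the search-back rule are omitted, and the update weights (0.875 and 0.125) and the threshold weight of 0.5 are assumptions made for this illustration.

```python
import numpy as np


def detect_peaks(energy, fs, refractory_s=0.2, weight=0.5):
    """Return sample indices accepted as QRS peaks."""
    refractory = int(refractory_s * fs)
    # Local maxima of the energy signal are the candidate peaks.
    candidates = [i for i in range(1, len(energy) - 1)
                  if energy[i - 1] < energy[i] >= energy[i + 1]]
    qrs_level = float(np.max(energy[: 2 * fs]))  # initial QRS estimate
    noise_level = 0.0
    accepted = []
    for i in candidates:
        threshold = noise_level + weight * (qrs_level - noise_level)
        if energy[i] < threshold:
            # Below threshold: update the running non-QRS peak level.
            noise_level = 0.875 * noise_level + 0.125 * energy[i]
            continue
        if accepted and i - accepted[-1] < refractory:
            # Inside the refractory period keep only the larger peak.
            if energy[i] > energy[accepted[-1]]:
                accepted[-1] = i
            continue
        accepted.append(i)
        qrs_level = 0.875 * qrs_level + 0.125 * energy[i]
    return accepted


# Hypothetical energy signal: four beats plus two small noise bumps.
fs = 250
idx = np.arange(5 * fs)
beats = [250, 500, 750, 1000]
energy = sum(np.exp(-0.5 * ((idx - c) / 5.0) ** 2) for c in beats)
energy = energy + sum(0.2 * np.exp(-0.5 * ((idx - c) / 5.0) ** 2)
                      for c in (375, 625))
peaks = detect_peaks(energy, fs)
```

In this example the two small bumps fall below the threshold and only raise the noise estimate, so only the four beats are accepted.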

The beginning and the end of the QRS complex, that is, the QRSonset and QRSoffset time instances, respectively, are delineated according to the location and convexity of the R peak. Once the heartbeats are detected, temporal windows are defined heuristically before and after the QRS complex time instances to search for the P and T waves. The P wave delineator [26] determines the Ponset and Poffset time instances, while the T wave delineator [27] finds the Tonset and Toffset time instances. From all these time instances of the heartbeats, we derive three classes of features: (1) heartbeat interval features, (2) interbeat interval features, and (3) ECG morphological features.

3.4. Heartbeats Feature Extraction
3.4.1. Heartbeat Interval Features

Five features relating to heartbeat intervals are computed after heartbeat segmentation. The QRS width is the duration between the QRSonset and the QRSoffset. The T wave duration is defined as the time interval between the QRSoffset and the Toffset. The PQ segment is defined as the time interval between the Ponset and the QRSonset. The pre-TP segment is defined as the time interval between a given Ponset and the previous wave Toffset. Similarly, the post-TP segment is defined as the time interval between a given Toffset and the following wave Ponset.

3.4.2. Interbeat Interval Features

Ten features relating to interbeat intervals are computed after segmentation of the fiducial points of successive heartbeats. These features are extracted from the PP, QQ, SS, TT, and RR sequences of the successive heartbeats. The pre-PP (post-PP) interval is the time interval between the Ponset of a given heartbeat and the Ponset of the previous (following) heartbeat. The pre-QQ (post-QQ) interval is the time interval between the Qpeak of a given heartbeat and the Qpeak of the previous (following) heartbeat. The pre-SS (post-SS) interval is the time interval between the Speak of a given heartbeat and the Speak of the previous (following) heartbeat. The pre-TT (post-TT) offset interval is the time interval between the Toffset of a given heartbeat and the Toffset of the previous (following) heartbeat. Similarly, the pre-RR (post-RR) interval is defined as the RR interval between a given heartbeat and the previous (following) heartbeat.

The heartbeat interval features and the interbeat interval features are shown in Figure 2.
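As an illustration, the fifteen interval features can be computed from the fiducial times of three successive beats as sketched below. The dictionary keys and the example beat are hypothetical, and taking the T wave duration from the QRS offset to the T offset is an assumption of this sketch.

```python
def interval_features(prev, cur, nxt):
    """Each argument maps fiducial names to times in seconds for one beat."""
    feats = {
        # Five heartbeat interval features of the current beat.
        "qrs_width": cur["qrs_off"] - cur["qrs_on"],
        "t_duration": cur["t_off"] - cur["qrs_off"],
        "pq_segment": cur["qrs_on"] - cur["p_on"],
        "pre_tp": cur["p_on"] - prev["t_off"],
        "post_tp": nxt["p_on"] - cur["t_off"],
    }
    # Ten interbeat features: pre and post intervals of five fiducials.
    for name, key in [("pp", "p_on"), ("qq", "q_peak"), ("rr", "r_peak"),
                      ("ss", "s_peak"), ("tt", "t_off")]:
        feats["pre_" + name] = cur[key] - prev[key]
        feats["post_" + name] = nxt[key] - cur[key]
    return feats


def shift(beat, dt):
    """Copy of a beat with every fiducial delayed by dt seconds."""
    return {k: v + dt for k, v in beat.items()}


# Hypothetical fiducial times (seconds) of one beat, reused at 0.0, 0.8, 1.7 s.
template = {"p_on": 0.00, "qrs_on": 0.16, "q_peak": 0.18, "r_peak": 0.20,
            "s_peak": 0.22, "qrs_off": 0.26, "t_off": 0.56}
feats = interval_features(shift(template, 0.0), shift(template, 0.8),
                          shift(template, 1.7))
```

Requiring all three beats as input mirrors the condition that a beat is analyzed only when its predecessor and successor are segmented correctly.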

3.4.3. ECG Morphological Features

We divided the ECG morphological features into two groups, both of which contain amplitude values of the segmented heartbeats of the ECG signal. The major distinction between the groups is the method used to fix the temporal windows for amplitude feature extraction. The first group contains thirty-two features. These features are determined within the temporal windows shown in Figure 3(a). The first window is set between the QRSonset and the QRSoffset time instances. Five features are extracted corresponding to the fiducials QRSonset, Qpeak, Rpeak, Speak, and QRSoffset. The boundaries of the second window are set heuristically such that it approximately covers the P wave. From the morphology of the P wave, the contraction period of the atrial muscle can be at most 120 ms; therefore, the boundaries of the temporal window extend from Ponset to Ponset + 120 ms. Using linear interpolation, thirteen features are estimated uniformly within the temporal window. Similarly, the third window is bounded by the QRSoffset and the Toffset time instances. Fifteen features of the heartbeat amplitude are derived uniformly within the window using linear interpolation.

The second group contains twenty-eight features, which are extracted from the scaled ECG signal. In the scaled signal, the difference of each data sample from the mean is measured in units of standard deviation [18]. The aim of scaling is to reduce the sensitivity of the ECG signal to both the noise and the motion artifacts that contaminate it. We define three temporal windows with respect to the location of the heartbeat fiducial point (FP), as shown in Figure 3(b). The first window approximately covers the QRS complex, and its boundaries are set heuristically. From the morphology of the QRS complex, the time required to depolarize the ventricles is normally less than 180 ms. Further, the depolarization time of the ventricles is divided into the start of depolarization (e.g., 80 ms) and the end of depolarization (e.g., 100 ms). Therefore, the boundaries of the temporal window extend from FP − 80 ms to FP + 100 ms, covering the portion of the ECG signal labeled as the QRS complex. A total of nine features result from this window.

The second temporal window is set for extracting the morphological information of the portion of the ECG waveform that occurs prior to ventricular depolarization. The duration of this period is set heuristically (e.g., 160 ms, which is more than the duration of a P wave); therefore, we extend the boundaries of this window from FP − 80 ms to FP − 240 ms towards the left. Nine features result from this window. The third temporal window approximately contains the T wave. The boundaries of this window are set heuristically from the morphology of the T waveform. The time of ventricular repolarization, during which the myocardium is prepared for the next cycle of the ECG, is set to more than the normal duration of a T wave (e.g., 270 ms); therefore, the boundaries of the temporal window extend from FP + 150 ms (i.e., a segment of 50 ms is left after FP + 100 ms because it is the least time prior to the start of the repolarization of the ventricles) to FP + 420 ms. Ten amplitude features are derived from this window. In all temporal windows, the features are derived from uniformly distributed sample positions using linear interpolation, while the ECG signal is sampled uniformly.
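The fixed-interval sampling of the second feature group can be sketched as follows. The window offsets relative to FP are assumptions inferred from the durations quoted in the text (80 ms, 100 ms, 160 ms, 50 ms, and 270 ms), and the scaling step is assumed to be a subtraction of the mean followed by division by the standard deviation; the function names are hypothetical.

```python
import numpy as np


def window_features(ecg, fs, start_s, end_s, count):
    """Sample `count` uniformly spaced amplitudes between two time instants."""
    times = np.arange(len(ecg)) / fs
    query = np.linspace(start_s, end_s, count)
    return np.interp(query, times, ecg)  # linear interpolation


def scaled_morphology(ecg, fs, fp_s):
    """28 amplitudes from three windows placed relative to the fiducial point."""
    z = (ecg - ecg.mean()) / ecg.std()  # assumed scaling, in units of std
    pre = window_features(z, fs, fp_s - 0.240, fp_s - 0.080, 9)
    qrs = window_features(z, fs, fp_s - 0.080, fp_s + 0.100, 9)
    t_wave = window_features(z, fs, fp_s + 0.150, fp_s + 0.420, 10)
    return np.concatenate([pre, qrs, t_wave])


# Hypothetical one-second trace sampled at 1000 Hz with FP at 0.5 s.
fs = 1000
t = np.arange(fs) / fs
ecg = np.sin(2 * np.pi * 5 * t)
features = scaled_morphology(ecg, fs, 0.5)
```

Because the sample positions are defined in time rather than in sample indices, the same windows apply unchanged to recordings with different sampling rates.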

3.5. Feature Normalization

To obtain features that are consistent under changes in the heart rate, the beat features are normalized. The heart rate varies due to changes in the pressure inside the heart and in the ventricular volume. A change in heart rate consequently changes the duration of atrial depolarization and ventricular repolarization. Therefore, heartbeat interval features are normalized by dividing them by the beat length, which is composed of the time interval between the Ponset and QRSonset time instances and the corrected time interval between the QRSonset and Toffset time instances, where the correction uses Bazett’s formula [28]. The interbeat interval features are normalized by dividing them by the mean of the beat lengths of the predecessor and successor beats. Finally, the normalized features represent the relative position of the fiducials within a heartbeat.

The amplitude values at different time segments of the different waves are unaffected by changes in the heart rate. Therefore, ECG morphological features are measured only with respect to the amplitude of the R peak, which serves as the basis for automated determination of the heart rate.
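The interval normalization can be sketched as follows. Bazett’s formula is the standard correction QTc = QT / sqrt(RR) with both quantities in seconds; treating the beat length as the sum of the PQ interval and the corrected QT interval is an assumption of this sketch, and the function names are hypothetical.

```python
import math


def bazett_qtc(qt_s, rr_s):
    """Bazett's correction: QTc = QT / sqrt(RR), with both in seconds."""
    return qt_s / math.sqrt(rr_s)


def beat_length(p_on, qrs_on, t_off, rr_s):
    """Assumed composition: PQ interval plus rate-corrected QT interval."""
    pq = qrs_on - p_on
    qtc = bazett_qtc(t_off - qrs_on, rr_s)
    return pq + qtc


def normalize_intervals(features, length):
    """Divide each interval feature by the beat length."""
    return {name: value / length for name, value in features.items()}


# Hypothetical beat: P onset 0.00 s, QRS onset 0.16 s, T offset 0.56 s, RR 1 s.
length = beat_length(0.00, 0.16, 0.56, 1.0)
normalized = normalize_intervals({"qrs_width": 0.10, "pq_segment": 0.16}, length)
```

At an RR interval of 1 s the correction leaves the QT interval unchanged, so the beat length in this example is simply the time from P onset to T offset.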

3.6. Selection of Eigenbeat Features

The eigenbeat method is based on the linear projection of the sample space to a low-dimensional feature space [19]. It uses principal component analysis (PCA), an unsupervised learning technique that provides an optimal representation, in the least mean square error sense, of the input in a lower-dimensional space. It yields projection directions that maximize the scatter across all samples present in the gallery and probe ECG signals. When the heartbeat features are projected into the subspace spanned by the dominant eigenvectors, the separability among the subjects becomes apparent.

More formally, consider $c$ classes of feature vectors, where the $k$th class contains the feature vectors of the $k$th subject. Let us assume that the feature vectors take values in an $n$-dimensional space, that is, $x_i \in \mathbb{R}^{n}$ for $i = 1, \ldots, N$. PCA seeks to find an orthogonal subspace that reduces the dimensionality of the original feature space while preserving the majority of the data variance. This is achieved by performing an eigendecomposition on the covariance matrix computed from the samples in the feature space. From the total of $N$ feature vectors, PCA computes the sample mean $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$. Let $X = [x_1 - \mu, \ldots, x_N - \mu]$ be the $n \times N$ matrix containing each data instance centered at the mean; we compute the scatter matrix as $S = XX^{T}$. Then a set of $m$ feature basis vectors, where $m \le n$, can be estimated by maximizing the expression
$$W_{\mathrm{opt}} = \arg\max_{W} \left| W^{T} S W \right| = [w_1, w_2, \ldots, w_m],$$
where $\{w_j \mid j = 1, \ldots, m\}$ is the set of $n$-dimensional eigenvectors of the scatter matrix $S$ corresponding to the $m$ largest eigenvalues. It is to be noted that the dimension of the generated eigenvectors is the same as that of the original feature vectors; therefore, they can be referred to as eigenbeat features. The generated eigenvectors form the basis representation of the gallery and the probe ECG signals. The basis yields projection directions that maximize the scatter across all feature vectors of the ECG traces.

We can solve for $W_{\mathrm{opt}}$ by performing an eigendecomposition on $S$, which yields the matrices of eigenvectors $W$ and eigenvalues $\Lambda$; that is,
$$S W = W \Lambda.$$
In general, $m$ eigenvectors are retained such that the ratio $\sum_{i=1}^{m} \lambda_i / \sum_{i=1}^{n} \lambda_i$ reaches a chosen level, where $\lambda_i$ is the $i$th diagonal entry of $\Lambda$ corresponding to the eigenvector in the $i$th column of $W$ [29]. However, $m$ is usually chosen so that the clinically dominant information from the beat is not lost.

The eigenbeat features are the first $m$ eigenvectors corresponding to the largest eigenvalues, denoted as $W_m$; that is, $W_m = [w_1, \ldots, w_m]$. The extracted beat features are therefore transformed to the $m$-dimensional beat space by a linear mapping such as
$$y_i = W_m^{T}(x_i - \mu),$$
where $y_i$ are the coefficients that represent the ECG feature vector $x_i$ in the reduced feature space. Averaging over all the feature vectors of a single subject $k$ provides the gallery representation $\bar{y}_k$ against which the probe data is compared.

The probe ECG signal is processed in the same way as the gallery data to derive a representation relative to the basis formed by the dominant eigenvectors. The best match in the gallery data is the subject $k^{*}$ that minimizes the distance between $y_p$ and $\bar{y}_k$ such that
$$k^{*} = \arg\min_{k} \lVert y_p - \bar{y}_k \rVert,$$
where $y_p$ is the vector of coefficients in the eigenspace for the probe ECG signal.

The prime advantage of employing PCA lies in reducing the dimensionality of the feature vector. Typically, the majority of the data variation is captured by the eigenvectors associated with large eigenvalues, while the eigenvectors associated with small eigenvalues correspond to noisy measurements. Therefore, by discarding the eigenvectors associated with small eigenvalues, the dimensionality of the feature vector is greatly reduced while most of the data variance is retained.
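The projection and matching steps of this section can be sketched as follows, assuming a Euclidean distance for the nearest neighbor comparison. The function names, the number of retained components, and the synthetic two-subject data are assumptions introduced only for illustration.

```python
import numpy as np


def fit_eigenbeats(features, m):
    """features: (N, n) beat feature vectors. Returns mean and (n, m) basis."""
    mean = features.mean(axis=0)
    centered = features - mean
    scatter = centered.T @ centered             # n x n scatter matrix
    eigvals, eigvecs = np.linalg.eigh(scatter)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:m]       # indices of the m largest
    return mean, eigvecs[:, order]


def project(features, mean, basis):
    """Coefficients of the feature vectors in the eigenbeat subspace."""
    return (features - mean) @ basis


def identify(probe, gallery, mean, basis):
    """gallery: dict subject -> (k, n) array. Returns the nearest subject."""
    templates = {s: project(x, mean, basis).mean(axis=0)
                 for s, x in gallery.items()}
    y = project(probe, mean, basis)
    return min(templates, key=lambda s: np.linalg.norm(y - templates[s]))


# Hypothetical data: two subjects with well separated 6-dimensional features.
rng = np.random.default_rng(0)
centres = {"A": np.zeros(6), "B": np.full(6, 3.0)}
gallery = {s: c + 0.1 * rng.standard_normal((20, 6)) for s, c in centres.items()}
mean, basis = fit_eigenbeats(np.vstack(list(gallery.values())), m=2)
probe = centres["B"] + 0.1 * rng.standard_normal(6)
match = identify(probe, gallery, mean, basis)
```

Here `np.linalg.eigh` is used because the scatter matrix is symmetric, which yields real eigenvalues and an orthonormal set of eigenvectors.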

4. ECG Biometric System

The proposed biometric system for individual identification using the ECG signal is shown in Figure 4. It works in offline mode. The ECG signals acquired from people are preprocessed for quality. Preprocessing corrects the signal for noise and nonsignal artifacts that occur mainly due to muscle activity, body movement, and respiration. The data representation stage consists of heartbeat detection, heartbeat segmentation, and feature extraction. Heartbeat detection attempts to locate all heartbeats. Heartbeat segmentation includes the detection of the P, Q, R, S, and T waves and the determination of their end fiducials. Feature extraction includes the determination of heartbeat interval features, interbeat interval features, and ECG morphological features from successive beats, from which the eigenbeat features are then derived. Finally, a measurement vector is generated from the derived eigenbeat features; the resulting template is stored in the database. A similar process is adopted to generate a measurement vector from the probe ECG signal. The identification decision is then taken by comparing the feature vector derived from the probe ECG signal with the feature vectors stored in the database using 1 : N matching based on the nearest neighbor criterion.

5. Experimental Results

The performance of the ECG biometric method is evaluated on two databases of different populations. The first database is obtained from the publicly available PhysioBank archives [20]; in particular, the MIT-BIH arrhythmia database is used in the experiment. The database includes ECG recordings of normal subjects and inpatients, both men and women, aged between 20 and 84 years. Forty-four ECG recordings of this database are used in this study (Records 116, 207, 222, and 201 are excluded because they contain multiform premature ventricular contractions). The second database was prepared in-house at IIT (BHU) using the PowerLab 4/25 of ADInstruments. In total, 65 volunteers from a campus population, aged between 20 and 56 years, participated in the data enrollment process; none of the subjects reported any cardiac arrhythmia. Three to five minutes of ECG recordings were acquired from each subject in multiple sessions over a period of 15 months. Data acquisition was kept simple: the subjects were merely seated on a chair or wooden stool in a relaxed condition, and the clamp electrodes were fixed to both wrists and the left ankle in the lead II configuration. Each signal was processed with a bandpass filter at 0.3–20 Hz and sampled at 1000 Hz.

The MIT-BIH arrhythmia database has ECG recordings from a single day for each subject, permitting only within-session analysis. Therefore, ECG recordings at different time instances are used for the gallery and probe data sets. For the IIT (BHU) database, ECG recordings of different sessions are used for the gallery and probe data sets. During preprocessing and the preparation of the gallery and probe data sets, the starting 60 sec of the recordings are used for setting the delineator parameters, whereas the first 30 sec of the recordings are discarded due to sensor and body stabilization effects. Ten sets of heartbeats are randomly selected from the gallery data, and the features are extracted from 10 successive beats per set such that they meet the delineator requirements. Once the data is represented using features, a representation relative to the basis formed by the dominant eigenvectors is derived by selecting the most significant eigenvectors, that is, those corresponding to the largest eigenvalues. For example, the dimensions 1, 2, 3, 4, 5, 10, and 20 denote the numbers of most significant eigenvectors that are used for evaluating the classification accuracy among the targeted subjects. Finally, the coefficients of the components are generated in the projected domain, forming a compact representation of the heartbeat information in the gallery data set. The probe signal undergoes the same processing steps as the gallery data set to derive a representation relative to the basis formed by the dominant eigenvectors.

For each population, every projected feature vector from the gallery data set is compared with all projected feature vectors in the probe data set, using the Euclidean distance as the similarity measure to generate the matching scores. A matching score is either a genuine score or an impostor score: a genuine score is generated by comparing the attribute sets of probe and gallery data of the same subject; otherwise the score is an impostor score. Thus, the system generates 44 genuine scores and 1892 (44 × 43) impostor scores for the population of the MIT-BIH arrhythmia database. For the healthy population of the IIT (BHU) database, the system generates 65 genuine scores and 4160 (65 × 64) impostor scores. The performance of the aforementioned individual identification system is evaluated using the Rank-R classification accuracy over a closed-set population. The Rank-R classification accuracy of the system is defined as the percentage of probe signals that have the correct class among the top R scores. Further, in order to confirm the benefit of the eigenbeat features, the average rank classification accuracies are computed and the cumulative match characteristic (CMC) curve is drawn.
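
The score counts and the Rank-R accuracy defined above can be reproduced with a short sketch. It assumes a closed set with one probe and one gallery vector per subject and a precomputed distance matrix; the function names and the three-subject distance matrix are hypothetical and serve only as an illustration.

```python
def cmc(distances, probe_labels, gallery_labels):
    """Cumulative match characteristic for a closed-set identification test.

    distances[i][j] is the distance between probe i and gallery template j.
    Returns a list whose element R-1 is the Rank-R accuracy in percent.
    Closed set: every probe label is assumed to be present in the gallery.
    """
    n_gallery = len(gallery_labels)
    hits_at_rank = [0] * n_gallery
    for row, true_label in zip(distances, probe_labels):
        # Gallery indices sorted from best (smallest distance) to worst match.
        order = sorted(range(n_gallery), key=lambda j: row[j])
        ranked_labels = [gallery_labels[j] for j in order]
        rank = ranked_labels.index(true_label)  # 0-based rank of the correct subject
        hits_at_rank[rank] += 1
    curve, running = [], 0
    for hits in hits_at_rank:
        running += hits
        curve.append(100.0 * running / len(probe_labels))
    return curve


def score_counts(n_subjects):
    """Genuine and impostor score counts with one vector per subject on each side."""
    return n_subjects, n_subjects * (n_subjects - 1)


# Hypothetical 3-subject distance matrix (rows: probes, columns: gallery).
labels = ["A", "B", "C"]
distances = [
    [0.2, 0.9, 0.7],  # probe A: correct subject ranked first
    [0.4, 0.5, 0.8],  # probe B: correct subject ranked second
    [0.6, 0.3, 0.1],  # probe C: correct subject ranked first
]

if __name__ == "__main__":
    print(cmc(distances, labels, labels))
    print(score_counts(44), score_counts(65))
```

With N subjects, the N × N comparisons split into N genuine scores on the diagonal and N(N − 1) impostor scores off the diagonal, which matches the counts of 1892 and 4160 given above for N = 44 and N = 65.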

In order to recognize the patterns derived from the feature vectors of an individual's heartbeats, we use a support vector machine (SVM) classifier [21]. The linear multiclass SVM reports an average classification error of 20.45% for the subjects of the MIT-BIH arrhythmia database when 132 probe ECG signals are compared with 320 gallery ECG signals prepared from 44 subjects. The average classification error is lower, 15.10%, for the subjects of the IIT (BHU) database when 192 probe ECG signals are compared with 447 gallery ECG signals prepared from 65 subjects. This classification is performed on the extracted feature set prior to the selection of the eigenbeat features. From these results, it is evident that the proposed method of ECG characterization extracts beat features such that a correlation between the gallery and probe ECG feature sets exists for an individual in both databases. In terms of accuracy, the linear SVM performs subject classification at rates of 79.55% and 84.9% for the subjects of the MIT-BIH arrhythmia database and the IIT (BHU) database, respectively.

The results from the linear projection of heartbeat features demonstrate that there exists a correlation between the number of eigenbeat features and the classification performance. An effective increase in this correlation can be observed, which in turn effectively decreases the intrasubject variability, as shown in Figures 5, 6, and 7. The decrease in intrasubject variability and the increase in intersubject separability represented by the first two principal components (PC1 and PC2) for the subjects of the MIT-BIH arrhythmia database and the IIT (BHU) database are shown in Figures 5(a) and 5(b), respectively. Further, the intersubject separability represented by three principal components, (PC1, PC2, and PC3) in Figure 6 and (PC1, PC2, and PC5) in Figure 7, is shown for the subjects of the MIT-BIH arrhythmia database in panels (a) and for the subjects of the IIT (BHU) database in panels (b). The separability among the subjects of both databases is achieved at a low dimension of projection, as is clearly seen from the projection components shown in Figure 7. In particular, the intersubject separability among the subjects of the IIT (BHU) database, shown in Figure 7(b), is better than that of the subjects of the MIT-BIH arrhythmia database, shown in Figure 7(a).

The average rank classification accuracies computed for different principal components for the subjects of the MIT-BIH arrhythmia database and the IIT (BHU) database are given in Table 1. The performance of the eigenbeat features varies with the number of principal components. The average rank classification accuracy first increases with the component dimension: the maximum values are reported to be 85.7% for the MIT-BIH arrhythmia database at dimension five and 92.49% for the IIT (BHU) database at dimension three. For higher dimensions of principal components, the classification accuracies decrease for the subjects of both databases (see, for instance, Figure 8). These results demonstrate that the distinction present in an individual's heartbeats is significantly captured by the eigenbeat features. In particular, the first few eigenvectors potentially capture the discriminatory information present in the heartbeats. Further, these results also demonstrate the strength of the proposed method of ECG characterization and the linear projection of heartbeat features for human identification. Our ECG biometric method performs subject classification with a lower-dimensional feature vector in a cost-effective manner, where the reduction in dimension does not lose the discriminatory information present in an individual's heartbeats.

The CMC results of matching probe ECG signals to gallery ECG signals prepared from the MIT-BIH arrhythmia database are shown in Figure 9(a). The system achieves its best Rank-1 classification accuracy of 47.7% at dimension five (DIM 5). The Rank-1 classification accuracies of the system increase with the dimension up to this point, that is, 18.18% at dimension one (DIM 1), 40.9% at dimension two (DIM 2), and 45.45% at dimensions three (DIM 3) and four (DIM 4). The Rank-1 accuracies of the system decrease for dimensions above five, that is, 36.36% at dimension ten (DIM 10) and only 25% at dimension twenty (DIM 20). The CMC curve for DIM 1 shows poor classification performance; the reported accuracies are 29.5% by Rank-2, 43% by Rank-3, 70.5% by Rank-5, 88.6% by Rank-10, and 100% by Rank-15. The classification accuracies at DIM 2 are found to be 59.09% by Rank-2, 70.45% by Rank-3, 84.1% by Rank-5, 95.45% by Rank-10, and 100% by Rank-14. The system reports improved classification accuracies at DIM 3, which are very close to those reported at DIM 4, that is, 45.45% by Rank-1, 68.18% by Rank-2, 72.72% by Rank-3, 84.09–84.1% by Rank-5, 95.45% by Rank-10, and 100% by Rank-13. The performance of the system is reported to be the best at DIM 5, that is, 47.7% by Rank-1, 68.18% by Rank-2, 75% by Rank-3, 86.36% by Rank-5, 93.18% by Rank-10, and 100% by Rank-14. The performance of the system degrades after DIM 5; in particular, the accuracy values at DIM 10 (DIM 20) are reported to be 36.36% (25%) by Rank-1, 63.63% (34.1%) by Rank-2, 70.45% (47.7%) by Rank-3, 81.8% (70.45%) by Rank-5, 90.9% (88.63%) by Rank-10, and 100% (93.2%) by Rank-13.

The aforementioned system reports better identification performance for the subjects of the IIT (BHU) database. The best classification accuracies are achieved at dimension three (DIM 3). The system reports classification accuracies of 64.62% by Rank-1, 80% by Rank-2, 86.15% by Rank-3, 90.77% by Rank-5, 98.46% by Rank-10, and 100% by Rank-17, which are represented by the curve labeled DIM 3 in Figure 9(b). The system performs poorly at DIM 1, where it reports accuracies of 18.46% by Rank-1, 38.46% by Rank-2, 46.15% by Rank-3, 66.15% by Rank-5, 86.15% by Rank-10, and 100% by Rank-19. The classification accuracies improve at DIM 2, where the values are reported to be 46.15% by Rank-1, 69.23% by Rank-2, 73.85% by Rank-3, 86.15% by Rank-5, 96.9% by Rank-10, and 100% by Rank-15. The classification performance reported at DIM 4 is very close to that reported at DIM 3. For instance, the accuracy values at DIM 4 are reported to be 64.62% by Rank-1, 78.46% by Rank-2, 84.6% by Rank-3, 89.23% by Rank-5, 96.9% by Rank-10, and 100% by Rank-19. Similar to the results obtained for the subjects of the MIT-BIH arrhythmia database, the classification performance of the system degrades for the subjects of the IIT (BHU) database at higher dimensions, that is, dimensions beyond five. For instance, the classification accuracies at DIM 10 (DIM 20) are reported to be 53.85% (40%) by Rank-1, 72.31% (50.77%) by Rank-2, 76.9% (56.9%) by Rank-3, 84.62% (63.1%) by Rank-5, 96.9% (69.2%) by Rank-10, 96.9% (75.38%) by Rank-15, and 100% (78.46%) by Rank-19.

From the results, it is evident that the best average rank classification accuracies, 92.49% for the subjects of the IIT (BHU) database and 85.7% for the subjects of the MIT-BIH arrhythmia database, are reached after the accumulation of the first few principal components, that is, three and five, respectively. One reason that the classification accuracy is highest for the subjects of the IIT (BHU) database may be that the ECG recordings of this database were acquired from healthy subjects under normal conditions.

In conclusion, the experimental results of the proposed ECG biometric method for individual identification show the following.

(1) The proposed characterization of the ECG signal using analytical and appearance methods is effective and makes the measurement insensitive to noise and nonsignal artifacts. Our approach provides an automated means to analyze the heartbeats in a more holistic manner than other methods (e.g., [4, 5]). This extends the scope of biometric applications.

(2) The linear projection method based on eigenbeat features is capable of classifying individuals using their heartbeat features in lower dimensions. The resulting projection forms an optimal reconstruction that may also be optimal from a discriminatory standpoint. As a result, the individuals are classified with higher accuracy using a simple classifier such as the nearest neighbor.

(3) Principal component analysis is a noted method [30] that has been used to solve clinically oriented issues related to the characterization and diagnosis of cardiac arrhythmia. The findings of this experiment would open an avenue for analyzing electrical imaging of the heart, that is, the efficient exploration of voluminous data for clinical applications.

(4) The best classification accuracy reported by the ECG biometric system is close to that of the face recognition system [31] and somewhat lower than that of the fingerprint recognition system [32]. The advantages of using the ECG signal as a biometric include robustness to circumvention, robustness to replay attacks, and robustness to obfuscation attacks [17].

(5) The vitality feature of the ECG signal can be utilized effectively to counter spoof attacks. In a multimodal framework, the use of the ECG signal as one of the biometrics would not only ensure the presence of the individual to be authenticated but would also provide confidence that true samples of the other biometrics are collected [16].

6. Conclusion

The use of the ECG as a biometric offers an attractive alternative to conventional biometrics due to its nonsusceptibility to circumvention. It distinguishes itself by being a liveness indicator and a difficult-to-falsify biometric modality. The study of [33] showed that an ECG biometric system can be fooled by synthesizing an ECG recording using measured features. In practice, however, it may be harder to replicate an ECG signal at the acquisition level and fool the ECG sensors. This work has proposed a hybrid approach consisting of analytical and appearance techniques of ECG analysis for individual identification. The heartbeat fiducial features and morphological features that have shown consistent characteristics are extracted using the analytical and appearance techniques, respectively. The biometric experiment is conducted considering different sets of features, such as heartbeat interval features, interbeat interval features, and waveform morphological features. The eigenbeat features are derived from the extracted features by linearly projecting the feature space onto a low-dimensional feature subspace. The experiment revealed the potential of the proposed framework of the ECG biometric system, which is robust enough to cover both healthy and arrhythmia subjects for their identification.

Conflict of Interests

The author declares that there is no conflict of interests regarding the publication of this paper.