#### Abstract

Electrocardiogram (ECG) human identification has the potential to improve biometric security. However, improvements in ECG identification and feature extraction are required. Previous work has focused on single-lead ECG signals. Our work proposes a new algorithm for human identification that maps two-lead ECG signals onto a two-dimensional matrix and then employs a sparse matrix method to process the matrix. This is the first application of sparse matrix techniques to ECG identification. Moreover, the results of our experiments demonstrate the benefits of our approach over existing methods.

#### 1. Introduction

The electrocardiogram (ECG) has become a popular tool for analyzing heart disease with the use of telemedicine and home care techniques [1, 2]. However, ECG is useful not only as a diagnostic tool; it has also been applied to information watermarking [3–5], data compression [3, 6], and human identification [7–15]. ECG techniques have the potential to play a role in biometric identification.

Existing biometric identification techniques have focused on the use of fingerprints, facial geometry, and voice analysis. We may be able to apply ECG techniques to protect health care systems from data leakage.

In this work, we propose a new algorithm that uses two leads of ECG signals for human identification. The algorithm uses a sparse matrix for dimensionality reduction, mapping the two-lead data into a single coordinate system, and takes advantage of the sparse matrix for identification. We call our algorithm the sparse matrix correlation coefficient (SMCC).

Through experiments, we demonstrate that our approach is more accurate for human identification and verification than existing techniques. In summary, compared with previous ECG identification methods, our approach has the following advantages.

Using a sparse matrix to store data that contain a large number of zero-valued elements both saves a significant amount of memory and speeds up the processing of those data.

The algorithm performs rapidly with lower computational complexity than the PCA method to process two-lead signals.

The remainder of this paper is organized as follows: Section 2 contains an overview of related work in ECG identification; Section 3 introduces the proposed ECG identification algorithm; the experimental results are presented in Section 4; finally, some concluding remarks are stated in Section 5.

#### 2. Related Work

For ECG identification, research has focused on areas such as signal preprocessing, feature extraction, data classification, data reduction, and intelligence optimization.

Based on our survey, ECG feature extraction algorithms can be classified into two categories: transform-based [9, 13, 15] and waveform-based [7, 8, 11, 12]. The transform-based algorithms use transforms in the wavelet [15] and frequency domains. Since the wavelet transform contains information in both the time and frequency domains, it is more popular than the frequency-based techniques, which include the Fourier transform [13] and the discrete cosine transform (DCT). Waveform-based methods measure the distances and amplitude differences between wave peaks and valleys; these attributes represent certain characteristics of the signal.

Some approaches are hybrid; for example, in [8], morphological characteristics are first extracted through the wavelet transform. The feature extraction of our approach can also be regarded as hybrid: it transfers two waveform signals into a two-dimensional space and then measures their similarity in that space.

Most approaches to ECG identification use only a one-lead signal [7, 10, 14]. A lead system allows the heart to be viewed from different angles; each angle is called a lead, and the different leads can be compared to radiographs taken from different angles. Multiple leads therefore offer more than one source of features for discriminating between individuals.

Two-dimensional processing of ECG data has been applied in compression and diagnostic areas. The authors of [16] proposed diagnosis of acute myocardial infarction using two-dimensional echocardiography. The authors of [17, 18] implemented ECG data compression on two-dimensional data. To our knowledge, however, these approaches have not been applied to ECG identification.

In summary, most of the work on ECG biometrics has made use of only one lead and ignored the other leads, which may contain additional information. The ECG signals from two leads are essentially two observations of the same physiological activity from two different perspectives. Thus, we propose a new two-lead algorithm for ECG identification.

Data computation is another area that we reviewed when considering existing approaches to ECG identification. A common approach is to use correlation coefficients to measure feature distances, such as the wavelet distances that have been used in matching acquired ECG signals for identification [9, 15]. The work in [19] applied feature set evaluation (FSE) with the *k*-nearest neighbor (*k*-NN) algorithm to improve low recognition rates and used the eigenspace method to reduce data dimensions; however, this approach is both complicated and time consuming. The studies in [11, 20] apply typical neural network classifiers to ECG identification.

Another popular approach is PCA [21], which is an analogue of the principal axes theorem in mechanics and was later independently developed by [22]. A recent application of PCA in ECG signal processing is feature reduction of various ECG properties [1, 2, 8, 23].

In this work, we adapt the sparse matrix [24] for ECG identification. Sparse matrix ideas date back more than a century: C. F. Gauss, C. G. J. Jacobi, and others studied ways of exploiting matrix sparsity. In the 1950s, sparse problems arose in linear programming and in the numerical solution of boundary value problems. The research of D. M. Young and R. S. Varga on iterative methods can also be seen as a result of high-order sparse problems. Modern sparse matrix technology, however, has mainly developed since the 1960s, when some researchers took the direct method as a starting point. Sparse matrices have since penetrated many areas of research, for example, structural analysis, network theory, power distribution systems, chemical engineering, photography, surveying and mapping, and management science, where sparse matrices of order up to hundreds of thousands have appeared.

However, in our survey we did not find any work that transforms ECG signals into a two-dimensional space and combines them with a sparse matrix. In this research, we also found that sparse matrices work well for similarity measurement in ECG identification.

#### 3. Algorithm

In this work, we transform the two-lead ECG signal into two-dimensional coordinates and perform the identification using a sparse matrix. Figure 1 shows the flow of the sparse matrix ECG human identification system, which consists of three steps. First, we map the two-lead ECG signals into two-dimensional coordinates that form a matrix. Then, we reduce the dimensions of the matrix using a special mask, the size of which depends on how many dimensions we want to reduce. We then convert the matrix into a sparse matrix so that it can be stored and addressed easily. The sparse matrix is regarded as the fused features of the two-lead ECG signals. Finally, the feature data for various individuals are used to train the sparse matrix classifier. The detailed formulas for the procedure are as follows.

*(1) Obtaining Two-Lead ECG Signals*. Consider the two-lead ECG signals

$$X_i = \{x_{i,1}, x_{i,2}, \ldots, x_{i,n}\}, \quad i = 1, 2,$$

where the real-valued $x_{i,j}$ corresponds to the $j$th sample of the $i$th ECG lead signal.

*(2) Transforming ECG Signals into Matrix*. Then, we initialize a matrix $M$ to zero, take each pair of input samples $(x_{1,t}, x_{2,t})$ at time $t$, and set

$$M(x_{1,t}, x_{2,t}) = 1.$$

In this procedure, we convert the two lead signals of the ECG into a matrix whose size is $N \times N$. Here, $N$ is the range of the signal values.
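The mapping step can be illustrated with a minimal Python sketch. It assumes integer sample values, uses the first lead as the row index and the second lead as the column index (an assumption, since the orientation is not stated), and borrows the 1300 × 1300 size and the offset of 500 from the experimental setup in Section 4. The function name `signals_to_matrix` is hypothetical.

```python
def signals_to_matrix(lead1, lead2, size=1300, offset=500):
    """Map paired samples of two ECG leads onto a size x size binary matrix.

    Assumption: lead1 gives the row index and lead2 the column index.
    """
    if len(lead1) != len(lead2):
        raise ValueError("both leads must have the same number of samples")
    matrix = [[0] * size for _ in range(size)]
    for a, b in zip(lead1, lead2):
        # Shift the samples so that negative values become valid indices.
        row, col = a + offset, b + offset
        if not (0 <= row < size and 0 <= col < size):
            raise ValueError(f"sample pair ({a}, {b}) falls outside the matrix")
        matrix[row][col] = 1
    return matrix


# Tiny illustration with made-up samples and a small matrix.
m = signals_to_matrix([-1, 0, 2], [1, 0, -2], size=5, offset=2)
```

Because each sample pair only marks one cell, a segment of 1280 samples fills at most 1280 of the 1,690,000 cells, which is why a sparse representation is attractive.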

*(3) Reduced Matrix*. Next, we defined a mask matrix ,where means the dimensions that we want to reduce. For coordination , and coordination , we follow the rule:

*(4) Storing as a Sparse Matrix*. We then store the matrix as a sparse matrix for the subsequent processing.

Here, we use the coordinate format (COO) to store the sparse matrix; that is, we store just three parameters: row, column, and value. As most elements of the matrix are zero and we set each corresponding point to “1,” we can use only the (row, column) pairs to express the full matrix of extracted two-lead ECG data. For example, after processing, we get the ECG data that we extracted from the two-lead signals as Figure 2 shows, where means number of elements of this sparse matrix.
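As a rough illustration of COO storage, the following Python sketch extracts the (row, column, value) triplets from a small dense matrix. The helper `to_coo` and the example matrix are hypothetical; a production system would more likely rely on an existing sparse matrix library.

```python
def to_coo(matrix):
    """Return (rows, cols, values) for the nonzero entries of a dense matrix."""
    rows, cols, values = [], [], []
    for r, line in enumerate(matrix):
        for c, v in enumerate(line):
            if v != 0:
                rows.append(r)
                cols.append(c)
                values.append(v)
    return rows, cols, values


dense = [
    [0, 0, 1],
    [0, 0, 0],
    [1, 1, 0],
]
rows, cols, values = to_coo(dense)

# Every stored value is 1, so the (row, col) pairs alone describe the matrix.
points = set(zip(rows, cols))
```

Storing only the coordinate pairs means memory grows with the number of marked cells rather than with the full matrix size.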

We can express the sparse matrices as

The relative sparse matrix is considered as the features of ECG two-lead signals and will be the input to the correlation coefficient classifier for training purposes and individual identification.

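*(5) Computing Correlation Coefficient*. $R = \mathrm{corrcoef}(X)$ returns a matrix $R$ of correlation coefficients calculated from an input matrix $X$ whose rows are observations and whose columns are variables. The matrix $R$ is related to the covariance matrix $C = \mathrm{cov}(X)$ by

$$R(i,j) = \frac{C(i,j)}{\sqrt{C(i,i)\,C(j,j)}}.$$

$\mathrm{corrcoef}(X)$ removes the mean from each column before calculating the result.

The covariance function is defined as

$$\mathrm{cov}(x_1, x_2) = E\left[(x_1 - \mu_1)(x_2 - \mu_2)\right],$$

where $x_i$ is a vector, $E$ is the mathematical expectation, and $\mu_i = E[x_i]$.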

For an ECG data signal, which is from one unidentified individual. After transferring into a sparse matrix , we calculated the correlation coefficient of , , respectively, where is the template sparse matrix that came from a special individual, and we get = corrcoef . If we have individuals, then we have an output result. In sparse matrix correlation coefficient classification, we define as a threshold of th individual. If , it means we can classify to this corresponding target individual type ; that is, can be classified into the right type only if it belongs to a unique right type.
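The similarity computation can be sketched in Python under one explicit assumption: the correlation coefficient between a test matrix and a template is taken to be the Pearson correlation over all cells of the two binary matrices. The text does not spell out how the matrices are compared, so this is one plausible reading rather than the authors' exact procedure. The function name `binary_corrcoef` is hypothetical, and each matrix is given as a set of (row, col) pairs.

```python
import math


def binary_corrcoef(points_a, points_b, shape):
    """Pearson correlation of two binary matrices stored as sets of (row, col).

    Assumption: both matrices are flattened and compared cell by cell.
    """
    n = shape[0] * shape[1]            # total number of cells
    na, nb = len(points_a), len(points_b)
    nab = len(points_a & points_b)     # cells set to 1 in both matrices
    denom = math.sqrt(na * (n - na) * nb * (n - nb))
    if denom == 0:
        raise ValueError("correlation is undefined for an all-zero or all-one matrix")
    return (n * nab - na * nb) / denom


same = binary_corrcoef({(0, 0), (1, 1)}, {(0, 0), (1, 1)}, (2, 2))
opposite = binary_corrcoef({(0, 0), (1, 1)}, {(0, 1), (1, 0)}, (2, 2))
```

Only the counts of marked cells and their overlap are needed, so the cost depends on the number of stored points rather than on the full matrix size.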

*(6) Setting Threshold for Identification*. To verify how efficient this algorithm is in human identification by ECG two-lead signals, we train the sparse matrix which represents a segment of ECG two-lead sample points to get thresholds of each individual. In these experiments, we define the threshold of the th individual aswhere represents the minimum correlation coefficient in training set of the th individual. is a variable that will be determined in the testing stage. To achieve the optimal identification result, we use increasing circulation to test result.

#### 4. Experimental Results

We conducted a comprehensive experiment on public ECG databases, and we selected the MIT-BIH normal sinus rhythm database [25]. This database includes 18 long-term ECG recordings of subjects referred to the Arrhythmia Laboratory at Boston’s Beth Israel Hospital. The subjects included in this database were found to have had no significant arrhythmias. These ECG data have a sampling rate of 128 Hz and a 12-bit binary representation.

For each individual, 8 segments, each 10 s long, are obtained from the record of the ECG signal in the database. Thus, at the 128 Hz sampling rate, 1280 sample points in each segment are selected for frequency and rank order statistics. We set the matrix to 1300 rows and 1300 columns so that its dimensions can be reduced easily. Some of the ECG sample points are negative, so we add 500 to every sample value to obtain a non-negative number, which avoids problems when mapping the samples into the matrix. We then map the sample points into a 1300 × 1300 matrix. Next, we reduce the dimension of the matrix and store it as a sparse matrix.

In the training stage, each individual has 8 data sets (the segments described above), which are used for training by calculating the correlation coefficients. The training data for our sparse matrix experiment are taken from the MIT-BIH database.

After training on the two-lead ECG data of the 18 individuals with the sparse matrix correlation coefficient, we obtain the threshold of each individual. The test data for identification are also acquired from the same MIT-BIH database. For each individual, we extract 10 further segments, each 1280 sample points long per lead. Note that these 10 segments are obtained at different locations of the ECG signal; that is, none of them overlap with the segments previously selected for the training process. Each segment is passed through the sparse matrix correlation coefficient classifier for the identification matching test. Thus, there are 10 matching tests for each individual.

##### 4.1. Measurement Approaches

We used two approaches to evaluate the algorithms.

###### 4.1.1. Success Rate

This metric measures accuracy. Based on the results of the comparisons between individuals, when the correlation coefficient is smaller than the threshold correlation coefficient, we count it as an identification error. Summing these errors gives the total number of errors; dividing this figure by the total number of comparisons gives the error rate, and the success rate is one minus this value.

###### 4.1.2. False Acceptance (FA) and False Rejection (FR) Rates

These metrics also measure accuracy. The FR rate denotes the ratio of subjects that should be accepted but are actually rejected by the classifier; similarly, the FA rate is the ratio of subjects that should be rejected but are actually accepted by the classifier. The threshold, which is obtained from the training set, is chosen with the aim of minimizing FA and FR.
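The two rates can be computed as in the following minimal Python sketch. The input format, a list of (is_genuine, accepted) pairs with one entry per matching test, and the function name `fa_fr_rates` are assumptions made for illustration; the sample trials are made up.

```python
def fa_fr_rates(trials):
    """Compute (FA, FR) from (is_genuine, accepted) pairs, one per matching test."""
    genuine = [accepted for is_genuine, accepted in trials if is_genuine]
    impostor = [accepted for is_genuine, accepted in trials if not is_genuine]
    if not genuine or not impostor:
        raise ValueError("need at least one genuine and one impostor trial")
    # FR: genuine subjects that were rejected; FA: impostors that were accepted.
    fr = sum(1 for accepted in genuine if not accepted) / len(genuine)
    fa = sum(1 for accepted in impostor if accepted) / len(impostor)
    return fa, fr


trials = [
    (True, True), (True, False),                       # genuine attempts
    (False, False), (False, True), (False, False), (False, False),  # impostors
]
fa, fr = fa_fr_rates(trials)
```

Raising the acceptance threshold typically lowers FA and raises FR, so the two rates have to be balanced against each other.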

##### 4.2. Success Rate Results

Then, we use these to compute the correlation coefficient between testing data and template matrix of each individual to classify and identify the ECG testing data. As the of threshold defined is initialized by some random value, the performance might not be good enough; therefore, the classification should be trained a large number of times; for example, it is five times in our experiment. is initialized to zero, and then it increases by 0.01 to test the identification results. When increases to 0.20, we can find out the most appropriate for classification.

We use maximum correlation coefficient as prior method to calculate success rate. For a sparse matrix which came from one unidentified individual, we compute the correlation coefficient of this sparse matrix and each individual to get corresponding . Since there are 18 individuals in this experiment, we find out the max . And then if , where is the th individual threshold that can refuse data not belonging to its own, we have which belongs to the th individual. Else, does not belong to any individuals of those 18 individuals.
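The decision rule described above can be sketched in Python as follows. The function name `identify_max_corr` and the example numbers are hypothetical; a coefficient equal to the threshold is treated as accepted, consistent with counting only coefficients smaller than the threshold as errors.

```python
def identify_max_corr(coeffs, thresholds):
    """Return the index of the matched individual, or None if the sample is rejected.

    coeffs[i] is the correlation of the test matrix with individual i's template,
    and thresholds[i] is that individual's acceptance threshold.
    """
    best = max(range(len(coeffs)), key=lambda i: coeffs[i])
    return best if coeffs[best] >= thresholds[best] else None


accepted = identify_max_corr([0.2, 0.8, 0.5], [0.6, 0.7, 0.6])
rejected = identify_max_corr([0.2, 0.65, 0.5], [0.6, 0.7, 0.6])
```

In the second call the largest coefficient still falls below its individual's threshold, so the sample is assigned to none of the enrolled individuals.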

As we have 8 segments of data for training, every segment can serve as a sparse matrix template for comparison at the testing stage. Figure 3 shows the success rate obtained by using the maximum correlation coefficient prior method to identify individuals with each sparse matrix template.

Least square is another method for identifying human. For ECG signals which are extracted from MIT-BIH normal sinus rhythm database, we can get the correlation coefficient corresponding to th individual. After that, we compute the square :where means the average correlation coefficient of th individual at training sets. And then we find out the minimum , where the corresponding individual is . That is, . As a result, this ECG signal belongs to th individual. Figure 4 shows the success rate by using least squares correlation coefficient prior method to identify human with each template sparse matrix.
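A minimal Python sketch of this least squares rule follows, assuming the square is the squared difference between the test correlation coefficient and the individual's average training coefficient. The function name `identify_least_squares` and the sample values are hypothetical.

```python
def identify_least_squares(coeffs, train_means):
    """Return the index whose training-average coefficient is closest to the test one.

    coeffs[i] is the test correlation with individual i, and train_means[i] is
    the average correlation coefficient of individual i over the training sets.
    """
    squares = [(c - m) ** 2 for c, m in zip(coeffs, train_means)]
    return min(range(len(squares)), key=lambda i: squares[i])


match = identify_least_squares([0.3, 0.82, 0.5], [0.9, 0.8, 0.85])
```

Unlike the maximum correlation rule, this rule always returns some individual, so a separate threshold is still needed to reject unknown subjects.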

###### 4.2.1. FA and FR Rates Results

Figure 5 summarizes the FA and FR change with the changes. It shows that the FA/FR ratio for the matching test lies between 0.047 and 0.207 for 18 individuals with 10 segments each, which is acceptable for multiple subject classification.

The FA/FR rate of Figure 5 is compared with the 7th template of each individual. Next, we will show experiment result of different templates.

Figure 6(a) displays different FA rates of those eight templates of the sparse matrices. And we can figure out that the FA rate is smaller for all templates when the threshold is small. The best is .

Figure 6(b) displays different FR rates of those eight templates of sparse matrices. And we can figure out that the FR rate is smaller for all templates when the threshold is bigger. We have that when , all templates FR rate is zero.

The accuracy of sparse matrix correlation coefficient algorithm can be calculated by the FA rate and FR rate. In common, the calculation function would be

Although FA rate and FR rate have different tendency when changing threshold , we might find out a suitable by an iterative loop to find a better accuracy .

According to formula (10), we can get higher when is smaller.

As shown in Figure 7, we have eight templates matrix for training and testing ECG data in our SMCC algorithm. When , the result can reach the better performance. So we choose templates and to show FA and FR of each individual by sparse matrix correlation coefficient algorithm here, as Figure 7 shows.

#### 5. Comparison

To compare our method with common one-dimensional algorithms, three ECG identification algorithms are evaluated in this experiment with the same database and comparison method. These three identification algorithms are described below.

First, we list the ECG identification algorithms with single lead signal data.

##### 5.1. Comparing with Reduced Binary Pattern (RBP) Algorithm

This algorithm uses the frequency and rank order statistics of the input ECG signal [26]. For any ECG signal, we can express it as , where represents the th signal from the input data. According to the decrease or increase of two consecutive values, the two-state function, , is mapped onto the values of 0 and 1, respectively:

Through formula (11), the reduced binary pattern is simply represented by one binary sequence consisting of digits 0 and 1.
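The binary mapping can be illustrated with a short Python sketch. It assigns 1 to an increase and 0 to a decrease between consecutive samples, as described above; mapping equal consecutive values to 0 is an assumption, since the text does not cover ties. The function name `reduced_binary_pattern` is hypothetical.

```python
def reduced_binary_pattern(signal):
    """Map each pair of consecutive samples to 1 (increase) or 0 (otherwise).

    Assumption: equal consecutive values are treated as 0.
    """
    return [1 if cur > prev else 0 for prev, cur in zip(signal, signal[1:])]


bits = reduced_binary_pattern([3, 5, 4, 4, 7])
```

A signal of n samples yields n - 1 binary digits, which are then grouped and counted in the ranking step.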

In counting and ranking process, the frequency of each whose value ranges between 0 and is calculated in the counting process. Therefore, we incorporate a weighted distance formula to define the measurement of similarity between and :where and represent the probability and ranking of in the sequence . The absolute difference between two rankings is multiplied by the normalized probabilities as a weighted sum; the factor in the denominator is to ensure all values of lie within the scope of .

##### 5.2. Comparing with Waveform Algorithm

In a waveform-based study [8], a total of 19 features are extracted from the four classes: amplitude (PQ, RQ, TQ, RT, PS, RP, TS, RS, PT, and QS), duration (QS, PR, QR, ST, and QT), slope (RS, ST, and QR), and area (area of the QRS triangle). These features form a feature vector .

After obtaining some waveform feature for the individual difference, we use formula (12), a similarity algorithm to evaluate difference between two individuals. The closeness between two feature-vectors and is considered as their distance ; the intra- and intergroup distances can be evaluated through (12).

##### 5.3. Comparing with Wavelet Transform Algorithm

Wavelet analysis, or the wavelet transform, represents a signal using a finite-length or rapidly decaying oscillating waveform, which is called the mother wavelet.

The procedures of the wavelet-based algorithm [9] in comparison include the following: each R-R cardiac cycle is obtained through R-R detection; an interpolation is performed on the R-R interval so each R-R cardiac cycle holds 284 data points; every R-R cycle is cut into three parts, each containing 85, 156, and 43 points; the first 85 and the last 43 points in each R-R cycle are assembled to form a 128-point segment; every four segments are grouped and an -level discrete wavelet transform (DWT) is performed to obtain the corresponding wavelet coefficients. Four of the computed wavelet coefficients are gathered as a wavelet vector and expressed as

The Euclidean distance between two wavelet vectors and is regarded as their distance; the intra- and intergroup distances can then be calculated through (12).

So far, we have introduced three algorithms for human identification by using one lead of ECG signal. Now we will design an experiment to conduct a comparison between the three algorithms and our sparse matrix algorithm.

In the evaluation using the MIT-BIH normal sinus rhythm database, the comparison of outcomes in Table 1 shows that the RBP, waveform-based, and wavelet transform algorithms perform well, but our two-lead sparse matrix algorithm still outperforms them and has a better accuracy rate on this public database.

##### 5.4. Comparison Result with One-Lead Methods

From Table 1 we know that using two-lead ECG signal for human identification can enhance the identification accuracy. The result demonstrates that there is a great potential of our proposed method in the ECG biometrics system. Next, we will compare two typical two-dimension algorithms with our two-lead ECG algorithm.

##### 5.5. Comparing with Basic Two-Dimensional Method

In Section 3, we described the basic flow of our sparse matrix algorithm with two-lead ECG signal. We reduce the sample points dimension directly through formula (4). Now we propose the basic method to deal with the similarity of two sparse matrices.

As we know, the baseline to measure similarity between two matrices is subtracted for two sparse matrices SM_{1} and SM_{2}, let and then calculate the correlation of and . The following steps to calculate the FA and FR are the same as our sparse matrix.

##### 5.6. Comparing with PCA Method

In this comparison, we use principal component analysis (PCA) to fuse the two-lead ECG signals for identification. PCA is a statistical technique whose purpose is to condense the information of a large set of correlated variables into a few variables, the principal components, while retaining the variability present in the data set [27].

For a matrix which is consisting of sample data. The linear transformation of converting to iswhere is considered as extracting the principal components of the original matrix . And is a linear transformation matrix. Each row of is the eigenvector of matrix , and

As in our method, the PCA algorithm reads the two-lead signals with the same steps. The difference is that the PCA algorithm appends the second-lead data to the first-lead data and then uses PCA to reduce the features. The main features are then chosen for training and testing in human identification.

##### 5.7. Comparison Result with Two-Lead Methods

From our experiment, we determined that when we choose the main 19 features, the result of identification is improved over existing approaches.

Figure 8 shows the comparison of those two algorithms and our sparse matrix algorithm.

Our SMCC algorithm achieves a better FA/FR rate than the other two two-lead ECG identification methods.

#### 6. Conclusions

In this paper, we propose a new ECG identification method based on a two-dimensional sparse matrix algorithm, in which two-lead ECG signals are fused using a sparse matrix approach. Through experiments, we demonstrate that two-lead identification offers improvements over one-lead identification. Two typical two-dimensional classification methods were also compared with our method. Our sparse matrix method achieves around 95.3% accuracy, which is better than the basic two-dimensional method (90.1%) and PCA (63.2%). The results show that our sparse matrix method performs well for ECG identification.

#### Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.