Research Article | Open Access
Turky N. Alotaiby, Fatima Aljabarti, Gaseb Alotibi, Saleh A. Alshebeili, "A Nonfiducial PPG-Based Subject Authentication Approach Using the Statistical Features of DWT-Based Filtered Signals", Journal of Sensors, vol. 2020, Article ID 8849845, 14 pages, 2020. https://doi.org/10.1155/2020/8849845
A Nonfiducial PPG-Based Subject Authentication Approach Using the Statistical Features of DWT-Based Filtered Signals
Nowadays, there is a global shift in lifestyle toward e-services and smart devices, which necessitates the verification of user identity. Different organizations have put in place a range of technologies, hardware, and/or software to authenticate users using fingerprints, iris recognition, and so forth. However, cost and reliability are significant limitations to the use of such technologies. This study presents a nonfiducial PPG-based subject authentication system. In particular, the photoplethysmogram (PPG) signal is first filtered into four signals using the discrete wavelet transform (DWT) and then segmented into frames. Ten simple statistical features are extracted from the frame of each signal band to compose the feature vector. Augmenting the feature vector with the same features extracted from the 1st derivative of the corresponding signal is investigated, along with different fusion approaches. A support vector machine (SVM) classifier is then employed for identity authentication. The proposed authentication system achieved an average authentication accuracy of 99.3% using a 15 sec frame length with the augmented multiband approach.
1. Introduction
Today, with an increasing dependency on information systems, identity authentication has become an essential part of life. From accessing mobile phones to performing online financial transactions, a user needs to authenticate his/her identity. Generally, there are three approaches to user authentication: (1) soft keys (e.g., passwords), (2) hard keys (e.g., smart cards), and (3) biometrics. Traditional authentication techniques such as account passwords and smart cards are still widely used, but they are not reliable enough because they are easily stolen, lost, forgotten, or forged. In recent years, biometric-based identity authentication has been gaining more attention. The International Organization for Standardization (ISO) defines biometrics as the automated recognition of individuals based on their behavioral and biological characteristics. Biometric-based identity authentication aims to uniquely identify individuals based on their physiological and/or behavioral characteristics such as fingerprint, face, retina, palm print, lip movement, gait, DNA, voice, EEG, or ECG [2–19].
In 2003, Gu et al. presented the first attempt at using a PPG signal for identity authentication [20, 21]. Hertzman and Spielman described the PPG in 1937, along with a typical PPG device, which consists of two parts: the light source and the photodetector. The light source emits light to be reflected off tissue, while the photodetector detects and measures the reflected light, which is proportional to blood volume variation. Different light colors are used for different applications, the most popular being red and green. The PPG-based identity authentication technique has several advantages over other biometric approaches: it is easy to set up, simple to implement, and low cost, and the sensor can be worn comfortably on different parts of the body such as the finger or wrist.
The biometric identity authentication system consists of two main phases: enrollment and authentication. The preprocessing and feature extraction stages are common to both phases. In the authentication phase, the decision stage is usually a classification process. Many researchers have proposed PPG-based identity authentication methods using various feature extraction approaches, such as fiducial, nonfiducial, and hybrid, with diverse classifiers. For example, Gu et al. used four fiducial features: the peak number, the upward/downward slopes, and the time interval. A feature vector template was then formulated for each subject, and the decision was made using Euclidean distance. With a dataset of 17 healthy subjects, they achieved 94% identification accuracy. Furthermore, the authors tried a fuzzy-based classification approach. Kavsaoğlu et al. presented a fiducial-based recognition system using 40 features extracted from the PPG signal's first and second derivatives and a KNN classifier. They applied their method to a 15-period PPG signal belonging to 30 healthy subjects, where the best accuracy they achieved was 94.44%. Sarkar et al. proposed a dynamical model-based authentication approach that maps each cardiac cycle to a limit cycle, which facilitates the decomposition of the PPG signal into a sum of Gaussian functions, where the Gaussian parameters serve as feature templates. Using the data of 23 subjects of the DEAP dataset, they achieved identification accuracies of 90% and 95% with 2 and 8 seconds of PPG test signal data, respectively. Jindal et al. proposed a two-stage identification system: in the first stage, individuals are clustered into different groups; then, Boltzmann Machines and Deep Belief Networks are used for identification. They tested their approach using 12 subjects of the Troika dataset and achieved an identification accuracy of 96.1%.
Choudhary and Manikandan presented a template matching identity authentication approach using normalized cross-correlation. They tested their method on 30 subjects and achieved an average false rejection rate (FRR) of 0.32 and a false acceptance rate (FAR) of 0.32. Luque et al. presented a nonfiducial subject authentication approach based on convolutional neural networks and evaluated it on the PulseID and Troika datasets with accuracies of 78.2% and 83.2%, respectively. Yadav et al. proposed a template matching identity authentication approach using the Continuous Wavelet Transform (CWT) for feature extraction and linear discriminant analysis for feature dimensionality reduction. They achieved an equal error rate of 0.46% using the CapnoBase dataset. Biswas et al. presented a subject identification approach using a four-layer deep neural network. They tested their approach using 22 subjects of the Troika dataset and achieved an average accuracy of 96%.
In this study, we present a new nonfiducial approach for subject identity authentication which relies on simple statistical features of DWT-based filtered signals of the PPG signal and a support vector machine classifier. The PPG signal is first filtered into four signals and then segmented into frames. The first derivative of each DWT frame is also computed. Different feature fusion approaches are studied. The proposed authentication system achieves an average authentication accuracy of 99.30% with a frame length of 15 sec when the feature vector is composed of features extracted from frames of all band signals and their corresponding 1st derivatives. This result outperforms the best-performing methods in the literature, as will be demonstrated in Section 3. The paper is structured as follows: Section 2 presents the proposed system, the performance evaluation is presented in Section 3, and Section 4 contains the concluding remarks.
2. System Model
The model of the proposed authentication system is shown in Figure 1. It includes signal acquisition, signal framing, and feature extraction stages in both the enrollment and authentication phases. In signal framing, three tasks are performed: signal smoothing, signal filtering, and signal segmentation. The subject authentication stage makes use of an SVM classifier for identity authentication through a one-versus-all classification approach. In this work, the classifier is trained and evaluated using the CapnoBase dataset, which is the largest publicly available dataset. This dataset was developed at the University of British Columbia, Vancouver, Canada, in 2009 and contains respiratory signals obtained from capnography and spirometry. It also includes the CO2 and photoplethysmogram (PPG) waveforms of 42 subjects: 29 children (median age: 8.7, range 0.8−16.5 years) and 13 adults (median age: 52.4, range 26.2−75.6 years). The pulse peaks from PPG and breaths from CO2 are annotated by experts. Each subject has an 8-minute recording session sampled at 300 Hz.
2.1. Signal Framing
In the signal framing stage, three tasks are performed: PPG signal smoothing, filtering the PPG signal using the DWT, and signal segmentation. A moving median filter with a window size of 44 samples, selected based on extensive experimentation, was employed for smoothing the PPG signal. The filtered DWT signals are then segmented into frames with a nonoverlapping sliding window of size 1, 3, 5, 7, 10, or 15 sec. Different frame lengths are selected, along the lines of other works available in the literature [25, 29], to study the effect of frame length on the authentication system irrespective of individual PPG cycles. Table 1 shows the frame length in seconds and samples, as well as the resulting number of frames.
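As a minimal sketch of the smoothing and segmentation steps (assuming numpy/scipy; the 45-sample kernel approximates the paper's 44-sample window because scipy's median filter requires an odd size, and the function names are ours):

```python
import numpy as np
from scipy.signal import medfilt

FS = 300  # CapnoBase sampling frequency (Hz)

def smooth_ppg(ppg, kernel=45):
    """Moving median smoothing. The paper uses a 44-sample window;
    45 is used here because medfilt requires an odd kernel size."""
    return medfilt(np.asarray(ppg, dtype=float), kernel_size=kernel)

def segment(signal, frame_sec, fs=FS):
    """Split a 1-D signal into nonoverlapping frames of frame_sec seconds,
    discarding any trailing samples that do not fill a whole frame."""
    frame_len = int(frame_sec * fs)
    n_frames = len(signal) // frame_len
    return np.asarray(signal[: n_frames * frame_len]).reshape(n_frames, frame_len)
```

For an 8-minute recording at 300 Hz, for example, a 5 sec window yields 96 frames of 1500 samples each.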
Figure 2 presents the frequency spectrum of the PPG signals averaged over all subjects considered in this study. As can be seen in the figure, the spectral energy is mainly concentrated at frequencies below 12 Hz. Therefore, we use the multiresolution wavelet transform to decompose the PPG signal into four bands covering the range from 0.1 to 18 Hz. The subbands are extracted by passing the PPG signal through an iterated filter bank, as shown in Figure 3. In this work, the four subband coefficients are estimated using the second member of the Daubechies wavelet family (db2) [34, 35]. Figure 4 shows the frequency responses of the corresponding subband filters.
The coefficients of the four subbands are used to reconstruct the four filtered signals, denoted B1, B2, B3, and B4, which are segmented using different frame lengths, as mentioned previously.
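An illustrative sketch with PyWavelets of how such band-limited signals can be reconstructed from DWT coefficients; the 3-level depth and the resulting frequency ranges are our assumptions, as the paper specifies only that four bands covering roughly 0.1 to 18 Hz are produced with the db2 wavelet:

```python
import numpy as np
import pywt

def dwt_band_signals(ppg, wavelet="db2", level=3):
    """Decompose with a `level`-deep DWT and reconstruct one band-limited
    signal per coefficient set (approximation first, then details).
    By linearity, the reconstructions sum back to the original signal."""
    coeffs = pywt.wavedec(ppg, wavelet, level=level)
    bands = []
    for i in range(len(coeffs)):
        # Zero out every coefficient set except the i-th before reconstruction.
        isolated = [c if j == i else np.zeros_like(c) for j, c in enumerate(coeffs)]
        bands.append(pywt.waverec(isolated, wavelet)[: len(ppg)])
    return bands  # level=3 gives four signals, analogous to B1-B4
```

Each reconstructed signal is then segmented exactly like the raw PPG signal.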
2.2. Feature Extraction and Fusion
This subsection discusses the feature extraction approaches adopted in this work. The feature vector is formed by extracting 10 features from each preprocessed frame. The extracted features are the mean, median, variance, standard deviation, interquartile range, the first quartile (Q1), the third quartile (Q3), kurtosis, skewness, and entropy. The definitions of these features are well known and can be found in [36, 37]. We consider eight ways to form the feature vector: (1) features extracted directly from the time-domain PPG signal; (2) features extracted from the 1st derivative of the time-domain PPG signal; (3) the fusion of (1) and (2); (4) features extracted from the i-th filtered signal of the DWT of the PPG signal (i = 1, 2, 3, 4); (5) the fusion of the four vectors in (4); (6) features extracted from the 1st derivative of the i-th filtered signal; (7) the fusion of (4) and (6) for a given band; and (8) the fusion of all vectors in (4) and (6).
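A sketch of the per-frame feature computation with numpy/scipy; the 20-bin histogram used for the entropy estimate is our assumption (the paper cites [36, 37] for the definitions):

```python
import numpy as np
from scipy.stats import entropy, iqr, kurtosis, skew

def frame_features(frame, bins=20):
    """Return the 10 statistical features of one frame: mean, median,
    variance, std, IQR, Q1, Q3, kurtosis, skewness, and entropy."""
    frame = np.asarray(frame, dtype=float)
    q1, q3 = np.percentile(frame, [25, 75])
    hist, _ = np.histogram(frame, bins=bins)
    return np.array([
        frame.mean(), np.median(frame), frame.var(), frame.std(),
        iqr(frame), q1, q3, kurtosis(frame), skew(frame),
        entropy(hist + 1e-12),  # Shannon entropy of the amplitude histogram
    ])
```

A fused feature vector is then simply the concatenation (e.g., `np.concatenate`) of the per-band and per-derivative vectors.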
The main motivation behind computing all these feature vectors is to conduct a comprehensive investigation to determine the most influential features for the authentication process. Figure 5 shows the method used to generate the feature vectors and the fused feature vector. Figure 6(a) is a box plot showing the distribution of the statistical features extracted from a 5 sec frame of subject 1's PPG signal, while Figure 6(b) presents the distribution of the statistical features extracted from the 1st derivative of the same subject's PPG signal. Noticeably, most of the feature values extracted from the 1st derivative are near zero and have low dispersion.
The capability of these features to separate subjects is examined using the t-distributed stochastic neighbor embedding (t-SNE) algorithm, which is commonly used for data dimensionality reduction. The t-SNE algorithm preserves both the local and global structures of data; hence, it facilitates visual inspection. For illustration purposes, we apply the t-SNE algorithm to the features extracted from data frames of seven subjects with a frame length of 5 sec to show the separability of the subjects' frames, as depicted in Figure 7. Subject 2 and subject 5 are well separated from the other subjects. However, subject 1, subject 6, and subject 7 are less separable, as is also the case with subject 3 and subject 4.
Figure 8 presents the results of applying the t-SNE algorithm to the augmented (concatenated) features extracted from the same seven subjects. The figure demonstrates that the subjects' frame separability is now better, which indicates that augmentation would improve the classification stage.
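This kind of visual check can be reproduced with scikit-learn's t-SNE implementation (random data stands in here for the actual per-frame feature vectors):

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(70, 10))  # e.g., 7 subjects x 10 frames x 10 features

# Embed the 10-D feature vectors into 2-D for visual inspection;
# perplexity must be smaller than the number of samples.
emb = TSNE(n_components=2, perplexity=15, random_state=0).fit_transform(X)
```

The 2-D embedding `emb` can then be scatter-plotted with one color per subject, as in Figures 7 and 8.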
Figure 9 shows the methods used to generate the feature vectors derived from the filtered signals. We construct the feature vector in four ways: single filtered signal, multifiltered signals, augmented single filtered signal, and augmented multifiltered signals. In the single filtered signal case, the features are extracted from one filtered signal of a specific band, while in the multifiltered signals case, the features are extracted from all filtered signals and fused (concatenated) to compose a feature vector, as shown in Figure 9. We also study the effect of augmenting the extracted features with features of the 1st derivative of the corresponding signal.
Figure 10 contains boxplots showing the filtered signals’ feature value distribution. Notice that there are variations in the features’ values among the different bands.
Applying the t-SNE algorithm to features extracted from a single filtered signal of the seven subjects mentioned before, with a frame length of 5 sec, shows that the subjects' frames are not well separable, as demonstrated in Figure 11. However, using multifiltered signals improved the separability of the subjects' frames, and the augmented multiband features gave the best separability, as shown in Figure 12. It is relevant here to note that the abbreviation B1-4 means filtered signals B1, B2, B3, and B4.
(a) Features of B1
(b) Features of B2
(c) Features of B3
(d) Features of B4
(a) Fused features of all signals B1-4
(b) Fused features of all signals B1-4 with their corresponding 1st derivatives
In the authentication phase, a one-versus-all classification approach is adopted. The classifier is trained on feature vectors extracted from the data of the target subject and the other subjects. In this case, the positive instances are much fewer than the negative instances in the training dataset. Therefore, to balance the two sample sets, we employed an oversampling strategy that replicates the positive instances to match the number of negative instances. The classifier output is binary: "1" for the target subject and "0" otherwise. In this work, an SVM with a radial basis function (RBF) kernel is utilized as the classifier [40, 41].
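A sketch of this training step with scikit-learn, assuming a per-frame feature matrix `X` and subject labels `y`; the oversampling by replication follows the balancing strategy described above, and the function name is ours:

```python
import numpy as np
from sklearn.svm import SVC

def train_one_vs_all(X, y, target):
    """Train a one-versus-all RBF-kernel SVM for one enrolled subject,
    replicating the positive frames to match the negative count."""
    pos, neg = X[y == target], X[y != target]
    reps = int(np.ceil(len(neg) / len(pos)))
    pos_over = np.tile(pos, (reps, 1))[: len(neg)]  # oversampled positives
    Xb = np.vstack([pos_over, neg])
    yb = np.concatenate([np.ones(len(pos_over)), np.zeros(len(neg))])
    return SVC(kernel="rbf").fit(Xb, yb)  # output: 1 = target, 0 = other
```

One such classifier is trained per enrolled subject, and at authentication time only the claimed subject's classifier is consulted.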
3. Performance Evaluation
In this section, the performance of the proposed approaches is presented and discussed. Three widely used performance metrics are considered: accuracy, equal error rate (EER), and area under the curve (AUC) [31, 42]. The first metric is defined as Accuracy = (TP + TN)/(TP + TN + FP + FN), where a true positive (TP) (true negative (TN)) is a positive (negative) instance that is correctly classified as positive (negative), and a false positive (FP) (false negative (FN)) is a negative (positive) instance that is incorrectly classified as positive (negative). The equal error rate is the error rate at the operating point where the false positive rate and the false negative rate are equal. The AUC is the area under the receiver operating characteristic (ROC) curve, created by plotting the true positive rate against the false positive rate at various threshold settings.
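These metrics can be computed along the following lines with scikit-learn; the nearest-ROC-point approximation of the EER is our choice:

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score, roc_curve

def equal_error_rate(y_true, scores):
    """EER: the operating point where the false positive rate equals
    the false negative rate (nearest point on the ROC curve)."""
    fpr, tpr, _ = roc_curve(y_true, scores)
    fnr = 1 - tpr
    idx = np.argmin(np.abs(fpr - fnr))
    return (fpr[idx] + fnr[idx]) / 2
```

Accuracy and AUC come directly from `accuracy_score` and `roc_auc_score` on the classifier's predictions and decision scores, respectively.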
The results are obtained using the CapnoBase dataset, which is composed of 42 subjects. The feature vectors are extracted from the subjects’ frames and divided into training and testing sets. The training set consists of 60% of the feature vectors of each subject, while the remaining 40% is used to evaluate the trained model.
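The split described above can be sketched with scikit-learn as follows (random stand-in data; `stratify` keeps the 60/40 ratio per subject):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(42 * 20, 10))   # hypothetical: 42 subjects x 20 frames
y = np.repeat(np.arange(42), 20)     # subject label per frame

# 60% of each subject's feature vectors for training, 40% for testing.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, train_size=0.6, stratify=y, random_state=0)
```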
Figure 13 presents the performance results using different frame lengths when the features are extracted directly from the PPG signal’s frames. The approach achieved an average accuracy greater than 93% when the frame length is equal to or greater than 3 seconds. The best average accuracy of 95.03% is achieved using a 15 sec frame length. Augmenting the feature vector with the same 10 features extracted from the 1st derivative of the signal improved the performance results, as illustrated in Figure 13. The augmented method achieved an average accuracy greater than 93.9% with all the frame lengths and achieved the best average accuracy of 97.89% using a 15 sec frame length.
Figure 14 presents the performance results in terms of accuracy when extracting the features from a single filtered signal and from multifiltered signals (B1-4 denotes filtered signals B1, B2, B3, and B4). When only a single filtered signal is considered, the best average authentication accuracy of 90.16% is achieved using the fourth filtered signal (B4) with a 15 sec frame length. Generally, extracting and fusing features from multiple signals yields better performance. Extracting features from the filtered signals B1-4 achieved average accuracies of 91.88%, 96.79%, 97.77%, 98.17%, 98.48%, and 98.69% with 1, 3, 5, 7, 10, and 15 sec frame lengths, respectively. Figure 15 shows the relationship between the frame length and the performance achieved when extracting features from the multifiltered signals (B1-4): as the frame length increases, the accuracy also increases. The EER with a 1 sec frame length is more than twice that obtained with a 15 sec frame length, while the AUC is >96% with a 1 sec frame length and more than 99% with a 15 sec frame length.
Figure 16 presents the performance results of augmenting the features extracted from a single filtered signal and from multifiltered signals with features from the 1st derivative of the corresponding signal. Augmented features extracted from the third (B3) and fourth (B4) filtered signals achieved the highest average accuracies of 94.6% and 94.24%, respectively, with a 15 sec frame length. Augmented features from the multifiltered signals (B1-4) achieved average accuracies of 94.27%, 98.09%, 98.82%, 99.02%, 99.13%, and 99.3% with 1, 3, 5, 7, 10, and 15 sec frame lengths, respectively. Figure 17 shows that the EER and AUC are significantly improved using the augmented features extracted from the multifiltered signals (B1-4).
Table 2 shows the performance of five classifiers, namely, SVM [40, 41], linear discriminant analysis (LDA), Random Forest (RF), Naïve Bayes (NB), and K-Nearest Neighbor (KNN), using the augmented multiband features extracted with a 15 sec frame length. SVM and RF (with 100 decision trees) have similar accuracy, while SVM has the lowest EER among all classifiers.
Table 3 compares the results achieved using the proposed approaches with the performance of identity authentication methods available in the literature. The table includes each method with its reference, the number of subjects considered in the study, and the average authentication accuracy. It is noteworthy that the augmented features extracted from the four filtered signals achieved an average authentication accuracy greater than 99%. In the authors' opinion, the results obtained, which exceed 98% with a 3 sec frame length and reach 99.3% with a 15 sec frame length, demonstrate that the proposed authentication system is suitable for practical applications.
4. Concluding Remarks
In this paper, a nonfiducial PPG-based subject authentication system has been proposed which relies on statistical features and a support vector machine classifier. The photoplethysmogram (PPG) signal is first filtered into four signals using the discrete wavelet transform (DWT) and then segmented into frames. Ten simple statistical features are extracted from the frame of each signal band to compose the feature vector. Augmenting the feature vector with the same features extracted from the 1st derivative of the corresponding signal is investigated, along with different fusion approaches. An SVM classifier is employed for identity authentication. The proposed authentication system achieved average accuracies of 91.88%, 96.79%, 97.77%, 98.17%, 98.48%, and 98.69% with 1, 3, 5, 7, 10, and 15 sec frame lengths, respectively, with features extracted from the filtered signals B1-4. The augmented multifiltered signal method achieved average accuracies of 94.27%, 98.09%, 98.82%, 99.02%, 99.13%, and 99.3% with 1, 3, 5, 7, 10, and 15 sec frame lengths, respectively. There is therefore a trade-off between latency and accuracy, and the choice of frame length depends on the intended application. The identity authentication accuracy obtained is better than that of the highest performing methods currently in the literature, as demonstrated in Table 3. The investigation of different transformation domains and feature extraction methods with larger datasets will be the topic of future work. In addition, there is significant potential for exploring other applications of PPG signals, such as patient diagnosis.
Data Availability
The data used in this study are publicly available at http://www.capnobase.org/.
Conflicts of Interest
The authors declare that there are no conflicts of interest.
Acknowledgments
The authors extend their appreciation to the Deputyship for Research & Innovation, Ministry of Education, in the Kingdom of Saudi Arabia, for funding this research work through project number IFKSURG-1438-092.
References
- L. O’Gorman, “Comparing passwords, tokens, and biometrics for user authentication,” Proceedings of the IEEE, vol. 91, no. 12, pp. 2021–2040, 2003.
- A. Bonissi, R. D. Labati, L. Perico, R. Sassi, F. Scotti, and L. Sparagino, “A preliminary study on continuous authentication methods for photoplethysmographic biometrics,” in 2013 IEEE Workshop on Biometric Measurements and Systems for Security and Medical Applications, pp. 28–33, Napoli, Italy, September 2013.
- A. Fratini, M. Sansone, P. Bifulco, and M. Cesarelli, “Individual identification via electrocardiogram analysis,” BioMedical Engineering OnLine, vol. 14, no. 1, p. 78, 2015.
- L. Hong, A. Jain, S. Pankanti, and R. Bolle, “Identity authentication using fingerprints,” in Audio- and Video-based Biometric Person Authentication. AVBPA 1997. Lecture Notes in Computer Science, vol 1206, J. Bigün, G. Chollet, and G. Borgefors, Eds., pp. 1365–1388, Springer, Berlin, Heidelberg, 1997.
- W. Wenchao and S. Limin, “A fingerprint identification algorithm based on wavelet transformation characteristic coefficient,” in 2012 International Conference on Systems and Informatics (ICSAI2012), pp. 1–3, Yantai, China, May 2012.
- R. Ranjan, A. Bansal, J. Zheng et al., “A fast and accurate system for face detection, identification, and verification,” IEEE Transactions on Biometrics, Behavior, and Identity Science, vol. 1, no. 2, pp. 82–96, 2019.
- J. H. Lai, P. C. Yuen, and G. C. Feng, “Face recognition using holistic Fourier invariant features,” Pattern Recognition, vol. 34, no. 1, pp. 95–109, 2001.
- N. K. Shaydyuk and T. Cleland, “Biometric identification via retina scanning with liveness detection using speckle contrast imaging,” in 2016 International Carnahan Conference on Security Technology (ICCST), pp. 1–5, Orlando, FL, USA, October 2016.
- S. Haware and A. Barhatte, “Retina based biometric identification using SURF and ORB feature descriptors,” in 2017 International conference on Microelectronic Devices, Circuits and Systems (ICMDCS), pp. 1–6, Vellore, India, August 2017.
- I. Awate and B. A. Dixit, “Palm print based person identification,” in 2015 International Conference on Computing Communication Control and Automation, pp. 781–785, Pune, India, February 2015.
- B. Zhang, W. Li, P. Qing, and D. Zhang, “Palm-Print classification by global features,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 43, no. 2, pp. 370–378, 2013.
- L. Lu, J. Yu, Y. Chen et al., “Lip reading-based user authentication through acoustic sensing on smartphones,” IEEE/ACM Transactions on Networking, vol. 27, no. 1, pp. 447–460, 2019.
- M. I. Faraj and J. Bigun, “Motion features from lip movement for person authentication,” in 18th International Conference on Pattern Recognition (ICPR'06), pp. 1059–1062, Hong Kong, China, August 2006.
- L. V. R. Asuncion, J. X. P. De Mesa, P. K. H. Juan, N. T. Sayson, and A. R. Dela Cruz, “Thigh motion-based gait analysis for human identification using inertial measurement units (IMUs),” in 2018 IEEE 10th International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment and Management (HNICEM), pp. 1–6, Baguio city, Philippines, November-December 2018.
- C. Y. Yam, M. S. Nixon, and J. N. Carter, “Performance analysis on new biometric gait motion model,” in Proceedings Fifth IEEE Southwest Symposium on Image Analysis and Interpretation, pp. 31–34, Sante Fe, NM, USA, April 2002.
- D. Yang, X. An, S. Liu, F. He, and D. Ming, “Using convolutional neural networks for identification based on EEG signals,” in 2018 10th International Conference on Intelligent Human-Machine Systems and Cybernetics (IHMSC), pp. 119–122, Hangzhou, China, August 2018.
- G.-Y. Choi, S.-I. Choi, and H.-J. Hwang, “Individual identification based on resting-state EEG,” in 2018 6th International Conference on Brain-Computer Interface (BCI), pp. 1–4, GangWon, South Korea, January 2018.
- J. S. Paiva, D. Dias, and J. P. S. Cunha, “Beat-ID: Towards a computationally low-cost single heartbeat biometric identity check system based on electrocardiogram wave morphology,” PLoS One, vol. 12, no. 7, article e0180942, 2017.
- X. Dong, W. Si, and W. Huang, “ECG-based identity recognition via deterministic learning,” Biotechnology & Biotechnological Equipment, vol. 32, no. 3, pp. 769–777, 2018.
- Y. Y. Gu, Y. Zhang, and Y. T. Zhang, “A novel biometric approach in human verification by photoplethysmographic signals,” in 4th International IEEE EMBS Special Topic Conference on Information Technology Applications in Biomedicine, 2003, pp. 13-14, Birmingham, UK, April 2003.
- Y. Y. Gu and Y. T. Zhang, “Photoplethysmographic authentication through fuzzy logic,” in IEEE EMBS Asian-Pacific Conference on Biomedical Engineering, 2003, pp. 136-137, Kyoto, Japan, October 2003.
- A. B. Hertzman and C. Spielman, “Observations on the finger volume pulse recorded photoelectrically,” American Journal of Physiology, vol. 119, pp. 334-335, 1937.
- C. Wang, Z. Li, and X. Wei, “Monitoring heart and respiratory rates at radial artery based on PPG,” Optik, vol. 124, no. 19, pp. 3954–3956, 2013.
- A. R. Kavsaoğlu, K. Polat, and M. R. Bozkurt, “A novel feature ranking algorithm for biometric recognition with PPG signals,” Computers in Biology and Medicine, vol. 49, pp. 1–14, 2014.
- A. Sarkar, A. Lynn Abbott, and Z. Doerzaph, “Biometric authentication using photoplethysmography signals,” in 2016 IEEE 8th International Conference on Biometrics Theory, Applications and Systems (BTAS), Niagara Falls, NY, USA, September 2016.
- S. Koelstra, C. Muhl, M. Soleymani et al., “Deap: a database for emotion analysis using physiological signals,” IEEE Transactions on Affective Computing, vol. 3, no. 1, pp. 18–31, 2012.
- V. Jindal, J. Birjandtalab, M. Baran Pouyan, and M. Nourani, “An adaptive deep learning approach for PPG-based identification,” in 2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Orlando, FL, USA, August 2016.
- Z. Zhang, Z. Pi, and B. Liu, “TROIKA: a general framework for heart rate monitoring using wrist-type photoplethysmographic signals during intensive physical exercise,” IEEE Transactions on Biomedical Engineering, vol. 62, no. 2, pp. 522–531, 2015.
- T. Choudhary and M. S. Manikandan, “Robust photoplethysmographic (PPG) based biometric authentication for wireless body area networks and m-health applications,” in 2016 Twenty Second National Conference on Communication (NCC), Guwahati, India, March 2016.
- J. Luque, G. Cortès, C. Segura, A. Maravilla, J. Esteban, and J. Fabregat, “End-to-end photoplethysmography (PPG) based biometric authentication by using convolutional neural networks,” in 2018 26th European Signal Processing Conference (EUSIPCO), Rome, Italy, September 2018.
- U. Yadav, S. N. Abbas, and D. Hatzinakos, “Evaluation of PPG biometrics for authentication in different states,” in 2018 International Conference on Biometrics (ICB), Gold Coast, QLD, Australia, February 2018.
- W. Karlen, S. Raman, J. M. Ansermino, and G. A. Dumont, “Multiparameter respiratory rate estimation from the photoplethysmogram,” IEEE Transactions on Biomedical Engineering, vol. 60, no. 7, pp. 1946–1953, 2013.
- D. Biswas, L. Everson, M. Liu et al., “CorNET: deep learning framework for PPG-based heart rate estimation and biometric identification in ambulant environment,” IEEE Transactions on Biomedical Circuits and Systems, vol. 12, no. 2, pp. 282–291, 2019.
- I. Daubechies, Ten Lectures on Wavelets, CBMS-NSF Regional Conference Series in Applied Mathematics, SIAM Ed, Philadelphia, PA, USA, 1992.
- S. G. Mallat, “A theory for multiresolution signal decomposition: the wavelet representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence., vol. 11, no. 7, pp. 674–693, 1989.
- NIST/SEMATECH, e-Handbook of Statistical Methods, 2016, http://www.itl.nist.gov/div898/handbook/.
- M. Borowska, “Entropy-based algorithms in the analysis of biomedical signals,” Studies in Logic, vol. 43, no. 1, pp. 21–32, 2015.
- L. V. D. Maaten and G. Hinton, “Visualizing data using t-SNE,” Journal of Machine Learning Research, vol. 9, pp. 2579–2605, 2008.
- H. He and E. A. Garcia, “Learning from imbalanced data,” IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 9, pp. 1263–1284, 2009.
- T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, second edition, Springer, New York, NY, USA, 2008.
- N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, Cambridge, UK, 2000.
- J. Han and M. Kamber, Data Mining: Concepts and Techniques (The Morgan Kaufmann Series in Data Management Systems), Morgan Kaufmann, San Mateo, CA, USA, 2000.
- R. A. Fisher, “The use of multiple measurements in taxonomic problems,” Annals of Eugenics, vol. 7, no. 2, pp. 179–188, 1936.
- L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
- G. H. John and P. Langley, “Estimating continuous distributions in Bayesian classifiers,” pp. 338–345, 1995, http://arxiv.org/abs/1302.4964.
- T. M. Cover and P. E. Hart, “Nearest neighbor pattern classification,” IEEE Transactions on Information Theory, vol. 13, no. 1, pp. 21–27, 1967.
- D. Chowdhury, M. Sarkar, G. Rabbi, and M. Z. Haider, “A photoplethysmography based noninvasive cardiac activity monitoring device offering a customized android application,” in 2018 10th International Conference on Electrical and Computer Engineering (ICECE), pp. 34–37, Dhaka, Bangladesh, December 2018.
Copyright © 2020 Turky N. Alotaiby et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.