Abstract

Smartphone photoplethysmography is a newly developed technique that can detect several physiological parameters from the photoplethysmographic signal obtained by the built-in camera of a smartphone. It is simple, low-cost, and easy to use, with great potential for use in remote medicine and home healthcare services. However, the determination of the optimal region of interest (ROI), which is an important issue for extracting photoplethysmographic signals from the camera video, has not been well studied. We herein propose five algorithms for ROI selection: variance (VAR), spectral energy ratio (SER), template matching (TM), temporal difference (TD), and gradient (GRAD). Their performance was evaluated in a 50-subject experiment comparing the heart rates measured from the electrocardiogram with those measured from the smartphone using the five algorithms. The results revealed that the TM and the TD algorithms outperformed the other three, as they had a smaller standard error of estimate (<1.5 bpm) and narrower limits of agreement (<3 bpm). The TD algorithm was slightly better than the TM algorithm and more suitable for smartphone applications. These results may help to improve the accuracy of physiological parameter measurement and to make the smartphone photoplethysmography technique more practical.

1. Introduction

Photoplethysmography (PPG) is an optical technique that can detect blood volume changes in the microvascular bed of tissue by using a light source and a detector [1]. The light source emits light of certain wavelengths that propagates through the microvascular bed of tissue and is received by the photoelectric detector. According to the Lambert-Beer law, the light absorbed by blood is associated with the blood volume. Hence, the intensity of the light received by the detector changes synchronously with the blood volume in each heartbeat. This technique is easy to use and low in cost. It has been utilized in medical devices such as pulse oximeters and is widely used in clinical applications to measure heart rate and blood oxygen saturation [2]. With the development of modern digital signal processing techniques, PPG can also be used to detect breath rate [3, 4], blood pressure [5, 6], cardiac output [7, 8], arterial stiffness [9, 10], heart rate variability [11], and other underlying physiological information [12].

In recent years, a new kind of PPG technique based on the smartphone has been proposed. This smartphone PPG (sPPG) acquires signals from the built-in camera of the smartphone. One only needs to place a finger on the camera lens and record a video with the built-in LED flash turned on; then several physiological parameters such as heart rate [13–17], respiratory rate [15, 18], pulse volume [17], and oxygen saturation [15] can be estimated from the sPPG signals. With the help of a microphone to detect heart sounds, blood pressure can also be estimated [19]. The sPPG technique requires no hardware other than a smartphone; it needs only software installed on the smartphone, which can be used anywhere, at any time, by anyone. As smartphones are becoming ubiquitous, the sPPG technique shows promise for remote medicine and home healthcare services.

The sPPG is much the same as the traditional PPG (tPPG, e.g., a pulse oximeter), only replacing the light source and the detector with a LED flash and a camera, respectively. However, the sPPG signals are videos with three dimensions (two-dimensional image and one-dimensional time) while the tPPG signals are time-series with only one dimension. It is necessary to reduce the dimensions of the sPPG to one dimension for digital signal processing. The general approach to deal with the sPPG is to select a region of interest (ROI) where the light intensity changes markedly in the video frames and then to calculate the average intensity of the ROI for each frame to generate a time-series waveform.
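The dimension reduction described above can be sketched in a few lines of Python. This is only an illustration: the array layout `(frames, rows, cols)`, the function name `roi_waveform`, and the ROI coordinates are assumptions, and the random array merely stands in for one colour channel of a recorded video.

```python
import numpy as np

def roi_waveform(video, top, left, height, width):
    """Average the pixel intensity inside a rectangular ROI for every frame.

    video: array of shape (frames, rows, cols) holding one colour channel.
    Returns a 1-D array with one sample per frame.
    """
    roi = video[:, top:top + height, left:left + width]
    return roi.mean(axis=(1, 2))

# Synthetic stand-in for a recording: 90 frames of 48 x 64 pixels (assumed sizes).
rng = np.random.default_rng(0)
video = rng.uniform(0.0, 255.0, size=(90, 48, 64))

# Hypothetical 16 x 16 ROI; the result has one intensity value per frame.
waveform = roi_waveform(video, top=16, left=24, height=16, width=16)
```

The three-dimensional video is thus reduced to a one-dimensional time series that can be handled like a tPPG signal.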

The selection of the ROI is an important factor affecting the quality of the waveform and the subsequent accuracy of the physiological parameter measurement. However, to the best of our knowledge, it has not been well studied in the literature. Matsumura et al. averaged each video frame over all of the pixels; that is, they set the whole frame as the ROI [20]. Jonathan and Leahy and Scully et al. chose a fixed central region in each frame as the ROI [13, 15]. Chandrasekaran et al. split video frames into four quadrants and empirically selected the first quadrant as the ROI [19]. Karlen et al. selected the best ROI with the maximal pulsatile amplitude after comparing 88 blue channels [21]. To improve the reliability, two other methods were developed: Kurylyak et al. calculated the radius of the fitting circle after binarizing each frame [22], and Po et al. developed a frame-adaptive ROI method to avoid color saturation or cut-off distortion [23]. Both of these methods use a time-varying ROI, so changes in the ROI itself can be confounded with the time-varying intensity of the pixels.

Furthermore, in our previous work of extracting heart rate variability from smartphone photoplethysmograms [24], we found that the selection of ROI had an impact on the quality of the waveform and the conventional fixed ROI was not satisfactory. Therefore, we proposed five algorithms to further investigate the determination of the optimal ROI. These algorithms are variance (VAR), spectral energy ratio (SER), template matching (TM), temporal difference (TD), and gradient (GRAD), and their performances were evaluated using a 50-subject experiment.

2. Methods

2.1. Variance (VAR)

Every video frame is divided into $M$ rows and $N$ columns, $M \times N$ blocks in total. $M$ and $N$ are set to proper values so that every block has a suitable size. Then the average intensity of each block is calculated for each frame to generate a time-series waveform. The waveform passes through a 4th-order Butterworth filter with a passband of 0.5 to 8 Hz to remove baseline wander and high-frequency noise [25]. Then the variance of the filter output is calculated. Finally, the block generating the waveform with the maximal variance is selected as the optimal ROI, because the PPG signal is sine-like and the maximal variance corresponds to the maximal signal power.
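A minimal sketch of variance-based selection is given below, under stated assumptions. The grid dimensions are passed as `m` and `n`, and the function names and synthetic video are hypothetical. To keep the example dependent on NumPy only, a brick-wall FFT band-pass over 0.5–8 Hz stands in for the Butterworth filter of the text, so the filter response is not the one described above.

```python
import numpy as np

def block_waveforms(video, m, n):
    """Mean intensity of each of the m x n blocks, frame by frame.

    video: (frames, rows, cols); rows and cols are cropped to multiples of m, n.
    Returns an array of shape (frames, m, n).
    """
    t, h, w = video.shape
    bh, bw = h // m, w // n
    cropped = video[:, :bh * m, :bw * n]
    return cropped.reshape(t, m, bh, n, bw).mean(axis=(2, 4))

def fft_bandpass(x, fs, low=0.5, high=8.0):
    """Brick-wall band-pass along axis 0 (a stand-in, not a Butterworth filter)."""
    spectrum = np.fft.rfft(x, axis=0)
    freqs = np.fft.rfftfreq(x.shape[0], d=1.0 / fs)
    spectrum[(freqs < low) | (freqs > high)] = 0.0
    return np.fft.irfft(spectrum, n=x.shape[0], axis=0)

def select_roi_var(video, m, n, fs):
    """Return (row, col) of the block whose filtered waveform has maximal variance."""
    waves = fft_bandpass(block_waveforms(video, m, n), fs)
    variances = waves.var(axis=0)
    return np.unravel_index(np.argmax(variances), variances.shape)

# Synthetic video: sensor-like noise plus a 1.2 Hz pulse confined to one block.
fs = 30.0
t = np.arange(300) / fs
rng = np.random.default_rng(1)
video = 100.0 + rng.normal(0.0, 0.5, size=(300, 40, 60))
video[:, 10:20, 30:45] += 5.0 * np.sin(2 * np.pi * 1.2 * t)[:, None, None]

row, col = select_roi_var(video, m=4, n=4, fs=fs)  # pulsatile block is (1, 2)
```

With a 4 x 4 grid on a 40 x 60 frame, each block is 10 x 15 pixels, so the pulsatile region coincides with block (1, 2).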

2.2. Spectral Energy Ratio (SER)

The frame division and waveform generation are the same as described in Section 2.1, but without filtering, and the SER rather than the variance of the waveform is calculated. The SER was first introduced by Lee and Wei for spectral analysis of pulse signals [26]. We modified its definition to be the ratio of the energy in the range of 0.5–3 Hz to the total energy of the waveform (1). The range 0.5–3 Hz is chosen because the frequency of heartbeats is usually in this range, corresponding to normal heart rates from 30 to 180 beats per minute (bpm). A higher SER indicates a larger proportion of cardiac activity and a smaller proportion of noise and interference in the total energy. Thus the block with the maximal SER is selected as the optimal ROI:

$$\mathrm{SER} = \frac{\int_{0.5}^{3} P(f)\,df}{\int_{0}^{f_s/2} P(f)\,df}, \tag{1}$$

where $P(f)$ is the power spectrum and $f_s$ is the frame rate of the camera.
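The SER definition can be illustrated with a discrete power spectrum. The sketch below is an assumption-laden example: the function name is hypothetical, the one-sided FFT power spectrum approximates the continuous spectrum, and the test signals are zero-mean sinusoids chosen so that the expected ratios are easy to verify. Whether the DC component is included in the total energy is not stated in the text; here it is included, which makes no difference for zero-mean inputs.

```python
import numpy as np

def spectral_energy_ratio(x, fs, low=0.5, high=3.0):
    """Energy within [low, high] Hz divided by the total energy of the waveform."""
    power = np.abs(np.fft.rfft(x)) ** 2          # one-sided power spectrum
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)  # bins from 0 to fs / 2
    band = (freqs >= low) & (freqs <= high)
    return power[band].sum() / power.sum()

fs = 30.0
t = np.arange(300) / fs
cardiac = np.sin(2 * np.pi * 1.2 * t)        # 72 bpm component, inside the band
interference = np.sin(2 * np.pi * 6.0 * t)   # equal-power component outside the band

ser_clean = spectral_energy_ratio(cardiac, fs)                 # close to 1
ser_mixed = spectral_energy_ratio(cardiac + interference, fs)  # close to 0.5
```

A waveform dominated by in-band energy scores near 1, and adding out-of-band energy of equal power halves the score.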

2.3. Template Matching (TM)

The frame division, waveform generation, and filtering of the TM algorithm are the same as described in Section 2.1. Then the waveform is cross-correlated with a template, which is a typical PPG signal shown in Figure 1, to measure the similarity between them. The cross-correlation is realized by a matched filter [27]:

$$y(n) = x(n) * h(n),$$

where $x(n)$ is the input, $y(n)$ is the output, and $h(n)$ is the impulse response, which is the same as the template except flipped left-for-right. Afterwards, the similarity is quantified as the amplitude of $y(n)$. The higher the similarity is, the more closely the waveform matches the template. Thus the block with the maximal similarity is selected as the optimal ROI.
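A matched filter of this kind can be sketched as a convolution with the time-reversed template. The example is illustrative only: the template below is a hypothetical two-harmonic pulse shape, not the template of Figure 1, and taking the maximum absolute output as "the amplitude" is an assumption about how the similarity is quantified.

```python
import numpy as np

def template_similarity(waveform, template):
    """Peak absolute output of a matched filter built from the template."""
    impulse_response = template[::-1]  # template flipped left-for-right
    output = np.convolve(waveform, impulse_response, mode="full")
    return np.max(np.abs(output))

# Hypothetical template: one cardiac cycle at 1 Hz sampled at 30 fps.
fs = 30
k = np.arange(fs)
template = np.sin(2 * np.pi * k / fs) + 0.5 * np.sin(4 * np.pi * k / fs)

pulse_like = np.tile(template, 10)  # ten repetitions of the template
rng = np.random.default_rng(2)
noise_only = rng.normal(0.0, 0.3, size=pulse_like.size)

sim_pulse = template_similarity(pulse_like, template)
sim_noise = template_similarity(noise_only, template)
```

For the pulse-like input, the peak output equals the energy of the template (18.75 for this shape), while the noise-only input yields a much smaller value.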

2.4. Temporal Difference (TD)

TD is a commonly used algorithm to separate moving objects from the background [28]. It can also be applied to sPPG video processing. First, the TD is calculated as the absolute difference of the intensity of each pixel between two adjacent frames. It reflects the intensity variation of each pixel. In most cases, the value of the TD is small and sensitive to noise. Therefore, the TDs of all pairs of adjacent frames within a time interval are summed to reduce the effect of the noise. The time interval can be set to 2 s or longer so that it covers at least one cardiac cycle (a full cycle lasts 2 s at 30 bpm) and thus completely reflects the intensity variation caused by cardiac activity. Thereafter, the summed TD map is divided into $M$ rows and $N$ columns, $M \times N$ blocks in total. The average TD value in each block is calculated and the block with the greatest average is selected as the optimal ROI.
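The TD procedure operates directly on the frames, which the following sketch illustrates. The function name, the grid arguments `m` and `n`, and the synthetic 2 s video are assumptions for illustration; a real recording stored as unsigned 8-bit integers must be converted to a floating-point type first, as done here, so that the differences do not wrap around.

```python
import numpy as np

def select_roi_td(video, m, n):
    """Return (row, col) of the block with the largest summed temporal difference.

    video: (frames, rows, cols) covering the chosen time interval.
    """
    frames = video.astype(np.float64)
    td_map = np.abs(np.diff(frames, axis=0)).sum(axis=0)  # summed |frame difference|
    h, w = td_map.shape
    bh, bw = h // m, w // n
    blocks = td_map[:bh * m, :bw * n].reshape(m, bh, n, bw).mean(axis=(1, 3))
    return np.unravel_index(np.argmax(blocks), blocks.shape)

# Synthetic 2 s clip at 30 fps: noise everywhere, a 1.2 Hz pulse in one block.
fs = 30.0
t = np.arange(60) / fs
rng = np.random.default_rng(3)
video = 100.0 + rng.normal(0.0, 0.5, size=(60, 40, 60))
video[:, 10:20, 30:45] += 5.0 * np.sin(2 * np.pi * 1.2 * t)[:, None, None]

row, col = select_roi_td(video, m=4, n=4)  # pulsatile block is (1, 2)
```

No per-block waveform or filtering is needed before the selection, which is why this approach is light on computation.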

2.5. Gradient (GRAD)

From some preliminary results of the four algorithms mentioned above (Figure 2), we observed that the optimal ROI was neither the brightest block nor the darkest one; it often lay in the blocks with medium intensity between the brightest and the darkest, namely, the transition region where the pixel intensity changes significantly. In light of this observation, we proposed the GRAD algorithm. First, one frame is chosen from the sPPG video and its gradient is calculated. Then the gradient map is divided into $M$ rows and $N$ columns, $M \times N$ blocks in total. The average gradient of each block is calculated and finally the block with the greatest average gradient is selected as the optimal ROI.
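The gradient-based selection can be sketched as follows. Several details are assumptions: the text does not say which gradient operator is used, so central differences (`np.gradient`) and the gradient magnitude are chosen here, and the synthetic frame (bright on the left, dark on the right, with a mild vertical falloff) is a hypothetical stand-in for a fingertip image.

```python
import numpy as np

def select_roi_grad(frame, m, n):
    """Return (row, col) of the block with the largest mean gradient magnitude."""
    gy, gx = np.gradient(frame.astype(np.float64))  # derivatives along rows, cols
    magnitude = np.hypot(gx, gy)
    h, w = magnitude.shape
    bh, bw = h // m, w // n
    blocks = magnitude[:bh * m, :bw * n].reshape(m, bh, n, bw).mean(axis=(1, 3))
    return np.unravel_index(np.argmax(blocks), blocks.shape)

# Synthetic 40 x 60 frame with a smooth bright-to-dark transition near column 37.
rows = np.arange(40)[:, None]
cols = np.arange(60)[None, :]
horizontal = 1.0 / (1.0 + np.exp((cols - 37.0) / 2.0))
vertical = 1.0 - 0.0002 * (rows - 14.5) ** 2
frame = 255.0 * horizontal * vertical

row, col = select_roi_grad(frame, m=4, n=4)  # transition block is (1, 2)
```

The selected block lies in the transition region rather than in the brightest or darkest part of the frame, in line with the observation above.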

3. Evaluation

We evaluated the effectiveness of the aforementioned five algorithms with a 50-subject experiment. The experiment was approved by the Institutional Review Board of Shenzhen Institutes of Advanced Technology (registration number: SIAT-IRB-140215-H0040). The subjects included 34 males and 16 females, age 20–31 years, height 150–183 cm, and weight 40–90 kg. They were all healthy without any known diseases and their written informed consent was obtained.

In the experiment, all the subjects were instructed to lie on a mattress and to place their right index finger on the camera lens of an HTC S510e smartphone with the built-in LED flash turned on. A camera application (APP) in the smartphone recorded the video of the fingertip for 1 minute with a resolution of pixels at the sampling rate of 30 frames per second (fps). Simultaneously, a Finometer MIDI (Model II, Finapres Medical Systems B.V., The Netherlands) was used to collect the electrocardiogram (ECG) signals at a sampling rate of 200 Hz and automatically stored the signals in the computer by a BeatScope Easy software (Finapres Medical Systems B.V., The Netherlands). The subjects were asked to keep as still as possible throughout the recording period.

The five algorithms introduced in Section 2 were employed to determine the optimal ROI. For comparison, a sixth algorithm that sets a fixed central region (FCR) of the frame as the ROI was also employed, since it is the most commonly used in the literature [13, 15]. Then the time-series waveform of the selected ROI for each algorithm was generated as the average intensity in the red channel of the ROI for each frame. Afterwards, the waveform was filtered and processed by the fast Fourier transform (Figure 3) to estimate the heart rate from the spectrum near the heartbeat frequency [13], that is, within about 0.5–3 Hz, which corresponds to a normal heart rate of 30–180 bpm.
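One common way to read a heart rate from the spectrum is to take the frequency of the strongest component within 0.5–3 Hz. The sketch below assumes this peak-picking rule, which the text does not spell out; the function name and the synthetic 1-minute waveform are likewise hypothetical.

```python
import numpy as np

def heart_rate_fft(x, fs, low=0.5, high=3.0):
    """Heart rate in bpm from the dominant spectral peak within [low, high] Hz."""
    x = x - np.mean(x)  # remove the DC level
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    band = (freqs >= low) & (freqs <= high)
    peak_frequency = freqs[band][np.argmax(power[band])]
    return 60.0 * peak_frequency

# Synthetic 1-minute waveform at 30 fps: 1.25 Hz pulse, second harmonic, and noise.
fs = 30.0
t = np.arange(1800) / fs
rng = np.random.default_rng(4)
waveform = (200.0 + np.sin(2 * np.pi * 1.25 * t)
            + 0.3 * np.sin(2 * np.pi * 2.5 * t)
            + rng.normal(0.0, 0.2, size=t.size))

hr = heart_rate_fft(waveform, fs)  # 1.25 Hz corresponds to 75 bpm
```

With a 1-minute recording the frequency resolution is 1/60 Hz, that is, 1 bpm per spectral bin.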

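On the other hand, the “true” heart rate was calculated by ECG analysis. R-wave peaks of the ECG were detected using Pan and Tompkins’ algorithm [29] and the heart rate was determined as 60 times the inverse of the mean R-to-R interval (RRI), as shown in the following equation:

$$\mathrm{HR} = \frac{60}{\overline{\mathrm{RRI}}},$$

where $\overline{\mathrm{RRI}}$ is the mean R-to-R interval in seconds and HR is the heart rate in bpm.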
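As a small worked example of this calculation, the snippet below converts a list of R-peak times into a heart rate. The function name and the peak times are hypothetical; R-peak detection itself is not shown.

```python
def heart_rate_from_r_peaks(r_peak_times):
    """Heart rate in bpm as 60 divided by the mean R-to-R interval in seconds."""
    intervals = [later - earlier
                 for earlier, later in zip(r_peak_times, r_peak_times[1:])]
    mean_rri = sum(intervals) / len(intervals)
    return 60.0 / mean_rri

# Hypothetical R-peak times in seconds, 0.8 s apart.
r_peaks = [0.0, 0.8, 1.6, 2.4, 3.2]
hr_ecg = heart_rate_from_r_peaks(r_peaks)  # 60 / 0.8 = 75 bpm
```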

To evaluate the accuracy of the six algorithms for ROI selection, the heart rates estimated from the sPPG were compared with those estimated from the ECG by using statistical analysis (note that two subjects were excluded for the FCR algorithm, as explained below). As shown in Table 1, the Pearson correlation coefficients for VAR, TM, TD, and GRAD were all greater than 0.95, whereas those for SER and FCR were not. All six algorithms had a standard error of estimate (SEE) of less than 5 bpm, and for the TM and TD it was less than 2 bpm. The Bland-Altman analysis showed that all six algorithms had a bias of less than 1 bpm. The VAR, TM, TD, GRAD, and FCR had limits of agreement (LA) of less than 5 bpm, but the SER had an LA greater than 5 bpm. In Figure 4, the Bland-Altman plots revealed that more than 95% of the data points fell within the LA for all six algorithms.
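The statistics named above can be computed as in the following sketch. The definitions are the conventional ones and are assumptions about the exact formulas used in the study: the SEE is taken as the residual standard deviation of a linear regression with n - 2 degrees of freedom, and the LA as the bias plus or minus 1.96 standard deviations of the differences. The heart rate values are made up for illustration and are not data from the experiment.

```python
import numpy as np

def agreement_stats(reference, estimate):
    """Pearson r, SEE, Bland-Altman bias, and limits of agreement (all in bpm)."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    r = np.corrcoef(reference, estimate)[0, 1]
    slope, intercept = np.polyfit(reference, estimate, 1)
    residuals = estimate - (slope * reference + intercept)
    see = np.sqrt(np.sum(residuals ** 2) / (len(reference) - 2))
    differences = estimate - reference
    bias = differences.mean()
    half_width = 1.96 * differences.std(ddof=1)
    return {"r": r, "see": see, "bias": bias,
            "la": (bias - half_width, bias + half_width)}

# Made-up heart rates (bpm) for illustration only.
hr_ecg = [60, 72, 85, 66, 90, 78]
hr_sppg = [61, 71, 85, 67, 89, 78]
stats = agreement_stats(hr_ecg, hr_sppg)
```

In this toy data set the differences average to zero, so the bias vanishes and the limits of agreement are symmetric about zero.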

Previous research has suggested that the correlation coefficient should be greater than 0.90 and the SEE should be less than 5 bpm for heart rate monitors [16]. Accordingly, all six algorithms are valid, which indicates that the sPPG technique can provide accurate measurement of heart rate. However, the performance of the algorithms differs. The FCR algorithm failed in two subjects because the intensity in the central region of their videos was saturated and no signals could be extracted, whereas the five proposed algorithms always found the optimal ROI and calculated the heart rate.

As to the five proposed algorithms, in general, the TM and the TD algorithms were better than the other three because they had greater correlation coefficients, smaller SEE, and smaller LA. The SER was worse than the other four because it had the smallest correlation coefficient, the largest SEE, the largest bias, and the largest LA. The performance of the VAR and GRAD was in the middle.

4. Discussion

4.1. Advantages and Disadvantages of the Five Algorithms

All the five algorithms are simple in principle and easy to implement. Each of them can select the optimal ROI according to its own decision rules. The VAR, the SER, and the TM are based on waveform processing so that they have to perform frame division and waveform generation first, which is time-consuming. However, the TD and the GRAD are based on image processing so that they can perform ROI selection before the waveform generation. This is time-saving and well suits smartphones with limited processing power, but the ROI selected may not be optimal to generate the waveform with the best quality. This weakness is acceptable because the ultimate objective of the sPPG technique is the extraction of physiological parameters like heart rate, and a suboptimal ROI will still work well (see Table 1).

The VAR algorithm selects the block generating the waveform with the maximal variance as the optimal ROI. The maximal variance is equivalent to the maximal power if the waveform is zero-mean. However, the VAR algorithm neglects whether the power is produced from the signal or the noise. It might report a wrongly selected ROI in the case that the waveform is polluted by square waves, triangular waves, or other fluctuations. Fortunately, these cases rarely happen in practice and the VAR algorithm works well with the SEE less than 3 bpm and the LA less than 5 bpm.

The SER algorithm was expected to perform well, but it did not. The reason may be that the sPPG waveform is periodic and has harmonics outside the range 0.5–3 Hz (Figure 3). These harmonics belong to the cardiac signal, yet the SER counts their energy only in the total energy and not in the 0.5–3 Hz band. Consequently, the SER is not a good quantitative index of the ratio of cardiac energy to total energy.

The TM algorithm needs a preset reference template and measures the similarity between the waveform and the template. Any PPG-like wave can be set as the template. The exact shape of the template is not important because exact matching is not required. From the standpoint of filtering, a matched filter is a band-pass filter that maximizes the output signal-to-noise ratio. Therefore, the TM algorithm works better than the VAR algorithm.

The TD algorithm is commonly used to separate moving objects from the background. It is also useful for capturing the intensity variation caused by heartbeats in sPPG videos. It works much better than the VAR, the SER, and the GRAD and slightly better than the TM algorithm. Another advantage is that it can recognize incorrect placement, when the finger does not cover the camera lens. Its drawback is that it is sensitive to finger movement. Nevertheless, this can be avoided if the subject keeps the finger still.

The GRAD algorithm is an empirical algorithm. It also works well, with an SEE of less than 3 bpm and an LA of less than 5 bpm. However, it may mistake the edge of the finger for the optimal ROI when the finger only half-covers the lens.

To summarize, the TM and the TD are better than the other three algorithms. The TD algorithm is slightly better than the TM and more suitable for smartphone applications.

4.2. Spatial Resolution and Temporal Resolution

The spatial resolution has a negligible impact on the sPPG waveform, since the spatial resolution describes the ability of the camera to show clear details, which are not of concern in the sPPG technique, and the video frames are in any case blurred by averaging all the pixels in the ROI. Indeed, compared with the sPPG, the tPPG can be regarded as a camera with only one pixel.

In contrast, the temporal resolution has a significant impact on the sPPG waveform. According to the Nyquist sampling theorem, the sampling rate should be higher than twice the highest signal frequency. Therefore, the frame rate of the camera should be greater than 6 fps to detect the heart rate if the heart rate is less than 3 Hz (180 bpm). Fortunately, most commercial smartphones meet this requirement. However, if details of the waveform such as peaks and dicrotic notches are to be detected, the frame rate should be greater than 40 fps to reconstruct the complete pulse wave after digitization, as the maximum frequency of the pulse wave is less than 20 Hz [30].

4.3. Shape and Size of the ROI

Theoretically, the shape of the ROI should be round, for the light isotropically travels through space. In practice, a rectangular shape is more appropriate for computer processing.

The size of the ROI is a trade-off between the computational load and the antinoise capability. If the size is too large, the waveform generated by averaging the ROI is less sensitive to the noise, but more computational time is needed. If the size is too small, the computational time is reduced but the generated waveform contains more noise. Nam et al. suggested a ROI size of pixels and found that a larger ROI could not provide better signal quality [18]. As a rule of thumb, we think a proper size with pixels is workable.

5. Conclusion

The sPPG technique is easy to use and low in cost. It has great potential to be applied in remote medicine and home healthcare services, especially in rural districts and developing countries. However, the determination of the optimal ROI is an important practical problem that one encounters when dealing with sPPG videos. We therefore proposed five algorithms to solve this problem in the present study. The results showed that the TM and the TD algorithms were better than the other three, as they had a smaller standard error of estimate and smaller LA. The TD algorithm was slightly better than the TM algorithm and more suitable for smartphone applications. Therefore, the TD algorithm can be used in smartphones to improve the practicability of the sPPG technique. It may also help to improve the accuracy of physiological parameter measurement.

Conflict of Interests

The authors declare that they have no conflict of interests.

Acknowledgments

This study was funded by the National Basic Research Program 973 (no. 2010CB732606), the National Natural Science Foundation of China (no. 61401453), the STS Key Health Program of Chinese Academy of Sciences (nos. KFJEW-STS-097 and KFJ-EW-STS-095), the Guangdong Innovation Research Team Fund for Low-Cost Healthcare Technologies in China, the External Cooperation Program of Chinese Academy of Sciences (no. GJHZ1212), the Key Lab for Health Informatics of Chinese Academy of Sciences, the Peacock Program to Attract Overseas High-Caliber Talents to Shenzhen, and Shenzhen Municipal Government (nos. CXZZ20150504145109589 and JCYJ20150630114942270).