Abstract

Methods for extracting features from electrocardiography (ECG) signals are required to detect heart abnormalities and different kinds of diseases. However, artefacts and measurement noise often hinder accurate features extraction. One of the standard techniques developed for ECG signals employs linear prediction. Because ECG signal processing does not require prediction, smoothing can be more efficient. In this paper, we employ the $p$-shift unbiased finite impulse response (UFIR) filter, which becomes a smoother when $p < 0$. We develop this filter to have an adaptive averaging horizon: optimal for slow ECG behaviours and minimal for fast excursions. It is shown that the adaptive UFIR algorithm developed in such a way provides better denoising and suboptimal features extraction in terms of the output signal-to-noise ratio (SNR). The algorithm is developed to detect durations and amplitudes of the P-wave, QRS-complex, and T-wave in the standard ECG signal map. Better performance of the designed algorithm is demonstrated in a comparison with the standard linear predictor, UFIR filter, and UFIR predictive filter based on real ECG data associated with normal heartbeats.

1. Introduction

ECG signals play a key role in diagnosing diverse kinds of heart diseases. Because the pulses produced by the heart may differ subtly from each other and noise affects the decision accuracy, the ECG is commonly recorded using precise electronic equipment [1]. Accurate measurements are especially required when data are used to extract features of ECG signals and make decisions about different kinds of heart diseases employing special software. However, even very precise measurements are typically contaminated by artefacts and noise. Artefacts may result from a variety of internal and external causes, such as Parkinsonian muscle tremors or drying electrode gel. Different kinds of noise may contaminate the ECG signal during its acquisition and transmission, such as high frequency noise (electromyogram noise, additive white Gaussian noise, and power line interference) and low frequency noise (baseline wandering). Because noise may lead to wrong interpretation, ECG signal denoising is required. Therefore, significant attention has been paid during the last decades to developing mathematical methods and computation algorithms to extract the ECG features from regular (noisy) data with an accuracy sufficient for medical needs [2–12].

The Fourier transform-based approach has been developed in [13] to extract ECG signal features in the frequency domain. However, this method sacrifices time resolution, which affects the estimation accuracy. This issue has been circumvented in some other works by providing the time-frequency analysis without significantly affecting the resolution. In [14–17], wavelet transform-based algorithms were developed to find applications in some medical areas. In the wavelet domain, a compromise between the frequency and time resolutions is more easily achieved and one can select a proper wavelet to provide a reasonable accuracy. However, the choice of an optimal wavelet is still challenging [18] and the approach has low efficiency in smoothing ECG signals. Other algorithms tested for such needs include the principal component analysis (PCA) [19], linear discriminant analysis (LDA) [20], independent component analysis (ICA) [21], support vector machine [22], and neural networks [23].

One of the widely recognized approaches proposed in [24] provides noise reduction and features extraction from ECG data by employing linear prediction based on the theory developed in [25]. The approach suggests that all main features of ECG signals can be saved and gained using a one-step linear predictor. Accordingly, features extraction in the QRS complex (region of fast ECG excursions) is provided from an analysis of residual errors between the data and estimates. The approach has manifested itself as useful in the detection of arrhythmias. In other works employing one-step prediction [26, 27], automatic classification of the ECG cardiac abnormalities is provided using Gaussian mixtures. Later, the prediction-based approach has been recognized as one of the standard techniques suitable for ECG signals [28].

It has to be remarked now that, from the standpoint of optimal filtering, prediction is less accurate in noise reduction than filtering and much less accurate than smoothing. On the other hand, the ECG signal processing problems do not imply predicting future values and smoothing with some time-lag may be a better choice for cardiac analysis. A classic example is the Savitzky-Golay filter (smoother) [29], which has found wide applications in diverse areas [30–35].

An optimal approach to provide smoothing and state estimation in linear models has been proposed in [36] to minimize the mean square error (MSE). A solution was found on a horizon of $N$ data points, from $m - p$ to $n - p$, where $n$ corresponds to a fixed discrete point of the ECG signal, $m = n - N + 1$, and $p$ is a discrete shift. The derived optimal FIR (OFIR) filter becomes smoothing with lag by $p < 0$, provides filtering with $p = 0$, and becomes $p$-step predictive when $p > 0$. However, the $p$-shift OFIR filter requires information about noise, which is not completely available for ECG signals.

A special case of the $p$-shift OFIR filter is the $p$-shift unbiased FIR (UFIR) filter [36–39], which completely ignores zero mean noise and is thus more suitable for ECG signals. Being more general, the $p$-shift UFIR filter generalizes the Savitzky-Golay filter by $p = -(N-1)/2$ and the linear predictor with $p > 0$. Although such a filter does not require the noise statistics except for the zero mean assumption, it provides near optimal estimates if $N$ is set optimally as $N_{\text{opt}}$ by minimizing the MSE [36].

In this paper, we develop an adaptive-horizon UFIR smoothing filtering algorithm for denoising ECG signals and features extraction. We also investigate the trade-off between the UFIR smoothing filter, UFIR filter, and UFIR predictive filter and compare them to the standard linear predictor suggested in [24]. We base our investigations on the MIT-BIH Arrhythmia Database available for free use from [40, 41]. Focusing on the design of efficient algorithms, in this paper we limit our investigations to data associated with normal heartbeats and postpone an analysis of different kinds of heart diseases to future investigations. The rest of the paper is organized as follows. In Section 2, we describe the database, theory of the algorithms proposed, and validation. The experimental results are shown in Section 3, where we provide a comparison between the UFIR, UFIR smoothing, and UFIR predictive filtering algorithms. A discussion of the results is provided in Section 4 and generalizations with concluding remarks are given in Section 5.

2. Material and Methods

2.1. Materials

This work employs the MIT-BIH Arrhythmia Database as a benchmark. This database contains 48 ECG recordings with two leads (e.g., MLII, V1), obtained from 47 subjects. The recordings have a sampling frequency of 360 Hz per channel with 11-bit resolution over a 10 mV range. In general, the most common lead in this database is MLII, where the morphology of the ECG signal is seen clearly. For testing, the MLII lead and normal sinus rhythm are considered. Moreover, an additional test with synthetic ECG data is provided by specialized MATLAB software designed by Karthik Raviprakash. The simulated ECG signals are based on the principle of Fourier series. Here, the signal is corrupted by different levels of Gaussian white noise. Below is a brief description of the characteristics of the ECG signal.

2.1.1. ECG Signal

The morphology of the heartbeat is fundamental for extracting features of ECG signals, which are quasiperiodic as sketched in Figure 1. The heartbeat pulse can be represented with four fundamental features: P-wave (left slow excursion), QRS-complex (central fast excursion), T-wave (first right slow excursion), and U-wave (second right slow excursion).

Several problems arise while processing ECG signals shown in Figure 1:
(i) Measurement data are commonly contaminated by noise, which may not be Gaussian and white.
(ii) Standard features depicted in Figure 1 must be estimated with highest accuracy to avoid medical mistakes.
(iii) The ground truth (reference model) is not available to tune an estimator optimally.

Under such conditions, two approaches relying on accurate identification of heartbeat pulses are commonly considered to extract ECG signal features: fiducial and nonfiducial. The fiducial approach refers to the characteristics such as amplitude and heart rate, which are related to the duration, amplitude, and wave shape [44–49]. The nonfiducial approach refers to quasiperiodicity of ECG signals [28] and all features are separated into three main categories based on autocorrelation, phase-space, and frequency-domain analysis.

2.2. Methods
2.2.1. $p$-Shift UFIR Smoothing Filtering

Let us suppose that the ECG signal $x_n$ (Figure 1) is contaminated by zero mean additive noise $v_n$ with unknown statistics. Then the measurement $y_n$ of $x_n$ can be represented in discrete-time index $n$ as an additive sum of

$$y_n = x_n + v_n . \tag{1}$$

In view of the fact that noise in (1) may not be white Gaussian and its statistics are commonly not well-known, the best way to avoid large estimation errors is using filters that do not require information about the statistics of noise. The $p$-shift UFIR filter, which completely ignores noise and the initial conditions, can thus be considered as a good candidate.

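On a finite horizon of $N$ points, the ECG signal can be represented with an $l$-degree polynomial and the $p$-shift UFIR filter [37] applied to remove noise. In accordance with [37], the UFIR estimate $\hat{x}_{n|n-p}$ of $x_n$ via data taken from $[m-p, n-p]$ can be found in the convolution-based form of

$$\hat{x}_{n|n-p} = \sum_{i=p}^{N-1+p} h_{li}(p)\, y_{n-i} \tag{2a}$$

$$= \mathbf{h}_l^T(p)\, \mathbf{y}_N(p), \tag{2b}$$

where $h_{li}(p)$ is the $p$-variant impulse response of the $l$-degree UFIR filter, the extended measurement vector is

$$\mathbf{y}_N(p) = [\, y_{n-p} \;\; y_{n-1-p} \;\; \cdots \;\; y_{m-p} \,]^T \tag{3}$$

and the filter gain matrix is given by

$$\mathbf{h}_l^T(p) = [\, h_{lp}(p) \;\; h_{l(1+p)}(p) \;\; \cdots \;\; h_{l(N-1+p)}(p) \,]. \tag{4}$$

To satisfy the unbiasedness condition

$$E\{\hat{x}_{n|n-p}\} = E\{x_n\}, \tag{5}$$

where $E\{z\}$ means an average of $z$, then $h_{li}(p)$ can be represented as [37, 50]

$$h_{li}(p) = \sum_{j=0}^{l} a_{jl}(p)\, i^j, \tag{6}$$

where $i \in [p, N-1+p]$ and the coefficients $a_{jl}(p)$ are defined by [37]

$$a_{jl}(p) = (-1)^j \frac{M_{(j+1)1}(p)}{|\mathbf{D}(p)|}. \tag{7}$$

Here, $|\mathbf{D}(p)|$ is the determinant of matrix $\mathbf{D}(p) = \mathbf{V}^T(p)\mathbf{V}(p)$, where $\mathbf{V}(p)$ is the $N \times (l+1)$ Vandermonde matrix,

$$\mathbf{V}(p) = \begin{bmatrix} 1 & p & p^2 & \cdots & p^l \\ 1 & 1+p & (1+p)^2 & \cdots & (1+p)^l \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & N-1+p & (N-1+p)^2 & \cdots & (N-1+p)^l \end{bmatrix}, \tag{8}$$

and $M_{(j+1)1}(p)$ is the minor of $\mathbf{D}(p)$. Function $h_{li}(p)$ has the following fundamental properties [37, 50]:

$$\sum_{i=p}^{N-1+p} h_{li}(p) = 1, \tag{9}$$

$$\sum_{i=p}^{N-1+p} h_{li}(p)\, i^u = 0, \quad 1 \le u \le l. \tag{10}$$

For low degrees, $l = 1$ and $l = 2$, one can find $h_{li}(p)$ in Appendix A. For higher degrees, the impulse response can be computed using a recurrence relation [51, 52] and then its $p$-shift version is obtained by a projection. Of importance is that the UFIR estimate (2b) does not require the noise statistics and initial values. The zero mean noise is allowed to have any distribution and covariance [53, 54], which is a fundamental difference from optimal estimates.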
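As a minimal sketch of the batch form described above, the following Python code builds the impulse response of a UFIR filter of a given degree on a horizon of `N` points with shift `p` from the Vandermonde matrix and applies it in convolution form. The function names `ufir_weights` and `ufir_estimate`, the use of `numpy.linalg.pinv` instead of closed-form coefficients, the indexing of the data as y[n - i] for i = p, ..., N - 1 + p, and the choice to return NaN where the horizon leaves the record are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def ufir_weights(N, degree, p):
    """Weights h_i, i = p..N-1+p, of a polynomial UFIR filter (illustrative)."""
    i = np.arange(p, N + p)                        # horizon indices
    # Vandermonde matrix with rows [1, i, i^2, ..., i^degree]
    V = np.vander(i, degree + 1, increasing=True).astype(float)
    # h = V (V^T V)^{-1} e_1 is the first row of the pseudo-inverse of V
    return np.linalg.pinv(V)[0]

def ufir_estimate(y, N, degree=2, p=None):
    """Estimate x[n] from y[n - i], i = p..N-1+p.

    p < 0 smooths with lag |p|, p = 0 filters, p > 0 predicts p steps ahead.
    """
    y = np.asarray(y, dtype=float)
    if p is None:
        p = -(N - 1) // 2                          # symmetric smoothing, N odd
    h = ufir_weights(N, degree, p)
    offsets = np.arange(p, N + p)
    x_hat = np.full(y.shape, np.nan)               # NaN where data are missing
    for n in range(len(y)):
        idx = n - offsets                          # samples y[n-p] ... y[n-N+1-p]
        if idx.min() >= 0 and idx.max() < len(y):
            x_hat[n] = h @ y[idx]
    return x_hat
```

For example, `ufir_estimate(y, N=21, degree=2)` would smooth a record `y` on a symmetric 21-point horizon. With $p = -(N-1)/2$ and odd $N$ the weights coincide with those of a Savitzky-Golay smoother of the same degree, which offers a convenient cross-check.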

2.2.2. ECG Signal Denoising on Adaptive Horizons

The determination of the optimal horizon $N_{\text{opt}}$ is critical in UFIR filtering and smoothing [42]. Because a reference signal is unavailable for ECG data, $N_{\text{opt}}$ can be found following [55] via the mean square value (MSV) $V(N)$ of the measurement residual $y_n - \hat{x}_{n|n-p}$. It has been shown in [55] that $N_{\text{opt}}$ can be estimated by minimizing the derivative as

$$\hat{N}_{\text{opt}} \cong \arg\min_{N} \frac{\partial V(N)}{\partial N}. \tag{12}$$

To optimize the horizon, let us consider a single ECG pulse shown in Figure 2. As can be seen, the ECG pulse is slowly changing, except for a fast excursion in the QRS region. The slow background requires an optimal horizon $N_{\text{opt}}$ in order to provide best denoising with no essential bias. On the contrary, the QRS region requires a minimum horizon of $N_{\min}$ points to track the behaviour exactly. The horizon must thus be adaptive.

2.2.3. General UFIR Smoothing Algorithm

The general UFIR smoothing algorithm is represented with a pseudocode listed as Algorithm 1. It requires values of $N$, $l$, and $p$ described in Section 2.2.1. An auxiliary function provides the vector of horizon indices $i \in [p, N-1+p]$ and CalculateV calculates matrix $\mathbf{V}(p)$ given by (8). Vector $\mathbf{h}_l(p)$ contains the UFIR filter coefficients (6). Provided there are $N$ and $p$, the $l$-degree matrix $\mathbf{V}(p)$ is computed and estimate $\hat{x}_{n|n-p}$ is provided by (2b). We will use this algorithm at different horizons as smoothingUFIR function.

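Data: $y$, $N$, $l$, $p$
Result: $\hat{x}$
1: Begin:
2: $i \leftarrow (p, p+1, \ldots, N-1+p)$
3: $\mathbf{V} \leftarrow$ CalculateV($i$, $l$)
4: $\mathbf{D} \leftarrow \mathbf{V}^T \mathbf{V}$
5: $\mathbf{h} \leftarrow$ first column of $\mathbf{V}\mathbf{D}^{-1}$
6: $\hat{x} \leftarrow \mathbf{h}^T \mathbf{y}_N$
7: End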

2.2.4. Computing $N_{\text{opt}}$ for ECG Data

Optimal horizon $N_{\text{opt}}$ is provided by the algorithm designed, with a pseudocode listed as Algorithm 2. This algorithm requires the following variables: the heartbeats data, the filter degree $l$, a set of heartbeats, the number of heartbeats, and the window width.

Data:  , , , ,
Result:  
1: Begin:
2: for   to   do
3:  
4:  =length()
5:  for   to   do
6:   (,,)
7:   IntervalQRS
8:   
9:   
10:   
11:   =Cat
12:   
13: end for
14: 
15: end for
16: Average
17: CubicFit
18:

By defining (8) and (9) and then analysing (12), the filter coefficients specified by (13) are obtained for given , , and . Next, coefficients are computed for (11) and estimate (1) is provided as . Function IntervalQRS is introduced to detect Q and S via data and a value called , which is related to the window width.

The window covers a region including Q and S. Because $N_{\text{opt}}$ will produce highly biased estimates around Q and S, the window is split into three parts, (13)–(15), whose boundary points determine the window width. The horizon $N_{\text{opt}}$ is applied to the first part (13) and third part (15). In the second part (14), estimation is provided with $N_{\min}$.

Function Cat is used to concatenate estimates (13)–(15) and compute the final estimate. Provided we have the final estimate, the MSV of the measurement residual is calculated as a function of $N$. This variable is saved to represent a whole set of data for different heartbeats. Provided there are various values of the MSV for each $N$, an average of the MSV over the heartbeats is computed. Because the averaged MSV is accompanied with ripples causing ambiguities, it is further approximated with a cubic polynomial using function CubicFit. The derivative applied to the smoothed MSV while solving the optimization problem (12) yields $N_{\text{opt}}$.
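The horizon search described above can be sketched as follows, under simplifying assumptions: the residual MSV is computed with a single symmetric smoothing horizon per beat (the three-part QRS window split is omitted), the MSV is averaged over beats, approximated by a cubic polynomial, and the horizon is taken where the derivative of the fit is smallest. The names `smoothing_weights`, `residual_msv`, and `estimate_n_opt` are hypothetical, and `numpy.polyfit` merely stands in for the CubicFit function mentioned in the text.

```python
import numpy as np

def smoothing_weights(N, degree):
    """Symmetric (p = -(N-1)/2) polynomial smoothing weights, N odd."""
    half = (N - 1) // 2
    i = np.arange(-half, half + 1)
    V = np.vander(i, degree + 1, increasing=True).astype(float)
    return np.linalg.pinv(V)[0]

def residual_msv(y, N, degree):
    """Mean square value of the residual y - x_hat on the valid interior."""
    y = np.asarray(y, dtype=float)
    half = (N - 1) // 2
    x_hat = np.convolve(y, smoothing_weights(N, degree), mode="valid")
    return np.mean((y[half:len(y) - half] - x_hat) ** 2)

def estimate_n_opt(beats, horizons, degree=2):
    """Return the horizon minimizing the derivative of the cubic-fitted MSV."""
    horizons = np.asarray(horizons)
    # One MSV curve per heartbeat, then the average over heartbeats
    msv = np.array([[residual_msv(b, N, degree) for N in horizons]
                    for b in beats])
    mean_msv = msv.mean(axis=0)
    # A cubic fit suppresses ripples before differentiating
    coeffs = np.polyfit(horizons, mean_msv, 3)
    d_msv = np.polyval(np.polyder(coeffs), horizons)
    return int(horizons[np.argmin(d_msv)]), mean_msv
```

The horizons passed in are assumed to be odd and larger than the filter degree, for example `np.arange(3, 62, 2)`. With $N = l + 1$ the fit interpolates the data exactly, so the residual MSV is zero at the smallest admissible horizon.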

2.2.5. Denoising Algorithm for ECG Signals

Provided there are $N_{\text{opt}}$ and $N_{\min}$, the UFIR smoothing algorithm can be designed for ECG signals with a pseudocode listed as Algorithm 3. In this algorithm, function smoothingUFIR is applied to different parts of the ECG signal with different horizons.

Data:  , , ,
Result:  
1:
2: for   to   do
3:  () = (, , )
4: end for
5: IntervalQRS
6:
7:
8:
9:
10:
11: ()

Five parts of the ECG signal, (16)–(20), are recognized by function smoothingUFIR over four boundary points. In Figure 2, the first and fifth parts are defined by (16) and (20), respectively, to apply $N_{\text{opt}}$. The third part represents an estimate, which is equal to the original ECG signal without noise reduction on $N_{\min}$. The adaptive horizon is applied to (17) and (19). Here, $N$ is decreased from $N_{\text{opt}}$ to $N_{\min}$ with a one-time step in the QRS complex region. Beyond the QRS complex, $N$ is gradually increased from $N_{\min}$ to $N_{\text{opt}}$ with a one-time step. Finally, function Cat concatenates the five parts into the ECG signal estimate.
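The adaptive-horizon idea can be illustrated with a simplified sketch in which the horizon is assigned per sample: the minimal horizon inside a given QRS window, growing by one sample on each side per time step outside it until the optimal horizon is reached. The names `horizon_profile` and `adaptive_ufir_smooth`, the per-sample formulation (rather than five concatenated parts), the assumption that both horizons are odd, and leaving the record edges unfiltered are choices made here for illustration only.

```python
import numpy as np

def center_weights(N, degree):
    """Symmetric polynomial smoothing weights on N (odd) points."""
    half = (N - 1) // 2
    i = np.arange(-half, half + 1)
    V = np.vander(i, degree + 1, increasing=True).astype(float)
    return np.linalg.pinv(V)[0]

def horizon_profile(length, q, s, n_opt, n_min):
    """Horizon per sample: n_min inside [q, s], ramping up to n_opt outside."""
    n = np.arange(length)
    dist = np.maximum(np.maximum(q - n, n - s), 0)  # 0 inside the QRS window
    return np.minimum(n_min + 2 * dist, n_opt)

def adaptive_ufir_smooth(y, q, s, n_opt, n_min, degree=2):
    """Smooth y with a horizon that shrinks around the QRS window [q, s]."""
    y = np.asarray(y, dtype=float)
    profile = horizon_profile(len(y), q, s, n_opt, n_min)
    cache = {}                                      # weights per horizon length
    x_hat = y.copy()                                # edges stay unfiltered
    for n, N in enumerate(profile):
        N = int(N)
        half = (N - 1) // 2
        if n - half < 0 or n + half >= len(y):
            continue
        if N not in cache:
            cache[N] = center_weights(N, degree)
        # The centre weights are symmetric, so sample order does not matter
        x_hat[n] = cache[N] @ y[n - half:n + half + 1]
    return x_hat
```

Choosing `n_min = degree + 1` makes the fit interpolate the data, so the samples inside the QRS window pass through unchanged, which mirrors the statement that the third part equals the original signal.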

2.2.6. UFIR-Based Algorithm for Features Extraction

Provided there is denoising by Algorithm 3, in this section we develop an efficient computation algorithm for ECG signal features extraction. To this end, we first localize special points on the ECG heartbeat pulse and then compute relevant amplitudes, durations, and an angle. Unlike the approaches developed in [15, 56, 57], this algorithm is based on the $p$-shift and $l$-order UFIR smoothing filter exploited with $N_{\text{opt}}$ and $N_{\min}$. It was found for the data used that $N_{\text{opt}}$ suits smooth parts of the discrete ECG signal and $N_{\min}$ fits the QRS complex. Note that $N_{\text{opt}}$ and $N_{\min}$ must be specified for each of the measured ECG signals.

Step-by-step events representing the strategy of ECG signal denoising and features extraction are shown in Figure 3. The original discrete-time ECG signal (a) is smoothed as (b) using Algorithm 3. Then the ECG signal features are extracted as follows:
(i) Figure 3(c): the peak value (ECG signal maximum) is estimated as R and a window is introduced with two points, one on each side of R. The estimate of Q is found as the least value in the interval between the left window point and R. The estimate of S is found as the least value between R and the right window point.
(ii) Figure 3(d): provided there are Q, R, and S, the QRS complex is suppressed to save only the P and T waves. Then the estimates of P and of T are obtained similarly by suppressing one of the waves.
(iii) Figure 3(e): provided there is P, the P wave is split into two segments, where the first one is extended from the initial point to P. In the first segment, we apply the derivative. Next, we consider a small section of the resulting signal and find a global maximum. We consider it as the start point of the P wave. In the second segment, we also apply the derivative, consider a small portion of the resulting signal, and find a global minimum. This minimum corresponds to the end of the P wave. (Although the time indices of these points are omitted in Figure 3, their values represent the temporal line in the ECG signal. These points can be used to compute the duration features described in Algorithm 4.) Then, the duration of the P wave is computed as the time between its start and end points. A distance between P and the baseline is calculated and called the P wave amplitude.
(iv) Figure 3(f): the QRS complex duration is obtained by the distance between points Q and S. The QRS complex amplitude is provided by a distance between the baseline and R.
(v) Figure 3(g): similarly, the start and end points are obtained for the T wave by splitting this wave into two segments.
(vi) Figure 3(h): the ST-angle $\theta$ is computed by

$$\theta = \arccos\left(\frac{\mathbf{a}\cdot\mathbf{b}}{\|\mathbf{a}\|\,\|\mathbf{b}\|}\right),$$

where $\mathbf{a}$ and $\mathbf{b}$ are vectors created from points of the ST segment.
Vectors and have two components dependent on and . We consider a flat part, where and represent the origin zero point. We sum a temporal unity from the origin, obtain and , and rename as and as from plane. We then compute and and estimate via (18).
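The localization of the QRS points and the angle computation described above can be sketched in a few lines. This is a simplified illustration on a single smoothed heartbeat: the names `locate_qrs`, `qrs_features`, and `angle_between`, the symmetric window of `w` samples around the peak, and the baseline passed as a plain number are assumptions, and the P and T wave steps are not covered.

```python
import numpy as np

def locate_qrs(x, w):
    """Return sample indices (q, r, s) on one smoothed heartbeat x."""
    x = np.asarray(x, dtype=float)
    r = int(np.argmax(x))                          # R: signal maximum
    left = max(r - w, 0)
    right = min(r + w, len(x) - 1)
    q = left + int(np.argmin(x[left:r + 1]))       # Q: minimum left of R
    s = r + int(np.argmin(x[r:right + 1]))         # S: minimum right of R
    return q, r, s

def qrs_features(x, w, fs, baseline=0.0):
    """QRS duration in seconds and amplitude relative to the baseline."""
    q, r, s = locate_qrs(x, w)
    return {"duration_s": (s - q) / fs, "amplitude": float(x[r]) - baseline}

def angle_between(a, b):
    """Angle in degrees between two vectors via the arccosine formula."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    c = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.degrees(np.arccos(np.clip(c, -1.0, 1.0))))
```

Clipping the cosine to [-1, 1] guards against rounding errors for nearly parallel vectors. For MIT-BIH records the sampling frequency `fs` would be 360 Hz.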

Data:  , , , ,
Result:  , , , , , , , , ,
1: Begin:
2: for   to   do
3:  =
4:  ()
5:  IntervalQRS
6:  (())
7:  (())
8:  (())
9:  (())
10:  ((: ()))
11:  ()
12:  ()
13:   = (())
14:  (())
15:  =()
16:  =(:())
17:  (())
18:  (())
19:  (:()) = ()
20:  ()
21:  ()
22:  ()
23:  
24:  
25:  
26:   = +
27:   =
28:  (, , , )
29:  
30: end for
2.2.7. Algorithm Design for Features Extraction of ECG Signals

A pseudocode of the algorithm designed to extract features of ECG signal is shown as Algorithm 4. Here, is the smoothed ECG signal represented as in Figure 3; is the number of heartbeats; is a variable, which represents the reference line; is the data sample frequency; is a value, which determines the window width to cover Q and S points (Figure 3). The algorithm output consists of estimates of the ECG signal features such as of P, of the P amplitude, of the P duration, of the QRS amplitude, of the QRS duration, of T, of the T amplitude, of the T duration, and of the ST angle . All these features are extracted from the smoothed signal .

The algorithm starts by computing as the ECG signal maximum, using function max. Function IntervalQRS is applied to compute points and . The variable determines the window width to cover the QRS complex and obtain and as two minima between points and . Function min is used to find the above-mentioned points. The supress function is used to suppress the QRS complex. Function max is used to estimate and . Function diff is introduced to compute the derivatives in the , , , and intervals. Functions max and min with function diff are used to find , , , ,   , and   . Provided the above-mentioned values are considered, the duration is estimated of P and T features. Function length is introduced to compute the signal length. The variable determines the reference line for computing the amplitude features. This variable is equal to . Function vector is used to provide vectors a and b based on , , , and . Finally, function arcos is used to compute an angle between vectors a and b. Note that all the above introduced functions are available from the authors by request.

2.3. Validation

Several methods have been proposed over the decades for ECG signal features extraction. Among these methods, the Linear Predict approach proposed in [25] and developed by Martis [28] has been recognized as one of the most efficient. The method employs the following model:

$$\hat{x}(n) = \sum_{k=1}^{M} a_k\, x(n-k),$$

in which $x(n)$ is the original ECG pulse, $M$ is the estimator order, and $a_k$ is the linear prediction coefficient. The estimate $\hat{x}(n)$ is provided as a linear weighted combination of $x(n-k)$, $k = 1, \ldots, M$. The residual error

$$e(n) = x(n) - \hat{x}(n)$$

is considered as the ECG signal fraction, which cannot be predicted. To compare with the UFIR filter, we will assign the order as suggested by Lin et al. [24].
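A one-step linear predictor of this kind can be sketched as below. The coefficients are obtained here by ordinary least squares over the record, which is an assumption standing in for the correlation method used in the cited works; the names `fit_linear_predictor` and `predict_one_step` are hypothetical.

```python
import numpy as np

def _past_matrix(x, order):
    """Rows hold x[n-1], ..., x[n-order] for n = order..len(x)-1."""
    return np.column_stack([x[order - k:len(x) - k]
                            for k in range(1, order + 1)])

def fit_linear_predictor(x, order):
    """Least-squares prediction coefficients a_1..a_order."""
    x = np.asarray(x, dtype=float)
    A = _past_matrix(x, order)
    a, *_ = np.linalg.lstsq(A, x[order:], rcond=None)
    return a

def predict_one_step(x, a):
    """Return the one-step prediction and the residual x - x_hat."""
    x = np.asarray(x, dtype=float)
    order = len(a)
    x_hat = np.full(x.shape, np.nan)               # first samples have no past
    x_hat[order:] = _past_matrix(x, order) @ a
    return x_hat, x - x_hat
```

Large residuals mark the parts of the signal that the predictor cannot follow, which is how prediction-based methods expose the fast excursions of the QRS complex.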

The UFIR filter predicts estimates with $p > 0$, and both the prediction estimator (17) and the UFIR predictive filter (1) employ discrete linear prediction of the underlying process via its noisy data. Even so, there are some zones in the ECG picture where linear predictors are unsuccessful in extracting ECG features. Therefore, a comparative analysis of different methods developed in [36–39] is required.

The real ECG data have an unknown model and noise. A suitable metric is the concentration of the error, which is the difference between the estimate and measurement for different parts of ECG signals. The box plot gives indices related to the error dispersion and concentration. Moreover, a critical measure of denoising efficiency in any estimator is the MSE at its output. We provide the relevant study based on synthetic ECG signals generated using MATLAB. The ECG signal is contaminated by zero mean additive white Gaussian noise (AWGN) providing different SNR values.
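The synthetic-data part of this validation can be reproduced in outline with a few helper functions. The sketch below adds zero mean white Gaussian noise to a clean signal at a requested SNR and evaluates the root MSE of an estimate against the clean reference. The function names are hypothetical, and rescaling the noise realization so that the sample SNR matches the target exactly is a convenience assumed here, not a step taken from the paper.

```python
import numpy as np

def add_awgn(x, snr_db, rng):
    """Return x plus zero-mean white Gaussian noise at the given SNR in dB."""
    x = np.asarray(x, dtype=float)
    noise = rng.standard_normal(x.shape)
    noise -= noise.mean()                          # enforce zero sample mean
    target_power = np.mean(x ** 2) / 10 ** (snr_db / 10)
    noise *= np.sqrt(target_power / np.mean(noise ** 2))
    return x + noise

def measured_snr_db(x, y):
    """SNR in dB of a noisy or estimated signal y against the clean x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return 10 * np.log10(np.mean(x ** 2) / np.mean((y - x) ** 2))

def rmse(x, x_hat):
    """Root mean square error between the reference x and the estimate."""
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    return np.sqrt(np.mean((x - x_hat) ** 2))
```

Sweeping `snr_db` over a grid and recording `rmse` for each estimator gives error curves of the kind used to compare smoothing, filtering, and prediction.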

The performance of features extraction is assessed by analysing the concentration of the estimated features, which reveals the effect of noise on them.

3. Results

3.1. Testing of Algorithms for Estimating $N_{\text{opt}}$ and Denoising

To test Algorithm 2 experimentally, we selected healthy heartbeats with 301 samples and estimated errors by allowing for (Figure 4(a)), (Figure 4(b)), (Figure 4(c)), and (Figure 4(d)).

As can be seen, behaves similarly for different degrees . It can also be observed that generally grows with and elevates to when . Particularly in Algorithm 3, an analysis of estimation errors produced by the 2-degree and 3-degree UFIR filters reveals no significant differences, except for the horizon length, which inherently grows with . This is explained by the fact that makes the noise power gain (NPG) of both filters equal [37]. The role of on the smoothing filter NPG has been studied by Shmaliy et al. in [37]. However, choosing reduces the computational complexity, while saving the estimation accuracy, and we accept as near optimal. Effect of on the estimation accuracy is illustrated in Figure 5.

3.2. Critical Evaluation of Denoising Algorithms

In Figure 6(a), we illustrate typical denoising errors produced by the predictive filter, filter, and smoothing filter, all having batch structures. A part of the ECG signal taken from is zoomed in Figure 6(b). The denoising errors are sketched in Figure 7.

As can be seen, all UFIR filters are successful in denoising with consistent errors. Even so, the UFIR smoothing filter does it more precisely while the predictive filter produces more errors. The medians of errors produced by the algorithms and represented with the dispersion are listed in Figure 7. This figure suggests that the UFIR smoothing filter outperforms both the UFIR filter and the standard linear predictor developed in [28] for ECG signals. An analysis of the signal-to-noise ratios (SNRs) at the filter outputs will be provided next.

3.3. Effect of SNR on the Estimator MSE

The root MSEs (RMSEs) are shown in Figure 8 as functions of the SNR depicted in decibels at 18 discrete points with a step of . It follows that the UFIR smoothing filter outperforms other solutions in a wide range of SNR values. For , higher accuracy is achieved with a constant and, for , with an adaptive .

3.4. Applications to ECG Signals

Based upon the above developed UFIR-based approach, we now apply Algorithm 3 to the ECG signal database and extract special features depicted in Figure 1. The results obtained using the designed UFIR smoothing Algorithm 2 (UfirSmooth), UFIR predictive algorithm (Predictor UFIR), and basic linear predictor (Linear Predict) [25] are sketched in Figures 9 and 10. In these figures, 100 synthetic heartbeats are processed at each time index. This synthetic ECG signal is contaminated by AWGN at 35 dB with properties similar to the original data.

In Figure 11, we show dispersions and concentrations of the estimated features about their means. Shadowed areas represent features extracted by smoothing and it follows that the outputs of the filter and linear predictor are more vulnerable. Furthermore, noise dominates in the predictive filter outputs. This experiment was based on healthy records of the MIT-BIH Arrhythmia Database (lead MLII) analysing 1000 heartbeats. Overall, the UFIR smoothing approach developed in this paper always produced better estimates than the other linear methods considered.

4. Discussion

The purpose of this study is to remove noise from ECG signals using a UFIR smoothing filter for features extraction. This work is focused on morphological features extraction in individual ECG signal processing with normal rhythm. A principal finding in applying the proposed method is the considerable reduction of noise with an optimum and adaptive horizon for real ECG data. This reduction contributes to determining the features associated with the heartbeat with better precision.

An analysis of error variability in real ECG signals and of SNRs based on synthetic ECG data for different estimators has shown that the UFIR smoothing filter with adaptive horizon outperforms the linear predictor [25–28] and other UFIR solutions such as the UFIR filter and UFIR predictive filter on the MIT-BIH Arrhythmia dataset. Let us notice again that the approaches based on linear prediction were recognized as standard for the ECG signal features extraction [28]. In this regard, better performance of the smoothing algorithm developed in this paper opens new horizons in achieving higher accuracy and reliability in detecting different kinds of heart diseases.

The UFIR smoothing filter performance was optimized by making the averaging horizon adaptive. Note that such an opportunity has not been used in the design of known linear predictors for ECG data. As a result, we have achieved the following improvements:
(1) Suboptimal denoising of ECG signals with no requirements to noise, except for the zero mean assumption.
(2) Unbiased filtering in the QRS region, in which the ECG signal demonstrates rapid excursions.

Such abilities of the UFIR smoothing filter have resulted in higher estimation accuracy, namely, in smaller variability of the estimated features around their mean values. In this regard, let us notice that larger variability in the standard linear predictor is due to larger errors and instability caused by unknown future data and errors in the determination of the predictor coefficients determined by the correlation method. Accordingly, errors in the determination of the prediction function lead to larger prediction errors (random and regular).

This has appeared to be particularly true for the and values, which are estimated by other methods with much larger errors. Estimates of and by different methods have appeared to be consistent, because these values are not affected by noise as much as other features. Nevertheless, the UFIR smoothing filter has demonstrated smaller errors even for . In the cases of both and angle , one watches for highly unstable estimates provided by the prediction-based filters, while the proposed UFIR smoothing filter has produced acceptable estimates. Also, it is important to clarify that the evaluation of features is analysed from the consistence of data near the average of the measurement. This is shown analysing the number of outliers. However, in this scenario, the quality features are not strictly analysed because the ECG signal used is just under normal conditions.

5. Conclusions

The UFIR smoothing filtering approach developed in this paper for ECG signals denoising and features extraction has demonstrated an ability to outperform the linear predictor-based one [25], which is recognized as one of the standard techniques for ECG signals. That has become possible by optimizing the order and averaging horizon for the UFIR smoothing filter in a way such that the horizon has become adaptive to different parts of ECG signals. A comparison of the UFIR predictive, filtering, and smoothing estimates has revealed a considerable difference in denoising in favor of the smoothing one. The results have also indicated that features extracted using the smoothing filter are more reliable and less prone to large deviations from average values. This is definitely an important advantage for medical needs. As a future work, we consider extracting features of ECG signals in discrete-time state-space by developing the fast iterative UFIR smoothing filtering algorithm and optimize it for different orders and kinds of heart diseases.

Appendix

A. Low-Degree UFIR Functions [37]

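A.1. Ramp, $l = 1$

$$h_{1i}(p) = a_{01}(p) + a_{11}(p)\, i,$$

where

$$a_{01}(p) = \frac{2(2N-1)(N-1) + 12p(N-1+p)}{N(N^2-1)}, \qquad a_{11}(p) = -\frac{6(N-1+2p)}{N(N^2-1)}.$$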

A.2. Quadratic, $l = 2$

where

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

We gratefully acknowledge the support of this research by the National Council for Science and Technology (Conacyt: Consejo Nacional de Ciencia y Tecnología) of the Mexican Federal Government. Also, we thank M.Sc. Miguel Vazquez-Olguin for his support in the implementation and debugging of the algorithms.