BioMed Research International

Volume 2018, Article ID 1868519, 26 pages

https://doi.org/10.1155/2018/1868519

## A Comparative Analysis of Methods for Evaluation of ECG Signal Quality after Compression

^{1}Department of Biomedical Engineering, The Faculty of Electrical Engineering and Communication, Brno University of Technology, Technická 12, 616 00 Brno, Czech Republic^{2}Institute of Scientific Instruments, The Czech Academy of Sciences, Královopolská 147, 612 64 Brno, Czech Republic

Correspondence should be addressed to Andrea Němcová; nemcovaa@feec.vutbr.cz

Received 15 February 2018; Revised 4 June 2018; Accepted 27 June 2018; Published 18 July 2018

Academic Editor: Hesham H. Ali

Copyright © 2018 Andrea Němcová et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

The assessment of ECG signal quality after compression is an essential part of the compression process. Compression facilitates the signal archiving, speeds up signal transmission, and reduces the energy consumption. Conversely, lossy compression distorts the signals. Therefore, it is necessary to express the compression performance through both compression efficiency and signal quality. This paper provides an overview of objective algorithms for the assessment of both ECG signal quality after compression and compression efficiency. In this area, there is a lack of standardization, and there is no extensive review as such. 40 methods were tested in terms of their suitability for quality assessment. For this purpose, the whole CSE database was used. The tested signals were compressed using an algorithm based on SPIHT with varying efficiency. As a reference, compressed signals were manually assessed by two experts and classified into three quality groups. Owing to the experts’ classification, we determined corresponding ranges of selected quality evaluation methods’ values. The suitability of the methods for quality assessment was evaluated based on five criteria. For the assessment of ECG signal quality after compression, we recommend using a combination of these methods: PSim SDNN, QS, SNR1, MSE, PRDN1, MAX, STDERR, and WEDD SWT.

#### 1. Introduction

The evaluation of the quality of electrocardiogram (ECG) after compression is an essential part of compression in the broadest sense. Compression reduces the amount of data, which facilitates signal archiving, speeds up signal transmission (especially important in telemedicine), and reduces energy consumption. On the other hand, compression usually results in loss of signal quality. This arises in the case of lossy compression, which is the most commonly used technique because of its high efficiency. Indeed, while the quality of the signal after *lossless* compression is preserved, the efficiency is low. The aim of compression is to maximize the reduction of data amount while preserving the quality (diagnostic information). This naturally results in a compromise between efficiency and quality. It is thus desirable to express the compression performance through both efficiency and quality to avoid misunderstanding [1].

For the subsequent ECG analysis, the compressed signal should be of sufficient quality to ensure the avoidance of misdiagnosis. The quality can vary according to the aim of the analysis (e.g., specifying QRS complex morphology, tracking ST segment changes, and determining heart rate). For example, if we want to determine heart rate only, we can do this from a signal with lower quality and can therefore compress the signal much more than in case of, for example, tracking ST segment changes. Generally speaking, the quality of the signal after compression should be quantified to decide whether the signal is appropriate for further specific analysis or not [2]. For this purpose, it is advantageous to use automatized methods. These methods will facilitate the work of cardiologists and other medical staff since they will not have to deal with whether the signal is of sufficient quality or not [3]. This can save time [3], especially for the staff employed in telemedicine, where large amounts of data are transmitted and analysed. Quality indexes (the products of automatized algorithms) are also used for compression control [4]. If the signal is of low quality, it is compressed again with different settings. While existing literature presents various approaches (indexes and algorithms), there is no existing standardization or unification [3].

ECG compression is not yet commonly used in practice. This is because of the lack of reliable methods for an evaluation of signal quality after compression [5]. Indeed, the evaluation of the quality of ECG signal after compression is still an open and challenging problem [6]. One of the barriers here relates to the fact that various compression algorithms can cause various types of distortion (more or less important). This can be a major problem, especially in the case of evaluating algorithms without diagnostic information (e.g., Percentage Root mean square Difference (PRD)). For one compression algorithm, the achieved PRD can be, for example, 5 %, while diagnostically important parts of the ECG signal are distorted (the algorithm may, for example, distort the ST segment). For another compression algorithm, the achieved PRD on the same ECG signal could be, for example, 10 %, yet mainly the noise is reduced and the diagnostically important parts remain intact. In other words, noise reduction (a secondary feature of good compression algorithms) increases the PRD, but diagnosing will be simpler [7].

The main aims of this work are to create a review of the methods for the evaluation of signal quality after compression and to create recommendations regarding which quality metrics are the most suitable and what their threshold values must be to ensure that the signal is of sufficient quality. For the purpose of testing various quality measures, the signals from the second most cited standard database of ECG signals [8], the Common Standard in Electrocardiography (CSE) database [9], are used. These signals are compressed using an advanced and very popular wavelet-based Set Partitioning in Hierarchical Trees (SPIHT) algorithm [10, 11].

All the known (to the best of our knowledge) methods for the assessment of ECG signal quality after compression are described in Section 3 of this review along with their relative popularity (according to the number of citations). These methods were tested in terms of their suitability for the evaluation of the quality of ECG signal after compression. A description of the testing and its results including recommendations for the evaluation of ECG signal quality after compression are introduced in Sections 4 and 5, respectively. However, the efficiency of compression should be evaluated first, because this interacts with quality of compression. For this purpose, several metrics are used and are briefly described in Section 2.

#### 2. Known Methods for Evaluation of Compression Efficiency

As mentioned above, compression results in a compromise between efficiency and quality. Hence, it is necessary to calculate both of them [12]. Three main measures are used for the evaluation of compression efficiency: compression ratio (CR), compression factor (CF), and average value length (avL). However, the existing literature is not consistent in the use of the terms CR and CF. In some sources, the authors use the term CR for (3) rather than CF ([5, 6, 13–21]), while in others ([22]), CR is defined differently, that is, (2). Equations (1) and (3) below come from [23]:

$$\mathrm{CR} = \frac{\text{size of the compressed data}}{\text{size of the original data}}, \tag{1}$$

$$\mathrm{CF} = \frac{\text{size of the original data}}{\text{size of the compressed data}}. \tag{3}$$

Using these equations, CR should be less than 1 and CF greater than 1 in order to achieve compression; CF is the reciprocal value of CR. Equation (3) is also valid for another two similar methods, the sample reduction ratio (SRR) [6] and the sample compression ratio (SCR) [24]. These methods differ from CF only in the fact that the numerator as well as the denominator are expressed in samples (not bits). In [25], the bit compression ratio (BCR) is described, which is CF with both numerator and denominator expressed in bits (the ideal case).

Data volume saving (DS) [26] is a measure which uses CF, as shown in (4):

$$\mathrm{DS} = \left(1 - \frac{1}{\mathrm{CF}}\right) \cdot 100\ \%. \tag{4}$$

An alternative to CR and CF is avL [5] or average code length (ACL) [21]; these are two terms for one method, shown in (5):

$$\mathrm{avL} = \frac{\text{size of the compressed data in bits}}{\text{number of samples of the original signal}}. \tag{5}$$

It informs us about the number of bits used for coding one sample; hence its unit is bits per sample (bps). CF can be calculated from avL as the ratio between the original resolution of the signal (in bps) and the avL [27].

Another method similar to avL is called compressed data rate (CDR) [6, 13, 20]. This method can be calculated using different variables, as shown in (6) and (7), and informs us how many bits are needed for coding one second of the signal:

$$\mathrm{CDR} = \frac{\text{size of the compressed data in bits} \cdot fs}{\text{number of samples of the original signal}}, \tag{6}$$

$$\mathrm{CDR} = \frac{\text{resolution} \cdot fs}{\mathrm{CF}}, \tag{7}$$

where $fs$ is the sampling frequency. Resolution in (7) expresses the number of bits per sample of the original signal, and CF is calculated in bits. The unit has the same abbreviation as that of avL (bps) but means bits per second (therefore it is not abbreviated here). Bit rate [5, 21] is very similar to CDR, but it does not work with the sampling rate, as shown in (8):

$$\text{bit rate} = \frac{\text{size of the output stream}}{\text{size of the input stream}}. \tag{8}$$

The variables “size of the input stream” and “size of the output stream” are usually not clearly defined; in an ideal case, they are bit sizes (the number of bits before/after signal compression). In other instances, they can be the lengths of the signals (number of samples) or the size of the output file (which depends on the file format, e.g., .txt, .mat, or .zip, and its compression algorithm if one exists). Therefore, we use avL in this work, because it has clear units. The avL method is, moreover, directly comparable with the original bit resolution of the signal (usually 8-16 bps [6, 28–30], depending on the recording device).
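To make the efficiency measures of this section concrete, the following sketch (a hypothetical helper of our own, `compression_metrics`, not code from any of the cited works) computes CR, CF, DS, avL, and CDR from the bit sizes, assuming sizes are given in bits (the ideal case described above):

```python
def compression_metrics(bits_original, bits_compressed, n_samples, fs):
    """Efficiency measures of Section 2; all sizes are in bits (the ideal case)."""
    cr = bits_compressed / bits_original    # (1): compression ratio, < 1 for real compression
    cf = bits_original / bits_compressed    # (3): compression factor, reciprocal of CR
    ds = (1.0 - 1.0 / cf) * 100.0           # (4): data volume saving in %
    avl = bits_compressed / n_samples       # (5): average value length, bits per sample
    cdr = avl * fs                          # (6): compressed data rate, bits per second
    return {"CR": cr, "CF": cf, "DS": ds, "avL": avl, "CDR": cdr}
```

For example, a 10 000-sample signal stored at 8 bps (80 000 bits) and compressed to 8 000 bits at a sampling frequency of 500 Hz yields CF = 10, avL = 0.8 bps, and CDR = 400 bits per second.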

#### 3. Known Methods for Evaluation of the ECG Signal Quality after Compression

For the evaluation of ECG signal quality after compression and reconstruction, various methods are used. These methods can be divided into two main groups: subjective methods and objective methods. Subjective methods are based on the assessment of ECG signal quality by cardiologists or other experts and are described in Section 3.1, while objective methods are further divided into the following: methods without diagnostic information, methods with diagnostic information based on wavelet transform (WT), and methods with diagnostic information based on delineation. These three groups of methods are defined in Sections 3.2.1, 3.2.2, and 3.2.3, respectively. While the latter two groups of methods noted above both provide diagnostic information, their principles are completely different and are therefore described separately. There also exists at least one method for multilead ECG signal quality evaluation, which is briefly described in Section 3.2.4. In Section 3.2.5 single-lead ECG quality assessment methods are briefly mentioned. A search involving all these methods was conducted through Scopus and they were subsequently sorted according to their popularity. In Section 3.2.6, the ten most commonly used methods are listed. In this review, we have unified the style of presenting the equations for the individual methods to minimise the reader’s work.

##### 3.1. Subjective Methods

The subjective methods for ECG quality evaluation are medically accepted [31], unlike the majority of the objective methods. However, a subjective evaluation of ECG signal after compression requires the input of experts or specialist cardiologists. Thus, the main disadvantage of these methods is that they require the time of such individuals and the attendant financial costs. Moreover, they can only be performed offline. There also exist the problems of intraobserver (one cardiologist can evaluate the signal differently at different times) and interobserver (two cardiologists can diagnose the same signal differently) variability in diagnosis [32]. The factors that can influence this include knowledge level, work experience, practices (procedures) of the particular clinic, motivation, mental fatigue, and psychological state.

The direct method involves the evaluation of the quality of the reconstructed signal visually [3, 33] by a cardiologist or a holter technician. This method can be used as a reference for objective methods. Several sources ([5, 13]) are stricter and insist that the quality should always be verified by a cardiologist. Indeed, these authors [13] suggest that even if the signal is appropriate for further analysis in terms of objective methods, it should be evaluated subjectively by a physician.

The second subjective method is based on the difference in diagnoses from the original and compressed ECG [32, 34]. However, the gold standard among the methods for the evaluation of ECG signal quality after compression is the Mean Opinion Score (MOS) test [2, 3, 28, 31, 35, 36]. Its output is $MOS_{error}$, which informs us directly about the diagnostic distortion (in %). The MOS test consists of blind and semiblind tests. In [35] the MOS test was completed by three cardiologists, while in [30] three cardiologists and three researchers participated. In the blind test, they evaluated the general quality score (from 1, very bad, to 5, excellent) and the interpretation of the P wave, QRS complex, T wave, ST segment, and abnormal beats (scales 1-8). In the semiblind test, they assessed the similarity between the original and reconstructed signals (from 1, completely different, to 5, identical) and evaluated in a binary manner whether the two would be diagnosed differently, without access to the details of the original signal (for more detailed information see [35]). In [37], the authors defined four signal quality groups according to $MOS_{error}$: very good (0-15 %), good (15-35 %), not good (35-50 %), and bad (> 50 %).

##### 3.2. Objective Methods

Objective methods are based on mathematical equations and need no expert human interaction (except for the development phase of some algorithms, where a cardiologist can, e.g., set the weighting matrix or select appropriate features). These methods are automatized and save the time and costs of the cardiologists. By using these methods, we also avoid intraobserver and interobserver variability. While the majority of these methods can work online or with some buffer, they should be carefully selected based on their performance if they are to be used instead of subjective methods. Indeed, as explained below, not all the methods are suitable for ECG quality assessment. Generally speaking, they are based on various principles and can be calculated for the whole signal or in a time window. Some of the methods also include diagnostic information, though these methods are more complex and nontrivial. Some also require a delineation algorithm to assess the diagnostic quality. However, no perfect, universal delineation algorithm currently exists, which can lead to inaccurate values of diagnostic distortion. An objective method is regarded as “good”, provided it corresponds with the subjective evaluation of a cardiologist [28]. In this section, more than 40 objective methods are described.

###### 3.2.1. Without Diagnostic Information

Objective methods without diagnostic information are easy to compute and are therefore popular (they are cited 619 times according to Scopus). Many authors use these methods, meaning new results can be compared with theirs. However, they do not have as high an informative value as methods *with* diagnostic information. In fact, their values can be more or less dependent on the voltage of the signal; to avoid this dependence on voltage, normalization is carried out [2]. Baseline fluctuations influence the output value of these methods [35] and, ideally, should be removed [36]. Meanwhile, some methods are dependent on the noise level [13]. While the equations treat all parts of the ECG signal identically, not all of them have equal importance from a diagnostic point of view [35]. For example, according to [32], a higher distortion of the QRS complex can be less important than a lower distortion of the baseline, where a problem with P wave detection can arise. The correlation between the objective methods without diagnostic information and diagnostic distortion is quite weak [28], which is the main disadvantage of these methods.

The error signal [1, 16] is probably the simplest way to compare the original and reconstructed signal (numerically or visually):

$$e(n) = x(n) - \tilde{x}(n), \quad n = 1, \dots, N, \tag{9}$$

where $x(n)$ is the original signal, $\tilde{x}(n)$ is the reconstructed signal, and $n$ is the index of each sample of the signal of length $N$.

In [38] a similar measure is published. It is called Local Absolute Error (LAE) and applies the absolute value to (9).

Mean Square Error (MSE) [2, 6, 13, 28] is computed according to (10) and its normalized version (NMSE) [13, 28] according to (11):

$$\mathrm{MSE} = \frac{1}{N}\sum_{n=1}^{N}\left(x(n)-\tilde{x}(n)\right)^{2}, \tag{10}$$

$$\mathrm{NMSE} = \frac{\sum_{n=1}^{N}\left(x(n)-\tilde{x}(n)\right)^{2}}{\sum_{n=1}^{N}x(n)^{2}}. \tag{11}$$

Root Mean Square Error, abbreviated as RMS in [3, 28, 35, 39] or as RMSE in [2, 13, 38, 40], is mathematically described by (12). In [20, 41] the equation of RMS differs in subtracting 1 in the denominator (13):

$$\mathrm{RMS1} = \sqrt{\frac{\sum_{n=1}^{N}\left(x(n)-\tilde{x}(n)\right)^{2}}{N}}, \tag{12}$$

$$\mathrm{RMS2} = \sqrt{\frac{\sum_{n=1}^{N}\left(x(n)-\tilde{x}(n)\right)^{2}}{N-1}}. \tag{13}$$

The advantage of this method is that it preserves the original unit (millivolts) [40]. Using RMS as a quality measure for control of the compression is supposedly more effective than using PRD [40].
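As a quick illustration (a sketch of our own, not code from the cited works), MSE and both RMS variants reduce to a few lines of NumPy:

```python
import numpy as np

def mse(x, y):
    """Mean Square Error (10): mean of the squared sample-wise differences."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.mean((x - y) ** 2))

def rms1(x, y):
    """RMS error with N in the denominator (12)."""
    return float(np.sqrt(mse(x, y)))

def rms2(x, y):
    """RMS error with N - 1 in the denominator (13), as used in [20, 41]."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sqrt(np.sum((x - y) ** 2) / (len(x) - 1)))
```

The two RMS variants differ only for short segments; for whole recordings (thousands of samples) the difference is negligible.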

The normalized version of RMS (NRMSE) [2, 13, 28] is shown in (14); NRMSE is almost identical to the following PRD, except for the multiplication by 100 % (in the case of PRD):

$$\mathrm{NRMSE} = \sqrt{\frac{\sum_{n=1}^{N}\left(x(n)-\tilde{x}(n)\right)^{2}}{\sum_{n=1}^{N}x(n)^{2}}}. \tag{14}$$

The Percentage Root mean square Difference (PRD) [3, 5, 10, 16–21, 26, 28, 33, 35, 39, 41–43] or Percentile Root Mean Square Difference (PRMSD) [38] takes into account the mean of the signal (DC component) and the offset (a constant value which is added to the signal for storing purposes; e.g., 1,024 for the MIT-BIH Arrhythmia Database [13, 44, 45]). Both methods have the same equation:

$$\mathrm{PRD} = \sqrt{\frac{\sum_{n=1}^{N}\left(x(n)-\tilde{x}(n)\right)^{2}}{\sum_{n=1}^{N}x(n)^{2}}} \cdot 100\ \%. \tag{15}$$

It is evident that PRD and NRMSE differ only in terms of multiplying the former by 100, which means the unit of PRD is a percentage. For further analysis, the NRMSE is therefore redundant. If the signal has a DC component and/or a nonzero offset and the PRD is calculated, its value will be artificially lower [36]. PRD will also be lower in the case of a high standard deviation of the signal [2]. In [40] it is shown that the PRD does not correspond to the error signal. Therefore, the normalized version is used.

PRD has three normalized versions (PRDN) [2, 5, 13, 15, 18–20, 24, 28, 35, 36, 41–43, 46, 47] that should be used if the signal has a nonzero DC component and/or an added offset (see (16), (17), and (18)):

$$\mathrm{PRDN1} = \sqrt{\frac{\sum_{n=1}^{N}\left(x(n)-\tilde{x}(n)\right)^{2}}{\sum_{n=1}^{N}\left(x(n)-\bar{x}\right)^{2}}} \cdot 100\ \%, \tag{16}$$

$$\mathrm{PRDN2} = \sqrt{\frac{\sum_{n=1}^{N}\left(x(n)-\tilde{x}(n)\right)^{2}}{\sum_{n=1}^{N}\left(x(n)-o\right)^{2}}} \cdot 100\ \%, \tag{17}$$

$$\mathrm{PRDN3} = \sqrt{\frac{\sum_{n=1}^{N}\left(x(n)-\tilde{x}(n)\right)^{2}}{\sum_{n=1}^{N}\left(x(n)-\bar{x}-o\right)^{2}}} \cdot 100\ \%, \tag{18}$$

where $\bar{x}$ is the mean of the original signal (the DC component) and $o$ is the offset. If these components are subtracted from the signal correctly, the results of PRD and PRDN are the same. It is very important to distinguish between PRD and PRDN. Many authors do not define the type of PRD they use and/or do not mention whether the DC component and offset were removed; therefore, it is not possible to compare the performance of such compression algorithms properly [43]. PRDN has a higher value than PRD if the signal contains a DC component and/or an offset [28]. After removing the offset, the DC component is still nonzero [1]. Therefore, the PRDN1 measure is the correct one [1], because it eliminates both in one step. Noise and the DC component have no diagnostic meaning [6].

According to [13, 35, 42], signals of “very good” and “good” quality have PRDN1 and PRDN2 less than 9 % for specific compression algorithms. The specific threshold value of PRDN depends on the principle of compression.
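To make the distinction concrete, here is a minimal NumPy sketch (function names are our own), assuming the usual definitions of PRD and of the mean-normalized PRDN1; it shows how an added storage offset artificially deflates PRD while leaving PRDN1 untouched:

```python
import numpy as np

def prd(x, y):
    """PRD (15): sensitive to the DC component and storage offset of x."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return 100.0 * np.sqrt(np.sum((x - y) ** 2) / np.sum(x ** 2))

def prdn1(x, y):
    """PRDN1 (16): the mean of the original signal is subtracted, which
    removes both the DC component and any added offset in one step."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return 100.0 * np.sqrt(np.sum((x - y) ** 2) / np.sum((x - x.mean()) ** 2))

x = np.array([0.0, 1.0, 0.0, -1.0])   # zero-mean toy "signal"
y = np.array([0.0, 0.9, 0.0, -1.0])   # its toy "reconstruction"
offset = 1024.0                        # e.g., the MIT-BIH storage offset
```

For the zero-mean toy signal, `prd(x, y)` and `prdn1(x, y)` agree; after adding the offset to both signals, `prd` collapses toward zero while `prdn1` is unchanged, which is exactly why undeclared offsets make published PRD values incomparable.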

PRD or PRDN can also be calculated for each subband of the signal after wavelet transform [3]. According to [12], another variant of PRD (here marked as PRDT) can be calculated using (19). The difference is that PRDT uses coefficients of the wavelet transform instead of the samples of the original signal:

$$\mathrm{PRDT} = \sqrt{\frac{\sum_{q \notin \Omega} c(q)^{2}}{\sum_{q} c(q)^{2}}} \cdot 100\ \%, \tag{19}$$

where $q$ are the indexes of all wavelet coefficients, $\Omega$ is the set of indexes of the most significant coefficients left after compression, and $c(q)$ are the transform coefficients.

The Moving Average PRD (MAPRD) [12] was created as a local measure. MAPRD calculates the amount of distortion in a sliding window; (20) holds for one window of length $L$:

$$\mathrm{MAPRD} = \sqrt{\frac{\sum_{n=1}^{L}\left(x(n)-\tilde{x}(n)\right)^{2}}{\sum_{n=1}^{L}x(n)^{2}}} \cdot 100\ \%. \tag{20}$$

Signal to Noise Ratio (SNR) [2, 3, 19, 20, 28, 35, 41] corresponds to (21). Noise is understood here as the difference between the original and reconstructed signal (the error in (9)):

$$\mathrm{SNR} = 10 \log_{10} \frac{\sum_{n=1}^{N}x(n)^{2}}{\sum_{n=1}^{N}\left(x(n)-\tilde{x}(n)\right)^{2}}. \tag{21}$$

SNR can also be computed with the use of PRD:

$$\mathrm{SNR} = -20 \log_{10} \frac{\mathrm{PRD}}{100}. \tag{22}$$

SNR is more accurate than the PRD and PRDN measures when compared with MOS [28].
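Assuming SNR is taken as the ratio of signal power to error power in decibels, its relation to PRD follows algebraically; the sketch below (our own helper names) checks the identity numerically:

```python
import numpy as np

def prd(x, y):
    """PRD as in (15)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return 100.0 * np.sqrt(np.sum((x - y) ** 2) / np.sum(x ** 2))

def snr(x, y):
    """SNR in dB (21), with the error of (9) taken as the noise."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return 10.0 * np.log10(np.sum(x ** 2) / np.sum((x - y) ** 2))

x = np.array([1.0, -2.0, 3.0, -1.0])
y = np.array([1.1, -1.9, 2.8, -1.0])
# SNR and PRD carry the same information: SNR = -20 * log10(PRD / 100)
```

Because the two measures are deterministic functions of each other, reporting both adds no information; the choice between them is purely one of convention.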

Peak Signal to Noise Ratio (PSNR) [18] is shown in (23):

$$\mathrm{PSNR} = 20 \log_{10} \frac{\max\left(\left|x(n)\right|\right)}{\mathrm{RMS1}}. \tag{23}$$

Maximum Amplitude Error (abbreviated as MAX [3, 35, 40] or MAE [38]), Peak Error (PE) [1–3, 13, 28, 35], Maximum Absolute Error [13], or Maximum Error (MaxErr) [15] is one measure, which informs us about the local distortion of the signal and is usually calculated separately for each cycle [28] using

$$\mathrm{MAX} = \max\left(\left|x(n)-\tilde{x}(n)\right|\right). \tag{24}$$

It is possible to calculate MAX for the whole signal as the mean value of MAX in each cycle [28]. MAX can also be modified by weighting the error signal [28]; every sample of the error signal is then weighted by the absolute value or the energy value of the original sample.

Normalized Maximum Amplitude Error (NMAE) [43], in some sources ([2, 13, 28]) called NMAX, is shown in (25). It informs us about the maximal distortion in the signal (the maximally distorted sample). In some studies (e.g., [38]), the equation does not include the factor of 100, and its unit is not a percentage. The normalization lies in dividing the numerator by the difference between the maximum and the minimum of $x(n)$:

$$\mathrm{NMAX} = \frac{\max\left(\left|x(n)-\tilde{x}(n)\right|\right)}{\max\left(x(n)\right)-\min\left(x(n)\right)} \cdot 100\ \%. \tag{25}$$

Peak Amplitude Related Error (PARE) [38] is a normalized method whose product is an error signal (not only one number), as can be seen in (26). Standard Error (S.E.) [13], STDERR [2, 28], or Standard deviation of Errors (StdErr) [15] is one method defined by two different equations. According to [2], the equation of StdErr is identical to (13) of RMS2. The authors in [13, 28] express the STDERR by

$$\mathrm{STDERR} = \sqrt{\frac{\sum_{n=1}^{N}\left(e(n)-\bar{e}\right)^{2}}{N-1}}, \tag{27}$$

where $e$ is the difference between the original and the reconstructed signal and $\bar{e}$ is the mean value of $e$.
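The local-distortion measures can be sketched as follows (our own function names, assuming the standard forms of MAX, NMAX, and the standard-deviation version of STDERR):

```python
import numpy as np

def max_err(x, y):
    """Maximum Amplitude Error (24) over the given segment or cycle."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.max(np.abs(x - y)))

def nmax(x, y):
    """NMAX (25): MAX normalized by the dynamic range of the original, in %."""
    x = np.asarray(x, dtype=float)
    return 100.0 * max_err(x, y) / float(np.max(x) - np.min(x))

def stderr(x, y):
    """STDERR (27): standard deviation of the error signal."""
    e = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(np.sum((e - e.mean()) ** 2) / (len(e) - 1)))
```

Unlike the global PRD family, `max_err` flags a single badly distorted sample (e.g., a clipped R peak) even when the average error is tiny, which is why it is usually evaluated per cycle.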

Cross Correlation (CC) [28, 47] or Normalized Cross Correlation (NCC) [2, 13] is defined according to (28), where $\bar{\tilde{x}}$ is the mean value of the reconstructed signal. To set the record straight, (28) of CC1 is incorrect; the right form is (29) [3, 14, 43, 47]:

$$\mathrm{CC} = \frac{\sum_{n=1}^{N}\left(x(n)-\bar{x}\right)\left(\tilde{x}(n)-\bar{\tilde{x}}\right)}{\sqrt{\sum_{n=1}^{N}\left(x(n)-\bar{x}\right)^{2}\sum_{n=1}^{N}\left(\tilde{x}(n)-\bar{\tilde{x}}\right)^{2}}}. \tag{29}$$

Percentage area difference (PAD) [2, 24, 43] is shown in (30), where $t_{1}$ and $t_{2}$ are the times of the beginning and the end of the segment of interest.
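The correct correlation form of (29) is simply the sample Pearson coefficient; a minimal sketch (our own function name):

```python
import numpy as np

def cc(x, y):
    """Cross correlation in the correct form of (29): both means removed."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    xd, yd = x - x.mean(), y - y.mean()
    return float(np.sum(xd * yd) / np.sqrt(np.sum(xd ** 2) * np.sum(yd ** 2)))
```

Note that `cc` equals 1 for any reconstruction that differs from the original only by gain and offset; this insensitivity to amplitude scaling is worth keeping in mind, since it means CC alone cannot catch amplitude distortion.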

The quality coefficient (*κ*) [48] is introduced in (31). The same source [48] introduces another similar measure: the method of averaged intervals. Here, the quality coefficient *κ* is computed for intervals of a required length (the authors use a length of 25 samples), and the result is their average.

The method of the angle between two vectors (*α*) [48], shown in (32), is based on the fact that the dot product of orthogonal signals is zero:

$$\alpha = \arccos \frac{\sum_{n=1}^{N} x(n)\,\tilde{x}(n)}{\sqrt{\sum_{n=1}^{N} x(n)^{2}}\sqrt{\sum_{n=1}^{N} \tilde{x}(n)^{2}}}. \tag{32}$$

Quality score (QS) [5, 19, 20, 41] is a combination of two methods: CF as an efficiency measure and PRD as a measure of quality (see (33)):

$$\mathrm{QS} = \frac{\mathrm{CF}}{\mathrm{PRD}}. \tag{33}$$

QS is suitable for the comparison of signals with various CF and PRD. The greater the QS, the better the compression.
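Both measures are short enough to sketch directly (our own function names; the angle is returned in radians, and the quality score assumes PRD is given in percent):

```python
import numpy as np

def angle(x, y):
    """Angle between the signals viewed as vectors (32), in radians;
    0 for identical directions, pi/2 for orthogonal signals."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    cos_a = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return float(np.arccos(np.clip(cos_a, -1.0, 1.0)))  # clip guards rounding

def quality_score(cf, prd_percent):
    """Quality score (33): compression factor divided by PRD."""
    return cf / prd_percent
```

For example, a compression run with CF = 10 and PRD = 5 % gives QS = 2; doubling the efficiency at the same distortion doubles QS.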

###### 3.2.2. With Diagnostic Information Based on WT

These methods reflect the diagnostic information contained in the ECG signal. They inform about the distortion of, e.g., P wave, QRS complex, or T wave.

Percentage Error (PE) in [3] can be calculated according to (34) from the wavelet coefficients $c_{i}$ and $\tilde{c}_{i}$ of the original and reconstructed signal, respectively, where the index $i$ denotes the $i$-th subband of the WT. Wavelet-based Weighted PRD (WWPRD) [3, 28] is a method based on the wavelet transform and weighting. The signal is decomposed into subbands using the wavelet transform (9-7 biorthogonal wavelet). The number of levels of the WT is based on the sampling frequency (for details see [3]). Then, the PRD is calculated for each subband similarly to (15); the only difference is in using wavelet subbands instead of the original signal. The use of nonnormalized PRD is relevant, because the means of the original and the reconstructed signal were subtracted beforehand. There exist two types of weights: heuristically set weights and weights calculated as a Wavelet Subband Normalized Area (WWPRD WSNA). The second type of weights, WSNA, takes into consideration the amplitudes and shapes of the signal components. They are calculated as the sum of the wavelet coefficients in the respective subband divided by the sum of all wavelet coefficients (in all subbands). Here, we will consider only WWPRD WSNA (hereafter simply WWPRD), because these weights can be precisely calculated. The WWPRD value is calculated as the sum of the weighted PRDs calculated in the individual subbands. According to [3], this method outperforms PRD, PRDN1, SNR, PE, CC, and RMS in terms of accuracy/uncertainty (in comparison with MOS). However, the tables and graphs in [3] show that CC has even higher accuracy than WWPRD, according to the provided statistical analyses. WWPRD can be affected by baseline wandering [39]; therefore, the baseline should be eliminated. Based on cardiologists’ verification of compressed signals, the authors of [39] recommend compressing ECG signals with a WWPRD under 10 %.
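A compact sketch of the WSNA-weighted idea follows, using a plain orthonormal Haar DWT for brevity instead of the 9-7 biorthogonal wavelet of [3], so the numbers are illustrative only:

```python
import numpy as np

def haar_subbands(x, levels):
    """Decompose x (length divisible by 2**levels) into detail subbands
    plus the final approximation, using an orthonormal Haar DWT."""
    a = np.asarray(x, dtype=float)
    bands = []
    for _ in range(levels):
        a, d = (a[0::2] + a[1::2]) / np.sqrt(2.0), (a[0::2] - a[1::2]) / np.sqrt(2.0)
        bands.append(d)
    bands.append(a)
    return bands

def wwprd_wsna(x, y, levels=2):
    """Weighted PRD over wavelet subbands; each WSNA weight is the subband's
    share of the total absolute coefficient sum of the original signal."""
    bx, by = haar_subbands(x, levels), haar_subbands(y, levels)
    total = sum(float(np.sum(np.abs(b))) for b in bx)
    result = 0.0
    for cx, cy in zip(bx, by):
        energy = float(np.sum(cx ** 2))
        if energy == 0.0:
            continue                      # an empty subband contributes nothing
        prd_i = 100.0 * np.sqrt(float(np.sum((cx - cy) ** 2)) / energy)
        result += (float(np.sum(np.abs(cx))) / total) * prd_i
    return result
```

Because the DWT is linear and the WSNA weights sum to 1, uniformly attenuating a signal by 10 % yields a WWPRD of exactly 10 %, the same as plain PRD; the two measures diverge only when the distortion is concentrated in particular subbands.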

Wavelet-Energy based Weighted PRD (WEWPRD) [4] and Wavelet-Energy based Diagnostic Distortion (WEDD) [28, 30, 46] are two names for one method based on WT and weighting (hereafter abbreviated as WEDD). The signal is decomposed into subbands using the WT. Based on the knowledge of the energy contribution of each frequency subband, a weight for each subband is calculated. Next, the PRD is calculated for each subband similarly to (15); the only difference is in using wavelet subbands instead of the original signal. The WEDD for each subband is then calculated as the product of its PRD and weight. The final WEDD for one ECG signal is obtained as the sum of the WEDDs of all subbands. WEDD was used for control of the SPIHT compression algorithm [4]. It optimizes the rate-distortion performance better than PRDN and WWPRD [46]. WEDD is also robust to the presence of noise in the signals, while being sensitive to any distortion of P waves, T waves, and the QRS complex [28, 46]. Overall, the WEDD algorithm outperforms PRD and WWPRD [46].

Based on an adjusted MOS method, five quality groups of the signal were determined [3]: excellent, very good, good, not bad, and bad. Based on the values of WWPRD [3], PRD [3], and WEDD [28], it can be predicted to which group an ECG signal belongs. The WEDD measure has the highest mean correct prediction value (95 %) and the lowest normalized prediction error (0.6876 %) [28]. In other words, by using the WEDD measure, the signal can be classified into one of the five quality groups with the lowest error among all available methods.

Multiscale Entropy-based Weighted PRD (MSEWPRD) [31] is the newest alternative to WWPRD and WEDD. It is also based on decomposition of the signal using the WT and weighting. The procedure of decomposition and PRD calculation is identical to that of both the previous methods. The innovation here lies in the different calculation of the weights, which is based on the multiscale entropy calculated in each subband. There exist three methods for weight estimation: WSNA, Relative Wavelet Subband Energy estimation (RWSE), and Relative Mean Wavelet Subband Energy estimation (RMWSE). RWSE enhances lower subbands (higher energy), while RMWSE both enhances lower subbands and suppresses higher subbands. MSEWPRD using RMWSE exhibits the highest correlation with the subjective measure MOS among PRD, WWPRD, WEDD, MSEWPRD WSNA, MSEWPRD RWSE, and MSEWPRD RMWSE [31]. Therefore, MSEWPRD is appropriate for the quality evaluation of noisy ECG signals.

###### 3.2.3. With Diagnostic Information Based on Delineation

The methods with diagnostic information based on delineation have the most predictive value. However, their disadvantage is the computational complexity and, for some, the presence of a cardiologist while developing and setting the algorithm (e.g., for feature selection or weights setting). The accuracy of these methods depends on the accuracy of the delineation algorithms. It is thus necessary to use accurate and robust delineation algorithms.

Weighted PRD (WPRD) [2, 14] is an improved version of PRD that includes diagnostic information. As shown in (35), WPRD is the sum of separately calculated distortions of the P wave, Q wave, QRS complex, and ST segment, each weighted in terms of the importance of the wave or complex. The weights should be determined by a cardiologist. The accuracy of WPRD depends on the quality of the delineation [28]:

$$\mathrm{WPRD} = \frac{\sum_{i} w_{i} \cdot \mathrm{RMSE}_{i}}{P}, \tag{35}$$

where $w_{i}$ are the weights, $\mathrm{RMSE}_{i}$ is the RMSE of the current wave/complex/segment, and $P$ is the power of the original signal ($P$ is called PRD in [14]).

The Clinical Distortion Index (CDI) [3, 49] is based on feature extraction and a comparison between the original and the reconstructed signal (see (36)). For the purpose of CDI calculation, 12 features were used, among durations, amplitudes, and morphology. The features were weighted according to their clinical importance (see (37)), where $k$ is the index of the heartbeat, $m$ is the feature index ($m = 1, \dots, M$, where $M$ is the number of clinical features), $V^{\mathrm{ref}}$ is the reference value for each feature, and $f_{k,m}$ and $\tilde{f}_{k,m}$ are the specific clinical features in the particular beat of the original and reconstructed signal, respectively. The values of $V^{\mathrm{ref}}$ are determined by cardiologists and are stated in [49]. Meanwhile, $d$ is the features vector, $E$ is a diagonal weighting matrix (in [3, 49] the identity matrix is used), and tr (trace) is the sum of the elements on the main diagonal of the matrix.

For the WDD estimation [28, 30, 35, 36, 43, 50, 51], it is first necessary to delineate both the original and the reconstructed signal. Using the delineated points, 18 features among the locations, durations, amplitudes, and shapes of the waves and complexes of the ECG signal are extracted, and the WDD is calculated according to

$$\mathrm{WDD} = \Delta\beta^{T}\,\frac{\Lambda}{\mathrm{tr}(\Lambda)}\,\Delta\beta \cdot 100, \tag{38}$$

where $\beta$ is the vector of diagnostic features of the original signal, $\tilde{\beta}$ is the vector of diagnostic features of the reconstructed signal, $\Delta\beta$ is the normalized difference vector, and $\Lambda$ is a diagonal matrix of weights. Equations exist for the calculation of the differences (distances) of durations and amplitudes [35]. The calculation of the shape feature differences is based on a penalty matrix constructed with the use of a database of possible shapes created by a cardiologist. This method is the most complex of all those mentioned in this paper, and more detailed information can be found in [35, 50]. The WDD correlates well with visual inspection [28] and also with MOS, more so, in fact, than PRD [35, 50]. On the other hand, the weights were set by a noncardiologist for the purposes of the study [35]. As written in [35], the weights should reflect the clinical importance of the used features in the real world. Therefore, we suppose that, to reach the highest objectivity, the method requires the cooperation of a cardiologist (to set clinically relevant weights) and is therefore quite expensive and time consuming (at least at the beginning).

Average absolute error (AAE) [47] is a method based on the extraction of ten features among amplitudes, durations, and slopes. Initially, the features are extracted from both the original and the reconstructed signal; then the error for each feature within each cycle $k$ is calculated (39). At the end of the process, the errors of all features are averaged over the whole signal. AAE was used for control of ECG signal compression based on discrete sinc interpolation [47].
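The averaging step can be sketched as follows, assuming the per-cycle features have already been extracted into K-by-M arrays (the feature extraction itself, which requires delineation, is the hard part and is omitted here):

```python
import numpy as np

def aae(features_orig, features_rec):
    """Average absolute error per feature: rows are cycles, columns are
    the extracted features (amplitudes, durations, slopes)."""
    fo = np.asarray(features_orig, dtype=float)
    fr = np.asarray(features_rec, dtype=float)
    return np.mean(np.abs(fo - fr), axis=0)   # one AAE value per feature
```

Reporting one AAE value per feature (rather than a single pooled number) keeps features with different units, such as milliseconds and millivolts, from being mixed.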

The method based on heartbeats classification using multilayer perceptron neural networks (NN) [5] also belongs in this group of methods. Here, it is first necessary to segment the ECG signal to individual heartbeats using the R-wave detection algorithm. The heartbeats are further classified into eight groups (the eight most common types of heartbeats). NN is then trained on the original signal and tested on the reconstructed signal.

Another method for quality assessment of the ECG signal after compression is based on the sensitivity (SE) and positive predictivity (+P) of QRS detection [26, 52]:

$$SE = \frac{TP}{TP + FN} \cdot 100\ \left[\%\right],$$

$$+P = \frac{TP}{TP + FP} \cdot 100\ \left[\%\right],$$

where $TP$ (true positives) are correctly detected QRS complexes, $FN$ (false negatives) are QRS complexes that were not detected, and $FP$ (false positives) are incorrectly detected QRS complexes (according to the annotations). Note that the authors of [26, 52] refer to the second measure as specificity (SP) although they use the equation for positive predictivity. The QRS complexes are detected in the reconstructed signal, and their positions are compared with the annotations; a tolerance of 88 ms on both sides was considered [52]. In [25], the authors use SE and +P with the correct equations.
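The two measures can be computed from annotated and detected QRS positions with simple tolerance-based matching. The greedy first-match pairing below is our own simplification; the 88 ms tolerance from [52] is used as the default.

```python
def detection_metrics(annotated, detected, fs, tol_s=0.088):
    """Sensitivity (SE) and positive predictivity (+P) of QRS detection.

    annotated / detected : QRS positions in samples; a detection within
    +/- tol_s seconds (88 ms in [52]) of an unused annotation counts as TP.
    """
    tol = tol_s * fs
    used = set()
    tp = 0
    for d in detected:
        match = next((i for i, a in enumerate(annotated)
                      if i not in used and abs(d - a) <= tol), None)
        if match is not None:
            used.add(match)
            tp += 1
    fp = len(detected) - tp   # detections with no matching annotation
    fn = len(annotated) - tp  # annotations with no matching detection
    se = 100.0 * tp / (tp + fn) if annotated else 0.0
    pp = 100.0 * tp / (tp + fp) if detected else 0.0
    return se, pp
```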

The percentage similarity (PSim) [26, 52] is a measure based on features derived from detected QRS complexes. The QRS complexes are detected first; then the features $p$ are calculated: the mean normal-to-normal interval (NN), the standard deviation of NN (SDNN), the low-frequency/high-frequency (LF/HF) ratio (using the Lomb periodogram to compute the power spectral density in the low-frequency band, 0.04-0.15 Hz, and the high-frequency band, 0.15-0.4 Hz), and the high-frequency (HF) power. Each feature extracted from the original signal is compared with the corresponding feature derived from the reconstructed signal according to

$$\mathrm{PSim} = \left( 1 - \frac{\left| p_o - p_r \right|}{p_o} \right) \cdot 100\ \left[\%\right],$$

where $p_o$ and $p_r$ are the feature values computed from the original and the reconstructed signal, respectively.

Similarity [26] is a method that also uses the detection of QRS complexes. The complexes are detected in the reconstructed signal and then compared with annotations of the QRS complexes (e.g., from the standard databases). From the corresponding article, it is not clear whether the authors consider only the positions or both the positions and the values (amplitudes).

Heart rate trace (HRT) [26] is a calculation of the heart rate in beats per minute (bpm) according to

$$\mathrm{HRT} = \frac{60 \cdot fs}{beatIntervals}\ \left[\mathrm{bpm}\right],$$

where $fs$ is the sampling frequency of the signal and $beatIntervals$ is the length of the RR interval in samples.
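A minimal sketch of PSim for a single feature, together with the per-beat heart-rate conversion used by HRT. The function names are ours, and the PSim expression is the normalized-difference form described above; the exact definition in [26, 52] may include additional details.

```python
def psim(p_original, p_reconstructed):
    """Percentage similarity of one HRV feature p (e.g., mean NN or SDNN):
    the normalized absolute difference subtracted from 100 %."""
    return (1.0 - abs(p_original - p_reconstructed) / abs(p_original)) * 100.0

def heart_rate_trace(beat_intervals, fs):
    """Heart rate trace in bpm from RR intervals given in samples."""
    return [60.0 * fs / rr for rr in beat_intervals]
```

For example, an SDNN of 48 ms in the reconstructed signal against 50 ms in the original gives a PSim of 96 %, and at fs = 500 Hz an RR interval of 500 samples corresponds to 60 bpm.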

Detection of five significant ECG points is the basis of the method published in [7]. Both the original and the reconstructed signals are delineated, and the positions of the significant points in each are compared with the annotated positions (within a tolerance). The method was tested on signals from the CSE database compressed with the SPIHT algorithm; the authors state that the minimum acceptable avL was 0.8 bps, with PRDN at around 5%.

Dynamic time warping (DTW) [53, 54] is a method that aligns (warps) two signals so that they reach the same length. If DTW is applied to both the original and the reconstructed signal, the fiducial points of the original signal should match those in the reconstructed signal (within a tolerance). If they do, the reconstructed signal is of high quality and the diagnostic information is preserved; if they do not, the signal is distorted. Delineation algorithms are used to find the fiducial points. According to [53], this method provides information similar to that of a cardiologist: it states how much the positions of the fiducial points in the original and reconstructed signals differ on average and what the standard deviation of the differences is. However, the method has not been described in detail.
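Since [53] does not describe the procedure in detail, the following is only the textbook DTW recurrence with absolute difference as the local cost, not the authors' exact method:

```python
def dtw_distance(x, y):
    """Classic O(len(x) * len(y)) dynamic time warping distance between
    two 1-D sequences, using the absolute difference as the local cost."""
    inf = float("inf")
    n, m = len(x), len(y)
    # D[i][j] = minimal accumulated cost aligning x[:i] with y[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m]
```

A zero distance means the two sequences can be warped onto each other exactly, which is the sense in which matched fiducial points indicate preserved diagnostic information.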

Partial PRD [53, 54] is a method that calculates PRD separately in the diagnostically important segments of the ECG signal (the PQRST complex, from the P onset to the T offset) and in the diagnostically unimportant segments between PQRST complexes (from the T offset to the P onset). The distortion in the PQRST segments should be as low as possible, while the distortion of the interbeat segments can be higher. The authors of this method used annotations of the P onset and T offset and tested the method on signals from the fully annotated QT Database.
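A sketch of the idea, assuming the segment boundaries are available as sample-index ranges (e.g., from the P-onset and T-offset annotations); the function names are ours:

```python
import math

def prd(x, y):
    """Percentage RMS difference between original x and reconstruction y."""
    num = sum((a - b) ** 2 for a, b in zip(x, y))
    den = sum(a ** 2 for a in x)
    return 100.0 * math.sqrt(num / den)

def partial_prd(x, y, segments):
    """PRD restricted to the samples of the given (start, end) index
    ranges, e.g., the annotated P-onset..T-offset segments."""
    xs, ys = [], []
    for start, end in segments:
        xs.extend(x[start:end])
        ys.extend(y[start:end])
    return prd(xs, ys)
```

Calling `partial_prd` once with the PQRST ranges and once with the complementary interbeat ranges yields the two separate distortion figures described above.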

###### 3.2.4. Methods Developed for Multilead ECG

The objective methods described above were primarily designed for single-lead ECG. For multilead ECG (MECG), indexes such as PRD, MSE, RMSE, WEDD, or WDD can be applied separately for each lead [2, 30]. To express the distortion of MECG with one single figure, the average value of the following measures along all leads can be calculated: multichannel PRD (MPRD); multichannel MSE (MMSE); multichannel RMSE (MRMSE); and multichannel WEDD (MWEDD) [2]. There also exist methods that were developed specifically for MECG.
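For example, MPRD can be sketched as the mean of single-lead PRD values (a direct reading of the averaging described above; the function names are ours):

```python
import math

def prd(x, y):
    """Single-lead percentage RMS difference."""
    return 100.0 * math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y))
                             / sum(a ** 2 for a in x))

def multichannel_prd(original_leads, reconstructed_leads):
    """MPRD: the per-lead PRD averaged over all leads of a multilead ECG."""
    values = [prd(o, r) for o, r in zip(original_leads, reconstructed_leads)]
    return sum(values) / len(values)
```

MMSE, MRMSE, and MWEDD follow the same pattern with the corresponding single-lead measure substituted for PRD.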

The MSD diagnostic measure is based on multivariate sample entropy (MSampEn), which is an alternative to single-lead sample entropy [2]:

$$\mathrm{MSD} = \left| e_o - e_r \right|,$$

where $e_o$ and $e_r$ are the MSampEn values of the original and the reconstructed signal, respectively. The calculation of MSampEn is not trivial; it is explained in detail in [55].

###### 3.2.5. Single-Lead ECG Quality Assessment Methods

For completeness, there also exist methods that can assess the quality (clinical acceptability) of a single ECG signal, i.e., without any reference (such as the original signal in the case of compression). A review of these methods is given in [56], and [57] is an example of one of the latest methods. The signal is very often corrupted by noise and artefacts, which can make diagnosis more difficult or even inaccurate and decrease the accuracy of detectors and delineation algorithms. It is therefore useful to know the quality of the signal (most often categorized into two groups, acceptable and unacceptable [56]). These methods are not directly connected with compression, but they can be utilized in this area: if the signal or a part of it is unacceptable, it is discarded and neither compression nor transmission from wearable sensors is performed. Another possibility is to set the compression algorithm adaptively based on the knowledge of the signal quality [56].

###### 3.2.6. Popularity of the Methods

The popularity of the methods for evaluating ECG signal quality after compression was ascertained using Scopus. Articles that used a specific method were searched for using keywords and Boolean operators. In all cases, the keywords “ECG” and “compression” were combined with the Boolean operator AND. Simultaneously, the full name or abbreviation(s) of the quality evaluation method were combined with the Boolean operator OR. One example of our use of keywords and Boolean operators is *TITLE-ABS-KEY (“ecg” AND “compression”) AND ALL (“weighted diagnostic distortion” OR “wdd”)*. The search results were corrected manually, since some of the retrieved articles were irrelevant. The ten most commonly used methods are shown in Table 1.