Abstract

The joint time-frequency analysis method represents a signal in both time and frequency. Thus, it provides more information than one-dimensional methods. Several researchers have recently used time-frequency methods such as the wavelet transform, short-time Fourier transform, and empirical mode decomposition and reported impressive results in various electrophysiological studies. The current review provides comprehensive knowledge about different time-frequency methods and their applications in various ECG-based analyses. Typical applications include ECG signal denoising, arrhythmia detection, sleep apnea detection, biometric identification, emotion detection, and driver drowsiness detection. The paper also discusses the limitations of these methods. The review will serve as a reference for future researchers working in the same field.

1. Introduction

The electrocardiogram (ECG) signal has long been an indicator of human health. It is the graphical representation of the electrical activity of the heart muscles occurring due to their contraction and relaxation [1]. A single cardiac cycle is labeled using different waves: P, Q, R, S, and T. The locations and amplitudes of these waves are used primarily in ECG analysis in medical practice. They help to predict the onset of cardiovascular diseases, irregularities in heart rhythm, stress levels, human emotions, and so on. A standardized ECG signal is represented via twelve leads, each calculated using a set of limb and chest electrodes. Conventionally, the ECG waves were visually observed and analyzed by an expert. The evaluation includes detecting any subtle change in the time-series information, which covers morphological details such as the RR interval, QT interval, ST segment, QRS complex, and so on [2], and their statistical variations. Unfortunately, it is not always possible to track the minute changes in the morphological parameters (intervals, peaks, and waves) of the ECG signal by visual inspection.

The ECG signal is nonstationary; i.e., the statistical properties of the signal, such as mean, variance, and higher-order moments, change with time. A nonstationary time series contains systematic noise (trends, jumps, and datum shifts) that may change its statistical values. Hence, time-series analysis alone is not enough for a meaningful interpretation. Also, traditional signal processing methods based on stationarity assumptions are insufficient. Therefore, the time-series data are decomposed into another domain, frequency or time-frequency, for easier analysis [3]. The Fourier transform (FT) is the most widely employed method for frequency analysis. The technique uses sinusoidal basis functions to represent a time-series signal in the frequency domain. The amplitudes of the measured sinusoids at different frequencies form a spectrum. It is one of the transformation methods that has changed the world of signal processing and has diverse applications in feature extraction, denoising, and so on. However, the FT does not provide any information about when a given frequency component occurs.
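
The lack of time information in the FT can be illustrated with a minimal sketch. It assumes NumPy as a dependency and uses two synthetic two-tone signals (stand-ins, not ECG recordings) that contain the same 10 Hz and 40 Hz segments in opposite temporal order; the sampling frequency of 360 Hz is likewise an assumption chosen for illustration.

```python
import numpy as np

fs = 360                      # assumed sampling frequency in Hz
t = np.arange(fs) / fs        # one second of samples

tone_low = np.sin(2 * np.pi * 10 * t)    # 10 Hz segment
tone_high = np.sin(2 * np.pi * 40 * t)   # 40 Hz segment

# The same two segments, arranged in opposite order in time.
low_then_high = np.concatenate([tone_low, tone_high])
high_then_low = np.concatenate([tone_high, tone_low])

mag_a = np.abs(np.fft.rfft(low_then_high))
mag_b = np.abs(np.fft.rfft(high_then_low))

# Frequency axis (Hz) shared by both magnitude spectra.
freqs = np.fft.rfftfreq(low_then_high.size, d=1 / fs)

# The magnitude spectra coincide: the FT shows which frequencies are present,
# but not when they occur.
spectra_match = np.allclose(mag_a, mag_b)
```

Because one signal is a circular shift of the other, only the phase of the spectrum differs, so the two signals cannot be told apart from their magnitude spectra alone.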

Joint time-frequency analysis is a valuable method that expresses a signal as a time-frequency distribution [4]. It helps disclose the constituent frequency components of the signal and their time-varying nature. Several time-frequency analysis methods have been proposed to analyze ECG signals in various application domains. These methods include, but are not limited to, the short-time Fourier transform (STFT), continuous wavelet transform (CWT), discrete wavelet transform (DWT), empirical mode decomposition (EMD), and Wigner–Ville distribution (WVD) [5, 6]. These methods help extract vital signal components such as distortions, noise, and hidden patterns of the ECG waves and have been extensively used in various applications. Typical examples are arrhythmia detection, heart disease diagnosis, peak detection, signal denoising, and emotion detection [7–9]. Also, these methods form the base of several advanced joint time-frequency techniques.

Despite the wide application of joint time-frequency analysis, it is unfortunate that no dedicated review is found in the literature that discusses different time-frequency methods for ECG applications. The reason may be that time-frequency analysis is a massive field with various possible applications. Hence, placing a vast amount of information in a single review is not easy. However, based on our limited knowledge, we have attempted to extensively review some selected time-frequency methods and their use in various ECG signal processing applications in this article (Figure 1). The remainder of the paper is organized into four sections. Section 2 gives background information on the time-frequency methods. The usefulness of these time-frequency methods in various ECG applications is discussed in Section 3. Section 4 deliberates the limitations, challenges, and future scope, followed by Section 5, which concludes the study. Table 1 contains the list of abbreviations used in this article.

2. Background Information of the Time-Frequency Analysis Methods

Time-domain analysis gives the best time resolution but no frequency information. Conversely, frequency-domain analysis provides the best frequency resolution without time-related details. A proper time-frequency technique can overcome this disadvantage of one-dimensional analysis and provide signal information in both the time and frequency domains. Some of the most widely used time-frequency analysis methods are discussed in this section.

2.1. Short-Time Fourier Transform

In 1946, D. Gabor [10], a Hungarian scientist, proposed the short-time Fourier transform (STFT). In the STFT, the Fourier transform (FT) is applied over a limited duration. The process follows a segmented analysis in which the original signal is first divided into smaller segments of length “L” using a window. The FT of each segment is then calculated. In other words, the STFT provides the spectral information of each segment of the signal. For a continuous-time signal $x(t)$, the STFT coefficients can be represented mathematically using the following:

$$X(\tau, \omega) = \int_{-\infty}^{\infty} x(t)\, w(t - \tau)\, e^{-j\omega t}\, dt \tag{1}$$

where $X(\tau, \omega)$ is the FT of the windowed signal, $w(t)$ is the window function, and $\tau$ and $\omega$ represent the time and frequency axes, respectively.

The original signal $x(t)$ can be retrieved using the inverse STFT. For a window normalized to unit area, it is represented using the following equation:

$$x(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} X(\tau, \omega)\, e^{j\omega t}\, d\tau\, d\omega \tag{2}$$

For calculating the STFT of a discrete-time signal, the discrete Fourier transform (DFT) can be used in place of the FT. Mathematically, it is represented using the following equation:

$$X[m; k] = \sum_{n=0}^{L-1} x[n + m]\, w[n]\, e^{-j 2\pi k n / L} \tag{3}$$

Here, $m$ is the starting point of the localized DFT, $k$ is the DFT index, and $L$ is the length of the window or segment. $X[m; k]$ are the Fourier coefficients, which depend on the time (through $m$) and the frequency $\omega_k = 2\pi k / L$.

The STFT is a complex-valued function of two variables and requires a 4D plot of time, frequency, magnitude, and phase for proper interpretation, which is not practical. Thus, the phase information is not considered while plotting the STFT spectrogram. In other words, an STFT spectrogram is represented by time, frequency, and magnitude values. Furthermore, a color-coding method is applied to the magnitude range, where a darker color represents a smaller magnitude value and vice versa. It is important to note that the size of the window has a profound effect on the frequency resolution. A wider window provides fewer time segments, resulting in lower precision in time but a high frequency resolution. On the other hand, a narrow time window gives a high time resolution but a low frequency resolution. Since the window length is fixed in the STFT method, the time and frequency resolutions are fixed for the entire signal length. Figure 2 is a sample representation of an ECG segment of duration 1 s (sampling frequency 360 Hz) and its STFT at varying window lengths (L = 2, 9, and 18). It is evident from Figure 2 that with an increase in the window length, the changes in the time-domain values are less visible. On the contrary, the frequency-domain changes become more pronounced.
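
The window-length trade-off described above can be made concrete with a minimal sketch of a discrete STFT. It assumes NumPy, a Hann window, a 50% overlap between segments, and a synthetic two-tone signal sampled at 360 Hz; none of these choices comes from the studies reviewed here, and the function name `stft` is only illustrative.

```python
import numpy as np

def stft(x, window_length, hop):
    """Discrete STFT: one DFT per Hann-windowed segment of `window_length` samples."""
    window = np.hanning(window_length)
    starts = range(0, len(x) - window_length + 1, hop)   # segment start indices m
    frames = np.array([x[m:m + window_length] * window for m in starts])
    return np.fft.rfft(frames, axis=1)                   # rows: time, columns: frequency

fs = 360                                   # assumed sampling frequency in Hz
t = np.arange(2 * fs) / fs                 # two seconds of samples
# Synthetic test signal: 10 Hz during the first second, 40 Hz during the second.
x = np.where(t < 1.0, np.sin(2 * np.pi * 10 * t), np.sin(2 * np.pi * 40 * t))

resolution = {}
for L in (18, 72):
    X = stft(x, window_length=L, hop=L // 2)
    # (number of time frames, spacing between frequency bins in Hz)
    resolution[L] = (X.shape[0], fs / L)
```

With these settings, the short window (L = 18) yields many time frames but frequency bins spaced 20 Hz apart, whereas the long window (L = 72) yields few time frames but bins spaced 5 Hz apart.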

2.2. Continuous Wavelet Transform

The wavelet transform (WT) is a processing tool that has been widely used in signal and image processing and speech analysis. In 1984, two French scientists, Grossmann and Morlet, first coined the term “wavelet” and described it as a wave-like structure [11]. A wavelet has an amplitude that starts and ends at zero, and the integral of its amplitude is zero. A detailed historical background of wavelets is presented in [12, 13]. Several wavelet functions are available with diverse shapes and characteristics. Some common wavelets include Haar, Daubechies, Coiflet, and Symlet. The WT method solves the resolution problem associated with the FT by providing a suitable resolution in both time and frequency. This is made possible by adopting a variable window function, wherein the window function shrinks and widens multiple times. The continuous wavelet transform (CWT) decomposes a given signal into different coefficients. Herein, a basis function called the mother wavelet is dilated and translated. Mathematically, the CWT is represented using

$$W(s, \tau) = \frac{1}{\sqrt{|s|}} \int_{-\infty}^{\infty} x(t)\, \Psi^{*}\!\left(\frac{t - \tau}{s}\right) dt \tag{4}$$

where $W(s, \tau)$ are the wavelet coefficients and $\Psi^{*}$ represents the complex conjugate of the mother wavelet.

In equation (4), the term $1/\sqrt{|s|}$ is used to normalize the mother wavelet ($\Psi$). The transformed signal generated after employing the CWT depends on the scaling factor ($s$) and the translation factor ($\tau$). The scaling factor shows an inverse association with frequency. A lower value of $s$ leads to a rapid change in the wavelet and is used to detect the higher frequencies of the signal and capture the fast-varying details. On the contrary, a higher value of $s$ helps perceive the lower frequency components and captures the slowly varying details of the signal.

The reconstruction of the original signal can be obtained using

$$x(t) = \frac{1}{C_{\Psi}} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} W(s, \tau)\, \frac{1}{\sqrt{|s|}}\, \Psi\!\left(\frac{t - \tau}{s}\right) \frac{d\tau\, ds}{s^{2}} \tag{5}$$

where $C_{\Psi}$ is the admissibility constant of the mother wavelet.

The scalogram is the absolute value of the continuous wavelet transform (CWT) as a function of time and frequency. Compared to the spectrogram, a scalogram provides more information as it gives the signal features at different scales. Figure 3 represents a sample ECG signal and its scalogram. As mentioned earlier, it is evident from the figure that the perceived frequency band gets narrower with an increase in scale. ECG scalogram images are often used with deep learning models and have shown potential in various biomedical applications, including arrhythmia detection, apnea detection, and fall detection. The disadvantage of the CWT is that it is highly redundant, showing a significant overlap between the wavelets at each scale and between scales [14]. Furthermore, it is associated with higher computational complexity.
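
The inverse relation between scale and frequency can be checked with a minimal sketch of a directly evaluated CWT and its scalogram. The sketch assumes NumPy, a complex Morlet mother wavelet with centre frequency `omega0 = 6`, scales expressed in seconds, and a synthetic 10 Hz sinusoid sampled at 360 Hz; the function names `morlet` and `cwt` are illustrative and do not refer to any particular toolbox. It also assumes that the sampled wavelet is shorter than the signal.

```python
import numpy as np

def morlet(u, omega0=6.0):
    """Complex Morlet mother wavelet (approximately zero-mean for omega0 >= 5)."""
    return np.pi ** -0.25 * np.exp(1j * omega0 * u) * np.exp(-0.5 * u ** 2)

def cwt(x, scales, fs, omega0=6.0):
    """Evaluate W(s, tau) on the sample grid by direct convolution (scales in seconds)."""
    coeffs = np.empty((len(scales), len(x)), dtype=complex)
    for i, s in enumerate(scales):
        half = int(np.ceil(4 * s * fs))             # support of about +/- 4 envelope widths
        u = np.arange(-half, half + 1) / (fs * s)   # (t - tau) / s on the sample grid
        # Convolution kernel h(v) = conj(Psi(-v / s)) / sqrt(s), so that
        # sum_t x(t) h(tau - t) = (1 / sqrt(s)) * sum_t x(t) conj(Psi((t - tau) / s)).
        kernel = np.conj(morlet(-u, omega0)) / np.sqrt(s)
        coeffs[i] = np.convolve(x, kernel, mode="same") / fs   # dt = 1 / fs
    return coeffs

fs = 360                                        # assumed sampling frequency in Hz
t = np.arange(2 * fs) / fs
x = np.sin(2 * np.pi * 10 * t)                  # synthetic 10 Hz test signal

omega0 = 6.0
frequencies = np.array([5.0, 10.0, 20.0, 40.0])   # analysis frequencies in Hz
scales = omega0 / (2 * np.pi * frequencies)       # smaller scale <-> higher frequency

scalogram = np.abs(cwt(x, scales, fs, omega0))    # rows: scales, columns: time
dominant = frequencies[np.argmax(scalogram.mean(axis=1))]
```

For this test signal, the row of the scalogram with the largest average magnitude is the one whose scale corresponds to 10 Hz.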

2.3. Discrete Wavelet Transform

Stromberg [15], a Swedish mathematician, proposed the mathematical foundation for the discrete wavelet transform (DWT) in 1980 [16]. A significant drawback of the CWT is that the scaling factor (s) and the translation factor vary continuously, and hence the wavelet coefficients are calculated for all possible scales. Thus, the method yields a large amount of information [17], which is difficult to process. In contrast, the DWT addresses this issue by representing the signal at discrete times and scales as a set of wavelet coefficients. In the DWT, the signal passes through a low-pass filter (LPF) and a high-pass filter (HPF), which split the signal into two bands, each covering half of the original frequency range [18, 19]. The low-pass filter output is the approximation component (A), and the high-pass filter output is the detailed component (D). The approximation component is further decomposed to form another set of approximation and detailed components at each subsequent level. Figure 4 represents the wavelet filter bank for the DWT, where x (n) is the original signal, and A and D bear their usual meaning.

The DWT can be of two types, depending on whether each filter’s output is down-sampled by two. If the filter output is down-sampled during the decomposition process, it is called a decimated DWT. The undecimated DWT, also known as the stationary wavelet transform (SWT), does not incorporate the down-sampling operation at the filter output. Thus, in the case of the SWT, the lengths of the approximation and detailed coefficients are the same as that of the original signal. Usually, the term DWT refers to the decimated method by default, which is the most commonly used due to its lower computational complexity compared with the undecimated method.

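Consider a time-series signal $x(n)$ with $m$ samples, i.e., $n$ ranges from 0 to $m - 1$. The approximation and detailed coefficients of the forward wavelet transform, computed with the scaling function $\varphi$ and the wavelet function $\psi$, can be represented using the following equations:

$$W_{\varphi}(j_0, k) = \frac{1}{\sqrt{m}} \sum_{n=0}^{m-1} x(n)\, \varphi_{j_0, k}(n) \tag{6}$$

$$W_{\psi}(j, k) = \frac{1}{\sqrt{m}} \sum_{n=0}^{m-1} x(n)\, \psi_{j, k}(n), \quad j \geq j_0 \tag{7}$$

where $j_0$ is the starting scale, $j$ is the scale index, and $k$ is the translation index.

Then, the signal $x(n)$ can be represented (equation (8)) using the scaling and wavelet functions.

$$x(n) = \frac{1}{\sqrt{m}} \sum_{k} W_{\varphi}(j_0, k)\, \varphi_{j_0, k}(n) + \frac{1}{\sqrt{m}} \sum_{j = j_0}^{\infty} \sum_{k} W_{\psi}(j, k)\, \psi_{j, k}(n) \tag{8}$$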

Equation (8) is also known as an inverse discrete wavelet transform. Figure 5 represents a sample representation of an ECG signal and its DWT coefficients after the 3rd level of decomposition using the db2 mother wavelet.
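
The filter-bank view of the decimated DWT and its inverse can be sketched in a few lines. The example below assumes NumPy and, for brevity, uses the Haar wavelet rather than the db2 wavelet of Figure 5; the test signal is synthetic, its length is assumed to be divisible by 2 raised to the number of levels, and all function names are illustrative.

```python
import numpy as np

def haar_dwt_level(x):
    """One decimated Haar analysis step: low-pass / high-pass filtering, then down-sampling by 2."""
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)   # low-pass output (A)
    detail = (even - odd) / np.sqrt(2.0)   # high-pass output (D)
    return approx, detail

def haar_idwt_level(approx, detail):
    """Inverse of haar_dwt_level: rebuild the signal from its A and D halves."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x

def haar_wavedec(x, levels):
    """Multi-level DWT: only the approximation is decomposed further at each level."""
    approx, details = np.asarray(x, dtype=float), []
    for _ in range(levels):
        approx, detail = haar_dwt_level(approx)
        details.append(detail)             # details[0] is D1 (finest level)
    return approx, details

def haar_waverec(approx, details):
    """Inverse DWT: undo the levels from the coarsest to the finest."""
    for detail in reversed(details):
        approx = haar_idwt_level(approx, detail)
    return approx

fs = 360                                   # assumed sampling frequency in Hz
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 5 * t) + 0.2 * np.sin(2 * np.pi * 60 * t)   # synthetic signal

approx, details = haar_wavedec(x, levels=3)
reconstructed = haar_waverec(approx, details)
```

Each level halves the number of coefficients (180, 90, and 45 detailed coefficients plus 45 approximation coefficients here), and the inverse transform recovers the input up to floating-point error.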

2.4. Wavelet Packet Decomposition (WPD)

Wavelet packet decomposition (WPD) extends the DWT: both the approximation and the detailed coefficients are decomposed at each subsequent level. Hence, WPD provides a better frequency and time resolution compared to the DWT. Figure 6 represents the wavelet filter bank for WPD, where x (n), A, and D bear their usual meaning, as described in Section 2.3. Similar to the DWT, the WPD can be of two types: decimated and undecimated. Generally, WPD follows the decimated method. A sample ECG signal and its wavelet coefficients after the 2nd level of decomposition using the db2 mother wavelet are represented in Figure 7.
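
The difference from the DWT, namely that every node is split at every level, can be sketched as follows. The example assumes NumPy, the decimated method, and the Haar wavelet in place of db2; the node labels (for example "AD" for the detailed branch of the approximation) and the function names are illustrative only.

```python
import numpy as np

def haar_split(x):
    """Decimated Haar split of one node into approximation (A) and detailed (D) halves."""
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

def wavelet_packet(x, levels):
    """Full wavelet packet tree: both A and D nodes are decomposed at every level."""
    nodes = {"": np.asarray(x, dtype=float)}
    for _ in range(levels):
        next_nodes = {}
        for path, coeffs in nodes.items():
            approx, detail = haar_split(coeffs)
            next_nodes[path + "A"] = approx    # label records the filter path
            next_nodes[path + "D"] = detail
        nodes = next_nodes
    return nodes

fs = 360                                       # assumed sampling frequency in Hz
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 5 * t) + 0.2 * np.sin(2 * np.pi * 60 * t)   # synthetic signal

nodes = wavelet_packet(x, levels=2)            # four terminal nodes at level 2
signal_energy = np.sum(x ** 2)
packet_energy = sum(np.sum(c ** 2) for c in nodes.values())
```

At the 2nd level, the signal is represented by four sub-bands (AA, AD, DA, and DD) of equal length, and because the Haar filters are orthonormal, the total energy of the coefficients equals that of the signal.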

2.5. Wigner-Ville Distribution (WVD)

In 1932, Wigner [20], a Hungarian physicist, proposed the basis of the Wigner-Ville distribution (WVD) function. The WVD is a quantitative representation of signal energy in the time-frequency domain. This method uses the autocorrelation function for the calculation of the power spectrum. The autocorrelation function (ACF) compares a signal $x(t)$ to itself for all possible time shifts and is represented using the following equation:

$$R_x(\tau) = \int_{-\infty}^{\infty} x(t + \tau)\, x^{*}(t)\, dt \tag{9}$$

In the ACF, the signal is integrated over time, which makes it a function dependent only on the time shift $\tau$. However, the WVD uses a variation of the ACF called the instantaneous autocorrelation function (IACF) to maintain the time parameter, and it is represented using the following equation:

$$R_x(t, \tau) = x\!\left(t + \frac{\tau}{2}\right) x^{*}\!\left(t - \frac{\tau}{2}\right) \tag{10}$$

The WVD function compares the signal information with its own at different times and frequencies. It can be viewed as the FT of the IACF with respect to $\tau$.

Mathematically, it is defined using the following equation (11):

$$W_x(t, \omega) = \int_{-\infty}^{\infty} x\!\left(t + \frac{\tau}{2}\right) x^{*}\!\left(t - \frac{\tau}{2}\right) e^{-j\omega\tau}\, d\tau \tag{11}$$

Compared to the STFT, the WVD gives better spectral resolution as it does not suffer from leakage. However, when a signal has several frequency components, it may be affected by cross-terms [21]. Cross-terms occur when the input signal contains multiple components; they appear between the components in the time-frequency plane and are analogous to beats in time and frequency. The cross-terms can be reduced by modifying the WVD function with a sliding averaging window in the time-frequency plane. The result is known as the pseudo-WVD (PWVD) [22] and is more widely used than the WVD. However, the PWVD only reduces the effect of cross-terms to some extent and does not eliminate them.

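Mathematically, the PWVD is represented using the following equation (12):

$$PW_x(t, \omega) = \int_{-\infty}^{\infty} h(\tau)\, x\!\left(t + \frac{\tau}{2}\right) x^{*}\!\left(t - \frac{\tau}{2}\right) e^{-j\omega\tau}\, d\tau \tag{12}$$

where $h(\tau)$ is the window function, $x^{*}$ is the complex conjugate of the signal, $\tau$ is the time lag, and $\omega$ is the angular frequency.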

Figure 8 represents the PWVD of an ECG signal (360 Hz, duration 1 sec). Each data point in the WVD plot is represented with three signal variables: amplitude, time, and frequency.
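
A discrete version of the PWVD can be sketched as the DFT, over the lag index, of a windowed instantaneous autocorrelation. The sketch assumes NumPy, a Hann lag window, an analytic input signal (obtained here with an FFT-based helper), and a synthetic 45 Hz tone sampled at 360 Hz; the function names and parameter values are illustrative and not taken from the cited works.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal obtained by zeroing the negative-frequency DFT bins."""
    n = len(x)
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1:n // 2] = 2.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * gain)

def pseudo_wvd(z, half_lag=31, nfft=64):
    """Discrete pseudo-WVD of an analytic signal z (requires nfft >= 2 * half_lag + 1).

    Row n is the DFT over the lag index m of h[m] * z[n + m] * conj(z[n - m]).
    Because the effective lag is 2m samples, bin k maps to k * fs / (2 * nfft) Hz.
    """
    n_samples = len(z)
    lags = np.arange(-half_lag, half_lag + 1)
    window = np.hanning(len(lags))                 # lag window h
    tfr = np.zeros((n_samples, nfft))
    for n in range(n_samples):
        # Keep only the lags for which both n + m and n - m fall inside the record.
        valid = (n + lags >= 0) & (n + lags < n_samples) \
            & (n - lags >= 0) & (n - lags < n_samples)
        m = lags[valid]
        kernel = np.zeros(nfft, dtype=complex)
        kernel[m % nfft] = window[valid] * z[n + m] * np.conj(z[n - m])
        tfr[n] = np.fft.fft(kernel).real           # kernel is Hermitian, so the DFT is real
    return tfr

fs = 360                                           # assumed sampling frequency in Hz
t = np.arange(fs) / fs
x = np.cos(2 * np.pi * 45 * t)                     # synthetic 45 Hz tone

nfft = 64
tfr = pseudo_wvd(analytic_signal(x), half_lag=31, nfft=nfft)
freq_axis = np.arange(nfft) * fs / (2 * nfft)      # frequency of each bin in Hz
peak_frequency = freq_axis[np.argmax(tfr[len(x) // 2])]
```

For this single-component signal, the distribution at the middle of the record peaks at 45 Hz; with several components, cross-terms would additionally appear between them, as discussed above.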

2.6. Empirical Mode Decomposition

Empirical mode decomposition (EMD) is a local and data-driven adaptive method that is mainly applied to nonlinear and nonstationary signals. EMD splits a signal into many monocomponent functions called intrinsic mode functions (IMFs) [23]. The IMF holds a relationship between phase and frequency. An IMF must satisfy two conditions: (1) the number of zero crossings and the number of extrema must be equal or differ at most by one, and (2) the mean of the envelope defined by the local maxima (peaks of a wave) and the envelope defined by the local minima (valleys) is zero. In other words, the IMF represents only the simple oscillatory modes present in a signal. However, it does not ensure a perfect instantaneous frequency in all conditions. In [24], Peng et al. (2005) proposed an algorithm to extract the IMFs of a signal.

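After the decomposition process, the original signal is characterized as the combination of the extracted IMFs $c_i(t)$ and the residue $r_N(t)$. Mathematically, it can be represented using the following equation (13):

$$x(t) = \sum_{i=1}^{N} c_i(t) + r_N(t) \tag{13}$$

where $N$ is the number of extracted IMFs.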

Figure 9 represents a sample ECG signal and the set of extracted IMFs and residues (Figure 9(b)). The figure also illustrates the instantaneous frequencies (Figure 9(c)). It can be observed from the figure that the lower-order IMFs capture the fast oscillatory modes. On the contrary, the higher-order IMFs capture the slow oscillatory modes. The limitation of the traditional EMD method is mode mixing in the case of signals with closely spaced frequencies [25].
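
The sifting idea behind EMD can be sketched as follows. This is a deliberately simplified illustration, not the algorithm of [24]: it assumes NumPy, uses linearly interpolated envelopes (practical implementations typically use cubic splines), applies a fixed number of sifting iterations instead of a convergence criterion, and is run on a synthetic two-tone signal. All function names and parameter values are illustrative.

```python
import numpy as np

def local_extrema(x):
    """Indices of the interior local maxima and minima of a 1-D array."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima

def sift_once(x):
    """Subtract the mean of the upper and lower envelopes from x."""
    n = np.arange(len(x))
    last = len(x) - 1
    maxima, minima = local_extrema(x)
    # Simplification: linear envelopes, pinned to the first and last samples.
    upper = np.interp(n, np.r_[0, maxima, last], np.r_[x[0], x[maxima], x[-1]])
    lower = np.interp(n, np.r_[0, minima, last], np.r_[x[0], x[minima], x[-1]])
    return x - (upper + lower) / 2.0

def emd(x, max_imfs=5, n_sift=10):
    """Extract up to max_imfs IMF candidates; return them together with the residue."""
    imfs = []
    residue = np.asarray(x, dtype=float).copy()
    for _ in range(max_imfs):
        maxima, minima = local_extrema(residue)
        if len(maxima) + len(minima) < 3:      # too few extrema left to sift
            break
        candidate = residue.copy()
        for _ in range(n_sift):                # fixed number of sifting iterations
            candidate = sift_once(candidate)
        imfs.append(candidate)
        residue = residue - candidate
    return np.array(imfs), residue

fs = 360                                       # assumed sampling frequency in Hz
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)   # synthetic signal

imfs, residue = emd(x)
reconstructed = imfs.sum(axis=0) + residue     # sum of the IMFs plus the residue
```

By construction, the extracted components and the residue add up to the original signal, which is the property expressed by the decomposition formula.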

2.7. Hilbert Huang Transform

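The Hilbert Huang Transform (HHT) is an extension of EMD. It is the application of the Hilbert transform (HT) to the extracted IMFs. After finding all the IMFs from the original signal, the HT is applied to each IMF to obtain its analytic signal. Mathematically, it is represented using the following equation (14):

$$z_i(t) = c_i(t) + j\,\mathcal{H}[c_i(t)] = a_i(t)\, e^{j\theta_i(t)} \tag{14}$$

where $z_i(t)$ is the analytic signal obtained using the Hilbert transform $\mathcal{H}[\cdot]$ of the $i$-th IMF $c_i(t)$, $a_i(t)$ is its instantaneous amplitude, and $\theta_i(t)$ is its instantaneous phase.

Expressing each IMF as the real part of its analytic signal in equation (14) and neglecting the value of the residue, the original signal becomes

$$x(t) = \mathrm{Re}\left\{ \sum_{i=1}^{N} a_i(t)\, e^{j \int \omega_i(t)\, dt} \right\} \tag{15}$$

where $\omega_i(t) = d\theta_i(t)/dt$ is the instantaneous frequency of the $i$-th IMF and $N$ is the number of IMFs.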

At the output, the HHT produces an orthogonal pair for each IMF that is phase-shifted by 90°. In addition to the orthogonal pair, the HHT yields the instantaneous variation in the magnitude and frequency of each IMF over time. Hence, the HHT can be a helpful method when analyzing nonlinear and nonstationary time-series data.
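
The Hilbert step applied to a single IMF can be sketched as follows. The example assumes NumPy, computes the analytic signal with an FFT-based approach, and uses a synthetic 10 Hz cosine as a stand-in for an IMF; the function names and the 360 Hz sampling frequency are illustrative assumptions.

```python
import numpy as np

def analytic_signal(c):
    """z(t) = c(t) + j H[c(t)], computed by zeroing the negative-frequency DFT bins."""
    n = len(c)
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1:n // 2] = 2.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(c) * gain)

def instantaneous_amplitude_frequency(c, fs):
    """Instantaneous amplitude and frequency (Hz) of one IMF-like component."""
    z = analytic_signal(c)
    amplitude = np.abs(z)                              # a(t)
    phase = np.unwrap(np.angle(z))                     # theta(t)
    frequency = np.diff(phase) * fs / (2.0 * np.pi)    # d(theta)/dt converted to Hz
    return amplitude, frequency

fs = 360                                    # assumed sampling frequency in Hz
t = np.arange(fs) / fs
imf_like = np.cos(2 * np.pi * 10 * t)       # synthetic stand-in for one IMF

amplitude, frequency = instantaneous_amplitude_frequency(imf_like, fs)
quadrature = analytic_signal(imf_like).imag # the 90-degree phase-shifted partner
```

For this constant-amplitude, constant-frequency component, the instantaneous amplitude is 1 and the instantaneous frequency is 10 Hz throughout, and the quadrature component is the corresponding sine.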

2.8. Some Modified Joint Time-Frequency Methods

The aforementioned joint time-frequency methods form the basis of many advanced methods that have been proposed in recent years. These advanced methods try to eliminate the limitations associated with the original techniques and have therefore gained much attention in many signal-processing applications. It was long generally assumed that the Fourier transform is applicable only to the spectral analysis of stationary signals. However, a modified Fourier transform method was recently developed for application to nonlinear and nonstationary signals. This method is called the Fourier decomposition method (FDM) and has been employed as a time-frequency analysis tool [26]. Several modified wavelet analysis methods, including least-squares wavelet analysis (LSWA) and least-squares cross wavelet analysis (LSCWA), have also been proposed [27]. Numerous variations of the wavelet transformation methods have been reported recently. These include the tunable Q-factor wavelet transform (TQWT) [28], stationary wavelet transform (SWT) [29], empirical wavelet transform (EWT) [30], and dual-tree complex wavelet transform (DTCWT) [31]. The advantage of the TQWT is that it does not require adjusting the wavelet basis function and can easily be tuned according to the signal [32]. The SWT shows the local time-frequency characteristics of a signal and has multiresolution analysis capability [33]. The EWT is an adaptive wavelet method that uses a wavelet subdivision scheme. The method segments a signal’s spectrum and perfectly reconstructs the input signal [34]. The DTCWT shows several advantages compared with the DWT. These include approximate shift-invariance, directional selectivity, and perfect reconstruction of the original signal [34]. Also, compared to other numerical methods, the DTCWT is faster and more effective.

The empirical mode decomposition (EMD) method has also received several improvements in the last decade and has formed the base for a number of decomposition methods [35], which include variational mode decomposition (VMD) [36], complex variational mode decomposition (CVMD) [37], local mean decomposition (LMD) [38], ensemble empirical mode decomposition (EEMD) [39], multidimensional EEMD [40], complex EMD (CEMD) [41], complete EEMD with adaptive noise (CEEMDAN) [42], and multivariate empirical mode decomposition (MEMD) [43]. VMD is an adaptive method in which the signal is decomposed into a number of band-limited IMFs. The main advantage of VMD over EMD is that it eliminates the effect of mode mixing during the decomposition process [44]. The LMD method produces a set of product functions after the decomposition process. Here, the time-frequency distribution of the original signal can be acquired from the instantaneous amplitude and frequency of the product functions [45]. The EEMD and CEEMDAN methods also address the mode-mixing issue of the EMD method by performing the decomposition over an ensemble of copies of the signal with added Gaussian white noise [46].

Modifications of the Wigner-Ville distribution function resulted in the pseudo-Wigner-Ville distribution (PWVD) [47] and the smoothed pseudo-Wigner-Ville distribution (SPWVD) [48]. The HHT, as mentioned above, is also an advanced form of EMD, in which Hilbert spectral analysis is applied to each IMF. The following section reports the application of the aforementioned time-frequency methods in various ECG signal processing studies.

3. Applications in ECG Signal Analysis

The advancement in ECG signal processing methods has diversified its applications, both biological and nonbiological. Including various joint time-frequency methods in ECG processing has made the process efficient to a significant extent. The biological applications may include, but are not limited to, detecting abnormalities in heart rhythm, the onset of a seizure, sleep apnea, and so on. On the other hand, the nonbiological applications may consist of emotion detection, biometric identification, drug and alcohol detection, the removal of noise from the ECG signal, and so on. This section contains some of the most notable applications of joint time-frequency methods in ECG analysis.

3.1. Noise Removal

The acquisition of the clinical ECG signal is a noninvasive procedure in which the biopotential signals obtained with surface electrodes placed over the skin are amplified using high-gain amplifiers. A conducting gel is also applied between the skin and electrode surfaces to reduce the skin-contact impedance and maintain proper conductivity. During acquisition, the ECG signal may get contaminated with different types of noise. The primary noise sources in an ECG signal are power line interference, electrode instability due to improper adherence of the surface electrodes to the skin surface, and muscle activity. These noises are correlated with the original signal and have a similar temporal distribution. However, they differ in intensity level. The noise occupies a variety of frequency bands, where the low-, medium-, and high-frequency bands correspond to baseline wander (BW), power line interference, and electromyographic (EMG) noise, respectively.

3.1.1. Baseline Wander

The BW noise is prominent in the ECG signal at frequencies below 1 Hz. Several factors may lead to this noise, including changes in electrode-skin polarization voltage, respiration, motion artifacts, and electrode and cable movement. The peak amplitude and duration may vary according to electrode properties, skin contact impedance, electrolytes used, and electrode movement. This noise causes a shift in the isoelectric line during recording; hence the name BW. The baseline drift is usually seen at a very low frequency of 0.014 Hz in ECG recordings.

3.1.2. Powerline Noise

The power line noise is mainly associated with the signal-carrying cables of the device. These cables are prone to electromagnetic interference at 50 Hz or 60 Hz. The two allied mechanisms that aid in powerline interference are capacitive and inductive coupling. However, in the case of the ECG, inductive coupling is more significant.

3.1.3. EMG Noise

The ECG data are acquired using surface electrodes placed over the human skin. It is important to note that various muscles underlie the human skin tissue. The contraction and relaxation of these muscles corrupt the ECG signals with EMG signals from the underlying muscle tissues. The EMG noise is more pronounced in the case of differently abled persons, children, and persons with tremor issues.

3.1.4. Electrode Contact Noise

As mentioned above, a conductive gel is usually applied to the skin surface before the electrode placement. It acts as a conductive medium and ensures good conductivity between the skin surface and the measuring electrode. Electrode contact noise occurs when there is a change in the contact position of the electrodes on the skin. The loosening of the electrode contact may also contribute to the noise. Additionally, poor conductivity between the electrode and the skin surface decreases the signal amplitude and reduces the signal-to-noise ratio (SNR), which increases the probability of disturbance. Maintaining the skin contact impedance as low as possible is advisable to ensure better conductivity between the skin surface and the measuring electrode.

The noise components in the signal contribute to its wrong interpretation, faulty observation, and inefficient feature extraction. Hence, removing the contaminants from the signal is crucial before further processing. Initially, moving average filters were used for this purpose, but they lost a lot of information due to averaging [49]. Various digital and adaptive filters were reported for removing baseline wander and motion artifacts [50]. However, determining the correct filter parameters is a difficult task. Moreover, these methods primarily focus on a single noise source. Time-frequency methods became popular as they can help remove multiple noises simultaneously. Various time-frequency methods, including wavelet transforms [51], EMD [52], WPD [53], and their variants, have been used in the literature for noise reduction. The conventional denoising steps include decomposing the signal, identifying the decomposed components in which most of the noise is contained, filtering this noise, and reconstructing the signal. Figure 10 represents the basic steps involved in ECG denoising. Table 2 contains a comprehensive list of published papers that employed time-frequency-based methods to denoise the ECG signals in recent years.
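
The conventional denoising steps can be sketched with a wavelet-thresholding example. The sketch makes several assumptions that are not drawn from the papers listed in Table 2: it uses NumPy, a Haar DWT with four levels, a noise estimate based on the median absolute value of the finest detailed coefficients, the universal threshold with soft thresholding, and a slowly varying synthetic signal with added Gaussian noise instead of a real ECG recording. A real ECG would need a more careful choice of wavelet and threshold so that the sharp QRS complexes are preserved.

```python
import numpy as np

def haar_step(x):
    """One decimated Haar analysis step (approximation, detailed coefficients)."""
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

def haar_inverse_step(approx, detail):
    """Inverse of haar_step."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x

def soft_threshold(coeffs, threshold):
    """Shrink coefficients toward zero; those below the threshold become zero."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - threshold, 0.0)

def wavelet_denoise(x, levels=4):
    # Step 1: decompose the signal.
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    # Step 2: locate the noise. The finest detailed coefficients are assumed to be
    # noise-dominated, so they provide the noise estimate (median absolute deviation).
    sigma = np.median(np.abs(details[0])) / 0.6745
    threshold = sigma * np.sqrt(2.0 * np.log(len(x)))    # universal threshold
    # Step 3: filter the noise by shrinking the detailed coefficients.
    details = [soft_threshold(d, threshold) for d in details]
    # Step 4: reconstruct the signal.
    for detail in reversed(details):
        approx = haar_inverse_step(approx, detail)
    return approx

rng = np.random.default_rng(0)
n = 1024                                    # length divisible by 2 ** levels
t = np.arange(n) / n
clean = np.sin(2 * np.pi * 2 * t)           # synthetic slowly varying signal
noisy = clean + 0.3 * rng.standard_normal(n)

denoised = wavelet_denoise(noisy, levels=4)
mse_noisy = np.mean((noisy - clean) ** 2)
mse_denoised = np.mean((denoised - clean) ** 2)
```

Comparing `mse_denoised` with `mse_noisy` gives a simple measure of how much of the added noise was removed for this test signal.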

3.2. Arrhythmia Detection

Cardiovascular disease (CVD) is one of the prime causes of human death. As per reports, it contributed to 31% of worldwide deaths in 2016. Out of these, 85% were due to a heart attack. Timely and early detection of the onset of the disease can help in reducing these statistics. Arrhythmia, also known as heart rhythm disorder, is a common manifestation of CVD. It happens when there is an anomaly in the electrical conduction pattern of the heart. Though there are several forms of arrhythmia, namely, sinus node arrhythmia, atrial arrhythmia, junctional arrhythmia, and atrioventricular block [77], atrial fibrillation (AF) is the most common. Usually, the irregular heartbeat does not show any harmful symptoms until it reaches an advanced stage, leading to a stroke, congestive heart failure, long-term or short-term paralysis, and sometimes even death. Thus, early detection of the progression of AF is crucial. The conventional way of diagnosing CVD is through a patient’s medical history and clinical tests. However, this method requires highly heterogeneous data and a medical expert for accurate prediction and interpretation, making the process inefficient. Also, the problem is more significant in places with a shortage of proper medical facilities. Therefore, for decades, researchers have been opting for machine-based automatic systems that use physiological signals (ECG) for monitoring and diagnosis. Most of these diagnostic procedures follow a standard method, including ECG signal acquisition, decomposition, feature extraction, and classification for arrhythmia. The current section addresses different time-frequency-based methods in arrhythmia detection and their present status. Although several time-frequency methods have been employed for arrhythmia detection, wavelet-based methods have been widely explored in recent years. The discrete wavelet transform (DWT) is the most prevalent due to its easy implementation. Figure 11 represents the block diagram of a DWT-based beat classification method, as followed by Rizwan et al. (2022) [78]. Besides the DWT, other methods, such as WPD and the CWT, have also been employed. The CWT method is not widely used as the inverse CWT is not available in many standard toolboxes (MATLAB, Python, etc.) due to its high computational cost [79]. However, in many studies, the DWT and CWT were combined to improve classification accuracy. WPD, on the other hand, resulted in a larger feature set compared to the DWT method and showed potential in classifying arrhythmia. However, it is associated with high computational complexity. Some other time-frequency methods and their variants that have also been recently explored include EMD, HHT, WVD, and STFT. The STFT has been combined with deep neural networks such as recurrent neural networks (RNN) and convolutional neural networks (CNN) to obtain efficient results. Table 3 lists some of the recently published articles and discusses the time-frequency methods used, the features computed, and the classification method followed for automatic cardiac arrhythmia detection.

3.3. Sleep Apnea Detection

Good-quality sleep is crucial for leading a healthy life. Sleep apnea is the most common pathological condition that affects sleep quality [118]. It arises due to repetitive airflow obstruction and causes disturbed breathing during sleep [119]. As per a recent report, around 1 billion people across the globe are affected by sleep apnea [120]. In particular, 936 million people aged between 30 and 69 have mild-to-severe obstructive sleep apnea (OSA), whereas 425 million have moderate-to-severe OSA. It has been reported that sleep apnea raises the cardiac disease risk by three times, the accident rate by seven times, and the stroke risk by four times. If left untreated, OSA can cause severe cardiovascular and neurocognitive problems in its later stages. Hence, early and timely detection of the disease is crucial. The conventional way of measuring sleep apnea is by performing polysomnography, in which the patient is asked to sleep after several electrodes and sensors have been attached for the measurement. The test is performed in a controlled environment. However, the procedure is highly uncomfortable for the patient and may degrade sleep quality. Also, a dedicated person is required who can continuously monitor various physiological signals associated with brain activity, eye movement, muscle activity, etc. The process is time-consuming and expensive [121]. Accordingly, there is a need for a simple, low-cost, and automated method for its detection.

In recent years, researchers have used various physiological signals to detect OSA. However, the ECG signal is the most widely used physiological signal for this purpose. This is because the acquisition of the ECG signal requires only a single-lead recording, which makes the measurement process simpler than other methods. Figure 12 describes the basic steps involved in sleep apnea detection. The current section discusses the application of different time-frequency analysis methods to ECG signals to detect OSA. Hassan et al. (2015) used a single-lead ECG signal to classify OSA in their research. They employed EMD, higher-order statistical features, and an extreme learning machine (ELM) for classification purposes. The authors reported a maximum accuracy of 83.77%. In [123], the authors used an eight-level wavelet packet analysis method on a short-duration (5 s) ECG signal to differentiate between central sleep apnea (CSA) and obstructive sleep apnea (OSA). CSA occurs when the brain is unable to send proper signals to the muscles associated with breathing. It is different from OSA, where normal breathing is hindered due to upper airway obstruction. In a similar study [124], the authors used wavelet-based ECG features to differentiate CSA and OSA using an auto-regressive ANN classifier. They achieved a classification accuracy of 78.3%. Several other time-frequency methods, including the DWT and HHT, have also been used to classify sleep apnea. Table 4 summarizes some recently published articles in the field that use time-frequency methods during ECG processing.

3.4. Biometric Identification

Identification technologies are crucial in safety, security, and information protection [138]. The earlier approaches, including security keys, passwords, and certificates, are no longer secure as there is a high chance that they may be stolen or forgotten. Hence, biometric identification technology, which exploits anatomical and physiological differences between individuals, has emerged as an efficient alternative [138, 139]. Typical biometric examples include fingerprints, iris, and face IDs [140]. Even though these methods are widely used, they are not fully reliable, as they can be forged. Recently, it has been found that the ECG signal can be used as a biometric as it is universal, stable, and easily measurable [141]. Moreover, the ECG of an individual depends upon body shape, gender, age, emotional state, and the heart's physiological status, which makes the ECG a unique signal. In general, visually differentiating the ECG signals of two individuals is very challenging due to the subtle differences in amplitude and duration. Hence, pattern recognition methods have been employed for easy, quick, and reliable identification. The ECG signals used for biometric authentication are one-channel, two-channel, three-channel, or 12-channel recordings. Among these, the single-lead ECG is the most common due to its simplicity. However, it is unclear whether simplicity leads to better performance; hence, some studies have also used 12-lead ECG data.

The ECG biometric identification process follows three crucial steps: preprocessing, feature extraction, and classification. In [142, 143], the authors showed that ECG exhibits a unique and discriminatory pattern and can be categorized according to the classifier employed. However, it is essential to note that the performance of a classifier relies on feature extraction methods [144, 145], where the raw ECG signal is used to extract informative features. In general, the features extracted for the biometric methods can be divided into two broad categories: fiducial and nonfiducial [144]. The fiducial method uses the characteristics of the ECG waves, such as different peaks, waves, and intervals, whereas the nonfiducial method does not use these characteristics.
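The fiducial category described above can be illustrated with a minimal sketch that locates R peaks and derives RR intervals from them. It is not the method of any cited study: the function names `detect_r_peaks` and `rr_intervals`, the fixed amplitude threshold, the 0.2 s refractory period, the 250 Hz sampling rate, and the synthetic spike train standing in for an ECG are all assumptions made for illustration.

```python
def detect_r_peaks(signal, fs, threshold, refractory_s=0.2):
    """Return sample indices of local maxima above `threshold`.

    A peak is accepted only if it lies at least `refractory_s` seconds
    after the previously accepted peak (hypothetical, fixed settings).
    """
    min_gap = int(refractory_s * fs)
    peaks = []
    for i in range(1, len(signal) - 1):
        is_local_max = signal[i - 1] < signal[i] >= signal[i + 1]
        if is_local_max and signal[i] > threshold:
            if not peaks or i - peaks[-1] >= min_gap:
                peaks.append(i)
    return peaks


def rr_intervals(peaks, fs):
    """Convert consecutive R-peak indices into RR intervals in seconds."""
    return [(b - a) / fs for a, b in zip(peaks, peaks[1:])]


# Synthetic stand-in for an ECG: flat baseline with one sharp spike per beat.
fs = 250                      # sampling rate in Hz (assumed)
beat_period = 200             # samples between beats, i.e. 0.8 s at 250 Hz
ecg = [0.0] * 1000
for start in range(100, 1000, beat_period):
    ecg[start] = 1.0          # crude "R wave"

peaks = detect_r_peaks(ecg, fs, threshold=0.5)
rr = rr_intervals(peaks, fs)
print(peaks)
print(rr)
```

A real recording would need filtering and an adaptive threshold before such a detector becomes usable. This dependence on reliable reference-point detection is what the nonfiducial methods avoid.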

Several feature extraction methods have been explored in the past. Since there is no generalized rule for determining the significant boundaries of the waves that would support efficient biometric identification [146], the nonfiducial-based method is preferable. The reason is that no reference detection is needed in this method [147]. Some examples of the most widely used nonfiducial methods include autocorrelation coefficients [148], wavelet coefficients [149], principal components [150], and time-frequency decomposition methods [151]. This section discusses the application of time-frequency decomposition methods in biometric analysis. Table 5 lists recent publications that used different time-frequency decomposition methods for biometric identification. It is evident from the table that empirical mode decomposition (EMD) and discrete wavelet transform (DWT) have recently been the two most widely used methods. Some researchers have also followed hybrid methods that combine two different time-frequency features or multiple features, including nonfiducial and fiducial features. The time-frequency features have been used with several classifiers, such as CNN, SVM, LDA, and DT. In most cases, the CNN model performed better than the other classifiers. A possible reason is that most deep learning models generate their own representative features during training.
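As an illustration of a nonfiducial feature, the sketch below computes normalized autocorrelation coefficients of a signal window without locating any wave boundary. The function name `autocorrelation_coefficients`, the window length, the maximum lag, and the sinusoidal test signal are assumptions chosen for the example and do not reproduce the procedure in [148].

```python
import math


def autocorrelation_coefficients(window, max_lag):
    """Normalized autocorrelation of `window` for lags 0..max_lag.

    The mean is removed first, and every value is divided by the lag-0
    term, so the first coefficient is always 1.0.
    """
    n = len(window)
    mean = sum(window) / n
    centered = [x - mean for x in window]
    energy = sum(x * x for x in centered)
    if energy == 0:
        raise ValueError("window is constant; autocorrelation is undefined")
    coeffs = []
    for lag in range(max_lag + 1):
        acc = sum(centered[i] * centered[i + lag] for i in range(n - lag))
        coeffs.append(acc / energy)
    return coeffs


# Periodic test signal (period of 50 samples) standing in for a quasi-periodic ECG.
window = [math.sin(2 * math.pi * i / 50) for i in range(500)]
coeffs = autocorrelation_coefficients(window, max_lag=60)

# Coefficients are high near multiples of the period and negative at half a period.
print(round(coeffs[50], 3), round(coeffs[25], 3))
```

The resulting coefficient vector, or a reduced version of it, could then serve as the input to a classifier. No R-peak or wave-boundary detection is involved at any step.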

3.5. Other Applications
3.5.1. Emotion Detection

Emotion is a consistent and discrete response to external or internal events. The human emotional state can be defined using eight basic emotions, including pleasure, sadness, anger, joy, curiosity, fear, and surprise. All other emotions can be regarded as mixtures of these primary emotional states. It has been reported in the literature that physiological signals are strongly affected by emotion. Hence, they can be used to detect and classify emotional states. Several studies have used the ECG signal to detect emotional changes [162–165]. Dissanayake et al. (2019) [166] used three ECG signal-based techniques and the EMD method to recognize four primary human emotions: anger, joy, sadness, and pleasure. They achieved an accuracy gain of 6.8% compared to the other methods. Another study employed a wavelet-based approach to obtain features at different time scales [167]. The proposed method showed an accuracy of 88.8% in detecting the valence state and 90.2% in detecting the arousal state. Chettupuzhakkaran and Sindhu (2018) performed a comparative analysis of different time-frequency methods to detect happy and sad emotions. The authors reported a higher accuracy with DWT than with the other methods (EMD, HHT, etc.) [168]. Wavelet transform and second-order difference plots were used in [169] to differentiate two emotional states, rest and fear, with a maximum accuracy of 80.24% using an SVM classifier.

3.5.2. Epileptic Seizures Detection

A seizure can be described as an abrupt electrical disturbance in brain activity that leads to a change in behavior, movement, and level of consciousness. The onset of a seizure also affects autonomic nervous system activity. The literature reports significant differences in physiological signals such as the ECG and EEG during a seizure episode. The EEG signal has been used as a potential biomarker for seizure detection. However, significant ECG morphological changes have also been observed during a seizure episode. A shortened QT interval, ST-segment elevation, and T-wave inversion are typical changes in the ECG morphology [170, 171]. Nevertheless, only a few studies in the literature use the ECG signal alone for seizure detection. Most papers have extracted the time-frequency features from the EEG signal or from the ECG and EEG signals together [172–175]. In a recent study [176], however, Yang et al. found that the ECG signal was more efficient than the EEG signal in seizure detection. The authors generated spectrographic images of a short-duration ECG signal using the short-time Fourier transform (STFT). The images were used as the input to a CNN model for automatic seizure detection. Still, more research on ECG-based features for epilepsy detection is needed in the future.

3.5.3. Driver Distraction Detection

Distracted driving is a severe concern for the safety of passengers and drivers. The three primary causes of distraction are taking the eyes off the road, taking the hands off the steering wheel, and a disturbed mind while driving. The secondary causes may include conversations on the phone and active conversations with a passenger. Though social awareness and stricter government rules have reduced the accident rate, these steps are insufficient. Hence, there is a need for real-time driver distraction detection. The ECG signal has shown potential for real-time monitoring due to its properties: higher SNR, minimal implementation, ease of wearing, and simple recording technology. Moreover, it does not show any latency issues compared to the camera-based detection system. The most crucial step in real-time ECG monitoring is the selection of features. Several time-frequency analysis methods have been reported in this regard. In [177], the authors used the ECG subbands obtained after decomposition using WPD. A set of WPD coefficients was selected, and three essential features, namely, power, mean, and standard deviation, were extracted from each coefficient. In the study, PCA was used as a dimensionality reduction method. The final feature set was used to classify driver distraction using LDA and quadratic discriminant analysis (QDA) classifiers. In a similar study [178], the wavelet packet transform was used to detect distraction during a phone call or conversation with a passenger. Dehzangi et al. (2018) employed fused features extracted from the ECG signal [179]. These included HRV parameters, spectro-temporal parameters, and power spectral density parameters. STFT was used for the spectro-temporal analysis. The optimal set of features was chosen using a feature selection method and various classifiers. The maximum detection accuracy of driver distraction was 99.8%.
Many studies have combined the ECG signal with other physiological signals such as EEG [180], EMG [181], and EOG.
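The WPD-based feature extraction reported in [177], in which power, mean, and standard deviation are computed from each set of coefficients, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the Haar wavelet, the three decomposition levels, the synthetic two-tone signal, and the helper names `haar_step`, `wavelet_packet`, and `subband_features` are all hypothetical, and the subsequent PCA and LDA/QDA stages are not shown.

```python
import math


def haar_step(x):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return approx, detail


def wavelet_packet(x, levels):
    """Full wavelet packet tree: split every subband at every level.

    Returns the 2**levels terminal subbands in natural (not frequency)
    order. len(x) should be divisible by 2**levels so no sample is dropped.
    """
    bands = [list(x)]
    for _ in range(levels):
        next_bands = []
        for band in bands:
            approx, detail = haar_step(band)
            next_bands.extend([approx, detail])
        bands = next_bands
    return bands


def subband_features(band):
    """Power (mean squared value), mean, and standard deviation of a subband."""
    n = len(band)
    mean = sum(band) / n
    power = sum(c * c for c in band) / n
    std = math.sqrt(sum((c - mean) ** 2 for c in band) / n)
    return power, mean, std


# Synthetic two-tone signal standing in for an ECG segment (64 samples).
signal = [math.sin(2 * math.pi * 5 * i / 64) + 0.5 * math.sin(2 * math.pi * 20 * i / 64)
          for i in range(64)]
bands = wavelet_packet(signal, levels=3)
feature_vector = [f for band in bands for f in subband_features(band)]
print(len(bands), len(feature_vector))  # 8 subbands, 24 features
```

Unlike the DWT, which splits only the approximation branch, the wavelet packet tree also splits every detail branch, which is why three levels yield eight subbands here. Because the Haar step is orthonormal, the total energy of the coefficients equals that of the input segment.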

3.5.4. Drug and Alcohol Detection

Early and timely drug overdose detection is crucial to maintaining health and avoiding major health problems. As per reports, nearly half of the emergency ward cases in the United States are due to drug-related overdose. It has been reported that most drugs influence cardiac functioning. In many cases, a drug overdose may later lead to adverse cardiovascular events. Hence, changes in the ECG signal can be a good indicator of drug overdose and can be used for its detection. Early research suggests changes in the ECG signal's morphological parameters after the consumption of various drugs (e.g., benzodiazepines, acetaminophen, and opioids). Manini et al. (2017) evaluated the effects of an acute drug overdose on the electrophysiological parameters. A prominent R peak and QT dispersion were detected after the drug overdose [182]. In a recent study [183], QT interval prolongation was observed due to an overdose of hydroxychloroquine in COVID patients. Similar findings have been reported for other drugs as well. These include antidysrhythmics (sotalol), antidepressants (escitalopram, bupropion, citalopram, trazodone, and so on), antipsychotics (haloperidol, quetiapine), sodium channel blockers (amitriptyline, doxepin, imipramine, diphenhydramine, nortriptyline, and so on), and the antiemetic serotonin antagonist ondansetron [184]. Apart from drugs, alcohol also showed a similar effect on the heart [185]. Recently, a few researchers have attempted to use ECG signals for automatic drug detection. Pradhan and Pal (2020) reported that it is possible to use time-domain statistical and entropy-based features extracted from the ECG signal to automatically detect the presence of a psychoactive drug, "caffeine," in the body [186]. In a recent study [187], the authors employed three different time-frequency methods, EMD, DWT, and WPD, to automatically detect the short-term effect of caffeinated coffee on the ECG signal.
The application of ECG signals to assess the impact of drugs and alcohol is new, and hence, only a limited number of studies are available in the literature. Joint time-frequency methods have not yet been explored sufficiently in this area and may be investigated extensively in future research.

4. Limitations, Challenges, and Suggestions for Future Research

The main limitation of the STFT method is that it does not offer optimal time-frequency precision. Another disadvantage of the STFT method is that it is used primarily for short-duration ECG signal processing. However, short recordings are preferred during critical heart surgery to initiate the treatment process instead of investigating the longer-duration ECG signals [188]. In such cases, STFT-based signal processing has been proposed with definite success. Also, the STFT method is associated with varying spectral leakage, depending on the window function applied. Another critical parameter while using the STFT method is the window size. A narrow time window gives a good time resolution but degrades the frequency resolution. Likewise, a broader window offers a good frequency resolution but a poor time resolution. Hence, many researchers employ more suitable techniques, such as the wavelet transform methods (CWT, DWT, WPD, and so on). The wavelet transform can eliminate the problem of the fixed window size by using a varying window length, thereby improving the time-frequency resolution [189]. However, it is unable to capture the edges of the signal adequately. Also, a trade-off exists between WT's accuracy and computational complexity. Choosing a suitable mother wavelet in the WT is crucial, as the accuracy of a classification task is also affected by this choice.
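The window-size trade-off of the STFT described above can be made concrete with a small sketch. It computes the STFT of a pure tone with a direct DFT and a Hann window for a short and a long window, and reports the time span covered by one window together with the spacing of the frequency bins. The sampling rate of 128 Hz, the 8 Hz test tone, the window lengths of 32 and 128 samples, and the function names `hann`, `stft_magnitude`, and `resolution` are assumptions chosen for illustration.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]


def stft_magnitude(x, window_len, hop):
    """Magnitude STFT via a direct DFT of each Hann-windowed frame.

    Returns a list of frames; each frame holds window_len // 2 + 1
    magnitudes for the non-negative frequency bins.
    """
    w = hann(window_len)
    frames = []
    for start in range(0, len(x) - window_len + 1, hop):
        seg = [x[start + k] * w[k] for k in range(window_len)]
        mags = []
        for m in range(window_len // 2 + 1):
            acc = sum(seg[k] * cmath.exp(-2j * math.pi * m * k / window_len)
                      for k in range(window_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


def resolution(fs, window_len):
    """Return (time span of one window in s, frequency bin spacing in Hz)."""
    return window_len / fs, fs / window_len


fs = 128                                                   # sampling rate in Hz (assumed)
x = [math.sin(2 * math.pi * 8 * n / fs) for n in range(256)]  # 8 Hz test tone

for window_len in (32, 128):
    dt, df = resolution(fs, window_len)
    frame = stft_magnitude(x, window_len, hop=window_len // 2)[0]
    peak_hz = frame.index(max(frame)) * df
    print(f"window={window_len}: spans {dt} s, bins every {df} Hz, peak at {peak_hz} Hz")
```

With these settings, the 32-sample window localizes events to within 0.25 s but separates frequencies only in 4 Hz steps, whereas the 128-sample window resolves 1 Hz steps at the cost of averaging over a full second. The product of the two quantities is fixed, so one cannot be improved without degrading the other.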

The empirical mode decomposition (EMD) can overcome these limitations. EMD decomposes the signal into several intrinsic mode functions (IMFs) independent of the instantaneous frequency. The method delivers valuable information when little is known about the underlying dynamics. However, the technique must be applied carefully in scientific research, as it lacks a rigorous theoretical foundation and is also associated with mode mixing [189]. Some extensions to the EMD method (including EEMD and VMD) were developed to eliminate the disadvantages associated with EMD. VMD is more suitable for the analysis of nonstationary and nonlinear signals. The method shows high operational efficiency and avoids information loss.

Several studies have implemented advanced time-frequency methods for analyzing and processing other biopotential signals, such as EMG and EEG. For example, the tunable Q-wavelet transform (TQWT), combined with time-frequency features, was used to detect epileptic seizures using the EEG signal [190]. A recent study used the TQWT method to differentiate seven hand movements using the surface-EMG signal [191]. Ahmed et al. (2022) employed the LSWA method and computed the differential entropy features from each EEG segment. The calculated features were then used as input to a CNN model to detect different emotional states [192]. In a recent study, the authors used the EWT and deep learning methods to detect coronavirus disease (COVID) [193]. Despite their diverse applications, the use of these advanced time-frequency methods in ECG signal processing remains limited. Hence, they may be explored more extensively in the future.

Real-time implementation of the time-frequency methods in different ECG applications is another big challenge. Most of the available research is based on offline analysis that excludes noisy data. Many recent articles have employed physiological data to monitor epileptic seizures [194], dynamic changes in the brain [195], vigilance [196], sleep quality [197], fatigue [198], and abnormal driving [199]. These methods have primarily used either brain or muscle signals. Therefore, real-time implementations of the aforementioned time-frequency methods in ECG analysis may be explored in the future.

The current study has reviewed the application of various time-frequency decomposition methods for extracting ECG features. These features were then employed for various ECG-based applications, including arrhythmia detection, sleep apnea detection, biometric identification, noise elimination, and so on. A limitation of feature extraction is that the new features generated in the process are not always interpretable. Moreover, when the dataset is very large, conventional machine learning models do not perform satisfactorily due to the curse of dimensionality, which then necessitates feature selection methods. Deep learning models mitigate these issues as they can efficiently handle large datasets. Also, these models create their own features, identify the correlated features, and then combine them to promote fast learning without explicit instruction. Though many studies have employed deep learning models with 2D-ECG data (spectrogram, scalogram, and so on) or with the decomposed signals, the field demands more extensive analysis, which may be undertaken in the future.

5. Conclusion

The current study provides background on different time-frequency methods and their biomedical applications in ECG analysis. The study also discusses recently published articles that have used these methods in various ECG applications. Though it is hard to cover such a vast area in a single article, the present paper focuses on the current status of the field and on articles published in the last five years. The following observations can be made based on the current review. DWT has recently been the most widely used method, irrespective of the application. The EMD and its variants are more suitable for noise elimination. The 2D-image-based representations, such as spectrograms, scalograms, and frequency plots, are most widely used with deep learning models and report higher classification accuracy in arrhythmia detection. However, their use in other ECG-based applications is still limited and needs more attention. Also, the applications of some of the advanced time-frequency methods mentioned in this review demand more consideration in future research. The current review will serve as a reference and provide a comprehensive picture of the application of time-frequency methods in ECG signal analysis. Typical applications include detecting arrhythmia and sleep apnea. Nonbiological applications include biometric identification, drug and alcohol detection, driver distraction detection, emotion detection, and so on. The facts discussed in this review provide information about the current status of the time-frequency methods. The study will help future researchers to fill in the gaps and overcome the challenges in the field. The knowledge shared in this review will benefit society by enabling more advanced technologies for disease detection, diagnostic applications, and other nonbiological applications in the future.

Data Availability

No data were used to support this study.

Conflicts of Interest

The authors declare that they have no conflicts of interest.