Abstract

Cardiovascular disease is a major cause of death worldwide, and the COVID-19 pandemic has only made the situation worse. The purpose of this work is to explore various time-frequency analysis methods that can be used to classify heart sound signals and identify multiple abnormalities of the heart, such as aortic stenosis, mitral stenosis, and mitral valve prolapse. The signal has been processed using three techniques, namely the tunable quality wavelet transform (TQWT), the discrete wavelet transform (DWT), and empirical mode decomposition (EMD), to detect heart signal abnormality. The proposed model detects heart signal abnormality in two stages, at the user end and at the clinical end. At the user end, binary classification of signals is performed; if a signal is abnormal, further classification is done at the clinic. The approach starts with signal preprocessing and uses the discrete wavelet transform (DWT) coefficients to train the hybrid model, which consists of one long short-term memory (LSTM) network layer and three convolutional neural network (CNN) layers. This method produced a classification accuracy of 98.9% through the combined CNN and LSTM model. Combining the CNN's skill in feature extraction with the LSTM's capacity to capture time-dependent features improves the efficacy of the model. Identifying issues early and initiating appropriate medication can alleviate the burden associated with heart valve diseases.

1. Introduction

Cardiac arrest has become a common disease in today's world. According to the World Health Organization (WHO), cardiovascular diseases (CVDs) are responsible for 17.9 million deaths per year, making them the leading cause of mortality. Heart attacks and strokes account for more than four out of five fatalities caused by CVDs, with one-third of these deaths occurring prematurely in individuals under 70 years of age. It is crucial to identify and treat cardiovascular diseases promptly to improve patients' quality of life and well-being. This can also reduce the frequency of occurrences and prevent the onset of various complications [1]. A phonocardiogram (PCG) signal is a graphical representation of the sounds made by the opening and closing of heart valves. The signal shows how the heart sounds change over time. By monitoring these signals, doctors can detect heart abnormalities early on, which can help to reduce mortality rates [1].

Researchers have used time-frequency domain characteristics to train conventional machine-learning models for heart signal categorization. In [2], a similar technique is proposed that utilizes spectrogram phase and magnitude features to identify heart valve issues. The authors in [3] present a TQWT-based two-stage abnormality detection approach, where SVM is used in the first stage for binary classification and KNN is used in the second stage for further classification. In addition, Shannon energy [4], cochleagram features [5], and mel-frequency cepstral coefficients (MFCC) [6] are also considered as other characteristics to detect abnormalities in signals. To detect abnormalities in signals, Karhade et al. [7] present the time-frequency domain deep learning (TFDDL) framework, which combines a CNN architecture with the time-frequency domain polynomial chirplet transform. A novel approach combining the wavelet scattering transform with the twin support vector machine (TSVM) was employed in the classification process [8].

The Yaseen Khan dataset [6], Physionet Challenge 2016 [9], fetal heart sound [10], and PASCAL heart sound datasets are available for the detection of abnormality in the PCG heart sound signals.

1.1. Contribution of the Paper

The primary contribution of the paper lies in its introduction of a multistage model that combines CNN and LSTM networks for heart sound signal classification. This innovative combination enables the extraction of intricate quasi-cyclic features from the heart sound signal, ultimately leading to more effective signal classification.

1.2. Organization of the Paper

This paper explains in detail the proposed model and its evaluation process. In Section 2, the step-by-step procedure of the proposed model is described, which includes signal preprocessing, decomposition, and classification using a novel hybrid deep-learning approach of the CNN and LSTM. Section 3 presents the outcomes of our experiments, demonstrating the performance of individual components such as the CNN, LSTM, and the hybrid model in classifying heart sound signals. Section 4 includes a comparative analysis with recently available methods, taking into account the model’s performance under noisy conditions. Finally, Section 5 concludes the study comprehensively.

2. Literature Review

Various models of machine learning and deep learning (DL) have been proposed for detecting heart sound abnormalities. In recent times, the deep neural network (DNN) has become a powerful tool for identifying abnormal heart sounds due to its strong feature representation capability. Thomae and Dominik [11] created an end-to-end deep neural network that focuses on temporal or frequency features to extract hidden features in the temporal domain. Ryu et al. [12] developed a CNN model specifically designed for segmented PCG classification. Recently, Humayun et al. [13] created a time-convolutional (tConv) unit to discover hidden features from temporal properties. Chakraborty et al. [14] used the cardiac cycle spectrum for training on a 2D-CNN. Figure 1 shows the evolution of abnormality detection methods.

Mel-frequency cepstral coefficients (MFCC) and discrete wavelet transform (DWT) characteristics were introduced by Yaseen et al. [6] and combined with a support vector machine (SVM) for automated identification of heart valve diseases and other conditions. Meanwhile, Zabhi et al. [15] classified HVDs using an ensemble of 20 feedforward neural networks (FFNNs) with time, frequency, and time-frequency characteristics. Nevertheless, approaches that depend on morphological characteristics need the identification of S1 and S2 sound peaks [15, 16]. The phonocardiogram (PCG) signal contains pathological fluctuations and noise interference, which make it difficult to detect these peaks [17, 18]. Rajathi and Radhamani [19] present an integrated method based on the k-NN-ACO algorithm and use an analysis of accuracy and error rate to determine its effectiveness. Meena et al. [20] propose an empirical study of algorithms for the classification, prediction, and mining of a heart disease dataset. All available methods need further improvement in several areas. To cut computational costs, the detection algorithm's efficiency must be improved, and false alarms should be reduced. This is especially important because the user will interact with the system directly; a low-cost solution is therefore required. Table 1 shows details of similar work done by other researchers on different datasets.

3. Proposed Model

The proposed model classifies signals in two stages: an initial stage at the user end and a later stage at the clinical end, as shown in Figure 2.

The suggested method uses the TQWT to help a convolutional neural network (CNN) categorize heart sounds into five types. The approach is divided into three stages: preprocessing, denoising, and classification. Preprocessing is applied to the signals in the first stage to make sure they are of the same length and normalized. After that, these signals are decomposed into six levels: one approximation-level coefficient vector and five detail levels. Once this decomposition is complete, the output is combined into a vector that is fed into a one-dimensional CNN model. Lastly, the hybrid model is trained and then validated on the dataset.

3.1. Preprocessing

The dataset for this study encompasses five classes of distinct heart sound signals, each sampled at a frequency of 8 kHz. The following preprocessing procedures are applied to each signal.

3.1.1. Resampling

Since pathological heart sounds lie in the frequency range below 500 Hz, the signal sampling frequency is downsampled from 8 kHz to 1 kHz; by the Nyquist criterion, a 1 kHz sampling rate still captures content up to 500 Hz while reducing the data size [36].
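The downsampling step can be sketched as follows. This is a minimal illustration using a windowed-sinc anti-aliasing filter followed by integer decimation; the paper does not specify its resampling filter, and in practice a library routine such as `scipy.signal.resample_poly` would normally be used.

```python
import numpy as np

def downsample(x, factor=8, taps=101):
    """Decimate a 1-D signal by an integer factor after applying a
    windowed-sinc low-pass filter with cutoff at the new Nyquist rate."""
    n = np.arange(taps) - (taps - 1) / 2
    h = np.sinc(n / factor) / factor           # ideal low-pass, cutoff fs/(2*factor)
    h *= np.hamming(taps)                      # window to limit ripple
    h /= h.sum()                               # unity DC gain
    filtered = np.convolve(x, h, mode="same")  # anti-alias, keep original length
    return filtered[::factor]                  # keep every `factor`-th sample

# A 2-second, 8 kHz test tone at 100 Hz (well below the 500 Hz band edge)
fs_in, fs_out = 8000, 1000
t = np.arange(0, 2, 1 / fs_in)
x = np.sin(2 * np.pi * 100 * t)
y = downsample(x, factor=fs_in // fs_out)      # 16000 samples -> 2000 samples
```

The tone survives decimation essentially unchanged because it lies well inside the filter passband, while any content above 500 Hz would be attenuated before the sample-rate reduction.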

3.1.2. Scaling

Normalization is performed to overcome the effect of interclass variation in signal amplitude by suppressing amplitude variation within each signal. Normalization of signals is performed as follows:
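As a concrete sketch, amplitude normalization can be done by dividing each signal by its peak absolute value; the paper's exact normalization formula is not reproduced here, and peak normalization is one common choice for PCG preprocessing.

```python
import numpy as np

def normalize(x):
    """Scale a 1-D signal to the range [-1, 1] by its peak absolute
    amplitude (a common, but here assumed, normalization scheme)."""
    peak = np.max(np.abs(x))
    return x / peak if peak > 0 else x

x = np.array([0.5, -2.0, 1.0])
y = normalize(x)   # values become 0.25, -1.0, 0.5
```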

3.1.3. Resizing

For the given dataset, the duration of a cardiac signal ranges anywhere from 1.15 seconds to 3.99 seconds. Each sample comprises three cardiac cycles, and the duration of these signals varies owing to differences in heart rate. After identifying the beginning and end points of each recording, the signals are resized to a common length of 2800 samples. Matlab's "imresize" function, which uses bicubic interpolation, is employed to resize each sample. The effect of resizing on the original signal is shown in Figure 3.
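The resizing step amounts to resampling each recording onto a fixed grid of 2800 points. A minimal sketch using linear interpolation is shown below; the paper uses Matlab's `imresize` with bicubic interpolation, so `np.interp` here is a simpler stand-in, not the authors' exact method.

```python
import numpy as np

def resize_signal(x, target_len=2800):
    """Stretch or shrink a 1-D signal to `target_len` samples by linear
    interpolation over a normalized time axis."""
    old = np.linspace(0.0, 1.0, num=len(x))
    new = np.linspace(0.0, 1.0, num=target_len)
    return np.interp(new, old, x)

short = np.random.randn(1150)   # ~1.15 s at 1 kHz
long_ = np.random.randn(3990)   # ~3.99 s at 1 kHz
assert resize_signal(short).shape == (2800,)
assert resize_signal(long_).shape == (2800,)
```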

3.2. TQWT Denoising

In the Mallat sub-band coding algorithm, the signal is convolved with two filters, a high-pass and a low-pass filter [37]. In this technique, the signal is divided into detail and approximation coefficients, respectively. At the first level, the signal itself is convolved with the high-pass and low-pass filters. The low-pass output is then downsampled to obtain the next-level approximation coefficients, as shown in Figure 4.

The detail and approximation coefficients at a particular level are obtained as given in equations (2) and (3):

d_{j+1}[k] = Σ_n a_j[n] g[n − 2k]  (2)

a_{j+1}[k] = Σ_n a_j[n] h[n − 2k]  (3)

where a_j denotes the approximation coefficients at level j (with a_0 the input signal), and h and g are the low-pass and high-pass analysis filters, respectively. With the proposed technique, the heart sound signal is decomposed up to 18 levels, and an approximation-level coefficient vector is obtained. A tunable Q-factor wavelet transform (TQWT-) based adaptive thresholding technique [38] is used to suppress the in-band noise; the cardiac sound signal is decomposed using eighteen TQWT decomposition levels. A wavelet is a time-localized, short-duration oscillation that carries energy and information. The decomposition separates the signal into low-frequency and high-frequency components: the high-frequency sub-bands are governed by the quality factor, while the low-frequency sub-bands carry most of the diagnostic information.
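Equations (2) and (3) amount to filtering followed by dyadic downsampling. The sketch below illustrates one Mallat analysis step using Haar filters as a simple, concrete choice; the paper does not fix the wavelet family, so the filter values are illustrative.

```python
import numpy as np

def dwt_step(x, h, g):
    """One level of Mallat sub-band coding: convolve with the low-pass (h)
    and high-pass (g) analysis filters, then keep every second sample.
    Returns the approximation and detail coefficients."""
    a = np.convolve(x, h)[1::2]   # low-pass + downsample (offset aligns sample pairs)
    d = np.convolve(x, g)[1::2]   # high-pass + downsample
    return a, d

# Haar analysis filters (illustrative choice of wavelet family)
h = np.array([1.0, 1.0]) / np.sqrt(2)
g = np.array([1.0, -1.0]) / np.sqrt(2)

x = np.array([4.0, 4.0, 2.0, 2.0])
a, d = dwt_step(x, h, g)
# a holds the smoothed pair averages; d holds the pair differences,
# which are zero here because the samples within each pair are equal.
```

Iterating this step on the approximation output `a` yields the multi-level decomposition described above.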

3.3. Multiclass Classification Model

The TQWT denoising stage outputs approximation-level coefficients organized in a 1-D array of length 2942, which serves as the input for training the CNN model. The model architecture consists of 5 layers: 1 input layer, 2 CP (convolution and pooling) layers, 1 fully connected layer, and 1 output softmax layer. Each CP layer employs padding so that the convolution output size matches its input. Following TQWT decomposition, the 2942 coefficients are fed to the neurons of the input layer. Training runs for 50 epochs with nine iterations per epoch, totaling 450 iterations, with a fixed learning rate applied in each iteration. Gradient descent optimization is used to determine the optimal network parameters. The output of the LSTM layer is followed by a dropout layer, which is useful in noise-filled environments and plays a critical role in preventing overfitting. Table 2 illustrates the architecture of the proposed model.
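The effect of "same" padding in the CP layers can be illustrated in isolation. The numpy sketch below (kernel width and values are illustrative, not the paper's trained filters) shows how padding preserves the length through the convolution while the pooling step halves it:

```python
import numpy as np

def conv1d_same(x, w):
    """1-D convolution with zero ('same') padding: output length == input length."""
    pad = len(w) // 2
    xp = np.pad(x, (pad, pad))
    return np.array([np.dot(xp[i:i + len(w)], w) for i in range(len(x))])

def max_pool(x, size=2):
    """Non-overlapping max pooling; halves the length for size=2."""
    n = len(x) // size
    return x[:n * size].reshape(n, size).max(axis=1)

x = np.random.randn(2942)          # TQWT coefficient vector
w = np.random.randn(3)             # illustrative kernel of width 3
h1 = max_pool(conv1d_same(x, w))   # CP layer 1: 2942 -> 1471
h2 = max_pool(conv1d_same(h1, w))  # CP layer 2: 1471 -> 735
```

So the feature length entering the later layers is reduced only by pooling, not by the convolutions themselves, which is what the "same" padding guarantees.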

4. Results and Discussion

For the evaluation of the proposed approach, a publicly available dataset [6] of heart sound signals is used. It includes normal (N), aortic stenosis (AS), mitral stenosis (MS), mitral regurgitation (MR), and mitral valve prolapse (MVP) data samples, for a total of 1000 samples with 200 samples per category, as shown in Figure 5. Each sample has a sampling frequency of 1 kHz and a fixed length of 2800 samples. The entire dataset is divided into two disjoint sets, i.e., a training set and a testing set.

4.1. Evaluation Parameters

In this work, sensitivity (Sy), specificity (Sc), accuracy (Acc), and F-score are calculated for quality comparison, and these metrics are used to carry out the quantitative examination of performance. They are estimated as follows:

Sy = TP/(TP + FN),
Sc = TN/(TN + FP),
Acc = (TP + TN)/(TP + TN + FP + FN),
F-score = 2TP/(2TP + FP + FN),

where TP, TN, FP, and FN denote the true positive, true negative, false positive, and false negative counts, respectively.
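The four metrics above follow directly from the per-class counts of a confusion matrix; the counts used below are illustrative, not the paper's results.

```python
import numpy as np

def metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, accuracy, and F-score from the per-class
    true/false positive and negative counts of a confusion matrix."""
    sy = tp / (tp + fn)                       # sensitivity (recall)
    sc = tn / (tn + fp)                       # specificity
    acc = (tp + tn) / (tp + tn + fp + fn)     # accuracy
    pr = tp / (tp + fp)                       # precision
    f1 = 2 * pr * sy / (pr + sy)              # F-score (harmonic mean)
    return sy, sc, acc, f1

# Illustrative counts for one class of a 1000-sample evaluation
sy, sc, acc, f1 = metrics(tp=195, tn=790, fp=10, fn=5)
```

In the multiclass setting, these quantities are computed one-vs-rest for each of the five classes and can then be averaged.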

It is also possible to see how many times each input class was correctly matched with the output class by computing the confusion matrix. The confusion matrix produced by the DWT denoising method is shown in Figure 6.

4.2. Results of the Proposed Method and Comparison with State-of-the-Art Techniques

Using the same dataset, the proposed approach is compared with other recently presented methods for the detection of heart sound abnormalities, which are outlined in the following. Yaseen et al. [6] extracted MFCC- and DWT-based features and analyzed performance using SVM, KNN, and deep neural network classifiers. Ghosh et al. [2] obtained the time-frequency domain representation of the signal using the WSST approach and computed magnitude and phase characteristics from it, with a random forest used to classify the different categories. Oh et al. [39] proposed a deep-learning approach employing the WaveNet model, a generative model that has been investigated for its potential to produce raw audio signals. The experiments are performed with the proposed hybrid model, as well as with the CNN model and LSTM model individually. The confusion matrices obtained using these models are provided in Figure 7. The CNN model achieved 97.8% accuracy, the LSTM model achieved 43.9% accuracy, and the hybrid model achieved 98.9% accuracy. This shows that the hybridization of CNN and LSTM networks produces better results than the individual models. It is expected because the hybrid of CNN and LSTM networks helps extract relevant patterns and exploit their time dependency. The CNN model alone produces satisfactory results; however, the LSTM model's performance degrades drastically. These results indicate the benefit of hybridizing the two models.

Table 3 depicts the performance parameters obtained for the CNN, LSTM, and hybrid models. The results obtained using the hybrid model are superior for all five categories compared to the CNN and LSTM models. The hybrid model achieved a 100% F1-score for the classes N and MVP, 98.04% for AS, 98.88% for MS, and 99.19% for MR. These results demonstrate the efficacy of the proposed hybrid model in classifying all five heart sound signal categories, and specifically in separating normal from pathological cases. Such a system will be helpful for automatically analyzing the heart sound signal.

4.2.1. Comparison with Existing Methods

The effectiveness of the proposed model is evaluated by comparing its performance with several recently introduced methods documented in the literature for the same dataset, as illustrated in Table 4. The proposed hybrid model with 98.9% accuracy is superior to all the compared methods. While some existing methods also show prominent results, the superior accuracy of the proposed model highlights its effectiveness in classifying heart sound signals. Its accuracy may reduce misdiagnoses and improve patient care, benefiting individuals with heart valve diseases and other cardiac disorders.

5. Conclusion and Future Work

This paper introduces a novel hybrid deep learning model comprising a 3-layer CNN and an LSTM to categorize heart sound signals into five categories, even in the presence of background noise. The proposed method yields impressive results, surpassing state-of-the-art methods, and notably maintains satisfactory performance even in noisy conditions. Its success is attributed to the fusion of CNN and LSTM models: the CNN extracts meaningful features through convolution layers, while the LSTM exploits time-dependent features due to its recurrent nature. Moving forward, future work should focus on generalization across diverse datasets, a diversity that currently available public datasets lack. In addition, real-time noise affects the quality of pathological signals, and the inclusion of multiple stages in the proposed model inevitably prolongs the signal classification time.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.