Many people worldwide, irrespective of their age, suffer from cardiac arrest. To detect heart attacks early, many researchers have worked on clinical datasets collected from open sources such as PubMed and the UCI repository. However, most of these datasets contain roughly 13 to 147 raw attributes in textual format and have been analyzed with traditional data mining approaches. Traditional machine learning approaches analyze only the data extracted from the images, but the extraction mechanism is inefficient and requires more resources. In this article, the authors propose a system that predicts heart attacks by integrating computer vision and deep learning techniques on heart images collected from clinical labs, which are publicly available in the Kaggle repository. The authors collected live images of the heart by scanning them through IoT sensors. The primary focus is to enhance the quality and quantity of the heart images by passing them through the two components of a GAN, the generator and the discriminator. The GAN introduces noise into the images and tries to replicate real-world scenarios. Subsequently, the available and newly created images are segmented by applying a multilevel threshold operation to find the region of interest (ROI). This step helps the system predict the heart attack rate accurately by considering various factors. Earlier researchers obtained sound accuracy by generating similar heart images and finding the ROI parts of 2D echo images. The proposed methodology achieves an accuracy of 97.33% and a true-positive rate of 90.97%. Computed tomography (CT) images were selected because these grayscale images provide more reliable information at a low computational cost.

1. Introduction

With a CNN, the entire image is processed, which requires a lot of resources and a high-end GPU and makes deployment of the model expensive. Image segmentation can find the region of interest by clustering pixels with homogeneous labels. Because working with only a few parts of each image reduces resource usage, this approach is more efficient than applying a CNN to the whole image. It also enhances the granularity of the analysis by focusing only on the characteristics associated with region boundaries. The recognition of different class labels in MRI or CT scan images is known as “image stratification.”

Image stratification is a basic method of image interpretation that allows satellite images to be used as a geographical frame of reference. In human dimension studies, stratification is used to regionalize the patterns of emergence associated with particular human activities, to focus subsequent efforts on qualitative evaluation and the collection of field or airborne data, and to reduce discrepancies with quantitative estimates of landscape parameters associated with human activities. With this technique, a detailed image can be divided into a number of scenes with simple spatial structure.

Each image pixel is associated with a class description, such as a human, flora, or an automobile. This is semantic segmentation, which treats multiple objects of the same category as a single class. In contrast, instance segmentation treats instances of the same class as separate, distinct objects [1]. The most widely used image segmentation techniques include thresholding, edge-detection methods, region-based methods, classification-based methods, color-based methods, methods based on differential approximation, and ANN-based methods. In general, threshold segmentation uses a single value for the whole image and produces a binary classification: pixels with values greater than the threshold are considered objects, and pixels with values less than the threshold are considered background. This approach is not appropriate for CT scans, so the proposed model implements a multithreshold concept that categorizes regions into complex objects, simple backgrounds, complex backgrounds, and others. Multilevel thresholding is a technique that divides a grayscale image into several regions. It computes more than one threshold, so that the image is partitioned into segments corresponding to one background and several objects.

One notable approach used in image stratification is the GAN. GANs have been applied to reinforcement learning, fully supervised learning, and semisupervised learning. Given a training set, this method learns to create new data with the same statistics as the training data. GANs are typically evaluated using the Inception score, which assesses how varied the generator’s outputs are (as judged by an image classifier, often Inception-v3), or the Fréchet Inception distance (FID). GANs have also been proposed as a fast and precise way to model the formation of high-energy jets [2]. A GAN consists of a generator and a discriminator. The generator attempts to trick the discriminator by creating fake data samples (such as images and audio), whereas the discriminator attempts to distinguish between genuine and fake samples. Both the generator and the discriminator are neural networks, and they compete with each other throughout the training phase. The procedure is repeated many times, and with each iteration, the generator and discriminator become better at their respective tasks. Five commonly used GAN variants are Vanilla GAN, LAPGAN, CGAN, SRGAN, and DCGAN.
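The adversarial game between the generator and the discriminator is usually written as a minimax objective. As a minimal sketch (not the authors' implementation), the two training losses can be computed from the discriminator's output probabilities with binary cross-entropy. The function names are hypothetical, and the generator uses the common "non-saturating" loss, which is one of several possible choices.

```python
import numpy as np

def bce(predictions, target, eps=1e-12):
    """Binary cross-entropy between predicted probabilities and a constant label."""
    p = np.clip(predictions, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(target * np.log(p) + (1 - target) * np.log(1 - p)))

def discriminator_loss(d_real, d_fake):
    # The discriminator wants D(x) -> 1 on real samples and D(G(z)) -> 0 on fakes.
    return bce(d_real, 1.0) + bce(d_fake, 0.0)

def generator_loss(d_fake):
    # Non-saturating generator loss: the generator is rewarded when the
    # discriminator labels its fake samples as real, i.e. D(G(z)) -> 1.
    return bce(d_fake, 1.0)
```

For example, a discriminator that outputs 0.5 for every fake sample gives a generator loss of ln 2 (about 0.693); the loss falls as the discriminator is fooled more often.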

GANs are algorithms that use two neural networks competing against each other (hence “adversarial”) to generate new synthetic data that mimic the statistics of real data. They are frequently used to produce images, videos, and speech. In recent years, there has been stunning progress in the use of GANs, and the realism of generated photos has increased significantly. Not every GAN produces images; for example, researchers have also used GANs to synthesize speech from text input. As an illustration, GANs can be used to generate facial representations for comics and cartoons automatically. The generative network is trained on a specific data collection, such as anime character models, and by learning from this dataset of supplied images, the GAN produces new characters.

2. Literature Survey

According to Dorgham et al. [3], advances in medical image segmentation have shown tremendous results, and improving the effectiveness of this segmentation is crucial. Solving medical image segmentation problems requires a multi-iterative approach that explores the search space. For this purpose, the authors introduced an MBO framework that segments images based on several threshold points. The method was evaluated in a comparative study against the brute-force approach and two other iterative frameworks, namely, DPSO and fractional-order DPSO. The structural similarity index and the peak signal-to-noise ratio were used to assess the quality of the segmented images. The authors claim that their framework improves segmentation efficiency while running considerably faster.

Abualigah et al. [4] explain the importance of multilevel thresholding and its main drawback: the computation becomes more difficult as the number of threshold values increases. To address this issue, the paper proposes an evolution-based DAOA algorithm that uses standard differential mathematical operations. For the multilevel thresholding problem, the approach is evaluated using Kapur’s entropy and between-class variance criteria. Eight benchmark images from two groups, natural images and CT COVID-19 images, are used for evaluation. PSNR and SSIM, the standard assessment measures, are used to verify the quality of the segmented images. The performance of the proposed technique is compared with various other thresholding approaches, and results are reported for threshold levels from 2 to 6. According to the experimental data, the recommended strategy outperforms the other approaches.
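For readers unfamiliar with the objective that such optimizers maximize, the sketch below computes Kapur's entropy for a given set of thresholds on a grey-level histogram and, for small histograms, finds the best thresholds by exhaustive search; PSNR, one of the evaluation measures named above, is included as well. This is a generic illustration assuming integer grey levels, not the DAOA algorithm of [4]: metaheuristics such as DAOA replace the exhaustive search, which becomes impractical as the number of thresholds grows. All function names are hypothetical.

```python
from itertools import combinations

import numpy as np

def kapur_entropy(hist, thresholds):
    """Kapur's entropy for a set of thresholds (higher is better).

    hist: 1-D array of grey-level counts.
    thresholds: sorted grey levels; class c covers bins [t_{c-1}, t_c).
    """
    p = hist.astype(float) / hist.sum()
    edges = [0, *thresholds, len(p)]
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        w = p[lo:hi].sum()
        if w <= 0:
            continue  # an empty class contributes no entropy
        q = p[lo:hi] / w
        q = q[q > 0]  # 0 * log(0) is taken as 0
        total += float(-(q * np.log(q)).sum())
    return total

def best_thresholds_exhaustive(hist, k):
    # Try every combination of k thresholds; only feasible for small k.
    candidates = combinations(range(1, len(hist)), k)
    return max(candidates, key=lambda t: kapur_entropy(hist, list(t)))

def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two images."""
    mse = np.mean((reference.astype(float) - test.astype(float)) ** 2)
    if mse == 0:
        return float("inf")
    return float(10 * np.log10(max_value ** 2 / mse))
```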

Sun et al. [5] explain the importance of early-stage prediction in diagnosing brain tumors that could lead to cancer, drawing on a deeper understanding of medical image analysis. According to the authors, MRI imaging has shown advanced trends in identifying injuries in the brain and can clearly describe its anatomy. The study extensively discusses modern approaches to brain tumor segmentation [6–8], including statistical analysis and an evaluation of the effectiveness of current methods. The authors find that the best segmentation approaches for brain MRI tumors can deliver practical results in terms of sensitivity, accuracy, and Dice score. The survey gives comprehensive knowledge of the different segmentation strategies, including their advantages and disadvantages, and quality criteria indicate the performance of each approach.

According to the authors of [9], medical images from diverse diagnostic modalities have various problems, such as intensity inhomogeneity, distortion, low contrast, and undefined borders. To address these problems, the authors proposed a novel, fully automated technique for segmenting clinical images that combines the benefits of thresholding and an active contour model. In this work, the Harris hawks optimizer is used to determine the optimal threshold value for the initial segmentation contour [10–12]. A difference-of-Gaussian filter is additionally used to improve the shape obtained from the active contour model. The framework was evaluated on two datasets: one contains images with many challenging aspects, and the other is a collection of cardiac images covering normal hearts and a range of diseases and issues. The evaluation used the Dice score, and the method achieved 0.90 on the skin images and 0.93 on the cardiac images.

Because of several factors [13], particularly air pollution, there has been significant growth in the number of chest-related illnesses and an alarming increase in the number of affected patients. The authors used a machine learning technique to identify several chest-related disorders by applying a CNN to a chest X-ray collection. The method builds on standard image segmentation algorithms, including thresholding, k-means clustering, and edge detection. Rather than interpreting the whole image at once, the CNN repeatedly examines small pixel regions until the complete image has been covered. Spatial transformation layers and VGG19 were used to extract features, and the ReLU activation was used because of its low complexity and high computational speed [14–16]. The significant contribution of the technique is that the image’s fundamental predictive characteristics are retained while the dimensionality is substantially reduced.

Ramos-Soto et al. [17] describe the automated segmentation of retinal blood vessels as a challenging problem and explain in detail the drawbacks of standard approaches to it. The paper aims to overcome these drawbacks and reports experiments on two datasets. The approach has three stages: preprocessing, main processing, and postprocessing. The initial stage applies image smoothing techniques [18–20]. The main processing stage is split into two configurations. The first, which combines an optimized top-hat transform, homomorphic filtering, and median filtering, segments the thick vessels. The second configuration segments the thin vessels using the MCET-HHO multilevel thresholding method, again combined with the top-hat transform and homomorphic filtering. In the final stage, morphological image operations are performed. The three performance values achieved on the first dataset are 0.98, 0.75, and 0.96, and those on the second dataset are 0.98, 0.74, and 0.95.
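The top-hat transform used in this pipeline can be illustrated with a small NumPy sketch. The flat square structuring element and edge padding are our assumptions, not settings taken from [17]. The white top-hat subtracts a morphological opening (erosion followed by dilation) from the image, so it keeps bright structures narrower than the structuring element; because vessels usually appear dark in fundus images, a common step is to invert the image first.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def _filter(image, size, reducer):
    # Apply a size x size min or max filter, padding borders by edge replication.
    pad = size // 2
    padded = np.pad(image, pad, mode="edge")
    windows = sliding_window_view(padded, (size, size))
    return reducer(windows, axis=(-2, -1))

def grey_opening(image, size=3):
    # Opening = erosion (local minimum) followed by dilation (local maximum).
    return _filter(_filter(image, size, np.min), size, np.max)

def white_top_hat(image, size=3):
    # Top-hat = image - opening: bright details thinner than the
    # structuring element survive, wide flat regions are removed.
    return image - grey_opening(image, size)
```

A one-pixel-wide bright line survives the top-hat unchanged, while a 5 x 5 bright block is removed entirely, which is why the transform is suited to isolating thin vessels.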

According to the authors of [21], cardiovascular disease is one of the primary causes of death. Prompt treatment improves the quality of therapy and decreases the fatality rate, which has led researchers to build intelligent clinical decision support systems that use ECG signals for diagnosis. In this study, patients’ ECG data are analyzed with AI methods to provide a sophisticated early-intervention system for three prevalent cardiac conditions: apnea, AF, and HF. Three distinct AI approaches are developed: ANN, SVM, and KNN. ECG signals from PhysioNet are examined and collected, and four features are extracted and used as input to the classification system. The results demonstrate that the recommended AI approaches have significant advantages and could help save the lives of patients with heart disease [22, 23]. The recognition rate with KNN is 92.4%, the ANN achieves 95.7% accuracy with 33 stages, and the best classification accuracy, 97.8%, is achieved with the SVM.

The authors of [24] present a two-phase algorithm. The first phase performs ECG segmentation using a neural network based on a convolutional BSTM. The second phase applies a CNN to the ECG beats, collected over multiple time periods, produced by the first phase. The ECG beats are converted into 2D images using the short-time Fourier transform in order to identify normal ECGs and to predict sudden cardiac death from heart conditions such as arrhythmia or severe heart failure. The accuracy for various time windows was examined [1, 25–27]. Using 4 min ECG recordings, the authors report diagnosing heart failure with 100 percent accuracy, arrhythmia with 97.9 percent, and sudden cardiac arrest with 100 percent [28–31].
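To illustrate how a 1D ECG segment can become a 2D image, the sketch below computes the magnitude of a short-time Fourier transform with NumPy. The window length, hop size, and Hann window are illustrative assumptions, not the settings used in [24]; `scipy.signal.stft` offers a fuller implementation.

```python
import numpy as np

def stft_magnitude(signal, window_size=64, hop=32):
    """Short-time Fourier transform magnitude as a 2-D time-frequency image.

    Each column is the FFT magnitude of one Hann-windowed segment;
    rows are frequency bins, columns are time frames.
    """
    window = np.hanning(window_size)
    n_frames = 1 + (len(signal) - window_size) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + window_size] * window for i in range(n_frames)],
        axis=1,
    )
    # rfft keeps only non-negative frequencies: window_size // 2 + 1 rows.
    return np.abs(np.fft.rfft(frames, axis=0))
```

For a 32 Hz sine sampled at 256 Hz, a 64-sample window gives a frequency resolution of 4 Hz, so the peak of every column falls in bin 8.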

Table 1 summarizes the merits and demerits of the heart image segmentation approaches used by different researchers. Further research can focus on the significant limitations identified in this article.

3. Proposed System

In the proposed system, the CT images collected from the Kaggle repository [32] are first preprocessed. Sample images from the dataset are shown in Figure 1.

3.1. Multilevel Threshold Segmentation

Multilevel thresholding divides a grayscale image into several distinct regions. This segmentation method chooses multiple thresholds and splits the target image into several brightness zones, each representing either the background or one of various objects. Depending on how the image pixels are distributed around the mean, the peak pixel values fall into coarser, wider intervals. When applied to objects with colorful or intricate backgrounds, the method works far better than bilevel thresholding. Starting from an initial estimate of the threshold (for example, the mean image intensity), pixels above and below the threshold are assigned to the white and black classes, respectively. The entire image is then segmented effectively using several thresholds found at each stage from statistics such as the mean and standard deviation.
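One way to realize threshold selection from the mean and standard deviation is sketched below. It assumes a recursive scheme in which each stage records the thresholds mean - k·std and mean + k·std and then narrows the intensity range to the pixels between them; the scheme, the parameters `n_stages` and `k`, and the function names are hypothetical and are not confirmed as the authors' exact procedure.

```python
import numpy as np

def mean_std_thresholds(image, n_stages=2, k=1.0):
    """Hypothetical recursive mean/std threshold selection.

    Each stage summarizes the pixels still inside the current intensity
    range by their mean and standard deviation, records the thresholds
    mean - k*std and mean + k*std, and keeps only the pixels between them.
    """
    pixels = image.astype(float).ravel()
    thresholds = []
    for _ in range(n_stages):
        if pixels.size == 0:
            break  # nothing left to split
        m, s = pixels.mean(), pixels.std()
        low, high = m - k * s, m + k * s
        thresholds.extend([low, high])
        pixels = pixels[(pixels > low) & (pixels < high)]
    return np.sort(np.array(thresholds))

def apply_thresholds(image, thresholds):
    # np.digitize maps each pixel to the index of the intensity band it falls in,
    # giving a label image with len(thresholds) + 1 classes.
    return np.digitize(image, thresholds)
```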

3.2. U-NET Integration

In this model, the images are preprocessed using a widely used neural network known as “U-Net,” which consists of encoding and decoding parts. This model assigns a class label to each pixel using localization information, which drives the segmentation of the images. U-Net is integrated with GANs to color the infected parts of the images because in black-and-white images it is difficult to recognize the level of infection. The infected parts of the image are passed as input to an “autoencoder” to extract high-level details of the infection.

3.3. Encoding Mechanism

Where common compression techniques such as JPEG fall short, the encoder portion of the network is used for encoding and occasionally even for compression. The encoding is performed by the network’s encoder component, which has fewer hidden units in each layer. The decoder, which predicts the presence of class labels through queries, makes better use of spatial data than global average pooling. The ML-Decoder is highly efficient and scales well to hundreds of classes thanks to a redesigned decoder architecture and a novel group-decoding scheme. The ML-Decoder consistently offers a better speed-accuracy tradeoff than employing a larger backbone.

The encoder part consists of four downsampling layers, each followed by the ReLU activation function, which is defined as follows:
$$f(x) = \max(0, x), \tag{1}$$
where $x$ denotes the vector representation of the features in terms of pixels and the maximum is taken element-wise.

The architecture of the encoder neural network is shown in Figure 2. The encoder with four layers also contains three max-pooling layers to extract low- and high-level features from the images.
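The effect of the encoder's ReLU and max-pooling steps on the feature-map size can be traced with a small NumPy sketch. Convolutions are omitted, and the 256 x 256 input size and function names are assumptions for illustration only.

```python
import numpy as np

def relu(x):
    # Element-wise max(0, x).
    return np.maximum(0, x)

def max_pool_2x2(feature_map):
    # 2 x 2 max pooling with stride 2 on an (H, W) map; H and W must be even.
    h, w = feature_map.shape
    return feature_map.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def encoder_spatial_sizes(size=256, n_levels=4):
    # Four encoder levels separated by three pooling steps, each halving
    # the spatial size (input size assumed, not stated in the text).
    return [size // (2 ** level) for level in range(n_levels)]
```

With a 256 x 256 input, the four encoder levels operate at 256, 128, 64, and 32 pixels per side.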

The decoder part of the U-Net mirrors the encoder, using four upsampling layers to recover the spatial resolution lost in the encoder’s three max-pooling layers. The output of the preprocessed images is shown in Figure 3.
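A single U-Net decoder step can be sketched as upsampling followed by concatenation with the matching encoder feature map (the skip connection). This sketch uses nearest-neighbour upsampling as a simplification of the transposed ("up") convolution in the standard U-Net; the channel counts and function names are hypothetical.

```python
import numpy as np

def upsample_2x(feature_map):
    # Nearest-neighbour 2x upsampling of a (C, H, W) tensor.
    return feature_map.repeat(2, axis=1).repeat(2, axis=2)

def decoder_step(decoder_features, encoder_skip):
    # Upsample, then concatenate the encoder feature map of the same spatial
    # size along the channel axis (the U-Net skip connection).
    up = upsample_2x(decoder_features)
    if up.shape[1:] != encoder_skip.shape[1:]:
        raise ValueError("spatial sizes of decoder and skip features must match")
    return np.concatenate([up, encoder_skip], axis=0)
```

Concatenating the skip features reintroduces fine spatial detail that pooling discarded, which is what makes U-Net suited to pixel-level segmentation.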

To create annotated images, the model uses the GAN architecture to produce a segmented image, as follows.

Applying Pseudocode 1 creates segmented images, as shown in Figure 4.

Input: Heart CT scan images, HCTSCAN
Output: Segmented images based on the threshold values
1. Define the hyperparameters, such as the learning rate, batch size, and number of epochs
2. Define callbacks with the following estimators:
   i. a model checkpoint that stores the best model during training
   ii. a CSV logger that saves the scores obtained by the model
   iii. a TensorBoard logger, together with an early stopping mechanism that monitors the validation loss
3. Compile the model and fit it on the training dataset, shuffling the records
4. i. If threshold value <= length(masked_image), then
      create a predicted mask to obtain a segmented image
   ii. Else, expand the image and concatenate it with the previous ROI bounds
5. Store the images in the required directory
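In Keras, steps 2 and 3 above would typically use `tf.keras.callbacks.ModelCheckpoint`, `CSVLogger`, `TensorBoard`, and `EarlyStopping` passed to `model.fit`. To show what the checkpoint, logging, and early-stopping logic does without depending on a deep learning framework, the sketch below simulates it over a list of per-epoch validation losses. The function name, the `patience` default, and the simulated losses are hypothetical.

```python
import csv
import io

def train_with_callbacks(val_losses, patience=2, log_file=None):
    """Simulate checkpointing, CSV logging and early stopping.

    val_losses stands in for the validation loss measured after each epoch
    of a real training loop. Returns (best_epoch, last_epoch_run).
    """
    writer = csv.writer(log_file) if log_file is not None else None
    if writer is not None:
        writer.writerow(["epoch", "val_loss"])
    best_loss, best_epoch = float("inf"), None
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if writer is not None:
            writer.writerow([epoch, loss])  # CSV logger: one row per epoch
        if loss < best_loss:
            # Model checkpoint: this is where the best weights would be saved.
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch  # early stopping
    return best_epoch, len(val_losses) - 1

if __name__ == "__main__":
    log = io.StringIO()
    print(train_with_callbacks([0.9, 0.7, 0.8, 0.75, 0.6], patience=2, log_file=log))
    print(log.getvalue())
```

With these losses, epoch 1 is kept as the best model and training stops after epoch 3, before the later improvement at epoch 4 is seen; this is the usual tradeoff when choosing `patience`.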

Figure 5 represents the overall architecture of the proposed model. In this architecture, the first principal component splits the dataset and creates a validation dataset to evaluate the model’s performance. The second component enlarges the dataset by creating new images using data augmentation and GANs. The novelty of this research is the creation of new images based on the augmentation operation. The augmentation process involves the following steps:
(1) Perform horizontal and vertical flips with a scale value of 1
(2) Rotate images by 45 degrees, with the actual data on one axis and the masked data on the other axis
(3) Apply a color transformation to convert the DICOM images into RGB images and save them as .jpg images
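The flip and color-conversion steps above can be sketched with NumPy. The key design point is that each geometric transform is applied identically to an image and its mask so that the labels stay aligned. The function names and the min-max rescaling to 0-255 are assumptions. Reading the DICOM pixel data (for example with pydicom's `dcmread(...).pixel_array`) and the 45-degree rotation (for example `scipy.ndimage.rotate(image, 45, reshape=False)`, with `order=0` for masks) are left out to keep the sketch dependency-free.

```python
import numpy as np

def flip_pair(image, mask):
    """Horizontal and vertical flips, applied identically to image and mask."""
    return [
        (np.fliplr(image), np.fliplr(mask)),
        (np.flipud(image), np.flipud(mask)),
    ]

def gray_to_rgb_uint8(pixels):
    """Rescale grayscale data (e.g. DICOM pixel values) to 0-255 and
    replicate it into three RGB channels, ready to be saved as .jpg."""
    pixels = pixels.astype(float)
    lo, hi = pixels.min(), pixels.max()
    if hi == lo:
        scaled = np.zeros_like(pixels)  # constant image: map everything to 0
    else:
        scaled = (pixels - lo) / (hi - lo) * 255.0
    gray = np.round(scaled).astype(np.uint8)
    return np.stack([gray, gray, gray], axis=-1)
```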

The third component cleans the images by applying the U-Net operation as a preprocessing step. The fourth component segments the cleaned images based on the threshold values. This decision-making step either creates masked images that are ready for prediction or creates concatenated images for further operations. Finally, the images are evaluated using two types of metrics: machine learning metrics, such as accuracy, and image-processing metrics tracked over epochs, such as the Dice coefficient.

4. Results and Discussion

The convolutional neural network is trained for five epochs, and the results for each epoch are tabulated in Table 2.

In Figures 6(a) and 6(b), the x-axis represents the epoch number and the y-axis represents the percentage values of all the metrics. Figure 6(a) shows that all the important metrics increase gradually and reach high values. Figure 6(b) shows that the loss decreases progressively, which is an essential characteristic of a well-trained model.

The Dice coefficient is twice the area of overlap between the predicted and ground-truth segmentations divided by the total area covered by both. It is given by equation (2) and, for binary masks, is equivalent to the F1-score:
$$\mathrm{DICE} = \frac{2\,|A \cap B|}{|A| + |B|}, \tag{2}$$
where $|A \cap B|$ is the number of pixels shared by the predicted mask $A$ and the ground-truth mask $B$, and $|A| + |B|$ is the total number of pixels in the two masks.

The intersection over union (IoU), also known as the Jaccard index, is computed as the ratio of the intersection area to the union area:
$$\mathrm{IoU} = \frac{|A \cap B|}{|A \cup B|}. \tag{3}$$
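The Dice coefficient and IoU described above can be computed for binary masks with a few lines of NumPy. This is a generic sketch; the convention of returning 1.0 when both masks are empty is our assumption, since the score is otherwise undefined in that case.

```python
import numpy as np

def dice_coefficient(pred, truth):
    """Dice = 2|A & B| / (|A| + |B|) for binary masks A (prediction) and B (truth)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    total = pred.sum() + truth.sum()
    if total == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return float(2.0 * np.logical_and(pred, truth).sum() / total)

def iou(pred, truth):
    """IoU (Jaccard index) = |A & B| / |A | B|."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(pred, truth).sum() / union)
```

For example, with `pred = [1, 1, 0, 0]` and `truth = [1, 0, 1, 0]`, one pixel overlaps, so Dice = 2·1/4 = 0.5 and IoU = 1/3; in general Dice = 2·IoU / (1 + IoU).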

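Recall and precision have similar-looking definitions, but they measure different things. Recall is the ratio of the relevant values retrieved to all relevant values, that is, the fraction of actual positives that the model detects.

Precision is the ratio of the relevant values retrieved to all retrieved values, that is, the fraction of predicted positives that are correct. In terms of the components of the confusion matrix, they are given by equations (4) and (5):
$$\mathrm{Recall} = \frac{TP}{TP + FN}, \tag{4}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP}. \tag{5}$$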
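The metrics reported in Table 3 can be computed directly from confusion-matrix counts. In this sketch, F1 is the harmonic mean of precision and recall (its standard definition), and returning 0.0 when a denominator is zero is our assumed convention.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

For example, `confusion_metrics(8, 2, 0, 10)` gives recall 1.0 but precision 0.8 and accuracy 0.9: a recall of 1 means no positives were missed, not that there were no misclassifications.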

In Table 3, this research article reports key metrics such as accuracy, recall, precision, and F1-score to describe the model’s performance on the images during the training phase.

The proposed model uses the test dataset with a 20% validation split, and Table 3 shows a sample of the metrics related to the learning algorithms. The table shows that the recall value for most of the images is 1, which means that no positive cases were missed (the false-negative rate is 0). It also shows that the accuracy for all the images is above 93%, which suggests that the system is ready for deployment and can predict test images supplied by users in real-time scenarios with an average accuracy of 95%. Initially, the model had 0% precision, but precision gradually recovered as the number of epochs increased and images were randomly selected. Since the F1-score is the harmonic mean of recall and precision, it also increased gradually along with the precision. The overall accuracy, precision, and recall of the proposed system are shown in Figure 7.

Table 4 compares the proposed algorithm with previous researchers’ work. The proposed model achieves the highest accuracy, an improvement of 0.6% over Bi-LSTM.

Figure 8 shows that the accuracy increases gradually and that the proposed system reaches a higher accuracy than the Bi-LSTM model. Most deep learning models exceed 93% accuracy, whereas traditional approaches reach only 75% to 80%.

5. Conclusion

From the results and discussion, it is observed that the proposed model achieves higher accuracy than previous researchers’ work. It is also observed that, rather than working on textual parameters or on inputs obtained by querying patients about their depression levels, it is better to work on 2D-echo images to predict heart attack levels. Many researchers think that working with GANs increases the complexity of the model. However, the proposed system shows that GANs, despite the added complexity, provide higher-quality data, which helps improve accuracy, recall, and precision. The system also helps reduce the model’s loss during the validation phase. The proposed research shows an improvement of nearly 0.6% compared with the base model, Bi-LSTM. In future work, researchers can use transfer learning to pretrain the models, retraining the weights of recent transfer-learning network models with the help of a swarm intelligence algorithm. It will also be essential to modify the fully connected layers according to the requirements of each application.

Data Availability

The dataset will be shared by the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.