Abstract
Myocarditis is an inflammation of the heart muscle that is becoming more prevalent, especially with the spread of COVID-19. Noninvasive cardiac magnetic resonance (CMR) imaging can be used to diagnose myocarditis, but interpretation is time-consuming and requires expert physicians. Computer-aided diagnostic systems can facilitate the automatic screening of CMR images for triage. This paper presents an automatic model for myocarditis classification based on a deep reinforcement learning approach, called reinforcement learning-based myocarditis diagnosis combined with population-based algorithm (RLMDPA), which we evaluated using the Z-Alizadeh Sani myocarditis dataset of CMR images prospectively acquired at Omid Hospital, Tehran. The model addresses the imbalanced classification problem inherent to the CMR dataset and formulates classification as a sequential decision-making process. The policy architecture is based on a convolutional neural network (CNN). To implement the model, we first apply the artificial bee colony (ABC) algorithm to obtain initial values for the RLMDPA weights. Next, the agent receives a sample at each step and classifies it. For each classification action, the agent receives a reward from the environment, in which the reward for the minority class is greater than that for the majority class. Eventually, the agent finds an optimal policy under the guidance of a specific reward function and a helpful learning environment. Experimental results based on standard performance metrics show that RLMDPA achieves high accuracy for myocarditis classification, indicating that the proposed model is suitable for myocarditis diagnosis.
1. Introduction
Myocarditis is a condition that causes inflammation of the heart muscle [1]. It can affect heart pump function as well as electrical activation and conduction, resulting in heart failure and arrhythmia, respectively. The etiology is diverse, including infection (e.g., viral infections such as COVID-19 and parvovirus) [2], systemic inflammatory and autoimmune diseases, and drug reactions. Symptoms of myocarditis include chest pain, fatigue, and shortness of breath [3]. Patients with suspected myocarditis should seek cardiology advice for early diagnosis and treatment. Endomyocardial biopsy, an invasive procedure, is recommended in severe cases to confirm the diagnosis and to guide treatment [4]. Management comprises supportive measures, symptomatic heart failure therapy, antimicrobials for identified infective agents, and immunosuppression for severe inflammation. Early diagnosis and prompt institution of treatment can significantly reduce morbidity and mortality. Noninvasive cardiac imaging with cardiovascular magnetic resonance imaging (MRI) [5] can help establish the diagnosis. However, MRI requires expert interpretation, which is labor-intensive and subject to operator bias. In this regard, automated diagnostic systems can be developed that employ various machine learning and data mining algorithms to solve medical image classification problems efficiently [6]. They can be applied to reporting workflows to screen images automatically, saving physicians time, reducing errors, and enhancing diagnostic accuracy.
Deep models have demonstrated excellent performance in diverse applications, including natural language processing [7–9], computer vision, and medical image analysis [10, 11]. Deep learning-based algorithms seek weights that minimize the error between the real and predicted outputs. Typically, deep models use gradient-based algorithms such as backpropagation to learn the weights. However, such optimization methods are sensitive to the initial weights and may become trapped in local minima [12]. This issue is mainly encountered during classification [13]. A few researchers have shown that population-based metaheuristic (PBMH) algorithms [14, 15] may help to overcome this problem [16]. Among PBMH algorithms, the ABC algorithm is one of the most effective optimizers [17, 18]. It emulates the behavior of bees in nature and, unlike traditional optimization algorithms, dispenses with the need to calculate gradients, thereby reducing the probability of getting stuck in local optima [19].
Classification performance in many machine learning algorithms may be adversely affected by imbalanced classification [20], which occurs when one class contains disproportionately more data than the others [21]. While models trained on imbalanced data may still attain reasonable detection rates for majority samples, their performance for minority samples is weak, as minority class specimens can be difficult to identify due to their rarity and randomness. Also, misclassification of minority class samples can result in high costs. Methods have been proposed to address the problem at two levels [22]: the data level and the algorithmic level. In the former [23–25], training data are manipulated to balance the class distribution by oversampling the minority class and/or undersampling the majority class [26]. For instance, the synthetic minority oversampling technique (SMOTE) generates new samples by linear interpolation between adjoining minority samples [24], whereas NearMiss undersamples majority samples using the nearest neighbor algorithm [25]. Of note, oversampling and undersampling risk overfitting and loss of valuable information, respectively [27]. At the algorithmic level, the importance of the minority class can be raised using techniques [28–32] that include cost-sensitive learning, ensemble learning, and decision threshold adjustment. In cost-sensitive learning, different misclassification costs are assigned to each class in the loss function, with a higher cost being allocated to minority class misclassification. Ensemble learning systems train several subclassifiers and then apply voting or combination to obtain better results. Threshold adjustment techniques train the classifier on the imbalanced dataset and modify the decision threshold at test time. Deep learning-based methods have also been suggested for imbalanced data classification [33–35].
The authors in Reference [36] introduced a new loss function for deep networks that could capture classification errors from both minority and majority classes. Reference [37] introduces a method that could learn the unique features of an imbalanced dataset while maintaining inter-cluster and inter-class margins.
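The data-level idea behind SMOTE, linear interpolation between neighboring minority samples, can be illustrated with a minimal sketch. The helper names `nearest_neighbor` and `smote_like_sample` and the toy points are hypothetical and introduced only for illustration; this is not the reference SMOTE implementation, which interpolates toward one of several nearest neighbors.

```python
import random

def nearest_neighbor(x, others):
    """Return the point in `others` closest to x (squared Euclidean distance)."""
    return min(others, key=lambda o: sum((a - b) ** 2 for a, b in zip(x, o)))

def smote_like_sample(x, neighbor, rng):
    """Create one synthetic point on the line segment between x and its neighbor."""
    gap = rng.random()  # interpolation factor in [0, 1)
    return [xi + gap * (ni - xi) for xi, ni in zip(x, neighbor)]

rng = random.Random(0)
minority = [[1.0, 1.0], [1.2, 0.9], [3.0, 3.1]]  # toy minority-class feature vectors
base = minority[0]
neighbor = nearest_neighbor(base, minority[1:])
synthetic = smote_like_sample(base, neighbor, rng)
```

Because the synthetic point always lies between two existing minority samples, repeated use can concentrate samples in a small region, which is one way the overfitting risk mentioned above arises.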
To the best of our knowledge, only one work [3] based on deep learning models has been proposed for the diagnosis of myocarditis. The authors developed an algorithm for classifying images based on a CNN and the k-means algorithm [38], which has the following workflow: after the data preprocessing stage, the images were grouped into several clusters, and each cluster was treated as a class to be classified by the CNN. The algorithm was repeated for different numbers of clusters, and all the results were combined for the final decision. The main problem with the method was that k-means treated the image matrix as a vector, which discarded the spatial neighborhood of each pixel.
This paper presents a method based on the ABC algorithm and reinforcement learning, called RLMDPA, that we believe addresses the above-mentioned problems. The RLMDPA model poses the classification problem as a guessing game embodied in a sequential decision-making process. At each step, the agent receives an environmental state represented by a training instance and then executes a classification under the direction of a policy. If the agent classifies the instance correctly, it is given a positive reward; otherwise, it is given a negative one. The minority class is rewarded more than the majority class. The agent's goal is to accumulate as much reward as possible during the sequential decision-making process, that is, to classify the samples as correctly as possible.
The main contributions of this article are as follows: (1) we formulated the classification of medical images as a sequential decision-making process and presented a reinforcement learning-based algorithm for imbalanced classification; (2) instead of initializing the weights randomly, we developed an encoding strategy and calculated the initial values using the ABC algorithm; and (3) this work is based on a new well-annotated MRI dataset acquired from Tehran's Omid Hospital that we have named the Z-Alizadeh Sani myocarditis dataset and made publicly downloadable.
The rest of the article is structured as follows: the second section is a brief overview of the ABC algorithm and its working. The third section introduces the proposed model. The fourth section presents the evaluation criteria, dataset, and analysis of the results. The last section states the conclusions and future works.
2. Background
2.1. Artificial Bee Colony Algorithm
Artificial bee colony (ABC), introduced by Karaboga and Basturk [39], is one of the most efficient algorithms for numerical optimization. It is straightforward, robust, and population-based [19]. The algorithm emulates the intelligent foraging behavior of bees to arrive at the optimal solution. There is a list of food sources that bees seek out over time to reach the best positions. The algorithm involves three groups of bees: employed bees, onlooker bees, and scout bees. Employed bees discover the positions of food sources, whereas onlooker bees wait in the hive for the nectar information about food positions sent by the employed bees. Onlooker bees use this information to select food source positions. Once an employed bee has exhausted its food source, it becomes a scout bee and searches for new positions randomly. The number of employed bees equals the number of unemployed (onlooker and scout) bees. The steps for optimizing a problem using the ABC algorithm are as follows:
(1) Initialization: in the first step, an initial population of size $SN$ is formed from the positions (solutions), as in
$$x_{i,j} = x_j^{\min} + \mathrm{rand}(0, 1)\,\bigl(x_j^{\max} - x_j^{\min}\bigr), \quad (1)$$
where $x_i$ represents the $i$th position, each solution has $D$ dimensions, and $D$ is the number of parameters that must be optimized. $x_j^{\min}$ and $x_j^{\max}$ are the smallest and largest values in dimension $j$, respectively.
(2) Employed bee phase: at this point, new solutions are sought by searching the neighborhood of the current candidate solutions. To keep the population size constant, the quality of each new solution is evaluated. If it is better than the previous one, it replaces it; otherwise, the previous solution is kept. This step can be written as follows:
$$v_{i,j} = x_{i,j} + \phi_{i,j}\,\bigl(x_{i,j} - x_{k,j}\bigr), \quad (2)$$
where $x_k$ is a random solution such that $k \neq i$, and $\phi_{i,j}$ is a random number picked from the interval [0, 1]. The potentially new solution $v_i$ is obtained by changing only one element of $x_i$.
(3) Onlooker bee phase: for the onlooker bees update, one solution is stochastically selected from the candidate solutions according to the following probability:
$$p_i = \frac{\mathrm{fit}_i}{\sum_{n=1}^{SN} \mathrm{fit}_n}, \quad (3)$$
where $\mathrm{fit}_i$ is the fitness of solution $x_i$. The fitter a solution is, the higher the chance that it will be selected. If the candidate derived from the chosen solution scores higher than the onlooker bee's current solution, it replaces the current solution. This process is repeated for all onlooker bees in the population.
(4) Scout bee phase: a solution whose fitness does not improve after some repetitions can trap the algorithm in a local optimum [40]. To prevent this, once a solution's fitness has not improved after t iterations, the algorithm discards it, and a new solution is supplied according to equation (2).
(5) Algorithm end condition: although different conditions can be defined for the end of the algorithm, termination by iteration count is used in this study, which means that the algorithm ends after a predefined number of iterations.
The complete ABC algorithm is given in Algorithm 1.
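As a rough illustration of the five steps, the following plain-Python sketch runs an ABC loop on a toy cost function. It is not the authors' Algorithm 1: the function name `abc_minimize`, the cost-to-fitness mapping `1 / (1 + cost)`, the parameter defaults, the clamping to bounds, and the choice to redraw scout solutions uniformly within the bounds are all assumptions made for this example.

```python
import random

def abc_minimize(cost, dim, lower, upper, pop_size=20, limit=10, iters=100, seed=0):
    """Minimize a non-negative cost with a basic artificial bee colony loop."""
    rng = random.Random(seed)

    def random_solution():
        # Initialization (reused by scouts here): uniform draw within the bounds.
        return [lower + rng.random() * (upper - lower) for _ in range(dim)]

    def fitness(x):
        # Assumed cost-to-fitness mapping; higher fitness means a better solution.
        return 1.0 / (1.0 + cost(x))

    pop = [random_solution() for _ in range(pop_size)]
    fit = [fitness(x) for x in pop]
    trials = [0] * pop_size  # iterations without improvement, per solution

    def try_neighbor(i):
        # Change one element of solution i relative to a random partner k != i,
        # then keep the candidate only if it is fitter (greedy selection).
        k = rng.choice([n for n in range(pop_size) if n != i])
        j = rng.randrange(dim)
        phi = rng.random()  # drawn from [0, 1), following the interval in the text
        cand = list(pop[i])
        cand[j] = min(upper, max(lower, cand[j] + phi * (cand[j] - pop[k][j])))
        cand_fit = fitness(cand)
        if cand_fit > fit[i]:
            pop[i], fit[i], trials[i] = cand, cand_fit, 0
        else:
            trials[i] += 1

    best = max(range(pop_size), key=fit.__getitem__)
    best_x, best_fit = list(pop[best]), fit[best]

    for _ in range(iters):
        for i in range(pop_size):                  # employed bee phase
            try_neighbor(i)
        total = sum(fit)
        probs = [f / total for f in fit]           # fitness-proportional selection
        for _ in range(pop_size):                  # onlooker bee phase
            try_neighbor(rng.choices(range(pop_size), weights=probs)[0])
        best = max(range(pop_size), key=fit.__getitem__)
        if fit[best] > best_fit:                   # remember the best solution so far
            best_x, best_fit = list(pop[best]), fit[best]
        for i in range(pop_size):                  # scout bee phase
            if trials[i] > limit:
                pop[i] = random_solution()
                fit[i], trials[i] = fitness(pop[i]), 0
    return best_x, cost(best_x)

def sphere(x):
    """Toy cost: sum of squares, minimized at the origin."""
    return sum(v * v for v in x)

best_x, best_cost = abc_minimize(sphere, dim=3, lower=-5.0, upper=5.0)
```

The best solution is stored separately because the scout phase may discard a stagnating solution even if it is currently the best one in the population.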

2.2. Reinforcement Learning
Reinforcement learning [41] is an important branch of machine learning that encompasses many domains. Reinforcement learning can achieve relatively good classification results because it can effectively learn the compelling features of noisy data. In Reference [42], the authors defined classification as a sequential decision problem that used several agents to interact with the environment in order to learn an optimal policy function. Due to the complex simulation between the agents and the environment, the run time was inordinately prolonged. The model presented in [43] is a reinforcement learning-based classifier for noisy text data. The proposed structure comprises two components: a sample selector and a relational classifier. The former selects quality sentences from the noisy data by following the agent, whereas the latter learns from the clean data and gives a delayed reward to the sample selector as feedback. Finally, the model yields a superior classifier and a quality dataset. The authors in Reference [44] proposed a solution for time series data in which the reward function and Markov process are explicitly defined. In various specific applications [45–48], reinforcement learning has been applied to learn efficient features. These models promote valuable features for the classification, which leads to higher rewards that guide the agent to select more worthy features. To date, limited work has been done on deep learning for the classification of imbalanced data. In Reference [44], an ensemble pruning technique for selecting subclassifiers that adopted reinforcement learning was proposed. However, the model underperformed when the amount of data was increased. This is because it is difficult to choose classifiers when there are too many subclassifiers.
3. The Proposed Solution
The overall structure of the proposed model is shown in Figure 1. We considered two critical aspects of classification. First, we formulated a vector that includes all the learnable weights in our model, obtained initial values for these weights with ABC, and then applied backpropagation for the rest of the training. Second, as mentioned, another problem that most classifiers suffer from, including ours, is imbalanced data. To address this, we employed reinforcement learning [49]. These concepts are detailed in the following sections.
3.1. Pretraining Phase
Weight initialization of deep networks is an essential part of deep models. Sometimes, incorrect initial values can lead to a failure of convergence in the model. The proposed model has a deep network with weights that need to be optimized. In this section, we present an encoding strategy and fitness function for the ABC algorithm.
3.2. Encoding Strategy
In our work, the encoding strategy aims to arrange the CNN and feedforward weights in a vector that is treated as the position of a bee in the ABC algorithm. Choosing the arrangement of the weights is a challenge; after a few experiments, we designed the encoding strategy that we found most appropriate. Figure 2 illustrates an example encoding of a three-layer CNN with three filters in each layer and a feedforward network with three hidden layers. Note that all weight matrices are stored in the vector row by row.
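The row-by-row arrangement can be sketched as a pair of helper functions that flatten weight matrices into one position vector and rebuild them afterwards. The names `encode` and `decode` and the tiny example matrices are hypothetical; the actual layout used by the authors is the one shown in Figure 2.

```python
def encode(matrices):
    """Flatten a list of 2-D weight matrices (lists of rows) into one vector, row by row."""
    return [w for m in matrices for row in m for w in row]

def decode(vector, shapes):
    """Rebuild matrices of the given (rows, cols) shapes from a flat position vector."""
    matrices, pos = [], 0
    for rows, cols in shapes:
        m = [vector[pos + r * cols: pos + (r + 1) * cols] for r in range(rows)]
        matrices.append(m)
        pos += rows * cols
    return matrices

conv_filter = [[1, 2], [3, 4]]           # hypothetical 2x2 convolution filter
dense_weights = [[5, 6, 7], [8, 9, 10]]  # hypothetical 2x3 feedforward weight matrix
position = encode([conv_filter, dense_weights])
restored = decode(position, [(2, 2), (2, 3)])
```

Keeping the list of shapes alongside the vector is what allows a bee's position to be mapped back to network weights before the fitness is evaluated.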
3.3. Fitness Function
The fitness function, which measures the effectiveness of a solution in the ABC algorithm, is defined following [12] in terms of the target and predicted labels, where $N$ is the total number of samples, and $y_i$ and $\hat{y}_i$ are the target and predicted labels for the $i$th sample, respectively.
4. Classification
Because the two classes differ in the amount of data, we face an imbalanced classification problem. To address this, we used the imbalanced classification Markov decision process (ICMDP) to construct a sequential decision problem. In reinforcement learning, an agent tries to obtain an optimal policy by performing a series of actions in the environment while maximizing its score. In our model, a sample of the dataset is provided to the agent at each time step and classified. The environment then returns an immediate reward to the agent. A correct classification yields a positive reward, whereas an incorrect one yields a negative reward. By maximizing cumulative rewards, the agent can arrive at the optimal policy. Let $D = \{(x_1, l_1), (x_2, l_2), \ldots, (x_N, l_N)\}$ be the imbalanced set of images with $N$ samples, where $x_i$ corresponds to the $i$th image and $l_i$ is its corresponding label. The settings are as follows:
(i) Policy $\pi_\theta$: the policy is a mapping function $\pi: S \rightarrow A$, where $S$ and $A$ are the sets of states and actions, respectively. In other words, $\pi_\theta(s_t)$ means performing the action $a_t$ in the state $s_t$. $\pi_\theta$ is the classifier model with weights $\theta$.
(ii) State $s_t$: each state $s_t$ is mapped to the sample $x_t$ from the dataset $D$. The first sample $x_1$ is deemed the initial state $s_1$. So that the model does not learn a particular order, $D$ is shuffled in each episode.
(iii) Action $a_t$: the action $a_t$ is performed to predict the label of $x_t$. Since the classification is binary, $a_t \in \{0, 1\}$, where zero represents the minority class and one represents the majority class.
(iv) Reward $r_t$: the reward reflects the performance of an action. An agent that classifies correctly gets a positive reward; otherwise, it gets a negative reward. The amount of this reward should not be the same for both classes. Rewards can significantly improve model performance when the reward level for each action is carefully calibrated. In this work, the reward for action $a_t$ is defined according to the following equation [27]:
$$r_t = R(s_t, a_t, l_t) = \begin{cases} +1, & a_t = l_t \text{ and } s_t \in D_H, \\ -1, & a_t \neq l_t \text{ and } s_t \in D_H, \\ \lambda, & a_t = l_t \text{ and } s_t \in D_S, \\ -\lambda, & a_t \neq l_t \text{ and } s_t \in D_S, \end{cases} \quad (5)$$
where $D_H$ and $D_S$ represent the minority and majority classes, that is, healthy and sick, respectively, and $\lambda$ is a value in the interval [0, 1]. The magnitude of the majority class reward, $\lambda$, does not exceed that of the minority class reward, 1, because the minority class is more critical owing to its fewer samples. In effect, we ascribe more importance to the minority class so that its influence approximates that of the majority class. In the results section, we will see the importance of the value of $\lambda$.
(v) Terminal $E$: the training process is completed at several terminal states, which occur in every training episode. An episode is the transition trajectory from an initial state to a final state, namely, $\{s_1, a_1, r_1, s_2, a_2, r_2, \ldots, s_t, a_t, r_t\}$. In our case, an episode stops when all the training data have been classified or when a sample of the minority class is misclassified.
(vi) Transition probability $P$: the agent goes from state $s_t$ to the next state $s_{t+1}$ based on the order of the read data. The transition probability is determined as $p(s_{t+1} \mid s_t, a_t)$.
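The reward rule and the episode termination condition described in items (iv) and (v) can be sketched as follows. The function names, the toy samples, and the constant policy are hypothetical and serve only to make the rule concrete; per-episode shuffling of the data is omitted for brevity.

```python
MINORITY, MAJORITY = 0, 1  # label convention stated in the text

def reward(action, label, lam):
    """Return +/-1 for minority samples and +/-lam for majority samples."""
    magnitude = 1.0 if label == MINORITY else lam
    return magnitude if action == label else -magnitude

def run_episode(samples, policy, lam):
    """Classify samples in order; stop early if a minority sample is misclassified."""
    total = 0.0
    for steps, (x, label) in enumerate(samples, start=1):
        action = policy(x)
        total += reward(action, label, lam)
        if label == MINORITY and action != label:
            return total, steps, True   # terminal state reached early
    return total, len(samples), False   # all samples were classified

def always_majority(x):
    """Toy policy that predicts the majority class for every sample."""
    return MAJORITY

samples = [("img_a", MAJORITY), ("img_b", MINORITY), ("img_c", MAJORITY)]
total, steps, stopped_early = run_episode(samples, always_majority, lam=0.5)
```

With this toy policy the episode ends at the second sample: the first (majority) sample earns 0.5, the misclassified minority sample costs 1, and the third sample is never seen.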
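In ICMDP, the policy function $\pi_\theta$ reports the probability of all labels by receiving a sample:
$$\pi_\theta(a \mid s) = P(a_t = a \mid s_t = s). \quad (6)$$
In reinforcement learning, the intention is to maximize the discounted cumulative reward, or in mathematical terms, to attain a high value for the following expression:
$$g_t = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k}. \quad (7)$$
Equation (7) is termed the return function, which contains all the accumulated return values of the agent's search in the state space. The discount factor $\gamma \in [0, 1]$ [50] is the coefficient of the effect of each reward. The $Q$ function measures the quality of a state-action combination:
$$Q^{\pi}(s, a) = \mathbb{E}_{\pi}\bigl[g_t \mid s_t = s, a_t = a\bigr]. \quad (8)$$
Equation (8) is expanded according to Bellman's formula [51]:
$$Q^{\pi}(s, a) = \mathbb{E}_{\pi}\bigl[r_t + \gamma Q^{\pi}(s_{t+1}, a_{t+1}) \mid s_t = s, a_t = a\bigr]. \quad (9)$$
By maximizing the $Q$ function under the policy $\pi$, more cumulative rewards can be achieved. The optimal policy $\pi^{*}$ is obtained from the optimal function $Q^{*}$ as follows:
$$\pi^{*}(a \mid s) = \begin{cases} 1, & a = \arg\max_{a} Q^{*}(s, a), \\ 0, & \text{otherwise}. \end{cases} \quad (10)$$
By combining the two equations (9) and (10), the $Q^{*}$ function is expressed as follows [27]:
$$Q^{*}(s, a) = \mathbb{E}_{\pi}\bigl[r_t + \gamma \max_{a'} Q^{*}(s_{t+1}, a') \mid s_t = s, a_t = a\bigr]. \quad (11)$$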
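To make the return function and the greedy choice of action concrete, here is a small sketch with made-up numbers. The function names `discounted_return` and `greedy_action`, the reward sequence, and the action values are hypothetical and not taken from the paper.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**k * rewards[k], accumulated from the last reward backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

def greedy_action(q_values):
    """Index of the action with the largest estimated value."""
    return max(range(len(q_values)), key=q_values.__getitem__)

# Rewards of a short episode: correct minority, wrong majority, correct minority
# (with a majority reward magnitude of 0.5).
g = discounted_return([1.0, -0.5, 1.0], gamma=0.5)  # 1 + 0.5 * (-0.5 + 0.5 * 1) = 1.0
action = greedy_action([0.2, 0.7])                  # selects action 1
```

The backward accumulation uses the recursion $g_t = r_t + \gamma g_{t+1}$, which is the same structure that the Bellman expansion exploits.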
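In a low-dimensional state space, the $Q$ function can be easily solved with a table. However, the table technique is inadequate when the state space is continuous. To solve this problem, $Q$-learning algorithms are used. In these algorithms, the tuple $(s, a, r, s')$ obtained from equation (11) is saved in the experience replay memory $M$. The agent draws a minibatch $B$ from $M$ and performs gradient descent on these data according to the following equation:
$$L(\theta) = \sum_{(s, a, r, s') \in B} \bigl(y - Q(s, a; \theta)\bigr)^{2}, \quad (12)$$
where $y$ is an estimate of the $Q$ function expressed as follows [27]:
$$y = \begin{cases} r, & \mathit{terminal} = \text{True}, \\ r + \gamma \max_{a'} Q(s', a'; \theta), & \mathit{terminal} = \text{False}, \end{cases} \quad (13)$$
where $s'$ is the state following $s$, and $a'$ is the action performed in $s'$; $\mathit{terminal}$ indicates whether the agent has made a wrong classification for the minority class. Finally, the policy weights can be updated as follows:
$$\theta \leftarrow \theta - \alpha \nabla_{\theta} L(\theta), \quad (14)$$
where $\alpha$ is the learning rate.
In conclusion, the optimal $Q^{*}$ function can be achieved by minimizing the loss function presented in equation (12). Notably, the optimal policy $\pi^{*}$ is obtained using $Q^{*}$, which is the optimal model for the proposed classifier.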
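The target estimate and the minibatch loss can be illustrated with a tabular stand-in for the network. In this sketch the `q` argument is any function that maps a state to a list of action values; the state names, the stored transitions, and the summed squared-error form of the loss are assumptions made for the example, whereas the paper uses a CNN for this role.

```python
def td_target(r, next_q_values, terminal, gamma):
    """Target y: the reward alone at a terminal state, otherwise bootstrapped."""
    return r if terminal else r + gamma * max(next_q_values)

def minibatch_loss(batch, q, gamma):
    """Sum of squared errors between targets and current action-value estimates."""
    loss = 0.0
    for s, a, r, s_next, terminal in batch:
        y = td_target(r, q(s_next), terminal, gamma)
        loss += (y - q(s)[a]) ** 2
    return loss

# Hypothetical action values for two states and two actions.
q_table = {"s0": [0.0, 1.0], "s1": [2.0, 0.0]}
batch = [
    ("s0", 1, 0.5, "s1", False),   # y = 0.5 + 0.5 * 2.0 = 1.5, error 0.5
    ("s1", 0, -1.0, "s0", True),   # y = -1.0, error -3.0
]
loss = minibatch_loss(batch, q_table.__getitem__, gamma=0.5)  # 0.25 + 9.0
```

In a deep model the same quantity would be differentiated with respect to the network weights; here it is only evaluated.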
4.1. Overall Algorithm
We devised the simulation environment according to the above formulation. The structure of the policy network depends on the complexity and number of training samples. According to the structure of the training samples and the output, the number of network outputs equals the number of data classes, which is 2. The general training algorithm of the RLMDPA model is displayed in Algorithm 2. In this algorithm, the policy weights are first initialized using the ABC algorithm, and then the agent continues the training process until an optimal policy is reached. Action is based on a greedy policy, which is also evaluated by Algorithm 3. The algorithm is repeated 18,000 times in this paper. At each step, the policy network weights are stored.


5. Empirical Evaluation
5.1. Dataset
Cardiac magnetic resonance imaging (CMR) [52] allows for comprehensive anatomical and functional evaluation of the heart as well as detailed tissue characterization [53]. It is the preeminent imaging modality for the noninvasive diagnosis of myocarditis without biopsy. The Lake Louise criteria (LLC) [54] introduced benchmark criteria for diagnosing myocarditis using CMR [55] based on the presence of myocardial necrosis, edema, and hyperemia. The presence of late gadolinium enhancement confirms myocardial necrotic damage. T2-weighted images uncover areas of interstitial edema, which indicates an inflammatory response. T1-weighted images before and after contrast can depict hyperemia in the myocardial tissue. Fulfilling two of the three LLC criteria confers 80% accuracy for diagnosing myocarditis [56]. This article presents a model for identifying myocarditis by considering the three LLC criteria.
A one-year CMR research project on myocarditis was conducted from September 2016 at Omid Hospital in Tehran, Iran, where we performed CMR on patients who were clinically suspected to have myocarditis (e.g., chest pain, elevated troponin, negative functional imaging and/or coronary angiographic findings, and suspected viral etiology) and for whom the treating physician assessed that CMR would likely affect clinical management (e.g., ongoing symptoms, ongoing myocardial injury evidenced by persistent ECG abnormalities, and presence of ventricular dysfunction). The protocol had been approved by the local ethics committee. CMR examination was performed on a 1.5-Tesla system [57]. All cases were scanned with body coils in the standard supine position. T1-weighted images were acquired in the axial views. Shortly after gadolinium injection, the T1-weighted sequences were repeated. After approximately 10–15 minutes, late gadolinium enhancement [58] sequences were performed in standard left ventricular short- and long-axis views. Table 1 summarizes the CMR sequence parameters [3].
A total of 586 patients were identified who had positive evidence of myocarditis on the CMR images, which might show one or more areas of disease. A total of 307 healthy subjects were included as controls. We chose eight CMR images from each patient or control subject for the analysis, which were one long-axis image and one short-axis image acquired using each of the following four CMR sequences: late gadolinium enhancement, perfusion, T2-weighted, and steady-state free precession. The final CMR dataset comprises 4,686 and 2,449 samples from sick (i.e., myocarditis) and healthy subjects, respectively. Figure 3 shows example images obtained from this dataset. It may be noted that in this study, analysis is performed at the image level, and not at the patient level. In other words, prediction is based on a single image regardless of how many images are available for each patient.
Institutional approval was granted to use the patient datasets in research studies for diagnostic and therapeutic purposes. Approval was granted on the grounds of existing datasets. Informed consent was received from all of the patients in this study. All methods were carried out in accordance with relevant guidelines and regulations. Ethical approval for using these data was obtained from the Tehran Omid Hospital.
5.2. Metrics
To evaluate the classification performance of the proposed model, we used six standard performance metrics, namely, accuracy, recall, precision, F-measure, specificity, and G-means [59], defined as follows:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \text{Recall} = \frac{TP}{TP + FN}, \qquad \text{Precision} = \frac{TP}{TP + FP},$$
$$\text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}, \qquad \text{Specificity} = \frac{TN}{TN + FP}, \qquad \text{G-means} = \sqrt{\text{Recall} \times \text{Specificity}},$$
where TP, TN, FN, and FP are true positive, true negative, false negative, and false positive, respectively. The F-measure and G-means are commonly applied to evaluate imbalanced classification [27], which aligns with our dataset sample distribution and with the motivation for our proposed method. In addition, it is noteworthy that our prediction is per image. In this way, the intelligent myocarditis classification system can effectively screen entire CMR studies and flag individual images for scrutiny by physician readers. For this purpose, low FP and high recall metrics would be desired.
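As a worked illustration, the six metrics can be computed from the four confusion-matrix counts using their standard definitions. The function name `classification_metrics` and the counts below are made up for the example and are not results from this study.

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Compute six standard metrics from confusion-matrix counts."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "recall": recall,
        "precision": precision,
        "f_measure": 2 * precision * recall / (precision + recall),
        "specificity": specificity,
        "g_means": math.sqrt(recall * specificity),
    }

# Hypothetical counts: 100 positive and 100 negative images.
m = classification_metrics(tp=80, tn=90, fp=10, fn=20)
```

G-means multiplies the per-class rates, so it drops sharply if either class is poorly recognized, which is why it suits imbalanced data better than accuracy alone.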
5.3. Details of Model
This work used Python and the PyTorch framework. The code is written in a Jupyter notebook. We used five two-dimensional convolution layers for the CNN with 128, 64, 32, 16, and 8 filters. The kernel size, stride, and padding in each layer are 3, 2, and 1 for both dimensions, respectively. Each convolution layer is followed by a max-pooling layer with dimensions of 2 × 2. The three fully connected layers have 128, 64, and 32 hidden units, respectively. To prevent overfitting, dropout with a probability of 0.4 and early stopping are employed. In every experiment, the batch size is set to 64. The images in the dataset are in grayscale, and the intensities of the image pixels are mapped to the range [0, 1]. The images in the dataset come in different sizes and are all resized to 100 × 100 for analysis.
5.4. Experimental Results
While standard techniques like data augmentation and weighted loss function [60] can sometimes be used to correct the imbalanced data distributions, they are not applicable in all situations. In our experiments, data augmentation and weighted loss function do not enrich our model, which is not unexpected.
We used $k$-fold cross-validation ($k = 5$, or 5-CV) in all our implementations. The entire dataset is divided into $k$ subsets; $k - 1$ subsets are used for training and the remaining one for testing. This procedure is iterated $k$ times until every data subset has been used exactly four times for training and once for testing. All parameters are expressed as means, standard deviations, medians, minimums, and maximums. First, we compared our proposed method with the only published work in this field, CNN-KCL [3]. Next, to investigate the contributions of the two distinct components, ABC and RL, in our model, we compared the performance of a basic model without ABC and RL, that is, CNN + random weight, versus the models CNN + ABC and CNN + RL, which used ABC and RL for training, respectively. The evaluation results of our RLMDPA model as well as the aforementioned comparisons on the Z-Alizadeh Sani myocarditis dataset are presented in Tables 2 and 3. In general, the RLMDPA model reduces the error by more than 43%. Judging by the means of all the performance metrics, the RLMDPA model outperforms the CNN-KCL method as well as the CNN + random weight, CNN + ABC, and CNN + RL combinations of its components. Both ABC and RL individually improve on the basic CNN across all assessed performance metrics, which supports the combined use of weight initialization and reinforcement learning. For better visualization, the results are illustrated in Figure 4. In terms of time, the best model was obtained after 100 iterations in 2 hours, while CNN-KCL reached its best after 350 iterations in 5 hours.
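The splitting scheme can be sketched as a small generator over sample indices. The function name `k_fold_indices`, the seed, and the toy sample count are hypothetical; the paper does not state how its folds were drawn, so the shuffling used here is an assumption.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists so that each sample is tested exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)           # assumed: random assignment to folds
    folds = [idx[i::k] for i in range(k)]      # k nearly equal subsets
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

splits = list(k_fold_indices(10, k=5))  # toy example with 10 samples
```

Since the analysis in this study is per image, a split of this kind operates on images; keeping all images of one patient in the same fold would require grouping the indices by patient first.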
Standard machine learning classifiers have not been successful in classifying medical images because they typically treat images as one-dimensional vectors, which separates the neighboring pixels of a given pixel. For comparison with our deep model, we used five algorithms, namely, support vector machine (SVM) [61], k-nearest neighbor [62], naïve Bayes [63], logistic regression [64], and random forests [65], to classify the CMR images of the study dataset. SVM performed the best among these methods but is still inferior to the deep models. The results are summarized in Tables 4 and 5, and the mean performance metrics are shown in Figure 5.
5.5. Investigation of Other Metaheuristic Algorithms on the Algorithm
The proposed model employs the ABC algorithm to obtain the initial weight values, in conjunction with backpropagation. To compare the performance of ABC against alternative training algorithms, we replaced ABC in our model with five conventional algorithms, namely, gradient descent with momentum backpropagation (GDM) [66], gradient descent with adaptive learning rate backpropagation (GDA) [67], gradient descent with momentum and adaptive learning rate backpropagation (GDMA) [68], one-step secant backpropagation (OSS) [69], and Bayesian regularization backpropagation (BR) [70], and four metaheuristic algorithms, namely, gray wolf optimization (GWO) [71], the bat algorithm (BA) [72], the cuckoo optimization algorithm (COA) [73], and the whale optimization algorithm (WOA) [74]. The population size and number of function evaluations are 100 and 25,000 for all metaheuristic algorithms, respectively. Other parameter settings can be seen in Table 6. The performance metrics of these comparisons are summarized in Tables 7 and 8 and illustrated in Figure 6. In general, metaheuristic algorithms are better than conventional algorithms, with the exception of GDMA, in terms of accuracy, recall, and F-measure scores. Importantly, the ABC algorithm outperformed all conventional and metaheuristic algorithms, reducing the error in the recall and F-measure criteria by more than 25% and 22%, respectively.
5.6. Explore the Reward Function
The reward function is a practical device that helps the agent to achieve the goal. In this work, the minority class reward is $\pm 1$, while the majority class reward is $\pm\lambda$. To examine the effect of the value of $\lambda$ on the classification model, we tested 10 values of $\lambda$ on the model. Details of the results for all the criteria for these experiments are given in Table 9. For better visualization, we have plotted the trends in Figure 7. On examination, for the accuracy criterion, the chart has an ascending trend when $\lambda$ takes values in [0, 0.3] and a descending trend in [0.3, 1]. This pattern holds for all criteria. If $\lambda = 0$, the importance of the majority class is disregarded, and if $\lambda = 1$, the importance of both classes is the same. Although the minority class is more important to us, the majority class cannot be ignored.
6. Conclusion and Future Directions
This article presents a new model for classifying myocarditis images. The proposed model consists of two steps. First, the model weights are initialized using the ABC algorithm. Next, the model is considered an ICMDP problem. The environment assigns a high reward to the minority class and a low reward to the majority class. The algorithm terminates when the agent makes a wrong classification for the minority class, or the number of episodes runs out. We performed several experiments to examine various factors that affect the performance of the proposed model. The designed experiments confirmed that the RLMDPA model with ABC and RL is an effective classifier for myocarditis images.
In the future, we will try to employ an ensemble convolutional neural network (ECNN) as our model, which uses a set of CNNs and connects them to yield higher performance. In addition, we can also work with the generative adversarial network (GAN), which is widely used in many applications. It may also be worth exploring the developed model for other medical applications such as stroke detection, cancer detection, and plaque detection.
Data Availability
The dataset used to support the findings of this study is available on GitHub: https://github.com/vahidmoravvej/ZAlizadehSanimyocarditisdataset.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Authors’ Contributions
Seyed Vahid Moravvej, Roohallah Alizadehsani, and RuSan Tan contributed to prepare the first draft. Nahrizul Adib Kadri, Muhammad Mokhzaini Azizan, and U. Rajendra Acharya contributed to editing the final draft. Sadia Khanam and Zahra Sobhaninia contributed to all analysis of the data and produced the results accordingly. Afshin Shoeibi and Fahime Khozeimeh searched for papers and then extracted data. Zahra Alizadeh Sani, N. Arunkumar, Abbas Khosravi, and Saeid Nahavandi provided overall guidance and managed the project.
Acknowledgments
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.