Abstract

Heart failure (HF) is considered one of the deadliest diseases worldwide. Consequently, many intelligent medical decision support systems have been proposed in the literature for the detection of HF. However, the low accuracy achieved on HF data is a major shortcoming of these systems. To improve prediction accuracy, we have developed a feature-driven decision support system consisting of two main stages. In the first stage, a χ² statistical model is used to rank the 13 commonly used HF features. Based on the χ² test scores, an optimal subset of features is searched using a forward best-first search strategy. In the second stage, a Gaussian Naive Bayes (GNB) classifier is used as the predictive model. The performance of the newly proposed method (χ²-GNB) is evaluated using an online heart disease database of 297 subjects. Experimental results show that our proposed method achieves a prediction accuracy of 93.33%. The developed method (i.e., χ²-GNB) improves the HF prediction performance of the GNB model by 3.33%. Moreover, the newly proposed method also outperforms previously reported methods, whose accuracies range from 57.85% to 92.22%.

1. Introduction

Heart failure (HF) is a condition whereby the heart is unable to supply enough blood to satisfy the body’s requirements. The coronary arteries, an integral part of the heart, are responsible for supplying blood to the heart. Coronary artery disease (narrowed or blocked arteries) is the most prevalent type of heart disease and the most common cause of HF [1].

Many risk factors can lead to HF. These factors fall into two categories: the first consists of risk factors that cannot be altered, e.g., the patient’s sex, age, and family history. The second, which can be altered, consists of factors attributable to the patient’s lifestyle, for instance, smoking, high cholesterol, high blood pressure, and physical inactivity [2]. In addition, prevalent HF symptoms include dyspnea (shortness of breath), edema (swollen feet), fatigue, and weakness.

With so many factors to analyze, HF management becomes very complicated, particularly in nations that lack appropriate diagnostic instruments and medical experts [3, 4]. Furthermore, health practitioners recommend different tests to diagnose HF disease, including the electrocardiogram (ECG), nuclear scan, angiography, and echocardiogram [5]. Among these tests, the ECG is a noninvasive technique [6, 7]. However, it is not always sufficient on its own, as symptoms of HF may remain undiagnosed [5]. This motivates angiography, a diagnostic procedure used to confirm cases of heart disease. Angiography is considered the finest approach for diagnosing HF, but it has drawbacks such as side effects, high cost, and the requirement of a high level of technical expertise [8, 9]. Thus, alternative modalities are needed that can overcome these problems. It is therefore necessary to develop an efficient, intelligent medical decision support system based on the principles of data mining and machine learning.

In the literature, various decision support systems based on the support vector machine (SVM), decision tree, k-nearest neighbor (KNN), fuzzy logic based algorithms, artificial neural network (ANN), and ensembles of ANNs have been suggested for the prediction of HF disease [1, 2, 10–17]. Robert Detrano, who gathered HF-related information for the Cleveland heart disease data, used logistic regression for HF risk assessment and achieved a classification accuracy of 77%. Newton Cheung evaluated various classifiers including the C4.5, Naive Bayes, BNND, and BNNF algorithms and achieved HF risk prediction accuracies of 81.11%, 81.48%, 81.11%, and 80.95%, respectively. Polat et al. [18] used a decision support system based on an artificial immune system (AIS) and achieved a classification accuracy of 84.5%. Özşen and Güneş [19] developed a modified AIS and obtained an HF risk prediction accuracy of 87.43%. A neural network ensemble model was proposed by Das et al. [2] to enhance classification accuracy and achieved an HF prediction accuracy of 89.01%. Samuel et al. [20] proposed an embedded decision support scheme based on ANN and Fuzzy AHP and obtained an HF risk classification accuracy of 91.10%. Ali et al. proposed a machine learning method that stacks and optimizes two SVMs and obtained an HF prediction accuracy of 92.22% [21]. Paul et al. recently developed an adaptive weighted fuzzy system ensemble model that achieved an HF prediction accuracy of 92.31%.

Inspired by the various decision support systems proposed earlier and discussed above, we attempted to develop a new decision support system for HF risk prediction with the aim of improving classification accuracy and reducing computational cost and complexity. The decision support system developed in this study is named χ²-GNB. The χ²-GNB model uses a χ² statistical model to rank features according to their χ² test scores. To obtain an optimal number of the ranked features, we exploit a forward best-first search approach. The performance of each generated subset of features is evaluated using the GNB model, which serves as the machine learning classifier. It is worth noting that the proposed χ²-GNB model utilizes a simple predictive model yet exhibits better performance than more complex predictive models such as ANNs and even ensembles of ANNs. To the best of our knowledge, no prior research has addressed the hybridization of the GNB model, from the family of Naive Bayes classifiers, with the χ² model for the detection of HF disease. Compared with other techniques in the literature, the experimental findings of the proposed method are promising in terms of HF risk prediction accuracy.

The remainder of the manuscript is organized as follows: Section 2 describes the materials (dataset) and the proposed methods. Section 3 discusses the metrics used for evaluation and validation. Section 4 presents the experimental findings and discussion. Section 5 concludes the paper.

2. Materials and Methods

2.1. Dataset Description

The Cleveland heart disease dataset used for the experiments in this paper was obtained from the University of California, Irvine (UCI) online machine learning repository. Of the 303 samples in the dataset, 297 have no missing values, whereas 6 have missing feature values. In its original form, the data have 76 features; however, all published studies refer to only 13 of them. These 13 commonly used features are tabulated in Table 1.
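For reproducibility, the data can be loaded as in the following sketch, assuming the standard processed.cleveland.data file from the UCI repository; the column names correspond to the 13 attributes of Table 1 plus the target, and '?' marks the missing values.

```python
# A minimal sketch of loading the Cleveland data; file name and column
# names follow the UCI repository's documented format.
import pandas as pd

url = ("https://archive.ics.uci.edu/ml/machine-learning-databases/"
       "heart-disease/processed.cleveland.data")
cols = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
        "thalach", "exang", "oldpeak", "slope", "ca", "thal", "target"]
df = pd.read_csv(url, names=cols, na_values="?")
df = df.dropna()                                 # keeps the 297 complete samples
df["target"] = (df["target"] > 0).astype(int)    # binarize: 0 healthy, 1 patient
```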

2.2. Proposed Method

The proposed method, i.e., χ²-GNB, comprises two stages. The first stage ranks features using the χ² statistical model. Between each (non-negative) feature and the class label θ, χ² statistics are calculated; that is, the model performs a χ² test, which measures the dependence between each feature and the class. This approach therefore identifies features that are most probably independent of the class and hence irrelevant for classification. The feature selection process itself is performed in two phases. Phase one ranks the features based on the χ² test score, while phase two searches for an optimal subset of features among the ranked features. It is worth mentioning that feature ranking uses the training data only, i.e., the testing data are set aside to avoid bias. Prior to ranking and selection, data partitioning is performed. The features are ranked and selected on the basis of the training data, and the same features are then selected from the testing data during the validation or testing phase. Feature ranking is based on the χ² test, which is expressed as follows.

For a binary classification problem containing τ instances, with a positive and a negative class (two classes), we can construct Table 2 to compute the χ² test score.

Similarly, the expected frequency of each cell of Table 2 can be computed as

$$E_{ij} = \frac{R_i \times C_j}{\tau}, \quad (1)$$

where $R_i$ and $C_j$ denote the totals of row $i$ and column $j$ of Table 2. According to the general χ² test formulation, we have

$$\chi^2 = \sum_{i}\sum_{j}\frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \quad (2)$$

where $O_{ij}$ denotes the observed frequency of cell $(i, j)$.
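As an illustration of this computation, the following snippet evaluates the χ² statistic for a 2×2 contingency table of the kind shown in Table 2 using SciPy; the counts are made up for the example.

```python
# Illustrative chi-squared computation for a 2x2 contingency table.
# Rows: feature value present/absent; columns: positive/negative class.
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[30, 10],
                     [20, 40]])
# correction=False matches the plain chi-squared formula in equation (2)
chi2_stat, p_value, dof, expected = chi2_contingency(observed, correction=False)
print(chi2_stat, p_value)  # a larger statistic indicates stronger class dependence
```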

Readers can refer to [22] for more insight into the use of χ² statistics for selecting and discretizing features. After ranking the features by the above χ² test score, we search for an optimal subset of features. This is done by exploiting the forward best-first search algorithm. That is, we first select only the feature with the highest χ² test score and use the GNB predictive model to check its performance. In the next iteration, we add the next feature, according to the χ² test score, to the subset and once more check the performance of the GNB model on the subset. The same process is repeated until the constructed subset reaches the full set of features. Finally, the subset showing the best performance is selected as the optimal subset of features and given to GNB to generate the final results, as sketched below.
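The following is a minimal sketch of the two phases under the stated assumptions: χ² ranking computed on training data, then a forward search evaluated with GNB on a held-out validation split. Function names such as `rank_features` and `forward_search` are illustrative, not taken from our implementation.

```python
import numpy as np
from sklearn.feature_selection import chi2
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import GaussianNB

def rank_features(X_train, y_train):
    """Rank features by chi-squared score, highest first (non-negative X only)."""
    scores, _ = chi2(X_train, y_train)
    return np.argsort(scores)[::-1]

def forward_search(X_train, y_train, X_val, y_val, ranked_idx):
    """Grow the subset in rank order; keep the subset with the best accuracy."""
    best_acc, best_subset = 0.0, ranked_idx[:1]
    for k in range(1, len(ranked_idx) + 1):
        subset = ranked_idx[:k]                      # top-k ranked features
        model = GaussianNB().fit(X_train[:, subset], y_train)
        acc = accuracy_score(y_val, model.predict(X_val[:, subset]))
        if acc > best_acc:
            best_acc, best_subset = acc, subset
    return best_subset, best_acc
```

The formulation of the GNB model used as the evaluator in this search is as follows.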

Naive Bayes (NB) is a set of supervised predictive models known for their simplicity and effectiveness. These models learn the probability that an object with certain features belongs to a particular class or group, i.e., they form a family of probabilistic predictive models. They are called “naive” because they rely on the naive independence assumption, i.e., they assume that the occurrence of a certain feature is independent of the occurrence of other features. An NB model is based on Bayes theorem, i.e., it evaluates the probability that a given instance belongs to a certain class. Given an instance X with features $x_1, \ldots, x_n$ and a class label θ, Bayes theorem states

$$P(\theta \mid x_1, \ldots, x_n) = \frac{P(\theta)\, P(x_1, \ldots, x_n \mid \theta)}{P(x_1, \ldots, x_n)}. \quad (3)$$

According to the naive independence assumption, we have

$$P(x_i \mid \theta, x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n) = P(x_i \mid \theta). \quad (4)$$

For all $i$, equation (3) then takes the form

$$P(\theta \mid x_1, \ldots, x_n) = \frac{P(\theta) \prod_{i=1}^{n} P(x_i \mid \theta)}{P(x_1, \ldots, x_n)}. \quad (5)$$

Since $P(x_1, \ldots, x_n)$ is constant for a given instance, we can use the following classification rule:

$$\hat{y} = \arg\max_{\theta} P(\theta) \prod_{i=1}^{n} P(x_i \mid \theta). \quad (6)$$

For the estimation of the parameters of the NB model, i.e., $P(\theta)$ and $P(x_i \mid \theta)$, maximum a posteriori (MAP) estimation is commonly used. The main idea is the same for the different Naive Bayes models; however, they differ in their assumptions regarding the distribution of $P(x_i \mid \theta)$. In the case of the GNB model, the likelihood of the features is assumed to be Gaussian:

$$P(x_i \mid \theta) = \frac{1}{\sqrt{2\pi\sigma_{\theta}^{2}}} \exp\!\left(-\frac{(x_i - \mu_{\theta})^{2}}{2\sigma_{\theta}^{2}}\right), \quad (7)$$

where the parameters $\mu_{\theta}$ and $\sigma_{\theta}$ are estimated using maximum likelihood. In this paper, the performance of GNB is evaluated on the HF disease dataset, and to further enhance it, the χ² model is hybridized with GNB.
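To make the formulation concrete, here is a minimal sketch of the Gaussian likelihood in equation (7) and the MAP classification rule in equation (6); `priors`, `mus`, and `sigmas` stand for the per-class estimates and are illustrative names.

```python
import numpy as np

def gaussian_likelihood(x, mu, sigma):
    """P(x_i | theta) under the Gaussian assumption of equation (7)."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

def predict(x, priors, mus, sigmas):
    """MAP rule of equation (6): argmax of log prior plus summed log likelihoods."""
    log_post = [np.log(priors[c]) + np.sum(np.log(gaussian_likelihood(x, mus[c], sigmas[c])))
                for c in range(len(priors))]
    return int(np.argmax(log_post))
```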

3. Evaluation Metrics and Validation Methods

3.1. Validation Methods

In this section, a binary classification problem is considered with two diagnosis classes, i.e., healthy subjects and patients prone to potential HF disease. Different studies have been conducted on the Cleveland heart disease dataset, and methods achieving accuracies between 57.85% and 92.22% have been reported in the literature. Most of these studies, e.g., [2, 23, 24], used holdout validation with a 70–30 split, i.e., 70% of the dataset is used to train the proposed model and the remaining 30% is used for testing. The same data partitioning methodology is adopted in this paper.
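A sketch of this 70–30 partitioning using scikit-learn, assuming `X` and `y` hold the 297 samples and their labels, is shown below; whether the split is stratified by class is our assumption, not something the cited studies state.

```python
from sklearn.model_selection import train_test_split

# 70-30 holdout split; stratify=y (keeping class proportions) is an assumption
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42)
```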

3.2. Evaluation Metrics

The robustness of the proposed model is evaluated using several metrics: accuracy, specificity, sensitivity, and the Matthews correlation coefficient (MCC). Accuracy is the percentage of subjects classified correctly in the training or testing dataset; sensitivity measures the proportion of patients classified correctly, whereas specificity measures the proportion of correctly classified healthy subjects. Denoting the counts of true positives, false positives, true negatives, and false negatives by TP, FP, TN, and FN, respectively, these metrics are defined as

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \quad (8)$$

$$\text{Sensitivity} = \frac{TP}{TP + FN}, \quad (9)$$

$$\text{Specificity} = \frac{TN}{TN + FP}, \quad (10)$$

$$\text{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}. \quad (11)$$

MCC is a measure of the quality of binary classification in machine learning. It can take any value between −1 and 1, where −1 indicates total disagreement between prediction and observation, 1 indicates a perfect prediction, and 0 indicates a classification no better than random prediction.
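The four metrics can be computed directly from the confusion-matrix counts, as in the following sketch (not the authors' code); the example call uses the testing-data counts reported later in Section 4.1.

```python
import numpy as np

def binary_metrics(tp, fp, tn, fn):
    """Equations (8)-(11) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # fraction of patients classified correctly
    specificity = tn / (tn + fp)   # fraction of healthy subjects classified correctly
    mcc = (tp * tn - fp * fn) / np.sqrt(
        float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return accuracy, sensitivity, specificity, mcc

# Testing-data counts from Section 4.1: 36 of 41 patients and 48 of 49
# healthy subjects are classified correctly.
print(binary_metrics(tp=36, fp=1, tn=48, fn=5))  # accuracy = 84/90 ~ 0.9333
```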

4. Experimental Results and Discussion

4.1. Experiment No. 1: Performance of the Proposed χ²-GNB Model

In this section, the experimental results of the proposed χ²-GNB are reported and discussed. After feature ranking by the χ² test, we obtained different subsets of features. These subsets are applied sequentially to the GNB predictive model for classification. The best accuracy of 93.33% is achieved on the testing data with the subset of size n = 9, i.e., 9 features. As can be seen in Table 3, a training accuracy of 84.05% is achieved for the optimal subset of features, while the sensitivity and specificity are 87.80% and 97.95%, respectively. From a machine learning perspective, the optimal subset of features not only enhances the performance of the GNB model but also decreases its complexity, thereby reducing its training time. Moreover, since the forward best-first search adds features in order of their χ² scores, the subset with n = 9 consists of the nine highest-ranked features. The classification results achieved for the different subsets of features are reported in Table 3. The last row of the table represents the case in which no feature selection is performed, i.e., all 13 features are applied to the GNB model, and a prediction accuracy of 90% is achieved. Hence, the developed χ²-GNB model enhances the performance of the GNB model by 3.33%.

To summarize the correctly classified healthy subjects and patients and the misclassification rate, confusion matrices are drawn in Figures 1 and 2 for the training and testing data, respectively. The horizontal class labels denote the predicted labels, while the vertical labels represent the true labels. The testing data contain 90 samples, of which 41 are patients and 49 are healthy subjects. From the confusion matrix, the proposed model correctly detects 36 patients, yielding a sensitivity of 87.80%, in accordance with the sensitivity reported in Table 3. Similarly, out of 49 healthy subjects, the proposed method correctly classifies 48, yielding a specificity of 97.95%.

The training data consist of 207 samples, of which 96 are patients and the remaining 111 are healthy subjects. The proposed model correctly classifies 76 patients, while 20 patients are misclassified. On the other hand, 98 healthy subjects are correctly classified, while 13 are misclassified.

4.2. Experiment No. 2: Comparative Study of the Proposed χ²-GNB Model with Other State-of-the-Art Ensemble Models and Support Vector Machines

To demonstrate the effectiveness of the χ²-GNB method, we compare it with other existing machine learning models. These models comprise the support vector machine (SVM) with RBF and linear kernels and three ensemble models: extra trees (often referred to as randomized decision trees), Adaboost, and random forest (RF). Using a grid search algorithm, the optimal values of the hyperparameters of these models are determined, as sketched below. Table 4 presents the performance of each optimized model.
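As an illustration of this tuning step, the following sketch runs a grid search for one of the comparison models (random forest); the parameter grid and cross-validation settings are our assumptions, not the exact values used in the experiments.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [10, 50, 100, 200]},  # illustrative grid
    scoring="accuracy",
    cv=5)
grid.fit(X_train, y_train)
print(grid.best_params_, grid.best_score_)
```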

In Table 4, the hyperparameter reported for the RF model is the number of trees in the forest. In the case of Adaboost, the hyperparameter is the maximum number of estimators at which boosting is terminated. Extra trees, an ensemble model that uses averaging to enhance prediction accuracy, likewise takes the number of randomized decision trees as its hyperparameter. For the SVM, the soft-margin constant is denoted by C and the Gaussian kernel width by G. Lastly, for the χ²-GNB, the table reports n, the size of the optimal subset of features selected by the χ² statistical model. The table clearly shows that, with respect to performance, the proposed approach outperforms the existing models (i.e., the SVM with linear and RBF kernels and the ensemble models).

Further, two more metrics, the receiver operating characteristic (ROC) curve and the area under the curve (AUC), are used to investigate the effectiveness of the χ²-GNB model. The ROC chart plots the true positive rate (TPR) against the false positive rate (FPR) at different thresholds; an ideal ROC curve is one that resides at the top-left corner of the graph. More precisely, the larger the area under the ROC curve, the better the model [25]. Figure 3 displays the ROC charts of the χ²-GNB model and the other machine learning models. As shown in Figure 3(a), the χ²-GNB model attains an AUC of 0.955. The optimized Adaboost model has an AUC of 0.925 (Figure 3(b)), whereas the optimized extra trees model attains an AUC of 0.929 (Figure 3(c)). Furthermore, the optimized random forest model attains an AUC of 0.935 (Figure 3(d)). For the SVM, the linear kernel attains an AUC of 0.949 and the RBF kernel an AUC of 0.936, as displayed in Figures 3(e) and 3(f), respectively. The ROC charts clearly show that the χ²-GNB approach outperforms the other optimized ensemble and SVM approaches.
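For completeness, an ROC/AUC evaluation of a fitted GNB model could be produced as in the sketch below, which assumes `model`, `best_subset`, `X_test`, and `y_test` from the earlier sketches.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score, roc_curve

probs = model.predict_proba(X_test[:, best_subset])[:, 1]  # P(class = patient)
fpr, tpr, _ = roc_curve(y_test, probs)
print("AUC:", roc_auc_score(y_test, probs))

plt.plot(fpr, tpr)
plt.xlabel("False positive rate (FPR)")
plt.ylabel("True positive rate (TPR)")
plt.show()
```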

4.3. Comparison of the Proposed Method with Other Well-Known Machine Learning Methods Proposed Earlier

In this section, we compare our results in terms of HF disease detection accuracy with those of previously proposed methods in the literature. Table 5 briefly describes other methods developed for HF risk prediction and compares their accuracies with that of the model developed in this study. It can be seen that the newly developed model achieves better HF risk prediction accuracy than many of the previous methods.

5. Conclusion

In this paper, we proposed a feature-driven decision support system for HF disease prediction based on the χ² statistical model and GNB. It was shown that the newly developed model, i.e., χ²-GNB, enhanced the performance of the conventional GNB model. To evaluate the robustness of the χ²-GNB model, six evaluation metrics were used, i.e., accuracy, sensitivity, specificity, ROC, AUC, and MCC. It was observed that the proposed model improved the performance of the GNB model by 3.33%. Moreover, a comparative analysis based on prediction accuracy between the χ²-GNB model and other previously reported methods was carried out. Experimental results showed that the χ²-GNB model outperformed state-of-the-art ensemble models, support vector machines, and many previously proposed methods.

Data Availability

The data used in the paper are publicly available at the UCI machine learning repository.

Conflicts of Interest

The authors declare that they have no conflicts of interest.