Diagnosing heart disease is a difficult task, and researchers have designed various intelligent diagnostic systems to improve it. However, low prediction accuracy is still a problem in these systems. For better heart risk prediction accuracy, we propose a feature selection method that uses a floating window with adaptive size for feature elimination (FWAFE). After feature elimination, two kinds of classification frameworks are utilized, i.e., artificial neural network (ANN) and deep neural network (DNN). Thus, two types of hybrid diagnostic systems are proposed in this paper, i.e., FWAFE-ANN and FWAFE-DNN. Experiments are performed to assess the effectiveness of the proposed methods on a dataset collected from the Cleveland online heart disease database. The proposed methods are evaluated in terms of accuracy, sensitivity, specificity, Matthews correlation coefficient (MCC), and receiver operating characteristic (ROC) curve. Experimental outcomes confirm that the proposed models outperform eighteen previously proposed methods, which attained accuracies in the range of 50.00–91.83%. Moreover, the proposed models compare favorably with other state-of-the-art machine learning techniques for heart disease diagnosis. Furthermore, the proposed systems can help physicians make accurate decisions while diagnosing heart disease.

1. Introduction

The heart is a vital organ of the human body that is responsible for blood circulation and thereby for the supply of oxygen and energy to all organs of the body, including itself. Heart disease causes abnormal blood circulation in the body, which might be fatal. Hence, if the heart stops functioning normally, the whole body fails. Various risk factors that cause heart disease have been identified in the literature. They are classified into two major types: risk factors that can be altered, e.g., smoking and physical exercise, and risk factors that cannot be altered, e.g., gender, age, and the patient’s family history [1]. The diagnosis of heart disease through conventional medical methods is difficult, complex, time consuming, and costly. Therefore, the diagnosis of heart disease is worst in developing countries due to the lack of state-of-the-art examination tools and medical experts [2, 3]. Additionally, the invasive medical procedure for the examination of heart failure is based on various tests suggested by physicians after studying the medical history of the patient and analyzing the relevant symptoms [4]. Angiography is considered the gold standard among the medical tests for the diagnosis of heart failure, and heart disease cases are confirmed through it. However, angiography has side effects, is costly, and demands extraordinary technical expertise [5, 6]. Therefore, machine learning and data mining techniques are needed to design expert systems that resolve the problems of angiography.

To address the abovementioned problems, researchers have designed different noninvasive diagnosis systems by exploiting machine learning based predictive models. These models include logistic regression, naive Bayes, k-nearest neighbor (KNN), decision tree, support vector machine (SVM), artificial neural network (ANN), and ensembles of ANN for heart failure disease classification [1, 7–18]. Robert Detrano utilized logistic regression for heart failure risk prediction and attained a classification accuracy of 77%. Newton Cheung utilized various predictive models consisting of the C4.5, naive Bayes, BNND, and BNNF algorithms, whose accuracies were 81.11%, 81.48%, 81.11%, and 80.95%, respectively, for the classification of patients and healthy subjects. A. Khemphila and V. Boonjing proposed a classification technique for diagnosing heart disease based on a multilayer perceptron (MLP) trained with the backpropagation learning algorithm on biomedical test values, combined with a feature selection algorithm. Information gain is utilized to filter out the features that do not contribute to precise results, which reduces the thirteen features to eight. ANN is then used as the classifier. The accuracy on the training dataset was 89.56%, while an accuracy of 80.99% was reported on the validation data.

Recently, Paul et al. proposed a fuzzy decision support system (FDSS) to detect heart disease [19]. Their genetic algorithm based FDSS has five key components: preprocessing of the dataset, effective feature selection through diverse methods, weighted fuzzy rules that are set up through a genetic algorithm, construction of the FDSS from the generated fuzzy knowledge, and heart disease prediction. The proposed system obtained an accuracy of 80%. Verma et al. proposed a hybrid model for coronary artery disease (CAD) diagnosis [20]. The proposed method consists of risk factor identification using correlation-based feature subset (CFS) selection with a particle swarm optimization (PSO) search method and K-means clustering. Supervised learning algorithms such as multilayer perceptron (MLP), multinomial logistic regression (MLR), fuzzy unordered rule induction algorithm (FURIA), and C4.5 are then utilized to model CAD cases. The accuracy of the proposed approach was 88.4%, and it improved the accuracy of the classification techniques by 8.3% to 11.4% on the Cleveland dataset. Shah et al. proposed a technique based on feature extraction for reducing feature dimensions [21]. The approach uses probabilistic principal component analysis (PPCA) to extract projection dimensions that correspond to high covariance, which reduces the feature dimension. Parallel analysis (PA) helps in the selection of the projection vectors. The reduced feature vector is input to a radial basis function (RBF) kernel-based support vector machine (SVM), which classifies subjects into two categories: heart patient (HP) and normal subject (NS). The proposed model is tested against accuracy, specificity, and sensitivity on the datasets of UCI, i.e., Cleveland. The accuracy of the proposed model for Cleveland dataset was 82.18%, 85.82%, and 91.30%, respectively.

Most recently, Dwivedi tested the performance of different machine learning methods for the prediction of heart disease and reported the highest classification accuracy of 85% with logistic regression [22]. Amin et al. evaluated different data mining methods and identified the significant features for predicting heart disease [23]. Predictive models were built from different combinations of features and well-known classification methods, e.g., LR, SVM, and K-NN. The experimental results showed that the best classification accuracy achieved for heart disease prediction was 87.4%. Özşen and Güneş proposed an expert system developed from an artificial immune system (AIS) and achieved an accuracy of 87% [24]. Polat et al. developed another similar system and obtained 84.5% accuracy [25]. Das et al. utilized a neural network ensemble model with the purpose of improving classification accuracy. Their ensemble model obtained a classification accuracy of 89.01% [1]. Recently, Samuel et al. proposed a diagnostic system developed from ANN and Fuzzy AHP and reported a prediction accuracy of 91.10% [4].

As is clear from the literature survey, ANN-based diagnostic systems have shown better performance on heart disease data. This motivates us to design an expert diagnostic system based on neural networks for heart disease detection. The empirical results show that the proposed model achieves promising performance. Hence, it can be used in clinics to support accurate decisions while diagnosing heart failure.

2. Materials and Methods

In previous studies, researchers used feature sets without eliminating irrelevant or noisy features. In this study, we propose a novel feature elimination method that removes noisy or irrelevant features and thus selects an optimal subset of features before feeding them to the ANN or DNN. The proposed algorithm uses a window with adaptive size. The window size is initialized to one, and the window is placed at the first feature of the feature vector. The feature or features to which the window points are eliminated, while the remaining features constitute the subset of features that is supplied to the neural network for classification. To find the optimal configuration of the neural network for the subset of features, a grid search algorithm is used. It is noteworthy that previous studies utilized a conventional ANN with only one hidden layer for the heart failure detection problem. However, in this study, we found that deep neural networks with more than one hidden layer, trained with new learning algorithms, show better performance. Additionally, this study evaluates the feasibility of a feature selection algorithm at the input level of the DNN. The working of the proposed diagnostic system is shown in Algorithm 1 and Figure 1.

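Algorithm 1
Input: {features size n, hyperparameters} Output: {optimal subset of features, optimal hyperparameters}
(1) Set the window size to 1
(2) Best_Acc = 0
(4) for each window size
(5)   for each window position
(6)     Compute Acc using the remaining subset of features and the hyperparameters
(7)     if Acc > Best_Acc
          Begin if
            Best_Acc = Acc
            Note down the subset of features and the hyperparameters as optimal
          End if
(9)   END for
(10) END for
(11) Display Best_Acc, the optimal subset of features, and the optimal hyperparameters
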
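
The window search of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `fwafe_subsets`, `fwafe_search`, and `toy_evaluate` are hypothetical, features are indexed from 0, the window size is assumed to run from 1 to n − 1, and `toy_evaluate` is a made-up scoring function that stands in for training and testing a neural network.

```python
from itertools import product


def fwafe_subsets(n_features):
    """Yield (window_size, start, kept_indices) for every window placement."""
    for size in range(1, n_features):                # rounds: window size 1 .. n-1 (assumed)
        for start in range(n_features - size + 1):   # float the window from left to right
            removed = set(range(start, start + size))
            kept = [i for i in range(n_features) if i not in removed]
            yield size, start, kept


def fwafe_search(n_features, grid, evaluate):
    """Return (best accuracy, kept feature indices, hyperparameters)."""
    best_acc, best_subset, best_params = 0.0, None, None
    keys = list(grid)
    for _, _, kept in fwafe_subsets(n_features):
        # Grid search: try every hyperparameter combination for this subset.
        for values in product(*(grid[k] for k in keys)):
            params = dict(zip(keys, values))
            acc = evaluate(kept, params)
            if acc > best_acc:
                best_acc, best_subset, best_params = acc, kept, params
    return best_acc, best_subset, best_params


def toy_evaluate(kept, params):
    """Hypothetical stand-in for 'train the network and return test accuracy'."""
    removed = set(range(13)) - set(kept)
    noisy = {4, 5}                                   # made-up 'noisy' features (0-based)
    score = 0.90
    score += 0.02 * len(removed & noisy)             # reward dropping noisy features
    score -= 0.05 * len(removed - noisy)             # penalize dropping useful ones
    score += 0.01 if params["hidden"] == 50 else 0.0
    return score


best_acc, best_subset, best_params = fwafe_search(13, {"hidden": [10, 50]}, toy_evaluate)
```

With 13 features, this enumeration produces 13 + 12 + ... + 2 = 90 candidate subsets, each of which is evaluated once per hyperparameter combination, which illustrates why the search is expensive.
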
2.1. Dataset Description

For this research, the Cleveland heart disease database from the online machine learning and data mining repository of the University of California, Irvine (UCI), was used. Data were gathered from the V.A. Medical Center, Long Beach and Cleveland Clinic Foundation by Dr. Robert Detrano [26]. The dataset comprises 303 subjects, of which 6 have missing values. The remaining 297 subjects with complete data values are used for the experiments. Each subject in the dataset has 76 raw features. In previous work, researchers mostly used 13 prominent features out of the 76 raw features for the diagnosis of heart disease. Therefore, these 13 commonly used features are considered in this study; they are listed in Table 1.
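
The removal of subjects with missing values can be sketched as follows. This is an illustrative sketch only: it assumes the comma-separated layout of the processed Cleveland file in the UCI repository, where missing values are marked with `?` and the last column is the diagnosis label. The function name `load_complete_rows`, the three sample rows, and the binarization of the label (any nonzero class treated as disease) are assumptions made here for illustration.

```python
import csv
import io

# The 13 commonly used attributes, in the column order of the UCI file.
FEATURES = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
            "thalach", "exang", "oldpeak", "slope", "ca", "thal"]

# Made-up rows in the same layout; the second row has a missing 'ca' value.
SAMPLE = """\
60.0,1.0,4.0,130.0,250.0,0.0,2.0,140.0,1.0,1.5,2.0,1.0,7.0,2
52.0,0.0,3.0,120.0,210.0,0.0,0.0,165.0,0.0,0.0,1.0,?,3.0,0
45.0,1.0,2.0,118.0,199.0,0.0,0.0,172.0,0.0,0.2,1.0,0.0,3.0,0
"""


def load_complete_rows(text):
    """Parse rows and keep only subjects without missing ('?') values."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record or "?" in record:
            continue                              # skip blank or incomplete subjects
        *features, label = (float(v) for v in record)
        rows.append((features, int(label > 0)))   # assumed: nonzero label means disease
    return rows


complete = load_complete_rows(SAMPLE)
```

Applied to the full file, a filter of this kind would be expected to keep the 297 complete subjects described above.
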

2.2. The Proposed Method

The proposed diagnostic system has two main components that are hybridized as one black-box model. The main reason for hybridizing the two components into one block is that they work in connection with each other. The first component of the system is a feature selection module, while the second component is a predictive model. Feature selection methods use data mining concepts to improve the performance of machine learning models [27, 28]. The feature selection module uses a search strategy to find the optimal subset of features, which is applied to the DNN that acts as the predictive model. The feature selection module uses a window that scans the feature vector. The working of the proposed method is described in Algorithm 1.

Initially, the size of the window is set to 1, and the window is placed at the leftmost side of the feature vector of size n, i.e., having n features. Hence, initially, the feature on which the window is placed is eliminated from the feature vector, and the remaining features constitute the subset of features that is supplied to the DNN for classification. The performance of the subset of features is saved. In the next step, the window floats towards the right. Again, the feature on which the window is placed is eliminated, and the remaining features constitute the feature subset whose performance is checked by the DNN model. The same process is repeated until the window reaches the last feature, i.e., the nth feature. With this, the first round of window floating is completed. It is important to note that the feature subset size is n − 1 in the first round.

In the next round, the window size is updated to 2. Hence, in this round, the window points to two features at a time. Again, the window starts floating from the leftmost side of the feature vector and eliminates the first two features. The remaining features constitute the feature subset that is applied as the input to the DNN model for classification, and the results are compared with the best performance achieved on the previous subsets of features. If the performance is better than the previous best performance, the best performance and the optimal subset of features are updated. In the next iteration, the window floats towards the right, and the two features on which the window is placed are eliminated. The remaining features constitute the subset of features that is applied to the DNN. The same process is repeated until the window reaches the rightmost side of the feature vector. This marks the end of the second round. In the third round, the window size is set to 3, and the same process is repeated. Finally, in the (n − 1)th round, the window size is set to n − 1. In this round, the window can float just once towards the right, and then the whole process ends. The subset of features that gives the best results is declared the optimal subset of features. The whole process of feature selection through the adaptive floating window is illustrated in Figure 2.

Each time a subset of features is supplied to the DNN, the DNN architecture is optimized using the grid search algorithm. The performance of a DNN is highly dependent on its architecture [29]. An inappropriate DNN architecture will result in poor performance even if the DNN is supplied with an optimal subset of features. If the DNN architecture selected for classification has insufficient capacity, it will underfit [30, 31]. In such a case, the DNN will show poor performance on both the training data and the testing data. However, if the DNN architecture has excessive capacity, it will overfit the training data; thus, it will show better performance on the training data but poor performance on the testing data. Hence, we need to search for the optimal DNN architecture that shows good performance on both testing and training data. To understand the relationship between DNN architecture and DNN capacity, we need to understand the formulation of the DNN. The neural network is formulated as follows:

Neural networks are computational systems based on mathematical models that simulate the human brain. The key element in the neural network model is known as a perceptron or a node [32]. Nodes are arranged into groups called layers. Artificial neurons work on the same principle as biological neurons: an artificial neuron receives one or more inputs from the adjoining neurons, processes the information, and transfers the output to the next perceptron. Artificial neurons are connected through links known as weights. The input information is weighted either positively or negatively during the computation of the output. An internal threshold value and weights are assigned for the solution of the problem under consideration. On every node, the result is calculated by multiplying the input values by the associated weights and adjusting the sum by the threshold value. The output is then calculated through an activation function or transfer function and is given in the following equation:
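
A standard way to write this node computation is shown below. The symbols are assumptions introduced here for illustration: x_i denotes the i-th input, w_ij the weight of the link from input i to node j, θ_j the threshold of node j, f the activation function, and y_j the output of node j. The threshold is written as an added term; some texts subtract it instead, which only changes the sign convention of θ_j.

```latex
y_j = f\left( \sum_{i=1}^{n} w_{ij}\, x_i + \theta_j \right)
```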

The transfer function can be linear or nonlinear. In the nonlinear case, a hyperbolic tangent or radial basis form may be applied. The sigmoid function is applied at the following layer to produce the output value (equation (2)). A parameter of the sigmoid function is related to its shape: increasing the parameter value strengthens the nonlinearity of the sigmoid function.
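
The node computation with a sigmoid transfer function can be sketched as follows. This is a minimal sketch under assumptions: the function names are hypothetical, the shape parameter is called `slope` here and is assumed to multiply the input of the sigmoid, and the threshold is assumed to be added to the weighted sum.

```python
import math


def sigmoid(x, slope=1.0):
    """Logistic sigmoid; a larger slope gives a steeper, more nonlinear curve."""
    return 1.0 / (1.0 + math.exp(-slope * x))


def neuron_output(inputs, weights, threshold, slope=1.0):
    """Weighted sum of the inputs, shifted by the threshold, passed through the sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + threshold
    return sigmoid(z, slope)


# A weighted sum of exactly zero maps to the midpoint of the sigmoid.
midpoint = neuron_output([1.0, 2.0], [0.5, -0.25], threshold=0.0)
```

For very large negative arguments `math.exp` can overflow, so a production implementation would typically clip the input or use a numerically stable formulation.
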

The neural network is obtained by connecting the artificial neurons. If the constructed neural network model has only one hidden layer, we name it ANN [17]. However, if the constructed neural network model has more than one hidden layer, we name it DNN [17].

3. Validation Scheme and Evaluation Metrics

3.1. Validation Scheme

In earlier works, the performance of expert diagnosis systems has been evaluated through holdout validation schemes, in which the dataset is partitioned into two parts: one for training and the other for testing. Researchers have used various train-test split percentages. Das et al. [1] and Paul et al. [33] partitioned the dataset in a 70%–30% ratio, where 70% of the dataset is used for training the predictive model and 30% is used for testing its performance. Therefore, we also use the same data partition for training and testing.
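
A 70%–30% holdout split can be sketched as follows. This is an illustrative sketch, not the authors' code: the function name `holdout_split` and the fixed seed are hypothetical, and rounding the test share up to a whole number of subjects is an assumption.

```python
import math
import random


def holdout_split(samples, test_fraction=0.3, seed=0):
    """Shuffle the samples and split them into (train, test) lists."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)                # reproducible shuffle
    n_test = math.ceil(test_fraction * len(samples))    # assumed: round the test share up
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [samples[i] for i in train_idx], [samples[i] for i in test_idx]


# 297 placeholder subject identifiers, matching the number of complete subjects.
train, test = holdout_split(list(range(297)))
```

Under this rounding assumption, 297 subjects are divided into 207 training subjects and 90 testing subjects.
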

3.2. Evaluation Metrics

Various evaluation metrics, namely, specificity, sensitivity, accuracy, and Matthews correlation coefficient (MCC), are utilized for evaluating the performance of the proposed model. Accuracy is the percentage of correctly classified subjects. Sensitivity is the percentage of correctly classified patients, whereas specificity is the percentage of correctly classified healthy subjects. All the evaluation metrics are formulated in equations (3)–(6):

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{3}$$

$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{4}$$

$$\text{Specificity} = \frac{TN}{TN + FP} \tag{5}$$

$$\text{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{6}$$

where TP stands for true positives, TN for true negatives, FP for false positives, and FN for false negatives.

The quality of binary classification is assessed using MCC in machine learning and statistics. The value of MCC ranges between −1 and 1. An MCC of −1 denotes total disagreement between prediction and observation, 1 denotes perfect prediction, and 0 indicates that the classification is no better than random prediction. Moreover, in this study, another evaluation metric, namely, the receiver operating characteristic (ROC) curve, was also exploited. The ROC curve is a well-known tool for statistically evaluating the quality of a predictive model. It provides the area under the curve (AUC); a model with a higher AUC is considered better.
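
The four metrics can be computed from the confusion-matrix counts as in the following sketch. The function name `classification_metrics` is hypothetical, and the counts in the example are made up for illustration; they are not results from this study.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity, and MCC from confusion-matrix counts."""
    total = tp + tn + fp + fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),    # fraction of patients classified correctly
        "specificity": tn / (tn + fp),    # fraction of healthy subjects classified correctly
        # MCC is undefined when a row or column of the matrix is empty; 0.0 is used here.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Made-up counts for a test set of 90 subjects.
example = classification_metrics(tp=40, tn=44, fp=4, fn=2)
```

The sketch assumes that the test set contains at least one patient and one healthy subject; otherwise sensitivity or specificity would involve a division by zero.
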

4. Experimental Results and Discussion

In this section, two kinds of diagnostic systems are proposed, and experiments are carried out to test their performance. In the first experiment, FWAFE-ANN is developed and simulated: the FWAFE algorithm is used to construct a subset of features, which is applied to an ANN that is used as the predictive model. In the second experiment, FWAFE is again used to construct a subset of features, whereas a DNN is utilized for classification. All the experiments were simulated using the Python programming language.

4.1. Experiment No. 1: Feature Selection by FWAFE and Classification by ANN

In this experiment, FWAFE is used at the first stage, while ANN is used at the second stage. The feature selection module eliminates noisy and irrelevant features by exploiting a search strategy, whereas the second module is deployed as a predictive model. The proposed diagnostic system achieves an accuracy of 91.11% using only a subset of features. The optimal subset of features is obtained for , , and where stands for the size of the feature subset. The simulation results are reported in Table 2. In the table, the last record displays the case where all the features are used, i.e., no feature selection is performed. It can be noticed that the best accuracy achieved using all the features, after optimizing the architecture of the ANN by a grid search algorithm, is 90%. Thus, the proposed model is effective, as it gives better performance with fewer features. The best performance of the proposed model is observed at 11 features for the peak training accuracy. Additionally, the feature selection module increases the performance of the optimized ANN by 1.11%. Moreover, denotes the features that are eliminated from the feature space during the feature selection process. The results from distinct subsets of features and diverse hyperparameters are displayed in Table 2.

4.2. Experiment No. 2: Feature Selection by FWAFE and Classification by DNN

In this experiment, FWAFE is used at the first stage, while DNN is applied at the second stage. The feature selection module eliminates noisy and irrelevant features by exploiting a search strategy, whereas the second module is utilized as a predictive model. The proposed diagnostic system achieves an accuracy of 93.33% using only a subset of features. The optimal subset of features is obtained for which includes , i.e., by eliminating feature number 5 and 6. The experimental outcomes are displayed in Table 3. To validate the effectiveness of the proposed feature selection method, i.e., FWAFE, the experiment is also performed using the DNN model on the full feature set without the feature selection module. The DNN architecture was optimized using a grid search algorithm. The best accuracy of 90% was obtained using a neural network with four layers: the size of the input layer is equal to the number of features, the first hidden layer consists of 50 neurons, the second hidden layer contains 2 neurons, and the output layer has only one neuron. In Table 3, the last row represents the case where all features are utilized. Hence, it is clear that the feature selection module boosts the performance of the DNN by 3.33%. Moreover, FWAFE-DNN shows better performance than FWAFE-ANN. The results for distinct subsets of features and various hyperparameters are shown in Table 3.

The ROC charts are utilized to analyze the performance of the proposed model. A method whose ROC chart has the maximum area beneath the curve, i.e., whose points lie towards the upper left corner, is considered the best. Figure 3(a) shows the ROC chart of the proposed FWAFE-ANN diagnostic system, while Figure 3(b) shows the ROC chart of the ANN-based diagnostic system. From the figure, it is evident that the feature selection module increases the performance of the ANN model, as the area beneath the curve is larger. Similarly, Figure 4(a) shows the ROC chart of the proposed FWAFE-DNN diagnostic system, while Figure 4(b) shows the ROC chart of the DNN-based diagnostic system. From the figure, it is clearly observed that the feature selection module also increases the performance of the DNN model.
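
The area beneath a ROC curve can be obtained by trapezoidal integration, as in the sketch below. This is one common approach shown for illustration; how the AUC values in this study were computed is not stated, and the function names, labels, and scores here are hypothetical.

```python
def roc_points(labels, scores):
    """ROC points (false positive rate, true positive rate), threshold swept high to low."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume all samples tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2.0
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Made-up labels (1 = patient, 0 = healthy) and predicted scores.
area = auc(roc_points([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

In this toy example, three of the four patient-healthy pairs are ranked correctly, so the area is 0.75; a perfect ranking gives 1.0 and a random one about 0.5.
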

4.3. Experiment No. 3: Results of Other State-of-the-Art Machine Learning Models

In this section, the proposed model is compared with other state-of-the-art machine learning models used on biomedical datasets. The classifiers selected for comparison are the random forest (RF) classifier, randomized decision tree classifier, Adaboost ensemble classifier, SVM with radial basis function (RBF) kernel, and linear support vector machine (SVM). Table 4 reports the results of the abovementioned models together with their hyperparameter values. For the Adaboost classifier, the hyperparameter is the maximum number of estimators at which boosting is terminated. For the RF classifier, the hyperparameter is the number of trees in the forest. The ensemble model based on randomized decision trees uses averaging to improve the prediction accuracy. In the case of SVM, the hyperparameters are the width of the Gaussian kernel and the soft margin constant. Lastly, the last record of the table gives the number of neurons in each hidden layer of the DNN. Table 4 shows that the proposed model is more accurate than the various state-of-the-art ensemble models as well as the SVM models.

4.4. Comparative Study with Previously Reported Methods

In this section, the experimental results of the proposed method are compared with those of other methods discussed in the literature. The comparison is based on prediction accuracy. Table 5 tabulates the prediction accuracies of our proposed method and of previously proposed methods. From the experimental outcomes, it is evident that the proposed hybrid method shows promising performance on heart disease prediction, while its main limitation is its high time complexity.

From Table 5, it can be seen that many studies have proposed methods for automated detection of heart failure (HF). For example, Ali et al. developed a two-stage system using linear SVM for feature selection at the first stage and a linear discriminant analysis model for classification at the second stage and obtained 90% accuracy. In another study, Verma et al. in [20] utilized the correlation-based feature subset (CFS) for feature selection and the particle swarm optimization (PSO) algorithm for k-means clustering. Their method produced an accuracy of 88.4%. Saqlain et al. in [21] proposed probabilistic principal component analysis and obtained an accuracy of 91.30%. Ali et al. in [34] proposed a novel hybrid method for improving the heart disease prediction accuracy, which utilized linear SVM for feature selection and another SVM (with linear and nonlinear kernels) for classification and produced 92.22% heart disease detection accuracy. Hence, based on comparison with these methods, it is clear that our proposed method is a step forward in improving heart disease detection accuracy.

5. Conclusions

In this paper, an effort has been made to design a two-stage diagnostic system that can improve the accuracy of heart failure risk prediction. Two types of systems were developed. Both systems used the same feature selection method, while the first system used ANN for classification and the second system used DNN. A classification accuracy of 91.11% was achieved with the ANN-based system, while an accuracy of 93.33% was obtained with the DNN-based diagnostic system. It was also observed that the proposed diagnostic system shows better performance than other state-of-the-art machine learning models. From the experimental results, it can be concluded that the proposed system can help physicians make accurate decisions while diagnosing heart disease.

Data Availability

All the data used in this study are available at UCI machine learning repository.

Conflicts of Interest

The authors declare that they have no conflicts of interest.


Acknowledgments

This work was supported by the Basic Science Research through the National Research Foundation of Korea (NRF) funded by the Ministry of Education under Grant NRF-2017R1D1A3B04031440 and by the National Natural Science Foundation of China under grant 61472066, Sichuan Science and Technology Program (Nos. 2018GZ0180, 2018GZ0085, 2017GZDZX0001, 2017GZDZX0002, 2018GZDZX0006, and 2018FZ0097).