#### Abstract

Diabetes is one of the main causes of disability and premature mortality worldwide, and it can damage organs such as the kidneys, eyes, and heart arteries. Deaths from diabetes are increasing each year, so a system that can effectively diagnose diabetic patients is needed. In this work, an efficient medical decision system for diabetes prediction based on a Deep Neural Network (DNN) is presented. Such algorithms are state-of-the-art in computer vision, language processing, and image analysis, and when applied in healthcare for prediction and diagnosis, they can produce highly accurate results. Moreover, they can be combined with medical knowledge to improve decision-making effectiveness, adaptability, and transparency. A performance comparison between the DNN algorithm, some well-known machine learning techniques, and state-of-the-art methods is presented. The results show that the proposed DNN-based method achieves an accuracy of 99.75% and an F1-score of 99.66%. This improvement can reduce time, effort, and labor in healthcare services while increasing the accuracy of the final decision.

#### 1. Introduction

Diabetes is a noncommunicable chronic disease that disrupts the body's natural regulation of blood glucose concentration, with disorders of carbohydrate, fat, and protein metabolism due to defects in insulin secretion, insulin action, or both [1–5]. The chronic hyperglycemia of diabetes is associated with long-term damage, dysfunction, and failure of different organs, especially the eyes, kidneys, nerves, heart, and blood vessels [1, 6, 7]. According to the World Health Organization, about 422 million people worldwide have diabetes, a number expected to grow to 693 million by 2045, and 1.6 million deaths are directly attributed to diabetes each year [8]. In addition, worldwide economic expenditure on diabetes was estimated at approximately USD 760 billion and is expected to exceed USD 802 billion in 2040 [9]. Both the number of cases and the prevalence of diabetes have been steadily increasing over the past few decades, especially in second- and third-world countries [2].

The medical diagnosis of diabetes is one of the most challenging and important tasks in medicine [1]. To predict the disease, several parameters must be collected, such as plasma glucose concentration, diastolic blood pressure, triceps skinfold thickness, serum insulin, body mass, and age [2, 4], and analyzing them to reach a final decision may take a long time [1]. Therefore, advanced computer and information technologies such as machine learning algorithms are used in place of traditional approaches [6]. These algorithms can help physicians make critical medical decisions more accurately, in less time, and with less effort and cost [1].

Actually, machine learning techniques have been widely used in healthcare systems to make decisions based on clinical data [6, 10–19]. In this context, many researchers have used them for the diagnosis of diabetes. Yuvraj and his colleagues [20] have proposed an implementation of machine learning algorithms like Random Forest (RF), Decision Tree (DT), and Naïve Bayes (NB) in Hadoop based clusters environment for diabetes prediction. The RF algorithm produces the highest accuracy compared to other algorithms. In [21], the authors developed a prediction model using DT approach to identify low-risk individuals for incidence of type 2 diabetes for the Tehran Lipid and Glucose Study (TLGS) database. Moreover, different classification algorithms, such as Support Vector Machine (SVM), Multilayer Perceptron (MLP), Logistic Regression (LR), RF, and DT, have been compared in [22]. The K-fold cross-validation technique has been used to accurately classify diabetes. The MLP classifier achieved the highest accuracy. According to Jakka and Vakula [23], the performance of the diabetes prediction has been evaluated using several classification algorithms such as K-Nearest Neighbor (KNN), DT, NB, SVM, LR, and RF. The best accuracy achieved was with LR algorithm compared to other algorithms. Similarly, the authors in [24] have used many machine learning classification techniques such as DT, SVM, NB, RF, KNN, and LR to predict the disease, where LR and SVM algorithms work well on diabetes prediction compared to other techniques. In [25], the authors have proposed a comparative study on the disease diagnosis by using Levenberg-Marquardt (LM) and probabilistic MLP techniques, where the first one gave the highest classification accuracy. In [26], T. Roopesh et al. have employed a system to assess the performance of diabetes prediction using different machine learning algorithms by classification, regression, and clustering. 
Both SVM and linear regression obtained the highest accuracy in comparison with the other techniques. Besides, Zou et al. [27] made a comparative study of three classifiers (Neural Network, RF, and DT), where the latter performed best. In [28], a comprehensive comparative study was carried out on various machine learning algorithms such as SVM, KNN, DT, NB, and LR for disease classification, where LR gave the most accurate results. Likewise, Mujumdar and Vaidehi [29] implemented many machine learning algorithms for diabetes prediction, including SVM, RF, DT, Extra Tree Classifier, AdaBoost, Perceptron, Linear Discriminant Analysis (LDA), LR, KNN, Gaussian NB, Bagging, and Gradient Boost; LR gave the highest accuracy, at 96%. Finally, the authors in [30] used several machine learning algorithms, including SVM, KNN, LR, DT, RF, and NB, to predict diabetes. Both SVM and KNN provided the highest accuracy compared with the other algorithms.

However, machine learning techniques present some limitations in terms of precision and feature selection [1]. This drawback has been addressed by Deep Learning (DL) algorithms, which are widely used in many forms in medical fields [31–37]. Numerous studies show that DL techniques give better results than other techniques by reducing the error rate, increasing the precision, and being more robust to noise [1, 3]. DL techniques can handle very large datasets and deal with complex problems with ease [1], which makes them well suited to our diabetes prediction system [6].

In this paper, we propose a diabetes prediction system for better diagnosis. Our work focuses on the following points:

(1) A system architecture for diabetes prediction based on the DNN algorithm, designed to support efficient diagnostic decisions, including an evaluation of four different DNN architectures to select the best model.

(2) A comparison of the best DNN model's results against those of several well-known ML classifiers such as LR, SVM, XGBoost, DT, and RF.

(3) A comparison of the proposed method with state-of-the-art methods that used the same datasets, the same experimental protocol, and the same performance measurements.

The rest of the paper is organized as follows. Section 2 provides an overview of the proposed system. Section 3 presents the results and analyses. Section 4 compares the proposed system with state-of-the-art techniques. Finally, Section 5 concludes the paper.

#### 2. Proposed System

The proposed diabetes prediction system consists of several linked steps. The first step splits the dataset into two subsets, training and testing data. Then, two categories of methods (ML and DL) are trained on the training samples with their best parameters. Finally, the trained models are used to predict the testing samples. The overall flowchart of the proposed system is shown in Figure 1.

##### 2.1. Dataset Description

To evaluate the performance of this work, we used the well-known diabetes dataset collected at Frankfurt Hospital, Germany [38]. It contains 2000 records with 9 attributes each. A brief overview of the attributes is given in Table 1; the ninth attribute is the target, which indicates the absence or presence of the disease (value of 0 or 1, respectively). In this dataset, 34.2% of the records have a target value of 1 and the remaining 65.8% have a value of 0. All the patients are female, with ages between 21 and 81. The first attribute, Pregnancies, gives the number of pregnancies and ranges from 0 to 17. The Glucose attribute is the result of the Glucose Tolerance Test, which examines how the body moves sugar from the blood into tissues such as muscle and fat; its values range from 0 to 199. BloodPressure is the diastolic pressure, that is, the pressure in the arteries when the heart rests between beats; its values range from 0 to 122. Insulin is a hormone that aids the movement of glucose (blood sugar) from the bloodstream into the cells, and its values range from 0 to 864. The SkinThickness attribute provides information about the fat reserves of the body and has values from 0 to 99. The BMI attribute offers a quick way to determine whether a patient is overweight or underweight, with values from 0 to 67.1. Finally, DiabetesPedigreeFunction summarizes the history of diabetes mellitus in relatives and the genetic relationship of those relatives to the subject; it takes real values from 0.078 to 2.42.

##### 2.2. Dataset Preprocessing

Data preprocessing is a crucial stage that transforms the data into a usable and efficient format so that it can be fed as input to the machine learning algorithm. In our system, a single preprocessing technique is used: data normalization with StandardScaler. This technique standardizes each attribute by subtracting its mean and dividing by its standard deviation, so that every attribute has zero mean and unit variance; the resulting values are centred on zero but are not confined to a fixed interval such as [−1, 1]. The StandardScaler formula is shown in equation (1), where $X$ represents the input columns of the dataset to transform, $\mu$ and $\sigma$ are the mean and standard deviation of each column, and $X_{STS}$ represents the transformed columns [39].

$$X_{STS} = \frac{X - \mu}{\sigma} \tag{1}$$
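As a minimal sketch of the standardization step, the snippet below applies the zero-mean, unit-variance transform to one feature column using only the Python standard library. The function name `standardize` and the sample glucose values are hypothetical and are not taken from the dataset; the population standard deviation is used, which is the convention followed by scikit-learn's `StandardScaler`.

```python
from statistics import fmean, pstdev


def standardize(column):
    """Return (x - mean) / std for every value in one feature column."""
    mu = fmean(column)
    sigma = pstdev(column)  # population standard deviation (divides by N)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


# Hypothetical glucose readings, for illustration only.
glucose = [85.0, 120.0, 148.0, 183.0, 99.0]
scaled = standardize(glucose)

# The scaled column has mean 0 and standard deviation 1; individual values
# are centred on zero and may exceed 1 in absolute value.
print([round(v, 3) for v in scaled])
```

In practice the mean and standard deviation would be estimated on the training subset only and then reused to transform the testing subset.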

##### 2.3. Prediction Methods

In this subsection, we briefly describe the different machine learning methods as well as the Deep Neural Networks that we used for evaluating the proposed system.

###### 2.3.1. Logistic Regression

Logistic Regression (LR) is a generalized linear model for the analysis of binary data that seeks the best-fitting model for describing the relationship between a dependent variable and independent predictors [40, 41]. The LR model is one of the most commonly used models for predicting sickness or health status [42, 43]. Based on the given risk factors, the LR model can calculate the likelihood of an individual developing diabetes [43].

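If a person suffers from diabetes, the value of the target is 1; otherwise, it is 0. We denote the probability of an individual developing diabetes by $P(X)$. The LR model is defined as follows:

$$\ln\left(\frac{P(X)}{1 - P(X)}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n \tag{2}$$

After exponentiating both sides, we obtain

$$\frac{P(X)}{1 - P(X)} = e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n} \tag{3}$$

The probability of an individual developing diabetes can then be written as

$$P(X) = \frac{e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n}}{1 + e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n}} \tag{4}$$

where $X_1, X_2, \ldots, X_n$ represent the risk factors and $\beta_0, \beta_1, \ldots, \beta_n$ are the regression coefficients.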
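The relationship between the linear combination of risk factors and the predicted probability can be illustrated with a short sketch. The coefficients and the patient vector below are hypothetical values chosen for illustration; they are not fitted to the dataset used in this work.

```python
import math


def diabetes_probability(x, beta0, betas):
    """P(X) = e^z / (1 + e^z), where z = beta0 + sum(beta_i * x_i)."""
    z = beta0 + sum(b * xi for b, xi in zip(betas, x))
    # Both branches compute e^z / (1 + e^z); the split avoids overflow
    # in math.exp for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical coefficients and standardized risk factors.
beta0, betas = -0.8, [1.1, 0.4, 0.6]
patient = [1.5, -0.2, 0.7]

p = diabetes_probability(patient, beta0, betas)
# Taking the log-odds of p recovers the linear combination z.
log_odds = math.log(p / (1.0 - p))
print(round(p, 4), round(log_odds, 4))
```

A threshold on `p` (commonly 0.5, although the source does not state one) turns the probability into a 0/1 prediction.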

###### 2.3.2. Support Vector Machine

SVM is a nonprobabilistic classifier that is formally defined by a separating hyperplane. Given labelled training data (supervised learning), the technique constructs the optimal hyperplane, that is, the one with the greatest distance (margin) from the support vectors. In two-dimensional space, this hyperplane is a line that divides the plane into two classes. The epsilon *ε*, regularization, and kernel parameters are the SVM classifier's tuning parameters [6, 44]. The principle of SVM is shown in Figure 2.
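For a linear kernel, the separating hyperplane and its margin can be sketched in a few lines. The weight vector `w` and bias `b` below are hypothetical, not the result of training, and the margin formula assumes the usual canonical scaling in which the support vectors satisfy |w·x + b| = 1.

```python
import math


def decision_value(w, b, x):
    """Signed score w.x + b; the hyperplane is the set where this is zero."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w, b, x):
    """Class 1 on the non-negative side of the hyperplane, class 0 otherwise."""
    return 1 if decision_value(w, b, x) >= 0 else 0


def margin_width(w):
    """Distance between the hyperplanes w.x + b = +1 and w.x + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


# Hypothetical two-dimensional hyperplane (a line), for illustration only.
w, b = [3.0, 4.0], -1.0
print(predict(w, b, [1.0, 1.0]), predict(w, b, [0.0, 0.0]), margin_width(w))
```

Maximizing the margin is equivalent to minimizing the norm of `w`, which is why a smaller weight vector corresponds to a wider separation between the classes.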

###### 2.3.3. Extreme Gradient Boosting (XGBoost)

Extreme Gradient Boosting (XGBoost) is an improved supervised algorithm proposed by Chen and Guestrin [45], based on the Gradient Boosting Decision Tree algorithm [46]. XGBoost can be used for both regression and classification problems and is popular with data scientists because of its high execution speed and accuracy [47]. The XGBoost objective function consists of a loss function and a regularization term, which helps to prevent overfitting by smoothing the final learned weights [48]. The loss function $l$ controls the predictive ability of the model by measuring the deviation between the predicted label $\hat{y}_i$ and the actual label $y_i$. The regularization term $\Omega$ controls the complexity of the model and also helps to handle overfitting [48, 49]. XGBoost optimizes the loss function using first-order and second-order gradient statistics. The objective function for XGBoost is defined as follows [49]:

$$\mathrm{Obj} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega\left(f_k\right) \tag{5}$$

The predicted label of the tree boosting model can be expressed as the sum of the prediction scores of all the trees, where $K$ is the number of trees in the XGBoost model, $x_i$ is the $i$-th of the $n$ instances in the dataset, and $\mathcal{F}$ is the space of classification and regression trees (also referred to as CART) [46–48]:

$$\hat{y}_i = \sum_{k=1}^{K} f_k\left(x_i\right), \quad f_k \in \mathcal{F} \tag{6}$$

The regularization term for penalizing the complexity of each tree is shown in equation (7), where $T$ denotes the number of leaves in the tree, $\lambda$ is a regularization hyperparameter controlling the L2-norm of the leaf weights $w$, and $\gamma$ is a regularization hyperparameter that sets the cost of introducing an additional leaf; both depend on the dataset [49, 50].

$$\Omega(f) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2 \tag{7}$$

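The main concept behind boosting is to create a more accurate model by combining many simple trees of low accuracy, adding a new tree at each iteration. There are many different methods for creating a new tree [50]. A common one is Gradient Tree Boosting, an improved version of tree boosting that uses gradient descent to generate each new tree based on all previous trees. Therefore, $\hat{y}_i^{(t)}$ can be represented by $\hat{y}_i^{(t-1)} + f_t(x_i)$, and the objective function at step *t* is as follows [48]:

$$\mathrm{Obj}^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t\left(x_i\right)\right) + \Omega\left(f_t\right) \tag{8}$$

The first-order and second-order gradient statistics of the loss function are given by the following two equations, respectively:

$$g_i = \frac{\partial\, l\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial\, \hat{y}_i^{(t-1)}} \tag{9}$$

$$h_i = \frac{\partial^2 l\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial \left(\hat{y}_i^{(t-1)}\right)^2} \tag{10}$$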

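It is worth noting that $g_i$ and $h_i$ help to find the optimal leaf weights $w_j$. Using a second-order approximation of the loss and dropping the terms that do not depend on $f_t$, the objective function becomes [47, 49]

$$\mathrm{Obj}^{(t)} \simeq \sum_{i=1}^{n} \left[ g_i f_t\left(x_i\right) + \frac{1}{2} h_i f_t^2\left(x_i\right) \right] + \Omega\left(f_t\right) \tag{11}$$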
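To make the role of the gradient statistics concrete, the sketch below computes the first-order and second-order statistics for a single leaf and the corresponding optimal leaf weight. It assumes the logistic (log) loss on raw scores, which is a common choice for binary classification but is not stated in the text, and it uses the closed form w* = −G / (H + λ) from the XGBoost derivation [45], where G and H are the sums of the statistics over the instances in the leaf. The labels, scores, and λ value are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logistic_grad_hess(y, raw_score):
    """Gradient and Hessian of the log loss with respect to the raw score."""
    p = sigmoid(raw_score)
    return p - y, p * (1.0 - p)


def optimal_leaf_weight(grads, hessians, lam):
    """w* = -G / (H + lambda) for the instances that fall into one leaf."""
    return -sum(grads) / (sum(hessians) + lam)


# Hypothetical leaf: four instances, all with a previous raw score of 0.
labels = [1, 1, 0, 1]
prev_scores = [0.0, 0.0, 0.0, 0.0]

stats = [logistic_grad_hess(y, s) for y, s in zip(labels, prev_scores)]
g = [gi for gi, _ in stats]
h = [hi for _, hi in stats]
w_star = optimal_leaf_weight(g, h, lam=1.0)
print(g, h, w_star)
```

A larger λ shrinks the leaf weight toward zero, which is how the regularization term limits the contribution of each new tree.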

###### 2.3.4. Decision Tree

DT is a nonparametric supervised learning algorithm for regression and classification tasks. A DT (Figure 3) can be seen as a model made up of a root node, divisions (branches), and leaf nodes. Each internal node represents a test on an attribute, each division represents an outcome of the test, and each leaf node holds a class label. The first node in the tree is the root node. First, an attribute is selected and placed at the root node. Then, a division is made for each possible value. This splits the dataset into subgroups, one for every value of the attribute. The process is repeated recursively for each division, using only those cases that reach the branch. When all cases at a node have the same classification, the growth of that part of the tree stops. Usually, entropy or classification error is used to select the best division [51, 52].
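The entropy criterion mentioned above can be sketched as follows: a candidate division is scored by how much it reduces the entropy of the class labels (the information gain). The label lists are hypothetical, and the source does not state which criterion its DT implementation actually used.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of its children."""
    n = len(parent)
    weighted = sum(len(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - weighted


# Hypothetical node with 3 diabetic (1) and 5 nondiabetic (0) cases,
# split into two subgroups by a test on one attribute.
parent = [1, 1, 1, 0, 0, 0, 0, 0]
left = [1, 1, 1, 0]
right = [0, 0, 0, 0]

print(round(entropy(parent), 4), round(information_gain(parent, [left, right]), 4))
```

The right subgroup contains a single class, so its entropy is zero and it would become a leaf; the left subgroup would be divided further.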

###### 2.3.5. Random Forest

RF is one of the most common ensemble classification methods. As shown in Figure 4, RF is made up of numerous separate Decision Tree classifiers that vote on test samples according to a set of criteria [53, 54]. The steps are as follows:

(i) Extract samples from the training set to form a training subset using the bootstrap method, that is, sampling with replacement.

(ii) Randomly pick a number of features from the feature set as the basis for splitting each node of the Decision Tree.

(iii) Repeat steps (i)-(ii) to generate a large number of training subsets and Decision Trees, which are then combined to build a Random Forest.

(iv) Feed a test sample into the Random Forest, where each Decision Tree makes its own prediction; the predictions are then combined by voting to determine the class of the sample.

(v) Repeat step (iv) until all of the test samples have been classified [55].
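The three building blocks of the steps above (bootstrap sampling, random feature selection, and voting) can be sketched with the standard library alone. The helper names are hypothetical, the trees themselves are not trained here, and simple majority voting is assumed as the voting technique.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Step (i): draw len(rows) rows with replacement."""
    return [rng.choice(rows) for _ in rows]


def random_feature_subset(n_features, k, rng):
    """Step (ii): pick k distinct feature indices as split candidates."""
    return sorted(rng.sample(range(n_features), k))


def majority_vote(predictions):
    """Step (iv): combine the per-tree predictions for one sample."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(0)          # fixed seed so the sketch is repeatable
rows = list(range(10))          # stand-in for ten training records
sample = bootstrap_sample(rows, rng)
features = random_feature_subset(n_features=8, k=3, rng=rng)

# Hypothetical predictions of five trees for one test sample.
vote = majority_vote([1, 0, 1, 1, 0])
print(sample, features, vote)
```

Because each tree sees a different bootstrap sample and a different feature subset, the individual trees make partly independent errors, which the vote tends to cancel out.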

###### 2.3.6. Deep Neural Networks

Deep Neural Networks (DNNs) are one of the architectures of Deep Learning [56]. DNNs have the same basic architecture as ANNs, except that DNNs may have several hidden layers, hence the term “deep.” A Deep Neural Network can have as many as about 150 hidden layers [1], and each layer can contain several neurons, as shown in Figure 5. The input of each layer depends on the output of the previous layer, and so on until the output layer produces the prediction of the model [57].

The output value of the first neuron of hidden layer (1), before activation, is $z_1^{(1)}$, which is the sum of the products of the weights and inputs plus the bias, as shown in equation (12), where $x_i$ are the $n$ inputs, $w_{i1}^{(1)}$ are the corresponding weights, and $b_1^{(1)}$ is the bias:

$$z_1^{(1)} = \sum_{i=1}^{n} w_{i1}^{(1)} x_i + b_1^{(1)} \tag{12}$$

Since $z_1^{(1)}$ can take any value from −∞ to +∞, it does not by itself determine whether the neuron fires. Activation functions are responsible for deciding whether the neuron fires and for calculating the activation $a_1^{(1)}$, which serves as an input to the next layer, and so on [57]. The two activation functions used in the proposed model are ReLU for the hidden layers and Sigmoid for the output layer (binary classification).
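A forward pass through one hidden layer and the output layer can be sketched as follows. The weights, biases, and input vector are hypothetical, and the layer sizes are chosen only to keep the example small; they do not correspond to any of the architectures evaluated in this work.

```python
import math


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """Apply one fully connected layer.

    weights[j] holds the incoming weights of neuron j, so each neuron
    computes activation(sum_i w_ij * x_i + b_j).
    """
    return [
        activation(sum(w * x for w, x in zip(w_j, inputs)) + b_j)
        for w_j, b_j in zip(weights, biases)
    ]


# Hypothetical input with three features.
x = [0.5, -1.0, 2.0]

# Hidden layer with two ReLU neurons, then one sigmoid output neuron.
hidden = dense(x, [[0.2, 0.4, 0.1], [-0.5, 0.3, -0.2]], [0.3, 0.0], relu)
output = dense(hidden, [[1.0, -1.0]], [0.0], sigmoid)
print(hidden, output)
```

The second hidden neuron receives a negative weighted sum, so ReLU sets its activation to zero, while the sigmoid output can be read as the predicted probability of the positive class.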

#### 3. Experimental Results

In this section, we evaluate the performance of the DNN algorithm on the testing data to assess the effectiveness of our system using several evaluation metrics. In addition, the proposed model is compared with the machine learning algorithms described in Section 2.3 in order to demonstrate the superiority of our model. The dataset was split into two subsets: a training set containing 80% of the data (547 diabetics/1053 nondiabetics) and a testing set containing the remaining 20% (137 diabetics/263 nondiabetics).
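The reported class counts are what an 80/20 split produces when it is stratified by the target, as the sketch below shows. Whether the split was actually stratified is an assumption (the text does not say), the function name and seed are hypothetical, and the class totals of 684 diabetics and 1316 nondiabetics are obtained by adding the training and testing counts given above.

```python
import random


def stratified_split(labels, test_fraction, seed):
    """Return (train_indices, test_indices), splitting each class separately."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    return train_idx, test_idx


# Synthetic label vector with the class totals implied by the reported counts.
labels = [1] * 684 + [0] * 1316
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=0)

train_pos = sum(labels[i] for i in train_idx)
test_pos = sum(labels[i] for i in test_idx)
print(train_pos, len(train_idx) - train_pos, test_pos, len(test_idx) - test_pos)
```

Stratifying keeps the proportion of diabetic patients the same in both subsets, so the testing metrics are not distorted by a change in class balance.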

##### 3.1. Evaluation Metrics

The confusion matrix (Figure 6) is a useful tool for summarizing the results of a model on a classification problem [1, 56]. In classification, each prediction falls into one of four cases, as follows.

If the actual value of the target in the dataset is True and the classifier predicts it as such, then the prediction is a True Positive (TP). On the contrary, if the classifier predicts it as False, then the prediction is a False Negative (FN). Similarly, if the actual value of the target in the dataset is False and the classifier predicts it as such, then the prediction is True Negative (TN). On the contrary, if the classifier predicts it as True, then the prediction is False Positive (FP) [58].

The confusion matrix shown in Figure 6 makes it easy to assess how the developed predictive model performs. The following metrics are used to evaluate the proposed model [49, 56–59].

**Accuracy (Acc)** is the percentage of correct predictions that a classifier has made, compared with the actual values of the target, in the testing phase.

$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}$$

**Sensitivity (Sens)** is the percentage of actual positives that are correctly classified during the test.

$$\mathrm{Sens} = \frac{TP}{TP + FN}$$

**Specificity (Spec)** is the percentage of actual negatives that are correctly classified during the test.

$$\mathrm{Spec} = \frac{TN}{TN + FP}$$

**Precision (Pre)** is the percentage of instances labelled as positive by a classifier that are actually positive (the exactness of a classifier).

$$\mathrm{Pre} = \frac{TP}{TP + FP}$$

**F1-score** is the harmonic mean of precision and recall, where recall is another name for sensitivity.

$$\mathrm{F1} = \frac{2 \times \mathrm{Pre} \times \mathrm{Sens}}{\mathrm{Pre} + \mathrm{Sens}}$$
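The five metrics can be computed directly from the four cells of the confusion matrix, as in the sketch below. The counts are illustrative: they are chosen to match the testing-set size (137 diabetics, 263 nondiabetics) with a single misclassified sample, and they are not read from Figure 7.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the evaluation metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (each class and each predicted
    label occurs at least once).
    """
    acc = (tp + tn) / (tp + tn + fp + fn)
    sens = tp / (tp + fn)   # also called recall
    spec = tn / (tn + fp)
    prec = tp / (tp + fp)
    f1 = 2 * prec * sens / (prec + sens)
    return {"accuracy": acc, "sensitivity": sens, "specificity": spec,
            "precision": prec, "f1": f1}


# Illustrative counts: one nondiabetic sample predicted as diabetic.
m = classification_metrics(tp=137, tn=262, fp=1, fn=0)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

With a single false positive the sensitivity stays at 1.0 while specificity, precision, and F1-score drop slightly below 1.0, which shows why accuracy alone does not describe a classifier on an imbalanced dataset.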

##### 3.2. Prediction with ML Methods

In this section, the conventional machine learning algorithms are compared on the diabetes prediction task in order to analyze their accuracies.

###### 3.2.1. Hyperparameter Optimization

Hyperparameter optimization (i.e., tuning) is important because it directly controls the behavior of the training process of the algorithm and has a significant impact on the performance of the model. There are four common methods of hyperparameter optimization: Manual search, Random search, Bayesian optimization, and Grid search [56, 58]. In this work, we applied the Grid search method for each algorithm which systematically builds and evaluates a model for each combination of parameters in a specific grid.
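The Grid search procedure can be sketched as an exhaustive loop over the Cartesian product of the candidate values. The grid, the parameter names, and the scoring function below are hypothetical; in a real experiment `evaluate` would train the classifier with the given parameters and return a validation score.

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Evaluate every combination in param_grid and keep the best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid for a tree ensemble: 3 x 3 = 9 combinations.
grid = {"n_estimators": [50, 100, 200], "max_depth": [3, 5, 7]}


def toy_score(params):
    """Stand-in for a validation score; peaks at n_estimators=100, max_depth=5."""
    return -abs(params["n_estimators"] - 100) - abs(params["max_depth"] - 5)


best_params, best_score = grid_search(grid, toy_score)
print(best_params, best_score)
```

The number of models to train grows multiplicatively with each added hyperparameter, which is the main cost of Grid search compared with Random search or Bayesian optimization.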

We implemented five machine learning classifiers for the binary classification task of determining whether or not a patient has diabetes. Each classifier has many hyperparameters; most of them can be left unchanged, but the main ones need to be tuned to obtain a good model. The hyperparameters considered for each algorithm, together with their default values, are shown in Table 2.

To show the impact of hyperparameter optimization on the overall results of the system, we compare the performance of the selected ML algorithms with and without this process. Table 3 presents the average score obtained by each classifier on the five metrics. All prediction methods give better results with optimization than without it, and RF gives the highest performance among them.

##### 3.3. Evaluation of the DNN Method

A DNN can contain different types of layers. In this work, three types were implemented: a dense layer, which consists of a matrix of weights and a bias; a dropout layer, which helps to prevent overfitting by dropping a certain fraction of the layer's input units at each training step [1, 60]; and a batch normalization layer, which rescales the layer's inputs over each batch. We also used the Early Stopping technique, which monitors the improvement of the model during training and halts training when the model stops improving [61]. We ran many experiments, varying the number of layers, the number of neurons in each layer, and the types of layers, as shown in Table 4.
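The logic of Early Stopping can be sketched independently of any deep learning framework: training stops once the monitored quantity has failed to improve for a given number of consecutive epochs. The monitored quantity (validation loss), the `patience` value, and the loss sequence below are assumptions for illustration; the text does not report the settings that were used.

```python
def early_stopping_epoch(val_losses, patience, min_delta=0.0):
    """Return the index of the epoch at which training would stop.

    Training stops after `patience` consecutive epochs in which the loss
    did not improve on the best value so far by more than `min_delta`.
    If that never happens, the last epoch index is returned.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses) - 1


# Hypothetical validation losses over seven epochs.
losses = [0.60, 0.45, 0.40, 0.41, 0.42, 0.43, 0.39]
stop = early_stopping_epoch(losses, patience=3)
print(stop)
```

With a small patience, training halts before the late improvement at the final epoch is seen; a larger patience trades extra training time for the chance to capture such improvements.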

As shown in Table 5, DNN model number 4 is the best one, with the following parameters: Epochs = 500, Batch_size = 200, and Random_state = 0. Therefore, this model is used for the rest of this study. The confusion matrix of the DNN prediction results is shown in Figure 7. The performance of the model can be derived directly from this confusion matrix by computing the metrics summarized in Table 5.

The behavior of the accuracy is shown in Figure 8, where the blue line represents the training phase and the orange line represents the testing phase. The best accuracy values are 99.0% for training and 99.75% for testing.

##### 3.4. Performance Comparison

To show how the proposed DNN performs, we compared it with the other prediction methods evaluated above. In the following, we discuss the performance obtained by each classifier using boxplot diagrams.

###### 3.4.1. Accuracy

The accuracy of the proposed DNN in comparison with the five ML methods is shown in Figure 9. DNN achieved the highest accuracy, 99.75%. The other implemented ML methods also perform well, with the exception of LR, which performs relatively poorly with an accuracy of less than 80%.

###### 3.4.2. Specificity

Figure 10 shows the specificity of the proposed DNN in comparison with the other ML methods. All of them achieve a specificity of more than 96%, except LR, which shows the lowest specificity. The highest specificity, 99.60%, was achieved by the DNN method.

###### 3.4.3. Sensitivity

The sensitivity of the proposed DNN and the ML methods is shown in Figure 11. The proposed DNN achieved the highest sensitivity, 100.0%. The other ML methods achieved more than 95%, except the LR method, which showed a markedly lower sensitivity.

###### 3.4.4. Precision

The precision of the proposed DNN and the ML methods is presented in Figure 12. The highest precision (99.32%) was obtained with the DNN method. The ML methods also achieved a precision of more than 93%, except the LR method, which gave the lowest precision.

###### 3.4.5. F1-Score

The F1-score of the proposed DNN and the other ML methods is shown in Figure 13. Except for LR, all methods achieved an F1-score greater than 94%. The highest F1-score (99.66%) was achieved by the DNN.

Based on these results, the proposed DNN is the best prediction model among the implemented methods.

#### 4. Comparison with the State-of-the-Art Methods

To show how well our diabetes prediction system performs, we compared it with other works that used the same dataset and the same performance measures. It is worth noting that this comparison is based only on the accuracy metric because the other evaluation metrics are not available for those works. As observed from Table 6, the proposed DNN outperforms the works reported in the literature.

#### 5. Conclusion

In this study, we proposed an efficient diabetes prediction system based on the Deep Neural Network (DNN) algorithm to identify whether or not a person has diabetes. We presented a comparative study between the DNN and several machine learning techniques. The evaluation of these models on several performance metrics (accuracy, specificity, sensitivity, precision, and F1-score) showed the superiority of the proposed DNN method. Furthermore, we compared our system with state-of-the-art methods. This comparison showed that a diabetes prediction system based on the DNN algorithm can provide better performance than the state-of-the-art techniques. Applying this method can have a direct impact on the design and development of diabetes prediction systems in healthcare and can lead to economic savings.

#### Data Availability

The data used to support the findings of this study are freely available.

#### Conflicts of Interest

The authors declare that they have no conflicts of interest.

#### Acknowledgments

The authors of this study would like to express their gratitude to the members of LASS Laboratory (University of M’sila, Algeria) for their support and assistance in publishing this work. The publication of this article was funded by the Qatar National Library. The authors would like to thank Qatar National Library (QNL) for supporting the publication charges of this article.