#### Abstract

The early detection of Alzheimer’s disease and its complications is essential, using strategies that obtain biomarkers where early symptoms overlap. The electroencephalogram is a technology that records the electrical activity of the cerebral cortex over time via the postsynaptic potentials of thousands of neurons with the same spatial orientation. In this paper, time-dependent power spectrum descriptors are used to extract features from the electroencephalogram signals of three groups: mild cognitive impairment, Alzheimer’s disease, and healthy control samples. The final feature set is used in three traditional classification methods, the k-nearest neighbors, support vector machine, and linear discriminant analysis approaches, and the results are documented. Finally, a convolutional neural network architecture is presented for the classification of Alzheimer’s disease patients, and the results are assessed through performance analysis. For the convolutional neural network approach, the accuracy is 82.3%. In detail, 85% of mild cognitive impairment cases, 89.1% of Alzheimer’s disease cases, and 75% of the healthy population are correctly diagnosed. The presented convolutional neural network outperforms the other approaches in performance, with the k-nearest neighbors approach next, while linear discriminant analysis and support vector machine yield low area-under-the-curve values.

#### 1. Introduction

The term “dementia” refers to many neurodegenerative illnesses caused by neuronal failure and death that disrupt cognitive and behavioral functions. The most prevalent of the several types of dementia is Alzheimer’s disease (AD), accounting for about 70% of worldwide dementia cases. It mainly affects individuals over 65 years of age, and the rate of occurrence increases exponentially after 65 [1–3]. To date, AD has no cure; palliative therapies only temporarily slow deterioration and ease the burden on patients and caregivers [4]. Today, a definitive diagnosis of AD is possible only postmortem, after examining the structural brain damage that is typical of the condition. Accuracies of up to 90% have been recorded for modern clinical procedures, such as neurological assessments combined with medical history. The existing standards for the clinical diagnosis of AD were established by the National Institute on Aging and the Alzheimer’s Association [5]. These standards are an advancement on the previous guidelines, developed in 1984 by the National Institute of Neurological and Communicative Disorders and Stroke/Alzheimer’s Disease and Related Disorders Association (NINCDS-ADRDA) [6]. The revised recommendations require neuroimaging and the use of biomarkers and cerebrospinal fluid to diagnose AD in symptomatic people [5].

A guideline for diagnosing and monitoring AD [7] was established by the European Federation of Neurological Societies. The Mini-Mental State Examination [8, 9] is the most used AD diagnosis method to test cognitive ability. The revised Montreal Cognitive Assessment [10] is commonly used in therapeutic functional applications, as is the revised Addenbrooke Cognitive Evaluation [11]. Other examples of neuropsychological testing are the Extreme Cognitive Disorder, the Alzheimer’s Cognitive Disease Evaluation Scale, the Neuropsychological Assessment Battery, and the Severe Impairment Battery. The Trail Making Test [12] and the clock drawing test [13], by contrast, focus not only on testing thinking abilities but also on concentration and executive function. In comparison, the visual learning test and the Rey Auditory Fluency Assessment assess patients’ practice and support skills [14]. In some instances, AD is also associated with other diseases that cause dementia, such as brain vascular injury, Lewy body disease, and Parkinson’s disease [15]. The early diagnosis of AD and these conditions is improved using methods that obtain biomarkers, as early signs overlap [16–19]. Electroencephalogram (EEG) is a technology that enables the electrical activity of the cerebral cortex to be recorded over time via the postsynaptic potentials of thousands of neurons with identical spatial orientation. Scalp-positioned electrodes measure the electrical potentials. EEG’s spatial resolution refers to the number and location of electrodes on the scalp. The most used configuration is the international 10-20 system, which consists of 21 electrodes; higher-density versions of the 10-20 system are also used, for instance the 10-10 and 10-5 systems with typically 64 and 128 electrodes, as well as the alternative Maudsley [20] and geodesic [21] layouts.
Reliable therapeutic methods have been demonstrated in recent years for the diagnosis and analysis of disorders and cortical conditions such as Huntington’s disease, autism spectrum disorder [22], epilepsy and seizure [23], brain ischemia [24], frontotemporal dementia [25], and Parkinson’s dementia [26]. Furthermore, EEG evaluations have been carried out on the comparative diagnosis of AD and other dementia-contributing diseases such as brain vascular injuries [27, 28] and Lewy body disease [29, 30]. In the analysis, the spectrum is typically divided into five major frequency bands: delta (δ) 0.1–4 Hz, theta (θ) 4–8 Hz, alpha (α) 8–12 Hz, beta (β) 12–30 Hz, and gamma (γ) above 30 Hz. Further divisions of these bands (high alpha, low alpha, low beta) are also considered, but the subband frequency limits are not uniform across studies. Each frequency band carries different information on brain function and synchronization [31–33]. There has been a comprehensive study of the possible use of electroencephalography to diagnose dementia and AD [34]. EEG is a noninvasive, comparatively inexpensive, and potentially mobile technology with high time resolution (about milliseconds). It was studied primarily as an AD diagnostic tool by comparing EEG records of AD patients with those of control subjects (healthy individuals) [35, 36]. AD is generally known to decrease the complexity of EEG signals and to alter EEG synchrony.

These changes have been used as discriminatory features for AD diagnosis in EEG recordings. Several methods of assessing the complexity of EEG signals have been established. The correlation factor and the first positive Lyapunov exponent have been used frequently [37–42]. EEG signals from AD patients have been shown to yield lower values (lower complexity) on certain measures than signals from age-matched control subjects. Other information-theoretic methods, in particular entropy-based approaches, have emerged as theoretically useful EEG indicators for AD: epoch-based entropy [43, 44], sample entropy [45], Tsallis entropy [46], approximate entropy [47, 48], multiscale entropy [49], and Lempel-Ziv complexity [50]. These approaches relate the complexity of a signal to its unpredictability: irregular signals are more complex than regular ones because they are erratic. Different detection algorithms have been suggested in previous studies for epileptiform EEG data [51]. Current seizure detection methods use hand-built feature extraction techniques from EEG signals [52], including frequency-domain, time-domain, time-frequency-domain, and nonlinear evaluation of signals [53, 54]. After feature extraction, relevant features must be selected to identify the various EEG signals using different kinds of classifiers [55]. Hamad et al. employed a discrete wavelet transform procedure to obtain the feature collection, then trained a radial basis function support vector machine (SVM), demonstrating epilepsy diagnosis with the suggested SVM gray wolf optimizer [56]. For the refinement of the SVM parameters based on genetic algorithms and particle swarm optimization, Subasi et al. developed a hybrid model. The proposed SVM hybrid model shows that neuroscientists use EEG as an essential method for diagnosing epileptic seizures [57]. However, these strategies do not eliminate the manual feature selection criteria [58].
Feature extraction is an important stage in classification, as it largely determines the classifier’s specificity. A system for classification without the extraction of complicated properties has been suggested. Furthermore, recent advancements in deep learning have shown a new way of coping with this problem. In recent years, deep learning has become the recognized state of the art in computer vision and machine learning, demonstrating near-human and sometimes superhuman performance on numerous tasks such as pattern recognition and sequence learning [59]. Feature extraction before classification is more advantageous than feeding raw EEG samples directly into the classifier. Nevertheless, several recent studies have not performed feature extraction, but instead used raw EEG signals as the input to the deep learning model [60, 61].

In this paper, the time-dependent power spectrum descriptor (TD-PSD) method is utilized for feature extraction of the EEG signals from three categories of samples: MCI, AD, and HC. The final labeled feature set is used in three types of traditional classification methods, namely the k-nearest neighbors (KNN), SVM, and linear discriminant analysis (LDA) approaches, and the results are recorded. Finally, a convolutional neural network (CNN) architecture for the classification of AD patients is provided. The results are reported using performance analysis.

#### 2. Literature Review

The complex and nonlinear nature of EEG signals calls for new machine learning and signal processing methods [62, 63]. Recent progress has been made in the field of deep learning methodologies to enhance high-level abstraction methods for the automated extraction of complex data features [64–66]. In recent years, these deep learning methods have been widely used in image processing, natural language processing [67–70], speech processing, and video games [71]. They have also found their way into the biomedical area [72–74]. Acharya et al. [60] suggested a deep, 13-layer CNN that distinguishes normal, preictal, and seizure EEG signals. In the study, 300 EEG signals were used, registering a classification rate of 88.67%. The same group proposed a deep neural network approach for an innovative EEG-based depression screening method [60]; this investigation’s outcomes, reported on 15 normal and 15 depressed patients, are 93.5% (left hemisphere) and 96.0% (right hemisphere). In a further study, Oh et al. [75] used EEG signals to diagnose Parkinson’s disease; a 13-layer CNN model on 20 healthy and 20 Parkinson’s patients reached an accuracy of 88.25%. Hong et al. [76] propose a mathematical model employing Long Short-Term Memory (LSTM), a recurrent neural network (RNN), that predicts the progression of mild cognitive impairment to AD. The data is taken in image form in this research, and the preprocessing is done by skull stripping, normalization, registration, smoothing, and segmentation. After preprocessing, the training is carried out by feeding sequential data with time steps to the model, and the model projects the state of the next six months. During model testing, when the feature data for the 18th and 24th months is presented, it forecasts the state of the subject for the 30th month. Similarly, Aghili et al. [77] suggest an RNN approach to evaluate longitudinal data to differentiate stable individuals from AD individuals.
The input is preprocessed, and the features are normalized. After preprocessing, the data is fed into LSTM and gated recurrent units. In the LSTM and gated recurrent unit models, each subject’s time-point data is provided to the corresponding cell along with its final diagnosis label so that the pattern of data transition can be learned. The resulting models are contrasted with nonrecurrent networks, i.e., a multilayer perceptron (MLP), for all data arrangements; for each patient, the data is fed into the MLP once. LSTM models have many trainable parameters that require substantial amounts of sequential training data and are vulnerable to overfitting the training data. Another line of work characterizes AD using the principle of unsupervised feature learning. One approach used sparse filtering to learn expressive characteristics of brain images [78]. SoftMax regression is then trained to classify the conditions. The first step has three phases: sparse filtering is trained and its weight matrix W is calculated, and the learned sparse filter is used to extract local characteristics from each sample. Using nonnegative matrix factorization (NMF) and SVM with confidence constraints, Padilla et al. [79] offer a novel technique for the early detection of Alzheimer’s disease. By applying the Fisher discriminant ratio and NMF for feature selection and extraction of the most important features, the single-photon emission computed tomography and positron emission tomography datasets are studied (see Table 1).

#### 3. Research Methodology

The proposed study uses the EEG signal to describe the phases of the disorder. It is suggested that a deep CNN network architecture be learned to distinguish multichannel human EEG signal data into different stages, which increases the efficiency of classification. This work includes the modules below:

(i) Preprocessing
(ii) Feature extraction
(iii) Classification

##### 3.1. Feature Extraction

The EEG trace is expressed as a function of frequency using the discrete Fourier transform: the sampled EEG signal x[n], with n = 1, …, N, length N, and sampling frequency fs (Hz), is transformed into the spectrum X[k]. Parseval’s theorem states that the total squared sum of the function equals the total squared sum of its transform; the feature-extraction process starts from this relation.
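The transform pair and Parseval relation referred to here do not appear in the text; a standard statement, with x[n] the sampled EEG signal of length N, is:

```latex
X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j 2\pi k n / N},
\qquad
\sum_{n=0}^{N-1} \lvert x[n] \rvert^{2}
  \;=\; \frac{1}{N} \sum_{k=0}^{N-1} \lvert X[k] \rvert^{2}.
```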

According to the above equation, P[k] is the phase-excluded power spectrum: the power at frequency index k is obtained by multiplying X[k] by its conjugate X*[k] and dividing by N. The full frequency spectrum obtained by the Fourier transform is well known to be symmetric about zero frequency; i.e., it has similar sections extending to both positive and negative frequencies [89]. Using the whole spectrum, including positive and negative frequencies, removes the need to deal with this symmetry. So far, the spectral power has not been accessed from the time domain. Following the statistical approach that treats the power spectral density as a frequency distribution, the spectrum is described through its moments; because of the symmetry, all odd moments are zero.
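Parseval’s relation above can be checked numerically. The sketch below uses a random 256-sample vector as a stand-in for one EEG epoch (the signal itself is synthetic, not from the paper’s dataset):

```python
import numpy as np

# Numerical check of Parseval's theorem for a sampled signal x[n]:
# sum |x[n]|^2 == (1/N) * sum |X[k]|^2, with X = DFT(x).
rng = np.random.default_rng(42)
x = rng.standard_normal(256)          # stand-in for one EEG epoch
X = np.fft.fft(x)
P = (X * np.conj(X)).real / len(x)    # phase-excluded power spectrum P[k]

time_energy = np.sum(x ** 2)
freq_energy = np.sum(P)
print(np.allclose(time_energy, freq_energy))  # True
```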

In the latter equation, the Parseval theorem of Equation (1) is used together with the time-differentiation property of the Fourier transform for nonzero derivative orders. This property states that the transform of the k-th time derivative of a signal equals the spectrum multiplied by the frequency variable raised to the k-th power; it therefore links time-domain derivatives of the signal to frequency-weighted versions of its spectrum.
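The time-differentiation property invoked here is also missing from the text; in continuous form it reads, for the k-th derivative,

```latex
\mathcal{F}\!\left\{ \frac{d^{k} x(t)}{dt^{k}} \right\}
  = (j 2\pi f)^{k}\, X(f),
```

so spectral moments can be computed from time-domain derivatives (finite differences, for sampled signals) without an explicit Fourier transform.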

To this end, as seen in Figure 1, the description of the characteristics used is as follows:

*Root squared zero-order moment (m0)*: this feature reflects the total power of the frequency domain and is obtained as follows:
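The defining equation is absent from the text; following the standard TD-PSD formulation (an assumption, not taken from the source), the root squared zero-order moment is

```latex
\bar{m}_0 = \sqrt{\sum_{n=0}^{N-1} x[n]^{2}}.
```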

The zero-order moment of each channel may also be standardized by dividing it by the sum of the zero-order moments of all channels.

*Root squared second- and fourth-order moments*: according to Hjorth [89], the second-order moment is computed as the power of the first derivative of the signal, which corresponds to a spectrum weighted by frequency:
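A plausible reconstruction, using the first difference Δx[n] = x[n] − x[n−1] as the discrete derivative (standard TD-PSD form, assumed rather than quoted):

```latex
\bar{m}_2 = \sqrt{\sum_{n=1}^{N-1} \left( \Delta x[n] \right)^{2}}.
```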

Repeating this procedure on the second derivative gives the fourth-order moment:
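Again following the standard formulation (assumed), with Δ²x[n] the second difference of the signal:

```latex
\bar{m}_4 = \sqrt{\sum_{n=2}^{N-1} \left( \Delta^{2} x[n] \right)^{2}}.
```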

Consideration of the second and fourth derivatives decreases the total energy of the signal; thus, a power transform is applied to normalize the domains of m0, m2, and m4 and to minimize the effect of noise on all moment-based features, as follows:
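The power transform itself is not printed; in the usual TD-PSD formulation (an assumption) it takes the form

```latex
m_i = \frac{\bar{m}_i^{\,\lambda}}{\lambda}, \qquad i = 0, 2, 4.
```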

The experimental setting of λ is 0.1. From these parameters, the first three extracted features are defined as follows:
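The three features are not shown in the text; in the standard TD-PSD formulation they are logarithms of the normalized moments (assumed, not quoted from the source):

```latex
f_1 = \log(m_0), \qquad
f_2 = \log(m_0 - m_2), \qquad
f_3 = \log(m_0 - m_4).
```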

*Sparseness*: this feature measures how much of a vector’s energy is packed into only a few elements. It is computed as
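A plausible reconstruction of the sparseness feature in terms of the normalized moments (standard TD-PSD form, assumed):

```latex
f_4 = \log\!\left( \frac{m_0}{\sqrt{(m_0 - m_2)(m_0 - m_4)}} \right).
```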

This feature evaluates to zero for a vector whose elements are all equal (a zero-sparseness index), since the moments based on differentiation vanish in that case, while it takes a value greater than zero for all other sparseness levels [90].

*Irregularity factor (IF)*: a measure expressing the ratio of the number of upward zero crossings to the number of peaks. According to [91], for a random signal, the number of upward zero crossings (ZC) and the number of peaks (NP) can be specified in terms of the spectral moments alone. The corresponding feature is written as
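Under the spectral-moment definitions of ZC and NP (the standard form, assumed here since the equation is missing from the text):

```latex
ZC = \sqrt{\frac{m_2}{m_0}}, \qquad
NP = \sqrt{\frac{m_4}{m_2}}, \qquad
f_5 = \log\!\left( \frac{ZC}{NP} \right)
    = \log\!\left( \frac{m_2}{\sqrt{m_0\, m_4}} \right).
```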

*Covariance (COV)*: the COV feature is the ratio of the standard deviation to the arithmetic mean, as follows:
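As stated in the text, with σ the standard deviation and μ the arithmetic mean of the record:

```latex
f_6 = \mathrm{COV} = \frac{\sigma}{\mu}.
```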

*Teager energy operator (TEO)*: this operator mainly reflects the signal amplitude and its instantaneous changes and is very sensitive to minor variations. Although TEO was originally proposed for nonlinear speech signal modeling, it was later used for audio signal processing. It is defined as follows:
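The pointwise Teager energy operator is standard; the aggregation into a single feature (log of the summed magnitude) is an assumption, since the paper’s exact equation is missing:

```latex
\Psi\{x[n]\} = x[n]^{2} - x[n+1]\, x[n-1],
\qquad
f_7 = \log\!\left( \left| \sum_{n} \Psi\{x[n]\} \right| \right).
```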

In conjunction with the schematics in Figure 1, the seven features above are first extracted from each EEG record. The features, collected in the corresponding vector, are then used in the classification stage. These characteristics can be viewed as reflecting the EEG behavior in a cepstrum-like form. Contrary to the well-known voice cepstral features [92], our EEG features are obtained as the orientation between characteristics derived from a nonlinearly transformed EEG record and the original EEG record, following the equations above. For EEG classification at differing force levels, orientation-based feature extraction has recently been shown to be of considerable significance in research on intact-limbed subjects, as force generation relies on the coordination of multiple hand muscles [93].
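The seven features described above can be sketched in code. This is a minimal illustration following the standard TD-PSD definitions; the paper’s exact normalization, log/abs guards, and channel handling may differ:

```python
import numpy as np

def td_psd_features(x, lam=0.1, eps=1e-12):
    """Sketch of the seven TD-PSD features for one EEG record.

    lam is the power-transform exponent (the paper states lambda = 0.1).
    eps guards the logs and divisions in this illustrative version.
    """
    x = np.asarray(x, dtype=float)
    d1 = np.diff(x)            # first temporal derivative (finite difference)
    d2 = np.diff(x, n=2)       # second temporal derivative

    # Root squared spectral moments obtained in the time domain
    m0 = np.sqrt(np.sum(x ** 2)) + eps
    m2 = np.sqrt(np.sum(d1 ** 2)) + eps
    m4 = np.sqrt(np.sum(d2 ** 2)) + eps

    # Power transform to normalize the moments and reduce noise sensitivity
    m0, m2, m4 = (m ** lam / lam for m in (m0, m2, m4))

    f1 = np.log(m0)
    f2 = np.log(abs(m0 - m2) + eps)
    f3 = np.log(abs(m0 - m4) + eps)
    # Sparseness
    f4 = np.log(m0 / (np.sqrt(abs(m0 - m2) * abs(m0 - m4)) + eps) + eps)
    # Irregularity factor: zero crossings over number of peaks
    f5 = np.log(m2 / (np.sqrt(m0 * m4) + eps) + eps)
    # Coefficient of variation (the paper's "COV")
    f6 = np.std(x) / (abs(np.mean(x)) + eps)
    # Teager energy operator, summed over the record
    teo = x[1:-1] ** 2 - x[2:] * x[:-2]
    f7 = np.log(abs(np.sum(teo)) + eps)
    return np.array([f1, f2, f3, f4, f5, f6, f7])

rng = np.random.default_rng(0)
feats = td_psd_features(rng.standard_normal(256))  # one 256-sample epoch
print(feats.shape)  # (7,)
```

Applied per channel, this yields the seven-dimensional feature vector per record that the classification stage consumes.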

There have been no prior attempts to test the efficacy of these specific features on amputees. In the coming subsection, the suggested orientation-based features are shown to be adequate for classifying EEG signals into different classes. In the remainder of the paper, the final feature set, computed with the time-dependent power spectrum descriptors from all channels, is referred to as TD-PSD.

##### 3.2. Convolutional Neural Network

CNN is one of the learning networks inspired by the MLP. This deep network comprises an input layer, an output layer, and deep hidden layers. First, the problem’s signals or data are labeled and used to train the algorithm. The learned weights feeding the output layer can take many forms [94]. If the algorithm output comprises discrete numerical components, such as a binary number or index (e.g., signal classification, normal = 1, abnormal = 2), then the algorithm is a classification or detection algorithm. That is, the weights are obtained after training on several signals. When a new signal, different from the training signals, is given to the algorithm, its class is identified; for example, a matrix of various kinds of signals is sent to the algorithm to train the machine on signals of benign or malignant cancers, Alzheimer’s, sarcoma, or brain tumors, and the type of disease can then be identified by the algorithm using the learned weights. CNN consists of numerous hidden sublayer types, explained as follows.

##### 3.3. Convolutional Sublayer

The basis of the CNN is the convolutional sublayer, whose output can be viewed as a three-dimensional matrix of neurons. For a deeper explanation, consider traditional neural networks: each layer was little more than a one-dimensional list of neurons in which each neuron generated its own output, gradually producing a collection of outputs, one per neuron. In a CNN, however, instead of a single list there is a three-dimensional arrangement (a cube) in which the neurons are organized along three dimensions. Therefore, the output of this cube is also a three-dimensional matrix. This principle and the distinction between the two are illustrated in the images below [95].

Let the size of the input matrix be given. Then, using a receptive field of fixed spatial extent, each neuron in the convolutional layer has connections to a local region of the input matrix. Notice that the connection is local in space but covers the full depth (Figure 2). The left image displays an input volume. The neuron matrix is shown in the convolutional (blue) sheet. In terms of spatial coordinates (width and height), each neuron in the convolutional layer is connected to only one local region of the input matrix, but this connection extends through the full depth (i.e., covers all color channels). Multiple neurons along the depth dimension all look at the same location in the input.

Three hyperparameters regulate the output matrix’s dimensions: the depth, the stride, and the zero-padding. The depth parameter sets the number of neurons in the output matrix that bind to the same region of the input; this is analogous to multiple neurons in a hidden layer all attached to the same input in classical neural networks. All these neurons learn to respond to different features of the input, and the resulting depth columns are arranged along the spatial dimensions (width and height). When the stride is small, the receptive fields of neighboring columns overlap heavily, which leads to a large output matrix. Alternatively, if larger strides are taken, the receptive fields overlap less and the output is smaller in the spatial dimensions [96]. Padding the input matrix with zeros is also commonly used; in other words, the border of the input image is filled with zeros, as if the signal were placed inside a larger zero signal by inserting zero rows and columns at its beginning and end.

##### 3.4. Max Pool Sublayer

One standard technique in CNN architectures is the insertion of a pooling layer between successive convolutional layers. This layer’s purpose is to reduce the spatial size (width and height) of the input matrix, thereby reducing the number of parameters and computations in the network and helping to control overfitting. The pooling layer operates independently on each depth slice of the input matrix and resizes it using the MAX function. The most typical configuration uses filters of size 2×2 with a stride of 2, which downsamples each depth slice of the input by a factor of two along both width and height, discarding 75% of the activations [97].

##### 3.5. Activation Function

The activation function of an artificial neural network determines the output of a node, or “neuron,” from its input or group of inputs. This output then serves as the input of the next node, and so on until a solution to the problem is produced. The outcome values are mapped to a target range, such as 0 to 1 or -1 to 1 (depending on the chosen activation function). Using the logistic activation function, for example, transforms all inputs into real values between 0 and 1. Another essential property of an activation function is that it must be differentiable, so that the backpropagation technique can compute the gradient of the error with respect to the weights and apply gradient descent or another optimization approach to update the weights and reduce the error. The rectified linear unit (ReLU) is used in this paper, defined as follows:
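The ReLU definition, left unstated in the text, is the standard one:

```latex
f(x) = \mathrm{ReLU}(x) = \max(0, x),
\qquad
f'(x) = \begin{cases} 1, & x > 0, \\ 0, & x < 0. \end{cases}
```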

Some activation functions are not functions of a single variable but of a vector of variables; an example used in this article is SoftMax [98]:
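The SoftMax function, omitted from the text, has the standard form for a K-class score vector z:

```latex
\mathrm{softmax}(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}},
\qquad i = 1, \dots, K.
```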

Also, to normalize intermediate results, batch normalization layers are applied between the convolutional layers and the nonlinearities to speed up the training process and reduce the network’s sensitivity. In addition, to mitigate overfitting, a dropout layer is applied to the fully connected layers. Fully connected layers, which are known to distinguish signals, are used at the end of the hidden layers; the output of the deep layers feeds a fully connected layer that drives the final classification judgment.

##### 3.6. Receiver Operating Characteristic (ROC) Curve

ROC curves were originally developed to detect radio signals in noise [99]. These curves have since found important uses in medical decision-making. Presume there are two kinds of individuals, one normal and the other a patient. A screening test is administered to both patients and healthy people, and the test values range from 0 upward on a large numeric scale. In this case, the greater the test outcome, the greater the risk of the disease. (For certain tests, the direction can be the opposite.)

The ROC curve is established by plotting the true-positive rate (TPR) against the false-positive rate (FPR) at different threshold settings. The TPR is also known in machine learning as sensitivity, recall, or detection probability. Beginning from the left side of the ROC, both the FPR and the TPR are zero at this point (this corresponds to setting the threshold at the largest value of the test outcomes). The TPR and FPR values are then measured at each threshold and the curve is drawn. TPR is defined as TP/P, the fraction of actual positives detected, and FPR as FP/N, the fraction of actual negatives falsely flagged.

Let us now lower the threshold toward smaller values. The procedure is repeated for lower and lower thresholds and eventually reaches the rightmost point of the ROC curve, which corresponds to setting the threshold at the lowest value of the test outcomes; at this point, both TPR and FPR equal one. Accuracy depends entirely on the distribution of random errors and does not by itself correspond to the true or specified value. Systematic error is expressed in terms of bias, and a complete systematic error can consist of one or more components; a strong bias implies a large disparity from the comparison value. In statistics, the two quantities sensitivity and specificity are used to assess the outcome of a binary (two-class) classification. When the data can be divided into positive and negative classes, sensitivity and specificity indicate how consistently a test separates the information into these two divisions. Sensitivity is the proportion of positive cases that are correctly identified as positive; specificity is the proportion of negative cases that are correctly labeled as negative.

*True positive (TP)*: the positive signal is accurately detected.

*False positive (FP)*: the negative signal is detected with mistakes.

*True negative (TN)*: the negative signal is detected accurately.

*False negative (FN)*: the positive signal is detected with mistakes.

In statistical terms, the sensitivity is the number of TP cases divided by the sum of the true-positive and false-negative cases.

The confusion matrix summarizes the performance of the algorithms described in the field of artificial intelligence. Such a presentation is usually used for supervised learning algorithms, but it is also utilized in unsupervised learning. Each column of the matrix contains instances of a predicted class, and each row contains instances of an actual (true) class. The matrix gets its name from the fact that it makes it easy to see whether the model confuses classes. Outside artificial intelligence, this matrix is commonly called a contingency matrix or an error matrix.
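The metrics above can be illustrated on a toy binary example (labels and predictions are made up for the illustration, not drawn from the paper’s data):

```python
import numpy as np

# Toy binary classification: 1 = positive (patient), 0 = negative (healthy)
y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 1, 0, 1, 0])

tp = np.sum((y_true == 1) & (y_pred == 1))  # positives detected correctly
fn = np.sum((y_true == 1) & (y_pred == 0))  # positives missed
tn = np.sum((y_true == 0) & (y_pred == 0))  # negatives detected correctly
fp = np.sum((y_true == 0) & (y_pred == 1))  # negatives flagged by mistake

sensitivity = tp / (tp + fn)   # TPR (recall)
specificity = tn / (tn + fp)
fpr = fp / (fp + tn)           # = 1 - specificity
print(sensitivity, specificity, fpr)  # 0.75 0.75 0.25
```

Sweeping a decision threshold over classifier scores and recomputing (fpr, sensitivity) at each step traces out the ROC curve described above.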

#### 4. Results and Discussion

##### 4.1. Data Collection

Multichannel EEG signals were captured using monopolar connections with earlobe reference electrodes [100]. The location of the electrodes over the scalp was set according to the 10-20 International Electrode Positioning System (i.e., Fp1, Fp2, F7, F3, Fz, F4, F8, T3, C3, Cz, C4, T4, T5, P3, Pz, P4, T6, O1, O2). In Figure 3, an example of EEG electrode placement is shown. The electrodes measure the weak electrical potentials in the microvolt range produced by brain activity. Recordings were conducted with closed eyes in a resting state. In this way, multiple brain regions can be assumed to be governed by the same hierarchical mechanism.

Data with a signal length of 300 seconds and sampling frequencies of 1024 and 256 samples per second were obtained. Only 180 seconds are extracted from each signal (i.e., from 60 to 240 seconds) to minimize EEG context artifacts, and each recording is converted to 256 samples per second. The sampling frequency, or sample rate, is the number of equally spaced samples per unit of time; for instance, 1024 equally spaced observations per second correspond to a sampling rate of 1024/second, or 1024 Hz. In this paper, 256 samples are used for each second. Therefore, the length of the full time series is 300 × 256 = 76,800 samples, and the interval from 60 to 240 seconds (of the 300-second signals) has length 180 × 256 = 46,080 samples, which is the final length of each signal. To summarize, the EEG recordings of human samples belonging to three categories have been used:

(1) Patients who have Alzheimer’s disease (AD)
(2) Patients who suffer from mild cognitive impairment (MCI)
(3) Healthy control (HC)

Figure 4 gives an example of extracted EEG recordings of 256 samples for each group.

Figures 5–7 show each category’s time-frequency analysis based on the continuous wavelet transform. This section addresses the EEG signal processing strategies used to extract the necessary quality information within clinical constraints and simultaneously determine patients’ status. The dispersion coefficient is a strong AD marker that can separate AD EEG from MCI and HC.

##### 4.2. Implementation of the Proposed Pattern Detection Method

In this research, a deep learning model was built for the automatic classification of normal and abnormal EEG signals. The recordings in the EEG database are separated into two sections: training and testing samples. The training data were used during the learning stage and the test data during the evaluation stage, with 80% of the data used for training and the remaining 20% for testing.

These data distributions were chosen randomly. For validation purposes, several dataset splits were used, since the model parameters are set in several steps. However, the experimental results are obtained with a specific random seed value to ensure that the model is reproducible and consistent. Figure 6 provides a thorough description of the data considered for this work. The block diagram of the process is shown in Figure 8. Based on this process, TD-PSD is first used for feature extraction from the input EEG signals; feature extraction produces seven feature values from each 256-sample EEG window.
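The seeded 80/20 split described above can be sketched as follows. The array shapes are illustrative (192 subjects × 7 TD-PSD features, matching the 64-per-class setup), not the paper’s actual data:

```python
import numpy as np

# Sketch of a reproducible 80/20 train/test split with a fixed random seed.
rng = np.random.default_rng(7)        # fixed seed -> reproducible split
X = rng.standard_normal((192, 7))     # 64 subjects per class x 3 classes
y = np.repeat([1, 2, 3], 64)          # 1 = MCI, 2 = AD, 3 = HC

idx = rng.permutation(len(X))         # random but seed-determined ordering
n_train = int(0.8 * len(X))
train_idx, test_idx = idx[:n_train], idx[n_train:]
X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]
print(X_train.shape, X_test.shape)  # (153, 7) (39, 7)
```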

Regarding this fact, the number of input variables for each person is reduced from 256 samples to 7 features. This not only reduces the number of inputs but also increases the speed of the classification process. Time-dependent signal values need to be transformed into meaningful values; these are the extracted features. Altogether, there are 64 persons for each category of MCI, AD, and HC, each described by seven features. Therefore, the input matrix for each category has 64 × 7 = 448 elements.

In the next step, the output features are used in the classification methods. For this purpose, three powerful traditional methods are used to classify the EEG signals and distinguish AD and MCI patients from HC (healthy) people: KNN, SVM, and LDA, all chosen from among established machine learning techniques. The classification results for these methods are shown in Figure 9, which presents the confusion matrix of each method. The labels (1, 2, and 3) indicate MCI, AD, and HC, respectively. In the confusion matrix, the green values show the number of persons out of 64 diagnosed correctly. Based on Figure 9(a), of the 64 patients who suffered from MCI, 51 (79.7%) are diagnosed correctly.

Figure 9: Confusion matrices of the KNN, SVM, and LDA classifiers (panels (a)–(c)).

Moreover, for people with AD, the sensitivity of the KNN is 71.9% (46 out of 64 patients), and 62.5% (40 subjects) of the healthy samples are detected correctly. The bottom row of the matrix gives the sensitivity of the method for each category, and the red percentages indicate the miss rate of each class. The right column of the matrix represents the precision of the KNN technique: 63.7% of the subjects diagnosed as MCI are classified correctly, and the remaining precision values are listed in the same column. Finally, the overall accuracy of each method appears in the lower-right corner of its matrix: 71.4%, 41.1%, and 43.8% for KNN, SVM, and LDA, respectively. In other words, KNN is more accurate than the other methods, while SVM and LDA yield very low accuracy. It is worth noting that a two-class problem (patients versus healthy subjects) generally yields higher accuracy than the three-class problem considered here.
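The per-class sensitivities and the overall accuracy quoted above follow directly from the diagonal of the confusion matrix, since each class contains exactly 64 subjects. A short check using the KNN counts from the text:

```python
import numpy as np

# Correctly diagnosed subjects per class for KNN (MCI, AD, HC),
# out of 64 subjects each, as reported in the text.
correct = np.array([51, 46, 40])
per_class_total = 64

sensitivity = correct / per_class_total            # per-class recall
accuracy = correct.sum() / (3 * per_class_total)   # overall accuracy

print(np.round(sensitivity * 100, 1))  # [79.7 71.9 62.5]
print(round(accuracy * 100, 1))        # 71.4
```

Precision, by contrast, needs the column sums of the matrix (how many subjects each class label was assigned to), which is why it must be read from the right column of Figure 9 rather than recomputed from these counts.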

In this paper, a novel deep learning process for the classification of EEG signals is introduced based on a CNN. The architecture of the presented method is illustrated in Figure 10. The input layer includes the following:
(i) seven features of 64 people for each category of MCI, AD, and HC;
(ii) a 4D input matrix [];
(iii) a 1D output matrix [].

In the output matrix, the labels 1, 2, and 3 denote MCI, AD, and HC, respectively. The classification process using the presented CNN architecture is shown in Figure 11; during training, the accuracy reached almost 100% and the loss value decreased to almost zero.
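To make the CNN idea concrete, the following is a toy NumPy forward pass over one 7-feature input vector; it is only a conceptual stand-in (the filter count, kernel width, and dense layer are invented for illustration, and the paper's actual layer sizes are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(signal, kernels):
    """Valid-mode 1D convolution bank: (n_kernels, k) filters over a 1-D signal."""
    k = kernels.shape[1]
    windows = np.lib.stride_tricks.sliding_window_view(signal, k)
    return windows @ kernels.T                 # shape: (len - k + 1, n_kernels)

def relu(a):
    return np.maximum(a, 0)

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

x = rng.standard_normal(7)                     # one subject: 7 TD-PSD features
kernels = rng.standard_normal((4, 3))          # 4 filters of width 3 (illustrative)
w_out = rng.standard_normal((3, 5 * 4))        # dense layer mapping to 3 classes

h = relu(conv1d(x, kernels))                   # (5, 4) feature maps
scores = softmax(w_out @ h.ravel())            # class probabilities: MCI, AD, HC
print(scores.shape, scores.sum())
```

The convolutional filters slide over the feature vector and share weights, which is what distinguishes this architecture from the fully connected classifiers (KNN, SVM, LDA act directly on the 7 raw features).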

The confusion matrix of the presented method, without uncertainty values, is shown in Figure 12. The accuracy of the CNN approach is 82.3%. In detail, 85% of the MCI patients, 89.1% of the AD patients, and 75% of the normal samples are diagnosed correctly. To compare the CNN model with the KNN, SVM, and LDA techniques, the ROC curves are depicted in Figure 13, which plots the true-positive (TP) rate against the false-positive (FP) rate based on the output classification scores. In this graph, a curve that reaches a higher TP rate at a lower FP rate is better than the others.

Moreover, the area under the curve (AUC) is one of the criteria for analyzing a classification method's performance. Based on Figure 13, the presented CNN outperforms the other methods, with KNN in second place, while LDA and SVM reach low AUC values (see Table 2). In conclusion, the presented CNN architecture is more accurate than the other classification methods; based on Table 2, the CNN approach of the presented architecture is the best method for diagnosing AD patients from EEG signals.
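For a three-class problem such as this one, AUC is typically computed one-vs-rest and averaged over the classes. A minimal sketch with scikit-learn's `roc_auc_score` follows; the score matrix is a small hypothetical example (in practice the scores come from each trained model):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical classifier scores for 6 subjects over the three classes
# (columns: MCI, AD, HC); each row sums to 1 as required for multiclass AUC.
y_true = np.array([0, 0, 1, 1, 2, 2])
scores = np.array([
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.3, 0.5, 0.2],
    [0.1, 0.2, 0.7],
    [0.2, 0.2, 0.6],
])

# One-vs-rest AUC averaged over the three classes
auc = roc_auc_score(y_true, scores, multi_class="ovr")
print(auc)  # 1.0 -- every class is ranked perfectly in this toy example
```

A value of 1.0 corresponds to a perfect ROC curve (TP rate 1 at FP rate 0); the closer each method's averaged AUC is to 1, the better its ranking in Table 2.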

#### 5. Conclusion

In this article, the TD-PSD approach is used to extract features from EEG signals of three groups of test samples: MCI, AD, and HC. The resulting features are fed to three conventional classification methods, KNN, SVM, and LDA, and the outcomes are recorded. Finally, a CNN architecture is provided for AD patient classification, and the findings are evaluated using performance measures. The data were recorded with a signal duration of 300 seconds and a sampling frequency of 1024 samples per second. To minimize EEG background artifacts, 180 seconds of each signal are retained (from second 60 to second 240), and the signals are downsampled to 256 samples per second. The EEG recordings of human samples belonging to the three groups (AD, MCI, and HC) have been summarized. Following the procedure, the TD-PSD method is first utilized to extract features from the input EEG signals; the result is seven feature values generated from 256 EEG samples. In the next step, these output features are used by the classification methods KNN, SVM, and LDA, all effective machine learning techniques.

Based on the findings, 51 of the 64 patients with MCI (79.7%) were correctly diagnosed, the KNN sensitivity for AD patients is 71.9% (46 out of 64), and 62.5% (40 individuals) of the healthy samples are classified appropriately. In addition, 63.7% of the patients diagnosed as MCI are categorized correctly. The KNN, SVM, and LDA methods reach accuracies of 71.4%, 41.1%, and 43.8%, respectively; in other words, KNN is more accurate than the other traditional methods. Then, a new EEG signal classification architecture based on a CNN is implemented. The accuracy of the CNN approach is 82.3%; in detail, 85% of the MCI cases, 89.1% of the AD cases, and 75% of the healthy population are correctly diagnosed. Based on performance, the presented CNN outperforms the other approaches, with KNN ranking second, while LDA and SVM reach low AUC values. As future work, it is recommended to modify the feature extraction with another EEG signal representation so that classification can be performed with fewer features, and to train it using the architecture provided.

#### Data Availability

The data are extracted from the following reference: "An integrated approach based on EEG signals processing combined with supervised methods to classify Alzheimer's disease patients."

#### Conflicts of Interest

The authors declare that they have no conflicts of interest.