Abstract

Epilepsy is a common electrophysiological disorder of the brain, detected mainly by electroencephalogram (EEG) signals. Since correctly diagnosing epileptic seizures by monitoring the EEG signal is tedious and time-consuming for a neurologist, a growing number of studies have addressed automated epileptic seizure detection (AESD). In this work, a novel supervised model is proposed for AESD. Initially, the EEG signals are collected from the Bonn University EEG (BU-EEG) database. Then, empirical wavelet transform and feature extraction (a combination of entropy, frequency, and statistical features) are applied to extract the features. Furthermore, a Siamese network is utilized to reduce the number of extracted features and obtain the most discriminative ones. These features are then used to classify seizure and non-seizure EEG signals with a support vector machine classifier. This paper examines the Siamese network’s contribution as a learning-based feature transformation in improving seizure detection performance. The numerical results confirm that the proposed framework can achieve perfect classification performance (100% accuracy). This approach can help doctors detect epileptic seizure activity and reduce their workload.

1. Introduction

An epileptic seizure is defined by the International League Against Epilepsy and the International Bureau for Epilepsy as a temporary event of signs or symptoms resulting from abnormal excessive neuronal activity in the brain [1]. According to the World Health Organization, approximately 65 million people of all ages currently have epilepsy worldwide, which makes it one of the most common neurological diseases globally [2]. Various technologies have been used to measure brain function throughout a seizure, including electroencephalography (EEG), functional MRI, PET scanning, single-photon emission computed tomography, and magneto-encephalography [3]. Among these technologies, EEG has maintained its place as the most widely used diagnostic approach. The problem of automated epileptic seizure detection (AESD) has been widely studied using machine learning algorithms that classify EEG signals into normal, interictal, and ictal brain activities. As represented in Figure 1, the whole process of AESD consists of four main steps: data acquisition, signal preprocessing, feature engineering, and classification. During the past decade, many studies have sought to develop AESD techniques by improving one or more of these steps [4]. In AESD, feature engineering aims to extract the most concise features from EEG signals and comprises two main stages, namely, feature extraction and feature reduction. There are many ways to extract features. Extracting more features can provide more information, which in turn can improve classification accuracy. However, it has been found that adding features beyond a certain point may lower classification accuracy due to the curse of dimensionality [5]. Nevertheless, at the beginning of the process, there is not enough knowledge about the EEG signal and the features extracted from it. 
Therefore, it is reasonable to extract many features from the EEG signal and then use dimension reduction approaches to choose an informative subset of features. Note that dimensionality reduction depends on the feasibility of mapping data points from the high-dimensional input space X to the low-dimensional feature space Y using a mapping function, which can be learned from the data so as to minimize the information lost when data points in X are projected to Y. This function can be linear, as is the case with principal component analysis (PCA) and other related techniques. However, a linear mapping cannot capture non-linear structure in the data, so the effectiveness of these approaches depends heavily on the function’s non-linearity. Hence, deep learning (DL) can offer new approaches designed specifically to perform non-linear feature reduction. To handle the abovementioned challenge, we exploit the Siamese network to reduce the dimensionality of handcrafted features and improve AESD accuracy. The main contributions of our paper can be summarized as follows. (i) An improved machine learning-based AESD method utilizing DRL-based feature reduction is proposed. (ii) The effect of varying the number of feature dimensions is investigated. Moreover, the best window size for segmenting the EEG signal is obtained experimentally. (iii) A feature extraction strategy is introduced that extracts numerous features by utilizing a time- and frequency-based statistical analysis of the original EEG signal and of the sub-bands decomposed by empirical wavelet transform (EWT). (iv) The Siamese network is constructed based on a multi-layer perceptron to extract the most informative features for improving classification performance. By exploiting the Siamese network for feature reduction, the performance of AESD improves significantly. 
The main idea behind Siamese networks is to process two inputs in parallel and train a double-input neural network with a contrastive loss function, which gives the ability to detect whether the two inputs are from the same class or not. The Siamese network is used to reduce features and extract the most informative and discriminative features based on representation learning. Since few parameters in the Siamese network need to be learned, acceptable results can be achieved with relatively little training data. To the best of our knowledge, this study is the first to suggest the Siamese network for feature transformation in AESD. The rest of the study is laid out as follows. In Section 2, the related works and the task background are discussed. Section 3 describes our proposed methodology in general and Siamese network-based feature transformation in detail. The experimental results of our framework are presented and discussed in Sections 4 and 5. Finally, Section 6 provides the conclusion and future work.

2. Related Works

Many research papers on epileptic seizure detection have been published in recent years to support automatic diagnosis systems that relieve clinicians of heavy tasks. The study in [6] shows the importance of pattern recognition methods in epileptic data sampling and collection by discussing and analyzing research done in this field. This study also shows that discrete wavelet transform (DWT) can play a significant role in enhancing the accuracy of epileptic seizure detection; moreover, the use of different pattern recognition methods with various features is discussed. The study in [7] investigated various DL methods, e.g., autoencoder (AE), recurrent neural network (RNN), and convolutional neural network (CNN), to detect epilepsy from different screening modalities such as MRI and EEG. Scientists can now use these analyses to better understand the information derived from these signals. The whole process of AESD has been summarized in [4], which aimed to examine the details of epileptic seizure detection using EEG signals. Similarly, in [8], the main steps of seizure prediction processes, including data collection, signal preprocessing, feature extraction, feature selection, classification, and performance evaluation, were summarized. A hybrid seizure detection framework that used both wavelet and Hilbert transforms was introduced in [9]. Jiang et al. proposed two new methods for AESD based on symplectic geometry decomposition and scattering transform in [10, 11]. Simulation results in [10] showed that features extracted based on symplectic geometry decomposition can be helpful in seizure detection with high accuracy and low computational cost. In [11], the ability of the scattering transform, which is an improved form based on a complex wavelet, in seizure detection was investigated. The exploration of FuzzyEn- and LogEn-based features in the scattering transform is the most significant contribution of that study. In [12], Turker Tuncer et al. 
proposed a method for AESD that utilizes a local senary pattern (LSP) to extract stable features from EEG signals. At first, 15,360 features are extracted; then neighborhood component analysis (NCA) is used to reduce the number of features to 256. Finally, support vector machine, K-nearest neighbor, quadratic discriminant analysis, and linear discriminant analysis are used as the classifiers. The study in [13] proposed a new approach for classifying epileptic phases based on the Fourier synchro-squeezed transform (FSST) of EEG signals. In this method, the absolute value of the FSST of the EEG signal is segmented into five sub-bands. Then, the gray-level co-occurrence matrix (GLCM) of each sub-band is considered as a feature. By concatenating the features of different sub-bands, the final feature vector is obtained. Finally, after choosing informative features by the infinite latent feature selection approach, the support vector machine (SVM) is utilized to classify EEG signals. The authors in [14] proposed a new method for detecting epileptic seizures from EEG signals using the tunable-Q wavelet transform (TQWT), which is used to decompose EEG signals into sub-bands. Then, a quadruple symmetric pattern is utilized to extract 256 features from the raw EEG signal and its sub-bands. After that, NCA selects the most significant features in the feature selection phase. Finally, K-nearest neighbor (KNN) is utilized as a classifier. The study in [15] applied complementary ensemble empirical mode decomposition (CEEMD) and extreme gradient boosting (XGBoost) to improve AESD. First, CEEMD was employed to split raw EEG signals into intrinsic mode functions. Subsequently, multi-domain features were extracted from the raw signals and the decomposed components. Then, the most discriminative features were selected by importance scores. Eventually, XGBoost was applied to detect an epileptic seizure. 
In [16], a hybrid IMF selection method combining four different approaches (energy, correlation, power spectral distance, and statistical significance measures) was developed. In that work, the effect of selected IMFs extracted by EMD and EEMD on the classification was investigated. Simulation results indicate that IMF selection affects classification and that EEMD offers a reliable approach for separating preseizure and seizure phases from EEG data. Mammone et al. [17] suggested a spatial-temporal analysis of absence epilepsy EEG records using permutation entropy (PE). In that paper, the permutation Rényi entropy (PEr) is defined parametrically for epileptic EEG analysis, and its parameters (order, delay time, and alpha) are fine-tuned against PE. PEr surpasses PE because the difference between interictal and ictal PEr levels is statistically significant and greater than with PE. PEr also outperformed PE as a classifier input for interictal vs ictal states. All the discussed state-of-the-art techniques extract numerous features from the EEG signal and its sub-bands and select the most informative features using a conventional feature selection approach. Here, to acquire higher accuracy, we utilize Siamese network feature reduction to extract the most informative features. Since the aim of this work is mainly dimension reduction, we did not investigate end-to-end differentiable solutions for AESD. Although these methods [18, 19] have acceptable performance, they are outside the scope of this paper.

2.1. Feature Selection and Feature Transformation

Dimensionality reduction has a significant impact on the AESD system’s performance. There are various ways to decrease dimensionality, all of which try to find the most discriminative features that can distinguish different classes. In general, dimensionality reduction methods are categorized into two main groups: feature selection and feature transformation, also known as subspace learning. The main idea behind feature selection methods is eliminating redundant and highly dependent features via different criteria, which can effectively improve learning performance by increasing accuracy and decreasing the execution time of learning algorithms [20, 21]. On the other hand, feature transformation combines original features to construct a new and smaller set of features. Maximum relevance minimum redundancy (mRMR) [22], chi-square (CHI) [23], and odds ratio (OR) [24] are a few instances of feature selection methods. Principal component analysis (PCA) [25], latent semantic indexing [26], independent component analysis (ICA) [27, 28], and partial least square (PLS) [29] are some examples of feature transformation methods. Besides, considerable research has been conducted on improving seizure detection performance based on feature selection and dimensionality reduction. For instance, the authors in [30] proposed a new method for detecting epileptic seizures from EEG signals using an improved correlation-based feature selection method with a random forest classifier. The study in [31] proposed a new feature selection algorithm based on harmony search for AESD. In [32], a novel feature selection method based on a genetic algorithm was proposed. Moreover, in [33], a novel feature selection algorithm was proposed based on the combination of the Fisher score and p value feature selection schemes. 
The proposed method used the most extended common sequences to rank the features for classifying epileptic and non-epileptic EEG signals, and it considerably outperformed several standard feature selection methods.

2.2. Siamese Networks

Bromley et al. first introduced the Siamese neural network as a powerful method for solving an image matching problem [34]. Then, a new structure of Siamese neural networks was proposed in [35], the main idea of which was to find a similarity metric from input data by finding a map between input patterns and a target space so that a specific distance in the target space estimates the semantic distance in the input space [36]. More precisely, Siamese networks were trained using a “contrastive loss,” which creates a Euclidean space in which samples with the same class label are close to one another and samples with different class labels are far apart. Since then, many studies have investigated Siamese networks for different pattern recognition applications such as metric learning, drug discovery, protein structure prediction, speech representation, zero/one/few-shot learning, and person re-identification. In brief, the Siamese neural network is a form of representation learning, in which the created space facilitates the classification task. Simply put, the Siamese network is utilized for reducing high-dimensional features by learning the relationship between the pair of input data.

3. Methodology

As shown in Figure 2, the proposed method comprises two distinct systems: the offline training system, used to generate rules from EEG signals, and the online detection system, which uses those rules to classify new signals. The offline training system builds a classifier through the following steps. Preprocessing is the first step, which consists of time segmentation and noise removal. The next step is breaking up EEG signals into narrow sub-band signals by using EWT, after which some standard features are obtained from the EWT sub-bands. Then, ANOVA is used to select high-ranked features from all sub-band signals. Afterward, the proposed Siamese network is applied to transform the high-dimensional feature space into a more compact space. Finally, the transformed features and their corresponding labels are used to train the classifiers. In the online system, with slight differences, the specific features are extracted from test samples, and the class label is then estimated using the pretrained parameters. Ultimately, the performance of the classifier is assessed in terms of accuracy, sensitivity, and specificity.

3.1. Database

The proposed method is evaluated using the Bonn University EEG signal dataset, which was first reported by Andrzejak et al. [37]. This dataset consists of five subsets named Z, O, N, F, and S (also named A to E, respectively) with the characteristics described in Table 1. Each set has 100 single-channel EEG signals of 23.6 s duration, with a sampling frequency of 173.6 Hz, leading to 4097 samples for each signal. Examples of time series EEG signals from the five sets available in the Bonn dataset are depicted in Figure 3. Since this dataset consists of five subsets, various classification tasks could be studied, the most practical of which are shown in Table 2 [38]. In this work, four common cases have been considered for EEG signal classification: Case 1, Case 9, Case 11, and Case 12. Thus, a valid comparison of the proposed method with previous studies can be established. It should be mentioned that Case 12 is the most challenging type of seizure classification, since it involves discriminating between five classes. In this case, the major purpose of this study is to achieve high accuracy, as there has been little research on the five-class classification issue.

3.2. Preprocessing

In the preprocessing stage, the input EEG signal is first divided into segments of fixed length, with or without overlap, depending on the input database and final task. In this study, the effect of different segment lengths on the ultimate accuracy of seizure detection is also examined. The segment lengths investigated are 512 samples, 1024 samples, 2048 samples, and the entire period (4097 samples) of the EEG signal. Since the Bonn dataset’s EEG signals are entirely free of noise, there is no need to apply noise cancellation methods [37]. In experimentally acquired EEG, it is often necessary to eliminate undesired artifacts from the signals in order to retrieve valuable diagnostic data from them. For such a goal, a priori information about the nature of the artifacts is often required. In pervasive EEG systems, where the participants are free to move and consequently introduce a broad range of motion-related artifacts, artifact contamination of the EEG is much more prevalent. Because it is difficult to obtain a priori knowledge of the features of these artifacts, traditional artifact removal procedures are often ineffective. In this case, the method proposed in [39] can be practical for artifact suppression in pervasive EEG.
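The fixed-length segmentation described above can be sketched as follows. This is a minimal illustration, not the authors' MATLAB code: the function name `segment_signal` is hypothetical, the windows are assumed to be non-overlapping, and the trailing sample of a 4097-sample recording is assumed to be dropped so that the full-length window has 4096 samples.

```python
import numpy as np

def segment_signal(signal, window_length):
    """Split a 1-D signal into non-overlapping windows, dropping leftover samples."""
    signal = np.asarray(signal, dtype=float)
    n_segments = len(signal) // window_length
    trimmed = signal[: n_segments * window_length]
    return trimmed.reshape(n_segments, window_length)

# Stand-in for one 4097-sample recording: random values, not real EEG.
rng = np.random.default_rng(0)
recording = rng.standard_normal(4097)

for window_length in (512, 1024, 2048, 4096):
    segments = segment_signal(recording, window_length)
    print(window_length, segments.shape)
```

Under these assumptions, each recording yields 8, 4, 2, or 1 segments for the four window lengths.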

3.3. Signal Decomposition by Empirical Wavelet Transform

Because raw EEG signals are non-linear and non-stationary, decomposition algorithms are widely employed to decompose raw EEG data for better AESD performance. In many studies, time-frequency analysis has been employed for biological time series processing [40-42]. In this study, EWT is exploited to decompose each raw EEG signal into several sub-bands. Then, the raw EEG signals and the decomposed components are used for the subsequent feature extraction. The EWT, which was first proposed in [43], is a self-adaptive and powerful technique for the analysis of non-stationary signals. EWT aims to decompose a signal on wavelet tight frames that are built adaptively. Put simply, EWT extracts the various signal modes by constructing suitable adaptive wavelet filter banks. The necessary steps used by EWT are described below.

Step 1. Computing the Fourier spectrum $X(\omega)$ of the signal in the frequency range $[0, \pi]$ by applying a fast Fourier transform algorithm, and then finding all local maxima of $X(\omega)$ and deducing their corresponding frequencies $\omega_i$, where $x(t)$ is the input EEG signal and $i = 1, 2, \ldots, N$.

Step 2. Segmenting the Fourier spectrum into N portions by the boundary detection method. In this study, the total number of sub-band signals (N) for each EEG signal is fixed to 10. The boundary $\Omega_i$ of each segment can easily be computed by taking the middle of two consecutive local maxima, $\Omega_i = (\omega_i + \omega_{i+1})/2$. It should be mentioned that the first and last boundary frequencies are 0 and $\pi$, respectively, and the Fourier segments are $[0, \Omega_1], [\Omega_1, \Omega_2], \ldots, [\Omega_{N-1}, \pi]$.

Step 3. Defining an adaptive filter bank of m wavelet filters on the obtained segments. These wavelet-based band-pass filters are generated with the idea of Littlewood–Paley–Meyer wavelets [44]. To summarize, EWT decomposes a signal into N sub-band signals $x_1(t), x_2(t), \ldots, x_N(t)$, each of which is in fact a narrow-band modulation. Therefore, the Hilbert transform is utilized to define the instantaneous amplitude and frequency of each sub-band signal. As an example, the raw EEG signals and corresponding components decomposed using EWT from set F in the Bonn dataset are illustrated in Figure 4.
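The three steps can be illustrated with a simplified sketch. It is not a full EWT: the function names `ewt_boundaries` and `split_into_bands` are hypothetical, the strongest local maxima are assumed to be the ones retained, and Step 3 is replaced by ideal (brick-wall) band-pass masks rather than Littlewood–Paley–Meyer wavelet filters. The Hilbert-transform stage is omitted.

```python
import numpy as np

def ewt_boundaries(signal, n_bands):
    """Boundary frequencies (rad/sample, in [0, pi]) that split the spectrum into bands."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = 2.0 * np.pi * np.fft.rfftfreq(len(signal))
    # Step 1: interior local maxima of the magnitude spectrum.
    idx = np.arange(1, len(spectrum) - 1)
    is_peak = (spectrum[idx] > spectrum[idx - 1]) & (spectrum[idx] >= spectrum[idx + 1])
    peaks = idx[is_peak]
    # Assumption: keep the n_bands largest maxima, ordered by frequency.
    strongest = np.sort(peaks[np.argsort(spectrum[peaks])[::-1][:n_bands]])
    # Step 2: boundaries are midpoints of consecutive maxima, plus 0 and pi.
    midpoints = (freqs[strongest[:-1]] + freqs[strongest[1:]]) / 2.0
    return np.concatenate(([0.0], midpoints, [np.pi]))

def split_into_bands(signal, boundaries):
    """Step 3 stand-in: ideal band-pass masks instead of Meyer-type wavelet filters."""
    n = len(signal)
    spectrum = np.fft.rfft(signal)
    freqs = 2.0 * np.pi * np.fft.rfftfreq(n)
    last = len(boundaries) - 2
    bands = []
    for k in range(len(boundaries) - 1):
        low, high = boundaries[k], boundaries[k + 1]
        # The last band also includes its upper edge so every bin is used once.
        upper = (freqs <= high) if k == last else (freqs < high)
        bands.append(np.fft.irfft(spectrum * ((freqs >= low) & upper), n=n))
    return np.array(bands)

# Synthetic signal with three spectral peaks (not real EEG).
n = 1024
t = np.arange(n)
signal = (np.sin(2 * np.pi * 50 * t / n)
          + 0.5 * np.sin(2 * np.pi * 150 * t / n)
          + 0.25 * np.sin(2 * np.pi * 300 * t / n))
boundaries = ewt_boundaries(signal, n_bands=3)
bands = split_into_bands(signal, boundaries)
print(boundaries)
print(bands.shape)
```

Because the masks partition the frequency axis, the sub-bands in this sketch sum back to the original signal; a true EWT filter bank has smooth transition regions around each boundary instead of sharp cut-offs.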

3.4. Feature Extraction

Previous studies revealed that extracting features from several domains, including the time, frequency, and time-frequency domains, improves AESD accuracy more than using a single domain [2, 4, 45]. In this study, we propose a time-frequency analysis strategy using the EWT, which adaptively decomposes the EEG signal into sub-bands. It is worth mentioning that ten sub-bands are chosen empirically for further analysis. Then, 15 features are extracted from the raw EEG signal and from each of the decomposed sub-bands. These features are summarized in Table 3, all of which were comprehensively explained in [2]. Afterward, we concatenate the features of the different sub-bands and the raw EEG signal to obtain the final feature vector. Ultimately, 165 features (15 features from each of the raw signal and its ten sub-bands) are extracted. The procedure of the proposed feature extraction is demonstrated in Figure 5.
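The concatenation scheme can be sketched as below. The four statistics used here (mean, standard deviation, skewness, kurtosis) are illustrative stand-ins only and are not the 15 features of Table 3; the function names `band_features` and `feature_vector` are hypothetical.

```python
import numpy as np

def band_features(x):
    """Four illustrative statistics of one signal (not the Table 3 feature list)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    std = x.std()
    skewness = np.mean(centered ** 3) / std ** 3
    kurtosis = np.mean(centered ** 4) / std ** 4
    return np.array([x.mean(), std, skewness, kurtosis])

def feature_vector(raw, sub_bands):
    """Concatenate the features of the raw signal and of every sub-band."""
    signals = [raw, *sub_bands]
    return np.concatenate([band_features(s) for s in signals])

# Random stand-ins for one raw segment and its ten sub-bands (not real EEG).
rng = np.random.default_rng(1)
raw = rng.standard_normal(4096)
sub_bands = rng.standard_normal((10, 4096))
vector = feature_vector(raw, sub_bands)
print(vector.shape)
```

With 4 features per signal and 11 signals the vector has 44 entries; with 15 features per signal the same scheme gives 15 × 11 = 165.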

3.5. Feature Selection

Some of these features are likely to be irrelevant or redundant and therefore unsuitable for classification. It is thus necessary to eliminate these irrelevant features from the feature vector. In this regard, the extracted features are ranked according to their importance for epileptic seizure detection by using a feature ranking method. In the proposed model, out of the 165 features, the 30 highest-ranked features are selected by using analysis of variance (ANOVA). More specifically, ANOVA is applied to choose the most relevant features by computing the F-value of each feature and ranking the features accordingly. For instance, for Case 1, the F-values of all features are depicted in Figure 6, and the top 30 features are listed in Table 4 in descending order.
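A one-way ANOVA F-value ranking of this kind can be sketched with NumPy alone. The function names `anova_f_values` and `top_k_features` are hypothetical, and the demonstration data are synthetic, with five columns made informative on purpose.

```python
import numpy as np

def anova_f_values(X, y):
    """One-way ANOVA F-statistic of each column of X with respect to labels y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n_samples, n_classes = X.shape[0], len(classes)
    grand_mean = X.mean(axis=0)
    ss_between = np.zeros(X.shape[1])
    ss_within = np.zeros(X.shape[1])
    for c in classes:
        group = X[y == c]
        group_mean = group.mean(axis=0)
        ss_between += len(group) * (group_mean - grand_mean) ** 2
        ss_within += ((group - group_mean) ** 2).sum(axis=0)
    ms_between = ss_between / (n_classes - 1)
    ms_within = ss_within / (n_samples - n_classes)
    return ms_between / ms_within

def top_k_features(X, y, k):
    """Indices of the k features with the largest F-values, in descending order."""
    return np.argsort(anova_f_values(X, y))[::-1][:k]

# Synthetic data: 200 samples, 165 features, two classes (not real EEG features).
rng = np.random.default_rng(2)
y = np.repeat([0, 1], 100)
X = rng.standard_normal((200, 165))
X[:, :5] += 3.0 * y[:, None]          # make the first five columns informative
selected = top_k_features(X, y, k=30)
print(selected[:5])
```

A feature whose class means differ strongly relative to its within-class spread receives a large F-value and is ranked first.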

3.6. Siamese Network-Based Feature Transformation

In the proposed approach, the Siamese network is used to learn non-linear functions as the feature reduction method. As shown in Figure 7, the Siamese network consists of two identical subnetworks with the same architecture and shared weights. In our method, an MLP is considered as the backbone, which maps the input feature space to a lower-dimensional space, with p and q as the number of neurons in the input and output layers, respectively. Assume that $(X_1, X_2)$ is a pair of p-dimensional vectors of the input space, $G_W$ is the non-linear representation of the MLP network with weights $W$, which transforms each vector into a q-dimensional Euclidean space, $Y$ is a binary variable that is zero when the pair belongs to the same class and one otherwise, and $D$ is the distance between the pair of samples, obtained from the equation $D = \lVert G_W(X_1) - G_W(X_2) \rVert_2$, where $\lVert \cdot \rVert_2$ is the Euclidean distance. The aim is to find $W$ (by updating weights) such that:
(i) For $Y = 0$, the distance $D$ is minimized.
(ii) For $Y = 1$, $D$ becomes large enough, at least larger than $m$, where $m$ is the hyperparameter quantifying the minimum distance.

Hence, the contrastive loss function for one pair is defined as [46]

$$L(W, Y, X_1, X_2) = (1 - Y)\,\frac{1}{2} D^2 + Y\,\frac{1}{2} \left[\max(0, m - D)\right]^2.$$

The contrastive loss function thus tries to minimize the intraclass distance and maximize the interclass distance. To train the network efficiently, the data must be grouped into pairs that are either similar or dissimilar. Here, similar data are defined as having the same label, while dissimilar data have different labels. In the proposed structure, each subnetwork consists of five fully connected layers (an input layer, three hidden layers, and an output layer). Each layer is followed by an activation function, a dropout layer, and a batch normalization layer. The number of neurons in the input layer and output layer is determined by the selected feature dimension and the final feature dimension, which equal 30 and 2, respectively. The number of neurons in each hidden layer, however, is a hyperparameter that is obtained through trial and error. Here, numerous experiments were carried out to determine the number of neurons in each hidden layer (in this study, 20, 10, and 5 neurons are used for hidden layers 1, 2, and 3, respectively). In addition, mean squared error is used as the loss function, and Adam is applied as the optimizer. The learning rate is 0.0005, and the sampled mini-batch size B is 256. The details of the Siamese subnetwork are depicted in Figure 8.
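The pairing, the shared-weight forward pass, and the contrastive loss can be sketched as follows, with Y = 0 for same-class pairs and Y = 1 otherwise. This is a NumPy illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, ReLU is assumed as the hidden activation, the margin of 1.0 is arbitrary, dropout and batch normalization are omitted, and no training step is shown.

```python
import itertools
import numpy as np

def make_pairs(labels):
    """All index pairs (i, j) with Y = 0 for the same class and Y = 1 otherwise."""
    pairs = [(i, j, int(labels[i] != labels[j]))
             for i, j in itertools.combinations(range(len(labels)), 2)]
    return np.array(pairs)

def init_mlp(layer_sizes, rng):
    """Random weights for one subnetwork; both branches reuse this same list."""
    return [(rng.standard_normal((m, n)) * np.sqrt(2.0 / m), np.zeros(n))
            for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def embed(x, params):
    """Forward pass: ReLU on hidden layers (an assumption), linear output layer."""
    for k, (w, b) in enumerate(params):
        x = x @ w + b
        if k < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x

def contrastive_loss(z1, z2, y, margin=1.0):
    """Mean contrastive loss over a batch of embedded pairs."""
    z1, z2, y = np.asarray(z1, float), np.asarray(z2, float), np.asarray(y, float)
    d = np.linalg.norm(z1 - z2, axis=1)
    same = (1.0 - y) * 0.5 * d ** 2
    different = y * 0.5 * np.maximum(0.0, margin - d) ** 2
    return float(np.mean(same + different))

rng = np.random.default_rng(0)
features = rng.standard_normal((6, 30))      # six samples, 30 selected features
labels = np.array([0, 0, 0, 1, 1, 1])
params = init_mlp([30, 20, 10, 5, 2], rng)   # layer widths stated in the text
pairs = make_pairs(labels)
z1 = embed(features[pairs[:, 0]], params)    # branch 1
z2 = embed(features[pairs[:, 1]], params)    # branch 2, same weights
print(pairs.shape, z1.shape)
print(contrastive_loss(z1, z2, pairs[:, 2]))
```

Passing the same `params` list to both branches is what makes the two subnetworks identical; in a trainable framework the gradient from both branches would update this single set of weights.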

3.7. Classification and Performance Evaluation

To classify the selected features, we used an SVM classifier. SVM, which was proposed by Vapnik, is an advanced pattern classifier that separates two classes and was developed on the basis of statistical learning theory. By determining the best hyperplane with the maximum margin, SVM classifies data points and is also capable of disregarding outliers. For data points that are not linearly separable, the sample space is mapped to a high-dimensional feature space through a non-linear kernel function. Thus, the kernel function converts inseparable data into separable data. In the present work, different kernels (linear and non-linear, including quadratic, cubic, and Gaussian) are applied to determine the best hyperplane, and the results with the highest accuracy are shown. Besides, to evaluate the performance of the proposed method, accuracy (ACC), sensitivity (Se), and specificity (Sp) are utilized. Accuracy is the most commonly used measure for evaluating classification and is defined as the ratio of the number of correctly predicted examples to the number of all examples. Sensitivity is defined as the ratio of the number of correctly predicted positive examples to the number of all positive examples. Specificity is the ratio of the number of correctly predicted negative examples to the number of all negative examples. These performance metrics are formulated as follows:

$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \mathrm{Se} = \frac{TP}{TP + FN}, \qquad \mathrm{Sp} = \frac{TN}{TN + FP}.$$

True positive and true negative EEG signals, also known as TP and TN, indicate the number of correctly identified EEG signals and the number of correctly rejected EEG signals for each class. Also, false negative (FN) and false positive (FP) are the number of incorrectly rejected EEG signals and incorrectly identified EEG signals for each class.
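The per-class counts and the three metrics can be computed as in the sketch below, which treats one class as positive and all others as negative. The function name `one_vs_rest_metrics` and the toy labels are hypothetical.

```python
def one_vs_rest_metrics(y_true, y_pred, positive):
    """Accuracy, sensitivity, and specificity with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "ACC": (tp + tn) / (tp + tn + fp + fn),
        "Se": tp / (tp + fn),
        "Sp": tn / (tn + fp),
    }

# Toy labels for illustration only: "S" = seizure set, "Z" = healthy set.
y_true = ["S", "S", "S", "Z", "Z", "Z", "Z", "Z"]
y_pred = ["S", "S", "Z", "Z", "Z", "Z", "Z", "S"]
print(one_vs_rest_metrics(y_true, y_pred, positive="S"))
```

In this toy case TP = 2, FN = 1, FP = 1, and TN = 4, so ACC = 0.75, Se = 2/3, and Sp = 0.8. For a multi-class case, the function would be called once per class.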

4. Result

In the proposed method, k-fold cross-validation is used to ensure that the reported classification performance is free from bias. Yet the overall accuracy of a single k-fold cross-validation run might not be reliable. Therefore, k-fold cross-validation is repeated T times in every classification phase, and the mean and standard deviation of the results are estimated. In this study, we set T = 20 and k = 5. To simulate the proposed model, we used the MATLAB programming environment on a personal computer (PC). As mentioned in the methodology section, we first divide each EEG signal into short segments, and each segment is decomposed into sub-bands. Various features are then extracted from the sub-bands, and a two-stage feature reduction method is applied to select the most concise features. After that, SVM is used to classify the selected features. Finally, k-fold cross-validation is employed to examine the performance of the proposed model. To cover all steps, the simulation results are obtained for different scenarios. At first, the applicability of the method is evaluated for several window lengths. Then, we show the impact of the feature engineering steps (number of sub-bands, number of features, and dimensionality reduction) on the classification performance. Finally, the proposed model is compared with the latest techniques.
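The repeated cross-validation protocol (T = 20 repetitions of 5 folds) can be sketched with the standard library alone. The helper names `repeated_kfold_indices` and `summarize` are hypothetical, the splits are assumed to be plain random (not stratified by class), and the scores passed to `summarize` are made-up numbers used only to show the aggregation.

```python
import random
import statistics

def repeated_kfold_indices(n_samples, k=5, repeats=20, seed=0):
    """Yield (train_idx, test_idx) for `repeats` independent shuffled k-fold splits."""
    rng = random.Random(seed)
    for _ in range(repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        folds = [order[i::k] for i in range(k)]
        for i, test_idx in enumerate(folds):
            train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield train_idx, test_idx

def summarize(scores):
    """Mean and sample standard deviation of the per-fold scores."""
    return statistics.mean(scores), statistics.stdev(scores)

splits = list(repeated_kfold_indices(n_samples=100, k=5, repeats=20))
print(len(splits))                      # number of train/test splits generated
print(summarize([0.98, 1.00, 0.99]))    # made-up scores, for illustration only
```

Each repetition reshuffles the samples before splitting, so the 20 repetitions give 100 different train/test partitions whose scores are averaged.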

4.1. Impact of Feature Selection Step on Classification Performance

In this section, we examine the effect of the feature selection step on the final performance of the AESD. For this purpose, we first rank the obtained 165 features using ANOVA (as shown in Figure 6), and then at each step, we consider the k top-ranked features for classification. The classifier’s accuracy for different numbers of selected top-ranked features in the four cases (Case 1, Case 9, Case 11, and Case 12) is shown in Figure 9. The performance in all experimental cases first improves as more top-ranked features are selected and then remains almost steady up to k = 35; after that, the accuracy degrades due to the curse of dimensionality. It is worth mentioning that a window length of 4096 samples is considered in this experiment. In order to increase the accuracy of the AESD, after selecting the thirty top-ranked features, we propose to utilize the Siamese network to combine these features and extract the most informative ones.

4.2. Impact of Feature Engineering Step on Classification Performance

To examine the importance of feature engineering, we analyzed the final classification results with different feature dimensions. For this purpose, we considered four different settings. In the first setting, we extracted only the 15 abovementioned features from the original signal and performed classification on them. In the second setting, we decomposed the signal into 10 sub-bands using EWT and extracted 165 features in total, which were sent to the classifier for evaluation. In the third setting, we chose the 30 most informative features from the 165 features and studied the classification results. In the final setting, we considered the result of the proposed model, which is introduced in Figure 2. The results of these four settings are summarized in Table 5.

4.3. Effect of Various Lengths of the EEG Signal Segment

The proper choice of segment length is critical for obtaining good classification results in seizure detection research, so its effect on the classification results is examined here. In this regard, four window lengths of 4096, 2048, 1024, and 512 samples were used to divide each original EEG recording into 1, 2, 4, or 8 short segments, respectively. Table 6 demonstrates the effect of different segment lengths on the final accuracy, sensitivity, and specificity of seizure detection.

5. Discussion

5.1. Examination of Parameter Variation on the Performance

From Table 6, it can be inferred that increasing the signal length improves the classification in terms of accuracy, sensitivity, and specificity. The experimental results indicate that segments of 4096 samples tend to yield better classification results than segments of 512 and 1024 samples, which can be attributed to the direct association between signal length and the information available for classification: an increase in the former leads to an increase in the latter. Besides, for Case 1, the best possible performance has been achieved even with the shortest signal length, which indicates the robustness of our proposed method with respect to decreasing signal length. It is also worth mentioning that for the short window length, the accuracy, sensitivity, and specificity of the proposed model for Case 12 (the most challenging type) are 98.91%, 97.94%, and 99.05%, respectively, which is superior to previously existing methods [47, 48]. Table 5 illustrates the classification performance (accuracy, sensitivity, and specificity) for different feature dimensions and four scenarios. In the first step, 15 features were selected, which showed moderate performance in all metrics. In the next step, using EWT, the number of features was increased dramatically from 15 to 165. As Table 5 indicates, the accuracy, sensitivity, and specificity decreased drastically as a result of the curse of dimensionality. Simply put, the expectation that adding features would improve performance is not met because of overfitting. In the third step, 30 high-ranked features were selected by using ANOVA to address this issue. The result shows that accuracy, sensitivity, and specificity in Case 1 reached 100%, which is the best possible performance, and in Case 12, they reached 95.15%, 93.85%, and 96.75%, respectively. This improvement reflects the appropriate performance of the feature selection algorithm. 
In the final step, the Siamese network-based dimensionality reduction method is applied to combine features and obtain the most discriminative features, which is the main idea of this work. Overall, it is evident that accuracy, sensitivity, and specificity for all cases increase significantly in the proposed method while using only two features. For instance, for Case 12, accuracy, sensitivity, and specificity have improved by 4.15%, 5.79%, and 2.37%, respectively, compared to Step 3.

5.2. Comparison with Recent State-of-the-Art Works

The performance of the proposed method was compared with other existing state-of-the-art methods using the same dataset. As shown in Tables 7 and 8, the proposed method shows superior performance in terms of accuracy, sensitivity and specificity over the state-of-the-art methods.

6. Conclusion

In this work, we proposed a novel AESD model based on the Siamese network. We constructed a Siamese network based on a multi-layer perceptron to extract the most informative features for improving classification performance. A Siamese network-based dimensionality reduction method for AESD was designed and evaluated under different scenarios. The results showed the effectiveness of the approach for binary (Case 1) and multi-class (Cases 11 and 12) classification and for unbalanced datasets (Case 9). The final results suggest a promising application of Siamese architectures in classification tasks. In the near future, we will extend our model to larger datasets with more epileptic patients to improve its generalization ability. Furthermore, an advanced triplet network model can be considered for better results.

Data Availability

The dataset analyzed during the current study is available in https://ebrary.net/59044/education/details_public_databases.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

Ramin Barati and Hamid Azad contributed equally to this work.