Abstract

Automatic epileptic seizure detection technologies for clinical diagnosis mainly rely on electroencephalogram (EEG) recordings, which are immensely useful tools for locating and identifying epileptic activity. Traditional seizure detection methods based only on single-view features have great limitations for EEG signals, which are typically dynamic and nonlinear. The objective of this paper is to investigate the effect of multiview feature selection and multilevel spectral analysis methods on the identification of EEG signals for seizure detection. Multiview features are extracted from the time domain, the frequency domain, and information theory to collect adequate information from the EEG signals, and a feature selection algorithm based on particle swarm optimization (PSO) is proposed for automatic seizure detection. Moreover, because EEG signals contain different frequency components, they are divided into four kinds of brain waves for multilevel spectral analysis, and the effect of these four rhythm waves on seizure detection is compared. Three well-known classifiers are employed to classify EEG signals as seizure or nonseizure events. The results show that the average accuracy, specificity, and sensitivity of classification on the CHB-MIT database are 98.14%, 98.64%, and 96.79%, respectively. Applying the PSO-based feature selection method improves accuracy by 5.99% with the SVM classifier. Compared with state-of-the-art methods, the proposed method achieves superior performance for automatic seizure detection. It is further shown that feature selection is an indispensable step in seizure detection. With PSO-based feature selection and multilevel spectral analysis, the wave in the frequency range of 4-7 Hz shows better performance in the identification of EEG signals and is more suitable for the proposed method. The PSO-based feature selection algorithm for automatic seizure detection can be a useful assistant tool for clinical diagnosis.

1. Introduction

Epilepsy is a common neurological disease caused by abnormal brain discharge that affects about 1% of the world population. About 35% of patients with epilepsy can be controlled with medication. The other 8-10% can be treated with surgery. There is no feasible treatment strategy for the remaining patients [1]. The symptoms of epileptic seizures, accompanied by sudden and periodic brain dysfunction, reflect the clinical state of excessive abnormal activity of brain neurons. The electroencephalogram (EEG) can record this process and detect electrical activity in the brain [2, 3]. The difference between the EEG recordings of healthy people and those of people with epilepsy can be found by reading the recordings. This difference mainly reflects the various EEG patterns in the brain, including rhythmic waves of various frequencies, multiple spikes, and complexes of spikes and sharp waves. Reading short-term EEG recordings is easy for experts, but reading long-term EEG recordings is a time-consuming task. Automatic seizure detection technologies can effectively help clinical experts diagnose epilepsy.

Automatic seizure detection methods have attracted increasing attention from researchers in recent years. Early automatic seizure detection algorithms were mainly based on empirical criteria [4, 5]. These methods may be insensitive to the identification of seizures. Subsequently, seizure detection methods based on single-view features emerged and used frequency-based feature extraction methods to train classifiers [6-9]. Due to the nonlinear characteristic of EEG signals, frequency-based analysis methods had significant limitations and could not effectively differentiate seizure and nonseizure sequences. Tzallas et al. used the time-frequency (TF) analysis method to obtain energy by calculating the probability density function (PDF) of different TF distributions and used the energy as a characteristic to classify the EEG signals. The classification results verified that the TF analysis method was effective for automatic seizure detection [10]. Zhang et al. constructed an EEG montage and a multiple feature extractor for the time series and different frequency components of the signals [11]. Ech-Choudany et al. designed a TF analysis framework based on dissimilarity for seizure detection [12]. The experimental results showed that it achieved high accuracy for various classification problems (including 2-, 3-, and 5-class problems). The TF method provided joint information of multiview features from the time and frequency domains, achieving better performance than the early methods that only considered single-view features.

Several feature extraction methods have been proposed for EEG analysis. They are divided into four domains: time-domain analysis, frequency-domain analysis, TF analysis, and nonlinear methods. In [13], the singular value decomposition and the successive spike interval analysis were used to obtain the low and high frequency features of the signal, respectively. The extracted features were then fed into separate artificial feedforward neural networks for classification and each of the networks included two hidden layers. In [14], 21 features were extracted in each 2 s EEG segment, including time domain, frequency domain, and nonlinear features based on entropy. These features were then inputted into linear discriminant analysis (LDA) for classification. Thomas et al. extracted 55 features from each 8 s EEG segment and used the Gaussian mixture model to detect seizure [15]. Ahmed et al. and Temko et al. used the same set of features and fed these features into the support vector machine (SVM) classifier with radial basis function (RBF) and Gaussian dynamic time warping [16, 17]. Zwanenburg et al. obtained 5 features and used SVM to detect whether epilepsy occurred in a newborn [18]. Logesparan et al. summarized 97 shortlisted publications for comparison and obtained 65 features to evaluate the discriminative performance of these features [19]. When the number of extracted features is large, the dimension of the feature vector will increase, resulting in high computational cost for the identification of seizures. The feature selection method can reduce the dimension of features and improve the accuracy of classification.

The purpose of feature selection is to select a subset of relevant features. Existing work has rarely considered the effect of feature selection on classification performance for seizure detection. Some features extracted from the EEG signals are redundant or irrelevant. The feature selection method can remove redundant information and prevent the loss of useful information. The advantages of feature selection methods include the following: (1) a simplified model that is easier for researchers or users to interpret [20]; (2) shorter training time [21]; (3) reduction in dimensionality [22]; (4) improvement of the compatibility of learning models for classification data [23]; (5) encoding of the inherent symmetries in the input space. Feature selection includes two aspects: a search method for new feature subsets and an evaluation metric that scores each subset. According to the evaluation metric, feature selection algorithms can be divided into three categories: wrapper, filter, and embedded methods [24]. Wrapper methods use a predictive model to score a subset of features. Each new subset is used to train a model, which is then tested on the validation set. The feature subset is scored by calculating the number of errors (that is, the error rate of the model) on the validation set. Such methods tend to find the optimal feature set for a particular type of model. Filter methods score subsets of features with proxy metrics instead of the error rate of a model. They are also used as a preprocessing step for wrapper methods, so that wrapper methods can still be used when the problem is too complex. Embedded methods try to combine the advantages of both previous approaches. They rely on the variable selection process built into the learning algorithm and perform feature selection and classification simultaneously. The computational complexity of embedded methods tends to fall between that of wrapper and filter methods. In this paper, the wrapper method is used to find the best feature set for seizure detection.

Particle swarm optimization (PSO) is a population-based search algorithm first proposed by Kennedy and Eberhart [25, 26]. It does not use evolutionary operators for individuals but regards each of them as a particle (point) without volume in the D-dimensional search space. A particle flies at a certain speed in this space and adjusts its position and velocity dynamically according to its own experience and the experience of its companions [27]. The PSO algorithm is powerful and easy to implement with high computational efficiency. Due to these advantages of the PSO algorithm, it has evolved into a simple and effective optimization algorithm [28, 29]. In this paper, the PSO-based feature selection algorithm is used to optimize the extracted features. The influence of the feature vectors before and after optimization on the classification of EEG signals is explored.

Several classification strategies are used in seizure detection. SVM classifiers are perhaps the most popular classifiers and are widely used in automatic seizure detection [30-33]. An SVM classifier finds a hyperplane in the sample space based on the training set and separates the samples into different classes. Finding the partitioning hyperplane with the maximum margin yields robust classification results and produces the best generalization to unseen samples. Random forest (RF) classifiers are a kind of ensemble learner that consists of multiple independent decision tree learners [34, 35]. They sample the training data randomly, so that the samples show diversity, which can effectively increase the generalization of the final ensemble. Based on some distance measurement, $k$-nearest neighbor ($k$-NN) classifiers find the $k$ samples in the training set that are closest to a given sample and then make predictions based on the information of these neighbors [36]. In the feature selection algorithm, the corresponding classifier is used to train the model and obtain the best subset of features for each patient. The test set is then predicted to evaluate the performance of the classifier.

Many feature extraction methods have been proposed for seizure detection. However, feature fusion and selection still affect the final identification performance. In [19], the authors focused on selecting an optimal set of features by comparing the classification performance of single features. In [37], the authors provided nine statistical features in the time domain to solve the problem of unbalanced EEG data sets. It has also been pointed out in [27] that optimized features improve the performance remarkably. Hassan et al. extracted three features by using empirical mode decomposition and selected the most significant feature for the classification [38]. However, the features they used for automatic feature selection came from a single domain, which does not capture adequate information from the EEG signals. In this paper, fused multiview features are provided to obtain adequate information on the EEG signals. Furthermore, a PSO-based feature selection algorithm is proposed for automatic seizure detection. The algorithm outputs an optimal set of features to improve the identification performance of seizure detection with three common classifiers.

In addition, epileptic seizures may be easier to discriminate in certain frequency bands. EEG signals from epileptic patients are nonstationary with time-varying frequency components. For the multilevel spectral analysis, the EEG signal is divided into four frequency bands: δ (0.4-4 Hz), θ (4-8 Hz), α (8-12 Hz), and β (12-30 Hz). The relevance of these bands for seizure detection is then analyzed. In this paper, a PSO-based feature selection method is used for automatic seizure detection to classify EEG signals. The framework of the proposed method is depicted in Figure 1. The innovation of this work is to employ PSO-based feature selection to find the optimal feature set among multiview features and to improve the classification accuracy of automatic seizure detection with EEG signals. In addition, the multilevel spectral analysis method is used to find which frequency band is better suited for seizure detection. The proposed method can optimize the feature vector and improve the identification performance on EEG signals. The main contributions of this paper are as follows:
(1) Multiview features are extracted to provide joint information about the signals, achieving better performance than methods that only consider single-view features. To obtain adequate information about the EEG signals, we extract multiview features from the time domain, the frequency domain, and information theory. In addition, a PSO-based feature selection method is proposed for automatic seizure detection to improve the classification performance.
(2) Because EEG signals contain different frequency components, they are divided into four kinds of brain waves. The multilevel spectral analysis method is used to improve the classification performance.
(3) Three well-known classifiers are employed to classify EEG signals as seizure or nonseizure events with the optimal feature subset.

The content of this paper is organized as follows. Section 2 introduces the database used to perform the simulation experiments and the preprocessing method, and describes the feature extraction and classification of EEG signals. The experimental results and the analysis are presented in Section 3. Section 4 discusses comparisons with state-of-the-art seizure detection methods on the same database. Finally, the conclusions are given in Section 5.

2. Methods and Materials

2.1. CHB-MIT EEG Database

The public CHB-MIT database provided by Boston Children’s Hospital is used for the proposed method. The database contains long-term EEG recordings from 23 pediatric patients with intractable epilepsy [39-41]. Data acquisition is performed by placing electrodes on the scalp of patients. The placement of electrodes on the scalp follows the international standard 10-20 system (see Figure 2). The database collects records of 23 patients between the ages of 1.5 and 22. A detailed description is given in Table 1. There are 24 cases in the database, each of which contains 9 to 42 consecutive files from a single subject. The 24 cases include a total of 664 files and 198 seizure events. All EEG signals are sampled at 256 Hz with 16-bit resolution. Most of the files contain 23 channels (24 or 26 in some cases). At the same time, due to the continuity of the electrode montages, we cannot read the data of some channels in chb4, chb6, chb12, and chb16. Therefore, the data for these four cases are removed.

2.2. Preprocessing of EEG Signals
2.2.1. Segmentation

To train the classification model, 4352 segments are selected from 20 patients. Taking into account the long-term characteristic of EEG signals, the data of the ictal and interictal periods are segmented with a sliding window of 8 s with 50% overlap. Segments obtained from the ictal periods are labeled as positive samples, and segments obtained from the interictal periods are labeled as negative samples. Because the total duration of the interictal periods is much longer than that of the ictal periods, the raw data are heavily biased towards negative samples. To avoid the impact of this bias on classification, the negative samples are randomly subsampled so that the numbers of positive and negative samples are equal. Specifically, all seizure segments of each patient are used to train the model, and the same number of nonseizure segments is selected at random.
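The segmentation and balancing steps described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function names, the `(channels, samples)` array layout, and the fixed random seed are assumptions made for the example.

```python
import numpy as np

def sliding_windows(signal, fs=256, win_s=8, overlap=0.5):
    """Split a (channels, samples) array into overlapping windows."""
    win = int(win_s * fs)                  # 8 s at 256 Hz -> 2048 samples
    step = int(win * (1 - overlap))        # 50% overlap -> hop of 1024 samples
    starts = range(0, signal.shape[1] - win + 1, step)
    return np.stack([signal[:, s:s + win] for s in starts])

def balance_classes(ictal, interictal, seed=0):
    """Randomly undersample interictal windows to match the ictal count."""
    rng = np.random.default_rng(seed)
    keep = rng.choice(len(interictal), size=len(ictal), replace=False)
    X = np.concatenate([ictal, interictal[keep]])
    y = np.concatenate([np.ones(len(ictal)), np.zeros(len(ictal))])
    return X, y
```

With a 32 s recording, for instance, the 8 s window and 4 s hop yield seven windows per channel set.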

2.2.2. Filtering

Before applying the proposed seizure detection method, it is critical to filter the signals to remove unwanted noise components from the original EEG signals [42]. For the CHB-MIT database, most epileptiform discharges are below 32 Hz, so the frequency band commonly used for epilepsy diagnosis is 0.01-32 Hz [43]. Therefore, a fourth-order Butterworth filter is used to filter the EEG signals and preserve the EEG data in that specific frequency band.
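A band-pass step of this kind could be written with SciPy as in the sketch below. Several details are assumptions rather than statements from the paper: the zero-phase (forward-backward) filtering, the use of second-order sections, and the interpretation of "fourth-order" as the `order` argument of `scipy.signal.butter` (for a band-pass design SciPy returns a filter of twice that order).

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def bandpass_eeg(x, fs=256.0, low=0.01, high=32.0, order=4):
    """Band-pass filter EEG along the last axis (hypothetical helper)."""
    # Second-order sections keep the design numerically stable for the
    # very low 0.01 Hz edge.
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    # Forward-backward filtering avoids phase distortion (an assumption here).
    return sosfiltfilt(sos, x, axis=-1)
```

Applied to a mixture of a 10 Hz and a 60 Hz sinusoid, the 10 Hz component passes almost unchanged while the 60 Hz component is strongly attenuated.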

2.2.3. Channel Selection

Due to the multichannel characteristic of EEG signals, it is necessary to perform channel selection [44-49]. Most of the cases in CHB-MIT contain 23 channels, but some cases contain 24 or 26. If all channels were used for classification, features could not be extracted uniformly across cases and the computational time would be high. Classification performance and computational time must be balanced by selecting the channels related to epileptic events in the EEG signals for further analysis.

In [46], the authors presented a machine learning based seizure prediction method for channel selection. They found that adaptively selected three to six channels were good enough for the EEG seizure prediction task. In this paper, five channels are selected for analysis of EEG signals. The channel selection method is described below.

The channel with the smallest standard deviation (SD) is selected first. In seizure segments, the amplitude becomes very high as the spike rate increases. However, unwanted artifacts or muscular movement can also increase the spike rate and amplitude in nonseizure segments. Channels with such artifacts have comparatively high SD, which makes it hard to distinguish seizure events in long-term EEG recordings. The remaining four channels are then selected according to mutual information (MI), a quantitative measure of the interdependency or similarity of two random variables. By computing the MI between the first selected channel and each of the other channels, the four channels with the highest MI with respect to the selected one are obtained. Specifically, the MI of two random variables $X$ and $Y$ is defined as [50]

$$I(X; Y) = H(X) + H(Y) - H(X, Y),$$

where $H(X)$ and $H(Y)$ are the entropy values of the random variables $X$ and $Y$, respectively, and $H(X, Y)$ is the joint entropy of $X$ and $Y$.
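The two-stage channel selection (lowest SD first, then highest MI with that channel) might look like the following sketch. The histogram-based MI estimator and its bin count are assumptions for illustration, since the paper does not state how the entropies are estimated.

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Histogram estimate of I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)

    def entropy(p):
        p = p[p > 0]                       # empty bins contribute nothing
        return -np.sum(p * np.log2(p))

    return entropy(px) + entropy(py) - entropy(pxy.ravel())

def select_channels(eeg, n_extra=4):
    """eeg: (channels, samples). Returns indices of the selected channels."""
    first = int(np.argmin(eeg.std(axis=1)))            # lowest-SD channel
    others = [c for c in range(eeg.shape[0]) if c != first]
    mi = [mutual_information(eeg[first], eeg[c]) for c in others]
    ranked = [others[i] for i in np.argsort(mi)[::-1][:n_extra]]
    return [first] + ranked
```

The returned list always starts with the lowest-SD channel, followed by the `n_extra` channels that share the most information with it.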

2.3. Feature Extraction

Feature extraction methods are highly useful in the automatic detection of epilepsy. Multiview fusion features will provide adequate information for EEG signals [51]. In this paper, time-domain, frequency-domain, and information theory features are extracted. Time-domain features are obtained from the original signals or from the first and second derivatives of the signals. Frequency-domain features are calculated by the power spectral density of each segment. On the basis of the random characteristic of signals, entropy in information theory is introduced to represent the characteristic information of signals.

2.3.1. Time-Domain Features

Time-domain features are obtained from the original signals or from the first and second derivatives of the signals. Amplitude and interval analysis are applied to extract statistical information in the time domain. For each segment, the maximum, minimum, mean, line length, variance, number of maxima and minima, and root mean square amplitude are computed from the samples of the segment to obtain the dimensional features. In addition, the variances of the first difference signal and the second difference signal are also considered.

Due to the time-varying nature of the EEG signals, some nondimensional features are used, namely the waveform factor, peak factor, pulse factor, margin factor, skewness, and kurtosis. Furthermore, the number of zero crossings of the original signal and of its first and second derivatives is also introduced. Nonlinear energy is used to predict seizures in adult epileptic patients and is calculated for each segment as [52]

$$\mathrm{NE}(n) = x(n)^2 - x(n-1)\,x(n+1),$$

where $x(n)$ is the $n$-th sample of the segment.

The Hjorth parameters are based on simple statistical calculations on the EEG [53] and include activity, mobility, and complexity. Autoregressive (AR) modelling methods are used to analyze the EEG signal [54]. In this paper, the errors of AR modelling with orders 1-9 are used. In total, 31 features are used in the time domain.
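A few of the time-domain features named above can be computed as in the sketch below. It covers only a subset of the 31 features, and two choices are assumptions: the nonlinear energy is averaged over the segment, and the derivatives are approximated by finite differences.

```python
import numpy as np

def time_domain_features(x):
    """Subset of time-domain features for one 1-D EEG segment."""
    d1, d2 = np.diff(x), np.diff(x, n=2)   # first and second differences
    activity = np.var(x)
    mobility = np.sqrt(np.var(d1) / activity)
    complexity = np.sqrt(np.var(d2) / np.var(d1)) / mobility
    return {
        "line_length": np.sum(np.abs(d1)),
        "rms": np.sqrt(np.mean(x ** 2)),
        # count sign changes between consecutive samples
        "zero_crossings": int(np.sum(np.signbit(x[:-1]) != np.signbit(x[1:]))),
        # x(n)^2 - x(n-1) x(n+1), averaged over the segment (assumption)
        "nonlinear_energy": np.mean(x[1:-1] ** 2 - x[:-2] * x[2:]),
        "hjorth_activity": activity,
        "hjorth_mobility": mobility,
        "hjorth_complexity": complexity,
    }
```

For a unit-amplitude sinusoid with angular frequency $\omega$ per sample, the nonlinear energy equals $\sin^2\omega$ at every sample, which makes a convenient sanity check.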

2.3.2. Frequency-Domain Features

In frequency domain analysis, the Fourier transform is used to convert the time domain into the frequency domain to obtain features from another perspective. Frequency-domain features are obtained by calculating the power spectral density of each segment, which can reflect the magnitude of signal components at different frequencies. Through the Fourier transform, the peak frequency, median frequency, center frequency, total power, frequency variance, and root mean square frequency are selected. Spectral edge frequencies (SEFs) are calculated as frequencies below which 80%, 90%, and 95% of the total spectral power reside. To obtain a detailed and focal analysis of the frequency, the segment is decomposed through 8 levels of decomposition with the Daubechies 4 wavelet.
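The spectral features listed above can be illustrated with a plain periodogram, as in the following sketch. The periodogram estimator, the helper name, and the omission of the wavelet decomposition are simplifications assumed for the example.

```python
import numpy as np

def spectral_features(x, fs=256.0):
    """Periodogram-based frequency features for one 1-D EEG segment."""
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    psd = np.abs(np.fft.rfft(x - np.mean(x))) ** 2   # unnormalized periodogram
    total = psd.sum()
    cum = np.cumsum(psd) / total                     # cumulative power fraction

    def sef(fraction):
        # lowest frequency below which `fraction` of the total power resides
        return freqs[np.searchsorted(cum, fraction)]

    return {
        "peak_frequency": freqs[np.argmax(psd)],
        "median_frequency": sef(0.5),
        "center_frequency": np.sum(freqs * psd) / total,
        "total_power": total,
        "sef80": sef(0.80), "sef90": sef(0.90), "sef95": sef(0.95),
    }
```

For a segment made of a 10 Hz sinusoid plus a half-amplitude 20 Hz sinusoid, the power ratio is 4:1, so the center frequency is $(10 \cdot 1 + 20 \cdot 0.25) / 1.25 = 12$ Hz.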

2.3.3. Information Theory

Based on the random characteristic of the signals, the entropy in information theory is introduced to represent the characteristic information of the signals. Entropy is regarded as a numerical measure method of stochastic signal and is widely used to analyze bioelectrical signals, such as EEG signals. In information theory, the entropy of signals is obtained as a feature which is depicted below.

The spectral entropy describes the relationship between the power spectrum and the entropy rate. It is calculated as

$$\mathrm{SE} = -\sum_{f} P_f \log P_f,$$

where $P_f$ is the power spectrum of the grouped EEG segment, normalized so that it sums to one.

The singular value entropy reflects the distribution obtained from the decomposition and transformation of the signal and represents the energy distribution of the singular values:

$$\mathrm{SVDE} = -\sum_{i} \sigma_i \log \sigma_i,$$

where $\sigma_i$ is the $i$-th (normalized) singular value obtained from the singular value decomposition of the segment.

The sample entropy measures the complexity of a time sequence by measuring the probability of generating new patterns in the signal. A large probability of new patterns yields a high complexity of the sequence [55]:

$$\mathrm{SampEn}(m, r) = -\ln \frac{A^{m}(r)}{B^{m}(r)},$$

where $m$ is the dimension of the segmented signals, $r$ is the similarity threshold, $B^{m}(r)$ is the probability of matching $m$ points for two grouped sequences with similarity tolerance $r$, and $A^{m}(r)$ is the probability of matching $m+1$ points for two grouped sequences with similarity tolerance $r$.

The wavelet entropy reflects the degree of disorder of EEG signals and can extract rhythms from nonstationary EEG signals. Through the wavelet energy, the relative energy relationship between the EEG signals is also obtained. Kumar et al. obtained a classification accuracy of 99.75% for normal and interictal EEG signals and 96.30% for ictal and interictal EEG signals by combining wavelet entropy features with a recurrent neural network for classification [56]. The wavelet entropy is calculated as

$$\mathrm{WE} = -\sum_{j} p_j \ln p_j,$$

where $p_j$ is the relative wavelet energy of the $j$-th subsignal.

Shannon entropy characterizes the amount of information contained in the EEG signals. The calculation of this feature is mainly intended for continuous time-varying signals:

$$\mathrm{ShEn} = -\sum_{i=1}^{N} p_i \log p_i,$$

where $p_i$ is the probability of the value $x_i$ in the EEG segment and $N$ represents the total number of records of the segment.
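Some of the entropy features described above can be sketched in a few lines of NumPy. Several details are assumptions made for this illustration only: base-2 logarithms for the Shannon-type entropies, a delay embedding of order 3 for the singular value entropy, and the common defaults $m = 2$ and $r = 0.2$ times the standard deviation for the sample entropy. The pairwise sample entropy below needs $O(n^2)$ memory, so it suits short segments or downsampled data.

```python
import numpy as np

def shannon_entropy(p):
    """Entropy (bits) of nonnegative weights, normalized to sum to one."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0] / p.sum()
    return -np.sum(p * np.log2(p))

def spectral_entropy(x):
    return shannon_entropy(np.abs(np.fft.rfft(x)) ** 2)

def svd_entropy(x, order=3, delay=1):
    # Delay-embed the segment, then use the normalized singular values.
    n = len(x) - (order - 1) * delay
    emb = np.stack([x[i * delay:i * delay + n] for i in range(order)], axis=1)
    return shannon_entropy(np.linalg.svd(emb, compute_uv=False))

def sample_entropy(x, m=2, r=0.2):
    x = np.asarray(x, dtype=float)
    tol = r * np.std(x)
    n = len(x) - m            # same number of templates for both lengths

    def matches(length):
        t = np.stack([x[i:i + length] for i in range(n)])
        d = np.max(np.abs(t[:, None, :] - t[None, :, :]), axis=2)  # Chebyshev
        return np.sum(d <= tol) - n                  # drop self-matches

    return -np.log(matches(m + 1) / matches(m))
```

A regular signal such as a sinusoid should score lower than white noise on all three measures, which is the property that makes them useful for separating rhythmic ictal activity from background EEG.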

In this paper, 46 features are extracted from each segment of the EEG signals: 31 time-domain features, 10 frequency-domain features, and 5 information theory features. The specific features are listed in Table 2. The MATLAB source code used to calculate the features is available at https://github.com/chloeqisun/multiViewFS/tree/master.

2.4. Classification

The previously extracted features are fed into the SVM to train the classifier model. SVMs mainly rely on two assumptions. The first is that transforming the data into a high-dimensional space allows a complex classification problem to be solved with a linear discriminant function. The second is that the training patterns near the decision surface provide the most useful information for classification, so SVMs rely on these patterns. The classification methods used in this paper consist of two steps: training and testing. The selected kernel of the SVM is the RBF. The number of features obtained from the multiview domains is large, and the data are not linearly separable, so a linear kernel is not suitable. In addition, the training data is further split for validation to avoid overfitting. The RBF kernel is defined as

$$K(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2}\right),$$

where $\sigma$ is the width of the kernel and $K$ is the kernel function, which corresponds to the dot product of the two samples $x_i$ and $x_j$ in the transformed space.
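As a small worked example of the kernel, the sketch below evaluates a Gaussian RBF kernel matrix, assuming the common parameterization $\exp(-\lVert x_i - x_j \rVert^2 / (2\sigma^2))$. The function name and the default width are illustrative only.

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    """Gaussian RBF kernel matrix between the rows of X and the rows of Y."""
    # Squared Euclidean distances via ||x||^2 + ||y||^2 - 2 x.y
    sq = (np.sum(X ** 2, axis=1)[:, None]
          + np.sum(Y ** 2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    # Clamp tiny negative values caused by floating-point rounding.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))
```

Two samples at unit distance with $\sigma = 1$ give $K = e^{-0.5} \approx 0.607$, while identical samples give $K = 1$.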

RF is an ensemble machine learning algorithm whose base learner is a decision tree. It is based on bootstrap sampling. In detail, if the feature dimension of each sample is $M$, the algorithm specifies a constant $m$, which is much smaller than $M$, and randomly selects subsets of $m$ features from the $M$ features. It then selects the optimal subset from these subsets by pruning the tree. This randomness makes the algorithm less prone to overfitting and gives it good robustness to noise. Each decision tree is a classifier, so for an input sample, $N$ decision trees produce $N$ results. RF integrates the classification results of all decision trees through voting, and the class with the most votes is regarded as the final output. RF is simple and easy to implement with low computational overhead. Compared with a single decision tree, it has higher accuracy and works efficiently on large data sets, which means it can handle data with high-dimensional features without dimensionality reduction.

$k$-NN is a supervised learning classification algorithm. Given a test sample, the algorithm finds the $k$ closest training samples in the training set based on some distance metric and then makes a prediction based on the information of these neighbors. Usually, the algorithm employs the voting method to give classification results according to the classes of the $k$ neighbors.
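The neighbor search and voting can be written compactly, as in the sketch below. The Euclidean distance, the binary 0/1 labels, and the default number of neighbors are assumptions chosen for the example. An odd number of neighbors avoids ties.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=5):
    """Majority-vote nearest-neighbor prediction for binary labels (0/1)."""
    # Pairwise Euclidean distances, shape (n_test, n_train)
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]       # indices of the k closest
    votes = y_train[nearest]                     # labels of those neighbors
    return (votes.mean(axis=1) > 0.5).astype(int)
```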

2.5. PSO-Based Feature Selection Algorithm for Seizure Detection

The phenomenon of feature redundancy occurs after the calculation of feature extraction with a high dimension. For this reason, a PSO-based feature selection algorithm is designed for automatic seizure detection. The algorithm first trains the original feature sets and selects the subset with the highest classification accuracy. Then, the obtained feature vector is inputted into the classifier for classification.

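The PSO is initialized as a group of random particles (random solutions). The optimal solution is then found through iteration. In each iteration, each particle updates itself by tracking two extreme values ($pbest$, $gbest$). After finding these two optimal values, the particle updates its velocity and position by the following formulas:

$$v_i^{t+1} = w\,v_i^{t} + c_1 r_1 \left(pbest_i - x_i^{t}\right) + c_2 r_2 \left(gbest - x_i^{t}\right),$$

$$x_i^{t+1} = x_i^{t} + v_i^{t+1},$$

where $v_i^{t}$ and $x_i^{t}$ are the velocity and position of particle $i$ at iteration $t$, $w$ is the inertia weight, $r_1$ and $r_2$ are random numbers in the range $[0, 1]$, and $c_1$ and $c_2$ are learning factors.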
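One iteration of the standard velocity and position update can be sketched as follows. The parameter values (`w`, `c1`, `c2`), the velocity clamp, and the restriction of positions to $[0, 1]$ are assumptions chosen for illustration. The paper does not report them in this section.

```python
import numpy as np

def pso_step(x, v, pbest, gbest, rng, w=0.7, c1=1.5, c2=1.5, v_max=1.0):
    """One PSO update for a swarm of shape (n_particles, n_dims)."""
    r1 = rng.random(x.shape)           # fresh random factors in [0, 1)
    r2 = rng.random(x.shape)
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    v = np.clip(v, -v_max, v_max)      # keep velocities bounded (assumption)
    x = np.clip(x + v, 0.0, 1.0)       # keep positions in the permitted range
    return x, v
```

A particle that already sits at both its personal best and the global best with zero velocity does not move, which is a quick way to check the update.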

Parameters:
$N$: population size;
$P$: the number of patients;
$k$: $k$-fold cross-validation;
$T$: max iteration of the PSO;
for $p$ in $P$: do
 for $j$ in $k$: do
  Data partitioning according to feature vectors of $p$:
  $D_{train}$, $D_{val}$, and $D_{test}$ represent the training, validation, and test sets, respectively;
  Initialization population: Initialize position and velocity of each particle within permissible range;
  while $t < T$ do
   for $i$ in $N$: do
    Conduct 10-fold cross-validation on $D_{train}$, and calculate average accuracy $Acc$;
    Evaluate the classification accuracy on $D_{val}$;
    Compute fitness according to (11);
    Update $pbest_i$ and the optimum $gbest$ of the swarm;
    Update the velocity and position of the particle $i$;
    Observe the validation accuracy; when the iteration achieves the best validation accuracy, the training will stop;
   Retrain and build the classifier on the training data based on the selected feature subset;
   Measure test accuracy on the test set $D_{test}$ via the trained classifier;
  Select the feature subset with best test accuracy;
Output: Optimal feature set;

In the beginning, 5-fold cross-validation has been used, and (5-1) folds of training data are further split for validation. The trained model is then tested with the fold that has been left aside for testing to find the optimal subset of features. And the ratio of training set, validation set, and test set is 50%, 30%, and 20%, respectively.

The PSO-based feature selection algorithm partitions the patient data set into training, validation, and test sets, represented by $D_{train}$, $D_{val}$, and $D_{test}$, respectively. $N$ is the number of particles and $T$ is the maximum iteration of the PSO. For each particle, the algorithm trains the model with the current position and computes the fitness according to the following formula:

$$\mathrm{fitness} = \alpha \cdot Acc + \beta \cdot \frac{|F| - |S|}{|F|}, \tag{11}$$

where $\alpha$ and $\beta$ are weighting parameters and $Acc$ is the accuracy of the classifier computed on $D_{train}$ with 10-fold cross-validation. The original feature size and the number of features selected by the proposed algorithm are represented by $|F|$ and $|S|$, respectively.
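To make the role of the two weights concrete, the sketch below assumes that the fitness is a weighted sum of the cross-validated accuracy and the fraction of discarded features, a form commonly used in wrapper-style PSO feature selection. This form, the weight values, and the 0.5 threshold used to turn a continuous particle position into a feature mask are all assumptions, not details taken from the paper.

```python
import numpy as np

def fitness(acc, n_selected, n_total, alpha=0.9, beta=0.1):
    """Weighted trade-off between accuracy and feature reduction (assumed form)."""
    return alpha * acc + beta * (n_total - n_selected) / n_total

def position_to_mask(position, threshold=0.5):
    """Map a continuous particle position to a boolean feature mask."""
    mask = position > threshold
    if not mask.any():                        # never return an empty subset
        mask[np.argmax(position)] = True
    return mask
```

For example, an accuracy of 0.95 obtained with 60 of the 230 features gives $0.9 \cdot 0.95 + 0.1 \cdot 170/230 \approx 0.929$ under these assumed weights, so a smaller subset with the same accuracy scores slightly higher.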

In Algorithm 1, the local optimal value and the global optimal value of each particle are updated according to Equation (10). Then, the algorithm updates the speed and position and turns to the next iteration. When the maximum iteration is reached, the algorithm stops and an optimal feature set is obtained. Since 5-fold cross-validation has been used, five feature subsets are selected. In this paper, the feature subset with the best test accuracy will be selected. Finally, the classifier uses this optimal set of features to evaluate the classification performance of the model.

3. Experimental Result

The proposed PSO-based feature selection method for seizure detection is implemented using MATLAB R2020a and Python 3.8 on a DELL Inspiron 14 with an 8th-generation Intel Core i5 processor and 8 GB of RAM. Because of its strong numerical computing capabilities, MATLAB is used for the signal processing that produces the feature vectors. Python is used to train the models for the identification of seizures.

The performance evaluation of the method is based on several evaluation metrics, including accuracy ($ACC$), precision ($PRE$), sensitivity ($SEN$), F1-measure ($F1$), specificity ($SPE$), negative predictive value ($NPV$), and area under curve ($AUC$). Accuracy is one of the most widely used performance metrics in the literature and is defined as the proportion of correctly classified samples among all samples. Precision is the proportion of samples predicted as seizure that are truly seizure. Sensitivity, also known as the true positive rate, is the proportion of seizure samples that are correctly classified as seizure. The higher the sensitivity, the lower the probability of missed detection. The F1-measure jointly considers precision and sensitivity. Specificity, or true negative rate, refers to the proportion of nonseizure samples that are correctly classified as nonseizure. NPV is the proportion of samples predicted as nonseizure that are truly nonseizure. The AUC is defined as the area under the receiver operating characteristic (ROC) curve. The ROC curve is based on a series of different cut-off values or thresholds and uses $SEN$ as the ordinate and ($1 - SPE$) as the abscissa. The $AUC$ can measure the performance of the classifier. These evaluation metrics are defined as follows:

$$ACC = \frac{TP + TN}{TP + TN + FP + FN}, \quad PRE = \frac{TP}{TP + FP}, \quad SEN = \frac{TP}{TP + FN},$$

$$F1 = \frac{2 \cdot PRE \cdot SEN}{PRE + SEN}, \quad SPE = \frac{TN}{TN + FP}, \quad NPV = \frac{TN}{TN + FN},$$

where $TP$ (true positive) is the number of seizure samples that are predicted as seizure, $TN$ (true negative) is the number of nonseizure samples that are predicted as nonseizure, $FP$ (false positive) is the number of nonseizure samples that are predicted as seizure, and $FN$ (false negative) is the number of seizure samples that are predicted as nonseizure.
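The threshold-based metrics follow directly from the four confusion-matrix counts, as the short sketch below shows. The function name and the dictionary keys are illustrative, and the AUC is omitted because it requires classifier scores rather than counts.

```python
def seizure_metrics(tp, tn, fp, fn):
    """Compute the count-based evaluation metrics from a confusion matrix."""
    pre = tp / (tp + fp)                  # precision
    sen = tp / (tp + fn)                  # sensitivity (true positive rate)
    return {
        "ACC": (tp + tn) / (tp + tn + fp + fn),
        "PRE": pre,
        "SEN": sen,
        "F1": 2 * pre * sen / (pre + sen),
        "SPE": tn / (tn + fp),            # specificity (true negative rate)
        "NPV": tn / (tn + fn),            # negative predictive value
    }
```

With 90 true positives, 95 true negatives, 5 false positives, and 10 false negatives, the accuracy is $185/200 = 0.925$, the sensitivity is 0.90, and the specificity is 0.95.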

The proposed PSO-based feature selection algorithm for automatic seizure detection can effectively identify seizure and nonseizure from the EEG signals. In the data preprocessing stage, the EEG signals are processed with filters to reserve only the frequency range commonly used for the diagnosis of epilepsy. Additionally, frequency bands commonly used for the diagnosis of epilepsy are classified into four different types of brain waves. The discrepancy of these four types of frequency band on the recognition of seizures from EEG signals is compared. At the same time, the effect of feature selection on seizure recognition is also studied.

3.1. Comparisons of the PSO-Based Feature Selection

To investigate the effect of the proposed PSO-based feature selection on seizure classification, we compare detection performance before and after feature selection. Features are extracted from the time domain, the frequency domain, and information theory. A total of 46 features are extracted per channel, and five channels are selected for each patient, so each patient has 46 × 5 = 230 features. Classifying each patient's EEG signals with these multiview feature vectors captures adequate information. However, the large number of features may lead to high computational cost, and redundant features can degrade classification performance. Therefore, the PSO-based feature selection algorithm is used to select the best feature subset for each patient, and the classification results before and after feature selection are compared. Three classifiers are applied to analyze classification performance. The results are recorded in Tables 3 and 4. Table 3 shows the classification results obtained without feature selection. The accuracy for most patients is above 90% with the multiview feature extraction method, which indicates that the extracted feature space is informative. However, owing to the heterogeneity of EEG signals across patients, the results differ considerably among patients. According to the AUC, which reflects classifier performance, RF obtains the best results among the three classifiers. RF is an ensemble algorithm that builds multiple decision trees and uses only a subset of the features, rather than all of them, to build each tree. Table 4 shows the results after using PSO-based feature selection.
The average results with PSO-based feature selection are better than those obtained without feature selection. As shown in Figure 3, classification accuracy improves markedly after the PSO-based feature selection algorithm is applied: SVM, RF, and k-NN improve in accuracy by 5.99%, 0.65%, and 4.78%, respectively. The algorithm thus yields its largest gains with SVM and k-NN. When the RF classifier trains a model, it always randomly selects a subset of the features rather than all of them, which can itself be regarded as a kind of feature selection; this explains why the RF results improve only slightly. Nevertheless, after using the PSO-based feature selection algorithm, accuracy, specificity, and sensitivity all improve. SVM obtains the best result among the three classifiers. A disadvantage of the RBF kernel of SVM is that it overfits easily, so the training data are further split for validation to avoid overfitting. The proposed algorithm achieves high sensitivity and satisfactory specificity in all 20 patients. Except for patients 3, 10, 11, and 14, the classification accuracy is above 95% with the SVM classifier, which also verifies the effectiveness of the algorithm for seizure detection in EEG signals.
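The idea of wrapping a classifier inside a PSO search over feature subsets can be sketched as follows. This is a generic binary PSO with a sigmoid transfer function, not necessarily the exact formulation used in this paper. The leave-one-out 1-NN fitness, the subset-size penalty, the toy data, and every parameter value (`n_particles`, `n_iter`, `w`, `c1`, `c2`, `penalty`) are assumptions chosen for illustration.

```python
import math
import random


def loo_1nn_accuracy(X, y, mask):
    """Leave-one-out 1-NN accuracy using only features where mask is 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty subset cannot classify anything
    correct = 0
    for i, xi in enumerate(X):
        best_dist, best_label = float("inf"), None
        for k, xk in enumerate(X):
            if k == i:
                continue
            d = sum((xi[j] - xk[j]) ** 2 for j in cols)
            if d < best_dist:
                best_dist, best_label = d, y[k]
        correct += best_label == y[i]
    return correct / len(X)


def binary_pso_select(X, y, n_particles=10, n_iter=15, w=0.7, c1=1.5, c2=1.5,
                      penalty=0.01, seed=0):
    """Search for a binary feature mask that maximises the fitness."""
    rng = random.Random(seed)
    n_feat = len(X[0])

    def fitness(mask):
        # Reward accuracy; lightly penalise large subsets (assumed trade-off).
        return loo_1nn_accuracy(X, y, mask) - penalty * sum(mask) / n_feat

    pos = [[rng.randint(0, 1) for _ in range(n_feat)] for _ in range(n_particles)]
    vel = [[rng.uniform(-1, 1) for _ in range(n_feat)] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(n_feat):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][j]
                     + c1 * r1 * (pbest[i][j] - pos[i][j])
                     + c2 * r2 * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-6.0, min(6.0, v))  # clamp the velocity
                # Sigmoid transfer: velocity -> probability that the bit is 1.
                prob = 1.0 / (1.0 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if rng.random() < prob else 0
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


# Toy data: features 0 and 1 carry the label, the other six are noise.
data_rng = random.Random(1)
y = [i % 2 for i in range(30)]
X = [[label * 3.0 + data_rng.gauss(0, 0.5),
      label * 3.0 + data_rng.gauss(0, 0.5),
      *[data_rng.gauss(0, 3.0) for _ in range(6)]]
     for label in y]

mask, score = binary_pso_select(X, y)
print("selected features:", [j for j, bit in enumerate(mask) if bit])
print("fitness:", round(score, 4))
```

In a patient-specific setting, `X` would hold the 230-dimensional feature vectors of one patient, and the fitness would use the validation accuracy of the chosen classifier (SVM, RF, or k-NN) instead of the toy 1-NN rule.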

3.2. Comparisons of the Number of Features

To analyze the number of features selected by the PSO algorithm, we recorded the optimal feature subset for each patient. The PSO-based feature selection algorithm proposed in this paper reduces the dimension of the feature vector and thus the computational time. The dimension of the original feature vector is 230. After the feature selection algorithm is executed, the average numbers of features for SVM, RF, and k-NN are 61, 103, and 98, respectively. Figure 4 shows the number of features selected for each patient. As can be seen from the figure, the number of features is reduced by feature selection. The number of features selected with the SVM classifier is lower than with the RF and k-NN classifiers, and the classification accuracy achieved by the SVM classifier is also higher. Therefore, the SVM classifier shows better performance for seizure detection with the proposed method. Feature extraction or deep processing accounts for most of the time in automatic seizure detection, so the run time of feature extraction is reported in this paper. Table 5 shows the impact of the selected features on the computational time for each 8 s segment. Without feature selection, computing the feature vector takes 14.71 s. Once the algorithm has found the optimal features, the computational time is reduced noticeably: the run time is 3.93 s for SVM, which is less than for RF and k-NN. It can be seen that the PSO-based feature selection algorithm not only reduces the computational time of the feature vectors but also improves the classification accuracy.
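To put the figures quoted above in proportion, the relative reductions for the SVM case can be worked out with a few lines of arithmetic. The helper name `relative_reduction` is an illustrative assumption; the input numbers are the ones stated in the text (230 original features, 61 selected on average, 14.71 s and 3.93 s per 8 s segment).

```python
def relative_reduction(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return 100.0 * (before - after) / before


# Values quoted in the text for the SVM classifier.
feature_cut = relative_reduction(230, 61)     # feature dimension
time_cut = relative_reduction(14.71, 3.93)    # feature run time per segment

print(f"feature dimension reduced by {feature_cut:.1f}%")
print(f"feature run time reduced by {time_cut:.1f}%")
```

Both reductions come out at roughly 73%, which is consistent with the run time of feature computation scaling approximately with the number of features retained.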

3.3. Comparisons of the Four Brain Waves

To examine the EEG signals in more detail, we use a multilevel spectral analysis method to study the effect of different types of brain waves on seizure identification. In the data preprocessing stage, filters are used to obtain four brain waves in different frequency ranges: the delta (δ), theta (θ), alpha (α), and beta (β) waves. The δ wave occurs during deep sleep. The θ wave is usually found in a subconscious state and appears in childhood; some adults under emotional stress and patients with brain disease also exhibit it. The α wave reflects the rhythm of neurons in the brain during wakefulness and rest. The β wave occurs in a state of awake consciousness when people experience stress or anxiety. For each of the four brain waves, the EEG data are first passed through the corresponding bandpass filter. Features are then extracted from the filtered signals and used, with the proposed PSO-based feature selection algorithm, for training and classification on each brain wave. The experimental results with the SVM classifier are shown in Table 6. In terms of accuracy, sensitivity, and specificity, the classification performance of the θ wave is better than that of the other brain waves and of the full 0.01-32 Hz range. The AUCs of all these frequency components are above 97%. Therefore, the SVM classifier can be a useful tool for identifying seizures from these waves. The θ wave achieves the best result, with an accuracy of 98.14%. The experimental results show that the wave in the frequency range of 4-7 Hz is more suitable for the proposed seizure detection method.
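The band-splitting step can be sketched with a zero-phase Butterworth bandpass filter from SciPy. The filter type and order are assumptions, since this section does not specify the filter design. Of the band edges, only the 4-7 Hz range is stated in the text; the other three are common textbook values assumed here. The 256 Hz sampling rate is that of the CHB-MIT recordings, and the test signal is synthetic.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 256  # Hz, sampling rate of the CHB-MIT recordings

# Band edges in Hz. Only theta (4-7 Hz) is given in the text;
# the remaining edges are assumed textbook values.
BANDS = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 7.0),
    "alpha": (8.0, 13.0),
    "beta": (14.0, 30.0),
}


def split_into_bands(signal, fs=FS, order=4):
    """Return one zero-phase band-passed copy of `signal` per brain wave."""
    waves = {}
    for name, (low, high) in BANDS.items():
        # Second-order sections are numerically safer for narrow, low bands.
        sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
        waves[name] = sosfiltfilt(sos, signal)
    return waves


# Synthetic 8 s segment: a 5 Hz (theta) plus a 20 Hz (beta) sinusoid.
t = np.arange(0, 8, 1 / FS)
x = np.sin(2 * np.pi * 5 * t) + np.sin(2 * np.pi * 20 * t)

waves = split_into_bands(x)
power = {name: float(np.mean(w ** 2)) for name, w in waves.items()}
for name, p in power.items():
    print(f"{name}: mean power {p:.3f}")
```

Each entry of `waves` would then be passed to feature extraction and PSO-based selection separately, so that the four rhythms can be compared on equal terms.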

4. Discussions

To further verify the generalizability and effectiveness of the proposed method, it is compared with state-of-the-art methods from the literature in Table 7. For a fair comparison, the same database is used for all experiments. The EEG signals in the CHB-MIT database are scalp recordings containing artifacts that affect seizure detection. To verify the effectiveness of the proposed automatic seizure detection method on long-term multichannel seizure EEG signals, the CHB-MIT database is used in the experiment, and the proposed method is employed to automatically identify seizure and nonseizure events in the EEG signals. Table 7 presents the results of the patient-specific performance evaluation alongside the state-of-the-art automatic seizure detection methods for the CHB-MIT database, where NR indicates that the value is not recorded. As can be seen from the results, the average sensitivity, specificity, and accuracy of the proposed method are better than those of most recent studies, achieving promising performance on this benchmark database.

Because the experimental settings differ, such as the number of patients and the number of channels, a direct comparison is difficult. For the comparison to be feasible, all evaluation results are taken from the same benchmark database. Kiranyaz et al. used multidimensional PSO to evolve a collective network. They achieved a sensitivity above 89% and a specificity above 93% [57]. Zabihi et al. extracted seven features from the intersection sequence by reconstructing the signal trajectories and obtained a sensitivity of 88.27% with a 25% training set [58]. Li et al. proposed a novel framework named CE-stSENet in which the sensitivity, specificity, and accuracy were 92.41%, 96.05%, and 95.96%, respectively [48]. Chandel et al. applied triadic wavelet decomposition for offset and onset detection to obtain feature sets which yielded higher classification performance [36]. Tian et al. first constructed multiview features and then used CNN to learn features from them. They obtained higher accuracy compared to single-view and common feature extraction methods [59]. Peng et al. used the Stein kernel-based sparse representation (SR) method to construct EEG signals in symmetric positive definite matrices, and the average accuracy reached 98.21%; the method could also be applied in real-time detection [49]. Chen et al. developed a framework to search for the optimal setting of the discrete wavelet transform and acquired promising performance [30]. Jiang et al. collected features from the decomposition of the symplectic geometry, and the average accuracy of the experimental result was 99.62% [31]. Based on the optimization of the features in our work, the sensitivity, specificity, and accuracy of the experimental evaluation of the data set on average are 98.21%, 98.57%, and 97.85%, respectively. They verified that the method can effectively classify long-term EEG signals. Hassan et al. extracted three features via empirical mode decomposition (EMD) to improve the performance of identification [38].

Existing work rarely considers the temporal and spatial information of the signal or the effect of feature selection on seizure detection. The advantages of this work are as follows: (1) adequate features are extracted from multiple domains, which improves classification performance for automatic seizure detection; (2) a PSO-based feature selection algorithm is proposed, and the results show that it not only reduces the computational time but also improves the classification accuracy; (3) different types of brain waves are distinguished through multilevel spectral analysis, and the results show that the θ wave is suitable for the proposed method. Furthermore, the optimal features selected by the PSO-based feature selection method improve classification accuracy, so feature selection plays a crucial role in seizure detection. The limitations of this work are as follows: (1) A larger EEG dataset containing more patients is needed. Since our model, which learns various patterns of epileptic seizures, is patient-specific, it cannot generalize to patterns across different patients. Transfer learning is an effective way to train a cross-patient model on a bigger dataset. (2) Artifact noise in the original signals prevents us from extracting effective features. In future work, denoising will be conducted before feature selection.

5. Conclusions

With the increasing demand from clinical applications, automatic seizure detection has become a frontier topic in the computer-aided diagnosis of neurological diseases. To improve classification efficiency, a channel selection method is used to reduce the computational time for multichannel EEG. Features of the EEG signals are extracted from the time domain, the frequency domain, and information theory to obtain adequate information about the signals. Then, an automatic seizure detection method based on PSO-based feature selection is presented. Furthermore, three well-known classifiers are used to identify seizure and nonseizure events in EEG signals. The experimental results demonstrate that the seizure detection method with PSO-based feature selection improves accuracy, specificity, and sensitivity by 5.99%, 4.75%, and 6.56%, respectively, with the SVM classifier. In addition, considering the different frequency components of the EEG signals, the signals are divided into four brain waves to test and analyze the effect of each component on seizure detection. The results show that the θ rhythm (4-7 Hz) is more suitable for the proposed automatic seizure detection method.

Data Availability

The public CHB-MIT database provided by the Boston Children’s Hospital is used in this paper. It is available at https://physionet.org/physiobank/database/chbmit/.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work is supported by the National Natural Science Fund under Grant no. 61871232 and by the Research Innovation Program for College Graduates of Jiangsu Province in 2021 under Grant no. KYCX21_0718.