Abstract
In this paper, we focus on recognizing epileptic seizures from scant EEG signals and propose a novel transfer enhanced expansion move (TrEEM) learning model. The framework embeds transfer learning into the exemplar-based clustering model to make better use of EEG signals. Starting from Bayesian probability theory, we use the Kullback–Leibler distance to measure the similarity between source and target data, and we embed this relationship into the calculation of the similarity matrix used by the exemplar-based clustering model. We then derive a new objective function for the TrEEM scheme and optimize it by borrowing the mechanism used in EEM. Experiments on synthetic and real-world EEG datasets show that, compared with other machine learning models, the proposed TrEEM achieves promising performance.
1. Introduction
Epilepsy is a chronic disease caused by sudden abnormal discharges of brain neurons, which result in transient brain dysfunction. Patients usually have no clear memory of the seizure itself. Doctors have therefore traditionally diagnosed the condition from descriptions given by family members or other people present during past seizures, and this manual diagnosis has low accuracy. Pathologically, epilepsy is mainly manifested by abnormal neural discharges and abnormal brain waves. Medical imaging techniques have made great progress over the years. These include Computed Tomography (CT), magnetic resonance imaging (MRI), functional magnetic resonance imaging (fMRI), Single-Photon Emission Computed Tomography (SPECT), and Positron Emission Tomography (PET). Even so, the main diagnostic method for epilepsy is still based on the electroencephalogram (EEG). More specifically, PET and fMRI cannot be used routinely because of their technical requirements and costs. Besides its high cost, MRI also cannot detect nonstructural lesions. Invasive electrocorticography (ECoG) requires a craniotomy and the implantation of electrodes, which carries a high risk, whereas noninvasive EEG and MEG can provide functional and structural detection. Taking all of this into account, EEG has attracted increasing attention in both theoretical research and clinical practice because of its low cost, convenient signal acquisition, and noninvasiveness.
The diagnosis of epilepsy from EEG signals has been a hot topic in related fields. Compared with manual diagnosis, machine learning methods are less time-consuming and more accurate [1–8]. Numerous machine learning models have been used to recognize epileptic EEG signals, such as support vector learning [9, 10], fuzzy systems [1, 3], naïve Bayes [11], and exemplar-based clustering models [2, 12, 13]. As shown in Figure 1, the traditional machine learning process usually consists of three steps: (1) EEG signal preprocessing improves the signal-to-noise ratio and provides a high-quality input signal for spike detection. (2) Hand-designed features, chosen according to the characteristics of spikes, reduce the signal dimension and highlight the difference between spikes and background signals. (3) Based on the obtained features, spike signals are detected by the chosen machine learning model.
One of the significant issues in processing EEG signals with machine learning techniques is insufficient training data. We briefly review some approaches to epilepsy diagnosis from EEG signals here. Jiang [1] integrates transductive transfer learning, semisupervised learning, and the Takagi–Sugeno–Kang (TSK) fuzzy system to take full advantage of scant training data. Zhu [5] proposed dicmvfcm, which automatically evaluates the importance and weight of each view. It then performs weighted multiview fuzzy clustering within the FCM framework to achieve an accurate fuzzy partition. Bi [2] proposed a model called FEEM for incomplete EEG signals, which first compresses the list of potential exemplars and thus reduces the scale of the pairwise similarity matrix. Nevertheless, much work remains to be done to make better use of training data, and this paper focuses on that issue. Specifically, this paper aims at recognizing epileptic seizures from scant EEG signals.
Transfer learning is believed to be an effective strategy for problems caused by insufficient training data [1, 5, 13, 14]. Assume that there are two datasets from similar sources. One has plenty of features and details and is easy to learn, while the other lacks details and is hard to learn. Transfer learning uses the description of the former to learn the latter. The sufficient, well-described data are called source data, while the insufficient, rough data are called target data. Transfer learning thus uses the source data to improve the learning result on the target data. Within this framework, effectively measuring the relationship between source and target data is essential and strongly influences the efficiency of the learning model. Starting from Bayesian probability theory, this paper therefore first extends the concept of the similarity matrix in the exemplar-based clustering model. This also broadens the application range of the algorithm to the transfer learning scenario. By leveraging the Kullback–Leibler distance, we propose a new transfer enhanced expansion move learning model called TrEEM. The main contributions of this paper are as follows:
(i) According to transfer learning theory [1, 5, 13, 14], since the source and target data are similar, the proposed TrEEM should keep the target data close enough to the source data. Supported by information theory and built on the Bayesian probability framework, TrEEM uses the KL distance to measure the similarity between source data and target data and minimizes this KL distance during optimization.
(ii) In the scenario of recognizing epileptic seizures, we aim to diagnose actual patients. TrEEM is an exemplar-based clustering model built on graph theory and a pairwise similarity matrix, so it selects exemplars from actual data. This property fits the requirements of this scenario.
(iii) TrEEM embeds the KL distance between target data and source data into the calculation of the similarity matrix. Thus, the optimization mechanism used in EEM can be applied directly to the new target function of TrEEM. Specifically, we use the expansion move optimization algorithm, which performs better than the loopy belief propagation (LBP) algorithm [15, 16].
The paper is organized as follows. Related work is discussed in Section 2. Section 3 presents the target function and optimization mechanism of the proposed TrEEM. Experimental results and analysis are reported in Section 4, and Section 5 concludes the paper.
2. Related Works
Many researchers have applied machine learning techniques to classify EEG signals, including SVM, fuzzy systems, naïve Bayes, and exemplar-based clustering models. In this section, we describe two popular learning frameworks, namely, Enhanced Expansion Move (EEM) and the TSK fuzzy system. EEM is a widely used exemplar-based clustering model, and the TSK fuzzy system is a typical fuzzy-rule-based model.
2.1. Enhanced Expansion Move
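Consider a dataset $X = \{x_1, x_2, \dots, x_N\}$ with $x_i \in \mathbb{R}^D$, where N is the total number of D-dimensional data points. $E = \{e_1, e_2, \dots, e_N\}$ is the output, where the element $e_i \in X$ is the exemplar for each $x_i$.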
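The target function of a typical exemplar-based clustering model is defined as follows [12, 15]:

$$\min_{E} \; J(E) = -\sum_{i=1}^{N} s(x_i, e_i) + \sum_{k=1}^{N} \theta_k(E), \tag{1}$$

where S is the similarity matrix of the dataset, with elements $s_{ij} = s(x_i, x_j)$, and the constraint term $\theta_k(E)$ is:

$$\theta_k(E) = \begin{cases} +\infty, & \text{if } e_k \neq x_k \text{ and } \exists\, i: e_i = x_k, \\ 0, & \text{otherwise}. \end{cases} \tag{2}$$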
In [15], the authors regard the above target function as the energy function of a Markov random field (MRF) and prove the following theorem.
Theorem 1. When, for , equation (3) holds, the graph-theory-based framework can be used to optimize the target function of the exemplar-based clustering model in equation (1).
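Equations (1) and (2) can be made concrete by evaluating the exemplar-based energy for a given assignment, as in the sketch below. It assumes the common choice $s(x_i, x_j) = -\|x_i - x_j\|^2$ for $i \neq j$, with one shared self-similarity value on the diagonal. Each exemplar $e_i$ is encoded as an index into the dataset. The function names are illustrative and are not taken from [12, 15].

```python
import numpy as np

def similarity_matrix(X, self_similarity):
    """Assumed similarity: negative squared Euclidean distance off the diagonal,
    a shared self-similarity (preference) value on the diagonal."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    S = -np.sum(diff ** 2, axis=-1)
    np.fill_diagonal(S, self_similarity)
    return S

def exemplar_energy(S, labels):
    """Energy of eq. (1): minus the total similarity to the chosen exemplars,
    plus an infinite penalty (eq. (2)) if a chosen exemplar does not choose itself."""
    labels = np.asarray(labels)
    n = len(labels)
    for k in set(labels.tolist()):
        if labels[k] != k:  # x_k serves as someone's exemplar but points elsewhere
            return np.inf
    return -S[np.arange(n), labels].sum()
```

Lower energy is better. Any assignment in which a chosen exemplar does not choose itself receives infinite energy, which is the role of the constraint term.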
The Enhanced Expansion Move (EEM) framework optimizes the above target function with an improved algorithm [15]. Supported by Theorem 1 and the graph-cuts algorithm [16], EEM expands the active region of a candidate exemplar from a single data point to the whole dataset. For each $x_i$, EEM defines the second-best candidate exemplar, selected from the whole dataset as mentioned above, as

$$e_i' = \arg\max_{x_j \in \Phi \setminus \{x_k\}} s(x_i, x_j), \quad x_i \in C_k, \tag{4}$$

where $C_k$ is the set of data points whose exemplar is $x_k$, and $\Phi \setminus \{x_k\}$ denotes the exemplars in the current exemplar set $\Phi$ other than $x_k$. In this way, the optimization runs faster and more effectively.
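The second-best (alternative) exemplar rule can be sketched as follows. The sketch assumes that, for every member of an abandoned exemplar's cluster, the rule picks the most similar of the remaining current exemplars. Exemplars are encoded as dataset indices, and the function name `alternative_exemplars` is hypothetical.

```python
import numpy as np

def alternative_exemplars(S, labels, k):
    """For each x_i whose exemplar is x_k, return the most similar exemplar
    among the other current exemplars (argmax over Phi minus {x_k})."""
    labels = np.asarray(labels)
    others = [int(j) for j in np.unique(labels) if j != k]
    if not others:
        return {}  # x_k is the only exemplar, so no alternative exists
    members = np.flatnonzero(labels == k)
    return {int(i): max(others, key=lambda j: S[i, j]) for i in members}
```

The same rule is reused later when a sample's current exemplar is abandoned during optimization.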
EEM is one of the most popular exemplar-based clustering models, and it performs effectively and steadily in numerous experiments [2, 12–15]. It has been applied to data streams, constrained supervised learning, and EEG signal processing.
2.2. TSK Fuzzy System
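The TSK fuzzy system is a rule-based system that is widely used as a typical fuzzy model for both classification and clustering. Generally, the kth of K TSK fuzzy rules can be described as

$$\text{IF } x_1 \text{ is } A_1^k \wedge x_2 \text{ is } A_2^k \wedge \cdots \wedge x_d \text{ is } A_d^k, \text{ THEN } f^k(x) = p_0^k + p_1^k x_1 + \cdots + p_d^k x_d, \quad k = 1, \dots, K, \tag{5}$$

where $A_i^k$ is a fuzzy set subscribed by the input $x_i$ for the kth fuzzy rule and $\wedge$ is a fuzzy conjunction operator. Each rule is premised on the input data $x$ and maps it to a singleton $f^k(x)$. Thus, the output of the TSK fuzzy system is defined as

$$y^0 = \sum_{k=1}^{K} \tilde{\mu}^k(x) f^k(x), \tag{6}$$

where

$$\tilde{\mu}^k(x) = \frac{\mu^k(x)}{\sum_{k'=1}^{K} \mu^{k'}(x)}, \qquad \mu^k(x) = \prod_{i=1}^{d} \mu_{A_i^k}(x_i), \tag{7}$$

and $\mu_{A_i^k}(x_i)$ is the membership grade, which can be obtained using a Gaussian membership function. The other parameters involved can be estimated using clustering techniques or other partition methods [1, 3–5].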
Accordingly, once the antecedent parameters are fixed, the output in equation (6) is linear in the consequent parameters. Learning the TSK fuzzy system therefore reduces to a parameter learning process for the corresponding linear regression model. Recent studies show that the TSK fuzzy model offers strong interpretability and robustness. For this reason, it is widely used in intelligent medical diagnosis systems, including the recognition of epileptic seizures from EEG signals.
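The TSK inference described above can be sketched as follows. The sketch makes three assumptions: first-order linear consequents, the product as the fuzzy conjunction operator, and Gaussian memberships of the form $\exp(-(x_i - c_i^k)^2 / (2\delta_i^k))$. The centers $c$, widths $\delta$, and the function name are illustrative.

```python
import numpy as np

def tsk_output(x, centers, widths, consequents):
    """First-order TSK inference with Gaussian memberships and product conjunction.
    x: (d,); centers, widths: (K, d); consequents: (K, d + 1) holding [p0, p1, ..., pd]."""
    x = np.asarray(x, dtype=float)
    centers = np.asarray(centers, dtype=float)
    widths = np.asarray(widths, dtype=float)
    consequents = np.asarray(consequents, dtype=float)
    # firing strength of each rule: product of per-dimension membership grades
    mu = np.exp(-((x - centers) ** 2) / (2.0 * widths)).prod(axis=1)
    mu_norm = mu / mu.sum()  # normalized firing strengths
    f = consequents[:, 0] + consequents[:, 1:] @ x  # linear rule consequents f^k(x)
    return float(mu_norm @ f)
```

Because `mu_norm` does not depend on the consequents, the output is a linear function of the consequent parameters. This is why training reduces to linear regression once the antecedents are fixed.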
In this section, we have briefly introduced two popular machine learning frameworks used in the recognition of EEG signals, namely, EEM and the TSK fuzzy system; Table 1 gives detailed descriptions. We consider the scenario of distinguishing epileptic patients from healthy subjects based on their EEG signals, so we focus on the EEM clustering model in the rest of this paper.
3. Transfer Enhanced Expansion Move Learning Model
In this section, we first analyze the theoretical basis of TrEEM within a Bayesian probabilistic framework. Second, we derive the novel TrEEM algorithm in detail. Then, following the optimization algorithm used in EEM, we optimize the new target function. The overall structure of the model is shown in Figure 2. On the basis of the source-data-based exemplar set and starting from the Bayesian probability framework, TrEEM first embeds the distance between source data and target data into the calculation of the similarity matrix. This distance is measured by the Kullback–Leibler distance. We then derive the novel target function of TrEEM. Finally, TrEEM directly calls the optimization algorithm of EEM to solve the model and obtain the target-data-based exemplar set.
Besides, we list the frequently used notations in Table 2.
3.1. Theoretical Preliminary of TrEEM Scheme
As mentioned before, transfer learning considers two datasets from similar sources, namely, source data and target data. The relationship between them is a significant factor in this model (see Table 2). In the following, we denote the sufficient, well-described source data as . After learning on it, we obtain the source-data-based exemplar set, denoted as in Table 2. The insufficient target data are denoted as above. A probabilistic framework also helps to measure this relationship. Therefore, based on the Gaussian probability hypothesis and the exemplar-based clustering mechanism, we build the pairwise probabilistic relationship of the target data from the corresponding similarity as follows: where is the similarity between and its current exemplar, and the parameter is a standard deviation under the Gaussian probability hypothesis.
For the exemplar set, we must exclude configurations in which an exemplar selects some other exemplar in the current exemplar set, rather than itself, as its own exemplar. Consequently, the Bayesian posterior probability of an exemplar set is defined as follows, where the penalty term is the same as that defined in equation (2).
Accordingly, under the Bayesian probabilistic framework and the discussion of the EEM algorithm in Section 2, the objective function in equation (1) is equivalent to the following function:
In short, equation (10) gives an equivalent form of the EEM target function, obtained by introducing the Bayesian probabilistic framework and the Gaussian probability hypothesis. Starting from this target function, we design TrEEM for the recognition of epileptic EEG signals in the next subsection.
3.2. Design of TrEEM Scheme
According to information theory, the Kullback–Leibler distance (KL distance, also called KL divergence) measures the difference between two probability distributions. It is not a metric in the strict sense, but it has been widely applied to numerous problems [17–19]. Its definition is given below.
Definition 1. Consider two probability distributions P and Q; the KL distance from P to Q is

$$D_{KL}(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}, \tag{11}$$

where x is the input data.
Note that the KL distance is asymmetric: according to Definition 1, in general $D_{KL}(P \,\|\, Q) \neq D_{KL}(Q \,\|\, P)$.
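For discrete distributions, Definition 1 and its asymmetry can be checked numerically with the sketch below. This is a generic sketch that uses the natural logarithm, so results are in nats; it is not the paper's implementation.

```python
import numpy as np

def kl_divergence(p, q):
    """Discrete D_KL(P || Q) = sum_x P(x) * log(P(x) / Q(x)), in nats.
    Terms with P(x) = 0 contribute 0; the result is infinite if Q(x) = 0 where P(x) > 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    if np.any(q[mask] <= 0):
        return np.inf
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))
```

For example, with P = (0.5, 0.5) and Q = (0.9, 0.1), D(P‖Q) ≈ 0.511 while D(Q‖P) ≈ 0.368. The order of the arguments therefore matters when the distance is used as a regularizer.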
Furthermore, given as a possible exemplar set, is the best exemplar for in the current exemplar set . As discussed above, we also define under the Bayesian probabilistic framework as follows: where is obtained from equation (8). Two datasets are actually involved in transfer learning, namely, source data and target data. In equation (12), note that the first comes from the target data, and the second comes from the possible exemplar set, that is, from the source data. To keep the distinction clear (see Table 2), the symbol represents the source data, while stands for the target data in the rest of this paper.
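The sketch below selects, for each target sample, the most probable source exemplar. It assumes a Gaussian hypothesis in which the similarity is the negative squared Euclidean distance, which is an assumption about the exact form of equation (8). Under that assumption, the most probable source exemplar is simply the nearest one. The sketch returns that exemplar's index and its log-likelihood up to an additive constant. All names are hypothetical.

```python
import numpy as np

def best_source_exemplars(X_target, E_source, sigma):
    """Pick, for each target sample, the source exemplar maximizing the assumed
    Gaussian likelihood p(x | e) ~ exp(-||x - e||^2 / (2 * sigma^2)).
    Returns exemplar indices and log-likelihoods (up to an additive constant)."""
    X_target = np.asarray(X_target, dtype=float)
    E_source = np.asarray(E_source, dtype=float)
    sq_dist = ((X_target[:, None, :] - E_source[None, :, :]) ** 2).sum(axis=-1)
    log_lik = -sq_dist / (2.0 * sigma ** 2)
    idx = log_lik.argmax(axis=1)  # most probable exemplar = nearest exemplar
    return idx, log_lik[np.arange(len(X_target)), idx]
```

Since the likelihood is monotone in the similarity, `sigma` changes the log-likelihood values but not which exemplar is selected.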
The target data are not exactly the same as the source data. Even so, according to theoretical analyses of transfer learning, the source-data-based learning model and its results should also contribute to learning from the new target data [3, 4, 20–22]. Otherwise negative transfer occurs, which is beyond the scope of this paper. Accordingly, we assume that the target-data-based exemplar set to be evaluated should be similar to the source-data-based exemplar set. In this section, we measure the difference between the two exemplar sets with the KL distance given in Definition 1. Specifically, in designing the TrEEM learning model, we minimize the difference between the target exemplar set and the source exemplar set by controlling the KL distance between them. The structure of TrEEM is shown in Figure 2. To this end, building on the probabilistic target function of EEM in equation (10), we construct the novel target function of the proposed TrEEM model as follows: where is the target-data-based exemplar set to be obtained, represents the source-data-based exemplar set (see Table 2), and is the regularization parameter. Applying the maximum a posteriori (MAP) principle and combining Definition 1 and equation (12), (13) becomes
Observing equation (14), we find that the second and third terms are of the same order of magnitude. Hence, the value of need not be large, and the specific strategy for determining it is discussed in Section 4.
Introducing the definitions of , , and in equations (8) and (9) and discarding the constant terms, equation (14) can be simplified into the following equation:
Comparing equations (15) and (10), we conclude that they are similar in structure. According to Theorem 1 in Section 2, TrEEM's target function can also be solved by the graph-cuts mechanism. We discuss the optimization mechanism step by step in the next subsection.
3.3. Optimization Mechanism of the TrEEM Scheme
As mentioned before, the novel target function in equation (15) is similar to that of the EEM algorithm under the Bayesian probabilistic framework, so the optimization mechanism used in EEM should also help solve it. However, we first need to deal with the differences between the two models.
In detail, we redefine the similarity relationship of the target data by embedding the source-data-based exemplar set . Specifically, for each target sample we select a suitable exemplar from by equation (12). We then build the new pairwise transfer similarity matrix according to the measurement in the following equation: where is the Euclidean distance between samples and , is the regularization parameter, and refers to the exemplar selected from the source data. With this new definition of the similarity relationship, the target function of TrEEM in equation (15) has the same structure as equation (10). Meanwhile, the constraint condition in Theorem 1 also holds for the TrEEM model. Therefore, the optimization mechanism of the EEM algorithm is also suitable for the proposed TrEEM model, which is described in detail in Algorithm 1.


EEM uses expansion moves to optimize its learning model, and, as discussed above, this mechanism also suits the proposed TrEEM model. We analyze the enhanced expansion move optimization mechanism step by step. First, the target functions in equations (15) and (10) can both be defined as the energy function of an MRF, so we regard the optimization process as an energy reduction process of the MRF. In general, whether a sample accepts a new exemplar is decided by the resulting change in energy. Second, the improved mechanism broadens the effective field when changing a sample's exemplar: if a sample's current exemplar is abandoned, all remaining exemplars are searched for a new one. This alternative exemplar is defined as

$$e_i' = \arg\max_{x_j \in \Phi \setminus \{e_i\}} s(x_i, x_j), \tag{17}$$

where $e_i$ is the original exemplar for $x_i$, $\Phi$ is the current exemplar set, and $e_i'$ is the obtained alternative exemplar. Introducing this alternative exemplar for $x_i$ improves optimization efficiency.
Note that the TrEEM model redefines the similarity matrix as in equation (16), so the following discussion is based on the similarity matrix . The optimization must therefore consider two cases, namely, is among the current exemplar set or is not among the current exemplar set. We analyze these two cases in the following subsections.
3.3.1. Case I
is a current exemplar.
Obviously, in the process of optimization, this current exemplar may be abandoned. As previously analyzed, whether to keep in the ultimate exemplar set is decided by the reduction values of energy function calculated by the target function in equation (15).
Specifically, if is accepted as an exemplar, the energy of the model remains unchanged, and the reduction value is equal to 0. Otherwise, if is not accepted, all samples whose exemplars are would redetermine their exemplars; these samples are defined as . Theoretically supported by the related analysis in [2, 12, 14, 15], new exemplar for would be as shown in equation (17). Thus, the energy reduction should be computed by the following equation:
Then, we take the greater value of 0 and as the ultimate energy reduction for this case, as defined in the following equation:
Namely, if is the ultimate energy reduction, change their exemplars to . Otherwise, the current exemplar set is convincing and remains unchanged.
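A minimal sketch of Case I is given below, under one reading of the text. With the energy $J = -\sum_i s(x_i, e_i)$, dropping exemplar $x_k$ moves every member of its cluster, including $x_k$ itself, to its alternative exemplar. The move is kept only if the reduction exceeds 0. This is not the authors' Algorithm 1, and the function name is hypothetical. `S` may hold either the EEM similarity or a transfer similarity matrix.

```python
import numpy as np

def case1_reduction(S, labels, k):
    """Energy reduction from dropping exemplar x_k, assuming J = -sum_i S[i, e_i].
    Every member of C_k (including x_k) moves to its best remaining exemplar.
    Returns (max(0, reduction), labels after the accepted or rejected move)."""
    labels = np.asarray(labels)
    others = [int(j) for j in np.unique(labels) if j != k]
    if not others:
        return 0.0, labels  # the only exemplar cannot be dropped
    new_labels = labels.copy()
    reduction = 0.0
    for i in np.flatnonzero(labels == k):
        alt = max(others, key=lambda j: S[i, j])  # alternative exemplar (cf. eq. (17))
        reduction += S[i, alt] - S[i, k]  # > 0 when the move lowers the energy
        new_labels[i] = alt
    if reduction > 0:
        return float(reduction), new_labels
    return 0.0, labels  # keep the current exemplar set
```

For x = (0, 1, 5) with self-similarity −100, dropping the exemplar at 5 lowers the energy from 201 to 126, so the function reports a reduction of 75.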
3.3.2. Case II
is not a current exemplar.
In this case, we denote the current exemplar of as . We first tentatively treat as a new alternative exemplar; namely, . Then, as in Case I, whether to accept as the ultimate exemplar is decided by the reduction in the energy function. In detail, if is accepted as a feasible exemplar, some samples would change their exemplar from to . These samples are denoted as . The corresponding energy reduction is defined as follows:
On the other hand, the current exemplar set itself may not be convincing, so all samples, including , would have to redetermine their exemplars. As discussed before, the new exemplars for these samples are given by equation (17), and the resulting energy reduction is as follows:
Recall that equations (20) and (21) are based on the assumption that . Considering this, the energy reduction when is not a current exemplar should be
To sum up, the optimization mechanism is shown below in detail.
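The first branch of Case II, which promotes a non-exemplar $x_k$ to an exemplar, can be sketched in the same way, again assuming $J = -\sum_i s(x_i, e_i)$. Once $x_k$ becomes an exemplar it must choose itself. Each other non-exemplar switches to $x_k$ only if that raises its similarity. This is an illustrative sketch, not the authors' Algorithm 1. It does not cover the second branch or the final comparison of Case II, and the function name is hypothetical.

```python
import numpy as np

def case2_add_reduction(S, labels, k):
    """Energy reduction from promoting non-exemplar x_k to an exemplar,
    assuming J = -sum_i S[i, e_i]. Returns (reduction, new labels)."""
    labels = np.asarray(labels)
    exemplars = set(np.unique(labels).tolist())
    if k in exemplars:
        raise ValueError("x_k is already an exemplar (that is Case I)")
    new_labels = labels.copy()
    reduction = float(S[k, k] - S[k, labels[k]])  # x_k must now choose itself
    new_labels[k] = k
    for i in range(len(labels)):
        if i == k or i in exemplars:
            continue  # current exemplars keep choosing themselves
        gain = S[i, k] - S[i, labels[i]]
        if gain > 0:  # x_i prefers the new exemplar x_k
            reduction += gain
            new_labels[i] = k
    return reduction, new_labels
```

A negative return value means this branch would increase the energy and should be rejected.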
3.4. Model Complexity
The similarity matrix is calculated from the Euclidean distance; . So, the scale of the similarity matrix is ; note that the amount of target data is not large. In the optimization process, we directly use the expansion move, which has time complexity. For the proposed TrEEM, the source-data-based exemplar set is one of the inputs, so obtaining it is outside the scope of this complexity analysis. We adopt EEM to obtain the source-data-based exemplar set , although many other clustering models could be used instead. In its first step, TrEEM selects from the source-data-based exemplar set, which has time complexity . In summary, the overall time complexity of TrEEM is , which is acceptable compared with other state-of-the-art transfer learning frameworks.
4. Experimental Results
To comprehensively evaluate the TrEEM model, we conducted experiments on both synthetic and real-world datasets. We compared it with other machine learning methods, namely, EEM [15], multiclass SVM [23], the TSK fuzzy system [24], and TSC [25]. In this section, we analyze these experimental results in detail.
4.1. Preparation
Before feeding the data into the TrEEM model, we need to preprocess the original nonstationary EEG signals [1–3]. EEG features usually include time-domain, frequency-domain, and time-frequency features. In time-domain analysis, statistical features of the original EEG signals are analyzed [26]. In frequency-domain analysis, power spectrum analysis and the Short-Time Fourier Transform (STFT) [27, 28] are commonly used. In time-frequency analysis, time-domain and frequency-domain information are extracted simultaneously from high-dimensional, nonlinear EEG signals.
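As an illustration of frequency-domain features, the sketch below uses NumPy's FFT to compute the mean periodogram power of one EEG segment in a few frequency bands. The band limits and the plain, unwindowed periodogram are illustrative assumptions. The experiments in this paper use KPCA and wavelet features instead.

```python
import numpy as np

def band_powers(signal, fs, bands):
    """Mean periodogram power of a 1-D EEG segment within each (low, high) band in Hz.
    Uses a plain FFT periodogram; each band must contain at least one frequency bin."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()  # remove the DC offset
    spectrum = np.abs(np.fft.rfft(x)) ** 2 / (fs * len(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return [float(spectrum[(freqs >= lo) & (freqs < hi)].mean()) for lo, hi in bands]
```

For example, `band_powers(segment, 173.6, [(8, 13), (13, 30)])` returns the mean power in an alpha-like and a beta-like band for a segment sampled at 173.6 Hz.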
Various methods have been used to extract EEG features, including wavelets [29, 30], KPCA (Kernel Principal Component Analysis) [1, 2], and LDA (Linear Discriminant Analysis). In line with the experimental settings in [1–3], we use two feature extraction methods in this section, namely, KPCA and wavelets.
We use both synthetic and real-world datasets in this section. First, we randomly generate 300 two-dimensional data points in 3 classes, as shown in Figure 3. Then, we choose the Bonn EEG dataset [1, 2] as real-world data. The Bonn dataset is from the University of Bonn, Germany (http://epileptologiebonn.de/cms/upload/workgroup/lehnertz/eegdata.html), and has five classes (A to E). Each class contains 100 single-channel EEG segments of 23.6 s duration, all sampled at 173.6 Hz. Each sample has 6 attributes. Table 3 gives a brief description of this dataset.
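As a quick arithmetic check of the Bonn description, 23.6 s at 173.6 Hz gives 4096.96, that is, about 4097 raw samples per segment. Five classes of 100 segments give 500 segments in total.

```python
fs_hz = 173.6             # sampling rate stated above
duration_s = 23.6         # segment duration stated above
segments_per_class = 100
n_classes = 5             # sets A to E

samples_per_segment = round(fs_hz * duration_s)   # 4096.96 rounds to 4097
total_segments = segments_per_class * n_classes   # 500 segments overall
print(samples_per_segment, total_segments)
```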
In addition, we examine the involved experimental results from two performance indices, namely, [2, 31] and. Assume that is the total number of data points; we give the definitions of them below. That is, is shown in the following equation:where is the amount of data whose cluster is in line with their class, while is the amount of those data whose cluster is inconsistent with their class. Also is defined in the following equation:where is the cluster result obtained by the learning model, while is the real data label set.
All experiments are implemented in MATLAB 2010a on a PC running 64-bit Microsoft Windows 10, with an Intel(R) Core(TM) i7-4712MQ CPU and 8 GB of memory.
4.2. Results Analysis
As mentioned before, five machine learning methods are involved in this section, namely, EEM, multiclass SVM, TSK-FS, TSC, and the proposed TrEEM algorithm. EEM and TrEEM do not require the number of clusters to be preset. This is a significant advantage of exemplar-based clustering frameworks, whereas the cluster number is an important parameter for TSK-FS. Multiclass SVM and TSC are two typical classification methods. Both EEM and TrEEM need the self-similarity parameter . For multiclass SVM, in line with [23, 32], we choose the Gaussian kernel function. In TSK-FS, the number of clusters is usually set equal to the number of fuzzy rules. TSC also needs the number of clusters to be preset. For these methods, we follow the parameter-setting strategies of the relevant papers. In addition, 5-fold cross-validation is used to search for the optimal parameters. Table 4 briefly introduces the methods involved and their parameter search ranges.
To construct the transfer learning scenario, for both the synthetic and the real-world EEG datasets, we randomly choose 80% of the data as source data and use the remaining 20% as target data. For statistical analysis, each algorithm is executed 10 times, and we record the average performance and the corresponding standard deviation of and . Furthermore, to examine different EEG feature extraction methods, we use both KPCA and wavelets. Table 5 compares the proposed TrEEM model with the benchmark approaches in terms of and .
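The source/target protocol above can be sketched as follows. Here `evaluate` is a hypothetical callback that would train on the source indices, cluster the target indices, and return one score. The fixed seeds are illustrative.

```python
import numpy as np

def source_target_split(n_samples, source_ratio=0.8, rng=None):
    """Randomly split sample indices into source (default 80%) and target (20%) sets."""
    rng = np.random.default_rng(rng)
    perm = rng.permutation(n_samples)
    n_source = int(round(source_ratio * n_samples))
    return perm[:n_source], perm[n_source:]

def repeat_experiment(evaluate, n_samples, runs=10, seed=0):
    """Call evaluate(source_idx, target_idx) on `runs` random splits and
    return the mean and (population) standard deviation of the scores."""
    rng = np.random.default_rng(seed)
    scores = [evaluate(*source_target_split(n_samples, rng=rng)) for _ in range(runs)]
    return float(np.mean(scores)), float(np.std(scores))
```

With the 500 Bonn segments, each run yields 400 source and 100 target segments. `np.std` uses ddof=0 by default, which may differ from the convention behind Table 5.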
As Table 5 shows, the performance of the TrEEM model is promising under this experimental setting, especially considering that the Bonn EEG dataset has 6 attributes and 5 classes. TrEEM is able to extract useful information from both synthetic and real-world EEG datasets. Compared with the other benchmark machine learning models, it performs better in terms of and in this scenario.
During the experiments, we also found that the self-similarity parameter has an important influence on the results, especially on the obtained number of clusters. This finding is consistent with other exemplar-based clustering models [2, 12–15], and our parameter selection method also follows these models. As listed in Table 4, we multiply with , and the value of is chosen from . In detail, a large value of leads TrEEM to obtain fewer clusters, while a small value yields more clusters. To match the real data labels, we set in our experiments.
The regularization factor also has a large effect. As analyzed before, determines how strongly the source data affect the clustering result, and its value should not be too large. If is too large, the clustering result on the target data will be very close to that on the source data, which is not what we want. When , TrEEM does not take the source data into account and degrades to the typical EEM framework. Figures 4–6 show how the model results depend on the value of . When , the source data improve the performance of TrEEM in terms of and . Index is more sensitive to , while changes slowly.
Table 6 shows the average running time over 10 runs for each approach. The proposed TrEEM takes somewhat more time than EEM and TSK-FS, but its running time is of the same order of magnitude. Considering the improvements in and , we regard this time cost as acceptable. The results are also consistent with the discussion in Section 3.4.
Therefore, from the experimental results in Tables 5 and 6 and the above analysis, we can conclude the following:
(1) For both synthetic and real-world EEG datasets, TrEEM performs well. We therefore believe that TrEEM can effectively transfer knowledge from similar source data when learning from scant target data.
(2) Because TrEEM takes the source data into account, its time cost inevitably increases. However, the scale of the target data is not large, and the time consumption is acceptable, especially given the performance in Table 5.
(3) Although TrEEM requires the most parameters (see Table 4), only and have a big effect. As Figures 4–6 show, the performance in terms of and depends more on the value of . Moreover, the search range can be narrowed according to the discussion in Section 3, so we believe that parameter setting is easy.
5. Conclusion
In conclusion, this paper contributes a novel TrEEM framework for learning from few EEG signals when recognizing epileptic seizures. Starting from information theory, TrEEM embeds the similarity relationship between source and target data into the exemplar-based clustering model to make better use of EEG signals, while keeping all the merits of the original optimization scheme. TrEEM therefore applies transfer learning to scant EEG signals without substantially increasing the model complexity. Our experimental results show promising performance, but several other questions deserve attention. For instance, will TrEEM still work when the classes are unbalanced? And if multiple source datasets are available, how can they be made to collaborate instead of causing negative transfer? We will address these problems in future work.
Data Availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported in part by the 2018 Natural Science Foundation of Jiangsu Higher Education Institutions under Grant 18KJB5200001, by the Natural Science Foundation of Jiangsu Province under Grant no. BK20161268, and by the Humanities and Social Sciences Foundation of the Ministry of Education under Grant 18YJCZH229.