Research Article  Open Access
Robust Deep Network with Maximum Correntropy Criterion for Seizure Detection
Abstract
Effective seizure detection from long-term EEG is highly important for seizure diagnosis. Existing methods usually design the feature and the classifier separately, and little work has addressed optimizing the two simultaneously. This work proposes a deep network that learns the feature and the classifier jointly, so that each helps the other and the whole system is optimized. To deal with the impulsive noise and outliers caused by EMG artifacts in EEG signals, we formulate a robust stacked autoencoder (RSAE) as part of the network to learn an effective feature. In RSAE, the maximum correntropy criterion (MCC) is adopted to reduce the effect of noise and outliers. Unlike the mean square error (MSE), the kernel-based MCC cost grows much more slowly as the input moves away from the center, so the influence of noise and outliers lying far from the center is suppressed. The proposed method is evaluated on 33.6 hours of scalp EEG data from six patients. It achieves a sensitivity of 100% and a specificity of 99%, which is promising for clinical applications.
1. Introduction
Epilepsy is a common and serious brain disorder that affects about 50 million people worldwide [1]. Epileptic seizures are characterized by convulsions, loss of consciousness, and muscle spasms resulting from excessive synchronization of neuronal activity in the brain [2]. The abnormal neuronal discharges produce epileptic patterns such as closely spaced spikes and slow waves in the electroencephalogram (EEG). In seizure diagnosis and evaluation, visual inspection of these epileptic patterns in long-term EEG is a routine task for doctors, and it can be highly tedious and time-consuming [3]. A reliable seizure detection system that identifies seizure events automatically would therefore facilitate seizure diagnosis and has great potential in clinical applications.
There are two key issues in automatic seizure detection. The first is how to capture the diverse patterns of seizure EEG. Seizure morphologies can vary considerably between individuals, so effective feature extraction plays a key role in seizure detection, and many efforts have been made in this direction. To characterize changes in the amplitude and energy of epileptic EEG, Saab and Gotman [4] proposed three measures: relative average amplitude, relative scale energy, and coefficient of variation of amplitude. Similarly, Majumdar and Vardhan [5] used the windowed variance of the differentiated signal to detect significant changes in EEG signals. To identify the sharp waves that typically appear in seizure signals, Yadav et al. [6] introduced a morphology-based detector built on the slopes of the half-waves of the signal. To characterize the intrinsic time-frequency components of seizure patterns, Ghosh-Dastidar et al. [7] used principal component analysis, and Zandi et al. [8] applied the wavelet transform to decompose the EEG signal for feature enhancement. To encode changes in the dynamics of epileptic signals, Jouny and Bergey [9] used the nonlinear measures of sample entropy and Lempel-Ziv complexity. To describe the topological state of epileptic EEG, Santaniello et al. [10] transformed multichannel EEG data into a cross-power matrix and used the eigenvalues of the matrix for seizure detection.
The second issue is how to reduce the effect of noise. Noise caused by electromyography (EMG) or electrode movements commonly appears in EEG signals and is prone to trigger false alarms. These artifacts can introduce impulsive, large-amplitude changes in the EEG signal and lead to outlying values in the feature space. Some existing methods simply assume the noise to be Gaussian [11, 12] and are therefore fragile in the presence of many outliers. Other approaches apply dedicated false-alarm avoidance methods against such noise [4–6].
Although existing methods have shown strengths on specific EEG datasets, the following problems have not yet been well explored. First, most existing features are designed from observations of a few seizure patterns; such empirical designs are unlikely to cover the wide range of seizure patterns, so the features are usually suboptimal. Second, existing methods can be sensitive to noise in EEG signals. Artifacts caused by EMG or electrode movements can make an EEG signal look similar to that of a seizure state. A simple Gaussian assumption for the noise can be incorrect, and approaches built on it can cause many false alarms [11, 12]. Finally, most methods design the feature and the classifier separately. Few efforts have been made to study the relationship between them or to optimize both parts simultaneously to maximize overall performance.
Inspired by the great success of deep networks in image retrieval, speech recognition, and computer vision [13–21], this paper proposes a deep model framework to address the above issues. The main contributions of our work are as follows.
(i) Instead of manually designing a feature, we propose a network called the robust stacked autoencoder (RSAE) that automatically learns a feature to represent seizure patterns. The reconstruction error is first used to learn an initial feature.
(ii) To reduce the effect of noise in EEG signals, we incorporate the maximum correntropy criterion (MCC) into the RSAE network. Unlike the traditional autoencoder, which uses the mean square error (MSE) as the reconstruction cost, the kernel-based MCC cost grows much more slowly as the input moves away from the center, so the effect of noise and outliers lying far from the center is suppressed.
(iii) The RSAE part and the classification part are integrated into a single deep network whose training objective is seizure classification accuracy. Both the initial feature and the classifier are therefore optimized for the detection objective, so that the whole detection system is as close to optimal as possible. In addition, the learned feature is completely data-driven: given enough training data, it can represent various seizure patterns.
Our method is evaluated on 33.6 hours of EEG signals from six patients. With the MCC-based RSAE model, robust features are extracted from noisy EEG signals, and the sensitivity and specificity increase by 14% and 1%, respectively, compared with the standard stacked autoencoder (SSAE). Through supervised joint optimization of the deep model, the features become more separable in the feature space, and the sensitivity and specificity increase by 8% and 15%, respectively, relative to the features before joint optimization. Compared with other methods, the proposed RSAE model outperforms the competitors and achieves a sensitivity of 100% and a specificity of 99%.
The rest of this paper is organized as follows. Section 2 presents the details of the RSAE deep model. The experimental results and discussion are given in Section 3. Finally, we draw conclusions in Section 4.
2. Materials and Methods
The framework of our method is shown in Figure 1. The multichannel EEG signals are first divided into short-time segments, and a cross-power matrix is calculated for each segment to reveal the spatial patterns of the brain. Compact features are then extracted from the cross-power matrix by a deep network followed by a softmax classifier. The deep network is first pretrained with the RSAE model to extract useful features, and the features are then optimized jointly with the classifier to obtain an optimal seizure detection system.
2.1. EEG Data
Scalp EEG data from six patients are used in this study. The EEG data were recorded during long-term presurgical epilepsy monitoring using a NicoletOne amplifier at the Second Affiliated Hospital of Zhejiang University, College of Medicine. A total of 28 channels were acquired at a sampling rate of 256 Hz according to the international 10–20 electrode placement system. Details of the EEG data are given in Table 1. For each patient, all the available seizure EEG signals are used, and two 2.8-hour EEG segments are randomly chosen as nonseizure data.

2.2. Segmentation and Data Preparation
In the preprocessing stage, the multichannel EEG data are divided into 5-second segments with a sliding window. For each patient, a total of 4000 nonseizure segments and 1000 seizure segments are extracted from the EEG signals. Nonseizure segments do not overlap, whereas for seizure segments the overlap ratio is set according to the total length of the seizure signal and the number of segments required.
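The segmentation step can be sketched in a few lines of NumPy. This is a minimal sketch, assuming each recording is stored as a (channels × samples) array; the function name `segment` and the rule of spreading a required number of seizure windows evenly with `np.linspace` (which creates overlap when the seizure is short) are our own illustrative choices, not necessarily the authors' exact procedure.

```python
import numpy as np

def segment(recording, fs, win_sec=5.0, n_segments=None):
    """Cut a (n_channels, n_samples) recording into fixed-length windows.

    n_segments=None -> non-overlapping windows (as for nonseizure data).
    n_segments=k    -> exactly k windows spread evenly over the recording,
                       which overlap when the recording is short (seizure data).
    Returns an array of shape (n_windows, n_channels, window_length).
    """
    win = int(round(win_sec * fs))
    n_samples = recording.shape[1]
    if n_samples < win:
        raise ValueError("recording is shorter than one window")
    if n_segments is None:
        # consecutive, non-overlapping window start points
        starts = np.arange(0, n_samples - win + 1, win)
    else:
        # k start points from the first to the last possible position
        starts = np.linspace(0, n_samples - win, n_segments).astype(int)
    return np.stack([recording[:, s:s + win] for s in starts])
```

At 256 Hz a 5-second window holds 1280 samples, and each 2.8-hour nonseizure recording yields 10080 / 5 = 2016 non-overlapping windows, so two such recordings (4032 windows) are enough for the 4000 nonseizure segments.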
After segmentation, all the segments are shuffled, and 750 seizure segments and 750 nonseizure segments are randomly picked as the training set; the remaining 3500 segments are used as the testing set. All the experiments are carried out on the same training and testing sets.
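The per-patient split can be sketched as follows. This is a minimal sketch, assuming seizure segments are labeled 1 and nonseizure segments 0; the function name `balanced_split` and the fixed random seed are hypothetical choices for illustration.

```python
import numpy as np

def balanced_split(n_seizure, n_nonseizure, n_train_per_class=750, seed=0):
    """Shuffle all segments and take a class-balanced training set.

    Returns (train_indices, test_indices, labels); label 1 = seizure.
    """
    rng = np.random.default_rng(seed)
    labels = np.concatenate([np.ones(n_seizure, dtype=int),
                             np.zeros(n_nonseizure, dtype=int)])
    order = rng.permutation(labels.size)                 # shuffle all segment indices
    seizure_idx = order[labels[order] == 1][:n_train_per_class]
    nonseizure_idx = order[labels[order] == 0][:n_train_per_class]
    train = np.concatenate([seizure_idx, nonseizure_idx])
    test = np.setdiff1d(np.arange(labels.size), train)  # everything not used for training
    return train, test, labels
```

With 1000 seizure and 4000 nonseizure segments, the test set therefore contains 250 seizure and 3250 nonseizure segments, so it is strongly imbalanced even though the training set is balanced.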
2.3. Multichannel Analysis
Studies have shown that the correlation structure of all pairs of EEG channels could reflect the spatiotemporal evolution of electrical ictal activities [22–24]. By characterizing the spatiotemporal patterns, it is possible to identify seizures and analyze seizure dynamics.
In this study, we adopt the cross-power matrix [10] to reflect the spatial patterns of the brain. For each time window with $N$ channels, the cross-power matrix is $C \in \mathbb{R}^{N \times N}$. Each element $C_{ij}$ of $C$ is the cross-power [10] between EEG channels $i$ and $j$ in a given frequency band $[f_1, f_2]$:
$$C_{ij} = \int_{f_1}^{f_2} \left| S_{ij}(f) \right| \, df, \tag{1}$$
where $S_{ij}(f)$ is the cross-power spectral density of channels $i$ and $j$ at frequency $f$.
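The cross-power matrix of eq. (1) can be approximated from a single EEG window as sketched below. This is a minimal sketch under stated assumptions: the cross-spectral density is estimated with one Hann-windowed periodogram (not Welch averaging), its magnitude is taken to obtain a real matrix, and the band integral is replaced by a Riemann sum over FFT bins. The function name `cross_power_matrix` is hypothetical.

```python
import numpy as np

def cross_power_matrix(segment, fs, band):
    """Approximate eq. (1) for one window.

    segment: (n_channels, n_samples) array; band: (f_lo, f_hi) in Hz.
    Returns an (n_channels, n_channels) real, symmetric array.
    """
    n_samples = segment.shape[1]
    win = np.hanning(n_samples)                        # taper to limit spectral leakage
    spec = np.fft.rfft(segment * win, axis=1)          # (n_channels, n_freqs)
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    scale = 1.0 / (fs * np.sum(win ** 2))              # periodogram normalization
    # Cross-spectral density estimate S_ij(f) = scale * X_i(f) * conj(X_j(f))
    csd = scale * spec[:, None, :] * np.conj(spec[None, :, :])
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    df = freqs[1] - freqs[0]
    # Riemann-sum approximation of the band integral of |S_ij(f)|
    return np.abs(csd[:, :, in_band]).sum(axis=2) * df
```

For a 28-channel, 5-second window at 256 Hz this returns a 28 × 28 matrix, that is, 784 values once reshaped into a vector.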
2.4. Frequency Band Selection
Considering the diversity of epileptic patterns among patients, we choose the frequency band patient-specifically from the theta (4–7 Hz), alpha (8–13 Hz), and beta (14–30 Hz) bands. To select the band that best reflects the difference between seizure and nonseizure states, we adopt Fisher's discriminant ratio (FDR) [25] as the criterion:
$$\mathrm{FDR} = (\mu_s - \mu_n)^{T} (\Sigma_s + \Sigma_n)^{-1} (\mu_s - \mu_n), \tag{2}$$
where $\mu_s$ and $\Sigma_s$ are the mean and covariance, respectively, of the (vectorized) cross-power matrices of seizure segments, and $\mu_n$ and $\Sigma_n$ are those of nonseizure segments. For each patient, only the training segments are used for frequency band selection, and the band with the highest FDR is used for seizure detection. The band selected for each patient is shown in Table 1.
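Band selection by FDR can be sketched as below, assuming the quadratic-form FDR of eq. (2) on vectorized cross-power matrices. Because a symmetric cross-power matrix contains every off-diagonal value twice, the covariance of its vectorized form is singular, so this sketch uses the pseudo-inverse (`np.linalg.pinv`) in place of the inverse; that substitution and the names `fisher_discriminant_ratio` and `select_band` are our own assumptions.

```python
import numpy as np

def fisher_discriminant_ratio(feat_s, feat_n):
    """FDR of eq. (2) for seizure (feat_s) and nonseizure (feat_n) features of shape (n, d)."""
    mu_s, mu_n = feat_s.mean(axis=0), feat_n.mean(axis=0)
    cov_s = np.atleast_2d(np.cov(feat_s, rowvar=False))
    cov_n = np.atleast_2d(np.cov(feat_n, rowvar=False))
    diff = mu_s - mu_n
    # pinv instead of inv: duplicated entries of a symmetric matrix make the covariance singular
    return float(diff @ np.linalg.pinv(cov_s + cov_n) @ diff)

def select_band(train_by_band, labels):
    """train_by_band: {band name: (n_segments, d) training features}; labels: 1 = seizure."""
    scores = {name: fisher_discriminant_ratio(X[labels == 1], X[labels == 0])
              for name, X in train_by_band.items()}
    return max(scores, key=scores.get), scores
```

Only training segments would be passed in, matching the text; the same function can also score the separability of learned features.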
2.5. Robust Stacked Autoencoder
After multichannel analysis, each time window is represented by an $N \times N$ cross-power matrix, where $N$ denotes the number of EEG channels. We employ robust stacked autoencoders to extract reliable and compact features from the cross-power matrix.
In this section, first, we briefly introduce the basic autoencoder. Then, the robust autoencoder with MCC is presented to improve the feature learning ability under noises. Finally, we stack the robust autoencoders into a deep model for compact feature extraction.
2.5.1. Basic Autoencoder
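We begin with the traditional standard stacked autoencoder (SSAE) model. An autoencoder is a three-layer neural network consisting of an encoder and a decoder. The encoder takes an input vector $x$ and maps it to a hidden representation $h$ through a nonlinear function:
$$h = f(Wx + b), \tag{3}$$
where $f(\cdot)$ is the sigmoid function. If $x$ and $h$ are $d$-dimensional and $d'$-dimensional vectors, respectively, then $W$ is a $d' \times d$ weight matrix and $b$ is a $d'$-dimensional bias vector.
The hidden vector $h$ is then mapped back to a reconstruction vector $\hat{x}$ by the decoder:
$$\hat{x} = f(W'h + b'), \tag{4}$$
where the output vector $\hat{x}$ is $d$-dimensional, $W'$ is a $d \times d'$ weight matrix, and $b'$ is a $d$-dimensional bias vector.
The parameter set $\theta = \{W, b, W', b'\}$ is optimized by minimizing the average reconstruction error over the $m$ training samples:
$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \ell\left(x^{(i)}, \hat{x}^{(i)}\right), \tag{5}$$
where $\ell(\cdot, \cdot)$ is the loss function. Most commonly, the mean square error (MSE) is used:
$$\ell_{\mathrm{MSE}}(x, \hat{x}) = \frac{1}{d} \sum_{k=1}^{d} \left(x_k - \hat{x}_k\right)^2. \tag{6}$$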
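Equations (3) to (6) translate directly into a forward pass and a cost. This is a minimal NumPy sketch; because the decoder in (4) also applies the sigmoid, it assumes inputs scaled to [0, 1], and the uniform initialization range and the names `init_autoencoder`, `forward`, and `mse_cost` are hypothetical choices for illustration.

```python
import numpy as np

def sigmoid(z):
    # numerically stable form of 1 / (1 + exp(-z))
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def init_autoencoder(d, d_hidden, seed=0):
    rng = np.random.default_rng(seed)
    r = np.sqrt(6.0 / (d + d_hidden + 1))  # small uniform init (an assumption)
    return {"W": rng.uniform(-r, r, (d_hidden, d)), "b": np.zeros(d_hidden),
            "W2": rng.uniform(-r, r, (d, d_hidden)), "b2": np.zeros(d)}  # W2, b2 play W', b'

def forward(params, X):
    """X: (m, d) batch. Returns hidden codes H (m, d') and reconstructions X_hat (m, d)."""
    H = sigmoid(X @ params["W"].T + params["b"])        # eq. (3)
    X_hat = sigmoid(H @ params["W2"].T + params["b2"])  # eq. (4)
    return H, X_hat

def mse_cost(X, X_hat):
    # eq. (5) averaged over samples, with the per-sample MSE loss of eq. (6)
    return float(np.mean(np.mean((X - X_hat) ** 2, axis=1)))
```

With 784 inputs and 50 hidden units this matches the first autoencoder of the 784-50-10 stack used in Section 3.1.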
2.5.2. Robust Autoencoder
The traditional autoencoder based on the MSE loss is ill-suited to stable feature learning from EEG signals. In EEG, and especially in scalp EEG, the abundant noise caused by EMG artifacts or electrode movements can introduce abrupt changes in the signal and lead to outliers in both the time and frequency domains. A typical example is shown in Figure 2. In this time window, the EEG signals are contaminated by short-term EMG artifacts that cause abrupt, large-amplitude oscillations in some of the channels, as shown in Figure 2(a). In the cross-power domain, these artifacts produce outlying large values, visible as the light blocks in Figure 2(b). In this example, the cross-power between channel 17 and channel 18 is 5.41 × 10^4, whereas the interquartile range of the cross-power values is only 395.3. In this situation, the MSE-based cost of the traditional autoencoder can be dominated by these outliers, which weakens its feature learning ability.
To learn robust features from EEG signals, we replace the loss function of the autoencoder with a correntropy-based criterion, yielding a robust autoencoder.
Maximum Correntropy Criterion. Correntropy is a localized similarity measure [26] that has shown good outlier suppression ability [27, 28]. For two random variables $X$ and $Y$, the correntropy is defined as
$$V_\sigma(X, Y) = E\left[\kappa_\sigma(X - Y)\right], \tag{7}$$
where $E[\cdot]$ is the mathematical expectation and $\kappa_\sigma(\cdot)$ is the Gaussian kernel with kernel size $\sigma$:
$$\kappa_\sigma(e) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{e^2}{2\sigma^2}\right). \tag{8}$$
Correntropy induces a metric in which, as the distance between $X$ and $Y$ grows, the equivalent distance evolves from the L2-norm to the L1-norm and eventually to the L0-norm when $X$ and $Y$ are far apart [29]. Compared with second-order statistics such as MSE, correntropy is therefore less sensitive to outliers. Figure 3 compares the second-order cost and the correntropy cost. As the input moves away from the center, the second-order cost increases sharply, which makes it sensitive to outliers. By contrast, correntropy is sensitive only within a local range, and the cost grows extremely slowly once the input leaves the central area. Correntropy is therefore particularly effective for outlier suppression.
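The contrast described for Figure 3 can be checked numerically. The sketch below compares the squared error with a correntropy-induced loss $\kappa_\sigma(0) - \kappa_\sigma(e)$; turning the similarity (8) into a cost that is zero at $e = 0$ in this way is our illustrative choice, and the function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(e, sigma):
    """Gaussian kernel of eq. (8)."""
    return np.exp(-np.square(e) / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

def correntropy_loss(e, sigma):
    # kappa(0) - kappa(e): zero at e = 0 and bounded above by kappa(0)
    return gaussian_kernel(0.0, sigma) - gaussian_kernel(e, sigma)

def squared_loss(e):
    # second-order cost: grows without bound as |e| increases
    return np.square(e)
```

Moving an error from 10 to 100 multiplies the squared loss by 100, while with $\sigma = 1$ the correntropy-induced loss is already saturated at $\kappa_\sigma(0)$ and essentially does not change, which is why a few large outliers cannot dominate it.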
In practice, the joint probability density function of $X$ and $Y$ is unknown, and usually only a finite set of samples $\{(x_i, y_i)\}_{i=1}^{n}$ is available. The correntropy can then be estimated by
$$\hat{V}_\sigma(X, Y) = \frac{1}{n} \sum_{i=1}^{n} \kappa_\sigma(x_i - y_i). \tag{9}$$
Maximizing the correntropy in (9) is called the maximum correntropy criterion (MCC) [29]. Owing to the good outlier rejection property of correntropy, MCC is well suited to robust algorithm design.
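A small example shows MCC rejecting an outlier. The sketch below estimates a single constant $\mu$ from samples by maximizing the sample correntropy (9) between the samples and $\mu$. Setting the derivative of $\sum_i \kappa_\sigma(x_i - \mu)$ to zero gives the fixed-point update $\mu = \sum_i w_i x_i / \sum_i w_i$ with $w_i = \kappa_\sigma(x_i - \mu)$. The location-estimation setting, the median start, and the function names are illustrative assumptions, not part of the RSAE model itself.

```python
import numpy as np

def gaussian_kernel(e, sigma):
    """Gaussian kernel of eq. (8)."""
    return np.exp(-np.square(e) / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

def sample_correntropy(x, y, sigma):
    """Sample estimate of correntropy, eq. (9)."""
    return float(np.mean(gaussian_kernel(np.asarray(x, float) - np.asarray(y, float), sigma)))

def mcc_location(x, sigma, n_iter=50):
    """Constant mu maximizing the sample correntropy between x and mu (fixed-point iteration)."""
    x = np.asarray(x, dtype=float)
    mu = float(np.median(x))                  # robust starting point (our choice)
    for _ in range(n_iter):
        w = gaussian_kernel(x - mu, sigma)    # samples far from mu get near-zero weight
        mu = float(np.sum(w * x) / np.sum(w))
    return mu
```

For the samples [1.0, 1.1, 0.9, 1.05, 0.95, 100.0] with $\sigma = 1$, the MCC estimate stays close to 1.0, whereas the sample mean (the MSE solution) is pulled to 17.5 by the single outlier.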
Robust Autoencoder Based on MCC. To improve the noise robustness of traditional autoencoders, we measure the reconstruction loss between the input vector $x$ and the output vector $\hat{x}$ by MCC instead of MSE. In the MCC-based robust autoencoder, the cost function is formulated as
$$J_{\mathrm{MCC}}(\theta) = \frac{1}{md} \sum_{i=1}^{m} \sum_{k=1}^{d} \kappa_\sigma\left(x_k^{(i)} - \hat{x}_k^{(i)}\right), \tag{10}$$
where $m$ is the number of training samples and $d$ is the length of each training sample. The optimal parameter $\theta$ is obtained when $J_{\mathrm{MCC}}$ is maximized.
To encourage the deep model to capture more implicit patterns, a sparsity-inducing term is adopted. Studies of sparse coding have shown that sparseness plays a key role in learning useful features [30, 31]. Xie et al. [32] combined the virtues of sparse coding and deep networks in a sparse stacked denoising autoencoder to achieve better feature learning and denoising performance. In our model, we regularize the reconstruction loss by a sparsity-inducing term defined as in [32]:
$$J_{\mathrm{sparse}} = \beta \sum_{j=1}^{s_2} \left[\rho \log\frac{\rho}{\hat{\rho}_j} + (1-\rho)\log\frac{1-\rho}{1-\hat{\rho}_j}\right], \tag{11}$$
where $\beta$ is the weight adjustment parameter, $s_2$ is the number of units in the second (hidden) layer, $\hat{\rho}_j$ is the average activation of the $j$th hidden unit over the training samples, and $\rho$ is a small number. The sparsity-inducing term constrains $\hat{\rho}_j$ to stay near $\rho$ under the Kullback-Leibler divergence.
A weight decay term is also added to avoid overfitting:
$$J_{\mathrm{weight}} = \frac{\lambda}{2} \sum_{l=1}^{2} \sum_{i=1}^{n_l} \sum_{j=1}^{n_{l+1}} \left(W_{ji}^{(l)}\right)^2, \tag{12}$$
where $W_{ji}^{(l)}$ is an element of the weight matrix $W^{(l)}$ of layer $l$ (with $W^{(1)} = W$ and $W^{(2)} = W'$), $\lambda$ adjusts the weight of $J_{\mathrm{weight}}$, and $n_l$ denotes the number of units in layer $l$. The cost function of the proposed robust autoencoder is therefore defined as
$$J_{\mathrm{RSAE}}(\theta) = -J_{\mathrm{MCC}}(\theta) + J_{\mathrm{sparse}} + J_{\mathrm{weight}}. \tag{13}$$
The parameter set $\theta$ is optimized by minimizing $J_{\mathrm{RSAE}}$.
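The full RSAE cost (13) for one autoencoder can be sketched as a single function. This is a minimal NumPy sketch, assuming sigmoid encoder and decoder as in (3) and (4) and $\hat{\rho}_j$ computed as the batch-average activation; the function name `rsae_cost`, the parameter tuple layout, and the clipping of $\hat{\rho}_j$ away from 0 and 1 are our own choices. Gradients for training would come from automatic differentiation or hand-derived backpropagation, which is not shown.

```python
import numpy as np

def sigmoid(z):
    # numerically stable form of 1 / (1 + exp(-z))
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def rsae_cost(params, X, sigma, beta, rho, lam):
    """Cost (13) of one robust autoencoder on a batch X of shape (m, d); to be minimized."""
    W1, b1, W2, b2 = params                            # W1, b1 = W, b;  W2, b2 = W', b'
    H = sigmoid(X @ W1.T + b1)                         # encoder, eq. (3)
    X_hat = sigmoid(H @ W2.T + b2)                     # decoder, eq. (4)
    E = X - X_hat
    kappa = np.exp(-E**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
    j_mcc = kappa.mean()                               # eq. (10): mean over m samples and d entries
    rho_hat = np.clip(H.mean(axis=0), 1e-8, 1 - 1e-8)  # average activation of each hidden unit
    kl = rho * np.log(rho / rho_hat) + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))
    j_sparse = beta * kl.sum()                         # eq. (11)
    j_weight = 0.5 * lam * (np.sum(W1**2) + np.sum(W2**2))  # eq. (12)
    return float(-j_mcc + j_sparse + j_weight)         # eq. (13)
```

Because every kernel term lies between 0 and $\kappa_\sigma(0)$, replacing one input entry by 10^4 can change the MCC part of this cost by at most $\kappa_\sigma(0)/m$, whereas with $m = 10$ and $d = 20$ the same entry would add roughly $(10^4)^2/(md) \approx 5 \times 10^5$ to an MSE cost of the form (5) and (6).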
2.5.3. Stacking Robust Autoencoders into Deep Network
To learn more effective features for seizure classification, we stack the robust autoencoders into a deep model. Stacking robust autoencoders works in the same way as stacking ordinary autoencoders [17], and the output of the highest layer is fed into a softmax classifier for seizure detection. The model is trained for seizure classification accuracy, so the feature and the classifier can be optimized simultaneously.
The training process of the deep network includes two stages: unsupervised pretraining and supervised fine-tuning. In the pretraining stage, the network is trained layer by layer with the proposed robust autoencoder to learn useful filters for feature extraction. A well-pretrained network provides a good starting point for fine-tuning [33]. In the fine-tuning stage, a softmax classifier is added to the output of the stack, and the parameters of the whole system are tuned to minimize the classification error in a supervised manner. The network is tuned globally through backpropagation, so all the parameters of both feature extraction and classification are optimized jointly. After fine-tuning, the deep network is configured for the best overall classification performance.
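Joint fine-tuning can be sketched as one gradient step that backpropagates the softmax cross-entropy through every pretrained encoder layer. This is a minimal sketch under stated assumptions: full-batch gradient descent with a fixed learning rate (the optimizer is our choice), decoders discarded after pretraining, and labels 0 (nonseizure) and 1 (seizure). The function name `finetune_step` and the toy layer sizes in the test are hypothetical; the paper's stack would use encoders of size 784 → 50 → 10 (or q).

```python
import numpy as np

def sigmoid(z):
    # numerically stable form of 1 / (1 + exp(-z))
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def finetune_step(layers, W_out, b_out, X, y, lr=0.5):
    """One in-place gradient step on mean cross-entropy; returns the loss before the step.

    layers: list of (W, b) pretrained encoder pairs; W_out, b_out: softmax layer.
    """
    acts = [X]
    for W, b in layers:                     # forward pass through the stacked encoders
        acts.append(sigmoid(acts[-1] @ W.T + b))
    P = softmax(acts[-1] @ W_out.T + b_out)
    m = X.shape[0]
    loss = -np.mean(np.log(P[np.arange(m), y] + 1e-12))
    delta = P.copy()
    delta[np.arange(m), y] -= 1.0
    delta /= m                              # gradient w.r.t. the softmax inputs
    grad_W_out, grad_b_out = delta.T @ acts[-1], delta.sum(axis=0)
    delta = (delta @ W_out) * acts[-1] * (1 - acts[-1])   # into the top encoder layer
    W_out -= lr * grad_W_out
    b_out -= lr * grad_b_out
    for i in range(len(layers) - 1, -1, -1):  # backpropagate through every encoder layer
        W, b = layers[i]
        grad_W, grad_b = delta.T @ acts[i], delta.sum(axis=0)
        if i > 0:
            delta = (delta @ W) * acts[i] * (1 - acts[i])
        W -= lr * grad_W
        b -= lr * grad_b
    return float(loss)
```

Because the encoder weights receive gradients too, the features themselves change during fine-tuning, which is the joint optimization of feature and classifier described above.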
3. Results and Discussion
In this section, experiments are carried out to evaluate the seizure detection performance of our model. The experiments include four parts: (1) we compare the unsupervised feature learning performance of the proposed RSAE model and the standard stacked autoencoder (SSAE); (2) we compare the features before and after supervised fine-tuning to demonstrate the benefit of joint optimization; (3) we compare the seizure detection performance of the RSAE model with that of other methods; (4) we evaluate the influence of the RSAE model's parameters on seizure detection performance.
In our experiments, seizure detection performance is evaluated with two commonly used criteria, sensitivity and specificity. Sensitivity is the percentage of seizure segments that are correctly detected, and specificity is the percentage of nonseizure segments that are correctly classified.
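Both criteria follow directly from the segment-level predictions. A minimal sketch, assuming labels 1 for seizure and 0 for nonseizure; the function name `sensitivity_specificity` is hypothetical.

```python
import numpy as np

def sensitivity_specificity(y_true, y_pred):
    """Segment-level sensitivity and specificity; label 1 = seizure, 0 = nonseizure."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    sens = np.sum(y_pred & y_true) / np.sum(y_true)      # detected seizure segments
    spec = np.sum(~y_pred & ~y_true) / np.sum(~y_true)   # correctly rejected nonseizure segments
    return float(sens), float(spec)
```

With 250 seizure and 3250 nonseizure test segments per patient, a specificity of 0.99 still corresponds to about 32 nonseizure segments classified as seizure.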
3.1. Performance of Feature Learning
In this experiment, we evaluate the unsupervised feature learning ability of the RSAE model on EEG signals. The RSAE model is trained to learn compact features from the cross-power matrix. After layer-wise self-taught training, the deep network is configured to extract useful features. The feature extraction results of the proposed RSAE model are illustrated in Figure 4. In both illustrations, the seizure begins at about the 20th second. After seizure onset, the patterns of the features extracted by the RSAE model differ clearly from those of the nonseizure period.
The feature learning performance of RSAE and SSAE is compared on the EEG signals. To evaluate the features quantitatively, we use classification performance as the criterion. In this experiment, the cost function of the SSAE model is
$$J_{\mathrm{SSAE}}(\theta) = J_{\mathrm{MSE}}(\theta) + J_{\mathrm{sparse}} + J_{\mathrm{weight}}, \tag{14}$$
where $J_{\mathrm{MSE}}$ is the average reconstruction error (5) with the MSE loss in (6), and $J_{\mathrm{sparse}}$ and $J_{\mathrm{weight}}$ are formulated as in RSAE.
We stack two autoencoders to form a three-layer network with 784 input units (the 28 × 28 cross-power matrix reshaped into a vector), 50 hidden units, and 10 output units. The same stacked architecture is used for both RSAE and SSAE. The networks are initialized randomly and trained layer by layer using backpropagation to minimize the cost functions. The parameters are set as , , and for both methods and for RSAE.
The seizure detection results of the RSAE and SSAE models are shown in Table 2. To eliminate the effects of randomness in network initialization, all results are averaged over 10 trials. The average sensitivity of RSAE is 97%, a 14% improvement over SSAE. The average specificity of RSAE is 92%, which is also higher than that of SSAE. Thus, RSAE outperforms SSAE in both sensitivity and specificity.

Analyzing the detection results, we find that SSAE fails mostly on EEG segments with impulsive noise, such as the segment illustrated in Figure 2. Because such abrupt artifacts appear frequently in EEG signals, the SSAE model cannot be trained well: its MSE-based cost is dominated by the large outliers, so these EEG segments are poorly represented. By contrast, the MCC in the RSAE model is robust to large outliers. The proposed RSAE method therefore handles noise in EEG signals well and provides more robust feature extraction than SSAE.
3.2. Performance of Joint Feature Optimization
In this experiment, we test the effect of joint feature optimization. After MCC-based unsupervised learning, the deep network is configured to extract useful features from EEG signals. On this basis, the deep model is fine-tuned through backpropagation to optimize the feature and the classifier jointly, so that the best overall classification performance can be achieved. The parameters of RSAE are set as in Section 3.1, except that the output layer has 3 units for convenience of visualization.
The features before and after fine-tuning are compared visually in Figure 5. In Figures 5(a) and 5(b), red circles denote features of seizure segments and blue stars denote features of nonseizure segments. After fine-tuning, the seizure and nonseizure segments are more separable in the feature space. We quantify the separability of the features before and after fine-tuning with the FDR criterion in (2), using the first four patients. As illustrated in Figure 5(c), the fine-tuned features achieve an FDR about ten times higher than the original ones, which indicates that joint optimization helps learn features with high separability and thereby improves seizure detection performance.
The seizure detection performance of the features before and after fine-tuning is presented in Table 3. After joint feature learning, the average sensitivity over the six patients increases by 8% and the average specificity increases by 15%. The joint learning process therefore enhances the separability of the features between the two classes and substantially improves seizure detection performance.

3.3. Performance of Seizure Detection
In this experiment, the seizure detection performance of the proposed RSAE model is evaluated and compared with a singular value decomposition (SVD) based method. SVD is a popular tool for correlation matrix analysis. Studies have shown that seizure EEG signals commonly lead to a lower-complexity state, which is well reflected by the eigenvalues obtained from the SVD of the correlation matrix [10, 22].
To provide a benchmark for the comparison, we also test the seizure detection performance of the original cross-power matrix without further feature extraction. The methods included in the comparison are configured as follows.
(i) SVM: the cross-power matrices of the time windows are reshaped into vectors and fed into an SVM classifier with an RBF kernel. The parameters of the SVM model are selected by 3-fold cross-validation.
(ii) SVD(p) + SVM: for each time window, the cross-power matrix is decomposed by SVD, and the first p eigenvalues are used as the features. The feature vectors are then classified by an SVM classifier with an RBF kernel, whose parameters are selected by 3-fold cross-validation.
(iii) RSAE(q): the RSAE model is configured with 784 input units, 50 hidden units, and q output units. The parameters are set as , , , and . For this method, all results are averaged over 10 trials.
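The SVD(p) feature of method (ii) can be sketched as taking the p largest singular values of each cross-power matrix. For a symmetric matrix the singular values are the absolute values of its eigenvalues, so they coincide with the eigenvalues when the matrix is positive semidefinite. The function name `svd_features` is hypothetical, and the SVM stage that consumes these features is not shown.

```python
import numpy as np

def svd_features(cross_power, p):
    """Top-p singular values of one cross-power matrix (SVD(p) baseline feature)."""
    s = np.linalg.svd(cross_power, compute_uv=False)  # returned in descending order
    return s[:p]
```

Each 28 × 28 window is thus compressed to p numbers, which discards the spatial arrangement of the entries; this is consistent with the information loss reported for the SVD-based method.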
The seizure detection results of the three methods are given in Table 4. For both SVD + SVM and RSAE, we test two different choices of p and q, respectively. With the original cross-power matrix classified by SVM, sensitivities above 0.99 are achieved for all six patients, and the average specificity is 0.91. The SVD + SVM method with the smaller p performs unevenly across patients: for pt03, a sensitivity of 0.96 is reached with a specificity of 0.99, but low sensitivities are obtained for pt01, pt05, and pt06. With the larger p, where more features are preserved, SVD + SVM achieves better sensitivities and specificities; however, the performance is still uneven across patients, and the average sensitivity is only 0.83. Because the SVD-based feature extraction discards much useful information, it performs worse than the SVM benchmark, and its performance drops further when fewer eigenvalues are used. By contrast, the proposed RSAE method performs better than the SVM benchmark. RSAE with the larger q achieves sensitivities of 1.00 and specificities of 0.99 for all patients, and equally high performance is obtained with the smaller q. The RSAE model thus retains robust seizure detection ability even with such a small feature dimension.
 
SEN indicates sensitivity and SPE is specificity. 
3.4. Model Analysis
In this experiment, we test the influence of two important parameters on seizure detection performance: the number of output features, that is, the number of units in the output layer of the RSAE model, and the kernel size of MCC. The experiment is carried out using the first four patients.
3.4.1. Analysis of Feature Number
The number of features is controlled by the parameter q in Section 3.3. To test the influence of q on seizure detection, all the other parameters are fixed as in Section 3.3, and q is gradually reduced from 20 to 3. Figure 6(a) shows the seizure detection results, averaged over four patients, for different choices of q. The seizure detection performance of RSAE before fine-tuning decreases slightly as the number of features decreases. After fine-tuning, however, the performance is greatly enhanced, with sensitivities and specificities of up to 99% achieved even with small numbers of features.
3.4.2. Analysis of $\sigma$
In MCC, the kernel size $\sigma$ is an important parameter: an appropriate choice of $\sigma$ can effectively suppress outliers and noise. The kernel size, or bandwidth, is a free parameter whose selection is still an open issue in information theoretic learning (ITL) [26, 29, 34]. In practice, it can be selected with Silverman's rule [35]. In the experiments of Sections 3.1–3.3, we simply set $\sigma$ to a fixed value.
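As an alternative to a fixed $\sigma$, a Silverman-style rule can be applied to the reconstruction errors. The sketch below uses one common rule-of-thumb form, $\sigma = 1.06 \min(\hat{s}, \mathrm{IQR}/1.34)\, n^{-1/5}$, where $\hat{s}$ is the sample standard deviation; applying it to the pooled errors $x_k - \hat{x}_k$ and the name `silverman_kernel_size` are our own assumptions, not the authors' procedure.

```python
import numpy as np

def silverman_kernel_size(errors):
    """Rule-of-thumb kernel size from a sample of errors (e.g., x - x_hat, pooled)."""
    e = np.ravel(np.asarray(errors, dtype=float))
    n = e.size
    q75, q25 = np.percentile(e, [75, 25])
    # the smaller of two spread estimates guards against heavy tails and outliers
    spread = min(np.std(e, ddof=1), (q75 - q25) / 1.34)
    return 1.06 * spread * n ** (-1 / 5)
```

The rule scales linearly with the spread of the errors and shrinks slowly as the number of samples grows; if more than half of the errors are identical the interquartile range is zero and the rule returns 0, so a lower bound would be needed in practice.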
Here, we test the influence of $\sigma$ on overall seizure detection performance, again with all the other parameters fixed as in Section 3.3. Figure 6(b) shows the seizure detection results, averaged over four patients, for different values of $\sigma$. High seizure detection performance is achieved over a wide range of $\sigma$. Better results are obtained with small $\sigma$; as $\sigma$ increases from 0.1 to 0.2, the seizure detection performance becomes worse. In practice, $\sigma$ should be kept small to preserve the local property of MCC.
4. Conclusions
In this paper, we have presented a novel deep model capable of extracting robust features in the presence of large amounts of outliers. Experimental results show that the proposed RSAE model learns effective features from EEG signals for high-performance seizure detection, which is promising for clinical applications.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This research was supported by Grants from the National Natural Science Foundation of China (no. 61031002), National 973 Program (no. 2013CB329500), National High Technology Research and Development Program of China (no. 2012AA020408), National Natural Science Foundation of China (no. 61103107), and Zhejiang Provincial Science and Technology Project (no. 2013C030453).
References
 Epilepsy, Factsheet no. 999, World Health Organization, Geneva, Switzerland, 2012.
 R. S. Fisher, W. Van Emde Boas, W. Blume et al., “Epileptic seizures and epilepsy: definitions proposed by the International League Against Epilepsy (ILAE) and the International Bureau for Epilepsy (IBE),” Epilepsia, vol. 46, no. 4, pp. 470–472, 2005. View at: Publisher Site  Google Scholar
 F. Mormann, R. G. Andrzejak, C. E. Elger, and K. Lehnertz, “Seizure prediction: the long and winding road,” Brain, vol. 130, no. 2, pp. 314–333, 2007. View at: Publisher Site  Google Scholar
 M. E. Saab and J. Gotman, “A system to detect the onset of epileptic seizures in scalp EEG,” Clinical Neurophysiology, vol. 116, no. 2, pp. 427–442, 2005. View at: Publisher Site  Google Scholar
 K. Majumdar and P. Vardhan, “Automatic seizure detection in ECoG by differential operator and windowed variance,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 19, no. 4, pp. 356–365, 2011. View at: Publisher Site  Google Scholar
 R. Yadav, A. K. Shah, J. A. Loeb, M. N. S. Swamy, and R. Agarwal, “Morphologybased automatic seizure detector for intracerebral EEG recordings,” IEEE Transactions on Biomedical Engineering, vol. 59, no. 7, pp. 1871–1881, 2012. View at: Publisher Site  Google Scholar
 S. GhoshDastidar, H. Adeli, and N. Dadmehr, “Principal component analysisenhanced cosine radial basis function neural network for robust epilepsy and seizure detection,” IEEE Transactions on Biomedical Engineering, vol. 55, no. 2, pp. 512–518, 2008. View at: Publisher Site  Google Scholar
 A. S. Zandi, M. Javidan, G. A. Dumont, and R. Tafreshi, “Automated realtime epileptic seizure detection in scalp EEG recordings using an algorithm based on wavelet packet transform,” IEEE Transactions on Biomedical Engineering, vol. 57, no. 7, pp. 1639–1651, 2010. View at: Publisher Site  Google Scholar
 C. C. Jouny and G. K. Bergey, “Characterization of early partial seizure onset: Frequency, complexity and entropy,” Clinical Neurophysiology, vol. 123, no. 4, pp. 658–669, 2012. View at: Publisher Site  Google Scholar
 S. Santaniello, S. P. Burns, A. J. Golby, J. M. Singer, W. S. Anderson, and S. V. Sarma, “Quickest detection of drugresistant seizures: an optimal control approach,” Epilepsy and Behavior, vol. 22, supplement 1, pp. S49–S60, 2011. View at: Publisher Site  Google Scholar
 D. Liu and Z. Pang, “Epileptic seizures predicted by modified particle filters,” in Proceedings of the IEEE International Conference on Networking, Sensing and Control (ICNSC08), pp. 351–356, IEEE, Sanya, China, April 2008. View at: Publisher Site  Google Scholar
 D. Liu, Z. Pang, and Z. Wang, “Epileptic seizure prediction by a system of particle filter associated with a neural network,” EURASIP Journal on Advances in Signal Processing, vol. 2009, Article ID 638534, 2009. View at: Publisher Site  Google Scholar
 P. Vincent, H. Larochelle, Y. Bengio, and P. Manzagol, “Extracting and composing robust features with denoising autoencoders,” in Proceedings of the 25th International Conference on Machine Learning, pp. 1096–1103, ACM, July 2008. View at: Google Scholar
 G. E. Hinton, S. Osindero, and Y. Teh, “A fast learning algorithm for deep belief nets,” Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006. View at: Publisher Site  Google Scholar  Zentralblatt MATH  MathSciNet
 G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” The American Association for the Advancement of Science. Science, vol. 313, no. 5786, pp. 504–507, 2006. View at: Publisher Site  Google Scholar  MathSciNet
 Y. Boureau and Y. Cun, “Sparse feature learning for deep belief networks,” in Proceedings of the Advances in Neural Information Processing Systems, pp. 1185–1192, 2007. View at: Google Scholar
 Y. Bengio, “Learning deep architectures for AI,” Foundations and Trends in Machine Learning, vol. 2, no. 1, pp. 1–27, 2009. View at: Publisher Site  Google Scholar  Zentralblatt MATH
 R. Salakhutdinov, J. B. Tenenbaum, and A. Torralba, “Learning with hierarchical-deep models,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1958–1971, 2013.
 H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng, “Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations,” in Proceedings of the 26th International Conference on Machine Learning (ICML '09), pp. 609–616, Montreal, Canada, June 2009.
 G. Hinton, L. Deng, D. Yu et al., “Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups,” IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82–97, 2012.
 H. Lee, Y. Largman, P. Pham, and A. Y. Ng, “Unsupervised feature learning for audio classification using convolutional deep belief networks,” in Proceedings of the 23rd Annual Conference on Neural Information Processing Systems (NIPS '09), vol. 9, pp. 1096–1104, December 2009.
 K. Schindler, H. Leung, C. E. Elger, and K. Lehnertz, “Assessing seizure dynamics by analysing the correlation structure of multichannel intracranial EEG,” Brain, vol. 130, no. 1, pp. 65–77, 2007.
 K. A. Schindler, S. Bialonski, M. Horstmann, C. E. Elger, and K. Lehnertz, “Evolving functional network properties and synchronizability during human epileptic seizures,” Chaos, vol. 18, no. 3, Article ID 033119, 2008.
 C. Rummel, M. Müller, G. Baier, F. Amor, and K. Schindler, “Analyzing spatiotemporal patterns of genuine cross-correlations,” Journal of Neuroscience Methods, vol. 191, no. 1, pp. 94–100, 2010.
 B. Schölkopf and K. Müller, “Fisher discriminant analysis with kernels,” 1999.
 W. Liu, P. P. Pokharel, and J. C. Principe, “Correntropy: a localized similarity measure,” in Proceedings of the International Joint Conference on Neural Networks (IJCNN '06), pp. 4919–4924, July 2006.
 K. Jeong, W. Liu, S. Han, E. Hasanbelliu, and J. C. Principe, “The correntropy MACE filter,” Pattern Recognition, vol. 42, no. 5, pp. 871–885, 2009.
 R. He, B. Hu, W. Zheng, and X. Kong, “Robust principal component analysis based on maximum correntropy criterion,” IEEE Transactions on Image Processing, vol. 20, no. 6, pp. 1485–1494, 2011.
 W. Liu, P. P. Pokharel, and J. C. Principe, “Correntropy: properties and applications in non-Gaussian signal processing,” IEEE Transactions on Signal Processing, vol. 55, no. 11, pp. 5286–5298, 2007.
 B. A. Olshausen and D. J. Field, “Emergence of simple-cell receptive field properties by learning a sparse code for natural images,” Nature, vol. 381, no. 6583, pp. 607–609, 1996.
 H. Lee, A. Battle, R. Raina, and A. Ng, “Efficient sparse coding algorithms,” Advances in Neural Information Processing Systems, vol. 19, pp. 801–808, 2007.
 J. Xie, L. Xu, and E. Chen, “Image denoising and inpainting with deep neural networks,” in Proceedings of the 26th Annual Conference on Neural Information Processing Systems (NIPS '12), vol. 25, pp. 350–358, December 2012.
 P. Vincent, H. Larochelle, I. Lajoie, and P. Manzagol, “Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion,” Journal of Machine Learning Research, vol. 11, pp. 3371–3408, 2010.
 R. He, W. Zheng, B. Hu, and X. Kong, “A regularized correntropy framework for robust pattern recognition,” Neural Computation, vol. 23, no. 8, pp. 2074–2100, 2011.
 B. Silverman, Density Estimation for Statistics and Data Analysis, vol. 26, CRC Press, 1986.
Copyright
Copyright © 2014 Yu Qi et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.