Recent Advances in Brain Signal Analysis: Methods and Applications
Research Article | Open Access
A Novel Fixed Low-Rank Constrained EEG Spatial Filter Estimation with Application to Movie-Induced Emotion Recognition
Abstract
This paper proposes a novel fixed low-rank spatial filter estimation for brain computer interface (BCI) systems, with an application that recognizes emotions elicited by movies. The proposed approach unifies feature extraction, feature selection, and classification, which are often tackled independently in a “bottom-up” manner, under a single regularized loss minimization problem. The loss function is explicitly derived from the conventional BCI approach and is minimized under a nonconvex fixed low-rank constraint. For evaluation, an experiment was conducted in which movies were used to induce emotions in dozens of young adult subjects, and their emotional states were estimated using the proposed method. Because spatial filter estimation is part of this monolithic optimization problem with fixed low-rank regularization, the optimal spatial filters are estimated implicitly. The proposed method shows competitive performance against the best CSP-based alternatives.
1. Introduction
Brain computer interfaces (BCIs) are a rapidly growing field of research that combines neurophysiological insights, statistical signal analysis, and machine learning. BCIs are generally designed based on a pattern recognition approach, that is, extracting features from EEG signals and using a classifier to identify the user’s mental state from such features [1]. Those sequential approaches are called “bottom-up” schemes; given a large collection of single-trial EEG data, better representations of the data are extracted to finally obtain the classification output at the top. In contrast, discriminative or “top-down” approaches focus on predicting user intentions and are based on two criteria: the empirical prediction performance and the regularizer. Suitably chosen regularizers automatically induce sparse decomposition of the signal, which corresponds to conventional feature extraction [2].
This paper proposes a discriminative method that uses a low-rank regularizer to estimate spatial filters for extracting effective features for the task under study. The advantage of the proposed method is that it combines feature selection, feature extraction, and classification into a monolithic optimization problem with low-rank regularization, because spatial filter estimation is included in the optimization framework of the statistical inference model. Under a suitably chosen regularizer, it induces the best inference model, which implicitly estimates optimal spatial filters under the regularization assumption.
Emotion classification from EEG data has attracted much attention recently [3, 4]. Emotion also plays an important role in human-human communication and interaction. The ability to recognize the emotional states of people is an important part of natural communication. This field of research is still relatively new, and much remains to be done, both to improve existing elements of BCI and to discover new possibilities.
To evaluate the proposed method, experiments were conducted in which movies were used to induce emotions in dozens of young adult subjects, and their emotional states were estimated using the proposed method. The results were compared with those of conventional methods based on common spatial patterns (CSP).
This paper’s contribution is the proposal and explicit derivation of the fixed low-rank constrained discriminative approach and its application to emotion recognition, together with a comparative analysis against conventional methods. This paper is organized as follows. Section 2 describes the background of emotion recognition from EEGs, and Section 3 describes the proposed method. Section 4 presents the data acquisition and experimental protocol. Section 5 describes the results and discussion. Section 6 concludes the paper.
2. Background
2.1. Emotion in the Brain
The limbic system, which is like a cortical ring around the brain stem, is responsible for the initial emotional interpretation of the signals from the autonomic nervous system. The hypothalamus is responsible for processing the incoming signals and triggering the corresponding visceral physiological effects, such as a raised heart rate or galvanic skin response.
From the hypothalamus, the stimulus information is passed on to the amygdala, which is important for learning to connect stimuli to emotional reactions (reward/fear) and for evaluating new stimuli by comparing them to past experience.
The amygdala is considered vital for emotion processing. However, since it is an underlying structure like the rest of the limbic system, its activity cannot be detected directly in recordings from the scalp. The amygdala is connected to the temporal and prefrontal cortices, which is thought to be the way visceral sensations are interpreted cognitively, resulting in a consciously experienced feeling of an emotion [5].
The temporal lobe is essential for hearing, language, and emotion and also plays an important role in memory. The prefrontal lobe is involved in the so-called highest level of functioning. It is responsible for cognitive, emotional, and motivational processes. The prefrontal lobe is part of the frontal cortex, which is said to be the emotional control center and to even determine personality. It is involved in, among others, judgment and social behavior. These functions are very much based on the experience of emotions.
2.2. Valence: Hemispherical Asymmetry
Psychophysiological research has shown the importance of the difference in activation between the two cortical hemispheres in the reaction that subjects show toward stimuli. Left frontal inactivation is an indicator of a withdrawal response, which is often linked to a negative emotion. On the other hand, right frontal inactivation is a sign of an approach response, or positive emotion.
Harmon-Jones [6] suggests that the hemispherical differences are not an indication of affective valence, but of motivational direction (approach or withdrawal behavior to the stimulus). Affective valence does seem tightly linked to motivational direction. Therefore, the hemispherical asymmetry patterns do indicate the affective valence.
Davidson and Fox [7] found that 10-month-old infants exhibited increased left frontal activation in response to a film clip of an actress generating a happy facial expression as compared to a sad facial expression. Frontal cortical activity has been found to relate to facial expressions of positive and negative emotions as well.
3. Method
3.1. General Framework
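Given a short high-pass filtered EEG segment, $X \in \mathbb{R}^{C \times T}$, where $C$ is the number of channels and $T$ is the number of time points, the data are first bandpass filtered at the band range being studied. A commonly used form of a second-order or power oscillation-based linear model can be written as follows:

$$f(X; \theta) = \sum_{j=1}^{J} \beta_j \log\left(w_j^\top X X^\top w_j\right) + \beta_0. \tag{1}$$

Here, $\{w_j\}_{j=1}^{J}$ with $w_j \in \mathbb{R}^{C}$ are the spatial filters, $\{\beta_j\}_{j=1}^{J}$ are the weighting coefficients of the features, $\beta_0$ is a bias term, and $\theta$ collects all of these parameters. The classifier first projects the signal by the $J$ spatial filters. Next, it takes the logarithm of the power of each projected signal. Finally, it linearly combines these $J$-dimensional features and adds the bias.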
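The three steps of this classifier (spatial projection, log power, linear combination plus bias) can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function names, the random data, and the placeholder filters and weights are assumptions made for the example, not values or code from the study.

```python
import numpy as np

def log_power_features(X, filters):
    """Log-power features of one EEG segment.

    X       : (C, T) bandpass filtered segment (channels x time points)
    filters : (C, J) matrix whose columns are the spatial filters w_j
    """
    projected = filters.T @ X                # (J, T) spatially filtered signals
    power = np.sum(projected ** 2, axis=1)   # w_j^T X X^T w_j for each filter
    return np.log(power)

def decision_value(X, filters, beta, beta0):
    """Weighted sum of the log-power features plus a bias, as in model (1)."""
    return float(beta @ log_power_features(X, filters) + beta0)

# Synthetic stand-ins (assumptions): 26 channels, 5 s at 128 Hz, 6 filters.
rng = np.random.default_rng(0)
C, T, J = 26, 640, 6
X = rng.standard_normal((C, T))
filters = rng.standard_normal((C, J))   # placeholder filters, not estimated ones
beta = rng.standard_normal(J)           # placeholder feature weights
score = decision_value(X, filters, beta, beta0=0.1)
```

The sign of `score` would give the predicted class once the filters and weights have been estimated from calibration data.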
To determine the spatial filters $w_j$, CSP is often used [1], and many variants of the original CSP have been proposed [8]. Coefficients $\{\beta_j\}_{j=1}^{J}$ and $\beta_0$ are determined statistically from the training examples, that is, the pairs of trials and labels $\{(X_i, y_i)\}_{i=1}^{n}$ collected in the calibration phase. Label $y_i \in \{+1, -1\}$ corresponds to the binary classes being studied.
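To briefly summarize, CSP computes a spatial filter $w$ by extremizing the following function:

$$J(w) = \frac{w^\top R_1 w}{w^\top R_2 w},$$

where $R_c$ is the spatial covariance matrix of the EEG signals from class $c \in \{1, 2\}$:

$$R_c = X_{(c)} X_{(c)}^\top,$$

where $X_{(c)}$ denotes the EEG signals of class $c$ and we assume a zero mean for the EEG signal.
Since $J(w)$ remains unchanged if $w$ is rescaled, extremizing $J(w)$ is equivalent to extremizing $w^\top R_1 w$ subject to the constraint $w^\top R_2 w = 1$. Using the Lagrange multiplier method, this constrained optimization problem amounts to extremizing the following function:

$$L(\lambda, w) = w^\top R_1 w - \lambda \left(w^\top R_2 w - 1\right).$$

The spatial filter extremizing $L$ is such that the derivative of $L$ with respect to $w$ equals 0:

$$\frac{\partial L}{\partial w} = 2 R_1 w - 2 \lambda R_2 w = 0 \iff R_2^{-1} R_1 w = \lambda w.$$

The spatial filters are the eigenvectors of $R_2^{-1} R_1$ which correspond to its largest and lowest eigenvalues.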
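As a minimal sketch of this eigenvalue formulation, the code below computes CSP filters from two class covariance matrices. It assumes the second covariance matrix is positive definite and, instead of forming the inverse product explicitly, whitens with a Cholesky factor so that a symmetric eigensolver can be used; that solver choice, the names `csp_filters`, `R1`, `R2`, and `n_pairs`, and the synthetic data are assumptions of this example.

```python
import numpy as np

def csp_filters(R1, R2, n_pairs=3):
    """CSP filters from two class covariance matrices.

    Solves R1 w = lambda R2 w. Whitening with the Cholesky factor of R2
    turns this into a symmetric eigenvalue problem with the same eigenvalues.
    Returns the n_pairs filters with the lowest and the n_pairs filters with
    the highest eigenvalues, together with those eigenvalues.
    """
    L = np.linalg.cholesky(R2)              # R2 = L L^T
    L_inv = np.linalg.inv(L)
    M = L_inv @ R1 @ L_inv.T                # symmetric matrix
    eigvals, V = np.linalg.eigh(M)          # eigenvalues in ascending order
    all_filters = L_inv.T @ V               # map back: w = L^{-T} v
    n = len(eigvals)
    idx = np.r_[np.arange(n_pairs), np.arange(n - n_pairs, n)]
    return all_filters[:, idx], eigvals[idx]

# Synthetic two-class data (assumption): each class has one dominant channel.
rng = np.random.default_rng(0)
X1 = rng.standard_normal((8, 500)); X1[0] *= 3.0
X2 = rng.standard_normal((8, 500)); X2[1] *= 3.0
R1, R2 = X1 @ X1.T, X2 @ X2.T
W_csp, lams = csp_filters(R1, R2, n_pairs=3)
```

Each returned column satisfies the normalization constraint of the Lagrangian formulation, that is, its variance under the second class covariance equals one.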
3.2. Proposed Model Calibration
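If we ignore the logarithm in (1), it can be reformulated as follows:

$$f(X; \theta) = \sum_{j=1}^{J} \beta_j\, w_j^\top X X^\top w_j + \beta_0 = \operatorname{tr}\left(W^\top \Sigma\right) + \beta_0,$$

where $W = \sum_{j=1}^{J} \beta_j w_j w_j^\top$ and $\Sigma = X X^\top$ is the covariance matrix of $X$. Finally we obtain

$$f(X; \theta) = \langle W, \Sigma \rangle + \beta_0. \tag{7}$$

Note that $\langle W, \Sigma \rangle = \sum_{a,b} W_{ab} \Sigma_{ab}$ is the elementwise inner product of the two matrices. To determine parameters $\theta = (W, \beta_0)$, logistic regression was employed with low-rank regularization of $W$. This amounts to solving the following optimization problem with $n$ training examples:

$$\min_{W, \beta_0} \; \sum_{i=1}^{n} \log\left(1 + \exp\left(-y_i f(X_i; \theta)\right)\right) \quad \text{subject to} \quad \sigma_j(W) = 0 \;\; \text{for } j > r,$$

where $\sigma_j(W)$ is the $j$th singular value of $W$ and $r$ is the rank constraint of $W$. The logistic loss is convex, but because the low-rank constraint is nonconvex, the optimization is not guaranteed to find the global optimum. To solve this problem, the alternating direction method of multipliers (ADMM) [9] is employed in the hope that it has better convergence properties than other local optimization methods. For nonconvex problems, the solution can converge to different points depending on the initial values.

Writing the loss as $\ell(W, \beta_0) = \sum_{i=1}^{n} \log\left(1 + \exp\left(-y_i (\langle W, \Sigma_i \rangle + \beta_0)\right)\right)$ with $\Sigma_i = X_i X_i^\top$, the optimization problem is rephrased as follows:

$$\min_{W, \beta_0} \; \ell(W, \beta_0) \quad \text{subject to} \quad W \in \mathcal{S},$$

where $\mathcal{S}$ is the set of matrices with rank at most $r$. To solve it by ADMM, it can be rewritten as follows:

$$\min_{W, \beta_0, Z} \; \ell(W, \beta_0) + g(Z) \quad \text{subject to} \quad W - Z = 0,$$

where $g$ is the indicator function of $\mathcal{S}$. The augmented Lagrangian (using the scaled dual variable $U$) is

$$L_\rho(W, \beta_0, Z, U) = \ell(W, \beta_0) + g(Z) + \frac{\rho}{2} \left\| W - Z + U \right\|_F^2 - \frac{\rho}{2} \left\| U \right\|_F^2,$$

where $\rho > 0$ is called the penalty parameter. So the iterative optimization of ADMM for this problem is

$$\begin{aligned}
(W^{k+1}, \beta_0^{k+1}) &= \operatorname*{arg\,min}_{W, \beta_0} \; \ell(W, \beta_0) + \frac{\rho}{2} \left\| W - Z^{k} + U^{k} \right\|_F^2, \\
Z^{k+1} &= \Pi_{\mathcal{S}}\left(W^{k+1} + U^{k}\right), \\
U^{k+1} &= U^{k} + W^{k+1} - Z^{k+1},
\end{aligned}$$

where $\Pi_{\mathcal{S}}$ is the projection onto $\mathcal{S}$. Hence, $Z^{k+1}$ is determined by carrying out a singular value decomposition, $W^{k+1} + U^{k} = \sum_{j} \sigma_j u_j v_j^\top$, and keeping the top $r$ singular values; that is,

$$Z^{k+1} = \sum_{j=1}^{r} \sigma_j u_j v_j^\top.$$

Here we can initialize $Z^{0}$ and $U^{0}$ as zero w.l.o.g. The primal and the dual residuals at iteration $k$ are defined as follows:

$$e_{\mathrm{pri}}^{k} = W^{k} - Z^{k}, \qquad e_{\mathrm{dual}}^{k} = -\rho \left(Z^{k} - Z^{k-1}\right).$$

These residuals converge to zero as ADMM proceeds.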
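The ADMM iteration described above can be sketched as follows. This is a hedged illustration rather than the implementation used in the study: the text does not say how the first (loss) subproblem is solved, so the sketch assumes a few plain gradient steps, averages the logistic loss over examples instead of summing it, and uses trace-normalized synthetic covariance matrices. The names `project_rank`, `admm_low_rank_logreg`, `inner`, and `lr` are hypothetical.

```python
import numpy as np

def project_rank(M, r):
    """Project onto matrices of rank at most r by keeping the top r singular values."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def admm_low_rank_logreg(Sigmas, y, r, rho=1.0, n_iter=50, inner=20, lr=0.1):
    """Sigmas: (n, C, C) covariance matrices; y: (n,) labels in {+1, -1}."""
    n, C, _ = Sigmas.shape
    W = np.zeros((C, C))
    b0 = 0.0
    Z = np.zeros((C, C))   # rank-constrained copy of W, initialized to zero
    U = np.zeros((C, C))   # scaled dual variable, initialized to zero
    for _ in range(n_iter):
        # First update: gradient steps on mean logistic loss + (rho/2)||W - Z + U||_F^2
        for _ in range(inner):
            f = np.einsum('ij,nij->n', W, Sigmas) + b0     # <W, Sigma_i> + b0
            # derivative of log(1 + exp(-y f)) w.r.t. f, in a numerically stable form
            g = -y * 0.5 * (1.0 - np.tanh(0.5 * y * f))
            grad_W = np.einsum('n,nij->ij', g, Sigmas) / n + rho * (W - Z + U)
            W -= lr * grad_W
            b0 -= lr * g.mean()
        Z_old = Z
        Z = project_rank(W + U, r)     # second update: projection by truncated SVD
        U = U + W - Z                  # scaled dual update
        r_pri = np.linalg.norm(W - Z)              # norm of the primal residual
        r_dual = rho * np.linalg.norm(Z - Z_old)   # norm of the dual residual
    return Z, b0, r_pri, r_dual

# Synthetic data (assumption): class +1 has extra power on channel 0.
rng = np.random.default_rng(0)
C, T, n = 6, 200, 40
y = np.where(np.arange(n) % 2 == 0, 1.0, -1.0)
Sigmas = np.empty((n, C, C))
for i in range(n):
    X = rng.standard_normal((C, T))
    if y[i] > 0:
        X[0] *= 2.0
    S = X @ X.T
    Sigmas[i] = S / np.trace(S)   # trace normalization keeps the step size stable
Z, b0, r_pri, r_dual = admm_low_rank_logreg(Sigmas, y, r=2)
```

Returning the projected variable guarantees that the reported coefficient matrix satisfies the rank constraint exactly, even if the residuals have not fully vanished when the loop stops.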
3.3. Multiple Frequency Bands
The proposed method can be extended to estimate the spatial filters for multiple frequency bands. Let $X_b = B_b(X)$ be the data bandpass filtered by filtering operator $B_b$, for $b = 1, \dots, N_B$. The covariance matrix of the signal, denoted as $\Sigma_b = X_b X_b^\top$, is obtained separately for each frequency pass band. These matrices are then aligned as a large block diagonal matrix:

$$\Sigma = \begin{pmatrix} \Sigma_1 & & \\ & \ddots & \\ & & \Sigma_{N_B} \end{pmatrix}. \tag{14}$$

To obtain the spatial filters for multiple bands, this block diagonal matrix (14) is substituted for $\Sigma$ in (7). The solution is expected to effectively select the optimal spatial features from multiple frequency bands.
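A small sketch of the block diagonal construction is given below. The bandpass filtering itself is omitted; the per-band signals are random stand-ins, and the function name `block_diag_covariance` is an assumption of this example.

```python
import numpy as np

def block_diag_covariance(band_signals):
    """Stack per-band covariance matrices X_b X_b^T into one block diagonal matrix."""
    covs = [Xb @ Xb.T for Xb in band_signals]
    size = sum(c.shape[0] for c in covs)
    Sigma = np.zeros((size, size))
    start = 0
    for c in covs:
        stop = start + c.shape[0]
        Sigma[start:stop, start:stop] = c   # place each band's block on the diagonal
        start = stop
    return Sigma

# Stand-ins for four bandpass filtered versions of one segment (assumption).
rng = np.random.default_rng(0)
bands = [rng.standard_normal((3, 50)) for _ in range(4)]
Sigma_multi = block_diag_covariance(bands)   # shape (12, 12)
```

With this layout, the coefficient matrix learned by the linear model has one row and column block per band, so the low-rank constraint is shared across all bands.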
3.4. Merits of the Proposed Method
CSP estimates spatial filters under the criterion that the corresponding components have minimum variance for one condition and maximum variance for the other, which increases discriminative ability. However, because the spatial filter estimation is decoupled from the inference model, such as logistic regression, the optimal filters can only be chosen by cross-validating the inference model and selecting those that produce the best empirical inference performance.
On the other hand, our proposed model, derived from the “top-down” approach, incorporates spatial filter estimation in the predictive model. Hence, by focusing on the prediction performance with a suitably chosen regularizer, such as the fixed low-rank constraint in our model, it induces a sparse decomposition of the signal, which corresponds to conventional feature extraction. It therefore implicitly estimates the optimal spatial filters of the best inference model under that assumption.
4. Emotion Recognition
To predict the emotional state experienced by the participants from single EEG segments, a predictive model was employed that estimates, from a given short EEG segment (here 5 sec), the probability that the participant experienced positive or negative emotions during that period. For the evaluation, fivefold cross-validation was performed: in each round, the dataset of one session was held out for testing, and the labeled datasets of the remaining four sessions were used to estimate parameters $\theta$. The held-out dataset was then used to evaluate the classification error rate, computed as the number of incorrectly classified EEG segments divided by the total number of EEG segments in the held-out data.
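The session-wise hold-out scheme can be sketched as below, with the error rate taken as the fraction of misclassified segments. The helper names and the toy session labels are assumptions made for illustration; model fitting is left out.

```python
import numpy as np

def leave_one_session_out(session_ids):
    """Yield (train_idx, test_idx) pairs, holding out one session per fold."""
    session_ids = np.asarray(session_ids)
    for s in np.unique(session_ids):
        test_idx = np.flatnonzero(session_ids == s)
        train_idx = np.flatnonzero(session_ids != s)
        yield train_idx, test_idx

def error_rate(y_true, y_pred):
    """Fraction of segments whose predicted label differs from the true label."""
    return float(np.mean(np.asarray(y_true) != np.asarray(y_pred)))

# Toy example (assumption): 5 sessions with 4 segments each.
sessions = np.repeat(np.arange(5), 4)
folds = list(leave_one_session_out(sessions))
example_error = error_rate([1, -1, 1, 1], [1, 1, 1, -1])   # two of four are wrong
```

Splitting by session rather than by segment keeps overlapping windows from the same trial out of both the training and the test set of a fold.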
4.1. Data Acquisition
Twenty-three healthy adult volunteers participated under informed consent, with approval from the ethical committee of ATR. Among them, ten subjects (males = 3, females = 7, age = ) were selected for analysis. The EEGs were recorded from 32 gel-based scalp electrodes, as shown in Figure 1, and four EOG placements around the eyes using an eego amplifier (ANT Neuro, Enschede, Netherlands) with 24-bit resolution. The EEGs were sampled at 512 Hz. The protocol of the EEG experiment is described in Figure 2. To elicit emotions, the set of movie clips from Samson et al. [10] was used. The movie clip set includes four classes of different target emotional states: positive, negative, neutral, and mixed. The average length of each clip was about 20 seconds. For each trial, four randomly selected movie clips of the same emotional class were played continuously without intervals, followed by self-assessment questions. One session consisted of four trials, one for each of the four movie classes. Before each trial, a random color grating pattern was displayed for 90 seconds to wash out the emotional state of the participant. The entire experiment consisted of seven sessions. For the analysis, however, only the first five sessions were used because, during the last two sessions, most participants appeared fatigued or drowsy.
4.2. Preprocessing
The EEG signals were downsampled from 512 to 128 Hz and high-pass filtered at 0.5 Hz. The EOG and the muscle artifacts were automatically removed using AAR [11]. Among the 32 channels, only 26 channels were used, excluding the reference channels and the prefrontal channels Fp1, Fpz, and Fp2, which were severely contaminated by EOG artifacts. The EEG signals were re-referenced to the mean of M1 and M2. All the trial data were extracted from the onset of the first movie clip until the offset of the last clip. Then all the trial data were bandpass filtered at 4–47 Hz. Finally, the length of all the trial data was identically set to 80 seconds. Training and testing data were generated by sliding a window over the data of each trial. The length of the window was five seconds, and the overlap between windows was two seconds.
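The windowing step can be sketched as follows, using the figures stated above (128 Hz, 80-second trials, 5-second windows, 2-second overlap, 26 channels). The function name and the all-zero trial are placeholders for illustration.

```python
import numpy as np

def sliding_windows(trial, fs=128, win_sec=5.0, overlap_sec=2.0):
    """Cut a (channels, samples) trial into overlapping fixed-length segments."""
    win = int(round(win_sec * fs))                    # 640 samples per window
    step = int(round((win_sec - overlap_sec) * fs))   # 384 samples between window starts
    starts = range(0, trial.shape[1] - win + 1, step)
    return np.stack([trial[:, s:s + win] for s in starts])

trial = np.zeros((26, 80 * 128))    # placeholder trial: 26 channels, 80 s at 128 Hz
segments = sliding_windows(trial)   # shape (windows, channels, samples) = (26, 26, 640)
```

With a 3-second step, an 80-second trial yields 26 windows, so each trial contributes 26 segments to the training or test data.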
5. Results
Figure 3 shows how the classification error rate of each participant varies with the rank constraint. The classification error rate was computed by averaging over folds. It reached a plateau beyond a certain rank. This figure suggests that an optimal rank constraint exists between 1 and 10 regardless of the participant.
The elapsed time until convergence of the low-rank constrained optimization is shown in Figure 4. The time gradually decreases reciprocally as the rank increases and reaches a plateau at some rank constraint. The trend is very similar to that of the mean classification error rate in Figure 3. Subjects with higher classification error rates also tend to have longer convergence times.
Figure 5 shows how the mean classification error rate changes when the frequency band of the bandpass filter in the preprocessing step is set to theta (4–7 Hz), alpha (8–13 Hz), beta (14–29 Hz), gamma (30–47 Hz), or the wide band (4–47 Hz). We used rank 6 for all the frequency bands. On average, better performance was obtained for the beta and gamma bands than for the lower bands, that is, the theta and alpha bands. The best performance was obtained when the wide frequency band was used.
For comparative analysis with other methods, the spatial filters were calculated by CSP using identical preprocessed data. Since the proposed method used a rank 6 constraint for analysis, six CSP filters were used for the alternative methods. The CSP filters were selected automatically as the three eigenvectors with the highest eigenvalues and the three with the lowest eigenvalues. We performed fivefold cross-validation with different classification algorithms, namely, ElasticNet, LDA, QDA, linear SVM (LSVM), and SVM with RBF kernel (RSVM). For all methods, we used identical feature vectors obtained with the selected CSP filters. Note that, for each round, the spatial filters were recalculated using only the training data.
Figure 6 shows a subject-wise comparison of the mean classification error rates of the proposed method with the rank 6 constraint (LR6) and the conventional CSP-based methods. Except for subject “S1,” the proposed method achieved better or comparable results compared with the other methods.
Table 1 describes the comparison results. The classification error rates were obtained by averaging over subjects. The proposed method outperforms the CSP-based LDA, QDA, LSVM, and RSVM methods and shows comparable performance to ElasticNet, the state-of-the-art method.

5.1. Discussion
If the data of all 23 subjects are used for analysis, the mean classification error rate of the proposed method (LR6) rises from 0.302 (0.103) to 0.412 (0.131). This is because the results of the excluded subjects are below or just above chance level. This degradation was common to all methods, including the conventional ones. Therefore, these subjects’ data were deemed untrustworthy, and we manually selected ten subjects for the analysis. The training/test data are non-i.i.d. because of the sliding window approach; that is, there are temporal correlations among neighboring data. Our assumption is that the proposed method works well in practice even if the i.i.d. assumption is violated.
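The low-rank constrained linear model in (7) can be transformed as follows:

$$f(X; \theta) = \langle W, \Sigma \rangle + \beta_0 = \Big\langle \sum_{j=1}^{r} \sigma_j u_j v_j^\top, \Sigma \Big\rangle + \beta_0 = \sum_{j=1}^{r} \sigma_j\, u_j^\top \Sigma\, v_j + \beta_0,$$

where $W = \sum_{j=1}^{r} \sigma_j u_j v_j^\top$ is the singular value decomposition of $W$. The last equation indicates that the spatial filters, $u_j$ and $v_j$, are applied to the covariance matrix of $X$ from left and right, and the inner product of the spatially filtered signals is used to form the feature vector. The weighting coefficients of the feature vector correspond to the singular values $\sigma_j$. Note that spatial filters $u_j$ and $v_j$ are almost identical, possibly with different signs, due to the nature of the original linear model denoted by (7). Hence, it corresponds to computing the power of spatially filtered signals, similarly to CSP-based methods.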
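This decomposition can be checked numerically. The sketch below builds a synthetic coefficient matrix that is symmetric and of rank 3 (the symmetry is an assumption of this example, chosen because it makes the left and right singular vectors coincide up to sign) and compares the matrix inner product with the sum over spatially filtered terms.

```python
import numpy as np

rng = np.random.default_rng(0)
C, r = 8, 3
A = rng.standard_normal((C, r))
W = A @ np.diag([3.0, -2.0, 1.0]) @ A.T   # synthetic symmetric rank-3 coefficient matrix
X = rng.standard_normal((C, 100))
Sigma = X @ X.T                           # covariance of one synthetic segment

U, s, Vt = np.linalg.svd(W)
direct = np.sum(W * Sigma)                # elementwise inner product <W, Sigma>
by_filters = sum(s[j] * (U[:, j] @ Sigma @ Vt[j]) for j in range(r))

# For a symmetric W, each left/right singular vector pair agrees up to sign.
alignment = [float(U[:, j] @ Vt[j]) for j in range(r)]
```

Here `direct` and `by_filters` agree up to rounding, and each entry of `alignment` is close to +1 or -1, which is the sense in which the two sets of filters are identical up to sign.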
The topographies in Figure 7 represent the scalp maps of six representative spatial filters, which are obtained by $k$-means clustering of the estimated spatial filters for all subjects shown in Figure 12. The spatial filters are defined by the left singular vectors of $W$. The color of topography is mapped from (red) to (blue).
Figure 8 shows the difference in mean power spectral density (PSD) between positive and negative over all subjects. The mean PSD is calculated by averaging the PSDs of all spatially filtered signals obtained with the six spatial filters estimated for each subject. Hence the mean PSD represents the total average PSD of the spatially filtered EEG signals. The dotted plots show the deviation from the mean. From this figure, we can observe that positive tends to have larger power than negative, especially over the beta and gamma frequency bands.
5.2. Valence versus Neutral
In order to further examine the differences between positive/negative valence and neutral, two binary classifications were conducted: positive versus neutral and negative versus neutral. For the analysis, we employed the proposed method with the rank 6 constraint, as described above for the positive versus negative analysis. The preprocessing is exactly the same as before, except that the training/test data are taken from the two target classes. The data were bandpass filtered at 4–47 Hz.
Figure 9 describes the classification error rates of the two cases, positive versus neutral and negative versus neutral, for the same subjects as before. The mean and standard deviation of the classification errors were as follows: positive versus neutral () and negative versus neutral (). From this result, we notice that subjects with high classification performance for the positive versus neutral case tend to have low performance for the negative versus neutral case.
Figure 10 shows the scalp maps of six representative spatial filters for (a) positive versus neutral and (b) negative versus neutral, which are obtained by $k$-means clustering of the estimated spatial filters as described for Figure 7. Figures 13 and 14 show the estimated spatial filters for all subjects for positive versus neutral and negative versus neutral, respectively.
As we described in Section 2, many studies suggest that hemispherical asymmetry over the frontal cortex is implicated in emotions and motivations. If this is true, our hypothesis is that the scalp maps of the spatial filters estimated for valence versus neutral will likely show asymmetrical patterns over the frontal lobe, as such spatial filters should increase inference accuracy.
Among the topographies in Figure 10, about half do show asymmetrical patterns over the frontal and left/right temporal lobe areas. Although the effect is weak, it can be observed that left or right lateralization corresponds to positive or negative valence, as indicated by previous works [6, 7].
Figures 11(a) and 11(b) show the difference in mean PSD between positive/negative and neutral over all spatially filtered channels and subjects. The mean PSD is obtained in the same way as for the positive versus negative case shown in Figure 8.
From these figures, we can observe that positive has larger power than neutral in beta and gamma bands. On the other hand, negative has similar or slightly lower power than neutral in those bands.
6. Conclusion
In this paper, a fixed low-rank spatial filter estimation for BCI systems was proposed and applied to the recognition of emotions induced by movies. The proposed approach unifies feature extraction, feature selection, and classification, which are often tackled independently in a “bottom-up” manner, under a regularized loss minimization problem. We explicitly derived the loss function from the conventional BCI approach and minimized it under a nonconvex fixed low-rank constraint.
The proposed method, derived from the “top-down” approach, incorporates spatial filter estimation in the predictive model. Hence, by focusing on the prediction performance with a suitably chosen regularizer, such as the fixed low-rank constraint in our model, it induces a sparse decomposition of the signal, which corresponds to conventional feature extraction. It therefore implicitly estimates the optimal spatial filters of the best inference model under that assumption. The comparative analysis shows that the proposed method is competitive with the best CSP-based alternative and achieves equivalent performance.
In the discussion, we show that about half of the significant scalp maps of spatial filters estimated for positive versus negative do show asymmetrical patterns over the frontal and temporal lobes, which agrees with previous research; that is, asymmetrical patterns over the frontal cortex are implicated in emotions and motivations. We also observe that the positive state tends to exhibit larger power than the negative state over the beta and gamma frequency bands. The opposite lateralization of hemispherical activity for the positive and negative cases is only weakly supported.
There are some directions for future work and some suggestions for improving performance. First, the proposed method needs to be extended to multiclass classification in order to recognize a variety of emotional states. Second, source space analysis might be useful to further investigate the subcortical activities of emotions. Lastly, obtaining genuine training/test data is of primary importance, especially for BCIs that depend on interoceptive inputs like thoughts and emotions. One possible solution is to evaluate labels based on the ratings of participants.
Competing Interests
The authors declare that they have no competing interests.
Acknowledgments
This research was supported by the ImPACT Program of Council for Science, Technology and Innovation, Japan. The authors wish to thank Dr. Motoaki Kawanabe for the valuable comments and appreciate Dr. Takeshi Ogawa and Dr. Hiroki Moriya for the experimental setup.
References
[1] B. Blankertz, R. Tomioka, S. Lemm, M. Kawanabe, and K.-R. Müller, “Optimizing spatial filters for robust EEG single-trial analysis,” IEEE Signal Processing Magazine, vol. 25, no. 1, pp. 41–56, 2008.
[2] R. Tomioka and K.-R. Müller, “A regularized discriminative framework for EEG analysis with application to brain-computer interface,” NeuroImage, vol. 49, no. 1, pp. 415–432, 2010.
[3] X.-W. Wang, D. Nie, and B.-L. Lu, “Emotional state classification from EEG data using machine learning approach,” Neurocomputing, vol. 129, pp. 94–106, 2014.
[4] I. Daly, A. Malik, F. Hwang et al., “Neural correlates of emotional responses to music: an EEG study,” Neuroscience Letters, vol. 573, pp. 52–57, 2014.
[5] E. R. Kandel, J. H. Schwartz, and T. M. Jessell, Principles of Neural Science, McGraw-Hill, New York, NY, USA, 2000.
[6] E. Harmon-Jones, “Clarifying the emotive functions of asymmetrical frontal cortical activity,” Psychophysiology, vol. 40, no. 6, pp. 838–848, 2003.
[7] R. J. Davidson and N. A. Fox, “Asymmetrical brain activity discriminates between positive and negative affective stimuli in human infants,” Science, vol. 218, no. 4578, pp. 1235–1237, 1982.
[8] F. Lotte and C. Guan, “Regularizing common spatial patterns to improve BCI designs: unified theory and new algorithms,” IEEE Transactions on Biomedical Engineering, vol. 58, no. 2, pp. 355–362, 2011.
[9] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1–122, 2011.
[10] A. C. Samson, S. D. Kreibig, B. Soderstrom, A. A. Wade, and J. J. Gross, “Eliciting positive, negative and mixed emotional states: a film library for affective scientists,” Cognition and Emotion, vol. 30, no. 5, 2015.
[11] G. Gómez-Herrero, W. De Clercq, H. Anwar et al., “Automatic removal of ocular artifacts in the EEG without an EOG reference channel,” in Proceedings of the 7th Nordic Signal Processing Symposium (NORSIG '06), pp. 130–133, IEEE, Reykjavik, Iceland, June 2006.
Copyright
Copyright © 2016 Ken Yano and Takayuki Suyama. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.