Abstract
For efficient decoding of brain activities in analyzing brain function with an application to brain machine interfacing (BMI), we address a problem of how to determine spatial weights (spatial patterns), bandpass filters (frequency patterns), and time windows (time patterns) by utilizing electroencephalogram (EEG) recordings. To find these parameters, we develop a datadriven criterion that is a natural extension of the socalled common spatial patterns (CSP) that are known to be effective features in BMI. We show that the proposed criterion can be optimized by an alternating procedure to achieve fast convergence. Experiments demonstrate that the proposed method can effectively extract discriminative features for a motor imagerybased BMI.
1. Introduction
Brain machinecomputer interfacing (BMIBCI) is a challenging technology of signal processing, machine learning, and neuroscience [1]. BMIs capture brain activities associated with mental tasks and external stimuli, realize nonmuscular communication, and control channel for conveying messages and commands to the external world [1–3]. Basically, noninvasively measured data such as electroencephalogram (EEG), magnetoencephalogram (MEG), and functional magnetic resonance imaging (fMRI) are widely used to observe brain activities. Among them, because of its simplicity and low cost, EEG is practical for use in engineering applications [4, 5].
Efficient decoding around motorcortex is a crucial technique for the realization of BMI associated with motorimagery (MIBMI) [6, 7] with the application to controlling external devices [7], prostheses [4], rehabilitation [8], and so forth. For instance, it is also known that the real and imaginary movements of hands and feet evoke the change of the socalled mu rhythm in different brain regions [2, 3]. Therefore, the accurate extraction of these changes from the measured EEG signals in the presence of measurement noise and spontaneous components which are related to other brain activities enables us to classify the EEG signal associated with the different motor (imagined) actions such as movement of the right hand, left hand, or feet.
In classification of EEG signals in MIBMI and analyzing of the brain activities during motor imagery, signal processing techniques such as bandpass filtering and spatial weighting are used [1]. For the processing, presuming the parameters such as coefficients of the filters and weights that extract the related components is a crucial issue. Moreover, the optimum parameters in classification are highly dependent on users and measurement environments [9].
In order to determine the parameters, datadriven techniques that exploit observed data are widely used [1, 2]. The observed data essentially include class labels corresponding to the tasks. The techniques should find the parameters that extract discriminative features as much as possible. For example, the wellknown common spatial pattern (CSP) method finds the spatial weights by using the observed signals [1, 9, 10] in such a way that the variances of the signals extracted by the linear combination of a multichannel signal and the spatial weights differ as much as possible between two classes. The standard CSP method has been extended to methods to estimate the other parameters, such as the frequency bands [11–16], and methods to select the CSP features extracted with various parameters [17, 18].
Besides, one of the parameters to be decided by datadriven techniques is a time window because of the following reasons. A kind of BMIs is implemented based on cues which a user follows. In the BMI, the user begins to perform a task when the cue is given. Therefore, the time when the user begins to perform the task is known. However, the time when the brain activity associated with the task occurs is unknown. The time windows working to remove samples that do not contain the brain activity will not match the period when the cues are showed. For instance, the samples for a few hundreds of milliseconds after the cues are assumed not to be used to extract the features in previous works [13, 17, 19, 20], which heuristically determined the time window. Contrary to these works, this paper hypothesizes that an optimal observation period in classification depends on users. For example, reaction time defined as the elapsed time between the presentation of a sensory stimulus and the subsequent behavioral response is strongly associated with age [21]. The reaction time can be related to the time of response to the cues. Therefore, the time window should also be designed by using the observed signals or datadriven.
In this paper, we propose a method for finding the time windows as well as the aforementioned parameters (CSP and temporal filters) by extending a framework proposed in [15]. The proposed method enables us to find these parameters that separately extract several components that are observed in different spatial patterns, frequency bands, and periods of time. We call these simultaneously designed parameters common spatiotimefrequency patterns (CSTFP). As compared to the methods, in which the parameters are selected out of the predefined candidates fixed in advance [18], the CSTFP method has a higher degree of freedom for the parameters reducing the computational costs. In CSTFP, the coefficients of the temporal filters and the spatial weights are searched in the set of real numbers. Moreover, although the time windows are selected out of a set of candidates, the computational cost of the CSTFP method does not grow rapidly even with a large number of candidates.
The rest of this paper is organized as follows. Section 2.1 reviews the CSP method. Next we, illustrate the CSTFP method from Section 2.2 to Section 2.4. We experimentally analyze the CSTFP method by using artificial data in Section 3. Section 4 presents experimental results of classification of EEG signals during motor imageries to show the performance of CSTFP. Finally, the conclusions of this paper are presented in Section 5.
2. Common SpatioTimeFrequency Patterns (CSTFP)
We describe a novel method called the CSTFP method to simultaneously find the parameters for the spatial weights, the temporal filters, and the time windows. In this section, before introducing CSTFP, CSP is reviewed. Next, we define a signal extraction model using the spatial weighting, filtering, and time windowing. Then, we propose a criterion designing the parameters motivated by that for the CSP method. The proposed criterion evaluates not only the spatial weights but also the coefficients of the time windows and the temporal filters. Next, the optimization method for the proposed criterion with alternate updating is presented. Finally, we define feature vector with the feature extraction model for classification of unlabeled EEG signals.
2.1. Common Spatial Pattern (CSP): A Review
Let be an observed signal, where is the number of channels and is the number of samples. In BMI application, we do not directly use , but we use the filtered signal described as to find the CSP, where is a bandpass filter which passes the frequency components related to brain activity of motor imagery. Denote the components of by , where and is the time index. We assume the sets of the observed signals, and , where contains the signals belonging to class , is a class label, , and is a set having no elements. CSP is defined as the weight vector that maximizes the intra class variance in under the normalization of samples, where is a class label. More specifically, for being fixed, the weight vector is found by solving the following optimization problem [9, 10]: where denotes the expectation over , is the time average of given by , is the transpose of a vector or a matrix, and is the absolute value of a scalar. The solution of (1) is given by the generalized eigenvector corresponding to the largest generalized eigenvalue of the generalized eigenvalue problem described as where , for , are defined as .
2.2. Signal Extraction Model
The target signal and signal extraction procedure are formulated in this section. The filtered signal of a target signal, , denoted as , is defined as for , , where is the th entry of a vector, is the filter order of an FIR filter of which the coefficients are denoted by , is a spatial weight for th channel, and is a time window for th sample that takes a binary value of either 0 or 1. The structure of temporally filtering, spatial weighting, and windowing for is illustrated in Figure 1. In this model, is regarded as the spatial pattern, is regarded as the frequency pattern, and is regarded as the time pattern of the extracted signal.
The sample variance of over time is described as where is defined as , is defined as , is defined as , and is the Euclidean norm of a vector.
The variance defined in (4) can be transformed to matrixvector form as follows. We define , , whose elements are from as for and , where is the element at the th row and the th column of a matrix. Then, (4) can be modified to where is defined as .
2.3. Optimization for Sets of Parameters
We consider the problem of the design of sets of the FIR filter, the spatial weights, and the time window represented by . These sets of the parameters are designed in such a way that , , and maximize expectation of with respect to samples under the normalization of an expectation of over all of the observation. Additionally, we impose the orthonormality on , to avoid the trivial solution. Moreover, the time windows are chosen from given candidates for efficient optimization. Therefore, we formulate the following maximization problem: where represents set and is the cost evaluating the ratio of the feature value in all samples defined as where is any subspace in , is a candidate set for the time windows, defined as , is a class label chosen from 1 and 2, is a regularization parameter, and is the Kronecker delta defined as for and , otherwise.
Since it is difficult to simultaneously find all parameters, we consider sequential optimization to find the parameters with respect to each filter index . That is, we first find , and then find under the constraint on . This sequential optimization is represented with respect to each as where is a subspace defined as where represents a subspace spanned by vectors and the operator denoted by gives the direct sum of two subspaces. Methods for choosing has been discussed in [15]. In (9), to optimize the parameters indexed with , we adopt an alternating optimization procedure based on alternating least square (ALS). In the optimization, we separate the problem of (9) into three subproblems for , , and , respectively. Then, we update the parameters by alternating solving the subproblems. The three subproblems and these solutions are as follows.
The first subproblem is to optimize . While fixing and , maximizing (9) is found as the generalized eigenvector corresponding to the largest generalized eigenvalue of the generalized eigenvalue problem [15] described as where for , and is an eigenvalue.
The second subprolem is to optimize . While fixing and , maximizing (9) is found as the generalized eigenvector corresponding to the largest generalized eigenvalue of the generalized eigenvalue problem [15] described as where for , are vectors spanning described as where is the identify matrix and is an eigenvalue.
The third subproblem is to optimize while fixing and . The cost of (9) can be reduced to Then, where , , , and . Because are chosen out of , we calculate the values for all candidates in and the optimal can be chosen as the candidates that maximize . This formulates that
The procedure to design the spatial weights, the temporal filters, and the time windows is summarized in Algorithm 1 as a pseudocode.

2.4. Feature Vector Definition
With designed , , and , the feature vector of an EEG signal, , for classification is defined as where the set of is the CSPs corresponding to and . , , , are decided as follows. By solving (11) with and , we obtain spatial patterns as for and , where is the unitlength eigenvector corresponding to the th largest eigenvalue of (11). Then, are defined as and for , .
3. Experimental Analysis of Artificial Signal
We give an analysis of CSTFP by a toy experiment with an artificial signal in this section. We assume a 2class BMI where the observed EEG signals are modeled by a mixture of narrowband signals (see Figure 2). In this model, a trial signal belonging to class is given by where is a class label taking either or , is a vector representing a signal at a discrete time point , is the number of time samples for a trial, is the number of channels, is an th source signal of feature components, is the number of the source signals, is a vector defined as , is the amplitude of in the th channel for class , and is a stochastic noise. The source signals, , are generated as where represents a discrete spectrum, represents the time window that decides the period when the th source signal is generated, is a stochastic phase of the source signals, and the operator denoted by takes a real part of a complex number.
(a) 1st source, 
(b) 2nd source, 
(c) 3rd source, 
(d) 4th source, 
(e) 1st filter, 
(f) 2nd filter, 
(g) 3rd filter, 
(h) 4th filter, 
In particular, 200 artificial signals that we used in the experiment were generated with the conditions shown in Table 1 where is a Gaussian distribution with a mean, , and a variance, , and is a uniform distribution whose minimum and maximum values are denoted by and , respectively.
We applied the CSTFP method to the artificial signals as follows. The class label represented by in (8) was set to 1. The number of the sets of the parameters, , was set to 4. The order of the temporal filters was set to 41, yielding that ; the length of a filtered signal, , is 60. For the given candidates for the time windows, we define the following set. First, we define ten dimensional vectors as where . Then, we use all combinations of represented by and , , as the given candidate set. Therefore, the number of the candidates, , was 1023. Moreover, for that determines the search space for , we used [15], where each element of is defined as , , and the operator denoted by takes a residue of dividing by . In addition, the regularization parameter, , was set to 0.05.
The optimization resulted in Figures 2(e)–2(h) for the amplitude characteristics of the FIR filters, Figures 3(e)–3(h) for the time windows, and Figures 4(e)–4(h) for the normalized spatial weights. The centers of passbands of the filters shown in Figures 2(e)–2(h) coincide with the centers of the source signals shown in Figures 2(a)–2(d). Moreover, in the spectrum of the source and the amplitude characteristic of the designed FIR filter that have similar center frequencies, the spatial amplitude corresponding to the source and the spatial weight vector corresponding to the FIR filter are similar to each other. For instance, the correlation coefficient between shown in Figure 4(a) with circles and shown in Figure 4(f) with circles is 0.971.
(a) 1st source, 
(b) 2nd source, 
(c) 3rd source, 
(d) 4th source, 
(e) 1st window, 
(f) 2nd window, 
(g) 3rd window, 
(h) 4th window, 
(a) 1st source, and 
(b) 2nd source, and 
(c) 3rd source, and 
(d) 4th source, and 
(e) 1st weights, and 
(f) 2nd weights, and 
(g) 3rd weights, and 
(h) 4th weights, and 
The results also suggest that the time windows designed by the CSTFP method can remove samples observed in the periods of time that do not contain the source signals. For instance, hence, the source signal, , is not observed in the first 25 samples according to Figure 3(a), the time window for extracting is expected to remove the first 25 samples. Because and the amplitude characteristics shown in Figure 2(f) have similar center frequency, we can decide that the time window for extracting is the time window shown in Figure 3(f). Although the designed time windows do not coincide with because is applied to an FIRfiltered signal that is shorter than that of the original one, the time window removes samples in the first 10 samples, as we expected. Moreover, we can decide that the time window shown in Figure 3(c) is for extracting due to the same reason. As the observed signals do not have in the last 25 samples (see Figure 3(c)), the time window shown in Figure 3(c) removes the samples in the last 20 samples.
4. Experiment of EEG Signal Classification
A comprehensive comparative study was performed to illustrate the ability of the CSTFP method to produce more accurate classification of EEG signals during motor imagery over several conventional methods (CSP [10], common sparse spectral spatial patterns (CSSSP) [12], filter bank CSP (FBCSP) [17], and discriminative filter bank CSP (DFBCSP) [15]).
4.1. Data Description
We used dataset IVa from BCI competition III [22], which was provided by Fraunhofer FIRST (Intelligent Data Analysis Group) and Campus Benjamin Franklin of the Charité  University Medicine Berlin (Department of Neurology, Neurophysics Group) [23] and dataset 1 from BCI competition IV, which was provided by Berlin Institute of Technology (Machine Learning Laboratory), Fraunhofer FIRST (Intelligent Data Analysis Group), and Campus Benjamin Franklin of the Charité  University Medicine Berlin (Department of Neurology, Neurophysics Group) [24]. The condition for each dataset is shown in Table 2. They have two classes of motor imagery. The signals in the provided datasets were recorded with the sampling rate of 1000 Hz.
We furthermore applied to this dataset a Butterworth lowpass filter whose cutoff frequency is 50 Hz and the filter order is 4, and downsampled to 100 Hz.
4.2. Result
For the experiments, as a sample for each trial, we used a signal observed in the period from to [sec] after the cue that directs the subject to perform the task. In the experiments, was tuned by a method we mention later. was set to 3.5 and 4 seconds for BCI competition III dataset IVa and IV dataset 1, respectively.
In order to compare the classification abilities for the methods, we obtained the classification accuracy rates by crossvalidation (CV). In each classification in the CV, we separated learning samples for selecting the parameters of the feature extraction and a linear discriminant analysis (LDA) classifier and test samples for obtaining classification accuracy rates.
For the methods to be compared (CSP, CSPExh, CSSSP, FBCSP, DFBCSP, and CSTFP), the parameters for the feature extraction were obtained as follows.
(i)CSP: the parameters determined in this method are spatial weights. Before obtaining the spatial weights by the CSP method, we applied the Butterworth bandpass filter with the passband of 7–30 Hz. In the CSP method, we minimized the variance cost of the right hand class in (1). The eigenvectors corresponding to the largest and smallest eigenvalues of the eigenvalue problem (2) were given as the spatial weights. (ii)CSPExh: the parameters determined in this method are spatial weights and a passband of the Butterworth filter. The passband of the Butterworth filters was tuned as Hz by an exhaustive search by the CSP method and the learning samples. After the filtering with the passband, the spatial weights were given by the same manner as that of the CSP method. (iii)CSSSP: the parameters determined in this method are spatial weights and a bandpass filter. The bandpass filter between 7 and 30 Hz was applied as preprocessing [12]. CSSSP was applied with regularization parameters, , and the parameter for the number of the spatial weights, . The order of the filter was fixed to 16 [12]. (iv)FBCSP: the parameters determined in this method are bandpass filters out of a filter bank and associated spatial weights. FBCSP was applied with the mutual information based best individual feature and a naïve Bayesian Parzen window (NBPW) classifier [17]. The filter bank comprising 9 bandpass filters covering 4–40 Hz was used. All filters were Chebyshev type II filters with a bandwidth of 4 Hz each. In FBCSP, the number of the spatial weights, , in each band was set to 8. These parameters were decided by referring to [17]. (v)DFBCSP: the parameters determined in this method are FIR filters and spatial weights associated to each FIR filter. DFBCSP was applied with the FIR filter order of 41 as done in [15]. In optimization, we stopped iteration when error of the cost function between successive iterations becomes under . (vi)CSTFP: the parameters determined in this method are FIR filters, the corresponding time windows, and spatial weights associated with each FIR filter. We fixed to observe the behavior of the resulting time window. CSTFP is applied with the FIR filter order, 41 [15], and the following candidate set for the time windows. The candidate set for the time windows, , consists of the vectors defined as where we choose and out of a set such that and . The regularization parameter, , was set to 0.1. In alternating optimization, we initialize as a random vector which is orthonormalized from and as a vector all the elements of which are one. We stopped iteration when error of the cost function between successive iterations becomes under .For the obscure parameters such as in the above list, we furthermore tuned them by 5 5 CV using the learning samples as done in [25]. We conducted the nested CV [25] with all of the combinations of these parameters and obtained the classification accuracy rates. We adopted the combination that performed the highest rates as the parameters. The parameters tuned by the nested CV in the learning data and the candidates for them are summarized in Table 3.
After we obtained the feature vectors extracted by the filters, spatial weights, and the time windows that are designed by each listed method, we calculated the logarithm of the feature vectors. Then, the LDA for the learning samples was used to obtain a projector onto the 1dimensional space. The threshold for classification was determined as the middle point of two class averages over the projected learning samples. The feature vectors from the test samples were classified by the projection and the threshold and we obtained the classification accuracy rate in each CV. In Tables 4 and 5, CSTFP results in the highest accuracy rate in the average over whole subjects.
Moreover, we conducted the classification experiments, in which the parameters shown in Table 3 were fixed to show the effects of the changes of the parameters on the classification accuracy rates. Figure 5 shows the accuracy rates of CSP and the rates of each method averaged over all subjects with each . For each subject, the parameters shown in Table 3 except for were fixed to the combinations of the parameters that performed the highest accuracy rates with each . In Figure 5(a), that perform the highest accuracy rates are different among the subjects. Moreover, Figure 5(b) shows that the accuracy rates highly depend on in the conventional methods. Figure 6 shows the variation of the accuracy rates by the various regularization parameters, , in CSTFP. For each subject, the parameters, and , were fixed to the combinations of the parameters that performed the highest accuracy rates with each . CSTFP performs higher accuracy as goes higher than 0.1.
(a) CSP
(b) Average over all subjects
(a) BCI competition III dataset IVa
(b) BCI competition IV dataset 1
We show examples of the spatial patterns, the frequency patterns (the amplitude characteristics of the FIR filters), and the time patterns designed by CSTFP in Figures 7 and 8. As we can observe in Figures 6, 7(a), and 8(a), the short time windows caused by small result in poor classification accuracy rates. All of the time windows shown in Figures 7 and 8 remove the samples observed within a few hundreds of milliseconds after the cue. The results suggest that the brain activities related to the task cannot be observed just after the cue. In Figures 7(c)–7(e) and 8(c)–8(e), the time windows do not significantly change in around 0.1–0.4 of . This result of the relations between and the time windows corresponds with the result shown in Figure 6 in which the classification accuracy rates do not strongly depend on the regularization parameter, , by more than 0.12. Moreover, the FIR filters do not highly differ even with various . However, the spatial weights with the small windows (Figures 7(a) and 8(a)) differ from the other weights with the longer windows. Furthermore, there are slight differences in the time patterns and the frequency patterns between subject and subject .
(a) 
(b) 
(c) 
(d) 
(e) 
(a) 
(b) 
(c) 
(d) 
(e) 
5. Conclusion
We have proposed a novel method called CSTFP for the classification of EEG signals during motor imagery. Our objective is to design the time windows that are adopted in signal processing for a cuebased BMI. Incorporating the idea into DFBCSP has allowed us to simultaneously design the parameters for the time windows, the spatial weights, and the FIR filters. These parameters are optimized in a single criterion based on the CSP method. We have shown the optimization procedure for the problem of the CSTFP method that is conducted by sequentially and alternatively solving the subproblems into which the original problem is divided. Through the experiments of the artificial signals and actual EEG signals, we have shown the performance of CSTFP. In the experiment, we have demonstrated that CSTFP achieves high classification accuracy rate. Our experimental results also suggest that the CSTFP method can find the frequency bands and the time periods in which brain activities associated with a mental task can be observed.
Methods for finding time intervals in which the feature components associated to some brain activities are observed are needed for accurate classification in an asynchronous BCIBMI [26]. We would like to develop such methods by applying the proposed CSTFP algorithm to our future works.
Acknowledgments
This work is supported in part by KAKENHI, GrantinAid for Scientific Research (B), 21360179 and 24360146, and GrantinAid for JSPS Fellows, 24686.