Research Article  Open Access
Motor Imagery EEG Classification Based on Decision Tree Framework and Riemannian Geometry
Abstract
This paper proposes a novel classification framework and a novel data reduction method to distinguish multiclass motor imagery (MI) electroencephalography (EEG) for brain-computer interfaces (BCI), based on the manifold of covariance matrices from a Riemannian perspective. In method 1, a subject-specific decision tree (SSDT) framework with filter geodesic minimum distance to Riemannian mean (FGMDRM) is designed to identify MI tasks and reduce the classification error in the non-separable region of FGMDRM. Method 2 comprises a feature extraction algorithm and a classification algorithm. The feature extraction algorithm, named SJGDA, combines semi-supervised joint mutual information (semi-JMI) with generalized discriminant analysis (GDA) to reduce the dimension of the vectors in the Riemannian tangent plane. The classification algorithm replaces the FGMDRM in method 1 with k-nearest neighbor (KNN), and is named SSDT-KNN. Applied to BCI competition IV dataset 2a, method 2 improves the kappa value from 0.57 (achieved by the winner of dataset 2a) to 0.607, and it also obtains high recognition rates on two other datasets.
1. Introduction
Brain-computer interfaces (BCI) based on motor imagery (MI) analyze human intention from the electroencephalogram (EEG) signals generated by the brain's electrophysiological activity [1, 2]. Based on BCI technology, exoskeletons can help people with physical disabilities regain their motor ability, and BCI also has wide applications in smart homes, entertainment, the military, and other fields [3–6].
The common spatial pattern (CSP) is widely used in motor imagery to extract EEG features [7]. CSP has excellent performance in binary classification tasks, but its drawback is that it needs a large number of electrodes [8].
Despite its short history, the use of Riemannian geometry in BCI decoding is attracting increasing attention [9–13]. Covariance matrices lie in the space of symmetric positive definite (SPD) matrices, which can be formulated as a Riemannian manifold [14]. In the BCI field, connections between the CSP algorithm and the tools of information geometry have been investigated, considering several divergence functions as alternatives to the Riemannian distance [15–18]. Barachant et al. proposed a Riemannian-based kernel for the classification of covariance matrices, improving on the minimum distance to mean (MDM) algorithm [13]. Kumar et al. proposed a single-band CSP framework for MI-BCI that utilizes tangent space mapping in the manifold of covariance matrices, obtaining good results compared with other competing methods [19]. A hierarchical MDM classifier for the multiclass problem has been tested in [20].
Advanced classifiers based on the tangent space of the Riemannian manifold of positive definite matrices are also receiving increasing attention. Barachant et al. mapped the covariance matrices into the tangent space and applied feature selection and linear discriminant analysis (LDA) there [10]. For classifiers operating in the tangent space, the main problem is the curse of dimensionality. Traditional data dimensionality reduction methods fall into two categories: linear dimensionality reduction (LDR) and nonlinear dimensionality reduction (NLDR). Since most real data are nonlinear, NLDR techniques such as locally linear embedding (LLE) [21], isometric mapping (ISOMAP) [22], maximum variance unfolding (MVU) [23], and t-distributed stochastic neighbor embedding (t-SNE) [24, 25] are widely used. Lee et al. used the discrete wavelet transform (DWT) and continuous wavelet transform (CWT) to extract features of MI tasks, with a Gaussian mixture model (GMM) used to construct GMM supervectors; this method accelerates training and improves the accuracy of motor imagery classification [26]. Sadatnejad et al. proposed a new kernel that preserves the topology of data points in the feature space; the proposed kernel is strong particularly where the data points have a complex, nonlinearly separable distribution [8]. Xie et al. proposed a framework for intrinsic submanifold learning from a high-dimensional Riemannian manifold; the proposed method exhibited strong robustness to small training datasets [27].
There is another approach for overcoming the problem of high dimensionality in SPD manifolds: mapping from a high-dimensional SPD manifold to a lower dimensional one while preserving the geometry of SPD manifolds. Only two works follow this approach. Davoudi et al. [14] proposed distance preservation to local mean (DPLM) as a dimensionality reduction technique; combined with FGMDM, its best performance in terms of kappa value is 0.60. Harandi et al. [28] learned a mapping that maximizes interclass geodesic distances while simultaneously minimizing intraclass distances, via an optimization on Grassmann manifolds.
In this paper, we propose novel SSDT-FGMDRM and SSDT-KNN methods for the classification of multiclass MI tasks by designing a simple yet efficient subject-specific decision tree (SSDT) framework. Method 1 uses SSDT-FGMDRM to improve the performance of FGMDRM: for each individual, it first separates the two most discriminative classes from the group, and the remaining categories, including the misclassified samples of the previous nodes, are reclassified at the last node. Method 2 consists of SSDT-KNN and an NLDR method named SJGDA, which combines the advantages of semi-JMI and GDA; method 2 performs well on different datasets. The aims of this article are as follows:
(1) To verify the effectiveness of the proposed SSDT framework on dataset 1
(2) To verify the superiority of SJGDA in feature extraction, compared with semi-JMI and GDA
(3) To validate the generalization ability of method 2 on different datasets
The rest of the paper is organized as follows: Section 2 introduces the mathematical preliminaries of Riemannian geometry. Section 3 discusses the proposed methods in detail. Three datasets are introduced in Section 4. The results of our work are presented in Section 5. In Section 6, we compare our methods with the state of the art. The paper concludes in Section 7.
2. Geometry of SPD Matrices
Let X_{i} represent a short segment of continuous EEG signals recorded from n channels, denoted as follows:

X_{i} = [x_{T_i}, x_{T_i + 1}, … , x_{T_i + N − 1}] ∈ R^{n×N},

where X_{i} corresponds to the trial of imagined movement starting at time t = T_{i}, and N denotes the number of sampled points of the selected segment.
For each trial, the spatial covariance matrix (SCM) can be calculated as the sample covariance estimate:

P_{i} = (1 / (N − 1)) X_{i} X_{i}^{T}.
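As a concrete illustration, the SCM of a single trial can be computed as the sample covariance of the channel-by-sample matrix (a minimal NumPy sketch; the function and variable names are ours):

```python
import numpy as np

def spatial_covariance(X):
    """Sample spatial covariance matrix of one trial.

    X: (n_channels, n_samples) EEG segment, assumed band-pass
    filtered and (implicitly) zero-mean within the window.
    """
    n_samples = X.shape[1]
    return X @ X.T / (n_samples - 1)

# Example: a random 4-channel, 250-sample trial.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 250))
P = spatial_covariance(X)
assert P.shape == (4, 4)
assert np.allclose(P, P.T)  # symmetric by construction
```

For generic (full-rank) data the resulting matrix is symmetric positive definite, which is what places it on the SPD manifold discussed next.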
Based on the SCM, there are two ways to classify MI tasks in the Riemannian manifold.
2.1. Filter Geodesic Minimum Distance to the Riemannian Mean
The Riemannian distance between two SPD matrices P_{1} and P_{2} in P(n) is given by [29]

δ_{R}(P_{1}, P_{2}) = ‖log(P_{1}^{−1} P_{2})‖_{F} = [Σ_{i=1}^{n} log²(λ_{i})]^{1/2},

where the λ_{i} are the (real, positive) eigenvalues of P_{1}^{−1} P_{2} and ‖·‖_{F} denotes the Frobenius norm.
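This distance can be computed from the generalized eigenvalues of the pair (P₂, P₁), which equal the eigenvalues of P₁⁻¹P₂ (a sketch assuming SciPy; the function name is ours):

```python
import numpy as np
from scipy.linalg import eigvalsh

def riemannian_distance(P1, P2):
    """Affine-invariant Riemannian distance between SPD matrices:
    sqrt of the sum of squared log-eigenvalues of P1^{-1} P2,
    obtained from the generalized eigenproblem P2 v = lam P1 v."""
    lam = eigvalsh(P2, P1)
    return np.sqrt(np.sum(np.log(lam) ** 2))

I = np.eye(3)
D = np.diag([1.0, np.e, np.e ** 2])
# log-eigenvalues are 0, 1, 2 -> distance sqrt(0 + 1 + 4) = sqrt(5)
assert np.isclose(riemannian_distance(I, D), np.sqrt(5.0))
```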
Given m SPD matrices P_{1}, … , P_{m}, the geometric mean in the Riemannian sense is defined as the point minimizing the sum of squared Riemannian distances:

G(P_{1}, … , P_{m}) = argmin_{P ∈ P(n)} Σ_{i=1}^{m} δ_{R}²(P, P_{i}).
The minimum distance to Riemannian mean (MDRM) algorithm computes the Riemannian distance between an unknown trial P and the Riemannian mean of each class and assigns the trial to the class corresponding to the shortest distance. Inspired by the principal geodesic analysis (PGA) method [30], the authors of [31] find a set of filters by applying an extension of Fisher linear discriminant analysis (FLDA) named Fisher geodesic discriminant analysis (FGDA). These filters are then applied to MDRM to form the filter geodesic minimum distance to Riemannian mean (FGMDRM). More details can be found in [31].
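A bare MDRM decision rule (without the FGDA filtering step of FGMDRM) can be sketched as follows; the geometric-mean computation uses the standard fixed-point iteration, and all names here are ours:

```python
import numpy as np
from scipy.linalg import sqrtm, logm, expm, inv, eigvalsh

def riemannian_distance(P1, P2):
    # Eigenvalues of P1^{-1} P2 via the generalized eigenproblem.
    lam = eigvalsh(P2, P1)
    return np.sqrt(np.sum(np.log(lam) ** 2))

def riemannian_mean(mats, n_iter=20):
    """Geometric (Karcher) mean of SPD matrices: repeatedly average
    the matrix logarithms in the tangent space at the current
    estimate, then map back with the matrix exponential."""
    G = np.mean(mats, axis=0)  # arithmetic mean as initialization
    for _ in range(n_iter):
        G_half = np.real(sqrtm(G))
        G_ihalf = inv(G_half)
        T = np.mean([np.real(logm(G_ihalf @ P @ G_ihalf)) for P in mats], axis=0)
        G = G_half @ np.real(expm(T)) @ G_half
    return G

def mdrm_predict(P, class_means):
    """MDRM rule: assign the trial covariance P to the class whose
    Riemannian mean is closest."""
    return int(np.argmin([riemannian_distance(P, G) for G in class_means]))

D = np.diag([2.0, 0.5])
G = riemannian_mean([D, inv(D)])  # geometric mean of D and D^{-1} is I
assert np.allclose(G, np.eye(2), atol=1e-6)
assert mdrm_predict(np.diag([1.2, 0.9]), [np.eye(2), 10.0 * np.eye(2)]) == 0
```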
2.2. Tangent Space Mapping
As shown in Figure 1, the space of SPD matrices P(n) forms a differentiable Riemannian manifold. Each tangent vector S_{i} at the point P can be seen as the derivative at t = 0 of the geodesic Γ(t) between P and P_{i} = Exp_{P}(S_{i}), where the exponential mapping is defined as follows:

Exp_{P}(S_{i}) = P^{1/2} exp(P^{−1/2} S_{i} P^{−1/2}) P^{1/2}.
The inverse mapping is given by the logarithmic mapping and can be defined as follows:

Log_{P}(P_{i}) = P^{1/2} log(P^{−1/2} P_{i} P^{−1/2}) P^{1/2}.
Using the Riemannian geodesic distance, the Riemannian mean of I > 1 SPD matrices is obtained as

P_{G} = argmin_{P ∈ P(n)} Σ_{i=1}^{I} δ_{R}²(P, P_{i}).
The tangent space is located at the geometric mean P_{G} of the whole set of trials. Each SCM P_{i} is then mapped into this tangent space to yield a set of m = n(n + 1)/2 dimensional vectors:

s_{i} = upper(P_{G}^{−1/2} Log_{P_{G}}(P_{i}) P_{G}^{−1/2}),

where the upper(·) operator vectorizes the upper-triangular part of a symmetric matrix, with unit weight on the diagonal elements and weight √2 on the off-diagonal elements.
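The mapping-and-vectorization step can be sketched as follows (the √2 weighting of off-diagonal entries is the usual convention that makes the Euclidean norm of the vector match the Riemannian metric at the reference point; function names are ours):

```python
import numpy as np
from scipy.linalg import sqrtm, logm, inv

def tangent_vector(P, G):
    """Map the SPD matrix P into the tangent space at the reference
    point G and vectorize the resulting symmetric matrix."""
    G_ihalf = inv(np.real(sqrtm(G)))
    S = np.real(logm(G_ihalf @ P @ G_ihalf))
    n = S.shape[0]
    rows, cols = np.triu_indices(n)
    # weight 1 on the diagonal, sqrt(2) off the diagonal
    w = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return S[rows, cols] * w  # length n(n + 1) / 2

P = np.diag([np.e, 1.0, 1.0])
s = tangent_vector(P, np.eye(3))
assert s.shape == (6,)                     # n(n + 1)/2 with n = 3
assert np.isclose(np.linalg.norm(s), 1.0)  # equals delta_R(I, P) here
```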
Many efficient classification algorithms can be implemented in the Riemannian space [10].
3. Methods
3.1. SubjectSpecific Decision Tree Framework
The decision tree is a common machine learning method, and each node of a decision tree can be defined as a rule. Guo and Gelfand [32] proposed classification trees with neural network feature extraction, embedding multilayer neural networks directly in the nodes. In a decision tree, one of the most important tasks is to construct a proper binary tree structure, since the upper nodes have a greater impact on the accuracy over the whole sample set [33]. To solve the multiclass classification problem in this paper, we constructed a subject-specific decision tree (SSDT) classification framework, shown in Figure 2, according to the best-separating principle [34]. As can be seen from Figure 2, the SSDT proposed in this paper trains a different classification model at each node of the decision tree.
The advantages of the SSDT framework are as follows:
(1) The model separates the two MI tasks with the highest recognition rate (e.g., C.1 and C.2) as early as possible
(2) At the last node, some samples are reclassified to enhance the classification ability of the classifier
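One way to order the nodes, sketched under our own naming and with KNN as a stand-in binary classifier, is to rank the one-vs-rest problems by cross-validated accuracy and build the tree from the best-separated class downward:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def ssdt_node_order(X, y, classes, cv=10):
    """Rank classes by how well a one-vs-rest classifier separates
    each class from all others; the best-separated class is handled
    at the first node of the subject-specific decision tree."""
    scores = {}
    for c in classes:
        y_ovr = (y == c).astype(int)  # one-vs-rest relabeling
        clf = KNeighborsClassifier(n_neighbors=5)
        scores[c] = cross_val_score(clf, X, y_ovr, cv=cv).mean()
    return sorted(classes, key=lambda c: scores[c], reverse=True)
```

On toy data where one class sits far from two overlapping classes, that far-away class is ranked first and would occupy the top node.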
3.2. Method 1: A Direct Classification Method Based on SSDT-FGMDRM
Firstly, we point out a problem of multiclass FGMDRM using an example. Figure 3 gives a three-class classification problem. Figure 3(a) shows the classification process of FGMDRM: three Riemannian mean points (RMPs) are located on the manifold, and since the classification criterion is the distance between the test point and each RMP, a wrong classification occurs. Figure 3(b) shows the classification result obtained using the first node of the SSDT-FGMDRM framework: the misclassification is corrected by the decision tree framework.
Method 1 is used to classify four types of MI tasks directly. The training and testing diagram is shown in Figure 4.
3.3. Feature Extraction Algorithm Based on the Riemannian Tangent Space
In this section, we propose a novel data reduction method, named SJGDA, which combines semi-JMI and GDA to address the curse of dimensionality after tangent space mapping.
3.3.1. Semi-supervised Joint Mutual Information
A semi-supervised dataset D = D_{L} ∪ D_{U} consists of two parts: the labelled data D_{L} and the unlabelled data D_{U}. A binary random variable S is introduced to determine the distribution of the labelled and unlabelled datasets: when s = 1, the value of y is recorded, otherwise it is not. In this way, the labelled set D_{L} comes from the joint distribution p(x, y | s = 1), while the unlabelled set D_{U} comes from the distribution p(x | s = 0). The underlying mechanism S turns out to be very important for feature selection.
Feature selection based on mutual information theory is a common approach [35]. In these methods, the features are ranked according to a score, and the features with the highest scores are selected. For example, by ranking the features according to their mutual information with the labels, we sort them by their relevance to the class labels. The joint mutual information score of a candidate feature is defined as follows:

J(X_{k}) = Σ_{X_{j} ∈ X_{θ}} I(X_{k}, X_{j}; Y),

where X_{θ} represents the set of features already selected, X_{k} is the candidate feature being scored, and Y represents the class label.
Semi-JMI applies the JMI criterion to a semi-supervised training set; more details can be found in Reference [36]. In this paper, the missingness mechanism is the class-prior-change semi-supervised scenario (MARC) [37]. After feature ranking, we obtain a feature vector

f = [f_{1}, f_{2}, … , f_{n}],

where n is the length of the tangent vectors S_{i}. Since information redundancy exists in f, we select the best vector length m (m < n) for each subject according to the classification recognition rate:

f_{SJ} = [f_{1}, f_{2}, … , f_{m}].
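The ranking-and-truncation step can be sketched as follows; note that `mutual_info_classif` scores each feature individually and is only a simplified stand-in for the semi-JMI criterion, which also uses joint information with already-selected features and exploits the unlabelled trials:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def rank_features_mi(X, y, random_state=0):
    """Rank features by (marginal) mutual information with the labels,
    highest first. A simplified stand-in for the semi-JMI ranking."""
    mi = mutual_info_classif(X, y, random_state=random_state)
    return np.argsort(mi)[::-1]

def select_length(X, y, order, candidates=(10, 20, 50, 100), cv=5):
    """Pick the vector length m that maximizes cross-validated KNN
    accuracy on the top-m ranked features."""
    best_m, best_acc = candidates[0], -1.0
    for m in candidates:
        cols = order[:m]
        acc = cross_val_score(KNeighborsClassifier(5), X[:, cols], y, cv=cv).mean()
        if acc > best_acc:
            best_m, best_acc = m, acc
    return best_m
```

On synthetic data with a single informative feature, that feature is ranked first and a short vector length suffices.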
3.3.2. Generalized Discriminant Analysis
After variable selection, this paper uses generalized discriminant analysis (GDA) [38, 39], a nonlinear feature reduction technique based on kernels, to reduce the length of the feature vectors f_{SJ} and their redundancy. GDA maps the input X (here, f_{SJ}) into a high-dimensional space F through a nonlinear mapping Φ: X → F.
The linear Fisher discriminant is performed in the space F, and the criterion function of its extension is

J(W^{Φ}) = (W^{ΦT} S_{B} W^{Φ}) / (W^{ΦT} S_{W} W^{Φ}),

where W^{Φ} ∈ F, and S_{B} and S_{W} are the between-class and within-class scatter matrices in F, respectively.
For the convenience of numerical calculation, kernel functions are introduced to solve the problem:

k(x_{i}, x_{j}) = ⟨Φ(x_{i}), Φ(x_{j})⟩.
Gaussian, polynomial, and sigmoid kernels are widely used in GDA [40]. For test data z, the projection of its image Φ(z) in the space F onto W^{Φ} is as follows:

⟨W^{Φ}, Φ(z)⟩ = Σ_{i=1}^{M} α_{i} k(x_{i}, z),

where the α_{i} are the expansion coefficients of W^{Φ} over the M training samples x_{i}.
This paper uses the polynomial kernel to reduce the dimension. After GDA, we obtain a vector f_{G} = [g_{1}, g_{2}, … , g_{d}], where the dimension d of f_{G} is decided by the actual needs; in this paper, we set d = 1. SJGDA is then applied to the datasets of this paper, and the final feature vectors are constructed for each subject.
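Since GDA amounts to LDA carried out in the kernel-induced feature space, a compact approximation is to make that space explicit with kernel PCA and then apply LDA. The sketch below follows that interpretation with a polynomial kernel; it is not the authors' exact implementation, and all names are ours:

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def gda_transform(X_train, y_train, X_test, degree=2, n_components=5):
    """Approximate GDA: LDA in the feature space induced by a
    polynomial kernel. Kernel PCA makes the implicit feature map
    explicit; LDA then finds the discriminant direction(s).
    With c classes, LDA yields at most c - 1 components (d = 1 here)."""
    kpca = KernelPCA(n_components=n_components, kernel="poly", degree=degree)
    Z_train = kpca.fit_transform(X_train)
    Z_test = kpca.transform(X_test)
    lda = LinearDiscriminantAnalysis()
    f_train = lda.fit_transform(Z_train, y_train)
    f_test = lda.transform(Z_test)
    return f_train, f_test
```

On an XOR-like problem (classes defined by the sign of x₀·x₁, which no linear projection separates), the degree-2 kernel makes the classes well separated along the single discriminant component.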
3.4. Method 2: SJGDA and Subject-Specific Decision Tree k-Nearest Neighbor (SSDT-KNN)
Method 2 is used to classify four types of MI tasks after tangent space mapping. The training and testing diagram is shown in Figure 5.
4. Description of Data
4.1. Dataset 1
BCI competition IV dataset 2a is used to evaluate the performance of the two proposed methods [41]. Dataset 2a contains 22 EEG channels and 3 EOG channels. Four types of motor imagery were collected: left hand, right hand, foot, and tongue. The dataset contains nine healthy subjects, and each subject has two sessions, one training session and one test session. Each session has 288 trials of MI data, with 72 trials per MI task. The EEG signals are band-pass filtered by a 5th-order Butterworth filter in the 8–30 Hz frequency band. The selection of the trial period is important in MI classification; following the winner of the competition, we select the 2 s of data from 0.5 s to 2.5 s after the cue instructing the user to perform the MI task.
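The preprocessing described above can be sketched as follows (zero-phase filtering via `sosfiltfilt` is one common choice; the paper does not state whether causal or zero-phase filtering was used, and the function names here are ours):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def preprocess_trial(eeg, fs=250.0, band=(8.0, 30.0), window=(0.5, 2.5)):
    """Band-pass filter (5th-order Butterworth, 8-30 Hz) and crop the
    0.5-2.5 s post-cue window, as described for dataset 2a.
    `eeg` is (n_channels, n_samples) with t = 0 at the cue."""
    sos = butter(5, band, btype="bandpass", fs=fs, output="sos")
    filtered = sosfiltfilt(sos, eeg, axis=1)  # zero-phase filtering
    start, stop = int(window[0] * fs), int(window[1] * fs)
    return filtered[:, start:stop]

x = np.random.default_rng(0).standard_normal((22, 1000))
seg = preprocess_trial(x)
assert seg.shape == (22, 500)  # 2 s at 250 Hz
```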
4.2. Dataset 2
BCI competition III dataset IIIa is used to evaluate the performance of method 2. It contains 3 subjects (K3b, K6b, and L1b) and provides 64-channel EEG data sampled at 250 Hz. Four types of motor imagery were collected: left hand, right hand, foot, and tongue. More details about this dataset can be found in Reference [42].
4.3. Dataset 3
In our own dataset, an Emotiv EPOC+ headset is used to collect motor imagery EEG data. It is a portable EEG acquisition device with a sampling rate of 128 Hz. It has fourteen electrode channels (AF3, F7, F3, FC5, T7, P7, O1, O2, P8, T8, FC6, F4, F8, and AF4) and two reference electrodes (CMS and DRL), and the electrode placement follows the international 10–20 standard. The equipment and the locations of the fourteen Emotiv electrodes over the international 10–20 system are shown in Figure 6. This experiment collected three kinds of EEG signals for one joint: imagination of shoulder flexion (F), extension (E), and abduction (A), as shown in Figure 7.
Seven healthy subjects participated in this experimental study. During the experiment, subjects kept both hands naturally placed and tried to avoid body or head movements. Subjects carried out motor imagery following an external cue: each trial recorded the EEG signal for 5 seconds, followed by a 5–7 second rest, and each action was repeated 20 times. The experimental procedure is shown in Figure 8.
5. Results
5.1. Results of Method 1
We use SSDT-FGMDRM to classify the multiclass MI tasks as introduced in Section 3.1. Since there are four classes, we have four pairs of MI tasks: left vs rest (L/RE), right vs rest (R/RE), foot vs rest (F/RE), and tongue vs rest (T/RE). For each subject, the pair with the highest accuracy is used to train N.1, and the pair with the second highest accuracy is used to train N.2. Table 1 gives the ten-fold cross-validation results obtained using FGMDRM in the OVR scheme.

Table 2 displays the kappa values obtained by method 1. Compared with the other methods, five of the nine subjects (A03, A06, A07, A08, and A09) achieved a higher kappa value with method 1, without exploiting frequency-domain information. In the case of a fixed frequency window, the mean kappa value improves by 0.069 over MDRM and by 0.139 over FGMDM_fixed. Our approach also shows a clear improvement of 0.039 in kappa value over FGMDM, which exploits subject-specific frequency information.
5.2. Results of Method 2
The results in Figure 9 show the T/RE feature distribution of the five features of subject A09. Figure 9(a) shows the first five ranked features obtained with semi-JMI. After applying semi-JMI, the first five extracted features show a statistically significant improvement in separability, with p values < 0.05, except feature 2 with a p value of 0.77. In Figure 9(b), the first five features extracted from the primitive feature vectors have p values of 0.13, 0.05, 0.87, 0.05, and 0.13. These p values indicate that the pair T/RE shows no significant separability in the primitive feature vectors. The results show that with our semi-supervised feature ranking algorithm, the separability of the features is greatly improved.
Figure 10 shows the evolution of the classification accuracy with KNN (k = 5 in this paper) against the number of ranked variables in the OVR scheme. L/RE and T/RE are the two pairs with the highest recognition rates, achieving their best performance within 100 variables. However, 100 dimensions still pose a curse of dimensionality for classifiers, so GDA is used to analyze the first 100 ranked variables in our study.
As the separability of the features still does not meet our requirements, GDA is used to obtain more discriminative variables. Figure 11 illustrates the distributions of the first five most discriminant variables obtained with GDA and semi-JMI. It can be seen from Figure 11 that L/RE is separated equally well by using GDA.
Table 3 displays the ten-fold cross-validation results obtained using SJGDA and KNN in the OVR scheme. It can be seen that the vectors mapped into the tangent space yield better classification performance than classifying directly on the Riemannian manifold.

Table 4 presents the results obtained by SJGDA in a pairwise manner for the multiclass MI tasks. There are six pairs of MI tasks: left and right (L/R), left and foot (L/F), left and tongue (L/T), right and foot (R/F), right and tongue (R/T), and foot and tongue (F/T).

Table 5 displays the comparison of classification accuracies obtained using SJGDA and KNN for the L/R task in 10-fold cross-validation. References [8, 43–45] contain the classification results of other publications. We improved the accuracy compared with Reference [44] (p = 0.85) and Reference [45]. Gaur et al. [43] explored subject-specific frequency information, and Sadatnejad and Shiry Ghidary [8] used a novel kernel for dimensionality reduction that is similar in spirit to SJGDA. Although the results in this paper are not as high as those in Reference [43], the difference between the two is not statistically significant.

Table 6 presents the results in terms of the kappa value. The proposed method 1 achieved a mean kappa value of 0.589, which would rank first in the competition. With our proposed method 2, we achieved a mean kappa value of 0.607, making method 2 the best-performing approach among the state of the art.

Dataset 2 is used to verify the effectiveness of method 2, and the classification results are reported directly in Table 7. As can be seen from Table 7, method 2 obtained the second highest recognition rate among the compared literature. Compared with the recent Reference [47], method 2 achieved good classification results.
5.3. Results of Dataset 3
Dataset 3 is used to evaluate the performance of method 2. Figure 12 shows the classification error with KNN against the number of ranked variables in OVR scheme. A/RE and F/RE are the two pairs with the lowest classification error, and they all achieved the highest recognition rate within 60 variables. In this paper, the first 60 ranked variables are used for the next analysis.
Figure 13 displays the 5-fold cross-validation results obtained by using SJGDA and KNN in the OVR and OVO schemes. Figure 13(a) illustrates the three possible pairs of MI tasks (F/RE, E/RE, and A/RE) for each subject. It can be seen from the figure that flexion and abduction are the easiest movements to distinguish in six of the seven subjects (S1, S3, S4, S5, S6, and S7). However, due to individual differences, the highest recognition rate differs between subjects.
We also compared the three possible pairs (F/E, F/A, and E/A) in the OVO scheme for the seven subjects. Figure 13(b) depicts the comparison results for each subject; the pair F/A obtained the highest recognition rate in all seven subjects. Combining the results of Figures 13(a) and 13(b), it can be concluded that flexion and abduction are the most distinguishable of the three MI tasks.
As SJGDA is a new method proposed in this paper, we also compared the feature distributions of SJGDA, GDA, and semi-JMI to illustrate its effectiveness. Figure 14 depicts the feature distribution of the F/E MI tasks for the seven subjects. The blue and red circles represent the two different feature classes. As shown in Figure 14, the F/E MI tasks learned by SJGDA have higher separability than those learned by GDA and semi-JMI.
The performance of the proposed method 2 is evaluated using classification accuracy. Since there are three classes, the chance level is 33.33%. Figure 15 demonstrates that the proposed method achieves higher performance than the semi-JMI and GDA methods for six of the seven subjects (all except S7). In addition, GDA obtains better classification accuracy than semi-JMI for four of the seven subjects (S1, S2, S5, and S7). This phenomenon can be attributed to the following: in the process of feature selection, we manually select feature dimensions suitable for the classifier, which results in partial information loss. As a feature dimensionality reduction technique, GDA is well suited to preserving the useful information of the primitive vectors. The proposed SJGDA not only preserves the advantages of GDA but also adds some high-ranking features to strengthen the expressive ability of the features.
6. Discussions
In this paper, we proposed a novel SSDT framework combined with classifiers to improve their performance on multiclass MI tasks. We also proposed a novel NLDR method named SJGDA, which performs better than both semi-JMI and GDA on different datasets. The two methods are discussed in detail in the following paragraphs.
Method 1 exposes a drawback of FGMDRM, and the novel SSDT framework is then used to improve the accuracy for each individual. As shown in Table 2, compared with other published results, method 1 achieves quite good results when processing EEG signals in a fixed frequency band (8–30 Hz).
As shown in Table 6, Gaur et al. [43] proposed SS-MEMDBF to select subject-specific frequencies and obtain enhanced EEG signals representing the MI-related µ and β rhythms, followed by classification directly with the Riemannian distance. TSLDA was proposed by Barachant et al. [10]; the covariance matrices are mapped onto a higher dimensional space where they can be vectorized and treated as Euclidean objects. Ang et al. [46], the winner of the competition, used FBCSP with multiple OVR classifiers for the MI tasks and achieved a mean kappa value of 0.57. Sadatnejad and Shiry Ghidary [8] proposed a new kernel for NLDR over the manifold of SPD matrices, with a kappa value of 0.576. Davoudi et al. [14] considered the geometry of SPD matrices and provided a low-dimensional representation of the manifold with high class discrimination; the best result of this method in terms of the kappa value is 0.60.
In method 2, SJGDA is used to extract more discriminative vectors from the tangent vectors, and an SSDT-KNN classifier is used to identify the different MI tasks. Combining SJGDA and SSDT-KNN, we achieved a better performance than method 1, Reference [43], TSLDA, the competition winner [46], Reference [8], and Reference [14]. It is clear that the method proposed in this paper is effective for MI tasks in a BCI system.
In order to prove the effectiveness of the proposed method 2, we tested it on two other datasets. As shown in Table 7 and Figure 15, method 2 achieves good classification results on two datasets.
7. Conclusion
The experimental results of method 1 show that the proposed classification framework significantly improves the performance of the underlying classifier. The experimental results of method 2 show that the proposed SJGDA algorithm is superior to GDA and semi-JMI for feature extraction, and method 2 achieves the highest recognition rate in this paper. However, since the classifiers in the SSDT framework are substitutable, future work will focus on combining more advanced classifiers with SSDT to further increase the recognition rate of BCI systems.
Data Availability
The dataset 1 and dataset 2 used to support the findings of this study are available from http://bncihorizon2020.eu/database/datasets. The dataset 3 used to support the findings of this study is available from the corresponding author upon request.
Conflicts of Interest
The authors declare no conflicts of interest.
Acknowledgments
This work was supported by Northeast Electric Power University (Grant number BSJXM201521) and the Jilin City Science and Technology Bureau (Grant number 20166012).
References
 J. R. Wolpaw, N. Birbaumer, D. J. McFarland, G. Pfurtscheller, and T. M. Vaughan, “Braincomputer interfaces for communication and control,” Clinical neurophysiology, vol. 113, no. 6, pp. 767–791, 2002. View at: Publisher Site  Google Scholar
 N. Birbaumer, “Breaking the silence: brain?computer interfaces (BCI) for communication and motor control,” Psychophysiology, vol. 43, no. 6, pp. 517–532, 2006. View at: Publisher Site  Google Scholar
 K. K. Ang, C. Guan, K. S. Phua et al., “Transcranial direct current stimulation and EEGbased motor imagery BCI for upper limb stroke rehabilitation,” in Proceedings of 2012 Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 4128–4131, IEEE, San Diego, CA, USA, August 2012. View at: Google Scholar
 B. Xu, S. Peng, A. Song, R. Yang, and L. Pan, “Robotaided upperlimb rehabilitation based on motor imagery EEG,” International Journal of Advanced Robotic Systems, vol. 8, no. 4, p. 40, 2011. View at: Publisher Site  Google Scholar
 F. Wang, X. Zhang, R. Fu, and G. Sun, “Study of the homeauxiliary robot based on BCI,” Sensors, vol. 18, no. 6, p. 1779, 2018. View at: Publisher Site  Google Scholar
 F. Wang, H. Wang, and R. Fu, “RealTime ECGbased detection of fatigue driving using sample entropy,” Entropy, vol. 20, no. 3, 2018. View at: Publisher Site  Google Scholar
 Z. J. Koles, M. S. Lazar, and S. Z. Zhou, “Spatial patterns underlying population differences in the background EEG,” Brain Topography, vol. 2, no. 4, pp. 275–284, 1990. View at: Publisher Site  Google Scholar
 K. Sadatnejad and S. Shiry Ghidary, “Kernel learning over the manifold of symmetric positive definite matrices for dimensionality reduction in a BCI application,” Neurocomputing, vol. 179, pp. 152–160, 2016. View at: Publisher Site  Google Scholar
 M. Congedo, A. Barachant, and R. Bhatia, “Riemannian geometry for EEGbased braincomputer interfaces; a primer and a review,” BrainComputer Interfaces, vol. 4, no. 3, pp. 155–174, 2017. View at: Publisher Site  Google Scholar
 A. Barachant, S. Bonnet, M. Congedo, and C. Jutten, “Multiclass braincomputer interface classification by riemannian geometry,” IEEE Transactions on Biomedical Engineering, vol. 59, no. 4, pp. 920–928, 2012. View at: Publisher Site  Google Scholar
 A. Barachant, S. Bonnet, M. Congedo, and C. Jutten, “Common spatial pattern revisited by riemannian geometry,” in Proceedings of 2010 IEEE International Workshop on Multimedia Signal Processing (MMSP), pp. 472–476, IEEE, Saint Malo, France, October 2010. View at: Google Scholar
 A. Barachant, S. Bonnet, M. Congedo, and C. Jutten, “A brainswitch using riemannian geometry,” in Proceedings of 5th International BrainComputer Interface Conference 2011 (BCI 2011), pp. 64–67, Graz, Austria, September 2011. View at: Google Scholar
 A. Barachant, S. Bonnet, M. Congedo, and C. Jutten, “Classification of covariance matrices using a riemannianbased kernel for BCI applications,” Neurocomputing, vol. 112, pp. 172–178, 2013. View at: Publisher Site  Google Scholar
 A. Davoudi, S. S. Ghidary, and K. Sadatnejad, “Dimensionality reduction based on distance preservation to local mean (DPLM) for spd matrices and its application in BCI,” 2016, https://arxiv.org/abs/1608.00514. View at: Google Scholar
 S. Brandl, K.R. Müller, and W. Samek, “Robust common spatial patterns based on Bhattacharyya distance and gamma divergence,” in Proceedings of 2015 3rd International Winter Conference on BrainComputer Interface (BCI), pp. 1–4, IEEE, GangwonDo, South Korea, January 2015. View at: Google Scholar
 W. Samek, M. Kawanabe, and K.R. Muller, “Divergencebased framework for common spatial patterns algorithms,” IEEE Reviews in Biomedical Engineering, vol. 7, pp. 50–72, 2014. View at: Publisher Site  Google Scholar
 W. Samek and K.R. Müller, “Information geometry meets BCI spatial filtering using divergences,” in Proceedings of 2014 International Winter Workshop on BrainComputer Interface (BCI), pp. 1–4, IEEE, Gangwon, South Korea, February 2014. View at: Google Scholar
 W. Samek and M. Kawanabe, “Robust common spatial patterns by minimum divergence covariance estimator,” in Proceedings of 2014 IEEE International Conference onAcoustics, Speech and Signal Processing (ICASSP), pp. 2040–2043, IEEE, Florence, Italy, May 2014. View at: Google Scholar
 S. Kumar, K. Mamun, and A. Sharma, “CSPTSM: optimizing the performance of riemannian tangent space mapping using common spatial pattern for miBCI,” Computers in biology and medicine, vol. 91, pp. 231–242, 2017. View at: Publisher Site  Google Scholar
 C. LindigLeón, N. Gayraud, L. Bougrain, and M. Clerc, “Comparison of hierarchical and nonhierarchical classification for motor imagery based BCI systems,” in Proceedings of The Sixth International BrainComputer Interfaces Meeting, Pacific Grove, CA, USA, MayJune 2016. View at: Google Scholar
 S. T. Roweis and L. K. Saul, “Nonlinear dimensionality reduction by locally linear embedding,” Science, vol. 290, no. 5500, pp. 2323–2326, 2000. View at: Publisher Site  Google Scholar
 O. Kramer and D. Lückehe, “Visualization of evolutionary runs with isometric mapping,” in Proceedings of 2015 IEEE Congress on Evolutionary Computation (CEC), pp. 1359–1363, IEEE, Sendai, Japan, May 2015. View at: Google Scholar
 K. Q. Weinberger, F. Sha, and L. K. Saul, “Learning a kernel matrix for nonlinear dimensionality reduction,” in Proceedings of the TwentyFirst International Conference on Machine Learning, p. 106, ACM, Banff, Canada, July 2004. View at: Google Scholar
 L. Van Der Maaten, “Learning a parametric embedding by preserving local structure,” RBM, vol. 500, p. 26, 2009. View at: Google Scholar
 L. van der Maaten, “Accelerating t-SNE using tree-based algorithms,” Journal of Machine Learning Research, vol. 15, pp. 3221–3245, 2014.
 D. Lee, S.-H. Park, and S.-G. Lee, “Improving the accuracy and training speed of motor imagery brain-computer interfaces using wavelet-based combined feature vectors and Gaussian mixture model supervectors,” Sensors, vol. 17, no. 10, p. 2282, 2017.
 X. Xie, Z. L. Yu, H. Lu, Z. Gu, and Y. Li, “Motor imagery classification based on bilinear sub-manifold learning of symmetric positive-definite matrices,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 25, no. 6, pp. 504–516, 2017.
 M. T. Harandi, M. Salzmann, and R. Hartley, “From manifold to manifold: geometry-aware dimensionality reduction for SPD matrices,” in Proceedings of the European Conference on Computer Vision, pp. 17–32, Springer, Zurich, Switzerland, September 2014.
 M. Moakher, “A differential geometric approach to the geometric mean of symmetric positive-definite matrices,” SIAM Journal on Matrix Analysis and Applications, vol. 26, no. 3, pp. 735–747, 2005.
 P. T. Fletcher and S. Joshi, “Principal geodesic analysis on symmetric spaces: statistics of diffusion tensors,” in Computer Vision and Mathematical Methods in Medical and Biomedical Image Analysis, M. Sonka, I. A. Kakadiaris, and J. Kybic, Eds., pp. 87–98, Springer, Berlin, Heidelberg, 2004.
 A. Barachant, S. Bonnet, M. Congedo, and C. Jutten, “Riemannian geometry applied to BCI classification,” in Proceedings of the International Conference on Latent Variable Analysis and Signal Separation, pp. 629–636, Springer, St. Malo, France, September 2010.
 H. Guo and S. B. Gelfand, “Classification trees with neural network feature extraction,” IEEE Transactions on Neural Networks, vol. 3, no. 6, pp. 923–933, 1992.
 S. R. Safavian and D. Landgrebe, “A survey of decision tree classifier methodology,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 21, no. 3, pp. 660–674, 1991.
 Y.-H. Shao, W.-J. Chen, W.-B. Huang, Z.-M. Yang, and N.-Y. Deng, “The best separating decision tree twin support vector machine for multiclass classification,” Procedia Computer Science, vol. 17, pp. 1032–1038, 2013.
 G. Brown, A. Pocock, M.-J. Zhao, and M. Luján, “Conditional likelihood maximisation: a unifying framework for information theoretic feature selection,” Journal of Machine Learning Research, vol. 13, pp. 27–66, 2012.
 K. Sechidis and G. Brown, “Simple strategies for semi-supervised feature selection,” Machine Learning, vol. 107, no. 2, pp. 357–395, 2017.
 J. G. Moreno-Torres, T. Raeder, R. Alaiz-Rodríguez, N. V. Chawla, and F. Herrera, “A unifying view on dataset shift in classification,” Pattern Recognition, vol. 45, no. 1, pp. 521–530, 2012.
 G. Baudat and F. Anouar, “Generalized discriminant analysis using a kernel approach,” Neural Computation, vol. 12, no. 10, pp. 2385–2404, 2000.
 M. Haghighat, S. Zonouz, and M. Abdel-Mottaleb, “CloudID: trustworthy cloud-based and cross-enterprise biometric identification,” Expert Systems with Applications, vol. 42, no. 21, pp. 7905–7916, 2015.
 V. Vapnik, The Nature of Statistical Learning Theory, Springer Science & Business Media, Berlin, Germany, 2013.
 C. Brunner, R. Leeb, G. Müller-Putz, A. Schlögl, and G. Pfurtscheller, BCI Competition 2008–Graz Data Set A, vol. 16, Institute for Knowledge Discovery (Laboratory of Brain-Computer Interfaces), Graz University of Technology, Graz, Austria, 2008.
 B. Blankertz, K.-R. Müller, D. J. Krusienski et al., “The BCI competition III: validating alternative approaches to actual BCI problems,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 14, no. 2, pp. 153–159, 2006.
 P. Gaur, R. B. Pachori, H. Wang, and G. Prasad, “A multi-class EEG-based BCI classification using multivariate empirical mode decomposition based filtering and Riemannian geometry,” Expert Systems with Applications, vol. 95, pp. 201–211, 2018.
 F. Lotte and C. Guan, “Regularizing common spatial patterns to improve BCI designs: unified theory and new algorithms,” IEEE Transactions on Biomedical Engineering, vol. 58, no. 2, pp. 355–362, 2011.
 H. Raza, H. Cecotti, Y. Li, and G. Prasad, “Adaptive learning with covariate shift-detection for motor imagery-based brain-computer interface,” Soft Computing, vol. 20, no. 8, pp. 3085–3096, 2015.
 K. K. Ang, Z. Y. Chin, C. Wang, C. Guan, and H. Zhang, “Filter bank common spatial pattern algorithm on BCI competition IV datasets 2a and 2b,” Frontiers in Neuroscience, vol. 6, p. 39, 2012.
 H. Baali, A. Khorshidtalab, M. Mesbah, and M. J. E. Salami, “A transform-based feature extraction approach for motor imagery tasks classification,” IEEE Journal of Translational Engineering in Health and Medicine, vol. 3, pp. 1–8, 2015.
 A. Schlögl, F. Lee, H. Bischof, and G. Pfurtscheller, “Characterization of four-class motor imagery EEG data for the BCI-competition 2005,” Journal of Neural Engineering, vol. 2, no. 4, pp. L14–L22, 2005.
 M. Grosse-Wentrup and M. Buss, “Multiclass common spatial patterns and information theoretic feature extraction,” IEEE Transactions on Biomedical Engineering, vol. 55, no. 8, pp. 1991–2000, 2008.
 I. Koprinska, “Feature selection for brain-computer interfaces,” in Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 106–117, Springer, Bangkok, Thailand, April 2009.
Copyright
Copyright © 2019 Shan Guan et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.