#### Abstract

This paper proposes a novel classification framework and a novel data reduction method to distinguish multiclass motor imagery (MI) electroencephalography (EEG) for brain computer interface (BCI) based on the manifold of covariance matrices in a Riemannian perspective. For method 1, a subject-specific decision tree (SSDT) framework with filter geodesic minimum distance to Riemannian mean (FGMDRM) is designed to identify MI tasks and reduce the classification error in the nonseparable region of FGMDRM. Method 2 includes a feature extraction algorithm and a classification algorithm. The feature extraction algorithm combines semisupervised joint mutual information (*semi*-JMI) with general discriminate analysis (GDA), namely, SJGDA, to reduce the dimension of vectors in the Riemannian tangent plane. And the classification algorithm replaces the FGMDRM in method 1 with k-nearest neighbor (KNN), named SSDT-KNN. By applying method 2 on BCI competition IV dataset 2a, the kappa value has been improved from 0.57 to 0.607 compared to the winner of dataset 2a. And method 2 also obtains high recognition rate on the other two datasets.

#### 1. Introduction

Brain computer interface (BCI) based on motor imagery (MI) is used to analyze human intention by electroencephalogram (EEG) signals generated by human brain electrophysiological activity [1, 2]. Based on BCI technology, exoskeletons can be used to help people with physical disabilities regain their motor ability, and BCI also has wide applications in smart home, entertainment, military, and other fields [3–6].

Common spatial pattern (CSP) is widely used in motor imagery to extract EEG features [7]. CSP has excellent performance in two classification tasks, but the drawback is that it needs a lot of electrodes [8].

Despite its short history, the use of the Riemannian geometry in BCI decoding is currently attracting increasing attention [9–13]. Covariance matrices lie in the space of symmetric positive definite (SPD) matrices, which can be formulated as a Riemannian manifold [14]. In the BCI field, the connections of the CSP algorithm and the tools of information geometry have been investigated, considering several divergence functions in alternative to the Riemannian distance [15–18]. Barachant et al. proposed a simple data augmentation approach for improving the performance of the Riemannian mean distance to mean (MDM) algorithm [13]. Kumar et al. propose a single band CSP framework for MI-BCI that utilizes the concept of tangent space mapping in the manifold of covariance matrices, and the proposed method obtains good results when compared to other competing methods [19]. A hierarchical MDM classifier for multiclass problem has been tested in [20].

Advanced classifiers based on the tangent space on the Riemannian manifold of positive matrices are also receiving increasing attention. Barachant et al. map the covariance matrices in the tangent space and apply feature selection and linear discriminate analysis (LDA) in the tangent space [10]. For the application of the classifier in the tangent space, the problem is that the curse of dimensionality. Traditional data dimensionality reduction methods include two categories: linear dimensionality reduction (LDR) and nonlinear dimensionality reduction (NLDR). Since most of the actual data are nonlinear, NLDR techniques such as locally linear embedding (LLE) [21], isometric mapping (ISOMAP) [22], maximum variance unfolding (MVU) [23], and t-distributed stochastic neighbor embedding (t-SNE) [24, 25] are used to tackle problems widely. Lee et al. used discrete wavelet transform (DWT) and continuous wavelet transform (CWT) to extract features of MI tasks, and Gaussian mixture model (GMM) was used to construct GMM supervectors; this method accelerates the speed of training and improves the accuracy of motor imagery [26]. Sadatnejad et al. propose a new kernel to preserve the topology of data points in the feature space, and the proposed kernel is strong, particularly in the cases where data points have a complex and nonlinear separable distribution [8]. Xie et al. proposed a framework for intrinsic submanifold learning from a high-dimensional Riemannian manifold; the proposed method exhibited strong robustness against a small training dataset [27].

There is still another approach for overcoming the problem of high dimensionality in SPD manifolds. And this method maps from a high-dimensional SPD manifold to a lower dimensional one while the geometry of SPD manifolds is preserved. And there are only two works of this way. Davoudi et al. [14] proposed distance preservation to local mean (DPLM) as dimensionality reduction technique, combined with FGMDM, the best performance of this article in terms of kappa value is 0.60. Harandi et al. [28] learned a mapping that maximizes the geodesic distances between interclass and simultaneously minimizes the distances between intraclass, and it is done via an optimization on Grassmann manifolds.

In this paper, we proposed a novel SSDT-FGMDRM and SSDT-KNN for the classification of multiclass MI tasks by designing a simple yet efficient subject-specific decision tree framework. Method 1 contains SSDT-FGMDRM to improve the performance of FGMDRM. For each individual, method 1 first separates the two most discriminative classes from the group. Furthermore, the remaining categories including the misclassification samples of the previous nodes are reclassified in the last node. Method 2 contains SSDT-KNN and a NLDR method named SJGDA. SJGDA combines the advantage of *semi*-JMI and GDA, and method 2 performed well on different datasets. The aims of this article are as follows:(1)To verify the effectiveness of the proposed SSDT framework through dataset 1(2)To verify the superiority of SJGDA in feature extraction, compared with *semi*-JMI and GDA(3)To validate the generalization ability of method 2 through different datasets, in this paper

The rest of the paper is organized as follows: Section 2 introduced the mathematical preliminaries of the Riemannian geometry. Section 3 discussed the proposed methods in detail. Three datasets are introduced in Section 4. The results of our work are discussed in Section 5. And in Section 6, we compared our methods with the state of the art. This paper concludes in Section 7.

#### 2. Geometry of SPD Matrices

Let *X*_{i} represent a short segment of continuous EEG signals, and *X*_{i} can be denoted as follows:where *X*_{i} corresponds to the trail of imaged movement starting at time *t* = *T*_{i}. denotes the number of sampled points of the selected segment.

For the trail, the spatial covariance matrix (SCM) can be calculated as follows:

Based on the SCM, there are two ways to classify MI tasks in the Riemannian manifold.

##### 2.1. Filter Geodesic Minimum Distance to the Riemannian Mean

The Riemannian distance between two SPD matrices *P*_{1} and *P*_{2} in *P*(*n*) is given by [29]

Given *m* SPD matrices *P*_{1}, … , *P*_{m}, the geometric mean in the Riemannian sense is defined as

For algorithm mean Riemannian distance to Riemannian mean (MDRM), we compute the Riemannian distance between unknown class *P* to the Riemannian mean point of each class and classify the unknown class into categories corresponding to the shortest distance. Inspired by the principal geodesics analysis (PGA) method [30], the literature [31] finds a set of filters by applying an extension of Fisher linear discriminant analysis (FLDA) named Fisher geodesic discriminant analysis (FGDA). And then, apply these filters to MDRM to form filter geodesic minimum distance to Riemannian mean (FGMDRM). More details can be seen from [31].

##### 2.2. Tangent Space Mapping

As shown in Figure 1, the SPD matrix of *P* is denoted by a differentiable Riemannian manifold *Z*. Each tangent vector *S*_{i} can be seen as the derivative at *t* = 0 of the geodesic *Γ*(*t*) between *P* and the exponential mapping *P*_{i} = EXP_{P}(*S*_{i}), defined as follows:

The inverse mapping is given by the logarithmic mapping and can be defined as follows:

Using the Riemannian geodesic distance, the Riemannian mean of *I* > 1 SPD matrices by

Using the tangent space located at the geometric mean of the whole set trials, , and then, each SCM *P*_{i} is mapped into this tangent space, to yield the set of *m* *=* *n*(*n* + 1)/2 dimensional vectors:

Many efficient classification algorithms can be implemented in the Riemannian space [10].

#### 3. Methods

##### 3.1. Subject-Specific Decision Tree Framework

Decision tree is a common machine learning method. Each node of decision tree can be defined as a rule. Guo and Gelfand [32] proposed classification trees with neural network, and this method embeds multilayer neural networks directly in nodes. In the decision tree, one of the most important things is to construct a proper binary tree structure; the upper nodes have the greater impact of the accuracy of the whole samples [33]. In order to solve the multiclassification problem in this paper, we constructed a subject-specific decision tree (SSDT) classification framework as shown in Figure 2 according to the best separating principle [34]. As can be seen from Figure 2, the SSDT proposed in this paper trains a different classification model at different nodes of the decision tree.

The advantages of the SSDT framework are as follows:(1)This model separates the two MI tasks (e.g., C.1 and C.2) with the highest recognition rate as far as possible(2)At the last node, we reclassify some samples to enhance the classification ability of the classifier

##### 3.2. Method 1: A Direct Classification Method Based on SSDT-FGMDRM

Firstly, we point out one problem of the multiclass FGMDRM by using an example. Figure 3 gives a three-class classification problem. Figure 3(a) shows the classification progress by FGMDRM. We can see that three Riemannian mean points (RMPs) are located on the manifold. Since the classification criterion is decided by the distance calculated between the test point and the RMP, it caused a wrong classification. Figure 3(b) shows the example of the classification results obtained by using the first node of the SSDT-FGMDRM framework. It can be seen that the error classification is corrected by using the decision tree framework.

Method 1 is used to classify four types of MI tasks directly. The training and testing diagram is shown in Figure 4.

##### 3.3. Feature Extraction Algorithm Based on the Riemannian Tangent Space

In this paragraph, we propose a novel data reduction method which combines *semi*-JMI and GDA, namely SJGDA, to solve the dimension disaster problem after tangent space mapping.

###### 3.3.1. Semisupervised Joint Mutual Information

Semisupervised dataset *D* = *D*{*D*_{L} ∪ *D*_{U}} consists of two parts, are labelled data and are unlabelled data. A binary random variable *S* is introduced to determine the distribution of labelled dataset and unlabelled dataset. When *s* = 1, we record the value of *y*, otherwise not. In this way, the labelled set *D*_{L} comes from the joint distribution *p*(*x*, *y*|*s* = 1), while the unlabelled set *D*_{U} comes from the distribution *p*(*x*|*s* = 0). The underlying mechanism *S* turns out to be very important for feature selection.

Feature selection method based on mutual information theory is a common feature selection method [35]. In these methods, we rank the features according to the score and select the features with higher scores. For example, by ranking the features according to their mutual information with the labels, we get the sort of correlation that is related to class labels. The characteristics of the score are defined as follows:where *X*_{θ} represents the set of the features already selected and *X*_{k} is the feature ranked by scores. *Y* represents the label corresponding to feature *X*_{k}.

*Semi*-JMI is a method of using a semisupervised dataset as a training set for JMI. More details can be seen from Reference [36]. In this paper, the missingness mechanism is class-prior-change semisupervised scenario (MAR-C) [37]. After feature ranking, we can obtain a feature vector as follows:where *n* is the length of the tangent vectors *S*_{i}. Since information redundancy exists in *f*, we select the best vector length *m* (*m* < *n*) of each subject by the classification recognition rate:

###### 3.3.2. Generalized Discriminant Analysis

After variable selection, this paper uses generalized discriminant analysis (GDA) [38, 39], which is a nonlinear feature reduction technique based on kernels to reduce the length of the feature vectors *f*_{SJ} and their redundancies. Mapping *X* (*f*_{SJ}) into a high-dimensional space *F* through a kernel function Φ:

The linear Fisher decision is performed in the *F* space, and the criterion function for its extension iswhere *W*^{Φ} ∈ *F* and *S*_{B} and *S*_{W} are between-class scatter and within-class scatter, respectively.

For the convenience of the numerical calculation, kernel functions are introduced to solve the problem:

Gauss kernel, poly kernel, and sigmoid kernel are widely used in GDA [40]. For test data *z*, its image Φ(*z*) in *F* space projects on *W*^{Φ} is as follows:

This paper uses ploy kernel to reduce the dimension. After GDA, we can get a vector *f*_{G} as follows:where *d* of *f*_{G} is decided by the actual needs, and in this paper we set *d* = 1. And then, SJGDA is applied to the dataset of this paper, and the final feature vectors are constructed as follows:

##### 3.4. Method 2: SJGDA and Subject-Specific Decision Tree k-Nearest Neighbor

Method 2 is used to classify four types of MI tasks after tangent space mapping. The training and testing diagram is shown in Figure 5.

#### 4. Description of Data

##### 4.1. Dataset 1

BCI competition IV dataset 2a is used to evaluate the performance of the proposed two methods [41]. Dataset 2a collects 22 channel EEG data and 3 EOG channel data. Four types of motor imagery were collected: left hand, right hand, foot, and tongue. The dataset contains nine healthy subjects and each subject has two sessions, one training session and one test session. Each session has 288 trails of MI data with 72 trails for each MI task. The EEG signals are bandpass filtered by a 5-th order Butterworth filter in the 8–30 Hz frequency band. The selection of trial period is important in MI classification; we select 2 s data (0.5 s and 2.5 s) after the cue, instructing the user to perform the MI tasks by the winner of the competition.

##### 4.2. Dataset 2

BCI competition III dataset IIIa is used to evaluate the performance of method 2. BCI III dataset IIIa contains 3 subjects: K3b, K6b, and L1b, and collects 64 channel EEG data. The EEG was sampled with 250 Hz. Four types of motor imagery were collected: left hand, right hand, foot, and tongue. More details about this dataset can be seen at Reference [42].

##### 4.3. Dataset 3

In our own dataset, Emotiv Epoc+ is used to collect EEG data of motor imagery. It is a portable EEG acquisition device with a sampling rate of 128 Hz. It has fourteen electrode channels (AF3, F7, F3, FC5, T7, P7, O1, O2, P8, T8, FC6, F4, F8, and AF4), two inference electrodes (CMS and DRL), and the electrode placement follows the international 10–20 standard. Equipment and the Emotiv 14 electrodes are located over 10–20 international system positions as shown in Figure 6. This experiment collected three kinds of EEG signals of one joint: imagination of shoulder flexion (F), extension (E), and abduction (A), as shown in Figure 7.

**(a)**

**(b)**

**(a)**

**(b)**

**(c)**

Seven subjects participated in this experimental study. These subjects were in good health. During the experiment, subjects were naturally placed with both hands, trying to avoid body or head movement. During the experiment, subjects carried out motor imagery under the outside cue, a single experiment collected EEG signal for 5 seconds, and then took 5–7 seconds to have rest, each action repeated acquisition 20 times. The experimental process is shown in Figure 8.

#### 5. Results

##### 5.1. Results of Method 1

We use SSDT-FGMDRM to classify multiclass MI tasks as introduced in Section 3.1. Since there are four classes, we can have four pairs of MI tasks: left vs rest (L/RE), right vs rest (R/RE), foot vs rest (F/RE), and tongue vs rest (T/RE). For each subject, the pair with the highest accuracy is used to train N.1, and the pair with the second highest accuracy is to train N.2. Table 1 gives the ten-folder cross-validation results obtained using FGMDRM in OVR scheme.

Table 2 displays the kappa values obtained by method 1. Compared with other methods, five subjects (A03, A06, A07, A08, and A09) achieved higher kappa value of nine without exploring the frequency domain information by method 1. In the case of fixed frequency window, we have improved the mean kappa value of 0.069 than MDRM (), and 0.139 than FGMDM_fixed (). Our approach also shows significant improvement than FGMDM (), which has exploited subject-specific frequency information, in terms of the kappa value of 0.039.

##### 5.2. Results of Method 2

The results in Figure 9 show the T/RE feature distribution of the five features of subject A09. Figure 9(a) shows the first five ranked features with *semi*-JMI. After applying the *semi*-JMI, the first five best features extracted have shown statistically significant improvement in the separability with *p* values <0.05 except feature 2 with value 0.77. In Figure 9(b), the first five features extracted from primitive feature vectors with value of 0.13, 0.05, 0.87, 0.05, and 0.13. The values indicate that the pair T/RE have no significance in the primitive feature vectors. The results show that with our semisupervised feature ranking algorithm, the separable degree of the feature has been greatly improved.

**(a)**

**(b)**

Figure 10 shows the evolution of the classification accuracy with KNN (*k* = 5 in this paper) against the number of ranked variables in OVR scheme. L/RE and T/RE are the two pairs with the highest recognition rate, and they achieved the highest recognition rate in 100 variables. But this is still a curse of dimensionality for classifiers; GDA is used to analyze the first 100 sorted variables in our study.

As the separation of characteristics cannot meet our requirements, GDA is used to get more obvious variables. Figure 11 illustrates distributions for the first five most discriminant variables with GDA and *semi*-JMI. It can be seen from Figure 11 that L/RE is separated equally well by using GDA.

Table 3 displays ten-folder cross-validation results obtained using SJGDA and KNN in OVR scheme. It can be seen that the vectors which are mapped to the tangent space have better classification performance than that in the Riemannian manifold directly.

Table 4 presents the results obtained by SJGDA in pairwise way for multiclass MI tasks. We have six pairs of MI tasks: left and right (L/R), left and foot (L/F), left and tongue (L/T), right and foot (R/F), right and tongue (R/T), and foot and tongue (F/T).

Table 5 displays the comparison of classification accuracy using SJGDA and KNN for L/R task in 10-folder cross validation. References [8, 43–45] contain the classification of other publications. We have improved the accuracy compared with Reference [44] (*p* = 0.85) and Reference [45] (). Gaur et al. [43] () explored the specific frequency information for each subject, and Sadatnejad and Shiry Ghidary [8] () used a novel kernel for dimensionality reduction which is similar to SJGDA. Although the results in the paper are not as high as those in Reference [43], it can be concluded that there is no difference between the results in Reference [43] and those in this paper because of .

Table 6 presents the results in terms of the kappa value. The proposed method 1 achieved a mean performance of 0.589 which ranks this method to the first place of the competition. And with our proposed method 2, we have achieved a mean performance of 0.607, which makes method 2 to acquire the best performance of the state of the art.

Dataset 2 is used to verify the effect of method 2, and the classification results are given directly in this paper. The results are shown in Table 7. As can be seen from Table 7, method 2 obtained the second highest recognition rate in the comparative literature. Compared with the recent reference [47], method 2 achieved good classification results.

##### 5.3. Results of Dataset 3

Dataset 3 is used to evaluate the performance of method 2. Figure 12 shows the classification error with KNN against the number of ranked variables in OVR scheme. A/RE and F/RE are the two pairs with the lowest classification error, and they all achieved the highest recognition rate within 60 variables. In this paper, the first 60 ranked variables are used for the next analysis.

Figure 13 displays 5-folder cross-validation results obtained by using SJGDA and KNN in OVR and OVO scheme. This Figure 13(a) illustrates three possible pairs of MI tasks (F/RE, E/RE, and A/RE) for each subject. It can be learned from the figure that flexion and abduction are the easiest movement to distinguish in six subjects of seven, and the six subjects are S1, S3, S4, S5, S6, and S7. However, due to individual differences, the highest recognition rate of each subject is different.

**(a)**

**(b)**

We also compared three possible pairs (F/E, F/A, and E/A) in OVO scheme of seven subjects. Figure 13(b) depicts the comparison results for each subject, and it can be seen that the pair of F/A obtained the highest recognition rate in seven subjects. Combined with the analysis results of Figures 13(a) and 13(b), it can be considered that flexion and extension are more obvious in the three MI tasks.

As SJGDA is a new method proposed in this paper, we also compared the feature distribution of SJGDA, GDA, and *semi*-JMI to illustrate the effectiveness of SJGDA. Figure 14 depicts the feature distribution of F/E MI tasks of seven subjects. The blue and red circles represent the two different feature classes. As shown in Figure 14, the F/E MI tasks learned by SJGDA have high separability than GDA and *semi*-JMI.

The performance of the proposed method 2 is evaluated by using classification accuracy. Since there are three classes, the chance level is 33.33%. Figure 15 demonstrates that the proposed method achieves higher performance for six subjects (S1, S2, S3, S4, S5, and S6) out of seven except S7 compared to *semi*-JMI and GDA methods. In addition, it also can be seen that GDA obtains a better classification accuracy for four subjects of seven (S1, S2, S5, and S7) compared with *semi*-JMI. The reasons for this phenomenon can be attributed to as follows: In the process of feature selection, we manually select feature dimensions suitable for classifiers, which results in partial information loss. As a feature dimensionality reduction technique, GDA is suitable for the preservation of useful information from the primitive vectors. And the proposed method SJGDA in this paper not only preserves the advantages of GDA but also adds some high ranking features to strengthen the expressive ability of the features.

#### 6. Discussions

In this paper, we proposed a novel SSDT framework combined with classifiers to improve the performance of classifiers for multiclass MI tasks. We also proposed a novel NLDR method named SJGDA, and this NLDR method performs better than both *semi*-JMI and GDA on different datasets. In the following paragraphs, we have discussed the two methods in detail.

Method 1 indicates the drawback of FGMDRM, and then the novel SSDT framework is used to improve the accuracy for each individual. As shown in Table 2, compared with other published results, method 1 gets a quite good result in the case of processing the EEG signals of fixed frequency segment (8–30 Hz).

As shown in Table 6, Gaur et al. [43] proposed SS-MEMDBF to select the subject-specific frequency to obtain enhanced EEG signals which represent MI tasks related to *µ* and *β* rhythms, then classification with the Riemannian distance directly. TSLDA was proposed by Barachant et al. [10], and the covariance matrices are mapped onto a higher dimensional space where they can be vectorized and treated as Euclidean objects. Ang et al. [46] is the winner of the competition, FBCSP and multiple OVR classifiers were used for MI tasks, and achieved the mean kappa value of 0.57. Sadatnejad and Shiry Ghidary [8] proposed a new kernel for NLDR over the manifold of SPD matrices, the kappa value is 0.576. Davoudi et al. [14] considered the geometry of SPD matrices and provides a low-dimensional representation of the manifold with high-class discrimination, and the best result of this method in terms of the kappa value is 0.60.

In method 2, SJGDA is used to get more obvious vectors from the tangent vectors, and a SSDT-KNN classifier is used to identify different MI tasks. Combined with SJGDA and SSDT-KNN, we have achieved a better performance compared with method 1 (), Reference [43] (), TSLDA (), winner 1 (), Reference [8] (), and Reference [14] (). It is clear that the proposed method in this paper is effective for MI tasks in a BCI system.

In order to prove the effectiveness of the proposed method 2, we tested it on two other datasets. As shown in Table 7 and Figure 15, method 2 achieves good classification results on two datasets.

#### 7. Conclusion

The experimental results of method 1 show that the proposed classification framework significantly improves the classification performance of the classifier. The experimental results of method 2 show that the SJGDA algorithm proposed in this paper is superior to GDA and *semi*-JMI in feature extraction, and method 2 has the highest recognition rate in this paper. However, as the classifiers in the SSDT framework is substitutable, the focus of the next work is to combine more advanced classifiers with SSDT to increase the recognition rate of the BCI systems.

#### Data Availability

The dataset 1 and dataset 2 used to support the findings of this study are available from http://bnci-horizon-2020.eu/database/data-sets. The dataset 3 used to support the findings of this study is available from the corresponding author upon request.

#### Conflicts of Interest

The authors declare no conflicts of interest.

#### Acknowledgments

Northeast Electric Power University (Grant number BSJXM-201521) and Jilin City Science and Technology Bureau (Grant number 20166012).