Abstract

In motor imagery brain computer interface (BCI) systems, the spatial covariance matrices of EEG signals carry important discriminative information and have been widely used to improve motor imagery decoding performance. However, covariance matrices often suffer from high dimensionality, which leads to high computational cost and overfitting. These problems directly limit the applicability and efficiency of BCI systems. To alleviate these problems and enhance BCI performance, in this study we propose a novel semisupervised locality-preserving graph embedding model to learn a low-dimensional embedding. The approach enables the low-dimensional embedding to capture more discriminant information for classification by efficiently incorporating information from both testing and training data into a Riemannian graph. Furthermore, we obtain an efficient classification algorithm by developing an extreme learning machine (ELM) classifier on the tangent space of the learned embedding. Experimental results show that the proposed approach achieves higher classification performance than benchmark methods on various datasets, including the BCI Competition IIa dataset and in-house BCI datasets.

1. Introduction

Brain computer interface (BCI) systems have been developed to provide a novel and promising alternative method for humans to interact with their environment. Among currently operational BCI systems, many are based on electroencephalography (EEG) signals, owing to the noninvasive nature of the technique and affordable data acquisition equipment, both of which also facilitate real-time operation [1]. Although many EEG signals, such as P300 waves or steady-state visual evoked potentials (SSVEP), can be used for BCI [2], in this study we focus on motor imagery-based BCI systems, with which trained subjects can voluntarily generate EEG control signals by imagining movements of different parts of their bodies [3]. Because it does not require external stimulation, motor imagery-based BCI has proven effective in practice.

The common spatial pattern (CSP) algorithm has been shown to be one of the most effective methods for feature extraction from motor imagery signals [4]. The success of CSP techniques in BCI applications depends heavily on learning a proper spatial filter from a training dataset. Several extensions of CSP have been proposed to improve feature extraction in motor imagery classification. In [5], a novel regularized CSP combined with correlation-based channel selection was proposed to extract effective EEG features. In [6], discriminative canonical pattern matching combined with CSP was used to extract movement-related features. In [7], to capture the shared salient information across multiple related spatial patterns, a multiview learning-based sparse optimization was proposed to jointly extract robust CSP features with L1L2-norm regularization. In [8], generic model training by weighted linear discriminant analysis was proposed to reduce the correction time of the BCI system. In [9], a fusion method for internal feature selection was proposed to combine different feature selection rules; it reduces computational cost while maintaining high motor imagery classification performance. Typically, in such motor imagery classification systems, some discriminant information contained in the testing dataset remains unutilized. However, graph-based models are able to utilize such information because they naturally incorporate diverse types of information, such as the relationship between training and testing datasets [10].

Graph construction is an essential step in graph-based models. Two conventional methods are commonly used: the $k$-nearest neighbor method and the $\varepsilon$-ball-based method [11]:
(1) $k$-nearest neighbor methods consider data samples $x_i$ and $x_j$ as neighbors if $x_i$ is among the $k$ nearest neighbors of $x_j$ or $x_j$ is among the $k$ nearest neighbors of $x_i$, where the nearest neighbors are determined as the samples with the smallest Euclidean distance.
(2) $\varepsilon$-ball neighborhood methods consider data samples $x_i$ and $x_j$ as neighbors if $\|x_i - x_j\|^2 < \varepsilon$, where $\|\cdot\|$ is the Euclidean norm and $\varepsilon > 0$.

The above traditional graph construction methods are relatively simple to implement and have been applied effectively in practice. However, if the data samples considered do not lie in a Euclidean space, the $k$-nearest neighbor and $\varepsilon$-ball-based methods cannot guarantee the connectivity of the whole graph and often result in several separated subgraphs, because neighbors defined by Euclidean distance cannot well characterize the real geometrical relations among samples [12]. An obvious example is the covariance feature data used in motor imagery-based BCI. Because covariance features of EEG signals, in the form of symmetric positive-definite (SPD) matrices, lie on a Riemannian manifold [13], we propose an extension of the graph construction approach based on Riemannian geodesic distance to construct a proper graph for EEG signals, namely, a Riemannian graph. In fact, EEG classification techniques on high-dimensional Riemannian manifolds have recently received increased attention [14]. For example, an efficient classification algorithm called tangent space linear discriminant analysis (TS + LDA) was proposed in [14]. TS + LDA mapped all the data points on a Riemannian manifold into its tangent space, which is known as the best hyperplane for classification tasks [15], and then applied linear discriminant analysis (LDA) for classification in the tangent space.

After constructing a Riemannian graph for an EEG signal, many graph-based algorithms can be exploited on the constructed Riemannian graph. Graph embedding algorithms are among the most successful in graph-based data processing on high-dimensional Riemannian manifolds, because the most important information is always contained in a low-dimensional embedding of a high-dimensional Riemannian manifold. Many state-of-the-art semisupervised and unsupervised algorithms have been proposed. Semisupervised learning methods can utilize both training and test data to boost algorithmic performance. Many related works, such as [16–18], classify EEG based on training and test data in the context of semisupervised learning. In [16], a self-training support vector machine (SVM) method was proposed to use the test data for updating the parameters of the SVM model. In [17], an iterative semisupervised SVM method was proposed for channel selection and classifier training. In [18], semisupervised expectation maximization (EM) was incorporated into CSP to extract more effective features. In addition, many unsupervised dimensionality reduction methods have been proposed to learn a low-dimensional embedding preserving similarities between vertex pairs on such graphs, including locally linear embedding [19], locality-preserving projection [20], and Laplacian eigenmaps [21].

Considering locality preservation, in this study we propose a bilinear locality-preserving embedding learning algorithm on a constructed Riemannian graph. Because the learned embedding is also an SPD matrix space, inspired by the TS + LDA algorithm, we apply an extreme learning machine (ELM) [22, 23] on the tangent space of the embedding for classification. Many related graph embedding and ELM-based methods have recently been proposed. In [24], a hybrid-graph discriminant learning method was proposed to reveal the complex high-order relationships of the original data and to extract discriminant features. In [25], a local constrained manifold structure collaborative preserving embedding was proposed to enhance intraclass compactness and interclass separability. In [26], to improve the classification performance of motor imagery, a sparse Bayesian ELM algorithm was proposed to automatically control the model complexity and exclude redundant hidden neurons. In [27], a multikernel ELM method integrating two types of kernels was proposed to effectively explore the supplementary information from multiple nonlinear feature spaces.

In [28], a clustering-based multitask feature learning method was proposed to improve EEG pattern decoding. Most of these studies have achieved great success in graph embedding and classification. However, the above methods learn embeddings or classifiers in Euclidean space and ignore the fact that covariance matrices lie on a Riemannian manifold. The contributions of this study are threefold:
(1) A graph model of the covariance features of motor imagery signals based on Riemannian geodesic distance is constructed. The neighborhood relationships among data samples are characterized by Riemannian geodesics because the EEG covariance features endowed with a Riemannian metric form a Riemannian manifold.
(2) A novel graph embedding framework is proposed for the constructed Riemannian graph. The calculation of the graph embedding is formulated as an eigenvalue problem.
(3) A classification algorithm, called embedding tangent space extreme learning machine (ETS + ELM), is proposed on the learned low-dimensional embedding. Experimental results show that the proposed algorithm achieves higher classification performance than the prior methods used for comparison.

The remainder of this paper is organized as follows. In Section 2, some basic concepts concerning SPD matrix spaces and BCI data models are briefly reviewed. In Section 3, we introduce the construction of Riemannian graphs and propose a bilinear locality-preserving graph embedding algorithm and an embedding tangent space ELM for classification. Extensive experimental results are presented in Section 4 to demonstrate the effectiveness of the proposed method. Finally, some conclusions are presented in Section 5.

2. Concepts of Riemannian Geometry and BCI Data Models

2.1. Riemannian Geometry

In mathematics, a Riemannian manifold is defined as a smooth manifold equipped with a Riemannian metric, i.e., a smoothly varying inner product defined on its tangent space [29]. A smooth manifold is a topological manifold with a smooth structure locally similar to Euclidean space: every point on the manifold has a neighborhood for which there exists a homeomorphism mapping that neighborhood to a Euclidean space.

For many applications, the space of SPD matrices endowed with a Riemannian metric is one of the most important Riemannian manifolds [30]. Owing to the properties of SPD matrices, explicit formulae exist for the major concepts of Riemannian geometry. Therefore, these concepts can be applied relatively easily in the context of pattern recognition.

2.2. Space of Symmetric Positive-Definite Matrices

Denoting the space of $N \times N$ symmetric matrices by
$$S(N) = \left\{ S \in \mathbb{R}^{N \times N} : S = S^{T} \right\} \quad (1)$$
and the set of positive-definite matrices by
$$\left\{ P \in \mathbb{R}^{N \times N} : u^{T} P u > 0, \ \forall u \in \mathbb{R}^{N}, u \neq 0 \right\}, \quad (2)$$
the space of SPD matrices is defined as
$$P(N) = \left\{ P \in S(N) : u^{T} P u > 0, \ \forall u \in \mathbb{R}^{N}, u \neq 0 \right\}. \quad (3)$$

The space of SPD matrices lies on a differentiable Riemannian manifold [30]. Hence, all the mathematical tools defined in Riemannian geometry can be applied to any $P \in P(N)$.

The Riemannian distance between two matrices $P_1, P_2 \in P(N)$ is the minimum length of the curves connecting $P_1$ and $P_2$ on the Riemannian manifold [31]. There are many possible mathematical definitions of the Riemannian distance. In this study, we adopt the following formula:
$$\delta_R(P_1, P_2) = \left\| \log\left(P_1^{-1} P_2\right) \right\|_F = \left[ \sum_{i=1}^{N} \log^2 \lambda_i \right]^{1/2}, \quad (4)$$
where $\|\cdot\|_F$ is the Frobenius norm of a matrix and $\lambda_i$ is the $i$th real eigenvalue of $P_1^{-1} P_2$.
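As an illustration, this distance can be computed without forming an explicit matrix logarithm. The following minimal Python sketch (not the authors' code) uses the generalized eigenvalues of the pencil (P2, P1), which coincide with the eigenvalues of P1^{-1} P2:

import numpy as np
from scipy.linalg import eigvalsh

def riemannian_distance(P1, P2):
    # delta_R(P1, P2) = ||log(P1^{-1} P2)||_F = sqrt(sum_i log^2(lambda_i)),
    # where lambda_i are the generalized eigenvalues of P2 v = lambda P1 v
    lam = eigvalsh(P2, P1)
    return np.sqrt(np.sum(np.log(lam) ** 2))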

The associated Euclidean tangent space is critical in the analysis of a Riemannian manifold, as illustrated in Figure 1. As a preliminary step to defining the tangent space, we first introduce, respectively, the logarithmic and exponential mapping operators as
$$\mathrm{Log}_P(P_i) = P^{1/2} \log\left(P^{-1/2} P_i P^{-1/2}\right) P^{1/2}, \qquad \mathrm{Exp}_P(S_i) = P^{1/2} \exp\left(P^{-1/2} S_i P^{-1/2}\right) P^{1/2}, \quad (5)$$
where $\mathrm{Log}_P$ is a mapping from a point $P_i$ on the manifold to the tangent space at $P$, while $\mathrm{Exp}_P$ is a mapping from the tangent space at $P$ back to the manifold. These two operators are both one-to-one mappings between the Riemannian manifold and the tangent space.

Because the tangent space is a space of symmetric matrices, it comprises only $N(N+1)/2$ independent elements. A minimal representation of the tangent space [10] is defined as
$$s_i = \mathrm{upper}\left(P^{-1/2}\, \mathrm{Log}_P(P_i)\, P^{-1/2}\right), \quad (6)$$
where the $\mathrm{upper}$ operator retains the upper triangular portion of the symmetric matrix and vectorizes it.
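A Python sketch of this vectorization follows; the sqrt(2) scaling of the off-diagonal entries is a common norm-preserving convention that we assume here rather than take from [10]:

import numpy as np
from scipy.linalg import sqrtm, logm

def tangent_vector(P_ref, P):
    # Map P to the tangent space at P_ref and vectorize the upper triangle.
    # Whitening by P_ref^{-1/2} reduces the expression in (6) to
    # logm(P_ref^{-1/2} P P_ref^{-1/2}).
    R_inv = np.linalg.inv(np.real(sqrtm(P_ref)))
    S = np.real(logm(R_inv @ P @ R_inv))
    iu = np.triu_indices_from(S)
    weights = np.where(iu[0] == iu[1], 1.0, np.sqrt(2.0))  # scale off-diagonals
    return weights * S[iu]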

The mean of the SPD matrices plays an important role in classification. Since a neighborhood on a Riemannian manifold is locally homeomorphic to its tangent space, a traditional classifier can be applied in the tangent space to obtain high classification performance [14]. However, a large neighborhood leads to large distortion between the neighborhood and its tangent space, particularly when the selected tangent point is not representative. The mean of the SPD matrices is the best representation of all data points. It is defined as the point with the minimum sum of squared distances to all SPD matrices in a dataset $\{P_1, \dots, P_m\}$:
$$\bar{P} = \arg\min_{P \in P(N)} \sum_{i=1}^{m} \delta_R^2\left(P, P_i\right). \quad (7)$$
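There is no closed-form solution to (7) for more than two matrices; a standard fixed-point (Karcher mean) iteration, sketched below under that assumption, alternates between averaging in the tangent space at the current estimate and mapping back to the manifold:

import numpy as np
from scipy.linalg import sqrtm, logm, expm

def riemannian_mean(Ps, n_iter=50, tol=1e-8):
    # Karcher-mean iteration: average the data in the tangent space at the
    # current estimate, then map the average back via the exponential map.
    M = np.mean(Ps, axis=0)  # Euclidean mean as the initial guess
    for _ in range(n_iter):
        R = np.real(sqrtm(M))
        R_inv = np.linalg.inv(R)
        T = np.mean([np.real(logm(R_inv @ P @ R_inv)) for P in Ps], axis=0)
        M = R @ np.real(expm(T)) @ R  # exponential map back to the manifold
        if np.linalg.norm(T) < tol:   # converged: mean tangent vector ~ 0
            break
    return M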

2.3. BCI Data Model

The recorded EEG signal of one trial is a multichannel signal
$$X = \left[x_1, x_2, \dots, x_L\right] \in \mathbb{R}^{N \times L}, \quad (8)$$
where $N$ and $L$ denote the number of channels and sampled points, respectively, and $x_t \in \mathbb{R}^{N}$ is the snapshot vector at time $t$. In motor imagery BCIs, the second-order statistical information (covariance feature) of $X$ often provides discriminative information for classification. The covariance feature of $X$ is represented by its sample covariance matrix (SCM) [32] as
$$P = \frac{1}{L-1} X X^{T}. \quad (9)$$
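As a concrete illustration (a sketch only; per-channel mean removal and the 1/(L−1) normalization are the usual conventions assumed here):

import numpy as np

def sample_covariance(X):
    # SCM of one band-pass-filtered trial X (channels x samples)
    Xc = X - X.mean(axis=1, keepdims=True)  # remove per-channel mean
    L = X.shape[1]
    return Xc @ Xc.T / (L - 1)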

3. Proposed Bilinear Locality-Preserving Embedding and ELM Classification

3.1. Riemannian Graph Construction

In SPD matrix spaces, because the data lie on a Riemannian manifold, it is natural to construct a Riemannian graph $G = \{V, W\}$ via the Riemannian geodesic distance, as illustrated in Figure 2. Here, $V$ represents the set of graph vertices (the SPD data samples), and $W$ represents the graph weight matrix. For a given dataset, the Riemannian graph is constructed as follows.

First, to construct an adjacency graph, a Riemannian distance-based $k$-nearest neighbor method is used. The points $P_i$ and $P_j$ are considered neighbors if $P_i$ is among the $k$ nearest neighbors of $P_j$ or $P_j$ is among the $k$ nearest neighbors of $P_i$, where $k$ is a positive integer and the nearest neighbors are measured by the Riemannian geodesic distance $\delta_R$.

Second, to calculate the graph weights, many approaches may be used, such as heat kernels [33], inverse Euclidean distance [34], and local linear reconstruction coefficients [19]. In this study, we use a heat kernel based on the Riemannian geodesic distance to construct the graph weights:
$$W_{ij} = \begin{cases} \exp\left(-\dfrac{\delta_R^2\left(P_i, P_j\right)}{2\sigma^2}\right), & \text{if } P_i \text{ and } P_j \text{ are neighbors}, \\ 0, & \text{otherwise}, \end{cases} \quad (10)$$
where $W_{ij}$ is the $(i, j)$th element of $W$ and the parameter $\sigma$ controls the width of the neighborhoods.
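A minimal Python sketch of this construction follows; it reuses riemannian_distance() from the earlier snippet, and the values of k and sigma are illustrative, not the paper's settings:

import numpy as np

def riemannian_graph(Ps, k=5, sigma=1.0):
    # k-NN Riemannian graph with heat-kernel weights, per (10)
    m = len(Ps)
    D = np.zeros((m, m))
    for i in range(m):
        for j in range(i + 1, m):
            D[i, j] = D[j, i] = riemannian_distance(Ps[i], Ps[j])
    W = np.zeros((m, m))
    for i in range(m):
        nbrs = np.argsort(D[i])[1:k + 1]  # k nearest neighbors, excluding i
        W[i, nbrs] = np.exp(-D[i, nbrs] ** 2 / (2.0 * sigma ** 2))
    return np.maximum(W, W.T)  # symmetrize: the "or" neighborhood rule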

3.2. Bilinear Locality-Preserving Embedding and ELM on Tangent Space

Because the covariance features studied in this paper lie in an SPD matrix space, we propose a bilinear mapping $f(P_i) = A^{T} P_i A$, with $A \in \mathbb{R}^{N \times d}$ and $d < N$, under the constraint $A^{T} A = I_d$, to learn a low-dimensional embedding. The low-dimensional embedding is expected to preserve the locality geometry of the high-dimensional Riemannian manifold. Locality preservation attempts to ensure that if $P_i$ and $P_j$ in the high-dimensional manifold are "close," then $A^{T} P_i A$ and $A^{T} P_j A$ in the low-dimensional embedding are also "close." A reasonable criterion for locality preservation is to minimize the following objective function:
$$\min_{A^{T} A = I_d} \sum_{i,j} \left\| A^{T} P_i A - A^{T} P_j A \right\|_F^2 W_{ij}, \quad (11)$$
where the Frobenius norm is used to measure the "closeness" of data points in the embedding.

There is no closed-form optimal solution for optimization problem (11). Note that the objective function (11) represents a trade-off between an optimizable formulation and Riemannian locality preservation. In this study, we propose an iterative strategy to obtain a local optimal solution of (11). Expanding the Frobenius norm, the objective can be written as $\mathrm{tr}\left(A^{T} G(A) A\right)$ with $G(A) = \sum_{i,j} W_{ij} \left(P_i - P_j\right) A A^{T} \left(P_i - P_j\right)$. In the proposed strategy, the matrix $A$ is first initialized as $A^{(0)}$, and the new optimum $A^{(t+1)}$ of the $(t+1)$th iteration can be obtained, given the $t$th iterate $A^{(t)}$, by solving the following eigendecomposition problem:
$$\max_{A^{T} A = I_d} \mathrm{tr}\left(A^{T} \left(-G^{(t)}\right) A\right), \quad \text{where } G^{(t)} = \sum_{i,j} W_{ij} \left(P_i - P_j\right) A^{(t)} A^{(t)T} \left(P_i - P_j\right). \quad (12)$$
Negating (11), the minimization problem becomes a maximization problem. Because the optimization problem in (12) is an eigenvector problem of the type studied in [35, 36], the optimal transformation matrix $A^{(t+1)}$ consists of the first $d$ eigenvectors of $-G^{(t)}$ corresponding to the largest eigenvalues. The iteration is repeated until the solution converges, yielding a local optimal solution of (11). Further details are provided in Algorithm 1.
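The following Python sketch illustrates the iteration under the derivation above; the exact matrices in the authors' (12) may differ in detail, and the initialization and stopping test are simplifying assumptions:

import numpy as np

def bilinear_lpp(Ps, W, d, T=10, eps=1e-4):
    # With A A^T frozen at the previous iterate, (11) becomes tr(A^T G A),
    # minimized over orthonormal A by the d eigenvectors of G with the
    # smallest eigenvalues (equivalently, the largest eigenvalues of -G).
    N = Ps[0].shape[0]
    A = np.eye(N)[:, :d]  # a simple orthonormal initialization (assumed)
    for _ in range(T):
        AAt = A @ A.T
        G = np.zeros((N, N))
        m = len(Ps)
        for i in range(m):
            for j in range(m):
                if W[i, j] > 0:
                    Dij = Ps[i] - Ps[j]
                    G += W[i, j] * (Dij @ AAt @ Dij)
        vals, vecs = np.linalg.eigh(G)  # eigenvalues in ascending order
        A_new = vecs[:, :d]             # smallest-eigenvalue eigenvectors
        # eigenvector signs are arbitrary, so this stopping test is a
        # simplification of the threshold check in Algorithm 1
        if np.linalg.norm(A_new - A) < eps:
            A = A_new
            break
        A = A_new
    return A  # embedding of each trial: A.T @ P_i @ A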

Input: given SPD matrix set $\{P_i\}_{i=1}^{m}$ with $m$ samples, embedding dimension $d$, iteration number $T$, stop threshold $\epsilon$, and ELM parameters (regularization constant $C$ and number of hidden neurons $\tilde{N}$);
(1) Construct a Riemannian graph $G = \{V, W\}$ over all samples based on the Riemannian geodesic distance (Section 3.1);
(2) Initialize: $A^{(0)}$;
(3) For $t = 1$ to $T$, do:
 Calculate the matrix $G^{(t-1)}$ (Section 3.2);
 Obtain the mapping matrix $A^{(t)}$ by (12);
 If $\|A^{(t)} - A^{(t-1)}\|_F < \epsilon$,
  break;
 end if;
end for;
(4) Construct the $d$-dimensional embedding $\tilde{P}_i = A^{T} P_i A$, where $A$ is the convergent matrix from step (3);
(5) Calculate the Riemannian mean $\bar{P}$ of all data in the embedding;
(6) Project each embedding point $\tilde{P}_i$ onto the tangent space at the Riemannian mean $\bar{P}$ by (6), obtaining the tangent vector $s_i$;
(7) Apply a three-layer ELM in the tangent space, learning the output weights $\beta$ from (14) using the training data;
(8) The label of the testing data is obtained from the largest entry of the ELM output $h(s)\beta$, where $h(s)$ is the hidden-layer output of the testing data;
Output: the label of the testing data.

After learning the optimized bilinear mapping matrix $A$, the proposed approach maps all points in the embedding into the tangent space according to (6). It has been reported that the tangent space located at the Riemannian mean is the best hyperplane for classification, as it is associated with the minimal distance loss [15]. A three-layer ELM network is constructed on the tangent space for classification because the ELM classifier is more flexible in handling complex and multivariate data than LDA and SVM classifiers. The mapping function from the $i$th input tangent vector $s_i$ to the $j$th hidden node is defined as
$$h_j\left(s_i\right) = g\left(w_j^{T} s_i + b_j\right), \quad (13)$$
where $g(\cdot)$ denotes the sigmoid mapping function with parameters $(w_j, b_j)$. Based on the principles of ELMs, we randomly chose the parameters $w_j$ and $b_j$ within the range $[-1, 1]$. We denote by $h(s_i) \in \mathbb{R}^{1 \times \tilde{N}}$ the output vector of the hidden layer with respect to $s_i$ and by $\beta \in \mathbb{R}^{\tilde{N} \times n_o}$ the output weights that connect the hidden layer to the output layer, where $\tilde{N}$ and $n_o$ are the numbers of hidden neurons and output neurons, respectively. Moreover, the output weights can be learned by solving an optimization problem [22]:
$$\min_{\beta} \ \frac{1}{2} \|\beta\|^2 + \frac{C}{2} \sum_{i} \left\| h\left(s_i\right) \beta - t_i \right\|^2, \quad (14)$$
where the parameter $C$ balances the loss term and the regularizer $\|\beta\|^2$, and $t_i$ is the $i$th row of the target output matrix $T$, a binary vector with only one entry (corresponding to the class to which $s_i$ belongs) equal to one.
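A sketch of this classifier in Python follows; the closed-form ridge solution beta = (I/C + H^T H)^{-1} H^T T is one common way to solve (14), and the exact parameterization of the trade-off constant C is an assumption:

import numpy as np

def elm_train(S_train, T_train, n_hidden=2000, C=1000.0, seed=0):
    # Random sigmoid hidden layer, then ridge-regression output weights
    rng = np.random.default_rng(seed)
    Wh = rng.uniform(-1, 1, (S_train.shape[1], n_hidden))  # weights in [-1, 1]
    b = rng.uniform(-1, 1, n_hidden)                       # biases in [-1, 1]
    H = 1.0 / (1.0 + np.exp(-(S_train @ Wh + b)))          # sigmoid outputs
    beta = np.linalg.solve(np.eye(n_hidden) / C + H.T @ H, H.T @ T_train)
    return Wh, b, beta

def elm_predict(S_test, Wh, b, beta):
    H = 1.0 / (1.0 + np.exp(-(S_test @ Wh + b)))
    return np.argmax(H @ beta, axis=1)  # class with the largest output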

The framework of the ETS + ELM algorithm is shown in Figure 3. The proposed ETS + ELM algorithm includes Riemannian graph construction for data representation, bilinear locality-preserving mapping for dimensionality reduction, tangent space mapping for feature extraction, and a three-layer extreme learning machine network for classification.
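Putting the pieces together, a hypothetical end-to-end pass is sketched below; it reuses the functions from the preceding snippets, trials, labels, n_train, and n_classes are assumed inputs, and all parameter values are illustrative rather than the authors' settings:

import numpy as np

# trials: list of (channels x samples) arrays, training trials first;
# labels: integer class labels for the first n_train trials
Ps = [sample_covariance(X) for X in trials]        # covariance representation
W = riemannian_graph(Ps, k=5, sigma=1.0)           # graph over train + test
A = bilinear_lpp(Ps, W, d=10)                      # locality-preserving mapping
Es = [A.T @ P @ A for P in Ps]                     # d-dimensional SPD embedding
M = riemannian_mean(Es)                            # tangent point (Riemannian mean)
S = np.vstack([tangent_vector(M, E) for E in Es])  # tangent-space features
T_onehot = np.eye(n_classes)[labels]               # one-hot ELM targets
Wh, b, beta = elm_train(S[:n_train], T_onehot)     # train ELM on labeled trials
y_pred = elm_predict(S[n_train:], Wh, b, beta)     # predict test-trial labels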

4. Experimental Results

4.1. Experimental Setup
4.1.1. Data Description

In this study, the BCI Competition 2008 motor imagery data [37] and our in-house experimental data are used for performance evaluation:
(1) The BCI Competition 2008 IIa EEG motor imagery data were recorded from nine subjects who performed four kinds of motor imagery (imagined right hand, left hand, foot, and tongue movements) with 22 EEG channels. For each subject, there were 144 trials per class, including training and testing trials. The EEG signals were filtered by an 8−30 Hz bandpass filter.
(2) Our in-house EEG data were recorded from 12 subjects with 64 EEG channels. For each subject, there were 234 trials per class. Our in-house experiments tested imagined right- and left-hand movements.

The sampling rate of the EEG signals was 250 Hz, and the same bandpass filter as for the Competition dataset was used.
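For reference, a plausible form of this preprocessing step (assumed, not the authors' exact code) is an 8−30 Hz band-pass at 250 Hz with scipy:

import numpy as np
from scipy.signal import butter, filtfilt

def bandpass(X, fs=250.0, low=8.0, high=30.0, order=4):
    # Zero-phase Butterworth band-pass along the time axis of a
    # (channels x samples) trial; the filter order is an assumption.
    b, a = butter(order, [low / (fs / 2.0), high / (fs / 2.0)], btype="band")
    return filtfilt(b, a, X, axis=1)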

4.1.2. Algorithms Evaluated

We evaluated the proposed algorithm (ETS + ELM) against four competing algorithms:
(1) CSP + LDA: the CSP in [4] and linear discriminant analysis (LDA) in [38] were applied for evaluation.
(2) CSP + SVM: the number of CSP spatial filters was set to 8, as suggested in [4], for feature extraction, and a support vector machine (SVM) [39] was used for classification.
(3) MDRM: minimum distance to Riemannian mean [14] was used for evaluation.
(4) TS + LDA: LDA was used to classify signals on a tangent space [14].

4.1.3. Parameter Setting

In all experiments, the neighborhood size $k$ and the kernel width $\sigma$ were set empirically during Riemannian graph construction. In ETS + ELM, the number of iterations $T$ and the iterative stop threshold $\epsilon$ were likewise fixed empirically. Most information of the original space is lost when the number of embedding dimensions is too small, whereas all of the evaluated methods suffer from high computational cost when the dimensionality is excessively high. Hence, the embedding dimension was empirically set to 10. Because the ELM method suffers from overfitting when there are excessively many hidden neurons, a regularization parameter was used to control overfitting. The regularization parameter was empirically set to 0.001, and the number of hidden neurons to 2000.

4.2. Experiments for EEG Classification

To evaluate the performance of the proposed algorithm, ETS + ELM was compared with the competing algorithms (CSP + LDA, CSP + SVM, MDRM [14], and TS + LDA [14]) on the EEG datasets from the 2008 BCI Competition and our in-house BCI system.

First, we evaluated the convergence of the iterative algorithm for optimization problem (12). Because the Kappa coefficient [40] scales linearly from 0 (random classification) to 1 (perfect classification), it was selected as the performance measure for the classification algorithms in this study. In each iteration step of (12), we obtained a learned mapping matrix and a corresponding low-dimensional embedding, and the performance of the ETS + ELM classification algorithm under the same parameter settings was evaluated on these embeddings. The average performance of the nine subjects on the BCI Competition IIa dataset is shown in Figure 4. It is clear that the mapping matrix learned by the proposed algorithm converged quickly, normally requiring only two iteration steps.
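For completeness, Cohen's kappa can be computed from the confusion matrix as in the following sketch (a standard definition, not tied to the authors' implementation):

import numpy as np

def kappa(y_true, y_pred, n_classes):
    # Cohen's kappa: 0 at chance-level agreement, 1 at perfect agreement
    cm = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    po = np.trace(cm) / cm.sum()                             # observed accuracy
    pe = (cm.sum(axis=0) @ cm.sum(axis=1)) / cm.sum() ** 2   # chance agreement
    return (po - pe) / (1.0 - pe)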

We also compared the performance of various methods on classification of the BCI Competition 2008 data. In Table 1, the performance of the proposed method, MDRM, TS + LDA, and the top three Competition winners for dataset IIa of BCI Competition 2008 (1st of BCI08, 2nd of BCI08, and 3rd of BCI08) is presented for comparison. From the results shown in Table 1, it may be observed that the proposed ETS + ELM method achieves a mean Kappa value of 0.577, the best performance of all the methods compared.

We also tested the performance of the proposed algorithms in terms of classification accuracy on the BCI Competition 2008 IIa dataset with a 30-fold cross-validation procedure. The cross-validation reflects the generalization performance of the proposed method. From the results in Table 2, it is clear that, for all subjects, ETS + ELM outperformed the reference methods. This indicates that, by finding a low-dimensional embedding with the property of locality preservation, the classification method ETS + ELM exhibited the best performance.

Moreover, we evaluated the performance of all algorithms on the dataset recorded from our in-house BCI system. As shown in Table 3, the proposed ETS + ELM showed a significant performance improvement over the prior methods (an approximately 2% improvement in classification accuracy), even when compared with the benchmark CSP algorithm [4]. In conclusion, the experimental results on the competition dataset and the in-house dataset demonstrate that the proposed ETS + ELM approach achieved higher performance than the other methods. In addition, we provide ANOVA and t-test analyses of Tables 1–3. The repeated-measures ANOVA results for Tables 1–3 are p = 0.2 × 10⁻¹⁰, 0.7 × 10⁻⁷, and 0.03, respectively, indicating that the studied methods have statistically significant differences in classification performance. The t-test analyses of Tables 1–3 are shown in Table 4.

We also compared the computational load of the proposed algorithm with that of the competing methods. On the BCI competition dataset, the training time of the proposed algorithm is 24.7 s, which is far longer than the training times of MDRM (0.12 s), TSLDA (1.17 s), CSPLDA (0.18 s), and CSPSVM (0.19 s). This is because the proposed algorithm is a semisupervised learning model in which the training and test datasets are used simultaneously to learn features, whereas the competing methods use only the training dataset. Another reason is that the proposed algorithm spends considerable time in the iterative optimization strategy (equation (11)), whereas the competing methods solve their optimization problems by a single fast eigenvalue decomposition. However, the test time of the proposed algorithm is 0.06 s, which is close to the test times of MDRM (0.17 s), TSLDA (0.03 s), CSPLDA (0.04 s), and CSPSVM (0.03 s). The computational loads on the in-house dataset are similar to those on the competition dataset.

Finally, we compared our bilinear locality-preserving embedding with other state-of-the-art manifold learning algorithms. Figure 5 shows the 2-dimensional embeddings learned by ISOMAP, LLE, and the proposed algorithm. Motor imagery EEG data are spatiotemporal signals, and the correlations of EEG signals in the high-dimensional space are inevitably partly destroyed in a 2-dimensional embedding. Nevertheless, compared with the feature distributions extracted by the other manifold learning methods, the proposed method obtains moderately good class separability in the feature space.

5. Conclusions

Inspired by the fact that the covariance features of EEG signals, in the form of SPD matrices, lie on a Riemannian manifold, in this study we have presented an approach to construct a Riemannian graph for motor imagery classification. A bilinear locality-preserving method has been proposed to find low-dimensional embeddings carrying discriminant information. A classification algorithm, called ETS + ELM, was further proposed on the tangent space of the learned embedding. Experimental results on the BCI competition and our in-house datasets show that the proposed bilinear locality-preserving embedding captures the useful locality and discriminant information of the original manifold and that the proposed classification algorithm outperforms the other competing algorithms. The proposed methods can also be applied to many other pattern-recognition problems with input data in the form of SPD matrices, such as pedestrian detection [15] or hyperspectral image processing [41]. In the future, we will further study weight calculation methods to construct better Riemannian graphs and investigate the interpretability of the features learned by Riemannian-based methods.

Data Availability

Previously reported BCI competition EEG data used to support this study are available at https://www.bbci.de/competition/iv/#dataset2a. These prior studies (and datasets) are cited at relevant places within the article as references. The in-house EEG data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported in part by the Natural Science Foundation of Hainan, under Grant 2019RC165, and the Young Talents Science and Technology Innovation Project of Hainan Association for Science and Technology, under Grant QCXM202011.