Computational Intelligence and Neuroscience

Volume 2019, Article ID 4317078, 19 pages

https://doi.org/10.1155/2019/4317078

## An Incremental Version of L-MVU for the Feature Extraction of MI-EEG

^{1}Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
^{2}Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing 100124, China

Correspondence should be addressed to Mingai Li; limingai@bjut.edu.cn

Received 27 February 2019; Revised 4 April 2019; Accepted 7 April 2019; Published 2 May 2019

Academic Editor: Amparo Alonso-Betanzos

Copyright © 2019 Mingai Li et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Due to the nonlinear and high-dimensional characteristics of motor imagery electroencephalography (MI-EEG), it can be challenging to achieve high online accuracy. As a nonlinear dimension reduction method, landmark maximum variance unfolding (L-MVU) can completely retain the nonlinear features of MI-EEG. However, L-MVU still requires considerable computation costs for out-of-sample data. An incremental version of L-MVU (denoted as IL-MVU) is proposed in this paper. The low-dimensional representation of the training data is generated by L-MVU. For each out-of-sample data point, its nearest neighbors are found among the high-dimensional training samples, and the corresponding reconstruction weight matrix is calculated to generate its low-dimensional representation as well. IL-MVU is further combined with the dual-tree complex wavelet transform (DTCWT), which yields a hybrid feature extraction method (named IL-MD). IL-MVU is applied to extract the nonlinear features of the specific subband signals, which are reconstructed by DTCWT and show the obvious event-related synchronization/event-related desynchronization phenomenon. The average energy features of the μ and β waves are calculated simultaneously. The two types of features are fused and evaluated by a linear discriminant analysis classifier. Extensive experiments were conducted on two public datasets with 12 subjects. The average recognition accuracies of 10-fold cross-validation are 92.50% on Dataset 3b and 88.13% on Dataset 2b, gaining at least 1.43% and 3.45% improvement, respectively, compared to existing methods. The experimental results show that IL-MD can extract more accurate features with relatively low computational cost, and it also has better feature visualization and self-adaptive characteristics with respect to subjects. The *t*-test results and Kappa values suggest the proposed feature extraction method reaches statistical significance and has high consistency in classification.

#### 1. Introduction

Brain-computer interface (BCI) system-based rehabilitation therapy aims to help disabled people control their impaired limbs through external devices and ultimately repair their damaged nerve pathways [1–3]. The key point of a BCI system is pattern recognition for motor imagery electroencephalography (MI-EEG) signals [4]. MI-EEG not only contains huge amounts of physiological information but also has a close correlation with the state of consciousness. Therefore, to ensure the accuracy of pattern recognition, it is very important to extract as many separable features as possible. In addition, practical applicability and the time consumption involved are other significant factors to consider [5].

MI-EEG is a complex nonlinear, time-varying, and nonstationary biological signal with high-dimensional characteristics. The high dimensionality of MI-EEG increases the difficulty of feature extraction and further affects the accuracy of pattern recognition. To solve the high-dimensional problems of MI-EEG, earlier researchers adopted dimension reduction methods from machine learning, such as principal component analysis (PCA), independent component analysis (ICA), and methods based on these. PCA replaces the original features with a smaller number of features; the new features are linear combinations of the old features, which maximize the sample variance and are mutually uncorrelated [6]. ICA, which usually involves PCA as a preprocessing step, is expected to decompose a signal into linear combinations of several statistically independent components [7]. These methods are easy to implement, but they share an obvious weakness: they lose important information because they ignore the nonlinear characteristics of MI-EEG [8]. Manifold learning (ML) provides a better way to extract the features of MI-EEG. ML can recover the structure of lower-dimensional manifolds from high-dimensional data and can help us obtain the corresponding nonlinear embedded coordinates, which are regarded as a meaningful reduced-dimension representation of the data [9]. According to the relation between data points that is preserved before and after dimension reduction, ML methods are divided into two types, global approaches and local approaches. The global approach is represented by isometric mapping (ISOMAP), and the local approach is represented by locally linear embedding (LLE) [10]. These two algorithms are the earliest proposed ML algorithms, and they have been applied to the feature extraction of MI-EEG.
Krivov and Belyaev [11] employed ISOMAP to preserve the geodesic distance between covariance matrices to achieve dimension reduction; on a public dataset, the classification accuracy was at the same level as that of the common spatial pattern (CSP) algorithm. Lee et al. [12] compared the feature extraction effects of PCA, ISOMAP, and LLE and concluded that ISOMAP is better than LLE, although a lot of information is lost. From another perspective, local approaches such as LLE are greatly affected by data noise, which means that, when we use a local approach to extract the nonlinear features of MI-EEG, the noise will distort the nonlinear structure and further affect the classification accuracy. To overcome these limitations of ISOMAP and LLE, Weinberger and Saul [13] proposed a novel ML algorithm called maximum variance unfolding (MVU), which is based on semidefinite programming. MVU maximizes the Euclidean distances between data points on the premise that the distances within the neighborhood graph remain unchanged. It can detect the correct underlying dimensionality of the inputs and preserves information on both local angles and distances. In addition, Weinberger and Saul [14] emphasized that MVU can be adapted to noisy data or other particular applications by relaxing the distance-preserving constraints. However, the key step of MVU is solving a semidefinite program, so it cannot process huge datasets. In 2005, Weinberger et al. [15] developed an improved MVU algorithm called landmark MVU (L-MVU), based on semidefinite programming and kernel matrix factorization, which makes it possible to process huge datasets.
Nevertheless, L-MVU also has a limitation: the whole training set must be employed to reproduce the low-dimensional representation of each new data point, which causes excessive time consumption and further hinders online application. Therefore, to overcome this shortcoming, a novel algorithm called the incremental version of L-MVU (denoted as IL-MVU), inspired by the incremental versions of other ML algorithms [16–19], is presented.

However, merely extracting nonlinear features does not capture all of the information in MI-EEG. As is well known, MI-EEG has a clear time-frequency characteristic, and many earlier researchers obtained good results simply by extracting the time-frequency information. The wavelet transform (WT) was proposed to effectively obtain the time-frequency information of signals. The traditional WT is a continuous wavelet transform; however, limited by its huge computational cost, researchers usually employ the discrete wavelet transform (DWT), which is convenient for computer calculation because it discretizes the scale and shift parameters of the continuous wavelet transform. Imran et al. [20] used DWT to extract the statistical features of MI-EEG and then employed PCA to reduce the dimension of the proposed feature vector; a k-nearest neighbor (KNN) classifier was employed to classify the features, and the average recognition accuracy was 78.26%. Even though the DWT is an efficient computational algorithm, it suffers from a few intertwined shortcomings. For example, substantial artifacts are introduced into the DWT-based reconstructed signal. The dual-tree complex wavelet transform (DTCWT), a relatively recent enhancement of the DWT, overcomes some of these deficiencies [21]. DTCWT employs two real DWTs, which construct the real and imaginary parts of the transform and show good information complementarity, thereby reducing the substantial aliasing of DWT. Minmin et al. [22] demonstrated the defect of aliasing; they then employed DTCWT and particle swarm optimization (PSO) to extract the features of MI-EEG, and the accuracy on the testing set reached 90%. Meng et al. [23] proposed a feature extraction method that combines DTCWT and the sample entropy; on Dataset 1 of BCI Competition IV, the average classification accuracy for the four subjects is 87.25%.
Bashar et al. [24] used DTCWT to extract the energy of the coefficients from the motor-imagery-relevant bands as features, and the classification accuracy reached 91.07% with a KNN classifier. From the aforementioned literature, we find that more and more researchers have started to employ DTCWT to extract the time-frequency features of MI-EEG.

In this paper, an incremental version of the L-MVU algorithm, called IL-MVU, is presented to reduce the time consumption during the testing stage, and it is combined with DTCWT, thus forming a novel hybrid feature extraction method of MI-EEG (named IL-MD). The DTCWT is used to reconstruct the MI-EEG in every subband, and the normalized energy features of the subband signals that correspond to the μ wave and the β wave are calculated as the time-frequency features of MI-EEG. In the meantime, IL-MVU is executed to obtain the nonlinear features of the specific subband signals with an obvious event-related synchronization (ERS)/event-related desynchronization (ERD) phenomenon. Finally, we perform feature fusion on the above two types of features. IL-MD not only guarantees recognition accuracy but also meets the requirements of the online BCI system.

The remainder of the paper proceeds as follows: section 2 introduces the basic theory of the DTCWT and L-MVU algorithms. In the following section, the IL-MVU algorithm and the feature extraction method based on DTCWT and IL-MVU are introduced in detail. In section 4, the experimental steps of IL-MD are shown in detail on BCI Competition 2003 Dataset 3b. The experimental results on the two mentioned datasets and the discussion are presented in section 5. Finally, section 6 concludes the paper and discusses prospects for future work.

#### 2. Preliminary

##### 2.1. Dual-Tree Complex Wavelet Transform

The decomposition of a signal with DWT will produce some frequency components that we do not expect to obtain because the low-pass and high-pass filters are not ideal filters. In DTCWT, two real DWTs are employed to give the real and imaginary parts of the transform, and the low-pass filters of the two real DWTs should satisfy a very simple property: one should be approximately a half-sample shift of the other. In addition, DTCWT requires the first level of the dual-tree filter bank (FB) to be different from the succeeding levels [25]. More details about the decomposition and reconstruction of DTCWT can be found in the Appendix.

##### 2.2. Landmark Maximum Variance Unfolding

L-MVU was proposed to resolve the high time-consumption problem of MVU by choosing landmarks [15]. It reformulates the semidefinite programming (SDP) in terms of the smaller matrix of inner products between randomly chosen landmarks. It has already been applied to dimension reduction [26] and to the feature extraction of MI-EEG [27]. Assume that the dataset $X = \{x_i\}_{i=1}^{n}$ contains the high-dimensional samples $x_i \in \mathbb{R}^{D}$, where *D* denotes the dimension of the samples and *n* is the number of samples in *X*. The free parameters of L-MVU are the number of nearest neighbors *r* used to derive locally linear reconstructions, the number of landmarks *m*, the intrinsic dimension of the dataset *d*, and the number of nearest neighbors *k* used to generate distance constraints in the SDP. Based on these parameters, the steps of L-MVU are as follows.

Reconstruct each $x_i$ by a weighted sum of its *r* nearest neighbors, for the *r* we have set above. The reconstruction weights can be obtained by minimizing the error function

$$\varepsilon(W) = \sum_{i=1}^{n} \Big\| x_i - \sum_{j=1}^{n} W_{ij}\, x_j \Big\|^{2},$$

where $\sum_{j} W_{ij} = 1$, and $W_{ij} = 0$ if $x_j$ is not among the *r* nearest neighbors of $x_i$.
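As a rough illustration, this constrained least-squares problem can be solved neighborhood by neighborhood, as in LLE. The following is a minimal NumPy sketch; the function and variable names are ours, and the small regularizer `reg` is our addition for numerical stability, not part of the original formulation.

```python
import numpy as np

def reconstruction_weights(X, r, reg=1e-3):
    """Reconstruct each row x_i of X from its r nearest neighbors with
    weights that sum to one (LLE-style constrained least squares).
    `reg` is a small stabilizer we added, not from the paper."""
    n = X.shape[0]
    W = np.zeros((n, n))
    # pairwise squared Euclidean distances, used only to pick neighbors
    D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    for i in range(n):
        idx = np.argsort(D2[i])[1:r + 1]     # r nearest neighbors, skipping x_i itself
        Z = X[idx] - X[i]                    # neighbors shifted so x_i is the origin
        G = Z @ Z.T                          # local Gram matrix
        G += reg * np.eye(r) * (np.trace(G) + 1e-12)
        w = np.linalg.solve(G, np.ones(r))   # minimize ||w @ Z||^2 s.t. sum(w) = 1
        W[i, idx] = w / w.sum()              # enforce the sum-to-one constraint
    return W
```

Each row of the returned matrix has at most *r* nonzero entries and sums to one, matching the constraints of the error function above.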

Choose the first *m* samples of *X* as landmarks and compute the linear transformation *Q*. First, define the matrix $\Phi = (I_n - W)^{\top}(I_n - W)$, where $I_n$ is the *n* × *n* identity matrix. Then, partition $\Phi$ into blocks to distinguish the *m* landmarks from the other samples, as follows:

$$\Phi = \begin{bmatrix} \Phi_{ll} & \Phi_{lu} \\ \Phi_{ul} & \Phi_{uu} \end{bmatrix},$$

where $\Phi_{ll}$ is the *m* × *m* submatrix of $\Phi$ and $\Phi_{uu}$ is the (*n* − *m*) × (*n* − *m*) submatrix of $\Phi$. Based on this partition, the linear transformation *Q* is computed as follows:

$$Q = \begin{bmatrix} I_m \\ -\Phi_{uu}^{-1}\Phi_{ul} \end{bmatrix}.$$
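This step can be sketched in a few lines of NumPy (names are ours; the sketch assumes the first *m* rows of *W* correspond to the landmarks, as in the step above):

```python
import numpy as np

def landmark_transform(W, m):
    """Form Phi = (I - W)^T (I - W), partition it around the first m
    (landmark) samples, and return the n x m linear transformation
    Q = [ I_m ; -Phi_uu^{-1} Phi_ul ]."""
    n = W.shape[0]
    IW = np.eye(n) - W
    Phi = IW.T @ IW
    Phi_ul = Phi[m:, :m]   # couples non-landmarks (rows) to landmarks (columns)
    Phi_uu = Phi[m:, m:]   # block among the n - m non-landmarks
    lower = -np.linalg.solve(Phi_uu, Phi_ul)
    return np.vstack([np.eye(m), lower])
```

By construction, the bottom block of *Q* satisfies $\Phi_{ul} + \Phi_{uu}(-\Phi_{uu}^{-1}\Phi_{ul}) = 0$, i.e., the non-landmark rows of $\Phi Q$ vanish.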

Solve the SDP for the landmark kernel matrix *L* (*m* × *m*), which plays the role of the kernel matrix *K* in MVU. The SDP is expressed as follows:

$$\begin{aligned} \max_{L} \quad & \operatorname{tr}\!\big(QLQ^{\top}\big) \\ \text{s.t.} \quad & L \succeq 0, \qquad \sum_{i,j} \big(QLQ^{\top}\big)_{ij} = 0, \\ & \big(QLQ^{\top}\big)_{ii} - 2\big(QLQ^{\top}\big)_{ij} + \big(QLQ^{\top}\big)_{jj} \le \|x_i - x_j\|^{2} \quad \text{if } \eta_{ij} = 1, \end{aligned}$$

where $\eta_{ij} = 1$ denotes that $x_j$ is among the *k* nearest neighbors of $x_i$, with the *k* set earlier in this paper.

Produce the low-dimensional representation of the landmarks. First, we perform the eigendecomposition of the matrix *L* to get its eigenvalues and eigenvectors. Then, the *a*-th component of the *i*-th landmark can be calculated as follows:

$$\ell_{ia} = \sqrt{\lambda_a}\, v_{ai}, \quad a = 1, \dots, d,$$

where $\lambda_a$ denotes the *a*-th largest eigenvalue of matrix *L* and $v_{ai}$ denotes the *i*-th element of the corresponding eigenvector $v_a$.

Produce the low-dimensional representation of the samples that are not selected as landmarks. Collecting the landmark coordinates into the *m* × *d* matrix $\hat{L} = [\ell_1, \dots, \ell_m]^{\top}$, the low-dimensional samples are reconstructed as follows:

$$Y = Q\hat{L}.$$
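The last two steps, recovering landmark coordinates from the solved kernel and mapping them through *Q* to the remaining samples, can be sketched as follows. This is a minimal NumPy sketch under the assumption that the SDP has already been solved for the landmark kernel; the names are ours.

```python
import numpy as np

def embed_from_landmark_kernel(L_kernel, Q, d):
    """Given a solved m x m landmark kernel and the n x m transformation Q,
    recover d-dimensional coordinates: landmark coordinates come from the
    top d eigenpairs of the kernel, and all samples are mapped through Q."""
    vals, vecs = np.linalg.eigh(L_kernel)   # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:d]        # indices of the d largest eigenvalues
    # ell[i, a] = sqrt(lambda_a) * v_a[i]; clip guards tiny negative eigenvalues
    ell = vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))
    return Q @ ell                          # n x d coordinates for all samples
```

When *d* equals *m*, the landmark coordinates reproduce the kernel exactly ($\hat{L}\hat{L}^{\top} = L$); in practice *d* ≪ *m*, and the map discards the small eigendirections.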

So far, we obtain the low-dimensional representation $y_i \in \mathbb{R}^{d}$ of all samples. In addition, the low-dimensional dataset is denoted as $Y = \{y_i\}_{i=1}^{n}$.

#### 3. Methods

##### 3.1. Incremental L-MVU

Motivated by the fact that L-MVU cannot meet the time-consumption requirements when processing out-of-sample data, we propose the incremental version of L-MVU based on its basic framework, which significantly reduces the time of the feature extraction procedure.

Assume that the dataset $X = \{x_i\}_{i=1}^{n}$ is the training set, whose high-dimensional samples $x_i \in \mathbb{R}^{D}$ are regarded as the training samples. The free parameters *r*, *m*, *d*, and *k* are set as described in section 2.2. In addition, a new parameter $r_{\text{new}}$ denotes the number of nearest neighbors used in the incremental step, and $x_{\text{new}}$ denotes a point outside the dataset *X*. Based on the above settings, IL-MVU is divided into the training and testing parts as follows.

During the training part of IL-MVU, the low-dimensional representation of *X*, which is denoted as $Y = \{y_i\}_{i=1}^{n}$, is produced by the L-MVU algorithm. It is worth noting that the datasets *X* and *Y* are kept in memory so that they can be used in the testing part of IL-MVU.

During the testing part of IL-MVU, the new sample $x_{\text{new}}$ is embedded by reconstructing it in the low-dimensional space from its $r_{\text{new}}$ nearest neighbors. First, we find the $r_{\text{new}}$ nearest neighbors of $x_{\text{new}}$ in dataset *X* and define the neighbor set as *Ns*. Then, we compute the incremental reconstruction weights *IW* by minimizing the function

$$\varepsilon(IW) = \Big\| x_{\text{new}} - \sum_{j \in Ns} IW_j\, x_j \Big\|^{2},$$

where $\sum_{j \in Ns} IW_j = 1$.

Finally, the low-dimensional representation of $x_{\text{new}}$ can be calculated by using the low-dimensional representations of its $r_{\text{new}}$ nearest neighbors and the reconstruction weights *IW*, as shown in the following:

$$y_{\text{new}} = \sum_{j \in Ns} IW_j\, y_j.$$
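The testing part described above can be sketched end to end in NumPy. This is a minimal sketch under our own naming (`il_mvu_embed`, `r_new`); the small regularizer `reg` is our addition for numerical stability, not part of the paper's formulation.

```python
import numpy as np

def il_mvu_embed(x_new, X, Y, r_new, reg=1e-3):
    """Embed an out-of-sample point: find its r_new nearest neighbors in the
    stored training set X, solve for sum-to-one reconstruction weights IW in
    the high-dimensional space, and apply the same weights to the stored
    low-dimensional coordinates Y."""
    d2 = np.sum((X - x_new) ** 2, axis=1)
    idx = np.argsort(d2)[:r_new]             # neighbor set Ns
    Z = X[idx] - x_new                       # neighbors shifted to the new point
    G = Z @ Z.T                              # local Gram matrix
    G += reg * np.eye(r_new) * (np.trace(G) + 1e-12)
    iw = np.linalg.solve(G, np.ones(r_new))
    iw /= iw.sum()                           # enforce sum(IW) = 1
    return iw @ Y[idx]                       # y_new = sum_j IW_j * y_j
```

Each out-of-sample embedding costs one neighbor search plus one $r_{\text{new}} \times r_{\text{new}}$ linear solve over the stored *X* and *Y*, rather than re-running the full SDP, which is the source of the testing-stage speedup.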

##### 3.2. Feature Extraction Method Based on DTCWT and IL-MVU

In this section, a novel feature extraction method called IL-MD is shown in detail. The flow chart of this method is shown in Figure 1.