Research Article  Open Access
An Incremental Version of LMVU for the Feature Extraction of MIEEG
Abstract
Due to the nonlinear and highdimensional characteristics of motor imagery electroencephalography (MIEEG), it can be challenging to get high online accuracy. As a nonlinear dimension reduction method, landmark maximum variance unfolding (LMVU) can completely retain the nonlinear features of MIEEG. However, LMVU still requires considerable computation costs for outofsample data. An incremental version of LMVU (denoted as ILMVU) is proposed in this paper. The lowdimensional representation of the training data is generated by LMVU. For each outofsample data, its nearest neighbors will be found in the highdimensional training samples and the corresponding reconstruction weight matrix be calculated to generate its lowdimensional representation as well. ILMVU is further combined with the dualtree complex wavelet transform (DTCWT), which develops a hybrid feature extraction method (named as ILMD). ILMVU is applied to extract the nonlinear features of the specific subband signals, which are reconstructed by DTCWT and have the obvious eventrelated synchronization/eventrelated desynchronization phenomenon. The average energy features of and waves are calculated simultaneously. The two types of features are fused and are evaluated by a linear discriminant analysis classifier. Based on the two public datasets with 12 subjects, extensive experiments were conducted. The average recognition accuracies of 10fold crossvalidation are 92.50% on Dataset 3b and 88.13% on Dataset 2b, and they gain at least 1.43% and 3.45% improvement, respectively, compared to existing methods. The experimental results show that ILMD can extract more accurate features with relatively lower consumption cost, and it also has better feature visualization and selfadaptive characteristics to subjects. The ttest results and Kappa values suggest the proposed feature extraction method reaches statistical significance and has high consistency in classification.
1. Introduction
Braincomputer interface (BCI) systembased rehabilitation therapy aims to help disabled people control their injured limbs by external devices and ultimately repairs their damaged nerve pathways [1–3]. The key point of the BCI system is pattern recognition for motor imagery electroencephalography (MIEEG) signals [4]. MIEEG not only contains huge amounts of physiological information but also has a close correlation with the state of consciousness. Therefore, to ensure the accuracy of pattern recognition, it is very important to extract as many separable features as possible. In addition, practical application and the time consumption involved are other significant factors to consider [5].
MIEEG is a complex nonlinear timevarying and nonstationary biological signal with highdimensional characteristics. The high dimension of the MIEEG will raise the difficulty of feature extraction and have a further impact on the accuracy of pattern recognition. To solve the highdimensional problems of MIEEG, earlier researchers adopted dimension reduction methods in machine learning, such as principal component analysis (PCA), independent component analysis (ICA), and methods based on these. PCA replaced the original features with a smaller number of features. The new features were linear combinations of the old features, which maximized the sample variance and made the new features irrelevant to each other [6]. ICA, which usually involves PCA as its preprocess, is expected to decompose a signal into linear combinations of several statistically independent components [7]. These methods are easy to implement, and their weakness is obvious at the same time. The main weakness is that these methods will lose some important information due to ignoring the nonlinear characteristic of MIEEG [8]. Manifold learning (ML) provides a better way to extract the feature of MIEEG. ML can recover the structure of lowerdimensional manifolds from highdimensional data and can help us to obtain the corresponding nonlinear embedded coordinates that are regarded as a meaningful representation of the reduced dimension of data [9]. According to the preserved relation between data points before and after dimension reduction, the methods of ML are divided into two types, the global approach and the local approach. The global approach is represented by isometric mapping (ISOMAP), and the local approach is represented by locally linear embedding (LLE) [10]. These two algorithms are the earliest proposed ML algorithms. They have been applied to the feature extraction of MIEEG. Krivov and Belyaev [11] employed ISOMAP to preserve the geodesic distance of the covariance matrices to achieve dimension reduction. For the public dataset, classification accuracy is at the same level as the common spatial pattern (CSP) algorithm. Lee et al. [12] compared the effect of the feature extractions of PCA, ISOMAP, and LLE with each other and concluded that ISOMAP is better than LLE, although a lot of information is lost. From another perspective, the local approach, such as LLE, is greatly affected by the data noise, which means that, when we use the local approach to extract the nonlinear feature of MIEEG, the data noise will affect the nonlinear structure and will further affect the classification accuracy. To overcome these limitations of ISOMAP and LLE, Weinberger and Saul [13] proposed a novel ML algorithm called maximum variance unfolding (MVU), which is based on semidefinite programming. MVU is used to maximize the Euclidean distance between data points on the premise that keep the distance in the neighborhood graph unchanging. It can detect the correct underlying dimensionality of the inputs and preserves information on both local angles and distances. In addition, Weinberger and Saul [14] emphasized that MVU is adapted to the data with noise or other particular applications by relaxing the distancepreserving constraints. However, the key step of MVU is to solve a semidefinition program, and it cannot process the huge dataset. In 2005, Weinberger et al. [15] developed an improved MVU algorithm called landmark MVU (LMVU) to make it possible to process the huge dataset, which is based on semidefinite programming and kernel matrix factorization. Nevertheless, LMVU also has a limitation in which we must employ the whole train data to reproduce the new lowdimension data points if we want to obtain the lowdimension data of new data points, which causes the excessive time consumption and further affects the implementation of the online application. Therefore, to overcome this shortcoming, a novel algorithm called incremental version of LMVU (denoted as ILMVU), which was inspired by the incremental algorithm of other ML algorithms, is presented [16–19].
However, merely extracting nonlinear features does not represent all of the information of MIEEG. As we all know, MIEEG has a clear timefrequency characteristic, and many earlier researchers obtained better results simply by extracting the timefrequency information. Wavelet transform (WT) was proposed to effectively obtain the timefrequency information of signals. The traditional WT is a continuous wavelet transform. However, researchers who are limited by the huge computation cost of WT usually employ the discrete wavelet transform (DWT), which is convenient for the computer calculations as it discretizes the scale and shift parameter of the continuous wavelet transform. Imran et al. [20] used DWT to extract the statistical features of MIEEG and then employed PCA to reduce the dimension of the proposed feature vector. The knearest neighbor (KNN) classifier was employed to classify the features, and the average recognition accuracy was 78.26%. Even though the DWT is an efficient computational algorithm, it also suffers from a few intertwined shortcomings. For example, substantial artifacts were involved in the DWTbased reconstructed signal. The dualtree complex wavelet transform (DTCWT), which overcame some deficiencies of the DWT, is a relatively recent enhancement of the DWT [21]. The real part and the imaginary part of DTCWT showed good information complementarity, which reduced the substantial aliasing of DWT. Minmin et al. [22] demonstrated the defect of aliasing. After that, they employed DTCWT and particle swarm optimization (PSO) to extract the feature of MIEEG. The accuracy on the testing set reached 90%. DTCWT employs two real DWTs, which construct the real and imaginary parts of the transform and are the enhancements of DWT. Meng et al. [23] proposed a feature extraction method that combines DTCWT and the sample entropy. On the Dataset 1 of BCI Competition IV, the average classification accuracy rate of the four subjects is 87.25%. Bashar et al. [24] used DTCWT to extract the energy of coefficients as a feature from the relevant bands of motor imaginary, and the classification accuracy reached 91.07% with KNN classifier. From the aforementioned literates, we find that more researchers start to employ DTCWT to extract the timefrequency feature of MIEEG.
In this paper, an incremental version of the LMVU algorithm, called ILMVU, is presented to reduce the time consumption during the testing stage, and it is combined with DTCWT, thus forming a novel hybrid feature extraction method of MIEEG (named as ILMD). The DTCWT is used to reconstruct the MIEEG with every subband, and the normalization energy features of the subband signal that corresponds to the wave and the wave are calculated as the timefrequency feature of MIEEG. In the meantime, ILMVU is executed to obtain the nonlinear feature of specific subband signals with obvious eventrelated synchronization (ERS)/eventrelated desynchronization (ERD) phenomenon. Finally, we perform feature fusion for the above two types of features. ILMD not only guarantees recognition accuracy but also meets the requirements of the online BCI system.
The remainder of the paper proceeds as follows: section 2 introduces the basic theory of the DTCWT and LMVU algorithm. In the following section, the ILMVU algorithm and the feature extraction method based on DTCWT and ILMVU are introduced in detail. In section 4, the experimental steps of ILMD are shown in details on BCI Competition 2003 Dataset 3b. The experimental results on two mentioned datasets and the discussion are shown in section 5. Finally, section 6 concludes the paper and the prospects of the future work.
2. Preliminary
2.1. DualTree Complex Wavelet Transform
The decomposition of a signal with DWT will produce some frequency components that we do not expect to obtain because the lowpass and highpass filters are not the ideal filters. In DTCWT, two real DWTs are employed to give the real and imaginary parts of the transform, and the lowpass filters of the two real DWTs should satisfy a very simple property: one should be approximately a halfsample shift of the other. In addition, DTCWT requires the first level of dualtree filter bank (FB) to be different from the succeeding levels [25]. More details about the decomposition and reconstruction of DTCWT can be seen in Appendix.
2.2. Landmark Maximum Variance Unfolding
LMVU was proposed to resolve the high timeconsumption problem by choosing landmarks [15]. It uses the smaller matrix of inner products between randomly chosen landmarks to reformulate the semidefinite programming (SDP). It has already been applied to the dimension reduction [26] and the feature extraction of MIEEG [27]. Assume that the dataset contains the highdimensional samples , where D denotes the dimension of the samples and n is the number of the dataset X. The free parameters of LMVU are the number of nearest neighbors r used to derive locally linear reconstructions, the number of landmarks m, the intrinsic dimension of the dataset d , and the number of nearest neighbors k used to generate distance constraints in the SDP. Based on the parameters we set above, the steps of LMVU are as follows.
Reconstruct each by a weighted sum of its nearest neighbors for r we have set above. The reconstruction weights can be obtained by minimizing the error function:where , and if is not the rnearest neighbor of .
Choose first m sample of X as landmarks and compute the linear transformation Q. First, we define the matrix , and is the n × n identity matrix. Then, partition the into blocks to distinguish the m landmarks from other samples, as follows:where is the m × m submatrix of and is the (n − m) × (n − m) submatrix of . Based on formula (10), the linear transformation Q was computed as follows:
Solve the SDP for the landmark kernel matrix L (m × m), which is the submatrix of the kernel matrix K in MVU. The SDP is expressed as follows.
Maximize trace () subject towhere denotes whether sample and is the knearest neighbor and the k has set earlier in this paper.
Produce the lowdimensional representation of the landmarks. First, we perform the eigendecomposition for the matrix L to get eigenvalues and eigenvectors. Then, the element of the landmarks can be calculated as follows:where denotes the eigenvalues of matrix L and denotes the element of the eigenvector.
Produce the lowdimensional representation of the samples that are not selected as landmarks. These lowdimensional samples are reconstructed as follows:
So far, we obtain the lowdimensional representation of all samples . In addition, the lowdimensional dataset is denoted as .
3. Methods
3.1. Incremental LMVU
Inspired by the instinct that LMVU cannot meet the timeconsumption requirements when processing outofsample data, we proposed the incremental version of LMVU based on its basic framework, which significantly reduces the time of the feature extraction procedure.
Assume that the dataset is the training set. The highdimensional samples , which are regarded as the training sample, are contained in X. The free parameters r, m, d, and k are set same, as described in section 2.2. In addition, there is a new parameter that denotes the number of incremental nearest neighbors. In addition, the denotes the points out of the dataset X. Based on the above settings, ILMVU is divided into the training and testing parts as follows.
During the training part of ILMVU, the lowdimensional representation of , which is denoted as , is produced by the LMVU algorithm. In addition, the lowdimensional dataset is denoted as . It is worth noting that the datasets X and Y are kept in memory so that it can be used in the testing part of ILMVU.
During the testing part of ILMVU, the new sample is put into the dataset X and employed its nearest neighbors to reconstruct it in the lowdimensional space. First, we find the nearest neighbors of in dataset X and define the neighbors set as Ns. Then, we compute the incremental reconstructed weight by minimizing the function:where .
Finally, the lowdimensional representation of can be calculated by using the lowdimensional representation of its nearest neighbors and reconstructed weight IW, which is shown in the following:
3.2. Feature Extraction Method Based on DTCWT and ILMVU
In this section, a novel feature extraction method called ILMD is shown in detail. The flow chart of this method is shown in Figure 1.
This method is roughly divided into five steps: signal preprocessing; signal decomposition and reconstruction based on DTCWT; average energy feature extraction; nonlinear feature extraction based on ILMVU; and feature fusion.
3.2.1. Signal Preprocessing
According to the characteristics of the MIEEG signal, we know that the wave and wave include features with relatively obvious information. In addition, the ERS/ERD phenomenon is most obvious in the C3 and C4 channels [28, 29]. Therefore, according to the average power spectrum analysis of MIEEG of the C3 and C4 channels, we can obtain the optimal time block. The average power of the ch channel can be calculated as follows:where N denotes the number of the trials and represents the sample point of trial in the ch channel.
Based on this, the average power spectrum of imagine left and right hands movements is drawn. In addition, the optimal time block [min, max] with the most obvious ERS/ERD phenomenon is selected according to the average power spectrum and is denoted as OT.
3.2.2. Signal Decomposition and Reconstruction Based on DTCWT
As mentioned in Section 2.1 and Appendix, a signal via the Jlevels DTCWT decomposition obtains its complex wavelet coefficient and its complex scale coefficient , which are calculated in formula (19)–(26).
With the coefficients that we obtained below, we can reconstruct the signal by using formula (27). Finally, by setting the wavelet coefficients of other levels to zero, we can obtain a signal in the specific frequency band. If the sampling frequency of the signal is and we perform JLevel DTCWT reconstruction to the signal, we will obtain the subband signals , and the corresponding frequency band ranges are .
3.2.3. Calculation of Average Energy Feature
As some former researchers have demonstrated, most of the motor imaginerelated information is contained in the wave (8–13 Hz, namely, rhythm) and the wave (13–30 Hz, namely, rhythm) [28]. For general reasons, the timefrequency feature extraction of this paper is in the two mentioned waves.
First, as for the subband signals obtained from section 3.2.2, we selected the data in the optimal time block OT, which were recorded as . Then, we set a sliding time window of length 2 . We can calculate the energy within the sliding time windows as follows:
After that, we choose signals whose frequency band is close to the wave and the wave, whose energy are recorded as , , , and . The normalization energy is computed as follows:
The normalization energy of two subband signals in the sliding time window is calculated by sliding one sample point at a time. From above, we can obtain four energy sequences of length , which are expressed as , , , and . Then, we obtain the , , , and by calculating the average value of each energy sequence.
Finally, according to the ERS/ERD phenomenon, the average energy difference between the C3 conductor and the C4 conductor in the same wave of signal can be calculated as follows:
As for each trial of motor imagined, we can obtain a 2dimensional average energy feature , which is shown as follows:
3.2.4. Nonlinear Feature Extraction Based on ILMVU
As for each subband signal obtained from section 3.2.3, we drew the average power spectrum, as shown in section 3.2.1. Then, we obtain and , which has the most obvious ERS/ERD phenomenon in OT. To obtain the features that are more conducive to the classification, ILMD does the following with and :where denotes the sample point in motor imagine task of . Then, we obtain the initial highdimensional feature of motor imagine task as follows:
After that, we construct the highdimensional training feature set as the input of ILMVU. Then, we set the initial parameters of ILMVU, which is mentioned in section 3.1.
As for the samples in Tr, we obtain their ddimensional feature by calculating formula (1)–(6). And for the sample out of Tr, we should execute the training part of ILMVU first and then calculate formula (9)(10) to obtain its ddimensional feature. Therefore, for any given motor imagine task, a ddimensional nonlinear feature can be obtained, and it is denoted as .
3.2.5. Feature Fusion
To make full use of the information obtained from multiple aspects, features and are integrated through a serial port connection. However, the order of magnitude difference between and is too large to affect classification. So, as to reduce the influence of the order difference on the classification, we made 100 times bigger than before. Eventually, we obtain the features as follows:
To verify the effectiveness of ILMD, the linear district analysis (LDA) classifier is selected to classify the features F.
4. Experimental Research
4.1. Dataset Description
To increase the persuasiveness of ILMD, we verify ILMD in two public datasets.
The first dataset is from BCI Competition 2003 Dataset 3b provided by BCI Lab, Graz University of Technology [29], hereinafter referred to as Dataset 3b. This dataset was composed of 280 trials from 3 subjects, of which 140 were training and 140 were used to test images of left/right hand movements. The sequence diagram of each trial is shown in Figure 2. The signal was sampled at 128 Hz. The MIEEG channels were measured over C3, Cz, and C4 conductors, using AgCl as an electrode. The electrode placement is shown in Figure 3. The placement of the electrode obeys the 10–20 electrode system.
The other dataset is from BCI Competition 2008 Datasets 2b provided by BCI Lab, Graz University of Technology [30], hereinafter referred to as Dataset 2b. This dataset consists of EEG data from 9 subjects, namely, B01–B09. For each subject, five sessions are provided, whereby the first two sessions were recorded without feedback, and the last three sessions were recorded with feedback. The MI tasks are same as the former dataset. For first two sessions, each session consisted of six runs with ten trials. The time schedule is shown in Figure 4(a), in which each trial has a short break of at least 1.5 seconds in the end. For the three online feedback sessions, four runs with positive feedback, denoted by a smiley symbol, were recorded (see Figure 4(b)), whereby each run consisted of twenty trials. Depending on the cue presented from 3 to 7.5 seconds, the subjects were required to move the smiley towards the left or right side by imagining left or right hands movements, respectively. During the feedback period, the smiley changed to green when moved in the correct direction; otherwise, it became red. Three channels of bipolar recording (C3, Cz, and C4) were acquired with a sampling frequency of 250 Hz. The electrode placement is the same as the former dataset, which is shown in Figure 3. The placement of the electrode also obeys the 10–20 electrode system.
(a)
(b)
In this section, the proposed feature extraction method is shown in detail using Dataset 3b. The results of the two mentioned datasets are shown in the “Result and Discussion” section.
4.2. Optimal Time Block Selection
The MI is a process but not a transient result. For this reason, the features of signal in different time blocks also show differences. To obtain a better result of classification, we should select the optimal time block related to the MI task based on the ERD/ERS phenomenon and apply it to the following steps.
Based on the above analysis, the average power of MIEEG over C3 and C4 conductors under two classes of MI tasks was calculated according to formula (9). The average power spectrum of the imagined left and right hands movements is shown in Figure 5.
From Figure 5, it can be seen that the time block [3.5, 7.5] s shows the greatest diversity under the different MI tasks. When the sample frequency of signal is 128 Hz, the sample points corresponding to this time block are denoted as approximately [450, 1000], which correspond to the optimal time block OT, which was defined in section 3.2.1.
4.3. Subband Selection
DTCWT can divide the frequency bands accurately, which is a recent enhancement to DWT. In addition, the MI tasks always exhibit their characteristics over a given frequency band. For this reason, ILMD obtains each subband signal of the original signal with the decomposition and reconstruction of DTCWT. Then, the optimal frequency band will be obtained by plotting the respective average power spectrum and analyzing it based on the ERS/ERD phenomenon.
As mentioned in section 3.2 and in the actual situation of the dataset, the subband signal called , , , , and via 4level DTCWT reconstruction of the original signal will be obtained, and the corresponding frequency band ranges are , , , , and . According to the rule of the wave division of EEG, these subband signals can correspond approximately to wave, wave, wave, wave, and wave. The average power spectrum is shown in Figure 6.
(a)
(b)
(c)
(d)
(e)
It can be seen in Figure 6 that the ERS/ERD phenomenon is reflected in different degrees on each subband signal. For the selected dataset, the phenomenon is barely visible in signal . In signal and signal ( wave), the phenomenon is only apparent in imaging right hand movement task. As for the rest two subband signals, and have the most obvious ERS/ERD phenomenon in the two classes of MI tasks, and is even better than . So, signal is selected for subsequent feature extraction on Dataset 3b. However, if the dataset is changed, the subbands selection results can be changed, namely, the multiwaves, including are all the candidates for different subjects. This reflects the subjectbased characteristics of MIEEG and will be proven in the following experiments.
To find the difference between the subband signals reconstructed by DWT and DTCWT, the average power spectrum of signal is shown in Figure 7. It can be seen clearly from Figure 7 that the signal reconstructed by DTCWT has more obvious ERS/ERD phenomenon than that of DWT. It is because DWT may introduce substantial artifacts in signal reconstruction and cause that the corresponding ERS/ERD phenomenon is not obvious, and further more obvious features cannot be extracted. Figure 7 illustrates that DTCWT is more suitable for the subband reconstruction of MIEEG, thanks to its perfect reconstruction characteristics.
(a)
(b)
4.4. Filter Bank Selection of DTCWT
The FB of DTCWT has several selectivities. And as section 2.1 was introduced, DTCWT requires that the first level of the dualtree FB be different from the succeeding levels. In this paper, dtcwttoolbox 4.3 was used to execute the decomposition and reconstruction of signals obtained from the previous section. This toolbox provides several FBs in the first level and in the succeeding levels of DTCWT. These FBs are shown in detail in Table 1.
 
Note: 6,6 nonzero taps; 10,10 nonzero taps. 
Different reconstructed signals can be obtained by different combinations of these FBs. According to the classification accuracy of these reconstructed signals, the combination of the best FBs is selected for the following steps. The classification accuracy of different combinations with ILMD is shown in Table 2. In Table 2, FB 1 denotes the FB of the first level and FB 2 denotes the FB of the succeeding levels. It can be seen that the combination of Antonini and Qshift_c obtains the highest classification accuracy. Therefore, Antonini and Qshift_c are selected as the FB in the following research.

Based on the description of section 3.2 and the selected FBs, the average energy feature can be calculated by formula (10)–(12). Finally, the 2dimensional energy feature consisting of and can be obtained and then used in the following steps.
4.5. Parameters Optimization of ILMVU
To discover the spatial structure information hidden in the MIEEG, ILMVU algorithm is used to extract the nonlinear feature of subband , and CSDP 6.2.1 solver is used to solve SDP [31].
ILMVU algorithm has five parameters that can be adjusted. They are the number of nearest neighbors r used to derive locally linear reconstructions, the number of landmarks m, the intrinsic dimension of the dataset d, the number of nearest neighbors k used to generate distance constraints in the SDP, and the number of incremental nearest neighbors . To reduce the computational expense, parameter is set to 4. In fact, the experimental result shows that the parameter has almost no effect on accuracy. As for the other four parameters of ILMVU, joint optimization is performed and evaluated by the recognition accuracy. By using the traversing methods, the optimal values of these four parameters were selected. Figure 8 illustrates the final results of parameter optimization (r = 52, m = 14). It should be noticed that the features used to classify are the combination of average energy feature and nonlinear feature .
It can be discovered that the fluctuation range of accuracy is very small when d and k are changed. This proves that the ILMVU algorithm has good robustness. In addition, when we set r to 52, m to 14, d to 5, and k to 24, the classification accuracy reaches its peak.
4.6. Feature Visualization
To observe the separability of features extracted by ILMD more intuitively, the feature visualization is carried out in this section. The feature visualization for the average energy feature is shown in Figure 9. L and R in the legend denote imaging left and right hand movement, respectively.
The horizontal and vertical axes in Figure 9 stand for the average energy difference (mentioned in formula (12) of section 3.2.3) of wave and wave, respectively. Figure 9 illustrates that feature has shown the better separability. It has a positive effect on improving the overall accuracy rate. However, the confusion of the average energy features of the two types of MI tasks still exists. It is indicated that high classification accuracy cannot be achieved by only relying on the average energy features.
Furthermore, the nonlinear feature visualization shown as Figure 10 is illustrated to compare ILMVU with the common ML algorithms. These involved algorithms are ISOMAP [32], LLE [33], Laplacian EIGENMAPS (LE) [34], Landmark ISOMAP (LISOMAP) [32], Local Tangent Space Alignment (LTSA) [35], Hessian LLE(HLLE) [36], MVU, and LMVU.
(a)
(b)
(c)
(d)
(e)
(f)
(g)
(h)
(i)
The visualization results of ISOMAP, LISOMAP, and LE are shown as Figures 10(a)–10(c). It can be seen clearly that these three algorithms are able to classify the MI tasks that are involved in the dataset. Furthermore, LISOMAP is no worse than ISOMAP in visualization. In Figures 10(d)–10(f), the features that are extracted by LLE, LTSA, and HLLE are visualized. These three methods perform dimensionality reduction by retaining local information. It can be simply assumed that these three methods retain the local relation, the first derivative, and the second derivative of local relation, respectively. The visualization results of LTSA and HLLE are not as good as LLE, which proved that it is not necessary to preserve complex local relationships. Figures 10(g)–10(i) are the visualization results of MVU and its extension algorithms. A preliminary conclusion can be drawn that MVUbased algorithms have a better effect on feature visualization compared with other ML algorithms. This is because the MVU or its extension algorithm retains more useful information in reducing dimensions with its strict constraints. Moreover, compared to MVU, LMVU has more obvious clustering distribution. As for ILMVU, its separability is better than that of the other two methods.
5. Results and Discussion
5.1. Results and Discussion for Dataset 3b
To verify the superiority of ILMD, a multiaspect comparison is demonstrated in this section under the same experimental conditions. The experimental environment is the Windows 10 64 bit operating system; the CPU is Intel(R) Xeon(R) E5 2683 v3; the memory is 16 GB; and software is Matlab R2017a. The 10fold crossvalidation (CV) is used in this section to reliability of experimental results. The whole 280 trials were randomly divided into 10 packages (including 28 trials per package). One package is selected as test set every time. The ten results were averaged as the final accuracy.
5.1.1. Comparison of Different Classifiers
In addition to the feature extraction method, the classification accuracy is also affected by the classifier performance. In the following, ILMD is employed to generate the hybrid features, and the commonly used classifiers, including LDA, knearest neighbors (KNN), naïve Bayes (NB), random forest (RF), logistic regression (LR), backpropagation net neural network (BP), and support vector machine (SVM), are applied to classify the features. The 10fold CV experiment results are shown in Figure 11.
It is clear that the classification accuracy of LDA is higher than that of some regression classifiers, such as SVM, BP, and LR, and it also has a slight advantage over RF and KNN. The classification result of NB is worse than most of other classifiers. The classification results show that the features extracted by ILMD do not need too complex classifiers to achieve high classification accuracy and matches best with the LDA classifier. Therefore, the LDA classifier is selected to evaluate the performance of the proposed feature extraction method.
5.1.2. Comparison with Other ML Algorithms
The proposed ILMVU and ILMD will be compared with the other ML methods in testing time and classification accuracy. The testing time of single sample and the 10fold CV classification accuracy are shown in Figure 12. The parameters in the MLbased feature extraction methods have been optimized, and the classifiers are all LDA. In Figure 12, the horizontal axis denotes the single testing time of each ML method and the vertical axis denotes the recognition rates. In order to clearly demonstrate the differences in test time among various methods, the horizontal axis uses logarithmic coordinates. As we all know, an excellent feature extraction method needs to meet the requirements of high precision and low time consumption, namely, the closer to the upper left corner, the better the performance of the method.
From Figure 12, it can be seen clearly that ILMD obtain the highest recognition rate and the lower time consumption compared with other ML algorithms. It is a benefit of the extraction of the hybrid features of MIEEG. Furthermore, more useful information is preserved from the original data with the improved ML algorithm, i.e., ILMVU. On the other hand, the results in Figure 12 are basically consistent with those obtained from Figure 10. Compared with ISOMAP, LISOMAP made great progress in recognition accuracy and time consumption. The reason is that more redundant information is discarded by choosing landmarks in the LISOMAP algorithm. LE obtains the lowest time consumption because this algorithm preserves less information. Therefore, the recognition accuracy of LE is the lowest too. As for the LLEbased algorithms, we can see clearly that the time consumption of LLE, LTSA, and HLLE increases successively. This is because the computation complexity of these three ML algorithms increases successively. However, the recognition accuracy reduces successively, which illustrates that LTSA and HLLE are unsuitable for the feature extraction of MIEEG.
The MVUbased algorithms have higher recognition accuracy. Thanks to the selection of landmarks, LMVU gains a 1% improvement compared to MVU, and the time consumption has reduced rapidly. The proposed ILMVU algorithm, which is based on the practical application, is mapped in the outofsample data directly on the trained manifold according to the principle of local reservation, which greatly reduces the time consumption of feature extraction. Although the time consumption of ILMVU is further reduced to 0.31 s, which has greatly improved the performance of online recognition for MIEEG, it shows some decrease in the recognition rate compared with LMVU. Therefore, it is necessary to analyze the differences of the results between the LMVU and ILMVU. In the following, a twosample ttest is applied to identify whether there is a significant difference when they are used for MIEEG feature extraction. The recognition rate of LMVU and ILMVU with 10 sets of parameters (d, k, r, and m) is used for ttest.
Assume that and represent the mean value of the 10fold CV’s accuracy from LMVU and ILMVU, and stand for the variance, and and denote the number of the results. The ttest statistic can be calculated as follows:
Suppose that the null hypothesis is , the results of ILMVU and LMVU come from independent random sample from normal distributions with equal means and the alternative hypothesis is , the results of ILMVU and LMVU come from populations with unequal means. The significance level was chosen as . The decision rule is to reject , if
Finally, the value is equal to 0.0135, which is less than 0.05. The null hypothesis is rejected at 0.05 significance level. Therefore, the performance of ILMVU is significantly increased compared to LMVU. Furthermore, combined with DTCWT, ILMD obtains the highest recognition rate of 92.50%.
5.1.3. Comparison with DTCWTBased Methods
The comparison results between ILMD and the methods of MIEEG feature extraction based on DTCWT for the same dataset are shown in Figure 13. The feature extract method DTCWT in Figure 13 only uses the timefrequency feature mentioned in section 3. The results of the comparison methods in other literatures are not the 10fold crossvalidation result.
Figure 13 illustrates that DTCWT combined with ILMVU obtained the highest recognition accuracy. It was proven that these two algorithms have good complementarity when applied to the feature extraction for MIEEG. In addition, the results of the other DTCWTbased feature extraction methods are generally higher. The reason is that the frequency band can be divided more precisely by employing DTCWT, which is consistent with the frequency characteristic of MIEEG. Take the result of Figure 12 into consideration, we find that using DTCWT or ILMVU alone cannot reach the recognition rate above 90%. However, when these two methods are combined with the recognition rate reaches 92.50%, which is attributed to the complementarity of DTCWT and ILMVU.
5.2. Result and Discussion on Dataset 2b
To verify the selfadaptive characteristic of ILMD, we extend this method to the multisubject dataset mentioned in section 4.1. Under the same experimental procedure as above, the subjectoptimized performance is shown as Table 3. It is worth noting that the training and testing sets in this section are divided the same way as [37]. The training sets of every subject are shown in Table 3. Session 4 and session 5 of each subject are selected as their testing set. And the optimal time block is chosen from 3.5 s to 7 s for all subjects. The comparison with waveletbased method is shown in Figures 14 and 15. The results of the waveletbased method are from [37].

From Table 3, wave and/or wave are commonly selected for most of the subjects. That is because they covered the frequency band range of motor imagery. And other bands, including wave or wave, have a little correlation with MI tasks for some subjects, such as B02, B03, and B05, which demonstrates the individuation characteristics of MIEEG.
Therefore, the individualized selection of the parameter of ILMD according to the wave characteristics of the subjects becomes the key to improving the recognition rate of ILMD.
From Figure 14, we can see that the average recognition rate of ILMD is higher than the waveletbased method. In terms of individual subjects, ILMD obtains the better results than that of the waveletbased method for subjects B01, B02, B03, B05, B07, and B09, it is as good as the waveletbased method for B08, and it is slightly lower for B04. The average recognition accuracy of ILMD makes 3.38% improvement over the waveletbased method. In addition, the kappa values shown in Figure 15 indicate that ILMD performs better than the waveletbased method for most subjects. The highest improvement is obtained for subject B03, where the kappa value increased from 0.27 to 0.71. Only B05 had a slightly lower kappa value than the waveletbased method. Although there are large variations in kappa values for different subjects, the average kappa value of ILMD is improved by 0.06 compared to the waveletbased method. It also indicates very good strength of class prediction and suggests that ILMD has high consistency in classification.
ILMD is also compared with the other methods applied to the same dataset. The experiment is finished in the same testing set, and the experiment results are shown in Table 4. The proposed feature extraction method takes an obvious advantage in the average classification accuracy and kappa value over the other methods for nine subjects. This is probably because ILMD can excavate the individual characteristics of the subjects.
In addition, CSP and its extension versions have been applied in feature extraction of MIEEG and have obtained better recognition results [43]. ILMD was further compared with CSPbased methods, including CSP, filter bank CSP (FBCSP), discriminant filter bank CSP (DFBCSP), and frequency domain CSP (FDCSP), and the experimental results on Datasets 2b are presented in Figure 16. The experimental session is selected same as [43]. All the average recognition rates in Figure 16 are 10fold crossvalidation result.
From Figure 16, it can be seen that the improved CSP method outperforms the CSP method. However, the average recognition rate of the nine subjects using the CSP method was about 80%. The average recognition rate of ILMD reaches 88.13%, which is superior to CSPbased methods. In order to further prove the advantage of ILMD over CSPbased method, ttest over ILMD and the best CSPbased method (FDCSP) is added. The process of ttest is similar with section 5.1.2. The significance level was chosen as . After being calculated, the result is equal to 0.0342, which is less than 0.05. The result of ttest illustrates that ILMD has significant advantages compared to FDCSP.
6. Conclusions
Based on the LMVU algorithm and the principle of local reservation, an incremental version of LMVU, i.e., ILMVU, is proposed. It is used for nonlinear dimension reduction of specific subband signals, acquiring the subjectbased nonlinear feature of MIEEG. In addition, DTCWT is employed to extract the normalized average energy feature of subband signals corresponding to wave and wave. The experimental results have an advantage in feature visualization, showing that the two types of features have good separability and an obvious cluster distribution, which results in a relative higher recognition accuracy and lower time consumption. This is helpful for promoting theoretical development of manifold learning and enhancing the adaptive characteristics of feature extraction, as well. However, there is some limitation in ILMD. One is that it cannot be applied to the multichannel signal, and the other is that it is only suited to two motor imaginary tasks. In a future study, we intend to integrate ILMVU with a common spatial pattern (CSP) to improve its property and to develop a broad application in BCI systems. In addition, to design a more stable BCI system, the recognition of EEG generated under high pressure will also be an important aspect of our future attention [44, 45].
Appendix
Decomposition and Reconstruction of DTCWT
Figure 17 shows the decomposition procedures of the DTCWT. As Figure 17 shows, the real component tree is represented by Tree a and the imaginary component tree is represented by Tree b.
The wavelet coefficients and scale coefficients in Tree a and Tree b can be shown as follows:where j is the scale factor and j = 1, …, J, n is the sample number, and s(t) is the signal. , , , and are the functions of the wavelet transform, which can be expressed by dualtree FBs, such as , , , and .
Afterward, the complex wavelet coefficient and the complex scaling coefficient of the dualtree complex wavelet, which formed by its real and imaginary parts as formulas (19)∼(22), can be obtained by the following equations:
The reconstruction procedure of the DTCWT is simple. The wavelet coefficients and the scaling coefficients could be inverted as
And the signal s(t) could be expressed as
Note that we can obtain a signal in each subband easily by setting the wavelet coefficients of other subbands to zero. When we process the nonstationary signal with a timefrequency characteristic, such as MIEEG, we could employ DTCWT to obtain the expected frequency bands accurately and to extract their timefrequency feature.
Data Availability
In this manuscript, previously reported two datasets were used to support this study and are available at http://bbci.de/competition/ii/ and http://www.bbci.de/competition/iii. These datasets are cited at relevant places within the text as references [29, 30].
Conflicts of Interest
The authors declare no conflicts of interest.
Authors’ Contributions
Hongwei Xi conceived the study. Mingai Li and Hongwei Xi conducted the experiments and analyzed the results. Hongwei Xi wrote the manuscript. Mingai Li, Hongwei Xi, and Xiaoqing Zhu helped in revising the paper.
Acknowledgments
This work was financially supported by the National Natural Science Foundation of China (nos. 81471770, 11882003, and 61672070). We would like to thank all of the people who have given us helpful suggestions and advice.
References
 J. R. Wolpaw, “Braincomputer interfaces as new brain output pathways,” Journal of Physiology, vol. 579, no. 3, pp. 613–619, 2007. View at: Publisher Site  Google Scholar
 J. R. Wolpaw, N. Birbaumer, D. J. McFarland, G. Pfurtscheller, and T. M. Vaughan, “Brain‐computer interfaces for communication and control,” Clinical Neurophysiology, vol. 113, no. 6, pp. 767–791, 2002. View at: Publisher Site  Google Scholar
 K. K. Ang and C. Guan, “EEGbased strategies to detect motor imagery for control and rehabilitation,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 25, no. 4, pp. 392–401, 2017. View at: Publisher Site  Google Scholar
 A. Bashashati, M. Fatourechi, R. K. Ward, and G. E. Birch, “A survey of signal processing algorithms in braincomputer interfaces based on electrical brain signals,” Journal of Neural Engineering, vol. 4, no. 2, pp. R32–R57, 2007. View at: Publisher Site  Google Scholar
 B. Blankertz, M. Tangermann, C. Vidaurre et al., “Detecting mental states by machine learning techniques: the Berlin brain–computer interface,” in BrainComputer Interfaces, Springer, Berlin, Heidelberg, Germany, 2009. View at: Google Scholar
 I. Horev, M. F. Yger, and M. Sugiyama, “Geometryaware principal component analysis for symmetric positive definite matrices,” Machine Learning, vol. 106, no. 4, pp. 493–522, 2017. View at: Publisher Site  Google Scholar
 L. Cao, K. S. Chua, W. K. Chong et al., “Support vector machines experts for time series forecasting,” Neurocomputing, vol. 51, no. 12, pp. 321–339, 2003. View at: Publisher Site  Google Scholar
 M. AlKadi, M. Reaz, M. Ali, and C. Liu, “Reduction of the dimensionality of the EEG channels during scoliosis correction surgeries using a wavelet decomposition technique,” Sensors, vol. 14, no. 7, pp. 13046–13069, 2014. View at: Publisher Site  Google Scholar
 H. S. Seung and D. D. Lee, “COGNITION: the manifold ways of perception,” Science, vol. 290, no. 5500, pp. 22682269, 2000. View at: Publisher Site  Google Scholar
 L. V. D. Maaten, E. Postma, and J. V. D. Herik, “Dimensionality reduction: a comparative review,” Review Literature and Arts of the Americas, vol. 10, no. 1, 2009. View at: Google Scholar
 E. Krivov and M. Belyaev, “Dimensionality reduction with isomap algorithm for EEG covariance matrices,” in Proceedings of the International Winter Conference on Braincomputer Interface, Bari, Italy, October 2016. View at: Google Scholar
 F. Lee, R. Scherer, and R. Leeb, “Feature mapping using PCA, locally linear embedding and isometric feature mapping for EEGbased brain computer interface,” in Proceedings of the 28th AAPR Workshop on Digital Imaging in Media and Education, pp. 189–196, Autrian Computer Society, Hagenberg, Austria, June 2004. View at: Publisher Site  Google Scholar
 K. Q. Weinberger and L. K. Saul, “Unsupervised learning of image manifolds by semidefinite programming,” International Journal of Computer Vision, vol. 70, no. 1, pp. 77–90, 2006. View at: Publisher Site  Google Scholar
 K. Q. Weinberger and L. K. Saul, “An introduction to nonlinear dimensionality reduction by maximum variance unfolding,” in Proceedings of the Twenty First National Conference on Artificial Intelligence (AAAI06), pp. 1683–1686, Boston, MA, USA, 2006. View at: Google Scholar
 K. Q. Weinberger, B. D. Packer, and L. K. Saul, “Nonlinear dimensionality reduction by semidefinite programming and kernel matrix factorization,” in Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics (AISTATS05), Z. Ghahramani and R. Cowell, Eds., pp. 381–388, Barbados, West Indies, 2005. View at: Google Scholar
 O. Abdelmannan, A. B. Hamza, and A. Youssef, “Incremental Hessian locally linear embedding algorithm,” in Proceedings of the 2007 9th International Symposium on Signal Processing and Its Applications, pp. 1–4, IEEE, Sharjah, UAE, February 2007. View at: Google Scholar
 O. Kouropteva, M. O. Okun, and M. Pietikäinen, “Incremental locally linear embedding,” Pattern Recognition, vol. 38, no. 10, pp. 1764–1767, 2005. View at: Publisher Site  Google Scholar
 M. H. C. Law and A. K. Jain, “Incremental nonlinear dimensionality reduction by manifold learning,” IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 28, no. 3, pp. 377–391, 2006. View at: Publisher Site  Google Scholar
 O. AbdelMannan, A. B. Hamza, and A. Youssef, “Incremental line tangent space alignment algorithm,” in Proceedings of the Conference on Electrical & Computer Engineering, Taylor & Francis, UK, 2007. View at: Google Scholar
 S. M. Imran, M. T. F. Talukdar, S. K. Sakib et al., “Motor imagery EEG signal classification scheme based on wavelet domain statistical features,” in Proceedings of the International Conference on Electrical Engineering and Information & Communication Technology, pp. 1–4, IEEE, Dhaka, Bangladesh, April 2014. View at: Google Scholar
 N. Kingsbury, “The dualtree complex wavelet transform: a new efficient tool for image restoration and enhancement,” in Proceedings of the 9th European Signal Processing Conference (EUSIPCO 1998), pp. 1–4, IEEE, Island of Rhodes, Greece, September 1998. View at: Google Scholar
 M. Minmin, “EEG pattern recognition based on dual tree complex wavelet transform and particle swarm optimization,” in Proceedings of the 10th International Conference on Sensing Technology (ICST), Nanjing, China, November 2016. View at: Google Scholar
 M. Ming, L. Shaona, M. Haitao, M. Yuliang, and M. Yunyuan, “Feature extraction method of motor imagery EEG based on DTCWT sample entropy,” in Proceedings of the 2015 34th Chinese Control Conference, Hangzhou, China, July 2015. View at: Google Scholar
 S. K. Bashar, A. R. Hassan, and M. I. H. Bhuiyan, “Identification of motor imagery movements from EEG signals using dual tree complex wavelet transform,” in Proceedings of the International Conference on Advances in Computing, Iskandar, Malaysia, November 2015. View at: Google Scholar
 I. W. Selesnick, R. G. Baraniuk, and N. C. Kingsbury, “The dualtree complex wavelet transform,” IEEE Signal Processing Magazine, vol. 22, no. 6, pp. 123–151, 2005. View at: Publisher Site  Google Scholar
 S. Hou, R. C. Qiu, Z. Chen, and Z. Hu, “SVM and dimensionality reduction in cognitive radio with experimental validation,” 2011, http://arxiv.org/abs/1106.2325. View at: Google Scholar
 M. Li, H. Xi, and Y. Sun, “Feature extraction and visualization of MIEEG with LMVU algorithm,” in World Congress on Medical Physics and Biomedical Engineering 2018, L. Lhotska, L. Sukupova, I. Lacković, and G. Ibbott, Eds., Springer, Singapore, 2019. View at: Google Scholar
 H. Adeli, Z. Zhou, and N. Dadmehr, “Analysis of EEG records in an epileptic patient using wavelet transform,” Journal of Neuroscience Methods, vol. 123, no. 1, pp. 69–87, 2003. View at: Publisher Site  Google Scholar
 B. Blankertz, K. R. Muller, D. J. Krusienski et al., “The BCI competition III: validating alternative approaches to actual BCI problems,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 14, no. 2, pp. 153–159, 2006. View at: Publisher Site  Google Scholar
 M. Tangermann, K. R. Müller, A. Aertsen et al., “Review of the BCI Competition IV,” Frontiers in Neuroscience, vol. 6, 2012. View at: Publisher Site  Google Scholar
 B. Borchers, “CSDP, A C library for semidefinite programming,” Optimization Methods and Software, vol. 11, no. 14, p. 11, 1999. View at: Publisher Site  Google Scholar
 J. B. Tenenbaum, “A global geometric framework for nonlinear dimensionality reduction,” Science, vol. 290, no. 5500, pp. 2319–2323, 2000. View at: Publisher Site  Google Scholar
 L. K. Saul and S. T. Roweis, “Think globally, fit locally: unsupervised learning of low dimensional manifolds,” Journal of Machine Learning Research, vol. 4, pp. 119–155, 2003. View at: Google Scholar
 M. Belkin and P. Niyogi, Laplacian Eigenmaps for Dimensionality Reduction and Data Representation, MIT Press, Cambridge, MA, USA, 2003.
 Z.y. Zhang and H.y. Zha, “Principal manifolds and nonlinear dimensionality reduction via tangent space alignment,” Journal of Shanghai University (English Edition), vol. 8, no. 4, pp. 406–424, 2004. View at: Publisher Site  Google Scholar
 D. L. Donoho and C. Grimes, “Hessian eigenmaps: locally linear embedding techniques for highdimensional data,” Proceedings of the National Academy of Sciences, vol. 100, no. 10, pp. 5591–5596, 2003. View at: Publisher Site  Google Scholar
 M. Alansari, M. Kamel, B. Hakim, and Y. Kadah, “Study of waveletbased performance enhancement for motor imagery braincomputer interface,” in Proceedings of the 6th International Conference on BrainComputer Interface (BCI), Gangwon, South Korea, January 2018. View at: Google Scholar
 M.a. Li, W. Zhu, H.n. Liu, and J.f. Yang, “Adaptive feature extraction of motor imagery EEG with optimal wavelet packets and SEisomap,” Applied Sciences, vol. 7, no. 4, p. 390, 2017. View at: Publisher Site  Google Scholar
 J. Luo, Z. Feng, J. Zhang et al., “Dynamic frequency feature selection based approach for classification of motor imageries,” Computers in Biology and Medicine, vol. 75, pp. 45–53, 2016. View at: Publisher Site  Google Scholar
 S. Kumar, A. Sharma, and T. Tsunoda, “An improved discriminative filter bank selection approach for motor imagery EEG signal classification using mutual information,” BMC Bioinformatics, vol. 18, no. 16, p. 545, 2017. View at: Publisher Site  Google Scholar
 J. Wang, Z. Feng, N. Lu, and J. Luo, “Toward optimal feature and time segment selection by divergence method for EEG signals classification,” Computers in Biology and Medicine, vol. 97, pp. 161–170, 2018. View at: Publisher Site  Google Scholar
 Y. R. Tabar and U. Halici, “A novel deep learning approach for classification of EEG motor imagery signals,” Journal of Neural Engineering, vol. 14, no. 1, Article ID 016003, 2017. View at: Publisher Site  Google Scholar
 L. Na, W. Jie, and F. Zuren, “Feature extraction by common spatial pattern in frequency domain for motor imagery tasks classification,” in Proceedings of the 29th Chinese Control and Decision Conference (CCDC), Chongqing, China, May 2017. View at: Google Scholar
 C. T. Lin, Y. K. Wang, J. W. Fan et al., “The influence of acute stress on brain dynamics,” in Proceedings of the 2013 IEEE Symposium on Computational Intelligence, Cognitive Algorithms, Mind, and Brain (CCMB), pp. 6–10, IEEE, Singapore, April 2013. View at: Google Scholar
 J. T. King, M. Prasad, T. Tsai et al., “Influence of time pressure on inhibitory brain control during emergency driving,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 99, pp. 1–7, 2018. View at: Google Scholar
Copyright
Copyright © 2019 Mingai Li et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.