Channel and Feature Selection for a Motor Imagery-Based BCI System Using Multilevel Particle Swarm Optimization
Brain-computer interface (BCI) is a communication and control system linking the human brain and computers or other electronic devices. However, irrelevant channels and misleading features unrelated to the task limit classification performance. To address these problems, we propose an efficient signal processing framework based on particle swarm optimization (PSO) with three optimization schemes: combined channel and feature selection, channel selection, and feature selection. Modified Stockwell transforms were used for feature extraction, and multilevel hybrid PSO-Bayesian linear discriminant analysis was applied for optimization and classification. The BCI Competition III dataset I was used to confirm the superiority of the proposed scheme. Compared to a method without optimization (89% accuracy), the PSO-based scheme achieved a best classification accuracy of 99% while using less than 10.5% of the original features and reducing the test time by more than 90%. It also achieved a Kappa value of 0.98, an F-score of 98.99%, and a better signal-to-noise ratio, thereby outperforming existing algorithms. The results show that the channel and feature selection scheme can accelerate convergence to the global optimum and reduce the training time. As the proposed framework can significantly improve classification performance, effectively reduce the number of features, and greatly shorten the test time, it can serve as a reference for research on real-time BCI application systems.
1. Introduction
The brain-computer interface (BCI) is a communication and control system established between the human brain and a computer or external device that does not depend on the peripheral nerves and muscles. Common brain activity patterns used in BCIs include P300 potentials, steady-state visual evoked potentials (SSVEP), and motor imagery (MI). Among these, the motor imagery-based BCI (MI-BCI) relies on imagining the movement of a body part without any actual movement; this offers patients with motor disabilities a new means of effective communication. Similar methods have been widely used in rehabilitation applications.
Pattern recognition is an important aspect of the MI-BCI system. However, brain signals contain a large amount of physiological and pathological information, and recorded electroencephalogram (EEG) signals are mixed with other brain activity signals that can overlap in both time and space. As a result, the extracted features contain much redundant and misleading information, which limits classification accuracy. Furthermore, to enhance the spatial resolution of EEG recording devices and tracing techniques, the number of signal acquisition channels has increased to 16, 32, 64, or more. More channels increase not only the spatial resolution but also the number of features, and thus the running time of classification. Hence, proper mechanisms for selecting task-related features are required. Although genetic algorithms are the most commonly used [5, 6], other optimization methods have been proposed for MI, such as differential evolution, particle swarm optimization (PSO), the concave-convex procedure, principal component analysis, and correlation-based channel and time window selection [11, 12]. Among these, PSO is a promising technique with simple computation and rapid convergence, and it has been successfully applied to mechanical engineering optimization, business optimization, and clustering problems.
Analysis of existing motor imagery recognition schemes revealed several drawbacks. First, in some schemes, high variance among the optimized feature components may lower classification performance, because not all misleading features are identified and filtered out. Second, the number of optimized features is still large, which requires considerable testing time and limits the practical application of these schemes. Finally, features and channels have been analyzed separately, ignoring the fact that the selection of optimal features depends on the channels used. Even if the best feature set can be identified, more training time is required.
To solve these technical problems, this study introduces an efficient optimization framework based on multilevel PSO (MLPSO). In this scheme, the PSO algorithm performs a global search over the whole search space, and a local search is performed by running the algorithm repeatedly. This improves the ability of the procedure to escape local optima and reach the global optimum. The scheme reduces the number of features and enhances classification accuracy by selecting the feature subsets that best match the expected cortical activity patterns during the MI task, giving it the advantages of wide applicability and easy implementation. The reason for using MLPSO in feature optimization is that when a particle's current position coincides with the global best position and its velocity is not zero, all particles move rapidly toward that position, leading to fast convergence of the PSO algorithm. However, convergence to the global optimum is not guaranteed: all particles move toward the best position found so far, and the swarm can stall there, a phenomenon known as stagnation. This problem can be mitigated by running the optimizer several times on the same cost function. Hence, MLPSO was used in this study to optimize the MI task.
In addition, to address the last problem, three optimization schemes based on the proposed signal processing framework were designed: channel and feature selection, channel selection, and feature selection. The channel and feature selection scheme first eliminates irrelevant channels through channel selection and then selects task-relevant features through feature selection, which accelerates the screening of irrelevant features.
This study investigates a signal recognition framework for MI-BCI. An MLPSO algorithm was used for optimization in combination with Bayesian linear discriminant analysis (BLDA) classification, and the modified Stockwell transform (MST) was applied for feature extraction. Three MLPSO-based optimization schemes (channel and feature selection, channel selection, and feature selection) were designed for this framework. Figure 1 shows the block diagram of the proposed framework. Signal processing was implemented in MATLAB, and the simulations were run on a Linux server workstation with 64 GB of memory, a 512 GB SSD, an NVIDIA GeForce TITAN X GPU, and a six-core Intel(R) Xeon(R) Silver 4114 CPU @ 2.20 GHz.
The remainder of this paper is organized as follows. Section 2 describes the experimental dataset and preprocessing, feature extraction, classification, MLPSO-based channel and feature selection, and the performance evaluation metrics. Sections 3 and 4 present and discuss the classification results of the proposed optimization schemes. Finally, the conclusions are summarized at the end of the paper.
2. Materials and Methods
2.1. Experimental Dataset and Preprocessing
A high-quality signal is an important prerequisite for improving classification accuracy and evaluating algorithm performance. Because the electrocorticogram (ECoG) is recorded on the surface of the cortex and provides higher temporal and spatial resolution, a better signal-to-noise ratio, and a broader bandwidth than EEG, the ECoG-based BCI Competition III dataset I was used in this study. During the experiment, the subject performed imagined movements of either the left small finger or the tongue. Electrical brain activity was recorded during these trials with an 8 × 8 platinum ECoG electrode grid placed over the contralateral (right) motor cortex. All recordings were sampled at 1000 Hz, and each trial, consisting of either an imagined tongue movement or an imagined finger movement, lasted 3 seconds.
The dataset consists of 278 training trials and 100 test trials, stored in a 3D matrix X with the format trials × electrode channels × time samples (64 channels and 3000 samples per trial, i.e., 3 s at 1000 Hz). The labels are stored in a vector Y of −1/1 values. To reduce the amount of data to be processed, the signals were downsampled to 100 Hz. Because the features used lie between 1 and 35 Hz (Section 2.2), below the 50 Hz Nyquist frequency of the downsampled signals, the relevant frequency band is preserved.
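A minimal sketch of the downsampling step, assuming the trials are held in a NumPy array shaped like X (trials × channels × samples). It uses SciPy's `scipy.signal.decimate`, which low-pass filters before keeping every q-th sample so that content above the new Nyquist frequency is suppressed rather than aliased. The paper does not state which MATLAB routine was used, so this is an illustrative stand-in; the function name `downsample_trials` is hypothetical.

```python
import numpy as np
from scipy.signal import decimate


def downsample_trials(X, fs_in=1000, fs_out=100):
    """Downsample an array shaped (trials, channels, samples) along time.

    decimate() applies an anti-aliasing low-pass filter (zero-phase IIR
    by default) before keeping every q-th sample, so content above the
    new Nyquist frequency (fs_out / 2 = 50 Hz here) is not aliased.
    """
    if fs_in % fs_out != 0:
        raise ValueError("fs_in must be an integer multiple of fs_out")
    q = fs_in // fs_out  # decimation factor: 1000 Hz -> 100 Hz gives q = 10
    return decimate(X, q, axis=-1)


# Synthetic stand-in for a few trials: 64 channels, 3 s at 1000 Hz.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 64, 3000))
X_ds = downsample_trials(X)
print(X_ds.shape)  # (4, 64, 300)
```

Applied to the full training matrix, the same call would turn 278 × 64 × 3000 into 278 × 64 × 300.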
2.2. Feature Extraction
An efficient feature extraction method can isolate event-related characteristics from recorded brain signals, thus improving classification performance. The Stockwell transform (ST) is an extension of the wavelet transform based on a moving and scalable localizing Gaussian window; it provides frequency-dependent resolution while maintaining a direct connection to the Fourier spectrum.
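The ST of a time series $h(t)$ is given by
$$S(\tau, f) = \int_{-\infty}^{\infty} h(t)\, w(\tau - t, f)\, e^{-i 2\pi f t}\, dt,$$
where the Gaussian window is defined by
$$w(t, f) = \frac{1}{\sigma(f)\sqrt{2\pi}}\, e^{-\frac{t^{2}}{2\sigma^{2}(f)}},$$
and its standard deviation is a function of the frequency $f$:
$$\sigma(f) = \frac{1}{|f|}.$$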
By adjusting the standard deviation of the Gaussian window, and hence the time-frequency resolution, the MST can provide better energy concentration than the ST, achieving higher frequency resolution at lower frequencies and better time localization at higher frequencies. Accordingly, it has been used to detect dynamic brain signals. In the MST, the standard deviation σ(f) is modified by two scaling factors, which determine the width and height of the Gaussian window, respectively.
According to the frequency range of MI activity, the frequency range of the MST was set to 1–35 Hz with a 1 Hz step, and the power spectral density (PSD) was then calculated. Therefore, 35 features were extracted per channel; with 64 channels, this gives 35 × 64 = 2240 features per trial. The feature matrices of the training and test sets are therefore 278 × 2240 and 100 × 2240, respectively. Figure 2 shows the power spectrum after feature extraction for two training trials with different labels; the frequency distribution of energy clearly differs between them.
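The feature extraction can be sketched in NumPy with the standard frequency-domain discrete ST: for each analysis frequency, the spectrum is shifted, multiplied by the Fourier transform of the Gaussian window, exp(−2π²α²σ²(f)), and inverse-transformed. Two details are assumptions not specified above: the PSD feature at each frequency is taken as the mean of |S(τ, f)|² over time, and the window width is supplied as a function `sigma_fn`. The default σ(f) = 1/|f| is the plain ST; the paper's MST substitutes its two-scale-factor form, which is not reproduced here. Function names are hypothetical.

```python
import numpy as np


def st_power(x, fs, freqs, sigma_fn=lambda f: 1.0 / abs(f)):
    """Mean power of the (modified) Stockwell transform of a 1-D signal.

    For each frequency f (Hz), S(tau, f) is computed as
    IFFT_alpha[ H(alpha + f) * W(alpha, f) ], where W is the Fourier
    transform of a unit-area Gaussian with standard deviation sigma_fn(f).
    """
    n = len(x)
    spectrum = np.fft.fft(x)
    alpha = np.fft.fftfreq(n, d=1.0 / fs)  # frequency axis in Hz
    power = np.empty(len(freqs))
    for k, f in enumerate(freqs):
        shift = int(round(f * n / fs))       # FFT bin of frequency f
        shifted = np.roll(spectrum, -shift)  # shifted[m] = H[(m + shift) mod n]
        window = np.exp(-2.0 * np.pi**2 * alpha**2 * sigma_fn(f) ** 2)
        s = np.fft.ifft(shifted * window)    # S(tau, f) for all tau
        power[k] = np.mean(np.abs(s) ** 2)   # assumed PSD feature
    return power


def trial_features(trial, fs=100, freqs=np.arange(1, 36)):
    """Concatenate 35 PSD values per channel: (64, T) -> (64 * 35,)."""
    return np.concatenate([st_power(ch, fs, freqs) for ch in trial])


fs = 100
t = np.arange(300) / fs                   # one 3 s trial at 100 Hz
tone = np.sin(2 * np.pi * 10 * t)         # 10 Hz test tone
p = st_power(tone, fs, np.arange(1, 36))
print(np.arange(1, 36)[np.argmax(p)])     # 10

trial = np.random.default_rng(1).standard_normal((64, 300))
print(trial_features(trial).shape)        # (2240,)
```

The 10 Hz tone concentrates its power at the 10 Hz feature, and one 64-channel trial yields the 2240-dimensional feature vector described above.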
2.3. Classification
As an extension of Fisher's linear discriminant analysis, BLDA applies regularization during training; it adjusts its parameters automatically and avoids overfitting. These characteristics make it suitable for real-time BCI systems.
In the dataset, the samples are labeled "1" and "−1", but the classifier output is continuous rather than binary. Therefore, the output is thresholded at 0: trials with predicted outputs ≥ 0 are labeled "1," and those with outputs < 0 are labeled "−1."
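A compact sketch of a Bayesian linear discriminant in the spirit of BLDA: Bayesian linear regression onto class-coded targets, with the weight-prior precision α and noise precision β re-estimated by the standard evidence-maximization (MacKay) updates, followed by the zero threshold described above. This is not the reference BLDA implementation; the target coding (n/n₊ for class 1, −n/n₋ for class −1), regularizing the bias like every other weight, and the class name `BayesianLDA` are assumptions for illustration.

```python
import numpy as np


class BayesianLDA:
    """Bayesian linear regression onto class-coded targets (BLDA-style).

    alpha (weight prior precision) and beta (noise precision) are
    re-estimated by evidence maximization, so no regularization
    strength has to be tuned by hand.
    """

    def __init__(self, max_iter=200, tol=1e-6):
        self.max_iter = max_iter
        self.tol = tol

    @staticmethod
    def _design(X):
        X = np.asarray(X, dtype=float)
        return np.hstack([X, np.ones((X.shape[0], 1))])  # append bias column

    def fit(self, X, y):
        y = np.asarray(y)
        phi = self._design(X)
        n, d = phi.shape
        n_pos, n_neg = np.sum(y == 1), np.sum(y == -1)
        t = np.where(y == 1, n / n_pos, -n / n_neg)      # class-coded targets
        gram = phi.T @ phi
        proj = phi.T @ t
        eig = np.clip(np.linalg.eigvalsh(gram), 0.0, None)
        alpha, beta = 1.0, 1.0
        for _ in range(self.max_iter):
            A = alpha * np.eye(d) + beta * gram            # posterior precision
            m = beta * np.linalg.solve(A, proj)            # posterior mean
            lam = beta * eig
            gamma = np.sum(lam / (lam + alpha))            # effective no. of parameters
            new_alpha = gamma / max(m @ m, 1e-12)
            resid = max(np.sum((t - phi @ m) ** 2), 1e-12)
            new_beta = max(n - gamma, 1e-12) / resid
            done = (abs(new_alpha - alpha) <= self.tol * alpha
                    and abs(new_beta - beta) <= self.tol * beta)
            alpha, beta = new_alpha, new_beta
            if done:
                break
        self.w_ = m
        self.alpha_, self.beta_ = alpha, beta
        return self

    def decision_function(self, X):
        return self._design(X) @ self.w_

    def predict(self, X):
        # Threshold at 0: outputs >= 0 -> 1, outputs < 0 -> -1.
        return np.where(self.decision_function(X) >= 0, 1, -1)


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(1.0, 1.0, (100, 5)), rng.normal(-1.0, 1.0, (100, 5))])
y = np.array([1] * 100 + [-1] * 100)
clf = BayesianLDA().fit(X, y)
print(np.mean(clf.predict(X) == y))
```

Because α and β are inferred from the data, the effective amount of regularization adapts to the feature dimension, which is the property that makes BLDA attractive when the number of features (2240) exceeds the number of training trials (278).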
2.4. Multilevel PSO for Channel and Feature Selection
PSO is a population-based optimization algorithm inspired by the social behavior of bird flocking. The algorithm first initializes a group of particles randomly in the given solution space and then updates their velocities and positions by tracking two "best values." One is the best position found so far by an individual particle, called the personal best position ($\mathrm{pbest}$). The other is the best position found so far by all particles, called the global best position ($\mathrm{gbest}$). The velocity and position of each particle are updated as follows:
$$v_{id}^{t+1} = w\, v_{id}^{t} + c_1 r_1 \left(\mathrm{pbest}_{id}^{t} - x_{id}^{t}\right) + c_2 r_2 \left(\mathrm{gbest}_{d}^{t} - x_{id}^{t}\right),$$
$$x_{id}^{t+1} = x_{id}^{t} + v_{id}^{t+1},$$
where $x_{id}^{t}$ and $v_{id}^{t}$ represent the position and velocity of the $i$-th particle in the $d$-th dimension at iteration $t$, respectively; $r_1$ and $r_2$ are random values between 0 and 1; and $c_1$ and $c_2$ are the acceleration coefficients. To prevent a blind search of particles and the expansion of the population, the position and velocity are limited to the intervals $[x_{\min}, x_{\max}]$ and $[v_{\min}, v_{\max}]$, respectively. When a value exceeds its range, a boundary absorption strategy sets it to the adjacent boundary value. $w$ is the inertia weight. In this paper, to better balance the search ability of the algorithm, a linearly decreasing inertia weight is used:
$$w = w_{\max} - \frac{(w_{\max} - w_{\min})\, t}{T_{\max}},$$
where $w_{\max}$ and $w_{\min}$ are the maximum and minimum inertia weights, $t$ is the current iteration, and $T_{\max}$ is the maximum number of iterations.
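The update rules above can be illustrated with a small continuous PSO that minimizes a toy sphere function. It uses the linearly decreasing inertia weight and clips velocities and positions to their intervals (boundary absorption). The coefficient values mirror the settings listed in Section 3.1 (c1 = c2 = 1.5, inertia from 0.8 to 0.4, 100 iterations); the 20-particle swarm, the search bounds, the velocity limit, and the objective are assumptions chosen only for this demonstration. Since this toy problem is a minimization, a lower cost counts as better, whereas the paper maximizes a fitness value.

```python
import numpy as np


def inertia(t, t_max, w_max=0.8, w_min=0.4):
    """Linearly decreasing inertia weight: w_max at t = 0, w_min at t = t_max."""
    return w_max - (w_max - w_min) * t / t_max


def pso_minimize(cost, dim, n_particles=20, t_max=100, c1=1.5, c2=1.5,
                 x_lim=(-5.0, 5.0), v_lim=(-1.0, 1.0), seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(*x_lim, size=(n_particles, dim))
    v = rng.uniform(*v_lim, size=(n_particles, dim))
    pbest = x.copy()
    pbest_cost = np.apply_along_axis(cost, 1, x)
    gbest = pbest[np.argmin(pbest_cost)].copy()
    for t in range(t_max):
        w = inertia(t, t_max)
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        v = np.clip(v, *v_lim)          # boundary absorption for velocity
        x = np.clip(x + v, *x_lim)      # boundary absorption for position
        c = np.apply_along_axis(cost, 1, x)
        improved = c < pbest_cost       # lower cost is better in this toy example
        pbest[improved] = x[improved]
        pbest_cost[improved] = c[improved]
        gbest = pbest[np.argmin(pbest_cost)].copy()
    return gbest, pbest_cost.min()


best_x, best_cost = pso_minimize(lambda z: np.sum(z ** 2), dim=5)
print(round(inertia(50, 100), 3))  # 0.6
print(best_cost)
```

A large inertia early in the run favors exploration; as w shrinks toward w_min, the attraction terms dominate and the swarm refines the best region found.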
Channel and feature selection takes place in a discrete search space, so each position component of a particle can only be 0 or 1. The velocity update rule of PSO is retained, but the position of the particle is determined by
$$x_{id}^{t+1} = \begin{cases} 1, & \text{if } r < S\left(v_{id}^{t+1}\right), \\ 0, & \text{otherwise,} \end{cases} \qquad S(v) = \frac{1}{1 + e^{-v}},$$
where $r$ is a random number in $[0, 1]$ and $S(\cdot)$ is the sigmoid function. Hence, the probability that a position component equals 1 is $S\left(v_{id}^{t+1}\right)$. The PSO optimization process is as follows:
(a) Initialize the population. Randomly initialize the positions of the $N$ particles using binary coding, initialize their velocities, and set the maximum number of iterations $T_{\max}$. Here, $N$ is the number of particles, and $D$, the dimension of each particle, equals the number of features to be optimized. Each index represents one feature: a binary 1 means that the feature at that index is used for classification, and 0 means that it is ignored. Figure 3 shows a schematic of how the initial population positions are created.
(b) Calculate the fitness of each particle. The fitness value measures the quality of each individual in the population. In the experiment, the inverse of the mean square error on the testing data is taken as the fitness function:
$$\text{fitness} = \left[\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^{2}\right]^{-1},$$
where $\hat{y}_i$ is the predicted value for the testing data, $y_i$ is the true value, and $n$ is the number of samples.
(c) Update pbest and gbest. For each particle, its fitness is compared with the fitness of the best position it has visited, and pbest is updated if the current fitness is better. Its fitness is also compared with the fitness of the global best position, and gbest is updated if it is better.
(d) Update velocity and position. First, the velocity is updated with the PSO velocity equation using the linearly decreasing inertia weight. Next, the new binary position is determined with the sigmoid rule above.
(e) Repeat steps (b)–(d) until the maximum number of iterations is reached, recording the fitness value and gbest at each iteration to select the best feature combination.
Figure 4 shows the flow of multilevel PSO for channel and feature selection. The loop is terminated when the maximum execution level is reached or the number of selected features does not change.
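Steps (a)–(e) and the multilevel loop can be sketched as follows. This is a minimal illustration under several assumptions: each level reruns binary PSO on the features kept by the previous level (our reading of the multilevel scheme), an ordinary least-squares linear classifier stands in for BLDA inside the fitness function, fitness is 1/MSE of the classifier's continuous output on a held-out split (mirroring the use of testing data in step (b)), and the data are synthetic. All function names are hypothetical.

```python
import numpy as np


def fitness(mask, X_tr, y_tr, X_te, y_te):
    """Step (b): 1 / MSE on held-out data (least squares stands in for BLDA)."""
    if not mask.any():
        return 0.0  # an empty feature subset cannot classify anything
    A_tr = np.hstack([X_tr[:, mask], np.ones((len(X_tr), 1))])
    A_te = np.hstack([X_te[:, mask], np.ones((len(X_te), 1))])
    w, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    mse = np.mean((A_te @ w - y_te) ** 2)
    return 1.0 / max(mse, 1e-12)


def binary_pso(fit, dim, rng, n_particles=20, t_max=100, c1=1.5, c2=1.5,
               w_max=0.8, w_min=0.4, v_max=4.0):
    """Binary PSO maximizing fit(mask); returns (best mask, best fitness)."""
    x = rng.random((n_particles, dim)) < 0.5                # (a) random 0/1 positions
    v = rng.uniform(-v_max, v_max, (n_particles, dim))
    pbest = x.copy()
    pbest_fit = np.array([fit(p) for p in x])               # (b) initial fitness
    g = int(np.argmax(pbest_fit))
    for t in range(t_max):
        w = w_max - (w_max - w_min) * t / t_max             # linearly decreasing inertia
        r1, r2 = rng.random((2, n_particles, dim))
        xf, pf = x.astype(float), pbest.astype(float)
        v = w * v + c1 * r1 * (pf - xf) + c2 * r2 * (pf[g] - xf)
        v = np.clip(v, -v_max, v_max)                       # boundary absorption
        x = rng.random((n_particles, dim)) < 1.0 / (1.0 + np.exp(-v))  # (d) sigmoid rule
        f = np.array([fit(p) for p in x])                   # (b) fitness
        better = f > pbest_fit                              # (c) update pbest
        pbest[better] = x[better]
        pbest_fit[better] = f[better]
        g = int(np.argmax(pbest_fit))                       # (c) update gbest
    return pbest[g].copy(), pbest_fit[g]


def multilevel_pso(X_tr, y_tr, X_te, y_te, max_levels=5, seed=0):
    """Rerun binary PSO on the features kept by the previous level."""
    rng = np.random.default_rng(seed)
    selected = np.arange(X_tr.shape[1])                     # level 0: all features
    for level in range(max_levels):
        sub_tr, sub_te = X_tr[:, selected], X_te[:, selected]
        mask, best = binary_pso(
            lambda m: fitness(m, sub_tr, y_tr, sub_te, y_te), len(selected), rng)
        if not mask.any() or mask.all():                    # empty or unchanged: stop
            break
        selected = selected[mask]
    return selected


rng = np.random.default_rng(42)
X = rng.standard_normal((150, 30))
y = np.where(X[:, 0] + X[:, 1] - X[:, 2] > 0, 1.0, -1.0)  # 3 informative features
sel = multilevel_pso(X[:100], y[:100], X[100:], y[100:])
print(sel)
```

The loop stops either after `max_levels` levels or when a level keeps every remaining feature, which corresponds to the "number of selected features does not change" criterion.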
2.5. Classification Performance
To evaluate the performance of the algorithm, the Kappa value, F-score, and statistical measures such as sensitivity (recall), specificity, and precision were used. The Kappa value measures classification accuracy corrected for chance and is given by
$$\kappa = \frac{p_o - p_e}{1 - p_e},$$
where $p_o$ is the classification accuracy and $p_e$ is the accuracy of random classification. For two-class classification, $p_e = 0.5$.
Sensitivity (the proportion of positives that are correctly identified), specificity (the proportion of negatives that are correctly identified), precision (the proportion of correctly predicted positives among all predicted positives), and F-score (the harmonic mean of precision and recall) are defined as follows:
$$\text{Sensitivity} = \frac{TP}{TP + FN}, \quad \text{Specificity} = \frac{TN}{TN + FP}, \quad \text{Precision} = \frac{TP}{TP + FP}, \quad F = \frac{2 \times \text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}},$$
where true positives (TP) and true negatives (TN) are the numbers of imagined left little-finger movements and tongue movements, respectively, that the algorithm classifies correctly. Finger trials misclassified as tongue movements are counted as false negatives (FN), and tongue trials misclassified as finger movements are counted as false positives (FP).
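These definitions can be collected into one helper. This is a minimal sketch assuming labels coded as in the dataset (1 for the finger, the positive class; −1 for the tongue) and the two-class chance level p_e = 0.5; the function name `bci_metrics` and the example confusion counts are hypothetical, not results from the paper.

```python
import numpy as np


def bci_metrics(y_true, y_pred, positive=1):
    """Accuracy, Kappa (p_e = 0.5), sensitivity, specificity, precision, F-score.

    The positive class is the imagined finger movement (label 1 here).
    """
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    tn = np.sum((y_pred != positive) & (y_true != positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    acc = (tp + tn) / len(y_true)
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    prec = tp / (tp + fp)
    return {
        "accuracy": acc,
        "kappa": (acc - 0.5) / (1 - 0.5),  # chance level 0.5 for two classes
        "sensitivity": sens,
        "specificity": spec,
        "precision": prec,
        "f_score": 2 * prec * sens / (prec + sens),
    }


# Hypothetical test set: 50 finger (1) and 50 tongue (-1) trials.
y_true = np.array([1] * 50 + [-1] * 50)
y_pred = y_true.copy()
y_pred[:3] = -1     # 3 finger trials read as tongue (FN = 3)
y_pred[50:52] = 1   # 2 tongue trials read as finger (FP = 2)
m = bci_metrics(y_true, y_pred)
print({k: round(float(v), 4) for k, v in m.items()})
```

With TP = 47, TN = 48, FP = 2, and FN = 3, accuracy is 0.95, Kappa is 0.9, and the F-score equals 2TP / (2TP + FP + FN) = 94/99.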
3. Results
3.1. Parameter Settings
The two scale factors of the MST adjust the width and height of its Gaussian window. Table 1 shows the results obtained by varying them; the most effective features were obtained when both scale factors were set to 1.
Because the two-level PSO feature selection scheme is representative, achieving high classification accuracy while saving considerable time compared with deeper multilevel optimization, it is used here to illustrate the parameter tuning process (Table 2). The optimal parameters are a maximum number of iterations $T_{\max} = 100$, acceleration coefficients $c_1 = c_2 = 1.5$, inertia weights $w_{\max} = 0.8$ and $w_{\min} = 0.4$, and a swarm size $N = 20$. In the experiments, MLPSO usually reached the optimal value within 100 iterations, which is why $T_{\max} = 100$ was chosen. All experiments were started with mutually independent initial particle swarms.
3.2. Channel Selection
To confirm the effectiveness and feasibility of the proposed MLPSO optimization framework, we fixed the feature set of each channel and used PSO to find the best channel combination. The results are shown in Table 3: classification accuracy increased from 89% to 97%. Figure 5 shows the distribution of channels selected with different levels of PSO for channel selection. Eleven channels were selected consistently in every experiment. Because all experiments started from independent initial particle swarms, these 11 channels, far fewer than the original 64, were considered the optimal channel combination. Figure 6 shows the positions of the corresponding electrodes: the selected channels are concentrated in the upper half of the grid, and almost none are in the lower-left part.
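In the channel selection scheme, each particle carries one bit per channel, and selecting a channel keeps all 35 of its PSD features. A minimal sketch of that mapping, assuming the 2240 features are stored channel by channel (35 consecutive columns per channel, as produced by concatenating per-channel features); the function name and the example channel indices are hypothetical and are not the 11 channels reported above.

```python
import numpy as np

N_CHANNELS, N_FREQS = 64, 35  # 64 ECoG channels x 35 PSD features (1-35 Hz)


def channel_mask_to_feature_mask(channel_mask, n_freqs=N_FREQS):
    """Expand a per-channel 0/1 mask to a per-feature boolean mask.

    Assumes channel-major storage: columns [c * n_freqs, (c + 1) * n_freqs)
    belong to channel c.
    """
    channel_mask = np.asarray(channel_mask, dtype=bool)
    return np.repeat(channel_mask, n_freqs)


channel_mask = np.zeros(N_CHANNELS, dtype=bool)
channel_mask[[3, 10, 11, 19]] = True  # hypothetical channel selection
feature_mask = channel_mask_to_feature_mask(channel_mask)
features = np.random.default_rng(0).standard_normal((278, N_CHANNELS * N_FREQS))
reduced = features[:, feature_mask]
print(reduced.shape)  # (278, 140)
```

In the combined channel and feature selection scheme, the subsequent feature-level PSO would then search over these 140 columns instead of all 2240, which is why a good channel selection shortens the feature search.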
The electrodes in this dataset were placed beneath the dura, covering the primary motor and premotor areas as well as the frontotemporal area of the left and right hemispheres. Accordingly, the selected channels lay close to the motor cortex. Some channels may not have been selected because of the one-week interval between recording the training and test sets, electrode detachment, or reduced conductivity, all of which degrade signal quality.
3.3. Feature Selection
The results of feature selection are listed in Table 4. After feature selection, the classification accuracy of all experiments improved by more than 4%, and the best accuracy reached 99%. When optimal accuracy was first reached, only 40 features were used for classification, 1.8% of the original 2240 features, and the testing time was reduced by 95%. The number of selected features remained unchanged over the last several experiments of the scheme, suggesting that the optimal task-related feature set had been identified. Figure 7 shows a scatter diagram of the optimal feature distribution.
3.4. Channel and Feature Selection
Since the optimal channel combination was identified with four-level PSO, we conducted four groups of feature selection experiments, each starting from the channels selected by a different level of PSO channel selection; the results are shown in Table 5. The best classification accuracy in each group was 99%, and the more thorough the channel selection, the less feature selection time was needed to reach the highest accuracy. In addition, the specificity of every experiment reached 100%.
4. Discussion
4.1. Comparison of Channel and Feature Selection
Tables 3–5 show that when MLPSO is used only for channel selection, the best classification accuracy is 97%, whereas the other two schemes achieve 99%, a significant improvement over the 89% obtained before optimization. Moreover, 99% accuracy was achieved with less than 10.5% of the original features, and the test time was reduced by more than 90%. These results suggest that the MLPSO-based optimization framework not only significantly improves classification accuracy but also effectively reduces the number of features, thereby greatly reducing the test time. These characteristics indicate that MLPSO may serve as a reference for research on real-time BCI application systems.
Figure 8 shows the change in classification accuracy in each experiment for different levels of PSO. Compared with the feature-selection-only scheme, the channel and feature selection scheme requires fewer feature selection runs to reach 99% accuracy, and its total training time is also shorter. This demonstrates that channel selection filters out task-irrelevant channels and simplifies the search for the optimal feature set. Hence, channel and feature selection can accelerate the convergence of the algorithm to the global optimum, reduce computational complexity, and shorten the training time.
4.2. Comparison of the Classification Performance
Tables 3–5 also report the classification performance of each experiment in terms of Kappa, F-score, sensitivity, specificity, and precision. The optimal Kappa value of the PSO-based feature selection scheme was 0.98, an increase of 0.20 over the value of 0.78 obtained without optimization. Moreover, the accuracy and sensitivity of the PSO-based method improved greatly, and the specificity reached 100%. These improved evaluation indices indicate the effectiveness of the proposed scheme.
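As a worked check of these values, the Kappa definition from Section 2.5 with $p_e = 0.5$ gives
$$\kappa_{\text{before}} = \frac{0.89 - 0.5}{1 - 0.5} = 0.78, \qquad \kappa_{\text{after}} = \frac{0.99 - 0.5}{1 - 0.5} = 0.98,$$
so the gain is $0.98 - 0.78 = 0.20$ in absolute terms (about 26% relative). Assuming, for illustration only, that the 100 test trials contain 50 finger (positive) trials, 99% accuracy with 100% specificity means exactly one misclassified finger trial ($TP = 49$, $FN = 1$, $FP = 0$), so
$$F = \frac{2\,TP}{2\,TP + FP + FN} = \frac{98}{99} \approx 98.99\%,$$
which is consistent with the reported F-score.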
4.3. Comparisons with Other Methods
Table 6 compares the proposed method with state-of-the-art schemes on the same dataset. The classification accuracy of the proposed framework is clearly higher than that of the previous algorithms. Chang et al. proposed a genetic-algorithm-based feature selection scheme that achieved 96% accuracy using 48.6% of the original features; by contrast, our scheme achieves 99% accuracy with less than 10.5% of the features, which demonstrates its effectiveness. Xu et al. classified brain signals with gradient boosting using combined fractal-measure and LBP features; selecting the 41 channels with the highest precision yielded 95% accuracy. Zhao et al. used band power for channel selection and feature extraction: eleven channels with distinctive features were selected from the initial channels, principal component analysis was used to reduce the feature dimension, and FLDA was used for classification, achieving 94% accuracy, although the algorithm has high complexity. Ince et al. proposed an adaptive classification scheme that generates a structured redundant feature dictionary based on the dual-tree undecimated wavelet packet transform (UDWT) and uses linear discriminant analysis (LDA) classifiers; it achieved 93% accuracy with only three features, but its feature subset selection increases algorithm complexity. Wei et al. selected the optimal channels with a genetic algorithm, extracted power features with the common spatial pattern (CSP), and classified them with FLDA, achieving 90% accuracy with seven channels. Compared with these methods, our algorithm achieves higher classification accuracy and specificity.
5. Conclusions
This study describes three optimization schemes for motor imagery-based BCI. MLPSO was used to optimize channel and feature selection, channel selection, and feature selection, respectively, with MST-based PSD features and BLDA classification. The feature selection scheme and the hybrid channel-feature selection scheme achieved 99% classification accuracy, shortened the test time by more than 90%, increased the Kappa value from 0.78 to 0.98, and reached 100% specificity, the best reported level. The results show that the channel and feature selection scheme can accelerate the search for the global optimum and reduce the training time. Owing to its excellent performance, the proposed optimization scheme can serve as a reference for research on real-time BCI application systems.
Data Availability
The BCI Competition III dataset I is available at http://bbci.de/competition/iii/.
Conflicts of Interest
The authors declare no conflicts of interest.
Acknowledgments
This work was supported by the Shandong Key Research and Development Project (2017GGX10102), the National Natural Science Foundation of China (no. 61701270), and the Program for Youth Innovative Research Team in University of Shandong Province, China (no. 2019KJN010).
References
Y. Li, J. Pan, F. Wang, and Z. Yu, "A hybrid BCI system combining P300 and SSVEP and its application to wheelchair control," IEEE Transactions on Bio-Medical Engineering, vol. 60, no. 11, pp. 3156–3166, 2013.
S. U. Kumar and H. H. Inbarani, "PSO-based feature selection and neighborhood rough set-based classification for BCI multiclass motor imagery task," Neural Computing and Applications, vol. 28, no. 11, pp. 3239–3258, 2017.
J. Xu and G. Zuo, "Motor imagery electroencephalogram feature selection algorithm based on mutual information and principal component analysis," Journal of Biomedical Engineering, vol. 33, no. 2, pp. 201–207, 2016.
T. N. Lal, T. Hinterberger, G. Widman et al., "Methods towards invasive human brain computer interfaces," Advances in Neural Information Processing Systems, vol. 17, pp. 737–744, 2005.
F. Xu, W. Zhou, Y. Zhen, and Q. Yuan, "Classification of ECoG with modified S-transform for brain-computer interface," Journal of Computational Information Systems, vol. 10, no. 18, pp. 8029–8041, 2014.
H. B. Zhao, C. Liu, C. Y. Yu, and H. Wang, "Channel Selection and feature extraction of ECoG-based brain-computer interface using band power," Applied Mechanics and Materials, vol. 44, pp. 3564–3568, 2011.
Q. Wei and W. Tu, "Channel selection by genetic algorithms for classifying single-trial ECoG during motor imagery," in Proceedings of the 30th IEEE Engineering in Medicine and Biology Society, pp. 624–627, Vancouver, Canada, April 2008.