Research Article  Open Access
Shidong Lian, Jialin Xu, Guokun Zuo, Xia Wei, Huilin Zhou, "A Novel Time-Incremental End-to-End Shared Neural Network with Attention-Based Feature Fusion for Multiclass Motor Imagery Recognition", Computational Intelligence and Neuroscience, vol. 2021, Article ID 6613105, 16 pages, 2021. https://doi.org/10.1155/2021/6613105
A Novel Time-Incremental End-to-End Shared Neural Network with Attention-Based Feature Fusion for Multiclass Motor Imagery Recognition
Abstract
In the research of motor imagery brain-computer interfaces (MI-BCI), traditional electroencephalogram (EEG) signal recognition algorithms appear to be inefficient at extracting EEG signal features and improving classification accuracy. In this paper, we discuss a solution to this problem based on a novel step-by-step method of feature extraction and pattern classification for multiclass MI-EEG signals. First, the training data from all subjects is merged and enlarged through an autoencoder to meet the need for massive amounts of data while reducing the adverse effect on signal recognition caused by the randomness, instability, and individual variability of EEG data. Second, an end-to-end sharing structure with an attention-based time-incremental shallow convolution neural network is proposed. The shallow convolution neural network (SCNN) and the bidirectional long short-term memory (BiLSTM) network are used to extract the frequency-spatial domain features and the time-series features of EEG signals, respectively. Then, the attention model is introduced into the feature fusion layer to dynamically weight these extracted temporal-frequency-spatial domain features, which greatly contributes to the reduction of feature redundancy and the improvement of classification accuracy. At last, validation tests using the BCI Competition IV 2a data sets show that the classification accuracy and kappa coefficient reach 82.7 ± 5.57% and 0.78 ± 0.074, which strongly proves the advantages of the method in improving classification accuracy and reducing individual difference among different subjects on the same network.
1. Introduction
The brain-computer interface (BCI) is a communication control system established between the brain and external devices through the signals generated by brain activity. Creating direct communication between the brain and the external device, the system relies not on muscles or peripheral nerves but on the central nervous system [1]. Motor imagery (MI) is a psychological process in which an individual mentally simulates body movements. During the performance of different MI tasks, when a certain area of the cerebral cortex is activated, the metabolism and blood flow of this area increase. Meanwhile, simultaneous information processing leads to an amplitude decrease or even blocking of the EEG in its mu and beta spectrum oscillations. This electrophysiological concept is called event-related desynchronization (ERD). In contrast, the phenomenon of a manifest amplitude increase of the mu and beta oscillations, which appears in resting or inert states, is called event-related synchronization (ERS) [2].
The purpose of MI-BCI is to identify imagined movements by classifying the electroencephalogram (EEG) characteristics of the brain in order to control external devices such as robots [3, 4]. On the one hand, MI-BCI can help patients with severe dysfunction establish communication channels with the outside world. On the other hand, to some extent, it can activate the brain region to promote the remodeling of the patient's central nervous system [5]. In contrast to traditional rehabilitation training, it can improve the patient's subjective initiative to achieve the rehabilitation effect, which overcomes the defect of the passive and monotonous means of traditional rehabilitation [6]. Therefore, MI-BCI has a growing potential value in the fields of motor function assistance and motor neurorehabilitation. However, the high complexity and instability of EEG signals make the feature extraction and pattern classification of the signals very challenging.
A very important part of the MI-BCI system is how to classify the EEG characteristics of the MI task correctly and convert them into external control instructions [4]. At present, traditional MI-EEG signal feature extraction is mainly based on ERD/ERS in the µ band (8–12 Hz) and the β band (16–31 Hz), including signal band-pass filtering [7], autoregressive models [8], frequency domain statistics [9], phase-locking value (PLV) [10], wavelet transformation and wavelet-packet transformation [11, 12], information entropy [13], and common spatial pattern (CSP) [14]. Based on the above methods, Li et al. [15] used the wavelet-packet transform (WPT) to analyze and rebuild MI-EEG signals and extract the energy characteristics of the µ band and the β band. Zhang et al. [16] analyzed MI-EEG signals and extracted temporal and spatial features by using a one-versus-rest filter. However, traditional feature extraction relies on manual selection of specific frequency bands, and the features are very limited, which may lose part of the EEG information. In addition, the pattern classification methods for MI-EEG signals include linear discriminant analysis (LDA) [17], Bayesian linear discriminant analysis (BLDA) [18], logistic regression (LR) [19], support vector machine (SVM) [20], and neural network (NN) [21]. The classification performance of these methods depends on the quality of feature extraction.
In more recent years, deep learning has made excellent achievements in the fields of speech recognition, image recognition, and natural language processing [22, 23]; it has been used as a good machine learning method in these fields for its advantage of self-learning of features [24–26]. Therefore, deep learning is also gradually being used in the feature extraction and pattern classification of EEG signals; in some cases it not only improves the accuracy but also provides a new method to learn features from EEG data [27, 28]. For example, Tang et al. [29] investigated how a convolution neural network (CNN) displayed spectral features of series of MI-EEG samples. Yang et al. [30] used augmented CSP to extract the spatial features and a CNN to learn deep structural features for MI-EEG classification on the BCI Competition IV data sets, which revealed that feature extraction no longer relied on manual ways. Tabar and Halici [31] proposed a kind of deep learning method to classify MI-EEG signal patterns: a CNN was used to extract features, and then a stacked autoencoder (SAE) was used to classify the extracted features. However, MI-EEG signals are time series with strong time-varying characteristics, and CNNs are not completely suitable for learning time-series features. Therefore, Lee and Choi [32] used the continuous wavelet transform to extract the temporal-frequency features of MI-EEG signals and classified them with a CNN. Zhou et al. [33] adopted an approach based on the wavelet packet and a long short-term memory (LSTM) neural network, which divided MI-EEG signals into several categories through amplitude features and time-series information. An et al. [34] researched a deep belief network (DBN) based on the restricted Boltzmann machine (RBM) combined with the fast Fourier transform (FFT) for MI-BCI pattern recognition, and the results were significantly better than those of the traditional SVM-based algorithm.
However, these methods simply extracted the temporal domain, frequency domain, or temporal-frequency domain features and did not fully extract the EEG signal features. Many other deep learning methods have also been used in the recognition of MI-EEG signals, but their network structures are overly complex.
To sum up, all the above deep learning methods used in the recognition of MI-EEG signals do not take full advantage of self-learning of features, as they still manually select features of specific frequency bands before pattern classification. Because the features selected manually are very limited and the objective function of feature extraction differs from that of pattern classification, information loss occurs easily. What is more, multiclass MI-BCI classification mainly adopts a splitting strategy; the whole process is extremely cumbersome and the classification accuracy is not high. Besides, the signal-to-noise ratio of MI-EEG signals is relatively low, and the data of the same person in the same task exhibits randomness, instability, and individual variability, which gives a network trained with small-sample data sets limitations. To reduce these limitations, a multimodal neural network is designed to form a novel end-to-end shared neural network in this paper. The main contributions are as follows:
(1) Since the sample size of the BCI Competition IV 2a data sets is small, a 1 s time window is used to intercept the training data, and then an autoencoder (AE) network is used to enlarge the sample size of all subjects' training data, which is intercepted and merged in advance. This meets the neural network's requirement for a large amount of training data and effectively reduces the adverse effect on signal recognition caused by the randomness, instability, and individual variability of EEG data.
(2) To ensure the classification results of MI-EEG signals, a novel convolution neural network structure named the shallow convolution neural network (SCNN) is proposed to extract the different-dimensional frequency-spatial domain features. Because of its simple structure and fewer parameters, the training model is not prone to overfitting. Furthermore, the EEG signals processed in the frequency-spatial domain are input into a bidirectional long short-term memory (BiLSTM) network to extract the time-series features, so that the features in the MI-EEG signals are fully extracted. Finally, to reduce the redundancy of the fusion features and improve the classification accuracy, the attention model is introduced into the feature fusion layer to dynamically weight the extracted temporal-frequency-spatial domain features.
(3) Through the proposed multimodal neural network, the training data of all subjects is used to train the end-to-end shared neural network, which is then tested with the test data of each subject and compared with state-of-the-art methods in the MI-EEG recognition field to prove its higher classification accuracy and minimal individual difference.
The structure of this paper is as follows: Section 1 is the introduction. Section 2 describes the data sets and the details of the neural network method that we proposed. The experiment and its results are presented in Section 3. The discussion is presented in Section 4. Finally, Section 5 is the conclusion.
2. Materials and Methods
Different from images and videos, MI-EEG signals are time series with strong time-varying characteristics, which carry a mass of information even though the amount of data is not large. In addition, their signal-to-noise ratio is low, and randomness, instability, and individual variability still exist in the process of signal acquisition. What is more, with the increase in the number of MI-EEG classes, the strategy of splitting the multiclass task and the method of classifying patterns after feature extraction were both introduced in the past, but it is still difficult to improve the classification accuracy. To solve these problems, we put forward an attention-based time-incremental end-to-end shared neural network, as shown in Figure 1. With a combination of the SCNN network and the BiLSTM network, and an attention model introduced into the feature fusion process, it is practicable to carry out feature extraction and pattern classification in a step-by-step way on the temporal-frequency-spatial domain features of multiclass MI-EEG signals. This method is simply called the SCNN-BiLSTM network based on attention.
Before feature extraction and pattern classification are carried out in a step-by-step way, the training data of all subjects is expanded using the AE network. Then, the abstract different-dimensional frequency-spatial domain features are extracted by the different convolutional kernels of the SCNN network, and the time-series features are extracted through the BiLSTM network with time increments; after that, all the temporal-frequency-spatial domain features are combined with the attention mechanism. Finally, the above fusion features are input to the output layer of the network for classification. During the training of this attention-based time-incremental end-to-end shared neural network, the convolution layers and the recurrent layers receive the back-propagated error of the output layer at the same time, and the gradient descent driven by the error gradually spreads to the front of the network. So, after many iterations, the network parameters are gradually updated, and the error becomes smaller and smaller.
2.1. Data Description and Processing
In this paper, the data sets are taken from the four-class MI-EEG data of left hand, right hand, foot, and tongue in BCI Competition IV 2a of 2008 [35]. In the data sets, the EEG data of 9 subjects, labeled A01–A09, was recorded with 22 Ag/AgCl electrodes. The sampling frequency is 250 Hz, band-pass filtering is set between 0.5 Hz and 100 Hz, and line noise is suppressed by a 50 Hz notch filter. Each experiment consists of two sessions: the first session is for training and the second session is for testing. Each session comprises six runs, and one run contains 48 trials (12 for each of the four possible classes), resulting in 288 trials per session. The timing scheme of experimental data acquisition is shown in Figure 2.
Firstly, we select 2 s–6 s data from the training data sets T and intercept it with a time window of 1 s. After processing, the training data sets T of 9 subjects are merged and then enlarged with AE network. Secondly, to accelerate the convergence speed of the network, prevent interference caused by abnormal EEG data, and avoid unnecessary numerical problems, the segmented training data is standardized. What is more, to increase the stability of the network, the training data sets T with training labels are reordered randomly.
Particularly, in the process of data standardization, we standardize the EEG data based on the mean and standard deviation of the raw data, so as to avoid the influence of outliers and extreme values in the data through centralization. The processed EEG data conforms to the standard normal distribution with mean 0 and standard deviation 1.
The training sets and testing sets are both standardized before being input into the network as follows:

$$\tilde{X}_T = \frac{X_T - \mu}{\sigma}, \qquad \tilde{X}_E = \frac{X_E - \mu}{\sigma},$$

where $X_T$ is the segmented training data, $X_E$ is the segmented testing data, and $\mu$ and $\sigma$ are the mean and standard deviation of the corresponding raw data.
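As an illustration, this z-score standardization can be sketched in a few lines of NumPy; the function name is ours, not the paper's:

```python
import numpy as np

def standardize(x):
    # Z-score: subtract the mean and divide by the standard deviation,
    # so the processed data has mean 0 and standard deviation 1.
    return (x - x.mean()) / x.std()
```

Applied per segment, this centers the EEG data and suppresses the influence of outliers and extreme values, as described above.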
Also, during a single trial of BCI Competition IV 2a data, at t = 2 s the prompt arrow appeared and lasted for 1.25 s, while the subjects observed it and imagined the corresponding action. From t = 3.25 s to 6 s, the subjects imagined the corresponding action. Because EEG signals are instantaneous and susceptible to interference, the ERD/ERS characteristics of the subjects' motor imagery EEG are uncertain during the transition from the preparation stage to the imagination stage, which easily causes invalid edge data. Therefore, the method proposed in this paper is finally verified on the motor imagery data of 4 s–5 s, i.e., at a 0.75 s interval after t = 3.25 s.
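The 1 s time-window interception described above can be sketched as follows; the 250 Hz sampling rate, 1 s window, and 0.75 s step come from the paper's description, while the function name and array layout (channels × samples) are our assumptions:

```python
import numpy as np

def sliding_windows(trial, fs=250, win_s=1.0, step_s=0.75):
    # Cut one (channels, samples) trial into overlapping 1 s windows.
    win, step = int(win_s * fs), int(step_s * fs)
    n = trial.shape[1]
    return np.stack([trial[:, i:i + win]
                     for i in range(0, n - win + 1, step)])
```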
2.2. Data Expansion Based on Autoencoder
Deep learning methods need a large amount of data to train the network models. However, the data samples of BCI Competition IV 2a are few. In the meantime, the different recording periods and electrode cap sizes during the data acquisition process give each subject's EEG signals randomness and instability. Therefore, for the training sets, we select the data of 2 s–6 s, intercept it with the 1 s time window, and then use the autoencoder (AE) network to enlarge the data by generating reconstructed data from the real training data. The AE is a three-layer neural network composed of an input layer, a hidden layer, and an output layer [36], as shown in Figure 3. After processing, the enlarged data satisfies the neural networks' need for a mass of training data and improves the robustness of the network model, while effectively reducing the adverse effect on signal recognition caused by the randomness, instability, and individual variability of EEG data.
In the AE network, the output layer $\hat{x}$ has the same size as the input layer $x$, so $\hat{x}$ can be considered as an approximation of $x$. $f$ and $g$ represent the encoding and decoding functions, respectively. The encoding and decoding procedures are as follows:

$$h = f(W x + b), \qquad \hat{x} = g(W' h + b'),$$

where $h$ denotes the hidden layer information, $W$ denotes the weight of the input layer to the hidden layer, and $W'$ denotes the weight of the hidden layer to the output layer; $b$ and $b'$ are the biases of the hidden layer and the output layer, respectively; $f$ and $g$ are the activation functions of the encoding and decoding procedures, respectively. Here, both $f$ and $g$ adopt the sigmoid function. And, to simplify the calculation, let $W' = W^{\mathsf{T}}$.
In this paper, the AE network first encodes the real training data to reduce the data dimension, and the important features of the data are retained through unsupervised learning. Then, the encoded data is decoded to obtain the reconstructed data. Finally, the average value of the reconstruction error function between the reconstructed data and the real training data, namely, the loss cost function, is calculated to measure the similarity between them. The smaller the loss cost function, the more similar the reconstructed data is to the real training data. However, in the process of learning the network parameters $(W, b)$, the value of the loss cost function becomes smaller and smaller, which may result in overfitting. Therefore, we adopt cross-entropy in the reconstruction error function to suppress overfitting and obtain an AE network with strong generalization ability. The reconstruction error function is defined as follows:

$$L(x, \hat{x}) = -\sum_{i=1}^{n} \left[ x_i \log \hat{x}_i + (1 - x_i) \log(1 - \hat{x}_i) \right].$$
For the entire training set $\{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$, the overall loss cost function is

$$J(W, b) = \frac{1}{m} \sum_{k=1}^{m} L\left(x^{(k)}, \hat{x}^{(k)}\right).$$
The function $J(W, b)$ is minimized to obtain the parameters $(W, b)$ by the gradient descent method.
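A minimal NumPy sketch of such an AE, assuming tied weights ($W' = W^{\mathsf{T}}$), sigmoid activations, and the cross-entropy loss above; the layer sizes, learning rate, and random stand-in data are illustrative only:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ce_loss(X, Xhat, eps=1e-9):
    # Mean cross-entropy reconstruction error over the batch.
    return -np.mean(np.sum(X * np.log(Xhat + eps)
                           + (1 - X) * np.log(1 - Xhat + eps), axis=1))

rng = np.random.default_rng(0)
X = rng.random((50, 16))             # stand-in "EEG" samples in [0, 1]
n_in, n_hid, lr = 16, 8, 0.1
W = rng.normal(0, 0.1, (n_in, n_hid))
b1, b2 = np.zeros(n_hid), np.zeros(n_in)

history = []
for _ in range(300):
    H = sigmoid(X @ W + b1)          # encode
    Xhat = sigmoid(H @ W.T + b2)     # decode with tied weights W' = W^T
    history.append(ce_loss(X, Xhat))
    d2 = (Xhat - X) / len(X)         # output delta (sigmoid + cross-entropy)
    d1 = (d2 @ W) * H * (1 - H)      # hidden delta
    W -= lr * (X.T @ d1 + (H.T @ d2).T)  # tied weights: sum both gradients
    b1 -= lr * d1.sum(axis=0)
    b2 -= lr * d2.sum(axis=0)
```

After training, decoding the learned codes yields reconstructed samples that can be pooled with the real training data, as described above.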
2.3. Shallow Convolution Neural Network
The structure of the convolution neural network (CNN) is different from that of traditional hierarchical connections. The connections between neurons in a CNN are not fully connected; moreover, the shared weights of the convolution kernels reduce the complexity of the network model and the number of weight parameters to train, making it easier to train than earlier neural networks [37].
Nevertheless, compared with the information volume of images and videos, that of EEG signals is very small. Besides, an EEG signal is a kind of nonstationary, random, very weak, low-signal-to-noise-ratio signal with an unstable waveform. When classifying EEG signals, we found that too many convolutional layers in a CNN can easily lead to overfitting of the training model. Therefore, it is crucial to structure a suitable CNN model. In this paper, the abstract different-dimensional frequency-spatial domain features are extracted by the different convolutional kernels of a shallow CNN network. The design of the SCNN adopts the principle of the Visual Geometry Group (VGG) network [38]. It is a special CNN with a simple structure and few parameters. The training model is not prone to overfitting and can directly extract the frequency-spatial domain features from the EEG data. The structure of the SCNN is shown in Figure 4. The details of the structure used in this paper are shown in Table 1 in Section 3.

The input is a one-dimensional feature vector $x$ of length $n$ corresponding to the EEG signals of $C$ channels; the convolution layer is composed of $K$ convolution kernels, the size of each convolution kernel is $k$, the coefficients of the convolution kernel are $w$, and the output is $y$, where

$$y_j = \varphi\left( \sum_{i=1}^{k} w_i \, x_{j+i-1} + b \right),$$

where $b$ denotes the bias of the convolution kernel and $\varphi$ denotes the nonlinear activation function, which adopts the Leaky ReLU function [39, 40].
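A single convolution kernel with the Leaky ReLU activation can be sketched as follows; the kernel values and the slope alpha are illustrative:

```python
import numpy as np

def leaky_relu(z, alpha=0.01):
    # Leaky ReLU passes positive values and scales negatives by alpha.
    return np.where(z > 0, z, alpha * z)

def conv1d(x, w, b):
    # Valid 1-D convolution of input x with one kernel w plus bias b,
    # followed by the Leaky ReLU activation.
    k = len(w)
    out = np.array([np.dot(x[i:i + k], w) + b
                    for i in range(len(x) - k + 1)])
    return leaky_relu(out)
```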
In a single SCNN network structure, the network connects one or more fully connected layers and a Softmax output layer after multiple convolution, pooling, and dropout layers. Supposing that the network has $L$ layers in total, where the $(L-1)$-th layer is the fully connected layer, the $L$-th layer is the final output layer, and the number of output cells is the number of classification categories $N$, the entire calculation process is as follows:

$$z = W^{(L)} h^{(L-1)} + b^{(L)}, \qquad p(y = c \mid x) = \mathrm{softmax}(z)_c = \frac{e^{z_c}}{\sum_{j=1}^{N} e^{z_j}},$$

where $h^{(L-1)}$ denotes the output of the convolution network's hidden layers, $W^{(L)}$ and $b^{(L)}$ are the learning parameters of the network, $z$ is the value not yet activated before the last output layer, and $p(y = c \mid x)$ is the posterior probability that the input $x$ belongs to category $c$. The label for each input's EEG signal category is $y$. For all samples in the training sets, cross-entropy is taken as the objective function to optimize.
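The Softmax posterior and the cross-entropy objective of this output layer can be sketched as:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(p, one_hot):
    # Cross-entropy between predicted probabilities and a one-hot label.
    return -np.sum(one_hot * np.log(p + 1e-12), axis=-1)
```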
2.4. Bidirectional Long Short-Term Memory Network
EEG signals are not images in the traditional sense but time series with a strong correlation in time. The SCNN network is not fully suitable for learning the time-series features of EEG signals; recurrent neural networks, however, have certain advantages in that regard [41, 42]. Therefore, in this paper, the BiLSTM network with time increments, which is a kind of recurrent neural network, is connected in series after the multiple convolution, pooling, and dropout layers of the SCNN network and before the fully connected layer. Different from the traditional unidirectional LSTM network, the BiLSTM network improves on the network structure so as to handle gradient disappearance well, and it more fully extracts the information of each time point, which is suitable for EEG processing in the temporal domain. The input at each moment of the BiLSTM network comes from the information transmitted by the hidden layers in the forward and backward directions, and the network then combines the outputs of the forward and backward hidden layers to obtain its final output at each moment.
In this paper, to reduce the local convergence caused by too few layers and the gradient disappearance caused by too many layers, a two-layer BiLSTM network is designed so as to converge more quickly and effectively reduce the gradient disappearance caused by too deep propagation between layers. The network's structure and principle are shown in Figure 5.
The BiLSTM network acts as a unidirectional LSTM network when performing the forward calculation, and the forward calculation requires the input data before the current time. The forward calculation of the network is as follows:

$$\overrightarrow{h}_t = f\left( W_1 x_t + W_2 \overrightarrow{h}_{t-1} + \overrightarrow{b} \right).$$
When performing the backward calculation, it is associated with the future input data after the current time. The backward calculation of the network is as follows:

$$\overleftarrow{h}_t = f\left( W_3 x_t + W_4 \overleftarrow{h}_{t+1} + \overleftarrow{b} \right).$$
The LSTM networks in the forward and backward directions each maintain their own state information, there is no connection between them, and the unfolded diagram of the network is not a circular structure. After the state information coming from both directions is superimposed simultaneously, the output layer can be calculated. The overall calculation of the network is as follows:

$$y_t = g\left( W_5 \overrightarrow{h}_t + W_6 \overleftarrow{h}_t + b_y \right),$$

where $x_t$ and $y_t$ denote the input and the output layers, respectively; the $W$'s and $b$'s represent the network's weights and biases, respectively.
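A minimal sketch of this bidirectional recurrence, with a plain tanh cell standing in for the LSTM cell for brevity (the forward pass, the backward pass, and the per-step combination mirror the equations above; all shapes and parameter names are illustrative):

```python
import numpy as np

def birnn(X, Wx_f, Wh_f, Wx_b, Wh_b, Wy_f, Wy_b, by):
    # Forward pass over t = 1..T and backward pass over t = T..1,
    # then combine both hidden states into the output at every step.
    T = len(X)
    hf = np.zeros(Wh_f.shape[0])
    hb = np.zeros(Wh_b.shape[0])
    fwd, bwd = [], []
    for t in range(T):                 # forward direction
        hf = np.tanh(X[t] @ Wx_f + hf @ Wh_f)
        fwd.append(hf)
    for t in reversed(range(T)):       # backward direction
        hb = np.tanh(X[t] @ Wx_b + hb @ Wh_b)
        bwd.append(hb)
    bwd.reverse()
    return np.stack([f @ Wy_f + b @ Wy_b + by
                     for f, b in zip(fwd, bwd)])
```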
When the BiLSTM network is combined in series with the SCNN network, the calculation for the whole network is as follows:

$$s = W_s h_{\mathrm{SCNN}} + b_s, \qquad o = \sum_{t=1}^{T} y_t,$$

where $h_{\mathrm{SCNN}}$ is the hidden layer information output from the whole SCNN network. After a linear transformation, $s$ is a set of effective EEG features for the different categories, extracted by the network from the input EEG data. Carrying the hidden features of the EEG signals in the temporal dimension, $o$ is the synthesis of the outputs of the BiLSTM network over all time nodes, and it can largely reflect the changes of the EEG signals in the temporal dimension. Then, the frequency-spatial domain features extracted by the SCNN network and the time-series features extracted by the BiLSTM network are synthesized in the feature fusion layer, and the fusion features of the temporal-frequency-spatial domain are obtained. Finally, the highly abstract features that have undergone multiple convolutions and recurrences are fused after a linear transformation. The relative proportions of the "good" and "bad" features are adjusted by learning weights from the training data; the weighted features are then sent into the output layer for the probability calculation of each category.
The above feature fusion is only synthesized along the direction of a one-dimensional vector. The frequency-spatial domain features extracted by the SCNN network and the time-series features extracted by the BiLSTM network have some redundancy. Mechanically synthesized, the fusion features are redundant, which slows down the network training and then spoils the final classification effect. Therefore, in this paper, the attention mechanism is added to process the fusion features.
2.5. Attention Mechanism
In cognitive science, to reasonably use the finite resources of visual information processing, humans usually choose to ignore part of the information and pay attention to the more critical part of all the information; that is to say, the brain's attention is focused on a specific visual area. This mechanism is called the attention mechanism [43]. In this paper, the feature fusion process is optimized through the attention mechanism. The frequency-spatial domain features extracted by the SCNN network and the time-series features extracted by the BiLSTM network are fused, and the importance degree of the fusion features is calculated to obtain effective attention, so as to realize the automatic classification of MI-EEG signals by more effectively fusing the temporal-frequency-spatial domain features.
In traditional sequence-to-sequence learning, the encoder-decoder structure is often used, as shown in Figure 6.
The encoder encodes the input sequence $(x_1, x_2, \ldots, x_T)$ to get the intermediate state information $C$ and then uses this intermediate vector as the input of the decoder to get the output of each sequence at the decoding end. The overall process is as follows:

$$C = f(x_1, x_2, \ldots, x_T), \qquad y_t = g(C, y_1, \ldots, y_{t-1}).$$
The output at each moment uses the same context semantic vector C. However, in the process of sequence encoding and decoding, we hope that the context semantic vector for each moment's output is an appropriate vector, so the attention mechanism is introduced to select the appropriate context semantic vector according to the output at different moments. The attention model is shown in Figure 7.
The decoding process for the attention model is as follows:

$$y_t = g(C_t, y_1, \ldots, y_{t-1}),$$

where $C_t$ is the added attention; its role is to associate the output with the relevant input and to calculate the correlation between the current output and all inputs; then,

$$C_t = \sum_{j=1}^{T} \alpha_{tj} h_j,$$

where $h_j$ is the hidden layer information at the $j$-th position of the encoder's input. In this paper, the SCNN outputs the EEG signal with frequency-spatial domain features as the input of the encoder BiLSTM; that is, $h_j = [\overrightarrow{h}_j; \overleftarrow{h}_j]$, where the forward and backward hidden layer information are synthesized. The weight $\alpha_{tj}$ identifies the relevancy of the input sequence to the current output sequence; it is a normalized probability value, meaning the probability of the relationship between the $j$-th item of the input and the output at the current moment. The weight $\alpha_{tj}$ is defined as

$$\alpha_{tj} = \frac{\exp(e_{tj})}{\sum_{k=1}^{T} \exp(e_{tk})}.$$
The definition of $\alpha_{tj}$ introduces the score $e_{tj} = a(s_{t-1}, h_j)$, where $a$ is a feedforward neural network jointly determined by the state information $s_{t-1}$ of the hidden layer at the decoding end and $h_j$ of the hidden layer at the encoding end. In the SCNN-BiLSTM network based on attention that we designed above, the attention module is an additional neural network that gives different weights to each part of the fusion features and is more sensitive to the classification target; it can effectively enhance the performance of the whole neural network in a natural way.
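This additive attention can be sketched as follows, with a one-layer feedforward network as the score function; all parameter shapes and names are illustrative:

```python
import numpy as np

def attention_context(s_prev, H, Wa, Ua, va):
    # Score each encoder hidden state h_j against the previous decoder
    # state s_prev, softmax-normalize the scores into weights alpha,
    # and return the weighted context vector C_t.
    e = np.tanh(s_prev @ Wa + H @ Ua) @ va   # scores e_tj
    alpha = np.exp(e - e.max())
    alpha /= alpha.sum()                     # weights sum to 1
    return alpha @ H, alpha
```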
3. Experiment and Results
We refine the model of the SCNN-BiLSTM network based on attention designed in Section 2 and then train and test the model to verify its superiority in MI-EEG multiclass recognition. The model is trained and tested on an Intel 3.6 GHz Core i7-10700F CPU with 16 GB RAM and an NVIDIA GeForce RTX 2060 GPU.
The details of the network model are shown in Table 1.
For the above deep neural network model, the mini-batch gradient descent method is used for network training. To accelerate convergence, the Adam optimizer is used for the network model so that it converges to the optimal value [44]. In the training of the model, the setting of the learning rate and the selection of the mini-batch size affect the model's final accuracy and training speed, so in this paper we fix the other parameters of the model, vary the size and attenuation frequency of the learning rate, set the mini-batch size to different values, and compare the final accuracy of the model. After multiple comparative experiments, the mini-batch size is set to 200 and the initial value of the learning rate is set to 0.001.
While training the neural network, random dropout and padding strategies are used. Among them, the random dropout strategy for the SCNN network can prevent the network model from overfitting the training data, while the padding strategy makes the output size of the convolution layer equal to the input size to prevent the loss of feature size [45]. In this paper, the random dropout parameter P is 0.2.
Figure 8 shows the training loss rate and accuracy curves of the neural network model over 500 training iterations. It can be seen that, after 240 iterations, the training accuracy curve converges to 0.9 and the training loss rate is about 0.1.
We test the trained network model on the test data sets E; each point of the high-dimensional data of the four-class MI-EEG features is assigned to a low-dimensional map while avoiding concentration in the center of the map, so as to form a t-Distributed Stochastic Neighbor Embedding (t-SNE) scatter plot [46], as shown in Figure 9. In the t-SNE scatter plot, the classification categories are represented by different colors, and it can be seen that all categories are clearly separated, although some data is hard to identify, which may be caused by interference during data acquisition.
Further, the trained network model is measured by indicators such as accuracy, precision, sensitivity, and specificity, which are calculated as follows:

$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \mathrm{PPV} = \frac{TP}{TP + FP},$$

$$\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{TNR} = \frac{TN}{TN + FP},$$

where TP represents the number of testing samples whose real value and model-predicted value of the classification category are both positive, TN represents the number of testing samples whose real value and model-predicted value of the classification category are both negative, FP represents the number of testing samples whose real value of the classification category is negative but whose model-predicted value is positive, and FN represents the number of testing samples whose real value of the classification category is positive but whose model-predicted value is negative. Accuracy (ACC) is the proportion of the model's correct judgments among all model predictions on the testing samples. Precision (PPV) is the proportion of the model's correct judgments among the testing samples whose predicted classification category is positive. Sensitivity (TPR) is the proportion of the model's correct judgments among the testing samples whose true classification category is positive. Specificity (TNR) is the proportion of the model's correct judgments among the testing samples whose true classification category is negative. n denotes the number of classification categories. The trained model classifies the test data of the subjects in BCI Competition IV 2a; the classification accuracy rate of each subject and the average classification accuracy rate of all subjects are shown in Figure 10.
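The four indicators can be computed directly from the confusion counts defined above (per class, one-versus-rest); the function name is ours:

```python
def classification_metrics(tp, tn, fp, fn):
    # Accuracy, precision, sensitivity, and specificity.
    acc = (tp + tn) / (tp + tn + fp + fn)
    ppv = tp / (tp + fp)
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return acc, ppv, tpr, tnr
```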
4. Discussion
Our method of the SCNN-BiLSTM network based on attention is compared with the methods in the literature [16, 31, 47–52], and the classification accuracy of each method is measured by the kappa coefficient. In the classification problem, the higher the kappa coefficient [53], the higher the classification accuracy. The kappa coefficient is calculated as follows:

$$\kappa = \frac{p_0 - p_e}{1 - p_e}, \qquad p_e = \frac{1}{n},$$

where $n$ denotes the number of known categories, $p_e$ is the chance-level accuracy, and $p_0$ is the average classification accuracy.
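For a balanced n-class problem with chance level $p_e = 1/n$, the kappa coefficient can be computed as follows (the function name is ours):

```python
def kappa(p0, n_classes):
    # Cohen's kappa: agreement above chance, normalized by 1 - chance.
    pe = 1.0 / n_classes
    return (p0 - pe) / (1.0 - pe)
```

Under this chance-level assumption, for instance, an average four-class accuracy of 0.827 corresponds to a kappa of about 0.77.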
For an overview of the literature methods, see Table 2. The literature in [47] proposed the Filter Bank Common Spatial Pattern (FBCSP) method to extract features of MI-EEG signals and adopted a "one-versus-rest" multiclassification mechanism, which won the 2008 International Brain-Computer Interface Competition. The literature in [48] proposed an automatic method for the classification of general artifactual source components, a form of Independent Component Analysis (ICA) for artifact removal in MI-EEG signals, with a classification accuracy of 69.7 ± 14.2%. The literature in [49] proposed a method based on spectral regression kernel discriminant analysis (SRKDA), with a classification accuracy of 78.4 ± 14.0%. The literature in [50] proposed a method combining CSP and Local Characteristic-scale Decomposition (LCD) to extract features of MI-EEG signals, with a classification accuracy of 80.2 ± 8.10%. The literature in [51] proposed an adaptive Stacked Regularized Linear Discriminant Analysis (SRLDA) method to analyze the temporal, spatial, and spectral information of MI-EEG signals; the results showed that adaptive SRLDA was superior to Data Space Adaptation (DSA) based on Kullback-Leibler divergence. However, the abovementioned methods rely entirely on current human understanding of EEG signals and require relevant professional knowledge during feature extraction, which makes the feature extraction overly complicated and the classification effect poor. The literature in [52] proposed a method based on the combination of wavelet transformation and a 2-layer CNN, with a classification accuracy of 81.2 ± 28.5%. The literature in [16] proposed using the "one-versus-rest" Filter Bank Common Spatial Pattern (OVR-FBCSP) mode to extract preliminary features of MI-EEG signals; then, CNN and LSTM networks were applied to re-extract and classify those preliminary features. The classification accuracy was 83.0 ± 8.34%.
Although these methods have achieved some accomplishments, they did not fully utilize the self-learning capability of deep learning and still followed the idea of manually extracting features first and then classifying patterns. Tabar and Halici [31] and Amin et al. [37] proposed new deep learning methods of feature extraction and pattern classification for MI-EEG signals, but the classification accuracy was not high: 66.2 ± 11.2% and 74.5 ± 10.1%, respectively.

In this paper, we propose an attention-based time-incremental end-to-end shared neural network. After extracting the frequency-spatial domain features with the SCNN network and the time-series features with the time-incremental BiLSTM network, the method effectively learns the temporal-frequency-spatial domain features of MI-EEG signals. Finally, an attention mechanism added to the feature fusion layer dynamically weights the extracted temporal-frequency-spatial domain features, reducing the redundancy of the fused features and raising the classification accuracy to 82.7 ± 5.57%. The results of the comparison between our method and the literature methods are shown in Figure 11. Panels (a) and (b) show that, compared with both the non-deep-learning and deep learning methods, our method has the smallest individual difference among subjects, and panels (c) and (d) show that the classification accuracy of every subject exceeds 73.3%. In other words, our method greatly improves the overall accuracy across all 9 subjects. Therefore, our method is more suitable for multiclass recognition of MI-EEG signals, which are short time series, and is very effective in improving overall classification and recognition.
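The dynamic weighting in the feature fusion layer can be sketched as a softmax-weighted sum of the branch feature vectors. The vectors and relevance scores below are illustrative stand-ins (in the actual network the scores come from a learned attention layer), not values from the paper:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_fuse(features, scores):
    # Weight each feature vector by a softmax over its relevance score
    # and sum them into one fused vector (all vectors share one length).
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]

# Toy 4-dimensional features standing in for the SCNN (frequency-spatial)
# and BiLSTM (temporal) branch outputs.
scnn_feat = [0.2, 0.8, 0.1, 0.5]
lstm_feat = [0.6, 0.1, 0.9, 0.3]
fused = attention_fuse([scnn_feat, lstm_feat], scores=[1.2, 0.4])
```

A branch with a higher score receives a larger weight, so redundant or less informative features contribute less to the fused representation.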
5. Conclusions
In this paper, we propose an attention-based time-incremental end-to-end shared neural network, which is essentially an end-to-end trainable model formed by unifying the SCNN network, the BiLSTM network, and an attention mechanism. With this end-to-end shared neural network, the feature extraction and pattern classification of MI-EEG signals are performed step by step, which effectively improves the accuracy and robustness of EEG recognition.
In much research on deep neural network methods for EEG signals, the amount of training data is insufficient for training the network, and EEG signals are usually treated as images, which may lose temporal information. Moreover, simply stacking networks can cause feature redundancy. To solve these issues, the method in this paper proceeds in the following steps. First, the training data of all subjects is merged and the sample count is then increased with an autoencoder, which meets deep learning's need for a large amount of training data while effectively reducing the adverse effect of the randomness, instability, and individual variability of EEG data on signal recognition. Second, the time-incremental BiLSTM network is connected in series with the SCNN network, so that feature extraction of MI-EEG successively uses the SCNN network for the frequency-spatial domain and the BiLSTM network for the temporal domain; this allows the temporal-frequency-spatial domain features of MI-EEG to be fully learned and underpins the classification results. Third, the attention mechanism introduced into the dynamically weighted feature fusion of MI-EEG reduces the redundancy of the fused features and improves the classification accuracy.
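As a toy illustration of the first step, the sketch below trains a one-hidden-layer linear autoencoder with NumPy on stand-in data and then enlarges the sample set by reconstructing noise-perturbed copies of the trials. The dimensions, learning rate, and noise level are arbitrary assumptions, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for merged MI-EEG trials: 64 samples of dimension 8
# (real trials would be flattened channel-by-time windows).
X = rng.standard_normal((64, 8))

# Linear autoencoder: encoder (8 -> 4) and decoder (4 -> 8) weights.
W_enc = rng.standard_normal((8, 4)) * 0.1
W_dec = rng.standard_normal((4, 8)) * 0.1

def loss(X, W_enc, W_dec):
    # Mean squared reconstruction error.
    R = X @ W_enc @ W_dec
    return np.mean((R - X) ** 2)

lr = 0.01
initial = loss(X, W_enc, W_dec)
for _ in range(500):
    H = X @ W_enc                      # latent codes
    R = H @ W_dec                      # reconstructions
    G = 2 * (R - X) / X.size           # dLoss/dR
    grad_dec = H.T @ G                 # dLoss/dW_dec
    grad_enc = X.T @ (G @ W_dec.T)     # dLoss/dW_enc
    W_enc -= lr * grad_enc
    W_dec -= lr * grad_dec
final = loss(X, W_enc, W_dec)

# Augmentation: pass slightly perturbed trials through the trained
# autoencoder to synthesize extra samples, then merge with the originals.
noise = 0.05 * rng.standard_normal(X.shape)
X_new = (X + noise) @ W_enc @ W_dec
X_aug = np.vstack([X, X_new])          # enlarged training set (128 x 8)
```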
The results of the comparison with traditional non-deep-learning methods and deep learning methods have shown the effectiveness of the proposed end-to-end shared neural network. The method is more suitable for multiclass recognition of MI-EEG signals, which are short time series. It has the smallest individual difference among subjects and is very effective in improving overall classification and recognition.
In the near future, we will continue to focus on the information contained in raw MI-EEG data. Through analysis of the irregularities of both the distribution structure of EEG channels and the EEG data itself, we will further study algorithms for feature extraction, feature fusion, and pattern classification in order to recognize more than four classes of motor imagery tasks and to improve the classification accuracy while reducing the individual difference of the same network model across subjects. Furthermore, we will try to deploy the combination of a brain-computer interface and a limb rehabilitation robot in an online system.
Data Availability
The BCI Competition IV data set 2a is available at http://www.bbci.de/competition/iv/.
Conflicts of Interest
The authors declare no conflicts of interest.
Acknowledgments
The authors thank the Laboratory of Brain-Computer Interfaces at the Graz University of Technology for providing their data. This research was supported by the Zhejiang Provincial Natural Science Foundation of China (no. LQ19F030002) and the “Science and Technology Innovation 2025” Major Special Project of Ningbo (no. 2020Z082).
References
[1] R. Abiri, S. Borhani, E. W. Sellers et al., “A comprehensive review of EEG-based brain-computer interface paradigms,” Journal of Neural Engineering, vol. 16, no. 1, Article ID 011001, 2019.
[2] Y. Aoh, H.-J. Hsiao, M.-K. Lu et al., “Event-related desynchronization/synchronization in spinocerebellar ataxia type 3,” Frontiers in Neurology, vol. 10, p. 822, 2019.
[3] S. Aggarwal and N. Chugh, “Signal processing techniques for motor imagery brain computer interface: a review,” Array, vol. 12, Article ID 100003, 2019.
[4] M. A. Ramírez-Moreno and D. Gutiérrez, “Evaluating a semi-autonomous brain-computer interface based on conformal geometric algebra and artificial vision,” Computational Intelligence and Neuroscience, vol. 2019, Article ID 9374802, 19 pages, 2019.
[5] A. Malekmohammadi, H. Mohammadzade, A. Chamanzar et al., “An efficient hardware implementation for a motor imagery brain computer interface system,” Scientia Iranica, vol. 26, pp. 72–94, 2019.
[6] X. Mao, M. Li, W. Li et al., “Progress in EEG-based brain robot interaction systems,” Computational Intelligence and Neuroscience, vol. 2017, Article ID 1742862, 25 pages, 2017.
[7] V. Vimala, K. Ramar, and M. Ettappan, “An intelligent sleep apnea classification system based on EEG signals,” Journal of Medical Systems, vol. 43, no. 2, p. 36, 2019.
[8] R. Salazar-Varas and R. A. Vazquez, “Evaluating the effect of the cutoff frequencies during the preprocessing stage of motor imagery EEG signals classification,” Biomedical Signal Processing and Control, vol. 54, Article ID 101592, 2019.
[9] S. Tarai, R. Mukherjee, Q. A. Qurratul, B. K. Singh, and A. Bit, “Use of prosocial word enhances the processing of language: frequency domain analysis of human EEG,” Journal of Psycholinguistic Research, vol. 48, no. 1, pp. 145–161, 2019.
[10] M. J. Hülsemann, E. Naumann, and B. Rasch, “Quantification of phase-amplitude coupling in neuronal oscillations: comparison of phase-locking value, mean vector length, modulation index, and generalized linear modeling cross-frequency coupling,” Frontiers in Neuroscience, vol. 13, p. 573, 2019.
[11] Y. Liu, J. Gao, W. Cao et al., “A hybrid double-density dual-tree discrete wavelet transformation and marginal Fisher analysis for scoring sleep stages from unprocessed single-channel electroencephalogram,” Quantitative Imaging in Medicine and Surgery, vol. 10, no. 3, p. 766, 2020.
[12] S. Taran and V. Bajaj, “Motor imagery tasks-based EEG signals classification using tunable-Q wavelet transform,” Neural Computing and Applications, vol. 31, no. 11, pp. 6925–6932, 2019.
[13] Y. Lu, M. Wang, W. Wu, Y. Han, Q. Zhang, and S. Chen, “Dynamic entropy-based pattern learning to identify emotions from EEG signals across individuals,” Measurement, vol. 150, Article ID 107003, 2020.
[14] R. Fu, M. Han, Y. Tian, and P. Shi, “Improvement motor imagery EEG classification based on sparse common spatial pattern and regularized discriminant analysis,” Journal of Neuroscience Methods, vol. 343, Article ID 108833, 2020.
[15] M.-A. Li, M. Zhang, and Y.-J. Sun, “A novel motor imagery EEG recognition method based on deep learning,” in 2016 International Forum on Management, Education and Information Technology Application, Guangzhou, China, 2016.
[16] R. Zhang, Q. Zong, L. Dou et al., “A novel hybrid deep learning scheme for four-class motor imagery classification,” Journal of Neural Engineering, vol. 16, no. 6, Article ID 066004, 2019.
[17] J. Long, J. Wang, and T. Yu, “An efficient framework for EEG analysis with application to hybrid brain computer interfaces based on motor imagery and P300,” Computational Intelligence and Neuroscience, vol. 2017, Article ID 9528097, 6 pages, 2017.
[18] H. Rajaguru and S. K. Prabhakar, “Correlation dimension and Bayesian linear discriminant analysis for alcohol risk level detection,” in Proceedings of the 2019 International Conference on Computational Vision and Bio Inspired Computing, Coimbatore, India, September 2019.
[19] A. Quintero-Rincón, M. Flugelman, J. Prendes et al., “Study on epileptic seizure detection in EEG signals using largest Lyapunov exponents and logistic regression,” Revista Argentina de Bioingeniería, vol. 23, no. 2, pp. 17–24, 2019.
[20] Y. Xu, J. Hua, H. Zhang et al., “Improved transductive support vector machine for a small labelled set in motor imagery-based brain-computer interface,” Computational Intelligence and Neuroscience, vol. 2019, Article ID 2087132, 16 pages, 2019.
[21] S. Raghu, N. Sriraam, Y. Temel, S. V. Rao, and P. L. Kubben, “EEG based multi-class seizure type classification using convolutional neural network and transfer learning,” Neural Networks, vol. 124, pp. 202–212, 2020.
[22] R. Haeb-Umbach, S. Watanabe, T. Nakatani et al., “Speech processing for digital home assistants: combining signal processing with deep-learning techniques,” IEEE Signal Processing Magazine, vol. 36, no. 6, pp. 111–124, 2019.
[23] C. Tian, L. Fei, W. Zheng et al., “Deep learning on image denoising: an overview,” Neural Networks, vol. 131, pp. 251–275, 2020.
[24] S. Kurkin, A. Hramov, P. Chholak et al., “Localizing oscillatory sources in a brain by MEG data during cognitive activity,” in Proceedings of the 2020 4th International Conference on Computational Intelligence and Networks (CINE), IEEE, Kolkata, India, February 2020.
[25] G. Niu, X. Yi, C. Chen et al., “A novel effluent quality predicting model based on genetic-deep belief network algorithm for cleaner production in a full-scale paper-making wastewater treatment,” Journal of Cleaner Production, vol. 265, Article ID 121787, 2020.
[26] S. L. Oh, J. Vicnesh, E. J. Ciaccio, R. Yuvaraj, and U. R. Acharya, “Deep convolutional neural network model for automated diagnosis of schizophrenia using EEG signals,” Applied Sciences, vol. 9, no. 14, p. 2870, 2019.
[27] R. T. Schirrmeister, J. T. Springenberg, L. D. J. Fiederer et al., “Deep learning with convolutional neural networks for EEG decoding and visualization,” Human Brain Mapping, vol. 38, no. 11, pp. 5391–5420, 2017.
[28] M. Li, R. Wang, J. Yang et al., “An improved refined composite multivariate multiscale fuzzy entropy method for MI-EEG feature extraction,” Computational Intelligence and Neuroscience, vol. 2019, Article ID 7529572, 12 pages, 2019.
[29] Z. Tang, C. Li, and S. Sun, “Single-trial EEG classification of motor imagery using deep convolutional neural networks,” Optik, vol. 130, pp. 11–18, 2017.
[30] H. Yang, S. Sakhavi, K. K. Ang et al., “On the use of convolutional neural networks and augmented CSP features for multi-class motor imagery of EEG signals classification,” in Proceedings of the 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), IEEE, Milano, Italy, 2015.
[31] Y. R. Tabar and U. Halici, “A novel deep learning approach for classification of EEG motor imagery signals,” Journal of Neural Engineering, vol. 14, no. 1, Article ID 016003, 2016.
[32] H. K. Lee and Y.-S. Choi, “A convolution neural networks scheme for classification of motor imagery EEG based on wavelet time-frequency image,” in Proceedings of the 2018 International Conference on Information Networking (ICOIN), IEEE, Chiang Mai, Thailand, 2018.
[33] J. Zhou, M. Meng, Y. Gao et al., “Classification of motor imagery EEG using wavelet envelope analysis and LSTM networks,” in Proceedings of the 2018 Chinese Control and Decision Conference (CCDC), IEEE, Shenyang, China, 2018.
[34] X. An, D. Kuang, X. Guo et al., “A deep learning method for classification of EEG data based on motor imagery,” in Proceedings of the 2014 International Conference on Intelligent Computing, Taiyuan, China, 2014.
[35] C. Brunner, R. Leeb, G. Müller-Putz, A. Schlögl, and G. Pfurtscheller, BCI Competition 2008—Graz Data Set A, vol. 16, Institute for Knowledge Discovery (Laboratory of Brain-Computer Interfaces), Graz University of Technology, Graz, Austria, 2008.
[36] R. Shams, M. Masihi, R. B. Boozarjomehry, and M. J. Blunt, “Coupled generative adversarial and auto-encoder neural networks to reconstruct three-dimensional multi-scale porous media,” Journal of Petroleum Science and Engineering, vol. 186, Article ID 106794, 2020.
[37] S. U. Amin, M. Alsulaiman, G. Muhammad, M. A. Bencherif, and M. S. Hossain, “Multilevel weighted feature fusion using convolutional neural networks for EEG motor imagery classification,” IEEE Access, vol. 7, pp. 18940–18950, 2019.
[38] T. Urakawa, Y. Tanaka, S. Goto, H. Matsuzawa, K. Watanabe, and N. Endo, “Detecting intertrochanteric hip fractures with orthopedist-level accuracy using a deep convolutional neural network,” Skeletal Radiology, vol. 48, no. 2, pp. 239–244, 2019.
[39] A. K. Dubey and V. Jain, “Comparative study of convolution neural network’s ReLu and leaky-ReLu activation functions,” in Applications of Computing, Automation and Wireless Systems in Electrical Engineering, pp. 873–880, Springer, Berlin, Germany, 2019.
[40] C. Liu, W. Ding, X. Xia et al., “Circulant binary convolutional networks: enhancing the performance of 1-bit DCNNs with circulant back propagation,” in Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, June 2019.
[41] X. Chen, J. He, X. Wu, W. Yan, and W. Wei, “Sleep staging by bidirectional long short-term memory convolution neural network,” Future Generation Computer Systems, vol. 109, pp. 188–196, 2020.
[42] G. Liu and J. Guo, “Bidirectional LSTM with attention mechanism and convolutional layer for text classification,” Neurocomputing, vol. 337, pp. 325–338, 2019.
[43] H. Fukui, T. Hirakawa, T. Yamashita et al., “Attention branch network: learning of attention mechanism for visual explanation,” in Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 2019.
[44] S. Bock and M. Weiß, “A proof of local convergence for the Adam optimizer,” in Proceedings of the 2019 International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary, 2019.
[45] M. Giménez, J. Palanca, and V. Botti, “Semantic-based padding in convolutional neural networks for improving the performance in natural language processing. A case of study in sentiment analysis,” Neurocomputing, vol. 378, pp. 315–323, 2020.
[46] A. C. Belkina, C. O. Ciccolella, R. Anno et al., “Automated optimized parameters for T-distributed stochastic neighbor embedding improve visualization and analysis of large datasets,” Nature Communications, vol. 10, no. 1, pp. 1–12, 2019.
[47] K. K. Ang, Z. Y. Chin, C. Wang et al., “Filter bank common spatial pattern algorithm on BCI competition IV datasets 2a and 2b,” Frontiers in Neuroscience, vol. 6, p. 39, 2012.
[48] I. Winkler, S. Haufe, and M. Tangermann, “Automatic classification of artifactual ICA-components for artifact removal in EEG signals,” Behavioral and Brain Functions, vol. 7, no. 1, p. 30, 2011.
[49] L. F. Nicolas-Alonso, R. Corralejo, J. Gomez-Pilar, D. Alvarez, and R. Hornero, “Adaptive semi-supervised classification to reduce intersession non-stationarity in multiclass motor imagery-based brain computer interfaces,” Neurocomputing, vol. 159, pp. 186–196, 2015.
[50] Q. Ai, A. Chen, K. Chen et al., “Feature extraction of four-class motor imagery EEG signals based on functional brain network,” Journal of Neural Engineering, vol. 16, no. 2, Article ID 026032, 2019.
[51] L. F. Nicolas-Alonso, R. Corralejo, J. Gomez-Pilar, D. Alvarez, and R. Hornero, “Adaptive stacked generalization for multiclass motor imagery-based brain computer interfaces,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 23, no. 4, pp. 702–712, 2015.
[52] B. Xu, L. Zhang, A. Song et al., “Wavelet transform time-frequency image and convolutional network-based motor imagery EEG classification,” IEEE Access, vol. 7, pp. 6084–6093, 2018.
[53] N. Bagh and M. R. Reddy, “Improving the performance of motor imagery based brain-computer interface using phase space reconstruction,” in Proceedings of the 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), IEEE, Berlin, Germany, 2019.
Copyright
Copyright © 2021 Shidong Lian et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.