Abstract

Brain-computer interface (BCI) technology provides a new communication channel between the human brain and a computer. To eliminate uncorrelated channels, improve BCI performance, and enhance user convenience with fewer channels, this paper proposes a new framework based on the binary adaptive differential evolution bat algorithm (BADEBA). The framework combines key ideas of the differential evolution algorithm and the bat algorithm to select electroencephalogram (EEG) channels and to optimize the parameters of a support vector machine (SVM). It combines wavelet packet transform (WPT) and common spatial pattern (CSP) to obtain the best classification accuracy with fewer channels. The proposed framework is evaluated on a public data set (DEAP). The results show that, compared with the genetic algorithm (GA), binary particle swarm optimization (BPSO), and the bat algorithm, BADEBA uses only eight channels while improving the classification accuracy by 13.63% in the valence dimension, and only seven channels while improving it by 15.22% in the arousal dimension. In addition, the spatial distribution of the best channels selected by this method is consistent with existing knowledge of brain structure and neurophysiology, which supports the validity of the method.

1. Introduction

Emotion recognition, as an emerging research direction, has attracted increasing attention from different fields and is promising for various application domains, especially medicine [1], education [2], and road and aviation safety [3]. Researchers have developed a variety of emotion recognition methods based on facial expressions [4-7] and voice [8, 9]. Compared with facial expressions and voice, EEG-based emotion recognition shows greater potential, because internal neural activity cannot be deliberately hidden or controlled. Other peripheral physiological signals, such as skin temperature, heart rate, respiratory patterns, blood volume pressure, and galvanic skin response [10, 11], have low resolution because of the nature of the signals themselves. EEG signals have therefore been introduced into emotion recognition, especially in BCI applications, because of their high classification and recognition accuracy, the simplicity of the recording equipment, the absence of harm to subjects, and the high temporal resolution of the collected brain signals.

BCI is a communication system that transmits information between the brain and external devices. One of the main challenges of BCI is its subject dependence: even if the same experiment is replicated in the same environment, different brain regions will be activated. To address this problem, more EEG channels can be added to obtain more decision signals. However, the use of many channels causes additional problems, such as computational complexity, noise, and redundant signals, which reduce the performance of BCI. In addition, a large number of channels requires a longer preparation time, which directly affects the convenience of BCI. Therefore, performance and convenience can be balanced by choosing the minimum number of channels that achieves the highest or the required accuracy. Based on the DEAP database, Jianhai Z et al. [12] used the ReliefF method to obtain the most closely related EEG channels, which greatly reduced the number of EEG channels to be collected at the expense of a lower recognition rate. Arvaneh M et al. [13] proposed an EEG channel selection algorithm based on sparse CSP, which removes channels with poor correlation or even negative effects and thereby obtains a higher recognition rate while reducing the number of EEG channels. Similar CSP-based enhancements are also used for feature extraction and channel selection [14]. He L et al. [15] used an improved genetic algorithm based on Rayleigh coefficient characteristics to select the optimal subset of EEG channels, and their experiments showed the effectiveness of the method. Lan T et al. [16] proposed a method based on mutual information, which is used to rank the importance of channels so that important channels are chosen first. However, such methods focus only on the correlation between features and categories and ignore the relationship between features. Ansari-Asl K et al. [17] proposed a channel selection algorithm based on the synchronization likelihood method in 2007. Five effective channels were selected from 64 EEG channels, and when they were used to recognize positive, neutral, and negative emotional states, the classification performance was not significantly reduced. Lahiri R et al. [18] used the firefly algorithm to optimize the channel selection for the three objectives of correlation and mutual information. The experimental results showed that the method was applicable. They then added classification accuracy as a fourth objective and optimized the four objectives independently. Channel selection was performed through membership ranking. The result was in the optimal solution set, and the channels selected on the basis of classification accuracy were in the majority [19].

Motivated by these limitations, this paper proposes a BADEBA-SVM framework for channel selection, which maximizes classification accuracy and minimizes the number of channels. The framework applies the mutation, crossover, and selection mechanisms of the binary adaptive differential evolution algorithm to the bat algorithm, so that a mutation mechanism is introduced into the bat algorithm. This increases the diversity of the bat population and enhances the global optimization ability of the framework by preventing individuals in the population from falling into local optima. Combined with WPT and CSP, the best classification accuracy can be achieved using fewer channels.

2.1. Bat Algorithm

The bat algorithm is a heuristic swarm intelligence algorithm [20] proposed by Xin-She Yang in 2010. The algorithm searches the solution space of an optimization problem by simulating the foraging behaviour of bats. Compared with intelligent algorithms such as the genetic algorithm and particle swarm optimization, it has two advantages. One is that the frequency tuning method enhances the diversity of the population during optimization; the other is that adaptively adjusting the pulse loudness and frequency improves the convergence of the algorithm.

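In the bat algorithm, a bat represents a feasible solution, while the prey of a bat represents the optimal solution. The position of Bat $i$ at time $t$ is $x_i^t$ and its velocity is $v_i^t$. The updating formulas of Bat $i$ at time $t+1$ in the $d$-dimensional foraging space are as follows:

$$f_i = f_{\min} + (f_{\max} - f_{\min})\,\beta, \tag{1}$$

$$v_i^{t+1} = v_i^t + (x_i^t - x_*)\,f_i, \tag{2}$$

$$x_i^{t+1} = x_i^t + v_i^{t+1}, \tag{3}$$

where $\beta$ is a random number within $[0, 1]$, $f_i$ is the frequency of the sound waves emitted by Bat $i$ at the current time, $f_i \in [f_{\min}, f_{\max}]$, and $x_*$ is the current global optimal solution.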

When searching locally, a bat is randomly selected from the bat population and, according to (4), a new local solution is created by a random walk of the bat:

$$x_{\text{new}} = x_{\text{old}} + \varepsilon A^t, \tag{4}$$

where $A^t$ is the average loudness of the bat population at time $t$, $\varepsilon$ is a random vector, and $x_{\text{old}}$ is a random solution selected from the current best solutions. During the iteration of the bat algorithm, the following equations are used to update the loudness $A_i$ and the pulse emission rate $r_i$:

$$A_i^{t+1} = \alpha A_i^t, \tag{5}$$

$$r_i^{t+1} = r_i^0\,[1 - \exp(-\gamma t)], \tag{6}$$

where $\alpha$ and $\gamma$ are constants. In most applications, $0 < \alpha < 1$ and $\gamma > 0$ are used, and $r_i^0$ is the initial pulse emission rate.
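As a rough illustration of the update rules (1)-(6), the following Python sketch applies them to a single bat with real-valued positions. It assumes the standard bat algorithm forms of these equations. The function names, the choice of a random vector with components uniform in [-1, 1], and the example values are illustrative assumptions and are not taken from the proposed framework.

```python
import math
import random


def bat_step(x, v, x_best, f_min, f_max, rng):
    """One frequency/velocity/position update for a single bat (Eqs. (1)-(3))."""
    beta = rng.random()                     # random number in [0, 1)
    freq = f_min + (f_max - f_min) * beta   # Eq. (1): frequency tuning
    v_new = [vi + (xi - bi) * freq for vi, xi, bi in zip(v, x, x_best)]  # Eq. (2)
    x_new = [xi + vi for xi, vi in zip(x, v_new)]                        # Eq. (3)
    return x_new, v_new


def local_walk(x_old, mean_loudness, rng):
    """Local search around a selected solution (Eq. (4)).

    Assumption: each component of the random vector is uniform in [-1, 1].
    """
    return [xi + rng.uniform(-1.0, 1.0) * mean_loudness for xi in x_old]


def update_loudness(loudness, alpha):
    """Eq. (5): loudness decays geometrically for 0 < alpha < 1."""
    return alpha * loudness


def update_pulse_rate(r0, gamma, t):
    """Eq. (6): pulse rate rises from 0 towards the initial rate r0."""
    return r0 * (1.0 - math.exp(-gamma * t))


rng = random.Random(0)
x, v = bat_step([0.0, 1.0], [0.0, 0.0], [0.5, 0.5], 0.0, 2.0, rng)
candidate = local_walk(x, mean_loudness=0.25, rng=rng)
```

With 0 < α < 1 the loudness shrinks at every accepted move, while the pulse rate grows towards its initial value, so the search gradually shifts from exploration to exploitation.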

2.2. Differential Evolution Algorithm

The differential evolution algorithm is an evolutionary algorithm based on population differences. It was proposed by Rainer Storn and Kenneth Price in 1997 to solve the Chebyshev polynomial fitting problem [24]. The mutation operation is the most prominent feature of the algorithm: a mutant is generated by adding the weighted difference between two individuals to a third individual. Like GA, the algorithm includes mutation, crossover, and selection operations. The mutation operation is as follows:

$$v_i(t) = x_{b_1}(t) + F\,\bigl(x_{b_2}(t) - x_{b_3}(t)\bigr), \tag{7}$$

where $x_{b_1}(t)$, $x_{b_2}(t)$, and $x_{b_3}(t)$ are three individuals randomly selected from the population, with $b_1 \neq b_2 \neq b_3 \neq i$ and $b_1, b_2, b_3 \in \{1, \dots, N\}$. $F$ is a scaling factor and one of the main control parameters of the differential evolution algorithm; typically $F \in [0, 2]$.

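The crossover operation increases the diversity of the population. Crossover between the individuals of the generation-$t$ population and their mutant intermediates is performed according to (8):

$$u_{ij}(t+1) = \begin{cases} v_{ij}(t), & \text{if } \mathrm{Rand}(0,1) \le CR \text{ or } j = \mathrm{rand}, \\ x_{ij}(t), & \text{otherwise}, \end{cases} \tag{8}$$

where $\mathrm{Rand}(0,1)$ is a random number uniformly distributed on $[0, 1]$, $\mathrm{rand}$ is a random integer in $\{1, \dots, D\}$, $j = 1, \dots, D$ indexes the $D$ components of an individual, and $CR$ is the crossover probability. This crossover strategy ensures that at least one component of $u_i(t+1)$ is contributed by $v_i(t)$.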

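The selection operation determines, based on the fitness function $f$, whether the trial individual $u_i(t+1)$ is superior to $x_i(t)$, and generates the new-generation individual $x_i(t+1)$ accordingly. For a minimization problem, the equation is as follows:

$$x_i(t+1) = \begin{cases} u_i(t+1), & \text{if } f(u_i(t+1)) < f(x_i(t)), \\ x_i(t), & \text{otherwise}. \end{cases} \tag{9}$$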
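The three operations can be sketched together as one generation of classic real-valued differential evolution. This is a minimal Python illustration under the assumption that the fitness is minimized; the function name `de_generation`, the sphere test function, and the parameter values are hypothetical choices made only for the example.

```python
import random


def de_generation(pop, fitness, f_scale, cr, rng):
    """One DE generation: mutation, crossover, and greedy selection."""
    n, dim = len(pop), len(pop[0])
    new_pop = []
    for i, x in enumerate(pop):
        # Mutation: three distinct individuals, all different from i.
        b1, b2, b3 = rng.sample([k for k in range(n) if k != i], 3)
        mutant = [pop[b1][j] + f_scale * (pop[b2][j] - pop[b3][j])
                  for j in range(dim)]
        # Crossover: component j_rand is always taken from the mutant.
        j_rand = rng.randrange(dim)
        trial = [mutant[j] if (rng.random() <= cr or j == j_rand) else x[j]
                 for j in range(dim)]
        # Selection: keep the trial only if it is strictly better.
        new_pop.append(trial if fitness(trial) < fitness(x) else x)
    return new_pop


def sphere(x):
    """Toy objective with its minimum at the origin."""
    return sum(c * c for c in x)


rng = random.Random(1)
pop = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(10)]
start_best = min(sphere(p) for p in pop)
for _ in range(50):
    pop = de_generation(pop, sphere, f_scale=0.5, cr=0.9, rng=rng)
end_best = min(sphere(p) for p in pop)
```

Because selection is greedy, the best fitness in the population can never get worse from one generation to the next.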

3. Proposed Methods

This section provides a detailed overview of the proposed framework (Figure 1). The proposed framework consists of datasets (a training dataset and a validation dataset), signal preprocessing, channel selection, feature extraction, optimization of support vector machines, and classification.

In this framework, a novel evolutionary method evaluates the fitness function on the validation set and optimizes the channel subset and the SVM parameters to obtain the highest classification accuracy with the minimum number of channels. The first step is to decompose the EEG signal into wavelet packets in the frequency domain and then reconstruct the signal in the (β+γ) band. The second step is to extract, through CSP, the features of the channels whose position is 1. Here the electrode space is regarded as the solution space. Each bat's position vector consists of 0s and 1s: 0 means that the channel is not selected, and 1 means that the channel is selected. The third step is to optimize the parameters of the SVM. The parameters to be determined for the SVM model are the penalty coefficient $C$ and the kernel parameter $\sigma$. We selected the radial basis function as the kernel function of the SVM, which is expressed as follows:

$$K(x_i, x_j) = \exp\!\left(-\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2}\right), \tag{10}$$

where $\sigma$ is the kernel parameter.
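To make the role of the kernel parameter concrete, the sketch below evaluates a radial basis function kernel for two feature vectors. It assumes the common Gaussian form exp(-||x_i - x_j||^2 / (2 sigma^2)), with the kernel width written as `sigma`; the function name and the sample values are illustrative only.

```python
import math


def rbf_kernel(x_i, x_j, sigma):
    """Gaussian RBF kernel value for two equal-length feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


# A wide kernel treats distant points as similar (flat decision surface),
# while a narrow kernel treats them as unrelated (spiky decision surface).
wide = rbf_kernel([0.0], [1.0], sigma=10.0)
narrow = rbf_kernel([0.0], [1.0], sigma=0.1)
```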

The choice of $C$ and $\sigma$ has a great effect on the performance of the SVM. The value of the penalty coefficient $C$ affects the complexity of the model and the degree to which training errors are penalized, and thus the performance of the classifier. The model overfits if $C$ is too large and underfits if $C$ is too small. The kernel parameter $\sigma$ defines the structure of the high-dimensional feature space and thus controls the complexity of the final solution. A larger value of $\sigma$ (which yields an almost flat hypersurface) causes underfitting, whereas a smaller value of $\sigma$ (which yields a spiky hypersurface) causes overfitting. The fourth step is to evaluate the fitness function. We set the fitness function according to the importance of two factors: the average classification accuracy and the number of channels used. Since maximizing the classification accuracy is equivalent to minimizing the classification error, the fitness function (11) combines the classification error $1 - \mathrm{ACC}$ with the number of selected channels, where $n$ is the number of channels, $c_i = 1$ indicates that channel $i$ is selected, $c_i = 0$ indicates that channel $i$ is not selected, and ACC is the average classification accuracy of the SVM on the validation set.
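One plausible way to combine the two factors is a weighted sum of the classification error and the fraction of selected channels. The Python sketch below is only an illustration of that idea: the weights `w_err` and `w_ch`, the normalization by the total number of channels, and the function name are hypothetical and are not the constants of Equation (11). The accuracies and channel counts used in the example are the ones reported for BADEBA and GA in the valence dimension.

```python
def channel_fitness(mask, acc, w_err=0.9, w_ch=0.1):
    """Hypothetical weighted fitness (lower is better).

    mask: list of 0/1 flags, one per channel (1 = channel selected).
    acc:  average validation accuracy in [0, 1].
    """
    n = len(mask)
    selected = sum(mask)
    return w_err * (1.0 - acc) + w_ch * selected / n


# 8 of 32 channels at 75.26% accuracy versus 15 of 32 channels at 61.63%.
fit_badeba = channel_fitness([1] * 8 + [0] * 24, acc=0.7526)
fit_ga = channel_fitness([1] * 15 + [0] * 17, acc=0.6163)
```

Under any positive weights, a solution that is both more accurate and uses fewer channels receives the lower (better) fitness value.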

3.1. Binary Adaptive Differential Evolution Algorithm

To solve optimization problems in a discrete space, such as channel selection, the differential evolution algorithm must be binary coded, and a new mutation operator must be designed using logical operations instead of arithmetic operations. In this paper, the logical operations used by the binary mutation operator, including the exclusive or operation, are defined first.

Random selection of three individuals is conducive to global search but reduces the convergence of the algorithm. To improve the convergence performance, the vector difference between the best individual and a suboptimal individual among the three randomly selected individuals is added. Alternatively, the vector difference of two randomly selected individuals is added to the best individual of the current generation, which gives good convergence but easily falls into a local optimum. To give the algorithm both global search ability and good convergence performance, the mutation operator is modified as in (14). Equation (14) involves the best of the three individuals randomly selected in the current generation together with the other two, suboptimal individuals, as well as the best individual in the current generation together with two randomly selected individuals; $t$ is the current iteration number and $T$ is the maximum number of iterations.

The differential evolution algorithm has three control parameters: the population size $N$, the difference vector scaling factor $F$, and the crossover probability $CR$. $F$ and $CR$ have significant influences on the performance of the algorithm. Setting $F$ large in the early stage and small in the late stage, and $CR$ small in the early stage and large in the late stage, improves the performance of the algorithm. The scaling factor $F$ is therefore modified according to (15) so that it varies with the iteration count between $F_{\max}$, the upper limit of the scaling factor, and $F_{\min}$, the lower limit of the scaling factor. Generally, $F_{\max} = 0.9$ and $F_{\min} = 0.1$.

A dynamic $CR$ lets the algorithm converge to an appropriate position with a higher probability in the early stage of global search. In this study, the strategy of linearly decreasing weighted $CR$ values is adopted (16), where $CR_{\min}$ is the minimum crossover probability and $CR_{\max}$ is the maximum crossover probability.
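The adaptive behaviour described above (a scaling factor that is large early and small late, and a crossover probability that is small early and large late) can be illustrated with simple linear schedules. The sketch below is an assumption made for illustration: the linear form, the function names, and the direction of the CR schedule are choices consistent with that description and are not a reproduction of Equations (15) and (16).

```python
def scaling_factor(t, t_max, f_max=0.9, f_min=0.1):
    """Assumed linear schedule: F falls from f_max (t = 0) to f_min (t = t_max)."""
    return f_max - (f_max - f_min) * t / t_max


def crossover_prob(t, t_max, cr_min=0.1, cr_max=0.9):
    """Assumed linear schedule: CR rises from cr_min (t = 0) to cr_max (t = t_max)."""
    return cr_min + (cr_max - cr_min) * t / t_max


# Values at the start, middle, and end of a 500-iteration run.
schedule = [(t, scaling_factor(t, 500), crossover_prob(t, 500))
            for t in (0, 250, 500)]
```

A large F with a small CR early on favours wide exploration, while a small F with a large CR late in the run favours fine local refinement.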

3.2. Binary Adaptive Differential Evolution Bat Algorithm

This paper proposes a binary adaptive differential evolution bat algorithm, which introduces the mutation, crossover, and selection mechanisms of the binary adaptive differential evolution algorithm into the bat algorithm. The difference from the original bat algorithm is that, in each evolutionary step, the evolved bat positions do not pass directly to the next iteration; instead, mutation, crossover, and selection operations are carried out among the individuals in the population to obtain new bat positions, which then enter iteration $t+1$. The specific implementation steps are shown in Algorithm 1.

Require: Objective function Fitness(x), x = (x_1, ..., x_d)
Initialize the bat population x_i and velocities v_i, together with the remaining control parameters
Define pulse frequency f_i at x_i
Initialize pulse rate r_i and the loudness A_i
while t < Max number of iterations do
  Generate new solutions by adjusting frequency, and updating velocities and locations/solutions
    according to Eq.(1)-Eq.(3)
  if rand > r_i then
    Select a solution among the best solutions
    Generate a local solution around the selected best solution
  end if
  Generate a new solution by flying randomly according to Eq.(4)
  if rand < A_i && f(x_i) < f(x_*) then
    Accept the new solutions
    Increase r_i and reduce A_i according to Eq.(5) and Eq.(6)
  end if
  Mutate using Eq.(14)
  Crossover operation using Eq.(8)
  Select operation using Eq.(9), find the current best x_*
end while
Output Result

4. Experimental Data and Results

DEAP is a data set for emotion analysis using EEG, physiological, and video signals [25]. In this section, only the EEG signals (32 channels placed according to the 10-20 system) are used for emotion recognition. The data set consists of 32 subjects, each of whom provided 40 samples with self-assessments of valence, arousal, dominance, and liking (ranging from 1 to 9 points). In this experiment, valence and arousal were treated as two separate classification tasks. We used the preprocessed MATLAB version of the data set. Each sample lasts 63 s (3 s of rest and 60 s of EEG data) and contains 8064 data points, with a sampling rate of 128 Hz. To improve the signal-to-noise ratio (SNR), the first 3 s of each sample were removed, leaving only the 60 s of useful EEG data. In data set splitting, 2440 samples were randomly selected as the training set, and 840 samples were left as the test set. To improve the generalizability of the model, the training set was divided into 10 parts, each containing 96 samples and used in turn as the validation set of the model; that is, the model was trained with 10-fold cross validation. For labeling, we chose the value 5 as the cut-off point: values greater than 5 were assigned to one class of emotion, and values less than or equal to 5 were assigned to the other. The experiments were run in MATLAB 2016b on a 64-bit Windows 10 system with 12 GB of memory and an i7-7500U CPU.
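The baseline removal and label thresholding described above amount to a slice and a comparison. The following Python sketch shows both steps on a stand-in signal; the constant and function names are illustrative assumptions, and a plain list replaces the real EEG array.

```python
FS = 128          # sampling rate in Hz
BASELINE_S = 3    # leading seconds removed from every trial
TRIAL_S = 63      # total seconds per trial (3 s rest + 60 s EEG)


def trim_baseline(samples):
    """Drop the first 3 s of a single-channel trial."""
    return samples[BASELINE_S * FS:]


def binarize(rating, cutoff=5):
    """Return 1 for ratings above the cut-off and 0 otherwise."""
    return 1 if rating > cutoff else 0


trial = list(range(TRIAL_S * FS))   # stand-in for one 63 s channel recording
kept = trim_baseline(trial)         # the remaining 60 s of data
labels = [binarize(r) for r in (1.0, 5.0, 5.1, 9.0)]
```

Note that a rating of exactly 5 falls into the lower class, matching the "less than or equal to 5" rule.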

4.1. Signal Preprocessing and Feature Extraction

WPT includes the decomposition and reconstruction of wavelet packet coefficients. Wavelet packet decomposition has been extensively used in the field of signal processing. Compared with wavelet decomposition, it can not only decompose the signal orthogonally over the whole frequency band, but also select the corresponding frequency band adaptively according to the characteristics of the signal, so that it matches the signal spectrum and has higher time-frequency resolution. Based on the multiresolution characteristic of WPT, the optimal combination of components of the EEG signal can be selected, and the signal in the useful frequency range can be extracted and reconstructed. The decomposition algorithm for the coefficients is given by (17):

$$d_{j+1}^{2n}(k) = \sum_{l} h(l - 2k)\, d_j^{n}(l), \qquad d_{j+1}^{2n+1}(k) = \sum_{l} g(l - 2k)\, d_j^{n}(l), \tag{17}$$

where $d_j^{n}$, $d_{j+1}^{2n}$, and $d_{j+1}^{2n+1}$ are all wavelet packet coefficients, and $h$ and $g$ are the low-pass and high-pass filter coefficients for decomposition, respectively.

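The reconstruction algorithm for the wavelet packet coefficients is

$$d_j^{n}(k) = \sum_{l} \left[ \tilde{h}(k - 2l)\, d_{j+1}^{2n}(l) + \tilde{g}(k - 2l)\, d_{j+1}^{2n+1}(l) \right], \tag{18}$$

where $d_j^{n}$ denotes the wavelet packet coefficients of node $n$ at level $j$, and $\tilde{h}$ and $\tilde{g}$ are the low-pass and high-pass filter coefficients for reconstruction, respectively.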

Research [26] shows that the high-frequency components of EEG signals reflect people's emotional and cognitive states, and that the β and γ bands are better than the low-frequency bands at distinguishing emotional state transitions [27]. To improve the SNR of the EEG signals, the emotion-related β (13-30 Hz) and γ (30-50 Hz) band signals are extracted. A 7-level wavelet packet decomposition with the Db4 wavelet is performed (using the wpcoef function to obtain the node coefficients), which yields 128 nodes. Since the sampling rate of the EEG signals is 128 Hz, the decomposed band spans 0-64 Hz and each subband is 0.5 Hz wide. The signals in the (β+γ) bands are retained and reconstructed to obtain the final EEG signals. Then, the (β+γ) band signals of the selected channels are spatially filtered with a filter constructed by the CSP algorithm, which maximizes the difference in energy between the spatial components of the two classes of EEG data after spatial filtering. The variance of each row of the resulting matrix for the two classes of signals is extracted as the EEG feature.
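The subband bookkeeping for the wavelet packet step can be checked with a few lines of arithmetic. A signal sampled at 128 Hz carries frequencies up to 64 Hz, so a depth-7 decomposition splits 0-64 Hz into 2^7 = 128 equal subbands. The Python sketch below computes the subband width and the nodes covering 13-50 Hz; the function names are illustrative, and the indices assume nodes sorted by frequency, whereas the natural node order of a wavelet packet tree is generally not frequency order and would need re-ordering first.

```python
import math


def subband_width(fs, depth):
    """Width in Hz of each terminal node: Nyquist band divided by 2**depth."""
    return (fs / 2) / (2 ** depth)


def nodes_for_band(low_hz, high_hz, fs, depth):
    """Frequency-ordered node indices whose subbands lie inside [low_hz, high_hz]."""
    width = subband_width(fs, depth)
    first = math.ceil(low_hz / width)
    last = math.floor(high_hz / width)   # exclusive upper index
    return range(first, last)


width = subband_width(128, 7)
beta_gamma_nodes = nodes_for_band(13, 50, 128, 7)
```

For these settings the beta and gamma bands together cover 74 of the 128 terminal nodes, so slightly more than half of the decomposed spectrum is kept for reconstruction.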

4.2. Optimization Algorithms for Channel Selection

For swarm intelligence optimization algorithms, the parameters determine the performance of the algorithm. For parameter setting, we carried out multivariate sensitivity tests. For example, the bat population size was varied from 10 to 100 in steps of 5, and the loudness A0 was varied from 0.05 to 0.9 in steps of 0.05. By varying the parameters simultaneously and observing their impact on the classification results, the following parameters were finally determined to achieve better classification results.

In the BCI system, BADEBA was used to select electrodes, with Equation (11) as the fitness function. Fifty bats were used. We set the maximum number of iterations to 500, the maximum loudness A0 to 0.25, the maximum pulse rate r0 to 0.5, the pulse frequency range to $[f_{\min}, f_{\max}]$, the pulse rate enhancement coefficient γ to 0.05, the loudness attenuation coefficient α to 0.95, the upper limit of the scaling factor to 0.9, the lower limit of the scaling factor to 0.1, the maximum crossover probability to 0.9, and the minimum crossover probability to 0.1.

In the BCI system, GA was also used to select electrodes, with Equation (11) as the fitness function. The parameters were as follows: 50 individuals were used, the maximum number of iterations was set to 500, the crossover probability to 0.8, and the mutation probability to 0.01.

In the BCI system, the binary particle swarm optimization algorithm [28] was also used to select electrodes, with Equation (11) as the fitness function. The parameters were as follows: 50 particles were used and the maximum number of iterations was set to 500. We set c1 and c2 to 2, and r1, r2, and τ to random numbers between 0 and 1. The inertia weight at iteration $t$ is as follows:

$$w(t) = w_{\max} - \frac{(w_{\max} - w_{\min})\,t}{T}, \tag{19}$$

where $w_{\max}$ was set to 0.9, $w_{\min}$ was set to 0.5, and $T$ represents the maximum number of iterations.
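For reference, the two ingredients that distinguish this BPSO baseline can be sketched in a few lines of Python. The inertia weight is assumed here to decrease linearly from 0.9 to 0.5 over the run, and the bit update uses the standard sigmoid transfer rule of binary PSO with a uniform random threshold `tau`; both forms, and the function names, are assumptions for illustration rather than details confirmed by the text.

```python
import math


def inertia_weight(t, t_max, w_max=0.9, w_min=0.5):
    """Assumed linearly decreasing inertia weight."""
    return w_max - (w_max - w_min) * t / t_max


def bpso_bit(velocity, tau):
    """Standard binary PSO rule: the bit is 1 when tau < sigmoid(velocity)."""
    return 1 if tau < 1.0 / (1.0 + math.exp(-velocity)) else 0


weights = [inertia_weight(t, 500) for t in (0, 250, 500)]
bits = [bpso_bit(v, tau=0.5) for v in (-10.0, 10.0)]
```

A strongly positive velocity makes a channel bit very likely to be 1 (selected), and a strongly negative velocity makes it very likely to be 0.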

4.3. Experimental Results

Using the binary adaptive differential evolution bat algorithm for channel selection, experiments were conducted on the two dimensions of valence and arousal. The results are shown in Figure 2: the left panel shows the optimal channels in the valence dimension and the right panel shows the optimal channels in the arousal dimension. As can be seen from Figure 2, in addition to some other channels, channels over the prefrontal and occipital lobes are selected as the best channels. According to neurophysiological knowledge, the prefrontal and occipital regions of the brain are related to emotions, which supports the effectiveness of the proposed framework.

Figure 2 shows that, by minimizing the fitness function, the framework chooses eight channels in the valence dimension and seven channels in the arousal dimension. To verify the effectiveness of the framework, GA, BPSO, BA, and BADEBA are used for channel selection and compared with the use of all channels. The experimental results on the valence dimension are shown in Table 1.

As can be seen from Table 1, when the validation sets are used for binary classification in the valence dimension, an average classification accuracy of 75.26% is achieved with only 8 channels selected by BADEBA, whereas GA obtains an average classification accuracy of only 61.63% with 15 channels. As can be seen from Table 2, when the validation sets are used for binary classification in the arousal dimension, an average classification accuracy of 75.98% is achieved with only 7 channels selected by BADEBA, whereas GA obtains an average classification accuracy of only 60.76% with 14 channels. The results in Tables 1 and 2 show that the proposed BADEBA is superior to the other evolutionary algorithms.

To find out which band of signals contributes most to the classification results, we take the channels selected by the four algorithms and reconstruct the original signals in the β band, the γ band, and the (β+γ) band after wavelet packet decomposition. We then use the four optimized models to classify the emotions. The experimental results (Figure 3) show that the classification accuracy of signals in the (β+γ) band is better than that of signals in the β band or the γ band alone. This suggests that, for certain emotions, using the β band signal or the γ band signal alone neglects the relationship between the two bands and thus reduces the classification accuracy. In contrast, the (β+γ) band signal retains some of the relationship between the β band and the γ band and thus improves the classification accuracy.

In order to verify the performance of the proposed framework, we use the test set to evaluate the performance of the proposed model. The classification accuracy of the model optimized by BADEBA algorithm is 74.86% and 75.61% in the two dimensions of valence and arousal, respectively. The results of DEAP data classification are compared with those of other literature in Table 3.

As can be seen from Table 3, compared with the models proposed by Chung and Yoon, the model proposed in this paper not only uses fewer channels but also achieves higher classification accuracy. Dai's model selects fewer channels than the model proposed in this paper, but its accuracy is lower. Therefore, the model proposed in this paper has considerable potential for practical application.

5. Conclusions

For emotion recognition based on EEG signals, a novel evolutionary optimization method is proposed to select the important channels and optimize the parameters of the support vector machine to achieve higher classification accuracy. This paper has two main contributions. One is to combine the binary adaptive differential evolution algorithm with the bat algorithm to improve the diversity of the population, which gives the method strong global search ability in early iterations and strong local search ability in late iterations. The other is to improve the signal-to-noise ratio of emotion-related EEG signals through wavelet packet decomposition and reconstruction and then extract EEG features by CSP filtering for classification. The results show that the combined (β+γ) band signal gives higher classification accuracy than the β band or the γ band alone. Finally, the improved BADEBA algorithm is applied to the channel selection problem of EEG-based emotion recognition, and the experiments support the validity and feasibility of the algorithm. Future research could combine other intelligent algorithms, such as the grey wolf algorithm [29], with the binary differential evolution algorithm, or combine other intelligent algorithms, such as the flower pollination algorithm [30], with the bat algorithm to further improve the performance of the model.

Data Availability

The DEAP dataset is a multimodal dataset for the analysis of human affective states. The electroencephalogram (EEG) and peripheral physiological signals of 32 participants were recorded as each watched 40 one-minute-long excerpts of music videos. Participants rated each video in terms of the levels of arousal, valence, like/dislike, dominance, and familiarity. The data used to support the findings of this study were supplied by Queen Mary University of London, United Kingdom, under license and so cannot be made freely available. Requests for access to these data should be made to Queen Mary University of London, United Kingdom, via http://www.eecs.qmul.ac.uk/mmv/datasets/deap/index.html.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant 61373116 and Grant 61572399 and the Project of Science and Technology Department of Shaanxi Province of China (Grant No. 2019ZDLGY07-08, 2018GY-013).