Abstract
Based on similarity measures in the wavelet domain under a multichannel EEG setting, two new methods are developed for singletrial eventrelated potential (ERP) detection. The first method, named “multichannel EEG thresholding by similarity” (METS), simultaneously denoises all of the information recorded by the channels. The second approach, named “semblancebased ERP window selection” (SEWS), presents two versions to automatically localize the ERP in time for each subject to reduce the time window to be analysed by removing useless features. We empirically show that when these methods are used independently, they are suitable for ERP denoising and feature extraction. Meanwhile, the combination of both methods obtains better results compared to using them independently. The denoising algorithm was compared with classic thresholding methods based on wavelets and was found to obtain better results, which shows its suitability for ERP processing. The combination of the two algorithms for denoising the signals and selecting the time window has been compared to xDAWN, which is an efficient algorithm to enhance ERPs. We conclude that our waveletbased semblance method performs better than xDAWN for singletrial detection in the presence of artifacts or noise.
1. Introduction
Braincomputer interface (BCI) research endeavours to provide new ways of communication for severely handicapped people by translating their brain activity into commands that can be used in a computer or other devices, without using the standard peripheral nerves and muscular pathways. In particular, braincomputer interfaces are control and communication systems that are designed to assist people with motor disabilities, such as people suffering from amyotrophic lateral sclerosis (ALS), spinal cord injury, multiple sclerosis, muscular dystrophies, and cerebral palsy. In this paper, we will focus on the noninvasive BCI [1].
Electroencephalography (EEG) is a noninvasive way of measuring over the scalp the electrical activity occurring as the product of the interactions of neurons in the brain [2]. EEG recordings are usually overlapped with noise and artifacts such as muscle activity. Their presence hinders EEG detection and requires novel methods to remove them from the underlying true brain signal. One important assumption about noise is that it is supposed to be independent of brain activity. It is assumed that brain signals are (almost) instantaneously recorded by the electrodes, implying that each recorded channel is highly correlated with the others. Accordingly, it is possible to conclude those signal components that are not correlated over channels are assumed to be noise or artifacts and can be removed from the recorded signals. A comprehensive survey of signalprocessing algorithms for EEG applied to the BCI can be found in [3].
Eventrelated potentials (ERPs) are neural activities generated involuntarily as a consequence of the occurrence of an expected but rare event. A deflection appears in the EEG signal with a specific polarization and latency. For example, P300 is a cognitive ERP with a positive peak at 300 ms after the stimulus. ERP can potentially be detected with signalprocessing techniques and used as a control command in BCI applications, such as the very wellknown P300 speller proposed by Farwell and Donchin [4].
Unfortunately, major obstacles still exist in the use of EEG for braincomputer interfaces: signal changes to be detected are very small and high noise such as the signal depletion due to the skull or muscular artifacts is present. To enhance and detect the ERP response, it is necessary to repeat the stimuli and average the responses; however, this reduces the information transfer rate. Several efforts to reduce the number of averaged trials or to directly perform singletrial ERP detection have been made [5–8]. The intention of improving the singletrial classification performance ranges from increasing the database, adding artificial trials [9], where original data are deformed to create new artificial patterns, to EEG source reconstruction or transfer learning methods, where data from multiple users are used to train classifiers to improve the BCI system [10, 11]. The singletrial approach saves time, but the signaltonoise ratio (SNR) is very low, making ERP detection difficult. On the contrary, when the stimuli are repeated to enhance the ERP detection, these repetitions may become tedious and tiring for the user, and the averaging technique decreases the speed of the spelling. Another problem related to the average of repetitions is the assumption of stationarity. Latency jitter, amplitude variability, or phase artifacts between single trials can cause a flattening up to the elimination of transient characteristics [12, 13].
Usually, in P300based BCI systems, a temporal window is manually selected [14]. This window is usually chosen to be large enough (within a range of [0, 1] second) to ensure including the ERP components under the study, independently of the user reaction time to the stimulus. However, the ERP responses have different latencies (and amplitudes) for each subject, which comprise irrelevant data to be covered in the temporal window. This can increase both the difficulty of training a classifier with irrelevant variables and the complexity of detecting the ERP. The variance among trials can provide information on the subject’s cognitive state, allowing comparisons to be made between subjects [15].
Several studies have shown that it is possible, yet difficult, to distinguish singletrial signals from the EEG background [16–18]. A popular technique is the xDAWN algorithm [19] which automatically enhances the ERP for classification by using spatial filters, combining the multichannel information to put aside useless components. An exhaustive comparative study of several classification techniques is given in [20]. A review of stateoftheart methods for singletrial detection of eventrelated potentials can be found in [21]. Finally, a complete survey of the issues that should be considered when designing a new P300based BCI paradigm can be found in [22].
Wavelets theory has been used in several studies for P300 detection [23–25]. In particular, some advances using wavelets for singletrial detection can be found in [16, 26], where automatic denoising methods are recommended. The fundamental hypothesis of wavelet denoising is that large coefficients correspond to the signal and small coefficients correspond to the noise [27]. The problem with current methods is that they can only denoise one channel at a time, regardless of the information on other channels. This causes them to lose the information provided by the ensemble, such as phase and amplitude information. In ERP studies, the most common mother wavelets that are used are the quadratic Bspline [28–30], the Daubechies wavelets Db4 and Db8 [31], the Symlet wavelet [31, 32], and Coiflet [33]. For example, in [34], a singletrial P300 detection algorithm is presented based on independent component analysis (ICA) and wavelets. Nevertheless, despite these advances, singletrial P300 detection still needs to be improved before it can be made more available for the general public.
In this paper, we introduce a novel method to denoise, localize, and isolate ERPs combining two approaches based on wavelet theory. This formalism is used to study singletrial brain signals based on similarity measures. The first approach simultaneously denoises the signals by using the phase information provided by all the channels in a single trial. Afterward, the second approach combines the phase and the amplitude information of the signals to optimize the time window of the ERP for each user.
The rest of this paper is organized as follows: In Section 2, we presented wavelet theory and semblance analysis to introduce our proposal of using the correlated information of recorded channels to remove noise and automatically establish an appropriate time window for the analysis of each subject. The results are provided in Section 3 and the discussion in Section 4. Finally, our conclusions are given in Section 5.
2. Materials and Methods
2.1. Wavelet Transforms
The wavelet transform is the inner product of a signal with scaled and shifted versions of a mother wavelet function [35]. The continuous wavelet transform (CWT) uses a continuous wavelet function for the signal analysis:where and change continuously and is the complex conjugate of . The CWT coefficients measure the variation of in a neighborhood of point , whose size is proportional to , obtaining a mapping of a onedimensional signal into a twodimensional space.
On the contrary, the discrete wavelet transform (DWT) uses filter banks to obtain a multiresolution timefrequency representation. More precisely, the discrete orthogonal wavelet decomposition is obtained using a discretised scale and translation.
2.2. Semblance Analysis
Wavelet analysis is also useful for bivariate analysis, making it possible to study two different signals to discover the relationship between them. Crosswavelet analysis allows us to find the mutual characteristics between signals using the available information in the wavelet transform. The crosswavelet spectrum [36] of two different signals and is defined by their wavelet decompositions and as follows:where is a complex value and can be decomposed into amplitude and phase .
Semblance analysis [37] was introduced to compare two given signals and based on the phase correlations between their wavelet decompositions and using θ:where is an odd integer that is greater than zero. The reason why should be odd is to preserve the sign of the cosine. The use of large numbers for also produces a sharp semblance response, as demonstrated in [37].
The values of correspond to the phase correlation between the two signals, where means they are fully correlated, means they are fully inversely correlated, and means they are not correlated. Also, it is possible to analyse the signal’s amplitudes combining the phase information with the amplitude information as follows:
2.3. Multisemblance Analysis
The semblance concept was extended to compare more than two signals at the same time. This measure is called the mean resultant length (MRL), and it was presented by Cooper in [38] based on circular statistics [39]. The MRL can be calculated according to the number N of signals treated, at each scale a and time t:
For more than two signals, the inversely correlated concept does not apply. This is verified by the MRL values ranging from 0 for uncorrelated signals to 1 for correlated signals.
2.4. Denoising Based on the Similarity in the Channels
We propose to denoise EEG signals considering the information of all channels based on their phase and correlations within the DWT transform. Let X be the matrix containing the whole dataset, and let be the signal recorded by the electrode, , at time t, . The matrix of the recorded EEG signals can be defined as . The denoising, through thresholding, can be done using the MRL coefficients, i.e., using coefficients obtained for all channels (equation (6)), instead of the individual channel coefficients. We prune all the coefficients that are below a given threshold to zero in order to reconstruct the signal using the filtered wavelet coefficients. The MRL computation is done through the combination of the phase angles of the real and imaginary parts of the wavelet decomposition. DWT uses wavelet families that are orthogonal to each other so that the imaginary part can be achieved by the Hilbert transform of the channel [38].
In simple words, we keep those components with high similarity between channels to produce a denoised EEG signal. This novel approach is called multichannel EEG thresholding by similarity (METS), and the full wavelet denoising process is described in Algorithm 1.

2.5. Temporal Window Selection
We propose to compute a variable time window, by comparing the averages of the target and nontarget signals. Two alternatives are proposed: by channels or by the grand averages over channels. The purpose is to have a better expressiveness of both amplitude and phase to isolate the ERP wave. The hypothesis is that the ERP is not correlated with the EEG background activity, and this should be reflected in the semblance analysis.
Our approach independently computes the correlation between the averages for each channel or the grand averages over channels, using the continuous wavelet transform, to obtain a continuous trend in time of the correlations by each scale. According to Kolev et al. [40], the most significant differences between target and nontarget responses in the frequency domain are found in delta and theta brain rhythms. Consequently, the scales that are used to analyse the signal averages were selected according to these rhythms. Figure 1(a) presents the dot product obtained by applying equation (5) to the grand average where it is easy to identify the less correlated components in cold colors and the more correlated components in warmer colors. These warmer colors correspond to the EEG background, yet the correlation is not uniform between scales. It is possible to localise the ERP (in blue) within a thinner time window compared to the original. However, the ERP is not present in all scales, making difficult an automatic detection of this temporal window. Fortunately, the ERP produces a high variability over the scales at a specific time on the dot product D given by equation (5), which leads to analysis of the standard deviation for each time point. As the magnitudes of the wavelet coefficients can be different for each subject, we normalise the standard deviation between 0 and 1. Finally, we select the temporal window utilising a predefined threshold, as shown in Figure 1(b). This technique is completely independent of the previous denoising step introduced in Section 2.4, although better results are expected if both techniques are used together.
(a)
(b)
2.5.1. SemblanceBased ERP Window Selection by Channels
Let the signals be denoted by , where c corresponds to the channel and , and be the set of all the stimuli, , in which corresponds to the subset of target stimuli and corresponds to the subset of nontarget stimuli. The responses to a stimulus in a predefined temporal window of size can then be extracted as follows:where corresponds to the stimulus onset i. The average response for each type of stimulus, and for each channel, can be computed as follows:in which the operator denotes the cardinal number. After obtaining the averages, we compute their continuous wavelet transforms and to calculate the dot product D through equation (5).
In the example given in Figure 1, the result obtained for the dot product D corresponds to Figure 1(a), where the cold colors indicate that the signals have a maximum difference, and P300 is located in the spatial space. The normalised standard deviation of D is shown in Figure 1(b). By using a threshold , , the new temporal window, for the current channel, is selected within limits and obtained through a sequential search of the first points that exceed the threshold from both the left and right boundaries. We named this algorithm “semblancebased ERP window selection by channels” (SEWS1), and Algorithm 2 describes the complete window selection process.

2.5.2. SemblanceBased ERP Window Selection over Channels
A variation of this model is used to perform the semblancebased ERP window selection over channels (SEWS2), as summarised in Algorithm 3, where the same temporal window is selected for all channels. To do this, the only difference in the process is to compute the grand average through all of the channels using

Further information about algorithms and methods can be found in [41].
2.6. Databases
2.6.1. UAM Dataset
We employ a P300 dataset [42] recorded by the Neuroimaging Laboratory of Universidad Autónoma Metropolitana (UAM), Mexico, using the P300 speller [4] included in the BCI2000 platform [43]. The dataset (http://akimpech.izt.uam.mx/p300db/; https://tinyurl.com/ycr52v9b) contains 22 firsttime healthy users, and all subjects have similar characteristics, such as sleep duration and age. A total of 10 electrodes were recorded (Fz, C3, Cz, C4, P3, Pz, P4, PO7, PO8, and Oz) providing the best features for discrimination [20, 44]. The signals were recorded at 256 Hz using a g.tec g.USBamp EEG amplifier, a right ear reference, and a right mastoid ground. An 8th order 0.1–60 Hz Chebyshev bandpass filter and a 60 Hz notch were used. The stimulus is highlighted for 62.5 ms with an interstimulus interval of 125 ms. In order to validate our algorithms, we use session 1 (copy spelling session) to train a classifier and session 3 (free spelling session) to test the generated models. This choice was made because both sessions do not use feedback and because the detection methods have to be robust under different conditions; that is, the algorithms will be studied under a multisession scheme. From a singletrial perspective, the dataset contains 240 letters per person for session 1 and 250 letters approximately for session 3 because the number of letters varies depending on the subject. Finally, considering that, for the detection of one letter, it is necessary to identify two P300s, i.e., detecting twice 1 ERP among 6 responses, and the letter detection by chance is 1/36.
2.6.2. EPFL Dataset
This dataset (for further details about subjects, see http://mmspg.epfl.ch/BCI_datasets) [45] was recorded by the Ecole Polytechnique Fédérale de Lausanne utilising six different images to evoke a P300 response. The images were individually and randomly flashed for 100 ms with an interstimulus interval of 400 ms. 32 electrodes were recorded (Fp1, AF3, F7, F3, FC1, FC5, T7, C3, CP1, CP5, P7, P3, Pz, PO3, O1, Oz, O2, PO4, P4, P8, CP6, CP2, C4, T8, FC6, FC2, F4, F8, AF4, Fp2, Fz, and Cz) using a BioSemi ActiveTwo system at a sampling rate of 2048 Hz.
3. Results
This section presents the experiments for evaluating the performance of the new methods, which are introduced by contrasting them with the xDAWN algorithm and the state of the art on wavelet thresholding techniques.
3.1. Experiment: METS vs. Classic Wavelet Methods
Table 1 shows the results of our denoising algorithm METS compared with the results of the classic thresholding methods. The baseline results only using a bandpass filter of [0.1–20] Hz are also presented. All thresholding techniques improve the baseline result on average, meaning that these techniques are actually denoising the original signals and they do not remove the relevant information. We can observe that the performance of the subject with the maximum value slightly decreases using METS. However, our algorithm is, in general, able to remove the noise from the signal’s subjects. This improves the results obtained by the classic thresholding techniques based on wavelets. Moreover, the standard deviation is less in our approach, meaning that the increase in the mean is global and not due to the improvement of subjects with best results. In fact, the maximum value is the same as that obtained by minimax and universal thresholds, while the minimum is improved by METS.
3.2. Experiment: METS vs. SEWS Algorithms
In this section, we apply our two main contributions, the SEWS techniques and the METS denoising algorithm, to analyse the impact of a smart selection of a thinner window.
The time windows for the subjects range within [0–133] ms for the starting point and within [762–1000] ms for the ending point. The mean and standard deviation for the starting point of the window for METS & SEWS1 are and for METS & SEWS2 are . The mean and standard deviation for the ending point of the window for METS & SEWS1 are and for METS & SEWS2 are . Finally, the window sizes for METS & SEWS1 are and for METS & SEWS2 are .
Table 2 shows the impact of using the METS algorithm as the previous step for the timewindow selection. The threshold values for the algorithms are for METS, for SEWS1, and for SEWS2, according to the previous study [41].
3.3. Experiment: Comparison with xDAWN Algorithm
The xDAWN algorithm plays a similar role to the algorithms proposed in this paper because METS also uses the multichannel information.
Table 3 presents the results of xDAWN, METS & SEWS1, and METS & SEWS2, for each subject. The minimum sample size for each subject was 2160 trials. Note that the baseline performance is for a random detection. The preprocessing of the signal was performed according to the specifications in [19]. The only difference from our experimental setup is the high cutoff filter frequency of 1 Hz, instead of 0.1 Hz. For xDAWN, three sources (channels) were used because previous studies have shown that the best result was obtained using this configuration [41].
To check the normality assumption, we applied the Shapiro–Wilk pairwise test with a significant level . For the combination of xDAWN with METS & SEWS1 and xDAWN with METS & SEWS2, the difference of performances is not normally distributed. On the contrary, for the difference of performances between METS & SEWS1 with METS & SEWS2, we cannot reject the normality assumption. Finally, the performance of the METSSEWS algorithms is statistically greater than that of xDAWN according to the Wilcoxon signedrank test with a significant level . However, the METS & SEWS1 and METS & SEWS2 have no statistically significant difference according to the ttest ().
Table 4 summarises the statistics of the letter accuracy for xDAWN compared to our algorithms. Our proposed algorithms obtained the best results, showing improvements in mean, standard deviation, and maximum and minimum values. Surprisingly, the mean results for xDAWN do not improve the baseline of 52.50% (see Table 4) obtained with the [1–19] Hz filter, achieving good results for subjects with high performances and very low results for subjects with low performances, as shown in Table 3.
Table 5 shows the results corresponding to the evolution of the algorithms when using different numbers of repetitions, going from a single trial up to five repetitions. It is possible to observe that METS & SEWS2 perform better than METS & SEWS1 using repetitions. As was expected, METS & SEWS perform better than xDAWN for a few (two and three) repetitions. For four repetitions, the results are very similar, and for five repetitions, xDAWN outperforms our algorithms.
The results for the EPFL dataset are reported in Table 6. To check the normality assumption, we applied the Shapiro–Wilk pairwise test with a significant level . For all the combination of algorithms, we cannot reject the normality assumption. The performance of the METSSEWS algorithms is statistically greater than that of the baseline and the METS algorithms according to the ttest with a significant level . However, METS & SEWS1 and METS & SEWS2 have no statistically significant difference according to the ttest (). It is important to note that the combination of METS and SEWS offers an improvement for almost all the subjects. Specifically, SEWS2 is more effective for disabled subjects (i.e., the first four subjects). Moreover, it obtains an impressive improvement from 45.31% to 62.50% for the healthy subject numbered s8.
4. Discussion
The results show that the introduction of the channels’ correlation in the process and the fact that the channels are processed as a whole improve the denoising step for the classification. The thresholds in classic methods are selected using statistical techniques to infer a suitable threshold for each subject, while for METS, we used a fixed threshold for all subjects. Thus, METS could be extended to use a similar automatic threshold selection to obtain better results.
Although the improvements compared to the classical wavelet methods may seem modest in terms of absolute values, we should take into consideration that the probability of detecting a letter in the P300 speller, by chance, is only 1/36 (detecting twice 1 ERP among 6 responses), in contrast to the probability of randomly detecting a single ERP of 1/2. This means that there is a high probability of missing a letter that was correctly detected (0.97) because of randomness and only a small chance of correctly detecting a letter (0.03) that was previously misclassified.
Also, the METS algorithm is flexible enough to exploit the specific properties of the analysed phenomenon through the parameter selection. In fact, the mother wavelet is one important parameter. Unfortunately, there is no formal method to choose a mother wavelet. It can only be chosen by its resemblance in shape to the underlying target phenomenon or by their properties.
Table 3 suggests that the data of subjects with high accuracy are less noisy or have stronger ERP responses, making the classification results similar for all methods. However, for a more general solution that can deal with signals from subjects with poor original performance, either because they are firsttime users or because they do not generate a strong P300 response, our approach seems more suitable. Therefore, we conclude that waveletbased semblance performs better than xDAWN for singletrial detection in the presence of artifacts and noise. Regardless, xDAWN should have a better performance using averaged responses. These results are consistent with the theory behind both algorithms: xDAWN improves with more repetitions because averaged ERPs are easier to enhance due to the increased SNR, while the effect of METS becomes moderate because averaging naturally removes uncorrelated components. Nevertheless, xDAWN and our proposed algorithms are not mutually exclusive, which means that their clever combination could be developed in the future.
As is explained in Section 2.5, a temporal window must be set to specify the segment to be analysed. The removed information usually lies at the end of the original temporal window, beyond the segment where P300 theoretically lies. Both SEWS algorithms improve the results compared to the results of only using METS. When comparing with the baseline filter results, the combination of our two independent algorithms significantly improves the ERP detection in a single trial and only incurs a small computational cost compared to the time required for the training and classification phases. In particular, SEWS1 obtains the best maximum and minimum accuracy, but, on average, both window selection algorithms are competitive. Besides, a better SNR makes the classification easier and implies that useless features removed by SEWS have less impact on the decision boundaries.
The experiment using the EPFL dataset validates our techniques for singletrial ERP detection not only because it is a different dataset with a different display matrix and patients but also because it uses the same parameters used for the UAM dataset. Finally, better performances are expected if the parameters (like the thresholds) are fine tuned for this dataset.
5. Conclusion
We introduced two automatic methods in this paper. The first method aims to simultaneously denoise channel recordings, which allows us to detect signal components that are not present in all channels and improve the analysis of EEG signals in general. The second method uses the continuous wavelet transform to analyse ERP’s time windows. The core element of the timewindow selection algorithms is the dot product D, and the techniques presented in this paper only explore a few of the possibilities of what can be done using D. The standard deviation exploits the information carried by D. A more elaborate or detailed method could be developed in the future to increase the performance or robustness of the timewindow selection. Moreover, our analysis of the similarity of signals based on wavelets opens several research opportunities for EEG applications. For example, the same principle behind the timewindow selection can be applied to detect constant scales through the full signal to select the most useful frequency bands for the problem under study.
Finally, these similarity measures can be applied to the study, and comparison of the EEG behavior of different subjects for the same application can be applied to help us better understand the physiological responses of the brain or to develop more robust BCI techniques.
Data Availability
The data used to support the findings of this study are available at http://mmspg.epfl.ch/BCI_datasets and http://akimpech.izt.uam.mx/p300db/.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
The authors acknowledge the support from CONICYT–PAI/Concurso Nacional Inserción En La Academia, Convocatoria 2014–Folio (79140057) and Proyectos REDES ETAPA INICIAL, Convocatoria 2017 (REDI170367).