Abstract

It is well established that contamination of EEG by ocular artifacts reduces the classification accuracy of brain-computer interfaces (BCIs) and hampers the diagnosis of brain diseases in clinical research. Therefore, for BCI and clinical applications, it is very important to remove or reduce these artifacts before EEG signal analysis. Although EOG-based methods are simple and fast for removing artifacts, their performance is highly affected by bidirectional contamination. Some studies have suggested that the solution to this problem is to low-pass filter EOG signals before using them in an artifact removal algorithm, but there is still no evidence on the optimal low-pass cutoff frequency for EOG signals. In this study, we investigated the optimal EOG low-pass filtering limits using state-of-the-art artifact removal techniques with fifteen artificially contaminated EEG and EOG datasets. In this comprehensive analysis, unfiltered EOG signals and EOG signals low-pass filtered at twelve different cutoff frequencies were used with five different algorithms, namely, simple regression, least mean squares, recursive least squares, REGICA, and AIR. Results from statistical testing of time- and frequency-domain metrics suggested that a low-pass cutoff between 6 and 8 Hz is the optimal filtering frequency for EOG signals, both to minimize the effect of bidirectional contamination and to obtain good results from artifact removal algorithms. Furthermore, we used BCI competition IV datasets to show the efficacy of the proposed framework on real EEG signals. The motor-imagery-based BCI achieved significantly higher classification accuracies when artifacts were removed from EEG using 7 Hz low-pass filtered EOG signals than with any other EOG filtering. These results also support our hypothesis that low-pass filtering should be applied to EOG signals before they are used in the artifact removal process, as it enhances the performance of each algorithm.
Moreover, the comparison results indicated that the hybrid algorithms outperformed single algorithms for both simulated and experimental EEG datasets.

1. Introduction

The functional dynamics of the brain have been thoroughly investigated over many years using noninvasive brain imaging techniques [1–5]. Electroencephalography (EEG), for example, is a portable neuroimaging technique that can be used to assess different functional brain states [6–9]. However, a recorded EEG signal is highly contaminated with nonneuronal activity from different sources, including eye blinking, eye movements, muscle movements, and cardiac activity (electrocardiography, ECG) [10–15]. Eye movements and blinking generate artifacts of much higher magnitude than the neuronal activity present in EEG data [16–18]. Such interferences are commonly known as ocular artifacts [19, 20].

It is widely accepted within the BCI research community that in any BCI system, neurological phenomena should be the only source of control [21, 22]. Artifacts, unwanted electrical signals that arise from sources other than the brain, can interfere with neurological phenomena. Such artifacts might alter the characteristics of neurological phenomena or even be mistakenly used as the source(s) of control in BCI systems [23]. Among the different artifacts, eye movements and blinks are the major sources of physiological artifacts in BCI systems [24–26]. If not removed, these artifacts could, as indicated above, be mistakenly used to control the BCI system, which is the most significant artifact-related problem [27]. Because failing to deal with artifacts can degrade BCI system performance in practical applications, it is necessary to develop automatic methods to handle artifacts or to design BCI systems that are robust to them. Bashashati et al. showed that dealing with eye artifacts in EEG data can enhance the performance of a self-paced BCI system [24]. Erfanian and Mahmoudi used recurrent neural networks to automatically suppress ocular artifacts for improved EEG-based BCI performance [25]. Recently, Yong et al. combined stationary wavelet analysis with adaptive thresholding to automatically remove ocular artifacts from EEG data in an EEG- and eye tracker-based self-paced BCI system [26]. They showed that their system achieved higher performance than BCIs in which artifacts were not removed. Furthermore, artifacts can also affect diagnosis and analysis in clinical research on conditions such as sleep disorders, Alzheimer's disease, and schizophrenia [28–32]. It is therefore essential, in both clinical and practical research, to deal with these artifacts prior to the analysis of EEG signals.

Several manual and automated methods have been developed to deal with this challenging task. One straightforward approach to reducing ocular artifacts is to prevent eye movements as much as possible, though requesting this and achieving it are two very different things. Moreover, specifically asking subjects to avoid blinking could affect the investigated states and cognitive processes of the subject [33]. Another commonly employed solution is to discard the EEG epochs that contain ocular artifacts, though this incurs the loss of neuronal-activity-related EEG data. Alternatively, several automated methods for the detection and removal/reduction of ocular artifacts have been proposed, such as blind source separation-based methods, wavelet transforms, regression-based analysis, and empirical mode decomposition. Among these, the most commonly employed are regression-based algorithms, which remove electrooculography (EOG) contamination from EEG data. The simplest and most common procedure for removing ocular artifacts from EEG data entails subtracting, from each EEG channel, reference channel signals containing the artifactual interference. Techniques of this kind were widely applied until the mid-1990s because of their low computational cost and simplicity [20, 33]. Subsequently, researchers used EOG channels to record eye movements and blinks in order to remove ocular artifacts from EEG data more efficiently. EOG-based methodologies assume that the true neuronal activity and the ocular artifacts combine linearly in the acquired EEG signals. These methods employ regression-based analysis [10, 18, 34–46], in which the contamination coefficients of the EOG signals are estimated in the time domain and the scaled EOG signals are subtracted from each EEG channel to obtain clean EEG signals. Because their computational demands are very low, they are well suited to real-time/online BCI applications.
Although these methods have proved more efficient than simple reference channel regression, their performance is affected by several factors. For example, neuronal activity from the frontal brain area, which the EOG also measures, might be eliminated during the subtraction process, resulting in the loss of true EEG signals [47]. Furthermore, these techniques assume that EOG signals and the neuronal activity recorded in EEG signals are uncorrelated, an assumption that has been found not to hold [48, 49]. To overcome these issues, regression-based algorithms have recently been combined with blind source separation techniques to develop automated methodologies for the removal of ocular artifacts [47, 50]. These methodologies have been shown to be more effective than either regression- or blind source separation-based techniques alone, but there is still room for improvement, possibly because their outcomes are still affected by bidirectional contamination.

The main problem with regression-based techniques is that they are always affected by bidirectional contamination: EEG recordings are contaminated by eye movements and blinking, while EOG recordings are contaminated by neuronal activity (originating mostly from the frontal and lateral frontal areas) [18, 47, 50, 51]. Therefore, removing ocular artifacts using EOG signals also removes the neuronal activity common to both EEG and EOG data. In a modified version of the regression method, called filtered regression, the effects of bidirectional contamination are reduced by low-pass filtering the EOG signals prior to regression analysis [39, 43, 44, 52]. This idea is based on studies showing that the high-frequency components in EOG channels are generated by brain activity [43, 52]. Table 1 lists the different low-pass frequencies used by researchers; there is no consensus on any particular cutoff frequency for EOG signals, although there is agreement that most of the low-frequency components in EOG signals belong to ocular artifacts [43, 44, 47, 51, 52]. Determining the optimal low-pass frequency for EOG signals is very important, as the outcomes of regression-based correction methods can be affected by the selected cutoff. To the best of our knowledge, no study has investigated the optimal low-pass filtering limits of EOG signals for use with regression-based algorithms. Our hypothesis is that low-pass filtering of EOG signals enhances the artifact removal process by reducing bidirectional contamination; if so, the question is which low-pass cutoff range gives better results than all other filtering options.
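As an illustration of the filtering step in filtered regression, the Python sketch below low-pass filters EOG channels before they are passed to a regression-based method. The filter design is our assumption and is not taken from the studies in Table 1: a fourth-order Butterworth filter applied forward and backward with SciPy's `sosfiltfilt`, so that blink waveforms are not shifted in time. The function name `lowpass_eog` is hypothetical, and `cutoff_hz=None` stands for the unfiltered condition.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def lowpass_eog(eog, fs, cutoff_hz, order=4):
    """Zero-phase Butterworth low-pass filter along the last (time) axis.

    eog: array of shape (n_channels, n_samples) or (n_samples,).
    cutoff_hz=None returns the signal unchanged (the 'unfiltered' condition).
    The filter type and order are illustrative assumptions.
    """
    if cutoff_hz is None:
        return np.asarray(eog, dtype=float)
    sos = butter(order, cutoff_hz, btype="low", fs=fs, output="sos")
    # Forward-backward filtering gives zero phase shift, so the filtered EOG
    # stays time-aligned with the EEG it will be regressed against.
    return sosfiltfilt(sos, eog, axis=-1)


if __name__ == "__main__":
    fs = 250.0
    t = np.arange(0, 4, 1 / fs)
    slow = np.sin(2 * np.pi * 1.0 * t)          # stand-in for slow ocular activity
    fast = 0.5 * np.sin(2 * np.pi * 20.0 * t)    # stand-in for neural leakage into EOG
    filtered = lowpass_eog(slow + fast, fs, cutoff_hz=7.0)
    print("residual std:", np.std(filtered - slow))
```

Zero-phase filtering is chosen here because a delay between the filtered EOG and the EEG would tend to bias the regression coefficients estimated afterwards.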

In this study, we used simulated contaminated EEG and EOG datasets and motor-imagery-based experimental BCI datasets to investigate the effect of different EOG filtering on the removal of ocular artifacts from EEG data. We used EOG data low-pass filtered at 12 different cutoff frequencies, as well as unfiltered EOG data, with five methods from the literature, namely, simple regression [44], least mean squares-based regression [53], recursive least squares-based regression [45], REGICA [47], and the method developed in [51] (hereafter referred to as automatic independent component analysis and regression (AIR)), to determine the effect of different EOG filtering on artifact removal from EEG data. Because the underlying artifact-free EEG (true EEG) in artificially contaminated EEG data is known, the effect of each EOG filtering can be evaluated using different performance metrics. The performance evaluation indexes employed were the mean square error in the time domain and the mean absolute error in the frequency domain. Additionally, mutual information was used to estimate the information shared by the reconstructed EEG signal and the artifact-free EEG signal. The improvement in the reconstructed EEG was also evaluated using the signal-to-artifact ratio before and after the artifact removal process for all algorithms. For the real EEG datasets, we evaluated the classification accuracy for each subject and each method after artifact removal using different low-pass EOG filtering. Finally, paired t-tests were applied to the results of the simulated and experimental datasets to identify the optimal EOG filtering with the highest statistical significance. The results of this statistical testing revealed that the best results from EOG-based algorithms could be achieved with a low-pass cutoff between 6 and 8 Hz.
Furthermore, the results of both simulated and real EEG signals indicate that hybrid algorithms performed better than simple regression and adaptive filtering. A schematic diagram of the study is shown in Figure 1.

2. Materials and Methods

2.1. Materials

This section describes in detail the procedure used to simulate the contaminated EEG and EOG datasets, as well as the real EEG datasets used from BCI competition IV.

2.1.1. Simulated Datasets

(1) Participants. Fifteen (15) healthy subjects (all male) participated in this study. All had normal or corrected-to-normal vision. The experimental protocol was approved by the Institutional Review Board of Pusan National University. The experiment was conducted in accordance with the ethical guidelines established by the Institutional Review Board of Pusan National University and the Declaration of Helsinki. Each participant was asked to sign an informed consent form after being thoroughly informed about the nature and purpose of the study. The experiments were performed in a quiet room with dim lighting to prevent environmental disturbances. Each participant was seated in an armchair at a distance of about 1 m from a 24″ LCD monitor (ASUS; resolution: 1366 × 768).

(2) Experiment 1. In this experiment, the participants were asked to sit relaxed and calm with their eyes closed for 30 s. Each was also instructed to avoid moving their eyes during the experiment so as to avoid, or at least minimize, artifacts. After the experiment, the subjects’ data were carefully inspected for the presence of major artifacts; none were found. These datasets were then used as “clean EEG” signals for further analysis. The data from this experiment are referred to as the “neuronal group” throughout this paper.

(3) Experiment 2. The experimental protocol was as follows. At the start of the experiment, the subject was instructed to sit relaxed and calm for 3 s. Three different word cues (blink, move horizontally, and move vertically) were used. The subjects were asked to blink or to move their eyes vertically or horizontally according to six visual cues (two for each word), each of which appeared for 2 s at the center of the screen. The interval between cues was 2 s. At the end of the experiment, each subject was again asked to relax for 3 s. The total duration of the experiment was 30 s. The data from this experiment are referred to as the “artifactual group” throughout this paper.

(4) EEG Recordings. The EEG data were acquired using an ActiCap 32-channel active electrode system with a BrainAmp DC amplifier (Brain Products GmbH, Gilching, Germany). The data sampling rate was 250 Hz. Nineteen (19) electrodes positioned according to the international 10–20 system (Fp1, Fp2, F3, F4, F7, F8, Fz, C3, C4, Cz, T7, T8, P7, P8, Pz, P3, P4, O1, and O2) were used for acquisition of EEG signals. AFz and FCz were used as the ground and reference electrodes, respectively. The impedance of all of the electrodes was reduced to below . The data were high-pass filtered at 0.5 Hz.

(5) EOG Recordings. The EOG data were acquired using the BrainAmp ExG system (Brain Products GmbH, Gilching, Germany). Four electrodes were placed around the left and right eyes to record ocular activity. All of the data were sampled at 250 Hz. Table 2 lists the class-wise low-pass frequencies used in this study.

(6) Simulated Datasets. To investigate the optimal frequency limits, we simulated 15 artificially contaminated EEG and EOG datasets. Because the underlying true EEG signal in artificially contaminated EEG data is known, such data can serve as a primary tool for determining the optimal filtering of EOG signals. We used EEG data recorded in an eyes-closed session to simulate contaminated EEG. Although such data might contain low-frequency eye movement contamination, they are nonetheless preferred because they tend to contain minimal overall artifacts. The alternative is eyes-open data acquisition; note, however, that the eyes produce much higher-amplitude signals in light than in darkness [54]. Recording EEG signals in an eyes-closed session is therefore preferred. EOG signals, in contrast, were acquired in an eyes-open session with different eye movements. Simple linear models were estimated to calculate the contamination parameters for both EEG and EOG. In this way, we can simulate bidirectionally contaminated signals, that is, EEG data contaminated with EOG signals and EOG data contaminated with EEG signals, and thus obtain simulated signals as close as possible to real signals [43].

(6.1) Simulated EEG Signals. It is known that the recorded EEG signal contains pure activity from neurons, ocular artifacts, and measurement noise (artifacts from all other sources), as shown in Figure 2. In this paper, neuronal sources (EEG signals from the neuronal group) were artificially contaminated with ocular sources (EOG signals from the artifactual group) to simulate contaminated EEG signals [43]. These interferences were calculated by estimation of simple linear models between EEG and EOG recordings [44]. The detailed procedure is explained below.

Four 2 s epochs with high EOG activity were selected from each of the fifteen subjects in the artifactual group. A simple linear model was estimated for each 2 s epoch and each channel. These models have two inputs, corresponding to the $\mathrm{VEOG}_A$ and $\mathrm{HEOG}_A$ signals (where the subscript $A$ refers to signals from the artifactual group), and one output, which was each of the 19 EEG channels ($\mathrm{EEG}_{A,i}$). For a better estimation of the models, the neuronal activity present in the EOG because of bidirectional contamination was reduced so that accurate contamination parameters could be calculated. For this purpose, in each model, the $\mathrm{EEG}_A$ and $\mathrm{EOG}_A$ channels, corresponding to the output and inputs, respectively, were low-pass filtered at a cutoff equal to the highest of the frequencies below which 99% of the total energy of these $\mathrm{EEG}_A$ and $\mathrm{EOG}_A$ signals lies ($f_{99}$) [43]. The remaining 1% of signal energy was thus considered non-ocular activity (neural activity, power line interference, electrode noise, etc.) in the recordings [43]. The 99% cutoff frequencies obtained were 6.44 ± 2.43 Hz and 7.61 ± 3.82 Hz for the EEG and EOG signals, respectively (mean ± standard deviation over all epochs and subjects). This choice is supported by the observation that most high-frequency components of EOG signals are of neuronal origin [52].
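The 99% energy cutoff can be computed from the spectrum of each epoch. The sketch below is one possible reading of the procedure, assuming a plain periodogram of each 2 s epoch (500 samples at 250 Hz) and taking the larger of the cutoffs of the signals entering a model; the helper names `energy_cutoff_frequency` and `model_cutoff` are hypothetical.

```python
import numpy as np
from scipy.signal import periodogram


def energy_cutoff_frequency(x, fs, fraction=0.99):
    """Lowest frequency below which `fraction` of the signal's spectral energy lies."""
    freqs, psd = periodogram(x, fs=fs)           # one-sided PSD, mean removed
    cumulative = np.cumsum(psd)
    cumulative = cumulative / cumulative[-1]     # normalized cumulative energy in [0, 1]
    # First frequency bin at which the cumulative energy reaches the target fraction.
    idx = int(np.searchsorted(cumulative, fraction))
    return float(freqs[min(idx, len(freqs) - 1)])


def model_cutoff(signals, fs, fraction=0.99):
    """Cutoff for one model: the highest f99 among its input and output signals."""
    return max(energy_cutoff_frequency(s, fs, fraction) for s in signals)


if __name__ == "__main__":
    fs = 250.0
    t = np.arange(500) / fs                      # one 2 s epoch
    eog_like = np.sin(2 * np.pi * 2 * t)
    eeg_like = np.sin(2 * np.pi * 6 * t)
    print(model_cutoff([eeg_like, eog_like], fs))
```

With only 500 samples the frequency resolution is 0.5 Hz, so cutoffs such as 6.44 Hz are only meaningful as averages over many epochs.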

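The linear model used to estimate the parameters of EOG interference for each EEG channel $i$ and each epoch $k$ is

$$\mathrm{EEG}_{A,i,k}(n) = \alpha_{i,k}\,\mathrm{VEOG}_{A,k}(n) + \beta_{i,k}\,\mathrm{HEOG}_{A,k}(n) + e_{i,k}(n),$$

where the subscript $A$ denotes the artifactual group, $\alpha_{i,k}$ and $\beta_{i,k}$ are the unknown model parameters, and $e_{i,k}(n)$ is the unknown error term. This procedure yields four $\alpha$ and four $\beta$ parameters for each channel, one per epoch. These parameters were averaged, $\bar{\alpha}_i = \frac{1}{4}\sum_{k=1}^{4}\alpha_{i,k}$ and $\bar{\beta}_i = \frac{1}{4}\sum_{k=1}^{4}\beta_{i,k}$, to obtain the ocular contamination coefficients of each channel. Finally, simulated EEG signals were generated according to Elbert’s contamination model [54] as

$$\mathrm{EEG}_{C,i}(n) = \mathrm{EEG}_{N,i}(n) + \bar{\alpha}_i\,\mathrm{VEOG}_{A}(n) + \bar{\beta}_i\,\mathrm{HEOG}_{A}(n) + \varepsilon(n),$$

where $\mathrm{EEG}_{C,i}$ and $\mathrm{EEG}_{N,i}$ are the artificially contaminated and pure EEG signals of channel $i$, the subscript $N$ refers to the neuronal group, and $\varepsilon(n)$ is modeled as white Gaussian noise compensating for other noise sources. An example of simulated contaminated EEG data for electrode Fp1 is shown in Figure 3(a).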
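A minimal NumPy sketch of the EEG simulation for one channel follows. It assumes ordinary least squares without an intercept for each epoch (the estimator is not specified in the text) and zero-mean epochs that have already been low-pass filtered at the model cutoff. `fit_ocular_coefficients` and `simulate_contaminated_eeg` are hypothetical names, and `noise_std`, the standard deviation of the white Gaussian noise term, is an assumed parameter whose value the text does not give.

```python
import numpy as np


def fit_ocular_coefficients(eeg_epochs, veog_epochs, heog_epochs):
    """Estimate (alpha, beta) per epoch for one EEG channel and average them.

    All inputs have shape (n_epochs, n_samples); epochs are assumed zero-mean.
    """
    alphas, betas = [], []
    for y, v, h in zip(eeg_epochs, veog_epochs, heog_epochs):
        X = np.column_stack([v, h])                    # two regressors, no intercept
        (a, b), *_ = np.linalg.lstsq(X, y, rcond=None)
        alphas.append(a)
        betas.append(b)
    return float(np.mean(alphas)), float(np.mean(betas))


def simulate_contaminated_eeg(pure_eeg, veog, heog, alpha, beta, noise_std=0.0, seed=0):
    """Pure EEG plus scaled VEOG/HEOG plus white Gaussian noise (additive model)."""
    rng = np.random.default_rng(seed)
    noise = noise_std * rng.standard_normal(np.shape(pure_eeg))
    return pure_eeg + alpha * veog + beta * heog + noise


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    v, h = rng.standard_normal((2, 4, 500))            # four 2 s epochs at 250 Hz
    y = 0.3 * v - 0.1 * h + 0.01 * rng.standard_normal((4, 500))
    print(fit_ocular_coefficients(y, v, h))            # close to (0.3, -0.1)
```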

(6.2) Simulated EOG Signals. Ocular sources (EOG signals from the artifactual group) were artificially contaminated with neuronal sources (EEG signals from the neuronal group) to simulate contaminated EOG signals. As for the simulated EEG signals, these interferences were calculated by estimating simple linear models between the EOG and EEG recordings. The detailed procedure is described below.

Four 2 s epochs with no apparent EOG activity were selected from each of the fifteen subjects in the neuronal group. Neuronal contamination of the EOG channels was obtained from the frontal electrodes (Fp1, Fp2, F7, and F8), which are the nearest ones to the eyes [44]. A linear model was estimated for each of $\mathrm{VEOG}_N$ and $\mathrm{HEOG}_N$ (the EOG signals of the neuronal group) for each 2 s epoch. These models have four inputs, corresponding to the signals from Fp1, Fp2, F7, and F8, and one output, which is either $\mathrm{VEOG}_N$ or $\mathrm{HEOG}_N$.

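The two linear models used to estimate the parameters of EEG interference in the vertical and horizontal EOG for each epoch $k$ are

$$\mathrm{VEOG}_{N,k}(n) = \sum_{m} \gamma_{m,k}\,\mathrm{EEG}_{N,m,k}(n) + e_{V,k}(n),$$

$$\mathrm{HEOG}_{N,k}(n) = \sum_{m} \delta_{m,k}\,\mathrm{EEG}_{N,m,k}(n) + e_{H,k}(n),$$

where $m \in \{\mathrm{Fp1}, \mathrm{Fp2}, \mathrm{F7}, \mathrm{F8}\}$, $\gamma_{m,k}$ and $\delta_{m,k}$ are the unknown model parameters, and $e_{V,k}$ and $e_{H,k}$ are error terms.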

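This procedure yields four sets of parameters $\gamma_{m,k}$ and four sets of parameters $\delta_{m,k}$ (the weights of frontal electrode $m \in \{\mathrm{Fp1}, \mathrm{Fp2}, \mathrm{F7}, \mathrm{F8}\}$ in the VEOG and HEOG models of epoch $k$), one set per epoch. These parameters were averaged over epochs, $\bar{\gamma}_m = \frac{1}{4}\sum_{k=1}^{4}\gamma_{m,k}$ and $\bar{\delta}_m = \frac{1}{4}\sum_{k=1}^{4}\delta_{m,k}$, to obtain the neuronal contamination coefficients of VEOG and HEOG, respectively. Finally, simulated EOG signals were generated as

$$\mathrm{VEOG}_{C}(n) = \mathrm{VEOG}_{A}(n) + \sum_{m}\bar{\gamma}_m\,\mathrm{EEG}_{N,m}(n),$$

$$\mathrm{HEOG}_{C}(n) = \mathrm{HEOG}_{A}(n) + \sum_{m}\bar{\delta}_m\,\mathrm{EEG}_{N,m}(n),$$

where $\mathrm{VEOG}_C$ and $\mathrm{HEOG}_C$ are the artificially contaminated VEOG and HEOG, respectively, and the subscripts $A$ and $N$ denote the artifactual and neuronal groups. An example of simulated contaminated EOG data is shown in Figure 3(b).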
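The EOG simulation follows the same pattern with four regressors instead of two. The sketch below makes the same assumptions as for the EEG case (per-epoch ordinary least squares without intercept, coefficients averaged over epochs, zero-mean epochs); the function names are hypothetical, and the frontal channels are expected in the order Fp1, Fp2, F7, F8.

```python
import numpy as np

FRONTAL = ("Fp1", "Fp2", "F7", "F8")   # row order expected in the frontal arrays


def fit_neural_coefficients(eog_epochs, frontal_epochs):
    """Average per-epoch least-squares weights of the frontal EEG in one EOG channel.

    eog_epochs: (n_epochs, n_samples); frontal_epochs: (n_epochs, 4, n_samples).
    Returns an array of shape (4,), one weight per frontal electrode.
    """
    weights = []
    for y, X in zip(eog_epochs, frontal_epochs):
        w, *_ = np.linalg.lstsq(X.T, y, rcond=None)   # X.T has shape (n_samples, 4)
        weights.append(w)
    return np.mean(weights, axis=0)


def simulate_contaminated_eog(artifactual_eog, frontal_eeg, weights):
    """Artifactual-group EOG plus the weighted frontal EEG of the neuronal group."""
    return artifactual_eog + weights @ frontal_eeg
```

The same two functions would be called once for VEOG and once for HEOG.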

2.1.2. Experimental Datasets

In this study, the datasets of nine healthy subjects were taken from the publicly available MI-based BCI signals of BCI competition IV (dataset 2a). Four different MI classes, namely, left hand, right hand, both feet, and tongue, were performed by all subjects. The experiment consisted of training and evaluation sessions. For all subjects, each session consisted of six runs separated by short breaks. For each subject, the two classes from the evaluation sessions corresponding to left- and right-hand MI were selected. One run consisted of twelve trials of each class, resulting in 144 trials for each subject. Twenty-two EEG and three EOG channels [55] were used to record the data at a sampling frequency of 250 Hz with a 50 Hz notch filter. All EEG channels were recorded monopolarly with the left mastoid serving as reference and the right mastoid as ground. The signals were band-pass filtered between 0.5 and 100 Hz. More details can be found in [56]. The datasets are highly contaminated with ocular artifacts, which is a challenging problem in practical BCI systems [57].

2.2. Methods

In the literature, different artifact removal algorithms have been developed to deal with the ocular contamination present in EEG signals. Broadly speaking, these algorithms can be divided into three main categories: EOG-based, non-EOG-based, and hybrid algorithms. The most commonly employed are EOG-based regression algorithms [18, 33]. Although these algorithms are simple and perform well compared with manual rejection, they nonetheless distort EEG data because of bidirectional contamination [47]. To improve the performance of simple regression-based algorithms, researchers developed adaptive filter-based regression algorithms. The most commonly used adaptive filters are least mean squares and recursive least squares filters, which have proved more effective than simple regression. Non-EOG-based algorithms, for example, ICA-based algorithms, do not require EOG signals, but their removal of artifactual ICs might cause the loss of substantial neuronal data, which is their major drawback [58, 59]. Urigüen and Garcia-Zapirain [10] suggested that combinations of multiple artifact removal methods can be developed to remove artifacts from recorded EEG signals efficiently. Recently, EOG-based algorithms have been combined with non-EOG-based algorithms (ICA) to deal more effectively with the ocular artifacts present in EEG data [47, 50, 51]. These algorithms were shown to outperform all of the algorithms with which they were compared in terms of artifact removal and preservation of the neuronal activity present in EEG data; however, their performance can be further improved if optimal low-pass EOG filtering is used. In this study, we used all three kinds of EOG-based algorithms (simple regression, adaptive regression, and hybrid methods) to investigate the effect of different low-pass filtering on the removal of ocular artifacts from EEG data.
One may argue that other methods, such as ICA, can remove ocular artifacts from EEG data without the need for EOG signals. However, irrespective of their other disadvantages, these methods cannot be used for real-time/online BCI applications, whereas regression-based methods (simple and adaptive) are simple and fast and can therefore be an optimal option for BCI applications if their performance is enhanced [60].

Next, we will briefly describe the implementation steps of the methods used in this study.

2.2.1. Simple Regression Method

The simple regression method is implemented as follows [44]:
(1) Equation (3) in [44] was used to estimate the parameters of the EOG signals.
(2) Artifact-free EEG was reconstructed by subtracting the estimated VEOG and HEOG contributions from the contaminated EEG.
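The two steps above can be sketched in a few lines of NumPy. We assume ordinary least squares for step (1), which may differ in detail from Equation (3) in [44]; in filtered regression, the EOG rows passed in would be the low-pass filtered VEOG and HEOG. `regression_remove` is a hypothetical name.

```python
import numpy as np


def regression_remove(eeg, eog):
    """Simple regression-based ocular artifact removal (least-squares sketch).

    eeg: (n_channels, n_samples); eog: (n_eog, n_samples), e.g. rows [VEOG, HEOG].
    Returns the corrected EEG and the (n_channels, n_eog) coefficient matrix.
    """
    # Step (1): solve eog.T @ B ~= eeg.T for all channels at once; B is (n_eog, n_channels).
    B, *_ = np.linalg.lstsq(eog.T, eeg.T, rcond=None)
    # Step (2): subtract the estimated EOG contribution from every channel.
    corrected = eeg - B.T @ eog
    return corrected, B.T
```

Because the same EOG rows are used for estimation and subtraction, any neural activity they carry is also subtracted, which is exactly the bidirectional-contamination problem that low-pass filtering the EOG aims to reduce.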

2.2.2. LMS Regression Method

The LMS regression method is implemented as follows [53]:
(1) Least mean squares estimation was used to estimate the parameters of the EOG signals.
(2) Artifact-free EEG was reconstructed by subtracting the estimated VEOG and HEOG contributions from the contaminated EEG.
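An LMS version of the same two steps adapts the EOG coefficients sample by sample. This is a generic textbook LMS sketch, not the exact implementation of [53]; the step size `mu` is an assumption and is only sensible for signals scaled to roughly unit variance, so microvolt-scale recordings would need rescaling or a normalized LMS variant. `lms_remove` is a hypothetical name.

```python
import numpy as np


def lms_remove(eeg_channel, eog, mu=0.002):
    """Sample-by-sample LMS removal of EOG from one EEG channel.

    eeg_channel: (n_samples,); eog: (n_eog, n_samples), e.g. rows [VEOG, HEOG].
    Returns (cleaned, w): the a-priori error signal, which serves as the
    corrected EEG, and the final coefficient vector.
    """
    n_eog, n = eog.shape
    w = np.zeros(n_eog)
    cleaned = np.empty(n)
    for t in range(n):
        x = eog[:, t]
        e = eeg_channel[t] - w @ x      # EEG minus the currently predicted artifact
        w = w + 2.0 * mu * e * x        # LMS gradient step on the squared error
        cleaned[t] = e
    return cleaned, w
```

A smaller `mu` gives less coefficient jitter once converged but takes longer to adapt after the start of the recording.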

2.2.3. RLS Regression Method

The RLS regression method is applied as follows [45]:
(1) Recursive least squares estimation was used to estimate the parameters of the EOG signals.
(2) Artifact-free EEG was reconstructed by subtracting the estimated VEOG and HEOG contributions from the contaminated EEG.
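The RLS variant replaces the gradient step with a recursive least-squares update of the coefficients. The sketch below is standard exponentially weighted RLS, not necessarily the configuration used in [45]; the forgetting factor `lam` and the initialization scale `delta` are assumed values, and `rls_remove` is a hypothetical name.

```python
import numpy as np


def rls_remove(eeg_channel, eog, lam=0.9999, delta=100.0):
    """Exponentially weighted RLS removal of EOG from one EEG channel.

    eeg_channel: (n_samples,); eog: (n_eog, n_samples).
    Returns (cleaned, w): the a-priori error signal (corrected EEG) and the
    final coefficient vector.
    """
    n_eog, n = eog.shape
    w = np.zeros(n_eog)
    P = delta * np.eye(n_eog)           # estimate of the inverse input correlation matrix
    cleaned = np.empty(n)
    for t in range(n):
        x = eog[:, t]
        Px = P @ x
        k = Px / (lam + x @ Px)         # gain vector
        e = eeg_channel[t] - w @ x      # a-priori error = EEG minus predicted artifact
        w = w + k * e
        P = (P - np.outer(k, Px)) / lam # P is symmetric, so outer(k, Px) = k x^T P
        cleaned[t] = e
    return cleaned, w
```

A forgetting factor close to 1 averages over many samples and gives stable coefficients; smaller values track changes in contamination faster at the cost of noisier estimates.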

2.2.4. REGICA

REGICA is implemented as follows [47]:
(1) EEG signals are decomposed using independent component analysis (ICA).
(2) The independent components (ICs) are filtered using recursive least squares estimation with the EOG signals as reference.
(3) The ICs are backprojected to reconstruct the EEG signal.
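To make the three REGICA steps concrete, the following self-contained sketch chains a minimal symmetric FastICA (tanh contrast, written in NumPy so that no ICA library is assumed), an RLS filter applied to each IC with the EOG as reference, and back-projection through the inverse of the unmixing matrix. It illustrates the listed steps under these assumptions and is not the implementation of [47]; all function names and parameter values are hypothetical.

```python
import numpy as np


def _sym_decorrelate(W):
    # W <- (W W^T)^(-1/2) W, which keeps the rows of W orthonormal.
    s, u = np.linalg.eigh(W @ W.T)
    return u @ np.diag(1.0 / np.sqrt(s)) @ u.T @ W


def fastica_unmixing(X, n_iter=200, seed=0):
    """Minimal symmetric FastICA; returns (unmixing, mean) with ICs = unmixing @ (X - mean)."""
    mean = X.mean(axis=1, keepdims=True)
    Xc = X - mean
    d, E = np.linalg.eigh(np.cov(Xc))
    K = E @ np.diag(1.0 / np.sqrt(d)) @ E.T          # whitening matrix
    Z = K @ Xc
    n = Z.shape[0]
    W = _sym_decorrelate(np.random.default_rng(seed).standard_normal((n, n)))
    for _ in range(n_iter):
        G = np.tanh(W @ Z)
        # Fixed-point update: E{g(WZ) Z^T} - diag(E{g'(WZ)}) W
        W = _sym_decorrelate(G @ Z.T / Z.shape[1] - np.diag((1.0 - G**2).mean(axis=1)) @ W)
    return W @ K, mean


def rls_filter(signal, ref, lam=0.9999, delta=100.0):
    """Remove the part of `signal` linearly predictable from `ref` (n_ref, n_samples)."""
    w = np.zeros(ref.shape[0])
    P = delta * np.eye(ref.shape[0])
    out = np.empty(signal.shape[0])
    for t in range(signal.shape[0]):
        x = ref[:, t]
        Px = P @ x
        k = Px / (lam + x @ Px)
        e = signal[t] - w @ x
        w = w + k * e
        P = (P - np.outer(k, Px)) / lam
        out[t] = e
    return out


def regica(eeg, eog):
    """REGICA-style sketch: ICA, RLS filtering of each IC, back-projection."""
    unmixing, mean = fastica_unmixing(eeg)                     # step (1)
    ics = unmixing @ (eeg - mean)
    cleaned = np.vstack([rls_filter(ic, eog) for ic in ics])   # step (2)
    return np.linalg.inv(unmixing) @ cleaned + mean            # step (3)
```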

2.2.5. AIR

AIR is implemented as follows [51]:
(1) Contaminated EEG data are decomposed using ICA to obtain ICs.
(2) Composite multiscale entropy and kurtosis are calculated to identify ocular-artifact-related ICs.
(3) ICs are filtered using the linear regression model and extended recursive least mean squares.
(4) Median absolute deviation is applied to remove any remaining high-magnitude ocular artifacts.
(5) Artifact-free EEG data are obtained by backprojecting all ICs using inverse ICA.
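AIR combines several components that we do not reproduce here. As a partial illustration of two ingredients only, the sketch below flags ICs with high excess kurtosis (step 2, without the composite multiscale entropy criterion) and clips samples lying far from the median in units of the scaled median absolute deviation (one possible reading of step 4). The thresholds `threshold=3.0` and `n_mad=3.0` and both function names are assumptions, not values from [51].

```python
import numpy as np
from scipy.stats import kurtosis


def flag_high_kurtosis(ics, threshold=3.0):
    """Indices of ICs whose excess (Fisher) kurtosis exceeds `threshold`.

    ics: (n_components, n_samples). Gaussian-like ICs have excess kurtosis near 0,
    while spiky blink-like ICs have large positive values.
    """
    k = kurtosis(ics, axis=1)
    return np.flatnonzero(k > threshold)


def mad_clip(x, n_mad=3.0):
    """Clip samples lying more than `n_mad` scaled MADs away from the median."""
    med = np.median(x)
    mad = 1.4826 * np.median(np.abs(x - med))   # scaled to match the std for Gaussian data
    return np.clip(x, med - n_mad * mad, med + n_mad * mad)
```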

2.3. Evaluation Indexes
2.3.1. Mean Square Error

In this study, the performance of each algorithm for each of the EOG filtering ranges was evaluated using the mean square error, defined as [41]

$$\mathrm{MSE} = \frac{1}{N}\sum_{n=1}^{N}\left(\hat{x}(n) - x(n)\right)^{2},$$

where $\hat{x}$ is the reconstructed EEG, $x$ is the artifact-free EEG (EEG from the neuronal group), and $N$ is the number of samples.

2.3.2. Mutual Information

The mutual information between the reconstructed EEG signal and the artifact-free EEG was calculated to assess how well each method recovers neuronal-activity-related EEG signals. Mathematically, it is calculated as follows [61]:

$$I(X;Y) = \sum_{x}\sum_{y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)},$$

where $p(x,y)$ represents the joint pdf and $p(x)$ and $p(y)$ represent the marginal pdfs of the artifact-free EEG $X$ and the reconstructed EEG $Y$. The artifact-free EEG and the reconstructed EEG were deemed closely related when the mutual information between them was large.
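In practice, the pdfs in the mutual information formula must be estimated from samples. The sketch below uses a simple histogram (plug-in) estimator with an assumed 32 bins per axis and returns the value in bits; [61] may use a different estimator, and the absolute value depends on the bin count, so only values computed with the same settings should be compared. `mutual_information` is a hypothetical name.

```python
import numpy as np


def mutual_information(x, y, bins=32):
    """Histogram (plug-in) estimate of I(X;Y) in bits for two 1-D signals."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                  # joint pmf over the 2-D bins
    px = pxy.sum(axis=1, keepdims=True)        # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)        # marginal of y, shape (1, bins)
    nz = pxy > 0                               # empty bins contribute nothing
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))
```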

2.3.3. Signal-to-Artifact Ratio

The signal-to-artifact ratio (SAR) is a metric commonly used to evaluate the improvement of the corrected EEG signal relative to the contaminated EEG signal. Following [62], we calculated $\mathrm{SAR}_{\mathrm{before}}$, the signal-to-artifact ratio before artifact removal, from the contaminated EEG signal, and $\mathrm{SAR}_{\mathrm{after}}$, the signal-to-artifact ratio after artifact removal, from the corrected EEG signal. An effective artifact removal algorithm removes the artifacts and therefore yields a higher $\mathrm{SAR}_{\mathrm{after}}$ and, consequently, a higher gain in signal-to-artifact ratio, calculated as

$$\Delta\mathrm{SAR} = \mathrm{SAR}_{\mathrm{after}} - \mathrm{SAR}_{\mathrm{before}}.$$

The gain is positive if the signal-to-artifact ratio is improved, negative if it is decreased, and zero if there is no change.
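For illustration, the sketch below assumes one common form of the signal-to-artifact ratio, SAR = 10 log10(σ(x) / σ(x − y)), where x is the artifact-free EEG, y is either the contaminated or the corrected EEG, and σ is the standard deviation; this form is an assumption and may differ from the definition in [62]. The gain is the difference of the two values in dB, and the function names are hypothetical.

```python
import numpy as np


def sar_db(pure, signal):
    """SAR in dB, assuming SAR = 10*log10(std(pure) / std(pure - signal))."""
    return 10.0 * np.log10(np.std(pure) / np.std(pure - signal))


def sar_gain(pure, contaminated, corrected):
    """Gain = SAR after removal minus SAR before removal (positive = improvement)."""
    return sar_db(pure, corrected) - sar_db(pure, contaminated)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pure = rng.standard_normal(2000)
    artifact = 2.0 * rng.standard_normal(2000)
    # Leaving 10% of the artifact amplitude gives a gain of 10*log10(10) = 10 dB.
    print(sar_gain(pure, pure + artifact, pure + 0.1 * artifact))
```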

2.3.4. Mean Absolute Error

To measure the distortion across the frequency bands delta (0.5–4 Hz), theta (4–8 Hz), alpha (8–12 Hz), beta (12–30 Hz), and gamma (30–100 Hz), the mean absolute error was defined as [51]

$$\mathrm{MAE} = \frac{1}{K}\sum_{f\in\mathcal{B}}\left|P_{\hat{x}}(f) - P_{x}(f)\right|,$$

where $P_{\hat{x}}$ and $P_{x}$ denote the power spectral densities (PSDs) of the reconstructed EEG $\hat{x}$ and the artifact-free EEG $x$, respectively, and $\mathcal{B}$ is the set of $K$ frequency bins in the band. The PSD was estimated using the Welch method with a window length of 200 samples and an overlap of 5 samples. The average PSD for each frequency band was calculated for all subjects.

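In addition, the mean absolute percentage error (MAPE), used to estimate the percentage distortion in each frequency band, was defined as [43]

$$\mathrm{MAPE} = \frac{100\%}{K}\sum_{f\in\mathcal{B}}\left|\frac{P_{\hat{x}}(f) - P_{x}(f)}{P_{x}(f)}\right|,$$

where $P_{\hat{x}}$ and $P_{x}$ are the PSDs of the reconstructed and artifact-free EEG and $\mathcal{B}$ is the set of $K$ frequency bins in the band.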
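The band-wise errors can be sketched with SciPy's Welch estimator using the stated parameters (window length 200 samples, overlap 5 samples) at 250 Hz. We assume half-open band limits, SciPy's default Hann window, and that MAE and MAPE average over the PSD bins within each band; `band_errors` is a hypothetical name.

```python
import numpy as np
from scipy.signal import welch

# Band limits in Hz as listed in the text; intervals are treated as [low, high).
BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 12),
         "beta": (12, 30), "gamma": (30, 100)}


def band_errors(true_eeg, reconstructed, fs=250.0, nperseg=200, noverlap=5):
    """Return {band: (MAE, MAPE in %)} between the Welch PSDs of two 1-D signals."""
    f, p_true = welch(true_eeg, fs=fs, nperseg=nperseg, noverlap=noverlap)
    _, p_rec = welch(reconstructed, fs=fs, nperseg=nperseg, noverlap=noverlap)
    out = {}
    for name, (lo, hi) in BANDS.items():
        mask = (f >= lo) & (f < hi)             # 1.25 Hz bins at these settings
        diff = np.abs(p_rec[mask] - p_true[mask])
        out[name] = (float(diff.mean()), float(100.0 * np.mean(diff / p_true[mask])))
    return out
```

With a 200-sample window at 250 Hz, the frequency resolution is 1.25 Hz, so each band, including delta, contains at least three bins.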

3. Results

This study investigated the effect of low-pass filtering on the removal of ocular artifacts from EEG data. According to the literature, different studies have used different low-pass cutoffs for EOG signals, ranging from 4 to 100 Hz. In this study, we used unfiltered EOG and EOG low-pass filtered at twelve different cutoffs with the simulated EEG datasets in an effort to find the optimal filter, that is, the one giving the best results. For this purpose, we used five methods from the literature with five performance evaluation metrics. Table 3 lists the mean square errors averaged over all simulated datasets and all electrodes for each method. The mean square error was lowest when a low-pass cutoff between 6 and 8 Hz was used. Furthermore, mutual information scores were calculated in the time domain; the results are shown in Table 4. As with the mean square error, the mutual information averaged over all datasets and all electrodes was maximal for every method when one of the 6 to 8 Hz low-pass filters was used. Table 5 shows the average improvement in reconstructing artifact-free EEG, measured by the signal-to-artifact ratio before and after artifact removal. The corrected EEG showed an improved signal-to-artifact ratio with every EOG filtering, and this analysis indicates outcomes similar to those of the mean square error and mutual information. Moreover, the optimal filtering range was also investigated in the frequency domain by calculating the mean absolute error and mean absolute percentage error for the different frequency bands. The effect of bidirectional contamination can best be observed in the frequency domain by evaluating the distortion that different filtering produces in each frequency band. Results of the frequency-domain analysis are shown in Tables 6–10.
Except for the delta band, the mean absolute error for all bands was lowest when 6 or 7 Hz low-pass EOG filtering was used. In the delta band, least mean squares and recursive least squares showed the lowest errors with the 6 Hz low-pass filter, whereas for REGICA and AIR, unfiltered EOG data showed the lowest errors. Moreover, Figure 4 compares the pure EEG with the output EEG obtained with different EOG filtering to analyze the effect of bidirectional contamination in the time and frequency domains. The highlighted regions in Figure 4(a) show that different low-pass filtering of the EOG signals introduced different levels of distortion into the EEG signal. Specifically, with a high low-pass cutoff or no filtering (e.g., 35 Hz or unfiltered; magenta and green lines), the distortion in the neuronal signal is much larger than with lower cutoffs (e.g., 7 Hz, blue line). The EEG reconstructed with 7 Hz low-pass filtering (blue line) follows the true EEG (black line) much more closely than all other outputs. Furthermore, Figure 4(b) illustrates the effect of bidirectional contamination in the frequency domain. The highlighted box shows that the artifact-free EEG obtained with 7 Hz filtering recovered frequencies similar to those of the true EEG signal, whereas all other filterings produced distortion in these frequencies. This supports our hypothesis that bidirectional contamination can be reduced by using optimally low-pass filtered EOG signals.

We used paired -test to statistically compare results from all the metrics to find out if there are any differences in the outputs with different low-pass filtering. Before applying paired -test, as listed in Table 2, all EOG filtering frequencies used in this study were split into four classes which are 4 Hz (belongs to delta band), 5–8 Hz (belongs to theta band), 9–15 Hz (belongs to alpha and low-beta band), and 20 Hz–unfiltered (belongs to high-beta and gamma band). We divided this statistical testing into two steps. In the first step, we analyzed results of paired -test to select an optimal low-pass filtering class which showed minimum errors with significantly increased results from all four classes as listed in Figure 5. Mean square errors obtained in the range of 5–8 Hz were lowest with 6.81 ± 11.21 (averaged for all methods) when compared to 4 Hz (8.83 ± 15.06, ), 9–15 Hz (7.14 ± 11.60, nonsignificant difference), 20 Hz–unfiltered (7.65 ± 12.32, except for the LMS method). In case of frequency domain analysis in delta band, 5–8 Hz (MAE: 0.326 ± 0.14; MAPE: 3.89 ± 2.30%; ), 9–15 Hz (MAE: 0.329 ± 0.14; MAPE: 3.86 ± 2.16%; ), and 20 Hz–unfiltered (MAE: 0.330 ± 0.14; MAPE: 3.88 ± 2.12%; ) showed low errors when compared to 4 Hz (MAE: 0.398 ± 0.17; MAPE: 7.45 ± 8.73%). There was a nonsignificant statistical difference observed when all other ranges were compared (5–8 Hz versus 9–15 Hz, ; 5–8 Hz versus 20 Hz–unfiltered, ; and 9–15 Hz versus 20 Hz-unfiltered, ) for delta band. 
The 5–8 Hz class (theta: MAE: 0.105 ± 0.03; MAPE: 3.44 ± 5.27%; alpha: MAE: 0.023 ± 0.007; MAPE: 0.60 ± 0.39%; beta: MAE: 0.004 ± 0.001; MAPE: 0.73 ± 0.57%) showed highly significant improvements in the theta, alpha, and beta bands compared with 4 Hz (theta: MAE: 0.279 ± 0.14; MAPE: 10.35 ± 13.66%; ; alpha: MAE: 0.020 ± 0.007; MAPE: 0.70 ± 0.60%; ; beta: MAE: 0.006 ± 0.001; MAPE: 0.86 ± 0.57%; ), 9–15 Hz (theta: MAE: 0.134 ± 0.04; MAPE: 3.86 ± 6.74%; ; alpha: MAE: 0.11 ± 0.03; MAPE: 3.51 ± 2.23%; ; beta: MAE: 0.015 ± 0.004; MAPE: 2.23 ± 1.84%; ), and 20 Hz–unfiltered (theta: MAE: 0.143 ± 0.04; MAPE: 4.14 ± 7.31%; ; alpha: MAE: 0.171 ± 0.05; MAPE: 6.39 ± 4.91%; ; beta: MAE: 0.100 ± 0.03; MAPE: 15.93 ± 11.40%; ). In the gamma band, 4 Hz (MAE: 0.028 ± 0.008; MAPE: 0.19 ± 0.03%) performed significantly better than 20 Hz–unfiltered (MAE: 0.097 ± 0.03; MAPE: 0.99 ± 1.06%; ) and did not differ significantly from 9–15 Hz (MAE: 0.025 ± 0.01; MAPE: 0.11 ± 0.04%; ), although the errors were lowest for 5–8 Hz (MAE: 0.023 ± 0.009; MAPE: 0.10 ± 0.04%; ). The 5–8 Hz and 9–15 Hz classes did not differ significantly in the gamma band, but both performed significantly better than 20 Hz–unfiltered (). In summary, 5–8 Hz outperformed 4 Hz for all metrics except the alpha band (); compared with 9–15 Hz and 20 Hz–unfiltered, it showed highly significant improvements in the theta, alpha, and beta bands () and better, though not significantly different, results for all other metrics. We therefore conclude that the 5–8 Hz class performs better than all other low-pass filtering classes. In the second step, we analyzed the results of each low-pass frequency within the 5–8 Hz range to check whether a single optimal frequency showed the lowest errors with a high level of statistical significance (Figure 6).
In this analysis, most metrics showed no significant differences when compared statistically. However, the 6 Hz low-pass filter (MSE: 6.70 ± 11.00; theta: MAE: 0.086 ± 0.02, MAPE: 3.46 ± 6.03%; beta: MAE: 0.004 ± 0.001, MAPE: 0.72 ± 0.56%; gamma: MAE: 0.023 ± 0.009, MAPE: 0.098 ± 0.03%) showed significantly better results than the 5 Hz low-pass filter (MSE: 7.28 ± 12.01; theta: MAE: 0.140 ± 0.06, MAPE: 5.95 ± 10.15%; beta: MAE: 0.005 ± 0.001, MAPE: 0.77 ± 0.56%; gamma: MAE: 0.024 ± 0.008, MAPE: 0.10 ± 0.03%) in mean square error and in the theta, beta, and gamma bands (except for LMS in the gamma band, where the difference was nonsignificant), while weaker significant differences were observed in the delta and alpha bands (). No significant differences were observed among 6, 7, and 8 Hz, except in the theta and alpha bands for 6 Hz versus 8 Hz () and in the theta, alpha, and beta bands for 7 Hz versus 8 Hz (). We conclude that 6–8 Hz filtering yielded statistically better results than all other low-pass filterings and unfiltered EOG signals, although, apart from a few cases, the results within 6–8 Hz did not differ significantly from one another. Furthermore, these results support our hypothesis that unfiltered EOG signals cause strong bidirectional contamination and that low-pass filtering should be applied to EOG signals before they are used in artifact removal algorithms.
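Why a low-pass filtered EOG reference reduces bidirectional contamination can be illustrated with a toy simulation. This is a sketch under assumptions, not the simulation protocol of this study: neural EEG is modeled as 10 and 22 Hz sinusoids plus white noise, ocular activity as 0.5 and 2 Hz sinusoids, the forward (0.5) and backward (0.2) contamination coefficients and the 250 Hz sampling rate are arbitrary, the 7 Hz filter is an ideal FFT brick-wall filter rather than the filter used in this work, and only simple regression is shown. All function names are hypothetical.

```python
import numpy as np

FS = 250  # sampling rate in Hz (assumed for this toy example)


def lowpass_fft(x, cutoff_hz, fs=FS):
    """Zero-phase ideal low-pass filter: zero every FFT bin above the cutoff."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


def regress_out(eeg_meas, eog_ref):
    """Simple regression: subtract b * (centered EOG), with b = cov(EEG, EOG) / var(EOG)."""
    eeg_c = eeg_meas - eeg_meas.mean()
    eog_c = eog_ref - eog_ref.mean()
    b = np.dot(eeg_c, eog_c) / np.dot(eog_c, eog_c)
    return eeg_meas - b * eog_c


def mse(a, b):
    return float(np.mean((a - b) ** 2))


rng = np.random.default_rng(0)
t = np.arange(20 * FS) / FS
eeg_true = (np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 22 * t)
            + 0.3 * rng.standard_normal(t.size))                      # alpha + beta + noise
eog_true = 3.0 * np.sin(2 * np.pi * 0.5 * t) + 2.0 * np.sin(2 * np.pi * 2.0 * t)  # slow ocular activity
eeg_meas = eeg_true + 0.5 * eog_true   # forward contamination: EOG -> EEG
eog_meas = eog_true + 0.2 * eeg_true   # backward contamination: EEG -> EOG (bidirectional)

err_raw = mse(regress_out(eeg_meas, eog_meas), eeg_true)                    # unfiltered EOG reference
err_lp7 = mse(regress_out(eeg_meas, lowpass_fft(eog_meas, 7.0)), eeg_true)  # 7 Hz low-pass reference
print(f"MSE with unfiltered EOG: {err_raw:.5f}")
print(f"MSE with 7 Hz EOG:       {err_lp7:.5f}")
```

With the unfiltered reference, the neural signal leaking into the EOG (the 0.2 coupling) is partly subtracted from the EEG along with the artifact. The 7 Hz filter removes the alpha and beta components from the reference, so only the small sub-7 Hz part of the noise can leak back.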

We also used real EEG datasets to verify the efficacy of the proposed framework. The five algorithms described above were applied separately to each subject's data to remove ocular artifacts, using four low-pass EOG filterings. These filterings (4 Hz, 7 Hz, 12 Hz, and unfiltered (UF)) were chosen so that each class in Table 2 was represented by one filtering. We compared the classification accuracies obtained after applying each method with each EOG low-pass filtering. In BCI studies, common spatial pattern (CSP) filtering is the most commonly used technique for extracting features from EEG signals [63]. The goal of CSP is to find spatial filters that maximize the variance of one class while minimizing the variance of the other, thereby discriminating between the two populations of EEG signals [64]. We then used linear discriminant analysis (LDA) to classify the extracted features of the two classes because of its simplicity and low computational cost. For each subject and each artifact-free EEG obtained with a different EOG filtering, six runs of 6-fold cross-validation were used to calculate the classification accuracies. Each 6-fold cross-validation randomly divides the data into six equal partitions, uses five partitions for training and the remaining one for testing, and repeats this until every partition has served once as the test set. This procedure was repeated six times, and the average accuracy for each subject was calculated. The average classification accuracies of each subject over all six sessions, for each method and each low-pass EOG filtering, are listed in Table 11.
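The CSP and LDA pipeline with repeated 6-fold cross-validation described above can be sketched with NumPy alone. This is a minimal illustration under assumptions, not the code used in this study: trials are assumed to be zero-mean, band-pass filtered (channels × samples) arrays, one CSP filter pair is kept, features are plain log-variances, the LDA adds a small ridge term for numerical stability, and the data in the usage example are synthetic. All names are hypothetical.

```python
import numpy as np


def csp_filters(trials_a, trials_b, n_pairs=1):
    """CSP filters (rows) from two lists of zero-mean (channels x samples) trials."""
    def mean_cov(trials):
        # Trace-normalized spatial covariance, averaged over trials.
        return np.mean([t @ t.T / np.trace(t @ t.T) for t in trials], axis=0)

    c_a, c_b = mean_cov(trials_a), mean_cov(trials_b)
    evals, evecs = np.linalg.eigh(c_a + c_b)
    whiten = evecs @ np.diag(evals ** -0.5) @ evecs.T    # whitens the composite covariance
    lam, rot = np.linalg.eigh(whiten @ c_a @ whiten.T)   # ascending: small favors b, large favors a
    filters = rot.T @ whiten
    keep = list(range(n_pairs)) + list(range(len(lam) - n_pairs, len(lam)))
    return filters[keep]


def log_var(filters, trials):
    """Log-variance of each CSP component for every trial -> (n_trials x n_filters)."""
    return np.array([np.log(np.var(filters @ t, axis=1)) for t in trials])


def lda_fit(x_a, x_b, ridge=1e-6):
    """Fisher LDA: weight vector and threshold at the midpoint of the projected class means."""
    mu_a, mu_b = x_a.mean(axis=0), x_b.mean(axis=0)
    s_w = np.cov(x_a, rowvar=False) + np.cov(x_b, rowvar=False) + ridge * np.eye(x_a.shape[1])
    weight = np.linalg.solve(s_w, mu_a - mu_b)
    return weight, weight @ (mu_a + mu_b) / 2


def repeated_cv_accuracy(trials, labels, k=6, repeats=6, seed=0):
    """Mean test accuracy over `repeats` runs of k-fold CV; CSP and LDA are refit per fold."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    all_idx = np.arange(len(labels))
    accuracies = []
    for _ in range(repeats):
        for test_idx in np.array_split(rng.permutation(all_idx), k):
            train_idx = np.setdiff1d(all_idx, test_idx)
            tr_a = [trials[i] for i in train_idx if labels[i] == 0]
            tr_b = [trials[i] for i in train_idx if labels[i] == 1]
            filters = csp_filters(tr_a, tr_b)
            weight, threshold = lda_fit(log_var(filters, tr_a), log_var(filters, tr_b))
            scores = log_var(filters, [trials[i] for i in test_idx]) @ weight
            predicted = np.where(scores > threshold, 0, 1)  # above threshold -> class 0 (a)
            accuracies.append(np.mean(predicted == labels[test_idx]))
    return float(np.mean(accuracies))


# Synthetic two-class data: class 0 has extra power on channel 0, class 1 on channel 1.
rng = np.random.default_rng(1)
labels = np.array([0, 1] * 30)
trials = []
for label in labels:
    gain = np.ones(4)
    gain[label] = 2.0
    trials.append(gain[:, None] * rng.standard_normal((4, 200)))
print(f"mean CV accuracy: {repeated_cv_accuracy(trials, labels):.3f}")
```

CSP and LDA are refit inside every training split so that no test trial influences the spatial filters, which keeps the cross-validated accuracy unbiased.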
Consistent with the results for the simulated signals, Table 11 shows that the class II candidate (7 Hz) yielded the highest classification accuracies (REG: 69.52 ± 4.52; LMS: 73.45 ± 2.04; RLS: 73.99 ± 2.75; REGICA: 77.46 ± 3.59; AIR: 76.85 ± 3.27) for all subjects and all methods compared with the candidates of all other classes (class I: REG: 66.43 ± 4.11; LMS: 70.13 ± 2.45; RLS: 70.83 ± 2.59; REGICA: 73.37 ± 3.64; AIR: 73.53 ± 4.71; class III: REG: 67.20 ± 6.09; LMS: 70.37 ± 2.84; RLS: 72.14 ± 4.65; REGICA: 74.53 ± 3.00; AIR: 73.30 ± 3.65; class IV: REG: 67.12 ± 3.75; LMS: 69.90 ± 2.40; RLS: 69.67 ± 4.48; REGICA: 73.07 ± 2.21; AIR: 72.91 ± 3.06). We further validated these results statistically using the paired t-test. This analysis revealed that the classification accuracies obtained with 7 Hz were significantly higher than those obtained with 4 Hz ( for all methods), 12 Hz ( for LMS, REGICA, and AIR and for REG and RLS), and unfiltered EOG ( for all methods). Furthermore, the hybrid methods achieved higher classification accuracies than the simple regression and adaptive filtering methods. These results from the experimental EEG datasets confirm the simulation finding that low-pass EOG filtering at 6–8 Hz can be used to remove artifacts efficiently. Table 11 also shows that the classification accuracies obtained with all low-pass filtered EOG signals are higher than those obtained with unfiltered EOG. The t-tests in Figure 7 further revealed that each low-pass filtering yielded significantly higher classification accuracies than the artifact-free EEG obtained with unfiltered EOG signals. Therefore, these results also support our hypothesis that low-pass filtering of EOG can be used to minimize the effect of the bidirectional contamination problem.

4. Discussions

Many studies have shown that the performance of BCI applications can be reduced by ocular artifacts in EEG data [23–27]. Among the different methods, EOG-based algorithms are simple and fast and could therefore serve as good tools for real-time/online BCI applications, provided that their performance, which is highly affected by bidirectional contamination, is improved [10, 44, 47, 51, 60]. The simplest solution to this problem is to low-pass filter the EOG signals before using them in the artifact removal algorithm [10, 39]. To overcome the effect of bidirectional contamination, a number of studies have applied low-pass filters to EOG signals with cutoffs ranging from 5 to 100 Hz, but there is no consensus on which low-pass frequency gives optimal results. The idea of low-pass filtering EOG is based on studies that have demonstrated that high-frequency components in EOG signals are generated by brain activity [52], and this is supported by other studies [10, 39, 43, 44, 47, 51]. It has previously been shown that the performance of simple regression-based algorithms can be improved by using low-pass filtered (7.5 Hz) rather than unfiltered EOG signals [43]. Thus, the performance of EOG-based algorithms could depend strongly on the low-pass filtering of the EOG signals. Various studies have used different low-pass frequencies for the removal of ocular artifacts from EEG data [35, 43–45, 47, 51, 65, 66]. Table 1 lists the low-pass filterings used in those studies; note, though, that there is still no evidence in the literature on the optimal low-pass frequency for EOG signals. In this light, it is very important to investigate the optimal low-pass filtering of EOG signals before using them in the artifact removal process, not only for efficient removal/reduction of artifacts but also for enhancing the classification accuracies and communication rates of current BCI systems.

In this study, we filtered simulated EOG signals with twelve different low-pass frequencies, and also used them unfiltered, before applying the artifact removal algorithms. The frequencies and their categorization are listed in Table 2. Five EOG-based algorithms from the simple [44], adaptive [45, 53], and hybrid [47, 51] categories were chosen for this investigation. The performance of each algorithm was evaluated in both the time and frequency domains to determine the optimal low-pass filtering of EOG signals. In the time domain, the mean square error, mutual information, and gain in signal-to-artifact ratio were used as evaluation metrics [41, 61, 62], whereas in the frequency domain, the mean absolute error and mean absolute percentage error were used [43, 51]. The results for each algorithm with each evaluation metric are shown in Tables 3–10. The time- and frequency-domain results indicate that low-pass filtering of EOG signals reduces ocular artifacts more effectively than using unfiltered EOG. However, for some low-pass filterings, the corrected EEG signals, specifically from the frontal area (e.g., Fp1), showed distortion in their neuronal component. Because we argue throughout this paper that optimal filtering reduces bidirectional contamination and hence results in efficient removal of artifacts from EEG data, we analyzed the effect of bidirectional contamination in Figure 4. With filtered EOG, specifically 7 Hz (blue line), the distortion of the EEG in both the time and frequency domains is much smaller than with unfiltered EOG or high low-pass cutoffs (e.g., 35 Hz, green line). The effect of bidirectional contamination can also be assessed from the errors obtained with the different performance metrics: lower errors indicate a smaller effect of bidirectional contamination.
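The three time-domain metrics named above can be sketched as follows. The exact definitions used in [41, 61, 62] are not reproduced in this section, so these are assumptions: MSE is computed against the true EEG, mutual information is a plug-in histogram estimate (in bits) between the true and corrected EEG, and the signal-to-artifact ratio takes one common form, SAR = 10 log10(std(x) / std(x - true EEG)), with the gain defined as SAR after correction minus SAR before correction. The function names and the synthetic signals are hypothetical.

```python
import numpy as np


def mse(clean, true):
    """Mean square error between corrected and true EEG."""
    return float(np.mean((clean - true) ** 2))


def mutual_information(x, y, bins=32):
    """Plug-in histogram estimate of mutual information in bits (assumed estimator)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nz = pxy > 0                          # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))


def sar(signal, true):
    """Signal-to-artifact ratio in dB (assumed form): 10*log10(std(signal) / std(signal - true))."""
    return float(10 * np.log10(np.std(signal) / np.std(signal - true)))


def sar_gain(contaminated, clean, true):
    """Gain in SAR after correction (assumed definition): SAR(clean) - SAR(contaminated)."""
    return sar(clean, true) - sar(contaminated, true)


rng = np.random.default_rng(0)
t = np.arange(5000) / 250.0
true_eeg = rng.standard_normal(t.size)
artifact = 3.0 * np.sin(2 * np.pi * 0.5 * t)   # slow ocular-like artifact (synthetic)
contaminated = true_eeg + artifact
corrected = true_eeg + 0.05 * artifact          # a hypothetical, nearly artifact-free output
print(f"MSE: {mse(corrected, true_eeg):.4f}")
print(f"MI(true, corrected): {mutual_information(true_eeg, corrected):.2f} bits")
print(f"SAR gain: {sar_gain(contaminated, corrected, true_eeg):.2f} dB")
```

Under these definitions, a better correction gives a lower MSE, a higher mutual information with the true EEG, and a positive SAR gain.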
Furthermore, the paired t-test was used to check for differences and improvements among all low-pass filtering results. For the time-domain metrics, paired t-test results are displayed only for mean square error (mutual information and signal-to-artifact ratio gave similar results), and for the frequency domain, for the mean absolute errors of all frequency bands. Within each low-pass filtering class, all results were averaged before statistical testing. As shown in Figure 5, this testing revealed that the 5–8 Hz class had lower errors and statistically significant improvements over all other frequency ranges in most cases (bold in purple). Finally, we applied the paired t-test to the results of each low-pass frequency in the 5–8 Hz class to see whether these filterings of EOG signals produced different results. The results of this analysis are shown in Figure 6. Although in most cases there were no significant differences (), 6–8 Hz EOG filtering showed statistically better results than the 5 Hz low-pass filter ().

Although simulated signals are the primary tool for analyzing the performance of algorithms, validation with real EEG signals is the ultimate goal, specifically for applications such as BCIs. We therefore also used MI-based BCI signals to validate the results of the comprehensive analysis of simulated datasets. Four low-pass EOG filterings, one from each class defined in Table 2, were selected to analyze the effect of artifact removal on the classification accuracies of the MI-based BCI. Classification accuracy was used as the evaluation metric: the higher the classification accuracy, the more suitable the corresponding low-pass EOG filtering is as an optimal frequency class for removing artifacts from EEG signals and overcoming the bidirectional contamination problem. All methods showed their best results when 7 Hz low-pass EOG filtering was used to remove ocular artifacts. Furthermore, the paired t-test was used to check for differences and improvements among the classification accuracies obtained with each method and each low-pass filtering. The results of this analysis are shown in Figure 7. This testing revealed that, in most cases (bold in purple), the classification accuracies obtained after artifact removal with 7 Hz low-pass filtered EOG signals were significantly higher than those obtained with 4 Hz, 12 Hz, and unfiltered EOG. Furthermore, all low-pass EOG filterings yielded significantly higher classification accuracies than unfiltered EOG signals. These results from experimental EEG data not only validate the outcomes from simulated datasets but also support our hypothesis that low-pass filtering should be applied to EOG signals before using them for artifact removal, to reduce the effect of bidirectional contamination.

The main focus of this study was to analyze the effect of different low-pass filterings of EOG signals on the removal of ocular artifacts from EEG data; the classification accuracies for real EEG signals could be further improved by incorporating more features and using more powerful classifiers. For instance, EEG signals can be divided into subfrequency bands to calculate additional CSP features for each subband. It has also been shown previously that many other classifiers, such as the support vector machine (SVM), outperform LDA, but at a higher computational cost. Although the present investigation can be helpful for optimizing EOG signal filtering, a comprehensive comparison with algorithms that do not require EOG signals for artifact removal remains to be performed. Furthermore, the performance of BCI systems with EOG-based and non-EOG-based methods should also be investigated to identify the optimal method for BCI applications. Therefore, in future studies, we will use simulated and experimental EEG to evaluate artifact removal and BCI performance with additional methods such as independent component analysis, canonical correlation analysis, empirical mode decomposition, and wavelet transform.

5. Conclusions

The optimal performance of a BCI depends on the effective removal/reduction of ocular artifacts from EEG recordings. Because the efficiency with which an algorithm removes ocular activity is highly affected by bidirectional contamination, it is very important to use optimal low-pass filtering of EOG signals to overcome/minimize this effect. In the literature, there is still no evidence on the optimal low-pass frequency for EOG signals. In this study, we investigated the optimal EOG signal filtering for efficient removal of ocular artifacts from EEG data using fifteen artificially contaminated EEG and EOG datasets. The statistical testing in this investigation suggests that a low-pass frequency of 6–8 Hz could be used as the optimal EOG filtering frequency for good artifact removal and retrieval of the true EEG signal. Furthermore, MI-based BCI datasets were used to validate the results from the simulated signals. Classification accuracies obtained with the class II filtering (7 Hz) were significantly higher than those obtained with all other classes. Moreover, the performance of each algorithm improved when low-pass filtering was applied to the EOG signals before the artifact removal process. Overall, the hybrid algorithms (REGICA and AIR) performed better than the regression and adaptive filtering methods for both simulated and experimental signals.

Data Availability

Data will be provided on request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Research Foundation of Korea (NRF), grant funded by the Korean Government (MSIP) (Grant no. 2015R1A5A1037668).