Abstract

Attention is the ability to facilitate the processing of perceptually salient information while suppressing information that is irrelevant to an ongoing task. For example, visual attention is a complex phenomenon of searching for a target while filtering out competing stimuli. In the present study, we developed a new Brain-Computer Interface (BCI) platform to decode brainwave patterns during sustained attention. Scalp electroencephalography (EEG) signals were collected in real time with a wireless headset during a visual attention task. In our experimental protocol, we primed participants to discriminate a sequence of composite images, each an equal superimposition of a scene image and a face image. The participants were asked to respond to the intended subcategory (e.g., indoor scenes) while withholding their responses for the irrelevant subcategory (e.g., outdoor scenes). Using machine learning techniques, we developed an individualized model to decode each participant's attentional state from their brainwaves. Our model revealed instantaneous attention towards the face and scene categories. We conducted the experiment with six volunteer participants. The average decoding accuracy of our model was about 77%, comparable with that of an earlier study using functional magnetic resonance imaging (fMRI). The present work is an attempt to reveal the momentary level of sustained attention using EEG signals. The platform may have potential applications in visual attention evaluation and closed-loop brainwave regulation in the future.

1. Introduction

Attention is a core function in human cognition and perception [1]. Sustained attention refers to the cognitive capability to maintain focus during a task [2]. Deficits in attention are commonly seen in various brain disorders such as Alzheimer's disease (AD) and related dementias, Traumatic Brain Injury (TBI), and Posttraumatic Stress Disorder (PTSD) [3–5]. Improvement in attentional states may assist these populations in boosting cognitive and perceptual functions such as working memory [6–8]. Recently, a number of studies have evaluated attentional states, most of which utilized the blood oxygen-level-dependent (BOLD) signal collected by functional magnetic resonance imaging (fMRI) [9, 10]. Using fMRI, Rosenberg et al. [11] suggested that whole-brain functional connectivity could serve as a robust neuromarker for sustained attention; they used images of city and mountain scenes as visual stimuli. Since face-like visual stimuli undergo specialized processing in the human brain compared to other, non-face objects (see Kanwisher et al. [12–15]), face images have been employed in numerous brain studies, particularly studies on attention. Cohen and Tong [16] investigated instantaneous object-based attentional states in individuals who attended to a sequence of blended face and house images using fMRI. Employing the same brain imaging technique, deBettencourt et al. [2] designed a closed-loop attention training paradigm in which the transparency of an image category (face or scene) inside a sequence of blended images was adjusted based on the participant's decoded attentional level to the primed image category. Although fMRI has superior spatial resolution, the technology has several limitations for practical use in a real-time neurofeedback training system. fMRI measures blood-oxygenation-level-dependent changes caused by the hemodynamic and metabolic correlates of neuronal responses [17]. Even though there is a significant correlation between neural activity and the underlying BOLD signal, the vascular response of the brain is much slower than the underlying synaptic activity [17]. By contrast, electroencephalography (EEG) offers a convenient alternative, providing a direct measure of neural activity with millisecond precision. Additionally, a wireless EEG headset can make a Brain-Computer Interface (BCI) at the individual level simpler to use [18]. EEG has been used as a low-cost brain imaging technique for attention evaluation and training by many researchers [19], and a number of EEG paradigms have been suggested in connection with attention deficits and attention enhancement [18, 19]. In 1976, Lubar and Shouse [20] pioneered EEG-based neurofeedback training for patients with attention disorders.

Neurofeedback training has been studied extensively in children with Attention Deficit/Hyperactivity Disorder (ADHD) [21–23]. Several studies have focused on developing EEG-based BCI systems to evaluate momentary attentional levels and have utilized them in a closed-loop structure for attention training [24] (for a review see [1]). Among these attention training studies, some reports concerned regulating the sensorimotor rhythm (SMR) and the theta and alpha bands, which have direct connections to memory and attention training [25]. Lim et al. [26] developed a BCI system for reducing attentional impairments in children with ADHD. Lee et al. [27] suggested a platform for improving working memory and attention in healthy elderly people. Cho et al. [28] conducted a study on attention concentration in poststroke patients using beta waves. In a recent offline work, List et al. [29] explored spatiotemporal changes in EEG to classify perceptual states (e.g., faces versus Gabors) and analyzed the scope of attention (e.g., locally versus globally focused states). Finally, Sreenivasan et al. [30] used event-related potential (ERP) analysis to show that attention to faces in composite images with varying face transparency can modulate perceptual processing of faces at multiple stages. However, that study did not report EEG classification results when the subjects attended to both categories of image in the overlapped images in separate training blocks. We hypothesize that scalp EEG data collected from an individual who attends to the elements of a complex stimulus contain relevant and distinguishable patterns. The majority of previous studies focused on identifying the level of attention in a participant without knowing which visual stimulus the sustained attention was devoted to.

The objective of the current work is to develop a clinically convenient and portable neurotraining platform for simultaneously monitoring visual attentional states to two categories of images on a single-trial basis using whole-brain activity. Adopting the paradigm of an fMRI study [2], we aim to analyze instantaneous attentional states by introducing a novel EEG-based BCI platform. We evaluated the proposed platform in a pilot study with a group of participants. We primed them with two categories of images (faces and scenes) while a sequence of composite (face and scene) images was displayed. In each block, we asked them to focus their attention on only one image category. We introduce a machine learning technique for EEG classification that takes advantage of a combined feature set of neural oscillations (SMR) and ERPs. The analysis of brain signals collected from the whole brain of all participants suggests the existence of an individualized attention neuromarker [11]. Additionally, the results may support the presence of a common attention neuromarker within our pilot sample. The results also suggest that the platform has potential application in closed-loop attention training, by adjusting the transparency of the image categories inside the overlapped images based on the attentional level devoted to the instructed category [2].

2. Materials and Methods

This section covers the materials and components of the integrated BCI platform. The experimental protocol used to collect the scalp EEG from multiple participants during a sustained attention task is also described. Subsequently, the applied techniques for decoding EEG data are explained in detail.

2.1. Development of the BCI Platform

The BCI platform consists of a wireless EEG headset, a workstation computer with dual monitors, and data acquisition and analysis software. Figure 1 shows a simple schematic of the platform's components and the direction of data flow among them. A Graphical User Interface (GUI) was developed to allow a practitioner to conveniently administer the experimental protocol.

2.2. EEG Recording Device

EEG signals were acquired using a wireless headset, the Emotiv EPOC [31]. The headset has 14 EEG channels positioned according to the international 10-20 system, covering the frontal, temporal, and occipital regions. The electrode locations are labeled sequentially as AF3, F7, F3, FC5, T7, P7, O1, O2, P8, T8, FC6, F4, F8, and AF4. The sampling frequency was set to 128 Hz. By applying a high-pass filter with a cut-off frequency of 0.2 Hz and a low-pass filter with a cut-off frequency of 43 Hz, the device collects band-passed brain signals and transmits them to the PC via a Bluetooth connection. The research edition of the headset provides access to the raw EEG signals for further analysis. The same headset has been used for motor assessment and rehabilitation in a number of previous BCI works [32, 33]. The use of wireless headsets offers the potential for applications in real-life settings [34, 35].

2.3. Interface

We performed our data acquisition and analysis in MATLAB and Simulink. The visual stimulation and the designed protocol were also controlled by MATLAB and Simulink through the customized GUI.

2.4. Stimuli

Four subcategories of images were chosen as stimuli: indoor scenes, outdoor scenes, male faces, and female faces. The images were all black and white, with an equal size of 800 × 1000 pixels. Each subcategory contained 90 images. Some of the images were provided by the Sanders-Brown Center on Aging at the University of Kentucky and the others were collected from the Internet. Face images were chosen to be neutral, without any emotional expression, and were centered inside the composite image. Female faces had long hair and male faces had short hair. The indoor images were interior scenes; the outdoor images were natural landscapes and cityscapes. Brightness and contrast were adjusted so that the face and scene categories had identical contrast. Participants were trained with a sequence of superimposed face and scene images. The duration of each stimulus was set to 1000 milliseconds. Each superimposed image was programmed to contain 50% transparency from each image subcategory. For future work, the platform also has the capability to adjust the transparency of the image types in the superimposed image based on the attentional level to the instructed subcategory, as sketched below.
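
To make the stimulus construction concrete, the following is a minimal MATLAB sketch of how a 50/50 composite image could be generated. The file names and the linear alpha-blending scheme are illustrative assumptions, not the platform's actual code.

```matlab
% Minimal sketch of 50/50 composite stimulus generation (assumed blending
% scheme; file names are hypothetical placeholders).
face  = im2double(imread('face_001.png'));    % greyscale face, 800 x 1000
scene = im2double(imread('scene_001.png'));   % greyscale scene, 800 x 1000

alpha = 0.5;                                  % 50% transparency per category
composite = alpha .* face + (1 - alpha) .* scene;

% In a closed-loop extension, alpha could be updated trial by trial from
% the decoded attentional level to the instructed category.
imshow(composite);
```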

2.5. Experimental Protocol

Six healthy participants (4 males and 2 females, mean age 43 years) voluntarily completed the eight training blocks of the experiment. All participants had normal or corrected-to-normal vision. They were all right-handed and had no prior experience participating in BCI studies. They had no history of neurological or psychological disorders (based on self-report). All subjects were employees at UTK, and five of the six held an academic degree. The experimental protocol was approved by the Institutional Review Board at the University of Tennessee, Knoxville (UTK). All participants gave written consent to perform the experiment. The computerized task was delivered by a PC with dual monitors: one monitor was viewed by the experimenter to control the experiment, while the other was positioned in front of the participant for presentation of the stimuli. Participants were asked to sit comfortably in a fixed chair with one hand resting on the lap and the other hand holding a computer mouse for giving behavioral responses. They were instructed to pay attention to the monitor during the experiment and to limit excessive body movement. They were also asked to fixate their gaze on the middle of the screen and to keep their head approximately 50 cm from the monitor while observing the stream of images. Our experimental protocol consisted of eight blocks of trials with a rest period between blocks. Each block started with a one-second text cue announcing the attended subcategory, followed by 50 trials of image stimuli. The duration of each trial was set to one second, with no intertrial interval. Each trial presented a greyscale overlaid picture in which 50% of the opacity came from the scene (indoor or outdoor) category and 50% from the face (male or female) category. No face or scene image was repeated within a block, which prevented any learning effects for the participant. Participants were asked to identify whether the shown image contained the task-relevant image (e.g., an indoor image) or the task-irrelevant image (e.g., an outdoor image) by responding to each superimposed image: they clicked the mouse for each recognized relevant image and withheld their response for irrelevant images. The task-relevant subcategory images were evenly distributed within each block, so half of the composite images contained images from the task-relevant subcategory (e.g., indoor) while the other half contained images from the task-irrelevant subcategory (e.g., outdoor). Table 1 illustrates a sample sequence of composite images during a block together with the corresponding expected responses. The number and distribution of blocks were chosen to counterbalance the effect of mouse clicking on the recorded brainwaves. We alternated the task-relevant and task-irrelevant images among the four image subcategories as shown in Table 2. Because of the difficulty of maintaining constant sustained attention to the composite images during a block, each block was run only once to prevent participant fatigue. The total time for the experiment was about 10 minutes per participant. A rough timing sketch of one block follows.
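
The sketch below illustrates the timing of one block; in the actual platform, stimulus delivery and response logging were handled by the custom MATLAB/Simulink GUI. `showCue` is a hypothetical helper, and `composites` denotes a pre-generated set of 50/50 composite images.

```matlab
% Timing sketch of one experimental block (illustrative only; showCue is a
% hypothetical helper, and composites is a cell array of pre-generated
% composite images with no repeats within the block).
cueDuration   = 1;    % seconds: text cue naming the attended subcategory
trialDuration = 1;    % seconds per composite image, no intertrial interval
nTrials       = 50;

showCue('Attend: indoor scenes');
pause(cueDuration);
for t = 1:nTrials
    imshow(composites{t});    % one composite per one-second trial
    pause(trialDuration);     % mouse clicks are logged asynchronously
end
```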

3. Classification Methods

Previous studies have provided evidence that recorded EEG signals have the potential to discriminate healthy people from individuals with cognitive deficits. A number of signal processing and machine learning methods for (non-)event-related EEG analysis have been studied by our group (see McBride et al. [36, 37]), including interchannel coherence, spectral analysis, and causality techniques. In the present work, we aimed to identify participants' attentional states toward two categories of images (face versus scene, regardless of subcategory) using the recorded EEG signals. Because the participants were primed with the subcategories throughout the experiment, we hypothesized that the brainwaves contained features common to the subcategories of one category. This assumption reduced the task to a two-class classification problem: classifying the underlying EEG patterns while the participants attended to faces or scenes. Meanwhile, the behavioral responses were collected and used as a predictor for comparison (relevant versus irrelevant image; see Table 1). The flowcharts in Figure 2 illustrate the process of analyzing a participant's overt responses as well as his/her EEG signals. A brief description of the EEG signal preprocessing, feature extraction, dimensionality reduction, and classification techniques follows. In this study, a combination of temporal and frequency features was extracted.

3.1. Signal Preprocessing

A band-pass FIR filter with an order of 500 and cut-off frequencies of 0.4 Hz and 40 Hz was applied to the EEG recordings. The filter has the advantage of removing low-frequency drifts while eliminating the undesired frequency bands. EEG has a low signal-to-noise ratio (SNR) and is prone to various artifacts such as electrooculographic (EOG) activity, which may interfere with the results of the experiment. To avoid the influence of facial movement artifacts, we excluded trials in which the EEG amplitude exceeded 75 µV.
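
The following MATLAB sketch mirrors this preprocessing stage using the parameter values given above; the specific function choices (fir1, filtfilt) and the back-to-back trial segmentation are our assumptions rather than the platform's exact code.

```matlab
% Preprocessing sketch: order-500 band-pass FIR filter (0.4-40 Hz) and
% amplitude-based artifact rejection at 75 microvolts.
fs = 128;                                 % Emotiv EPOC sampling rate (Hz)
b  = fir1(500, [0.4 40] / (fs/2));        % band-pass FIR filter coefficients
eegFiltered = filtfilt(b, 1, eegRaw);     % zero-phase filtering; eegRaw: samples x 14

% Reject any one-second trial whose absolute amplitude exceeds 75 uV on any
% channel (trials assumed back to back, 128 samples each).
nTrials = 50;
keep = true(nTrials, 1);
for k = 1:nTrials
    seg = eegFiltered((k-1)*fs + 1 : k*fs, :);
    keep(k) = max(abs(seg(:))) <= 75;
end
```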

3.2. Temporal Features Extraction

Numerous features can be extracted from EEG data. After an initial investigation, we found that multiple ERPs are associated with different stages of attention, and we incorporated all of them into the feature vector. The feature vector was formed by extracting the magnitude and latency of each ERP component on all 14 channels, for the following components: ELAN, N1, Visual N1, P1, N2, N2pc, N400, MMN, Bereitschaftspotential (BP), P50, P2, P3, P3a, P6, and N700. With two features (magnitude and latency) for each of the 15 components, the calculation led to a total of 420 (30 × 14) ERP-related features.
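
A sketch of how such ERP features could be computed is given below. The component search windows are illustrative assumptions (only five of the fifteen components are shown), and `trialEEG` denotes one band-passed trial, samples by channels.

```matlab
% ERP feature sketch: peak magnitude and latency per component per channel.
% Search windows are assumed values, not those used in the study; the full
% feature set covers 15 components x 2 features x 14 channels = 420.
fs = 128;
windows = struct('P1', [0.08 0.13], 'N1', [0.13 0.20], 'P2', [0.15 0.28], ...
                 'N2', [0.20 0.35], 'P3', [0.25 0.50]);  % seconds post-stimulus
names = fieldnames(windows);
feats = [];
for c = 1:14                                    % loop over channels
    for k = 1:numel(names)
        w   = round(windows.(names{k}) * fs);   % window in samples
        seg = trialEEG(w(1):w(2), c);
        [mag, idx] = max(abs(seg));             % peak magnitude in window
        lat = (w(1) + idx - 1) / fs;            % peak latency in seconds
        feats = [feats, mag, lat];              %#ok<AGROW>
    end
end
```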

3.3. Frequency Features Extraction and Dimension Reduction

The power spectral density (PSD) of all 14 channels was calculated. Using Welch's method, we extracted the energy of the delta [0.5, 3.5] Hz, theta [3.5, 7.5] Hz, alpha [7.5, 12.5] Hz, beta [12.5, 25] Hz, and gamma (above 25 Hz, bounded by the 40 Hz low-pass filter) bands. The median spectral power and total spectral power were also computed. This led to a feature vector of 112 (8 × 14) elements representing the frequency features of a trial. Concatenating the temporal and frequency feature vectors resulted in a 532-element feature vector for the training portion of the EEG data. A stepwise iterative variable selection was applied [38–40] to reduce the dimension of the extracted feature array while choosing the most significant features. The confidence level was set at 95%; therefore, features with p-values below 0.05 were retained.
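
A hedged sketch of the frequency-feature computation and the stepwise selection is shown below. The Welch segmentation defaults, the gamma upper edge, and the eighth per-channel feature are not specified in the text, so only seven per-channel features appear here.

```matlab
% Frequency-feature sketch via Welch's method. Band edges follow the text;
% the gamma upper edge (40 Hz) matches the band-pass filter, and the eighth
% per-channel feature reported above is unspecified, so 7 are shown.
fs = 128;
bands = [0.5 3.5; 3.5 7.5; 7.5 12.5; 12.5 25; 25 40];   % delta ... gamma (Hz)
freqFeat = [];
for c = 1:14
    [pxx, f] = pwelch(trialEEG(:, c), [], [], [], fs);  % default Hamming segments
    bp = zeros(1, size(bands, 1));
    for i = 1:size(bands, 1)
        bp(i) = bandpower(pxx, f, bands(i, :), 'psd');  % energy per band
    end
    medP = median(pxx);                                 % median spectral power
    totP = bandpower(pxx, f, [0.5 40], 'psd');          % total spectral power
    freqFeat = [freqFeat, bp, medP, totP];              %#ok<AGROW>
end

% Stepwise selection on the combined feature matrix X (trials x 532) with
% numeric labels y (+1 face, -1 scene); keep features entering at p < 0.05.
[~, ~, ~, inModel] = stepwisefit(X, y, 'penter', 0.05);
Xreduced = X(:, inModel);
```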

3.4. Classification Method

An individualized support vector machine (SVM) model with the reduced feature vector was trained to classify the participant's attentional state toward a scene versus a face image. SVM is a supervised classification method for linear and nonlinear pattern recognition [41]. It has been employed in many machine learning and data classification studies where distinct classes are difficult to separate linearly. The basic approach of SVM is to construct a hyperplane in the feature space that serves as a decision surface separating data from the two categories. If we assign "+1" and "-1" labels to the two classes, the objective of SVM analysis is to find an optimal hyperplane between the positive and negative examples such that the margin between the two classes is maximized. To evaluate classification performance and prevent model overfitting, we conducted leave-one-block-out (LOBO) cross-validation to develop an individualized attentional state decoding model within each participant's dataset. For each participant, one block of trials was withheld as the test set while an SVM model was trained on the remaining blocks; the classification results of the SVM model on the held-out block were then recorded. This was repeated so that each block served once as the test set, as sketched below.
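
The following is a minimal sketch of the LOBO scheme with a linear SVM, assuming fitcsvm from MATLAB's Statistics and Machine Learning Toolbox; the variable names (X, y, blockId) are illustrative.

```matlab
% Leave-one-block-out (LOBO) cross-validation sketch with a linear SVM.
% X: trials x selected features; y: cellstr labels {'face','scene'};
% blockId: block index (1..8) for each trial. All names are illustrative.
blocks = unique(blockId);
acc = zeros(numel(blocks), 1);
for i = 1:numel(blocks)
    testMask = (blockId == blocks(i));             % hold out one block
    model = fitcsvm(X(~testMask, :), y(~testMask), ...
                    'KernelFunction', 'linear');
    yhat = predict(model, X(testMask, :));
    acc(i) = mean(strcmp(yhat, y(testMask)));      % per-block accuracy
end
fprintf('Mean LOBO decoding accuracy: %.1f%%\n', 100 * mean(acc));
```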

4. Results

This section presents the participants' behavioral responses as well as the EEG decoding accuracy on the sample dataset during the sustained attention task. We measured the percentage of correct responses as the behavioral measure. Participants' mean behavioral performance is reported in Table 3, where the average success rate over all face and scene blocks is shown for each participant separately. Behavioral performance ranged from 73.0% to 96.5%. Since each composite image was shown for only one second, a participant might misidentify a relevant image or fail to make the correct response within the trial interval. Thus, we decided to accept participants' self-corrections if they happened during the same trial interval. We performed linear SVM classification on the collected EEG dataset while participants attended to the two classes of scene and face images. The LOBO cross-validation results on attentional state evaluation are summarized in Table 4. The scene accuracy indicates how accurately the SVM model predicts the attentional state toward scene images, whereas the face accuracy indicates how accurately it predicts the attentional state toward face images. On average, the accuracy of the individualized models was around 77%, which is comparable to the fMRI study that reported an accuracy of 78% [2]. Some previous fMRI studies have reported a positive correlation between functional neural networks and the relevant behavioral performance [2, 42]. This motivated us to investigate the existence of such a correlation in our experiment using the EEG classification results. Figure 3, plotted from the computations reported in Tables 3 and 4, illustrates the behavioral success rate with respect to the mean decoding accuracy for the scene and face categories separately. The result suggests a positive correlation between the mean behavioral success rate and the corresponding mean decoded attentional states across the population. This observation may suggest a close association between decoded attentional states and motor responses, confirming findings from previous fMRI studies and suggesting the efficacy of the proposed low-cost EEG-based platform in tracking the interconnection of cognitive and motor performance in neurorehabilitation programs.

Figure 4 shows the average time-frequency representation of the EEG signals during attention to faces and scenes (left and middle) and the averaged ERP responses (right) over all participants, at occipital (O1, O2) and parietal (P7, P8) electrodes in Figure 4(a), and at frontal sites (left F3, FC5; right F4, FC6) in Figure 4(b). Visual attention is mostly associated with the ERP components N100, N200, and P300 [1, 43, 44]. As seen in Figure 4(a), faces evoked larger N100 and N170 responses over early visual cortices than scenes did. At the frontal sites in Figure 4(b), faces evoked additional alpha activity compared to scenes (e.g., at F3). Enhanced ERPs to faces were also observed in the P1 and P2 range, likely because faces draw more attention. Consistent with the literature, the right hemisphere showed slightly stronger responses to faces [43]. Additionally, the P300 component, a late positive brain response, did not reveal any difference in the processing of attentional states toward the face and scene categories.
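
For concreteness, the Figure 3 correlation could be computed as in the sketch below; the per-participant vectors are placeholders to be filled from Tables 3 and 4, not published values.

```matlab
% Sketch of the Figure 3 analysis: Pearson correlation between behavioral
% success rate and mean decoding accuracy. Placeholder NaN vectors stand in
% for the per-participant values from Tables 3 and 4.
behavioral = nan(6, 1);   % per-participant success rate (%), from Table 3
decoding   = nan(6, 1);   % per-participant mean decoding accuracy (%), Table 4
r = corr(behavioral, decoding);         % Pearson correlation coefficient
scatter(decoding, behavioral); lsline;  % scatter plot with least-squares line
```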

5. Discussion and Future Work

The present study is an attempt to develop a new BCI platform to conveniently decode brainwave patterns during a sustained attention task using EEG. Previously, Wang et al. [45] reported discriminating four categories of images (human faces, buildings, cats, cars) using offline recorded EEG data; the same group later improved the classification results in 2016 [46]. De Vos et al. [47] classified face image stimuli among house and word stimuli using single-trial EEG data. El-Lone et al. [48] classified EEG signals for two categories, objects versus animals. However, none of these works reported attentional state evaluation using EEG classification when two different categories of images are superimposed in one image. As an initial attempt, an EEG-based BCI platform was developed to implement the designed attentional state protocol. In a pilot study with six participants, EEG data were collected while each participant attended to only one subcategory of images during blocks of streaming composite images.

The developed platform may be employed in the diagnosis of attention deficits in the early stage of dementia or Mild Cognitive Impairment (MCI) in elderly people [5]. The platform may also be used to assess attentional levels in children to diagnose ADHD. Moreover, the proposed platform may be extended into a real-time neurofeedback protocol as a mechanism to enhance attention in ADHD as well as in dementia patients. An individualized whole-brain EEG neuromarker for sustained attentional state was extracted using machine learning methods. After extracting 532 spectral and temporal features from the EEG, we selected the most significant features through an iterative stepwise feature reduction algorithm, which refined the feature set to the most relevant features in an automated scheme. Individual differences in cognition, performance, and brain responses have been observed in another study [49]. This intersubject variability has led to different targeted neural frequencies (e.g., theta/beta, upper alpha) and brainwave patterns in individualized attention training protocols [50, 51]; as a result, different neurofeedback protocols have been proposed for various clinical populations rather than a single generic protocol [50, 51]. EEG showed the potential to provide information about the sustained attention network in the brain beyond the classic vigilance networks [11, 52, 53]. The classification results could be improved by using advanced machine learning techniques such as deep learning methods [54–57]. Such an enhanced whole-brain cognition model may reduce the time needed to generate an individualized classifier/neuromarker [11]. The current study has a small sample size; future research should investigate a larger population. As suggested in recent research [5, 7, 58], EEG-based paradigms may be developed into an optimized, generic, and ready-to-use neurofeedback protocol. In future work, we will implement the trained SVM classifier in a real-time closed-loop system to further study the efficacy of the method in a neurofeedback rehabilitation setup. The neurofeedback training (adjusting the transparency of images in the composite image based on attentional level) will be used to modulate brain activity while increasing vigilance in the behaviorally relevant functional network [2, 59], using a reward-based training protocol [2, 44] and brain-machine interface technology [42, 60].

6. Conclusions

A new EEG-based BCI classification system was developed for evaluating attention during a visual discrimination task. The developed platform is able to collect EEG data in real time while presenting superimposed stimuli to a participant. A GUI was designed to give the practitioner more flexibility and control in guiding participants through the experiment. Six participants were recruited to test the feasibility of the system and to evaluate the viability of EEG-based classification of the participant's attentional state. EEG signals were collected from the whole brain and sent to the computer for processing. A subset of features comprising the power spectral density of different frequency bands, together with the amplitudes and latencies of multiple ERPs, was identified for data classification. A support vector machine was employed as the discrimination method. The average behavioral response was around 85%, and the average classification accuracy between the scene and face categories was about 77%. It is noteworthy that the visual stimuli in this work were composite pictures consisting of two image categories; as such, the developed platform is designed not to extract the content of the stimulation but to determine the attentional state of the subject. The developed EEG-based BCI platform has the potential to be applied in real-time classification and neurofeedback tasks for diagnosing and training patients with attention deficits.

Data Availability

The EEG data used to support the findings of this study are available from the corresponding author upon request.

Disclosure

The authors have conducted a related study [61]. The content in this article is independent of the presentation in [61].

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Authors’ Contributions

Reza Abiri and Soheil Borhani contributed equally to this work.

Acknowledgments

This work was in part supported by NeuroNET and Alzheimer’s Tennessee.