Abstract

Reliable detection of cognitive load would benefit the design of intelligent assistive navigation aids for the visually impaired (VIP). Ten participants with various degrees of sight loss navigated unfamiliar indoor and outdoor environments while their electroencephalogram (EEG) and electrodermal activity (EDA) signals were recorded. In this study, the cognitive load of the tasks was assessed in real time based on a modification of the well-established event-related (de)synchronization (ERD/ERS) index. We present an in-depth analysis of the environments that most challenge people with certain categories of sight loss, together with an automatic classification of the perceived difficulty at each time instant, inferred from their biosignals. Despite the limited size of our sample, our findings suggest that there are significant differences across environments for the various categories of sight loss. Moreover, we exploit cross-modal relations to predict the cognitive load in real time from features extracted from the EDA signal. This possibility paves the way for the design of less invasive, wearable assistive devices that take into consideration the well-being of the VIP.

1. Introduction

Visual impairment affects approximately 285 million individuals worldwide according to the WHO [1]. Assistive navigation aids are essential to the visually impaired (VIP), improving their quality of life and increasing their independence. Traditionally, VIP relied exclusively on the white cane due to its simplicity; despite its reliability in obstacle detection, it does not provide any information regarding important aspects of navigation such as the distance, the speed, or the shortest path to the destination [2]. New technologies came to fill this gap, enhancing the traditional assistive aids and aiming to improve route planning [3], navigation over long distances [4], landmark discovery [5], and obstacle detection [6–8]. Ranging from smartphone applications to wearable devices, assistive navigation aids promote greater independence and enable VIP to perform tasks formerly impossible or difficult to accomplish [9]. Yet, the focus of these aids is often on optimizing way-finding or localization tasks without taking into consideration the individual’s needs [10].

Building on our previous work [11, 12], in this study we place the focus entirely on the visually impaired, assessing biomarkers that can predict in real time the mental effort of VIP while navigating unfamiliar indoor and outdoor urban environments. The challenges VIP experience during orientation and mobility tasks can be framed according to cognitive load theory [13]: during such tasks, working memory resources are consumed in proportion to the cognitive demands imposed.

We designed two ad hoc orientation and mobility tasks, gathering a wide range of behavioral and biophysical signals from 10 VIP with various categories of sight loss (see Table 1), who volunteered to participate in our study. From the collected electroencephalogram (EEG) signals, we assess the cognitive load and task engagement during the performed tasks. EEG signals have been shown to be stable indicators of cognitive load in a variety of tasks performed in controlled laboratory settings, for instance, learning to navigate using hypertext and multimedia data [14–16] and learning to use complex maps during hypermedia navigation [17]. Despite EEG’s ability to capture the cognitive load when performing a task, its usability in commercial assistive devices is still in its infancy. For this reason, we also collected a wide range of physiological signals, such as skin conductance, by means of a wearable bracelet. According to findings in the literature, skin conductance may predict task performance under stressful conditions [18–20]; confirming this finding in “out-of-laboratory” conditions would bring great advantages to the design of assistive devices.

We contribute to the existing literature by conducting navigation experiments exclusively “in the wild,” where the VIP participants navigated predefined indoor and outdoor routes previously unfamiliar to them. These routes included a large variety of obstacles and different urban environments (see Table 2). A machine learning framework based on random forest classifiers was designed to predict the cognitive load of the participants at each time instant from physiological features extracted from the skin conductance signals. The aim of this study is twofold: first, to explore the effects that various urban indoor and outdoor environments may induce on people in relation to their degree of sight loss and, second, to pinpoint easily accessible biomarkers that robustly predict the cognitive load of VIP when navigating unfamiliar sites in the wild.

In line with the current literature [18–20], the emerging cross-validated results suggest that physiological features related to skin conductance accurately and robustly predict the amount of cognitive load in real time. Building on these findings, the design of assistive aids can adapt in real time to the requirements and personal needs of the user.

2. Data Collection

2.1. Participants

A total of ten healthy visually impaired adults with different degrees of sight loss participated in the two mobility studies (6 females; average age = 41 yrs, range = 22–53 yrs). To help make them feel comfortable and safe, they were encouraged to walk as usual, using their white canes if they so wished, and were accompanied by a familiar orientation and mobility (O&M) instructor. Participants were instructed to avoid smoking normal or e-cigarettes and consuming caffeine or sugar (e.g., coffee, coke, and chocolate) for approximately one hour prior to the walk. Recruitment was based on volunteering and all VIP were capable of giving free and informed consent. The study was approved by the National Bioethics Committee of Iceland. All data were anonymized before analysis. Seven of the participants walked both the outdoor and indoor routes, one took part only in the outdoor study, and two completed only the indoor task (see Table 1).

2.2. Indoor and Outdoor Routes

The indoor experiment was conducted inside a building of the University of Iceland in Reykjavik. With the assistance of VIP caretakers and O&M instructors, we planned a route to take the VIP through circumstances where different levels of stress were likely to occur (i.e., of varying complexity and difficulty). Participants walked the charted route three times for training purposes. The route comprised five distinct environments representative of a variety of indoor mobility challenges (see Table 2). Indicatively, participants had to enter through automated doors, use an elevator, move across a busy open space, walk down a large spiral staircase, and negotiate other obstacles. The route was approximately 200 meters in length and took on average 5 minutes to walk (range = 4–8 minutes).

The outdoor route was charted in the city center of Reykjavik in Iceland. It comprised eight distinct scenes defined so as to cluster environmental and situational factors expected to elicit similar affective reactions. For example, participants had to walk on a busy shopping street, stroll through an urban park, cross a major junction, and pass through narrow sidewalks (see Table 2). The route was approximately 1 km long and took on average 13 min 44 s to walk (range = 9–19 min).

2.3. Multimodal Biosignals

EEG signals were recorded using the Emotiv EPOC+ (http://emotiv.com/epoc/), a mobile headset with 16 saline-based electrodes registering over the 10-20 system locations AF3, F7, F3, FC5, T7, P3 (CMS), P7, O1, O2, P8, P4 (DRL), T8, FC6, F4, F8, and AF4 (sampling rate: 128 Hz). Given the practical constraints involved in monitoring brain electrical activity in the wild, EPOC+ was chosen because it provides a good compromise between performance (i.e., number of channels and scientific validity of the acquired EEG signals) and usability (i.e., portability, preparation time, and user comfort) with respect to other commercial wireless EEG systems [21–24].

Along with the Emotiv headset, participants were asked to wear the Empatica E4 wristband (https://www.empatica.com/e4-wristband) [25]. E4 measures electrodermal activity (EDA) as skin conductance through 2 ventral (inner) wrist electrodes (sampled at 4 Hz) and blood volume pulse (BVP) through a dorsal (outer) wrist photoplethysmography (PPG) sensor (sampled at 64 Hz). E4 further reports heart rate (HR), extracted on board from BVP interbeat intervals. The wristband also includes an infrared thermopile sensor and a 3-axis accelerometer. E4 is currently the only commercial multisensor device developed on the basis of extended scientific research in the areas of psychophysiology and affective computing. Additionally, it has a cable-free, watch-like design, which makes it easier and more aesthetically pleasing to wear and thus better suited to use in the wild compared to other wearable biosignal devices. Participants were asked to wear the wristband on the nondominant hand to minimize motion artifacts related to handling the white cane [26].

2.4. General Procedure

Participants walked the outdoor route twice and the indoor route three times. In both studies, directions were provided only during the first walk, which served training purposes, helping the VIP familiarize themselves with the route. They were instructed to avoid unnecessary head movements and hand gestures as well as talking to their O&M instructor unless there was an emergency. Video and audio were registered by means of a smartphone camera to facilitate data annotation (observing behaviors across the different environments and situations) and synchronization (start/end of walk, environments, and obstacles). In the outdoor study, GPS coordinates were additionally logged using a Garmin GPSMAP-64s unit at a rate of 1 registration per second. Upon completing the last walk, participants were asked to describe stressful moments they experienced along the route.

3. Feature Extraction

3.1. EEG

The EEG data was first time-domain interpolated using the Fast Fourier Transform (FFT) to account for missing samples due to connectivity issues. Subsequently, all signals were baseline-normalized by subtracting, for each participant and each channel, the mean of the resting-state registrations. These were obtained during a series of laboratory studies with the same participants [27, 28].
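A minimal sketch of this preprocessing step follows. The exact interpolation routine is not specified in the text beyond being FFT-based, so scipy's Fourier-domain resampling is used here as a stand-in, and the function names are illustrative:

```python
import numpy as np
from scipy.signal import resample


def preprocess_channel(channel: np.ndarray, n_expected: int,
                       resting: np.ndarray) -> np.ndarray:
    """Fourier-domain resampling to the expected sample count (a stand-in
    for the FFT-based interpolation of dropped samples), followed by
    baseline normalization against the resting-state mean of the same
    participant and channel."""
    regridded = resample(channel, n_expected)  # FFT-based resampling
    return regridded - resting.mean()          # subtract resting-state baseline
```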

Based on findings in the neuroscientific literature, we extracted a series of features descriptive of the cognitive and physiological state of the participants at each time instant. Brain activity is characterized by rhythmic patterns across distinct frequency bands, the definition of which can vary somewhat among studies. Here we analyzed the EEG in six bands, namely, delta (0.5–4 Hz), theta (4–7 Hz), alpha 1 (7–10 Hz), alpha 2 (10–13 Hz), beta (13–30 Hz), and gamma (30–60 Hz). Beta activity is associated with psychological and physical stress, whereas theta and alpha 1 (i.e., lower alpha) frequencies reflect response inhibition and attentional demands such as phasic alertness [29]. Alpha 2 (i.e., higher alpha) is related to task performance in terms of speed, relevance, and difficulty [30]. Gamma waves are involved in more complex cognitive functions such as multimodal processing or object representation [31]. Features related to signal power and complexity were extracted using the PyEEG open source Python module [32]. For each of the 14 EEG channels, we computed the Relative Intensity Ratio as an indicator of relative spectral power in each of the six frequency bands [33].
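For illustration, relative band power can be computed per channel as sketched below, using scipy's Welch periodogram rather than the authors' PyEEG routine; the band edges are taken from the text and the sampling rate is that of the EPOC+:

```python
import numpy as np
from scipy.signal import welch

FS = 128  # EPOC+ sampling rate (Hz)
BANDS = {"delta": (0.5, 4), "theta": (4, 7), "alpha1": (7, 10),
         "alpha2": (10, 13), "beta": (13, 30), "gamma": (30, 60)}


def relative_band_power(x: np.ndarray, fs: int = FS) -> dict:
    """Relative Intensity Ratio per band for one EEG channel segment:
    power in each band divided by total power in 0.5-60 Hz."""
    freqs, psd = welch(x, fs=fs, nperseg=min(len(x), 2 * fs))
    total = psd[(freqs >= 0.5) & (freqs < 60)].sum()
    return {name: psd[(freqs >= lo) & (freqs < hi)].sum() / total
            for name, (lo, hi) in BANDS.items()}
```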

Having extracted the band power features from the EEG signals, we estimated the event-related (de)synchronization (ERD/ERS) index, a well-established measure of band power change in EEG originally proposed by Pfurtscheller and Aranibar [34]. It is defined as

ERD/ERS = 100% × (test IBP − baseline IBP) / baseline IBP,

where IBP stands for interval band power. The baseline IBP refers to a prestimulus time period without any task demands, in our case the resting state, whereas the activation interval (test IBP) refers to the time period while working on the experimental task. We slightly modified the estimation of the ERD/ERS index, defining the test IBP over each one-second interval of our recorded data. In this way, we obtain one ERD/ERS value per second, where every time point expresses synchronization or desynchronization relative to the same baseline.
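In code, the per-second index then reduces to a one-line computation; a minimal sketch (array names are illustrative):

```python
import numpy as np


def erd_ers(test_ibp: np.ndarray, baseline_ibp: float) -> np.ndarray:
    """ERD/ERS index per one-second interval, relative to the resting-state
    baseline. test_ibp holds one band power value per second of the walk;
    baseline_ibp is the mean band power of the resting-state recording.
    Sign conventions vary in the literature; here a power decrease relative
    to baseline (desynchronization) yields negative values."""
    test_ibp = np.asarray(test_ibp, dtype=float)
    return (test_ibp - baseline_ibp) / baseline_ibp * 100.0
```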

3.2. EDA

The skin conductance data was decomposed into two continuous components, namely, a phasic and a tonic component [35]. This decomposition and the subsequent extraction of tonic and phasic electrodermal activity (EDA) features were performed using the Ledalab toolbox (http://www.ledalab.de/). Overall, we extracted six features: the mean tonic EDA (TM) and the number of “spontaneous” skin conductance responses (SCRs; i.e., phasic changes not traceable to specific stimulation), which are known to be particularly suitable for longitudinal monitoring of emotional stress-elicited EDA (i.e., tonic arousal); and the sum of amplitudes of registered SCRs (AS) and the average (PM), maximum, and cumulative phasic EDA, which provide varying indicators of instantaneous phasic arousal [26].
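Ledalab runs in MATLAB and implements a more sophisticated continuous decomposition [35]; purely for illustration, the sketch below approximates the six features in Python with a crude low-pass tonic/phasic split. The cutoff frequency and SCR amplitude threshold are assumptions, not the authors' settings:

```python
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

FS_EDA = 4  # Empatica E4 EDA sampling rate (Hz)


def eda_features(eda: np.ndarray, fs: int = FS_EDA) -> dict:
    """Six EDA features from a simplified tonic/phasic split: the tonic
    component is taken as a slow low-pass trend, the phasic component as
    the residual."""
    b, a = butter(2, 0.05 / (fs / 2), btype="low")   # ~0.05 Hz trend (assumed)
    tonic = filtfilt(b, a, eda)
    phasic = eda - tonic
    # "Spontaneous" SCRs: phasic peaks above a minimal amplitude (assumed)
    peaks, props = find_peaks(phasic, height=0.01)   # >= 0.01 microsiemens
    return {
        "tonic_mean": tonic.mean(),                  # TM
        "n_scrs": len(peaks),                        # spontaneous SCR count
        "scr_amp_sum": props["peak_heights"].sum(),  # AS
        "phasic_mean": phasic.mean(),                # PM
        "phasic_max": phasic.max(),
        "phasic_cum": np.clip(phasic, 0, None).sum(),  # cumulative phasic EDA
    }
```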

3.3. BVP and HR

The photoplethysmography sensor of the E4 device measures the blood volume pulse (BVP) from which it derives on board the heart rate (HR). We min-max normalized both data streams to account for interindividual differences [36].
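A sketch of this normalization with pandas, assuming long-format per-participant data; the column names are hypothetical:

```python
import pandas as pd


def minmax_normalize(df: pd.DataFrame, cols=("bvp", "hr"),
                     by: str = "participant") -> pd.DataFrame:
    """Min-max normalize BVP and HR within each participant to remove
    interindividual offset and scale differences."""
    out = df.copy()
    for c in cols:
        g = out.groupby(by)[c]
        out[c] = (out[c] - g.transform("min")) / (
            g.transform("max") - g.transform("min"))
    return out
```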

4. Linear Mixed Model Analysis

4.1. Method

To examine differences in mental activity across outdoor and indoor scenes of varying complexity and obstacles, in relation to the amount of vision loss, a linear mixed model analysis was conducted for the alpha 1 (lower alpha) and alpha 2 (upper alpha) bands in each of the two routes (outdoor, indoor). Linear mixed models perform a regression-like analysis while controlling for random variance caused by differences in factors such as participant and electrode [37, 38]. We chose to focus on the alpha bands only because it has been repeatedly observed that brain activity at those frequencies is associated with cognitive load across a variety of task demands: specifically, alpha activity has been shown to fall in magnitude (i.e., alpha ERD increases) with higher task difficulty (see [39] for a review).

Fixed factors examined in the analysis included the type of scene (Table 2) and the category of vision impairment (Table 1). For the latter, two broader categories of vision loss were considered to better fit the linear models to the data: almost blind (categories VI-5 and VI-4) and severely impaired (categories VI-3 and VI-2). Random intercepts for each participant and electrode position were added. Type III Wald tests were used to assess the significance of the fixed factors and their interaction [40]. Pairwise comparisons of group means were carried out with t-tests, using Bonferroni-adjusted p values where appropriate [41]. Before averaging across conditions, a logarithmic transformation of single-condition ERD/ERS values was applied to improve their distributional characteristics.
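The authors do not name their statistical software; analyses of this kind are commonly run in R with lme4, and statsmodels offers an approximate Python equivalent. Below is a sketch under assumed column names (log_erd holding the log-transformed ERD/ERS values), emulating the crossed random intercepts for participant and electrode via variance components within a single dummy group; the Type III Wald tests and Bonferroni-adjusted pairwise t-tests would be applied to the fitted model separately:

```python
import pandas as pd
import statsmodels.formula.api as smf


def fit_alpha_lmm(df: pd.DataFrame):
    """Scene * vision fixed effects with crossed random intercepts for
    participant and electrode. Expected (assumed) columns:
    log_erd, scene, vision, participant, electrode."""
    df = df.assign(one=1)  # single group => purely crossed random effects
    model = smf.mixedlm(
        "log_erd ~ C(scene) * C(vision)",
        data=df,
        groups="one",
        re_formula="0",  # no extra random intercept for the dummy group
        vc_formula={"participant": "0 + C(participant)",
                    "electrode": "0 + C(electrode)"},
    )
    return model.fit(reml=True)
```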

4.2. Results

The across-participants average ERD/ERS values for each environment and each category of vision impairment are shown in Figure 1. In each subplot, mean values for outdoor scenes are depicted in the left panel, whereas those for indoor environments are drawn in the right panel. Type III Wald test results from the four (two bands × two routes) linear mixed models are reported in Table 3. Vision alone was a significant predictor of upper alpha ERD/ERS only in the outdoor route, although the interaction of vision and scene was significantly influential for both lower and upper alpha ERD/ERS in the outdoor route. The scene alone had a significant effect on both bands in both the outdoor and indoor scenarios.

Post hoc paired-samples t-tests showed that ERD/ERS in the lower alpha band was significantly higher for almost blind than for severely impaired individuals for the outdoor environments B [small street; , ], E [open space; , ], and G [crossing small street without traffic lights; , ]. Similar trends were found for outdoor ERD/ERS in the upper alpha band [B: , ; E: , ; G: , ]. In addition, upper alpha ERD/ERS was found to be significantly higher for almost blind than for severely impaired participants for the outdoor environments A [shopping street; , ] and H [construction alley; , ]. For the indoor environments, lower alpha ERD/ERS was only significantly higher for almost blind than for severely impaired individuals when walking up and down stairs [scene E; , ].

When averaging across the two VI groups, lower alpha ERD/ERS was significantly higher when crossing a main traffic junction than when passing through the shopping street [, ], small street [, ], and small street crossing scenes [, ]. ERD/ERS in the lower alpha band was higher when passing through the shopping street than the small street and higher for the latter than when crossing a small street, but these differences were not found to be significant. Similar trends were obtained for upper alpha ERD/ERS in the outdoor model [(833.64–834.56), ], while significantly higher upper alpha ERD/ERS was also observed for the urban park scene compared to the small street environment [, ]. For the indoor route, ERD/ERS in the lower alpha band was significantly higher when using automated moving doors and when taking the elevator than when walking along a narrow corridor [, and , , resp.], navigating through an open space [, and , , resp.], and using the stairs [, and , , resp.]. Lower alpha ERD/ERS was higher for the elevator than for the door scene, but not significantly so. It was higher for the corridor than for the stairs environments and higher for the latter than for the open space scene, but these differences were also not found to be significant. Upper alpha ERD/ERS was also significantly higher when using automated moving doors and when taking the elevator than in the other indoor environments [(887.33–893.52), and (886.08–893.01), , respectively], while trends similar to the outdoor model were observed for the remaining indoor scene contrasts.

Overall, outdoor and indoor environments that were more dynamic with respect to complexity and unexpected obstacles, such as crossing a major road, strolling through an open urban space, walking through a narrow alley with coffee tables and advertisement boards, using an elevator, and going through automatic doors, resulted in substantially higher ERD values (i.e., lower relative power) across the two alpha bands, which implies increased task difficulty. These cognitive load “hotspots” are in full agreement with the scenes reported as stressful by the participants themselves at the end of the study.

5. Automatic Prediction of Cognitive Load

5.1. Classification Experiments

To automatically identify the cognitive load that urban indoor and outdoor spaces induce on VIP walking through them, based only on their biosignals, we formulated the study as a supervised classification problem. A widely used ensemble learning method was employed, namely, the Random Forest (RF) classifier [42], selected for its ability to deal with possibly correlated predictor variables and because it provides a straightforward assessment of variable importance.

The ERD/ERS index of cognitive load was averaged over all electrodes per frequency band per second. The resulting averaged index was binned into three classes, namely, “Low,” “Medium,” and “High” load. We trained an RF model to predict these cognitive load labels for each band, using the features extracted from the skin conductance and blood volume pulse sensors. The two most important parameters of the RF were adjusted by means of grid search with 5-fold cross-validation, exploring the effect of the number of estimators and of the maximum number of features considered at each split. Overall, the optimum number of estimators was 300, and the maximum number of features was set equal to the total number of features for each experiment.
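A sketch of this tuning procedure with scikit-learn; the candidate parameter values are illustrative rather than the authors' exact search space:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV


def tune_random_forest(X, y):
    """5-fold grid search over the two RF parameters discussed in the text.
    The text reports that 300 estimators with max_features equal to the
    total number of features were ultimately selected."""
    param_grid = {
        "n_estimators": [100, 200, 300, 500],  # illustrative grid
        "max_features": ["sqrt", None],        # None = consider all features
    }
    search = GridSearchCV(
        RandomForestClassifier(random_state=0),
        param_grid=param_grid,
        scoring="roc_auc_ovr_weighted",        # weighted AUROC over 3 classes
        cv=5,
    )
    search.fit(X, y)  # X: per-second EDA/BVP features; y: Low/Medium/High
    return search.best_estimator_
```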

5.2. Results

Table 4 reports the classification results in terms of the weighted AUROC (area under the receiver operating characteristic curve) metric. Hereafter, we refer to the weighted AUROC with the term “accuracy.” For each frequency band, the average accuracy over the 5 folds is reported, along with the respective standard deviation. We note that for all frequency bands the performance of the models is quite accurate and robust.

As mentioned, the ERD/ERS index employed for the definition of the classes was averaged over all electrodes. Many studies in the literature associate specific electrodes with particular brain functions, for instance, memory recall tasks; however, the Emotiv EPOC+ used in the experiments does not provide full coverage of the cranial surface, which would be needed to focus on specific electrodes. Following the exact same scheme for the classification of the cognitive load states (“Low,” “Medium,” and “High”) from the separate electrodes per band, we obtained accuracy values identical to the averaged results per band.

Figure 2 depicts the EDA and heart rate features that were most predictive of cognitive load. Note that the order of importance and the relative amplitude of the “Gini” importance values are comparable across all frequency bands, showing the stability of the approach. These findings are in line with studies in the literature, where skin conductance is stated to be an important indicator of cognitive load [18–20].
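For reference, the Gini-based ranking behind such a figure can be read directly off a fitted scikit-learn forest; a minimal sketch:

```python
import numpy as np


def gini_ranking(model, feature_names):
    """Rank features by mean decrease in Gini impurity (the importance
    measure plotted in Figure 2), highest first."""
    order = np.argsort(model.feature_importances_)[::-1]
    return [(feature_names[i], float(model.feature_importances_[i]))
            for i in order]
```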

6. Conclusions

This paper presents a framework for the real-time automatic assessment of cognitive load as visually impaired people move and navigate in unfamiliar outdoor and indoor environments. The objective is to demonstrate the feasibility of real-time tracking of mentally demanding tasks, which can be used as on-the-fly feedback to assistive devices. Mobility aids for visually impaired people should be capable of implicitly adapting not only to changing environments but also to shifts in the cognitive load of the user in relation to different environmental and situational factors.

The proposed framework is based on multimodal fusion of brain and peripheral biosignal features. Using stress-related features of the EDA signal and an EEG index of cognitive load based on event-related (de)synchronization in the alpha band (ERD/ERS), we identified the most important cognitively demanding “hotspots” for the generic VIP population and for specific categories of sight loss, pointing out the particular needs and difficulties faced by each VIP category. The high prediction rates in the multimodal classification experiments (83–97% weighted AUROC, Table 4) are very encouraging for the proposed approach. Even if the chosen urban and building sites did not represent all possible outdoor and indoor environments and situations in terms of complexity and difficulty, the charted routes were designed so as to combine most of the mobility challenges faced by VIP.

Despite being promising, the reported findings should be considered with caution due to the limited number of participants, which did not allow for an in-depth analysis of specific stressors in each category of vision impairment. A larger group study would need to be carried out to confirm and quantify the trends obtained here. Furthermore, the Emotiv EPOC+ EEG headset has certain limitations with respect to the quality of the recorded signal during experiments involving physical activity “in the wild” such as those presented here. Future steps of the present study include refining the predictive model by exploring novel multimodal biosignal features for cognitive load assessment and comparing different classifiers. Such findings hopefully pave the way to emotionally intelligent mobile technologies that take the concept of navigation one step further, accounting not only for the shortest path but also for the most effortless, least stressful, and safest one.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Acknowledgments

The authors wish to thank the administration and O&M instructors at the National Institute for the Blind, Visually Impaired, and Deaf in Iceland for their valuable input and generous assistance and the visually impaired individuals who took part in the study for their time and patience. The research leading to these results has received funding from the European Union’s Horizon 2020 Research and Innovation Program under Grant Agreement no. 643636 “Sound of Vision.” Charalampos Saitis acknowledges the Alexander von Humboldt Foundation for support through a Humboldt Research Fellowship.