Abstract

Injuries and fatalities for vulnerable road users, especially bicyclists and pedestrians, are on the rise. To better inform design for vulnerable road users, we need to evaluate how bicyclist and pedestrian behavior and physiological states change in different roadway design and contextual settings. Previous research highlights the advantages of using immersive virtual environments (IVEs) in conducting bicyclist and pedestrian studies. These environments do not put participants at risk of injury, are low cost compared to on-road or naturalistic studies, and allow researchers to fully control variables of interest. In this paper, we propose a framework, Omni-Reality and Cognition Lab Simulator (ORCLSim), to support human sensing techniques within IVEs to evaluate bicyclist and pedestrian physiological and behavioral changes in different contextual settings. To showcase this framework, we present two case studies, where pilot data from five participants’ physiological and behavioral responses in an IVE setting are collected and analyzed, representing real-world roadway segments and traffic conditions. Results from these case studies indicate that physiological data are sensitive to road environment changes and real-time events in the IVE, especially changes in heart rate and gaze behavior. In addition, our preliminary data indicate participants may respond differently to various roadway settings (e.g., signalized vs. unsignalized intersections). By analyzing these changes, future studies can identify how participants’ stress level and cognitive load are impacted by the surrounding environment. The ORCLSim system architecture is a prototype that can be customized for future studies in understanding users’ behavioral and physiological responses in virtual reality settings.

1. Introduction

Over the past couple of decades, the evaluation of roadway safety and design has been automobile-centric. Many observational, survey-based, naturalistic, and experimental studies have been conducted to evaluate the impact of roadway design features on drivers’ behaviors and safety, leaving out other roadway users such as bicyclists and pedestrians. The National Highway Traffic Safety Administration reported a 35% increase in pedestrian fatalities in the past ten years, and deaths of bicyclists in the United States reached all-time highs in 2018 and 2019 [1]. These trends indicate that the design of current roadways needs to be improved to be more inclusive for all users, especially for vulnerable road users such as bicyclists and pedestrians [2]. Different factors, such as the speed limit, roadway design, and the presence of large vehicles (e.g., trucks), have been shown to be associated with severe injury or fatality of bicyclists [3]. In addition, the presence of intersections, traffic volumes, noise levels, and physical separation between bicyclists and vehicles have been shown to influence bicyclists’ stress or comfort levels [4–6]. For pedestrian safety, researchers emphasize factors similar to those for bicyclists: pedestrian infrastructure, roadway design, traffic volumes, vehicle speed, and visibility of the road environment [7]. A naturalistic observation study also found that bicycle paths, crossing surface material, street type, and the presence of nearby parked vehicles are associated with the number of pedestrian-vehicle conflicts [8].

To better inform roadway design, extensive datasets like those from the automobile-focused studies of the past are needed for bicyclists and pedestrians. To develop robust bicyclist- and pedestrian-focused datasets, studies with both high ecological and internal validity are needed. Ecological validity refers to the extent to which an experimental environment matches the real world, increasing the chances that the effects identified in an experimental environment generalize to real-world settings. Internal validity refers to the extent to which a cause-effect relationship is warranted in a study. Subjective, naturalistic, and experimental datasets can be utilized to tackle these issues. Subjective studies, such as surveys, provide measures of users and their perceptions of their environment but lack ecological and internal validity [9]. On the other hand, naturalistic studies can provide information about realistic changes within the environment and bicyclists’ and pedestrians’ behavior with high ecological validity, but these studies have lower internal validity, are resource- and time-intensive, and carry potential risks of injuries and fatalities for participants. For example, a study in real traffic examining the glance behavior of teenage cyclists listening to music was terminated when the results indicated that a substantial percentage of participants cycling with music showed decreased visual performance [10]. Furthermore, naturalistic studies are influenced by many environmental factors that restrict the ability to fully isolate and understand the impact of independent variables, thus offering low internal validity, especially for physiological and behavioral factors [4, 11]. Thus far, the majority of bicyclist and pedestrian studies rely on subjective and naturalistic data derived from real-world settings to assess participants’ behavior and comfort in different traffic environments [12, 13].

Experimental studies provide an opportunity to evaluate the impact of safety-related conditions, infrastructure, and technology on bicyclists and pedestrians. They can offer the ecological validity lacking in subjective studies and allow the researchers to control for external variables, unlike naturalistic studies, for greater internal validity. Experimental studies conducted with virtual simulators can minimize the hypothetical bias of subjective surveys while offering a controlled, low-risk, and immersive environment that real-world experiments cannot guarantee. The benefit of an immersive virtual environment (IVE) is that it achieves high internal and ecological validity while also being cost-effective and offering complete experimental control to replicate trials [14, 15]. Early IVEs lacked realism, primarily because of limited technological capability. Fortunately, IVE software and hardware platforms have significantly improved over the last few years with the release of high-end commercially available head-mounted displays (HMDs). Furthermore, as the level of immersion increases, it is possible to integrate human sensing devices to capture participants’ psychophysiological data, a type of data that has historically been overlooked. Such data provide insights into how participants’ behaviors and perceptions may change in contextual settings in different research fields [16–20]. In addition, psychophysiological data can record people’s responses to environmental changes, including responses such as heart rate that are not visible in video recordings. For example, pedestrians’ distinct physiological responses (gait patterns, heart rate, and electrodermal activity) to negative environmental stimuli are reported from naturalistic ambulatory settings in a building [21].

With the increase of realism in IVE simulators and the development of low-cost ubiquitous sensors, IVE simulators have become promising tools for conducting highly realistic and immersive experimental studies [22]. In traffic safety studies, driving simulators have been widely applied to study drivers’ behaviors, awareness [23], and psychophysiological states with multimodal data collection systems such as eye trackers, electroencephalogram (EEG), and electrocardiogram (ECG) [24–27]. Some of the driver-related studies are conducted in IVE [28]. Meanwhile, for bicyclists and pedestrians, only a few studies have applied physiological responses in IVE simulators. For example, bicyclists’ galvanic skin response is found to have fewer peaks with a bike lane than in the no-bike-lane condition [6]. In another cycling virtual reality study, EEG data show potential in a hybrid model framework as an indicator of the perceived risk of bicyclists [29]. For pedestrians, it is notable that older pedestrians spent more time focusing on their travel path and rarely on other areas in the last five seconds before making the crossing decision in an IVE study [30].

In this paper, we propose a modular IVE-based framework, Omni-Reality and Cognition Lab Simulator (ORCLSim), for supporting pedestrian and bicyclist physiological behavior research. The proposed framework integrates realistic visualizations from the real world in IVE along with a physical bicycle and a suite of passive sensing technologies, which enable the collection of physiological and behavioral responses of users. Our framework has the following innovations: (1) a low-cost and highly immersive solution for studying vulnerable road users’ behavior and responses to new roadway design features or new technology, such as bicyclists’ responses to connected autonomous vehicles; and (2) multimodal data collection to capture participants’ physiological information (i.e., gaze, pose, heart rate, and HRV) and physical responses (i.e., braking, speed, and steering), as well as to control and monitor environmental conditions (distance of vehicles to cyclists and traffic volume). Most previous IVE studies did not collect behavioral and physiological data. The lack of physiological data in previous studies is mainly due to cost and the intrusive nature of such devices, as well as the lack of analytic and processing capability of human sensing devices. By synchronizing the timestamps of the different human sensing devices, bike simulator, pedestrian walking data, VR headset, cameras, and video recordings, we are able to precisely match behavioral and physiological responses to the stimuli in the environment, allowing us to fully capture how the contextual setting impacts roadway users. In the framework, we use integrated eye trackers, lightweight smartwatches, and low-cost web cameras to collect the gaze, heart rate, body position, and cycling/walking data in a less intrusive way. These modules are independent and can be easily modified and integrated with other simulators.

The goals of this paper are to (1) identify research methods, trends, and gaps in knowledge related to bicyclist and pedestrian physiological behavior research in IVE; (2) present a novel framework for evaluating bicyclist and pedestrian behavioral changes through integrating human physiological sensing within IVE; and (3) present a set of case studies to highlight how the proposed framework could be implemented to collect and analyze bicyclist and pedestrians’ behavioral and physiological changes in different roadway conditions and designs.

2. Background and Literature Review

This section will provide background information regarding different types of studies on bicyclists and pedestrians and how physiological measurements are integrated into related studies, especially for IVE studies. Table 1 shows a list of acronyms for further reference.

2.1. Surveys and Observational Studies

Surveys have been widely used as methods of studying bicyclists and pedestrians, particularly when faced with a lack of observational data. Surveys, when composed carefully, can reliably and efficiently assess large populations of people and have been used to study a wide variety of topics including perceived safety and comfort [31, 32], route choice [33], and crash history [34, 35]. However, stated preference surveys have limitations such as being subject to hypothetical bias, where responses to hypothetical situations are not the same as they would be in real-world situations [36].

Observational studies eliminate the risk of hypothetical bias from stated preference surveys [37]. However, the data collected rely on road users in real-world conditions. In recent years, with the increasing number of cameras, more video streams are available for observational studies. However, these studies can only evaluate participants’ behaviors in existing environments, where we have a very limited number of options to consider for roadway improvements. To have full control over design considerations, we need to evaluate how bicyclists and pedestrians respond to different designs of roadways during the planning or design phase of projects. Simulations and immersive virtual environments offer an approach that minimizes the limitations of stated preference surveys and allows for a controlled, safe environment that real-world observational studies cannot provide.

2.2. IVE Simulation Technology and Framework

Over the past decade, driving simulators, virtual reality (VR) technologies, and human sensing technologies have provided new insights into human behavior in different contextual settings, assisting in evaluating different design alternatives for roadways [38, 39], buildings [40], hospitals [41], and other civil infrastructure systems [39, 42, 43]. Simulation methods utilizing IVE offer a low-cost, low-risk approach to studying the users’ safety, perception, and behavior. Traditionally, real-world observation methods have been used to understand bicyclist and pedestrian behavior. These methods are often expensive, time-consuming, and unrealistic for studying naturalistic behaviors as they often require some level of unrealistic environmental control for the safety of test subjects. The improvements in IVE in recent years have provided researchers, designers, and engineers with a way to evaluate alternative infrastructure designs while providing high degrees of immersion. Novel, commercially available VR headsets offer a high degree of realism and immersion. Furthermore, environmental factors that may influence bicyclist and pedestrian behavior are highly controllable within IVE, allowing for replicable experimental trials. The last two decades have seen research utilizing IVE and VR simulations focusing on how countermeasures influence safety-related elements such as walking speed, gap acceptance, analysis of risky behavior, stated preference data, visual or auditory warning effectiveness, speeds, steering, and resistance [44–49].

Arguably the biggest gap in IVE research for bicyclists and pedestrians is the lack of standard methods to cross-compare different studies. For instance, it is difficult to draw conclusions relating to technology effectiveness between a simulator using 2D screens and another using a 3D HMD, as validation studies are very limited and not consistent between different mediums [50]. This was shown by Maillot et al., who evaluated participants’ crossing behavior across three mediums: 2D screens, 3D HMD, and 3D Cave Automatic Virtual Environment (CAVE). Their analysis showed a significant difference in participant gap acceptance between 2D screens and CAVE but no significant difference between CAVE and HMDs [51, 52]. These findings indicate that not only are there limited IVE studies for understanding bicyclist and pedestrian behaviors but there are even fewer studies where the results and findings can be properly compared. Some correlation is recognized between a multiscreen setup and the use of a cell phone mounted in a cardboard viewer as a simulated HMD setup [53]. Other factors such as participant movement, visual scenes, and sound technology have been taken into consideration for fidelity comparison [54–56]. Overall, the lack of a generalized framework to develop IVE simulators and technological inconsistency in data collection between studies are the biggest factors for this gap. Other research gaps worth noting include a lack of model complexity; for example, more work needs to be put into the IVEs to incorporate traffic flow theory [57–59]. In addition, the lack of complexity with respect to what the bicyclists and pedestrians can do within an IVE also needs to be addressed, including limitations in walking speed, interaction with vehicles and infrastructure, and modeling streetscapes within the boundaries of indoor laboratory space [46, 57–59].

2.3. Integration of Human Sensing within IVE

Apart from subjective studies, there are limited datasets including human physiological and psychological sensing (e.g., eye tracking, body tracking, and heart rate) for bicyclists and pedestrians. It is crucial to assess participants’ patterns of perception and reaction in certain contextual settings. Many traditional on-road studies have used accident statistics and road infrastructure data (e.g., roadside cameras) to evaluate the safety-related concerns of bicyclists and pedestrians. To further study their perception and cognitive states, human sensing devices (e.g., physiology devices) have been shown to provide promising insights [60, 61]. There are practical concerns about the data collection of human sensing on real roads. First, the safety, ethical and cost considerations prohibit large-scale on-road observational experiments [10]. Second, the implementation of traditional human sensing devices (such as body trackers) is intrusive, which may affect the behavior and perception ability of the participant, as well as the data quality on real roads (especially in high-speed scenarios) [62]. Considering these shortcomings, most IVEs can handle the first limitation, as virtual environments provide a low-risk and cost-effective alternative to real settings. The second shortcoming (monitoring perception and cognition) requires the integration of human sensing systems and ubiquitous computing into the IVE. The majority of existing IVE research in bicyclist and pedestrian studies has not utilized ubiquitous computing and human sensing techniques to monitor participants’ behaviors and physiological states.

Eye tracking behaviors, such as fixation distribution and pupillometry, are usually found to be related to the process of cognitive resource allocation. Eye tracking behavior is usually measured by optical eye trackers. Eye tracking has been widely used in studying users’ visual perception and attention in different contexts. For example, research has shown that experienced and inexperienced bicyclists differ in their gaze behavior toward infrastructure treatments around intersections [63]. The latest virtual reality headsets, such as the HTC VIVE Pro Eye, have integrated eye tracking features, allowing IVE researchers to incorporate eye tracking analysis within their studies.

Body position has an influence on leg kinematics and muscle recruitment for bicyclists [64]. Sensors can be used to build 3D body tracking by implementing multiple on-body receivers to study pedestrians’ dynamics of indoor activity [65]. Recent developments in computer vision have greatly reduced the cost of obtaining body movement data. For example, OpenPose, an open-source real-time multiperson system, can jointly detect human body, hand, facial, and feet key points on single 2D images [66].

ECG is a well-established method to record the electrical activity of the heart. A participant’s heart rate (HR) and heart rate variability (HRV) can be measured using an ECG signal. HR is a commonly measured index of physiological arousal in response to changes in work demands, especially for workload [67]. In contrast to HR, HRV decreases with increasing task demands [68]. To collect the HR/HRV data, apart from the intrusive sensors usually utilized in lab tests, many wearable devices, such as smartwatches and smart bands, can provide reliable measurements for HR and HRV [69, 70]. These devices enable longitudinal data collection that can help in building personalized models for users.
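To make the relationship between HR and HRV concrete, the following is a minimal sketch of how both can be derived from a series of beat-to-beat (RR) intervals. It assumes the RR intervals have already been extracted from the ECG signal, and it uses two common time-domain HRV indices, SDNN and RMSSD, as illustrative choices; the function name `hr_and_hrv` is hypothetical and not part of any sensing device's software.

```python
import math


def hr_and_hrv(rr_ms):
    """Return (mean HR in bpm, SDNN in ms, RMSSD in ms) from RR intervals in ms.

    Assumption: rr_ms is a clean list of successive RR intervals; artifact
    removal and R-peak detection are outside the scope of this sketch.
    """
    n = len(rr_ms)
    if n < 2:
        raise ValueError("need at least two RR intervals")
    mean_rr = sum(rr_ms) / n
    mean_hr = 60000.0 / mean_rr  # 60,000 ms per minute
    # SDNN: sample standard deviation of the RR intervals.
    sdnn = math.sqrt(sum((r - mean_rr) ** 2 for r in rr_ms) / (n - 1))
    # RMSSD: root mean square of successive RR differences.
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mean_hr, sdnn, rmssd


if __name__ == "__main__":
    hr, sdnn, rmssd = hr_and_hrv([800, 900, 800, 900])
    print(round(hr, 1), round(sdnn, 1), round(rmssd, 1))
```

A perfectly regular rhythm yields zero for both HRV indices, while the same mean HR with alternating intervals yields nonzero values, which is why HRV carries information that HR alone does not.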

To summarize the existing literature, we have categorized past IVE bicyclist and pedestrian simulator studies with their IVE settings and data collection methods. Tables 2 and 3 illustrate how the trends in technology, immersion, collected data, and analysis of bicyclist and pedestrian research have changed over the last two decades. Note that for studies from the same research group, only the latest work is included. As can be seen from Tables 2 and 3, for past pedestrian and bicycle simulators, there are two major shortcomings: (1) lack of realism in existing VR environments and cycling experiences and (2) lack of integration of behavioral and/or physiological sensing in the real world and VR simulation studies.

3. Materials and Methods

To address the existing knowledge gaps identified in the previous section, we introduce a new IVE-based framework, ORCLSim, in which we can evaluate participants’ behavioral and physiological responses in different simulated environments. This section provides details on the devices and processing techniques utilized in the proposed framework. In order to collect the multimodal data desired, multiple components are required to work in synchrony within the IVE. The ORCLSim system architecture is shown in Figure 1, demonstrating all the technology, software, communications network, and associated data flow. The details of the system framework will be discussed in this section.

3.1. Environment and Design Context

The IVE is developed at a 1:1 scale to the real-world environment, the Water Street corridor in Charlottesville, Virginia. During the development of the IVR model, the scale of the road, surrounding buildings, and other roadway design features (e.g., markings and traffic lights) is calibrated to match the real road. For the presented case study, the construction plans of the selected roadway corridor from the city of Charlottesville were obtained as a reference to build the IVE. Water Street is well-trafficked by bicyclists, has been identified by the Virginia Department of Transportation as a high-risk site for pedestrians, and is being considered for redesign by the city of Charlottesville, as shown in Figure 2. The section of the corridor chosen for this experiment consists of four city blocks, with an eastbound 4% downhill grade on one of the segments (road segment 1 in Figure 2(d)), shared lane markings for bicycles in the east and westbound directions, a traffic signal at the intersection of East Water Street and 2nd Street SE, and a parking lane in the westbound direction. Figures 2(a) and 2(b) present the comparison between the real environment and the IVE created in Unity. The IVE used in this framework is developed in the Unity 3D game engine 2018 and runs through the SteamVR platform. In addition, to realistically simulate traffic patterns, the research team collected approximately two weeks of video data at the selected corridor and recorded the number of cars and other roadway users passing through this corridor along with their speed. These data were used to simulate the number and speed of the cars within the IVR environment. A pilot test was conducted among the research team, and several transportation experts from the University of Virginia were invited to evaluate the realism of the model and provide feedback for improvement.

3.2. Simulator Setup

This section discusses the hardware components chosen for both simulators. Figure 3 shows the appearance of both simulators. Both simulators are equipped with HTC Vive Pro VR headsets and their accompanying controllers.

Table 4 shows the cost estimation of the IVE framework, compared to a comparable real road test. The IVE-based framework can not only save direct costs in hardware but also save additional costs in the planning process and eliminate test subject safety concerns. At the time of this experiment, the cost of the initial setup of the virtual environment is about $3500, the price of the HTC VIVE Pro Eye with controllers and base stations is approximately $1500, and the room video data are collected from two web cameras ($25 each). The software used to integrate all virtual reality videos is OBS Studio, which is open source and free for research purposes. For the IVE, the eye tracking software is the Tobii Pro Unity SDK, which is free for research purposes. The authors have developed the documentation and the code examples on how to set up and obtain access to the eye tracking data from the HTC VIVE Pro Eye. Meanwhile, not all types of eye trackers (e.g., desktop-based ones) are suitable for on-road tests. More flexible eye trackers such as eye tracking glasses (SMI or Smarteye Pro) are required, along with a license for the eye tracking glasses’ software, which can be very expensive (such as the SMI eye tracking glasses + iMotion). Lastly, the real-world road test poses a potential risk to researchers and participants during the experiment, as participants may be involved in unforeseen accidents or events, especially during busy traffic hours. In contrast, the risk for participants and researchers in IVE is very low. Therefore, as long as the IVE setting is representative of the real-world conditions, such IVE environments can provide us with insightful information on how the users can design and manage roadway systems to ensure the safety and comfort of all roadway users.

The following equipment was specifically chosen for the bicyclist simulator:
(i) Wahoo Kickr Smart Trainer: power measurement system of ±2% for accurate, realistic resistance feedback
(ii) Wahoo Kickr Climb: adaptive, real-time indoor bicycle grade simulator attached to the front fork of the bicycle that accurately raises or lowers the front end of the bicycle based on on-road grade
(iii) Wahoo Kickr Headwind: adaptive, real-time variable speed vortex fan capable of reaching wind speeds experienced by bicyclists on the road
(iv) ANT+: wireless protocol used for communications between the Wahoo training equipment and desktop computer
(v) Physical Trek Verve bike: the main body structure of the bicycle simulator

3.3. Data Collection

In this section, we will introduce details about the collected data from different data sources including the data type and the frequency of data collection. Specifically, we first discuss the data streams exported from the Unity software, followed by the eye tracking data, and information extracted from the video recordings and smartwatches, as shown in Figure 1.

With scripts written in C# and attached to the Unity scenario, the world position (in meters) and direction (unit vector) of each object in the virtual environment can be extracted, including the headset, controllers, and other virtual objects such as vehicles. The scripts also collect any input from the controllers. For example, the controller trigger value (0 to 1) serves as the brake input for the bike simulator. The Unity data are generally logged at around 30 Hz. In addition, the system timestamp is attached to the final Unity output data for time synchronization.

The eye tracking data are collected through the Tobii Pro Unity SDK, which is integrated with Unity through C# scripts. The raw Tobii Pro output consists of the 3D gaze direction, gaze origin, and pupil diameter. Preprocessing techniques are required to relate the eye tracker’s coordinate system to the headset’s position in a virtual 3D world. The frequency of eye tracking data is 120 Hz. Details of the utilized eye tracking system, sample environment, and the code to extract the different data streams have been shared online [87].

The video recording system has three components: two video recordings from cameras capturing the body position of the participant and one screen recording of the participant’s point of view in IVE. These videos are recorded simultaneously in OBS Studio with the same frequency (30 Hz), resolution (1080p, 1920 by 1080), and system timestamp.

Experiment participants wear two Android smartwatches (one on each wrist) that are equipped with the “SWEAR” app for collecting longitudinal data. The SWEAR app records heart rate (1 Hz), hand acceleration (10 Hz), audio amplitude (noise level, 1/60 Hz), and gyroscope (10 Hz) [88]. Both watches are connected to a smartphone via Bluetooth; the smartphone and computer are on the same Wi-Fi network to make sure time is synchronized with the server before each experiment.

3.4. Data Preprocessing

All the data collection devices and platforms (except for the smartwatches) are connected to the local computer, allowing them to be synchronized with the computer’s system time. Information about each video source (frames per second, creation date, duration, height, and width) can be extracted from the single combined recording, which can then be split into separate videos for each source (cameras 1 and 2) using the OpenCV software. Furthermore, the body position data can be extracted from these videos using the OpenPose software. Figures 4(a) and 4(b) show the body position detection of the video recordings from OpenPose. By combining the raw gaze direction from the eye tracking data with the video information of the point-of-view videos, it is possible to project the 3D gaze direction onto the 2D videos to visualize what the participants are looking at in the IVE. As shown in Figure 4(d), the green and blue dots represent the directions in which the left and right eyes are looking, respectively. Data from the smartwatches are stored locally on the device during the experiment and then uploaded to the Amazon S3 cloud. After the experiment, the data can be downloaded for further analysis.
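The projection of a 3D gaze direction onto the 2D point-of-view video can be illustrated with a simple pinhole camera model. The sketch below is not the framework's actual implementation: it assumes the gaze direction is already expressed in the headset camera's coordinate system (x right, y up, z forward, as in Unity), that the recording is 1920 by 1080, and that the vertical field of view is 90 degrees, which is a placeholder value. The function name `gaze_to_pixel` is hypothetical.

```python
import math


def gaze_to_pixel(direction, width=1920, height=1080, vertical_fov_deg=90.0):
    """Project a gaze direction (x, y, z) onto pixel coordinates (u, v).

    Assumptions: pinhole camera, x right, y up, z forward; the image origin
    is the top-left corner. Returns None if the gaze points behind the
    camera or falls outside the frame.
    """
    x, y, z = direction
    if z <= 0:
        return None
    # Focal length in pixels derived from the (assumed) vertical field of view.
    focal = (height / 2.0) / math.tan(math.radians(vertical_fov_deg) / 2.0)
    u = width / 2.0 + focal * x / z
    v = height / 2.0 - focal * y / z  # image v grows downward, y grows upward
    if not (0 <= u < width and 0 <= v < height):
        return None
    return u, v


if __name__ == "__main__":
    print(gaze_to_pixel((0.0, 0.0, 1.0)))   # straight ahead maps to the image center
    print(gaze_to_pixel((0.2, 0.1, 1.0)))   # slightly right and up
```

In practice, the rendering camera of an HMD applies lens distortion and per-eye offsets, so the field of view and projection would need to be matched to the recorded point-of-view video rather than assumed.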

Figure 4 shows the visualization of all the data collected in the simulator after time synchronization.
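Because the data streams are sampled at different rates (for example, 1 Hz heart rate versus 30 Hz video and Unity data), time synchronization amounts to matching samples by their timestamps. The following is a minimal sketch of one possible approach, nearest-timestamp matching with a tolerance; the function name `align_nearest` and the tolerance value are illustrative assumptions rather than the procedure used in the framework.

```python
from bisect import bisect_left


def align_nearest(ref_times, src_times, src_values, tolerance):
    """For each reference timestamp, return the source value whose timestamp
    is nearest, or None if the nearest one is farther away than `tolerance`.

    Assumptions: all timestamps are in seconds on a common clock and
    src_times is sorted in ascending order.
    """
    aligned = []
    for t in ref_times:
        k = bisect_left(src_times, t)
        # Only the neighbors on either side of the insertion point can be nearest.
        candidates = [j for j in (k - 1, k) if 0 <= j < len(src_times)]
        if not candidates:
            aligned.append(None)
            continue
        best = min(candidates, key=lambda j: abs(src_times[j] - t))
        if abs(src_times[best] - t) <= tolerance:
            aligned.append(src_values[best])
        else:
            aligned.append(None)
    return aligned


if __name__ == "__main__":
    frame_times = [0.0, 0.5, 1.0, 1.5, 5.0]   # e.g., selected video frames
    hr_times = [0.0, 1.0, 2.0]                # e.g., 1 Hz heart rate samples
    hr_values = [70, 72, 75]
    print(align_nearest(frame_times, hr_times, hr_values, tolerance=0.5))
```

Returning None for unmatched samples, instead of extrapolating, keeps gaps in a slow stream visible in the merged dataset.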

3.5. Data Analysis

In this section, we demonstrate potential applications of the collected data, specifically by discussing the change point detection algorithm applied on the HR as well as the gaze entropy, which is the basis of the event detection in our study. First, we discuss how the Bayesian change point (BCP) detection is applied to the HR data. Similarly, for the gaze data, we discuss how gaze entropy can be calculated and used to identify the dispersion of gaze.

Bayesian Change Point Detection. BCP detection methods are applied to detect abrupt changes in HR data. Change point analysis deals with time series data in which certain characteristics undergo occasional changes. Observations are then assumed to be independent in different blocks given the sequence of parameters [89]. Suppose we have a time series of HR data $X$ of length $n$, and we use $\rho = (U_1, \ldots, U_n)$ to indicate a partition of the time series into nonoverlapping HR regimes, where $U_i = 1$ means a change point happens at position $i + 1$. To calculate the posterior distribution over partitions, we use the Markov Chain Monte Carlo (MCMC) method. We define a Markov Chain with the following transition rule: with probability $p_i$, a new change point at the location $i$ is introduced. In each step of the Markov Chain, at each position $i$, a value of $U_i$ is drawn from the conditional distribution of $U_i$ given the data $X$ and the current partition $\rho$. Let $b$ denote the number of blocks obtained if $U_i = 0$, conditional on $U_j$, for $j \neq i$. The transition probability $p_i$, for the conditional probability of a change point at the position $i + 1$, can be obtained from [89, 90]:

$$\frac{p_i}{1 - p_i} = \frac{P(U_i = 1 \mid X, U_j, j \neq i)}{P(U_i = 0 \mid X, U_j, j \neq i)} = \frac{\int_0^{\gamma} p^{b} (1 - p)^{n - b - 1} \, dp}{\int_0^{\gamma} p^{b - 1} (1 - p)^{n - b} \, dp} \cdot \frac{\int_0^{\lambda} \frac{w^{b/2}}{(W_1 + B_1 w)^{(n - 1)/2}} \, dw}{\int_0^{\lambda} \frac{w^{(b - 1)/2}}{(W_0 + B_0 w)^{(n - 1)/2}} \, dw} \tag{1}$$

Here, $W_0$, $B_0$, $W_1$, and $B_1$ are the within and between block sums of squares obtained when $U_i = 0$ (without a change point at position $i + 1$) and $U_i = 1$ (with a change point at position $i + 1$), respectively. The two tuning parameters $\gamma$ and $\lambda$ can be calculated with MCMC. We use the bcp package in R to implement the change point analysis [90]. A similar approach has been utilized in a previous study to identify changes in drivers’ HR data in different roadway conditions [70]. The BCP output is a time series of the probability of change points.
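The within and between block sums of squares described above are the data-dependent quantities that drive the transition probability. As an illustration only (the analysis itself relies on the bcp package in R), the sketch below computes them for a given partition. It assumes the partition is encoded as a list of 0/1 indicators in which a 1 at index i marks the end of a block and the last indicator is always 1; the function name `block_sums_of_squares` is hypothetical.

```python
def block_sums_of_squares(x, u):
    """Return (within, between) block sums of squares for series x under
    the partition u.

    Assumptions: u[i] == 1 means a block ends at index i (a change point
    occurs at the next position), and u[-1] must be 1 so the final block
    is closed.
    """
    if not x or len(x) != len(u) or u[-1] != 1:
        raise ValueError("x and u must be non-empty, equal length, and u[-1] == 1")
    overall_mean = sum(x) / len(x)
    within = 0.0
    between = 0.0
    block = []
    for value, flag in zip(x, u):
        block.append(value)
        if flag == 1:
            block_mean = sum(block) / len(block)
            # Spread of observations around their own block mean.
            within += sum((v - block_mean) ** 2 for v in block)
            # Spread of block means around the overall mean, weighted by block size.
            between += len(block) * (block_mean - overall_mean) ** 2
            block = []
    return within, between


if __name__ == "__main__":
    hr = [70, 70, 70, 90, 90, 90]
    print(block_sums_of_squares(hr, [0, 0, 1, 0, 0, 1]))  # change point after the third sample
    print(block_sums_of_squares(hr, [0, 0, 0, 0, 0, 1]))  # no change point
```

When a proposed change point separates two genuinely different HR levels, the within-block sum of squares drops sharply and the between-block sum rises, which is what makes the change point more probable in the posterior.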

Gaze entropy. Gaze entropy is a comprehensive measurement of visual scanning efficiency. The concept of entropy originates from information theory [91]. There are two types of gaze entropy measures: stationary gaze entropy (SGE) and gaze transition entropy (GTE) [92]. SGE measures the overall predictability of fixation locations, which indicates the level of gaze dispersion during a given viewing period [93]. The SGE is calculated using Shannon's equation:

$$H_s(x) = -\sum_{i=1}^{n} p_i \log_2 p_i \tag{2}$$

Here, $H_s(x)$ is the value of SGE for a sequence of data $x$ with length $n$, $i$ is the index for each individual state, and $p_i$ is the proportion of each state within $x$. It is assumed that each fixation is an individual output of the gaze control system, which makes spatial predictions regarding the location of subsequent fixations [92].
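As an illustration of the Shannon entropy used for SGE, the sketch below computes the entropy of the proportions of discrete gaze states in a sequence. The function name `stationary_gaze_entropy` and the state labels are hypothetical. The sketch assumes the gaze samples have already been mapped to discrete states.

```python
import math
from collections import Counter

def stationary_gaze_entropy(states):
    """Shannon entropy (in bits) of the proportions of each state in the sequence."""
    total = len(states)
    if total == 0:
        return 0.0
    counts = Counter(states)
    entropy = 0.0
    for count in counts.values():
        p = count / total            # proportion of samples in this state
        entropy -= p * math.log2(p)
    return entropy

# Gaze spread evenly over four states versus gaze fixed on a single state.
dispersed = stationary_gaze_entropy(["a", "b", "c", "d"])
focused = stationary_gaze_entropy(["a", "a", "a", "a"])
print(dispersed, focused)
```

Evenly dispersed gaze yields the maximum entropy for the number of states visited, while gaze that stays in one state yields zero.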

GTE is calculated by applying the conditional entropy equation to first-order Markov transitions of fixations:

$$H_t(x) = -\sum_{i=1}^{n} p_i \sum_{j=1}^{n} p(i, j) \log_2 p(i, j) \tag{3}$$

Here, $H_t(x)$ is the value of GTE, $p_i$ is the stationary distribution, the same as in (2), and $p(i, j)$ is the probability of transitioning from $i$ to $j$. GTE provides an overall estimate of the level of complexity or randomness in the pattern of visual scanning relative to the overall spatial dispersion of gaze, where higher entropy suggests less predictability.
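The conditional entropy over first-order transitions can be sketched in Python as follows. This is a minimal example, assuming that the stationary distribution is estimated from state proportions over the whole sequence and that transition probabilities are estimated from counts of consecutive state pairs. The function name `gaze_transition_entropy` and the state labels are hypothetical.

```python
import math
from collections import Counter

def gaze_transition_entropy(states):
    """Conditional entropy (in bits) of first-order transitions between states."""
    n = len(states)
    if n < 2:
        return 0.0
    stationary = Counter(states)                          # counts used for the stationary distribution
    transitions = Counter(zip(states[:-1], states[1:]))   # counts of (from, to) pairs
    outgoing = Counter(states[:-1])                       # number of transitions leaving each state
    entropy = 0.0
    for (src, _dst), count in transitions.items():
        p_src = stationary[src] / n            # stationary probability of the source state
        p_trans = count / outgoing[src]        # estimated probability of this transition
        entropy -= p_src * p_trans * math.log2(p_trans)
    return entropy

# A strictly alternating scan is fully predictable; a mixed scan is not.
predictable = gaze_transition_entropy(["a", "b", "a", "b", "a", "b"])
mixed = gaze_transition_entropy(["a", "a", "b", "b", "a"])
print(predictable, mixed)
```

In the mixed sequence, each state is followed by either state with equal estimated probability, so the next fixation location carries one full bit of uncertainty.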

Specifically, to calculate the SGE and GTE, the visual field is divided into spatial bins that serve as discrete states for generating probability distributions. In this study, the fixation coordinates were divided into spatial bins of 100 × 100 pixels, following previous studies [94]. To capture the trend of gaze entropy over time, both measures are calculated in a rolling window of five seconds (600 data points in the raw gaze data streams).
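The binning and rolling-window steps can be sketched as below. This is a simplified illustration, assuming non-negative pixel coordinates and a window that advances one sample at a time. The names `to_bin`, `shannon_entropy`, and `rolling_sge`, the step size, and the sample coordinates are assumptions made for this example. Only the 100-pixel bin size and the 600-sample window come from the text.

```python
import math
from collections import Counter

BIN_SIZE_PX = 100      # bin edge length in pixels, as stated in the text
WINDOW_SAMPLES = 600   # five-second window, as stated in the text

def to_bin(x_px, y_px, bin_size=BIN_SIZE_PX):
    """Map a pixel coordinate to the (column, row) index of its spatial bin."""
    return (int(x_px // bin_size), int(y_px // bin_size))

def shannon_entropy(states):
    """Shannon entropy (in bits) of the state proportions in a window."""
    total = len(states)
    counts = Counter(states)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def rolling_sge(points, window=WINDOW_SAMPLES, step=1):
    """Entropy of binned gaze points in each full window along the stream."""
    bins = [to_bin(x, y) for x, y in points]
    return [
        shannon_entropy(bins[k:k + window])
        for k in range(0, len(bins) - window + 1, step)
    ]

# Tiny made-up stream alternating between two bins, with a shortened window.
stream = [(0, 0), (150, 0), (0, 0), (150, 0), (0, 0)]
trend = rolling_sge(stream, window=4)
print(to_bin(250, 99), trend)
```

The same windowing can be applied to a transition-based measure by replacing `shannon_entropy` with a conditional entropy function.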

4. Case Study and Results

In this section, we present two case studies (one for bicyclists and one for pedestrians) from a pilot study of five participants to evaluate the proposed framework and highlight the importance of collecting physiological, speed, and position data from participants. The tasks in the IVE are different for each user type (bicyclists and pedestrians). The bicyclists are asked to cycle eastbound along the corridor, as indicated in Figure 2(c). The pedestrians' task is to cross the street using the crosswalk at intersection 2 whenever they feel it is safe to do so. More details about the bicyclists' experiment can be found in our previous study [95]. All five participants are Charlottesville residents who are familiar with the modeled Water Street corridor (they have experience cycling and walking along the corridor in real life), with a mean age of 31 years (SD = 3.4). These participants have a more positive disposition toward cycling than the general population but are not transportation professionals. Before the experiment, videos introducing the simulator are sent to the participants. Immediately before the experiment, instructions on how to control and move within the IVE are provided to each participant. Furthermore, the participants are placed into a training scenario to get familiar with the IVE before the formal experiment. They can stay in the training scenario as long as they would like, walking or cycling around the IVE until they feel comfortable starting the formal experiment.

Several steps have been taken in the case study to minimize the effect of physical activity on HR. (1) Upon arrival, each participant is asked to have a seat and rest for at least 10 minutes in the lab. They are then asked to fill out surveys collecting their demographic information and prior VR experience. The heart rate in this phase is identified as the resting heart rate. (2) The experiment is short in duration and includes breaks. For the pedestrian case study, the crossing takes about 10 seconds at an average walking speed of 1.16 m/s, and the whole duration (from entering VR to exit) is about 40 seconds. For the bicyclist case study, the average speed is about 15 km/h, the pedaling time is about 60 seconds, and the whole duration (from entering VR to exit) is about 90 seconds. There is a 10-minute break between the pedestrian study and the bicycle study. (3) For data analysis, abrupt changes in HR are used rather than the mean HR. Thus, the analyzed HR is relative to each individual's resting HR. It is hypothesized that the HR change points may reflect participants' perceived safety of the environment.

We first identify where abrupt changes happen in the HR readings and then identify the potential reasons behind the events that take place in each time frame. To achieve this, the videos are manually annotated to identify an event or behavioral change among participants. Then the timestamps of these event/behavior changes, as well as other physiological responses, are compared to the time that we observe HR change points for each participant. Through this, we can show whether the effect of HR changes is consistent across different groups of participants.
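The comparison between HR change points and annotated events can be sketched as a simple timestamp match. This is a hypothetical illustration, not the procedure used in the study, where the annotation and comparison are done manually. The probability threshold of 0.5, the tolerance of 2 seconds, the function names, and the sample data are all assumptions chosen for this example.

```python
def change_point_times(timestamps, probabilities, threshold=0.5):
    """Return timestamps whose change point probability reaches the threshold.

    The threshold is an assumed value, not one taken from the study.
    """
    return [t for t, p in zip(timestamps, probabilities) if p >= threshold]

def match_events(cp_times, events, tolerance_s=2.0):
    """Pair each change point with the nearest annotated event within the tolerance.

    events is a list of (timestamp_s, label) tuples from manual video annotation.
    A change point with no event inside the tolerance is paired with None.
    """
    matches = []
    for cp in cp_times:
        nearby = [(abs(cp - t), label) for t, label in events if abs(cp - t) <= tolerance_s]
        matches.append((cp, min(nearby)[1] if nearby else None))
    return matches

# Made-up BCP output (one probability per second) and annotated events.
times = [0, 1, 2, 3, 4]
probs = [0.1, 0.8, 0.1, 0.2, 0.9]
annotated = [(1.5, "vehicle passes on the left"), (10.0, "approaching signal")]

cps = change_point_times(times, probs)
pairs = match_events(cps, annotated)
print(pairs)
```

Change points left unmatched by such a procedure would be candidates for closer review, since not every physiological change coincides with a visible event.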

The other physiological variables selected in the case study are head movement direction, the position of the bicycle and pedestrian from Unity, gaze direction from the eye tracker, and the gaze entropy and its BCP probability from the gaze direction.

4.1. Bicycle Pilot Study

In this experiment, after familiarization with the simulator and calibration for eye tracking and steering in a training scenario, the participants are asked to cycle eastbound in the simulated environment as indicated in Figure 2(c).

Figure 5 shows one participant's physiological responses in the pilot bike experiment. Using BCP, we can detect the moments when the underlying distribution of the HR data changes within a short period of time. Figure 5(b) shows the overall time series of the different physiological data. Figure 5(b)(I) shows the HR (blue) and the probability of detected change point events (red) during the whole experiment. In addition to the HR data, Figure 5(b)(II) shows the head movement in the horizontal direction x (black) and the gaze in the horizontal direction x (green). The head movement along the x-axis indicates the direction the head is facing, from straight backward (−1) to straight forward (1). The gaze direction x indicates the gaze direction from left (−1) to right (1). Figure 5(b)(III) shows the stationary gaze entropy (cyan) and the BCP probability of SGE (red), and Figure 5(b)(IV) shows the gaze transition entropy (yellow) and the BCP probability of GTE (red). Figures 5(a) and 5(c) show the corresponding screenshots for the two HR change points detected in Figure 5(b)(I).

The first change point happens when the participant is approaching the first intersection on the road, which does not have any traffic signals; at this time, the participant is also being passed by a vehicle on the left (Figure 5(a)). Meanwhile, the other physiological signals do not show abrupt changes except for minor peaks in GTE, as shown in Figure 5(b)(IV). The second HR change point takes place when the participant is approaching the third intersection, where there is a traffic signal. While crossing the intersection, a looking-around behavior is also observed, as shown in Figure 5(c). As a result, we observe changes in both horizontal head and gaze direction (Figure 5(b)(II)), a larger variance in SGE, and change points detected in the SGE data (Figure 5(b)(III)). Similarly, we observe higher variance and more change points in GTE (Figure 5(b)(IV)). Previous research suggests that an increase in SGE associated with a higher GTE may reflect the influence of top-down interference on visual scanning, which results in a greater dispersion of gaze [92]. In other words, increased SGE together with increased GTE indicates a higher visual or cognitive load in the experiment scenario for this participant.

This case study indicates that the HR and gaze changes are sensitive to the environmental changes as well as the participant/bicyclist behaviors. It is also important to note that specific contextual factors (e.g., an intersection with or without a traffic signal) can trigger different physiological responses; therefore, it is important to collect and monitor different physiological data when conducting naturalistic or experimental studies of bicyclists.

To find the reason behind each event, the video recordings of all five participants in the case study are manually analyzed. Figure 6 illustrates when the HR and gaze (using GTE as an example) change points happen for each participant. For HR, almost all the change points take place when participants are within 15 meters of an intersection, except for participant 2. When participant 2 was passed by a vehicle at intersection 2 in very close lateral proximity, the HR went up immediately (no other participant in the pilot study had a car pass by them as closely). For gaze transition entropy, the change point generally happens earlier than the HR change point but follows a similar trend. Although the sample size is small, some of our observations from the case study include the following. (1) Among the five participants in the pilot study, there are more HR/GTE change points prior to reaching the first intersection. As it is the first intersection in the experiment, participants may feel more stress than when approaching the later intersections, by which point they have become familiar with the environment. This implies that in the early portions of VR experiments, participants still need time to adjust to the IVE, even after a training scenario before the actual experiment. (2) The change points prior to intersections 2 and 3 take place farther from the intersections than the change points detected prior to intersection 1. This could be explained by two possible factors. First, the road segment after intersection 1 has a 4% downhill slope, as shown in Figure 2(d), and the participant's field of view increases as they pass through intersection 1 and enter the downhill segment. Second, the roadway environments for intersections 2 and 3 are more complex. Intersection 2 is at the end of the downhill road segment and there is a lane shift after intersection 2; thus, braking and steering to the right are needed before participants enter intersection 2. Intersection 3, as indicated before, includes a traffic signal. Although participants are told the signal will always be green during the experiment, their physiological data (HR and gaze entropy) still showed a distinct response at this intersection.

4.2. Pedestrian Pilot Study

The pedestrian pilot study is conducted at intersection 2 in the same IVE with the pedestrian simulator, where participants can walk freely to cross a crosswalk as they would in real life. As explained before, the eastbound lane has randomly generated vehicles with different gaps. At the beginning of the pilot study, participants are asked to wait until the first vehicle passes before they cross using the crosswalk. Once the first car passes, they may cross the road whenever they feel safe.

As in the bicycle case study, we extract the physiological data with the HR change point analysis results for one of the participants, as shown in Figure 7. The definitions of the data are the same as in the bicycle case study. The first change point happened when the pedestrian noticed the first approaching vehicle, as indicated by the red circle in Figure 7(a). A larger variance in SGE (Figure 7(b)(III)) and GTE (Figure 7(b)(IV)) is observed at the same time. An increase in SGE associated with lower GTE is likely indicative of distraction (such as the first approaching vehicle in this case). The second change point happened during the crossing of the eastbound lane, just after the participant looked at the approaching vehicle in the lane (Figure 7(c)). During this change point event, only a larger variance in GTE (Figure 7(b)(IV)) is observed, while SGE remains at a low level (Figure 7(b)(III)). A reduction in SGE when GTE is increasing reflects top-down interference, whereby the viewer focuses on specific items within the visual scene. In this case, the participant is looking straight at the other side of the road after the last look at the approaching vehicle in the lane, trying to cross the crosswalk quickly. In addition, after the pedestrian starts crossing, the range of horizontal head movement is smaller than before crossing (Figure 7(b)(II)). This indicates that once pedestrians make the decision to cross, they do not observe the surroundings (e.g., incoming vehicles) as much as they do before crossing.

Table 5 shows the video annotation details for the pedestrian experiment. A total of seven HR change points are identified across the participants. There are three main categories of HR change points: two are detected when participants notice the first approaching vehicle, two are identified when participants cross the crosswalk right after the first vehicle passes, and three are detected when participants are crossing the vehicle-approaching (eastbound) lane. As in the bicycle pilot study, these change points correlate with changes in the contextual setting, such as a vehicle approaching a crosswalk. These findings indicate why it is important to collect participants' physiological responses when conducting pedestrian studies. Although our preliminary findings show a correlation between HR and gaze change points and the times at which certain events take place in the environment, analysis of a larger group of participants is needed to verify the findings.

5. Discussion and Conclusion

In this paper, we present a system architecture (ORCLSim) for VR simulators to capture physiological and behavioral changes in bicyclists and pedestrians. Specifically, this study aims to (1) identify current research gaps in physiological and behavioral research for vulnerable road users, including what metrics are needed to monitor bicyclists' and pedestrians' behavioral changes, especially in IVE simulator studies; (2) determine what devices are available and how different hardware and software packages can be integrated into IVEs to conduct similar studies; (3) show how the multimodal data can be processed for observing the changes in physiological responses given different contextual settings; and (4) showcase how the proposed framework can be implemented by presenting two case studies for bicyclists and pedestrians.

By showcasing the case studies, we aim to demonstrate that multimodal data collection with the low-cost integrated or mobile sensing devices of this framework works. The results of both case studies are in accordance with findings from previous literature. For example, in a 2019 study, the eye tracking results from a real-world experiment in Italy indicated that different intersection types (e.g., traffic signals, with different merging lanes) affect cyclists' gaze behavior as they arrive at an intersection [63]. In another example, the Federal Highway Administration (FHWA) recommends using an average walking speed of 1.2 m/s when designing crosswalks [96]. In the presented pedestrian case study, the average walking speed among the five participants is 1.16 m/s, which is close to the FHWA's recommended value. In addition to objective measurements, subjective ratings of the realism of the simulator are collected in the postexperiment survey. All five participants gave a rating of 4 or 5 on a 1–5 scale to the question “To what extent did your experiences in the virtual environment seem consistent with your real-world experiences of crossing a street?” where 4 indicates “consistent” and 5 indicates “very consistent”.

Previous studies on bicyclists' and pedestrians' responses to changes in contextual settings highlight the advantages of controlled experimentation, especially in IVEs. In this paper, we demonstrated that it is important to track physiological metrics to better understand vulnerable road user behavior in different settings. Specifically, we showcased the importance of gaze tracking and heart rate data in capturing bicyclists' and pedestrians' behavioral responses to different events and roadway contexts. These measurements may indicate the impact of stress levels and cognitive load on the way participants interact with the physical environment. In the case study, our initial findings from the five participants indicate that physiological data are sensitive to road environment changes and real-time events, especially heart rate and gaze behavior. In the presented framework, we use the Bayesian Change Point (BCP) detection method to detect abrupt changes in physiological data. First, we use the HR change point data to identify any potential events and then use annotated video data to get a better understanding of the causes behind each event. The findings are further supported by two measurements of eye tracking data: stationary gaze entropy (SGE) and gaze transition entropy (GTE). The dynamic changes in the eye tracking data also support the observations from the video annotation. For the presented bicycle case study, most change points happen prior to the intersections, and the eye tracking change points usually happen earlier than the HR change points. The increased SGE and GTE along with abrupt changes in HR indicate where the participants feel higher levels of stress in the environment, which are observed at the beginning of the experiment and when participants reach the intersection with a traffic signal. The physiological changes in the pedestrian case study are indicative of critical behavior during the crossing, such as observing the first approaching vehicle or the moment before crossing. It is worth noting that not all abrupt HR changes reflect visible behavioral changes. For instance, in the pedestrian case study, some HR change points occurred when the pedestrian noticed the first approaching vehicle from the initial position, but no behavioral changes are visible in the video. This is one reason why we need to collect physiological data as an indicator of the users' psychophysiological state. Although these preliminary findings are promising, we need to further examine whether these change points are still observed when the number of participants is increased for both case studies.

We have open-sourced the system setup document, code example, and sample dataset for the research community. The integration between the presented devices and software platforms, along with the data processing method, provides the foundation to support IVE experimental studies in which we can identify the impact of different roadway designs on bicyclist and pedestrian behavioral and physiological changes. Furthermore, this system architecture makes the development of a VR simulator simpler and more robust, since many of the modules are flexible and scalable to different systems and improvements. For example, the smartwatch system can be replaced by more recent and advanced wearable devices that can collect different data streams, or the video recording systems can integrate more event or activity detection through computer vision-based techniques.

5.1. Limitations and Future Work

While useful in addressing many of the gaps in virtual simulation research, IVEs have limitations. Many of the studies in Tables 2 and 3 indicate that portions of their subject pool’s data had to be discarded due to the motion sickness participants experienced while in VR.

Furthermore, the removal of risk within the IVE may also be perceptible to participants. IVE experimentation relies heavily on the subject's sense of immersion, and while the environment may look, feel, and behave realistically, the knowledge that one is in a risk-free virtual space, where physical injury is not possible, remains [59]. It is up to the realism of the IVE to suspend a subject's disbelief in the environment sufficiently to overcome this knowledge, and this effect varies from person to person.

With the proposed system architecture ORCLSim, future IVE research can apply any of the physiological data collection modules to other IVE simulators to study vulnerable road users' behaviors, perception abilities, and cognitive states in different contextual settings. In addition, more physiological responses, such as electrodermal activity and electromyography, may be included in the system with off-the-shelf sensors. Furthermore, other eye tracking measurements, such as the Jensen–Shannon divergence, can be applied to calculate visual scanning efficiency; this measure has been reported to be a better indicator than the SGE/GTE used in this study [97]. Future IVEs can be improved to increase immersion and tackle more complex research problems. A feature that would greatly improve the ease of development of an IVE would be the integration of development platforms such as Unity or Unreal Engine with commercially available transportation simulation software such as Synchro and Vissim. Through such integration, roadway segments, objects, vehicles, vehicle behaviors, and traffic networks could be simulated more realistically and robustly within IVEs. Furthermore, more robust platforms for integrating multiple users into IVEs would vastly improve immersion and realism.

Data Availability

The txt and video data used to support the findings of this study have been deposited in the Open Science Framework (OSF) repository (DOI: 10.17605/OSF.IO/2DARX) [87].

Disclosure

A preprint of this paper has previously been published in https://Arxiv.org [98].

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.