Abstract

Drivers successively direct their gaze to different areas to select relevant information from the traffic environment. Crash risk increases with off-road glance duration, and the increase differs between driving scenarios. This paper proposes an approach that identifies the current driving scenario and predicts the driver’s eyes-off-road duration using Hidden Markov Models (HMMs). A moving-base driving simulator study was conducted with 26 participants driving in three scenarios (urban, rural, and motorway). Three fixed occlusion durations (0 s, 1 s, and 2 s) were applied to quantify eyes-off-road durations. Participants could initiate each occlusion of the given duration by pressing a microswitch on a finger. They were instructed to occlude their vision as often as possible while still driving safely. Drivers’ visual behavior and occlusion behavior were captured and analyzed by manual frame-by-frame coding. Time series of glance duration and glance location were used as input to train the HMMs. The results showed that the current driving scenario could be identified from glance location sequences with an accuracy of up to 89.3%, and the motorway was the easiest to distinguish, with over 90% accuracy. Moreover, HMM-based algorithms fed with both glance duration and glance location sequences achieved the highest accuracy, 92.7%, in predicting the driver’s eyes-off-road duration, and higher accuracy was achieved for the longer eyes-off-road duration. These results indicate that time series of glance allocation can be used to predict driving behavior and identify the driving environment. The models developed in this study could contribute to the development of scenario-sensitive visual inattention prewarning systems.

1. Introduction

There is compelling evidence that drivers gradually become less aware of the driving situation as eyes-off-road duration increases [1]. In particular, longer eyes-off-road durations are associated with lower driving performance and a higher risk of safety-critical incidents [2]. Specifically, longer off-road glances result in larger lane deviations and slower responses to lead-vehicle braking [3] and increase the standard deviation of lane position [4]. The results of the 100-Car Naturalistic Driving Study show that the risk of crashes and near-crashes approximately doubles when the driver’s off-road glance duration exceeds 2 s during safety-critical events [5]. Drivers’ uncertainty is thought to accumulate over time while their visual attention is away from the road. With the increasing presence of in-vehicle information systems (IVIS), it is becoming more common for drivers to divert visual attention away from the road.

Because of the increasing threat to driving safety, several approaches have been developed to estimate visual inattention [6]. Glance duration has long been recognized as having a significant influence on driving performance and is expected to be a sensitive indicator of driving state. Some researchers set a distraction threshold based on the relationship between distraction severity and the driver’s eyes-off-road duration. When the driver’s eyes-off-road duration reaches the threshold, he/she is considered to be distracted [7]. However, such methods rarely distinguish whether the glanced targets are related to driving. An estimate based only on the driver’s off-road glances may therefore not reflect the actual driving state. The National Highway Traffic Safety Administration (NHTSA) set a 2-second off-road glance as one of the limiting criteria in its guidelines for acceptable visual-manual in-vehicle information systems [8]. Nevertheless, glancing away from the road for just 1 second may endanger driving safety in some conditions.

Glance location also has an impact on crash risk. Intuitively, using both the temporal and spatial dimensions of visual behavior may yield additional benefits. Some studies have focused on glance location as an indicator of visual inattention. For example, the 1.5th power of glance duration and a penalty for glance location have been used as principal characteristics of visual patterns to model driving behavior [9]. However, that approach used only the visual angle to the road center as the penalty for glance location and did not define the actual glance targets. The AttenD algorithm, with a time buffer of 2 s, was developed by Kircher et al. to detect visual distraction [10, 11]. Glance location was divided into three areas: forward road, driving-related areas, and other areas. The buffer decreases when the driver looks away from the road and increases when his/her gaze returns to the road and driving-related areas. If the buffer runs empty, the driver is classified as distracted. However, this promising approach did not take the current driving scenario into consideration. One study shows that drivers spend more time looking at the road and have a lower proportion of long off-road glances in complicated driving contexts [12]. In other words, the severity of eyes-off-road also depends on the demands of the driving context [13]. In addition, most of the proposed algorithms try to identify the “distracted” driving state once the driver is conducting additional tasks or the off-road glance duration has reached a predefined threshold. This means that warning systems usually react passively to driving performance that has already been degraded by inattention [14]. Few studies have been conducted to predict visual inattention in advance.
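The buffer idea described above can be illustrated with a short sketch. It follows only the simplified description in this paragraph; the published AttenD algorithm contains additional timing rules that are omitted here. The function name, the 0.1-s sample interval, the label strings, and the one-to-one refill rate are assumptions made for illustration.

```python
def attend_buffer(glances, dt=0.1, capacity_s=2.0):
    """Simplified time-buffer sketch (not the full AttenD algorithm).

    glances: sequence of labels, one per sample of length dt seconds.
    Returns the buffer level in seconds and a distracted flag per sample.
    """
    capacity = round(capacity_s / dt)        # buffer size counted in samples
    on_road = {"forward", "driving_related"}  # assumed label names
    level = capacity                          # the buffer starts full
    trace, distracted = [], []
    for g in glances:
        step = 1 if g in on_road else -1      # refill on road, drain off road
        level = max(0, min(capacity, level + step))
        trace.append(level * dt)
        distracted.append(level == 0)         # empty buffer means distracted
    return trace, distracted


# 2 s of off-road samples empty the buffer; 0.5 s on the road refills it partly.
trace, flags = attend_buffer(["other"] * 20 + ["forward"] * 5)
```

Counting the buffer in whole samples rather than in seconds avoids floating-point drift when the level is compared with zero.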

Driving has been recognized as a set of dynamic and complex interactions with the environment [15]. Attentive drivers should look frequently at targets relevant to driving to maintain good situational awareness. When and where to look is determined not only by the driver’s subgoals but also by the stimulus properties of objects in the driving scenario [16]. For instance, driving on an urban road requires drivers to update their information about parked cars, intersections, pedestrians, cyclists, traffic lights, and other surrounding traffic, while on a motorway drivers only need to monitor driving speed, lane position, and surrounding cars. How often and how long they need to look is related to innate characteristics of the contextual factors, such as information decay speed. Meanwhile, different contextual targets require different amounts of attention. That is, the driving scenario has a significant impact on drivers’ visual behavior. Thus, drivers may show different glance strategies in different scenarios, which could in turn be used to distinguish driving scenarios. Fridman demonstrated that visual behavior could be used to predict the driving environment with an HMM from just 6 seconds of the 100-car naturalistic driving data [17].

Driver behavior is a continuous process, and the current state depends on the previous state. Each state has a probability distribution, so the process can be represented as a Markov process. The Hidden Markov Model (HMM) is a probabilistic sequence classifier that can give a probabilistic prediction of future driving state based on past driving behavior. It has been used extensively to recognize drivers’ intentions and predict driving behavior. Mauricio et al. applied HMMs to distinguish glance patterns in manual radio tuning, voice-based radio tuning, and driving without radio tuning. Their results suggested that differences in glance allocation strategies serve as an effective predictor of driving state [18]. Khaisongkram et al. described steering behaviors as a dynamic sequence model and applied the HMM to recognize and predict steering behaviors [19]. Beyond that, the HMM has also been used to predict a driver’s future road segments [20]. In addition, John employed a discrete Markov chain representation to predict a vehicle’s near-term future route based on its near-term past route and trained the model with the vehicle’s long-term trip history from GPS data. It was reported that the next road segment could be predicted with an accuracy of 90% [21].

Visual behavior has been shown to be directly associated with driving state and can be used to predict driving state and driving scenario within a short time period. Therefore, it is plausible to expect that drivers’ eyes-off-road duration could be predicted from visual behavior characteristics in different driving scenarios by applying HMMs. Instead of merely detecting the distracted driving state, this study proposes a proactive approach that predicts the forthcoming off-road glance duration before critical events occur. It takes driving scenarios into account, which may improve the context sensitivity of visual inattention warning systems.

2. Materials and Methods

2.1. Participants and Apparatuses

A total of 26 participants between the ages of 22 and 45 years (M = 31.4; SD = 5.7 years) with 6–13 years of driving experience (M = 8.4; SD = 2.5) were recruited from a database of interested participants. Inclusion criteria were a valid driver’s license, no motion sickness, and normal or corrected-to-normal vision. All participants were compensated with 200 RMB for their efforts.

The experiment was conducted in a high-fidelity motion-base driving simulator with six degrees of freedom (Figure 1(a)). The visual system was a high-definition ring-screen projection providing a 180° × 30° (horizontal × vertical) visual field. The cabin was adapted from an original Besturn B50 car and used a high-precision vehicle dynamics simulation model. The driving simulator was equipped with an automatic gearbox, so participants only needed to operate the gas pedal, steering wheel, and brake pedal. The CAN bus communication and electrical system of the original car were connected to the driving simulator.

The participants’ eyes-off-road duration was manipulated by a semi-self-paced vision occlusion method. PLATO (portable liquid-crystal apparatus for tachistoscopic occlusion) goggles were used to achieve occlusion (Figure 1(b)). The default state of the goggles was open. Participants could occlude their vision for a predefined duration by pressing a button attached to their finger. They were told the upcoming occlusion duration by the experimenter during the neutral phase.

Eye glance data were recorded with a head-mounted eye tracker (Dikablis eye-tracking system V3.0, Ergoneers GmbH, Germany) (Figure 1(c)). The eye tracker measures eye movements at 60 Hz and displays them superimposed on a scene-view video recorded by a camera on the front of the eye tracker.

2.2. Experiment Design and Driving Task

The experiment followed a 3 (scenario) × 3 (occlusion duration) within-subjects design. Urban road, rural road, and motorway were chosen as the three tested scenarios. The speed limits for the three road types were 60 km/h, 90 km/h, and 120 km/h, respectively. The urban road contained intersections, pedestrians, parked cars, dense oncoming vehicles, and bus stops. In contrast, the motorway had curves but no interacting traffic other than occasional oncoming vehicles in the opposite lanes, while the rural road had moderate oncoming traffic, traffic signs, and sharp curves. Thus, the urban road was regarded as the most demanding, followed by the rural road, and the motorway was the least demanding. The three predefined occlusion durations were 0 s, 1 s, and 2 s. Each participant completed all 9 combinations, henceforth called subtrials, whose order was counterbalanced across participants to avoid learning effects. Each subtrial was about 5 minutes long, and consecutive subtrials were connected by a 200-meter-long neutral phase. During the neutral phase, participants did not need to occlude their vision, but a countdown timer was displayed at the bottom of the front display to notify them of the start of the next subtrial.

2.3. Procedure

On arriving at the simulator lab, each participant was given instructions about the experiment and then signed an informed consent document. After that, the participant sat in the simulator and put on the PLATO goggles, the microswitch for occlusion, and the eye tracker, which was calibrated for each participant. The participant then practiced operating the driving simulator for approximately 5 minutes until he/she had become used to the occlusion durations in the three scenarios. Following the practice drive, participants completed all nine subtrials, which took approximately one hour. Throughout the experiment, participants were instructed to follow the speed limit in each scenario. Furthermore, they were told to close the goggles whenever they felt they did not need visual information, but that safe driving and obeying traffic regulations had priority.

2.4. Analysis

Vehicle movement data and scenario parameters were logged at 10 Hz directly from the driving simulator and were time-synchronized with the eye-movement data and occlusion behavior data. Data from three participants were excluded from the analyses because of poor eye-tracking quality and data logging problems. Only the subtrials were analyzed; all neutral phases were excluded.

Visual behavior was analyzed only for the 10-second segment before each occlusion. The recorded video stream was analyzed manually, frame by frame, to encode all glances of each subtrial. During the encoding process, one occlusion case and five areas of interest (AOIs) were defined: “Occlusion,” “Forward,” “Oncoming” (oncoming vehicles), “Speedometer,” “Mirror,” and “Others.” By definition, there was no occlusion in the baseline condition. “Others” refers to intersections, pedestrians, parked cars, traffic signs, curves, bus stops, and other areas not related to current driving. Each AOI was set individually per participant based on visual inspection of the gaze video in order to eliminate logging inaccuracy. Eye movements were analyzed in accordance with the ISO metrics number of glances and glance duration. A glance is defined as the maintaining of the gaze within an AOI; it lasts from the moment the gaze moves towards an AOI to the moment it moves away from that AOI [22].

Glance transition here refers to two things: glance location transition and glance state transition. A glance location transition is a shift of the driver’s glance from one AOI to another AOI in the spatial domain [23]. It contains only information about changes in glance location, without information about glance duration. That is, it can model where participants looked but not how long they looked in an individual glance. A glance state transition is the change in glance state from one moment to the next in the time series obtained from the frame-by-frame analysis. It is suitable for modeling both glance location and glance duration. In the present research, each 10-second observation sequence was discretized into 100 state samples (spaced 0.1 seconds apart) to model glance duration, because the lowest sampling resolution in the video software was 10 Hz. The glance state therefore describes where the driver looked during a 0.1-second interval, and a glance state transition is the change in glance location from time x to time x + 0.1 s. As a result, the glance state sequence contains a large proportion of self-transitions. Both glance transition probability matrices in the study had 6 × 6 = 36 elements, as 6 glance targets were defined. Both were computed to model the characteristics of visual behavior for scenario identification and eyes-off-road duration prediction.
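The difference between the two kinds of transition matrix can be made concrete with a small sketch. It assumes a sequence already coded as integer AOI indices at 10 Hz; the function name, the index order of the AOIs, and the example sequence are illustrative assumptions, not the study’s actual coding or data.

```python
import numpy as np

# Assumed index order for the six glance targets named in the text.
AOIS = ["Occlusion", "Forward", "Oncoming", "Speedometer", "Mirror", "Others"]


def transition_matrix(seq, n=len(AOIS), drop_self=False):
    """Row-normalised transition probabilities from a coded glance sequence.

    drop_self=False counts every 0.1-s step (glance state transitions).
    drop_self=True ignores self-transitions (glance location transitions).
    """
    counts = np.zeros((n, n))
    for a, b in zip(seq[:-1], seq[1:]):
        if drop_self and a == b:
            continue
        counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows of AOIs that were never left stay all zero instead of dividing by 0.
    return np.divide(counts, row_sums, out=np.zeros_like(counts),
                     where=row_sums > 0)


# Made-up 10-s sequence: 6 s Forward, 1 s Speedometer, 3 s Forward.
seq = [1] * 60 + [3] * 10 + [1] * 30
state = transition_matrix(seq)                      # dominated by the diagonal
location = transition_matrix(seq, drop_self=True)   # only AOI-to-AOI shifts
```

In the state matrix the diagonal entries carry the duration information: the longer a glance lasts, the closer the self-transition probability of that AOI is to 1.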

3. Dataset and Modeling

The driver’s visual field was divided into 6 AOIs, so each sample of a 10-second gaze sequence could take one of 6 values. Each value in the observation sequence was treated as a discrete variable in the HMM. Both scenario identification and eyes-off-road prediction were modeled as three-class classification problems, as three driving scenarios and three eyes-off-road durations were defined.

In the occlusion conditions, the 10 s before each occlusion were extracted as observation sequences. Observation sequences in the baseline condition were extracted at the road locations where occlusions occurred in the occlusion 1-s and occlusion 2-s conditions; that is, the baseline sequences were taken from the same locations as in the two occlusion conditions. A total of 3000, 2215, and 1425 observation sequences were extracted from the baseline, occlusion 1-s, and occlusion 2-s conditions, respectively, giving 6640 observation sequences in all. After the 6640 observation sequences were sorted by driving scenario, a total of 1587, 1725, and 2162 observation sequences were obtained for urban road, rural road, and motorway, respectively.

The final dataset was divided into a training dataset and a verification dataset at a ratio of 9:1. That is, 90% of the final dataset was randomly selected to train the models, and the remaining 10% was used to verify their recognition performance. Concretely, 90% of the observation sequences from each of the baseline, occlusion 1 s, and occlusion 2 s conditions were selected as the training dataset for identifying the current driving scenario, and 90% of the observation sequences from each of urban road, rural road, and motorway were chosen as the training dataset for predicting eyes-off-road durations. In this way, the influence of other variables on the accuracy of the identification and prediction models arising from random data selection could be reduced.
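A per-group 9:1 split of this kind can be sketched as follows. This is an illustration under assumptions, not the authors’ procedure: the function name, the fixed random seed, the rounding rule, and the toy group sizes are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(groups, train_frac=0.9, seed=0):
    """Split sequence indices 9:1 separately within each group label.

    groups: one group label per observation sequence (e.g. the occlusion
    condition when building the scenario-identification training set).
    """
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for idx, label in enumerate(groups):
        by_group[label].append(idx)
    train, held_out = [], []
    for indices in by_group.values():
        rng.shuffle(indices)                      # random selection per group
        cut = round(train_frac * len(indices))
        train.extend(indices[:cut])
        held_out.extend(indices[cut:])
    return sorted(train), sorted(held_out)


# Toy group sizes chosen for illustration only.
groups = ["baseline"] * 30 + ["occlusion_1s"] * 20 + ["occlusion_2s"] * 10
train_idx, verify_idx = stratified_split(groups)
```

Splitting within each group keeps the proportion of each condition the same in the training and verification sets.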

Because the distribution of visual attention transition probabilities under different driving conditions was unknown, it was difficult to assume an a priori temporal and spatial structure for the driver’s visual behavior. Therefore, the HMM inference method was used to model drivers’ visual behavior. That is, no restriction was placed on the participants’ glance transition sequences in specific driving conditions; instead, the models were allowed to learn any visual behavior pattern during training. In the present research, one HMM was trained per condition: three HMMs were established for driving scenario identification, and three HMMs were trained for eyes-off-road duration prediction. The detailed process is shown in Figure 2.

The trained models for driving scenario identification and eyes-off-road duration prediction classified each 10-s observation sequence by maximum likelihood, based on the log-likelihood of the sequence under each HMM. Each model contained three HMM classifiers (one per driving scenario for scenario identification, and one per eyes-off-road duration for duration prediction). When a new observation sequence was presented to the classifiers, the log-likelihood under each classifier was calculated, and the class whose HMM gave the maximum log-likelihood was returned. Thus, the returned class corresponded to the HMM that best explained the new observation sequence.
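The maximum-log-likelihood decision rule can be sketched with the scaled forward algorithm for a discrete-emission HMM. Training is not shown. The two-hidden-state structure and all probabilities below are hand-set toy values chosen for illustration; they are not the trained parameters of this study, and the function names are hypothetical.

```python
import numpy as np


def log_likelihood(obs, start, trans, emit):
    """log P(obs | HMM) via the scaled forward algorithm."""
    alpha = start * emit[:, obs[0]]
    scale = alpha.sum()
    loglik = np.log(scale)
    alpha = alpha / scale
    for symbol in obs[1:]:
        alpha = (alpha @ trans) * emit[:, symbol]
        scale = alpha.sum()          # rescaling prevents numerical underflow
        loglik += np.log(scale)
        alpha = alpha / scale
    return float(loglik)


def classify(obs, models):
    """Return the label of the HMM with the highest log-likelihood."""
    scores = {name: log_likelihood(obs, *p) for name, p in models.items()}
    return max(scores, key=scores.get), scores


def toy_model(favoured_symbol):
    """Hand-set 2-state HMM over 6 AOI symbols (illustrative values only)."""
    start = np.array([0.8, 0.2])
    trans = np.array([[0.9, 0.1], [0.5, 0.5]])
    emit = np.full((2, 6), 0.02)
    emit[0, 1] = 0.90                 # state 0 mostly emits "Forward" (1)
    emit[1, favoured_symbol] = 0.90   # state 1 mostly emits one other AOI
    return start, trans, emit


# Symbols: 0 Occlusion, 1 Forward, 2 Oncoming, 3 Speedometer, 4 Mirror, 5 Others
models = {"urban": toy_model(5), "rural": toy_model(2), "motorway": toy_model(3)}
obs = [1] * 6 + [3] * 2 + [1] * 2     # Forward, a Speedometer glance, Forward
label, scores = classify(obs, models)
```

With these toy parameters the sequence containing a speedometer glance is assigned to the “motorway” model, because only that model gives the Speedometer symbol a high emission probability.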

4. Results

The number of occlusions and the number of glances to each AOI were first analyzed to characterize occlusion behavior and visual behavior per scenario and occlusion duration. Two-way repeated-measures analyses of variance identified significant main effects of scenario (F(2, 44) = 26.7, p < .05) and occlusion duration (F(1, 22) = 19.5, p < .05) on the number of occlusions. As can be seen from Table 1, the number of occlusions decreased as the demand of the driving scenario increased, and the variation in the number of occlusions was larger in less demanding scenarios. In addition, there were fewer occlusions in the occlusion 2 s condition than in the occlusion 1 s condition.

Histograms of the total number of glances to the different AOIs, divided by driving condition (baseline, occlusion 1 s, and occlusion 2 s) for each scenario, are displayed in Figure 3. As expected, drivers looked at “Forward” most often in all three driving scenarios. A repeated-measures analysis was conducted for the number of glances towards “Forward” with the factors driving scenario and occlusion duration. Participants glanced more frequently at “Forward” on the motorway (F(2, 44) = 20.1; p < .05). Corresponding analyses of variance were also conducted for the number of glances to “Oncoming,” “Speedometer,” “Mirror,” and “Others.” Participants directed their gaze to “Others” more often on the urban road and to “Oncoming” more often on the rural road. Notably, attention was shifted to the “Speedometer” more frequently in the occlusion conditions, and most frequently in the occlusion 2 s condition (F(2, 44) = 15.6; p < .05). “Speedometer” also received more glances in scenarios with higher speed (F(2, 44) = 13.7; p < .05).

Glance transition probabilities between the gaze targets in the 6 defined AOIs are presented in Figure 4, which provides a visualization of the driving scenario and eyes-off-road duration classification problems. The three subfigures in the top row illustrate the glance location transition matrices for urban road, rural road, and motorway, respectively. As expected, participants shifted their attention from “Forward” and “Oncoming” to “Others” on the urban road. In contrast, the spatial distribution of visual attention on the rural road shows that glance location changed more frequently from “Forward,” “Speedometer,” and “Others” to “Oncoming.” On the motorway, there was a high percentage of glance transitions from “Occlusion” and “Forward” to “Speedometer.” The three subfigures in the bottom row describe the glance state transition matrices (including both glance location and duration transition characteristics) for the three eyes-off-road durations. Participants glanced at “Forward” for longer in the baseline condition. Except for “Speedometer,” the self-transition probabilities of all AOIs increased with increasing occlusion duration. These matrices capture the aggregate differences in glance transitions when driving with different eyes-off-road durations in different driving scenarios, and these discriminating characteristics could be used to identify driving scenarios and predict eyes-off-road durations.

The performance of the established models is reported as accuracy, the percentage of correctly classified sequences. Table 2 shows the classification results (average accuracy) for driving scenario identification and eyes-off-road duration prediction with different visual behavior features as input. “Combination” denotes the combination of glance location and glance duration features. As can be seen in Table 2, the HMMs performed well both in distinguishing urban road, rural road, and motorway and in predicting drivers’ eyes-off-road durations. As expected, the trained models achieved the highest accuracy when the combination of glance location and duration was used as input. Over 90% accuracy was achieved in both distinguishing driving scenarios and predicting eyes-off-road duration with the combination features. This strongly supports the idea that driving scenarios and driving behavior modality may be characterized by where drivers direct their gaze and how long they decide to allocate attention.

Furthermore, the HMM fed with glance location sequences alone also performed strongly, with an accuracy of more than 85%, well above that of a random classifier. In particular, for driving scenario identification, the model using glance location features performed almost as well as the model using the combination features, which implies that adding the glance duration sequence improved performance only slightly. Therefore, sequences of glance location were suitable for identifying the current driving scenario, whereas the combination of glance location and glance duration sequences was the best feature for predicting the driver’s eyes-off-road duration.

In addition, confusion matrices were calculated for each classifier during the validation procedure to explore the origin of the misclassifications. As shown in Figure 5, the motorway was the easiest scenario to recognize. Moreover, urban road was also clearly distinguishable from motorway, which suggests that visual behavior on the urban road was markedly different from that on the motorway, although a few cases were still confused between these two scenarios. The main difficulty in driving scenario classification stemmed from distinguishing the rural road from the other two scenarios. In particular, about 18.5% of rural road cases were misclassified as urban road.
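The way a confusion matrix exposes the origin of misclassifications can be illustrated with a short sketch. The labels below are made-up toy values, not the study’s results, and the function names are hypothetical.

```python
import numpy as np


def confusion_matrix(true, pred, classes):
    """Counts with true classes in rows and predicted classes in columns."""
    index = {c: i for i, c in enumerate(classes)}
    cm = np.zeros((len(classes), len(classes)), dtype=int)
    for t, p in zip(true, pred):
        cm[index[t], index[p]] += 1
    return cm


def row_normalise(cm):
    """Share of each true class assigned to each predicted class.

    Assumes every class occurs at least once among the true labels.
    """
    return cm / cm.sum(axis=1, keepdims=True)


classes = ["urban", "rural", "motorway"]
true = ["urban", "urban", "rural", "rural", "rural", "rural", "motorway", "motorway"]
pred = ["urban", "rural", "urban", "rural", "rural", "rural", "motorway", "motorway"]

cm = confusion_matrix(true, pred, classes)
accuracy = np.trace(cm) / cm.sum()   # overall share of correct sequences
shares = row_normalise(cm)           # diagonal holds the per-class accuracy
```

Off-diagonal entries of the row-normalised matrix correspond to statements such as “x% of rural road cases were classified as urban road.”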

Figure 6 illustrates the confusion matrices for eyes-off-road duration prediction using the combination of glance location and duration sequences. Baseline driving was clearly distinguishable among the three conditions, with an accuracy of over 90%. Occlusion 1 s and occlusion 2 s could be predicted with accuracies of 74.7% and 88.2%, respectively. Notably, 7.7% of baseline cases and 9.7% of occlusion 2 s cases were classified as occlusion 1 s, and 18.1% of occlusion 1 s cases were misclassified as occlusion 2 s, so that only 74.7% of occlusion 1 s cases were recognized correctly. This indicates that the main source of eyes-off-road misprediction is confusion between the two occlusion conditions. Nevertheless, the HMM could be used to predict drivers’ eyes-off-road durations over 1 s.

5. Discussion

The current study explored the prediction of drivers’ eyes-off-road duration in different driving scenarios by building Hidden Markov Models (HMMs) from drivers’ visual behavior. It was based on the assumption that drivers adopt different visual strategies in different scenarios [24]. The observed occlusion behavior was well in line with the findings of Kujala et al. [25], and the results showed that HMMs could be used to predict driving state from drivers’ visual behavior sequences [26]. Glance location sequence features yielded high performance in driving scenario identification. This implies that differences in glance location sequences are effective for driving state recognition and could be used to identify the current driving scenario. For predicting drivers’ eyes-off-road duration, the trained models achieved the best accuracy when the combination of glance location and glance duration sequences was used as the input of the HMMs. This may be because drivers’ eyes-off-road duration is related to both where and how long drivers direct their gaze.

Confusion matrices were calculated for each classifier to find the origin of the misclassifications [18]. They showed some misclassifications in rural road identification. This may be because the contextual factors for visual attention on the rural road were similar to those in the other two scenarios, so participants employed similar glance patterns. Concretely, participants usually shifted attention to the forward road and oncoming vehicles after looking at other targets on both the rural road and the urban road. Also, participants allocated their attention to the speedometer and forward road in a similar way when driving on the rural road and the motorway. In addition, occlusion 1 s was frequently misclassified as occlusion 2 s. This may be because participants allocated their attention so as to sample enough information to prepare for the upcoming occlusion [27] and therefore adopted similar visual search patterns when driving with occlusion 1 s and occlusion 2 s.

The findings of this study may contribute to the development of proactive distraction mitigation systems that recognize the current driving scenario and predict driver distraction. If such a system predicts that the driver will shift his/her attention away from the road for longer than a predefined threshold for the current driving scenario, a suitable warning can be provided.

There are some limitations that need to be considered in this study. First, the experiment employed the occlusion method to simulate drivers’ eyes-off-road durations in a driving simulator. The results may therefore not be directly comparable to on-road driving with actual additional tasks, because of the lack of any actual crash risk and of peripheral vision [28], which may affect drivers’ visual behavior. It is well known that drivers’ glance behavior differs strongly between situations [29]. Given the artificial simulated scenarios and the rather limited number of participants, the validity of the presented results in other situations should be examined, as should the extent to which the trained HMMs hold for larger naturalistic datasets. Second, the time window was set to 10 s, which is rather long for identifying driving state in real time. Efforts should be made to shorten the time window in order to distinguish the driving scenario and predict drivers’ eyes-off-road duration in a timely manner. Alternatively, sliding-window-based algorithms could be employed to integrate glance history with glance duration and location to boost classification performance. Third, the HMM was the only technique used in this study, and no comparison with other techniques was made. Higher performance may be achieved with alternative techniques. Future work should compare this modeling technique with other machine learning algorithms such as Recurrent Neural Networks (RNN), Random Forest, Dynamic Bayes Classifiers, and Decision Trees. In the future, the system could be extended with GPS and vehicle sensor data, which may improve scenario identification performance.
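As a rough illustration of the sliding-window idea mentioned above, overlapping windows can be cut from a continuous 10 Hz glance stream so that a classifier is re-evaluated at every step rather than once per fixed segment. The function name, the 100-sample window, and the 1-s step are assumptions made for this sketch, not part of the present study.

```python
def sliding_windows(samples, window=100, step=10):
    """Yield (start index, window) pairs over a coded glance stream.

    With 10 Hz samples, window=100 covers 10 s and step=10 advances by 1 s.
    """
    for start in range(0, len(samples) - window + 1, step):
        yield start, samples[start:start + window]


# A 12-s stream of coded samples gives windows starting at 0 s, 1 s, and 2 s.
stream = list(range(120))
windows = list(sliding_windows(stream))
```

A shorter window value would reduce the delay before the first prediction, at the cost of less glance history per decision.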

6. Conclusions

The present study applied a Hidden Markov Model-based framework to identify driving scenarios and predict drivers’ eyes-off-road durations using sequences of glance location and duration in a driving simulator. The models developed in this study performed well in driving scenario identification with glance location sequences as input, and using both glance location and glance duration sequences as the input of the HMMs achieved the highest accuracy in predicting drivers’ eyes-off-road durations. Furthermore, the motorway was the easiest to distinguish from the other two driving scenarios, whereas more than 10% of cases were confused between urban road and rural road. Drivers’ eyes-off-road durations over 1 s could be accurately distinguished from baseline driving, although 18.1% of occlusion 1 s cases were misrecognized as occlusion 2 s. This study suggests that both glance duration and glance location can be used to reflect driving state and that HMMs are useful in driving behavior prediction. The findings therefore have practical implications for developing proactive driving distraction warning systems and may contribute to improving driving safety.

Data Availability

The original data and data analysis programs used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research is financially supported by the National Natural Science Foundation of China (no. 61473046).