Abstract

Road safety has recently been considered an important issue in the country, and single-vehicle accident statistics show its importance. From a safety viewpoint, drivers need a reasonable time window for hazard recognition and reaction; therefore, the hazard has to be in sight from a distance preferably longer than the standard minimum stopping sight distance. Nevertheless, if the roadside configuration keeps the hazard in sight over a very long distance, it is the hazard properties that define visibility. The hazard size, color, and mobility are some of the most important hazard properties, and they may interact mainly with ambient light (day or night) and driving speed. In this research, the effect of hazard properties on accident likelihood was investigated under conditions in which a sufficient recognition and reaction time window was available to the driver, so as to provide a ceteris paribus experiment. To achieve this under safe experimental conditions, a driving simulator was used to test the behavior of 90 licensed drivers encountering an average of 14 hazards with various sets of properties. Based on the findings of this research, there are some interactions between influential hazard properties. The results imply that an accident is approximately 23% more likely when encountering a dark, small, stationary hazard at nighttime, such as a dark-colored object with an observed size of 0.5 m × 0.5 m (e.g., a stone), than when encountering a large, moving, light-colored hazard in the daytime, such as a camel 1.5 m × 2 m in size. A green-colored hazard is 27% less likely to be involved in an accident at nighttime than hazards of other colors. Each 10 km/h speed increment leads to a 1.9% higher accident likelihood, and every time the driver encounters a hazard, they will be 0.84% less likely to crash the next time.

1. Introduction

The proliferation of road accidents has made the safety of any newly planned road one of the most important design criteria. The statistics on single-car accidents suggest the significance of this type of road accident. Research has shown that, in 2006, 883 accidents resulted in deaths out of 82,343 road accidents registered by the police in Iran, of which 43.17% were single-car accidents, accounting for 55.4% of all road deaths in Iran [1]. Car overturning, pedestrian-vehicle, animal-vehicle, and stationary object-vehicle accidents are some instances of single-vehicle road accidents. This type of accident occurs as a result of an error in the interaction of the road and the environment with the driver’s limited capacity for hazard recognition [2].

In the road design procedure, to account for the interaction between drivers’ recognition abilities and hazards and to make some error mitigation possible, a perception-reaction time (PRT) is used to calculate a sufficient stopping sight distance. This time is defined as the period during which the driver detects the hazard, perceives the danger (perception time), concludes that there is a need to stop, and finally decides to press the brake (reaction time). This period runs from the moment the object first becomes visible until the driver decides to brake [3]. Some of the factors that most influence driver behavior, and subsequently the length of the PRT, are the driver’s anticipation of impending hazards, the driver’s age and sex, driving workload, and driving emergencies [4].

According to NCHRP Report 600 on human factors in highway systems, factors that influence the PRT include the color contrast between the hazard and the environment, light glare, the driver’s anticipations, road visual complexity, the driver’s experience and familiarity with the road, the driver’s age, and the complexity of the hazardous situation [5]. Although these factors are deemed to affect all types of hazardous situations on roads, including non-vehicular hazards, little is known about non-vehicular hazards themselves. In other words, the influence of the characteristics of an unanticipated non-vehicular hazard, and how these characteristics interact, still need to be investigated. Thus, in this study, we aim to examine these hazards and their characteristics more closely.

Regarding the tools used to investigate influential factors in road safety, researchers have exploited several mathematical and statistical models, choosing different methods for different aspects of interest. Probit and binomial logit models have frequently been used to study the severity of accidents [6–8]. For severity models, more sophisticated variants of logit and probit, such as the “mixed logit model” and the “random parameters bivariate ordered probit,” have proven useful in the literature [9, 10]. When accident counts are of interest, generalized linear regression models, particularly the Poisson and negative binomial models, are widely used because they are designed for count data [11]. Some studies concern the probability of accidents, and several approaches have been employed in these cases. Sometimes accident prediction indices are developed, and sometimes models with a probabilistic output, such as the logit model, are used [12]. In other studies, descriptive statistical methods are used to identify the influence of human factors on accident probability [13]. Furthermore, analytical methods such as Gaussian models are used to illuminate the effect of spatiotemporal variables on accident probability [14]. Where highly accurate prediction is needed, specialized models such as neural networks have helped researchers [15]. Owing to their theoretical basis and the binomial nature of their dependent variable, binomial logit models are suitable for modelling probabilistic two-state phenomena, such as whether an accident is observed or not [16].

According to the aforementioned studies, several groups of influencing factors have been investigated, and different results and conclusions have been reached. Some instances of factors affecting the count and likelihood of accidents are as follows: Das et al. found the driver’s behavior to be influential and claimed that “accident-prone” drivers were more likely than others to be involved in an accident, particularly when the driver violated driving laws [17]. Theofilatos studied real-time traffic and weather conditions and found these factors to have a considerable influence on accident likelihood and severity [18]. In addition to weather and traffic conditions, according to Ahmed’s study, the geometric design of roads is associated with accident occurrence [19].

Researchers have made much effort to understand the factors affecting accident likelihood. However, there are still some questions to ask, such as “Do the properties of hazards (such as pedestrians, animals, and fixed objects) have any significant and distinguishable effect on the driver’s behavior and accident likelihood?” and “Are properties such as mobility, size, and color influential?”

Studying human behavior using field data is difficult, since this type of data offers limited evidence on the details of the driver’s reaction, recognition, and decision-making when encountering a hazard. In this regard, the driving simulator is of considerable help. Simulators are a branch of virtual reality that place the user in a virtual environment and make it feel like a real one [20]. By putting the driver in a simulated environment, a driving simulator provides controlled laboratory conditions for experiments and a safe setting for experimental studies of human factors in transportation safety [20–23].

According to previous studies, driving simulators are mainly used in studies on the driver’s behavior. Abdel-Aty, using a simulator, found that the use of variable speed limits and variable message signs affects speed choice and speed dispersion and leads to more uniform speed choices on the road [24]. Bella, using a driving simulator, showed that the driver’s behavior was affected only by cross-sections and geometric elements, not by roadside configurations. Although the presence of trees along the road is a factor that increases the severity of run-off-road accidents, drivers do not change their behavior when barriers are not present [25].

Calvi, in a before-after study using a driving simulator, demonstrated the effectiveness of perceptual treatments, especially red PTB, in enticing drivers to reduce their speed while approaching or driving through the sharp curve under study [26]. In addition, a recent study shows that most indicators are valid in driving simulator research on driving behavior in work zones. For example, spot speed, car-following distance, headway, and reaction delay time show absolute validity [27].

To answer the questions about the effect of hazard properties on the driver’s behavior in this research, real drivers’ reactions to non-vehicle road hazards have been investigated in a simulated virtual environment. A driving simulator was used to provide a controlled experiment condition as well as precise and detailed hazard engagement and reaction data. Statistical methods like regression analysis were then used to extract information from the data and answers to the research questions.

2. Materials and Methods

In this research, the situation of encountering some non-motor-vehicle road hazards was prepared in a driving simulator to study the effect of their properties on the driver’s behavior. Then, the effects were investigated using mathematical modelling methods.

2.1. Apparatus

This research employed a fixed-base car driving simulator, including the front half of a Saipa Pride, which is a popular car commonly used in Iran; the simulator consisted of all in-car equipment and three monitors with an intermediate angle of 135 degrees. The experiments were conducted in a quiet, controlled room with fixed temperature and illumination conditions. Figure 1 shows a view of the driving simulator and a driver taking the test.

2.2. Scene and Scenario

The test route was a two-way three-lane separated straight road with standard 3.6 m wide lanes on level terrain, with a view of soil-covered roadsides with thin vegetation and some dispersed trees, hills, and traffic signs. The road was paved with asphalt and separated from the opposing side of the road by a guardrail-protected median. Light posts were placed in the median.

A total of 60 different hazard engagements in 4 separate test scenarios were prepared. For each of the 4 scenarios, Table 1 lists the number of hazard engagements, the length of the route, and whether it was day or night. An average of 14 hazard engagements was due in every scenario. At the beginning of each scenario, there was a 5 km warm-up section with no engagements so that the participant could prepare for and adapt to the virtual reality environment. The drivers were told that they might face some unknown hazards on the road, as in the real world, and so had to be ready to react.

The hazards in the experiment had differences in terms of properties such as size, color, and mobility. These hazards included some adult and child pedestrians, some animals like camels, cows, and cats, and some fixed hazards such as rocks.

2.3. Participants

To choose test-takers, a number of licensed drivers over 18 years old from different occupational backgrounds, such as scholars, professional drivers, engineers, other freelancers, and novice drivers, were invited, of whom 90 individuals agreed to participate in the experiment. Tables A-1 to A-4 in the Appendix (Supplementary Materials) show the distribution of the participants’ age, sex, marital status, and driving experience. Since the focus variables of this study are related to the hazards’ characteristics, the participants’ characteristics were not influential in choosing the number of participants. Accordingly, 90 participants were considered sufficient, since the number of participants in similar simulator studies in the literature has rarely exceeded 50 [21, 27, 28].

3. Linear Probability Model (LPM)

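In a linear regression model, if a binary variable is set as the dependent variable, the model outcome is the probability of observing a “1” (the event of success) for given values of the independent variables. In such a case, equation (1) holds, where $y$ is the binary dependent variable, $x_1, \ldots, x_k$ are the independent variables, and $\beta_0, \ldots, \beta_k$ are the parameters:

$$P(y = 1 \mid x_1, \ldots, x_k) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k \qquad (1)$$

In the LPM, the parameters (variable coefficients) show the change in the success probability for a one-unit change in the corresponding independent variable, with the other variables held fixed, as can be seen in the following equation:

$$\Delta P(y = 1 \mid x_1, \ldots, x_k) = \beta_j \, \Delta x_j$$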

The LPM output is a continuous probability value, which sometimes falls below zero or above one. Furthermore, a binary discrete output may sometimes be desired. In that case, a threshold such as the average of the estimated outputs is chosen; estimates greater than this threshold are treated as a 1 (success), and those smaller than it are treated as a 0 (failure).
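The fitting and discretization steps described above can be sketched in a few lines. This is a minimal illustration with a single regressor and made-up data; the function names `fit_lpm` and `discretize` and all numbers are assumptions introduced here and are not taken from the study.

```python
# Toy linear probability model (LPM): OLS with one regressor and a binary outcome.
# All data below are made up for illustration; they are not the study's data.

def fit_lpm(x, y):
    """Return (intercept, slope) of the OLS line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

def discretize(probabilities, threshold):
    """Map fitted probabilities to 1 (accident) or 0 (no accident)."""
    return [1 if p > threshold else 0 for p in probabilities]

# Hypothetical speeds (km/h) and accident indicators.
speed = [40, 50, 60, 70, 80, 90, 100, 110]
accident = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_lpm(speed, accident)
fitted = [b0 + b1 * s for s in speed]      # may fall outside [0, 1]
threshold = sum(fitted) / len(fitted)      # mean of the fitted values
predicted = discretize(fitted, threshold)
correct_ratio = sum(p == a for p, a in zip(predicted, accident)) / len(accident)

print(f"slope per km/h: {b1:.4f}")
print(f"fitted range: {min(fitted):.3f} to {max(fitted):.3f}")
print(f"ratio of correct predictions: {correct_ratio:.3f}")
```

With these toy numbers the lowest and highest fitted values fall slightly outside the interval from 0 to 1, which is the behavior of the LPM noted in the text.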

Before interpreting the LPM, it is necessary to confirm the Gauss–Markov assumptions of multiple linear regression, which consist of five basic assumptions, namely, linearity in parameters, random sampling, zero conditional mean of the error term, no multicollinearity, and no heteroskedasticity. If these assumptions are confirmed, the model is assumed to be unbiased and can be interpreted.

4. Model Estimation

The effects of the hazards’ properties were investigated by mathematical modelling using experiment data, and the research hypotheses and main questions were addressed.

4.1. Variables Definition and Description

To investigate the effect of hazard properties on accident likelihood, a binary variable named “accident” was used as the dependent variable, which took a 1 when an accident was observed and a 0 otherwise. The independent variables used in the modelling process are defined and described in Table 2.

4.2. Linear Probability Regression Model (LPM)

The LPM was developed with “accident” as the dependent variable, and a suitable specification was reached through a trial-and-error process using different combinations of the other variables. In this model, multiplicative variables were used to capture the interaction effects between variables. Table 3 shows the variables, their corresponding estimated parameters, and other important statistics.

In the resulting model, all p values were less than the critical significance level of 0.05, and the R² was equal to 0.5830. The overall model F value was 21.27, and the ratio of correct predictions was 0.7041.

4.3. Linear Regression Gauss–Markov Assumptions Confirmation

To validate the unbiasedness of the model and the hypotheses testing statistics like t value, the five basic Gauss–Markov assumptions of linear regression have to be confirmed as previously described. The first three assumptions, including linearity in parameters, random sampling, and zero conditional mean, are confirmed if the sampling is random, and the OLS estimation method is used validly. The estimated parameters are linear, the sampling method is explained previously, and Table 3 shows the conditional mean of the model error term, which is practically zero.

The fourth Gauss–Markov assumption, no multicollinearity, is investigated using the variance inflation factor (VIF). The estimated VIFs for this model corresponding to each variable are all less than 10; therefore, there is no multicollinearity in the model, and the 4th assumption is confirmed (see Table A-5 in the Appendix (Supplementary Materials)).
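For reference, the variance inflation factor of a regressor is commonly defined from the R² of an auxiliary regression of that regressor on all the other regressors. The symbols below ($\mathrm{VIF}_j$ and $R_j^2$) are generic notation assumed here rather than taken from the paper:

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2},
\qquad
\mathrm{VIF}_j < 10 \iff R_j^2 < 0.9
```

Under this definition, the rule of thumb used above amounts to requiring that no regressor has more than 90% of its variance explained by the remaining regressors.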

Models are usually inspected with regard to the 5th Gauss–Markov assumption, no heteroskedasticity, using statistical methods such as the Breusch–Pagan test. Since the observed dependent variable in the LPM is binary, the error term in the calculation is divided into two groups, success and failure observations, and the means of the two groups differ by 1 unit. This condition leads to a positive heteroskedasticity test outcome even if the variance does not vary. In that case, it is necessary to check for any correlation between the independent variables and the error term of the model. If there is no correlation, the 5th assumption is confirmed, and the estimates are suitable for inference. In this regard, the correlations were investigated, and near-zero correlations were found between the error term and the independent variables, whereas the dependent variable was highly correlated with the error estimates. This is shown in Table A-6 in the Appendix (Supplementary Materials).

5. Results and Discussion

The outcomes of this model are in essence estimates of the likelihood of an accident. Although this estimate has a probabilistic nature and should have a value between 0 and 1, 91 observations had an estimated accident likelihood of less than zero. Approximately 79% of the observed accidents were predicted correctly. The statistical description of the model estimations is shown in Table A-7 in the Appendix (Supplementary Materials).

5.1. Goodness of Fit

As stated in Table 3, the R² of the model, as a measure of goodness of fit, has been estimated at 58.3%, which is an acceptable amount relative to the number of variables in the model and the nature of the issue under study.

The ratio of correct predictions also supports the fit of the model. Using the average of the estimated outcomes of the selected model as the discretization threshold, the model correctly predicted 70.41% of all observations and 79.01% of accident observations (i.e., 128 correct accident predictions out of 162 observed accidents).

5.2. Parameters Significance

The significance of the model is assessed by evaluating the p values corresponding to the t and F tests stated in Table 3. The p values are all less than 0.05, supporting the statistical significance of all variables and of the whole model.

5.3. Results

After evaluating the significance of the overall model and parameters, the results can be investigated and interpreted.

5.3.1. Speed

The parameter corresponding to speed is estimated to be 0.0019. This value means that a 1 km/h increase in the speed will result in a 0.19% increase in accident likelihood, and a 10 km/h increase will lead to a 1.9% higher accident likelihood. This increasing effect is logical and consistent with the previous research.

5.3.2. Day

The parameter corresponding to the day variable is estimated to be −0.2467, which means that the accident likelihood is 24.67% less in the daytime than at night. This difference is consistent with common sense and the literature.

5.3.3. Mobile

This parameter has been estimated at −0.1105. As Table 4 shows, for a car travelling at a given speed, encountering a mobile hazard of a given size at night leads to an 11.05% lower accident likelihood than encountering a non-moving one. For such a car, encountering a non-moving hazard in the daytime is associated with a 24.67% lower accident likelihood than at night, and encountering a moving hazard in the daytime results in a 35.73% lower accident likelihood than encountering a fixed hazard at night.
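Because the LPM parameters act on the probability scale, the effects quoted for speed, day, and mobility can be combined by simple addition. The sketch below uses only the three rounded coefficients reported in the text and deliberately ignores the interaction terms (size and color) in Table 3, so it illustrates the arithmetic rather than reproducing the full model. The function name `likelihood_change` is hypothetical.

```python
# Additive reading of three coefficients reported in the text.
# Interaction terms (size, color) from Table 3 are deliberately left out,
# so this is an illustration, not the full model.

COEF_SPEED = 0.0019    # per km/h
COEF_DAY = -0.2467     # 1 = daytime, 0 = night
COEF_MOBILE = -0.1105  # 1 = moving hazard, 0 = stationary

def likelihood_change(delta_speed_kmh=0.0, day=0, mobile=0):
    """Change in accident likelihood relative to a stationary hazard at night."""
    return COEF_SPEED * delta_speed_kmh + COEF_DAY * day + COEF_MOBILE * mobile

speed_effect = likelihood_change(delta_speed_kmh=10)
moving_day_effect = likelihood_change(day=1, mobile=1)

print(f"+10 km/h: {speed_effect:+.4f}")
print(f"moving hazard in daytime vs. stationary at night: {moving_day_effect:+.4f}")
```

With the rounded coefficients the combined day and mobility effect comes to 35.72%, against the 35.73% quoted in the text; the small gap is presumably due to rounding of the reported coefficients.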

5.3.4. Size

There are four different conditions, derived from the combination of hazard mobility and time of day, that affect the influence of hazard size. Table 5 shows the marginal effect of the size variable under each of these conditions. According to this table, an increase in the hazard size by 1 m² decreases the accident likelihood by between 0.37% and 5.45%, depending on the condition.

5.3.5. PEN

Since drivers’ behavior in relation to hazard properties was the subject of this research, the previously encountered number of hazards (PEN) variable was defined to account for the drivers’ ability to learn from previously encountered hazards. The parameter estimated for the PEN variable is equal to −0.0084, which means that each additional hazard encounter leads to a 0.84% lower crash likelihood.

5.3.6. Colors

In this research, the effect of the green and yellow colors of a hazard on accident likelihood has been investigated. Using two multiplicative variables, the interaction between these colors and the time of day (being day or night) is explored. To evaluate the effect of these colors at night and day, there will be four different conditions, which are shown in Table 6.

As Table 6 shows, both colors have decreasing effects on accident likelihood, and the effects at night are considerably greater than in the daytime. According to these results, the green color of a hazard leads to an approximately 5% greater decreasing effect at night than in the daytime, and the yellow color decreases the likelihood by approximately 10% more in the daytime than at night.

6. Model Validation

For model validation, about ten percent of the whole data set was reserved, which contained 118 observations, including 30 accident records. The accident likelihood of each observation was calculated using the model, and these probabilities were then discretized by setting the mean of the estimates as the threshold. The observations and predictions were then compared, which showed 56.78% correct predictions. The model also correctly predicted 27 of the 30 accidents, i.e., 90% of all accidents in the validation data; therefore, the model can be considered valid within the scope of the data.
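The validation figures above can be cross-checked with a few lines of arithmetic. The sketch below assumes that “correct predictions” counts correctly predicted accidents and non-accidents together; under that assumption, the number of correctly predicted non-accidents follows from the reported totals. The variable names are illustrative only.

```python
# Validation figures reported in the text.
n_validation = 118          # reserved observations
n_accidents = 30            # observed accidents among them
accidents_predicted = 27    # accidents the model predicted correctly
overall_accuracy = 0.5678   # reported share of correct predictions

# Share of observed accidents that were predicted correctly.
accident_hit_rate = accidents_predicted / n_accidents

# Assumption: "correct predictions" counts accidents and non-accidents together.
n_correct = round(overall_accuracy * n_validation)
n_non_accidents = n_validation - n_accidents
non_accidents_correct = n_correct - accidents_predicted
non_accident_hit_rate = non_accidents_correct / n_non_accidents

print(f"accident hit rate: {accident_hit_rate:.2%}")
print(f"correct predictions: {n_correct} of {n_validation}")
print(f"non-accident hit rate: {non_accident_hit_rate:.2%}")
```

Read this way, the reported totals imply that the mean threshold identifies most accidents in the validation data while classifying fewer than half of the non-accident observations correctly.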

7. Conclusions

Few similar studies have been conducted on the impact of hazard properties on accidents; however, some studies have noted the impact of hazard properties on the time intervals that affect accidents. For example, the study by Asadamraji et al. [15] showed the effects of color, mobility, size, and contrast with the environment on the driver’s hazard perception, while previous studies such as the one conducted by Hooper and McGee [3] indicated that the main parameters in the driver’s perception-reaction time are the driver and road characteristics. Perception-reaction time can be effective in preventing the occurrence of an accident, and parameters such as hazard color, mobility, and ambient light are discussed in NCHRP Report 600 and the research by Campbell et al. [29]. Our modelling results indicated that the color and mobility of hazards and the ambient light influence single-car accidents.

Our findings support the effect of hazard color and mobility on single-vehicle accidents on rural roads, in line with the research by Krauss et al. [30] and Levulis et al. [31] on overtaking time detection. Our findings also confirm the results of Wogalter et al. [32]. The main difference between this research and the studies by Wogalter et al. [32] is that their studies focused on the colors of signs, whereas we examined the colors of other fixed and moving hazards.

In addition, our findings show that an increasing vehicle speed in high-risk situations, such as work zones, increases the likelihood of an accident. However, the research by Zhang et al. [27] showed that increasing or decreasing speed in this situation depended on design, strategies, or considerations in that environment.

In various road safety studies in which driving simulators were used, such as those conducted by Calvi et al. [26] and Bella [25], the environmental conditions and their effects on driver perception were examined. In our study, in addition to the environment, changes in the properties of hazards that can cross the road were also investigated, and their consequences for the probability of an accident were assessed. What all of these studies have in common is the use of the speed parameter in the analysis.

The results of the evaluation of different models in our research demonstrated that, in the analysis of single-car accidents, the linear probability model is preferred to the Poisson and negative binomial models used in the research by Lu and Tolliver [11] and to the probit and binomial logit models in the research conducted by Deng et al. [6] and Yu and Abdel-Aty [8]. However, it is important to note that the way we collected our research data differs from that of the research mentioned. Paying attention to the normal distribution of the error data and to how aggregate data are converted to binary form is highly important. In particular, the LPM is very useful, since its parameters can be interpreted on the probability scale, which is the scale of interest.

Owing to some practical limitations, the researchers have considered a limited number of factors regarding hazard characteristics. However, the driver’s characteristics like the ones used in Luo et al. [33] and Asadamraji et al. [2] along with the hazard characteristics are suggested to be further investigated. In addition, fuzzy models used in the study can be considered for single-vehicle collisions.

Finally, it should be mentioned that the findings of this study are valid within the limits of the simulated environment and the sociodemographic characteristics of the participants as described. One of the major limitations of this study is the number of participants. Owing to budget constraints and the difficulty of bringing people to the laboratory, it was impossible to increase the number of participants beyond this number. Moreover, the accident variable here is defined as the occurrence of a severe incident after which neither the driver, if alive, nor the car would be able to continue the trip. In the future, it is recommended to investigate milder incidents in which the driver has made an urgent reaction but no property damage or injury has resulted. Investigation of the findings in a naturalistic driving study paradigm is also suggested.

Data Availability

The research data are available as an .XLSX file from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was funded through the grant awarded from the University of Tarbiat Modares as a common grant for every graduate university student.

Supplementary Materials

In this part, some detailed descriptions of the participants, variance inflation factors, correlations between the variables in the model, and a description of the model’s outcomes are presented. Table A-1: participants’ age classification. Table A-2: participants’ education. Table A-3: participants’ sex and marital status. Table A-4: participants’ driving experience classification. Table A-5: variance inflation factors of model variables. Table A-6: correlations between variables in the model and the error term of the model. Table A-7: statistical description of estimations of the model.