Abstract

The purpose of this study is to reduce the negative impact of severe traffic accidents in China by analyzing the complex coupling relations among the factors contributing to single-vehicle and multivehicle traffic accidents with a Bayesian network (BN) crash severity model. The BN model was established by taking the critical factors identified with the improved grey correlation analysis method as node variables. Severe traffic accident data collected from accident reports published in China were used to validate the model. The model’s efficiency was validated objectively by comparing the conditional probabilities obtained by the model with the actual values. The results show that the BN model can reflect the real relations among factors and can be seen as the target network for severe traffic accidents in China. In addition, based on the BN’s junction tree engine, five-factor combination sequences for the number of deaths and three-factor combination sequences for the number of injuries were ranked by severity degree to reveal the critical causes and reduce the damage of severe traffic accidents.

1. Introduction

Severe traffic accidents occur randomly, regardless of time and space [1]. Mass casualties and high risk are the two main features that distinguish severe traffic accidents from general accidents [2]. In addition, the enormous negative impacts of severe traffic accidents on public opinion and personal property security deserve the attention of traffic administrations and traffic safety scholars [3–5]. However, studies of severe traffic accidents are lacking, both at home and abroad. Although several relevant studies and policies have been carried out because of the high frequency of severe traffic accidents in foreign countries [6], there are still deficiencies in understanding the critical contributing factors and mechanisms of extraordinarily severe traffic accidents in China [7, 8]. Moreover, the tools for early accident prevention and emergency rescue are imperfect, which weakens prevention capability. Therefore, it is necessary to study severe traffic accidents in depth [9–11]. Exploring the occurrence patterns of serious accidents and taking effective prevention measures play an important role in reducing severity and improving road safety in China [12].

Drivers’ behaviors were considered the main factors causing traffic accidents in early studies. Some scholars believe that drivers’ illegal behaviors significantly impact road traffic safety [13] and that the drivers themselves are related to accidents [14–16]. For example, Shinar in Israel verified that the use of seat belts is positively correlated with the age and education level of drivers [17]. Vehicle conditions, road conditions, environmental conditions, and an increasing number of social and economic factors have gradually been taken into account in studying the impacts of fatal traffic accidents [18–20]. Researchers from Japan evaluated the traffic safety of 46 prefectures in Japan and concluded that natural binding forces, such as social rules and social capital, could reduce dangerous driving behaviors [21]. Researchers from several European countries integrated police investigation data and accident reports and established an accident information collection and analysis system to analyze numerous traffic accidents in Europe [22–26]. It was found that the proportion of accidents caused by “vehicles driving off the roads” is up to 70% [27]. Peng and Boyle studied traffic accidents based on Washington’s accident database and found, using a logistic model, that speeding, fatigue driving, distraction, and driving without seat belts affect the occurrence rate of accidents [28]. Theofilatos et al. analyzed the influencing factors of road accidents in urban and suburban areas based on Greece’s accident data and found that the influencing factors in urban areas are drivers’ age, intersection location, and bicycle parking, while the main influencing factor in suburban areas is the weather condition [7]. Aidoo et al. studied the relationship between road condition and accident frequency and found that lighting conditions at night, road alignment, and weather conditions significantly affect traffic accidents [6, 29]. Zhao and Deng studied the characteristics and development trend of expressway accidents based on annual statistical accident reports from 1995 to 2010. The results show that weather, region, time, and vehicle type contribute to traffic accidents [30].

The Bayesian network (BN) is widely used in sample learning, network structure construction, reasoning mechanism learning, and so on because of its powerful reasoning function [31]. In traffic safety analysis, BN is widely used to analyze the causes of maritime and road traffic accidents [32–34]. In algorithm solving, the genetic algorithm has been introduced into BN’s incremental learning, which alleviates the local extremum problem in the search process [35]. A loop deletion algorithm considering the KL distance has also been used to learn the BN structure, eliminating the dependency on node order in the modeling process [36]. When establishing the Bayesian network model, researchers have comprehensively considered the decision variables of the problem and the relationships among various factors and have used the reasoning ability of BN to analyze multiattribute decision-making problems in uncertain environments [37].

Accident research shifted from single-factor analysis to multifactor analysis long ago [38, 39]. However, several systematic reviews of the interaction among influencing factors only consider the polymorphism of accident consequences, and few in-depth discussions of the mechanism have been based on objective data [40, 41]. This paper aims to identify the critical factors contributing to severe single-vehicle and multivehicle traffic accidents separately and to explore the inherent relationships among different factors based on objective data. Through a comprehensive comparison of these factors, recommendations can be made for the construction of an active precaution system. Hence, an improved grey correlation analysis method and a BN traffic severity model were constructed in this paper. Firstly, the weighted grey relational degree was used to determine the critical factors contributing to single-vehicle and multivehicle traffic accidents, respectively. Secondly, the BN model was constructed, taking the critical factors as the nodes and the inherent correlations as the links. Thirdly, the model was trained on the sample data, with the network structure learned by the CH score and the K2 algorithm. Finally, the conditional probability based on Bayesian estimation was used to validate the model’s efficiency.

2. Data Description

2.1. Data Sources

The investigation and disposal reports of accidents in the production process have been published in China annually to record accidents accurately and promptly, and their transparency has been required since 2014. Traffic accidents are divided into four categories according to property loss and casualties, as shown in Table 1. Based on these standards and the collected accident data, the research objects of this paper are road traffic accidents with ten or more deaths, which include serious and extremely serious traffic accidents.

The data of 142 investigation reports from 2010 to 2016 were collected from the investigation and disposal reports of accidents in the production process, available on the websites of the State and Provincial Work Safety administrations in China. In addition, traffic accidents were divided into two categories, single-vehicle accidents and multivehicle accidents, because their distributions of “occurrence time,” “occurrence location,” “vehicle type,” and “accident characteristics” are quite different [42]. Table 2 shows the raw data of some samples.

2.2. Data Virtualization and Discretization for the Sample Set

It is necessary to select the influencing factors before factor analysis to improve computing efficiency and highlight the correlation degree among factors. According to the 4M systematic theory, humans, facilities, environments, and management are regarded as the direct factors that play the dominant role in accident occurrence. In this paper, the surveyed reports are taken as research objects to sort out the critical accident data, which are the basis for studying fatal traffic accidents in China and mainly include four aspects and 35 items.

According to the construction requirements of the BN model, the nodes’ attribute variables need to be classified and coded, that is, virtualized and discretized. Virtualization assigns a value to each attribute. Discretization maps the values of continuous variables to several mutually disjoint ranges. With reference to the modeling experience based on the investigation and disposal report of accident reports in the production process, the assignment result of the node variables identified by the improved grey correlation method is shown in Table 3.
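The two preprocessing operations can be illustrated with a short Python sketch. The category codes, the interval boundaries, and the names `WEATHER_CODES`, `DEATH_EDGES`, `virtualize`, and `discretize` are hypothetical and chosen only for illustration; the actual assignments are those listed in Table 3.

```python
from bisect import bisect_right

# Hypothetical category codes; the real assignments are given in Table 3.
WEATHER_CODES = {"sunny": 1, "cloudy": 2, "rain": 3, "fog": 4}

# Hypothetical interval boundaries for the number of deaths.
# They define three disjoint ranges: below 15, 15 to 29, and 30 or more.
DEATH_EDGES = [15, 30]


def virtualize(value, codes):
    """Map a categorical attribute to its integer code."""
    return codes[value]


def discretize(value, edges):
    """Map a continuous value to the 1-based index of its disjoint range."""
    return bisect_right(edges, value) + 1


record = {"weather": "fog", "deaths": 18}
encoded = {
    "weather": virtualize(record["weather"], WEATHER_CODES),
    "deaths": discretize(record["deaths"], DEATH_EDGES),
}
print(encoded)
```

Because the ranges are defined by sorted boundaries, every value falls into exactly one range, which is the disjointness property the discretization requires.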

3. Methodology

The basic idea of the BN-based traffic crash severity analysis model is as follows: firstly, the critical factors for network construction are determined based on the improved grey correlation method; secondly, the potential interconnectivity among network nodes is clarified and expressed directly through the network graph by the structure learning process based on the CH score adopted in this paper; thirdly, node probability learning based on Bayesian estimation (BE) is used to validate the model’s efficiency. The flowchart of this BN model is shown in Figure 1.

3.1. The Improved Grey Correlation Analysis Method

Grey correlation analysis is a comprehensive evaluation method based on grey theory that uses the correlation degree between the comparison sequence and the reference sequence to distinguish the evaluation objects. Traditional grey correlation analysis methods can be divided into three categories: Deng’s grey relational analysis, absolute grey correlation analysis, and relative greyness analysis. Deviation maximization theory is applied to enhance the traditional grey correlation method and to overcome the single-perspective limitation of the traditional methods by assigning weights. The application of deviation maximization theory can be described as follows:

(i) Definition of comparison sequence and reference sequence. The accident factor set is defined as the comparison sequence $X_i$, $i = 1, 2, \ldots, n$, where $n$ is the number of factors. The accident frequency, death rate, and injury amount are defined as the accident description set $Y_j$: $j = 1, 2, 3$.

(ii) Calculation of weighted grey correlation degree between $X_i$ and $Y_j$. Deng’s correlation degree $\gamma_{ij}$, absolute greyness degree $\varepsilon_{ij}$, and relative greyness degree $r_{ij}$ are calculated at first. Then, the weighted grey correlation degree is calculated as follows:

$$R_{ij} = w_1 \gamma_{ij} + w_2 \varepsilon_{ij} + w_3 r_{ij},$$

where $w_1$, $w_2$, and $w_3$ are the weights of Deng’s correlation degree, absolute greyness degree, and relative greyness degree, respectively.

(iii) Determination of weight coefficient.

$$w_k = \frac{D_k}{D_1 + D_2 + D_3}, \quad k = 1, 2, 3,$$

where $D_1$, $D_2$, and $D_3$ are the deviation values of $\gamma_{ij}$, $\varepsilon_{ij}$, and $r_{ij}$ compared with other indexes, respectively. The deviation values are obtained by the deviation maximization method.
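The weighting scheme described above can be sketched in Python under stated assumptions. Only Deng’s grade is implemented, using its standard form with a distinguishing coefficient of 0.5; the absolute and relative greyness degrees are passed in as precomputed values. The deviation of each measure is assumed here to be the sum of pairwise absolute differences across factors, which is one common deviation maximization form and not necessarily the exact formula used in this paper. All function names are hypothetical.

```python
def deng_grades(reference, comparisons, rho=0.5):
    """Deng's grey relational grade of each comparison sequence.

    Sequences are mean-normalized first, so they must have a nonzero mean.
    """
    def normalize(seq):
        mean = sum(seq) / len(seq)
        return [v / mean for v in seq]

    ref = normalize(reference)
    deltas = [[abs(r - c) for r, c in zip(ref, normalize(seq))]
              for seq in comparisons]
    d_min = min(min(row) for row in deltas)
    d_max = max(max(row) for row in deltas)
    if d_max == 0:  # every comparison sequence coincides with the reference
        return [1.0 for _ in comparisons]
    return [
        sum((d_min + rho * d_max) / (d + rho * d_max) for d in row) / len(row)
        for row in deltas
    ]


def deviation_weights(measures):
    """Weights proportional to each measure's total pairwise deviation.

    ASSUMPTION: the deviation of a measure is the sum of absolute
    differences between all pairs of factors under that measure.
    """
    deviations = [sum(abs(a - b) for a in row for b in row) for row in measures]
    total = sum(deviations)
    if total == 0:  # no measure discriminates: fall back to equal weights
        return [1.0 / len(measures)] * len(measures)
    return [d / total for d in deviations]


def weighted_degree(measures, weights):
    """Weighted grey correlation degree of each factor."""
    n_factors = len(measures[0])
    return [sum(w * row[i] for w, row in zip(weights, measures))
            for i in range(n_factors)]
```

A measure that gives all factors the same value cannot distinguish them and therefore receives zero weight, which is the intent of weighting by deviation.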

The number of accidents $Y_1$, the number of deaths $Y_2$, and the number of injuries $Y_3$ are usually selected to describe a traffic accident [43]. Hence, these three indexes were used as the reference sequences in the improved grey correlation model to represent the accidents’ features. Moreover, the factors in Table 2 were input into the model as the comparison sequences. The flowchart of critical factor identification is shown in Figure 2.

3.2. BN Modeling

BN, which is based on probability theory, can be used to study the interdependence of multiple factors in severe traffic accidents in the form of a network graph. A BN is mainly composed of the Directed Acyclic Graph (DAG) and the Conditional Probability Table (CPT).

Utilizing the conditional relations among variables, a BN factorizes the joint probability distribution and thereby reduces its complexity. Supposing that the random variable represented by node $i$ is $X_i$, the joint probability of the $n$ nodes is

$$P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P\left(X_i \mid \pi(X_i)\right),$$

where $\pi(X_i)$ is the set of parent nodes of node $X_i$. With the probability value of the input variable (evidence variable), the probability distribution of the output variable (query variable) can be calculated according to the existing network structure and CPT. Therefore, the logical relationship between node variables in the network model is manifested in the propagation of conditional probability, which makes inference on the network possible.
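As an illustration of this factorization, the following Python sketch builds a three-node chain and answers a query by enumeration. The chain (weather, visibility, severity), its states, and every probability value are hypothetical and are not taken from the networks learned in this paper; enumeration is used here only because the example is tiny, whereas the paper relies on a junction tree engine.

```python
# Hypothetical CPTs for the chain: weather -> visibility -> severity.
P_WEATHER = {"bad": 0.3, "good": 0.7}
P_VIS = {
    "bad": {"low": 0.7, "ok": 0.3},
    "good": {"low": 0.1, "ok": 0.9},
}
P_SEV = {
    "low": {"high": 0.6, "minor": 0.4},
    "ok": {"high": 0.2, "minor": 0.8},
}


def joint(weather, visibility, severity):
    """Joint probability as the product of each node's CPT entry."""
    return (P_WEATHER[weather]
            * P_VIS[weather][visibility]
            * P_SEV[visibility][severity])


def posterior_weather(severity):
    """P(weather | severity), summing the joint over the hidden node."""
    scores = {
        w: sum(joint(w, v, severity) for v in ("low", "ok"))
        for w in P_WEATHER
    }
    total = sum(scores.values())
    return {w: s / total for w, s in scores.items()}


print(posterior_weather("high"))
```

Here severity is the evidence variable and weather is the query variable; observing a high severity raises the probability of bad weather above its prior because the evidence propagates back through visibility.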

3.3. Structure Learning Based on the CH Score Method and K2 Algorithm

Structure learning is a data mining process that aims to clarify the potential interconnectivity among network nodes and express it directly through a network graph. The principle is to construct the network structure according to certain scoring criteria and search strategies. Although the optimal network structure is not always attainable, the accuracy, complexity, and robustness of the model can be evaluated thoroughly. The model is expressed as

$$\max_{N} f(N) \quad \text{s.t.} \quad N \models C,$$

where $N$ is a possible network structure; $f(N)$ is the evaluation score; and $N \models C$ represents that structure $N$ meets the constraint requirements $C$. Since the evaluation function used in this paper is based on the Bayesian score, the optimal network structure is

$$N^{*} = \arg\max_{N} P(N \mid D) = \arg\max_{N} \frac{P(D \mid N)\, P(N)}{P(D)},$$

where $P(N \mid D)$ is the posterior probability of structure $N$ under a given training data set $D$, and $P(N)$ is the corresponding prior probability.

The iteration steps for network construction are as follows:
Step 1: The factors are selected as the initial network nodes.
Step 2: An empty network is provided, and a node sequence $X_1, X_2, \ldots, X_n$ is assumed.
Step 3: The score function is calculated, and the parent node set $\pi_i$ of each node $X_i$ is updated by connecting the nodes with the greater posterior probability.
Step 4: The number of parent nodes is judged. If $|\pi_i| < u$, where $u$ is the maximum number of parent nodes, the search continues. Priority is given to the other nodes that are not yet parent nodes of $X_i$, and the candidate $z$ that maximizes the new CH score function $f_{\text{new}}$ is chosen. If $f_{\text{new}} > f_{\text{old}}$, then $z$ is selected as a new parent node; otherwise, the search stops.
Step 5: The node variables and their parent nodes are connected to form the directed edges of the network.
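The search in Steps 1 to 5 can be sketched in Python as follows. This is a minimal illustration and not the Full-BNT implementation used in this paper: the function names `ch_log_score` and `k2`, the data layout (each sample is a list of integer states indexed by variable), and the parameter `max_parents` are assumptions made for the example. The score is the logarithm of the Cooper-Herskovits (CH) metric under a uniform structure prior.

```python
from collections import Counter
from math import lgamma


def ch_log_score(data, child, parents, arity):
    """Log CH score of `child` given a candidate parent list."""
    r = arity[child]
    counts = Counter()  # (parent configuration, child state) -> N_ijk
    totals = Counter()  # parent configuration -> N_ij
    for row in data:
        cfg = tuple(row[p] for p in parents)
        counts[(cfg, row[child])] += 1
        totals[cfg] += 1
    score = 0.0
    for cfg, n_ij in totals.items():
        # log[(r - 1)! / (N_ij + r - 1)!] + sum_k log(N_ijk!)
        score += lgamma(r) - lgamma(n_ij + r)
        for k in range(r):
            score += lgamma(counts[(cfg, k)] + 1)
    return score


def k2(data, order, arity, max_parents):
    """Greedy K2 search: parents are drawn only from earlier nodes in `order`."""
    parents = {v: [] for v in order}
    for idx, child in enumerate(order):
        best = ch_log_score(data, child, parents[child], arity)
        candidates = list(order[:idx])
        while len(parents[child]) < max_parents and candidates:
            # Score every remaining candidate added to the current parents.
            new_score, z = max(
                (ch_log_score(data, child, parents[child] + [cand], arity), cand)
                for cand in candidates
            )
            if new_score <= best:
                break  # no candidate improves the score: stop for this node
            best = new_score
            parents[child].append(z)
            candidates.remove(z)
    return parents
```

Restricting candidates to earlier nodes in the order guarantees that the result is acyclic, which is why K2 needs a node sequence as input.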

3.4. Node Probability Learning Based on BE

Node probability learning searches for the parameter values through data mining when the network structure is known. The parameter learning method of this paper is BE, which combines prior knowledge with the training data set to improve the model’s accuracy. The fundamental mechanism is as follows.

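Supposing that the prior probability of the network parameters $\theta$ is $P(\theta)$, this paper searches for the parameters with maximum posterior probability through the training data set $D$. Then, the posterior probability is calculated as

$$P(\theta \mid D) = \frac{P(D \mid \theta)\, P(\theta)}{P(D)}.$$

According to the law of total probability, $P(D) = \int P(D \mid \theta)\, P(\theta)\, d\theta$. Supposing that the samples $d_1, d_2, \ldots, d_m$ in $D$ are independent of each other, $P(D \mid \theta) = \prod_{l=1}^{m} P(d_l \mid \theta)$; then,

$$P(\theta \mid D) = \frac{P(\theta) \prod_{l=1}^{m} P(d_l \mid \theta)}{\int P(\theta) \prod_{l=1}^{m} P(d_l \mid \theta)\, d\theta}.$$

Because of the conjugate nature of the Dirichlet distribution, the calculation complexity of this network model can be reduced significantly. Therefore, the Dirichlet distribution is usually used as the prior $P(\theta)$ to improve the efficiency of calculating $P(\theta \mid D)$.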

When the network structure is determined, the probability relations among variables can be described by conditional probabilities. Supposing that the prior distribution of each node variable is a Dirichlet distribution, the Full-BNT toolbox in MATLAB was used to learn the conditional probabilities under different contributing factors with the BE method. Then, the junction tree engine in MATLAB was used to combine the factor links. The model’s effectiveness was validated by comparing the learning results with the actual results.
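Because the Dirichlet prior is conjugate to the multinomial likelihood, the posterior mean of each CPT entry has a closed form: the observed count plus the prior pseudo-count, divided by the corresponding total. A minimal Python sketch is given below. It assumes a symmetric Dirichlet prior with a single pseudo-count `alpha` for every state, and the function name `estimate_cpt` and the data layout are hypothetical; it does not reproduce the Full-BNT toolbox.

```python
from collections import Counter
from itertools import product


def estimate_cpt(data, child, parents, arity, alpha=1.0):
    """Posterior-mean CPT of `child` under a symmetric Dirichlet(alpha) prior.

    Returns {parent configuration: [P(child = 0 | cfg), P(child = 1 | cfg), ...]}.
    """
    r = arity[child]
    counts = Counter()  # (parent configuration, child state) -> N_ijk
    totals = Counter()  # parent configuration -> N_ij
    for row in data:
        cfg = tuple(row[p] for p in parents)
        counts[(cfg, row[child])] += 1
        totals[cfg] += 1
    cpt = {}
    for cfg in product(*(range(arity[p]) for p in parents)):
        denom = totals[cfg] + alpha * r
        cpt[cfg] = [(counts[(cfg, k)] + alpha) / denom for k in range(r)]
    return cpt
```

With this estimator, parent configurations that never appear in the data fall back to the uniform prior instead of producing undefined probabilities, which matters when the sample is small.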

4. Result

4.1. Critical Factors Identification Result for Single-Vehicle and Multivehicle Accidents

The critical factor identification results based on the improved grey correlation analysis are shown in Tables 4 and 5.

According to Pearson’s correlation analysis principle, a factor with a coefficient of more than 0.75 is considered to have a significant effect [44]. Hence, the factors shown above were classified and organized by taking an average weighted correlation degree of more than 0.75 as the standard for building the sets of key influencing factors for single-vehicle and multivehicle traffic accidents, respectively. For single-vehicle traffic accidents, the number of accidents is the dominant feature of the system, followed by the number of injuries and deaths, indicating that the various factors have a considerable correlation with the number of accidents. Similarly, the number of accidents is the dominant feature of multivehicle traffic accidents, followed by the number of injuries and deaths, indicating that the various factors correlate with the number of accidents. The critical factor sets for single-vehicle and multivehicle traffic accidents were established through the weighted grey correlation degree, as shown in Figures 3 and 4.
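The selection rule can be written as a short Python sketch. The factor names and degree values below are hypothetical placeholders rather than the values in Tables 4 and 5, and the function name `critical_factors` is an assumption; each factor is given three weighted correlation degrees, one per accident feature.

```python
THRESHOLD = 0.75


def critical_factors(degrees, threshold=THRESHOLD):
    """Return factors whose average weighted degree exceeds the threshold,
    ordered from the highest average to the lowest."""
    averages = {name: sum(vals) / len(vals) for name, vals in degrees.items()}
    kept = {name: avg for name, avg in averages.items() if avg > threshold}
    return sorted(kept, key=kept.get, reverse=True)


# Hypothetical degrees with respect to accidents, deaths, and injuries.
degrees = {
    "overspeed": [0.82, 0.79, 0.77],
    "fatigue driving": [0.70, 0.66, 0.74],
    "bad weather": [0.90, 0.81, 0.78],
}
print(critical_factors(degrees))
```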

4.2. BN Network for Single-Vehicle and Multivehicle Accidents’ Severity

Based on the analysis results of the improved grey correlation method, two categories of accident variables were obtained as network nodes in this paper. The first category is the primary variables, including drivers’ behavior, vehicles, road, and the environment. The second category is the result variables, including the number of deaths and the number of injuries. For single-vehicle accidents, 13 node variables were selected, and the preliminary learning result is shown in Figure 5. Figure 6 shows the preliminary learning result of the multivehicle traffic accident network with 16 node variables.

Figure 5 shows that node variables 4, 6, and 10 are not connected to nodes 1 and 2. Therefore, the corresponding node variables of misoperation, driving status, and safety protection facility were deleted from the single-vehicle network. Figure 6 shows that the node variables of overspeed, fatigue driving, driving across the line, and low visibility have a low correlation degree with the two accident features in the multivehicle network. These four factors were deleted from the preliminary multivehicle BN as well. After further training with the K2 algorithm for BN structure search, the final learning result is split into two networks, shown in Figures 7 and 8, respectively, because the number of deaths and the number of injuries are independent.

In Figure 7, ten variables are involved in single-vehicle traffic accidents, including the two accident feature variables (the number of deaths and the number of injuries). The factor variables are as follows:
(i) Direct factors contributing to the number of deaths: overspeed and vehicle type. Direct factors contributing to the number of injuries: vehicle type and accident period.
(ii) Indirect factors contributing to the number of deaths: road alignment, physical separation, pavement condition (wet or dry), accident period, weather condition, and visibility. Indirect factors contributing to the number of injuries: road alignment, physical separation, visibility, and weather condition.

Besides, it can be found in Figure 7 that five-factor set sequences contribute to the number of deaths, and three-factor sequences contribute to the number of injuries. For example, one of the longest sequences shown in Figure 7(a) is {accident occurring period ⟶ visibility ⟶ pavement alignment ⟶ physical separation ⟶ vehicle type ⟶ number of deaths}. One of the longest sequences shown in Figure 7(b) is {weather condition ⟶ visibility ⟶ pavement alignment ⟶ physical separation ⟶ vehicle type ⟶ number of injuries}.

Figure 8 shows ten factors contributing to the number of deaths and five factors contributing to the number of injuries of multivehicle severe traffic accidents:
(i) Direct factors contributing to the number of deaths: mislane use, road alignment, and workday. Direct factors contributing to the number of injuries: overload.
(ii) Indirect factors contributing to the number of deaths: bus, vehicle safety condition, physical separation, accident period, and weather condition. Indirect factors contributing to the number of injuries: bus, heavy truck, road alignment, and weather condition.

Similarly, there are six factor sequences contributing to the number of deaths and three factor sequences contributing to the number of injuries in multivehicle severe traffic accidents. The longest sequence in Figure 8(a) is {accident occurring period ⟶ physical separation ⟶ vehicle safety status ⟶ mislane use ⟶ the number of deaths}. The longest sequence in Figure 8(b) is {pavement alignment ⟶ heavy truck involved or not ⟶ misoperation or not ⟶ the number of injuries}. Although multivehicle traffic accidents usually lead to more severe consequences, their factor sequences are slightly shorter than those of single-vehicle accidents, indicating that multivehicle accidents can be prevented more readily because fewer causes are involved.

4.3. BN Conditional Probability of Learning Result

The conditional probabilities of the influencing factors of single-vehicle and multivehicle traffic accidents were calculated and are shown in Tables 6 and 7, respectively.

It can be seen from Table 6 that overspeed, commuter bus involved, straight alignment, physical separation, slippery pavement, night, bad weather, and low visibility have significant impacts on single-vehicle accidents. As shown in Table 7, the significant factors contributing to multivehicle accidents are overload, mislane use, commuter bus involved, heavy truck involved, poor braking, straight alignment, no physical separation, night, weekend, and bad weather. The corresponding variable state is then selected to analyze the risk degree of each factor by the interval sorting theory of the BN network. The risk degree ranking of factors contributing to single-vehicle and multivehicle traffic accidents is shown in Tables 8 and 9.

As shown in Tables 8 and 9, different factors contribute to different degrees of severity in terms of the number of deaths and the number of injuries in single-vehicle and multivehicle traffic accidents. The results show that the risk factors that most significantly influence the number of deaths and injuries in single-vehicle accidents are bad weather and commuter bus. The risk factors that have the most significant influence on the number of deaths and injuries in multivehicle accidents are commuter bus and night.

4.4. Severity Ranking Result of Factors Combination

Since accidents result from multiple factors, it is necessary to study the probability distribution of the number of deaths and injuries under the combination of multiple factors based on the analysis of a single factor. As for the analysis of each factor’s effect on the severe traffic accident, the posterior probability of death and injury was deduced with BN’s interval theory. Then, the inherent logical relations among these factors were ranked by the severity degree in terms of the number of deaths and the number of injuries, as shown in Tables 10 and 11.

The study of factor sequences can help reduce accident damage and help safety managers propose effective measures. The key reasons for the enormous damage caused under bad weather conditions are overspeed, overload, and mislane use. Therefore, countermeasures for adverse weather conditions, reasonable control of vehicle speed, and proper lane use should be the focus of efforts to minimize severe traffic accidents.

4.5. Model Validation Test Result

The accuracy of the conditional probabilities of this BN accident severity model is validated using mathematical statistics. The model’s efficiency can be tested by the MSE and RMSE calculated from the actual and learned values of the conditional probability, as shown in Table 12.

From Table 12, for single-vehicle accidents, the model’s accuracy for the number of deaths is slightly lower than that for the number of injuries, since the maximum absolute error is 0.0027 and the mean relative error is 0.3390. This is because the sample distribution between these two types is unbalanced. Hence, the crash severity of single-vehicle traffic accidents can be analyzed using the BN model with greater prediction accuracy when a more randomly distributed accident data sample is provided. For multivehicle accidents, although the model’s accuracy for the number of deaths is slightly lower than that for the number of injuries, the model still meets the requirement of prediction. Hence, the BN model can be used to analyze the crash severity of multivehicle traffic accidents.
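The error measures used in this validation can be sketched in Python as follows. The function name `validation_errors` is hypothetical, and the inputs are assumed to be two equally long lists holding the actual and learned conditional probabilities for the same CPT entries; the values in Table 12 are not reproduced here.

```python
from math import sqrt


def validation_errors(actual, learned):
    """MSE, RMSE, and maximum absolute error between two probability lists."""
    if len(actual) != len(learned) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [l - a for a, l in zip(actual, learned)]
    mse = sum(e * e for e in errors) / len(errors)
    return {
        "mse": mse,
        "rmse": sqrt(mse),
        "max_abs": max(abs(e) for e in errors),
    }
```

RMSE is reported alongside MSE because it is on the same scale as the probabilities themselves, which makes it easier to compare with the maximum absolute error.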

5. Conclusions and Discussions

Most previous studies have examined the relations between various factors and accident indexes from a particular perspective. This paper studies the critical factors contributing to severe single-vehicle and multivehicle traffic accidents in China with the BN crash severity model. From the case application results, the following conclusions can be drawn:
(1) The direct factors contributing to single-vehicle traffic accidents are commuter bus involved, mislane use, and night. For multivehicle traffic accidents, the direct factors are pavement alignment, mislane use, weekend, and overload.
(2) Under the influence of a single factor, the risk factors that have the most significant influence on the number of deaths and injuries in single-vehicle accidents are bad weather and commuter bus, while those in multivehicle accidents are commuter bus and night. Single-vehicle accidents and multivehicle accidents involving commuter buses are more serious.
(3) There are different inherent hierarchical correlations among variables in different types of accidents. Overloading of the commuter bus may cause serious accidents. Bad braking of commuter buses on the improper lane is more likely to cause a serious accident. Single-vehicle accidents on separated highways are more likely to result in serious injuries. Under the conditions of low visibility, such as wet road and foggy days, the injuries caused by accidents are relatively low because of drivers’ high vigilance.
(4) One of the factor combination sequences for single-vehicle traffic accidents, namely, {bad weather ⟶ slippery pavement ⟶ overspeed}, can be inferred from the importance ranking of these critical factors. Moreover, a possible factor combination link for multivehicle traffic accidents can be inferred as {night ⟶ nonphysical separation ⟶ poor braking ⟶ mislane use}. These links could provide some theoretical support for active precaution management of severe traffic accidents.

5.1. Limitation

Only several factor links were derived in this study from a rough calculation of the sample data collected from the investigation and disposal report of accident reports in the production process. The structure of the BN crash severity model is relatively simple, as only a few factors were analyzed. More accurate factors are needed to increase the complexity of the model’s structure, which could reveal the factor coupling relations and explore the factor links more deeply. The improved grey correlation method and BN crash severity model were adopted to analyze the severity and contributing factors of severe traffic accidents.

This paper aims to construct a targeted model for traffic accidents to identify the contributing factors precisely. However, the available data and sample distribution are limited, which decreases the accuracy of the BN model. This paper studies crash severity and contributing factors from a superficial perspective and lacks a “drivers-vehicles-road” coupling analysis for traffic safety. Therefore, it is recommended that future research focus more on comprehensive factor analysis with the BN model constructed in this paper, such as a “driving behavior + driving environment” analysis for the road safety of autonomous driving systems, if data from simulation experiments can be obtained.

Data Availability

The data of participants responding to the investigation and disposal report of accident reports are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

The authors would like to thank the participants for their cooperation. This research was supported by the Natural Science Foundation of Zhejiang Province (no. LQ19E080003).