#### Abstract

Crash severity prediction has been raised as a key problem in traffic accident studies. Thus, to progress in this area, in this study, a thorough artificial neural network combined with an improved metaheuristic algorithm was developed and tested in terms of its structure, training function, factor analysis, and comparative results. Data from I5, an interstate highway in the Washington State during the period of 2011–2015, were used for fitting and prediction, and after setting the theoretical three-layer neural network (NN), an improved Particle Swarm Optimization (PSO) method with adaptive inertial weight was offered to optimize the NN, and finally, a comparison among different adaptive strategies was conducted. The results showed that although the algorithms produced almost the same accuracy in their predictions, a backpropagation method combined with a nonlinear inertial weight setting in PSO produced fast global and accurate local optimal searching, thereby demonstrating a better understanding of the entire model explanation, which could best fit the model, and at last, the factor analysis showed that non-road-related factors, particularly vehicle-related factors, are more important than road-related variables. The method developed in this study can be applied to a big data analysis of traffic accidents and be used as a fast-useful tool for policy makers and traffic safety researchers.

#### 1. Introduction

##### 1.1. Crash Severity

Traffic safety is a challenging task to be accomplished and has been identified as crash hotspots around the world. The total number of fatal crashes in the U.S. increased to around 35,000 in 2016. In addition, according to the Washington State Collision Summary report, a total of 117,053 crashes were identified in the Washington State, including 499 fatal collisions, 36,531 injury collisions, and 77,358 property-damage-only collisions, indicating a crash occurred every 4.5 min and a person died in a crash every 16 hours [1]. Figure 1 shows the traffic fatality rates between the U.S. and Washington State [1]. Billions of dollars in personal and property damage are wasted in traffic crashes each year around the world [2].

To achieve the intrinsic goal of exploring numerous factors that trigger the crash, crash severity is often used for crash analyses to represent the degree of injury. A KABCO scale was proposed by WSDOT to represent the level of the injury: K—fatal injury; A—incapacitating-injury; B—nonincapacitating injury; C—minor injury; and O—property-damage-only injury. The KABCO scale has been widely adopted and adapted by many scholars (e.g., [3–5]). In this paper, the predicted targets were regrouped into three categories based on the combination of the KABCO scale and crash severity category of 5-year data (2011–2015) obtained from the HISI data system, namely, incapacitating-injuries, injuries, and noninjuries.

##### 1.2. Crash Severity Prediction

Crash prediction problems have long been a popular area of study around the world. Numerous studies conducted the prediction analyses based on classic statistical models, e.g., the linear, nonlinear, generalized linear model (GLM), generalized estimating equation (GEE), nominal binary (NB), and Poisson regression models, which are regarded as a good attempt at thoroughly formulating the relationship between tens or hundreds of explanative variables. However, it should be noted that the traditional statistical method has its limitation. The artificial intelligence (AI) technique, particularly, the deep learning methodology [6], as an emerging but promising tool for addressing the problems faced by the traditional statistical domain, deserves more attention and exploration.

Safety performance functions (SPFs) are frequently utilized to demonstrate the relationships between different crashes and crash impact parameters. Such functions usually use the crash frequency as the targeting variable. The Highway Safety Manual [7] has several chapters demonstrating the average crash frequency of an entire network, facility, or individual site. Elvik et al. [8] developed six power functions to demonstrate the relationship between speed and road safety; these six equations are slightly adjusted according to the road characteristics under the following categories: fatal injuries, fatal and serious injuries, all injuries, fatal accidents, fatal and serious accidents, and all injury accidents. Russo et al. [9] used a negative binomial regression model to develop four sets of SPFs based on the crash frequency and conducted a residual analysis to prove the accuracy. Park and Abdel-Aty [10] developed several crash modification factors (CMFs) for a combination of traffic and roadway cross-sectional elements on noncurved and curved roadway sections using a cross-sectional method and found that CMFs for increasing the lane and shoulder width were decreasing as the annual average daily traffic (AADT) level increased. Using a Bayesian ranking technique, Ahmed et al. [11] examined the safety effects of the roadway geometry on the crash occurrence rate along a freeway section that features mountainous terrain and an adverse weather condition and confirmed that segments with steep downgrades are more crash-prone along the studied section.

A number of researchers are eager to dig into the use of statistical analysis for traffic safety research, such as linear, nonlinear, GLM, GEE, NB, or Poisson regression models. Such models have performed well when the number of explanatory variables is constrained. Debrabant et al. [12] applied an autoregressive Poisson–Tweedie model to mine spatially and temporally aggregated hospital records of traffic accidents, and it was confirmed that this method is very accurate when applied to a black spot identification problem. Pei et al. [13] developed a joint probability model combined with a Markov chain, Monte Carlo (MCMC) approach, and full Bayesian to estimate the effects of explanatory factors, and their results indicated that the model achieves a good statistical fit and provides an accurate analysis of the influences of the explanatory factors. El-Basyouny et al. [14] applied a multivariate model based on a MCMC simulation to address the impact of weather elements, and their results showed that temperature and snowfall are statistically significant with intuitive signs for all crash types, whereas rainfall is mostly insignificant, as is the maximum wind gust with a few exceptions that are positively related to the crash type.

To address the shortcomings of the traditional statistical model, scholars in the crash analysis field are more willing to use AI methods at present. Karlaftis and Vlahogianni [15] discussed the differences and similarities between a statistical method and neural networks (NNs), and their results showed that the goals of the analysis are more important than the tools used, and that there are always assumptions to all modeling approaches. Specific to a traffic accident analysis, although classic statistical models such as NB, Poison, and a Bayesian network can achieve a good identification of a broad range of risk factors, they are also limited to a finite factor assumption as compared with a deep-learning method [16]. Aided by the powerful hardware and software of modern computers, deep-learning methods are becoming powerful tools for many aspects of our daily lives [6]. For example, in a crash analysis, Zeng and Huang [17] used a pruned NN for the crash severity, adopting a convex combination (CC) training algorithm and a NN pruning for a function approximation (N2PFA) structure optimization method, and found that the CC outperforms the backpropagation (BP) method in both convergence ability and training speed; in addition, simplification of the nodes in an NN structure can obtain a better performance. Huang et al. [18] developed an optimized radial basis function neural network (RBFNN) model to analyze the relationships between crash frequency and the relevant risk factors, and their comparative work showed that RBFNN models outperform negative binomial models and backpropagation neural network (BPNN) models. Li et al. [5] developed a data-driven method combining the nondominated sorting genetic algorithm (NSGA-II) with an NN to identify the key factors in a fatal highway crash analysis. All of the abovementioned studies have focused more on the NN structure itself, using complicated mathematical equations to illustrate their abstract concepts; nevertheless, they have seldom given a thoughtful, detailed, and general procedure to deal with a complete traffic severity analysis problem.

However, in short, both statistical and AI methods in the previous studies are still facing some challenges, and especially for neural networks, the traditional NNs are more easily stuck in the local optimum in accordance with its random weights initialization at the very first beginning. Although some studies [5, 18] were carried out to address the abovementioned problems, the efficiency of the global optimum search for particle swarm optimization (PSO) and local optimum search for some other optimization methods is still to be improved.

To solve the aforementioned problems, the purpose of this paper is as follows: (1) to provide a BPNN algorithm integrating the PSO with adaptive inertial weights for the establishment of the crash severity prediction model; (2) to conduct a detailed factor analysis (FA) based on the refined model to quantify the internal relationship and heterogeneity of different variables that trigger the crash of distinguished severity. The prediction target in the first phase is the crash severity. It should be noted that the novelty of the paper lies in the fact that an integrated method incorporating an emerging AI technique and a traditional statistical model was provided for crash severity analysis. Besides, it is marginally original to use FA and PSO for the calculation of the parameters of the crash triggers.

This paper is organized as follows: in the first part, the background and severity levels are discussed to demonstrate the reason for conducting this research. In the second part, classic statistical models and NN models dealing with crash analysis are reviewed to illustrate their advantages and limitations. The following section demonstrates the processing of the dataset, including its description and simplification. In the fourth part, the entire methodology for the developed model and data incorporation are presented to highlight the process of conducting a severity prediction process. Finally, the results are presented with the conclusion.

#### 2. Data Description

The dataset used in this study consisted of the data acquired from the Highway Safety Information System (HSIS); in this system, data from nine states (California, Washington, Minnesota, Michigan, Maine, Ohio, North Carolina, and Illinois) in the U.S. are available. Considering the author’s time studying in the Washington State during the period of 2016 to 2017, the crash data from this were selected as the target data.

The HSIS data contained, roughly, two tables for the variables. The first one is related to Accident, Vehicle, and Occupant files, which involve TIME, ENVIRONMENT, ACCIDENT-RELATED INFORMATION, VEHICLE INFORMATION, DRIVER INFORMATION, OCCUPANT, ROADWAY ELEMENTS, and PEDESTRIAN/BICYCLIST INFORMATION, whereas the second table was more concerned with the roadway containing LOCATION/LINKAGE ELEMENTS, ROADWAY CLASSIFICATION, ROAD ALIGNMENT, CROSS SECTION, ROAD FEATURES, TRAFFIC CONTROL/OPERATIONS, and TRAFFIC DATA. In addition, there were tens of subvariables for both the tables.

The crash data from I5 in the Washington State covering the years 2011–2015 were extracted. The data from the first four years, 2011–2014, were used to fit the model, and the data from the later year were used as the prediction validation set. The total crashes were 9926, 10083, 10127, 11628, and 12804 from 2011–2015. Based on these raw samples, the following steps need to be conducted before digging into the model input procedure:(1)Exclude apparently irrelevant variables. More than 40 features were requested from the HSIS database system, and some features, such as “CASENO” (accident case number), “MILEPOST” (milepost), “RD_INV” (a linkage variable on the Accident file which is used in the merging operation), and “RTE_NBR” (route number) are not related to the crash severity and were omitted for simplicity.(2)Samples with features such as “LIGHT,” “WEATHER,” and “ACCTYPE” (accident type) which have values such as “UNKNOWN,” “NAN,” “UNSTATED,” and “NULL” were also omitted for simplicity.(3)Some nominal variables which cannot be denoted by the continuous number such as “DIR_CURV” (the horizontal curve direction) and “DIR_GRAD” (the vertical curve grade direction), both representing the relative direction of left or right, were transformed into discrete scale values (“1” or “0”).(4)The vehicle-related and driver-related variables such as “DRV_AGE” (driver age) and “DRV_SEX” (driver sex) were incorporated with accident-related data files through the “CASENO” label, whereas the grad/curve-related variables were incorporated with accident-related data through “MILEPOST”; here, a data process computer program written in MATLAB was developed to locate the “MILEPOST” between “BEGPOST” and “ENDPOST” in the grad/curve files.(5)“VEHYR,” which indicates the vehicle model year, was transformed into the vehicle operation year through the following formula: where refers to the vehicle operation year in the year of , refers to the vehicle model year in the raw file (for example, “11” represents the year 2011 and “98” represents the year “1998”), and refers to the value of the year. For example, if a car was produced in the year 2011 and a crash occurred in the year 2013, should be , indicating .(6)The output file contains the vectors derived from the “SEVERITY” variable in the raw dataset. In detail, noninjury was derived from “1, No Injury,” injury was derived from “6, Nondisabling Injury; 7, Possible Injury,” and incapacitating injury was derived from the remaining. The vectors are described through the following formula:where refers to the output of the crash item , and refer to the Boolean index of the crash severity for an injury, noninjury, and incapacitating injury.

After the processing was conducted through the abovementioned steps, a total of 4310, 4494, 4436, 4666, and 4984 samples from 2011 to 2015 were used for model fitting and validation; glancing at the data, crashes seemed to be slightly more prone to occur during the winter (cold season) and on work days, whereas younger drivers contribute to a significant number of accidents (Figure 2).

**(a)**

**(b)**

**(c)**

After the selection of 20 features (Table 1) available from the raw file, the next step for the data cleaning and processing work was variable standardization which was carried out using the min-max method through the following formula:

From Table 1, we can see that, among all 20 features, there are 15 categorical and five continuous variables. In addition, for a research perspective, we divided all variables into two large categories: road-related and nonroad-related.

#### 3. Methodology

##### 3.1. NN with the BP Method

An artificial neural network uses information technology to mimic the human neurons and can process complicated connections between the input, hidden, and output layers. Among the multiple-layer neural networks, a three-layer simple NN has been proven to be most adopted and effective in a previous research [19].

In the forward propagation three-layer NN, the input variables can be defined as an input vector :where refers to the input variable, = 20, and refers to a transpose in the matrix calculation.

Similarly, the expectation of the crash severity level output vectors can bewhere refers to the Boolean index for an injury-related crash, refers to a Boolean index for a noninjury-related crash, and refers to a Boolean index for an incapacitating injury.

In addition, the weight matrix between the input and hidden layers, , should bewhere denotes the weight between the input node and hidden node and refers to the total number of the hidden layers, *I* = 20.

The weight matrix between the hidden and output layers, , should bewhere denotes the weight between the output node and the hidden node, refers to the total number of hidden layers, and the vector has values indicating injuries, noninjuries, and incapacitating injuries.

The structure of a general three-layer neural network is shown in Figure 3.

Using the forward calculation method [17], the outputs of the nodes in the hidden layer can be depicted aswhere *m* is the number of the output neurons, denotes the weight between the input node and the hidden node, refers to the total number of nodes in hidden layer, is the activation function between the input and hidden layers, and is the bias term.

Similarly, the outputs calculated from the hidden layer are shown aswhere denotes the weight between the output node and the hidden node, refers to the total number of nodes in the hidden layer, the vector has values including injuries, noninjuries, and incapacitating injuries, and is the activation function between the output layer and hidden layers. is the bias term.

Generally, the activation functions such as sigmoid or tan*h* are selected and have the ability to transform the input signal into a certain range. If the network adopts sigmoid function as the output activation function, the output can be narrowed into a small scale as ; however, between the hidden and input layers, usually, the tan*h* function is adopted because it usually can converge fast.

Another aspect used for building an NN is the definition of the number of nodes in the hidden layer, and there is no easy and complete mathematical way of defining this number; however, based on the experience from former research [20], the empirical equation used is as follows:where is the number of hidden nodes, is the number of input nodes, is the number of outputs, and is a constant varying from 1 to 10.

Based on the BP method [17], the local gradients and of the output and hidden layer neurons and the correction values and of their connection weights are as follows:where and are the momentum and step size, respectively.

The weights in the network can be updated as

Other than the general BP method, there are some other modified training functions, including resilient backpropagation (RPROP), conjugate gradient backpropagation, gradient descent/momentum, and adaptive backpropagation. These functions have different levels of accuracy and training speeds, and thus, an attempt should be made to find a better solution.

In the traditional BP, method, the initialized weights in formulas (6) and (7) are randomly initialized following two different uniform distributions. However, this implementation will highly possibly cause the whole model solution stopping at a local optimum. In order to address this problem, the authors offered a particle swarm optimization method with refined adaptive inertial weights to enhance both local and global searching for the optimal initial weights for BPNN. The details are provided in the following section.

##### 3.2. PSO with Adaptive Inertial Weights

The particle swarm optimization method was a well-known metaheuristic computation method provided in 1995 [21]. Also, it was easy to use in different optimization applications [22]. The standard and original formulas for this method arewhere is the particle at the generation and denotes this particle’s velocity to the generation. and are two constants usually taken around the value of 2. and are the two uniform random numbers in the range of [0, 1], is the best position experienced by the particle itself, and is the best position experienced by the particle swarm.

In this paper, the initial weights from BPNN are treated as the particles in the PSO algorithm, and the optimization problem can be described as mapping a decision space *X* to *Y*, and for a typical 3-layer neural network, they are encoded in the following set:where refer to the weights and bias connecting the input and hidden layer, while refer to the weights and bias connecting the hidden and output layer. Also, the transfer objective function *Y* is the neural network mean squared error (MSE).

For the standard or original PSO, it could solve nonlinear or nondifferentiable problems easily, but the searching space for a particular particle is almost fixed during each phase of generation, which means the model could then be easy and fast to find a solution, a possible solution near the local optimal. Thus, bringing the tradeoff between the local search ability and global search ability ahead [23], an inertial weight is introduced into formula (14), which could be written as follows:where is the inertial weights controlling the global and local optimal searching speed, and it iterated during each generation in a linear or nonlinear form.

Thus, in this paper, different inertial weight setting methods [23–25] including the linear and nonlinear form are compared as follows:where and refer to the weights at the start and the end of the generation, and they are usually set to 0.9 and 0.4, respectively, is the current iteration, and is the max generation.

The function graph for formulas (18)–(20) is depicted in Figure 4. From Figure 4, can we see that the weight is of a relatively high value at first in order to expand the search space for global optimal; however, at the end of the iteration, it is converging slowly to enhance the local optimal search.

In conclusion, the pseudocode for the whole procedure in Sections 3.1 and 3.2 is formulated as follows: FOR *I* = 1: MAX GENERATION FOR *J* = 1: POPULATION SIZE PARTICLE INITIALIZE; CALCULATE FITNESS (MSE of BP NN); UPDATE PARTICLE VELOCITY (WITH ADAPTIVE INERTIAL WEIGHTS); UPDATE PARTICLE POSITION; END END ASSIGN SOLUTION TO BP NN; BP NN TRAINING, TESTING, VALIDATION; NN PREDICTION;

The calculation process is given in Figure 5.

##### 3.3. Factor Analysis

To carry out a factor analysis (FA), the factor importance index (FII) is introduced in this paper. According to the nonlinear and classification function in practice, the *n*-dimensional input vector constructs the whole input space. Thus, from the perspective of engineering math, the first order partial derivative of the outcome *Y* with respect to the variable could explain the connecting weight of that input variable vector, according to the chain rule in calculus, and the math description is as follows:where is the linear output value from the hidden layer to the output layer in BPNN, which could be described as formula (9), and thus, equation (21) is transformed towhere *j* is the nodes in the hidden layer, refers to the weights through the hidden layer to the output layer, denotes the derivative with respect to the activation function between the output and hidden layers.

Considering formula (8) combined with chain rule in calculus, formula (22) could be written aswhere is a short note for formula (8) and refers to the weights through the input layer to the hidden layer. denotes the derivative with respect to the activation function between the input layer and the hidden layer.

Through formula (24), we can see that while given a fixed input vector in the dimension, the value of and is fixed with respect to all ; thus, considering the remaining part, the FII for the variable can be written aswhere the value of and can be calculated through formulas (12) and (13). These values are stored in a dictionary in a program code during the calculation process.

Finally, in order to ease the simulation variance of the model training process, the FII expectation is introduced by running the model for a certain *k* times:

#### 4. Results and Discussion

##### 4.1. NN Model Structure Test

Based on the theory discussed in the previous section, the most primary step in building an NN is to define the number of good hidden layer nodes and a better training function. Usually, the model performance (mean square error, MSE) combined with the total iteration number of convergence is used to test the structure. For the number of hidden layer nodes, based on formula (10), consecutive numbers of 5 through 14 were selected; for the training function, however, one of the following (Table 2) is applied.

The last two methods in Table 2 usually provide a fast calculation speed, but tend to be challenging and inefficient when dealing with the big data issues, especially, for the GPU hardware with low configuration. For the present study, considering the sample size, GPU support is not a problem, and thus, this minor difference is not a significant concern.

Theoretically, the number of nodes in the hidden layers should be within the range of based on formula (10); usually, however, the number of hidden layer nodes does not fall below 10, and thus, a combined test using the popular training functions and 10–14 hidden layer nodes number was carried out based on a loop test.

We randomly separated the sample data from 2011 to 2014 to form the dataset of training, testing, and validation with respect to 70%, 15%, and 15%. In detail, a total data of 17839 were divided into 12487, 2676, and 2676, in accordance with training, testing, and validation. The outcome is shown in Table 3.

It can be seen from Table 3 that 12 and 14 hidden layer nodes achieve the best performance (in other words, the lowest MSE) and the BR training function has the lowest MSE. However, when adopting these methods, the gradient value of the GDA, GDX (with an adaptive learning rate), and LM decreases quickly during the very early validation stage and, thus, quickly converges to the preset goal, and methods such as BR, GDM, and GD converge slowly, requiring more validation than usual. Thus, in conclusion, the choice of GDA, GDM, or LM should be better, and considering a lack of GPU support for the simulation environment, we selected the Levenberg–Marquardt backpropagation (LM) as the training function. During the number test, LM with 12 nodes of the hidden layer showed the best performance.

##### 4.2. Results for PSO Optimization

After setting the adaptive inertial weights for the PSO optimizer, we can conclude the following performance graph through each iteration.

To eliminate the random data separation variance, 100-time simulation was conducted, and the average performance for each method is described in Table 4.

From Figure 6 and Table 4, it could be seen that the performance refers to the MSE of the training model, training accuracy refers to the average classification accuracy among the training set (17839), and the prediction accuracy refers to the classification accuracy among the 4984 samples from 2015. Although the fitness (MSE) of PSO with drops fast in the early stage (almost starts with 22), it converges poorly and has the poor condition of performance and training/testing accuracy, which means it lacks of certain ability for local optimal searching. On the other hand, PSO with and (two nonlinear formulas) outperforms the other methods not only in global optimal search (both from 28, while NN with standard PSO drops from nearly 40) but also in the local optimal search (with good performance and accuracy), and the NN and NN with standard PSO are in the middle level with an acceptable result. It can be assumed that with more data flushing in, the PSO with nonlinear will perform a dominating advantage over the other methods.

**(a)**

**(b)**

**(c)**

**(d)**

As shown in Table 4, the accuracy of training is better than the results of the prediction; the relatively lower sample number may be a major contributor. Notably, the performance from all categories is at the same rate related to the sample size, and it can be concluded that although the prediction accuracy applied to other samples could result in a marginally worse accuracy, the model can explain all variance and contributors.

##### 4.3. Results for Factor Analysis

The final results for two PSO with nonlinear adaptive inertial weight are described in Table 5.

From Figure 7 and Table 5 we can see that two methods (PSO with and PSO with ) provide almost the same ranking with respect to each variable, despite some minor difference, for example, MONTH and LIGHT are ranked slightly higher in the second method, while DRIVER SEX are ranked slightly lower. Nearly seven or eight of the 20 variables have a relative importance value exceeding 50%, including the vehicle year, driver age, accident type, month, location type, and road functional class (also LIGHT in the second method), with vehicle year and driver age taking the top two values. Additionally, the related road factors such as curve and gradient variables have the least importance, and the only important factor contributing to the severity prediction is the road functional class, which indicates whether a crash occurred on a certain type of road. Thus, it can be concluded that the relative road variables contribute less than the relative nonroad variables, particularly, compared with the relative vehicle factors (vehicle year and type). Therefore, the policy makers should pay more attention to the vehicle and driver regulation rules, as well as the road design, to reduce the possible severity level in the future.

**(a)**

**(b)**

Driver age and month are two other important factors in predicting a crash severity. From the sample size, we can see that the most severe crashes occur during the winter in the Washington State (December, January, and February), and drivers below the age of 25 and above the age of 60 are more prone to encountering severe injury crashes. The month may account for the rainy season in the mountainous Seattle area, whereas age may be derived from the fact that younger people and older people are more prone to making severe mistakes.

#### 5. Conclusions

In this study, a thorough artificial neural network (ANN) was developed to address the problems of the crash severity level modeling and factor analysis (FA). Besides the test of different types of training structure and methods, more importantly, a nonlinear adaptive PSO optimization method was proposed in order to solve the tradeoff problem between the global and local search ability among the previous studies. The detail test of different algorithm confirmed our hypothesis. The additional contributing factor analysis also offers a different point of view compared with former statistical analysis. The main conclusions can be concluded as follows:(1)The number 12 hidden layer nodes fit the model developed in this paper well; and the BP method (Levenberg–Marquardt) can be better utilized when aided by fast hardware(2)The simulation result showed that the PSO optimizer with nonlinear adaptive inertial weight outperforms the standard PSO and PSO with linear adaptive inertial weight(3)Through the factor analysis (FA), it can be found that, among all 20 variables, nonroad-related variables can account for most of the severity prediction variance, and the rainy mountainous area in Seattle may be the reason for the importance of the month as a factor and, also, the impact of driver age, where younger and older people are more prone to encountering a severe crash

The main innovations can be concluded as follows:(1)Traditional studies often used statistical methods like the Poisson regression, negative binary regression, and generalized logit or probit model for the identification and mathematical qualification of the inner internal triggers and their impact on crash severity, while this paper utilized FA as the analytical tool, which is unusual for the current research system of crash severity, and we think that our attempt extended the methods of crash severity analyses, and more research could be conducted in the future work.(2)FA, as a traditional statistical implement, also can serve as a powerful explanatory tool in the last stage of the model, and our work has proved it. The application of FA in this paper indicated that the basic statistical method is still useful and efficient while the AI methods sometimes did not have an agreeable explanation for the inner mechanism of the data.

The method developed in this study can be applied to a big data analysis of traffic accidents and be used as a fast-useful tool for policy makers and traffic safety researchers. The authors recognize that much can be further investigated. In this paper, only crash severity was discussed. Further research could be conducted from the perspective of the collision type (e.g., head-on collisions and rear-end collisions). Besides, the dataset could be enlarged in the future research to improve the accuracy.

#### Data Availability

The dataset used in this study was made up of data requested from the Highway Safety Information System (HSIS), and requests for access to these data should be made by filling the form at the following link: https://www.hsisinfo.org/datarequest.cfm.

#### Conflicts of Interest

The authors declare no conflicts of interest.

#### Acknowledgments

The authors would like to thank the Highway Safety Information System, U.S.A, Smart Transportation Application and Research Laboratory (STAR Lab) at the University of Washington, U.S.A, and Pacific Northwest Transportation Consortium Region 10, U.S.A. They also thank National Natural Science Foundation of China (Grant nos. 51778141 and 71871078), China Scholarship Council, and Jiangsu Creative PhD Student Sponsored Project (KYLX15_0157) for providing essential data and support.