Abstract

Road traffic accidents are a concrete manifestation of the level of road traffic safety. Current traffic accident prediction suffers from low accuracy. To provide traffic management departments with more accurate forecast data that can be used in traffic management systems to support scientific decision-making, this paper establishes a traffic accident prediction model based on LSTM-GBRT (long short-term memory, gradient boosted regression trees) and predicts a traffic safety level indicator by training on traffic accident-related data. Compared with several regression models and neural network models, the experimental results show that the LSTM-GBRT model has a good fitting effect and robustness. The LSTM-GBRT model can accurately predict the traffic safety level, so that the traffic management department can better grasp the traffic safety situation.

1. Introduction

“By 2020, halve the number of global deaths and injuries from road traffic accidents” is one target of the Sustainable Development Goals (SDGs) published by the United Nations (UN) in 2015 [1]. The country’s attention to traffic safety continues to increase. Applying traffic accident prediction results to traffic planning can improve traffic safety. Many experts and scholars have predicted indicators of traffic accidents [2, 3]. The research methods fall mainly into three categories: statistical regression methods [4], grey prediction [5], and neural network methods.

Statistical regression methods include time series prediction and many classic empirical traffic accident models (Smid model, I. Agalal model, Japanese model, and Beijing model). Yannis et al. [6] proposed an autoregressive nonlinear time-series model of traffic fatalities in Europe. Kumar and Toshniwal [7] proposed a novel framework for time series data of road traffic accidents, which segments the time series data into different clusters for trend analysis. Ihueze and Onwurah [8] analyzed road traffic crashes in Anambra State, Nigeria, with the intention of developing accurate predictive models for forecasting crash frequency in the state using autoregressive integrated moving average (ARIMA) and autoregressive integrated moving average with explanatory variables (ARIMAX) modelling techniques. The regression model is simple and convenient to calculate, and it can predict short-term data changes. The essence of the regression model is a linear fit to the data. However, its predictions are one-sided and sensitive to interference. Because traffic accidents are inherently random and have many influencing factors, the reliability of its predictions is not guaranteed.

The grey prediction model works with a small number of samples; its principle is simple, it runs fast, and its results are easy to test. The grey prediction model can make short-term and medium-term macropredictions for data with little fluctuation. The essence of the model is to find the dynamic relationship within the road traffic accident sequence data. However, the grey theory is built for series that satisfy the conditions of a smooth discrete function, and the grey system model describes only a process that monotonically increases or decays exponentially over time. Shi et al. [9] proposed a sequence GM (1, 1) model with a strong exponential law to predict traffic accidents, but the model can only describe a monotonic process. Hosse et al. [10] applied the Grey Systems Theory model MGM (1, 4) to predict the development of road traffic accidents in Germany until 2025 based on the market diffusion of the electronic stability program (ESP). Liu and Wu [11] proposed a grey Verhulst prediction model for road traffic accidents, which is suitable for nonmonotonic oscillating sequences or S-shaped sequences with saturation. Zhao et al. [12] proposed a model that weights and combines a variety of grey prediction methods. Although the prediction accuracy is improved, the model is in essence a linear combination of the original data, and it still has shortcomings in medium- and long-term prediction.

The neural network prediction method has strong nonlinear mapping ability, high robustness, and powerful self-learning ability and has been widely used in many fields. He and Guo [13] proposed a traffic accident prediction model based on the BP neural network. The model can implement any nonlinear mapping and is especially suitable for problems with complex internal mechanisms. The shortcomings of the BP neural network model include slow training convergence, long training time, and a tendency to become trapped at saddle points. Liwei et al. [14] proposed a grey neural network model. The grey theory compensates for the distortion that data mining suffers on small samples, while the neural network compensates for the limitation that grey theory can only be used for short-term prediction. Although the model improves the training speed, the accuracy of its predictions is low and the deviation is too large.

This paper proposes an LSTM-GBRT model for traffic accident prediction. The LSTM layer captures time-dependent information in the data, and the GBRT model contributes the high robustness of ensemble learning to model training.

2.1. Long Short-Term Memory

The LSTM model [15] proposed by Hochreiter et al. is a variant of the recurrent neural network (RNN). It builds a specialized memory storage unit and is trained with the backpropagation through time algorithm. It addresses the problem that a plain RNN cannot learn long-term dependencies. The schematic diagram of the LSTM structure is shown in Figure 1.

The standard LSTM can be expressed as follows. At each time step $t$, the input is $x_t$, the input gate is $i_t$, the forget gate is $f_t$, and the output gate is $o_t$. The memory cell state controls what is remembered and what is forgotten through these gates. The formula is as follows:

$$
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1}),\\
f_t &= \sigma(W_f x_t + U_f h_{t-1}),\\
o_t &= \sigma(W_o x_t + U_o h_{t-1}),\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1}).
\end{aligned}
$$

The memory cell state at time $t$, whose $j$th component belongs to the $j$th LSTM unit, is as follows:

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t.$$

After the memory cell state is updated, the current hidden state $h_t$ is calculated:

$$h_t = o_t \odot \tanh(c_t),$$

where $W$ is the weight matrix of the input, $U$ is the state transition weight matrix, $\sigma$ is the sigmoid function, tanh is the hyperbolic tangent function, $h_t$ is the hidden state vector of the output, $\tilde{c}_t$ is the candidate cell state, $c_t$ is the new cell state after the adjustment and update, and “$\odot$” indicates point (element-wise) multiplication. The three types of gates jointly control the information entering and leaving the memory cell state: the input gate admits new information into the memory cell, the forget gate controls how much of the stored information is retained, and the output gate determines how much information is output. The gate structure of the LSTM allows the information in the time series to form a balanced long short-term dependency.
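The gate updates described above can be sketched as a single time step in NumPy. This is a minimal illustration that assumes no bias terms and stores the weight matrices in dictionaries keyed by gate name; the function `lstm_step`, the dictionary layout, and the random weights are assumptions made for the example, not the implementation used in the experiments.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U):
    """One LSTM time step without bias terms.

    W and U are dicts keyed by 'i', 'f', 'o', 'c' (input gate, forget gate,
    output gate, candidate state). This layout is an illustrative assumption.
    """
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev)      # input gate
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev)      # forget gate
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev)      # output gate
    c_tilde = np.tanh(W["c"] @ x_t + U["c"] @ h_prev)  # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde                 # element-wise cell update
    h_t = o_t * np.tanh(c_t)                           # new hidden state
    return h_t, c_t

rng = np.random.default_rng(0)
n_in, n_hidden = 13, 11  # 13 inputs and 11 hidden nodes, as chosen in Section 4.2
W = {k: rng.normal(scale=0.1, size=(n_hidden, n_in)) for k in "ifoc"}
U = {k: rng.normal(scale=0.1, size=(n_hidden, n_hidden)) for k in "ifoc"}

h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x_t in rng.random((3, n_in)):  # three random time steps, for illustration only
    h, c = lstm_step(x_t, h, c, W, U)
```

Because the output gate lies in (0, 1) and tanh lies in (-1, 1), every component of the hidden state stays strictly inside (-1, 1), whatever the input scale.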

2.2. Boosting Ensemble Learning Framework

The GBRT model is a boosting-type [16] ensemble learning algorithm. Ensemble learning is a technical framework that combines multiple different models to perform a task in order to achieve more efficient and accurate results. Commonly used ensemble learning frameworks include bagging, boosting, and stacking. The training process of the boosting framework is stagewise: the base models are trained in order, and the training set of each base model is transformed according to a certain strategy. Then, the prediction results of all the base models are linearly combined to produce the final prediction result. Figure 2 is a schematic diagram of the boosting ensemble learning framework.

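The overall model based on the boosting framework can be described by a linear combination:

$$F(x) = \sum_{m=1}^{M} f_m(x),$$

where $f_m(x)$ is the product of the $m$th base model and its weight. The training goal of the overall model is to make the predicted value $F(x)$ approximate the true value $y$, that is, to make the predicted value of each base model approximate the part of the true value it is responsible for. Each base model is tested by using the training examples, and the weight of misclassified instances is increased. Because optimizing all base models jointly is difficult, researchers came up with a greedy solution: train only one base model at a time, so that each iteration focuses on a single base model training problem:

$$F_m(x) = F_{m-1}(x) + \arg\min_{f} \sum_{i=1}^{n} L\big(y_i, F_{m-1}(x_i) + f(x_i)\big).$$

With the squared error loss, this amounts to fitting the residual $y_i - F_{m-1}(x_i)$. Introducing an arbitrary differentiable loss function $L$, the new base model is instead fitted to the negative gradient:

$$r_{im} = -\left[\frac{\partial L\big(y_i, F(x_i)\big)}{\partial F(x_i)}\right]_{F = F_{m-1}}.$$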
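The greedy, one-base-model-at-a-time procedure can be sketched with one-split regression stumps and the squared error loss, for which the negative gradient is simply the residual. The helper names `fit_stump` and `boost`, the learning rate, and the sine-curve toy data are assumptions made for this illustration; the experiments in this paper use the GBRT implementation from scikit-learn instead.

```python
import numpy as np

def fit_stump(x, r):
    """Fit a one-split regression stump to residuals r on a 1-D feature x.

    Assumes x contains at least two distinct values.
    """
    best = None
    for threshold in np.unique(x)[:-1]:
        left, right = r[x <= threshold], r[x > threshold]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, threshold, left.mean(), right.mean())
    _, threshold, left_value, right_value = best
    return lambda q: np.where(q <= threshold, left_value, right_value)

def boost(x, y, n_rounds=50, learning_rate=0.1):
    """Greedy stagewise boosting: each round fits one stump to the residuals."""
    base = y.mean()
    prediction = np.full_like(y, base, dtype=float)
    stumps = []
    for _ in range(n_rounds):
        residual = y - prediction  # negative gradient of 0.5 * (y - F)^2
        stump = fit_stump(x, residual)
        prediction += learning_rate * stump(x)
        stumps.append(stump)
    return lambda q: base + learning_rate * sum(s(q) for s in stumps)

# Toy data for illustration only.
x = np.linspace(0.0, 1.0, 40)
y = np.sin(2 * np.pi * x)
model = boost(x, y)
mse_start = np.mean((y - y.mean()) ** 2)   # error of the constant initial model
mse_final = np.mean((y - model(x)) ** 2)   # error after 50 boosting rounds
```

Each round only has to solve a small fitting problem on the current residuals, which is what makes the greedy strategy tractable.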

2.3. Gradient Boosted Regression Trees Model

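For a given data set with n examples and m features, $\mathcal{D} = \{(x_i, y_i)\}$ ($|\mathcal{D}| = n$, $x_i \in \mathbb{R}^m$, $y_i \in \mathbb{R}$), a tree ensemble model uses K additive functions to predict the output:

$$\hat{y}_i = \phi(x_i) = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in \mathcal{F},$$

where $\mathcal{F} = \{f(x) = w_{q(x)}\}$ ($q: \mathbb{R}^m \to \{1, \ldots, T\}$, $w \in \mathbb{R}^T$) is the space of regression trees. Here, q represents the structure of each tree that maps an example to the corresponding leaf index, T is the number of leaves in the tree, and each $f_k$ corresponds to an independent tree structure q and leaf weights w. Unlike decision trees, each regression tree contains a continuous score on each leaf, and we use $w_i$ to represent the score on the ith leaf.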

3. Data Source

3.1. Road Safety Impact Factor Data

Traffic accidents are caused by a combination of factors involving people, vehicles, roads, and the environment. People include pedestrians and drivers; vehicles include motor vehicles and nonmotor vehicles on the road; roads refer to the condition of the road; the environment refers to the natural environment and the social environment, and the social environment includes political, economic, cultural, and other factors. When collecting data, as much accident-related data as possible should be considered. The data used in this paper include gross domestic product (GDP) (100 million yuan), per capita GDP (yuan), gross national income (100 million yuan), road mileage (10,000 kilometers), highway mileage (10,000 kilometers), number of civilian vehicles (10,000 vehicles), number of drivers (10,000 people), passenger traffic (10,000 people), road passenger traffic (10,000 people), total population at the end of the year (10,000 people), male population (10,000 people), female population (10,000 people), urban population (10,000 people), rural population (10,000 people), and the total number of deaths from traffic accidents per year (persons). The data cover 1997–2016 and come from the National Bureau of Statistics of China. The data are shown in Table 1.

3.2. Prediction Index for Road Safety Level

The measures of traffic safety level generally include the number of accidents, deaths, injuries, and property losses. Indicators such as the number of accidents, the number of injured people, and economic losses are subject to subjective influence, and their accuracy is difficult to judge. The statistics on the number of deaths are true and reliable, difficult to falsify, and comparable. Therefore, to ensure the accuracy of the data, this article uses the number of deaths as the indicator of traffic safety level and takes it as the prediction target.

3.3. Variable Correlation Analysis

If the information in the data is uncorrelated or noisy, the quality of the predictions may be affected [17]. In this paper, the features are filtered by comparing the chi-square value and the Pearson correlation coefficient, so that the prediction results can be optimized. The formula for calculating the chi-square value is as follows:

$$\chi^2 = \sum_{k} \frac{(O_k - E_k)^2}{E_k},$$

where k indexes the groups of the variable, r is the variable number, c is the target variable number, d is the degree of freedom, $d = (r - 1)(c - 1)$, $O_k$ is the observed frequency of the kth group, and $E_k$ is the expected frequency of the kth group.

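The Pearson correlation coefficient is calculated as follows:

$$R = \frac{\sum_{i}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i}(X_i - \bar{X})^2}\,\sqrt{\sum_{i}(Y_i - \bar{Y})^2}},$$

where R is the correlation coefficient, X is the independent variable, Y is the dependent variable, $\bar{X}$ is the mean of the independent variable, and $\bar{Y}$ is the mean of the dependent variable.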

The chi-square test measures the degree of deviation between samples: the larger the chi-square score, the stronger the association. The Pearson correlation coefficient gives a rough measure of the correlation between variables: the larger its absolute value, the stronger the correlation. According to the chi-square scores and the Pearson coefficients in Table 2, we removed the variable with the smallest chi-square score (highway mileage) and the variable with the smallest Pearson coefficient (road passenger traffic). Finally, 12 related independent variables and the death toll were used as input variables, a total of 13.
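The Pearson part of this filtering step can be sketched in plain Python. The helper names `pearson_r` and `drop_weakest`, the feature names, and the toy numbers are assumptions made for illustration; they are not the values in Table 1 or Table 2, and the chi-square scoring is not shown.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def drop_weakest(features, target):
    """Remove the feature with the smallest |R| against the target."""
    scores = {name: abs(pearson_r(values, target)) for name, values in features.items()}
    weakest = min(scores, key=scores.get)
    return [name for name in features if name != weakest], scores

# Toy values for illustration only; these are NOT the data in Table 1.
target = [10.0, 9.0, 8.5, 7.0, 6.0]
features = {
    "feature_a": [1.0, 2.0, 3.0, 4.0, 5.0],  # strongly (negatively) correlated
    "feature_b": [3.0, 1.0, 4.0, 1.0, 5.0],  # weakly correlated
}
kept, scores = drop_weakest(features, target)
```

Ranking by the absolute value matters here: a strongly negative coefficient indicates a useful feature just as a strongly positive one does.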

3.4. Model Performance Evaluation Index

In this paper, the error rate (E) and root mean square error (RMSE) are used to compare the degree of prediction deviation, and the root mean square logarithmic error (RMSLE) and the coefficient of determination (R-square) are used to measure the fitting capacity of the model.

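The error rate and root mean square error formulas are as follows:

$$E = \frac{|\hat{y}_i - y_i|}{y_i} \times 100\%, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}.$$

The formulas for the root mean square logarithmic error and the coefficient of determination are as follows:

$$\mathrm{RMSLE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\big(\log(\hat{y}_i + 1) - \log(y_i + 1)\big)^2}, \qquad R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2},$$

where n is the number of samples, $y_i$ is the original value, $\hat{y}_i$ is the predicted value, and $\bar{y}$ is the sample mean.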
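The four indicators can be sketched in plain Python as follows. The exact form of the error rate is an assumption here (the relative error of a single prediction, in percent), and the sample numbers are made up for illustration; they are not results from this paper.

```python
import math

def error_rate(y_true, y_pred):
    """Relative error of one prediction, in percent (assumed definition)."""
    return abs(y_pred - y_true) / y_true * 100.0

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def rmsle(y_true, y_pred):
    """Root mean square logarithmic error, using log(1 + value)."""
    return math.sqrt(
        sum((math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred))
        / len(y_true)
    )

def r_square(y_true, y_pred):
    """Coefficient of determination."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up numbers for illustration; not results from this paper.
actual = [100.0, 110.0, 120.0, 130.0]
predicted = [102.0, 108.0, 123.0, 127.0]
scores = {
    "RMSE": rmse(actual, predicted),
    "RMSLE": rmsle(actual, predicted),
    "R-square": r_square(actual, predicted),
}
```

RMSE is expressed in the units of the target, whereas RMSLE depends on the ratio between prediction and truth, so the two can rank models differently when the target spans a wide range.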

4. LSTM-GBRT Modelling Methodology

The LSTM neural network is capable of capturing time-dependent information and performs very well on time series prediction, but it is weak at predicting inflection points. The GBRT model is a typical representative of ensemble learning algorithms, and the model is robust. In this paper, the LSTM-GBRT model is proposed by combining the two methods. The LSTM neural network is used to extract features carrying time-dependent information, and the GBRT model is then trained on these features to predict traffic accidents. The structure of the LSTM-GBRT model is shown in Figure 3.

4.1. Normalization

The raw data are processed using min-max normalization to eliminate dimensional differences. A linear transformation of the original data causes the result to fall into the [0, 1] interval, and the conversion formula is as follows:

$$X = \frac{x - \min}{\max - \min},$$

where max represents the maximum value of the feature in the sample data and min represents the minimum value of the feature in the sample data; x represents raw data, and X represents normalized data.
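A minimal sketch of min-max normalization for a single feature column is shown below. The function names and the toy values are assumptions made for illustration; the inverse transform is included because predictions made on normalized data have to be mapped back to the original scale.

```python
def min_max_normalize(values):
    """Scale one feature column linearly into the [0, 1] interval."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max normalization is undefined for a constant feature")
    return [(v - lo) / (hi - lo) for v in values]

def denormalize(scaled, lo, hi):
    """Map normalized values back to the original scale."""
    return [s * (hi - lo) + lo for s in scaled]

raw = [50.0, 75.0, 100.0]        # toy values
scaled = min_max_normalize(raw)  # [0.0, 0.5, 1.0]
restored = denormalize(scaled, min(raw), max(raw))
```

In a train/test setting, the minimum and maximum are usually taken from the training data only, so that no information from the test years leaks into the scaling.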

4.2. LSTM Layer Hidden Unit Number

There is no clear theoretical guidance for determining the number of nodes in the hidden layer. In general, the following formula is used to select the number of nodes:

$$N = \sqrt{n + m} + a,$$

where N is the number of hidden nodes; n is the number of input nodes; m is the number of output nodes; and a is a constant between 1 and 10.

In this paper, there are 13 input nodes and 1 output node. According to formula (7), the number of hidden nodes is 5∼13. We tried each of these numbers of hidden nodes with a 1-layer LSTM and judged the degree of deviation according to the error rate and root mean square error in order to select the number of hidden nodes.
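The candidate range can be checked with a few lines of Python, assuming the common empirical rule N = sqrt(n + m) + a with a between 1 and 10 and assuming that the candidates are the integers lying inside the resulting interval. The function name `hidden_node_candidates` is hypothetical.

```python
import math

def hidden_node_candidates(n_inputs, n_outputs, a_min=1, a_max=10):
    """Integers N with sqrt(n + m) + a_min <= N <= sqrt(n + m) + a_max."""
    base = math.sqrt(n_inputs + n_outputs)
    return list(range(math.ceil(base + a_min), math.floor(base + a_max) + 1))

# 13 inputs and 1 output: sqrt(14) is about 3.74, so N lies in [4.74, 13.74].
candidates = hidden_node_candidates(13, 1)
```

With 13 inputs and 1 output this yields the integers 5 through 13, which agrees with the range stated above.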

The experimental results of the test set show that the LSTM model using 11 hidden nodes has the smallest RMSE value and the best prediction effect. The detailed error rate and root mean square error results of the test set are shown in Table 3.

4.3. LSTM Layer Depth

Since there are only 19 records in this example, a model that is too deep will overfit the data. The experiment compares models with 1∼5 layers, using 11 hidden nodes in each layer. After training, the model performance is judged by calculating the root mean square logarithmic error and the coefficient of determination over all records. The fitting results are shown in Table 4.

The smaller the RMSLE, the better the fitting effect. The closer the R-square is to 1, the stronger the ability of the variables to explain y and the better the model fits the data. According to the results in Table 4, the 2-layer LSTM model has the best fitting ability.

4.4. GBRT Layer Regularization

The regularization formula is as follows:

$$\mathcal{L}(\phi) = \sum_{i} l(\hat{y}_i, y_i) + \sum_{k} \Omega(f_k),$$

where $\Omega(f) = \gamma T + \frac{1}{2}\lambda \lVert w \rVert^2$. Here, l is a differentiable convex loss function that measures the difference between the prediction $\hat{y}_i$ and the target $y_i$; the second term Ω penalizes the complexity of the model (i.e., the regression tree functions). The additional regularization term helps to smooth the final learnt weights to avoid overfitting.
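To make the two terms of the objective concrete, the sketch below evaluates it for a tiny hand-written ensemble. It assumes the penalty has the form Ω(f) = γT + ½λ‖w‖², uses the squared error as the loss l, and represents each tree as a pair (q, w) of a leaf-index function and a list of leaf weights; the function names and all numbers are illustrative assumptions.

```python
def omega(leaf_weights, gamma, lam):
    """Complexity penalty: gamma * T + 0.5 * lam * sum of squared leaf weights."""
    return gamma * len(leaf_weights) + 0.5 * lam * sum(w * w for w in leaf_weights)

def predict(trees, x):
    """Additive prediction: each tree is (q, w), where q(x) returns a leaf index."""
    return sum(w[q(x)] for q, w in trees)

def objective(trees, xs, ys, gamma=1.0, lam=1.0):
    """Training loss plus the complexity penalty of every tree."""
    loss = sum((predict(trees, x) - y) ** 2 for x, y in zip(xs, ys))  # squared error as l
    return loss + sum(omega(w, gamma, lam) for _, w in trees)

# Two hand-written stumps on a scalar input; purely illustrative.
trees = [
    (lambda x: 0 if x < 0.5 else 1, [1.0, 3.0]),
    (lambda x: 0 if x < 0.2 else 1, [-0.5, 0.5]),
]
xs, ys = [0.1, 0.3, 0.8], [0.5, 1.5, 3.5]
value = objective(trees, xs, ys)
```

In this toy case the ensemble fits the three points exactly, so the whole objective value comes from the penalty term: more leaves or larger leaf weights raise it even when the training loss is zero.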

4.5. Hyperparameters of LSTM-GBRT Model

The hyperparameters of the LSTM layer include the number of network layers, the number of hidden cells in the layer, the learning rate, and the optimizer type, and the parameter settings are shown in Table 5.

The hyperparameters of the GBRT layer include the learning rate, the number of estimators, the maximum depth of the tree, the minimum number of samples required to split a node, the minimum number of samples required at a leaf node, and the loss function. This paper uses GridSearchCV to automatically find the optimal hyperparameters. The final parameter settings are shown in Table 6.
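A grid search over these GBRT hyperparameters might look like the following sketch, which uses `GridSearchCV` and `GradientBoostingRegressor` from scikit-learn. The synthetic data, the candidate values in `param_grid`, the scoring metric, and the 3-fold setting are all assumptions made for illustration; the grid actually searched and the resulting values in Table 6 are not reproduced here.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (NOT the paper's data): 15 samples, 13 features.
rng = np.random.default_rng(0)
X = rng.random((15, 13))
y = X @ rng.random(13) + 0.05 * rng.standard_normal(15)

# Hypothetical search grid over the hyperparameters listed above.
param_grid = {
    "learning_rate": [0.05, 0.1],
    "n_estimators": [50, 100],
    "max_depth": [2, 3],
    "min_samples_split": [2, 4],
    "min_samples_leaf": [1, 2],
}

search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid,
    scoring="neg_root_mean_squared_error",  # higher (less negative) is better
    cv=3,
)
search.fit(X, y)
best_params = search.best_params_
```

With so few samples, the cross-validated scores are noisy, so the selected combination should be read as a reasonable starting point rather than a uniquely optimal setting.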

5. Comparative Analysis of Experiments

5.1. Experimental Environment

The experiments were run on a TOSHIBA Satellite S40-A laptop with an Intel(R) Core(TM) i3-3217U CPU at 1.80 GHz and 10 GB of memory, running Windows 10 Enterprise Edition 2016 long-term service version. The development environment was the PyCharm integrated development tool with Python 3.5; the neural network models such as LSTM were provided by Keras, and the GBRT model was provided by scikit-learn.

5.2. Experimental Design and Analysis of Results

The experiments cover three types of models: traditional regression models, neural network models, and ensemble models. The models compared are multivariate nonlinear regression (MUL), the BP neural network model (BP), the long short-term memory neural network model (LSTM), the gradient boosted regression trees model (GBRT), and the LSTM-GBRT model. The 15 records from 1998 to 2012 were used as the training set, and the four records from 2013 to 2016 were used as the test set. The data from the previous year are used as the input sample to predict the number of traffic accident deaths in the following year. Figure 4 is a trend chart of actual traffic accident deaths from 1998 to 2016.
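The sample construction described above, in which each year's death toll is predicted from the previous year's record, can be sketched as follows. The tuple layout `(year, features, deaths)`, the helper names, and the dummy numbers are assumptions made for illustration; the records are placeholders, not the values in Table 1.

```python
def make_lagged_samples(records):
    """Build one sample per consecutive pair of years.

    records: list of (year, features, deaths) sorted by year. The inputs are
    the previous year's features plus its death toll; the target is the
    current year's death toll.
    """
    samples = []
    for prev, curr in zip(records, records[1:]):
        _, prev_features, prev_deaths = prev
        curr_year, _, curr_deaths = curr
        samples.append((curr_year, list(prev_features) + [prev_deaths], curr_deaths))
    return samples

def split_by_year(samples, last_train_year):
    """Chronological split: no shuffling, so the test years follow the training years."""
    train = [s for s in samples if s[0] <= last_train_year]
    test = [s for s in samples if s[0] > last_train_year]
    return train, test

# Placeholder records for 1997-2016 with dummy numbers (NOT the Table 1 values).
records = [
    (year, [float(year - 1996)] * 12, 1000.0 * (year - 1996))
    for year in range(1997, 2017)
]
samples = make_lagged_samples(records)
train, test = split_by_year(samples, last_train_year=2012)
```

Twenty yearly records yield 19 lagged samples, each with 13 inputs (12 factors plus the previous death toll), which split into 15 training samples for 1998 to 2012 and 4 test samples for 2013 to 2016.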

After experimental training, the prediction results of each model in the test set are shown in Table 7.

The prediction results on the test set show that the predictions of the BP neural network and MUL regression models follow no obvious pattern, and their prediction accuracy is not high.

The accuracy of LSTM in 2013, 2014, and 2015 was extremely high, but the deviation in 2016 suddenly increased. The trend of the actual number of deaths in Figure 4 shows that 2016 is an inflection point in the time period, whereas the trend of the first three test years is consistent with the trend of the training data set. This indicates that LSTM predicts very well when the trend continues; conversely, when the year being forecast is an inflection point of the trend, the performance drops suddenly. This also indicates that the LSTM model does learn the time-dependent information in the data.

The prediction results of the GBRT model and the LSTM-GBRT model did not fluctuate strongly across the test samples, and the overall prediction effect remained stable. Many of the predicted values of the GBRT model and the LSTM-GBRT model are the same. Our analysis is that the base model of the GBRT model is a regression tree and that the data fluctuations in the 2013–2016 test set are small. In addition, the results for 2015–2016 move away from the real data because the LSTM layer included in the LSTM-GBRT model learned time-dependent information, resulting in poor prediction at the trend inflection point.

Figure 5 shows the actual death toll in 1998–2016 and the fitted prediction results for each model.

After examining the prediction results of each model, we evaluate how well each model fits all the data. In addition to the performance indicators mentioned in Section 3.4, we add the training time of each model for comparison. The performance indicators are shown in Table 8.

The performance indicators of the different models in Table 8 show that the LSTM-GBRT model has the smallest RMSLE value and therefore the best fitting effect, and its R-square value is closest to 1, so its variables have the strongest explanatory power for the predicted value; however, its training time is the longest. The GBRT model has the shortest training time. The prediction performance of the GBRT model is good but slightly lower than that of the LSTM-GBRT model. The performance of the LSTM neural network is lower than that of the GBRT model, and the performance of the MUL regression model and the BP neural network model is poor.

In terms of training time, the MUL regression model and the GBRT model have the shortest training times because their essence is a linear combination of mathematical data; the LSTM-GBRT model has the longest training time, and the LSTM training time is very close to the BP neural network training time. The training time of the neural networks is clearly longer than that of the MUL and GBRT models because training a neural network model requires constructing a complex network structure.

5.3. Robustness Analysis

The occurrence of a traffic accident is influenced by many factors. A predictive model that can predict stably under complex and variable conditions has good robustness.

When analyzing the robustness of the model in this experiment, two aspects should be considered: first, internal factors, that is, whether there are abnormal fluctuations in the model training data; second, external factors, that is, whether policies introduced at the social level have promoted or inhibited the predicted quantity. For both internal and external factors, the core lies in the data. External factors act indirectly by affecting the data required for training, which in turn affects the prediction. Of the two, the external factors are the ones that are difficult to control.

In this case, the model uses annual data; policy factors act over a short period and cause little fluctuation in annual data, so the robustness is better. When the model uses finer-grained data, the influence of data fluctuation will increase. First, anomalous data should be analyzed visually, the uniformity of each variable should be observed, and unevenly distributed data should be transformed, for example with a log function. Second, the abnormal variables should be divided into two or more classes, each with its own processing strategy. After the correlation analysis is completed, two or more models are established for training and prediction. Finally, the prediction results of the multiple models are accumulated. Training models on specific classes of data can also improve the accuracy of prediction, thus enhancing the robustness of the model.

6. Conclusion

The prediction of traffic accidents is of great significance. Forecasting future traffic accident trends can help the traffic management department grasp the trend in time, discover the patterns of traffic accidents, formulate laws and regulations according to these patterns, make scientific decisions, and construct the traffic system reasonably.

This paper proposes a road traffic accident prediction model based on the LSTM-GBRT model. Compared with the traditional regression model, the traditional BP neural network model, the LSTM neural network model, and the GBRT model, the experimental results show that the LSTM-GBRT model has the strongest ability to fit the data and that its variables have the strongest explanatory power for the predicted value. The model has a good predictive ability for the trend of road traffic safety level and can provide more accurate forecast data for the traffic management department, so that the traffic management department can better grasp the situation of traffic safety levels.

The model proposed in this paper also has defects. (1) In terms of data collection, the model training lacks relevant data on environmental factors. Road traffic accidents are highly random, and their occurrence is affected by many factors. Environmental data are spatiotemporal data, which are difficult to collect and difficult to quantify at the level of annual traffic accident data, so weather and environment factors are missing from the model training data. (2) In terms of inflection point prediction, since an inflection point of the trend is unlikely to be discovered by the model in advance, the ability to forecast possible future inflection points is poor.

This paper takes China’s annual traffic accident data as the research object, so the prediction task is relatively macroscopic, and the predictability of microdata needs further experiments. We considered adding more relevant features, but in macrodata forecasting, related features are difficult to obtain or difficult to quantify. In future work that takes microscopic traffic accident data as the research object, more features can be added.

Data Availability

The raw data we used were official open data published by the UK Department of Transportation, and our experimental data were filtered from raw data online available at https://data.gov.uk/dataset/road-accidents-safety-data. The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was funded by the Xinjiang Uygur Autonomous Region Natural Science Fund Project “Research on Highway VANET Early Warning Information Broadcast Transmission Mechanism” (2017D01C042).