Abstract

Annual fatal traffic accident data often exhibit time series characteristics. Existing traffic safety analysis approaches (e.g., the negative binomial (NB) model) often cannot accommodate the dynamic impact of factors in fatal traffic accident data and may yield biased parameter estimates. Thus, a linear Poisson autoregressive (PAR) model is proposed in this study. The objective is to apply the PAR model to analyze the dynamic impact of traffic laws and climate on the frequency of fatal traffic accidents that occurred in Illinois over a long time span (from 1975 to 2016). In addition, the NB model, the NB model with a time trend, and the autoregressive integrated moving average model with exogenous input variables (ARIMAX) are developed to compare their performance. The main conclusions from the modelling results are as follows. (1) The PAR model is more appropriate for analyzing the dynamic impacts of traffic laws on annual fatal traffic accidents, especially the instantaneous impacts. (2) The law that allows motorcycles and bicycles to proceed on a red light, following the applicable rules, after a “reasonable period of time” increases the frequency of annual fatal traffic accidents by 14.98% in the short term and 30.69% in the long term. Climate factors such as average temperature and precipitation concentration period have insignificant impacts on annual fatal traffic accidents in Illinois. Overall, the modelling results suggest that the PAR model is more appropriate for annual fatal traffic accident data and has an advantage in estimating the dynamic impact of traffic laws.

1. Introduction

A report by the National Highway Traffic Safety Administration (NHTSA) shows that in 2016, 37,461 people were killed in 34,439 motor vehicle crashes, an average of 102 deaths per day. To reduce the number of traffic fatalities, it is important to analyze the factors influencing the frequency of fatal traffic accidents. Among these factors, traffic laws are considered an effective, macro-level means of regulation and control for reducing injury severity and the number of fatalities. Existing studies have analyzed the impact of certain traffic laws on the number of traffic accidents, such as seat belt laws [1], driving under the influence (DUI) laws [2, 3], and alcohol control laws [4]. However, the dynamic impacts of these traffic laws on traffic accidents have not been adequately studied.

Numerous traffic safety analysis models have been developed. Since traffic accident frequencies are non-negative integers, many studies assumed that such events follow a Poisson distribution and modelled them with a Poisson regression model [5, 6]. However, the Poisson model cannot handle overdispersed or underdispersed data and may yield biased estimates. To analyze overdispersed data, many studies proposed mixed Poisson models, such as the Poisson-gamma model (the negative binomial (NB) model) [7–13], the Poisson-lognormal model [14–16], and the Poisson-inverse gamma model [17]. For data with many zeros (i.e., excess zero-count data), zero-inflated models were applied, including the zero-inflated Poisson model [18, 19], the zero-inflated negative binomial model [20–22], and their extensions (e.g., the multiple random parameter zero-inflated negative binomial regression model [20] and the zero expansion Poisson regression model with random parameter effects [23]). Although rare, crash data can sometimes be underdispersed; for such data, the Conway–Maxwell–Poisson model [24] and the diagonal inflated bivariate Poisson regression model [25] are appropriate.

Many count regression models cannot properly address the time series characteristics of traffic accident count data. Noland et al. [26] included a time trend as an explanatory variable in a count regression model to account for serial correlation. However, this approach may not clearly account for the effects of serial correlation. An alternative approach models the dynamics in traffic accident count data with a lagged dependent variable in Poisson or NB models. These models fail to adequately represent the dynamics of persistent time series because they imply that the growth rate of the process is the exponentiated coefficient on the lagged dependent variable. Such a process may potentially generate time series data rather than dynamic data [27]. Both kinds of models are dynamic models with a trend, but not necessarily with a cyclical or dynamic component. Another approach to handling time series is the autoregressive integrated moving average (ARIMA) model and its extensions, including the seasonal autoregressive integrated moving average (SARIMA) model [28] and the nonlinear autoregressive model with exogenous inputs (NARX) [29]. These time series models may not be applicable to discrete time series variables (e.g., traffic accident count data). To account for both the time series and the discrete characteristics of the response variable, integer-valued autoregressive (INAR) Poisson models [30–34] were developed. However, the INAR model does not adequately describe the dynamic characteristics of the influential factors. Few approaches can adequately model both the dynamics and the distribution of annual fatal traffic accident data.

To address the above issues, this study proposes a linear Poisson autoregressive (PAR) model. The objective is to apply the PAR model to analyze, from a macroscopic point of view, the dynamic impact of traffic laws on annual fatal traffic accident frequency using data collected in Illinois from 1975 to 2016. The contributions are to demonstrate the performance of the PAR model in analyzing the dynamic influence of factors on traffic accident frequency and to quantify the impacts of traffic laws.

The rest of the paper is organized as follows. Section 2 introduces the specification, estimation, and interpretation of the PAR(p) model. Section 3 describes the dataset used in this study and its sources. Section 4 presents the statistical modelling results, which show the contributions of different factors to annual fatal traffic accidents in Illinois and compare the performance of the various models. Section 5 provides conclusions and recommendations for future research.

2. Methodology

2.1. The PAR Model

Before presenting the model, the linear autoregressive (AR) process is first introduced. The AR model, a common form of time series model, expresses a random variable at a given time as a linear combination of the random variables at earlier times, as in equation (1):

$$y_t = \sum_{i=1}^{p} \rho_i y_{t-i} + \varepsilon_t \tag{1}$$

where $y_t$ is the traffic accident count at time $t$, $y_{t-i}$ is the traffic accident count $i$ periods before time $t$, $\rho_i$ is the autocorrelation coefficient, and $\varepsilon_t$ is a random error term.
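To make equation (1) concrete, the following minimal Python sketch simulates an AR(p) process. It is illustrative only: the function name `simulate_ar`, the Gaussian errors, and the zero pre-sample values are assumptions of this example, not part of the study.

```python
import random


def simulate_ar(rho, n, sigma=1.0, seed=0):
    """Simulate n values of y_t = sum_i rho[i-1] * y_{t-i} + eps_t.

    eps_t ~ Normal(0, sigma^2); the p pre-sample values are set to zero
    (an assumption made for this sketch).
    """
    rng = random.Random(seed)
    p = len(rho)
    y = [0.0] * p  # pre-sample values
    for _ in range(n):
        lags = y[::-1][:p]  # y_{t-1}, y_{t-2}, ..., y_{t-p}
        mean = sum(r * lag for r, lag in zip(rho, lags))
        y.append(mean + rng.gauss(0.0, sigma))
    return y[p:]  # drop the pre-sample values


# 42 annual periods, matching the 1975-2016 span used in this study.
series = simulate_ar([0.5, 0.3], n=42)
```

Because the errors are continuous, this series is real-valued; that is exactly why a plain AR(p) process is not a direct model for accident counts and motivates the Poisson formulation below.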

Because the PAR model includes explanatory variables, the variables in the AR process must be redefined. The conditional information set $Y_{t-1}$ replaces the lagged values in the AR(p) model; it is a vector that includes all observed values of the dependent and independent variables up to time $t-1$:

$$Y_{t-1} = \{\, y_{t-i},\; X_{t-i} : i = 1, 2, \ldots \,\} \tag{2}$$

where $X_{t-i}$ are the past explanatory variables (factors affecting traffic accidents) $i$ periods before time $t$. $Y_{t-1}$ can be regarded as all the prior information about the series of interest at time $t$. Assume that $y_t$ is a realization from a Markov process with conditional transition probability $f(y_t \mid Y_{t-1})$ and conditional expectation $m_t = E(y_t \mid Y_{t-1})$, and let this conditional expectation have a finite mean. Then, $y_t$ is a $p$th-order linear autoregressive process as shown in the following equation:

Then, we can obtain equations (4) and (5) by using iterated expectations [35] of equation (3):where equation (5) is a geometric series for ; then,

Since , equation (3) can be written as

This is a linear AR(p) process whose derivation does not use the distribution of $y_t$; the only role of that distribution is to define the possible values of $y_t$.

The PAR(p) model can be defined as follows. The model assumes that the observed traffic accident counts are generated from a Poisson distribution conditional on $m_t$. The measurement equation for the observed value is therefore

$$\Pr(y_t \mid m_t) = \frac{m_t^{y_t} e^{-m_t}}{y_t!} \tag{8}$$

Assume that $m_t$ is the conditional mean of the linear AR process of $y_t$; it defines the state variable of the model. Given the measurement equation (8), this state density, with its mean and variance, belongs to the exponential family. The mean (state variable) of the marginal Poisson distribution evolves according to a stationary AR(p) process with autocorrelation parameters $\rho_i$, and its transition equation is

$$m_t = \sum_{i=1}^{p} \rho_i y_{t-i} + \left(1 - \sum_{i=1}^{p} \rho_i\right) \exp(X_t \beta) \tag{9}$$

where $X_t$ contains the explanatory variables (the factors affecting traffic accidents) at time $t$, $\beta$ is a $k$-vector of regression parameters, and $k$ is the number of factors.

Finally, assume that the state variable $m_t$ has a conjugate gamma prior with shape parameter $a_{t|t-1}$ and scale parameter $b_{t|t-1}$:

$$m_t \mid Y_{t-1} \sim \operatorname{Gamma}\left(a_{t|t-1},\, b_{t|t-1}\right) \tag{10}$$

The prior is constructed from the observed traffic accident data and helps determine the conditional mean and variance at time $t$ based on the previous observations.

Because the prior is gamma, the extended Kalman filter yields a conditional distribution of $m_t$, given the observed data, that is also gamma. Since the measurement equation is Poisson and the state distribution is gamma, the predictive distribution of $y_t$ at time $t$, equation (11), is negative binomial.
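The negative binomial form follows from mixing the Poisson measurement density over the gamma state density. As a short derivation, write the gamma density of $m_t$ given $Y_{t-1}$ with shape $a$ and scale $b$ (shorthand introduced here for the prior parameters; under a rate parameterization, replace $b$ by $1/b$) and drop the time subscripts on $m$:

```latex
\begin{aligned}
\Pr(y_t \mid Y_{t-1})
  &= \int_0^\infty \frac{m^{y_t} e^{-m}}{y_t!}\,
     \frac{m^{a-1} e^{-m/b}}{\Gamma(a)\, b^{a}}\, dm
   = \frac{1}{y_t!\,\Gamma(a)\, b^{a}}
     \int_0^\infty m^{y_t + a - 1} e^{-m(1 + b)/b}\, dm \\
  &= \frac{\Gamma(y_t + a)}{y_t!\,\Gamma(a)\, b^{a}}
     \left(\frac{b}{1 + b}\right)^{y_t + a}
   = \frac{\Gamma(y_t + a)}{\Gamma(y_t + 1)\,\Gamma(a)}
     \left(\frac{1}{1 + b}\right)^{a}
     \left(\frac{b}{1 + b}\right)^{y_t}
\end{aligned}
```

This is a negative binomial density with mean $ab$ and variance $ab(1 + b)$. Because the variance exceeds the mean, the predictive distribution accommodates overdispersion even though the measurement equation is Poisson.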

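Substituting the AR(1) form of the transition equation gives the PAR(1) model with a negative binomial predictive distribution. The one-step-ahead conditional forecast function for the PAR(p) model is

$$E(y_{t+1} \mid Y_t) = m_{t+1} = \sum_{i=1}^{p} \rho_i y_{t+1-i} + \left(1 - \sum_{i=1}^{p} \rho_i\right) \exp(X_{t+1} \beta) \tag{12}$$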
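A minimal Python sketch of the one-step-ahead conditional mean, assuming the transition equation has the linear PAR(p) form $m_t = \sum_i \rho_i y_{t-i} + (1 - \sum_i \rho_i)\exp(X_t\beta)$. The function names, the parameter check ($\rho_i \ge 0$ and $\sum_i \rho_i < 1$, assumed here to keep the mean positive and the process stationary), and the toy numbers are illustrative assumptions; the study itself was implemented in R.

```python
import math


def par_conditional_mean(y_lags, rho, x, beta):
    """m_t = sum_i rho_i * y_{t-i} + (1 - sum_i rho_i) * exp(x_t . beta).

    y_lags:  [y_{t-1}, ..., y_{t-p}] (most recent first)
    rho:     autoregressive coefficients [rho_1, ..., rho_p]
    x, beta: covariate values at time t and their coefficients
    """
    if len(y_lags) != len(rho):
        raise ValueError("need one lagged count per autoregressive coefficient")
    # Assumed constraints: non-negative rho_i summing to less than one.
    if any(r < 0 for r in rho) or sum(rho) >= 1:
        raise ValueError("expected rho_i >= 0 and sum(rho) < 1")
    ar_part = sum(r * y for r, y in zip(rho, y_lags))
    xb = sum(b * v for b, v in zip(beta, x))
    return ar_part + (1.0 - sum(rho)) * math.exp(xb)


def forecast_next(history, rho, x_next, beta):
    """One-step-ahead forecast E(y_{t+1} | Y_t) from a chronological count history."""
    p = len(rho)
    if len(history) < p:
        raise ValueError("history shorter than the autoregressive order")
    lags = history[::-1][:p]  # y_t, y_{t-1}, ..., y_{t-p+1}
    return par_conditional_mean(lags, rho, x_next, beta)


# Toy PAR(2) example: 0.3 * 100 + 0.2 * 120 + 0.5 * exp(0), approximately 54.5
next_mean = forecast_next([120, 100], rho=[0.3, 0.2], x_next=[0.0], beta=[1.0])
```

The sketch returns only the conditional mean; under the gamma state density the full predictive distribution around that mean is negative binomial.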

2.2. The Impact Multipliers

Because the PAR(p) model includes both the explanatory variable matrix and the autoregressive terms, its interpretation differs significantly from that of the Poisson and NB models [36]. As in a Gaussian linear autoregressive model, the effect of a change in an explanatory variable $x_k$ is described by impact multipliers. The instantaneous impact multiplier is the first derivative of the mean function with respect to this change:

$$\frac{\partial m_t}{\partial x_{k,t}} = \left(1 - \sum_{i=1}^{p} \rho_i\right) \beta_k \exp(X_t \beta) \tag{13}$$

where $\beta_k$ is the coefficient of the explanatory variable $x_k$. This is the instantaneous effect of a shock in a factor affecting traffic accidents on the mean number of traffic accidents $m_t$. As in Gaussian time series analysis, the long-run impact multiplier, i.e., the total effect of a shock to $x_k$, is

$$\beta_k \exp(X_t \beta) = \frac{1}{1 - \sum_{i=1}^{p} \rho_i} \, \frac{\partial m_t}{\partial x_{k,t}} \tag{14}$$

The long-run multiplier can be compared with the parameter estimates of other count regression models, which measure the impact of a shock on the conditional mean number of events.
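The relationship between the two multipliers can be sketched in Python, assuming the PAR(p) mean $m_t = \sum_i \rho_i y_{t-i} + (1 - \sum_i \rho_i)\exp(X_t\beta)$, so that the instantaneous multiplier is $(1 - \sum_i \rho_i)\beta_k\exp(X_t\beta)$ and the long-run multiplier is $\beta_k\exp(X_t\beta)$. The function name and the input values are hypothetical.

```python
import math


def impact_multipliers(rho, x, beta, k):
    """Return (instantaneous, long_run) impact multipliers for covariate k.

    Assumes m_t = sum(rho_i * y_{t-i}) + (1 - sum(rho_i)) * exp(x . beta), so
      long-run      = beta_k * exp(x . beta)
      instantaneous = (1 - sum(rho)) * long-run
    """
    level = math.exp(sum(b * v for b, v in zip(beta, x)))
    long_run = beta[k] * level
    instantaneous = (1.0 - sum(rho)) * long_run
    return instantaneous, long_run


# Hypothetical PAR(2) values: here the short-run effect is half the long-run effect.
inst, lr = impact_multipliers(rho=[0.3, 0.2], x=[1.0, 0.0], beta=[0.1, -0.2], k=1)

# With no autoregressive terms (the Poisson/NB case) the two multipliers coincide.
inst_nb, lr_nb = impact_multipliers(rho=[], x=[1.0, 0.0], beta=[0.1, -0.2], k=1)
```

The factor $1 - \sum_i \rho_i$ is the only difference between the two multipliers, so the more persistent the series, the larger the gap between short-run and long-run effects.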

The long-run and instantaneous impact multipliers correspond to the economic concepts of average impact and marginal impact, respectively. In economics, the average impact refers to the whole time span, while the marginal impact refers to the “present”. The long-run impact therefore accumulates the effects of the whole past. In contrast, the instantaneous impact multiplier, or marginal impact (the partial derivative of the conditional mean in equation (13)), focuses on the immediate effect going forward without considering the past.

In the Poisson and NB models, which have no autoregressive terms, the long-run and instantaneous impact multipliers coincide and can both be calculated by equation (14). The two multipliers differ in the PAR(p) model because it accounts for both the influence of the explanatory variables on traffic accidents and the dynamic responses to changes in these variables over time.

3. Data Description

The annual fatal traffic accident frequency data for Illinois from 1975 to 2016 were obtained from NHTSA’s Fatality Analysis Reporting System (FARS). The annual number of fatal traffic accidents was taken as the dependent variable to avoid underreporting caused by differing definitions of traffic accidents. Several traffic laws were considered in order to quantitatively evaluate their instantaneous and long-run impacts on traffic accidents. According to the WHO [37], five road safety risk factors (i.e., speeding, drunk driving, and the use of helmets, seatbelts, and child restraint systems) play an important role in traffic injuries and deaths, and Senna et al. [38] concluded that driving under the influence of alcohol is consistently a dominant problem. Because traffic laws have far-reaching effects on traffic accidents, studies of law effects are generally based on annual data [1, 39]. Thus, the traffic laws in Table 1 were selected for the analysis of annual fatal traffic accidents. The law variables were coded as binary variables. For example, the safety belt law in Illinois first took effect on January 1, 1988, so this variable equals 0 for the first 13 periods (1975–1987) and 1 thereafter (all traffic laws used in this study and their effective dates are shown in Table 1). If a law was implemented in the latter half of a year, it was considered effective from the following year.
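The coding rule for law variables can be written as a short function. This is a minimal sketch, assuming that "latter half of the year" means an effective date on or after July 1; the function name `law_dummy` is hypothetical.

```python
from datetime import date


def law_dummy(effective, first_year=1975, last_year=2016):
    """Annual 0/1 indicator for a traffic law over first_year..last_year.

    A law effective in the latter half of a year (assumed here: July 1 or
    later) is treated as taking effect from the following year.
    """
    start = effective.year + 1 if effective.month >= 7 else effective.year
    return [1 if year >= start else 0 for year in range(first_year, last_year + 1)]


# Safety belt law effective January 1, 1988: 0 for 1975-1987 (13 periods), then 1.
belt = law_dummy(date(1988, 1, 1))
```

A law taking effect on, say, September 1, 1990 would be coded 0 through 1990 and 1 from 1991 onward.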

In addition, the dataset includes various factors identified as related to traffic safety, in order to analyze how these explanatory variables affect annual fatal traffic accidents. To be consistent with the law variables, we selected annual macroscopic indicators for Illinois from 1975 to 2016. The dataset is assembled from a variety of sources, including the U.S. Energy Information Administration, the Federal Highway Administration, and the National Institute on Alcohol Abuse and Alcoholism, and covers economic, social, driver, climate, and law factors. Summary statistics of the dataset are shown in Table 1. Note that the climate factors differ from microscopic weather factors: they indicate wet and dry conditions over a large spatial and temporal range. The dataset includes gross domestic product (GDP), total vehicle miles of travel (VMT), rural VMT as a proportion of total VMT, per capita beer consumption, gasoline price, and the safety belt law, which have been used to analyze the influence of various factors on traffic accidents [3]. Geedipally et al. [3] demonstrated that DUI laws, beer consumption, the proportion of rural VMT, and economic shocks had significant effects on traffic fatalities. All economic indicators were converted to 2016 dollars using the consumer price index (CPI) calculator of the Bureau of Labor Statistics.
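The conversion to 2016 dollars is a rescaling by the ratio of price index levels. A minimal sketch, assuming a table of annual CPI levels; the index values below are placeholders for illustration, not actual BLS figures, and the function name is hypothetical.

```python
def to_2016_dollars(nominal, year, cpi):
    """Convert a nominal amount from `year` to 2016 dollars.

    real_2016 = nominal * CPI(2016) / CPI(year); `cpi` maps year -> index level.
    """
    if year not in cpi or 2016 not in cpi:
        raise KeyError("CPI level missing for the requested year or for 2016")
    return nominal * cpi[2016] / cpi[year]


# Placeholder index levels (illustrative only, not BLS data).
cpi = {1990: 100.0, 2016: 200.0}
price_2016 = to_2016_dollars(1.5, 1990, cpi)  # 3.0
```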

4. Modelling Results

Four models (i.e., the ARIMAX, PAR, NB, and NB with a time trend models) are developed using the Illinois data, with the ARIMAX, NB, and NB with a time trend models serving as benchmarks. All models are implemented in R. The main purpose is to identify the traffic laws that influence fatal traffic accidents. Figure 1 shows the trends in fatal traffic accident frequency and VMT over time: the frequency of fatal traffic accidents decreases significantly over time and shows serial correlation, while VMT increases over time.

For the PAR(p) model, the order p is determined first (Table 2). Starting from the PAR(1) model, stepwise regression is used to select the significant explanatory variables for each model, so the final models include only a subset of the original explanatory variables, as shown in Tables 2 and 3. The Akaike information criterion (AIC), which estimates the relative quality of statistical models for a given dataset, provides another means of order selection; the smaller its value, the better the model fit. Beyond a certain order, model complexity increases but AIC does not decrease significantly. As discussed by Eluru et al. [40], the Bayesian information criterion (BIC), unlike the AIC, imposes a larger penalty on overfitting with excess parameters. As Table 2 shows, the BIC values of the PAR(4) and PAR(5) models differ only slightly, because the PAR(5) model has more parameters than the PAR(4) model. In addition, insignificant parameter estimates are shown in bold in Table 2; such insignificant parameters in some of the candidate models would affect the analysis of the impact multipliers of the explanatory variables. Based on the modelling results in Table 2, the PAR(2) model is chosen as the final model.
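The order-selection logic can be sketched with the standard definitions $\mathrm{AIC} = 2k - 2\ln L$ and $\mathrm{BIC} = k\ln n - 2\ln L$, where $k$ is the number of estimated parameters and $n$ the number of observations ($n = 42$ annual observations for 1975–2016). The log-likelihoods below are invented for illustration and are not the values in Table 2; the helper names are hypothetical.

```python
import math


def aic(loglik, n_params):
    return 2 * n_params - 2 * loglik


def bic(loglik, n_params, n_obs):
    return n_params * math.log(n_obs) - 2 * loglik


def select_order(fits, n_obs, criterion="bic"):
    """Return the order p minimising AIC or BIC; fits maps p -> (loglik, n_params)."""
    def score(p):
        loglik, k = fits[p]
        return aic(loglik, k) if criterion == "aic" else bic(loglik, k, n_obs)
    return min(fits, key=score)


# Hypothetical fits: each extra lag adds one parameter but improves ln L less and less.
fits = {1: (-200.0, 6), 2: (-195.0, 7), 3: (-194.5, 8)}
best_aic = select_order(fits, n_obs=42, criterion="aic")
best_bic = select_order(fits, n_obs=42, criterion="bic")
```

With $n = 42$, each additional parameter costs $\ln 42 \approx 3.74$ under BIC versus 2 under AIC, which is why BIC penalizes extra lags more heavily.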

In addition to the PAR(2) model, the ARIMAX, NB, and NB with a time trend models are compared as alternatives. Because the data form a time series, the ARIMAX model, which incorporates explanatory variables, is selected as one alternative; based on AIC values, the final specification is ARIMAX(1, 1, 0). The NB model, the most commonly used model in traffic accident frequency analysis, is also considered. Furthermore, the NB with a time trend model, which accounts for both the time series and the discrete characteristics of traffic accident frequency in a simple way, is compared. The parameters of these models are estimated by maximum likelihood.

The results of these models are shown in Table 3. According to the AIC and BIC values, the PAR(2) model fits this dataset best, followed by the NB with a time trend model and the ARIMAX(1, 1, 0) model; the NB model fits worst. The models that account for temporal structure seem to fit best because the traffic accidents appear to be serially correlated. Taking AIC as an example, the PAR model improves the fit by 12% compared with the NB model, 6% compared with the ARIMAX model, and 5% compared with the NB with a time trend model. However, the explanatory variable coefficients estimated by the ARIMAX model are not significant; in other words, the ARIMAX model cannot explain how these factors affect annual fatal traffic accidents. For the NB with a time trend model, most explanatory variables are significant and the parameter estimates are similar to those of the NB and PAR(2) models, but the belt law and beer consumption variables are insignificant. This suggests that this model cannot fully explain the impact of the various legal factors on traffic accident frequency. Since all variables in the PAR(2) and NB models are statistically significant, their fitting performance and parameter estimates are compared in the following paragraphs.

Apart from the ARIMAX model, all models are count regression models. In modelling fatal traffic accidents, total VMT is treated as an offset term because fatal traffic accidents are linearly related to total VMT [3]. Qualitatively, the coefficients estimated by both the NB and PAR models show that beer consumption has the greatest impact among these factors on the frequency of annual fatal traffic accidents (Table 3). However, the AIC and BIC values of the PAR(2) model are much smaller than those of the NB model, indicating that the PAR model captures the dynamics in fatal traffic accident data and provides better fitting performance. Beyond these goodness-of-fit statistics, we further compare the modelling results of the PAR(2) and NB models.
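In a log-link count regression, an offset is a regressor whose coefficient is fixed at 1, so using log VMT as the offset makes the expected count proportional to VMT. A minimal sketch under that assumption; the function name, units, and numbers are illustrative only.

```python
import math


def expected_fatal_crashes(vmt, x, beta, intercept):
    """Log-link mean with a VMT offset: log E[y] = log(VMT) + intercept + x . beta.

    The offset's coefficient is fixed at 1, so E[y] is proportional to VMT.
    """
    linear = intercept + sum(b * v for b, v in zip(beta, x))
    return math.exp(math.log(vmt) + linear)


# Doubling VMT doubles the expected count, holding the covariates fixed.
base = expected_fatal_crashes(100.0, [1.0], [0.2], -3.0)
doubled = expected_fatal_crashes(200.0, [1.0], [0.2], -3.0)
```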

For dynamic models, the coefficients in Table 3, which represent the average effects of the explanatory variables, do not convey the full results [41]. Moreover, generalized linear models include a link function, which makes it difficult to interpret the raw coefficients in isolation [41]. In other words, the estimated coefficients in Table 3 cannot directly quantify the impact of a unit change in a factor on fatal traffic accident frequency. Therefore, to compare the effects of explanatory variables on fatal traffic accidents across models, the long-run and instantaneous multipliers are calculated using equations (13) and (14). Since this study focuses on the impact of traffic laws on annual fatal traffic accidents, only the impact multipliers of the law variables are presented in the first column of Tables 4 and 5. Because of the dynamic structure of the PAR(2) model, the long-run impact multiplier is larger in magnitude than the instantaneous impact multiplier. Three laws (i.e., the belt law, DUI toughened penalties, and the alcohol law) decrease fatal traffic accidents, and the DUI toughened penalties law has the greatest influence: after its implementation, the frequency of fatal traffic accidents decreases by 91 in the short run and 186 in the long run. In contrast, the law that allows motorcycles and bicycles to proceed on a red light, following the applicable rules, after a “reasonable period of time” (red running) increases the frequency of fatal traffic accidents. Pai and Jou [42] previously revealed a strong association between bicyclist red-running and accidents in Taiwan. The red running law leads to a total increase of 201 fatal traffic accidents in the long run and 98 in the short run, which indicates that this law is not conducive to traffic safety.
This law may be intended to improve the traffic efficiency of motorcycles and bicycles and reduce their travel time, but improving traffic efficiency at the expense of traffic safety is not desirable.
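If equations (13) and (14) take the PAR form in which the instantaneous multiplier equals $(1 - \sum_i \rho_i)$ times the long-run multiplier, the ratio of short-run to long-run effects should be the same for every variable. A quick Python check on figures reported in this paper (the DUI counts above and the percentage changes given in the Abstract, Conclusions, and belt-law discussion) illustrates this; the inferred autoregressive sum is a back-of-the-envelope implication of the rounded published numbers, not an estimate reported by the authors.

```python
# Instantaneous (short-run) and long-run PAR(2) effects as reported in this paper.
reported = {
    "DUI toughened penalties (accidents)": (91, 186),
    "DUI toughened penalties (% change)": (12.52, 25.65),
    "red running (% change)": (14.98, 30.69),
    "belt law (% change)": (-7.32, -15.0),
}

# If instantaneous = (1 - rho_1 - rho_2) * long-run, each ratio estimates 1 - (rho_1 + rho_2).
ratios = {name: short / long_run for name, (short, long_run) in reported.items()}
implied_rho_sum = {name: 1.0 - r for name, r in ratios.items()}

for name, r in ratios.items():
    print(f"{name}: ratio = {r:.3f}, implied rho_1 + rho_2 = {implied_rho_sum[name]:.3f}")
```

All four ratios lie between 0.48 and 0.49, so the reported short-run and long-run effects share a common factor of roughly 0.49, implying $\rho_1 + \rho_2 \approx 0.51$ under the assumed form.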

Finally, the impact multipliers of the NB and PAR(2) models are compared (Tables 4 and 5). All parameters estimated by the NB and PAR models have the same signs. The frequency of fatal traffic accidents decreases with higher gasoline prices, the implementation of the belt law, and the enforcement of the DUI penalty law, and it increases with rising beer consumption and when red running is allowed. However, the values of the impact multipliers estimated by the NB and PAR(2) models differ substantially. Take the red running law, the only law variable in the dataset that increases the frequency of traffic accidents, as an example. Its instantaneous impact in the PAR(2) model is about 98, meaning that implementing the law increased the number of fatal accidents by about 98 at the time of implementation. Its long-run multiplier is 201, meaning that the number of accidents increased by 201 in the long run. The NB model estimates the impact of the red running law with a single multiplier of 113. The corresponding percentage changes are shown in Tables 4 and 5. For the PAR(2) model, the red running law increases the number of fatal traffic accidents by 30.69% in total and by 14.98% instantaneously; for the NB model, the total change is 17.59%. Thus, the instantaneous effect from the PAR(2) model is smaller than that from the NB model, while the long-run effect from the PAR(2) model is larger. This is because the estimated coefficients of the PAR(2) model incorporate dynamic characteristics: the long-run multiplier accounts for the impact of earlier periods on the present, while the instantaneous effect captures only the current impact.
Since the NB coefficients describe an average effect, the multipliers calculated from them can only describe an average effect. The instantaneous effect implied by the NB model is therefore really an average effect: because the NB model cannot account for the dynamics of fatal traffic accident data, it overestimates the instantaneous (short-run) impact of the explanatory variables on fatal traffic accidents.

For the remaining three variables, both the long-run and instantaneous impact multipliers estimated by the PAR model are smaller in magnitude than those estimated by the NB model. For the belt law, for example, the PAR model estimates instantaneous and long-run percentage changes of −7.32% and −15%, respectively, whereas the NB model estimates −20.15%. This again indicates that the NB model overestimates the impact of explanatory variables, especially the instantaneous impacts. The dynamic nature of the PAR(p) model makes it more suitable for estimating the dynamic impact of traffic laws on annual fatal traffic accidents. Knowing the instantaneous impact of a safety intervention can help transportation management agencies design more appropriate traffic laws, and the NB model cannot provide such information.

5. Conclusions

Annual fatal traffic accidents are count data with time series characteristics. Existing traffic accident analysis models cannot fully capture these dynamic characteristics or the dynamic influence of explanatory variables on annual fatal traffic accidents. Among the many explanatory variables in traffic accident analysis, the dynamic effect of traffic law enforcement has received little attention. In this study, a linear Poisson autoregressive model is proposed to analyze the long-run and instantaneous impacts of traffic laws on annual fatal traffic accidents, and the modelling results of the PAR(p), ARIMAX, NB, and NB with a time trend models are compared. The major conclusions are summarized as follows:

(1) On this dataset, the PAR model outperforms the ARIMAX, NB, and NB with a time trend models in terms of fitting performance and estimation of dynamic effects, and it is more suitable for analyzing the dynamic impact of traffic laws on annual fatal traffic accidents. Compared with the ARIMAX model, the PAR(p) model accounts for the discrete nature of the accident data and can quantify the influence of factors. Compared with the NB with a time trend model, the PAR(p) model can analyze the influence of more explanatory variables on the frequency of fatal traffic accidents. Compared with the NB model, the PAR(p) model captures the time series structure of annual fatal traffic accident frequency and yields the dynamic effects of traffic laws and other explanatory variables. Omitting the dynamics from the NB model leads to biased parameter estimates and, in particular, makes it impossible to estimate the instantaneous multipliers of factors. Yet instantaneous multipliers indicate the immediate effects of traffic law interventions, which can help traffic safety management agencies make new laws.

(2) Several climate and traffic law factors are considered to quantitatively evaluate their impact on annual fatal traffic accidents in Illinois. The average temperature and precipitation concentration period have insignificant impacts. The DUI toughened penalties law decreases annual fatal traffic accidents by 12.52% in the short run and 25.65% in the long run, the greatest inhibitory effect among the analyzed laws. However, the law allowing red running increases annual fatal traffic accidents by 14.98% in the short term and 30.69% in the long term. Therefore, controlling DUI behavior and revising the red running law may substantially reduce the annual frequency of fatal traffic accidents, which provides guidance for future traffic law development.

The PAR(p) model can be widely applied to time series count data. Beyond the traffic laws considered in this paper, it can be applied to other explanatory variables that change abruptly, such as newly introduced policies and regulations, whose dynamic impacts the PAR(p) model can explain well. Future research can apply the PAR(p) model to traffic accident data collected from other sites. Furthermore, with the development of data acquisition technology, multisource datasets [43–46] can be used to analyze traffic accidents.

Data Availability

The experimental data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was sponsored jointly by the National Key Research and Development Program of China (grant no. 2018YFE0102800), the Shanghai Science and Technology Committee (grant no. 19210745700), and the Fundamental Research Funds for the Central Universities (grant no. 22120200035).