Journal of Advanced Transportation

Volume 2017, Article ID 5391054, 8 pages

https://doi.org/10.1155/2017/5391054

## Bayesian Hierarchical Modeling Monthly Crash Counts on Freeway Segments with Temporal Correlation

School of Civil Engineering and Transportation, South China University of Technology, Guangzhou, Guangdong 510641, China

Correspondence should be addressed to Huiying Wen; hywen@scut.edu.cn

Received 26 June 2017; Accepted 11 September 2017; Published 24 October 2017

Academic Editor: Francesco Bella

Copyright © 2017 Qiang Zeng et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

As the basis of traffic safety management, crash prediction models have long been a prominent focus in the field of freeway safety research. Studies usually take years or seasons as the observed time units, which may result in heterogeneity in crash frequency. To eliminate that heterogeneity, this study analyzes monthly crash counts and develops Bayesian hierarchical models with random effects, lag-1 autoregression (AR-1), and both (REAR-1) to accommodate the multilevel structure and temporal correlation in crash data. The candidate models are estimated and evaluated in the freeware WinBUGS using a crash dataset obtained from the Kaiyang Freeway in Guangdong Province, China. Significant temporal effects are found in the three models, and Deviance Information Criterion (DIC) results show that taking temporal correlation into account considerably improves the model fit compared with the Poisson model. The hierarchical models also avoid any misidentification of the factors with significant safety effects, because their variances are greater than in the Poisson model. The DIC value of the AR-1 model is substantially lower than that of the random effect model and equivalent to that of the REAR-1 model, which indicates the superiority of the lag-1 autoregressive structure in accounting for the temporal effects in crash frequency.

#### 1. Introduction

The freeway has become a primary method of long-distance passenger and cargo transportation due to its high capacity and potential for high speed. The traffic on freeways consists of more motor vehicle types than urban traffic and includes passenger cars, coaches, vans, and light/medium/heavy trucks. This diversity in freeway traffic composition may result in numerous vehicle interactions, and the high speeds result in shorter driver response times when encountering emergencies. Freeways in mountainous areas may also suffer from poor geometrical design and adverse weather conditions [1, 2], which increase the crash risk. Developing a crash prediction model (or safety performance function) that provides a good understanding of the crash occurrence mechanism on a freeway is thus vital [1] when ranking sites for safety improvement and for accurately evaluating the effectiveness of countermeasures.

In the absence of detailed driving information (e.g., acceleration, braking, and steering), most studies have analyzed the relationship between the risk factors and crash frequency [3]. The most common analytical methods are statistical count models, which explicitly illustrate the random, discrete, and nonnegative nature of crash frequency data and the safety effects of the main contributing factors [4]. Poisson regression is the basic model for crash prediction, in which crash count is assumed to follow a Poisson distribution that requires the mean to be equal to the variance [5]. To account for important issues related to crash data such as overdispersion, underdispersion, excess zero observations, spatiotemporal correlation, multilevel structures, and unobserved heterogeneity, a great many Poisson model variations have been proposed, which significantly improve model fit and predictive performance [3, 6]. With more recent advances in crash prediction modeling, Bayesian inference has been extensively applied to traffic safety analysis because of its ability to deal with complex models (often without closed-form likelihood functions) such as the hierarchical model [1, 2], spatiotemporal model [7, 8], random parameters model [9], and multivariate model [10] and through these [8]. The freeware WinBUGS provides a user-friendly platform for making Bayesian inferences using Markov chain Monte Carlo (MCMC) techniques. The integrated nested Laplace approximation (INLA) approach has additionally been developed as a computationally efficient alternative to the MCMC methods, and an R package (R-INLA) is available to easily apply the approach [11].

Consideration of multiple levels is extremely important in freeway safety analysis because panel data are used for crash modeling [12]. Bayesian hierarchical models are thus the most widely used methods in freeway safety. However, freeway crash frequencies have previously been aggregated by year or season, which may result in information loss in time-varying explanatory variables. To avoid this phenomenon, crash data should be aggregated into small time intervals (e.g., months). This manipulation results in the same freeway section generating multiple observations, which may be correlated over time because of their shared effects of unobserved or unobservable time-dependent factors. Washington et al. [13] pointed out that ignoring temporal correlation will lead to an underestimation of the parameters’ variances and thus potentially lead to the incorrect identification of the contributing factors, which has significant consequences for safety.

The widely used hierarchical Poisson model (also called the random effect model) is able to accommodate temporal correlation to some extent, but the added residual term is unstructured and may therefore not fully account for the temporal correlation. In addition to the random effect model, a variety of methodological approaches have been proposed to assess temporal effects in crash frequency data. These include generalized estimating equations with independent, exchangeable, autoregressive, or unstructured temporal terms [14, 15], a Bayesian hierarchical Poisson model with lag-1 autoregression (AR-1) [7], an autoregressive integrated moving average model [16], an integer-valued autoregressive Poisson model [17], a latent variable representation of count data models with autoregressive temporal terms [18], and a multinomial generalized Poisson model with temporal dependence [19]. Of these approaches, the Bayesian hierarchical Poisson AR-1 model, which has the simplest formulation, is able to account for a multilevel structure and temporal correlation simultaneously. In both the generalized estimating equation and the hierarchical Poisson regression modeling frameworks, the autoregressive terms have been found to significantly outperform unstructured terms in model fit [7, 15].

In this study, the key objective is to develop a hierarchical temporal model to analyze freeway crash frequency aggregated by month, which accommodates a multilevel structure in panel crash data and temporal correlation across observations at the same site. Bayesian hierarchical Poisson and hierarchical Poisson AR-1 models are the two candidate methods. An approach integrating the two methods is also proposed, to simultaneously account for the structured and unstructured temporal effects. A Poisson model is used as a benchmark to demonstrate these temporal models, and they are calibrated and compared in the Bayesian context using a year’s worth of crash data from the Kaiyang Freeway in China.

The remainder of this paper is organized as follows. In the next section, the alternative models and a model comparison criterion are specified. Section 3 describes the data collected for model demonstration. Section 4 presents the details of model estimation and discusses the results of model comparison and parameter estimation. Section 5 concludes and presents directions for future research.

#### 2. Methodology

In this section, the structure of the Poisson model is formulated. The formulations of the three Bayesian hierarchical models for predicting crash frequency with temporal correlation are then specified in order of complexity. Finally, the Deviance Information Criterion (DIC) is introduced for the purpose of model comparison.

##### 2.1. Model Specification

###### 2.1.1. Poisson Model

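In the Poisson model, crash occurrence is assumed to be a Poisson process. That is, the crash count $Y_{it}$ on freeway segment $i$ during month $t$ is assumed to follow a Poisson distribution [5]:

$$Y_{it} \sim \mathrm{Poisson}(\lambda_{it}), \quad i = 1, 2, \ldots, N, \quad t = 1, 2, \ldots, T,$$

where $N$ and $T$ are the number of observed sites and periods, respectively, and $\lambda_{it}$ is the underlying Poisson mean of $Y_{it}$. Conceptually, the expected crash count is modeled as the product of crash exposure and crash risk [8]:

$$\lambda_{it} = \mathrm{Exposure}_{it} \times \mathrm{Risk}_{it}.$$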

Crash exposure is defined as the number of opportunities for crashes in a given time in a given area. The crash exposure of a roadway segment is generally associated with its length and the traffic volume. Forms proposed in previous studies include annual average daily traffic [20] and vehicle miles traveled [1]. In the current research, the observational time unit is the month. As the number of days differs between months, the monthly total traffic (MTT) is used as a crash exposure variable to specify the traffic volume precisely. The freeway crash exposure is formulated as the product of a power of MTT and a power of segment length, which captures the potential nonlinear relationship between crash frequency and traffic volume [8]:

$$\mathrm{Exposure}_{it} = \mathrm{MTT}_{it}^{\alpha_1} \, L_i^{\alpha_2},$$

in which $L_i$ is the length of freeway segment $i$ and $\mathrm{MTT}_{it}$ is its MTT during month $t$. The two parameters to be estimated are $\alpha_1$ and $\alpha_2$.

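A generalized linear function is assumed between the crash risk $\mathrm{Risk}_{it}$ and the observed risk factors $\mathbf{X}_{it}$:

$$\ln(\mathrm{Risk}_{it}) = \beta_0 + \mathbf{X}_{it}\boldsymbol{\beta},$$

where $\beta_0$ is the intercept and $\boldsymbol{\beta}$ are the coefficients corresponding to the risk factors.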
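As a minimal sketch of the multiplicative structure described in this subsection, the following Python function computes a Poisson mean as exposure times risk. The function and parameter names (`expected_crash_count`, `alpha_mtt`, `alpha_len`, `beta0`, `beta`), the log link for crash risk, and the separate intercept are illustrative assumptions; they are not notation or code taken from the study, whose models were estimated in WinBUGS.

```python
import math


def expected_crash_count(mtt, length, x, alpha_mtt, alpha_len, beta0, beta):
    """Poisson mean for one segment-month: exposure * risk.

    All argument names are hypothetical. `x` and `beta` are equal-length
    sequences of risk factors and their coefficients.
    """
    if len(x) != len(beta):
        raise ValueError("x and beta must have the same length")
    # Exposure: a power of monthly total traffic times a power of length.
    exposure = (mtt ** alpha_mtt) * (length ** alpha_len)
    # Risk: log-linear in the observed risk factors (assumed log link).
    risk = math.exp(beta0 + sum(b * xi for b, xi in zip(beta, x)))
    return exposure * risk


def poisson_log_pmf(y, lam):
    """Log-probability of observing y crashes when the Poisson mean is lam."""
    return y * math.log(lam) - lam - math.lgamma(y + 1)
```

With both exponents equal to 1, the expected count scales linearly with traffic and length; exponents below 1 would make the count grow more slowly than exposure, which is the kind of nonlinearity the power form is meant to allow.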

###### 2.1.2. Random Effect Model

The monthly crash counts may be affected by unobserved or unobservable factors related to the freeway section, resulting in site-specific effects [1]. The shared site-specific effects of the crash counts on the same freeway section during different months are referred to as unstructured temporal effects [3]. To account for the site-specific/unstructured temporal effects in the random effect model, a residual term $\theta_i$ is added to the generalized linear function for modeling crash risk:

$$\ln(\mathrm{Risk}_{it}) = \beta_0 + \mathbf{X}_{it}\boldsymbol{\beta} + \theta_i,$$

where $\theta_i$ is assumed to follow a normal distribution with mean 0 and standard deviation $\sigma_\theta$:

$$\theta_i \sim \mathrm{Normal}(0, \sigma_\theta^2).$$

###### 2.1.3. Autoregression-1 (AR-1) Model

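The AR-1 model accounts for the temporal correlation among crash frequencies during successive months by specifying a residual term $\phi_{it}$ with lag-1 dependence [7], where lag-1 means that the temporal effect on a specific freeway section during a month is affected by its counterpart during the previous month:

$$\ln(\mathrm{Risk}_{it}) = \beta_0 + \mathbf{X}_{it}\boldsymbol{\beta} + \phi_{it},$$

where the temporal terms are assumed to follow normal distributions, which are based on the stationarity assumption [21]:

$$\phi_{i1} \sim \mathrm{Normal}\left(0, \frac{\sigma_\phi^2}{1 - \rho^2}\right),$$

$$\phi_{it} \sim \mathrm{Normal}\left(\rho\,\phi_{i,t-1}, \sigma_\phi^2\right), \quad t > 1.$$

In the above two equations, $\rho$ is the autocorrelation coefficient and $\sigma_\phi$ is the standard deviation of the temporal terms.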
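To illustrate the lag-1 dependence, the sketch below simulates temporal terms for a single freeway section. It assumes the usual stationary AR(1) form, in which the first term has variance $\sigma^2 / (1 - \rho^2)$ and each later term is centered on $\rho$ times its predecessor; the function name `simulate_ar1_terms` and its arguments are hypothetical and are not taken from the study's WinBUGS code.

```python
import random


def simulate_ar1_terms(n_months, rho, sigma, rng):
    """Draw one sequence of AR(1) temporal terms for a single segment.

    rho   : autocorrelation coefficient, must satisfy |rho| < 1
    sigma : standard deviation of the month-to-month innovation
    rng   : a random.Random instance (pass a seeded one for repeatability)
    """
    if n_months < 1:
        raise ValueError("n_months must be at least 1")
    if not -1.0 < rho < 1.0:
        raise ValueError("stationarity requires |rho| < 1")
    # First month: drawn from the stationary distribution of the process.
    initial_sd = sigma / (1.0 - rho ** 2) ** 0.5
    terms = [rng.gauss(0.0, initial_sd)]
    # Later months: centered on rho times the previous month's term.
    for _ in range(1, n_months):
        terms.append(rng.gauss(rho * terms[-1], sigma))
    return terms


# Example: twelve monthly terms with moderate positive autocorrelation.
example_terms = simulate_ar1_terms(12, 0.5, 0.3, random.Random(2014))
```

Values of `rho` close to 1 produce slowly drifting sequences, whereas `rho = 0` reduces the terms to independent noise with no carry-over between months.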

###### 2.1.4. Random Effect with Autoregression-1 (REAR-1) Model

As mentioned above, the random effect and AR-1 models account for unstructured and structured temporal effects, respectively. To combine the strengths of the two models, both the unstructured and structured residual terms, $\theta_i$ and $\phi_{it}$, are added to the crash risk modeling function, resulting in the REAR-1 model:

$$\ln(\mathrm{Risk}_{it}) = \beta_0 + \mathbf{X}_{it}\boldsymbol{\beta} + \theta_i + \phi_{it}.$$
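The four candidate models differ only in which residual terms enter the crash risk function. The short sketch below makes that nesting explicit; the function `log_risk`, the model labels, and the assumption that the risk function is on the log scale are illustrative choices rather than part of the study.

```python
def log_risk(linear_part, site_effect=0.0, ar_term=0.0, model="poisson"):
    """Log crash risk for one segment-month under each candidate model.

    linear_part : contribution of the observed risk factors
    site_effect : unstructured, site-specific residual (random effect)
    ar_term     : structured, lag-1 autoregressive residual
    model       : one of "poisson", "re", "ar1", "rear1" (hypothetical labels)
    """
    if model == "poisson":
        return linear_part                          # no residual term
    if model == "re":
        return linear_part + site_effect            # unstructured only
    if model == "ar1":
        return linear_part + ar_term                # structured only
    if model == "rear1":
        return linear_part + site_effect + ar_term  # both terms
    raise ValueError(f"unknown model: {model}")
```

Setting either residual to zero in the REAR-1 case recovers the corresponding simpler model, which is why the three hierarchical models can be compared on a common footing.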

##### 2.2. Model Comparison

The DIC is commonly used for measuring the goodness-of-fit of models estimated by Bayesian methods [7, 8, 22]. As in previous research, it is used here to compare the models formulated above. The DIC is intended to be a Bayesian generalization of Akaike’s Information Criterion that penalizes models with more parameters. Specifically, it provides a Bayesian measure of model complexity and fit and is given by [23]

$$\mathrm{DIC} = \bar{D} + p_D,$$

where $\bar{D}$ is the posterior mean deviance, which can be taken as a Bayesian measure of fit, and $p_D$ is a complexity measure for the effective number of parameters. Generally, models with lower DIC values are preferable. However, a critical difference in DIC is very difficult to determine. According to Spiegelhalter et al. [24], differences of more than roughly 10 may rule out the model with the higher DIC; differences between 5 and 10 are considered substantial; and if the DIC difference is less than 5 and the parameter estimation results are significantly different, then it could be misleading to simply report the model with the lowest DIC.
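As a rough illustration of how the DIC is assembled from MCMC output, the sketch below computes it from a list of deviance values, one per posterior draw. It assumes the standard definition in which the effective number of parameters equals the posterior mean deviance minus the deviance evaluated at the posterior means of the parameters, and it takes the deviance to be minus twice the log-likelihood; the function names are hypothetical.

```python
import math


def poisson_deviance(counts, means):
    """Deviance (-2 * log-likelihood) of Poisson counts given their means."""
    log_lik = sum(
        y * math.log(lam) - lam - math.lgamma(y + 1)
        for y, lam in zip(counts, means)
    )
    return -2.0 * log_lik


def dic(deviance_draws, deviance_at_posterior_mean):
    """Return (DIC, posterior mean deviance, effective number of parameters).

    deviance_draws             : deviance evaluated at each posterior draw
    deviance_at_posterior_mean : deviance evaluated at the posterior means
    """
    d_bar = sum(deviance_draws) / len(deviance_draws)
    p_d = d_bar - deviance_at_posterior_mean  # complexity penalty
    return d_bar + p_d, d_bar, p_d
```

For example, deviance draws of 10, 12, and 14 with a deviance of 11 at the posterior means give a mean deviance of 12, one effective parameter, and a DIC of 13.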

#### 3. Data Preparation and Preliminary Analysis

To calibrate the candidate models and compare their performances on model fit, the crash, traffic, and roadway data on Kaiyang Freeway in Guangdong Province, China, in 2014 were collected. Kaiyang Freeway has four lanes and a median barrier. Its total length is about 125 km and the posted speed limit is 120 km/h. The disaggregated crash data are obtained from the Highway Maintenance and Administration Management Platform of the Guangdong Transportation Group. The traffic data are acquired from the Guangdong Freeway Networked Toll System, and the roadway data are extracted from the Horizontal and Longitudinal Profile, designed by Guangdong Province Communication Planning and Design Institute Co., Ltd.

The first and essential step in data preparation is roadway segmentation. With reference to previous studies on freeway traffic analysis [1], the major criterion used for segmenting the freeway is homogeneity in roadway horizontal and vertical alignments. In addition, the minimum length of each segment is set to 150 m, to eliminate the low exposure issue and the high statistical uncertainty of the crash risk on short segments. Segments shorter than 150 m are combined with proximal segments that have similar roadway features where possible. According to the two segmentation criteria, Kaiyang Freeway is divided into 154 segments ($N = 154$).

Crashes are mapped to these segments based on their locations along the freeway, as recorded in the collected crash data. The crash counts are aggregated by segment and month ($T = 12$). In the Networked Toll System, vehicles are classified into five categories according to their head height, number of axles, number of wheels, and wheelbase. The classification criteria are listed in Table 1. The initial traffic volumes in the system are recorded for each vehicle category. For freeway segment $i$ and month $t$, the weighted average traffic $\mathrm{MTT}_{it}$ is calculated by using the weights 1, 1.5, 2, 3, and 3.5 for vehicle categories 1, 2, 3, 4, and 5, respectively:

$$\mathrm{MTT}_{it} = V_{1,it} + 1.5\,V_{2,it} + 2\,V_{3,it} + 3\,V_{4,it} + 3.5\,V_{5,it},$$

where $V_{1,it}$, $V_{2,it}$, $V_{3,it}$, $V_{4,it}$, and $V_{5,it}$ are the traffic volumes for vehicle categories 1, 2, 3, 4, and 5, respectively, on freeway segment $i$ during month $t$. The traffic composition, that is, the percentages of each vehicle category, $P_{1,it}$, $P_{2,it}$, $P_{3,it}$, $P_{4,it}$, and $P_{5,it}$, is calculated as