Green Intelligent Transport System
A Quasi-Poisson Approach on Modeling Accident Hazard Index for Urban Road Segments
Recent studies on crash risk evaluation typically adopt accident counts on specific transportation facilities to reflect the chance of crash occurrence. The current study introduces a more comprehensive measure that supplements these counts with information on accident harmfulness; this measure of accident risk is referred to as the Accident Hazard Index (AHI) in what follows. Before the statistical analysis, datasets from various sources are integrated on a GIS platform, and the corresponding procedures are presented as an illustrative example for similar analyses. A quasi-Poisson regression model is then proposed. The results show that the model is appropriate for overdispersed count data and that several key explanatory variables have a significant impact on the estimated AHI. In addition, the effect of the weight assigned to different accident severity levels is examined, and the selection of this weight is discussed.
Aggregated accident analyses have been studied extensively in terms of accident frequency and accident rates [1, 2]. These analyses cover transportation facilities such as road segments, intersections, and, more recently, traffic analysis zones (TAZs). The main approach in those studies is to use regression models to relate the number of accidents to attributes of these facilities. Typical attributes are the geometric and traffic characteristics of road segments and the socioeconomic and demographic properties of TAZs. Past studies have addressed many aspects of accident data as well as several critical modeling issues. For example, early models based on the Poisson distribution rely on the Poisson-process nature of accident occurrence. The negative binomial distribution improves on the Poisson model by accommodating overdispersed data, and more advanced zero-inflated models explicitly account for excess zero counts. These advanced models have undoubtedly revealed the nature of accident counts in different respects. For the purposes of traffic safety evaluation, however, accident frequency or accident rate alone is a less informative measure of the magnitude of accident risk, because it does not take the severity of each accident into account.
In general, a comprehensive risk function for safety evaluation should measure the expected harmfulness of accidents, and a practical safety analysis raises this consideration for risk analyses. Such a risk function contains two sources of information, namely, the chance of accident occurrence and the harmfulness of each accident. Nevertheless, most previous studies neglect severity information when measuring safety risk. The exceptions are several multivariate analyses that adopt a predefined distributional form for accident frequencies at different severity levels [7–9]. These multivariate models provide inference on the correlation among accident frequencies at different severity levels, but they cannot provide a single risk measure for safety evaluation. To this end, one objective of this study is to define a single risk measure, the AHI, that comprehensively captures the compound effect of accident occurrence and harmfulness.
In this context, this study also seeks appropriate regression models for prediction of and inference on AHI. Given possible overdispersion in the data, a quasi-likelihood model [10–12] is adopted. It provides a semiparametric method for estimating the parameters of the mean function and is therefore a more natural approach to risk prediction, because fewer distributional assumptions are required. The proposed model provides an important alternative to the frequently used negative binomial model. In addition, the quasi-Poisson model has been found to be more accurate than negative binomial models for certain count data. It is also useful to present the data integration procedure of this study as an example for similar analyses in the future. Such an analysis requires multiple data inputs, including accident data, traffic system data, road segment data, and traffic flow data from various sources.
In sum, a major objective of this study is to identify a comprehensive measure of accident risk for safety evaluation, namely, the Accident Hazard Index (AHI). With this in mind, this study also contributes a statistical analysis based on the quasi-Poisson likelihood model, which is proposed as a natural way to deal with overdispersed count data. The results are analyzed in terms of the variables with significant coefficients, and the effects of these variables on AHI are illustrated. This study provides an alternative method for analyzing accident risk, in which a crash rate weighted by severity level replaces the traditional single crash rate. In addition, the selection of the weight value is discussed as guidance for similar analyses.
This research focuses on prediction of and inference on the Accident Hazard Index for road segments. The accident data were collected in the Pikes Peak Area, Colorado, USA, from July 2006 to December 2010. A separate road segment dataset is also used, both to aggregate these accidents by road segment and to incorporate several key variables for regression. The accident data describe each accident in terms of its severity, time, location, and so forth. The supplemental datasets describe road traffic and geometric characteristics, including the length of road segments, annual average daily traffic (AADT), ownership of the road segments, and number of through lanes. The traffic and roadway data were obtained from the Colorado Department of Transportation (CDOT). The accident data were obtained from the Department of Revenue (DOR) and coded into a GIS database by the Pikes Peak Area Council of Governments (PPACG). Because the location of each accident is known, a GIS platform can be used to integrate the two data sources.
For demonstration, Figure 1 presents a sample area containing the road network and the corresponding accidents, and one highlighted road segment is used to illustrate the data integration procedure. The key step on the GIS platform is to map each accident to the road segment to which it belongs. Accidents occur mainly either on road segments or at intersections, so the first step is to remove accidents that occurred at intersections. A convenient approach is to treat every accident within a radius of 200 ft of the center of an intersection as an intersection accident. For the remaining accidents, a double-sided 150 ft buffer around each road segment is used to account for the actual width of the segment and for errors in the recorded accident locations. Accidents that occurred on ramps and in parking lots are also excluded during this process.
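The mapping rules above can be sketched in Python as follows. This is a minimal illustration, not the authors' GIS workflow. It assumes that accident and intersection locations are points in a projected coordinate system measured in feet, and it approximates each road segment as a single straight line between two endpoints. An accident that falls inside several buffers is assigned to the nearest segment. The function names and the `on_ramp_or_lot` flag are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]

# Thresholds from the text, in feet (projected coordinates assumed).
INTERSECTION_RADIUS_FT = 200.0
SEGMENT_BUFFER_FT = 150.0


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the straight line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to the endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def assign_accident(point: Point, intersections: List[Point],
                    segments: Dict[str, Tuple[Point, Point]],
                    on_ramp_or_lot: bool = False) -> Optional[str]:
    """Return the id of the segment an accident is mapped to, or None if excluded."""
    if on_ramp_or_lot:
        return None  # ramp and parking-lot accidents are excluded
    # Step 1: accidents within 200 ft of an intersection center are intersection accidents.
    if any(math.dist(point, c) <= INTERSECTION_RADIUS_FT for c in intersections):
        return None
    # Step 2: keep the nearest segment whose double-sided 150 ft buffer contains the point.
    best_id, best_d = None, SEGMENT_BUFFER_FT
    for seg_id, (a, b) in segments.items():
        d = point_segment_distance(point, a, b)
        if d <= best_d:
            best_id, best_d = seg_id, d
    return best_id


# Hypothetical geometry: one intersection and two parallel segments 1000 ft apart.
intersections = [(0.0, 0.0)]
segments = {"S1": ((0.0, 0.0), (2000.0, 0.0)), "S2": ((0.0, 1000.0), (2000.0, 1000.0))}
print(assign_accident((100.0, 50.0), intersections, segments))   # None: near the intersection
print(assign_accident((800.0, 120.0), intersections, segments))  # S1
print(assign_accident((800.0, 500.0), intersections, segments))  # None: outside both buffers
```

Real road segments are polylines. A production version would measure the distance to each of their parts, or use a GIS buffer and spatial join, instead of a single straight line.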
With each accident mapped to exactly one road segment, the accidents on each road segment are aggregated by severity level. Two severity levels are considered in this study, namely, fatal-injury accidents and property-damage-only accidents. The distributions of accident counts at each severity level are presented in Figure 2. Moreover, several explanatory variables are considered in the following analysis, such as the intersection density of road segments, urban/rural location, and ownership. Table 1 provides basic descriptive statistics for the explanatory variables and the exposure variable.
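Once each accident carries a segment identifier, aggregation by severity reduces to counting. The sketch below assumes hypothetical `(segment_id, severity)` records with severity coded as `"FI"` or `"PDO"`. It keeps segments with no recorded accidents so that zero counts remain in the regression data.

```python
from collections import Counter


def aggregate_by_severity(mapped, all_segment_ids):
    """Count fatal-injury (FI) and property-damage-only (PDO) accidents per segment.

    Records whose segment_id is None (excluded accidents) are ignored, and every
    segment in all_segment_ids appears in the result, even with zero accidents.
    """
    counts = Counter((seg, sev) for seg, sev in mapped if seg is not None)
    return {seg: {"FI": counts[(seg, "FI")], "PDO": counts[(seg, "PDO")]}
            for seg in all_segment_ids}


mapped = [("S1", "PDO"), ("S1", "FI"), ("S1", "PDO"), ("S2", "PDO"), (None, "FI")]
table = aggregate_by_severity(mapped, ["S1", "S2", "S3"])
print(table)
# {'S1': {'FI': 1, 'PDO': 2}, 'S2': {'FI': 0, 'PDO': 1}, 'S3': {'FI': 0, 'PDO': 0}}
```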
2.2. The Definition of Accident Hazard Index (AHI)
To measure accident risk using compound information on the likelihood of accident occurrence and the harmfulness of each accident, a general form of the Accident Hazard Index is defined as follows:

$$\mathrm{AHI}_i = E\left[\frac{w\,N_{Fi} + (1-w)\,N_{Pi}}{T_i}\right],$$

where $\mathrm{AHI}_i$ is the AHI of road segment $i$, expressed as the expected value of the crash rate weighted over the two severity levels. This measure is expected to reflect the magnitude of accident risk for the purposes of safety evaluation as well as black spot diagnosis for road segments. The symbols are defined as follows:

- $N_{Fi}$ is the total number of fatal-injury accidents on segment $i$, and $N_{Pi}$ is the total number of property-damage-only accidents on segment $i$.
- $w$ is the weight associated with fatal-injury accidents. It ranges from 0.5 to 1 and reflects the relative importance of fatal-injury accidents in the analysis.
- $T_i$ is the exposure variable, usually the vehicle miles traveled (VMT).

If $w = 0.5$, all types of accidents are treated equally, so $\mathrm{AHI}_i$ is proportional to the crash rate (it equals half of it). If $w = 1$, property-damage-only accidents are ignored.
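As a worked illustration of the definition, the plug-in version below replaces the expectation with observed counts for a single segment. The counts and the exposure (in hypothetical units of million VMT) are made-up numbers chosen only to show the two limiting cases of $w$.

```python
def empirical_ahi(n_fi, n_pdo, exposure, w):
    """Observed weighted crash rate (w*N_F + (1 - w)*N_P) / T for one segment."""
    if not 0.5 <= w <= 1.0:
        raise ValueError("w must lie in [0.5, 1]")
    if exposure <= 0:
        raise ValueError("exposure must be positive")
    return (w * n_fi + (1.0 - w) * n_pdo) / exposure


# Hypothetical segment: 2 fatal-injury and 8 PDO accidents over 4 million VMT.
print(empirical_ahi(2, 8, 4.0, 0.5))  # 1.25, half of the crash rate 10 / 4 = 2.5
print(empirical_ahi(2, 8, 4.0, 1.0))  # 0.5, PDO accidents ignored
```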
Fatal-injury accidents usually also involve property damage. The crucial distinction, however, is that property-damage-only accidents are, by definition, accidents that involve neither fatalities nor injuries. Fatal-injury accidents can therefore be assigned a larger weight than property-damage-only accidents, even though they usually involve property damage as well. This is reasonable because a fatal or injury accident can be assumed to be more harmful than a property-damage-only accident. The weighting thus reflects the relative harmfulness of the two types of accidents. It makes AHI a more comprehensive and precise criterion of the overall losses from all types of accidents in safety evaluations.
2.3. The Quasi-Poisson Approach
The Accident Hazard Index is thus a weighted combination of the expected values of two crash rates, so it can be determined through separate regressions of $N_{Fi}$ and $N_{Pi}$. The current study proposes quasi-likelihood models as an alternative to the traditional Poisson and negative binomial regression models. The quasi-likelihood framework does not require a predefined distributional form for the responses and hence may produce more natural and accurate results. In the quasi-Poisson model, the variance is assumed to be the mean multiplied by a dispersion parameter. The model can therefore accommodate overdispersed data, which is a common characteristic of accident counts. Let $Y_i$ denote the accident frequency on road segment $i$, $i = 1, \ldots, n$, with the frequencies assumed to be independent. Let $\mu_i$ denote the corresponding mean, such that

$$E(Y_i) = \mu_i, \quad i = 1, \ldots, n.$$
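In the following analysis, a log link function is used. Let $T_i$ denote the accident exposure measure for road segment $i$. Under the log link, $\log T_i$ enters as an offset term with its coefficient fixed at one:

$$\log \mu_i = \log T_i + \mathbf{x}_i^{\top}\boldsymbol{\beta},$$

where $\mathbf{x}_i = (1, x_{i1}, \ldots, x_{ip})^{\top}$ is the covariate vector of segment $i$ and $\boldsymbol{\beta} = (\beta_0, \beta_1, \ldots, \beta_p)^{\top}$ is the vector of regression parameters.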
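A small numerical check of the offset follows. Under the log link, $\mu_i = T_i \exp(\mathbf{x}_i^{\top}\boldsymbol{\beta})$. The expected count therefore scales linearly with exposure, while the expected rate $\mu_i / T_i$ depends only on the covariates. The coefficient values below are hypothetical.

```python
import math


def expected_count(exposure, x, beta):
    """mu_i = T_i * exp(x_i' beta); x starts with 1 for the intercept."""
    eta = sum(xj * bj for xj, bj in zip(x, beta))
    return exposure * math.exp(eta)


beta = [-1.0, 0.3]  # hypothetical intercept and slope
x = [1.0, 2.0]      # intercept term and one covariate value
mu_10 = expected_count(10.0, x, beta)
mu_20 = expected_count(20.0, x, beta)
print(math.isclose(mu_20, 2 * mu_10))              # True: doubling exposure doubles the count
print(math.isclose(mu_10 / 10.0, math.exp(-0.4)))  # True: the rate is exp(x' beta)
```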
Instead of distributional assumptions, quasi-likelihood models require only a specification of the relationship between the mean and the variance. The quasi-Poisson model adopts this relationship from the Poisson distribution, so that the variance equals the mean multiplied by the dispersion parameter $\phi$:

$$\operatorname{Var}(Y_i) = \phi\, V(\mu_i) = \phi\, \mu_i,$$

where $Y_i$ is the number of accidents of a given severity type on segment $i$. Here $V(\mu_i) = \mu_i$ is the variance function in the generalized linear model (GLM) setting. The quasi-score function plays the same role as the traditional score function, which is the first-order derivative of the log-likelihood function. For the quasi-Poisson model, the quasi-score function for a single observation is

$$U(\mu_i; y_i) = \frac{y_i - \mu_i}{\phi\, V(\mu_i)} = \frac{y_i - \mu_i}{\phi\, \mu_i},$$

where $y_i$ is the observed number of accidents on road segment $i$. The quasi-likelihood function for observation $i$ is obtained by integrating the quasi-score:

$$Q(\mu_i; y_i) = \int_{y_i}^{\mu_i} \frac{y_i - t}{\phi\, t}\, dt = \frac{1}{\phi}\left[ y_i \log\frac{\mu_i}{y_i} - (\mu_i - y_i) \right],$$

where the logarithmic term is taken as zero when $y_i = 0$. The quasi-likelihood function for the whole sample is the sum over all observations:

$$Q(\boldsymbol{\mu}; \mathbf{y}) = \sum_{i=1}^{n} Q(\mu_i; y_i).$$
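The quasi-score and quasi-likelihood of the quasi-Poisson model can be checked numerically. The derivative of $Q(\mu_i; y_i)$ with respect to $\mu_i$ should reproduce the quasi-score, and $Q$ should vanish at $\mu_i = y_i$. The sketch below uses hypothetical values of $y_i$, $\mu_i$, and $\phi$.

```python
import math


def quasi_score(mu, y, phi):
    """Quasi-Poisson score U(mu; y) = (y - mu) / (phi * V(mu)) with V(mu) = mu."""
    return (y - mu) / (phi * mu)


def quasi_loglik(mu, y, phi):
    """Q(mu; y) = [y * log(mu / y) - (mu - y)] / phi; the log term is 0 when y = 0."""
    log_term = y * math.log(mu / y) if y > 0 else 0.0
    return (log_term - (mu - y)) / phi


def total_quasi_loglik(mus, ys, phi):
    """Quasi-likelihood of the whole sample: the sum over road segments."""
    return sum(quasi_loglik(m, y, phi) for m, y in zip(mus, ys))


# Central finite difference: dQ/dmu should reproduce the quasi-score.
mu, y, phi, h = 2.5, 4, 3.0, 1e-6
numeric = (quasi_loglik(mu + h, y, phi) - quasi_loglik(mu - h, y, phi)) / (2 * h)
print(abs(numeric - quasi_score(mu, y, phi)) < 1e-6)  # True
print(quasi_loglik(4.0, 4, phi))                      # 0.0: Q vanishes at mu = y
```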
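The parameter estimates maximize $Q(\boldsymbol{\mu}; \mathbf{y})$, so they solve the system of estimating equations

$$\frac{\partial Q(\boldsymbol{\mu}; \mathbf{y})}{\partial \beta_j} = \sum_{i=1}^{n} \frac{y_i - \mu_i}{\phi\, \mu_i}\, \frac{\partial \mu_i}{\partial \beta_j} = 0, \quad j = 0, 1, \ldots, p.$$

Under the log link, $\partial \mu_i / \partial \beta_j = \mu_i x_{ij}$, where $x_{ij}$ is the $j$th covariate of segment $i$ and $x_{i0} = 1$ for the intercept. The system is therefore equivalent to

$$\sum_{i=1}^{n} (y_i - \mu_i)\, x_{ij} = 0, \quad j = 0, 1, \ldots, p.$$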
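In terms of the regression parameters and the exposure variable, the system of equations can also be written as

$$\sum_{i=1}^{n} \left[ y_i - T_i \exp\!\left(\mathbf{x}_i^{\top}\boldsymbol{\beta}\right) \right] x_{ij} = 0, \quad j = 0, 1, \ldots, p.$$

Because $\phi$ cancels from these equations, the point estimates of $\boldsymbol{\beta}$ coincide with the Poisson maximum likelihood estimates. The dispersion parameter affects only their estimated variances.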
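The estimating equations can be solved by Newton iterations, which for the log link coincide with Fisher scoring and iteratively reweighted least squares. The pure-Python sketch below is an illustrative solver, not the software used in this study. It makes three assumptions: the first column of `X` is the intercept, at least one accident is observed, and the information matrix is nonsingular. Because $\phi$ does not appear in the equations, the same routine yields the quasi-Poisson point estimates.

```python
import math


def solve_linear(a, b):
    """Solve the small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_quasi_poisson(X, y, exposure, tol=1e-10, max_iter=100):
    """Solve sum_i [y_i - T_i exp(x_i' beta)] x_ij = 0 by Newton's method (log link)."""
    p = len(X[0])
    beta = [0.0] * p
    beta[0] = math.log(sum(y) / sum(exposure))  # start from the overall crash rate
    for _ in range(max_iter):
        mu = [t * math.exp(sum(xj * bj for xj, bj in zip(x, beta)))
              for x, t in zip(X, exposure)]
        # Left-hand side of the estimating equations (phi cancels out).
        score = [sum((yi - mi) * x[j] for x, yi, mi in zip(X, y, mu)) for j in range(p)]
        # Negative Jacobian of the equations: X' diag(mu) X.
        info = [[sum(mi * x[j] * x[k] for x, mi in zip(X, mu)) for k in range(p)]
                for j in range(p)]
        step = solve_linear(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


X = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0]]  # intercept and a binary covariate
y = [2, 4, 9, 3]
exposure = [1.0, 2.0, 1.5, 1.5]
beta_hat = fit_quasi_poisson(X, y, exposure)
print([round(b, 6) for b in beta_hat])  # [0.693147, 0.693147]
```

With a single binary covariate, the fitted rates equal the observed rates of the two groups (6/3 = 2 and 12/3 = 4). The estimates should therefore be $\hat\beta_0 = \log 2$ and $\hat\beta_1 = \log(4/2) = \log 2$, which gives an easy correctness check.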
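In addition, the dispersion parameter can be estimated with the Pearson estimator:

$$\hat{\phi} = \frac{1}{n - p - 1} \sum_{i=1}^{n} \frac{(y_i - \hat{\mu}_i)^2}{\hat{\mu}_i},$$

where $\hat{\mu}_i$ is the fitted mean for segment $i$ and $p + 1$ is the number of estimated regression parameters.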
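The Pearson estimator transcribes directly into code. The example uses hypothetical counts and fitted means for four segments and two regression parameters ($p = 1$).

```python
def pearson_dispersion(y, mu_hat, n_params):
    """phi_hat = sum((y_i - mu_i)^2 / mu_i) / (n - n_params), with n_params = p + 1."""
    n = len(y)
    if n <= n_params:
        raise ValueError("need more observations than regression parameters")
    return sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu_hat)) / (n - n_params)


y = [2, 4, 9, 3]
mu_hat = [2.0, 4.0, 6.0, 6.0]
print(pearson_dispersion(y, mu_hat, n_params=2))  # 1.5
```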
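Therefore, the Accident Hazard Index for road segment $i$ is formulated in this study as

$$\mathrm{AHI}_i = w \exp\!\left(\mathbf{x}_i^{\top}\boldsymbol{\beta}_F\right) + (1 - w) \exp\!\left(\mathbf{x}_i^{\top}\boldsymbol{\beta}_P\right),$$

where $\boldsymbol{\beta}_F$ is the vector of regression parameters for fatal-injury accidents and $\boldsymbol{\beta}_P$ is the corresponding vector for property-damage-only accidents. This follows because $E(N_{Fi})/T_i = \exp(\mathbf{x}_i^{\top}\boldsymbol{\beta}_F)$ under the offset model, and likewise for $N_{Pi}$.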
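Given fitted coefficient vectors, the AHI of a segment follows directly from the formula. The coefficients below are hypothetical, not the Table 2 estimates. As an assumption of this sketch, a covariate that is not selected in one of the two models is represented by a zero coefficient, so that both models share the same covariate vector $\mathbf{x}_i$.

```python
import math


def predicted_ahi(x, beta_fi, beta_pdo, w):
    """AHI_i = w * exp(x_i' beta_F) + (1 - w) * exp(x_i' beta_P); x starts with 1."""
    rate_fi = math.exp(sum(a * b for a, b in zip(x, beta_fi)))
    rate_pdo = math.exp(sum(a * b for a, b in zip(x, beta_pdo)))
    return w * rate_fi + (1.0 - w) * rate_pdo


x = [1.0, 0.0]                   # intercept and one covariate at zero
beta_fi = [math.log(0.2), 0.1]   # hypothetical fatal-injury coefficients
beta_pdo = [math.log(1.0), 0.0]  # hypothetical PDO coefficients (covariate not selected)
print(round(predicted_ahi(x, beta_fi, beta_pdo, w=0.5), 6))  # 0.6
```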
3. Results and Discussion
The models are fitted using the generalized linear model function in . Based on the above model specification, the parameters $\boldsymbol{\beta}_F$ and $\boldsymbol{\beta}_P$ are estimated with the quasi-Poisson model. The results are presented in Table 2, which includes only statistically significant effects (variables are selected by backward elimination with a 0.05 significance level to stay). Interestingly, the individual effect of each covariate is largely consistent between fatal-injury accidents and property-damage-only accidents. The slight exception is that several variables show an evident impact on only one of the two accident types.
Road segments show different risk levels between rural and urban locations in terms of the crash rate of fatal-injury accidents: the model indicates that rural road segments are more likely to be involved in fatal-injury accidents. Although the effect of rural location is controversial, one possible reason is that driving speeds are usually higher on rural roads. Higher speeds make more serious accidents, such as fatal or injury ones, more likely. Several previous studies also indicate that rural areas have higher death rates than urban areas because of excessive speeding and distinctive rural driving cultures.
Intersection density is defined as the number of intersections per mile along a road segment. Its positive coefficient shows that intersection density is an unsafe factor that increases the chances of both fatal-injury and property-damage-only accidents. Accidents at intersections have been excluded in this study, but the remaining accidents on road segments are still affected by the intersections. A possible reason is the complicated upstream and downstream traffic flow near intersections. Specifically, vehicles approaching or leaving an intersection need to weave and change lanes more often. This creates complicated traffic flow and conflicts between vehicles, which may raise the accident risk.
As mentioned, a positive coefficient indicates a higher accident risk, and another unsafe factor relates to the ownership of road segments. This study distinguishes road segments owned by state or federal governments from roads owned by other levels of government. The estimated coefficient indicates that, with all other variables held fixed, road segments under state or federal ownership have a smaller AHI. In other words, ownership by other levels of government is the unsafe condition. It is plausible that state or federal roads receive more careful treatment, from road design through traffic operation, and therefore have safer conditions than roads owned by town or municipal governments.
The coefficient on the percentage of average daily truck traffic is negative, which means that the presence of trucks is associated with lower risk rather than being an unsafe factor. A possible explanation is that truck drivers are well trained and hence drive more professionally and cautiously than drivers of passenger vehicles.
Annual average daily traffic (AADT) is consistently associated with the occurrence of all types of accidents, and the coefficients indicate that road segments with larger AADT have lower risks. One possible reason is that operating speeds are usually lower under high AADT, which may provide a safer driving environment. In addition, better pavement condition is associated with lower risk through its negative effect on the occurrence of property-damage-only accidents.
The dispersion parameters are estimated with the Pearson estimator, and a value greater than one indicates overdispersion. The estimated dispersion parameters are 3.096 for fatal-injury accidents and 16.415 for property-damage-only accidents. The count data are therefore overdispersed, and the quasi-Poisson model is appropriate.
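One practical consequence of these estimates can be made concrete. Quasi-Poisson standard errors equal the corresponding Poisson standard errors multiplied by $\sqrt{\hat\phi}$, so Wald statistics shrink by the same factor. The short computation below applies this to the two reported dispersion estimates.

```python
import math


def se_inflation(phi):
    """Factor by which quasi-Poisson standard errors exceed Poisson ones."""
    return math.sqrt(phi)


# Dispersion estimates reported in the text.
for accident_type, phi in [("fatal-injury", 3.096), ("property-damage-only", 16.415)]:
    print(f"{accident_type}: {se_inflation(phi):.2f}")
# fatal-injury: 1.76
# property-damage-only: 4.05
```

A plain Poisson model would therefore understate the standard errors of the property-damage-only coefficients roughly fourfold.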
With the estimated coefficients $\hat{\boldsymbol{\beta}}_F$ and $\hat{\boldsymbol{\beta}}_P$, the risk can be estimated from its mean expression. The harmfulness weight $w$ reflects the magnitude of losses resulting from fatal-injury accidents relative to property-damage-only accidents. Naturally, $w$ should therefore be greater than 0.5, and there are many possible approaches to choosing a particular $w$. In the simplest approach, $w$ is determined from past experience or from a subjective judgment of how harmful fatal-injury accidents are compared with property-damage-only accidents. Another criterion for determining $w$ could be the ratio of the average insurance claim value of fatal-injury accidents to that of property-damage-only accidents. For safety evaluation purposes, the risk ranks of all road segments are sufficient, rather than the absolute numerical values of risk. Figure 3 therefore plots the risk ranks obtained under two different choices of $w$, to show how the evaluated risk levels vary with the choice of $w$. Specifically, $w = 0.5$, which represents equal importance of the two types of accidents, is used as the base case and compared with several other choices of $w$.
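The insurance-claim criterion can be turned into a weight in several ways. One simple mapping is assumed here for illustration and is not taken from the study. It sets the weight ratio $w/(1-w)$ equal to the ratio $r$ of average claim values, which gives $w = r/(1+r)$. Any $r \ge 1$ then yields $w \ge 0.5$. The claim ratios used below are hypothetical.

```python
def weight_from_claim_ratio(r):
    """Solve w / (1 - w) = r for w, where r = mean FI claim / mean PDO claim (r >= 1)."""
    if r < 1:
        raise ValueError("r >= 1 is needed for w >= 0.5")
    return r / (1.0 + r)


print(weight_from_claim_ratio(1.0))  # 0.5: equal harmfulness
print(weight_from_claim_ratio(4.0))  # 0.8: FI claims four times PDO claims on average
```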
The relationship between the ranks obtained under different choices of $w$ reflects the influence of $w$ on the risk ranks of all road segments. In Figure 3, points on the diagonal line indicate risk ranks that are consistent between the two choices of $w$. The more points lie away from this line, the more the risk ranks vary across values of $w$. The risk ranks of AHI exhibit evident discrepancies as $w$ approaches one, and dramatic discrepancies occur when $w$ is greater than 0.8. Thus, as illustrated in the plot, the selection of $w$ allows the harmfulness of accidents to be better taken into account.
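The visual comparison in Figure 3 can be summarized by a rank correlation. The sketch below assumes hypothetical predicted fatal-injury and PDO rates for four segments and no tied AHI values. It ranks the segments by AHI under the base case $w = 0.5$ and under an alternative $w$, then computes Spearman's coefficient. A value of 1 corresponds to all points lying on the diagonal.

```python
def ranks(values):
    """Rank 1 = highest value; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def ahi_ranks(rates_fi, rates_pdo, w):
    """Rank segments by AHI = w * rate_FI + (1 - w) * rate_PDO."""
    return ranks([w * f + (1.0 - w) * p for f, p in zip(rates_fi, rates_pdo)])


def spearman(r1, r2):
    """Spearman's rank correlation for untied ranks: 1 - 6 * sum(d^2) / (n(n^2 - 1))."""
    n = len(r1)
    d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


rates_fi = [0.10, 0.50, 0.20, 0.05]   # hypothetical fitted FI rates
rates_pdo = [2.00, 1.00, 1.50, 0.50]  # hypothetical fitted PDO rates
base = ahi_ranks(rates_fi, rates_pdo, 0.5)
alt = ahi_ranks(rates_fi, rates_pdo, 0.9)
print(base, alt)                      # [1, 3, 2, 4] [3, 1, 2, 4]
print(round(spearman(base, alt), 3))  # 0.2
```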
4. Summary and Conclusions
At the planning level, transportation system characteristics and road traffic characteristics are important indicators that may in turn influence roadway accident risks. A complete framework for safety analysis at the planning level requires connections between transportation planning outputs and accident risk evaluation criteria. Correspondingly, some of the explanatory variables in the regression models of crash counts can serve as a bridge between transportation planning and the safety evaluation of planned roadways. This paper is not a direct study of safety planning. Its contribution to transportation safety planning is the development of risk evaluation models, which are important preliminary work for safety analysis at the planning level.
To this end, this study provides a statistical analysis of a comprehensive measure of accident risk, called the Accident Hazard Index (AHI). AHI is proposed as a compound measure of accident frequency and the corresponding harmfulness of each accident. To account for the overdispersed nature of accident data, a quasi-Poisson model is proposed to relate accident rates to several key explanatory variables. The data are integrated from several sources through a GIS platform, and a clear data processing procedure is presented as an example for similar studies. The regression model is estimated on the accident counts aggregated by road segment, and several variables are found to have a significant impact on the estimated accident risk. For example, intersection density increases risk, whereas AADT reduces it. In addition, the weight on fatal-injury accidents affects the estimated AHI. This influence is illustrated by the changes in AHI ranks as the weight changes, and the plot suggests a value of  for the consideration of the harmfulness of fatal or injury accidents.
This work is believed to be an important first step toward a comprehensive risk analysis of traffic accidents, and there are several important avenues for further research. First, regression models are needed for accident risk as a whole, so that the relationship between different types of accidents can be considered. Second, the excess zero counts of accidents should be taken into account in the model as an important supplement to the traditional quasi-likelihood models.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
This research was supported by the National Natural Science Foundation of China (no. 51208032 and no. 71210001), Fundamental Research Funds for the Central Universities (nos. 2013JBM041 and 2013JBM008), and “863” Research Project (2011AA110303).
P. C. Anastasopoulos, A. P. Tarko, and F. L. Mannering, “Tobit analysis of vehicle accident rates on interstate highways,” Accident Analysis and Prevention, vol. 40, no. 2, pp. 768–775, 2008.
D. Lord and F. L. Mannering, “The statistical analysis of crash-frequency data: a review and assessment of methodological alternatives,” Transportation Research A, vol. 44, no. 5, pp. 291–305, 2010.
A. Tarko and M. Tracz, “Accident prediction models for signalized crosswalks,” Safety Science, vol. 19, no. 2-3, pp. 109–118, 1995.
S. P. Miaou, “The relationship between truck accidents and geometric design of road sections: Poisson versus negative binomial regressions,” Accident Analysis and Prevention, vol. 26, no. 4, pp. 471–482, 1994.
D. Lord, S. P. Washington, and J. N. Ivan, “Poisson, Poisson-gamma and zero-inflated regression models of motor vehicle crashes: balancing statistical fit and theory,” Accident Analysis and Prevention, vol. 37, no. 1, pp. 35–46, 2005.
F. Holt and Ullevig, “2035 Regional transportation plan update, safety analysis-procedural overview,” Prepared for Denver Regional Council of Governments, 2008.
J. Ma and K. M. Kockelman, “Bayesian multivariate Poisson regression for models of injury count, by severity,” Transportation Research Record, vol. 1950, pp. 24–34, 2006.
P. C. Anastasopoulos, V. N. Shankar, J. E. Haddock, and F. L. Mannering, “A multivariate tobit analysis of highway accident-injury-severity rates,” Accident Analysis and Prevention, vol. 45, pp. 110–119, 2012.
Y.-C. Chiou and C. Fu, “Modeling crash frequency and severity using multinomial-generalized Poisson model with error components,” Accident Analysis and Prevention, vol. 50, pp. 73–82, 2013.
R. W. M. Wedderburn, “Quasi likelihood functions, generalized linear models, and the Gauss Newton method,” Biometrika, vol. 61, no. 3, pp. 439–447, 1974.
P. McCullagh, “Quasi-likelihood functions,” Annals of Statistics, vol. 11, no. 1, pp. 59–67, 1983.
T. A. Severini and J. G. Staniswalis, “Quasi-likelihood estimation in semiparametric models,” Journal of the American Statistical Association, vol. 89, no. 426, pp. 501–511, 1994.
J. M. ver Hoef and P. L. Boveng, “Quasi-Poisson versus negative binomial regression: how should we model overdispersed count data?” Ecology, vol. 88, no. 11, pp. 2766–2772, 2007.
B. P. Y. Loo, “Validating crash locations for quantitative spatial analysis: a GIS-based approach,” Accident Analysis and Prevention, vol. 38, no. 5, pp. 879–886, 2006.
T. Steenberghen, T. Dufays, I. Thomas, and B. Flahaut, “Intra-urban location and clustering of road accidents using GIS: a Belgian example,” International Journal of Geographical Information Science, vol. 18, no. 2, pp. 169–181, 2004.
P. McCullagh and J. A. Nelder, Generalized Linear Models, Chapman and Hall, New York, NY, USA, 2nd edition, 1989.
X. Yan, B. Wang, M. An, and C. Zhang, “Distinguishing between rural and urban road segment traffic safety based on zero-inflated negative binomial regression models,” Discrete Dynamics in Nature and Society, vol. 2012, Article ID 789140, 11 pages, 2012.
R. P. Gonzalez, G. R. Cummings, H. A. Phelan, S. Harlin, M. Mulekar, and C. B. Rodning, “Increased rural vehicular mortality rates: roadways with higher speed limits or excessive vehicular speed?” The Journal of Trauma, vol. 63, no. 6, pp. 1360–1363, 2007.
M. E. Rakauskas, N. J. Ward, and S. G. Gerberich, “Identification of differences between rural and urban safety cultures,” Accident Analysis and Prevention, vol. 41, no. 5, pp. 931–937, 2009.
X. Zhu and S. Srinivasan, “A comprehensive analysis of factors influencing the injury severity of large-truck crashes,” Accident Analysis and Prevention, vol. 43, no. 1, pp. 49–57, 2011.