Research Article  Open Access
A Quasi-Poisson Approach on Modeling Accident Hazard Index for Urban Road Segments
Abstract
In light of the recent emphasis on risk evaluation of crashes, accident counts for specific transportation facilities are commonly adopted to reflect the chance of crash occurrence. The current study introduces a more comprehensive measure that supplements the expression of accident risk with information on accident harmfulness; this measure is named the Accident Hazard Index (AHI) in the following context. Before the statistical analysis, datasets from various sources are integrated under a GIS platform, and the corresponding procedures are presented as an illustrative example for similar analyses. Then, a quasi-Poisson regression model is suggested for the analysis. The results show that the model is appropriate for dealing with overdispersed count data and that several key explanatory variables have a significant impact on the estimation of the AHI. In addition, the effect of the weight on different severity levels of accidents is examined and the selection of the weight is also discussed.
1. Introduction
Aggregated accident analyses for transportation facilities, including road segments, intersections, and the recently emphasized traffic analysis zones (TAZs), have been thoroughly studied in the forms of accident frequency and accident rates [1, 2]. The major approach in those studies is to use regression models to connect the number of accidents with attributes of these facilities, such as the geometric and traffic characteristics of road segments and the socioeconomic and demographic properties of TAZs. Past studies consider many aspects of accident data and deal with several critical issues. For example, the earlier models based on the Poisson distribution rely on the Poisson-process nature of accident occurrence [3], the negative binomial distribution improves on the Poisson model by accommodating overdispersed data [4], and the later, more advanced zero-inflated models build the specification of excess zero counts [5] into the model. There is no doubt that these advanced models have revealed the nature of accident counts in different respects. However, for the purposes of traffic safety evaluation, accident frequency or accident rate is less informative about the magnitude of accident risk because it cannot take the severity of each accident into account.
Generally, a comprehensive risk function for safety evaluation should be able to measure the expected harmfulness of each accident, and some practical analyses [6] raise this consideration for risk analysis. Such a risk function contains two sources of information, namely, the chance of accident occurrence and the harmfulness of each accident. Nevertheless, most previous studies neglect severity information in measuring safety risks; the exceptions are several multivariate analyses in which a predefined distributional form of accident frequency for different levels of severity is adopted [7–9]. These multivariate models provide inference on the correlations among accident frequencies at different severity levels, but they still cannot provide a single risk measurement for the purpose of safety evaluation. Towards this end, one of the objectives of this study is to define a single risk measure, the AHI, that comprehensively captures the compound effect of accident occurrence and harmfulness.
Under this circumstance, this study also seeks appropriate regression models for prediction and inference of the AHI. Considering possible overdispersion in the data, a quasi-likelihood model [10–12] is adopted because it provides a semiparametric method to estimate the mean value of the parameters of interest; it is therefore a more natural approach for risk prediction because fewer distributional assumptions are required. The suggested model provides an important alternative to the frequently used negative binomial model. In addition, the quasi-Poisson model has been found to be more accurate than negative binomial models for certain count data [13]. Besides, it is also necessary to present the data integration procedure of this study as an example for similar analyses in the future. In fact, such an analysis requires multiple data inputs [14], including accident data, traffic system data, road segment data, and traffic flow data from various sources.
In sum, one of the major objectives of this study is to identify a comprehensive measure for safety evaluation in terms of accident risks, namely, the Accident Hazard Index (AHI) in the following context. With this consideration, the study also contributes a statistical analysis through the quasi-Poisson likelihood model, which is suggested as a natural way to deal with overdispersed count data. The results are analyzed through the variables with significant coefficients, and the effects of these variables on the AHI are also illustrated. This study provides an alternative method to analyze accident risk, in which the weighted crash rate over different levels of severity is used instead of the traditional analysis of a single crash rate. In addition, the selection of the value of the weight is also presented as guidance for similar analyses.
2. Methods
2.1. Datasets
This research focuses on prediction and inference of the Accident Hazard Index for road segments. The accident data were collected in the Pikes Peak Area, Colorado, USA, during the period from July 2006 to December 2010. In order to aggregate these accidents by road segment in this area, as well as to incorporate several key variables for regression, another dataset of road segments is also used in this study. The accident data describe each accident in terms of its severity, time, location, and so forth. The supplemental datasets describe the road traffic and geometric characteristics, including variables such as the length of road segments, annual average daily traffic (AADT), ownership of the road segments, and number of through lanes. The traffic and roadway data are obtained from the Colorado Department of Transportation (CDOT). The accident dataset is obtained from the Department of Revenue (DOR) and was coded into a GIS database by the Pikes Peak Area Council of Governments (PPACG). With the location information of each accident, a GIS platform can be used as a tool to integrate the two sources of data [15].
For demonstrative purposes, Figure 1 presents a sample area, which contains the road network map and the corresponding accidents. In this area, a road segment is highlighted to illustrate the procedure for data integration. The important step on the GIS platform is to map accidents to the road segments to which they belong. Because accidents mostly occur either on road segments or at intersections, the first step is to remove those accidents that occurred at intersections; a convenient approach treats an area with a radius of 200 ft around the center of each intersection as belonging to that intersection. Then, for the remaining accidents, a buffer of 150 ft on both sides of each road segment is used to account for the actual extent of the road segment and for the observation error in accident locations during data collection. In addition, accidents that occurred on ramps and in parking lots are also excluded during the process.
Then, with the constructed one-to-one mapping between accidents and road segments, the accidents on each road segment are aggregated by severity level. In this study, two levels of severity, namely, fatal-injury accidents and property-damage-only accidents, are considered. The distributions of accident counts by level of severity are presented in Figure 2. Moreover, several explanatory variables, such as the intersection density of road segments, urban/rural location, and ownership, are considered in the following analysis. Table 1 provides the basic descriptive statistics for the explanatory variables and the exposure variable.
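The two filtering steps described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the GIS workflow actually used in the study: it assumes planar coordinates in feet, straight-line segments, and a nearest-segment rule when several buffers overlap; the function names and the toy coordinates are hypothetical, and the exclusion of ramps and parking lots is not modeled.

```python
import math

INTERSECTION_RADIUS_FT = 200.0   # accidents this close to an intersection are dropped
SEGMENT_BUFFER_FT = 150.0        # buffer on each side of a road segment


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the line segment a-b (planar feet)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:               # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Projection parameter, clamped so the foot point stays on the segment
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def assign_accidents(accidents, intersections, segments):
    """Map each retained accident id to exactly one segment id.

    accidents:     {accident_id: (x, y)}
    intersections: list of (x, y) intersection centers
    segments:      {segment_id: ((x1, y1), (x2, y2))}
    """
    assignment = {}
    for acc_id, point in accidents.items():
        # Step 1: remove accidents that fall inside an intersection area
        if any(math.hypot(point[0] - ix, point[1] - iy) <= INTERSECTION_RADIUS_FT
               for ix, iy in intersections):
            continue
        # Step 2: keep the nearest segment whose buffer contains the accident
        best_id, best_dist = None, SEGMENT_BUFFER_FT
        for seg_id, (a, b) in segments.items():
            d = point_segment_distance(point, a, b)
            if d <= best_dist:
                best_id, best_dist = seg_id, d
        if best_id is not None:
            assignment[acc_id] = best_id
    return assignment


# Hypothetical toy data: one 2000 ft segment with an intersection at each end
segments = {"S1": ((0.0, 0.0), (2000.0, 0.0))}
intersections = [(0.0, 0.0), (2000.0, 0.0)]
accidents = {
    "A1": (1000.0, 40.0),    # mid-segment, inside the buffer
    "A2": (100.0, 10.0),     # within 200 ft of an intersection
    "A3": (1000.0, 400.0),   # outside the 150 ft buffer
}
print(assign_accidents(accidents, intersections, segments))
```

Choosing the nearest segment is one simple way to obtain a one-to-one mapping when buffers of neighboring segments overlap; other tie-breaking rules are equally possible.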

2.2. The Definition of Accident Hazard Index (AHI)
In order to measure accident risks with compound information on the likelihood of accident occurrence and the corresponding harmfulness of each accident, a more general form for the Accident Hazard Index is presented as follows:

$$\mathrm{AHI} = E\left[\frac{w\,Y^{FI} + (1 - w)\,Y^{PDO}}{V}\right],$$

where $\mathrm{AHI}$ is expressed as the expected value of the weighted crash rates over two severity levels. The measure is believed to reflect the magnitude of accident risks for the purposes of safety evaluation as well as black spot diagnostics for road segments. $Y^{FI}$ is the total number of fatal-injury accidents and $Y^{PDO}$ is the total number of property-damage-only accidents. $w$ is the weight associated with fatal-injury accidents, which ranges from 0.5 to 1 as a reflection of the relative importance of fatal-injury accidents in the analysis, and $V$ is the exposure variable, usually vehicle miles traveled (VMT). Further, if $w = 0.5$, all types of accidents are treated equally and hence $\mathrm{AHI}$ is proportional to the crash rate, and if $w = 1$, the property-damage-only accidents are ignored.
Fatal-injury accidents are usually accompanied by property damage. However, the crucial distinction between the two types is that property-damage-only accidents are defined as accidents that involve neither fatalities nor injuries. Therefore, even though fatal-injury accidents usually involve property damage as well, they can be assigned a larger weight than property-damage-only accidents, because it is reasonable to assume that a fatal or injury-involved accident is more harmful than a property-damage-only accident. As a result, the weighting process meaningfully reflects the relative harmfulness of the two types of accidents and makes the AHI a more comprehensive and precise criterion for the overall losses from all types of accidents in safety evaluations.
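As a small numerical illustration of the definition in this section, the sample analogue of the AHI for a single segment can be computed as a weighted crash rate. The sketch below replaces the expectation by observed counts, and the function name and the example counts are hypothetical.

```python
def weighted_crash_rate(n_fatal_injury, n_pdo, exposure, w):
    """Sample analogue of the AHI: weighted accident count per unit exposure."""
    if not 0.5 <= w <= 1.0:
        raise ValueError("w is expected to lie in [0.5, 1]")
    if exposure <= 0:
        raise ValueError("exposure must be positive")
    return (w * n_fatal_injury + (1.0 - w) * n_pdo) / exposure


# Hypothetical segment: 4 fatal-injury and 20 PDO accidents over 8 million VMT
fi, pdo, vmt = 4, 20, 8.0
for w in (0.5, 0.75, 1.0):
    print(w, weighted_crash_rate(fi, pdo, vmt, w))
```

With the weight at 0.5 the result is half of the ordinary crash rate (24 accidents per 8 million VMT), and with the weight at 1 only the fatal-injury accidents contribute, matching the two limiting cases described above.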
2.3. The Quasi-Poisson Approach
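The Accident Hazard Index is the weighted expected value of two crash rates, as defined above. It can be determined through the regressions of the fatal-injury accident count $Y^{FI}$ and the property-damage-only accident count $Y^{PDO}$. The current study suggests quasi-likelihood models as an alternative to the traditional Poisson and negative binomial regression models, since the quasi-likelihood framework does not require a predefined distributional form for the responses and hence may produce more natural and accurate results [13]. In the quasi-Poisson model, the variance is assumed to be the mean multiplied by a dispersion parameter. Therefore, the quasi-Poisson model is capable of accommodating overdispersed data, which is a common characteristic of accident counts. For accident frequencies $Y_i$ on road segments $i = 1, \ldots, n$, assumed to be independent, $\mu_i$ is the corresponding mean value such that

$$E(Y_i) = \mu_i.$$

In the following analysis, a log link function is used. If $V_i$ denotes the accident exposure measure for road segment $i$, the logarithm of $V_i$ is an offset term with a fixed coefficient of one under the log link function as follows:

$$\log \mu_i = \log V_i + \mathbf{x}_i^{T}\boldsymbol{\beta},$$

where $\mathbf{x}_i$ is the vector of explanatory variables for road segment $i$ and $\boldsymbol{\beta}$ is the vector of regression parameters.
Instead of assumptions on the distribution, quasi-likelihood models only require specification of the relationship between the mean and the variance [16]. The quasi-Poisson model adopts the relationship from the Poisson distribution, such that the variance is related to the mean only through multiplication by the dispersion parameter $\phi$ as follows:

$$\mathrm{Var}(Y_i) = \phi\, v(\mu_i) = \phi\, \mu_i,$$

where $Y_i$ is the number of accidents of a certain severity type and $v(\mu_i) = \mu_i$ is the so-called variance function in a generalized linear model (GLM) setting. The quasi-score function plays the role of the first-order derivative of the log-likelihood function, in the same way as the traditional score function. For the quasi-Poisson model, the score function for a single observation is

$$U(\mu_i; y_i) = \frac{y_i - \mu_i}{\phi\, \mu_i},$$

where $y_i$ is the sample value of the number of accidents on road segment $i$. Therefore, the quasi-likelihood function for observation $i$ can be written in the following form:

$$Q(\mu_i; y_i) = \int_{y_i}^{\mu_i} \frac{y_i - t}{\phi\, t}\, dt = \frac{1}{\phi}\left(y_i \log \mu_i - \mu_i\right) + \text{const},$$

where the constant does not depend on $\mu_i$, and the quasi-likelihood function for the whole sample is the sum of the quasi-likelihood functions of the individual observations as follows:

$$Q(\boldsymbol{\mu}; \mathbf{y}) = \sum_{i=1}^{n} Q(\mu_i; y_i).$$

The parameter estimates maximize the value of $Q(\boldsymbol{\mu}; \mathbf{y})$, and the estimating equations form the following system:

$$\frac{\partial Q(\boldsymbol{\mu}; \mathbf{y})}{\partial \beta_j} = 0, \quad j = 1, \ldots, p,$$

which is equivalent to

$$\sum_{i=1}^{n} \frac{\partial \mu_i}{\partial \beta_j}\, \frac{y_i - \mu_i}{\phi\, \mu_i} = 0, \quad j = 1, \ldots, p.$$

Because $\partial \mu_i / \partial \beta_j = \mu_i x_{ij}$ under the log link, the system of equations can also be written in terms of the regression parameters and the exposure variable as follows:

$$\sum_{i=1}^{n} x_{ij}\left(y_i - V_i \exp\left(\mathbf{x}_i^{T}\boldsymbol{\beta}\right)\right) = 0, \quad j = 1, \ldots, p.$$

In addition, the dispersion parameter can be estimated by the Pearson estimator in the following equation:

$$\hat{\phi} = \frac{1}{n - p} \sum_{i=1}^{n} \frac{\left(y_i - \hat{\mu}_i\right)^2}{\hat{\mu}_i},$$

where $n$ is the number of road segments and $p$ is the number of regression parameters.
Therefore, dividing the two fitted means by the exposure $V_i$, the Accident Hazard Index for road segment $i$ can be formulated in the following form in this study:

$$\mathrm{AHI}_i = w \exp\left(\mathbf{x}_i^{T}\boldsymbol{\beta}^{FI}\right) + (1 - w) \exp\left(\mathbf{x}_i^{T}\boldsymbol{\beta}^{PDO}\right),$$

where $\boldsymbol{\beta}^{FI}$ is the vector of regression parameters for fatal-injury accidents, $\boldsymbol{\beta}^{PDO}$ is the corresponding vector for property-damage-only accidents, and $w$ is the weight on fatal-injury accidents introduced in Section 2.2.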
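To make the estimation procedure concrete, the following is a minimal sketch of a quasi-Poisson fit with a log link and an exposure offset, written in plain Python. It is not the implementation used in the study: it solves the estimating equations $\sum_i x_{ij}(y_i - V_i \exp(\mathbf{x}_i^{T}\boldsymbol{\beta})) = 0$ by Newton's method and then computes the Pearson dispersion estimate. The helper names and the six-segment dataset are hypothetical and chosen only for illustration.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fitted_means(X, exposure, beta):
    """mu_i = V_i * exp(x_i' beta), i.e. log link with log(V_i) as offset."""
    return [v * math.exp(sum(xj * bj for xj, bj in zip(x, beta)))
            for x, v in zip(X, exposure)]


def fit_quasi_poisson(X, y, exposure, max_iter=50, tol=1e-10):
    """Return (beta, phi): regression estimates and Pearson dispersion.

    X must contain a leading column of ones for the intercept.
    """
    n, p = len(y), len(X[0])
    beta = [0.0] * p
    beta[0] = math.log(sum(y) / sum(exposure))   # start at the overall crash rate
    for _ in range(max_iter):
        mu = fitted_means(X, exposure, beta)
        # Quasi-score (phi cancels) and its negative derivative
        score = [sum(x[j] * (yi - mi) for x, yi, mi in zip(X, y, mu))
                 for j in range(p)]
        info = [[sum(x[j] * x[k] * mi for x, mi in zip(X, mu)) for k in range(p)]
                for j in range(p)]
        step = solve(info, score)
        beta = [bj + sj for bj, sj in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    mu = fitted_means(X, exposure, beta)
    phi = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu)) / (n - p)
    return beta, phi


# Hypothetical data: intercept plus one covariate (e.g. intersection density)
X = [[1.0, 0.5], [1.0, 1.0], [1.0, 1.5], [1.0, 2.0], [1.0, 2.5], [1.0, 3.0]]
y = [2, 1, 6, 4, 12, 9]
exposure = [1.0, 0.8, 1.5, 1.0, 2.0, 1.2]
beta, phi = fit_quasi_poisson(X, y, exposure)
print("beta:", beta, "dispersion:", phi)
```

The point estimates coincide with those of an ordinary Poisson regression; the dispersion estimate only rescales the uncertainty attached to them. Fitting the model once for fatal-injury counts and once for property-damage-only counts would give the two coefficient vectors that enter the AHI expression.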
3. Results and Discussion
The model is fitted with a generalized linear model routine. Based on the above model specification, the regression parameters for fatal-injury accidents, $\boldsymbol{\beta}^{FI}$, and for property-damage-only accidents, $\boldsymbol{\beta}^{PDO}$, are estimated by the quasi-Poisson model, and the results are presented in Table 2, which includes only statistically significant effects (variable selection is based on backward elimination with a 0.05 significance level to stay). Interestingly, the individual impact of each covariate is largely consistent between fatal-injury accidents and property-damage-only accidents, with the slight exception that several variables show an evident impact on only one of the two accident types.
 
Note to Table 2: # indicates that the coefficient is statistically insignificant; * indicates reference categories.
Road segments show different risk levels between rural and urban locations in terms of the crash rate of fatal-injury accidents. The model indicates that rural road segments are more likely to be involved in fatal-injury accidents. Even though the effect of rural location is controversial [17], it is worth mentioning that one possible reason is that driving speeds are usually higher on rural roads, which makes serious accidents such as fatal or injury accidents more likely. Several previous studies also indicate that death rates are higher in rural areas than in urban areas because of excessive speeding [18] and particular rural driving cultures [19].
Intersection density is defined as the number of intersections per mile along a road segment. The positive coefficient shows that intersection density is an unsafe factor that may introduce more chances for both fatal-injury and property-damage-only accidents to occur. Even though accidents that occurred at intersections have been excluded from this study, the remaining accidents on road segments are still affected by these intersections, possibly because of the complicated upstream or downstream traffic flows near them. Specifically, the increased demand for weaving and lane changes when vehicles approach or leave intersections leads to complicated traffic flow conditions as well as conflicts between vehicles, and hence may contribute to higher accident risk.
As mentioned, a positive coefficient indicates higher accident risk. Another risk-related factor is the ownership of road segments, which this study distinguishes as state/federal roads versus roads owned by other levels of government. The coefficient indicates that road segments under state or federal ownership produce a smaller AHI with all other variables held fixed. It is plausible that state or federal roads receive more attention, from road design through traffic operation, and therefore offer safer conditions than roads owned by town or municipal governments.
The presence of trucks is not an unsafe factor in this road safety evaluation, as truck drivers are well trained [20] and hence more professional and cautious while driving than the drivers of passenger vehicles. The coefficient on the percentage of trucks in average daily traffic is negative, which means that the presence of trucks is a safe factor.
Annual average daily traffic (AADT) is consistently associated with the occurrence of all types of accidents, and the coefficients indicate that road segments with larger AADT may have lower risk. One possible reason is that operating speeds are usually low when AADT is large, which may provide a safer driving environment. In addition, better pavement condition also leads to lower risk by reducing the occurrence of property-damage-only accidents.
The dispersion parameters are estimated by the Pearson estimator, and a value greater than one indicates that the count data are overdispersed. For fatal-injury accidents and property-damage-only accidents, the dispersion parameters are estimated as 3.096 and 16.415, respectively. Therefore, the count data are overdispersed in this analysis and the quasi-Poisson model is appropriate under such consideration.
With the estimated coefficients for fatal-injury and property-damage-only accidents, $\boldsymbol{\beta}^{FI}$ and $\boldsymbol{\beta}^{PDO}$, the risk can be estimated from its mean expression. The harmfulness weight $w$ reflects the magnitude of the losses resulting from fatal-injury accidents relative to property-damage-only accidents. Naturally, $w$ should be greater than 0.5, and there are many possible approaches to choosing a particular $w$. In the simplest way, it can be determined from past experience [6] or from a subjective impression of the harmfulness of fatal-injury accidents relative to property-damage-only accidents. Another criterion for determining $w$ could be the ratio between the average insurance claim values of fatal-injury accidents and property-damage-only accidents. For safety evaluation purposes, the risk ranks of all road segments are sufficient; the absolute numerical values of the risks are not required. Figure 3 therefore plots the risk ranks under two different choices of $w$ to indicate how the evaluated risk levels vary with the choice of $w$. Specifically, $w = 0.5$, which represents equal importance of the two types of accidents in the evaluation, is used as the base case for comparison with several other choices of $w$.
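The sign interpretation used throughout this section follows directly from the log link. As a brief worked step, with $\beta_j$ and $x_j$ as assumed notation for a coefficient and its covariate, raising $x_j$ by one unit while holding the other covariates and the exposure fixed multiplies the expected accident count, and hence the expected crash rate, by $\exp(\beta_j)$:

```latex
\frac{E[\,Y \mid x_j + 1\,]}{E[\,Y \mid x_j\,]}
  = \frac{V \exp\!\left(\beta_0 + \dots + \beta_j (x_j + 1) + \dots\right)}
         {V \exp\!\left(\beta_0 + \dots + \beta_j x_j + \dots\right)}
  = \exp(\beta_j)
```

A positive coefficient therefore gives a ratio above one (higher risk) and a negative coefficient gives a ratio below one (lower risk).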
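One practical consequence of these dispersion estimates can be shown with a short calculation. In a quasi-Poisson fit, the estimated covariance of the coefficients is the Poisson covariance multiplied by the dispersion parameter, so standard errors grow by the square root of the dispersion. The sketch below applies this to the two values reported above; the function name is hypothetical.

```python
import math

# Dispersion estimates reported in the text
phi = {"fatal-injury": 3.096, "property-damage-only": 16.415}


def se_inflation(dispersion):
    """Factor by which quasi-Poisson standard errors exceed plain Poisson ones."""
    return math.sqrt(dispersion)


for severity, value in phi.items():
    print(f"{severity}: standard errors x {se_inflation(value):.3f}")
```

Ignoring overdispersion would thus understate the standard errors by a factor of roughly 1.8 for fatal-injury accidents and roughly 4 for property-damage-only accidents, which would make covariates appear more significant than the data support.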
The correlation between the two resulting rankings under different choices of $w$ reflects the influence of the value of $w$ on the risk ranks of all road segments. In Figure 3, an off-diagonal line indicates that the risk ranks are consistent between two choices of $w$, whereas more points lying away from this line indicate more variation in the risk ranks under different $w$. It is clear that the risk ranks of the AHI exhibit evident discrepancies as $w$ approaches one, and the discrepancy becomes dramatic when $w$ is greater than 0.8. Thus, the harmfulness of accidents can be better taken into account in the selection of $w$, as illustrated in the plot.
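The rank comparison described here can be sketched as follows. The snippet ranks road segments by AHI under the base weight of 0.5 and under alternative weights, then summarizes their agreement with the Spearman rank correlation. The predicted rates are hypothetical values chosen for illustration, not results from this study, and the Spearman summary is an added convenience that the original comparison does not necessarily use.

```python
def ahi(rate_fi, rate_pdo, w):
    """AHI from predicted fatal-injury and PDO crash rates and weight w."""
    return w * rate_fi + (1.0 - w) * rate_pdo


def ranks(values):
    """Rank 1 = highest value; ties are broken by position."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(rank_a, rank_b):
    """Spearman rank correlation for two rankings without ties."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


# Hypothetical predicted crash rates per segment: (fatal-injury, PDO)
rates = [(0.2, 5.0), (0.5, 1.0), (0.1, 3.0), (0.4, 2.0), (0.3, 4.0)]
base = ranks([ahi(fi, pdo, 0.5) for fi, pdo in rates])
for w in (0.6, 0.8, 0.9, 1.0):
    alt = ranks([ahi(fi, pdo, w) for fi, pdo in rates])
    print(w, alt, round(spearman(base, alt), 3))
```

In this toy example the ranking is unchanged for a weight of 0.6 and is almost reversed when the weight reaches one, which mirrors the qualitative pattern described for Figure 3.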
4. Summary and Conclusions
At the planning level, transportation system characteristics and road traffic characteristics are important indicators that may in turn influence roadway accident risks. A complete framework for safety analysis at the planning level requires connections between transportation planning outputs and accident risk evaluation criteria. Correspondingly, some of the explanatory variables in the regression models of crash counts can serve as the bridge between transportation planning and the safety evaluation of planned roadways. The contribution of this paper to transportation safety planning is the developed risk evaluation models, which are important preliminary work for safety analysis at the planning level, even though this is not a direct study of safety planning.
Towards this end, this study provides a statistical analysis of a comprehensive measure of accident risk, called the Accident Hazard Index (AHI). The AHI is suggested as a compound value of accident frequency and the corresponding harmfulness of each accident. In order to account for the overdispersed nature of the accident data, a quasi-Poisson model is proposed to connect the accident rate to several key explanatory variables. The data are integrated from several sources through a GIS platform, and a clear procedure for data processing is also presented as an example for similar studies. With the aggregated accident counts on road segments, the regression model is estimated and several variables are found to have a significant impact on the estimation of accident risks. For example, intersection density increases risk, whereas AADT affects it in the opposite direction. Besides, the weight on fatal-injury accidents also affects the estimated AHI; this influence is illustrated by the changes in AHI ranks as the weight changes, and the plot can guide the selection of the weight $w$ that reflects the harmfulness of fatal or injury accidents.
This work is believed to be an important first step toward a comprehensive risk analysis of traffic accidents. In addition, there are several important avenues for further research. First, it is necessary to find regression models for the accident risks as a whole, such that the relationship between different types of accidents can be considered. Second, the excess zero counts of accidents should be taken into account in the model as an important supplement to the traditional quasi-likelihood models.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This research was supported by the National Natural Science Foundation of China (no. 51208032 and no. 71210001), Fundamental Research Funds for the Central Universities (nos. 2013JBM041 and 2013JBM008), and “863” Research Project (2011AA110303).
References
[1] P. C. Anastasopoulos, A. P. Tarko, and F. L. Mannering, “Tobit analysis of vehicle accident rates on interstate highways,” Accident Analysis and Prevention, vol. 40, no. 2, pp. 768–775, 2008.
[2] D. Lord and F. L. Mannering, “The statistical analysis of crash-frequency data: a review and assessment of methodological alternatives,” Transportation Research A, vol. 44, no. 5, pp. 291–305, 2010.
[3] A. Tarko and M. Tracz, “Accident prediction models for signalized crosswalks,” Safety Science, vol. 19, no. 2-3, pp. 109–118, 1995.
[4] S. P. Miaou, “The relationship between truck accidents and geometric design of road sections: Poisson versus negative binomial regressions,” Accident Analysis and Prevention, vol. 26, no. 4, pp. 471–482, 1994.
[5] D. Lord, S. P. Washington, and J. N. Ivan, “Poisson, Poisson-gamma and zero-inflated regression models of motor vehicle crashes: balancing statistical fit and theory,” Accident Analysis and Prevention, vol. 37, no. 1, pp. 35–46, 2005.
[6] F. Holt and Ullevig, “2035 Regional transportation plan update, safety analysisprocedural overview,” Prepared for Denver Regional Council of Governments, 2008.
[7] J. Ma and K. M. Kockelman, “Bayesian multivariate Poisson regression for models of injury count, by severity,” Transportation Research Record, vol. 1950, pp. 24–34, 2006.
[8] P. C. Anastasopoulos, V. N. Shankar, J. E. Haddock, and F. L. Mannering, “A multivariate tobit analysis of highway accident-injury-severity rates,” Accident Analysis and Prevention, vol. 45, pp. 110–119, 2012.
[9] Y.-C. Chiou and C. Fu, “Modeling crash frequency and severity using multinomial-generalized Poisson model with error components,” Accident Analysis and Prevention, vol. 50, pp. 73–82, 2013.
[10] R. W. M. Wedderburn, “Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method,” Biometrika, vol. 61, no. 3, pp. 439–447, 1974.
[11] P. McCullagh, “Quasi-likelihood functions,” Annals of Statistics, vol. 11, no. 1, pp. 59–67, 1983.
[12] T. A. Severini and J. G. Staniswalis, “Quasi-likelihood estimation in semiparametric models,” Journal of the American Statistical Association, vol. 89, no. 426, pp. 501–511, 1994.
[13] J. M. ver Hoef and P. L. Boveng, “Quasi-Poisson versus negative binomial regression: how should we model overdispersed count data?” Ecology, vol. 88, no. 11, pp. 2766–2772, 2007.
[14] B. P. Y. Loo, “Validating crash locations for quantitative spatial analysis: a GIS-based approach,” Accident Analysis and Prevention, vol. 38, no. 5, pp. 879–886, 2006.
[15] T. Steenberghen, T. Dufays, I. Thomas, and B. Flahaut, “Intra-urban location and clustering of road accidents using GIS: a Belgian example,” International Journal of Geographical Information Science, vol. 18, no. 2, pp. 169–181, 2004.
[16] P. McCullagh and J. A. Nelder, Generalized Linear Models, Chapman and Hall, New York, NY, USA, 2nd edition, 1989.
[17] X. Yan, B. Wang, M. An, and C. Zhang, “Distinguishing between rural and urban road segment traffic safety based on zero-inflated negative binomial regression models,” Discrete Dynamics in Nature and Society, vol. 2012, Article ID 789140, 11 pages, 2012.
[18] R. P. Gonzalez, G. R. Cummings, H. A. Phelan, S. Harlin, M. Mulekar, and C. B. Rodning, “Increased rural vehicular mortality rates: roadways with higher speed limits or excessive vehicular speed?” The Journal of Trauma, vol. 63, no. 6, pp. 1360–1363, 2007.
[19] M. E. Rakauskas, N. J. Ward, and S. G. Gerberich, “Identification of differences between rural and urban safety cultures,” Accident Analysis and Prevention, vol. 41, no. 5, pp. 931–937, 2009.
[20] X. Zhu and S. Srinivasan, “A comprehensive analysis of factors influencing the injury severity of large-truck crashes,” Accident Analysis and Prevention, vol. 43, no. 1, pp. 49–57, 2011.
Copyright
Copyright © 2014 Lu Ma et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.