Abstract

In this study, the traffic crash rate, total crash frequency, and injury and fatal crash frequency were used to distinguish between rural and urban road segment safety. GIS-based crash data covering four and a half years in the Pikes Peak Area, US, were used for the analyses. The comparative statistical results show that crash rates on rural segments are consistently lower than on urban segments. Further, the results of Zero-Inflated Negative Binomial (ZINB) regression models indicate that urban areas have a higher crash risk than rural areas in terms of both total crash frequency and injury and fatal crash frequency. Additionally, the models show that crash frequencies increase as traffic volume and segment length increase, although higher traffic volume lowers the likelihood of severe crash occurrence; that compared to 2-lane roads, 4-lane roads have lower crash frequencies but a higher probability of severe crash occurrence; and that better road facilities with higher free flow speed benefit from high-standard design features, resulting in a lower total crash frequency, but do not mitigate the severe crash risk.

1. Introduction

Previous studies have sought to distinguish between rural and urban traffic safety using traffic crash data, but the influence of rural or urban settings on segment safety remains controversial. Research on fatal traffic crashes has indicated that fatality rates in rural areas are higher than in urban areas [1–3]. The higher fatality and injury rates on rural road facilities have been attributed to various causes, such as longer emergency response times and greater distances to crash locations [3]. Other explanations include higher speed limits, worse driving habits (e.g., alcohol use and lower rates of seat belt and other safety precaution use), road conditions, and access to trauma care [4–6]. On the other hand, some studies reported that crash frequencies in urban areas were higher than in rural areas [7, 8], because urban regions involve more complex traffic conditions, higher traffic volumes, congestion, poor pavement conditions, and so forth [9]. Moreover, although the risk of severe crashes appears higher on rural segments, no differences were identified in the cause or place of injury between urban and rural drivers [10].

Neither crash frequency nor fatality rate alone can fully represent the influence of rural or urban settings on segment safety. Many other factors contribute to traffic crashes, such as traffic characteristics, road design characteristics, demographic features, and pavement maintenance conditions [11–17]. Therefore, numerous cross-sectional studies have been conducted to characterize the relationships between these factors and road segment crashes. In prior studies, Poisson models have been the most widely used [18, 19]. However, a Poisson model is appropriate only when the mean and the variance of the crash frequencies are approximately equal, and this assumption has been shown to be invalid for modeling traffic crash frequencies [20, 21], because the variance of crash frequencies is generally greater than the mean. Negative binomial (NB) regression models were therefore introduced to overcome this overdispersion problem by relaxing the mean-variance equality constraint [16, 22]. Nevertheless, neither Poisson nor NB models can handle crash frequency data with a large density of zeros (segments on which no crashes occur during the observation period). Correspondingly, zero-inflated count regression models were developed and applied for analyzing and predicting crash frequencies. Zero-inflated count regression models can handle the apparent "excess" zeros in crash data and are generally more statistically suitable for modeling crash data than Poisson and NB regression models [23].
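The two data properties discussed above, a variance exceeding the mean and a large share of zeros, can be checked directly from segment counts. The sketch below uses a small hypothetical count vector (not the Pikes Peak data); a variance-to-mean ratio well above 1 suggests overdispersion relative to a Poisson model, and the zero share indicates how much a zero-inflated model may be needed.

```python
from statistics import mean, variance

def dispersion_summary(counts):
    """Return sample mean, sample variance, variance-to-mean ratio, and share of zero counts."""
    m = mean(counts)
    v = variance(counts)  # sample variance with an n - 1 denominator
    zero_share = sum(1 for c in counts if c == 0) / len(counts)
    return m, v, v / m, zero_share

# Hypothetical crash counts for ten segments (illustrative only).
counts = [0, 0, 0, 0, 0, 1, 1, 2, 3, 8]
m, v, ratio, zeros = dispersion_summary(counts)
print(f"mean={m:.2f} variance={v:.2f} ratio={ratio:.2f} zeros={zeros:.0%}")
```

A ratio near 1 would be compatible with a Poisson model; here the hypothetical data give a ratio above 4 with half the segments crash-free.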

To understand the role of rural or urban settings in segment safety, this study considers the crash rate, total crash frequency, and injury and fatal crash frequency to distinguish between rural and urban traffic safety. GIS-based crash data covering four and a half years in the Pikes Peak Area, USA, were used for the analyses. GIS techniques for traffic data processing have been shown to be effective for analyzing and visualizing crash data [24] and offer advantages in data display, clear presentation of spatial relationships, and convenient querying of relevant data [25, 26]. Building on previous discussions of the suitability of various models for predicting crash frequencies, we adopt zero-inflated negative binomial (ZINB) regression models for crash frequency analysis and prediction, because zero-crash segments account for more than 40% of the data in this study.

2. Methodology

2.1. Data Preparation

Accident data were obtained from the Department of Revenue (DOR) and comprise all accidents recorded from July 2006 to December 2010. The data contain useful traffic information, such as crash location, severity, weather condition, and segment type, and were geocoded into GIS databases by the PPACG (Pikes Peak Area Council of Governments).

Using a GIS spatial join between the whole road network and the urban boundary, the road segments were classified into two categories: rural segments and urban segments. Before analyzing segment crashes, crashes at intersections were separated from the databases. To do so, 200-ft intersection buffers were first created, and crashes within these buffers were excluded from the segment crash analyses. Then, using a road-segment layer extracted from the road network geodatabase, the crashes associated with segments were separated from all other crashes. Because these segments may have wide cross-sections, a 150-ft buffer on both sides of each arterial centerline was adopted to capture most crashes associated with the segments only. After the 150-ft buffers were created, the crashes within them were selected and aggregated to their corresponding segments.
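The buffer-based assignment described above can be sketched in plain Python with projected coordinates in feet. This is a simplified illustration, not the PPACG GIS workflow: it assumes each 200-ft intersection buffer is a circle around the intersection point, treats each segment as one straight centerline, and assigns a crash that falls within several segment buffers to the nearest centerline. The function and variable names are hypothetical.

```python
import math

INTERSECTION_BUFFER_FT = 200.0  # assumed radius of a circular intersection buffer
SEGMENT_BUFFER_FT = 150.0       # buffer width on each side of a segment centerline

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to line segment ab (projected coordinates, feet)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Clamp the projection parameter so the nearest point stays on the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def assign_crashes(crashes, segments, intersections):
    """Count crashes per segment, skipping crashes inside intersection buffers.

    crashes: list of (x, y); segments: dict id -> ((x1, y1), (x2, y2));
    intersections: list of (x, y).
    """
    counts = {seg_id: 0 for seg_id in segments}
    for c in crashes:
        if any(math.hypot(c[0] - ix, c[1] - iy) <= INTERSECTION_BUFFER_FT
               for ix, iy in intersections):
            continue  # intersection-related crash, excluded from segment analysis
        dists = {sid: point_segment_distance(c, a, b) for sid, (a, b) in segments.items()}
        nearest = min(dists, key=dists.get)
        if dists[nearest] <= SEGMENT_BUFFER_FT:
            counts[nearest] += 1
    return counts

segments = {"S1": ((0.0, 0.0), (5000.0, 0.0))}
intersections = [(0.0, 0.0)]
crashes = [(100.0, 20.0), (1000.0, 50.0), (2000.0, 400.0), (3000.0, -140.0)]
print(assign_crashes(crashes, segments, intersections))  # {'S1': 2}
```

The first crash falls in the intersection buffer and the third lies 400 ft from the centerline, so only two crashes are aggregated to S1.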

Because different categories of road facilities vary in highway design, traffic operation, and environment, the crash data associated with a specific highway type needed to be separated from those of the other types. In this study, the crash risk was calculated and analyzed not only for the overall segment network but also for interstates, expressways, principal arterials, and minor arterials, respectively. Segments belonging to other road types were excluded from the analysis. The combined data set was further organized according to the following criteria.
(i) Accidents were divided into three severity categories: fatal, injury, and property-damage only (PDO).
(ii) Road segments with 2 and 4 lanes were selected, because 6-lane segments exist in urban areas only.
(iii) ADT was expressed in thousands of vehicles (ADT/1000), because the change in crash frequency per one-vehicle increment is too small to be meaningful.
The cleaned accident data were overlaid with the GIS-based network and distributed to each segment in rural and urban areas. The segments were first analyzed and compared in terms of crash rate using comparative statistics for the four types of road segments. Then, ZINB models for segment crash frequency analysis and prediction were developed; the model variables are described in Table 1.
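The organizing criteria listed above might be applied to a segment table as sketched below. The record fields, the facility-type strings, and the 0/1 coding of RoU (1 = urban) are illustrative assumptions; only the variable names Numberofla, ADT_1000, LENGTH, and RoU come from the model specification.

```python
from dataclasses import dataclass

# Hypothetical labels for the four study facility types.
STUDY_TYPES = {"interstate", "expressway", "principal arterial", "minor arterial"}

@dataclass
class SegmentRecord:
    seg_id: str
    road_type: str
    lanes: int
    adt: float        # average daily traffic, vehicles/day
    length_mi: float
    urban: bool

def severity_class(fatalities: int, injuries: int) -> str:
    """Criterion (i): label a crash by its most severe outcome."""
    if fatalities > 0:
        return "fatal"
    if injuries > 0:
        return "injury"
    return "PDO"

def prepare(records):
    """Criteria (ii) and (iii): keep 2- and 4-lane study segments and rescale ADT to thousands."""
    rows = []
    for r in records:
        if r.road_type not in STUDY_TYPES or r.lanes not in (2, 4):
            continue
        rows.append({
            "seg_id": r.seg_id,
            "Numberofla": r.lanes,
            "ADT_1000": r.adt / 1000.0,
            "LENGTH": r.length_mi,
            "RoU": 1 if r.urban else 0,  # assumed coding: 1 = urban, 0 = rural
        })
    return rows

sample = [
    SegmentRecord("A", "minor arterial", 2, 8500, 0.4, True),
    SegmentRecord("B", "principal arterial", 6, 42000, 0.3, True),  # dropped: 6 lanes
    SegmentRecord("C", "local", 2, 900, 1.1, False),                # dropped: not a study type
    SegmentRecord("D", "interstate", 4, 31000, 1.8, False),
]
print(prepare(sample))
```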

2.1.1. Zero-Inflated Negative Binomial Regression

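A Poisson crash frequency model assumes that the observed crash count $y_i$ on segment $i$, given the covariate vector $\mathbf{x}_i$, follows a Poisson distribution. The density function of $y_i$ can be expressed as follows:
$$P(y_i \mid \mathbf{x}_i) = \frac{\exp(-\lambda_i)\,\lambda_i^{y_i}}{y_i!}, \qquad y_i = 0, 1, 2, \ldots$$
where the parameter $\lambda_i$, the conditional mean number of events for covariate vector $\mathbf{x}_i$, is given by
$$\lambda_i = \exp(\mathbf{x}_i^{\top}\boldsymbol{\beta}),$$
where $\boldsymbol{\beta} = (\beta_0, \beta_1, \ldots, \beta_k)^{\top}$ is a parameter vector ($\beta_0$ is the coefficient for the intercept, and $\beta_1, \ldots, \beta_k$ are for the regressors).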
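The Poisson density and log-link mean can be evaluated directly. A minimal sketch, assuming the covariate vector carries a leading 1 for the intercept; the coefficient values in the usage lines are hypothetical.

```python
import math

def poisson_mean(x, beta):
    """lambda_i = exp(x_i' beta); x[0] should be 1.0 so that beta[0] acts as the intercept."""
    return math.exp(sum(xj * bj for xj, bj in zip(x, beta)))

def poisson_pmf(y, lam):
    """exp(-lambda) * lambda**y / y!, evaluated on the log scale to avoid overflow."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

# Hypothetical coefficients: intercept -1.0 and 0.5 per unit of a single regressor.
lam = poisson_mean([1.0, 2.0], [-1.0, 0.5])
print(lam, [round(poisson_pmf(k, lam), 4) for k in range(4)])
```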

In Poisson regression, the conditional variance of the count variable equals the conditional mean:
$$\operatorname{Var}(y_i \mid \mathbf{x}_i) = E(y_i \mid \mathbf{x}_i) = \lambda_i,$$
where $\mathbf{x}_i$ is the vector of road segment geometric and traffic covariates in record $i$, including the intercept, and $\lambda_i$ is the conditional mean of the crash frequency $y_i$. Because this assumption contradicts the fact that vehicle crash data are typically significantly overdispersed (the variance exceeds the mean), the NB regression model was developed by adding a heterogeneity component that accounts for unobserved heterogeneity in the crash count data:
$$\lambda_i = \exp(\mathbf{x}_i^{\top}\boldsymbol{\beta} + \varepsilon_i),$$
where $\boldsymbol{\beta}$ is the vector of parameter coefficients to be estimated for the independent variables, including the intercept, and $\varepsilon_i$ is the heterogeneity component, which is independent of $\mathbf{x}_i$. However, crash count data often contain a large density of zeros, which traditional NB models cannot predict accurately. For this situation, zero-inflated regression models were developed in crash frequency research.
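The effect of the heterogeneity term can be illustrated by simulation. Assuming the standard NB2 specification, in which $\exp(\varepsilon_i)$ follows a gamma distribution with mean 1 and variance $\alpha$, the resulting counts have mean $\mu$ and variance $\mu + \alpha\mu^2$. The sketch draws a gamma multiplier for each observation and then a Poisson count at the multiplied rate; $\mu = 2$ is hypothetical, and $\alpha = 1.435$ is borrowed from the Alpha reported in Table 4 only for illustration.

```python
import math
import random
from statistics import mean, variance

def poisson_draw(lam, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_nb(mu, alpha, n, seed=1):
    """Gamma-Poisson mixture: Poisson counts whose rate is mu times a gamma(1/alpha, alpha) multiplier."""
    rng = random.Random(seed)
    shape, scale = 1.0 / alpha, alpha  # multiplier has mean 1 and variance alpha
    return [poisson_draw(mu * rng.gammavariate(shape, scale), rng) for _ in range(n)]

draws = simulate_nb(mu=2.0, alpha=1.435, n=100_000)
print(mean(draws), variance(draws), 2.0 + 1.435 * 2.0 ** 2)
```

The simulated variance should land near 7.74, far above the mean of about 2, which is the overdispersion a plain Poisson model cannot reproduce.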

Zero-inflated count models provide a way of modeling the excess zeros while also allowing for overdispersion. For each road segment, there are two possible data-generating processes. Process 1 is chosen with probability $p_i$ and process 2 with probability $1 - p_i$. Process 1 generates only zero counts, whereas process 2 generates counts from either a Poisson or a negative binomial model. In this paper, the probability $p_i$ depends on the geometric and traffic features of segment $i$ and is obtained from the logistic function as follows:
$$p_i = \frac{\exp(\mathbf{z}_i^{\top}\boldsymbol{\gamma})}{1 + \exp(\mathbf{z}_i^{\top}\boldsymbol{\gamma})},$$
where $\mathbf{z}_i$ is the vector of independent variables specified in the logistic regression model (road facility and traffic features) plus the intercept, and $\boldsymbol{\gamma}$ is the vector of zero-inflation coefficients to be estimated.
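The logistic link for the zero-state probability can be sketched as below. The coefficient vector is hypothetical, chosen with a negative traffic coefficient so that the zero-crash probability falls as ADT_1000 rises; the study's own estimates are in Tables 4 and 5.

```python
import math

def zero_state_probability(z, gamma):
    """p_i = exp(z'gamma) / (1 + exp(z'gamma)), computed in the equivalent 1 / (1 + exp(-eta)) form."""
    eta = sum(zj * gj for zj, gj in zip(z, gamma))
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients: intercept 1.0 and -0.8 per 1000 vehicles/day.
gamma = [1.0, -0.8]
for adt_1000 in (1.0, 5.0, 10.0):
    print(adt_1000, round(zero_state_probability([1.0, adt_1000], gamma), 3))
```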

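The probability of crash frequency $y_i$ for segment $i$ can be expressed as follows:
$$P(y_i = 0 \mid \mathbf{x}_i, \mathbf{z}_i) = p_i + (1 - p_i)\, g(0 \mid \mathbf{x}_i),$$
$$P(y_i = k \mid \mathbf{x}_i, \mathbf{z}_i) = (1 - p_i)\, g(k \mid \mathbf{x}_i), \qquad k = 1, 2, \ldots$$
where $g(\cdot \mid \mathbf{x}_i)$ follows either a Poisson distribution or an NB distribution, and $\mathbf{x}_i$ is the vector of covariates of observation $i$ specified in the count model.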
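A minimal sketch of the zero-inflated NB probability, assuming the NB2 parameterization (mean $\mu$, variance $\mu + \alpha\mu^2$) for the count component. The segment values $p = 0.3$ and $\mu = 2$ are hypothetical; the check confirms that the probabilities sum to 1 and that the overall mean equals $(1 - p)\mu$.

```python
import math

def nb2_pmf(y, mu, alpha):
    """NB2 probability of y crashes with mean mu and overdispersion alpha."""
    r = 1.0 / alpha
    return math.exp(math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
                    + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))

def zinb_pmf(y, p, mu, alpha):
    """Structural zero with probability p; otherwise an NB2(mu, alpha) count."""
    count_part = (1.0 - p) * nb2_pmf(y, mu, alpha)
    return p + count_part if y == 0 else count_part

# Hypothetical segment: p = 0.3 from the logistic part, mu = 2.0 from the count part.
probs = [zinb_pmf(k, 0.3, 2.0, 1.435) for k in range(400)]
print(round(probs[0], 4), round(sum(probs), 6), round(sum(k * q for k, q in enumerate(probs)), 4))
```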

In this study, ZINB models were used for regression efforts because zero-crash segments account for more than 40% of the total data.

3. Results

3.1. Comparative Statistical Analyses of Rural and Urban Traffic Safety

During the four-and-a-half-year observation period, 9651 crashes occurred in the study area: 1057 on rural segments and 8594 on urban segments. The rural segment crashes included 15 fatal and 176 injury crashes, whereas 46 fatal and 1038 injury crashes occurred on urban segments. Table 2 shows the descriptive statistics for rural and urban segment lengths: the average rural segment (0.968 mile) is longer than the average urban segment (0.293 mile) because of the lower density of intersections in rural networks. Figure 1 displays the distribution of road segment crash rates, calculated as the number of crashes per 100 million vehicle miles traveled (VMT), where the double line marks the boundary between rural and urban areas. It shows that the percentage of segments with high crash rates is larger in the urban region than in rural areas.
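The two crash-rate measures used in this section (crashes per 100 million VMT, and crashes per lane-mile per year in Table 3) can be computed per segment as below. The sketch assumes segment VMT = ADT × 365 × years × segment length and a 4.5-year exposure period (July 2006 to December 2010); the segment values in the usage line are hypothetical.

```python
STUDY_YEARS = 4.5  # July 2006 to December 2010

def rate_per_100m_vmt(crashes, adt, length_mi, years=STUDY_YEARS):
    """Crashes per 100 million vehicle miles traveled over the study period."""
    vmt = adt * 365.0 * years * length_mi
    return crashes * 1e8 / vmt

def rate_per_lane_mile_year(crashes, lanes, length_mi, years=STUDY_YEARS):
    """Crashes per lane-mile per year."""
    return crashes / (lanes * length_mi * years)

# Hypothetical 2-lane segment: 12 crashes, ADT 15,000 vehicles/day, 0.5 mile long.
print(round(rate_per_100m_vmt(12, 15000, 0.5), 2), round(rate_per_lane_mile_year(12, 2, 0.5), 3))
```

Normalizing by VMT accounts for traffic exposure, while the lane-mile measure normalizes only by road supply, which is one reason the two measures can rank segments differently.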

Table 3 displays the t-test statistics of the rural and urban segment comparison for different types of facilities. For 2-lane segments, there is a significant difference between rural and urban crash rates under both measures, crashes per lane-mile per year and crashes per 100 million VMT, with rural segments consistently showing lower crash rates than urban segments. The 2-lane expressway is the exception, mainly because of the small sample size of 2-lane rural expressways. However, there is no statistically significant difference between rural and urban 4-lane arterial segments.
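A two-sample t statistic for comparing rural and urban crash rates can be sketched with the standard library. The text does not state which t-test variant was used; this sketch assumes Welch's unequal-variance form and computes only the statistic and its approximate degrees of freedom, not a p-value. The rate lists are hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for samples a and b."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical crash rates (crashes per lane-mile per year) for a few segments.
rural = [1, 2, 3, 4]
urban = [3, 5, 7, 9]
print(welch_t(rural, urban))
```

Welch's form does not assume equal variances, which suits comparisons where rural and urban samples differ greatly in size and spread.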

3.2. ZINB Regression Analyses

The crash frequency distribution histogram (Figure 2) clearly shows excess zeros (over 40%) in the crash data. The p-values of the Kolmogorov-Smirnov, Cramér-von Mises, and Anderson-Darling normality tests are all less than 0.05, so the null hypothesis that the crash data follow a normal distribution is rejected. Given these non-normal, non-negative count data with excess zeros, ZINB models are suitable for the crash count regression analyses.
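The Kolmogorov-Smirnov statistic behind the first of these tests can be sketched as the largest gap between the empirical CDF and a normal CDF fitted with the sample mean and standard deviation. This computes only the distance D on hypothetical counts; because the normal parameters are estimated from the data, classical KS p-values would need a Lilliefors-type correction, which this sketch does not attempt.

```python
from statistics import NormalDist, mean, stdev

def ks_statistic_normal(data):
    """KS distance D between the empirical CDF and a normal with the sample mean and stdev."""
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        f = fitted.cdf(x)
        # Compare F(x) with the ECDF just after and just before x; over a run of
        # tied values the largest of these gaps is still visited.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

# Hypothetical zero-heavy segment counts (not the study data).
counts = [0, 0, 0, 0, 0, 0, 1, 2, 3, 9]
print(round(ks_statistic_normal(counts), 3))
```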

ZINB models were developed using SAS 9.2. The crash frequency on a segment (Num_crsh) was chosen as the dependent variable, and the regressors were segment length (LENGTH), number of lanes (Numberofla), annual average daily traffic in thousands (ADT_1000), free flow speed (FFS), and RoU (rural or urban). Segment type was not included in the model because it was highly correlated with FFS and RoU.

Table 4 shows the parameter estimates of the ZINB model for total crash frequency on a segment; only statistically significant variables were included in the model. The ZINB parameter estimates include two parts: NB regression and logistic regression. The NB regression shows that the number of lanes, rural or urban setting, ADT, length, and FFS are all significantly correlated with the number of crashes. Further, Alpha in Table 4 is 1.435, with a p-value less than 0.001, indicating a very strong overdispersion effect and the superiority of the ZINB model over the zero-inflated Poisson (ZIP) model. ADT_1000 and LENGTH are positively associated with crash frequency, suggesting that crash frequencies increase with traffic volume and segment length. These results are consistent with many previous findings [7, 9, 27]. FFS is negatively associated with crash frequency, indicating that crash frequencies decrease as roadway free flow speed increases. Since FFS is correlated with the design standard of road facilities, it is more appropriate to interpret this result as a better road facility with higher FFS having a lower crash rate than facilities with lower FFS. In this study, FFS can be treated as a surrogate for the speed limit, but it reflects the actual traffic operating conditions on road segments more accurately than the speed limit does. Previous research findings on the impact of speed limits on crash frequency are less conclusive [28]. In addition, 4-lane roadways were found to be associated with fewer crashes than 2-lane roadways in this model. This is reasonable because the comparison holds traffic exposure constant, so 4-lane segments carry a lower traffic volume per lane. More importantly, urban regions appear to have a higher crash frequency than rural areas, which is consistent with the crash rate analysis results.
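Two quantities help read the count part of Table 4. Assuming the NB2 form used by common ZINB implementations, Alpha scales the extra variance as $\mu + \alpha\mu^2$; and with the log link, a coefficient $\beta$ multiplies the expected count by $\exp(\beta)$ per unit increase in its regressor. The means and the coefficient value below are hypothetical; only Alpha = 1.435 comes from Table 4.

```python
import math

ALPHA_TOTAL = 1.435  # overdispersion parameter reported in Table 4

def nb2_variance(mu, alpha=ALPHA_TOTAL):
    """Variance of the NB2 count component: mu + alpha * mu**2."""
    return mu + alpha * mu ** 2

def rate_ratio(beta, delta=1.0):
    """Multiplicative change in the expected count when a regressor rises by delta (log link)."""
    return math.exp(beta * delta)

for mu in (0.5, 1.0, 3.0):
    print(mu, nb2_variance(mu))
# Hypothetical coefficient of 0.05 on ADT_1000: +1000 vehicles/day gives about 5% more expected crashes.
print(rate_ratio(0.05))
```

These multipliers apply to the NB count component; the zero-inflation part shifts the overall expected count further by the factor $1 - p_i$.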
The logistic regression part of the model predicts the likelihood of zero crash occurrence. The results reveal that ADT_1000 and LENGTH are significant in estimating the probability that a segment belongs to the zero-crash group. According to the estimated coefficients, the higher the traffic exposure (AADT in thousands and segment length), the lower the probability of zero crashes, which is consistent with previous study conclusions.

Furthermore, Table 5 shows the parameter estimates of the ZINB model for injury and fatal crash frequency on a segment (Alpha is 1.074, with a p-value less than 0.001). The NB regression indicates that Numberofla, RoU, ADT_1000, and LENGTH are significant predictors of injury and fatal crash frequency, a result very similar to that for total crash frequency except for FFS. This implies that although better road facilities with higher FFS benefit from high-standard design features that result in a lower total crash frequency (as shown in Table 4), they do not mitigate the severe crash risk. A previous study reported that, controlling for other factors, increasing the operating speed on road segments by 1% alone would result in approximately a 2% increase in the injury crash rate and a 4% increase in the fatal crash rate [29]. On the other hand, compared to the total crash frequency model, the logistic regression results for the injury and fatal crash frequency model are quite different, although the effect of LENGTH remains similar. First, the number of lanes is a significant variable for estimating the probability of zero injury and fatal crashes on a segment. Compared to 2-lane roads, 4-lane roads have a lower severe crash frequency but a lower probability of zero severe crashes. A possible explanation is that lane-changing maneuvers on 4-lane segments increase the severe crash risk. Second, the effect of ADT_1000 in the logistic regression of the injury and fatal crash model is the reverse of that in the total crash model: as traffic volume increases, the likelihood of zero severe crashes increases. This finding is consistent with the conclusion of a previous crash severity study, which explains that lower ADT could mean higher speeds, which more often lead to severe or fatal crashes [30].
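The percentages quoted from [29] correspond to speed elasticities of about 2 for injury crashes and 4 for fatal crashes. Assuming a power relationship, crash rate proportional to speed raised to an exponent k (a functional form assumed here for illustration, not necessarily the model used in [29]), the implied change for any speed increase can be computed as follows.

```python
def rate_change_pct(speed_change_pct, elasticity):
    """Percent change in crash rate under rate proportional to speed**elasticity (assumed power form)."""
    return ((1.0 + speed_change_pct / 100.0) ** elasticity - 1.0) * 100.0

# A 1% speed increase under exponents 2 (injury) and 4 (fatal).
print(round(rate_change_pct(1.0, 2.0), 4), round(rate_change_pct(1.0, 4.0), 4))
```

For small speed changes the result is close to the exponent times the speed change, which is why a 1% increase maps to roughly 2% and 4%.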

4. Conclusion and Discussions

There have been numerous studies aiming to clarify the role of rural or urban settings in segment safety, but the question remains controversial. Before a consensus on the difference between rural and urban traffic safety can be reached, it is important to clarify the definition of "rural." Generally, as distinct from urban environments, rural areas have attributes associated with demographic features (e.g., low population size and density, location outside the urban boundary), economic status (e.g., low economic indicators, farming, and agriculture), social structure (e.g., intimate, informal, and homogeneous forms of social interaction, limited social resources), cultural characteristics (e.g., traditional, conservative, provincial, slow to change), and so forth. These features are often used to explain the statistical fact that death rates from many common causes are significantly higher in rural than in urban areas, both in the US [1, 6] and in other countries [31–33].

However, these criteria should not be applied universally in local transportation safety analyses. In many developed regions, although districts are clearly separated into rural and urban regions by their demographic, economic, or social attributes, the transportation facilities are well connected to each other and form more standardized road networks. Accordingly, relatively high numbers of crashes have been reported in urban regions, because heavy traffic volumes and complex driving environments lead to more conflicts between vehicles [34]. Therefore, for a specific safety evaluation project, this study supports the argument that more detailed crash risk comparisons between rural and urban road segments should be performed at a comparable level. In this paper, crash rate comparisons and ZINB regressions for both total crash frequency and injury and fatal crash frequency on road segments were conducted to discriminate between rural and urban traffic safety. Compared to urban areas, rural areas were found to have lower crash rates, total crash frequencies, and injury and fatal crash frequencies. The results of the ZINB regression models also showed the following.
(i) Segment crash frequencies increase as traffic volume and segment length increase. However, higher traffic volume lowers the likelihood of severe crash occurrence.
(ii) Compared to 2-lane roads, 4-lane roads have a lower crash frequency but a higher probability of severe crash occurrence.
(iii) Better road facilities with higher free flow speed benefit from high-standard design features, resulting in a lower total crash frequency, but do not mitigate the severe crash risk.
Finally, it can be concluded that in the study area, traffic safety on rural segments is better than on urban segments, which implies that priority for traffic safety improvement should be given to urban highway segments.

Acknowledgments

The authors acknowledge that this study is supported by Chinese National 973 Project (2012CB725403), National Natural Science Foundation (71171014, 71210001), Ph.D. Programs Foundation of Ministry of Education of China (20110009110013), the State Key Laboratory of Rail Traffic Control and Safety (RCS2011ZT007), and Program for New Century Excellent Talents in University (NCET-11-0570).