Journal of Advanced Transportation

Volume 2019, Article ID 3521793, 13 pages

https://doi.org/10.1155/2019/3521793

## Comparative Analysis of the Reported Animal-Vehicle Collisions Data and Carcass Removal Data for Hotspot Identification

¹Key Laboratory of Road and Traffic Engineering of Ministry of Education, Tongji University, Shanghai 201804, China

²Texas A&M Transportation Institute, 3135 TAMU, College Station, Texas 77843-3135, USA

³Department of Civil and Environmental Engineering, University of Washington, More Hall 133B, Washington, USA

Correspondence should be addressed to Yichuan Peng; yichuanpeng1982@hotmail.com

Received 13 October 2018; Revised 8 January 2019; Accepted 30 January 2019; Published 1 April 2019

Academic Editor: Md. Mazharul Haque

Copyright © 2019 Xiaoxue Yang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Transportation management agencies usually record two common types of animal-vehicle collision data: reported animal-vehicle collision (AVC) data and carcass removal data. Previous studies have found that these two datasets often exhibit different characteristics. To accurately identify higher-risk animal-vehicle collision sites, this study compared carcass removal data and reported AVCs in terms of hotspot identification results and the effects of explanatory variables. To achieve this objective, both the Negative Binomial (NB) model and the generalized Negative Binomial (GNB) model were applied to calculate Empirical Bayesian (EB) estimates using animal collision data collected on ten highways in Washington State. The main findings are as follows. The explanatory variables have different effects on the occurrence of carcass removals and reported AVCs. The rankings based on EB estimates differ significantly between carcass removal data and reported AVC data. The identified hotspots also differ between the two datasets. However, the rankings from the GNB models are more consistent than those from the NB models. Thus, transportation management agencies should be cautious when using either carcass removal data or reported AVC data to identify hotspots.

#### 1. Introduction

Animal-vehicle collisions (AVCs) have long been an active research topic. Van der Ree et al. [1] indicated that mortality caused by AVCs is a major concern in most developed countries and will become more serious in developing countries over the next few decades. It was estimated that more than 1 million AVCs occurred per year in the 1990s [2]. AVCs cause approximately 155–211 deaths, 13,713–29,000 injuries, and 1 billion dollars in property damage per year [2–5]. Records from the NHTSA Fatality Analysis Reporting System (FARS) indicate that the average number of fatal AVCs has been increasing year by year [4]. Previous studies found that AVCs have significantly reduced wild animal populations [6–8] and that billions of wild animals are killed annually in collisions with vehicles and other modes of transportation [9, 10].

Hotspot identification (HSID), that is, identifying sites with higher collision risk as hotspots, is an important task in the overall road safety improvement process, because it allows reasonable management measures to be implemented with limited resources. In recent years, researchers have proposed various HSID methods, e.g., accident frequency (AF), accident rate (AR), accident reduction potential (ARP), and Empirical Bayesian (EB). Among these methods, the EB method is adopted in this study [11–15]. Previous research has mainly investigated two types of animal-vehicle collision data: carcass removal counts and reported AVCs [16, 17]. To reduce the risk of AVCs and formulate effective countermeasures, transportation safety researchers have tried various statistical models to quantify the influence of explanatory variables on AVCs [18], such as Poisson regression [19–22], Negative Binomial (NB) regression [23–27], Poisson–lognormal regression [28], and Gamma regression [29, 30].
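To illustrate why the choice of HSID method matters, the following minimal Python sketch ranks a handful of hypothetical segments by accident frequency (AF, the raw crash count) and by accident rate (AR, crashes per million vehicle-miles of exposure). The segment IDs, counts, and exposures are invented for illustration; the point is only that the two criteria can flag entirely different top-ranked sites.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    seg_id: str
    crashes: int          # observed crashes over the study period (hypothetical)
    exposure_mvm: float   # exposure in million vehicle-miles (hypothetical)


def top_k(segments, key, k):
    """Return the IDs of the k segments with the largest value of `key`."""
    ranked = sorted(segments, key=key, reverse=True)
    return [s.seg_id for s in ranked[:k]]


segments = [
    Segment("A", 12, 6.0),
    Segment("B", 9, 1.5),
    Segment("C", 4, 0.5),
    Segment("D", 10, 8.0),
]

# Accident frequency (AF): rank by raw crash count
by_frequency = top_k(segments, key=lambda s: s.crashes, k=2)
# Accident rate (AR): rank by crashes per million vehicle-miles
by_rate = top_k(segments, key=lambda s: s.crashes / s.exposure_mvm, k=2)

print("AF hotspots:", by_frequency)  # AF hotspots: ['A', 'D']
print("AR hotspots:", by_rate)       # AR hotspots: ['C', 'B']
```

Frequency favors heavily traveled segments, while rate favors low-exposure segments with a few crashes. The EB method adopted here is designed to balance a model prediction against the observed count instead of relying on either raw measure alone.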

Most previous AVC studies considered either the reported AVC data or the carcass removal data. For carcass removal data, Gkritza et al. [31] evaluated the effects of deterministic factors on the frequency and severity of AVCs using Poisson and NB regression models. For reported AVC data, stepwise logistic regression models were used to identify important factors and high-risk collision locations [32–34]. Seiler [35] applied a multiple logistic regression model to reported AVC data and nonincident control points to predict collisions. Lao et al. [36] found that a carcass on the road is likely the result of a collision with a vehicle. However, many previous studies found that the number of carcass removals differs from the number of reported AVCs [37–39]. The discrepancy between the two data sources can be explained as follows. First, not all animals involved in AVCs die. Second, not all carcasses are reported.

Several researchers have also focused on the differences and relationships between the two datasets. Lao et al. [36] used a fuzzy logic–based mapping algorithm to merge the two incomplete datasets. Lao et al. [40] developed a diagonal inflated bivariate Poisson regression model to analyze the two datasets simultaneously. To predict AVC risk, Visintin et al. [41] proposed a model that considers two types of factors: vehicles and animals.

However, few studies have compared the hotspot identification results obtained from carcass removal data and reported AVC data. Thus, the primary objective of this paper is to examine the differences between carcass removals and reported AVCs in hotspot identification and in the effects of explanatory variables. To achieve this objective, both the traditional NB model and the generalized Negative Binomial (GNB) model are applied to calculate the EB estimates. The NB model assumes a fixed dispersion parameter, whereas the GNB model allows the dispersion parameter to vary from site to site. This study analyzes the collision data collected on ten highways in Washington State.
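One simple way to quantify how much two data sources disagree on hotspots is to rank sites separately by the EB estimates from each source, flag the top k sites as hotspots, and count how many are shared. The Python sketch below does this for five hypothetical segments; the EB values, the function names `hotspots` and `overlap_share`, and the top-k rule are illustrative assumptions and not necessarily the evaluation criteria used in this paper.

```python
def hotspots(eb: dict[str, float], k: int) -> set[str]:
    """Flag the k sites with the highest EB estimates as hotspots."""
    ranked = sorted(eb, key=eb.get, reverse=True)
    return set(ranked[:k])


def overlap_share(eb_a: dict[str, float], eb_b: dict[str, float], k: int) -> float:
    """Fraction of source-A hotspots that source B also flags."""
    hs_a, hs_b = hotspots(eb_a, k), hotspots(eb_b, k)
    return len(hs_a & hs_b) / len(hs_a)


# Hypothetical EB estimates (collisions per year) for the same five segments
eb_carcass = {"s1": 3.1, "s2": 0.4, "s3": 2.2, "s4": 1.0, "s5": 0.9}
eb_reported = {"s1": 1.8, "s2": 0.3, "s3": 0.5, "s4": 1.2, "s5": 0.2}

print(sorted(hotspots(eb_carcass, k=2)))            # ['s1', 's3']
print(sorted(hotspots(eb_reported, k=2)))           # ['s1', 's4']
print(overlap_share(eb_carcass, eb_reported, k=2))  # 0.5
```

Here only half of the carcass-based hotspots are confirmed by the reported AVC data, which is the kind of disagreement the comparison in this paper sets out to measure on real data.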

The rest of the paper is organized as follows. The second section introduces the EB methods based on the NB and GNB models used in this study. The third section provides the data description and preliminary data analysis. The fourth section presents the model results and compares the reported AVC data and the carcass removal data using the EB methods based on the NB and GNB models. Finally, the model results are discussed and summarized.

#### 2. Materials and Methods

The following two subsections introduce the EB methods based on the Negative Binomial model and the generalized Negative Binomial model, respectively.

##### 2.1. Negative Binomial Model Based Empirical Bayesian Method

The EB estimate of a site consists of two parts: the predicted number of crashes at similar sites and the observed number of crashes at the site itself. The prediction is usually based on safety performance functions (SPFs), which commonly assume that crash counts follow a specified probability distribution. To date, the NB model has been the most popular choice for estimating EB values, and the EB weight factor is determined by the dispersion parameter of the NB model. The NB model is structured as follows. The number of crashes $y_i$ at site $i$ during a specific time period is first assumed to follow a Poisson distribution:

$$P(y_i \mid \mu_i) = \frac{\mu_i^{y_i}\, e^{-\mu_i}}{y_i!}$$

where $\mu_i$ = mean response of the observation.
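As a quick numerical check of the Poisson assumption, the sketch below evaluates $P(y \mid \mu) = \mu^{y} e^{-\mu} / y!$ in log space (which avoids overflow in $\mu^{y}$ and $y!$ for large counts) and confirms that the probabilities sum to one and average to $\mu$. The function name `poisson_pmf` and the mean of 2.0 are illustrative choices, not taken from this study.

```python
import math


def poisson_pmf(y: int, mu: float) -> float:
    """Probability of observing y crashes when the Poisson mean is mu (mu > 0)."""
    # log P = y*log(mu) - mu - log(y!), where log(y!) = lgamma(y + 1)
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))


mu = 2.0  # hypothetical mean number of crashes per site and period
probs = [poisson_pmf(y, mu) for y in range(50)]  # tail beyond 49 is negligible here

print(round(probs[0], 4))                               # 0.1353, i.e., exp(-2)
print(round(sum(probs), 6))                             # 1.0
print(round(sum(y * p for y, p in enumerate(probs)), 6))  # 2.0, the mean mu
```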

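If the Poisson rate is assumed to be gamma distributed, the response variable follows an NB distribution; thus, the NB distribution can be seen as a mixture of Poisson distributions. Hilbe [42] presented the full derivation of the NB model. The probability mass function of the NB distribution is

$$P(y_i \mid \mu_i, \alpha) = \frac{\Gamma(y_i + 1/\alpha)}{\Gamma(1/\alpha)\, y_i!} \left(\frac{1}{1 + \alpha\mu_i}\right)^{1/\alpha} \left(\frac{\alpha\mu_i}{1 + \alpha\mu_i}\right)^{y_i}$$

where $y_i$ = response variable; $\mu_i$ = mean of the observation; and $\alpha$ = dispersion parameter.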
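The gamma–Poisson mixture interpretation can be checked numerically. The Python sketch below (function names are ours) evaluates the NB probability mass function in log space and compares it with the probability obtained by integrating the Poisson probability over a gamma-distributed rate with shape $1/\alpha$ and scale $\alpha\mu$ (so that the rate has mean $\mu$), using a simple midpoint rule. The parameter values $\mu = 3$ and $\alpha = 0.5$ are arbitrary illustrations.

```python
import math


def nb_pmf(y: int, mu: float, alpha: float) -> float:
    """NB probability of y counts with mean mu and dispersion alpha > 0."""
    r = 1.0 / alpha  # gamma shape parameter
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(1.0 / (1.0 + alpha * mu))
             + y * math.log(alpha * mu / (1.0 + alpha * mu)))
    return math.exp(log_p)


def gamma_poisson_pmf(y: int, mu: float, alpha: float,
                      upper: float = 60.0, steps: int = 20000) -> float:
    """Integrate Poisson(y | lam) * Gamma(lam; shape=1/alpha, scale=alpha*mu) over lam
    with the midpoint rule on (0, upper]. Assumes alpha <= 1 (so the gamma density is
    bounded near zero) and that `upper` covers the tail of the rate distribution."""
    shape, scale = 1.0 / alpha, alpha * mu
    h = upper / steps
    total = 0.0
    for j in range(steps):
        lam = (j + 0.5) * h
        log_gamma_pdf = ((shape - 1.0) * math.log(lam) - lam / scale
                         - math.lgamma(shape) - shape * math.log(scale))
        log_poisson = y * math.log(lam) - lam - math.lgamma(y + 1)
        total += math.exp(log_gamma_pdf + log_poisson) * h
    return total


mu, alpha = 3.0, 0.5  # hypothetical mean and dispersion
for y in range(5):
    # Closed-form NB probability vs. numerically integrated gamma-Poisson mixture
    print(y, round(nb_pmf(y, mu, alpha), 5), round(gamma_poisson_pmf(y, mu, alpha), 5))
```

For $y = 0$ both columns equal $(1 + \alpha\mu)^{-1/\alpha} = 2.5^{-2} = 0.16$, and the remaining rows agree to the printed precision.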

Compared with the Poisson distribution, the NB distribution is appropriate for handling overdispersion (that is, a variance larger than the mean). For $\alpha > 0$, the mean of $y_i$ is $\mu_i$ and the variance is $\mu_i + \alpha\mu_i^2$. As $\alpha \to 0$, the variance approaches the mean and the NB distribution converges to the Poisson distribution.
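Because the NB variance is $\mu + \alpha\mu^2$, the dispersion parameter can be roughly recovered from sample moments as $\hat{\alpha} = (s^2 - \bar{y}) / \bar{y}^2$. This method-of-moments estimator is only an illustration, not a substitute for the maximum likelihood fitting typically used for NB regression. The sketch below simulates counts from the gamma–Poisson mixture with Python's standard `random` module (which has no built-in Poisson sampler, so Knuth's multiplication method is used) and checks that the sample variance exceeds the mean by about $\alpha\mu^2$. The seed and parameter values are arbitrary.

```python
import math
import random


def poisson_sample(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for the small means used here."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_nb(mu: float, alpha: float, n: int, seed: int = 2019) -> list[int]:
    """Draw n counts: rate ~ Gamma(shape=1/alpha, scale=alpha*mu), count ~ Poisson(rate)."""
    rng = random.Random(seed)
    return [poisson_sample(rng.gammavariate(1.0 / alpha, alpha * mu), rng)
            for _ in range(n)]


def moment_dispersion(counts: list[int]) -> tuple[float, float, float]:
    """Return (sample mean, sample variance, method-of-moments alpha estimate)."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return mean, var, (var - mean) / mean ** 2


mu, alpha = 3.0, 0.5  # hypothetical values; theoretical variance = 3 + 0.5 * 9 = 7.5
mean, var, alpha_hat = moment_dispersion(simulate_nb(mu, alpha, n=100_000))
print(f"mean={mean:.2f} variance={var:.2f} alpha_hat={alpha_hat:.3f}")
```

With these settings the sample mean should be close to 3, the sample variance close to 7.5, and $\hat{\alpha}$ close to 0.5; a Poisson sample with the same mean would instead have a variance close to 3.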

The dispersion parameter of the NB model plays a central role in calculating the EB estimates. The EB method, proposed by Hauer (1992) [43], estimates the long-term mean for site $i$ as follows:

$$\mathrm{EB}_i = w_i \mu_i + (1 - w_i)\, y_i, \qquad w_i = \frac{1}{1 + \alpha \mu_i}$$

where $\mathrm{EB}_i$ = predicted number of crashes per year for site $i$ estimated by the EB method; $\mu_i$ = predicted number of crashes per year for site $i$ expected by the SPF; $w_i$ = weight factor, defined as a function of $\mu_i$ and the dispersion parameter $\alpha$; and $y_i$ = observed number of crashes per year at site $i$.
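The EB calculation itself is a weighted average, $\mathrm{EB}_i = w_i\mu_i + (1 - w_i)y_i$ with $w_i = 1/(1 + \alpha\mu_i)$. The short Python sketch below (hypothetical function names and inputs) shows how the dispersion parameter shifts the estimate: a larger $\alpha$ lowers the weight on the SPF prediction and pulls the EB estimate toward the observed count.

```python
def eb_weight(mu: float, alpha: float) -> float:
    """Weight on the SPF prediction: w = 1 / (1 + alpha * mu)."""
    return 1.0 / (1.0 + alpha * mu)


def eb_estimate(mu: float, y: float, alpha: float) -> float:
    """EB long-term mean: w * mu + (1 - w) * y."""
    w = eb_weight(mu, alpha)
    return w * mu + (1.0 - w) * y


# Hypothetical site: the SPF predicts 2.0 crashes/year, 5.0 crashes/year were observed
for alpha in (0.1, 0.5, 2.0):
    print(alpha, round(eb_weight(2.0, alpha), 3), round(eb_estimate(2.0, 5.0, alpha), 3))
```

Running it gives weights of about 0.833, 0.5, and 0.2 and EB estimates of 2.5, 3.5, and 4.4, respectively, so the same observed count is trusted more as the assumed overdispersion grows.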

##### 2.2. Generalized Negative Binomial Model Based Empirical Bayesian Method

Traditionally, NB models assume a fixed dispersion parameter (i.e., all sites share the same dispersion parameter), which is then used to calculate EB estimates. However, some recent studies have found that the dispersion parameter is related to the explanatory variables, and that the GNB model provides a better statistical fit and describes the dispersion better [25, 44]. Consequently, a varying dispersion parameter affects the EB estimates and may improve them [45]. This section describes the GNB-based EB method used to compare the EB estimates obtained from carcass removal data and reported AVCs.

When EB values are estimated with the GNB model, the weight factor depends on the functional form chosen for the dispersion parameter, because the weight becomes site specific, $w_i = 1/(1 + \alpha_i \mu_i)$. Following a previous study [46], we considered several functional forms that relate the dispersion parameter $\alpha_i$ at segment $i$ to the segment length $L_i$ (in miles) of segment $i$ through coefficients $\gamma$ to be estimated.
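To show how a site-specific dispersion parameter changes the EB weight, the Python sketch below assumes one hypothetical functional form, $\alpha_i = e^{\gamma_0} L_i^{\gamma_1}$, with made-up coefficients $\gamma_0 = 0$ and $\gamma_1 = -1$ (so that $\alpha_i = 1/L_i$). It is not necessarily one of the forms estimated in this study. Two segments with identical SPF predictions and observed counts then receive different EB estimates purely because of their lengths.

```python
import math


def dispersion(length_mi: float, gamma0: float, gamma1: float) -> float:
    """Hypothetical GNB dispersion form: alpha_i = exp(gamma0) * L_i ** gamma1."""
    return math.exp(gamma0) * length_mi ** gamma1


def eb_estimate(mu: float, y: float, alpha_i: float) -> float:
    """EB estimate with a site-specific weight w_i = 1 / (1 + alpha_i * mu_i)."""
    w = 1.0 / (1.0 + alpha_i * mu)
    return w * mu + (1.0 - w) * y


GAMMA0, GAMMA1 = 0.0, -1.0  # hypothetical coefficients

# (segment length in miles, SPF prediction mu_i, observed count y_i); all hypothetical
segments = {"short": (0.1, 1.0, 3.0), "long": (2.0, 1.0, 3.0)}

results = {}
for name, (length, mu, y) in segments.items():
    alpha_i = dispersion(length, GAMMA0, GAMMA1)
    results[name] = eb_estimate(mu, y, alpha_i)
    print(name, round(alpha_i, 3), round(results[name], 3))
```

Under these assumptions the short segment ($\alpha_i = 10$) gets an EB estimate of about 2.818, close to its observed count, while the long segment ($\alpha_i = 0.5$) gets about 1.667, closer to the SPF prediction. Under a single fixed $\alpha$, both segments would receive the same estimate.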

#### 3. Data Description and Preliminary Data Analysis

The collision dataset used in this study was collected on ten highways (I90, US2, SR8, SR20, US97, US101, US395, SR525, US12, and SR970) in Washington State. It includes reported AVC data and carcass removal data over the five-year period from 2002 to 2006 [40]. The study covers 10,475 road segments, so the sample size is 10,475. Each highway is divided into segments of varying length according to road characteristics (i.e., median width, lane width, and shoulder type). Table 1 shows the time periods covered by the three main datasets used in this study. The reported AVC dataset was collected from the traffic collision records of the Washington State Department of Transportation (WSDOT) and the Highway Safety Information System (HSIS). The carcass removal dataset was gathered from maintenance records kept by WSDOT maintenance workers.
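Segmenting a highway by road characteristics can be sketched as merging consecutive inventory records whose attributes are identical. The Python sketch below assumes a hypothetical inventory of milepost ranges with median width, lane width, and shoulder type, and starts a new segment whenever any attribute changes or the mileposts are not contiguous. The actual segmentation rules behind the dataset [40] may differ.

```python
# Each record: (begin milepost, end milepost, median width ft, lane width ft, shoulder type)
Record = tuple[float, float, int, int, str]


def homogeneous_segments(records: list[Record]) -> list[Record]:
    """Merge consecutive, contiguous records that share all road attributes."""
    segments: list[Record] = []
    for begin, end, *attrs in records:
        if segments and segments[-1][1] == begin and list(segments[-1][2:]) == attrs:
            # Same attributes and contiguous: extend the current segment
            segments[-1] = (segments[-1][0], end, *attrs)
        else:
            # An attribute changed (or there is a gap): start a new segment
            segments.append((begin, end, *attrs))
    return segments


# Hypothetical roadway inventory for one highway
inventory = [
    (0.0, 0.4, 20, 12, "paved"),
    (0.4, 1.1, 20, 12, "paved"),
    (1.1, 1.5, 20, 11, "paved"),
    (1.5, 2.3, 0, 11, "gravel"),
    (2.3, 3.0, 0, 11, "gravel"),
]

for seg in homogeneous_segments(inventory):
    print(seg, "length:", round(seg[1] - seg[0], 2), "mi")
```

The five hypothetical records collapse into three segments of 1.1, 0.4, and 1.5 miles, which is why segments of different lengths arise naturally from this kind of rule.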