Journal of Advanced Transportation
Volume 2017, Article ID 5230248, 9 pages
Research Article

Developing a Clustering-Based Empirical Bayes Analysis Method for Hotspot Identification

1The Key Laboratory of Road and Traffic Engineering, Ministry of Education, Tongji University, Shanghai 201804, China
2University of Washington, P.O. Box 352700, Seattle, WA 98195-2700, USA
3Uncertainty Decision-Making Laboratory, Sichuan University, Chengdu 610064, China
4Department of Civil and Environmental Engineering, University of Washington, Seattle, WA 98195, USA
5College of Urban Railway Transportation, Shanghai University of Engineering Science, 333 Longteng Road, Shanghai 201620, China

Correspondence should be addressed to Yanxi Hao; haoyx0316@163.com and Yichuan Peng; yichuanpeng1982@hotmail.com

Received 15 June 2017; Revised 10 October 2017; Accepted 15 October 2017; Published 22 November 2017

Academic Editor: Chunjiao Dong

Copyright © 2017 Yajie Zou et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Hotspot identification (HSID) is a critical part of network-wide safety evaluations. Typical site-ranking methods often rely on the Empirical Bayes (EB) method, which estimates safety from both the observed crash records and the crash frequency predicted from similar sites. The performance of the EB method depends strongly on the choice of reference group: the set of sites (i.e., roadway segments or intersections) similar to the target site, from which the safety performance functions (SPFs) used to predict crash frequency are developed. Because crash data often contain underlying heterogeneity that can make them appear to be generated from distinct subpopulations, methods are needed to select similar sites in a principled manner. To address this possible heterogeneity problem, EB-based HSID methods are developed that use common clustering methodologies (e.g., mixture models, k-means, and hierarchical clustering) to select "similar" sites for building SPFs. The performance of the clustering-based EB methods is then compared using real crash data. HSID results computed on Texas undivided rural highway crash data suggest that all three clustering-based EB analysis methods are preferred over the conventional statistical methods. Thus, properly classifying road segments for heterogeneous crash data can further improve HSID accuracy.
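As an illustration of how the EB method combines observed and predicted crash frequency, the following is a minimal Python sketch and not the authors' implementation. It assumes a negative binomial SPF with variance mu + alpha * mu^2, so the weight on the SPF prediction is 1 / (1 + alpha * mu). The function names `eb_estimate` and `rank_hotspots` and the numbers in the usage note are hypothetical. In a clustering-based variant, `predicted` and `dispersion` would come from the SPF fitted to each site's own cluster.

```python
import numpy as np


def eb_estimate(observed, predicted, dispersion):
    """Return the EB crash-frequency estimate for each site.

    Assumption: the SPF is negative binomial with variance
    mu + dispersion * mu**2, which gives the weight 1 / (1 + dispersion * mu).
    `dispersion` may be a scalar or a per-site array (e.g. one value per cluster).
    """
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    dispersion = np.asarray(dispersion, dtype=float)
    # Weight placed on the SPF prediction; the remainder goes to the observed count.
    weight = 1.0 / (1.0 + dispersion * predicted)
    return weight * predicted + (1.0 - weight) * observed


def rank_hotspots(observed, predicted, dispersion):
    """Return site indices sorted from highest to lowest EB estimate, plus the estimates."""
    eb = eb_estimate(observed, predicted, dispersion)
    order = np.argsort(-eb, kind="stable")
    return order, eb
```

For example, with hypothetical inputs `observed=[10, 2, 6]`, `predicted=[4.0, 3.0, 8.0]`, and `dispersion=0.5`, the weights are 1/3, 0.4, and 0.2, the EB estimates are 8.0, 2.4, and 6.4, and the sites rank in the order 0, 2, 1.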