Discrete Dynamics in Nature and Society
Volume 2015, Article ID 793010, 18 pages
http://dx.doi.org/10.1155/2015/793010
Research Article

An Efficient MapReduce-Based Parallel Clustering Algorithm for Distributed Traffic Subarea Division

¹School of Computer and Information Science, Southwest University, Chongqing 400715, China
²School of Information Engineering, Guizhou Minzu University, Guiyang 550025, China
³School of Information Technology, Deakin University, Waurn Ponds, VIC 3216, Australia

Received 21 April 2015; Revised 12 July 2015; Accepted 13 August 2015

Academic Editor: Hubertus Von Bremen

Copyright © 2015 Dawen Xia et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Traffic subarea division is vital for traffic system management and traffic network analysis in intelligent transportation systems (ITSs). Because existing methods may not be suitable for processing big traffic data, this paper presents a MapReduce-based Parallel Three-Phase K-Means (Par3PKM) algorithm for solving the traffic subarea division problem on the widely adopted Hadoop distributed computing platform. Specifically, we first modify the distance metric and initialization strategy of K-Means and then employ the MapReduce paradigm to redesign the optimized K-Means algorithm for parallel clustering of large-scale taxi trajectories. Moreover, we propose a boundary identifying method to connect the borders of the clustering results for each cluster. Finally, we apply the proposed approach to divide Beijing into traffic subareas based on real-world trajectory data sets generated by 12,000 taxis over a period of one month. Experimental results indicate that, compared with K-Means, Par2PK-Means, and ParCLARA, Par3PKM achieves higher efficiency, higher accuracy, and better scalability and can effectively divide traffic subareas using big taxi trajectory data.
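
To make the idea of expressing K-Means in a MapReduce style concrete, the following is a minimal single-process sketch of one iteration. It is an illustration under stated assumptions, not the Par3PKM implementation: it uses plain Euclidean distance and takes the initial centroids as given, whereas the paper modifies both the distance metric and the initialization strategy. The map/combine/reduce split and all function names (`map_phase`, `combine_phase`, `reduce_phase`, `kmeans_iteration`) are hypothetical choices made for this example, and the input splits stand in for the data blocks a Hadoop job would distribute across nodes.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def map_phase(points, centroids):
    """Map: emit (index of nearest centroid, point) for every input point."""
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        yield nearest, p


def combine_phase(pairs):
    """Combine: pre-aggregate coordinate sums and counts per cluster for one split."""
    sums = {}
    for idx, p in pairs:
        total, count = sums.get(idx, ([0.0] * len(p), 0))
        sums[idx] = ([t + x for t, x in zip(total, p)], count + 1)
    return sums


def reduce_phase(partials, centroids):
    """Reduce: merge the partial sums of all splits and recompute each centroid."""
    merged = {}
    for partial in partials:
        for idx, (total, count) in partial.items():
            if idx in merged:
                prev_total, prev_count = merged[idx]
                total = [a + b for a, b in zip(prev_total, total)]
                count += prev_count
            merged[idx] = (total, count)
    new_centroids = []
    for i, old in enumerate(centroids):
        if i in merged:
            total, count = merged[i]
            new_centroids.append(tuple(t / count for t in total))
        else:
            new_centroids.append(old)  # assumption: an empty cluster keeps its centroid
    return new_centroids


def kmeans_iteration(splits, centroids):
    """Run one K-Means iteration over a list of input splits."""
    partials = [combine_phase(map_phase(split, centroids)) for split in splits]
    return reduce_phase(partials, centroids)


if __name__ == "__main__":
    # Toy 2-D points standing in for trajectory samples, in two splits.
    splits = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 10.0), (10.0, 12.0)]]
    print(kmeans_iteration(splits, [(1.0, 1.0), (9.0, 9.0)]))
```

In a full run, `kmeans_iteration` would be repeated until the centroids stop moving. Aggregating sums and counts per split before the reduce step is a common way to limit the data shuffled between nodes.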