Discrete Dynamics in Nature and Society
Volume 2015 (2015), Article ID 793010, 18 pages
http://dx.doi.org/10.1155/2015/793010
Research Article

An Efficient MapReduce-Based Parallel Clustering Algorithm for Distributed Traffic Subarea Division

1 School of Computer and Information Science, Southwest University, Chongqing 400715, China
2 School of Information Engineering, Guizhou Minzu University, Guiyang 550025, China
3 School of Information Technology, Deakin University, Waurn Ponds, VIC 3216, Australia

Received 21 April 2015; Revised 12 July 2015; Accepted 13 August 2015

Academic Editor: Hubertus Von Bremen

Copyright © 2015 Dawen Xia et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. Y. Qi and S. Ishak, “Stochastic approach for short-term freeway traffic prediction during peak periods,” IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 2, pp. 660–672, 2013.
  2. A. de Palma and R. Lindsey, “Traffic congestion pricing methodologies and technologies,” Transportation Research Part C: Emerging Technologies, vol. 19, no. 6, pp. 1377–1399, 2011.
  3. V. Marx, “The big challenges of big data,” Nature, vol. 498, no. 7453, pp. 255–260, 2013.
  4. D. Agrawal, P. Bernstein, E. Bertino et al., “Challenges and opportunities with big data: a community white paper developed by leading researchers across the United States,” White Paper, 2012.
  5. “Special online collection: dealing with data,” Science, vol. 331, no. 6018, pp. 639–806, 2011.
  6. “Big data: science in the petabyte era,” Nature, vol. 455, no. 7209, pp. 1–136, 2008.
  7. R. R. Weiss and L. Zgorski, Obama Administration Unveils ‘Big Data’ Initiative: Announces $200 Million in New R&D Investments, Office of Science and Technology Policy, Executive Office of the President, 2012.
  8. J. Manyika, M. Chui, B. Brown et al., “Big data: the next frontier for innovation, competition, and productivity,” Tech. Rep., McKinsey Global Institute, 2011.
  9. Y. Genovese and S. Prentice, “Pattern-based strategy: getting value from big data,” Gartner Special Report, 2011.
  10. N. J. Yuan, Y. Zheng, L. Zhang, and X. Xie, “T-finder: a recommender system for finding passengers and vacant taxis,” IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 10, pp. 2390–2403, 2013.
  11. J. Yuan, Y. Zheng, C. Zhang et al., “T-drive: driving directions based on taxi trajectories,” in Proceedings of the 18th International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL GIS '10), pp. 99–108, ACM, San Jose, Calif, USA, November 2010.
  12. J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, Waltham, Mass, USA, 3rd edition, 2011.
  13. D. Pelleg and A. Moore, “X-means: extending K-means with efficient estimation of the number of clusters,” in Proceedings of the 17th International Conference on Machine Learning (ICML '00), pp. 727–734, Stanford, Calif, USA, June 2000.
  14. S. Kantabutra and A. L. Couch, “Parallel K-means clustering algorithm on NOWs,” NECTEC Technical Journal, vol. 1, no. 6, pp. 243–247, 2000.
  15. Y. Zhang, Z. Xiong, J. Mao, and L. Ou, “The study of parallel K-means algorithm,” in Proceedings of the 6th World Congress on Intelligent Control and Automation (WCICA '06), pp. 5868–5871, IEEE, Dalian, China, June 2006.
  16. P. Kraj, A. Sharma, N. Garge, R. Podolsky, and R. A. McIndoe, “ParaKMeans: implementation of a parallelized K-means algorithm suitable for general laboratory use,” BMC Bioinformatics, vol. 9, article 200, 13 pages, 2008.
  17. M. K. Pakhira, “Clustering large databases in distributed environment,” in Proceedings of the IEEE International Advance Computing Conference (IACC '09), pp. 351–358, Patiala, India, March 2009.
  18. K. J. Kohlhoff, V. S. Pande, and R. B. Altman, “K-means for parallel architectures using all-prefix-sum sorting and updating steps,” IEEE Transactions on Parallel and Distributed Systems, vol. 24, no. 8, pp. 1602–1612, 2013.
  19. C.-T. Chu, S. K. Kim, Y.-A. Lin et al., “Map-reduce for machine learning on multicore,” in Proceedings of the 20th Annual Conference on Neural Information Processing Systems (NIPS '06), pp. 281–288, Vancouver, Canada, 2006.
  20. W. Zhao, H. Ma, and Q. He, “Parallel K-means clustering based on MapReduce,” in Proceedings of the 1st International Conference on Cloud Computing (CloudCom '09), pp. 674–679, Beijing, China, December 2009.
  21. P. Zhou, J. Lei, and W. Ye, “Large-scale data sets clustering based on MapReduce and Hadoop,” Journal of Computational Information Systems, vol. 7, no. 16, pp. 5956–5963, 2011.
  22. C. D. Nguyen, D. T. Nguyen, and V.-H. Pham, “Parallel two-phase K-means,” in Proceedings of the 13th International Conference on Computational Science and Its Applications (ICCSA '13), pp. 24–27, Ho Chi Minh City, Vietnam, June 2013.
  23. R. J. Walinchus, “Real-time network decomposition and subnetwork interfacing,” Tech. Rep. HS-011 999, Highway Research Record, 1971.
  24. S. C. Wong, W. T. Wong, C. M. Leung, and C. O. Tong, “Group-based optimization of a time-dependent TRANSYT traffic model for area traffic control,” Transportation Research Part B: Methodological, vol. 36, no. 4, pp. 291–312, 2002.
  25. D. I. Robertson and R. D. Bretherton, “Optimizing networks of traffic signals in real time: the SCOOT method,” IEEE Transactions on Vehicular Technology, vol. 40, no. 1, pp. 11–15, 1991.
  26. Y.-Y. Ma and X.-G. Yang, “Traffic sub-area division expert system for urban traffic control,” in Proceedings of the International Conference on Intelligent Computation Technology and Automation (ICICTA '08), pp. 589–593, Hunan, China, October 2008.
  27. K. Lu, J.-M. Xu, S.-J. Zheng, and S.-M. Wang, “Research on fast dynamic division method of coordinated control subarea,” Acta Automatica Sinica, vol. 38, no. 2, pp. 279–287, 2012.
  28. K. Lu, J.-M. Xu, and S.-J. Zheng, “Correlation degree analysis of neighboring intersections and its application,” Journal of South China University of Technology, vol. 37, no. 11, pp. 37–42, 2009.
  29. H. Guo, J. Cheng, Q. Peng, C. Zhu, and Y. Mu, “Dynamic division of traffic control sub-area methods based on the similarity of adjacent intersections,” in Proceedings of the IEEE 17th International Conference on Intelligent Transportation Systems (ITSC '14), pp. 2208–2213, Qingdao, China, October 2014.
  30. C. Li, Y. Xie, H. Zhang, and X.-L. Yan, “Dynamic division about traffic control sub-area based on back propagation neural network,” in Proceedings of the 2nd International Conference on Intelligent Human-Machine Systems and Cybernetics (IHMSC '10), pp. 22–25, Nanjing, China, August 2010.
  31. Z. Zhou, S. Lin, and Y. Xi, “A fast network partition method for large-scale urban traffic networks,” Journal of Control Theory and Applications, vol. 11, no. 3, pp. 359–366, 2013.
  32. J. MacQueen, “Some methods for classification and analysis of multivariate observations,” in Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, pp. 281–297, University of California Press, Berkeley, Calif, USA, 1967.
  33. A. K. Jain, “Data clustering: 50 years beyond K-means,” Pattern Recognition Letters, vol. 31, no. 8, pp. 651–666, 2010.
  34. X. Wu, V. Kumar, J. R. Quinlan et al., “Top 10 algorithms in data mining,” Knowledge and Information Systems, vol. 14, no. 1, pp. 1–37, 2008.
  35. W. Zou, Y. Zhu, H. Chen, and X. Sui, “A clustering approach using cooperative artificial bee colony algorithm,” Discrete Dynamics in Nature and Society, vol. 2010, Article ID 459796, 16 pages, 2010.
  36. D. T. Pham, S. S. Dimov, and C. D. Nguyen, “An incremental K-means algorithm,” Proceedings of the Institution of Mechanical Engineers Part C: Journal of Mechanical Engineering Science, vol. 218, no. 7, pp. 783–794, 2004.
  37. D. T. Pham, S. S. Dimov, and C. D. Nguyen, “A two-phase K-means algorithm for large datasets,” Journal of Mechanical Engineering Science, vol. 218, no. 10, pp. 1269–1273, 2004.
  38. K. Shvachko, H. Kuang, S. Radia, and R. Chansler, “The Hadoop distributed file system,” in Proceedings of the 26th Symposium on Mass Storage Systems and Technologies (MSST '10), pp. 1–10, IEEE, Incline Village, Nev, USA, May 2010.
  39. A. Ene, S. Im, and B. Moseley, “Fast clustering using MapReduce,” in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '11), pp. 681–689, San Diego, Calif, USA, August 2011.
  40. T. White, Hadoop: The Definitive Guide, O'Reilly Media, Sebastopol, Calif, USA, 3rd edition, 2012.
  41. D. Xia, Z. Rong, Y. Zhou, Y. Li, Y. Shen, and Z. Zhang, “A novel parallel algorithm for frequent itemsets mining in massive small files datasets,” ICIC Express Letters Part B: Applications, vol. 5, no. 2, pp. 459–466, 2014.
  42. D. Xia, Z. Rong, Y. Zhou, B. Wang, Y. Li, and Z. Zhang, “Discovery and analysis of usage data based on Hadoop for personalized information access,” in Proceedings of the IEEE 16th International Conference on Computational Science and Engineering—Big Data Science and Engineering (CSE-BDSE '13), pp. 917–924, IEEE, Sydney, Australia, December 2013.
  43. D. Xia, B. Wang, Z. Rong, Y. Li, and Z. Zhang, “Effective methods and strategies for massive small files processing based on Hadoop,” ICIC Express Letters, vol. 8, no. 7, pp. 1935–1941, 2014.
  44. S. Ghemawat, H. Gobioff, and S.-T. Leung, “The Google file system,” in Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP '03), pp. 29–43, Bolton Landing, NY, USA, October 2003.
  45. J. Dean and S. Ghemawat, “MapReduce: simplified data processing on large clusters,” Communications of the ACM, vol. 51, no. 1, pp. 107–113, 2008.
  46. P. Zikopoulos, C. Eaton, D. deRoos, T. Deutsch, and G. Lapis, Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data, McGraw-Hill, New York, NY, USA, 2011.
  47. W. K. D. Pun and A. B. M. S. Ali, “Unique distance measure approach for K-means (UDMA-Km) clustering algorithm,” in Proceedings of the IEEE Region 10 Conference (TENCON '07), pp. 1–4, IEEE, Taipei, Taiwan, November 2007.
  48. A. M. Fahim, A. M. Salem, F. A. Torkey, and M. A. Ramadan, “An efficient enhanced K-means clustering algorithm,” Journal of Zhejiang University SCIENCE A, vol. 7, no. 10, pp. 1626–1633, 2006.
  49. M. Zhu, Data Mining, University of Science and Technology of China Press, 2002.
  50. L. Kaufman and P. J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis, John Wiley & Sons, 1990.
  51. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Proceedings of the 26th Annual Conference on Neural Information Processing Systems (NIPS '12), pp. 1097–1105, Lake Tahoe, Nev, USA, December 2012.
  52. S. Englert, J. Gray, T. Kocher, and P. Shah, “A benchmark of NonStop SQL release 2 demonstrating near-linear speedup and scaleup on large databases,” ACM SIGMETRICS Performance Evaluation Review, vol. 18, no. 1, pp. 245–246, 1990.
  53. X. Xu, J. Jäger, and H.-P. Kriegel, “A fast parallel clustering algorithm for large spatial databases,” Data Mining and Knowledge Discovery, vol. 3, no. 3, pp. 263–290, 1999.