Journal of Healthcare Engineering
Volume 2017, Article ID 1425102, 12 pages
https://doi.org/10.1155/2017/1425102
Research Article

Handling Data Skew in MapReduce Cluster by Using Partition Tuning

1College of Information Science and Technology, Beijing Normal University, Beijing, China
2Department of Industrial Engineering, Pusan National University, Pusan, Republic of Korea
3Cooperative Innovation Center of Internet Healthcare, Henan Province, China
4School of Information Engineering, Zhengzhou University, Zhengzhou, China
5Beijing Advanced Innovation Center for Future Education, Beijing Normal University, Beijing, China

Correspondence should be addressed to Jiacai Zhang; jiacai.zhang@bnu.edu.cn

Received 31 October 2016; Revised 2 January 2017; Accepted 19 February 2017; Published 29 March 2017

Academic Editor: Chase Wu

Copyright © 2017 Yufei Gao et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. W. Raghupathi and V. Raghupathi, “Big data analytics in healthcare: promise and potential,” Health Information Science and Systems, vol. 2, no. 1, pp. 1–10, 2014.
  2. M. D. Dikaiakos, D. Katsaros, P. Mehra, G. Pallis, and A. Vakali, “Cloud computing: distributed internet computing for IT and scientific research,” IEEE Internet Computing, vol. 13, no. 5, pp. 10–13, 2009.
  3. K. Shim, “MapReduce algorithms for big data analysis,” Proceedings of the VLDB Endowment, vol. 5, no. 12, pp. 2016–2017, 2012.
  4. Y. C. Kwon, M. Balazinska, B. Howe, and J. Rolia, “A study of skew in mapreduce applications,” in Open Cirrus Summit, IEEE, Moscow, Russia, June 2011.
  5. S. Ibrahim, H. Jin, L. Lu, B. He, G. Antoniu, and S. Wu, “Handling partitioning skew in mapreduce using leen,” Peer-to-Peer Networking and Applications, vol. 6, no. 4, pp. 409–424, 2013.
  6. Y. Xu, P. Zou, W. Qu, Z. Li, K. Li, and X. Cui, “Sampling-based partitioning in MapReduce for skewed data,” in ChinaGrid Annual Conference (ChinaGrid), IEEE, pp. 1–8, Beijing, China, 2012.
  7. S. R. Ramakrishnan, G. Swart, and A. Urmanov, “Balancing reducer skew in MapReduce workloads using progressive sampling,” in Proceedings of the Third ACM Symposium on Cloud Computing, pp. 16–2012, ACM, San Jose, California, October 14–17, 2012.
  8. J. Lin, “The curse of zipf and limits to parallelization: a look at the stragglers problem in mapreduce,” in 7th Workshop on Large-Scale Distributed Systems for Information Retrieval, ACM, Boston, MA, USA, 2009.
  9. B. Gufler, N. Augsten, A. Reiser, and A. Kemper, “Handling data skew in MapReduce,” in Proceedings of the 1st International Conference on Cloud Computing and Services Science, INSTICC, vol. 146, pp. 574–583, Noordwijkerhout, Netherlands, 2011.
  10. T. White, Hadoop: The definitive guide, O'Reilly Media/Yahoo Press, CA, USA, 2012.
  11. D. Tomar and S. Agarwal, “A survey on data mining approaches for healthcare,” International Journal of Bio-Science and Bio-Technology, vol. 5, no. 5, pp. 241–266, 2013.
  12. Y. Ji, H. Ying, J. Tran et al., “Mining infrequent causal associations in electronic healthcare databases,” in 2011 IEEE 11th International Conference on Data Mining Workshops, pp. 421–428, Vancouver, Canada, 2011.
  13. B. M. Patil, R. C. Joshi, and D. Toshniwal, “Association rule for classification of type-2 diabetic patients,” in Machine Learning and Computing (ICMLC), 2010 Second International Conference on IEEE, pp. 330–334, Minneapolis, MN, USA, 2010.
  14. U. Abdullah, J. Ahmad, and A. Ahmed, “Analysis of effectiveness of apriori algorithm in medical billing data mining,” ICET, pp. 327–331, 2008.
  15. M. J. Zaki, “Parallel and distributed association mining: a survey,” IEEE Concurrency, vol. 7, no. 4, pp. 14–25, 1999.
  16. Z. Liu, Q. Zhang, R. Boutaba, Y. Liu, and B. Wang, “OPTIMA: on-line partitioning skew mitigation for MapReduce with resource adjustment,” Journal of Network and Systems Management, vol. 24, no. 4, pp. 859–883, 2016.
  17. H. C. Yang, A. Dasdan, R. L. Hsiao, and D. S. Parker, “Map-reduce-merge: simplified relational data processing on large clusters,” in Proceedings of the 2007 ACM SIGMOD international conference on Management of data, pp. 1029–1040, Beijing, China, 2007.
  18. S. Chopra and M. R. Rao, “The partition problem,” Mathematical Programming, vol. 59, no. 1–3, pp. 87–115, 1993.
  19. “Hadoop [EB/OL],” June 2012, http://lucene.apache.org/hadoop
  20. R. Jain, D. M. Chiu, and W. R. Hawe, A Quantitative Measure of Fairness and Discrimination for Resource Allocation in Shared Computer System, Eastern Research Laboratory, Digital Equipment Corporation, Hudson, MA, 1984.
  21. F. Ahmad, S. Lee, M. Thottethodi, and T. N. Vijaykumar, Puma: Purdue Mapreduce Benchmarks Suite, Purdue University, Indiana, USA, 2012.
  22. G. Urdaneta, G. Pierre, and M. V. Steen, “Wikipedia workload analysis for decentralized hosting,” Computer Networks, vol. 53, no. 11, pp. 1830–1845, 2009.
  23. M. Hammoud, M. S. Rehman, and M. F. Sakr, “Center-of-gravity reduce task scheduling to lower mapreduce network traffic,” in Cloud Computing (CLOUD), 2012 IEEE 5th International Conference, pp. 49–58, Honolulu, HI, USA, 2012.
  24. R. Agrawal and R. Srikant, “Fast algorithms for mining association rules,” in Proceedings of the 20th International Conference Very Large Data Bases, VLDB, vol. 1215, pp. 487–499, Santiago de Chile, Chile, 1994.
  25. United States Department of Health and Human Services. Substance Abuse and Mental Health Services Administration. Center for Behavioral Health Statistics and Quality, National Survey on Drug Use and Health, 2014, ICPSR36361-v1, Inter-university Consortium for Political and Social Research [distributor], Ann Arbor, MI, 2016.
  26. E. R. Omiecinski, “Alternative interest measures for mining associations in databases,” IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 1, pp. 57–69, 2003.