Mathematical Problems in Engineering
Volume 2017, Article ID 4953280, 11 pages
https://doi.org/10.1155/2017/4953280
Research Article

A Method for Entity Resolution in High Dimensional Data Using Ensemble Classifiers

¹PLA University of Science and Technology, Nanjing, Jiangsu 210007, China
²Nanjing Telecommunication Technology Institute, Nanjing, Jiangsu 210007, China

Correspondence should be addressed to Cao Jian-jun; jianjuncao@yeah.net

Received 31 October 2016; Accepted 17 January 2017; Published 15 February 2017

Academic Editor: Yaguo Lei

Copyright © 2017 Liu Yi et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. A. K. Elmagarmid, P. G. Ipeirotis, and V. S. Verykios, “Duplicate record detection: a survey,” IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 1, pp. 1–16, 2007.
  2. P. Christen, “A survey of indexing techniques for scalable record linkage and deduplication,” IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 9, pp. 1537–1555, 2012.
  3. V. K. Borate and S. Giri, “XML duplicate detection with improved network pruning algorithm,” in Proceedings of the IEEE International Conference on Pervasive Computing (ICPC '15), pp. 1–5, IEEE, Pune, India, January 2015.
  4. E. K. Rezig, E. C. Dragut, M. Ouzzani, and A. K. Elmagarmid, “Query-time record linkage and fusion over Web databases,” in Proceedings of the 31st IEEE International Conference on Data Engineering (ICDE '15), pp. 42–53, Seoul, Korea, April 2015.
  5. O. Peled, M. Fire, L. Rokach, and Y. Elovici, “Entity matching in online social networks,” in Proceedings of the IEEE International Conference on Social Computing, vol. 10, no. 1, Washington, DC, USA, 2013.
  6. Z.-H. Zhou, N. V. Chawla, Y. Jin, and G. J. Williams, “Big data opportunities and challenges: discussions from data analytics perspectives,” IEEE Computational Intelligence Magazine, vol. 9, no. 4, pp. 62–74, 2014.
  7. J. Wang, T. Kraska, M. J. Franklin, and J. Feng, “CrowdER: crowdsourcing entity resolution,” in Proceedings of the VLDB Endowment, vol. 5, no. 11, 2012.
  8. S. E. Whang, P. Lofgren, and H. Garcia-Molina, “Question selection for crowd entity resolution,” in Proceedings of the VLDB Endowment, vol. 6, no. 6, pp. 349–360, Trento, Italy, August 2013.
  9. A. Abboura, S. Sahrl, M. Ouziri, and S. Benbernou, “CrowdMD: crowdsourcing-based approach for deduplication,” in Proceedings of the 3rd IEEE International Conference on Big Data (IEEE Big Data '15), pp. 2621–2627, Santa Clara, Calif, USA, November 2015.
  10. C. Zhang, R. Meng, L. Chen, and F. Zhu, “CrowdLink: an error-tolerant model for linking complex records,” in Proceedings of the Second International Workshop on Exploratory Search in Databases and the Web, pp. 15–20, ACM, Melbourne, VIC, Australia, May 2015.
  11. P. A. Priya, S. Prabhakar, and S. Vasavi, “Entity resolution for high velocity streams using semantic measures,” in Proceedings of the 5th IEEE International Advance Computing Conference (IACC '15), pp. 35–40, IEEE, Bangalore, India, June 2015.
  12. S. Fries, B. Boden, G. Stepien, and T. Seidl, “PHiDJ: parallel similarity self-join for high-dimensional vector data with MapReduce,” in Proceedings of the 30th IEEE International Conference on Data Engineering (ICDE '14), pp. 796–807, Chicago, Ill, USA, April 2014.
  13. A. Abouzeid, K. Bajda-Pawlikowski, D. Abadi, A. Silberschatz, and A. Rasin, “HadoopDB: an architectural hybrid of MapReduce and DBMS technologies for analytical workloads,” Proceedings of the VLDB Endowment, vol. 2, no. 1, pp. 922–933, 2009.
  14. J. Dittrich, J. A. Q. Ruiz, A. Jindal, Y. Kargin, V. Setty, and J. Schad, “Hadoop++: making a yellow elephant run like a cheetah (without it even noticing),” Proceedings of the VLDB Endowment, vol. 3, no. 12, pp. 518–529, 2010.
  15. M. Alexandrov, V. Heimel, V. Markl et al., “Massively parallel data analysis with PACTs on nephele,” in Proceedings of the VLDB Endowment, vol. 3, no. 1, pp. 1625–1628, 2010.
  16. F. Masulli and S. Rovetta, “Clustering high-dimensional data,” in Proceedings of the 1st International Workshop (CHDD'12), pp. 1–13, 2012.
  17. G. Cheng, D. Xu, and Y. Qu, “C3D+P: a summarization method for interactive entity resolution,” Web Semantics: Science, Services and Agents on the World Wide Web, vol. 35, pp. 203–213, 2015.
  18. M. Song, E. H.-J. Kim, and H. J. Kim, “Exploring author name disambiguation on PubMed-scale,” Journal of Informetrics, vol. 9, no. 4, pp. 924–941, 2015.
  19. J. Guerreiro, D. Gonçalves, and D. M. D. Matos, “Towards a fair comparison between name disambiguation approaches,” in Proceedings of the International Conference in the Riao Series: Open Research Areas in Information Retrieval (OAIR '13), pp. 17–20, Lisbon, Portugal, May 2013.
  20. P. Treeratpituk and C. L. Giles, “Disambiguating authors in academic publications using random forests,” in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL '09), pp. 39–48, Austin, Tex, USA, June 2009.
  21. J.-J. Cao, X.-C. Diao, Y. Du, F.-X. Wang, and X.-Y. Zhang, “Classification detection of approximately duplicate records based on feature selection using ant colony algorithm,” Acta Armamentarii, vol. 31, no. 9, pp. 1222–1227, 2010.
  22. C. Kacfah Emani, N. Cullot, and C. Nicolle, “Understandable big data: a survey,” Computer Science Review, vol. 17, pp. 70–81, 2015.
  23. I. P. Fellegi and A. B. Sunter, “A theory for record linkage,” Journal of the American Statistical Association, vol. 64, no. 328, pp. 1183–1210, 1969.
  24. F. Naumann and M. Herschel, “An introduction to duplicate detection,” in Synthesis Lectures on Data Management, vol. 2, no. 1, 2010.
  25. H. Yu and J. Ni, “An improved ensemble learning method for classifying high-dimensional and imbalanced biomedicine data,” IEEE/ACM Transactions on Computational Biology & Bioinformatics, vol. 11, no. 4, pp. 657–666, 2014.
  26. K.-Q. Li, X.-C. Diao, J.-J. Cao, and F. Li, “High precision method for text feature selection based on improved ant colony optimization algorithm,” Journal of PLA University of Science & Technology, vol. 11, no. 6, pp. 634–639, 2010.
  27. D. H. Wolpert, “The supervised learning no-free-lunch theorems,” in Proceedings of the World Conference on Soft Computing, pp. 25–42, 2002.
  28. B. Krawczyk and G. Schaefer, “Breast thermogram analysis using classifier ensembles and image symmetry features,” IEEE Systems Journal, vol. 8, no. 3, pp. 921–928, 2014.
  29. B. Li, J. Li, K. Tang, and X. Yao, “Many-objective evolutionary algorithms,” ACM Computing Surveys, vol. 48, no. 1, pp. 1–35, 2015.
  30. E. Zitzler, L. Thiele, M. Laumanns, C. M. Fonseca, and V. G. Da Fonseca, “Performance assessment of multiobjective optimizers: an analysis and review,” IEEE Transactions on Evolutionary Computation, vol. 7, no. 2, pp. 117–132, 2003.
  31. L. Ke, Q. Zhang, and R. Battiti, “Using ACO in MOEA/D for Multiobjective Combinatorial Optimization,” http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.720.644.
  32. M. López-Ibáñez and T. Stützle, “The impact of design choices of multiobjective ant colony optimization algorithms on performance: an experimental study on the biobjective TSP,” in Proceedings of the 12th Annual Genetic and Evolutionary Computation Conference (GECCO '10), pp. 713–720, Portland, Ore, USA, July 2010.
  33. J.-J. Cao, P.-L. Zhang, Y.-X. Wang, G.-Q. Ren, and J.-P. Fu, “Graph-based ant system for subset problems,” Journal of System Simulation, vol. 20, no. 22, pp. 6146–6150, 2008.
  34. M. López-Ibáñez and T. Stützle, “The automatic design of multiobjective ant colony optimization algorithms,” IEEE Transactions on Evolutionary Computation, vol. 16, no. 6, pp. 861–875, 2012.