Security and Communication Networks
Volume 2017, Article ID 6841216, 12 pages
https://doi.org/10.1155/2017/6841216
Research Article

Efficient Two-Step Protocol and Its Discriminative Feature Selections in Secure Similar Document Detection

1 Department of Computer Science, Kangwon National University, 1 Kangwondaehak-gil, Chuncheon-si, Gangwon 24341, Republic of Korea
2 Electronics and Telecommunications Research Institute, 218 Gajeong-ro, Yuseong-gu, Daejeon 34129, Republic of Korea

Correspondence should be addressed to Yang-Sae Moon; ysmoon@kangwon.ac.kr

Received 27 July 2016; Revised 31 January 2017; Accepted 6 February 2017; Published 28 March 2017

Academic Editor: Kai Rannenberg

Copyright © 2017 Sang-Pil Kim et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. D. Sorokina, J. Gehrke, S. Warner, and P. Ginsparg, “Plagiarism detection in arXiv,” in Proceedings of the 6th IEEE International Conference on Data Mining, pp. 1070–1075, Hong Kong, December 2006.
  2. W. Jiang, M. Murugesan, C. Clifton, and L. Si, “Similar document detection with limited information disclosure,” in Proceedings of the IEEE 24th International Conference on Data Engineering (ICDE '08), pp. 735–743, Cancun, Mexico, April 2008.
  3. A. Stavrianou, P. Andritsos, and N. Nicoloyannis, “Overview and semantic issues of text mining,” SIGMOD Record, vol. 36, no. 3, pp. 23–34, 2007.
  4. C. C. Aggarwal and P. S. Yu, “Privacy-preserving data mining: a survey,” Handbook of Database Security: Applications and Trends, pp. 431–460, 2008.
  5. R. Agrawal and R. Srikant, “Privacy-preserving data mining,” SIGMOD Record (ACM Special Interest Group on Management of Data), vol. 29, no. 2, pp. 439–450, 2000.
  6. P. R. Bhaladhare and D. C. Jinwala, “Novel approaches for privacy preserving data mining in k-anonymity model,” Journal of Information Science and Engineering, vol. 32, no. 1, pp. 63–78, 2016.
  7. S. Buyrukbilen and S. Bakiras, “Secure similar document detection with simhash,” in Proceedings of the Workshop on VLDB-Secure Data Management (SDM '13), pp. 61–75, Trento, Italy, August 2013.
  8. Y. Peng, G. Kou, Y. Shi, and Z. Chen, “Privacy-preserving data mining for medical data: application of data partition methods,” in Communications and Discoveries from Multidisciplinary Data, vol. 123 of Studies in Computational Intelligence, pp. 331–340, Springer, Berlin, Germany, 2008.
  9. C. Clifton, M. Kantarcioglu, J. Vaidya, X. Lin, and M. Y. Zhu, “Tools for privacy preserving distributed data mining,” ACM SIGKDD Explorations Newsletter, vol. 4, no. 2, pp. 28–34, 2002.
  10. B. Pinkas, “Cryptography techniques for privacy-preserving data mining,” SIGKDD Explorations, vol. 4, no. 2, pp. 12–19, 2002.
  11. E. Bingham and H. Mannila, “Random projection in dimensionality reduction: applications to image and text data,” in Proceedings of the 7th International Conference on Knowledge Discovery and Data Mining, pp. 245–250, ACM SIGKDD, San Francisco, Calif, USA, August 2001.
  12. D. Cai, X. He, and J. Han, “SRDA: an efficient algorithm for large-scale discriminant analysis,” IEEE Transactions on Knowledge and Data Engineering, vol. 20, no. 1, pp. 1–12, 2008.
  13. E. Bertino, D. Lin, and W. Jiang, “A survey of quantification of privacy preserving data mining algorithms,” in Privacy-Preserving Data Mining, C. C. Aggarwal and P. S. Yu, Eds., vol. 34, pp. 183–205, Kluwer Academic, Norwell, Mass, USA, 2008.
  14. Y.-S. Moon, H.-S. Kim, S.-P. Kim, and E. Bertino, “Publishing time-series data under preservation of privacy and distance orders,” in Proceedings of the 21st International Conference on Database and Expert Systems Applications, Part II, pp. 17–31, Bilbao, Spain, August 2010.
  15. H.-S. Won, S.-P. Kim, S. Lee, M.-J. Choi, and Y.-S. Moon, “Secure principal component analysis in multiple distributed nodes,” Security and Communication Networks, vol. 9, no. 14, pp. 2348–2358, 2016.
  16. M. Shah and H. D. Joshi, “Privacy preserving data mining techniques in a distributed environment,” International Journal of Computer Applications, vol. 94, no. 6, pp. 21–27, 2014.
  17. W. Jiang and B. K. Samanthula, “N-gram based secure similar document detection,” in Proceedings of the IFIP Annual Conference on Data and Applications Security and Privacy, pp. 239–246, Richmond, Va, USA, July 2011.
  18. B. Goethals, S. Laur, H. Lipmaa, and T. Mielikäinen, “On secure scalar product computation for privacy-preserving data mining,” in Proceedings of the 7th Annual International Conference in Information Security & Cryptology, pp. 104–120, Seoul, Republic of Korea, December 2004.
  19. S. Berchtold, C. Böhm, and H.-P. Kriegel, “The pyramid-technique,” ACM SIGMOD Record, vol. 27, no. 2, pp. 142–153, 1998.
  20. Y.-S. Moon, B.-S. Kim, M. S. Kim, and K.-Y. Whang, “Scaling-invariant boundary image matching using time-series matching techniques,” Data and Knowledge Engineering, vol. 69, no. 10, pp. 1022–1042, 2010.
  21. Y.-S. Moon and W.-K. Loh, “Triangular inequality-based rotation-invariant boundary image matching for smart devices,” Multimedia Systems, vol. 21, no. 1, pp. 15–28, 2014.
  22. F. Sebastiani, “Machine learning in automated text categorization,” ACM Computing Surveys, vol. 34, no. 1, pp. 1–47, 2002.
  23. Y. Yang and J. O. Pedersen, “A comparative study on feature selection in text categorization,” in Proceedings of the 14th International Conference on Machine Learning (ICML '97), pp. 412–420, Nashville, Tenn, USA, July 1997.
  24. Y.-S. Moon and B. S. Lee, “Safe MBR-transformation in similar sequence matching,” Information Sciences, vol. 270, pp. 28–40, 2014.
  25. W. Han, J. Lee, Y. Moon, S. Hwang, and H. Yu, “A new approach for processing ranked subsequence matching based on ranked union,” in Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD '11), pp. 457–468, Athens, Greece, June 2011.
  26. Bag of Words Data Sets, UCI Machine Learning Repository, https://archive.ics.uci.edu/ml/datasets/Bag+of+Words.