Computational Intelligence and Neuroscience
Volume 2015, Article ID 217216, 13 pages
Research Article

MapReduce Based Personalized Locality Sensitive Hashing for Similarity Joins on Large Scale Data

1School of Information Science and Technology, Xiamen University, Xiamen 361005, China
2Shenzhen Research Institute of Xiamen University, Shenzhen 518058, China

Received 28 September 2014; Revised 24 February 2015; Accepted 2 March 2015

Academic Editor: J. Alfredo Hernandez

Copyright © 2015 Jingjing Wang and Chen Lin. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Locality Sensitive Hashing (LSH) has been proposed as an efficient technique for similarity joins on high-dimensional data. The efficiency and approximation rate of LSH depend on the number of false positive and false negative instances it generates. In many domains, reducing the number of false positives is crucial. Furthermore, in some application scenarios, balancing false positives and false negatives is preferred. To address these problems, we propose Personalized Locality Sensitive Hashing (PLSH), in which a new banding scheme is embedded to tailor the number of false positives, the number of false negatives, and the sum of both. PLSH is implemented in parallel using the MapReduce framework to handle similarity joins on large-scale data. Experimental studies on real and simulated data verify the efficiency and effectiveness of the proposed PLSH technique compared with state-of-the-art methods.
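
To make the trade-off between false positives and false negatives concrete, the following minimal sketch evaluates the standard MinHash banding scheme, not the personalized scheme proposed in this paper. It assumes signatures split into `bands` bands of `rows` rows each, so that a pair with similarity `s` becomes a candidate with probability 1 - (1 - s^rows)^bands. The function names, the threshold of 0.7, and the parameter settings are illustrative assumptions only.

```python
def candidate_probability(s: float, rows: int, bands: int) -> float:
    """Probability that a pair with similarity s agrees on at least one band
    under the standard banding scheme (an assumption, not PLSH itself)."""
    return 1.0 - (1.0 - s ** rows) ** bands


def error_probabilities(s: float, threshold: float, rows: int, bands: int):
    """Return (false_positive_prob, false_negative_prob) for one pair.

    A pair below the threshold that becomes a candidate is a false positive;
    a pair at or above the threshold that is missed is a false negative.
    """
    p = candidate_probability(s, rows, bands)
    if s < threshold:
        return p, 0.0
    return 0.0, 1.0 - p


if __name__ == "__main__":
    threshold = 0.7  # hypothetical similarity threshold
    # Two hypothetical settings using the same 100 hash functions.
    for rows, bands in ((5, 20), (10, 10)):
        print(f"rows={rows}, bands={bands}")
        for s in (0.3, 0.5, 0.7, 0.9):
            fp, fn = error_probabilities(s, threshold, rows, bands)
            print(f"  s={s:.1f}  P(false positive)={fp:.4f}  P(false negative)={fn:.4f}")
```

With the number of hash functions held fixed, using more rows per band lowers the candidate probability at every similarity level, which reduces false positives but increases false negatives. This is the kind of tension a tailored banding scheme has to resolve.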