Scientific Programming
Volume 2017, Article ID 3072813, 9 pages
https://doi.org/10.1155/2017/3072813
Research Article

Cross-Checking Multiple Data Sources Using Multiway Join in MapReduce

National Technical University of Athens, Athens, Greece

Correspondence should be addressed to Zaid Momani; zed_momany@yahoo.com

Received 27 May 2017; Revised 30 August 2017; Accepted 27 September 2017; Published 20 November 2017

Academic Editor: Marco Aldinucci

Copyright © 2017 Foto Afrati et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

As data sources accumulate information and data sizes grow, it becomes increasingly difficult to maintain the correctness and validity of these datasets. Tools are therefore needed to facilitate this daunting task. Fact checking usually involves a large number of data sources that describe the same thing, but we do not know in advance which source holds the correct information, or which holds any information at all about the query of interest. A join among all or some of the data sources can guide the fact-checking process. However, when this join is performed in a distributed computational environment such as MapReduce, it is not obvious how to efficiently distribute the records of the data sources to the reduce tasks so that any subset of them can be joined in a single MapReduce job. To this end, we propose an efficient approach that uses the multiway join to cross-check these data sources in a single round.
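To make the single-round idea concrete, the following is a minimal in-process sketch, not the authors' algorithm. It assumes that every source stores (key, value) records, that all sources are joined on the same key, and that each source holds at most one record per key. The function names (`map_phase`, `shuffle`, `reduce_phase`, `cross_check`) and the hash-based partitioning are hypothetical choices made for illustration only.

```python
from collections import defaultdict


def map_phase(sources, num_reducers):
    """Emit (reducer_id, (source_name, key, value)) for every record."""
    for source_name, records in sources.items():
        for key, value in records:
            # Assumed partitioning: equal keys always land on the same reducer.
            yield hash(key) % num_reducers, (source_name, key, value)


def shuffle(pairs):
    """Group the mapper output by reducer id, as the framework would."""
    buckets = defaultdict(list)
    for reducer_id, payload in pairs:
        buckets[reducer_id].append(payload)
    return buckets


def reduce_phase(bucket):
    """Join the records that share a key and flag whether the sources agree."""
    by_key = defaultdict(dict)
    for source_name, key, value in bucket:
        # Assumes at most one record per key in each source.
        by_key[key][source_name] = value
    for key, claims in by_key.items():
        yield key, claims, len(set(claims.values())) == 1


def cross_check(sources, num_reducers=4):
    """Run one simulated map/shuffle/reduce round over all sources."""
    report = {}
    for bucket in shuffle(map_phase(sources, num_reducers)).values():
        for key, claims, agree in reduce_phase(bucket):
            report[key] = (claims, agree)
    return report
```

Calling `cross_check` with a dictionary that maps each source name to its list of records returns, for every key, the value claimed by each source that mentions it and a flag that is true only when all of those sources agree. Sources with no record for a key are simply absent from that key's entry, which corresponds to the case of a source holding no information about the query.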