The Scientific World Journal
Volume 2013, Article ID 795408, 11 pages
Research Article

Large Scale Near-Duplicate Celebrity Web Images Retrieval Using Visual and Textual Features

College of Information Systems and Management, National University of Defense Technology, Changsha 410073, China

Received 10 July 2013; Accepted 2 August 2013

Academic Editors: J. T. Fernandez-Breis, S.-S. Liaw, and J. H. Sossa

Copyright © 2013 Fengcai Qiao et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Near-duplicate image retrieval is a classical research problem in computer vision with many applications, such as image annotation and content-based image retrieval. On the web, near-duplication is especially prevalent in queries for celebrities and historical figures, which are of particular interest to end users. Existing methods such as bag-of-visual-words (BoVW) address this problem mainly by exploiting purely visual features. To overcome the limitation of relying on visual features alone, this paper proposes a novel text-based, data-driven reranking framework that utilizes textual features and is combined with state-of-the-art BoVW schemes. Under this framework, the input to the retrieval procedure is still only a query image. To verify the proposed approach, we construct a dataset of 2 million images of 1089 different celebrities together with their accompanying texts. In addition, we comprehensively analyze the different categories of near-duplication observed in the constructed dataset. Experimental results on this dataset show that the proposed framework achieves a higher mean average precision (mAP), with an improvement of 21% on average over approaches based only on visual features, without notably prolonging the retrieval time.
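For readers unfamiliar with the evaluation metric, the following is a minimal sketch of how mean average precision is commonly computed over a set of queries. It is not taken from the paper: the function names `average_precision` and `mean_average_precision`, the use of string image identifiers, and the convention of normalizing by the total number of relevant images are all assumptions made for illustration.

```python
from typing import List, Sequence, Set


def average_precision(ranked_ids: Sequence[str], relevant_ids: Set[str]) -> float:
    """Average precision for one query (hypothetical helper, not from the paper).

    ranked_ids: retrieved image ids, best match first.
    relevant_ids: ground-truth near-duplicates of the query image.
    """
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, image_id in enumerate(ranked_ids, start=1):
        if image_id in relevant_ids:
            hits += 1
            # Precision at the rank where a relevant image is found.
            precision_sum += hits / rank
    # Assumed convention: normalize by the number of relevant images,
    # so relevant images that are never retrieved lower the score.
    return precision_sum / len(relevant_ids)


def mean_average_precision(
    rankings: List[Sequence[str]], relevants: List[Set[str]]
) -> float:
    """Mean of the per-query average precision values."""
    aps = [average_precision(r, g) for r, g in zip(rankings, relevants)]
    return sum(aps) / len(aps) if aps else 0.0


if __name__ == "__main__":
    # Two toy queries with made-up ids, only to show the call pattern.
    rankings = [["a", "b", "c", "d"], ["x", "a"]]
    relevants = [{"a", "c"}, {"a"}]
    print(mean_average_precision(rankings, relevants))
```

Under this convention, a reranking step that moves true near-duplicates toward the top of the list raises the precision at each hit and therefore the mAP, which is the kind of effect the reported improvement measures.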