Review Article

Bag-of-Words Representation in Image Annotation: A Review

Table 2

Comparisons of datasets used and annotation performance.

Work | Categories (Scene, Object) | Dataset | No. of categories | No. of images | Baseline

2012

de Campos et al. [70] | v | PASCAL′07/′08¹⁶ | 20 | 9292 |
Elfiky et al. [97] | vv | Sport event/15 scene/butterflies¹⁷/PASCAL′07/′09 | 15/20 | 6000/21000/2000/160k/4194k | Spatial pyramid
Fernando et al. [68] | v | PASCAL′06/Caltech 101¹⁸ | 10/10/11 | 5304/3044 | BoW
Gavves et al. [77] | v | Oxford 5k¹⁹ | 11 | 5062 |
Kesorn and Poslad [80] | v | Olympic organization website + Google images | 8 | 16000 | pLSA
Lee and Grauman [103] | vv | MSRC-v0²⁰/-v2/PASCAL′08/Corel/Gould′09 | 21/20/7/14 | 3457/591/1023/100/715 | LDA
Qin and Yung [64] | v | SCENE-8/-15 | 8/15 | 2688/4485 | BoW
Romberg et al. [102] | vv | Flickr-10M | >300 | 10080251 | pLSA
Shang and Xiao [99] | v | Caltech 256/MSRC | 20/20 |  | BoW
Stottinger et al. [104] |  | PASCAL′07 | 20 | 9963 |
Tong et al. [100] | vv | Tattoo dataset/Oxford/Flickr |  | 101745/5062/1002805 | RS²¹/HKM/AKM

2011

Hare et al. [73] | vv | UK Bench/MIR Flickr-25000²² |  |  | BoW
López-Sastre et al. [78] | v | Caltech 101 | 10 | 890 | Mikolajczyk et al. [25]; Stark and Schiele [119]
Luo et al. [18] | v | Caltech 4/Graz-02²³ | 5/2 | 400/200 | Li and Perona [31]; Moosmann et al. [72]
Van Gemert [65] | vv | Corel/PASCAL′09 | 20 | 2000/7054 | BoW/spatial pyramid
Yang et al. [37] | v | PASCAL′08 | 20 | 8445 | Divvala et al. [120]; Zhong et al. [109]
Zhang et al. [76] | vv | Google images/Caltech 101 and 256 | 15 | 376500 | BoW
Zhang et al. [38] | vv | ImageNet²⁴ | 15 queries | 1.5 million | Nister and Stewenius [121]; Zhong et al. [61]

2010

Bae and Juang [79] | v | Corel | 15 | 20000 | LSA
Chen et al. [62] | v | Oxford buildings/Flickr 1k | 11 (55 queries)/7 (56 queries) | 5062/11282 | Sivic and Zisserman [14]; Philbin et al. [45]; Lazebnik et al. [48]
Cheng and Wang [82] | v | 6-scene dataset | 6 | 700 | Vogel and Schiele [122]; Bosch et al. [98]; Quelhas et al. [13]; Boutell et al. [123]
Ding et al. [105] | v | TRECVID′06²⁵ | 20 | 61901 | Binary/TF/TF-IDF weighting
Jégou et al. [22] | vv | Holidays²⁶/Oxford 5k/U. of Kentucky object recognition²⁷ | 500/11 (55 queries) | 1491/5062/6376 | BoW by HE²⁸/
Jiang et al. [17] | v | TRECVID′06 | 20 | 79484 |
Li and Godli [87] | vv | Corel | 50 | 5000 | Duygulu et al. [118]; Jeon et al. [124]; Lavrenko et al. [125]; Monay and Gatica-Perez, 2007 [126]
Qin and Yung [106] | vv |  | 8/13/15 | 2688/3759/4485 | Siagian and Itti [127, 128]; Bosch et al. [29]; Li and Perona [31]; Quelhas et al. [60]; Lazebnik et al. [48]
Tirilly et al. [107] | vv | U. of Kentucky object recognition/Oxford 5k/Caltech 6 & 101 | 300/55/200 queries | 10200/5062/8197 | TF-IDF weighting
Uijlings et al. [33] | v | PASCAL′07/TRECVID′05/Caltech 101 | 20/101/15 | 9963/12914/4485 | BoW
Wu et al. [69] | v | LabelMe²⁹/PASCAL′06 | 495/10 |  | BoW; Bar-Hillel et al. [129]; Davis et al. [130]; Goldberger et al. [131]; Perronnin et al. [113]; Weinberger et al. [132]

2009

Chen et al. [39] | v | LabelMe | 8 (448 queries) | 2689 | Yang et al. [133]
Lu and Ip (a) [41] | v | LabelMe + Web images | 3 | 1239 | k-NN; LDA
Lu and Ip (b) [42] | vv | Corel/histological images | 10/5 | 1000 | pLSA/SVM
S. Kim and D. Kim [40] | vv | Corel/histological images | 10/5 | 1000 | LLP/GLP/SVM/pLSA
Uijlings et al. [43] | v | PASCAL′07 | 20 | 9963 | BoW
Xiang et al. [108] | v | Corel/TRECVID′05 | 50/39 | 5000 | Feng et al. [134]
Zhang et al. [94] | vv | Google images/Corel/Caltech 101 and 256 | 1506 queries/50/15 | 376500/500/2250 | BoW

2008

Bosch et al. [98] | v | 6-/8-/13-/15-scene | 6/8/13/15 | 2688/702 | BoW
Liu et al. [96] | v | PASCAL′06/Caltech 4/MSRC-v2 | 20/5/15 |  | Savarese et al. [135]
Marszalek and Schmid [109] | vv | Caltech 256 | 256 |  | Lazebnik et al. [48]; Zhang et al. [35]
Rasiwasia and Vasconcelos [66] | v | 15-natural scene/Corel | 15/50 |  | Bosch et al. [29]; Lazebnik et al. [48]; Li and Perona [31]; Liu and Shah [136]
Tirilly et al. [81] | v | Caltech 6 and 101 | 6/101 | 5435/8697 | SVM
Van de Sande et al. [110] | vv | PASCAL′07/TRECVID′05 | 20 |  |
Zheng et al. [71] | v | Caltech 101/PASCAL′05 | 12/4 |  | BoW

2007

Bosch et al. [24] | v | Corel | 6 | 700 | Global and block-based features + k-NN; Vogel and Schiele [122]
Chum et al. [52] | vv | Oxford + Flickr |  | 104844 | BoW
Gökalp and Aksoy [28] | v | LabelMe | 7 | 1050 | Bag of individual regions/bag of region pairs
Hörster and Lienhart [21] | v | Flickr | 12 (60 queries) | 246348 | BoW/color based BoW
Jégou et al. [74] | vv | Object recognition benchmark³⁰ |  | 10200 | Object recognition benchmark
Li and Fei-Fei [111] | v | 8 events | 8 | 240 | LDA
Lienhart and Slaney [93] | v | Flickr | 12 (60 queries) | 253460 | LSA
Philbin et al. [45] | vv | Oxford 5k/Flickr 1 and 2 | 11/145 and 450 tags | 5062/99782/1040801 | BoW
Quelhas et al. [13] | v | Corel + Web images | 5 | 6680/3805/9457/6364 | BoW; Vailaya et al. [137]
Wu et al. [46] | vv | Caltech 7/Corel | 8/6 | 600 | LDA/pLSA
Yuan et al. [112] | v | Caltech 101 | 2 | 558 | BoW

2006

Agarwal and Triggs [47] | v | Caltech 7 + Graz/KTH-TIPS³¹/Cal-IPNP³² | 4/10/2 | 1337/810/360 | LDA
Bosch et al. [29] | v | 6-/8-/13-scene | 6/8/13 | 2688/702/1071 | BoW
Lazebnik et al. [48] | vv | 15-scene/Caltech 101/Graz | 15/101/2 |  | Zhang et al. [138]; Opelt et al. [139]
Marszalek and Schmid [49] | v | PASCAL′05 |  |  | Wang et al. [20]
Monay et al. [50] | v | Corel | 4 | 6600 |
Moosmann et al. [72] | v | Graz-02/PASCAL′05 | 3/4 |  | BoW
Perronnin et al. [113] | vv | Corel | 10 | 1000 | BoW; Farquhar et al. [140]; Deselaers et al. [141]

¹⁶ http://pascallin.ecs.soton.ac.uk/challenges/VOC/.
¹⁷ http://www.comp.leeds.ac.uk/scs6jwks/dataset/leedsbutterfly/.
¹⁸ http://www.vision.caltech.edu/Image_Datasets/Caltech101/.
¹⁹ http://www.robots.ox.ac.uk/~vgg/data/oxbuildings/.
²⁰ http://www.cs.utexas.edu/~grauman/research/datasets.html.
²¹ Random seed [142].
²² http://press.liacs.nl/mirflickr/.
²³ http://lear.inrialpes.fr/people/marszalek/data/ig02/.
²⁴ http://www.image-net.org/.
²⁵ http://www-nlpir.nist.gov/projects/tv2006/tv2006.html.
²⁶ http://lear.inrialpes.fr/~jegou/data.php.
²⁷ http://vis.uky.edu/.
²⁸ Hamming embedding.
²⁹ http://labelme.csail.mit.edu/.
³⁰ http://vis.uky.edu/%7Estewe/ukbench/.
³¹ http://www.nada.kth.se/cvap/databases/kth-tips/.
³² http://crl.ucsd.edu/.