Research Article

Deep Visual Semantic Embedding with Text Data Augmentation and Word Embedding Initialization

Table 3

Experimental results with text data augmentation on Flickr8k.

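| Model | Feature(s) | Image count R@1 | Image count R@5 | Image count R@10 | Image count Med r | Image retrieval R@1 | Image retrieval R@5 | Image retrieval R@10 | Image retrieval Med r |
|---|---|---|---|---|---|---|---|---|---|
| DeViSE [13] | Word2Vec | 4.8 | 16.5 | 27.3 | 28 | 5.9 | 20.1 | 29.6 | 29 |
| DeFrag [14] | R-CNN | 12.6 | 32.9 | 44.0 | 14 | 9.7 | 29.6 | 42.5 | 15 |
| VSA [16] | R-CNN + BRNN | 16.5 | 40.6 | 54.2 | 7.6 | 11.8 | 32.1 | 44.7 | 12.4 |
| UVSE [18] | ConvNet + LSTM | 13.5 | 36.2 | 45.7 | 13 | 10.4 | 31 | 43.7 | 14 |
| UVSE (VGG) [19] | VGG + LSTM | 18.0 | 40.9 | 55.0 | 8 | 12.5 | 37.0 | 51.5 | 10 |
| VSE++ [21] | VGG + GRU + HNM | 16.3 | 37.7 | 52.5 | 9 | 12 | 33.3 | 48.1 | 11 |
| Ours | Aug | 20.9 | 44.1 | 58.8 | 7 | 14.5 | 39 | 51.2 | 10 |
| Ours | Aug + Word2Vec | 21.5 | 49.1 | 62.3 | 6 | 15.1 | 38.9 | 53.1 | 9 |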
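
For readers unfamiliar with the metrics in Table 3, the following is a minimal sketch of how R@K and Med r are commonly computed in cross-modal retrieval. It is an illustration under stated assumptions, not the authors' evaluation code: the function names `ranks_from_scores` and `retrieval_metrics`, the toy similarity matrix, and the convention of taking the best-ranked correct item per query are all assumptions made here. R@K is the percentage of queries whose correct item appears in the top K results (higher is better), and Med r is the median rank of the correct item (lower is better).

```python
from statistics import median


def ranks_from_scores(scores, correct):
    """Return the 1-based rank of the best-ranked correct candidate per query.

    scores[q][i] is the similarity between query q and candidate i.
    correct[q] is the set of candidate indices that count as correct for q.
    (Both structures are hypothetical, chosen for illustration.)
    """
    ranks = []
    for row, gold in zip(scores, correct):
        # Sort candidate indices by descending similarity.
        order = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        # Assumption: a query is scored by its highest-ranked correct candidate.
        ranks.append(1 + min(order.index(g) for g in gold))
    return ranks


def retrieval_metrics(ranks, ks=(1, 5, 10)):
    """Compute R@K (as a percentage) for each K, plus the median rank."""
    n = len(ranks)
    recalls = {k: 100.0 * sum(r <= k for r in ranks) / n for k in ks}
    return recalls, median(ranks)


# Toy example: 3 queries, 3 candidates each (values are made up).
toy_scores = [
    [0.9, 0.1, 0.3],
    [0.2, 0.8, 0.5],
    [0.6, 0.7, 0.1],
]
toy_correct = [{0}, {2}, {0}]

toy_ranks = ranks_from_scores(toy_scores, toy_correct)
toy_recalls, toy_med_r = retrieval_metrics(toy_ranks)
```

In this toy case only the first query ranks its correct candidate first, so R@1 is one in three queries, while the median rank is 2. On a real test set the same functions would be applied once with images as queries and once with sentences as queries to fill the two halves of the table.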