Research Article

Deep Visual Semantic Embedding with Text Data Augmentation and Word Embedding Initialization

Table 4

Experimental results with text data augmentation on Flickr30k.

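| Model | Feature(s) | Ann. R@1 | Ann. R@5 | Ann. R@10 | Ann. Med r | Ret. R@1 | Ret. R@5 | Ret. R@10 | Ret. Med r |
|---|---|---|---|---|---|---|---|---|---|
| DeViSE | Word2Vec | 4.5 | 18.1 | 29.2 | 26 | 6.7 | 21.9 | 32.7 | 25 |
| DeFrag | R-CNN | 16.4 | 40.2 | 54.7 | 8 | 10.3 | 31.4 | 44.5 | 13 |
| VSA | R-CNN + BRNN | 22.2 | 48.2 | 61.4 | 4.8 | 15.2 | 37.7 | 50.5 | 9.2 |
| UVSE | ConvNet + LSTM | 14.8 | 39.2 | 50.9 | 10 | 11.8 | 34.0 | 46.3 | 13 |
| UVSE (VGG) | VGG + LSTM | 23.0 | 50.7 | 62.9 | 5 | 16.8 | 42.0 | 56.5 | 8 |
| VSE++ | VGG + GRU + HNM | 29.0 | 54.4 | 66.5 | 4 | 20.3 | 48 | 59.9 | 6 |
| Ours | Aug | 30.6 | 57.9 | 68.5 | 4 | 21.4 | 49.3 | 61.4 | 6 |
| Ours | Aug + Word2Vec | 33.4 | 59.2 | 69.6 | 3 | 23.3 | 49.9 | 61.7 | 6 |

Ann. = image annotation (image-to-text); Ret. = image retrieval (text-to-image). R@K = recall at K in % (higher is better); Med r = median rank (lower is better).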
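Table 4 reports Recall@K (R@K) and median rank (Med r) in both directions. The sketch below shows one common way to compute these metrics from an image-caption similarity matrix. It is an illustration, not the authors' evaluation code. It assumes the usual Flickr30k layout of five captions per image, with caption `c` describing image `c // captions_per_image`. The function names `i2t` and `t2i` are hypothetical. Ranks are 0-based internally. Med r is reported 1-based as floor(median) + 1, a convention some public VSE-style evaluation scripts use; other papers may compute it differently.

```python
import numpy as np


def _summarize(ranks: np.ndarray, ks=(1, 5, 10)) -> dict:
    # R@K: percentage of queries whose first correct match is in the top K.
    out = {f"R@{k}": float(100.0 * np.mean(ranks < k)) for k in ks}
    # Med r: 1-based median rank (floor(median) + 1 is an assumed convention).
    out["Med r"] = float(np.floor(np.median(ranks)) + 1)
    return out


def i2t(sim: np.ndarray, captions_per_image: int = 5, ks=(1, 5, 10)) -> dict:
    """Image annotation: every image queries all captions.

    sim has shape (n_images, n_images * captions_per_image). ASSUMPTION:
    caption c is a ground-truth description of image c // captions_per_image.
    """
    n_images = sim.shape[0]
    ranks = np.empty(n_images, dtype=int)
    for i in range(n_images):
        order = np.argsort(-sim[i])                # caption indices, most similar first
        positions = np.empty_like(order)
        positions[order] = np.arange(order.size)   # 0-based rank of every caption
        gt = np.arange(i * captions_per_image, (i + 1) * captions_per_image)
        ranks[i] = positions[gt].min()             # best-ranked ground-truth caption
    return _summarize(ranks, ks)


def t2i(sim: np.ndarray, captions_per_image: int = 5, ks=(1, 5, 10)) -> dict:
    """Image retrieval: every caption queries all images."""
    n_captions = sim.shape[1]
    ranks = np.empty(n_captions, dtype=int)
    for c in range(n_captions):
        order = np.argsort(-sim[:, c])             # image indices, most similar first
        target = c // captions_per_image           # the caption's ground-truth image
        ranks[c] = int(np.where(order == target)[0][0])
    return _summarize(ranks, ks)


# Toy example: 2 images with 2 captions each (not Flickr30k data).
sim = np.array([[0.9, 0.2, 0.1, 0.4],
                [0.3, 0.5, 0.7, 0.6]])
print(i2t(sim, captions_per_image=2))
print(t2i(sim, captions_per_image=2))
```

In image annotation an image counts as a hit if any one of its reference captions ranks in the top K, which is why R@K in that direction tends to be higher than in image retrieval, where each caption has exactly one correct image.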