Research Article
Deep Visual Semantic Embedding with Text Data Augmentation and Word Embedding Initialization
Table 4
Experimental results with text data augmentation on Flickr30k.
| Model | Feature(s) | Image annotation R@1 | Image annotation R@5 | Image annotation R@10 | Image annotation Med r | Image retrieval R@1 | Image retrieval R@5 | Image retrieval R@10 | Image retrieval Med r |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DeViSE | Word2Vec | 4.5 | 18.1 | 29.2 | 26 | 6.7 | 21.9 | 32.7 | 25 |
| DeFrag | R-CNN | 16.4 | 40.2 | 54.7 | 8 | 10.3 | 31.4 | 44.5 | 13 |
| VSA | R-CNN + BRNN | 22.2 | 48.2 | 61.4 | 4.8 | 15.2 | 37.7 | 50.5 | 9.2 |
| UVSE | ConvNet + LSTM | 14.8 | 39.2 | 50.9 | 10 | 11.8 | 34.0 | 46.3 | 13 |
| UVSE (VGG) | VGG + LSTM | 23.0 | 50.7 | 62.9 | 5 | 16.8 | 42.0 | 56.5 | 8 |
| VSE++ | VGG + GRU + HNM | 29.0 | 54.4 | 66.5 | 4 | 20.3 | 48 | 59.9 | 6 |
| Ours | Aug | 30.6 | 57.9 | 68.5 | 4 | 21.4 | 49.3 | 61.4 | 6 |
| Ours | Aug + Word2Vec | 33.4 | 59.2 | 69.6 | 3 | 23.3 | 49.9 | 61.7 | 6 |