Research Article
Multimodal Feature Learning for Video Captioning
Table 3
Performance comparison with other state-of-the-art models on the MSVD dataset (B@n denotes BLEU-n).
| Model | B@1 | B@2 | B@3 | B@4 | CIDEr |
| --- | --- | --- | --- | --- | --- |
| SCN [11] | - | - | - | 51.1 | 77.7 |
| LSTM-TSA [12] | 82.8 | 72.0 | 62.8 | 52.8 | 74.0 |
| hLSTMat [10] | 82.9 | 72.2 | 63.0 | 53.0 | 73.8 |
| SeFLA | 84.8 | 70.8 | 60.0 | 50.0 | 94.3 |