Research Article
Multimodal Feature Learning for Video Captioning
Table 4
Performance comparison with other state-of-the-art models on the MSR-VTT dataset.
| Models | BLEU@4 |
| --- | --- |
| MP-LSTM (V) [1] | 34.8 |
| MP-LSTM (C) [1] | 35.4 |
| MP-LSTM (V + C) [1] | 35.8 |
| SA (V) [2] | 35.6 |
| SA (C) [2] | 36.1 |
| SA (V + C) [2] | 36.6 |
| hLSTMt [10] | 37.4 |
| hLSTMat [10] | 38.3 |
| SeFLA | 41.8 |