Research Article

Multimodal Feature Learning for Video Captioning

Table 4: Performance comparison with other state-of-the-art models on the MSR-VTT dataset.

Model                   BLEU@4
--------------------------------
MP-LSTM (V) [1]         34.8
MP-LSTM (C) [1]         35.4
MP-LSTM (V + C) [1]     35.8
SA (V) [2]              35.6
SA (C) [2]              36.1
SA (V + C) [2]          36.6
hLSTMt [10]             37.4
hLSTMat [10]            38.3
SeFLA                   41.8
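The table ranks models by BLEU@4, which is the geometric mean of clipped 1- to 4-gram precisions multiplied by a brevity penalty. The excerpt does not say which BLEU implementation produced these numbers, so the following is only a minimal sketch of standard corpus-level BLEU-4 with uniform weights and no smoothing. It assumes captions are already tokenized, and the function names `ngram_counts` and `corpus_bleu4` are hypothetical. The scores in Table 4 appear to be on a 0-100 scale, so the returned value would be multiplied by 100 to compare.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    # Count every contiguous n-gram of length n in a token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu4(candidates, references_list):
    """Corpus-level BLEU-4 (uniform weights, brevity penalty, no smoothing).

    Assumption: this is a generic BLEU sketch, not the paper's evaluation code.
    candidates: list of token lists, one generated caption per video.
    references_list: list of lists of token lists, the reference captions per video.
    """
    max_n = 4
    matches = [0] * max_n  # clipped n-gram matches for n = 1..4
    totals = [0] * max_n   # candidate n-gram counts for n = 1..4
    cand_len = 0
    ref_len = 0
    for cand, refs in zip(candidates, references_list):
        cand_len += len(cand)
        # Effective reference length: the reference closest in length to the
        # candidate, with ties broken toward the shorter reference.
        ref_len += min((abs(len(r) - len(cand)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            cand_counts = ngram_counts(cand, n)
            # Highest count of each n-gram in any single reference (for clipping).
            max_ref = Counter()
            for r in refs:
                for gram, count in ngram_counts(r, n).items():
                    max_ref[gram] = max(max_ref[gram], count)
            matches[n - 1] += sum(min(c, max_ref[g]) for g, c in cand_counts.items())
            totals[n - 1] += max(len(cand) - n + 1, 0)
    if min(matches) == 0:
        # Without smoothing, any zero precision makes the geometric mean zero.
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize candidates shorter than the effective reference length.
    bp = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return bp * math.exp(log_precision)
```

For example, a 6-token caption whose n-grams all appear in an 8-token reference keeps perfect precision but is scaled by the brevity penalty exp(1 - 8/6). Clipping against the per-reference maximum is what prevents a caption from inflating its score by repeating a common word such as "a".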