Research Article

Multimodal Feature Learning for Video Captioning

Table 3

Performance comparison with other state-of-the-art models on the MSVD dataset.

Models          B@1    B@2    B@3    B@4    CIDEr
SCN [11]        -      -      -      51.1   77.7
LSTM-TSA [12]   82.8   72.0   62.8   52.8   74.0
hLSTMat [10]    82.9   72.2   63.0   53.0   73.8
SeFLA           84.8   70.8   60.0   50.0   94.3