Research Article

Multimodal Feature Learning for Video Captioning

Table 2

Comparison of different feature sets on the MSVD dataset.

Feature sets        B@1   B@2   B@3   B@4   CIDEr
CGN                 66.1  47.8  37.1  26.5  26.4
DSN + CGN           76.0  58.1  45.7  35.8  50.0
SSN + CGN           78.8  63.4  51.4  41.4  77.8
DSN + SSN + CGN     84.8  70.8  60.0  50.0  94.3