Research Article

Multimodal Feature Learning for Video Captioning

Table 1

Performance of semantic feature networks on the MSVD dataset.

Networks    Val-accuracy    Test-accuracy
DSN         99.42%          99.43%
SSN         99.61%          99.64%