Research Article

Scaling Human-Object Interaction Recognition in the Video through Zero-Shot Learning

Table 5

The impact of the RNNs (LSTM/GRU) on the proposed system’s performance. The model is trained with 1A + 2B and tested on all data.

| Method | mAP (%), LSTM | mAP (%), GRU |
|---|---|---|
| 2Stream + WE - VLAD | 19.5 | 19.42 |
| 3Stream + WE - VLAD | 20.84 | 20.63 |
| 2Stream + WE + VLAD (rnn) | 20.86 | 20.64 |
| 3Stream + WE + VLAD (rnn) | 21.27 | 21.19 |