Research Article

Scaling Human-Object Interaction Recognition in the Video through Zero-Shot Learning

Table 2

The effect of local feature aggregation on HOI recognition performance. The model was trained on 1A + 2B and tested on the unseen split (2A + 1B) and on all data.

| Method | mAP (%), all data | mAP (%), unseen data (2A + 1B) |
|---|---|---|
| 2Stream - WE - VLAD | 17.8 | 14.83 |
| 2Stream + WE - VLAD | 19.5 | 16.08 |
| 2Stream - WE + VLAD (rnn) | 18.7 | 15.33 |
| 2Stream + WE + VLAD (rnn) | 20.96 | 16.96 |
| 2Stream + WE + VLAD (cnn) | 20.65 | 16.65 |