Research Article
Scaling Human-Object Interaction Recognition in the Video through Zero-Shot Learning
Table 2
The effect of local feature aggregation on HOI recognition performance. The model was trained with 1A + 2B and tested on 2A + 1B and all data.
| Method | mAP (%), all data | mAP (%), unseen data (2A + 1B) |
| --- | --- | --- |
| 2Stream - WE - VLAD | 17.8 | 14.83 |
| 2Stream + WE - VLAD | 19.5 | 16.08 |
| 2Stream - WE + VLAD (rnn) | 18.7 | 15.33 |
| 2Stream + WE + VLAD (rnn) | 20.96 | 16.96 |
| 2Stream + WE + VLAD (cnn) | 20.65 | 16.65 |