Research Article
Scaling Human-Object Interaction Recognition in the Video through Zero-Shot Learning
Table 1
Zero-shot HOI recognition mAP on the Charades dataset. The model is trained on 1A + 2B and tested both on the unseen 2A + 1B split and on all data.
| Method | mAP (%) on the test set, all data | mAP (%) on the test set, unseen data (2A + 1B) |
|---|---|---|
| Chance | 1.43 | 1.45 |
| Compositional [58] | 14.32 | 10.48 |
| SES [68] | 13.12 | 9.56 |
| DEM [75] | 11.78 | 8.97 |
| CC [76] | 14.31 | 10.13 |
| 1stream [14] | 16.48 | 11.23 |
| 2Stream – SI | 17.8 | 14.83 |
| 2Stream + SI | 19.5 | 16.08 |
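Table 1 reports mean average precision (mAP). The sketch below is a minimal illustration, assuming the common definition: non-interpolated average precision computed separately for each class, then averaged over the classes that have at least one positive sample. The function names `average_precision` and `mean_average_precision` and the toy scores are hypothetical. This excerpt does not show the paper's exact evaluation script, or how scoring is restricted to the unseen 2A + 1B split (presumably the same metric over only those compositions).

```python
from typing import List, Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated AP for one class (hypothetical helper).

    Ranks samples by descending score and averages precision@k over
    the ranks k at which a positive sample appears.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx]:
            hits += 1
            precision_sum += hits / rank  # precision at this positive's rank
    if hits == 0:
        raise ValueError("class has no positive samples")
    return precision_sum / hits


def mean_average_precision(score_matrix: List[List[float]],
                           label_matrix: List[List[int]]) -> float:
    """mAP: per-class AP averaged over classes with at least one positive.

    Rows are videos, columns are HOI classes (assumed layout).
    """
    num_classes = len(score_matrix[0])
    aps = []
    for c in range(num_classes):
        col_scores = [row[c] for row in score_matrix]
        col_labels = [row[c] for row in label_matrix]
        if any(col_labels):  # skip classes with no positives
            aps.append(average_precision(col_scores, col_labels))
    return sum(aps) / len(aps)


# Toy example: 4 videos, 2 HOI classes (made-up numbers, not from Table 1).
scores = [[0.9, 0.1], [0.8, 0.7], [0.3, 0.6], [0.2, 0.4]]
labels = [[1, 0], [0, 0], [1, 1], [0, 1]]
print(f"mAP = {100 * mean_average_precision(scores, labels):.2f}%")  # mAP = 70.83%
```

In the toy example, class 0 has AP = 5/6 and class 1 has AP = 7/12, so the mAP is 17/24. Averaging per class means that rare and frequent interaction classes contribute equally to the final number.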