Research Article

Scaling Human-Object Interaction Recognition in the Video through Zero-Shot Learning

Table 1

Zero-shot HOI recognition mAP on the Charades dataset. The model is trained on 1A + 2B and tested on the unseen data (2A + 1B) and on all data.

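| Method | mAP (%) on the test set, all data | mAP (%) on the test set, unseen data (2A + 1B) |
|---|---|---|
| Chance | 1.43 | 1.45 |
| Compositional [58] | 14.32 | 10.48 |
| SES [68] | 13.12 | 9.56 |
| DEM [75] | 11.78 | 8.97 |
| CC [76] | 14.31 | 10.13 |
| 1stream [14] | 16.48 | 11.23 |
| 2Stream – SI | 17.8 | 14.83 |
| 2Stream + SI | 19.5 | 16.08 |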