Scaling Human-Object Interaction Recognition in the Video through Zero-Shot Learning

<div>Samples of incorrect classifications made by our final model. In the first row, the true class is “opening a laptop,” but it is predicted as “fixing a laptop.” In the second row, “fixing a vacuum” is predicted as “holding a vacuum.” The third row shows “working at a table,” which is predicted as “watching at a book,” and the final row shows “grasping onto a doorknob,” which our model predicts as “fixing a door.”</div>

Computational Intelligence and Neuroscience


Figure 9
