Scaling Human-Object Interaction Recognition in the Video through Zero-Shot Learning

<table class="table-group" id="tab3"><tr><td><table class="table"><tr><td class="thead-hr" colspan="3"><hr/></td></tr><tr class="thead"><td class="align_left" rowspan="2">Method</td><td class="align_center" colspan="2">mAP (%)</td></tr><tr class="thead"><td class="align_center">All data</td><td class="align_center">Unseen data (2A + 1B)</td></tr><tr><td class="thead-hr" colspan="3"><hr/></td></tr><tr><td class="align_left">2Stream – WE-VLAD</td><td class="align_center">17.8</td><td class="align_center">14.83</td></tr><tr><td class="align_left">3Stream – WE-VLAD</td><td class="align_center">19.21</td><td class="align_center">16.65</td></tr><tr><td class="align_left">2Stream + WE-VLAD</td><td class="align_center">19.5</td><td class="align_center">16.08</td></tr><tr><td class="align_left">3Stream + WE-VLAD</td><td class="align_center">20.84</td><td class="align_center">17.32</td></tr><tr><td class="align_left">2Stream + WE + VLAD (rnn)</td><td class="align_center">20.86</td><td class="align_center">16.96</td></tr><tr><td class="align_left">3Stream + WE + VLAD (rnn)</td><td class="align_center">21.27</td><td class="align_center">17.63</td></tr><tr class="table-tr"><td colspan="3"><hr class="tbody-hr"/></td></tr></table></td></tr></table>

<div>Impact of optical flow on the proposed system’s performance. The model was trained on 1A + 2B and tested both on the unseen data (2A + 1B) and on all data.</div>
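One way to read the table is to compute, for each WE/VLAD configuration, how much the 3Stream row gains over the matching 2Stream row. The sketch below does this in Python from the reported values. The dictionary `TABLE3` and the helper `stream_gain` are hypothetical names introduced only for illustration. Treating the 2Stream/3Stream difference as the optical-flow contribution that the caption refers to is an assumption, because the row labels do not say which stream carries optical flow.

```python
# Table 3 values, mAP (%) as (all data, unseen data 2A + 1B).
# Keys copy the row labels exactly as printed in the table.
# TABLE3 and stream_gain are hypothetical names used only for this sketch.
TABLE3 = {
    "2Stream – WE-VLAD": (17.8, 14.83),
    "3Stream – WE-VLAD": (19.21, 16.65),
    "2Stream + WE-VLAD": (19.5, 16.08),
    "3Stream + WE-VLAD": (20.84, 17.32),
    "2Stream + WE + VLAD (rnn)": (20.86, 16.96),
    "3Stream + WE + VLAD (rnn)": (21.27, 17.63),
}


def stream_gain(variant: str) -> tuple[float, float]:
    """Return the 3Stream minus 2Stream mAP difference (all, unseen) for one variant."""
    two_all, two_unseen = TABLE3[f"2Stream {variant}"]
    three_all, three_unseen = TABLE3[f"3Stream {variant}"]
    # Round to two decimals to remove floating-point noise from the subtraction.
    return round(three_all - two_all, 2), round(three_unseen - two_unseen, 2)


if __name__ == "__main__":
    for variant in ("– WE-VLAD", "+ WE-VLAD", "+ WE + VLAD (rnn)"):
        gain_all, gain_unseen = stream_gain(variant)
        print(f"{variant:>18}: +{gain_all:.2f} (all), +{gain_unseen:.2f} (unseen)")
```

Run as-is, it reports gains of 1.41/1.82, 1.34/1.24, and 0.41/0.67 mAP points (all data / unseen) for the three pairs, so the gap between the 2Stream and 3Stream rows is largest for the "– WE-VLAD" pair and smallest for the "+ WE + VLAD (rnn)" pair.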

Computational Intelligence and Neuroscience


Table 3
