Scaling Human-Object Interaction Recognition in the Video through Zero-Shot Learning

<table class="table-group" id="tab5"><tr><td><table class="table"><tr><td class="thead-hr" colspan="3"><hr/></td></tr><tr class="thead"><td class="align_left" rowspan="2">Method</td><td class="align_center" colspan="2">mAP (%)</td></tr><tr class="thead"><td class="align_center">LSTM</td><td class="align_center">GRU</td></tr><tr><td class="thead-hr" colspan="3"><hr/></td></tr><tr><td class="align_left">2Stream + WE - VLAD</td><td class="align_center">19.5</td><td class="align_center">19.42</td></tr><tr><td class="align_left">3Stream + WE - VLAD</td><td class="align_center">20.84</td><td class="align_center">20.63</td></tr><tr><td class="align_left">2Stream + WE + VLAD (rnn)</td><td class="align_center">20.86</td><td class="align_center">20.64</td></tr><tr><td class="align_left">3Stream + WE + VLAD (rnn)</td><td class="align_center">21.27</td><td class="align_center">21.19</td></tr><tr class="table-tr"><td colspan="3"><hr class="tbody-hr"/></td></tr></table></td></tr></table>

<div>Impact of the RNN type (LSTM or GRU) on the proposed system’s performance, reported as mAP (%). The model is trained on 1A + 2B and tested on all data.</div>

Computational Intelligence and Neuroscience

Table 5