Scaling Human-Object Interaction Recognition in the Video through Zero-Shot Learning

<table class="table-group" id="tab4"><tr><td><table class="table"><tr><td class="thead-hr" colspan="3"><hr/></td></tr><tr class="thead"><td class="align_left" rowspan="2">Method</td><td class="align_center" colspan="2">mAP (%)</td></tr><tr class="thead"><td class="align_center">LSTM</td><td class="align_center">GRU</td></tr><tr><td class="thead-hr" colspan="3"><hr/></td></tr><tr><td class="align_left">2Stream + WE - VLAD</td><td class="align_center">20.28</td><td class="align_center">20.33</td></tr><tr><td class="align_left">3Stream + WE - VLAD</td><td class="align_center">20.94</td><td class="align_center">20.94</td></tr><tr><td class="align_left">2Stream + WE + VLAD (rnn)</td><td class="align_center">20.74</td><td class="align_center">20.78</td></tr><tr><td class="align_left">3Stream + WE + VLAD (rnn)</td><td class="align_center">21.31</td><td class="align_center">21.35</td></tr><tr class="table-tr"><td colspan="3"><hr class="tbody-hr"/></td></tr></table></td></tr></table>

<div>Impact of the RNN type (LSTM vs. GRU) on the proposed system’s performance, reported as mAP (%). Training and testing are performed on the same subset, and the results are averaged over the four subsets.</div>

Computational Intelligence and Neuroscience

Table 4: Scaling Human-Object Interaction Recognition in the Video through Zero-Shot Learning