Active Object Recognition with a Space-Variant Retina

<table>Mean per-class accuracy of our approach on Caltech-256 as a function of the number of training instances, compared with the methods of [<a href="/journals/isrn/2013/138057/#B10">10</a>, <a href="/journals/isrn/2013/138057/#B26">26</a>, <a href="/journals/isrn/2013/138057/#B33">33</a>, <a href="/journals/isrn/2013/138057/#B36">36</a>, <a href="/journals/isrn/2013/138057/#B37">37</a>]. Chance performance is 1/256. Kanan [<a href="/journals/isrn/2013/138057/#B26">26</a>] used a gnostic field with color SIFT features; our space-variant ICA filters achieve almost the same accuracy (slightly higher with one training instance), even though ours is a self-taught approach. Bergamo and Torresani [<a href="/journals/isrn/2013/138057/#B37">37</a>] combined five kinds of features (color GIST, oriented HOG, unoriented HOG, SSIM, and SIFT) into a metadescriptor using spatial-pyramid histograms. Gehler and Nowozin [<a href="/journals/isrn/2013/138057/#B36">36</a>] used five types of engineered features (PHOG, SIFT, LBP, V1+ Gabors, and region covariance) and combined 39 different kernels with multiple kernel learning. Kanan and Cottrell [<a href="/journals/isrn/2013/138057/#B10">10</a>] used a nonfoveated model of active vision (see Discussion). Griffin et al. [<a href="/journals/isrn/2013/138057/#B33">33</a>] provide baseline results.</table>
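The caption reports mean per-class accuracy against a chance level of 1/256. Below is a minimal Python sketch of that metric. It assumes the standard definition: accuracy is computed separately for each class and then averaged with equal weight per class. The caption does not spell out the averaging, and the function names `mean_per_class_accuracy` and `chance_accuracy` are hypothetical. The simulation assumes 50 test images per class purely for illustration.

```python
import random
from collections import defaultdict


def mean_per_class_accuracy(y_true, y_pred):
    """Hypothetical helper: per-class accuracy averaged with equal weight per class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    # Each class contributes one term, regardless of how many test items it has.
    per_class = [correct[c] / total[c] for c in total]
    return sum(per_class) / len(per_class)


def chance_accuracy(num_classes):
    """Expected accuracy of guessing uniformly at random among num_classes labels."""
    return 1.0 / num_classes


if __name__ == "__main__":
    # Toy imbalanced example: class 0 has 3 test items, class 1 has 1.
    y_true = [0, 0, 0, 1]
    y_pred = [0, 0, 1, 1]
    # Equals (2/3 + 1) / 2, about 0.833; plain overall accuracy would be 3/4.
    print(mean_per_class_accuracy(y_true, y_pred))

    # Simulated uniform random guesser over 256 classes
    # (assumption for illustration: 50 test items per class).
    rng = random.Random(0)
    n_classes, per_class_test = 256, 50
    truth = [c for c in range(n_classes) for _ in range(per_class_test)]
    guesses = [rng.randrange(n_classes) for _ in truth]
    # The empirical value should land close to the analytic chance level 1/256.
    print(chance_accuracy(n_classes), mean_per_class_accuracy(truth, guesses))
```

Averaging per class keeps large categories from dominating the score. As the toy example shows, this can differ from plain overall accuracy when test sets are imbalanced.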

International Scholarly Research Notices


Figure 12