Anomaly Detection via Midlevel Visual Attributes

<div>Overview of our approach, illustrated with a two-level spatiotemporal pyramid. The input is a video stream. A 3D volume around a pixel is first constructed, shown as the outer red cube. It is then segmented into 8 (<span id="M6">2 × 2 × 2</span>) smaller cubes, denoted by different numbers in this figure. The smaller cubes form the lower, finer level of the pyramid. HOG features are extracted for each smaller cube, and the HOG features of the upper-level cube can be constructed efficiently from those of the lower-level cubes. We use a visual attribute representation to bridge the semantic gap between low-level features and high-level events. The three-level (feature-attribute-event) framework can be modeled by an extreme learning machine. Finally, anomaly detection is completed by combining the outputs of the machine.</div>
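The two computational ideas in the caption, cheap upper-level HOG aggregation and an extreme learning machine, can be sketched in NumPy. This is an illustration under stated assumptions, not the authors' implementation. It assumes each cube's HOG is an unnormalized histogram of spatial gradient orientations with 9 unsigned bins; the bin count, the use of `np.gradient`, and all function names (`cube_hog`, `pyramid_features`, `elm_train`, `elm_predict`) are hypothetical. Under that assumption, the upper-level histogram is simply the sum of the eight lower-level histograms. The ELM part is the generic single-hidden-layer formulation (random input weights, sigmoid activations, output weights from the Moore-Penrose pseudo-inverse). It does not reproduce the paper's feature-attribute-event structure or how the outputs are combined.

```python
import numpy as np

# Hypothetical sketch: names, bin count, and gradient scheme are assumptions.
N_BINS = 9  # assumed number of unsigned orientation bins


def cube_hog(gx, gy):
    """Unnormalized orientation histogram of one cube (a simplified HOG).

    Every voxel votes its gradient magnitude into one of N_BINS bins.
    """
    mag = np.hypot(gx, gy).ravel()
    ang = np.mod(np.arctan2(gy, gx), np.pi).ravel()  # fold into [0, pi)
    bins = np.minimum((ang / np.pi * N_BINS).astype(int), N_BINS - 1)
    return np.bincount(bins, weights=mag, minlength=N_BINS)


def pyramid_features(volume):
    """Two-level spatiotemporal pyramid for a (T, H, W) volume with even sides.

    Lower level: 2x2x2 = 8 sub-cubes, each with its own histogram.
    Upper level: the whole volume, whose histogram is the SUM of the 8
    sub-cube histograms, so it is never recomputed from the voxels.
    """
    volume = np.asarray(volume, dtype=float)
    T, H, W = volume.shape
    assert T % 2 == 0 and H % 2 == 0 and W % 2 == 0, "sides must be even"
    gy = np.gradient(volume, axis=1)  # vertical spatial gradient
    gx = np.gradient(volume, axis=2)  # horizontal spatial gradient
    t, h, w = T // 2, H // 2, W // 2
    lower = []
    for i in range(2):
        for j in range(2):
            for k in range(2):
                sl = (slice(i * t, (i + 1) * t),
                      slice(j * h, (j + 1) * h),
                      slice(k * w, (k + 1) * w))
                lower.append(cube_hog(gx[sl], gy[sl]))
    upper = np.sum(lower, axis=0)  # aggregation instead of recomputation
    return np.concatenate([upper] + lower)  # length 9 * N_BINS


def _hidden(X, W, b):
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))  # sigmoid hidden layer


def elm_train(X, Y, n_hidden, seed=0):
    """Generic ELM: random (W, b) are fixed; only output weights are solved."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    beta = np.linalg.pinv(_hidden(X, W, b)) @ Y  # least-squares solution
    return W, b, beta


def elm_predict(model, X):
    W, b, beta = model
    return _hidden(X, W, b) @ beta


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    vols = rng.random((6, 4, 8, 8))  # six toy (T, H, W) volumes
    X = np.stack([pyramid_features(v) for v in vols])
    X /= X.sum(axis=1, keepdims=True)  # L1-normalize before the ELM
    y = np.array([0., 0., 0., 1., 1., 1.])[:, None]  # placeholder labels
    model = elm_train(X, y, n_hidden=20)
    print(X.shape)  # (6, 81)
```

Summing is what makes the upper level cheap. If per-cube normalization is used, it has to come after the aggregation step, because a sum of normalized histograms is not the normalized histogram of the union. The labels in the demo are placeholders, not data from the paper.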

Mathematical Problems in Engineering

Figure 1