Vision Transformer and Deep Sequence Learning for Human Activity Recognition in Surveillance Videos

<table class="table-group" id="tab1"><tr><td><table class="table">
<tr><td class="thead-hr" colspan="2"><hr/></td></tr>
<tr class="thead"><td class="align_left">Variable/symbol</td><td class="align_center">Description</td></tr>
<tr><td class="thead-hr" colspan="2"><hr/></td></tr>
<tr><td class="align_left">Δt</td><td class="align_center">Input over time <i>t</i></td></tr>
<tr><td class="align_left">∂</td><td class="align_center">Sigmoid activation function</td></tr>
<tr><td class="align_left">w</td><td class="align_center">Weights</td></tr>
<tr><td class="align_left">b</td><td class="align_center">Bias terms</td></tr>
<tr><td class="align_left">i<sub>t</sub></td><td class="align_center">Input gate</td></tr>
<tr><td class="align_left">f<sub>t</sub></td><td class="align_center">Forget gate</td></tr>
<tr><td class="align_left">o<sub>t</sub></td><td class="align_center">Output gate</td></tr>
<tr><td class="align_left">tanh</td><td class="align_center">Hyperbolic tangent (tanh) activation function</td></tr>
<tr><td class="align_left">SoftMax</td><td class="align_center">Activation for the final classification</td></tr>
<tr><td class="align_left">N</td><td class="align_center">Number of classes</td></tr>
<tr class="table-tr"><td colspan="2"><hr class="tbody-hr"/></td></tr>
</table></td></tr></table>

<div>Parameter details used in the formulation of the LSTM network.</div>
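The symbols listed in the table above match the usual LSTM gate formulation. The following is a minimal sketch of one LSTM time step followed by a SoftMax over N classes, assuming the standard LSTM cell rather than the exact equations of the paper, which are not shown in this section. The function names (`lstm_step`, `init_params`, `classify_sequence`), the parameter dictionary layout, the candidate state `g_t`, and the classifier weights `w_cls` and `b_cls` are all hypothetical and introduced only for illustration.

```python
import math
import random


def sigmoid(z):
    """Logistic sigmoid used by the input, forget, and output gates."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    """Numerically stable SoftMax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def affine(w, b, v):
    """Return w @ v + b, where w is a list of rows and b, v are lists."""
    return [sum(wjk * vk for wjk, vk in zip(row, v)) + bj
            for row, bj in zip(w, b)]


def lstm_step(x_t, h_prev, c_prev, p):
    """One step of a standard LSTM cell (assumed formulation)."""
    z = list(h_prev) + list(x_t)  # concatenation [h_{t-1}, x_t]
    i_t = [sigmoid(a) for a in affine(p["w_i"], p["b_i"], z)]    # input gate
    f_t = [sigmoid(a) for a in affine(p["w_f"], p["b_f"], z)]    # forget gate
    o_t = [sigmoid(a) for a in affine(p["w_o"], p["b_o"], z)]    # output gate
    g_t = [math.tanh(a) for a in affine(p["w_g"], p["b_g"], z)]  # candidate state
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, g_t)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


def init_params(input_dim, hidden_dim, seed=0):
    """Small random weights w and zero biases b for the four transforms."""
    rng = random.Random(seed)
    p = {}
    for gate in "ifog":
        p["w_" + gate] = [[rng.uniform(-0.1, 0.1)
                           for _ in range(hidden_dim + input_dim)]
                          for _ in range(hidden_dim)]
        p["b_" + gate] = [0.0] * hidden_dim
    return p


def classify_sequence(frames, p, w_cls, b_cls):
    """Run the LSTM over per-frame features, then apply SoftMax over N classes."""
    hidden_dim = len(p["b_i"])
    h = [0.0] * hidden_dim
    c = [0.0] * hidden_dim
    for x_t in frames:
        h, c = lstm_step(x_t, h, c, p)
    return softmax(affine(w_cls, b_cls, h))
```

Here `frames` stands in for the per-frame feature vectors fed to the LSTM over time, and the length of the returned list equals the number of classes N. The sketch uses plain Python lists to stay dependency-free; a practical implementation would more likely rely on a deep learning framework's built-in LSTM layer.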

Computational Intelligence and Neuroscience

Table 1
