Research Article
Vision Transformer and Deep Sequence Learning for Human Activity Recognition in Surveillance Videos
Table 2
Variants of the ViT model used for image classification.
The variant used by the proposed method for feature extraction is shown in bold.