Vision Transformer and Deep Sequence Learning for Human Activity Recognition in Surveillance Videos

<div>Class-wise accuracy on the HMDB51 dataset for the proposed ViT and multilayer LSTM model.</div>

Computational Intelligence and Neuroscience

Figure 6: Vision Transformer and Deep Sequence Learning for Human Activity Recognition in Surveillance Videos