Vision Transformer and Deep Sequence Learning for Human Activity Recognition in Surveillance Videos

<div>Figure 5: Class-wise accuracy of the proposed ViT and multilayer LSTM model on the UCF50 dataset.</div>

Computational Intelligence and Neuroscience
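The figure caption does not define "class-wise accuracy." A common reading is the fraction of test clips of each true action class that the model labels correctly, which is the same as per-class recall. The sketch below follows that assumption and may not match the paper's exact evaluation code. The function name `class_wise_accuracy` and the example labels are hypothetical. They use UCF50-style class names but are not results from the paper.

```python
from collections import Counter


def class_wise_accuracy(y_true, y_pred):
    """Per-class accuracy: correct predictions / samples, grouped by TRUE label.

    Assumption: "class-wise accuracy" is read as per-class recall; this is
    an illustrative sketch, not the authors' evaluation code.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    totals = Counter(y_true)  # number of samples per true class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # hits per class
    # Counter returns 0 for classes with no correct predictions
    return {c: correct[c] / totals[c] for c in sorted(totals)}


# Hypothetical labels for three UCF50-style classes (made-up data, not paper results)
y_true = ["Biking", "Biking", "Diving", "Diving", "Fencing", "Fencing"]
y_pred = ["Biking", "Diving", "Diving", "Diving", "Fencing", "Biking"]
print(class_wise_accuracy(y_true, y_pred))
# {'Biking': 0.5, 'Diving': 1.0, 'Fencing': 0.5}
```

A plot such as Figure 5 would show one of these per-class values for each action category. Averaging them without weighting gives a macro-averaged accuracy, which can differ from overall accuracy when the classes have different numbers of clips.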