Vision Transformer and Deep Sequence Learning for Human Activity Recognition in Surveillance Videos

<div>Figure 4: Confusion matrices of the proposed model on the (a) UCF50 and (b) HMDB51 datasets.</div>

Computational Intelligence and Neuroscience