Research Article
An Enhanced Visual Attention Siamese Network That Updates Template Features Online
Algorithm 2
Inference of the proposed framework.
Input: test video; initial frame and its bounding box.
Compute the template features with the backbone network;
Refine them with the channel attention module;
Refine them with the spatial self-attention module;
Preprocessing: crop and resize X, and set patches at three different scales.
While the test video is not empty do
    Get search patch X and the corresponding bounding box;
    Compute the search features with the backbone network;
    Refine them with the channel attention module;
    Refine them with the spatial self-attention module;
    Upsample feature map X to 272 × 272;
    Locate the target center in feature map X by finding its peak;
    Compute the offset of the upsampled map relative to the feature map;
    Compute the offset of the feature map relative to the original image;
    Update the target size and the corresponding bounding box;
end
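The localization steps inside the loop (upsample, find the peak, convert the peak offset back to image coordinates) can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names `axis_weights`, `upsample_bilinear`, and `locate_target`, the 17 × 17 input map, the backbone stride `total_stride=8`, and the `scale_factor` parameter are all hypothetical, and bilinear interpolation stands in for whatever upsampling method the paper actually uses.

```python
def axis_weights(in_size, out_size):
    """For each output index, return (lower input index, interpolation fraction)."""
    factor = out_size / in_size
    weights = []
    for u in range(out_size):
        f = (u + 0.5) / factor - 0.5          # half-pixel-center mapping
        f = min(max(f, 0.0), in_size - 1.0)   # clamp to the valid input range
        i0 = min(int(f), in_size - 2)
        weights.append((i0, f - i0))
    return weights


def upsample_bilinear(score, out_size):
    """Bilinearly upsample a square 2-D list (side >= 2) to out_size x out_size."""
    w = axis_weights(len(score), out_size)
    out = []
    for r0, fr in w:
        top, bottom = score[r0], score[r0 + 1]
        row = []
        for c0, fc in w:
            a = top[c0] * (1 - fc) + top[c0 + 1] * fc
            b = bottom[c0] * (1 - fc) + bottom[c0 + 1] * fc
            row.append(a * (1 - fr) + b * fr)
        out.append(row)
    return out


def locate_target(score, prev_center, up_size=272, total_stride=8, scale_factor=1.0):
    """Return the new (x, y) target center in image pixels.

    total_stride and scale_factor are assumed values, not taken from the paper.
    """
    in_size = len(score)
    up = upsample_bilinear(score, up_size)

    # Locate the peak of the upsampled response map.
    peak_r, peak_c, peak_val = 0, 0, up[0][0]
    for r, row in enumerate(up):
        for c, v in enumerate(row):
            if v > peak_val:
                peak_r, peak_c, peak_val = r, c, v

    # Offset of the peak from the map center, in upsampled pixels,
    # then converted to feature-map cells.
    center = (up_size - 1) / 2.0
    factor = up_size / in_size
    d_row = (peak_r - center) / factor
    d_col = (peak_c - center) / factor

    # Offset in feature-map cells converted to original-image pixels.
    dx = d_col * total_stride * scale_factor
    dy = d_row * total_stride * scale_factor
    cx, cy = prev_center
    return cx + dx, cy + dy


# Usage: a synthetic response whose peak sits 4 cells right of the center.
response = [[0.0] * 17 for _ in range(17)]
response[8][12] = 1.0
new_center = locate_target(response, prev_center=(100.0, 50.0))
```

Because the upsampled map has an even side length while the input map has an odd one, the peak can only land half an upsampled pixel away from the exact center, so the recovered position carries a sub-pixel bias in this sketch. Selecting among the three scale patches and updating the target size are left out here.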