Research Article

An Enhanced Visual Attention Siamese Network That Updates Template Features Online

Algorithm 2

Inference of the proposed framework.
Input: test video; initial frame and its bounding box;
Compute the template features with the backbone network;
Apply the channel attention module to the template features;
Apply the spatial self-attention module to the template features;
Preprocessing: crop and resize X, and set three patches at different scales.
While the test video is not empty do
 Get search patch X and the corresponding bounding box;
 Compute the search features with the backbone network;
 Apply the channel attention module to the search features;
 Apply the spatial self-attention module to the search features;
 Upsample feature map X to 272 × 272;
 Locate the target center in feature map X by finding the peak;
 Compute the offset of the upsampled map relative to the feature map;
 Compute the offset of the feature map relative to the original image;
 Update the target size and the corresponding bounding box;
end
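
The localization steps inside the loop (upsample the response map, find the peak, and convert the peak position into an offset in the original image) can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation. Only the 272 × 272 upsampled size comes from the algorithm above. The function names, the corner-aligned bilinear interpolation, the total_stride value of 8, and the crop_scale factor (ratio of the crop size in the original image to the resized search patch) are assumptions made for the example.

```python
def upsample_bilinear(resp, out_size):
    """Bilinearly upsample a square 2-D list to out_size x out_size.

    Corner-aligned (assumption): the first and last output samples
    coincide with the first and last input samples.
    """
    n = len(resp)
    step = (n - 1) / (out_size - 1)
    out = []
    for i in range(out_size):
        y = i * step
        y0 = min(int(y), n - 2)
        dy = y - y0
        row = []
        for j in range(out_size):
            x = j * step
            x0 = min(int(x), n - 2)
            dx = x - x0
            top = resp[y0][x0] * (1 - dx) + resp[y0][x0 + 1] * dx
            bot = resp[y0 + 1][x0] * (1 - dx) + resp[y0 + 1][x0 + 1] * dx
            row.append(top * (1 - dy) + bot * dy)
        out.append(row)
    return out


def locate_target(resp, up_size=272, total_stride=8, crop_scale=1.0):
    """Return the (row, col) displacement of the target, in image pixels,
    relative to the center of the search patch.

    total_stride and crop_scale are hypothetical parameters.
    """
    up = upsample_bilinear(resp, up_size)

    # Find the peak of the upsampled response map.
    best, peak_r, peak_c = up[0][0], 0, 0
    for r, row in enumerate(up):
        for c, value in enumerate(row):
            if value > best:
                best, peak_r, peak_c = value, r, c

    # Upsampled map -> feature map: undo the upsampling factor and
    # measure the offset from the feature-map center.
    n = len(resp)
    to_feat = (n - 1) / (up_size - 1)
    center = (n - 1) / 2.0
    d_feat = (peak_r * to_feat - center, peak_c * to_feat - center)

    # Feature map -> original image: apply the network stride, then the
    # scale between the resized search patch and the original crop.
    return tuple(d * total_stride * crop_scale for d in d_feat)


# Small example: a 3 x 3 response map with its peak in the bottom-left cell.
resp = [[0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0],
        [1.0, 0.0, 0.0]]
print(locate_target(resp, up_size=5))  # (8.0, -8.0)
```

In a full tracker, the returned displacement would be added to the previous target center, and the same routine would be run on each of the three scaled patches so that the scale with the strongest peak can drive the size update.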