Security and Communication Networks

Research Article

An Enhanced Visual Attention Siamese Network That Updates Template Features Online

Inference of the proposed framework.

Input: test video; initial frame and its bounding box;
Compute the template features with the backbone network;
Pass the template features through the channel attention module;
Pass the template features through the spatial self-attention module;
Preprocessing: crop and resize X, and prepare patches at three different scales.
While the test video is not empty do
Get the search patch X and the corresponding bounding box;
Compute the search-patch features with the backbone network;
Pass the search-patch features through the channel attention module;
Pass the search-patch features through the spatial self-attention module;
Upsample feature map X to 272 × 272;
Locate the target center in feature map X by finding the peak;
Compute the offset of the upsampled map relative to the feature map;
Compute the offset of the feature map relative to the original image;
Update the target size and the corresponding bounding box;
end
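
The localization steps inside the loop (upsample, find the peak, convert the offset back to image coordinates) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. It assumes a 17 × 17 response map upsampled by a factor of 16 to reach the 272 × 272 size stated above, a backbone stride of 8, and nearest-neighbour upsampling in place of whatever interpolation the paper uses. The names `upsample_nearest`, `find_peak`, `locate_target`, `up_factor`, `stride`, and `scale` are hypothetical.

```python
def upsample_nearest(score, factor):
    """Nearest-neighbour upsampling of a 2-D list by an integer factor.
    (Assumption: stands in for the paper's interpolation method.)"""
    out = []
    for row in score:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def find_peak(score):
    """Return the (row, col) of the maximum response.
    Nearest-neighbour upsampling turns a single peak into a flat plateau,
    so the centre of all maximal cells is returned (may be fractional)."""
    top = max(max(row) for row in score)
    hits = [(r, c) for r, row in enumerate(score)
            for c, v in enumerate(row) if v == top]
    row = sum(h[0] for h in hits) / len(hits)
    col = sum(h[1] for h in hits) / len(hits)
    return row, col


def locate_target(score, center_xy, up_factor=16, stride=8, scale=1.0):
    """Map the response peak to a new target centre (x, y) in image pixels.
    up_factor, stride, and scale are assumed values, not taken from the paper."""
    up = upsample_nearest(score, up_factor)
    peak_r, peak_c = find_peak(up)
    mid = (len(up) - 1) / 2.0            # centre of the upsampled map
    # Offset of the peak within the upsampled map.
    dy_up, dx_up = peak_r - mid, peak_c - mid
    # Offset of the upsampled map relative to the feature map.
    dy_feat, dx_feat = dy_up / up_factor, dx_up / up_factor
    # Offset of the feature map relative to the original image.
    dy_img, dx_img = dy_feat * stride * scale, dx_feat * stride * scale
    return center_xy[0] + dx_img, center_xy[1] + dy_img


# Usage: a response peak two cells to the right of the map centre.
score = [[0.0] * 17 for _ in range(17)]
score[8][10] = 1.0
print(locate_target(score, (100.0, 50.0)))  # (116.0, 50.0)
```

Under these assumptions, a shift of two feature-map cells corresponds to 2 × 8 = 16 image pixels. The `scale` argument is where the factor of the best-matching patch among the three scales would enter.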