Vision Transformer-Based Video Hashing Retrieval for Tracing the Source of Fake Videos

<div>Overview of the proposed networks. The method consists of two networks, ViTHash and localizator, both built from two basic modules: an upsampling module and a transformer block. ViTHash learns hash centers from video triplets, each comprising an original video and two randomly selected fake videos related to it. The trained hash centers are then used to trace the source of a fake video. The localizator analyzes the differences between the traced source video and the fake video, an analysis that is not affected by video quality or cropping. The regions in which the two videos differ are represented by generated masks.</div>
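
The tracing step described in the caption can be illustrated with a minimal sketch. It assumes that each hash center is a binary code and that a fake video is matched to the center with the smallest Hamming distance; the caption does not state the distance measure or code length, so both are assumptions. The names `binarize`, `hamming_distance`, and `trace_source`, as well as the example centers and features, are hypothetical and are not taken from the paper.

```python
from typing import Dict, Sequence, Tuple


def binarize(features: Sequence[float]) -> Tuple[int, ...]:
    """Turn real-valued network outputs into a binary code (assumed sign threshold)."""
    return tuple(1 if value > 0 else 0 for value in features)


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length binary codes differ."""
    if len(a) != len(b):
        raise ValueError("hash codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def trace_source(
    query_code: Sequence[int],
    hash_centers: Dict[str, Sequence[int]],
) -> Tuple[str, int]:
    """Return the id of the closest hash center and its distance to the query."""
    if not hash_centers:
        raise ValueError("at least one hash center is required")
    best_id = min(
        hash_centers,
        key=lambda video_id: hamming_distance(query_code, hash_centers[video_id]),
    )
    return best_id, hamming_distance(query_code, hash_centers[best_id])


# Hypothetical 8-bit hash centers, one per original video.
centers = {
    "original_a": (1, 0, 1, 1, 0, 0, 1, 0),
    "original_b": (0, 1, 0, 0, 1, 1, 0, 1),
}

# Hypothetical network output for a fake video derived from "original_a".
fake_features = [0.9, -0.2, 0.4, -0.1, -0.7, -0.3, 0.8, -0.5]
query = binarize(fake_features)

source_id, distance = trace_source(query, centers)
print(source_id, distance)
```

In this toy example the fake video's code differs from the center of `original_a` in a single bit, so that video is returned as the traced source. A real system would use longer codes and the centers learned by ViTHash.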

Security and Communication Networks

Figure 4: Vision Transformer-Based Video Hashing Retrieval for Tracing the Source of Fake Videos