Security and Communication Networks, 2023, Research Article
Vision Transformer-Based Video Hashing Retrieval for Tracing the Source of Fake Videos

Table 2: Evaluation of cross-dataset performance on five subsets of FaceForensics++ (FF++): DeepFake (DF), Face2Face (F2F), Faceswap (FS), NeuralTexture (NT), and FaceShifter (FSh).
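| Training set | Method | Test DF (ACC) | Test F2F (ACC) | Test FS (ACC) | Test NT (ACC) | Test FSh (ACC) |
|---|---|---|---|---|---|---|
| DF | Xception [54] | 99.3 | 73.6 | 49.0 | 73.6 | - |
| DF | HRNet [51] | 99.3 | 68.2 | 39.1 | 71.4 | - |
| DF | Face X-ray [52] | 98.7 | 63.3 | 60.0 | 69.8 | - |
| DF | ADD [1] | 98.7 | - | - | - | - |
| DF | Grad-CAM [53] | 99.2 | 76.4 | 49.7 | 81.4 | - |
| DF | Ours | 98.8 | 98.8 | 98.8 | 99.1 | 98.6 |
| F2F | Xception [54] | 80.3 | 99.4 | 76.2 | 69.6 | - |
| F2F | HRNet [51] | 83.6 | 99.5 | 56.6 | 61.3 | - |
| F2F | Face X-ray [52] | 63.0 | 98.4 | 93.8 | 94.5 | - |
| F2F | ADD [1] | - | 96.8 | - | - | - |
| F2F | Grad-CAM [53] | 83.7 | 99.4 | 98.7 | 98.4 | - |
| F2F | Ours | 99.2 | 99.4 | 99.2 | 99.2 | 99.2 |
| FS | Xception [54] | 66.4 | 88.8 | 99.4 | 71.3 | - |
| FS | HRNet [51] | 63.6 | 64.1 | 99.2 | 68.9 | - |
| FS | Face X-ray [52] | 45.8 | 96.1 | 98.1 | 95.7 | - |
| FS | ADD [1] | - | - | 97.9 | - | - |
| FS | Grad-CAM [53] | 68.5 | 99.3 | 99.5 | 98.0 | - |
| FS | Ours | 99.9 | 99.8 | 99.9 | 99.8 | 99.9 |
| NT | Xception [54] | 79.9 | 81.3 | 73.1 | 99.1 | - |
| NT | HRNet [51] | 94.1 | 87.3 | 64.1 | 98.6 | - |
| NT | Face X-ray [52] | 70.5 | 91.7 | 91.0 | 92.5 | - |
| NT | ADD [1] | - | - | - | 88.5 | - |
| NT | Grad-CAM [53] | 89.4 | 99.5 | 99.3 | 99.4 | - |
| NT | Ours | 99.3 | 99.2 | 99.3 | 99.3 | 99.3 |
| FSh | ADD [1] | - | - | - | - | 96.6 |
| FSh | Ours | 98.8 | 98.8 | 99.0 | 99.3 | 99.1 |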
We trained on one subset and tested on the other four subsets. Bold values represent the best results in the correlation domain.
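
The train-on-one, test-on-others protocol behind Table 2 can be sketched as a simple double loop. This is a minimal illustration, not the authors' evaluation code: `train_fn`, `predict_fn`, and the `data` layout are hypothetical placeholders, and ACC is assumed here to be the plain percentage of correct predictions, which may differ from how the paper scores hashing-based retrieval.

```python
from typing import Callable, Dict, Sequence, Tuple

# Subset names as abbreviated in the table caption.
SUBSETS = ("DF", "F2F", "FS", "NT", "FSh")

# Hypothetical layout: one (samples, labels) pair per subset.
Split = Tuple[Sequence, Sequence]


def accuracy(predictions: Sequence, labels: Sequence) -> float:
    """Percentage of predictions that equal the corresponding label."""
    if len(labels) == 0 or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and of equal length")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return 100.0 * correct / len(labels)


def cross_dataset_acc(
    train_fn: Callable[[Split], object],
    predict_fn: Callable[[object, Sequence], Sequence],
    data: Dict[str, Split],
    subsets: Sequence[str] = SUBSETS,
) -> Dict[str, Dict[str, float]]:
    """Train once per subset, then score that model on every subset.

    results[train][test] holds the accuracy of the model trained on
    `train` and evaluated on `test`; train == test is the in-domain case.
    """
    results: Dict[str, Dict[str, float]] = {}
    for train_name in subsets:
        model = train_fn(data[train_name])
        results[train_name] = {}
        for test_name in subsets:
            samples, labels = data[test_name]
            predictions = predict_fn(model, samples)
            results[train_name][test_name] = accuracy(predictions, labels)
    return results
```

Each row group of the table corresponds to one outer-loop iteration (a fixed training subset), and each column to one inner-loop iteration (a test subset).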