Learning to Discriminate Adversarial Examples by Sensitivity Inconsistency in IoHT Systems
Table 3
Detection performance of the three detection methods. The model, dataset, and attack method are the same in the training and testing phases. Because Deepwordbug is a character-level attack and FGWS is designed only for word-level attacks, detecting Deepwordbug with FGWS is not meaningful; “—” in the table indicates that the experiment was not conducted.
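| Model | Dataset | Attack | Recall (%): FGWS | Recall (%): WDR | Recall (%): SIFD | F1-score (%): FGWS | F1-score (%): WDR | F1-score (%): SIFD |
|---|---|---|---|---|---|---|---|---|
| BERT | AG’s news | TextFooler | 81.5 | 83.0 | **91.7** | 87.5 | 86.1 | **90.7** |
| BERT | AG’s news | PWWS | 85.1 | 87.9 | **91.9** | 89.7 | 90.5 | **92.2** |
| BERT | AG’s news | BAE | 49.7 | 80.0 | **86.7** | 57.2 | 81.2 | **84.5** |
| BERT | AG’s news | Deepwordbug | — | 75.4 | **85.0** | — | 78.3 | **85.6** |
| BERT | IMDB | TextFooler | 79.9 | 95.5 | **97.2** | 86.6 | 95.8 | **96.4** |
| BERT | IMDB | PWWS | 82.5 | 92.7 | **95.5** | 85.8 | 94.2 | **96.0** |
| BERT | IMDB | BAE | 56.7 | 90.3 | **96.2** | 67.8 | 93.1 | **96.3** |
| BERT | IMDB | Deepwordbug | — | 92.0 | **94.2** | — | 92.7 | **94.8** |
| CNN | AG’s news | TextFooler | 82.9 | 92.0 | **95.5** | 86.2 | 89.7 | **91.5** |
| CNN | AG’s news | PWWS | 86.8 | 91.0 | **94.0** | **91.2** | 86.0 | 90.6 |
| CNN | AG’s news | BAE | 56.7 | 88.2 | **92.4** | 62.1 | 85.5 | **88.5** |
| CNN | AG’s news | Deepwordbug | — | 91.0 | **92.4** | — | **86.3** | 84.9 |
| CNN | IMDB | TextFooler | 75.9 | 89.9 | **99.7** | 85.3 | 91.5 | **97.8** |
| CNN | IMDB | PWWS | 80.2 | 87.2 | **99.0** | 86.0 | 87.2 | **96.5** |
| CNN | IMDB | BAE | 59.8 | 88.9 | **98.2** | 70.1 | 87.1 | **96.5** |
| CNN | IMDB | Deepwordbug | — | 91.2 | **97.9** | — | 89.6 | **95.7** |
| LSTM | AG’s news | TextFooler | 86.2 | 91.3 | **96.2** | 90.1 | 87.8 | **91.2** |
| LSTM | AG’s news | PWWS | 84.7 | 84.6 | **94.5** | **90.4** | 86.8 | 88.5 |
| LSTM | AG’s news | BAE | 62.2 | 88.2 | **91.7** | 67.9 | 88.8 | **90.3** |
| LSTM | AG’s news | Deepwordbug | — | 83.4 | **88.6** | — | 83.3 | **84.1** |
| LSTM | IMDB | TextFooler | 77.4 | 94.8 | **97.8** | 83.8 | 95.0 | **95.4** |
| LSTM | IMDB | PWWS | 70.5 | **92.5** | 92.0 | 80.0 | 92.4 | **92.7** |
| LSTM | IMDB | BAE | 48.8 | 95.5 | **96.9** | 57.4 | 95.5 | **97.7** |
| LSTM | IMDB | Deepwordbug | — | 92.0 | **92.2** | — | **93.6** | 91.5 |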
Bold values indicate the best result among the three detection methods for each metric.