Journal of Healthcare Engineering

Research Article

Learning to Discriminate Adversarial Examples by Sensitivity Inconsistency in IoHT Systems

Table 3

Detection performance of the three detection methods (FGWS, WDR, and SIFD). The model, dataset, and attack method are the same in the training and testing phases. Because Deepwordbug is a character-level attack and FGWS is designed only for word-level attacks, detecting Deepwordbug with FGWS is not meaningful; “—” in the table indicates that the experiment was not conducted.


Model	Dataset	Attack	FGWS Recall (%)	WDR Recall (%)	SIFD Recall (%)	FGWS F1-score (%)	WDR F1-score (%)	SIFD F1-score (%)

BERT	AG’s news	TextFooler	81.5	83.0	91.7	87.5	86.1	90.7
		PWWS	85.1	87.9	91.9	89.7	90.5	92.2
		BAE	49.7	80.0	86.7	57.2	81.2	84.5
		Deepwordbug	—	75.4	85.0	—	78.3	85.6
	IMDB	TextFooler	79.9	95.5	97.2	86.6	95.8	96.4
		PWWS	82.5	92.7	95.5	85.8	94.2	96.0
		BAE	56.7	90.3	96.2	67.8	93.1	96.3
		Deepwordbug	—	92.0	94.2	—	92.7	94.8

CNN	AG’s news	TextFooler	82.9	92.0	95.5	86.2	89.7	91.5
		PWWS	86.8	91.0	94.0	91.2	86.0	90.6
		BAE	56.7	88.2	92.4	62.1	85.5	88.5
		Deepwordbug	—	91.0	92.4	—	86.3	84.9
	IMDB	TextFooler	75.9	89.9	99.7	85.3	91.5	97.8
		PWWS	80.2	87.2	99.0	86.0	87.2	96.5
		BAE	59.8	88.9	98.2	70.1	87.1	96.5
		Deepwordbug	—	91.2	97.9	—	89.6	95.7

LSTM	AG’s news	TextFooler	86.2	91.3	96.2	90.1	87.8	91.2
		PWWS	84.7	84.6	94.5	90.4	86.8	88.5
		BAE	62.2	88.2	91.7	67.9	88.8	90.3
		Deepwordbug	—	83.4	88.6	—	83.3	84.1
	IMDB	TextFooler	77.4	94.8	97.8	83.8	95.0	95.4
		PWWS	70.5	92.5	92.0	80.0	92.4	92.7
		BAE	48.8	95.5	96.9	57.4	95.5	97.7
		Deepwordbug	—	92.0	92.2	—	93.6	91.5

Bold values indicate the best result among the three detection methods.
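
The Recall and F1-score columns can be read against the standard definitions of these metrics for a binary detector. The following is a minimal sketch, assuming adversarial examples are treated as the positive class (an assumption; the table does not state this). The function name `recall_and_f1` and the counts in the usage lines are hypothetical and are not taken from the paper.

```python
from typing import Tuple


def recall_and_f1(true_positives: int,
                  false_positives: int,
                  false_negatives: int) -> Tuple[float, float]:
    """Return (recall, F1-score) in percent for a binary detector.

    Assumption: the positive class is "adversarial example".
    """
    flagged = true_positives + false_positives   # inputs the detector flagged
    actual = true_positives + false_negatives    # inputs that really are adversarial

    # Recall: share of adversarial inputs that were flagged.
    recall = true_positives / actual if actual else 0.0
    # Precision: share of flagged inputs that really are adversarial.
    precision = true_positives / flagged if flagged else 0.0

    # F1-score: harmonic mean of precision and recall.
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)

    return 100 * recall, 100 * f1


# Hypothetical counts for illustration only.
recall, f1 = recall_and_f1(true_positives=90, false_positives=10, false_negatives=10)
print(f"Recall: {recall:.1f}%  F1-score: {f1:.1f}%")
```

Because the F1-score combines recall with precision, a method can have the higher recall in a row and still the lower F1-score if it also flags more clean inputs.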