Review Article

A Survey on Adversarial Attack in the Age of Artificial Intelligence

Table 3

Text-based adversarial attacks.

Each entry lists the author, the proposed solution, its core ideas, and its shortcomings.

Ebrahimi et al. 2018 [26]
Solution: Javid Ebrahimi et al. propose HotFlip, a white-box method for generating adversarial examples that cause character-level classifiers to misclassify.
Core ideas: (1) Use beam search to find a set of operations (flip, insert, and delete) that fool the classifier. (2) Estimate the change in loss for each operation from its directional derivative, computed from the gradient with respect to the one-hot input representation.
Shortcomings: (1) The robustness of different character-level models is not evaluated across different tasks. (2) Context is not well considered.
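The gradient-based flip scoring in HotFlip's second step can be sketched as follows. This is an illustrative first-order approximation only, not the authors' code; the `flip_scores` helper and the toy shapes are assumptions.

```python
import numpy as np

def flip_scores(grad, x_onehot):
    """First-order estimate of the loss change for every character flip.

    grad:     (seq_len, vocab) gradient of the loss w.r.t. the one-hot input.
    x_onehot: (seq_len, vocab) one-hot encoding of the current characters.
    Flipping position i from character a to b changes the input by e_b - e_a,
    so the loss change is approximated by grad[i, b] - grad[i, a].
    """
    current = (grad * x_onehot).sum(axis=1, keepdims=True)  # grad[i, a]
    scores = grad - current                  # grad[i, b] - grad[i, a]
    scores[x_onehot.astype(bool)] = -np.inf  # mask the no-op "flip"
    return scores

# Toy example: 3 positions, a 4-character vocabulary.
rng = np.random.default_rng(0)
grad = rng.normal(size=(3, 4))
x = np.eye(4)[[0, 2, 1]]  # current characters at indices 0, 2, 1
pos, ch = np.unravel_index(np.argmax(flip_scores(grad, x)), (3, 4))
print(f"best flip: position {pos} -> character {ch}")
```

Beam search would then keep the top-scoring flips and score multi-operation sequences the same way.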

Gao et al. 2018 [28]
Solution: Ji Gao et al. introduce DeepWordBug, a new algorithm for perturbing text in black-box scenarios so that deep learning classifiers misclassify the input.
Core ideas: (1) Use a scoring function to measure each word's importance to the classification result and rank the words by score. (2) Apply a transformation algorithm to change the selected words.
Shortcomings: The paper does not discuss applying the algorithm in white-box scenarios.
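The two DeepWordBug steps, score then transform, can be sketched with a toy black-box classifier. The deletion-based scoring, the middle-character swap, and the `predict` stub are simplified assumptions, not the paper's exact scoring functions.

```python
def word_scores(words, predict):
    """Black-box importance of each word: the drop in the classifier's
    score when that word is deleted (larger drop = more important)."""
    base = predict(" ".join(words))
    return [base - predict(" ".join(words[:i] + words[i + 1:]))
            for i in range(len(words))]

def swap_transform(word):
    """One simple character-level transformation: swap the middle two chars."""
    if len(word) < 4:
        return word
    m = len(word) // 2
    return word[:m - 1] + word[m] + word[m - 1] + word[m + 1:]

def perturb(text, predict, k=1):
    """Transform the k highest-scoring words of the input."""
    words = text.split()
    scores = word_scores(words, predict)
    for i in sorted(range(len(words)), key=lambda i: -scores[i])[:k]:
        words[i] = swap_transform(words[i])
    return " ".join(words)

# Toy "classifier": fires only on the exact token "excellent".
predict = lambda t: 1.0 if "excellent" in t.split() else 0.0
print(perturb("the movie was excellent overall", predict))
# -> "the movie was exclelent overall"
```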

Cheng et al. 2018 [58]
Solution: The paper proposes the Seq2Sick framework for generating adversarial examples against sequence-to-sequence (seq2seq) models, focusing mainly on non-overlapping attacks and targeted-keyword attacks.
Core ideas: (1) A projected gradient method handles the discreteness of the input space. (2) Group lasso enhances the sparsity of the distortion. (3) A regularization technique improves the success rate.
Shortcomings: The success rate of the targeted one-keyword attack drops when the attacked model uses subword transformations.
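The projection idea in the first core point can be sketched as a single update: take a continuous gradient step in embedding space, then snap back to valid tokens. This is a minimal illustration of the projection step only; the group-lasso term, the full loss, and the toy 2-D embeddings are omissions and assumptions of this sketch.

```python
import numpy as np

def projected_gradient_step(z, grad, emb_table, lr=1.0):
    """Move the relaxed input embeddings z along the loss gradient, then
    project every position onto the nearest row of the embedding table so
    the result still maps to real tokens."""
    z = z + lr * grad  # continuous ascent step in embedding space
    dist = ((z[:, None, :] - emb_table[None, :, :]) ** 2).sum(-1)
    ids = dist.argmin(axis=1)  # nearest valid embedding per position
    return emb_table[ids], ids

# Toy 2-D embedding table for a 3-token vocabulary.
emb = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
z = emb[[0, 1]]                            # current tokens: 0, 1
grad = np.array([[0.0, 1.0], [0.0, 0.0]])  # push position 0 toward token 2
_, ids = projected_gradient_step(z, grad, emb)
print(ids)  # token ids after projection
```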

Li et al. 2019 [30]
Solution: The paper proposes TextBugger, a general attack framework for generating adversarial text.
Core ideas: (1) White box: find the most important words via the Jacobian matrix, generate five types of bugs, and pick the best bug by confidence. (2) Black box: first find the most important sentences, then use a scoring function to find the most important words. (3) Evaluation: sentiment analysis and harmful content detection.
Shortcomings: (1) Only non-targeted attacks are performed; targeted attacks are not considered. (2) Integrating language-aware or structure-aware defenses to improve robustness could be explored further.
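The bug-generation step can be sketched by producing several candidate perturbations of one word. The character substitutions in `SUB_C`, the random positions, and the omission of the word-level substitution (which needs an embedding space) are all simplifications of this sketch, not the paper's exact operations.

```python
import random

# Visually similar character substitutions (a small assumed subset).
SUB_C = {"o": "0", "l": "1", "a": "@", "i": "1", "e": "3"}

def generate_bugs(word, rng=None):
    """Candidate bugs for one word, in the spirit of TextBugger's
    character-level operations: insert a space, delete a character,
    swap adjacent characters, and substitute a similar character."""
    rng = rng or random.Random(0)
    bugs = []
    if len(word) > 3:
        mid = rng.randrange(1, len(word) - 1)
        bugs.append(word[:mid] + " " + word[mid:])                # insert
        bugs.append(word[:mid] + word[mid + 1:])                  # delete
        bugs.append(word[:mid] + word[mid + 1] + word[mid]
                    + word[mid + 2:])                             # swap
    for i, c in enumerate(word):
        if c in SUB_C:
            bugs.append(word[:i] + SUB_C[c] + word[i + 1:])       # sub-C
            break
    return bugs

print(generate_bugs("foolish"))
```

The attack would then keep whichever candidate most reduces the classifier's confidence while staying visually close to the original.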

Zhang et al. 2019 [59]
Solution: The paper proposes Metropolis-Hastings sampling attack (MHA) to generate adversarial examples for natural language.
Core ideas: (1) Black-box MHA: traverse the word indices to propose conversion operations and select the most likely words according to a score. (2) White-box MHA: differs from the black-box attack in the preselection step, which introduces gradients into the score calculation.
Shortcomings: (1) It may produce incomplete sentences. (2) Unrestricted entity and verb substitution also degrades adversarial example generation for some tasks (such as NLI).
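The sampling loop behind MHA can be sketched as a generic Metropolis-Hastings chain over word substitutions. The `score` function, candidate lists, and symmetric single-word proposal are toy assumptions; the actual method combines a language-model fluency term with the attack objective and also proposes insertions and deletions.

```python
import random

def mh_attack(words, score, candidates, steps=300, rng=None):
    """Metropolis-Hastings word substitution.

    score(words) is an unnormalized stationary probability: high when the
    sentence is fluent AND fools the (toy) classifier. The proposal swaps
    one word for a candidate; since this proposal is symmetric, a move is
    accepted with probability min(1, score(new) / score(old))."""
    rng = rng or random.Random(0)
    cur, cur_s = list(words), score(words)
    for _ in range(steps):
        i = rng.randrange(len(cur))
        prop = list(cur)
        prop[i] = rng.choice(candidates[i])
        s = score(prop)
        if rng.random() < min(1.0, s / max(cur_s, 1e-12)):
            cur, cur_s = prop, s
    return cur

# Toy target: the sampler should settle near the highest-scoring variant.
cands = [["good", "fine", "great"], ["movie", "film"]]
score = lambda w: 1.0 if w == ["fine", "film"] else 0.01
print(mh_attack(["good", "movie"], score, cands))
```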

Zang et al. 2020 [60]
Solution: The paper proposes a new black-box attack model that treats word-level adversarial attack as a combinatorial optimization problem.
Core ideas: (1) A word-substitution method based on sememes, the minimum semantic units, reduces the search space. (2) A search algorithm based on particle swarm optimization (PSO) searches for adversarial examples.
Shortcomings: Improving robustness and using sememes in defense models need further study.
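The PSO search in the second core point can be sketched over a tiny substitution space. The fixed pull/mutation probabilities, the `fitness` stub, and the candidate lists are assumptions of this sketch; the actual method builds candidates from sememe annotations and adapts its update probabilities over time.

```python
import random

def pso_search(candidates, fitness, n_particles=8, steps=30, rng=None):
    """Discrete PSO sketch: each particle picks one substitute per position;
    every dimension is pulled toward the personal or global best with fixed
    probabilities, or randomly mutated."""
    rng = rng or random.Random(0)
    dims = len(candidates)
    parts = [[rng.randrange(len(candidates[d])) for d in range(dims)]
             for _ in range(n_particles)]
    pbest = [p[:] for p in parts]
    pfit = [fitness([candidates[d][p[d]] for d in range(dims)]) for p in parts]
    g = max(range(n_particles), key=lambda k: pfit[k])
    gbest, gfit = pbest[g][:], pfit[g]
    for _ in range(steps):
        for k, p in enumerate(parts):
            for d in range(dims):
                r = rng.random()
                if r < 0.4:
                    p[d] = pbest[k][d]                        # personal best
                elif r < 0.8:
                    p[d] = gbest[d]                           # global best
                elif r < 0.9:
                    p[d] = rng.randrange(len(candidates[d]))  # mutation
            f = fitness([candidates[d][p[d]] for d in range(dims)])
            if f > pfit[k]:
                pbest[k], pfit[k] = p[:], f
                if f > gfit:
                    gbest, gfit = p[:], f
    return [candidates[d][gbest[d]] for d in range(dims)]

# Toy fitness: how many positions match a hypothetical adversarial target.
cands = [["good", "fine"], ["movie", "film"]]
target = ["fine", "film"]
fit = lambda w: sum(a == b for a, b in zip(w, target))
print(pso_search(cands, fit))
```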