With the rapid evolution of the Internet, artificial intelligence is being applied ever more widely, and the era of AI has arrived. At the same time, adversarial attacks against AI systems have become frequent, making research into adversarial attack security extremely urgent, and an increasing number of researchers are working in this field. We provide a comprehensive review of the theories and methods that enable researchers to enter the field of adversarial attacks. The article follows a "Why? → What? → How?" line of exposition. First, we explain the significance of adversarial attacks. Then, we introduce their concepts, types, and hazards. Finally, we review the typical attack algorithms and defense techniques in each application area. Facing increasingly complex neural network models, this paper focuses on the image, text, and malicious code domains and on the classifications and methods of adversarial attacks for these three data types, so that researchers can quickly find work relevant to their own area of study. At the end of this review, we also raise some discussions and open issues and compare our survey with other similar reviews.

1. Introduction

In the age of the Internet, with the accumulation of large amounts of data, the evolution of computing power, and the constant innovation of machine learning methods and frameworks, artificial intelligence (AI) technologies such as image recognition, machine translation, and autonomous driving have been widely deployed and applied worldwide [1]. Artificial intelligence is marching toward historic moments for mankind. At the same time, machine learning algorithms also have a significant impact on research in the traditional computer security field [2]. In addition to using machine learning (ML) to build malicious-behavior detection and attack identification systems, hackers may also use ML to achieve more accurate attacks. Recent studies have shown that many application fields, from computer vision to network security, are vulnerable to adversarial attack threats [3, 4].

Szegedy et al. [5] first proposed the concept of the adversarial sample—an intriguing weakness of neural networks. That paper sparked widespread interest in adversarial attacks among researchers, and the number of adversarial attacks is likely to keep growing as the economic incentives increase. Liu et al. [6], building on gradient descent and L-norm optimization methods, first presented a method for subjecting a visualization-based malware detector to adversarial example attacks. Zhou et al. [7] were the first to propose a substitute-model approach for training adversarial attacks without real data.

In order to provide a comprehensive reference for researchers studying adversarial attacks, this paper classifies and compares adversarial attack algorithms in the image, text, and malware fields. Moreover, the concepts, types, harms, and evolution trends of adversarial attacks, together with the defense technologies commonly used in recent years, are reviewed systematically, and future research directions are discussed [8, 9]. This paper aims to provide theoretical and methodological support for research on adversarial attack security by extensively surveying the literature on adversarial attacks and robustness [10, 11].

To sum up, the main contributions are as follows: (1) Present adversarial attacks according to the idea of "Why? → What? → How?", so that researchers can quickly and efficiently build an awareness of adversarial attacks. (2) Portray a roadmap for carrying out adversarial attack research, which can help researchers quickly and efficiently enter the domain of adversarial attack security research. (3) Provide the most up-to-date and comprehensive references, based on articles published in authoritative journals and at top academic conferences after 2010, to ensure the timeliness of the review. (4) Introduce the technical core and the deficiencies of each typical method, so that researchers can find referable methods as needed. (5) Classify the literature on adversarial attacks from different perspectives, so that researchers can quickly find an entry point for adversarial attack research.

The structure of the article is as follows. In "Why Study Adversarial Attack," we first explain why we should study adversarial attacks. In "Adversarial Attack," we describe the concepts, classifications, hazards, and methods associated with adversarial attacks; a separate part of that section categorizes and compares the adversarial attack methods for images, text, and malware. In "Defense Methods," we discuss defense methods. Then, in "Additional Complements," the additional background needed for adversarial attack research is presented. After that, we put forward a broader prospect of research directions in "Discussion." In "Comparison with Other Similar Reviews," the differences between this paper and other similar reviews are discussed. Finally, we conclude in "Conclusions."

2. Why Study Adversarial Attack

Choosing a direction is the first step in conducting research. The significance of adversarial attack security research is reflected in the following aspects: (1) Adversarial attacks have become a serious threat to current AI systems. In the era of the Internet of Everything, the network has become an ideal target for cyber attackers, and vulnerabilities in artificial intelligence are often exploited to launch cyber-attacks. In 2018, a fake video of Obama railing against Trump went viral across the United States, and some criminals fabricate information to manipulate elections. With the frequent occurrence of such fraud incidents, governments and organizations of various countries have formulated and improved relevant laws and regulations. In the United States, for example, the Deepfake Report Act was introduced in June 2019 and passed unanimously in the Senate as a separate bill only 120 days later. Despite the relentless efforts of the machine learning and AI communities to erect protective fences, the number of adversarial attacks is climbing significantly, and the threat they pose continues to grow. Only by continuously strengthening the protective barrier of deep learning models can a secure network security environment be built [12]. (2) The AI race can promote adversarial attack security research. Attack and defense in adversarial machine learning are a process of iterative evolution. Adversarial attack creators keep probing new vulnerabilities, devising new algorithms, and seeking new threats, while defenders keep analyzing the features of new adversarial attacks and employing new methods to ensure efficient and effective defense against them. Consequently, choosing adversarial attack security as a research direction not only keeps the study at the forefront of AI security but also gives researchers continuous motivation in the process of research.

3. Adversarial Attack

3.1. Concepts

In this subsection, some common terms used in the literature related to adversarial attacks are presented.

3.1.1. Adversarial Example

An adversarial example is an artificially constructed example that causes a machine learning model to misjudge by adding subtle perturbations to an original example, while remaining correctly perceived by human eyes [13].

3.1.2. White-Box Attack

White-box attacks assume that the attacker can fully obtain the structure of the target model, including its composition and the parameters of each layer, and can fully control the model's input [14].

3.1.3. Black-Box Attack

In black-box attacks, the attacker has no knowledge of the internal structure of the model; it can only control the input and craft the next attack by comparing inputs with the observed outputs [15].

3.1.4. Real-World Attack/Physical Attack

In real-world attacks/physical attacks, the attacker does not know the structure of the model and has only weak control over the input [16].

3.1.5. Targeted Attack

Targeted attacks set a target label before the attack, causing the model to predict that specific label for the adversarial examples; that is, the effect of the attack is predetermined [17].

3.1.6. Untargeted Attack

Untargeted attacks need not set a target before the attack; it suffices that the model's prediction is wrong after the attack [18].

3.1.7. Evasion Attack

Evasion attacks add perturbations to test samples, modifying the input during the test phase so as to evade or deceive the model's detection and prevent the AI model from identifying the input correctly [19].

3.1.8. Poisoning Attack

Poisoning attacks add carefully constructed malicious examples during the model training phase, so that the trained model contains a backdoor or vulnerability that attackers can later exploit [20].

3.1.9. Backdoor Attack

A backdoor attack allows the attacker to enter a system without identity authentication, that is, to bypass security controls to gain access to the system and even further damage the computer or network. The neural network backdoor attack referred to in this paper is one in which the attacker produces a backdoored model by implanting specific neurons in the neural network, so that the model's judgments on normal inputs remain consistent with the original model, while its judgments on special inputs are controlled by the attacker [21]. A backdoor attack is a type of poisoning attack [22].

3.1.10. Detection Model

A detection model judges, by means of a detection component, whether the examples to be classified are adversarial examples. Different detection models may use different criteria to decide whether an input is adversarial [23].

3.2. Classifications

In this section, we describe the main methods for generating adversarial examples in the image, text, and malware domains and review the work of some researchers. Adversarial machine learning is widely used in the image field, where research is already quite comprehensive; the malicious code and text domains are similar and can borrow from each other. Figure 1 shows the classification of image-based adversarial attacks: according to the attacker's knowledge, attacks can be divided into black-box, white-box, and physical attacks. Figure 2 shows the classification of text-based adversarial attacks: according to the access permissions on the model, attacks can be divided into black-box and white-box attacks; according to the effect after the attack, into targeted and untargeted attacks; according to the text granularity, into character level, word level, and sentence level; and according to the attack strategy, into image-to-text, importance-based, optimization-based, and neural network-based approaches. Figure 3 shows the classification of malware-based adversarial attacks: according to the object, they can be divided into attacks on training data and attacks on the neural network model.

3.3. Hazards

The main harms of adversarial attacks include the following: (1) Model loss or theft: in the current era of the Internet of Everything, network information data has become the most precious intangible asset. Plenty of adversarial attacks are designed to steal secret data, such as stealing private data from computers or servers and then deceiving or blackmailing victims. More malicious attacks target enterprises, stealing valuable trade secrets for economic gain, or even target a country, stealing national security-related intelligence from government departments for strategic purposes. (2) Model failure: an adversarial attack causes a deep learning model to fail and stop working properly by exploiting the model's vulnerability through physical attack. For instance, in the field of autonomous driving, superimposing subtle perturbations on image data, which are difficult for human senses to recognize, leads the machine learning model to make wrong classification decisions and causes traffic accidents. (3) Data poisoning: a typical scenario is the recommendation system. When hackers add abnormal data to the training samples of a deep learning model, the most direct impact is a classification error of the model, i.e., the recommendation system produces erroneous results. (4) Other hidden hazards: apart from the obvious harms mentioned above, adversarial attacks can also cause hidden ones. For instance, much malware generates adversarial examples through certain algorithms; the malicious behavior does not change, but the antivirus system's detection fails, eventually leading to malicious attacks on computers and networks as well as losses of profits.

3.4. Methods

The typical attack algorithms of each application domain are shown in Table 1.

3.4.1. Image-Based Adversarial Attack

Adversarial attacks are a major threat to computer vision. CCS 2020 included four papers on adversarial machine learning, all based on image research: three concern robustness and defense mechanisms, and only one concerns adversarial attack methods. We have compared and summarized several papers from CCS, NIPS, USENIX, ECML PKDD, and JNCA from 2016 to 2020, as shown in Table 2.

Through reading the papers, it can be found that most current research in the image field optimizes and improves on previous work, and attack types are generally divided into white-box, black-box, and physical attacks [12].

3.4.2. Text-Based Adversarial Attack

Studies have shown that DNN models are susceptible to adversarial samples that induce false predictions by adding imperceptible perturbations to normal inputs. The study of adversarial samples is abundant in the image field but still insufficient in the text field. Table 3 summarizes some relevant papers on adversarial machine learning in the text field in recent years.

By reading the literature, text-domain adversarial samples can be divided into character level, word level, and sentence level. Attack types can be classified into white-box and black-box attacks according to model access rights, and into targeted and untargeted attacks according to the effect after the attack. Attack strategies in the text domain generally include borrowing algorithms from the image field, transforming the task into an optimization problem, using gradients or designing scoring functions, and training neural network models to automatically generate adversarial samples [13].

3.4.3. Malware-Based Adversarial Attack

In recent years, machine learning has been used to detect malware, and malware developers have a strong incentive to attack the detection model. Table 4 summarizes some of the papers related to adversarial attacks in the field of malicious code in recent years.

Through literature reading, it can be found that attack types are generally divided into attacks on training data and attacks on neural network models. According to the attack scenario, they can be divided into white-box, gray-box, and black-box attacks. The methods adopted include GAN-based, gradient-based, and heuristic algorithms [14].

4. Defense Methods

At present, relevant studies have explored the security of AI itself, to ensure the integrity and confidentiality of AI models and data so that attackers cannot easily alter judgment results or leak data in different scenarios. Hendrycks and Gimpel [64] proposed three methods for detecting adversarial images. Zhang et al. [65] proposed a hardware-assisted randomization method to defend against various adversarial attacks. AI attacks are currently divided into evasion attacks, poisoning attacks, backdoor attacks, and model-stealing attacks; the backdoor attack is a kind of poisoning attack. Table 5 lists the various defense techniques of an AI system during the data collection, model training, and model use phases. Furthermore, we need to continue to study AI interpretability, enhance our understanding of how ML works, and build institutional defense measures to build AI security platforms [66].

4.1. Defense Methods of Evasive Attack
4.1.1. Adversarial Training

Adversarial training is an important way to enhance the robustness of neural network models. The basic principle of this technique is to use known attack techniques to generate adversarial samples in the model training stage. The adversarial samples are then added to the model's training set, and the model is iteratively retrained until a new model that can resist the perturbations is produced. At the same time, since synthesizing multiple types of adversarial samples enlarges the training set, this technique can not only enhance the robustness of the newly generated model but also improve its accuracy and generalization [67].
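
As a minimal illustration of this retraining loop, the sketch below adversarially trains a toy logistic-regression "model" with FGSM-style perturbations; the model, data layout, and hyperparameters are hypothetical stand-ins for a real DNN training pipeline:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    # probability of the positive class under a logistic-regression "model"
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def fgsm(w, b, x, y, eps):
    # For the logistic loss, the input gradient is (p - y) * w; FGSM moves
    # each feature by eps in the direction of the gradient's sign.
    p = predict(w, b, x)
    g = [(p - y) * wi for wi in w]
    return [xi + eps * (1 if gi > 0 else -1 if gi < 0 else 0)
            for xi, gi in zip(x, g)]

def adversarial_train(data, epochs=300, lr=0.5, eps=0.1):
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        # augment each clean batch with FGSM examples from the current model
        batch = data + [(fgsm(w, b, x, y, eps), y) for x, y in data]
        for x, y in batch:
            g = predict(w, b, x) - y  # gradient of the logistic loss w.r.t. z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b
```

In a real setting, the same retrain-on-generated-examples loop is run with a deep network and a stronger attack (e.g., PGD) in place of this toy model.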

Ju et al. [68] propose E-ABS (analysis-by-synthesis), which extends the ABS robust classification model to more complex image domains. The core contents include the following: (1) generative model: an Adversarial Autoencoder (AAE) is used to evaluate the class-conditional probability; (2) discriminative loss: a discriminative loss is used during training to expose the conditional generative model to unevenly distributed samples; (3) variational inference: the ABS-like model estimates the likelihood of each category via maximum likelihood estimation; and (4) a lower bound on the robustness of E-ABS: the lower bound on the distance to the nearest adversarial example is derived using the ABS model. Nevertheless, the generative model is sensitive to the image similarity metric, and the inference and operation efficiency of ABS-like models on large datasets is low.

Chen et al. [69] were the first to evaluate and train verifiably robust properties of PDF malware classifiers. They proposed a new distance metric on the PDF tree structure to measure robustness, namely the number of different depth-1 subtrees between two PDF trees. Furthermore, their experimental results show that state-of-the-art and new adaptive evolutionary attackers need 10 times the L0 feature distance and 21 times as many PDF operations to evade the robust model. However, the tradeoff between multiple robustness attributes and training costs needs further study.

4.1.2. Network Distillation

Distillation is a method of compacting the knowledge of a large network into a smaller network. ("Specialist models" means that, for a large network, multiple specialized networks can be trained to improve the large network's performance.) In practice, distillation generally first trains a large model (the teacher network) and then "heats" it: the output of the large model serves as a soft target, the real data label serves as a hard target, and the two are combined to train the small model (the student network) [70].
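
The combined objective can be sketched as follows (a self-contained Python illustration; the temperature T, weight alpha, and logit values are arbitrary choices, not values from [70]):

```python
import math

def softmax(logits, T=1.0):
    # temperature T > 1 softens the distribution, exposing "dark knowledge"
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, pred):
    return -sum(t * math.log(max(p, 1e-12)) for t, p in zip(target, pred))

def distillation_loss(student_logits, teacher_logits, hard_label, T=4.0, alpha=0.7):
    """Weighted sum of the soft-target loss (against the heated teacher
    output) and the hard-target loss (against the true label)."""
    soft = cross_entropy(softmax(teacher_logits, T), softmax(student_logits, T))
    onehot = [1.0 if i == hard_label else 0.0 for i in range(len(student_logits))]
    hard = cross_entropy(onehot, softmax(student_logits))
    return alpha * soft + (1 - alpha) * hard
```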

The basic principle of the network distillation technique is to chain multiple DNNs in the model training stage, where the classification results produced by the former DNN are used to train the latter. Papernot et al. [71] found that the transferred knowledge can reduce the model's sensitivity to subtle perturbations to some extent and improve the robustness of the AI model. They therefore proposed network distillation as a defense against evasion attacks and tested it on the MNIST and CIFAR-10 datasets, finding that network distillation can reduce the success rate of specific attacks (such as JSMA and FGSM).

4.1.3. Adversarial Example Detection

The principle of adversarial example detection is to determine, during the model's usage stage, whether an example to be judged is adversarial, via a detection component that is either external to or built into the original model. Before an input example reaches the original model, the detection model determines whether it is an adversarial example. The detection model can also extract relevant information from each layer of the original model and synthesize it to carry out detection. Different detection models may use different criteria to decide whether an input is adversarial [72].
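
One simple criterion of this kind, in the spirit of feature squeezing, compares the model's prediction on the raw input with its prediction on a "squeezed" copy; the toy model and threshold below are purely illustrative:

```python
def l1_distance(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def detect_adversarial(model, squeeze, x, threshold=0.5):
    """Flag x as adversarial when the model's output distributions on the raw
    input and on a squeezed (e.g., quantized) copy disagree too much."""
    return l1_distance(model(x), model(squeeze(x))) > threshold
```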

Shumailov et al. [73] propose a new provable adversarial example detection protocol, the Certifiable Taboo Trap (CTT), which extends the Taboo Trap method. Three different CTT modes (CTT-lite, CTT-loose, and CTT-strict) are also discussed. Rather than defending against one specific type of adversarial example, the scheme aims at a more flexible and universal defense mechanism.

4.1.4. Input Reconstruction

The principle of input reconstruction is that input samples are transformed during the model's use stage to resist evasion attacks, and the transformed data does not affect the model's normal classification function. Reconstruction methods include noising, denoising, preprocessing, gradient masking, and using an autoencoder to change the input examples [74]. We note that this transformation of samples bears some similarity to the evolution of malicious code. Interestingly, 8 papers presented at the ICLR 2018 conference used gradient masking, but researchers quickly cracked 7 of them.
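
A common concrete transformation is bit-depth reduction (a feature-squeezing preprocessing step): inputs are quantized so that perturbations smaller than the quantization step are rounded away. A minimal sketch, with an arbitrary default bit depth:

```python
def reduce_bit_depth(x, bits=3):
    """Quantize each feature in [0, 1] to 2**bits levels before it reaches
    the model; sub-step adversarial noise is simply rounded away."""
    levels = 2 ** bits - 1
    return [round(xi * levels) / levels for xi in x]
```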

4.1.5. DNN Verification

Similar to software verification and analysis technology, DNN verification uses solvers to verify various properties of DNN models, such as verifying that no adversarial example exists within a specific perturbation range. Nevertheless, verifying a DNN model is usually an NP-complete problem, and solver efficiency is low. Through selection and optimization, such as prioritizing which model nodes to validate, sharing validation information, and validating by region, the efficiency of DNN verification can be further improved [75].

4.1.6. Data Augmentation

In reality, we often encounter the situation of insufficient data. The essence of data augmentation is to expand the original training set with generated adversarial samples when massive data is lacking, so as to ensure effective training of the model [76].
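
A minimal sketch of the idea (with random perturbations standing in for properly generated adversarial samples, and arbitrary parameter values):

```python
import random

def augment(dataset, copies=2, noise=0.05, seed=0):
    """Expand a training set of (features, label) pairs by appending
    randomly perturbed copies of each example, keeping the label."""
    rng = random.Random(seed)
    out = list(dataset)
    for x, y in dataset:
        for _ in range(copies):
            out.append(([xi + rng.uniform(-noise, noise) for xi in x], y))
    return out
```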

4.2. Defense Methods of Poisoning Attack
4.2.1. Training Data Filtering

This technique focuses on control of the training dataset and uses detection and purification methods to prevent a poisoning attack from affecting the model. Specific directions include the following [77]: finding possible poisoning attack data points according to the label characteristics of the data and filtering out these attack points during retraining; and using model-contrast filtering to reduce the sampled data that a poisoning attack could exploit, using the filtered data to defend against poisoning.
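
One label-based filter of this kind drops points whose label disagrees with the majority label of their nearest neighbours; this is a generic sketch of the idea, not the specific method of [77]:

```python
def filter_suspect_points(dataset, k=3):
    """Keep a (features, label) pair only when its label matches the
    majority label among its k nearest neighbours."""
    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    kept = []
    for i, (x, y) in enumerate(dataset):
        neighbours = sorted(
            (sqdist(x, xj), yj) for j, (xj, yj) in enumerate(dataset) if j != i
        )[:k]
        agree = sum(1 for _, yj in neighbours if yj == y)
        if 2 * agree >= len(neighbours):  # label matches the local majority
            kept.append((x, y))
    return kept
```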

4.2.2. Regression Analysis

This technique is based on statistical methods that detect noise and outliers in datasets. Specific methods include defining different loss functions for the model to check outliers, using the distribution characteristics of data for detection [78], etc.
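
For instance, a least-squares fit can flag poisoned points through their residuals; the threshold rule below is an arbitrary illustrative choice:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

def residual_outliers(xs, ys, factor=3.0):
    """Return indices of points whose absolute residual exceeds
    `factor` times the mean absolute residual."""
    a, b = fit_line(xs, ys)
    res = [abs(y - (a * x + b)) for x, y in zip(xs, ys)]
    mean_res = sum(res) / len(res)
    return [i for i, r in enumerate(res) if r > factor * mean_res]
```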

4.2.3. Ensemble Learning

Ensemble learning builds and combines multiple machine learning classifiers to improve the machine learning system's ability to resist poisoning attacks. Multiple independent models jointly constitute the AI system, and because the models are trained on different datasets, the possibility that the whole system is affected by a poisoning attack is further reduced [79].
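
The voting step can be sketched in a few lines (the classifiers here are arbitrary callables):

```python
from collections import Counter

def ensemble_predict(classifiers, x):
    """Majority vote over independently trained classifiers; a poisoned
    minority of models cannot flip the joint decision."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]
```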

Liao et al. [80] design adaptive attack methods to study the effectiveness of transformation-based ensemble defenses for image classification and the reasons behind it. They propose two adaptive attacks to evaluate the ensemble robustness of reversible transformations: TAA (transferability-based adaptive attack) and PAA (perturbation aggregation attack). Moreover, an ensemble evaluation method is used to assess the ensemble robustness of irreversible transformations. However, the experimental results show that transformation-based ensemble defense is not sufficient to resist adversarial samples; the defense method is not reliable, and further efforts are needed.

4.2.4. Iterative Retraining

Iterative retraining refers to the iterative training of neural networks. Adversarial examples are generated according to any attack model and added to training data. Then, the neural network model is attacked, and the process is repeated [81].

4.3. Defense Methods of Back Door Attack
4.3.1. Input Preprocessing

The purpose of this method is to filter inputs that could trigger the backdoor, reducing the risk that an input triggers the backdoor and changes the model's judgment [82]. Data can be divided into discrete and continuous types. Image data is continuous and easy to encode as a numerical vector; its preprocessing operations are linear and differentiable, with many available methods such as mean standardization. Text data is discrete and symbolic; its preprocessing is nonlinear and nondifferentiable, and one-hot encoding is generally used.
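
The one-hot step mentioned above can be sketched as follows (the vocabulary and tokens are illustrative):

```python
def one_hot(tokens, vocab):
    """Map each discrete token to a one-hot vector over a fixed vocabulary."""
    index = {w: i for i, w in enumerate(vocab)}
    return [[1 if index[t] == i else 0 for i in range(len(vocab))]
            for t in tokens]
```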

4.3.2. Model Pruning

Pruning in neural networks is inspired by synaptic pruning in the human brain. Synaptic pruning, the complete decline and death of axons and dendrites, is a synapse elimination process that occurs between childhood and the onset of puberty in many mammals, including humans. The principle of model pruning is to prune away neurons of the original model appropriately, reducing the possibility that backdoor neurons activate while keeping normal functionality consistent. Using a fine-grained pruning method [83], the neurons that make up the backdoor can be removed and the backdoor attack prevented.
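
A coarse sketch of activation-based pruning: rank hidden units by their mean activation on clean data and zero out the weights of the least active ones, where dormant backdoor neurons tend to hide. The data layout and keep ratio are illustrative assumptions:

```python
def prune_low_activation(weights, activations, keep_ratio=0.75):
    """weights[i] holds the outgoing weights of hidden unit i, and
    activations[i] its mean activation on clean inputs; the least
    active units are zeroed out."""
    n_keep = max(1, int(len(activations) * keep_ratio))
    ranked = sorted(range(len(activations)),
                    key=lambda i: activations[i], reverse=True)
    keep = set(ranked[:n_keep])
    return [row if i in keep else [0.0] * len(row)
            for i, row in enumerate(weights)]
```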

4.4. Defense Methods of Model-Stealing Attack
4.4.1. PATE

The basic principle of PATE is to divide the training data into multiple datasets without intersection in the model training stage, and each set is used to train an independent DNN model (called teacher model). These independent DNN models are then used to vote together to train a student model [84]. This technology ensures that the judgment of the student model will not disclose the information of a particular training data, so as to ensure the privacy of the training data.
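
The aggregation step can be sketched as a noisy-max vote (a simplified illustration; a full PATE implementation also accounts for the privacy budget):

```python
import random

def pate_aggregate(teacher_preds, n_classes, noise_scale=1.0, seed=0):
    """Count the teachers' votes per class, add Laplace noise to each
    count, and return the arg-max; the student trains on this noisy label."""
    rng = random.Random(seed)
    counts = [0] * n_classes
    for p in teacher_preds:
        counts[p] += 1
    # a Laplace(scale) sample is scale times the difference of two Exp(1) draws
    noisy = [c + noise_scale * (rng.expovariate(1.0) - rng.expovariate(1.0))
             for c in counts]
    return max(range(n_classes), key=lambda c: noisy[c])
```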

4.4.2. Differential Privacy

In the model training stage, this method adds noise conforming to differential privacy to the data or to the model training steps [85]. Differential privacy adds noise to sensitive data, or to calculations performed on it, to ensure privacy without unduly reducing the utility of the data. For example, Giraldo et al. [86] were the first to address the adversarial classification problem in a system that uses differential privacy to protect users' privacy. They find an optimal fake-data injection attack that reduces the system's ability to detect anomalies while allowing the attacker to remain undetected by "hiding" the fake data in the differential privacy noise, and they design an optimal defense that minimizes the impact of such attacks. They also show how the optimal DP-BDD (differential privacy–bad-data detection) algorithm achieves a Nash equilibrium between attackers trying to find the optimal attack distribution and defenders trying to design the optimal bad-data detection algorithm. Yet the trade-offs between security, privacy, and practicality need further exploration.
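
The classic building block here is the Laplace mechanism for a numeric query; a minimal sketch, where epsilon is the deployer's privacy budget:

```python
import random

def laplace_mechanism(true_value, sensitivity, epsilon, seed=None):
    """Release true_value plus Laplace(sensitivity / epsilon) noise, the
    standard epsilon-differentially-private release of a numeric query."""
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    # a Laplace(scale) sample is scale times the difference of two Exp(1) draws
    return true_value + scale * (rng.expovariate(1.0) - rng.expovariate(1.0))
```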

4.4.3. Model Watermarking

The technique is to embed special labels in the original model during the model training stage. If a similar model is found, a special input sample can be used to identify whether the similar model was obtained by stealing the original model. Adi et al. [87] proposed a black-box deep neural network watermarking method. The robustness of the watermarking algorithm is evaluated under black-box and gray-box attacks.

In addition to the defensive measures mentioned above, there are other ways to improve a model's robustness against attacks, such as data compression, data randomization, data secondary sampling, outlier removal, model regularization, deep contractive networks, biologically inspired protection [32], attention mechanisms [88], GAN-based methods, MagNet [89], and the high-level representation guided denoiser (HGD) [90].

Each of the above defense techniques has specific application scenarios and cannot completely defend against all adversarial attacks. Integrating the above defense technologies in parallel or in series may yield a better defense effect; for instance, data augmentation is flexible enough to easily plug into other defense mechanisms [91].

5. Additional Complements for Adversarial Machine Learning Research

Apart from focusing on the research methods of adversarial attack and defense, we also need to know other sides of this field.

5.1. The Choice of Dataset

By analyzing the already published articles, there are three types of datasets used in the adversarial machine learning research community currently. The common dataset for image adversarial machine learning research is shown in Table 6. The application of the text adversarial machine learning dataset is shown in Table 7. And Table 8 shows the application of the malware adversarial machine learning dataset.

5.1.1. Publicly Accessible Dataset

At present, the majority of published papers use publicly accessible datasets from the Internet. These datasets are free to use and are maintained and updated by researchers in the field of computer science.

5.1.2. Commercial Dataset

Commercial datasets are generally not freely utilized or publicly available.

5.1.3. Artificially Generated Dataset

Other datasets are manually generated or crawled from the website by researchers using special tools.

5.2. General Adversarial Machine Learning Tools

Some commonly used tools can be used to assist experimental verification in the experimental phase of adversarial machine learning research. Common tools for adversarial ML include sandboxes, Python-based tools, and Java-based tools.

5.2.1. Sandbox

A sandbox is a virtual system application that allows the user to run a browser or other application in an isolated environment, so that the changes resulting from running it can be deleted afterwards. It creates a separate operating environment that restricts program behavior according to security policies; programs running inside it do not permanently affect the hard disk. In cyber security, a sandbox is a tool for handling untrusted files or applications in an isolated environment. For instance, Hu et al. used the Cuckoo Sandbox to process malware samples.

5.2.2. Machine Learning Tools Based on Python

Python is regarded as the programming language best suited for ML. Therefore, a series of Python-based machine learning and deep learning tools have been developed by researchers in the adversarial ML community.

5.2.3. TensorFlow

TensorFlow is an end-to-end open source machine learning platform. It has a comprehensive and flexible ecosystem of tools, libraries, and community resources that will help researchers drive the development of advanced machine learning technologies. Besides, it enables developers to easily build and deploy applications powered by machine learning.

5.2.4. Keras

Keras is a high-level neural network API written in Python that runs with TensorFlow, CNTK, or Theano as a backend. The focus of Keras development is to support rapid experimentation: being able to turn ideas into results with minimal delay is key to carrying out research. With Keras, deep networks can be built quickly and training parameters selected flexibly.

5.2.5. PyTorch

PyTorch is based on Torch and is used for applications such as natural language processing. It is a tensor library optimized for GPUs and CPUs, and it provides a powerful framework for building deep neural networks with automatic differentiation.

5.2.6. NumPy

NumPy is the fundamental package for scientific computing with Python. It can store and process large matrices, supports large multidimensional arrays and matrix operations, and provides an extensive library of mathematical functions for operating on arrays.
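The array and matrix operations just described look like this in practice (the values are arbitrary examples):

```python
import numpy as np

# Large matrices are stored and manipulated as single ndarray objects.
a = np.arange(12).reshape(3, 4)          # 3x4 matrix: [[0..3], [4..7], [8..11]]
b = np.ones((4, 2))                      # 4x2 matrix of ones

product = a @ b                          # matrix multiplication -> 3x2
col_means = a.mean(axis=0)               # per-column statistics
normalized = (a - a.min()) / (a.max() - a.min())  # min-max scaling to [0, 1]
```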

5.2.7. Scikit-Learn

Scikit-Learn is a simple and efficient Python-based machine learning tool for data mining and data analysis. It is open source and accessible to everyone. It provides six basic groups of functionality: classification, regression, clustering, dimensionality reduction, model selection, and preprocessing [115].
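A minimal sketch of the classification workflow combining two of those functions, preprocessing and classification, in one pipeline (the dataset and model choices are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Preprocessing (scaling) and classification chained as a single estimator.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=200))
clf.fit(X_tr, y_tr)
accuracy = clf.score(X_te, y_te)
```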

5.2.8. Machine Learning Tools Based on Java

Java-based machine learning platforms include WEKA, KNIME, and RapidMiner. WEKA provides a graphical user interface, a command-line interface, and a Java API, and it is probably the most popular Java machine learning library. Apache OpenNLP is a toolkit for processing natural language text; it provides methods for natural language processing tasks such as tokenization, sentence segmentation, and entity extraction.

5.3. The Connections between Papers

Table 9 shows the relationships among some of the literature listed in this paper.

6. Discussion

The competition between offense and defense in the security field is endless. The previous sections introduced adversarial attacks and defenses; this section presents our discussion and outlook on the area.

6.1. Adversarial Example Generation Based on Malware

First, let us introduce the "feature engineering" of malicious code:
(1) Numerical feature extraction: scaling and normalization (e.g., with MinMaxScaler)
(2) Text feature extraction: the word-set model and the bag-of-words model
(3) Data extraction: CSV is the most common format
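These feature-engineering steps can be sketched with scikit-learn; the numeric values and opcode-like tokens below are made up purely for illustration:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import MinMaxScaler

# (1) Numerical features: min-max scaling into [0, 1].
sizes = np.array([[120.0], [4096.0], [870.0]])   # e.g., PE section sizes
scaled = MinMaxScaler().fit_transform(sizes)

# (2) Text features: bag-of-words counts over token sequences.
docs = ["push mov call", "mov mov ret"]
vec = CountVectorizer()
bag = vec.fit_transform(docs).toarray()          # one row of counts per sample
```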

Adversarial machine learning is a widely used technique in the image domain, where adversarial attack technology is becoming increasingly mature. In the future, we plan to research adversarial attack samples in the field of malicious code. Since the structure of malicious code is similar to that of text data, the text adversarial attack algorithms can be considered for transfer to the malicious code field. There are two main ways to generate text adversarial samples. One is to generate adversarial samples directly through text editing operations such as insertion, deletion, and replacement, exploiting the characteristics of text. The other is to map the text data into a continuous space and generate adversarial samples by borrowing algorithms from the field of computer vision. Yan et al. [116] propose a genetic algorithm-based method for generating malicious code adversarial samples that statically rewrites PE files; the atomic rewrite operations screened by fuzz testing are analogous to text edit operations.
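The first approach, perturbing text directly through atomic edit operations, can be sketched in pure Python. The word list and edit positions here are illustrative; a real attack would keep whichever candidate most increases the target classifier's loss:

```python
def perturb(words, op, i, token="xx"):
    """Apply one atomic edit (insert/delete/replace) at position i."""
    words = list(words)                 # copy so the original is untouched
    if op == "insert":
        words.insert(i, token)
    elif op == "delete" and words:
        del words[i]
    elif op == "replace" and words:
        words[i] = token
    return words

sentence = "this movie is great".split()
candidates = [perturb(sentence, op, 1) for op in ("insert", "delete", "replace")]
# A real attack would score each candidate against the victim model
# and keep the one that flips (or most shifts) the prediction.
```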

6.2. Adversarial Example Generation Based on Swarm Intelligence Evolutionary Algorithm

Swarm intelligence evolutionary algorithms are heuristic computing methods that simulate the swarm behavior of insects, birds, and fish in the natural world; they include the genetic algorithm, ant colony algorithm, particle swarm algorithm, and cultural algorithm. Currently, Liu et al., Yan et al. [116], and Wang et al. [56] have all generated adversarial samples by improving swarm intelligence evolutionary algorithms. However, our literature review found few articles on the cultural algorithm, and none has investigated adversarial sample generation based on it. This is one of our future research directions.
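To make the idea concrete, here is a minimal genetic-algorithm skeleton of the kind these works build on. The fitness function below is a toy stand-in (counting 1-bits); in the cited papers it would be a black-box score such as "evades the detector while preserving malicious behavior":

```python
import random

def evolve(fitness, length=16, pop_size=20, generations=50, p_mut=0.1):
    """Evolve bit-strings toward high fitness via selection, crossover, mutation."""
    pop = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]               # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, length)
            child = a[:cut] + b[cut:]                # one-point crossover
            child = [g ^ (random.random() < p_mut) for g in child]  # bit-flip mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

# Toy fitness: number of 1-bits (stands in for an evasion score).
best = evolve(fitness=sum)
```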

6.3. Malware Evolution

Adversarial sample generation in the domain of malicious code is, in our view, similar to the evolution of malicious code. To counter network security protection systems, malware authors continually adopt new technologies and methods to create new malicious code; as a result, malicious code constantly evolves to evade security systems. Taking the evolutionary process of a family of samples as an example, the sample population within a family can be regarded as a spatiotemporal sequence of sets, and the sample sets generated at different stages have different functional characteristics. Samples within each set adopt different evolutionary methods for internal upgrading, and different sets collaborate to coevolve. The goal of this evolution is to preserve the malware's survival and attack capability so that it can complete destructive tasks under different network security protection environments. Generated adversarial samples can be seen as one form of this evolution, and more attention can be paid to this research in the future.

6.4. Improve the Transportability

Transportability (usually called transferability in the adversarial ML literature) does not mean that a program can run on any computer without modification, but rather that it can be moved with few modifications when conditions change. Transportability is one of the guarantees of system quality and reflects the generality of a model. For adversarial samples, it means that a sample generated against a neural network model on one dataset can also successfully attack another neural network model or dataset. Three cases are generally distinguished: (1) the same architecture with different datasets (for example, both based on the Windows platform but using PE files and APK files, respectively); (2) the same dataset with different architectures (for example, both using PE files but applied to the Windows and iOS architectures); and (3) different datasets and different architectures. Although some models have successfully achieved portability, performance still declines after transfer; therefore, improving the portability of models is worth studying.
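The phenomenon can be demonstrated with a small sketch: craft a gradient-sign perturbation against one model and measure the accuracy drop on a second model trained on the same data. The dataset, model pair, and perturbation size below are assumptions chosen only to make the transfer visible:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=20,
                           n_informative=10, random_state=0)
source = LogisticRegression(max_iter=1000).fit(X, y)
target = SVC().fit(X, y)          # never sees the source model

# FGSM-style step against the source model: for logistic regression,
# dLoss/dx = (sigmoid(w.x + b) - y) * w, so only the sign is needed.
w, b = source.coef_[0], source.intercept_[0]
p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
X_adv = X + 0.5 * np.sign((p - y)[:, None] * w[None, :])

clean_acc = target.score(X, y)      # accuracy on clean data
adv_acc = target.score(X_adv, y)    # typically drops: the attack transfers
```

The gap between `clean_acc` and `adv_acc` is precisely the transferability being discussed: the perturbation was computed without any access to the target model.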

6.5. Automation

Although some researchers have achieved automatic adversarial sample generation, many others still craft adversarial samples manually. The manual approach is time-consuming and laborious and does not keep pace with the demands of the times. In the future, with the efforts of researchers, we believe this problem will be greatly alleviated.

6.6. Possible Defensive Measures

To ensure system security in the field of AI, how to defend against attacks is the focus of current research. Many researchers have designed powerful attacks but have not come up with effective countermeasures against them. The attacks mentioned in this article can only be carried out if the adversary can access the software or system, so well-implemented access control helps protect the security of AI. In the access-control stage, identity authentication technology is one of the most important links. Secure multiparty computation (MPC) and homomorphic encryption (HE) are important privacy-protection technologies in identity authentication systems [117–121]. We envision a combination of MPC and fully homomorphic encryption (FHE) to defend against attacks, since both offer strong security guarantees. Zheng et al. [122] design and build Helen, a system that achieves maliciously secure collaborative learning; they greatly reduce the training time of stochastic gradient descent (SGD) under MPC's SPDZ protocol by using the alternating direction method of multipliers (ADMM) and singular value decomposition (SVD). Giraldo et al. [86] solve the adversarial classification problem in a system that uses differential privacy, a noise-based privacy mechanism, to protect user privacy. Moreover, homomorphic encryption is an effective way to improve data security and availability: users' behavior is not leaked even when third parties help process their data. To sum up, it is worth trying to combine secure multiparty computation and fully homomorphic encryption to defend against attacks. In addition, blockchain technology has matured and is widely used in many fields; it can be regarded as a solution for examining malicious input and can be combined with MPC [123–125].
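To give a flavor of how MPC protects inputs, here is an additive secret-sharing sketch, the simplest MPC building block (the modulus and party count are illustrative): each value is split into random shares that reveal nothing individually, yet share-wise addition yields shares of the sum.

```python
import random

MOD = 2**61 - 1  # arithmetic is done modulo a public prime

def share(secret, n_parties=3):
    """Split a secret into n additive shares that sum to it mod MOD."""
    shares = [random.randrange(MOD) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % MOD)   # last share fixes the sum
    return shares

def reconstruct(shares):
    return sum(shares) % MOD

# Two parties secret-share their inputs; anyone holding one share of each
# can add them to obtain a share of the sum, without seeing either input.
a_shares, b_shares = share(42), share(100)
sum_shares = [(sa + sb) % MOD for sa, sb in zip(a_shares, b_shares)]
```

Real MPC protocols such as SPDZ extend this idea with multiplication and integrity checks against malicious parties; this sketch shows only the privacy mechanism.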

7. Comparison with Other Similar Reviews

As shown in Table 10, there are several similar surveys of the adversarial attack. The differences between this article and the others are as follows:
(1) This paper follows the route of "Why? → What? → How?" and does not put forward new concepts; it is intended to let beginners quickly enter the field of adversarial attack and defense under its guidance.
(2) This paper does not organize adversarial attack studies around the L-BFGS method, FGSM-based attacks, the basic and least-likely-class iterative methods, the JSMA attack, the C&W method, and the DeepFool method, as most reviews do. Instead, adversarial attack methods are classified by the fields of image, text, and malicious code, to help researchers find the entry point for their own research.
(3) This paper systematically introduces influential papers published after 2010 to ensure that it is current and comprehensive. Researchers can quickly find their points of interest by browsing this article, improving the efficiency of their study.

8. Conclusions

In the field of AI security, it is a constant battle between attack and defense. To help researchers quickly enter the field of adversarial attack, this review is based on high-quality articles published since 2010. We summarize the typical adversarial attacks in the fields of text, images, and malware to help researchers locate their own research areas. Also, we introduce defense technologies against attacks. Finally, we present some discussions and open issues. Adversarial learning has a long history in the field of security. It is hoped that under the guidance of this paper, new researchers can effectively establish the framework of adversarial attack and defense.

Data Availability

Previously published articles were used to support this study, and these prior studies and datasets are cited at relevant places within the article.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Acknowledgments


This work was supported by the National Key Research & Development Program of China (2020YFB1712104), Major Scientific and Technological Innovation Projects of Shandong Province (2020CXGC010116), the National Natural Science Foundation of China (No. 61876019, No. U1936218, No. 62072037), and Zhejiang Lab (No. 2020LE0AB02).