Abstract
Convolutional neural networks have outperformed humans in image recognition tasks, but they remain vulnerable to attacks from adversarial examples. Because these data are crafted by adding imperceptible noise to normal images, their existence poses potential security threats to deep learning systems. Sophisticated adversarial examples with strong attack performance can also be used as a tool to evaluate the robustness of a model. However, the success rate of adversarial attacks can be further improved in black-box environments. Therefore, this study combines a modified Adam gradient descent algorithm with the iterative gradient-based attack method. The proposed Adam iterative fast gradient method is then used to improve the transferability of adversarial examples. Extensive experiments on ImageNet showed that the proposed method offers a higher attack success rate than existing iterative methods. By extending our method, we achieved a state-of-the-art attack success rate of 95.0% on defense models.
1. Introduction
In image recognition tasks, convolutional neural networks are able to classify images with an accuracy approaching that of humans [1–4]. However, researchers have found that neural networks are also vulnerable to adversarial examples. Szegedy et al. [5] first proposed the concept of adversarial examples: images with small added perturbations that cause neural network models to output incorrect classifications with high confidence. These adversarial perturbations are often imperceptible to the human eye (in other words, there is no obvious visual difference between an adversarial example and the original image).
Adversarial attacks can be categorized into white-box and black-box attacks. A variety of techniques can be used to generate adversarial examples and perform white-box attacks, depending on the model structure and corresponding parameters [6–10]. In addition, adversarial examples are generally transferable, as data generated for one model may also fool other models. This transferability facilitates black-box attacks on various neural networks, in which the structure and parameters of the target model are not available [11]. Goodfellow et al. [6] suggested that different models learn similar decision boundaries during the same image classification tasks and obtain similar parameters. These properties make it easier to generalize adversarial examples to different models.
Although adversarial examples are generally transferable, the optimal approach to improving this transferability needs further examination. Owing to a trade-off between attack performance and transferability, basic iterative attacks are often more effective than single-step attacks in white-box environments but weaker in black-box environments. In white-box attacks, iterative methods fit specific network parameters excessively, achieving a high success rate but preventing generalization to other models [9]. We consider this to be the result of overfitting, since the attack performance of adversarial examples in white-box and black-box environments is analogous to neural network performance on training and test sets.
Unlike white-box attacks, black-box attacks are more consistent with real attack-defense environments and are primarily performed in one of the following three ways. (1) Decision-based attacks are conducted using information collected from the network. Although access to the model structure and parameters is unavailable, the attacker can generate adversarial examples by executing multiple queries against the model [12–15]. (2) In substitute-model techniques, the attacker can input images into the model to obtain output labels. A substitute model can then be trained to imitate the target model and generate adversarial examples [16]. (3) Transferability attacks are conducted by improving the transferability of adversarial examples generated in a white-box setting [8, 9]. The first two kinds of black-box attacks require large quantities of model queries, which is impractical in some cases (e.g., online platforms often limit the number of queries). As such, this study investigates black-box attacks using adversarial data with strong transferability. We investigate Adam, a state-of-the-art optimizer, for improving black-box adversarial attacks.
In this paper, we propose the Adam iterative fast gradient method (AI-FGM) to improve the transferability of adversarial examples among different models [17]. Inspired by the fact that Adam outperforms momentum in optimization, we adopt the Adam optimizer in the iterative gradient-based attack to accelerate data updates in dimensions with small gradients and achieve better convergence, by applying a second momentum term and a decreasing step size. As shown in Figure 1, unlike the momentum method, which only accumulates the gradients of data points along the optimization path, our Adam method also accumulates the squares of the gradients. Such accumulation helps obtain an adaptive update direction, which leads to a better local maximum of the loss. In addition, a variable step size is used to avoid oscillation.
Compared with existing gradient-based methods, the proposed method offers improved attack performance in black-box settings. This approach was tested on multiple networks, including adversarially trained networks, achieving higher attack success rates on black-box models. In addition, we combined our approach with advanced methods and attacked an ensemble of multiple networks, which further improved the transferability of adversarial examples.
2. Related Works
2.1. Adversarial Attack Methods
A variety of techniques have been proposed to generate transferable adversarial examples, which can be divided into two categories: optimizer-based methods and data-augmentation-based methods. Specifically, Goodfellow et al. proposed the fast gradient sign method (FGSM) to generate adversarial examples at very low computational cost [6]. By extending FGSM, Kurakin et al. developed the iterative fast gradient sign method (I-FGSM) [7]. Dong et al. proposed the momentum iterative fast gradient sign method (MI-FGSM) to improve the transferability of adversarial examples in black-box environments [8]. Lin et al. proposed the Nesterov iterative fast gradient sign method (NI-FGSM), adopting the Nesterov accelerated gradient in iterative attacks [18]. Xie et al. first applied data augmentation to the generation of adversarial examples and proposed the diverse inputs method (DIM) [9]. Dong et al. proposed a translation-invariant attack (TI-FGSM) and improved the transferability of adversarial examples on defense models by a large margin [19]. Lin et al. proposed the scale-invariant attack method (SIM) and combined it with existing methods, resulting in SI-NI-TI-DIM (currently the strongest black-box attack method) [18].
2.2. Adversarial Defense Methods
Multiple defense mechanisms have been proposed to protect deep learning models from the threat of adversarial examples [20–29]. Among these, adversarial training is the most effective way to improve model robustness [6, 30]. In this process, adversarial examples are generated and added to the training set to participate in the model training procedure. Since normal adversarial training remains vulnerable to adversarial examples, Tramèr et al. proposed ensemble adversarial training to improve robustness further [31]. In this process, adversarial data generated by multiple models are included in the training set for a single model, thereby producing a more robust classifier.
3. Methodology
This section provides a detailed introduction to our proposed methodology. Let x denote an input image, and y^{true} denote the corresponding ground-truth label. The term θ represents the network parameters, and J(x, y^{true}; θ) describes a loss function, typically the cross-entropy loss. The primary objective is to generate an adversarial example x^{adv} that fools the model by maximizing J(x^{adv}, y^{true}; θ), such that the prediction label y^{adv} ≠ y^{true}. In this paper, we used an L_{∞} norm bound to limit adversarial perturbations, such that ‖x^{adv} − x‖_{∞} ≤ ε, where ε refers to the size of the perturbation. The generation of adversarial examples can, therefore, be converted into the following optimization problem:

argmax_{x^{adv}} J(x^{adv}, y^{true}; θ), s.t. ‖x^{adv} − x‖_{∞} ≤ ε.
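As a minimal NumPy sketch of the constraint above (the function and variable names are illustrative, not from the paper), the L_{∞} bound can be enforced by projecting a perturbed image back into the ε-ball around the original image x:

```python
import numpy as np

def project_linf(x_adv, x, eps):
    """Project x_adv onto the L-infinity ball of radius eps around x,
    then into the valid pixel range [0, 1]."""
    x_adv = np.clip(x_adv, x - eps, x + eps)  # enforce ||x_adv - x||_inf <= eps
    return np.clip(x_adv, 0.0, 1.0)          # keep a valid image

rng = np.random.default_rng(0)
x = rng.random((4, 4))                                    # toy "image" in [0, 1]
noisy = x + 0.3 * rng.standard_normal((4, 4))             # unconstrained perturbation
x_adv = project_linf(noisy, x, eps=0.05)
assert np.max(np.abs(x_adv - x)) <= 0.05 + 1e-12          # constraint satisfied
```

Since x itself lies in [0, 1], the second clip can only move x_adv closer to x, so both constraints hold simultaneously.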
3.1. Attack Methods Based on Gradient
In this section, several techniques used to solve the above optimization problem are introduced briefly:

(i) Fast gradient sign method (FGSM) [6]: As one of the simplest techniques, it seeks adversarial examples in the direction of the gradient of the loss function with respect to the input image x. The method can be expressed as

x^{adv} = x + ε · sign(∇_{x}J(x, y^{true}; θ)).

(ii) Iterative fast gradient sign method (I-FGSM) [7]: This algorithm is an iterative version of FGSM. The approach involves dividing the FGSM gradient operation into multiple steps, which can be expressed as follows:

x_{0}^{adv} = x, x_{t+1}^{adv} = Clip_{x}^{ε}{x_{t}^{adv} + α · sign(∇_{x}J(x_{t}^{adv}, y^{true}; θ))},

where α denotes the step size of each iteration and α = ε/T, in which T denotes the number of iterations. The Clip_{x}^{ε} function was included to limit the adversarial example within the ε-neighborhood of the original image and satisfy the L_{∞} norm constraint. I-FGSM is more effective than FGSM in white-box environments but less effective in black-box environments. In other words, I-FGSM exhibits poor transferability. This iterative attack method is also known as projected gradient descent (PGD) if the algorithm is augmented by a random initialization on x [20].

(iii) Momentum iterative fast gradient sign method (MI-FGSM) [8]: In this method, a momentum term is applied to the iterative process to escape from poor local maxima. This produces adversarial examples with greater transferability, which can be expressed as [8]

g_{t+1} = μ · g_{t} + ∇_{x}J(x_{t}^{adv}, y^{true}; θ) / ‖∇_{x}J(x_{t}^{adv}, y^{true}; θ)‖_{1}, x_{t+1}^{adv} = Clip_{x}^{ε}{x_{t}^{adv} + α · sign(g_{t+1})},

where μ denotes the decay factor of the momentum term, and g_{t} denotes the weighted accumulation of gradients in the first t rounds of iterations.
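For concreteness, the I-FGSM and MI-FGSM update rules above can be sketched in NumPy against a toy analytic loss. The quadratic loss, its gradient, and all names here are illustrative stand-ins: in a real attack the gradient would come from back-propagating a network's cross-entropy loss to the input image.

```python
import numpy as np

# Toy differentiable "loss": J(x) = 0.5 * ||x - target||^2, whose gradient
# with respect to x is simply (x - target). The attack ascends this loss.
target = np.full(4, 3.0)
grad_J = lambda x: x - target

def i_fgsm(x, eps, T):
    """I-FGSM: T sign-gradient ascent steps of size alpha = eps / T,
    each followed by clipping into the L-infinity eps-ball around x."""
    alpha, x_adv = eps / T, x.copy()
    for _ in range(T):
        x_adv = x_adv + alpha * np.sign(grad_J(x_adv))
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv

def mi_fgsm(x, eps, T, mu=1.0):
    """MI-FGSM: accumulate L1-normalized gradients with decay factor mu,
    then step in the sign of the accumulated gradient g."""
    alpha, x_adv, g = eps / T, x.copy(), np.zeros_like(x)
    for _ in range(T):
        grad = grad_J(x_adv)
        g = mu * g + grad / (np.sum(np.abs(grad)) + 1e-12)
        x_adv = np.clip(x_adv + alpha * np.sign(g), x - eps, x + eps)
    return x_adv

x = np.zeros(4)
x_if = i_fgsm(x, eps=0.5, T=10)
x_mi = mi_fgsm(x, eps=0.5, T=10)
assert np.max(np.abs(x_if - x)) <= 0.5 + 1e-9  # perturbation stays bounded
```

With this toy loss the gradient points toward the target, so sign ascent drives every coordinate to the edge of the ε-ball, mirroring how both attacks saturate the perturbation budget.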
3.2. Adam Iterative Fast Gradient Method
The generation of adversarial examples is similar to the training of neural networks; both processes can be viewed as optimization problems. Specifically, the adversarial example can be viewed as the training parameter, the white-box model as the training set, and the black-box model as the test set. From this point of view, the transferability of adversarial examples is analogous to the generalization of models. Therefore, methods used to improve model generalization can also be applied to the generation of adversarial examples. Many such methods have been proposed, which can be divided into two categories: better optimization and data augmentation. Correspondingly, both kinds of methods can be used in adversarial attacks, and there have been attempts to do so, for example, MI-FGSM and DIM [8, 9]. Based on this analysis, we aim to improve the transferability of adversarial examples with the Adam optimizer, since it performs well in the training of neural networks.
Adam [17] is an optimization algorithm combining momentum [32] and RMSProp [33]. In the iteration process, Adam accumulates not only the gradients of the loss function with respect to the input image but also the square of the gradients. This is done to accelerate loss function ascent in dimensions with small gradients. In addition, Adam uses a decreasing step size in each iteration to achieve better convergence.
In contrast, both I-FGSM and MI-FGSM adopt a constant step size. In the later stages of these algorithms, oscillations occur near the local maximum, preventing good convergence. To solve this problem, we improved the Adam algorithm by normalizing the step size sequence (defined in Algorithm 1) and applied it to the iterative gradient method. The proposed Adam iterative fast gradient method (AI-FGM) is summarized in Algorithm 1.

Specifically, the gradient in each iteration is normalized by its own L_{1} distance, as defined in Algorithm 1, because the scale of these gradients differs widely across iterations [8]. Similar to MI-FGSM, g_{t} accumulates the gradients of the first t iterations with a decay factor β_{1}, as defined in Algorithm 1, and can be considered the first momentum. The term m_{t} represents the second momentum, which accumulates the squares of the gradients for the first t iterations with a decay factor β_{2}, as defined in Algorithm 1. It is noteworthy that the square denotes an element-wise operation ∇J ⊙ ∇J, where ⊙ represents the Hadamard product [32]. The terms β_{1} and β_{2} are typically defined in the range [0, 1). The update direction of the input is defined in Algorithm 1, where the stability coefficient δ is set to avoid a zero in the denominator. The second momentum term is advantageous as it prompts x^{adv} to escape from local maxima and accelerates the updating of x^{adv} in dimensions with small gradients.
Learning rate decay is often used in the training of neural networks. In this study, the step size used in the Adam optimizer is continually reduced to help improve the convergence of the algorithm. If β_{1} and β_{2} are set appropriately, the value of s_{t} will decrease as t increases. Thus, a decreasing sequence {s_{t}} can be generated when t ranges from 1 to T. The sequence can then be normalized to obtain the weight of the step size in each iteration, relative to the total step size ε′. This can be used to acquire a set of exponentially decaying step sizes α_{t} controlled by β_{1} and β_{2}, as defined in Algorithm 1.
Existing techniques typically use the sign function to satisfy the L_{∞} norm limitation. However, if the sign function were applied in Algorithm 1, the update direction in our method would be equivalent to that of MI-FGSM. Hence, we constrain adversarial examples within the L_{∞} norm bound by the Clip function and apply the step size and update direction within the corresponding L_{2} norm bound, as defined in Algorithm 1. The relationship between the L_{∞} norm bound (ε) and the L_{2} norm bound (ε′) is ε′ = ε·√N, as defined in Algorithm 1, where N represents the dimension of the input image x.
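The full update loop can be sketched as follows, under our reading of Algorithm 1 and using the same toy analytic loss as before. The explicit step-size formula s_{t} = √(1 − β_{2}^{t})/(1 − β_{1}^{t}) is the standard Adam bias-correction ratio and is an assumption here, as are all variable names; Algorithm 1 in the paper is authoritative.

```python
import numpy as np

# Toy loss as before: J(x) = 0.5 * ||x - target||^2, gradient (x - target).
target = np.full(4, 3.0)
grad_J = lambda x: x - target

def ai_fgm(x, eps, T, beta1=0.99, beta2=0.999, delta=1e-8):
    """Sketch of an AI-FGM-style attack: Adam first/second momentum with
    normalized, exponentially decaying L2 step sizes (names illustrative)."""
    # Decreasing sequence s_t, normalized so the T step sizes together
    # spend the L2 budget eps' = eps * sqrt(N).
    t = np.arange(1, T + 1)
    s = np.sqrt(1.0 - beta2 ** t) / (1.0 - beta1 ** t)
    eps_l2 = eps * np.sqrt(x.size)
    alphas = eps_l2 * s / s.sum()

    x_adv, g, m = x.copy(), np.zeros_like(x), np.zeros_like(x)
    for i in range(T):
        grad = grad_J(x_adv)
        grad = grad / (np.sum(np.abs(grad)) + 1e-12)        # L1 normalization
        g = beta1 * g + (1.0 - beta1) * grad                # first momentum
        m = beta2 * m + (1.0 - beta2) * grad * grad         # second momentum
        update = g / (np.sqrt(m) + delta)                   # Adam direction
        update = update / (np.linalg.norm(update) + 1e-12)  # unit L2 step
        x_adv = x_adv + alphas[i] * update
        x_adv = np.clip(x_adv, x - eps, x + eps)            # L-infinity clip
    return x_adv

x_adv = ai_fgm(np.zeros(4), eps=0.5, T=10)
assert np.max(np.abs(x_adv)) <= 0.5 + 1e-9  # stays inside the eps-ball
```

Note how the second momentum rescales each dimension by the running magnitude of its gradients, so dimensions with persistently small gradients receive proportionally larger updates.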
3.3. Attacking an Ensemble of Networks
The proposed method was also used to attack an ensemble of networks. If an adversarial example poses a threat to multiple networks, it is far more likely to transfer to other models [11]. We followed the ensemble strategy proposed by Dong et al., in which multiple models are attacked by fusing network logits [8]. Specifically, in order to attack an ensemble of K models, the logits were fused as follows:

l(x) = Σ_{k=1}^{K} w_{k} · l_{k}(x),

where l_{k}(x) refers to the logit output of the kth model, w_{k} refers to the ensemble weight (w_{k} ≥ 0), and Σ_{k=1}^{K} w_{k} = 1.
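The logit fusion above is a straightforward weighted sum; a minimal sketch (function name and example values are illustrative) might look like:

```python
import numpy as np

def fuse_logits(logits_list, weights):
    """Fuse per-model logits l_k(x) into a single ensemble logit
    l(x) = sum_k w_k * l_k(x), with w_k >= 0 and sum_k w_k = 1."""
    weights = np.asarray(weights, dtype=float)
    assert np.all(weights >= 0.0) and np.isclose(weights.sum(), 1.0)
    return sum(w * l for w, l in zip(weights, logits_list))

# Hypothetical logits from K = 3 models over 5 classes, equal weights 1/K.
rng = np.random.default_rng(0)
logits = [rng.standard_normal(5) for _ in range(3)]
fused = fuse_logits(logits, [1 / 3, 1 / 3, 1 / 3])
# The attack's cross-entropy loss is then computed on `fused`, so a single
# gradient step pushes the example against all K models at once.
```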
4. Experiment
Extensive validation experiments were conducted to demonstrate the effectiveness of the proposed methodology. Experimental settings are provided in Section 4.1. The results of attacking a single network, along with the parameters that influence them, are discussed in Section 4.2. The results of attacking an ensemble of models are shown in Section 4.3. Finally, Section 4.4 presents the results of attacks combining AI-FGM with existing methods (source code is available at https://github.com/YinHeng121/Adam_attack).
4.1. Experimental Setup
(i) Data set: if the original images cannot be classified correctly by the network, it is meaningless to generate adversarial examples from them. Therefore, we selected 1,000 images belonging to 1,000 different categories from the ImageNet validation set, all of which could be classified correctly by the tested networks. All images were scaled to 299 × 299 × 3; as such, N = 299 × 299 × 3 in Section 3.2.

(ii) Networks: seven different models were studied, four of which were normally trained networks (i.e., Inception-v3 (Inc-v3) [34], Inception-v4 (Inc-v4) [35], Inception-ResNet-v2 (IncRes-v2) [35], and ResNet-v2-101 (Res-101) [36]). The other three models were adversarially trained networks (i.e., ens3-adv-Inception-v3 (Inc-v3_{ens3}), ens4-adv-Inception-v3 (Inc-v3_{ens4}), and ens-adv-Inception-ResNet-v2 (IncRes-v2_{ens}) [31]).

(iii) Other details: the proposed method was compared with the other methods discussed in Section 3.1. All the experiments described in this paper were based on the L_{∞} norm. The momentum decay factor μ in Algorithm 1 was 1.0, and the stability coefficient δ in Algorithm 1 was 10^{−8}, as suggested in [17].
4.2. Attacking a Single Network
We first performed adversarial attacks on a single network with FGSM, I-FGSM, MI-FGSM, and AI-FGM. The adversarial examples were generated on four normally trained networks and tested on all seven networks. The results are shown in Table 1, where the success rates are the misclassification rates of the corresponding models with adversarial examples as input. The decay factors β_{1} and β_{2} in AI-FGM were set to 0.99 and 0.999, respectively. The maximum perturbation ε was 16. The number of iterations T for I-FGSM, PGD, MI-FGSM, and AI-FGM was 10. These methods are hereafter referred to as "iterative methods" without ambiguity. The effects of these parameter choices are discussed further in this section.
As shown in Table 1, all four iterative methods attacked a white-box model with a near 100% success rate. AI-FGM performed better than the other three methods on all black-box models. For example, for adversarial examples generated on Inc-v3, AI-FGM had a success rate of 60.7% on Inc-v4, while MI-FGSM, PGD, I-FGSM, and FGSM reached 54.3%, 18.9%, 27.5%, and 32.1%, respectively, demonstrating the effectiveness of the proposed method. An original image and the corresponding adversarial example generated for Inc-v3 by AI-FGM are shown in Figure 2.
4.2.1. Decay Factors
The terms β_{1} and β_{2} control not only the decay amplitude of the step sizes but also the accumulation intensity of the gradients for g_{t} and m_{t}. As such, they have a direct impact on attack success rates. We applied a grid-search method to identify an optimal set of β_{1} and β_{2} values. In the experiments, the maximum perturbation ε was set to 16, and the number of iterations T was set to 10. The values of β_{1} and β_{2} ranged from 0 to 1. Notably, we not only chose the values of β_{1} and β_{2} uniformly from 0.1 to 0.9 but also selected values close to 0 and 1, as shown in Figure 3. This was done to provide a more comprehensive study of the effects of the decay factors, as the relationship between the success rate and the decay factors may be nonlinear. Adversarial examples were then generated on Inc-v3 with AI-FGM and used to attack Inc-v3, Inc-v4, and Inc-v3_{ens4}. It is evident from the figure that, regardless of the values of β_{1} and β_{2}, attack success rates were always near 100% in white-box environments. Beyond that, attack success rates on black-box models were noticeably more sensitive to the decay factors, regardless of whether the networks were normally or adversarially trained. Success rates were maximized when both β_{1} and β_{2} were close to 1.
4.2.2. The Number of Iterations
The effect of the number of iterations on success rates was studied by performing attacks using the iterative methods. The maximum perturbation ε was set to 16, and the decay factors β_{1} and β_{2} were set to 0.99 and 0.999, respectively. The number of iterations T ranged from 1 to 20. Adversarial examples generated on Inc-v3 with I-FGSM, MI-FGSM, and AI-FGM were then used to attack Inc-v3 and Inc-v4, as shown in Figure 4.
As shown in the figure, the black-box attack success rates of the iterative methods decreased as the number of iterations increased. However, AI-FGM always outperformed I-FGSM and MI-FGSM for a given value of T.
4.2.3. Perturbation Size
The effect of adversarial perturbation size on attack success rates was studied by setting the number of iterations T to 10, with decay factors β_{1} = 0.99 and β_{2} = 0.999. The size of the perturbation ε ranged from 1 to 30. Adversarial examples generated on Inc-v3 with I-FGSM, MI-FGSM, and AI-FGM were used to attack Inc-v3 and Inc-v4, as shown in Figure 5.
It is evident from the figure that attack success rates can reach 100% in white-box environments. Success rates on black-box models increased steadily with increasing values of ε. AI-FGM always outperformed I-FGSM and MI-FGSM for a given value of ε. In other words, AI-FGM can achieve comparable black-box attack success rates with smaller perturbations.
4.3. Attacking an Ensemble of Networks
The experimental results presented above suggest that AI-FGM can increase the transferability of adversarial examples. In addition, the success rate of black-box attacks can be further improved by attacking an ensemble of networks. As discussed in Section 3.3, multiple models were attacked by fusing network logits. In the experiment, all seven networks described in Section 4.1 were used, and we generated adversarial examples on the ensemble of Inc-v3, Inc-v4, IncRes-v2, and Res-101 with FGSM, I-FGSM, MI-FGSM, NI-FGSM, and AI-FGM. Attacks were then performed on the other three defense models. The decay factors β_{1} and β_{2} were set to 0.99 and 0.999; the number of iterations T was 10; the maximum perturbation ε was 16; and each network had an equal ensemble weight of 1/4. The corresponding results are shown in Table 2; it is evident that AI-FGM was more effective than the other four methods on adversarially trained models.
4.4. Combination with Advanced Methods
In this subsection, we combined our AI-FGM with SI-NI-TI, SI-NI-DI, and SI-NI-TI-DI [18] and compared the black-box attack success rates of our extensions with those of the original methods under a single-model setting. The adversarial examples were generated on Inc-v3, with the number of iterations set to 10 and the maximum perturbation set to 16. It is noteworthy that our combination with other methods is better described as an improvement: for example, the combination of AI-FGM and SI-NI-TI was obtained by replacing the Nesterov optimizer in SI-NI-TI with Adam, resulting in SI-AI-TI. As shown in Table 3, our method SI-AI-TI-DI achieved an average attack success rate of 65.1%, surpassing the state-of-the-art attack by 10.7%.
Furthermore, we generated adversarial examples on the ensemble models using our SI-AI-TI-DI. As shown in Table 4, we achieved an average attack success rate of 95.0% on adversarially trained models under the black-box setting, raising a new security concern for robust deep neural networks.
5. Discussion
It is commonly acknowledged that the training of neural network models is similar to the generation of adversarial examples, especially for gradient-based generation methods. Hence, techniques used in the training of neural networks to improve model generalizability can also be adopted to improve the transferability of adversarial examples. Since the Adam optimizer is often used in the training of neural networks to improve convergence and achieve better performance on test sets, the second momentum term and decaying step size of Adam were incorporated into the AI-FGM algorithm to improve the transferability of adversarial examples. Furthermore, we suggest that other techniques (such as data augmentation) could be used to further improve the performance of adversarial examples in black-box environments.
6. Conclusions
In this study, we proposed the Adam iterative fast gradient method to improve the transferability of adversarial examples. Specifically, the Adam algorithm was modified to increase its suitability for the generation of adversarial examples. The proposed method improved both the iteration update direction and the step size. The effectiveness of the proposed method was verified by an extensive series of experiments on ImageNet. Compared with previous gradient-based adversarial example generation techniques, our method improved attack success rates in black-box environments. In addition, we further improved the transferability of adversarial examples by combining our approach with existing methods and attacking ensemble models, which achieved a state-of-the-art attack success rate against adversarially trained networks. We suggest the proposed method could serve as a reference for other iterative gradient-based methods. For example, data augmentation could be combined with AI-FGM to achieve better attack performance.
Data Availability
The public data (data set and models) can be downloaded from https://github.com/tensorflow/cleverhans/tree/master/examples/nips17_adverarial_competition/dataset, https://github.com/tensorflow/models/tree/master/research/slim, and https://github.com/tensorflow/models/tree/master/research/adv_imagenet_models. The source code of the proposed method in this paper is available at https://github.com/YinHeng121/Adam_attack.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Authors’ Contributions
Heng Yin and Hengwei Zhang contributed equally to this work.
Acknowledgments
This work was supported by the National Key Research and Development Program of China, under grant no. 2017YFB0801900.