Abstract

Convolutional neural networks have outperformed humans in image recognition tasks, but they remain vulnerable to attacks from adversarial examples. Because these examples are crafted by adding imperceptible noise to normal images, their existence poses a potential security threat to deep learning systems. Sophisticated adversarial examples with strong attack performance can also be used as a tool to evaluate model robustness. However, the success rate of adversarial attacks in black-box environments still leaves room for improvement. Therefore, this study combines a modified Adam gradient descent algorithm with the iterative gradient-based attack method. The proposed Adam iterative fast gradient method is then used to improve the transferability of adversarial examples. Extensive experiments on ImageNet showed that the proposed method offers a higher attack success rate than existing iterative methods. By extending our method, we achieved a state-of-the-art attack success rate of 95.0% on defense models.

1. Introduction

In image recognition tasks, convolutional neural networks are able to classify images with an accuracy approaching that of humans [1–4]. However, researchers have found that neural networks are also vulnerable to adversarial examples. Szegedy et al. [5] first proposed the concept of adversarial examples: images with small added perturbations that cause neural network models to output incorrect classifications with high confidence. These adversarial perturbations are often imperceptible to the human eye; in other words, there is no obvious visual difference between adversarial examples and the original images.

Adversarial attacks can be categorized into white-box and black-box attacks. A variety of techniques can be used to generate adversarial examples and perform white-box attacks, given access to the model structure and the corresponding parameters [6–10]. In addition, adversarial examples are generally transferable, as examples generated for one model may also fool other models. This facilitates black-box attacks, in which the structure and parameters of the target model are not available, against various neural networks [11]. Goodfellow et al. [6] suggested that different models learn similar decision boundaries and obtain similar parameters when trained on the same image classification task. These properties make it easier to generalize adversarial examples to different models.

Although adversarial examples are generally transferable, the optimal approach to improving this transferability requires further study. Owing to a trade-off between attack performance and transferability, basic iterative attacks are often more effective than single-step attacks in white-box environments but weaker in black-box environments. In white-box attacks, iterative methods fit specific network parameters too closely, achieving a high success rate but failing to generalize to other models [9]. We regard this as a form of overfitting, since the attack performance of adversarial examples in white- and black-box environments is analogous to a neural network's performance on the training and test sets.

Unlike white-box attacks, black-box attacks are more consistent with real attack-defense environments and are primarily performed in one of the following three ways. (1) Decision-based attacks are conducted using information collected from the network: although access to the model structure and parameters is unavailable, the attacker can generate adversarial examples by executing multiple queries against the model [12–15]. (2) In substitute-model techniques, the attacker inputs images into the model to obtain output labels and then trains a substitute model to imitate the target model and generate adversarial examples [16]. (3) Transferability-based attacks are conducted by improving the transferability of adversarial examples generated in a white-box setting [8, 9]. The first two types of black-box attack require large numbers of model queries, which is impractical in some cases (e.g., online platforms often limit the number of queries). As such, this study investigates black-box attacks using adversarial examples with strong transferability and examines Adam, a state-of-the-art optimizer, as a means of improving black-box adversarial attacks.

In this paper, we propose the Adam iterative fast gradient method (AI-FGM) to improve the transferability of adversarial examples among different models [17]. Inspired by the fact that Adam outperforms momentum in optimization, we incorporate the Adam optimizer into the iterative gradient-based attack, applying a second momentum term and a decreasing step size to accelerate updates in dimensions with small gradients and to achieve better convergence. As shown in Figure 1, unlike the momentum method, which only accumulates the gradients of data points along the optimization path, our Adam-based method also accumulates the squares of the gradients. Such accumulation can help obtain an adaptive update direction, which leads to a better local maximum. In addition, a variable step size is used to avoid oscillation.

Compared with existing gradient-based methods, the proposed method offers improved attack performance in black-box settings. This approach was tested on multiple networks, including adversarially trained networks, achieving higher attack success rates on black-box models. In addition, we combined our approach with advanced methods and attacked an ensemble of multiple networks, which further improved the transferability of adversarial examples.

2.1. Adversarial Attack Methods

A variety of techniques have been proposed to generate transferable adversarial examples; they can be divided into two categories: optimizer-based methods and data-augmentation-based methods. Specifically, Goodfellow et al. proposed the fast gradient sign method (FGSM) to generate adversarial examples at very low computational cost [6]. By extending FGSM, Kurakin et al. developed the iterative fast gradient sign method (I-FGSM) [7]. Dong et al. proposed the momentum iterative fast gradient sign method (MI-FGSM) to improve the transferability of adversarial examples in black-box environments [8]. Lin et al. proposed the Nesterov iterative fast gradient sign method (NI-FGSM), which adopts the Nesterov accelerated gradient in iterative attacks [18]. Xie et al. first applied data augmentation to the generation of adversarial examples and proposed the diverse inputs method (DIM) [9]. Dong et al. proposed a translation-invariant attack (TI-FGSM) and improved the transferability of adversarial examples on defense models by a large margin [19]. Lin et al. also proposed the scale-invariant attack method (SIM) and combined it with existing methods, resulting in SI-NI-TI-DIM, currently the strongest black-box attack method [18].

2.2. Adversarial Defense Methods

Multiple defense mechanisms have been proposed to protect deep learning models from the threat of adversarial examples [20–29]. Among these, adversarial training is the most effective way to improve model robustness [6, 30]. In this process, adversarial examples are generated and added to the training set so that they participate in model training. Since standard adversarial training remains vulnerable to adversarial examples, Tramèr et al. proposed ensemble adversarial training to further improve robustness [31]. In this process, adversarial data generated by multiple models are included in the training set of a single model, thereby producing a more robust classifier.

3. Methodology

This section provides a detailed introduction to the proposed methodology. Let x denote an input image and y the corresponding ground-truth label. The term θ represents the network parameters, and J(x, y; θ) describes a loss function, typically the cross-entropy loss. The primary objective is to generate an adversarial example x^adv that fools the model by maximizing J(x^adv, y; θ), such that the predicted label differs from the ground truth y. In this paper, we used an L∞ norm bound to limit adversarial perturbations, such that ‖x^adv − x‖∞ ≤ ε, where ε refers to the size of the perturbation. The generation of adversarial examples can, therefore, be converted into the following optimization problem:

  argmax_{x^adv} J(x^adv, y; θ),  s.t.  ‖x^adv − x‖∞ ≤ ε.

3.1. Attack Methods Based on Gradient

In this section, several techniques that are used to solve the above optimization problem are introduced briefly (a minimal code sketch of the iterative variants is given after this list):

(i) Fast gradient sign method (FGSM) [6]: as one of the simplest techniques, FGSM seeks adversarial examples in the direction of the gradient of the loss function with respect to the input image x. The method can be expressed as

  x^adv = x + ε · sign(∇_x J(x, y; θ)).

(ii) Iterative fast gradient sign method (I-FGSM) [7]: this algorithm is an iterative version of FGSM. The approach divides the FGSM gradient operation into multiple steps:

  x_0^adv = x,  x_{t+1}^adv = Clip_x^ε{ x_t^adv + α · sign(∇_x J(x_t^adv, y; θ)) },

where α = ε/T denotes the step size of each iteration and T denotes the number of iterations. The Clip_x^ε function limits the adversarial example to the ε-neighborhood of the original image x so that the L∞ norm constraint is satisfied. I-FGSM is more effective than FGSM in white-box environments but less effective in black-box environments; in other words, I-FGSM exhibits poor transferability. This iterative attack is also known as projected gradient descent (PGD) when a random initialization of x_0^adv is added [20].

(iii) Momentum iterative fast gradient sign method (MI-FGSM) [8]: in this method, a momentum term is applied to the iterative process to escape from poor local maxima, which produces more transferable adversarial examples [8]:

  g_{t+1} = μ · g_t + ∇_x J(x_t^adv, y; θ) / ‖∇_x J(x_t^adv, y; θ)‖_1,
  x_{t+1}^adv = Clip_x^ε{ x_t^adv + α · sign(g_{t+1}) },

where μ denotes the decay factor of the momentum term and g_t denotes the weighted accumulation of gradients over the first t iterations.
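To make the update rules above concrete, the following sketch implements I-FGSM and MI-FGSM in NumPy. It is a minimal illustration rather than the code used in our experiments: grad_fn(x, y) stands for any user-supplied routine that returns the gradient ∇_x J(x, y; θ) of the attacked model, and the pixel range [0, 255] is an assumption.

```python
import numpy as np

def i_fgsm(x, y, grad_fn, eps=16.0, T=10, clip_min=0.0, clip_max=255.0):
    """I-FGSM: T small sign steps, each projected back into the
    L-infinity ball of radius eps around the original image x."""
    alpha = eps / T                                   # constant per-iteration step size
    x_adv = x.copy()
    for _ in range(T):
        g = grad_fn(x_adv, y)                         # gradient of the loss w.r.t. the input
        x_adv = x_adv + alpha * np.sign(g)
        x_adv = np.clip(x_adv, x - eps, x + eps)      # L-infinity projection (Clip function)
        x_adv = np.clip(x_adv, clip_min, clip_max)    # keep a valid image
    return x_adv

def mi_fgsm(x, y, grad_fn, eps=16.0, T=10, mu=1.0, clip_min=0.0, clip_max=255.0):
    """MI-FGSM: gradients are L1-normalized and accumulated with decay
    factor mu before the sign step."""
    alpha = eps / T
    g_acc = np.zeros_like(x)
    x_adv = x.copy()
    for _ in range(T):
        g = grad_fn(x_adv, y)
        g_acc = mu * g_acc + g / (np.sum(np.abs(g)) + 1e-12)   # accumulated momentum
        x_adv = x_adv + alpha * np.sign(g_acc)
        x_adv = np.clip(x_adv, x - eps, x + eps)
        x_adv = np.clip(x_adv, clip_min, clip_max)
    return x_adv
```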

3.2. Adam Iterative Fast Gradient Method

The generation of adversarial examples is similar to the training of neural networks; both processes can be viewed as an optimization problem. Specifically, the adversarial example can be viewed as the training parameter, while the white-box model can be viewed as the training set, and the black-box model can be viewed as a testing set. From this point of view, the transferability of the adversarial examples is similar to the generalization of the models. Therefore, we can apply the methods used to improve the generalization of the models to the generation of adversarial examples. Many methods are proposed to improve the generalization of neural network models, which can be divided into two categories: better optimization and data augmentation. Correspondingly, these two kinds of methods can be used in the adversarial attacks, and there have been attempts to do so, for example, MI-FGSM and DIM [8, 9]. Based on the analysis above, we aim to improve the transferability of adversarial examples with Adam optimizer since it performs well in the training of neural networks.

Adam [17] is an optimization algorithm combining momentum [32] and RMSProp [33]. In the iteration process, Adam accumulates not only the gradients of the loss function with respect to the input image but also the square of the gradients. This is done to accelerate loss function ascent in dimensions with small gradients. In addition, Adam uses a decreasing step size in each iteration to achieve better convergence.
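For reference, one step of the standard Adam update [17] can be written as follows. This is textbook Adam (a descent step for a parameter vector theta), shown only to make the two accumulators and the bias-corrected step explicit; the default hyperparameters are those suggested in [17].

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam descent step (t starts at 1); maximization flips the sign."""
    m = beta1 * m + (1 - beta1) * grad            # first momentum: running average of gradients
    v = beta2 * v + (1 - beta2) * grad * grad     # second momentum: running average of squared gradients
    m_hat = m / (1 - beta1 ** t)                  # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```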

In contrast, both I-FGSM and MI-FGSM adopt a constant step size; in the later iterations, this causes oscillation near a local maximum and prevents good convergence. To solve this problem, we improved the Adam algorithm by normalizing the step-size sequence (defined in Algorithm 1) and applied it to the iterative gradient method. The proposed Adam iterative fast gradient method (AI-FGM) is summarized in Algorithm 1.

Input: A convolutional neural network f with the corresponding cross-entropy loss function J; an original image x and the corresponding ground-truth label y; the number of iterations T; the iteration time step t; the dimension N of the input image; the size of the perturbation ε; Adam decay factors β1 and β2; and a denominator stability factor δ.
Output: An adversarial example x^adv, s.t. ‖x^adv − x‖∞ ≤ ε.
(1) g_0 = 0; m_0 = 0; t = 0; x_0^adv = x;
(2) S = Σ_{j=1}^{T} √(1 − β2^j) / (1 − β1^j)
(3) while t < T do:
(4)  input x_t^adv to f and obtain the gradient ∇_x J(x_t^adv, y);
(5)  ĝ_t = ∇_x J(x_t^adv, y) / ‖∇_x J(x_t^adv, y)‖_1;
(6)  g_{t+1} = β1 · g_t + (1 − β1) · ĝ_t;
(7)  m_{t+1} = β2 · m_t + (1 − β2) · ĝ_t ⊙ ĝ_t;
(8)  d_{t+1} = g_{t+1} / (√(m_{t+1}) + δ);
(9)  α_{t+1} = (ε√N / S) · √(1 − β2^{t+1}) / (1 − β1^{t+1});
(10)  x_{t+1}^adv = Clip_x^ε{ x_t^adv + α_{t+1} · d_{t+1} / ‖d_{t+1}‖_2 };
(11)  t = t + 1;
(12) end while
(13) return x^adv = x_T^adv.

Specifically, the gradient ∇_x J(x_t^adv, y) in each iteration is normalized by its own L1 distance, as defined in Algorithm 1, because the scale of these gradients differs widely across iterations [8]. Similar to MI-FGSM, g_t accumulates the gradients of the first t iterations with a decay factor β1, as defined in Algorithm 1; the result can be considered the first momentum. The term m_t represents the second momentum, which accumulates the squares of the gradients of the first t iterations with a decay factor β2, as defined in Algorithm 1. It is noteworthy that ĝ_t ⊙ ĝ_t denotes an elementwise square, where ⊙ represents the Hadamard product [32]. The terms β1 and β2 are typically chosen in the range [0, 1). The update direction of the input is g_{t+1}/(√(m_{t+1}) + δ), as defined in Algorithm 1, where the stability coefficient δ is set to avoid a zero denominator. The second momentum term is advantageous as it prompts x_t^adv to escape from local maxima and accelerates the updating of x_t^adv in dimensions with small gradients.

Learning rate decay is often used in the training of neural networks. In this study, the step size used in the Adam optimizer is continually reduced to help improve the convergence of the algorithm. If β1 and β2 are set appropriately, the value of √(1 − β2^t)/(1 − β1^t) decreases as t increases. Thus, a decreasing sequence can be generated as t ranges from 1 to T. The sequence can then be normalized to obtain the weight of the step size in each iteration relative to the total step size ε√N. This yields a set of exponentially decaying step sizes α_t controlled by β1 and β2, as defined in Algorithm 1.
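The decaying schedule described above can be precomputed before the iterations begin. The short sketch below is our reading of this description: it builds the sequence √(1 − β2^t)/(1 − β1^t) for t = 1, …, T and normalizes it so that the step sizes sum to the total L2 budget ε√N.

```python
import numpy as np

def step_size_schedule(T, eps, N, beta1=0.99, beta2=0.999):
    """Exponentially decaying step sizes that sum to eps * sqrt(N),
    where N is the number of input dimensions."""
    t = np.arange(1, T + 1)
    s = np.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)   # decreasing when beta1, beta2 are close to 1
    return eps * np.sqrt(N) * s / s.sum()
```

For example, step_size_schedule(10, 16.0, 299 * 299 * 3) returns ten step sizes that shrink monotonically from the first iteration to the last.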

Existing techniques typically use the sign function to satisfy the L∞ norm limitation. However, if the sign function were applied in Algorithm 1, the update direction of our method would be equivalent to that of MI-FGSM. Hence, we constrain adversarial examples within the L∞ norm bound by the Clip function and apply the step size and update direction within the corresponding L2 norm bound, as defined in Algorithm 1. The relationship between the L∞ norm bound (ε) and the L2 norm bound (ε√N) is defined in Algorithm 1, where N represents the dimension of the input image x.
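Putting the pieces together, a minimal NumPy sketch of AI-FGM as described above might look as follows. As before, grad_fn(x, y) is a user-supplied routine returning ∇_x J(x, y; θ); the L1 normalization of the raw gradient, the two Adam accumulators, and the decaying L2 step sizes follow Algorithm 1, while details such as normalizing the update direction to unit L2 length are our assumptions rather than a reference implementation (the released code is linked in Section 4).

```python
import numpy as np

def ai_fgm(x, y, grad_fn, eps=16.0, T=10, beta1=0.99, beta2=0.999,
           delta=1e-8, clip_min=0.0, clip_max=255.0):
    """Adam iterative fast gradient method (sketch)."""
    N = x.size
    # decaying step sizes normalized to the total L2 budget eps * sqrt(N)
    t_idx = np.arange(1, T + 1)
    s = np.sqrt(1 - beta2 ** t_idx) / (1 - beta1 ** t_idx)
    alphas = eps * np.sqrt(N) * s / s.sum()

    g = np.zeros_like(x)          # first momentum
    m = np.zeros_like(x)          # second momentum
    x_adv = x.copy()
    for t in range(T):
        grad = grad_fn(x_adv, y)
        grad = grad / (np.sum(np.abs(grad)) + 1e-12)    # L1 normalization of the raw gradient
        g = beta1 * g + (1 - beta1) * grad              # accumulate gradients
        m = beta2 * m + (1 - beta2) * grad * grad       # accumulate squared gradients
        d = g / (np.sqrt(m) + delta)                    # Adam-style update direction
        d = d / (np.linalg.norm(d) + 1e-12)             # unit L2 step direction (assumption)
        x_adv = x_adv + alphas[t] * d
        x_adv = np.clip(x_adv, x - eps, x + eps)        # project back into the L-infinity ball
        x_adv = np.clip(x_adv, clip_min, clip_max)      # keep a valid image
    return x_adv
```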

3.3. Attacking an Ensemble of Networks

The proposed method was also used to attack an ensemble of networks. If an adversarial example poses a threat to multiple networks, it is far more likely to transfer to other models [11]. We followed the ensemble strategy proposed by Dong et al., in which multiple models are attacked by fusing their logits [8]. Specifically, to attack an ensemble of K models, the logits were fused as

  l(x) = Σ_{k=1}^{K} w_k · l_k(x),

where l_k(x) refers to the logit output of the k-th model and w_k refers to the ensemble weight, with w_k ≥ 0 and Σ_{k=1}^{K} w_k = 1.
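A minimal sketch of the fusion step follows; each entry of logit_fns is assumed to be a callable returning the logit vector of one model, and the fused logits feed a single cross-entropy loss whose input gradient drives the attack.

```python
import numpy as np

def fused_logits(x, logit_fns, weights):
    """Weighted fusion of the logits of several models (weights sum to 1)."""
    return sum(w * fn(x) for fn, w in zip(logit_fns, weights))

def ensemble_loss(x, y, logit_fns, weights):
    """Cross-entropy of the fused logits for the ground-truth class index y."""
    z = fused_logits(x, logit_fns, weights)
    z = z - np.max(z)                                # for numerical stability
    log_probs = z - np.log(np.sum(np.exp(z)))        # log-softmax
    return -log_probs[y]
```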

4. Experiment

Extensive validation experiments were conducted to demonstrate the effectiveness of the proposed methodology. The experimental settings are provided in Section 4.1. The results of attacking a single network, together with the parameters that influence these results, are discussed in Section 4.2. The results of attacking an ensemble of models are shown in Section 4.3. Finally, Section 4.4 presents the results of attacks that combine AI-FGM with existing methods (source code is available at https://github.com/YinHeng121/Adam_attack).

4.1. Experimental Setup

(i) Data set: if the original images cannot be classified correctly by the network, it is meaningless to generate adversarial examples from them. Therefore, we selected 1,000 images belonging to 1,000 different categories from the ImageNet validation set, all of which could be classified correctly by the tested networks. All images were scaled to 299 × 299 × 3; accordingly, N = 299 × 299 × 3 in Section 3.2.

(ii) Networks: seven different models were studied, four of which were normally trained networks (i.e., Inception-v3 (Inc-v3) [34], Inception-v4 (Inc-v4) [35], Inception-Resnet-v2 (IncRes-v2) [35], and Resnet-v2-101 (Res-101) [36]). The other three models were adversarially trained networks (i.e., ens3-adv-Inception-v3 (Inc-v3ens3), ens4-adv-Inception-v3 (Inc-v3ens4), and ens-adv-Inception-ResNet-v2 (IncRes-v2ens) [31]).

(iii) Other details: the proposed method was compared with the other methods discussed in Section 3.1. All the experiments described in this paper were based on the L∞ norm. The momentum decay factor μ of MI-FGSM was set to 1.0, and the stability coefficient δ in Algorithm 1 was set to 10−8, as suggested in [17].

4.2. Attacking a Single Network

We first performed adversarial attacks on a single network with FGSM, I-FGSM, PGD, MI-FGSM, and AI-FGM. The adversarial examples were generated on the four normally trained networks and tested on all seven networks. The results are shown in Table 1, where the success rates are the misclassification rates of the corresponding models with adversarial examples as input. The decay factors β1 and β2 in AI-FGM were set to 0.99 and 0.999, respectively. The maximum perturbation ε was 16, and the number of iterations T for I-FGSM, PGD, MI-FGSM, and AI-FGM was 10 (these methods are hereafter referred to as "iterative methods" without ambiguity). The effects of these parameter choices are discussed further in this section.

As shown in Table 1, all four iterative methods attacked a white-box model with a near 100% success rate. AI-FGM performed better than the other methods on all black-box models. For example, for adversarial examples generated on Inc-v3, AI-FGM had a success rate of 60.7% on Inc-v4, whereas MI-FGSM, PGD, I-FGSM, and FGSM reached 54.3%, 18.9%, 27.5%, and 32.1%, respectively, demonstrating the effectiveness of the proposed method. An original image and the corresponding adversarial example generated for Inc-v3 by AI-FGM are shown in Figure 2.

4.2.1. Decay Factors

The terms β1 and β2 control not only the decay amplitude of the step sizes but also the accumulation intensity of the gradients in g_t and m_t. As such, they have a direct impact on attack success rates. We applied a grid-search method to identify an optimal set of β1 and β2 values. In the experiments, the maximum perturbation ε was set to 16, and the number of iterations T was set to 10. The values of β1 and β2 ranged from 0 to 1. Notably, we not only chose the values of β1 and β2 uniformly from 0.1 to 0.9 but also selected values close to 0 and 1, as shown in Figure 3. This was done to provide a more comprehensive study of the effects of the decay factors, as the relationship between the success rate and the decay factors may be nonlinear. Adversarial examples were then generated on Inc-v3 with AI-FGM and used to attack Inc-v3, Inc-v4, and Inc-v3ens4. It is evident from the figure that, regardless of the values of β1 and β2, attack success rates were always near 100% in the white-box environment. Beyond that, attack success rates on black-box models were clearly sensitive to the decay factors, regardless of whether the networks were normally or adversarially trained. Success rates were maximized when both β1 and β2 were close to 1.
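A hedged sketch of this sweep is shown below. The candidate values and the helper attack_success_rate(beta1, beta2), which would generate adversarial examples on Inc-v3 and measure the misclassification rate on a chosen target model, are placeholders for illustration and not part of the released code.

```python
import itertools

def grid_search(attack_success_rate,
                candidates=(0.001, 0.01, 0.1, 0.3, 0.5, 0.7, 0.9, 0.99, 0.999)):
    """Return the (beta1, beta2) pair with the highest success rate.
    attack_success_rate(beta1, beta2) is a user-supplied evaluation routine."""
    return max(itertools.product(candidates, candidates),
               key=lambda pair: attack_success_rate(*pair))
```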

4.2.2. The Number of Iterations

The effect of the number of iterations on success rates was studied by performing attacks with the iterative methods. The maximum perturbation ε was set to 16, and the decay factors β1 and β2 were set to 0.99 and 0.999, respectively. The number of iterations T ranged from 1 to 20. Adversarial examples generated on Inc-v3 with I-FGSM, MI-FGSM, and AI-FGM were then used to attack Inc-v3 and Inc-v4, as shown in Figure 4.

As shown in the figure, the black-box attack success rates of the iterative methods decreased as the number of iterations increased. However, AI-FGM always outperformed I-FGSM and MI-FGSM for a given value of T.

4.2.3. Perturbation Size

The effect of the adversarial perturbation size on attack success rates was studied by setting the number of iterations to 10, with decay factors of 0.99 and 0.999. The perturbation size ε ranged from 1 to 30. Adversarial examples generated on Inc-v3 with I-FGSM, MI-FGSM, and AI-FGM were used to attack Inc-v3 and Inc-v4, as shown in Figure 5.

It is evident from the figure that attack success rates can reach 100% in the white-box environment. Success rates on black-box models increased steadily with increasing ε, and AI-FGM always outperformed I-FGSM and MI-FGSM for a given value of ε. In other words, AI-FGM can achieve comparable black-box attack success rates with smaller perturbations.

4.3. Attacking an Ensemble of Networks

The experimental results presented above suggest that AI-FGM can increase the transferability of adversarial examples. In addition, the success rate of black-box attacks can be further improved by attacking an ensemble of networks. As discussed in Section 3.3, multiple models were attacked by fusing network logits. In this experiment, all seven networks described in Section 4.1 were used; we generated adversarial examples on the ensemble of Inc-v3, Inc-v4, IncRes-v2, and Res-101 with FGSM, I-FGSM, MI-FGSM, NI-FGSM, and AI-FGM, and then attacked the three defense models. The decay factors β1 and β2 were set to 0.99 and 0.999; the number of iterations T was 10; the maximum perturbation ε was 16; and each network had an equal ensemble weight of 1/4. The corresponding results are shown in Table 2; it is evident that AI-FGM was more effective than the other four methods on the adversarially trained models.

4.4. Combination with Advanced Methods

In this subsection, we combined AI-FGM with SI-NI-TI, SI-NI-DI, and SI-NI-TI-DI [18] and compared the black-box attack success rates of our extensions with those of the original methods under a single-model setting. The adversarial examples were generated on Inc-v3, with the number of iterations set to 10 and the maximum perturbation set to 16. It is noteworthy that combining our method with these attacks amounts to replacing their optimizer: for example, SI-AI-TI was obtained by replacing the Nesterov optimizer in SI-NI-TI with Adam. As shown in Table 3, our method SI-AI-TI-DI achieved an average attack success rate of 65.1%, surpassing the state-of-the-art attack by 10.7%.

Furthermore, we generated adversarial examples on the ensemble of models using SI-AI-TI-DI. As shown in Table 4, we achieved an average attack success rate of 95.0% on the adversarially trained models under the black-box setting, which raises a new security concern for robust deep neural networks.

5. Discussion

It is commonly acknowledged that the training of neural network models is similar to the generation of adversarial examples, especially for gradient-based generation methods. Hence, techniques used in the training of neural networks to improve model generalizability can also be adopted to improve the transferability of adversarial examples. Since the Adam optimizer is often used in neural network training to improve convergence and achieve better performance on test sets, we incorporated its second momentum term and decaying step size into the AI-FGM algorithm to improve the transferability of adversarial examples. Furthermore, we suggest that other techniques (such as data augmentation) could be used to further improve the performance of adversarial examples in black-box environments.

6. Conclusions

In this study, we proposed the Adam iterative fast gradient method to improve the transferability of adversarial examples. Specifically, the Adam algorithm was modified to increase its suitability for the generation of adversarial examples. The proposed method improved both the iteration update direction and step size. The effectiveness of the proposed method was verified by an extensive series of experiments with ImageNet. Compared with previous gradient-based adversarial example generation techniques, our method improved attack success rates in black-box environments. In addition, we further improved the transferability of adversarial examples by combining our approach with existing methods and attacking ensemble models, which achieved a state-of-the-art attack success rate against adversarially trained networks. We suggest the proposed method could be used as a reference for other iterative gradient-based methods. For example, data augmentation could be combined with AI-FGM to achieve better attack performance.

Data Availability

The public data (data set and models) can be downloaded from https://github.com/tensorflow/cleverhans/tree/master/examples/nips17_adverarial_competition/dataset, https://github.com/tensorflow/models/tree/master/research/slim, and https://github.com/tensorflow/models/tree/master/research/adv_imagenet_models. The source code of the proposed method in this paper is available at https://github.com/YinHeng121/Adam_attack.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Authors’ Contributions

Heng Yin and Hengwei Zhang contributed equally to this work.

Acknowledgments

This work was supported by the National Key Research and Development Program of China, under grant no. 2017YFB0801900.