Security Threats to Artificial Intelligence-Driven Wireless Communication Systems
Exploiting the Relationship between Pruning Ratio and Compression Effect for Neural Network Model Based on TensorFlow
Abstract
Pruning is a method of compressing the size of a neural network model, which affects the accuracy and computing time of the model's predictions. In this paper, we put forward the hypothesis that the pruning proportion is positively correlated with the compression scale of the model but not with the prediction accuracy or calculation time. To test the hypothesis, a group of experiments is designed, and MNIST is used as the data set to train a neural network model based on TensorFlow. Based on this model, pruning experiments are carried out to investigate the relationship between the pruning proportion and the compression effect. For comparison, six different pruning proportions are set, and the experimental results confirm the above hypothesis.
1. Introduction
Model compression is a common method to transplant artificial intelligence from the cloud to embedded terminals. Network pruning is a particularly effective compression solution for models [1, 2]. In [1, 3], Han et al. proposed a compression method based on pruning but did not investigate the relationship between pruning proportion and compression effect. Likewise, He et al. [2] studied channel pruning for accelerating very deep neural networks, yet the effect of the pruning rate on prediction is not stated. In fact, a number of studies of pruning methods have been carried out in recent years. However, to the best of our knowledge, there are very few studies on the relationship between the pruning proportion and the size, accuracy, and computing time used to make predictions. This is the motivation of our research.
In a trained neural network model, pruning sets all parameters whose magnitudes are below a specific threshold to zero. After pruning, retraining and sparsification are normally conducted, where sparsification deletes the connections with zero values to compress the size of the model [4, 5]. As an example, Figures 1 and 2 show the comparison before and after pruning: Figure 1 shows the original structural diagram, and Figure 2 shows the structural diagram after pruning.
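As an illustrative sketch (not the experiment's actual code), magnitude pruning can be expressed in a few lines of NumPy: every weight whose absolute value falls below the threshold is zeroed, and the surviving weights would then be fine-tuned during retraining.

```python
import numpy as np

def prune_by_threshold(weights, threshold):
    """Zero out all weights whose magnitude is below the threshold."""
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))
pruned, mask = prune_by_threshold(w, 0.5)

# Every surviving weight has magnitude >= 0.5; the rest are exactly zero.
assert np.all((np.abs(pruned) >= 0.5) | (pruned == 0.0))
```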
Here, based on TensorFlow, we use MNIST as the data set to train a neural network model. TensorFlow is an open-source machine learning framework. Specifically, it is software, and users build mathematical models by programming in Python and other languages. These models are used in artificial intelligence applications. MNIST is a handwritten digit data set with 60,000 handwritten digital images in the training set and 10,000 in the test set. It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal effort on preprocessing and formatting.
In this paper, we make the hypothesis that the pruning proportion is positively correlated with the compression scale of the model but not with the prediction accuracy or calculation time. Our research object is therefore the preliminary relationship between pruning proportion and compression effect in the neural network model. Specifically, this paper studies the relationship from three aspects: first, the relationship between pruning proportion and model size; second, the relationship between pruning proportion and model prediction accuracy; lastly, the relationship between pruning proportion and computing time for model predictions. To this end, a number of experiments are carried out to investigate the relationship between pruning proportion and compression effect, and the above hypothesis is confirmed, which is the main contribution of this paper.
The rest of this paper is organized as follows. In Section 2, the neural network model is proposed first. To test the hypothesis, an original model and an experimental plan are introduced in Section 3. Section 4 gives the experimental procedures, and Section 5 gives the experimental results and analysis. Finally, Section 6 concludes this paper.
2. Neural Network Model
A neural network consists of one input layer, one or several hidden layers, and one output layer, and every layer consists of a certain number of neurons. These neurons are interconnected, just like the nerve cells of humans. Figure 3 shows the structure of the neural network.
We assume that X_i is the ith individual (solution) in the population. The mutation operator aims to generate mutant solutions. For each solution X_i, a mutant solution V_i is created by the corresponding mutation scheme. Some classical mutation schemes are listed as follows:
(1) DE/rand/1: V_i = X_{r1} + F(X_{r2} − X_{r3})
(2) DE/rand/2: V_i = X_{r1} + F(X_{r2} − X_{r3}) + F(X_{r4} − X_{r5})
(3) DE/best/1: V_i = X_{best} + F(X_{r1} − X_{r2})
(4) DE/best/2: V_i = X_{best} + F(X_{r1} − X_{r2}) + F(X_{r3} − X_{r4})
where r1, r2, r3, r4, and r5 are five randomly selected individual indices between 1 and N (all different from i), the scale factor F ∈ [0, 1] is usually used, and X_{best} is the global best individual (solution).
The crossover operator focuses on recombining two different individuals to create a new one. In DE, a trial solution U_i is created based on the following crossover operation:
U_{i,j} = V_{i,j} if rand_j ≤ CR or j = j_r, and U_{i,j} = X_{i,j} otherwise,
where CR is called the crossover rate, the random value rand_j is in the range [0, 1], and j_r is a randomly selected dimension index. As seen, U_i inherits from V_i and X_i based on the value of CR. For a large CR, most dimensions of U_i are taken from V_i. For a small CR, most dimensions of U_i are taken from X_i; in the latter case, U_i is similar to its parent X_i.
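The mutation and crossover operators above can be sketched in NumPy as follows. This is a minimal illustration of DE/rand/1 mutation and binomial crossover, not a full DE optimizer; the population size, F, and CR values are arbitrary examples.

```python
import numpy as np

def de_rand_1(pop, i, F=0.5, rng=None):
    """DE/rand/1 mutation: V_i = X_r1 + F * (X_r2 - X_r3)."""
    if rng is None:
        rng = np.random.default_rng()
    # Pick three distinct indices, all different from i.
    candidates = [j for j in range(len(pop)) if j != i]
    r1, r2, r3 = rng.choice(candidates, size=3, replace=False)
    return pop[r1] + F * (pop[r2] - pop[r3])

def binomial_crossover(x, v, CR=0.9, rng=None):
    """Trial U: take v_j when rand_j <= CR or j == j_r, else x_j."""
    if rng is None:
        rng = np.random.default_rng()
    d = len(x)
    j_r = rng.integers(d)            # guarantees at least one mutant gene
    take_v = rng.random(d) <= CR
    take_v[j_r] = True
    return np.where(take_v, v, x)

rng = np.random.default_rng(1)
pop = rng.normal(size=(6, 3))        # population of N = 6 three-dimensional solutions
v = de_rand_1(pop, i=0, F=0.5, rng=rng)
u = binomial_crossover(pop[0], v, CR=0.9, rng=rng)
```

Every component of the trial solution u comes either from the mutant v or from the parent pop[0], exactly as the crossover rule states.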
3. Design of the Experiment
3.1. Structure of the Original Model
The basic neural network structure consists of the following layers in sequence: convolutional layer, pooling layer, convolutional layer, pooling layer, and two fully connected layers [6, 7], as shown in Figure 4. In the experiment plan, pruning is performed by default on the weight parameters of the two fully connected layers; alternatively, pruning can be performed on all network parameters, with the specific operation selected via command line parameters [8, 9].
3.2. Experiment Plan
The experiment is based on the TensorFlow framework and uses MNIST as the data set. An original model is trained in the beginning, and then six pruning practices with different pruning proportions are employed [10, 11]. For each pruning, retraining and sparsification are subsequently performed. When all three operations are completed on the original model, the task of pruning compression is finished [12, 13]. Then, the data (size, accuracy, and computing time for making predictions) are collected and analysed for comparison.
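Sparsification stores only the surviving connections so that the zeroed weights no longer occupy space. A minimal sketch of the idea in NumPy (not the experiment's actual on-disk format) keeps the nonzero values together with their indices and can reconstruct the dense matrix losslessly:

```python
import numpy as np

def to_sparse(dense):
    """Keep only nonzero entries as (indices, values, shape)."""
    idx = np.nonzero(dense)
    return idx, dense[idx], dense.shape

def to_dense(idx, values, shape):
    """Rebuild the dense matrix from the sparse triple."""
    out = np.zeros(shape, dtype=values.dtype)
    out[idx] = values
    return out

w = np.array([[0.0, 0.7, 0.0],
              [0.0, 0.0, -1.2]])
idx, vals, shape = to_sparse(w)
restored = to_dense(idx, vals, shape)

assert vals.size == 2                  # only two connections survive
assert np.array_equal(restored, w)     # lossless round trip
```

This also explains why low pruning proportions may not shrink the file: the index overhead per stored value can outweigh the savings until enough weights are zeroed.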
4. Experimental Procedures
4.1. Run Command of the Pruning Experiment
Model pruning is executed by the following command: python train.py -1 -2 -3 train_data_dir /tmp/mnist_data train_dir /tmp/mnist_train variables_dir /tmp/mnist_variables max_steps 10000 batch_size 32 sparse_ratio 0.9 pruning_variable_names w_fc1,w_fc2. Table 1 specifies the parameters in this command [14–16].

4.2. Pruning Effect View Command
The effects of the -1 or -2 parameters can be viewed through eval_predict_with_dense_network.py. The specific command is python eval_predict_with_dense_network.py test_data_dir /tmp/mnist_data checkpoint_dir /tmp/mnist_train/step_2_2 batch_size 32 max_steps 10. Table 2 specifies the parameters in this command [17–19].

The effect of -3 sparsification can be viewed through eval_predict_with_sparse_network.py. The specific command is python eval_predict_with_sparse_network.py test_data_dir /tmp/mnist_data checkpoint_dir /tmp/mnist_train/step_3 batch_size 32 max_steps 10. Table 3 specifies the parameters in the command.

4.3. Hardware and Software Configuration of the Experiment
Pruning experiments are based on the following hardware and software parameters and versions [20, 21]:
Operating system: Windows 10
GPU: NVIDIA GeForce GTX 1080 Ti, 11.0 GB
CPU: Intel(R) Core(TM) i3-4160 CPU @ 3.60 GHz
Memory: 16.0 GB DDR3
Disk: Lenovo SSD SL700 240 GB
Software: TensorFlow-GPU 1.5.0, Python 3.6
4.4. Construction of Experimental Environment
The experiment is based on the MNIST data set and the TensorFlow framework. The experimental environment was constructed in the following three steps [22–24]:
Step 1: construct the Python environment. Download Anaconda and run Anaconda3-4.3.1-Windows-x86_64.exe.
Step 2: install the NVIDIA GPU plugins. Run cuda_9.0.103_win10.exe for installation; then unzip cudnn-9.0-windows10-x64-v7.zip and copy its contents to the folder C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0.
Step 3: construct the TensorFlow environment. Execute the installation command at the CMD prompt: pip install tensorflow-gpu==1.5.0.
5. Experimental Results and Analysis
In this section, six different pruning proportions are employed. Six groups of tables show the specific data of pruning proportion, model size, accuracy, and computing time for predictions.
5.1. 10% Pruning Proportion
First, the pruning proportion is set to 10%; Table 4 shows the parameters of pruning effect in the first scene. In this group of experiments, the parameter threshold values of the two fully connected layers are set to 0.012034996 and 0.013038448. In this way, the valid parameter numbers are reduced from 3,211,264 and 10,240 to 2,890,137 and 9,215, respectively, making exactly 10% of the parameter values of the two fully connected layers equal to 0. However, the model size after the pruning, retraining, and sparsification is 66.5 M, which is larger than the size (37.5 M) of the original model. Hence, no compression effect is achieved. In addition, compared with the original model, the accuracy does not change, and the computing time for predictions slightly increases [25, 26].
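The per-layer thresholds above (0.012034996 and 0.013038448) are presumably chosen so that exactly 10% of each layer's weight magnitudes fall below them. A hedged sketch of how such a threshold could be derived with a percentile over the weight magnitudes:

```python
import numpy as np

def threshold_for_proportion(weights, proportion):
    """Magnitude cutoff below which roughly `proportion` of weights fall."""
    return np.percentile(np.abs(weights), proportion * 100.0)

rng = np.random.default_rng(42)
w = rng.normal(size=10000)               # stand-in for a layer's weights
t = threshold_for_proportion(w, 0.10)
pruned = np.where(np.abs(w) < t, 0.0, w)

zeroed = np.mean(pruned == 0.0)          # fraction zeroed, close to 10%
```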

5.2. 30% Pruning Proportion
Second, the pruning proportion is set to 30%; Table 5 shows the parameters of the pruning effect in the second scene. In this group of experiments, the parameter threshold values of the two fully connected layers are set to 0.036936015 and 0.039559085. In this way, the valid parameter numbers are reduced from 3,211,264 and 10,240 to 2,247,884 and 7,167, respectively, making exactly 30% of the parameter values of the two fully connected layers equal to 0. However, the model size after the pruning, retraining, and sparsification is 51.8 M, which is larger than the size (37.5 M) of the original model. Hence, no compression effect is achieved. Again, compared with the original model, the accuracy does not change, and the computing time for predictions slightly increases.

5.3. 50% Pruning Proportion
Third, the pruning proportion is set to 50%; Table 6 shows the parameters of the pruning effect in the third scene. In this group of experiments, the parameter threshold values of the two fully connected layers are set to 0.06429165 and 0.068891354. In this way, the valid parameter numbers are reduced from 3,211,264 and 10,240 to 1,605,631 and 5,119, respectively, making exactly 50% of the parameter values of the two fully connected layers equal to 0. The model size after pruning, retraining, and sparsification is 37.1 M, which is slightly smaller than the size (37.5 M) of the original model. Here, compression takes effect. Besides, both accuracy and computing time for predictions slightly decrease as compared with those of the original model.

5.4. 70% Pruning Proportion
Fourth, the pruning proportion is set to 70%; Table 7 shows the parameters of pruning effect in the fourth scene. In this group of experiments, the parameter threshold values of the two fully connected layers are set to 0.09749276 and 0.10360378. In this way, the valid parameter numbers are reduced from 3,211,264 and 10,240 to 963,379 and 3,071, respectively, making exactly 70% of the parameter values of the two fully connected layers equal to 0. The model size after pruning, retraining, and sparsification is 22.3 M, which is smaller than the size (37.5 M) of the original model. The compression effect is obvious. Moreover, both accuracy and computing time for predictions slightly decrease as compared with those of the original model.

5.5. 80% Pruning Proportion
Fifth, the pruning proportion is set to 80%; Table 8 shows the parameters of the pruning effect in the fifth scene. In this group of experiments, the parameter threshold values of the two fully connected layers are set to 0.11903707 and 0.12662686. In this way, the valid parameter numbers are reduced from 3,211,264 and 10,240 to 642,252 and 2,047, respectively, making exactly 80% of the parameter values of the two fully connected layers equal to 0. The model size after pruning, retraining, and sparsification is 14.9 M, which is smaller than the size (37.5 M) of the original model; the compression ratio is about 60%. Additionally, as compared with the original model, the accuracy slightly decreases and the computing time for predictions slightly increases.

5.6. 90% Pruning Proportion
Lastly, the pruning proportion is set to 90%; Table 9 shows the parameters of the pruning effect in the sixth scene. In this group of experiments, the parameter threshold values of the two fully connected layers are set to 0.14814831 and 0.15710811. In this way, the valid parameter numbers are reduced from 3,211,264 and 10,240 to 321,126 and 1,023, respectively, making exactly 90% of the parameter values of the two fully connected layers equal to 0. The model size after pruning, retraining, and sparsification is 7.6 M, compressed by about 80% relative to the original model (37.5 M). Furthermore, both accuracy and computing time for predictions slightly decrease as compared with those of the original model.

5.7. Comparison Results
Figure 5 shows the comparison results for the persisted model size of the four networks. As the pruning ratio increases, the model size represented by the red columns decreases gradually. Apparently, the pruning proportion is positively correlated with the compression scale of the model (i.e., negatively correlated with the model size).
Figure 6 shows the comparison results for the testing accuracy of the four networks. As the pruning ratio increases, the testing accuracy represented by the red columns shows no obvious change. This means that there is no positive relationship between pruning proportion and accuracy.
Figure 7 shows the comparison results for the computing time of the four networks. As the pruning ratio increases, the computing time for prediction represented by the red columns changes irregularly. Hence, there is also no positive relationship between pruning ratio and computing time for predictions.
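The model sizes reported in Sections 5.1 through 5.6 can be turned into compression ratios directly, which confirms the ~60% and ~80% figures quoted above and shows that low pruning proportions actually inflate the model:

```python
# Model sizes (in M) reported in Sections 5.1-5.6 against the 37.5 M original.
original_mb = 37.5
sizes_mb = {10: 66.5, 30: 51.8, 50: 37.1, 70: 22.3, 80: 14.9, 90: 7.6}

# Compression ratio = fraction of the original size saved.
compression = {p: 1.0 - s / original_mb for p, s in sizes_mb.items()}

# 10% and 30% pruning inflate the model (negative compression),
# while 80% and 90% pruning compress it by roughly 60% and 80%.
assert compression[10] < 0 and compression[30] < 0
assert round(compression[80], 2) == 0.60
assert round(compression[90], 2) == 0.80
```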
6. Conclusions
By comparing the experimental data of six different pruning proportions, it is found that pruning does not necessarily compress the size of the model: compression takes effect only when the pruning proportion reaches 50% or more. Furthermore, we found a positive relationship between the pruning proportion and the compression scale of the model. However, there is no positive relationship between pruning proportion and accuracy, or between pruning proportion and computing time for predictions.
Since there is no specific experimental verification on other models, the conclusion does not necessarily apply to them. Additionally, the experiment is based on the pruning method, and pruning is only one of various model compression methods; thus, the conclusion of this study is not applicable to other compression methods [27–29].
Data Availability
The data used to support the findings of this study can be accessed publicly at http://yann.lecun.com/exdb/mnist/.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
This work was supported by the key project of the Natural Science Research of Higher Education Institutions in Anhui Province (grant no. KJ2018A0461); Anhui Province Key Research and Development Program Project (grant no. 201904a05020091); and a provincial quality engineering project from Department of Education Anhui Province (grant no. 2019mooc283).
References
[1] S. Han, J. Pool, S. Narang, H. Mao et al., "DSD: dense-sparse-dense training for deep neural networks," in Proceedings of the International Conference on Learning Representations (ICLR), pp. 1–13, Toulon, France, April 2017.
[2] Y. He, X. Zhang, and J. Sun, "Channel pruning for accelerating very deep neural networks," in Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), 2017.
[3] S. Han, J. Kang, H. Mao et al., "ESE: efficient speech recognition engine with sparse LSTM on FPGA," in Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays—FPGA '17, pp. 75–84, Monterey, California, USA, February 2017.
[4] H. Li, A. Kadav, I. Durdanovic, H. Samet, and H. P. Graf, "Pruning filters for efficient ConvNets," 2016, https://arxiv.org/abs/1608.08710.
[5] Y. Zhang, G. Cui, S. Deng et al., "Efficient query of quality correlation for service composition," IEEE Transactions on Services Computing, 2018.
[6] C. Wan, X. Yan, D. Zhang, Z. Qu et al., "An advanced fuzzy Bayesian-based FMEA approach for assessing maritime supply chain risks," Transportation Research Part E: Logistics and Transportation Review, vol. 125, pp. 222–240, 2019.
[7] L. Qi, Y. Chen, Y. Yuan, S. Fu et al., "A QoS-aware virtual machine scheduling method for energy conservation in cloud-based cyber-physical systems," World Wide Web, vol. 23, no. 2, pp. 1275–1297, 2019.
[8] Z. Huang, G. Shan, J. Cheng, and J. Sun, "TRec: an efficient recommendation system for hunting passengers with deep neural networks," Neural Computing and Applications, vol. 31, no. 1, pp. 209–222, 2019.
[9] S. Anwar, K. Hwang, and W. Sung, "Structured pruning of deep convolutional neural networks," ACM Journal on Emerging Technologies in Computing Systems (JETC), vol. 13, no. 3, p. 32, 2017.
[10] S. Han, H. Mao, and W. J. Dally, "Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding," 2015, https://arxiv.org/abs/1510.00149.
[11] B. Wu, X. Yan, Y. Wang, and C. Guedes Soares, "An evidential reasoning-based CREAM to human reliability analysis in maritime accident process," Risk Analysis, vol. 37, no. 10, pp. 1936–1957, 2017.
[12] B. Wu, L. Zong, X. Yan, and C. Guedes Soares, "Incorporating evidential reasoning and TOPSIS into group decision-making under uncertainty for handling ship without command," Ocean Engineering, vol. 164, pp. 590–603, 2018.
[13] Y. Wang, E. Zio, X. Wei, D. Zhang, and B. Wu, "A resilience perspective on water transport systems: the case of Eastern Star," International Journal of Disaster Risk Reduction, vol. 33, pp. 343–354, 2019.
[14] L. Qi, X. Zhang, S. Li, S. Wan et al., "Spatial-temporal data-driven service recommendation with privacy-preservation," Information Sciences, vol. 515, pp. 91–102, 2020.
[15] R. Zhu, Z. Sun, T. Ristaniemi, and J. Hu, "Special issue on green telecommunications," Telecommunication Systems, vol. 52, no. 2, pp. 1233–1234, 2013.
[16] R. Zhu, W. Shu, T. Mao, and T. Deng, "Enhanced MAC protocol to support multimedia traffic in cognitive wireless mesh networks," Multimedia Tools and Applications, vol. 67, no. 1, pp. 269–288, 2013.
[17] D. Zhang, R. Zhu, S. Men, and V. Raychoudhury, "Query representation with global consistency on user click graph," Journal of Internet Technology, vol. 14, no. 5, pp. 759–769, 2013.
[18] J. Chang, H. Chao, C. Lai, and R. Zhu, "An efficient geographic routing protocol design in vehicular ad hoc networks," Computing, vol. 96, no. 2, pp. 119–131, 2014.
[19] K. Zhu, R. Zhu, H. Nii, H. Samani et al., "PaperIO: a 3D interface towards the internet of embedded papercraft," IEICE Transactions on Information and Systems, vol. E97.D, no. 10, pp. 2597–2605, 2014.
[20] Y. Jalaeian, C. Yin, Q. Wu et al., "Location-aware deep collaborative filtering for service recommendation," IEEE Transactions on Systems, Man, and Cybernetics: Systems (TSMC), pp. 1–12, 2019.
[21] Y. Ma, W. Cho, J. Chen, Y. Huang, and R. Zhu, "RFID-based mobility for seamless personal communication system in cloud computing," Telecommunication Systems, vol. 58, no. 3, pp. 233–241, 2015.
[22] L. Qi, Q. He, F. Chen et al., "Finding all you need: web APIs recommendation in web of Things through keywords search," IEEE Transactions on Computational Social Systems, vol. 6, no. 5, pp. 1063–1072, 2019.
[23] X. Xu, Y. Xue, L. Qi, Y. Yuan, X. Zhang et al., "An edge computing-enabled computation offloading method with privacy preservation for internet of connected vehicles," Future Generation Computer Systems, vol. 96, pp. 89–100, 2019.
[24] K. Guo, S. Han, S. Yao, Y. Wang et al., "Software-hardware co-design for efficient neural network acceleration," IEEE Micro, vol. 37, no. 2, pp. 8–25, 2017.
[25] X. Xu, Y. Li, T. Huang, Y. Xue et al., "An energy-aware computation offloading method for smart edge computing in wireless metropolitan area networks," Journal of Network and Computer Applications, vol. 133, pp. 75–85, 2019.
[26] Y. Peng, K. Wang, Q. He et al., "Covering-based web service quality prediction via neighborhood-aware matrix factorization," IEEE Transactions on Services Computing, 2019.
[27] X. Xu, Q. Liu, Y. Luo, K. Peng et al., "A computation offloading method over big data for IoT-enabled cloud-edge computing," Future Generation Computer Systems, vol. 95, pp. 522–533, 2019.
[28] B. Jalaeian, R. Zhu, H. Samani, and M. Motani, "An optimal cross-layer framework for cognitive radio network under interference temperature model," IEEE Systems Journal, vol. 10, no. 1, pp. 293–301, 2014.
[29] T. Zhou, C. Wu, J. Zhang, and D. Zhang, "Incorporating CREAM and MCS into fault tree analysis of LNG carrier spill accidents," Safety Science, vol. 96, pp. 183–191, 2019.
Copyright
Copyright © 2020 Bo Liu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.