Communication Security in Socialnet-Oriented Cyber Spaces 2021
Disguise of Steganography Behaviour: Steganography Using Image Processing with Generative Adversarial Network
Deep learning based image steganalysis has become a serious threat to modification-based image steganography in recent years. Generation-based steganography directly produces stego images from secret data and can resist advanced steganalysis algorithms. This paper proposes a novel generation-based steganography method that disguises stego images as images processed by common operations (e.g., histogram equalization and sharpening). Firstly, an image processing model is trained using DCGAN and WGAN-GP, which is used to generate images processed by these common operations. Then, noise mapped from secret data is input into the trained model, and the obtained stego image is indistinguishable from a normally processed image. In this way, the steganographic process is covered by the process of image processing, leaving little embedding trace. As a result, the security of steganography is guaranteed. Experimental results show that the proposed scheme has better security performance than existing steganographic methods when checked by state-of-the-art steganalytic tools, demonstrating its superiority and applicability.
Data hiding is an important technique to solve security problems and protect data. Steganography is an important branch of data hiding, which can be divided into image steganography [1, 2], audio steganography [3, 4], and video steganography [5, 6], according to the carrier. Image steganography aims to conceal secret data within cover images transmitted through public channels without causing suspicion [7, 8]. As the adversary, image steganalysis aims to reveal the presence of steganography by inspecting the images transmitted on public channels [9, 10]. Image steganography and steganalysis have developed rapidly in mutual opposition. Traditional image steganography methods are mainly embedding based, in which secret data is embedded into a cover image by modification [11, 12].
At present, the most advanced approach is content-adaptive steganography (e.g., HILL, S-UNIWARD, and WOW). The main idea is to minimize a heuristically defined embedding distortion to achieve high undetectability. In recent years, GAN (generative adversarial network) has gradually been combined with traditional steganography [17–20]. These methods are still modification-based steganography, in which modification traces will inevitably be left during embedding and detected by a modern steganalyzer. Modern steganalysis mainly uses supervised machine learning to classify cover and stego images. Features are first extracted from an image training set to train a steganalytic model, which is then used to identify suspicious images [22–24]. The ensemble classifier is commonly used to improve detection accuracy and has been proved effective in steganalysis. In recent years, deep learning based image steganalysis has performed well, combining feature extraction and classification in a single network [26–28]. Qian et al. first used the KV kernel from traditional steganalysis to filter the image into a residual image and used a Gaussian activation function to extract more effective features; the final performance is slightly inferior to SRM. In XuNet, the Tanh activation function is used in the first two layers of the network, 1 × 1 convolutions are used in the upper layers, and a batch normalization layer is added after each convolution layer to prevent the model from converging to a local minimum. The performance of XuNet is comparable to that of the rich model. YeNet puts the 30 high-pass kernels of the SRM into the first layer of the model to extract residual information, combines them with channel selection information, and uses truncated linear units to filter the features. Its performance exceeds SRM and its variant algorithms. Up to now, deep learning based image steganalysis is becoming a serious threat to modification-based steganography.
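The residual-extraction step mentioned above can be illustrated concretely. The sketch below convolves a grayscale image with the 5 × 5 KV high-pass kernel, the preprocessing used by early CNN steganalyzers so that the network sees noise-like stego residuals rather than image content. This is a minimal pure-Python illustration, not code from any of the cited steganalyzers.

```python
# The 5x5 KV high-pass kernel (conventionally scaled by 1/12); its
# weights sum to 0, so smooth image regions produce near-zero residuals.
KV = [[-1,  2,  -2,  2, -1],
      [ 2, -6,   8, -6,  2],
      [-2,  8, -12,  8, -2],
      [ 2, -6,   8, -6,  2],
      [-1,  2,  -2,  2, -1]]

def kv_residual(img):
    """Valid 2D convolution of a grayscale image (list of rows) with KV."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(h - 4):
        row = []
        for j in range(w - 4):
            s = sum(KV[a][b] * img[i + a][j + b]
                    for a in range(5) for b in range(5))
            row.append(s / 12.0)
        out.append(row)
    return out

# A perfectly flat image has zero residual because the kernel sums to 0.
flat = [[128] * 8 for _ in range(8)]
print(kv_residual(flat)[0][0])  # 0.0
```

Embedding changes individual pixels, so it shows up in this residual far more strongly than in the raw pixel values, which is why such filters make steganalysis features more effective.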
Steganography without embedding (SWE) means that no modification of a carrier is needed to realize information hiding. Compared with embedding-based steganography, it leaves no trace of modification and can resist the state-of-the-art deep learning based image steganalyzers. As a type of SWE, generation-based steganography has been developed in recent years [21, 29–31], in which the stego image is produced directly from the secret data. In [29–31], image generation steganography methods are designed by mapping the secret information to semantic features of the generated images. Their disadvantages are the small steganographic capacity and the fact that the generated stego image is a fixed type of texture-synthesis or fingerprint image, which may attract the attention of the warden. Hu et al. propose an image generation steganography method based on DCGAN that maps secret information to random noise according to certain rules.
In some scenarios, images are processed (e.g., histogram equalization and sharpening) for better visual quality. Since these operations are widely used in image processing, processed images will not cause noticeable suspicion. Thus, processed images can be used for steganography. However, the undetectability of steganography decreases when the cover image is processed. Therefore, in this paper we propose a novel generation-based steganography method that disguises steganography as an image processing operation instead of modifying the processed image, in order to resist state-of-the-art steganalyzers. Specifically, we first train image processing generation models based on DCGAN and WGAN-GP; the trained models can then generate high-quality processed images from random noise mapped from secret data. In this way, the steganographic process is covered by the process of image processing, and no embedding trace is left during image steganography. Compared with [29, 30], the generated stego images have better diversity. In addition, our method has better security performance than existing steganographic methods when checked by state-of-the-art steganalytic tools.
2. Related Work
This section introduces the work related to the proposed image generation steganography. Since our steganographic method is based on DCGAN and WGAN-GP, we first introduce the basic principles of GAN and its variants, together with their respective advantages and disadvantages. Then, recent applications of GAN and its variants in the field of image steganography are described.
2.1. GAN and Its Variants
GAN is a deep unsupervised learning model proposed in 2014. It is characterized by its ability to generate samples whose distribution is as close as possible to that of the target samples. GAN uses game theory to train two neural networks: a generator model G and a discriminator model D. G converts random noise that obeys a prior distribution into generated samples, so that the distribution of the generated samples is as close as possible to the distribution of the real data; D tries to determine whether an input sample is a real sample or a generated one. The training process of GAN can be summarized as a minimax game: discriminator D tries to maximize the probability of correctly distinguishing real samples from generated samples, while generator G tries to maximize the probability that D cannot distinguish the generated samples. The objective function for training GAN is shown in (1), where G(z) represents the sample generated by G from the input random noise vector z, and D(x) represents the probability that D judges sample x to be real:

\min_{G}\max_{D} V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_{z}(z)}[\log(1 - D(G(z)))]. (1)
GAN has become a research hotspot since it was proposed. However, GAN suffers from problems such as unstable training, vanishing gradients, and mode collapse, so new variants constantly appear, such as DCGAN, WGAN, and WGAN-GP, which have greatly promoted the development of GAN.
DCGAN is a substantial improvement over GAN, mainly in the network structure. DCGAN almost completely replaces fully connected layers with convolutional layers, and its discriminator is nearly symmetric to its generator. The network contains no pooling or explicit upsampling layers; instead, strided (and fractionally strided) convolutions are used to change spatial resolution, which increases the stability of training. The network structure of DCGAN is widely used to improve the stability of GAN training and the quality of generated results.
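The "strided convolution instead of upsampling" idea above can be made concrete with a size calculation. The sketch below shows how a DCGAN-style generator grows a 4 × 4 feature map to 64 × 64 with fractionally strided (transposed) convolutions; the kernel = 4, stride = 2, padding = 1 configuration is the usual DCGAN choice, assumed here rather than taken from the paper.

```python
# Output edge length of a 2D transposed convolution; with kernel=4,
# stride=2, padding=1 each layer exactly doubles the resolution.
def conv_transpose_out(size, kernel=4, stride=2, padding=1):
    return (size - 1) * stride - 2 * padding + kernel

# A 100-d noise vector is first projected and reshaped to a 4x4 map;
# four transposed-conv layers then take it to the 64x64 output size.
size = 4
sizes = [size]
for _ in range(4):
    size = conv_transpose_out(size)
    sizes.append(size)
print(sizes)  # [4, 8, 16, 32, 64]
```

This matches the 64 × 64 pixel images used throughout the experiments in Section 4.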
Although DCGAN has a good structure, it still cannot fully stabilize GAN training, and the training of G and D needs to be balanced carefully. Unlike DCGAN, WGAN improves GAN mainly from the perspective of the loss function; it theoretically solves the problems of unstable training and mode collapse and generates more diverse samples. The WGAN-GP model puts forward a new way to enforce the continuity (Lipschitz) constraint, the gradient penalty, which solves the problems of gradient vanishing and gradient explosion and converges faster than standard WGAN. In addition, WGAN-GP can generate higher-quality samples without much parameter tuning during training.
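The gradient penalty can be sketched in one dimension: the critic's gradient norm at points interpolated between real and fake samples is pushed toward 1. The toy below uses a linear stand-in critic and a numerical gradient; the penalty weight lambda = 10 is the usual WGAN-GP setting, assumed here, and none of this is the paper's actual model code.

```python
import random

def critic(x, w=3.0):
    return w * x  # toy linear critic; its gradient is w everywhere

def grad_norm(f, x, h=1e-5):
    # central-difference estimate of |f'(x)|
    return abs((f(x + h) - f(x - h)) / (2 * h))

def gradient_penalty(x_real, x_fake, lam=10.0):
    eps = random.random()                      # uniform in [0, 1)
    x_hat = eps * x_real + (1 - eps) * x_fake  # random interpolation
    return lam * (grad_norm(critic, x_hat) - 1.0) ** 2

# The toy critic has slope 3 everywhere, so the penalty is
# approximately 10 * (3 - 1)^2 = 40 regardless of the interpolation.
print(round(gradient_penalty(1.0, -1.0)))  # 40
```

In a real WGAN-GP training step this term is added to the critic loss, replacing WGAN's weight clipping.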
2.2. GAN for Image Steganography
In recent years, steganalysis has gradually been combined with deep learning and has developed considerably, posing a great threat to image steganography. Research on deep learning based image steganography has also made progress; these works mainly achieve image steganography by training neural networks on large-scale sample sets. Zhang et al. add the gradient of the loss function of a target neural steganalyzer as adversarial noise to the input image, forming an adversarial example that resists detection by the steganalyzer. However, the disadvantage of the adversarial example approach is that it is difficult to extract the secret information. Tang et al. propose an automatic steganographic distortion learning framework, ASDL-GAN, based on the STC algorithm. ASDL-GAN achieves adaptive embedding of secret information into the carrier image, but its performance is slightly inferior to that of S-UNIWARD. Variants of the ASDL-GAN model were proposed later. Since ASDL-GAN and its variants still embed secret information by modifying image pixel values, they leave traces of modification and will inevitably be detected by more advanced steganalyzers. Volkhonskiy et al. design an image steganographic model, SGAN, based on DCGAN. The model includes a generator G, a discriminator D, and a steganalyzer S, whose parameters are updated through alternating training to improve the model's performance. SGAN generates carrier images with a generative adversarial network, while the steganographic process still uses traditional steganographic algorithms. SGAN does improve security, but the generated carrier images are not realistic enough, which may attract the attention of channel listeners. In this paper, we implement image generation steganography within the process of image processing by using different GAN models and test the security of the resulting stego images.
3. Proposed Image Steganographic Scheme
The flowchart of the proposed method is shown in Figure 1. Firstly, the original image set is processed by the common image processing operations of histogram equalization and sharpening to form processed image sets. Then, generation models based on DCGAN and WGAN-GP are trained with the processed image sets. Next, we train corresponding extraction models, according to the recovery error of the random noise vector, to extract secret data from the stego images produced by the generation models. At last, following the method of , the sender divides the secret data to be transmitted into several segments, maps each segment to a noise vector zi through the mapping rules, and then inputs zi into the trained generation model to generate a processed stego image. The receiver uses the trained extraction model to extract the noise vector z′ from the received processed stego image and recovers the secret data according to the reverse mapping rules.
3.1. Training Image Processing Generation Model
Since our idea is to embed secret information in the process of image processing, the image processing generation model needs to be trained first. The generation model uses two excellent GAN variants, DCGAN and WGAN-GP. Before training, histogram equalization and sharpening are performed on the original image set to form two processed image sets. Then, the two processed image sets are used to train DCGAN and WGAN-GP, respectively, until the two models converge and generate high-quality images.
3.2. Training the Extraction Model
The purpose of the extraction model is to extract the noise vector from the processed stego image generated by the generation model and then recover the secret data according to the mapping rules. The structure of the extraction model is shown in Figure 2. It is similar to the structure of the discriminator D in DCGAN and is mainly composed of convolutional layers and fully connected layers. To enhance the ability of nonlinear learning and speed up network convergence, each convolutional layer of the extraction model uses a LeakyReLU activation function and batch normalization. So that the output noise values of the extraction model lie between −1 and 1, we set the output layer activation of the network to tanh.
The training process of the extraction model is as follows: Firstly, we continuously generate random noise vectors z with the same dimension as zi and values between −1 and 1. Each vector z is input into the trained image processing generation model to generate a processed stego image, which is then input to the extraction model to extract the noise vector z′. The squared loss between the input random noise vector z and the recovered noise vector z′ is defined as the loss function of the extraction model. During training, when the loss function is small enough and the network converges, we can use the model to extract noise vectors from processed stego images.
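The inversion training described above can be illustrated with a drastically simplified analogue: a fixed "generator" maps noise z to an observation x, and an extractor is fitted by gradient descent on the squared loss between z and the extracted value. The real generator and extractor are CNNs; the linear stand-ins below are hypothetical and only illustrate the training objective.

```python
import random

def g(z):
    return 2.0 * z + 1.0  # toy stand-in generator (an assumption)

# Extractor e(x) = a*x + b, trained by SGD on the loss (z - e(g(z)))^2,
# mirroring how the paper's extraction model minimizes squared noise error.
a, b = 0.0, 0.0
lr = 0.05
random.seed(0)
for _ in range(2000):
    z = random.uniform(-1, 1)
    x = g(z)
    err = (a * x + b) - z     # extraction error on this sample
    a -= lr * 2 * err * x     # gradient of err^2 w.r.t. a
    b -= lr * 2 * err         # gradient of err^2 w.r.t. b

# The learned extractor approximates the inverse x -> (x - 1) / 2.
print(round(a, 2), round(b, 2))  # about 0.5 and -0.5
```

Because the toy inverse is exactly realizable, the loss can be driven to zero; the paper's CNN extractor retains a small residual loss, which is why recovery accuracy stays below 100% (see Section 4.6).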
3.3. Communication of Secret Information
We refer to the method of reference  to map secret information into ranges of random noise values. Firstly, the sender divides the secret information into segments and then maps each segment into a corresponding noise vector according to the mapping rules. In the mapping process, each segment of the divided secret information is mapped to a noise value within a given interval according to the following equation:

z_i = \mathrm{random}\left(-1 + \frac{2m}{2^{\sigma}} + \delta,\; -1 + \frac{2(m+1)}{2^{\sigma}} - \delta\right). (2)

In the equation, the function random draws the mapped noise value uniformly from the given interval. m is the decimal value of the secret information bits to be mapped, and σ is a positive integer variable representing the number of secret information bits mapped to one noise value. δ represents the gap between the divided intervals; it allows a small amount of deviation when extracting secret information from the processed stego images, so that the accuracy of secret information extraction can be ensured.
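The mapping rule can be sketched in a few lines: the range [−1, 1) is split into 2^σ equal intervals, the σ-bit segment value m selects an interval, and the noise value is drawn from that interval shrunk by the gap δ at each side. The exact interval layout below is our reading of the rule (following the cited noise-mapping method), not verbatim code from the paper.

```python
import random

def bits_to_noise(m, sigma=3, delta=0.01):
    """Map a sigma-bit segment value m to a noise value in its interval."""
    width = 2.0 / (2 ** sigma)
    low = -1.0 + m * width + delta
    high = -1.0 + (m + 1) * width - delta
    return random.uniform(low, high)

def noise_to_bits(z, sigma=3):
    """Reverse mapping: find which interval the extracted noise falls in."""
    width = 2.0 / (2 ** sigma)
    m = int((z + 1.0) // width)
    return min(m, 2 ** sigma - 1)  # clamp the z = 1.0 edge case

# Round trip: every 3-bit value survives mapping and reverse mapping;
# the gap delta also tolerates extraction deviations smaller than delta.
for m in range(8):
    assert noise_to_bits(bits_to_noise(m)) == m
print("round trip ok")
```

The gap δ is what buys robustness: an extracted noise value perturbed by less than δ still falls inside the correct interval.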
Subsequently, the sender generates processed stego images from the mapped noise values through the trained image processing generation model and transmits them to the receiver. After receiving the processed stego images, the receiver extracts the corresponding noise vector through the trained extraction model and restores the corresponding secret information according to the reverse mapping rules. Since the mapping rule is given by Formula (2), the extraction process also requires Formula (2) and its parameters.
4. Experimental Results and Analysis
In this paper, we use the CelebA (celebrity faces) dataset to train the models. This dataset contains about 200,000 face images and is one of the most commonly used datasets. Before the experiment, all images are first cropped to 64 × 64 pixels, and then we perform histogram equalization and sharpening on the cropped images with MATLAB to form two new datasets, dataset 1 and dataset 2. We use dataset 1 and dataset 2 to train the image processing generation models based on DCGAN and WGAN-GP, respectively. The models are trained using minibatch stochastic gradient descent (SGD) with a minibatch size of 64. The initial learning rates of the DCGAN and WGAN-GP models are set to 0.0002 and 0.0001, respectively. In addition, the Adam optimization method is used to adjust the learning rates automatically during training, which makes the network models converge more easily and learn the network parameters smoothly. The experiments use an NVIDIA Tesla P100-PCIE-16GB GPU, and the experimental environment is TensorFlow.
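The histogram equalization preprocessing is performed with MATLAB in the paper; for readers working in Python, the textbook CDF remapping it implements can be sketched as below. This is only an equivalent illustration of the operation, not the exact MATLAB routine.

```python
# Minimal pure-Python histogram equalization for an 8-bit grayscale
# image (list of rows): remap each pixel through the normalized CDF.
def hist_equalize(img, levels=256):
    flat = [p for row in img for p in row]
    n = len(flat)
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    cdf, total = [], 0            # cumulative distribution function
    for h in hist:
        total += h
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    def remap(p):
        return round((cdf[p] - cdf_min) / (n - cdf_min) * (levels - 1))
    return [[remap(p) for p in row] for row in img]

# A low-contrast image is stretched across the full 0..255 range.
img = [[100, 100], [101, 102]]
print(hist_equalize(img))  # [[0, 0], [128, 255]]
```

Sharpening, the other operation used to build dataset 2, is analogously a fixed local filter (e.g., unsharp masking), so both cover operations are deterministic and widely expected on public channels.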
The experiment consists of four parts: In the first part, we train image processing generation models based on DCGAN and WGAN-GP with dataset 1 and dataset 2, respectively, and compare the convergence of the two network models on the two training sets and the quality of the generated images. In the second part, we train the extraction model based on the output of the trained image processing generation model. In the third part, we generate processed stego images from mapped noise values through the trained image processing generation model, use the trained extraction model to extract the noise vector and recover the secret information, and then test the recovery accuracy of the embedded secret information. In the fourth part, we test the security of the proposed steganographic method with several state-of-the-art steganalysis methods.
4.1. Training Image Processing Generation Model
In the process of training the image processing generation models, we train DCGAN and WGAN-GP with dataset 1 and dataset 2, respectively, to generate high-quality histogram-equalized and sharpened images. In the experiment, the random noise vector input to the generation model has dimension 100. The DCGAN model is trained for 100 epochs with an initial learning rate of 0.0002 and a minibatch size of 64; the WGAN-GP model is trained for 200 epochs with an initial learning rate of 0.0001 and the same minibatch size. The loss curves of training the generation models are shown in Figure 3.
Through comparative experiments, we observe that the DCGAN model converges more easily than the WGAN-GP model, but its parameters need to be adjusted continuously during training, whereas WGAN-GP trains more stably and requires no parameter adjustment. Moreover, the histogram equalization generation model converges more easily than the image sharpening generation model. The processed stego images generated by the different image processing generation models are shown in Figure 4. As can be seen from Figure 4, the processed images generated by the WGAN-GP-based model are more realistic and of higher quality than those generated by the DCGAN-based model.
4.2. Training the Extraction Model
Since the extraction model is mainly composed of convolutional and fully connected layers, it needs a large dataset for training. In this paper, a large number of random noise vectors are first input to the trained image processing generation model to generate processed stego images, and then we train the extraction model by minimizing the mean square error between each input noise vector and the extracted noise vector. During training, the inputs of the extraction model are processed stego images of 64 × 64 pixels generated by the trained image processing generation model, and Adam optimization is used. We set the initial learning rate of the extraction model to 0.0002 and the minibatch size to 100. We record 500 batches as one iteration; the loss curves of the extraction models corresponding to the four image processing generation models over 100 iterations are shown in Figure 5. The loss curves show that during the initial training period the loss fluctuates violently, which means that the ability to extract the noise vector is still weak. Figure 5 also shows that the loss of the extraction model corresponding to the histogram equalization generation model converges faster than that of the extraction model corresponding to the image sharpening generation model. After about 80 iterations for the former and 90 iterations for the latter, the loss curves gradually stabilize, which means that the ability to extract the noise vector has become very strong.
4.3. Secret Information Recovery Accuracy
For a steganographic method, it is important to extract the embedded information completely. We define the recovery accuracy as the ratio of the number of secret information bits correctly recovered from the processed stego image to the number of original secret information bits. Here, we test this important index. It can be seen from  that different σ values have a certain effect on the recovery accuracy. Therefore, in this part of the experiment, we extract the secret information from the processed stego images generated by the four image processing generation models under different σ with δ = 0.01, study the corresponding recovery accuracy, and compare it with the steganographic method proposed in  under the same conditions. The results are shown in Figure 6. It can be seen from Figure 6 that the smaller the payload, the higher the recovery accuracy. In addition, under the same conditions, the recovery accuracy of the generated sharpening stego images is higher than that of the generated histogram equalization stego images. Finally, the recovery accuracy of the processed stego images generated by our image processing generation models is slightly higher than that of the stego images generated in  under the same conditions.
In addition, in order to test the robustness of the proposed steganographic method, we add 10 dBW Gaussian noise to the generated processed stego images and test the recovery accuracy under the condition of σ = 1 and δ = 0.01; the results are shown in Table 1. Comparing Figure 6 and Table 1, we can see that adding Gaussian noise to the generated processed stego images does reduce the recovery accuracy of the secret information to a certain extent, but the reduction is very small.
4.4. Security Analysis
The characteristic of the SWE method is that there is no modification difference between cover images and stego images, so its undetectability is usually relatively strong [29, 30]. Since the steganographic method proposed in this paper is also a kind of SWE, in theory it can effectively resist detection by steganalysis. Next, to study the undetectability of the proposed method, we use state-of-the-art image steganalysis methods to detect the generated processed stego images. The SRM proposed in  is a state-of-the-art feature extractor, which, combined with an ensemble classifier, constitutes a high-performance steganalyzer in the spatial domain. Ye et al. proposed a CNN-based spatial steganalysis model, which is characterized by initializing the first layer with high-pass filters and uses the TLU activation function and a selection-channel-aware scheme for steganalysis. We use the trained image processing generation model to generate 5000 processed stego images and detect them with these state-of-the-art steganalyzers. This experiment is divided into two parts: In the first part, 5000 processed cover images are taken from dataset 1 and dataset 2, respectively, and then they are embedded with the state-of-the-art spatial steganographic algorithms HILL, S-UNIWARD, and WOW at different payloads to generate the corresponding stego images. Next, we use these cover images and corresponding stego images to train the steganalyzer and test the 5000 stego images generated by the image processing generation model. The criterion for evaluating performance on the processed stego image set is the minimum total error on the test set:

P_E = \min_{P_{FA}} \frac{1}{2}\left(P_{FA} + P_{MD}\right), (3)

where P_{FA} represents the false alarm rate and P_{MD} represents the missed detection rate. The testing error of the processed stego images generated by the different image processing generation models under different steganalyzers is shown in Tables 2 and 3.
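The minimum total error criterion used above can be computed by sweeping the decision threshold over the steganalyzer's scores and taking the smallest average of false alarm and missed detection rates. The sketch below does this in pure Python; the score lists are made-up toy values, not real detector output.

```python
# Minimum total error P_E = min over thresholds of (P_FA + P_MD) / 2,
# where a score >= threshold is classified as "stego".
def p_e(cover_scores, stego_scores):
    thresholds = sorted(set(cover_scores + stego_scores))
    best = 1.0
    for t in thresholds:
        p_fa = sum(s >= t for s in cover_scores) / len(cover_scores)  # false alarms
        p_md = sum(s < t for s in stego_scores) / len(stego_scores)   # misses
        best = min(best, 0.5 * (p_fa + p_md))
    return best

cover = [0.1, 0.2, 0.3, 0.4]   # hypothetical cover scores
stego = [0.35, 0.6, 0.7, 0.8]  # hypothetical stego scores
print(p_e(cover, stego))       # 0.125: one stego sample overlaps the covers
```

A P_E near 0.5 means the steganalyzer is no better than random guessing, which is the ideal outcome for the steganographer; a P_E near 0 means reliable detection.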
In the Algorithm column of Tables 2 and 3, DCGAN1, DCGAN2, DCGAN3, WGAN-GP1, WGAN-GP2, and WGAN-GP3 represent the test results of the processed stego images generated by the two image processing generation models under different steganalyzers. The steganalyzers are trained with the HILL, S-UNIWARD, and WOW steganographic algorithms at different payloads. The testing error rate comparison curves of the processed stego images generated by the different image processing generation models under different steganalyzers are shown in Figure 7.
Figure 7 intuitively shows the undetectability comparisons of the processed stego images generated by the different image processing generation models under different steganalyzers; Tables 2 and 3 give all numerical values. The experimental results indicate that using a traditional steganographic method to embed secret information in a processed cover image yields stego images with poor resistance to detection. As the steganographic payload increases, the undetectability of such processed stego images gradually deteriorates; in particular, when the payload reaches 0.5 bpp, the detection error approaches 0. Through comparative experiments, we observe that the processed stego images generated by our image processing generation models significantly improve undetectability, and the improvement becomes more obvious as the payload increases. In addition, under the same conditions, the processed stego images generated by the WGAN-GP-based model have stronger undetectability than those generated by the DCGAN-based model.
During the second part of the experiment, we use the processed cover images and the corresponding processed stego images generated from the image processing generation model proposed in this paper to train the steganalyzer and detect the processed stego images generated. The testing error of the processed stego image generated from different image processing generation models with different steganalyzers is shown in Table 4.
According to , the steganographic capacity of a processed stego image generated by the image processing generation model, under the mapping rule between secret information and noise, is greater than or equal to 37.5 bytes (300 bits); since the size of the generated processed stego image is 64 × 64, its payload is greater than or equal to 0.0732 bpp. It can be seen from Table 2 that the undetectability of processed stego images gradually weakens as the steganographic payload increases, so the undetectability at a payload of 0.0732 bpp is lower than at 0.05 bpp. Comparing Table 4 with Tables 2 and 3, it can therefore be concluded that, at a payload of 0.0732 bpp, the undetectability of the histogram equalization stego images generated by the image processing generation model is higher than that of the processed stego images generated by traditional steganographic methods. However, the undetectability of the generated processed images is still relatively poor. This problem has also been raised in . The main reason is that images generated by GAN models carry a certain degree of distortion; that is, there is a gap between the generated images and the original training images. In practical applications, however, we can keep the set of generated processed stego images confidential to ensure the security of the processed stego images generated by the proposed image processing generation model.
4.5. Steganographic Capacity
In the experiment, the dimension of the input random noise vector is 100; therefore, according to the mapping rules, the steganographic capacity of each generated processed stego image is 100 × σ bits. The current steganographic capacity of SWE is smaller than that of embedding-based steganography. The proposed generation-based steganography is a type of SWE, so its steganographic capacity is relatively small. However, since the generated processed stego image is smaller, the relative steganographic capacity (steganographic capacity/size of the stego image) is larger.
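The relative capacity above is simple arithmetic: the 100-dimensional noise vector carries 100 × σ secret bits spread over 64 × 64 pixels. Assuming σ = 3 (our assumption, not stated in the text), this reproduces the 0.0732 bpp figure quoted in Section 4.4.

```python
# Relative payload (bits per pixel) of a generated stego image:
# noise_dim * sigma secret bits over width * height pixels.
def payload_bpp(noise_dim=100, sigma=3, width=64, height=64):
    return noise_dim * sigma / (width * height)

print(round(payload_bpp(), 4))  # 0.0732 (i.e., 300 bits / 4096 pixels)
```

Raising σ increases capacity linearly but narrows each mapping interval, which is why Figure 6 shows recovery accuracy falling as the payload grows.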
4.6. Further Scope
Since our secret information extraction model is based on convolutional neural networks instead of artificially designed extraction rules, our steganographic method cannot recover 100% of the secret information, owing to the residual training loss of the convolutional neural network. Reference  has the same problem. In the future, we may try to combine Reed-Solomon error-correction codes to further improve the accuracy of secret information extraction. In addition, since our steganographic method is based on GAN, the generated processed stego images still have a certain degree of distortion. In the future, we will try different methods to generate more realistic processed stego images.
5. Conclusion
This paper proposes a new steganographic method that embeds secret information in the process of image histogram equalization and sharpening through trained histogram equalization and sharpening generation models. The image processing generation models can generate high-quality processed images from random noise, so we generate stego images directly from noise mapped from secret information during image processing, and the secret information can be extracted through the extraction model. To prove the security of the proposed steganographic scheme, we use current state-of-the-art steganalyzers to detect the generated processed stego images. Experimental results show that the steganographic method has better security performance in resisting detection by state-of-the-art steganalyzers.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
References
B. Jana, D. Giri, S. K. Mondal, and P. Pal, “Image steganography based on cellular automata,” International Journal of Pure and Applied Mathematics, vol. 83, no. 5, pp. 701–715, 2013.
P. Chowdhuri, B. Jana, and D. Giri, “Secured steganographic scheme for highly compressed color image using weighted matrix through DCT,” International Journal of Computers and Applications, vol. 43, no. 1, pp. 38–49, 2021.
J. Wu, B. Chen, W. Luo, and Y. Fang, “Audio steganography based on iterative adversarial attacks against convolutional neural networks,” IEEE Transactions on Information Forensics and Security, vol. 15, pp. 2282–2294, 2020.
S. Jiang, D. Ye, J. Huang, Y. Shang, and Z. Zheng, “SmartSteganogaphy: light-weight generative audio steganography model for smart embedding application,” Journal of Network and Computer Applications, vol. 165, Article ID 102689, 2020.
A. Banerjee and B. Jana, “A secure high-capacity video steganography using bit plane slicing through (7, 4) hamming code,” Advanced Computational and Communication Paradigms, Springer, New York, NY, USA, 2018.
Y. Yao and N. Yu, “Motion vector modification distortion analysis-based payload allocation for video steganography,” Journal of Visual Communication and Image Representation, vol. 74, Article ID 102986, 2021.
B. Li, M. Wang, X. Li, S. Tan, and J. Huang, “A strategy of clustering modification directions in spatial image steganography,” IEEE Transactions on Information Forensics and Security, vol. 10, no. 9, pp. 1905–1917, 2015.
J. Tao, S. Li, X. Zhang, and Z. Wang, “Towards robust image steganography,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 29, no. 2, pp. 594–600, 2018.
S. Wu, S. Zhong, and Y. Liu, “Deep residual learning for image steganalysis,” Multimedia Tools and Applications, vol. 77, no. 9, Article ID 10437, 2018.
W. Lu, R. Li, L. Zeng, H. Chen, and Q. Yun, “Binary image steganalysis based on histogram of structuring elements,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 30, pp. 3081–3094, 2019.
Z. Wang, Z. Qian, X. Zhang, M. Yang, and D. Ye, “On improving distortion functions for JPEG steganography,” IEEE Access, vol. 6, Article ID 74917, 2018.
Z. Wang, X. Zhang, and Z. Qian, “Practical cover selection for steganography,” IEEE Signal Processing Letters, vol. 27, pp. 71–75, 2019.
B. Li, M. Wang, J. Huang, and X. Li, “A new cost function for spatial image steganography,” in Proceedings of the 2014 IEEE International Conference on Image Processing (ICIP), pp. 4206–4210, IEEE, Paris, France, October 2014.
V. Holub, J. Fridrich, and T. Denemark, “Universal distortion function for steganography in an arbitrary domain,” EURASIP Journal on Information Security, vol. 2014, no. 1, p. 1, 2014.
V. Holub and J. Fridrich, “Designing steganographic distortion using directional filters,” in Proceedings of the IEEE International Workshop on Information Forensics and Security (WIFS) 2012, pp. 234–239, IEEE, Costa Adeje-Tenerife, Spain, December 2012.
V. Sedighi, R. Cogranne, and J. Fridrich, “Content-adaptive steganography by minimizing statistical detectability,” IEEE Transactions on Information Forensics and Security, vol. 11, no. 2, pp. 221–234, 2015.
D. Volkhonskiy, B. Nazarov, and E. Burnaev, “Steganographic generative adversarial networks,” 2017, https://arxiv.org/abs/1703.05502.
W. Tang, S. Tan, B. Li, and J. Huang, “Automatic steganographic distortion learning using a generative adversarial network,” IEEE Signal Processing Letters, vol. 24, no. 10, pp. 1547–1551, 2017.
W. Tang, B. Li, S. Tan, M. Barni, and J. Huang, “CNN-based adversarial embedding for image steganography,” IEEE Transactions on Information Forensics and Security, vol. 14, no. 8, pp. 2074–2087, 2019.
C. Li, X. Sun, Z. Zhou, and Y. Yang, “Real-time image carrier generation based on generative adversarial network and fast object detection,” Journal of Real-Time Image Processing, vol. 17, no. 3, pp. 655–665, 2020.
D. Hu, L. Wang, W. Jiang, S. Zheng, and B. Li, “A novel image steganography method via deep convolutional generative adversarial networks,” IEEE Access, vol. 6, pp. 38303–38314, 2018.
J. Fridrich and J. Kodovsky, “Rich models for steganalysis of digital images,” IEEE Transactions on Information Forensics and Security, vol. 7, no. 3, pp. 868–882, 2012.
V. Holub and J. Fridrich, “Random projections of residuals for digital image steganalysis,” IEEE Transactions on Information Forensics and Security, vol. 8, no. 12, pp. 1996–2006, 2013.
T. Denemark, V. Sedighi, V. Holub, R. Cogranne, and J. Fridrich, “Selection-channel-aware rich model for steganalysis of digital images,” in Proceedings of the IEEE International Workshop on Information Forensics and Security (WIFS) 2014, pp. 48–53, IEEE, Atlanta, GA, USA, December 2014.
J. Kodovsky, J. Fridrich, and V. Holub, “Ensemble classifiers for steganalysis of digital media,” IEEE Transactions on Information Forensics and Security, vol. 7, no. 2, pp. 432–444, 2011.
Y. Qian, J. Dong, W. Wang, and T. Tan, “Deep learning for steganalysis via convolutional neural networks,” in Proceedings of the Media Watermarking, Security, and Forensics 2015, vol. 9409, International Society for Optics and Photonics, San Francisco, CA, USA, February 2015.
G. Xu, H.-Z. Wu, and Y.-Q. Shi, “Structural design of convolutional neural networks for steganalysis,” IEEE Signal Processing Letters, vol. 23, no. 5, pp. 708–712, 2016.
J. Ye, J. Ni, and Y. Yi, “Deep learning hierarchical representations for image steganalysis,” IEEE Transactions on Information Forensics and Security, vol. 12, no. 11, pp. 2545–2557, 2017.
Z. L. Zhou, Y. Cao, and X. M. Sun, “Coverless information hiding based on bag-of-words model of image,” Journal of Applied Sciences, vol. 34, no. 5, pp. 527–536, 2016.
S. Zheng, L. Wang, B. Ling, and D. Hu, “Coverless information hiding based on robust image hashing,” in Proceedings of the International Conference on Intelligent Computing, pp. 536–547, Springer, Liverpool, England, August.
Z. Zhang, G. Fu, R. Ni, J. Liu, and X. Yang, “A generative method for steganography by cover synthesis with auxiliary semantics,” Tsinghua Science and Technology, vol. 25, no. 4, pp. 516–527, 2020.
I. Goodfellow, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Proceedings of the Advances in Neural Information Processing Systems, Montreal, Canada, December 2014.
A. Radford, L. Metz, and S. Chintala, “Unsupervised representation learning with deep convolutional generative adversarial networks,” 2015, https://arxiv.org/abs/1511.06434.
M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein GAN,” 2017, https://arxiv.org/abs/1701.07875.
I. Gulrajani, F. Ahmed, and M. Arjovsky, “Improved training of wasserstein GANs,” Advances in Neural Information Processing Systems, pp. 5767–5777, 2017.
Y. Zhang, W. Zhang, and N. Yu, “Adversarial examples against deep neural network based steganalysis,” in Proceedings of the 6th ACM Workshop on Information Hiding and Multimedia Security, Innsbruck, Austria, June 2018.
T. Filler, J. Judas, and J. Fridrich, “Minimizing additive distortion in steganography using syndrome-trellis codes,” IEEE Transactions on Information Forensics and Security, vol. 6, no. 3, pp. 920–935, 2011.
Z. Zhou, Y. Mu, and Q. M. J. Wu, “Coverless image steganography using partial-duplicate image retrieval,” Soft Computing, vol. 23, no. 13, pp. 4927–4938, 2019.