Abstract

With the gradual introduction of deep learning into the field of information hiding, the capacity of information hiding has been greatly improved. Therefore, a solution with higher capacity and good visual quality has become the current research goal. A novel high-capacity information hiding scheme based on an improved U-Net is proposed in this paper, which combines the improved U-Net network with multiscale image analysis to carry out high-capacity information hiding. The proposed improved U-Net structure has a smaller network scale and can be used for both information hiding and information extraction. In the hiding network, the secret image is decomposed into wavelet components through the wavelet transform, and the wavelet components are hidden in the carrier image. In the extraction network, the features of the hidden image are extracted into four components, from which the extracted secret image is obtained. Both the hiding network and the extraction network use the improved U-Net structure, which preserves the details of the carrier image and the secret image to the greatest extent. Simulation experiments show that the capacity of this scheme is greatly improved compared with traditional schemes, and the visual quality is good. Compared with an existing similar solution, the network size is reduced by nearly 60%, and the processing speed is increased by 20%. The visual quality after hiding the information is improved, and the PSNR between the secret image and the extracted image is improved by 6.3 dB.

1. Introduction

With the advent of the network information age, information security has received more and more attention. As important technologies in the field of information security, digital watermarking and image information hiding could effectively protect the privacy and information security of network participants [1]. Information hiding technology could hide secret information in images, which could be used for the transmission of secret information and also in fields such as intellectual property protection.

Traditional information hiding used the correlation between digital image pixels, and such schemes could be divided into spatial domain schemes and transform domain schemes. The spatial domain schemes directly utilized the correlation between pixels [2]; widely studied schemes included least significant bit (LSB) substitution, histogram translation [3], difference expansion [4], and other methods. The transform domain schemes used a series of transforms to convert the image into transform coefficients and then used the correlation between the coefficients for information hiding; widely studied schemes included the discrete cosine transform, the wavelet transform [5, 6], and so on. In addition, in order to improve the secrecy and robustness of hidden information, information hiding schemes based on image compression [7] and encryption [8] have also been proposed.

With the emergence and wide application of deep learning, a steganography scheme based on deep learning, known as deep steganography, was proposed [9]. In the initial research, Wu et al. [10] used deep convolutional networks for image steganography. Such a scheme was not a simple LSB change, and the residual image did not contain identifiable secret images, so the effect of the scheme was good. Rehman et al. [11] proposed an end-to-end steganography scheme, which used CNNs to design an encoder-decoder architecture in which the secret information was hidden by the encoder and extracted by the decoder.

The application of deep learning to the field of information hiding was a major technological innovation. On the one hand, it greatly increased the capacity of information hiding [12], and on the other hand, it greatly improved the robustness of information hiding [13]. Traditional information hiding schemes could hide binary images in grayscale images or color images [14, 15], but the schemes that hid grayscale images in color images had poor visual effects [16]. However, the capacity of information hiding using deep learning networks was greatly improved: grayscale images could be hidden in color images with better visual effects [17, 18], and even color images could be hidden in color images [19].

In recent years of research, some excellent deep learning models have been applied to the field of information hiding, including models for image generation, target detection, and image segmentation. In 2014, Goodfellow et al. [20] proposed the Generative Adversarial Network (GAN), which was originally designed for image generation. Zhang et al. [21] used GAN for information hiding. The GAN network performed well in the field of image generation, so it also had good visual effects when generating images containing secret information. Yang et al. [22] and Wang et al. [23] also used the GAN idea to generate hidden images through the confrontation between the generator and the discriminator; the generated hidden images and the extracted secret information had good visual effects. Zhou et al. [24] proposed an information hiding technology using Faster-RCNN. This solution used Faster-RCNN to identify the contents of the carrier image and expressed secret data through these contents. Steganalysis showed that the scheme had high security.

In 2015, Ronneberger et al. [25] proposed a U-shaped neural network, U-Net, which was used for image segmentation. The U-Net scheme not only performed well in the field of medical image segmentation but also had good results when applied to the field of information hiding. Duan et al. [26] used U-Net for reversible information steganography, which had a good visual effect under the premise of larger capacity. Liu et al. [27] innovatively combined the U-Net network and the wavelet transform method to realize data hiding. The introduction of the wavelet transform made the division of image features more accurate, and the extracted image had smaller differences from the original secret image. Liu et al. [28] used the combination of DenseNet and a DWT sequence for information hiding; the scheme hid the secret information after DWT transformation in the image through a deep learning network. The research focus of this scheme was higher robustness, but the hidden capacity was relatively low.

In this paper, a novel high-capacity information hiding scheme based on an improved U-Net was proposed by combining the U-Net network and the wavelet transform. The improved U-Net had a smaller network scale and could achieve better results in both information hiding and information extraction. When hiding information, the secret image was converted into four wavelet components by the wavelet transform, and then the wavelet components were hidden in the carrier image through the improved U-Net network. When extracting information, the image features were extracted into four components through the improved U-Net network, and then the extracted secret image was obtained. The scheme proposed in this paper had the following contributions: (1) an improved U-Net network structure that could be used for both information hiding and information extraction was proposed, and the network had a smaller scale; (2) compared with the existing information hiding solutions, the information hiding capacity was larger, and the image quality of hiding and extraction was significantly improved; and (3) scatter plots of PSNR and SSIM were proposed in this paper, which could compare solutions more clearly.

The rest of this paper is organized as follows. Section 2 presents the preliminaries, introducing and analyzing the scheme proposed by Liu et al. [27]. Section 3 discusses the scheme proposed in this paper, introducing the hiding network, the extraction network, the loss function, and the overflow handling of the proposed scheme. Section 4 presents the experimental results and analysis, comparing the experimental data of the proposed scheme with existing schemes. Section 5 concludes the paper.

2. Preliminaries

2.1. Haar Wavelet Transform

Haar wavelet transform (HWT) [29] was used in the scheme. HWT decomposed the image to obtain four components with different detailed features. The process of performing HWT on image B could be roughly described as T = H B H^T, where H was the Haar transformation matrix, and its size should be the same as that of the image B. According to the following formula, the Haar basis functions h_k(z) of the N × N Haar matrix could be generated, where N = 2^n, 0 ≤ z < 1, and k = 0, 1, ..., N − 1:

h_0(z) = 1/√N,

h_k(z) = (1/√N) × { 2^(p/2), for (q − 1)/2^p ≤ z < (q − 1/2)/2^p; −2^(p/2), for (q − 1/2)/2^p ≤ z < q/2^p; 0, otherwise },

where k = 2^p + q − 1; if p = 0, then q = 0 or 1; and if p ≠ 0, then 1 ≤ q ≤ 2^p. The process of obtaining each component of the image after Haar wavelet transform (HWT) and its inverse process (iHWT) is shown in Figure 1.

Among them, A was the approximate component, which was the closest to the pixel distribution of the original image and contained the highest energy; H was the horizontal component, which contained the detailed features of the image in the horizontal direction; V was the vertical component, which contained the detailed features of the image in the vertical direction; and D was the diagonal component, which contained the details of the image in the diagonal direction.
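As an illustration of the decomposition in Figure 1, the following is a minimal NumPy sketch (not the authors' code) of a one-level 2D Haar transform and its inverse, assuming the common unnormalized-average convention in which A is the scaled block average and H, V, and D are the scaled block differences:

```python
import numpy as np

def hwt(img):
    """One-level 2D Haar wavelet transform of a 2D array with even sides.

    Returns the approximation (A), horizontal (H), vertical (V), and
    diagonal (D) components, each half the size of the input.
    """
    a = img[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    A = (a + b + c + d) / 2.0
    H = (a - b + c - d) / 2.0
    V = (a + b - c - d) / 2.0
    D = (a - b - c + d) / 2.0
    return A, H, V, D

def ihwt(A, H, V, D):
    """Inverse of hwt: rebuild the image from the four components."""
    img = np.empty((2 * A.shape[0], 2 * A.shape[1]), dtype=float)
    img[0::2, 0::2] = (A + H + V + D) / 2.0
    img[0::2, 1::2] = (A - H + V - D) / 2.0
    img[1::2, 0::2] = (A + H - V - D) / 2.0
    img[1::2, 1::2] = (A - H - V + D) / 2.0
    return img

rng = np.random.default_rng(0)
secret = rng.random((256, 256))          # a normalized grayscale secret image
A, H, V, D = hwt(secret)
recon = ihwt(A, H, V, D)
max_err = float(np.abs(recon - secret).max())
```

Under this convention, an input normalized to [0, 1] yields A values in [0, 2] and detail values in [−1, 1], and iHWT reconstructs the input up to floating-point rounding.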

2.2. Information Hiding Scheme Using U-Net

Among the schemes using deep learning networks for information hiding, Liu et al. [27] proposed a data hiding scheme based on the wavelet transform and U-Net, which used a U-Net-like connection structure in the hiding network. When hiding, the Haar wavelet transform was performed on the secret image to obtain the wavelet components, and the wavelet components were connected to different convolutional layers of the U-Net through a convolutional network. When extracting, the features were decomposed into different wavelet components through two layers of deconvolution, and then the secret image was obtained after the inverse Haar wavelet transform.

The scheme proposed by Liu et al. [27] used the U-Net connection structure in the information hiding model. After the secret image passed through the HWT, four components were obtained. The four components were connected to different feature layers according to their different energies. The hiding network structure is shown in Figure 2.

The information hiding network of this scheme included 7 convolutional layers and 7 deconvolutional layers, as well as another 7 convolutional layers for secret information processing. Such a connection structure could ensure that the obtained hidden image had high definition, but the network structure was overly redundant. The extraction network obtained the four components by extracting features and then obtained the secret image through the inverse Haar wavelet transform. The extraction structure is shown in Figure 3.

In the extraction network, there were 3 convolutional layers and 10 deconvolutional layers, and the U-Net connection method was not used. Because only simple convolutional connections were used, the accuracy of the information extraction process was relatively low. Also, the activation function used in the output layer was LeakyReLU, whose output range could fall below 0 or exceed 1, so the resulting image could have pixel values less than 0 or greater than 255, that is, a pixel overflow problem. However, the pixel overflow problem was not dealt with in that paper, so the quality of the extracted secret image was relatively poor.

3. Proposed Scheme

Because information hiding schemes using deep learning had high hidden capacity and traditional information hiding using the wavelet transform had good visual effects, a novel high-capacity information hiding scheme based on an improved U-Net network and the Haar wavelet transform was proposed in this paper. The improved U-Net structure had a smaller scale, was used in both information hiding and information extraction, and achieved good experimental results. Compared with traditional information hiding schemes, the proposed scheme had a large capacity increase; compared with the existing deep learning information hiding schemes, the visual quality was greatly improved, and the network structure was greatly reduced. In the hiding network, the secret image was decomposed into four components using the Haar wavelet transform (HWT), and the wavelet components were hidden in the image through the improved U-Net. In the extraction network, the image features were extracted into four components using the improved U-Net, and after the inverse wavelet transform (iHWT), the secret image could be obtained.

3.1. Hiding Network

The hiding network used the improved U-Net structure, a fully convolutional network. The input was a 256 × 256 × 3 carrier image and a 256 × 256 grayscale secret image, and the output was a 256 × 256 × 3 hidden image. The entire hiding network had only 4 convolution operations and 4 deconvolution operations, as shown in Figure 4. The secret information was decomposed into four components after HWT, and the components were concatenated into the feature layers of the hiding network. In the hiding network, except for the first and last layers, all other layers used BatchNorm (BN) acceleration. The activation function ReLU was used after each convolutional layer, and the activation function LeakyReLU was used after each deconvolutional layer. The size of the convolution kernel was 4 × 4 with stride = 2, and the padding was set to "same".
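The size flow implied by these settings can be checked with a short sketch (an illustrative calculation, not the authors' code), assuming padding 1 for the 4 × 4 stride-2 kernels, which is what "same"-style halving corresponds to in PyTorch:

```python
def conv_out(n, k=4, s=2, p=1):
    """Output side length of a strided convolution (PyTorch convention)."""
    return (n + 2 * p - k) // s + 1

def deconv_out(n, k=4, s=2, p=1):
    """Output side length of a strided transposed convolution."""
    return (n - 1) * s - 2 * p + k

sizes_down = [256]
for _ in range(4):                       # the 4 convolutions of the hiding network
    sizes_down.append(conv_out(sizes_down[-1]))

sizes_up = [sizes_down[-1]]
for _ in range(4):                       # the 4 deconvolutions back up
    sizes_up.append(deconv_out(sizes_up[-1]))
```

Four stride-2 convolutions take 256 down to 16, and four stride-2 deconvolutions bring it back to 256; the 128 × 128 wavelet components of the secret image can therefore be concatenated at the matching 128 × 128 feature resolution.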

In response to the needs of the information hiding field, we improved the structure of the traditional U-Net. First, we abandoned the pooling layer because it had a bad impact on the quality of the hidden image. Second, only the downsampling and upsampling of the convolutional part were retained, because experiments showed that this change could reduce the network scale while maintaining the effect. Finally, a cross-layer connection scheme that did not require cropping was used to further reduce pixel loss.

The hiding network structure of the proposed scheme was greatly reduced compared with the scheme proposed by Liu et al. [27]. Moreover, the proposed scheme concatenated all wavelet components with the same feature layer instead of separating them. The components of the secret image after the wavelet transform had more precise details, and hiding the wavelet components in the carrier image increased the secrecy of the secret image in the hidden image. The hiding network used three cross-layer connections, which could supplement the underlying features and improve the clarity of the hidden images.

3.2. Extraction Network

The extraction network used an improved U-Net structure similar to the hiding network, as shown in Figure 5. The input was a 256 × 256 × 3 hidden image, and the output was a 256 × 256 grayscale secret image. At the end of the network, the features were divided into four wavelet components A, H, V, and D through four deconvolutions (the last four deconvolutions were called the tail layer), and after the inverse Haar wavelet transform (iHWT), the secret image was obtained. The extraction network also used BN acceleration except for the first layer and the tail layer. The activation functions were the same as in the hiding network. The convolution kernel size of all layers except the tail layer was 4 × 4 with stride = 2; the kernel size of the tail layer was 3 × 3 with stride = 1; and the padding was set to "same".

The extraction network in this paper used three cross-layer connections. This connection method could supplement the underlying features so that the extracted secret image was more accurate and its details were well preserved. The extraction network used a U-Net-style network rather than a simple convolutional connection; compared with Liu et al. [27], the extraction effect was greatly improved.

3.3. Loss Function

The loss function used the mean square error (MSE), which could represent the error between two images. A smaller MSE value indicated that the difference between the images was small. In order to ensure the high quality of information hiding and information extraction at the same time, the loss function of the scheme needed to include both the information hiding and the information extraction loss. Both the hiding network and the extraction network used MSE as the loss function. The calculation method of MSE is as follows:

MSE(I, I′) = (1/(M × N)) Σ_{i=1}^{M} Σ_{j=1}^{N} (I(i, j) − I′(i, j))²

Among them, I and I′ represent two images, respectively, and M and N are the length and width of the image. The loss of the information hiding network loss1 is as follows:

loss1 = MSE(C, C′)

where C represents the carrier image and C′ represents the hidden image. The loss of information extraction loss2 is as follows:

loss2 = MSE(A, A′) + MSE(H, H′) + MSE(V, V′) + MSE(D, D′)

where A, H, V, and D represent the wavelet components of the secret image and A′, H′, V′, and D′ represent the extracted wavelet components. The loss function of the whole scheme is as follows:

loss = loss1 + loss2
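A minimal NumPy sketch of this loss computation (illustrative only; equal weighting of loss1 and loss2 is an assumption here, and the variable names are not from the paper):

```python
import numpy as np

def mse(x, y):
    """Mean squared error between two equally sized arrays."""
    return float(np.mean((x - y) ** 2))

def total_loss(carrier, hidden, comps, comps_ext):
    """Combined loss: hiding loss (carrier vs. hidden image) plus the
    extraction loss summed over the four wavelet components.
    Equal weighting of the two terms is an assumption."""
    loss1 = mse(carrier, hidden)
    loss2 = sum(mse(c, ce) for c, ce in zip(comps, comps_ext))
    return loss1 + loss2

rng = np.random.default_rng(1)
carrier = rng.random((256, 256, 3))
hidden = carrier + 0.01 * rng.standard_normal(carrier.shape)
comps = [rng.random((128, 128)) for _ in range(4)]   # A, H, V, D stand-ins
loss = total_loss(carrier, hidden, comps, comps)     # perfect extraction: loss2 = 0
```

In training, this scalar would be minimized jointly over the hiding and extraction networks, so neither stage is optimized at the expense of the other.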

It was worth noting that, in practical applications, the four components obtained after image normalization and wavelet transformation had the following value ranges: A ∈ [0, 2] and H, V, D ∈ [−1, 1].

Among the activation functions of neural networks, there was no single activation function that satisfied all four value ranges at the same time. Therefore, in this paper, the components H, V, and D after HWT were uniformly increased by one, and the corresponding extracted components were decreased by one. In this way, when LeakyReLU was used as the activation function, the four components could be extracted more accurately.

3.4. Pixel Overflow Prevention

The proposed scheme used LeakyReLU as the activation function in the last layer, and the output range of this activation function could be less than 0 or greater than 1. The image obtained at this time could have pixel values less than 0 or greater than 255, which is called pixel underflow or pixel overflow. At the same time, since HWT was used in the extraction network, the image obtained after iHWT of the four extracted components also carried the risk of overflow.

In order to prevent the occurrence of pixel overflow, this paper used the method of forced conversion to process the obtained image. The processing method was as follows: for the obtained hidden image and the obtained extracted image, if a pixel value was less than 0, it was forced to 0, and if a pixel value was greater than 255, it was forced to 255.
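In NumPy, this forced conversion is a single clipping operation; a toy sketch:

```python
import numpy as np

def clamp_pixels(img):
    """Force out-of-range values back into the valid 8-bit pixel range."""
    return np.clip(img, 0, 255)

out = np.array([[-12.0, 0.0],
                [255.0, 301.5]])   # toy network output with under/overflow
fixed = clamp_pixels(out)          # -12.0 -> 0.0, 301.5 -> 255.0
```

In-range values pass through unchanged, so the operation only touches the overflowing pixels.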

In actual situations, it was impossible for the pixel value of an image to be less than 0 or greater than 255, so such a forced conversion could improve the accuracy of the resulting image.

4. Simulation Results and Discussion

This paper used a total of 46,000 images drawn from the ImageNet dataset [30] and the VOC2012 dataset [31]; 40,000 images were used for training and 6,000 for testing. The images in the dataset were preprocessed into 256 × 256 × 3 RGB images. For each carrier image, its secret image was randomly selected from the dataset and converted into a 256 × 256 grayscale image. The hardware used in the experiment was a Tesla P100 GPU, the software environment was Python 3.6 and PyTorch, and the initial training parameters were a learning rate lr = 0.001 and epoch = 1000 iterations.

In order to verify the superiority of the scheme proposed in this paper, the visual effects, model advantages, residual images, and scatter plots were analyzed. Figure 6 shows part of the images used for training and testing in this paper, where the upper row shows the carrier images and the lower row shows the secret images.

4.1. Hidden Capacity Analysis

Information hiding capacity was an important indicator of information hiding technology. With the continuous deepening of information hiding research, the hiding capacity has been continuously improved. The solution proposed in this paper could hide a grayscale image in a color image. The effective capacity (EC) of a solution represented the number of bits embedded per unit pixel, and its unit was bpp (bits per pixel). The calculation method of EC is as follows:

EC = NS / NC

where NS represents the size of the secret image (in bits), and NC represents the number of pixels of the carrier image (color images have three RGB channels). Table 1 shows the effective capacity (EC) comparison of the proposed scheme, the traditional information hiding schemes, and the existing deep learning information hiding schemes.
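For the image sizes used in this paper, the EC works out as a direct arithmetic check of the 2.67 bpp figure in Table 1:

```python
# Effective capacity for hiding a 256x256 8-bit grayscale secret image
# inside a 256x256x3 color carrier, following EC = NS / NC.
ns_bits = 256 * 256 * 8          # secret image size in bits
nc_pixels = 256 * 256 * 3        # carrier pixel count (RGB channels counted)
ec = ns_bits / nc_pixels         # bits per pixel
```

This gives EC = 8/3 ≈ 2.67 bpp, the theoretical maximum for hiding a full-resolution 8-bit grayscale image in a same-size color image.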

According to Table 1, the effective capacity of the proposed scheme was 2.67 bpp; that is, 2.67 bits of secret information were hidden in each pixel. Compared with the traditional information hiding schemes [1, 6, 10], it had a large capacity improvement. Compared with the deep learning information hiding schemes [3, 22], the capacity was also greatly improved. The proposed scheme hid a grayscale image in a color image of the same size, which had the same hidden capacity as the schemes in [18, 27]. Therefore, the information hiding capacity of the proposed scheme was greatly improved compared with the existing schemes, and it reached the maximum theoretical capacity for hiding a grayscale image in a color image.

4.2. Visual Effects

Information hiding technology required that the hidden image have as little visual difference as possible from the carrier image, and the extracted image should likewise have as little pixel difference as possible from the secret image. When the visual difference between two images was particularly small, the histogram could reveal the invisible pixel differences between them. Figure 7 shows part of the images in the test set and the corresponding histograms.

According to the result images in Figure 7, the visual difference between the carrier image and the hidden image was so small that it was impossible to see whether any secret information was hidden, which fully met the requirements of information hiding. At the same time, the extracted image was clear with good visual quality, and there was almost no visual difference between the secret image and the extracted image. According to the pixel distribution histograms, the carrier image histogram and the hidden image histogram were basically the same, so it could not be seen whether a secret image was hidden. Similarly, the histogram of the secret image was basically the same as that of the extracted image, with no visible difference. Therefore, the hiding and extraction results of the proposed scheme had good visual quality.

4.3. PSNR and SSIM

Peak signal-to-noise ratio (PSNR) was the most common measurement parameter in the field of information hiding, used to judge whether the information hiding effect was good. PSNR measured the difference between two images, and a larger PSNR indicated that the difference between the images was small. The method of calculating the PSNR between the images I and Ia is as follows:

PSNR = 10 × log10(255² / MSE(I, Ia))

Among them, I and Ia represent the two images for which PSNR is calculated. The structural similarity index measure (SSIM) was also widely used in the field of information hiding. It measured the difference between two images in three aspects: luminance, contrast, and structure. The value range of SSIM was [−1, 1]; when two images were exactly the same, the value of SSIM was 1. The calculation method of SSIM is as follows:

SSIM(I, Ia) = ((2 μI μIa + c1)(2 cov(I, Ia) + c2)) / ((μI² + μIa² + c1)(σI² + σIa² + c2))

where μI and μIa are the average values of I and Ia, respectively, σI² and σIa² are the variances of I and Ia, cov(I, Ia) is the covariance of I and Ia, and c1 and c2 are constants that avoid a zero denominator. Table 2 shows the PSNR and SSIM comparison of the proposed scheme and similar schemes.

In the traditional field of information hiding, PSNR and SSIM were widely used as standards to measure the effect of hiding and extraction. A larger PSNR value indicated that the two images were very similar; similarly, an SSIM closer to 1 indicated that the two images had more similar structures. From the PSNR and SSIM comparison given in Table 2, it could be seen that the average PSNR and SSIM of the proposed scheme in the information hiding stage were higher than the averages in Refs. [11, 27], and in the information extraction stage were higher than the averages in Refs. [11, 17, 27]. Moreover, the average PSNR and SSIM of the proposed scheme were greater than the maximum PSNR and SSIM of Zhang et al. [21]. Therefore, the PSNR and SSIM of the proposed scheme in both the information hiding and information extraction stages showed a certain improvement over the existing schemes.

Deep learning information hiding schemes were characterized by large datasets. This paper therefore proposed using scatter plots of PSNR and SSIM for comparison. The scatter plots could display all the data in the test set as scatter points, clearly showing the maximum, minimum, average, and distribution of the data; therefore, it was convenient to conduct a comprehensive analysis of the scheme. Figure 8 shows the scatter plots of PSNR and SSIM of the proposed scheme on the 6000 images in the test set, as well as the average or maximum PSNR and SSIM of the comparison schemes.

According to the scatter plots of PSNR and SSIM given in Figure 8, it could be concluded that the scheme performed well on the 6000 images in the test set. Most of the PSNR and SSIM points in the test set were distributed to the upper right of Rehman et al. [11], which showed that the PSNR and SSIM of most test images were higher than the averages of Rehman et al. [11]. The proposed scheme performed well in the information extraction stage, where its PSNR and SSIM had greater advantages over the other schemes. The average PSNR and SSIM of the hiding stage were smaller only than those of Li et al. [17] but better than the other schemes. Therefore, considering both hiding and extraction, this scheme had certain advantages compared with the existing schemes.

At the same time, the proposed scatter plots of PSNR and SSIM could clearly show the maximum, minimum, average, and distribution of the proposed scheme on the test set, which was convenient for comparison and analysis, and had a certain application value in the field of deep learning information hiding.

4.4. Model Comparison

In order to fully demonstrate the superiority of the proposed scheme, this section analyzed it from the perspective of the model, including model size, running speed, and other aspects. When the visual effects of the experimental data were close, the larger model was at a disadvantage in actual application scenarios. At the same time, the running speed of the model was also part of its advantages, and the benefit of faster processing was more obvious when processing large amounts of data. Table 3 shows the comparison between the model of the proposed scheme and that of Liu et al. [27]. The running speed was measured as the average time to process one image in the test set when BatchSize = 1.

According to Table 3, compared with Liu et al. [27], the proposed scheme had certain advantages in PSNR in both the hiding process and the extraction process. On this basis, the number of network layers of the model was greatly reduced, with 44% fewer convolutional layers than the model proposed by Liu et al. [27]. At the same time, the model size was reduced by nearly 60% compared with the scheme of Liu et al. [27]. The size of the extraction network was slightly larger, but the overall size of the network was greatly reduced. In addition, according to the running speed, in the hiding process the proposed scheme was nearly twice as fast as the scheme proposed by Liu et al. [27], the extraction process was also improved to a certain extent, and the overall processing speed was increased by about 20%. It could be seen that the scheme proposed in this paper offered larger model and performance improvements than the scheme proposed by Liu et al. [27].

4.5. Residual Image

The residual image could display the difference between the carrier image and the hidden image. If the difference between the two images was large, the residual image would show a clear image. At the same time, when the hiding network was not well designed, the texture of the secret image would appear in the residual image, and an attacker could then obtain the secret image from the residual image. Figure 9 shows some of the experimental result images and the corresponding residual images in the test set, as well as the residual images magnified 20-fold.

It could be seen from Figure 9 that the residual image between the original image and the hidden image was almost completely black, and no information could be seen, which showed that the difference between the two images was extremely small. At the same time, the image revealed by the 20-fold magnified residual was the outline of the carrier image and had no relevance to the secret image. Moreover, in Figure 7, the histogram of the hidden image did not contain the pixel distribution information of the secret image, which showed that the scheme had good security. An attacker could not know whether the image hid secret information through the histogram or the residual image, nor could the attacker extract the secret information from them.

5. Conclusion

At present, deep learning performs well in the field of information hiding. Information hiding schemes using deep learning have high hidden capacity and show a certain improvement in visual quality and extraction accuracy. A novel high-capacity information hiding scheme based on an improved U-Net was proposed in this paper, which combined a deep learning network with multiscale image analysis. The scheme used the improved U-Net structure in both the information hiding and information extraction stages: the wavelet components of the secret information were hidden in the image through the improved U-Net structure, and the four components were extracted through the improved U-Net structure in the extraction stage. The hidden image was visually very similar to the carrier image, which was the combined effect of the Haar wavelet decomposition and the hiding performed by the improved U-Net structure. The structural similarity between the original secret image and the extracted image was extremely high, and the detail extraction effect was good. According to the simulation results, the proposed scheme offered a greater hidden capacity than traditional information hiding schemes and greater improvements in network structure, model size, and visual quality than recently proposed solutions. Compared with the existing deep learning information hiding schemes, the hidden image obtained by the hiding network and the extracted image obtained by the extraction network showed greater accuracy. In addition, the scheme had high security: an attacker could not obtain any secret image content from the histogram or the residual image.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Acknowledgments

This project was funded by the National Key Research and Development Program of China (No. 11974373), National Natural Science Foundation of China (No. 61976126), and Shandong Natural Science Foundation (No. ZR2019MF003).