Abstract

In this paper, we propose a privacy protection scheme using image dual-inpainting and data hiding. In the proposed scheme, the privacy content in the original image is concealed in a reversible manner, so that it can be perfectly recovered. We use an interactive approach to select the areas to be protected, whose pixels form the protection data. To address the weakness that a single image inpainting pass is susceptible to forensic localization, we propose a dual-inpainting algorithm to implement the object removal task. The protection data is embedded into the object-removed image using a popular data hiding method. We further use pattern noise forensic detection and objective metrics to assess the proposed method. The results on different scenarios show that the proposed scheme achieves better visual quality and antiforensic capability than state-of-the-art works.

1. Introduction

Photo sharing has become a widespread user activity with the advent of intelligent mobile devices and online social networks (OSN). Distributing images raises privacy concerns and a need to manage permissions, since the shared content contains sensitive user data. By granting specific rights to selected communicating parties in an OSN, users’ security and privacy can be strengthened. A well-established form of privacy protection is to obscure a part of an image, which can be achieved by various image processing techniques, for example, blurring, mosaic, masking, and object removal, as shown in Figure 1. Among these methods, the first three must introduce a significant amount of distortion to hide the underlying content. Object removal provides more natural viewing conditions while still protecting the content. This process is reversible such that the original data can be accessed with permissions [1].

After object removal in an image, the broken parts can be inpainted using the surrounding contents. Generally, image inpainting algorithms can be divided into four groups: statistical-based, diffusion-based, patch-based, and deep generative model-based methods [2, 3]. Statistical methods use parametric models to describe textures but fail when additional intensity gradients are applied [4]. Diffusion-based methods propagate pixels from the known areas of the image [5–7] using smoothness priors; however, blurring occurs when large and high-frequency regions need to be inpainted. Patch-based and deep generative models are the most widely used, where the former fills the holes in the image using patches from local or global search regions [8–12] and the latter exploits semantics learned from large-scale datasets [13–15]. None of these inpainting algorithms considers the secrecy of the inpainted areas from a security perspective. The inpainted regions are easily detected and located by forensic algorithms.

In this paper, we propose a new privacy protection scheme using image inpainting and data hiding, which realizes the antiforensics capability. When considering the undetectability of edge inpainting, we use the algorithm of the DFNet network [16]. The regions around the broken edge are inpainted twice, and the inpainting results are fused to achieve the capability of antiforensics. By combining image dual-inpainting and data hiding, a privacy protection scheme with antiforensics capability is realized. We combine local variation within and between channels and use the popular data hiding algorithm HILL [17] to embed the protection data. The rest of this paper is organized as follows: we introduce the related works in Section 2. The proposed method is depicted in Section 3. Experimental results and analysis are provided in Section 4. Section 5 concludes the whole paper.

2. Related Works

In this section, we introduce the works related to the proposed method, including image inpainting, data hiding, and image forensics.

2.1. Image Inpainting

Image inpainting is a method to fill in the missing information in an image and is quite important in the field of image processing. Nowadays, deep generative model-based methods are widely used in image inpainting [14, 18–23]. These methods can be divided into two categories [24]. One approach uses an effective loss function or constructs an attention model to fill in the missing regions so that the content looks more realistic. Such methods fill the hole with content from the background, and a better way is to fix the unknown region by partial convolution [18]. The other approach focuses on structural consistency. To ensure the continuity of the image structure, these approaches usually adopt edge-based contextual priors. For example, [19] designed an edge linking strategy that resolves the semantic structure inconsistency problem well.

Regardless of the inpainting method, there is a discontinuous transition zone at the edge of the inpainted region. This zone becomes a forensic target that allows an interested party to locate the inpainted area easily, which is quite unsafe. To achieve both a good visual effect and security, a smooth transition needs to be achieved in advance. An iterative method to optimize the pixel gradients in the edge transition regions is proposed in [25]. The quality of fusion depends on whether the incorporated content is consistent with the original content in terms of gradient changes. Thus, Hong et al. [16] design a learnable fusion block to implement pixel-level fusion in the transition region, which is named deep fusion network for image completion (DFNet). The results show that DFNet has superior performance, especially in terms of harmonious texture transition, texture detail, and semantic structural consistency.

2.2. Data Hiding

To further optimize the data embedding problem in information hiding, many adaptive embedding algorithms have been proposed. Among them, adaptive architectures based on STC (Syndrome Trellis Coding) [26] are most preferred by researchers. This method uses a predefined distortion function to minimize the additive distortion between stego and cover. Given the multiscale characteristics of the image space, the design of the distortion function has attracted more and more attention. For instance, Li et al. [17] proposed a new distortion function for image information hiding. The cost function is composed of a high-pass filter and two low-pass filters. The high-pass filter is used to locate the difficult-to-predict parts of an image, and the low-pass filters are then employed to make the low-cost values more clustered. Furthermore, MiPOD (Minimizing the Power of Optimal Detector) [27] and ASO (Adaptive Steganography by Oracle) [28] were proposed one after another. In addition, a number of distortion functions have been proposed for JPEG steganography as well, such as IUERD (Improved UERD) [29], UED (Uniform Embedding Distortion) [30], and RBV (Residual Blocks Value) [31].

In addition, some works use machine learning algorithms to design steganalysis tools to detect steganography. Most of these approaches learn a general steganography model through a supervised strategy and then use it to distinguish suspicious images [32–35]. With the rapid development of deep learning, the performance of steganalysis has been greatly improved [36–38]. However, deep features still have limitations in steganalysis [39]. For example, the truncation and quantization operations in the feature extraction process are difficult for existing networks to learn. Therefore, feature extraction is still a challenge in steganalysis, and many rich feature sets have been used for JPEG steganalysis. The main available feature sets include JPEG rich-model [40], GFR (Gabor filter residuals) [41], and DCTR (Discrete Cosine Transform Residual) [42]. In the classification process, the ensemble classifier is considered to be effective in measuring the feature set [43, 44].

2.3. Image Forensics

Currently, there are two forensic methods for detecting image inpainting [45, 46]. In [45], the authors find that the Laplacian operations along the isophote direction in the inpainted regions are different from those in the other regions. Accordingly, the inpainted regions can be identified by exploring the changes of local variances between intra- and interchannels. In [46], noise pattern analysis is used to locate the inpainted regions. Images captured by the same camera have approximately the same noise pattern, whereas images captured by different cameras do not. Therefore, the noise pattern can be used as the fingerprint of a camera, which is widely adopted in image forensics.

The noise pattern analysis algorithm in [46] is popular. In this model, each pixel value is composed of an ideal pixel value, a multiplicative noise, and various additive noises, which can be expressed by

$$I = f\left(O + K \cdot O + a\right), \quad (1)$$

where $I$ and $O$ are the actual and ideal pixel values of the natural scene, $a$ is the sum of various additive noises, $f(\cdot)$ is the camera processing such as CFA interpolation, and $K$ is the coefficient of the noise pattern. In equation (1), the multiplicative noise $K \cdot O$ is the theoretical expression of the noise pattern, which is a multiplicative noise in the high frequencies related to the image contents. Generally, we can use a low-pass filter to remove the additive noises. The residual noise is then used to estimate the noise pattern [47], as shown in the following equation:

$$\hat{K} = I - F(I), \quad (2)$$

where $F(\cdot)$ is the low-pass filter and $\hat{K}$ is the estimated noise pattern. The noise pattern can be used to distinguish content that comes from different images. Therefore, the inpainted region can be detected after extracting the noise pattern from each part of the image.
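The residual step described above can be illustrated with a small sketch. It is not the filter used in [46] or [47]: a clamped box (mean) filter stands in for the low-pass filter, and the function names `box_filter` and `noise_residual` are hypothetical. The point is only that subtracting a smoothed image from the image leaves the high-frequency component from which a noise pattern would be estimated.

```python
def box_filter(img, radius=1):
    """Stand-in low-pass filter: mean over a (2r+1)x(2r+1) window, borders clamped."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out


def noise_residual(img, radius=1):
    """Residual image minus its low-pass version (assumed estimator, see lead-in)."""
    smooth = box_filter(img, radius)
    return [[img[y][x] - smooth[y][x] for x in range(len(img[0]))]
            for y in range(len(img))]


# A flat patch has no residual; a single perturbed pixel stands out clearly.
flat = [[100.0] * 5 for _ in range(5)]
noisy = [row[:] for row in flat]
noisy[2][2] += 9.0
res_flat = noise_residual(flat)
res_noisy = noise_residual(noisy)
```

In a forensic setting the residual would be computed block by block and compared with a reference pattern; that comparison is outside the scope of this sketch.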

During inpainting, since there are limited pixels around the damaged regions, each diffusion step smooths the region based on the surrounding pixels. Therefore, the pixels located in the inpainted region satisfy $\nabla(\Delta I) \cdot \nabla^{\perp} I = 0$, which means that the result of the Laplacian operation at these positions remains unchanged along the isophote direction after diffusion-based inpainting. The Laplacian variation along the isophote direction can be calculated by

$$\delta_{\Delta I}(i, j) = \Delta I(i, j) - \Delta I_v(i, j), \quad (3)$$

where $\Delta I(i, j)$ is the $(i, j)$-th Laplacian value and $\Delta I_v(i, j)$ is the result of the Laplacian operation on a virtual pixel $I_v(i, j)$. The virtual pixel is located in the direction of $\nabla^{\perp} I(i, j)$, and its distance to the pixel $I(i, j)$ is 1.

3. Proposed Method

In this section, we present an antiforensic framework to perform object removal in images using dual-inpainting and data hiding. As shown in Figure 2, the proposed framework contains four parts. We first select the protected area interactively and calculate the percentage of the area in the whole image. Then, the background with the missing protected area is inpainted. To achieve a satisfactory visual effect and leave as few forensic traces as possible, an image dual-inpainting algorithm is proposed, as shown in Figure 3 and described in Sections 3.1–3.3. For the inpainted image, region segmentation is performed based on the changes of local variances between the intra- and interchannels. Meanwhile, the protected region is converted into a bitstream and embedded into the background using the HILL embedding algorithm, taking the segmentation into account. On the recipient side, we can extract the embedded data, fuse it with the background image, and recover the original image.

3.1. Protection Region Selection

We interactively specify the area in an image to be protected, which also means that the hidden area is determined. After that, we count the pixels to be hidden and record the values and coordinates of these RGB pixels. The pixels are converted into a bitstream for embedding. We represent each pixel with 5 × 9 = 45 bits, in which “5” stands for the five values of a pixel (the pixel values in three channels and the horizontal and vertical coordinates) and “9” means that we convert each decimal value to 9 bits. In a color image, information can be embedded in all three channels at each position. Thus, the maximum amount of embeddable information is three times the image size. Since each protected pixel costs 45 bits while each position carries at most 3 bits, the maximum embedding ratio T is 3/45, that is, about 6.66% per image. Let t be the proportion of the selected protection region. The proportion t should be smaller than the threshold T. An example of the interactive region selection is shown in Figure 4.
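The arithmetic behind the 6.66% bound and the 45-bit pixel record can be checked with a short sketch. The field order (R, G, B, row, column) and the function names are assumptions made for illustration; the text only fixes five values of 9 bits each and one embeddable bit per channel per position. Note that 9 bits cover the values 0 to 511, which matches coordinates in images up to 512 pixels on a side.

```python
BITS_PER_VALUE = 9      # each decimal value is written with 9 bits
VALUES_PER_PIXEL = 5    # R, G, B, row, column (order assumed for illustration)


def pixel_to_bits(r, g, b, row, col):
    """Serialize one protected pixel into a 45-character bit string."""
    values = (r, g, b, row, col)
    if any(not 0 <= v < 2 ** BITS_PER_VALUE for v in values):
        raise ValueError("every value must fit in 9 bits (0..511)")
    return "".join(format(v, "09b") for v in values)


def max_embedding_ratio(bits_per_position=3):
    """Largest protectable fraction: 3 carrier bits per position / 45 payload bits."""
    return bits_per_position / (VALUES_PER_PIXEL * BITS_PER_VALUE)


def selection_fits(selected_pixels, height, width):
    """Check that the selected proportion t does not exceed the bound T."""
    t = selected_pixels / (height * width)
    return t <= max_embedding_ratio()


record = pixel_to_bits(255, 0, 17, 511, 3)
ratio = max_embedding_ratio()          # 1/15, i.e. about 6.66%
```

For a 512 × 512 image the bound corresponds to roughly 17,476 protected pixels, so `selection_fits(17000, 512, 512)` holds while `selection_fits(18000, 512, 512)` does not.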

3.2. Background Processing

After specifying the protection area, we remove the contents in this area and inpaint the image. When inpainting large areas, it is often not possible to blend the inpainted area perfectly with the existing content, especially in the edge areas [16]. To fill this gap, the DFNet network [23] introduces a fusion block, which combines the structural and texture data and blends them smoothly during the inpainting process. As shown in Figure 5, $I$ is the input image, $F_k$ is the feature maps from the $k$-th layer, and $I_k$ is the resized version of $I$. The learnable function $M$ is designed to extract the raw completion $C_k$ from the feature maps $F_k$, which is as follows:

$$C_k = M(F_k), \quad (4)$$

where $M$ denotes the channel conversion operation, which converts the $n$-channel feature maps into a 3-channel image at constant resolution.

In addition, another learnable function $A$ is used to generate the alpha composition map $\alpha_k$:

$$\alpha_k = A(F_k, C_k). \quad (5)$$

The map $\alpha_k$ can be either a single-channel map for imagewise alpha composition or a 3-channel map for channelwise alpha composition. Previous experience has demonstrated that channelwise alpha composition performs better. $A$ is a convolutional module that consists of 3 convolutional layers with kernel sizes of 1, 3, and 1, respectively. The final result is obtained by

$$\hat{I}_k = \alpha_k \odot C_k + (1 - \alpha_k) \odot I_k. \quad (6)$$

The fusion block makes the image inpainted by the DFNet network almost visually free of edge discontinuity. Although the DFNet network achieves good visual results, it is not suitable for privacy protection on its own, since the inpainted region can be easily localized by forensics. For example, pattern noise detection on the image reveals clear artifacts in the edge area of the restoration. To conceal these traces and preserve privacy, further manipulation of the inpainted image is required.
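The blending performed by the fusion block can be sketched as a channelwise alpha composition of the raw completion with the resized input. This is a minimal illustration under the assumption that the composition has the usual form alpha * C + (1 - alpha) * I; the learnable modules that produce the completion and the alpha map are not modelled, and `alpha_compose` is a hypothetical name.

```python
def alpha_compose(alpha, completion, resized_input):
    """Blend per pixel and per channel: alpha * C + (1 - alpha) * I.

    All three arguments are H x W x 3 nested lists; alpha values lie in [0, 1].
    """
    return [
        [
            [a * c + (1.0 - a) * i for a, c, i in zip(a_px, c_px, i_px)]
            for a_px, c_px, i_px in zip(a_row, c_row, i_row)
        ]
        for a_row, c_row, i_row in zip(alpha, completion, resized_input)
    ]


# One row, two pixels: the first is fully inside the hole (alpha = 1),
# the second lies in the transition zone with a different alpha per channel.
alpha = [[[1.0, 1.0, 1.0], [0.25, 0.5, 0.0]]]
completion = [[[10.0, 20.0, 30.0], [100.0, 100.0, 100.0]]]
resized_input = [[[0.0, 0.0, 0.0], [20.0, 40.0, 60.0]]]
blended = alpha_compose(alpha, completion, resized_input)
```

Because alpha varies per channel, each colour plane can transition at its own rate, which is the property the text credits for the better performance of channelwise composition.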

The detectable traces are mostly found in the edge area of the restoration, so we apply secondary processing to the edge area to eliminate the traces left during the restoration process. In this process, we use the dilation and erosion operations of mathematical morphology. In the dilation operation, the structural element $B$ is used as an external window to enlarge the overall boundary of the target image. In the erosion operation, the structural element serves as an internal window to shrink the boundary of the image. The dilation operation is expressed by equation (7) and the erosion operation by equation (8):

$$X \oplus B = \{ z \mid (\hat{B})_z \cap X \neq \emptyset \}, \quad (7)$$

$$X \ominus B = \{ z \mid (B)_z \subseteq X \}, \quad (8)$$

where $X$ is the target image, $(B)_z$ is the structural element $B$ translated to position $z$, and $\hat{B}$ is the reflection of $B$.

The specific dual-inpainting process is shown in Figure 3. Firstly, the background image is inpainted using the DFNet network. Then, we apply a morphological dilation operation to the edges of the mask of the broken region. Based on this dilated mask, secondary inpainting of the primary inpainted image is performed in the region. A morphological erosion operation is then applied to the secondary inpainted region, leaving only the portion of the region close to the edge. Note that the dilation operation uses a larger structural element than the erosion operation to ensure that the results of the secondary inpainting at the edge are preserved. Finally, the results of the secondary inpainting of the edge region are fused with the primary inpainted image to obtain the final result, which resists edge-based detection.
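The mask handling in this procedure can be sketched as follows. This is one possible reading of the text, not the authors' implementation: the edge band is taken as the dilated mask minus the eroded mask, the two inpainting passes are treated as given images, and all function names are hypothetical. The circular structural element follows the choice stated in the experiment setup.

```python
def disk(radius):
    """Offsets (dy, dx) of a circular structural element."""
    return [(dy, dx)
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)
            if dy * dy + dx * dx <= radius * radius]


def dilate(mask, radius):
    """Binary dilation: a pixel is set if the element hits the mask."""
    h, w = len(mask), len(mask[0])
    offsets = disk(radius)
    out = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                for dy, dx in offsets:
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        out[yy][xx] = True
    return out


def erode(mask, radius):
    """Binary erosion: a pixel survives if the element fits inside the mask."""
    h, w = len(mask), len(mask[0])
    offsets = disk(radius)

    def inside(y, x):
        return 0 <= y < h and 0 <= x < w and mask[y][x]

    return [[all(inside(y + dy, x + dx) for dy, dx in offsets)
             for x in range(w)] for y in range(h)]


def edge_band(mask, r_dilate, r_erode):
    """Assumed band around the hole boundary: dilated mask minus eroded mask."""
    grown, core = dilate(mask, r_dilate), erode(mask, r_erode)
    return [[g and not c for g, c in zip(g_row, c_row)]
            for g_row, c_row in zip(grown, core)]


def fuse(primary, secondary, band):
    """Take the secondary result inside the band, the primary result elsewhere."""
    return [[s if b else p for p, s, b in zip(p_row, s_row, b_row)]
            for p_row, s_row, b_row in zip(primary, secondary, band)]


# 9 x 9 toy mask with a 5 x 5 hole; "P"/"S" mark primary/secondary pixels.
mask = [[2 <= y <= 6 and 2 <= x <= 6 for x in range(9)] for y in range(9)]
band = edge_band(mask, r_dilate=2, r_erode=1)
primary = [["P"] * 9 for _ in range(9)]
secondary = [["S"] * 9 for _ in range(9)]
fused = fuse(primary, secondary, band)
```

With the radii reported in the experiment setup (10 for dilation, 5 for erosion) the band would extend 10 pixels outside and 5 pixels inside the hole boundary under this reading.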

3.3. Area Segmentation and Data Hiding

To hide the secret data of the protection region, we employ the popular STC-based data hiding framework [17], and we adapt the popular cost function HILL used with STC to fit the requirements of our method.

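In the STC framework, the theoretical minimum steganographic distortion $D$ for the marked image with an embedding amount of $\gamma$ (bits) can be defined as

$$D = \sum_{i=1}^{M} \sum_{j=1}^{N} \left( p_{i,j}^{+} \rho_{i,j}^{+} + p_{i,j}^{-} \rho_{i,j}^{-} \right), \quad (9)$$

where $p_{i,j}^{+}$ and $p_{i,j}^{-}$ are the probabilities of adding 1 or subtracting 1 on pixel $x_{i,j}$, $p_{i,j}^{\pm} = e^{-\lambda \rho_{i,j}^{\pm}} / \left(1 + e^{-\lambda \rho_{i,j}^{+}} + e^{-\lambda \rho_{i,j}^{-}}\right)$, and $\rho_{i,j}^{\pm}$ stands for the distortion values used to measure the effects of modification. The parameter $\lambda$ ($\lambda > 0$) is used to make the ternary entropy of the modification probabilities identical to the capacity $\gamma$, as shown in the following equation:

$$\gamma = -\sum_{i=1}^{M} \sum_{j=1}^{N} \left( p_{i,j}^{+} \log_2 p_{i,j}^{+} + p_{i,j}^{-} \log_2 p_{i,j}^{-} + p_{i,j}^{0} \log_2 p_{i,j}^{0} \right), \quad (10)$$

where $p_{i,j}^{0} = 1 - p_{i,j}^{+} - p_{i,j}^{-}$ is the probability of leaving the pixel unchanged.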
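The role of the parameter λ can be made concrete with a small numerical sketch. It assumes the standard Gibbs form of the modification probabilities, with probabilities proportional to exp(-λρ) and the unchanged state having zero cost, and it finds λ by bisection so that the ternary entropy equals the payload. The function names and the toy cost values are hypothetical, and costs are assumed to be strictly positive.

```python
import math


def modification_probabilities(rho_plus, rho_minus, lam):
    """Return (p_plus, p_minus, p_zero) under the assumed Gibbs form."""
    e_plus = math.exp(-lam * rho_plus)
    e_minus = math.exp(-lam * rho_minus)
    z = 1.0 + e_plus + e_minus
    return e_plus / z, e_minus / z, 1.0 / z


def ternary_entropy(costs, lam):
    """Total entropy in bits of the per-pixel modification distributions."""
    total = 0.0
    for rho_plus, rho_minus in costs:
        for p in modification_probabilities(rho_plus, rho_minus, lam):
            if p > 0.0:
                total -= p * math.log2(p)
    return total


def solve_lambda(costs, gamma, iterations=60):
    """Bisection on lambda; the entropy decreases as lambda grows."""
    if not 0.0 < gamma < len(costs) * math.log2(3):
        raise ValueError("payload must lie strictly between 0 and n*log2(3) bits")
    low, high = 0.0, 1.0
    while ternary_entropy(costs, high) > gamma:   # widen until the target is bracketed
        high *= 2.0
    for _ in range(iterations):
        mid = (low + high) / 2.0
        if ternary_entropy(costs, mid) > gamma:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


def expected_distortion(costs, lam):
    """Sum of p_plus * rho_plus + p_minus * rho_minus over all pixels."""
    total = 0.0
    for rho_plus, rho_minus in costs:
        p_plus, p_minus, _ = modification_probabilities(rho_plus, rho_minus, lam)
        total += p_plus * rho_plus + p_minus * rho_minus
    return total


# Four toy pixels with symmetric costs, asked to carry 3 bits in total.
costs = [(1.0, 1.0), (2.0, 2.0), (0.5, 0.5), (4.0, 4.0)]
lam = solve_lambda(costs, gamma=3.0)
distortion = expected_distortion(costs, lam)
```

Raising the payload lowers the λ that is found and raises the expected distortion, which is the trade-off that the cost function design tries to make favourable.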

To achieve the minimum distortion $D$, STC encoding is used. Let the secret bits be $m = [m_1, m_2, \ldots, m_\gamma]^T \in \{0, 1\}^{\gamma}$, the cover pixels $c = [c_1, c_2, \ldots, c_{MN}]^T$, and the stego pixels $y = [y_1, y_2, \ldots, y_{MN}]^T$. Then, $m$ can be embedded into $c$ using

$$y_l = \arg\min_{z \in C(m)} D(c, z), \quad (11)$$

where $y_l \in \{0, 1\}^{MN}$ is the least significant bits of the stego image, $C(m) = \{ z \in \{0, 1\}^{MN} \mid Hz = m \}$ is the coset of $m$, and $H \in \{0, 1\}^{\gamma \times MN}$ is a predefined low-density parity-check matrix related to embedding speed and embedding efficiency. The embedded bits $m$ can be extracted simply by a matrix multiplication operation:

$$m = H y_l. \quad (12)$$
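The coset formulation can be illustrated on a toy example. The sketch below is not an STC implementation: real STC searches the coset with a Viterbi pass over a trellis, whereas this code simply enumerates all binary vectors, which is only feasible for a handful of pixels. The matrix, costs, and function names are made up for illustration; arithmetic is modulo 2.

```python
from itertools import product


def syndrome(H, z):
    """Compute H z over GF(2)."""
    return [sum(h * b for h, b in zip(row, z)) % 2 for row in H]


def embed(H, cover_lsb, message, costs):
    """Brute-force search of the coset {z : Hz = m} for the cheapest change."""
    best, best_cost = None, float("inf")
    for z in product((0, 1), repeat=len(cover_lsb)):
        if syndrome(H, z) != list(message):
            continue
        # Additive distortion: sum of the costs of the flipped positions.
        cost = sum(c for c, a, b in zip(costs, cover_lsb, z) if a != b)
        if cost < best_cost:
            best, best_cost = list(z), cost
    return best, best_cost


def extract(H, stego_lsb):
    """Recover the message with a single matrix multiplication."""
    return syndrome(H, stego_lsb)


# Toy instance: 2 message bits carried by 6 cover LSBs.
H = [[1, 0, 1, 1, 0, 0],
     [0, 1, 1, 0, 1, 0]]
cover_lsb = [1, 0, 0, 1, 1, 0]
costs = [3.0, 2.0, 9.0, 1.0, 4.0, 0.5]
message = [1, 0]
stego_lsb, cost = embed(H, cover_lsb, message, costs)
recovered = extract(H, stego_lsb)
```

Here flipping the single expensive position 2 would also produce the right syndrome, but the search prefers flipping the two cheap positions 1 and 3, which is the behaviour the cost function is meant to steer.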

To fit the requirements in our method, we improve the popular cost function HILL for STC by combining variations within and between adjacent pixel channels. Specifically, we divide the cover image into four regions (marked with green, blue, black, and red in Figure 6) using the cost values of HILL and edge connectivity. The pixel complexity of the four regions decreases in the order of green, blue, black, and red. In other words, the green region has the most complex pixels and is the best embedding region for the whole image. Therefore, secret bits are embedded into the green region preferentially.

4. Experimental Results

This section presents the experimental evaluation results. Firstly, we introduce the database employed and the corresponding parameters. Then, experiments for each part are presented in turn and their validity is demonstrated.

4.1. Performance for Antiforensics

To evaluate the performance of antiforensics, we randomly select images from the database for validation and interactively select the areas to be protected, as described in Section 3.1.

In each image, the selected protected area generally has an irregular shape. For later embedding of data, we strictly keep the ratio of the protected area to the image below 6.66%. We use two separate forensic approaches to analyze our results: one is based on pattern noise, and the other is based on changes between and within adjacent pixel channels.

Firstly, we select 50 landscape images sized 512 × 512 from Today’s Headlines. As shown in Figure 7, we select four of them, denoted I1, I2, I3, and I4. Table 1 lists, for the protection area of each of the four images in Figure 7, its proportion t of the whole image and the number of pixels to be embedded. Figure 7(c) shows the images inpainted by DFNet, Figure 7(e) shows the images inpainted by our method, and Figures 7(d) and 7(f) show the pattern noise maps of Figures 7(c) and 7(e), respectively. Compared with the ground truth in Figure 7(b), Figure 7(d) has obvious traces at the repair edges, which makes the repaired region easy to locate forensically. Our method overcomes this drawback well: it is difficult to locate our tampered region using pattern noise forensics alone. This shows that our method is effective against pattern noise forensics.

In Figure 8, we show the experimental results for five images (M1, M2, M3, M4, and M5) from the UCID database, sized 384 × 512. Table 2 lists, for the protection area of each of the five images in Figure 8, its proportion t of the whole image and the number of pixels to be embedded. Two traditional methods and a deep learning method are used for comparison, where the traditional methods are the edge-oriented and Delaunay-oriented methods provided by G’MIC [48], a full-featured open-source framework for image processing. The deep learning-based one is the DFNet method mentioned in [16].

In terms of subjective visual quality, both our results and the deep learning method outperform the traditional methods and achieve good visual connectivity at the edges. In particular, in row 7 of Figure 8, the region at the red petal looks natural after our secondary processing of the restored edges is blended with the primary restored image.

In addition, we apply the forensic algorithm proposed in [46] to localize the inpainted regions, as shown in the even rows of Figure 8. The results of the traditional restoration algorithms are easy to detect and locate, while the DFNet-based restoration achieves good antiforensic results. However, the images obtained by our method are more suitable for hiding the area to be protected. In particular, the results are better when the area to be protected accounts for less than 4% of the whole image.

In Table 3, we show the F1 values of the five images in Figure 8, where a smaller F1 value indicates that the forensic algorithm is less able to locate the inpainted region correctly and hence that the antiforensic effect is better. We can see from Table 3 that our method is superior in terms of objective indicators.

4.2. Experiment Setup

In our experiments, we use the free user-shared image dataset provided by Today’s Headlines, which contains a large number of images of people, landscapes, and various life scenes. We also use the UCID database. Based on the maximum amount of data that can be embedded in an image, the size of the protected area must not exceed 6.66% of the whole image (T = 6.66%), no matter how large the image is. For the structural elements used in the mathematical morphology of the background processing, a circular structure is employed since it has a smoother edge; the structure size is 10 for the dilation operation and 5 for the erosion operation.

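To evaluate the performance of image dual-inpainting against detection and localization, we adopt the F1-score, peak signal-to-noise ratio (PSNR), and mean square error (MSE) as objective indicators to evaluate the inpainting results:

$$F_1 = \frac{2TP}{2TP + FN + FP}, \quad (13)$$

where TP (true positive), FN (false negative), and FP (false positive) stand for the number of detected inpainted pixels, undetected inpainted pixels, and wrongly detected untouched pixels, respectively:

$$MSE = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \left( A(i, j) - B(i, j) \right)^2, \quad (14)$$

$$PSNR = 10 \log_{10} \frac{255^2}{MSE}, \quad (15)$$

where $A(i, j)$ and $B(i, j)$ are the pixel values of the original image and the inpainted image, respectively, and $M \times N$ is the image size.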
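The three indicators can be computed as in the following sketch, which uses the standard definitions of the F1-score, MSE, and PSNR. It assumes binary localization masks for F1 and single-channel 8-bit images (peak value 255) for MSE and PSNR; the function names are illustrative.

```python
import math


def f1_score(detected, truth):
    """F1 of a detected mask against the true inpainted mask (both binary)."""
    tp = fn = fp = 0
    for d_row, t_row in zip(detected, truth):
        for d, t in zip(d_row, t_row):
            if t and d:
                tp += 1          # inpainted pixel that was detected
            elif t and not d:
                fn += 1          # inpainted pixel that was missed
            elif d and not t:
                fp += 1          # untouched pixel that was wrongly flagged
    denom = 2 * tp + fn + fp
    return 2 * tp / denom if denom else 0.0


def mse(a, b):
    """Mean square error between two equally sized single-channel images."""
    n = len(a) * len(a[0])
    return sum((pa - pb) ** 2
               for ra, rb in zip(a, b)
               for pa, pb in zip(ra, rb)) / n


def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(a, b)
    return math.inf if err == 0 else 10.0 * math.log10(peak * peak / err)


truth = [[1, 1, 0, 0]]
detected = [[1, 0, 1, 0]]
score = f1_score(detected, truth)      # one hit, one miss, one false alarm

original = [[0, 0], [0, 0]]
inpainted = [[0, 0], [0, 2]]
error = mse(original, inpainted)
quality = psnr(original, inpainted)
```

In the antiforensic setting a low F1 is the desirable outcome, since it measures how well the forensic detector localizes the inpainted region, while a high PSNR indicates that the inpainted image stays close to the original.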

4.3. Reversibility Analysis

In this section, we show that our privacy protection method is effective during communication or sharing. Meanwhile, our method is fully reversible, which enables data to be extracted when it reaches the recipient side.

In Figure 9, we show five sets of comparisons between the recovered images and the original images. The first two are from the Today’s Headlines database and the last three are from the UCID database. During inpainting and embedding, the regions other than the region to be protected are neither damaged nor tampered with. Therefore, given the pixel values and coordinates of the region to be protected, the original images can be recovered.
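The recovery step on the recipient side can be sketched as follows, assuming the extracted bitstream consists of 45-bit records as described in Section 3.1. The field order (R, G, B, row, column) and the function names are assumptions for illustration, and the sketch takes the unprotected pixels as unchanged, as stated above.

```python
RECORD_BITS = 45   # five 9-bit values per protected pixel


def bits_to_records(bitstream):
    """Split the extracted bit string into (r, g, b, row, col) tuples."""
    records = []
    for start in range(0, len(bitstream) - RECORD_BITS + 1, RECORD_BITS):
        chunk = bitstream[start:start + RECORD_BITS]
        records.append(tuple(int(chunk[i:i + 9], 2) for i in range(0, RECORD_BITS, 9)))
    return records


def restore(background, records):
    """Write the protected pixels back into a copy of the background image."""
    restored = [[tuple(px) for px in row] for row in background]
    for r, g, b, row, col in records:
        restored[row][col] = (r, g, b)
    return restored


# 2 x 2 toy image whose pixel at row 1, column 0 was removed and inpainted.
original = [[(1, 2, 3), (4, 5, 6)],
            [(200, 10, 30), (7, 8, 9)]]
background = [[(1, 2, 3), (4, 5, 6)],
              [(5, 5, 5), (7, 8, 9)]]
bitstream = "".join(format(v, "09b") for v in (200, 10, 30, 1, 0))
recovered = restore(background, bits_to_records(bitstream))
```

Since the record stores exact pixel values and coordinates, the recovered image matches the original wherever the background itself was left untouched.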

5. Conclusion

Currently, most privacy protection methods focus only on visual quality, while real protection needs to be considered from the perspective of image security analysis. We propose a reversible privacy protection scheme using image dual-inpainting and data hiding, in which the original image can be perfectly recovered. Experimental results show that, after the area to be protected is removed and the image is inpainted by the dual-inpainting algorithm, the result resists the two current forensic methods for object removal. The subsequent embedding and extraction of the protected region also effectively combine the two research directions of antiforensics and steganography. In addition, reversible privacy protection not only effectively stops snooping but also guarantees that the original image can be recovered when needed.

Data Availability

In our experiments, we use the free user-shared image dataset provided by Today’s Headlines, which contains a large number of images of people, landscapes, and various life scenes. We also use the UCID database.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the Natural Science Foundation of China (U20B2051).