Abstract

The color filter array (CFA) of the camera is an effective fingerprint for digital forensics. Most previous CFA-based forgery localization methods operate under the assumption that the interpolation algorithm is linear. However, interpolation algorithms commonly used in digital cameras are nonlinear, and their coefficients vary with content to enhance edge information. To avoid the impact of this impractical assumption, a CFA-based forgery localization method that is independent of the linear assumption is proposed. The probability of an interpolated pixel value falling within the range of its neighboring acquired pixel values is computed. This probability serves to discern the presence or absence of CFA artifacts and to distinguish between interpolation techniques. Subsequently, curvature is employed in the analysis to select suitable features for generating the tampering probability map. Experimental results on the Columbia and Korus datasets indicate that the proposed method outperforms the state-of-the-art methods and is also more robust to various attacks, such as noise addition, Gaussian filtering, and JPEG compression with a quality factor of 90.

1. Introduction

With the rapid development of image editing technologies, digital image manipulation has become increasingly easy to perform. Unfortunately, tampered images can cause harm when they spread rapidly on the Internet. Consequently, image forensics, which covers forgery detection, forgery localization, and camera identification, has attracted significant attention in recent years [1]. In practical forensic applications, researchers are more interested in forgery localization, i.e., locating tampered regions, than in the other goals [2].

Most forgery localization methods can be classified as either physics-based or statistical. The physics-based methods study physical inconsistencies of images, such as the direction of incident light [3], illumination color [4], or shading and shadows [5]. These methods analyze the overall image information with physical models. They are robust to most image postprocessing, such as resizing and recompression. Although they perform well on tightly controlled scenes, they are seldom applicable to real-world images [6].

The most successful and widespread forgery localization methods are statistical. They depend on the intrinsic fingerprints left on the image during the capture process, such as noise level [7, 8], lens aberration [9], or color filter array (CFA) [10, 11]. Although these efficient methods have been widely used, their localization performance degrades significantly for images that have undergone postprocessing, such as median filtering.

Fortunately, most postprocessing operations can be detected, such as resampling [12, 13], median filtering [14, 15], and contrast enhancement [16, 17]. Moreover, the various forgery localization methods can be treated as tools, and a fusion framework combining different tools can avoid their individual drawbacks and limitations in practical applications. Fontani et al. [18] employed Dempster–Shafer theory to define a fusion framework for image forensics, which can be easily extended incrementally with new tools. Jeong et al. [19] proposed to identify the types of image forgery using a set of mixed statistical moments. Furthermore, Cozzolino et al. [20] fused the outputs of two fine-tuned algorithms to exploit their respective strengths and weaknesses. This technique obtained the best score in phase 1 of the first Image Forensics Challenge in 2013. Because statistical methods serve as tools within fusion frameworks in practical applications, improving an individual statistical method remains worthwhile.

In this paper, we propose a novel CFA-based forgery localization method. Most previous CFA-based methods assume that the interpolation algorithms used in digital cameras are linear, thereby simplifying the model. However, the interpolation algorithms used are often nonlinear [21], which reduces the performance of these methods in practical applications. For nonlinear interpolation algorithms, the coefficients may vary with different image components, but the acquired pixel domain used for interpolation can be assumed constant. The interpolation process is similar to low-pass filtering, which makes the interpolated pixel value linearly related to the acquired pixel values in this domain. Therefore, we calculate the probability that an interpolated pixel value is within the range of its neighboring acquired pixel values within the predicted window size, and this probability is normalized to obtain a new feature. Finally, the expectation–maximization algorithm and curvature are employed for statistical distribution analysis to obtain the tampering probability map. This method is independent of the linear assumption and insensitive to content, resulting in improved performance. The experimental results show that the proposed method outperforms the reference methods and is more robust to attacks than other CFA-based methods.

The main contributions of this paper can be summarized as follows: (1) A content-insensitive CFA fingerprint is proposed for forgery localization. (2) Curvature is used to automatically determine whether the statistical feature can distinguish between original and tampered regions. (3) Experiments using publicly available datasets show that the proposed method outperforms the reference methods.

The rest of this paper is organized as follows. Section 2 reviews previous work on CFA in image forensics. In Section 3, we present the theory of the novel CFA-based forgery localization method. We describe the experimental evaluation in Section 4 and conclude this work in Section 5.

2. Related Work

Commercial digital cameras are equipped with a CFA in front of the image sensor, so that only a single color sample is captured at each pixel location. To obtain a three-channel color image, an interpolation algorithm is employed to estimate the other two color samples. For the most widely used Bayer CFA, the green pixels are sampled on a quincunx lattice, and the red and blue pixels are sampled on the complementary locations. This CFA has four configurations: RGGB, BGGR, GRBG, and GBRG. The top-left of the CFA image with the RGBG configuration is illustrated in Figure 1.
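The sampling layout described above can be sketched as a boolean mask. This is a minimal illustration, not code from the paper: the function name `green_mask` and the `GREEN_OFFSETS` table are hypothetical, and the offsets follow the usual reading of the four configuration names as the top-left 2x2 tile in row-major order.

```python
import numpy as np

# (row, col) offsets of the two green sites inside each 2x2 Bayer tile,
# reading each configuration name as the tile in row-major order.
GREEN_OFFSETS = {
    "RGGB": [(0, 1), (1, 0)],
    "BGGR": [(0, 1), (1, 0)],
    "GRBG": [(0, 0), (1, 1)],
    "GBRG": [(0, 0), (1, 1)],
}

def green_mask(height, width, pattern):
    """Return a boolean array that is True where green is acquired."""
    mask = np.zeros((height, width), dtype=bool)
    for row_off, col_off in GREEN_OFFSETS[pattern]:
        mask[row_off::2, col_off::2] = True
    return mask

mask_grbg = green_mask(4, 4, "GRBG")
mask_rggb = green_mask(4, 4, "RGGB")
```

Half of the sites are green in every configuration, and the two possible green lattices are complements of each other, which is why the green channel only needs to be analyzed on "even" and "odd" positions.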

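Let us suppose that $s(x, y)$, with $(x, y) \in \mathbb{Z}^2$, is the observed CFA image, and $s_A(x, y)$ denotes the acquired green signal constructed from $s$ as follows:

$$s_A(x, y) = \begin{cases} s(x, y), & \text{if } (x, y) \text{ is an acquired green location}, \\ 0, & \text{otherwise}. \end{cases} \tag{1}$$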

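The green channel $G$ of a complete color image is composed of an acquired component and an interpolated component:

$$G(x, y) = \begin{cases} s_A(x, y), & \text{if } (x, y) \text{ is an acquired green location}, \\ \sum_{u, v = -N}^{N} h_{u, v}\, s_A(x + u, y + v), & \text{otherwise}, \end{cases} \tag{2}$$

where $s_A$ is the acquired green signal and $h_{u, v}$ denotes the interpolation coefficients for the acquired pixels within the $(2N + 1) \times (2N + 1)$ window.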
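As a concrete instance of the acquired/interpolated decomposition, the sketch below fills the missing green samples with bilinear interpolation, i.e., the mean of the four nearest acquired neighbours. It is an illustration under stated assumptions: green is assumed to be acquired where row + column is even, borders are handled by reflection, and the function name is hypothetical. Real cameras typically use more elaborate, content-adaptive kernels.

```python
import numpy as np

def interpolate_green_bilinear(cfa, mask):
    """Keep acquired green samples; fill the rest with the mean of the
    four nearest acquired neighbours (window half-size 1)."""
    acquired = np.where(mask, cfa, 0.0).astype(float)
    # Reflect padding keeps the checkerboard parity across the border.
    padded = np.pad(acquired, 1, mode="reflect")
    neighbour_sum = (padded[:-2, 1:-1] + padded[2:, 1:-1]
                     + padded[1:-1, :-2] + padded[1:-1, 2:])
    return np.where(mask, acquired, neighbour_sum / 4.0)

rows, cols = np.indices((4, 4))
mask = (rows + cols) % 2 == 0      # assumed: green acquired where row + col is even
cfa = 10.0 * rows + cols           # toy sensor readings
green = interpolate_green_bilinear(cfa, mask)
```

Every interpolated value here is an equal-weight average of acquired samples, so it necessarily lies between the smallest and largest of them.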

The specific correlations introduced by CFA interpolation can be quantified for image forensics. Popescu and Farid [10] introduced the expectation–maximization (EM) algorithm to estimate the interpolation coefficients and obtained the probability of each pixel being correlated with its adjacent pixels. The periodicity of the resulting probability map, which derives from the interpolation artifacts, is particularly prominent in the Fourier domain. Bammey et al. [22] used a least-squares optimal filter instead of the iterative EM algorithm. Furthermore, Fernández et al. [23] estimated the interpolation coefficients with the ordinary least squares algorithm and applied the discrete cosine transform on small blocks for forgery localization. The main advantage of these methods is that a wide range of modifications can be detected without previous training or prior knowledge. However, they rely on the estimation of interpolation coefficients, which significantly increases the computational burden.

In addition, Choi et al. [24] defined different neighbor patterns and estimated the CFA pattern from the number of intermediate values in each channel. Moreover, they measured hue changes with the intermediate value counting approach to identify image color modification [25]. Shin et al. [26] identified the CFA pattern configuration based on the relationship between the variances of acquired and interpolated samples in the red, blue, and green channels. Jeon et al. [21] differentiated the CFA pattern by the truncated sum of the singular values. Besides, the prediction error is the most widely used feature, and it is defined as follows [27]:

$$e(x, y) = G(x, y) - \sum_{(u, v) \neq (0, 0)} k_{u, v}\, G(x + u, y + v), \tag{3}$$

where $G$ is the green channel and $k_{u, v}$ denotes the predicted interpolation coefficients for the acquired pixels within a window.
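A minimal sketch of the prediction error follows, assuming the predicted coefficients are those of bilinear interpolation (weight 1/4 on each of the four nearest neighbours) and reflective borders. The function name is hypothetical; methods in the literature may use other predictors and windows.

```python
import numpy as np

def prediction_error(green):
    """Residual between each pixel and the mean of its 4-neighbours
    (bilinear coefficients used as the predicted kernel)."""
    green = np.asarray(green, dtype=float)
    padded = np.pad(green, 1, mode="reflect")
    predicted = (padded[:-2, 1:-1] + padded[2:, 1:-1]
                 + padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0
    return green - predicted

spike = np.zeros((5, 5))
spike[2, 2] = 4.0                  # one isolated bright sample
error = prediction_error(spike)
```

The isolated sample produces a large residual at its own position and small negative residuals at its neighbours. When the camera's real interpolator differs from the assumed kernel, interpolated pixels also show nonzero residuals, which is the weakness the linear assumption introduces.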

Ferrara et al. [27] proposed a feature based on the prediction error variance to measure the presence or absence of CFA traces, obtaining a fine-grained tampering probability map that can detect small forgeries. Singh et al. [28] introduced a Markov random process to reduce the false detections and computational complexity, building on the work of Ferrara et al. [27]. Lu et al. [29] applied a breadth-first search neighbors clustering algorithm to detect copied and duplicated regions in copy–move images and then localized the duplicated regions based on the prediction error. Furthermore, Chang et al. [30] detected photographic images and identified device classes based on the Fourier spectrum of the prediction error variances.

Although these methods based on prediction error have achieved good performance in various image forensics tasks, their linear interpolation assumption degrades their performance in practical applications. Most of the interpolation algorithms used in cameras are nonlinear, and their coefficients vary with the gradient to enhance edge information. As a result, these previous methods are sensitive to the content and sometimes even fail to extract CFA fingerprints effectively.

3. The Proposed Method

Similar to most previous CFA-based splicing forgery localization methods, we study the familiar Bayer CFA in the green channel. For each $2 \times 2$ square of the green channel, the number of acquired and interpolated pixels is equal. These two kinds of pixels can be separated according to even and odd locations. However, the interpolated pixels have four locations in red and blue channels. Consequently, extracting CFA features from the green channel effectively reduces the computational complexity. The proposed forgery localization framework is illustrated in Figure 2.

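Let $G(x, y)$ be the pixel value at $(x, y)$ of the green channel $G$. Equation (2) shows that the interpolated pixel value is a weighted sum of its neighboring acquired pixel values, and the weights $h_{u, v}$ satisfy:

$$\sum_{u, v = -N}^{N} h_{u, v} = 1. \tag{4}$$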

Let $N_r$ be the real window half-size $N$ used in the interpolation algorithm of the camera. For example, $N_r$ is equal to 1 for the bilinear interpolation algorithm and equal to 2 for the gradient-based interpolation algorithm [10].

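Let $\Omega_N(x, y)$ denote the values of the pixels on the quincunx lattice centered at $(x, y)$ within the $(2N + 1) \times (2N + 1)$ window. The minimum and maximum values of $\Omega_N(x, y)$ are defined as follows:

$$m_N(x, y) = \min \Omega_N(x, y), \qquad M_N(x, y) = \max \Omega_N(x, y). \tag{5}$$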

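When $G(x, y)$ is an interpolated pixel and $h_{u, v} \ge 0$, we can conclude that $G(x, y)$ ranges from $m_N(x, y)$ to $M_N(x, y)$, the minimum and maximum of the acquired values within the real window ($N = N_r$):

$$m_N(x, y) \le G(x, y) \le M_N(x, y). \tag{6}$$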
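The range property follows from a one-line convexity argument. As a sketch, write $g$ for the interpolated value, $a_i$ for the acquired neighbours inside the true window, and $w_i$ for their weights (names chosen here for illustration only), and assume the weights are nonnegative and sum to one:

```latex
\min_j a_j \;=\; \sum_i w_i \min_j a_j \;\le\; \sum_i w_i a_i \;=\; g
\;\le\; \sum_i w_i \max_j a_j \;=\; \max_j a_j,
\qquad w_i \ge 0,\quad \sum_i w_i = 1 .
```

The argument never uses the actual values of the weights, only their sign and their sum, which is why the property can hold even when the weights change from pixel to pixel. If some weight is negative, the bound may fail, so the nonnegativity condition is essential.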

The probability that $G(x, y)$ satisfies the range condition in Equation (6) is defined as $p$. When $G(x, y)$ is an acquired pixel, $p$ is denoted as $p_A$; when $G(x, y)$ is an interpolated pixel, $p$ is denoted as $p_I$. Obviously, $p_I = 1$ and $p_A < 1$.

Generally, since the in-camera interpolation algorithm is unknown, the real window size $N_r$ is also unknown. Therefore, a predicted window size, denoted $N_p$, is used. The probability $p_I$ can take different values depending on the relationship between $N_p$ and $N_r$.

As shown in Figure 3, the yellow window denotes the real window containing the acquired pixels used for interpolation, whose size is $N_r$. The red windows denote the predicted windows for interpolation, one with $N_p \ge N_r$ and one with $N_p < N_r$. Moreover, the dark green cells denote larger interpolation coefficients, and the pale green cells denote smaller ones. For the bigger red window ($N_p \ge N_r$), $\Omega_{N_p}$ contains all acquired pixel values used for interpolation, and $G(x, y)$ is linearly correlated with them, resulting in $p_I = 1$. However, for the smaller red window ($N_p < N_r$), some of the acquired pixel values used for interpolation are not within $\Omega_{N_p}$, resulting in $p_I < 1$.

For most interpolation algorithms, the acquired pixel values closest to the interpolated pixel have higher weights. These neighboring values contribute significantly to the interpolated value. Therefore, when $N_p < N_r$, $G(x, y)$ and $\Omega_{N_p}(x, y)$ are still strongly correlated and $p_I > p_A$, which can be used to distinguish between interpolated pixels and acquired pixels. Additionally, since $p_I$ is mainly affected by the difference between $N_p$ and $N_r$, it is constant for the same interpolation algorithm. Specifically, $p_I$ can be used to differentiate various interpolation algorithms, and it is insensitive to the content.

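To obtain $p$, we define the comparison result as follows:

$$C(x, y) = \begin{cases} 1, & m_{N_p}(x, y) \le G(x, y) \le M_{N_p}(x, y), \\ 0, & \text{otherwise}, \end{cases}$$

where $C(x, y) \in \{0, 1\}$, and $m_{N_p}$ and $M_{N_p}$ are the minimum and maximum of the neighboring values within the predicted window. When $G(x, y)$ satisfies Equation (6), $C(x, y) = 1$. When $G(x, y)$ does not satisfy Equation (6), $C(x, y) = 0$.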

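Since the locations of the acquired and interpolated pixels are unknown, $p$ needs to be estimated separately on the even and odd locations. For the green channel of an image:

$$C_e(x, y) = \begin{cases} C(x, y), & x + y \text{ even}, \\ 0, & \text{otherwise}, \end{cases} \qquad C_o(x, y) = \begin{cases} C(x, y), & x + y \text{ odd}, \\ 0, & \text{otherwise}. \end{cases}$$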
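The comparison step and the even/odd split can be sketched as follows. This is an illustration, not the authors' implementation: the names `comparison_map`, `split_by_parity`, and `n_p` (the predicted window half-size) are hypothetical, the neighbour lattice is taken to be the positions of opposite parity inside the window, and borders are reflected.

```python
import numpy as np

def comparison_map(green, n_p=1):
    """1 where a pixel lies within [min, max] of its quincunx neighbours."""
    height, width = green.shape
    padded = np.pad(green.astype(float), n_p, mode="reflect")
    lo = np.full(green.shape, np.inf)
    hi = np.full(green.shape, -np.inf)
    for du in range(-n_p, n_p + 1):
        for dv in range(-n_p, n_p + 1):
            if (du + dv) % 2 == 0:
                continue  # same parity as the centre: not a neighbour site
            shifted = padded[n_p + du:n_p + du + height,
                             n_p + dv:n_p + dv + width]
            lo = np.minimum(lo, shifted)
            hi = np.maximum(hi, shifted)
    return ((green >= lo) & (green <= hi)).astype(np.uint8)

def split_by_parity(cmap):
    """Separate the map into its even (row + col even) and odd parts."""
    rows, cols = np.indices(cmap.shape)
    even = (rows + cols) % 2 == 0
    return np.where(even, cmap, 0), np.where(~even, cmap, 0)

spike = np.zeros((5, 5))
spike[2, 2] = 10.0                 # a value outside the range of its neighbours
c_map = comparison_map(spike, n_p=1)
c_even, c_odd = split_by_parity(c_map)
```

Only the isolated sample fails the range test. In a real demosaiced image, one parity class (the interpolated one) is expected to pass the test far more often than the other.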

$C_e$ and $C_o$ are the two resulting binarized comparison maps, whose densities can be used to estimate $p$ on the even and odd locations, respectively. The maps $C_e$ and $C_o$ are divided into $B \times B$ sub-blocks at a one-pixel step, and the sums of the values in each block are denoted as $S_e$ and $S_o$, respectively. The densities of $C_e$ and $C_o$, named $D_e$ and $D_o$, are estimated from the block sums $S_e$ and $S_o$.
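One plausible way to turn block sums into densities is sketched below. The normalization is an assumption made for illustration (each block sum is divided by the number of same-parity sites in the block), and the helper names are hypothetical. An integral image gives the sliding block sums at a one-pixel step.

```python
import numpy as np

def box_sum(values, block):
    """Sum over every block x block window (one-pixel step, no padding)."""
    integral = np.pad(values.astype(float), ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    return (integral[block:, block:] - integral[:-block, block:]
            - integral[block:, :-block] + integral[:-block, :-block])

def parity_density(cmap, block, parity):
    """Fraction of sites with (row + col) % 2 == parity that equal 1.

    Assumed normalisation; requires block >= 2 so that every block
    contains sites of both parities.
    """
    rows, cols = np.indices(cmap.shape)
    sites = ((rows + cols) % 2 == parity).astype(float)
    hits = box_sum(cmap * sites, block)
    counts = box_sum(sites, block)
    return hits / counts

rows, cols = np.indices((6, 6))
checker = ((rows + cols) % 2 == 0).astype(np.uint8)  # toy map: hits on even sites only
d_even = parity_density(checker, 3, 0)
d_odd = parity_density(checker, 3, 1)
```

With this normalization the density is a per-block estimate of the pass probability for one parity class, and it stays in [0, 1] regardless of the block size.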

To establish a simple and tractable model, we assume that $D_e$ and $D_o$ follow Gaussian distributions in the original image. For the $D_e$ of a forged image, let $H_0$ and $H_1$ be the hypotheses of the original and tampered regions. Since the CFA fingerprints under $H_0$ and $H_1$ are different, we can describe pixels belonging to $H_0$ and $H_1$ with the conditional probability density functions as follows:

$$\Pr\{D_e \mid H_0\} = \mathcal{N}(\mu_0, \sigma_0^2), \qquad \Pr\{D_e \mid H_1\} = \mathcal{N}(\mu_1, \sigma_1^2),$$

where $\mu_0$ and $\mu_1$ are different, making the distribution of $D_e$ have two peaks, which can be regarded as a Gaussian mixture model (GMM).

To analyze the distribution of $D_e$, we introduce the EM algorithm [31]. It is a well-known iterative method that estimates the means ($\mu_0$ and $\mu_1$), variances ($\sigma_0^2$ and $\sigma_1^2$), and mixing coefficients ($\pi_0$ and $\pi_1$) of the component distributions by maximizing the expectation of a complete log-likelihood function. With these parameters, the GMM can be written as follows [32]:

$$f(d \mid \theta) = \sum_{k = 0}^{1} \pi_k\, g(d \mid \mu_k, \sigma_k^2),$$

where $f(d \mid \theta)$ is a GMM function fitted by the parameters $\theta = \{\pi_k, \mu_k, \sigma_k^2\}$, denoted by $f(d)$ for notational simplicity, $d$ is a 1D continuous-valued data vector, and $g(d \mid \mu_k, \sigma_k^2)$ are the component Gaussian densities. However, for the original image, $D_e$ is assumed to follow a Gaussian distribution with only one peak. Therefore, we introduce the curvature of $f(d)$ to distinguish between GMM and Gaussian distributions:

$$\kappa(d) = \frac{f''(d)}{\left(1 + f'(d)^2\right)^{3/2}},$$

where $f'(d)$ and $f''(d)$ are the first-order and second-order derivatives of $f(d)$. For the Gaussian distribution, the signed curvature is positive in the tails and negative around the peak, so it changes from positive to negative and then back to positive. Therefore, the curvature of the Gaussian distribution has two sign changes, while the curvature of the GMM has more than three. The sign changes in $\kappa(d)$ are counted and marked with $F_e$:

$$F_e = \begin{cases} 1, & t_e > 3, \\ 0, & \text{otherwise}, \end{cases}$$

where $t_e$ is the number of sign changes in $\kappa(d)$. When $F_e = 1$, the distribution of $D_e$ has two peaks and is treated as a GMM. Otherwise, the distribution of $D_e$ has only one peak and is treated as a Gaussian distribution.
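The two steps above, fitting a two-component mixture with EM and counting curvature sign changes, can be sketched with NumPy. This is a simplified illustration under stated assumptions: a fixed iteration count, a percentile-based initialisation, numerical derivatives via `np.gradient`, and a relative tolerance that discards near-zero curvature in flat tails. All function names are hypothetical.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    return np.exp(-(x - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def fit_two_gaussians(data, iterations=200):
    """EM for a two-component 1D mixture: returns weights, means, variances."""
    data = np.asarray(data, dtype=float)
    means = np.percentile(data, [25, 75])      # crude but deterministic start
    variances = np.full(2, data.var() + 1e-12)
    weights = np.full(2, 0.5)
    for _ in range(iterations):
        # E-step: posterior responsibility of each component for each sample.
        dens = weights * gaussian_pdf(data[:, None], means, variances)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted re-estimation of the parameters.
        total = resp.sum(axis=0)
        weights = total / data.size
        means = (resp * data[:, None]).sum(axis=0) / total
        variances = (resp * (data[:, None] - means) ** 2).sum(axis=0) / total + 1e-12
    return weights, means, variances

def mixture_pdf(grid, weights, means, variances):
    return (weights * gaussian_pdf(grid[:, None], means, variances)).sum(axis=1)

def curvature_sign_changes(pdf_values, grid, rel_tol=1e-6):
    """Count sign changes of the signed curvature f'' / (1 + f'^2)^(3/2)."""
    d1 = np.gradient(pdf_values, grid)
    d2 = np.gradient(d1, grid)
    kappa = d2 / (1.0 + d1 ** 2) ** 1.5
    # Drop near-zero values so numerical noise in flat tails is not counted.
    signs = np.sign(kappa[np.abs(kappa) > rel_tol * np.abs(kappa).max()])
    return int(np.count_nonzero(signs[1:] != signs[:-1]))

grid = np.linspace(0.0, 1.0, 1001)
one_peak = gaussian_pdf(grid, 0.5, 0.1 ** 2)
samples = np.concatenate([np.linspace(0.25, 0.35, 101),
                          np.linspace(0.65, 0.75, 101)])   # two toy clusters
weights, means, variances = fit_two_gaussians(samples)
two_peaks = mixture_pdf(grid, weights, means, variances)
changes_one = curvature_sign_changes(one_peak, grid)
changes_two = curvature_sign_changes(two_peaks, grid)
```

On these toy inputs the single Gaussian yields two sign changes and the well-separated mixture yields four, so a rule of the form "more than three changes" separates the two cases. When the two components overlap heavily, the fitted density can become unimodal and the count drops back to two.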

In the same way, we can get $F_o$ from $D_o$. Ultimately, we choose the appropriate feature as the tampering probability map through $F_e$ and $F_o$. When $F_e = 1$ and $F_o = 0$, $D_e$ is used; when $F_e = 0$ and $F_o = 1$, $D_o$ is used; when $F_e = 1$ and $F_o = 1$, both $D_e$ and $D_o$ can be used, and one of them is chosen empirically.
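The selection rule can be written as a small function. This is a sketch only: the function and argument names are hypothetical, the `prefer` argument stands in for the empirical choice when both flags are set, and returning `None` when neither flag is set is an assumption, since that case is not described in the text.

```python
def select_feature(flag_even, flag_odd, density_even, density_odd, prefer="even"):
    """Pick the density map whose distribution looked bimodal.

    Returns None when neither flag is set; treating that case as
    "no usable feature" is an assumption of this sketch.
    """
    if flag_even and flag_odd:
        # Both usable: `prefer` is a placeholder for the empirical choice.
        return density_even if prefer == "even" else density_odd
    if flag_even:
        return density_even
    if flag_odd:
        return density_odd
    return None

chosen = select_feature(0, 1, "even-map", "odd-map")
```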

4. Experiment Evaluation

In this section, we conduct experiments to evaluate the performance of the proposed method. The experimental evaluation uses the Columbia Uncompressed Image Splicing Detection Evaluation Dataset (Columbia dataset [33]) and the Realistic Tampering Dataset (Korus dataset [34]). The Columbia dataset was acquired using four cameras (Canon G3, Nikon D70, Canon 350D Rebel XT, and Kodak DCS 330), and 15% of the images were taken outdoors. Images captured by two cameras were spliced to obtain 30 tampered images per camera pair, and the six camera combinations yield 180 spliced images in total. The sizes of these forgery images range from $757 \times 568$ to $1152 \times 768$, and the number of pixels in the tampered region is relatively large. The Korus dataset contains 220 realistic forgeries created by hand in modern photo-editing software (GIMP and Affinity Photo) and covers various challenging tampering scenarios involving both object insertion and removal. The original images were captured by four different cameras (Sony alpha57, Canon 60D, Nikon D7000, and Nikon D90), and the final forgery images are $1920 \times 1080$ px. The images in both datasets undergo a single manipulation without any postprocessing and are saved in uncompressed TIFF format, which helps preserve the CFA features. We only considered reference methods that do not require training or other prior information, including CFA1 [27], CFA2, CFA3 [35], BLK [36], CAGI [37], NOI1 [38], and NOI5 [39]. For more details of the reference methods and source codes, please refer to the study of Zampoglou et al. [40].

4.1. Performance Criteria

Forgery localization can be regarded as a special segmentation task that classifies each pixel as original (background) or tampered (foreground). Among the various evaluation criteria for segmentation tasks, mean intersection over union (MIoU) is the standard and most frequently used one [41]. It is the ratio between the intersection and the union of two sets, averaged over the two classes, defined as follows:

$$\text{MIoU} = \frac{1}{2}\left(\frac{TP}{TP + FP + FN} + \frac{TN}{TN + FN + FP}\right),$$

where TP, TN, FN, and FP are the numbers of observed true positives, true negatives, false negatives, and false positives, respectively.

Another important criterion is the mean pixel accuracy (MPA), in which the ratio of correct pixels is computed on a per-class basis and then averaged over the total number of classes:

$$\text{MPA} = \frac{1}{2}\left(\frac{TP}{TP + FN} + \frac{TN}{TN + FP}\right).$$

Finally, we evaluate the performance with the Matthews correlation coefficient (MCC), the cross-correlation coefficient between the decision map and the ground truth, defined as follows:

$$\text{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}.$$

The MCC is robust to unbalanced classes. For some forgery images on the Korus dataset, the tampered region is much smaller than the original one, making it more appropriate to evaluate the performance of various methods with MCC.
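The three criteria can be computed from the four confusion counts. The sketch below uses the standard two-class definitions. The helper names are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here for illustration.

```python
import math

def confusion(pred, truth):
    """Count TP, TN, FP, FN for two equally long sequences of 0/1 labels."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    tn = sum(1 for p, t in zip(pred, truth) if not p and not t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    return tp, tn, fp, fn

def safe_div(num, den):
    return num / den if den else 0.0   # convention chosen here for empty classes

def miou(tp, tn, fp, fn):
    return 0.5 * (safe_div(tp, tp + fp + fn) + safe_div(tn, tn + fn + fp))

def mpa(tp, tn, fp, fn):
    return 0.5 * (safe_div(tp, tp + fn) + safe_div(tn, tn + fp))

def mcc(tp, tn, fp, fn):
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return safe_div(tp * tn - fp * fn, den)

# Toy example: four pixels, one false alarm.
counts = confusion([1, 1, 0, 0], [1, 0, 0, 0])
```

For this toy case the counts are TP = 1, TN = 2, FP = 1, FN = 0, which gives MIoU = 7/12, MPA = 5/6, and MCC = 1/sqrt(3).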

Since the criteria operate on binary maps, and most methods only produce heatmaps with continuous values, a threshold is needed to convert these heatmaps to the corresponding binary maps. However, a single thresholding algorithm would bias the detection results of different methods. Therefore, the threshold that maximizes the criterion is taken. In addition, some methods only distinguish between original and tampered regions, and thus the output heatmap may have inverted polarity with respect to the ground truth. Consequently, we consider both the original and the inverted ground truth images and keep the better result.
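The evaluation protocol above can be sketched as a search over thresholds and both polarities. This is an illustration only: the function names are hypothetical, the heatmap is treated as a flat list, and plain pixel accuracy stands in for whichever criterion is being maximized.

```python
def best_score(heatmap, truth, criterion, thresholds):
    """Maximum of `criterion` over all thresholds and both polarities."""
    inverted = [1 - t for t in truth]
    best = float("-inf")
    for thr in thresholds:
        pred = [1 if h > thr else 0 for h in heatmap]
        best = max(best, criterion(pred, truth), criterion(pred, inverted))
    return best

def pixel_accuracy(pred, truth):
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

heat = [0.9, 0.8, 0.2, 0.1]      # flattened toy heatmap with inverted polarity
truth = [0, 0, 1, 1]
score = best_score(heat, truth, pixel_accuracy, thresholds=[0.05, 0.5, 0.95])
```

Because the maximum is taken per image and per method, the resulting scores are upper bounds on what a fixed thresholding rule would achieve, which keeps the comparison between methods independent of any particular binarization scheme.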

Most previous work has averaged the criterion scores over all test images to evaluate method performance on the dataset, such as the average MIoU score. However, this gives only a general survey of the results on the dataset. For the sake of completeness, we propose the efficiency ratio $E_T$ based on the per-image scores on the dataset:

$$E_T = \frac{n_T}{n} \times 100\%,$$

where $n$ is the total number of test images in the experiment and $n_T$ is the number of results greater than the valid threshold $T$. Therefore, we can regard the results with scores above $T$ as valid and evaluate the detection results more precisely by controlling $T$.
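The efficiency ratio is simply the percentage of images whose score exceeds the valid threshold. A minimal sketch with made-up scores follows; the function name is hypothetical and a strict inequality is assumed.

```python
def efficiency_ratio(scores, threshold):
    """Percentage of per-image scores strictly above `threshold`."""
    if not scores:
        raise ValueError("scores must not be empty")
    valid = sum(1 for s in scores if s > threshold)
    return 100.0 * valid / len(scores)

scores = [0.9, 0.6, 0.4, 0.55]   # made-up per-image scores
ratio_half = efficiency_ratio(scores, 0.5)
ratio_high = efficiency_ratio(scores, 0.8)
```

Sweeping the threshold from low to high traces a non-increasing curve, which shows how the scores are distributed rather than only their mean.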

4.2. Parameter Discussion

The proposed method is affected by two parameters: the predicted window size $N_p$ and the block size $B$. Here, we assess the effect of three prediction window sizes $N_p$. Additionally, to assess the impact of $B$ on the proposed method, we evaluate the performance for five block sizes: 5, 25, 45, 65, and 85. To speed up the computation, we use the Columbia dataset, which has a lower image resolution than the Korus dataset, and measure the performance with MIoU scores and the efficiency ratio.

Figure 4 presents the MIoU scores of four forgery images when the method employs different parameters. For these four detection results, the scores become higher as the block size increases. The best results in this experiment are obtained with the largest block size, $B = 85$. It is worth noting that the improvement from $B = 65$ to $B = 85$ is small, whereas $B = 85$ increases the computational effort of the method. Therefore, we set $B$ to 65 instead of 85 in our subsequent experiments.

To evaluate the impact of the parameters in detail, we first evaluate the performance of the proposed method on the Columbia dataset when $B$ is 65 and $N_p$ takes the same values as in the previous experiment. Figure 5(a) shows the efficiency ratio at different valid thresholds $T$ for the three predicted window sizes. At each $T$, the proposed method with a predicted window size of 1 outperforms the other two sizes. For example, at the valid threshold of 0.5, the efficiency ratio reaches 76.11% when the predicted window size is 1, whereas it reaches 72.22% when the predicted window size is 3. At the valid threshold of 0.8, the efficiency ratio reaches 40.55% when the predicted window size is 1, whereas it reaches 32.77% when the predicted window size is 3. This experiment shows that the proposed method performs better with a small predicted window size. Therefore, $N_p$ in the proposed method should be set to 1.

We follow the same protocol to assess the impact of the block size $B$. In this case, we evaluate the performance of the proposed method on the Columbia dataset when $N_p$ is 1 and $B$ takes the same values as in the previous experiment. Figure 5(b) shows the efficiency ratio at different valid thresholds $T$ for the five block sizes. We can observe that the method performs poorly when the block size is 5 and performs particularly well when the block size is 65 or 85. In addition, when $B$ is larger than 65, a further increase of $B$ only slightly improves the method performance. Therefore, the recommended value of $B$ in the proposed method is 65.

4.3. Comparative Experiments

We compare the performance of the proposed method with the reference methods using three criteria on two datasets. To evaluate the comprehensive performance of all methods, we conduct extensive experiments on one dataset at a time using all three criteria. For the proposed method, $N_p$ is set to 1 and $B$ is set to 65. Table 1 shows the results with respect to MIoU, MPA, and MCC on the Columbia and Korus datasets.

We start our evaluation with a comparison of the overall performance on the two datasets. Notably, for the proposed method, the average MIoU score on the Columbia dataset is 22.67% better than that on the Korus dataset, and the average MCC score on the Columbia dataset is 70.82% better than that on the Korus dataset. In fact, the complexity of the scenarios in the Korus dataset makes the test challenging. All methods achieve much worse performance on this dataset than on the Columbia dataset. Additionally, small tampered regions in the Korus dataset affect the effectiveness of MIoU and MPA. Therefore, it is reasonable to evaluate the performance on the Korus dataset with MCC, which is robust to unbalanced classes. The Columbia dataset, with large tampered regions, can be assessed with the widely used MIoU. Regardless of the criterion, the proposed method ranks first on both datasets.

Additionally, we can readily observe that the CFA-based methods perform better than the other four methods. Experiments in Popescu and Farid’s [10] study show that the CFA-based methods perform particularly well on the Korus dataset, similar to our experimental results. In fact, experiments in Ferrara et al.’s [27] study show that the CFA-based method has a low false positive rate, with a 0% false positive rate in its simulated tampering, which is an important advantage of CFA-based methods. The images of the two datasets used in our experiments are in uncompressed TIFF format, which perfectly preserves the CFA fingerprints. Therefore, the advantages of CFA-based methods are clearly exhibited, making them outperform other methods.

To visually compare the performance of the different methods, Figure 6 shows an example heat map of the localization results. Overall, CFA1, CFA3, and the proposed method outperform the other three methods in locating tampered regions. In the first and second rows, the output of CFA1 presents some false alarms that degrade the performance of the results. Although the forgery localization of CFA3 is rough, the few false alarms make its result scores higher than that of CFA1. The CFA1 method detects detailed parts of the tampered region, but there are many false alarms. The CFA3 has few false alarms, but the results are coarse, and detail is seriously lost. The proposed method detects the details of tampered regions with few false detections.

4.4. Robust Analysis

The experiments in the previous section have demonstrated the robustness of the proposed method to complex scenarios. Next, we test the robustness of the CFA-based methods against various attacks. Since many postprocessing operations applied to the whole image completely destroy the CFA fingerprints, we consider only three attacks, namely noise addition, Gaussian filtering, and JPEG compression.

Compared to the Korus dataset, the image resolution in the Columbia dataset is lower. Therefore, this subsection uses the Columbia dataset for the experiments to reduce the computational cost. Three new datasets were generated by attacking the Columbia dataset. (1) We added the familiar Gaussian noise (20 dB) to the images to obtain the noise addition dataset. (2) The filtering operation is similar to the interpolation process, and most filtering operations, such as median filtering and mean filtering, destroy the CFA fingerprints. The Gaussian filtering dataset is obtained by Gaussian filtering with a filter size of 3 and a standard deviation of 0.29. (3) Ferrara et al. [27] tested the sensitivity of their CFA-based method to JPEG compression and found that the performance drops quickly when the quality is less than 90. Therefore, we use the “imwrite” function in MATLAB to obtain the JPEG compressed dataset with a quality factor of 90.
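The first two attacks can be sketched with NumPy. This is an illustration under stated assumptions: the 20 dB figure is interpreted as a signal-to-noise ratio relative to the mean image power, the Gaussian kernel is built as a normalised sampled Gaussian, and the function names are hypothetical. The JPEG step is omitted because it depends on the image I/O library in use.

```python
import numpy as np

def add_gaussian_noise(image, snr_db, rng):
    """Add white Gaussian noise so that signal power / noise power = snr_db."""
    image = np.asarray(image, dtype=float)
    noise_power = np.mean(image ** 2) / 10.0 ** (snr_db / 10.0)
    return image + rng.normal(0.0, np.sqrt(noise_power), image.shape)

def gaussian_kernel(size=3, sigma=0.29):
    """Normalised size x size sampled Gaussian kernel."""
    half = (size - 1) / 2.0
    coords = np.arange(size) - half
    g = np.exp(-coords ** 2 / (2.0 * sigma ** 2))
    kernel = np.outer(g, g)
    return kernel / kernel.sum()

rng = np.random.default_rng(0)
clean = rng.uniform(0.0, 255.0, (100, 100))      # toy image
noisy = add_gaussian_noise(clean, 20.0, rng)
measured_snr = 10.0 * np.log10(np.mean(clean ** 2)
                               / np.mean((noisy - clean) ** 2))
kernel = gaussian_kernel()
```

With a standard deviation of 0.29 the centre tap carries about 0.99 of the total weight, so this filter is a very mild smoothing. Even so, it mixes acquired and interpolated samples, which is what weakens the CFA trace.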

Figure 7 illustrates the efficiency ratio under various attacks. The proposed method outperforms the other CFA-based methods under the noise addition and JPEG compression attacks. Figure 6 illustrates that the proposed method gives fine localization results. Therefore, the proposed method provides high scores, but it is also sensitive to noise in the extracted feature. For the images after the Gaussian filtering attack, the proposed method yields many high and low scores and few intermediate scores, i.e., high scores when $T$ is greater than 0.6 and low scores when $T$ is between 0.5 and 0.6. CFA3 produces coarse forgery localization results and is thus less sensitive to noise in the extracted features. That is, although it obtains few high scores, it obtains many intermediate scores. Therefore, CFA3 has a high score when $T$ is between 0.5 and 0.6, but the score decreases rapidly when $T$ is greater than 0.6. For a more intuitive display, Table 2 shows the results with respect to the average MIoU score for the Columbia dataset under various attacks. On all three new datasets, the proposed method ranks first. Moreover, it is 12.31%, 18.65%, and 24.53% better than the second-best method (CFA3), margins that are much larger than the 1.55% on the original Columbia dataset. Although the performance of the CFA-based methods degrades significantly under various attacks, the proposed method retains a clear advantage over the other CFA-based methods.

5. Conclusion

In this paper, we propose a CFA-based forgery localization method. Most previous CFA-based methods assumed the interpolation algorithm is linear, which is impractical for commercial cameras. In contrast, the proposed method is based on the fact that an interpolated pixel value falls in the range of its neighboring acquired pixel values, which is valid for both linear and nonlinear interpolation algorithms. The proposed method outperforms the reference methods and is more robust to the tested attacks.

The CFA-based forgery localization method mainly considers raw images. Although these images are rarely present in daily life, they still exist in certain fields, such as copyright protection. For raw images, the CFA-based method has a low false detection rate and outperforms most methods. Therefore, the CFA-based forgery localization methods are still useful tools in practical applications. In the future, we will try to combine the CFA-based method with various other methods to make them applicable for practical applications.

Data Availability

The databases used to support the findings of this study are included within the article [33, 34]. The codes used to support the findings of this study are included within the article [40].

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the Technical Research Program of Ministry of Public Security (2020JSYJC25) and the Open Project of Key Laboratory of Forensic Science of Ministry of Justice (KF202317).