Copy-move forgery is the most prevalent technique in digital image forgery. Block-based and interest point-based methods are currently the two mainstream categories of copy-move forgery detection. However, block-based algorithms lack resistance to affine transformation attacks, and interest point-based algorithms are limited in their ability to locate the tampered region accurately. To tackle these challenges, a coarse-to-fine model (CFM) is proposed. By extracting features, calculating the affine transformation matrix, and detecting forgery regions, the localization of tampered areas proceeds from sparse to precise. Specifically, to extract the forged regions more exactly and improve the performance of the model, a two-level local search algorithm is designed in the refinement stage. In the first level, image blocks are used as search units for feature matching, and in the second level, the edge of the region is refined at the pixel level. The method maintains a good balance between the complexity and effectiveness of forgery detection, and the experimental results show that it achieves better detection than traditional interest point-based copy-move forgery detection methods. In addition, the CFM method is highly robust to postprocessing operations such as scaling, rotation, noise, and JPEG compression.

1. Introduction

With the rapid development of technology worldwide, there are many ways to obtain and process images [1]. Advances in computer technology, the Internet, and image applications have allowed individuals to tamper easily with image content. Copy-move is the most common means of image forgery, in which a copy of a region is inserted into the same image. Two examples are shown in Figure 1, where copy-move forgeries are used to enrich image content. In scenarios involving courts, news media, and so on, it is of paramount importance to determine whether an image has been tampered with. The purpose of digital image forensics is to verify the authenticity of an image.

As one of the most common means of image tampering, copy-move forgeries may be accompanied by certain postprocessing, including JPEG compression, noise addition, and blurring, to change the image content and confuse the information recipient [2]. In particular, the copied area is often geometrically transformed (rotated, scaled, etc.). Therefore, the passive forensics of copy-move tampered images faces great technical challenges and has a strong practical application value. This paper studies the corresponding passive forensic techniques for copy-move operations.

Our main contributions can be summarized as follows:
(1) This paper proposes a coarse-to-fine model for detecting forged regions by the affine transformation matrix (CFM). The localization of the forged regions from sparse to accurate is achieved.
(2) To further extract the forgery region accurately, a two-stage local search algorithm is designed in the refinement stage to better maintain the balance between complexity and effectiveness of forgery detection.
(3) The method has better detection results and higher robustness to postprocessing operations such as scaling, rotation, noise, and JPEG compression.

2. Related Work

Numerous methods for copy-move forgery detection (CMFD) have been proposed in the last decade, which are traditionally categorized into two classes: block-based and interest point-based methods.

2.1. Block-Based CMFD

In 2003, Fridrich [3] proposed the first CMFD algorithm, which divided an input image into overlapping blocks to yield similar block pairs and used the discrete cosine transform (DCT) to describe image blocks. LBP is a grey-scale texture operator which is used to describe the spatial structure of the image texture. Wang et al. [4] extracted Quaternion Exponent Moment (QEM) moduli from each overlapped circular color block. The main limitation of this method is its higher computational complexity, which can be reduced by applying superpixel theory. Chen et al. [5] proposed a scheme to detect copy-move regions through invariant features extracted from each block, in which each block was compared only with other blocks having close mean and variance features. Mahmood et al. [6] divided the approximation sub-band of the shift invariant stationary wavelet transform into overlapping blocks. Distinct features extracted from the overlapping blocks were used to expose tampered regions in digital images. The features of these algorithms can be classified as follows: invariant moments, dimension reduction, textural features, and polar transform. Matching techniques include lexicographic sorting and Euclidean distance [7]. However, most algorithms based on image blocks do not perform well in resisting affine transformation attacks.

2.2. Interest Point-Based CMFD

Different from block-based algorithms, interest point-based CMFD algorithms are more robust against affine transformations. Rather than dividing an image into blocks, these methods extract interest points from the image, and image features are then extracted around the interest points. He et al. [8] used PCA on the feature vector to reduce computational complexity. Mohamadian and Pouyan [9] combined SIFT and Zernike moments to reduce the risk of missing tampered regions in flat areas. Pun et al. [10] proposed a novel CMFD scheme using adaptive oversegmentation and feature point matching, which integrates block-based and interest point-based forgery detection methods. Pandey et al. [11] proposed a fast and effective copy-move forgery detection algorithm through hierarchical feature point matching. Owing to its high stability under intermediate and postprocessing operations, the SIFT method has been widely used in CMFD. To improve on SIFT performance, Bay et al. [12] proposed the speeded-up robust features (SURF) technique. The SURF operator maintains the excellent performance of the SIFT operator while addressing its shortcomings of high computational complexity and time consumption. Bo et al. [13] proposed a CMFD technique based on SURF and extended the dimensions of Bay's descriptor to 128 to reduce false matching. Many scholars have used this technique only for interest point detection, after which local features were employed to describe each interest point, achieving satisfactory results [14, 15]. Mishra et al. [16] presented a detection method based on the combination of SURF and hierarchical agglomerative clustering (HAC). Zandi et al. [17] proposed a new interest point detector that leverages the advantages of block-based and traditional interest point-based methods and uses improved strategies to implement the algorithm.
However, because the interest points are comparatively few and scattered, interest point-based detection methods can encounter difficulties in locating a precise forged region.

The block-based CMFD algorithm and the interest point-based CMFD algorithm share a similar framework, as depicted in Figure 2 [18]:
(i) Preprocessing: its main purpose is to eliminate irrelevant information in the image and restore useful real information; the most common approach is to convert the image from RGB to grayscale.
(ii) Feature extraction: local image information is extracted from an image block or interest point and represented by a feature descriptor.
(iii) Matching: similar pairs of image blocks or points are determined during the matching process.

Most existing algorithms based on image blocks are vulnerable to attacks such as scaling, rotation, and noise addition, and interest point-based methods cannot locate the tampered region precisely. To solve these problems, a hybrid two-level method combining image blocks and interest points is proposed in this paper. We choose SIFT as the feature descriptor to represent the interest points. Then, the adaptive oversegmentation method is used to improve the matching process and calculate the affine transformation matrix. Finally, the proposed local search algorithm is applied at the image block level and the pixel level, respectively, to locate the tampered region accurately.

3. Proposed Detection Algorithm

In this paper, an accurate CMFD method based on interest point and local search algorithm is proposed. The process is illustrated in Figure 3.

The main flow of the proposed algorithm is as follows: (1) feature extraction: interest points are detected in the input image and represented by a feature descriptor, after which accurate interest point matches are obtained via a matching process; (2) affine transformation calculation: a random verification algorithm is used to calculate the affine transformation matrix; (3) forgery region extraction: the local search algorithm is applied at the image block level and the pixel level. The image block level locates the tampered region, and the pixel level refines the boundary of the tampered region.

The image-level and pixel-level detection results of the proposed model on the testing dataset are promising. Our main contributions are as follows:
(i) A method combining image blocks and pixels is proposed. Based on blocks, the forged region can be located, and pixels are used to refine the region boundary. This method compensates for the poor performance of extracting tampered areas with interest points alone, thereby improving detection performance.
(ii) Considering the balance between algorithm complexity and performance, a two-level local search algorithm is designed. In the first stage, the image is divided into small rectangular blocks. If an image block contains an interest point, it is marked as a forgery unit and its corresponding block is calculated by the affine transformation. The search algorithm matches the results to obtain the forgery region. In the second stage, the boundary of the forged area is extracted at the pixel level, and a second search is used to further improve the accuracy of model detection.
(iii) Four different postprocessing operations were performed on the test dataset, and the experimental results show that our model still exhibits high robustness.

In the rest of this section, we present the process of this detection algorithm as illustrated in Figure 3. The details of our proposed algorithm are reflected in the following sections: Section 3.1 presents the feature extraction and description along with image segmentation using the adaptive oversegmentation algorithm to prepare for the next matching process. Section 3.2 outlines the feature-matching process using the two nearest neighbor (2NN) algorithm [19]. And then, the affine transformation is calculated. Section 3.3 introduces the local search algorithm. In Section 3.4, two-level local search algorithm using affine transformation matrix is utilized to locate the tampered region accurately. In the first stage, the image blocks are used as search units for feature matching, and the second stage is at the pixel level to refine the edge of the region.

3.1. Feature Extraction and Adaptive Oversegmentation

The first phase of the proposed algorithm involves interest point detection and feature extraction based on SIFT features, referring to local features of an image. SIFT remains invariant to rotation, scaling, and light intensity and maintains stable robustness to changes in the viewing angle, affine transformation, and noise. The interest points and their corresponding descriptors are obtained. Based on these results, the proposed algorithm performs a matching operation to identify similar local regions.

To obtain good performance in matching and in the calculation of the affine transformation matrix, the adaptive oversegmentation method is adopted [10]. In our proposed method, the segmentation algorithm is simple linear iterative clustering (SLIC). The SLIC algorithm generates compact and nearly uniform superpixels and rates highly in terms of operation speed, object contour preservation, and superpixel shape, which is in line with the expected segmentation effect. When the SLIC segmentation method is used, computational cost and detection precision must be balanced. Therefore, the adaptive oversegmentation algorithm is adopted to define the size of superpixels adaptively according to the texture of the test images.

Next, the segmented image yields the image block set $B = \{B_1, B_2, \ldots, B_{N_B}\}$, where $N_B$ is the total number of image blocks; the interest points and feature descriptors in the ith image block are stored in $B_i$. Figure 4 depicts the relationship of the block set. Then, we find the corresponding interest point pairs via the feature-matching process.

3.2. Interest Point Matching and Affine Transformation Calculation

The 2NN algorithm utilizes the ratio of the distance to the nearest neighbor and the distance to the second nearest neighbor. To match image blocks $B_i$ and $B_j$, for any feature point $p_i^k$, where $p_i^k$ is the kth point in block $B_i$, the calculation is as follows:

$$\frac{d_1}{d_2} < T_s \tag{1}$$

where $T_s$ is the similarity threshold, $d_1$ is the distance to the closest neighbor, and $d_2$ is the distance to the second closest neighbor. The distance $d_m$ is calculated as

$$d_m = \lVert f_i^k - f_j^m \rVert \tag{2}$$

where $d_m$ denotes the distance between point $p_i^k$ and point $p_j^m$, $p_j^m$ is the mth point in $P_j$, and $f_i^k$ and $f_j^m$ are the corresponding feature descriptors.

In our experiment, $T_s$ is set to 0.2. If constraint (1) is satisfied, then the inspected interest point $p_i^k$ is matched with its nearest neighbor $p_j^m$, and $(p_i^k, p_j^m)$ denotes an interest pair.
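The ratio test in constraint (1) can be illustrated with a minimal sketch. It assumes Euclidean distance between descriptors; the function name `two_nn_match` and the toy two-dimensional descriptors are hypothetical and are not taken from the original implementation (real SIFT descriptors have 128 dimensions).

```python
import math


def two_nn_match(query_desc, candidate_descs, ratio_threshold=0.2):
    """Return the index of the nearest candidate descriptor if it passes
    the 2NN ratio test d1 / d2 < ratio_threshold, otherwise None.

    Assumption: descriptors are compared with Euclidean distance.
    """
    if len(candidate_descs) < 2:
        return None  # the test needs a nearest and a second nearest neighbor
    # Distance from the query to every candidate, sorted ascending.
    dists = sorted(
        (math.dist(query_desc, cand), idx)
        for idx, cand in enumerate(candidate_descs)
    )
    (d1, best_idx), (d2, _) = dists[0], dists[1]
    if d2 == 0.0:
        return None  # two identical candidates: the match is ambiguous
    return best_idx if d1 / d2 < ratio_threshold else None


# Toy usage: the first candidate is far closer than the second one.
match = two_nn_match([0.0, 0.0], [[0.1, 0.0], [5.0, 0.0], [6.0, 0.0]])
```

A small threshold such as 0.2 accepts a match only when the nearest neighbor is much closer than the runner-up, which trades some recall for fewer false matches.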

We iterate the 2NN process over different image blocks until all blocks have been traversed, resulting in a dataset $MP = \{MP[1], MP[2], \ldots\}$, where the interest pairs between a pair of blocks $B_i$ and $B_j$ are stored in one element $MP[x]$.

Matching between image blocks avoids failed matching caused by points whose coordinates are close to each other. To further prevent match failure, if $MP[x]$ exists in MP and contains too few point pairs, the matching between the corresponding image blocks $B_i$ and $B_j$ is considered a failure and $MP[x]$ is deleted. That is, $MP[x]$ is retained only if

$$size(MP[x]) \ge T_p \tag{3}$$

where $size(MP[x])$ represents the number of point pairs in $MP[x]$ and the threshold $T_p$ is set to 3 to filter the failed pairs. Thus, most false matches are filtered out.

To better display the tampered region, the affine transformation matrix T is used to describe the relationship between the source region and the replicated regions. The traditional method of estimating the affine transformation is not suitable for the algorithm in this paper, so we propose a more efficient matrix estimation algorithm. If $MP[x]$ exists in MP, then we randomly extract three point pairs and store them in $C_{matrix}$. The affine transformation is described as follows:

$$\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = T \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \tag{4}$$

where $(x, y)$ and $(x', y')$ are the coordinates of a point pair and the affine transformation matrix T is represented as

$$T = \begin{bmatrix} a_1 & a_2 & t_x \\ a_3 & a_4 & t_y \\ 0 & 0 & 1 \end{bmatrix} \tag{5}$$

where $t_x$ and $t_y$ denote translations and $a_1$, $a_2$, $a_3$, and $a_4$ are associated with scaling and rotation. Substituting the three point pairs in $C_{matrix}$ into (4) yields the affine transformation matrix T.
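Three non-collinear point pairs determine the six unknowns a1, a2, a3, a4, tx, and ty. The following is a minimal sketch of that calculation using Cramer's rule; the helper names (`det3`, `solve3`, `estimate_affine`) and the homogeneous 3 × 3 layout of T are assumptions made for illustration, not the authors' implementation.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, b):
    """Solve the 3 x 3 system a * s = b by Cramer's rule; None if singular."""
    d = det3(a)
    if abs(d) < 1e-12:
        return None
    solution = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]  # replace one column by the right-hand side
        solution.append(det3(m) / d)
    return solution


def estimate_affine(pairs):
    """Estimate T from exactly three point pairs ((x, y), (x', y')).

    Returns rows [a1, a2, tx], [a3, a4, ty], [0, 0, 1], or None when the
    three source points are collinear.
    """
    a = [[x, y, 1.0] for (x, y), _ in pairs]
    row_x = solve3(a, [dst[0] for _, dst in pairs])  # a1, a2, tx
    row_y = solve3(a, [dst[1] for _, dst in pairs])  # a3, a4, ty
    if row_x is None or row_y is None:
        return None
    return [row_x, row_y, [0.0, 0.0, 1.0]]


# Toy usage: uniform scaling by 2 followed by a translation of (10, 5).
T = estimate_affine([((0, 0), (10, 5)), ((1, 0), (12, 5)), ((0, 1), (10, 7))])
```

In practice the three pairs would be drawn at random from a matched pair set, and a degenerate (collinear) draw would simply be repeated.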

To verify the accuracy of matrix T, all point pairs in $MP[x]$ must be tested using this matrix. For any interest point pair $(p, q)$ in $MP[x]$, the point corresponding to p, denoted p', is obtained using the following equation:

$$p' = T p \tag{6}$$

We verify the matrix accuracy based on the distance between q and p':

$$\sqrt{(x_q - x')^2 + (y_q - y')^2} < T_d \tag{7}$$

where $(x_q, y_q)$ and $(x', y')$ are the coordinates of q and p', and $T_d$ is the similarity threshold of the matrix ($T_d = 1.5$ in our experiment). Then, we obtain count, the number of correct point pairs in $MP[x]$, and compute

$$rate = \frac{count}{size(MP[x])} \tag{8}$$

where $size(MP[x])$ is the number of all point pairs in $MP[x]$. When rate is greater than 0.5, the matrix T is considered correct.
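The verification step can be sketched as follows, assuming that the distance between the transformed point and its matched point is Euclidean and that T is stored as three rows in homogeneous form. The function names are hypothetical; the thresholds 1.5 and 0.5 are the values stated in the text.

```python
import math


def apply_affine(t, point):
    """Map a point (x, y) through the affine matrix t (list of three rows)."""
    x, y = point
    return (t[0][0] * x + t[0][1] * y + t[0][2],
            t[1][0] * x + t[1][1] * y + t[1][2])


def inlier_rate(t, pairs, dist_threshold=1.5):
    """Fraction of pairs (p, q) whose transformed point T*p lies within
    dist_threshold of q."""
    if not pairs:
        return 0.0
    count = sum(
        1 for p, q in pairs
        if math.dist(apply_affine(t, p), q) < dist_threshold
    )
    return count / len(pairs)


def is_matrix_correct(t, pairs, dist_threshold=1.5, min_rate=0.5):
    """Accept the matrix when more than min_rate of the pairs agree with it."""
    return inlier_rate(t, pairs, dist_threshold) > min_rate


# Toy usage: a pure translation by (10, 5) and four pairs, one of them wrong.
T_shift = [[1.0, 0.0, 10.0], [0.0, 1.0, 5.0], [0.0, 0.0, 1.0]]
toy_pairs = [((0, 0), (10, 5)), ((1, 1), (11, 6)),
             ((2, 2), (12.5, 7)), ((3, 3), (50, 50))]
rate = inlier_rate(T_shift, toy_pairs)
```

Here three of the four pairs agree with the matrix, so the rate is 0.75 and the matrix would be accepted.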

In most cases, the source region and the replicated region are covered by many image blocks, so many affine transformation matrices can be obtained through MP. We propose an algorithm to deal with this problem. Before a matrix is calculated for any set M in MP, we examine the relationship between the point pairs in M and each existing matrix using formulas (6)–(8). If the rate is more than 0.5, the set M is already described by an existing matrix and is not calculated again. Finally, the matrix set is described as follows:

$$T_{end} = \{T_1, T_2, \ldots, T_n\} \tag{9}$$

where n is the number of retained matrices.

Next, we will display the tampered region in the search algorithm.

3.3. Local Search Algorithm

Extracting the tampered region using only the interest points results in poor performance. Considering the balance between algorithm complexity and performance, and to extract the forgery region more accurately, we propose a local search algorithm that can be applied at the image block level and the pixel level. The role of the local search algorithm is described in Figure 5, where the grid stands for the test image, the region outlined in red is the forged region, and each small blue block is a forged unit; in the first search stage the forged unit is an image block, and in the second stage it is a pixel. Details of the search algorithm are provided in the following section.

The detection unit can find its corresponding unit via the affine transformation matrix, which is key to the local search algorithm. Before executing the search algorithm, the forged units must be collated and added to the forgery region set (TR). Then, the local search algorithm is executed; the steps are shown in Algorithm 1.

Input: forgery region set (TRcnt) (block or pixel), affine transformation matrix (Tend)
Output: forgery region set (TRcnt + 1)
(1) Select a detection unit p from TRcnt and obtain its neighborhood pnei; delete the elements of pnei that have already been detected, and add pnei to the set Dnei.
(2) Remove a nondetected unit pi from Dnei and obtain its corresponding unit pi′ by T. Calculate the similarity between pi and pi′; if they match, add pi and pi′ to TRcnt + 1 and add the neighborhood of pi to Dnei. Repeat step 2 until Dnei is empty.
(3) Iterate steps 1 and 2 until all elements in TRcnt have been detected.

TRcnt is the result of the current detection, Dnei is the set of neighborhood units, and the indices (1, 2, 3, 4) denote the four angles (0°, 90°, 180°, and 270°) at which neighbors are taken. Notably, pnei may contain units that have already been detected; these must be deleted from pnei. Then, the corresponding unit pi′ is calculated by matrix T, and feature descriptors are used to measure the similarity. These descriptors are explained in detail in the following section.

The successfully matched unit pairs are added to TRcnt + 1. This operation is iterated until all elements in TRcnt have been detected. Finally, the test result TRcnt + 1 is combined with the original result TRcnt, and we obtain the final result TRcnt + 1 = TRcnt + 1 ∪ TRcnt. To illustrate the algorithm flow and demonstrate the validity of the local search algorithm, a flow chart is provided (Figure 6).
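The search can be read as a region-growing procedure over units on a grid. The sketch below is one possible interpretation under simplifying assumptions: units are (row, column) cells, a single transformation is given as a function `transform`, similarity is a caller-supplied predicate, and the queue plays the role of Dnei. All names are hypothetical and the toy image is invented for illustration.

```python
from collections import deque


def local_search(seeds, transform, is_similar, in_bounds):
    """Grow the forged set from seed units using 4-neighborhoods.

    seeds      -- iterable of (row, col) units already marked as forged
    transform  -- maps a unit to its corresponding unit (stands in for T)
    is_similar -- predicate deciding whether two units match
    in_bounds  -- predicate deciding whether a unit lies inside the image
    """
    forged = set(seeds)
    visited = set(seeds)
    queue = deque(seeds)  # units whose neighborhoods still need checking
    while queue:
        r, c = queue.popleft()
        # Neighbors at the four angles 0, 90, 180, and 270 degrees.
        for dr, dc in ((-1, 0), (0, 1), (1, 0), (0, -1)):
            unit = (r + dr, c + dc)
            if unit in visited or not in_bounds(unit):
                continue  # already detected units are skipped
            visited.add(unit)
            partner = transform(unit)
            if in_bounds(partner) and is_similar(unit, partner):
                forged.update((unit, partner))
                visited.add(partner)
                queue.append(unit)  # keep growing from the new forged unit
    return forged


# Toy image: the 2 x 2 patch at columns 0-1 is copied to columns 3-4.
image = [[1, 2, 9, 1, 2, 8],
         [3, 4, 7, 3, 4, 6],
         [10, 11, 12, 13, 14, 15]]
result = local_search(
    seeds=[(0, 0), (0, 3)],
    transform=lambda u: (u[0], u[1] + 3),
    is_similar=lambda a, b: image[a[0]][a[1]] == image[b[0]][b[1]],
    in_bounds=lambda u: 0 <= u[0] < 3 and 0 <= u[1] < 6,
)
```

Starting from a single seed pair, the search recovers both the source patch and its copy while leaving the unmatched cells untouched.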

Figure 6 presents the ordinary flow of the local search algorithm. There are only six forged units (a, b, c, a′, b′, and c′) at the beginning of the algorithm, so the forged region is not completely covered. The implementation steps of the algorithm are described in Figure 6, where the blue blocks are forged units, green tags mark units being detected, red blocks are nonforged units, and white blocks are units that have not been detected. We assume that there is only one affine transformation matrix T, and the final result is shown.

3.4. Tampered Region Localization

To balance the complexity and accuracy of the algorithm, a two-stage local search algorithm is proposed. The first stage operates at the image block level to locate the tampered region, and the second stage operates at the pixel level to refine the edge of the tampered region. The framework of the algorithm is displayed in Figure 7.

3.4.1. The First Stage

In our method, interest points in MP are extracted and stored in Pright. First, small, nonoverlapping rectangular blocks are used to cover the host image, and all image blocks are scanned. If an image block contains interest points in Pright, the block is marked as a forged unit. Then, these image blocks are added to TR0 as detection units, and the search algorithm is employed at the image block level. Corresponding image blocks are calculated by the affine transformation T. Assume that image block $B_i$ is mapped by T to the corresponding image block $B_i'$; in general, $B_i'$ is not centered on another grid block $B_j$, so the true matching image block $B_i'$ must be extracted, and a feature comparison must be executed between $B_i$ and $B_i'$. The ZNCC (zero-mean normalized cross-correlation) is calculated between $B_i$ and $B_i'$ as follows:

$$ZNCC(B_i, B_i') = \frac{\sum_{u} \left(I(u) - \bar{I}\right)\left(I'(u) - \bar{I}'\right)}{\sqrt{\sum_{u} \left(I(u) - \bar{I}\right)^2 \sum_{u} \left(I'(u) - \bar{I}'\right)^2}} \tag{10}$$

where $I(u)$ and $I'(u)$ denote the pixel intensities at location u in $B_i$ and $B_i'$, and $\bar{I}$ and $\bar{I}'$ are the average pixel intensities of $B_i$ and $B_i'$. We apply a Gaussian filter of 7 × 7 pixels with a standard deviation of 0.5 to reduce noise; the threshold (TRD) is set up to obtain similar image block pairs:

$$ZNCC(B_i, B_i') > T_{RD} \tag{11}$$

In our work, TRD is set to 0.55. Once formula (11) is satisfied, the two image blocks are considered similar, and the results of the search algorithm are stored in TR1.
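A minimal sketch of the ZNCC comparison between two equally sized blocks is shown below. It uses the standard zero-mean normalized cross-correlation definition on plain nested lists; the function name `zncc` and the toy blocks are illustrative, and the Gaussian prefiltering mentioned in the text is omitted.

```python
import math


def zncc(block_a, block_b):
    """Zero-mean normalized cross-correlation of two equally sized blocks.

    Blocks are lists of rows of pixel intensities. Returns a value in
    [-1, 1], or 0.0 when either block is constant (zero variance).
    """
    a = [v for row in block_a for v in row]
    b = [v for row in block_b for v in row]
    if not a or len(a) != len(b):
        raise ValueError("blocks must be non-empty and of equal size")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    numerator = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    denominator = math.sqrt(
        sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b)
    )
    return numerator / denominator if denominator > 0 else 0.0


# Toy usage: a brightness offset does not change the score.
score_same = zncc([[1, 2], [3, 4]], [[11, 12], [13, 14]])
score_flip = zncc([[1, 2], [3, 4]], [[4, 3], [2, 1]])
is_match = score_same > 0.55  # threshold value taken from the text
```

Because the block means are subtracted and the result is normalized, the score is insensitive to uniform brightness and contrast changes between the two blocks.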

A filtering algorithm is used to render the test results more accurate. For each forged unit in TR1, the neighbor of detection element D must be extracted, and the neighboring blocks are defined as Dnei = {d0, d1, d2, d3, d4, d5, d6, d7}. In our experiment, if the number of forged units in Dnei is less than 2, the detection element D is deleted.
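The filtering rule can be sketched as a single pass over the forged units, counting how many of the eight neighbors d0 to d7 are also forged. The function name and the representation of units as (row, column) tuples are assumptions made for illustration.

```python
def filter_isolated_units(forged, min_neighbors=2):
    """Keep only forged units with at least min_neighbors forged units
    among their eight neighbors. Counts use the unfiltered input set."""
    offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
               if (dr, dc) != (0, 0)]
    kept = set()
    for r, c in forged:
        neighbors = sum((r + dr, c + dc) in forged for dr, dc in offsets)
        if neighbors >= min_neighbors:
            kept.add((r, c))
    return kept


# Toy usage: a small cluster survives, an isolated unit is removed.
filtered = filter_isolated_units({(0, 0), (0, 1), (1, 0), (5, 5)})
```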

3.4.2. The Second Stage

It is challenging to extract the forgery region precisely at the image block level, and the algorithm does not perform well at the edge of the tampered region. Thus, the edge of TR1 is extracted, and we obtain an edge region ER0 and a center region CR1 at the image block level, where ER0 is considered inaccurate and CR1 is considered accurate. Using matrix T, the corresponding pixel is calculated for every pixel in ER0. For the obtained pixel pairs, the ZNCC algorithm is used to measure similarity, and the threshold (TRD) is set to 0.55. The matching result is saved in ER1, and the forgery region TR2 is obtained by combining the center region CR1 and the matching result ER1. To improve the edge of the forged region, the edge ER2 of TR2 is extracted at the pixel level, and ER2 is used to execute the local search algorithm. Assume that we get the unit pair (I, I′) by matrix T; the color features of I and I′ are then extracted, where R(), G(), and B() are the three color channels of the detected image unit and FI and FI' are the color features of I and I'. If features FI and FI' conform to formula (11), matching between units I and I' is successful; the required degree of similarity between I and I′ is 0.5 in our work. Results are stored in ER3.

The tampered region TR3 is obtained by combining ER3 and the center region CR2. After the filtering step, the morphological close operation is applied to TR3 to eliminate small gaps, after which the final tampered region TRend is generated. The algorithm is evaluated in the following section to demonstrate its effectiveness.

4. Experimental Results

In this section, a series of experiments are conducted to evaluate the performance of the proposed CMFD method. Section 4.1 introduces the image dataset used in our experiments and the evaluation criteria used to evaluate the performance of the proposed method. Section 4.2 shows the experimental results of the proposed algorithm. In section 4.3, the experimental results of the proposed CMFD method were finally compared with existing state-of-the-art CMFD methods under different transforms, and the results of comparative analysis were outlined.

4.1. Datasets and Evaluation Criteria

In the following experiments, a benchmark database [20] that includes realistic copy-move forgeries was used to test the proposed scheme. This image dataset includes 48 source images, each accompanied by manually prepared, semantically meaningful regions to be copied. Each image measures 3000 × 2300 pixels. Forgery regions comprise approximately 10% of each image. The copied regions belong to the categories of living, natural, artificial, and mixed textures ranging from smooth to complex. Transformed images, such as those that underwent rotation, scaling, JPEG compression, and noise addition, are also included in the image dataset.

To quantitatively evaluate the detection performance, we adopted two metrics: precision and recall. Precision is the fraction of pixels identified as forged that are truly forged, defined as the ratio of the number of correctly detected forged pixels to the total number of detected forged pixels. Recall is the fraction of forged pixels that are correctly classified, defined as the ratio of the number of correctly detected forged pixels to the number of forged pixels in the ground-truth forgery image. Precision and recall are calculated using (14) and (15), where Ω denotes the set of forged regions detected in forged images by the CMFD method at the pixel level and Ω′ denotes the forged regions in the ground truth of the forged images. We also provide the F1 score (16) as a measure that combines precision and recall in a single value.

$$Precision = \frac{|\Omega \cap \Omega'|}{|\Omega|} \tag{14}$$

$$Recall = \frac{|\Omega \cap \Omega'|}{|\Omega'|} \tag{15}$$

$$F_1 = 2 \cdot \frac{Precision \cdot Recall}{Precision + Recall} \tag{16}$$

Using these metrics, we show how precisely the CMFD algorithms identified tampered regions. To reduce the effects of random samples, the average precision and recall were computed for all images in the dataset.
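For a single image, the pixel-level metrics can be computed from two sets of pixel coordinates, as in the minimal sketch below. The function name `pixel_metrics` and the convention of returning 0.0 for empty sets are assumptions made for illustration.

```python
def pixel_metrics(detected, ground_truth):
    """Pixel-level precision, recall, and F1 score.

    detected     -- set of (row, col) pixels reported as forged
    ground_truth -- set of (row, col) pixels that are truly forged
    """
    true_positives = len(detected & ground_truth)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy usage: four detected pixels, three true pixels, two in common.
precision, recall, f1 = pixel_metrics(
    {(0, 0), (0, 1), (1, 0), (1, 1)},
    {(0, 1), (1, 1), (2, 1)},
)
```

Averaging these per-image values over the dataset gives the reported scores.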

4.2. Experimental Results of the Proposed Algorithm
4.2.1. Experimental Results on Plain Copy-Move Forgery

Plain copy-move forgery is a one-to-one copy-move operation that does not involve other transformation operations: a local area of the image is copied and pasted back into the same image, without rotation, scaling, or other processing, to generate a new tampered image. We experimented on 48 plain copy-move forgery images in total. Figure 8 displays eight detection results for plain copy-move forgery, in which the forged content is either smooth (e.g., sky), rough (e.g., rocks), or structured (typically man-made buildings). These groups can be used as categories for CMFD images. From top to bottom, the rows show the test images, the corresponding ground-truth forged regions, and the forged regions detected by the CFModel. As can be seen from the figure, the proposed model obtains fine prediction masks, even for small forged regions.

4.2.2. Experimental Results under Various Attacks

In addition to one-to-one copy-move forgery, we also experimented on various attacks to verify the effectiveness of the proposed algorithm.
(i) Scale: the tampered region is rescaled to between 91% and 109% of its original size with a step length of 2%. In total, 48 × 10 = 480 images are tested. Figure 9 displays eight copy-move forgery detection results for scaling, with the following scaling parameters: 91%, 93%, 95%, 97%, 103%, 105%, 107%, and 109%.
(ii) Rotation: the tampered region is rotated by an angle varying from 2° to 10° with a step length of 2°. In total, 48 × 5 = 240 images are tested. Figure 10 shows eight copy-move forgery detection results for rotation, covering the rotation angles 2°, 4°, 6°, 8°, and 10°.
(iii) Gaussian noise: the image intensities of the tampered region are normalized between 0 and 1, and zero-mean Gaussian noise is added with standard deviations of 0.02 to 0.10 and a step length of 0.02. In total, 48 × 5 = 240 images are tested. Figure 11 illustrates eight copy-move forgery detection results for noise, covering the standard deviations 0.02, 0.04, 0.06, 0.08, and 0.1.
(iv) JPEG compression: the forged image is JPEG compressed with quality factors varying between 100 and 20 with a step length of 10. In total, 48 × 9 = 432 images are tested. Figure 12 shows eight copy-move forgery detection results for JPEG compression, with the following quality factors (QF): 20, 30, 40, 50, 60, 70, 80, and 90.
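The image counts above follow directly from the parameter grids. The short sketch below enumerates them as a check on the arithmetic; the variable names are illustrative.

```python
NUM_SOURCE_IMAGES = 48

# Parameter grids as described in the list above.
attack_grids = {
    "scale_percent": list(range(91, 110, 2)),               # 91, 93, ..., 109
    "rotation_deg": list(range(2, 11, 2)),                  # 2, 4, ..., 10
    "noise_std": [round(0.02 * k, 2) for k in range(1, 6)], # 0.02 ... 0.10
    "jpeg_quality": list(range(100, 19, -10)),              # 100, 90, ..., 20
}

# Number of test images per attack and in total.
images_per_attack = {
    name: NUM_SOURCE_IMAGES * len(values)
    for name, values in attack_grids.items()
}
total_attacked_images = sum(images_per_attack.values())
```

The four grids contain 10, 5, 5, and 9 settings, giving 480 + 240 + 240 + 432 = 1392 attacked images.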

4.3. Comparative Analysis of Algorithms

This section compares CFModel with existing methods. The experiments use the dataset proposed in [20], which includes 1488 tampered images. Three recent methods based on SIFT [20] and SURF [20] along with iterative CMFD [17] were selected for comparison.

4.3.1. Detection Results under Plain Copy-Move Forgery

We first evaluated our algorithm under plain copy-move forgery attack. We experimented on 48 original images and 48 forged images, which are tampered by one-to-one copy-move forgery. Tables 1 and 2 present the results of the evaluation at the image level and pixel level.

As noted in Table 1, the CFModel achieved 97.82% precision and 93.75% recall, better than most state-of-the-art methods at the image level. Our scheme also achieved better performance at the pixel level. As indicated in Table 2, the CFModel achieved up to 84.58% precision and up to 97.41% recall, surpassing most state-of-the-art methods. Its F1 score is slightly lower than those of Bi [22] and Chen [23]. A possible reason is that the proposed model is based on blocks and interest points and therefore focuses more on recall (whether the forged pixels are detected completely and correctly). These results show that the proposed method is more effective than most of the compared methods. Figure 8 also provides representative results for eight examples. As shown in the figure, our proposed algorithm can accurately locate the tampered region even for small or smooth copy-move regions.

4.3.2. Detection Results under Various Attacks

To obtain a more detailed assessment of the discriminative properties of the method, the detailed copy-move forgery detection results on the 1392 tampered images under various attacks are shown in Figure 13. Figure 13 provides all quantitative results; from top to bottom: scale attack, rotation attack, Gaussian noise addition, and JPEG compression; from left to right: precision rate, recall rate, and F1 score.

As shown in the figure, the precision rate and recall rate of our scheme reach a higher level than those of the other methods. The F1 score is particularly prominent under scaling, indicating that our method provides a good balance of precision and recall. The main reason is that our method uses a two-stage local search algorithm, which not only locates the tampered region at the image block level but also refines its edge at the pixel level. Overall, our scheme performs better than most state-of-the-art methods in most cases; however, it scores very low when the noise standard deviation exceeds 0.06, and we will address this deficiency in subsequent work.

5. Conclusion

With the development of digital technology, digital images can be easily forged using image processing software. Forged images must be identified given the potential legal and other implications. In this paper, we propose a copy-move forgery detection algorithm using SIFT as the interest point and feature extraction method. The affine transformation matrix is then calculated, followed by a local search algorithm to locate the forged region. Experimental results show that the proposed scheme performs better than most state-of-the-art copy-move forgery detection algorithms and demonstrates good performance under various attacks. However, performance was poor when images contained strong noise; we will focus on this image type in later work.

Future research will mainly address the following:
(1) To address the problem that the method does not adapt well to noise addition, we plan to incorporate richer texture feature information to achieve better robustness.
(2) We will focus on detection tasks with multiple copy-move tampered regions in the same image to realize practical applications of the detection algorithm.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.


Acknowledgments

This research was supported by the National Key Research and Development Program of China (2018YFB080402 and 2018YFB0804203), Regional Joint Fund of NSFC (U19A2057), the National Natural Science Foundation of China (61672259 and 61876070), Jilin Province Science and Technology Development Plan Project (20190303134SF and 20180201064SF), CERNET Innovation Project (NGII20190802), and Undergraduate Innovation and Entrepreneurship Training Program of Jilin University (202010183389).