Abstract

To address the high computational complexity of block-based copy-move forgery detection, we divide the image into texture and smooth parts and process them separately. In texture regions, keypoints are extracted and matched. In smooth regions, instead of using all overlapping blocks, we use nonoverlapping blocks as candidates, and blocks with similar color are first clustered into groups as a preprocessing step. To avoid mismatches caused by misalignment, candidate blocks are updated by registration before they are projected into hash space. In this way, we reduce computational complexity and improve matching accuracy at the same time. Experimental results show that the proposed method outperforms state-of-the-art copy-move forgery detection algorithms and is robust against JPEG compression, rotation, and scaling.

1. Introduction

With the development of computer technology, more and more image editing tools, such as Photoshop and Fireworks, have become available. As a result, it is much easier for people, even nonprofessionals, to manipulate digital images. However, this also makes malicious tampering easier. If tampered images are used in court forensics, newspapers, or academic research, they can cause a crisis of social credibility. Image forgery detection has therefore become necessary. Among the many types of tampering, copy-move is one of the most common: a region of an image is copied and pasted into another place in the same image.

In the last few years, a large number of copy-move forgery detection methods have been proposed, and they fall mainly into two categories: block-based and keypoint-based. Block-based methods usually first divide the image into overlapping blocks, then extract a feature from each block, and finally find duplicated regions by matching these features. Fridrich et al. [1] proposed the first copy-move forgery detection (CMFD) method, which uses Discrete Cosine Transform (DCT) coefficients as features and lexicographic sorting for matching. However, its computational complexity is very high, so many dimensionality reduction-based algorithms have been proposed. Popescu and Farid [2] reduced the feature dimension via principal component analysis (PCA). Bashar et al. [3] further improved performance with Kernel-PCA (KPCA). Li et al. [4] combined the discrete wavelet transform (DWT) and Singular Value Decomposition (SVD) to extract block features. Kang and Wei [5] used the singular values of a reduced-rank approximation (SVD) as the block feature. Bayram et al. [6] computed the Fourier-Mellin Transform (FMT) of each block, which is highly robust to rotation and scaling. Mahdian and Saic [7] proposed a method based on blur moment invariants (Blur). Wang et al. [8] extracted Hu moments (Hu) from each block. Ryu et al. [9, 10] used Zernike moments (Zernike) as the block feature, which is robust only to rotation. Luo et al. [11] used the averages of the red, green, and blue components together with directional information of each block. Wang et al. [12] used the mean intensities of circles with different radii around the block center as a robust feature. Lin et al. [13] obtained a 9-dimensional feature vector from the average gray intensities of each block and its subblocks. Bravo-Solorio and Nandi [14] used entropy as the block feature.
Although block-based methods can locate tampered regions accurately at the pixel level, their performance may drop considerably when the suspicious image has undergone operations such as noise addition, JPEG compression, or scaling. In addition, it is hard for these algorithms to estimate an accurate geometric transformation. Moreover, the large number of overlapping blocks makes matching very expensive, and the computation time grows rapidly with image size.

Keypoint-based methods rely on keypoints extracted from the image. Huang et al. [15] proposed an algorithm based on the Scale Invariant Feature Transform (SIFT) that is robust to postprocessing operations. Amerini et al. [16] matched SIFT descriptors with the generalized 2 nearest-neighbor (g2NN) test, which allows multiple copied regions to be matched. Speeded-Up Robust Features (SURF) [17, 18] have been used to improve matching efficiency, since a SURF descriptor is half the length of a SIFT descriptor. SIFT and SURF are the most widely used keypoints for CMFD. Other local features have also been used to detect duplicated regions, such as Local Binary Patterns (LBP) [19], Binary Robust Invariant Scalable Keypoints (BRISK) [20], and DAISY [21]. Although these algorithms have much lower matching complexity, they share a major drawback: their performance is very poor when the copy-moved regions are smooth.

To overcome the shortcomings of both block-based and keypoint-based approaches, we propose a novel matching strategy for detecting copy-move forgery in digital images. We divide the image into nonoverlapping blocks and then, according to the distribution of keypoints, segment it into a smooth part and a texture part. In the texture part, the extracted keypoints are used to find duplicated regions. In the smooth part, to reduce computational complexity, blocks with similar color are clustered into a group, and duplicated blocks are searched for only within the same group. Because the blocks do not overlap, a copied region generally does not align with the block grid, so two blocks covering the same content are rarely identical, which causes mismatches. We therefore update a candidate block by registration whenever its content partly coincides with that of the query block, before projecting the blocks into hash space. Finally, the locality-sensitive hashing (LSH) algorithm groups the updated candidate blocks into hash buckets. This nonoverlapping strategy significantly reduces computational complexity while maintaining high CMFD accuracy. The rest of the paper is organized as follows. Section 2 describes the framework of the proposed method, Section 3 presents the experimental results, and Section 4 draws conclusions.

2. Nonoverlapping Blocks Based CMFD Approach

The framework of our method, shown in Figure 1, consists of three key steps: image division, feature matching, and postprocessing. In the first step, the image is divided into two parts: the texture region (yellow) and the smooth region (blue). In the second step, a different type of feature is extracted from each part and matched. In the last step, falsely matched pairs are removed, and morphological operations are applied to generate the forgery detection map.

2.1. Image Division

This section describes the image division. Given a suspicious image, we first divide it into nonoverlapping blocks and extract keypoints in each of them. There are many kinds of keypoints; we use SIFT in this paper, for reasons explained in Section 2.2. We then count the number of keypoints in each block. Keypoint detection and selection favor high-entropy regions, so most keypoints lie in texture regions. Consequently, we apply a threshold to the keypoint count to decide whether each block is a smooth block or a texture block.
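The block classification step can be sketched as follows. This is a minimal Python/NumPy illustration under stated assumptions, not the authors' MATLAB implementation: keypoint coordinates are passed in directly (in practice they would come from a SIFT detector such as VLFeat's), and both the function name `classify_blocks` and the rule "a block with at least `threshold` keypoints is a texture block" are hypothetical, since the text does not state the exact rule or threshold value.

```python
import numpy as np

def classify_blocks(keypoints_xy, width, height, block, threshold):
    """Label nonoverlapping block x block tiles as texture (True) or smooth (False).

    Hypothetical rule: a block is 'texture' if it contains at least
    `threshold` keypoints. Returns the label map and the per-block counts.
    """
    rows, cols = height // block, width // block
    counts = np.zeros((rows, cols), dtype=int)
    for x, y in keypoints_xy:
        r, c = int(y) // block, int(x) // block
        if r < rows and c < cols:  # ignore the partial strip at the right/bottom border
            counts[r, c] += 1
    return counts >= threshold, counts


# Usage with made-up keypoint coordinates on a 64 x 64 image and 16 x 16 blocks.
texture, counts = classify_blocks([(5, 5), (6, 7), (40, 3)], 64, 64, 16, threshold=2)
print(counts[0])  # [2 0 1 0]
```

Counting keypoints per block is cheap once the keypoints are available, so this division adds little cost on top of SIFT extraction itself.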

2.2. Feature Matching

According to the evaluation in [23], SIFT features and Zernike moments perform best among keypoint-based and block-based features, respectively. In our implementation, we use the VLFeat library to detect and describe the keypoints; once its parameters are set, the distribution and number of the keypoints are fixed. SIFT features can detect forged regions even after some attacks but fail in flat areas, whereas Zernike moments are rotation invariant and effective in flat areas. We therefore use SIFT features for texture blocks and Zernike moments for smooth blocks.

2.2.1. Feature Matching in Texture Region

After image division, we obtain a set of keypoints $X = \{x_1, x_2, \dots, x_n\}$ with corresponding descriptors $F = \{f_1, f_2, \dots, f_n\}$. We use g2NN matching in this step because it handles the matching of multiple copied regions well. For each keypoint, candidate matches are found by computing the Euclidean distance between its descriptor and those of all the other keypoints, which gives the vector $D = \{d_1, d_2, \dots, d_{n-1}\}$ of these distances sorted from nearest to farthest. The ratios $d_i/d_{i+1}$ are then tested in turn, and the iteration stops at the first value $k$ for which $d_k/d_{k+1} > T$, where $T$ is a ratio threshold. Each keypoint corresponding to a distance in $\{d_1, \dots, d_{k-1}\}$ is considered a match.
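The g2NN test can be sketched in Python/NumPy. This is a minimal sketch under stated assumptions: descriptors are given as plain vectors (real SIFT descriptors are 128-dimensional), all pairwise distances are computed by brute force (quadratic in the number of keypoints), and the ratio threshold of 0.5 is an assumed default because the text does not state the value of $T$. The function name `g2nn_matches` is hypothetical.

```python
import numpy as np

def g2nn_matches(descriptors, ratio=0.5):
    """Generalized 2NN (g2NN) matching of keypoint descriptors against each other.

    For each keypoint, distances to all other keypoints are sorted as
    d_1 <= d_2 <= ...; the test d_k / d_{k+1} <= ratio is iterated, and the
    neighbors whose distances passed it are reported as matches.
    Returns a sorted list of unique index pairs (i, j) with i < j.
    """
    d = np.asarray(descriptors, dtype=float)
    dist = np.linalg.norm(d[:, None, :] - d[None, :, :], axis=2)  # all pairwise distances
    matches = set()
    for i in range(len(d)):
        order = np.argsort(dist[i])
        order = order[order != i]  # drop the keypoint itself
        sorted_d = dist[i, order]
        k = 0  # number of distances that passed the ratio test
        # multiply instead of divide so that zero distances cause no division warnings
        while k < len(sorted_d) - 1 and sorted_d[k] <= ratio * sorted_d[k + 1]:
            k += 1
        for j in order[:k]:
            matches.add((min(i, int(j)), max(i, int(j))))
    return sorted(matches)


# Two near-identical descriptors (0 and 1) among three distinct ones.
desc = [[0, 0], [0.01, 0], [10, 0], [0, 10], [10, 10]]
print(g2nn_matches(desc))  # [(0, 1)]
```

Unlike the plain 2NN test, the loop keeps going past the first neighbor, so a keypoint copied several times can match all of its copies.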

2.2.2. Feature Matching in Smooth Region

Because only a few keypoints can be extracted in smooth regions, we propose a new matching strategy for them. First, blocks with similar color are clustered into a group. Second, blocks within a group are updated by registration. Finally, LSH on the block features accelerates the search for correct matches. Note that after each round of matching, registration is repeated with respect to the next block, and the same operations are executed until all blocks in the group have been traversed. The framework of feature matching in smooth regions is shown in Figure 2.

We assume that the copied source region and its pasted target region share the same color distribution. Hence, to find a pair of suspicious regions, we only need to search among blocks with similar color. For example, if the copied source region shows the ocean, its matching region must lie in a blue part of the image. Restricting the search in this way also reduces computational complexity. To remove the influence of brightness, we convert the selected smooth blocks $B_i$ from RGB to HSV color space and keep only the hue ($H$) and saturation ($S$) components. $H$ and $S$ are each uniformly quantized to 10 levels. We then compute the mean values of $H$ and $S$ in each block to obtain $\bar{H}(x, y)$ and $\bar{S}(x, y)$, where $(x, y)$ are the coordinates of the upper-left corner of the block. We use $(\bar{H}(x, y), \bar{S}(x, y))$ as the block's color feature, and blocks whose color features are very similar are assigned to the same group; since $H$ and $S$ are each quantized to 10 levels, the total number of color groups is $10 \times 10 = 100$. Finally, we keep only the nonempty color groups, which can be written as $G = \{G_1, G_2, \dots, G_M\}$, where $M$ is the number of nonempty color groups.
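A minimal sketch of the color grouping in Python/NumPy follows. It assumes an RGB image with values in [0, 1], 10 quantization levels for both hue and saturation, and that blocks are grouped by the rounded mean of their quantized hue and saturation levels; the exact grouping rule and the helper names `rgb_to_hs` and `color_groups` are assumptions. Averaging hue linearly ignores the fact that hue wraps around at red; the sketch keeps that simplification.

```python
import numpy as np

def rgb_to_hs(img):
    """Hue in [0, 1) and saturation in [0, 1] of a float RGB image in [0, 1]; V is discarded."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    v = img.max(axis=-1)
    c = v - img.min(axis=-1)  # chroma
    s = np.where(v > 0, c / np.where(v > 0, v, 1.0), 0.0)
    cs = np.where(c > 0, c, 1.0)  # avoid division by zero for gray pixels
    h6 = np.where(v == r, ((g - b) / cs) % 6.0,
                  np.where(v == g, (b - r) / cs + 2.0, (r - g) / cs + 4.0))
    h = np.where(c > 0, h6 / 6.0, 0.0)  # gray pixels get hue 0
    return h, s

def color_groups(img, corners, block, levels=10):
    """Group blocks (given by upper-left corners) by mean quantized hue and saturation."""
    h, s = rgb_to_hs(img)
    qh = np.minimum((h * levels).astype(int), levels - 1)  # uniform quantization
    qs = np.minimum((s * levels).astype(int), levels - 1)
    groups = {}
    for y, x in corners:
        mh = qh[y:y + block, x:x + block].mean()
        ms = qs[y:y + block, x:x + block].mean()
        key = (int(round(mh)), int(round(ms)))  # one of levels * levels possible groups
        groups.setdefault(key, []).append((y, x))
    return groups  # only nonempty groups appear as keys


img = np.zeros((32, 32, 3))
img[:, :16, 0] = 1.0  # left half pure red
img[:, 16:, 2] = 1.0  # right half pure blue
corners = [(y, x) for y in (0, 16) for x in (0, 16)]
print(color_groups(img, corners, 16))
# {(0, 9): [(0, 0), (16, 0)], (6, 9): [(0, 16), (16, 16)]}
```

In the pipeline only the corners of smooth blocks would be passed in, so each later search runs within one small group instead of over all blocks.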

Traditional block-based methods extract features from overlapping blocks, which is very time consuming, so we use nonoverlapping blocks in this paper. In Figure 3, the copied source region is outlined in green and the pasted target region in blue. The two blocks outlined by red solid lines are candidate blocks to be matched in the next step. If the LSH algorithm were applied to these blocks directly, mismatches would be likely because the blocks are misaligned. To solve this problem, before hashing, we calibrate the remaining blocks in a color group with respect to each block in turn.

Phase correlation is a common image registration method [24]. Let $f_1(x, y)$ be an $N \times N$ block, and let $f_2(x, y)$ be $f_1$ shifted by $(x_0, y_0)$:
$$f_2(x, y) = f_1(x - x_0, y - y_0).$$
Their Fourier transforms satisfy
$$F_2(u, v) = F_1(u, v)\, e^{-j 2\pi (u x_0 + v y_0)/N},$$
so their normalized cross power spectrum is
$$\frac{F_1^*(u, v)\, F_2(u, v)}{\left|F_1^*(u, v)\, F_2(u, v)\right|} = e^{-j 2\pi (u x_0 + v y_0)/N}.$$
As shown in Figure 4, the inverse Fourier transform of the cross power spectrum is a two-dimensional impulse function with its peak at $(x_0, y_0)$. Let $f_1$ and $f_2$ denote the two candidate blocks in Figure 3. We first compute the inverse Fourier transform of their cross power spectrum. If the peak value is larger than a threshold $T_1$ and the values at all other positions are lower than a threshold $T_2$, we consider that the two blocks satisfy the registration condition. We then use the position of the peak as a displacement vector to shift the candidate block $f_2$ into an updated block $f_2'$ that is aligned with $f_1$. In Figure 2, the updated block is marked by a red dashed line. Conversely, if the two blocks do not satisfy the registration condition, the current candidate block is not updated.
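The registration test can be sketched with NumPy's FFT. This is a minimal sketch, assuming square grayscale blocks and treating shifts as circular (as the discrete Fourier transform does); `peak_thresh` and `other_thresh` are hypothetical stand-ins for the two registration thresholds, whose values the text does not give, and `phase_correlation` is an illustrative name.

```python
import numpy as np

def phase_correlation(f1, f2, peak_thresh=0.3, other_thresh=0.1, eps=1e-12):
    """Estimate the shift d with f2(x) = f1(x - d) and check the registration condition."""
    F1 = np.fft.fft2(f1)
    F2 = np.fft.fft2(f2)
    cross = np.conj(F1) * F2
    cross /= np.abs(cross) + eps  # normalized cross power spectrum
    corr = np.real(np.fft.ifft2(cross))  # ideally an impulse at the shift
    peak_idx = np.unravel_index(np.argmax(corr), corr.shape)
    peak = corr[peak_idx]
    others = corr.copy()
    others[peak_idx] = -np.inf  # exclude the peak when checking the other positions
    ok = bool(peak > peak_thresh and others.max() < other_thresh)
    # convert circular peak indices to signed (row, column) shifts
    shift = tuple(int(i) - n if i > n // 2 else int(i) for i, n in zip(peak_idx, corr.shape))
    return ok, shift, corr


rng = np.random.default_rng(0)
f1 = rng.random((16, 16))
f2 = np.roll(f1, (3, -2), axis=(0, 1))  # shift rows by 3 and columns by -2
ok, shift, _ = phase_correlation(f1, f2)
print(ok, shift)  # True (3, -2)
```

When the condition holds, the recovered shift would be used to re-crop the candidate block from the image so that it aligns with the query block; when it fails, the candidate is left unchanged.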

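We use Zernike moments as the block feature here. The Zernike moment of order $n$ and repetition $m$ of a digital image $f(x, y)$ is defined as
$$Z_{nm} = \frac{n+1}{\pi} \sum_{x} \sum_{y} f(x, y)\, V_{nm}^*(\rho, \theta), \quad x^2 + y^2 \le 1,$$
where $V_{nm}(\rho, \theta)$ is the Zernike polynomial of order $n$ and repetition $m$:
$$V_{nm}(\rho, \theta) = R_{nm}(\rho)\, e^{j m \theta},$$
where $n \ge 0$, $|m| \le n$, $n - |m|$ is even, and $R_{nm}(\rho)$ are the real-valued radial polynomials
$$R_{nm}(\rho) = \sum_{s=0}^{(n-|m|)/2} \frac{(-1)^s (n-s)!}{s!\left(\frac{n+|m|}{2}-s\right)!\left(\frac{n-|m|}{2}-s\right)!}\,\rho^{n-2s}.$$
We fix a maximum order $n$ for the Zernike moments. The Zernike moments extracted from each block are then grouped into one feature vector per block, and the feature vectors of the blocks in the same color group form the set that is searched in the next step.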
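The Zernike feature of a block can be computed directly from the standard definitions of $Z_{nm}$, $V_{nm}$, and $R_{nm}$. This minimal sketch makes several assumptions: the block is mapped onto the unit disk inscribed in it and sampled at pixel centers, the magnitudes $|Z_{nm}|$ with $m \ge 0$ are used as features (magnitudes are rotation invariant), the maximum order `n_max=5` is a hypothetical choice because the order is not given in the text, and $Z_{00}$ is skipped, following the remark in this section that the first moment represents the average intensity. The function names are illustrative.

```python
import math
import numpy as np

def radial_poly(n, m, rho):
    """Real-valued Zernike radial polynomial R_nm(rho)."""
    m = abs(m)
    out = np.zeros_like(rho)
    for s in range((n - m) // 2 + 1):
        coef = ((-1) ** s * math.factorial(n - s)
                / (math.factorial(s)
                   * math.factorial((n + m) // 2 - s)
                   * math.factorial((n - m) // 2 - s)))
        out += coef * rho ** (n - 2 * s)
    return out

def zernike_feature(block, n_max=5):
    """Magnitudes |Z_nm| for 0 < n <= n_max, 0 <= m <= n, n - m even (Z_00 skipped)."""
    N = block.shape[0]
    coords = (np.arange(N) - (N - 1) / 2) / (N / 2)  # pixel centers mapped into [-1, 1]
    x, y = np.meshgrid(coords, coords)  # x along columns, y along rows
    rho, theta = np.hypot(x, y), np.arctan2(y, x)
    inside = rho <= 1.0  # only pixels inside the unit disk contribute
    f = np.asarray(block, dtype=float)[inside]
    feats = []
    for n in range(1, n_max + 1):  # n = 0 would give only Z_00
        for m in range(n % 2, n + 1, 2):
            v_conj = radial_poly(n, m, rho[inside]) * np.exp(-1j * m * theta[inside])
            feats.append(abs((n + 1) / math.pi * np.sum(f * v_conj)))
    return np.array(feats)


rng = np.random.default_rng(0)
b = rng.random((16, 16))
feat = zernike_feature(b)
print(len(feat))  # 11
print(np.allclose(feat, zernike_feature(np.rot90(b))))  # True
```

The 90-degree check works exactly because the sampling grid maps onto itself; for arbitrary angles the magnitudes agree only approximately, due to pixel resampling.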

Feature vectors can be regarded as data points in a high-dimensional feature space: the closer two points are, the more similar they are. The LSH algorithm projects data points from the original space into a hash space through hash functions chosen so that the probability of a hash collision between two points increases with their similarity.

Each feature vector $v$ is projected into hash space to obtain a hash value. We choose a hash function based on the $p$-stable distribution:
$$h_{a,b}(v) = \left\lfloor \frac{a \cdot v + b}{w} \right\rfloor,$$
where $\lfloor \cdot \rfloor$ denotes rounding down, $w$ is the quantization step, $b$ is a random real number in $[0, w]$, $p$ is set to 2, and $a$ is a random vector whose entries are drawn independently from the $p$-stable distribution (the Gaussian distribution for $p = 2$). To increase the collision probability of similar vectors, we use $L$ groups of hash functions to project the feature vectors, each group containing $K$ hash functions $g_i = (h_{i,1}, \dots, h_{i,K})$, $i = 1, \dots, L$. Projecting a feature vector with one group of hash functions yields a set of hash values that serves as its bucket number, so we obtain $L$ different hash index tables for similarity search. The construction of the hash tables is shown in Figure 5. After projecting a feature vector $v$ with the $L$ groups of hash functions, we obtain $L$ bucket numbers. All the feature vectors in these buckets are taken out as a candidate set, and the similarity between $v$ and each of them is calculated. Note that we do not use the first moment ($Z_{00}$), because it represents the average intensity and its value is much larger than those of the others. In our implementation, the L2-norm is used to measure the similarity between feature vectors $v_i$ and $v_j$:
$$s(v_i, v_j) = \left\| v_i - v_j \right\|_2.$$
If $s(v_i, v_j) < T_s$, where $T_s$ is a similarity threshold, the two blocks are detected as similar. We then calibrate the remaining blocks with respect to another block in this color group, until all the blocks have been traversed.
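A minimal sketch of $p$-stable LSH with $p = 2$ follows: Gaussian projections, hash $h_{a,b}(v) = \lfloor (a \cdot v + b)/w \rfloor$, $L$ tables of $K$ concatenated hashes, and an L2 distance check on the retrieved candidates. The class name `PStableLSH` and all parameter values (`num_tables`, `hashes_per_table`, `w`, `max_dist`) are hypothetical; the text does not give the values used.

```python
import numpy as np
from collections import defaultdict

class PStableLSH:
    """p-stable LSH for L2 distance (p = 2): h(v) = floor((a . v + b) / w)."""

    def __init__(self, dim, num_tables=4, hashes_per_table=3, w=4.0, seed=0):
        rng = np.random.default_rng(seed)
        self.w = w
        # Gaussian entries are 2-stable; one (K, dim) projection matrix per table
        self.a = rng.standard_normal((num_tables, hashes_per_table, dim))
        self.b = rng.uniform(0.0, w, size=(num_tables, hashes_per_table))
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.data = []

    def _bucket_keys(self, v):
        # (L, K) integer hash values; each row is the bucket number in one table
        h = np.floor((self.a @ v + self.b) / self.w).astype(int)
        return [tuple(row) for row in h]

    def insert(self, vectors):
        for v in vectors:
            v = np.asarray(v, dtype=float)
            idx = len(self.data)
            self.data.append(v)
            for table, key in zip(self.tables, self._bucket_keys(v)):
                table[key].append(idx)

    def query(self, v, max_dist):
        """Indices of stored vectors sharing a bucket with v and within max_dist (L2)."""
        v = np.asarray(v, dtype=float)
        candidates = set()
        for table, key in zip(self.tables, self._bucket_keys(v)):
            candidates.update(table.get(key, []))
        return sorted(i for i in candidates if np.linalg.norm(self.data[i] - v) < max_dist)


rng = np.random.default_rng(1)
feats = rng.random((20, 11))  # e.g. 20 blocks with 11 Zernike magnitudes each
lsh = PStableLSH(dim=11)
lsh.insert(feats)
print(lsh.query(feats[3], max_dist=0.01))  # [3]
```

More tables raise the chance that near-duplicates share at least one bucket, while more hashes per table make each bucket more selective; only the candidates that survive the bucket lookup pay for an exact distance computation.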

We take several measures to reduce the computational complexity of our algorithm. Because the image is divided into a texture part and a smooth part, we analyze their complexity separately. In the texture part, we extract SIFT features and use g2NN matching, which compares every keypoint with all the others, so the complexity is $O(n^2)$, where $n$ is the number of keypoints. In the smooth part, blocks with similar color are clustered into groups; since $H$ and $S$ are each quantized to 10 levels, there are 100 color groups. In the registration step, all the other blocks of a group are updated with reference to one query block at a time, so the complexity of this part is on average $O(m^2/100)$, where $m$ is the number of smooth blocks. Note that traditional block-based approaches search for duplicated regions among overlapping blocks, whose total number is $(W_{\mathrm{img}} - N + 1)(H_{\mathrm{img}} - N + 1)$, where $W_{\mathrm{img}}$ and $H_{\mathrm{img}}$ are the width and height of the image and $N$ is the side length of a square block; the cost of matching so many blocks is correspondingly high. Because we use nonoverlapping blocks, $m \le \lfloor W_{\mathrm{img}}/N \rfloor \lfloor H_{\mathrm{img}}/N \rfloor$ at most, and in most cases $m$ is much smaller, because texture blocks are excluded. Therefore, in the worst case, where all smooth blocks fall into one color group, the complexity of our method is $O(n^2 + m^2)$. Since the number of keypoints is usually small, our method can be much faster than traditional block-based methods on the most frequently used datasets.
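The gap between overlapping and nonoverlapping blocks is easy to quantify. The following sketch uses hypothetical example values (a 512 x 512 image, 16 x 16 blocks, 1000 smooth blocks) and the 100 color groups from the text; the grouped count assumes the blocks are spread evenly over the groups, which is an idealization.

```python
def overlapping_blocks(width, height, b):
    """Number of b x b blocks when sliding one pixel at a time."""
    return (width - b + 1) * (height - b + 1)

def nonoverlapping_blocks(width, height, b):
    """Number of b x b blocks in a nonoverlapping grid (partial border blocks dropped)."""
    return (width // b) * (height // b)

def pair_comparisons(num_blocks, num_groups=1):
    """Pairwise comparisons if blocks are spread evenly over num_groups groups (idealized)."""
    per_group = num_blocks / num_groups
    return num_groups * per_group * (per_group - 1) / 2


print(overlapping_blocks(512, 512, 16))     # 247009
print(nonoverlapping_blocks(512, 512, 16))  # 1024
print(pair_comparisons(1000))               # 499500.0
print(pair_comparisons(1000, 100))          # 4500.0
```

Under these assumptions, nonoverlapping blocks cut the block count by roughly a factor of $N^2$, and color grouping cuts the number of pairwise comparisons by roughly the number of groups.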

2.3. Postprocessing

The goal of this last step is to retain only accurate matches. We use three criteria to remove falsely matched pairs: distance, relative position, and affine transformation.

Adjacent keypoints or blocks may be mistaken for matching pairs because of their similar characteristics. Therefore, a matching pair is eliminated if the distance between its two elements is smaller than a threshold $T_d$.

Truly matching pairs are usually densely distributed, so a merging method called Breadth-First Search Neighbors (BFSN) clustering [25] is applied to the spatial locations of the matching pairs to delete isolated keypoints or blocks. In this algorithm, two points $p$ and $q$ are defined as neighbors if the distance between them is smaller than a threshold. For a cluster $C$ with $n_C$ elements, a point $p$ can be merged into $C$ if $p$ is a neighbor of at least $\alpha n_C$ elements of $C$, where the ratio factor $\alpha$, which ranges from 0 to 1, controls the size and shape of the cluster. First, we create an empty class $C$ and put the first row of the coordinate matrix $P$ into $C$. Second, we search for all neighbors of the elements of $C$ in breadth-first order and decide, according to the clustering condition, whether each visited vector can be incorporated into $C$. Third, we delete from $P$ the vectors that have been incorporated into $C$ and put the first row of the updated matrix into a new class. These three steps are repeated until $P$ is empty. Clusters with fewer than 3 elements are regarded as isolated, and their elements are deleted.
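The BFSN procedure can be sketched as follows. This is a minimal sketch, not the implementation of [25]: it assumes the merge condition "at least $\lceil \alpha\, n_C \rceil$ neighbors in $C$", Euclidean distance with a hypothetical `radius` as the neighbor threshold, and breadth-first expansion from the members already in the cluster. The minimum cluster size of 3 is taken from the text; the name `bfsn_cluster` is illustrative.

```python
import math
from collections import deque

import numpy as np

def bfsn_cluster(points, radius, alpha=0.5, min_size=3):
    """Breadth-first neighbor clustering; returns clusters (index lists) with >= min_size members."""
    pts = np.asarray(points, dtype=float)
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    neighbor = dist < radius  # symmetric neighbor relation
    unassigned = list(range(len(pts)))  # rows of the coordinate matrix still to cluster
    clusters = []
    while unassigned:
        seed = unassigned[0]  # the first remaining row starts a new class
        cluster, members = [seed], {seed}
        queue = deque([seed])
        while queue:  # breadth-first expansion
            cur = queue.popleft()
            for p in unassigned:
                if p in members or not neighbor[cur, p]:
                    continue
                links = int(sum(neighbor[p, q] for q in cluster))
                if links >= math.ceil(alpha * len(cluster)):  # assumed merge rule
                    cluster.append(p)
                    members.add(p)
                    queue.append(p)
        unassigned = [p for p in unassigned if p not in members]
        clusters.append(cluster)
    return [c for c in clusters if len(c) >= min_size]  # drop isolated clusters


pts = [(0, 0), (1, 0), (0, 1), (1, 1), (50, 50), (51, 50), (50, 51), (100, 0)]
print(bfsn_cluster(pts, radius=2.0))  # [[0, 1, 2, 3], [4, 5, 6]]
```

A larger `alpha` demands that each new member be close to a larger share of the cluster, which yields tighter, more compact clusters. The same function can cluster one-dimensional values such as matching-line angles if they are passed as an `(n, 1)` array.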

Truly matched pairs satisfy the same affine transformation, which means that they should exhibit similar translation, scaling, and rotation. The relationship between two matched blocks located at $(x, y)$ and $(x', y')$ can be written as
$$\begin{pmatrix} x' \\ y' \end{pmatrix} = A \begin{pmatrix} x \\ y \end{pmatrix} + t,$$
where $A$ is a $2 \times 2$ matrix and $t$ is a translation vector. We assume that if two matching pairs come from the same duplicated region, the slopes of their matching lines are consistent. BFSN clustering is therefore applied to the slopes of the matching lines to divide the matching pairs into clusters, and the RANSAC algorithm [26] is then applied to each cluster separately to eliminate false matches. This avoids deleting correct matches by mistake when multiple duplicated regions are present. Finally, the test image is identified as tampered if the number of remaining matches exceeds a threshold.
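The per-cluster affine check can be sketched with NumPy. In this minimal sketch, `match_angles` gives the direction of each matching line as an angle (an assumed choice that avoids infinite slopes for vertical lines), and `ransac_affine` runs RANSAC on minimal 3-point samples with a least-squares affine fit. The inlier tolerance `tol` (in pixels), the iteration count, and all function names are hypothetical, and the data in the usage example are synthetic.

```python
import numpy as np

def match_angles(src, dst):
    """Direction of each matching line src -> dst, as an angle in radians."""
    d = np.asarray(dst, dtype=float) - np.asarray(src, dtype=float)
    return np.arctan2(d[:, 1], d[:, 0])

def fit_affine(src, dst):
    """Least-squares fit of dst = A @ src + t; returns a (3, 2) matrix [A^T; t]."""
    X = np.hstack([src, np.ones((len(src), 1))])  # rows [x, y, 1]
    params, *_ = np.linalg.lstsq(X, dst, rcond=None)
    return params

def ransac_affine(src, dst, iters=200, tol=2.0, seed=0):
    """RANSAC over minimal 3-point samples; returns (inlier mask, refitted params)."""
    src, dst = np.asarray(src, dtype=float), np.asarray(dst, dtype=float)
    rng = np.random.default_rng(seed)
    X = np.hstack([src, np.ones((len(src), 1))])
    best = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(src), size=3, replace=False)
        params = fit_affine(src[idx], dst[idx])
        residual = np.linalg.norm(X @ params - dst, axis=1)  # transfer error in pixels
        inliers = residual < tol
        if inliers.sum() > best.sum():
            best = inliers
    params = fit_affine(src[best], dst[best]) if best.sum() >= 3 else None
    return best, params


rng = np.random.default_rng(1)
src = rng.uniform(0, 200, size=(30, 2))
A = np.array([[0.9, -0.2], [0.2, 0.9]])  # made-up rotation + scaling
dst = src @ A.T + np.array([40.0, 10.0])  # made-up translation
dst[25:] += 50.0  # the last 5 pairs are false matches
mask, params = ransac_affine(src, dst)
print(mask.sum())  # 25
```

Running RANSAC separately on each slope cluster means that two duplicated regions with different transformations do not compete for the same model, so neither is discarded as outliers of the other.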

3. Experimental Results

In this section, the error measures are first introduced, and experiments are then conducted on three publicly available databases: the Benchmark presented by Christlein et al. [23], CoMoFoD [27], and GRIP [28]. The first consists of 48 images of different sizes, in which the tampered regions cover about 10% of the image. The second comprises 200 original images and their corresponding tampered images, all of the same size. Its tampered images are divided into 7 groups: one without any attacks, and the rest produced by applying various attacks, including JPEG compression, blurring, noise addition, and color reduction. The last consists of 80 images of a fixed size, each containing only one pair of duplicated regions. The experiments were run on a 3.60 GHz Intel Core i7-4790 processor with MATLAB R2014b on Windows 7.

3.1. Error Measures

The performance of a CMFD method is usually evaluated at two levels: image level and pixel level. At the image level, we are concerned with whether an image has been tampered with, while at the pixel level we focus on localization accuracy. We use precision and recall. At the image level, they are defined as
$$P = \frac{T_P}{T_P + F_P}, \qquad R = \frac{T_P}{T_P + F_N},$$
where $T_P$ is the number of images correctly identified as forged, $F_P$ is the number of images erroneously detected as forged, and $F_N$ is the number of forged images that were missed. At the pixel level, they are defined as
$$P = \frac{|\Omega \cap \Omega_{GT}|}{|\Omega|}, \qquad R = \frac{|\Omega \cap \Omega_{GT}|}{|\Omega_{GT}|},$$
where $\Omega$ is the set of pixels detected by the CMFD method and $\Omega_{GT}$ is the set of all forged pixels marked in the ground truth.

We also use the $F_1$ score as a comprehensive measure:
$$F_1 = \frac{2PR}{P + R}.$$
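The image-level and pixel-level measures can be computed as follows. This is a minimal sketch: precision is $T_P/(T_P+F_P)$, recall is $T_P/(T_P+F_N)$, and $F_1 = 2PR/(P+R)$, where the pixel-level counts come from overlapping a boolean detection map with the ground-truth mask. Returning 0 for empty denominators is an assumed convention, and the function names are illustrative.

```python
import numpy as np

def prf(tp, fp, fn):
    """Precision, recall, and F1 from true-positive, false-positive, false-negative counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

def pixel_prf(pred_mask, gt_mask):
    """Pixel-level precision, recall, and F1 of a detection map against the ground truth."""
    pred = np.asarray(pred_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)
    tp = int(np.logical_and(pred, gt).sum())   # detected and forged
    fp = int(np.logical_and(pred, ~gt).sum())  # detected but not forged
    fn = int(np.logical_and(~pred, gt).sum())  # forged but missed
    return prf(tp, fp, fn)


print(pixel_prf([[1, 1], [0, 0]], [[1, 0], [1, 0]]))  # (0.5, 0.5, 0.5)
```

At the image level the counts are numbers of images rather than pixels, so the same `prf` helper serves both levels.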

3.2. Detection Results on the Benchmark Database

We first test the proposed method on the Benchmark database. Several example test cases are shown in Figure 6: the first row shows the original images, the second row the tampered images, the third row the ground truth maps, and the last row the detection results of the proposed method. Table 1 compares the proposed algorithm with two other methods in terms of precision, recall, and $F_1$ score; our method achieves 90.91% precision and 83.33% recall and performs better than SIFT [16] and Segmentation [22]. In the third column of Figure 6, the tampered area is smooth, yet our method still detects it.
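As a worked check, if the image-level $F_1$ score is computed from the reported precision and recall with $F_1 = 2PR/(P+R)$, it comes out as follows (this value is derived from the two reported figures, not read from Table 1):

$$F_1 = \frac{2 \times 0.9091 \times 0.8333}{0.9091 + 0.8333} = \frac{1.5151}{1.7424} \approx 0.8696 = 86.96\%.$$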

A practical CMFD algorithm should have relatively low computational complexity while maintaining adequate accuracy. We therefore compare the execution times of the different algorithms on this dataset; the average execution times are shown in Figure 7. Note that the evaluated methods are implemented on different platforms: for instance, Zernike is implemented in C++ for speed, whereas Segmentation and the proposed method are implemented in MATLAB. Because the images in this dataset have high resolution, all methods require considerable time. The proposed method is the fastest except for SIFT [16].

3.3. Detection Results on the CoMoFoD Database

Next, experiments are carried out on the CoMoFoD database. Four cases are illustrated in Figure 8, where the rows from top to bottom show the original images, the tampered images, the ground truth, and the detection results of the proposed method. Table 2 compares the detection results with those of other methods; it clearly shows that the proposed method outperforms [16, 22] under ideal conditions.

In addition, we evaluate the robustness of our method against attacks, which make CMFD more difficult. The forged images are generated by subjecting the copied snippets to three kinds of attacks: JPEG compression, rotation, and scaling.
(1) JPEG compression: the image quality factor varies from 20 to 100 in steps of 10. The results are shown in the first row of Figure 9. For most quality factors, the precision, recall, and $F_1$ score of our method are higher than those of [16, 22]. The red curve has a smaller slope than the others, which means that our method is more robust against JPEG compression.
(2) Rotation: the copied snippets are rotated by angles from 2° to 10° in steps of 2°. The results are shown in the second row of Figure 9; our method is clearly much better than [16, 22].
(3) Scaling: the copied snippets are scaled by factors from 91% to 109% in steps of 2%. As shown in the third row of Figure 9, our method outperforms [16, 22] in terms of precision and $F_1$ score, and its recall is almost the same as that of [16].

3.4. Detection Results on the GRIP Database

We also test the proposed method on the GRIP database; the detection results are given in Table 3. All evaluation indicators of our method are well above those of [16, 22]; in particular, each indicator is more than 11% higher than that of [16]. This is because the database contains many smooth images, and our method divides the smooth areas of an image into nonoverlapping blocks to find the duplicated regions correctly.

Our method first classifies the blocks into groups according to their color distribution. To show that this step improves efficiency, we compare against a baseline that uses registration and LSH but no color grouping. Figure 10 shows that the advantage of grouping is obvious: the computational cost is reduced by 94.67%.

Moreover, we assess the impact of registration on a subset of smooth images selected from this dataset. The results reported in Table 4 confirm the effectiveness of the registration step in matching.

4. Conclusion

In this paper, we propose a new CMFD method that addresses the high computational complexity of traditional block-based methods. First, the image is divided into two types of regions. Next, features are extracted and matched in the texture and smooth regions using SIFT and Zernike moments, respectively. In particular, we use nonoverlapping blocks as candidates in smooth areas and register them with the phase correlation algorithm before matching. Finally, the forged regions are obtained after removing falsely matched pairs and applying morphological operations. The main contributions of the proposed method are as follows. (1) Dividing the image into texture and smooth regions according to the number of keypoints both avoids the poor performance of SIFT in smooth areas and reduces computational complexity. (2) We use nonoverlapping blocks as candidates in smooth areas instead of all overlapping blocks, and apply a color-based feature to decrease the number of blocks that need to be matched. (3) To prevent mismatches caused by misalignment, we perform registration before applying the LSH algorithm, which yields more accurate candidate blocks.

Experimental results show that the proposed algorithm performs better than state-of-the-art CMFD methods. In addition, our method can handle multiple copied regions and is robust to JPEG compression, rotation, and scaling. However, registration is less effective for very smooth blocks, such as sky and walls. In future work, we will try to improve both the localization accuracy and the detection speed.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported in part by National Key Research and Development of China (2016YFB0800404), National NSF of China (61672090, 61332012), and Fundamental Research Funds for the Central Universities (2015JBZ002).