Abstract

Image hashing schemes have been widely used in content authentication, image retrieval, and digital forensics. This paper proposes a novel image hashing algorithm, SSL, that incorporates the most stable keypoints and local region features and is robust against various content-preserving manipulations, including multiple combinatorial manipulations. The proposed algorithm combines the scale invariant feature transform (SIFT) with saliency detection to extract the most stable keypoints. Then, the local binary pattern (LBP) feature extraction method is used to generate local region features around these keypoints. After that, the keypoint information and the local region features are merged into a hash vector. Finally, a secret key is used to randomize the hash vector, which prevents attackers from forging the image and the hash value. Experimental results demonstrate that the proposed hashing algorithm can identify visually similar images produced by both single and combinatorial content-preserving manipulations, including multiple combinations of manipulations. It can also identify maliciously forged images produced by various content-changing manipulations. The collision probability between hashes of different images is nearly zero. In addition, the evaluation of key-dependent security shows that the proposed scheme is secure in that an attacker cannot forge or estimate the correct hash value without knowledge of the secret key.

1. Introduction

Recently, due to the wide application of digital image processing technology, image data protection has raised serious concerns. To prevent images that are used to transmit information from being tampered with or forged, image hashing schemes have been proposed. Image hashing is a technique that extracts a compact, content-based signature from an image. It has been effectively applied in content authentication [1–5], digital forensics [6, 7], and image retrieval [8, 9].

Compared with traditional cryptographic hash functions, an image hash needs to resist content-preserving manipulations such as geometric transformations and common signal-processing operations. It also needs to be sensitive to content-tampering manipulations such as object removal, object replacement, and object insertion. In general, an image hashing algorithm should satisfy the following properties [10]:
(1) One-wayness: it is easy to obtain the hash value from an input image, but it should be very difficult to derive the image content from the hash value.
(2) Compactness: the size of an image hash should be far smaller than the size of the original image.
(3) Perceptual robustness: the difference between the hashes of two perceptually similar images should be small. Hence, for visually similar images, the hash function should produce similar hashes even under various content-preserving operations.
(4) Collision resistance: the hashes of perceptually different images should be different, so that different images can be easily distinguished by hash distances.
(5) Unpredictability: it should be computationally impractical to estimate an image hash without the secret key.

Currently, many existing image hashing algorithms are robust only to single content-preserving manipulations and are not effective against combinatorial manipulations. To resist various content-preserving manipulations and their multiple combinations, we propose a robust image hashing algorithm, SSL, that combines the scale invariant feature transform (SIFT) with saliency detection and local binary pattern (LBP) feature extraction. The proposed scheme extracts the most stable keypoints and texture features from an image to generate the image hash, which resists both single content-preserving manipulations and multiple combinatorial manipulations. It can also detect tampered images produced by content-changing manipulations. To the best of the authors' knowledge, this is the first time that SIFT, saliency detection, and LBP have been combined to achieve good robustness and discrimination, thereby addressing the limitations of state-of-the-art techniques. Moreover, a randomization of the hash sequence is proposed, which effectively ensures the security of the proposed scheme.

The SSL image hashing algorithm first uses SIFT and saliency detection to extract the most stable keypoints from an image. Then, it generates texture features of the local regions around those stable keypoints. The texture features are represented by LBP, which is gray-scale invariant and can be used to detect content-changing manipulations. After that, the keypoint information and the texture features are combined to generate a hash vector. Finally, to enhance the security of the image hash, the hash vector is randomized into an encrypted hash with a secret key. In the image authentication stage, the receiver first calculates the hash of the received image and matches keypoints between the received image and the original image. If some keypoints match (indicating that the two images share the same origin) while others do not, the received image is considered tampered. The proposed scheme is a nonlinear hash scheme, and the keypoints extracted from a tampered image differ from those extracted from the original image. The texture features around the matched keypoints in the two hash values are then aligned, so that the tampered region can be located by calculating the Hamming distance between the aligned texture feature parts. The performance of the proposed image hashing scheme is evaluated in terms of perceptual robustness, collision resistance between hashes of different images, sensitivity to visual content changes, key-dependent security, time cost, and hash length.

The main contributions of this paper are as follows:
(i) The proposed scheme extracts the most stable keypoints by combining SIFT with saliency detection; these keypoints are robust not only to single content-preserving manipulations but also to combinatorial manipulations, even multiple combinations of them. In addition, the extracted keypoints carry rich image information, such as the geometric information and descriptors of the keypoints and the texture features around them. This information gives the proposed hash algorithm good robustness and discrimination.
(ii) The proposed method can restore the received image to the orientation and scale of the original image according to the geometric information of the keypoints. Moreover, by incorporating the most stable keypoints and local texture features, the proposed scheme is sensitive to content-changing manipulations and can also locate tampered regions during image authentication.
(iii) By randomizing the hash sequence with a secret key, the proposed scheme ensures that an attacker cannot forge or estimate the correct hash value without knowledge of the secret key.

The rest of this paper is organized as follows. Related works are reviewed in Section 2. The scale invariant feature transform, saliency detection, and local binary pattern are described in Section 3. Section 4 elaborates the proposed image hashing scheme. Experimental results are presented in Section 5. Finally, conclusions are given in Section 6.

2. Related Work

Early image hashing schemes are based on image transforms such as the discrete cosine transform (DCT), the discrete Fourier transform (DFT), and the discrete wavelet transform (DWT). Venkatesan et al. [11] were the first to propose image hashing; they use randomized processing strategies to convert an image into a random binary sequence. In [12], the image hash is generated by using DCT-free modes to random smooth patterns with a secret key. Swaminathan et al. [13] propose an image hashing scheme based on the Fourier-Mellin transform. Tang et al. [14] propose a lexicographical framework that generates image hashes by using DCT and nonnegative matrix factorization. Lei et al. [15] apply the Radon transform and DFT to an image to calculate the magnitudes of the significant DFT coefficients; the image hashes are obtained by normalizing and quantizing these coefficients. Karsh et al. [16] propose an image hashing scheme using DWT-SVD and the spectral residual method. Tang et al. [17] divide an image into nonoverlapping blocks and apply a three-level two-dimensional DWT to each block to extract DWT statistical features.

Recently, some image hashing schemes based on invariant features have been proposed. Monga et al. [18] generate an image hash from feature points, which can resist attacks including geometric transformations and common signal processing. The scheme extracts significant features with a wavelet-based feature detection algorithm and uses an iterative procedure to obtain a set of feature points. Tang et al. [19] extract a rotation-invariant feature matrix and compress it with multidimensional scaling to generate an image hash. Tang et al. [20] construct a third-order tensor from an image and decompose the tensor to generate an image hash. Zhao et al. [21] propose an image hashing scheme that incorporates global and local features, based on the Zernike moments of the luminance/chrominance components and on salient regions, respectively. This method can be applied to image authentication and can detect malicious tampering. Karsh et al. [22, 23] also use global and local features to generate an image hash. The global features are obtained by using ring partition and projected gradient nonnegative matrix factorization [22] or statistical feature distance [23], and the local features are extracted from the salient regions of an image.

Several image hashing schemes use SIFT to extract keypoints and generate hash values, since keypoints extracted by SIFT can resist most content-preserving manipulations. Roy et al. [5] propose an image hashing algorithm to detect image tampering. The scheme extracts feature points by SIFT and compresses them into one part of the hash value; the other part is obtained by calculating the orientation of each block, which is generated by an edge detection algorithm. Battiato et al. [24] propose an image hashing algorithm that encodes the spatial distribution of feature points extracted by SIFT. The scheme uses a voting procedure in the parameter space of the model to recover the geometric transformation. Lv et al. [10] propose a SIFT-Harris detector to select stable keypoints and generate an image hash by embedding the keypoints into shape-context-based descriptors. Ouyang et al. [25] apply quaternion Zernike moments and SIFT to extract image features and generate the image hash; their feature extraction algorithm removes repeated keypoints and selects the top keypoints as the features of an image. The methods in [10, 25] do not evaluate the robustness of the algorithm under combinatorial manipulations.

Texture is also an important feature of images. Many descriptors have been proposed to represent the texture features of an image, such as the LBP [26], the center-symmetric local binary pattern (CSLBP) [27], the dual-cross pattern (DCP) [28, 29], and the transition local binary pattern (tLBP) [30]. Many image hashing algorithms based on these descriptors have been proposed in the literature. Qin et al. [31] use dual-cross pattern encoding to extract texture features and then combine salient structural features to compress the texture features into image hashes. Qin et al. [32] exploit block truncation coding and CSLBP to generate an image hash, which can be applied in image authentication and retrieval. Song et al. [33] propose a hashing algorithm for near-duplicate video retrieval based on multiple visual features, including LBP features and global features. Dubey et al. [34] use multichannel decoded LBP as image features for image retrieval. Abbas et al. [35] combine Noise Resistant LBP and Singular Value Decomposition to construct an image hashing algorithm against content-preserving manipulations; this method does not evaluate robustness under image rotation. Davarzani et al. [36] use CSLBP to extract features from each nonoverlapping block and generate a hash value from the inner products of the feature vectors; the identification accuracy of this method decreases under geometric manipulations.

Very few image hashing algorithms can resist combinatorial manipulations. Li et al. [37] propose an image hashing scheme based on random Gabor filtering and dithered lattice vector quantization, but they evaluate robustness under only a few combinatorial manipulations, such as rotation with cropping and rotation with scaling. Tang et al. [38] propose an image hashing scheme using ring partition and nonnegative matrix factorization, which can resist rotation. Only combined attacks of rotation with other content-preserving operations are evaluated, while other combinations of attacks are ignored in the experiments. Tang et al. [39] combine ring partition and invariant vector distance to generate an image hash. This method is robust to image rotation and sensitive to content changes, but it, too, is validated only on combined attacks of rotation with other content-preserving operations.

3. Preliminary

3.1. Scale Invariant Feature Transform

SIFT is an algorithm that extracts feature points of an image. These feature points are robust to geometric transformation and common signal-processing operations. The feature points extraction process of SIFT is as follows [40].

Scale-Space Extrema Detection. This step uses a difference-of-Gaussian (DOG) function to search for potential keypoints that are invariant to scale and orientation. The scale space of an image $I(x, y)$ is obtained by convolving the image with Gaussian functions of different scales $\sigma$, and is defined as
$$L(x, y, \sigma) = G(x, y, \sigma) * I(x, y),$$
where $x$ and $y$ correspond to the abscissa and ordinate of a point in the image $I$, respectively, $*$ is the convolution operation in $x$ and $y$, and
$$G(x, y, \sigma) = \frac{1}{2\pi\sigma^{2}}\, e^{-(x^{2}+y^{2})/(2\sigma^{2})}$$
is a variable-scale Gaussian function. The DOG function is defined as the difference of two nearby scales separated by a constant multiplicative factor $k$, as follows [40]:
$$D(x, y, \sigma) = \big(G(x, y, k\sigma) - G(x, y, \sigma)\big) * I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma).$$
The SIFT algorithm searches for extreme values of the DOG images across the current and adjacent scales and takes them as candidate keypoints.
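The following is a minimal NumPy sketch of one octave of the DOG scale space and the 26-neighbor extremum search described above, assuming a grayscale image given as a 2-D array. The base scale `sigma0 = 1.6`, the factor `k = √2`, the four DOG levels, the 3σ kernel truncation, and reflect padding are assumptions borrowed from common SIFT practice, not values stated in this paper; full SIFT additionally uses several downsampled octaves, sub-pixel refinement, and contrast and edge rejection.

```python
import numpy as np


def gaussian_blur(image, sigma):
    """L(x, y, sigma) = G(x, y, sigma) * I(x, y) with a separable sampled Gaussian.

    The kernel is truncated at 3*sigma and the image is reflect-padded (assumptions).
    """
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(np.asarray(image, dtype=float), radius, mode="reflect")

    def conv(v):
        return np.convolve(v, kernel, mode="valid")

    rows = np.apply_along_axis(conv, 1, padded)  # blur along x
    return np.apply_along_axis(conv, 0, rows)    # blur along y


def dog_octave(image, sigma0=1.6, k=2 ** 0.5, levels=4):
    """One octave of D(x, y, sigma) = L(x, y, k*sigma) - L(x, y, sigma)."""
    blurred = [gaussian_blur(image, sigma0 * k ** i) for i in range(levels + 1)]
    return [blurred[i + 1] - blurred[i] for i in range(levels)]


def dog_extrema(dogs):
    """Candidate keypoints (x, y, level): pixels strictly larger or smaller than
    all 26 neighbours in the current and the two adjacent DOG levels."""
    stack = np.stack(dogs)
    found = []
    for s in range(1, stack.shape[0] - 1):
        for y in range(1, stack.shape[1] - 1):
            for x in range(1, stack.shape[2] - 1):
                cube = stack[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2].ravel()
                centre, others = cube[13], np.delete(cube, 13)
                if centre > others.max() or centre < others.min():
                    found.append((x, y, s))
    return found
```

For a bright Gaussian blob centered in a 33 × 33 image, `dog_extrema(dog_octave(blob))` reports a candidate at (or next to) the blob center, while a blank image yields no candidates.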

Accurate Keypoint Localization and Orientation Assignment. In this step, a model is fitted at each candidate location to determine the location and scale of the keypoint, and unstable keypoints with low contrast are eliminated. To assign an orientation to each keypoint, the gradient orientations in the neighborhood of the keypoint are calculated and counted to form an orientation histogram. The peak of the orientation histogram gives the orientation of the corresponding keypoint.
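The orientation assignment can be sketched as follows, assuming the patch around a keypoint is already cut out as a 2-D array. The 36 bins of 10° and the magnitude-weighted votes follow Lowe's SIFT and are assumptions here; Gaussian weighting of the votes, histogram smoothing, and peak interpolation are omitted for brevity.

```python
import numpy as np


def dominant_orientation(patch, bins=36):
    """Peak of a magnitude-weighted gradient-orientation histogram, in degrees.

    Returns the centre of the winning bin in [0, 360).
    """
    p = np.asarray(patch, dtype=float)
    dx = p[1:-1, 2:] - p[1:-1, :-2]  # horizontal central difference
    dy = p[2:, 1:-1] - p[:-2, 1:-1]  # vertical central difference (rows grow downward)
    magnitude = np.hypot(dx, dy)
    angle = np.degrees(np.arctan2(dy, dx)) % 360.0
    hist, edges = np.histogram(angle, bins=bins, range=(0.0, 360.0), weights=magnitude)
    peak = int(np.argmax(hist))
    return 0.5 * (edges[peak] + edges[peak + 1])
```

A horizontal intensity ramp puts all votes in the first bin (result 5.0), and a diagonal ramp puts them in the 40° to 50° bin (result 45.0).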

Descriptor Generation. After the keypoint detection is completed, the position, scale, and orientation of each keypoint are determined. Then, the descriptors of keypoints are generated based on this geometric information.

3.2. Saliency Detection

The spectral residual approach is used to detect the salient regions that describe the nontrivial parts of an image [41]. In [41], Hou and Zhang decompose the image information into two parts based on information theory: the innovation part and the redundant part. By analyzing and processing the log spectrum of an image, the spectral residual approach estimates the innovation part and removes the statistically redundant components. Spectral residual saliency detection consists of the following steps [41].

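An input image $I$ is first transformed into the frequency domain by the Fourier Transform, whose result is denoted as $\mathcal{F}[I]$. Then, the log spectrum $\mathcal{L}(f)$ and the phase spectrum $P(f)$ of the image are obtained from $\mathcal{F}[I]$:
$$\mathcal{L}(f) = \log\big|\mathcal{F}[I](f)\big|, \qquad P(f) = \arg\big(\mathcal{F}[I](f)\big).$$
The redundant information $\mathcal{A}(f)$ is defined by [41]
$$\mathcal{A}(f) = h_n(f) * \mathcal{L}(f),$$
where $h_n(f)$ is an $n \times n$ low-pass (local averaging) kernel. The spectral residual $\mathcal{R}(f)$ can be calculated by [41]
$$\mathcal{R}(f) = \mathcal{L}(f) - \mathcal{A}(f).$$
By applying the Inverse Fourier Transform, the saliency map is constructed in the spatial domain. The value of each point of the saliency map $S(x, y)$ is calculated by [41]
$$S(x, y) = g(x, y) * \Big|\mathcal{F}^{-1}\big[\exp\big(\mathcal{R}(f) + iP(f)\big)\big]\Big|^{2},$$
where $g(x, y)$ is a Gaussian filter and $\mathcal{F}^{-1}$ is the Inverse Fourier Transform.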

After that, salient regions can be obtained by using threshold segmentation on the saliency map, which are represented by an object map. If the value of the point in the saliency map is smaller than the threshold, the value of the corresponding point on the object map is set to be 0. Otherwise, the value of the corresponding point on the object map is set to be 1. Salient regions are the regions with the value of 1 on the object map.
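The saliency steps above can be sketched in NumPy as follows. The averaging size `n = 3`, the Gaussian width `sigma = 2.5`, zero padding at the borders, and the default object-map threshold of three times the mean saliency (the choice reported by Hou and Zhang) are all assumptions, since this section does not fix them.

```python
import numpy as np


def _smooth(a, kernel):
    """Separable convolution with zero padding and 'same' output size."""
    rows = np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="same"), 1, a)
    return np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="same"), 0, rows)


def spectral_residual_saliency(image, n=3, sigma=2.5):
    """Saliency map S(x, y) following the spectral residual steps above."""
    spectrum = np.fft.fft2(np.asarray(image, dtype=float))
    log_amp = np.log(np.abs(spectrum) + 1e-12)          # log spectrum L(f)
    phase = np.angle(spectrum)                           # phase spectrum P(f)
    box = np.ones(n) / n                                 # separable n x n averaging h_n
    residual = log_amp - _smooth(log_amp, box)           # R(f) = L(f) - h_n * L(f)
    raw = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return _smooth(raw, g / g.sum())                     # Gaussian filter g(x, y)


def object_map(saliency, factor=3.0):
    """1 where S(x, y) >= factor * mean(S), otherwise 0."""
    return (saliency >= factor * saliency.mean()).astype(np.uint8)
```

The object map marks salient regions with 1, matching the thresholding rule described above; the threshold factor is a tunable parameter in this sketch.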

3.3. Local Binary Pattern

The LBP operator is a texture descriptor introduced by Ojala et al. [26]. It assigns an 8-bit binary label to each pixel of an image. To calculate the label of a pixel, the pixel is taken as the center pixel, and its gray value is compared with the gray values of its 8 neighboring pixels in clockwise (or counterclockwise) order. If a neighboring pixel value is smaller than the center pixel value, the corresponding bit of the label is set to 0; otherwise, it is set to 1. After the center pixel has been compared with all neighboring pixels, its binary label is obtained and converted to a decimal number. Therefore, there are $2^8 = 256$ possible labels for a $3 \times 3$ neighborhood (the center pixel and its 8 neighbors), and the LBP value of each pixel ranges from 0 to 255. Finally, the histogram of the labels serves as a texture descriptor.
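A vectorized sketch of the basic 3 × 3 LBP operator and its label histogram is shown below. The clockwise neighbor order starting at the top-left pixel and the most-significant-bit-first packing are assumptions (any fixed order gives a valid LBP code); border pixels, which lack a full neighborhood, are simply left at 0 here.

```python
import numpy as np

# Clockwise neighbour order starting at the top-left pixel (an assumption).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_image(gray):
    """8-bit LBP code of every interior pixel; border pixels are left at 0."""
    g = np.asarray(gray, dtype=float)
    h, w = g.shape
    codes = np.zeros((h, w), dtype=np.int64)
    centre = g[1:-1, 1:-1]
    for bit, (dy, dx) in enumerate(OFFSETS):
        neighbour = g[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Bit is 0 if the neighbour is smaller than the centre, otherwise 1;
        # the first neighbour in OFFSETS becomes the most significant bit.
        codes[1:-1, 1:-1] += (neighbour >= centre).astype(np.int64) << (7 - bit)
    return codes.astype(np.uint8)


def lbp_histogram(codes):
    """Histogram of the 256 possible labels, used as the texture descriptor."""
    return np.bincount(codes.ravel(), minlength=256)
```

A flat patch gives the label 255 (every neighbor is "not smaller"), and a patch whose only non-smaller neighbor is the top-left one gives 128.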

4. Proposed Image Hashing Scheme

The overall flow of the proposed image hashing method is shown in Figure 1. The scheme generates a robust image hash in five stages. First, a preprocessing step, which includes normalization and filtering, is applied to the original image. Second, stable feature points are extracted from the preprocessed image by combining SIFT with salient region detection. Third, the LBP algorithm is used to extract texture information from the region around each feature point. Fourth, the descriptors of the feature points and the texture features are combined and compressed into a hash sequence. Finally, the hash sequence is randomized with a secret key to enhance its security, so that the hash can be transmitted over an untrusted channel. Each stage is described in the following sections, and the authentication method is presented in Section 4.6.

4.1. Preprocess: Normalization and Filtration

Before the feature points of an image are extracted, a preprocessing operation is applied to the original image $I$. First, to ensure that every image yields a hash of the same length, $I$ is normalized to a fixed size by bilinear interpolation. Then, low-pass filtering is applied to the normalized image to suppress high-frequency noise and fine details that are irrelevant to the image content, so that the extracted features reflect the useful perceptual content. This filtering is important for the robustness of the image hash.
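The preprocessing stage can be sketched as a bilinear resize followed by a Gaussian low-pass filter on a grayscale array. The output size of 256 × 256, the Gaussian as the choice of low-pass filter, `sigma = 1.0`, the pixel-center alignment used for interpolation, and edge padding are hypothetical choices for illustration; this section does not state the actual values.

```python
import numpy as np


def bilinear_resize(img, out_h, out_w):
    """Bilinear interpolation using pixel-centre alignment (an assumption)."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    ys = np.clip((np.arange(out_h) + 0.5) * h / out_h - 0.5, 0, h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * w / out_w - 0.5, 0, w - 1)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    wy, wx = (ys - y0)[:, None], (xs - x0)[None, :]
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


def gaussian_lowpass(img, sigma):
    """Separable Gaussian low-pass filter with edge padding."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="edge")

    def conv(v):
        return np.convolve(v, kernel, mode="valid")

    return np.apply_along_axis(conv, 0, np.apply_along_axis(conv, 1, padded))


def preprocess(gray, size=256, sigma=1.0):
    """Normalize to size x size (hypothetical default), then low-pass filter."""
    return gaussian_lowpass(bilinear_resize(gray, size, size), sigma)
```

Because both steps average with weights that sum to 1, a constant image stays constant, which is a quick sanity check for any replacement filter.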

4.2. Keypoints Extraction Based on SIFT and Saliency Detection

The keypoint extraction process is summarized in Algorithm 1. First, the SIFT algorithm is applied to the preprocessed image to extract feature points. The information of these feature points consists of geometric information and descriptors. The set of geometric information of the feature points is denoted by a vector $K = \{k_1, k_2, \ldots, k_n\}$, where $k_i$ ($i = 1, 2, \ldots, n$) belongs to the $i$th feature point and $n$ is the total number of feature points. Each $k_i = (x_i, y_i, s_i, \theta_i)$ contains the location, scale, and direction of the feature point. Similarly, the set of descriptors of the feature points is denoted by a vector $D = \{d_1, d_2, \ldots, d_n\}$. Each $d_i$, which has 128 dimensions, represents the gradient distribution around the corresponding feature point. In general, the number of feature points is large, but some of them are not robust against content-preserving manipulations: unstable feature points may be lost under different content-preserving manipulations, which has a significant impact on the hash value. Therefore, the proposed scheme removes the unstable feature points and retains only the stable ones.

Input: Preprocessed image I'
Output: Geometric information K* and descriptors D* of keypoints
  // Extract feature points using SIFT [40]
  Generate DOG images with image I'
  Search for candidate feature points from DOG images
  Determine feature points and get geometric information of feature points:
    K = {k_1, k_2, ..., k_n}, where k_i = (x_i, y_i, s_i, θ_i)
  Descriptor generation: D = {d_1, d_2, ..., d_n}
  // Saliency detection [41]
  Apply the Fourier Transform to the image I': F[I']
  Obtain the spectral residual: R(f) = L(f) − h_n(f) * L(f)
  Obtain the saliency map: S(x, y) = g(x, y) * |F^{-1}[exp(R(f) + iP(f))]|^2
  // Extract stable keypoints
  for i = 1 to n do
    k_i = (x_i, y_i, s_i, θ_i, e_i), where e_i = S(x_i, y_i)
  end for
  ē = (1/n) Σ_{i=1..n} e_i
  for i = 1 to n do
    if e_i ≥ ē then
      Keep k_i
    else
      Remove k_i
    end if
  end for
  Sort the remaining keypoints in descending order of scale s_i
  Get the geometric information of the top M keypoints K* = {k*_1, ..., k*_M}
    and the corresponding descriptors D* = {d*_1, ..., d*_M}
  if the number of remaining keypoints < M then
    set the values of all elements in K* and D* to 0
  end if
  return Geometric information K* and descriptors D*

To enhance the robustness of the proposed image hashing, salient region detection [41] is used to extract the spectral residual of an image and to eliminate unstable feature points. By processing the log spectrum of an input image, the spectral residual approach constructs a saliency map that contains the salient regions of the image. The value of each point of the saliency map represents the estimation error and is denoted by $S(x, y)$. The value $S(x_i, y_i)$ is added to $k_i$ according to the position $(x_i, y_i)$ of the $i$th feature point in the image; for simplicity, $S(x_i, y_i)$ is written as $e_i$. Therefore, the geometric information of each feature point contains its location, scale, direction, and estimation error, which is represented by $k_i = (x_i, y_i, s_i, \theta_i, e_i)$. The following criterion, which is in effect a filtering operation, is applied to eliminate the unstable feature points:
$$k_i \text{ is } \begin{cases} \text{kept}, & e_i \ge \bar{e},\\ \text{removed}, & e_i < \bar{e}, \end{cases} \qquad \bar{e} = \frac{1}{n}\sum_{i=1}^{n} e_i,$$
where $\bar{e}$ is the average intensity of the keypoints. If the $e_i$ of the $i$th feature point is less than the average intensity of all keypoints, the feature point is removed; otherwise, it is kept. Because the average intensity of all keypoints is used as the threshold, not all feature points can be removed: the largest $e_i$ is never smaller than the mean.

After filtering, the remaining feature points are considered keypoints. All keypoints are then sorted in descending order of scale $s_i$. Finally, the top $M$ keypoints are selected as the most stable points, which are represented by a vector $K^* = \{k^*_1, k^*_2, \ldots, k^*_M\}$. Correspondingly, the descriptors of these keypoints are represented as $D^* = \{d^*_1, d^*_2, \ldots, d^*_M\}$. If fewer than $M$ keypoints remain, a zero-filling operation is performed to ensure that every image has the same hash length; in other words, the values of all elements in $K^*$ and $D^*$ are set to 0. Note that solid color images are not considered by the proposed method, because a solid color image contains very little information and has no value for image authentication and protection.
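The filtering and selection steps of Algorithm 1 can be sketched as follows, assuming the SIFT keypoints arrive as an (n, 4) array of (x, y, s, θ), the descriptors as an (n, 128) array, and the saliency map as a 2-D array indexed [row, column]. Rounding keypoint coordinates to the nearest pixel is an assumption. Following the text literally, all outputs are zero when fewer than M keypoints survive.

```python
import numpy as np


def select_stable_keypoints(kps, descs, saliency, M):
    """Return (K_star, D_star) with shapes (M, 5) and (M, 128).

    K_star rows are (x, y, s, theta, e) with e = S(x, y).
    """
    kps = np.asarray(kps, dtype=float)
    descs = np.asarray(descs, dtype=float)
    rows = np.clip(np.rint(kps[:, 1]).astype(int), 0, saliency.shape[0] - 1)
    cols = np.clip(np.rint(kps[:, 0]).astype(int), 0, saliency.shape[1] - 1)
    e = saliency[rows, cols]                          # e_i = S(x_i, y_i)
    k_ext = np.column_stack([kps, e])                 # k_i = (x_i, y_i, s_i, theta_i, e_i)
    keep = e >= e.mean()                              # remove k_i when e_i < mean
    k_ext, d_kept = k_ext[keep], descs[keep]
    order = np.argsort(-k_ext[:, 2], kind="stable")   # descending scale s_i
    K_star = np.zeros((M, 5))
    D_star = np.zeros((M, descs.shape[1]))
    if len(order) >= M:
        K_star[:] = k_ext[order[:M]]
        D_star[:] = d_kept[order[:M]]
    # Otherwise every element stays 0, as stated in the text.
    return K_star, D_star
```

With three keypoints whose saliency values are 0.1, 0.9, and 0.8, the first is removed (below the mean of 0.6) and the other two are returned in descending order of scale.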

Compared with using SIFT alone, the proposed keypoint extraction method, which combines SIFT with saliency detection, retains fewer keypoints. As a result, it has a higher collision probability than the SIFT-only method, but better robustness against manipulations. Moreover, if the number of selected keypoints is increased, the number of keypoints extracted using SIFT with saliency detection approaches the number extracted using SIFT alone, and the collision probability of the proposed method decreases toward that of the SIFT-based method. Therefore, in this paper, SIFT with saliency detection is used to extract fewer but more robust keypoints, as a trade-off between robustness and collision resistance. The number of keypoints is theoretically discussed in Section 5.2.

To further illustrate the stability of the extracted keypoints, the proposed keypoint extraction method is compared with the SIFT method [40], the SIFT-Harris method [10], and Ouyang's method [25]. A function $\rho(K_o, K_m)$ defined in [10] is used to evaluate the stability of the different keypoint extraction methods. It is computed from the cardinalities of keypoint sets, where $|\cdot|$ denotes the cardinality of a set, i.e., the number of distinct elements in the set, and $K_o$ and $K_m$ represent the sets of keypoints extracted from an original image and from its content-preserving manipulated version, respectively. Twenty images are selected from the uncompressed color image database (UCID) [42] as the original images, and content-preserving manipulations are applied to them to generate visually similar images. These manipulations include brightness adjustment (BA) with Photoshop's scale = 10, contrast adjustment (CA) with Photoshop's scale = 10, gamma correction (GC) with $\gamma$ = 0.75, Gaussian low-pass filtering (GF), salt and pepper noise (SPN) with variance 0.005, multiplicative noise (MN) with variance 0.005, JPEG compression (JP) with QF = 50, watermark embedding (WE) with strength = 50, scaling (SC) with factor = 0.75, and rotation (RT) with angle 5°. Then, the proposed method and the above state-of-the-art methods are used to extract the keypoints of each image. The average value of $\rho$ for each method is shown in Figure 2. The keypoints extracted by the proposed method are more stable than those of the other methods under all of these content-preserving manipulations. The average value of $\rho$ for the proposed method is approximately 1 in some cases, e.g., brightness adjustment, contrast adjustment, gamma correction, and watermark embedding.

4.3. Texture Features Extraction

Texture features are highly correlated with human visual perception, which makes them useful for feature extraction. In this work, LBP is used to describe the texture features. The texture feature extraction algorithm is shown in Algorithm 2. First, the robust keypoints extracted by SIFT and saliency detection are used to determine the local regions. Each local region is a circle centered at a keypoint with a constant radius $r$. $R_i$ denotes the $i$th local region, and $|R_i|$ denotes the number of pixels in $R_i$. Then, the LBP vector (8 bits) of each pixel is converted into a decimal number (0 to 255). If a pixel of a local region lies outside the image boundary, its LBP value is set to 0. The range from 0 to 255 is divided into 32 intervals: $[0, 7], [8, 15], \ldots, [248, 255]$. For a local region $R_i$, the number of LBP values in the $j$th interval is denoted by $c_{i,j}$, and the texture feature of the local region is denoted by $c_i = (c_{i,1}, c_{i,2}, \ldots, c_{i,32})$. Finally, the binary sequence $t_i = (t_{i,1}, t_{i,2}, \ldots, t_{i,32})$ of the local region is obtained by thresholding each count $c_{i,j}$, where $j$ ranges from 1 to 32 and the threshold is determined by $|R_i|$, the number of pixels in $R_i$. Therefore, the texture feature of the keypoints is represented by $T = \{t_1, t_2, \ldots, t_M\}$.

Input: Geometric information of keypoints K*
Output: Texture features around keypoints T
  for i = 1 to M do
    Determine the ith local region R_i according to the geometric information of the ith keypoint k*_i
    The pixels in R_i are denoted as u_1, u_2, ..., u_{|R_i|}
    // Calculate the LBP value [26]
    for m = 1 to |R_i| do
      Initialize an 8-bit vector v_m
      Determine the neighbourhood pixels of u_m, which are denoted as w_1, w_2, ..., w_8
      for l = 1 to 8 do
        if gray value of w_l < gray value of u_m then
          v_m(l) = 0
        else
          v_m(l) = 1
        end if
      end for
      Assign the vector v_m to u_m
    end for
    Convert vectors v_1, ..., v_{|R_i|} to decimal numbers
    Divide 0 to 255 into 32 intervals: [0, 7], [8, 15], ..., [248, 255]
    Count the number of vectors in each interval as c_{i,1}, c_{i,2}, ..., c_{i,32}
    for j = 1 to 32 do
      if c_{i,j} ≥ β_i then    // β_i: binarization threshold determined by |R_i|
        t_{i,j} = 1
      else
        t_{i,j} = 0
      end if
    end for
    t_i = (t_{i,1}, t_{i,2}, ..., t_{i,32})
  end for
  return Texture features T = {t_1, t_2, ..., t_M}
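The per-region texture feature of Algorithm 2 can be sketched as follows, assuming a precomputed map of per-pixel LBP codes (values 0 to 255) and an integer keypoint position. The binarization threshold is not given explicitly in this section; the sketch assumes it is the mean count per interval, |R_i| / 32, with counts at or above it mapped to 1. Both the threshold and the comparison direction are assumptions.

```python
import numpy as np


def region_texture_bits(lbp_codes, x, y, r):
    """Binary texture feature t_i of the circular region R_i centred at (x, y).

    Pixels of R_i outside the image contribute an LBP value of 0, as in the text.
    """
    h, w = lbp_codes.shape
    values = []
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if dx * dx + dy * dy > r * r:
                continue  # outside the circle of radius r
            yy, xx = y + dy, x + dx
            inside = 0 <= yy < h and 0 <= xx < w
            values.append(int(lbp_codes[yy, xx]) if inside else 0)
    # Intervals [0, 7], [8, 15], ..., [248, 255] -> counts c_{i,1..32}.
    counts = np.bincount(np.asarray(values) // 8, minlength=32)
    threshold = len(values) / 32.0  # ASSUMPTION: mean count per interval
    return (counts >= threshold).astype(np.uint8)
```

For a radius of 2 the circle contains 13 pixels; at an image corner, the 7 pixels falling outside the image all land in the first interval, which is why out-of-bound handling visibly affects the feature.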
4.4. Hash Construction

After the above steps, the keypoints and the texture features of the corresponding local regions are obtained. For convenient storage and transmission, the information about keypoints and texture features is compressed and transformed into binary sequences. The proposed hash construction algorithm is summarized in Algorithm 3. First, the position, scale, and orientation in $k^*_i$ of each keypoint are used to form $p_i = (x_i, y_i, s_i, \theta_i)$. The geometric information of the keypoints is represented as a new vector $P = \{p_1, p_2, \ldots, p_M\}$. Then, each value of $p_i$ is converted to a binary value.

Input: Geometric information K*, descriptors D* and texture features T
Output: Image hash sequence H
  for i = 1 to M do
    Extract (x_i, y_i, s_i, θ_i) from k*_i to form p_i
    Convert each value of p_i into binary values
  end for
  Get geometric information of keypoints: P = {p_1, p_2, ..., p_M}
  for i = 1 to M do
    // d'_i: 32-dimensional compressed and normalized version of d*_i
    for j = 1 to 32 do
      if d'_{i,j} ≥ τ then
        q_{i,j} = 1
      else
        q_{i,j} = 0
      end if
    end for
    q_i = (q_{i,1}, q_{i,2}, ..., q_{i,32})
  end for
  Quantized descriptors of keypoints are Q = {q_1, q_2, ..., q_M}
  Combine P, Q and T to generate the image hash H = (P, Q, T)
  return Image hash sequence H

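The original descriptor $d^*_i$ of each keypoint, which has 128 dimensions, is compressed to 32 dimensions and normalized to $d'_i = (d'_{i,1}, d'_{i,2}, \ldots, d'_{i,32})$, where $i = 1, 2, \ldots, M$. Then, each element of $d'_i$ is quantized to 1 or 0 by the following rule:
$$q_{i,j} = \begin{cases} 1, & d'_{i,j} \ge \tau,\\ 0, & d'_{i,j} < \tau, \end{cases}$$
where $d'_{i,j}$ is the $j$th element of $d'_i$, $j$ ranges from 1 to 32, and $\tau$ is a threshold. The value of $\tau$ is theoretically discussed in Section 5.2. The quantized descriptors of the keypoints are denoted by a vector $Q = \{q_1, q_2, \ldots, q_M\}$, with $q_i = (q_{i,1}, q_{i,2}, \ldots, q_{i,32})$.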
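The descriptor quantization can be sketched as below. How the 128 dimensions are compressed to 32 is not specified in this section, so the sketch uses a hypothetical scheme that sums each run of 4 consecutive bins; normalization to unit L2 norm is likewise an assumption. The threshold `tau` is left to the caller because its value is discussed in Section 5.2.

```python
import numpy as np


def quantize_descriptor(d, tau):
    """128-D SIFT descriptor -> 32 quantized bits q_i.

    HYPOTHETICAL compression: sum each run of 4 consecutive bins, then scale
    to unit L2 norm to obtain d'_i.
    """
    d = np.asarray(d, dtype=float)
    compressed = d.reshape(32, 4).sum(axis=1)
    norm = np.linalg.norm(compressed)
    d_prime = compressed / norm if norm > 0 else compressed  # zero-filled entries stay 0
    return (d_prime >= tau).astype(np.uint8)                 # q_ij = 1 iff d'_ij >= tau
```

A zero-filled descriptor (from the case with fewer than M keypoints) maps to an all-zero q_i for any positive threshold.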

Finally, the position information $P$, the quantized descriptors $Q$, and the texture features $T$ are combined to generate the image hash $H = (P, Q, T)$. The size of the normalized image is $N \times N$. As each value in $p_i$ can be represented by an 8-bit binary sequence, the length of $P$ is $4 \times 8 \times M = 32M$ bits. As the length of each $q_i$ is 32 bits, the length of $Q$ is $32M$ bits. Similarly, the length of $T$ is also $32M$ bits. Therefore, the length of the image hash is $96M$ bits. Table 1 shows the composition of an image hash.
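The hash assembly can be sketched as a concatenation of bit arrays, which also makes the length arithmetic easy to check. The mapping of each field of p_i onto 8 bits is hypothetical: x and y are assumed to lie in 0 to 255 already, the scale is rounded, and the orientation in degrees is rescaled to 0 to 255; values are clipped into that range.

```python
import numpy as np


def to_bits(value, width=8):
    """Unsigned integer -> list of `width` bits, most significant bit first."""
    v = int(value)
    return [(v >> (width - 1 - b)) & 1 for b in range(width)]


def build_hash(K_star, Q, T):
    """Concatenate P (4 x 8 bits per keypoint), Q and T (32 bits each) into H."""
    p_bits = []
    for x, y, s, theta in np.asarray(K_star, dtype=float)[:, :4]:
        # HYPOTHETICAL 8-bit mapping of (x, y, s, theta).
        for value in (x, y, s, theta / 360.0 * 255.0):
            p_bits += to_bits(np.clip(np.rint(value), 0, 255))
    return np.concatenate([
        np.asarray(p_bits, dtype=np.uint8),
        np.asarray(Q, dtype=np.uint8).ravel(),
        np.asarray(T, dtype=np.uint8).ravel(),
    ])
```

For example, with M = 5 keypoints the assembled hash has 32·5 + 32·5 + 32·5 = 480 bits.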

4.5. Randomization of Hash Sequence

In order to enhance the security of the proposed image hashing scheme, a secret key is used to randomize the hash sequence generated by the above steps. The secret key is only known to the sender and the receiver. Similar to the work of Wang et al. [1], the proposed scheme also uses a chaotic sequence to randomize a hash value. Unlike using logistic map to generate chaotic sequences in [1], in this paper, a skew tent map is used for the generation of pseudorandom numbers. The pseudorandom numbers are uniformly distributed in , which is defined by Kanso et al. [43]: where and are the initial condition and the control parameter, respectively. The is converted to a binary value by the following equation [43]: Given an initial condition and a control parameter, an infinite length chaotic binary sequence can be generated. By setting a starting index, bits binary sequence can be intercepted from this infinite sequence. Therefore, the secret key is composed of an initial condition, a control parameter and an initial index. The intercepted bits sequence is denoted as . represents the original binary hash sequence. The encrypted image hash is obtained by Since the chaotic sequence generated by the tent map obeys the uniform distribution of and the mean value of the chaotic sequence is 0.5, the statistical distribution of the encrypted hash values will also be uniformly distributed. The probability that each of the encrypted hash values is 0 (or 1) is analyzed as follows. Assume that each bit in is 0 or 1 is an independent event. The probability that is 1 is denoted by , namely, . Since the probability that each bit in a chaotic sequence is 0 or 1 is identical [1], i.e., , the probability that the th bit in is 1 is as follows [1]:This implies that, the probability that each bit in the encrypted image hash is 0 or 1 is 0.5. For the proposed scheme, there are possibilities of the encrypted image hash values, and the probability of each possibility is equal. 
All possible values of the encrypted image hash $E$ are denoted as $E_1, E_2, \ldots, E_{2^L}$, where $L$ is the hash length. The information entropy of $E$ is calculated by

$$\mathcal{H}(E) = -\sum_{j=1}^{2^L} P(E_j)\log_2 P(E_j),$$

where $P(E_j)$ is the probability that the encrypted image hash equals $E_j$. Since every possible value has the same probability $P(E_j) = 2^{-L}$, the entropy reaches its maximum value, $\mathcal{H}(E) = -2^L \cdot 2^{-L} \log_2 2^{-L} = L$ bits. This implies that the encrypted hash is sufficiently random. Therefore, the proposed method can effectively resist statistical attacks and prevents an attacker from forging or estimating the correct hash value without the knowledge of the secret key.
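The randomization step can be illustrated with a minimal Python sketch, assuming NumPy, a key made of the triple (initial condition, control parameter, starting index), a binarization rule that outputs 1 when the tent-map iterate is at least 0.5, and XOR as the combining operation. The function names and the sample key values are hypothetical and are not taken from the paper.

```python
import numpy as np

def skew_tent_bits(x0: float, p: float, start: int, length: int) -> np.ndarray:
    """Return `length` chaotic bits, beginning at iterate index `start`.

    Iterates the skew tent map x <- x/p (x <= p) or (1 - x)/(1 - p) (x > p)
    and binarizes each iterate at 0.5 (assumed rule).
    """
    if not (0.0 < x0 < 1.0 and 0.0 < p < 1.0):
        raise ValueError("x0 and p must lie in the open interval (0, 1)")
    x = x0
    bits = np.empty(length, dtype=np.uint8)
    for n in range(start + length):
        # One step of the skew tent map.
        x = x / p if x <= p else (1.0 - x) / (1.0 - p)
        if n >= start:
            # Iterates before the starting index are discarded.
            bits[n - start] = 1 if x >= 0.5 else 0
    return bits

def randomize_hash(hash_bits: np.ndarray, key: tuple) -> np.ndarray:
    """XOR the binary hash with the key stream; applying it twice restores the hash."""
    x0, p, start = key
    stream = skew_tent_bits(x0, p, start, len(hash_bits))
    return np.bitwise_xor(hash_bits.astype(np.uint8), stream)

# Usage with a random stand-in for a 960-bit hash and a hypothetical key.
rng = np.random.default_rng(0)
h = rng.integers(0, 2, size=960, dtype=np.uint8)
key = (0.3141, 0.4371, 1000)                       # hypothetical (x0, p, s)
e = randomize_hash(h, key)
assert np.array_equal(randomize_hash(e, key), h)   # decryption = re-encryption
wrong = randomize_hash(h, (0.3142, 0.4371, 1000))  # slightly different x0
print("fraction of 1s in E:", e.mean())
print("NHD between hashes under correct and wrong keys:", np.mean(e != wrong))
```

Because XOR is its own inverse, the receiver deciphers the hash with the same key stream. The last line mirrors, on a toy scale, the key-dependence experiment of Section 5.6: a tiny change in $x_0$ is amplified by the chaotic map, so the two encrypted hashes differ in roughly half of their bits.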

4.6. Authentication Method

The framework of image authentication is also shown in Figure 1. Before an image is authenticated, the encrypted image hash is deciphered with the secret key to obtain the hash $H_o$ of the original image $I_o$. The hash $H_r$ of the received image $I_r$ is calculated using the hash generation method described above. By comparing $H_o$ and $H_r$, the proposed method determines whether the received image has been maliciously modified. The authentication process consists of the following steps:
(i) Partition $H_o$ and $H_r$ into three parts each, denoted by $\{G_o, D_o, T_o\}$ and $\{G_r, D_r, T_r\}$, respectively, which hold the geometric information of the keypoints, the quantized keypoint descriptors, and the texture features around the keypoints.
(ii) For the $i$th quantized descriptor $d_o^i$ in $D_o$, the dot products of $d_o^i$ and each descriptor $d_r^j$ in $D_r$ are calculated. The two smallest dot products are denoted as $v_1$ and $v_2$, where $v_1 \le v_2$. If $v_1 < \tau_1 v_2$, the $i$th keypoint in $I_o$ matches the corresponding keypoint in $I_r$, where $\tau_1$ is a threshold.
(iii) If more than two pairs of keypoints match between the original image $I_o$ and the received image $I_r$, the differences in orientation and scale between the two images are estimated by a voting process. The received image is then restored to the orientation and scale of the original image. This operation effectively resists scaling and rotation attacks.
(iv) The number of matched keypoints between $I_o$ and $I_r$ is denoted as $N_m$, and the number of mismatched keypoints is denoted as $N_u$; whether the received image is considered tampered is decided from these two counts. For a tampered image, the texture features around the matched keypoints in the two hashes are aligned, and the tampered region is located by calculating the Hamming distance between the aligned texture feature parts: for each local region, if the Hamming distance between the aligned partial hashes is larger than $\tau_3$, that local region of the received image is considered tampered.
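Steps (ii) and (iii) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: it assumes unit-normalized descriptors and, following the common SIFT convention of Lowe's ratio test, ranks candidates by the angle $\arccos(\cdot)$ of each dot product, so the nearest neighbour has the smallest angle. The histogram bin counts and the log2 scale range of [-2, 2] are hypothetical choices, and the function names are hypothetical.

```python
import numpy as np

def match_keypoints(desc_o: np.ndarray, desc_r: np.ndarray, tau1: float = 0.5):
    """Ratio-test matching between unit-norm descriptor rows of I_o and I_r.

    Returns a list of (i, j) pairs: keypoint i of I_o matches keypoint j of I_r.
    """
    if len(desc_r) < 2:
        return []
    matches = []
    for i, d in enumerate(desc_o):
        # Angles between descriptor d and every descriptor of the received image.
        angles = np.arccos(np.clip(desc_r @ d, -1.0, 1.0))
        j1, j2 = np.argsort(angles)[:2]
        # Accept only if the best candidate is clearly better than the second.
        if angles[j1] < tau1 * angles[j2]:
            matches.append((i, int(j1)))
    return matches

def vote_rotation_scale(ori_o, scale_o, ori_r, scale_r, matches,
                        n_ori_bins: int = 36, n_scale_bins: int = 20):
    """Estimate rotation (radians) and scale factor from I_o to I_r by voting.

    Each matched pair votes for its orientation difference and log2 scale
    ratio; the centre of the most-voted 2-D histogram bin is returned.
    Returns None when fewer than three pairs match (step (iii)).
    """
    if len(matches) < 3:
        return None
    d_theta = [(ori_r[j] - ori_o[i]) % (2 * np.pi) for i, j in matches]
    log_s = [np.log2(scale_r[j] / scale_o[i]) for i, j in matches]
    # Votes outside the histogram range are ignored by histogram2d.
    hist, t_edges, s_edges = np.histogram2d(
        d_theta, log_s, bins=[n_ori_bins, n_scale_bins],
        range=[[0.0, 2 * np.pi], [-2.0, 2.0]])
    ti, si = np.unravel_index(np.argmax(hist), hist.shape)
    theta = 0.5 * (t_edges[ti] + t_edges[ti + 1])
    scale = 2.0 ** (0.5 * (s_edges[si] + s_edges[si + 1]))
    return theta, scale
```

Voting makes the estimate tolerant of a few wrong matches: an outlier adds one vote to some other bin, while consistent matches pile up in the bin of the true rotation and scale.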

5. Experimental Results

First, the experimental setup and the parameter settings are presented in Sections 5.1 and 5.2, respectively. Then, the proposed image hashing scheme is evaluated from five aspects in the subsequent sections: (1) perceptual robustness validation; (2) the collision resistance; (3) the sensitivity to visual content changes; (4) the security of key dependence; (5) the time cost comparison and the hash length comparison.

5.1. Experimental Setup

In the experiments, the UCID [42] database, which contains 1338 images of size 512 × 384 or 384 × 512, is used to evaluate the performance of the proposed image hashing scheme. Content-preserving operations are applied to these original images with Photoshop, MATLAB, and StirMark [44] to generate visually similar images, and content-tampering images are generated by modifying local regions of the images with Photoshop. All experiments are implemented in MATLAB R2014a, running on a laptop with an Intel Core i7-7700HQ CPU (2.8 GHz) and 8 GB RAM.

We use the normalized Hamming distance (NHD) as the metric to evaluate the similarity between a pair of hashes, which is calculated as [13]

$$d(H_1, H_2) = \frac{1}{L}\sum_{i=1}^{L} \left| h_1(i) - h_2(i) \right|,$$

where $L$ is the length of an image hash, and $h_1(i)$ and $h_2(i)$ are the $i$th elements of the image hashes $H_1$ and $H_2$, respectively. The value of $d$ lies between 0 and 1. For similar images, $d$ is expected to be close to 0; for different images, $d$ is expected to be far from 0. A predefined threshold $T$ is used to decide whether a given image is similar to or different from the original image. The value of $T$ lies approximately between 0.1 and 0.3 and is discussed in the next section. In addition, when detecting tampered regions, the Hamming distance between the aligned texture feature parts is calculated by

$$d_T(j) = \frac{1}{L_t}\sum_{i=1}^{L_t} \left| t_1^{j}(i) - t_2^{j}(i) \right|,$$

where $t_1^{j}$ and $t_2^{j}$ are the $j$th texture features of the image hashes $H_1$ and $H_2$, respectively, and $L_t$ is the length of one texture feature.
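Both distances can be computed with a few lines of Python. This is a minimal sketch assuming NumPy, hashes stored as arrays of 0/1 values, and a texture distance normalized by the length of each texture part; the helper names are hypothetical.

```python
import numpy as np

def nhd(h1, h2) -> float:
    """Normalized Hamming distance between two equal-length binary hashes."""
    h1 = np.asarray(h1, dtype=np.uint8)
    h2 = np.asarray(h2, dtype=np.uint8)
    if h1.shape != h2.shape:
        raise ValueError("hashes must have the same length")
    return float(np.count_nonzero(h1 != h2)) / h1.size

def texture_distances(t1, t2) -> np.ndarray:
    """Per-region normalized Hamming distances between aligned texture parts.

    t1, t2: arrays of shape (number of aligned regions, texture length).
    """
    t1 = np.asarray(t1, dtype=np.uint8)
    t2 = np.asarray(t2, dtype=np.uint8)
    return np.mean(t1 != t2, axis=1)

# Usage: one of four bits differs, so the NHD is 0.25.
d = nhd([0, 1, 1, 0], [0, 1, 0, 0])
# Flag regions whose texture distance exceeds the tamper threshold 0.25 (Section 5.2).
tampered = texture_distances([[0, 0, 1, 1], [1, 1, 1, 1]],
                             [[0, 0, 1, 1], [0, 0, 1, 1]]) > 0.25
# tampered -> [False, True]: only the second region changed (2 of 4 bits).
```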

5.2. Parameter Settings

In the experiments, the parameters are set as follows. First, the number of extracted keypoints $N$ is discussed. Since the proposed scheme detects tampering based on the extracted keypoints, their number and locations affect the accuracy of tampering detection. We evaluate the accuracy of the proposed method for detecting tampered images under different settings of $N$. The 1338 images from UCID [42] are used as the original images, and content-tampering manipulations are applied to them to generate tampered images. The content-tampering manipulations include object removal, object replacement, object duplication, and object insertion, and each type of manipulation generates one tampered image per original image, giving 4 × 1338 = 5352 test images. The number of keypoints $N$ is set to 2, 4, 6, 8, 10, 12, 14, and 16. Figure 3 presents the accuracy of the proposed method for detecting tampered images on these 5352 test images under these settings. The accuracy increases with $N$, and the accuracy of detecting tampered images is already high when $N$ is 10 (Figure 3). When $N$ is greater than 10, the accuracy increases only slowly.

Since both the SIFT and the saliency detection algorithms extract features based on the content of the entire image, tampering in a local region also affects the extracted keypoints. Although the selected keypoints are distributed in only some parts of an image, the keypoints are also likely to change when the content of a local region that contains no keypoint is changed. Therefore, the tampered image can still be detected effectively. However, the setting of $N$ affects the accuracy of locating the tampered region. In the worst case, the extracted keypoints do not cover a tampered region; the proposed method may then be unable to locate the tampered region accurately, although it can still detect the tampering. To overcome this issue, the user can increase the radius of the circular region around the keypoints to cover more of the image. Another solution is to increase $N$. However, a larger $N$ results in a longer hash value, which increases the computation time and the storage requirement. Considering both the hash length and the accuracy of tampering detection (Figure 3), the number of keypoints is set to 10 as a trade-off.

The parameters $\tau_1$, $\tau_2$, $\tau_3$, and $r$ (the circle radius) of the proposed scheme are set to 0.5, 0.1, 0.25, and 20, respectively. The parameter $\tau_1$ is used to determine the matched keypoints. According to the conclusion given in [40], when $\tau_1$ is 0.5, the probability of correct matches is high and the probability of incorrect matches is close to 0. The parameter $\tau_2$ determines the numbers of 0s and 1s in the corresponding binary vector. If $\tau_2$ is too small, every element of the vector becomes 1; conversely, if $\tau_2$ is too large, every element becomes 0. When $\tau_2$ is set to 0.1, the numbers of 0s and 1s in the vector are similar. The parameter $\tau_3$ is used to identify tampered images; it is set to 0.25 to resist content-preserving manipulations, so that similar images are not misidentified as tampered images. The circle radius $r$ determines the area of the local region around each keypoint. The larger the radius, the more of the image is covered, but the lower the accuracy of locating a tampered region. Hence, $r$ is set to 20 as a trade-off.

The setting of the threshold $T$ is also important. If $T$ is low, images under content-preserving manipulations may be misidentified as different images. Conversely, if $T$ is high, different images may be misidentified as similar images. In the experiments, $T$ is set to 0.2, for the following reasons. On the one hand, under geometric transformations the coordinates of the 10 keypoints extracted from the transformed image may change, so some bits of its hash differ from those of the original image's hash. However, the keypoint coordinates are only a small part of the hash, and the normalized Hamming distance between the two hashes remains below 0.2. On the other hand, content-changing manipulations produce more than two mismatched keypoints between the original image and the tampered image, so the hash of the tampered image differs from that of the original image in a larger number of bits, and the normalized Hamming distance between them exceeds 0.2. Thus, with $T = 0.2$, a tampered image will not be misidentified as a similar image. Moreover, experimental results show that with $T = 0.2$ the proposed image hashing scheme is robust to both single and combinatorial manipulations while also satisfying the requirement of collision resistance.

5.3. Perceptual Robustness

To evaluate the perceptual robustness of the proposed image hashing method, a single-manipulation image data set is first constructed from the UCID [42] database. For each image in UCID, 20 visually similar images are generated using content-preserving operations similar to those in [39]: brightness adjustment (Photoshop scale), contrast adjustment (Photoshop scale), gamma correction ($\gamma$), Gaussian low-pass filtering (standard deviation $\sigma$), salt-and-pepper noise (variance), multiplicative noise (variance), JPEG compression (quality factor QF), watermark embedding (strength), scaling (scale factor), and rotation ($\theta$ = 5°, 10°). Therefore, the single-manipulation image data set contains 1338 × 20 = 26760 pairs of visually similar images in total.

Figure 4 shows the distribution of the normalized Hamming distances between hash pairs under different single content-preserving manipulations. Most of these distances are smaller than 0.1, so with $T = 0.1$ the large majority of similar images are correctly identified as similar (their NHD is smaller than $T$). This shows that the proposed scheme is robust to various content-preserving manipulations. For some rotated images, however, the normalized Hamming distance to the original image is greater than 0.1. The main reason is that image rotation changes the coordinates of the keypoints in an image. Since the keypoint coordinates are only a small part of the overall hash value, the threshold can be increased to tolerate such a small change: with $T = 0.2$, all images under the content-preserving manipulations are identified as similar images.

Furthermore, the robustness of the proposed scheme against combinatorial content-preserving manipulations is evaluated for three types: double, triple, and quadruple manipulations. The single manipulations above, with their corresponding parameter values, are combined to obtain the combinatorial manipulations; for example, Gaussian filtering with $\sigma$ = 0.5 and rotation are combined into a double manipulation. These combinatorial manipulations are then applied to the images in the UCID [42] database. Because this produces an extremely large number of similar images, 10,000 images are randomly selected for each of the double, triple, and quadruple types to construct a combinatorial-manipulation image data set. Finally, the normalized Hamming distances between the hashes of the manipulated images and those of the corresponding original images are calculated to evaluate the robustness of the proposed scheme against combinatorial manipulations.
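The construction of the combinatorial data set can be sketched in Python as below. The sketch assumes that a $k$-fold manipulation applies $k$ distinct operation types with one parameter value each and ignores the order of application; this counting convention is an assumption, and the operation names and parameter values are hypothetical placeholders rather than the settings used in the experiments.

```python
import itertools
import random

# Hypothetical operation set: operation type -> parameter values (illustrative only).
OPERATIONS = {
    "gaussian_filter_sigma": [0.5, 1.0],
    "rotation_deg": [5, 10],
    "jpeg_qf": [50, 80],
    "scaling_factor": [0.5, 1.5],
}

def combinatorial_manipulations(ops: dict, k: int):
    """Yield every chain of k manipulations drawn from k distinct operation
    types, with one parameter value per type (order not distinguished)."""
    for types in itertools.combinations(sorted(ops), k):
        for params in itertools.product(*(ops[t] for t in types)):
            yield tuple(zip(types, params))

def sample_test_pairs(image_ids, ops, k, n_samples, seed=0):
    """Randomly choose distinct (image, manipulation chain) pairs."""
    chains = list(combinatorial_manipulations(ops, k))
    universe = len(image_ids) * len(chains)
    rng = random.Random(seed)
    picks = rng.sample(range(universe), min(n_samples, universe))
    return [(image_ids[p // len(chains)], chains[p % len(chains)]) for p in picks]

doubles = list(combinatorial_manipulations(OPERATIONS, 2))
print(len(doubles))  # C(4, 2) * 2 * 2 = 24 double manipulations for this toy set
pairs = sample_test_pairs(list(range(1338)), OPERATIONS, 2, 10_000)
```

Sampling indices from the product space avoids materializing every (image, chain) pair, which matters when the number of combinations grows quickly with $k$.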

Figure 5 shows the distribution of the normalized Hamming distances between pairs of similar images under double, triple, and quadruple manipulations. The distances for most similar images range from 0 to 0.3. Table 2 lists the accuracy of detecting similar images under different thresholds for these three types of combinatorial manipulations, from which two conclusions can be drawn: (1) the accuracy increases as the threshold increases; (2) the accuracy decreases as the number of combined manipulations increases. For example, with $T = 0.2$, the accuracy of identifying similar images is high for double and triple manipulations, but lower for quadruple manipulations. In that case the threshold can be increased: with a larger threshold (Table 2), the accuracy reaches 99.27% under double manipulations and 88.28% under quadruple manipulations. Therefore, the proposed image hashing scheme is robust not only to single manipulations but also to multiple combinatorial manipulations.

Further, the robustness of the proposed algorithm is compared with that of state-of-the-art hashing algorithms, including SIFT-based algorithms, LBP-based algorithms, and algorithms designed to resist combinatorial manipulations. Among these, the R&A SCH hashing [10] and the SIFT-QZMs hashing [25] are based on SIFT feature extraction. The SVD-CSLBP hashing [36] uses LBP to extract the texture features of an image. The GF-LVQ hashing [37] and the RP-NMF hashing [38] consider robustness under combined attacks of rotation with other content-preserving operations, but other combinations of attacks are not evaluated in their experiments.

In the experiments, both the single-manipulation and the combinatorial-manipulation image data sets are used. For each hashing algorithm, test images are classified according to the generated hash values, classification results are obtained under different thresholds $T$, and receiver operating characteristic (ROC) curves are constructed. The ROC curve for content-preserving manipulations is composed of the true positive (TP) rate $P_{TP}$ and the false positive (FP) rate $P_{FP}$, which are calculated by [39]

$$P_{TP} = \frac{n_1}{N_1}, \qquad P_{FP} = \frac{n_2}{N_2},$$

where $n_1$ is the number of pairs of images correctly identified as similar images, $N_1$ is the number of all pairs of similar images, $n_2$ is the number of pairs of different images misidentified as similar images, and $N_2$ is the number of all pairs of different images. $P_{TP}$ reflects the ability of a hashing algorithm to identify similar images and is used to evaluate its robustness. For a given false positive rate, an algorithm with a larger TP rate is more robust than one with a smaller TP rate. In other words, the ROC curve of a good algorithm should lie as close as possible to the top-left corner [39].
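The TP and FP rates for a sweep of thresholds can be computed directly from the normalized Hamming distances. This minimal Python sketch assumes NumPy, separate lists of distances for truly similar and truly different pairs, and the rule that a pair is declared similar when its NHD is smaller than $T$; the function name is hypothetical.

```python
import numpy as np

def roc_points(d_similar, d_different, thresholds):
    """Return (P_FP, P_TP) arrays for each threshold T.

    d_similar:   NHDs of pairs that truly are similar (N_1 pairs)
    d_different: NHDs of pairs that truly are different (N_2 pairs)
    """
    d_s = np.asarray(d_similar, dtype=float)
    d_d = np.asarray(d_different, dtype=float)
    t = np.asarray(thresholds, dtype=float)[:, None]  # one row per threshold
    p_tp = np.mean(d_s[None, :] < t, axis=1)  # n_1 / N_1
    p_fp = np.mean(d_d[None, :] < t, axis=1)  # n_2 / N_2
    return p_fp, p_tp

# Toy usage: three similar pairs, three different pairs, three thresholds.
fpr, tpr = roc_points([0.05, 0.10, 0.30], [0.15, 0.40, 0.50], [0.1, 0.2, 0.45])
# tpr -> [1/3, 2/3, 1.0], fpr -> [0.0, 1/3, 2/3]
```

Plotting `tpr` against `fpr` gives one ROC curve; a more robust algorithm reaches higher `tpr` at the same `fpr`.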

Table 3 lists the $P_{TP}$ of different hashing schemes under different manipulations. Compared with the other schemes, the $P_{TP}$ of the proposed scheme is closer to 1 for each single manipulation, and for brightness adjustment, contrast adjustment, gamma correction, JPEG compression, and watermark embedding it reaches 1. For combinatorial manipulations, the $P_{TP}$ of all hashing algorithms decreases as the number of manipulations increases, but that of the proposed scheme remains higher than those of the other schemes. The ROC curves of the proposed scheme and the other schemes under single and combinatorial manipulations are shown in Figures 6(a) and 6(b), respectively. As shown in Figure 6(a), the $P_{TP}$ of the proposed scheme is higher than those of the other schemes at the same $P_{FP}$, and Figure 6(b) leads to the same conclusion for combinatorial manipulations. The reason is that SIFT with saliency detection extracts more stable keypoints, which can resist various content-preserving manipulations as well as multiple combinatorial manipulations.

5.4. Collision Resistance

As an image hashing algorithm needs to distinguish perceptually distinct images, images with different contents should produce different hash values. Hence, hash values generated from different images should be statistically independent of each other [13], which implies that the normalized Hamming distance between hashes of different images should be close to 0.5. In this experiment, the UCID database is used to evaluate the collision resistance of the proposed scheme. The UCID database includes 1338 images of various types, including natural scenes and man-made objects, both indoors and outdoors [42]. Although the number of images in UCID is small compared with the space of all possible images, its images are representative and have therefore been widely used for content-based image retrieval in the literature. In the experiment, hash values are generated for every pair of different images in UCID [42], and the normalized Hamming distance between the two hashes is calculated. Therefore, a total of $\binom{1338}{2} = 1338 \times 1337 / 2 = 894453$ hash pairs are obtained.

Figure 7 presents the distribution of the normalized Hamming distances between these 894453 hash pairs of different images. The chi-square ($\chi^2$) test is used to determine the distribution of these distances, with the normal, lognormal, Weibull, and gamma distributions as candidates. The $\chi^2$ test results for the different distributions are shown in Table 4. Because the test statistic of the normal distribution is the smallest, the normalized Hamming distances are considered to follow a normal distribution with mean $\mu$ and standard deviation $\sigma$. Given a threshold $T$, the collision probability of the proposed image hashing scheme is calculated by

$$P_c(T) = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{T}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)dx = \frac{1}{2}\operatorname{erfc}\left(\frac{\mu - T}{\sqrt{2}\,\sigma}\right).$$
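Under the normal model, the collision probability is the normal cumulative distribution function evaluated at $T$, i.e., $P_c(T) = \tfrac{1}{2}\operatorname{erfc}\big((\mu - T)/(\sqrt{2}\sigma)\big)$, which Python's standard `math.erfc` computes directly. In this minimal sketch the values of $\mu$ and $\sigma$ are hypothetical placeholders, not the values fitted on UCID.

```python
import math

def collision_probability(T: float, mu: float, sigma: float) -> float:
    """P(NHD < T) when the NHD between different images ~ Normal(mu, sigma^2)."""
    return 0.5 * math.erfc((mu - T) / (math.sqrt(2.0) * sigma))

mu, sigma = 0.45, 0.05  # hypothetical distribution parameters, for illustration
for T in (0.1, 0.2, 0.3):
    print(T, collision_probability(T, mu, sigma))
```

Using `erfc` rather than `1 - Φ` keeps precision for the very small tail probabilities that arise when $T$ lies many standard deviations below $\mu$.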

Table 5 presents the collision probabilities of the proposed image hashing scheme under different thresholds $T$. The collision probability is extremely low and can be neglected, i.e., the probability that two different images produce similar hash values is nearly 0. Moreover, the collision probability decreases as $T$ decreases. However, if $T$ is too small, the robustness of the image hashing decreases. Hence, $T$ is set to 0.2 as a trade-off between robustness and discrimination.

In addition, the discrimination of the proposed image hashing scheme is compared with that of other works. In the ROC curves of Figure 6, $P_{FP}$ reflects the ability of a hashing algorithm to distinguish different images and is used to evaluate its discrimination. As shown in Figures 6(a) and 6(b), the proposed scheme has a lower $P_{FP}$ than the other schemes at the same $P_{TP}$. Therefore, the proposed scheme has better collision resistance.

5.5. Sensitivity to Visual Content Changes

Since the proposed hashing scheme is based on local region features, it can not only tell whether the content has been tampered with but also locate the manipulated regions. When a region of an image is tampered with, the keypoints extracted from the image, and hence its hash value, also change, and the tampered region can be located by the authentication process. To examine the capability of identifying forged images, another test set, the forged image data set, is generated by modifying the contents of the 1338 images from the UCID database [42] with Photoshop. The content-tampering operations include object removal, object replacement, object duplication, and object insertion, and each type of operation generates 10 different forged versions of an image. Hence, 40 different forged images are generated from each original image. The original UCID [42] database, the two visually similar image data sets above, and the forged image data set are then combined into a hybrid image data set. The proposed image hashing scheme is used to generate hash values of these images, and the normalized Hamming distance between each original image and each of its tampered versions is calculated. The mean of all these normalized Hamming distances is 0.443, which means that tampering with the content of an image causes a large change in the hash value. The reason is that the keypoints extracted from a tampered image differ significantly from those extracted from the original image, so not all of them can be matched. Therefore, tampered images can be detected by their hash values, and the proposed image hashing scheme is sensitive to content-changing manipulations.

The accuracy of the proposed algorithm for detecting different types of tampering operations is also evaluated, with the threshold $T$ set to 0.2 (Table 5 lists the collision probabilities under different thresholds). The accuracy of the proposed scheme for detecting tampered images under the different content-tampering manipulations is shown in Table 6, and it is high for all of them. Object duplication and object insertion are detected with higher accuracy than object removal and object replacement. In addition, ten examples of locating tampered regions are shown in Figure 8. The left and middle columns show the original and tampered images, respectively, and the last column shows the authentication results, in which the red blocks mark the detected tampered regions. The tampered images in (i) and (j) were processed not only by a content manipulation but also by a rotation, and the proposed method can still detect their tampered regions. Figure 8 also shows that the proposed method can restore a forged image to the orientation and scale of the original image and mark the tampered regions with red blocks. However, since this method uses only a few keypoints, there is a small probability that the keypoints do not cover all the tampered regions. In this worst case, the proposed method may not be able to locate the tampered region accurately, but it can still detect the tampering.

The capability of different hashing algorithms to detect content-changing operations is also compared. The hybrid image data set is used to evaluate the performance of the different hashing algorithms in detecting tampered images, and ROC curves for content-changing manipulations are obtained. The TP and FP rates of these ROC curves are calculated by

$$P_{TP} = \frac{n_a}{N_a}, \qquad P_{FP} = \frac{n_f}{N_f},$$

where $n_a$ is the number of pairs of images correctly identified as authentic images, $N_a$ is the total number of pairs of authentic images (including original images and similar images), $n_f$ is the number of pairs of forged images incorrectly identified as authentic images, and $N_f$ is the total number of pairs of forged images. $P_{TP}$ and $P_{FP}$ reflect the ability of a hashing algorithm to distinguish authentic images from forged images and are used to evaluate its capability to detect content-changing operations.

Figure 9 illustrates the ROC curves of different algorithms under content-changing manipulations, which represent each algorithm's ability to detect content-changing operations. The proposed method achieves a higher $P_{TP}$ than the other methods at a given $P_{FP}$. The main reasons are as follows. Image hashing methods based on global feature extraction [37, 38] have difficulty detecting local tampering. Methods based on local feature extraction [10, 25, 36] can detect local tampering but may be fragile to combinatorial content-preserving manipulations; since many images in our data set have undergone such manipulations, these methods have difficulty distinguishing them from tampered images. In the proposed method, the extracted keypoints are robust to combinatorial content-preserving manipulations yet sensitive to content-changing manipulations: when the content of an image changes, the keypoints extracted by SIFT with the saliency map also change, which leads to a large difference between the hash of the tampered image and that of the original image. Overall, the proposed image hashing algorithm distinguishes well between visually similar images and content-changed images.

5.6. Key-Dependent Security

As the hash value is transmitted over an untrusted channel, security is another metric of an image hashing technique [13]. In this work, a secret key is used to generate the hash in order to ensure its security. Ideally, when different keys are used to extract hashes of the same image, the distances between these key-based hashes should be large. To verify the key dependence of the proposed hashing method, 100 images randomly selected from UCID [42] are used as test images. For each image, the correct key and 100 different wrong keys are used to generate hash values, and the normalized Hamming distances between the hashes generated with the wrong keys and the hash generated with the correct key are calculated, as illustrated in Figure 10. The minimum and maximum distances are 0.434 and 0.553, respectively, and the mean and standard deviation of these distances are 0.493 and 0.0156, respectively. This shows that a hash generated with a wrong key is very different from the hash generated with the correct key. Therefore, it is extremely difficult for an attacker to estimate or forge the correct hash value without the knowledge of the secret key.

5.7. Time Cost Comparison and Hash Length Comparison

Computation time is also an important metric of an image hashing algorithm. To compare the average computation time of the proposed scheme and the other schemes, each hashing algorithm is used to generate hash values for 100 different images, and the average computation time of each algorithm is presented in Table 7. These algorithms are the same as those used in Section 5.3. The proposed scheme is faster than the SIFT-based algorithms, i.e., R&A SCH hashing [10] and SIFT-QZMs hashing [25], because the preprocessing and saliency detection reduce the number of keypoints. However, since SIFT keypoint extraction takes some time, the proposed scheme is slower than the schemes that do not use SIFT, i.e., SVD-CSLBP hashing [36], GF-LVQ hashing [37], and RP-NMF hashing [38]. As the proposed method achieves better authentication performance, this time overhead is acceptable.

The hash lengths of the different schemes are shown in Table 7. The hash length of the proposed scheme is determined by the number of keypoints; with 10 keypoints, as in the experiments, the hash is 960 bits long. Although this is longer than the hashes of the other schemes, the hash of the proposed scheme contains more image information, namely the geometric information and descriptors of the keypoints and the texture features around them. This information gives the proposed hash algorithm good robustness and discrimination; in particular, the geometric information of the keypoints is sensitive to content-changing manipulations and thus helps locate tampered regions. Moreover, the hash length remains acceptable in terms of storage and computational overhead.

6. Conclusion

In this paper, a robust image hashing algorithm using SIFT keypoints with saliency detection and LBP feature extraction is proposed, which can not only resist various single and multiple combinatorial content-preserving manipulations but also detect tampered regions of forged images. The combination of SIFT and saliency detection extracts the most stable keypoints, which makes the image hashing algorithm perceptually robust. Based on these stable keypoints, texture features of local image regions are generated and used to detect tampered images in the authentication process. In addition, the hash sequence is randomized with a secret key to enhance its security. Experimental results show that the proposed method can resist various single content-preserving manipulations and their multiple combinations. The proposed image hashing scheme also satisfies collision resistance between hashes of different images, sensitivity to visual content changes, and key-dependent security.

The proposed scheme also has some shortcomings that need further improvement. First, as only a small number of keypoints is used, there is a small probability that the extracted keypoints do not cover all the tampered regions. In the worst case, the proposed method may not be able to locate the tampered region accurately, although it can still detect the tampering. Second, the hash generated by the proposed scheme contains more geometric information and texture features of an image than those of other schemes, so it is longer than the hashes of other methods. Lastly, SIFT is used to extract keypoints in order to improve the robustness of the hash algorithm. Since SIFT keypoint extraction takes some time, the proposed scheme is slower than the schemes that do not use SIFT, although it is faster than the other SIFT-based algorithms. Therefore, our future work will focus on improving the accuracy of locating tampered regions and reducing the hash length while maintaining good performance.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (no. 61602241), and the Fundamental Research Funds for the Central Universities (no. NS2016096).