Security and Communication Networks

Volume 2019, Article ID 9795621, 18 pages

https://doi.org/10.1155/2019/9795621

## SSL: A Novel Image Hashing Technique Using SIFT Keypoints with Saliency Detection and LBP Feature Extraction against Combinatorial Manipulations

College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China

Correspondence should be addressed to Mingfu Xue; mingfu.xue@nuaa.edu.cn

Received 14 November 2018; Revised 12 January 2019; Accepted 7 February 2019; Published 3 March 2019

Academic Editor: David Megias

Copyright © 2019 Mingfu Xue et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Image hashing schemes have been widely used in content authentication, image retrieval, and digital forensics. In this paper, a novel image hashing algorithm (SSL) that incorporates the most stable keypoints and local region features is proposed. It is robust against various content-preserving manipulations, including multiple combinatorial manipulations. The proposed algorithm combines the scale invariant feature transform (SIFT) with saliency detection to extract the most stable keypoints. Then, the local binary pattern (LBP) feature extraction method is exploited to generate local region features based on these keypoints. After that, the keypoint information and the local region features are merged into a hash vector. Finally, a secret key is used to randomize the hash vector, which prevents attackers from forging the image and the hash value. Experimental results demonstrate that the proposed hashing algorithm can identify visually similar images under both single and combinatorial content-preserving manipulations, including multiple combinations of manipulations. It can also identify maliciously forged images under various content-changing manipulations. The collision probability between hashes of different images is nearly zero. In addition, the evaluation of key-dependent security shows that an attacker cannot forge or estimate the correct hash value without knowledge of the secret key.

#### 1. Introduction

Recently, due to the wide application of digital image processing technology, image data protection has raised serious concerns. In order to prevent images that are used to transmit information from being tampered with or forged, image hashing schemes have been proposed. Image hashing is a technique that extracts a content-based compact signature from an image. It has been effectively applied in content authentication [1–5], digital forensics [6, 7], and image retrieval [8, 9].

Compared with a traditional cryptographic hash function, an image hash needs to resist content-preserving manipulations such as geometric transformations and common signal-processing operations. It also needs to be sensitive to content-tampering manipulations such as object removal, object replacement, and object insertion. In general, an image hashing algorithm should satisfy the following properties [10]. (1) One-wayness: it is easy for an image hashing method to obtain the hash value from an input image. However, it should be rather difficult to derive the image content from an image hash value. (2) Compactness: the size of an image hash value should be far less than the size of the original image. (3) Perceptual robustness: the difference between two image hashes generated from perceptually similar contents should be small. Hence, for visually similar images, the hash function should produce approximately equal image hashes even under various content-preserving operations. (4) Collision resistance: the image hashes of perceptually different images should be different, so that different images can be easily distinguished by hash distances. (5) Unpredictability: it is computationally impractical to estimate an image hash without the secret key.

Currently, many existing image hashing algorithms are robust only to single content-preserving manipulations and are not effective against combinatorial manipulations. In order to resist various content-preserving manipulations and their multiple combinations, we propose a robust image hashing algorithm (namely SSL) by exploiting the scale invariant feature transform (SIFT) with saliency detection and local binary pattern (LBP) feature extraction. The proposed scheme extracts the most stable keypoints and texture features from an image to generate an image hash, which can resist both single content-preserving manipulations and multiple combinatorial manipulations. It can also detect tampered images that are under content-changing manipulations. To the best of the authors’ knowledge, this is the first time SIFT, saliency detection, and LBP are combined to achieve good robustness and discrimination, and thus to address the challenges of the state-of-the-art techniques. Moreover, randomization of the hash sequence is proposed, which can effectively ensure the security of the proposed scheme.

The SSL image hashing algorithm firstly utilizes SIFT and saliency detection to extract the most stable keypoints from an image. Then, the hashing algorithm generates texture features of local regions which are around those stable keypoints. The texture features are represented by LBP, which is gray-scale invariant and can be used to detect content-changing manipulations. After that, the information of keypoints and texture features are combined to generate a hash vector. Finally, in order to enhance the security of the image hash, the hash vector is randomized into an encrypted hash by using a secret key. In the image authentication stage, the receiver first calculates the image hash of a received image and matches the same keypoints between the received image and the original image. If there are some matched keypoints (indicating they are the same image) and some mismatched keypoints between the two images, the received image is considered as tampered. The proposed scheme is a nonlinear hash scheme, and the keypoints extracted from a tampered image are different from the keypoints extracted from the original image. Then, the texture features around the matched keypoints in the two hash values are aligned, so that the tampered region can be located by calculating the Hamming distance of the hash values between the aligned texture feature parts. The performance of the proposed image hashing scheme is evaluated with respect to the perceptual robustness, the collision resistance between hashes of different images, the sensitivity to visual content changes, the security of key dependence, the time cost, and the hash length.
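The tamper-localization step described above can be illustrated with a minimal sketch. It assumes that each keypoint's texture feature is already a fixed-length bit list and that keypoint matching has produced index pairs; the function names and the `threshold` value are hypothetical choices made for illustration, not values taken from the paper.

```python
def hamming_distance(bits_a, bits_b):
    """Number of positions at which two equal-length bit lists differ."""
    if len(bits_a) != len(bits_b):
        raise ValueError("bit strings must have the same length")
    return sum(a != b for a, b in zip(bits_a, bits_b))


def flag_tampered(original_feats, received_feats, matches, threshold=0.25):
    """Return the matched keypoint pairs whose texture features disagree.

    matches   : list of (i, j) pairs, keypoint i of the original image
                matched to keypoint j of the received image
    threshold : hypothetical normalized Hamming distance above which the
                local region is treated as tampered (an assumption)
    """
    flagged = []
    for i, j in matches:
        a, b = original_feats[i], received_feats[j]
        if hamming_distance(a, b) / len(a) > threshold:
            flagged.append((i, j))
    return flagged
```

In this sketch, a pair is flagged only when the normalized distance exceeds the threshold, so small bit differences caused by content-preserving operations would be tolerated.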

The main contributions of this paper are as follows:

(i) The proposed scheme extracts the most stable keypoints by exploiting SIFT with saliency detection. These keypoints are robust not only to single content-preserving manipulations but also to combinatorial manipulations, even their multiple combinations. In addition, the extracted keypoints contain sufficient image information, such as geometric information and descriptors of keypoints, and texture features around the keypoints. This information ensures that the proposed hash algorithm has good robustness and discrimination.

(ii) The proposed method can recover the received image to the orientation and scale of the original image according to its geometric information. Moreover, by incorporating the most stable keypoints and local texture features, the proposed scheme is sensitive to content-changing manipulations and can also locate the tampered regions during the image authentication process.

(iii) By randomizing the hash sequence with a secret key, the proposed scheme ensures that an attacker cannot forge or estimate the correct hash value without knowledge of the secret key.

The rest of this paper is organized as follows. Related works are reviewed in Section 2. The scale invariant feature transform, saliency detection, and local binary pattern are described in Section 3. Section 4 elaborates the proposed image hashing scheme. Experimental results are presented in Section 5. Finally, conclusions are given in Section 6.

#### 2. Related Work

The earlier image hashing schemes are based on image transformation, such as discrete cosine transform (DCT), discrete Fourier transform (DFT), and discrete wavelet transform (DWT). Venkatesan *et al.* [11] are the first to propose image hashing. They use randomized processing strategies to convert an image into a random binary sequence. In [12], the image hash is generated by using DCT-free modes to random smooth patterns with a secret key. Swaminathan *et al.* [13] propose a Fourier-Mellin transform based image hashing scheme. Tang *et al.* [14] propose a lexicographical framework to generate image hashes by using DCT and nonnegative matrix factorization. Lei *et al.* [15] apply Radon transform and DFT on an image to calculate the magnitude of the significant DFT coefficients. The image hashes are obtained by normalizing and quantizing the significant DFT coefficients. Karsh *et al.* [16] propose an image hashing scheme by using DWT-SVD and spectral residual method. Tang *et al.* [17] divide an image into nonoverlapping blocks and apply a three-level two-dimensional DWT to each block to extract DWT statistical features.

Recently, some image hashing schemes based on invariant features have been proposed. Monga *et al.* [18] generate image hash using feature points, which can resist attacks including geometric transformation and common signal processing. The scheme extracts significant features by using a wavelet based feature detection algorithm and uses an iterative procedure to obtain a set of feature points. Tang *et al.* [19] extract a rotation-invariant feature matrix and compress the matrix with multidimensional scaling to generate an image hash. Tang *et al.* [20] construct three-order tensor from an image and decompose the tensor to generate an image hash. Zhao *et al.* [21] propose an image hashing scheme by incorporating global features and local features. The global features and local features are based on Zernike moments of the luminance/chrominance components and salient regions, respectively. This method can be applied to image authentication and can detect malicious tampering. Karsh *et al.* [22, 23] also use global features and local features to generate image hash. The global features are obtained by using ring partition and projected gradient nonnegative matrix factorization [22] or statistical feature distance [23], and the local features are extracted from the salient regions of an image.

Some image hashing schemes that use SIFT to extract keypoints and generate hash values have been proposed. The keypoints extracted by SIFT can resist most of the content-preserving manipulations. Roy *et al.* [5] propose an image hashing algorithm to detect image tampering. The scheme extracts feature points by SIFT and compresses them into a part of the hash value. The other part of the hash value is obtained by calculating the orientation of each block, which is generated by the edge detection algorithm. Battiato *et al.* [24] propose an image hashing algorithm by encoding the spatial distribution of feature points, which are extracted by SIFT. The scheme uses a voting procedure in the parameter space of the model to recover the geometric transformation. Lv *et al.* [10] propose a SIFT-Harris detector to select stable keypoints. They generate an image hash by embedding keypoints into shape-contexts-based descriptors. Ouyang *et al.* [25] apply quaternion Zernike moments and SIFT to extract image features and generate the image hash. The feature extraction algorithm removes repeated keypoints and selects top keypoints as the features of an image. The methods in [10, 25] do not evaluate the robustness of the algorithm under combinatorial manipulations.

Texture is also an important feature of images. Many descriptors have been proposed to represent the texture features of an image, such as the LBP [26], the center-symmetrical local binary pattern (CSLBP) [27], the dual-cross pattern (DCP) [28, 29], and the transition local binary pattern (tLBP) [30]. Many image hashing algorithms based on these descriptors are proposed in the literature. Qin *et al.* [31] use the encoding of dual-cross pattern to extract texture features. Then, they combine the salient structural features to compress the texture features to generate image hashes. Qin *et al.* [32] exploit block truncation coding and CSLBP to generate an image hash, which can be applied in image authentication and retrieval. Song *et al.* [33] propose a hashing algorithm for near-duplicate video retrieval based on multiple visual features, including LBP features and the global features. Dubey *et al.* [34] use multichannel decoded LBP as features of an image to retrieve the image. Abbas *et al.* [35] combine Noise Resistant LBP and Singular Value Decomposition to construct an image hashing algorithm against content-preserving manipulations. This method does not evaluate the robustness of the algorithm under image rotation. Davarzani *et al.* [36] use CSLBP to extract features from each nonoverlapping block and generate a hash value by inner products of feature vectors. The identification accuracy of this method will decrease when facing geometric manipulations.

Few image hashing algorithms can resist combinatorial manipulations. Li *et al.* [37] propose an image hashing scheme based on random Gabor filtering and dithered lattice vector quantization. The scheme only evaluates the robustness of the algorithm under a few combinatorial manipulations, such as rotation with cropping and rotation with scaling. Tang *et al.* [38] propose an image hashing scheme using a ring partition and a nonnegative matrix factorization, which can resist rotation operations. Only the combined attacks using rotation with other content-preserving operations are evaluated, while other combinations of attacks are ignored in the experiments. Tang *et al.* [39] combine ring partition and invariant vector distance to generate image hash. This method is robust to image rotations and is sensitive to content changes. Similarly, this method is only validated on combined attacks using rotation with other content-preserving operations.

#### 3. Preliminary

##### 3.1. Scale Invariant Feature Transform

SIFT is an algorithm that extracts feature points of an image. These feature points are robust to geometric transformation and common signal-processing operations. The feature points extraction process of SIFT is as follows [40].

*Scale-Space Extrema Detection.* This step uses a difference-of-Gaussian (DOG) function to search for potential keypoints that are invariant to scale and orientation. The scale space $L(x, y, \sigma)$ of an image is first obtained by convolving the image with Gaussian functions of different scales $\sigma$, which is defined as

$$L(x, y, \sigma) = G(x, y, \sigma) * I(x, y),$$

where $x$ and $y$ correspond to the abscissa and ordinate of a point in an image $I$, respectively, $*$ is the convolution operation at the point $(x, y)$, and $G(x, y, \sigma)$ is a variable-scale Gaussian function. The DOG function $D(x, y, \sigma)$ is defined as the difference of two nearby scales separated by a constant multiplicative factor $k$, as follows [40]:

$$D(x, y, \sigma) = \left(G(x, y, k\sigma) - G(x, y, \sigma)\right) * I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma).$$

The SIFT algorithm searches for extreme values of the DOG images at the current and adjacent scales as candidate keypoints.
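As a rough illustration of the difference-of-Gaussian computation, the following pure-Python sketch blurs a small grayscale image (a list of rows) at two nearby scales and subtracts the results. The kernel truncation at three standard deviations, the edge-replication border handling, and the default factor `k = sqrt(2)` are assumptions made for this example; a full SIFT implementation builds a multi-octave pyramid and is considerably more involved.

```python
import math


def gaussian_kernel(sigma):
    """1-D Gaussian kernel truncated at 3*sigma, normalized to sum to 1."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def convolve_rows(image, kernel):
    """Convolve every row with a symmetric kernel, replicating edge pixels."""
    radius = len(kernel) // 2
    out = []
    for row in image:
        n = len(row)
        new_row = []
        for x in range(n):
            acc = 0.0
            for k, w in enumerate(kernel):
                xx = min(max(x + k - radius, 0), n - 1)  # clamp to the border
                acc += w * row[xx]
            new_row.append(acc)
        out.append(new_row)
    return out


def transpose(image):
    return [list(col) for col in zip(*image)]


def gaussian_blur(image, sigma):
    """Separable Gaussian smoothing: rows first, then columns."""
    kernel = gaussian_kernel(sigma)
    blurred = convolve_rows(image, kernel)
    return transpose(convolve_rows(transpose(blurred), kernel))


def difference_of_gaussian(image, sigma, k=math.sqrt(2)):
    """Blur at scales k*sigma and sigma, then subtract pixel by pixel."""
    coarse = gaussian_blur(image, k * sigma)
    fine = gaussian_blur(image, sigma)
    return [[c - f for c, f in zip(row_c, row_f)]
            for row_c, row_f in zip(coarse, fine)]
```

On a constant image the response is zero everywhere, while an isolated bright pixel produces a strong negative response at its own location, which is the kind of extremum the detector looks for.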

*Accurate Keypoint Localization and Orientation Assignment.* In this step, a model is used to determine the location and the scale of the keypoints. Meanwhile, the unstable keypoints with low contrast are eliminated. To assign an orientation to each keypoint, the gradient orientations around the keypoint need to be calculated. Then, the number of each orientation is counted and an orientation histogram is formed from these gradient orientations. The peak of the orientation histogram represents the orientation of the corresponding keypoint.
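The orientation assignment can be sketched as follows. This simplified example assumes central-difference gradients, 36 histogram bins of 10 degrees each, and magnitude-weighted votes without the Gaussian window used in full SIFT implementations; the function name and these parameter choices are illustrative assumptions.

```python
import math


def dominant_orientation(patch, num_bins=36):
    """Return (orientation in degrees, histogram) for a grayscale patch.

    patch is a list of rows around a keypoint; only interior pixels are
    used so that central differences are defined.
    """
    bin_width = 360.0 / num_bins
    hist = [0.0] * num_bins
    rows, cols = len(patch), len(patch[0])
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            dx = patch[y][x + 1] - patch[y][x - 1]
            dy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(dx, dy)
            if magnitude == 0:
                continue  # flat pixels carry no orientation information
            angle = math.degrees(math.atan2(dy, dx)) % 360.0
            hist[int(angle // bin_width) % num_bins] += magnitude
    peak = max(range(num_bins), key=hist.__getitem__)
    # Report the lower edge of the winning bin as the orientation.
    return peak * bin_width, hist
```

For a patch whose intensity increases equally along both axes, every gradient points at 45 degrees, so the peak falls in the bin covering 40 to 50 degrees.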

*Descriptor Generation.* After the keypoint detection is completed, the position, scale, and orientation of each keypoint are determined. Then, the descriptors of keypoints are generated based on this geometric information.

##### 3.2. Saliency Detection

The spectral residual is used to detect the salient region that describes the nontrivial part of an image [41]. In [41], Hou and Zhang decompose the image information into two parts based on information theory: the innovation part and the redundant part. By analyzing and processing the log spectrum of an image, the spectral residual approach can estimate the innovation part and remove the statistically redundant components. Saliency detection based on the spectral residual proceeds as follows [41].

An input image $I(x)$ is first transformed into the frequency domain by the Fourier Transform. The result of the Fourier Transform is denoted as $F(f)$. Then, the log spectrum $L(f)$ and the phase spectrum $P(f)$ of the image are obtained from $F(f)$. The redundant information $A(f)$ is defined by [41]

$$A(f) = h_n(f) * L(f),$$

where $h_n(f)$ is an $n \times n$ low-pass kernel. The spectral residual $R(f)$ can be calculated by [41]

$$R(f) = L(f) - A(f).$$

By applying the Inverse Fourier Transform, the saliency map is constructed in the spatial domain. The value of each point $x$ in the saliency map is calculated by [41]

$$S(x) = g(x) * \left| \mathcal{F}^{-1}\left[ \exp\left( R(f) + iP(f) \right) \right] \right|^{2},$$

where $g(x)$ is a Gaussian filter and $\mathcal{F}^{-1}$ is the Inverse Fourier Transform.

After that, salient regions can be obtained by using threshold segmentation on the saliency map, which are represented by an object map. If the value of the point in the saliency map is smaller than the threshold, the value of the corresponding point on the object map is set to be 0. Otherwise, the value of the corresponding point on the object map is set to be 1. Salient regions are the regions with the value of 1 on the object map.
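The spectral residual procedure and the thresholding step can be sketched in pure Python for very small images. The naive discrete Fourier transform, the 3 × 3 mean filter used as the low-pass kernel, the fixed 3 × 3 Gaussian kernel, the wrap-around borders, and the threshold of `factor` times the mean saliency are all assumptions made for this illustration.

```python
import cmath
import math


def dft2(matrix, inverse=False):
    """Naive 2-D discrete Fourier transform (only suitable for tiny images)."""
    sign = 1 if inverse else -1

    def dft1(vec):
        n = len(vec)
        out = [sum(v * cmath.exp(sign * 2j * math.pi * k * t / n)
                   for t, v in enumerate(vec)) for k in range(n)]
        return [o / n for o in out] if inverse else out

    by_rows = [dft1(row) for row in matrix]
    by_cols = [dft1(list(col)) for col in zip(*by_rows)]
    return [list(row) for row in zip(*by_cols)]


def filter2d(matrix, kernel):
    """Apply a small symmetric square kernel with wrap-around borders."""
    rows, cols = len(matrix), len(matrix[0])
    r = len(kernel) // 2
    return [[sum(kernel[dy + r][dx + r] * matrix[(y + dy) % rows][(x + dx) % cols]
                 for dy in range(-r, r + 1) for dx in range(-r, r + 1))
             for x in range(cols)] for y in range(rows)]


MEAN_3X3 = [[1.0 / 9.0] * 3 for _ in range(3)]               # low-pass kernel
GAUSS_3X3 = [[w / 16.0 for w in row]
             for row in ([1, 2, 1], [2, 4, 2], [1, 2, 1])]   # smoothing kernel


def saliency_map(image):
    """Spectral residual saliency of a grayscale image (list of rows)."""
    spectrum = dft2(image)
    log_amp = [[math.log(abs(c) + 1e-12) for c in row] for row in spectrum]
    phase = [[cmath.phase(c) for c in row] for row in spectrum]
    redundant = filter2d(log_amp, MEAN_3X3)
    residual = [[l - a for l, a in zip(row_l, row_a)]
                for row_l, row_a in zip(log_amp, redundant)]
    rebuilt = dft2([[cmath.exp(r + 1j * p) for r, p in zip(row_r, row_p)]
                    for row_r, row_p in zip(residual, phase)], inverse=True)
    energy = [[abs(c) ** 2 for c in row] for row in rebuilt]
    return filter2d(energy, GAUSS_3X3)


def object_map(saliency, factor=3.0):
    """Binary object map: 0 below the threshold, 1 otherwise."""
    values = [v for row in saliency for v in row]
    threshold = factor * sum(values) / len(values)  # assumed threshold rule
    return [[0 if v < threshold else 1 for v in row] for row in saliency]
```

For an image containing a single bright pixel, the amplitude spectrum is flat, the residual vanishes, and the saliency map reduces to a smoothed peak at that pixel, so the object map marks its 3 × 3 neighborhood as salient.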

##### 3.3. Local Binary Pattern

The LBP operator is a texture descriptor, which is introduced by Ojala *et al.* [26]. It assigns a binary label (8 bits) to each pixel of an image. When calculating the label of each pixel, the current pixel is set as the center pixel. Then, the gray value of the center pixel is compared with the gray value of its neighborhood pixels in a clockwise (or counterclockwise) direction. If the neighboring pixel value is smaller than the center pixel value, the corresponding bit on the binary label is set to be 0. Otherwise, the corresponding bit on the binary label is set to be 1. After comparing the gray value of the center pixel and all neighboring pixels, the binary label of the center pixel is obtained and converted to a decimal number. Therefore, there are $2^8 = 256$ possibilities in the $3 \times 3$ adjacent region (the center pixel and its 8 neighbors), and the LBP value of each pixel ranges from 0 to 255. Finally, the histogram of the labels represents a texture descriptor.
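A minimal sketch of the basic LBP operator follows. The clockwise traversal starting at the top-left neighbor and the choice of the first neighbor as the most significant bit are assumptions made for this example; other orderings give an equally valid descriptor as long as they are applied consistently.

```python
# Neighbor offsets (dy, dx), clockwise starting from the top-left pixel.
CLOCKWISE_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                     (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_label(image, y, x):
    """8-bit LBP label of the interior pixel at row y, column x."""
    center = image[y][x]
    label = 0
    for dy, dx in CLOCKWISE_OFFSETS:
        # Bit is 0 if the neighbor is smaller than the center, 1 otherwise.
        bit = 0 if image[y + dy][x + dx] < center else 1
        label = (label << 1) | bit
    return label


def lbp_histogram(image):
    """256-bin histogram of LBP labels over all interior pixels."""
    hist = [0] * 256
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            hist[lbp_label(image, y, x)] += 1
    return hist
```

For example, for the 3 × 3 patch `[[6, 5, 2], [7, 6, 1], [9, 8, 7]]` the comparisons against the center value 6 give the bits `10001111`, so `lbp_label(patch, 1, 1)` returns 143.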

#### 4. Proposed Image Hashing Scheme

The overall flow of the proposed image hashing method is shown in Figure 1. There are five stages in this scheme to generate a robust image hash. Firstly, a preprocessing step, which includes normalization and filtration operations, is applied to the original image. Secondly, the stable feature points are extracted from the preprocessed image by combining SIFT with salient region detection. Thirdly, LBP algorithm is used to extract texture information from each region which is around each feature point. Then, the descriptors of feature points and texture features are combined and compressed into a hash sequence. Finally, the hash sequence is randomized by a secret key to enhance the security of the proposed image hash, which can be transmitted in an untrusted channel. The process of each stage is described in the following sections, and the authentication method is presented in Section 4.6.
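To give a feel for what key-dependent randomization of a hash sequence means, the sketch below permutes a hash vector with a pseudorandom order derived from a secret key. This is only an illustration of the general idea under the assumption of a keyed permutation; it is not the randomization procedure of the proposed scheme, the function names are hypothetical, and Python's `random` module is not cryptographically secure.

```python
import random


def keyed_order(length, secret_key):
    """Pseudorandom permutation of range(length) determined by the key."""
    order = list(range(length))
    # Illustration only: random.Random is not a cryptographic generator.
    random.Random(secret_key).shuffle(order)
    return order


def randomize_hash(hash_vector, secret_key):
    """Reorder the hash elements according to the key-dependent order."""
    order = keyed_order(len(hash_vector), secret_key)
    return [hash_vector[i] for i in order]


def restore_hash(randomized, secret_key):
    """Invert randomize_hash for a receiver that knows the same key."""
    order = keyed_order(len(randomized), secret_key)
    restored = [None] * len(randomized)
    for position, original_index in enumerate(order):
        restored[original_index] = randomized[position]
    return restored
```

A receiver holding the same key can undo the permutation before comparing hashes, whereas a party without the key does not know which element of the transmitted sequence corresponds to which feature.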