Abstract

Image hashing has attracted much attention from the multimedia security community in recent years. It has been successfully used in social event detection, image authentication, copy detection, image quality assessment, and so on. This paper presents a novel image hashing algorithm based on low-rank representation (LRR) and ring partition. The proposed hashing finds the saliency map by the spectral residual model and exploits it to construct the visual representation of the preprocessed image. Next, the proposed hashing calculates the low-rank recovery of the visual representation by LRR and extracts the rotation-invariant hash from the low-rank recovery by ring partition. Hash similarity is finally determined by the $L_2$ norm. Extensive experiments are done to validate the effectiveness of the proposed hashing. The results demonstrate that the proposed hashing reaches a good balance between robustness and discrimination and is superior to some state-of-the-art hashing algorithms in terms of the area under the receiver operating characteristic curve.

1. Introduction

With the popularity of social network platforms such as Facebook and Twitter, more and more digital images are transmitted via the Internet and stored in cyberspace. Therefore, efficient techniques are required for processing massive numbers of images and protecting content security. For example, when an important event happens, such as an opening ceremony of the Olympic Games, many people forward the same image of the event in their social networks. These forwarded images may undergo digital operations such as compression and enhancement. Consequently, many copies of an image of a hot event exist in cyberspace. Finding a hot event in a social network by detecting image copies is therefore an important task [1]. In recent years, a useful technique called image hashing [2, 3] has attracted much attention from the multimedia security community. It extracts a short code, called a hash, from the visual content of the input image regardless of its detailed bits. It can be applied not only to social event detection [1] but also to many other applications [4–8], e.g., image retrieval, image authentication, image copy detection, and image quality assessment. In practice, the hash is used to represent the input image. As the hash is a short representation, the use of image hashing can achieve efficient data processing. In this paper, we study a novel hashing algorithm based on the low-rank representation model and ring partition.

The most important properties of the image hashing algorithm are robustness and discrimination [9]. The requirement of robustness is that the image hashing algorithm must map visually similar images to the same or similar hashes no matter whether their bit representations are the same or not. In other words, the hashing algorithm should be robust to normal digital operations, e.g., compression, filtering, and enhancement. This is because they alter the bit representations of digital images but keep their visual appearances unchanged. The requirement of discrimination is that the image hashing algorithm must map different images to completely different hashes. Since the number of different images is much bigger than that of similar images in practice, good discrimination means a low error rate of judging different images as similar images. This property is helpful to many applications, such as social event detection and image retrieval. Note that the two properties restrict each other. A high-performance algorithm should reach a good balance between them.

In the past years, many researchers have devoted themselves to developing image hashing algorithms. Some typical techniques are briefly introduced below. For example, Swaminathan et al. [10] combined the Fourier-Mellin transform and randomization method to develop secure image hashing. Monga and Evans [11] proposed to use feature points detected by the end-stopped wavelet to build the hash. These two hashing algorithms [10, 11] are resilient to small-angle rotation. Lv and Wang [12] combined SIFT features and Harris points to construct the hash. This scheme is robust to large-angle rotation and brightness adjustment, but its robustness against additive noise and blurring must be improved. Zhao et al. [13] derived the robust image hash for authentication from global features determined by Zernike moments and local features based on the position information of salient features. Tang et al. [14] calculated the image hash by jointly using the color vector angle (CVA) and discrete wavelet transform (DWT). Both the above hashing methods [13, 14] only resist small-angle rotation. Laradji et al. [15] extracted the hash of the color image by using the hypercomplex numbers and quaternion Fourier transform (QFT). This approach has good discrimination, but its robustness against rotation needs to be improved. Wang et al. [16] jointly exploited the Watson model and SIFT features to design the hashing algorithm for authentication. This method has better performance than the hashing method [13].

Recently, Yan et al. [17] introduced the quaternion techniques called quaternion Fourier transform and quaternion Fourier-Mellin moments to image hashing design. In another study, Yan et al. [18] exploited the adaptive local image features to design multiscale hashing. Both the hashing schemes [17, 18] reach good performance in tampering detection. Tang et al. [19] constructed the feature matrix via the DCT (discrete cosine transform) and learned hash code from the DCT-based matrix by local linear embedding. This hashing only resists image rotation within 5°. In [20], Tang et al. proposed to extract local statistical features from image rings and compressed statistical features by calculating vector distance. As the contents of image rings are rotation-invariant, rotation robustness of this hashing [20] is thus enhanced. Davarzani et al. [21] combined SVD (singular value decomposition) with CSLBP (Center-Symmetric Local Binary Patterns) to make the hashing scheme for authentication. The scheme is robust to additive noise and JPEG compression, but it is sensitive to rotation. Huang et al. [22] introduced the random walk to hash generation for improving security. In another work, Qin et al. [23] used SVD to conduct preprocessing and extracted hybrid features based on the circle-based feature and block-based feature to construct the hash. The hybrid feature-based hashing has good robustness against compression and filtering, but its computational cost should be reduced. Tang et al. [24] first proposed to construct a three-order tensor with image blocks and extracted the image hash from the three-order tensor by Tucker decomposition. This approach can resist small-angle rotation. Shen and Zhao [25] proposed to compute the image hash by using the color opponent component and quadtree structure features. This method shows good performance in image authentication, but it is fragile to image rotation.

From the above reviews, it can be found that many algorithms do not reach a good balance between rotation robustness and discrimination. Aiming at this problem, we propose a new image hashing based on low-rank representation and ring partition. The main contributions of the proposed algorithm are as follows:

(1) We calculate the visual representation of the preprocessed image based on the saliency map extracted by the spectral residual model. Since the saliency map can indicate visual attention of human beings, the visual representation using the saliency map can effectively describe salient regions of the image. Consequently, hash generation with the visual representation can improve robustness of the proposed algorithm.

(2) We propose to incorporate low-rank representation into ring partition. The low-rank representation can capture the global structure of data, which is helpful to make the discriminative hash. Since ring partition can produce a set of image rings invariant to rotation, hash code extraction based on image rings can reach good rotation robustness.

Many experiments are done to validate effectiveness of the proposed algorithm. The results illustrate that the proposed algorithm can resist many digital operations, including image rotation with a large angle. Performance comparisons with some state-of-the-art algorithms are also done. The receiver operating characteristic (ROC) curve results show that the proposed algorithm has better classification performance than those of the compared algorithms in discrimination and robustness.

The structure of the rest of the paper is as follows. Section 2 describes the image hashing algorithm proposed in this paper. Sections 3 and 4 discuss experimental results and performance comparisons, respectively. Section 5 summarizes this paper.

2. Proposed Image Hashing

The proposed image hashing can be divided into four steps: preprocessing, visual representation calculation, low-rank representation, and ring partition. Figure 1 is the diagram of our algorithm. The preprocessing is to produce a normalized image, and the visual representation calculation is to generate an image representation which can indicate salient regions of the image. The use of low-rank representation is to extract principal image features for making the discriminative hash, and the use of ring partition can make the extracted feature code invariant to image rotation. These steps are described in detail in the following sections.

2.1. Preprocessing

This step includes three operations: bilinear interpolation, Gaussian low-pass filtering, and color space conversion. Bilinear interpolation resizes the input image to a standard size $M \times M$, which makes our algorithm resilient to image scaling. Gaussian low-pass filtering is then applied to the resized image. This operation alleviates the influence of some digital operations on the resized image, such as image noise and JPEG compression. Finally, the filtered image in RGB color space is converted to HSV color space [26], and the brightness component in the HSV color space is used to denote the input image. The HSV color space is also called the hexagonal cone model and has shown good performance in many existing hashing algorithms [26]. Let $H$ be the hue of the pixel and $S$ and $V$ be the saturation and brightness of the pixel in the HSV color space, respectively. Thus, the calculation formulas for converting from RGB space to HSV space are as follows:

$$
H = \begin{cases}
0^\circ, & \max = \min, \\
\left(60^\circ \times \dfrac{G - B}{\max - \min} + 360^\circ\right) \bmod 360^\circ, & \max = R, \\
60^\circ \times \dfrac{B - R}{\max - \min} + 120^\circ, & \max = G, \\
60^\circ \times \dfrac{R - G}{\max - \min} + 240^\circ, & \max = B,
\end{cases}
\qquad
S = \begin{cases}
0, & \max = 0, \\
\dfrac{\max - \min}{\max}, & \text{otherwise},
\end{cases}
\qquad
V = \max, \tag{1}
$$

where $R$ is the red component of the pixel, $G$ and $B$ are the green and blue components of the pixel, respectively, and $\max$ and $\min$ represent the maximum value and the minimum value of $R$, $G$, and $B$, respectively.
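To make the color conversion concrete, the following minimal Python sketch evaluates the standard hexagonal-cone conversion for a single pixel and compares it with the standard library's `colorsys`. The function name `rgb_to_hsv_pixel` and the [0, 1] channel range are illustrative assumptions rather than details taken from the paper; only the brightness value is used by the later steps.

```python
import colorsys

def rgb_to_hsv_pixel(r, g, b):
    """Convert one RGB pixel (channels in [0, 1]) to (hue in degrees, saturation, brightness)."""
    mx, mn = max(r, g, b), min(r, g, b)
    v = mx                                   # brightness is the largest channel
    s = 0.0 if mx == 0 else (mx - mn) / mx   # saturation
    if mx == mn:                             # gray pixel: hue set to 0 by convention
        h = 0.0
    elif mx == r:
        h = (60.0 * (g - b) / (mx - mn) + 360.0) % 360.0
    elif mx == g:
        h = 60.0 * (b - r) / (mx - mn) + 120.0
    else:
        h = 60.0 * (r - g) / (mx - mn) + 240.0
    return h, s, v

h, s, v = rgb_to_hsv_pixel(0.2, 0.6, 0.4)
# colorsys returns the hue as a fraction of a full turn, i.e., in [0, 1)
ref_h, ref_s, ref_v = colorsys.rgb_to_hsv(0.2, 0.6, 0.4)
```

In a full implementation the same conversion would be applied to every pixel of the filtered image, and the brightness plane alone would be passed on.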

Figure 2 shows a practical example of the preprocessing. Figure 2(a) is an input image, Figure 2(b) is the resized image, and Figure 2(c) represents the filtered image. Figures 2(d)–2(f) represent the hue, saturation, and brightness components of the filtered image in the HSV color space, respectively.

2.2. Visual Representation Calculation

In this work, a well-known model of saliency detection called the spectral residual model (SRM) [27] is exploited to find the saliency map of the image. Then, the visual image representation is determined by combining the saliency map and the brightness component of the preprocessed image. We select the SRM because it outperforms conventional methods such as Itti’s method [28] in both detection performance and computational speed [27].

Saliency map calculation of the classical spectral residual model is based on the spectral residual $R(f)$ of the image, which is defined as follows:

$$R(f) = L(f) - h_n(f) * L(f), \tag{2}$$

where $*$ denotes convolution, $h_n(f)$ is a matrix of size $n \times n$, and $L(f)$ is the log spectrum of the image. More specifically, $h_n(f)$ is defined as follows:

$$h_n(f) = \frac{1}{n^2} \begin{bmatrix} 1 & 1 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 1 \end{bmatrix}. \tag{3}$$

In addition, $L(f)$ is determined by the below formula:

$$L(f) = \log\big(A(f)\big), \tag{4}$$

where $A(f)$ is the amplitude spectrum of the image, which can be determined by the following formula:

$$A(f) = \big|\mathcal{F}[I(x)]\big|, \tag{5}$$

where $I(x)$ denotes a given image, $\mathcal{F}$ is the Fourier transform, and $|\cdot|$ denotes the amplitude. Finally, the saliency map $S(x)$ in the spatial domain can be constructed by using the inverse Fourier transform as follows:

$$S(x) = g(x) * \Big|\mathcal{F}^{-1}\big[\exp\big(R(f) + iP(f)\big)\big]\Big|^2, \tag{6}$$

where $g(x)$ is a low-pass filter for smoothing the output saliency map of the inverse Fourier transform for better visual effects (a circular averaging filter with radius 3 is used here), $\mathcal{F}^{-1}$ is the inverse Fourier transform, and $P(f)$ is the phase spectrum of the image defined in the below equation:

$$P(f) = \varphi\big(\mathcal{F}[I(x)]\big), \tag{7}$$

where $\varphi(\cdot)$ denotes the phase. In [27], it is stated that an image width (or height) of 64 pixels gives a good estimation of the scale of normal visual conditions. Following this, we resize the brightness component to $64 \times 64$ and convert the calculated saliency map to the original size by bilinear interpolation. More details of SRM can be found in [27].
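The spectral residual computation can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the averaging kernel size `n=3` is an assumed default, a square box filter of width `smooth` stands in for the circular averaging filter of radius 3, and the resizing to 64 pixels and back is omitted (the input is assumed to be a small grayscale array already).

```python
import numpy as np

def box_filter(a, n):
    """Convolve a 2-D array with an n x n averaging kernel (n odd, edge-replicated borders)."""
    pad = n // 2
    p = np.pad(a, pad, mode="edge")
    out = np.zeros_like(a, dtype=float)
    for di in range(n):
        for dj in range(n):
            out += p[di:di + a.shape[0], dj:dj + a.shape[1]]
    return out / (n * n)

def spectral_residual_saliency(img, n=3, smooth=7):
    """Saliency map of a 2-D grayscale array via the spectral residual."""
    spectrum = np.fft.fft2(img)
    log_amp = np.log(np.abs(spectrum) + 1e-12)    # log spectrum (epsilon avoids log(0))
    phase = np.angle(spectrum)                    # phase spectrum
    residual = log_amp - box_filter(log_amp, n)   # spectral residual
    recon = np.fft.ifft2(np.exp(residual + 1j * phase))
    # Assumption: a box filter replaces the circular averaging filter for smoothing
    return box_filter(np.abs(recon) ** 2, smooth)

rng = np.random.default_rng(0)
image = rng.random((64, 64))          # stand-in for a 64 x 64 brightness component
saliency = spectral_residual_saliency(image)
```

The result has the same shape as the input and would be interpolated back to the preprocessed image size before being combined with the brightness component.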

Figure 3 shows an instance of saliency map detection via the spectral residual model. Figure 3(a) is the enlarged view of Figure 2(f), and Figure 3(b) is the saliency map of Figure 3(a) detected by the spectral residual model. From the results, it can be found that salient regions of the image are successfully extracted.

Suppose that is the brightness component of the preprocessed image and is the element of in the th row and the th column. Also, assume that denotes the saliency map of and is the element of in the th row and the th column. Therefore, the visual representation can be obtained by the following equation: where is the element of in the th row and the th column.

2.3. Low-Rank Representation

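Low-rank representation (LRR) is a useful technique for capturing the global structure of data [29]. The LRR is robust to noise and can extract the lowest-rank representation of all data [29, 30]. It has been widely used in many applications, such as subspace segmentation [31], image segmentation [32], and image classification [33]. Suppose that $X$ is an observation matrix corrupted by noise. Thus, LRR calculation can be solved by the regularized rank minimization problem [29–31] as follows:

$$\min_{Z, E} \; \|Z\|_* + \lambda \|E\|_{2,1} \quad \text{s.t.} \quad X = XZ + E, \tag{9}$$

in which $\|\cdot\|_*$ is the nuclear norm of a matrix (sum of the singular values of the matrix), $\lambda$ is a parameter for balancing effects of the two parts, $\|\cdot\|_{2,1}$ is the $\ell_{2,1}$ norm, and $\|E\|_{2,1}$ is defined as follows:

$$\|E\|_{2,1} = \sum_{j} \sqrt{\sum_{i} E_{i,j}^2}. \tag{10}$$

Let $\hat{X}$ be the low-rank recovery of $X$. Assume that the minimizer of (9) is $(Z^*, E^*)$. Thus, $\hat{X}$ can be obtained by $XZ^*$ or $X - E^*$. In practice, problem (9) can be converted to an equivalent optimization problem as follows:

$$\min_{Z, E, J} \; \|J\|_* + \lambda \|E\|_{2,1} \quad \text{s.t.} \quad X = XZ + E, \; Z = J. \tag{11}$$

This optimization problem can be solved by the below ALM (Augmented Lagrange Multiplier) problem:

$$\min_{Z, E, J, Y_1, Y_2} \; \|J\|_* + \lambda \|E\|_{2,1} + \operatorname{tr}\big[Y_1^{T}(X - XZ - E)\big] + \operatorname{tr}\big[Y_2^{T}(Z - J)\big] + \frac{\mu}{2}\Big(\|X - XZ - E\|_F^2 + \|Z - J\|_F^2\Big), \tag{12}$$

in which $Y_1$ and $Y_2$ are the Lagrange multipliers and $\mu > 0$ is the penalty parameter. Problem (12) can be solved by the inexact ALM method [30]. More details of LRR can be found in [29–31].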
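For readers who want to see the inexact ALM iteration spelled out, the following NumPy sketch alternates the three closed-form updates (singular value thresholding, a least-squares step, and column-wise shrinkage) with the data matrix itself used as the dictionary. The function name `lrr_inexact_alm` and the defaults for `rho`, `mu`, `mu_max`, `tol`, and `max_iter` are assumptions chosen for illustration and are not taken from the paper; `lam=0.9` mirrors the LRR parameter value reported in Section 3.

```python
import numpy as np

def lrr_inexact_alm(X, lam=0.9, rho=1.1, mu=1e-6, mu_max=1e10, tol=1e-8, max_iter=1000):
    """Approximately solve min ||Z||_* + lam*||E||_{2,1} s.t. X = XZ + E; return (Z, E)."""
    d, m = X.shape
    Z = np.zeros((m, m))
    J = np.zeros((m, m))
    E = np.zeros((d, m))
    Y1 = np.zeros((d, m))
    Y2 = np.zeros((m, m))
    inv = np.linalg.inv(np.eye(m) + X.T @ X)       # reused in every Z update
    for _ in range(max_iter):
        # J update: singular value thresholding of Z + Y2/mu
        U, s, Vt = np.linalg.svd(Z + Y2 / mu, full_matrices=False)
        J = (U * np.maximum(s - 1.0 / mu, 0.0)) @ Vt
        # Z update: closed-form least-squares step
        Z = inv @ (X.T @ (X - E) + J + (X.T @ Y1 - Y2) / mu)
        # E update: column-wise shrinkage (proximal operator of the l2,1 norm)
        Q = X - X @ Z + Y1 / mu
        norms = np.linalg.norm(Q, axis=0)
        E = Q * (np.maximum(norms - lam / mu, 0.0) / np.maximum(norms, 1e-12))
        # Multiplier and penalty updates
        r1 = X - X @ Z - E
        r2 = Z - J
        Y1 += mu * r1
        Y2 += mu * r2
        mu = min(rho * mu, mu_max)
        if max(np.abs(r1).max(), np.abs(r2).max()) < tol:
            break
    return Z, E

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 2)) @ rng.standard_normal((2, 30))   # rank-2 data
X[:, :3] += rng.standard_normal((20, 3))                          # corrupt three columns
Z, E = lrr_inexact_alm(X)
X_low_rank = X @ Z                                                # low-rank recovery
```

The matrix inverse is computed once because it depends only on the input. For an image-sized input the repeated SVD dominates the cost, which is consistent with the running times reported later in the paper.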

In this work, the input of LRR is the visual representation, and its low-rank recovery is taken for hash code extraction. The reasons for using LRR in image hashing design are as follows. The influences of digital operations (e.g., compression, filtering, and noise) on an image can be viewed as noise added to the image. Since LRR is robust to noise, hash generation with the low-rank recovery can improve robustness of our algorithm. In addition, LRR can efficiently capture the global structure of input data. Therefore, the use of LRR helps ensure the discriminative capability of the proposed algorithm.

2.4. Ring Partition

To make the proposed algorithm resilient to image rotation, a well-known technique of image segmentation called ring partition (RP) [9, 34] is exploited here. RP takes the image center as the circle center and divides the inscribed circle of the image into a set of rings. Figure 4 presents an example of RP with 4 image rings. Clearly, the contents of image rings are unchanged after image rotation. Therefore, we can calculate the hash code resistant to rotation by using the mean values of these rings. Details of hash code extraction based on RP are explained as follows.

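Suppose that the low-rank recovery $\hat{X}$ is divided into $n$ rings. Note that the size of $\hat{X}$ is $M \times M$ and the area of each ring is kept the same. Obviously, the elements of image rings can be determined by using two adjacent radii except those of the innermost ring. Assume that $r_k$ is the $k$th radius ($k = 1, 2, \ldots, n$) labeled from the small value to the big value. Therefore, $r_1$ is the radius of the innermost circle and $r_n$ is the radius of the outmost circle. It is clear that $r_n = \lfloor M/2 \rfloor$, where $\lfloor \cdot \rfloor$ means downward rounding. To calculate the other radii, the average area $\mu_A$ of the image ring should be first determined by the below equation:

$$\mu_A = \left\lfloor \frac{A}{n} \right\rfloor,$$

where $A = \pi r_n^2$ is the area of the inscribed circle. Thus, the radius of the innermost circle can be calculated by the following equation:

$$r_1 = \sqrt{\frac{\mu_A}{\pi}}.$$

Next, other $r_k$ ($k = 2, 3, \ldots, n-1$) can be determined by the below formula:

$$r_k = \sqrt{\frac{\mu_A + \pi r_{k-1}^2}{\pi}}.$$

After all radii are obtained, the elements of $\hat{X}$ can be classified into image rings by using radii and the distances from these elements to the image center. Let $\mathbf{R}_k$ be the set of the elements of the $k$th ring ($k = 1, 2, \ldots, n$) and $\hat{X}(i, j)$ be the element of $\hat{X}$ in the $i$th row and $j$th column. Assume that the coordinates of the image center are $(x_c, y_c)$. Therefore, $x_c = M/2 + 0.5$ and $y_c = M/2 + 0.5$ if $M$ is an even number. Otherwise, $x_c = (M+1)/2$ and $y_c = (M+1)/2$. Thus, the distance $d_{i,j}$ from $\hat{X}(i, j)$ to the image center can be obtained by the below equation:

$$d_{i,j} = \sqrt{(i - x_c)^2 + (j - y_c)^2}.$$

Consequently, the set $\mathbf{R}_k$ can be determined by one of the following equations:

$$\mathbf{R}_1 = \big\{\hat{X}(i, j) \mid d_{i,j} \le r_1\big\},$$

$$\mathbf{R}_k = \big\{\hat{X}(i, j) \mid r_{k-1} < d_{i,j} \le r_k\big\} \quad (k = 2, 3, \ldots, n).$$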

For each set, the mean of its elements is selected as compact feature. Let be the mean of the elements in (). Thus, it is quantized to an integer for reducing the storage by the below equation: where is the rounding operation. Finally, our hash is obtained by concatenating these integers as follows:

Therefore, the length of our hash is $n$ integers, where $n$ is the ring number.
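The ring construction and the per-ring mean can be sketched in plain Python as follows. The sketch makes two assumptions that should be read as such: the average ring area is rounded down before the radii are computed, and each ring mean is quantized by direct rounding, since the exact quantization formula is not reproduced here. The names `ring_radii` and `ring_means` are illustrative.

```python
import math

def ring_radii(m, n):
    """Radii of n equal-area rings inside the inscribed circle of an m x m image."""
    r_n = m // 2
    mu_a = math.floor(math.pi * r_n ** 2 / n)      # average ring area (assumed floored)
    radii, sq = [], 0.0
    for _ in range(n - 1):
        sq += mu_a / math.pi                       # r_k^2 = mu_a / pi + r_{k-1}^2
        radii.append(math.sqrt(sq))
    radii.append(float(r_n))                       # outermost radius
    return radii

def ring_means(img, n):
    """Mean value of each ring of a square image given as a sequence of rows."""
    m = len(img)
    radii = ring_radii(m, n)
    c = m / 2 + 0.5 if m % 2 == 0 else (m + 1) / 2   # center in 1-based coordinates
    sums, counts = [0.0] * n, [0] * n
    for i in range(1, m + 1):
        for j in range(1, m + 1):
            d = math.hypot(i - c, j - c)
            for k, r in enumerate(radii):
                if d <= r:                 # first radius that covers the pixel
                    sums[k] += img[i - 1][j - 1]
                    counts[k] += 1
                    break                  # pixels outside the circle are ignored
    return [s / cnt if cnt else 0.0 for s, cnt in zip(sums, counts)]

image = [[(7 * i + 13 * j) % 256 for j in range(32)] for i in range(32)]
rotated = list(zip(*image[::-1]))                    # image rotated by 90 degrees
means = ring_means(image, 4)
means_rotated = ring_means(rotated, 4)
hash_code = [math.floor(v + 0.5) for v in means]     # assumed quantization: plain rounding
```

Because every ring is centered on the image center, a 90 degree rotation only permutes the pixels inside each ring, so `means` and `means_rotated` coincide. For arbitrary angles the invariance is approximate because of interpolation.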

2.5. Hash Similarity Computation

As our hash is composed of integers, the well-known distance metric called the $L_2$ norm is exploited to measure similarity between two hashes. Assume that $\mathbf{h}_1$ and $\mathbf{h}_2$ are the hash sequences of two images. Thus, the $L_2$ norm of the two hashes is defined as follows:

$$d(\mathbf{h}_1, \mathbf{h}_2) = \sqrt{\sum_{k=1}^{n} \big[h_1(k) - h_2(k)\big]^2},$$

in which $h_1(k)$ is the $k$th element of $\mathbf{h}_1$, $h_2(k)$ is the $k$th element of $\mathbf{h}_2$, and $n$ is the hash length. Generally, a smaller $L_2$ norm means more similar hashes of the evaluated images. If the $L_2$ norm is bigger than a threshold $T$, the evaluated images corresponding to the input hashes are judged as different images. Otherwise, they are viewed as similar images.
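A minimal sketch of the similarity decision, taking the distance to be the Euclidean ($L_2$) norm between two equal-length integer hashes. The function names and the threshold value `10.0` in the usage lines are illustrative assumptions, not values from the paper.

```python
import math

def hash_distance(h1, h2):
    """Euclidean (L2) distance between two hashes of equal length."""
    if len(h1) != len(h2):
        raise ValueError("hashes must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))

def is_similar(h1, h2, threshold):
    """Images are judged similar unless the distance exceeds the threshold."""
    return hash_distance(h1, h2) <= threshold

d = hash_distance([12, 30, 7], [15, 26, 7])          # sqrt(3^2 + 4^2 + 0^2)
same = is_similar([12, 30, 7], [15, 26, 7], 10.0)    # hypothetical threshold
```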

3. Experimental Results

The parameter settings of the proposed algorithm are as follows. The parameter $\lambda$ of LRR is 0.9, the input image is resized to the standard size $M \times M$, and the ring number is 64, i.e., $n = 64$. In the following experiments, Sections 3.1 and 3.2 validate the properties of robustness and discrimination, respectively. Section 3.3 analyzes our hash storage in binary form. Section 3.4 discusses the influence of the ring number on our algorithm performance.

3.1. Robustness

An open database called the Kodak dataset [35] is exploited to construct a test database of similar images. The Kodak dataset is composed of 24 color images of different categories with the size of $768 \times 512$ or $512 \times 768$. To produce similar images of these color images, some commonly used operations are used to conduct robustness attacks. These operations are performed with Photoshop, MATLAB, and StirMark. More specifically, Photoshop provides brightness and contrast adjustments (parameters are ±10 and ±20). MATLAB provides Gaussian low-pass filtering (standard deviations range from 0.3 to 1.0 with a step of 0.1), gamma correction (parameters are 0.75, 0.9, 1.1, and 1.25), salt-and-pepper noise, and speckle noise (both parameters range from 0.001 to 0.01 with a step of 0.001). StirMark provides JPEG compression (quality factors range from 30 to 100 with a step of 10), watermark embedding (strengths range from 10 to 100 with a step of 10), image scaling (scaling ratios are 0.5, 0.75, 0.9, 1.1, 1.5, and 2.0), and image rotation (rotation angles are ±1, ±2, ±5, ±10, ±15, ±30, ±45, and ±90). Note that image rotation increases image size and adds padded pixels to the rotated images. In this experiment, only the central parts of the 24 original images and their rotated images are taken for evaluating rotation robustness. Therefore, the number of the used operations is 10, which contribute 80 manipulations in total. This implies that each original image has 80 similar images. So the total number of visually similar images is $24 \times 80 = 1920$, and the number of the used images is $1920 + 24 = 1944$.
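The count of 80 manipulations per image can be checked by tallying the parameter values listed above. The short Python sketch below does only that bookkeeping; the dictionary keys and the helper `steps` are illustrative names.

```python
def steps(start, stop, step):
    """Inclusive integer range, used to avoid floating-point drift in the counts."""
    return [start + k * step for k in range((stop - start) // step + 1)]

manipulations = {
    "brightness adjustment": [-20, -10, 10, 20],
    "contrast adjustment": [-20, -10, 10, 20],
    "Gaussian low-pass filtering": [x / 10 for x in steps(3, 10, 1)],
    "gamma correction": [0.75, 0.9, 1.1, 1.25],
    "salt-and-pepper noise": [x / 1000 for x in steps(1, 10, 1)],
    "speckle noise": [x / 1000 for x in steps(1, 10, 1)],
    "JPEG compression": steps(30, 100, 10),
    "watermark embedding": steps(10, 100, 10),
    "image scaling": [0.5, 0.75, 0.9, 1.1, 1.5, 2.0],
    "image rotation": [s * a for a in (1, 2, 5, 10, 15, 30, 45, 90) for s in (-1, 1)],
}

per_image = sum(len(values) for values in manipulations.values())
similar_images = 24 * per_image          # one attacked copy per manipulation and image
total_images = similar_images + 24       # attacked copies plus the originals
```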

Figure 5 demonstrates robustness experiments under different operations, where the $x$-axis is the parameter value of the digital operation and the $y$-axis is the $L_2$ norm. Note that the curves in Figure 5 are the mean values of the $L_2$ norms between hashes of the 24 color images and their similar images. From Figure 5, it can be seen that the mean $L_2$ norms are all smaller than 15, except two values of the rotation operation. For image rotation, the maximum value is 17.29, which is a little bigger than those of other operations. It is found that, if the threshold is selected as , our algorithm can correctly detect 92.19% of the similar images. If there is no rotated image, our algorithm can recognize all similar images. A high correct detection rate illustrates good robustness of our algorithm.

3.2. Discrimination

A well-known database named UCID [36] is selected to test discrimination of our algorithm. This database contains 1338 color images with the size of $512 \times 384$ or $384 \times 512$. Hashes of these 1338 color images are first calculated, and the $L_2$ norm between each pair of hashes is then computed. Therefore, the total number of valid distances is $1338 \times 1337 / 2 = 894453$. Figure 6 illustrates the distribution of these distances, where the $x$-axis is the value of the $L_2$ norm and the $y$-axis is the frequency of the $L_2$ norm. It can be observed that the maximum $L_2$ norm is 163.25 and the minimum $L_2$ norm is 4.80. Moreover, the mean value of these $L_2$ norms is 42.72 and the standard deviation is 18.48. From Figure 6, it can be seen that most distances are bigger than the abovementioned threshold , indicating good performance in discrimination. Actually, the performances of discrimination and robustness are closely related to the selected threshold. Different thresholds will lead to different performances. Table 1 demonstrates the performances of robustness and discrimination under different thresholds, where the robustness is measured by the correct detection rate and the discrimination is indicated by the false detection rate. Note that the correct detection rate is the ratio between the number of similar images correctly detected and the total number of similar images. The false detection rate is the ratio between the number of different images falsely judged as similar images and the total number of different images. From Table 1, we can select as the recommended threshold since it gives the minimum total error rate.

3.3. Hash Storage

To analyze the required bits for storing our hash, the hashes of 1338 images in UCID are selected as the data source. Note that each hash generated by our algorithm is composed of 64 integers. Therefore, there are $1338 \times 64 = 85632$ integers in the data source. Figure 7 illustrates the distribution of these 85632 hash elements, where the $x$-axis is the element value and the $y$-axis is the frequency of the element value. It can be found that the minimum value of these hash elements is 1 and the maximum value is 49. Since 6 bits can represent a decimal number in the range $[0, 63]$, the storage of a hash element only requires 6 bits. Therefore, our hash storage requires $64 \times 6 = 384$ bits in total.
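The storage arithmetic can be reproduced with a few lines of Python. The packing format shown (fixed-width 6-bit fields concatenated into a bit string) is an illustrative assumption about how such a hash could be serialized, not a format specified in the paper, and `sample_hash` is made-up data.

```python
max_element = 49                                 # largest hash element observed on UCID
bits_per_element = max_element.bit_length()      # bits needed for values up to 49
hash_bits = 64 * bits_per_element                # 64 rings, one element per ring
total_elements = 1338 * 64                       # elements in the data source

def pack_hash(elements, width):
    """Concatenate fixed-width binary fields, one per hash element."""
    return "".join(format(e, "0{}b".format(width)) for e in elements)

sample_hash = [(5 * k) % 50 for k in range(64)]  # hypothetical hash with values in [0, 49]
packed = pack_hash(sample_hash, bits_per_element)
```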

3.4. Influence of the Ring Number on Hash Performance

To discuss the influence of the ring number on hash performance, different ring numbers are used to conduct robustness experiments and discrimination experiments. The used ring numbers include $n = 8$, $n = 16$, $n = 32$, $n = 64$, and $n = 80$. In the experiments, only the ring number is altered, and the other parameters are unchanged. The two image databases used in Sections 3.1 and 3.2 are also selected here. To make visual and quantitative comparisons, the receiver operating characteristic (ROC) graph [37] is taken. In the ROC graph, the $x$-axis represents FPR (false positive rate) and the $y$-axis represents TPR (true positive rate). Let $P_{\mathrm{FPR}}$ and $P_{\mathrm{TPR}}$ be the FPR and the TPR, respectively. Thus, they can be defined as follows:

$$P_{\mathrm{TPR}} = \frac{n_{\mathrm{T}}}{N_{\mathrm{S}}}, \qquad P_{\mathrm{FPR}} = \frac{n_{\mathrm{F}}}{N_{\mathrm{D}}},$$

in which $n_{\mathrm{T}}$ is the number of similar images correctly judged as similar images, $N_{\mathrm{S}}$ represents the total number of similar images, $n_{\mathrm{F}}$ is the number of different images falsely identified as similar images, and $N_{\mathrm{D}}$ denotes the total number of different images. Obviously, $P_{\mathrm{FPR}}$ and $P_{\mathrm{TPR}}$ can indicate discrimination and robustness, respectively. In the ROC graph, a curve is plotted by using a set of points with coordinates $(P_{\mathrm{FPR}}, P_{\mathrm{TPR}})$. High performances of discrimination and robustness mean a low $P_{\mathrm{FPR}}$ and a big $P_{\mathrm{TPR}}$. Therefore, it can be intuitively concluded that a curve near the top-left corner has better classification performance than a curve far away from it. To make quantitative analysis, the AUC (area under the ROC curve) is taken. The range of AUC is [0, 1]. In general, a bigger AUC means better classification of the evaluated algorithm.
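The ROC construction described above can be sketched as follows: the threshold is swept over all observed distances, each threshold gives one (FPR, TPR) point, and the AUC is obtained by the trapezoidal rule. The function names and the toy distance lists are illustrative assumptions; the trapezoidal rule is one common way to compute the area and is not necessarily the one used in the paper.

```python
def roc_points(similar_dists, different_dists):
    """(FPR, TPR) points obtained by sweeping the threshold over all observed distances."""
    thresholds = sorted(set(similar_dists) | set(different_dists))
    points = [(0.0, 0.0)]
    for t in thresholds:
        # A pair is judged similar when its distance does not exceed the threshold
        tpr = sum(d <= t for d in similar_dists) / len(similar_dists)
        fpr = sum(d <= t for d in different_dists) / len(different_dists)
        points.append((fpr, tpr))
    return points       # the largest threshold accepts every pair, giving (1.0, 1.0)

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    pts = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

# Toy distances: similar pairs should be close, different pairs far apart
separated = auc(roc_points([1.0, 2.0, 3.0], [10.0, 11.0, 12.0]))
overlapping = auc(roc_points([1.0, 5.0], [3.0, 7.0]))
```

With perfectly separated distances the area is 1, and any overlap between the two distance distributions lowers it.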

Figure 8 shows ROC curves of different ring numbers, where the curves in the upper-left area are zoomed in for easy comparison. Obviously, the five ROC curves are all close to the upper-left corner, indicating good classification of the proposed algorithm. In addition, the curves of ring numbers $n = 64$ and $n = 80$ are overlapping and both of them are above the curves of the other ring numbers. This means that the ring numbers $n = 64$ and $n = 80$ are slightly better than other ring numbers in terms of classification performance between robustness and discrimination. To make quantitative analysis, the AUC of each ring number is also calculated. It is found that the AUCs of ring numbers $n = 8$, $n = 16$, $n = 32$, $n = 64$, and $n = 80$ are 0.98605, 0.98863, 0.99012, 0.99069, and 0.99087, respectively. Clearly, the AUCs of $n = 64$ and $n = 80$ differ only slightly, and both are bigger than those of other ring numbers. This also verifies that $n = 64$ and $n = 80$ are better than other ring numbers.

To compare the computational time of different ring numbers, the proposed algorithm is coded in MATLAB and run on a workstation with a 2.10 GHz Intel Xeon Silver 4110 CPU and 64 GB RAM. The total time of extracting 1338 image hashes is measured to find the average time of generating a hash. It is found that the average time of $n = 8$, $n = 16$, $n = 32$, $n = 64$, and $n = 80$ is 25.407, 27.133, 26.957, 29.361, and 33.611 seconds, respectively. Moreover, hash lengths of different ring numbers are also compared. Note that our hash length is equal to the ring number when the length is measured in integers. Therefore, the hash lengths of $n = 8$, $n = 16$, $n = 32$, $n = 64$, and $n = 80$ are 8, 16, 32, 64, and 80 integers, respectively. Table 2 summarizes performances of different ring numbers. From the viewpoint of the whole performance, it can be observed that the ring number $n = 64$ reaches a good balance among the three performance indices.

4. Performance Comparisons

To demonstrate advantages of our hashing algorithm, we compare it with some popular hashing algorithms. The selected hashing algorithms are CVA-DWT hashing [14], SVD-CSLBP hashing [21], random walk-based hashing [22], and hybrid feature-based hashing [23]. The datasets used in the above robustness experiment and discrimination test are also taken here, i.e., 1920 pairs of similar images and 1338 different images. To make a fair comparison, the parameters of the selected algorithms are set to the same values as in their original papers. Their hash similarity metrics are kept unchanged, i.e., the norm for CVA-DWT hashing and hybrid feature-based hashing, the correlation coefficient for SVD-CSLBP hashing, and the normalized Hamming distance for random walk-based hashing. The result of our hashing with $n = 64$ is taken for comparison.

Figure 9 illustrates the ROC curves of the evaluated hashing algorithms. To see more details, the ROC curves in the upper-left area are zoomed in and placed in the bottom-right part of Figure 9. It is observed that the curves of our hashing and hybrid feature-based hashing intersect with each other. Moreover, both of them are above the curves of other evaluated hashing algorithms. To conduct quantitative analysis, the AUCs of the evaluated hashing algorithms are also calculated. It is found that the AUCs of CVA-DWT hashing, SVD-CSLBP hashing, random walk-based hashing, hybrid feature-based hashing, and our hashing are 0.97563, 0.74522, 0.95758, 0.97545, and 0.99069, respectively. From the results, it can be seen that our hashing is better than the compared algorithms in classification performance.

The average time of calculating a hash is also compared. The abovementioned computer is used again, and the compared algorithms are also implemented with MATLAB. It is found that the average times of CVA-DWT hashing, SVD-CSLBP hashing, random walk-based hashing, hybrid feature-based hashing, and our hashing are 0.05, 0.19, 0.02, 6.12, and 29.36 seconds, respectively. Our hashing is slower than the compared algorithms because the computational cost of the LRR method is relatively high. Moreover, the hash lengths of all algorithms are also compared. The hash lengths of CVA-DWT hashing and random walk-based hashing are 960 and 144 bits, respectively. The hash lengths of SVD-CSLBP hashing and hybrid feature-based hashing are 64 and 104 floating-point numbers, respectively. As a floating-point number requires 32 bits for storage according to the IEEE standard [38], the hash lengths of SVD-CSLBP hashing and hybrid feature-based hashing are 2048 and 3328 bits, respectively. The length of our hash is 384 bits. It is longer than that of random walk-based hashing, but it is much shorter than those of the other compared hashing algorithms. Table 3 summarizes performance indices of the evaluated hashing algorithms, where the text in italic is the best result of the corresponding column. Our hashing outperforms the compared hashing algorithms in classification performance in terms of AUC, but it runs slower than the compared algorithms. As to hash length, it is better than all compared algorithms, except random walk-based hashing.

5. Conclusions

In this paper, we have proposed a novel image hashing with LRR and RP. An important contribution is the calculation of the visual representation based on the saliency map determined by SRM. Hash generation based on the visual representation can improve robustness performance. Another significant contribution is the combination of LRR and RP, which can make the discriminative hash invariant to rotation. Many experiments with two well-known databases have been carried out. The results have shown that the proposed hashing is robust and discriminative. ROC curve comparisons have illustrated that the proposed hashing outperforms the compared hashing algorithms in classification performance. In addition, hash length comparisons have shown that the proposed hashing is better than all compared algorithms, except random walk-based hashing. As to running speed, the proposed hashing runs slower than the compared algorithms due to the high computational cost of the LRR method. In the future, we plan to design fast hashing algorithms, deep learning-based hashing algorithms, and hashing algorithms for image authentication.

Data Availability

The image datasets used to support the findings of this study can be downloaded from the public websites whose hyperlinks are provided in this paper.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding publication of this paper.

Acknowledgments

This work is partially supported by the National Natural Science Foundation of China (61962008, 61762017, and 61966004), the Guangxi “Bagui Scholar” Team for Innovation and Research, the Guangxi Talent Highland Project of Big Data Intelligence and Application, the Guangxi Natural Science Foundation (2017GXNSFAA198222), the Guangxi Collaborative Innovation Center of Multi-Source Information Integration and Intelligent Processing, and the Innovation Project of Guangxi Graduate Education (YCSW2020109).