Abstract

Selective encryption has been widely used in image privacy protection. Visual security assessment is necessary for evaluating the effectiveness and practicality of image encryption methods, and a series of studies have addressed this problem. However, existing methods do not take perceptual factors into account. In this paper, we propose a new visual security assessment (VSA) method based on saliency-weighted structure and orientation similarity. Considering that human visual perception is sensitive to the characteristics of selective encrypted images, we extract structure and orientation feature maps and then conduct similarity measurements on these feature maps to generate structure and orientation similarity maps. Next, we compute the saliency map of the original image, and a simple saliency-based pooling strategy is used to combine these measurements into the final visual security score. Extensive experiments are conducted on two public encryption databases, and the results demonstrate the superiority and robustness of the proposed VSA compared with existing state-of-the-art methods.

1. Introduction

Nowadays, with the pervasive use of interactive devices such as cameras and cloud storage and the explosive growth of digital images, privacy protection has attracted a lot of attention from researchers [1–5]. Various security schemes, such as digital watermarking [6–8], steganography [9], and encryption [10], have been developed to protect copyright, and encryption is the most widely accepted approach for ensuring the security and integrity of data. Roughly, existing image encryption methods can be divided into two categories: full encryption and selective encryption. Full encryption encrypts the entire image, so no information about the original image can be obtained from the encrypted image. However, the content of an image cannot be revealed even if several kinds of redundant information are left unencrypted. For this reason, traditional full encryption methods such as AES are not well suited to image data, because they encrypt all of the information in the image, which costs a lot of time. Therefore, researchers proposed selective encryption, which has been widely used to protect the visual content of multimedia by encrypting only specified parts of the multimedia data. A great variety of selective encryption algorithms [10–16] have been proposed in recent decades. Compared with full encryption, selective encryption has two main advantages. First, encryption and decryption can be extremely fast because only a portion of the data needs to be encrypted. Second, selective encrypted multimedia data can prevent the abuse of the essential visual properties of the original data. These advantages make selective encryption highly desirable for protecting the growing amount of image and video data on the network that contains a large amount of private personal information.

The purpose of security analysis for selective image encryption is to measure the degree of visual security of selective encrypted images. Visual security analysis can measure the performance of selective encryption methods and thus helps us optimize them. Since humans are the ultimate receivers of images, subjective tests conducted by human viewers are the most suitable and accurate way to evaluate the visual security of selective encrypted images. However, such tests are too time-consuming and laborious for real-time applications. For this reason, visual security assessment [17] (VSA) has been proposed to evaluate the visual security of selective encrypted images by automatically measuring the unintelligibility or unrecognizability of the image, which indicates how much useful information about the original image an attacker can obtain from its selective encrypted version via visual perception. The higher the unrecognizability of the selective encrypted image, the less visual information the attacker can obtain; in such a situation, it becomes more difficult for an attacker to obtain information about the original image, and the selective encryption method is more secure.

In the past decades, many efforts have been devoted to designing VSAs. At the beginning, researchers believed that the visual security of an image has a strong relationship with its quality. Therefore, they directly used well-known image quality assessment [18] (IQA) methods, such as the peak signal-to-noise ratio (PSNR), structural similarity (SSIM) [19], and visual information fidelity (VIF) [20], to evaluate the visual security of selective encrypted images, assuming that images with lower quality tend to have higher security. However, these metrics may be inconsistent with the concept of security strength. For example, an image with a lower PSNR value may be even more recognizable than one with a higher PSNR value. These IQA methods do not take full account of the characteristics of selective encrypted images. For selective encrypted images, an important characteristic is that the skeleton of the image is still intelligible while the details are almost unintelligible [21]; structure information expresses the skeleton of an image and therefore plays a particularly important role for selective encrypted images. Subsequently, several VSAs have been developed based on visual features of selective encrypted images, e.g., the edge similarity score (ESS) [22] based on edges, the luminance similarity score (LSS) [22] utilizing the luminance feature, the local feature-based visual security (LFBVS) [23] using luminance and localized gradients, and the visual security index based on Canny (VSI-Canny) [21], which extracts edge and texture features. However, these VSAs do not fully consider the role of visual perception [24–26] in security evaluation: according to the principles of the human visual system (HVS), the visual perception of each region differs from that of the others, so different regions have different impacts on visual security evaluation. Additionally, the HVS exhibits an obvious visual saliency mechanism: it focuses on important regions for detailed perception and neglects the other regions. Regions with high saliency values play more important roles in visual perception than the other regions, and information leakage in high-saliency regions has a larger influence on visual security assessment.

Motivated by the problems mentioned above, in this paper, we propose a visual security assessment via saliency-weighted structure and orientation similarity. Structure is a basic element that conveys important visual information, and selective encryption causes obvious structural changes in an image [21]. Therefore, we can measure the visual security of selective encrypted images by the change of structure. The gradient magnitude (GM) and phase congruency [27] (PC) are widely used to extract image structure information. However, neither GM nor PC alone can effectively reflect the structure degradation in selective encrypted images. GM is sensitive to luminance and can reflect changes of image luminance well [28], but this characteristic also makes it ineffective at extracting the structure of areas with similar grayscale values. Compared with GM, PC is not affected by luminance [27]; however, PC cannot extract clear structure information in areas with similar frequencies because it is calculated in the frequency domain [27]. Therefore, we integrate PC with GM to obtain the structure features of selective encrypted images. Studies show that the HVS is highly adapted to extract orientation information [29], and selective encryption causes obvious orientation changes in an image. Therefore, we also extract the orientation information of a selective encrypted image to measure its security. The structure and orientation feature maps are extracted from both the original and the selective encrypted images. Finally, an image saliency-based pooling strategy is introduced to combine these measurements and generate a visual security score. Our main contributions can be summarized as follows:

(1) We propose to extract structure and orientation features for the visual security evaluation of selective encrypted images, because selective encryption causes obvious changes in the structure and orientation of an image and the HVS is highly sensitive to such changes. We combine GM and PC to extract structure information and measure the structure similarity between original and selective encrypted images, and we use the change of image orientation to measure the orientation similarity.

(2) Considering that different regions of an image have different effects on the visual security assessment of selective encrypted images, we combine the saliency map with the structure and orientation similarity maps to generate the final VSA.

(3) We conduct comparative experiments on two common encrypted-image databases to evaluate the performance of the proposed VSA. The experimental results show that the proposed method achieves superior and robust performance compared with other state-of-the-art VSAs, especially on images in the low- and moderate-quality ranges.

The structure of the rest of the paper is as follows. Section 2 reviews the related work. The details of our proposed VSA method are in Section 3. Then, we describe the experimental evaluation of our proposed VSA and existing VSAs in Section 4. Finally, Section 5 concludes this paper.

2. Related Work

A variety of methods have been proposed to estimate the visual security of selective encrypted images. The initial solutions usually employ well-known IQAs to evaluate visual security. Subsequently, several dedicated VSAs have been proposed.

2.1. Image Quality Assessment

Many researchers believe that images with higher visual security tend to have lower visual quality, so many IQA methods designed for the assessment of image visual quality have been employed to measure the visual security of selective encrypted images. For instance, PSNR, which evaluates visual security by calculating the Euclidean distance between the original image and the distorted image, is the simplest and most widely used metric [30, 31]. SSIM [19] is also adopted for visual security evaluation [30, 31]; it measures the similarities of luminance, contrast, and structure between two images in consideration of the HVS. VIF [20] is another IQA method used to estimate the visual security of selective encrypted images; it measures the amount of information contained in the original and selective encrypted images, respectively, and then relates image information to visual quality. However, these IQA metrics often exhibit unsatisfactory performance when used to estimate the visual security of selective encrypted images of low quality. Since the task of IQA is inconsistent with that of VSA, poor visual quality of an image does not necessarily indicate its visual security [21].
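As a concrete illustration of how such an IQA metric is applied to an original/encrypted image pair, the following is a minimal PSNR sketch in NumPy; it assumes 8-bit grayscale arrays of equal size, and the function name and defaults are ours, not from the paper.

```python
import numpy as np

def psnr(original, encrypted, peak=255.0):
    """Peak signal-to-noise ratio between two equally sized images."""
    original = np.asarray(original, dtype=np.float64)
    encrypted = np.asarray(encrypted, dtype=np.float64)
    mse = np.mean((original - encrypted) ** 2)  # mean squared (Euclidean) error
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```

Under the IQA-as-VSA assumption discussed above, a lower PSNR would simply be read as higher visual security.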

For example, an image with a higher VIF, PSNR, or SSIM value may even be more visually secure than one with a lower value of one of these indices. Figures 1(a)–1(c) show the performance of the PSNR, VIF, and SSIM indices on several images from the PEID database [33]. Figure 1(a) shows an original image, and Figures 1(b) and 1(c) show two encrypted images. It is clear that Figure 1(c) has higher visual security, yet it is assessed as having better visual quality by PSNR, VIF, and SSIM.

We can see that many IQAs cannot achieve excellent performance on visual security assessment because the targets of the two tasks are different: image quality assessment focuses on the fidelity of an image, whereas visual security assessment is concerned with the degree to which an image's content is leaked.

2.2. Visual Security Assessment

Several VSAs have been proposed to evaluate the visual security of selective encrypted images. They are usually more accurate and effective than IQA methods because they are specifically designed for the visual security evaluation of selective encrypted images. Mao and Wu [22] proposed the ESS and LSS to compute the edge similarity and the luminance similarity between original and selective encrypted images. However, the ESS and LSS focus only on local information of the images, which may not cover the various types of distortions that appear in selective encrypted images. Tong et al. [23] presented the LFBVS by considering various types of distortions present in selective encrypted images and measured the similarities of luminance and the localized gradient between original and selective encrypted images. Although the LFBVS utilizes more visual information compared with the ESS and LSS, its performance is still unsatisfactory when tested on various encrypted image databases. Xiang et al. [21] proposed the VSI-Canny by calculating the edge and texture similarities between original and selective encrypted images. VSI-Canny considers more visual features of selective encrypted images and has relatively good performance, but it does not consider the image’s visual saliency, which is a critical property of the HVS.

For example, Figures 1(d)–1(f) illustrate the performance of different VSA indices on an image from the IVC-SelectEncrypt database. Figure 1(d) shows the original image, and two encrypted versions of it are shown in Figures 1(e) and 1(f). It is clear that Figure 1(f) is more visually secure than Figure 1(e). However, Figure 1(f) has higher LSS and VSI-Canny values than Figure 1(e).

As discussed above, the existing visual security metrics suffer from several problems, which lead to inaccurate evaluation of image security. We consider and address these issues in our proposed scheme, as described in the following section.

3. Proposed Visual Security Assessment

In this section, we describe the proposed VSA; its flowchart is shown in Figure 2. First, we combine GM and PC to extract structure information and compute the structure similarity map between the original and selective encrypted images. Second, based on the fact that the HVS is sensitive to changes of orientation, we extract orientation information from the GM and compute the orientation similarity map. Next, considering that the security of a selective encrypted image depends on the degree of disclosure of its visual content, which is obtained by comparing it with the original image, we compute the saliency map of the original image only. Finally, the generated structure and orientation similarity maps are fused by a saliency-based pooling method to obtain the final score.

3.1. Structure Similarity

The structure of an image carries important information to which visual perception is highly sensitive. Both GM and PC can extract structural information from images, and we found that they complement each other. Therefore, we integrate the GM map with the PC map to generate the structure features of selective encrypted images.

3.1.1. Gradient Magnitude

The image gradient magnitude captures transitions in intensity. The GM of an image is computed from the gradient vector consisting of the horizontal and vertical gradients at each pixel, and it reflects the maximum strength of structure variation. The gradient magnitude of an image I is defined as

GM(i, j) = \sqrt{G_h(i, j)^2 + G_v(i, j)^2},

where (i, j) is the pixel index of image I. In this work, G_h and G_v are calculated as

G_h(i, j) = (F \otimes I)(i, j), \quad G_v(i, j) = (F^{T} \otimes I)(i, j),

where \otimes and T denote convolution and transpose, respectively, and F is the gradient operator.
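A minimal sketch of this gradient computation is given below, assuming a Prewitt kernel for the gradient operator F (the paper does not specify the kernel coefficients here, so this choice is illustrative):

```python
import numpy as np
from scipy.ndimage import convolve

def gradient_maps(img):
    """Horizontal/vertical gradients G_h, G_v and gradient magnitude GM."""
    # Prewitt kernel used as the gradient operator F (an illustrative assumption).
    F = np.array([[1, 0, -1],
                  [1, 0, -1],
                  [1, 0, -1]], dtype=np.float64) / 3.0
    img = np.asarray(img, dtype=np.float64)
    gh = convolve(img, F)      # G_h = F ⊗ I
    gv = convolve(img, F.T)    # G_v = F^T ⊗ I
    gm = np.sqrt(gh ** 2 + gv ** 2)
    return gh, gv, gm
```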

As shown in Figure 3(b), the GM maps of the encrypted images exhibit obvious changes. However, there are no obvious changes in the GM values in some areas with similar grayscale values. GM is sensitive to luminance and can therefore reflect changes of image luminance well [28]. However, this characteristic also makes it ineffective at extracting the structure information of areas with similar grayscale values.

3.1.2. Phase Congruency

The phase congruency model [27], which is based on frequency-domain processing of an image, postulates that perceptually significant features occur at points where the Fourier components of the image are maximally in phase. It assumes that the visual system is more competent at performing operations using the phase and amplitude of the individual frequency components of an image than at handling image information spatially. Compared with GM, PC is invariant to local smooth luminance changes. Given an image I, its PC is computed as

PC(i, j) = \frac{\sum_{\theta}\sum_{s} W(i, j)\,\lfloor A_{s,\theta}(i, j)\,\Delta\Phi_{s,\theta}(i, j) - T \rfloor}{\sum_{\theta}\sum_{s} A_{s,\theta}(i, j) + \varepsilon},

with

W(i, j) = \frac{1}{1 + e^{\gamma\,(c - l_{norm}(i, j))}}, \quad
\Delta\Phi_{s,\theta}(i, j) = \cos\big(\phi_{s,\theta}(i, j) - \bar{\phi}_{\theta}(i, j)\big) - \big|\sin\big(\phi_{s,\theta}(i, j) - \bar{\phi}_{\theta}(i, j)\big)\big|,

where W(i, j) denotes the weighting parameter that reduces the effect of frequency spread at position (i, j); \Delta\Phi_{s,\theta} denotes the phase deviation function; N is the number of scales; c is a cutoff value below which low PC values are penalized; l_{norm}(i, j) is the normalized luminance at (i, j), used to avoid the effect of luminance; \gamma is the gain variable that controls the sharpness of the cutoff; and \lfloor\cdot\rfloor sets negative values to zero. To determine the two-dimensional phase congruency of a given image, the image is first convolved with a bank of log-Gabor filters, where s and \theta are the scale and orientation of the log-Gabor filter, and the even-symmetric and odd-symmetric filters at scale s and orientation \theta yield the responses from which A_{s,\theta}(i, j) and \phi_{s,\theta}(i, j), the amplitude and phase at position (i, j), are obtained. T is a quantity introduced to compensate for image noise; \varepsilon is a small positive constant that preserves numerical stability; and \bar{\phi}_{\theta}(i, j) is the mean phase. Since investigating the influence of these parameters on the PC map is beyond the scope of this study, we set them directly according to [27].

The effectiveness of PC is demonstrated in Figure 3(c): the PC maps of the encrypted images exhibit obvious changes. However, there are no obvious changes in the PC values in some areas with similar frequencies. Compared with GM, PC is not affected by luminance [27]. However, PC cannot extract clear structure information in areas with similar frequencies because it is calculated in the frequency domain [28].

3.1.3. Structure Map Integrating GM with PC

As mentioned above, neither GM nor PC alone can effectively reflect the structure degradation in selective encrypted images. Therefore, after extracting the GM and PC of the image, we obtain the structure map of the image by integrating GM with PC; the structure information ST of the image is

ST(i, j) = \max\left(\frac{GM(i, j)}{GM_{max}},\; PC(i, j)\right),

where (i, j) is the pixel index and GM_{max} is the maximum value of GM, which is used to normalize GM. The maximum of the two values at each position forms ST: if either value is large, the pixel is considered a structural feature point, so this maximum-fusion strategy comprehensively extracts the structural features of the image and lets GM and PC play complementary roles. Figure 3 gives an example demonstrating the effectiveness of our structure extraction method; it shows different feature maps of reference images and their selective encrypted versions. Compared with the other feature maps, the proposed structure map reflects the structure degradation of a selective encrypted image remarkably well. From Figure 3(d), we can observe more accurate and clearer structure information in the selective encrypted image.

Analogous to the practice in [19, 34], the structure similarity map S_{ST} between the original and encrypted images is measured as

S_{ST}(i, j) = \frac{2\, ST_O(i, j)\, ST_E(i, j) + R}{ST_O(i, j)^2 + ST_E(i, j)^2 + R},

where S_{ST}(i, j) \in (0, 1], ST_O(i, j) and ST_E(i, j) are the structure maps of the original image O and the encrypted image E, respectively, and R is a positive constant used to avoid instability when the denominator approaches zero.
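To make the fusion and similarity steps concrete, here is a minimal sketch; it assumes GM and PC maps have already been computed for both images (e.g., with the gradient sketch above and any phase congruency implementation), and the value of the stability constant R is a placeholder rather than the paper's setting:

```python
import numpy as np

def structure_map(gm, pc):
    """ST = max(GM / GM_max, PC): maximum fusion of normalized GM and PC."""
    return np.maximum(gm / (gm.max() + 1e-12), pc)

def structure_similarity(st_o, st_e, R=0.01):
    """SSIM-style similarity map between two structure maps (R avoids instability)."""
    return (2.0 * st_o * st_e + R) / (st_o ** 2 + st_e ** 2 + R)

# Example: sst = structure_similarity(structure_map(gm_o, pc_o),
#                                     structure_map(gm_e, pc_e))
```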

3.2. Orientation Similarity

In addition to the structure similarity, we also consider the orientation similarity between the original and encrypted images, because orientation information is an indispensable element of human visual perception. The orientation of an image, which has been widely used in image quality assessment [35], conveys important information [29] and has an important effect on the visual security evaluation of selective encrypted images. The orientation change of each pixel reflects the degradation of details in a selective encrypted image.

A visual pattern built from orientation information was proposed in [35] for IQA. However, this pattern ignores some intuitive visual information and therefore does not fully apply to VSA.

Considering the above issue, we design a new way to compute the orientation similarity. In this work, for an image I, the preferred orientation of each pixel is taken as its gradient direction:

\theta(i, j) = \operatorname{atan2}\big(G_v(i, j),\, G_h(i, j)\big),

expressed in degrees, where G_h(i, j) and G_v(i, j) are the gradients along the horizontal and vertical directions, respectively, which can be obtained as in Section 3.1.1, and (i, j) is the pixel index in I. In this way, we obtain quantitative orientation information. As the example in Figure 3 shows, the orientation information of the image changes obviously after encryption.

Then, we compute the orientation change D_O between the original image O and its encrypted image E by calculating their pixel-wise distance:

D_O(i, j) = \big|\theta_O(i, j) - \theta_E(i, j)\big|,

where |\cdot| denotes the absolute value, (i, j) is the pixel index, and \theta_O and \theta_E are the orientation maps of the original image O and its encrypted image E, respectively.

Because the range of \theta_O and \theta_E is [−180, 180], the range of D_O is [0, 360]. Considering that the HVS perceives relative orientations such as 90° and 270° similarly, we restrict D_O to [0, 180] by setting D_O(i, j) = 360 − D_O(i, j) whenever D_O(i, j) > 180.

Then, the orientation similarity map S_O between the original and encrypted images can be measured as

S_O(i, j) = 1 - \frac{D_O(i, j)}{180},

so that pixels with smaller orientation change receive higher similarity values.
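The following sketch traces these orientation steps; the angle computation and the folding of the difference into [0, 180] degrees follow the text, while the final linear mapping to a similarity value mirrors the reconstructed formula above and should be treated as an assumption:

```python
import numpy as np

def orientation_similarity(gh_o, gv_o, gh_e, gv_e):
    """Orientation similarity map from horizontal/vertical gradient maps."""
    theta_o = np.degrees(np.arctan2(gv_o, gh_o))   # orientations in [-180, 180]
    theta_e = np.degrees(np.arctan2(gv_e, gh_e))
    d_o = np.abs(theta_o - theta_e)                # difference in [0, 360]
    d_o = np.where(d_o > 180.0, 360.0 - d_o, d_o)  # fold to [0, 180]
    return 1.0 - d_o / 180.0                       # assumed linear mapping to similarity
```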

3.3. Saliency-Weighted Pooling

It is observed that different regions contribute very differently to the visual understanding of an image: most of the contribution to visual perception comes from information loss and distortion in important regions. An image importance map identifies the regions that contribute most to visual perception, and such maps have been studied extensively in recent years. We therefore highlight these important regions and suppress the others with a saliency map for visual content extraction; to this end, the saliency value of each pixel is required. As illustrated in Figure 3(f), the visual saliency map highlights the important regions of an image and thus helps extract the areas that matter most for VSA. In the past decades, a large number of saliency models [36–40] have been proposed, and these models can help us obtain a better VSA.

S_{ST}(i, j) and S_O(i, j), obtained in Sections 3.1 and 3.2, are two feature similarity maps with the same size as the image. However, we need a single VSA score to represent the visual security, so a pooling method is required to compress the two similarity maps into two scores. In this work, we adopt the simple and classic saliency-weighted pooling method. Considering that the security of a selective encrypted image depends on the degree of disclosure of its visual content, which is obtained by comparing it with the original image, we use the original image's saliency map SM_O(i, j) to weight the structure similarity map S_{ST}(i, j) and the orientation similarity map S_O(i, j), respectively:

VS_{ST} = \frac{\sum_{i,j} SM_O(i, j)\, S_{ST}(i, j)}{\sum_{i,j} SM_O(i, j)}, \qquad
VS_O = \frac{\sum_{i,j} SM_O(i, j)\, S_O(i, j)}{\sum_{i,j} SM_O(i, j)}.

Considering that different saliency models affect both the performance and the computational cost of the proposed VSA, we measure the performance and running time of different saliency models. To eliminate possible bias due to specific image selection, we randomly choose 100 images from the IVC-SelectEncrypt database and use the average running time as the computational cost of each saliency model. Table 1 shows the results, from which we find that the most appropriate saliency model is GBVS. Therefore, the simple but powerful graph-based visual saliency [36] (GBVS) model is employed. A saliency map of an original image generated by GBVS is shown in Figure 3(f).
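The paper uses the original MATLAB GBVS implementation; as a readily available stand-in for experimentation only (not one of the models compared in Table 1), OpenCV's spectral-residual static saliency can produce a normalized saliency map, assuming the opencv-contrib-python package is installed:

```python
import cv2
import numpy as np

def saliency_map(image):
    """Spectral-residual saliency map, normalized to [0, 1]."""
    sal = cv2.saliency.StaticSaliencySpectralResidual_create()
    ok, smap = sal.computeSaliency(image)
    if not ok:
        raise RuntimeError("saliency computation failed")
    smap = smap.astype(np.float64)
    return smap / (smap.max() + 1e-12)  # normalize for use as pooling weights
```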

After performing the similarity measurements on the structure and orientation features between the original and encrypted images, the resulting structure similarity VS_{ST} and orientation similarity VS_O are combined to obtain the visual security score:

VS = \alpha\, VS_{ST} + \beta\, VS_O,

where \alpha and \beta are two parameters that adjust the relative importance of VS_{ST} and VS_O. The structure and orientation features of an image are both important and highly sensitive to visual perception. For selective encrypted images, however, the skeleton of the image is still intelligible while the details are almost unintelligible, so structure plays a more important role than orientation; we explore the effect of the structure and orientation information separately in Table 2. Consequently, \alpha should be greater than \beta. In our experiments, \alpha and \beta are set to 0.8 and 0.2, respectively, because this setting was found to be optimal.
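Putting the pooling and the final combination together, a minimal sketch is shown below; the saliency, structure similarity, and orientation similarity maps are assumed to come from the earlier sketches, and the weighted-sum combination mirrors the reconstructed formula above:

```python
import numpy as np

def saliency_weighted_pool(sim_map, sal_map):
    """Saliency-weighted average of a similarity map."""
    return float(np.sum(sal_map * sim_map) / (np.sum(sal_map) + 1e-12))

def visual_security_score(sst, so, sal_map, alpha=0.8, beta=0.2):
    """Combine structure and orientation similarities into a single VSA score.

    alpha/beta default to the paper's reported setting (0.8 / 0.2); the
    weighted-sum form follows the reconstruction in the text.
    """
    vs_st = saliency_weighted_pool(sst, sal_map)
    vs_o = saliency_weighted_pool(so, sal_map)
    return alpha * vs_st + beta * vs_o
```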

4. Experiments

In this section, the performance of our proposed VSA is analyzed by comparing with other IQAs and VSAs. We evaluate the performance from confidence, monotonicity, linearity, and accuracy and provide comparisons with other IQAs and VSAs.

4.1. Experimental Protocol
4.1.1. Test Database

To verify the performance of our proposed method, experiments are conducted on two common encrypted databases: IVC-SelectEncrypt [32] and PEID [33].

The IVC-SelectEncrypt can be downloaded from http://www.polytech.univ-nantes.fr/autrusseau-f/Databases/SelectiveEncryption/. The PEID is from https://sites.google.com/site/xiangtaooo/. Their detailed statistical information is summarized in Table 3.

The IVC-SelectEncrypt database consists of 8 original images and 200 encrypted images generated from them using 5 different encryption algorithms, each with 5 encryption degrees. The range of its mean opinion scores (MOS) is [1, 5].

The PEID database contains 1080 encrypted images obtained from 20 original images using 10 encryption schemes. It provides two subjective scores, a visual quality score and a visual security score; we use only the visual security score here because our task is visual security assessment. The range of its mean opinion scores (MOS) is [0, 6].

4.1.2. Evaluation Methodology

We evaluate the performance from confidence, monotonicity, linearity, and accuracy.

Confidence is used to establish how well a VSA actually reflects human judgment [17]. Given a subjective score x (x ∈ MOS) on a database D, each image I whose subjective score equals x has an objective score V_I. We define V_max(x) and V_min(x) as the maximum and minimum of the objective scores of these images on D, and the confidence C_x = |V_max(x) − V_min(x)| measures the difference between these two extrema. The normalized mean confidence μ_D, the normalized standard deviation σ_D, and the normalized maximum confidence max_D are the evaluation criteria derived from C_x.
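A sketch of how the confidence statistics could be computed is given below; grouping images by identical MOS values and min–max normalizing the objective scores are our assumptions for illustration, and the exact normalization used in [17] may differ:

```python
import numpy as np

def confidence_stats(mos, scores):
    """Normalized mean, standard deviation, and maximum of the confidence C_x."""
    scores = np.asarray(scores, dtype=float)
    scores = (scores - scores.min()) / (scores.max() - scores.min() + 1e-12)
    mos = np.asarray(mos, dtype=float)
    cx = []
    for x in np.unique(mos):              # one confidence value per subjective score
        group = scores[mos == x]
        if group.size > 1:
            cx.append(group.max() - group.min())
    cx = np.asarray(cx)
    return cx.mean(), cx.std(), cx.max()  # mu_D, sigma_D, max_D
```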

To ascertain the correlation between the objective VSA scores and the subjective MOS, we compute the Spearman rank correlation coefficient (SRCC), the Kendall rank correlation coefficient (KRCC), the Pearson linear correlation coefficient (PLCC), and the root-mean-squared error (RMSE). SRCC and KRCC evaluate monotonicity, PLCC evaluates linearity, and RMSE evaluates accuracy. Before computing the correlation between the objective VSA scores and the MOS, a five-parameter logistic regression function is applied to reduce the nonlinearity of the objective VSA scores [33], defined as

S' = \beta_1\left(\frac{1}{2} - \frac{1}{1 + e^{\beta_2 (S - \beta_3)}}\right) + \beta_4 S + \beta_5,

where S' is the fitted VSA score, S is the objective VSA score, and \beta_i (i = 1, 2, 3, 4, 5) are parameters determined via curve fitting.
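A minimal sketch of this evaluation step is shown below, assuming the standard five-parameter logistic given above; the initial parameter values are rough illustrative choices:

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import kendalltau, pearsonr, spearmanr

def logistic5(s, b1, b2, b3, b4, b5):
    return b1 * (0.5 - 1.0 / (1.0 + np.exp(b2 * (s - b3)))) + b4 * s + b5

def evaluate(objective, mos):
    """Fit the 5-parameter logistic, then report PLCC, SRCC, KRCC, and RMSE."""
    objective = np.asarray(objective, dtype=float)
    mos = np.asarray(mos, dtype=float)
    p0 = [np.max(mos), 1.0, np.mean(objective), 1.0, np.mean(mos)]  # rough start
    params, _ = curve_fit(logistic5, objective, mos, p0=p0, maxfev=20000)
    fitted = logistic5(objective, *params)
    plcc, _ = pearsonr(fitted, mos)
    srcc, _ = spearmanr(objective, mos)   # rank correlations need no fitting
    krcc, _ = kendalltau(objective, mos)
    rmse = float(np.sqrt(np.mean((fitted - mos) ** 2)))
    return plcc, srcc, krcc, rmse
```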

A better VSA should have lower µD, σD, maxD, and RMSE values but have higher SRCC, KRCC, and PLCC values.

4.2. Comparative Analysis

We compare the proposed VSA with other IQAs and VSAs using the evaluation criteria mentioned above, from the following three aspects. The compared IQAs and VSAs include PSNR, SSIM [19], VIF [20], ESS [22], LSS [22], LFBVS [23], and VSI-Canny [21].

4.2.1. Overall Evaluation

The results of the confidence evaluation of all IQAs and VSAs on the IVC-SelectEncrypt database are shown in Figure 4. A better VSA should have lower and more stable Cx values. From Figure 4, we can find that the Cx values of VIF, VSI-Canny, and our VSA are more stable.

Table 4 lists the overall performance of all IQAs and VSAs on the IVC-SelectEncrypt and PEID databases, and the best is marked in bold. Obviously, our proposed VSA performs best on IVC-SelectEncrypt. On the PEID database, VIF achieves the best monotonicity (the highest SRCC and KRCC) and the lowest RMSE, LSS has the lowest σD, ESS achieves the lowest maxD, and our proposed VSA achieves the best μD and PLCC. Although our proposed VSA is not the best in some values, it is very close to the best one. Compared with other methods, the extracted structure and orientation features of our proposed VSA are more consistent with HVS because HVS is very sensitive to structure and orientation changes caused by selective encryption. In addition, we also considered the visual saliency that was not considered by other IQAs and VSAs. Therefore, it is clear and reasonable that our proposed VSA exhibits the better overall performance.

4.2.2. Evaluation on Different Quality Ranges

The selective encrypted images usually have low and moderate visual quality [21, 26]. Therefore, to evaluate the performance of these VSAs more comprehensively, we should evaluate the performance of these VSAs on different image quality ranges (i.e., low, moderate, and high). The detailed division information can be found in Table 2. Considering that the selective encrypted images are typically in the low- or moderate-quality ranges, it is more important to evaluate the performance of VSAs in the low and moderate image quality ranges than in the high-quality ranges [21, 26].

The comparison results of the different VSAs in different image quality ranges on the two test databases are shown in Table 5. We can see that our proposed VSA performs better than the other VSAs in the low and moderate image quality ranges. In the low image quality range, on the IVC-SelectEncrypt database, VIF shows superior performance in maxD, LSS performs best on PLCC, SRCC, KRCC, and RMSE, and our proposed VSA achieves the best values on µD and σD, with its other values very close to the best ones. On the PEID database, VIF shows superior performance in the confidence evaluation (lowest μD, σD, and maxD), while our proposed VSA achieves the best performance on PLCC, SRCC, KRCC, and RMSE. In the moderate image quality range, our proposed VSA achieves the best monotonicity, linearity, and accuracy on both databases. In the high image quality range, SSIM obtains the best performance on the IVC-SelectEncrypt database, while on the PEID database various VSAs perform satisfactorily in different aspects. In summary, our proposed VSA exhibits better performance in the low and moderate image quality ranges on both databases. In these ranges, the structure and orientation changes caused by selective encryption are more obvious, and the saliency map of the original image highlights the important areas of the images, which matters most for the visual security assessment of selective encrypted images. It is therefore rational that our proposed VSA performs better in the low and moderate image quality ranges.

4.2.3. Evaluation on Different Encryption Types

We also evaluate the different VSAs on the various encryption types in the two test databases to assess the performance of all VSAs more comprehensively. There are 15 different encryption types across the two databases. Tables 6 and 7 report the performance results for all encryption types appearing in the test databases.

From Table 6, we can find that our proposed VSA has better monotonicity performance (the higher SRCC and KRCC values) than other IQAs and VSAs on the two databases. More specifically, our proposed VSA achieves the highest SRCC hit-count (8 times) and KRCC hit-count (7 times), and this value is higher than those of the other metrics. We can also find that our proposed VSA still has the highest PLCC hit-count (7 times) and RMSE hit-count (7 times) from Table 7.

From Tables 6 and 7, we can see that all of the evaluated VSAs perform relatively poorly on encryption types enc08 and enc09 in the PEID database; our proposed VSA is still relatively strong on enc09 but relatively weak on enc08. As shown in Figure 5, the distortions caused by these two encryption methods differ from those of the other methods in that they warp the images. As a result, the features extracted from these encrypted images cannot be matched to the features of the original images, and neither the VSAs nor the IQAs perform well on these two encryption methods. This also explains why the overall performance of our method on PEID is worse than on IVC-SelectEncrypt. Apart from enc08 and enc09, the other encryption methods cause obvious structure and orientation changes in the images. Therefore, it is reasonable that our proposed method performs better, because its features are more relevant to the content leakage caused by most encryption methods.

4.3. Computational Complexity

Finally, considering that running time is important in many practical applications, we analyze the computational cost of all VSAs. In our test, we measure the computational cost of each VSA on 512 × 512 images. The experiments are performed with the original code in MATLAB R2016b on a 64-bit Windows 7 system with 16 GB of memory and a 3.20 GHz Intel processor. To avoid possible bias caused by selecting specific images, we randomly choose 100 images from the PEID database and use the average running time as the computational cost of each VSA.

From Table 8, we can see that most of the metrics are fast to compute. PSNR and SSIM are the fastest methods, but they are mainly designed for image quality assessment and their performance is not excellent. VIF is also an IQA method with relatively good performance, but its running time is much higher than that of the other methods because its computational model is more complex. Compared with the other VSAs, our proposed VSA runs faster. In our implementation, most of the running time is spent in the feature extraction procedure; in the future, we will explore more efficient feature extraction techniques to reduce the computational cost of the proposed method.

5. Conclusions

In this paper, we have presented a novel visual security assessment (VSA) method that makes use of structure and orientation information. First, we extract the structure of the original and encrypted images by combining PC and GM. Then, we extract the orientation information from the GM and obtain similarity measurements by calculating the structure and orientation similarity maps. Meanwhile, we compute the saliency map of the original image. Finally, we utilize a saliency-based pooling strategy to combine the two similarity maps and generate the final VSA score. We conduct extensive experiments on two encrypted-image databases to evaluate the performance of the proposed VSA and compare it with other IQAs and VSAs that are widely used for the visual security assessment of encrypted images. The experimental results show that our proposed VSA achieves better performance and stronger robustness than the compared IQAs and VSAs, especially in the low and moderate image quality ranges.

Data Availability

Previously reported IVC-SelectEncrypt and PEID data were used to support this study and are available at http://www.polytech.univ-nantes.fr/autrusseau-f/Databases/SelectiveEncryption/ and https://sites.google.com/site/xiangtaooo/, respectively. These prior studies (and datasets) are cited at relevant places within the text as references.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding publication of this paper.

Acknowledgments

This work was supported in part by the Natural Science Foundation of China under grant nos. 61601268, 61803237, 61901246, and U1736122; in part by the Natural Science Foundation for Distinguished Young Scholars of Shandong Province under grant no. JQ201718; and in part by the Shandong Provincial Key Research and Development Plan under grant no. 2017CXGC1504.