The rapid development of image encryption calls for performance evaluation metrics. Traditional metrics such as entropy do not consider the correlation between a pixel and its neighborhood, so they cannot evaluate encryption that is based on permuting pixel coordinates. This paper proposes a novel effectiveness evaluation metric to address the issue. The cipher text image is transformed into a bit stream, and the Poker Test is then applied. The proposed metric accounts for the neighbor correlations of the image through neighborhood selection and clip scan. The randomness of the cipher text image is tested by calculating the chi-square statistic. Experimental results verify the effectiveness of the proposed metric.

1. Introduction

The accelerating growth of personal smart devices and the Internet makes it easy to distribute, share, and exchange digital image data via various sorts of open networks. However, image data are also easy to access when they are transmitted via open networks. Image data security has therefore become a crucial issue because some image content needs to be kept confidential. Image encryption is an effective solution to guarantee image data security. The encryption process converts the original image into another, incomprehensible image. The ideal cipher text image is not intelligible. Such a cipher text image can be stored or transmitted across insecure networks without its content leaking to anyone except the intended recipient. Since images have certain characteristics such as bulk data size and high intercorrelation, image encryption schemes focus on destroying the correlation between neighboring pixels. This requires that the cipher text image appear as meaningless noise.

Since image encryption has attracted extensive attention, various encryption schemes have been proposed in recent years. As a result, efficient evaluation of image encryption performance is desired. Evaluating the encryption performance helps to optimize parameter settings as well as to improve the encryption scheme [1]. A subjective preference test provides several cipher text images and asks observers to pick the one that they consider to be the best encrypted. The subjective test is intuitive and straightforward. However, it is inconvenient, time-consuming, and expensive, which has led to a rising demand for objective evaluations. Image encryption destroys the correlation between neighboring pixels. The cipher text image should be unrelated to the original plain text image and appear like meaningless noise. Accordingly, there are two kinds of objective evaluation metrics. The first kind tries to estimate the relation between the cipher text image and the plain text image. The correlation coefficient is a metric of this kind; it is calculated between pixels at the same location in the plain text and cipher text images [2]. Another widely used metric is the deviation from identity, which measures the deviation of the histogram of the encrypted image from the histogram of the ideally encrypted image [3]. This kind of objective metric requires knowledge of the reference image. The other kind focuses on testing the randomness of the cipher text image and does not need the plain text image as a reference [4]. Nonreference objective evaluation of encryption performance is more practical because the reference image is not always available. Several statistical tests for randomness can be employed to evaluate image encryption performance, such as approximate entropy and block frequency [5]. These evaluations usually require transforming the image into a bit stream.

This paper proposes a nonreference objective metric to evaluate image encryption performance. The novel metric is based on the Poker Test and takes pixel neighborhood relationships into consideration. First, traditional objective criteria are discussed in Section 2. Then, the Poker Test is introduced in Section 3, and the novel metric is described in Section 4. Finally, the performance of the metric is evaluated in Section 5 to check its effectiveness.

2. Traditional Metrics

When an image encryption scheme is proposed, several performance analyses are carried out to estimate the effectiveness of the new scheme. The most widely used evaluation criteria include encryption quality and Shannon entropy.

2.1. Encryption Quality

The encryption quality is designed to measure the rate of change of pixel values when encryption is applied to an image [6]. A larger change in pixel values indicates more effective image encryption and hence a higher encryption quality. The encryption quality is expressed in terms of the total change in pixel values between the plain text image and the cipher text image.

Let $P$ and $C$ denote the plain text image and the cipher text image, respectively. Let $H_L(P)$ and $H_L(C)$ be the numbers of occurrences of gray level $L$ in the plain text image and the cipher text image, and let $G$ be the total number of gray levels. Encryption quality is defined as follows:

$$EQ = \frac{\sum_{L=0}^{G-1} \left| H_L(C) - H_L(P) \right|}{G} \tag{1}$$

Encryption quality cannot evaluate encryption that is based on pixel coordinate permutation. Shuffling pixel positions without changing pixel values makes no difference between $H_L(P)$ and $H_L(C)$, so the encryption quality carries no information about how well the positions are scrambled. Moreover, encryption quality requires knowledge of the plain text image $P$, which is not always available.
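The limitation described above can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes images are stored as 2D lists of integers, and the helper `shuffle_positions` and the test images are hypothetical names introduced only for illustration.

```python
from collections import Counter
import random


def encryption_quality(plain, cipher, levels=256):
    """Average absolute difference between the gray-level histograms.

    `plain` and `cipher` are 2D lists of integers in [0, levels).
    """
    hist_p = Counter(v for row in plain for v in row)
    hist_c = Counter(v for row in cipher for v in row)
    # Counter returns 0 for gray levels that never occur.
    return sum(abs(hist_c[g] - hist_p[g]) for g in range(levels)) / levels


def shuffle_positions(image, seed=0):
    """Hypothetical helper: permute pixel positions, keep every pixel value."""
    flat = [v for row in image for v in row]
    random.Random(seed).shuffle(flat)
    width = len(image[0])
    return [flat[i:i + width] for i in range(0, len(flat), width)]


# Illustrative 16 x 16 test image with values in 0..120.
plain = [[(3 * r + 5 * c) % 256 for c in range(16)] for r in range(16)]
shuffled = shuffle_positions(plain)                    # positions change only
inverted = [[255 - v for v in row] for row in plain]   # values change

print(encryption_quality(plain, shuffled))  # 0.0, the histogram is unchanged
print(encryption_quality(plain, inverted))
```

A pure position shuffle always scores zero here, no matter how thoroughly the pixels are mixed, which is why a histogram-based score cannot rank permutation ciphers.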

2.2. Shannon Entropy

Shannon entropy, or information entropy, is a measure of uncertainty. A greater entropy value indicates that the system is more random. When the values of a random variable are equally probable, the Shannon entropy reaches its maximum. If the Shannon entropy of a cipher text image approaches the theoretical peak value, the encryption is considered effective [7].

Assume the probability of pixel value $i$ is $p_i$ and the total number of gray levels is $G$; the Shannon entropy of the image is calculated by

$$H = -\sum_{i=0}^{G-1} p_i \log_2 p_i \tag{2}$$

Shannon entropy requires no reference image. However, it cannot evaluate encryption based on pixel position shuffling either, because the neighbor correlations of pixels are not considered.
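As a small illustration of why entropy is blind to position shuffling, the sketch below computes the Shannon entropy of an image from its gray-level frequencies. It assumes images are 2D lists of integers; the two test images are made up for the example.

```python
import math
from collections import Counter


def shannon_entropy(image):
    """Shannon entropy (in bits) of the gray-level distribution of a 2D list."""
    flat = [v for row in image for v in row]
    total = len(flat)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(flat).values())


# Every gray level 0..255 occurs exactly once, so the entropy is maximal.
image = [[16 * r + c for c in range(16)] for r in range(16)]
# Transposing moves pixels but leaves the gray-level frequencies untouched.
transposed = [list(col) for col in zip(*image)]

print(shannon_entropy(image))       # 8.0
print(shannon_entropy(transposed))  # 8.0
```

Both images have the same entropy because only the frequencies of the gray levels enter the calculation, not their positions.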

2.3. Autocorrelation

Autocorrelation is the cross-correlation of a signal with itself at different points in time [8]. Pixels in a meaningful image are correlated in the horizontal, vertical, and diagonal directions. Effective image encryption destroys the correlation among adjacent pixels. Thus, two-dimensional autocorrelation values can be adopted to evaluate the performance of image encryption. Since the correlation between adjacent pixels is the focus, the normalized autocorrelation values at a one-pixel shift in these three directions are employed as the measurement of image encryption quality. That is, the center of the mask image is shifted by one pixel from the center of the original image. The normalized 2D autocorrelation of image $I$ is defined as

$$R(k,l) = \frac{\sum_{i}\sum_{j}\left(I(i,j)-\bar{I}\right)\left(I(i+k,j+l)-\bar{I}\right)}{\sum_{i}\sum_{j}\left(I(i,j)-\bar{I}\right)^{2}} \tag{3}$$

in which $I(i,j)$ is the pixel value in image $I$ at position $(i,j)$ and $\bar{I}$ is the mean value of the image. The sums in the numerator run over the positions where both factors fall inside the image. If the size of the image is $M \times N$, $R$ will be a matrix of size $(2M-1) \times (2N-1)$, with the shift $(k,l)$ stored at position $(M+k, N+l)$. The normalized autocorrelation values with one-pixel shift are the values in matrix $R$ at positions $(M, N+1)$, $(M+1, N)$, and $(M+1, N+1)$.
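The one-pixel-shift measurement can be sketched as follows. This is an illustrative version under stated assumptions: the image is a non-constant 2D list, the numerator sums only over pixel pairs that both lie inside the image, and the denominator is the zero-shift sum of squared deviations. The exact normalization used in the paper is not recoverable from the text, so treat this as one plausible reading.

```python
def normalized_autocorrelation(image, k, l):
    """Normalized autocorrelation of a 2D list at row shift k and column shift l.

    Assumes the image is not constant (otherwise the denominator is zero).
    """
    rows, cols = len(image), len(image[0])
    mean = sum(map(sum, image)) / (rows * cols)
    num = 0.0
    # Only pairs (i, j) and (i + k, j + l) that both lie inside the image.
    for i in range(max(0, -k), min(rows, rows - k)):
        for j in range(max(0, -l), min(cols, cols - l)):
            num += (image[i][j] - mean) * (image[i + k][j + l] - mean)
    den = sum((v - mean) ** 2 for row in image for v in row)
    return num / den


# 8 x 8 checkerboard: horizontal neighbors always differ, diagonal ones agree.
board = [[255 * ((r + c) % 2) for c in range(8)] for r in range(8)]
print(normalized_autocorrelation(board, 0, 1))  # horizontal shift
print(normalized_autocorrelation(board, 1, 0))  # vertical shift
print(normalized_autocorrelation(board, 1, 1))  # diagonal shift
```

For the checkerboard, the horizontal and vertical values are strongly negative and the diagonal value is strongly positive, so a ranking would use absolute values, and a well-encrypted image should give values near zero in all three directions.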

2.4. Gap Test

The gap test is used for testing the randomness of a sequence. It is concerned with the number of gaps in any particular class of digits [9]. The elements of the sequence are classified into two categories: the elements with values between zero and an upper bound, marked as 0, and the elements with other values, marked as 1. A run of consecutive zeros is defined as a “gap.” The lengths of the zero gaps are counted. Then the chi-square statistic is calculated by

$$\chi^2 = \sum_{k} \frac{\left(n_k - n\,p_k\right)^2}{n\,p_k} \tag{4}$$

in which $n_k$ is the number of zero gaps of length $k$, $n = \sum_k n_k$ is the total number of gaps, and $p_k$ is the probability that the length of a zero gap equals $k$. $p_k$ is calculated by $p_k = p^{k}(1-p)$, where $p$ is the probability that an element belongs to category 0.

3. Poker Test

The Poker Test is a randomness test. It treats numbers grouped together as a poker hand, and the hands obtained are compared with what is expected. The classical Poker Test uses all possible categories obtained from poker with hands of five numbers. In practice, the Poker Test can be applied without being restricted to hands of five numbers. For cryptographic applications, hands of four are more convenient for dealing with bit streams [10].

The National Institute of Standards and Technology specifies randomness tests in FIPS 140-2, with the Poker Test as the second test [11]. The Poker Test in FIPS 140-2 is defined as follows: a single bit stream of 20,000 consecutive bits shall be divided into 5,000 nonoverlapping parts. These parts are called nibbles [12]. One nibble consists of 4 bits. The numbers of occurrences of each of the $2^4 = 16$ possible values are counted and stored. The Poker Test determines whether these nibbles appear with approximately the same frequency. The chi-square test is employed to evaluate the randomness by the formula

$$X = \frac{16}{5000}\sum_{i=0}^{15} f(i)^2 - 5000 \tag{5}$$

in which $f(i)$ is the number of occurrences of the $i$th possible value. Evidently, the Poker Test deals with a bit stream, and the length of the stream is constrained. A cipher image, in contrast, is a two-dimensional matrix of unknown size with integer-valued elements. These issues need to be addressed before the randomness of a cipher text image can be evaluated by the Poker Test.
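The nibble-counting procedure described above can be sketched in a few lines. The function name `fips_poker_statistic` and the two test streams are illustrative choices; the statistic itself is 16/5000 times the sum of squared nibble counts, minus 5000.

```python
def fips_poker_statistic(bits):
    """Poker Test statistic for a stream of exactly 20,000 bits (0/1 values)."""
    if len(bits) != 20000:
        raise ValueError("the FIPS 140-2 Poker Test expects 20,000 bits")
    counts = [0] * 16
    for start in range(0, 20000, 4):
        b3, b2, b1, b0 = bits[start:start + 4]
        counts[8 * b3 + 4 * b2 + 2 * b1 + b0] += 1   # value of the nibble
    return 16 * sum(f * f for f in counts) / 5000 - 5000


# Worst case: every nibble is 0000.
all_zero = [0] * 20000
# Near-uniform case: the 16 nibble values repeated in a fixed cycle.
cyclic = []
for i in range(5000):
    value = i % 16
    cyclic += [(value >> 3) & 1, (value >> 2) & 1, (value >> 1) & 1, value & 1]

print(fips_poker_statistic(all_zero))  # 75000.0
print(fips_poker_statistic(cyclic))    # close to 0
```

The cyclic stream shows that a value near zero only means the nibble frequencies are balanced. It does not by itself prove randomness, which is why the standard combines this test with others.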

4. Novel Metric

Image encryption destroys the correlation between neighboring pixels. The cipher text image should appear as much like meaningless noise as possible; that is, effective image encryption should generate a random-like cipher text image. The novel encryption performance evaluation metric is based on a randomness test of the cipher text image and on pixel neighborhood correlations. Figure 1 provides the flowchart of encryption performance evaluation by the proposed metric.

4.1. Pixel Value Binarization

The cipher text image consists of pixels with integer decimal values. The values need to be binarized for the Poker Test because there are too many possible combinations of pixel values. The cipher text image can be converted into a binary image by the tossing coin formula [13]

$$B(i,j) = \begin{cases} 1, & C(i,j) \ge \bar{C} \\ 0, & C(i,j) < \bar{C} \end{cases} \tag{6}$$

in which $C(i,j)$ is the pixel value at position $(i,j)$ and $\bar{C}$ denotes the average value of all pixels in the cipher text image.

After the binarization, an image with binary pixel values is obtained.
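A minimal sketch of this binarization step is given below. It assumes that pixels at or above the image mean map to 1 and all others to 0; the direction of the comparison and the handling of pixels exactly equal to the mean are assumptions, since the original formula is not reproduced here.

```python
def binarize(cipher):
    """Threshold a 2D list of pixel values at the mean of the whole image.

    Assumption: values >= mean become 1, values < mean become 0.
    """
    flat = [v for row in cipher for v in row]
    mean = sum(flat) / len(flat)
    return [[1 if v >= mean else 0 for v in row] for row in cipher]


cipher = [[10, 200],
          [90, 100]]          # mean is 100
print(binarize(cipher))       # [[0, 1], [0, 1]]
```

Thresholding at the mean keeps the two bit values roughly balanced for a noise-like image, which is what the later frequency test expects.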

4.2. Neighborhood Selection

The binarized cipher text image is divided into small cells of the same size. One cell covers a local neighborhood. The cell size is flexible and is determined by the neighborhood selection. Using the 8-neighborhood of the current pixel corresponds to a cell of size $3 \times 3$. A cell of size $2 \times 2$ covers the adjacent pixels in the horizontal, vertical, and diagonal directions and is another possible neighborhood selection. Besides these two sizes, one-dimensional sizes such as $1 \times n$ or $n \times 1$ are also workable. If the cipher text image size is not an integer multiple of the cell size, the remaining last rows and columns are discarded in the succeeding procedures. Because the cell size is small, the remainders are also small and have little influence on the randomness property of the whole image.

The size of the neighborhood determines the length of each bit segment in the bit stream. Different sizes generate bit segments of different lengths, and the numbers of possible bit-segment combinations differ as well. This leads to different chi-square values because the number of possible values influences the statistic. Therefore, the same neighborhood selection must be used throughout when comparing encryption schemes. Otherwise, the results are not comparable.

4.3. Bit Stream Production

After neighborhood selection, small cells of the same size are acquired. The cells are two-dimensional in most cases, so they must be converted into a bit stream for the Poker Test. Each cell is clip scanned to produce a bit segment. Figure 2 shows the schematic of the clip scan. This kind of scan keeps adjacent pixels in a cell adjacent in the output bit segment, so the procedure preserves the neighborhood correlations of the binarized cipher text image as much as possible. The segments are concatenated into a bit stream, which is used in the Poker Test.
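The cell division and scan can be sketched as follows. Since Figure 2 is not available here, the sketch assumes that the clip scan is a serpentine (back-and-forth) traversal: even rows are read left to right and odd rows right to left, which keeps neighbors in the cell adjacent in the output. The function names are hypothetical.

```python
def split_into_cells(binary, m, n):
    """Yield m x n cells of a 2D list; leftover rows and columns are dropped."""
    rows = len(binary) - len(binary) % m
    cols = len(binary[0]) - len(binary[0]) % n
    for top in range(0, rows, m):
        for left in range(0, cols, n):
            yield [binary[top + r][left:left + n] for r in range(m)]


def clip_scan(cell):
    """Assumed serpentine scan: alternate the reading direction on each row."""
    segment = []
    for r, row in enumerate(cell):
        segment.extend(row if r % 2 == 0 else reversed(row))
    return segment


binary = [[1, 0, 0, 1],
          [1, 1, 0, 0],
          [0, 0, 1, 1],
          [0, 1, 1, 0]]
segments = [clip_scan(cell) for cell in split_into_cells(binary, 2, 2)]
print(segments)
# [[1, 0, 1, 1], [0, 1, 0, 0], [0, 0, 1, 0], [1, 1, 0, 1]]
```

With a plain row-by-row scan, the last pixel of one row and the first pixel of the next would end up next to each other in the segment even though they are not neighbors in the cell. The serpentine order avoids that.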

4.4. Poker Test

The bit stream consists of bit segments. These bit segments all have the same length $m \times n$, in which $m$ and $n$ are the numbers of cell rows and columns. There are $2^{mn}$ possible values of these segments. For each possible value, the number of occurrences is counted, which produces a histogram of $2^{mn}$ bins. If the original pixel values were used to run the Poker Test, there would be $256^{mn}$ bins for an 8-bit grey image, so the computational cost would be vast if the binarization step were skipped. Because an image contains a large amount of data, the number of occurrences is large enough for a statistical test. The chi-square test is employed to evaluate the similarity of the observed and expected data. The expected data are distributed evenly according to the hypothesis “the occurrence of every possible value is the same.” A randomly distributed bit stream supports the hypothesis.

Let $O_i$ denote the observed number of occurrences of the $i$th possible value; the expected number of occurrences of every possible value is

$$E = \frac{1}{2^{mn}}\sum_{i=1}^{2^{mn}} O_i \tag{7}$$

Then the chi-square value is calculated by

$$\chi^2 = \sum_{i=1}^{2^{mn}} \frac{(O_i - E)^2}{E} \tag{8}$$

The computed value approximately follows the chi-square distribution with $2^{mn}-1$ degrees of freedom. Once the neighborhood is selected, the numbers of cell rows and columns $m$ and $n$ are fixed, and so is the number of degrees of freedom. A more randomly distributed bit stream yields a smaller value from (8), which means the cipher text image is more random-like. In this case, the encryption scheme that produces the cipher text image performs better.
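The chi-square step can be sketched as below, assuming the bit segments are already available as equal-length lists of 0/1 values. The function name is hypothetical. Values that never occur still contribute to the sum, each by exactly the expected count, so the code adds them in bulk instead of looping over every bin.

```python
from collections import Counter


def poker_chi_square(segments, length):
    """Chi-square statistic of segment frequencies against a uniform expectation.

    Returns the statistic and the degrees of freedom (number of bins minus 1).
    """
    bins = 2 ** length
    observed = Counter(tuple(s) for s in segments)
    expected = len(segments) / bins
    chi2 = sum((o - expected) ** 2 / expected for o in observed.values())
    # Each value that never occurs contributes (0 - E)^2 / E = E.
    chi2 += (bins - len(observed)) * expected
    return chi2, bins - 1


balanced = [[0, 0], [0, 1], [1, 0], [1, 1]] * 5   # every value occurs 5 times
constant = [[1, 1]] * 20                          # one value occurs 20 times

print(poker_chi_square(balanced, 2))  # (0.0, 3)
print(poker_chi_square(constant, 2))  # (60.0, 3)
```

A smaller statistic indicates segment frequencies closer to uniform. Values are comparable only between runs that use the same segment length, because the number of bins changes with it.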

5. Experiment

5.1. Evaluation of Pixel Position Encryption

Arnold’s cat map is a chaotic map. It can be employed to encrypt an image by transforming pixel positions [14]. With enough iterations, the encrypted image looks like a randomized version of the original. Quite a number of image encryption algorithms are based on Arnold’s cat map [15, 16]. Usually, pixel value scrambling is combined with the position transformation to improve security.

In this section, the classical Arnold’s cat map image encryption algorithm is adopted to verify the proposed performance evaluation metric. The pixel at position $(x, y)$ of the original image is transformed to position $(x', y')$ in the cipher text image by

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \bmod N \tag{9}$$

in which $N$ is the number of rows or columns of the image. The transformation can be iterated to obtain more random-like results. However, if the iteration continues beyond a certain point, the original image is recovered. Figure 3 shows the Arnold’s cat map encryption results for the image Lena with different numbers of iterations. The pixels rapidly degenerate into chaos by iteration five. The ciphered image then appears less random-like at iterations seven and eight. After that, the pixels again look chaotic at iterations nine and ten. When (9) is iterated more than 12 times, the cipher text images show some pattern. At iteration 15, the cipher text image is the same as the original plain text image.
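The periodic behavior described above can be reproduced with a short sketch. It assumes the common form of the map, x' = (x + y) mod N and y' = (x + 2y) mod N, applied to a square image stored as a 2D list. The 124 × 124 size is chosen only because this form of the map has period 15 at that size; the source does not state the size of its test image.

```python
def arnold_cat_map(image, iterations=1):
    """Apply the assumed cat map (x, y) -> (x + y, x + 2y) mod N repeatedly."""
    n = len(image)
    if any(len(row) != n for row in image):
        raise ValueError("the cat map needs a square image")
    for _ in range(iterations):
        scrambled = [[0] * n for _ in range(n)]
        for x in range(n):
            for y in range(n):
                scrambled[(x + y) % n][(x + 2 * y) % n] = image[x][y]
        image = scrambled
    return image


# Illustrative image in which every pixel has a unique value.
size = 124
image = [[r * size + c for c in range(size)] for r in range(size)]

print(arnold_cat_map(image, 1) == image)   # False, positions are scrambled
print(arnold_cat_map(image, 15) == image)  # True, the original is recovered
```

Because the map only moves pixels, every intermediate image has the same histogram as the original, so histogram-based scores cannot tell the iterations apart.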

Because there is no pixel value confusion processing in this encryption, the histogram of any ciphered image in Figure 3 is kept the same as the histogram of the original. Thus, the encryption quality calculated by (1) will be zero for every ciphered image. And the Shannon entropy calculated by (2) will be the same for every image in Figure 3.

The proposed metric is affected by the neighborhood selection, as mentioned in the previous section. Table 1 lists the evaluation results for Figure 3 obtained by the proposed metric with different neighborhood selections. Figure 4 presents the curves of evaluation value with respect to iteration number. It can be observed that the evaluation result curves are W-shaped, which is consistent with the visual impression of cipher image randomness in Figure 3. The middle bounce in the curve becomes less visible as the neighborhood size grows, because the evaluation values at the first and last iterations grow as well. However, the evaluation results in Table 1 indicate that the trend is the same in every column: a decrease, a small bounce off the bottom, another decrease, and then an increase.

The novel metric ranks the performance of these cipher images by descending order of the evaluation results, as shown in Table 2. The poorest performance is assigned to the cipher text images at iterations 1 and 14, which is consistent with the impression from Figure 3. The best-performing cipher image varies as the neighborhood selection changes because the W-shaped curve has two valleys. The ranking results demonstrate that the novel metric evaluates the encryption performance effectively. Moreover, the evaluation result is little affected by the neighborhood selection.

As the neighborhood selection varies, the processing time for evaluation changes. Table 3 provides the time cost of the novel metric with different neighborhood selections. The time cost clearly decreases as the neighborhood area increases, because a small neighborhood fragments the image into more cells, which increases the number of clip scans. Since the evaluation result is little affected by the neighborhood selection, a neighborhood with a larger area is recommended. However, as the neighborhood area grows, the number of possible combinations of the bit segment grows exponentially, which places a heavy burden on computer memory. The neighborhood area therefore should not be too large. Thus, the recommended neighborhoods are 4 × 4, 4 × 5, and 5 × 4.
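The memory argument can be made concrete with a quick count of histogram bins, assuming one bin per possible bit segment, that is, 2 raised to the number of pixels in the cell. The cell sizes beyond the recommended ones are included only for comparison.

```python
def histogram_bins(m, n):
    """Number of distinct bit segments for an m x n cell of binary pixels."""
    return 2 ** (m * n)


for m, n in [(2, 2), (3, 3), (4, 4), (4, 5), (5, 5), (6, 6)]:
    print(f"{m} x {n}: {histogram_bins(m, n)} bins")
# 2 x 2: 16 bins
# 3 x 3: 512 bins
# 4 x 4: 65536 bins
# 4 x 5: 1048576 bins
# 5 x 5: 33554432 bins
# 6 x 6: 68719476736 bins
```

Going from a 4 × 5 cell to a 6 × 6 cell multiplies the number of bins by 65,536, which illustrates why the cell area has to stay modest.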

Normalized autocorrelation values with a one-pixel shift in different directions are employed to measure the encryption performance in Figure 3 as a comparison. The center of the mask image is shifted from the center of the original image by one pixel in the horizontal, vertical, and diagonal directions. Tables 4 and 5 present the evaluation results and the performance ranking of autocorrelation. Because an autocorrelation value of minus one indicates strong negative correlation, the absolute values in Table 4 are used to produce the ranking in Table 5. The last several rows of the vertical direction in Table 5 reveal a problem in evaluating the poorest-performing cipher images. It is clear in Figure 3 that the cipher images of iterations 1 and 14 perform the poorest, followed by those of iterations 2 and 13. However, the normalized autocorrelation in the vertical direction ranks the cipher image of iteration 13 in a high position, even higher than the cipher images of iterations 3 and 8. This is not consistent with the visual impression in Figure 3.

Another comparison is provided by the gap test. The cipher text images are stretched into a vector column by column to perform the gap test. As the cipher text images in Figure 3 have 8-bit grey levels, the upper bound for element classification is set to 51, 128, and 179. These three values are 20%, 50%, and 70% of the maximum grey level. The chi-square value calculated by (4) is then used as the evaluation. Tables 6 and 7 present the evaluation results and the performance ranking. The gap test has some problems when evaluating the four poorest-performing cipher text images. When the upper bound is set to 20% of the maximum value, the cipher text image of iteration 2 is ranked higher than that of iteration 7. When the upper bound is set to 70% of the maximum value, the cipher text images of iterations 1, 2, and 13 are ranked even higher than the cipher images of iterations 10 and 5. This is not consistent with the visual impression.

5.2. Evaluation of More Sophisticated Encryption

In this section, image encryption with both pixel value scrambling and position shuffling is used to verify the novel evaluation metric. The plain text image is first encrypted by the Vigenere cipher to scramble its pixel grey levels, and then the pixel positions are shuffled. A coupled logistic map is employed to produce the pseudorandom sequences for the encryption [17]. The plain text image and cipher text images are shown in Figure 5. Figure 5(b) is the cipher text image with only position shuffling, Figure 5(c) is the cipher text image with only pixel value scrambling, and Figure 5(d) is the cipher text image with both. The keys are the same. More complex processing brings higher security. Thus, Figure 5(d) should be evaluated as the best performing.
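The two processing stages can be illustrated with a rough sketch. It is not the scheme of [17]: a single, uncoupled logistic map stands in for the coupled logistic map, the Vigenere step is modeled as adding a keystream byte modulo 256, and the shuffle sorts pixel indices by a chaotic sequence. All function names, the parameter r = 3.99, and the keys are assumptions made for the example, and the image is treated as a flat list of pixels.

```python
def logistic_sequence(x0, count, r=3.99, burn_in=100):
    """Values of the plain logistic map x -> r * x * (1 - x) after a burn-in."""
    x, values = x0, []
    for step in range(burn_in + count):
        x = r * x * (1.0 - x)
        if step >= burn_in:
            values.append(x)
    return values


def keystream(x0, count):
    """Quantize the chaotic sequence to bytes (illustrative choice)."""
    return [int(v * 256) % 256 for v in logistic_sequence(x0, count)]


def scramble_values(pixels, x0):
    """Vigenere-style step: add a keystream byte to each pixel modulo 256."""
    return [(p + k) % 256 for p, k in zip(pixels, keystream(x0, len(pixels)))]


def unscramble_values(pixels, x0):
    """Inverse of scramble_values with the same key."""
    return [(c - k) % 256 for c, k in zip(pixels, keystream(x0, len(pixels)))]


def shuffle_positions(pixels, x0):
    """Reorder pixels by the sorting order of a chaotic sequence."""
    seq = logistic_sequence(x0, len(pixels))
    order = sorted(range(len(pixels)), key=seq.__getitem__)
    return [pixels[i] for i in order]


pixels = [(7 * i) % 256 for i in range(64)]      # flattened toy image
values_only = scramble_values(pixels, 0.3141)    # analogous to Figure 5(c)
positions_only = shuffle_positions(pixels, 0.2718)  # analogous to Figure 5(b)
both = shuffle_positions(values_only, 0.2718)    # analogous to Figure 5(d)
```

The position-only result keeps the multiset of pixel values, so it has the same histogram and entropy as the plain image, while the two results that include value scrambling share a histogram with each other.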

The evaluation results for Figures 5(b), 5(c), and 5(d) by the proposed metric and the other metrics are provided in Table 8. The proposed metric gives Figure 5(d) the lowest value for every neighborhood selection, which means that the cipher text image encrypted by both pixel scrambling and position shuffling is considered the best encrypted. With every neighborhood selection, the proposed metric also scores Figure 5(b) as the poorest. This is consistent with the performance analysis. Normalized autocorrelation assigns Figure 5(c) the smallest absolute value except in the vertical direction. The gap test also has some problems when evaluating Figures 5(c) and 5(d). In the last column, Figure 5(b) has the same entropy as the original plain text image, while Figures 5(c) and 5(d) have the same entropy as each other. This is because their grey level histograms have the same distribution.

6. Conclusion

This paper proposes a nonreference objective metric to evaluate the image encryption performance. The novel metric considers pixel neighborhood relationships. Poker Test is employed to test the randomness of the cipher text image. The metric can efficiently estimate encryption based on image pixel coordinate permutation. The experiment recommends several neighborhood selections.

Competing Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.


Acknowledgments

The project is supported by National Natural Science Foundation of China (Grant nos. 61402051 and 51278058), the 111 Project (no. B14043), Natural Science Basic Research Plan in Shaanxi Province of China (Program no. 2016JM6076), and the Young Scientists Fund of Natural Science Foundation of Shaanxi Province (Program no. 2015JQ6239).