Abstract

In this study, a pioneer selective video encryption (PSVE) algorithm is proposed based on a pseudorandom number generator (PRNG) of the Zipf distribution (Z-PRNG). It is a general algorithm with high efficiency and security, and the encryption process is completely separable from the video coding process. In the PSVE algorithm, Z-PRNG is designed based on the 3D SCL-HMC hyperchaotic map. First, encapsulated byte sequence payloads (EBSPs) are extracted from the video bitstream. Second, Zipf-distributed random numbers are generated by Z-PRNG and used to select the data to be encrypted from each EBSP. Finally, the extracted data are encrypted by AES-CTR to obtain the encrypted video. Compared with existing algorithms, the encryption position is more flexible and the key space is larger. High efficiency video coding (HEVC) and advanced video coding (AVC) videos are taken as examples to test the PSVE algorithm. The analysis results show that the proposed scheme can effectively resist common attacks and that its time complexity is much lower than that of most existing algorithms.

1. Introduction

With the development of network technology and the availability of multimedia applications, video has become the most prevalent information carrier in visualized communication. Statistics show that video accounts for about 80% of data traffic. Therefore, the security and efficiency of video communication have been widely noted in medicine, education, military, and other fields.

At present, many video encryption algorithms (VEAs) have been proposed based on video coding standards, such as H.264/AVC [1] and H.265/HEVC [2]. Video compression coding can greatly reduce data redundancy. Therefore, compared with VEAs applied before compression coding, VEAs based on compression coding are more efficient and more appropriate for real-time video protection. These VEAs are roughly divided into two categories: (1) encryption through video coding [3–23] and (2) encryption after video coding [24–32]. Category (1) is embedded into one of the steps of the video coding standard, mostly before entropy coding. Although these algorithms have proved effective, they must modify the structure of the video coding standard. Therefore, category (1) is not applicable to existing standard codecs. In contrast, category (2) is more flexible and is implemented separately from video coding, so it is more suitable for encrypting an already encoded video bitstream. However, in existing category (2) schemes, the encryption position could be more flexible and the encryption efficiency could be further improved.

Chaos plays a significant role in cryptography because chaotic signals exhibit randomness, high complexity, and sensitivity to initial conditions. Sallam et al. proposed an efficient HEVC selective stream encryption algorithm based on the logistic chaotic map [10]. Nevertheless, a one-dimensional chaotic map is relatively simple in structure, and its system parameter is easily estimated. Liu et al. designed a video encryption scheme based on the integer dynamic coupling tent map [17]. The improved tent map is more effective at encrypting video, but the improvement is limited. Therefore, the random performance of chaotic systems for video encryption can be further enhanced.

A pioneer selective video encryption (PSVE) algorithm is proposed in this study. It is completely separable from video coding, and the encryption position extends to the encapsulated byte sequence payloads (EBSPs) of the video bitstream. In the PSVE algorithm, a pseudorandom number generator (PRNG) of Zipf distribution, namely, Z-PRNG, is designed based on the 3D SCL-HMC hyperchaotic map. It determines the encryption position in the EBSPs, and the corresponding data are extracted. The traditional encryption algorithm AES-CTR is used to encrypt the extracted data. Different from existing VEAs, the key structure of the proposed algorithm has two parts: (1) the initial key of the 3D SCL-HMC and (2) the initial key of AES-CTR. The encryption position varies with the initial key of the 3D SCL-HMC map. High efficiency video coding (HEVC) and advanced video coding (AVC) are two common video coding standards, and they are used to test the proposed PSVE algorithm. Security analysis results show that it can resist common attacks. Encryption efficiency analysis results show that the time complexity is much lower than that of most existing VEAs. The main contributions of this study are as follows:
(i) A novel PRNG, namely, Z-PRNG, is designed based on the 3D SCL-HMC hyperchaotic map. Its output follows a Zipf distribution. Compared with uniformly distributed random numbers, the proposed Z-PRNG is more applicable to video encryption.
(ii) A PSVE scheme is proposed using Z-PRNG. It applies to the H.265/HEVC bitstream and the H.264/AVC bitstream. The performance analysis demonstrates that the proposed PSVE scheme is highly efficient and highly capable of resisting attacks.
(iii) The performance of the proposed PSVE scheme is compared with that of other existing schemes. The results show that the PSVE scheme brings significant improvements in key space, resistance to chosen-plaintext attack, and separability while meeting the compression ratio and real-time requirements.

The rest of this study is organized as follows. The related work of video encryption based on compression coding is introduced in Section 2. In Section 3, data extraction based on Z-PRNG is presented. In Section 4, the PSVE scheme is proposed. In Section 5, the performance of the PSVE algorithm is analyzed. Finally, the conclusion and future directions are discussed in Section 6.

2. Related Work

Most existing VEAs are inseparable from video coding. Neda et al. encrypted DCT coefficients after the zigzag scanning in the AVC standard [8] without affecting the bitstream compliance and compression ratio. Peng et al. encrypted multiple syntax elements before the entropy coding process in the HEVC standard, and coefficient scrambling is utilized to further improve the security [14]. However, these algorithms must modify the structure of the video coding standard, producing a specific codec that includes the encryption process. Thus, they are not applicable to existing standard codecs and are not compatible with other video coding formats. Moreover, the cipher stream of these algorithms has no correlation with the original video sequence, so they fail to resist chosen-plaintext attack.

Compared with encryption algorithms through video coding, existing VEAs after video coding are more flexible in encrypting the encoded video bitstream. Yang et al. partly decoded the HEVC bitstream and then extracted four syntax elements in bypass mode for encryption [24]. Li et al. encrypted the most significant bits for video reconstruction without changing the structure of the H.264 standard codec [26]. In the two schemes, the compression and encryption processes were carried out separately. However, they only selected the syntax elements encoded with fixed length, such as the suffix of the Exp-Golomb code and the truncated Rice code. These VEAs ensure the format compatibility of encrypted videos. Nevertheless, the encryption position is limited. In video encryption, format compatibility is dispensable in some practical applications. Taking secure video storage as an example, the encrypted video content does not need to maintain format compatibility. In addition, partial decoding is indispensable in these algorithms, and it consumes most of the decryption time.

Other VEAs operate on the bitstream of network abstraction layer unit (NALU) data and break the format compatibility. Lee and Jang encrypted the HEVC bitstream using the NALU header [29]. However, the NALU header records the content characteristics of the NALU and ensures the fault tolerance of online video transmission. Therefore, the NALU header is not suitable for encryption. Hao et al. selected VPS, SPS, and PPS to encrypt private videos [30]. However, the extracted data are orderly, and many video sequences have identical parameter set information. Therefore, VPS, SPS, and PPS are also not appropriate for video encryption. Ting et al. utilized the bitstream of video coding layer (VCL) data [31]. The NALU bodies of I slices are encrypted by the AES block cipher, and those of B/P slices are encrypted with the CBC-mode XOR algorithm. The algorithm is capable of resisting chosen-plaintext attack. However, the position information of encrypted videos is still public and can be directly obtained without the initial key. Meanwhile, the encryption efficiency remains to be further improved.

To solve the above problems in the existing VEAs, the PSVE algorithm is proposed using the 3D SCL-HMC and Z-PRNG.

3. Data Extraction Based on Z-PRNG

3.1. Data Extraction in Video Bitstream

The NALU is the encapsulation unit of the H.265/HEVC and H.264/AVC standards. The video bitstream is transmitted over the network in the form of NALUs. NALUs fall into two categories: (1) VCL NALUs and (2) non-VCL NALUs. Every NALU consists of a NALU header and an EBSP.

The most sensitive data are expected to be extracted from the video bitstream. As shown in Figure 1, the EBSPs in VCL NALUs are extracted. In the H.265/HEVC and H.264/AVC standards, different EBSPs are encoded and decoded independently. As the video bitstream is highly compressed, the data in the same EBSP are highly correlated. If the front data of one EBSP are encrypted, the following data in the same EBSP are significantly affected. With Z-PRNG, bytes near the front of each EBSP are extracted and encrypted more often, so the distortion introduced by the encrypted bytes propagates through the rest of the video stream. The specific process of generating Z-PRNG is introduced in Section 4.1.
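The extraction of VCL payloads described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes an Annex B byte stream (NALUs separated by 00 00 01 start codes), treats everything after the NALU header as the EBSP, and uses the header layouts of the two standards (a 2-byte header with a 6-bit type in HEVC, a 1-byte header with a 5-bit type in AVC). The function names are hypothetical.

```python
from typing import List

START_CODE = b"\x00\x00\x01"


def split_nalus(stream: bytes) -> List[bytes]:
    """Split an Annex B byte stream into NAL units (start codes removed)."""
    nalus = []
    pos = stream.find(START_CODE)
    while pos != -1:
        begin = pos + len(START_CODE)
        nxt = stream.find(START_CODE, begin)
        end = len(stream) if nxt == -1 else nxt
        # A 4-byte start code (00 00 00 01) leaves a zero byte after the
        # previous unit; a NAL unit never ends in 0x00, so strip it.
        nalu = stream[begin:end].rstrip(b"\x00")
        if nalu:
            nalus.append(nalu)
        pos = nxt
    return nalus


def is_vcl(nalu: bytes, codec: str) -> bool:
    """Return True if the NAL unit carries coded slice data."""
    if codec == "hevc":
        nal_type = (nalu[0] >> 1) & 0x3F  # 6-bit nal_unit_type
        return nal_type <= 31             # types 0..31 are VCL in HEVC
    if codec == "avc":
        nal_type = nalu[0] & 0x1F         # 5-bit nal_unit_type
        return 1 <= nal_type <= 5         # types 1..5 are VCL in AVC
    raise ValueError("codec must be 'hevc' or 'avc'")


def vcl_payloads(stream: bytes, codec: str) -> List[bytes]:
    """Collect the payload (bytes after the header) of every VCL NAL unit."""
    header_len = 2 if codec == "hevc" else 1
    return [n[header_len:] for n in split_nalus(stream) if is_vcl(n, codec)]
```

For a toy HEVC stream holding one parameter set and one slice, only the slice payload is returned, which is the data that the later extraction steps operate on.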

3.2. The 3D SCL-HMC Chaotic Map

3.2.1. The Definition of 3D SCL-HMC

In this section, a novel three-dimensional chaotic map, namely, 3D SCL-HMC, is proposed based on the hybrid modulation coupling (HMC) pattern [33]. The seed maps of 3D SCL-HMC are Sine map, Chebyshev map, and a simple linear map, and then, the 3D SCL-HMC is defined as follows:where , , and are the state variables of the 3D SCL-HMC map, , , , and are the system parameters, , , , and . and are the system parameters of Sine chaotic map. and are the system parameters of Chebyshev chaotic map and the simple linear map, respectively. In the 3D SCL-HMC chaotic map, parameter controls amplitude. Parameters and are perturbation frequencies. Parameter controls offset degree. We set ; the attractor of the 3D SCL-HMC map is shown in Figure 2. With three state variables and four system parameters, its distribution is more random than its seed chaotic maps.

3.2.2. Performance Evaluation of the 3D SCL-HMC

The Lyapunov exponent (LE) spectrum and permutation entropy (PE) complexity are used to analyze the dynamical properties of the 3D SCL-HMC map. The LE spectrum is a significant indicator for analyzing dynamics of chaotic maps. Figure 3 shows the LE spectrum versus parameters , and . , , and are LEs. A positive LE means that the adjacent trajectories in this direction are exponentially separated, and it indicates that the system is chaotic. If there exist two or more positive LEs, the system is hyperchaotic. From Figure 3, the 3D SCL-HMC map has three stable positive LEs, verifying that it is hyperchaotic.
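As a small illustration of how an LE is estimated numerically, the sketch below computes the LE of the one-dimensional Sine map, one of the seed maps named in Section 3.2.1. It does not reproduce the LE spectrum of the 3D SCL-HMC map, whose definition is not restated here. The form x(n+1) = mu * sin(pi * x(n)), the parameter name `mu`, and the function name are assumptions made for this example.

```python
import math


def sine_map_lyapunov(mu: float = 1.0, x0: float = 0.3,
                      n_iter: int = 20000, n_skip: int = 1000) -> float:
    """Estimate the Lyapunov exponent of x -> mu * sin(pi * x)."""
    x = x0
    # Discard the transient so the orbit settles onto the attractor.
    for _ in range(n_skip):
        x = mu * math.sin(math.pi * x)
    total = 0.0
    for _ in range(n_iter):
        # |f'(x)| for f(x) = mu * sin(pi * x)
        deriv = abs(mu * math.pi * math.cos(math.pi * x))
        # Guard against log(0) if the orbit lands exactly on a critical point.
        total += math.log(max(deriv, 1e-300))
        x = mu * math.sin(math.pi * x)
    # The LE is the orbit average of log |f'(x)|.
    return total / n_iter
```

A positive return value indicates exponential separation of nearby orbits (chaos), whereas a negative value indicates convergence to a stable orbit. For a three-dimensional map, the same idea is applied to the Jacobian matrix along the orbit, typically with repeated QR re-orthogonalization, which yields three exponents.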

PE complexity is an effective way to evaluate the predictability and randomness of chaotic maps. Figure 4 analyzes the PE complexity of the 3D SCL-HMC map. Compared with three existing chaotic maps [33, 34], it has a larger and more stable PE complexity value. Therefore, the 3D SCL-HMC hyperchaotic map has higher complexity and more stable dynamical behaviors, which makes it suitable for video encryption.
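For reference, the normalized permutation entropy of a sequence can be computed as in the following sketch. It follows the standard ordinal-pattern definition with unit delay; the embedding order used for Figure 4 is not stated in this section, so `order=3` is only an assumed default, and the function name is hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def permutation_entropy(series: Sequence[float], order: int = 3) -> float:
    """Normalized permutation entropy in [0, 1] with unit time delay."""
    n = len(series) - order + 1
    if n <= 0:
        raise ValueError("series is shorter than the embedding order")
    # Each window is mapped to the permutation that sorts its values.
    patterns = Counter(
        tuple(sorted(range(order), key=lambda k: series[i + k]))
        for i in range(n)
    )
    entropy = -sum((c / n) * math.log(c / n) for c in patterns.values())
    # Divide by log(order!) so that a fully random series approaches 1.
    return entropy / math.log(math.factorial(order))
```

A strictly increasing sequence produces a single ordinal pattern and therefore an entropy of 0, while a sequence with no exploitable order structure approaches 1.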

4. The Proposed PSVE Scheme

4.1. The Generation of Z-PRNG

Based on the 3D SCL-HMC map, the random number is generated by

Figure 5 shows the frequency distribution of . It produces an almost uniform distribution at the interval (0, 1), which has good randomness for Z-PRNG. Subsequently, using the inverse transformation, Z-PRNG is designed aswhere denotes the minimal integer that is not less than and is the total number of bytes in each EBSP. ; thus, . For every iteration, is generated by equation (3). Then, the th byte in EBSP is extracted based on Z-PRNG . is also recorded simultaneously, and it determines encryption position. As shown in Figure 6, when the random number is equal to 1, the first byte in EBSP is extracted. Based on the recorded value , the encrypted byte can be put back into the original EBSP. As is a PRNG of Zipf distribution, more front bytes in EBSPs are extracted.
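The inverse-transformation step can be illustrated with the sketch below. The exact form of equation (3) is not reproduced here, so the mapping `ceil((N + 1) ** u - 1)` is only one plausible choice, assumed for this example: it sends a uniform value u in (0, 1) to an integer in [1, N] with probability roughly proportional to 1/z. Python's `random.Random` stands in for the chaotic sequence, and all names are hypothetical.

```python
import math
import random
from typing import List


def zipf_index(u: float, n_bytes: int) -> int:
    """Map a uniform u in (0, 1) to a 1-based byte index in [1, n_bytes].

    Assumed mapping: P(z = k) = log(1 + 1/k) / log(n_bytes + 1),
    so small indices (front bytes) are selected far more often.
    """
    z = math.ceil((n_bytes + 1) ** u - 1)
    # Clamp to guard against u == 0 and floating-point rounding at the top.
    return min(max(z, 1), n_bytes)


def draw_indices(seed: int, n_bytes: int, count: int) -> List[int]:
    """Draw `count` indices; random.Random replaces the chaotic source."""
    rng = random.Random(seed)
    return [zipf_index(rng.random(), n_bytes) for _ in range(count)]
```

With an EBSP of 1000 bytes, this mapping places about a third of the draws within the first 10 bytes, which matches the qualitative behavior described above: the front of each EBSP is sampled much more densely than the tail.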

Here, a comparative experiment is carried out to verify the validity of Z-PRNG. Taking the RaceHorses video sequence as an example, 10 bytes are extracted from the first frame. In Figure 7(b), the bytes are extracted uniformly at random; thus, each byte has the same probability of being extracted. In Figure 7(c), Z-PRNG is used, and more front bytes in each EBSP are extracted. The frame in Figure 7(c) is visibly more distorted, that is, it has higher perceptual security. Hence, Z-PRNG is more suitable for video encryption.

Moreover, the encryption position is fixed in the existing VEAs. The position information of encrypted videos is public and can be obtained without the initial key. In the proposed algorithm, the encryption position varies with the initial key of the 3D SCL-HMC map. Both the position information and the encrypted data are indispensable for decrypting the video. Therefore, compared with existing VEAs, the proposed algorithm based on Z-PRNG is harder to break.

4.2. Video Encryption Algorithm

As shown in Figure 8, the encryption process is independent of the video coding process. After video coding, the specific encryption algorithm is introduced as follows.

4.2.1. Data Extraction

Step 1. based on the video bitstream, we extract the EBSPs in VCL NALU, namely, .

Step 2. we set the initial key of the 3D SCL-HMC chaotic map and then design Z-PRNG by equation (3).

Step 3. for each EBSP in , we generate random numbers based on Z-PRNG . is the number of extracted bytes in each EBSP, which is defined as follows:where is the extraction ratio. By (4), at least 2 bytes are extracted from each EBSP in .

Step 4. based on the random numbers ,…, , we extract the th byte, th byte,…, th byte from each EBSP in . By this way, the extracted data are obtained in .

Step 5. we record the value corresponding to each extracted byte, which determines the encryption position.

4.2.2. Data Encryption

Step 6. we set the initial key of AES-CTR, and we then encrypt the extracted data by AES-CTR. Thus, the encrypted data are obtained.

Step 7. we put back into the original video bitstream based on the encryption position information . Then, the encrypted video bitstream is obtained.

4.3. Video Decryption Algorithm

As shown in Figure 9, the video decryption algorithm is the inverse of the encryption algorithm. Similarly, by the initial key , an identical Z-PRNG is generated, and it is used to extract from . Using the initial key , is decrypted into by AES-CTR. Putting back into based on Z-PRNG , the decrypted video bitstream is obtained.
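The encryption and decryption procedures of Sections 4.2 and 4.3 can be sketched end to end as follows. Several parts are stand-ins and should be read as assumptions: the keystream is built from SHA-256 in counter mode so that the example needs only the standard library (it is not AES-CTR), `random.Random` replaces Z-PRNG, the front-biased mapping is an assumed form of equation (3), and the byte count `max(2, ceil(ratio * N))` is an assumed reading of equation (4). All function names are hypothetical, and every EBSP is assumed to be non-empty.

```python
import hashlib
import math
import random
from typing import List


def keystream(key: bytes, length: int) -> bytes:
    """Stand-in keystream: SHA-256(key || counter). This is NOT AES-CTR."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def zipf_position(u: float, n_bytes: int) -> int:
    """0-based byte position biased toward the front (assumed mapping)."""
    z = math.ceil((n_bytes + 1) ** u - 1)
    return min(max(z, 1), n_bytes) - 1


def psve_transform(ebsps: List[bytes], position_key: int,
                   cipher_key: bytes, ratio: float = 0.001) -> List[bytes]:
    """Extract bytes, XOR them with the keystream, and put them back.

    Because the cipher is an XOR stream, the same call decrypts.
    """
    rng = random.Random(position_key)  # stand-in for Z-PRNG
    plan = []
    extracted = bytearray()
    for ebsp in ebsps:
        # Assumed rule: at least 2 bytes are taken from every EBSP.
        count = max(2, math.ceil(ratio * len(ebsp)))
        idx = [zipf_position(rng.random(), len(ebsp)) for _ in range(count)]
        plan.append(idx)
        extracted += bytes(ebsp[i] for i in idx)
    ks = keystream(cipher_key, len(extracted))
    processed = bytes(a ^ b for a, b in zip(extracted, ks))
    out, k = [], 0
    for ebsp, idx in zip(ebsps, plan):
        buf = bytearray(ebsp)
        for i in idx:
            # If a position repeats, the last write wins in both directions,
            # so the round trip still restores the original byte.
            buf[i] = processed[k]
            k += 1
        out.append(bytes(buf))
    return out
```

Calling `psve_transform` on the encrypted EBSPs with the same two keys restores the original payloads, and the payload lengths never change. With a different position key, the wrong bytes are extracted and the original stream is not recovered, which reflects the two-part key structure described in Section 4.1.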

5. Experimental Results and Performance Analysis

Two experiments are carried out to test the PSVE algorithm. The first one is based on HEVC bitstream and the second one is applied to AVC bitstream.

5.1. HEVC Bitstream

In this section, twelve benchmark video sequences are used to analyze the proposed scheme. Table 1 defines these video sequences. These video sequences have different resolutions, motions, colors, objects, and frame rates. They are compressed into the video bitstream by H.265 encoder HM-16.9 and are decoded based on the FFmpeg library. The proposed PSVE scheme is applied to the twelve benchmark video sequences for low delay mode.

5.1.1. Experimental Results

The extraction ratio is set to 0.001, and QP is set to 24. Figure 10 shows the encryption results of some sample frames. By encrypting only about 0.1% of the video bitstream, the decoded frames after encryption have obvious visual degradation, indicating that the proposed PSVE algorithm is effective.

5.1.2. Key Space Analysis

The key security can be satisfied when the key space is more than . The key structure of the proposed PSVE algorithm has two parts: (1) the initial key of the 3D SCL-HMC chaotic map and (2) the initial key of AES-CTR algorithm . determines the encryption position and is used to decrypt the encrypted bytes. The 3D SCL-HMC chaotic map has three initial values and four system parameters . From Figure 3(a), since the hyperchaotic state of parameter is only stable in the range of [0.7, 1], is set as constant 1 in the proposed PSVE scheme. Then, the key structure of is shown in Figure 11. From Figures 2 and 3(b)–3(d), the 3D SCL-HMC can be ensured in hyperchaotic state when and . If each value is stored as a double-precision floating-point number based on the IEEE 754 standard, the 52-bit fraction can be traversed for , and c. Thereby, the key space . The key length of is 256 bits; thus, the key space . Therefore, the key space of the proposed PSVE algorithm , indicating that brute-force attack can be effectively resisted in the proposed algorithm.
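The arithmetic behind this estimate can be written out under explicit assumptions. Suppose that the chaotic key consists of six double-precision values (the three initial values and the three system parameters that are not held constant, consistent with the six-component key used in Section 5.1.3) and that each contributes its full 52-bit fraction. The symbols S_1, S_2, and S below are introduced only for this illustration, and the result should be read as an upper-bound estimate under these assumptions.

```latex
\begin{aligned}
S_1 &= \left(2^{52}\right)^{6} = 2^{312} && \text{(six 52-bit fractions, assumed)}\\
S_2 &= 2^{256} && \text{(256-bit AES-CTR key)}\\
S   &= S_1 \times S_2 = 2^{312+256} = 2^{568}
\end{aligned}
```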

5.1.3. Key Sensitivity Analysis

To ensure key validity, key sensitivity is used to evaluate cryptosystems. Taking Foreman as an example, is set as {0.1, 0.2, 0.3, 4, 4, 4}, and is set as {0xA0, 0x7F, 0xA5, 0xAE, 0x2C, 0x26, 0xCB, 0xAB, 0xEA, 0xA6, 0x18, 0x3D, 0x59, 0x5C, 0x22, 0x27, 0x31, 0x85, 0xEB, 0x0E, 0x88, 0xFF, 0x00, 0xCF, 0x67, 0x60, 0x4A, 0x7E, 0xDE, 0x46, 0x76, 0xCF}. is slightly changed by only 1 bit to obtain , and is defined as {0xB0, 0x7F, 0xA5, 0xAE, 0x2C, 0x26, 0xCB, 0xAB, 0xEA, 0xA6, 0x18, 0x3D, 0x59, 0x5C, 0x22, 0x27, 0x31, 0x85, 0xEB, 0x0E, 0x88, 0xFF, 0x00, 0xCF, 0x67, 0x60, 0x4A, 0x7E, 0xDE, 0x46, 0x76, 0xCF}. The test results are displayed in Figure 12. When or is slightly changed, the encrypted frame is obviously different. In the decryption process, the video information of the decrypted frame remains obscured when a slight disturbance is applied to the key. Therefore, the PSVE scheme is highly sensitive to its key.

Peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) are two objective indicators to evaluate the quality of the encrypted frames. The PSNR is described as follows:where is the size of the video frame. and represent the original frame and the encrypted frame, respectively. Generally, when , the encrypted frame provides little information about the original frame. The SSIM is defined as follows:where , , , , and are the average, variance, and covariance values of the original frame and the encrypted frame . In (6), L = 255, k1 = 0.01, k2 = 0.03, and SSIM . A smaller value of SSIM implies a higher level of distortion.
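As an illustration of the two indicators, the sketch below computes PSNR and a single-window (global) SSIM for 8-bit frames given as flat lists of pixel values. It uses the standard textbook forms with the constants stated above (L = 255, k1 = 0.01, k2 = 0.03). The global SSIM is a simplification of the windowed SSIM usually reported in practice, and the function and variable names are assumptions made for this example.

```python
import math
from typing import Sequence


def psnr(orig: Sequence[float], enc: Sequence[float], peak: float = 255) -> float:
    """PSNR in dB between two equally sized frames (flat pixel lists)."""
    mse = sum((a - b) ** 2 for a, b in zip(orig, enc)) / len(orig)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


def ssim_global(orig: Sequence[float], enc: Sequence[float],
                L: float = 255, k1: float = 0.01, k2: float = 0.03) -> float:
    """SSIM computed over the whole frame as a single window (simplified)."""
    n = len(orig)
    mu_o = sum(orig) / n
    mu_e = sum(enc) / n
    var_o = sum((a - mu_o) ** 2 for a in orig) / n
    var_e = sum((b - mu_e) ** 2 for b in enc) / n
    cov = sum((a - mu_o) * (b - mu_e) for a, b in zip(orig, enc)) / n
    c1 = (k1 * L) ** 2  # stabilizes the luminance term
    c2 = (k2 * L) ** 2  # stabilizes the contrast/structure term
    return ((2 * mu_o * mu_e + c1) * (2 * cov + c2)) / (
        (mu_o ** 2 + mu_e ** 2 + c1) * (var_o + var_e + c2)
    )
```

Identical frames give an infinite PSNR and an SSIM of 1, whereas a frame compared with its intensity-inverted copy gives a low PSNR and a negative global SSIM.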

Table 2 shows the results without encryption and with the PSVE scheme for different QPs. The average PSNR without encryption is greater than 40 dB. Using the PSVE scheme, the average PSNR is reduced to below 10 dB. This sharp drop in PSNR indicates that the proposed PSVE scheme has a remarkable effect on the quality of the video sequences. The average SSIM without encryption is greater than 0.95, so the video content of the decoded frames can be clearly recognized. By contrast, the average SSIM drops to roughly 0.2 with the PSVE scheme. Hence, the texture and color information of the video frames is well protected.

5.1.4. Edge Detection Attack Analysis

Edge detection is a significant indicator for evaluating the similarity of video frames before and after encryption. Human visual cells are particularly sensitive to the edges of objects. If an encryption algorithm fails to resist edge detection attacks, some edge information can be detected in the encrypted video frames, revealing the outline and texture features of the original video frames. Here, the Laplace edge detection method is used to analyze the edge difference between the original and the encrypted video frames. Figure 13 shows the edge information for the RaceHorses video and the FourPeople video. The results indicate that the structural information is heavily distorted in the encrypted video frames.

5.1.5. Chosen-Plaintext Attack Analysis

The chosen-plaintext attack is a common classical cipher attack. In this case, attackers choose different plaintexts and obtain the corresponding ciphertexts. To break the PSVE algorithm, attackers must carry out two steps to obtain the video information: (1) determining the encryption position and (2) decrypting the encrypted bytes. Here, the twelve benchmark video sequences are encrypted using the identical and . Although and are fixed, remains different for different NALUs. Table 3 lists the first four encryption positions for two NALUs in different video sequences. It indicates that the encryption position still varies with different NALUs. Therefore, with identical and , even if attackers obtain a video sequence and its corresponding encrypted video sequence, the encryption positions and encrypted bytes in other video sequences still cannot be inferred. Thus, the chosen-plaintext attack is effectively resisted in the proposed PSVE algorithm.

5.1.6. Replacement Attack Analysis

To further analyze the security of the PSVE algorithm, a replacement attack is introduced. In this case, the encrypted data are replaced with arbitrary data to obtain plaintext information. Constant 0, constant 1, and the most likely values obtained by statistical analysis are three common replacement choices. For example, Xu et al. replaced the encrypted data with constant values 0 and 1 [5]; Liu et al. substituted the encrypted elements with the most likely values [17]. In the PSVE algorithm, the encrypted bytes have no statistical characteristics, so their most likely values cannot be estimated. Therefore, constant 0, constant 1, and a random sequence are used to replace the encrypted bytes. Figure 14 shows the results of the Mobile video under replacement attack. The encrypted frame #1 and frame #30 in the Mobile video are still not intelligible, verifying the effectiveness against the replacement attack.

To further analyze the resistance of the proposed PSVE algorithm to edge detection attack, the edge difference ratio (EDR) is introduced and it is defined as follows:where is the size of video frames and and represent the edge detection pixel value of the original video frames and the encrypted video frames, respectively. The maximum EDR is 1, and a higher EDR indicates that the structural information is more distorted. The EDR results for different video sequences are listed in Table 2. When directly decoding the video bitstream without encryption, the average EDR of the twelve video sequences is around 0.1. In this case, the decoded video frames can be clearly and objectively recognized by human eyes. After encryption, the average value is about 0.93. The video structural information has been severely confused, indicating that the proposed PSVE algorithm has high credibility to resist edge detection attack.
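A minimal sketch of the edge comparison is given below. It assumes the commonly used form of the edge difference ratio, the sum of |P - P'| divided by the sum of |P + P'| over binary edge maps, which has a maximum of 1 as stated above. The 4-neighbor Laplacian kernel and the threshold value are also assumptions, since the exact detector settings are not given here, and the function names are hypothetical.

```python
from typing import List

Frame = List[List[int]]


def laplace_edges(frame: Frame, threshold: int = 32) -> Frame:
    """Binary edge map from a 4-neighbor Laplacian (border pixels set to 0)."""
    h, w = len(frame), len(frame[0])
    edges = [[0] * w for _ in range(h)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            lap = (frame[i - 1][j] + frame[i + 1][j]
                   + frame[i][j - 1] + frame[i][j + 1]
                   - 4 * frame[i][j])
            edges[i][j] = 1 if abs(lap) > threshold else 0
    return edges


def edr(edges_orig: Frame, edges_enc: Frame) -> float:
    """Edge difference ratio between two binary edge maps (assumed form)."""
    num = sum(abs(a - b)
              for ro, re in zip(edges_orig, edges_enc)
              for a, b in zip(ro, re))
    den = sum(a + b
              for ro, re in zip(edges_orig, edges_enc)
              for a, b in zip(ro, re))
    # Two empty edge maps are treated as identical.
    return 0.0 if den == 0 else num / den
```

Identical edge maps give an EDR of 0, and edge maps with no edge pixel in common give an EDR of 1, so values close to 1 indicate that the edges of the encrypted frame reveal little about the original structure.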

5.1.7. Encryption Efficiency Analysis

Besides security, encryption efficiency is also an essential indicator. The experimental environment is Visual Studio 2013 with an Intel Core i5-10400 CPU @ 2.90 GHz and 16.0 GB RAM on the Windows 10 OS. QP is set to 24. Table 4 shows the execution time of different video sequences. The ratio of encryption time to compression time measures the computational overhead difference between video coding with and without encryption. The average value of is about 0.004%, which indicates that the encryption time is negligible compared with the compression encoding time. Table 5 shows the average ratio of encryption time to compression time for some existing schemes with format compatibility [7, 11, 17, 20, 24]. The experimental results show that the ratio is much lower than that of other existing VEAs. Furthermore, Table 4 also shows the running time of directly encrypting the entire HEVC video bitstream using AES-CTR. The ratio is about 52.427%. The execution time of the PSVE algorithm is therefore much less than that of encrypting the entire HEVC video bitstream, which indicates that the encryption efficiency is greatly improved by the PSVE algorithm. Therefore, the proposed PSVE algorithm is highly efficient for HEVC video encryption.

5.1.8. Compression Ratio and Separability Analysis

Compression ratio and separability are also two significant indicators for VEAs. The compression ratio measures the variation of video size between the original video and the encrypted video. Most existing VEAs are performed after the binarization process in entropy coding. However, the arithmetic coding process may have a slight impact on the compression ratio [17]. In the PSVE algorithm, the video bitstream is encrypted after compression coding, so the video size remains unchanged. As shown in Table 6, the proportion is 0% for different video sequences, verifying that the proposed PSVE scheme has no effect on video compression efficiency. Separability measures the independence between the video compression coding process and the video encryption process. If a VEA is embedded into the compression coding process, the structure of the video coding standard is modified to a specific codec including the encryption scheme, which is thus not applicable to existing standard codecs and not compatible with other video coding formats. The PSVE algorithm is completely independent of the video compression process. Thereby, it has separability. With satisfactory performance of compression ratio and separability, the proposed algorithm is suitable for practical applications.

5.2. AVC Bitstream

The AVC bitstream is selected as another example to verify the effectiveness of the proposed PSVE algorithm. The videos are encoded into video bitstreams by the H.264 encoder JM-19.0 and are decoded by the FFmpeg library. We set . Mobile, PartyScene, and ChinaSpeed are used to assess the subjective visual quality of the PSVE scheme on H.264/AVC. Figure 15 shows the original video frames and the encrypted video frames. When approximately 0.1% of the video bitstream is encrypted, the encrypted frames have obvious distortion compared with the original decoded frames. To further analyze the visual degradation, Figure 16 presents the objective indicators for six test video sequences of different resolutions. Both PSNR and SSIM values show a significant degradation, verifying the validity of the PSVE algorithm.

5.3. Comparative Analysis

Here, the performance of the proposed PSVE algorithm is compared with several existing encryption schemes. Table 7 shows the comparison results concerning key space, chosen-plaintext attack, compression ratio, separability, and time complexity. From Table 7, existing VEAs are sufficient to satisfy real-time requirements, and some VEAs also have a constant compression ratio. Nevertheless, the encryption position in most existing VEAs is public, and the key space is only determined by the cipher technique used in these VEAs. Additionally, most of these VEAs fail to resist chosen-plaintext attack, and they are embedded into video coding, so they are not separable. In contrast, the key structure of the proposed PSVE algorithm consists of and . determines the encryption position and is the initial key of cipher technique AES-CTR. Therefore, the PSVE scheme has a large key space. Besides, since the encryption position is closely related to and the encryption process is completely separable from the video coding process, the proposed PSVE scheme is highly capable of resisting chosen-plaintext attack and satisfies the separability requirement. Therefore, it is more reliable in practical applications.

6. Conclusion

In this study, a novel video encryption algorithm is proposed based on Z-PRNG. It provides visual protection for HEVC video and AVC video. Z-PRNG is designed based on the 3D SCL-HMC hyperchaotic map, and it is used to determine the encryption position in the video bitstream. Subsequently, the AES-CTR algorithm is used to encrypt the extracted bytes. The performance is analyzed in terms of key space, key sensitivity, edge detection attack, chosen-plaintext attack, replacement attack, encryption efficiency, compression ratio, and separability. The results show that the algorithm has a large key space and high security against common attacks. With a lower proportion of encrypted bytes, its time complexity is much lower than that of existing VEAs. Furthermore, the proposed algorithm leaves the compression ratio unchanged, and it is completely independent of video coding. Therefore, the PSVE algorithm lays a foundation for the development of VEAs and is applicable to video security. In the future, the proposed encryption algorithm will be applied to other information carriers, such as text, image, and audio.

Data Availability

The data that support the study can be obtained from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported in part by the Science, Technology and Innovation Commission of Shenzhen Municipality (Grant nos. WDZC20200818121348001, KCXFZ202002011010487, and SGDX2019091810120169).