Abstract

With the increasing application of advanced video coding (H.264/AVC) in the multimedia field, a great significance to research in video watermarking based on this video compression standard has been established. We propose a semifragile video watermarking algorithm, which can simultaneously implement frame attack and video tamper detection, herein. In this paper, the frame number is selected as the watermark information, and the relationship of the discrete cosine transform (DCT) nonzero coefficients is used as the authentication code. The subblocks, whose DCT nonzero coefficients are sufficiently complex, are selected to embed the watermark. The parities of these nonzero coefficients in the medium frequency are modulated to embed watermarks. The experimental results show that the visual quality of the embedded watermarked video is virtually unaffected, and the algorithm exhibits good robustness. Furthermore, the algorithm can correctly implement frame attack and video tamper detection.

1. Introduction

The rapid development of internet and multimedia technology has brought about the increasing popularity of digital videos, such as network TVs, online videos, and mobile videos [15]. These video applications greatly enrich people’s lives and engender ample convenience. However, the proliferation of videos and video-sharing also causes a variety of serious problems. With the help of various video editing tools, people can easily tamper and edit public video content illegally, as well as copy and spread it freely. However, these behaviors endanger the legitimate rights of the copyright owners. Recently, several major video hosting portals, such as Youku and Tudou, have been involved in copyright disputes, causing enormous losses to media manufacturers and restricting the application of digital multimedia. To protect the copyright ownership of original videos or to provide the content authentication of original works has become an urgent problem to grapple with. As an effective method to resolve the problem of copyright protection and content authentication of multimedia, digital watermarking technology has become a research focus in the field of information security [68]. The basic idea of video watermarking is to add some extra information to the original video without affecting its visibility, which can provide the evidence of copyright or integrity authentication for the video content when necessary.

With the acquirement of video quality, video sizes are becoming progressively larger. Due to their storage capacity and bandwidth, nearly all digital videos are transmitted with compression coding on the Internet or other transmission channels. Thus, video watermarking algorithms combining video compression standards are more applicable. As a new generation of video coding standard, the application of the advanced video coding (H.264/AVC) is widespread because of its higher compression efficiency and better network affinity. Presently, research on video watermarking algorithms based on H.264/AVC is becoming increasingly more active.

Usually, video watermarking is categorized into three types, namely, the fragile, robust, and semifragile watermarks. The fragile watermarking scheme is used to validate integrity authentication, and thus, it should be sensitive for all video manipulations with good transparency and large watermark capacity. Robust watermarking needs to resist most common video processing activities, such as recompression and filtering, and perhaps presents a greater sacrifice on transparency and watermark capacity. It is mainly applied for copyright protection. Semifragile watermarking is insensitive to common video processing operations, but it is sensitive to malicious attacks, and thus, it is mainly applied in tamper detection.

These different types of watermarking schemes based on (H.264/AVC) have developed in varying degrees in recent years, i.e., all types of watermarking schemes based on H.264/AVC have undergone considerable development. More researchers are beginning to concentrate on the semifragile video watermarking method because of its application for tampering detection [914]. Among them, the proposed algorithms in [911] can detect and locate tampering with different precision levels. Farfoura [10] provides a semifragile watermarking scheme for H.264/AVC. The scheme has a certain robustness for content-preserving manipulations and is sensitive to manipulations with content-changing. The algorithm utilizes dual watermarking. One watermark is embedded into I-frames to detect spatial tampering, and the other watermark is embedded into P-frames to identify temporal attack. Xu et al. [13] proposed a novel semifragile watermark, which generates feature codes through block energy and then embeds watermarks by using discrete cosine transform (DCT) coefficients on the diagonal, which has little effect on video quality. Cedillo [14] calculated the variance of the original pixel and the reference pixel in the blocks and selected the subblock with the smallest variance as the block to be embedded, and then hid the watermark content according to the coding characteristics of the prediction mode value. The algorithm has good transparency, but it needs to calculate the variance value of blocks, which has a large computational complexity. In [9], a semifragile video watermarking algorithm is proposed. In the context-adaptive variable-length coding (CAVLC) entropy encoding process of luminance block DCT coefficients, the watermark is embedded by modifying the last nonzero coefficient after zig-zag scanning. However, the algorithm is more complicated in design. In [15], a chaotic semifragile watermarking algorithm for copyright authentication is proposed. The time information of the video frame is used as a parameter combined with the modulation of the chaotic algorithm to be embedded in the video, thereby generating embedded watermark information and embedding it into the DCT coefficient. During the parsing process, mismatch of time information can be used to reveal tampering in the time domain. The disadvantage is that it is an uncompressed domain algorithm and cannot be applied to H.264/AVC. Facciol and Farrugia [16] proposed a reversible watermarking algorithm that applies differential extension rules to DCT block quantized coefficients. The algorithm firstly performs 16 quantized 4 s based on zig-zag scanning at the encoding end. The DCT coefficients are divided into eight groups, and it is determined whether or not each set of coefficients can be embedded in the watermark, and then the watermark is embedded therein.

Combining the advantages and disadvantages of the abovementioned existing algorithms, we herein propose an algorithm for simultaneously realizing video time domain and spatial domain integrity authentication. In our algorithm, the frame number is utilized as the watermark information, and the numerical relationship of the DCT coefficients is adopted as the authentication code. The watermark is embedded by changing the DCT coefficients of the luminance subblocks with more nonzero coefficients. Finally, the integrity of the video is detected by the watermark information, and the tamper detection is implemented by the authentication code.

This paper is organized as follows: in Section 2, the generation of watermarking and authentication code is given first, and then two parts of the scheme are briefly described in Section 3. Experimental results are given in Section 4, and the conclusions are drawn in the last section.

2. Generation of Watermark and Authentication Code

2.1. Generation of Watermark

To identify and verify a frame attack, one idea is to embed the frame number as watermark information into the current frame, such that the watermark information extracted from the video represents the frame number information of the video frame. If the extracted watermark can match the frame number, it indicates that the video is not subject to frame attack. Otherwise, the frame attack can be determined according to the relationship between the specific watermark and the frame number.

Here, we propose a new semifragile video watermarking algorithm, which is intended to encode the number of video frames into a binary sequence, and embed this binary sequence as watermark information into the current frame. After the watermark is extracted, the watermark sequence is decoded to a decimal number again. Finally, this value is compared with the number of the video frame to verify and identify the frame attack.

When the video is recompressed, the watermark extraction would be misplaced due to the transition of the prediction mode. A large number of experimental results prove that the higher the texture complexity of the video, the better the robustness of the watermark. To improve the robustness of the watermark, we choose to embed the watermark into the video region with a complex texture. To improve the correctness of frame number restoration, the watermark sequence is divided into three segments and embedded in the video in a loop. The specific watermark information scheme is as follows:

Step 1: combine the number of frames of a common video; we choose to encode the number value of the video frame into an 18-bit binary sequence. For example, if the frame number is 10, the binary sequence converted is 000000000000001010. The sequence does not have to be 18-bit, this can be decided according to the actual situation.

Step 2: divide the binary sequence into three groups. The first six values are the first group, the middle six values are the second group, and the last six values are the third group. We will embed these three sets of binary sequences as the watermark values to the frame. For example, if the frame number is 10, the binary sequence is 000000000000001010, the first set of sequences is 000000, the second set of sequences is 000000, and the third set of sequences is 001010.

Step 3: the watermark information sequence is handled by Arnold scrambles to obtain a secure watermark sequence. Finally, the watermark sequence should be embedded in the video repeatedly.

2.2. Generation of Authentication Code

The attack identification and verification of the frame can be implemented by the watermark information, and the tamper detection of the video requires the participation of the authentication code. The type of attack that the video is subjected to is usually determined based on the degree of change of the authentication code. Therefore, the authentication code needs to be insensitive to conventional attacks but sensitive to malicious attacks.

The main goal of major video tampering is to alter the interesting targets, which normally have complicated textures rather than the background, which is the flat region [17]. Therefore, we generate the authentication code from the region with complex texture region according to the DCT coefficients. In our algorithm, we select the numerical relationship of the video DCT coefficients as the authentication code, and the experimental results show that the authentication code is not sensitive to conventional attacks but to malicious attacks. The algorithm chooses to embed a watermark on the last nonzero coefficient. Therefore, the numerical relationship between the second last nonzero coefficient and the third last nonzero coefficient is specifically selected as the authentication code of the algorithm. The specific authentication code scheme is as follows:

Step 1: traverse each macroblock of each frame of video to determine the prediction mode of the macroblock. If the prediction mode of the macroblock is , skip it. If the prediction mode of the macroblock is , the execution continues.

Step 2: determine whether the macroblock includes six or more subblocks, which have at least three nonzero coefficients, if not, skip it. Otherwise, the execution continues.

Step 3: for the macroblock that satisfies the abovestated conditions, six subblocks with the most nonzero coefficients are selected, and the authentication code R is extracted in the order of the number of nonzero coefficients. Each subblock extracts a 1-bit authentication code, and each macroblock extracts a 6-bit authentication code. The authentication information is generated by comparing the relationship between the second last nonzero coefficient, , and the third last nonzero coefficient, , in each subblock, and the authentication codes are saved. The specific scheme is shown in Equation (1).

Step 4: after the aforementioned steps, we would get the authentication code array A[i](, is the sum of the authentication codes) and the authentication code flag array flag [j], corresponding to each macroblock (, is the sum of all macroblocks). Meanwhile, the watermarking image, including logo information are scrambled and converted into a binary sequence, , through dimension reduction. Then, the length of the sequence, , is calculated. Perform exclusive or operations on each element of array and the corresponding element of array in sequence, and the result is saved in array .

3. Watermark Embedding and Extraction Algorithm

3.1. Watermark Embedding Algorithm

Different from robust video watermarking, a semifragile video watermark is often derived from the characteristics of the video itself. To avoid misplacing the extracted watermark, the macroblock that does not satisfy the certain condition can also be embedded with a watermark with a value of -1 to achieve the purpose of “placeholder.” We choose to modify the parity of nonzero coefficients in a complex macroblock to embed the watermark. The specific watermark embedding scheme is as shown in Figure 1, and the detailed steps are as follows:

Step 1: the watermark binary sequence is equally divided into three sets of binary sequences.

Step 2: traverse all the macroblocks in the current frame, and select a macroblock that satisfies the following condition to perform the embedding of the binary watermark. The complex macroblocks include six or more subblocks, which have at least three nonzero coefficients. To avoid nonsynchronization of the extracted watermark, all other macroblocks in the frame that do not satisfy the abovementioned conditions are embedded into six watermark information units with a value of -1, and the video content is not modified.

Step 3: embed the watermark in the macroblock that meets the requirements of Step 2. Next, embed six binary watermarks in each luma macroblock, and select the watermark sequence value to be embedded, according to the remainder of the macroblock number divided by three.

Step 4: for the macroblock that satisfies the abovestated conditions, select six subblocks with the most nonzero coefficients, and embed the watermark in the order of the number of nonzero coefficients.

Step 5: for each filtered subblock, square the sum of the second last nonzero coefficient, , and the third last nonzero coefficient, , in each subblock. The watermark is embedded by modifying the last nonzero coefficient to satisfy the identical parity of the sum and the watermark value. The specific modification approach is as follows:

if and :

if and : where is the last nonzero coefficient in , is the sum of all nonzero coefficients, is the embedded watermark, and , represent rounding up and rounding down.

3.2. Watermark Extraction Algorithm and Authentication Code Judgment

The process of watermark extraction is equivalent to the inverse process of watermark embedding. The extracted watermark information contains the information of the frame number, so the frame number needs to be restored. The specific watermark extraction scheme is shown as Figure 2, and the detailed steps are as follows:

Step 1: traverse all the macroblocks in the current frame, and select a macroblock that satisfies the following condition to perform the embedding of the binary watermark. The prediction mode is and includes six or more subblocks, which have at least three nonzero coefficients. For all macroblocks in the frame that do not satisfy the abovementioned conditions, extract six watermarks with a value of -1.

Step 2: if the macroblock satisfies the aforementioned conditions, the six subblocks with the most nonzero coefficients are selected, and the watermark is extracted according to the order of the number of nonzero coefficients. If the square of the last nonzero coefficient is larger than the square sum of the second last and third last nonzero coefficients, then the corresponding watermark is one. If the square of the last nonzero coefficient is less than or equal to the square sum of the second last and third last nonzero coefficients, then the corresponding watermark value is zero. At the same time, the authentication code is restored.

Step 3: after extracting all the watermarks, the watermark sequences are grouped and classified. The watermark sequence should be classified into three groups. If the macroblock number is divided by three and the remainder is one, the watermark sequence corresponding to the macroblock belongs to the first group, and the other types are the same. The most frequently occurring watermark sequence is the correct result. The three sets of sequence values are then integrated into an 18-bit binary sequence that is converted to a decimal number, which is the number of the current frame.

Here, we should first determine whether the whole video has been maliciously attacked by the extracted watermark information. If the watermark information is completely correct, we assume that the video has not been tampered maliciously. Otherwise, the correct frame number cannot be extracted from the watermark sequence, and we presume that the video may be attacked unconventionally. In this case, the authentication code is adopted to determine the occurrence of tampering and locate the specific modification area in the tampered frames.

Given that we make use of the relationship between the second last nonzero coefficient, , and the third last nonzero coefficient, , as the authentication code, recompression can be resisted. If malicious content tamper in a frame occurs, it can be recognized because there would be a large change in the authentication code. The specific criteria for judgment are as follows:

Step 1: firstly, check whether the extracted watermark has problems in the macroblock. If the watermark information does not synchronize, continue to Step 2. Otherwise, we consider that there is no tampering.

Step 2: if the frame number based on the watermark extraction is incorrect, it is indicative that the video has been subjected to conventional processing or malicious tampering. Firstly, we should judge whether the flag of this macroblock is variant. If the flag is changed, the extracted information of the current macroblock is distant from the information in the encoder, and we can directly affirm that the current macroblock has been tampered. Otherwise, continue to Step 3.

Step 3: under the condition that the flag is changeless and the watermark is extracted successfully, we generate the authentication code through the same method in encoding. The number of 6-bit authentication code changed is necessary to make further decisions.

If there are or more authentication codes that change greatly in the current macroblock, it means that the macroblock has been tampered with maliciously. Otherwise, it means that the macroblock has been subjected to conventional video processing. Here, is a threshold, and it should be adjusted by the complexity of the video. Denote the authentication code sequence in the decoding as array , and perform exclusive or operations on each element of array and the corresponding element of array in sequence, and the result is saved as W’. Rearranging W’ into a two-dimensional array, the logo image can be reconstructed.

4. Results and Discussion

In this section, we would test the performance of our algorithm under different conditions. We realize our algorithm based on the reference joint test model (JM) software JM by H.264 using the version 8.6. Standard video sequences, including akiyo_qcif, container_qcif, mobile_qcif, news_qcif, tempete_qcif, carphone_qcif, coastguard_qcif, silent_cif, highway_cif, and flower_cif, are provided to demonstrate the effectiveness of the proposed watermarking algorithm. The size of all videos are and with quarter common intermediate format (QCIF) and common intermediate format (CIF), which adopts the baseline encoding mode. The frame rate is set to 30 frames per second, and the default quantization parameter is set to 28. The logo image is two-valued images, as shown as Figure 3.

4.1. Imperceptibility Test

To test the imperceptibility, we provide the comparison of the original sequence and the same reconstructed frame with embedded watermark for the different video. By the space limitation, we provide three video results, which are shown in Figure 4, whose left column is some frame of original sequence, and middle column is the corresponding frame of the same sequence with watermarking, and the right column is the pixel difference of both as seen in Figure 4; the video quality is almost unchanged by subjective judgment.

To better evaluate the invisibility of our watermarking method in an objective way, we introduce two evaluation standards of video quality evaluation, i.e., PSNR (peak signal to noise ratio) and SSIM (structural similarity index) [18] to compare the original video and the video with the embedded watermark. To eliminate the influence of randomness, we give the average of the PSNR and SSIM for all the frames in the video and compare our algorithm to [10]. The comparison results are as shown in Figures 5 and 6, respectively. Normally, if the PSNR is higher than 30, the quality of the processed image is better. From Figure 5, we can see that the PSNRs of our algorithm for various formats and different videos are all higher than 34.0, which proves that our method has inconsiderable effect on video quality. We also compare to [19], the result is shown in Figure 6. Though our method is slightly inferior than [19], the difference is very small, and other indicators have greater ground. From Figure 7, we can know that all the SSIMs of our algorithm are above 0.960, which is very close to 1, and this also proves that the image quality is fine. All SSIM results of different videos are better than [10], as shown in Figure 8. Because we only embed the information into the areas with more complex textures, the impact it has on the video quality is minimal. The abovementioned experiment results evidently prove this fact.

4.2. Bit Change Rate Test

For a watermarking method, the bit-change rate should be also considered. After all, nobody likes the size of the compressed video becoming bigger after embedding the watermark. If the bit-change rate is smaller, we can get better video affinity [20]. We can calculate the bit-change rate as follows:

The bit-change rate of our algorithm is very low, and the bit-increase rates of the video sequence of container, foreman, mobile, and news are only 0.08%, 0.06%, 0.04%, and 0.08%, respectively. Notwithstanding, the bit-rate changes in [17] are apparent. Comparing with [10, 17, 19], the results are shown in Figure 9. From the comparison results, we can see that our algorithm outperforms the compared ones. The smaller the bit-rate change, the better the real-time performance of the algorithm. Therefore, the proposed algorithm has a better real-time performance and is more suitable for real-time broadcast applications.

4.3. Robustness Test

Several experiments are conducted to verify the robustness of our watermarking algorithm against recompression. At first, the extracted watermark is provided in Table 1, under the condition that the video with the embedded watermark information is recompressed with equal quantization values ( and ). From Table 1, we can see that the logo image is clear, which reflects that our algorithm has better performance against recompression.

For objectively evaluating the performance of our algorithm against recompression, the NC (bit accuracy rate) and BER (bit error rate) are adopted to further measure the robustness of the algorithm. The closer the NC is to one and the nearer the BER is to zero, the video distortion from the watermark is less, and the robustness of the algorithm is better. The experimental results of the NC and BER are given in Figures 10 and 11, respectively. From them, it is revealed that under the condition of equal quantization and recompression, the NC is higher than 0.91, and it is stable at the mean value of 0.97. When the NC is higher than 0.8, it means that this algorithm has decent robustness. In the meantime, the BER of all sample videos is lower than 0.01 and the average BER is 0.028, which indicates that the proposed method has good robustness for recompression. Compared with [17], our algorithm has less BER than [10, 17] for the same test video, which proves the algorithm has stronger robustness under equal compression quantization, as shown in Figure 11.

4.4. Tamper Detection Test

Based on our watermarking method, the authentication code is used to locate intraframe tampering, and the watermark is used to detect malicious operations, such as addition and deletion between frames. To test the performance of the algorithm to resist the interframe attacks, many experiments are designed, including frame dropping, frame addition, and frame replacement. In the experiment of the frame dropping attack, the frames 40–80 of the video sequence are dropped. The detected result of our method is shown in Figure 12. We can see clearly that our method can successfully extract the reserved frame number (1–40) and identify the removal of frames 40–80. In the experiment of the frame adding attack, we added frames 61–80 to the 41st frame. The Figure 13 shows the results, wherein we can find our method is also effective for this attack. Finally, we experiment the method with frame replacing attack by substituting frames 61–80 with frames 21–40, and Figure 14 shows that our algorithm can detect such changes successfully.

Through experiments, we also provide the proportion of verification code changes of different video sequences after recompression. The statistical results are shown in Table 2 below.

Then, the proportion of verification code changes of different video sequences after tampering is also provided. The statistic results are shown in Table 3 below.

From the above tables, it is found that the rate of change of videos after recompression and tampering varies greatly.

Next, we tested the effectiveness of the proposed method to the tamper in a frame. Based on the method, if the current macroblock has or more authentication codes changed, it indicates that the macroblock has been maliciously tampered. Otherwise, the macroblock is processed normally. The value of is determined according to the actual situation. In this experiment, we implement the specific tamper to copy and paste the top right corner of the first frame of the mobile video sequence. By comparing the authentication codes of each macroblock whether changed to prove the effectiveness of the proposed method. The results of the experiment are shown in Figure 15. The right column of Figure 15 gives the location of the tamper, and we can see an evident region in the top right side.

The detection accuracy rate (DAR) is computed by , where TPR and TNR denote true positive rate and true negative rate, respectively. Table 4 shows the experimental results that represent the DAR of our method on different QPs. Also, the results illustrate that our method is efficient to handle content based tamper detection.

5. Conclusions

Herein, we propose a semifragile video watermarking algorithm based on content authentication, which has good robustness to recompression under different QPs and can simultaneously implement frame attack and video tamper detection. The frame number is binary and is converted into an 18-bit long binary sequence as a watermark. The sequence is divided into three sets of NNZ residual coefficients embedded in the DCT of the video to detect frame attack in the video. To realize the tampering detection of the video, the size relationship of the DCT nonzero coefficient is selected as the authentication code to distinguish whether the video is subjected to normal operation or malicious tampering. The experimental results show that the algorithm has good watermark invisibility, and the algorithm uses a multivalued watermark to embed, which avoids the nonsynchronization of the extracted watermark and greatly reduces the BER value of the watermark after recompression. Compared with the current algorithms, the algorithm has better invisibility and robustness, and it also has a good ability of frame attack and video tamper detection.

Data Availability

The video we use to test can be downloaded from http://trace.eas.asu.edu/yuv/index.html.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work is supported by the National Natural Science Foundation of China under grant no. 61901356 and the HPC Platform of Xi’an Jiaotong University.

We would like to thank Editage (http://www.editage.com/) for English language editing.