Abstract

Data hiding is a technique that allows secret data to be delivered securely by embedding the data into cover digital media. In this paper, we propose a new data hiding algorithm for H.264/advanced video coding (AVC) of video sequences with high embedding capacity. In the proposed scheme, to embed secret data into the quantized discrete cosine transform (QDCT) coefficients of frames without any intraframe distortion drift, some embeddable coefficient pairs are selected in each block, and they are divided into two different groups, i.e., the embedding group and the averting group. The embedding group is used to carry the secret data, and the averting group is used to prevent distortion drift in the adjacent blocks. The experimental results show that the proposed scheme can avoid intraframe distortion drift and guarantee low distortion of video sequences. In addition, the proposed scheme provides enhanced embedding capacity compared to previous schemes. Moreover, the embedded secret data can be extracted completely without the requirement of the original secret data.

1. Introduction

With the rapid development of multimedia and network technology, a huge amount of digital media, e.g., images, video, audio, and text, is transmitted each second over public networks such as the Internet. Such transmitted media are easily modified or illegally copied by malicious attackers. As a result, the security of private information has become a very important issue. Many solutions have been proposed in the literature, and they can be divided into two categories, i.e., encryption and data hiding techniques. Because encryption converts the media into a meaningless form, it signals that the content is important and can therefore attract the attention of attackers. Data hiding is a promising alternative for protecting the security and privacy of digital media because it embeds the secret data into cover media. The embedded media that contain the secret data retain a meaningful form, which helps avoid attracting the attention of attackers and protects the security and privacy of the secret data.

Many data hiding schemes [1–7] have been proposed for different digital data in the last decade. Since the H.264/AVC video compression standard [8] was introduced in 2003, it has been used extensively for hiding secret data [9–19]. In 2005, Noorkami et al. [9] proposed a low-complexity watermarking algorithm that uses the relative change of the DC coefficients of the 4 × 4 block. In their scheme, a public key is used for determining the embedded data, while the copyright owner possesses a secret key. In 2006, Nguyen et al. [10] proposed a fast watermarking system based on H.264/AVC motion vectors. However, their scheme offered low embedding capacity; i.e., the average embedding capacity was only about 2,000 bits. Then, to achieve robustness, Zhang et al. [11] proposed a new data hiding scheme with a 2D, 8-bit watermark in the compressed domain. Noorkami and Mersereau [12] introduced a framework for robust watermarking of H.264 encoded video that uses the quantized AC coefficients to obtain optimal detection of the video watermark. By using a texture masking-based perceptual model, Gong et al. [13] proposed a fast and robust watermarking scheme for H.264 video. In this scheme, the quantized DC coefficients were used to conceal the watermark. However, in these two schemes [12, 13], the original watermark is required for detection. In 2007, Kapotas et al. [15] proposed blind data hiding in an H.264 stream by using the difference of block sizes during the interframe prediction stage to increase the embedding capacity. However, Kapotas et al.’s scheme had a high bit-rate increment. To solve this issue, Kim et al. [16] embedded the watermark bit into the sign bit of the trailing ones in Context Adaptive Variable Length Coding (CAVLC).
However, embedding the watermark in the discrete cosine transform (DCT) coefficients of the I-frames in these previous schemes results in stego videos that have low visual quality, which is caused by the distortion drift in the I-frame prediction. To overcome this shortcoming, in 2010, Ma et al. [17] proposed a new algorithm for data hiding in H.264/AVC based on DCT coefficients. Their scheme used several paired coefficients of a 4 × 4 block to embed the secret data while avoiding distortion drift. However, this scheme obtained limited visual quality. To improve the visual quality of stego videos, Huo et al. [14] proposed a new data hiding scheme that uses a controllable error drift-elimination technique. However, their scheme obtained unsatisfactory embedding capacity. To increase the embedding capacity of Ma et al.’s scheme, Lin et al. [18] fully used the remaining luminance blocks to hide the secret data. Their experimental results indicated that the embedding capacity of Lin et al.’s scheme was 0.15 bits per pixel (bpp) higher than that of Ma et al.’s scheme. However, the average embedding capacity of these two schemes [17, 18] is still low, i.e., always smaller than 0.68 bpp, because each pair of DCT coefficients is used separately to hide one secret bit. Consequently, as the amount of embedded secret data increases, the distortion of the video becomes greater. To overcome these shortcomings, we propose in this paper a new data hiding scheme for video sequences without intraframe distortion drift. Instead of using each coefficient pair separately for embedding data, all embeddable coefficient pairs in each luminance block are determined and classified into two different clusters, i.e., the embedding group and the averting group. The secret data are hidden in the embedding group with minimum modification, while the averting group is used to avoid distortion drift.
The experimental results showed that the proposed algorithm further improved the embedding capacity while maintaining good visual quality and no distortion drift.

The remainder of this paper is organized as follows. Section 2 provides information about intraframe prediction and introduces the previous data hiding schemes. The proposed scheme is explained in Section 3. In Section 4, experimental results are presented that illustrate the performance of the proposed scheme in comparison with some previous schemes. Our conclusion is presented in Section 5.

2.1. Intraframe Prediction

Intraframe prediction is a technique in H.264/AVC coding [7] that is used to reduce the spatial redundancies of H.264/AVC intraframes. In H.264/AVC coding, for each block, some previously encoded adjacent blocks are used to predict the pixels of the current block. Figure 1 shows the current 4 × 4 block with its pixels labeled from a to p. These pixels are predicted based on the reference pixels (labeled from A to M) of four adjacent blocks. These four blocks were encoded previously, and the prediction uses the formula corresponding to the optimal mode selected from the nine prediction modes of a 4 × 4 block shown in Figure 2.
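The prediction step can be illustrated with a minimal sketch of three of the nine 4 × 4 modes. It assumes the usual H.264/AVC labeling in which A to D are the reference pixels above the block and I to L are those to its left; the function name `predict_4x4` is hypothetical, and the six directional modes are omitted.

```python
def predict_4x4(mode, above, left):
    """Return a 4x4 prediction block for Intra 4x4 modes 0, 1, and 2.

    above: reference pixels A..D (row directly above the block)
    left:  reference pixels I..L (column directly left of the block)
    """
    if mode == 0:
        # Vertical: every row repeats the pixels above the block.
        return [list(above) for _ in range(4)]
    if mode == 1:
        # Horizontal: every column repeats the pixels left of the block.
        return [[left[r]] * 4 for r in range(4)]
    if mode == 2:
        # DC: all pixels take the rounded mean of the eight reference pixels.
        dc = (sum(above) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("only modes 0, 1, and 2 are sketched here")


vertical = predict_4x4(0, [10, 20, 30, 40], [1, 2, 3, 4])
horizontal = predict_4x4(1, [10, 20, 30, 40], [1, 2, 3, 4])
dc_block = predict_4x4(2, [10, 20, 30, 40], [1, 2, 3, 4])
```

Because the prediction copies or averages pixels of neighboring blocks, any change to those reference pixels carries over into the predicted block, which is the origin of intraframe distortion drift.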

In H.264/AVC encoding, the prediction block is subtracted from the current block to obtain the residual block. The residual block is then encoded by the subsequent processes of H.264/AVC coding, i.e., transformation, quantization, and entropy coding. For simplicity, the 4 × 4 integer DCT transformation and quantization processes are applied to the residual block to generate the corresponding quantized DCT coefficient matrix by (1), where the size of the quantization step is defined by the quantization parameter (QP).

After H.264/AVC coding, the block is represented by its prediction block and its quantized DCT coefficients. To encode the next block, the encoded block must first be reconstructed from these two components. The dequantization and inverse DCT operations are applied to the quantized coefficients to reconstruct the residual block, and the reconstructed block is calculated as the sum of the prediction block and the reconstructed residual block.

In the decoding phase, the residual block is reconstructed by applying the dequantization process and the 4 × 4 integer inverse DCT transformation to the quantized coefficients by (2), and the reconstructed block is obtained as the sum of the prediction block and the reconstructed residual block.
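The transform, quantization, and reconstruction steps can be sketched as follows. This is a simplified floating-point illustration built on the commonly cited 4 × 4 integer transform matrices of H.264/AVC with scaling constants a = 1/2 and b = sqrt(2/5); the names `forward`, `inverse`, `SF`, and `SR` are assumptions introduced here, and a real encoder uses integer arithmetic with QP-dependent multiplier tables instead.

```python
import math

A = 0.5
B = math.sqrt(2.0 / 5.0)

# Core matrices of the forward and inverse 4x4 integer transforms.
CF = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]
CR = [[1, 1, 1, 1], [1, 0.5, -0.5, -1], [1, -1, -1, 1], [0.5, -1, 1, -0.5]]
SF = [A, B / 2, A, B / 2]  # forward scaling factor per row/column index
SR = [A, B, A, B]          # inverse scaling factor per row/column index


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(x):
    return [list(row) for row in zip(*x)]


def forward(residual, qstep):
    """Residual block -> quantized DCT coefficients (rounded to integers)."""
    w = matmul(matmul(CF, residual), transpose(CF))
    return [[round(w[i][j] * SF[i] * SF[j] / qstep) for j in range(4)]
            for i in range(4)]


def inverse(coeffs, qstep):
    """Quantized DCT coefficients -> reconstructed residual block."""
    w = [[coeffs[i][j] * qstep * SR[i] * SR[j] for j in range(4)]
         for i in range(4)]
    return matmul(matmul(transpose(CR), w), CR)


flat = [[8] * 4 for _ in range(4)]
flat_coeffs = forward(flat, 1.0)        # only the DC coefficient is nonzero
flat_back = inverse(flat_coeffs, 1.0)   # reproduces the flat block

residual = [[5, -3, 2, 0], [1, 4, -2, 7], [0, 0, 3, -1], [-6, 2, 1, 1]]
rebuilt = inverse(forward(residual, 1.0), 1.0)
max_error = max(abs(rebuilt[i][j] - residual[i][j])
                for i in range(4) for j in range(4))
```

The reconstructed block is then the prediction block plus the rebuilt residual; the only loss comes from the rounding in `forward`, which grows with the quantization step.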

2.2. Data Hiding Schemes Based on Intraframe Prediction

In 2010, Ma et al. [17] used three pairs of quantized DCT coefficients in each block for embedding data into H.264/AVC video sequences. To prevent intraframe distortion drift during the embedding process, they analyzed the use of seven pixels of the reconstructed block, i.e., the pixels in its rightmost column and bottom row, for the intraprediction process. Figure 3 shows the four adjacent blocks that are affected directly by the above process. For example, if the selected prediction mode of the block below the current block is 0 (vertical), as shown in Figure 2, then the predicted pixels of that block are modified by embedding data into some coefficients of the current block.

During the embedding process, the seven pixels and the selected prediction modes of the four adjacent blocks are classified into three different conditions, i.e., Con1, Con2, and Con3, each of which is defined by a set of prediction modes of the adjacent blocks. Under Con1, the pixels in the rightmost column of the current block are referenced for predicting the block to its right. Under Con2, the pixels in the bottom row are referenced for predicting the block below-left or the block below. Under Con3, the bottom-right pixel is referenced for predicting the block below-right.

To take advantage of the relationship of the reference pixels and the selected prediction modes for embedding data, these three conditions also can be classified into the five categories presented in Table 1.

To solve the distortion drift problem while embedding the secret data, Ma et al. classified the current block according to the three conditions, i.e., Con1, Con2, and Con3. Then, three specific pairs of quantized DCT coefficients are selected for embedding three secret data bits. Take Con2 as an example: three coefficient pairs are used for embedding. The main reason is that these three pairs share the same property; i.e., when the two quantized DCT coefficients of a pair are modified together, the modification is concentrated only on the two middle columns or the two middle rows of the block.

For example, consider one quantized DCT coefficient pair of the current block. Assume that a value v is added to only the first coefficient of the pair to embed the secret data. Then, the modification will be spread to all pixels of the 4 × 4 block, which will propagate to other adjacent blocks. However, in Ma et al.’s scheme, to embed a hidden bit in the pair, both coefficients are perturbed in opposite directions; i.e., the first coefficient is increased by v and the second is decreased by v. Then, the modification is confined to part of the block.

In this scenario, it is clear that the modification is concentrated only on the two middle rows, while the modifications in the first and the last rows are zeros. Therefore, the distortion drift is avoided. However, they did not fully explore all of the cases for embedding data, so their average embedding rate was less than 0.45 bpp.
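The cancellation can be checked numerically with a small sketch. It assumes the commonly cited H.264/AVC inverse transform matrix and scaling constants (a = 1/2, b = sqrt(2/5)); the coefficient positions (0, 0), (2, 0), and (0, 2), the value v = 1, and the quantization step of 16 are illustrative choices, not necessarily the pairs used in [17].

```python
A = 0.5
B = (2.0 / 5.0) ** 0.5
CR = [[1, 1, 1, 1], [1, 0.5, -0.5, -1], [1, -1, -1, 1], [0.5, -1, 1, -0.5]]
SR = [A, B, A, B]  # inverse scaling factor per row/column index


def pixel_change(changes, qstep):
    """Map coefficient changes {(row, col): v} to the change of each pixel."""
    w = [[0.0] * 4 for _ in range(4)]
    for (i, j), v in changes.items():
        w[i][j] = v * qstep * SR[i] * SR[j]
    # tmp = CR^T * w, result = tmp * CR (inverse transform of the change)
    tmp = [[sum(CR[k][i] * w[k][j] for k in range(4)) for j in range(4)]
           for i in range(4)]
    return [[sum(tmp[i][k] * CR[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


# Changing a single coefficient alters every pixel of the block.
single = pixel_change({(0, 0): 1}, 16.0)

# A compensated pair in one column of the coefficient matrix leaves the
# first and last pixel rows untouched.
row_pair = pixel_change({(0, 0): 1, (2, 0): -1}, 16.0)

# A compensated pair in one row leaves the first and last pixel columns
# untouched.
col_pair = pixel_change({(0, 0): 1, (0, 2): -1}, 16.0)
```

With these values, `single` changes all 16 pixels by 4, whereas `row_pair` changes only the two middle rows (by 8) and `col_pair` only the two middle columns. The cancellation requires both coefficients of a pair to share the same scaling factor, which holds for positions two rows or two columns apart.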

To further improve the embedding capacity of Ma et al.’s scheme while avoiding distortion drift in H.264/AVC video sequences, Lin et al. [18] divided the relationship between the reference pixels and the selected prediction modes into the five categories presented in Table 1. Then, for blocks that belong to three of the categories, i.e., Cat1, Cat2, and Cat4, Lin et al. extracted one more coefficient pair for embedding one more secret bit in the same way as in Ma et al.’s scheme. In addition, each block belonging to Cat5 is also used for embedding one more secret bit to increase embedding capacity. As a result, the embedding capacity obtained by Lin et al.’s scheme is 0.15 bits per pixel (bpp) higher than that of Ma et al.’s scheme. However, in these two schemes [17, 18], each pair of quantized DCT coefficients is perturbed to embed only one secret bit. Thus, to embed n secret bits, n selected pairs of quantized DCT coefficients are modified. This means that the more secret bits are embedded, the more distortion is caused in the video frames, leading to low visual quality. In the schemes [17, 18], to guarantee higher visual quality, a zero coefficient pair is not used to embed any secret data bits. Therefore, their embedding capacity is still low; the average embedding capacity is smaller than 0.68 bpp when QP = 28 is used.

To maintain the high visual quality of the video sequence, the previous schemes [17, 18] selected quantized DCT coefficient pairs (excluding the zero coefficient pairs) of three categories, Cat1, Cat2, and Cat4, for embedding data. However, these schemes still obtained low embedding capacity, while the visual quality of the video sequences was not guaranteed. Therefore, in this paper, to overcome this shortcoming, instead of modifying each coefficient pair separately for embedding data, a group of coefficients is selected and altered at the same time. In particular, all suitable pairs of quantized DCT coefficients are extracted and classified into two different groups: one group is used for embedding data, and the other is used to prevent distortion drift in the video sequences. This means that, in the proposed scheme, the group of coefficient pairs is modified as a whole to embed secret bits. By modifying the group as a whole, at most two coefficient pairs are modified, which guarantees better visual quality of the embedded video sequences. In addition, to increase embedding capacity, zero coefficient pairs are still used for embedding data in the proposed scheme. The details of the proposed scheme are described in the next section.

3. The Proposed Scheme

Figure 4 shows the main processes of the proposed embedding and extracting phases. In the embedding phase, the original H.264/AVC video sequence is first entropy decoded. Then, if a 4 × 4 quantized DCT block belongs to Cat1, Cat2, or Cat4, four secret data bits are embedded based on group modulation. If the block belongs to Cat3, each coefficient is used to carry one secret bit. Then, all the quantized DCT coefficients are entropy encoded to obtain the embedded H.264/AVC video sequence. In this phase, to prevent distortion drift, the selected quantized DCT coefficients of each block are partitioned into two groups, i.e., the embedding group and the averting group. To achieve high embedding capacity and ensure good image quality, the embedding group is used for embedding data, while the averting group is used to prevent the propagation of errors. Figure 4(b) shows the details of the extracting phase. The embedded H.264/AVC video sequence is entropy decoded. Then, the category of each 4 × 4 quantized DCT block is determined. After that, according to the determined category, the corresponding secret bits are extracted.

3.1. Category Selection and Coefficient Grouping

To prevent intraframe distortion drift in the proposed scheme, all blocks that belong to the first four categories are selected for embedding data in a manner suited to each category. Consequently, the pixels that are used for intraframe prediction are not changed by the embedding process, so the embedding distortion does not affect the adjacent blocks. Figure 5 shows the percentage of the blocks that meet the conditions of the first four categories in the 14 H.264/AVC test video sequences. Most of the blocks in each video sequence belong to Cat1 and Cat2. Therefore, the proposed scheme is designed to embed more secret bits into these two categories with small distortion and without distortion drift.

In the proposed scheme, four quantized DCT coefficient pairs are selected in each block of the three categories Cat1, Cat2, and Cat4 and divided into two groups, i.e., an embedding group E and an averting group A, to conceal the secret data. Take a block that belongs to Cat1 as an example: four coefficient pairs are selected for embedding data. Then, the first coefficients of the pairs are grouped to construct the embedding group, E, which carries the secret bits by modifying, at most, two coefficients, whereas the remaining coefficients of the four pairs are clustered into the averting group, A, which is modified to prevent intraframe distortion drift during the embedding process. Figure 6 shows an example of the grouping process for Cat1.

3.2. Embedding Phase

In this subsection, the embedding algorithm is described in detail. First, for security reasons, the secret data are encrypted in advance into a binary sequence S = s1, s2, …. The original video sequence is entropy decoded to obtain the intraframe prediction modes and quantized DCT coefficients. Then, if a 4 × 4 quantized DCT block belongs to Cat1, Cat2, or Cat4, the secret data are embedded based on group modulation. Otherwise, if the block belongs to Cat3, each coefficient is used to carry one secret bit. Then, all the quantized DCT coefficients are entropy encoded to obtain the target embedded video sequence. To make the embedding algorithm clearer, the two cases are described as follows:

Case 1. If the block belongs to Cat1, Cat2, or Cat4, the following four steps are performed to embed secret bits.

Step 1. Read four appropriate coefficient pairs and classify them into two different groups, i.e., E and A, as described in Section 3.1.

Step 2. Read n secret bits from the encrypted message and embed them into group E. For embedding, the weight value of group E is calculated by (3). In this paper, four coefficient pairs are used for embedding; therefore, the value of n can be set to at most 4. Then, the difference value, d, is calculated from the weight value of group E and the decimal value of the n secret bits.

Step 3. If the value of d is 0, all elements of group E are kept unchanged. Otherwise, the elements of group E can be increased or decreased by 1. If an element increases by 1, the weight value is increased by 1 to 4, whereas if an element decreases by 1, the weight value is decreased by the same amount. In the proposed scheme, at most two elements of group E are modified by the value 1. Therefore, group E can be altered into the stego group by changing, at most, two of its elements so that its weight value matches the decimal value of the secret bits.

Step 4. To prevent the drift of intraframe distortion, the inverse operations of those applied to group E are performed on the corresponding elements of group A. For example, if an element of E increases by 1, the paired element of A decreases by 1, and vice versa.
Similarly, when the block belongs to one of the two categories Cat2 and Cat4, four coefficient pairs are selected to generate the embedding and averting groups, E and A; then, secret bits are embedded into group E in the same manner as for Cat1.
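Steps 1 to 4 can be illustrated with a hedged sketch of weighted-modulus group embedding. The weight function of (3) is not restated here, so the weights (1, 2, 3, 5) and the modulus 16 below are assumptions chosen only because they let every 4-bit value be reached by changing at most two elements by 1; they are not necessarily the authors' weight function, and the names `weight`, `embed`, and `extract` are hypothetical.

```python
from itertools import product

WEIGHTS = (1, 2, 3, 5)  # hypothetical weights, not taken from the paper
MODULUS = 16            # 2**4, so one group carries four secret bits


def weight(group_e):
    """Weight value of the embedding group under the assumed weights."""
    return sum(w * e for w, e in zip(WEIGHTS, group_e)) % MODULUS


def embed(group_e, group_a, value):
    """Embed a 4-bit value; return the modified (E, A) groups."""
    d = (value - weight(group_e)) % MODULUS
    best = None
    # Try every change in {-1, 0, +1}^4 that touches at most two elements
    # and keep the one with the fewest modified elements.
    for delta in product((-1, 0, 1), repeat=4):
        touched = sum(x != 0 for x in delta)
        if touched > 2:
            continue
        if sum(w * x for w, x in zip(WEIGHTS, delta)) % MODULUS == d:
            if best is None or touched < sum(x != 0 for x in best):
                best = delta
    new_e = [e + x for e, x in zip(group_e, best)]
    # Step 4: apply the inverse change to the paired averting coefficient.
    new_a = [a - x for a, x in zip(group_a, best)]
    return new_e, new_a


def extract(group_e):
    """The embedded value is the weight of the (stego) embedding group."""
    return weight(group_e)


E, A = [3, 0, -1, 2], [1, 0, 0, -4]
results = {value: embed(E, A, value) for value in range(MODULUS)}
```

Because each change in E is mirrored with the opposite sign in A, the sum of every coefficient pair is preserved, which is the property that confines the pixel modification and avoids drift. Zero coefficients (such as the second pair above) can still carry data.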

Case 2 (the block belongs to category Cat3). In such blocks, each coefficient is used to embed one secret bit because this category does not have any effect on other adjacent blocks during the encoding process. The stego coefficient is calculated from the secret bit and the original coefficient, which is one of the 16 quantized DCT coefficients in the block. If the block belongs to Cat5, it is left without embedding any secret bits.

3.3. Extracting Phase

In this subsection, the extracting algorithm used to extract the secret data from the embedded H.264/AVC video sequences is described. If the current block belongs to Cat1, Cat2, or Cat4, the embedding group is reconstructed in the same way as in the embedding phase. Then, the n embedded bits are extracted from the reconstructed group.

For a block that belongs to Cat3, each coefficient is checked to determine the embedded bit.
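For Cat3 blocks, one simple way to carry one bit per coefficient is parity modulation, sketched below. This is only an assumed stand-in, since the exact embedding and extraction formulas are not restated here; the function names `embed_bit` and `extract_bit` and the choice to move a coefficient away from zero when its parity must change are hypothetical.

```python
def embed_bit(coeff, bit):
    """Return a stego coefficient whose magnitude parity equals the bit."""
    if abs(coeff) % 2 == bit:
        return coeff  # parity already matches, nothing to change
    # Change the magnitude by 1, moving away from zero (assumed convention).
    return coeff + 1 if coeff >= 0 else coeff - 1


def extract_bit(coeff):
    """Read the embedded bit back from the parity of the magnitude."""
    return abs(coeff) % 2


block = [4, -3, 0, 1, 0, 0, 2, -1, 0, 0, 0, 0, 1, 0, 0, 0]
bits = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0]
stego = [embed_bit(c, b) for c, b in zip(block, bits)]
recovered = [extract_bit(c) for c in stego]
```

Extraction needs only the stego coefficients, so no original data are required at the receiver.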

4. Experimental Results

In this section, we describe the experimental evaluation of the performance of the proposed scheme. Fourteen video sequences, i.e., Akiyo, Bridge-Close, Bridge-Far, Carphone, Claire, Coastguard, Container, Foreman, Grandma, Hall, Mobile, Mother-Daughter, News, and Salesman, were used as test samples. A group of pictures (GOP) size of 30 and the structure “IBPBP” were used in the experiment. Six different quantization parameters (QPs), i.e., 18, 23, 28, 33, 38, and 43, were tested for the 14 video sequences mentioned above. In principle, in H.264/AVC coding, a smaller value of QP yields better visual quality of the video sequences but requires more encoded bits.

4.1. Embedding Capacity Evaluation

Figure 7 shows the performance of the proposed scheme, in terms of embedding capacity, using six QPs. It is apparent that using a smaller value of QP resulted in a higher embedding capacity.

Table 2 compares the embedding capacity of the proposed scheme and two state-of-the-art schemes [17, 18]. As shown in the table, the average embedding capacity of the proposed scheme was considerably higher than those of the two state-of-the-art schemes. The average improvements in embedding capacity of our proposed scheme over the schemes of Ma et al. [17] and Lin et al. [18] were 160% and 91%, respectively. The main reason for this improvement over Ma et al.’s scheme was that they only selected three coefficient pairs for carrying secret bits, which resulted in low embedding capacity. Lin et al.’s scheme also had a lower embedding capacity than the proposed scheme because, to avoid significant distortion in the video intraframes, they did not use zero coefficient pairs for embedding data. In our proposed scheme, all of the blocks that belonged to the first four categories were used to embed secret data with a small modification. As a result, the proposed scheme achieved higher embedding capacity. Specifically, Table 3 shows the embedding capacity of the three schemes for the 14 test video sequences when the value of QP was 28. As Table 3 shows, the gain in embedding capacity of the proposed scheme ranged from 22% to 629% over Ma et al.’s scheme and from 12% to 410% over Lin et al.’s scheme. The improvement rate differed among the 14 video sequences because the selected prediction modes of the blocks depend on the content of each video sequence.

4.2. Bit-Rate Increment Ratio

Table 4 shows the bit-rate increment ratio for different QPs. The average bit-rate increment ratios of Ma et al.’s scheme, Lin et al.’s scheme, and the proposed scheme were 1.44%, 1.68%, and 2.01%, respectively. These results indicate that the bit-rate increase was quite small in all three schemes.

4.3. Visual Quality Evaluation

To evaluate the visual quality of video frames in the three schemes, the peak signal-to-noise ratio (PSNR) [20] is calculated by comparing the original frame to the embedded frame. Figure 8 shows the visual quality of the proposed scheme for different values of QP. It is clear that a higher PSNR is obtained when a smaller QP is used.
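For reference, the PSNR between two frames can be computed as in the following minimal sketch. It assumes 8-bit samples (peak value 255) and frames given as equally sized 2D lists; the function name `psnr` is illustrative.

```python
import math


def psnr(original, embedded, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized frames."""
    diffs = [(o - e) ** 2
             for row_o, row_e in zip(original, embedded)
             for o, e in zip(row_o, row_e)]
    mse = sum(diffs) / len(diffs)
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(peak ** 2 / mse)


frame = [[100, 100], [100, 100]]
marked = [[101, 99], [100, 100]]
quality = psnr(frame, marked)
```

A smaller mean squared error between the original and embedded frames gives a higher PSNR, so less embedding distortion corresponds to a larger value.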

Table 5 compares the proposed scheme, Ma et al.’s scheme, and Lin et al.’s scheme in terms of the visual quality of the video frames. It can be observed that the PSNR of the proposed scheme was slightly smaller than those of the other two schemes [17, 18]. However, the proposed scheme improves embedding capacity significantly; i.e., the average improvement rates were 160% and 91% over Ma et al.’s scheme and Lin et al.’s scheme, respectively. Table 6 compares the visual quality of the three schemes for QP = 28, corresponding to the embedding capacity in Table 3. Compared with Ma et al.’s scheme, the average degradation of the proposed scheme was larger than 0.45 dB. However, the proposed scheme provided better visual quality of video frames than Lin et al.’s scheme.

For a fair comparison, both the PSNR and the structural similarity (SSIM) [21] index are used to evaluate the visual quality of the proposed scheme and the two other schemes [17, 18]. Here, the maximum embedding capacity obtained by Lin et al.’s scheme [18] is embedded with the proposed scheme. Table 7 and Figures 9 and 10 show that the proposed scheme successfully preserved the high visual quality of the video sequences. The average PSNR of the proposed scheme was better than those of Ma et al.’s and Lin et al.’s schemes by 3.39 and 3.89 dB, respectively. As can be seen in Figure 9, the PSNR obtained by the proposed scheme is always better than those of the two previous schemes [17, 18]. In terms of SSIM, in Figure 10, the proposed scheme showed better performance in some video sequences, i.e., Bridge-Close, Bridge-Far, and Coastguard, while the two previous schemes [17, 18] obtained higher values of SSIM in some other video sequences, i.e., Carphone and Claire. On the whole, however, the SSIM obtained by the proposed scheme was higher than those of the two previous schemes [17, 18]. It can be concluded that, based on coefficient grouping for embedding data, the proposed scheme prevented intraframe distortion drift and improved embedding capacity further while maintaining good visual quality of the embedded video sequences. In addition, Figure 11 shows the visual effect of the three schemes on the video sequences. The three schemes yield similar perceived visual quality.

5. Conclusion

In this paper, a high-quality data hiding algorithm for H.264/AVC without intraframe distortion drift is proposed. In the proposed scheme, the quantized coefficients are clustered into two different groups, i.e., the embedding group and the averting group. The embedding group is used to carry secret bits in a way that preserves high visual quality and further improves embedding capacity. In addition, the averting group is used to avoid intraframe distortion drift. The experimental results demonstrated that the embedding capacity increased significantly in the proposed scheme while the good visual quality of the embedded video sequences was maintained.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research is funded by Vietnam National Foundation for Science and Technology Development (NAFOSTED) under Grant no. 102.01-2016.06. The authors also would like to thank the Ministry of Science and Technology of the Republic of China for financially supporting this research under Contract nos. MOST 106-2218-E-035-011 and MOST 106-2627-M-035-007.