Abstract

In this work, a novel and efficient digital video watermarking technique based on the Singular Value Decomposition (SVD) performed in the Multiresolution Singular Value Decomposition (MR-SVD) domain is proposed. Most existing watermarking schemes embed the watermark in all the video frames, which is time-consuming and degrades the perceived video quality. The proposed method instead chooses only the fast motion frames in each shot to host the watermark. This reduces the number of frames to be processed and also ensures a better quality of the watermarked video, since the human visual system cannot notice variations in fast moving regions. The watermark information is embedded by Quantization Index Modulation (QIM), which enables blind extraction. The experimental results demonstrate that the proposed method achieves very good transparency while being robust against various kinds of attacks such as filtering, noising, compression, and frame collusion. Compared with several methods found in the literature, the proposed method offers better robustness.

1. Introduction

In recent years, the rapid growth of Internet technology and the spread of powerful computing devices have not only boosted multimedia electronic commerce but also encouraged artists to share and promote their work online. This has led to a massive presence of digital multimedia data such as audio, images, and video on the web. However, with the spread and ease of use of powerful multimedia processing tools, these data can be downloaded, easily modified, illicitly appropriated, and then widely redistributed or commercialized on the Internet. Protecting the intellectual property rights of owners has therefore become a major concern. Digital watermarking provides a solution to this problem [1].

A secure digital watermarking technique comprises two procedures, an embedding procedure and an extraction procedure, which use embedding and extraction keys. The embedding procedure inserts into the host multimedia content (usually called the cover) a watermark, which is a digital signature holding copyright information that belongs exclusively to the owner. By means of the secret keys, the extraction procedure then allows only the owner or an authorized recipient of the digital content to retrieve the watermark from the watermarked content [2].

An efficient watermarking process has to take two important properties into account [3]: (i) imperceptibility: for an invisible watermarking scheme, there must be no discernible difference between the original and the watermarked contents; and (ii) robustness: the embedded watermark should be able to survive, to some extent, intentional and unintentional content manipulations.

Digital watermarking systems work either in the spatial domain or in a transform domain. A spatial domain technique works directly on pixels: the watermark is usually embedded by directly modifying pixel values, such as their least significant bits (LSBs) [4]. A transform domain technique, in contrast, embeds the watermark by adjusting transform coefficients. Frequently used transforms are the Discrete Cosine Transform (DCT) [5], the Discrete Wavelet Transform (DWT) [6], the Discrete Fourier Transform (DFT) [4], and the Singular Value Decomposition (SVD) [7, 8]. Many combinations of these transforms have also been investigated in the literature to achieve better results [9, 10]. Compared to spatial domain techniques, transform domain techniques have been shown to achieve better robustness and imperceptibility [9]. Furthermore, according to the watermark extraction process, digital watermarking systems are categorized into three schemes [2]: blind, semi-blind, and non-blind. In a blind watermarking scheme, neither the original cover nor the embedded watermark is required for detection, only the secret keys [2, 7, 11]. In a semi-blind watermarking scheme, only some information from the original cover and the secret keys are needed [2, 12]. A non-blind watermarking scheme requires the original cover, the original watermark, and the secret keys [2, 9]. Because they rely on the least side information at extraction, blind watermarking schemes are the most challenging ones to develop.

Digital watermarking was initially studied mainly for still images, but in recent years a considerable number of video watermarking techniques have been proposed. Video watermarking algorithms are, however, more difficult to develop than those operating on images. This is essentially due to the temporal dimension, which imposes specific requirements [13]. First, the watermark should be robust not only to common image processing attacks such as noise addition and JPEG compression, but also to video processing attacks such as MPEG compression and frame synchronization attacks. Second, imperceptibility is harder to achieve because of the motion of objects in video sequences, so the temporal dimension should be taken into account to avoid distortion between frames. Third, the complexity of the watermarking scheme should be low because of the large number of frames to be processed in a video signal. Since a digital video sequence is basically considered a collection of sequential images [14], many of the image watermarking techniques in the literature were extended to video [6, 9, 15, 16] by embedding the watermark in all frames of the video sequence. These algorithms are thus robust to frame dropping and frame swapping, but they are time-consuming and degrade the perceived video quality. To avoid frame-by-frame embedding, the following key question must be answered: which frames should host the watermark so that the visual quality of the watermarked video is not degraded while the robustness remains reasonably unaffected? The answer is to embed the watermark adaptively in selected frames. Very few video watermarking schemes have followed this direction. Tabassum and Islam [17] proposed a digital video watermarking technique based on identical frame extraction.
In this method, the host video is first divided into video shots. Then, from each shot, one video frame, called the identical frame, is selected for watermark embedding. In [18], Agilandeeswari and Ganesan developed a video watermarking approach using the SVD and the DWT. In their algorithm, they extracted the non-motion frames from the video using a histogram-difference-based scene change detection algorithm and then embedded the same watermark in them. The problem with these techniques, however, is the small number of watermarked video frames: if those frames are lost, the scheme becomes unreliable.

Jiang Xuemei et al. [19] developed a video watermarking approach based on shot segmentation and block classification. They selected the frames with the largest luminance value in every shot as the host frames. The watermark signal is cropped into small watermarks according to the number of host frames in the host video, and these small watermarks are then embedded into the different selected host frames, respectively. Similarly, Chetan et al. [20] proposed a robust video watermarking scheme that embeds different parts of a single watermark into different scenes of a video, the host frames being selected by scene change detection. In these last two techniques, if one watermarked frame is lost, the watermark cannot be completely extracted.

In this work, we propose a novel video watermarking scheme that embeds the watermark in fast motion frames using the Singular Value Decomposition in the Multiresolution Singular Value Decomposition (MR-SVD) domain. The main contributions of our work are as follows:
(i) To avoid embedding the watermark in all the frames of the video sequence, we first segment the video into temporally stationary signals using shot boundary detection. Then, from each shot, we choose the frames with high motion energy (fast motion frames) to embed the watermark. Because the human visual system (HVS) cannot notice the details of fast moving regions [21], this choice guarantees the perceptual invisibility of the watermark.
(ii) Because of their respective advantages, we use a combination of the SVD and MR-SVD transforms. The SVD, with its attractive mathematical properties, has been broadly applied in image compression and image watermarking and has proved to be an efficient technique in both domains [22]. Most existing SVD-based watermarking techniques combine the SVD with the multiresolution 2D DWT [9, 10], as such combinations have been shown to be reliable and to provide high robustness and better perceptual image quality. However, one of the drawbacks of the DWT is its high resource consumption and computational cost, due to the convolutions carried out by each of its filters. To overcome this issue, Kakarala and Ogunbona [23] proposed the MR-SVD, which performs a multiresolution decomposition similar to that of the DWT, has perfect reconstruction, and, above all, is a matrix-based operation like the SVD. A hybrid SVD/MR-SVD watermarking technique is therefore based only on matrix operations, which makes it well suited for real-time applications and simple to implement in hardware.
(iii) We embed the watermark information by Quantization Index Modulation (QIM), which has been shown to be free of host interference and provably optimal in terms of channel capacity under an additive white Gaussian noise attack. Furthermore, the extraction procedure in QIM is blind, which makes it suitable for robust watermarking [24].
(iv) To embed the watermark in a secure manner, we encrypt it using a logistic map based encryption [25].

This paper is organized in five sections. Section 2 introduces the preliminaries of our scheme. Section 3 gives the details of the proposed video watermarking scheme, which comprises four parts: fast motion frames extraction, watermark preprocessing, and the watermark embedding and extraction processes. Section 4 presents the experimental results on transparency and robustness against various attacks, together with comparisons with previous algorithms from the literature. Finally, conclusions are given in Section 5.

2. Preliminaries

2.1. Singular Value Decomposition

In linear algebra, the Singular Value Decomposition (SVD) is a numerical technique that decomposes a matrix into three matrices whose properties are valuable in digital image processing [22]. If a matrix A represents, for example, an image of size N × N, then the SVD of A is given by

$$A = U S V^{T},$$

where U and V are orthogonal matrices representing, respectively, the horizontal and vertical details (edges) of the image, and $S = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_N)$ is a diagonal matrix whose diagonal elements $\sigma_i$, with $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_N \geq 0$, are the singular values (SVs) of A.

Two main properties of the SVs make the SVD appropriate for watermarking when the matrix S is used [8, 22]:
(i) The energy content (luminance) of the image A is contained in the SVs.
(ii) The SVs are very stable; i.e., a small perturbation added to the image (a watermark, for example) does not significantly change the SVs.
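
Both properties can be checked numerically. The following Python sketch, which is an illustration and not part of the proposed method, estimates the largest singular value of a hypothetical 8 × 8 luminance block by power iteration on $A^{T}A$, compares it with that of a slightly perturbed copy, and reports the share of the block energy carried by $\sigma_1$. By Weyl's inequality, the change in $\sigma_1$ cannot exceed the Frobenius norm of the perturbation. The block values, the noise level, and the helper name `largest_singular_value` are assumptions.

```python
import math
import random

def largest_singular_value(A, iters=200):
    """Largest singular value of a matrix (list of rows), by power iteration on A^T A."""
    rows, cols = len(A), len(A[0])
    v = [1.0] * cols
    for _ in range(iters):
        w = [sum(A[i][j] * v[j] for j in range(cols)) for i in range(rows)]   # A v
        z = [sum(A[i][j] * w[i] for i in range(rows)) for j in range(cols)]   # A^T A v
        nz = math.sqrt(sum(t * t for t in z))
        if nz == 0.0:
            return 0.0
        v = [t / nz for t in z]
    Av = [sum(A[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    return math.sqrt(sum(t * t for t in Av))                                  # ||A v|| = sigma_1

random.seed(0)
block = [[random.randint(100, 140) for _ in range(8)] for _ in range(8)]  # hypothetical luminance block
noise = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(8)]   # small perturbation E
noisy = [[b + e for b, e in zip(rb, rn)] for rb, rn in zip(block, noise)]

s_orig = largest_singular_value(block)
s_noisy = largest_singular_value(noisy)
frob = math.sqrt(sum(e * e for row in noise for e in row))
energy = sum(v * v for row in block for v in row)
print(f"sigma_1: {s_orig:.3f} -> {s_noisy:.3f}, ||E||_F = {frob:.3f}, "
      f"energy share of sigma_1: {s_orig ** 2 / energy:.4f}")
```

For a typical smooth block, $\sigma_1^2$ carries almost all of $\|A\|_F^2$, which is why the largest singular value is a natural carrier for the watermark bits.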

2.2. Multiresolution Singular Value Decomposition (MR-SVD)

As stated in the Introduction, the MR-SVD, initially introduced in [23], is a matrix-based operation.

2.2.1. 1D Multiresolution Singular Value Decomposition

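Let $X = [x(1), x(2), \ldots, x(N)]$ represent a finite extent 1D signal and assume that N is divisible by $2^{L}$ for some $L \geq 1$. Let the data matrix at the first level, denoted by $X_1$, be constructed with its top and bottom rows containing, respectively, the odd-numbered and even-numbered samples of X:

$$X_1 = \begin{bmatrix} x(1) & x(3) & \cdots & x(N-1) \\ x(2) & x(4) & \cdots & x(N) \end{bmatrix}.$$

The corresponding centred matrix is $\bar{X}_1 = X_1 \left( I - \frac{2}{N} e e^{T} \right)$, where I is the $(N/2) \times (N/2)$ identity and e is the vector of length N/2 containing all ones.

Let $U_1$ be the eigenvector matrix bringing the scatter matrix $T_1 = \bar{X}_1 \bar{X}_1^{T}$ into diagonal form:

$$U_1^{T} T_1 U_1 = S_1^{2},$$

where $S_1^{2} = \mathrm{diag}(s_1^{2}, s_2^{2})$ contains the squares of the two singular values, with $s_1 \geq s_2$.

Now let $\hat{X}_1 = U_1^{T} X_1$.

The top row of $\hat{X}_1$, denoted by $\Phi_1 = \hat{X}_1(1,:)$, corresponds to the largest eigenvalue and represents the approximation component. The bottom row of $\hat{X}_1$, designated by $\Psi_1 = \hat{X}_1(2,:)$, corresponds to the smallest eigenvalue and contains the detail component. The successive levels of decomposition repeat the procedure described above by placing the approximation component $\Phi_1$ in place of X. Hence the MR-SVD can be written as

$$X \rightarrow \left\{ \Phi_L, \{\Psi_l\}_{l=1}^{L}, \{U_l\}_{l=1}^{L} \right\},$$

where L is the desired level of decomposition.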
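
A minimal Python sketch of the 1D MR-SVD described above, using only the standard library. Assumptions: the 2 × 2 eigenvectors are obtained in closed form, $U_l$ is estimated from the centred scatter matrix and applied to the uncentred data matrix, and all function names are hypothetical. The inverse applies $X_l = U_l \hat{X}_l$ level by level and re-interleaves the rows, which illustrates the perfect reconstruction property.

```python
import math

def eig2_sym(a, b, c):
    """Unit eigenvectors (largest eigenvalue first) of the symmetric matrix [[a, b], [b, c]]."""
    lam1 = (a + c) / 2.0 + math.hypot((a - c) / 2.0, b)   # largest eigenvalue
    if abs(b) > 1e-12:
        vx, vy = lam1 - c, b
    else:                                                  # matrix already diagonal
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    u1 = (vx / norm, vy / norm)
    u2 = (-u1[1], u1[0])                                   # orthogonal unit vector
    return u1, u2

def mrsvd_1d_level(x):
    """One MR-SVD level of an even-length signal: returns (phi, psi, (u1, u2))."""
    top, bottom = x[0::2], x[1::2]       # odd- and even-numbered samples (1-based numbering)
    m = len(top)
    mt, mb = sum(top) / m, sum(bottom) / m
    ct = [v - mt for v in top]           # rows of the centred data matrix
    cb = [v - mb for v in bottom]
    a = sum(v * v for v in ct)           # 2x2 scatter matrix of the centred rows
    b = sum(p * q for p, q in zip(ct, cb))
    c = sum(v * v for v in cb)
    u1, u2 = eig2_sym(a, b, c)
    phi = [u1[0] * p + u1[1] * q for p, q in zip(top, bottom)]   # row 1 of U^T X1
    psi = [u2[0] * p + u2[1] * q for p, q in zip(top, bottom)]   # row 2 of U^T X1
    return phi, psi, (u1, u2)

def mrsvd_1d(x, levels):
    """L-level MR-SVD: the step is repeated on the approximation component."""
    phi, details, bases = list(x), [], []
    for _ in range(levels):
        phi, psi, U = mrsvd_1d_level(phi)
        details.append(psi)
        bases.append(U)
    return phi, details, bases

def imrsvd_1d(phi, details, bases):
    """Inverse transform: X_l = U_l X_hat_l, then interleave the two rows again."""
    x = list(phi)
    for psi, (u1, u2) in zip(reversed(details), reversed(bases)):
        top = [u1[0] * p + u2[0] * q for p, q in zip(x, psi)]
        bottom = [u1[1] * p + u2[1] * q for p, q in zip(x, psi)]
        x = [v for pair in zip(top, bottom) for v in pair]
    return x

signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]   # N = 8 is divisible by 2^2
phi, details, bases = mrsvd_1d(signal, levels=2)
rec = imrsvd_1d(phi, details, bases)
print(len(phi), max(abs(r - s) for r, s in zip(rec, signal)))
```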

2.2.2. 2D Multiresolution Singular Value Decomposition

We briefly describe here the 2D MR-SVD. The first-level decomposition of the image proceeds as follows. The M × N image X is divided into nonoverlapping 2 × 2 blocks, and each block is arranged into a 4 × 1 vector by stacking its columns to form the data matrix $X_1$ of size $4 \times MN/4$. The eigendecomposition of the scatter matrix is

$$T_1 = X_1 X_1^{T} = U_1 S_1^{2} U_1^{T}.$$

Let $\hat{X}_1 = U_1^{T} X_1$. The top row of the resulting matrix, $\hat{X}_1(1,:)$, is rearranged to form an $M/2 \times N/2$ matrix $\Phi$, which is considered as the smooth (approximation) component of the image. The remaining rows $\hat{X}_1(2,:)$, $\hat{X}_1(3,:)$, and $\hat{X}_1(4,:)$ contain the detail components, which are rearranged in the same way and denoted by $\Psi^{1}$, $\Psi^{2}$, and $\Psi^{3}$, respectively. The complete transform can be represented as follows:

$$X \rightarrow \left\{ \Phi, \Psi^{1}, \Psi^{2}, \Psi^{3}, U_1 \right\}.$$

The original image X can be reconstructed from the right-hand side, since the steps are reversible. As an example, the one-level MR-SVD decomposition of the video frame "Foreman" is depicted in Figure 1.
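
The one-level 2D MR-SVD can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' MATLAB implementation: the 4 × 4 scatter matrix is diagonalized with a small cyclic Jacobi solver (a stand-in for a library eigensolver, used only to keep the sketch dependency-free), blocks are scanned row by row, and all function names are hypothetical. Since the energy of row k of $\hat{X}_1$ equals the k-th eigenvalue of $T_1$, the approximation $\Phi$ always carries the largest share of the energy.

```python
import math

def jacobi_eigh(A, sweeps=50, tol=1e-20):
    """Cyclic Jacobi eigendecomposition of a small symmetric matrix (list of rows).
    Returns (eigenvalues, V) with the j-th eigenvector stored in column j of V."""
    n = len(A)
    A = [list(map(float, row)) for row in A]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= tol * sum(A[i][i] ** 2 for i in range(n)):
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):   # A <- A J (columns p and q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):   # A <- J^T A (rows p and q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):   # accumulate eigenvectors: V <- V J
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [A[i][i] for i in range(n)], V

def mrsvd_2d(img):
    """One-level 2D MR-SVD of an M x N image (M, N even), given as a list of rows.
    Returns (phi, [psi1, psi2, psi3], U) with U a 4 x 4 orthogonal matrix."""
    M, N = len(img), len(img[0])
    # Data matrix X1: one 4-vector per 2x2 block, entries stacked column by column
    cols = [[img[i][j], img[i + 1][j], img[i][j + 1], img[i + 1][j + 1]]
            for i in range(0, M, 2) for j in range(0, N, 2)]
    T = [[sum(c[r] * c[s] for c in cols) for s in range(4)] for r in range(4)]  # T1 = X1 X1^T
    vals, V = jacobi_eigh(T)
    order = sorted(range(4), key=lambda k: -vals[k])          # descending eigenvalues
    U = [[V[r][k] for k in order] for r in range(4)]
    # X_hat = U^T X1: row 0 -> approximation, rows 1..3 -> details
    hat = [[sum(U[r][k] * c[r] for r in range(4)) for c in cols] for k in range(4)]
    w = N // 2
    bands = [[row[b * w:(b + 1) * w] for b in range(M // 2)] for row in hat]
    return bands[0], bands[1:], U

def imrsvd_2d(phi, psis, U):
    """Inverse one-level 2D MR-SVD: X1 = U X_hat, then put each column back into its 2x2 block."""
    bands = [phi] + list(psis)
    h, w = len(phi), len(phi[0])
    img = [[0.0] * (2 * w) for _ in range(2 * h)]
    for bi in range(h):
        for bj in range(w):
            coeffs = [band[bi][bj] for band in bands]
            x = [sum(U[r][k] * coeffs[k] for k in range(4)) for r in range(4)]
            i, j = 2 * bi, 2 * bj
            img[i][j], img[i + 1][j], img[i][j + 1], img[i + 1][j + 1] = x
    return img

frame = [[52, 55, 61, 66], [63, 59, 55, 90], [62, 59, 68, 113], [63, 58, 71, 122]]
phi, psis, U = mrsvd_2d(frame)
rec = imrsvd_2d(phi, psis, U)
err = max(abs(rec[i][j] - frame[i][j]) for i in range(4) for j in range(4))
energy = [sum(v * v for row in band for v in row) for band in [phi] + psis]
print(err, energy)
```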

3. Proposed Method

Our proposed algorithm encompasses four consecutive parts: fast motion frames extraction (selecting the frames that host the watermark), watermark preprocessing, the watermark embedding process, and the watermark extraction process.

3.1. Fast Motion Frames Extraction for Embedding (FMFE)

To increase the quality of the watermarked video, the proposed system exploits the characteristics of the human visual system (HVS) to select the frames in which the watermark is embedded. Because the HVS is less sensitive to errors in regions with large motion, we select the frames with high motion energy as host frames [21]. To extract the fast motion frames, the cover video V is first split into individual frames; then shot boundary detection (SBD) is performed with the algorithm proposed in [26] to obtain temporally stationary signals, since frames within the same shot are strongly correlated. Afterwards, we measure the average motion energy of each frame using the mean magnitude of motion vectors (MMMV).

Let $Sh = \{f_1, f_2, \ldots, f_k\}$ be a shot of length k, where $f_i$, $i = 1, \ldots, k$, represents the i-th frame in the shot. The MMMV of the frame $f_i$ can be calculated as follows:

$$\mathrm{MMMV}(f_i) = \frac{1}{N} \sum_{j=1}^{N} \sqrt{MV_x(j)^{2} + MV_y(j)^{2}},$$

where N is the number of macroblocks in a frame and $MV_x(j)$ and $MV_y(j)$ represent the components of the motion vector (MV) of the j-th macroblock in the X axis direction and the Y axis direction, respectively. A threshold on the MMMV values is used as a decision rule to distinguish between fast and slow motion frames, and a constant α is used to adjust this decision rule. Figure 3 shows the average motion energy of some frames in the Foreman, Football, and Akiyo video shots. In the case of the Foreman video, the fast motion frames occur between frames 150 and 228.
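
The frame selection step can be sketched as follows. The MMMV follows the definition above, and the motion vectors are assumed to be already available per macroblock (for example, from a block-matching motion estimator). The decision rule used here, which takes the mean MMMV of the shot as the threshold and selects the frames whose MMMV exceeds α times that threshold, is an assumption for illustration; the authors' exact threshold may differ. The shot data and function names are hypothetical.

```python
import math

def mmmv(motion_vectors):
    """Mean magnitude of the motion vectors (mv_x, mv_y) of one frame's macroblocks."""
    return sum(math.hypot(mx, my) for mx, my in motion_vectors) / len(motion_vectors)

def fast_motion_frames(shot, alpha=1.0):
    """Indices of the fast motion frames of a shot (one list of motion vectors per frame).
    ASSUMED rule: a frame is 'fast' if its MMMV exceeds alpha times the shot's mean MMMV."""
    energies = [mmmv(frame_mvs) for frame_mvs in shot]
    threshold = sum(energies) / len(energies)
    return [i for i, e in enumerate(energies) if e > alpha * threshold]

# Hypothetical shot of 4 frames with 3 macroblocks each
shot = [[(0, 1), (1, 0), (0, 0)],
        [(3, 4), (6, 8), (0, 5)],
        [(0, 0), (0, 0), (1, 0)],
        [(5, 12), (3, 4), (8, 6)]]
print(fast_motion_frames(shot), fast_motion_frames(shot, alpha=2.0))   # prints [1, 3] [3]
```

Raising α makes the selection stricter, trading the number of host frames (and hence redundancy against frame loss) for stronger masking by motion.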

3.2. Watermark Preprocessing

To improve the security of the proposed algorithm, the binary watermark is first preprocessed before being embedded. Here, the watermark is scrambled using the chaotic logistic map, defined by the equation [25, 27]:

$$x_{n+1} = \mu \, x_n (1 - x_n),$$

where μ is the system parameter and $x_n \in (0, 1)$. The initial value $x_0$ is adopted as a key.

Then, the binary image logo or signature W is scrambled by a binary sequence b(n) derived from the chaotic sequence $x_n$, with the following rule:

$$W_e(n) = W(n) \oplus b(n), \quad n = 1, \ldots, N,$$

with N being the total number of bits in the watermark and ⊕ being the binary Exclusive OR (XOR) operation.
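
A minimal sketch of the logistic map scrambling in Python. The binarization b(n) = 1 if $x_n \geq 0.5$ and 0 otherwise, the parameter μ = 3.99, the key $x_0$ = 0.37, and the function names are hypothetical choices for illustration. Because XOR is its own inverse, applying the same function with the same key decrypts the watermark.

```python
def logistic_bits(x0, mu, n):
    """n bits from the logistic map x_{k+1} = mu * x_k * (1 - x_k).
    ASSUMED binarization: bit = 1 if x_k >= 0.5 else 0."""
    bits, x = [], x0
    for _ in range(n):
        x = mu * x * (1.0 - x)
        bits.append(1 if x >= 0.5 else 0)
    return bits

def scramble(watermark, x0, mu=3.99):
    """W_e(n) = W(n) XOR b(n); XOR is self-inverse, so the same call also decrypts."""
    b = logistic_bits(x0, mu, len(watermark))
    return [w ^ k for w, k in zip(watermark, b)]

watermark = [1, 0, 1, 1, 0, 0, 1, 0]   # hypothetical flattened binary logo
key = 0.37                             # hypothetical secret initial value x0
encrypted = scramble(watermark, key)
decrypted = scramble(encrypted, key)
print(encrypted, decrypted == watermark)
```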

3.3. Watermark Embedding Process

In this section, we describe the proposed video watermark embedding procedure, whose block diagram is shown in Figure 2. It proceeds as follows:
(1) The fast motion frames are extracted from the original color video, and only these frames are watermarked. This preserves the quality of the watermarked video, because the watermark is not embedded in all the frames as it is in other watermarking schemes [6, 9, 28].
(2) Every fast motion frame is converted from the RGB to the YCbCr color space.
(3) Every luminance frame $Y_k$ is transformed with the one-level MR-SVD to get the approximation component $\Phi_k$ and the detail components $\Psi_k^{i}$, $i = 1, 2, 3$.
(4) The approximation component $\Phi_k$ is divided into nonoverlapping blocks $\Phi_{k,n}$ of size u × u.
(5) The SVD is applied to each block of the approximation component (i.e., the low-frequency subband), which contains the major part of the video frame energy:
$$\Phi_{k,n} = U_{k,n} S_{k,n} V_{k,n}^{T},$$
where k is the fast motion frame index and n is the location of the block. We use the SVD because of its suitable properties discussed earlier.
(6) The watermark is encrypted using the chaotic logistic map.
(7) To make the scheme robust, the watermark bits are inserted in the largest singular value $\sigma_{k,n}^{1}$ of the blocks using the QIM method, which quantizes $\sigma_{k,n}^{1}$ with the quantization step Q using the rounding operation round(·); this yields the modified singular value matrix $S'_{k,n}$.
(8) The inverse SVD is conducted to obtain the watermarked block:
$$\Phi'_{k,n} = U_{k,n} S'_{k,n} V_{k,n}^{T}.$$
(9) To build the watermarked luminance $Y'_k$, the inverse MR-SVD is applied to the modified approximation component $\Phi'_k$ and the unchanged detail components.
(10) The watermarked video frame is reconstructed from the watermarked luminance $Y'_k$ and the original chrominance components Cb and Cr by conversion from the YCbCr into the RGB color space.
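
Steps (5), (7), and (8) can be illustrated on a single block. The sketch below uses a standard dither-modulation form of QIM, with the two interleaved lattices $Q\mathbb{Z}$ (bit 0) and $Q\mathbb{Z} + Q/2$ (bit 1); this form is an assumption and not necessarily the exact quantization rule of the proposed scheme. The 4 × 4 block (instead of 16 × 16), Q = 24, and all function names are hypothetical. Because only $\sigma^{1}$ changes, the inverse SVD reduces to the rank-one update $A + (\sigma^{1\prime} - \sigma^{1})\, u_1 v_1^{T}$.

```python
import math

def top_singular_triplet(A, iters=300):
    """(sigma1, u1, v1) of a nonzero matrix (list of rows) via power iteration on A^T A."""
    rows, cols = len(A), len(A[0])
    v = [1.0] * cols
    for _ in range(iters):
        w = [sum(A[i][j] * v[j] for j in range(cols)) for i in range(rows)]   # A v
        z = [sum(A[i][j] * w[i] for i in range(rows)) for j in range(cols)]   # A^T A v
        nz = math.sqrt(sum(t * t for t in z))
        v = [t / nz for t in z]
    Av = [sum(A[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    sigma = math.sqrt(sum(t * t for t in Av))
    return sigma, [t / sigma for t in Av], v

def qim_embed(sigma, bit, Q):
    """Dither-modulation QIM: snap sigma to the nearest point of the lattice Q*Z + bit*Q/2."""
    offset = bit * Q / 2.0
    return Q * math.floor((sigma - offset) / Q + 0.5) + offset

def qim_extract(sigma, Q):
    """Nearest multiple of Q/2: even multiples decode to 0, odd multiples to 1."""
    return math.floor(sigma / (Q / 2.0) + 0.5) % 2

def embed_bit_in_block(block, bit, Q):
    """Change only sigma1 and rebuild the block: U S' V^T = A + (sigma1' - sigma1) u1 v1^T."""
    s, u, v = top_singular_triplet(block)
    delta = qim_embed(s, bit, Q) - s
    return [[block[i][j] + delta * u[i] * v[j] for j in range(len(v))] for i in range(len(u))]

block = [[120, 122, 125, 119], [118, 121, 124, 126],
         [117, 119, 123, 128], [115, 118, 122, 127]]   # hypothetical approximation block
Q = 24.0                                               # hypothetical quantization step
for bit in (0, 1):
    marked = embed_bit_in_block(block, bit, Q)
    s_marked, _, _ = top_singular_triplet(marked)
    print(bit, round(s_marked, 3), qim_extract(s_marked, Q))
```

With this form of QIM, a bit is decoded correctly as long as an attack moves $\sigma^{1}$ by less than Q/4, which is the robustness and transparency trade-off controlled by Q.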

3.4. Watermark Extraction Process

The watermark extraction process, also shown in Figure 2, is described as follows:
(1) The watermarked fast motion frames are extracted from the watermarked color video.
(2) Every watermarked fast motion frame is converted from the RGB to the YCbCr color space.
(3) Every watermarked luminance frame $Y'_k$ is transformed with the one-level MR-SVD to get the watermarked approximation component $\Phi'_k$.
(4) The watermarked approximation component $\Phi'_k$ is divided into nonoverlapping blocks of size u × u.
(5) The SVD is applied to each block of the watermarked approximation component.
(6) The embedded bits are extracted from the largest singular value of each block by the QIM decision rule, which uses the quantization step Q and the floor function ⌊·⌋.
(7) Decryption with the same chaotic sequence is performed to get the hidden binary watermark W'.
(8) Since a video clip contains several fast motion frames in which the same watermark is embedded, the final recovered watermark is obtained by averaging the watermarks extracted from these different frames.
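
Step (8) can be sketched as a bitwise average followed by a 0.5 threshold, which amounts to a majority vote across frames. The threshold, the tie-breaking rule, and the name `combine_watermarks` are assumptions for illustration.

```python
def combine_watermarks(extracted):
    """Bitwise average of the watermarks extracted from several frames, thresholded at 0.5.
    ASSUMED tie-break: an average of exactly 0.5 gives 1."""
    n = len(extracted)
    return [1 if sum(column) / n >= 0.5 else 0 for column in zip(*extracted)]

# Hypothetical decrypted watermarks from three fast motion frames, two of them with one wrong bit
frames = [[1, 0, 1, 1], [1, 1, 1, 1], [0, 0, 1, 1]]
print(combine_watermarks(frames))   # prints [1, 0, 1, 1]
```

Averaging over many host frames is what lets isolated bit errors in individual frames cancel out.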

4. Results and Discussion

The proposed algorithm is implemented in MATLAB. We used three standard CIF (352 × 288) sequences shown in Figure 4: Foreman (300 frames), Akiyo (300 frames), and Football (260 frames). The test videos are in uncompressed RGB AVI format, with a frame rate of 30 fps [29]. The watermark is a binary image. The block size in the proposed algorithm can be set to any desired value. However, a small block size leads to blocking artifacts, whereas a large block size reduces the total number of watermark bits that can be embedded. Through various experiments, we found that a block size of 16 × 16 allows an acceptable number of watermark bits to be embedded without causing noticeable blocking artifacts.
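
The trade-off between block size and capacity can be made concrete with a short calculation. A 352 × 288 luminance frame yields a 176 × 144 approximation subband after one MR-SVD level; assuming, for illustration, one watermark bit per block of that subband, the number of blocks per frame bounds the per-frame capacity. The function name is hypothetical.

```python
def blocks_per_frame(width, height, block):
    """Number of block x block tiles in the approximation subband of a one-level MR-SVD."""
    sub_w, sub_h = width // 2, height // 2      # one MR-SVD level halves each dimension
    return (sub_w // block) * (sub_h // block)

for b in (8, 16, 32):
    print(f"{b} x {b} blocks: {blocks_per_frame(352, 288, b)} per CIF frame")
```

This gives 396, 99, and 20 blocks per frame for 8 × 8, 16 × 16, and 32 × 32 blocks, respectively, so doubling the block side divides the capacity by roughly four.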

As shown in Figure 5, small values of the quantization step Q yield good transparency at the expense of poor robustness, and vice versa. From this figure, it can be observed that the quantization step value adopted in our experiments gives a good compromise between robustness and imperceptibility.

4.1. Imperceptibility Tests

The imperceptibility of the watermark is estimated by measuring the PSNR (Peak Signal-to-Noise Ratio) and the MSSIM (Mean Structural Similarity Index Measure), both calculated on the luminance component Y of the original and watermarked frames [30]. Higher PSNR values of the watermarked video frames indicate better imperceptibility. It is worth noting that the SSIM is a perceptual similarity measure whose maximum value, 1, is reached only when both frames are numerically identical. The PSNR is calculated as follows:

$$\mathrm{PSNR} = 10 \log_{10} \frac{255^{2}}{\mathrm{MSE}},$$

where the Mean Square Error (MSE) between the host luminance Y and the watermarked luminance Y' is defined as

$$\mathrm{MSE} = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \left( Y(i,j) - Y'(i,j) \right)^{2},$$

with M and N being, respectively, the height and width of the video frame.
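
A direct transcription of the PSNR and MSE formulas above in Python, assuming 8-bit luminance (peak value 255) and frames stored as lists of rows; the function name is hypothetical.

```python
import math

def psnr(y, y_wm, peak=255.0):
    """PSNR in dB between two equally sized luminance frames (lists of rows)."""
    M, N = len(y), len(y[0])
    mse = sum((a - b) ** 2 for ra, rb in zip(y, y_wm) for a, b in zip(ra, rb)) / (M * N)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)

y = [[100, 100], [100, 100]]
y_wm = [[100, 101], [100, 100]]      # one pixel changed by 1, so MSE = 0.25
print(round(psnr(y, y_wm), 2))
```

In this toy case the MSE is 0.25 and the PSNR is about 54.15 dB; identical frames give an infinite PSNR.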

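The MSSIM is defined as follows:

$$\mathrm{MSSIM}(X, Y) = \frac{1}{M} \sum_{j=1}^{M} \mathrm{SSIM}(x_j, y_j),$$

where $x_j$ and $y_j$ are the image contents at the j-th local window and M is the number of local windows of the image, and

$$\mathrm{SSIM}(x, y) = [l(x, y)]^{\alpha} \, [c(x, y)]^{\beta} \, [s(x, y)]^{\gamma},$$

where $l(x, y)$, $c(x, y)$, and $s(x, y)$ are the luminance comparison, the contrast comparison, and the structure comparison functions, respectively, and $\alpha > 0$, $\beta > 0$, and $\gamma > 0$ are parameters used to adjust the relative importance of the three components.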
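
A simplified MSSIM sketch in Python. It uses the common choice α = β = γ = 1 with $C_3 = C_2/2$, which collapses the three comparison functions into a single expression, the usual constants $C_1 = (0.01 \cdot 255)^2$ and $C_2 = (0.03 \cdot 255)^2$, and non-overlapping 8 × 8 windows. The reference SSIM uses an 11 × 11 Gaussian sliding window, so values from this sketch will differ somewhat from standard implementations; the test images and function names are hypothetical.

```python
def ssim_window(x, y, L=255.0, k1=0.01, k2=0.03):
    """SSIM of two equal-length pixel lists with alpha = beta = gamma = 1 and C3 = C2 / 2."""
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / (n - 1)
    vy = sum((b - my) ** 2 for b in y) / (n - 1)
    cxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return ((2 * mx * my + c1) * (2 * cxy + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def mssim(img1, img2, win=8):
    """Mean SSIM over non-overlapping win x win windows (simplified windowing)."""
    scores = []
    for i in range(0, len(img1) - win + 1, win):
        for j in range(0, len(img1[0]) - win + 1, win):
            x = [img1[r][c] for r in range(i, i + win) for c in range(j, j + win)]
            y = [img2[r][c] for r in range(i, i + win) for c in range(j, j + win)]
            scores.append(ssim_window(x, y))
    return sum(scores) / len(scores)

img = [[(16 * r + c) % 256 for c in range(16)] for r in range(16)]
noisy = [[v + (5 if (r + c) % 2 else -5) for c, v in enumerate(row)] for r, row in enumerate(img)]
print(mssim(img, img), mssim(img, noisy))
```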

As shown in Table 1, the average PSNR values of the watermarked videos are higher than 40 dB and the corresponding MSSIM values are very close to 1. This indicates that the watermark is invisible: the watermarked videos appear visually identical to the original ones, as shown in Figure 6.

4.2. Robustness Tests

Robustness is a very important requirement for any watermarking system. To verify it, we apply various types of attacks to the watermarked video and use the Normalized Correlation (NC) and the Bit Error Rate (BER) to measure the similarity between the original watermark W and the extracted watermark W'. The NC and BER are, respectively, calculated as

$$\mathrm{NC} = \frac{\sum_{i=1}^{P} W(i)\, W'(i)}{\sqrt{\sum_{i=1}^{P} W(i)^{2}} \, \sqrt{\sum_{i=1}^{P} W'(i)^{2}}},$$

$$\mathrm{BER} = \frac{1}{P} \sum_{i=1}^{P} W(i) \oplus W'(i),$$

where W, W', and P are, respectively, the original watermark, the extracted watermark, and the size of the watermark.

The correlation between W and W' is very high when NC is close to 1.
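
The NC and BER definitions translate directly into Python for watermarks stored as flat lists of 0/1 bits; the function names and the toy watermark are hypothetical.

```python
import math

def nc(w, w_ext):
    """Normalized correlation between the original and extracted watermark bits."""
    num = sum(a * b for a, b in zip(w, w_ext))
    den = math.sqrt(sum(a * a for a in w)) * math.sqrt(sum(b * b for b in w_ext))
    return num / den if den else 0.0

def ber(w, w_ext):
    """Bit error rate: fraction of the P bits where W(i) XOR W'(i) = 1."""
    return sum(a ^ b for a, b in zip(w, w_ext)) / len(w)

W = [1, 0, 1, 1]
W_ext = [1, 0, 0, 1]     # one bit flipped
print(round(nc(W, W_ext), 4), ber(W, W_ext))
```

With one of four bits flipped, NC drops to $2/\sqrt{6} \approx 0.8165$ and the BER is 0.25.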

Generally in video watermarking, attacks are divided into three categories: image processing attacks, frame synchronization attacks, and video compression attacks.

Image Processing Attacks. Since a video can be considered a sequence of images, the attacks applied to images can also be applied to video sequences. The common image processing attacks are as follows:
(i) Noise addition: three kinds of noise are added to the watermarked video: Gaussian noise, salt & pepper noise, and speckle noise, each with a density of 1%. Table 2 shows that the watermark is always detectable, with NC and BER values close to 1 and 0, respectively, especially for the Foreman and Football video sequences.
(ii) Filtering: a 3 × 3 median filter and a 5 × 5 Gaussian filter are applied separately to the watermarked videos, and Table 2 shows that the proposed method is robust against median and Gaussian filtering.
(iii) JPEG compression: the watermarked video is compressed with quality factors ranging from 10 to 100. Figure 7 shows the results for JPEG compression; for instance, when the watermarked video is compressed with a quality factor of 40, the obtained NC is greater than 95% and the BER is lower than 0.2%. This confirms the robustness of the proposed scheme to the JPEG compression attack.

Frame Synchronization Attacks. Because the contents of consecutive video frames are almost identical, video sequences are susceptible to temporal synchronization attacks such as frame averaging, frame dropping, and frame swapping.
(i) Frame dropping: selected watermarked frames are replaced by their corresponding original frames. Table 3 shows the average NC and BER values at different frame dropping rates. Our scheme achieves strong robustness against frame dropping, even at high rates (i.e., 80%).
(ii) Frame averaging: selected watermarked frames are replaced by the average of their previous, current, and next frames. The watermarked video is averaged at various averaging rates, and the watermark is then extracted. Table 4 shows that the watermark can be recovered at frame averaging rates up to 50%.
(iii) Frame swapping: the results in Table 5 show the robustness of our watermarking scheme against frame swapping: even when all watermarked frames are swapped, the NC is 1 and the BER is 0.
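
The frame dropping and frame averaging attacks defined above can be simulated as follows, with each frame represented as a flat list of pixel values. The helper `select_frames`, which picks the attacked frames uniformly at random for a given rate, and the choice to leave the first and last frames untouched by averaging (they lack one neighbour) are assumptions for illustration.

```python
import random

def select_frames(n_frames, rate, seed=0):
    """Hypothetical helper: choose round(rate * n_frames) distinct frame indices at random."""
    return sorted(random.Random(seed).sample(range(n_frames), round(rate * n_frames)))

def frame_average_attack(frames, indices):
    """Replace each selected frame by the mean of its previous, current, and next frames."""
    out = [list(f) for f in frames]
    for k in indices:
        if 0 < k < len(frames) - 1:   # first/last frame lack a neighbour: left unchanged here
            out[k] = [(a + b + c) / 3.0 for a, b, c in zip(frames[k - 1], frames[k], frames[k + 1])]
    return out

def frame_drop_attack(watermarked, original, indices):
    """Replace each selected watermarked frame by the corresponding original frame."""
    out = [list(f) for f in watermarked]
    for k in indices:
        out[k] = list(original[k])
    return out

frames = [[0], [3], [0], [9]]                 # toy one-pixel frames
print(frame_average_attack(frames, [1, 2]))   # prints [[0], [1.0], [4.0], [9]]
print(frame_drop_attack([[1], [2], [3]], [[0], [0], [0]], [0, 2]))
print(select_frames(10, 0.5))
```

Averaging always reads neighbours from the unattacked input, so the result does not depend on the order in which the selected frames are processed.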

Video Compression Attacks. Video compression is a fundamental attack in video watermarking that must be considered, since video sequences are stored and transmitted in compressed formats. Here we use the video processing tool VirtualDub to compress the video sequences with two different lossy codecs [31]: H.264 at a bit rate of 512 kbps and MPEG-4 at bit rates of 1500 kbps and 1000 kbps. The NC and BER values are given in Table 6 and show that the algorithm resists video compression attacks.

Results with the StirMark Benchmark. We also evaluated the proposed method with StirMark 3.1, a well-known tool for evaluating watermark robustness under image processing attacks, applied here to the watermarked video frames [32, 33]. Table 7 shows the evaluation of the proposed method with the StirMark benchmark under JPEG compression, median filtering with JPEG compression, Gaussian filtering with JPEG compression, sharpening with JPEG compression, and row and column removal with JPEG compression. As can be observed, the proposed watermarking technique resists all these attacks.

4.3. Comparison with Some Previously Reported Algorithms

To evaluate the performance of our algorithm, we compared the results of the proposed video watermarking scheme with those of the related video watermarking schemes in [6, 16, 18, 20], which were cited in the Introduction. Figure 8 displays the comparison results under frame dropping, frame averaging, and JPEG compression attacks. Under these attacks, the proposed scheme achieves better robustness than the compared schemes.

4.4. Complexity of the Proposed Algorithm

The proposed watermarking scheme has a low complexity for the following reasons:
(i) The algorithm operates in the MR-SVD domain, which has an overall complexity of O(n), where n indicates the signal length [23].
(ii) To extract fast motion frames, we use a simple threshold decision based on motion energy.
(iii) Embedding and extraction are not applied to all the frames of the video but only to the fast motion frames.

In terms of execution time, the proposed scheme requires 0.36 s per frame for watermark embedding and 0.19 s per frame for watermark extraction. The simulation was conducted on an Intel(R) Core™ i5 1.80 GHz processor with 4 GB of RAM, using MATLAB R2013b.

5. Conclusion

In this paper, we proposed a novel blind video watermarking scheme for fast motion frames using the SVD, the MR-SVD, QIM, and logistic map encryption. The proposed technique avoids embedding the watermark in all frames by choosing as host frames only those with high motion energy, in accordance with the characteristics of the HVS. This approach limits the number of frames to be processed and also ensures a higher imperceptibility of the watermark. In addition, the combination of the SVD, MR-SVD, QIM, and logistic map encryption makes our scheme secure and robust to a variety of attacks. The experimental results confirm that the proposed watermarking scheme has good imperceptibility, with PSNR values greater than 40 dB. The proposed scheme is robust not only to image processing attacks but also to frame synchronization and video compression attacks. The comparison with other video watermarking algorithms shows that our scheme achieves better robustness.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.