Abstract

In this paper, we propose a novel and efficient hardware implementation of an image watermarking system based on the Haar Discrete Wavelet Transform (DWT). DWT is used in image watermarking to hide secret information in digital content with good robustness. The main advantage of the Haar DWT is the separation of frequencies into four subbands (LL, LH, HL, and HH), which can be treated independently. This permits a better compromise between robustness and visibility. A Field Programmable Gate Array (FPGA) implementation, based on a Very Large Scale Integration (VLSI) architecture of the watermarking algorithm, is developed to accelerate media authentication. A hardware cosimulation strategy using the Matlab-Xilinx System Generator (XSG) was applied to prove the validity of the suggested implementation. The hardware cosimulation results show the effectiveness of the developed architecture in terms of visibility and robustness against several attacks. The proposed hardware system also achieves high performance in terms of operating speed.

1. Introduction

Digital watermarking is a technique for hiding information in a digital medium such as images, video, or audio for authentication control, copyright protection, integrity verification, etc. The hidden information is called a watermark, and the marked documents are called watermarked data. The distortion caused by the hidden watermark on the host data should be as low as possible. The watermarked and original images must be perceptually equivalent so that the embedded watermark remains imperceptible to the Human Visual System (HVS). The Peak Signal to Noise Ratio (PSNR) is used to measure imperceptibility. Even if the distortion caused by the watermark is small, it can be undesirable in some image types such as medical and military ones. For these types of applications, the PSNR value must be greater than 40 dB [1]. In this case, watermarking in the transform domain is recommended. In fact, transform spaces such as the Discrete Cosine Transform (DCT), the Discrete Wavelet Transform (DWT), and the Karhunen-Loève Transform (KLT) provide a special authentication to host images. They are especially used in telemedicine, e-healthcare, legal domains, telesurgery, etc.

The performance of a watermarking system is generally subject to the following requirements.

(i) Imperceptibility. The watermark should not affect the quality of the original image after any watermarking operation. Cox et al. [2] defined imperceptibility as the visual similarity between the original and watermarked images. The watermark has to be inserted in such a way that it remains completely invisible to the HVS [3]; the insertion process must not damage the host image. Moreover, the watermark itself should not be distorted: it must be invisible, but also easy to extract.

(ii) Capacity. The capacity of a watermarking system refers to the ratio of the amount of data to be hidden to the size of the host document [4]. Sometimes the size of the watermark is limited to just 1 bit.

(iii) Robustness. Robustness is the resistance of the watermarking system against intentional transformations of a watermarked image [5]. These transformations can be geometric, such as rotation and cropping, and they also include all types of image degradation caused by lossy compression, high-pass filtering, low-pass filtering, etc.

To these requirements, we can add computational complexity. Execution time can be an important factor for many applications. Watermarking algorithms with a low computational cost can be used to reduce the execution time, but this can strongly degrade the system performance. Alternatively, the algorithm can be adapted for hardware implementation to accelerate the processing while maintaining the technique’s effectiveness [6]. In the related literature, software implementations of watermarking algorithms are far more common than hardware implementations, despite the performance that the latter can achieve [7]. In a software implementation, the algorithm’s operations are performed as code running on a microprocessor [8]. The main drawback of this type of implementation [8] is the limited means for improving the system speed and the hardware performance. Although it might be faster to implement an algorithm in software, there are compelling reasons to move to a hardware implementation. In this kind of implementation, the algorithm’s operations are fully implemented in custom-designed circuitry. This offers significant advantages such as reduced hardware area and power consumption and, above all, increased speed [7–9].

In the literature, a number of hardware designs for conventional watermarking algorithms have been reported. The Very Large Scale Integration (VLSI) architecture for a conventional watermarking algorithm in the spatial domain proposed by Gerimella et al. [10] can be considered a noteworthy early work. Later, Mohanty et al. [11] proposed a watermarking hardware architecture that can insert two visible watermarks into digital images using a spatial domain watermarking technique. Mohanty et al. [12] put forward a VLSI architecture that can insert invisible or visible watermarks into digital images in the DCT domain. Mohanty et al. [13] developed two versions (low-power and high-performance) of a watermarking hardware module, in which the DC component and the three low-frequency components are considered for insertion in the DCT domain. Maity et al. [14] suggested a Spread Spectrum (SS) image watermarking scheme based on the fast Walsh transform (FWT) that would serve for authentication in data transmission. In [15], Korrapati Rajitha et al. proposed an FPGA implementation of a watermarking system using the Xilinx System Generator (XSG), in which insertion and extraction of information were applied in the spatial domain. In [16], Rohollah Mazrae Khoshki et al. put forward a hardware implementation of a watermarking system based on DCT. Their work was developed using Matlab-Simulink followed by Altera DSP Builder (integrated with Simulink Embedded Coder) for automatic code generation. In [17], Rahate Kunal B. et al. suggested a hardware implementation of a fragile watermarking system operating in the spatial domain. Their proposed watermarking scheme was imperceptible and robust against geometric attacks, but fragile against filtering and compression. Hirak Kumar Maitya et al. [6] put forward an FPGA implementation of reversible watermarking in the spatial domain using a Reversible Contrast Mapping (RCM) technique. The implemented algorithm and the resulting architecture were relatively simple, and the principal advantage of that work was the operating frequency (more than 98.76 MHz).

In [18], Sakthivel S.M. et al. put forward a VLSI architecture of a digital image watermarking system. Their embedding process was based on the Pixel Value Search Algorithm (PVSA) applied in the spatial domain. The system was implemented using the Verilog Hardware Description Language (HDL) and the Altera Quartus-II 11.0 tool with Matlab R2012b. The presented results showed that the proposed system was not particularly fast and achieved only average quality for the watermarked image and for the watermark extracted after different attacks. In [19], Manas N. et al. suggested a hardware implementation of a watermarking algorithm based on phase congruency and singular value decomposition. Their idea consisted in embedding watermark data in the host image using the Singular Value Decomposition (SVD) at the points of the phase congruency map, applied in the spatial domain. Their system was implemented using the Xilinx ISE 14.3 tool and a Virtex-5 FPGA device. In [20], Karthigai kumara P. et al. put forward an FPGA implementation of an image watermarking system using the XSG tool. Their system embeds a binary watermark in the discrete wavelet domain of a host image. Its main disadvantage is that the corresponding hardware design consumed a lot of hardware resources even though the system used only the DWT.

From this review of existing work on the hardware implementation of watermarking systems, we note that most of the reported designs are inefficient either in terms of hardware performance or in terms of robustness against attacks. Many of them operate in the spatial domain, sometimes with very simple techniques, and lack hardware speed efficiency. Moreover, hiding confidential data in the spatial domain is generally vulnerable to hackers. In this work, we suggest a novel and efficient hardware implementation of a watermarking system based on the Haar DWT. We aim at developing a watermarking system that ensures high hardware efficiency together with high imperceptibility (measured by the PSNR) and robustness (measured by the Normalized Cross-Correlation, NC). The system is designed using the XSG tool and synthesized for the Xilinx Virtex-5 FPGA of the ML507 platform. A comparison with existing watermarking systems is carried out to show the effectiveness of the proposed module in terms of hardware performance, imperceptibility, and robustness against several attacks.

The rest of the paper is organized as follows: In Section 2, a description of the different steps of the adopted watermarking algorithm is given. In Section 3, we describe the hardware design of the watermarking system. The implementation results and the performance evaluation of the developed watermarking system are presented in Section 4.

2. Description of Watermarking Algorithm

Watermarking systems of digital images are composed of two main parts: insertion and detection [21]. The diffusion process includes the attacks applied to watermarked images.

2.1. Insertion Step

As illustrated in Figure 1, the proposed system is an additive scheme: the watermark, mixed with the key, is weighted by the visibility factor and added to the coefficients of the selected wavelet subband.

In the insertion phase, our system requires four data inputs:

(i) The original image (I), which will contain the data to be preserved and protected.

(ii) The watermark (W), which represents the binary information to be inserted.

(iii) The key (C), which is a binary sequence to be mixed with the watermark for its protection.

(iv) The visibility factor (α), which is the marking strength in the image. This coefficient must be adequately chosen to maintain the best compromise between the robustness and imperceptibility of the scheme.

After the second level of decomposition using the 2D Haar DWT, we obtain four subbands, each one-quarter of the input image’s width and height (Figure 1): approximation (LL2 band: low frequencies) and details (horizontal (LH2), vertical (HL2), and diagonal (HH2)). In our adopted method, we opt for inserting the watermark in the LH2 subband, which includes the medium frequencies. At the end of this phase, the 2D inverse DWT (IDWT) is applied to construct the watermarked image.
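The additive insertion described above can be illustrated in software. The following is a minimal sketch, not the hardware design itself: the function name `embed_watermark`, the row-major placement of the bits, and the use of a bitwise XOR to mix the watermark with the key are all assumptions made for illustration, since the text only states that the key is "mixed" with the watermark.

```python
def embed_watermark(lh2, watermark_bits, key_bits, alpha):
    """Additively embed a key-scrambled binary watermark into a subband.

    lh2            : 2D list of subband coefficients (left unmodified)
    watermark_bits : flat list of 0/1 values
    key_bits       : flat list of 0/1 values, same length as watermark_bits
    alpha          : visibility factor (marking strength)
    """
    if len(watermark_bits) != len(key_bits):
        raise ValueError("watermark and key must have the same length")
    cols = len(lh2[0])
    if len(watermark_bits) > len(lh2) * cols:
        raise ValueError("watermark does not fit in the subband")

    marked = [row[:] for row in lh2]          # work on a copy
    for i, (w, c) in enumerate(zip(watermark_bits, key_bits)):
        scrambled = w ^ c                     # assumed mixing rule: XOR
        r, col = divmod(i, cols)              # assumed row-major placement
        marked[r][col] += alpha * scrambled   # additive insertion
    return marked


lh2 = [[1.0, -2.0], [0.5, 4.0]]
marked = embed_watermark(lh2, [1, 0, 1, 1], [0, 0, 1, 0], alpha=3)
print(marked)
```

A larger α moves the marked coefficients further from the originals, which makes the watermark easier to recover after an attack but more visible in the reconstructed image.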

2.2. Extraction Step

As depicted in Figure 2, the extraction step follows the same steps as the insertion phase. The 2D Haar DWT is applied at the second level of the decomposition. After that, the watermark is recovered by subtracting the original LH2 subband from the watermarked one.

The watermarked image may be subject to alterations caused by attacks. A thresholding phase is therefore necessary for the proper extraction of the watermark. Equation (3) is applied to set the value of the watermark.
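The extraction and thresholding steps can be sketched as follows. This is an illustrative sketch under stated assumptions: the function name `extract_watermark`, the division by α, the mid-point threshold of 0.5, the row-major bit order, and the XOR descrambling are all hypothetical choices, because the exact thresholding rule of Equation (3) is not reproduced here.

```python
def extract_watermark(lh2_marked, lh2_original, key_bits, alpha, threshold=0.5):
    """Recover watermark bits by subtraction followed by thresholding."""
    cols = len(lh2_original[0])
    bits = []
    for i, key_bit in enumerate(key_bits):
        r, c = divmod(i, cols)                       # assumed row-major order
        # Subtraction step: estimate of the (possibly attacked) scrambled bit
        w_prime = (lh2_marked[r][c] - lh2_original[r][c]) / alpha
        scrambled = 1 if w_prime > threshold else 0  # assumed threshold rule
        bits.append(scrambled ^ key_bit)             # assumed XOR descrambling
    return bits


original = [[1.0, -2.0], [0.5, 4.0]]
attacked = [[3.2, -1.9], [0.4, 7.3]]   # marked subband with small perturbations
print(extract_watermark(attacked, original, [0, 0, 1, 0], alpha=3))
```

In this example the perturbations stay below half of α, so the thresholding still recovers the embedded bits.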

3. Hardware Design of the Watermarking System

Xilinx provides an Integrated Design Environment (IDE) for FPGAs under the Matlab tool. This IDE aims to raise the abstraction level of the hardware design and to minimize manual intervention in HDL code generation [22]. This tool, named XSG, is a high-level design tool that allows the MathWorks Simulink environment to be used in the design of digital circuits dedicated to Xilinx FPGAs [23]. It is used for hardware system generation, simulation, and validation through the hardware cosimulation technique.

The structure of a system is created in the Simulink modeling environment using a specific library offered by Xilinx. All the designing steps for the implementation on FPGA, including synthesis, placement, and routing, are automatically performed to generate an FPGA programming file.

The designer starts by creating the system model in Simulink. Next, “Sysgen” automatically generates the bitstream to program the FPGA. The intermediate steps, which are synthesis, placement, and routing, are performed by intermediate tools. Figure 3 describes the XSG-based design flow.

In our design, the acquisition and display of input and output images are performed using the Matlab tool. At this phase, data are represented in double-precision floating-point format. The processing algorithm is implemented using XSG blocks. In the XSG design, Boolean and fixed-point formats are used for data representation. To bridge the representation differences between the XSG design and the Matlab software part, Xilinx offers a simple interface based on the predefined “Gateway-In” and “Gateway-Out” blocks provided in the Xilinx Blockset Library. The global design of the watermarking system is divided into two principal modules: insertion and extraction.
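To illustrate what the conversion from double precision to fixed point involves, here is a small software sketch. The word length (16 bits), the number of fractional bits (8), rounding, and saturation are assumptions chosen for the example; the formats actually configured in the design are not stated in the text.

```python
def to_fixed(value, word_bits=16, frac_bits=8):
    """Quantize a float to a signed fixed-point value (returned as a float).

    Assumed behaviour: round to the nearest representable step, then
    saturate to the range of a signed word of `word_bits` bits.
    """
    scale = 1 << frac_bits
    raw = round(value * scale)               # Python rounds exact ties to even
    lowest = -(1 << (word_bits - 1))
    highest = (1 << (word_bits - 1)) - 1
    raw = max(lowest, min(highest, raw))     # saturation on overflow
    return raw / scale


print(to_fixed(1.5))       # exactly representable with 8 fractional bits
print(to_fixed(0.001))     # below half a step, quantized to zero
print(to_fixed(1000.0))    # out of range, saturates at the largest value
```

The quantization step (here 1/256) bounds the rounding error introduced at the input of the hardware model, which is one reason hardware and software results can differ slightly.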

3.1. Insertion Module

As shown in Figure 4, the global design of the insertion module is composed of two main blocks. The first one corresponds to the decomposition and reconstruction of the DWT at the second level. The second one corresponds to the insertion step.

3.1.1. 2D Haar DWT

(a) Decomposition Step of the 2D DWT of Haar. The one-dimensional decomposition is obtained by applying the decomposition equations, “A” for approximations and “M” for details.

As shown in Figure 5, the Haar wavelet decomposition in two dimensions is performed in two stages. The first stage consists in applying these two equations along the lines, which yields two subbands, generally denoted as L and H. A transposition is then made in order to reach the second stage, which consists in applying the same equations along the columns. The four subbands LL, LH, HL, and HH are thus obtained.
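The two-stage procedure can be sketched in software as follows. This is a minimal sketch that assumes the averaging form of the Haar transform, with A = (x0 + x1) / 2 and M = (x0 − x1) / 2 for each pair of neighbouring samples; the normalization used in the actual design may differ, and the function names are hypothetical.

```python
def haar_1d(values):
    """One Haar step on a sequence of even length (assumed averaging form)."""
    approx = [(values[i] + values[i + 1]) / 2 for i in range(0, len(values), 2)]
    detail = [(values[i] - values[i + 1]) / 2 for i in range(0, len(values), 2)]
    return approx, detail


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def split_columns(band):
    """Apply haar_1d along the columns of `band` using a transposition."""
    low, high = [], []
    for column in transpose(band):
        a, m = haar_1d(column)
        low.append(a)
        high.append(m)
    return transpose(low), transpose(high)


def haar_2d(image):
    """Single-level 2D Haar decomposition: lines first, then columns."""
    L, H = [], []
    for row in image:                 # stage 1: along the lines
        a, m = haar_1d(row)
        L.append(a)
        H.append(m)
    LL, LH = split_columns(L)         # stage 2: along the columns of L
    HL, HH = split_columns(H)         # stage 2: along the columns of H
    return LL, LH, HL, HH


print(haar_2d([[1, 2], [3, 4]]))
```

Applying `haar_2d` a second time to the LL output gives the second-level subbands (LL2, LH2, HL2, HH2) used by the watermarking scheme.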

Figure 6 gives the various parts of the 2D Haar DWT global design.

(i) Preprocessing Subsystem. The preprocessing subsystem prepares the input data in order to accelerate the wavelet computation. The idea consists in decomposing the entire image into four components by separating the even and odd pixels of each even and odd image line. This allows the wavelet steps to be performed in a single pass. The design is presented in Figure 7.

(ii) Calculation Subsystem. This subsystem computes the wavelet-domain coefficients. It receives and processes the outputs of the preprocessing subsystem in order to produce four outputs, which are the LL, LH, HL, and HH coefficients. As shown in Figure 8, the calculation is done by addition, subtraction, and multiplication blocks.

(iii) Storage Subsystem. After the wavelet computation, a storage stage is required; this is the role of the “Storage” subsystem. To accelerate the writing and reading of data, we have opted for internal RAM blocks. The Storage subsystem design is presented in Figure 9.
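The interplay of the preprocessing and calculation subsystems can be illustrated with a software sketch. It assumes the averaging form of the Haar transform (each output is a signed sum of four pixels multiplied by 0.25) and 0-based indexing for "even" and "odd"; the function names are hypothetical and the scaling constant of the real design is not given in the text.

```python
def split_even_odd(image):
    """Preprocessing: separate the image into four pixel streams."""
    even_rows, odd_rows = image[0::2], image[1::2]
    ee = [row[0::2] for row in even_rows]   # even line, even pixel
    eo = [row[1::2] for row in even_rows]   # even line, odd pixel
    oe = [row[0::2] for row in odd_rows]    # odd line, even pixel
    oo = [row[1::2] for row in odd_rows]    # odd line, odd pixel
    return ee, eo, oe, oo


def haar_single_pass(image):
    """Calculation: all four subbands from one pass over the four streams."""
    ee, eo, oe, oo = split_even_odd(image)
    rows, cols = len(ee), len(ee[0])
    ll = [[0.0] * cols for _ in range(rows)]
    lh = [[0.0] * cols for _ in range(rows)]
    hl = [[0.0] * cols for _ in range(rows)]
    hh = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            a, b, d, e = ee[r][c], eo[r][c], oe[r][c], oo[r][c]
            # Only additions, subtractions, and one multiplication per output
            ll[r][c] = (a + b + d + e) * 0.25
            lh[r][c] = (a + b - d - e) * 0.25
            hl[r][c] = (a - b + d - e) * 0.25
            hh[r][c] = (a - b - d + e) * 0.25
    return ll, lh, hl, hh


print(haar_single_pass([[1, 2], [3, 4]]))
```

Because each 2×2 block of pixels is available at once, the line and column stages collapse into one set of expressions, which is what makes a single-pass hardware pipeline possible.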

(b) Inverse Transformation of 2D DWT of Haar. The principle of calculating the coefficients of the original image is depicted in Figure 10. From four subbands (LL, LH, HH, and HL), the first step is to calculate L and H. Then in the second step, the original pixels are calculated.

The process of calculation is as follows:

(i) We begin with the computation of the L subband coefficients. This is done by browsing, at the same time, the two LL and LH subbands along the columns using the following equations:

(ii) Thereafter, with HL and HH, we calculate the coefficients of band H. This is achieved by the following equations:

(iii) At this stage, the original pixels are calculated, browsing at the same time the two bands L and H along lines using the following equations:

(iv) After this last step, we have the original pixels. Finally the pixels are organized to reform the input image.

For the implementation of the Haar IDWT with XSG tools, we propose the subsystem shown in Figure 10. This subsystem processes the wavelet-domain coefficients in order to recover the original data. The original data are computed with addition and subtraction blocks. We also use other logic blocks for data control and shaping.
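The reconstruction steps (i) to (iv) can be sketched in software as follows. The sketch assumes that the forward transform used the averaging form (A = (x0 + x1) / 2, M = (x0 − x1) / 2), in which case the inverse needs only additions and subtractions (x0 = A + M, x1 = A − M), consistent with the blocks mentioned above. The function name is hypothetical.

```python
def haar_inverse_2d(LL, LH, HL, HH):
    """Rebuild an image from one level of Haar subbands (assumed averaging form)."""
    rows, cols = len(LL), len(LL[0])
    image = [[0.0] * (2 * cols) for _ in range(2 * rows)]
    for r in range(rows):
        for c in range(cols):
            # Step (i): L coefficients from LL and LH, along the columns
            l_top = LL[r][c] + LH[r][c]
            l_bottom = LL[r][c] - LH[r][c]
            # Step (ii): H coefficients from HL and HH, along the columns
            h_top = HL[r][c] + HH[r][c]
            h_bottom = HL[r][c] - HH[r][c]
            # Step (iii): original pixels from L and H, along the lines
            image[2 * r][2 * c] = l_top + h_top
            image[2 * r][2 * c + 1] = l_top - h_top
            image[2 * r + 1][2 * c] = l_bottom + h_bottom
            image[2 * r + 1][2 * c + 1] = l_bottom - h_bottom
    # Step (iv): pixels are already written back at their image positions
    return image


print(haar_inverse_2d([[2.5]], [[-1.0]], [[-0.5]], [[0.0]]))
```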

3.1.2. Hiding Watermark on the Host Image

As presented in Figure 11, the second step is the insertion system. At this step, the entire watermark is embedded in LH2 (the second-level horizontal subband). The watermark is scrambled by a secret key generated by the “LFSR” block. Afterward, the “DSP48 macro” block multiplies the scrambled watermark by the visibility factor “α” and adds the result to LH2.

The inputs of the “DSP48 macro” block are, respectively, LH2, alpha, and the scrambled watermark. Its output is the watermarked LH2.
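The role of the LFSR as a key generator can be illustrated with a short software sketch. The register width (4 bits), the tap positions, the seed, and the XOR scrambling are assumptions chosen to keep the example small; the parameters of the “LFSR” block in the actual design are not given in the text.

```python
def lfsr_bits(seed, taps, width, count):
    """Generate `count` bits from a right-shifting Fibonacci LFSR.

    seed  : non-zero initial register state
    taps  : bit positions (0 = least significant) XORed to form the feedback
    width : register length in bits
    """
    state = seed & ((1 << width) - 1)
    if state == 0:
        raise ValueError("seed must be non-zero")
    bits = []
    for _ in range(count):
        bits.append(state & 1)                       # output bit
        feedback = 0
        for t in taps:
            feedback ^= (state >> t) & 1
        state = (state >> 1) | (feedback << (width - 1))
    return bits


key = lfsr_bits(seed=0b0001, taps=(0, 1), width=4, count=15)
watermark = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0]
scrambled = [w ^ k for w, k in zip(watermark, key)]      # assumed XOR mixing
recovered = [s ^ k for s, k in zip(scrambled, key)]      # same key descrambles
print(key)
print(recovered == watermark)
```

With these taps the 4-bit register cycles through all 15 non-zero states before repeating. Since XOR is its own inverse, the extraction side only needs the same seed and taps to regenerate the key.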

3.2. Extraction Module

The extraction step is the last phase of the watermarking system, which aims to extract the inserted data. Figure 12 represents the global design of the extraction system. At this step, the same procedure is applied in reverse.

The main difference, relative to the insertion step, is the extraction block, whose design is presented in Figure 13. We first obtain the original and watermarked subbands. A subtraction is then applied to extract the modified watermark, named W'. The latter is stored in a FIFO. Finally, the final watermark is extracted by thresholding.

4. Implementation Results and Performance Evaluation

In this section, we start by presenting the hardware implementation results of the adopted system. Some examples of the cosimulation results of the generated hardware block are presented. The efficiency of the proposed system is then discussed according to the PSNR value between the original and the watermarked image and the NC value between the original and the detected watermark under several attacks. Finally, a comparison with some existing works is given.

4.1. Cosimulation Results

After the validation of the adopted algorithm by the software simulation, we proceed to the implementation on a Xilinx platform. The configuration file is obtained automatically by following the necessary steps to convert the design into an FPGA synthesizable module (Figure 14). The target device selected for this work is Virtex-5 FPGA on the ML507 platform.

The hardware implementation of the insertion and extraction steps on the ML507 target yields the FPGA resource consumption reported in Table 1. The Register Transfer Level (RTL) diagrams of the insertion and extraction systems are presented in Figure 15.

For the validation of our study, we considered a set of ordinary test images such as “Cameraman,” “Lena,” “Barbara,” etc. In Figure 16, we present some implementation results of the adopted watermarking system on the “Cameraman” image for different values of α (3, 6, 10, and 20). We notice that increasing the visibility factor degrades the psychovisual quality of the watermarked image. It should be noted that, in the absence of attacks, the watermark is well extracted, from which we can conclude that the implemented system gives results similar to those obtained by the software implementation.

4.2. Performance Evaluation against Several Attacks of Implemented System

Following the literature, the main constraints of a watermarking scheme are imperceptibility and robustness. The first one is measured by the PSNR, presented in (8). A PSNR value is considered acceptable if it is greater than 30 dB [24]. The second one is measured by the NC, presented in (9). An NC value is considered acceptable if it is greater than 0.7 [25].
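Both metrics are straightforward to compute in software. The sketch below uses common textbook definitions, which are assumptions here: the PSNR is computed for 8-bit images (peak value 255) from the mean squared error, and the NC is taken as the correlation of the two watermarks normalized by the product of their energies. The exact forms of (8) and (9) may differ, in particular for the NC, for which several variants exist.

```python
import math


def psnr(original, marked, peak=255.0):
    """PSNR in dB between two images given as 2D lists of equal size."""
    squared_error = 0.0
    count = 0
    for row_o, row_m in zip(original, marked):
        for a, b in zip(row_o, row_m):
            squared_error += (a - b) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf                      # identical images
    return 10 * math.log10(peak ** 2 / mse)


def normalized_correlation(w, w_extracted):
    """NC between two watermarks given as flat lists (assumed definition)."""
    num = sum(a * b for a, b in zip(w, w_extracted))
    den = math.sqrt(sum(a * a for a in w) * sum(b * b for b in w_extracted))
    return num / den if den else 0.0


print(psnr([[0, 0], [0, 0]], [[1, 1], [1, 1]]))
print(normalized_correlation([1, 0, 1, 1], [1, 0, 0, 1]))
```

With this NC definition, a perfectly recovered watermark gives 1, and each lost or spurious "1" bit lowers the value.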

After several empirical tests, applied to ordinary images with and without attacks, we have found that, for α equal to 3, the adopted algorithm ensures the best compromise between robustness and imperceptibility.

Figure 17 shows the results of the hardware cosimulation on ordinary images in the absence of attacks. It can be concluded that the PSNR values are acceptable with respect to previous work and with respect to the results obtained for the software implementation.

Our algorithm, implemented in hardware, is also robust against several types of attacks. After applying an attack, we extract the watermark and compare it to the original one. The main goal is to ensure that the extracted watermark is not modified by the attack: it is important to obtain an NC value close to 1 together with a good PSNR value. Robustness against diverse types of attacks, such as JPEG 2000 attacks, impulsive noise, median filter, cropping, flipping, and stretching, is among the important watermarking constraints.

After attacking the six ordinary test images, we attempt to extract the watermark and calculate the NC value. Our aim is to assess the degree of robustness of our scheme against diverse attacks. Tables 2, 3, 4, and 5 and Figure 18 show the experimental results for the NC and PSNR values between the host and extracted watermark after applying attacks.

4.3. Discussion of the Proposed Scheme

In this section, we compare the results obtained with the suggested system to those of the systems cited in the related work. For this comparison, we consider the most typical and recent related papers [6, 18–20], which are among the most important works addressing the hardware design of watermarking systems with interesting results. First, regarding the psychovisual quality of the original and watermarked images, our hardware implementation provides very good results compared to those of the software implementation.

As provided in Table 6, in the absence of any type of attack, the PSNR for the image “Lena” is equal to 48.4715 dB, which is a better result than those of the aforementioned algorithms. Among the most serious attacks, we apply the JPEG attack. The obtained result shows that the proposed scheme is very effective against this kind of attack. In fact, the results presented in Table 2 show that, starting from a compression rate of 50%, the NC value is greater than 0.7. Compared with previous work (Table 6), we note that our implemented system gives better results.

The evaluation of our implemented method against impulsive noise shows very promising results, as presented in Table 3. The NC of the recovered watermark is greater than 0.7 for a noise density of 0.01; beyond this value, the recovery of the watermark is not acceptable. Our implemented approach has thus proven its robustness against this type of attack and, compared to other works, gives better results.
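For readers who wish to reproduce this kind of test in software, an impulsive-noise attack can be sketched as follows. The sketch assumes 8-bit grayscale images and salt-and-pepper noise in which each affected pixel becomes 0 or 255 with equal probability; the noise generator actually used for Table 3 is not specified, and the function name and fixed seed are hypothetical.

```python
import random


def salt_and_pepper(image, density, seed=0):
    """Return a copy of `image` where a fraction `density` of the pixels
    (on average) is replaced by 0 or 255."""
    rng = random.Random(seed)                # fixed seed for repeatable tests
    noisy = [row[:] for row in image]
    for r in range(len(noisy)):
        for c in range(len(noisy[0])):
            if rng.random() < density:
                noisy[r][c] = 255 if rng.random() < 0.5 else 0
    return noisy


image = [[128] * 8 for _ in range(8)]
attacked = salt_and_pepper(image, density=0.01)
changed = sum(a != b for ra, rb in zip(image, attacked) for a, b in zip(ra, rb))
print(changed, "of", 64, "pixels modified")
```

At a density of 0.01 only about one pixel in a hundred is altered, but each altered pixel changes by a large amount, which is why this attack is demanding for coefficient-domain watermarks.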

We also test our system against the median filter, using window sizes from 3×3 to 9×9 (Table 4). The detection by correlation between the extracted and inserted watermarks shows that our implemented system is robust against median-filter attacks (NC is greater than 0.7 for a window size less than or equal to 5×5) and keeps the visual appearance of the image after watermarking. As illustrated in Table 6, the proposed architecture gives relatively good results in general.
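A median-filter attack with a configurable window can be sketched in software as follows. Border handling by replicating the edge pixels is an assumption made for this example, as is the function name; the filter implementation used for Table 4 is not described in the text.

```python
def median_filter(image, size=3):
    """Apply a size x size median filter (odd `size`) to a 2D list."""
    half = size // 2
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]
    for r in range(rows):
        for c in range(cols):
            window = []
            for dr in range(-half, half + 1):
                for dc in range(-half, half + 1):
                    rr = min(max(r + dr, 0), rows - 1)   # replicate borders
                    cc = min(max(c + dc, 0), cols - 1)
                    window.append(image[rr][cc])
            window.sort()
            out[r][c] = window[len(window) // 2]         # middle element
    return out


spike = [[0] * 5 for _ in range(5)]
spike[2][2] = 255
print(median_filter(spike, size=3)[2][2])
```

Larger windows remove more fine detail, which is consistent with the NC dropping as the window grows.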

The proposed architecture also gives acceptable results (NC greater than 0.7) against geometric attacks such as the flipping and stretching of the watermarked images. The last attack applied to the proposed system is the so-called “cropping” attack. Note that for a window lower than or equal to 25% of the size of the watermarked image, the NC value is less than 0.7. Compared to previous works, we note that this is also an acceptable result for our architecture.

The hardware performance of the proposed system has been evaluated with respect to the operating frequency and the FPGA resource occupancy rate. According to Table 7, the proposed architecture broadly gives better results. The highest operating frequency reported in previous work is 183.8 MHz [19], whereas the maximum operating frequency of our design is 224 MHz. Compared to [20], we note that even if the proposed architecture is slower, it presents a better hardware resource occupation rate.

5. Conclusion

In this paper, a novel and efficient hardware implementation of an image watermarking system based on the Haar Discrete Wavelet Transform has been developed. The performance of the proposed hardware implementation in terms of processing latency has been evaluated and compared to previous work. The XSG tool has been used for system development. Using this tool greatly reduces the design time, since the same design is first used for software validation and then for hardware system generation. A hardware cosimulation strategy using the XSG was applied to prove the validity of the proposed implementation. The hardware cosimulation results showed the effectiveness of the developed architecture in terms of visibility and robustness against several attacks.

Data Availability

The obtained results used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Authors’ Contributions

All authors helped in conceiving the experiments. Mohamed Ali Hajjaji designed and performed the experiments. At the same time, Mohamed Ali Hajjaji and Mohamed Gafsi wrote the main part of the paper. Abdellatif Mtibaa and Abdessalem Ben Abdelali contributed to interpreting the results and revising and writing of the paper.