A New Reversible Database Watermarking Approach with Firefly Optimization Algorithm

Imamoglu, Mustafa Bilgehan; Ulutas, Mustafa; Ulutas, Guzin

doi:https://doi.org/10.1155/2017/1387375

Mathematical Problems in Engineering

On this page

Abstract Introduction Related Works Results and Discussion Conclusion Conflicts of Interest References Copyright Related Articles

Research Article | Open Access

Volume 2017 | Article ID 1387375 | https://doi.org/10.1155/2017/1387375

A New Reversible Database Watermarking Approach with Firefly Optimization Algorithm

Mustafa Bilgehan Imamoglu,¹Mustafa Ulutas,¹and Guzin Ulutas¹

Academic Editor: Lotfi Senhadji

Received05 Sept 2016

Revised25 Dec 2016

Accepted30 Jan 2017

Published06 Mar 2017

Abstract

Up-to-date information is crucial in many fields such as medicine, science, and stock market, where data should be distributed to clients from a centralized database. Shared databases are usually stored in data centers where they are distributed over insecure public access network, the Internet. Sharing may result in a number of problems such as unauthorized copies, alteration of data, and distribution to unauthorized people for reuse. Researchers proposed using watermarking to prevent problems and claim digital rights. Many methods are proposed recently to watermark databases to protect digital rights of owners. Particularly, optimization based watermarking techniques draw attention, which results in lower distortion and improved watermark capacity. Difference expansion watermarking (DEW) with Firefly Algorithm (FFA), a bioinspired optimization technique, is proposed to embed watermark into relational databases in this work. Best attribute values to yield lower distortion and increased watermark capacity are selected efficiently by the FFA. Experimental results indicate that FFA has reduced complexity and results in less distortion and improved watermark capacity compared to similar works reported in the literature.

1. Introduction

The rapid development in Information Technologies (IT) has simplified data sharing through the web and enabled collaboration among people, organizations, and governments worldwide. The collaborative environment necessitates either free or commercial sharing of information as in medicine, science, and stock quotes. Most of the time, centralized or distributed relational databases hold shared information where authorized users can access and use it by applications. One of the problems in sharing information is that information can be copied, altered, or modified easily by authorized users and may be distributed to unauthorized users for reuse. Therefore, ensuring copyright and preventing temper of shared data have become a serious problem.

Cryptography can be used to share confidential information without compromising security. However, it does not associate ciphered information with original content and cannot be used for copyright protection. Watermarking is proposed by [1] and provides solutions for authentication, fingerprinting, copy control, and copyright protection during distribution of digital data. Many watermarking techniques have been proposed in the literature to share images, video files, audio files, and relational databases without violating copyrights. Robustness, fidelity, and blindness properties of watermarking techniques have been used for copyright protection of relational databases recently.

In 2003, Kiernan et al. showed that watermarking could be used to provide copyright protection of relational databases [2]. Their method embeds watermark data at bit level on numeric data fields. A secure Message Authentication Code (MAC) is utilized by their work to select some tuples and attributes on the selected tuples to embed the watermark bits. LSB embedding technique is used by their work and tampers the relational database permanently. A pseudo random number generator is used to generate the watermark bits. When an attacker implements simple shifting operation on the LSB bits of numeric fields in the watermarked database, a significant amount of watermark will be lost. After this, many works in the field have been proposed by other researchers. In 2004 [3], Sion et al. proposed a method to insert a virtual watermark. Their method first sorts tuples in ascending order by most significant bits of normalized values and then puts markers at tuples where MAC mod yield zero to create partitions. The watermark is then embedded by modifying tuple values close to standard deviation boundary which results in altered partition statistics. Their work has some extra payload information and is dependent on a single attribute.

Li et al. used categorical attribute to detect malicious alterations and used fragile watermarking approach [4]. The tuples in the database are grouped and watermarks are embedded into groups. Verification procedure is applied on group level. Guo et al. proposed a fragile watermarking scheme to detect malicious modifications on a database in 2006 [5]. The method divides the database into partitions according to the primary key value of tuples. However, two LSB bits are modified by their approach to embed the watermark information. In 2006, Zhang et al. proposed the first reversible watermarking scheme in their work [6]. Their method utilized histogram expansion approach to reversibly watermark the selected nonzero initial digits of errors. In 2007, Zhou et al. embedded a bitmap image into the relational database for copyright protection [7]. Their method also implements an error correction mechanism to correct the errors in the detected watermarks. Difference Expansion Based Watermarking (DEW) technique is used in [8] to watermark a database in a reversible manner. DEW was proposed by Alattar in 2004 and used average values with expanding or enlarging to embed the watermark bits [9]. The method described in [8] uses distortion tolerance to determine the distortion tolerance of the attribute. Bhattacharya and Cortesi created the watermark after partitioning tuples as a permutation of tuples in 2009 [10]. A hash function is used by their method for grouping purposes. The proposed method is distortion-free due to the ordering of tuples. Hanyurwimfura et al. used nonnumeric attributes that contain multiwords [11]. Their method uses Levenshtein distance during the embedding procedure. Location of a word is shifted horizontally according to the watermark bit. Tuples and attributes that are used for embedding procedure are chosen dynamically. Farfoura et al. [12] used reversible watermarking technique in their method. Reversible watermarking approach recovers the original data from the watermarked data after watermark extraction. Primary key dependent technique cannot resist against linear transformation attacks because when the watermarked tuple is deleted, watermark cannot be detected. Their work utilized Prediction Error Expansion watermarking technique proposed by [13]. In 2012, Arif et al. emphasized rewatermarking attack and used date stamp with watermark to overcome this problem [14]. Khan and Husain proposed a database watermarking technique based on fragile watermarking [15]. Their method is a kind of zero watermarking approach and utilizes characteristics of database relation. Numeric attributes are evaluated by the method and it is not resilient to attribute value substitution attack. Camara et al. suggested a fragile zero watermarking method to authenticate numerical relational data [16]. Their method is distortion-free and generates the watermark from the original database. The method partitions the database into groups and generates some mathematical values from each group. Results show that their method is robust to some modification attacks.

Jawad and Khan used genetic algorithm with difference expansion watermarking to propose a robust and reversible database watermarking approach called GADEW [17]. GADEW minimizes distortion induced by the embedding procedure and increases watermarking capacity. Their method used DEW technique to embed the watermark bits into the database with a reversible manner.

In this work, we proposed a new reversible database watermarking method using Firefly Algorithm to both minimize distortion and improve robustness of DEW called FFADEW. The method uses a bioinspired algorithm, firefly proposed by [18] to select the best attributes for watermark embedding. Each firefly in a population is a candidate solution for watermarking process and brightest firefly on the current population is the best solution for the current iteration. Other fireflies are moved to the best firefly with the proposed moving operation. Generation of the new populations is ended when the algorithm reaches the desired condition proposed by the algorithm. Brightest firefly in the last population guides the watermarking process. The algorithm contains floating-point values that construct a firefly designating the location of attributes to be watermarked in each tuple. Experimental results show that the method modifies attribute values in a less observable manner (standard deviation and average values of attributes are used) and the method is more robust against popular database watermarking attacks compared to GADEW. Complexity of the proposed method is also less than GADEW as shown in the results by the run times of both algorithms.

The rest of the article is organized as follows: Section 2 gives background information about DEW method and Firefly Algorithm used in FFADEW. The details of the method are explained in Section 3 and experimental results are given in Section 4. Section 5 concludes the article with suggestions.

The details of the difference expansion watermarking technique and firefly optimization algorithm used in the proposed method are given in this section.

2.1. Difference Expansion Watermarking Technique

Alattar proposed a reversible watermarking approach called difference expansion watermarking (DEW) in 2004 to apply the images [9]. DEW algorithm gets a pair of adjacent pixels and modifies their difference values to show the watermark bit. The algorithm can reconstruct the original image after extracting the watermark. DEW is used to embed a watermark bit into two numeric attributes in the proposed method. The proposed method uses two numeric attributes denoted by and instead of pixel values. Average value of them and difference between them are calculated as in (1) and denoted by avg and , respectively.The symbol denotes the floor function which returns the greatest integer less than or equal to argument . The difference value d is modified to carry the watermark bit as in (2). The attribute values are also modified to give the new difference value . and denote the modified values of the attributes. The method will use the following steps to recover the original values of attributes and corresponding watermark bit value.

Step 1. Compute the average of the attributes: and the difference .

Step 2. Extract the watermark bit from the difference: .

Step 3. Reconstruct the original values of and : ; .

For example, let and be the contents of two numeric attributes in a database. The difference and average values are calculated as and . Assume that current watermark bit is 1. New difference value will be and modified attribute values will be and . Watermark extraction algorithm calculates new difference value and new average value as −15 and 20, respectively. Watermark bit is calculated as and original attribute values are reconstructed as in

2.2. Firefly Algorithm

Biologically inspired algorithms have become popular in solving global optimization problems such as the Travelling Salesman Problem (TSP) recently. Multiple agents that are affected by each other constitute the base of these algorithms. These algorithms sometimes are referred to as Swarm Intelligence (SI) based algorithms since they simulate the Swarm Intelligence characteristics of biological agents such as fish, birds, and ants. Particle swarm optimization proposed in 1995 used the swarming behavior of fish and birds [19]. Ant colony optimization and artificial bee colony optimization are some examples in this field [20, 21].

Yang proposed Firefly Algorithm, one of the nature inspired algorithms, to solve NP-hard problems in 2008 [18]. Their algorithm uses flashing patterns and behavior of fireflies. Firefly is one of the stochastic optimization algorithms in which a global solution is searched using partially randomized movements in order not to get stuck in one of many local solutions. Being a stochastic method, firefly cannot guarantee optimal solution in deterministic time but it will eventually converge to a solution in a reasonable amount of time. In fact, firefly is a metaheuristic method where a trade-off between randomization and local search is controlled by the parameters. Firefly’s heuristic depends on the survival of the population whereas randomized movement avoids local optima. The search for the optimum solution continues unless improvements in the objective function are possible. The solution in the proposed method corresponds to the selection of both tuples and attributes that minimizes distortion even though a single objective function is optimized.

There are nearly two thousand firefly species and each has a unique pattern of flashes. Flashing characteristics of the fireflies can be summarized with three rules given below.

Rule 1. Each firefly can attract others regardless of its sex. This means that all fireflies are unisex.

Rule 2. Brightness of a firefly determines its attractiveness. If two fireflies are flashing, less bright one will move towards the brighter one. If none firefly is brighter than a particular one, it will move away randomly.

Rule 3. The objective function determines the brightness of a firefly.

Brightness can be defined in a similar way with fitness function in genetic algorithm (GA). The brightness is simply proportional to the value of objective function for a maximization problem.

The brightness of a firefly for a particular location could be chosen as , where is the objective function and the attractiveness should be judged by other fireflies. Distance between the fireflies changes the attractiveness parameter. Light intensity decreases in relation to the distance from the source. The light intensity can be given in (4) in the simplest form. Intensity is inversely proportional to the square of distance as shown in the formulation. denotes the intensity at the source.

When the fixed light absorption coefficient for a medium is considered, the light intensity changes according to (5). represents the initial light intensity.

Light intensity determines the attractiveness of the firefly for other fireflies. given in (6) defines the attractiveness of the firefly. in (3) is the attractiveness at the start position. The method assumes that each firefly resides on a point at dimensional space. Assume that coordinates of any two fireflies and are represented by and . The Cartesian distance between two fireflies is calculated as in The movement of one firefly in dimensional space at time towards another attractive firefly is defined as in (8) at time ,

Attractiveness is considered in the second term and random movement is ensured by the third term in (8). is a randomization factor and is a vector of random numbers , where rand is a pseudo random number generator that generates uniformly distributed random numbers in . The equation given in (8) shows that fireflies realize simple random walk for .

3. Proposed Method

A new reversible database watermarking algorithm using Firefly Algorithm, a bioinspired optimization algorithm, is proposed here. The method uses DEW algorithm to reverse the original database after watermark extraction and it also uses optimization algorithm called firefly to determine the best candidate pairs to embed the watermark. The main advantage of the Firefly Algorithm compared to genetic algorithms is its easy implementation and its run time efficiency. In this regard, we used Firefly Algorithm in this work.

The method consists of two algorithms: watermark insertion and watermark extraction. Watermark insertion algorithm embeds the watermark information into specially selected tuples with DEW. Firefly Algorithm determines which attributes are more appropriate for watermark embedding on the selected tuples. Watermark extraction algorithm extracts the specially embedded watermark information from the database and compares it with the original one. This algorithm also reverses the watermarked database to original after watermark extraction and verification. The details of the two algorithms are given in the following sections.

3.1. Watermarking Algorithm

Watermarking algorithm consists of three modules as shown in Figure 1: preprocessing, firefly determination, and watermark embedding. Preprocessing module prepares the database by sorting tuples and columns. Firefly determination module outputs the best firefly for dataset. Watermark embedding module embeds watermark data into dataset according to the best firefly.

(a)

(b)

(c)

3.1.1. Preprocessing Module

This module prepares the database for watermarking process as the first step. The tuples of the database are sorted according to the primary key and then the columns of the database are sorted according to the names of attributes. Sorting operation ensures algorithm’s robustness against reshuffling of attributes/tuples attacks. Second step of the preprocessing stage is selection of candidate tuples from the database using a Message Authentication Code (MAC) function. These selected tuples that are evaluated during watermark insertion create a subset, DS. The equation given in (9) is used for selection purposes where denotes the Secret Key, shows the primary key of the current tuple, represents concatenation operation and function, and is 512-bit version of Secure Hash Algorithm (SHA). The tuple is selected as a member of subset DS if it satisfies (9). The value of determines the number of selected tuples.

Assume that selected subset DS has rows and attributes. A distortion tolerance range for each attribute in current database should also be determined during watermark insertion algorithm. FFADEW selects the min and the max values of each attribute as corresponding distortion tolerance ranges if no distortion tolerance is specified. Assume the defined ranges in a database with attributes denoted by . For example, values in the th attribute of the database can get values in range.

3.1.2. Firefly Determination Module

Firefly Algorithm is utilized in this step to choose the best attribute pairs for the tuples in the selected subset DS. This module gets the subset from the preceding module as input and outputs the chosen firefly. Determined solution (firefly) designates the locations (attributes positions for the selected rows) to embed the watermark. Firefly determination module consists of two phases and the details of them are given in below.

Phase 1 (creation of the initial population). The algorithm creates initial population with fireflies, . A firefly that constitutes an example solution for the current selected database DS can be represented by a floating-point array. Each element of the array represents the index values of the attributes for each row that is chosen for embedding the watermark information. While integer part of the element represents the index value for the first attribute, fractional part denotes the index value of second attribute at the current row. For example, assume that a database accommodates ten attributes and one firefly is determined as the solution during the optimization algorithm. If the first element of the firefly is 7.2, the algorithm uses seventh and second attributes of the first row that is chosen for watermark embedding.
th candidate solution for the current database contains floating-point numbers and is denoted by . The maximum value for an element in the current firefly can be and it shows that and th attributes are chosen for the current row for watermark embedding. Steps for the creation of initial population are given below.
Function. Initial_Population
Input. DS
Output. Initial population
Step 1. Repeat Step 2 for .
Step 2. Repeat the steps from 2.1 to 2.4 for .
Step 2.1. Select an attribute randomly from the kth row of DS.
Step 2.2. Apply DEW algorithm on and the other attributes , at the th row. .
Step 2.3. Fitness function selects best pair whose elements are changed less compared to other pairs after embedding the watermark bit with DEW. Selected pair gives minimum value for .
Step 2.4. Index values of these attributes constitute the corresponding kth element of the initial population, where denotes the integer part and denotes the fractional part. The information extracted from the th row determines the th element of the current firefly .

Used fitness function for generating the initial population aims to minimize distortion after embedding watermark. Therefore, it selects best pair for each row that results in minimum modification after DEW. Fitness function aims to improve two criteria: minimization of distortion on the selected attributes caused by DEW and improving the watermark capacity. Therefore, FFADEW selects the best pair for each row that gives minimum absolute difference after embedding watermark during the generation of the initial population. Figure 2 shows an example for the creation of initial population.

Figure 2(a) gives the th row of DS. The algorithm randomly selects an attribute, here the seventh in this example. Figure 2(b) shows all possible pairs that can be created with and the result of DEW algorithm on all possible pairs. The sixth pair yields minimum difference if total cost (minimum absolute difference) is considered for each pair. Thus, the algorithm chooses and embeds the value 7.6 into th element of current firefly.

Phase 2 (determination of the best firefly). In this step, FFADEW determines brightness value for each firefly in initial population that was created in the previous phase and chooses the brightest one among them. The other fireflies are moved towards to the best one and the best one is also moved randomly after other fireflies were moved. Phase 2 of the firefly determination module has two terminating conditions: Step or Step 2.3 in the Brightness Value Determination algorithm. First condition controls the number of iteration. The algorithm terminates and returns the brightest firefly if the algorithm exceeds iteration. Second condition controls the number of watermarked tuple and distortion on the watermarked tuple. The algorithm terminates if the brightness value for the current firefly is greater than a predefined threshold value , and the current firefly is saved.
This phase consists of two algorithms: Brightness Value Determination and firefly moving algorithms. The first one determines brightness values for each firefly and run the other algorithm to determine the new coordinates of fireflies. The details of the algorithms are given below.
The brightness value for th firefly, , is determined with following algorithm. The algorithm evaluates subdataset DS with rows.
Function. Brightness_Value_Determination
Input. DS and initial population
Output.
Step 1. Repeat the steps from 2 to 3 for .
Step 2. Repeat the steps from 2.1 to 2.3 for .
Step 2.1. Repeat the following steps from 2.1.1 to 2.1.5 for .
Step 2.1.1. Extract jth row of DS, .
Step 2.1.2. ; .
Step 2.1.3. .
Step 2.1.4. .
Step 2.1.5Step 2.2. .
Step 2.3. If save the current firefly into and go to Step 4.
Step 3. Implement moving operation.
Step 4. Export to the following phase (watermark insertion).

and functions get the jth element of th firefly and return the integer part and fractional part of the element. These values show the index values of attributes that are chosen for watermark embedding during population generation. The function given in Step 2.1.4, , applies DEW algorithm on two selected attributes and embeds one bit watermark. The function returns the new values of after watermark embedding . Distortions are calculated if modified attribute values fall into the desired ranges. Distortions on the attributes induced from the watermarking process are denoted by , and they are accumulated independently as shown in Step 2.1.5. The variable is used to hold information about how tuples in the selected subset can be watermarked. Current element of firefly is not considered in the calculation of brightness value if modified attribute values do not fall into the desired ranges, and the algorithm returns to Step 2.1 to consider the next element of the current firefly.

The algorithm assigns a brightness value for each firefly. The brightness values are affected from two factors: total distortions on the attributes and the number of tuples that can be watermarked. First factor is the ratio between the sum of the distortions on the attributes , and total record count in the database, . Second factor indicates how many records in the selected subset can be watermarked. The two factors are weighted with and to determine the brightness value. Figure 3 shows an example for calculation of on an example dataset. First row of this dataset is the calculated one in Figure 2. This example assumes that firefly consists of only five elements. First part of the brightness function is calculated as given in Figure 3. mdf values and ratios are also reported in the figure. Figure 4 also shows the calculation of the other parts of brightness function . Four out of five elements of the current firefly can be watermarked because watermarked values fall into the desired range. Only first row is not watermarked.

The equation in Step 2.2 that corresponds to the brightness of th firefly in the population is the objective function and also used to compute given in (6). Weights and in the objective function correspond to normalized distortion and amount of embedded watermark, respectively. These weights are selected such that their sum is one. Higher values of indicate increased importance of normalized distortion compared to the amount of embedded watermark whereas higher values of indicate increased importance of the amount of embedded watermark compared to normalized distortion in the objective function. Embedded watermark ratio and distortion are calculated for values in and reported in Figure 18. Tests indicated that and yield highest embedded watermark and lowest distortion for the dataset. Because of this and are selected as 0.4 and 0.6, respectively.

The distance between two fireflies in the original Firefly Algorithm is computed by the Cartesian distance given in (7) whereas the proposed method uses total number of different corresponding element as the distance between two vectors. The fireflies in the current population are sorted according to their brightness values and the brightest firefly is chosen as the best firefly. Other fireflies in the current population are moved towards to the best firefly. Assume that denotes the brightest firefly. The element-by-element difference between a firefly and the best firefly is used to guide the moving process. The main purpose of the moving process is to minimize the difference between two vectors that corresponds to two fireflies. The distance between the th firefly and the best firefly is calculated using (11). Since each firefly has elements, distance is in range.

The movement of fireflies given in (8) is modified such that fireflies in the population improve their similarity to the best firefly by copying a randomly selected number of best firefly’s component. Movement equation given in (8) has 2 parts as information based and random movement. Both of these movements are responsible for finding an optimal solution in the search space without getting stuck at local extremes. The information based movement relies on the positions and brightness of the population whereas random movement acts like mutation in the genetic algorithms where there is no guarantee that the movement is in the correct direction. Carefully selected parameter is important in order to achieve acceptable results. The value of parameters depends on the problem and may require some trial and error before successfully generating an optimal solution.

The algorithm moves fireflies to the firefly corresponding to the best solution, by generating a random number and replaces randomly chosen elements of with corresponding elements of . The random number and it must also satisfy , where denotes the amount of modification and can get values in range. All elements of are replaced with corresponding elements of if is one and no elements of are replaced with corresponding elements of if is zero.

Randomness of moving operation given in (8) is used to determine the number of elements to be changed in the current solution vector (firefly) with the best solution vector (firefly) to make it similar (moving current firefly closer to the best firefly) in our method. The criterion given in Step 3.2 which is specific to our algorithm is used in the selection of the randomization parameter and range is used to generate information based random values.

The following source code realizes moving operation for all fireflies except from the best.

Function. Moving_Operation

Input. DS,

Output. New

Step 1. Repeat the steps from 2 to 4 for .

Step 2. Calculate the distance between the current firefly and the best one .

Step 3. Repeat the steps from 3.1 to 3.4 for .

Step 3.1. Store the current firefly into temporary firefly .

Step 3.2. Choose a random number and it must also satisfy the rule, . If the rule is not satisfied choose another random number .

Step 3.3. Replace randomly selected elements of with corresponding elements of .

Step 3.4. Calculate the brightness value for the current and store it into an array .

Step 4. Choose the brightest firefly using TB from Ftemp and replace with it.

Figure 5 shows an example of moving operation for , or 40% replacement. Three fireflies from the population are shown in the figure. Third and best firefly is the brightest one and its brightness value is 4.431. The distances between the other fireflies and the brightest one are calculated as 5 and 5. Random numbers (1 and 3) are chosen for two fireflies. However, the rule given in Step 3.2 is not satisfied for the first firefly, , and a new random number 2 is chosen. Two and three elements’ contents of two fireflies are replaced by corresponding elements of the best firefly and thus moving process is realized.

The best firefly is also moved after moving all other fireflies. Only Step 3.3 is different from the above code to determine the new coordinates of the best firefly. Randomly selected elements of current firefly are modified randomly as given below.

Step 3.3 integer and fractional parts of each randomly selected elements of the best firefly are modified randomly and stored in . Randomly selected values are floating-point numbers in · form and both and must satisfy conditions.

3.1.3. Watermark Embedding Module

Each element of the brightest firefly of the firefly determination module is consulted and watermark embedding process is realized. Elements of the firefly designate the attribute locations of the current row in the selected dataset. Attributes are modified to embed watermark bit using DEW. The following code inserts the watermark information into corresponding attributes in the selected dataset.

Function. Watermark_Embedding

Input. DS, ,

Output. Watermarked DS,

Step 1. Repeat the following steps for .

Step 1.1. Extract jth row of DS, .

Step 1.2. ; .

Step 1.3. .

Step 1.4. .

Step 1.5. .

3.2. Watermark Extraction Algorithm

This algorithm extracts watermark information from the database and reconstructs the original database using DEW. Figure 6 shows general structure of the algorithm.

Rows of the database are sorted according to the primary keys and columns are sorted according to the names of the attributes. After sorting operation, selected subdataset DS that was used during watermark embedding procedure must also be constructed in this algorithm to extract watermark correctly. The equation given in (9) is applied on the database with same secret value to determine DS. The secret value and the best firefly must be transmitted to watermark extraction algorithm as the side information.

Since best firefly has elements, DS also has rows. Each element of points to attributes of current row that are used for watermark embedding. DEW is applied on designated attributes by and watermark information is extracted. DEW algorithm also reconstructs the original values of attributes. The source code for watermark extraction algorithm is given below.

Function returns the extracted watermark information and original values of .

Function. Watermark_Extracttion

Input. ,

Output. Original DS and reconstructed watermark

Step 1. Repeat the following steps for .

Step 1.1. Extract jth row of DS, .

Step 1.2. ; .

Step 1.3. .

Step 1.4. .

Step 1.5. .

If the extracted watermark is consistent with the current watermark, this means that the database is authentic and no modification is applied on it.

4. Results and Discussion

Simulation results of the proposed method are given in this section. The method is coded in C# using MicroSoft Visual Studio IDE. MicroSoft SQL Server 2012 is used for modification operations on the database. Forest Cover Type (FCT) dataset provided by University of California is used as the database to watermark by the method. The dataset contains 581.012 tuples and consists of 54 numeric attributes. However, nine attributes are used during the experiments. Experiments are given in three sections. First section compares the method with similar works reported in the literature [9, 17]. Second section shows the performance of the method under some well-known attacks used to test the effectiveness of watermarking. The last section analyses effects of some of the parameters on the performance of the algorithm.

4.1. Performance Analysis

In this section, three criteria are used to compare the method with GADEW [17]: running time, watermarking capacity, and distortion effect. The details of the experiments are given below.

The first experiment compares the run time efficiency of the method with GADEW [17]. The number of tuples in the selected dataset DS is determined to be 1610, 3125, 4059, and 6079, respectively, and two algorithms are used to watermark DS ten times. Average running times for two methods with different number of tuples are given in Figure 7. The result shows that FFADEW algorithm runs much faster than GADEW for the same number of tuples. For example, FFADEW determines the best firefly in 23 s whereas GADEW determines the best solution in 196 s for a DS with 1610 tuples. The figure also shows that the method gives better results even if DS contains more tuples.

Second experiment tests watermarking capacity of methods for comparison. Watermarking capacity is the ratio of tuples to total number of tuples in the DS that can be used for watermarking. Higher watermarking capacity means more watermark information can be embedded into the same DS. Six different DS are created with 3110, 4121, 6030, 9004, 10929, and 21727 tuples to test the watermarking capacity of the proposed method and others. Figure 8 shows the watermarking capacity performance of DEW, GADEW, and FFADEW methods. For example, when the number of tuples in DS becomes 21727, DEW and GADEW can use 31% and 32% of DS for watermark embedding. However, FFADEW watermarks 37% of DS. The figure also shows that the method has better watermarking capacity compared to others for larger DS. The method has approximately 37% watermarking capacity for different DS sizes.

Third experiment analyses distortion effect of the proposed watermarking approach. The attacker aims to determine the distortion on the dataset, an indication of an embedded watermark. Distortion on the attributes is a clue about the locations of the watermark. A potential attacker needs to estimate the locations of the watermark first and then try to destroy it. Thus two metrics are important to evaluate the distortion effect of the method: alteration on the standard deviation and average of each attribute in the selected dataset. Figure 9 gives the amount of alteration on the standard deviation of attributes. In this test, DS has 3110 tuples. GADEW alters 24% and 31% of the standard deviation of A3 and A8 attributes, respectively, whereas FFADEW alters only 0.0742% and 0.0168% of the standard deviation for the same attributes. Likewise, Figure 10 shows alteration on the average values of attributes. Average values of A3 and A5 are altered 1.1% and 0.3%, respectively, for DEW or GADEW whereas FFADEW altered only 0.01% and 0.0025% of the average values of the same attributes.

4.2. Watermarking Attacks

The robustness of the method under well-known database watermarking attack scenarios is reported in this section. The main purposes of the attacks are to destroy the watermark information or to embed their own watermark information. The main attack types used and reported in the literature are considered and comparative results are given. A total of 1172 tuples from the selected DS are watermarked to make a fair comparison with GADEW reported in [17]. Successfully extracted watermarked tuple ratios reported are the average of 10 runs of FFADEW since both methods rely on stochastic optimization. The proposed method yields better detection rate on the same dataset with approximately equal number of watermarked tuples.

The first experiment is used to evaluate the robustness of the method when addition attack is applied on the watermarked database. Randomly created tuples are added to the database to destroy the watermark in this type of attack. Figure 11 shows that FFADEW is robust to watermark addition attack.

Even if half of the watermarked number of tuples are added to the database, the watermark is extracted with a true manner. Deletion attack is applied as the second experiment to show the performance of FFADEW. This type of attack deletes randomly selected tuples to destroy the watermark. Figure 11 shows that, as the number of deleted tuples increases, ratio of the extracted watermark decreases. Watermark cannot be recovered if a tuple that contains watermark information is deleted. Thus, deletion attack has the worst effect on the number of watermarked tuples as shown in the figure. Bit-flipping attack is applied on the database as the third test. Bit-flipping attack complements LSBs of all attributes in randomly selected tuples. FFADEW yields similar results with deletion attack for bit-flipping attack. Since this attack modifies the content of the attributes it may also destroy watermark information in a tuple if it has watermark bit embedded in it. Figure 11 shows that the ratio of successfully extracted watermark information decreases considerably as the destroyed number of tuples increases. However, we only modify half of the attributes to discriminate the deletion attack from the bit-flipping attack visually. The result of the sorting attack is also shown in Figure 11. This type of attack rearranges the attributes in database. The method is robust against the sorting attack because the proposed method sorts the attributes alphabetically before embedding watermark and extraction algorithms. Watermark extraction ratio does not change even if an attacker modifies the original sequence of attributes. The last two attack types are tuple-wise multifaceted attack and attribute-wise multifaceted attack. The first one modifies the original database by applying three different attacks sequentially: addition, deletion, and bit-flipping attacks. Figure 11 shows the performance of the method under this attack and the method is also compared with methods in [9, 17] to show the effectiveness of it as shown in Figure 12.

DEW based method yields a detection ratio above 100% because addition attack causes false positives. GADEW can extract only half of the watermark information if half of the tuples in the database are affected by the attack. The proposed method detects 63.24% of the watermark for the same test. The last attack called attribute-wise multifaceted attack applies two different types of attacks to modify the original database, bit-flipping, and update attacks. Update attack modifies the content of an attribute with another attribute’s content. Figure 13 shows the comparison of the methods. The proposed method extracts 63.24% of the watermark for 50% attack whereas GADEW extracts only 40% of the original watermark for this test. Figure 13 also indicates that the proposed method yields better detection for other attack ratios.

Attribute deletion attack removes attributes from database to destroy the watermark. Figure 14 shows that the ratio of the extracted watermark decreases as the number of deleted attributes increased. Watermark cannot be recovered if an attribute that contains watermark information is deleted.

4.3. Algorithm Parameter Analysis

The effect of parameters on the performance of the proposed method is evaluated in this section. The effects of population size , maximum number of iteration , and number of movement operation are evaluated during the tests.

The first experiment tests the proposed method for 10, 20, 30, 40, and 50 fireflies as the population size . Figure 15 shows the performance of the proposed method for values. The -axis of the graph shows the ratio of watermarked number of tuples to the total number of tuples in DS. The graph shows that the ratio changes in 38.8% and 40%. The result indicates that FFADEW yields high watermarking ratio even for small populations. Execution time of algorithm also increases as the population size increases. Thus, it is important to seek for the same performance with a smaller population.

The effect of maximum number of iteration is evaluated in the second experiment. The method is allowed to iterate 30, 40, 50, 60, and 70 times during the test. Figure 16 indicates that the method has watermarking ratio between 37.7% and 38.7% in five different runs. The algorithm has a high watermarking ratio even for less number of iterations. It means that FFADEW has almost the same performance with better execution times.

Figure 17 shows the effects of the number of candidate move operations on the watermarking ratio. Movement operation is repeated for times, and the algorithm selects the best movement operation between them. Watermarking ratio also increases as increase. However, watermarking ratio does not change considerably as shown in Figure 17 for values greater than 15. It also shows that FFADEW converges the best solution faster with less number of iterations.

Total distortion and successfully embedded watermark ratios as a function of weights and used in the objective function are given in Figure 18. Highest watermark ratio and lowest distortion values are observed for and during tests for the same dataset.

5. Conclusion

Database watermarking has become an active research as the demand for sharing information increases. Many reversible watermarking techniques have been proposed to watermark the database and to retrieve the original version of the database after watermark extraction. Researchers proposed using GA to minimize distortion and improve watermark capacity for database watermarking. However, optimization techniques such as GA increase complexity of the watermarking process. Firefly, a new bioinspired optimization algorithm, is adopted and used with DEW in this work to both minimize distortion and reduce complexity during database watermarking. Experimental results indicate less distortion with improved watermark capacity faster than similar works reported in the literature.

Conflicts of Interest

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

I. Cox, “Digital watermarking,” Journal of Electronic Imaging, vol. 11, no. 3, 2002.
View at: Publisher Site | Google Scholar
J. Kiernan, R. Agrawal, and P. J. Haas, “Watermarking relational data: framework, algorithms and analysis,” The VLDB Journal, vol. 12, no. 2, pp. 157–169, 2003.
View at: Publisher Site | Google Scholar
R. Sion, M. Atallah, and S. Prabhakar, “Rights protection for relational data,” IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 12, pp. 1509–1525, 2004.
View at: Publisher Site | Google Scholar
Y. Li, H. Guo, and S. Jajodia, “Tamper detection and localization for categorical data using fragile watermarks,” in Proceedings of the 4th ACM Workshop on Digital Rights Management (DRM '04), pp. 73–82, October 2004.
View at: Google Scholar
H. Guo, Y. Li, A. Liu, and S. Jajodia, “A fragile watermarking scheme for detecting malicious modifications of database relations,” Information Sciences, vol. 176, no. 10, pp. 1350–1378, 2006.
View at: Publisher Site | Google Scholar | Zentralblatt MATH
Y. Zhang, B. Yang, and X.-M. Niu, “Reversible watermarking for relational database authentication,” Journal of Computers, vol. 17, no. 2, pp. 59–66, 2006.
View at: Google Scholar
X. Zhou, M. Huang, and Z. Peng, “An additive-attack-proof watermarking mechanism for databases' copyrights protection using image,” in Proceedings of the ACM Symposium on Applied Computing (SAC '07), pp. 254–258, Seoul, Korea, March 2007.
View at: Publisher Site | Google Scholar
G. Gupta and J. Pieprzyk, “Reversible and blind database watermarking using difference expansion,” in Proceedings of the 1st International Conference on Forensic Applications and Techniques in Telecommunications, Information, and Multimedia and Workshop, Adelaide, Australia, January 2008.
View at: Google Scholar
A. M. Alattar, “Reversible watermark using the difference expansion of a generalized integer transform,” IEEE Transactions on Image Processing, vol. 13, no. 8, pp. 1147–1156, 2004.
View at: Publisher Site | Google Scholar | MathSciNet
S. Bhattacharya and A. Cortesi, “A distortion free watermark framework for relational databases,” in Proceedings of the 4th International Conference on Software and Data Technologies (ICSOFT '09), pp. 229–234, Springer, Sofia, Bulgaria, July 2009.
View at: Google Scholar
D. Hanyurwimfura, Y. Liu, and Z. Liu, “Text format based relational database watermarking for non-numeric data,” in Proceedings of the International Conference on Computer Design and Applications (ICCDA '10), pp. V4-312–V4-316, IEEE, Qinhuangdao, China, June 2010.
View at: Publisher Site | Google Scholar
M. E. Farfoura, S.-J. Horng, J.-L. Lai, R.-S. Run, R.-J. Chen, and M. K. Khan, “A blind reversible method for watermarking relational databases based on a time-stamping protocol,” Expert Systems with Applications, vol. 39, no. 3, pp. 3185–3196, 2012.
View at: Publisher Site | Google Scholar
D. M. Thodi and J. J. Rodriguez, “Prediction-error based reversible watermarking,” in Proceedings of the International Conference on Image Processing (ICIP '04), Singapore, October 2004.
View at: Publisher Site | Google Scholar
F. Arif, S. Suhail, and M. Kamran, “Watermarking of relational databases with emphasis on re-watermarking attack,” International Journal of Computer Science Issues, vol. 9, no. 3, pp. 521–526, 2012.
View at: Google Scholar
A. Khan and S. A. Husain, “A fragile zero watermarking scheme to detect and characterize malicious modifications in database relations,” The Scientific World Journal, vol. 2013, Article ID 796726, 16 pages, 2013.
View at: Publisher Site | Google Scholar
L. Camara, J. Li, R. Li, and W. Xie, “Distortion-free watermarking approach for relational database integrity checking,” Mathematical Problems in Engineering, vol. 2014, Article ID 697165, 10 pages, 2014.
View at: Publisher Site | Google Scholar
K. Jawad and A. Khan, “Genetic algorithm and difference expansion based reversible watermarking for relational databases,” Journal of Systems and Software, vol. 86, no. 11, pp. 2742–2753, 2013.
View at: Publisher Site | Google Scholar
X.-S. Yang, Nature-Inspired Metaheuristic Algorithms, Luniver Press, 2008.
J. Kennedy and R. Eberhart, “Particle swarm optimization,” in Proceedings of the IEEE International Conference on Neural Networks, vol. 1–6, pp. 1942–1948, Perth, Australia, December 1995.
View at: Google Scholar
A. Colorni, M. Dorigo, and V. Maniezzo, “Distributed optimization by ant colonies,” in Proceedings of the 1st European Conference on Artificial Life, Paris, France, December 1991.
View at: Google Scholar
D. Karaboga, “An idea based on honey bee swarm for numerical optimization,” Tech. Rep. TR06, Erciyes University, Engineering Faculty, Computer Engineering Department, 2005.
View at: Google Scholar

Copyright

Copyright © 2017 Mustafa Bilgehan Imamoglu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

PDF Download Citation

Download other formats

Order printed copies

Views

2267

Downloads

1119

Citations

Mathematical Problems in Engineering

A New Reversible Database Watermarking Approach with Firefly Optimization Algorithm

Abstract

1. Introduction

2. Related Works

2.1. Difference Expansion Watermarking Technique

2.2. Firefly Algorithm

3. Proposed Method

3.1. Watermarking Algorithm

3.1.1. Preprocessing Module

3.1.2. Firefly Determination Module

3.1.3. Watermark Embedding Module

3.2. Watermark Extraction Algorithm

4. Results and Discussion

4.1. Performance Analysis

4.2. Watermarking Attacks

4.3. Algorithm Parameter Analysis

5. Conclusion

Conflicts of Interest

References

Copyright