The Scientific World Journal
Volume 2013 (2013), Article ID 796726, 16 pages
http://dx.doi.org/10.1155/2013/796726
Research Article

A Fragile Zero Watermarking Scheme to Detect and Characterize Malicious Modifications in Database Relations

1Department of Computing & Technology, Iqra University, Islamabad 44000, Pakistan
2Faculty of Computer Science & IT, King Faisal University, Ahsaa 31982, Saudi Arabia

Received 19 February 2013; Accepted 10 April 2013

Academic Editors: P. Melin and Y. Zhu

Copyright © 2013 Aihab Khan and Syed Afaq Husain. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

We put forward a fragile zero watermarking scheme to detect and characterize malicious modifications made to a database relation. Most existing watermarking schemes for relational databases introduce intentional errors or permanent distortions as marks into the original database content. These distortions inevitably degrade data quality and usability because the integrity of the relational database is violated. Moreover, the existing fragile schemes can detect malicious data modifications but do not characterize the tampering attack, that is, the nature of the tampering. The proposed fragile scheme is based on a zero watermarking approach: the watermark is generated (constructed) from the contents of the original data rather than by introducing permanent distortions as marks into the data. As a result, the proposed scheme is distortion-free; thus, it also resolves the inherent conflict between security and imperceptibility. The proposed scheme also characterizes the malicious data modifications to quantify the nature of tampering attacks. Experimental results show that even minor malicious modifications made to a database relation can be detected and characterized successfully.

1. Introduction

Digital watermarking is a class of information hiding techniques that provides measures for copyright protection, broadcast monitoring, covert communication, copy control, tamper detection, and integrity proof of digital assets. Watermarking techniques were primarily proposed for multimedia content [1–4]; however, in the last decade, the research community has extended these techniques to relational databases for copyright protection, tamper detection, and integrity proof. Most of the existing watermarking schemes for relational databases [5–20] introduce intentional errors or distortions as marks in the underlying data, within some error tolerance, so that the marks do not have a significant impact on the usefulness of the data. However, this degrades data quality because the integrity of the relational database is violated. A large collection of real-world datasets has a strong usability constraint that disallows any permanent distortions or intentional errors. For example, safety-critical datasets are designed to minimize errors rather than to introduce intentional ones. Similarly, a business application may require that local properties such as item-cost and ordered-quantity are preserved, as well as global properties such as the natural join between item and sales or between employees and department. Moreover, in business datasets, semantic constraints must not be violated, for example by introducing dissimilar attribute values for two similar transactions [21]. Query processing is sensitive to selection criteria and has well-defined semantics; therefore, watermarking schemes that introduce distortion into the original database content are not appropriate for certain applications.

Based on the intent of marking, the watermarking schemes presented in the literature can be categorized into robust and fragile schemes. The robust schemes [5–16] are aimed at copyright protection, whereas the fragile schemes [17–25] are used for tamper detection and integrity proof of database relations. Most of the robust schemes for copyright protection [5–16] introduce distortions into the original database content, which affects data integrity and usability. These robust schemes may embed watermarks in numeric [5–10] or categorical attributes [11, 12] of relational databases. Some techniques embed meaningless bit patterns [5, 6], whereas other techniques use meaningful bit patterns such as an image [13–15] or the owner's speech [16] as watermarks. In data sales environments, some of these robust schemes are extended to the fingerprinting domain for unique identification of each buyer and for traitor detection [21, 26–28]. Compared with the robust schemes, fragile watermarking schemes are not adequately addressed, and relatively little work is available on integrity proof of relational databases [20]. In this paper, we focus on fragile watermarking schemes for tamper detection and integrity proof of database relations.

The initial work on fragile watermarking schemes addressed images [29–31] and was later extended to audio [32, 33] and video [3, 34]. More recently, the importance of other data domains has been recognized, and fragile schemes for text [35, 36] and relational databases [17–20, 22–25] have been proposed. Like robust schemes, most of the fragile schemes for relational databases [17–20] introduce distortion into the original database content, which degrades data quality and affects data usability. These schemes use the content characteristics of the database relation itself to create a secure hash (used as a watermark), which is stored in the Least Significant Bits (LSBs) of the original database content, thus introducing distortion.

A fragile watermarking scheme presented by Guo et al. [17] detects malicious modifications made to a database relation. In their scheme, the watermark generation is based on the content characteristics of the database relation itself. The generated watermarks are embedded in at most two LSBs of all attributes in the database relation, which introduces considerable distortion in the original database content. The fragile scheme presented by Khataeimaragheh and Rashidi [18] is also a distortion-based scheme for integrity proof of database relations. As in [17], the watermarks are embedded in at most two LSBs of all attributes in the relation, forming a two-bit watermark grid. The fragile scheme presented by Iqbal et al. [19] logically partitions the database relation into three groups and generates self-constructing fragile watermark information from each group. The generated watermarks are embedded in the LSBs of numerical attributes in each group, which introduces distortion in the original database content. Prasannakumari [20] presented a fragile scheme for tamper detection in database relations. This technique also introduces distortion, as it inserts a fake attribute into the database relation to act as a watermark. The data values for the newly inserted attribute are determined by applying an aggregate function to the original database content.

Besides distortion-based techniques, some researchers have also presented distortion-free fragile watermarking schemes [22–25] for integrity proof of database relations. The main feature of these schemes is that the watermark embedding is, in fact, a reordering of tuples or attributes based on the content characteristics of the database relation. A fragile scheme proposed by Li et al. [22] detects and localizes malicious modifications made to database relations. Their scheme partitions the database relation into disjoint groups, and the watermark is embedded and verified in each group independently. The watermark is embedded as a tuple reordering: the order of each tuple pair in a group is changed or left unchanged depending on the tuple hash values and the corresponding group hash value. Although their technique does not introduce any distortion in the database relation, it works only for categorical data. Kamel [23] presented a fragile scheme to protect the integrity of database relations. This scheme divides the database relation into groups, and each group is marked independently. As in [22], the watermark embedding is a reordering of tuples in each group that corresponds to the value of some secret watermark. The fragile scheme proposed by Bhattacharya and Cortesi [24] detects malicious modifications in database relations having categorical attributes. Their scheme divides the database relation into groups on the basis of categorical attribute values. As in [22, 23], the tuple hash value is used to obtain a watermark as a permutation of tuples. A fragile zero watermarking scheme is presented by Hamadou et al. [25] for authentication of database relations. Their technique is distortion-free and is based on an attribute reordering method. Initially, the attributes of the database relation are virtually sorted on the hash values of attribute names to define a secret initial order of attributes. For each attribute in the database relation, the Most Significant Bits (MSBs) are extracted and used for watermark generation. The generated watermark is then registered with the Certification Authority (CA) for certification purposes. As their technique is based on virtual sorting of attributes by their names, any change of an attribute name by an attacker would cause the tamper detection process to fail.

In the previous discussion, we have identified two important issues in existing fragile watermarking schemes. First, the distortion-based fragile schemes [17–20] inevitably degrade data integrity and thus affect data usability; therefore, these schemes are not applicable to non-error-tolerant data such as safety-critical datasets. Second, although some fragile schemes such as [22–25] are distortion-free, their watermarking approach is based on reordering of tuples or attributes, so they are vulnerable to sorting attacks. Also, if a modification is so small that it does not affect the order of tuples, tamper detection would fail. To address these issues, we propose a fragile scheme based on a zero watermarking approach that does not modify any part or property of the database relation itself; therefore, the proposed scheme assures imperceptibility and overcomes the data integrity and data usability weaknesses of existing fragile watermarking schemes. Also, the proposed scheme is independent of tuple ordering as well as attribute ordering and naming, so it is not vulnerable to sorting attacks. The watermark generation in the proposed scheme is based on algorithmically evaluating local characteristics of the database relation, namely the frequency distribution of the digits, lengths, and ranges of data values. This enables us to characterize malicious data modifications on parameters such as the fraction of digits, lengths, and ranges of data values attacked, the type of attack (insertion, deletion, or update), and the effect of the attack (low to high, high to low, or no change) on data values. Also, to the best of our knowledge, there is no other distortion-free fragile watermarking scheme that can characterize tampering attacks, that is, the nature of the tampering. Experimental results show that the proposed scheme can detect and characterize malicious data modifications successfully.

2. Materials and Methods

In this section, we present our proposed fragile zero watermarking scheme to detect and characterize malicious modifications made to a database relation. The proposed scheme exhibits the following important properties of a fragile watermarking system, as discussed in [17].
(1) Fragility. The proposed scheme is designed to be fragile; that is, if there are any malicious data modifications, the embedded watermark is not detectable (destroyed).
(2) Imperceptibility. As the proposed scheme is based on a zero watermarking approach, it does not introduce any distortion in the underlying data; therefore, the embedded watermark is invisible or imperceptible.
(3) Key-Based System. The watermark generation and verification in the proposed scheme is a key-based system. Also, a secret key is required to detect and characterize malicious data modifications.
(4) Blindness. In the proposed scheme, the original database relation is not required to detect and characterize malicious data modifications.
(5) Tuple and Attribute Ordering. The existing fragile schemes are based on tuple ordering [22–24] and attribute ordering and naming [25]. The proposed scheme is independent of tuple and attribute ordering, so it is not vulnerable to sorting attacks.
(6) Characterization. The proposed scheme not only detects but also characterizes the malicious data modifications in a database relation to quantify the nature of tampering attacks.

2.1. Watermark Generation

Consider a database relation with a primary key and a set of attributes. The watermark generation in the proposed scheme is based on the content characteristics of numeric data values, so we assume that some attributes of the database relation are numeric. Figure 1 shows the watermark generation process, which comprises subwatermark generation for the digit count, length, and range of data values. The generated watermark is registered with the Certification Authority (CA) for certification purposes. Table 1 presents the list of notations used in our algorithms and discussion.

Table 1: Notations.

Figure 1: Proposed model for watermark generation and registration.

The algorithm for watermark generation is presented in Algorithm 1. At lines 1–3, the digits, lengths, and ranges of data values in a database relation are algorithmically evaluated to generate the subwatermarks, as presented in Algorithms 2–4. These subwatermarks are then used to generate a database relation watermark, as shown at line 4. At line 5, the relation watermark is encrypted with a secret key known only to the database owner. We assume that the secret key is selected from a key space large enough that it is computationally infeasible for an attacker to guess the key. At lines 6-7, the encrypted relation watermark is concatenated with the owner Id and a date and time stamp to generate a watermark certificate, which is then registered with the CA for certification purposes before the database is published.

Algorithm 1: Watermark generation.
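The certification steps of Algorithm 1 (lines 4–7) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the `|` and `:` separators, and the timestamp format are hypothetical, and `keystream_xor` is a toy stand-in for the cipher, which the text does not name.

```python
import hashlib
from datetime import datetime, timezone


def keystream_xor(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher for illustration only (NOT secure for real use).

    XORs the data with a SHA-256 based keystream, so applying it twice
    with the same key returns the original bytes.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def make_certificate(sub_watermarks, key, owner_id, when=None):
    """Combine sub-watermarks, encrypt them, and append owner Id and timestamp."""
    relation_wm = "|".join(sub_watermarks)            # assumed separator
    encrypted = keystream_xor(relation_wm.encode(), key).hex()
    when = when or datetime.now(timezone.utc)
    return f"{encrypted}:{owner_id}:{when.strftime('%Y%m%dT%H%M%SZ')}"


def extract_watermark(certificate, key):
    """Recover the relation watermark from a registered certificate."""
    encrypted_hex = certificate.split(":", 1)[0]      # hex never contains ':'
    return keystream_xor(bytes.fromhex(encrypted_hex), key).decode()


cert = make_certificate(["D1", "L2", "R3"], b"owner-secret", "alice",
                        datetime(2013, 4, 10, tzinfo=timezone.utc))
recovered = extract_watermark(cert, b"owner-secret")
```

Only the holder of the secret key can recover the relation watermark from the certificate, which reflects the key-based property listed above.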

Algorithm 2: Digit sub-watermark generation.

Algorithm 2 generates a digit subwatermark, which is based on the digit frequency over all data values present in a database relation. At lines 1–3, the length of each data value is determined, which is then used to extract the individual digits, as shown at lines 4-5. Lines 6-7 compute the frequency of each digit and the total number of digits present in the database relation. At line 11, the relative frequency of each digit is determined, which is then used to generate the digit subwatermark, as shown at line 13. At lines 15-16, the digit subwatermark is concatenated with the total digit count and is returned to the watermark generation algorithm. Note that the digit subwatermark is composed of the relative frequency of each digit and the total count of all digits. This information is used for the characterization of attacks, as discussed in Section 3.
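The digit statistics described above can be sketched as follows. This is a minimal illustration, assuming non-negative or signed integer data values; the function name and the returned structure (a dictionary plus a total) are hypothetical, and the encoding of the subwatermark as a string is omitted.

```python
from collections import Counter


def digit_subwatermark(values):
    """Return (relative frequency of each digit 0-9, total digit count)."""
    counts = Counter()
    for v in values:
        # abs() drops a possible sign so that only decimal digits are counted
        counts.update(str(abs(int(v))))
    total = sum(counts.values())
    rel = {d: (counts[str(d)] / total if total else 0.0) for d in range(10)}
    return rel, total


# Digits seen: 1,2,0 / 9,9 / 7 / 3,0,5,0
rel_freq, total_digits = digit_subwatermark([120, 99, 7, 3050])
```

Keeping the total count alongside the relative frequencies allows the absolute digit counts to be reconstructed later without access to the original relation.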

The subwatermark generation for the length of data values in a database relation is presented in Algorithm 3. At lines 1–3, the length of each data value is determined. Lines 4-5 determine the frequency of each length and the total number of lengths counted in the database relation. At line 9, the relative frequency of each length is computed, which is then used to generate the length subwatermark, as shown at line 10. At lines 12-13, the length subwatermark is concatenated with the total length count and is returned.
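A corresponding sketch for the length statistics is given below, again assuming integer data values; the function name and return structure are hypothetical.

```python
from collections import Counter


def length_subwatermark(values):
    """Return (relative frequency of each value length, total count of values)."""
    lengths = Counter(len(str(abs(int(v)))) for v in values)
    total = sum(lengths.values())
    rel = {n: c / total for n, c in sorted(lengths.items())}
    return rel, total


# Lengths seen: 1, 2, 3, 3, 4
rel_len, total_len = length_subwatermark([5, 42, 137, 250, 1999])
```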

Algorithm 3: Length sub-watermark generation.

Algorithm 4: Range sub-watermark generation.

Algorithm 4 generates the subwatermark for the range of data values in a database relation. At line 1, the data ranges in which a data value of the database relation may fall are defined. Note that the defined data ranges may be adjusted to suit the nature of the data values in the database relation and to allow more precise characterization of malicious data modifications, as discussed in Section 3. Lines 1–3 determine the attribute value within each tuple. Lines 5–13 determine the frequency of each data range in which the data values fall and the total number of data ranges present in the database relation. At lines 16-17, the relative frequency of each range of data values is computed, which is then used to generate the range subwatermark. Lines 19-20 show that the range subwatermark is concatenated with the total range count and is returned.
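The range statistics can be sketched in the same way. The boundaries below are an assumption: the text only states that range 1 covers 100–999, so range 0 (0–99) and range 2 (1000 and above) are hypothetical choices, and the function name is illustrative.

```python
from bisect import bisect_right

# Assumed lower bounds: range 0 = 0-99, range 1 = 100-999, range 2 = 1000+
RANGE_STARTS = [0, 100, 1000]


def range_subwatermark(values, starts=RANGE_STARTS):
    """Return (relative frequency of each range, total count of values)."""
    counts = [0] * len(starts)
    for v in values:
        idx = bisect_right(starts, v) - 1   # index of the range containing v
        if idx < 0:
            raise ValueError(f"value {v} is below the first range")
        counts[idx] += 1
    total = sum(counts)
    rel = [c / total if total else 0.0 for c in counts]
    return rel, total


rel_rng, total_rng = range_subwatermark([5, 99, 100, 999, 1000, 7000, 250, 3])
```

Because the boundaries are passed as a parameter, they can be adjusted to the data, as the paragraph above suggests.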

2.2. Watermark Verification

Figure 2 shows the model for detection of malicious modifications in a suspicious database relation. To detect malicious data modifications, the relation watermark is regenerated for the suspicious database relation and compared with the relation watermark registered at the CA; if the two watermarks differ, the suspicious database relation is considered a tampered relation.

Figure 2: Proposed model for detection of malicious tampering.

The algorithm for watermark detection is presented in Algorithm 5. At line 1, the watermark for the suspicious database relation is generated by using Algorithm 1. The watermark certificate already registered at the CA is used to extract the registered database relation watermark, as shown at lines 2–4. At lines 5–10, each digit of the regenerated watermark is compared with the corresponding digit of the registered watermark, and match_count is incremented on each successful match. At line 9, total_count is computed to record the number of digits tested. At lines 11-12, the WAR (Watermark Accuracy Rate) and WDR (Watermark Distortion Rate) are computed. If distortion exists in the suspicious database relation, then the relation is rejected as a tampered relation with distortion rate WDR, as shown at lines 13–15.

Algorithm 5: Watermark verification.
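The comparison step of Algorithm 5 can be sketched as below. The formulas are assumptions, since the text does not state them explicitly: WAR is taken as the percentage of matching digits and WDR as 100 minus WAR. The function name and the handling of watermarks of unequal length are also illustrative choices.

```python
def verify_watermark(registered: str, regenerated: str):
    """Compare two watermark digit strings position by position.

    Returns (WAR, WDR, tampered), with WAR and WDR in percent.
    """
    # Assumption: unmatched trailing digits count as mismatches
    total_count = max(len(registered), len(regenerated))
    match_count = sum(a == b for a, b in zip(registered, regenerated))
    war = 100.0 * match_count / total_count if total_count else 100.0
    wdr = 100.0 - war
    return war, wdr, wdr > 0


war, wdr, tampered = verify_watermark("1234567890", "1234067891")
```

In this example two of the ten digits differ, so the relation would be rejected as tampered.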

The algorithm for characterization of malicious data modifications is presented in Algorithm 6. At line 2, the relative frequency of each digit is extracted from the digit subwatermark that is already registered at the CA. The frequency distribution of each digit in the original relation is determined at line 3. At line 4, the frequency distribution of each digit in the suspicious database relation is determined. The change in the frequency distribution of each digit is computed at line 5, and the fractional change in each digit is determined at line 6. The computed fractional change is then used to characterize the malicious modifications made to the database relation. For example, if the fractional change is zero, then the suspicious relation has not been tampered with. A positive fractional change indicates that a fraction of the digit has been maliciously inserted by the attacker in an attempt to transform low data values to high in the database relation. Similarly, a negative fractional change indicates that a fraction of the digit has been maliciously deleted by the attacker in an attempt to transform high data values to low. At lines 8–14 and 15–21, the same method is used to determine the fractional changes in length and range and to characterize the attacks on the length and range of data values in the database relation. The characterization of malicious data modifications is further elaborated in Section 3.2 with experimental results.

Algorithm 6: Characterization of malicious data modifications.
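The characterization logic can be sketched as follows for a single subwatermark (digit, length, or range). It assumes that the fractional change is the change in count divided by the registered count, expressed as a percentage, which matches how the results in Section 3.2 are reported; the function name, labels, and data structures are hypothetical.

```python
def characterize(registered_rel, registered_total, suspicious_counts):
    """Return {symbol: (fractional change in %, label)}.

    registered_rel    -- relative frequency of each symbol from the registered subwatermark
    registered_total  -- total count stored with the subwatermark
    suspicious_counts -- absolute counts measured on the suspicious relation
    """
    report = {}
    for symbol, rel in registered_rel.items():
        original = round(rel * registered_total)  # count at registration time
        if original == 0:
            continue  # fractional change is undefined for an absent symbol
        observed = suspicious_counts.get(symbol, 0)
        delta = 100.0 * (observed - original) / original
        if delta > 0:
            label = "inserted (low to high)"
        elif delta < 0:
            label = "deleted (high to low)"
        else:
            label = "no change"
        report[symbol] = (delta, label)
    return report


report = characterize({0: 0.3, 1: 0.5, 9: 0.2}, 10, {0: 3, 1: 4, 9: 3})
```

Because the registered subwatermark stores both relative frequencies and the total count, the original counts are reconstructed without the original relation, in line with the blindness property.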

3. Results and Discussion

Suppose that Alice is the database owner and that she has used the proposed algorithms, along with the secret key, to generate a watermark for her database relation. The attacker Mallory, for his own nefarious objectives, may attempt to make malicious modifications to Alice's watermarked database relation. We conducted our experiments in Microsoft Visual Basic and Microsoft Access on a 3.2 GHz Intel Core i3 CPU with 2 GB of RAM. The proposed watermarking scheme is evaluated on a real-life dataset, namely, the Forest Cover Type data set, available at the UCI Machine Learning Repository [37]. This dataset has 581,012 tuples, each with 10 integer attributes, 44 Boolean attributes, and 1 categorical attribute. In our experiments, we have used all 10 integer attributes. Note that in robust watermarking schemes, the aim of Mallory is to destroy Alice's watermark without affecting the database relation, whereas in fragile schemes, Mallory attempts to make malicious modifications to Alice's watermarked database relation without affecting the watermark. The experimental results presented in this section show that the watermark is adversely affected by even minor malicious data modifications; therefore, the generated watermark is fragile.

3.1. Detection of Malicious Modifications

In this set of experiments, we randomly introduce malicious modifications into the Forest Cover Type data set [37]. As discussed in Algorithm 5, these malicious modifications are detected by regenerating the watermark for the suspicious database relation, which is then compared with the registered watermark to determine the WAR (Watermark Accuracy Rate) and WDR (Watermark Distortion Rate).

Table 2 shows the WAR and WDR for malicious insertions made to the database relation at different attack rates. For example, when fake but similar tuples amounting to 10% of the relation are randomly inserted into the database relation, the WDR is found to be high and the malicious insertions are detected with a low WAR.

Table 2: Detection of malicious insertion of tuples with different attack rates.

Tables 3 and 4 show results for malicious deletions and updates made to the database relation that are similar to those for the insertion attack.

Table 3: Detection of malicious deletion of tuples with different attack rates.
Table 4: Detection of malicious update of tuples with different attack rates.

Figure 3 summarizes the insertion, deletion, and update attacks and shows that the WDR is always high for different volumes of malicious data modifications.

Figure 3: Watermark distortion rate for malicious insertion, deletion, and update of tuples with different attack rates.

In another set of attacks, we simultaneously perform malicious insertion, deletion, and update of tuples with different attack rates in the database relation. Table 5 shows the WDR for this set of attacks.

Table 5: Detection of malicious data modifications with different attack rates.

The experimental results presented in Tables 2–5 show that the malicious modifications are always detected and that the fragility of the registered watermark is observed even for low volumes of attack. The WAR is low and the WDR is high for different volumes of malicious insertions, deletions, and updates made to the database relation. The low WAR indicates the extent to which the database relation has been attacked, whereas the high WDR indicates that the database relation has been tampered with and is not authentic. The accuracy of the watermark is adversely affected even by minor malicious data modifications, and the watermark fragility proves that the database relation has been attacked.

3.2. Characterization of Malicious Modifications

One of the important features of the proposed watermarking scheme is that it characterizes the malicious modifications made to database relations. As discussed in Algorithm 1, the watermark generation is based on the content characteristics of the database relation itself, which enables us to characterize the malicious data modifications. Algorithm 6 characterizes malicious data modifications by evaluating the fractional change in each digit, length, and range of data values in the tampered database relation.

We have conducted experiments with both random and deterministic attacks for characterization of malicious data modifications. In random tampering attacks, we randomly attack the digit frequency, length, and range of data values in the database relation, whereas in deterministic attacks, the attack is performed with specific attack rates. The random tampering attacks are presented in this section, and the detailed results of the deterministic attacks are shown in the Appendix for reference.

3.2.1. Attacks on Digit Frequency

In this set of attacks, Mallory randomly performs malicious insertion, deletion, and update attacks on the digit frequency in Alice's watermarked relation. For example, in an insertion attack, Mallory may attempt to maliciously insert some digits into the relation. Table 6 shows the experimental results obtained for characterization of a malicious insertion attack on digits 9 and 0, as discussed in Algorithm 6. A positive fractional change indicates that a fraction of digits 9 and 0 has been maliciously inserted by Mallory into the database relation. The characteristic of this attack is an attempt to relatively increase low data values to high, as increases of 35.84% and 24.42% are observed in the fractional change of digits 9 and 0, respectively. As the other digits are not attacked, the fractional change is zero for digits 1–8 and there is no change in the frequency of these digits. This characteristic of the attack, when combined with the nature of the data, may provide useful information about the attacker's intention. For example, in a product sales environment, these malicious insertions indicate that the attacker may have attempted to increase low volumes and amounts of product sales.

Table 6: Characterization of malicious insertion attacks on digit frequency.

Table 7 shows the result of random malicious deletions of digits 9 and 0 from the database relation. A negative fractional change indicates that a fraction of digits 9 and 0 has been maliciously deleted by the attacker. The characteristic of this attack is an attempt to relatively decrease high data values to low in the database relation. In this attack, 14.70% of digit 9 and 12.44% of digit 0 are randomly deleted from the database relation. As the other digits are not deleted, the fractional change is zero for digits 1–8. Table 8 shows a similar result for random malicious updates of digits 9 and 0 in the database relation. In this attack, digits 9 and 0 are randomly replaced with other digits, so the frequency of digits 9 and 0 is decreased (high to low), whereas the frequency of digits 1–8 is increased (low to high).

Table 7: Characterization of malicious deletion attacks on digit frequency.
Table 8: Characterization of malicious update attacks on digit frequency.

Figure 4 summarizes the malicious insertion, deletion, and update attacks on digits 9 and 0. The insertion attack shows a positive trend (low to high) on the attacked digits, whereas a negative trend (high to low) is observed on the attacked digits for the deletion attack. In the update attack, a negative trend (high to low) is observed for the attacked digits and a positive trend (low to high) for the unattacked digits.

Figure 4: Characterization of malicious insertion, deletion, and update attacks on digits 9 and 0 of data values.

In another set of attacks, we randomly insert, delete, and update 10% (lower bound) and 90% (upper bound) of the tuples in the database relation. Table 9 shows the effect on the fractional change in digit frequency for each digit. Note that, in the insertion attack, a positive trend (low to high) is observed in the frequency of each digit in the database relation. For example, when 10% of similar tuples are inserted into the database relation, an increase of approximately 10% is observed in the fractional change for each digit. Similarly, in the deletion attack, a negative trend (high to low) is observed in the fractional change for each digit. In the update attack, no specific trend is observed in the fractional change, as fractions of digits are randomly replaced by other digits.
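The proportionality noted above (inserting about 10% of similar tuples raises each digit count by about 10%) can be illustrated with a small deterministic example. The data values are made up for illustration and are not taken from the Forest Cover Type data set; with real, randomly chosen tuples the change is only approximately equal to the attack rate.

```python
from collections import Counter


def digit_counts(values):
    """Count how often each decimal digit occurs across all values."""
    counts = Counter()
    for v in values:
        counts.update(str(abs(int(v))))
    return counts


base = [123, 405, 678, 90, 159]     # hypothetical values covering digits 0-9
original = base * 20                # 100 data values
attacked = original + base * 2      # insert 10 similar values (10% of 100)

before = digit_counts(original)
after = digit_counts(attacked)

# Fractional change per digit, in percent
delta = {d: 100.0 * (after[d] - before[d]) / before[d] for d in sorted(before)}
```

Here the inserted values have exactly the same digit distribution as the original ones, so every digit shows the same fractional change as the attack rate.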

Table 9: Characterization of malicious modifications on digit frequency.

It is to be noted that the attack on digit frequency (as discussed above) can be characterized on parameters like the digits being attacked, the fraction of each digit attacked, the type of attack (insertion, deletion, or update) on each digit, and the effect of attack (low to high, high to low, or no change) on data values.

3.2.2. Attack on Length of Data Values

In this set of attacks, Mallory randomly performs malicious insertion, deletion, and update attacks on the length of data values. Table 10 shows the experimental result for characterization of malicious insertion of data values of length 3 into the database relation. A positive fractional change indicates that a fraction of data values of that length has been maliciously inserted into the database relation. The characteristic of this attack is to relatively increase low data values to high, as an increase of 18.27% is observed in the fractional change for data values of length 3. Also, the fractional change is zero for lengths 1, 2, and 4, which shows that data values of these lengths are not attacked.

Table 10: Characterization of malicious insertion attacks on length of data values.

Table 11 shows the result of random malicious deletion of data values of length 3. As in the deletion attack on digit frequency, a negative fractional change indicates that a fraction of data values of that length has been maliciously deleted, with the characteristic of decreasing high data values to low in the database relation. Also, as in the malicious insertion, the fractional change is zero for lengths 1, 2, and 4, which indicates that data values of these lengths are not deleted. Table 12 shows the results for malicious updates of data values of length 3. In this attack, data values of length 3 are randomly replaced by values of lengths 1, 2, and 4. This attack shows a decrease in the fractional change for length 3, whereas the fractional change for lengths 1, 2, and 4 is increased.

Table 11: Characterization of malicious deletion attacks on length of data values.
Table 12: Characterization of malicious update attacks on length of data values.

Figure 5 summarizes the malicious insertion, deletion, and update attacks on data values of length 3. The insertion attack shows a positive trend (low to high) on the attacked length, whereas a negative trend (high to low) on the attacked length is observed in the deletion attack. In the update attack, a negative trend (high to low) is observed on the attacked length, whereas a positive trend (low to high) is observed on the unattacked lengths of data values.

Figure 5: Characterization of malicious insertion, deletion, and update attacks on length 3 of data values.

Table 13 shows the effect on the fractional change in length frequency when 10% (lower bound) and 90% (upper bound) of tuples are maliciously inserted, deleted, and updated in the database relation. In the insertion attack, the fractional change in length frequency shows a positive trend (low to high) for each length of data values. Similarly, in the deletion attack, a negative trend (high to low) is observed for each length of data values. For example, when 10% of tuples are randomly deleted from the database relation, a decrease of approximately 10% is observed in the fractional change for each length of data values. The update attack does not show any specific trend, as fractions of data values of different lengths are randomly replaced by data values of other lengths.

Table 13: Characterization of malicious modifications on length of data values.

It is to be noted that the attack on length of data values can be characterized on parameters like the length of data values being attacked, the fraction of each length of data values attacked, the type of attack (insertion, deletion, or update), and the effect of attack (low to high, high to low, or no change) on each length of data values.

3.2.3. Attack on Range of Data Values

In this set of attacks, Mallory randomly performs insertion, deletion, and update attack on range 1, that is, (100–999) of data values present in the database relation . Table 14 shows the experimental results for characterization of malicious insertion for range 1 of data values. The characteristic of this attack is to relatively increase the low data values to high as an increase of 17.33% is observed in for range 1 of data values. The for range 0 and 2 is zero as the data values of these ranges are not attacked.

Table 14: Characterization of malicious insertion attacks on range of data values.

Table 15 shows the results of random malicious deletion of data values of range 1. As in the deletion attack on digit frequency, a negative fractional change indicates that a fraction of range 1 is maliciously deleted, with the characteristic of transforming high data values to low in the database relation. Because the data values of ranges 0 and 2 are not attacked, the fractional change is zero for these ranges. Table 16 shows the results for malicious updates on data values of range 1. In this attack, the data values of range 1 are randomly replaced by values of ranges 0 and 2. This attack shows a decrease in the fractional change for range 1, whereas the fractional change for ranges 0 and 2 increases.

Table 15: Characterization of malicious deletion attacks on range of data values.
Table 16: Characterization of malicious update attacks on range of data values.

The malicious insertion, deletion, and update attacks on range 1 of data values are summarized in Figure 6. A positive trend (low to high) is observed in the attacked range for the insertion attack, and a negative trend (high to low) is observed in the attacked range for the deletion attack. The update attack shows a negative trend (high to low) for the attacked range, that is, range 1 of data values, and a positive trend for the unattacked ranges, that is, ranges 0 and 2 of data values.

Figure 6: Characterization of malicious insertion, deletion, and update attacks on range 1 (100–999) of data values.
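The update pattern summarized above can be reproduced on a toy relation. The sketch below is illustrative only: the text states that range 1 covers 100–999, while the boundaries used here for ranges 0 and 2 (below 100, and 1000 and above), the helper names, and the fractional-change formula are assumptions.

```python
def range_index(value):
    """Assumed bucketing: range 0 = 0-99, range 1 = 100-999, range 2 = 1000 and above."""
    if value < 100:
        return 0
    if value < 1000:
        return 1
    return 2


def range_frequency(values):
    """Count the data values that fall in each of the three assumed ranges."""
    freq = {0: 0, 1: 0, 2: 0}
    for v in values:
        freq[range_index(v)] += 1
    return freq


def fractional_change(original, attacked):
    """Assumed formula: (attacked - original) / original for each range.

    Every range must be non-empty in the original relation.
    """
    return {r: (attacked[r] - original[r]) / original[r] for r in original}


# Hypothetical relation: two values in range 0, four in range 1, two in range 2.
original = [12, 57, 150, 300, 450, 999, 1500, 2500]
# Update attack: 150 is replaced by 30 (range 0) and 999 by 4000 (range 2).
updated = [12, 57, 30, 300, 450, 4000, 1500, 2500]

change = fractional_change(range_frequency(original), range_frequency(updated))
for r in sorted(change):
    print(f"range {r}: {change[r]:+.2%}")
```

The attacked range loses values while the other two ranges gain them, so the signs of the changes alone separate this update from a pure insertion or deletion.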

In another set of attacks, we randomly inserted, deleted, and updated 10% (lower bound) and 90% (upper bound) of the tuples in the database relation. Table 17 shows the effect on the fractional change in range frequency for each range of data values. The fractional change in range frequency shows a positive trend (low to high) for malicious insertion in each range of data values. Similarly, in the deletion attack, a negative trend (high to low) is observed for each range of data values. For example, when 10% of the tuples are randomly deleted from the database relation, a decrease of approximately 10% is observed in the range frequency for each range of data values. The update attack does not show any specific trend, as a fraction of the data values of each range is randomly replaced by data values of other ranges.

Table 17: Characterization of malicious data modifications on range of data values.
Table 18: Characterization of malicious attacks on digit frequency (deterministic).
Table 19: Characterization of malicious attacks on length of data values (deterministic).
Table 20: Characterization of malicious attacks on range of data values (deterministic).
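The tuple-level deletion experiment can be sketched in the same way. In the example below, a seeded random sample stands in for Mallory's random deletion of 10% of the tuples; the toy relation, the range boundaries for ranges 0 and 2, and the fractional-change formula are assumptions for illustration and do not reproduce the data set used in the experiments.

```python
import random
from collections import Counter


def range_index(value):
    """Assumed bucketing: range 0 = 0-99, range 1 = 100-999, range 2 = 1000 and above."""
    if value < 100:
        return 0
    if value < 1000:
        return 1
    return 2


def range_frequency(values):
    """Count the data values that fall in each assumed range."""
    return Counter(range_index(v) for v in values)


# Hypothetical single-attribute relation: 300, 400, and 300 values per range.
relation = [50] * 300 + [500] * 400 + [5000] * 300

# Randomly delete 10% of the tuples (fixed seed so the run is repeatable).
rng = random.Random(42)
deleted = set(rng.sample(range(len(relation)), k=len(relation) // 10))
attacked = [v for i, v in enumerate(relation) if i not in deleted]

before = range_frequency(relation)
after = range_frequency(attacked)
# Assumed formula: (attacked - original) / original for each range.
change = {r: (after[r] - before[r]) / before[r] for r in before}
for r in sorted(change):
    print(f"range {r}: {change[r]:+.2%}")
```

Because the deleted tuples are drawn uniformly, each range is expected to lose roughly the same fraction of its values, which is why the change is close to, but not exactly, 10% in every range.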

Note that the data characteristics used in our experiments, namely, the digits, lengths, and ranges of data values, are closely related to one another. Because of this relationship, we evaluated the effect of malicious data modifications on all three data characteristics. For example, if Mallory maliciously inserts a digit in a data value, the length and range of the data value also increase. Similarly, if Mallory maliciously decreases the length of a data value, the digit count and range of the data value also decrease (Tables 9, 13, and 17).

Finally, we summarize our findings and observations on the characterization of malicious data modifications as follows.
(i) If there is a positive trend in the fractional change of data values in the tampered database relation, a fraction of the digits, ranges, and lengths of data values has been maliciously inserted by Mallory into Alice’s watermarked relation. The characteristic of this attack is to relatively increase the low data values to high in the database relation (Tables 6, 10, and 14).
(ii) If there is a negative trend in the fractional change of data values in the tampered database relation, a fraction of the digits, ranges, and lengths of data values has been maliciously deleted by Mallory from Alice’s watermarked relation. The characteristic of this attack is to relatively decrease the high data values to low in the database relation (Tables 7, 11, and 15).
(iii) If there are both positive and negative trends in the fractional change for the digits, ranges, and lengths of data values in the tampered database relation, the data values showing the negative trend have been maliciously replaced (updated) by the data values showing the positive trend (Tables 8, 12, and 16).
(iv) If there is a uniform increase in the fractional change of all data values in the tampered database relation, a fraction of similar tuples has been maliciously inserted by Mallory into Alice’s watermarked relation. The characteristic of this attack is to relatively increase the low data values to high in the database relation (Tables 9, 13, and 17).
(v) If there is a uniform decrease in the fractional change of all data values in the tampered database relation, a fraction of tuples has been maliciously deleted by Mallory from Alice’s watermarked relation. The characteristic of this attack is to relatively decrease the high data values to low in the database relation (Tables 9, 13, and 17).
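The five observations above can be read as a small decision procedure over the signs of the fractional changes. The following sketch is one possible encoding, not the authors' implementation: the function name `classify_attack`, the two tolerances, and the test used for a "uniform" change (every category moves in the same direction and the spread stays within `uniform_tol`) are assumptions.

```python
def classify_attack(changes, zero_tol=1e-9, uniform_tol=0.05):
    """Map per-category fractional changes to an attack type (assumed rules).

    `changes` maps a digit, length, or range to its fractional change.
    """
    values = list(changes.values())
    increased = [c for c in values if c > zero_tol]
    decreased = [c for c in values if c < -zero_tol]

    if not increased and not decreased:
        return "no modification detected"
    if increased and decreased:
        return "update of data values"  # observation (iii)

    moved = increased or decreased
    # Assumed meaning of "uniform": all categories changed by a similar amount.
    uniform = len(moved) == len(values) and max(moved) - min(moved) <= uniform_tol
    if increased:
        # observation (iv) if uniform, otherwise observation (i)
        return "insertion of tuples" if uniform else "insertion of data values"
    # observation (v) if uniform, otherwise observation (ii)
    return "deletion of tuples" if uniform else "deletion of data values"


# Hypothetical fractional changes per range (0, 1, 2).
print(classify_attack({0: 0.0, 1: 0.1733, 2: 0.0}))
print(classify_attack({0: 0.5, 1: -0.5, 2: 0.5}))
print(classify_attack({0: -0.10, 1: -0.11, 2: -0.09}))
```

The tolerance for uniformity is a design choice of this sketch; a stricter or looser value changes where the boundary between value-level and tuple-level attacks falls.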

4. Conclusions

In this paper, we presented a fragile watermarking scheme to detect and characterize malicious tampering in database relations. The proposed scheme is based on the zero watermarking approach, which does not alter the original database content and thus overcomes the data integrity and data usability limitations of existing watermarking schemes. In the proposed scheme, the watermarks are generated from the local characteristics of the database relation itself, such as the frequency distributions of the digits, lengths, and ranges of data values. This enables us to characterize the malicious modifications made to the database relation. Experimental results showed that the proposed scheme can detect and characterize malicious data modifications successfully. In the future, we intend to explore other local characteristics of relational databases for watermark generation and to extend the proposed scheme to semifragile watermarking.

Appendix

(i) Characterization of malicious attacks on digit frequency (deterministic) (see Table 18).
(ii) Characterization of malicious attacks on length of data values (deterministic) (see Table 19).
(iii) Characterization of malicious attacks on range of data values (deterministic) (see Table 20).

References

  1. J. Dittmann, P. Schmitt, E. Saar, J. Schwenk, and J. Ueberberg, “Combining digital watermarks and collusion secure fingerprints for digital images,” SPIE Journal of Electronic Imaging, vol. 9, no. 4, pp. 456–467, 2000.
  2. E. T. Lin and E. J. Delp, “Temporal synchronization in video watermarking,” IEEE Transactions on Signal Processing, vol. 52, no. 10, pp. 3007–3022, 2004.
  3. M. Chen, Y. He, and R. L. Lagendijk, “A fragile watermark error detection scheme for wireless video communications,” IEEE Transactions on Multimedia, vol. 7, no. 2, pp. 201–211, 2005.
  4. F. Hartung, P. Eisert, and B. Girod, “Digital watermarking of mpeg-4 facial animation parameters,” Computers and Graphics, vol. 22, no. 4, pp. 425–435, 1998.
  5. R. Agrawal, P. J. Haas, and J. Kiernan, “Watermarking relational data: framework, algorithms and analysis,” VLDB Journal, vol. 12, no. 2, pp. 157–169, 2003.
  6. G. Gupta and J. Pieprzyk, “Database relation watermarking resilient against secondary watermarking attacks,” in Proceedings of the 5th International Conference on Information Systems Security (ICISS '09), vol. 5905 of Lecture Notes in Computer Science, pp. 222–236, Springer, Kolkata, India, 2009.
  7. Y. Zhang, X. Niu, and D. Zhao, “A method of protecting relational databases copyright with cloud watermark,” International Journal of Information and Communication Engineering, vol. 1, pp. 337–341, 2005.
  8. F. Guo, J. Wang, Z. Zhang, X. Ye, and D. Li, “An improved algorithm to watermark numeric relational data,” in Proceedings of the 6th International Workshop on Information Security applications (WISA '05), vol. 3786 of Lecture Notes in Computer Science, pp. 138–149, Springer, Jeju Island, Republic of Korea, 2005.
  9. M. Huang, J. Cao, Z. Peng, and Y. Fang, “A new watermark mechanism for relational data,” in Proceedings of the 4th International Conference on Computer and Information Technology (CIT '04), pp. 946–950, IEEE Computer Society, Wuhan, China, 2004.
  10. T. Hu, G. Chen, K. Chen, and J. Dong, “Garwm: towards a generalized and adaptive watermark scheme for relational data,” in Proceedings of the 6th International Conference in Advances in Web-Age Information Management (WAIM '05), vol. 3739 of Lecture Notes in Computer Science, pp. 380–391, Springer, Hangzhou, China, 2005.
  11. R. Sion, “Proving ownership over categorical data,” in Proceedings of the 20th International Conference on Data Engineering (ICDE '04), pp. 584–595, IEEE Computer Society, Boston, Mass, USA, April 2004.
  12. R. Sion, M. Atallah, and S. Prabhakar, “Rights protection for categorical data,” IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 7, pp. 912–926, 2005.
  13. C. Wang, J. Wang, M. Zhou, G. Chen, and D. Li, “Atbam: an arnold transform based method on watermarking relational data,” in Proceedings of the International Conference on Multimedia and Ubiquitous Engineering (MUE '08), pp. 263–270, IEEE Computer Society, Beijing, China, 2008.
  14. X. Zhou, M. Huang, and Z. Peng, “An additive-attack-proof watermarking mechanism for databases copyrights protection using image,” in Proceedings of the ACM symposium on Applied Computing (SAC '07), pp. 254–258, ACM Press, Seoul, Republic of Korea, March 2007.
  15. A. Al-Haj and A. Odeh, “Robust and blind watermarking of relational database systems,” Journal of Computer Science, vol. 4, no. 12, pp. 1024–1029, 2008.
  16. H. Wang, X. Cui, and Z. Cao, “A speech based algorithm for watermarking relational databases,” in Proceedings of the International Symposiums on Information Processing (ISIP '08), pp. 603–606, IEEE Computer Society, Moscow, Russia, 2008.
  17. H. Guo, Y. Li, A. Liu, and S. Jajodia, “A fragile watermarking scheme for detecting malicious modifications of database relations,” Information Sciences, vol. 176, no. 10, pp. 1350–1378, 2006.
  18. H. Khataeimaragheh and H. Rashidi, “A novel watermarking scheme for detecting and recovering distortions in database tables,” International Journal of Database Management Systems, vol. 2, no. 3, pp. 1–11, 2010.
  19. S. Iqbal, A. Rauf, S. Mahfooz, S. Khusro, and S. H. Shah, “Self-constructing fragile watermark algorithm for relational database integrity proof,” World Applied Sciences Journal, vol. 19, no. 9, pp. 1273–1277, 2012.
  20. V. Prasannakumari, “A robust tamperproof watermarking for data integrity in relational databases,” Research Journal of Information Technology, vol. 1, no. 3, pp. 115–121, 2009.
  21. J. Lafaye, D. Gross-Amblard, C. Constantin, and M. Guerrouani, “WATERMILL: an optimized fingerprinting system for databases under constraints,” IEEE Transactions on Knowledge and Data Engineering, vol. 20, no. 4, pp. 532–546, 2008.
  22. Y. Li, H. Guo, and S. Jajodia, “Tamper detection and localization for categorical data using fragile watermarks,” in Proceedings of the 4th ACM Workshop on Digital Rights Management (DRM '04), pp. 73–82, ACM Press, Washington, DC, USA, October 2004.
  23. I. Kamel, “A schema for protecting the integrity of databases,” Computers and Security, vol. 28, no. 7, pp. 698–709, 2009.
  24. S. Bhattacharya and A. Cortesi, “A distortion free watermark framework for relational databases,” in Proceedings of the 4th International Conference on Software and Data Technologies (ICSOFT '09), pp. 229–234, INSTICC Press, Sofia, Bulgaria, July 2009.
  25. A. Hamadou, X. Sun, L. Gao, and S. A. Shah, “A fragile zero-watermarking technique for authentication of relational databases,” International Journal of Digital Content Technology and its Applications, vol. 5, no. 5, pp. 189–200, 2011.
  26. Y. Li, V. Swarup, and S. Jajodia, “Fingerprinting relational databases: schemes and specialties,” IEEE Transactions on Dependable and Secure Computing, vol. 2, no. 1, pp. 34–45, 2005.
  27. S. Liu, S. Wang, R. H. Deng, and W. Shao, “A block oriented fingerprinting scheme in relational database,” in Proceedings of the 7th International Conference in Information Security and Cryptology (ICISC '04), vol. 3506 of Lecture Notes in Computer Science, pp. 455–466, Springer, Seoul, Republic of Korea, 2004.
  28. F. Guo, J. Wang, and D. Li, “Fingerprinting relational databases,” in Proceedings of the ACM symposium on Applied Computing (SAC '06), pp. 487–492, ACM Press, Dijon, France, 2006.
  29. C. T. Li, F. M. Yang, and C. S. Lee, “Oblivious fragile watermarking scheme for image authentication,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 3445–3448, IEEE Press, Orlando, Fla, USA, 2002.
  30. H. Yang, X. Sun, B. Wang, and Z. Qin, “An image-adaptive semi-fragile watermarking for image authentication and tamper detection,” in Proceedings of the International Conference on Computational Science and its Applications, Kuala Lumpur, Malaysia, 2007.
  31. J. Hu, J. Huang, D. Huang, and Y. Q. Shi, “Image fragile watermarking based on fusion of multi-resolution tamper detection,” Electronics Letters, vol. 38, no. 24, pp. 1512–1513, 2002.
  32. J. Zhang and H. Bin, “Fragile audio watermarking scheme based on sample mean sequence,” in Proceedings of the IEEE International Conference on Multimedia Technology (ICMT '11), pp. 333–336, 2011.
  33. H. Wang and M. Fan, “Centroid-based semi-fragile audio watermarking in hybrid domain,” Science in China F, vol. 53, no. 3, pp. 619–633, 2010.
  34. S. Lian, Z. Liu, Z. Ren, and H. Wang, “Commutative encryption and watermarking in video compression,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 6, pp. 774–778, 2007.
  35. X. Huazheng, S. Xingming, and T. Chengliang, “New fragile watermarking scheme for text documents authentication,” Wuhan University Journal of Natural Sciences, vol. 11, no. 6, pp. 1661–1666, 2006.
  36. H. Yang, A. C. Kot, and J. Liu, “Semi-fragile watermarking for text document images authentication,” in Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS '05), pp. 4002–4005, May 2005.
  37. http://archive.ics.uci.edu/ml/datasets/Covertype.