Mathematical Problems in Engineering
Volume 2013 (2013), Article ID 210405, 11 pages
http://dx.doi.org/10.1155/2013/210405
Research Article

Association Rule Hiding Based on Intersection Lattice

Department of Computer Science, Faculty of Science, Khon Kaen University, Khon Kaen 40002, Thailand

Received 12 April 2013; Revised 27 June 2013; Accepted 27 June 2013

Academic Editor: Yang Xu

Copyright © 2013 Hai Quoc Le et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Association rule hiding plays a vital role in preserving sensitive knowledge when data are shared between enterprises. Its aim is to remove sensitive association rules from the released database while keeping side effects as low as possible. This research proposes an efficient algorithm for hiding a specified set of sensitive association rules based on the intersection lattice of frequent itemsets. We begin by analyzing the theory of the intersection lattice of frequent itemsets and its applicability to the association rule hiding problem. We then formulate two heuristics in order to (a) specify the victim items based on the characteristics of the intersection lattice of frequent itemsets and (b) identify transactions for data sanitization based on the weight of transactions. Next, we propose a new algorithm, AARHIL, for hiding a specific set of sensitive association rules with minimum side effects and low complexity. Finally, experiments were carried out to assess the efficiency of the proposed approach. The results show that AARHIL achieved the lowest side effects and CPU-Time among the compared state-of-the-art approaches for hiding a specified set of sensitive association rules.

1. Introduction

Data mining has been recently applied in many areas of science and business, such as traffic accident detection [1], engineering asset health and reliability prediction [2], assessment of landslide susceptibility [3], enterprises [4], and supply chain management [5]. The discovery of association rules is one of the major techniques of data mining that extracts correlative patterns from large databases. Such rules create assets that organizations can use to expand their businesses, improve profitability, decrease supply chain costs, increase the efficiencies of collaborative product developments, and support more effective marketing [4, 5]. The competitive environment of global economy forces companies, who engage in the same business, to form an alliance for mutual benefits. In the collaboration, companies have to share information in order to shorten processing time dramatically, eliminate value-depleting activities, and improve quality, accuracy, and asset productivity [6]. However, due to legal constraints and/or competition among companies, they do not want to reveal their sensitive knowledge to other parties. Association rule hiding is an efficient solution that removes the sensitive association rules from the released database. Thus, the sensitive knowledge can be protected when sharing data between parties.

Many studies in the literature have focused on hiding sensitive association rules by reducing their support or confidence below given thresholds. Association rule hiding algorithms can be divided into three main classes [7], namely, border based [8, 9], exact [10, 11], and heuristic [12–22]. The border-based and exact approaches aim to protect the revised positive border of frequent itemsets in order to minimize side effects. Although these approaches achieve good results for itemset hiding, they are not well suited to minimizing the side effects when hiding a specific set of sensitive association rules. The heuristic approach does not guarantee a globally optimal solution, but it usually finds a solution close to the best one in less time. In 2012, Hai and Somjit [23] introduced a new direction for hiding a specific set of sensitive association rules, the intersection lattice-based approach. This approach concentrates on formulating heuristics for specifying victim items and transactions for data sanitization based on intersection lattice theory.

This study proposes an improvement of the intersection lattice-based approach to association rule hiding [23–25]. We first introduce in detail the theory of the intersection lattice of frequent itemsets and prove that it is applicable to the association rule hiding problem. Subsequently, we formulate two heuristics for hiding sensitive association rules with the lowest side effects. The first heuristic determines the victim item that needs to be modified and focuses on maintaining itemsets in the generating set in order to restrict lost rules. The second heuristic assigns a weight to each transaction based on its degree of safety, the number of sensitive rules, and the number of nonsensitive association rules contained in that transaction. This study provides evidence that removing the victim item from the transactions with the highest weight has the least effect on the nonsensitive association rules and on the intersection lattice of frequent itemsets. An experiment is performed on a real dataset to show the performance of the proposed algorithm in a real application, together with comparisons with previous studies.

The rest of this paper is organized as follows. Section 2 presents a brief review of previous works. The problem formulation is provided in Section 3. Section 4 introduces the basic concepts of lattice theory that are applied in this research. The proposed methodology is presented in Section 5. In Section 6, we present the experimental results in order to show the performance of the proposed approach compared with the state of the art approaches. The main contents presented in this study are concluded in Section 7.

2. Related Work

Association rule hiding approaches are currently classified into four classes: heuristic, border based, exact, and intersection lattice based. The heuristic approach provides efficient and fast algorithms that select the appropriate transactions and items for hiding sensitive association rules using the distortion or blocking technique. The distortion technique adds (or removes) selected items of sensitive association rules to (or from) specified transactions, or adds dummy transactions [21], to decrease the support [8–14] or confidence [12, 13, 15–19] of the rules below the given thresholds in order to hide single or multiple rules [20]. Unlike distortion, the blocking technique hides a rule by replacing the existing value of some items with an unknown value so as to reduce the support or confidence of the rule [12, 20, 22].

The border-based approach for association rule hiding was first introduced by Sun and Yu [8]. This approach specifies the revised positive and negative borders of all frequent itemsets. It then focuses on the weight of the positive border [8] or the maxmin set [9] to reduce support of the revised negative border while protecting support of the expected positive border so as to maintain the nonsensitive itemsets.

The exact approach transforms association rule hiding into an optimization problem based on the Constraint Satisfaction Problem (CSP). Menon et al. [10] formulated the CSP to specify the minimum number of transactions that need to be modified in order to hide sensitive association rules. Gkoulalas-Divanis and Verykios [11] formulated the CSP based on the revised positive and negative borders to identify candidate items for the hiding process. In this approach, the authors used a process of constraint reduction so that all constraints in the CSP are linear and all variables are binary. This allows the use of binary integer programming instead of integer or linear programming to solve the CSP.

The intersection lattice approach for hiding a specific set of association rules was first introduced by Hai and Somjit [23]. The proposed algorithms, ILARH [23] and HCSRIL [24], aim to hide a specific set of sensitive rules in three steps. The first step specifies a set of itemsets satisfying three conditions: they (i) contain the right-hand side of the sensitive rule, (ii) are maximal subitemsets of a maximal itemset, and (iii) have minimal support among the subitemsets specified in (ii). An item in the right-hand side of the sensitive rule that is related to the specified maximal support itemset is identified as the victim item. In the second step, a set of transactions supporting the sensitive rule is specified. The third step removes the victim item from the specified transactions until the confidence of the rule is below the minimum confidence threshold. In order to reduce side effects, HCSRIL sorts the set of transactions supporting the sensitive rules in ascending order of their size before sanitizing them. Moreover, HCSRIL updates the released database such that the sanitization has the least impact on the generating set. However, a larger transaction may contain fewer nonsensitive association rules. Thus, sorting transactions based on their size is not enough to restrict the lost rules.

Hai et al. [25] assigned a weight to each transaction in order to measure the impacts of hiding process on the nonsensitive association rules. Moreover, the authors formulated the victim item specification based on the measurement of the distance from sensitive rules to the set of maximal itemsets and the nearest nonsensitive association rule. Modifying the victim item on the high-weight transaction can reduce side effects. On the negative side, the constraints between frequent itemsets are not identified in the distances. Thus, modifying the victim item may avoid impacts on some nonsensitive association rules, but it cannot protect the intersection lattice of frequent itemset from being broken. So it may cause more lost rules.

This research builds on the algorithms proposed in [23–25] and proposes an improvement for hiding a specific set of sensitive association rules with the lowest side effects and CPU-Time.

3. Problem Formulation

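Let $I = \{i_1, i_2, \ldots, i_m\}$ be a finite set of literals. Each member of $I$ is called an item. $X$ is an itemset if $X \subseteq I$. A transaction $T$ is defined by a set of items, namely, $T \subseteq I$. Let $D$ be a finite transaction database, namely, $D = \{T_1, T_2, \ldots, T_n\}$. An itemset $X$ is supported by a transaction $T$ if $X \subseteq T$. The frequency of an itemset $X$ in database $D$ is the support of $X$, denoted by $\mathrm{supp}(X)$, and is defined as

$$\mathrm{supp}(X) = |\{T \in D : X \subseteq T\}|. \tag{1}$$

An itemset $X$ is called a frequent itemset if $\mathrm{supp}(X) \ge \sigma$, where $\sigma$ is the minimum support threshold given by users.

An association rule is the implication $X \to Y$, where $X \subset I$, $Y \subset I$, and $X \cap Y = \emptyset$.

The support of a rule $X \to Y$ is defined to be the support of itemset $X \cup Y$, that is,

$$\mathrm{supp}(X \to Y) = \mathrm{supp}(X \cup Y). \tag{2}$$

The confidence of a rule $X \to Y$ is defined as

$$\mathrm{conf}(X \to Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)}. \tag{3}$$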

Example 1. Let a transaction database be given as in Table 1. Let minimum thresholds be given as and . Frequent itemsets mined from Table 1 are shown in Table 2, and strong association rules generated from the frequent itemsets are presented in Table 3.

Table 1: Transaction database.
Table 2: Frequent itemsets.
Table 3: Strong association rules.

Let $\sigma$ and $\delta$ be the minimum support threshold and the minimum confidence threshold given by users. The association rule $X \to Y$ is a strong association rule if $\mathrm{supp}(X \to Y) \ge \sigma$ and $\mathrm{conf}(X \to Y) \ge \delta$.
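As a concrete illustration of the definitions of support, confidence, and strong rules, the following minimal Python sketch evaluates them on a small hypothetical transaction database (not the one in Table 1). It assumes that support is counted as the number of supporting transactions; the function names and the toy data are illustrative assumptions, not part of the original method.

```python
from typing import FrozenSet, Iterable, List

Itemset = FrozenSet[str]


def support(itemset: Iterable[str], database: List[Itemset]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    target = frozenset(itemset)
    return sum(1 for transaction in database if target <= transaction)


def confidence(lhs: Iterable[str], rhs: Iterable[str],
               database: List[Itemset]) -> float:
    """Confidence of lhs -> rhs: support(lhs | rhs) / support(lhs)."""
    lhs_support = support(lhs, database)
    if lhs_support == 0:
        return 0.0  # the rule is not supported at all
    return support(frozenset(lhs) | frozenset(rhs), database) / lhs_support


def is_strong(lhs: Iterable[str], rhs: Iterable[str], database: List[Itemset],
              min_support: int, min_confidence: float) -> bool:
    """A rule is strong if it meets both thresholds."""
    rule_support = support(frozenset(lhs) | frozenset(rhs), database)
    return (rule_support >= min_support
            and confidence(lhs, rhs, database) >= min_confidence)


# Hypothetical toy database: each string is one transaction of single-letter items.
toy_db = [frozenset(t) for t in ("abc", "abd", "ab", "acd", "bc")]

print(support("ab", toy_db))          # transactions containing both a and b
print(confidence("a", "b", toy_db))   # confidence of the rule a -> b
print(is_strong("a", "b", toy_db, min_support=3, min_confidence=0.7))
```

With these toy thresholds, the rule a -> b is strong; raising the minimum confidence above its confidence would make it no longer strong, which is exactly what a hiding algorithm tries to achieve by modifying transactions instead.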

Lemma 2 (Apriori property [26]). Assume that $X, Y \subseteq I$. If $X \subseteq Y$, then $\mathrm{supp}(X) \ge \mathrm{supp}(Y)$.

The Apriori property shows that if an itemset $X$ is frequent, then every subset of $X$ is also frequent.

The association rules discovered from a large database that can be used in the decision-making support process are said to be sensitive association rules [14].

Definition 3 (sensitive association rules). Let $D$ be a transactional database, $R$ be the set of all association rules that are mined from $D$, and $R_H$ be a set of decision support rules that need to be hidden according to some security policies. A set of association rules, denoted by $R_S$, is said to be sensitive if $R_S \subset R$ and $R_S$ would derive the set $R_H$. $R_N$ is the set of nonsensitive association rules such that $R_N \cup R_S = R$.

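A sensitive association rule $X \to Y$ is hidden if $\mathrm{supp}(X \to Y) < \sigma$ or $\mathrm{conf}(X \to Y) < \delta$. The rule can be hidden by
(i) removing an item from some transactions in order to make $\mathrm{supp}(X \to Y) < \sigma$,
(ii) adding all items of $X$ to some transactions until $\mathrm{conf}(X \to Y) < \delta$, or
(iii) removing an item of $Y$ from some transactions until $\mathrm{supp}(X \to Y) < \sigma$ or $\mathrm{conf}(X \to Y) < \delta$.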

The modification of any item, however, always causes side effects, that is, impacts of the data modification on the quality of association rule mining. These include lost rules, ghost rules, false rules, and accuracy.
(i) A lost rule is a nonsensitive association rule that is discovered from the original database but cannot be mined from the released database.
(ii) A ghost rule is a nonsensitive association rule that cannot be discovered from the original database but can be mined from the released database.
(iii) A false rule is a sensitive association rule that the hiding process fails to hide.
(iv) Accuracy is measured by the ratio of distorted data items to the total number of data items in the original database; the fewer items are distorted, the higher the accuracy.
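The four side effects can be expressed as simple set operations once the rules mined before and after sanitization are available. The sketch below is a minimal illustration under stated assumptions: rules are represented as pairs of frozensets, the two databases list the same transactions in the same order, and accuracy is taken to be one minus the distortion ratio. The function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str]]  # (left-hand side, right-hand side)


def side_effects(original_rules: Set[Rule], released_rules: Set[Rule],
                 sensitive_rules: Set[Rule]) -> Dict[str, Set[Rule]]:
    """Classify rules into lost, ghost, and false rules."""
    nonsensitive = original_rules - sensitive_rules
    return {
        # nonsensitive rules that disappeared after sanitization
        "lost": nonsensitive - released_rules,
        # rules that appear only in the released database
        "ghost": released_rules - original_rules,
        # sensitive rules that are still minable
        "false": sensitive_rules & released_rules,
    }


def distortion_ratio(original_db: List[FrozenSet[str]],
                     released_db: List[FrozenSet[str]]) -> float:
    """Changed item occurrences divided by all item occurrences in the original."""
    total = sum(len(t) for t in original_db)
    changed = sum(len(o ^ r) for o, r in zip(original_db, released_db))
    return changed / total if total else 0.0


def accuracy(original_db: List[FrozenSet[str]],
             released_db: List[FrozenSet[str]]) -> float:
    """Assumption: accuracy = 1 - distortion ratio (higher is better)."""
    return 1.0 - distortion_ratio(original_db, released_db)
```

A run that removes a single item from a five-item database would, under this assumption, report a distortion ratio of 0.2 and an accuracy of 0.8.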

One association rule hiding algorithm is better than another if it produces fewer side effects (fewer lost rules, ghost rules, and false rules, and higher accuracy) and has lower complexity.

The problem of association rule hiding addressed in this paper can be stated as follows.

Let a transaction database $D$, a minimum support threshold $\sigma$, and a minimum confidence threshold $\delta$ be given. Let us assume that $R$ is the set of association rules mined from $D$ whose support and confidence are not less than $\sigma$ and $\delta$, respectively. Suppose that a set of certain association rules in $R$ regarded as being sensitive, denoted by $R_S$, can be specified. The problem is how to transform $D$ into a released database $D'$ in such a way that all sensitive association rules in $R_S$ are hidden, while nonsensitive association rules can still be mined from $D'$ and the side effects are minimal.

We apply method (iii) to a heuristic association rule hiding algorithm based on the intersection lattice of frequent itemsets in order to reduce the side effects.

4. Background

In this section, we recall some concepts of lattice theory that are applied in the present study, following the presentation of Grätzer [27]. Lattice theory singles out a special type of order for detailed investigation. The basic concepts of lattice theory that are related to our research are presented as follows.

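Let $A$ be a nonempty set. A binary relation $\varrho$ on $A$ is said to be an order relation if $\varrho$ satisfies the properties of reflexivity, antisymmetry, and transitivity, namely,
(1) reflexivity: $a \mathrel{\varrho} a$ for all $a \in A$,
(2) antisymmetry: $a \mathrel{\varrho} b$ and $b \mathrel{\varrho} a$ imply that $a = b$,
(3) transitivity: $a \mathrel{\varrho} b$ and $b \mathrel{\varrho} c$ imply that $a \mathrel{\varrho} c$.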

We usually use ≤ to denote an order and $(A; \le)$ to denote an ordered set.

Let $(A; \le)$ be an ordered set and let $H \subseteq A$. An element $a \in A$ is an upper bound of $H$ if $a$ majorizes all $h \in H$, that is, $h \le a$. An upper bound $a$ of $H$ is the least upper bound of $H$, or supremum of $H$, if $a$ is majorized by all upper bounds of $H$. In this case, we will write $a = \bigvee H$.

The dual concepts of upper bound and least upper bound are the lower bound and the greatest lower bound, respectively, which are defined by duality. The greatest lower bound, or infimum, of $H$ is denoted by $\bigwedge H$.

Definition 4 (lattice). An ordered set $(L; \le)$ is said to be a lattice if for all $a, b \in L$, $\bigvee \{a, b\}$ and $\bigwedge \{a, b\}$ always exist and are denoted by $a \vee b$ and $a \wedge b$, respectively.

Definition 5 (semilattice). Let $(A; \circ)$ be an algebra with one binary operation $\circ$. The algebra $(A; \circ)$ is a semilattice if $\circ$ is idempotent, commutative, and associative.

An algebra $(L; \wedge, \vee)$ is said to be a lattice if $L$ is a nonempty set, $(L; \wedge)$ and $(L; \vee)$ are semilattices, and the two absorption identities are satisfied. A lattice as an algebra and a lattice as an order are proved to be “equivalent” concepts [27].

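Let $S$ be a finite nonempty set. It is obvious that the power set of $S$, denoted by $\mathcal{P}(S)$, is an ordered set under the inclusion relation $\subseteq$. It can be verified that $(\mathcal{P}(S); \subseteq)$ forms a lattice, where $X \wedge Y = X \cap Y$ and $X \vee Y = X \cup Y$. If $L \subseteq \mathcal{P}(S)$ and $(L; \subseteq)$ is a lattice satisfying the properties that $X \wedge Y = X \cap Y$ and $X \vee Y = X \cup Y$, for all $X$ and $Y \in L$, then $L$ is called a set lattice. Similarly, if the ordered set $(L; \subseteq)$ is a semilattice under the intersection operation “$\cap$” satisfying $X \cap Y \in L$, for all $X$ and $Y$ in $L$, then $L$ is said to be an intersection lattice.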
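The defining property of an intersection lattice, closure under pairwise intersection, can be checked mechanically for any finite family of sets. The following Python sketch illustrates that check; the function name and the toy families are assumptions made for illustration only.

```python
from itertools import combinations
from typing import Iterable


def is_intersection_closed(family: Iterable[Iterable[str]]) -> bool:
    """Return True if the intersection of any two members is again a member."""
    sets = {frozenset(s) for s in family}
    return all((x & y) in sets for x, y in combinations(sets, 2))


# Hypothetical examples: each string is a set of single-letter items,
# and the empty string stands for the empty set.
print(is_intersection_closed(["", "a", "b", "ab"]))  # power set of {a, b}
print(is_intersection_closed(["ab", "bc"]))          # {b} is missing
```

Because every subset of a frequent itemset is itself frequent (the Apriori property), a family of frequent itemsets that includes the empty itemset passes this check.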

5. The Proposed Approach for Association Rule Hiding Based on Intersection Lattice

In this section, we present in detail the intersection lattice theory applied to association rule hiding, which was introduced in [23–25]. First, we analyze the characteristics of the intersection lattice of frequent itemsets. Then, we improve the heuristics for minimizing the side effects of the association rule hiding process. Finally, we propose an efficient algorithm for hiding a specific set of sensitive association rules.

5.1. Intersection Lattice of Frequent Itemsets

In this subsection, we formulate intersection lattice theory for the set of frequent itemsets and prove the applicability of this theory into association rule hiding. Let be a given transaction database on a finite set of items and let be a given minimum support threshold. Consider the lattice () and the set , denoted by a set of frequent itemsets that are mined from and satisfy the given threshold ; we have the following statements.

Theorem 6 (intersection lattice of frequent itemset). Let be a given transaction database on a finite set of items and be a given minimum support threshold. Then, forms an intersection lattice, denoted by .

Proof. For all , assume that ; then we have . By Lemma 2, we have , so . In other words, we have .
On the other hand, the ordered set is a semilattice under the intersection operator . Indeed, for all , we always have the following. (i) is idempotent because .(ii) is commutative. Consider an arbitrary item . Then by the definition of set intersection, we have (by the commutativity of meet operation).
Hence, by universal generalization, every item which is in is also in   .
Hence,   .(iii) is associative. Similar to (ii), we have .
In other words, the ordered set is a semilattice under the intersection operation such that for all , . Hence, is an intersection lattice.

Definition 7 (the generating set). The generating set of , denoted by , is the smallest subset of such that each element of can be represented as the (finite) intersection of some elements of , namely,
Definition 7 indicates that each element of can be generated by an intersection of a finite number of certain elements of .
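Definition 7 can be turned directly into a brute-force computation for small families. The sketch below is an illustration rather than the formula of Theorem 10: it assumes the input family is finite and closed under intersection, and it keeps exactly those itemsets that cannot be obtained as the intersection of strictly larger members. The function name and toy family are hypothetical.

```python
from functools import reduce
from typing import FrozenSet, Iterable, Set


def generating_set(family: Iterable[Iterable[str]]) -> Set[FrozenSet[str]]:
    """Members that are not an intersection of other (strictly larger) members."""
    sets = {frozenset(s) for s in family}
    generators = set()
    for x in sets:
        supersets = [y for y in sets if x < y]  # proper supersets of x
        if not supersets:
            generators.add(x)  # maximal itemset: nothing else can produce it
            continue
        # If x is an intersection of some larger members, it also equals the
        # intersection of all of its proper supersets in the family.
        if reduce(frozenset.intersection, supersets) != x:
            generators.add(x)
    return generators


# Hypothetical family of frequent itemsets (the empty string is the empty itemset).
toy_family = ["", "a", "b", "c", "ab", "ac"]
print(sorted("".join(sorted(g)) for g in generating_set(toy_family)))
```

In this toy family, the itemset {a} is not a generator because it equals {a, b} ∩ {a, c}, and the empty itemset is not a generator because it equals {b} ∩ {c}.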

Lemma 8. For all , if , , and , then .

Proof. It can easily be seen that the statement “, , , and then ” is an immediate consequence of Definition 7. Since in the opposite case, , then is obviously also a generating set of . This means that , a contradiction.

Theorem 9. For every , the set is unique.

Proof. It is obvious that if , then . Since , to hold Theorem 9, we have to prove two affirmations as follows.(i) always contains a . For all , we have for all , (Lemma 2). By Definition 7, for all , there is a finite number of itemsets such that If , then  ; thus, we imply that . By Lemma 8, if , then and is generated by an intersection of itemsets . Hence, by universal generalization, for any itemset , there is a set such that either contains or contains a finite set of itemsets which can generate by taking an intersection of those itemsets. In other words, always exists for every intersection lattice .(ii) is unique in . Assume that is the other generating set of . We show that . First, we prove that . Indeed, take any , by the definition of , for some sets , which implies that . By Lemma 8, if , then .
On the other hand, we have by the definition of . Consequently, we obtain the inclusion
By Lemma 8, we infer that the set of indexes is single and, therefore, ; therefore, , which shows that .
Similarly, we also have . In other words, .

Theorem 10. The set is calculated as follows:

Proof. Let be an itemset in . Assume that and . Then, can be generated by the intersection of some itemsets in , namely,
By Lemma 8, . This contradicts the assumption . Therefore, if , then .

Example 11. Let a transaction database be given as Table 1 and be computed as Table 2. The set can be computed by applying (9), namely, .

Definition 12 (set of maximal elements). An element of is said to be a maximal element, if for all and then . A set of maximal elements of is denoted by MAX().

Lemma 13 (the maximal set in the intersection lattice and generating set). Given an intersection lattice , then .

Proof. Assuming that , then . Let ; then
Assuming that , we have , where . By Definition 12, , ; hence, . Therefore, . In other words, (*).
Conversely, assuming that , then
Assuming that , we have , . By Definition 12, , . Thus, since , we have . Then, we imply that . In other words, (**).
By (*) and (**), we imply that .

Definition 14 (coatom). Each item of is called an atom and each element of the set is called a coatom of . A set of all coatoms of is denoted by .

By Lemma 13 and Definition 14, we can infer the property of as follows.

Lemma 15 (characteristics of coatom in the intersection lattice). For every intersection lattice , one always has

The set of can be calculated by applying Theorem 10 to find and then find the maximal itemsets of the set .
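The maximal itemsets of a family can be found by discarding every member that has a proper superset in the family. The Python sketch below illustrates this on a hypothetical family of frequent itemsets and on a hand-written generating set for it; both inputs and the function name are assumptions for illustration.

```python
from typing import FrozenSet, Iterable, Set


def maximal_itemsets(family: Iterable[Iterable[str]]) -> Set[FrozenSet[str]]:
    """Members of the family that are not properly contained in another member."""
    sets = {frozenset(s) for s in family}
    return {x for x in sets if not any(x < y for y in sets)}


# Hypothetical frequent itemsets and a generating set for them.
frequent = ["", "a", "b", "c", "ab", "ac"]
generators = ["b", "c", "ab", "ac"]

# The maximal elements coincide, as stated in Lemma 13.
print(maximal_itemsets(frequent) == maximal_itemsets(generators))
```

Working on the generating set instead of the full family gives the same maximal itemsets while scanning fewer candidates.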

Lemma 16. For each itemset , forms a lattice and has a generating set, denoted by , including itemsets in .

Proof. By Lemma 2, if and , then . It is obvious that (; ) is a lattice and . Moreover, for item , , every itemset formed by   lacks only one item, so it has a unique containing itemset in and it has no containing itemset in . All remaining subsets in lack more than one item, so they have at least two containing itemsets in . By Definition 7, includes itemsets in .

For example, itemset forms a lattice and .

Lemma 17. If every itemset of   is not hidden, then no itemset in is hidden.

Proof. By Lemma 15, we have for all such that , so (Lemma 2). Since , we have .

In order to hide a sensitive association rule, this study focuses on decreasing support and confidence of the rule by removing an item belonging to its right-hand side. However, the modification of an item always affects some itemsets in . By (2) and (3), when the support of an itemset is reduced by modifying some items, the support and confidence of association rules that contain these items will be changed. This may lead those rules to be hidden. Moreover, when an itemset is hidden, all association rules generated from this itemset are also hidden. If the hidden rules are not sensitive rules, then they are lost rules. The efficient method that allows the reduction of lost rules restricts itemsets in from being hidden.

By Definition 7, each itemset of intersection lattice can be created by an intersection of some itemsets in . Lemma 17 indicates that all itemsets in are still frequent if every itemset in is maintained. The generating set and coatom set therefore need to be protected from the hiding process in order to maintain . It is possible to propose a heuristic that hides sensitive association rules with lower side effects based on and maintenance.

5.2. The Heuristics for Minimizing Side Effects of Association Rule Hiding Algorithm

In this research, we apply method (iii) to hide a sensitive rule by removing an item belonging to its right-hand side from some transactions that support the rule until its support or confidence falls below the corresponding threshold. The impact of the hiding process on the intersection lattice of frequent itemsets depends on the selection of the item and transactions for the data modifications [24]. This study proposes an efficient improvement of the intersection lattice approach [23–25] based on two heuristics for minimizing the side effects of the association rule hiding process. We prove the correctness and efficiency of the heuristic for specifying the victim item that was presented in [23, 24] and propose an improved heuristic for specifying transactions [25]. These heuristics are presented as follows.

Heuristic 1 (specifying victim item for data modifications). For each item , modifying affects support of itemsets in , where . It is obvious that the itemset which has the smallest support in is the easiest to be hidden. This heuristic aims to protect those itemsets in order to restrict the impacts of the hiding process to . Firstly, it identifies itemsets , where and , which are the most vulnerable to the modification of each item in .

Definition 18 (victim candidate). The victim candidate for hiding a sensitive rule , denoted by , is a set of tuples, where each tuple contains four values: , itemset such that , itemset such that and has minimum support in , and . It is computed as follows:
In order to maintain the set and , the modification is required with item in the same tuple with the itemsets that have maximum support among elements of . Such an item is said to be the victim item and is defined as follows.

Definition 19 (victim item). The victim item for hiding the sensitive rule , denoted by , is an item needed to be modified in order to hide the rule such that the modification causes the lowest impacts on , and it is computed as follows:
Function shows that the item needs to be removed from transactions that support the rule . If there are more than two tuples in , then the victim item is selected randomly from those tuples.
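The selection principle of Heuristic 1 is a max-min choice: for each candidate item, find the most vulnerable (lowest-support) itemset that contains it, then pick the item whose most vulnerable itemset has the highest support. The Python sketch below illustrates only this principle. It is a simplification under stated assumptions: the set of itemsets to protect is passed in as `candidate_itemsets` (which itemsets qualify is determined by Definition 18), supports are counts, ties are broken deterministically rather than randomly, and all names are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Optional


def choose_victim_item(rhs: Iterable[str],
                       candidate_itemsets: Iterable[FrozenSet[str]],
                       supports: Dict[FrozenSet[str], int]) -> Optional[str]:
    """Max-min selection of the item whose removal threatens protected itemsets least."""
    candidates = list(candidate_itemsets)
    weakest: Dict[str, FrozenSet[str]] = {}
    for item in sorted(rhs):  # sorted for a deterministic tie-break
        containing = [s for s in candidates if item in s]
        if containing:
            # most vulnerable itemset for this item: the one with minimum support
            weakest[item] = min(containing, key=lambda s: supports[s])
    if not weakest:
        return None  # no protected itemset contains any right-hand-side item
    # pick the item whose most vulnerable itemset is the least vulnerable overall
    return max(weakest, key=lambda item: supports[weakest[item]])


# Hypothetical supports of the itemsets to protect.
toy_supports = {
    frozenset("ab"): 5, frozenset("bd"): 2,
    frozenset("ac"): 4, frozenset("cd"): 3,
}
print(choose_victim_item({"b", "c"}, list(toy_supports), toy_supports))
```

In the toy data, removing b would endanger an itemset with support 2, whereas removing c endangers one with support 3, so c is the safer victim.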

Theorem 20. Equation (14) always returns a victim item for association rule hiding.

Proof. By Lemmas 13 and 15, for every rule , there is an itemset such that . Let , by Lemma 16, . In addition, so that ; therefore, there are itemsets such that  . This indicates that the set can always be specified. Obviously, we can find a tuple where . In other words, the function always returns the victim item .

Theorem 21. Modifying the victim item returned by (14) causes minimal impacts on the intersection lattice of frequent itemsets.

Proof. According to (13), the set contains all items and itemset in which is the most vulnerable to the modification of item . Obviously, modifying an item which is contained in the same tuple with the itemset that has maximum support in produces the lowest impacts on . Consequently, modifying returned by (14) causes minimal impacts on .

Heuristic 2 (specifying transaction for data modifications). Assuming that both nonsensitive association rules and sensitive association rules are supported by transaction , the rule is still strong if and . Let a positive integer be assigned as the number of transactions required to be modified. To maintain the nonsensitive rule , must satisfy the conditions and .
Thus, we have and .
The maximal number of transactions that can be modified without hiding the nonsensitive association rules is
Transaction is safe to the hiding process if no nonsensitive rule supported by is hidden. We formulate the safety degree of transaction , denoted by , as follows:
Accordingly, no nonsensitive rule supported by is hidden if is above zero. In other words, we need to maintain during the hiding process in order to restrict the nonsensitive rules from being hidden. As a result, transaction that has high safety degree should be modified first.
Let be the minimum number of transactions that need to be modified in order to hide the sensitive rule . Then,   can be computed as follows: where is left hand side of .
Let be a set of transactions that supports the rule . Let be a set of nonsensitive association rules supported by transaction , namely, . It is obvious that removing victim item from the transaction that supports the lowest and greatest and causes the lowest impacts on and nonsensitive association rules.
For each transaction , a weight was assigned to measure ability of removing victim item from so as to hide the sensitive rule , but the modification causes the least impact on :
Since transaction does not support any nonsensitive association rule corresponding with ,   will be assigned maximal value, because modifying such transaction does not affect any nonsensitive rule. As a result, modifying the high-weight transaction contributes to restricting the lost rules.
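The two transaction counts used in Heuristic 2 can be illustrated with a direct search instead of a closed-form expression. The sketch below is a minimal illustration, not the weight formula itself. It assumes that supports are transaction counts, that each modification lowers the support of the affected rule by one while leaving the support of its left-hand side unchanged, and that the rule is strong before any modification. The function names are hypothetical.

```python
def max_safe_modifications(rule_support: int, lhs_support: int,
                           min_support: int, min_confidence: float) -> int:
    """Largest k such that a strong rule stays strong after losing k supporting transactions."""
    k = 0
    while (rule_support - (k + 1) >= min_support
           and (rule_support - (k + 1)) / lhs_support >= min_confidence):
        k += 1
    return k


def min_modifications_to_hide(rule_support: int, lhs_support: int,
                              min_support: int, min_confidence: float) -> int:
    """Smallest n such that the rule is no longer strong after losing n supporting transactions."""
    n = 0
    while (rule_support - n >= min_support
           and (rule_support - n) / lhs_support >= min_confidence):
        n += 1
    return n


# Hypothetical rule: support 6, left-hand-side support 8, thresholds 3 and 0.5.
print(max_safe_modifications(6, 8, 3, 0.5))     # modifications a nonsensitive rule tolerates
print(min_modifications_to_hide(6, 8, 3, 0.5))  # modifications needed to hide a sensitive rule
```

For the same rule and thresholds, the second value is always one more than the first, which reflects the tension between hiding sensitive rules and preserving nonsensitive ones that share transactions.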

5.3. The Proposed Algorithm

Based on the heuristics that are presented in Section 5.2, we propose a new algorithm, denoted by AARHIL (algorithm of association rule hiding based on intersection lattice), that includes two steps as follows.

Step 1 (initiation). AARHIL computes and of the intersection lattice of frequent itemsets using Theorem 10 and Lemma 15, respectively.

Step 2 (hiding process). AARHIL executes three sub-steps for each sensitive association rule .

Step 2.1. AARHIL specifies a set of transactions, denoted by , that fully support the sensitive rule . The algorithm computes the weight of each transaction in using (16) and (18). Then, it sorts in descending order of weight.

Step 2.2. AARHIL specifies victim item using (13) and (14).

Step 2.3. The victim item will be changed when support of itemset in the same tuple with less than

Thus, to save the time needed for updating , , , , , the victim item needs to be updated from transactions in , where

Next, AARHIL updates itemsets in , , and .

Since the victim item is removed from transaction , the support of every itemset that is supported by and contains is decreased one unit. The intersection lattice can be updated by removing all itemsets that have support less than from . The generating set of , denoted by , can be updated as follows.

For each itemset such that ,

Then, of is updated by taking the maximal itemsets of .

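AARHIL then recomputes the support and confidence of the sensitive rule. The algorithm repeats this step until the support or the confidence of the rule falls below its threshold.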
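The core of the hiding step, removing the victim item from supporting transactions in descending order of weight until the rule is hidden, can be sketched as a simple greedy loop. The Python code below is a simplified illustration and not Algorithm 1: it assumes count-based support, takes the transaction weights and the victim item as given inputs, requires the victim to belong to the right-hand side, and omits the updates of the lattice, the generating set, and the victim item. All names are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Mapping, Tuple


def hide_rule(database: Iterable[Iterable[str]], lhs: Iterable[str],
              rhs: Iterable[str], victim: str, weights: Mapping[int, float],
              min_support: int, min_confidence: float
              ) -> Tuple[List[FrozenSet[str]], List[int]]:
    """Greedy sanitization; returns the released database and modified indices."""
    lhs_items, rhs_items = frozenset(lhs), frozenset(rhs)
    rule_items = lhs_items | rhs_items
    released = [set(t) for t in database]

    def count(items: FrozenSet[str]) -> int:
        return sum(1 for t in released if items <= t)

    # transactions that fully support the rule, heaviest first
    supporting = [i for i, t in enumerate(released) if rule_items <= t]
    supporting.sort(key=lambda i: weights[i], reverse=True)

    modified: List[int] = []
    for i in supporting:
        rule_support, lhs_support = count(rule_items), count(lhs_items)
        if (rule_support < min_support
                or rule_support / lhs_support < min_confidence):
            break  # the rule is already hidden
        released[i].discard(victim)  # remove the victim item from this transaction
        modified.append(i)
    return [frozenset(t) for t in released], modified


# Hypothetical data: hide a -> b with thresholds 2 (support) and 0.6 (confidence).
toy_db = ["abc", "ab", "abd", "ac", "b"]
toy_weights = {0: 0.2, 1: 0.9, 2: 0.5}
released_db, changed = hide_rule(toy_db, "a", "b", "b", toy_weights, 2, 0.6)
print(changed)
```

In this toy run, a single modification of the highest-weight transaction is enough to push the confidence of a -> b below the threshold, so the remaining supporting transactions are left untouched.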

The details of AARHIL algorithm are presented in Algorithm 1.

Algorithm 1: The AARHIL algorithm.

The correctness of AARHIL was proved by Theorem 20. Moreover, by Theorem 21 and the second heuristic, AARHIL hides a set of sensitive association rules with the lowest lost rules while maintaining a high accuracy. The complexity of AARHIL is computed in Theorem 22.

Theorem 22. Computational complexity of algorithm AARHIL is  where is the number of frequent itemsets, is the largest transaction, is the greatest number of transactions supporting the sensitive rule, and is the size of database (total number of transactions).

6. Experimental Results and Discussion

In order to measure the efficiency of the proposed model, we compared our algorithm with MaxMin2 [9], WSDA [22], the algorithm proposed by Jain et al. [15], denoted by JA (Jain Algorithm), and HCSRIL proposed by Hai et al. [24]. Moustakides and Verykios [9] showed that MaxMin2 is more efficient than the previous border-based approach [8], which had achieved better results than the heuristic Algorithm 2(b) in [13]. The WSDA algorithm applies a heuristic to select the appropriate transactions for modifying an item on the right-hand side of the sensitive rules. The experimental results indicated that WSDA is more efficient than Algorithm 1(b) in [13]. Jain et al. [15] proposed a new algorithm (JA) that improves on the ISL and DSR algorithms [28]. The HCSRIL algorithm applies a heuristic for victim item selection based on intersection lattice theory.

The experiment was run on Windows 7 operating system with a Pentium Core i5 and 4 GB of RAM. Our experiments were executed using the Retail.dat dataset, which was donated by Brijs [29]. This dataset contains the retail market basket data from an anonymous Belgian retail store. It contains 88,162 transactions on 16,469 items. In order to examine the performance of the proposed algorithm compared with the previous works, we started the experiments with 30,000 transactions of dataset on 12,142 corresponding items and then extended the dataset up to the maximum. The configurations of datasets are presented in Table 4.

Table 4: Configuration of datasets and number of association rules satisfying minimum support = 1% and minimum confidence = 10%.

We selected two sensitive association rules for the experiments. The performances of these algorithms are illustrated in the following figures.

Figure 1 shows that the AARHIL algorithm produced the fewest lost rules in every dataset. In other words, AARHIL achieved the best results in minimizing the lost rules compared with the HCSRIL, WSDA, JA, and MaxMin2 algorithms. By applying the support reduction method (i), MaxMin2 produced many lost rules. JA combines methods (ii) and (iii), but it does not apply a heuristic to select victim items and transactions. Thus, it produced more lost rules than WSDA, which applies a heuristic to select transactions for data modification. AARHIL applies two heuristics to select appropriate victim items and transactions for data modification using the combination of methods (i) and (iii). Moreover, AARHIL applies a heuristic to compute the weight of transactions and sorts them before modifying them, so it produced fewer lost rules than HCSRIL.

Figure 1: Lost rules comparison.

Figure 2 indicates that these algorithms produce very few ghost rules. The AARHIL, HCSRIL, WSDA, and JA algorithms did not create any ghost rules, whereas MaxMin2 created more than 0.4 percent ghost rules.

Figure 2: Ghost rules comparison.

None of these algorithms produced false rules when hiding the selected sensitive association rules on any of the datasets.
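The three rule-level side effects discussed above can be computed from the rule sets mined before and after sanitization. The sketch below is a minimal illustration under common definitions, which are assumptions here: a lost rule is a nonsensitive rule of the original database that is missing from the released one, a ghost rule is a rule of the released database that was not in the original, and a false rule is taken to mean a sensitive rule that is still discoverable after hiding. Rules are represented as plain strings for brevity.

```python
from typing import Dict, Set


def side_effects(original_rules: Set[str],
                 released_rules: Set[str],
                 sensitive_rules: Set[str]) -> Dict[str, Set[str]]:
    """Classify rule-level side effects of a hiding process (assumed definitions)."""
    nonsensitive = original_rules - sensitive_rules
    return {
        "lost": nonsensitive - released_rules,       # nonsensitive rules that vanished
        "ghost": released_rules - original_rules,    # rules that newly appeared
        "false": sensitive_rules & released_rules,   # sensitive rules not hidden
    }


if __name__ == "__main__":
    original = {"a->b", "a->c", "b->c", "c->d"}
    sensitive = {"a->b"}
    released = {"a->c", "c->d", "d->a"}
    for name, rules in side_effects(original, released, sensitive).items():
        print(name, sorted(rules))
```

Dividing the size of each set by a suitable total (for example, the number of nonsensitive rules for lost rules) gives percentages of the kind plotted in the figures; the exact denominators are not restated in this section.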

Figure 3 compares the algorithms in terms of the accuracy of the released dataset. With two rules selected for hiding, the accuracy of the released dataset was very high. This means that the hiding process caused only a few changes in the released dataset compared with the original dataset. Moreover, because they modify the same number of data items, the AARHIL and HCSRIL algorithms achieved the same accuracy, and this accuracy is the highest among all algorithms on every dataset.
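One simple way to quantify accuracy is the fraction of item occurrences that survive unchanged in the released dataset. The sketch below uses this definition as an assumption; it is not necessarily the exact measure used for Figure 3, and the function name is illustrative.

```python
from typing import List, Set


def dataset_accuracy(original: List[Set[int]], released: List[Set[int]]) -> float:
    """1 - (changed item occurrences / item occurrences in the original).

    Transactions are compared position by position, so both lists must have
    the same length. This definition of accuracy is an assumption.
    """
    if len(original) != len(released):
        raise ValueError("datasets must contain the same number of transactions")
    total = sum(len(t) for t in original)
    if total == 0:
        return 1.0
    # The symmetric difference counts both removed and inserted items.
    changed = sum(len(o ^ r) for o, r in zip(original, released))
    return 1.0 - changed / total


if __name__ == "__main__":
    before = [{1, 2, 3}, {1, 2}]
    after = [{1, 3}, {1, 2}]
    print(dataset_accuracy(before, after))
```

Under this definition, two algorithms that modify the same number of data items obtain the same accuracy regardless of which transactions they touch, which is consistent with the identical accuracy reported for AARHIL and HCSRIL.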

Figure 3: Accuracy comparison.

The execution times of these algorithms are shown in Figure 4. These algorithms required only 2000 seconds to process 88,162 transactions on 16,469 items, whereas the MaxMin2 algorithm required more time than the others. The difference between the execution times of the HCSRIL and JA algorithms is not significant. By reducing the time to access the database and the time to compute , AARHIL achieved the lowest CPU-Time.

Figure 4: Required execution time.

Table 5 shows the performance of these algorithms in the average case. AARHIL achieved the best results in minimizing side effects. On average, AARHIL produced 4% lost rules, compared with 11% for HCSRIL, 19% for WSDA, 24% for JA, and 32% for MaxMin2. The algorithms attained the same performance on the remaining side effects, except that MaxMin2 produced 0.38 percent ghost rules. Moreover, AARHIL achieved the lowest CPU-Time of all the algorithms.

Table 5: Average side effect and CPU-Time produced by AARHIL, WSDA, and MaxMin2.

In summary, the results show that the AARHIL algorithm outperforms HCSRIL, JA, MaxMin2, and WSDA in minimizing both side effects and computational cost. Hence, this algorithm is suitable for real-world application.

7. Conclusion

This study presented in detail the theory of the intersection lattice of frequent itemsets, denoted by , and proposed an improvement that minimizes the side effects and complexity of the intersection lattice-based approach. To minimize side effects, two heuristics were formulated based on the properties of the generating set of . The first heuristic specifies the victim item for data distortion such that the modification has the least impact on . The improvement is applied in the second heuristic, which assigns a weight to each transaction based on its safety degree, the number of sensitive rules, and the number of nonsensitive association rules contained in that transaction. Removing the victim item from the minimum number of specified transactions that have the highest weight helps to achieve the fewest lost rules and the highest accuracy and to restrict ghost rules. The experimental results showed that the proposed algorithm, AARHIL, achieved minimum side effects and CPU-Time compared with the HCSRIL, MaxMin2, WSDA, and JA algorithms in the context of hiding a specified set of sensitive association rules.

Acknowledgment

The authors wish to acknowledge the support of the Department of Computer Science, Faculty of Science, Khon Kaen University Publication Clinic, Research and Technology Transfer Affairs, Khon Kaen University, for their assistance.

References

  1. J. Xi, Z. Gao, S. Niu, T. Ding, and G. Ning, “A hybrid algorithm of traffic accident data mining on cause analysis,” Mathematical Problems in Engineering, vol. 2013, Article ID 302627, 8 pages, 2013.
  2. M. Dong, “A tutorial on nonlinear time-series data mining in engineering asset health and reliability prediction: concepts, models, and algorithms,” Mathematical Problems in Engineering, vol. 2010, Article ID 175936, 22 pages, 2010.
  3. C. Gokceoglu, H. A. Nefeslioglu, E. Sezer, A. S. Bozkir, and T. Y. Duman, “Assessment of landslide susceptibility by decision trees in the metropolitan area of Istanbul, Turkey,” Mathematical Problems in Engineering, vol. 2010, Article ID 901095, 15 pages, 2010.
  4. Z. R. Adam, An Introduction To Ecommerce, DAI-AGILE, 2000.
  5. D. Y. Zhang, Y. Zeng, L. Wang, H. Li, and Y. Geng, “Modeling and evaluating information leakage caused by inferences in supply chains,” Computers in Industry, vol. 62, no. 3, pp. 351–363, 2011.
  6. K. I. Ronald, Supply Chain Collaboration: How To Implement CPFR and Other Best Collaborative Practices, J. Ross Publishing, Plantation, Fla, USA, 2005.
  7. A. Gkoulalas-Divanis and V. S. Verykios, Association Rule Hiding For Data Mining, LaVergne, Tenn, USA, 2011.
  8. X. Sun and P. S. Yu, “Hiding sensitive frequent itemsets by a border-based approach,” Computing Science and Engineering, vol. 1, no. 1, pp. 74–94, 2007.
  9. G. V. Moustakides and V. S. Verykios, “A MaxMin approach for hiding frequent itemsets,” Data and Knowledge Engineering, vol. 65, no. 1, pp. 75–89, 2008.
  10. S. Menon, S. Sarkar, and S. Mukherjee, “Maximizing accuracy of shared databases when concealing sensitive patterns,” Information Systems Research, vol. 16, no. 3, pp. 256–270, 2005.
  11. A. Gkoulalas-Divanis and V. S. Verykios, “An integer programming approach for frequent itemset hiding,” in Proceedings of the 15th ACM Conference on Information and Knowledge Management (CIKM '06), pp. 748–757, November 2006.
  12. M. Atallah, E. Bertino, A. Elmagarmid, M. Ibrahim, and V. Verykios, “Disclosure limitation of sensitive rules,” in Proceedings of the Workshop on Knowledge and Data Engineering Exchange, 1999.
  13. V. S. Verykios, A. K. Elmagarmid, E. Bertino, Y. Saygin, and E. Dasseni, “Association rule hiding,” IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 4, pp. 434–447, 2004.
  14. S. R. M. Oliveira and O. R. Zaïane, “A unified framework for protecting sensitive association rules in business collaboration,” International Journal of Business Intelligence and Data Mining, vol. 1, no. 3, pp. 247–287, 2006.
  15. Y. K. Jain, V. K. Yadav, and G. S. Panday, “An efficient association rule hiding algorithm for privacy preserving data mining hiding,” International Journal on Computer Science and Engineering, vol. 3, no. 7, pp. 2792–2798, 2011.
  16. E. T. Wang and G. Lee, “An efficient sanitization algorithm for balancing information privacy and knowledge discovery in association patterns mining,” Data and Knowledge Engineering, vol. 65, no. 3, pp. 463–484, 2008.
  17. E. D. Pontikakis, A. A. Tsitsonis, and V. S. Verykios, “An experimental study of distortion-based techniques for association rule hiding,” IFIP International Federation for Information Processing, vol. 144, pp. 325–339, 2004.
  18. P. Gulwani, “Association rule hiding by positions swapping of support and confidence,” Information Technology and Computer Science, vol. 4, pp. 54–61, 2012.
  19. D. Jain, A. Sinhal, N. Gupta, et al., “Hiding sensitive association rules without altering the support of sensitive item(s),” International Journal of Artificial Intelligence & Applications (IJAIA), vol. 3, no. 2, pp. 75–84, 2012.
  20. Y. Saygin, V. S. Verykios, and C. Clifton, “Using unknowns to prevent discovery of association rules,” SIGMOD Record, vol. 30, no. 4, pp. 45–54, 2001.
  21. T.-P. Hong, C.-W. Lin, C.-C. Chang, and S.-L. Wang, “Hiding sensitive itemsets by inserting dummy transactions,” in Proceedings of the IEEE International Conference on Granular Computing (GrC '11), pp. 246–249, November 2011.
  22. V. S. Verykios, E. D. Pontikakis, Y. Theodoridis, and L. Chang, “Efficient algorithms for distortion and blocking techniques in association rule hiding,” Distributed and Parallel Databases, vol. 22, no. 1, pp. 85–104, 2007.
  23. L. Q. Hai and A. Somjit, “A Conceptual framework for privacy preserving of association rule mining in E-commerce,” in Proceedings of the 7th IEEE Conference on Industrial Electronics and Applications (ICIEA '12), pp. 1999–2003, 2012.
  24. L. Q. Hai, A. Somjit, N. X. Huy, and A. Ngamnij, “Association rule hiding in risk management for retail supply chain collaboration,” Computers in Industry, vol. 64, pp. 776–784, 2013.
  25. L. Q. Hai, A. Somjit, and A. Ngamnij, “Association rule hiding based on distance and intersection lattice,” in Proceedings of the 4th International Conference on Computer Technology and Development (ICCTD '12), pp. 227–231, 2012.
  26. C. Zhang and S. Zhang, Association Rule Mining: Models and Algorithms, vol. 2307 of Lecture Notes in Artificial Intelligence, Springer, New York, NY, USA, 2002.
  27. G. Grätzer, Lattice Theory: Foundation, 2010 Mathematics Subject Classification, Springer Basel AG, 2011.
  28. S.-L. Wang, Y.-H. Lee, S. Billis, and A. Jafari, “Hiding sensitive items in privacy preserving association rule mining,” in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC '04), pp. 3239–3244, October 2004.
  29. T. Brijs, “Retail market basket data set,” Workshop on Frequent Itemset Mining Implementations (FIMI '03), 2003, http://fimi.ua.ac.be/data/.