Research Article  Open Access
Yi Zeng, Shiqun Yin, Jiangyue Liu, Miao Zhang, "Research of Improved FPGrowth Algorithm in Association Rules Mining", Scientific Programming, vol. 2015, Article ID 910281, 6 pages, 2015. https://doi.org/10.1155/2015/910281
Research of Improved FPGrowth Algorithm in Association Rules Mining
Abstract
Association rules mining is an important technology in data mining. The FPGrowth (frequent-pattern growth) algorithm is a classical algorithm in association rules mining, but it needs to scan the database twice, which reduces its efficiency. Through the study of association rules mining and the FPGrowth algorithm, we worked out two improved algorithms: the PaintingGrowth algorithm and the N (not) PaintingGrowth algorithm, which removes the painting steps and achieves the same result in another way. We compared the two improved algorithms with the FPGrowth algorithm. Experimental results show that when the data volume is more than 1050 transactions for the PaintingGrowth algorithm and less than 10000 transactions for the N PaintingGrowth algorithm, the performance of the two improved algorithms is better than that of the FPGrowth algorithm.
1. Introduction
Data mining is the process of obtaining potentially useful, previously unknown, and ultimately understandable knowledge from data [1]. Association rules mining is one of the important parts of data mining and is used to find interesting associations or correlations between item sets in massive data [2]. Discovering frequent item sets is a key technology and step in applications of association rules mining [3]. The most famous algorithm for discovering frequent item sets is Apriori, put forward by Agrawal [4]. The Apriori algorithm finds all frequent item sets by repeatedly joining item sets and scanning the database to remove infrequent ones. However, it scans the database repeatedly during mining and produces a large number of candidate item sets, which slows down mining [5].
The FPGrowth (frequent-pattern growth) algorithm is an improvement on the Apriori algorithm put forward by Jiawei Han et al. [6]. It compresses the data set into an FP-tree, scans the database twice, does not produce candidate item sets during mining, and greatly improves mining efficiency [7]. However, the FPGrowth algorithm needs to build an FP-tree that contains the whole data set, and this FP-tree places high demands on memory space [8]. Scanning the database twice also limits the efficiency of the FPGrowth algorithm.
In this paper, we worked out two improved algorithms: the N PaintingGrowth algorithm and the PaintingGrowth algorithm. The N PaintingGrowth algorithm builds two-item permutation sets to find the association sets of all frequent items and then mines all frequent item sets from the association sets. The PaintingGrowth algorithm builds an association picture based on the two-item permutation sets to find the association sets of all frequent items and then mines all frequent item sets from the association sets. Both improved algorithms scan the database only once, which removes the overhead of the second scan in the traditional FPGrowth algorithm, and complete the mining using only the two-item permutation sets. They therefore have the advantages of running faster, taking up little memory, having low complexity, and being easy to maintain. The improved algorithms can serve as a reference for further association rules mining research.
2. The System Model of Association Rules Mining
2.1. Frequent Item Sets
Let $I = \{i_1, i_2, \ldots, i_m\}$ be the collection of all distinct items in the database. Each transaction $T$ is a subset of $I$, that is, $T \subseteq I$, and the database $D$ is a collection of transactions. For a given transaction database $D$, the total number of transactions it contains is $N$. Define the support count of item set $X$, written $\mathrm{count}(X)$, as the number of transactions $T$ in $D$ with $X \subseteq T$, and the support of item set $X$ as $\mathrm{support}(X) = \mathrm{count}(X)/N$ [9]. The number of items in an item set is called the dimension or length of this item set; an item set of length $k$ is called a $k$-item set [1].
Definition 1. For a given minimum support, minsup, if the item set $X$ meets $\mathrm{support}(X) \geq \mathrm{minsup}$, $X$ is called a frequent item set; otherwise, $X$ is called an infrequent item set. A set that records the associations between a frequent item and other items is called a frequent item association set. The minimum support count, minCount, satisfies $\mathrm{minCount} = \mathrm{minsup} \times N$. When $\mathrm{count}(X) \geq \mathrm{minCount}$, one says $\mathrm{support}(X) \geq \mathrm{minsup}$ [9].
Definition 2. When the length of the item set $X$ is $k$ and $\mathrm{support}(X) \geq \mathrm{minsup}$, $X$ is called a $k$-item frequent set. For larger $k$, $X$ can also be called a multi-item frequent set.
Property. All nonempty subsets of a frequent item set must also be frequent.
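As a minimal sketch of the definitions above, the following Java snippet counts support and checks the minimum support condition. The class name `SupportSketch`, its helper methods, and the four toy transactions are hypothetical illustrations; the toy data are not the database of Table 1.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SupportSketch {
    // count(X): number of transactions T in the database that contain every item of X.
    static int supportCount(List<Set<String>> db, Set<String> itemSet) {
        int count = 0;
        for (Set<String> transaction : db) {
            if (transaction.containsAll(itemSet)) {
                count++;
            }
        }
        return count;
    }

    // X is frequent when count(X) >= minCount, where minCount = minsup * N.
    static boolean isFrequent(List<Set<String>> db, Set<String> itemSet, double minsup) {
        double minCount = minsup * db.size();
        return supportCount(db, itemSet) >= minCount;
    }

    static Set<String> items(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    // Hypothetical toy database (an assumption for illustration, not Table 1).
    static List<Set<String>> toyDb() {
        return Arrays.asList(
            items("A", "C", "D"),
            items("B", "C", "E"),
            items("A", "B", "C", "E"),
            items("B", "E"));
    }

    public static void main(String[] args) {
        List<Set<String>> db = toyDb();
        System.out.println(supportCount(db, items("B", "C", "E")));    // prints 2
        System.out.println(isFrequent(db, items("B", "C", "E"), 0.5)); // prints true (minCount = 2)
        System.out.println(isFrequent(db, items("C", "E"), 0.5));      // prints true, a subset of a frequent set
        System.out.println(isFrequent(db, items("A", "D"), 0.5));      // prints false (count is 1)
    }
}
```

With minsup = 0.5 and N = 4, minCount is 2, so (B, C, E) is frequent and, in line with the property above, so is its subset (C, E).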
2.2. FPGrowth Algorithm
The FPGrowth algorithm [10] compresses the database into a frequent pattern tree (FP-tree) while still maintaining the association information between item sets. The compressed database is then divided into a set of conditional databases (a special type of projected database). Each conditional database is associated with one frequent item and is mined separately. The transaction database is given in Table 1 (the minimum support count is 2); the mining process using the FPGrowth algorithm on this database is as follows.

Scanning the database for the first time, we obtain the set of frequent items and their support counts. The frequent items are then sorted in decreasing order of support count, and the resulting list is denoted $L$.
Building the FP-tree. First, the algorithm creates the root node of the tree, with the tag “null.” Then it scans the database for the second time. The items in each transaction are ordered according to the list $L$ of frequent items (sorted by decreasing support count), and a branch is created for each transaction. For example, the first transaction, “001,” contains five items in the order of $L$ and generates the first branch of the FP-tree. The branch has five nodes: the first item is linked as a child of the root, and each following item is linked as a child of the item before it. The second transaction contains three items in the order of $L$ and generates a branch in the same way: its first item links to the root, and each following item links to the item before it. This branch shares a prefix with the existing path of transaction “001.” The algorithm therefore increases the count of the shared node by 1 and creates two new nodes linked below it. In general, when a branch is added for a transaction, the count of each node along the common prefix increases by 1, and new nodes are created and linked for the items that follow the prefix.
For convenience of tree traversal, the algorithm creates an item header table, in which each item points, through a node link, to its occurrences in the FP-tree. After scanning all transactions, we get the FP-tree displayed in Figure 1.
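The two scans and the prefix sharing described above can be sketched in Java as follows. This is an illustrative sketch, not the implementation used in the experiments: the class `FpTreeSketch`, the tie-breaking rule (alphabetical order for items with equal support), and the toy transactions are assumptions, and each transaction is assumed to contain no duplicate items.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FpTreeSketch {
    static class Node {
        final String item;   // null for the root
        final Node parent;
        int count = 0;
        final Map<String, Node> children = new LinkedHashMap<>();
        Node(String item, Node parent) { this.item = item; this.parent = parent; }
    }

    final Node root = new Node(null, null);                  // root node tagged "null"
    final Map<String, List<Node>> header = new HashMap<>();  // item header table with node links

    static FpTreeSketch build(List<List<String>> db, int minCount) {
        // First scan: support count of every single item.
        final Map<String, Integer> counts = new HashMap<>();
        for (List<String> transaction : db) {
            for (String item : transaction) {
                counts.merge(item, 1, Integer::sum);
            }
        }
        FpTreeSketch tree = new FpTreeSketch();
        // Second scan: keep frequent items, order them as in L, insert one branch per transaction.
        for (List<String> transaction : db) {
            List<String> ordered = new ArrayList<>();
            for (String item : transaction) {
                if (counts.get(item) >= minCount) {
                    ordered.add(item);
                }
            }
            ordered.sort((a, b) -> {
                int bySupport = Integer.compare(counts.get(b), counts.get(a));
                return bySupport != 0 ? bySupport : a.compareTo(b); // assumed tie-break
            });
            Node current = tree.root;
            for (String item : ordered) {
                Node child = current.children.get(item);
                if (child == null) {  // no shared prefix from here on: create a new node
                    child = new Node(item, current);
                    current.children.put(item, child);
                    tree.header.computeIfAbsent(item, k -> new ArrayList<>()).add(child);
                }
                child.count++;        // nodes on a shared prefix just gain 1
                current = child;
            }
        }
        return tree;
    }

    // Hypothetical toy database (an assumption for illustration, not Table 1).
    static List<List<String>> toyDb() {
        return Arrays.asList(
            Arrays.asList("A", "B", "C"),
            Arrays.asList("A", "B"),
            Arrays.asList("A", "D"),
            Arrays.asList("B", "C"));
    }

    public static void main(String[] args) {
        FpTreeSketch tree = build(toyDb(), 2);
        Node a = tree.root.children.get("A");
        System.out.println(a.count);                     // prints 3
        System.out.println(a.children.get("B").count);   // prints 2
        System.out.println(tree.header.get("C").size()); // prints 2 (C occurs on two branches)
    }
}
```

In the toy run, item D is dropped because its support count is 1, and the first three transactions share the prefix A, so only the fourth transaction starts a new branch under the root.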
Mining the FP-tree. The algorithm starts from each frequent pattern of length 1 (the initial suffix pattern) and builds its conditional pattern base (a “subdatabase” consisting of the set of prefix paths that appear together with the suffix pattern). It then builds a (conditional) FP-tree for the conditional pattern base and mines that tree recursively. Pattern growth is achieved by concatenating the suffix pattern with the frequent patterns generated from the conditional FP-tree. The mining of the FP-tree is summarized in Table 2.

2.3. System Model
Frequent pattern mining algorithms have been applied in many fields, and studying their system model helps in understanding them. Figure 2 shows the system model of the improved algorithms in this paper.
Through the data mining platform, the user obtains the knowledge produced by data mining. The platform includes data definition, a mining designer, and a pattern filter. Data definition preprocesses the data and makes incomplete data usable; the mining designer applies the improved algorithms to mine the data and obtain useful patterns (here, frequent item sets); the pattern filter selects the interesting patterns from those obtained.
3. Improved Algorithms Based on the FPGrowth Algorithm
The FPGrowth algorithm requires scanning the database twice, which limits its efficiency. This paper puts forward two improved algorithms, the PaintingGrowth algorithm and the N PaintingGrowth algorithm, which mine using two-item permutation sets. Both algorithms scan the database only once to obtain the mining results.
3.1. PaintingGrowth Algorithm
Taking the transaction database in Table 1 as an example, the mining process with PaintingGrowth algorithm is as follows.
(1) The algorithm scans the database once, obtains the two-item permutation sets of all transactions, and paints the peak set (the peak set is the set of all distinct items in the transaction database). Take the first transaction in Table 1 as an example: its two-item permutation sets are formed from pairs of its items.
Other transactions are handled in the same way as the first transaction. After the database has been scanned, the peak set contains every distinct item that occurs in it.
(2) After obtaining the peak set and the two-item permutation sets of all transactions, the algorithm paints the association picture from them. It links the two items that appear in each two-item permutation; when the same permutation appears again, the count of that link increases by 1. The association picture is shown in Figure 3.
(3) According to the association picture, the algorithm uses the support count to remove infrequent associations, which gives the frequent item association set of each item.
Here we take item A as an example. Its frequent item association set shows that the support count of the two-item set (A C) is 2 and the support count of the two-item set (A D) is 2. Other items are handled in the same way as item A.
(4) According to the frequent item association sets, we can get all two-item frequent sets of this transaction database.
(5) According to the frequent item association sets, we can get a three-item frequent set {(A,C,D):2}.
According to the frequent item association sets, we can also get a three-item frequent set {(B,C,E):2}.
Similarly, according to the frequent item association sets, we get a three-item frequent set {(C,D,E):2}.
(6) At this point, we have all frequent item sets.
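Steps (1) to (3) can be pictured without any drawing code as a weighted graph stored in nested maps. The sketch below is one possible reading under stated assumptions: the class `AssociationGraphSketch` and the toy transactions are hypothetical, and each pair of items is stored once, under its alphabetically smaller item. The source does not state this storage convention.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

class AssociationGraphSketch {
    // Single scan: every pair of items in a transaction adds 1 to the count of its link.
    static Map<String, Map<String, Integer>> buildAssociations(List<List<String>> db) {
        Map<String, Map<String, Integer>> graph = new TreeMap<>();
        for (List<String> transaction : db) {
            // Sort and deduplicate so that each pair is counted once per transaction.
            List<String> items = new ArrayList<>(new TreeSet<>(transaction));
            for (int i = 0; i < items.size(); i++) {
                for (int j = i + 1; j < items.size(); j++) {
                    graph.computeIfAbsent(items.get(i), k -> new TreeMap<>())
                         .merge(items.get(j), 1, Integer::sum);
                }
            }
        }
        return graph;
    }

    // Remove links whose count is below the minimum support count.
    static void prune(Map<String, Map<String, Integer>> graph, int minCount) {
        for (Map<String, Integer> partners : graph.values()) {
            partners.values().removeIf(count -> count < minCount);
        }
        graph.values().removeIf(Map::isEmpty);
    }

    // Hypothetical toy database (an assumption for illustration, not Table 1).
    static List<List<String>> toyDb() {
        return Arrays.asList(
            Arrays.asList("A", "C", "D"),
            Arrays.asList("B", "C", "E"),
            Arrays.asList("A", "B", "C", "E"),
            Arrays.asList("B", "E"));
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> graph = buildAssociations(toyDb());
        prune(graph, 2);
        System.out.println(graph); // prints {A={C=2}, B={C=2, E=3}, C={E=2}}
    }
}
```

Each remaining entry plays the role of a frequent item association set: for example, the entry for B records that (B C) has support count 2 and (B E) has support count 3 in the toy data.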
The algorithm pseudocode is as follows.
Algorithm 3 (PaintingGrowth).
Input. Transaction database, minimum support count: 2
Output. All frequent item sets.

    HashMap hm0;                          // define a HashMap set hm0
    List list, list0;                     // define the List sets list, list0
    List permutation();                   // scan the transaction database, execute two-item arranging on each transaction, return list
    paint(Graphics g)                     // painting method
    String s = null, x = null;            // define String s, x
    String z, y;
    HashMap hm = null;                    // define a HashMap set hm
    for (int i = 0; i < list.size(); i++)
    {
        s = list.get(i).split(",");       // split list.get(i) into its items
        drawLine(..);                     // draw a line between the two items of s
        HashMap count(drawLine);          // count the drawn lines and return the item associations to hm
    }
    Iterator it = hm.keySet.iterator;     // define the key set iterator of hm
    z = it.next;                          // assign the key in the key set of hm to z
    Iterator it0 = hm.get(z).keySet.iterator;   // define the iterator over the key sets in the value sets of hm
    y = it0.next;                         // assign the key in the key sets of the value sets of hm to y
    if (hm.get(z).get(y) < minsup*N)      // if the value in the value sets of hm is less than the minimum support count
    { it0.remove; }                       // remove the infrequent item sets
    List(hm.get(z).keySet());             // combine the key sets in the value sets based on key z of hm, return list0
    for (int j = 0; j < list0.size; j++)
    {
        x = list0.get(j).split(",");
        if (count(hm.contain(z + "," + list0.get(j)) == 1 + x.length))   // if the count of item sets in hm equals the length of the item sets (first consider whether the key of hm is in the item sets or not)
        { hm0.put(z + "," + list0.get(j), value); }   // save the item sets and their support count in hm0
    }
    return hm0;                           // gain all frequent item sets
    super.paintComponents(g);             // execute the painting method
3.2. N PaintingGrowth Algorithm
The idea of the N PaintingGrowth algorithm is similar to that of the PaintingGrowth algorithm, but the implementation differs: the N PaintingGrowth algorithm removes the painting steps. The mining process of N PaintingGrowth is as follows.
(1) The algorithm scans the database once and gets the two-item permutation sets of all transactions.
(2) The algorithm counts each permutation in the two-item permutation sets, which gives all item association sets.
(3) The algorithm removes infrequent associations according to the support count and gets the frequent item association sets.
(4) Finally, it gets all frequent item sets according to the frequent item association sets, and mining ends.
As the steps above show, the N PaintingGrowth algorithm is PaintingGrowth with the painting steps removed. The implementations differ: the PaintingGrowth algorithm imports java.awt and javax.swing and performs the mining through a call to super.paintComponents(g), whereas the N PaintingGrowth algorithm only instantiates a class in the main function.
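To make step (4) concrete, the sketch below grows multi-item candidates from the frequent item association sets and keeps those that reach the minimum support count. It is a sketch under stated assumptions, not the authors' code: the class `NPaintingGrowthSketch` and the toy data are hypothetical. The source does not spell out how the support count of a multi-item set is derived, so this sketch keeps the transactions from the single scan in memory and confirms each candidate against them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class NPaintingGrowthSketch {
    // Steps (1)-(3): count every item pair in one pass, then drop infrequent pairs.
    static Map<String, Map<String, Integer>> frequentAssociations(List<Set<String>> db, int minCount) {
        Map<String, Map<String, Integer>> assoc = new TreeMap<>();
        for (Set<String> t : db) {
            List<String> items = new ArrayList<>(new TreeSet<>(t));
            for (int i = 0; i < items.size(); i++) {
                for (int j = i + 1; j < items.size(); j++) {
                    assoc.computeIfAbsent(items.get(i), k -> new TreeMap<>())
                         .merge(items.get(j), 1, Integer::sum);
                }
            }
        }
        for (Map<String, Integer> partners : assoc.values()) {
            partners.values().removeIf(c -> c < minCount);
        }
        return assoc;
    }

    // Assumption: candidate support is confirmed against the retained transactions.
    static int count(List<Set<String>> db, Collection<String> itemSet) {
        int n = 0;
        for (Set<String> t : db) {
            if (t.containsAll(itemSet)) n++;
        }
        return n;
    }

    // Step (4): extend a prefix only with items that are frequent partners of every item in it.
    static void grow(List<String> prefix, List<String> candidates,
                     Map<String, Map<String, Integer>> assoc, List<Set<String>> db,
                     int minCount, Map<List<String>, Integer> out) {
        for (int i = 0; i < candidates.size(); i++) {
            String item = candidates.get(i);
            List<String> next = new ArrayList<>(prefix);
            next.add(item);
            int support = count(db, next);
            if (support < minCount) continue; // supersets cannot be frequent either
            out.put(next, support);
            Set<String> partners =
                assoc.getOrDefault(item, Collections.<String, Integer>emptyMap()).keySet();
            List<String> rest = new ArrayList<>();
            for (String later : candidates.subList(i + 1, candidates.size())) {
                if (partners.contains(later)) rest.add(later);
            }
            grow(next, rest, assoc, db, minCount, out);
        }
    }

    static Map<List<String>, Integer> mine(List<Set<String>> db, int minCount) {
        Map<String, Map<String, Integer>> assoc = frequentAssociations(db, minCount);
        Map<List<String>, Integer> out = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, Integer>> e : assoc.entrySet()) {
            grow(Collections.singletonList(e.getKey()),
                 new ArrayList<>(e.getValue().keySet()), assoc, db, minCount, out);
        }
        return out;
    }

    // Hypothetical toy database (an assumption for illustration, not Table 1).
    static List<Set<String>> toyDb() {
        return Arrays.asList(
            new HashSet<>(Arrays.asList("A", "C", "D")),
            new HashSet<>(Arrays.asList("B", "C", "E")),
            new HashSet<>(Arrays.asList("A", "B", "C", "E")),
            new HashSet<>(Arrays.asList("B", "E")));
    }

    public static void main(String[] args) {
        // prints {[A, C]=2, [B, C]=2, [B, C, E]=2, [B, E]=3, [C, E]=2}
        System.out.println(mine(toyDb(), 2));
    }
}
```

The sketch reports item sets of length two or more. The confirmation step matters because frequent pairs alone do not fix the count of a larger set: three items can be pairwise frequent without ever appearing together in one transaction.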
4. Experimental Results Analysis
The biggest advantage of the improved algorithms, PaintingGrowth and N PaintingGrowth, is that they reduce database scanning to a single pass. Compared with the two scans of the FPGrowth algorithm, this improves time efficiency.
Another advantage is that the improved algorithms are simple: all mining is completed using only the two-item permutation sets of the transactions. The FPGrowth algorithm also completes mining from a single structure, the FP-tree, but the FP-tree is complex to build and requires a large memory overhead. By comparison, the two-item permutation sets are easy to obtain.
Of course, the improved algorithms also have disadvantages. The PaintingGrowth algorithm needs to build the association picture, which leads to a large memory overhead. The implementation of the N PaintingGrowth algorithm is less intuitive than that of the PaintingGrowth algorithm. In addition, when mining multi-item frequent sets, both improved algorithms scan the frequent item association sets repeatedly to obtain counts, which reduces time efficiency.
To verify the advantage of the two improved algorithms over the FPGrowth algorithm, we implemented the PaintingGrowth, N PaintingGrowth, and FPGrowth algorithms in Java, using the Eclipse development environment on a 64-bit Windows 7 operating system. The experimental data come from Data Tang, a research sharing platform. The databases contain 1050, 5250, 10500, 21000, 31500, 42000, and 52500 transactions, respectively.
In the experiments, the three algorithms receive the same original input data and the same support parameter. Each algorithm is run 20 times for each database size, and the mean execution time is taken as the result.
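A measurement of this kind can be sketched as a small timing helper. The class `TimingSketch`, the method `meanMillis`, and the placeholder workload are hypothetical; the source does not describe its timing code, and this sketch makes no attempt at JVM warm-up or other benchmarking safeguards.

```java
class TimingSketch {
    // Mean wall-clock time in milliseconds over the given number of runs of the task.
    static double meanMillis(Runnable task, int runs) {
        long totalNanos = 0L;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / (runs * 1_000_000.0);
    }

    public static void main(String[] args) {
        // Placeholder workload (assumption); an experiment would call a mining algorithm here.
        Runnable workload = () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            if (sum < 0) {
                System.out.println(sum); // never true; keeps the loop from being trivially dead
            }
        };
        System.out.printf("mean over 20 runs: %.3f ms%n", meanMillis(workload, 20));
    }
}
```

Passing the same `Runnable` wrapper around each of the three algorithms, with the same input and support parameter, would give directly comparable means.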
Figure 4 compares the execution time of the PaintingGrowth, N PaintingGrowth, and FPGrowth algorithms for different numbers of transactions. On the one hand, starting from 1050 transactions, the execution time of the N PaintingGrowth algorithm is less than that of the FPGrowth algorithm; at 31500 transactions, the execution times of the two algorithms are very close. Beyond that point, the time efficiency of N PaintingGrowth is not as good as that of FPGrowth.
On the other hand, at 1050 transactions, the execution time of the PaintingGrowth algorithm is slightly more than that of the FPGrowth algorithm. As the number of transactions increases, however, its execution time becomes significantly less than that of the FPGrowth algorithm. The comparison of execution time against the number of transactions thus shows that the PaintingGrowth algorithm is more stable and efficient than the FPGrowth algorithm.
In addition, the PaintingGrowth and N PaintingGrowth algorithms are implemented differently, and their performance differs as well. Although the N PaintingGrowth algorithm omits the painting steps, its execution time is slightly less than that of the PaintingGrowth algorithm only from about 1050 to 10500 transactions. As the number of transactions grows further, the performance of the PaintingGrowth algorithm is far better than that of the N PaintingGrowth algorithm. This suggests that the implementation of N PaintingGrowth has a large memory consumption, which makes its execution time grow faster.
Figure 5 compares the increase rate of execution time across transaction stages for the PaintingGrowth, N PaintingGrowth, and FPGrowth algorithms. There are seven transaction stages: stage 1, 0–1050 transactions; stage 2, 1050–5250 transactions; stage 3, 5250–10500 transactions; stage 4, 10500–21000 transactions; stage 5, 21000–31500 transactions; stage 6, 31500–42000 transactions; and stage 7, 42000–52500 transactions.
From Figure 5, firstly, the increase rate of execution time of the PaintingGrowth algorithm is high at the initial stage 1. From stage 2 to stage 7, however, the increase rate fluctuates only gently, which indicates stable performance. From stage 2 to stage 6, the increase rate of the PaintingGrowth algorithm is lower than that of the FPGrowth algorithm, which indicates superior performance.
Secondly, in the first three stages, the increase rate of execution time of the N PaintingGrowth algorithm is lower than that of the FPGrowth algorithm, so it performs well. In the later stages, however, its increase rate is almost always higher than those of the FPGrowth and PaintingGrowth algorithms. This explains why the execution time of N PaintingGrowth rises rapidly.
Finally, although the overall trend of the increase rate of the FPGrowth algorithm is similar to that of the improved algorithms, it changes more sharply than theirs in stage 2 and stage 5. The FPGrowth algorithm is therefore less stable than the improved algorithms.
From the above, it can be concluded that the PaintingGrowth algorithm clearly outperforms the FPGrowth algorithm in execution time as the data volume grows. When the data size is suitable, the improved algorithms can be adopted to achieve better performance. Specifically, when there are fewer than 10000 transactions, the N PaintingGrowth algorithm can be considered; in other cases, the PaintingGrowth algorithm performs better and can be adopted.
5. Conclusions
In this paper, we put forward two improved algorithms, the PaintingGrowth algorithm and the N PaintingGrowth algorithm. Both algorithms obtain all frequent item sets using only the two-item permutation sets of the transactions; they are simple in principle, easy to implement, and scan the database only once. For a suitable number of transactions, the improved algorithms are therefore worth considering. However, they also have problems: on large data, the performance of N PaintingGrowth is disappointing. Our future work will consider how to make the performance of the improved algorithms more stable, how to remove infrequent item associations efficiently, and how to mine multi-item frequent sets quickly.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work is supported by the Fundamental Research Funds for the Central Universities (XDJK2009C027) and Science & Technology Project (2013001287).
References
[1] P. Yang and Z. Song, “An improvement to FPgrowth algorithm,” Journal of Anhui Institute of Mechanical & Electrical Engineering: Natural Science, vol. 17, no. 3, pp. 8–13, 2005.
[2] D. Fengyi and L. Zhenyu, “An ameliorating FPgrowth algorithm based on patternsmatrix,” Journal of Xiamen University (Natural Science), vol. 44, no. 5, pp. 629–633, 2005.
[3] Y. Yang and Y. Luo, “Improved algorithm based on FPGrowth,” Computer Engineering and Design, no. 7, pp. 1506–1509, 2010.
[4] Q. Ruan, Y. Li, and X. Liu, “A hash table and linear based improved FPTree algorithm,” Journal of Yangtze University (Natural Science Edition): Science & Engineering, vol. 1, pp. 76–79, 2010.
[5] X. Luo and J. Chen, “An improvement algorithm for FPgrowth,” Journal of Xi'an University of Science and Technology, vol. 29, no. 4, pp. 491–494, 2009.
[6] L. Zhichun and Y. Fengxin, “An improved frequent pattern tree growth algorithm,” Applied Science and Technology, vol. 35, no. 6, pp. 47–51, 2008.
[7] C. Jun and G. Li, “An improved FPgrowth algorithm based on item head table node,” Information Technology, vol. 12, pp. 34–35, 2013.
[8] B. Zheng and J. Li, “An improved algorithm based on FPgrowth,” Journal of Pingdingshan Institute of Technology, vol. 17, no. 4, pp. 9–12, 2008.
[9] N. Xinzheng and S. Kun, “Mining maximal frequent item sets with improved algorithm of FPMAX,” Computer Science, vol. 40, no. 12, pp. 223–228, 2013.
[10] J. Han and M. Kamber, Data Mining: Concepts and Techniques, China Machine Press, Beijing, China, 2001, translated by: F. Ming, M. Xiaofeng.
Copyright
Copyright © 2015 Yi Zeng et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.