Scientific Programming

Volume 2015, Article ID 910281, 6 pages

http://dx.doi.org/10.1155/2015/910281

## Research of Improved FP-Growth Algorithm in Association Rules Mining

Faculty of Computer and Information Science, Southwest University, Chongqing 400715, China

Received 17 September 2014; Revised 22 January 2015; Accepted 22 January 2015

Academic Editor: Oleg V. Gendelman

Copyright © 2015 Yi Zeng et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Association rules mining is an important technique in data mining. The FP-Growth (frequent-pattern growth) algorithm is a classical algorithm in association rules mining, but it needs to scan the database twice, which reduces its efficiency. Through the study of association rules mining and the FP-Growth algorithm, we worked out two improved algorithms: the Painting-Growth algorithm and the N (not) Painting-Growth algorithm, which removes the painting steps and reaches the same goal in another way. We compared the two improved algorithms with the FP-Growth algorithm. Experimental results show that the Painting-Growth algorithm, for data volumes above 1050, and the N Painting-Growth algorithm, for data volumes below 10000, both perform better than the FP-Growth algorithm.

#### 1. Introduction

Data mining is the process of obtaining potentially useful, previously unknown, and ultimately understandable knowledge from data [1]. Association rules mining is an important part of data mining and is used to find interesting associations or correlations between item sets in massive data [2]. Discovering frequent item sets is a key step in applications of association rules mining [3]. The most famous algorithm for discovering frequent item sets is Apriori, put forward by Agrawal [4]. The Apriori algorithm repeatedly joins item sets and scans the database, removing infrequent item sets, until it has found all the frequent item sets in the data. However, it scans the database repeatedly during mining and produces a large number of candidate item sets, which slows down mining [5].
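The level-wise search described above can be sketched as follows. This is a minimal illustration of the general Apriori idea, not the implementation evaluated in this paper; the function name `apriori` and the parameter `min_count` are assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Level-wise search for frequent item sets (illustrative sketch).

    `transactions` is an iterable of item collections; `min_count` is the
    minimum support count. Returns a dict mapping frozenset -> count.
    """
    transactions = [frozenset(t) for t in transactions]

    # First scan: count the single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = list(frequent)
        # Join step: unite frequent (k-1)-item sets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        # One additional database scan per level to count the candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_count}
        result.update(frequent)
        k += 1
    return result
```

The loop performs one database scan per item set length and materializes a candidate set at each level, which is the cost that the later algorithms try to avoid.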

The FP-Growth (frequent-pattern growth) algorithm, put forward by Jiawei Han et al., improves on the Apriori algorithm [6]. It scans the database twice, compresses the data set into an FP-tree, and produces no candidate item sets during mining, which greatly improves mining efficiency [7]. However, the FP-Growth algorithm needs to build an FP-tree that contains the whole data set, and this FP-tree places high demands on memory space [8]. Scanning the database twice also limits the efficiency of the FP-Growth algorithm.

In this paper, we worked out two improved algorithms: the N Painting-Growth algorithm and the Painting-Growth algorithm. The N Painting-Growth algorithm builds two-item permutation sets to find the association sets of all frequent items and then mines all the frequent item sets from the association sets. The Painting-Growth algorithm builds an association picture based on the two-item permutation sets to find the association sets of all frequent items and then mines all the frequent item sets from the association sets. Both improved algorithms scan the database only once, avoiding the overhead of the two scans in the traditional FP-Growth algorithm, and complete the mining using only the two-item permutation sets. They therefore have the advantages of running faster, taking up less memory, having low complexity, and being easy to maintain. The improved algorithms provide a reference for further research on association rules mining.
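As a rough illustration of the first stage only, the sketch below builds two-item permutation counts in a single pass over the database and derives an association set for each item. It rests on assumptions that this section does not confirm: that a two-item permutation is an ordered pair of distinct items from one transaction, and that the association set of an item contains the items it co-occurs with at least `min_count` times. The function name `association_sets` is hypothetical, and the later step of mining frequent item sets from the association sets is not shown.

```python
from collections import defaultdict
from itertools import permutations


def association_sets(transactions, min_count):
    """Single-scan sketch: ordered pair counts -> per-item association sets.

    Assumption: the association set of item a holds every item b whose
    ordered pair (a, b) occurs in at least `min_count` transactions.
    """
    pair_counts = defaultdict(int)
    for t in transactions:  # the only pass over the database
        for a, b in permutations(sorted(set(t)), 2):
            pair_counts[(a, b)] += 1

    assoc = defaultdict(set)
    for (a, b), c in pair_counts.items():
        if c >= min_count:
            assoc[a].add(b)
    return dict(assoc)


# Hypothetical transaction database, not taken from the paper.
D = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c", "d"}]
print(association_sets(D, min_count=2))
```

Under these assumptions, item `d` never reaches the threshold with any partner, so it appears in no association set.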

#### 2. The System Model of Association Rules Mining

##### 2.1. Frequent Item Sets

Let $I = \{i_1, i_2, \ldots, i_m\}$ be the set of all distinct items in the database. Each transaction $T$ is a subset of $I$, that is, $T \subseteq I$, and the database $D$ is a collection of transactions. For a given transaction database $D$, the total number of transactions it contains is $|D|$. Define the support count of an item set $X$, $\mathrm{count}(X)$, as the number of transactions $T$ in $D$ with $X \subseteq T$, and the support of item set $X$ as $\mathrm{support}(X) = \mathrm{count}(X)/|D|$ [9]. The number of items in an item set is called the dimension or length of the item set; an item set of length $k$ is called a $k$-item set [1].

*Definition 1.* For a given minimum support, minsup, if the item set $X$ meets $\mathrm{support}(X) \geq \mathrm{minsup}$, item set $X$ is called a frequent item set; otherwise, item set $X$ is called an infrequent item set. A set that shows the associations between a frequent item and other items is called a frequent item association set. The minimum support count, minCount, meets $\mathrm{minCount} = \mathrm{minsup} \times |D|$. When $\mathrm{count}(X) \geq \mathrm{minCount}$, one says $\mathrm{support}(X) \geq \mathrm{minsup}$ [9].

*Definition 2.* When the length of the item set $X$ is $k$ and $\mathrm{support}(X) \geq \mathrm{minsup}$, one calls item set $X$ a $k$-item frequent set. If , one can call item set $X$ a multi-item frequent set.

*Property.* All nonempty subsets of a frequent item set must also be frequent.
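The definitions above can be checked on a small example. The sketch below is illustrative only: the function names (`count`, `support`, `is_frequent`, `subsets_all_frequent`) and the sample database are assumptions for the example, not data from this paper.

```python
from itertools import combinations


def count(X, D):
    """Support count: number of transactions T in D with X a subset of T."""
    X = frozenset(X)
    return sum(1 for T in D if X <= frozenset(T))


def support(X, D):
    """Support: the support count divided by the number of transactions."""
    return count(X, D) / len(D)


def is_frequent(X, D, minsup):
    """X is frequent when its support is at least minsup."""
    return support(X, D) >= minsup


def subsets_all_frequent(X, D, minsup):
    """Check the property: every nonempty subset of X is frequent."""
    items = sorted(X)
    return all(
        is_frequent(s, D, minsup)
        for r in range(1, len(items) + 1)
        for s in combinations(items, r)
    )


# Hypothetical transaction database with |D| = 5.
D = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c", "d"}]
X = {"a", "b"}
print(count(X, D), support(X, D), is_frequent(X, D, minsup=0.4))
```

Here `{"a", "b"}` is contained in three of the five transactions, so its support is 3/5, which meets a minimum support of 0.4.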

##### 2.2. FP-Growth Algorithm

The FP-Growth algorithm [10] compresses the database into a frequent-pattern tree (FP-tree) while retaining the association information between item sets. The compressed database is then divided into a set of conditional databases (a special type of projected database), each associated with one frequent item, and each conditional database is mined separately. The transaction database is given in Table 1 (the minimum support count is 2); the mining process using the FP-Growth algorithm is shown in Table 1.
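The two scans and the per-item conditional databases can be sketched as follows. This is a minimal illustration of standard FP-tree construction, not the code used in this paper; the names `FPNode`, `build_fp_tree`, and `conditional_pattern_base` are assumptions, and ties between equally frequent items are broken alphabetically for reproducibility.

```python
from collections import Counter


class FPNode:
    """One node of the FP-tree: an item, its count, and links."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_count):
    # Scan 1: count each item and keep only the frequent ones.
    item_counts = Counter(i for t in transactions for i in set(t))
    frequent = {i: c for i, c in item_counts.items() if c >= min_count}

    root = FPNode(None, None)
    header = {i: [] for i in frequent}  # item -> tree nodes holding it

    # Scan 2: insert each transaction, items sorted by descending count.
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for i in items:
            child = node.children.get(i)
            if child is None:
                child = FPNode(i, node)
                node.children[i] = child
                header[i].append(child)
            child.count += 1
            node = child
    return root, header


def conditional_pattern_base(item, header):
    """Prefix paths that end at `item`, each with that node's count."""
    base = []
    for node in header[item]:
        path, parent = [], node.parent
        while parent is not None and parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        if path:
            base.append((path[::-1], node.count))
    return base
```

Each conditional pattern base is the input for mining one frequent item; a full FP-Growth implementation would build a conditional FP-tree from it and recurse.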