Mathematical Problems in Engineering
Volume 2014 (2014), Article ID 867149, 9 pages
Research Article

A Genetic Algorithm Based Multilevel Association Rules Mining for Big Datasets

1 School of Computer Science and Engineering, University of Electronic Science and Technology of China, China
2 Institute of Computing Technology, Chinese Academy of Sciences, China

Received 1 July 2014; Revised 5 August 2014; Accepted 14 August 2014; Published 26 August 2014

Academic Editor: Shifei Ding

Copyright © 2014 Yang Xu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Multilevel association rule mining is an important technique for discovering interesting relations between data elements at multiple levels of abstraction. Most existing algorithms for this problem are based on exhaustive search methods such as Apriori and FP-growth. However, when applied to big data applications, these methods suffer from extreme computational cost in searching for association rules. To speed up the search for multilevel association rules and avoid excessive computation, we propose a novel genetic-based method with three key innovations. First, we use a category tree to describe multilevel application datasets as domain knowledge. Second, we put forward a special tree encoding schema based on the category tree to build a heuristic multilevel association mining algorithm. Third, we propose a genetic algorithm based on the tree encoding schema that greatly reduces the association rule search space. The method is especially useful for mining multilevel association rules in big data applications. We test the proposed method on several big datasets, and the experimental results demonstrate its effectiveness and efficiency in processing big data. Moreover, our results also show that the algorithm converges quickly with a limited termination threshold.
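To make the idea concrete, the following is a minimal illustrative sketch of how a category tree can drive a genetic search for multilevel rules. It is not the encoding or algorithm of this paper, whose details are not given in the abstract. The toy tree, the transactions, the fitness function (support times confidence), and all names (`PARENT`, `path`, `neighbours`, `evolve`) are assumptions chosen for illustration only.

```python
import random

# Hypothetical category tree: child -> parent (None marks the root).
PARENT = {
    "food": None,
    "dairy": "food", "bakery": "food",
    "milk": "dairy", "cheese": "dairy",
    "bread": "bakery", "bagel": "bakery",
}
NODES = [n for n in PARENT if PARENT[n] is not None]  # every node except the root

def path(item):
    """Root-to-item path, e.g. ['food', 'dairy', 'milk']."""
    nodes = []
    while item is not None:
        nodes.append(item)
        item = PARENT[item]
    return nodes[::-1]

def generalize(transaction):
    """Extend a transaction with every ancestor of its items."""
    return {node for item in transaction for node in path(item)}

# Toy transactions (assumed data), generalized to all abstraction levels.
TRANSACTIONS = [generalize(t) for t in (
    {"milk", "bread"}, {"cheese", "bread"}, {"milk", "bagel"}, {"bagel"},
)]

def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in TRANSACTIONS) / len(TRANSACTIONS)

def confidence(antecedent, consequent):
    base = support({antecedent})
    return support({antecedent, consequent}) / base if base else 0.0

def fitness(rule):
    """Assumed fitness: support * confidence; ancestor/descendant rules score 0."""
    a, c = rule
    if a in path(c) or c in path(a):
        return 0.0
    return support({a, c}) * confidence(a, c)

def neighbours(node):
    """Children of node, plus its parent when that parent is not the root."""
    near = [child for child, parent in PARENT.items() if parent == node]
    if PARENT[PARENT[node]] is not None:
        near.append(PARENT[node])
    return near

def mutate(rule, rng):
    """Move one side of the rule up or down the tree, or occasionally jump."""
    side = rng.randrange(2)
    pool = neighbours(rule[side]) if rng.random() < 0.7 else NODES
    changed = list(rule)
    changed[side] = rng.choice(pool)
    return tuple(changed)

def evolve(generations=30, size=8, seed=0):
    """Elitist loop: keep the better half, refill with mutated copies."""
    rng = random.Random(seed)
    population = [(rng.choice(NODES), rng.choice(NODES)) for _ in range(size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        elite = population[: size // 2]
        population = elite + [mutate(rng.choice(elite), rng) for _ in elite]
    return max(population, key=fitness)

best = evolve()
print(best, fitness(best))
```

In this toy data the higher-level rule dairy -> bakery has support 0.75, while the leaf-level rule milk -> bread has support 0.25, which illustrates why searching across abstraction levels matters. The search evaluates only the candidate rules in its population rather than enumerating every itemset, which is the general motivation for replacing exhaustive search with a heuristic one.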