Abstract

The Apriori algorithm, as a typical frequent itemsets mining method, can help researchers and practitioners discover implicit associations in large amounts of data. In this work, a fast Apriori algorithm for processing large datasets, called ECTPPI-Apriori, is proposed, which is based on an evolution-communication tissue-like P system with promoters and inhibitors. The structure of the ECTPPI-Apriori algorithm is tissue-like, and its evolution rules are object rewriting rules. The time complexity of ECTPPI-Apriori is substantially improved over that of conventional Apriori algorithms. The results give some hints on improving conventional algorithms by using membrane computing models.

1. Introduction

Frequent itemsets mining, as a subfield of data mining, aims at discovering itemsets that occur with high frequency in huge amounts of data. Interesting implicit associations between items can then be extracted from these data, which helps researchers and practitioners make informed decisions. One famous example is “beer and diapers” [1]. Through frequent itemsets mining, supermarket managers discovered a significant correlation between purchases of beer and diapers, two items that ostensibly have nothing to do with each other. Consequently, they placed diapers next to beer, and through this layout adjustment, sales of both beer and diapers increased.

The Apriori algorithm is a typical frequent itemsets mining algorithm, which is suitable for discovering frequent itemsets in transactional databases [2]. To process large datasets, many parallel improvements have been made to increase the computational efficiency of the Apriori algorithm [3–6]. How to implement the Apriori algorithm in parallel to improve its computational efficiency is still an ongoing research topic. Given the limits of silicon-based computing technology and theory, new non-silicon-based computing devices, P systems, are used in this study.

P systems are new bioinspired computing models of membrane computing, which focus on abstracting computing ideas from the study of biological cells, particularly of cellular membranes [7, 8]. This study uses an evolution-communication tissue-like P system with promoters and inhibitors (ECPI tissue-like P system) for computation. P systems are powerful distributed and parallel bioinspired computing devices, able to do what Turing machines can do [9–11], and they have been applied to many fields. The applications of P systems are based on two types of membrane algorithms: the coupled membrane algorithm and the direct membrane algorithm. The coupled membrane algorithm combines a traditional algorithm with some structural characteristics of P systems, such as dividing the whole system into several relatively independent computing units, where the computing units can communicate with each other, the computing units can be dynamically rebuilt, and rules can be executed in parallel [12–16]. The direct membrane algorithm designs the algorithm directly on the structure, the objects, and the rules of P systems [17–21]. The final goal of membrane computing is to build biocomputers, and the direct membrane algorithm can be transplanted to biocomputers directly, which makes it more meaningful from this perspective. However, the direct membrane algorithm needs to transform the whole traditional algorithm into a P system, which is complex and difficult. To date, the few studies on direct membrane algorithms focus on arithmetic operations, logic operations, the generation of graphic languages, and clustering [17–21].

In this study, a novel improved Apriori algorithm based on an ECPI tissue-like P system (ECTPPI-Apriori) is proposed, using the parallel mechanism of P systems. The information communication between different computing units in ECTPPI-Apriori is implemented through the exchange of materials between membranes. Specifically, all itemsets are searched in parallel, regulated by a set of promoters and inhibitors. For a database with $m$ fields, $m + 2$ cells are used in the algorithm, where one cell is used to enter the data in the database into the system, $m$ cells are used to detect the frequent itemsets, and one specific cell, called the output cell, is used to store the results. The time complexity of ECTPPI-Apriori is compared with those of other parallel Apriori algorithms to show that the proposed algorithm is time saving.

The contributions of this study are twofold. From the viewpoint of data mining, new bioinspired techniques are introduced into frequent itemsets mining to improve the efficiency of the algorithms. P systems are natural distributed parallel computing devices that can improve time efficiency in computation. Besides hardware and software implementations, P systems can be implemented by biological methods, and the computing resources needed are only several cells, which decreases the computing resource requirements. From the viewpoint of P systems, the application areas of these new bioinspired devices are extended to frequent itemsets mining. Applications based on direct membrane algorithms are still limited; this study provides a new application of P systems in frequent itemsets mining, which expands the application areas of the direct membrane algorithms.

The paper is organized as follows. Section 2 introduces some preliminaries about the Apriori algorithm and about the ECPI tissue-like P systems. The ECTPPI-Apriori algorithm using the parallel mechanism of the ECPI tissue-like P system is developed in Section 3. In Section 4, one illustrative example is used to show how the proposed algorithm works. Computational experiments using two datasets to show the performance of the proposed algorithm in frequent itemsets mining are reported in Section 5. Conclusions are given in Section 6.

2. Preliminaries

In this section, some basic concepts and notions of the Apriori algorithm [2] and of ECPI tissue-like P systems [7] are introduced.

2.1. The Apriori Algorithm

The Apriori algorithm is a typical frequent itemsets mining algorithm proposed by Agrawal and Srikant [2], which aims at discovering relationships between items in transactional databases.

Definitions
(i) Item: a field in a transactional database is called an item. If a record contains a certain item, a “1” is placed in the corresponding field of that record in the transactional database; otherwise, a “0” is placed there.
(ii) Itemset: a set of items is called an itemset. For notational convenience, an itemset containing the items $I_1$, $I_2$, and $I_3$ is represented by $\{I_1, I_2, I_3\}$.
(iii) $h$-itemset: an itemset containing $h$ items is called an $h$-itemset.
(iv) Transaction: a record in a transactional database is called a transaction, and each transaction is a nonempty itemset.
(v) Support count: the number of transactions containing a certain itemset is called the support count of that itemset. The support count is also called the frequency or count of the itemset.
(vi) Frequent itemset: if the support count of an itemset is equal to or larger than a given minimum support count threshold $s$, the itemset is called a frequent itemset.

The general procedure of Apriori from Han et al. [1] is as follows.

Input. The database $D$ containing the transactions and the support count threshold $s$.

Step 1. Scan the database to compute the support count of each item, and obtain the frequent 1-itemsets $L_1$. Let $k = 2$.

Step 2. Obtain the candidate frequent $k$-itemsets $C_k$ by joining two frequent $(k-1)$-itemsets that differ in only one item.

Step 3. Prune from $C_k$ those itemsets that have an infrequent subset of length $(k-1)$.

Step 4. Scan the database to compute the support count of each candidate frequent $k$-itemset. Delete those itemsets that do not meet the support count threshold $s$ and obtain the frequent $k$-itemsets $L_k$.

Step 5. Let $k = k + 1$. Repeat Steps 2 to 4 until no itemset meets the support count threshold $s$.

Output. The collection of all frequent itemsets $L = \bigcup_k L_k$.
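
To make these steps concrete, the following is a minimal Python sketch of the conventional Apriori procedure; the function and variable names are illustrative and not prescribed by the paper, and each transaction is assumed to be given as a set of items.

from itertools import combinations

def apriori(transactions, s):
    """Return all frequent itemsets of `transactions` (a list of sets of
    items) for the support count threshold `s`."""
    # Step 1: frequent 1-itemsets L1.
    items = {i for t in transactions for i in t}
    L = [{frozenset([i]) for i in items
          if sum(i in t for t in transactions) >= s}]
    k = 2
    while L[-1]:
        # Step 2: join frequent (k-1)-itemsets that differ in one item.
        candidates = {a | b for a in L[-1] for b in L[-1] if len(a | b) == k}
        # Step 3: prune candidates that have an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in L[-1]
                             for sub in combinations(c, k - 1))}
        # Step 4: scan the database and keep candidates meeting the threshold.
        Lk = {c for c in candidates
              if sum(c <= t for t in transactions) >= s}
        # Step 5: repeat until no itemset meets the threshold.
        L.append(Lk)
        k += 1
    return set().union(*L)

For example, apriori([{"beer", "diapers"}, {"beer"}, {"beer", "diapers"}], 2) returns the frequent itemsets {beer}, {diapers}, and {beer, diapers}.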

2.2. Evolution-Communication Tissue-Like P Systems with Promoters and Inhibitors

Membrane computing is a new branch of natural computing, which abstracts computing ideas from the structure and the functions of cells or tissues. In nature, each organelle membrane or cell membrane works as a relatively independent computing unit. The amounts and the types of materials in each organelle or cell change through chemical reactions. Materials can flow between different organelle or cell membranes to transport information. Reactions in different organelles or cells take place in parallel, and reactions within the same organelle or cell also take place in parallel. These biological processes are abstracted as the computing processes of membrane computing. This internal parallelism makes membrane computing a powerful computing method, which has been proven to be equivalent to Turing machines [7–11].

The ECPI tissue-like P system, composed of a network of cells linked by synapses (channels), is a typical membrane computing model. The cells divide the whole system into separate regions, each cell forming one region. Each cell has two main components: multisets of objects (materials) and rules, also called evolution rules (chemical reactions). Objects, as information carriers, are represented by characters.

Rules regulate the ways objects evolve into new objects and the ways objects in different cells communicate through synapses. Rules are executed in a nondeterministic, flat maximally parallel manner in each cell. That is, at any step, if more than one rule can be executed but the objects in the cell can only support some of them, a maximal number of rules is executed, and each rule is executed at most once [22].

The computation halts if no rule can be executed anywhere in the system. The computational results are represented by the types and numbers of specified objects in a specified cell. Because objects in a P system evolve in a flat maximally parallel manner, regulated by promoters and inhibitors, such systems compute very efficiently [10, 22]. Păun [7] provided more details about P systems.

A formal description of the ECPI tissue-like P system is as follows.

An ECPI tissue-like P system of degree $q \ge 1$ is of the form
\[
\Pi = (O, \sigma_1, \sigma_2, \ldots, \sigma_q, syn, \rho, i_{out}),
\tag{1}
\]
where (1) $O$ represents the alphabet including all objects of the system; (2) $syn \subseteq \{1, 2, \ldots, q\} \times \{1, 2, \ldots, q\}$ represents all synapses between the cells; (3) $\rho$ defines the partial ordering relationship of the rules, that is, rules with higher order are executed with higher priority; (4) $i_{out}$ represents the subscript of the output cell where the computation results are placed; (5) $\sigma_1, \sigma_2, \ldots, \sigma_q$ represent the cells. Each cell is of the form
\[
\sigma_i = (w_i, R_i), \quad 1 \le i \le q.
\tag{2}
\]

In (2), $w_i$ represents the initial multiset of objects in cell $i$; $w_i = \lambda$ means that there is no object in cell $i$. If $a$ represents an object, $a^c$ represents $c$ copies of that object. $R_i$ in (2) represents the set of rules in cell $i$, written in the form $[x \rightarrow y\,(z, go)]_{\alpha}$, where $x$ is the multiset of objects consumed by the rule, the subscript $\alpha$ is the promoter or the inhibitor of the rule (written, e.g., as $a$ for a promoter and $\neg a$ for an inhibitor), and $y$ and $z$ are the multisets of objects generated by the rule. A rule can be executed only when all objects of its promoter appear and cannot be executed when any object of its inhibitor appears. The multiset $y$ of objects stays in the current cell, and the multiset $z$ of objects goes to the cells that have synapses connected from the current cell. The $j$th subset of rules in cell $i$ having similar functions is represented by $r_{i,j}$, and the rules in the same subset are written together.

3. The ECTPPI-Apriori Algorithm

In this section, the structure of the P system used in the ECTPPI-Apriori algorithm is presented first, the computational processes in different cells are then discussed in detail, a pseudocode summarizing the operations is presented, and an analysis of the algorithm complexity is provided.

3.1. Algorithm and Rules

Assume a transactional database contains $n$ records and $m$ fields. An object is generated only if the $i$th transaction contains the $j$th item (i.e., there is a 1 in the corresponding field of the transactional database). In this way, the database is transformed into objects, a form that the P system can recognize. The support count threshold is set to $s$. A cell structure with $m + 2$ cells, labeled $0, 1, \ldots, m + 1$, as shown in Figure 1, is used as the framework for ECTPPI-Apriori. The evolution rules are not shown in this figure due to their length. Transactional databases are usually sparse. Therefore, the number of objects to be processed in this algorithm is much smaller than $n \times m$.
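
As an illustration of this encoding (not an implementation prescribed by the paper), the following Python sketch turns a 0/1 transactional table into the multiset of objects entered into cell 0: one presence object per 1-entry plus $s$ copies of a threshold marker; all names are hypothetical.

def encode_database(table, s):
    """table[i][j] is 1 if the i-th transaction contains the j-th item.
    Returns the objects placed into cell 0: one (i, j) presence object per
    1-entry and s copies of a threshold marker."""
    presence = [(i, j) for i, row in enumerate(table)
                for j, v in enumerate(row) if v == 1]
    return presence + ["THRESHOLD"] * s

Because the table is sparse, the list of presence objects is far shorter than $n \times m$.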

When computation begins, the objects encoded from the transactional database and the objects representing the support count threshold are entered into cell 0. These objects are passed to cells 1 to $m$ in parallel, using the parallel evolution mechanism of tissue-like P systems. The auxiliary objects are generated in cell 1. Next, the frequent 1-itemsets are produced, and objects representing the frequent 1-itemsets are generated in cell 1 by executing the evolution rules in parallel. The objects representing the frequent 1-itemsets are passed to cells 2 and $m + 1$. Cell $m + 1$ is used to store the computational results. The frequent 2-itemsets and the objects representing them are produced in cell 2 by executing the evolution rules in parallel. The objects representing the frequent 2-itemsets are passed to cells 3 and $m + 1$. This process continues until all frequent itemsets have been produced. Compared with the conventional Apriori algorithm, the computational time needed by ECTPPI-Apriori to generate the candidate frequent $k$-itemsets and to compute the support count of each candidate frequent $k$-itemset can be substantially reduced.

The ECPI tissue-like P system for ECTPPI-Apriori has degree $m + 2$, with cells labeled $0, 1, \ldots, m + 1$, where (1) the alphabet $O$ contains the objects encoded from the database, the objects representing the support count threshold, and the auxiliary and itemset objects used by the rules; (2) the synapses are $syn = \{(0, k) \mid 1 \le k \le m\} \cup \{(k, k + 1) \mid 1 \le k \le m - 1\} \cup \{(k, m + 1) \mid 1 \le k \le m\}$; (3) $\rho$ gives the execution priorities of the rules within each cell; (4) the output cell is cell $m + 1$; (5) cell 0 contains the input rule $r_{0,1}$, cell 1 contains the rule subsets $r_{1,1}$, $r_{1,2}$, and $r_{1,3}$, each cell $k$ ($2 \le k \le m$) contains the rule subsets $r_{k,1}$, $r_{k,2}$, $r_{k,3}$, and $r_{k,4}$, and the output cell only stores the results. The initial objects and the functions of these rule subsets follow the descriptions given below and in Section 3.2.

Two kinds of auxiliary objects are used in cell 1 to detect the frequent 1-itemsets. The first kind stores the items of the candidate frequent 1-itemsets through its subscripts; such an object records, for example, that the itemset consisting of the $j$th item is a candidate frequent 1-itemset. The second kind, the counter objects, is used to identify the frequent 1-itemsets: the $s$ copies of the counter object initially placed in cell 1 for the $j$th item indicate that this item needs to appear in at least $s$ records for the corresponding itemset to be a frequent 1-itemset. One counter object is removed from cell 1, and one occurrence object is generated, each time one more record is found to contain the $j$th item. Therefore, if no counter object is left and $s$ occurrence objects have been generated in cell 1, at least $s$ records have been found to contain the $j$th item. The auxiliary objects in cells $2, 3, \ldots, m$ play the same roles for the candidate frequent $2$-, $3$-, $\ldots$, $m$-itemsets. Finally, the objects that store the items of the frequent 1-itemsets through their subscripts record the results; such an object indicates, for example, that the itemset consisting of the first item is a frequent 1-itemset. The corresponding objects in cells $2, 3, \ldots, m$ play the same roles for the frequent $2$-, $3$-, $\ldots$, $m$-itemsets.

The evolution rules are object rewriting rules similar to chemical reactions. They take objects, transform them into other objects, and may transport them to other cells.
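
One way to picture such a rule in code is sketched below; representing a cell as a multiset (Counter) and the promoters and inhibitors as plain object sets is an assumption made only for illustration.

from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Rule:
    consume: Counter                               # objects removed from the cell
    stay: Counter                                  # objects produced and kept in the cell
    send: Counter                                  # objects produced and sent along synapses
    promoters: set = field(default_factory=set)    # all of these must be present
    inhibitors: set = field(default_factory=set)   # none of these may be present

def applicable(rule, cell):
    """A rule can fire if the cell holds the consumed objects and the promoters
    and holds none of the inhibitors."""
    return (all(cell[o] >= c for o, c in rule.consume.items())
            and all(cell[o] > 0 for o in rule.promoters)
            and not any(cell[o] > 0 for o in rule.inhibitors))

def apply_rule(rule, cell, neighbour):
    """Fire the rule once: rewrite objects in `cell` and move the `send` part
    to a neighbouring cell connected by a synapse."""
    cell.subtract(rule.consume)
    cell.update(rule.stay)
    neighbour.update(rule.send)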

3.2. Computing Process

Input. Cell 0 is the input cell. The objects encoded from the transactional database and the objects representing the support count threshold are entered into cell 0 to activate the computation process. Rule $r_{0,1}$ is executed to put copies of all of these objects to cells 1 to $m$.

Frequent 1-Itemsets Generation. Frequent 1-itemsets are generated in cell 1. Rule $r_{1,1}$ is executed to generate the candidate and counter objects for all $m$ items. Rule $r_{1,2}$ is executed to detect all frequent 1-itemsets using the internal flat maximally parallel mechanism of the P system. Rule $r_{1,3}$ cannot be executed yet because the objects it requires are not in cell 1 at this time. The detection process of the candidate frequent 1-itemset consisting of the first item is taken as an example; the detection processes of the other candidate frequent 1-itemsets are performed in the same way. Rule $r_{1,2}$ is actually composed of multiple subrules working on objects with different subscripts. If the object indicating that the $i$th record contains the first item is in cell 1, the corresponding subrule meets the execution condition and can be executed. If that object is not in cell 1, which means the $i$th record does not contain the first item, the subrule does not meet the execution condition and cannot be executed. Initially, $s$ copies of the counter object for the first item are in cell 1, indicating that the first item needs to appear in at least $s$ records for the itemset to be a frequent 1-itemset. Each execution of a subrule consumes one copy of this counter object. Therefore, at most $s$ such subrules can be executed in the nondeterministic flat maximally parallel manner. The checking process continues until all database objects have been checked or all $s$ copies of the counter object have been consumed. If all $s$ copies have been consumed, the first item appears in at least $s$ records and the itemset is a frequent 1-itemset. If some copies of the counter object are still in this cell after all database objects have been checked, the itemset is not a frequent 1-itemset.

Rule $r_{1,3}$ is then executed to process the results obtained by rule $r_{1,2}$. The 1-itemset consisting of the first item is again taken as an example. If fewer than $s$ copies of the counter object have been consumed by rule $r_{1,2}$, so that some copies are still in this cell, the corresponding subrule of $r_{1,3}$ is executed to delete the remaining counter and occurrence objects. If all $s$ copies have been consumed, another subrule of $r_{1,3}$ is executed to put an object to cells 2 and $m + 1$ to indicate that this itemset is a frequent 1-itemset and to activate the computation in cell 2. If no 1-itemset is frequent, the computation halts.
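
The combined effect of rules $r_{1,2}$ and $r_{1,3}$ can be emulated sequentially as follows; this sketch reproduces only the counting mechanism (one counter object is consumed per record containing the item), with illustrative names, not the flat maximally parallel rule semantics.

def frequent_1_itemsets(presence, m, s):
    """presence is the list of (record, item) objects in cell 1; item j is
    frequent if all of its s counter copies are consumed, i.e. at least s
    records contain it."""
    counters = {j: s for j in range(m)}    # s counter copies per item (rule r_{1,1})
    for _, j in presence:                  # each matching record consumes one counter (rule r_{1,2})
        if counters[j] > 0:
            counters[j] -= 1
    # rule r_{1,3}: items whose counters are exhausted form the frequent 1-itemsets
    return [frozenset([j]) for j in range(m) if counters[j] == 0]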

Frequent 2-Itemsets Generation. The frequent 2-itemsets are generated in cell 2. Rule $r_{2,1}$ is executed to obtain all candidate frequent 2-itemsets using the internal flat maximally parallel mechanism of the P system. The pair of empty parentheses in this rule indicates that no objects are consumed when it is executed. The detection process of the candidate frequent 2-itemset formed by the first and the second items is taken as an example; the detection processes of the other candidate frequent 2-itemsets are performed in the same way. Rule $r_{2,1}$ is actually composed of multiple subrules working on objects with different subscripts. If the objects indicating that the first item and the second item each form a frequent 1-itemset are both in cell 2, the corresponding subrule is executed to generate the candidate object for this pair. The presence of this object means that the 2-itemset consisting of the first and the second items is a candidate frequent 2-itemset.

Rule $r_{2,2}$ is executed to delete the redundant objects that were used by rule $r_{2,1}$ but are not needed anymore.

Rule $r_{2,3}$ is executed to detect all frequent 2-itemsets using the internal flat maximally parallel mechanism of the P system. Rule $r_{2,4}$ cannot be executed yet because the objects it requires are not in cell 2 at this time. The detection process of the frequent 2-itemset consisting of the first and the second items is taken as an example. Rule $r_{2,3}$ is actually composed of multiple subrules working on objects with different subscripts. If the objects indicating that the $i$th record contains the first item and the second item are both in cell 2, the corresponding subrule meets the execution condition and can be executed. If these objects are not both in cell 2, which means the $i$th record does not contain both the first and the second items, the subrule does not meet the execution condition and cannot be executed. Initially, $s$ copies of the counter object for this candidate are in cell 2, indicating that the first and the second items need to appear together in at least $s$ records for the itemset to be a frequent 2-itemset. Each execution of these subrules consumes one copy of the counter object. Therefore, at most $s$ such subrules can be executed in the nondeterministic flat maximally parallel manner. The checking process continues until all database objects have been checked or all $s$ copies of the counter object have been consumed. If all $s$ copies have been consumed, the first and the second items appear together in at least $s$ records and the itemset is a frequent 2-itemset. If some copies of the counter object are still in this cell after rule $r_{2,3}$ is executed, the itemset is not a frequent 2-itemset.

Rule $r_{2,4}$ is executed to process the results obtained by rule $r_{2,3}$. The 2-itemset consisting of the first and the second items is again taken as an example. If fewer than $s$ copies of the counter object have been consumed by rule $r_{2,3}$, so that some copies are still in this cell, the corresponding subrule of $r_{2,4}$ is executed to delete the remaining counter and occurrence objects. If all $s$ copies have been consumed, another subrule of $r_{2,4}$ is executed to put an object to cells 3 and $m + 1$ to indicate that this itemset is a frequent 2-itemset and to activate the computation in cell 3. If no 2-itemset is frequent, the computation halts.
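
Analogously, a sequential emulation of what rules $r_{2,1}$, $r_{2,3}$, and $r_{2,4}$ achieve together in cell 2 is sketched below (illustrative names; inside the P system these rules fire in the flat maximally parallel manner).

from itertools import combinations

def frequent_2_itemsets(presence, frequent_1, s):
    """presence: (record, item) objects copied into cell 2;
    frequent_1: frequent 1-itemsets received from cell 1."""
    # rule r_{2,1}: join every pair of frequent 1-itemsets into a candidate 2-itemset
    candidates = [a | b for a, b in combinations(list(frequent_1), 2)]
    # regroup the presence objects by record
    records = {}
    for i, j in presence:
        records.setdefault(i, set()).add(j)
    # rules r_{2,3}/r_{2,4}: a candidate is frequent if at least s records contain both items
    return [c for c in candidates
            if sum(c <= items for items in records.values()) >= s]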

Each cell $k$, for $3 \le k \le m$, has 4 rule subsets, $r_{k,1}$ to $r_{k,4}$, which are similar to those in cell 2. Each such cell performs the same functions as cell 2 does, but for the frequent $k$-itemsets.

After the computation halts, all the results, that is, the objects representing the identified frequent itemsets, are stored in cell $m + 1$.

3.3. Algorithm Flow

The conventional Apriori algorithm executes sequentially. ECTPPI-Apriori uses the parallel mechanism of the ECPI tissue-like P system to execute in parallel. Pseudocode for ECTPPI-Apriori is shown in Algorithm 1.

Input:
 The objects encoded from the transactional database and $s$ objects representing the support count
 threshold.
 Rule $r_{0,1}$:
 Copy all database objects and threshold objects to cells 1 to $m$.
Method:
 Rule $r_{1,1}$:
 Generate the auxiliary objects for every item to form the candidate frequent 1-itemsets $C_1$.
 Rule $r_{1,2}$:
 Scan each database object in cell 1 to count the frequency of each item. If a record contains the item,
 consume one of the item's counter objects and generate one occurrence object. Continue until all $s$
 counter objects of the item have been consumed or all database objects have been scanned.
 Rule $r_{1,3}$:
 If all $s$ counter objects of an item have been consumed, generate an object to add the item to $L_1$ as a
 frequent 1-itemset and pass this object to cells 2 and $m + 1$. Delete all remaining counter objects and
 all occurrence objects.
For ($k = 2$ to $m$ and $L_{k-1} \neq \emptyset$) do the following in cell $k$:
 Rule $r_{k,1}$:
 Scan the objects representing the frequent $(k-1)$-itemsets $L_{k-1}$ to generate the objects
 representing the candidate frequent $k$-itemsets $C_k$.
 Rule $r_{k,2}$:
 Delete the objects representing $L_{k-1}$ after they have been used by rule $r_{k,1}$.
 Rule $r_{k,3}$:
 Scan the objects representing the database to count the frequency of each candidate frequent $k$-itemset
 in $C_k$. If a record contains all items of a candidate, consume one of the candidate's counter objects
 and generate one occurrence object. Continue until all $s$ counter objects of the candidate have been
 consumed or all database objects have been scanned.
 Rule $r_{k,4}$:
 If all $s$ counter objects of a candidate have been consumed, generate an object to add the candidate to
 $L_k$ as a frequent $k$-itemset and put this object in cells $k + 1$ and $m + 1$. Delete all remaining
 counter objects and all occurrence objects.
 Let $k = k + 1$.
Output:
 The collection of all frequent itemsets $L$, encoded by the objects stored in cell $m + 1$.
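
Algorithm 1 can be emulated end to end by a sequential loop over the cells, as in the sketch below; it mirrors only the flow of objects between cells (cell 0, cells 1 to $m$, output cell $m + 1$), not the parallel rule semantics, and all names are illustrative.

def ectppi_apriori_emulation(table, s):
    """Sequentially emulate the flow of Algorithm 1 on a 0/1 table: cell 0
    encodes the data, cell k finds the frequent k-itemsets, and the output
    cell collects every frequent itemset that is produced."""
    m = len(table[0])
    records = [frozenset(j for j in range(m) if row[j] == 1) for row in table]  # cell 0
    output_cell = []                                                            # cell m + 1
    # cell 1: frequent 1-itemsets (rules r_{1,1}-r_{1,3})
    level = [frozenset([j]) for j in range(m)
             if sum(j in r for r in records) >= s]
    output_cell.extend(level)
    # cells 2 to m: rules r_{k,1}-r_{k,4}
    for k in range(2, m + 1):
        if not level:
            break                                       # the computation halts
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        level = [c for c in candidates
                 if sum(c <= r for r in records) >= s]  # counter consumption per record
        output_cell.extend(level)
    return output_cell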
3.4. Time Complexity

The time complexity of ECTPPI-Apriori in the worst case is analyzed. Initially, 1 computational step is needed to put copies of the database objects and the threshold objects to cells 1 to $m$.

Generating the frequent 1-itemsets needs 3 computational steps. Generating the candidate frequent 1-itemsets needs 1 computational step. Finding the support counts of the candidate frequent 1-itemsets needs 1 computational step, because all candidate frequent 1-itemsets in the database are checked in the flat maximally parallel manner. Passing the resulting frequent 1-itemsets to cells 2 and $m + 1$ needs 1 computational step.

Generating the frequent $k$-itemsets ($2 \le k \le m$) needs 4 computational steps. Generating the candidate frequent $k$-itemsets needs 1 computational step. Cleaning the memory used by the objects representing the frequent $(k-1)$-itemsets needs 1 computational step. Finding the support counts of the candidate frequent $k$-itemsets needs 1 computational step, because all candidate frequent $k$-itemsets in the database are checked in the flat maximally parallel manner. Passing the resulting frequent $k$-itemsets to cells $k + 1$ and $m + 1$ needs 1 computational step.
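
Putting these counts together, and assuming in the worst case that itemsets up to the maximum possible size $m$ have to be examined, the total number of computational steps is

\[
T_{\text{worst}} = \underbrace{1}_{\text{input, cell } 0} + \underbrace{3}_{\text{cell } 1} + \underbrace{4(m-1)}_{\text{cells } 2, \ldots, m} = 4m .
\]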

Therefore, the time complexity of ECTPPI-Apriori is $O(m)$, determined by the at most $4m$ computational steps counted above. Note that the $m$ inside $O(\cdot)$ counts itemset levels, which in the worst case equals the number of fields, and should not be confused with the number of records or objects used earlier when ECTPPI-Apriori is described.

Some comparison results between ECTPPI-Apriori and the original as well as some other improved parallel Apriori algorithms are shown in Table 1, where $|C_k|$ is the number of candidate frequent $k$-itemsets and $|L_k|$ is the number of frequent $k$-itemsets.

4. An Illustrative Example

An illustrative example is presented in this section to demonstrate how ECTPPI-Apriori works. Table 2 shows the transactional database of one branch office of AllElectronics [1]. There are 9 transactions and 5 fields in this database; that is, $n = 9$ and $m = 5$. Suppose the support count threshold is $s = 2$. The computational processes are as follows.

Input. The database is transformed into 23 objects, one for each field of Table 2 that contains a 1, a form that the P system can recognize. These objects and the 2 objects representing the support count threshold are entered into cell 0 to activate the computation process. Rule $r_{0,1}$ is executed to put copies of all of these objects to cells 1 to 5.

Frequent 1-Itemsets Generation. Within cell 1, the auxiliary objects , for , are created by rule to indicate that each item needs to appear in at least records for it to be a frequent 1-itemset. Rule is executed to detect all frequent 1-itemsets in flat maximally parallel. The detection process of the candidate frequent 1-itemset is taken as an example. Objects , , , , , and are in cell 1 which means the first, the fourth, the fifth, the seventh, the eighth, and the ninth records contain . The subrules , , , , , and meet the execution condition and can be executed. Objects , , and are not in cell 1, which means the second, the third, and the sixth records do not contain . The subrules , , and do not meet the execution condition and cannot be executed. Initially, 2 copies of are in cell 1 indicating that the first item needs to appear in at least 2 records to make the itemset a frequent 1-itemset. Each execution of a subrule consumes one . Therefore, 2 of subrules among , , , , , and can be executed in nondeterministic flat maximally parallel. Through the execution of 2 such subrules, both of the 2 copies of are consumed and 2 copies of are generated. The detection processes of other candidate frequent 1-itemsets are performed in the same way. After the detection processes, , , , , and are consumed, and , , , , and are generated.

Rule is then executed to process the results obtained by rule . The 1-itemset is again taken as an example. All of the 2 copies of have been consumed, subrule is executed to put an object to cells 2 and 6 to indicate that the itemset is a frequent 1-itemset and to activate the computation in cell 2. Subrules , , , and are also executed to put objects , , , and to cells 2 and 6 to indicate that the itemsets , , , and are frequent 1-itemsets and to activate the computation in cell 2.

Frequent 2-Itemsets Generation. Within cell 2, rule is executed to obtain all candidate frequent 2-itemsets. The detection process of the candidate frequent 2-itemset is taken as an example. Objects and are in cell 2, which means itemsets and are frequent 1-itemsets. Subrule is executed to generate . The presence of means the 2-itemset is a candidate frequent 2-itemset, and both and need to appear together in at least 2 records for the itemset to be a frequent 2-itemset. The detection processes of the other candidate frequent 2-itemsets are performed in the same way. After the detection processes, objects , , , , , , , , , and are generated.

Rule is executed to delete the objects , , , , and that are not needed anymore.

Rule is executed to detect all frequent 2-itemsets. Objects and , and , and , and and are in cell 2, which means the first, the fourth, the eighth, and the ninth records contain both and . The subrules , , , and meet the execution condition and can be executed. Objects and , and , and , and , or and are not both in cell 2, which means the second, the third, the fifth, the sixth, and the seventh records do not contain both and . The subrules , , , , and do not meet the execution condition and cannot be executed. Initially, 2 copies of are in cell 2 indicating that both and need to appear together in at least 2 records for the itemset to be a frequent 2-itemset. Each execution of these subrules consumes one . Therefore, 2 subrules among , , , and can be executed. After the execution, 2 copies of are consumed and 2 copies of are generated. The detection processes of other candidate frequent 2-itemsets are performed in the same way. After the detection processes, objects , , , , , , , and are consumed, and objects , , , , , , , and are generated.

Rule is executed to process the results obtained by rule . The 2-itemset is again taken as an example. All of the 2 copies of have been consumed by rule , subrule is executed to put an object to cells 3 and 6 to indicate that the itemset is a frequent 2-itemset and to activate the computation in cell 3. Subrules , , , , and are also executed, which put objects , , , , and to cells 3 and 6 to indicate that the itemsets , , , , and are frequent 2-itemsets and to activate the computation in cell 3.

The 4 rules in each cell for are executed in ways similar to those in cell 2. The rules in these cells detect the frequent -itemsets for . After the computation halts, the objects , , , , , , , , , , , , and are stored in cell 6, which means , , and are all frequent itemsets in this database.

The changes of objects during the computation processes are listed in Tables 3–6.
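
The example can be checked against the conventional Apriori sketch from Section 2.1, assuming that Table 2 is the standard AllElectronics transaction table from Han et al. [1]; this assumption, and the item names I1–I5, are made only for this illustration. With $s = 2$ the sketch returns 13 frequent itemsets, matching the number of itemset objects stored in cell 6 above.

all_electronics = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]

frequent = apriori(all_electronics, s=2)   # apriori() as sketched in Section 2.1
assert len(frequent) == 13                 # 5 one-, 6 two-, and 2 three-itemsets
assert frozenset({"I1", "I2", "I3"}) in frequent
assert frozenset({"I1", "I2", "I5"}) in frequent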

5. Computational Experiments

Two databases from the UCI Machine Learning Repository [23] are used to conduct computational experiments. Computational results on these two databases are reported in this section.

5.1. Results on the Congressional Voting Records Database

The Congressional Voting Records database [23] is used to test the performance of ECTPPI-Apriori. This database contains 435 records and 17 attributes (fields). The first attribute is the party of the voter, and the 2nd to the 17th attributes are sixteen characteristics of each voter identified by the Congressional Quarterly Almanac. The first attribute has two values, Democrat or Republican, and each of the 2nd to the 17th attributes has three values: yea, nay, and unknown disposition. The frequent itemsets of these attribute values need to be identified; that is, the problem is to find the attribute values that always appear together.

Initially, the database is preprocessed. Each attribute value is taken as a new attribute. In this way, each new attribute has only two values: yes or no. After preprocessing, each record in the database has 50 attributes. ECTPPI-Apriori can then be used to discover the frequent itemsets. In this experiment, an itemset is called a frequent itemset if it appears in more than 40% of all records; that is, the support count threshold is set to 40% of the 435 records. The frequent itemsets obtained by ECTPPI-Apriori are listed in Table 7.
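
The preprocessing described here, turning every attribute value into a yes/no attribute, can be sketched as follows; the file name and the value coding ('y', 'n', '?') are assumptions about the UCI distribution of this dataset, not part of the paper.

def records_as_itemsets(path="house-votes-84.data"):
    """Each line of the raw file: party, then 16 votes coded 'y', 'n', or '?'.
    Every attribute value becomes one binary item such as 'vote3=y', so each
    record is represented by the set of items it contains."""
    transactions = []
    with open(path) as f:
        for line in f:
            values = line.strip().split(",")
            items = {"party=" + values[0]}
            items |= {"vote%d=%s" % (i, v) for i, v in enumerate(values[1:], start=1)}
            transactions.append(items)
    return transactions

# Frequent itemsets at the 40% support level, reusing the Section 2.1 sketch:
# frequent = apriori(records_as_itemsets(), s=int(0.4 * 435))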

5.2. Results on the Mushroom Database

The Mushroom database [23] is also used to test ECTPPI-Apriori. This database contains 8124 records, numbered sequentially from 1 to 8124. Each record represents one mushroom and has 23 attributes (fields). The first attribute is the poisonousness of the mushroom, and the 2nd to the 23rd attributes are 22 characteristics of the mushrooms. Each attribute has 2 to 12 values. The frequent itemsets of these attribute values need to be found; that is, the problem is to find the attribute values that always appear together.

Initially, the database is preprocessed. Each attribute value is taken as a new attribute. In this way, each new attribute has only two values: yes or no. After preprocessing, each record has 118 attributes. ECTPPI-Apriori can then be used to discover the frequent itemsets. In this experiment, an itemset is a frequent itemset if it appears in more than 40 percent of all records; that is, the support count threshold is set to 40 percent of the 8124 records. The frequent itemsets obtained by ECTPPI-Apriori are listed in Table 8.

6. Conclusions

An improved Apriori algorithm, called ECTPPI-Apriori, is proposed for frequent itemsets mining. The algorithm uses the parallel mechanism of the ECPI tissue-like P system. The time complexity of ECTPPI-Apriori is substantially improved compared with that of other parallel Apriori algorithms. Experimental results, using the Congressional Voting Records database and the Mushroom database, show that ECTPPI-Apriori performs well in frequent itemsets mining. The results give some hints on improving conventional algorithms by using the parallel mechanism of membrane computing models.

For further research, it is of interest to use other interesting neural-like membrane computing models, such as spiking neural P systems (SN P systems) [8], to improve the Apriori algorithm. SN P systems are inspired by the mechanism of neurons that communicate by transmitting spikes. The cells in SN P systems are neurons that have only one type of object, called spikes. Zhang et al. [24, 25], Song et al. [26], and Zeng et al. [27] provided good examples. Also, some other data mining algorithms, such as spectral clustering, support vector machines, and genetic algorithms [1], can be improved by using parallel evolution mechanisms and graph membrane structures.

Competing Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This project is supported by the National Natural Science Foundation of China (nos. 61472231, 61502283, 61640201, 61602282, and ZR2016AQ21).