Abstract

Bacterial colonies perform a cooperative and distributed exploration of the environmental resources by using their quorum-sensing mechanisms. This paper describes how bacterial colony networks and their skills to explore resources can be used as tools for mining association rules in static and stream data. A new algorithm is designed to maintain diverse solutions to the problems at hand, and its performance is compared to that of other well-known bacteria, genetic, and immune-inspired algorithms: Bacterial Foraging Optimization (BFO), a Genetic Algorithm (GA), and the Clonal Selection Algorithm (CLONALG). Taking into account the superior performance of our approach in static data, we applied the algorithms to dynamic environments by converting static into flow data via a stream data model named sliding-window. We also provide some notes on the running time of the proposed algorithm using different hardware and software architectures.

1. Introduction

Bacterial colonies can be seen as complex adaptive systems that perform distributed information processing to solve complex problems, such as food acquisition, swarming mobility, and biofilm formation, among others. They use a collaborative system of chemical signals to explore the resources of a given environment and coordinate their social and behavioural tasks [1]. Bacteria can be found in distinct environments, ranging from hostile to more hospitable ones by applying different kinds of survival strategies to process self and environmental stimuli [2].

The collective and collaborative activities carried out by a bacterial colony are classified as a type of collective intelligence [3], where each bacterium is able to sense itself and the environment and maintain communication with other bacteria in the colony to perform its coordinated tasks. This enables the colony to acquire information about the environment and its changes. Thus, a colony can be seen as an adaptive computational system that processes information on different levels, independently of environmental changes [4]. Some important computational properties and collective behaviours of bacteria colonies are shown in [4].

This paper presents an algorithm inspired by the exploratory behaviour of environmental resources by a colony of bacteria, named BaCARO-II, extended from [5, 6], for mining association rules of items in transactional databases and introduces the necessary modifications so that it can be applied to data streams. As an outcome of the modifications, the new bacteria algorithm is able to avoid the genic conversion problem discussed in [7].

The bacterial colony algorithm is compared to other bio-inspired heuristics, more specifically the Bacterial Foraging Optimization (BFO) [8], a Genetic Algorithm (GA) [9], and the Clonal Selection Algorithm (CLONALG) [10], which were adapted to perform association rule mining of static and stream data. The following performance measures are accounted for: support (S), confidence (C), interestingness (I), number of rules (U), and processing time (P).

The paper is an extension of [11] and it is organized as follows. Section 2 provides some theoretical background on association rule mining and Section 3 a review of data stream processing models. Section 4 provides the biological foundations of bacterial colonies and Section 5 presents an overview of bacterial algorithms. Section 6 introduces two bacterial algorithms applied to association rule mining in static and dynamic environments. Section 7 shows the experimental results and, finally, the final considerations and future works are provided in Section 8.

The abbreviations used for the algorithms in this research are as follows:
BaCARO-II: Bacterial Colony Association Rule Optimization-II
BFO: Bacterial Foraging Optimization
CLONALG: Clonal Selection Algorithm
GA: Genetic Algorithm
sBaCARO-II: Stream Bacterial Colony Association Rule Optimization-II
sBFO: Stream Bacterial Foraging Optimization Algorithm
sCLONALG: Stream Clonal Selection Algorithm
sGA: Stream Genetic Algorithm

2. On Association Rule Mining and Data Streams

This section provides a brief review of the two main concepts covered in this paper: association rule mining and data streams.

2.1. Association Rule Mining

Originally known as market-basket analysis, association rule mining is one of the main data mining tasks. It is a descriptive task that uses unsupervised learning and focuses on identifying associations between items that occur together in a dataset [12–15]. A transaction is a set of items that occur together. In the scenario described in the original market-basket analysis, the items in a transaction are those acquired together by an end user [14, 15]. An association rule has the form

$$A \rightarrow C,$$

where A and C are itemsets of products selected by a consumer.

The first set, A, is called the antecedent and the other one, C, is called the consequent of the association rule. The intersection between these two sets is empty (A ∩ C = Ø), because it is redundant for an item to imply itself. The rule means that the presence of (all items in) A in a transaction implies the presence of (all items in) C in the same transaction with some associated probability [13, 15].

Given a set of transactions T, it is interesting to generate all rules that satisfy two types of constraints:
(i) Syntactic constraints: the number of items that appear in a rule is limited.
(ii) Support constraints: involving delimitations in the number of transactions in T that support the rule, with support, usually an input parameter, being defined as the number of transactions in T that contain A and C simultaneously.

The problem with the previous definition is that the number N of possible association rules, given a number d of items, grows exponentially, and the problem is placed within the NP-complete set [12, 13, 15]:

$$N = 3^{d} - 2^{d+1} + 1.$$

To illustrate how this scales, Figure 1 shows the value of N for growing values of d.

Therefore, it is not computationally feasible to generate all rules for fairly large datasets in a reasonable time. Thus, it is compulsory to somehow prune the association rules built before trying to analyse their real usefulness.
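As a quick sanity check on this growth, the following sketch compares the standard closed-form count of rules with non-empty, disjoint antecedent and consequent, N = 3^d − 2^(d+1) + 1, with a brute-force enumeration for small d. The function names are illustrative and are not part of the original work.

```python
from itertools import product


def rule_count_closed_form(d: int) -> int:
    """Number of rules A -> C with A and C non-empty and disjoint, over d items."""
    return 3 ** d - 2 ** (d + 1) + 1


def rule_count_brute_force(d: int) -> int:
    """Assign every item to the antecedent, the consequent, or neither, and count valid rules."""
    count = 0
    for roles in product(("A", "C", "-"), repeat=d):
        # A rule is valid only if both parts contain at least one item.
        if "A" in roles and "C" in roles:
            count += 1
    return count


for d in range(1, 7):
    assert rule_count_closed_form(d) == rule_count_brute_force(d)
    print(d, rule_count_closed_form(d))
```

The enumeration itself needs 3^d steps, which is exactly why exhaustive rule generation stops being practical for realistic numbers of items.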

Measures of Interest. The Confidence and Support, proposed in [12, 13], are the most studied and applied measures of interest in the association rule mining literature. The support of an association rule is a measure of its relative frequency in the set of all transactions:

$$S(A \rightarrow C) = \frac{|\{t \in T : A \cup C \subseteq t\}|}{|T|}.$$

On the other hand, the confidence of a rule is a measure of its satisfiability or strength when its antecedent is found in T, that is to say, out of all the occurrences of A, how often C also occurs in the base:

$$C(A \rightarrow C) = \frac{|\{t \in T : A \cup C \subseteq t\}|}{|\{t \in T : A \subseteq t\}|}.$$

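While confidence is a measure of the strength of a rule, the support corresponds to its statistical significance over the database. The interestingness of a rule, I(A → C), is calculated as follows [14]:

$$I(A \rightarrow C) = \frac{\sigma(A \cup C)}{\sigma(A)} \times \frac{\sigma(A \cup C)}{\sigma(C)} \times \left(1 - \frac{\sigma(A \cup C)}{|T|}\right),$$

where A and C are defined as previously, σ(X) is the number of transactions that contain itemset X, and |T| is the number of transactions in the database. This measure of interest, unlike the support, favours low-frequency rules in the database.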
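The three measures can be illustrated on a toy transaction set. This is a minimal sketch: the interestingness is assumed to be the product of the two conditional ratios and one minus the rule's relative frequency, and the item names and helper functions are invented for illustration.

```python
def count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def support(antecedent, consequent, transactions):
    """Relative frequency of the whole rule among all transactions."""
    return count(antecedent | consequent, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """How often the consequent occurs among transactions holding the antecedent."""
    n_a = count(antecedent, transactions)
    return count(antecedent | consequent, transactions) / n_a if n_a else 0.0


def interestingness(antecedent, consequent, transactions):
    """Assumed form: two conditional ratios times (1 - relative frequency of the rule)."""
    n_ac = count(antecedent | consequent, transactions)
    n_a = count(antecedent, transactions)
    n_c = count(consequent, transactions)
    if n_a == 0 or n_c == 0:
        return 0.0
    return (n_ac / n_a) * (n_ac / n_c) * (1 - n_ac / len(transactions))


transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
A, C = frozenset({"bread"}), frozenset({"milk"})
print(support(A, C, transactions))
print(confidence(A, C, transactions))
print(interestingness(A, C, transactions))
```

In this toy example the last factor penalizes rules that appear in most transactions, which is how the measure steers the search toward less frequent rules.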

The Apriori Algorithm. The most well-known algorithm for association rule mining is called Apriori [13] and has the following main steps:
(i) Generate frequent itemsets: a set of frequent items is the one whose support is greater than or equal to a minimum support threshold (minsup).
(ii) Generate reliable association rules: the reliable association rules are those with a confidence value equal to or greater than a minimum confidence value (minconf).

A set of items of length k, i.e., with k items, is called a k-itemset. The Apriori algorithm was named after its use of prior knowledge (a priori) about smaller frequent itemsets when generating larger ones: every subset of a frequent itemset must itself be frequent. This property is known as downward closure.

The algorithm performs multiple scans over the database. In the first scan it computes the frequency of each item. After keeping those items whose frequency is equal to or greater than minsup, it checks whether each frequent item i_x occurs in conjunction with item i_{x+1} and whether, together, they satisfy minconf. At each new iteration over the data, the algorithm incrementally stores only those frequent itemsets that satisfy minsup and minconf. Therefore, Apriori-based algorithms are not suitable for a data stream environment, in which data can be scanned only once [16].
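The two Apriori steps can be sketched compactly as follows. This is an illustrative, unoptimized version that rescans the transaction list for every support computation; the function name, return format, and toy data are assumptions made for this example.

```python
from itertools import combinations


def apriori(transactions, minsup, minconf):
    """Return (frequent itemsets -> support, rules as (A, C, support, confidence))."""
    n = len(transactions)

    def sup(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    level = {frozenset({i}) for i in items if sup(frozenset({i})) >= minsup}
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = sup(itemset)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must already be frequent.
        level = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k))
            and sup(c) >= minsup
        }
        k += 1

    # Rule generation: split each frequent itemset into antecedent and consequent.
    rules = []
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                conf = s / frequent[antecedent]
                if conf >= minconf:
                    rules.append((antecedent, itemset - antecedent, s, conf))
    return frequent, rules


demo_transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
demo_frequent, demo_rules = apriori(demo_transactions, minsup=0.5, minconf=0.6)
for antecedent, consequent, s, c in demo_rules:
    print(set(antecedent), "->", set(consequent), round(s, 2), round(c, 2))
```

Every call to `sup` is a full pass over the data, which makes the multi-scan nature of the method, and its mismatch with single-pass streams, easy to see.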

3. Data Streams

A sequence of objects that arrives in temporal order is named a data stream [17, 18]. Unlike traditional static data, data streams are continuous, unbounded, and high-speed, and their data distribution changes with time. Data streams can be classified into two main classes: offline streams and online streams. An offline stream is characterized by regular bulk arrivals, while an online stream is characterized by real-time updated data that arrive one after the other in time. Unlike offline data streams, bulk data processing is not possible for online stream data [19]. As the number of applications over data streams grows rapidly, there is an increasing need to perform data stream mining tasks, such as classification, clustering, and association rule mining, on stream data.

There are three major stream data processing models for rule mining [20]:
(i) Landmark model: it mines all frequent itemsets over the entire log of stream data from a given point in time, named the landmark, to the current one. This simple model is not suitable for applications where the user is interested in the most recent information of data streams.
(ii) Damped model: also named the time-fading model, it finds frequent itemsets in stream data in which each transaction has a weight that decreases with time. Older transactions contribute a smaller weight to itemset frequencies, i.e., new and old transactions are weighted differently.
(iii) Sliding-window model: it finds and maintains frequent itemsets in sliding-windows. Only the part of the data stream within the sliding-window is stored and processed at a time while the data flows in. The sliding-window size is defined based on the application and system resources. The result depends on the recently generated transactions in the window range.
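To make the difference between the three models concrete, the sketch below counts the occurrences of a single item in a short toy stream under each of them. The function names, the decay factor, and the window size are illustrative assumptions rather than values taken from [20].

```python
from collections import deque


def landmark_count(stream, item, landmark=0):
    """Count occurrences of `item` from the landmark position to the most recent transaction."""
    return sum(1 for t in stream[landmark:] if item in t)


def damped_count(stream, item, decay=0.9):
    """Time-fading count: a transaction that is `age` steps old weighs decay**age."""
    newest = len(stream) - 1
    return sum(decay ** (newest - i) for i, t in enumerate(stream) if item in t)


def sliding_window_count(stream, item, window_size=3):
    """Count occurrences of `item` among the last `window_size` transactions only."""
    window = deque(maxlen=window_size)  # the oldest transaction is dropped automatically
    for t in stream:
        window.append(t)
    return sum(1 for t in window if item in t)


stream = [{"a"}, {"a", "b"}, {"b"}, {"a"}, {"b"}]
print(landmark_count(stream, "a"))
print(damped_count(stream, "a", decay=0.5))
print(sliding_window_count(stream, "a", window_size=3))
```

The landmark count treats all history equally, the damped count lets old occurrences fade, and the sliding-window count forgets everything outside the window.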

All three approaches have been used in different studies on data stream mining. The choice of stream data processing model largely depends on the application demands. The three approaches are summarized in Figure 2.

Some data stream applications involving association rule mining include estimating missing data in sensor networks [21]; predicting the frequency of Internet packet streams [22]; finding alarm incidents from streams [23]; determining frequent itemsets over online data streams [24]; and association analysis [25–27].

Open Problems in Data Stream Association Rule Mining. Despite the many applications, these tools are focused on specific areas, and none of them fully deals with the main open issues in data stream association rule mining [16]:
(i) There is not enough time to rescan the whole database or to perform a multiscan, as in traditional data mining algorithms.
(ii) The data stream mining method needs to adapt to the data distribution, i.e., avoid the drifting problem [28].
(iii) The speed of the mining algorithm should be faster than the data arrival rate.
(iv) Due to the stream properties, the analysis results of data streams often keep changing as well.
(v) A mining mechanism that adapts itself to the available resources is needed.

4. Some Notes on Bacterial Colonies

Bacterial colonies have different behavioural patterns, including foraging, reproduction, communication, sporulation, and motility [29, 30]. They perform a distributed and parallel information processing and each bacterium is an autonomous system capable of sending, storing, processing, and interpreting information. This gives the bacterium a certain freedom to choose its response according to the messages received as part of the chemical distributed processing of information from the colony.

Bacterial communication occurs via chemical signals. The main entities involved in this communication are the signalling cell, the target cell, the signal molecule, and the receiver protein. The signalling cell sends the chemical signal, represented by the signal molecule, to one or more target cells. The target cells read the message contained in the signal molecule via receiver proteins and then send the message to the intracellular gel. The signal molecule does not enter the bacterium; the receiver protein is responsible for decoding each message and sending it to the intracellular plasma [31].

The most studied bacterial communication process in the literature is quorum-sensing, which depends on the concentration of a diffusible molecule called an autoinducer [32, 33] and works only in a high-density colony. The concentration of autoinducers in the environment increases with the number of cells that produce them, thus promoting the activation or suppression of the expression of genes that are responsible for generating certain behaviours in bacteria. Quorum-sensing works as both a micro and a macro communication mechanism. At the micro level, in the intracellular communication network, a bacterium analyses and interprets the data read from the environment. The macro-level information processing is represented by the biochemical interactions of the colony, which correspond to the extracellular communication.

The motion patterns, named taxes, that bacteria generate in the presence of chemical attractants and repellents are called chemotaxis. Bacteria can move by swimming, which means moving in the same direction; if a bacterium performs successive swimming steps, we say it is performing a running step; and if it moves in a random direction, we say it is tumbling. Swimming and tumbling (chemotactic behaviour) are individual and stochastic responses that result in emergent global responses, such as swarming.

Reproduction in bacteria is performed after some chemotaxis steps. The bacteria fitness is used to select those who will die, and the survivors are divided into two new bacteria placed in the same direction. In other words, the survivors are cloned via asexual reproduction, and the clones stay in the same region as their parents.

5. Bacterial Colony Algorithms: BFO and BaCARO-II

There are currently a number of bacteria-inspired algorithms. The pioneering proposal was called the Bacterial Chemotaxis Algorithm (BCA) [34], and bacterial foraging behaviours have been used as inspiration for the design of other algorithms, such as the Bacterial Foraging Optimization (BFO) algorithm [8], Bacterial Colony Optimization (BCO) [35], and Bacterial Colony Association Rule Optimization (BaCARO) [5, 6]. This section describes BFO, which is one of the most well-known proposals in the literature, and a version of our approach, named BaCARO-II. The nomenclature of the parameters used by the algorithms is as follows:
P: population of candidate solutions
Bacnum: number of bacteria in a population
Ned: number of elimination and dispersal steps
Nre: number of reproduction steps
Nc: number of chemotactic steps
Ns: number of swim steps
Ped: probability of elimination-dispersal
: probability of intracellular communication
: probability of extracellular communication
: probability of changing information
: extracellular network size

5.1. The Bacterial Foraging Optimization Algorithm: BFO

The Bacterial Foraging Optimization (BFO) algorithm simulates the foraging strategy of Escherichia coli and was originally designed to solve optimization problems in continuous environments. It takes inspiration from the following biological mechanisms [8, 36]: chemotaxis, reproduction, elimination, and dispersion.

Algorithm 1 summarizes the main steps of the BFO algorithm for solving a minimization task. It starts by initializing all the input parameters: a colony P with Bacnum bacteria of the same dimension as the problem to be solved; number of elimination and dispersal steps (Ned); number of reproduction steps (Nre); number of chemotactic steps (Nc); number of swim steps (Ns); the elimination-dispersal probability (Ped); and number of bacteria to be selected for reproduction (Sr).

procedure [P] = BFO(Bacnum, Ned, Nre, Nc, Ns, Ped, Sr)
    initialize P(Bacnum)
    for l = 0 to Ned do //Elimination-dispersal loop
        for k = 0 to Nre do //Reproduction loop
            for j = 0 to Nc do //Chemotaxis loop
                Apply chemotaxis
                foreach Bacterium in P do
                    if Fitness(Bacterium) ≥ Fitness(Best) then
                        Best ← Bacterium
                    end if
                end foreach
            end for //Chemotaxis
            P ← SortByCellFitness(P, Sr)
            P = Clone(P)
        end for //Reproduction
        foreach Bacterium in P do
            if Random() ≤ Ped then
                Bacterium ← BacteriumAtRandLocation()
            end if
        end foreach
    end for //Elimination-dispersal
    return P
end procedure

The algorithm first applies chemotaxis and reproduction until their thresholds are reached and then follows with elimination-dispersal. During reproduction a bacterium is cloned (duplicated) with no mutation. During chemotaxis, the health (fitness) of each bacterium is assessed and a number Sr of the healthiest ones are cloned, while the others are removed from the population. Bacteria are then allowed to swim for a number of swim steps (Ns), moving to different locations. If the new location results in improved (healthier) bacteria, then they keep swimming in the same direction; otherwise they tumble, exploring other regions of the search space. Finally, bacteria can survive or be removed from the population with probability Ped. Whenever a bacterium is eliminated, another one is generated at a random position (dispersal).

BFO is the most extensively applied bacteria-inspired algorithm, having been used to solve problems in different areas [37, 38], such as global optimization [39], engineering design [40], power systems [41–43], optimal design [44], network planning [45], and data analysis [46–48].

5.2. The Bacterial Colony Association Rule Optimization Algorithm: BaCARO-II

The algorithm named Bacterial Colony Association Rule Optimization-II (BaCARO-II) is inspired by the biological processes of intra- and extracellular communication networks of bacterial colonies, as well as quorum-sensing, chemotaxis, and bacterial dispersion [1, 49]. In BaCARO-II, intracellular communication [50] is used to search for better gene rearrangements so that bacteria present a higher fitness, and extracellular communication is used to coordinate bacterial motility over the search space. Quorum-sensing is applied to evaluate the neighbourhood and use the synergy of individual and collective decisions, and chemotaxis is used to make fine adjustments during intracellular communication: if the new gene arrangement (position in the search space) is worse than the previous one, it can be undone. Finally, dispersion promotes the movement of bacteria away from regions with high concentrations of bacteria.

BaCARO-II starts by initializing a random colony of size equal to the search-space dimension. The artificial colony is evaluated, and each bacterium performs intracellular communication with a given probability. The bacteria randomly selected to perform intracellular communication reconfigure their gene expression and, if the new rearrangement is better than the previous one, the new one is adopted. The colony fitness is updated and the extracellular step begins. Each bacterium starts to perceive its neighbourhood, and those in the same region disperse to new regions. Those that are not occupying dense regions select, according to the probability of extracellular communication, a number of surrounding bacteria given by the extracellular network size, exchange information with these neighbours according to the probability of changing information, and move in the best direction. After that, fitness is computed. Finally, the colony is confronted with an environmental pressure that leads to the selection of the bacteria with the highest fitness values for the next generation. The synergy of intracellular and extracellular communication results in quorum-sensing, which is the core of most bacterial algorithms. The pseudocode of BaCARO-II is summarized in Algorithm 2.

procedure [P] = BaCARO-II(,,)
    initialize P
    t ← 1
    f ← evaluate(P)
    while not_stopping_criterion do
        for i = 0 to Size(P) do //Intracellular communication loop
            rf ← inCellular(P,,)
            f ← update(f, rf)
        end for
        for j = 0 to Size(P) do //Extracellular communication loop
            exCellular()
            foreach bacterium in each extracellular network do
                if bacterialDensity(bacterium) == true then //Quorum-sensing
                    P ← disperse(bacterium) //Dispersion
                else
                    foreach neighbour in the extracellular network do
                        MoveToBestDirectionInExtracellularNetwork(neighbour)
                    end foreach
                end if
            end foreach
        end for
        P
        f ← evaluate(P)
        P ← select(P, f)
        t ← t + 1
    end while
end procedure

6. Bacterial Colonies in Association Rule Mining

This section describes how the different bacteria-inspired algorithms were adapted to solve association rule mining problems in static and dynamic environments. As presented in the previous section, BFO takes into account reproduction, chemotaxis (tumbling and swimming), and elimination-dispersal mechanisms. By contrast, BaCARO-II uses chemotaxis (tumbling and swimming), intra- and extracellular communication, and dispersion. These mechanisms will be presented here so that both algorithms can be applied to solve association rule mining tasks.

6.1. Encoding Scheme

Instead of initializing the agents in a real-valued interval, we randomly set them as pairs of binary values (00, 01, 10, or 11) for each vector position. A pair of bits represents each item in a transaction, where items present in the association rule are represented by the bit pair 00 (antecedent of the rule) or 11 (consequent of the rule). Items that are not part of the rule are represented by the other combinations: 01 or 10. Figure 3 illustrates an artificial bacterium encoding an association rule.
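The bit-pair encoding can be sketched as follows: a bacterium is a list with one pair of bits per item, and decoding collects the items marked 00 into the antecedent and those marked 11 into the consequent. The item names and function names are hypothetical and serve only as an illustration.

```python
import random

ANTECEDENT, CONSEQUENT = (0, 0), (1, 1)


def random_bacterium(n_items, rng=random):
    """One bit pair per item, each bit drawn uniformly, so 00, 01, 10, and 11 are equally likely."""
    return [(rng.randint(0, 1), rng.randint(0, 1)) for _ in range(n_items)]


def decode(bacterium, item_names):
    """Split the encoded items into (antecedent, consequent) sets; 01 and 10 are ignored."""
    antecedent = {name for pair, name in zip(bacterium, item_names) if pair == ANTECEDENT}
    consequent = {name for pair, name in zip(bacterium, item_names) if pair == CONSEQUENT}
    return antecedent, consequent


items = ["bread", "milk", "butter", "eggs"]
bacterium = [(0, 0), (1, 1), (0, 1), (1, 0)]
print(decode(bacterium, items))  # bread is in the antecedent, milk in the consequent
```

Because every item always occupies one pair, all bacteria have the same length regardless of how many items their rules contain.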

6.2. Reproduction

The surviving bacteria are cloned without mutation.
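A minimal sketch of this step, assuming the colony is a list of bit-pair lists and that a hypothetical `n_survivors` argument plays the role of the number of bacteria selected for reproduction:

```python
def reproduce(colony, fitness, n_survivors):
    """Keep the `n_survivors` fittest bacteria and add one exact copy of each."""
    ranked = sorted(colony, key=fitness, reverse=True)
    survivors = ranked[:n_survivors]
    clones = [list(b) for b in survivors]  # plain copies: no mutation is applied
    return survivors + clones


toy_colony = [[1, 0, 0], [1, 1, 1], [0, 0, 0], [1, 1, 0]]
print(reproduce(toy_colony, fitness=sum, n_survivors=2))
```

With `n_survivors` equal to half the colony size, the population size is preserved from one reproduction step to the next.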

6.3. Chemotaxis: Swim and Tumble

Another modification to mine association rules was made in the chemotactic behaviours. A shorter rule is more probable than a longer one. The tumbles were implemented by randomly choosing a rule part (antecedent or consequent) to be shortened and removing an element from this part. If the bacterium's adaptation level (fitness) increases after the tumble, it starts to run (applying swim steps) by removing items from the same part until its size is equal to 1 or the number of swim steps (a user-defined parameter) is reached, as illustrated in Figure 4. Note that, in terms of chromosomes, the bacteria maintain the same length after swimming and tumbling; what changes is only the number of items in the encoded rules.

On the other hand, if the bacterium maintains its adaptation value (fitness) after tumbling, the chemotactic behaviour is finalized, as illustrated in Figure 5.
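The tumble-and-swim behaviour can be sketched on the bit-pair encoding as below. Several details are assumptions made for this example: removed items are marked with the pair 01, a move is kept only when fitness strictly improves, and `fitness` is any callable that scores an encoded bacterium.

```python
import random

ANTECEDENT, CONSEQUENT, ABSENT = (0, 0), (1, 1), (0, 1)


def remove_one(bacterium, part, rng):
    """Return a copy with one random item of `part` switched off, or None if one item is left."""
    positions = [i for i, pair in enumerate(bacterium) if pair == part]
    if len(positions) <= 1:
        return None
    shorter = list(bacterium)
    shorter[rng.choice(positions)] = ABSENT
    return shorter


def chemotaxis(bacterium, fitness, n_swim, rng=random):
    """Tumble once; while fitness improves, keep swimming on the same rule part."""
    part = rng.choice([ANTECEDENT, CONSEQUENT])  # tumble: pick the part to shorten
    current, steps = list(bacterium), 0
    while steps <= n_swim:  # one tumble plus at most n_swim swim steps
        candidate = remove_one(current, part, rng)
        if candidate is None or fitness(candidate) <= fitness(current):
            break  # part already has a single item, or no improvement
        current, steps = candidate, steps + 1
    return current


def rule_size(bacterium):
    """Number of items that take part in the encoded rule."""
    return sum(pair in (ANTECEDENT, CONSEQUENT) for pair in bacterium)


start = [ANTECEDENT] * 3 + [CONSEQUENT] * 3
shortened = chemotaxis(start, lambda b: -rule_size(b), n_swim=5, rng=random.Random(0))
print(rule_size(start), rule_size(shortened), len(shortened))
```

The chromosome length is unchanged by the procedure; only the number of pairs that mark rule items shrinks.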

6.4. Elimination-Dispersal Mechanisms

This step has two parts:
(1) Elimination: removal of some bacteria from the colony based on their fitness (adaptability).
(2) Dispersal: randomly changing the positions of the bacteria in the search space.
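One possible reading of this step, following the probabilistic form used in Algorithm 1 (a bacterium is replaced when a random draw does not exceed Ped) rather than a fitness-based ranking, is sketched below. The parameter name `p_ed` and the bit-pair representation are assumptions of this example.

```python
import random


def eliminate_and_disperse(colony, p_ed, rng=random):
    """With probability p_ed, replace a bacterium by a new random one of the same length."""
    new_colony = []
    for bacterium in colony:
        if rng.random() <= p_ed:
            # Elimination followed by dispersal: a fresh bacterium at a random position.
            bacterium = [(rng.randint(0, 1), rng.randint(0, 1)) for _ in bacterium]
        new_colony.append(bacterium)
    return new_colony


small_colony = [[(0, 0), (1, 1)], [(0, 1), (1, 0)]]
print(eliminate_and_disperse(small_colony, p_ed=0.25, rng=random.Random(1)))
```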

6.5. Intracellular Communication

In this step each bacterium has an associated probability of performing internal communication. The parts that make up a rule are identified as exchanging structures and the items of these structures may assume a new position in the rule, that is, a new gene expression, as illustrated in Figure 6.
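A hedged sketch of one way to realize this step: with a given probability, one item of the rule is moved from the antecedent to the consequent or vice versa, and the rearrangement is kept only if fitness improves. Interpreting "a new position in the rule" as switching rule parts is an assumption, as are the names `intracellular_step` and `p_intra`.

```python
import random

ANTECEDENT, CONSEQUENT = (0, 0), (1, 1)


def intracellular_step(bacterium, fitness, p_intra, rng=random):
    """With probability p_intra, move one rule item to the other part; keep it only if fitter."""
    if rng.random() > p_intra:
        return bacterium  # this bacterium does not communicate internally this time
    in_rule = [i for i, pair in enumerate(bacterium) if pair in (ANTECEDENT, CONSEQUENT)]
    if not in_rule:
        return bacterium
    i = rng.choice(in_rule)
    candidate = list(bacterium)
    candidate[i] = CONSEQUENT if bacterium[i] == ANTECEDENT else ANTECEDENT
    # Undo the rearrangement when it does not improve fitness.
    return candidate if fitness(candidate) > fitness(bacterium) else bacterium


cell = [(0, 0), (0, 0), (0, 1)]
count_consequent = lambda b: b.count(CONSEQUENT)
print(intracellular_step(cell, count_consequent, p_intra=1.0, rng=random.Random(0)))
```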

6.6. Extracellular Communication

Extracellular communication is used to coordinate bacterial motility as a collective behaviour over the search space by sharing information in a chemical network. The chemical network restricts the range of the shared information to a part of the colony, i.e., a group. In our model, the information shared by the bacterium with the highest fitness is taken by the others as a reference for moving around the search space. In a higher-density group, the collective behaviour adopted is to disperse to new regions.
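The reference-following part of this step can be sketched as below, under the assumption that "moving toward" the fittest group member means copying its bit pairs position by position with some probability. The density check and the dispersal of dense groups are left out, and the names `extracellular_step` and `p_change` are hypothetical.

```python
import random


def extracellular_step(group, fitness, p_change, rng=random):
    """Each bacterium copies bit pairs from the fittest group member with probability p_change."""
    best = max(group, key=fitness)
    moved = []
    for bacterium in group:
        moved.append([
            best_pair if rng.random() < p_change else own_pair
            for own_pair, best_pair in zip(bacterium, best)
        ])
    return moved


network = [[(0, 0), (1, 1)], [(0, 1), (0, 1)], [(1, 1), (1, 1)]]
consequent_items = lambda b: b.count((1, 1))
print(extracellular_step(network, consequent_items, p_change=0.5, rng=random.Random(0)))
```

A small `p_change` keeps the group diverse, while a value close to 1 quickly collapses the group onto its best member.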

6.7. Evaluation Function

The evaluation, fitness, or objective function should reflect the relevance of the measures to be optimized, exhibit regularities over the space defined by the chosen representation, and provide enough information to drive the environmental pressure of a population-based search algorithm [51]. The measures of interest often used in Evolutionary Algorithms and Artificial Immune Systems to compute fitness values are based on those employed for classification rule mining, with some slight modifications.

Confidence and support were used in [52–54] to define the fitness function, where minSupp and minConf are, respectively, the user-defined minimum threshold values for support and confidence. Another fitness function present in the association rule mining literature also relies on minSupp, the minimum threshold value for support defined by the user. There are other fitness functions in the field [52, 55, 56], but they are essentially different combinations of support, confidence, and other measures of interest. A detailed description of various measures of interest usually applied in the association rule mining literature is available in [57].

The evaluation of each bacterium is related to the occurrence probability and accuracy of an association rule in the database. The selection of bacteria is proportional to their fitness values. The fitness function used in BaCARO-II and in the benchmark algorithms is

$$f(A \rightarrow C) = w_1 \, S(A \rightarrow C) + w_2 \, C(A \rightarrow C),$$

where w1 = w2 = 0.5, so that w1 + w2 = 1, subject to a constraint on the rule's itemsets stated in terms of |·|, which returns the cardinality of a set.

The algorithms use support and confidence to calculate the fitness value and the interestingness measure to compare them from a different perspective, as in [14, 58].
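A minimal sketch of such a fitness computation, assuming the fitness is the weighted sum of support and confidence with w1 = w2 = 0.5; scoring rules with an empty antecedent or consequent as 0 is an additional assumption of this example, as are the toy transactions.

```python
def rule_fitness(antecedent, consequent, transactions, w1=0.5, w2=0.5):
    """Weighted sum of support and confidence; rules with an empty part score 0."""
    if not antecedent or not consequent:
        return 0.0
    n_a = sum(1 for t in transactions if antecedent <= t)
    n_ac = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_ac / len(transactions)
    confidence = n_ac / n_a if n_a else 0.0
    return w1 * support + w2 * confidence


baskets = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
print(rule_fitness(frozenset({"bread"}), frozenset({"milk"}), baskets))
```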

7. Experimental Results

To assess the performance of the algorithms, we ran several experiments over distinct scenarios. The first set of tests was performed using five different binary static datasets and the second was run by applying a sliding-window approach to the datasets to simulate data streams. Finally, some experiments were performed to investigate the computational complexity of the algorithms using a standard and an optimized architecture.

The following algorithms were implemented for comparison: BFO, BaCARO-II, GA, and CLONALG, as well as their stream versions sBFO, sBaCARO-II, sGA, and sCLONALG [59]. All algorithms were implemented in Java 1.7.0_95 in a GNU/Linux environment (Debian 3.16.7-ckt20-1). The experiments were run on an Intel® Pentium® Dual-Core CPU T4500 @ 2.30 GHz.

7.1. Performance Tests in Static Datasets

The BFO parameters were set as follows: , , , , , and . The BaCARO-II parameters were set as follows: , , , and . For CLONALG we used , , and , and, finally, for GA we used and . All populations were set with 100 individuals and the maximum number of iterations was 100.

The following datasets were taken from the UCI Machine Learning Repository [60]: SPECT Heart database, with a sparsity of 66.75%; Mushroom Database, with 119 items and 8,124 instances and a sparsity of 80.67%; Balance Scale Database, with 23 items and 625 instances and a sparsity of 78.26%; Flare Data, with 49 items and 1,389 instances and a sparsity of 73.47%; the Monks Problems-1 Database, with 19 items and 432 instances and a sparsity of 63.16%; and the Nursery Database, with 32 items and 12,960 instances and a sparsity of around 71.88%.

All the values taken over ten simulations of BFO, BaCARO-II, CLONALG, and GA for static environments are summarized in Table 1, while sBFO, sBaCARO-II, sCLONALG, and sGA for static and dynamic environments are summarized in Table 2. The values presented are the mean ± standard deviation and minimum and maximum values for the set of rules found in the final population of each algorithm over ten simulations, where S means support, C confidence, I interestingness, U number of unique rules found over the last set of candidate solutions, and Time the processing time. As S and C are used in the fitness function, we selected the best fitness value from the final population. On the other hand, I is conceptually different from S and C and we used it to estimate the heterogeneity of solutions in the final population, as well as U.

In general, BaCARO-II presented better results than BFO, CLONALG, and GA in most measures. For instance, BaCARO-II outperforms BFO in all five datasets for the S and P measures. This occurs because BFO uses its global information by combining a measure value of each attribute of the bacterium to influence the entire colony, whereas BaCARO-II uses its global information to promote localized variations along the colony and improve its search ability. As a result, BaCARO-II tends to maintain many agents over the same highly adapted regions. Consequently, BFO sometimes outperforms BaCARO-II in the U measure by applying more local search steps, which avoids concentrating large numbers of agents in the same region. On the other hand, BFO makes less use of global information, and BaCARO-II therefore presents better fitness values as well as processing times.

BaCARO-II presented competitive results for all datasets. The best performance of our bacterial algorithm was for the Mushroom, Monks, and Nursery databases. The average values of support, confidence, and interestingness of our approach are higher than those presented by BFO. However, the number of rules generated by BaCARO-II is not greater than that of BFO in most datasets. On the other hand, our approach produces association rules with higher values of support and confidence. Another favourable point for BaCARO-II is its average processing time, which is shorter than that of its competitors. Nevertheless, BaCARO-II performs worse than BFO, GA, and CLONALG for all databases on the unique rules measure.

7.2. Bacterial Colony Algorithms in Stream Data

The same parameter configurations adopted in the static environment were applied to the dynamical case. As datasets have different sizes, we fixed the sliding-window size at 100, changing 1 object per iteration.

By considering the highlighted performance of our algorithm presented here and in other works [5, 6], we designed dynamical environments to evaluate its robustness and flexibility in mining association rules. In fact, we converted the following static datasets, SPECT, Balance Scale, Flare, Monks, and Nursery, to dynamical datasets by applying the Sliding-Window approach over them. To differentiate static and stream databases, we refer to the stream versions as streamSPECT, streamBalance, streamFlare, streamMonks, and streamNursery.

For experimental purposes, we fixed the sliding-window size at 100 objects per time step t_i of the data stream. The transition from t_i to t_{i+1} occurs when one object from the stream enters the sliding-window and another leaves it, so that the window always maintains its size. The sliding-window schema, the data stream, and the transactions used in the experiments are illustrated in Figure 7.
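The conversion of a static dataset into a stream under this schema can be sketched with a fixed-size window that advances one object at a time. The generator name is illustrative, and integers stand in for transactions in this toy example.

```python
from collections import deque


def sliding_windows(stream, window_size=100):
    """Yield a snapshot of the window each time it is full; one object enters and one leaves per step."""
    window = deque(maxlen=window_size)
    for obj in stream:
        window.append(obj)  # when the window is full, the oldest object is dropped automatically
        if len(window) == window_size:
            yield list(window)


windows = list(sliding_windows(range(103), window_size=100))
print(len(windows), windows[0][0], windows[-1][-1])
```

A dataset with n objects therefore produces n − 99 time steps, and a mining algorithm would be run, or updated, once per yielded window.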

The results obtained by the stream versions of the algorithms (sBaCARO-II, sCLONALG, sBFO, and sGA) in the dynamic environments for streamSPECT, streamBalance, streamFlare, streamMonks, and streamNursery are summarized in Table 2.

Although the final result is based on the different objects that pass through the sliding-window during the association rule mining process, the objects in the window at the final time t are the most relevant to the rules obtained.

To validate the results obtained in static and dynamic environments, we compared our approach with BFO using a two-tailed Student's t-test; we chose BFO instead of GA or CLONALG because of its superior performance in the experiments. In the static environment, for the Balance database, the t-test showed no statistical difference for the highest values of the support and confidence measures, 8.53 and 0, respectively; for the Flare database, it yielded 0.00017 and 0.1341 for support and confidence, respectively; for the Monks database, it yielded 0.015 for support and 0.167 for confidence; for the Mushroom database, it yielded 0.025 for support and recorded no difference for confidence; for the Nursery database, it yielded 0.010 and 0.006 for confidence and support, respectively; and finally, for the SPECT database, the t-test pointed to the largest statistical difference between the algorithms, with 0.489 for support and 0.109 for confidence.

In the dynamic environment, the t-test for the Balance database registered 0.009 and 0.041 for the support and confidence measures, respectively. For the Flare database, values of 2.2 and 0.343 were recorded for support and confidence, respectively. For the Monks database, the t-test value for the support was 7.93, and no statistical difference was found for the confidence measure. For the Mushroom database, the t-test showed a statistical difference for both measures because sBaCARO-II did not generate any rule. Finally, the t-test indicated 3.73 and 0.0006 for the support and confidence measures, respectively.
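As a reminder of what the comparison above computes, the sketch below evaluates the pooled-variance Student's t statistic and its degrees of freedom for two independent samples. The equal-variance form, the function name `student_t`, and the sample values are assumptions made for illustration; they are not the per-run results of the experiments. The two-tailed p-value is then read from the t distribution with the returned degrees of freedom.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def student_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Return (t statistic, degrees of freedom) for two independent samples.

    Assumes equal population variances (classic Student's t-test).
    """
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    # Pooled estimate of the common variance (sample variances use n - 1).
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(a) - mean(b)) / std_err, df


# Made-up support values from three runs of two hypothetical algorithms.
runs_a = [0.1, 0.2, 0.3]
runs_b = [0.2, 0.3, 0.4]
t_stat, dof = student_t(runs_a, runs_b)
```

With these toy samples the statistic is negative because the first sample has the lower mean; swapping the arguments flips the sign but leaves the two-tailed p-value unchanged.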

7.3. Some Notes on BaCARO-II Running Time

To assess the running time of the proposed algorithm, we tested its static version on a different hardware and software architecture, one that accelerates server-side Java [61] applications through JVM (Java Virtual Machine) optimizations, from version 1.8 to newer versions, for the new Intel® Xeon® Scalable Processors. We performed new experiments to investigate the benefits of Intel’s High Performance Computing (HPC) platforms. These experiments were run on a compute node composed of two Intel® Xeon® Platinum 8160 processors @ 2.10 GHz, each with 24 physical cores (48 logical) and 33 MB of cache memory, 190 GB of RAM, two Intel® Solid State Drive Data Center (Intel® SSD DC) S3520 Series drives with 1.2 TB and 240 GB of storage capacity, and a CentOS 7 operating system running kernel version 3.10.0-693.21.1.el7.x86_64. Table 3 provides a comparison of the running times of BaCARO-II for the static datasets on both architectures. As can be observed, the use of an HPC platform leads to an average 2.60-fold gain in performance.
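An average fold gain of this kind can be computed as the mean of the per-dataset ratios between the two running times. The sketch below assumes that definition (the text does not state whether the average is a mean of ratios or a ratio of totals); the function name `average_speedup`, the dataset labels, and the timings are invented for illustration and are not the values in Table 3.

```python
from statistics import mean
from typing import Dict


def average_speedup(baseline_s: Dict[str, float], hpc_s: Dict[str, float]) -> float:
    """Mean of per-dataset speedups (baseline time divided by HPC time)."""
    return mean(baseline_s[name] / hpc_s[name] for name in baseline_s)


# Hypothetical running times in seconds for two made-up datasets.
baseline_times = {"dataset_A": 10.0, "dataset_B": 30.0}
hpc_times = {"dataset_A": 5.0, "dataset_B": 10.0}

avg_gain = average_speedup(baseline_times, hpc_times)  # (2.0 + 3.0) / 2 = 2.5
```

Averaging the ratios weights every dataset equally, whereas dividing total times would let the slowest dataset dominate the figure.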

There are many phenomena happening in a bacterial colony. Some of them, such as foraging and chemotaxis, were used to construct tools to solve complex problems. This paper proposed and applied a new bacteria-inspired algorithm by looking at intra- and extracellular communication networks, as well as interactions between bacteria and their internal constituent parts to deal with association rule mining. The results presented by BaCARO-II showed a superior performance to other bio-inspired algorithms, such as BFO, GA, and CLONALG when applied to the same tasks.

Given the current need to solve stream data problems, we designed and applied versions of BFO, BaCARO-II, GA, and CLONALG for mining association rules in stream data. The proposed bacterial approach showed good results in the experiments performed, in both static and stream data. We understand that the superior performance of our approach is primarily due to two reasons: first, the local search performed in the intracellular communication phase and, second, the use of information available in the neighbourhood (nearest bacterial cell) of each bacterial cell to improve the search space exploration. BFO was very competitive and presented better results in some dynamic scenarios, though it demands longer processing time.

As future investigations, sBaCARO-II should be applied to stream data mining tasks with other stream data processing models, namely the Landmark and Damped models. Other settings for the sliding-window size should also be tested and the results compared with other algorithms, such as the ones presented in [62, 63]. Future work may also include a deeper understanding of bacterial behaviours and phenomena.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors thank CAPES, CNPq, Fapesp, and Mackpesquisa for the financial support. The authors also acknowledge the support of Intel for the Natural Computing and Machine Learning Laboratory as an Intel Center of Excellence in Artificial Intelligence.