Abstract

In most real-world data streams, the concept underlying the data may shift from time to time, a phenomenon known as concept drift. In many real-world applications, such as nonstationary time-series data, concept drift occurs in a cyclic fashion and previously seen concepts reappear, giving rise to a particular kind of concept drift known as recurring concepts. A cyclically drifting concept tends to return to previously visited states. Existing machine learning algorithms typically handle recurring concepts by retraining the learning model whenever concept drift is detected, which discards information even when the concept was well learned and will recur in a later learning phase. A common remedy is to retain and reuse previously learned models, but in nonstationary environments it is time-consuming and computationally expensive to select an ensemble classifier capable of accurately adapting to recurring concepts. Time-dependent applications need fast and accurate algorithms for learning from streaming data, yet most existing algorithms designed to handle concept drift do not take recurring concept drift into account. To handle recurring concepts accurately and efficiently with minimal computational overhead, we propose a novel evolving ensemble method called Recurrent Adaptive Classifier Ensemble (RACE). The algorithm preserves an archive of diverse previously learned models and always trains both new and existing classifiers. Empirical experiments on synthetic and real-world data stream benchmarks show that RACE adapts to recurring concepts significantly more accurately than some state-of-the-art ensemble classifiers based on classifier reuse.

1. Introduction

Advances in technology in recent years have witnessed an upsurge in the number of applications that generate large amounts of data streams at unprecedented volumes and speed. Examples of such real-world applications include network intrusion detection [1], sensor networks, spam filtering systems [2], and credit card fraud detection [3].

One of the biggest challenges faced by machine learning tasks in data stream learning is concept drift [4], where the data generating mechanism is constantly evolving and the statistical properties of the target concept change over time. Wang et al. [3] described the term concept in machine learning as the quantity that a learning model is trying to predict. Concept drift often occurs in real-world applications, for example, in weather prediction, where prediction models may change with the seasons, and in consumer behavior, where preferences may change over time due to seasons, fashion, and the economy. Changes in the underlying distribution of the data often lead to a drastic drop in the classification performance of the learning model.

An efficient and effective online learning model must have the ability to recognize and respond to such changes accordingly and accurately. In streaming data, different types of concept drifts can be identified. Concept drifts can be categorized based on their speed into sudden and gradual drifts [4]. Sudden concept drift is characterized by severe changes between the underlying class distribution and the incoming instances in a relatively short amount of time. Gradual concept drift takes a relatively large amount of time for significant changes to be revealed in differences of underlying class distributions between the old instances and the incoming instances. Regardless of the type of drift currently occurring, an online learning model must be able to track the drift, recognize its type and adapt to changes accordingly. In many real-world applications, it is common that patterns or concepts recur over time. Context recurrence is a common situation concerning concept drift. Domains associated with context recurrence include weather prediction where learning models change according to seasons. Other domains include financial prediction and dynamic control. Recurring contexts may occur due to cyclic phenomena such as seasons of the year or may be associated with irregular phenomena such as inflation rates or market condition. This phenomenon of recurring concepts is one of the key challenges that online learning algorithms [5] need to deal with. In the event that concept drifts recur, previously learned models may be applied to handle recurring concepts. Existing algorithms consider recurring concepts as new concepts, thereby increasing computational overheads as more classification models are generated. If patterns or concepts recur, previously learned classification models should be reapplied; thus, the predictive performance of the learning model can be optimized. 
The application of previously learned models may impact both negatively and positively on learning the current concept. Preserving all previously learned classification models induces overheads in both storage and computation, for example, when repeatedly assessing the performance of previously learned classification models on new training data. For this reason, the number of preserved models should be subject to some constraints, instead of increasing indefinitely. A selection scheme is required to decide which previously learned classification models should be preserved. As learning algorithms work at handling different kinds of drift, they tend to better represent the last observed concepts and discard previously learned concepts. Two research questions need to be answered when designing an ensemble classifier to handle recurring concepts; that is, which previously learned classification models should be preserved for future use? And how to exploit the preserved classification models to facilitate adaptation to recurring concepts?

To address the above research questions, this paper first reviews the latest progress on machine learning algorithms for handling recurring concepts and then proposes the Recurrent Adaptive Classifier Ensemble (RACE), specifically designed to handle recurring concept drifts in dynamic environments. RACE combines three components: heterogeneous base learners (J48 Decision Trees, Multilayer Perceptrons (MLPs), and Support Vector Machines (SVMs)), chosen to maximize diversity and create dynamic decision boundaries separating the training instances; a change detection algorithm; and a diversity based strategy for preserving previously learned models to handle recurring concepts. When a new data chunk arrives, classification models of high diversity are adapted to the new training data.

The rest of this paper is organized as follows. Section 2 presents a review of related work. Section 3 introduces the Recurrent Adaptive Classifier Ensemble (RACE). Section 4 presents the empirical analysis of the comparison between RACE and other state-of-the-art algorithms designed to handle recurring concepts using selected datasets considering the accuracies achieved and how the algorithms handle recurring concepts.

2. Related Work

Scenarios associated with recurring concepts are not uncommon, and a number of contemporary approaches have been proposed to address recurring concepts with minimum overheads. Many machine learning techniques have emerged in the literature as candidate solutions, and ensemble classifiers have demonstrated the ability to handle different types of drifting concepts in nonstationary environments. Hassan [6] proposed a concept drift adaptation technique in distributed environment for real-world data streams. The algorithm uses drift detection method; if concept drift is detected, it retrains the model, and knowledge of previously learned concepts is lost. The approach does not automatically identify the type of drift. Sarnovsky [7] proposed the heterogeneous adaptive ensemble model for data stream classification which utilizes the dynamic class weighting scheme and a mechanism to maintain the diversity of the ensemble members. The algorithm implicitly handles recurring concepts, and classifiers with lower weights are discarded, making it difficult to handle recurring concepts. Liu [8] proposed an instance based ensemble learning algorithm called the diverse instance weighting ensemble (DiwE). The algorithm weights classifiers according to their performance, and poorly performing classifiers are discarded. Heusinger [9] proposed a combination of the modified versions of Robust Soft Learning Vector Quantization (RSLVQ) and Generalized Learning Vector Quantization (GLVQ) to learn streaming data and adapt to all types of concept drift. The integration of Adadelta and Adamax into RSLVQ and GLVQ optimized the prediction performance over their vanilla versions. The combined algorithm does not detect drifts and does not handle concept drift explicitly. Zheng [10] proposed a semisupervised classification algorithm on data streams with recurring concept drift and concept evolution in data streams with partially labeled data. 
The framework uses the Jensen–Shannon divergence based change detection technique on classifier confidence score instead of classification error rate to detect recurring concept drift. The algorithm uses too many parameters that are difficult to tune. Namitha [11] proposed a novel algorithm to identify recurring concepts in data stream clustering. If concept drift is detected, the algorithm retrieves the most matching model from the repository. The algorithm has no strategy to prevent the repository from growing or increasing indefinitely. Wing [12] proposed a bagging ensemble that adapts to concept drift by using a dynamic cost-sensitive weighting scheme for component classifiers according to their classification performances and stochastic sensitivities. The algorithm discards classifiers whose weight is below a predefined threshold, making it unable to adapt to recurring concepts. Zang [13] presented the drift detection based incremental ensemble (DIE) that combines the operations of concept drift detection and component update mechanism to react to different types of concept drift. DIE assigns weights to classifiers and discards classifiers whose weight is below a predefined threshold, making it difficult to react to recurring concepts. Baidari [14] proposed the Accuracy Weighted Diversity based Online Boosting (AWDOB) which is based on an Adaptable Diversity based Online Boosting (ADOB). AWDOB uses an accuracy weighting scheme that exploits the accuracy of the current expert and the number of correctly classified and incorrectly classified instances of all experts to assign the current expert weight to the current instance in the data stream. Experts with lower weights are discarded from the ensemble. The process of calculating and assigning weights takes time and slows the learning process. 
Gu [15] presented a novel self-organizing fuzzy inference ensemble framework (SOFEnsemble) which is capable of self-learning, processing streaming data on a chunk by chunk basis, and continuously self-updating the decision boundaries by identifying the more representative samples. Although SOFEnsemble has high computational efficiency, the use of fuzzy inference slows down the learning process. Zeng [16] proposed a chunk based incremental ensemble algorithm called Dynamic Updated Ensemble (DUE) for learning imbalanced data streams with concept drift. DUE periodically updates previous components to make the ensemble react to different kinds of concept drift, and the final decision on testing events is based on the weighted voting value of a certain number of best performing classifiers. DUE discards classifiers whose weight is below a predefined threshold, making it unable to accurately react to recurring concepts. Liu et al. [17] proposed a comprehensive online active learning framework (CALMID) that includes an ensemble classifier, a drift detector, a label sliding window, sample sliding windows, and an initialization training sample sequence to learn concept drift. The algorithm has a sample weight formula that assigns weights to classifiers. CALMID was found to be effective and efficient when compared to other state-of-the-art algorithms.

Most of the ensemble approaches proposed in the literature handle recurring concepts by relearning them as if the concepts were new and not recurring. Existing ensemble classifiers for recurring concepts share a common weakness: when a new data chunk arrives, they utilize all previously learned concepts without adapting them to the new training data. None of the proposed approaches exploits highly diverse previously learned models to handle recurring concepts by first adapting them to the new training data. Therefore, in this paper, a novel and evolving ensemble learning approach called Recurrent Adaptive Classifier Ensemble (RACE) is presented. RACE stores highly diverse models and does not directly combine the prediction outputs of the models. Instead, each diverse model in the archive is first adapted to the new training data, and the model whose removal most increases the diversity of the remaining models is removed from the archive.

In the next section, we present our proposed approach, the Recurrent Adaptive Classifier Ensemble (RACE), that explicitly exploits diversity to handle recurring concepts.

3. Recurrent Adaptive Classifier Ensemble (RACE)

The Recurrent Adaptive Classifier Ensemble (RACE) employs Support Vector Machines (SVMs) as the base learner. The algorithm first builds a support vector machine from the first streaming data chunk and stores it in an archive. When a new data chunk arrives, the drift detection algorithm checks whether the chunk comes from the same distribution as the data used to build the first model. If the data chunk comes from a different underlying distribution, the preserved model is adapted to the new data chunk and a new model is built from scratch on the new data chunk. The adapted model and the new model are combined to constitute an ensemble that performs classification at time t. RACE does not directly combine the prediction outputs of the stored models in the archive. Each preserved model is first adapted to fit the current data, and then the adapted models and the model newly constructed from the most recent data chunk are combined. Previously learned models are preserved according to a diversity based criterion rather than an accuracy based criterion, because the base classifiers have to perform diversely for the ensemble to improve its prediction performance. RACE uses Yule’s Q Statistic [18] as a diversity measure to minimize the ensemble error. The measure is recommended for its simplicity and ease of interpretation [19], and it is used to keep only diverse previously learned models [20]. The preserved diverse models are then adapted to the current concept via knowledge transfer, which is appropriate because it improves the learning process in terms of accuracy and learning efficiency. To learn a new concept, the previously learned diverse models serve as the initial candidates of the ensemble.
RACE adapts each previously learned model in the archive to the new training data. The adapted models and the model learned from new training data are combined to predict incoming instances. The newly built model is stored in the archive if it is not full. The model whose removal will lead to the largest diversity among the remaining models is removed from the archive. Algorithm 1 provides a description of the ensemble framework.

Input: chunks of streaming data
M: a set of diverse models previously learned
Output: the generalized ensemble model at time step t
(1)For each data chunk do
(2)Learn a new base model with the current data chunk
(3)Select transferred models by transferring the highly diverse stored models M
(4)Build the generalized ensemble using the transferred models and the newly learned model
(5)Update M with the newly learned model to maximise diversity
(6)Endfor
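
The diversity measure used when updating M can be illustrated with a minimal sketch of Yule's Q statistic for a pair of classifiers. The function name and the convention of returning 0 when the statistic is undefined are assumptions made for this illustration, not part of the RACE description. Q is computed from per-instance correctness: N11 and N00 count instances where both classifiers are right or both are wrong, and N10 and N01 count instances where exactly one is right.

```python
from typing import Sequence


def yule_q(correct_a: Sequence[bool], correct_b: Sequence[bool]) -> float:
    """Yule's Q for two classifiers scored on the same instances.

    Values near +1 mean the classifiers tend to be right and wrong together
    (low diversity); values near -1 mean they err on different instances.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both classifiers must be scored on the same instances")
    n11 = n00 = n10 = n01 = 0
    for a, b in zip(correct_a, correct_b):
        if a and b:
            n11 += 1
        elif not a and not b:
            n00 += 1
        elif a:
            n10 += 1
        else:
            n01 += 1
    denominator = n11 * n00 + n01 * n10
    if denominator == 0:
        return 0.0  # assumption: report "no association" when Q is undefined
    return (n11 * n00 - n01 * n10) / denominator
```

Lower Q values between archived models indicate a more diverse archive, which is the property the update step tries to preserve.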

Algorithm 2 presents the detailed steps of the Recurrent Adaptive Classifier Ensemble (RACE) algorithm, with the assumption that data arrives sequentially.

Input: the streaming data chunks
the archive of ensemble models at time step t
Diversity measure: Q Statistic
Drift detection method: Detect Drift
Output: the generalized ensemble model at each time step t
(1)For each incoming data chunk do
(2)Train new model with data chunk
(3)Test the model with the data chunk
(4)drift ← Detect Drift ( )
(5)if drift == true
(6)adapt models to current data
(7)else
(8)Update the archive with the new model to maximize diversity
(9)End if
(10)If | − 1| t then
(11) {}
(12)
(13)
(14)Endif
(15)Calculate diversity of models
(16)Output the generalized ensemble model
(17)End for
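
The control flow of the listing can be sketched in a few lines of Python. This is a hedged illustration rather than the authors' implementation: the names `race_step`, `train`, `adapt`, and `detect_drift` are hypothetical, the drift signal is assumed to come from the errors of the archived models on the incoming chunk, predictions are combined by simple majority voting, and the diversity based replacement that applies when the archive is full is omitted.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Instance = Tuple[Sequence[float], int]      # (features, label)
Model = Callable[[Sequence[float]], int]    # maps features to a label


def majority_vote(models: List[Model], x: Sequence[float]) -> int:
    """Combine member predictions by simple majority voting."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


def race_step(
    chunk: Sequence[Instance],
    archive: List[Model],
    capacity: int,
    train: Callable[[Sequence[Instance]], Model],
    adapt: Callable[[Model, Sequence[Instance]], Model],
    detect_drift: Callable[[List[bool]], bool],
) -> Model:
    """Process one data chunk and return the ensemble used for prediction."""
    # Assumption: score the archived models on the new chunk before adapting.
    errors = [majority_vote(archive, x) != y for x, y in chunk] if archive else []
    new_model = train(chunk)
    if errors and detect_drift(errors):
        # Drift detected: adapt every preserved model to the current data.
        archive[:] = [adapt(model, chunk) for model in archive]
    ensemble = archive + [new_model]
    if len(archive) < capacity:
        # The archive still has room, so the new model is stored directly.
        archive.append(new_model)
    return lambda x: majority_vote(ensemble, x)
```

Passing the learner, the adaptation routine, and the drift detector as callables keeps the sketch independent of any particular base learner.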

The Recurrent Adaptive Classifier Ensemble (RACE) uses the Early Drift Detection Method (EDDM) [21] to detect drift. If concept drift is detected, the preserved models are adapted to fit the current data. EDDM operates online, since it does not store the training instances for later use.
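
A minimal sketch of an EDDM-style detector is given here to make the idea concrete. It tracks the distance, in instances, between consecutive errors and signals when the mean distance plus two standard deviations falls well below its historical maximum. The class name, the returned status strings, and the default thresholds (0.95 for warning, 0.90 for drift, at least 30 errors) are assumptions based on commonly cited settings; the configuration used with RACE is not stated in the text.

```python
import math


class EDDM:
    """Sketch of an early drift detector based on distances between errors."""

    def __init__(self, warning: float = 0.95, drift: float = 0.90, min_errors: int = 30):
        self.warning = warning
        self.drift = drift
        self.min_errors = min_errors
        self.reset()

    def reset(self) -> None:
        self.n_errors = 0
        self.mean = 0.0          # running mean of distances between errors
        self.m2 = 0.0            # running sum of squared deviations (Welford)
        self.since_last_error = 0
        self.max_score = 0.0     # largest mean + 2 * std seen so far

    def update(self, is_error: bool) -> str:
        """Feed one prediction outcome; returns 'stable', 'warning' or 'drift'."""
        self.since_last_error += 1
        if not is_error:
            return "stable"
        distance = self.since_last_error
        self.since_last_error = 0
        self.n_errors += 1
        delta = distance - self.mean
        self.mean += delta / self.n_errors
        self.m2 += delta * (distance - self.mean)
        std = math.sqrt(self.m2 / self.n_errors)
        score = self.mean + 2.0 * std
        self.max_score = max(self.max_score, score)
        if self.n_errors < self.min_errors:
            return "stable"
        ratio = score / self.max_score
        if ratio < self.drift:
            self.reset()         # start collecting statistics for the new concept
            return "drift"
        if ratio < self.warning:
            return "warning"
        return "stable"
```

Only a handful of running statistics are kept, so no training instances need to be stored.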

3.1. Model Preservation

Preserving previously learned models induces overheads in terms of both storage and computation. For example, iteratively assessing the predictive performance of previously learned models on new data is computationally prohibitive. To prevent the ensemble from growing indefinitely, the size of the ensemble is dynamic. Previously learned models are preserved in an archive of size n. When a data chunk arrives at step t, the preserved models in the archive are adapted to fit the current data. Drift detection indicates whether the new data chunk is drawn from a different data distribution. The model newly generated from the current data chunk is stored directly in the archive if the size of the archive is less than n. Otherwise, to optimize diversity, the model whose removal most increases the diversity among the remaining models is discarded from the archive. RACE combines the prediction outputs of previously learned diverse models that are representative of the current concept with the prediction output of a new model built with the current data chunk to form final decisions on test instances of the current concept.
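
The preservation rule can be sketched as follows, under stated assumptions: each archived model is represented only by a hypothetical name and its per-instance correctness on a common evaluation set, diversity is summarized as the mean pairwise Yule's Q (lower means more diverse), and the new model competes with the archived ones when the archive is full. None of these choices is confirmed by the text beyond the use of the Q statistic.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Entry = Tuple[str, Sequence[int]]  # (model name, 1 = correct / 0 = wrong per instance)


def yule_q(a: Sequence[int], b: Sequence[int]) -> float:
    """Yule's Q for two correctness vectors; 0.0 when undefined (assumption)."""
    n11 = sum(1 for x, y in zip(a, b) if x and y)
    n00 = sum(1 for x, y in zip(a, b) if not x and not y)
    n10 = sum(1 for x, y in zip(a, b) if x and not y)
    n01 = sum(1 for x, y in zip(a, b) if not x and y)
    denominator = n11 * n00 + n01 * n10
    return 0.0 if denominator == 0 else (n11 * n00 - n01 * n10) / denominator


def mean_q(entries: List[Entry]) -> float:
    """Average pairwise Q over all entries; lower values mean more diversity."""
    pairs = list(combinations(entries, 2))
    if not pairs:
        return 0.0
    return sum(yule_q(a[1], b[1]) for a, b in pairs) / len(pairs)


def update_archive(archive: List[Entry], candidate: Entry, capacity: int) -> List[Entry]:
    """Store the candidate; if over capacity, drop the entry whose removal
    leaves the most diverse (lowest mean Q) set of remaining models."""
    pool = archive + [candidate]
    if len(pool) <= capacity:
        return pool
    drop = min(range(len(pool)), key=lambda i: mean_q(pool[:i] + pool[i + 1:]))
    return pool[:drop] + pool[drop + 1:]
```

With two identical archived models and a complementary candidate, one of the duplicates is discarded and the candidate is kept.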

3.2. Archive Size and Transfer Operation

The goal is to minimize computational overheads by maintaining a dynamically sized pool of previously learned models from which the ensemble for learning recurring, sudden, and gradual drifts is generated. RACE performs a transfer of every previously learned model to each new streaming data chunk. To improve the time efficiency of RACE, we implement the transfer operation in a parallel processing manner, which improves the speedup ratio and keeps the runtime satisfactory for nonstationary environments. The archive size is likewise dynamic, to cater for different types of concepts. Parallelization of the transfer operation works best with a reasonable archive size that does not grow indefinitely, which is ensured because models that cause the diversity among the archived models to decrease are removed from the archive. The drift detection mechanism facilitates detection of recurring concepts. The aim is to maintain accuracy as the ensemble size fluctuates. To validate the behavior of the RACE algorithm, we conduct two experiments. The first experiment evaluates the validity of RACE using knowledge transfer. In the second experiment, the behavior of RACE is evaluated using Hidden Markov Models (HMM).
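
One way to parallelize the transfer operation is sketched here with Python's standard `concurrent.futures` module. The function `adapt_all` and its `adapt` callback are hypothetical names introduced for illustration; how RACE actually parallelizes the transfer is not specified in the text.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

M = TypeVar("M")  # model type
C = TypeVar("C")  # data chunk type


def adapt_all(
    archive: List[M],
    chunk: C,
    adapt: Callable[[M, C], M],
    max_workers: int = 4,
) -> List[M]:
    """Adapt every archived model to the new chunk concurrently.

    The returned list keeps the archive order because Executor.map yields
    results in the order of its inputs.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda model: adapt(model, chunk), archive))
```

Because each model is adapted independently, the work divides cleanly across workers. For CPU-bound learners in CPython, a process pool may be the better choice, since threads share the global interpreter lock.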

4. Experimental Configuration

The empirical experiments to assess the performance of RACE were conducted on the Massive Online Analysis (MOA) framework, a software environment for implementing machine learning algorithms and running experiments for online learning. MOA is an open source framework for data streaming mining in evolving environments. The generalization performance of RACE is compared to other state-of-the-art algorithms designed to handle recurring concepts such as the comprehensive online active learning framework (CALMID) [17], Dynamic Updated Ensemble (DUE) [16], Self-Organizing Fuzzy Ensemble Inference System (SOFEnsemble) [15], and Accuracy Weighted Diversity based Online Boosting (AWDOB) [14].

4.1. Datasets Used in the Experiments

We evaluate the performances of the algorithms with data created by five synthetic dataset generators. All data stream generators are available in MOA. The synthetic datasets contain three types of concept drift, namely, gradual, sudden, and recurring concept drift.

The Hyperplane dataset [22] is represented by the set of points x that satisfy $\sum_{i} w_i x_i = w_0$, where $x_i$ is the ith coordinate of x. Two classes are distinguished in the following way: instances for which $\sum_{i} w_i x_i \geq w_0$ are labeled positive, and instances for which $\sum_{i} w_i x_i < w_0$ are labeled negative. Drifts are simulated by changing each weight attribute $w_i = w_i + d\alpha$, where $\alpha$ is the probability that the direction of change is reversed and d is the change applied to every instance. This generator was adopted to create a dataset that contains 1,000,000 instances.
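
The labeling and drift rules of a rotating hyperplane stream can be sketched as follows. The sketch assumes the usual form of this generator, in which an instance is positive when its weighted sum reaches the threshold w_0, each weight moves by d per instance, and each direction of change flips with probability alpha. The function names and the per-weight direction list are illustrative, and this is not the MOA implementation.

```python
import random
from typing import List, Sequence


def hyperplane_label(x: Sequence[float], weights: Sequence[float], w0: float) -> int:
    """Return 1 (positive) when the weighted sum reaches the threshold w0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) >= w0 else 0


def apply_drift(
    weights: List[float],
    directions: List[int],
    d: float,
    alpha: float,
    rng: random.Random,
) -> None:
    """Shift each weight by d along its current direction (+1 or -1), then
    reverse that direction with probability alpha. Modifies both lists in place."""
    for i in range(len(weights)):
        weights[i] += directions[i] * d
        if rng.random() < alpha:
            directions[i] = -directions[i]
```

Calling `apply_drift` once per generated instance yields a gradual drift whose speed is controlled by d.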

The LED dataset [23] is used to predict the digit displayed on a seven-segment LED display. The particular configuration of the generator used for the experiment produces 24 binary attributes, 17 of which are irrelevant. Concept drift is simulated by interchanging relevant attributes. A stream of 1,000,000 instances was generated.

The Random Tree dataset [24] is generated by the Random Tree generator. The dataset contains 1,000,000 instances and 10 attributes. The dataset has four recurring concepts which are evenly distributed among the instances.

The SEA dataset [25] consists of three attributes, where only two are recognized as relevant attributes. All three attributes have values between 0 and 10. The points of the dataset are divided into four blocks with different concepts. In each block, the classification is done using $f_1 + f_2 \leq \theta$, where $f_1$ and $f_2$ represent the first two attributes and θ is a threshold value. The dataset contains 1,000,000 instances.
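
A small sketch of a SEA-style stream follows, assuming the usual rule that an instance is positive when the sum of its first two attributes does not exceed the block threshold. The default thresholds (8, 9, 7, 9.5) are the values commonly quoted for the original SEA concepts and are an assumption here, since the text does not list them; the function names are illustrative.

```python
import random
from typing import Iterator, Sequence, Tuple


def sea_label(f1: float, f2: float, theta: float) -> int:
    """Positive (1) when the two relevant attributes sum to at most theta."""
    return 1 if f1 + f2 <= theta else 0


def sea_stream(
    n_per_block: int,
    thetas: Sequence[float] = (8.0, 9.0, 7.0, 9.5),  # assumed block thresholds
    seed: int = 0,
) -> Iterator[Tuple[Tuple[float, float, float], int]]:
    """Yield (attributes, label) pairs; the concept changes at each block boundary."""
    rng = random.Random(seed)
    for theta in thetas:
        for _ in range(n_per_block):
            f1 = rng.uniform(0.0, 10.0)
            f2 = rng.uniform(0.0, 10.0)
            f3 = rng.uniform(0.0, 10.0)  # irrelevant attribute
            yield (f1, f2, f3), sea_label(f1, f2, theta)
```

Each block boundary produces a sudden drift, because only the threshold changes while the attribute distribution stays the same.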

The last artificial dataset adopted for this study is the STAGGER Boolean Concepts. The dataset presents enough variety of drifts to perform principled studies. It allows a proper analysis considering several types of drift with different amounts of severity and speed. The STAGGER Boolean Concepts generator produces data with categorical features and uses a set of rules to determine the class label. The dataset contains three nominal attributes, namely, size = {small, medium, large}, color = {red, green}, and shape = {circular, noncircular}. Concept drift is simulated by changing the items in the rules. Before the first drift, instances are labeled positive if (color = red) and (size = small). Before the occurrence of the second drift, instances are classified as positive if (color = green) and (shape = circular), and after the second drift, instances are classified as positive only if (size = medium) or (size = large).
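
The three labeling rules can be written directly as a small function. This sketch follows the rules as stated in the text and reads the third rule as a disjunction, since an instance cannot be both medium and large; the function name and the zero-based concept index are illustrative choices.

```python
def stagger_label(size: str, color: str, shape: str, concept: int) -> bool:
    """Return True when the instance is positive under the given concept.

    concept 0: before the first drift
    concept 1: between the first and second drifts
    concept 2: after the second drift
    """
    if concept == 0:
        return color == "red" and size == "small"
    if concept == 1:
        return color == "green" and shape == "circular"
    if concept == 2:
        return size in ("medium", "large")
    raise ValueError("concept must be 0, 1 or 2")
```

The same instance can change label when the concept index changes, which is what makes the drifts in this dataset sudden.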

Table 1 provides a description of the real-world datasets used in the experiments. The datasets include Airlines [26], KDD99 Cup [27], Covertype [28], Poker Hand [29], and Sensor Data [30].

4.2. Evaluation of RACE

This section investigates the proposed algorithm and compares its predictive accuracy and drift handling capabilities with existing ensemble based approaches: CALMID, DUE, SOFEnsemble, and AWDOB. We also investigate in the second experiment the effect of Hidden Markov Model on the predictive performance and its recurrent drift handling capabilities.

The predictive performance and the recurrent drift handling capabilities of RACE were tested on both artificial and real-world datasets, and corresponding ranks of all algorithms are determined in such a way that higher averages represent lower ranks. Significance tests and post hoc comparisons on ranks are performed to determine significance levels and critical differences. The prediction accuracies and average ranks of RACE, CALMID, DUE, SOFEnsemble, and AWDOB are shown in Table 2.

Table 2 shows that RACE performed significantly better than CALMID, DUE, SOFEnsemble, and AWDOB. The Nemenyi test [31] was applied for pairwise comparison, with a critical difference of 1.432. Figure 1 shows the critical difference plots from post hoc Nemenyi tests of average rankings for experiments on all datasets; the average ranks confirm that RACE performed significantly better than the other four algorithms.

To further evaluate the drift handling capabilities of RACE against the other four representative and current algorithms designed to handle concept drift, we introduce the two Kappa evaluation measures, Kappa Temporal and Kappa M, on all the five algorithms designed to handle recurring concepts. The Kappa evaluation measure is widely used in data stream learning and can handle both multiclass and imbalanced class problems. The larger the Kappa value, the more generalized the classifier, and a negative Kappa value is an indication of low predictive accuracy. Kappa Temporal values are shown in Table 3.
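
Both measures compare the classifier's accuracy with that of a trivial baseline: Kappa Temporal uses a no-change classifier that predicts the previous label, and Kappa M uses a majority class classifier. The sketch below computes them over a finished sequence of labels, which is a simplification made here; stream frameworks typically compute the baselines prequentially. The function names are illustrative.

```python
from collections import Counter
from typing import Sequence


def _kappa(p0: float, p_ref: float) -> float:
    """Generic kappa: improvement of accuracy p0 over a baseline accuracy p_ref."""
    if p_ref == 1.0:
        return float("nan")  # undefined when the baseline is already perfect
    return (p0 - p_ref) / (1.0 - p_ref)


def kappa_temporal(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Compare against a no-change classifier that predicts the previous label."""
    n = len(y_true)
    p0 = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # The first instance has no predecessor, so the baseline uses n - 1 instances.
    p_per = sum(y_true[i] == y_true[i - 1] for i in range(1, n)) / (n - 1)
    return _kappa(p0, p_per)


def kappa_m(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Compare against a classifier that always predicts the majority class."""
    n = len(y_true)
    p0 = sum(t == p for t, p in zip(y_true, y_pred)) / n
    p_m = Counter(y_true).most_common(1)[0][1] / n
    return _kappa(p0, p_m)
```

A value of zero means the classifier is no better than the baseline, and a negative value means it is worse.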

Table 4 shows the Kappa M values of all the datasets used.

Kappa values for both Temporal and M are positive, as the attributes in the datasets are, on average, balanced.

The statistical tests applied to Kappa Temporal and Kappa M on both artificial and real-world data streams showed significant differences at the chosen significance level of 0.05. The Nemenyi test [31] was applied for Kappa Temporal and Kappa M for pairwise comparison. The critical difference (CD) is 1.421. RACE performed significantly better than CALMID, DUE, SOFEnsemble, and AWDOB.

4.3. Resources Comparison

To analyze the benefits in terms of resource usage, we compare the CPU time and memory consumption of RACE, CALMID, DUE, SOFEnsemble, and AWDOB using real-world data streams, since they have large numbers of attributes. The ensemble sizes of all the algorithms are dynamic; that is, they vary in size given the task at hand. For both measures, lower values are considered better. Corresponding ranks are determined such that higher averages represent lower ranks.

Table 5 shows the memory consumption (MB) of each algorithm on each dataset.

According to Table 5, in most cases, RACE achieved minimal memory consumption while AWDOB consumed the most memory. The insertion and deletion of models make memory usage lower for RACE when compared to other algorithms.

Table 6 shows the CPU processing time(s) for each algorithm on each real-world dataset.

As shown in Table 6, RACE consumed the least processing time, followed by CALMID, while SOFEnsemble had the longest CPU processing time.

4.4. Accuracy over Time

Graphical plots are generated for each dataset to describe the performance curves of all the tested algorithms at each time step. The x-axis represents the number of processed observations, and the average accuracy is presented on the y-axis. The plots allow the adaptation abilities of all compared algorithms under different streaming conditions to be analyzed. As shown in the accuracy over time plots, RACE achieved the highest predictive accuracies on Hyperplane (81.67%), Stagger (79.34%), Covertype (81.56%), and Sensor Data (80.34%). In total, the RACE average ranking in both artificial and real-world data streams is 1.4, CALMID is 3.2, DUE is 3.9, SOFEnsemble is 2.5, and AWDOB is 4.0.

Figure 2 shows the accuracy over time plots of the five algorithms on the Hyperplane dataset that exhibits gradual concept drift. The accuracy of all the algorithms shows the same trend. RACE performs the best, followed by DUE, and CALMID performs the worst. RACE is designed to adapt to all types of concept drift.

Figure 3 demonstrates the accuracy over time plots of the five algorithms on the Stagger dataset which exhibits sudden concept drift. As can be observed, RACE performs the best, followed by DUE, and CALMID is the third, while SOFEnsemble and AWDOB are the worst.

Figure 4 shows the accuracy over time plots of the five algorithms on the LED dataset which is devised to evaluate the ability to handle sudden concept drift. RACE performs the best, followed by AWDOB and then CALMID. SOFEnsemble and DUE perform poorly.

Figure 5 shows the prediction accuracy of the five algorithms on the SEA dataset which is devised to evaluate the ability to handle sudden and gradual drifts. The trend of all the five algorithms is basically the same. Among them, RACE performs the best, followed by DUE and AWDOB, and SOFEnsemble performs the worst.

Figure 6 shows the accuracy over time plots of the five algorithms on the Random Tree dataset which is devised to evaluate the ability to handle recurring concepts. AWDOB performs well in the first observed instances, but as the number of observed instances increases, RACE outperforms all the four algorithms.

Artificial data streams are typically designed for controlled environments. When handling real-world classification problems, several challenges emerge. The major issue is the identification and location of the concept drifts. Accordingly, RACE was evaluated on real-world data streams, namely, Airlines, Forest Covertype, KDD99 Cup, Poker Hand, and Sensor Data. With the five real datasets and the five observations, significance tests were performed and the obtained results showed improvements. Figures 7 to 11 show the accuracy over time plots of the five algorithms on the five real-world datasets.

RACE achieved the highest predictive accuracies: Covertype, 81.56%; Poker Hand, 84.31%; Sensor Data, 80.34%. The overall average ranking of RACE is 1.4, CALMID 3.2, SOFEnsemble 3.9, DUE 2.5, and AWDOB 4.0.

Figure 7 shows the accuracy over time plots of the five algorithms on the Airlines dataset. DUE performs well in the first observed instances, but as more instances are observed, RACE performs the best. SOFEnsemble performs the worst.

Figure 8 shows the accuracy over time plot of the five algorithms on the KDD99 dataset. RACE performs the best, followed by DUE. SOFEnsemble performs the worst, and the trend is the same for CALMID and AWDOB.

Figure 9 demonstrates the accuracy over time plots of the five algorithms on the Covertype dataset. RACE performs the best, followed by DUE. AWDOB performs the worst.

Figure 10 demonstrates the accuracy of the five algorithms on the Poker Hand dataset. The prediction performance of all the algorithms fluctuates with time. As more instances are observed, RACE performs the best, followed by AWDOB. DUE and SOFEnsemble perform the worst.

Figure 11 shows the accuracy over time plots of the five algorithms on the Sensor Data to evaluate gradual concept drift. RACE performs the best, followed by DUE. SOFEnsemble is the third, and AWDOB and CALMID perform the worst. RACE manages recurrent changes through its detection mechanism and by reusing previously learned concepts, and it generalizes well across different concept drift environments. In contrast, the other ensemble methods do not store previously learned knowledge and lack detection mechanisms, and therefore they adapt poorly to different types of drift.

For all five real-world datasets, RACE subjects all classifiers to a diversity and accuracy evaluation after each iteration. Classifiers that are not representative of the current concept are discarded, while classifiers that are representative of the current concept and those with higher amounts of diversity are retained, which allows RACE to deal appropriately with recurring concepts. RACE on the Poker Hand dataset (84.31%) and DUE on the KDD99 dataset (81.36%) are able to deal with concept drifts appropriately, and this can only be attributed to the periodic inclusion of new base learners, while CALMID and SOFEnsemble do not maintain dynamic pools due to their static ensemble size.
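The retention rule described above can be illustrated with a minimal sketch. It assumes that accuracy on the latest chunk measures how representative a classifier is and that mean pairwise disagreement measures diversity; the function names, both thresholds, and the toy predictions are hypothetical and are not taken from the RACE implementation.

```python
from itertools import combinations


def accuracy(preds, labels):
    """Fraction of chunk instances that a classifier predicts correctly."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def disagreement(preds_a, preds_b):
    """Fraction of instances on which two classifiers give different labels."""
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)


def prune_pool(pool_preds, labels, min_accuracy=0.6, min_diversity=0.5):
    """Return the names of the classifiers retained after one data chunk.

    pool_preds maps a classifier name to its predictions on the latest chunk.
    A classifier is kept if it is accurate on the chunk (representative of
    the current concept) or if its mean disagreement with the rest of the
    pool is high. Both thresholds are hypothetical.
    """
    names = list(pool_preds)
    diversity = {name: 0.0 for name in names}
    # Accumulate the disagreement of each classifier with every other member.
    for a, b in combinations(names, 2):
        d = disagreement(pool_preds[a], pool_preds[b])
        diversity[a] += d
        diversity[b] += d
    others = max(len(names) - 1, 1)
    kept = []
    for name in names:
        acc = accuracy(pool_preds[name], labels)
        div = diversity[name] / others
        if acc >= min_accuracy or div >= min_diversity:
            kept.append(name)
    return kept


# Toy chunk with four hypothetical classifiers.
labels = [1, 0, 1, 1, 0, 1]
pool = {
    "c1": [1, 0, 1, 1, 0, 1],  # accurate on the current concept
    "c2": [1, 0, 1, 1, 0, 0],  # accurate on the current concept
    "c3": [0, 0, 0, 1, 0, 0],  # neither accurate nor very diverse
    "c4": [0, 1, 0, 0, 1, 0],  # inaccurate now, but highly diverse
}
kept = prune_pool(pool, labels)
print(kept)
```

In this toy setting, a classifier such as c4 survives only because of its diversity, which mirrors the idea that a model of a different concept may become useful again if that concept recurs.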

5. Hidden Markov Model-Based RACE

In our next experiment, we investigate the behavior of RACE when the knowledge transfer process is replaced with a Hidden Markov Model (HMM), which acts as a metalearner. HMMs are known to work well in practice as efficient prediction, recognition, and identification systems. An HMM assumes that the hidden state sequence is a first-order Markov chain and that each observation is conditionally independent of the others given the current hidden state; the joint probability of a state and observation sequence can therefore be expressed as a product of transition probabilities and the probabilities of the individual observations.

The Hidden Markov Model is a metalearner that is able to predict when recurring concepts will occur. We can then anticipate recurrent drifts and also choose the most appropriate model for the incoming data chunk. The implementation of RACE using Hidden Markov Models allows the algorithm to better handle recurrent situations in classification problems in dynamic environments, thus enabling the evolving base learner to adapt to recurring concepts in a timely manner. This is made possible by predicting when the drift will happen from the training examples at a given time and by obtaining a similarity level between concepts from a fuzzy similarity function.
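As a minimal sketch of how an HMM can anticipate the next concept, the following code runs a normalized forward pass over a sequence of observations and then propagates the resulting belief one step through the transition matrix. It assumes that hidden states stand for concepts and that observations are discretized symbols (for example, low or high error); the two-state model and all probabilities are hypothetical and are not parameters reported for RACE.

```python
def forward_filter(pi, A, B, observations):
    """Return P(state_t | o_1..o_t) at the last step via the forward algorithm.

    pi[i]   : initial probability of hidden state i
    A[i][j] : probability of moving from state i to state j
    B[i][o] : probability that state i emits observation symbol o
    """
    n = len(pi)
    alpha = [pi[i] * B[i][observations[0]] for i in range(n)]
    total = sum(alpha)
    alpha = [a / total for a in alpha]
    for o in observations[1:]:
        # Predict with the transition matrix, then weight by the emission.
        alpha = [
            B[j][o] * sum(alpha[i] * A[i][j] for i in range(n))
            for j in range(n)
        ]
        total = sum(alpha)
        alpha = [a / total for a in alpha]
    return alpha


def predict_next_state(alpha, A):
    """One-step-ahead distribution over hidden states (concepts)."""
    n = len(alpha)
    return [sum(alpha[i] * A[i][j] for i in range(n)) for j in range(n)]


# Hypothetical two-concept model; symbol 0 is typical of concept 0 and
# symbol 1 is typical of concept 1.
pi = [0.5, 0.5]
A = [[0.9, 0.1], [0.2, 0.8]]
B = [[0.8, 0.2], [0.3, 0.7]]
observations = [0, 0, 1, 1, 1]

belief = forward_filter(pi, A, B, observations)
next_concept = predict_next_state(belief, A)
most_likely = max(range(len(next_concept)), key=next_concept.__getitem__)
print(next_concept, most_likely)
```

The index of the most probable next state could then be used to pick a stored classifier for the incoming chunk, which is one way to read the model selection step described above.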

5.1. Description of the Algorithm

Multilayer Perceptrons (MLPs), J48 Decision Trees, and Support Vector Machines are used as the base learners. They process the training instances from the time series data by means of an incremental learning algorithm to generate, from each data chunk, a classifier that represents the underlying concept. A pool that stores all concept representations is created. The drift detection mechanism (DDM) continuously monitors the error rate generated by the learning algorithm; if the error rate exceeds a predefined threshold, the DDM generates a warning and a new classifier is learned. A metamodel is trained from the information provided by the drift detection mechanism, and the metamodel evolves as new concepts are detected. The fuzzy concept similarity approach determines whether the underlying concept is recurrent, in which case previously learned models are applied.
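The error-rate monitoring step can be sketched as follows. The sketch assumes the widely used rule of the Drift Detection Method of Gama et al., in which a warning is raised when the error rate plus its standard deviation exceeds the recorded minimum by two standard deviations and a drift is signalled at three; the paper only states that a predefined threshold is used, so the class name, the levels, and the warm-up length are assumptions.

```python
import math


class SimpleDDM:
    """Error-rate monitor in the style of the Drift Detection Method."""

    def __init__(self, min_instances=30, warning_level=2.0, drift_level=3.0):
        self.min_instances = min_instances  # warm-up before any signal
        self.warning_level = warning_level  # assumed warning multiplier
        self.drift_level = drift_level      # assumed drift multiplier
        self.reset()

    def reset(self):
        self.n = 0
        self.p = 0.0                  # running error rate
        self.p_min = float("inf")     # error rate at the recorded minimum
        self.s_min = float("inf")     # standard deviation at that minimum

    def update(self, error):
        """error is 1 for a misclassification, 0 otherwise."""
        self.n += 1
        self.p += (error - self.p) / self.n
        s = math.sqrt(max(0.0, self.p * (1.0 - self.p)) / self.n)
        if self.n < self.min_instances:
            return "stable"
        if self.p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, s
        if self.p + s > self.p_min + self.drift_level * self.s_min:
            self.reset()              # start monitoring the new concept
            return "drift"
        if self.p + s > self.p_min + self.warning_level * self.s_min:
            return "warning"          # a new classifier would start training
        return "stable"


# Synthetic error stream: about 10% errors, then about 50% errors.
errors = [1 if i % 10 == 0 else 0 for i in range(100)]
errors += [1 if i % 2 == 0 else 0 for i in range(100)]

detector = SimpleDDM()
states = [detector.update(e) for e in errors]
print(states.index("warning"), states.index("drift"))
```

On this synthetic stream the detector stays quiet while the error rate is stable and raises a warning before it signals a drift, which is the window in which a new classifier can be trained.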

In this case, previously learned highly diverse models are no longer trained as they are stable models that adequately represent specific concepts.

5.2. Experimental Analysis

To compare the performance of RACE that uses knowledge transfer and the RACE that uses Hidden Markov Models, we use the same synthetic datasets and real-world datasets used to compare the predictive performance of RACE with recent state-of-the-art algorithms designed to handle recurring concepts in dynamic environments.

Using the MOA framework, the performance of the analyzed algorithms is evaluated with respect to accuracy, time efficiency, and memory usage on both synthetic datasets and real-world datasets. Table 7 shows the prediction accuracy of RACE using Markov Models.

The performance of RACE is also evaluated with respect to CPU processing time in seconds. Table 8 shows the CPU processing time in seconds.

Concerning runtime, online ensembles like AWDOB require the most time for classification, followed by ARF and DP. RACE is the least time-consuming. This is partly because the combination of Hidden Markov Models with a drift detection mechanism offers quicker reactions to sudden and recurring concept drift compared to other methods. For this reason, RACE is in a better position to capture changes with Hidden Markov Models much more efficiently and adapt to different types of drifts accurately and timeously.

Memory consumption on the real-world datasets that have many attributes is shown in Table 9.

The memory consumption of SOFEnsemble, CALMID, and AWDOB is more than that of RACE and DUE. The three algorithms maintain a large pool of historical concepts which are checked for reuse. RACE and DUE require the least memory storage due to their pruning strategy.

5.3. Comparison of Accuracy Performance

To compare the accuracy of the five algorithms over multiple datasets, we follow the methodology proposed by Demšar [32]. We first use the nonparametric Friedman test to determine whether there is a statistically significant difference between the rankings of the compared algorithms. We then perform the Nemenyi post hoc test with average rank diagrams. The rankings are depicted on the axis such that the best-ranking algorithms are at the rightmost part of the diagram. The algorithms that do not differ significantly are connected with a line. The critical difference (CD) is indicated above the graph.
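The two-step procedure can be sketched in a few lines. The code below computes the Friedman chi-square statistic from average ranks and the Nemenyi critical difference, using the critical values for a significance level of 0.05 tabulated by Demšar. The three algorithms A, B, and C, their average ranks, and the number of datasets are hypothetical illustration values and are not results from this study.

```python
import math

# Critical values q_alpha at alpha = 0.05 for the two-tailed Nemenyi test,
# indexed by the number of compared algorithms k.
Q_ALPHA_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850}


def friedman_chi2(avg_ranks, n_datasets):
    """Friedman statistic computed from average ranks over n_datasets."""
    k = len(avg_ranks)
    sum_sq = sum(r * r for r in avg_ranks.values())
    return 12.0 * n_datasets / (k * (k + 1)) * (sum_sq - k * (k + 1) ** 2 / 4.0)


def nemenyi_cd(k, n_datasets):
    """Critical difference between average ranks for the Nemenyi test."""
    return Q_ALPHA_005[k] * math.sqrt(k * (k + 1) / (6.0 * n_datasets))


def significant_pairs(avg_ranks, n_datasets):
    """Pairs of algorithms whose average ranks differ by at least the CD."""
    cd = nemenyi_cd(len(avg_ranks), n_datasets)
    names = sorted(avg_ranks, key=avg_ranks.get)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if avg_ranks[b] - avg_ranks[a] >= cd
    ]


# Hypothetical average ranks of three algorithms over ten datasets.
ranks = {"A": 1.2, "B": 2.0, "C": 2.8}
chi2 = friedman_chi2(ranks, n_datasets=10)
cd = nemenyi_cd(k=3, n_datasets=10)
pairs = significant_pairs(ranks, n_datasets=10)
print(chi2, cd, pairs)
```

The Friedman statistic is compared against a chi-square distribution with k - 1 degrees of freedom; only if the null hypothesis is rejected are the pairwise rank differences compared against the critical difference.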

As can be observed from the critical difference (CD) plots, RACE outperforms the other algorithms most of the time.

Figure 12 shows the critical difference plots from post hoc tests of rankings for experiments on the datasets used.

The nonparametric Friedman test was carried out to extend the analysis of comparing multiple classifiers over multiple datasets. The null hypothesis for the test was that there is no difference between the performances of all the tested algorithms. When the null hypothesis is rejected, the Nemenyi post hoc test is employed to verify whether the performance of our algorithm, RACE, is statistically different from that of the other algorithms used for comparison. The critical difference (CD) from the average rank diagram shows that our algorithm is significantly better than the four recent representative algorithms on nonstationary time series data.

6. Conclusion

This paper presented a novel and evolving algorithm called Recurrent Adaptive Classifier Ensemble (RACE) to handle recurring concepts. RACE stores previously learned highly diverse models that are adapted using a new data chunk. We conducted two empirical experiments to evaluate the effectiveness of RACE in streaming environments associated with recurring concepts. In the first experiment, we created an ensemble of previously learned highly diverse classifiers and used the concept of knowledge transfer to select diverse classifiers that are representative of the current concept from the latest data chunk. A drift detector was used in the algorithm to determine whether a drift has occurred. Results show that incorporating knowledge transfer and drift detection improves the prediction accuracy of the algorithm for nonstationary time series data.

In the second experiment, we investigated the behavior of the RACE algorithm when knowledge transfer is replaced by the Hidden Markov Model to predict upcoming drift with previously trained classifiers used to test similarity of past concepts to a present concept. Results show that using Hidden Markov Models to anticipate drift does not make the algorithm run efficiently enough for use in nonstationary time series data streams.

This paper has opened new directions for research in which recurring concepts are learned in a timely manner in nonstationary time series data with the least computational overheads. It is evident from the literature review that this area has not been fully explored. Even though the RACE algorithm is novel, it has its own weaknesses. It can be computationally expensive, as it requires large memory to store all the highly diverse classifiers and additional storage during concept transfer. Furthermore, as the ensemble increases in size, convergence to recurring concepts slows down because the concept transfer process requires more time, which compromises the usability of the algorithm in nonstationary time series data where a classification delay can prove costly. Regardless of these weaknesses, the expectation is that many more approaches to handling recurring concepts in nonstationary time series data can be explored and developed, so that their prediction performance can be compared with that of the RACE algorithm proposed in this paper.

Data Availability

The research used five artificial datasets, namely, (1) Random Tree generator, (2) SEA generator, (3) LED generator, (4) Stagger, and (5) Hyperplane. The real-world datasets used are (1) Covertype dataset, (2) Sensor Data (Intel Lab Data), (3) KDD99 Cup dataset, (4) Poker Hand dataset, and (5) Airlines dataset. The artificial and real-world data used to support the findings of this study have been deposited in the following repositories and sources: (1) Random Tree generator: Cunningham P., Nowlan N., Delany S.J., and Haahr M., 2003, “A Case-Based Approach to Spam Filtering that Can Track Concept Drift,” in the Proceedings of ICCBR-2003 Workshop on Long-Lived CBR Systems. (2) SEA generator: Wang H., Fan W., Yu P.S., and Han J., 2003, “Mining Concept-Drifting Data Streams Using Ensemble Classifiers,” in the Proceedings of 9th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining KDD-2003, ACM Press, pp. 226–235. (3) LED generator: Cunningham P., Nowlan N., Delany S.J., and Haahr M., 2003, “A Case-Based Approach to Spam Filtering that Can Track Concept Drift,” in the Proceedings of ICCBR-2003 Workshop on Long-Lived CBR Systems. (4) Stagger dataset: J.C. Schlimmer and R.H. Granger Jr., “Incremental Learning from Noisy Data,” Vol. 1, 1986, pp. 317–354. (5) Hyperplane: A. Bifet and R. Kirkby, Tutorial 1. Introduction to MOA Massive Online Analysis (Accessed 10.04.17).

Conflicts of Interest

The authors declare that they have no conflicts of interest.