Table of Contents Author Guidelines Submit a Manuscript
Computational Intelligence and Neuroscience
Volume 2018, Article ID 4276291, 13 pages
https://doi.org/10.1155/2018/4276291
Research Article

A Classifier Graph Based Recurring Concept Detection and Prediction Approach

1School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
2School of Computer and Information Technology, Xinyang Normal University, Xinyang 464000, China
3Deakin University, Melbourne, VIC 3125, Australia
4Institute for Intelligent Systems Research and Innovation, Deakin University, Waurn Ponds, VIC 3220, Australia

Correspondence should be addressed to Zhihai Wang; nc.ude.utjb@47021131

Received 27 October 2017; Revised 15 January 2018; Accepted 19 April 2018; Published 7 June 2018

Academic Editor: Christian W. Dawson

Copyright © 2018 Yange Sun et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

It is common in real-world data streams that previously seen concepts will reappear, which suggests a unique kind of concept drift, known as recurring concepts. Unfortunately, most of existing algorithms do not take full account of this case. Motivated by this challenge, a novel paradigm was proposed for capturing and exploiting recurring concepts in data streams. It not only incorporates a distribution-based change detector for handling concept drift but also captures recurring concept by storing recurring concepts in a classifier graph. The possibility of detecting recurring drifts allows reusing previously learnt models and enhancing the overall learning performance. Extensive experiments on both synthetic and real-world data streams reveal that the approach performs significantly better than the state-of-the-art algorithms, especially when concepts reappear.

1. Introduction

In recent years, with the technological advance, a growing number of applications produce large amounts of data streams at high speed. Examples include sensor networks, spam filtering systems, traffic control, and intrusion detection [1, 2]. More formally, a data stream S is a potentially unbounded, ordered sequence of instances, which arrive continuously at high-speeds.

One of the biggest challenges in data stream learning is to deal with concept drift [35], i.e., the underlying concept may drift dynamically over time. Concept drift frequently occurs in the real world. For example, in recommend systems, user consumption preferences may change over time due to fashion, economy, and so on and weather prediction models may change according to the seasons. Such changes lead to a drastic drop in classification performance. A reasonably useful classifier should have the capability to recognize and respond to such changes accordingly and accurately. This study focuses on the topic of developing classifier learning systems for mining data streams in dynamic environments.

Concept drift can be divided depending on their speed, into sudden and gradual drifts [4]. Sudden concept drift is characterized by large amounts of changes between the underlying class distribution and the incoming instances in a relatively short amount of time. Gradual concept drift can take a very large amount of time to see a significant change in differences of underlying class distributions between the old instances and the incoming instances. In fact, no matter what type of change occurs, the model should be able to track and adapt to changes accordingly.

It is common that in real-world data streams previously seen concepts may reappear in the future. For example, weather prediction models change according to the seasons and a popular topic may appear in a social network during the time of the year (i.e., festivals or elections) [4, 6]. This demonstrates a unique kind of drift, known as recurring concepts [6]. For example, news reading preference of a user may change over time. A user can have different choices on mornings, evenings, weekdays, and weekends. In addition, a user might search for astrology articles in the beginning of the year and financial articles at the beginning of each quarter. Unfortunately, only few approaches take recurring concepts into consideration [68]. In the situation where concept may recur, the classification models which have been used in the past may apply to the future use. However, most existing papers on drift detection neglect this phenomenon and tend to take the concept which occurs after a drift as a new one.

If a concept reappears, the previously learnt classifiers should be reapplied; thus the performance of the learning algorithm can be improved. This study focuses on the problem of recurring concepts. Two crucial problems will be addressed: How to detect the recurring concepts. How to effectively adapt when a recurring concept is detected.

Due to the fact that drift detection can capture changes in data streams timely and then update the predictive model, the predictive model can maintain high accuracy. Following these critical motivations, an efficient scheme is designed to deal with the above issues. The key contribution of our algorithm is threefold:

A Drift Detection Method named Distribution-Based Detection Method (DBDM) is introduced aiming at addressing the first issue. It detects changes by comparing the distribution of data in different time windows based on Bernstein inequality.

An algorithm named the Recurrent Detection and Prediction (RDP) approach is introduced targeting on solving the second problem. It stores concepts which present previously occurred concepts with a graph model.

The performance comparison results of the proposed algorithms evaluated on a variety of datasets demonstrated that our method is both stable enough on the data streams with gradual concept drifts and flexible enough to adapt effectively to sudden concept drifts and the recurring concepts problem.

The rest of the paper is as follows. Section 2 reviews the related work. In Section 3, we discussed the basic ideas of the presented algorithms in detail. Section 4 provided the experimental results tested on both real and synthetic datasets followed by analysis and discussions. In the last section, we draw conclusions and discussed future work.

2. Related Work

In this section, first, some relevant concepts will be introduced, then, several related algorithms will be reviewed, and finally, based on these previous works, the original contributions of the paper will be summarized.

2.1. Basic Concepts and Notations

Definition 1 (data streams). Let be an infinite d-dimensional data streamEach instance is a pair (xt, yt), where xt is a vector of attribute values arriving at the time stamp t and yt is the class label of xt.

Definition 2 (concept drift). The term concept drift can be formally defined as any scenario where the joint probability, which represents data distribution, changes over time, i.e., [9].

Definition 3 (sudden concept drift). As shown in Figure 1(a), sudden concept drift means that the distribution of data will be changed directly to a new one in a relatively short time.

Figure 1

Definition 4 (gradual concept drift). As shown in Figure 1(b), gradual concept drift means that the probability of the old data distribution will decrease and the probability of a new distribution will increase during a very large amount of time to see a significant change.

Definition 5 (recurring concept drift). As shown in Figure 1(c), a recurring concept drift occurs when the instances from a period k are generated from the same distribution as a previously observed period Pk(xi, yi) = (xi, yi).

Definition 6 (classifier graph). A classifier graph is a graph whose nodes are stored with distinct classifiers derived from a given stream data. Each node in a classifier graph stores a classifier; directed link represents a drift; weight of a link represents drifting times.

2.2. Handling Concept Drift in Data Streams

Approaches to cope with concept drift can be divided into two main categories: passive and active approaches [5, 10]. The first approach adapts a learner at regular intervals without considering whether changes have really occurred, and it tracks changes blindly and updates the model continuously without requiring explicit change detection, while the second approach only makes adjustment when a drift occurs.

Active approaches typically require change detection modules. Gama et al. [11] presented a Drift Detection Method (DDM) which detects change by monitoring the classification error rate. Baena-Garcia et al. [12] introduced a detection algorithm called Early Drift Detection Method (EDDM) which has a better performance in the scenario of gradual change. Adaptive Windowing (ADWIN) [13] adopted a sliding window to store instances recently read and divides the window into two subwindows to monitor changes in the subwindows. EWMA for Concept Drift Detection (ECDD) was introduced by Ross et al. [14], which uses exponentially weighted moving average charts to monitor the error rate. Sakthithasan et al. [15] proposed an algorithm named SeqDrift which adopts reservoir sampling to manage the memory and improve the detection sensitivity in the case of slow gradual change.

Ensemble classifiers are one of the most popular passive approaches, and many active approaches based on ensemble are available in the literature. Streaming Ensemble Algorithm (SEA) [16] is one of the earliest solutions which adopt an ensemble strategy to address concept drift. It makes final prediction using a simple majority voting. Accuracy Weighted Ensemble (AWE) [17] is similar to SEA. It decides the weight of base leaner according to the classification accuracy of the learner on test datasets. Dynamic Weighted Majority algorithm (DWM) [18] is an ensemble based on the weighted majority mechanism and each classifier corresponds to a weight which can change dynamically. Online Coordinate Boosting (OCBoost) [19] is an online boosting algorithm, which adopts online boosting strategy to achieve a better performance. Diversity for Dealing with Drifts (DDD) [20] is a novel ensemble method which maintains ensembles with different levels of diversity.

While active approaches, such as change detection, work quite well in coping with sudden concept drift, passive approaches work better for gradual drift which is nevertheless more difficult to detect. In order to react to different types of concept drifts immediately, some approaches try to utilize both passive and active techniques to aid learning in nonstationary environments, which seek to combine the best elements of both passive approaches and active approaches.

As we pointed out earlier it is very common in real-world data streams that previously seen concepts will reappear. This suggests a unique kind of concept drift, known as recurring concepts. However, none of the aforementioned approaches take the problem of Recurring Concept Drifts into account. The concept of Recurring Concept Drifts has been introduced by Widmer et al. In [8], Katakis et al. presented a framework to detect recurring contexts. In this framework, batches of instances are mapped into conceptual vectors and stream clustering is used to group these conceptual vectors into different clusters. Then incremental classifiers are trained for every cluster. When a new batch of instances arrives, it was assigned into an existing cluster or a new cluster is created for it. Gomes et al. [21] introduced a data stream learning system which exploited context information to add and delete classifiers in the ensemble to improve existing ensemble approaches to deal with concept drift and recurring concept. Abad et al. [22] devised a framework to solve the issue of recurring concept in spam filtering. The framework handles the context which is related to concept drift. Gonçalves et al. [23] presented a framework named Recurring Concept Drifts (RCD) to handle recurring concept drift. In the framework, the new classifier is created when new concept occurs and a group of instances corresponding to the new concept is stored. When a new drift is detected, RCD compares the incoming instances with the previous ones to validate whether the two sets of instances come from the same distribution. If they are from the same distribution, the previous classifier is reused.

3. Learning Recurring Concept from Classifier Graph

In this section, a novel change detection paradigm based on the distribution between two subwindows will be introduced first, and then an internal classifier graph empowered drift detection mechanism will be presented in detail.

3.1. Change Detection Problem

Let W1= (xt+1, …, xt+n) denote reference window, let W2 = (xt+n+1, …, xt+2n) represent current window, let and be the average value of W1 and W2, and let μW1 and μW2 be their expected value. In fact, the underlying distribution of data is unknown and a test statistic based on sample means needs to be adopted by the change detector. This is accomplished by carrying out statistical tests that verifies whether the classification error or class distribution remains constant over time. The problem of change detection in data streams is to determine the null hypothesis H0 against alternative hypothesis H1 as follows:where is a confidence value, while is a function of δ when test statistics are used to model the difference between the means of instance in two windows. The change detection raises a change alarm, when the difference is greater than a threshold.

3.2. Distribution-Based Change Detection Using Bernstein Inequality

In this paper, a Distribution-Based Detection Method (DBDM) was presented by comparing the distribution of data in different time windows. The two-window paradigm is exploited by comparing error rate of the classifiers extracted from old and recent data. A correct prediction can be considered to be 1 and incorrect prediction to be 0. In the change detection, windows contain simple information (bits or numbers); thus the change tests are really simple. The sliding window W was partitioned into two equal length subwindows: a left subwindow W1 and a right subwindow W2 with means μ1 and μ2, respectively. Whenever the amount of new data reaches m, the boundary between the new data and the data arrived before it is taken as a check point. Then every check point is traversed to check whether the difference between the mean value of the data in the left subwindow of the check point and that of the right subwindow is greater than a threshold.

The key to the detection method is the calculation of the threshold. The inequalities which are often used to depict the difference between two distributions are Hoeffding inequality [24], Chernoff inequality [25], and Bernstein inequality [26]. Among them, the Hoeffding inequality is widely used in previous research. However, the Hoeffding inequality neglects the effect of variance, which leads to imprecise results in the case of small variance. Bernstein inequality associates expected value with variance. For this reason, a more precise threshold can be obtained by using Bernstein inequality. It is defined as follows: where are independent random variables, , is the expected value, and is the variance.

The probability of capturing concept drift at a certain check point is no more than , which can be expressed as shown in

Applying Boole's inequality, (4) can be converted intowhere k is the proportion of data among the left and right side.

Apply Bernstein inequality on the RHS of (5): where and represent variances. By substituting a and c with 0 and 1 in (6), we obtain

It is worth noting that formulation of the optimization problem for determining k is based on asymptotic behavior and so that k's value is approximate. Equating the two terms in expression yields

Let in (7) be equated to the user-assigned and set the two exponents to be equal. We have

We have equated to the left exponential term in (7) instead of the right one to get . We have

We set and , and then and can be calculated. When the difference of the mean values is greater than , meaning that a drift has been triggered. And when difference of the mean values is greater than , we confirm that the concept has been changed.

In DBDM, before an actual change point is detected multiple check points need to be detected. In multiple testing problems, as the number of hypotheses being tested increases, the likelihood of incorrectly rejecting a null hypothesis increases. Most of the existing algorithms adopt Bonferroni correction [27] to avoid the problem. However, Bonferroni correction is less effective when there are a large number of tests. For this reason, we adopt the error correction factor based on Šidák correction [28]. is modified according to where n denotes the number of hypotheses. The pseudocode is presented in Algorithm 1.

Algorithm 1: The pseudocode of DBDM.
3.3. Recurrent Detection and Prediction Approach

In this section, we design a novel triggered rebuild approach named Recurrent Detection and Prediction (RDP) approach that uses a weighted directed graph to predict which previously occurred concept is the most likely to recur when a new concept drift is detected.

In RDP, the recurring concepts and corresponding instances will be stored. Directions and weights of the arrows indicate the transformation relation of the concepts stored in graph. Whenever the change detection reaches the warning level, a set of instances will be stored and a new classifier will be trained. When the state is transformed into drift, verification will be carried out to see whether the latest stored instances and instances which have been stored in graph are same. If they are drawn from the same distribution, the current concept will be regarded as a recurring concept. Then the stored instances in classifier will be utilized in order to replace the current classifier. Otherwise, we regard that a concept has occurred. Then a new classifier is trained and the set of instances are in a graph.

As defined in classifier graph, each node stores a classifier and a set of instances used to induce the classifier. When concept drifts occur, one will be added to the weight of the arrow whose to-node is the new concept and the from-node is the concept occurring before the drift. And next time when a drift is detected, classifier graph can be employed to predict which previously occurred concept is the most likely to recur this time. The algorithm is illustrated in detail with an example as Figures 25.

Figure 2: Classifier graph diagram.
Figure 3: Concept 2 recurs.
Figure 4: Concept 5 recurs.
Figure 5: New concept occurs.

As shown in Figure 2, provided that the current concept is concept 1, we can detect concept drift after a period of time. Because the arrow directed from concept 1 to concept 3 has the maximum weight among the arrows whose from-node is concept 1, we assume that concept 3 recurs this time. If it is false, we choose the arrow whose weight is the second maximum. The procedure will be repeated. If all arrows whose from-node is concept 1 which does not recur this time, then the node which is not the one of to-node of concept 1, i.e., concept 5, will be checked to see whether it is a recurring concept. There are three cases:

Concept 2, concept 3, or concept 4 recurs this time. We assume that concept 2 recurs, and then we add the weight of the arrow from concept 1 to concept 2, as shown in Figure 3.

As shown in Figure 4, concept 5 is the recurring concept. An arrow will be added from concept 1 to concept 5.

When a new concept 6 occurs, a node will be created and an arrow will be added from concept 1 to concept 6, as shown in Figure 5.

Let G represent the graph, V is the set of vertexes in G, vexnum denotes the serial number of vertexes in G, Cn is a new classifier which applies to the new concept, Vk.C represents the previous classifier of Vk. Bn represents a set of instances which corresponding to new concept, and p is the order of the old concept in G. The pseudocode of RDP is listed in Algorithm 2.

Algorithm 2: The pseudocode of RDP.

4. Experiment Results and Analysis

The algorithms were carried out with help of MOA (Massive Online Analysis) (Version: MOA Release 2017.7) [29]. MOA is a software environment for implementing algorithms and running experiments for online learning. All of the experiments were performed on an Intel Core i3-2120 CPU @ 3.3GHz with 8GB of RAM running Windows 7.

To evaluate the performance of the proposed algorithms, a series of experiments were carried out. Two main sets of experiments are presented:

Experiment 1 aims to verify the fact that the performance of the proposed change detection in different drift scenarios.

Experiment 2 is set to compare the performance of the RDP with other state-of-the-art approaches.

4.1. The Analysis of DBDM

In the first experiment, the performance of DBDM is compared against the following algorithms: DDM [11], EDDM [12], and ECDD [14] in terms of the false positive counts. In DBDM, we set , and the size of block is 500. In ECDD, we set ARL0=1,000 and .

We generated four datasets. Each has 200,000 instances which were drawn from a stationary Bernoulli distribution and the mean value of the Bernoulli distribution is set to 0.05, 0.1, 0.3, and 0.5, respectively [13]. To make the experimental more reliable, the algorithms were carried out on each dataset for 10 times and then calculated the mean.

Table 1 shows that the false positive count of DBDM is lower than EDDM and ECDD, but it is higher than DDM. It also demonstrates that the false positive count of DBDM is reduced from 3.39 to 0.03 with the increase of the mean value of Bernoulli distribution.

Table 1: Average false positive counts on stationary Bernoulli distribution.

Then, we compared the performance of detectors under sudden concept drift scenario. We generated four datasets of 200,000 instances drawn from a Bernoulli distribution. The first 100,000 instances were drawn from stationary Bernoulli distribution whose mean value was 0.01. And the mean of the last 100,000 instances was raised to 0.05, 0.1, 0.3, and 0.5 for the four datasets separately.

Table 2 shows that the false detection of DBDM is the lowest when the change of the mean is small. However, the false positive rate of DBDM is increased with the increase of the mean value.

Table 2: Average false positive counts on sudden concept drift.

Table 3 shows the average detection delays under sudden concept drift. In general, the detection delays decrease with the increase of the mean value. The detection delay of DDM is higher when the mean value increment is low, and then it decreases rapidly. The detection delay of EDDM is high when the mean value increment is 0.04, and then it decreases gradually. At the beginning, the delay of ECDD is lower than DDM and EDDM, and then it drops slightly. However, the delay of ECDD increases when the mean value of increment is 0.29. The detection delay of DBDM is lower and it has a downward trend. This is due to the fact that the management of the recurrent change detection mechanism is capable of reusing previous concepts and gains the better performance in different situations, particularly under concept drift environments.

Table 3: Detection delays on an abrupt drift.

Finally, we investigated the performances of the four detection methods in the scenario of gradual change. When drift happens incrementally, the change is not obvious. Therefore, we mainly focus on detection delay and false negative. Detection delay is the distance between the instance at which the change is detected and the instance at which the change really occurs. In the experiment, we generate a dataset contains 1,000,000 instances. The first 998,000 instances are stable, and the mean value is 0.01. Then mean values of the last 2,000 instances of the four datasets rise with a different slope separately. The slopes are 0.0001, 0.0002, 0.0003, and 0.0004. The false negative counts are as shown in Table 4. The false negative counts of ECDD and DBDM are 0. And the false negative counts of DDM and EDDM decrease with the increase of the slope.

Table 4: Average false negative counts on a gradual drift.

The detection delays are demonstrated in Table 5. It can be seen that DDM and EDDM are not good at detecting gradual drift. ECDD is rather high, although it reduces a little with the increase of the slope. DBDM is high at the beginning, but it reduces obviously with the increase of the slope. It demonstrates that DBDM is superior to other methods. It is partly because the management of the recurrent change detection mechanism is capable of reusing historical concepts and achieving the better performance in this scenario.

Table 5: Detection delays on a gradual drift.
4.2. Comparative Performance Study

This part demonstrates the experimental results with regard to the effectiveness and efficiency of the proposed method.

4.2.1. Datasets

In the second experiment, we adopted four synthetic and five real-world datasets. The datasets are summarized in Table 6.

Table 6: Description of the nine datasets.

Synthetic Datasets. In the experiment, the synthetic datasets contain three types of concept drift: gradual, sudden, and recurring concept drift.

HyperPlane dataset is represented by the set of points x that satisfy , where xi is the ith coordinate of x. Two classes are distinguishing in the following way: instances for which are labeled positive and instances for which are labeled negative. Drifts was introduced by changing each weight attribute = + dσ, where σ is the probability that the direction of change is reversed and d is the change applied to every instance. This generator was adopted to create a dataset contains 1,000,000 instances with gradual drifts by the modification weight changing by 0.001 with each instance, and 5% noise was added to streams.

LED dataset is to predict the digit displayed on a seven-segment LED display. The particular configuration of the generator used for the experiment produces 24 binary attributes, 17 of which are irrelevant. Concept drift is simulated by interchanging relevant attributes. We generated a stream of 1,000,000 instances with sudden concept drifts and 15% of noise.

Random Tree dataset is generated by Random Tree generator and it contains 1,000,000 instances and 10 attributes. It has four recurring concepts which evenly distributed throughout the 1,000,000 instances.

SEA dataset consists of three attributes, where only two are a relevant attributes. All three attributes have values between 0 and 10. The points of the dataset are divided into four blocks with different concepts. In each block, the classification is done using f1 + f2, where f1 and f2 represent the first two attributes and is a threshold value. The most frequent values are 9, 8, 7, and 9.5 for the data blocks. It contains 1,000,000 instances with sudden drifts reappearing every 250,000 instances and 10% of noise.

Real-World Datasets. Real-world stream environment conceptual changes have unpredictability and uncertainty which can better verify the performance of the algorithm.

Emailing list (Elist) contains a stream of emails on various topics which are shown to the user one after another and are marked as interesting or junk. It is composed of 1, 500 instances with 913 attributes and is divided into 5 periods every 300 instances. At the end of each period, the user's interest in a topic changes in order to simulate the occurrence of concept drift. Thus, the transformation between two periods is a signal of drift. The dataset can be obtained at http://mlkd.csd.auth.gr/concept_drift.html. The characteristics of Elist are presented in Table 7, where (+) means an interested email and (--) indicates a spam.

Table 7: Characteristics of Elist.

Spam dataset represents the scenario of gradual concept drift and is based on the Spam Assassin Collection [8] available in http://spamassassin.apache.org/. It consists of 9, 324 instances with 500 attributes.

Usenet dataset simulates a news filtering system with the presence of concept drifts relative to the change of interest of a user over time [8]. The dataset contains 5,931 instances representing documents collected from the 20 Newsgroups. It is available at http://www.liaad.up.pt/kdus/products/datasets-for-concept-drift.

Covertype dataset from UCI archive [30] contains 581,012 instances, 54 attributes, and no missing values. The aim is to predict the forest cover type based on cartographic variables. It can be obtained at http://moa.cms.waikato.ac.nz/datasets/, and then we simulated the dataset into streams by the MOA generators.

Gas Sensor Drift Dataset also from UCI archive contains 13,910 measurements from 16 chemical sensors utilized in simulations for drift compensation in a discrimination task of 6 gases at various levels of concentrations. The dataset was gathered within January 2007 to February 2011 (36 months) in a gas delivery platform facility situated at the Chemo Signals Laboratory in the Bio Circuits Institute, University of California San Diego.

4.2.2. Comparative Study

We compared RDP with five ensemble-based methods: Accuracy Weighted Ensemble (AWE), Ensemble Building (EB), Dynamic Weighted Majority (DWM), Online Coordinate Boosting (OCBoost), and Recurring Concept Drifts (RCD). AWE is the best-known representative of block-based ensembles for data streams. Similar to AWE, EB constructs a subset of classifiers from sequential data chunks and then are used in the ensemble. DWM is based on the weighted majority algorithm, which maintains an ensemble of classifiers and each classifier corresponds to a weight which can change dynamically. OCBoost is an online boosting algorithm. RCD is a framework to deal with recurring concept drift.

The performance of the analyzed algorithms can be evaluated with respect to accuracy, time efficiency, memory usage, and F1-measure. The results are shown in Tables 811. F1-measure represents a harmonic mean between recall and precision. The calculation equation is as follows:

Table 8: Comparison of classification accuracy (%).
Table 9: Comparison of time consumption (Cpu seconds).
Table 10: Comparison of memory consumption (MB).
Table 11: Comparison of F1-measure.

In terms of accuracy, as shown in Table 8, RDP makes the overall best performance on most of datasets. Specifically, RDP demonstrates significantly the best results on the data steams with recurrent concept drift (Random Tree and Elist). On the dataset with gradual concept drift (HyperPlane), the block-based ensemble AWE is the best, followed by EB. Moreover, DWM seems to be the most accurate in the streams with sudden changes (LED and SEA). On the real-world datasets (Covertype and Gas Sensor), RDP clearly outperformed the other algorithms. For all of datasets, RDP is able to significantly boost the performance by using the recurrent change detection.

Concerning run time, as expected, online classifier like OCBoost requires the most time for classification, following by DWM, and RDP is the least time-consuming. This is partly because the addition of change detection mechanism offers quicker reactions to sudden and recurring concept drift compared to other methods. For this reason, RDP is able to capture changes much more efficiently and adapt to different kind of drifts immediately. Also, notice that AWE and EB did not perform as well as RDP due to the fact that they had to wait to accumulate instances into a batch before learning.

Analyzing the values in Table 10, it can be observed that online ensemble DWM and OCBoost require the least memory storage, following by RDP. It is due to the fact that DWM and OCBoost only update weights of classifiers after each incoming instance without storing data. The memory consumption of AWE and EB is more than RCD and RDP. It is partly because of the fact that RCD and RDP maintain a pool of historical concepts which are checked for reuse. For RDP, we may observe a marginal increase in memory consumption, due to the need of storing and processing weights assigned to classifiers. However, the additional cost is practically negligible.

In terms of F1-measure, as shown in Table 11, RDP obtains the overall best performance on most of datasets, followed by RCD, and OCBoost is the worst. This is partly because the management of the recurrent change detection mechanism is capable of reusing previous concepts and gains the better performance in different situations, particularly under concept drift environments. However, other ensemble methods lack detection mechanisms and therefore adapt ineffectively to drifts.

Figure 6 shows the accuracy on the HyperPlane, which is devised to evaluate the ability to handle gradual drifts. It is found from Figure 6 that the trend of all algorithms is basically the same. Among them, AWE is the best, followed by RCD, OCBoost is the worst. The advantages of RDP are not obvious. The reason is the fact that the block-based ensemble classifier AWE is designed to cope mainly with gradual concept drifts.

Figure 6: Accuracy on the HyperPlane.

Figure 7 demonstrates the accuracy on the Random Tree, which is designed to evaluate the ability to handle sudden concept drifts. As can be seen, RDP is the best, followed by RCD, EB and AWE perform almost identically, with DWM being slightly less accurate, and OCBoost is the worst. Whenever a concept drift occurred, the accurate rates of all the algorithms will undergo instantaneous fluctuations except RDP, which maintains a high, stable accuracy and suffered the smallest accuracy drops. This might be attributed to the addition of drift detector which could capture concept drifts promptly and construct a new classifier to handle this type of drift.

Figure 7: Accuracy on the Random Tree.

Figure 8 depicts the accuracy changes on the Elist. It can be observed that the accuracy curves of all algorithms are with varying degrees of volatility, which indicates that concept drift may exists in the dataset. Notice the sudden drops of all methods at drift time points (after 300, 600, 900, and 1,200 instances). However, RDP manages to recover faster in all cases exploiting and updating older models. RDP is the most accurate one, followed by the RCD. Unfortunately, DWM and OCBoost perform poorly on this dataset and the curves of them almost identically, while the accuracy curve of RDP is relatively stable, subjecting to data concept drift minimal impact on real data, which showed that the algorithm had better adaptability for this environment. This is partly because the management of the recurrent change detection mechanism is able to reuse historical concepts and achieve the better performance in different situations, particularly under concept drift environments.

Figure 8: Accuracy on the Elist.

Figure 9 shows the F1-measure changes on the Covertype. An interesting finding is that the curve of OCBoost suffered the most dramatic fluctuation. Apart from these, the F1-measure of RDP and RCD is relatively stable on this dataset. RDP gained the best performance on this dataset. It is also indicates that the algorithm has superior adaptability to the real stream environment.

Figure 9: F1-measure on the Covertype.

The average rank of classification accuracy of RDP on 9 synthetic and real-world data streams wins all other classifiers. To extend the analysis, nonparametric Friedman test was carried out for comparing multiple classifiers over multiple datasets [31]. The null hypothesis for the test is that there is no difference between the performances of all the tested algorithms. Moreover, in case of rejecting this null hypothesis, we employ the Nemenyi test [32] to verify whether the performance of our method is statistically different from the remaining algorithms. The critical difference diagram shown in Figure 10 tells that our method is significantly better than AWE and OCBoost.

Figure 10: A critical different diagram for all classifiers against each other.

To summarize, RDP is superior to the other three from the following aspects: It is capable of constructing a satisfactory model for handling both sudden and gradual concept drifts and has been specifically built to deal with recurring concepts. Compared to other ensembles, our method can achieve better performance and is robust against various noise levels and different types of drift.

5. Conclusion and Future Work

In this study, we focus on the investigation of Recurring Concept Drifts, a special subtype of concept drifts that has not yet drawn enough attention from the research community. A novel method intending to handle recurring concept based on the change detection method using classifier graph was proposed. The approach of detecting recurring drifts which allows reuse of previously learnt models enhanced the learning performance on most datasets. Extensive experiments on synthetic and real-world datasets have validated that the proposed approach not only outperforms the existing popular methods in the adaptation to Recurring Concept Drifts, but also adapts well even in different concept drifting scenarios.

In our ongoing and future research, we will explore an effective procedure which will eliminate redundant classifiers without decreasing the ability to deal with recurring concepts and work toward developing classifier graph based effective practical dynamical learning mechanism.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (no. 61672086, no. 61702030, no. 61771058, no. 61563001, and no. 61572005), the China Postdoctoral Science Foundation (2018M631328), the Natural Science Foundation of Beijing (no. 4182052), and Excellent Young Teachers Program of XYNU (no. 2016GGJS-08).

References

  1. C. C. Aggarwal, Data Streams: Models and Algorithms, Springer, Berlin, Germany, 2007.
  2. J. Gama, Knowledge Discovery from Data Streams, CRC Press, New Yourk, NY, USA, 2010. View at MathSciNet
  3. G. Widmer and M. Kubat, “Learning in the presence of concept drift and hidden contexts,” Machine Learning, vol. 23, no. 1, pp. 69–101, 1996. View at Google Scholar · View at Scopus
  4. A. Tsymbal, “The problem of concept drift: Definitions and related work,” Tech. Rep., Department of Computer Science, Trinity College, Dublin, Ireland, 2004. View at Google Scholar
  5. G. Ditzler, M. Roveri, C. Alippi, and R. Polikar, “Learning in nonstationary environments: a survey,” IEEE Computational Intelligence Magazine, vol. 10, no. 4, pp. 12–25, 2015. View at Publisher · View at Google Scholar · View at Scopus
  6. G. Widmer, “Tracking Context Changes through Meta-Learning,” Machine Learning, vol. 27, no. 3, pp. 259–286, 1997. View at Publisher · View at Google Scholar · View at Scopus
  7. J. Gama, R. Sebastião, and P. P. Rodrigues, “Issues in evaluation of stream learning algorithms,” in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '09), pp. 329–338, ACM Press, New York, NY, USA, July 2009. View at Publisher · View at Google Scholar · View at Scopus
  8. I. Katakis, G. Tsoumakas, and I. Vlahavas, “Tracking recurring contexts using ensemble classifiers: an application to email filtering,” Knowledge and Information Systems, vol. 22, no. 3, pp. 371–391, 2010. View at Publisher · View at Google Scholar · View at Scopus
  9. J. Gama, I. Žliobaitė, A. Bifet, M. Pechenizkiy, and A. Bouchachia, “A survey on concept drift adaptation,” ACM Computing Surveys, vol. 46, no. 4, pp. 231–238, 2014. View at Publisher · View at Google Scholar · View at Scopus
  10. R. Elwell and R. Polikar, “Incremental learning of concept drift in nonstationary environments,” IEEE Transactions on Neural Networks and Learning Systems, vol. 22, no. 10, pp. 1517–1531, 2011. View at Publisher · View at Google Scholar · View at Scopus
  11. J. Gama, P. Medas, G. Castillo, and P. Rodrigues, “Learning with drift detection,” in Learning with drift detection. In: Proceedings of the 17th Brazilian Symposium on Artificial Intelligence (SBIA 2004, LNCS 3171), Lecture Notes in Computer Science, pp. 286–295, Springer, Berlin, Germany, 2004. View at Publisher · View at Google Scholar
  12. M. Baena-García, D. J. Campo-Ávila, R. Fidalgo et al., “Early drift detection method,” in Proceedings of the Fourth International Workshop on Knowledge Discovery from Data Streams (KDD 2006), pp. 77–86, ACM Press, New York, NY, USA, 2006.
  13. A. Bifet and R. Gavalda, “Learning from time-changing data with adaptive windowing,” in Proceedings of the 7th SIAM International Conference on Data Mining (SDM 2007), C. Apte, D. Skillicorn, B. Liu et al., Eds., pp. 443–448, SIAM, Philadelphia, PA, USA, 2007.
  14. G. J. Ross, N. M. Adams, D. K. Tasoulis, and D. J. Hand, “Exponentially weighted moving average charts for detecting concept drift,” Pattern Recognition Letters, vol. 33, no. 2, pp. 191–198, 2012. View at Publisher · View at Google Scholar · View at Scopus
  15. R. Pears, S. Sakthithasan, and Y. S. Koh, “Detecting concept change in dynamic data streams,” Machine Learning, vol. 97, no. 3, pp. 259–293, 2014. View at Publisher · View at Google Scholar · View at MathSciNet
  16. W. N. Street and Y. S. Kim, “A streaming ensemble algorithm (SEA) for large-scale classification,” in Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '01), pp. 377–382, ACM Press, New York, NY, USA, August 2001. View at Publisher · View at Google Scholar · View at Scopus
  17. H. Wang, W. Fan, P. S. Yu, and J. Han, “Mining concept-drifting data streams using ensemble classifiers,” in Proceedings of 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2003), pp. 226–235, ACM Press, New York, NY, USA, 2003. View at Publisher · View at Google Scholar
  18. J. Z. Kolter and M. A. Maloof, “Dynamic weighted majority: an ensemble method for drifting concepts,” Journal of Machine Learning Research, vol. 8, pp. 2755–2790, 2007. View at Google Scholar · View at Scopus
  19. R. Pelossof, M. Jones, I. Vovsha, and C. Rudin, “Online coordinate boosting,” in Proceedings of the 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops 2009, pp. 1354–1361, IEEE, New York, NY, USA, October 2009. View at Publisher · View at Google Scholar · View at Scopus
  20. L. L. Minku and X. Yao, “DDD: a new ensemble approach for dealing with concept drift,” IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 4, pp. 619–633, 2012. View at Publisher · View at Google Scholar · View at Scopus
  21. J. B. Gomes, E. Menasalvas, P. A. Sousa et al., “Mining recurring concepts in a dynamic feature space,” IEEE Transactions on Neural Networks and Learning Systems, vol. 25, no. 1, pp. 95–110, 2014. View at Google Scholar
  22. M. A. Abad, J. B. Gomes, and E. Menasalvas, “Predicting recurring concepts on data-streams by means of a meta-model and a fuzzy similarity function,” Expert Systems With Applications, vol. 46, pp. 87–105, 2016. View at Google Scholar
  23. P. M. Gonçalves and R. S. M. D. Barros, “RCD: A recurring concept drift framework,” Pattern Recognition Letters, vol. 34, no. 9, pp. 1018–1025, 2013. View at Publisher · View at Google Scholar
  24. W. Hoeffding, “Probability inequalities for sums of bounded random variables,” Journal of the American Statistical Association, vol. 58, no. 301, pp. 13–30, 1963. View at Publisher · View at Google Scholar · View at MathSciNet
  25. H. Chernoff, “A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations,” Annals of Mathematical Statistics, vol. 23, no. 4, pp. 493–507, 1952. View at Publisher · View at Google Scholar · View at MathSciNet
  26. T. Peel, S. Anthoine, and L. Ralaivola, “Empirical bernstein inequalities for U-statistics,” in Advances in Neural Information Processing Systems, pp. 1903–1911, 2010. View at Google Scholar
  27. J. J. Goeman and A. Solari, “Multiple hypothesis testing in genomics,” Statistics in Medicine, vol. 33, no. 11, pp. 1946–1978, 2014. View at Publisher · View at Google Scholar · View at MathSciNet · View at Scopus
  28. Z. Sidák, “Rectangular confidence regions for the means of multivariate normal distributions,” Journal of the American Statistical Association, vol. 62, no. 318, pp. 626–633, 1967. View at Google Scholar · View at MathSciNet
  29. A. Bifet, G. Holmes, R. Kirkby, and B. Pfahringer, “Moa: Massive online analysis,” Journal of Machine Learning Research, vol. 11, pp. 1601–1604, 2010. View at Google Scholar · View at Scopus
  30. M. Lichman, “UCI Machine Learning Repository,” Irvine, CA: University of California, School of Information and Computer Science, http://archive.ics.uci.edu/ml.
  31. J. Demšar, “Statistical comparisons of classifiers over multiple data sets,” Journal of Machine Learning Research, vol. 7, pp. 1–30, 2006. View at Google Scholar · View at MathSciNet · View at Scopus
  32. N. Settouti, M. E. Bechar, and M. A. Chikh, “Statistical comparisons of the top 10 algorithms in data mining for classification task,” International Journal of Interactive Multimedia and Artificial Intelligence, Special Issue on Artificial Intelligence Underpinning, vol. 4, no. 1, pp. 46–51, 2016. View at Publisher · View at Google Scholar