Microcluster-Based Incremental Ensemble Learning for Noisy, Nonstationary Data Streams

Liu, Sanmin; Xue, Shan; Liu, Fanzhen; Cheng, Jieren; Li, Xiulai; Kong, Chao; Wu, Jia

doi:https://doi.org/10.1155/2020/6147378

Complexity


Special Issue

Collaborative Big Data Management and Analytics in Complex Systems with Edge


Research Article | Open Access

Volume 2020 | Article ID 6147378 | https://doi.org/10.1155/2020/6147378


Sanmin Liu^1,2, Shan Xue^2, Fanzhen Liu^2, Jieren Cheng^3, Xiulai Li^3,4, Chao Kong^1, and Jia Wu^2

Guest Editor: Xuyun Zhang

Received 23 Oct 2019

Revised 26 Dec 2019

Accepted 01 Feb 2020

Published 05 May 2020

Abstract

Data stream classification has become an important prediction task relevant to many practical applications. However, in the presence of concept drift and noise, data stream classification faces many challenges. Hence, we present a new incremental ensemble model for classifying nonstationary data streams with noise. Our approach integrates three strategies: incremental learning to monitor and adapt to concept drift; ensemble learning to improve model stability; and a microclustering procedure that distinguishes drift from noise and predicts the labels of incoming instances via majority vote. Experiments with two synthetic datasets designed to test gradual and abrupt drift show that our method classifies nonstationary data streams with noise more accurately than two popular baselines.

1. Introduction

The velocity and volume with which data are now produced are making streaming data ubiquitous in real-world applications [1]. For example, intrusion detection [2], credit fraud detection [3], network traffic management [4], and recommendation systems [5] all rely on data streams. However, data streams have some unique characteristics that make them more difficult to process. First, data can be generated at very high speed and in huge volumes. Second, data streams exhibit concept drift, so existing models no longer work as effectively as they once did. Last, physical constraints mean that only a limited amount of knowledge can be used or extracted from a data stream at any point in time, and once that moment has passed, it can be very difficult to go back and retrieve more. Thus, data stream mining faces many challenges.

Revealing the knowledge hidden in data streams is broadly known as data stream mining, which spans data stream classification, clustering, and other data analytics tasks [6]. Data stream classification is arguably the most common of these tasks in practical applications. Because of the temporal nature of data streams, research on data stream classification faces many difficulties. For example, to keep track of concept drift, the model needs to be retrained frequently, yet its processing and memory overheads must stay low to cope with the velocity and volume of the data. In a traditional data mining scenario, the model only needs to extract knowledge from a static dataset whose joint distribution does not change. By contrast, a data stream classification model must extract knowledge from instances that are generated over time and whose joint distribution may vary, i.e., in the presence of concept drift [7, 8]. According to many studies, concept drift is the main barrier to data stream classification.

To date, solutions to classification in nonstationary data stream environments have been based on either online learning or ensemble learning, and these methods have improved classification performance. To handle concept drift in imbalanced data streams, an ensemble learning model with resampling was presented [7]. A combined online ensemble method was used to address concept drift and high dimensionality simultaneously [9]. Additionally, many supervised learning approaches have recently been explored for various classification scenarios [10–17], and some of them, such as the support vector machine (SVM) and Bayesian techniques, have been applied to data stream classification.

In nonstationary streaming data environments, these studies have solved some of the problems, including concept drift, the curse of dimensionality, and imbalanced learning. However, some problems remain open. For example, few studies have considered how to cope effectively with both concept drift and noise in nonstationary data streams at the same time. To address this, we design a new classification approach that constructs microclusters to serve as a pool of base classifiers. The class label of an incoming instance is finally predicted by a majority vote of the microclusters. At the same time, an incremental learning strategy combined with ensemble learning and a smoothing operator adapts the model to concept drift, distinguishes noise, and maintains stability.

In summary, this paper makes three main contributions:
(1) A technique for constructing a set of microclusters as base classifiers by redefining the concept of the cluster feature previously used in hierarchical cluster analysis. Combining numerous microclusters achieves good classification results on nonstationary data streams. Additionally, combining microclusters with incremental learning is a very convenient way to absorb new knowledge and keep track of concept drift.
(2) A smoothing strategy designed to shift the centroids of microclusters and control the balance between historical and new instances. This approach makes the best use of historical knowledge and can also overcome a shortage of drifted data.
(3) A majority vote strategy and incremental learning that enhance the stability and adaptability of the model in nonstationary data streams with noise. Thus, the proposed model leverages the advantages of both ensemble and incremental learning to maintain high accuracy in class label prediction.

This paper is organized as follows. Section 2 reviews related work, and Section 3 introduces the basic concepts and defines the problem. Section 4 describes the proposed model and analyzes the complexity of the algorithm. Section 5 presents the experimental scheme and results. Section 6 concludes the paper and outlines future work.

2. Related Work

A good data stream classification approach should be able to learn incrementally and adapt to concept drift [18]. In general, there are two important kinds of incremental learning: instance-incremental learning [19, 20], which learns from one instance at a time, and batch-incremental learning [21], which learns from a batch of instances at a time. In the instance-incremental group, Crammer et al. [19] developed an online passive-aggressive (PA) algorithmic framework based on SVM, which forces the classification hyperplane to move so as to satisfy the minimum loss constraint whenever the classifier misclassifies an instance. This framework has been widely explored in many practical settings [22, 23]. The work in [24] presented an instance-incremental method with a weighted one-class SVM that can handle gradual drift in nonstationary data streams. Instance-incremental learning has also been built on extreme learning machines to boost classification speed [25]: when the data stream is stable, each incoming instance is used to update the classifier, whereas when concept drift happens, a weakly performing classifier is deleted. This is a very flexible approach to classifying real-time data streams. In the batch-incremental domain, Lu et al. [26] proposed a dynamic weighted majority approach to deal with class imbalance and concept drift. This method uses dynamically weighted ensemble learning to keep the classification model stable and batch-incremental learning to track concept drift.

Of the two modes, instance-incremental learning is more flexible and scalable for real-time data stream classification. It is also better suited to environments where it is difficult to label instances and understand concepts in advance [20]. Hence, we focus on instance-incremental learning in the remainder of this paper and refer to it simply as incremental learning hereafter.

The impetus for studying ensemble learning in conjunction with data stream classification came from a desire to improve the stability of classification models [27–31]. Such models consist of a set of base classifiers and a combination method that merges the outputs of the base classifiers into the final output of the ensemble. The SEA algorithm [29] is one of the early ensemble methods. While the SEA ensemble is not full, each newly arriving data chunk is used to build a new base classifier. Once the limit has been reached, a new classifier is still constructed for every newly arriving data chunk, but it replaces the classifier with the worst performance. SEA outputs its final predictions by majority vote. A similar work is the weighted ensemble method based on accuracy [30], whose key idea is to assign each base classifier a weight that estimates its accuracy on the newest data chunk. The rationale is that the newest data chunk is likely to represent the target concept, so classifiers with higher accuracy on it should be given more importance. Again, when the maximum ensemble size has been reached, the base classifier with the worst performance is deleted and a new base classifier joins the ensemble. Another iterative ensemble method was developed based on boosting and batch-incremental learning [31]. Instead of adding just one base classifier per newly arriving data chunk, this method adds a suitable number of them. Its experimental results suggest that iterative boosting ensemble classification is a promising way to perform classification in nonstationary data stream environments.

Beyond concept drift, imbalanced class distributions are another challenge in data stream classification that can be tackled with ensemble learning. Zhang et al. [27] address this problem with a two-pronged approach. First, the majority class is divided into subsets of roughly the same size as the minority class, and new balanced training subsets are constructed from the minority class and the divided subsets. Next, the ensemble model is created using a backpropagation neural network as the base learning algorithm. The diversity of the base classifiers is another important factor in ensemble learning systems. Jackowski [32] therefore introduced two error-trend diversity measures, pair errors and pool errors, to detect and track concept drift in streaming data. Experiments show that these diversity measures can not only enhance the performance of the ensemble model but also effectively control its size. Based on the above analysis, we think that ensemble learning is currently the most promising research direction for data stream classification.
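The chunk-based replacement policy described above for SEA can be sketched as follows. This is a simplified illustration rather than the original SEA implementation: the base learner `train` is a hypothetical callable supplied by the caller, members are scored by plain accuracy on the newest chunk, and, as in the description above, the new classifier always replaces the worst member once the ensemble is full.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Instance = Tuple[Sequence[float], int]
Classifier = Callable[[Sequence[float]], int]


class ChunkEnsemble:
    """Fixed-size chunk ensemble in the spirit of SEA (simplified sketch)."""

    def __init__(self, max_size: int, train: Callable[[List[Instance]], Classifier]):
        self.max_size = max_size
        self.train = train  # hypothetical base learner: labeled chunk -> classifier
        self.members: List[Classifier] = []

    def predict(self, x: Sequence[float]) -> int:
        # Unweighted majority vote; ties go to the label encountered first.
        votes = Counter(member(x) for member in self.members)
        return votes.most_common(1)[0][0]

    def update(self, chunk: List[Instance]) -> None:
        new_member = self.train(chunk)
        if len(self.members) < self.max_size:
            self.members.append(new_member)  # ensemble not full yet: just add
            return

        def accuracy(member: Classifier) -> float:
            return sum(member(x) == y for x, y in chunk) / len(chunk)

        # Ensemble full: replace the member that performs worst on the newest chunk.
        worst = min(range(len(self.members)), key=lambda i: accuracy(self.members[i]))
        self.members[worst] = new_member
```

Any chunk learner can be plugged in; with a trivial learner that always predicts its chunk's majority label, the ensemble simply tracks which labels dominated the recent chunks.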

From this review, we distill three observations: incremental learning can dynamically reveal new knowledge in data streams; ensemble learning can improve the stability of classification models for nonstationary data streams; and a suitable algorithm can enhance the flexibility of a classification model. These three observations form the basis of the three integrated strategies that our method uses to tackle concept drift and noise simultaneously.

3. Basic Concept and Problem Definition

This section first describes the basic concepts used in this paper and then analyzes the research problem in detail.

3.1. Data Stream

Following related studies, in this paper a data stream is a series of labeled instances $S = \{(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots, (\mathbf{x}_t, y_t), \ldots\}$, where $y_t \in \{+1, -1\}$, $\mathbf{x}_t$ is a feature vector that represents an instance characterizing the features of an object, and $y_t$ is the class label of $\mathbf{x}_t$. When $y_t = +1$, $\mathbf{x}_t$ is a positive instance; otherwise, $\mathbf{x}_t$ is a negative instance.

Given this definition, our goal is to learn a highly accurate mapping function $f: \mathbf{x} \rightarrow y$, which serves as the classification model and outputs the class label of an incoming instance $\mathbf{x}_t$. Only supervised learning is considered in this paper. Therefore, the classification model is constructed from a labeled dataset, and, once built, it outputs the class label $+1$ or $-1$ for each incoming instance. In addition, we assume that the true label of an instance becomes available after the mapping function has output its prediction.

3.2. Concept Drift

According to [33], concept drift occurs when the joint probability distribution of the data changes over time, i.e., $P_t(\mathbf{x}, y) \neq P_{t+1}(\mathbf{x}, y)$, where the subscript $t$ denotes the time stamp, $\mathbf{x}$ is the vector of feature values, and $y$ is the class label. Depending on how fast the concept changes, gradual drift and abrupt drift [34] are distinguished. Gradual drift is a slow change from one concept to another, as illustrated in Figure 1(a). When the distribution at time $t+1$ is abruptly different from the distribution at time $t$, we say that abrupt drift occurs, as shown in Figure 1(b). Figure 1 makes the difference between the two types of drift clear; both are considered in this paper.
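To make the distinction between the two drift types concrete, the following sketch generates a toy two-dimensional stream whose labeling rule switches from one concept to another. The two concepts and the linear transition window are illustrative assumptions, not the generators used in the experiments: with `width=0` the switch is abrupt, while a positive `width` mixes the two concepts with a probability that rises linearly, which produces gradual drift.

```python
import random
from typing import Iterator, Sequence, Tuple


def concept_a(x: Sequence[float]) -> int:
    return 1 if x[0] + x[1] <= 1.0 else -1


def concept_b(x: Sequence[float]) -> int:
    return 1 if x[0] + x[1] <= 0.5 else -1


def drifting_stream(n: int, drift_start: int, width: int,
                    seed: int = 0) -> Iterator[Tuple[Tuple[float, float], int, int]]:
    """Yield (x, y, t); labels move from concept_a to concept_b starting at drift_start."""
    rng = random.Random(seed)
    for t in range(n):
        x = (rng.random(), rng.random())
        if width == 0:
            p_new = 1.0 if t >= drift_start else 0.0  # abrupt drift: instant switch
        else:
            # gradual drift: probability of the new concept ramps from 0 to 1
            p_new = min(1.0, max(0.0, (t - drift_start) / width))
        concept = concept_b if rng.random() < p_new else concept_a
        yield x, concept(x), t
```

During the transition window of the gradual stream, both concepts are active at once, which is what makes gradual drift harder to distinguish from noise than an abrupt switch.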

Figure 1

Two types of concept drift: (a) gradual drift; (b) abrupt drift.

3.3. Problem Definition

In nonstationary data streams, noisy instances and instances reflecting concept drift can have similar distributions. It is therefore critical to differentiate noise from concept drift, and this motivates us to build a classification model that can detect and track concept drift in nonstationary streaming data with noise. Meanwhile, to catch concept drift, the classification model should be updated by incremental learning. The research problem is illustrated in Figure 2.

Figure 2

A demonstration of the problem definition. In the data stream, (a) microclusters (shape: dotted circle) are built from historical instances (color: green) of the positive class (shape: circle) and the negative class (shape: triangle); (b) when a new instance (color: red, shape: circle) arrives, a microcluster is updated (color: red, shape: dotted circle); (c) an old microcluster contains a noisy instance (color: red, shape: circle); and (d) concept drift (color: red, shape: circle) is detected and a new microcluster (color: red, shape: dotted circle) is built.

Figure 2 clarifies the problem definition of this paper and shows how a noisy instance is identified in nonstationary streaming data. In Figure 2, a dotted circle represents a microcluster, and the dotted straight line indicates the distribution of the instances. Figure 2(a) shows the current state. As time goes on, a new instance arrives and a microcluster is updated at the next time stamp, as shown in Figure 2(b); this is the incremental learning case. At a later time stamp, shown in Figure 2(c), an incoming instance with a positive class label falls inside an old microcluster with a different class label. In this case, the new instance is regarded as noise and discarded, which is why it no longer appears at the following time stamp. In Figure 2(d), the incoming instances reflect concept drift and lead to the construction of a new microcluster.

Based on the above analysis, our solution combines three strategies to address the research problem illustrated in Figure 2: incremental learning to track concept drift; ensemble learning to enhance the model's stability; and a microclustering method to distinguish drift from noise and make the final label predictions. In the next section, we describe these strategies in detail and discuss the three scenarios illustrated in Figure 2.

4. Adaptive Incremental Ensemble Data Stream Classification Method

This section defines the microcluster, describes the microcluster-based data stream classification model, and then presents the corresponding algorithm.

4.1. Definition of Microcluster

The microclusters that serve as classifiers in our model are constructed from cluster features, a technique originally developed for hierarchical cluster analysis [35]. A cluster feature is defined as the triple $CF = (N, \mathbf{LS}, SS)$, i.e., the number of instances, their linear sum, and their squared sum. Based on the cluster feature, we define the microcluster used in this paper as follows.

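Definition 1. A microcluster is represented as $MC = (SS, \mathbf{LS}, N, \mathbf{C}, l, \alpha)$, where $SS$ and $\mathbf{LS}$ are used to compute the boundary of $MC$: $SS$ denotes the sum of the squared attribute values of the instances in $MC$, as calculated in equation (1), and $\mathbf{LS}$ is a vector that stores the sum of each attribute, as in equation (2). $N$ is the number of instances in $MC$; $\mathbf{C}$ is the centroid of $MC$, which changes over time as shown in equation (3); $l$ is the class label of $MC$; and $\alpha$ counts how many times $MC$ correctly classifies an incoming instance, with $\alpha$ initialized to 0.

$$SS = \sum_{j=1}^{N} \sum_{i=1}^{d} x_{ji}^{2}, \tag{1}$$

$$\mathbf{LS} = \left( \sum_{j=1}^{N} x_{j1}, \sum_{j=1}^{N} x_{j2}, \ldots, \sum_{j=1}^{N} x_{jd} \right), \tag{2}$$

where $d$ is the dimension of the instance and $x_{ji}$ is the $i$-th attribute of the $j$-th instance in $MC$.

$$\mathbf{C}_t = (1 - \lambda)\,\mathbf{C}_{t-1} + \lambda \frac{\mathbf{LS}}{N}, \tag{3}$$

where $\mathbf{C}_{t-1}$ is the centroid of $MC$ at the previous time stamp and $\lambda \in [0, 1]$ is the smoothing parameter.

The size of $MC$ is represented by the cluster radius $R$, which is calculated as follows:

$$R = \sqrt{\frac{SS}{N} - \left\| \frac{\mathbf{LS}}{N} \right\|^{2}}, \tag{4}$$

where $\|\cdot\|$ denotes the length of a vector.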
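The microcluster statistics in Definition 1 can be maintained incrementally, as in the following Python sketch. It assumes that $SS$ is a scalar sum of squares, that the radius follows the cluster-feature form $\sqrt{SS/N - \|\mathbf{LS}/N\|^2}$, and that the smoothed centroid update is $\mathbf{C}_t = (1-\lambda)\mathbf{C}_{t-1} + \lambda\,\mathbf{LS}/N$; the class and method names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class MicroCluster:
    ls: List[float]        # LS, eq. (2): per-attribute linear sums
    ss: float              # SS, eq. (1): sum of squared attribute values
    n: int                 # N: number of absorbed instances
    centroid: List[float]  # C, eq. (3): smoothed centroid
    label: int             # l: class label
    alpha: int = 0         # alpha: performance counter, initialized to 0

    @classmethod
    def from_instance(cls, x: Sequence[float], label: int) -> "MicroCluster":
        """A one-instance microcluster is centered on that instance."""
        return cls(ls=list(x), ss=sum(v * v for v in x), n=1, centroid=list(x), label=label)

    def absorb(self, x: Sequence[float], lam: float) -> None:
        """Add x to the statistics, then smooth the centroid toward the mean LS / N."""
        self.ls = [a + b for a, b in zip(self.ls, x)]
        self.ss += sum(v * v for v in x)
        self.n += 1
        self.centroid = [(1 - lam) * c + lam * (a / self.n)
                         for c, a in zip(self.centroid, self.ls)]

    def radius(self) -> float:
        """R = sqrt(SS / N - ||LS / N||^2), clamped at 0 against rounding error."""
        mean_sq = sum((a / self.n) ** 2 for a in self.ls)
        return math.sqrt(max(self.ss / self.n - mean_sq, 0.0))
```

Because only $SS$, $\mathbf{LS}$, $N$, and the centroid are stored, absorbing an instance costs $O(d)$ time and no past instance has to be kept.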

4.2. Data Stream Classification Model Based on Microcluster

The classification model consists of three phases: classification, incremental learning, and updating. A framework of the model is given in Figure 3. The processes and calculations of each phase are described in detail below and summarized in Algorithm 1.

Algorithm 1: MCBIE
	Input: the instances {(x_t, y_t)}, t = 1, 2, …,
	the pool maximum limit M, and
	the smoothing parameter λ.
	Output: the pool of microclusters P
(1)	P ← the pool of initial microclusters formed by k-means
(2)	for each instance (x_t, y_t) do
	Phase 1: Classification
(3)	compute the distance between x_t and the centroid of each microcluster in P
(4)	select the k-nearest microclusters to classify x_t
(5)	ŷ_t ← the predicted class label of x_t, obtained by majority vote in equation (5)
(6)	update α of the k-nearest microclusters
	Phase 2: Incremental Learning
(7)	if Scenario 1 then
(8)	update the structure of the nearest microcluster by equations (1)–(3) and increment its number of instances N by 1
(9)	else if Scenario 2 then
(10)	consider x_t a noisy point and discard it
(11)	else if Scenario 3 then
(12)	build a new microcluster MC_new on x_t
	Phase 3: Updating Pool
(13)	if |P| < M then
(14)	P ← P ∪ {MC_new}
(15)
(16)	else
(17)	MC_worst ← the microcluster in P with the smallest α
(18)	replace MC_worst with MC_new
(19)	end if
(20)	end if
(21)	end for
(22)	return the microcluster pool P at the required time stamp

4.2.1. Phase 1 (Classification): The k-Nearest Microclusters Classify the Incoming Instance

When an incoming instance $\mathbf{x}_t$ arrives, the Euclidean distance between $\mathbf{x}_t$ and the centroid of each microcluster in the pool is computed. Based on these distances, the $k$ nearest microclusters are selected, and each of them assigns its own label to $\mathbf{x}_t$. The final label of $\mathbf{x}_t$ is then decided by majority vote, as in equation (5):

$$\hat{y}_t = \arg\max_{c \in \{1, \ldots, m\}} \sum_{i=1}^{k} \mathbb{I}(l_i = c), \tag{5}$$

where $k$ is the number of microclusters participating in the classification, $m$ is the number of classes, $l_i$ is the label of the $i$-th nearest microcluster, and $\mathbb{I}(\cdot)$ is the indicator function.

Once the incoming instance has been classified, the voting microclusters are updated immediately. Since $\alpha$ counts how often a microcluster classifies incoming instances correctly, each of the $k$ voting microclusters whose label matches the true label of the instance has its $\alpha$ increased by 1; otherwise, its $\alpha$ is decreased by 1.
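Phase 1 can be sketched as follows. The sketch assumes that distances are measured to each microcluster's centroid, that $k = 3$ by default (the value used in Section 5.3), and that each voting microcluster's $\alpha$ is scored against the revealed true label; `Micro` and `classify_and_score` are hypothetical names, and `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Micro:
    centroid: List[float]
    label: int
    alpha: int = 0  # +1 for each correct vote, -1 for each wrong vote


def classify_and_score(pool: List[Micro], x: Sequence[float], y_true: int, k: int = 3) -> int:
    """Predict x's label by majority vote of the k nearest microclusters, then update alpha."""
    nearest = sorted(pool, key=lambda mc: math.dist(mc.centroid, x))[:k]
    votes = Counter(mc.label for mc in nearest)
    y_hat = votes.most_common(1)[0][0]  # ties go to the label of the closer voter
    # Test-then-train: the true label is used only after the prediction is made.
    for mc in nearest:
        mc.alpha += 1 if mc.label == y_true else -1
    return y_hat
```

With an odd $k$ and two classes, as in the experiments, the vote can never tie, so the tie-breaking rule only matters for other settings.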

4.2.2. Phase 2 (Incremental Learning): The Nearest Microcluster Will Be Updated Based on the Incoming Instance

Following the test-then-train principle, in this phase either the nearest microcluster is updated immediately, so that the model adapts quickly to the new concept, or a new microcluster is constructed. The three possible scenarios are depicted in Figure 4.

Scenario 1: the label of the incoming instance is the same as that of the nearest microcluster. The incoming instance is used to retrain this microcluster: its $SS$, $\mathbf{LS}$, and $\mathbf{C}$ are recalculated by equations (1)–(3), its number of instances $N$ is incremented by 1, and its radius is updated by equation (4). This scenario is shown in Figure 4(a). It includes the case in which the incoming instance falls inside the nearest microcluster: the instance is simply merged into that microcluster.

Scenario 2: the label of the incoming instance differs from that of the nearest microcluster, and the instance lies inside the boundary of that microcluster, as shown in Figure 4(b). This paper rests on the fundamental assumption that two adjacent instances are highly likely to represent the same concept, i.e., they share the same class label with high probability. Under this assumption, the incoming instance is treated as noise and discarded.

Scenario 3: in contrast to Scenario 2, the label of the incoming instance differs from that of the nearest microcluster, and the instance lies outside the boundary of that microcluster, as shown in Figure 4(c). This suggests that the incoming instance was drawn from a different joint probability distribution. We therefore assume that a new concept has appeared and construct a new microcluster from the incoming instance by the method described in Section 4.1. Because the new microcluster contains only one instance when it is constructed, its label is that of the incoming instance and its centroid is the incoming instance itself. Its $SS$ and $\mathbf{LS}$ are computed by equations (1) and (2), and its $\alpha$ is set to 0.


Figure 4

Three different scenarios: (a) the incoming instance has the same label as the nearest microcluster; (b) the incoming instance has a different label from the nearest microcluster and lies inside it, which represents a noisy instance; and (c) the incoming instance has a different label from the nearest microcluster and does not fall into it, which represents a new concept. Note: color represents the class label, the rectangle denotes the incoming instance, and the circle denotes the nearest microcluster.
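The three scenarios of Phase 2 can be sketched as a single function. This is an illustrative reading of the text rather than the authors' implementation: it assumes the microcluster statistics of Definition 1, measures distance to the centroid, treats "inside the boundary" as a distance no greater than the radius $\sqrt{SS/N - \|\mathbf{LS}/N\|^2}$, and assumes the smoothed centroid update $\mathbf{C}_t = (1-\lambda)\mathbf{C}_{t-1} + \lambda\,\mathbf{LS}/N$. The microcluster returned in Scenario 3 is meant to be handed to Phase 3; all names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class MicroCluster:
    ls: List[float]        # LS: per-attribute sums
    ss: float              # SS: sum of squared attribute values
    n: int                 # N: number of instances
    centroid: List[float]  # C: smoothed centroid
    label: int
    alpha: int = 0

    def radius(self) -> float:
        mean_sq = sum((a / self.n) ** 2 for a in self.ls)
        return math.sqrt(max(self.ss / self.n - mean_sq, 0.0))


def incremental_step(pool: List[MicroCluster], x: Sequence[float], y: int,
                     lam: float) -> Tuple[str, Optional[MicroCluster]]:
    """Phase 2: apply Scenario 1, 2, or 3 to the labeled instance (x, y)."""
    nearest = min(pool, key=lambda mc: math.dist(mc.centroid, x))
    if y == nearest.label:
        # Scenario 1: same label, so absorb x and move the centroid.
        nearest.ls = [a + b for a, b in zip(nearest.ls, x)]
        nearest.ss += sum(v * v for v in x)
        nearest.n += 1
        nearest.centroid = [(1 - lam) * c + lam * a / nearest.n
                            for c, a in zip(nearest.centroid, nearest.ls)]
        return "absorbed", None
    if math.dist(nearest.centroid, x) <= nearest.radius():
        # Scenario 2: conflicting label inside the boundary, so treat x as noise.
        return "noise", None
    # Scenario 3: conflicting label outside the boundary, so start a new microcluster.
    new_mc = MicroCluster(ls=list(x), ss=sum(v * v for v in x), n=1,
                          centroid=list(x), label=y)
    return "new_concept", new_mc
```

Note that a freshly built microcluster has radius 0, so under this reading it can never absorb a conflicting instance as noise until it has grown.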

4.2.3. Phase 3 (Updating): The Pool of Microcluster Is Updated

As time passes, new microclusters are continuously created, and eventually the pool reaches its maximum size. Once the pool is full, the microcluster with the worst performance is replaced by the new microcluster. Through this cyclical update, the classification model can effectively catch concept changes, which improves classification accuracy. In general, the smaller the value of $\alpha$, the worse the microcluster has performed. Therefore, the microcluster with the smallest $\alpha$ is selected for replacement.
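The replacement policy of Phase 3 is small enough to show directly. In this sketch, `Micro` keeps only the fields the policy needs and is a hypothetical stand-in for a full microcluster; `max_size` plays the role of the pool limit (30 in the experiments of Section 5.3).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Micro:
    label: int
    alpha: int = 0  # performance counter maintained in Phase 1


def add_to_pool(pool: List[Micro], new_mc: Micro, max_size: int) -> None:
    """Phase 3: append while there is room, otherwise replace the smallest-alpha member."""
    if len(pool) < max_size:
        pool.append(new_mc)
        return
    worst = min(range(len(pool)), key=lambda i: pool[i].alpha)  # ties: earliest member
    pool[worst] = new_mc
```

A new microcluster starts with $\alpha = 0$, so it can itself be evicted quickly if its votes keep disagreeing with the true labels.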

4.3. Algorithm and Complexity Analysis

Combining the phases and scenarios above, the microcluster-based incremental ensemble classification algorithm, named MCBIE, is given in Algorithm 1.

The MCBIE algorithm consists of three phases that perform three functions: classification, incremental learning, and pool updating. Line 1 trains the initial microclusters and builds the microcluster pool. Lines 3 to 6 classify the incoming instance and update the performance of the voting microclusters. Lines 7 to 12 implement Phase 2 according to the three scenarios. Finally, if the number of base classifiers has reached the upper bound, the worst microcluster is deleted and the new microcluster is added to the pool; otherwise, the new microcluster is added to the pool directly. This is handled in lines 13 to 19.

In terms of complexity, Algorithm 1 shows that the core operation of MCBIE is the distance computation in the classification phase. Its cost depends mainly on two factors: the dimension $d$ of the instances and the number $M$ of microclusters serving as base classifiers in the ensemble. Thus, the time complexity of the algorithm is approximately $O(d \cdot M)$ for each incoming instance. Moreover, MCBIE does not retain past instances; it records only the statistics of each microcluster, such as $SS$, $\mathbf{LS}$, and $N$, which saves memory.

5. Experiments

5.1. Datasets

To evaluate MCBIE, we conduct simulation experiments on two synthetic datasets, the Hyperplane data stream and the SEA data stream, both taken from Massive Online Analysis (MOA) [36]. The Hyperplane data stream is designed to test gradual drift, while the SEA data stream is designed to test abrupt drift. These are among the most popular datasets in the data stream classification domain. Further details are as follows.

Hyperplane data stream [37]: in $d$-dimensional space, a hyperplane is the set of points $\mathbf{x}$ that satisfy $\sum_{i=1}^{d} w_i x_i = w_0$, where $x_i$ is the $i$-th dimension of $\mathbf{x}$. Instances for which $\sum_{i=1}^{d} w_i x_i \geq w_0$ belong to the positive class, and instances for which $\sum_{i=1}^{d} w_i x_i < w_0$ belong to the negative class. Time-changing concepts are simulated by slowly rotating the hyperplane, i.e., by gradually changing the weights $w_i$. In this paper, $d = 10$, 6 attributes are subject to concept drift, and 20,000 instances are generated. Three different noise ratios (20%, 25%, and 30%) are injected into the data stream.

SEA data stream [29]: the instances are generated from three attributes with continuous values $x_1, x_2, x_3 \in [0, 10]$. An instance is positive when $x_1 + x_2 \leq \theta$; otherwise, it is negative. To simulate concept drift, the threshold $\theta$ changes over time. 5000 instances are generated for each threshold value, so the whole SEA data stream contains 20,000 instances. The SEA data stream is used with two different noise ratios (20% and 30%) to test abrupt drift.
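For readers who want to experiment with streams of this kind outside MOA, the sketch below generates simplified Hyperplane-like and SEA-like streams with label noise. It is not MOA's generator: the threshold $w_0 = \frac{1}{2}\sum_i w_i$, the constant per-instance weight shift, the attribute ranges, and the SEA thresholds (8, 9, 7, 9.5, values commonly used with SEA) are all assumptions, and noise is modeled as flipping each label with the given probability.

```python
import random
from typing import Iterator, List, Sequence, Tuple

Stream = Iterator[Tuple[List[float], int]]


def hyperplane_stream(n: int, d: int = 10, n_drift: int = 6, step: float = 0.001,
                      noise: float = 0.2, seed: int = 1) -> Stream:
    rng = random.Random(seed)
    w = [rng.random() for _ in range(d)]
    for _ in range(n):
        x = [rng.random() for _ in range(d)]
        w0 = 0.5 * sum(w)  # assumed threshold
        y = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= w0 else -1
        if rng.random() < noise:
            y = -y  # label noise: flip the class
        yield x, y
        for i in range(n_drift):
            w[i] += step  # slowly rotate the hyperplane: gradual drift


def sea_stream(n_per_concept: int = 5000,
               thresholds: Sequence[float] = (8.0, 9.0, 7.0, 9.5),
               noise: float = 0.2, seed: int = 1) -> Stream:
    rng = random.Random(seed)
    for theta in thresholds:  # one concept per threshold: abrupt drift at each switch
        for _ in range(n_per_concept):
            x = [rng.uniform(0.0, 10.0) for _ in range(3)]
            y = 1 if x[0] + x[1] <= theta else -1  # x[2] is irrelevant to the label
            if rng.random() < noise:
                y = -y
            yield x, y
```

With the defaults, `sea_stream()` yields the 20,000 instances described above, and `hyperplane_stream(20000)` drifts 6 of its 10 weights.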

5.2. Baselines

The PA algorithmic framework [19] and the Hoeffding tree [38] are selected as baselines for comparison with MCBIE, since these two approaches are frequently used as benchmarks in many studies [20, 22, 23, 38]. Moreover, the Hoeffding tree, a well-known classical algorithm, is integrated into the MOA platform [36], so we follow suit in this paper.

The PA algorithmic framework [19] is an online incremental learning framework for binary classification based on SVM. Given an instance $\mathbf{x}_t$, the classification model outputs the prediction

$$\hat{y}_t = \operatorname{sign}(\mathbf{w}_t \cdot \mathbf{x}_t), \tag{6}$$

where $\mathbf{w}_t$ is a vector of weights and $\hat{y}_t$ is the prediction for $\mathbf{x}_t$.

After $\hat{y}_t$ is output, the model receives the ground-truth class label $y_t$ and computes the loss

$$\ell_t = \max\{0,\ 1 - y_t (\mathbf{w}_t \cdot \mathbf{x}_t)\}. \tag{7}$$

The vector of weights is then updated using

$$\mathbf{w}_{t+1} = \mathbf{w}_t + \tau_t y_t \mathbf{x}_t, \tag{8}$$

where $\tau_t$ is a Lagrange multiplier whose value is calculated by equation (9) in three different ways, corresponding to PA, PA-I, and PA-II:

$$\tau_t = \frac{\ell_t}{\|\mathbf{x}_t\|^2}\ \text{(PA)}, \qquad \tau_t = \min\left\{C, \frac{\ell_t}{\|\mathbf{x}_t\|^2}\right\}\ \text{(PA-I)}, \qquad \tau_t = \frac{\ell_t}{\|\mathbf{x}_t\|^2 + \frac{1}{2C}}\ \text{(PA-II)}, \tag{9}$$

where $C$ is a positive parameter referred to as the aggressiveness parameter of the algorithm. A detailed derivation can be found in [19].

The Hoeffding tree [38] is a decision tree for online learning from high-volume data streams, and it learns from each instance in constant time. The Hoeffding bound is used to estimate how many instances are needed before a tree node can be split, and this bound is independent of the distribution that generates the instances. Using the Hoeffding bound, the Hoeffding tree approximates, with high probability, the tree that would be produced by batch learning. Because of its incremental nature, the Hoeffding tree is widely used in data stream classification.
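The PA, PA-I, and PA-II update rules described above can be sketched for the linear case as follows. The experiments in this paper use a Gaussian kernel, which this sketch omits; mapping a zero score to the positive class is an assumption, and the function names are hypothetical.

```python
from typing import List, Sequence


def pa_predict(w: Sequence[float], x: Sequence[float]) -> int:
    score = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if score >= 0 else -1  # sign(w . x); a zero score maps to +1 by assumption


def pa_update(w: Sequence[float], x: Sequence[float], y: int,
              variant: str = "PA", C: float = 1.0) -> List[float]:
    """Return w_{t+1} = w_t + tau_t * y_t * x_t for the chosen PA variant."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    loss = max(0.0, 1.0 - margin)  # hinge loss
    sq_norm = sum(xi * xi for xi in x)
    if loss == 0.0 or sq_norm == 0.0:
        return list(w)  # passive step: the margin constraint already holds
    if variant == "PA":
        tau = loss / sq_norm
    elif variant == "PA-I":
        tau = min(C, loss / sq_norm)  # step size capped by the aggressiveness C
    elif variant == "PA-II":
        tau = loss / (sq_norm + 1.0 / (2.0 * C))  # step size softened by C
    else:
        raise ValueError(f"unknown variant: {variant}")
    return [wi + tau * y * xi for wi, xi in zip(w, x)]
```

The plain PA step moves the weights exactly far enough to reach unit margin on the current instance, whereas PA-I and PA-II limit that step through $C$, which makes them less sensitive to noisy labels.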

5.3. Experiment Setup

Following the test-then-train principle [39], each incoming instance is first used to test the model, and the model is then retrained with that instance in an incremental fashion. To assess classification performance, the classification accuracy is computed every 100 instances during data stream classification.

Both MCBIE and the baselines are initialized on the first 100 instances, and the resulting model is used to predict the subsequent instances in the data stream. In MCBIE, these 100 instances are used to train 6 initial microclusters as base classifiers with the k-means algorithm. At every time stamp, the three nearest microclusters of each incoming instance are selected to vote on its label. The maximum size of the microcluster pool is 30; once the pool is full, each new microcluster replaces the worst-performing one. We implemented MCBIE with the Weka package. The Hoeffding tree algorithm (denoted HT) is run on the MOA platform with its default parameter settings. PA, PA-I, and PA-II with a Gaussian kernel are executed in MATLAB, with the aggressiveness parameter C set to 1.
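The evaluation protocol above can be sketched as a prequential loop that records one accuracy value per window of 100 instances. The `predict` and `learn` callables stand for any incremental model and are hypothetical; initialization on the first 100 instances is assumed to have happened beforehand, and a trailing partial window is ignored.

```python
from typing import Callable, Iterable, List, Sequence, Tuple


def prequential_accuracy(stream: Iterable[Tuple[Sequence[float], int]],
                         predict: Callable[[Sequence[float]], int],
                         learn: Callable[[Sequence[float], int], None],
                         window: int = 100) -> List[float]:
    """Test-then-train: score each instance before learning from it; report per-window accuracy."""
    accuracies: List[float] = []
    correct = seen = 0
    for x, y in stream:
        correct += int(predict(x) == y)  # test on the instance first...
        learn(x, y)                      # ...then train on the same instance
        seen += 1
        if seen == window:
            accuracies.append(correct / window)
            correct = seen = 0
    return accuracies
```

For example, a model that always predicts the last label it was taught, starting from a wrong guess on a stream of 250 identical positive instances, yields `[0.99, 1.0]`: only the very first prediction is wrong, and the last 50 instances form an incomplete window.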

5.4. Experiment Result and Analysis

The simulation experiments evaluate MCBIE in two respects: first, its sensitivity to the smoothing parameter $\lambda$; second, its feasibility and validity.

5.4.1. Experiment 1: Sensitivity Analysis of the Smoothing Parameter

Following the experimental setup in Section 5.3, we examine the effect of the smoothing parameter $\lambda$ in MCBIE by varying it from 0.1 to 1. When $\lambda$ is either too large or too small, neither the average accuracy of MCBIE nor its standard deviation reaches the desired result on the Hyperplane and SEA data streams. Our analysis shows that the smoothing parameter regulates the balance between historical and new instances in computing the centroid of a microcluster. When $\lambda = 0$, the centroid of the microcluster does not move, and only its radius changes. Conversely, when $\lambda$ reaches its maximum value of 1, the centroid is simply the mean of the instances in the microcluster, which means that all instances contribute equally to the centroid. However, because concept drift occurs in nonstationary data streams, instances from different time stamps should contribute differently to the centroid. The experimental results support this view: the best value of $\lambda$ lies within an intermediate interval, and a value from that interval is used in the subsequent experiments.
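The two limiting cases of the smoothing parameter can be checked numerically. The sketch assumes that the centroid is updated as $\mathbf{C}_t = (1-\lambda)\mathbf{C}_{t-1} + \lambda\,\mathbf{LS}/N$, which matches both cases described above; it feeds points one at a time into a single microcluster and returns the final centroid for a given $\lambda$. The function name is hypothetical.

```python
from typing import List, Sequence


def final_centroid(points: Sequence[Sequence[float]], lam: float) -> List[float]:
    """Absorb points one at a time into one microcluster; return its smoothed centroid."""
    ls = list(points[0])        # linear sum LS after the first instance
    centroid = list(points[0])  # a new microcluster is centered on its first instance
    for n, x in enumerate(points[1:], start=2):
        ls = [a + b for a, b in zip(ls, x)]
        centroid = [(1 - lam) * c + lam * (s / n) for c, s in zip(centroid, ls)]
    return centroid
```

For the points `[[0.0], [4.0], [8.0]]`, $\lambda = 0$ leaves the centroid at `[0.0]`, $\lambda = 1$ gives the plain mean `[4.0]`, and $\lambda = 0.5$ gives `[2.5]`, which lags behind the mean and so gives more weight to the earlier instances.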

5.4.2. Experiment 2: Feasibility and Validity of MCBIE

All experimental results on the Hyperplane and SEA data streams are shown in Table 1, where the maximum value in each column is marked in bold.

Table 1 shows that MCBIE achieves the highest average accuracy on the Hyperplane data stream: 69.6%, 64.8%, and 61.8% at noise ratios of 20%, 25%, and 30%, respectively. The corresponding standard deviations of these three average accuracies are 0.051, 0.046, and 0.047, which are relatively low compared with those of the baselines. Overall, on the Hyperplane data stream MCBIE provides the most accurate classification with the smallest standard deviation of all the methods compared. On the SEA data stream, the average classification accuracy of MCBIE is 70.8% at 20% noise and 63.2% at 30% noise. Again, its standard deviations are the smallest among all methods, which demonstrates the stability of MCBIE in nonstationary data streams with noise. Based on these results, we conclude that MCBIE is an effective, well-performing classification approach.

A closer look at Table 1 reveals some interesting phenomena. As the noise ratio grows, the advantage of MCBIE over the other methods becomes larger. For instance, at a noise ratio of 20% on the SEA data stream, MCBIE ranks only second, behind PA-I; at 30% noise, however, MCBIE becomes the most accurate model. At the same noise ratio, the classification models perform better on the SEA data stream than on the Hyperplane data stream, which suggests that it is more difficult for a classification model to learn from gradual drift than from abrupt drift. Among the baselines, PA-I performs best, which indicates that selecting an appropriate step size (learning rate) is very important for incremental learning. The Hoeffding tree has the largest standard deviation, which shows that it is the least stable of the methods compared.

Finally, we show that MCBIE adapts well to concept drift. Figures 5 and 6 plot the accuracy curves of MCBIE on the Hyperplane and SEA data streams, respectively. Figure 5 suggests that MCBIE responds promptly to concept drift on the Hyperplane data stream at each noise ratio. When concept drift occurs, the curve in Figure 5 drops sharply, indicating that the concept captured by the model is no longer consistent with the current concept. However, MCBIE's performance recovers immediately once the model is updated with incoming instances through incremental learning. The sharpness of these drops and recoveries in Figure 5 reflects the model's ability to catch concept drift. Figure 6 shows similar behavior on the SEA data stream.
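Accuracy curves of this kind are commonly produced by prequential (test-then-train) evaluation: each incoming instance is first classified, then used to update the model, and accuracy is reported over consecutive windows. The following Python sketch illustrates that procedure on a SEA-style stream with abrupt threshold changes and label noise. It is an illustration under assumptions, not the experimental code: the generator `sea_stream`, the stand-in learner `DecayedCentroids` (one exponentially decayed centroid per class, not MCBIE), the thresholds, the label convention, the window size, and the noise ratio are all hypothetical choices.

```python
import random


def sea_stream(n, thresholds=(8.0, 9.0, 7.0, 9.5), noise=0.2, seed=0):
    """Hypothetical SEA-style stream: three features in [0, 10]; label 1 if x1 + x2 <= theta.

    theta switches abruptly after each block of n // len(thresholds) instances,
    and each label is flipped with probability `noise` (class noise).
    """
    rng = random.Random(seed)
    block = max(1, n // len(thresholds))
    for i in range(n):
        theta = thresholds[min(i // block, len(thresholds) - 1)]
        x = [rng.uniform(0.0, 10.0) for _ in range(3)]
        y = 1 if x[0] + x[1] <= theta else 0
        if rng.random() < noise:
            y = 1 - y
        yield x, y


class DecayedCentroids:
    """Stand-in incremental learner (not MCBIE): one exponentially decayed centroid per class."""

    def __init__(self, decay=0.99):
        self.decay = decay
        self.centroids = {}  # label -> running centroid

    def predict(self, x):
        if not self.centroids:
            return 0  # arbitrary default before any training
        return min(
            self.centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, self.centroids[c])),
        )

    def learn(self, x, y):
        c = self.centroids.get(y)
        if c is None:
            self.centroids[y] = list(x)
        else:
            # exponential moving average: older instances are gradually forgotten
            self.centroids[y] = [self.decay * a + (1 - self.decay) * b for a, b in zip(c, x)]


def prequential_curve(stream, model, window=500):
    """Test-then-train: classify each instance before learning from it and
    return the accuracy over consecutive non-overlapping windows."""
    curve, correct, seen = [], 0, 0
    for x, y in stream:
        correct += int(model.predict(x) == y)
        model.learn(x, y)
        seen += 1
        if seen == window:
            curve.append(correct / window)
            correct, seen = 0, 0
    return curve


curve = prequential_curve(sea_stream(20000, noise=0.2), DecayedCentroids(decay=0.99), window=500)
print([round(a, 2) for a in curve])
```

Plotting `curve` against the window index gives an accuracy curve of the same kind; how pronounced the dip after each threshold change is depends on the decay factor and the noise ratio.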

To further demonstrate MCBIE's advantages, we select the two best methods from the above analysis, MCBIE and PA-I, and compare their ability to perform prediction on streaming data with noise and concept drift. Their accuracy curves are plotted in Figures 7 and 8. In Figure 7, the curves suggest that both methods keep track of concept drift, and they show clearly that MCBIE is more accurate than PA-I on the Hyperplane data stream at all three noise ratios. Moreover, in all three cases, both the maximum and the minimum accuracy of MCBIE are higher than those of PA-I. On the SEA data stream (Figure 8), the two methods appear comparable in their ability to adapt to concept drift. However, as the noise ratio grows, MCBIE outperforms PA-I, particularly in terms of stability.
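Comparisons of maximum accuracy, minimum accuracy, and stability between two methods can be checked mechanically from their windowed accuracy curves. The following hypothetical Python helper `compare_curves` reports the extremes of each curve, the population standard deviation as a simple stability measure (an assumption; other measures are possible), and the fraction of windows in which the first method is more accurate. The curve values are invented for illustration and are not read from Figures 7 or 8.

```python
import statistics


def compare_curves(curve_a, curve_b):
    """Summarize two windowed accuracy curves of equal length (method A vs. method B)."""
    if not curve_a or len(curve_a) != len(curve_b):
        raise ValueError("curves must be non-empty and of equal length")
    wins = sum(a > b for a, b in zip(curve_a, curve_b))
    return {
        "max_a": max(curve_a), "max_b": max(curve_b),
        "min_a": min(curve_a), "min_b": min(curve_b),
        # population standard deviation as a simple stability measure
        "std_a": statistics.pstdev(curve_a), "std_b": statistics.pstdev(curve_b),
        "win_rate_a": wins / len(curve_a),  # fraction of windows where A is more accurate
    }


# Hypothetical curves, not read from Figures 7 or 8.
mcbie_curve = [0.72, 0.60, 0.70, 0.74]
pa1_curve = [0.70, 0.55, 0.71, 0.69]
summary = compare_curves(mcbie_curve, pa1_curve)
print(summary["win_rate_a"], summary["min_a"] > summary["min_b"], summary["max_a"] > summary["max_b"])
```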


From these analyses, we conclude that the MCBIE method is able to conduct nonstationary data stream classification with high accuracy in environments characterized by both concept drift and noise.

6. Conclusions

Classification in nonstationary data streams faces two main problems, concept drift and noise, which require the classification model not only to cope with concept drift but also to distinguish noise from concept drift. To address these problems, we proposed a novel method, MCBIE, that performs classification in nonstationary data streams with noise. To enhance MCBIE's performance, three strategies are used to alleviate the influence of concept drift and noise. Incremental learning helps the microclusters, acting as classifiers, capture concept changes quickly; the ensemble strategy reduces the interference between noise and concept drift; and the smoothing parameter allows the model to absorb useful information from historical knowledge. Compared with the baseline methods, the experimental results show that MCBIE is able to perform classification in nonstationary streaming data settings. However, three problems merit further study: (1) the noise-recognition ability of our method in abrupt-drift environments needs to be strengthened; (2) beyond accuracy, the stability of the model needs to be improved; and (3) when concepts reoccur, more appropriate strategies for replacing microclusters should be designed.
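To make the interplay of these strategies concrete, the following Python sketch outlines a microcluster-based classifier that predicts by majority vote among the k nearest microclusters, incrementally updates the nearest microcluster, and maintains a bounded pool. It is a simplified illustration under stated assumptions, not the MCBIE algorithm itself: the microcluster summary (count and linear sum, in the spirit of BIRCH cluster features), the rule that a label mismatch spawns a new microcluster, the least-recently-updated replacement policy, and the names `MicroCluster` and `MicroClusterPool` are all hypothetical, and the smoothing parameter is not modeled.

```python
import math
from collections import Counter


class MicroCluster:
    """Hypothetical microcluster summary: instance count, linear sum, class label, last update time."""

    def __init__(self, x, label, t):
        self.n = 1
        self.ls = list(x)
        self.label = label
        self.last = t

    def centroid(self):
        return [v / self.n for v in self.ls]

    def absorb(self, x, t):
        """Incrementally add one instance to this microcluster."""
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, x)]
        self.last = t


class MicroClusterPool:
    """Bounded pool of labeled microclusters used as an ensemble of local classifiers."""

    def __init__(self, k=3, max_size=50):
        self.k = k
        self.max_size = max_size
        self.pool = []
        self.t = 0  # logical clock: number of instances learned so far

    def _nearest(self, x, m):
        return sorted(self.pool, key=lambda mc: math.dist(x, mc.centroid()))[:m]

    def predict(self, x):
        """Majority vote over the k nearest microclusters; ties go to the closest one."""
        nearest = self._nearest(x, self.k)
        if not nearest:
            return None
        votes = Counter(mc.label for mc in nearest)
        best = max(votes.values())
        # `nearest` is sorted by distance, so the first label with the top count wins ties
        return next(mc.label for mc in nearest if votes[mc.label] == best)

    def learn(self, x, y):
        self.t += 1
        nearest = self._nearest(x, 1)
        if nearest and nearest[0].label == y:
            nearest[0].absorb(x, self.t)  # same label: update the nearest microcluster
        else:
            # label mismatch: start a new microcluster (could be drift or noise)
            self.pool.append(MicroCluster(x, y, self.t))
            if len(self.pool) > self.max_size:
                # assumed replacement policy: evict the least recently updated microcluster
                self.pool.remove(min(self.pool, key=lambda mc: mc.last))


model = MicroClusterPool(k=3, max_size=50)
for x, y in [([0.0, 0.0], 0), ([0.2, 0.1], 0), ([5.0, 5.0], 1), ([5.1, 4.9], 1)]:
    model.learn(x, y)
print(model.predict([0.1, 0.0]), model.predict([4.8, 5.2]))  # 0 1
```

In this sketch, with k ≥ 3 a single mislabeled instance only creates a small microcluster that can be outvoted by neighbouring microclusters carrying the correct label, whereas a persistent change keeps feeding microclusters with the new label, and stale microclusters are eventually evicted by the replacement policy.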

Data Availability

The data used to support the findings of this study have been deposited in the GitHub repository (https://github.com/FanzhenLiu/ComplexityJournal).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This study was supported by the Natural Science Foundation of Anhui Province (nos. 1608085MF147 and 1908085MF183), the Humanities and Social Science Foundation of the Ministry of Education (no. 18YJA630114), a Major Project of Natural Science Research in the Colleges and Universities of Anhui Province (no. KJ2019ZD15), MQNS (no. 9201701203), MQEPS (no. 96804590), MQRSG (no. 95109718), and the Investigative Analytics Collaborative Research Project between Macquarie University and Data61 CSIRO.

References

G. Ditzler, M. Roveri, C. Alippi, and R. Polikar, “Learning in nonstationary environments: a survey,” IEEE Computational Intelligence Magazine, vol. 10, no. 4, pp. 12–25, 2015.
A. Jadhav, A. Jadhav, P. Jadhav, and P. Kulkarni, “A novel approach for the design of network intrusion detection system (NIDS),” in Proceedings of 2013 International Conference on Sensor Network Security Technology and Privacy Communication System, pp. 22–27, IEEE, New York, NY, USA, December 2013.
A. Salazar, G. Safont, A. Soriano, and L. Vergara, “Automatic credit card fraud detection based on non-linear signal processing,” in Proceedings of 2012 IEEE International Carnahan Conference on Security Technology, pp. 207–212, IEEE, Newton, MA, USA, October 2012.
T. Bujlow, T. Riaz, and J. M. Pedersen, “A method for classification of network traffic based on C5.0 machine learning algorithm,” in Proceedings of the 2012 International Conference on Computing, Networking and Communications, pp. 237–241, IEEE, Maui, HI, USA, February 2012.
L. Gao, J. Wu, C. Zhou, and Y. Hu, “Collaborative dynamic sparse topic regression with user profile evolution for item recommendation,” in Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, New York, NY, USA, February 2017.
S. Xue, J. Lu, and G. Zhang, “Cross-domain network representations,” Pattern Recognition, vol. 94, pp. 135–148, 2019.
S. Ren, W. Zhu, B. Liao et al., “Selection-based resampling ensemble algorithm for nonstationary imbalanced stream data learning,” Knowledge-Based Systems, vol. 163, pp. 705–722, 2019.
J. Sun, H. Fujita, P. Chen, and H. Li, “Dynamic financial distress prediction with concept drift based on time weighting combined with adaboost support vector machine ensemble,” Knowledge-Based Systems, vol. 120, pp. 4–14, 2017.
T. Zhai, Y. Gao, H. Wang, and L. Cao, “Classification of high-dimensional evolving data streams via a resource-efficient online ensemble,” Data Mining and Knowledge Discovery, vol. 31, no. 5, pp. 1242–1265, 2017.
W.-X. Lu, C. Zhou, and J. Wu, “Big social network influence maximization via recursively estimating influence spread,” Knowledge-Based Systems, vol. 113, pp. 143–154, 2016.
Y. Zhang, J. Wu, C. Zhou, and Z. Cai, “Instance cloned extreme learning machine,” Pattern Recognition, vol. 68, pp. 52–65, 2017.
J. Wu, Z. Cai, S. Zeng, and X. Zhu, “Artificial immune system for attribute weighted naive bayes classification,” in Proceedings of the 2013 International Joint Conference on Neural Networks (IJCNN), pp. 1–8, IEEE, Dallas, TX, USA, August 2013.
J. Wu, S. Pan, X. Zhu, C. Zhang, and X. Wu, “Multi-instance learning with discriminative bag mapping,” IEEE Transactions on Knowledge and Data Engineering, vol. 30, no. 6, pp. 1065–1080, 2018.
P. ZareMoodi, S. K. Siahroudi, and H. Beigy, “A support vector based approach for classification beyond the learned label space in data streams,” in Proceedings of the 31st Annual ACM Symposium on Applied Computing, pp. 910–915, ACM, Pisa, Italy, April 2016.
S. Ramirez-Gallego, B. Krawczyk, S. Garcia, M. Wozniak, J. M. Benitez, and F. Herrera, “Nearest neighbor classification for high-speed big data streams using spark,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 47, no. 10, pp. 2727–2739, 2017.
J. Gama, R. Fernandes, and R. Rocha, “Decision trees for mining data streams,” Intelligent Data Analysis, vol. 10, no. 1, pp. 23–45, 2006.
H. L. Hammer, A. Yazidi, and B. J. Oommen, “On the classification of dynamical data streams using novel ‘Anti-Bayesian’ techniques,” Pattern Recognition, vol. 76, pp. 108–124, 2018.
M. A. Maloof and R. S. Michalski, “Incremental learning with partial instance memory,” Artificial Intelligence, vol. 154, no. 1-2, pp. 95–126, 2004.
K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer, “Online passive-aggressive algorithms,” Journal of Machine Learning Research, vol. 7, pp. 551–585, 2006.
M. Tennant, F. Stahl, O. Rana, and J. B. Gomes, “Scalable real-time classification of data streams with concept drift,” Future Generation Computer Systems, vol. 75, pp. 187–199, 2017.
J. Read, A. Bifet, B. Pfahringer, and G. Holmes, “Batch-incremental versus instance-incremental learning in dynamic and evolving data,” in Proceedings of International Symposium on Intelligent Data Analysis, pp. 313–323, Springer, Helsinki, Finland, October 2012.
J. Lu, D. Sahoo, P. Zhao, and S. C. Hoi, “Sparse passive-aggressive learning for bounded online kernel methods,” ACM Transactions on Intelligent Systems and Technology, vol. 9, no. 4, p. 45, 2018.
M. Oide, A. Takahashi, T. Abe, and T. Suganuma, “User-oriented video streaming service based on passive aggressive learning,” International Journal of Software Science and Computational Intelligence, vol. 9, no. 1, pp. 35–54, 2017.
B. Krawczyk and M. Woźniak, “One-class classifiers with incremental learning and forgetting for data streams with concept drift,” Soft Computing, vol. 19, no. 12, pp. 3387–3400, 2015.
S. Xu and J. Wang, “A fast incremental extreme learning machine algorithm for data streams classification,” Expert Systems with Applications, vol. 65, pp. 332–344, 2016.
Y. Lu, Y.-M. Cheung, and Y. Y. Tang, “Dynamic weighted majority for incremental learning of imbalanced data streams with concept drift,” in Proceedings of the 2017 International Joint Conference on Artificial Intelligence, pp. 2393–2399, Melbourne, Australia, August 2017.
Y. Zhang, J. Yu, W. Liu, and K. Ota, “Ensemble classification for skewed data streams based on neural network,” International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 26, p. 08, 2018.
B. Krawczyk, L. L. Minku, J. Gama, J. Stefanowski, and M. Woźniak, “Ensemble learning for data stream analysis: a survey,” Information Fusion, vol. 37, pp. 132–156, 2017.
W. N. Street and Y. Kim, “A streaming ensemble algorithm (SEA) for large-scale classification,” in Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 377–382, ACM, San Francisco, CA, USA, August 2001.
H. Wang, W. Fan, P. S. Yu, and J. Han, “Mining concept-drifting data streams using ensemble classifiers,” in Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 226–235, ACM, Washington, DC, USA, August 2003.
J. R. B. Junior and M. do Carmo Nicoletti, “An iterative boosting-based ensemble for streaming data classification,” Information Fusion, vol. 45, pp. 66–78, 2019.
K. Jackowski, “New diversity measure for data stream classification ensembles,” Engineering Applications of Artificial Intelligence, vol. 74, pp. 23–34, 2018.
G. I. Webb, R. Hyde, H. Cao, H. L. Nguyen, and F. Petitjean, “Characterizing concept drift,” Data Mining and Knowledge Discovery, vol. 30, no. 4, pp. 964–994, 2016.
A. Tsymbal, “The problem of concept drift: definitions and related work,” Computer Science Department, vol. 106, no. 2, 2004.
T. Zhang, R. Ramakrishnan, and M. Livny, “BIRCH: an efficient data clustering method for very large databases,” ACM SIGMOD Record, vol. 25, no. 2, pp. 103–114, 1996.
A. Bifet, G. Holmes, R. Kirkby, and B. Pfahringer, “MOA: massive online analysis,” Journal of Machine Learning Research, vol. 11, pp. 1601–1604, 2010.
G. Hulten, L. Spencer, and P. Domingos, “Mining time-changing data streams,” in Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 97–106, ACM, San Francisco, CA, USA, August 2001.
P. Domingos and G. Hulten, “Mining high-speed data streams,” in Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, vol. 2, p. 4, Boston, MA, USA, August 2000.
A. Bifet, G. Holmes, B. Pfahringer, and R. Gavalda, “Improving adaptive bagging methods for evolving data streams,” in Proceedings of 2009 Asian Conference on Machine Learning, pp. 23–37, Springer, Berlin, Germany, November 2009.

Copyright

Copyright © 2020 Sanmin Liu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
