Research Article  Open Access
Hang Yang, Simon Fong, "Incremental Optimization Mechanism for Constructing a Decision Tree in Data Stream Mining", Mathematical Problems in Engineering, vol. 2013, Article ID 580397, 14 pages, 2013. https://doi.org/10.1155/2013/580397
Incremental Optimization Mechanism for Constructing a Decision Tree in Data Stream Mining
Abstract
Imperfect data streams lead to tree-size explosion and detrimental accuracy problems. The overfitting problem and imbalanced class distributions reduce the performance of the original decision-tree algorithm for stream mining. In this paper, we propose an incremental optimization mechanism to solve these problems. The mechanism is called Optimized Very Fast Decision Tree (OVFDT), which possesses an optimized node-splitting control mechanism. Accuracy, tree size, and learning time are the significant factors influencing an algorithm's performance; naturally, a bigger tree size takes longer computation time. OVFDT is a pioneering model equipped with an incremental optimization mechanism that seeks a balance between accuracy and tree size for data stream mining. It operates incrementally by a test-then-train approach. Three types of functional tree leaves improve the accuracy with which the tree model makes a prediction for a new data stream in the testing phase. The optimized node-splitting mechanism controls the tree model's growth in the training phase. The experiments show that OVFDT obtains an optimal tree structure on both numeric and nominal datasets.
1. Introduction
Decision-tree learning is one of the most significant classification techniques in data mining and has been applied in many areas, including business intelligence, healthcare, and biomedicine. The traditional approach to building a decision-tree, based on greedy search, loads a full set of data into memory and partitions the data into a hierarchy of nodes and leaves. The tree cannot be changed when new data are acquired, unless the whole model is rebuilt by reloading the complete set of historical data together with the new data. This approach is unsuitable for unbounded input data such as data streams, in which new data continuously flow in at high speed. To this end, the incremental approach builds a decision-tree dynamically, such that the tree grows as new data arrive.
A new generation of algorithms has been developed for incremental decision-trees, a pioneer of which is the so-called Very Fast Decision Tree (VFDT) [1], which uses a Hoeffding bound (HB) in node splitting. It builds a decision-tree simply by keeping track of the statistics of the attributes of the incoming data. When sufficient statistics have accumulated at a leaf, a node-splitting algorithm determines whether there is enough statistical evidence in favor of a node split, which expands the tree by replacing the leaf with a new decision node. This decision-tree learns by incrementally updating the model while scanning the data stream on the fly. This powerful concept contrasts with a traditional decision-tree, which requires reading a full dataset for tree induction. The obvious advantage is its real-time data mining capability, which frees it from the need to store all of the data to retrain the decision-tree, because the moving data streams are infinite. A research work [2] proves the feasibility of classification algorithms for analyzing biosignals in the form of infinite data streams, and it also provides a practical comparison of the traditional decision-tree C4.5 and the incremental decision-tree VFDT.
On one hand, the challenge of data stream mining is associated with imbalanced class distributions. The term “imbalanced data” refers to irregular class distributions in a dataset: for example, a large percentage of training samples may be biased toward one class, leaving few samples that describe the other. Both noise and imbalanced class distributions significantly impair the accuracy of a decision-tree classifier through the confusion and misclassification prompted by the inappropriate data. The size of the decision-tree will also grow excessively large under noisy data. To tackle these problems, some researchers have applied data manipulation techniques to handle imbalanced class distributions, including under-sampling, re-sampling, a recognition-based induction scheme [3], and a feature subset selection approach [4]. On the other hand, despite the difference in their tree-building processes, both traditional and incremental decision-trees suffer from a phenomenon called overfitting when the input data are infected with noise. The noise confuses the tree-building process with conflicting instances. Consequently, the tree size becomes very large and eventually describes the noise rather than the underlying relationship.
With traditional decision-trees, the underperforming branches created by noise and biases are commonly pruned by cross-validating them with separate sets of training and testing data. Pruning algorithms [5] help keep the size of the decision-tree in check; however, the majority are post-pruning techniques that remove irrelevant tree paths after a whole model has been built from a stationary dataset. Post-pruning of a decision-tree in high-speed data stream mining, however, may not be possible (or desirable) because of the nature of incremental access to the constantly incoming data streams.
In this paper, we devise a new version of VFDT, the so-called Optimized VFDT (OVFDT), which provides incremental optimization of prediction accuracy and decision-tree model size. The motivations of OVFDT are as follows. (1) To handle the imbalanced class distribution problem, OVFDT proposes three types of functional tree leaf that improve the classification accuracy. (2) To deal with noisy data in data streams, OVFDT uses an adaptive tie-breaking threshold instead of a user-predefined one. The best setting is not known unless all possibilities have been tried, which is an obstacle for real-world applications. By running simulation experiments, the optimized value of the adaptive tie is shown to be ideal for constraining optimal tree growth. (3) To prevent the overfitting problem, OVFDT contains an incremental optimization mechanism in the node-splitting test that obtains an optimal decision-tree balancing prediction accuracy and model size.
The rest of this paper is structured as follows: Section 2 introduces the research background of decision-tree learning for data streams and the effect of the tie-breaking threshold on tree building. Section 3 presents the details of the OVFDT algorithm in terms of a test-then-train approach. Experiments are described in Section 4. Section 5 concludes.
2. Background
2.1. Decision-Tree in Data Stream Mining
A decision-tree classification problem is defined as follows: N is the number of examples in a dataset of the form (X, y), where X is a vector of I attributes and y is a discrete class label; k is the index of the class label, so the class label with the kth discrete value is y_k. Attribute X_i is the ith attribute in X and is assigned a value x_ij, where 1 ≤ i ≤ I, 1 ≤ j ≤ J, and J is the number of different values of X_i. The classification goal is to produce a decision-tree model from the N examples that predicts the classes of y in future examples with high accuracy. In data stream mining, the example size is very large or unlimited, N → ∞.
VFDT [1] constructs an incremental decision-tree by using constant memory and constant time per sample. It is a pioneering predictive technique that utilizes the Hoeffding bound. The tree is built by recursively replacing leaves with decision nodes. Sufficient statistics of each attribute value x_ij are stored in each leaf, together with the class label values y_k. A heuristic evaluation function G(·) is used to determine split attributes for converting leaves to nodes. Nodes contain the split attributes, and leaves contain only the class labels. A leaf represents a class according to the sample labels. When a sample enters, it traverses the tree from the root to a leaf, evaluating the relevant attribute at every node. Once the sample reaches a leaf, the sufficient statistics are updated. At this time, the system evaluates each possible condition based on the attribute values; if the statistics are sufficient to support one test over the others, then the leaf is converted to a decision node. The decision node contains the number of possible values for the chosen attribute, according to the installed split test. The main elements of VFDT include, first, a tree-initializing process that initially contains only a single leaf and, second, a tree-growing process that contains a splitting check using a heuristic function G(·) and a Hoeffding bound (HB). VFDT uses information gain as G(·).
The formula of HB is shown in (1):

HB = sqrt(R^2 ln(1/δ) / (2n)).    (1)

HB controls errors in the attribute-splitting distribution selection, where R is the range of the heuristic evaluation over the classes' distribution and n is the number of instances that have fallen into a leaf. δ is one minus the desired probability of choosing the correct attribute at any given node. To evaluate a splitting value for attribute X_i, the algorithm considers the best two values. Suppose x_a is the best value of X_i, where G(x_a) is the highest heuristic value; suppose x_b is the second-best value, where G(x_b) is the second-highest; and suppose ΔG = G(x_a) − G(x_b) is the difference between the best two values for attribute X_i, where ΔG ≥ 0. HB is used to compute a high-confidence interval for the true mean of the evaluation of attribute X_i with respect to the class. If, after observing n examples, the inequality ΔG > HB holds, then with probability 1 − δ the best attribute observed over a portion of the stream is truly the best attribute over the entire stream. Thus, a splitting value of attribute X_i can be found without the full set of attribute values, even when we do not know all values of X_i. In other words, the algorithm does not train a model from the full data, and the tree grows incrementally as more data come.
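As a small illustration, the bound in (1) and the resulting split test can be sketched in a few lines of Python; the values of R, δ, n, and ΔG below are hypothetical, chosen only to show the mechanics:

```python
import math

def hoeffding_bound(R, delta, n):
    """Hoeffding bound: with probability 1 - delta, the true mean of a
    random variable with range R differs from its sample mean over n
    observations by at most this value."""
    return math.sqrt((R ** 2) * math.log(1.0 / delta) / (2.0 * n))

# For information gain over C classes, the range is commonly R = log2(C).
R = math.log2(10)          # e.g. a 10-class problem such as LED24
delta = 1e-6               # one minus the desired confidence
n = 200                    # instances observed at the leaf

hb = hoeffding_bound(R, delta, n)
# Split when the gap between the two best attributes exceeds the bound.
delta_g = 0.25             # hypothetical G(best) - G(second best)
should_split = delta_g > hb
```

Note how the bound shrinks as n grows: a leaf that has seen more samples needs a smaller gap ΔG to justify a confident split.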
In the past decade, several research papers have proposed different methodologies to improve the accuracy of VFDT. HOT [6] proposes an algorithm that produces optional tree branches at the same time, replacing rules of lower accuracy with optional ones. The classification accuracy is improved significantly, while the learning speed is slowed because of the construction of the optional tree branches; some of the options are inactive branches that consume computing resources. The functional tree leaf was originally integrated into an incremental decision-tree in VFDTc [7]; consequently, the Naïve Bayes classifier on the tree leaf improved the classification accuracy. The functional tree leaf is able to handle both continuous and discrete values in data streams, but no direct evidence shows that it can handle such imperfections as noise and bias in data streams. FlexDT [8] proposes a Sigmoid function to handle noisy data and missing values. The Sigmoid function is used to decide the true node-splitting value, but it sacrifices algorithm speed. For this reason, a lightweight algorithm with fast learning speed is favored for the data stream environment. CBDT [9] is a forest-of-trees algorithm that maintains a number of trees, each of which is rooted on a different attribute and grows independently. It is sensitive to concept drift in data streams owing to its sliding-window mechanism. VFDR [10] is a decision-rule learner using HB. Like VFDT, VFDR proposes a rule-expanding mechanism that constructs decision rules (ordered or unordered) from data streams on the fly. VFDT handles streaming data such that the tree structure keeps updating when new data arrive. It only requires reading enough samples to satisfy the statistical bound (the HB) to construct a decision-tree. Since it cannot analyze the whole training dataset at one time, normal optimization methods that use the full dataset to search for an optimum between accuracy and tree size do not work well here.
Our previous work provided a solution that sustains prediction accuracy and regulates the growth of the decision-tree to a reasonable extent, even in the presence of noise. Moderated Very Fast Decision Tree (MVFDT) [11] is a novel extension of the VFDT model that optimizes the tree-growing process via an adaptive tie-breaking threshold instead of the user-predefined value in VFDT.
There are two popular platforms for implementing stream-mining decision-tree algorithms. Very Fast Machine Learning (VFML) [12] is a C-based tool for mining time-changing, high-speed data streams. Massive Online Analysis (MOA) [13] is Java-based software for massive data analysis and a well-known open-source project extended from the WEKA data mining framework. In both platforms, the parameters of VFDT must be pre-configured, and the parameter setup differs for different tree induction tasks.
MOA is an open-source project with a user-friendly graphical interface. It also provides several ways to evaluate an algorithm's performance; hence, several VFDT-extended algorithms have been built on this platform. For example, among the VFDT algorithms embedded in MOA (released in November 2011): Ensemble Hoeffding Tree [14] is an online bagging method with an ensemble of VFDT classifiers. Adaptive Size Hoeffding Tree (ASHT) [15] is derived from VFDT by adding a maximum number of split nodes; after a node splits, if the number of split nodes exceeds the maximum value, some nodes are deleted to reduce the tree size. Besides, it is designed for handling concept-drift data streams. AdaHOT [15] is derived from HOT; each leaf stores an estimate of the current error, and the weight of each node in the voting process is proportional to the square of the inverse of its error. AdaHOT combines HOT with a voting mechanism on each node, and it extends the advantage of using optional trees to replace tree branches of bad performance. Based on the assumption that “there has been no change in the average value inside the window,” ADWIN [16] proposes a solution that detects changes using a variable-length window of recently seen instances. In this paper, the OVFDT algorithm is developed on the foundation of the MOA platform.
2.2. Relationship amongst Accuracy, Tree Size, and Time
When data contain noisy values, they may confuse the result of the heuristic function: the difference between the best two heuristic evaluations for attribute X_i, ΔG = G(x_a) − G(x_b), may be negligible. To solve this problem, a fixed tie-breaking threshold τ, a user-predefined threshold for incremental decision-tree learning, was proposed as a pre-pruning mechanism to control the tree growth speed [17]. This threshold constrains the node-splitting condition: when ΔG ≤ HB but HB < τ, the two best candidates are regarded as tied and the node splits on the current best attribute. An effective τ guarantees minimum tree growth and guards against the tree-size explosion problem. τ must be set before a new learning task starts; however, so far there is no unique τ suitable for all problems. In other words, no single default value works well in all tasks. The choice of τ hence depends on the data and their nature. It has been reported that excessive invocation of tie breaking significantly degrades the performance of decision-tree learning on complex and noisy data, even with the additional condition imposed by the parameter τ.
In addition to the tie-breaking threshold τ, n_min is the number of instances a leaf should observe between split attempts. In other words, τ is a user-defined value to control the tree-growing speed, and n_min is a user-defined value to control the interval between node-splitting checks. The former is used to constrain tree size, and the latter is used to constrain the learning speed. In order to optimize accuracy, tree size, and speed for decision-tree learning, an example is first given to demonstrate the relationship among these three factors for data streams.
In this example, we use VFDT, a classical incremental decision-tree using HB in node splitting, to evaluate synthetic datasets with bias added to the classes. We use MOA to generate two typical datasets: LED24 is a nominal dataset, and Waveform21 is a numeric dataset. Both datasets share their origins with the sample generators donated to the UCI machine learning repository. LED24 uses 24 nominal attributes to classify 10 different classes, and Waveform21 uses 21 numeric attributes to classify 3 different classes. The data stream problem is simulated by a large number of instances, as many as one million. The accuracy, tree size, and time are recorded while changing the predefined values of τ and n_min. From Table 1, we can see the following. (i) In general, a bigger tree size brings a higher accuracy, even when caused by the overfitting problem, but takes more learning time. (ii) τ is proposed to control tree-size growth. A bigger τ brings faster tree-size growth but longer computation time. Because memory is limited, however, the tree size stops increasing once τ reaches a certain threshold (a threshold that differs between LED24 and Waveform21). (iii) n_min is proposed to control the learning time. A bigger n_min brings a faster learning speed, but a smaller tree size and lower accuracy.
[Table 1: accuracy, tree size, and learning time of VFDT on (a) LED24 and (b) Waveform21 under varying τ and n_min.]
A proposed solution [18] to overcome this detrimental effect is an improved tie-breaking mechanism, which considers not only the best (x_a) and the second-best (x_b) splitting candidates in terms of the heuristic function, but also the worst candidate (x_c). At the same time, an extra parameter α is introduced, which determines how many times smaller the gap must be before it is considered a tie. The attribute-splitting condition becomes the following: when the gap between the best two candidates is α times smaller than the gap between the second-best and the worst candidates, the best attribute is split as a node. Obviously, this approach uses two extra elements, x_c and α, which bring extra computation to the original algorithm. Moreover, the only way to detect the best tie-breaking threshold for a certain task in VFDT is to try all the possibilities, which is impractical for real-world applications. In this paper, we propose an adaptive tie-breaking threshold using an incremental optimization methodology. The breakthrough of our work is the optimized node-splitting control, which is specified in the following sections.
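For reference, the classic fixed-τ splitting test that the above work refines can be sketched as follows. This is a minimal sketch of the standard VFDT decision only; the α-based refinement of [18] is not reproduced here:

```python
def vfdt_split_decision(g_best, g_second, hb, tau):
    """Classic VFDT node-splitting test with a fixed tie-breaking
    threshold tau: split when the best attribute clearly wins, or when
    the race is so close (HB < tau) that waiting longer is pointless."""
    delta_g = g_best - g_second
    if delta_g > hb:
        return True           # statistically confident winner
    if hb < tau:
        return True           # tie: both candidates nearly equal
    return False              # keep collecting samples
```

The second branch is exactly where a poorly chosen fixed τ hurts: too large and noisy ties trigger spurious splits, too small and the tree barely grows.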
3. Proposed Methodology
3.1. Motivation and Overview
OVFDT, which inherits the use of HB, operates on a test-then-train approach (Figure 1) for classifying continuously arriving data streams, even infinite ones. The whole test-then-train process is synchronized such that when the data stream arrives, one segment at a time, the decision-tree is first tested for prediction output, and then training (also known as updating) of the decision-tree occurs incrementally. The testing process is explained in detail in Section 3.3, and the training process in Section 3.4. Ideally, the node-splitting test updates the tree model in order to improve the accuracy, while a bigger tree model takes longer computation time. The node-splitting check is performed when the number of instances in a leaf is greater than the predefined value n_min.
Imperfect data streams, including noisy data and biased class distributions, degrade the performance of VFDT. Figure 2 shows the accuracy, tree size, and computation time of VFDT on the same dataset structure with imperfect values added. An ideal stream is free from noise and has a uniform proportion of class samples, which is rare in the real world. Comparing ideal data streams with imperfect data streams, we conclude Lemma 1.
[Figure 2: (a) accuracy, (b) tree size, and (c) computation time of VFDT on ideal versus imperfect data streams.]
Lemma 1. Imperfections in data streams worsen the performance of VFDT: the tree size and the computation time increase, while the accuracy declines. In other words, the optimization goal is to increase the accuracy without enlarging the tree size, within an acceptable computation time. Naturally, a bigger tree takes longer computation time; for this reason, the computation time is dependent on the tree size.
In the decision-tree model, each path from the root to a leaf is considered one way to present a rule. To ensure high accuracy, there must be a sufficient number of rules, which equals the number of leaves in the tree model. Suppose the Hoeffding Tree (HT) is the decision-tree algorithm using the Hoeffding bound (HB) in the node-splitting test. Let Accu(HT_i) be the accuracy of the decision-tree structure HT at the ith node-splitting estimation, and let Size(HT_i) be its tree size; then Accu(HT_i) = f(Size(HT_i)), where f is a mapping function from tree size to accuracy. Most incremental optimization functions can be expressed as the sum of several sub-objective functions, F(x) = Σ_i f_i(x), where each f_i is a continuously differentiable function whose domain is a nonempty, convex, and closed set. We consider the optimization problem of maximizing the accumulated accuracy Σ_i Accu(HT_i) while constraining the accumulated tree size Σ_i Size(HT_i).
Based on Lemma 1, we propose a solution that optimizes the decision-tree structure by improving the original VFDT so that accuracy is maximized while tree size is kept in check.
The tree model is updated when a node splitting occurs. The original VFDT considers the HB as the only criterion for splitting a node; however, this is not enough. In view of the above optimization goal, OVFDT proposes an optimized node-splitting control during the tree-building process.
3.2. OVFDT Test-Then-Train Process
Data streams are open-ended problems, and traditional sampling strategies are not viable in a non-stopping stream scenario. OVFDT is an improved version of the original VFDT and its extensions that use HB to decide node splitting. Its most significant contribution is that OVFDT can obtain an optimal tree structure by balancing accuracy and tree size. This is useful for data mining, especially in the event of tree-size explosion, when the decision-tree is subject to imperfect streams including noisy data and imbalanced class distributions.
HT algorithms run a test-then-train approach to build a decision-tree model. When a new stream sample arrives, it is sorted from the root to a predicted leaf. Comparing the predicted class with the true class of this sample, we maintain an error matrix for every tree leaf in the testing process. In terms of the stored statistics matrix, the decision-tree model is updated in the training process. Table 2 presents the differences between OVFDT and other HT algorithms (including the original VFDT and its extensions). Pseudocode 1 shows the input parameters and the output of OVFDT, with the approach presented as pseudocode.
 
FTL: functional tree leaf; MC: Majority Class; NB: Naïve Bayes; WNB: Weighted Naïve Bayes; HT: decision tree using a Hoeffding bound.

3.3. OVFDT Testing Approach
Suppose X is a vector of attributes and y is the class, with K different values, included in the data streams. For decision-tree prediction learning tasks, the learning goal is to induce a function ŷ = HT(X, F), where ŷ is the class predicted by a Hoeffding tree (HT) according to a functional tree leaf strategy F. When a new data stream sample (X, y) arrives, it traverses from the root of the decision-tree to an existing leaf according to the current decision-tree structure, provided that the root exists initially. Otherwise, the heuristic function is used to construct a tree model with a single root node. When a new instance comes, it is sorted from the root to a leaf by the current tree model. The classifier at the leaf can further enhance the prediction accuracy via the embedded classifiers. OVFDT contains three classifiers to improve the prediction performance: Majority Class (MC), Naïve Bayes (NB), and Weighted Naïve Bayes (WNB).
Suppose ŷ is the predicted class value and y is the actual class in a data stream with a vector of attributes X. A sufficient-statistics matrix stores the number of passed-by samples that contain attribute X_i with value x_ij belonging to a certain class y_k so far. We call this statistics table the Observed Class Distribution (OCD) matrix. The size of the OCD is J × K, where J is the total number of distinct values of attribute X_i and K is the number of distinct class values. Suppose n_ijk is the sufficient statistic that counts the samples in which attribute X_i takes value x_ij and belongs to class y_k. Therefore, the OCD on a node for attribute X_i is the J × K matrix of counts n_ijk.
For a certain leaf where attribute X_i takes the value x_ij, the OCD reduces to the vector of per-class counts (n_ij1, n_ij2, ..., n_ijK).
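The OCD bookkeeping can be sketched in a few lines; the sizes J = 3 and K = 2 and the update calls below are hypothetical, chosen only to show the counting:

```python
# Sketch of an Observed Class Distribution (OCD) matrix for one
# attribute at a leaf: J rows (attribute values) x K columns (classes).
J, K = 3, 2                       # hypothetical sizes
ocd = [[0] * K for _ in range(J)]

def ocd_update(ocd, value_idx, class_idx):
    """Increment the count n_ijk when a sample whose attribute takes
    value j and whose class is k passes through the leaf."""
    ocd[value_idx][class_idx] += 1

ocd_update(ocd, 0, 1)
ocd_update(ocd, 0, 1)
ocd_update(ocd, 2, 0)
# Row j now holds the per-class counts for attribute value j -- the
# vector the functional tree leaves consume.
```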
The Majority Class classifier chooses the class with the maximum count as the predicted class in a leaf. Thus, MC predicts the class ŷ = argmax_k n_ijk.
The Naïve Bayes classifier chooses the class with the maximum probability computed by Naïve Bayes as the predicted class in a leaf. The formula of Naïve Bayes is P(y_k | X) ∝ P(y_k) Π_i P(x_ij | y_k), with the probabilities estimated from the OCD counts.
The OCD of the leaf is updated incrementally. Thus, NB predicts the class ŷ = argmax_k P(y_k | X).
The Weighted Naïve Bayes classifier is proposed to reduce the effect of imbalanced class distributions. It chooses the class with the maximum probability computed by Weighted Naïve Bayes as the predicted class in a leaf.
The OCD of the leaf is updated accordingly. Thus, WNB predicts the class with the maximum weighted probability.
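The three functional tree leaves can be sketched over a single count table as follows. This is an illustrative sketch: the Laplace smoothing constants and, in particular, the WNB weighting scheme (damping by inverse class frequency) are assumptions for demonstration, not the paper's exact formulas:

```python
from collections import defaultdict

class FunctionalLeaf:
    """Sketch of the MC / NB / WNB functional tree leaves over an
    OCD-style count table. The WNB weighting here (inverse class
    frequency) is an illustrative assumption."""

    def __init__(self, n_classes):
        self.class_counts = [0] * n_classes            # per-class totals
        self.attr_counts = defaultdict(lambda: [0] * n_classes)

    def observe(self, x, y):
        self.class_counts[y] += 1
        for i, v in enumerate(x):
            self.attr_counts[(i, v)][y] += 1

    def predict_mc(self):
        # Majority Class: class with the largest count so far.
        return max(range(len(self.class_counts)),
                   key=self.class_counts.__getitem__)

    def _posteriors(self, x):
        n = sum(self.class_counts)
        post = []
        for k, ck in enumerate(self.class_counts):
            p = (ck + 1) / (n + len(self.class_counts))   # smoothed prior
            for i, v in enumerate(x):
                p *= (self.attr_counts[(i, v)][k] + 1) / (ck + 2)
            post.append(p)
        return post

    def predict_nb(self, x):
        # Naive Bayes: class with the maximum posterior probability.
        post = self._posteriors(x)
        return max(range(len(post)), key=post.__getitem__)

    def predict_wnb(self, x):
        # Weighted NB (assumed weighting): damp majority classes by
        # dividing each posterior by the class's observed frequency.
        n = sum(self.class_counts) or 1
        post = [p / ((c + 1) / (n + 1))
                for p, c in zip(self._posteriors(x), self.class_counts)]
        return max(range(len(post)), key=post.__getitem__)
```

All three predictors read the same incrementally maintained counts, which is why switching the functional tree leaf strategy costs no extra passes over the stream.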
After the stream sample traverses the HT, it is assigned a predicted class ŷ according to the functional tree leaf strategy F. Comparing the predicted class ŷ with the actual class y, the statistics of correct and incorrect predictions are updated immediately. Meanwhile, the sufficient statistics n_ijk, counting attribute X_i with value x_ij belonging to class y_k, are updated in each node. This series of actions is the so-called testing approach in this paper, and Pseudocode 2 gives its pseudocode. According to the functional tree leaf strategy, the current HT sorts a newly arrived sample (X, y) from the root to a predicted leaf; comparing the predicted class to the actual class, the sequential-error statistics of correct and incorrect prediction are updated immediately.

To store the OCD for OVFDT, MC, NB, and WNB require memory proportional to the product of the number of nodes in the tree model, the number of attributes, the maximum number of values per attribute, and the number of classes. The OCDs of NB and WNB are converted from that of MC; therefore, no extra memory is required, as they can be converted from MC when needed.
3.4. OVFDT Training Approach
Right after the testing approach, the training follows. Node-splitting estimation is used to decide whether the HT should be updated, depending on whether the samples received so far can be represented by additional underlying rules in the decision-tree. In principle, the optimized node-splitting estimation could be applied to every single new sample that arrives; of course, this would be too exhaustive and would slow down the tree-building process. Instead, a parameter n_min proposed in VFDT ensures that the node-splitting estimation is only done once n_min examples have been observed at a leaf. In the node-splitting estimation, the tree model is updated when the heuristic function chooses the most appropriate attribute, with the highest heuristic function value, as a splitting node according to the HB and the tie-breaking threshold. The heuristic function is implemented as information gain here. This step of node-splitting estimation constitutes the so-called training phase.
The node-splitting test is modified to use a dynamic tie-breaking threshold τ, which governs whether an attribute splits as a decision node. The parameter τ is traditionally pre-configured with a default value defined by the user, and the optimal value is usually not known until all of the possibilities in an experiment have been tried; an example was presented in Section 2.2. Longitudinal testing of different values in advance is certainly not favorable in real-time applications. Instead, we assign a dynamic tie threshold, equal to the running mean of HB over each pass of stream data, as the splitting threshold that controls node splitting during the tree-building process. Tie breaking that occurs close to the HB mean can effectively narrow the variance of the distribution. The HB mean is calculated dynamically whenever new data arrive.
The estimation of splits and ties is only executed once for every n_min (a user-supplied value) samples that arrive at a leaf. Instead of a pre-configured tie, OVFDT uses an adaptive tie that is calculated by incremental computing. At the ith node-splitting estimation, the HB estimates whether the sufficient statistics represent a large enough sample size to split a new node, corresponding to leaf l. Let τ_l be the adaptive tie corresponding to leaf l within the i estimations seen so far, and let φ_il be a binary variable that takes the value 1 if the ith HB relates to leaf l and 0 otherwise. τ_l is computed as the mean of the HB values observed at leaf l: τ_l = Σ_i (HB_i · φ_il) / Σ_i φ_il. To constrain HB fluctuation, an upper bound and a lower bound on the adaptive tie are also proposed in the adaptive tie mechanism.
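The per-leaf adaptive tie can be sketched as an incremental mean with clipping. The clipping bounds here are illustrative placeholders; the paper's exact bound formulas are not reproduced:

```python
class AdaptiveTie:
    """Sketch of the adaptive tie-breaking threshold: for each leaf,
    tau is the incremental mean of the Hoeffding bounds observed at
    that leaf's node-splitting estimations, clipped to [lower, upper]
    to damp HB fluctuation (illustrative bounds, not the paper's)."""

    def __init__(self, lower=0.0, upper=1.0):
        self.sum_hb = 0.0      # running sum of HB values for this leaf
        self.count = 0         # number of estimations seen so far
        self.lower, self.upper = lower, upper

    def update(self, hb):
        self.sum_hb += hb
        self.count += 1

    def tau(self):
        if self.count == 0:
            return self.upper          # no evidence yet: permissive tie
        mean = self.sum_hb / self.count
        return min(max(mean, self.lower), self.upper)
```

Only a running sum and a counter are stored per leaf, so the adaptive tie adds constant memory and constant time per estimation, in keeping with VFDT's resource guarantees.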
For lightweight operation, we propose an error-based pre-pruning mechanism for OVFDT, which stops a non-informative split before a leaf splits into a new node. The pre-pruning takes the node-splitting error into account both globally and locally.
According to the optimization goal mentioned in Section 3.1, besides the HB, we also consider the global and local accuracy in terms of the sequential-error statistics of correct and incorrect prediction computed by the functional tree leaf. Let e_i be the difference between the numbers of correct and incorrect predictions, where i is the index of the testing approach. Then e_i reflects the global accuracy of the current HT's prediction on the newly arrived data streams. If e_i ≥ 0, the number of correct predictions is no less than the number of incorrect predictions for the current tree structure; otherwise, the current tree needs to be updated by node splitting. In this approach, the statistics of correct and incorrect predictions are updated. If e_i declines over time, the global accuracy of the current HT model is worsening. Likewise, comparing e_i and e_{i−1}, the local accuracy is monitored during node splitting: if e_{i−1} is greater than e_i, the current accuracy is declining locally. In this case, the HT should be updated to suit the newly arrived data streams.
Lemma 2. Monitor global accuracy. The model's accuracy varies whenever a node splits and the tree structure is updated. The overall accuracy of the current tree model is monitored during node splitting by comparing the numbers of correctly and incorrectly predicted samples, which are recorded as global performance indicators so far. This monitoring allows the global accuracy to be determined.
Lemma 3. Monitor local accuracy. The global accuracy can be tracked by comparing the number of correctly predicted samples with the number of wrongly predicted ones. Likewise, comparing the global accuracy measured at the current node-splitting estimation with that of the previous splitting, the increment in accuracy is tracked dynamically. This monitoring allows us to check whether the current node splitting is advantageous at each step by comparing it with the previous step.
Figure 3 gives an example of why our proposed pre-pruning takes into account both the local and the global accuracy in incremental pruning. At the ith node-splitting estimation, the difference between correctly and incorrectly predicted classes was e_i, and it was e_{i−1} at the (i−1)th estimation. If e_i − e_{i−1} is negative, the local accuracy at the ith estimation is worse than at the previous one, even while both are on a globally increasing trend. Hence, if the accuracy is getting worse, it is necessary to update the HT structure.
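The global and local checks described above can be sketched in one small function; the counts passed in are hypothetical:

```python
def accuracy_trend(correct, incorrect, prev_e):
    """Sketch of the global/local accuracy check in the optimized
    node-splitting control. e = correct - incorrect is the global
    indicator; comparing it with the previous estimation's value
    gives the local trend. Returns (e, globally_ok, locally_improving)."""
    e = correct - incorrect
    globally_ok = e >= 0              # correct predictions dominate
    locally_improving = e >= prev_e   # accuracy not declining locally
    return e, globally_ok, locally_improving
```

A split can look harmless globally (e still positive) while the local trend is already falling, which is exactly the case Figure 3 illustrates and the reason both signals are consulted.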
Combining the prediction statistics gathered in the testing phase, Pseudocode 3 presents the pseudocode of the training phase in OVFDT for building an upright tree. The optimized node-splitting control is presented in Line 7 of Pseudocode 3. In each node-splitting estimation process, the HB value that relates to a leaf is recorded. The recorded HB values are used to compute the adaptive tie, which uses the mean of the HB values for each leaf l instead of the fixed user-defined value in VFDT.

4. Evaluation
4.1. Evaluation Platform and Datasets
A Java package with OVFDT has been implemented with the MOA toolkit as a simulation platform for the experiments. The running environment is a Windows 7 PC with an Intel Quad 2.8 GHz CPU and 8 GB RAM. In all of the experiments, the parameters δ and n_min are set to the default values suggested by MOA: δ is the allowable error in a split decision (values closer to zero take longer to decide), and n_min is the number of instances a leaf should observe between split attempts. The main goal of this section is to provide evidence of the improvement of OVFDT compared with the original VFDT.
The experimental datasets, including pure nominal datasets, pure numeric datasets, and mixed datasets, are either synthetic data generated by the MOA Generator or extracted from real-world applications publicly available for download from the UCI repository. The descriptions of the experimental datasets are listed in Table 3. The LED24 dataset is generated by the MOA Generator; in the experiment, we add 10% noisy data to simulate imperfect data streams. The LED24 problem uses 24 binary attributes to classify 10 different classes. The Waveform21 dataset is also generated by MOA and was donated by David Aha to the UCI repository; the goal of the task is to differentiate between three different classes of waveform using 21 numeric attributes that contain noise. Cover Type is used to predict forest cover type from cartographic variables, and this data is collected from the real world [19].

The benchmarking algorithms are VFDT [1], HOT [6], and ADWIN [16]. VFDT and HOT are representative learning methods without sliding-window criteria for handling concept drift; ADWIN uses an adaptive window technique. The authors of [16] claimed that ADWIN performed as well as, or only slightly worse than, the best window for each rate of change in CVFDT, which justifies why CVFDT was not compared in this test.
4.2. Held-Out Evaluation
The first evaluation simulates a holdout testing approach: the datasets are divided into two parts, 70% for training the model and 30% for testing it. In Figures 4 and 5, for LED24 and Waveform21, the OVFDT algorithms achieve the best accuracy and the most compact tree size. For the Cover Type data, HOT obtains higher accuracy but a much bigger model than the others. OVFDT's optimized node-splitting mechanism balances prediction accuracy against model size.
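The 70/30 held-out protocol can be sketched in a few lines; the function name and the convention of taking the first portion of the (finite) stream for training are illustrative assumptions.

```python
def holdout_split(stream, train_frac: float = 0.7):
    """Held-out evaluation split: the first train_frac of a finite
    stream trains the model; the remainder is reserved for testing."""
    data = list(stream)
    cut = int(len(data) * train_frac)
    return data[:cut], data[cut:]
```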
Besides accuracy, we use the receiver operating characteristic (ROC) as a standard method for analyzing and comparing classification results. It provides a convenient graphical display of the trade-off between true and false positive classification rates for two-class problems. In decision-tree classification, the ROC is used for more than two classes; in this case, we apply a multiclass ROC analysis to evaluate the performance of the tree-learning algorithms.
Like two-class ROC statistics, for each class in a multiclass ROC analysis, the samples with that class are assigned as positive and all others as negative (Figure 6). True positives (TPs) are examples correctly labeled as positive, calculated by (17). False positives (FPs) are negative examples incorrectly labeled as positive, calculated by (18). True negatives (TNs) are negatives correctly labeled as negative, calculated by (20). Finally, false negatives (FNs) are positive examples incorrectly labeled as negative, calculated by (19). Each class can thus be converted into a two-class problem with corresponding values for TP, TN, FP, and FN.
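The per-class conversion can be sketched as a generic one-vs-rest tally; this is an illustration of the standard construction, not a transcription of the paper's numbered equations (17)-(20).

```python
def one_vs_rest_counts(y_true, y_pred, cls):
    """Convert a multiclass result into two-class counts by treating
    `cls` as the positive label and every other label as negative."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p != cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    return tp, fp, tn, fn
```

Applying this once per class turns a k-class confusion matrix into k two-class problems, each with its own TP, FP, TN, and FN.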
Precision-Recall is a well-known analysis method for ROC evaluation. In pattern recognition, precision is the fraction of retrieved instances that are relevant, whereas recall is the fraction of relevant instances that are retrieved; both range from 0 to 1. A precision score of 1 for a class means that every item labeled as belonging to that class does, indeed, belong to it. A recall score of 1 means that every item from the class was correctly labeled as belonging to it. Precision and recall are not analyzed in isolation: the F-measure [20] is a weighted harmonic mean of precision and recall, and the evenly weighted F-measure balances the two, with a best value of 1 and a worst value of 0. In addition, the true positive rate (TPR) and false positive rate (FPR) are commonly used benchmarks in ROC analysis. According to the ROC matrix shown in Figure 5, TPR is given in (21), FPR in (22), precision in (23), and the F-measure in (24). The experimental results of the Precision-Recall test are shown in Table 4.
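These statistics reduce to simple ratios of the one-vs-rest counts; the following is a minimal sketch of the standard definitions (with F1 as the evenly weighted F-measure), not the paper's numbered equations.

```python
def roc_metrics(tp: int, fp: int, tn: int, fn: int):
    """Standard two-class ROC statistics from confusion counts."""
    tpr = tp / (tp + fn)          # true positive rate = recall = sensitivity
    fpr = fp / (fp + tn)          # false positive rate
    precision = tp / (tp + fp)    # fraction of positive labels that are correct
    # F1: harmonic mean of precision and recall, weighting both evenly.
    f1 = 2 * precision * tpr / (precision + tpr)
    return tpr, fpr, precision, f1
```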

The pairwise Precision-Recall results are shown in Table 4. OVFDT with the WNB functional tree leaf has the best statistical results on the Waveform21 data, which contains numeric attributes only. For the other two datasets, OVFDT also has a better TPR, indicating higher sensitivity in this experiment. Figure 7 displays the TPR-FPR analysis in ROC space: the plots of OVFDT lie higher than those of the other algorithms, so the proposed methods outperform the others in this holdout test.
4.3. Test-Then-Train Evaluation
The second evaluation implements a test-then-train approach, which is an incremental evaluation. When new instances arrive, they are first used to test the current tree model, and the accuracy and model-size statistics are recorded; those instances are then used to train and update the model. From Figure 8, we can see that OVFDT with the WNB functional tree leaf obtains the best accuracy among the tested algorithms while keeping a relatively small tree. Because its tree induction grows optional subtrees, HOT produces a much bigger model than the other methods. This incremental evaluation shows that OVFDT outperforms the others in both accuracy and tree compactness.
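The test-then-train cycle can be sketched generically; the model interface (`predict`/`update`) and the toy majority-class learner below are illustrative assumptions, not the OVFDT implementation.

```python
def test_then_train(model, stream):
    """Prequential evaluation: each instance first tests the current
    model, then trains it; yields the running accuracy after each instance."""
    correct = 0
    for i, (x, y) in enumerate(stream, start=1):
        if model.predict(x) == y:   # test on the yet-unseen instance
            correct += 1
        model.update(x, y)          # then train on it
        yield correct / i

class Majority:
    """Toy learner for demonstration: always predicts the most frequent
    class seen so far (None before any training)."""
    def __init__(self):
        self.counts = {}

    def predict(self, x):
        return max(self.counts, key=self.counts.get) if self.counts else None

    def update(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1
```

Because every instance is tested before it trains the model, the running accuracy curve reflects performance on unseen data without holding out a separate test set.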
5. Conclusion
Imperfect data streams lead to tree-size explosion and detrimental accuracy problems. In the original VFDT, a tie-breaking threshold with a user-defined value alleviates this problem by controlling the node-splitting process, which is the way the tree grows. However, no single default value always works well, and that user-defined value stays static throughout the stream mining operation. In this paper, we propose an Optimized VFDT (OVFDT) algorithm that uses an adaptive tie mechanism to automatically search for an optimized amount of tree-node splitting, balancing accuracy and tree size during the tree-building process. The optimized node-splitting mechanism controls the attribute-splitting estimation incrementally. Balancing accuracy against tree size is important, as stream mining is supposed to operate in a limited-memory computing environment while still delivering reasonable accuracy. It is a known contradiction that high accuracy requires a large tree with many decision paths, while too sparse a decision tree yields poor accuracy. In the experiments, we use a holdout test and an incremental test to evaluate the proposed model. The results show that OVFDT achieves higher prediction accuracy and a more compact tree size than the other VFDTs. This advantage is accomplished by the simple incremental optimization mechanisms described in this paper; they are lightweight and suitable for incremental learning. The contribution is significant because OVFDT can potentially be further modified into other variants of VFDT models in various applications, while the best possible (optimal) accuracy and minimum tree size can always be guaranteed.
References
[1] P. Domingos and G. Hulten, "Mining high-speed data streams," in Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '00), pp. 71–80, ACM, August 2000.
[2] S. Fong, H. Yang, S. Mohammed, and J. Fiaidhi, "Stream-based biomedical classification algorithms for analyzing biosignals," Journal of Information Processing Systems, vol. 7, no. 4, pp. 717–732, 2011.
[3] N. Chawla, N. Japkowicz, and A. Kolcz, "Special issue on learning from imbalanced data sets," ACM SIGKDD Explorations, vol. 6, no. 1, 2004.
[4] D. Mladenic and M. Grobelnik, "Feature selection for unbalanced class distribution and naive bayes," in Proceedings of the 16th International Conference on Machine Learning, pp. 258–267, Morgan Kaufmann, Boston, Mass, USA, 1999.
[5] T. Elomaa, "The biases of decision tree pruning strategies," in Advances in Intelligent Data Analysis, vol. 1642 of Lecture Notes in Computer Science, pp. 63–74, 1999.
[6] B. Pfahringer, G. Holmes, and R. Kirkby, "New options for Hoeffding trees," in Proceedings of the Australian Conference on Artificial Intelligence, pp. 90–99, 2007.
[7] J. Gama, R. Rocha, and P. Medas, "Accurate decision trees for mining high-speed data streams," in Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '03), pp. 523–528, ACM, Washington, DC, USA, August 2003.
[8] S. Hashemi and Y. Yang, "Flexible decision tree for data stream classification in the presence of concept change, noise and missing values," Data Mining and Knowledge Discovery, vol. 19, no. 1, pp. 95–131, 2009.
[9] S. Hoeglinger, R. Pears, and Y. S. Koh, "CBDT: a concept based approach to data stream mining," in Advances in Knowledge Discovery and Data Mining, vol. 5476 of Lecture Notes in Computer Science, pp. 1006–1012, 2009.
[10] J. Gama and P. Kosina, "Learning decision rules from data streams," in Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI '11), T. Walsh, Ed., vol. 2, pp. 1255–1260, AAAI Press, 2011.
[11] H. Yang and S. Fong, "Moderated VFDT in stream mining using adaptive tie threshold and incremental pruning," in Proceedings of the 13th International Conference on Data Warehousing and Knowledge Discovery (DaWaK '11), pp. 471–483, Springer, Toulouse, France, 2011.
[12] G. Hulten and P. Domingos, "VFML—a toolkit for mining high-speed time-changing data streams," 2003, http://www.cs.washington.edu/dm/vfml/.
[13] A. Bifet, G. Holmes, B. Pfahringer, et al., "MOA: a real-time analytics open source framework," vol. 6913 of Lecture Notes in Computer Science, pp. 617–620.
[14] N. Oza and S. Russell, "Online bagging and boosting," in Artificial Intelligence and Statistics, pp. 105–112, Morgan Kaufmann, Boston, Mass, USA, 2001.
[15] A. Bifet, G. Holmes, B. Pfahringer, R. Kirkby, and R. Gavaldà, "New ensemble methods for evolving data streams," in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '09), pp. 139–147, ACM, Paris, France, July 2009.
[16] A. Bifet and R. Gavaldà, "Learning from time-changing data with adaptive windowing," in Proceedings of the SIAM International Conference on Data Mining, pp. 443–448, 2007.
[17] G. Hulten, L. Spencer, and P. Domingos, "Mining time-changing data streams," in Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '01), pp. 97–106, ACM, August 2001.
[18] G. Holmes, R. Kirkby, and B. Pfahringer, "Tie breaking in Hoeffding trees," in Proceedings of the 2nd International Workshop on Knowledge Discovery from Data Streams, pp. 107–116, Porto, Portugal, 2005.
[19] A. Frank and A. Asuncion, UCI Machine Learning Repository, University of California, School of Information and Computer Science, Irvine, Calif, USA, 2010, http://archive.ics.uci.edu/ml/.
[20] J. P. Bradford, C. Kunz, R. Kohavi, C. Brunk, and C. E. Brodley, "Pruning decision trees with misclassification costs," in Proceedings of the 10th European Conference on Machine Learning (ECML '98), pp. 131–136, Springer, 1998.
Copyright
Copyright © 2013 Hang Yang and Simon Fong. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.