Input:
S: a stream of examples
X: a set of symbolic attributes
G(·): heuristic evaluation function for node splitting
δ: one minus the desired probability of choosing the correct attribute at any given node
n_min: number of samples between estimation of growth
: sorted list of Hoeffding bound values
: total number of values in
: new Hoeffding bound value seen at the node
: adaptive threshold
: subset of
: 5% of examples in . Threshold for checking the eligibility of a node to be part of HT
Size of
Output:
A decision tree HT |
Procedure EnhancedVFDT(S, X, G, δ, n_min)
BEGIN: |
A stream of examples arrives |
IF (), THEN TreeInitialization(S, X) |
Get an initialized HT with a single root node
IF (), THEN NewStreamSample(S, X) |
Label l with the majority class among the samples seen so far at l
Let n_l be the number of samples seen at l
IF the samples seen so far at l are not all of the same class and n_l mod n_min = 0
THEN |
Compute G_l(X_i) for each attribute X_i ∈ X_l − {X_∅} using n_ijk(l)
PrunedMean = AccuracyEVFDT(, , , ) |
Let X_a be the attribute with highest G_l and X_b be the attribute with second-highest G_l
Compute ε using (1)
Let ΔG = G_l(X_a) − G_l(X_b)
IF ( or ≤ PrunedMean) and , THEN split as a branch |
FOR each branch of split |
Add a new leaf l_m and let X_m = X − {X_a}
Let G_m(X_∅) be the G obtained by predicting the most frequent class at l_m
FOR each class y_k and each value x_ij of each attribute X_i ∈ X_m − {X_∅}
Let n_ijk(l_m) = 0.
END-FOR |
END-FOR |
END-IF |
ELSE Pruning(, , , , HT) |
Return HT |
END: |
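
The split test at the heart of the procedure can be illustrated with a minimal Python sketch. It rests on several assumptions that are not confirmed by the listing: equation (1) is taken to be the standard Hoeffding bound, ε = sqrt(R² ln(1/δ) / (2n)); the heuristic G is taken to be information gain; and all function and parameter names (`hoeffding_bound`, `information_gain`, `should_split`, `value_range`) are hypothetical. The PrunedMean condition and the Pruning step are not modeled, because their definitions do not appear in this section.

```python
import math
from collections import Counter


def hoeffding_bound(value_range, delta, n):
    """Assumed form of equation (1): sqrt(R^2 * ln(1/delta) / (2n))."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def entropy(class_counts):
    """Shannon entropy (in bits) of a class -> count mapping."""
    total = sum(class_counts.values())
    if total == 0:
        return 0.0
    return -sum(
        (c / total) * math.log2(c / total)
        for c in class_counts.values()
        if c > 0
    )


def information_gain(counts):
    """Information gain of one attribute at a leaf.

    counts maps each attribute value to a Counter of class counts,
    playing the role of the per-leaf sufficient statistics.
    Information gain is an assumed choice for the heuristic G.
    """
    parent = Counter()
    for class_counts in counts.values():
        parent.update(class_counts)
    n = sum(parent.values())
    if n == 0:
        return 0.0
    remainder = sum(
        sum(class_counts.values()) / n * entropy(class_counts)
        for class_counts in counts.values()
    )
    return entropy(parent) - remainder


def should_split(gains, n, delta, value_range):
    """Plain Hoeffding test: split when G(best) - G(second best) > epsilon.

    gains maps attribute name -> heuristic value at the leaf.
    Returns (decision, best attribute, epsilon).
    """
    if len(gains) < 2:
        raise ValueError("need at least two candidate attributes")
    ranked = sorted(gains.items(), key=lambda kv: kv[1], reverse=True)
    (best, g_best), (_, g_second) = ranked[0], ranked[1]
    eps = hoeffding_bound(value_range, delta, n)
    return (g_best - g_second) > eps, best, eps
```

For a two-class problem with information gain, `value_range` would be 1.0 (log2 of the number of classes). Because ε shrinks as the sample count n grows, the same gap between the two best attributes can fail the test on a few samples and pass it once more have been seen at the leaf.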