Journal of Electrical and Computer Engineering
Volume 2015, Article ID 835357, 17 pages
http://dx.doi.org/10.1155/2015/835357
Research Article

Analysis of Generalization Ability for Different AdaBoost Variants Based on Classification and Regression Trees

1Department of Computational Intelligence and Systems Science, Tokyo Institute of Technology, 4259 Nagatsuta-cho, Midori-ku, Yokohama, Kanagawa 226-8503, Japan
2Imaging Science and Engineering Laboratory, Tokyo Institute of Technology, 4259 Nagatsuta-cho, Midori-ku, Yokohama, Kanagawa 226-8503, Japan

Received 13 November 2014; Accepted 21 January 2015

Academic Editor: Sos Agaian

Copyright © 2015 Shuqiong Wu and Hiroshi Nagahashi. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

As a machine learning method, AdaBoost is widely applied to data classification and object detection because of its robustness and efficiency. AdaBoost constructs a global and optimal combination of weak classifiers based on sample reweighting, and it is known that this kind of combination improves classification performance tremendously. As the popularity of AdaBoost has grown, many variants have been proposed to improve its performance, and many comparison and review studies of these variants have been published. Some researchers compared different AdaBoost variants by experiments in their own fields, and others reviewed various AdaBoost variants by introducing these algorithms. However, there is a lack of mathematical analysis of the generalization abilities of different AdaBoost variants. In this paper, we analyze the generalization abilities of six AdaBoost variants in terms of classification margins. The six compared variants are Real AdaBoost, Gentle AdaBoost, Modest AdaBoost, Parameterized AdaBoost, Margin-pruning Boost, and Penalized AdaBoost. Finally, we use experiments to verify our analyses.

1. Introduction

In the last two decades, AdaBoost and its variants have been used in various fields, such as face detection [1, 2], hand detection [3, 4], and human detection [5]. AdaBoost was first introduced to the machine learning literature by Freund and Schapire [6], and it has achieved tremendous success in classification. AdaBoost has been proved efficient in increasing the classification margins of training data [7]. To describe AdaBoost mathematically, Schapire and Singer proposed Real AdaBoost, a generalized version of AdaBoost [8]. Real AdaBoost calculates its weak hypotheses by directly optimizing the upper bound of the training error; therefore, it converges faster than AdaBoost during training [9]. To improve the training speed of Real AdaBoost, Wu and Nagahashi devised Parameterized AdaBoost, which utilizes a new weight adjustment policy [10]. In 2000, Friedman et al. used additive logistic models to explain AdaBoost and proposed Gentle AdaBoost, which computes weak hypotheses by minimizing weighted least-squares errors [11]. Friedman et al. also proved that Gentle AdaBoost is more robust than AdaBoost and Real AdaBoost [11]. To reduce the generalization error of Gentle AdaBoost, A. Vezhnevets and V. Vezhnevets suggested Modest AdaBoost, which highlights weak classifiers that work well on difficult-to-classify instances [12]. Modest AdaBoost achieves better generalization errors than Gentle AdaBoost on some data sets [13]. However, its performance is unstable because its accuracy drops occasionally. For the same purpose, Wu and Nagahashi devised Margin-pruning Boost [14] and Penalized AdaBoost [15]. Margin-pruning Boost applies a weight reinitialization approach to reduce the influence of noise-like data, while Penalized AdaBoost improves Margin-pruning Boost by introducing an adaptive weight resetting policy. Moreover, it utilizes a margin distribution to penalize the misclassification of small-margin instances.
Freund created BrownBoost to reduce the influence of outliers in training [16]. LPBoost was introduced to optimize the minimal margin of training data by using linear programming [17]. However, a comparison showed that LPBoost overall performs worse than AdaBoost [18]. Similarly to Margin-pruning Boost, MadaBoost and SmoothBoost were devised to increase robustness against malicious noise data [19, 20]. Some AdaBoost variants such as AdaCost, AdaC1, AdaC2, AdaC3, CSB0, CSB1, CSB2, and RareBoost weight positive and negative training instances differently to obtain better performance on imbalanced data sets [21–24], while others trade off the integrity of the training data for faster training [25–28]. In 2004, AdaTree was proposed to speed up the training process; it selects weak classifiers in the same way as AdaBoost but combines them in a nonlinear manner [29]. FilterBoost and Regularized AdaBoost were proposed to address the overfitting problem [30, 31]. FilterBoost is based on a new logistic regression technique, whereas Regularized AdaBoost requires validation subsets to identify and correct overfitting iteratively. FloatBoost and FM-AdaBoost filter out the less effective weak classifiers so that they can outperform AdaBoost when they use the same number of weak classifiers as AdaBoost [32, 33]. Nevertheless, they require more training cycles than AdaBoost. More recently, many novel AdaBoost variants have been devised to improve the generalization ability, such as SoftBoost, Interactive Boosting, ReweightBoost, Soft-LPBoost, and RobustBoost [34–39]. SoftBoost maximizes a soft margin instead of the hard margin used in AdaBoost [34, 36]. Interactive Boosting gives weights to both features and training instances [37]. ReweightBoost builds a tree structure by reusing the selected weak classifiers; nevertheless, it can only use decision stumps as its weak classifiers [38]. Soft-LPBoost combines SoftBoost with LPBoost [39].
RobustBoost is an extension of BrownBoost [35]. All five of these AdaBoost variants can achieve better generalization errors than AdaBoost; however, they involve many complicated calculations, which may lead to a longer training time. In the last few years, a novel approach called SemiBoost has developed rapidly. It combines supervised learning with semisupervised learning by using both labelled and unlabelled training instances [40]. The purpose of SemiBoost is to increase the generalization ability when the labelled training instances are insufficient [41]. In addition, AdaBoost.M1, Conservative.2 AdaBoost, and Aggressive AdaBoost are AdaBoost variants proposed for multiclass classification problems [42].

With the proposal of many AdaBoost variants, a number of surveys and comparison studies based on these variants have been published. Miao and Heaton compared AdaBoost with Random Forest in ecosystem classification problems and showed that AdaBoost overall outperforms Random Forest [43]. Another comparison of AdaBoost and neural networks showed that an AdaBoost ensemble of trees performs better than an individual neural network in cross-validation experiments [44]. Research in [45] applied AdaBoost and SVM to Synthetic Aperture Radar Automatic Target Recognition systems and found that AdaBoost is more robust than SVM. Ferreira briefly introduced many boosting algorithms and labelled them as “supervised learning” or “semisupervised learning” [46]. Seiffert et al. compared resampling boosting algorithms with reweighting boosting algorithms on imbalanced data sets and concluded that boosting by resampling generally outperforms boosting by reweighting [47]. A comparison of LPBoost and AdaBoost based on experimental results on the UCI repository was also conducted in [18]. Hegazy and Denzler evaluated AdaBoost and SoftBoost in generic object recognition and concluded that AdaBoost is more suitable for low-noise data sets while SoftBoost is more suitable for high-noise data sets [48]. Another study comparing AdaBoost with AdaTree was accomplished by Drauschke and Forstner [49]. Its experimental results showed that AdaTree usually performs better than AdaBoost but is prone to overfitting due to its tree-like structure. Jurić-Kavelj and Petrović evaluated three AdaBoost variants (Real, Gentle, and Modest AdaBoost) in leg detection experiments and found that Modest AdaBoost cannot reduce the error rate as the number of iterations increases [50]. Sun et al. compared Discrete, Real, and Gentle AdaBoost by analyzing experimental results in license plate detection and showed that Gentle AdaBoost achieves better performance than the other two methods [51].
The comparison in [52] focused on weak classifiers of AdaBoost constructed by Bayes nets, naive Bayes, and decision trees, and it showed that decision trees are the best. A review systematically introduced AdaBoost variants proposed from 1999 to 2012; nevertheless, it lacks comparisons between the different variants [53].

In general, the above surveys and comparison studies either introduce the basic ideas of AdaBoost variants or compare different variants by experiments in a specific research field. Unlike these studies, we compare the generalization abilities of six AdaBoost variants (Real AdaBoost, Gentle AdaBoost, Modest AdaBoost, Parameterized AdaBoost, Margin-pruning Boost, and Penalized AdaBoost) by analyzing their classification margins. The remainder of this paper is organized as follows. Section 2 explains the materials and methods. Section 3 shows experimental results. Section 4 draws a conclusion.

2. Materials and Methods

This section describes the training data and weak classifiers used in our research. It also explains the basic ideas of the six compared AdaBoost variants and their generalization abilities in terms of the classification margins.

2.1. Training Data and Weak Classifiers

Here we give a brief introduction of the training data. Given is a training set {(x_1, y_1), ..., (x_N, y_N)}, where N is the number of instances. We let y_i be 1 if instance x_i is positive, or −1 if it is negative. In this paper, we only discuss binary classification problems. We use CART as weak classifiers [54]. As shown in Figure 1, a CART is a decision tree whose leaves output the classification results and whose inner nodes split the tree to minimize its error rate. In Figure 1, x_i is the feature vector of instance i [54].

Figure 1: An example of classification and regression tree.
2.2. Different AdaBoost Variants
2.2.1. AdaBoost and Real AdaBoost

AdaBoost is a machine learning method. At each round, it increases the weights of misclassified instances and decreases the weights of correctly classified instances. This weight adjustment policy is essential because it lets AdaBoost focus on difficult-to-classify instances. A generalized version of AdaBoost, called Real AdaBoost, is described as follows.

Algorithm 1 (Real AdaBoost). (1) Set the initial weights w_i = 1/N, where i = 1, ..., N.
(2) Do the following tasks for t = 1, ..., T.
(a) Train a weak classifier based on the weighted instances to divide the training set into partitions. Each leaf of the CART represents a partition. For any partition j, where j = 1, ..., J, calculate the weighted sums of positive and negative instances falling into it, W_+^j and W_−^j, as follows:
W_+^j = Σ_{i: x_i ∈ j, y_i = 1} w_i,  W_−^j = Σ_{i: x_i ∈ j, y_i = −1} w_i. (1)
(b) Compute the weak hypothesis h_t^j for each partition j:
h_t^j = (1/2) ln((W_+^j + ε) / (W_−^j + ε)), (2)
where ε is a small smoothing constant. For any training instance x_i, let its weak hypothesis be h_t(x_i) = h_t^{j(i)}, where j(i) is the index of the partition that x_i falls into.
(c) Do weight updating by
w_i ← w_i exp(−y_i h_t(x_i)) / Z_t, (3)
where Z_t is a normalization factor chosen so that the updated weights sum to 1.
(3) Let H(x) = Σ_{t=1}^{T} h_t(x) and output the strong classifier sign(H(x)).

In AdaBoost, the values of the weak hypotheses are +1 or −1, but in Real AdaBoost they are real numbers. The sign of each weak hypothesis stands for the class of the weighted majority of the instances that fall into the corresponding partition, and its absolute value represents a prediction confidence.
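To make Step (2) concrete, one round of Real AdaBoost can be sketched as follows. This is a minimal Python sketch, not the paper's Matlab implementation; it assumes a single-split stump (a CART with one inner node) with a given split feature and threshold, and a small smoothing constant eps inside the log-ratio, as recommended by Schapire and Singer.

```python
import numpy as np

def real_adaboost_round(X, y, w, feature, threshold, eps=1e-8):
    """One Real AdaBoost round with a one-split stump: two partitions
    (left/right leaf), a log-odds hypothesis per partition, and the
    exponential weight update."""
    part = (X[:, feature] > threshold).astype(int)   # partition index per instance
    h = np.zeros(2)
    for j in (0, 1):
        in_j = part == j
        W_pos = w[in_j & (y == 1)].sum()             # weighted positives in partition j
        W_neg = w[in_j & (y == -1)].sum()            # weighted negatives in partition j
        h[j] = 0.5 * np.log((W_pos + eps) / (W_neg + eps))  # confidence-rated output
    w_new = w * np.exp(-y * h[part])                 # w_i <- w_i * exp(-y_i h(x_i))
    return h, w_new / w_new.sum()                    # normalize the weights
```

Correctly classified instances have y_i h(x_i) > 0, so their weights shrink relative to the misclassified ones.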

2.2.2. Gentle AdaBoost

Gentle AdaBoost utilizes the same weight adjustment policy as Real AdaBoost does. However, it computes the weak hypotheses in a different way. Next we introduce Gentle AdaBoost as follows.

Algorithm 2 (Gentle AdaBoost). (1) Set the initial weights w_i = 1/N, where i = 1, ..., N.
(2) Do the following tasks for t = 1, ..., T.
(a) Train a weak classifier based on the weighted instances and then calculate W_+^j and W_−^j for each partition j the same as in Step (2)(a) of Algorithm 1.
(b) Compute the weak hypothesis h_t^j for each partition j:
h_t^j = (W_+^j − W_−^j) / (W_+^j + W_−^j). (4)
For any training instance x_i, its weak hypothesis equals h_t(x_i) = h_t^{j(i)}, where j(i) is the index of the partition that x_i belongs to.
(c) Update instances’ weights by (3).
(3) Let H(x) = Σ_{t=1}^{T} h_t(x) and output the strong classifier sign(H(x)).

In Algorithm 2, the weak hypotheses are calculated by optimizing the weighted least-squares error. Thus, Gentle AdaBoost is more robust and stable than AdaBoost [11].
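In partition terms, the least-squares fit gives each leaf the bounded value (W_+ − W_−)/(W_+ + W_−). A minimal Python sketch (the function and argument names are ours, not from the paper's implementation):

```python
import numpy as np

def gentle_leaf_output(w, y, in_leaf):
    """Gentle AdaBoost leaf value: the weighted least-squares fit of y
    over the leaf, (W+ - W-)/(W+ + W-), always bounded in [-1, 1]."""
    W_pos = w[in_leaf & (y == 1)].sum()
    W_neg = w[in_leaf & (y == -1)].sum()
    return (W_pos - W_neg) / (W_pos + W_neg + 1e-12)
```

The bounded output is what makes Gentle AdaBoost "gentle": unlike the log-ratio of Real AdaBoost, it cannot produce arbitrarily large hypothesis values for nearly pure leaves.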

2.2.3. Modest AdaBoost

Modest AdaBoost was proposed to suppress the generalization error of Gentle AdaBoost. It is explained by Algorithm 3.

Algorithm 3 (Modest AdaBoost). (1) Set the initial weights w_i = 1/N, where i = 1, ..., N.
(2) Do the following tasks for t = 1, ..., T.
(a) Train a weak classifier based on the weighted instances and then calculate W_+^j and W_−^j for each partition j the same as in Step (2)(a) of Algorithm 1.
(b) Calculate an inverted weight distribution by
w̄_i = (1 − w_i) / Z̄_t, (5)
where Z̄_t = Σ_{i=1}^{N} (1 − w_i). Then compute W̄_+^j and W̄_−^j for each partition j as
W̄_+^j = Σ_{i: x_i ∈ j, y_i = 1} w̄_i,  W̄_−^j = Σ_{i: x_i ∈ j, y_i = −1} w̄_i. (6)
(c) Compute the weak hypothesis h_t^j for every partition j:
h_t^j = W_+^j (1 − W̄_+^j) − W_−^j (1 − W̄_−^j). (7)
For an instance x_i, its weak hypothesis equals h_t(x_i) = h_t^{j(i)}, where j(i) is the index of the partition that x_i belongs to.
(d) Update instances’ weights by (3).
(3) Let H(x) = Σ_{t=1}^{T} h_t(x) and output the strong classifier sign(H(x)).

Modest AdaBoost uses an “inverted” distribution to decrease the contribution of weak hypotheses that only work well on instances with small weights [12].
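A minimal Python sketch of this idea, assuming the common Modest AdaBoost formulation h = W_+(1 − W̄_+) − W_−(1 − W̄_−), where the barred sums are taken under the inverted distribution w̄ ∝ (1 − w):

```python
import numpy as np

def modest_leaf_output(w, y, in_leaf):
    """Modest AdaBoost leaf value using the 'inverted' distribution
    w_bar ∝ (1 - w): h = W+(1 - Wbar+) - W-(1 - Wbar-).  The (1 - Wbar)
    factors damp hypotheses that only work on small-weight instances."""
    w_bar = (1.0 - w) / (1.0 - w).sum()   # inverted, renormalized weights
    Wp  = w[in_leaf & (y == 1)].sum()
    Wn  = w[in_leaf & (y == -1)].sum()
    Wbp = w_bar[in_leaf & (y == 1)].sum()
    Wbn = w_bar[in_leaf & (y == -1)].sum()
    return Wp * (1 - Wbp) - Wn * (1 - Wbn)
```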

2.2.4. Parameterized AdaBoost

Parameterized AdaBoost is shown by Algorithm 4. Its purpose is to speed up the training process of Real AdaBoost.

Algorithm 4 (Parameterized AdaBoost). (1) Set the initial conditions in this step by letting and , where .
(2) Do the following tasks for .
(a) Train a weak classifier based on the weighted instances and then calculate and for each partition the same as in Step (2)(a) of Algorithm 1.
(b) Calculate the weak hypotheses by (2).
(c) Let and . Then update the weights of instances by
(3) Output the strong classifier .

For tuning the parameter, the training and generalization errors were measured on the Gamma Telescope data set, which includes nearly 20,000 instances, for a range of candidate values. The value with the best overall performance was selected [10].

The difference between Real and Parameterized AdaBoost is the weight updating policy. In Step (2)(c), Parameterized AdaBoost adds a parameter and an absolute-value term to emphasize the instances whose margins are near 0 [10].

2.2.5. Margin-Pruning Boost

Margin-pruning Boost was designed to decrease the influence of noise-like instances. Next we describe this approach as follows.

Algorithm 5 (Margin-pruning Boost). (1) Set the initial weights , where .
(2) Do the following tasks for .
(a) Train a weak classifier based on the weighted instances and then calculate and for each partition the same as in Step (2)(a) of Algorithm 1.
(b) Compute the weak hypothesis for each partition by (4).
(c) Update the weights of instances as
(d) Set a threshold by the following equation: For any instance , if , reset and . Compute ; then do normalization by letting .
(3) Let and output the strong classifier .

Margin-pruning Boost restrains the weight increase of potential noise instances by resetting their weights and their summed weak hypotheses [14]. Resetting the sum of weak hypotheses keeps the weights of noise-like instances small. In other words, the combination of weak hypotheses accumulated so far (the current strong hypothesis) is reset to 0 for these noise-like instances, because it cannot correctly classify them.
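The reset step can be sketched as follows. This is a hedged Python sketch: the exact threshold formula of [14] is not reproduced here, so the threshold is taken as a given argument, and the reset weight value of 1 is applied before renormalization.

```python
import numpy as np

def margin_pruning_reset(w, F, threshold):
    """Margin-pruning Boost reset (sketch): instances whose weight
    exceeds `threshold` are treated as noise-like; their weight is reset
    to 1 (before normalization) and their accumulated strong hypothesis
    F is reset to 0, then all weights are renormalized."""
    w, F = w.copy(), F.copy()
    noisy = w > threshold
    w[noisy] = 1.0          # restrain further weight growth
    F[noisy] = 0.0          # discard the failed partial strong hypothesis
    return w / w.sum(), F
```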

2.2.6. Penalized AdaBoost

Penalized AdaBoost is an extension of Margin-pruning Boost. It introduces a margin distribution to penalize the misclassification of small-margin instances. Before introducing Penalized AdaBoost, we first explain the classification margins. The classification margin of an instance shows the difference between the prediction confidence of the weak hypotheses providing correct classification and that of the weak hypotheses leading to misclassification [7]. It lies in the range [−1, 1], and an instance is correctly classified if and only if its margin is positive [7]. Therefore, the margin of instance x_i is defined as [10]
margin(x_i) = y_i Σ_{t=1}^{T} h_t(x_i) / Σ_{t=1}^{T} |h_t(x_i)|, (12)
where h_t(x_i) denotes the weak hypothesis for instance x_i at round t and T is the number of total iterations. Next we explain Penalized AdaBoost by the following algorithm.
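As a sketch, for real-valued weak hypotheses the margin of one instance can be computed as follows (assuming, as a normalization, division by the sum of absolute hypothesis values so that the result lies in [−1, 1]):

```python
import numpy as np

def margin(y_i, h_vals):
    """Classification margin of one instance over T rounds:
    y_i * sum_t h_t(x_i) / sum_t |h_t(x_i)|.  It lies in [-1, 1] and is
    positive iff the strong classifier sign(sum_t h_t) is correct."""
    h = np.asarray(h_vals, dtype=float)
    return y_i * h.sum() / (np.abs(h).sum() + 1e-12)
```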

Algorithm 6 (Penalized AdaBoost). (1) Set the initial weights , where .
(2) Do the following tasks for .
(a) Train a weak classifier based on the weighted instances and then calculate and for each partition the same as in Step (2)(a) of Algorithm 1.
(b) Calculate a margin feedback factor as shown in where equals . Then compute and of every partition by
(c) Compute the weak hypothesis for each as For each instance , set its weak hypothesis to be , where is the index of partition which falls into.
(d) Update all instance weights by
(e) For each instance , if and , reinitialize instance as follows: Then normalize by letting .
(3) Set and output the strong classifier .

Penalized AdaBoost calculates its weak hypotheses by introducing a margin feedback factor in Steps (2)(b) and (2)(c). Moreover, it improves the thresholding of Margin-pruning Boost in Step (2)(e). The parameter in (18) is tuned by the following steps: first, the classification performances on 5 data sets are evaluated with different candidate values (10, 30, 50, 70, and 90), and then the value with the best performance is assigned to the parameter [15]. We analyze the generalization abilities of the six variants in the next section.

2.3. Generalization Ability Analysis

In this section, we analyze the generalization abilities of the six AdaBoost variants by comparing their weak hypotheses and weight updating policies.

2.3.1. Real and Gentle AdaBoost

The difference between Real and Gentle AdaBoost is how they calculate their weak hypotheses. Real AdaBoost computes the weak hypothesis by minimizing the upper bound of training error in each loop [8]. Gentle AdaBoost calculates its weak hypothesis by optimizing the weighted least square error iteratively [11]. Real AdaBoost tries to decrease the training error whereas Gentle AdaBoost aims at reducing the variance of its weak hypotheses. Thus, in most cases, Real AdaBoost converges faster than Gentle AdaBoost in training, but Gentle AdaBoost is more stable than Real AdaBoost with respect to the generalization error.

2.3.2. Real and Parameterized AdaBoost

Comparing Step (2)(c) in Real AdaBoost with that in Parameterized AdaBoost, we find that the weight updating policies of the two variants are different. From (12), we know that the training error converges to 0 if and only if the margins of all training instances become positive. In Step (2)(c) of Real AdaBoost, instances with small margins obtain more weight so that they are more likely to be correctly classified in future iterations. However, as Real AdaBoost focuses on instances with small margins, it may increase the number of instances whose margins are near 0; that is, these instances change the sign of their margins back and forth during boosting. Here we call them “swinging instances.” Swinging instances slow the convergence of the training error. To decrease their number, Parameterized AdaBoost introduces a parameter to give them larger weights. From (8), we can see that the added term removes more weight from instances whose margins are far from 0 and less weight from swinging instances; therefore, swinging instances obtain more attention than nonswinging ones. Parameterized AdaBoost aims at reducing the number of swinging instances by trying to classify them correctly in the early training phase. Thus, Parameterized AdaBoost can converge faster than Real AdaBoost in training. With respect to the generalization error, Parameterized AdaBoost is more prone to overfitting than Real AdaBoost, especially when the CART used as the weak classifier has more inner nodes. The reason is that it focuses more on swinging instances at the cost of focusing less on instances with minimal margins. However, it can perform similarly to or slightly better than Real AdaBoost if it uses simple weak classifiers such as decision stumps, because stumps are more resistant to overfitting than CART with many inner nodes.

2.3.3. Gentle and Modest AdaBoost

Modest AdaBoost utilizes an inverted weight distribution to highlight weak hypotheses that can correctly classify instances with small margins. Nevertheless, the performance of Modest AdaBoost is not stable. Here we give an example to explain the reason. We suppose ; if is also larger than , the factor assigns higher prediction confidence to weak hypotheses that correctly classify small-margin instances. At the same time, the factor reduces the prediction confidence for weak hypotheses misclassifying small-margin instances. In this case, Modest AdaBoost outperforms Gentle AdaBoost. However, if is smaller than in the case , the sign of the weak hypothesis will be negative. This means the factor reduces the prediction confidence for weak hypotheses that correctly classify small-margin instances. Meanwhile, increases the prediction confidence of weak hypotheses misclassifying small-margin instances. In this case, Modest AdaBoost performs far worse than Gentle AdaBoost.
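The instability can be shown with a small numeric example. This is our own construction, using the common Modest AdaBoost formulation h = W_+(1 − W̄_+) − W_−(1 − W̄_−) with w̄ ∝ (1 − w): a leaf whose weighted majority is positive can still receive a negative hypothesis value when the positives are numerous but individually light.

```python
import numpy as np

# Leaf with 9 positives carrying total weight 0.6 and 1 negative carrying
# weight 0.4: the weighted majority is positive, yet the Modest output
# h = W+(1 - Wbar+) - W-(1 - Wbar-) is negative, flipping the leaf's sign.
w = np.array([0.6 / 9] * 9 + [0.4])
y = np.array([1] * 9 + [-1])
w_bar = (1 - w) / (1 - w).sum()          # "inverted" distribution
Wp, Wn = w[y == 1].sum(), w[y == -1].sum()
Wbp, Wbn = w_bar[y == 1].sum(), w_bar[y == -1].sum()
h = Wp * (1 - Wbp) - Wn * (1 - Wbn)
print(Wp > Wn, h < 0)  # prints: True True
```

Here the many light positives dominate the inverted distribution (Wbp ≈ 0.93), so the factor (1 − Wbp) almost cancels W_+, and the single heavy negative wins the leaf.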

2.3.4. Gentle and Margin-Pruning Boost

At each round of Gentle AdaBoost, the weights of misclassified instances are increased whereas the weights of correctly classified instances are decreased. This leads to the phenomenon that the weights of difficult-to-classify instances grow very large. If these instances are noise data or outliers, the performance of the final strong classifier will be degraded. To solve this problem, Margin-pruning Boost utilizes a threshold to filter instances whose weights are too large and then resets their weights to 1.

Margin-pruning Boost effectively reduces the influence of noise-like instances in the early training phase by restraining the weight increase of the filtered instances [14]. However, as the number of iterations increases, the weights of the instances filtered by thresholding become smaller and smaller. In the late training phase, the weights of these filtered instances probably fall below 1. In that case, resetting their weights to 1 actually increases the influence of these instances. Thus, the performance of Margin-pruning Boost drops as the number of loops increases.

2.3.5. Margin-Pruning Boost and Penalized AdaBoost

Penalized AdaBoost is an improvement of Margin-pruning Boost. First, it introduces a margin feedback factor to assign higher prediction confidence to weak hypotheses that can correctly classify small-margin instances. From (13), (14), and (15), we can see that the quantities computed in (14) and (15) are proportional to the margins, and they are computed from the sum of the margin feedback factors of misclassified instances. This means that misclassifying small-margin instances leads to small values of these quantities. Therefore, the prediction confidence of weak hypotheses that misclassify small-margin instances is degraded. Compared with Gentle AdaBoost and Margin-pruning Boost, Penalized AdaBoost can make more competent weak hypotheses stand out; therefore, it is more robust than the other two variants. Modest AdaBoost highlights more competent weak hypotheses in some cases but downplays them in other cases. By contrast, Penalized AdaBoost attaches importance to these more competent weak hypotheses under any circumstance. Thus it is more stable than Modest AdaBoost.

Furthermore, Penalized AdaBoost solves the problem of Margin-pruning Boost by utilizing a more adaptive thresholding method. Like Margin-pruning Boost, it uses thresholding to filter the large-weight instances; however, it only resets the weights of filtered instances with negative margins. This technique guarantees that the reset weights are always smaller than the original ones. Penalized AdaBoost does not completely exclude these noise-like instances, because they are not necessarily noise; nevertheless, it keeps their weights small to reduce their influence on the final strong classifier. Thereby, it has better generalization ability than Margin-pruning Boost.

2.4. Margin Distribution Comparison

In this section, we compare the generalization abilities of the six variants by analyzing their margin distributions. Li and Shen showed that reducing the minimal margin of the training data plays little role in improving the generalization ability [18]. However, enlarging the whole margin distribution, so as to balance the training error against complexity, is crucial to the generalization ability [18]. Here we use three kinds of CART as weak classifiers to evaluate the six AdaBoost variants: CART-1 (CART with one inner node), CART-2 (CART with two inner nodes), and CART-3 (CART with three inner nodes). To get the cumulative margin distributions, for each data set we use two-thirds of its data to train the final strong classifiers. Figure 2 shows the cumulative margin distributions based on CART-1 on the data set German at iteration 200, and Figure 3 shows the generalization errors of the same data set with respect to Figure 2. In Figure 2, Penalized AdaBoost enlarges the whole margin distribution more than the other variants, so it achieves the best generalization error in Figure 3. Real AdaBoost, Gentle AdaBoost, Parameterized AdaBoost, and Margin-pruning Boost perform similarly on the margins, so their generalization errors are also similar when the number of iterations reaches 200. The margin curve of Modest AdaBoost in Figure 2 is not smooth, which may explain why its generalization error in Figure 3 does not change smoothly.
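The cumulative margin distribution plotted in these figures can be computed from the per-instance margins as follows (a minimal Python sketch; the grid over [−1, 1] and its resolution are our choice):

```python
import numpy as np

def cumulative_margin_distribution(margins, grid=None):
    """For each grid value m, the fraction of training instances whose
    margin is <= m.  A variant that pushes this curve down and to the
    right enlarges the whole margin distribution."""
    m = np.sort(np.asarray(margins, dtype=float))
    if grid is None:
        grid = np.linspace(-1.0, 1.0, 201)
    return grid, np.searchsorted(m, grid, side='right') / m.size
```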

Figure 2: Margin distributions using CART-1 at iteration 200.
Figure 3: Generalization errors using CART-1.

Margin distributions based on CART-2 on the same data set at iteration 200 are shown in Figure 4, and Figure 5 shows the corresponding generalization errors. From Figures 4 and 5, we notice that the generalization abilities of the six variants are consistent with their performance on the margins. We also evaluate margin distributions using CART-3 on the data set German. Margin curves at iterations 10, 100, and 1000 are shown in Figures 6, 7, and 8, respectively. Comparing the three figures, we find that the margins are enlarged gradually as the number of iterations increases. Furthermore, we notice that Margin-pruning Boost outperforms Gentle AdaBoost in Figures 6 and 7 but performs worse than Gentle AdaBoost in Figure 8. This demonstrates that the performance of Margin-pruning Boost drops as the number of iterations increases. Unlike Margin-pruning Boost, Penalized AdaBoost outperforms the others in most cases; thus it is the most robust and stable. Figure 9 shows the generalization errors of the six variants using CART-3 as their weak classifiers. In Figure 9, Margin-pruning Boost obtains lower generalization errors than Gentle AdaBoost before iteration 500; unfortunately, it suffers severe overfitting after iteration 500.

Figure 4: Margin distributions using CART-2 at iteration 200.
Figure 5: Generalization errors using CART-2.
Figure 6: Margin distributions using CART-3 at iteration 10.
Figure 7: Margin distributions using CART-3 at iteration 100.
Figure 8: Margin distributions using CART-3 at iteration 1000.
Figure 9: Generalization errors using CART-3.

Figures 10 and 11 show margin distributions on other data sets. From these margin curves, we can conclude that Penalized AdaBoost generally outperforms the other five variants in enlarging the whole margin distribution. Real and Gentle AdaBoost perform very similarly, and Parameterized AdaBoost is slightly worse than Real AdaBoost when it uses CART-2 and CART-3. Margin-pruning Boost is better than Gentle AdaBoost when the number of iterations is small. The margin curves of Modest AdaBoost are not smooth, which may lead to unstable generalization errors.

Figure 10: Margin distributions at iteration 200.
Figure 11: Margin distributions at iteration 200.

3. Experiments

In this section, we compare the six AdaBoost variants on 25 binary classification data sets from the UCI repository [55]. For every data set, we used the Matlab AdaBoost Toolbox [54] and 3-fold cross validation. First we measure the generalization errors (estimated by the classification error on the test set) of the six variants based on CART-1. Table 1 summarizes the results of the six variants using CART-1 at iteration 200, and Tables 2 and 3 show their generalization errors using CART-1 at iterations 500 and 800, respectively. We also compare the generalization errors of the six variants based on CART-2 and CART-3: Table 4 shows the comparison results using CART-2, and Table 5 compares the six variants using CART-3.

Table 1: Test results at iteration 200 using CART-1.
Table 2: Test results at iteration 500 using CART-1.
Table 3: Test results at iteration 800 using CART-1.
Table 4: Test results at iteration 200 using CART-2.
Table 5: Test results at iteration 200 using CART-3.

In Tables 1, 2, 3, 4, and 5, RAB, GAB, MAB, PAAB, MPB, and PAB denote Real AdaBoost, Gentle AdaBoost, Modest AdaBoost, Parameterized AdaBoost, Margin-pruning Boost, and Penalized AdaBoost, respectively. The row VS.GAB shows the difference obtained by subtracting the sum of the generalization errors of Gentle AdaBoost from that of each other variant. The bold values show the best performance, No.Best is the number of data sets on which a variant achieves the best generalization error, and No.To.GAB is the number of data sets on which a variant outperforms Gentle AdaBoost. From Tables 1, 2, and 3, we can conclude that Real, Gentle, and Parameterized AdaBoost perform similarly when using CART-1 as weak classifiers. Modest AdaBoost performs worse than the other variants in most cases; moreover, its error rates rarely change even when the number of loops increases. Comparing No.Best of Margin-pruning Boost across Tables 1, 2, and 3, we find that its performance drops as the number of iterations increases. We can also see from VS.GAB, No.Best, and No.To.GAB in Tables 1, 2, and 3 that Penalized AdaBoost generally outperforms the other variants. Comparing Tables 1, 4, and 5, we can conclude that increasing the number of inner nodes of the CART is important for reducing the generalization errors. In Tables 4 and 5, we can see that Gentle AdaBoost is slightly better than Real AdaBoost, while the performance of Parameterized AdaBoost and Margin-pruning Boost drops sharply. This means the two variants are more suitable for CART-1. On the other hand, Modest AdaBoost using CART-2 or CART-3 performs better than Modest AdaBoost using CART-1, which suggests that Modest AdaBoost is suitable for CART with more inner nodes. From all the tables, we notice that the performance of Gentle and Penalized AdaBoost is degraded neither by the number of inner nodes in the CART nor by the number of iterations. Nevertheless, Penalized AdaBoost shows stronger robustness than Gentle AdaBoost.

4. Conclusion

This paper analyzes the generalization abilities of six AdaBoost variants mathematically. The novel contributions of our work are listed as follows.
(1) Although there are many comparison studies of AdaBoost variants, we compare three newly proposed variants (Parameterized AdaBoost, Margin-pruning Boost, and Penalized AdaBoost) with three traditional variants (Real, Gentle, and Modest AdaBoost). This kind of comparison is new in the machine learning literature.
(2) Unlike conventional comparison works that draw conclusions from experimental results, we analyze the generalization abilities of the six variants by comparing their classification margins.
(3) We design experiments to verify our analyses, and the experimental results are consistent with them.

In general, the analyses and comparisons in this paper are useful for researchers who want to improve classification performance by switching to a new AdaBoost variant. Our current research focuses on two-class classification problems; in future work, we intend to extend our analyses to multiclass classification problems. In addition, we will compare more kinds of weak classifiers, such as SVMs and ANNs, to find out which kind of weak classifier is suitable for which AdaBoost variant.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.
