Mathematical Problems in Engineering
Volume 2018, Article ID 5036710, 13 pages
https://doi.org/10.1155/2018/5036710
Research Article

Classifying Imbalanced Data Sets by a Novel RE-Sample and Cost-Sensitive Stacked Generalization Method

Jianhong Yan and Suqing Han

Department of Computer Science, Taiyuan Normal University, Taiyuan 030012, China

Correspondence should be addressed to Jianhong Yan; yan_jian_hong@163.com

Received 3 June 2017; Revised 1 October 2017; Accepted 6 November 2017; Published 23 January 2018

Academic Editor: Michele Migliore

Copyright © 2018 Jianhong Yan and Suqing Han. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Learning with imbalanced data sets is considered one of the key topics in the machine learning community. Stacking ensemble is an efficient algorithm for balanced data sets, but it has seldom been applied to imbalanced data. In this paper, we propose a novel RE-sample and Cost-Sensitive Stacked Generalization (RECSG) method based on a 2-layer learning model. The first step is Level 0 model generalization, including data preprocessing and base model training. The second step is Level 1 model generalization, involving a cost-sensitive classifier and the logistic regression algorithm. In the learning phase, preprocessing techniques are embedded in imbalanced data learning methods; in the cost-sensitive algorithm, the cost matrix is combined with both the data characteristics and the algorithms; and in the RECSG method, the ensemble algorithm is combined with imbalanced data techniques. According to experimental results obtained with 17 public imbalanced data sets, as indicated by various evaluation metrics (AUC, GeoMean, and AGeoMean), the proposed method showed better classification performance than other ensemble and single algorithms. The proposed method is especially more efficient when the performance of the base classifiers is low. All these results demonstrate that the proposed method can be applied to the class imbalance problem.

1. Introduction

Classification learning becomes complicated if the class distribution of the data is imbalanced. The class imbalance problem occurs when the number of instances representing one class is much smaller than the number of instances of the other classes. Recently, the classification problem of imbalanced data has appeared frequently and has attracted wide attention [1–3].

Usually, imbalanced data sets are treated as binary-class problems, with the majority class and the minority class denoted as the negative class and the positive class, respectively. Traditional techniques fall into four categories. (1) RE-sample techniques increase the number of minority-class instances (oversampling) [4] or decrease the number of majority-class instances (undersampling) [5, 6]. (2) Algorithm-level methods improve existing algorithms, for example, by increasing the weight of positive instances [7]. (3) Classifier ensemble methods have been widely adopted to deal with the imbalance problem over the last decade [8]. (4) Cost-sensitive algorithms incorporate misclassification costs with the data characteristics in the classification phase [9, 10]. In general, the cost-sensitive and algorithm levels are more tightly coupled to the imbalance problem, whereas the data level and ensemble learning can be used independently of any single classifier.

Ensemble methods, which train base classifiers, integrate their results, and generate a single final class label, can increase the accuracy of classifiers. The bagging algorithm [11] and the AdaBoost algorithm [12, 13] are the most common ensemble classification algorithms. Ensemble algorithms combined with the other three techniques are widely applied to the classification of imbalanced data sets. Cost-sensitive learning targets the imbalanced learning problem by using different cost matrices, which can be considered a numerical representation of the penalty of misclassifying examples from one class to another; data characteristics are thus incorporated with misclassification costs in the classification phase, so cost-sensitive learning is closely related to learning from imbalanced data. To solve the imbalance problem, previous studies combined ensemble algorithms with data preprocessing, cost-sensitive methods, and related algorithms. Four kinds of ensemble methods for imbalanced data sets are commonly used: boosting-based, bagging-based, cost-sensitive boosting, and hybrid ensemble methods.

In boosting-based ensembles, the data preprocessing technique is embedded into the boosting algorithm: in each iteration, the data distribution is changed by altering the instance weights so that the next classifier is trained towards the positive class. These algorithms mainly include SMOTEBoost [14], RUSBoost [15], MSMOTEBoost [16], and DataBoost-IM [17]. In bagging-based ensembles, bagging is combined with data preprocessing techniques; these algorithms are usually simpler than their boosting counterparts because of the simplicity and good generalization ability of bagging. The family includes, but is not limited to, OverBagging, UnderOverBagging [18], UnderBagging [19], and IIVotes [20]. In cost-sensitive boosting ensembles, the general learning framework of AdaBoost is maintained, but a misclassification cost adjustment function is introduced into the weight updating formula; these ensembles differ mainly in how they modify the weight update rule. AdaCost [21], CSB1 and CSB2 [22], RareBoost [23], and AdaC1, AdaC2, and AdaC3 [24] are the most representative approaches. Unlike the previous algorithms, hybrid ensemble methods adopt double ensemble learning: for example, EasyEnsemble and BalanceCascade use bagging as the main ensemble learning method, but each base classifier is trained by AdaBoost, so the final classifier is an ensemble of ensembles.

The stacking algorithm is another ensemble method; it can use the same kinds of base classifiers as the bagging and AdaBoost algorithms, but its structure is different. Stacking has a two-level structure consisting of Level 0 classifiers and Level 1 classifiers and involves two steps. The first step is to collect the output of each model into a new data set: for each instance in the original training set, this data set represents every model's prediction of that instance's class, where the models are the base classifiers. In the second step, based on the new data and the true label of each instance in the original training set, a learning algorithm is employed to train the second-layer model. In Wolpert's terminology, the first step is referred to as the Level 0 layer and the second-stage learning algorithm is referred to as the Level 1 layer [25].

Stacking ensemble is a general method in which a high-level model is combined with lower-level models, and it can achieve higher predictive accuracy. Chen et al. adopted ant colony optimization to configure stacking ensembles for data mining [26]. Kadkhodaei and Moghadam proposed an entropy-based approach to search for the best combination of the base classifiers in ensemble classifiers based on stacked generalization [27]. Czarnowski and Jędrzejowicz focused on machine classification with data reduction based on stacked generalization [28]. Most of the previous studies focused on ways to use or generate the stacking algorithm. However, the standard stacking ensemble does not consider the data distribution and is therefore suited to common (balanced) data sets rather than imbalanced data.

To solve the imbalance problem, this paper introduces cost-sensitive learning into the stacking ensemble and adds a misclassification cost adjustment function to the weights of instances and classifiers. In this way, misclassification costs can be considered in the data set as a form of data space weighting to select the best distribution for training. On the other hand, in the combination stage, metatechniques can be integrated with cost-sensitive classifiers to replace standard error-minimizing techniques: the weights for the misclassification of positive instances are higher, and the weights for the misclassification of negative instances are relatively lower. The method provides an option for imbalanced learning domains.

In this paper, the RE-sample and Cost-Sensitive Stacked Generalization (RECSG) method is proposed to solve the imbalance problem. In the method, preprocessed imbalanced data are used to train the Level 0 layer model. Unlike common ensemble algorithms for imbalanced data, the proposed method uses a cost-sensitive algorithm as the Level 1 (meta) layer. Stacking methods combined with imbalanced data approaches, including cost-sensitive learning, have been reported. Kotsiantis proposed a stacking variant methodology with cost-sensitive models as base learners [29]; in that method, the model tree was replaced by MLR in the metalayer to determine the class with the highest probability associated with the true class. Lo et al. proposed a cost-sensitive stacking method for audio tag annotation and retrieval [30]. In these methods, cost-sensitive learners are adopted in the base-layer model, and the metalayer model is trained by other learning algorithms such as SVM and decision tree. In this paper, the Level 0 model generalizer involves resampling the data and training the base classifiers, and the cost-sensitive algorithm is used to train the Level 1 metalayer classifier. The two layers both adopt imbalanced data algorithms and take full advantage of mature methods, and the Level 1 layer model has a bias towards the performance of the minority class. Therefore, the proposed method is more efficient than methods in which cost-sensitive algorithms are only used in the Level 0 layer.

The method was compared with common classification methods, including other ensemble algorithms. Additionally, the evaluation metrics of the algorithms were analyzed based on the results of statistical tests. Statistical tests of the evaluation metrics demonstrated that the proposed approach can effectively solve the class imbalance problem.

The paper is structured as follows. Related ensemble approaches and cost-sensitive algorithms are introduced in Section 2. Section 3 introduces the details of the proposed RECSG approach, including the Level 0 and Level 1 model generalizers. In Section 4, the experiments and corresponding results and analysis are presented, and statistical tests of the evaluation metrics of algorithm performance are analyzed and discussed. Finally, Section 5 discusses the advantages and disadvantages of the proposed method.

2. Background

2.1. Performance Evaluation in Imbalanced Domains

The evaluation metric is a vital factor for classifier model building and performance assessment. The confusion matrix in Table 1 records the numbers of correctly and incorrectly classified instances of each class in a two-class problem.

Table 1: Confusion matrix for performance evaluation.

Accuracy is the most popular evaluation metric. However, it cannot effectively measure the correct rates of all the classes, so it is not an appropriate metric for imbalanced data sets. For this reason, in addition to accuracy, more suitable metrics should be considered in the imbalance problem, and other metrics have been proposed to measure the classification performance independently. Based on the confusion matrix (Table 1), these metrics are defined as

$$\text{Accuracy} = \frac{TP + TN}{TP + FN + FP + TN}, \tag{1}$$

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}, \tag{2}$$

$$F\text{-measure} = \frac{(1 + \beta^2) \cdot \text{Precision} \cdot \text{Recall}}{\beta^2 \cdot \text{Precision} + \text{Recall}}, \tag{3}$$

where $\beta$ is a coefficient to adjust the relative importance of precision versus recall (usually $\beta = 1$).

The commonly used combined evaluation metrics of these measures include the receiver operating characteristic (ROC) graphic [31], the area under the ROC curve (AUC) [2], the geometric mean of sensitivity and specificity (GeoMean) [32] (see (4)), and the adjusted geometric mean (AGeoMean) [33] (see (5)). These metrics are defined as

$$\text{GeoMean} = \sqrt{\text{sensitivity} \times \text{specificity}}, \tag{4}$$

$$\text{AGeoMean} = \frac{\text{GeoMean} + \text{specificity} \cdot N_n}{1 + N_n}, \tag{5}$$

where $\text{sensitivity} = TP/(TP + FN)$, $\text{specificity} = TN/(TN + FP)$, and $N_n$ refers to the proportion of the majority samples.
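
To make (1)–(5) concrete, the following minimal sketch computes all of the above metrics from the confusion-matrix counts of Table 1; the function name and the example counts are illustrative, not from the paper.

```python
import math

def imbalance_metrics(tp, fn, fp, tn, beta=1.0):
    """Compute imbalance-aware metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # recall on the positive (minority) class
    specificity = tn / (tn + fp)          # recall on the negative (majority) class
    precision = tp / (tp + fp)
    f_measure = ((1 + beta**2) * precision * sensitivity /
                 (beta**2 * precision + sensitivity))
    geo_mean = math.sqrt(sensitivity * specificity)        # Eq. (4)
    n_n = (tn + fp) / (tp + fn + fp + tn)                  # proportion of majority samples
    # Eq. (5): the adjustment rewards specificity in proportion to N_n
    a_geo_mean = ((geo_mean + specificity * n_n) / (1 + n_n)
                  if sensitivity > 0 else 0.0)
    return dict(sensitivity=sensitivity, specificity=specificity,
                precision=precision, f_measure=f_measure,
                geo_mean=geo_mean, a_geo_mean=a_geo_mean)

print(imbalance_metrics(tp=30, fn=10, fp=50, tn=400))
```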

2.2. Stacking

Bagging and AdaBoost are the most common ensemble learning algorithms. In the bagging method, different base classifier models generate different classification results, and the final decision is made by majority voting. In the AdaBoost algorithm, a series of weak base classifiers is trained on the whole training set, and the final decision is generated by a weighted majority voting scheme; in each round of the training iteration, different weights are assigned to each instance. In both algorithms, the base classifiers are of the same kind.
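
As a minimal illustration of the two combination rules just described (the function names and example votes are hypothetical):

```python
from collections import Counter

def bagging_vote(predictions):
    """Bagging: unweighted majority vote over base-classifier predictions."""
    return Counter(predictions).most_common(1)[0][0]

def adaboost_vote(predictions, alphas):
    """AdaBoost: weighted majority vote; alpha_t is each round's classifier weight."""
    scores = {}
    for label, alpha in zip(predictions, alphas):
        scores[label] = scores.get(label, 0.0) + alpha
    return max(scores, key=scores.get)

print(bagging_vote(["+", "-", "+"]))                     # '+'
print(adaboost_vote(["+", "-", "-"], [2.0, 0.5, 0.7]))   # '+' (weight 2.0 vs 1.2)
```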

Stacking ensemble is another ensemble algorithm, in which the prediction results of the base classifiers are used as attributes to train the combination function in the metalayer classifier [25]. The algorithm has a two-level structure consisting of Level 0 classifiers and Level 1 classifiers. It was proposed by Wolpert and used by Breiman [34] and Leblanc and Tibshirani [35].

Consider a data set
$$D = \{(y_i, \mathbf{x}_i),\ i = 1, \ldots, n\},$$
where $\mathbf{x}_i$ is a vector representing the attribute values of the $i$th instance and $y_i$ is its class value. All instances are randomly split into $J$ equivalent parts $D_1, \ldots, D_J$, and $J$-fold cross-validation is used to train the model. In the $j$th fold, $D_j$ is used as the test set and $D^{(-j)} = D \setminus D_j$ as the training set, so the predictions collected over the $J$ test sets together cover all $n$ instances.

Let $M_k^{(-j)}$ denote the model obtained by applying the $k$th base learning algorithm to the data set $D^{(-j)}$, $k = 1, \ldots, K$; these models are part of the Level 0 models. The Level 0 layer thus consists of $K$ base classifiers, which are employed to estimate the class probability of each instance.

For each instance $\mathbf{x}_i$ in $D_j$, the $J$-fold cross-validation of the training data set gives the prediction result
$$z_{ik} = M_k^{(-j)}(\mathbf{x}_i), \quad k = 1, \ldots, K.$$

All the prediction results generate the input space of the Level 1 model, and the real class value of each instance is treated as the output space. The assembled data set is expressed as follows:
$$D_{\mathrm{CV}} = \{(y_i, z_{i1}, \ldots, z_{iK}),\ i = 1, \ldots, n\}.$$

The above intermediate data $D_{\mathrm{CV}}$ are used as the training data of the Level 1 layer model: the predictions $(z_{i1}, \ldots, z_{iK})$ are treated as features, and the real class value $y_i$ of each instance is treated as the output. The next step is to train this data set with some fusing learning algorithm. The process is called the Level 1 generalizer, and the Level 1 model is denoted by $\widetilde{M}$, which can be regarded as a function of $(z_1, \ldots, z_K)$.

In the stacking process, the Level 0 models $M_k$ ($k = 1, \ldots, K$) are combined with the Level 1 model $\widetilde{M}$. Given a new instance $\mathbf{x}$, the $K$ models produce a prediction result vector
$$(z_1, \ldots, z_K) = \left(M_1(\mathbf{x}), \ldots, M_K(\mathbf{x})\right).$$

Then, the model $\widetilde{M}$ is used to combine the base classifiers and predict the final result $\widetilde{M}(z_1, \ldots, z_K)$ for $\mathbf{x}$.
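
The cross-validated construction of $D_{\mathrm{CV}}$ can be sketched as follows, with scikit-learn standing in for the paper's Weka setting; `level0_cv_predictions`, the fold count, and the use of NumPy arrays and probabilistic base models are illustrative assumptions.

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import KFold

def level0_cv_predictions(base_models, X, y, n_folds=5, seed=0):
    """Build the Level 1 training inputs: column k holds the out-of-fold
    positive-class probability z_ik of base model M_k for every instance."""
    Z = np.zeros((len(y), len(base_models)))
    folds = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    for train_idx, test_idx in folds.split(X):
        for k, model in enumerate(base_models):
            m = clone(model).fit(X[train_idx], y[train_idx])   # M_k^(-j)
            Z[test_idx, k] = m.predict_proba(X[test_idx])[:, 1]
    return Z  # together with y, this forms D_CV = {(y_i, z_i1, ..., z_iK)}
```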

In this paper, for imbalanced data, we propose a stacked generalization based on cost-sensitive classification. In the Level 0 model generalizer layer, we first resample the imbalanced data; resampling approaches include oversampling and undersampling. Second, each model $M_k$, $k = 1, \ldots, K$, is trained by the classification algorithm with the new data, and the predictions output by $M_k$ are produced with the original data. In the Level 1 model generalizer layer, the model $\widetilde{M}$ is generated by cost-sensitive classification based on logistic regression.

3. Stacked Generalization for the Imbalance Problem

The proposed RECSG architecture includes two layers. The first layer (Level 0), consisting of the classifier ensemble, is called the base layer; the second layer (Level 1), which combines the base classifiers, is called the metalayer. The flowchart of the architecture is shown in Figure 1.

Figure 1: Flowchart of the RECSG architecture.
3.1. Level 0 Model Generalizers

For the imbalance problem, the Level 0 model generalizer step of RECSG includes preprocessing the data and training the base classifiers. First, the oversampling (SMOTE) method is used in the data preprocessing of the base classifiers (a sketch of the SMOTE step follows the three algorithm descriptions below). Second, the base classifier models are trained with the new data set. In this level, we employed three base algorithms: Naïve Bayes (NB) [36], the decision tree C4.5 [37], and $k$-nearest neighbors ($k$-NN) [38].

(1) Naïve Bayes (NB). Given an instance $\mathbf{x} = (x_1, \ldots, x_m)$, let $P(c \mid \mathbf{x})$ be the posterior probability of class $c$; then
$$P(c \mid \mathbf{x}) \propto P(c) \prod_{j=1}^{m} P(x_j \mid c),$$
where NB uses a Laplacian estimate for estimating the conditional probabilities $P(x_j \mid c)$.
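
A minimal sketch of the Laplacian estimate mentioned above (the function name and example counts are illustrative):

```python
def laplace_conditional(count_xc, count_c, n_values):
    """Laplacian estimate of P(x_j = v | c): add one to each value's count
    so unseen attribute values never yield a zero probability."""
    return (count_xc + 1) / (count_c + n_values)

# e.g., a value seen 0 times among 25 class-c instances, 3 possible values:
print(laplace_conditional(0, 25, 3))  # 1/28 instead of 0
```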

(2) Decision Tree (C4.5). Decision tree classification is a method commonly used in data mining [39]. A decision tree is a tree in which each nonleaf node is labeled with an input feature and each leaf node is labeled with a class or a probability distribution over the class labels. By recursive partitioning, a tree is "learned" until no prediction value is added or the subset has the same value. When information entropy is used as the splitting criterion on the training data, the decision tree is called C4.5.
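
A minimal sketch of the entropy criterion underlying C4.5's splits (the gain ratio, which normalizes this gain by the split's own entropy, is omitted for brevity; names are illustrative):

```python
import math
from collections import Counter

def entropy(labels):
    """Information entropy of a multiset of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, split_groups):
    """Entropy reduction achieved by splitting `labels` into `split_groups`."""
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in split_groups)

print(information_gain(list("++--"), [list("++"), list("--")]))  # 1.0 bit
```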

(3) $k$-NN. $k$-NN is a nonparametric algorithm used for regression or classification. An instance is classified by a majority vote of its $k$ nearest neighbors. If $k = 1$, the instance is simply assigned the class label of the nearest neighbor.
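
A minimal sketch of the $k$-NN vote (Euclidean distance is assumed; names and data are illustrative):

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training instances;
    with k=1 this returns the nearest neighbor's label."""
    dists = sorted((math.dist(p, x), y) for p, y in zip(train_X, train_y))
    return Counter(y for _, y in dists[:k]).most_common(1)[0][0]

print(knn_predict([(0, 0), (1, 0), (5, 5)], ["-", "-", "+"], (0.5, 0.2)))  # '-'
```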

All the above algorithms are simple and have low complexity, so they are suitable as weak base classifiers.
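
As referenced above, the SMOTE step used in the Level 0 preprocessing can be sketched as follows; only the core interpolation of [4] is shown, with the nearest-neighbor search and the amount of oversampling omitted, and the names are illustrative.

```python
import random

def smote_sample(x, minority_neighbors):
    """Generate one synthetic minority instance: pick a random minority
    neighbor of x and interpolate a point on the segment between them."""
    nn = random.choice(minority_neighbors)
    lam = random.random()  # interpolation factor in [0, 1)
    return [xi + lam * (ni - xi) for xi, ni in zip(x, nn)]

print(smote_sample([1.0, 2.0], [[2.0, 3.0], [0.0, 1.0]]))
```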

3.2. Level 1 Model Generalizers

The prediction results of the several base classifiers in the Level 0 layer are used as the input space, and the true class value of each instance is used as the output space. On top of the Level 0 layer, the Level 1 layer model is trained by another learning algorithm. For imbalanced data, in this paper the cost-sensitive algorithm is used to train the Level 1 metalayer classifier.

3.2.1. Cost-Sensitive Classifier

To address the imbalanced data learning problem, a cost-sensitive classifier uses different cost matrices to represent the penalties for misclassified instances [40]. For a binary classification scenario, the cost matrix has the structure shown in Table 2.

Table 2: Cost matrix.

In the cost matrix, the rows indicate the predicted classes, whereas the columns indicate the actual classes. The cost of a false negative is denoted $C_{FN}$, and the cost of a false positive is denoted $C_{FP}$. Conceptually, the cost of correctly classified instances should always be less than the cost of incorrectly classified instances. In the imbalance problem, $C_{FN}$ is always greater than $C_{FP}$. For the German credit data set previously reported as part of the Statlog project [41], the cost matrix is provided in Table 3.

Table 3: Cost matrix of credit data sets.

From an economic point of view, the cost of a false "good" prediction is greater than the cost of a false "bad" prediction. Therefore, the cost-sensitive algorithm is adopted in the Level 1 classifier of stacking for the imbalance problem.
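
A minimal sketch of how such a cost matrix changes the decision rule, following the expected-cost formulation of [40]: instead of predicting the most probable class, the classifier predicts the class with the lowest expected cost. The cost values and the threshold example are illustrative.

```python
def min_expected_cost(p_pos, cost_fn=5.0, cost_fp=1.0):
    """Predict the class minimizing expected cost, given P(positive | x).
    Costs of correct predictions are taken as zero; cost_fn > cost_fp
    reflects that missing a minority instance is the expensive error."""
    expected_cost_pred_pos = (1 - p_pos) * cost_fp   # wrong if truly negative
    expected_cost_pred_neg = p_pos * cost_fn         # wrong if truly positive
    return "+" if expected_cost_pred_pos <= expected_cost_pred_neg else "-"

# With C_FN = 5 and C_FP = 1, the decision threshold drops from 0.5 to 1/6:
print(min_expected_cost(0.2))  # '+': a 0.2 posterior is already enough
```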

3.2.2. Logistic Regression

Ting and Witten illustrated that the MLR (Multiresponse Linear Regression) algorithm has an advantage over other Level 1 generalizers in stacking [42]. In this paper, the logistic regression classifier is used as the metalayer algorithm, and, on top of the metalayer, the cost of misclassification is considered in the cost-sensitive algorithm. In the logistic regression classifier, the prediction results of the Level 0 layer are used as the attributes of the Level 1 metalayer, and the real class value of each instance is used as the output space. The linear regression for class $l$ is simply obtained as
$$LR_l(\mathbf{x}) = \sum_{k=1}^{K} \alpha_{kl} z_{kl},$$
where $z_{kl}$ is the probability of class $l$ predicted by the $k$th Level 0 model and the coefficients $\alpha_{kl}$ are fitted on the cross-validation data.

Details of the implementation of RECSG are presented below.

3.3. Algorithm Pseudocode

Pseudocode 1 presents the proposed RECSG approach. The input parameters are two data sets, the training set and the testing set; the output is the predicted class labels of the test samples. The first step in the process is data preprocessing: new instances are resampled and the models $M_k$, $k = 1, \ldots, K$, are trained, where $K$ is the number of base classifiers and $z_k$ is the prediction function generated from model $M_k$ (Pseudocode 1).

Pseudocode 1: Pseudocodes of RE-sample and Cost-Sensitive Stacked Generalization.

Then, the metalayer model is constructed based on the data $D_{\mathrm{CV}}$, which are first predicted with the Level 0 layer (Pseudocode 1). Finally, the Level 1 layer classifier (cost-sensitive logistic regression) is used to predict the ultimate class labels of the tested samples.
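
Putting the pieces together, the following sketch mirrors the RECSG flow under stated assumptions: SMOTE from the imbalanced-learn package stands in for the resampling step, scikit-learn classifiers stand in for the Weka base learners, labels are assumed to be 0/1 with 1 as the minority class, and the cost matrix enters the Level 1 logistic regression as class weights. For brevity it fits the metaclassifier on base-model predictions over the original training data rather than the cross-validated construction of Section 2.2; it is an illustration, not the authors' exact implementation.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

def recsg_fit_predict(X_train, y_train, X_test, cost_fn=5.0, cost_fp=1.0):
    """Illustrative RECSG flow: resample, train Level 0, fuse with a
    cost-sensitive Level 1 logistic regression."""
    # Level 0: resample the training data, then train the K base classifiers
    X_res, y_res = SMOTE().fit_resample(X_train, y_train)
    models = [m.fit(X_res, y_res) for m in (
        GaussianNB(),
        DecisionTreeClassifier(criterion="entropy"),  # entropy split, as in C4.5
        KNeighborsClassifier(n_neighbors=3))]

    def z(X):  # meta-features: base-model probabilities on the ORIGINAL data
        return np.column_stack([m.predict_proba(X)[:, 1] for m in models])

    # Level 1: the costs enter as class weights (C_FN > C_FP favors the minority)
    meta = LogisticRegression(class_weight={1: cost_fn, 0: cost_fp})
    meta.fit(z(X_train), y_train)
    return meta.predict(z(X_test))
```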

4. Empirical Investigation

The experiments aim to verify whether the RECSG approach can improve the classification performance for the imbalance problem. In this paper, the RECSG approach was compared with other algorithms involving various combinations of ensemble techniques and imbalance approaches. For each method, the same training, testing, and validation sets were used.

The experimental system is composed of 7 learning methods implemented in Weka [43], namely, Naïve Bayes (NB), C4.5 (J48), $k$-nearest neighbor ($k$-NN), the cost-sensitive classifier, AdaBoost, bagging, and stacking. A brief description, the standard version, and the parameters of these methods are shown in Table 4.

Table 4: Base classifier methods and corresponding ensemble algorithms.

The RECSG approach includes 2 layers. The Level 1 layer (metalearning) system consists of 2 learning methods implemented in Weka, namely, simple logistic regression and the cost-sensitive classifier; simple logistic regression is the base classification algorithm of the cost-sensitive classifier. The Level 0 layer (base learning) of stacking is composed of 3 classifiers: Naïve Bayes, C4.5, and $k$-NN. The evaluation metrics of algorithm performance are AUC, GeoMean, and AGeoMean.

4.1. Experimental Settings

Experiments were implemented with 17 data sets from the UCI Machine Learning Repository [44]. These data sets cover various fields, span IR values from 0.54 to 0.014, have unique data set names, contain varying numbers of samples (from 173 to 2338), and vary in the amount of class overlap (see the KEEL repository [45]). Multiclass data sets were modified to obtain two-class imbalanced problems, so that the union of one or more classes became the positive class and the union of one or more of the remaining classes was labeled as the negative class. A brief description of the data sets is presented in Table 5, including the total number of instances (#Sam.), the number of instances of each class (#Min., #Maj.), the imbalance ratio (IR = the ratio of the number of minority-class instances to majority-class instances), and the number of features (#Fea.).

Table 5: Summary of imbalanced data sets.

Our system was compared with other ensemble algorithms, including AdaBoost with Naïve Bayes, AdaBoost with the cost-sensitive classifier, bagging with Naïve Bayes, bagging with the cost-sensitive classifier, and cost-sensitive stacking with NB, $k$-NN, C4.5, and logistic regression. All the experiments were performed with 10-fold cross-validation.

4.2. Experimental Results

Tables 6, 7, and 8, respectively, show the results of the three metrics (AUC, GeoMean, and AGeoMean) for the algorithms in Table 4 obtained with the data sets in Table 5. The best result for each data set is emphasized in boldface in these tables.

Table 6: Results of other methods in terms of AUC.
Table 7: Results of other methods in terms of GeoMean.
Table 8: Results of other methods in terms of AGeoMean.

The results show that the performance of the proposed RECSG method was the best for 12 of 17 data sets in terms of GeoMean and AGeoMean and for 10 of 17 data sets in terms of AUC. Some methods are better than others on some evaluation metrics, but not on all metrics and most data sets.

In Table 6 (AUC), the performance of the RECSG method is better than that of the 3 single-base classification methods for 14 of 17 data sets and better than that of the other 5 ensemble algorithms for 13 of 17 data sets. In Table 7 (GeoMean), the RECSG method outperforms all 3 single-base classification methods on 15 of 17 data sets and the other 5 ensemble algorithms on 15 of 17 data sets. In Table 8 (AGeoMean), the RECSG method outperforms all 3 single-base classification methods on 14 of 17 data sets and the other 5 ensemble algorithms on 16 of 17 data sets. Across the 17 data sets, the improvements in GeoMean and AGeoMean are larger than those in AUC.

4.3. Statistical Tests

Statistical tests [46] are adopted to compare different algorithms. In this paper, we use two types of comparison: pairwise comparison (between a pair of algorithms) and multiple comparison (among a group of algorithms).

4.3.1. Pair of Algorithms

We performed paired $t$-tests to explore whether the RECSG approach is significantly better than the other algorithms in the three metrics (AUC, GeoMean, and AGeoMean). Table 9 shows the results of the RECSG approach compared with the other methods in terms of AUC, GeoMean, and AGeoMean. The values in the square brackets indicate the number of data sets with a statistically significant difference in the $t$-test performed at the chosen confidence level for each of the three evaluation metrics. For example, for the evaluation metric AUC, RECSG outperformed NB on 11 of 17 data sets, and 5 of those data sets showed a statistically significant difference at that confidence level.

Table 9: Comparison between RECSG and other algorithms in terms of wins/losses of the results.
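
As an illustration of the pairwise comparison, the sketch below runs a paired $t$-test over per-data-set scores of two methods; the score arrays are hypothetical stand-ins for two rows of Table 6.

```python
from scipy.stats import ttest_rel

# Hypothetical AUC scores of RECSG and one baseline on the same 17 data sets
recsg    = [0.91, 0.88, 0.95, 0.79, 0.86, 0.90, 0.84, 0.93, 0.87,
            0.82, 0.89, 0.94, 0.80, 0.85, 0.92, 0.78, 0.88]
baseline = [0.88, 0.85, 0.93, 0.78, 0.83, 0.86, 0.84, 0.90, 0.85,
            0.80, 0.86, 0.91, 0.79, 0.82, 0.90, 0.77, 0.85]

t_stat, p_value = ttest_rel(recsg, baseline)   # paired: same data sets
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")  # reject H0 if p < alpha
```
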
4.3.2. Multiple Comparisons

We performed the Holm post hoc test [47] on multiple groups for the three metrics (AUC, GeoMean, and AGeoMean). The post hoc procedure determines whether a comparison hypothesis should be rejected at a specified confidence level α. The statistical experiment was performed on the platform at http://tec.citius.usc.es/stac/. Table 10 reports whether the RECSG approach is significantly different from each of the other algorithms at the chosen confidence level in terms of AUC, GeoMean, and AGeoMean; entries where the hypothesis is not rejected indicate no significant difference. According to the evaluation metric GeoMean, RECSG is significantly better than 8 of the other methods at that confidence level.

Table 10: Holm post hoc test to show differences of Stacking-Cost-log (AUC) and other algorithms.
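
The Holm step-down procedure itself is short enough to sketch directly: sort the p-values, compare the $i$th smallest against $\alpha/(m-i)$, and stop at the first non-rejection. The p-values in the example are hypothetical.

```python
def holm_rejections(p_values, alpha=0.05):
    """Holm step-down test: compare the i-th smallest p-value (0-based rank)
    against alpha / (m - i); once one comparison fails, retain the rest."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # all larger p-values are also retained
    return reject

# Hypothetical p-values from comparing RECSG against 9 other methods:
print(holm_rejections([0.001, 0.004, 0.02, 0.03, 0.25,
                       0.008, 0.012, 0.04, 0.30]))
```
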
4.4. Discussion

According to the experiments performed with 17 different imbalanced data sets and the comparison with 9 other classification methods, the proposed RECSG method showed higher performance than the other methods in terms of AUC, GeoMean, and AGeoMean, as illustrated in Tables 6, 7, and 8. The RECSG method showed better performance for 10 of 17 cases in terms of AUC and for 12 of 17 cases in terms of GeoMean and AGeoMean. The method outperformed the other 5 ensemble algorithms on 16 of 17 data sets in terms of GeoMean and AGeoMean and on 13 of 17 data sets in terms of AUC. Across the 17 data sets, the improvements in GeoMean and AGeoMean are larger than those in AUC, and the means of the RECSG method in terms of GeoMean, AGeoMean, and AUC are all higher than those of the other methods. The Holm post hoc test shows that, at the chosen confidence level, the RECSG method significantly outperforms 8 of the 9 methods in terms of GeoMean and AGeoMean and 3 of the 9 methods in terms of AUC.

Experimental results and statistical tests show that the RECSG approach has improved the classification performance of imbalanced data sets. The reasons can be explained as follows.

First, the stacking algorithm uses a trained Level 1 model $\widetilde{M}$ to combine the results of the base classifiers, whereas bagging employs a majority vote. Bagging is only a simple decision combination method, which requires neither cross-validation nor Level 1 learning. In this paper, stacked generalization adopts logistic regression, thus providing the simplest linear combination for pooling the Level 0 models' confidences.

Second, the cost-sensitive algorithm affects imbalanced data sets in two respects. On the one hand, cost-sensitive modifications can be applied to the probabilistic estimates. On the other hand, cost-sensitive factors change the instance weights, and the weight of the minority class is higher than that of the majority class. Therefore, the method directly changes the influence of the common instances without discarding or duplicating any of the rare instances.

Third, the RECSG method introduces cost-sensitive learning into the stacking ensemble. The cost-sensitive function in the Level 1 layer replaces the error-minimizing function, thus making the Level 1 learner focus on the minority class.

The results in Tables 7 and 8 demonstrate that the RECSG method achieves higher performance when the evaluation metric of the base classifiers is weaker (such as on Vehicle1, Vehicle0, and car-vgood). The reason is that the alternative methods NB, C4.5, and $k$-NN all have shortcomings: the independence assumption hampers the performance of NB on some data sets; in C4.5 tree construction, the selection of attributes affects the model performance; and the error rate of $k$-NN is relatively high when the data sets are imbalanced. Therefore, the logistic regression adopted in the Level 1 layer can improve the performance when the base classifiers are weaker.

The performance of the RECSG method is generally better when the IR is low (such as on Glass2, car-good, flare-F, car-vgood, and abalone-17_vs_7-8-9-10). This behavior is probably related to the setting of the cost matrix: different data sets should use different cost matrices, but for simplicity we adopted the same cost matrix throughout, which may be more suitable for low IR values.

5. Conclusions

In this paper, in order to solve the class imbalance problem, we proposed the RECSG method based on a 2-layer learning model. Experimental results and statistical tests showed that the RECSG approach improves the classification performance. The proposed approach may have relatively high computational complexity in the training stage because it involves 2 layers of classifier models consisting of several base classifiers and a metaclassifier. The number and kinds of base-level classifiers are closely related to the performance of the stacking algorithm, and in the fusion stage of the base classifiers the selection of the metaclassifier is also important. In this paper, in order to validate the performance improvement over other current classification algorithms, we only selected 3 classification algorithms (NB, $k$-NN, and C4.5) as base classifiers and the cost-sensitive algorithm as the metaclassifier in the fusion stage; the selection of the number or kind of base classifiers and metaclassifier was not discussed. Therefore, we will explore the diversity and quality of base classifiers in future work. The adoption of these strategies should improve the prediction performance and reduce the training time of the stacking algorithm for imbalance problems.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research is supported by the Natural Science Foundation of Shanxi Province under Grant no. 2015011039.

References

  1. O. Loyola-González, J. F. Martínez-Trinidad, J. A. Carrasco-Ochoa, and M. García-Borroto, "Effect of class imbalance on quality measures for contrast patterns: an experimental study," Information Sciences, vol. 374, pp. 179–192, 2016.
  2. C. Beyan and R. Fisher, "Classifying imbalanced data sets using similarity based hierarchical decomposition," Pattern Recognition, vol. 48, no. 5, pp. 1653–1672, 2015.
  3. H. He and E. A. Garcia, "Learning from imbalanced data," IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 9, pp. 1263–1284, 2009.
  4. N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: synthetic minority over-sampling technique," Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.
  5. X.-Y. Liu, J. X. Wu, and Z.-H. Zhou, "Exploratory under-sampling for class-imbalance learning," in Proceedings of the 6th International Conference on Data Mining (ICDM '06), pp. 965–969, IEEE, Hong Kong, December 2006.
  6. N. Japkowicz, "The class imbalance problem: significance and strategies," in Proceedings of the International Conference on Artificial Intelligence, 2000.
  7. B. Zadrozny and C. Elkan, "Learning and making decisions when costs and probabilities are both unknown," in Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 204–213, San Francisco, Calif, USA, August 2001.
  8. M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, and F. Herrera, "A review on ensembles for the class imbalance problem: bagging, boosting, and hybrid-based approaches," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 42, no. 4, pp. 463–484, 2012.
  9. N. V. Chawla, D. A. Cieslak, L. O. Hall, and A. Joshi, "Automatically countering imbalance and its empirical relationship to cost," Data Mining and Knowledge Discovery, vol. 17, no. 2, pp. 225–252, 2008.
  10. A. Freitas, A. Costa-Pereira, and P. Brazdil, "Cost-sensitive decision trees applied to medical data," in International Conference on Data Warehousing and Knowledge Discovery, pp. 303–312, 2007.
  11. L. Breiman, "Bagging predictors," Machine Learning, vol. 24, no. 2, pp. 123–140, 1996.
  12. R. E. Schapire, "The strength of weak learnability," Machine Learning, vol. 5, no. 2, pp. 197–227, 1990.
  13. Y. Freund and R. E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," Journal of Computer and System Sciences, vol. 55, no. 1, part 2, pp. 119–139, 1997.
  14. N. V. Chawla, A. Lazarevic, L. O. Hall, and K. W. Bowyer, "SMOTEBoost: improving prediction of the minority class in boosting," in Knowledge Discovery in Databases: PKDD 2003, vol. 2838 of Lecture Notes in Computer Science, pp. 107–119, 2003.
  15. C. Seiffert, T. M. Khoshgoftaar, J. Van Hulse, and A. Napolitano, "RUSBoost: a hybrid approach to alleviating class imbalance," IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, vol. 40, no. 1, pp. 185–197, 2010.
  16. S. Hu, Y. Liang, L. Ma, and Y. He, "MSMOTE: improving classification performance when training data is imbalanced," in Proceedings of the 2nd International Workshop on Computer Science and Engineering (WCSE 2009), pp. 13–17, China, October 2009.
  17. H. Guo and H. L. Viktor, "Learning from imbalanced data sets with boosting and data generation," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 30–39, 2004.
  18. S. Wang and X. Yao, "Diversity analysis on imbalanced data sets by using ensemble models," in Proceedings of the 2009 IEEE Symposium on Computational Intelligence and Data Mining (CIDM 2009), pp. 324–331, USA, April 2009.
  19. R. Barandela, J. S. Sánchez, and R. M. Valdovinos, "New applications of ensembles of classifiers," Pattern Analysis and Applications, vol. 6, no. 3, pp. 245–256, 2003.
  20. J. Błaszczyński, M. Deckert, J. Stefanowski, and S. Wilk, "Integrating selective pre-processing of imbalanced data with Ivotes ensemble," in Lecture Notes in Computer Science, vol. 6086, pp. 148–157, 2010.
  21. W. Fan, S. J. Stolfo, J. Zhang, and P. K. Chan, "AdaCost: misclassification cost-sensitive boosting," in Proceedings of the 16th International Conference on Machine Learning, pp. 97–105, San Francisco, CA, USA, 1999.
  22. K. M. Ting, "A comparative study of cost-sensitive boosting algorithms," in Proceedings of the 17th International Conference on Machine Learning, pp. 983–990, Stanford, CA, USA, 2000.
  23. M. Joshi, V. Kumar, and R. Agarwal, "Evaluating boosting algorithms to classify rare classes: comparison and improvements," in Proceedings of the 2001 IEEE International Conference on Data Mining, pp. 257–264, San Jose, CA, USA, 2001.
  24. Y. Sun, M. S. Kamel, A. K. C. Wong, and Y. Wang, "Cost-sensitive boosting for classification of imbalanced data," Pattern Recognition, vol. 40, no. 12, pp. 3358–3378, 2007.
  25. D. H. Wolpert, "Stacked generalization," Neural Networks, vol. 5, no. 2, pp. 241–259, 1992.
  26. Y. Chen, M. L. Wong, and H. Li, "Applying Ant Colony Optimization to configuring stacking ensembles for data mining," Expert Systems with Applications, vol. 41, no. 6, pp. 2688–2702, 2014.
  27. H. Kadkhodaei and A. M. E. Moghadam, "An entropy based approach to find the best combination of the base classifiers in ensemble classifiers based on stack generalization," in Proceedings of the 4th International Conference on Control, Instrumentation, and Automation (ICCIA 2016), pp. 425–429, Iran, January 2016.
  28. I. Czarnowski and P. Jędrzejowicz, "An approach to machine classification based on stacked generalization and instance selection," in Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics (SMC 2016), pp. 4836–4841, Hungary, 2017.
  29. S. Kotsiantis, "Stacking cost sensitive models," in Proceedings of the 12th Pan-Hellenic Conference on Informatics (PCI 2008), pp. 217–221, Greece, August 2008.
  30. H. Y. Lo, J. C. Wang, H. M. Wang, and S. D. Lin, "Cost-sensitive stacking for audio tag annotation and retrieval," in Proceedings of the 36th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2011), pp. 2308–2311, Czech Republic, May 2011.
  31. J. Huang and C. X. Ling, "Using AUC and accuracy in evaluating learning algorithms," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 3, pp. 299–310, 2005.
  32. M. Kubat and S. Matwin, "Addressing the curse of imbalanced training sets: one-sided selection," in Proceedings of the 14th International Conference on Machine Learning (ICML '97), pp. 179–186, 1997.
  33. R. Batuwita and V. Palade, "Adjusted geometric-mean: a novel performance measure for imbalanced bioinformatics datasets learning," Journal of Bioinformatics and Computational Biology, vol. 10, no. 4, Article ID 1250003, 2012.
  34. L. Breiman, "Stacked regressions," Machine Learning, vol. 24, no. 1, pp. 49–64, 1996.
  35. M. Leblanc and R. Tibshirani, "Combining estimates in regression and classification," Journal of the American Statistical Association, vol. 91, no. 436, pp. 1641–1650, 1996.
  36. B. Cestnik, "Estimating probabilities: a crucial task in machine learning," in Proceedings of the European Conference on Artificial Intelligence, pp. 147–149, 1990.
  37. J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993.
  38. N. S. Altman, "An introduction to kernel and nearest-neighbor nonparametric regression," The American Statistician, vol. 46, no. 3, pp. 175–185, 1992.
  39. J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.
  40. C. Elkan, "The foundations of cost-sensitive learning," in Proceedings of the 17th International Joint Conference on Artificial Intelligence (IJCAI '01), pp. 973–978, August 2001.
  41. D. Michie, D. J. Spiegelhalter, and C. C. Taylor, Eds., Machine Learning, Neural and Statistical Classification, Ellis Horwood, 1995.
  42. K. M. Ting and I. H. Witten, "Issues in stacked generalization," Journal of Artificial Intelligence Research, vol. 10, pp. 271–289, 1999.
  43. M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, "The WEKA data mining software: an update," ACM SIGKDD Explorations Newsletter, vol. 11, no. 1, pp. 10–18, 2009.
  44. A. Frank and A. Asuncion, UCI Machine Learning Repository, University of California, School of Information and Computer Science, Irvine, CA, USA, http://archive.ics.uci.edu/ml.
  45. J. Alcalá-Fdez, A. Fernández, J. Luengo et al., "KEEL data-mining software tool: data set repository, integration of algorithms and experimental analysis framework," Journal of Multiple-Valued Logic and Soft Computing, vol. 17, no. 2-3, pp. 255–287, 2011.
  46. S. García, A. Fernández, J. Luengo, and F. Herrera, "A study of statistical techniques and performance measures for genetics-based machine learning: accuracy and interpretability," Soft Computing, vol. 13, no. 10, pp. 959–977, 2009.
  47. S. Holm, "A simple sequentially rejective multiple test procedure," Scandinavian Journal of Statistics, vol. 6, no. 2, pp. 65–70, 1979.