A Novel Approach to Ensemble Classifiers: FsBoost-Based Subspace Method
In this article, an algorithm is proposed for creating an ensemble classifier. The algorithm is called the F-score subspace method (FsBoost). In this method, features are selected with the F-score, and the resulting feature sets are classified with the same or different classifiers. The ensemble classifier is then created from these classifiers. Two versions, FsBoost.V1 and FsBoost.V2, have been developed, depending on whether classification is done by the same or by different classifiers. The results obtained are consistent with the literature, and the accuracy rate is higher than that of many algorithms in the literature. The algorithm is fast because it has few steps. Owing to these advantages, the algorithm is expected to be successful.
1. Introduction
An ensemble classifier is a method in which multiple classifiers are used together to improve classification performance [1, 2]. For example, when three classifiers are used to classify an object, the ensemble works like this: if the first classifier outputs cat, the second outputs dog, and the third outputs cat, the ensemble classifier generates its result by averaging these decisions, which here amounts to a majority vote for cat. There are many ways to create an ensemble classifier. Some of the most commonly used ones are (1) adaptive resampling and combining (boosting), (1.1) AdaBoost (adaptive boosting), (2) bagging (bootstrap aggregating), and (3) the random subspace method.
The boosting method can create powerful classifiers by combining and training weak classifiers. The most commonly used boosting method is AdaBoost. The AdaBoost method tries to improve performance by focusing on misclassified instances. In the bagging method, classifiers trained with different training sets, randomly sampled from the dataset, are combined. The outputs of the classifiers are combined with majority voting or weighted voting. In the random subspace method, subsets are generated by randomly selecting features. In other words, each subset is a subset of features, not a subset of instances. In this way, the training process is accelerated. Classifiers are trained on these subsets to form the ensemble classifier. The outputs of the classifiers are combined with majority voting or weighted voting.
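The difference between sampling instances (bagging) and sampling features (random subspace) can be illustrated with a short Python sketch. The function names, the fixed seed, and the subset size are illustrative assumptions and are not taken from the study.

```python
import random

def bootstrap_rows(n_rows, rng):
    """Bagging: draw n_rows instance indices with replacement."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]

def random_subspace(n_features, subset_size, rng):
    """Random subspace: draw feature indices without replacement."""
    return sorted(rng.sample(range(n_features), subset_size))

def project(X, feature_idx):
    """Keep every instance but only the chosen feature columns."""
    return [[row[j] for j in feature_idx] for row in X]

rng = random.Random(0)  # fixed seed, assumed only for repeatability
X = [[10 * r + c for c in range(6)] for r in range(4)]  # 4 instances, 6 features

rows = bootstrap_rows(len(X), rng)                            # instances may repeat
cols = random_subspace(n_features=6, subset_size=3, rng=rng)  # features are unique
X_sub = project(X, cols)                                      # 4 instances, 3 features
```

In the bagging case, a classifier would be trained on the resampled rows; in the random subspace case, it would be trained on `X_sub`, which keeps all instances.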
These algorithms have disadvantages. As the number of training stages in AdaBoost increases, the number of samples decreases and training becomes more difficult, so more samples are needed for training. AdaBoost is also quite slow because it has too many training stages [7, 8]. The bagging method involves complex calculations. Both methods require many iterations, and their success rate is usually lower than that of the random forest method. These models cannot explain the dataset by modeling it as decision trees. Given these disadvantages, these methods are still in need of improvement. In this study, a new ensemble algorithm based on the F-score feature selection algorithm has been developed to reduce the processing load of existing ensemble algorithms and to increase the accuracy rate.
Feature selection algorithms are often used in the machine learning field to improve the performance of systems [9–11]. In machine learning, datasets come in a variety of sizes and types [12–15]. Large datasets lengthen the training time of a classifier. Feature selection algorithms have been developed to solve this problem [9, 16, 17]. They do this by removing irrelevant data while retaining relevant data. Thus, data size, processing load, and training time decrease while classification accuracy increases [9, 18]. Many feature selection algorithms have been developed in the literature [1, 16]. In this study, the F-score feature selection algorithm is used because it is fast and performs well. Feature selection algorithms can be used in many fields, such as health care [19–22].
In this study, two different methods, FsBoost.V1 and FsBoost.V2, have been developed based on the F-score feature selection algorithm to enhance training performance for ensemble classifiers. The FsBoost.V1 method is similar to the random subspace method; however, the features are chosen with respect to the data labels, not randomly. The selected feature sets are classified with a single classifier, and then ensemble classifier 1 is created. This process is repeated for three or more different classifiers. Finally, the ensemble classifiers of the three different classifiers are merged. In this way, unnecessary data are removed from the training process. The procedure can also be stopped at the first ensemble classifier. In the FsBoost.V2 method, all data are first classified with different classifiers. In the second step, the feature subspace is created by the F-score feature selection algorithm, and the data are reclassified. An ensemble classifier is created from the classification results. This process is repeated a second time. Finally, the three ensemble classifiers are merged. The use of a single classifier reduces the cost. Only relevant features are retained by the F-score feature selection algorithm, which accelerates the training process. The complexity is lower than that of other algorithms.
2. Materials and Methods
The study followed the flow in Figure 1. First, the records to be used in the study were collected. Then, features were selected with the F-score feature selection algorithm. Finally, the data were classified with different classifiers, and their performances were calculated. From these classifiers, ensemble classifiers were created, and their performances were calculated at different levels and formations.
2.1. Collection of Data
The data used in the study were downloaded from the Machine Learning Repository website of the University of California, Irvine (UCI) [23, 24]. The data consist of four groups (A/B/C/D) of EEG recordings belonging to epilepsy patients (Table 1). Each recording lasts 23.6 seconds. A total of 2300 EEG recordings were taken during an epileptic seizure. The other 2300 recordings (nonepilepsy) were taken in a healthy condition, although they also belong to epileptic patients. The epilepsy data in each set are the same, whereas the nonepilepsy records differ. The database contains 178 features for each EEG recording.
2.2. F-Score Feature Selection Algorithm
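The F-score is a feature selection algorithm that helps distinguish classes from each other. To select features, an F-score value ($F(i)$) is calculated for each feature (equation (1)). The F-score threshold value ($\bar{F}$) is determined by taking the average of all F-score values. For the $i$th feature, if $F(i) > \bar{F}$, the $i$th feature is selected. This step is repeated for each feature.
\[
F(i) = \frac{\left(\bar{x}_i^{(+)} - \bar{x}_i\right)^2 + \left(\bar{x}_i^{(-)} - \bar{x}_i\right)^2}{\dfrac{1}{n_+ - 1}\sum_{k=1}^{n_+}\left(x_{k,i}^{(+)} - \bar{x}_i^{(+)}\right)^2 + \dfrac{1}{n_- - 1}\sum_{k=1}^{n_-}\left(x_{k,i}^{(-)} - \bar{x}_i^{(-)}\right)^2} \tag{1}
\]
The variables in equation (1) are as follows: (1) $x_k$ is the $k$th feature vector. (2) $n_+$ and $n_-$ are the total numbers of elements in the positive (+) and negative (−) classes, respectively. (3) $i$ is the feature number. (4) $\bar{x}_i$, $\bar{x}_i^{(-)}$, and $\bar{x}_i^{(+)}$ are the average value, the mean value in the negative class, and the mean value in the positive class of the $i$th feature, respectively. (5) $x_{k,i}^{(+)}$ represents the $k$th positive example of the $i$th feature, and $x_{k,i}^{(-)}$ represents the $k$th negative example of the $i$th feature.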
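As an illustration of this selection step, the following Python sketch computes an F-score for every feature and keeps the features whose score exceeds the mean score. It assumes the usual two-class form of the F-score (squared distances of the class means from the overall mean, divided by the sum of the within-class sample variances) and labels coded as 1 (positive) and 0 (negative). The function names and the toy data are hypothetical.

```python
def f_scores(X, y):
    """F-score of every feature for a two-class problem (labels 1 and 0)."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for i in range(len(X[0])):
        col_pos = [row[i] for row in pos]
        col_neg = [row[i] for row in neg]
        mean_pos = sum(col_pos) / len(col_pos)
        mean_neg = sum(col_neg) / len(col_neg)
        mean_all = (sum(col_pos) + sum(col_neg)) / (len(col_pos) + len(col_neg))
        # Numerator: distance of each class mean from the overall mean.
        between = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2
        # Denominator: sum of the sample variances inside each class.
        var_pos = sum((v - mean_pos) ** 2 for v in col_pos) / (len(col_pos) - 1)
        var_neg = sum((v - mean_neg) ** 2 for v in col_neg) / (len(col_neg) - 1)
        within = var_pos + var_neg
        scores.append(between / within if within > 0 else 0.0)
    return scores

def select_features(X, y):
    """Return the indices of features whose F-score exceeds the mean F-score."""
    scores = f_scores(X, y)
    threshold = sum(scores) / len(scores)
    return [i for i, s in enumerate(scores) if s > threshold]

# Toy data: feature 0 separates the classes, feature 1 does not.
X = [[1.0, 5.0], [1.2, 3.0], [0.8, 4.0], [3.0, 4.0], [3.2, 5.0], [2.8, 3.0]]
y = [0, 0, 0, 1, 1, 1]
scores = f_scores(X, y)
selected = select_features(X, y)
```

On this toy data only feature 0 is kept, because its class means are far apart relative to the within-class spread, while the class means of feature 1 coincide.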
In the study, A, B, C, and D dataset features were selected with the F-score (Table 2). Feature selection has been applied twice.
2.3. Ensemble Classifier
An ensemble classifier is a system created by combining different classifiers to produce more reliable and more stable estimates. The system is built with N classifiers, where N can be odd or even. During classification, each classifier generates an output value for each feature vector. The output values are counted, and the output of the ensemble classifier is determined by the number of votes. If the number of classifiers is even, the average of the classifiers' decision values is rounded, and the result is taken as the decision of the ensemble classifier. This process is applied to all feature vectors. The ensemble classifier was prepared in MATLAB using three different classifiers: kNN, PNN, and SVMs.
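The voting rule can be sketched in Python as follows. The sketch assumes binary decisions coded as 0 and 1 and rounds the mean decision to the nearest label. How an exact tie is resolved is not stated in the text, so rounding a mean of 0.5 up to 1 is an assumption of this sketch, as are the function names.

```python
import math

def ensemble_vote(decisions):
    """Combine the 0/1 decisions of several classifiers for one feature vector.

    The mean decision is rounded to the nearest label. A mean of exactly 0.5
    (a tie, possible only with an even number of classifiers) is rounded up
    to 1; this tie rule is an assumption.
    """
    mean = sum(decisions) / len(decisions)
    return math.floor(mean + 0.5)

def ensemble_predict(per_classifier_outputs):
    """per_classifier_outputs[c][v] is the label classifier c gives vector v."""
    return [ensemble_vote(votes) for votes in zip(*per_classifier_outputs)]

# Three classifiers, four feature vectors.
outputs = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
]
combined = ensemble_predict(outputs)
```

With an odd number of classifiers, this rounding is equivalent to a plain majority vote.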
kNN is a supervised machine learning classification method [28, 29]. A new sample is classified according to the classes of its k nearest neighbors in the training dataset. In this study, the value of k was fixed, and ten distance calculation formulas were used: Spearman, standardized Euclidean (Seuclidean), Minkowski, Mahalanobis, Jaccard, Hamming, Euclidean, Cosine, Correlation, and Cityblock.
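A minimal kNN sketch with a pluggable distance function is given below. Only three of the ten distances are implemented, and the value k = 3, the function names, and the toy data are assumptions made for illustration; they are not the settings of the study.

```python
from collections import Counter

def euclidean(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5

def cityblock(a, b):
    return sum(abs(p - q) for p, q in zip(a, b))

def cosine(a, b):
    """Cosine distance: 1 minus the cosine of the angle between a and b."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = sum(p * p for p in a) ** 0.5
    norm_b = sum(q * q for q in b) ** 0.5
    return 1.0 - dot / (norm_a * norm_b)

def knn_predict(train_X, train_y, x, k, distance):
    """Label x by majority vote among its k nearest training samples."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: distance(pair[0], x))
    nearest_labels = [label for _, label in ranked[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

train_X = [[1, 1], [1, 2], [2, 1], [6, 6], [6, 7], [7, 6]]
train_y = [0, 0, 0, 1, 1, 1]

# k = 3 is an assumed value chosen for this example.
label_a = knn_predict(train_X, train_y, [2, 2], k=3, distance=euclidean)
label_b = knn_predict(train_X, train_y, [6, 5], k=3, distance=cityblock)
```

Passing the distance as an argument makes it straightforward to repeat the same classification with each distance formula and compare the results.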
PNN is a statistical classification algorithm based on kernel and Bayesian methods. The method is developed based on feedforward networks. The classifier takes all class elements into account when processing. The radial basis kernel function calculates the distance between class samples. The user can adjust the spread parameter of the PNN classifier. As the spread parameter approaches zero, the network begins to behave like the nearest neighbor classifier. As this value moves away from zero, the classifier takes several neighboring vectors into account when classifying. In the study, PNN networks were designed with a total of 500 different values of the spread parameter, ranging from 0.01 to 5 in steps of 0.01. At the end of the study, the best performing network parameters and performance criteria were calculated.
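The role of the spread parameter can be illustrated with a small PNN sketch. The exact kernel scaling of the MATLAB implementation used in the study is not stated, so the Gaussian form exp(-d²/(2·spread²)) below is an assumption, as are the function names and the toy data. The densities are compared in the log domain so that very small spread values do not underflow.

```python
import math

def pnn_predict(train_X, train_y, x, spread):
    """Return the class whose mean Gaussian kernel response at x is largest."""
    exponents = {}
    for sample, label in zip(train_X, train_y):
        sq_dist = sum((p - q) ** 2 for p, q in zip(sample, x))
        exponents.setdefault(label, []).append(-sq_dist / (2.0 * spread ** 2))

    def log_density(values):
        # Log of the mean kernel value, computed stably (log-sum-exp).
        top = max(values)
        return top + math.log(sum(math.exp(v - top) for v in values) / len(values))

    return max(exponents, key=lambda label: log_density(exponents[label]))

def accuracy(train_X, train_y, test_X, test_y, spread):
    hits = sum(pnn_predict(train_X, train_y, x, spread) == t
               for x, t in zip(test_X, test_y))
    return hits / len(test_y)

# 500 spread values: 0.01, 0.02, ..., 5.00
spreads = [round(0.01 * step, 2) for step in range(1, 501)]

train_X = [[1, 1], [1, 2], [2, 1], [6, 6], [6, 7], [7, 6]]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [[1.5, 1.5], [6.5, 6.5]]
test_y = [0, 1]

# Grid search: keep the spread with the highest test accuracy.
best_spread = max(spreads, key=lambda s: accuracy(train_X, train_y, test_X, test_y, s))
```

For a very small spread, the largest kernel term dominates each class density, so the decision follows the nearest training sample, as described in the text.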
SVMs are among the best machine learning algorithms. They can be used for regression analysis as well as classification. SVMs try to separate the classes in a dataset with a linear or nonlinear boundary. The purpose of the SVM algorithm is to distinguish between the data with the minimum error. A Gaussian, or radial basis function (RBF), kernel was used in the study. The BoxConstraint parameter was varied between 1 and 100 so that the best performance could be achieved.
2.4. Ensemble Classifier Powered by the F-Score
In this study, two different ensemble classifiers were developed: the classifier-based FsBoost.V1 and the feature-based FsBoost.V2.
2.4.1. Classifier-Based Ensemble Classifier: FsBoost.V1
The implementation steps of this method are shown in detail in Figure 2. First, a dataset (A) is classified by one classifier (kNN). In the second step, the first feature selection is performed, and the reduced dataset is classified again by the same classifier (kNN). In the third step, the first and second feature selections are both performed, and the result is again classified by the same classifier (kNN). Thus, the dataset is classified in three different steps, but by only one classifier (kNN). These three results are combined to form the kNN ensemble. The same process is repeated with PNN and SVMs. Finally, the kNN ensemble, the PNN ensemble, and the SVM ensemble are combined into a single ensemble classifier.
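The FsBoost.V1 steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the MATLAB implementation of the study: the three base classifiers are simple stand-ins (two nearest-neighbor variants and a nearest-centroid rule) rather than kNN, PNN, and SVMs; the second feature selection is assumed to be the F-score applied again to the features kept by the first selection; and all names and data are hypothetical.

```python
def f_score(col, y):
    pos = [v for v, t in zip(col, y) if t == 1]
    neg = [v for v, t in zip(col, y) if t == 0]
    mean_all = sum(col) / len(col)
    mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
    within = (sum((v - mean_pos) ** 2 for v in pos) / (len(pos) - 1)
              + sum((v - mean_neg) ** 2 for v in neg) / (len(neg) - 1))
    between = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2
    return between / within if within > 0 else 0.0

def select(X, y, features):
    """Keep the given features whose F-score exceeds their mean F-score."""
    scores = {j: f_score([row[j] for row in X], y) for j in features}
    threshold = sum(scores.values()) / len(scores)
    kept = [j for j in features if scores[j] > threshold]
    return kept if kept else list(features)  # fallback: keep the input set

def project(X, features):
    return [[row[j] for j in features] for row in X]

def vote(columns):
    """Majority vote over equally long lists of 0/1 predictions."""
    return [int(2 * sum(votes) > len(votes)) for votes in zip(*columns)]

# Stand-in classifiers: fit(X, y) returns a predict(x) function.
def nearest_neighbor(dist):
    def fit(X, y):
        return lambda x: min(zip(X, y), key=lambda p: dist(p[0], x))[1]
    return fit

def nearest_centroid(X, y):
    cents = {}
    for label in set(y):
        rows = [r for r, t in zip(X, y) if t == label]
        cents[label] = [sum(c) / len(rows) for c in zip(*rows)]
    return lambda x: min(
        cents, key=lambda c: sum((a - b) ** 2 for a, b in zip(cents[c], x)))

def fsboost_v1(train_X, train_y, test_X, fitters):
    all_features = list(range(len(train_X[0])))
    first = select(train_X, train_y, all_features)  # first F-score selection
    second = select(train_X, train_y, first)        # second selection (assumed form)
    level1 = []
    for fit in fitters:  # Level 1: one ensemble per classifier type
        preds = []
        for features in (all_features, first, second):
            predict = fit(project(train_X, features), train_y)
            preds.append([predict(x) for x in project(test_X, features)])
        level1.append(vote(preds))
    return vote(level1)  # Level 2: merge the per-classifier ensembles

euclid = lambda a, b: sum((p - q) ** 2 for p, q in zip(a, b))
manhattan = lambda a, b: sum(abs(p - q) for p, q in zip(a, b))

train_X = [[1.0, 1.0, 5, 2], [1.2, 1.3, 3, 9], [0.8, 0.7, 4, 4],
           [3.0, 3.0, 4, 9], [3.2, 3.3, 5, 2], [2.8, 2.7, 3, 4]]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [[0.9, 1.1, 4, 9], [3.1, 2.9, 5, 2]]

result = fsboost_v1(train_X, train_y, test_X,
                    [nearest_neighbor(euclid), nearest_neighbor(manhattan),
                     nearest_centroid])
```

In this toy run the first selection keeps features 0 and 1 and the second keeps only feature 0, so each stand-in classifier is trained on three nested feature sets before the two voting levels are applied.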
2.4.2. Feature-Based Ensemble Classifier: FsBoost.V2
The steps of this method are shown in Figure 3. First, a dataset (A) is classified by each classifier (kNN, PNN, and SVMs). These three classifiers are combined to obtain ensemble classifier 1. In the second step, the first feature selection is performed, and the process in the first step is repeated to obtain ensemble classifier 2. In the third step, the first and second feature selections are both performed, and the first process is repeated again to obtain ensemble classifier 3. Ensemble classifiers 1, 2, and 3 are then combined to create the final ensemble classifier.
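The difference between the two versions lies in how the same set of base predictions is grouped before voting. The sketch below assumes the predictions are already available as a grid indexed by feature level and classifier; the grid values, the function names, and the 0/1 label coding are hypothetical.

```python
def vote(columns):
    """Majority vote over equally long lists of 0/1 predictions."""
    return [int(2 * sum(votes) > len(votes)) for votes in zip(*columns)]

def fsboost_v2(grid):
    """grid[level][classifier] is a list of 0/1 predictions.

    level 0 = all features, level 1 = after the first F-score selection,
    level 2 = after the second selection.
    """
    level1 = [vote(row) for row in grid]  # ensembles 1, 2, 3: one per feature level
    return level1, vote(level1)           # Level 2: merge the three ensembles

def fsboost_v1(grid):
    by_classifier = list(zip(*grid))      # regroup: one column per classifier type
    level1 = [vote(col) for col in by_classifier]
    return level1, vote(level1)

# Predictions for two test vectors; columns are kNN, PNN, SVM (made-up values).
grid = [
    [[1, 0], [1, 1], [0, 0]],  # all features
    [[1, 0], [0, 1], [0, 1]],  # after the first feature selection
    [[0, 1], [0, 0], [1, 1]],  # after the second feature selection
]
v2_level1, v2_final = fsboost_v2(grid)
v1_level1, v1_final = fsboost_v1(grid)
```

FsBoost.V2 votes across classifiers within each feature level first, whereas FsBoost.V1 votes across feature levels within each classifier first; the Level 1 ensembles therefore differ even when the final decisions agree.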
2.5. Performance Evaluation Criteria and Distribution of Data for Classification
Different performance evaluation criteria were used to test the accuracy of the proposed systems: accuracy rate, sensitivity, specificity, kappa value, receiver operating characteristic (ROC) curve, area under the ROC curve (AUC), and k-fold (k = 10) cross-validation accuracy.
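For a two-class problem, these criteria can be computed from the confusion matrix, as in the following Python sketch. It uses the standard definitions of sensitivity, specificity, and Cohen's kappa, and a pairwise (rank-based) estimate of the AUC. The function names, labels, and scores are hypothetical example values.

```python
def confusion(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

def metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion(y_true, y_pred)
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return {
        "accuracy": accuracy,
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "kappa": (accuracy - expected) / (1 - expected),
    }

def auc(y_true, scores):
    """Probability that a positive sample scores higher than a negative one."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.6]

m = metrics(y_true, y_pred)
area = auc(y_true, scores)
```

Kappa corrects the accuracy for the agreement expected by chance, which is 0.5 for balanced two-class data such as this example.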
While classifying the datasets, they were divided into two groups: training (50%) and test (50%) (Table 3).
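The 50% training and 50% test split and the 10-fold cross-validation can be sketched with index lists, as below. Whether the study shuffled or stratified the records is not stated, so the random shuffling, the fixed seed, the sample count of 100, and the function names are assumptions of this sketch.

```python
import random

def split_half(n_samples, rng):
    """Shuffle the sample indices and split them into two equal halves."""
    idx = list(range(n_samples))
    rng.shuffle(idx)
    half = n_samples // 2
    return idx[:half], idx[half:]

def k_fold(n_samples, k, rng):
    """Yield (train, test) index lists; each fold is the test set exactly once."""
    idx = list(range(n_samples))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, folds[i]

rng = random.Random(0)  # fixed seed, assumed only for repeatability
train_idx, test_idx = split_half(100, rng)
folds = list(k_fold(100, 10, rng))
```

The cross-validation accuracy would then be the mean of the accuracies obtained on the ten test folds.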
3. Results
This work aims to develop a new algorithm to improve ensemble classifier performance. We have developed an algorithm (FsBoost) that is similar to the random subspace method but has a lower workload, runs faster, and performs better. The method, which is based on the F-score feature selection algorithm, has two versions (FsBoost.V1 and FsBoost.V2). The ensemble classifier is created with a single classifier in FsBoost.V1 (Figure 2, Level 1) and with at least three different classifiers in FsBoost.V2 (Figure 3, Level 1). The developed algorithms were tested with four two-class datasets (A, B, C, and D) (Table 3).
In the FsBoost algorithm, the dataset features were selected twice using the F-score feature selection algorithm. For example, in FsBoost.V1, the dataset (A) is classified with the same classifier after each feature selection (Figure 2, Level 1: kNN1, kNN2, and kNN3) (Table 4). The kNN ensemble was formed by combining the three kNN classifiers (Figure 2, Level 1) (Table 5). This process was repeated with the other classifiers to create the PNN ensemble and the SVM ensemble (Figure 2, Level 1) (Table 5). Then, the kNN ensemble, PNN ensemble, and SVM ensemble were combined to form the final ensemble classifier (Figure 2, Level 2) (Table 5). This process was repeated for each dataset (Tables 4–8).
In FsBoost.V2, the dataset (A) is classified with different classifiers after each feature selection (Figure 3, Level 1—kNN1, PNN1, and SVM1) (Table 4). These three classifiers were combined to create ensemble 1 (Figure 3, Level 1—ensemble 1) (Table 9). Then, ensemble 1, ensemble 2, and ensemble 3 were combined to form the final ensemble classifier (Figure 3, Level 2) (Table 9). This process is repeated for each dataset (Tables 4–7 and 9). Finally, the FsBoost ensemble algorithm is also compared with the ensemble algorithms available in the literature (Table 10).
Accuracy rates for FsBoost.V1 and FsBoost.V2 are higher than those for single classifiers (Table 10). The FsBoost algorithm ranks well compared with other boosting algorithms in the literature (Table 10). The FsBoost.V1 Level 1 SVM ensemble is the best method when compared with the literature (Table 10, Rank).
Three different datasets were used to reconfirm the results obtained. The distribution of datasets is shown in Table 11.
To compare the FsBoost algorithm with boosting algorithms, the three additional datasets were analyzed. The results of the analysis are summarized in Table 12. According to the results, the algorithm with the best average performance is the FsBoost.V1 Level 2 ensemble algorithm.
4. Discussion and Conclusion
Based on these results, FsBoost is among the best-performing ensemble algorithms developed so far [4–7]. The method has very few steps and therefore provides results faster. A high accuracy rate is a distinct advantage. Algorithms with high accuracy and fast results are preferred in medical data classification. In this regard, FsBoost may be preferred.
FsBoost contains fewer calculations and steps than the algorithms in the literature [4–7]. Its accuracy rate is very good compared with other algorithms (Table 10). Considering these advantages, FsBoost may become a commonly used algorithm.
FsBoost can be used with three or more classifiers. In addition, FsBoost.V1 is a version of FsBoost that can be used with a single classifier. Achieving high performance with a single classifier is a distinct advantage of FsBoost.V1. The F-score feature selection algorithm creates this advantage: by combining different features, the same data can be interpreted differently. The performance of FsBoost increases when the classifiers are strong. Therefore, it is recommended that the algorithm be used with robust classifiers. Ensemble classifiers usually build a strong classifier by combining weak classifiers, whereas FsBoost depends on strong base classifiers. This is the weakness of FsBoost.
As a result, we can say that FsBoost is an alternative method to create an ensemble classifier. A high-performance ensemble classifier can be created with a powerful classifier and the F-score feature selection algorithm.
The datasets used in this paper can be downloaded from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/index.html). The authors can also send all the datasets to readers on request.
Conflicts of Interest
The authors declare no conflicts of interest.
This project was funded by the Deanship of Science Research (DSR) at King Abdulaziz University, Jeddah, Saudi Arabia, under grant no. RG-2-611-40. The authors, therefore, gratefully acknowledge DSR for its technical and financial support.
Y. Freund and R. E. Schapire, “Experiments with a new boosting algorithm,” in Proceedings of the ICML ’96: 13th International Conference on Machine Learning, pp. 148–156, Bari, Italy, July 1996.
R. Rojas, AdaBoost and the Super Bowl of Classifiers: A Tutorial Introduction to Adaptive Boosting, 2009.
P. Bühlmann, “Bagging, boosting and ensemble learning,” in Handbook of Computational Statistics: Concepts and Methods, J. E. Gentle, W. K. Härdle, and Y. Mori, Eds., pp. 1–38, Springer-Verlag Berlin Heidelberg, Heidelberg, Germany, 2012.
K. Polat, S. Şahan, H. Kodaz, and S. Güneş, “Breast cancer and liver disorders classification using artificial immune recognition system (AIRS) with performance evaluation by fuzzy resource allocation mechanism,” Expert Systems with Applications, vol. 32, no. 1, pp. 172–183, 2007.
A. Noor, “The utilization of E-health in the kingdom of Saudi Arabia,” International Research Journal of Engineering and Technology (IRJET), vol. 6, no. 9, 2019.
R. G. Andrzejak, K. Lehnertz, F. Mormann et al., “Indications of nonlinear deterministic and finite-dimensional structures in time series of brain electrical activity: dependence on recording region and brain state,” Physical Review E: Statistical Physics, Plasmas, Fluids, and Related Interdisciplinary Topics, vol. 64, no. 6, 2001.
R. G. Andrzejak, K. Lehnertz, C. Rieke, F. Mormann, P. David, and C. E. Elger, UCI Machine Learning Repository: Epileptic Seizure Recognition Data Set, University of California, Oakland, CA, USA, 2001.
P. Wallisch, M. E. Lusignan, M. D. Benayoun, T. I. Baker, A. S. Dickey, and N. G. Hatsopoulos, “MATLAB for neuroscientists: an introduction to scientific computing in MATLAB,” Journal of Undergraduate Neuroscience Education, vol. 13, no. 1, 2014.
M. Khan, Q. Ding, and W. Perrizo, “k-nearest neighbor classification on spatial data streams using P-trees,” in Advances in Knowledge Discovery and Data Mining, pp. 517–528, Springer Berlin Heidelberg, Heidelberg, Germany, 2002.
P. D. Wasserman, Advanced Methods in Neural Computing, Van Nostrand Reinhold, New York, NY, USA, 1993.
V. N. Mandhala, V. Sujatha, and B. Renuka Devi, “Scene classification using support vector machines,” in Proceedings of the 2014 IEEE International Conference on Advanced Communications, Control and Computing Technologies, pp. 1807–1810, Ramanathapuram, India, May 2014.