Research Article  Open Access
Self-Trained LMT for Semisupervised Learning
Abstract
The most important asset of semisupervised classification methods is the use of available unlabeled data combined with a much smaller set of labeled examples, so as to increase classification accuracy compared with the default procedure of supervised methods, which use only the labeled data during the training phase. Both the absence of automated mechanisms that produce labeled data and the high cost of the human effort needed to label data in several scientific domains raise the need for semisupervised methods that counterbalance this phenomenon. In this work, a self-trained Logistic Model Trees (LMT) algorithm is presented, which exploits the characteristics of Logistic Model Trees under the scenario of scarce labeled data. We performed an in-depth comparison with other well-known semisupervised classification methods on standard benchmark datasets and found that the presented technique had better accuracy in most cases.
1. Introduction
The classification task is an integral part of machine learning: it tries to separate tested patterns or objects and match each of them to one of several distinct categories or classes. The classes vary according to the application domain of each problem. For example, the classes could represent the different origins of the tested speakers in a Speech Identification problem or different objects in pictures with various backgrounds in a Pattern Recognition problem.
The default scenario of classification is the supervised one, in which all the available labeled data are used to build a classification model. Using the information of the labeled data, the trained supervised classification model assigns a class label to each new instance. Unsupervised techniques can also be used for the same problems. The main characteristic of unsupervised techniques is that they do not need labeled data [1]. However, the lack of class information degrades the performance of unsupervised algorithms. The most recently proposed family of methods is commonly called semisupervised learning (SSL) algorithms and is generated by a direct combination of the previous strategies [2].
Schwenker and Trentin [3] proposed in 2014 a categorization of semisupervised learning algorithms. They used the term Partially Supervised Learning (PSL) to refer to these algorithms. They also described the training phase of semisupervised algorithms as weak supervision, since only a part of the whole information is provided. To explain the new issues that have arisen from the PSL task, Schwenker and Trentin [3] review the most prominent directions of research related to this domain.
Sun [4] reviews theories that describe the characteristics of multiview learning. Under this concept, any set of features, or, more generally, any gathered information that is related to the dataset, can potentially improve the classification accuracy. Moreover, Triguero et al. [5] made an in-depth study of self-labeled techniques, mainly focused on classification. Based on some specific properties, which seem to be quite representative of and objective for the majority of real applications, they proposed a taxonomy for semisupervised classification (SSC) methods. One of the findings of that work is the shortage of multilearning approaches built on the self-training method.
In many application domains, labeling the training instances requires high cost in labor and/or time [6]. The major asset of semisupervised algorithms is that they overcome the need for collecting and labeling large amounts of data in fields like text mining, speech recognition, and object detection from images [7], allowing application of such methods in a variety of contexts. Moreover, the increased accuracy provided by these methods, along with the automated learning of patterns from datasets, makes semisupervised techniques a great tool for the machine learning community [8]. Using SSC methods, the effort required from human experts for labeling instances tends to be reduced dramatically, especially in real-life scenarios [9].
In particular, SSC methods demand only a small proportion of the whole amount of data to be labeled for accomplishing their task. This attribute is widely known as the labeled ratio and is usually given as a percentage: $R = \frac{|L|}{|L| + |U|} \times 100\%$. Having chosen the labeled ratio, all the available data are split into two different subsets: the labeled ($L$) and the unlabeled ($U$) set. The instances included in each of these subsets can be expressed as follows: $L = \{(x_i, y_i)\}_{i=1}^{l}$ and $U = \{x_j\}_{j=l+1}^{l+u}$, where $x$ denotes a feature vector, $y$ its class label, and $l$ and $u$ the numbers of labeled and unlabeled instances.
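The split by labeled ratio can be illustrated with a minimal Python sketch. The function name `split_by_labeled_ratio`, the `seed` parameter, and the toy data are assumptions made for illustration only; the split is a plain random one (not stratified by class) and is not taken from the implementation used in this work.

```python
import random


def split_by_labeled_ratio(instances, ratio, seed=0):
    """Split (x, y) pairs into a labeled set and an unlabeled set.

    ratio is the labeled ratio as a fraction in (0, 1], e.g. 0.10 for 10%.
    The class labels of the unlabeled part are hidden (dropped).
    """
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)             # random, non-stratified choice
    n_labeled = max(1, round(ratio * len(shuffled)))  # keep at least one labeled instance
    labeled = shuffled[:n_labeled]                    # keeps (x, y)
    unlabeled = [x for x, _ in shuffled[n_labeled:]]  # keeps x only
    return labeled, unlabeled


# Toy data: 50 instances with two features and a binary class.
data = [((i, i % 3), i % 2) for i in range(50)]
L, U = split_by_labeled_ratio(data, ratio=0.10)
print(len(L), len(U))
```

With 50 instances and a 10% labeled ratio, 5 instances keep their labels and the remaining 45 form the unlabeled pool.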
Tanha et al. [10] suggested that using decision tree classifiers as base classifiers with the self-training algorithm is not very effective for semisupervised learning, mainly because decision tree classifiers perform poorly when computing probability estimates for their predictions. However, decision trees are not demanding in training time and produce easily comprehensible models. A series of modifications have been proposed to avoid using the simplistic proportion distribution at the leaves of a pruned decision tree [11]. The Laplacian correction and grafted decision trees are some of them [10]. Torgo [12] also made a thorough study of tree-based regression models and focused on the generation of tree models and on pruning by tree selection.
The aim of our work was to present a self-trained Logistic Model Tree (LMT) algorithm and compare it with other well-known semisupervised classification methods on standard benchmark datasets. To achieve this, we performed statistical comparisons of the proposed method with other algorithms and provided an illustrative visualization of the behavior of each algorithm against the others. Our proposed technique presented higher accuracy in most cases and a better overall performance in different scenarios, making this algorithm a robust tool.
In Section 2, a brief description of semisupervised classification techniques is provided. In Section 3, the proposed algorithm is presented. Section 4 reports the results of the comparison of the proposed algorithm with other well-known semisupervised classification methods on standard benchmark datasets. Finally, some concluding remarks and future research points are presented in Section 5.
2. Semisupervised Techniques
Self-training is usually described as a wrapper method and constitutes a great tool for semisupervised learning tasks. It is a simple scheme based on four stages [7]. In the first one, a classifier of our choice is trained with a small amount of labeled data, which have been chosen randomly from the initial dataset. During the second phase, the unlabeled instances are classified, and afterwards an assessment procedure follows. More specifically, each instance whose predicted class probability exceeds a defined threshold is considered reliable enough to be added to the training set for the following training phases. Finally, these instances are added to the initial training set, increasing in this way its robustness. All these phases constitute a complete step of the algorithm. The classifier is retrained using the new enlarged training set until the stopping criteria are satisfied. Self-training has performed with great success in many real-life scenarios, even though misclassified instances can occur due to the lack of specific assumptions. An important reason why the performance of PSL techniques may fluctuate compared with that of supervised algorithms is that, during the training phase of the former, some of the unlabeled examples will not get labeled, since the algorithm will have terminated first [3]. This means that a part of the total information provided through the dataset will not be exploited under this scheme.
Self-Training with Editing (SETRED) is a modified approach to self-training proposed by Li and Zhou [13]. Their principal improvement over the basic self-training scheme is the different handling of misclassified examples, which come from the unlabeled set and may incorrectly be merged into the original training set, degrading the performance of the algorithm. To reduce these occasions, they build a neighborhood graph in the $D$-dimensional feature space, where $D$ is the dimension of the feature vector. By evaluating a hypothesis test, they finally discard any example for which the outcome of the test was negative.
Co-training is an equally important scheme that can be considered a variant of the self-training technique [14]. Its main idea is that the feature space can be exploited in a way other than combining all its elements. Under this assumption, which is in line with multiview learning, the co-training algorithm assumes that, by dividing the feature space into two separate categories, it is more effective to predict the unlabeled instances each time [15]. This assumption seems to be more realistic when the newly formed categories represent different views of the dataset. Since the co-training algorithm belongs to the family of self-training schemes, its algorithmic phases are similar to those described above, under the restriction that two independent feature vectors exist for each instance. Didaci et al. [16] examined the relation between the performance of co-training and the size of the labeled training set, and their results showed that high performance was achieved even when the algorithm was provided with very few instances per class. However, Du et al. [17], based on an adequate number of experiments, concluded that relying on small labeled training sets cannot ensure that the assumptions of the multiview setting hold. To prevent misclassified instances from being inserted into the training set at the end of each iteration, several approaches have been proposed. Sun and Jin [18] filtered the predictions of co-training classifiers with Canonical Correlation Analysis (CCA) [4]. By applying CCA on paired datasets, the similarities between unlabeled examples of the test set and the initial training set were calculated in an effective way, and only those instances that satisfied CCA's restrictions were inserted into the initial training set.
Wang et al. [19] proposed the use of a distance metric that examines the class membership probabilities of labeled and unlabeled examples. If two examples have the same class probability value, the metric gives the example with the smaller distance a higher chance of being selected. Another technique for separating the predictions of a semisupervised scheme with higher accuracy is the combination of more than one classifier. Jiang et al. [20] introduced a hybrid method that combines the predictions of two different types of classifiers to exploit their different characteristics. The first one is Naive Bayes (NB), which is a generative classifier, and the second is the Support Vector Machine (SVM), which is a discriminative classifier. The final prediction is governed by a parameter that controls the weights of the two classifiers. A review of other similar hybrid methods is also presented in [20]. Moreover, Li and Zhou [6] suggested the CoForest algorithm, in which a number of Random Trees are trained on bootstrap data from the dataset. As an ensemble method, its behavior is robust even if the number of available labeled examples is reduced. The principal idea of this algorithm is the assignment of a few unlabeled examples to each Random Tree during the training period. Eventually, the final decision is produced by majority voting. An extension of this algorithm is ADE-CoForest, which is based on a data editing technique to find and reject possibly problematic instances at the end of each iteration [21]. Within the same ensemble framework, co-training by committee has been proposed by Hady and Schwenker [22]. Based on the labeled instances of the dataset, an initial committee was built. The ensemble methods used under this semisupervised scheme were named CoBag (Bagging), CoAdaBoost (AdaBoost), and CoRSM (random subspace).
RASCO [23] does not consider any specific criterion for splitting the feature vectors; instead, it uses random splits so as to train different learners. Following this strategy, the unlabeled data are labeled and added to the training set based on the combined decisions of the learners trained on different attribute splits. Instead of random feature subspaces, the Rel-RASCO [24] algorithm generates relevant random subspaces using relevance scores of features, which are obtained from the mutual information between features and class.
The tri-training scheme uses three classifiers, each trained on a different bootstrap sample of the same dataset, to label each unlabeled instance. If two of the three classifiers agree on the categorization of an instance, then it is considered labeled and is added to the training set [25]. An improved approach is the improved tri-training algorithm (im-tri-training) [26], which eliminates some drawbacks of the original model, such as unsuitable error estimation, excessively confined restrictions, and the lack of weights for labeled and unlabeled examples. The idea of ensemble methods and majority voting has also been endorsed by Zhou and Goldman [27], who proposed democratic co-learning. One really interesting asset of this algorithm is that it enlarges the training set of the classifier whose prediction differed from the final one after the voting phase. Sun and Zhang [28] suggested training an ensemble of classifiers from multiple views. Subsequently, only the instances whose classification stems from the consensus prediction of multiple classifiers are selected as the most confident ones, in order to teach the ensemble trained on the other view.
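The agreement rule of tri-training, as summarized above, can be sketched in a few lines of Python. The function names `agreed_label` and `select_agreed` are hypothetical, and the sketch covers only the "two of three agree" step; it omits the bootstrap sampling and error estimation of the full algorithm in [25].

```python
from collections import Counter


def agreed_label(predictions):
    """Return the label shared by at least two of the three predictions, else None."""
    label, votes = Counter(predictions).most_common(1)[0]
    return label if votes >= 2 else None


def select_agreed(instances, predictions_per_instance):
    """Keep the unlabeled instances on which at least two classifiers agree."""
    selected = []
    for x, predictions in zip(instances, predictions_per_instance):
        label = agreed_label(predictions)
        if label is not None:
            selected.append((x, label))   # instance is now considered labeled
    return selected


# Toy predictions of three classifiers for three unlabeled instances.
votes = [("a", "b", "a"), ("a", "b", "c"), ("b", "b", "b")]
print(select_agreed(["x1", "x2", "x3"], votes))
```

In this toy run the second instance is left unlabeled because the three classifiers disagree completely.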
Huang et al. [29] proposed a classification method based on Local Cluster Centers (CLCC). This algorithm tries to resolve problems that occur when the provided datasets contain only a few labeled training data and handles situations in which the labeling process may lead to misclassified instances. Another algorithm that uses the self-training scheme is the aggregation pheromone density based semisupervised classification (APSSC) algorithm [30]. As its name indicates, it uses the aggregation pheromone property found in the natural behavior of real ants. It performed well and offered promising results for real-world problems related to the classification task. A combination of classifiers under the self-training scheme has been proposed by Wang et al. [31]. Their learning approach is named Self-Training Nearest Neighbor Rule using Cut Edges (SNNRCE), and its main advantage is that it uses graph-based methods to prevent problematic examples from being added to the initial labeled set in each iteration.
3. Proposed Algorithm
Our proposed algorithm combines the self-training scheme with the Logistic Model Tree (LMT) algorithm. An LMT is a decision tree that has logistic regression models at its leaves, in analogy with a model tree, whose linear regression models at the leaves provide a piecewise linear regression model [34]. As in ordinary decision trees, a test on one of the features is associated with every inner node. For a nominal feature with $k$ values, the node has $k$ child nodes, and examples are sorted down one of the $k$ branches depending on their value of the feature. For numerical features, the node has two child nodes and the test consists of comparing the feature value with a threshold. The LogitBoost algorithm is used to produce a logistic regression model at every node in the tree [35]. Because the subsets encountered at lower levels in the tree become smaller and smaller, it can be preferable at some point to build a linear logistic model instead of calling the tree growing procedure recursively. There is strong evidence that building trees for very small datasets is usually not a good idea; it is better to use simpler models (like logistic regression) [36]. As for simple decision trees, pruning is an essential part of the LMT algorithm. For LMT, sometimes a single leaf (a tree pruned back to the root) leads to the best generalization performance, which is seldom the case for simple decision trees [11].
Decision trees can generate estimates of the class membership probabilities: the probability of a particular class is just the fraction of the instances in the region (leaf) that are labeled with that class. In terms of probability estimates, LMT outperforms the other simple decision trees and related algorithms included in the experiments of [34]. In this work, we propose a self-training method that uses the power of LMT for semisupervised tasks. The proposed algorithm (self-trained LMT) is presented in Algorithm 1. The self-training process benefits from the more accurate class probabilities that the LMT model produces for the unlabeled instances. When fitting the logistic regression functions at a node, LMT has to determine the number of LogitBoost iterations to run. Originally, this number was cross-validated at every node in the tree [34]. To save time, our implementation uses a heuristic that cross-validates the number only once and then uses this number at every node in the tree. A similar process was used in [37].
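To make the remark on leaf-based probability estimates concrete, the following Python sketch compares the raw class fractions at a leaf with the Laplace-corrected estimate $(n_c + 1)/(n + K)$, one of the modifications mentioned in Section 1. The function name and the toy leaf are assumptions for illustration; this is not how LMT computes its probabilities, which come from the logistic models at the leaves.

```python
from collections import Counter


def leaf_probabilities(leaf_labels, classes, laplace=False):
    """Class probability estimates for the training instances that reach one leaf.

    laplace=False: raw fraction n_c / n of instances of class c in the leaf.
    laplace=True:  Laplace-corrected estimate (n_c + 1) / (n + K), K = number of classes.
    """
    counts = Counter(leaf_labels)
    n, k = len(leaf_labels), len(classes)
    if laplace:
        return {c: (counts[c] + 1) / (n + k) for c in classes}
    return {c: counts[c] / n for c in classes}


# A small, pure leaf holding three instances of class "pos".
leaf = ["pos", "pos", "pos"]
raw = leaf_probabilities(leaf, ["pos", "neg"])
smoothed = leaf_probabilities(leaf, ["pos", "neg"], laplace=True)
print(raw, smoothed)
```

The raw estimate for "pos" is 1.0 although it rests on three instances only, whereas the corrected value is 0.8. With a confidence threshold of 0.9, the two estimates would lead to different selection decisions, which illustrates why the quality of the probability estimates matters for self-training.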

The transfer of data points from the unlabeled set to the labeled set is based on the estimation of class probabilities. If the probability of the most probable class exceeds the predefined threshold, then the instance is assigned that class label. Our experiments showed that a good option for the threshold parameter is the value 0.9, which gave decent results irrespective of the dataset. We noticed that only a small number of instances per class meets this restriction in each iteration.
Algorithm 2 briefly describes the main characteristics of the LMT classifier and focuses on the points that distinguish it from common decision tree algorithms.
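The selection rule described above can be sketched as a generic self-training loop in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `CentroidClassifier` is a hypothetical stand-in for LMT (any model exposing `fit` and `predict_proba` would do), and `max_iter` is an assumed stopping criterion. Only the threshold value 0.9 comes from the text.

```python
import math


class CentroidClassifier:
    """Hypothetical stand-in for LMT: class centroids with normalized exp(-distance) scores."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, value in enumerate(x):
                acc[i] += value
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
        return self

    def predict_proba(self, x):
        scores = {c: math.exp(-math.dist(x, m)) for c, m in self.centroids.items()}
        total = sum(scores.values())
        return {c: s / total for c, s in scores.items()}


def self_train(make_classifier, X_lab, y_lab, X_unlab, threshold=0.9, max_iter=40):
    """Self-training: move confidently predicted instances from the unlabeled pool to the labeled set."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    model = make_classifier().fit(X_lab, y_lab)
    for _ in range(max_iter):                    # max_iter is an assumed stopping criterion
        remaining, added = [], False
        for x in pool:
            proba = model.predict_proba(x)
            label = max(proba, key=proba.get)    # most probable class
            if proba[label] > threshold:         # confident enough: add to the labeled set
                X_lab.append(x)
                y_lab.append(label)
                added = True
            else:
                remaining.append(x)
        pool = remaining
        if not added:                            # nothing passed the threshold: stop
            break
        model = make_classifier().fit(X_lab, y_lab)   # retrain on the enlarged labeled set
        if not pool:
            break
    return model, X_lab, y_lab, pool


X_lab = [(0.0, 0.0), (10.0, 10.0)]
y_lab = ["a", "b"]
X_unlab = [(0.5, 0.2), (9.5, 10.1), (5.0, 5.0)]
model, X_new, y_new, pool = self_train(CentroidClassifier, X_lab, y_lab, X_unlab)
print(y_new, pool)
```

In this toy run the two points close to a class centroid are labeled in the first iteration, while the ambiguous point midway between the classes never exceeds the threshold and stays in the pool, mirroring the observation that some unlabeled examples may remain unused.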

For the implementation, we used the open-source environments Weka [38] and KEEL [5]. In our implementation, minNumInst was set to 15 and numBoostIter was set to 10.
4. Experiments
The experiments are based on standard classification datasets taken from the KEEL-dataset repository [39], covering a wide range of scientific fields. These datasets have been partitioned using the 10-fold cross-validation procedure. For each generated fold, a given algorithm is trained with the examples contained in the remaining folds (training partition) and then tested with the current fold. Each training partition is divided into two parts: labeled and unlabeled examples. In order to study the influence of the amount of labeled data, we examined four different ratios for dividing the training set: 10%, 20%, 30%, and 40%.
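The partitioning protocol can be sketched in Python as follows. The function names, the `seed` parameter, and the plain random (non-stratified) fold assignment are assumptions made for illustration; the actual partitions used in the experiments are the ones provided with the KEEL datasets.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def ssl_partitions(n, ratio, k=10, seed=0):
    """Yield (labeled, unlabeled, test) index lists for each of the k folds."""
    folds = k_fold_indices(n, k, seed)
    for i, test in enumerate(folds):
        # Training partition: every fold except the current test fold.
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        n_labeled = max(1, round(ratio * len(train)))
        yield train[:n_labeled], train[n_labeled:], test


for ratio in (0.10, 0.20, 0.30, 0.40):
    labeled, unlabeled, test = next(ssl_partitions(100, ratio))
    print(ratio, len(labeled), len(unlabeled), len(test))
```

For a dataset of 100 instances, each training partition holds 90 instances, of which 9, 18, 27, or 36 keep their labels under the four examined ratios.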
Subsequently, we compared the proposed method with other state-of-the-art algorithms available in the KEEL tool [39], such as self-training (C45) [7], self-training (SMO) [40], self-training (NN) [32], SETRED [13], co-training (C45) [14], co-training (SMO) [41], democratic-co [27], tri-training (C45) [41], tri-training (SMO) [25], tri-training (NN) [41], DE-tri-training (C45), DE-tri-training (SMO) [42], CoForest [6], RASCO (C45) [23], CLCC [29], APSSC [30], SNNRCE [31], Rel-RASCO (NB) [24], ADE-CoForest [43], co-bagging (C45) [22], and co-bagging (SMO) [44]. For all tested algorithms, the default parameters of KEEL were used.
The classification accuracy of each tested algorithm using 10%, 20%, 30%, and 40% as the labeled ratio is presented in Tables 1, 2, 3, and 4, respectively. The best accuracy value among the different algorithms tested in each experiment is shown in bold. For our experiments, we used 52 datasets and 22 algorithms: the 21 listed above and self-trained LMT. The full tables of comparisons can be found at http://www.math.upatras.gr/~sotos/Self_LMT_Results.xlsx.




Here, we present only the best 10 of these algorithms, according to their classification accuracy. A short comment on each experiment describes the general behavior of the proposed algorithm in comparison with the most effective of the other algorithms. We also provide a more representative visualization of the average accuracy of the proposed algorithm in comparison with the other 21 algorithms, presented in Figure 1. In this figure, each ratio of labeled instances is mapped to a different color on a radar plot.
In the experiment with a 10% labeled ratio, self-trained LMT and CoForest each achieved 8 wins out of 52 datasets, followed by self-training (C45), co-training (C45), and APSSC with 5 victories. Despite the low labeled ratio, self-trained LMT managed to achieve the best average accuracy, confirming its robust behavior.
In the experiment with a 20% labeled ratio, the self-trained LMT algorithm achieved 15 victories, while the next in rank were the CoForest algorithm with 5 and co-training (SMO) with 4.
Similarly, in the experiment with a 30% labeled ratio, self-trained LMT achieved 17 wins out of 52 datasets, while co-training (SMO) and Rel-RASCO (NB) achieved 7 and 6 best accuracy values, respectively.
Finally, in the experiment with a 40% labeled ratio, the self-trained LMT algorithm outperformed the other algorithms, scoring the best accuracy value on 19 datasets, while democratic-co achieved 5 victories.
An interesting point that emerges from Figure 1 is that an increase of the labeled ratio does not necessarily mean that the average accuracy of every algorithm will also improve. Co-bagging (C45) illustrates this phenomenon, since its accuracy was lower with a 40% labeled ratio than with a 30% labeled ratio. Furthermore, many other algorithms, such as Rel-RASCO (NB), APSSC, and DE-tri-training (SMO), did not achieve a noteworthy improvement between the 30% and 40% labeled ratios. Consequently, by presenting the average accuracy of the tested algorithms on radar plots like the one in Figure 1, we can extract useful information for comparing any subset of these algorithms with respect not only to their accuracy but also to their response to an increase of the labeled ratio, and we can detect any saturation phenomena. In order to compare all the algorithms considered in the study with the proposed algorithm for all the different labeled ratios, the results of the Friedman test together with a post hoc statistical test described in [45] are presented in Tables 5, 6, 7, and 8.
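As a rough illustration of the ranking step behind the Friedman test, the Python sketch below computes the average rank of each algorithm over the datasets and the standard Friedman statistic $\chi_F^2 = \frac{12N}{k(k+1)}\left[\sum_j R_j^2 - \frac{k(k+1)^2}{4}\right]$ for $N$ datasets and $k$ algorithms. The accuracy values are made-up toy numbers, not results from this study, and the post hoc procedure of [45] is not reproduced.

```python
def average_ranks(acc):
    """acc[d][a] is the accuracy of algorithm a on dataset d; rank 1 is the best."""
    k = len(acc[0])
    totals = [0.0] * k
    for row in acc:
        order = sorted(range(k), key=lambda a: -row[a])   # best accuracy first
        pos = 0
        while pos < k:
            end = pos
            while end + 1 < k and row[order[end + 1]] == row[order[pos]]:
                end += 1                                  # extend over tied values
            shared = (pos + end) / 2 + 1                  # mean rank of the tied group
            for a in order[pos:end + 1]:
                totals[a] += shared
            pos = end + 1
    return [t / len(acc) for t in totals]


def friedman_statistic(acc):
    """Friedman chi-square statistic computed from the average ranks."""
    n, k = len(acc), len(acc[0])
    ranks = average_ranks(acc)
    return 12 * n / (k * (k + 1)) * (sum(r * r for r in ranks) - k * (k + 1) ** 2 / 4)


# Toy accuracies: 4 datasets (rows) and 3 algorithms (columns).
toy = [
    [0.90, 0.80, 0.70],
    [0.85, 0.80, 0.75],
    [0.70, 0.90, 0.60],
    [0.80, 0.80, 0.50],
]
print(average_ranks(toy), friedman_statistic(toy))
```

A lower average rank indicates a better algorithm across datasets; tied accuracies on a dataset share the mean of the ranks they occupy.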




As a result, the proposed algorithm gives statistically better results among all the tested algorithms. This is due to better probability-based ranking and higher classification accuracy, which allow the selection of high-confidence predictions in the selection step of self-training.
5. Conclusions
It is promising to implement techniques that use both labeled and unlabeled instances in classification tasks. The limited availability of labeled instances makes the learning process difficult, as supervised learning methods cannot produce a learner with adequate accuracy.
LMT produces a single tree containing binary splits on numeric features, multiway splits on categorical ones, and logistic regression models at the leaves, and the algorithm ensures that only relevant features are included in the latter. The produced classifier is not as easy to interpret as a standard decision tree, but it is much more legible than an ensemble of classifiers or kernel-based estimators.
In this work, a self-trained LMT algorithm has been proposed. We performed a comparison with other well-known semisupervised learning methods on standard benchmark datasets, and the presented technique had better accuracy on most of the tested datasets. Given the encouraging results obtained from these experiments, one can expect that the proposed technique can be applied to real classification tasks, giving slightly better accuracy than the traditional semisupervised approaches.
In spite of these results, no single method will always work. The main drawback of semisupervised schemes is the time needed in the training phase. Feature selection algorithms, which search for a subset of relevant features by removing the less informative of the initial features [46], could mitigate this drawback by saving both valuable operation time and computational resources. Building Logistic Model Trees with the LMT algorithm is orders of magnitude slower than simple tree induction or using model trees for classification. Improving the computational efficiency of the method using feature selection could be an interesting field for further research.
Appendix
A Java software tool implementing the proposed algorithm and some basic run instructions can be found at http://www.math.upatras.gr/~sotos/SelfLMTExperiment.zip.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
 A. K. Jain, “Data clustering: 50 years beyond Kmeans,” Pattern Recognition Letters, vol. 31, no. 8, pp. 651–666, 2010. View at: Publisher Site  Google Scholar
 Q. Ye, H. Pan, and C. Liu, “Enhancement of ELM by clustering discrimination manifold regularization and multiobjective FOA for semisupervised classification,” Computational Intelligence and Neuroscience, vol. 2015, Article ID 731494, 9 pages, 2015. View at: Publisher Site  Google Scholar
 S. Friedhelm and T. Edmondo, “Pattern classification and clustering: a review of partially supervised learning approaches,” Pattern Recognition Letters, vol. 37, pp. 4–14, 2014. View at: Publisher Site  Google Scholar
 S. Sun, “A survey of multiview machine learning,” Neural Computing and Applications, vol. 23, no. 78, pp. 2031–2038, 2013. View at: Publisher Site  Google Scholar
 I. Triguero, S. García, and F. Herrera, “Selflabeled techniques for semisupervised learning: taxonomy, software and empirical study,” Knowledge and Information Systems, vol. 42, no. 2, pp. 245–284, 2015. View at: Publisher Site  Google Scholar
 M. Li and Z.H. Zhou, “Improve computeraided diagnosis with machine learning techniques using undiagnosed samples,” IEEE Transactions on Systems, Man, and Cybernetics Part A:Systems and Humans, vol. 37, no. 6, pp. 1088–1098, 2007. View at: Publisher Site  Google Scholar
 C. Rosenberg, M. Hebert, and H. Schneiderman, “Semisupervised selftraining of object detection models,” in Proceedings of the 7th IEEE Workshop on Applications of Computer Vision (WACV '05), pp. 29–36, IEEE, January 2005. View at: Publisher Site  Google Scholar
 M.L. Zhang and Z.H. Zhou, “CoTrade: confident cotraining with data editing,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 41, no. 6, pp. 1612–1626, 2011. View at: Publisher Site  Google Scholar
 C. Liu and P. C. Yuen, “A boosted cotraining algorithm for human action recognition,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 21, no. 9, pp. 1203–1213, 2011. View at: Publisher Site  Google Scholar
 J. Tanha, M. van Someren, and H. Afsarmanesh, “Semisupervised selftraining for decision tree classifiers,” International Journal of Machine Learning and Cybernetics, 2015. View at: Publisher Site  Google Scholar
 F. Provost and P. Domingos, “Tree induction for probability based ranking,” Machine Learning, vol. 52, no. 3, pp. 199–215, 2003. View at: Publisher Site  Google Scholar
 L. Torgo, “Inductive learning of treebased regression models,” AI Communications, vol. 13, no. 2, pp. 137–138, 2000. View at: Google Scholar
 M. Li and Z.H. Zhou, “SETRED: selftraining with editing,” in Advances in Knowledge Discovery and Data Mining: 9th PacificAsia Conference, PAKDD 2005, Hanoi, Vietnam, May 18–20, 2005. Proceedings, vol. 3518 of Lecture Notes in Computer Science, pp. 611–621, Springer, Berlin, Germany, 2005. View at: Publisher Site  Google Scholar
 O. Chapelle, B. Schölkopf, and A. Zien, SemiSupervised Learning, MIT Press, Cambridge, Mass, USA, 2006. View at: Publisher Site
 X. Zhu and A. Goldberg, Introduction to SemiSupervised Learning, Synthesis Lectures on Artificial Intelligence and Machine Learning, Morgan & Claypool Publishers, 2009.
 L. Didaci, G. Fumera, and F. Roli, “Analysis of cotraining algorithm with very small training sets,” in Structural, Syntactic, and Statistical Pattern Recognition, vol. 7626 of Lecture Notes in Computer Science, pp. 719–726, Springer, Berlin, Germany, 2012. View at: Publisher Site  Google Scholar
 J. Du, C. X. Ling, and Z.H. Zhou, “When does cotraining work in real data?” IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 5, pp. 788–799, 2011. View at: Publisher Site  Google Scholar
 S. Sun and F. Jin, “Robust cotraining,” International Journal of Pattern Recognition and Artificial Intelligence, vol. 25, no. 7, pp. 1113–1126, 2011. View at: Publisher Site  Google Scholar  MathSciNet
 S. Wang, L. Wu, L. Jiao, and H. Liu, “Improve the performance of cotraining by committee with refinement of class probability estimations,” Neurocomputing, vol. 136, pp. 30–40, 2014. View at: Publisher Site  Google Scholar
 Z. Jiang, S. Zhang, and J. Zeng, “A hybrid generative/discriminative method for semisupervised classification,” KnowledgeBased Systems, vol. 37, pp. 137–145, 2013. View at: Publisher Site  Google Scholar
 C. Deng and M. Z. Guo, “A new cotrainingstyle random forest for computer aided diagnosis,” Journal of Intelligent Information Systems, vol. 36, no. 3, pp. 253–281, 2011. View at: Publisher Site  Google Scholar
 M. F. A. Hady and F. Schwenker, “Cotraining by committee: a new semisupervised learning framework,” in Proceedings of the IEEE International Conference on Data Mining Workshops (ICDMW '08), pp. 563–572, IEEE, Pisa, Italy, December 2008. View at: Publisher Site  Google Scholar
 J. Wang, S.W. Luo, and X.H. Zeng, “A random subspace method for cotraining,” in Proceedings of the International Joint Conference on Neural Networks (IJCNN '08), pp. 195–200, IEEE, Hong Kong, June 2008. View at: Publisher Site  Google Scholar
 Y. Yaslan and Z. Cataltepe, “Cotraining with relevant random subspaces,” Neurocomputing, vol. 73, no. 10–12, pp. 1652–1661, 2010. View at: Publisher Site  Google Scholar
 Z.H. Zhou and M. Li, “Tritraining: exploiting unlabeled data using three classifiers,” IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 11, pp. 1529–1541, 2005. View at: Publisher Site  Google Scholar
 T. Guo, G. Li, and T. Guo, “Improved tritraining with unlabeled data,” in Software Engineering and Knowledge Engineering: Theory and Practice, vol. 115 of Advances in Intelligent and Soft Computing, pp. 139–147, 2012. View at: Publisher Site  Google Scholar
 Y. Zhou and S. Goldman, “Democratic colearning,” in Proceedings of the 16th IEEE International Conference on Tools with Artificial Intelligence (ICTAI '04), pp. 594–602, IEEE, November 2004. View at: Publisher Site  Google Scholar
 S. Sun and Q. Zhang, “Multipleview multiplelearner semisupervised learning,” Neural Processing Letters, vol. 34, no. 3, pp. 229–240, 2011. View at: Publisher Site  Google Scholar
 T. Huang, Y. Yu, G. Guo, and K. Li, “A classification algorithm based on local cluster centers with a few labeled training examples,” KnowledgeBased Systems, vol. 23, no. 6, pp. 563–571, 2010. View at: Publisher Site  Google Scholar
 A. Halder, S. Ghosh, A. Ghosh, and A. Halder, “Ant based semisupervised classification,” in Swarm Intelligence, vol. 6234 of Lecture Notes in Computer Science, pp. 376–383, Springer, Berlin, Germany, 2010. View at: Publisher Site  Google Scholar
 Y. Wang, X. Xu, H. Zhao, and Z. Hua, “Semisupervised learning based on nearest neighbor rule and cut edges,” KnowledgeBased Systems, vol. 23, no. 6, pp. 547–554, 2010. View at: Publisher Site  Google Scholar
 M. Iggane, A. Ennaji, D. Mammass, and M. Yassa, “Self-training using a k-nearest neighbor as a base classifier reinforced by support vector machines,” International Journal of Computer Applications, vol. 56, no. 6, pp. 43–46, 2012.
 L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees, Wadsworth Statistics/Probability, Chapman and Hall/CRC, 1984.
 N. Landwehr, M. Hall, and E. Frank, “Logistic model trees,” Machine Learning, vol. 59, no. 1–2, pp. 161–205, 2005.
 M. Sumner, E. Frank, and M. Hall, “Speeding up logistic model tree induction,” in Proceedings of the 9th European Conference on Principles and Practice of Knowledge Discovery in Databases, pp. 675–683, Porto, Portugal, October 2005.
 C. Perlich, F. Provost, and J. Simonoff, “Tree induction vs. logistic regression: a learning-curve analysis,” Journal of Machine Learning Research, vol. 4, pp. 211–255, 2003.
 M. Sumner, E. Frank, and M. Hall, “Speeding up logistic model tree induction,” in Knowledge Discovery in Databases: PKDD 2005, vol. 3721 of Lecture Notes in Computer Science, pp. 675–683, Springer, Berlin, Germany, 2005.
 M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, “The WEKA data mining software: an update,” ACM SIGKDD Explorations Newsletter, vol. 11, no. 1, pp. 10–18, 2009.
 J. Alcalá-Fdez, A. Fernández, J. Luengo et al., “KEEL data-mining software tool: data set repository, integration of algorithms and experimental analysis framework,” Journal of Multiple-Valued Logic and Soft Computing, vol. 17, no. 2–3, pp. 255–287, 2011.
 S. S. Keerthi, S. K. Shevade, C. Bhattacharyya, and K. R. K. Murthy, “Improvements to Platt's SMO algorithm for SVM classifier design,” Neural Computation, vol. 13, no. 3, pp. 637–649, 2001.
 A. Blum and T. Mitchell, “Combining labeled and unlabeled data with co-training,” in Proceedings of the 11th Annual Conference on Computational Learning Theory (COLT '98), pp. 92–100, Morgan Kaufmann Publishers, Madison, Wis, USA, July 1998.
 C. Deng and M. Guo, “Tri-training and data editing based semisupervised clustering algorithm,” in MICAI 2006: Advances in Artificial Intelligence, vol. 4293 of Lecture Notes in Computer Science, pp. 641–651, Springer, Berlin, Germany, 2006.
 C. Deng and M. Z. Guo, “A new co-training-style random forest for computer aided diagnosis,” Journal of Intelligent Information Systems, vol. 36, no. 3, pp. 253–281, 2011.
 Y. Li, H. Li, C. Guan, and Z. Chin, “A self-training semisupervised support vector machine algorithm and its applications in brain computer interface,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '07), vol. 1, pp. I-385–I-388, Honolulu, Hawaii, USA, April 2007.
 S. García, A. Fernández, J. Luengo, and F. Herrera, “Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: experimental analysis of power,” Information Sciences, vol. 180, no. 10, pp. 2044–2064, 2010.
 Z. Xu, I. King, M. R.-T. Lyu, and R. Jin, “Discriminative semisupervised feature selection via manifold regularization,” IEEE Transactions on Neural Networks, vol. 21, no. 7, pp. 1033–1047, 2010.
Copyright
Copyright © 2016 Nikos Fazakis et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.