Abstract and Applied Analysis
Volume 2014 (2014), Article ID 894246, 5 pages
A Mahalanobis Hyperellipsoidal Learning Machine Class Incremental Learning Algorithm
1College of Engineering, Bohai University, Jinzhou 121013, China
2Department of Engineering, Faculty of Engineering and Science, The University of Agder, 4898 Grimstad, Norway
3College of Mathematics and Physics, Bohai University, Jinzhou, China
4New Energy College, Bohai University, Jinzhou, China
Received 23 December 2013; Accepted 31 December 2013; Published 11 February 2014
Academic Editor: Ming Liu
Copyright © 2014 Yuping Qin et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
A Mahalanobis hyperellipsoidal learning machine class incremental learning algorithm is proposed. For each class, a hyperellipsoid that encloses as many samples as possible while pushing outlier samples away is trained in the feature space. During incremental learning, only one subclassifier is trained, using the samples of the new class; the old models of the classifier are not affected and can be reused. During classification, the distribution of the samples in the feature space is taken into account: the Mahalanobis distances from the mapping of the sample to the center of each hyperellipsoid are used to decide the class of the sample. The experimental results show that the proposed method achieves higher classification precision and classification speed.
1. Introduction
Incremental learning is an important technique in data mining and knowledge discovery. Several key approaches to incremental learning already exist, including KNN, principal component analysis, Bayesian networks, Boosting, and support vector machines (SVM); in control theory, related data-driven ideas [1–5] have promoted the development of control methods. Among these approaches, SVM has good generalization performance because its decision function depends not on all the training data but only on a subset called the support vectors. Because the number of support vectors is usually very small compared with the size of the training set, SVM is a powerful tool for incremental learning.
Many incremental learning algorithms based on SVM have been proposed and have achieved good results, such as the incremental learning Batch SVM [6, 7], the on-line recursive algorithm [8], the divisional training SVM algorithm [9], the fast incremental learning algorithm [10], and the α-SVM algorithm [11]. However, these algorithms are suitable only when the new incremental samples belong to old classes. When a new class is added to the classification system, these methods cannot fully accommodate it and the old models become useless. Reference [12] proposed a class incremental learning algorithm that reuses the old models of the classifier and trains only one binary subclassifier when a new class arrives. However, it is not suitable for large data sets, because all samples participate in training in each round of incremental learning. To address this drawback, [13] proposed a class incremental learning algorithm based on the hyper sphere support vector machine (HS-CIL), but that algorithm is suitable only when the sample distribution is hyper sphere shaped and the density is high. To address this, [14] proposed a hyperellipsoidal class incremental learning algorithm (HE-CIL), but that algorithm does not consider the influence of outlier samples. Therefore, a Mahalanobis hyperellipsoidal learning machine class incremental learning algorithm (MHE-CIL) is proposed in this paper. For every class, the smallest hyperellipsoid that encloses as many samples as possible while pushing the outlier samples away is trained in the feature space. Mahalanobis distances are used to determine the class of the sample being classified.
The rest of this paper is organized as follows. Section 2 gives a brief review of the Mahalanobis hyperellipsoidal learning machine. Section 3 discusses the new Mahalanobis hyperellipsoidal learning machine class incremental learning algorithm in detail. Section 4 gives experimental results on Reuters 21578. Finally, the conclusion is given in Section 5.
2. Mahalanobis Hyperellipsoidal Learning Machine
Given a set of training samples of one class, the samples are collected into a sample matrix. A Mahalanobis hyperellipsoid, described by its center and radius, is trained in the feature space so that it encloses most of the sample mappings while its radius is as small as possible. If there are no remote points, the hyperellipsoid encloses all of the sample mappings. If there are remote points, some samples are allowed to lie outside the hyperellipsoid, and the smallest hyperellipsoid that encloses the most samples is sought. Because it is generally uncertain whether remote points exist, nonnegative slack variables are introduced to allow some sample mappings to lie outside the hyperellipsoid. The smallest hyperellipsoid is obtained by a method similar to finding the optimal hyperplane of an SVM [15–17]. In the resulting formulation, a penalty parameter trades off the number of noise samples outside the hyperellipsoid against its radius, and distances are measured with the inverse of the covariance matrix of the samples.
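A minimal sketch of a soft-margin formulation consistent with the description above is given here. All notation is assumed for illustration: $x_i$ ($i = 1, \dots, l$) are the training samples, $\varphi$ is the feature mapping, $a$ and $R$ are the center and radius, $\xi_i$ are the slack variables, $C$ is the penalty parameter, and $\Sigma$ is the covariance matrix of the samples.

```latex
\begin{aligned}
\min_{R,\,a,\,\xi}\quad & R^{2} + C\sum_{i=1}^{l}\xi_{i} \\
\text{s.t.}\quad & \bigl(\varphi(x_i)-a\bigr)^{T}\,\Sigma^{-1}\,\bigl(\varphi(x_i)-a\bigr) \le R^{2}+\xi_{i}, \\
& \xi_{i} \ge 0, \qquad i = 1,\dots,l .
\end{aligned}
```

A larger $C$ penalizes samples outside the hyperellipsoid more heavily, while a smaller $C$ allows a tighter hyperellipsoid that leaves more outliers outside.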
To solve the optimization problem above, one can construct the Lagrange function, which introduces two sets of Lagrange multipliers.
According to the Kuhn-Tucker (KKT) theorem in optimization theory, the following conditions are satisfied:
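As a hedged worked derivation, the Lagrange function and the resulting conditions can be sketched for a standard soft-margin hyperellipsoid problem. The notation is assumed rather than taken from the paper: $\alpha_i \ge 0$ and $\beta_i \ge 0$ are the multipliers, $\varphi$ the feature mapping, $a$ and $R$ the center and radius, $\xi_i$ the slack variables, $C$ the penalty parameter, and $\Sigma$ the covariance matrix.

```latex
\begin{aligned}
L &= R^{2} + C\sum_{i=1}^{l}\xi_{i}
   - \sum_{i=1}^{l}\alpha_{i}\Bigl[R^{2}+\xi_{i}
   - \bigl(\varphi(x_i)-a\bigr)^{T}\Sigma^{-1}\bigl(\varphi(x_i)-a\bigr)\Bigr]
   - \sum_{i=1}^{l}\beta_{i}\xi_{i}, \\[4pt]
\frac{\partial L}{\partial R} = 0 \;&\Rightarrow\; \sum_{i=1}^{l}\alpha_{i} = 1, \qquad
\frac{\partial L}{\partial a} = 0 \;\Rightarrow\; a = \sum_{i=1}^{l}\alpha_{i}\,\varphi(x_i), \qquad
\frac{\partial L}{\partial \xi_{i}} = 0 \;\Rightarrow\; C - \alpha_{i} - \beta_{i} = 0, \\[4pt]
\max_{\alpha}\quad & \sum_{i=1}^{l}\alpha_{i}\,\varphi(x_i)^{T}\Sigma^{-1}\varphi(x_i)
   - \sum_{i=1}^{l}\sum_{j=1}^{l}\alpha_{i}\alpha_{j}\,\varphi(x_i)^{T}\Sigma^{-1}\varphi(x_j) \\
\text{s.t.}\quad & \sum_{i=1}^{l}\alpha_{i} = 1, \qquad 0 \le \alpha_{i} \le C .
\end{aligned}
```

Substituting the three stationarity conditions back into $L$ eliminates $R$, $a$, and $\xi_i$, which is how the dual problem in the last two lines is obtained.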
The kernel form of (4) is expressed in terms of the kernel function.
The samples that lie on or outside the boundary of the hyperellipsoid correspond to nonzero Lagrange multipliers. These samples are called support vectors.
The center of the smallest hyperellipsoid is obtained from the solution of this problem.
The squared kernel Mahalanobis distance from the mapping of a sample to the center of the hyperellipsoid in the feature space is defined by (7).
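To make the distance concrete, the following sketch computes a squared Mahalanobis distance directly in the input space, which corresponds to the linear-kernel case only. The function names, the ridge term added to keep the covariance matrix invertible, and the use of an explicitly supplied center are illustrative assumptions, not the paper's kernel formula.

```python
import numpy as np


def fit_cov_inv(samples, ridge=1e-6):
    """Return the inverse covariance matrix of the samples (rows = samples).

    A small ridge term (an assumption of this sketch) is added to the
    diagonal so that the matrix stays invertible.
    """
    X = np.asarray(samples, dtype=float)
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    cov = cov + ridge * np.eye(X.shape[1])
    return np.linalg.inv(cov)


def mahalanobis_sq(x, center, cov_inv):
    """Squared Mahalanobis distance from x to center under cov_inv."""
    diff = np.asarray(x, dtype=float) - np.asarray(center, dtype=float)
    return float(diff @ cov_inv @ diff)


if __name__ == "__main__":
    X = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
    cov_inv = fit_cov_inv(X)
    print(mahalanobis_sq([2.0, 2.0], X.mean(axis=0), cov_inv))
```

Unlike the Euclidean distance, this measure rescales each direction by the spread of the samples, so a point far along a high-variance direction counts as closer than a point equally far along a low-variance direction.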
The radius of the smallest hyperellipsoid is determined from (7) via the KKT conditions.
3. Incremental Learning Algorithm
Consider a set of training samples, each labeled with one of a fixed number of classes, together with a kernel function that corresponds to an inner product in the feature space.
Assume that the training set is partitioned into subsets, where all the samples of a subset belong to the same class. For every subset, the smallest hyperellipsoid, described by its center and radius, is trained in the feature space.
If a new class is generated, the smallest hyperellipsoid for that class is trained in the feature space, and the resulting classifier is added to the old models. One round of incremental learning is then finished.
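The class incremental step described above can be sketched as a registry of per-class models in which adding a class trains exactly one new model and leaves the existing ones untouched. The per-class model here is a deliberately simplified stand-in (sample mean as center, a quantile of the training distances as squared radius), not the quadratic-programming solution of Section 2; the class names `EllipsoidModel` and `IncrementalClassifier` and the parameters `quantile` and `ridge` are hypothetical.

```python
import numpy as np


class EllipsoidModel:
    """Simplified stand-in for one trained hyperellipsoid (not the QP solution)."""

    def __init__(self, samples, quantile=0.9, ridge=1e-6):
        X = np.asarray(samples, dtype=float)
        self.center = X.mean(axis=0)
        cov = np.atleast_2d(np.cov(X, rowvar=False)) + ridge * np.eye(X.shape[1])
        self.cov_inv = np.linalg.inv(cov)
        # Assumed heuristic: leave roughly (1 - quantile) of the samples outside.
        d2 = np.array([self.distance_sq(x) for x in X])
        self.radius_sq = float(np.quantile(d2, quantile))

    def distance_sq(self, x):
        diff = np.asarray(x, dtype=float) - self.center
        return float(diff @ self.cov_inv @ diff)


class IncrementalClassifier:
    """Holds one model per class; old models are reused, never retrained."""

    def __init__(self):
        self.models = {}

    def add_class(self, label, samples):
        if label in self.models:
            raise ValueError(f"class {label!r} already has a model")
        # Only the samples of the new class take part in training.
        self.models[label] = EllipsoidModel(samples)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clf = IncrementalClassifier()
    clf.add_class("wheat", rng.normal(0.0, 1.0, size=(30, 2)))
    clf.add_class("corn", rng.normal(5.0, 1.0, size=(30, 2)))
    clf.add_class("coffee", rng.normal(-5.0, 1.0, size=(30, 2)))  # one increment
    print(sorted(clf.models))
```

Because each model depends only on the samples of its own class, the cost of one increment depends on the size of the new class rather than on the size of the whole training set.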
For a sample to be classified, compute the Mahalanobis distance from the mapping of the sample to the center of each hyperellipsoid according to (7).
If every distance exceeds the corresponding radius, so that no hyperellipsoid encloses the mapping of the sample, then compute the membership of the sample in each class according to (9) and confirm the class of the sample from these memberships.
If at least two distances do not exceed the corresponding radii, so that two or more hyperellipsoids enclose the mapping of the sample, then compute the membership of the sample in each of these classes according to (11) and confirm the class of the sample from these memberships.
For the sample to be classified, the classification algorithm is described in detail as follows.
Step 1. Compute the Mahalanobis distances according to (7).
Step 5. End.
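The decision procedure can be sketched as follows, under loudly stated assumptions. The membership formulas (9) and (11) are not available in this text, so the sketch substitutes the relative distance (squared distance divided by squared radius, smaller is better) as a hypothetical membership measure in both cases, and it assumes that a sample enclosed by exactly one hyperellipsoid is assigned to that class. The function name `classify` and its dictionary inputs are illustrative.

```python
def classify(dist_sq, radius_sq):
    """Pick a class label from per-class squared distances and squared radii.

    dist_sq and radius_sq map each class label to a positive number.
    """
    # Classes whose hyperellipsoid encloses the sample.
    enclosing = [c for c in dist_sq if dist_sq[c] <= radius_sq[c]]

    # Assumed rule: a single enclosing hyperellipsoid decides the class.
    if len(enclosing) == 1:
        return enclosing[0]

    # No hyperellipsoid encloses the sample: compare all classes.
    # Two or more enclose it: compare only the enclosing classes.
    candidates = enclosing if enclosing else list(dist_sq)

    # Hypothetical membership: smallest distance relative to the radius.
    return min(candidates, key=lambda c: dist_sq[c] / radius_sq[c])


if __name__ == "__main__":
    radii = {"wheat": 1.0, "corn": 2.0, "coffee": 1.0}
    print(classify({"wheat": 0.9, "corn": 0.4, "coffee": 5.0}, radii))
```

Normalizing by the radius keeps a class with a large hyperellipsoid from being penalized merely because its samples are widely spread.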
4. Experimental Results
Experiments are carried out on Reuters 21578, from which five categories and 896 texts are used: 598 texts form the training set and the remaining 298 texts form the testing set (see Table 1). Information gain is used to reduce the feature dimension, and the weight of every word is computed according to TF-IDF.
To verify the efficiency of the proposed method, the same tasks are carried out with the HS-CIL, HE-CIL, and MHE-CIL methods. The computational experiments were run on a 1.6 GHz Pentium with 512 MB of memory. A linear kernel function is used for all the experiments. System parameter , .
The macroaverage precision (MAAP), macroaverage recall (MAAR), and macroaverage F1 (MAAF) [18] are used to evaluate the classification performance of the algorithm.
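For reference, macro-averaged scores can be computed as in the sketch below, which averages per-class precision, recall, and F1 with equal weight per class. Taking the macro-averaged F1 as the mean of the per-class F1 values is an assumption of this sketch (another convention combines the macro-averaged precision and recall), and the function name `macro_scores` is illustrative.

```python
def macro_scores(y_true, y_pred):
    """Return (macro precision, macro recall, macro F1) over the true labels."""
    labels = sorted(set(y_true))
    precisions, recalls, f1s = [], [], []
    for c in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        # Guard against empty denominators by scoring the class as 0.
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


if __name__ == "__main__":
    truth = ["wheat", "wheat", "corn", "corn"]
    guess = ["wheat", "corn", "corn", "corn"]
    print(macro_scores(truth, guess))
```

Macro-averaging gives every class the same weight regardless of how many texts it contains, so small categories influence the score as much as large ones.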
In the experiments, the original sample set includes samples of two classes (wheat and corn). Class incremental learning is performed three times: the first increment is the third class (coffee), the second increment is the fourth class (soybean), and the third increment is the fifth class (cocoa). The macroaverage precision, macroaverage recall, and macroaverage F1 value of the three algorithms are given in Table 2. The training time and testing time of the three algorithms are given in Table 3.
The experimental results show that the precision, recall, and F1 value of the MHE-CIL method are clearly higher than those of the other two methods. The key reasons are that the MHE-CIL method reduces the space enclosed by the hyperellipsoid by pushing the outlier samples away and that the distribution of the samples is taken into account through the Mahalanobis distance. Training with the MHE-CIL method is faster than with the HE-CIL method and takes nearly the same time as with the HS-CIL method. Likewise, classification with the MHE-CIL method is faster than with the HE-CIL method and nearly as fast as with the HS-CIL method.
5. Conclusion
A novel class incremental learning algorithm is proposed. During class incremental learning, only the samples of the new class participate in training, and the old models of the classifier can be reused. During classification, the Mahalanobis distance is used to determine the class of the sample, so the distribution of the samples in the feature space is taken into account. The experimental results show that the proposed algorithm improves classification accuracy markedly while maintaining training speed and classification speed. How to use kernel function theory to increase the density of the samples will be the subject of our future research.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgment
This study is partly supported by the National Natural Science Foundation of China (no. 61304149), the Natural Science Foundation of Liaoning Province in China (no. 201202003), and the Program for New Century Excellent Talents in University (no. NCET-11-1005).
References
- S. Yin, S. X. Ding, A. H. A. Sari, and H. Hao, “Data-driven monitoring for stochastic systems and its application on batch process,” International Journal of Systems Science, vol. 44, no. 7, pp. 1366–1376, 2013.
- S. Yin, S. X. Ding, A. Haghani, H. Hao, and P. Zhang, “A comparison study of basic data-driven fault diagnosis and process monitoring methods on the benchmark Tennessee Eastman process,” Journal of Process Control, vol. 22, no. 9, pp. 1567–1581, 2012.
- S. Yin, H. Luo, and S. Ding, “Real-time implementation of fault-tolerant control systems with performance optimization,” IEEE Transactions on Industrial Electronics, vol. 61, no. 5, pp. 2402–2411, 2014.
- S. Yin, G. Wang, and H. R. Karimi, “Data-driven design of robust fault detection system for wind turbines,” Mechatronics, 2013.
- S. Yin, X. Yang, and H. R. Karimi, “Data-driven adaptive observer for fault diagnosis,” Mathematical Problems in Engineering, vol. 2012, Article ID 832836, 21 pages, 2012.
- S. Rüping, “Incremental learning with support vector machines,” in Proceedings of the IEEE International Conference on Data Mining (ICDM '01), pp. 641–642, December 2001.
- C. Domeniconi and D. Gunopulos, “Incremental support vector machine construction,” in Proceedings of the IEEE International Conference on Data Mining (ICDM '01), pp. 589–592, December 2001.
- G. Cauwenberghs and T. Poggio, “Incremental and decremental support vector machine learning,” in Advances in Neural Information Processing Systems, pp. 409–415, 2001.
- J. Zhang, Z. Li, and J. Yang, “A divisional incremental training algorithm of Support Vector Machine,” in Proceedings of the IEEE International Conference on Mechatronics and Automation (ICMA '05), pp. 853–856, August 2005.
- R. Kong and B. Zhang, “Fast incremental learning algorithm for support vector machine,” Control and Decision, vol. 20, no. 10, pp. 1129–1136, 2005.
- R. Xiao, J. Wang, and F. Zhang, “An approach to incremental SVM learning algorithm,” in Proceedings of the 12th IEEE International Conference on Tools with Artificial Intelligence (ICTAI '00), pp. 268–273, 2000.
- B.-F. Zhang, J.-S. Su, and X. Xu, “A class-incremental learning method for multi-class support vector machines in text classification,” in Proceedings of the International Conference on Machine Learning and Cybernetics, pp. 2581–2585, August 2006.
- Y. P. Qin, X. N. Li, and C. L. Wang, “Study on class incremental learning algorithm based on hyper-sphere support vector machines,” Computer Science, vol. 8, article 28, 2008.
- A. Ghorbel, A. Almaksour, A. Lemaitre, and E. Anquetil, “Incremental learning for interactive sketch recognition,” in Graphics Recognition: New Trends and Challenges, pp. 108–118, Springer, New York, NY, USA, 2013.
- G. Ateniese, G. Felici, L. V. Mancini, A. Spognardi, A. Villani, and D. Vitali, “Hacking smart machines with smarter ones: how to extract meaningful data from machine learning classifiers,” CoRR, http://arxiv.org/abs/1306.4447.
- F. Porikli and Y. Chi, “Multi-class classification method,” US Patent 20130156300, 2013.
- N. Shahid, I. H. Naqvi, and S. B. Qaisar, “One-class support vector machines: analysis of outlier detection for wireless sensor networks in harsh environments,” Artificial Intelligence Review, 2013.
- F. Sebastiani, “Machine learning in automated text categorization,” ACM Computing Surveys, vol. 34, no. 1, pp. 1–47, 2002.