Abstract

A Mahalanobis hyperellipsoidal learning machine class incremental learning algorithm is proposed. For each class, a hyperellipsoid that encloses as many of the class samples as possible while pushing outlier samples away is trained in the feature space. In the process of incremental learning, only one subclassifier is trained with the new class samples; the old models of the classifier are not affected and can be reused. In the process of classification, taking into account the distribution of the samples in the feature space, the Mahalanobis distances from the mapping of a sample to the centers of the hyperellipsoids are used to decide the class of the sample. The experimental results show that the proposed method achieves higher classification precision and classification speed.

1. Introduction

Incremental learning is an intelligent technology for data mining and knowledge discovery. Several key methods of incremental learning already exist, such as k-nearest neighbors (KNN), principal component analysis, Bayesian networks, Boosting, and support vector machines (SVM); a related idea in control theory, data-driven control [15], has likewise promoted the development of control methods. Among these methods, SVM has good generalization performance because it does not depend on all the training data but only on a subset called the support vectors. The number of support vectors is very small compared with the size of the training set, so SVM is a powerful tool for incremental learning.

Many incremental learning algorithms based on SVM have been proposed and have achieved good results, such as the incremental learning Batch SVM [6, 7], the online recursive algorithm [8], the divisional training SVM algorithm [9], the fast incremental learning algorithm [10], and the α-SVM algorithm [11]. However, these algorithms are only suitable for the case in which the new incremental samples belong to old classes. When a new class is added to the classification system, the above methods cannot accommodate the situation, and the old models become useless. Reference [12] proposed a class incremental learning algorithm that reuses the old models of the classifier and trains only one binary subclassifier when a new class arrives. However, it is not suitable for large data sets, because all samples participate in training during each round of incremental learning. To overcome this disadvantage, [13] proposed a class incremental learning algorithm based on the hypersphere support vector machine (HS-CIL), but that algorithm is only suitable for samples whose distribution is hypersphere shaped and of high density. To address this limitation, [14] proposed a hyperellipsoidal class incremental learning algorithm (HE-CIL), but it does not consider the influence of outlier samples. Therefore, a Mahalanobis hyperellipsoidal learning machine class incremental learning algorithm (MHE-CIL) is proposed in this paper. For every class, the smallest hyperellipsoid that encloses as many samples as possible while pushing outlier samples away is trained in the feature space, and Mahalanobis distances are used to determine the class of a sample to be classified.

The rest of this paper is organized as follows. In Section 2, a brief review of the Mahalanobis hyperellipsoidal learning machine is given. In Section 3, the new Mahalanobis hyperellipsoidal learning machine class incremental learning algorithm is discussed in detail. In Section 4, experimental results on the Reuters-21578 corpus are given. Finally, the conclusion is outlined in Section 5.

2. Mahalanobis Hyperellipsoidal Learning Machine

Given a set of training samples of a class, $T = \{x_1, x_2, \ldots, x_l\}$, where $x_i \in R^d$ and $l$ is the number of samples, let $X = [x_1, x_2, \ldots, x_l]$ be the sample matrix. A Mahalanobis hyperellipsoid $\Omega(c, R)$ is trained in the feature space, where $c$ is the center of the hyperellipsoid and $R$ is its radius, such that the hyperellipsoid encloses most of the mappings $\phi(x_i)$ of the samples and the radius is as small as possible. If there are no remote points, the hyperellipsoid encloses all the mappings. If there are remote points, part of the samples are allowed to lie outside the hyperellipsoid, and the smallest hyperellipsoid that surrounds the most samples is sought. Since it is uncertain in advance whether remote points exist, nonnegative slack variables $\xi_i$ are introduced to allow some of the mappings to lie outside the hyperellipsoid. Using a method similar to finding the optimal hyperplane of SVM [15–17], the smallest hyperellipsoid is obtained from the following formulation:

$$\min_{R, c, \xi}\; R^2 + C\sum_{i=1}^{l}\xi_i \quad \text{s.t.}\quad (\phi(x_i)-c)^{T}\Sigma^{-1}(\phi(x_i)-c) \le R^2 + \xi_i,\; \xi_i \ge 0,\; i = 1, \ldots, l, \tag{1}$$

where $C$ is used to trade off the number of noise points outside the hyperellipsoid against the radius of the hyperellipsoid, $\Sigma$ is the covariance matrix of the samples, and $\Sigma^{-1}$ is the inverse of the covariance matrix.

To solve the optimization problem above, one can construct the Lagrange function as follows:

$$L(R, c, \xi, \alpha, \beta) = R^2 + C\sum_{i=1}^{l}\xi_i - \sum_{i=1}^{l}\alpha_i\left[R^2 + \xi_i - (\phi(x_i)-c)^{T}\Sigma^{-1}(\phi(x_i)-c)\right] - \sum_{i=1}^{l}\beta_i\xi_i, \tag{2}$$

where $\alpha_i \ge 0$ and $\beta_i \ge 0$ are the Lagrange multipliers.

According to the Karush-Kuhn-Tucker (KKT) conditions of optimization theory, setting the partial derivatives of $L$ with respect to $R$, $c$, and $\xi_i$ to zero gives

$$\sum_{i=1}^{l}\alpha_i = 1, \qquad c = \sum_{i=1}^{l}\alpha_i\phi(x_i), \qquad \alpha_i = C - \beta_i,\; i = 1, \ldots, l. \tag{3}$$

Substituting (3) into (2), the dual optimization problem is obtained as follows:

$$\max_{\alpha}\; \sum_{i=1}^{l}\alpha_i\,\phi(x_i)^{T}\Sigma^{-1}\phi(x_i) - \sum_{i=1}^{l}\sum_{j=1}^{l}\alpha_i\alpha_j\,\phi(x_i)^{T}\Sigma^{-1}\phi(x_j) \quad \text{s.t.}\quad \sum_{i=1}^{l}\alpha_i = 1,\; 0 \le \alpha_i \le C. \tag{4}$$

The kernel form of (4) is as follows:

$$\max_{\alpha}\; \sum_{i=1}^{l}\alpha_i K_M(x_i, x_i) - \sum_{i=1}^{l}\sum_{j=1}^{l}\alpha_i\alpha_j K_M(x_i, x_j) \quad \text{s.t.}\quad \sum_{i=1}^{l}\alpha_i = 1,\; 0 \le \alpha_i \le C, \tag{5}$$

where $k(x_i, x_j) = \phi(x_i)\cdot\phi(x_j)$ is the kernel function and $K_M(x_i, x_j) = \phi(x_i)^{T}\Sigma^{-1}\phi(x_j)$ is its Mahalanobis form.
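As a concrete illustration, the following is a minimal sketch of how the dual (5) could be solved numerically in the linear-kernel case used later in the experiments; the ridge term reg, the generic SLSQP solver, and all function names are assumptions of this sketch, not the paper's implementation. The Gram builder also returns $\Sigma^{-1}$ so it can be reused at classification time.

```python
import numpy as np
from scipy.optimize import minimize

def mahalanobis_gram(X, reg=1e-6):
    # Gram matrix K_M[i, j] = x_i^T Sigma^{-1} x_j for the linear-kernel
    # case of (5); the small ridge `reg` keeps Sigma invertible
    # (a numerical assumption of this sketch).
    Sigma = np.cov(X, rowvar=False) + reg * np.eye(X.shape[1])
    Sigma_inv = np.linalg.inv(Sigma)
    return X @ Sigma_inv @ X.T, Sigma_inv

def solve_dual(K, C=1.0):
    # Dual (5): maximize sum_i a_i K_ii - a^T K a
    # subject to sum(a) = 1 and 0 <= a_i <= C.
    l = K.shape[0]
    diag = np.diag(K)
    objective = lambda a: -(a @ diag - a @ K @ a)   # negate to minimize
    constraints = [{"type": "eq", "fun": lambda a: np.sum(a) - 1.0}]
    bounds = [(0.0, C)] * l
    a0 = np.full(l, 1.0 / l)                        # feasible start
    res = minimize(objective, a0, method="SLSQP",
                   bounds=bounds, constraints=constraints)
    return res.x
```

A dedicated QP solver (or an SMO-style routine) would scale better than generic SLSQP; the point here is only that (5) is a standard box-constrained quadratic program.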

The examples whose corresponding $\alpha_i$ are nonzero lie on or outside the boundary of the hyperellipsoid; these examples are called support vectors.

The center of the smallest hyperellipsoid can be obtained from (3) as follows:

$$c = \sum_{i=1}^{l}\alpha_i\phi(x_i). \tag{6}$$

The squared kernel Mahalanobis distance from the mapping of a sample $x$ to the center of the hyperellipsoid in the feature space is defined as follows:

$$d^2(x) = (\phi(x)-c)^{T}\Sigma^{-1}(\phi(x)-c) = K_M(x, x) - 2\sum_{i=1}^{l}\alpha_i K_M(x, x_i) + \sum_{i=1}^{l}\sum_{j=1}^{l}\alpha_i\alpha_j K_M(x_i, x_j). \tag{7}$$

The radius of the smallest hyperellipsoid can be determined from (7) via the KKT conditions: for any support vector $x_k$ with $0 < \alpha_k < C$,

$$R^2 = d^2(x_k). \tag{8}$$
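Continuing the sketch above, (7) and (8) translate into the following helper functions; averaging $d^2$ over all unbounded support vectors is a numerical-stability choice of this sketch, not something the paper specifies.

```python
def squared_distance(K_train, k_cross, k_self, alpha):
    # Squared kernel Mahalanobis distance (7):
    #   K_train[i, j] = K_M(x_i, x_j) over the training samples,
    #   k_cross[i]    = K_M(x, x_i) for the test point x,
    #   k_self        = K_M(x, x).
    return k_self - 2.0 * (alpha @ k_cross) + alpha @ K_train @ alpha

def radius_squared(K_train, alpha, C, tol=1e-8):
    # Radius (8): d^2 at any support vector with 0 < alpha_k < C;
    # we average over all such points for numerical stability.
    boundary = np.where((alpha > tol) & (alpha < C - tol))[0]
    d2 = [squared_distance(K_train, K_train[:, k], K_train[k, k], alpha)
          for k in boundary]
    return float(np.mean(d2))
```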

3. Incremental Learning Algorithm

We are given a set of training samples $T = \{(x_1, y_1), \ldots, (x_l, y_l)\}$ and a kernel function $k(x_i, x_j)$, where $x_i \in R^d$, $y_i \in \{1, 2, \ldots, m\}$, $m$ is the number of classes of the training set $T$, $l$ is the number of training samples, and $k(x_i, x_j)$ corresponds to an inner product in the feature space, namely, $k(x_i, x_j) = \phi(x_i)\cdot\phi(x_j)$.

Assume that $T_i$ is the subset of the training samples $T$ all of whose samples belong to the $i$th class, $i = 1, 2, \ldots, m$. For every subset $T_i$, the smallest hyperellipsoid $\Omega_i(c_i, R_i)$ is trained in the feature space, where $c_i$ is the center of the hyperellipsoid and $R_i$ is its radius.

If a new class, denoted class $m+1$, is generated, the smallest hyperellipsoid $\Omega_{m+1}(c_{m+1}, R_{m+1})$ is trained in the feature space using only the new class samples, and the resulting subclassifier is added to the old models. One round of incremental learning is then finished.
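Reusing the helpers sketched in Section 2, the incremental step might look as follows; the class name MHECIL and its internal layout are assumptions of this sketch.

```python
class MHECIL:
    # One hyperellipsoid model per class; adding a new class trains
    # exactly one new model and leaves the old models untouched.
    def __init__(self, C=1.0):
        self.C = C
        self.models = {}   # class label -> (X_i, Sigma_inv, alpha, R2, K)

    def add_class(self, label, X_new):
        # One round of incremental learning: train on the new class only.
        K, Sigma_inv = mahalanobis_gram(X_new)
        alpha = solve_dual(K, self.C)
        R2 = radius_squared(K, alpha, self.C)
        self.models[label] = (X_new, Sigma_inv, alpha, R2, K)
```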

For a sample $x$ to be classified, compute the Mahalanobis distance $d_i(x)$ from the mapping of the sample to the center of each hyperellipsoid $\Omega_i$ according to (7).

If all the distances satisfy $d_i(x) > R_i$, that is, no hyperellipsoid encloses the mapping of the sample $x$, then compute the membership of the sample to the $i$th class according to (9) and confirm the class of the sample by the decision rule (10).

If no fewer than two distances satisfy $d_i(x) \le R_i$, then compute the membership of the sample to the $i$th class according to (11) and confirm the class of the sample by (10).

For a sample $x$ to be classified, the classification algorithm is described in detail as follows; a code sketch of the whole procedure is given after the steps.

Step 1. Compute $d_i(x)$, $i = 1, \ldots, m$, according to (7).

Step 2. If exactly one distance satisfies $d_i(x) \le R_i$, then the sample belongs to the $i$th class; go to Step 5. Otherwise go to Step 3.

Step 3. If all the distances satisfy $d_i(x) > R_i$, then compute the membership of the sample to the $i$th class according to (9), confirm the class of the sample by (10), and go to Step 5. Otherwise go to Step 4.

Step 4. If more than one distance satisfies $d_i(x) \le R_i$, then compute the membership of the sample to the $i$th class according to (11) and confirm the class of the sample by (10).

Step 5. End.
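The steps above can be realized as the method below, continuing the MHECIL sketch. Because the membership formulas (9) and (11) are not reproduced in this text, the ratio $R_i / d_i(x)$ is used here as a plausible stand-in membership; everything else follows Steps 1–5 directly.

```python
    # (method added to the MHECIL class sketched above)
    def classify(self, x, eps=1e-12):
        # Step 1: distance d_i(x) to every class center, per (7).
        d2, R2 = {}, {}
        for label, (X_i, Sigma_inv, alpha, r2, K) in self.models.items():
            k_cross = X_i @ (Sigma_inv @ x)   # K_M(x, x_i), linear kernel
            k_self = x @ Sigma_inv @ x        # K_M(x, x)
            d2[label] = squared_distance(K, k_cross, k_self, alpha)
            R2[label] = r2
        inside = [c for c in d2 if d2[c] <= R2[c]]
        # Step 2: enclosed by exactly one hyperellipsoid.
        if len(inside) == 1:
            return inside[0]
        # Steps 3-4: outside all, or inside several; decide by membership.
        # R_i / d_i is a stand-in for the paper's (9) and (11).
        mu = {c: np.sqrt(R2[c] / (d2[c] + eps)) for c in d2}
        return max(mu, key=mu.get)
```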

4. Experiments

Experiments are performed on the Reuters-21578 corpus, from which five categories and 896 texts are used. 598 texts are used as the training set, and the remaining 298 texts are used as the testing set (see Table 1). Information gain is used to reduce the feature dimension, and the weight of every word is computed according to TF-IDF.
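A minimal sketch of this preprocessing pipeline, assuming scikit-learn is available; train_texts, train_labels, test_texts, and the retained dimension k=500 are placeholders, and mutual_info_classif serves as a stand-in for the paper's information-gain criterion.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# TF-IDF weighting followed by feature reduction; mutual information
# here approximates the information-gain criterion of the paper.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)            # 598 texts
selector = SelectKBest(mutual_info_classif, k=500)         # k is a placeholder
X_train = selector.fit_transform(X_train, train_labels)
X_test = selector.transform(vectorizer.transform(test_texts))  # 298 texts
```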

To verify the efficiency of the proposed method, the same tasks are carried out with the HS-CIL, HE-CIL, and MHE-CIL methods. The computational experiments were done on a 1.6 GHz Pentium with 512 MB of memory. A linear kernel function is used for all the experiments, with fixed system parameter settings.

The macroaverage precision (MAAP), macroaverage recall (MAAR), and macroaverage F-measure (MAAF) [18] are used to evaluate the classification performance of the algorithms.
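For reference, these measures take the standard macroaveraged form (the exact normalization in [18] may differ in detail):

$$\mathrm{MAAP} = \frac{1}{m}\sum_{i=1}^{m} P_i, \qquad \mathrm{MAAR} = \frac{1}{m}\sum_{i=1}^{m} R_i, \qquad \mathrm{MAAF} = \frac{2\,\mathrm{MAAP}\cdot\mathrm{MAAR}}{\mathrm{MAAP} + \mathrm{MAAR}},$$

where $P_i$ and $R_i$ are the precision and recall of the $i$th category and $m$ is the number of categories.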

In the experiments, the original sample set includes two classes of samples (wheat and corn). Class incremental learning is performed three times: the first increment adds the third class (coffee), the second adds the fourth class (soybean), and the third adds the fifth class (cocoa). The macroaverage precision, macroaverage recall, and macroaverage F value of the three algorithms are given in Table 2. The training time and testing time of the three algorithms are given in Table 3.

The experimental results show that the precision, recall, and F value of the MHE-CIL method are obviously higher than those of the other two methods. The key reasons are that the MHE-CIL method reduces the space that the hyperellipsoid encloses by pushing the outlier samples away, and that the distribution of the samples is taken into account through the Mahalanobis distance. The training of the MHE-CIL method is faster than that of the HE-CIL method and nearly the same as that of the HS-CIL method. The classification speed of the MHE-CIL method is likewise faster than that of the HE-CIL method and nearly the same as that of the HS-CIL method.

5. Conclusion

A novel class incremental learning algorithm is proposed. In the process of class incremental learning, only the new class samples participate in training, and the old models of the classifier can be reused. In the process of classification, the Mahalanobis distance is used to confirm the class of the sample to be classified, so the distribution of the samples in the feature space is taken into account. The experimental results show that the proposed algorithm not only clearly improves classification accuracy but also maintains high training and classification speed. How to use kernel function theory to increase the density of the samples will be our future research work.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This study is partly supported by the National Natural Science Foundation of China (no. 61304149), the Natural Science Foundation of Liaoning Province in China (no. 201202003), and the Program for New Century Excellent Talents in University (no. NCET-11-1005).