Research Article  Open Access
Jiusheng Chen, Xiaoyu Zhang, Kai Guo, "A Core Set Based Large Vector-Angular Region and Margin Approach for Novelty Detection", Mathematical Problems in Engineering, vol. 2016, Article ID 1658758, 12 pages, 2016. https://doi.org/10.1155/2016/1658758
A Core Set Based Large Vector-Angular Region and Margin Approach for Novelty Detection
Abstract
A large vector-angular region and margin (LARM) approach is presented for novelty detection based on imbalanced data. The key idea is to construct the largest vector-angular region in the feature space to separate normal training patterns while maximizing the vector-angular margin between the surface of this optimal vector-angular region and the abnormal training patterns. To improve the generalization performance of LARM, the vector-angular distribution is optimized by maximizing the vector-angular mean and minimizing the vector-angular variance, which separates the normal and abnormal examples well. However, a quadratic programming (QP) solver takes training time cubic in the number of training patterns and at least quadratic space, which can be computationally prohibitive for large scale problems. By using $(1+\varepsilon)$ and $(1-\varepsilon)$ approximation algorithms, the core set based LARM algorithm is proposed for fast training of the LARM problem. Experimental results on imbalanced datasets validate the efficiency of the proposed approach in novelty detection.
1. Introduction
The task of novelty detection is to learn a model from the normal examples among the training patterns and then use it to classify test patterns. In real-world novelty detection applications, it is usually assumed that normal training patterns can be well sampled, while abnormal training patterns are severely undersampled because of expensive measurement cost or the infrequency of abnormal events. Therefore, most novelty detection algorithms use only normal training patterns to build the detection model. Generally, novelty detection may be seen as a one-class classification problem. Recently, novelty detection has gained much research attention in real-world applications such as network intrusion detection [1], jet engine health monitoring [2], medical data [3], and aviation safety [4, 5].
In this paper, kernel-based novelty detection, which is very popular and has proved successful in recent years, is studied in depth. Various kernel-based novelty detection approaches have been proposed, such as the one-class support vector machine (OCSVM) [6] and support vector data description (SVDD) [7]. OCSVM was proposed by Schölkopf et al. [6]; to improve generalization ability, it constructs the novelty detection boundary so that the origin is separated from the input samples with the maximal margin. The performance of OCSVM is very sensitive to its parameters, which makes it difficult to generalize to other applications [8].
SVDD was proposed by Tax and Duin [7], in which the minimal ball is constructed to enclose most of the training samples. A test point is assessed as novel or not by determining whether it lies within the minimal ball. The margin between the closed boundary surrounding the positive data and that surrounding the negative data is zero, which gives the method poor generalization ability. A small sphere and large margin (SSLM) approach was proposed by Wu and Ye [9], in which the smallest hypersphere is constructed to surround the normal data while the margin from any outlier to this hypersphere is made as large as possible. An incremental weighted one-class support vector machine for mining streaming data was proposed by Krawczyk and Woźniak [10, 11], in which the weight of each object is modified according to its level of significance, and the shape of the decision boundary is influenced only by new objects that carry new and useful knowledge extending the competence of the classifier.
A support vector machine (SVM) is trained by solving a quadratic programming (QP) problem, which has the important computational advantage of avoiding local minima. However, solving the corresponding SVM problem with a naive QP solver takes time cubic in the number of training patterns and at least quadratic space. Obviously, a naive QP solver can hardly meet the practical demands of novelty detection on large scale datasets. Tsang et al. proposed the core vector machine (CVM) [12, 13], which uses an approximation algorithm for the minimum enclosing ball (MEB) to handle large scale problems. The key idea is that the QP problems of the corresponding SVMs can be equivalently viewed as MEB problems. By utilizing an approximation algorithm for the MEB problem from computational geometry, the time complexity of the CVM algorithm is linear in the number of training patterns. Moreover, the space complexity is independent of the number of training patterns.
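The core set idea behind approximate MEB algorithms can be illustrated with a minimal sketch. The code below uses the simple Bădoiu-Clarkson style iteration for an approximate minimum enclosing ball in input space; it is not the CVM algorithm of [12, 13] itself, which works in the kernel feature space and re-solves a small QP on the core set at every iteration. The function name `approx_meb` and its return values are hypothetical and chosen only for illustration.

```python
import math


def approx_meb(points, eps):
    """Approximate minimum enclosing ball (illustrative sketch, not CVM).

    Repeatedly moves the center toward the current furthest point.
    The indices of the furthest points visited form the core set.
    """
    center = list(points[0])
    core_set = [0]
    n_iter = math.ceil(1.0 / eps ** 2)
    for i in range(1, n_iter + 1):
        # Index of the point furthest from the current center.
        far_idx = max(range(len(points)),
                      key=lambda j: math.dist(center, points[j]))
        if far_idx not in core_set:
            core_set.append(far_idx)
        # Shrinking step size 1 / (i + 1).
        step = 1.0 / (i + 1)
        center = [c + step * (p - c)
                  for c, p in zip(center, points[far_idx])]
    radius = max(math.dist(center, p) for p in points)
    return center, radius, core_set


if __name__ == "__main__":
    square = [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)]
    center, radius, core = approx_meb(square, eps=0.1)
    print(center, radius, core)
```

The number of iterations, and therefore the size of the core set, depends only on the approximation parameter and not on the number of points, which is the property that core set methods exploit.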
As mentioned above, only normal training patterns are used to build the detection model in most novelty detection algorithms. In practical applications of novelty detection, it is difficult, but not impossible, to obtain a few abnormal training patterns. For instance, in machine fault detection, in addition to extensive measurements under normal working conditions, there may also be some measurements of faulty situations [14]. Recently, extensive research has been carried out in both academia and industry to solve the imbalanced novelty detection problem.
Kernel-based novelty detection on imbalanced data is studied in this paper. Suppose a training dataset of labelled examples is given, in which every input instance carries a class identity label, and the dataset is made up of the set of majority training patterns and the set of minority training patterns. The feature mapping function is defined by a given kernel function. The length of the perpendicular projection of a mapped training pattern onto a vector in the feature space reflects information about both the angular and the Euclidean distances between the two in the Euclidean vector space. According to the definition in [15], this quantity is called the vector-angular.
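As an illustration of the projection length described above, the following sketch computes the signed length of the projection of a mapped pattern onto a vector that is itself a kernel expansion over some patterns. The expansion form of the vector, the coefficient name `alpha`, and the helper names are assumptions made for this example; the exact definition and normalization used in [15] may differ.

```python
import math


def linear_kernel(x, z):
    """Plain inner product, used here as the simplest kernel."""
    return sum(a * b for a, b in zip(x, z))


def projection_length(alpha, support, z, kernel=linear_kernel):
    """Signed projection length of phi(z) onto w = sum_i alpha_i * phi(s_i).

    Uses only kernel evaluations:
    w . phi(z) = sum_i alpha_i k(s_i, z)
    ||w||^2    = sum_i sum_j alpha_i alpha_j k(s_i, s_j)
    """
    w_dot_z = sum(a * kernel(s, z) for a, s in zip(alpha, support))
    w_norm_sq = sum(ai * aj * kernel(si, sj)
                    for ai, si in zip(alpha, support)
                    for aj, sj in zip(alpha, support))
    return w_dot_z / math.sqrt(w_norm_sq)


if __name__ == "__main__":
    # With a linear kernel, w is simply the vector (3, 4).
    print(projection_length([1.0], [(3.0, 4.0)], (3.0, 4.0)))
    print(projection_length([1.0], [(3.0, 4.0)], (4.0, -3.0)))
```

A pattern aligned with the vector gets a large value, while a pattern orthogonal to it gets zero, which is why the quantity carries both angular and distance information.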
In this paper, a large vector-angular region and margin (LARM) algorithm and its fast training method based on core set are proposed for novelty detection where the training patterns are imbalanced. The main contributions of this paper lie in three aspects. Firstly, the boundary of SVM is determined only by the support vectors, and the distribution of the data in the training set is not considered [16]. However, recent theoretical results have shown that data distribution information is crucial to the generalization performance [17, 18]. The proposed algorithm aims to find an optimal vector in the feature space for which the mean of the vector-angular is maximized and its variance is minimized. Therefore, normal and abnormal examples are well separated when projected onto the optimal vector, with large mean and small variance. Secondly, the proposed LARM integrates one-class and binary classification algorithms to tackle the novelty detection problem on imbalanced data: it constructs the largest vector-angular region in the feature space to separate normal training patterns and maximizes the vector-angular margin between the optimal vector-angular region and the abnormal data. Since the number of normal training patterns is sufficient, the largest vector-angular region is constructed accurately, which can minimize the chance of accepting the normal examples. To achieve better generalization performance, the vector-angular margin between the surface of this optimal vector-angular region and the abnormal data is maximized. Thirdly, the core set based LARM algorithm is proposed for fast training of the LARM problem. The time complexity of core set based LARM is linear in the number of training patterns, and its space complexity is independent of that number.
This paper is organized as follows. Section 1 introduces the novelty detection technique and presents an analysis of the existing problems. Section 2 introduces the support vector machine (SVM), two-class SVDD, and the maximum vector-angular margin classifier (MAMC). Section 3 presents the proposed LARM for novelty detection and its fast training method based on core set. Experimental results are shown in Section 4 and conclusions are given in Section 5.
2. SVM, SVDD, and MAMC
2.1. SVM
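SVM was proposed by Schölkopf et al. [19] to solve the binary classification problem; it uses the parameter $\nu$ to control the number of support vectors and the bound of the classification errors. SVM can be modeled as follows:
$$\min_{w, b, \rho, \xi} \ \frac{1}{2}\|w\|^{2} - \nu\rho + \frac{1}{m}\sum_{i=1}^{m}\xi_i \quad \text{s.t.} \quad y_i\left(w^{T}\varphi(x_i) + b\right) \ge \rho - \xi_i, \ \ \xi_i \ge 0, \ \ \rho \ge 0, \tag{1}$$
where $w$ is the normal vector of the decision hyperplane, $b$ is the bias of the classifier, $\rho$ is the margin, $\xi$ is the vector of slack variables, $\nu$ is a positive constant, and $m$ is the number of training patterns. SVM obtains the optimal hyperplane for separating the two classes with a maximal margin $2\rho/\|w\|$. To classify a testing instance $x$, the decision function takes the sign function of the optimal hyperplane, $f(x) = \operatorname{sgn}\left(w^{T}\varphi(x) + b\right)$.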
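In kernel form, the sign decision of an SVM is usually evaluated through the dual expansion of the normal vector over the support vectors. The sketch below shows this evaluation under that standard assumption; the argument names (`alphas`, `labels`, `support`, `bias`) are hypothetical, and the coefficients are made up for the example rather than obtained from a solver.

```python
def linear_kernel(x, z):
    """Plain inner product, used here as the simplest kernel."""
    return sum(a * b for a, b in zip(x, z))


def svm_decision(z, alphas, labels, support, bias, kernel=linear_kernel):
    """Sign of sum_i alpha_i * y_i * k(s_i, z) + b for a test instance z."""
    score = sum(a * y * kernel(s, z)
                for a, y, s in zip(alphas, labels, support)) + bias
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # Two support vectors giving the normal vector (1, 0) and zero bias.
    support = [(1.0, 0.0), (-1.0, 0.0)]
    labels = [1, -1]
    alphas = [0.5, 0.5]
    print(svm_decision((2.0, 1.0), alphas, labels, support, bias=0.0))
    print(svm_decision((-3.0, 0.0), alphas, labels, support, bias=0.0))
```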
2.2. SVDD
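One-class SVDD and two-class SVDD were proposed by Tax and Duin in 2004 [7], in which the minimal ball is constructed to enclose most of the training patterns. Here, we only review two-class SVDD, which can utilize the abnormal data. Two-class SVDD can be modeled as follows:
$$\min_{R, a, \xi} \ R^{2} + C_1\sum_{i:\, y_i = +1}\xi_i + C_2\sum_{j:\, y_j = -1}\xi_j \quad \text{s.t.} \quad \|\varphi(x_i) - a\|^{2} \le R^{2} + \xi_i \ (y_i = +1), \ \ \|\varphi(x_j) - a\|^{2} \ge R^{2} - \xi_j \ (y_j = -1), \ \ \xi \ge 0, \tag{2}$$
where $R$ and $a$ are the radius and the center of the hypersphere, $C_1$ and $C_2$ are two tradeoff parameters that allow imbalanced datasets to be handled, and $\xi$ is the vector of slack variables. A testing instance $x$ is classified according to whether it lies inside the optimal hypersphere or not. Hence, the decision function of two-class SVDD is $f(x) = \operatorname{sgn}\left(R^{2} - \|\varphi(x) - a\|^{2}\right)$.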
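The inside-or-outside test of a hypersphere in feature space needs only kernel evaluations once the center is written as a weighted combination of mapped training patterns. The following sketch assumes such an expansion; the names `center_coef`, `support`, and `radius_sq` are hypothetical, and the values in the example are made up rather than produced by an SVDD solver.

```python
def linear_kernel(x, z):
    """Plain inner product, used here as the simplest kernel."""
    return sum(a * b for a, b in zip(x, z))


def svdd_decision(z, center_coef, support, radius_sq, kernel=linear_kernel):
    """Return 1 if phi(z) lies inside the ball, otherwise -1.

    The center is assumed to be a = sum_i c_i * phi(s_i), so that
    ||phi(z) - a||^2 = k(z, z) - 2 sum_i c_i k(s_i, z)
                       + sum_i sum_j c_i c_j k(s_i, s_j).
    """
    cross = sum(c * kernel(s, z) for c, s in zip(center_coef, support))
    const = sum(ci * cj * kernel(si, sj)
                for ci, si in zip(center_coef, support)
                for cj, sj in zip(center_coef, support))
    dist_sq = kernel(z, z) - 2.0 * cross + const
    return 1 if dist_sq <= radius_sq else -1


if __name__ == "__main__":
    # Center at the origin (average of two opposite points), unit radius.
    support = [(1.0, 0.0), (-1.0, 0.0)]
    coef = [0.5, 0.5]
    print(svdd_decision((0.5, 0.0), coef, support, radius_sq=1.0))
    print(svdd_decision((2.0, 0.0), coef, support, radius_sq=1.0))
```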
2.3. MAMC
MAMC was proposed by Hu et al. in 2012 [15], which attempts to find an optimal vector in the feature space based on the maximum vector-angular margin. MAMC can be modeled as follows: where is the optimized vector, is the vector-angular margin, is the vector of slack variables, and and are two positive constants. To classify a testing instance , the decision function is defined as .
3. Core Set Based Large Vector-Angular Region and Margin
In this section, LARM algorithm and its fast training method based on core set are proposed for novelty detection with imbalanced data.
3.1. LARM
To tackle the novelty detection problem on imbalanced data, both the distribution of the vector-angular and the maximization of the vector-angular margin are considered in this paper. Figure 1 illustrates the principle of LARM.
Firstly, LARM finds an optimal vector in the feature space by maximizing the vector-angular mean and minimizing the vector-angular variance simultaneously. Here, the vector-angular expresses the length of the projection of a training pattern onto the optimal vector. Therefore, the normal and abnormal examples are well separated when projected onto the optimal vector, with large mean and small variance.
Secondly, for the learning problem on imbalanced data, the largest vector-angular region in the feature space is constructed to separate the normal data. Since the number of normal training patterns is sufficient, the largest vector-angular region is constructed accurately, which can minimize the chances of accepting the normal examples. Meanwhile, to achieve a favorable generalization performance, the vector-angular margin between the surface of this optimal vector-angular region and the abnormal data is maximized.
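To make the two distribution statistics concrete, the sketch below computes the plain sample mean and variance of a list of vector-angular values, that is, of projection lengths onto a candidate vector. This is only an illustration under the assumption that ordinary sample statistics are meant; the normalization and any label weighting used in [18] and in the formulation of this paper may differ. The function name is hypothetical.

```python
def vector_angular_stats(values):
    """Sample mean and (population) variance of projection lengths."""
    m = len(values)
    mean = sum(values) / m
    variance = sum((v - mean) ** 2 for v in values) / m
    return mean, variance


if __name__ == "__main__":
    # Projections of the same patterns onto two candidate vectors.
    tight = [2.0, 2.1, 1.9]
    spread = [0.5, 2.0, 3.5]
    print(vector_angular_stats(tight))
    print(vector_angular_stats(spread))
```

Both candidates in the usage example have the same mean, but the first has a much smaller variance, so it would be preferred by a criterion that rewards a large mean and penalizes a large variance.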
3.1.1. Primal Formulation of LARM
Formally, define the training pattern matrix , label column vector , and label diagonal matrix . According to the definition in [18], the vector-angular mean and vector-angular variance between training patterns , and vector can be expressed as
Then, the primal LARM can be formulated as the following optimization problem: where is the optimal vector, is the width of the vector-angular region, is the vector-angular margin, is the vector of slack variables, and , , , and are four positive constants.
According to [18], for problem (5) is expressed as follows:
Hence, can be obtained, where is the kernel matrix. Problem (5) can be formulated as follows:where , , and is the th column of .
3.1.2. Dual Problem
To investigate the problem with constraints described as (7), the Lagrangian function is constructed as follows:where and are Lagrange multipliers. The following equations can be obtained by making the partial derivatives of with respect to the primal variables to zero:
Substituting (9)–(13) into (8), the dual form can be obtained, which omits constants without influence on optimization:where , , and is the inverse matrix of and .
The dual problem (14) is a QP problem, which has the same form as the dual of the SVM [19, 20]. Therefore, the QP problem (14) can be easily solved by SMO algorithm in LIBSVM [21].
Suppose is the optimal vector of the dual problem (14). According to (13), can be expressed as follows:
To compute and , two sets are considered:
According to the Karush-Kuhn-Tucker (KKT) conditions and (11) and (12), , , , and can be obtained. Hence, set and , and and can be expressed as
3.1.3. Decision Function
It can be seen that minimizing the cost function (5) will make the width of the vector-angular region and the vector-angular margin as large as possible. Meanwhile, the optimal vector in feature space is found, which makes the normal and abnormal examples well separated when projected onto the optimal vector, with large mean and small variance. Therefore, the testing patterns can be classified in terms of the vector-angular between the vector and the training patterns . The optimal separating hyperplane of SVM is , which is at the middle of the margin. Similarly, the separating hyperplane of LARM is defined at the center of the margin. Hence, for testing instance , the decision function is expressed as follows:
3.1.4. Property
Let and represent the number of margin errors of the normal and abnormal training patterns and and denote the number of support vectors of the normal and abnormal training patterns, respectively. According to (9) and (10), the following formulas can be obtained:
By using similar proof about property in [19] and by making use of (20), inequalities (21) can be obtained:
The inequalities (21) indicate that (or ) is a lower bound of the fraction of support vectors in the normal (or abnormal) dataset and an upper bound of the fraction of misclassified patterns in the normal (or abnormal) dataset. The property of LARM can be used for parameter selection in the following experiments.
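The bound property stated above can be checked numerically for one class once a model has been trained. The helper below is a minimal sketch: it takes the bound parameter of one class together with the counts of margin errors and support vectors in that class, and tests whether the parameter lies between the two fractions. The function and argument names are hypothetical.

```python
def satisfies_bound_property(nu, n_margin_errors, n_support_vectors,
                             n_patterns):
    """Check: fraction of margin errors <= nu <= fraction of support vectors."""
    frac_errors = n_margin_errors / n_patterns
    frac_sv = n_support_vectors / n_patterns
    return frac_errors <= nu <= frac_sv


if __name__ == "__main__":
    # 100 normal patterns, 5 margin errors, 20 support vectors.
    print(satisfies_bound_property(0.1, 5, 20, 100))
    # Too many margin errors for the chosen parameter.
    print(satisfies_bound_property(0.1, 15, 20, 100))
```

Read the other way round, the property suggests choosing the parameter of each class close to the fraction of that class one is willing to see misclassified.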
3.2. Core Set Based LARM
As mentioned above, the dual problem of LARM can be formulated as a QP problem. Solving the corresponding QP problem of LARM therefore takes time cubic in the number of training patterns and quadratic space, which is computationally infeasible when the number of training patterns is large. Inspired by the core set based approximate MEB algorithms, $(1+\varepsilon)$ and $(1-\varepsilon)$ approximation algorithms are utilized for fast training of the LARM problem, which is called core set based LARM. Firstly, the core set of training patterns is obtained by the $(1+\varepsilon)$ and $(1-\varepsilon)$ approximation algorithms to capture the distribution of the vector-angular region of the normal and abnormal examples. The core set is a subset of the original training patterns, and the optimization problem can be approximately solved on the core set. Secondly, the LARM problem is solved by the SMO algorithm [22] using the obtained core set. According to [12, 13], the number of core sets is independent of both the number and the dimension of training patterns, and the time complexity is linear in the number of training patterns while the space complexity is independent of the number of training patterns. The schematic illustration of core set based LARM is shown in Figure 2.
Suppose is the core set of the iteration, is the optimal vector in the feature space of the iteration, is the minimum distance between the center of the vector-angular margin and any point in the core set of the iteration, and is the maximum distance between the center of the vector-angular margin and any point in the core set of the iteration. Given , according to [12, 13], the core set based LARM is trained as follows.
(i) Initialize , , and .
(ii) Terminate if no training point falls outside the vector-angular region , and go to step (vi).
(iii) Find and ; is the furthest away from the center of the vector-angular margin and is the closest to the center of the vector-angular margin. Set .
The distance between the center of the vector-angular margin and any point is expressed as follows: where is the width of the vector-angular region at the th iteration, is the vector-angular margin at the iteration, and the set is constructed by all training patterns outside the vector-angular region .
Computing (22) for all training patterns takes time at the iteration. When is large, time cost will be enormous. In order to reduce the computation cost, the probabilistic speedup method [23] is used to accelerate the vector-angular computations in steps (ii) and (iii). The details of time and space complexities can be seen in [12, 13].
(iv) Find the new vector-angular region .
(v) Increase by 1 and go back to step (ii).
(vi) Solve the LARM problem (14) on the core set .
(vii) Classify the test pattern by the decision function (19).
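The probabilistic speedup of [23] replaces the search over all training patterns with a search over a small random sample. A sample of 59 points is the size commonly used in the core vector machine literature, because the best of 59 random points is among the top 5% of all points with probability about 0.95. The sketch below illustrates this idea only; the function names and the generic `distance` callback are hypothetical, and the distance actually used by core set based LARM is the one of (22).

```python
import random


def success_probability(sample_size, top_fraction):
    """Probability that a random sample contains a point from the top fraction."""
    return 1.0 - (1.0 - top_fraction) ** sample_size


def sampled_furthest(indices, distance, sample_size=59, rng=random):
    """Furthest point within a random sample instead of the full set."""
    if len(indices) <= sample_size:
        sample = list(indices)
    else:
        sample = rng.sample(list(indices), sample_size)
    return max(sample, key=distance)


if __name__ == "__main__":
    print(success_probability(59, 0.05))
    rng = random.Random(0)
    # Toy distance: the index itself, so large indices are "far".
    print(sampled_furthest(range(100000), lambda i: i, rng=rng))
```

Each such search costs a fixed number of distance evaluations regardless of the number of training patterns, at the price of finding a nearly furthest point rather than the furthest one.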
4. Experimental Results
The proposed core set based LARM is evaluated on twenty datasets, including both LIBSVM datasets [24] and UCI datasets [25]. Details of the datasets are listed in Table 1, which gives the data dimension, the total number of normal patterns (#pos), the total number of abnormal patterns (#neg), the number of normal training patterns, and the number of abnormal training patterns. The dataset size ranges from 178 to more than 495,141, and the ratio of majority to minority data ranges from 10 : 1 to 1000 : 1. Experiments are repeated 10 times with random data partitions, and the geometric mean accuracy and the standard deviation are recorded.

4.1. Performance Measurement and Parameter Selection
The performance of core set based LARM is compared with three kernel-based algorithms: SVM, SVDD, and MAMC. The geometric mean accuracy $g = \sqrt{a^{+} \cdot a^{-}}$ [26] is used for both parameter selection and algorithm evaluation, where $a^{+}$ is the classification accuracy of the positive class and $a^{-}$ is the classification accuracy of the negative class. The measurement is widely applied to imbalanced data [14, 26, 27], since it considers the classification results on both the positive and the negative classes. To make the experimental results persuasive, all the parameters of SVM, SVDD, MAMC, and core set based LARM are selected by fivefold cross validation.
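The geometric mean accuracy of [26] is the square root of the product of the accuracy on the positive class and the accuracy on the negative class. A minimal sketch of its computation is given below; the function name and the label convention (+1 for the positive class, -1 for the negative class) are assumptions made for the example.

```python
import math


def geometric_mean_accuracy(y_true, y_pred, positive=1, negative=-1):
    """Square root of (accuracy on positives) * (accuracy on negatives)."""
    n_pos = sum(1 for t in y_true if t == positive)
    n_neg = sum(1 for t in y_true if t == negative)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present in y_true")
    hit_pos = sum(1 for t, p in zip(y_true, y_pred)
                  if t == positive and p == positive)
    hit_neg = sum(1 for t, p in zip(y_true, y_pred)
                  if t == negative and p == negative)
    return math.sqrt((hit_pos / n_pos) * (hit_neg / n_neg))


if __name__ == "__main__":
    y_true = [1, 1, 1, 1, -1, -1]
    y_pred = [1, 1, 1, -1, -1, 1]
    print(geometric_mean_accuracy(y_true, y_pred))
    # A classifier that always predicts the majority class scores zero.
    print(geometric_mean_accuracy(y_true, [1] * 6))
```

The second call in the usage example shows why the measure suits imbalanced data: a trivial majority-class predictor has high plain accuracy but a geometric mean accuracy of zero.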
In all experiments, the Radial Basis Function (RBF) is taken as the kernel function: where is the kernel parameter of the RBF. For all the algorithms, RBF parameter is calculated by [12, 13]where and is the diagonal elements of matrix .
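A data-driven choice of the RBF width that is common in the core vector machine literature is the average squared distance between training patterns. The sketch below implements that heuristic together with an RBF kernel of the form exp(-||x - z||^2 / width). Both the exact kernel parameterization and the width formula are assumptions here and may not match the expressions of this paper; the function names are hypothetical.

```python
import math


def squared_distance(x, z):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(x, z))


def mean_squared_distance(points):
    """Average of ||x_i - x_j||^2 over all ordered pairs (i, j)."""
    m = len(points)
    total = sum(squared_distance(p, q) for p in points for q in points)
    return total / (m * m)


def rbf_kernel(x, z, width):
    """RBF kernel exp(-||x - z||^2 / width); parameterization assumed."""
    return math.exp(-squared_distance(x, z) / width)


if __name__ == "__main__":
    data = [(0.0, 0.0), (2.0, 0.0)]
    width = mean_squared_distance(data)
    print(width, rbf_kernel(data[0], data[1], width))
```

The double loop is quadratic in the number of patterns; on large datasets the average would in practice be estimated from a random subset.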
For SVM, parameter is searched in , where .
For SVDD, parameter is searched in and parameter is searched by the ratio belonging to .
For MAMC, parameter is searched in and parameter is searched in .
For core set based LARM, parameter is searched in and parameters and are searched in . From (21), and can be achieved, which are most associated with the percentage of support vectors and margin errors. From Section 4.2, we can see that parameters and have faint effect on the accuracy rate. Therefore, parameters and are set to 1 and , respectively.
4.2. Parameters Influence
There are five parameters in core set based LARM, that is, , , , , and . To verify the influence of the parameters on the performance of core set based LARM, experiments on some representative datasets are performed. By fixing other parameters, the influence of every parameter on some representative datasets is further studied, which is shown in Figures 3–7.
Figures 3–7 (one figure per parameter), each with two panels: (a) influence of the parameter on geometric mean accuracy; (b) influence of the parameter on the number of core sets.
Figure 3 shows the influence of on the geometric mean accuracy and the number of core sets by varying from 10 to 100 while fixing , , , and as the suggested value obtained by the cross validation described in Section 4.1. Figure 4 shows the influence of on the geometric mean accuracy and the number of core sets by varying from 0.001 to 0.01 while fixing , , , and in the same way. Figure 5 shows the influence of on the geometric mean accuracy and the number of core sets by varying from 0.001 to 0.01 while fixing , , , and in the same way. Figure 6 shows the influence of on the geometric mean accuracy and the number of core sets by varying from to while fixing , , , and in the same way. Figure 7 shows the influence of on the geometric mean accuracy and the number of core sets by varying from to while fixing , , , and in the same way.
From Figures 3–7, it can be seen that all five parameters have little effect on the geometric mean accuracy and the number of core sets, which makes the core set based LARM even more attractive in practice. Therefore, the parameter values obtained by the cross validation described in Section 4.1 are acceptable for all experiments.
4.3. Numerical Results
4.3.1. Detection Performance
For each dataset, samples are randomly split into training patterns and testing patterns with the proportion described in Table 1. Parameters of SVM, SVDD, MAMC, and core set based LARM are selected by fivefold cross validation to make the experimental results persuasive enough.
The geometric mean accuracy is used for the performance evaluation. Experiments are repeated 10 times with random data partitions. The average accuracy and the standard deviation are listed in Table 2. NULL indicates that no result was returned within 10 hours. Furthermore, for every dataset, the difference between the bold results and the best geometric mean accuracy is not significant, as determined by the Wilcoxon rank-sum test at the significance level of 0.05.
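The comparison of two sets of repeated results with the Wilcoxon rank-sum test can be sketched as follows. The implementation uses the normal approximation to the rank-sum statistic with average ranks for ties, and omits the tie and continuity corrections that a statistics package would apply, so it is an illustration rather than a replacement for such a package. The function name and the accuracy values in the example are made up.

```python
import math


def rank_sum_test(a, b):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns the z statistic and the two-sided p-value.
    """
    combined = sorted((v, g) for g, sample in enumerate((a, b))
                      for v in sample)
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n1, n2 = len(a), len(b)
    w = sum(r for r, (_, g) in zip(ranks, combined) if g == 0)
    mean = n1 * (n1 + n2 + 1) / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical accuracies of two methods over 10 repetitions.
    method_a = [0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99, 0.90]
    method_b = [0.81, 0.82, 0.83, 0.84, 0.85, 0.86, 0.87, 0.88, 0.89, 0.80]
    z, p = rank_sum_test(method_a, method_b)
    print(z, p, p < 0.05)
```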

From Table 2, it can be concluded that the performance of core set based LARM is comparable to the best of SVM, SVDD, and MAMC on all datasets. The core set based LARM performs significantly better than SVM, SVDD, and MAMC on 12, 9, and 13 out of 20 datasets, respectively. This illustrates that, by using the $(1+\varepsilon)$ and $(1-\varepsilon)$ approximation algorithms for training LARM, the generalization performance of core set based LARM is comparable to or even better than the best of SVM, SVDD, and MAMC.
4.3.2. Time Cost
The time cost of SVM, SVDD, MAMC, and core set based LARM on different datasets is shown in Tables 3 and 4. The average and standard deviation of training time (including parameter selection and model training time) are shown in Table 3. The average and standard deviation of testing time are shown in Table 4. All the experiments are conducted on a computer with an i5-2400 @ 3.10 GHz CPU and 8 GB SDRAM. NULL indicates that no result was returned within 10 hours. Furthermore, for every dataset, the difference between the bold results and the best time cost is not significant, as determined by the Wilcoxon rank-sum test at the significance level of 0.05.


From Table 3, it can be clearly seen that the training time of core set based LARM is longer than the best of SVM, SVDD, and MAMC, when the number of the training patterns is less than 2,143. However, when the number of training patterns is larger than 2,686 such as SDD, MC, Shuttle, Codrna, S. segmentation, and Covtype, the training time of core set based LARM is shorter than the best of SVM, SVDD, and MAMC. When the number of training patterns increases to 141,792, the average training time of core set based LARM does not exceed 65 seconds. Therefore, the training time of core set based LARM does not increase very quickly with the number of training patterns.
As can be seen from Table 4, the best testing time among SVM, SVDD, and MAMC is slightly shorter than that of core set based LARM on 11 out of 20 datasets; the largest time gap is 0.002 seconds. However, the testing time of core set based LARM is never the worst. When the number of testing patterns is 353,349, as for Covtype, the average testing time of core set based LARM is about 1.5 seconds. This shows that the core set based LARM can classify testing examples quickly.
5. Conclusion
In this paper, a novel LARM algorithm and its fast training method based on core set are proposed for novelty detection on imbalanced data. The proposed LARM algorithm combines the ideas of one-class and binary classification algorithms: it constructs the largest vector-angular region in the feature space to separate normal training patterns and maximizes the vector-angular margin between this optimal vector-angular region and the abnormal data. To improve the generalization performance of LARM, the vector-angular distribution is optimized by maximizing the vector-angular mean and minimizing the vector-angular variance. To improve the computational efficiency, $(1+\varepsilon)$ and $(1-\varepsilon)$ approximation algorithms are used for fast training of LARM based on core set. The time complexity of core set based LARM is linear in the number of training patterns, and its space complexity is independent of that number. Comprehensive experiments have validated the effectiveness of the proposed approach. In the future, it will be interesting to extend the idea of LARM to handle the one-class learning problem.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work is supported by the National Natural Science Foundation of China under Grant nos. U1433103 and U1333116; the Science and Technology Foundation of Civil Aviation Administration of China under Grant no. 20150227; and the Fundamental Research Foundation for the Central Universities of CAUC under Grant no. 3122014D022 and no. 3122014B002.
References
[1] R. Perdisci, G. Gu, and W. Lee, “Using an ensemble of one-class SVM classifiers to harden payload-based anomaly detection systems,” in Proceedings of the 6th International Conference on Data Mining (ICDM '06), pp. 488–498, Hong Kong, December 2006.
[2] P. Hayton, S. Utete, D. King, S. King, P. Anuzis, and L. Tarassenko, “Static and dynamic novelty detection methods for jet engine health monitoring,” Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 365, no. 1851, pp. 493–514, 2007.
[3] L. Clifton, D. A. Clifton, P. J. Watkinson, and L. Tarassenko, “Identification of patient deterioration in vital-sign data using one-class support vector machines,” in Proceedings of the Federated Conference on Computer Science and Information Systems (FedCSIS '11), pp. 125–131, September 2011.
[4] E. Smart and D. Brown, “A two-phase method of detecting abnormalities in aircraft flight data and ranking their impact on individual flights,” IEEE Transactions on Intelligent Transportation Systems, vol. 13, no. 3, pp. 1253–1265, 2012.
[5] R. X. Guo, K. Guo, and J. K. Dong, “Fault diagnosis for the landing phase of the aircraft based on an adaptive kernel principal component analysis algorithm,” Proceedings of the Institution of Mechanical Engineers Part I: Journal of Systems & Control Engineering, vol. 229, no. 10, pp. 917–926, 2015.
[6] B. Schölkopf, R. Williamson, A. Smola, J. Shawe-Taylor, and J. Platt, “Support vector method for novelty detection,” in Advances in Neural Information Processing Systems (NIPS 1999), pp. 582–588, MIT Press, 1999.
[7] D. M. J. Tax and R. P. W. Duin, “Support vector data description,” Machine Learning, vol. 54, no. 1, pp. 45–66, 2004.
[8] L. M. Manevitz and M. Yousef, “One-class SVMs for document classification,” The Journal of Machine Learning Research, vol. 2, no. 2, pp. 139–154, 2002.
[9] M. Wu and J. Ye, “A small sphere and large margin approach for novelty detection using training data with outliers,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 11, pp. 2088–2092, 2009.
[10] B. Krawczyk and M. Woźniak, “Incremental weighted one-class classifier for mining stationary data streams,” Journal of Computational Science, vol. 9, pp. 19–25, 2015.
[11] B. Krawczyk and M. Woźniak, “One-class classifiers with incremental learning and forgetting for data streams with concept drift,” Soft Computing, vol. 19, no. 12, pp. 3387–3400, 2015.
[12] I. W. Tsang, J. T. Kwok, and P.-M. Cheung, “Core vector machines: fast SVM training on very large data sets,” Journal of Machine Learning Research, vol. 6, pp. 363–392, 2005.
[13] I. W. H. Tsang, J. T. Y. Kwok, and J. A. Zurada, “Generalized core vector machines,” IEEE Transactions on Neural Networks, vol. 17, no. 5, pp. 1126–1140, 2006.
[14] H. He and E. A. Garcia, “Learning from imbalanced data,” IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 9, pp. 1263–1284, 2009.
[15] W. Hu, F.-L. Chung, and S. Wang, “The maximum vector-angular margin classifier and its fast training on large datasets using a core vector machine,” Neural Networks, vol. 27, no. 3, pp. 60–73, 2012.
[16] M. A. F. Pimentel, D. A. Clifton, L. Clifton, and L. Tarassenko, “A review of novelty detection,” Signal Processing, vol. 99, pp. 215–249, 2014.
[17] W. Gao and Z.-H. Zhou, “On the doubt about margin explanation of boosting,” Artificial Intelligence, vol. 203, pp. 1–18, 2013.
[18] T. Zhang and Z.-H. Zhou, “Large margin distribution machine,” in Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '14), pp. 313–322, New York, NY, USA, August 2014.
[19] B. Schölkopf, A. J. Smola, R. C. Williamson, and P. L. Bartlett, “New support vector algorithms,” Neural Computation, vol. 12, no. 5, pp. 1207–1245, 2000.
[20] C.-C. Chang and C.-J. Lin, “Training ν-support vector classifiers: theory and algorithms,” Neural Computation, vol. 13, no. 9, pp. 2119–2147, 2001.
[21] C.-C. Chang and C.-J. Lin, “LIBSVM: a library for support vector machines,” ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, article 27, 2011.
[22] J. C. Platt, “Fast training of support vector machines using sequential minimal optimization,” in Advances in Kernel Methods: Support Vector Learning, B. Schölkopf, C. Burges, and A. Smola, Eds., pp. 185–208, MIT Press, Cambridge, Mass, USA, 1999.
[23] A. Smola and B. Schölkopf, “Sparse greedy matrix approximation for machine learning,” in Proceedings of the 17th International Conference on Machine Learning (ICML '00), pp. 911–918, Stanford, Calif, USA, June 2000.
[24] R.-E. Fan and C.-J. Lin, LIBSVM Data: Classification, Regression and Multi-Label, 2011, https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets.
[25] A. Asuncion and D. J. Newman, UCI Machine Learning Repository, School of Information and Computer Sciences, University of California Irvine, 2007, http://www.ics.uci.edu/~mlearn/MLRepository.html.
[26] M. Kubat and S. Matwin, “Addressing the curse of imbalanced training sets: one-sided selection,” in Proceedings of the 14th International Conference on Machine Learning, pp. 179–186, Morgan Kaufmann, 1997.
[27] G. Wu and E. Y. Chang, “Class-boundary alignment for imbalanced dataset learning,” in Proceedings of the International Conference on Machine Learning Workshop Learning from Imbalanced Datasets (ICML '03), 2003.
Copyright
Copyright © 2016 Jiusheng Chen et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.