Mathematical Problems in Engineering

Volume 2016 (2016), Article ID 1658758, 12 pages

http://dx.doi.org/10.1155/2016/1658758

## A Core Set Based Large Vector-Angular Region and Margin Approach for Novelty Detection

College of Electronics, Information & Automation, Civil Aviation University of China, Tianjin 300300, China

Received 2 November 2015; Revised 10 January 2016; Accepted 12 January 2016

Academic Editor: Muhammad N. Akram

Copyright © 2016 Jiusheng Chen et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

A large vector-angular region and margin (LARM) approach is presented for novelty detection on imbalanced data. The key idea is to construct the largest vector-angular region in the feature space that separates the normal training patterns, while maximizing the vector-angular margin between the surface of this optimal vector-angular region and the abnormal training patterns. To improve the generalization performance of LARM, the vector-angular distribution is optimized by maximizing the vector-angular mean and minimizing the vector-angular variance, which separates normal and abnormal examples well. However, a standard quadratic programming (QP) solver takes $O(n^3)$ training time and at least $O(n^2)$ space for $n$ training patterns, which may be computationally prohibitive for large-scale problems. By introducing the core set and a $(1+\varepsilon)$-approximation algorithm, a core set based LARM algorithm is proposed for fast training of the LARM problem. Experimental results on imbalanced datasets validate the efficiency of the proposed approach for novelty detection.

#### 1. Introduction

The task of novelty detection is to learn a model from the normal examples in a training set and then use it to classify test patterns. In real-world novelty detection applications, it is usually assumed that normal training patterns can be well sampled, while abnormal training patterns are severely undersampled owing to the high cost of measurement or the rarity of abnormal events. Therefore, most novelty detection algorithms use only normal training patterns to build the detection model, and novelty detection can generally be seen as a one-class classification problem. Novelty detection has recently attracted much research attention in real-world applications such as network intrusion detection [1], jet engine health monitoring [2], medical data analysis [3], and aviation safety [4, 5].

This paper studies kernel-based novelty detection in depth, a popular family of methods that has proved successful in recent years. Various kernel-based novelty detection approaches have been proposed, such as the one-class support vector machine (OCSVM) [6] and support vector data description (SVDD) [7]. In OCSVM, proposed by Schölkopf et al. [6], the novelty detection boundary is constructed to separate the input samples from the origin with the maximal margin, which improves generalization ability. However, the performance of OCSVM is very sensitive to its parameters, which makes it difficult to generalize to other applications [8].

SVDD, proposed by Tax and Duin [7], constructs the minimal ball that encloses most of the training samples; a test point is judged novel when it lies outside this ball. Because the margin between the closed boundary surrounding the positive data and that surrounding the negative data is zero, the method has poor generalization ability. Wu and Ye [9] proposed a small sphere and large margin (SSLM) approach, in which the smallest hypersphere is constructed to surround the normal data while the margin from any outlier to this hypersphere is made as large as possible. Krawczyk and Woźniak [10, 11] proposed an incremental weighted one-class support vector machine for mining streaming data, in which the weight of each object is modified according to its level of significance, and the shape of the decision boundary is influenced only by new objects that carry new and useful knowledge extending the competence of the classifier.

Training a support vector machine (SVM) amounts to solving a quadratic programming (QP) problem, which is convex and therefore has the important computational advantage of avoiding local minima. However, solving the corresponding SVM problem with a naive implementation of a QP solver takes $O(n^3)$ time and at least $O(n^2)$ space when the number of training patterns is $n$. Such an implementation clearly cannot meet the practical demands of novelty detection on large-scale datasets. Tsang et al. proposed the core vector machine (CVM) [12, 13], which uses an approximation algorithm for the minimum enclosing ball (MEB) problem to handle large-scale problems. The key idea is that the QP problems of the corresponding SVMs can be equivalently viewed as MEB problems. By utilizing an approximation algorithm for the MEB problem from computational geometry, the time complexity of CVM is linear in the number of training patterns, and its space complexity is independent of the number of training patterns.
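To make the core set idea concrete, the minimal sketch below computes a $(1+\varepsilon)$-approximate minimum enclosing ball with the simple farthest-point iteration of Bădoiu and Clarkson: the center is repeatedly moved toward the point currently farthest from it, and the points selected along the way form the core set. It assumes explicit coordinates in the input space; it is not the CVM procedure of [12, 13], which works in the kernel-induced feature space and solves a small QP on the current core set at each step. The function name `approx_meb` is hypothetical.

```python
import math

import numpy as np


def approx_meb(X, eps=0.1):
    """(1 + eps)-approximate minimum enclosing ball via farthest-point updates.

    Illustrative sketch (hypothetical helper, not the CVM algorithm).
    Returns the center, the radius of the ball around that center that
    encloses all points, and the indices of the points picked as the core set.
    """
    X = np.asarray(X, dtype=float)
    center = X[0].copy()              # start from an arbitrary input point
    core = {0}
    n_iter = math.ceil(1.0 / eps**2)  # iteration count depends on eps only, not on n
    for t in range(1, n_iter + 1):
        dist = np.linalg.norm(X - center, axis=1)
        far = int(np.argmax(dist))    # point currently farthest from the center
        core.add(far)
        center += (X[far] - center) / (t + 1)  # shrinking step toward that point
    radius = float(np.max(np.linalg.norm(X - center, axis=1)))
    return center, radius, sorted(core)


# Usage: corners of a square; the exact MEB has center (0, 0) and radius sqrt(2).
square = [[-1, -1], [1, -1], [1, 1], [-1, 1]]
center, radius, core = approx_meb(square, eps=0.1)
print(center, radius, core)
```

Each iteration costs one pass over the data, and the number of iterations (hence the core set size) is bounded in terms of $\varepsilon$ alone, which is the property that makes core set methods scale linearly with $n$.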

As mentioned above, most novelty detection algorithms use only normal training patterns to build the detection model. In practical applications, however, it is difficult, though not impossible, to obtain a small number of abnormal training patterns. For instance, in machine fault detection, besides extensive measurements under normal working conditions, there may also be some measurements of faulty situations [14]. Extensive research has recently been carried out in both academia and industry to solve the imbalanced novelty detection problem.

Kernel-based novelty detection on imbalanced data is studied in this paper. Suppose $T=\{(x_i,y_i)\}_{i=1}^{n}$ is a given training dataset with $n$ examples, where $x_i$ is the $i$th input instance and $y_i\in\{+1,-1\}$ is the class label associated with $x_i$; $X^{+}=\{x_i \mid y_i=+1\}$, with $|X^{+}|=n_1$, is the set of majority (normal) training patterns, $X^{-}=\{x_i \mid y_i=-1\}$, with $|X^{-}|=n_2$, is the set of minority (abnormal) training patterns, and $n_1+n_2=n$. $\phi(\cdot)$ is the feature mapping function defined by a given kernel function $k(x_i,x_j)=\phi(x_i)^{T}\phi(x_j)$. The length of the perpendicular projection of the training pattern $\phi(x_i)$ onto a vector $c$ is $c^{T}\phi(x_i)/\|c\|=\|\phi(x_i)\|\cos\theta_i$, where $\theta_i$ is the angle between $\phi(x_i)$ and $c$; it therefore reflects both the angle between $\phi(x_i)$ and $c$ and the Euclidean length of $\phi(x_i)$ in the vector space. Following the definition in [15], this quantity is called the vector-angular.
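The vector-angular can be evaluated without forming $\phi$ explicitly. The minimal sketch below assumes (as a representer-style assumption, not a statement of the paper's method) that the vector is expanded as $c=\sum_j \alpha_j\phi(x_j)$ over some basis patterns; then $c^{T}\phi(x)=\sum_j\alpha_j k(x_j,x)$ and $\|c\|^2=\alpha^{T}K\alpha$. The names `vector_angular` and `linear_kernel` and the coefficients `alpha` are hypothetical; any kernel function with the same `(A, B) -> matrix` signature can be passed in.

```python
import numpy as np


def linear_kernel(A, B):
    """Gram matrix of the linear kernel k(a, b) = a^T b (hypothetical helper)."""
    return np.atleast_2d(A) @ np.atleast_2d(B).T


def vector_angular(X, X_basis, alpha, kernel=linear_kernel):
    """Projection length c^T phi(x) / ||c|| for c = sum_j alpha_j phi(x_j).

    Assumes c lies in the span of the mapped basis patterns, so only kernel
    evaluations are needed: c^T phi(x) = K(x, basis) @ alpha and
    ||c||^2 = alpha^T K(basis, basis) alpha.
    """
    alpha = np.asarray(alpha, dtype=float)
    c_norm = np.sqrt(alpha @ kernel(X_basis, X_basis) @ alpha)
    return (kernel(X, X_basis) @ alpha) / c_norm


# Usage: with the linear kernel and alpha = (1, 1), c = (1, 1);
# the projection of (2, 0) is 2 / sqrt(2) and that of (1, -1) is 0.
basis = np.array([[1.0, 0.0], [0.0, 1.0]])
print(vector_angular([[2.0, 0.0], [1.0, -1.0]], basis, alpha=[1.0, 1.0]))
```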

In this paper, a large vector-angular region and margin (LARM) algorithm and its fast core set based training method are proposed for novelty detection with imbalanced training patterns. The main contributions of this paper are threefold. First, the boundary of an SVM is determined only by the support vectors, and the distribution of the training data is not taken into account [16]. However, recent theoretical results have shown that data distribution information is crucial to generalization performance [17, 18]. The proposed algorithm therefore seeks an optimal vector in the feature space for which the mean of the vector-angular is maximized and its variance is minimized. As a result, normal and abnormal examples projected onto this optimal vector are well separated owing to the large mean and small variance of the projections. Second, the proposed LARM integrates one-class and binary classification to tackle novelty detection on imbalanced data: it constructs the largest vector-angular region in the feature space to separate the normal training patterns and maximizes the vector-angular margin between this optimal region and the abnormal data. Since normal training patterns are plentiful, the largest vector-angular region can be constructed accurately, which reduces the chance of rejecting normal examples. To achieve better generalization performance, the vector-angular margin between the surface of this optimal vector-angular region and the abnormal data is maximized. Third, a core set based LARM algorithm is proposed for fast training. The time and space complexities of core set based LARM are, respectively, linear in and independent of the number of training patterns.
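The role of the mean and variance criterion can be illustrated on toy data. The minimal sketch below only evaluates the mean and variance of the projections of some normal patterns onto two hand-picked unit directions in the input space (that is, assuming a linear kernel); it does not solve the LARM optimization of Section 3. The data and the helper name `projection_stats` are hypothetical.

```python
import numpy as np


def projection_stats(X, c):
    """Mean and variance of the projections of the rows of X onto direction c."""
    c = np.asarray(c, dtype=float)
    proj = np.asarray(X, dtype=float) @ (c / np.linalg.norm(c))  # unit-normalized direction
    return proj.mean(), proj.var()


# Hypothetical normal patterns clustered around (2, 2).
normal = [[2.0, 2.0], [2.2, 1.8], [1.8, 2.2], [2.1, 2.1], [1.9, 1.9]]
for c in ([1.0, 1.0], [1.0, 0.0]):
    mean, var = projection_stats(normal, c)
    print(c, round(mean, 3), round(var, 4))
```

The diagonal direction gives both a larger mean and a smaller variance than the horizontal one, so a pattern far from the cluster, such as one near the origin, falls well below the tight band of normal projections along that direction.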

This paper is organized as follows. Section 1 introduces the novelty detection technique and analyzes the problems of existing methods. Section 2 reviews the ν-support vector machine (ν-SVM), two-class SVDD, and the maximum vector-angular margin classifier (MAMC). Section 3 presents the proposed LARM for novelty detection and its fast core set based training method. Experimental results are reported in Section 4, and conclusions are drawn in Section 5.

#### 2. ν-SVM, SVDD, and MAMC

##### 2.1. ν-SVM

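ν-SVM was proposed by Schölkopf et al. [19] to solve the binary classification problem; its parameter $\nu$ is an upper bound on the fraction of margin errors and a lower bound on the fraction of support vectors. For $n$ training patterns, ν-SVM can be modeled as follows:

$$
\begin{aligned}
\min_{w,\,b,\,\rho,\,\xi}\quad & \frac{1}{2}\|w\|^{2}-\nu\rho+\frac{1}{n}\sum_{i=1}^{n}\xi_i\\
\text{s.t.}\quad & y_i\left(w^{T}\phi(x_i)+b\right)\ge\rho-\xi_i,\\
& \xi_i\ge 0,\quad \rho\ge 0,\quad i=1,\dots,n,
\end{aligned}
$$

where $w$ is the normal vector of the decision hyperplane, $b$ is the bias of the classifier, $\rho$ is the margin, $\xi=(\xi_1,\dots,\xi_n)^{T}$ is the vector of slack variables, and $\nu$ is a positive constant. ν-SVM obtains the optimal hyperplane that separates the two classes with a maximal margin $2\rho/\|w\|$. To classify a test instance $x$, the decision function takes the sign of the optimal hyperplane, $f(x)=\operatorname{sgn}\left(w^{T}\phi(x)+b\right)$.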
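As a worked check of the ν-SVM objective $\frac{1}{2}\|w\|^2-\nu\rho+\frac{1}{n}\sum_i\xi_i$ with constraints $y_i(w^{T}\phi(x_i)+b)\ge\rho-\xi_i$, the minimal sketch below evaluates it at hand-picked $(w,b,\rho)$ rather than at values returned by a QP solver, and assumes a linear feature map $\phi(x)=x$. For fixed $(w,b,\rho)$ the smallest feasible slacks are $\xi_i=\max(0,\rho-y_i(w^{T}x_i+b))$, and patterns with $\xi_i>0$ are margin errors. The helper names are hypothetical.

```python
import numpy as np


def nu_svm_primal(w, b, rho, X, y, nu):
    """Primal objective of a linear nu-SVM at fixed (w, b, rho) (hypothetical helper).

    Uses the smallest feasible slacks xi_i = max(0, rho - y_i * (w^T x_i + b)).
    """
    w = np.asarray(w, dtype=float)
    margins = np.asarray(y, dtype=float) * (np.asarray(X, dtype=float) @ w + b)
    xi = np.maximum(0.0, rho - margins)
    objective = 0.5 * w @ w - nu * rho + xi.mean()
    margin_error_rate = float(np.mean(xi > 0))  # inside the margin or misclassified
    return objective, xi, margin_error_rate


def nu_svm_decide(w, b, X):
    """Decision function f(x) = sgn(w^T x + b), mapping 0 to +1."""
    scores = np.asarray(X, dtype=float) @ np.asarray(w, dtype=float) + b
    return np.where(scores >= 0, 1, -1)


# Usage: the third pattern (label -1) lies on the wrong side of the hyperplane.
X = [[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0]]
y = [1, -1, -1]
obj, xi, err = nu_svm_primal(w=[1.0, 0.0], b=0.0, rho=1.0, X=X, y=y, nu=0.5)
print(obj, xi, err, nu_svm_decide([1.0, 0.0], 0.0, X))
```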

##### 2.2. SVDD

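One-class SVDD and two-class SVDD were proposed by Tax and Duin in 2004 [7]; both construct the minimal ball that encloses most of the training patterns. Here we review only two-class SVDD, which can also utilize the abnormal data. With normal patterns labeled $y_i=+1$ and abnormal patterns labeled $y_i=-1$, two-class SVDD can be modeled as follows:

$$
\begin{aligned}
\min_{R,\,a,\,\xi}\quad & R^{2}+C_{1}\sum_{i:\,y_i=+1}\xi_i+C_{2}\sum_{i:\,y_i=-1}\xi_i\\
\text{s.t.}\quad & \|\phi(x_i)-a\|^{2}\le R^{2}+\xi_i,\quad y_i=+1,\\
& \|\phi(x_i)-a\|^{2}\ge R^{2}-\xi_i,\quad y_i=-1,\\
& \xi_i\ge 0,\quad i=1,\dots,n,
\end{aligned}
$$

where $R$ and $a$ are the radius and the center of the hypersphere, $C_1$ and $C_2$ are two trade-off parameters that allow imbalanced datasets to be handled, and $\xi$ is the vector of slack variables. A test instance $x$ is classified according to whether it lies inside the optimal hypersphere; hence, the decision function of two-class SVDD is $f(x)=\operatorname{sgn}\left(R^{2}-\|\phi(x)-a\|^{2}\right)$.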
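The two-class SVDD decision can be evaluated with kernels alone. Setting the derivatives of the Lagrangian with respect to $R$ and $a$ to zero gives $\sum_i\alpha_i y_i=1$ and $a=\sum_i\alpha_i y_i\phi(x_i)$ with multipliers $\alpha_i\ge 0$, so $\|\phi(x)-a\|^2=k(x,x)-2\sum_i\alpha_i y_i k(x_i,x)+\sum_{i,j}\alpha_i\alpha_j y_i y_j k(x_i,x_j)$. The minimal sketch below assumes a Gaussian kernel and hand-picked dual values and radius (not the output of a solver); all function names and numbers are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, gamma=0.5):
    """Gaussian kernel matrix k(a, b) = exp(-gamma * ||a - b||^2)."""
    A, B = np.atleast_2d(A), np.atleast_2d(B)
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negatives from rounding


def svdd_sq_dist(X_test, X_train, y, alpha, gamma=0.5):
    """||phi(x) - a||^2 with a = sum_i alpha_i y_i phi(x_i), via the kernel trick."""
    beta = np.asarray(alpha, dtype=float) * np.asarray(y, dtype=float)  # alpha_i * y_i
    K_tt = rbf_kernel(X_train, X_train, gamma)
    K_xt = rbf_kernel(X_test, X_train, gamma)
    k_xx = np.ones(len(np.atleast_2d(X_test)))  # k(x, x) = 1 for the Gaussian kernel
    return k_xx - 2.0 * K_xt @ beta + beta @ K_tt @ beta


def svdd_decide(X_test, X_train, y, alpha, R2, gamma=0.5):
    """sgn(R^2 - ||phi(x) - a||^2): +1 inside the hypersphere (normal), -1 outside."""
    d2 = svdd_sq_dist(X_test, X_train, y, alpha, gamma)
    return np.where(d2 <= R2, 1, -1)


# Usage with hypothetical dual values satisfying sum(alpha * y) == 1.
X_train = np.array([[0.0, 0.0], [3.0, 3.0]])
y = np.array([1.0, -1.0])
alpha = np.array([1.2, 0.2])
X_test = np.array([[0.0, 0.0], [3.0, 3.0], [10.0, 10.0]])
print(svdd_sq_dist(X_test, X_train, y, alpha))
print(svdd_decide(X_test, X_train, y, alpha, R2=1.0))
```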

##### 2.3. MAMC

MAMC, proposed by Hu et al. in 2012 [15], attempts to find an optimal vector in the feature space based on the maximum vector-angular margin. MAMC can be modeled as follows:where is the optimized vector, is the vector-angular margin, is the vector of slack variables, and and are two positive constants. To classify a test instance , the decision function is defined as .

#### 3. Core Set Based Large Vector-Angular Region and Margin

In this section, the LARM algorithm and its fast core set based training method are proposed for novelty detection with imbalanced data.

##### 3.1. LARM

To tackle novelty detection on imbalanced data, this paper considers both the vector-angular distribution and the maximization of the vector-angular margin. Figure 1 illustrates the principle of LARM.