Mathematical Problems in Engineering

Volume 2015 (2015), Article ID 269856, 10 pages

http://dx.doi.org/10.1155/2015/269856

## A Structural SVM Based Approach for Binary Classification under Class Imbalance

^{1}Key Laboratory of Intelligent Computing & Signal Processing, Ministry of Education, Anhui University, No. 3, Feixi Road, Hefei, Anhui 230039, China

^{2}School of Computer, Anhui University, No. 3, Feixi Road, Hefei 230039, China

Received 4 January 2015; Accepted 4 May 2015

Academic Editor: Haibo He

Copyright © 2015 Fan Cheng et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Class imbalance situations, where one class is rare compared to the other, arise frequently in machine learning applications. It is well known that the usual misclassification error is not suitable in such settings. A wide range of performance measures, such as AM and QM, have been proposed for this problem. However, due to computational difficulties, few learning techniques have been developed to directly optimize the AM or QM metric. To fill this gap, we present a general structural SVM framework for directly optimizing AM and QM. We define loss functions oriented to AM and QM, respectively, and adopt the cutting plane algorithm to solve the outer optimization. For the inner problem of finding the most violated constraint, we propose two efficient algorithms, one for the AM problem and one for the QM problem. Empirical studies on various imbalanced datasets justify the effectiveness of the proposed approach.

#### 1. Introduction

Classification under class imbalance, where one class is rare compared to the other, is a common yet important problem in supervised learning. It arises in many applications, ranging from medical diagnosis and text retrieval to credit risk prediction and fraud detection [1–4]. Due to its practical importance, it has been identified as one of the ten most challenging problems in data mining research [5]. For simplicity, only binary classification problems under class imbalance are considered in this paper. However, it is important to keep in mind that the class imbalance problem is pervasive in other areas as well, such as multiclass classification and association rule mining.

It is well known that the usual binary learning algorithms are ill-suited to imbalanced domains, because the resulting classifiers are biased towards the majority class and therefore have lower sensitivity in detecting minority class examples [6]. A variety of approaches to the class imbalance problem have been proposed in the literature; they fall mainly into two groups: data-oriented methods and algorithm-oriented methods.

The data-oriented methods use various sampling techniques to oversample instances in the minority class [6–8] or undersample those in the majority class [9, 10], so that the resulting data is balanced. A typical example is the SMOTE approach [6], which increases the number of minority class instances by creating synthetic samples.

The second group, the algorithm-oriented methods, aims to extend and modify existing classification algorithms so that they deal more effectively with imbalanced data. For example, Liu et al. and Kang and Ramamohanarao have presented two modified decision tree algorithms that improve on the standard C4.5, namely, CCPDT [11] and HeDEx [12], while Köknar-Tezel et al., Joachims et al., and Lipton et al. have proposed various approaches to improve the performance of the traditional SVM in imbalanced settings [13–22].

Both groups are effective, and it is difficult to say which one is better. However, since our goal in this paper is to improve an existing statistical learning algorithm, we focus on the algorithm-oriented approach and propose a modified SVM that directly optimizes imbalance measures. Our algorithm may appear similar to those in [15–22]; however, we design different objective functions and use different optimization techniques from theirs. More specifically, this paper makes the following contributions.

(1) We adopt the 1-slack structural SVM as the framework and define loss functions oriented to AM and QM, which are rarely considered in the literature on optimizing imbalance metrics.

(2) We show that the QM loss is a lower bound of the AM one, which means our QM classifier may be more accurate than the AM one.

(3) For the inner computational challenge of the AM loss, we decompose the problem and apply the Hardy-Littlewood-Polya inequality to solve it efficiently. For QM, such a decomposition is impossible, so we present an efficient greedy method for this problem with the same order of running time.

(4) Empirical evaluations on imbalanced datasets demonstrate that the proposed algorithms are not only significantly better than the standard binary learning algorithm but also competitive with other existing algorithms for imbalanced data.

The remainder of the paper is organized as follows. Section 2 presents the related work. Section 3 discusses the details of our proposed algorithms, and Section 4 reports the empirical results on the benchmark datasets. Section 5 concludes the paper and discusses future work.

#### 2. Related Work

##### 2.1. Problem Setup and Notations

As discussed in the Introduction, in this paper we consider only the binary classification problem. We are given a training dataset $S = \{(x_i, y_i)\}_{i=1}^{n}$, where $x_i$ is the $i$th example and $y_i \in \{-1, +1\}$ is the corresponding class label. The binary classification problem is to construct a classifier function $f$ that gives good generalization performance. We assume that the classifier function is of the form $f(x) = w^{T}x$ and that the decision function $\operatorname{sign}(f(x))$ is used when finding the label of an unseen example. Note that we have not included the bias term in the classifier function for notational convenience. However, it can be incorporated in a straightforward way.

In machine learning, a common way to find the linear parameter $w$ is to minimize a regularized risk function:

$$\min_{w}\ \frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{n}\ell\bigl(y_i, f(x_i)\bigr),$$

where $C$ is a constant that controls the trade-off between training error minimization and margin maximization, and $\ell$ is a suitable loss function which measures the discrepancy between a true label $y_i$ and a predicted value $f(x_i)$ obtained using $w$. Different loss functions yield different learners. One of the best-known loss functions is the hinge loss used in SVM, which has the form $\ell(y, f(x)) = \max(0,\, 1 - y f(x))$.
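The regularized risk with the hinge loss can be illustrated with a small sketch. It assumes the common form "half the squared norm of the weights plus C times the summed hinge loss" for a linear scorer without a bias term; the function names and the toy data are hypothetical and are not taken from the paper.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def hinge_loss(y, score):
    """Hinge loss max(0, 1 - y * score) for a label y in {-1, +1}."""
    return max(0.0, 1.0 - y * score)

def regularized_risk(w, X, y, C=1.0):
    """0.5 * ||w||^2 + C * sum_i hinge(y_i, w^T x_i) (assumed form)."""
    regularizer = 0.5 * dot(w, w)
    empirical = sum(hinge_loss(yi, dot(w, xi)) for xi, yi in zip(X, y))
    return regularizer + C * empirical

# Toy data (hypothetical): two positives, one negative.
X = [[2.0, 0.0], [0.0, 2.0], [-1.0, 0.0]]
y = [1, 1, -1]
w = [1.0, 0.0]

# Scores are 2, 0, -1, so only the second example incurs hinge loss (1.0).
risk = regularized_risk(w, X, y, C=1.0)
print(risk)
```

Swapping `hinge_loss` for another loss function changes the learner while leaving the regularizer untouched, which is the sense in which different losses yield different learners.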

##### 2.2. Relevant Background

The standard SVM optimizes an estimate of the classification error on the training set and has been shown to be a very powerful tool for classification problems when the data is balanced. However, if the data is highly imbalanced, classification error is not always a good measure, and the standard SVM can be misleading. To solve this problem, a number of modified algorithms have been proposed. For example, Köknar-Tezel and Latecki [13] and Shao et al. [14] proposed approaches to improve SVM on imbalanced datasets, which they called GP and WLTSVM, respectively. However, both works focus on improving sampling techniques for SVM (e.g., modifying SMOTE in GP) and do not address training bias in the design of the SVM learning algorithm per se. Recently, with the advances in learning to rank, direct optimization of ranking measures has been extended to the design of SVMs for imbalanced settings, and a variety of algorithm-oriented methods have been proposed. Joachims [15] and Aiolli [16] presented algorithms to optimize AUC for imbalanced data, and experimental results on imbalanced datasets demonstrated their effectiveness. Along the same lines, Paisitkriangkrai and Narasimhan et al. gave algorithms that optimize partial AUC and successfully applied their approaches to real-world tasks [17–19]. Optimizing the F-measure is another popular method for imbalanced learning. Joachims [15], Chinta et al. [20], Maratea et al. [21], and Lipton et al. [22] used different approximations to the F-measure and designed different classifiers. Numerical experiments on benchmark datasets demonstrated the effectiveness of their algorithms.

However, it is well known that, for evaluation in imbalanced settings, there are many other performance measures besides AUC and the F-measure, including AM (arithmetic mean) [23] and QM (quadratic mean) [24]. The AM is the arithmetic mean of the true positive and true negative rates and can be defined as

$$\mathrm{AM} = \frac{\mathrm{TPR} + \mathrm{TNR}}{2}.$$

The QM is a quadratic mean measure and is defined as

$$\mathrm{QM} = 1 - \frac{(1-\mathrm{TPR})^{2} + (1-\mathrm{TNR})^{2}}{2},$$

where

$$\mathrm{TPR} = \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}}, \qquad \mathrm{TNR} = \frac{\mathrm{TN}}{\mathrm{TN}+\mathrm{FP}}.$$

Although AM and QM are popular in the imbalanced setting, surprisingly little work has focused on designing algorithms based on them. Only very recently did Menon provide a consistent algorithm that directly optimizes the AM measure [25]. This approach is effective, but it is only suitable for the AM measure; whether it can be extended to other measures such as QM is still unknown. In contrast to Menon's work, in this paper we present a general learning framework whose loss function allows us to incorporate different imbalance measures. We exploit it for optimizing AM and QM. In the following, we discuss our approach in detail.
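To make the two measures concrete, the following sketch computes accuracy, AM, and QM for a trivial classifier that always predicts the majority class. It assumes the Q-mean form 1 − ((1 − TPR)² + (1 − TNR)²)/2, which is one of the variants found in the literature; the function names and the toy labels are hypothetical.

```python
def rates(y_true, y_pred):
    """Return (TPR, TNR); assumes both classes occur in y_true."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == -1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == -1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def am(y_true, y_pred):
    """Arithmetic mean of the true positive and true negative rates."""
    tpr, tnr = rates(y_true, y_pred)
    return 0.5 * (tpr + tnr)

def qm(y_true, y_pred):
    """Q-mean, in the assumed form 1 - ((1-TPR)^2 + (1-TNR)^2) / 2."""
    tpr, tnr = rates(y_true, y_pred)
    return 1.0 - 0.5 * ((1.0 - tpr) ** 2 + (1.0 - tnr) ** 2)

def accuracy(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

# Hypothetical imbalanced sample: 2 positives, 8 negatives.
y_true = [1, 1] + [-1] * 8
y_pred = [-1] * 10  # always predict the majority class

print(accuracy(y_true, y_pred), am(y_true, y_pred), qm(y_true, y_pred))
```

The majority-class predictor reaches 80% accuracy here although it never detects a positive example, while both AM and QM drop to 0.5, which is why such measures are preferred under class imbalance.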

#### 3. DOPMID: Direct Optimization of Performance Measure for Imbalanced Dataset

##### 3.1. The Framework of DOPMID

We refer to the classifier we present as DOPMID (Direct Optimization of Performance Measure for Imbalanced Dataset). The framework of DOPMID is based on the structural SVM proposed by Joachims et al. [26]. Specifically, we use the 1-slack SVM formulation, presented in (OP1) (optimization problem 1), to learn a linear $f$. Note that the following approach can be extended to nonlinear functions and non-Euclidean instance spaces by using kernels [27]:

$$\min_{w,\ \xi \ge 0}\ \frac{1}{2} w^{T} w + C\xi$$

$$\text{s.t.}\quad \forall\, \bar{y}' \ne \bar{y}:\quad w^{T}\bigl[\Psi(\bar{x},\bar{y}) - \Psi(\bar{x},\bar{y}')\bigr] \ge \Delta(\bar{y},\bar{y}') - \xi. \tag{5}$$

For simplicity, we assume in this paper that the training dataset has been ordered with the positive instances ahead of the negative ones, and we define $\bar{x} = (x_1, \ldots, x_n)$ and $\bar{y} = (y_1, \ldots, y_n)$, where $n_{+}$ is the number of positive instances, $n_{-}$ is the number of negative instances, and $n = n_{+} + n_{-}$. $\bar{y}'$ stands for any possible permutation of the predicted list obtained from $\bar{x}$ using the parameter $w$. $\Psi$ represents a mapping function from the input list to the output list. $\Delta$ is a function used to measure the difference between the real output $\bar{y}$ and the predicted output $\bar{y}'$. This function must satisfy the conditions given in (6). In contrast to the traditional SVM, which has $n$ slack variables $\xi_i$, there is only a single slack variable $\xi$ in (OP1) above. We refer to it as the "1-slack" SVM.

##### 3.2. The Loss Functions Oriented to AM and QM

For the framework above, we need to further define the functions $\Psi$ and $\Delta$ in order to determine the optimization target.

In this paper, we first define $\Psi$ as in (7). Then we define $\Delta$ oriented to AM and QM, denoted by $\Delta_{AM}$ and $\Delta_{QM}$, as in (8) and (9), respectively. The function appearing in (8) and (9) is an indicator function, which is defined in (10). It is obvious that $\Delta_{AM}$ and $\Delta_{QM}$ defined in (8) and (9) satisfy the conditions in (6). It has been proved that if the function $\Delta$ satisfies (6), the slack $\xi$ is a convex upper bound on the training loss regularized by the norm of the weight vector [26].

In the following, we show that although $\Delta_{AM}$ and $\Delta_{QM}$ both yield upper bounds, $\Delta_{QM}$ gives a lower bound than $\Delta_{AM}$.

Lemma 1. *$\Delta_{QM}$ defined by (9) gives a lower bound than $\Delta_{AM}$ defined by (8).*

*Proof.* Since the slack $\xi$ is a convex upper bound on the training loss, we can rewrite (OP1) accordingly. We then substitute (8) and (9) for $\Delta$ and obtain the AM bound and the QM bound, respectively, the latter being (14). We can simplify (14) as (15). From (16) we obtain (17), and from (18) we obtain (19). Combining inequalities (17) and (19), we get (20), which means that the QM bound is no larger than the AM bound and proves the claim.
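The core inequality behind the lemma can be sketched as follows. This is only an illustration under assumptions that the text above does not spell out: that the AM loss is the average of the false negative rate and the false positive rate, and that the QM loss is the average of their squares. The symbols $a$ and $b$ are introduced here for convenience.

```latex
\begin{align*}
a &= 1 - \mathrm{TPR} \in [0,1], \qquad b = 1 - \mathrm{TNR} \in [0,1], \\
a^{2} &\le a, \qquad b^{2} \le b
  \qquad \text{(a number in $[0,1]$ does not grow when squared)}, \\
\Delta_{QM} = \tfrac{1}{2}\bigl(a^{2} + b^{2}\bigr)
  &\le \tfrac{1}{2}\,(a + b) = \Delta_{AM}.
\end{align*}
```

Under these assumptions the two losses coincide only when both error rates are 0 or 1, so the QM loss is strictly smaller whenever either error rate lies strictly between 0 and 1.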

We can solve (OP1) by substituting (7), (8), and (9) into (5), but one difficulty remains: inequality (5) comprises an exponential number of constraints, one for each possible $\bar{y}'$. To solve this problem, we propose to use the cutting plane algorithm, which is based on the fact that, for any $\epsilon > 0$, a small subset of the constraints is sufficient to find an $\epsilon$-approximate solution to the problem. The details of the cutting plane algorithm are shown in Algorithm 1.
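The size of the constraint set can be illustrated by a brute-force search for the most violated constraint, which is the inner step a cutting plane method repeats. This sketch is not the paper's algorithm. It assumes the usual 1-slack constraint form, a joint feature map equal to the sum of $y'_i x_i$ (as in Joachims's multivariate SVM), and an AM loss equal to the average of the false negative and false positive rates; all function names and the toy scores are hypothetical.

```python
from itertools import product

def am_loss(y_true, y_pred):
    """Assumed AM loss: average of false negative and false positive rates."""
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == -1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    return 0.5 * (fn / n_pos + fp / n_neg)

def most_violated_bruteforce(scores, y_true, loss):
    """Maximize loss(y, y') + sum_i y'_i * scores[i] over all 2^n labelings.

    scores[i] stands for w^T x_i. Under the assumed feature map, the maximizer
    is the labeling whose constraint is violated most. The true labeling is
    included in the search, which is harmless because its loss is zero.
    """
    best, best_val = None, float("-inf")
    for y_pred in product((1, -1), repeat=len(y_true)):
        val = loss(y_true, y_pred) + sum(p * s for p, s in zip(y_pred, scores))
        if val > best_val:
            best, best_val = y_pred, val
    return best, best_val

# Hypothetical toy problem: positives first, then negatives.
y_true = (1, 1, -1, -1)
scores = (0.9, -0.1, 0.2, -0.8)

y_hat, value = most_violated_bruteforce(scores, y_true, am_loss)
n_constraints = 2 ** len(y_true)
print(y_hat, value, n_constraints)
```

Enumeration doubles in cost with every added example, so it is usable only for tiny $n$; this is the reason an efficient procedure for the inner problem is needed in practice.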