Scientific Programming

Volume 2016 (2016), Article ID 8035089, 9 pages

http://dx.doi.org/10.1155/2016/8035089

## A Cost-Sensitive Sparse Representation Based Classification for Class-Imbalance Problem

^{1}School of Computer and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
^{2}School of Electronic Engineering and Automation, Guilin University of Electronic Technology, Guilin 541004, China
^{3}School of Automation, Beijing University of Posts and Telecommunications, Beijing 100876, China

Received 8 August 2016; Accepted 16 October 2016

Academic Editor: Kun Hua

Copyright © 2016 Zhenbing Liu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Sparse representation has been successfully used in pattern recognition and machine learning. However, most existing sparse representation based classification (SRC) methods aim to achieve the highest classification accuracy, assuming the same losses for different misclassifications. This assumption may not hold in many practical applications, as different types of misclassification can lead to different losses. Moreover, in real-world applications, many data sets have imbalanced class distributions. To address these problems, we propose a cost-sensitive sparse representation based classification (CSSRC) method for the class-imbalance problem by using probabilistic modeling. Unlike traditional SRC methods, we predict the class label of a test sample by minimizing the misclassification losses, which are obtained via the posterior probabilities. Experimental results on the UCI databases validate the efficacy of the proposed approach in terms of average misclassification cost, positive-class misclassification rate, and negative-class misclassification rate. In addition, we sampled test samples and training samples with different imbalance ratios and used the F-measure, G-mean, classification accuracy, and running time to evaluate the performance of the proposed method. The experiments show that our proposed method performs competitively compared to SRC, CSSVM, and CS4VM.

#### 1. Introduction

As a powerful tool for statistical signal modeling, sparse representation (or sparse coding) has been successfully used in pattern recognition fields [1], such as texture classification [2] and face recognition [3, 4], in the past few years. In [3], Wright et al. proposed a sparse representation based classification (SRC) method for face recognition under various illuminations and occlusions, which represents an input test image as a sparse linear combination of training images and assigns the test image to the class whose training samples can best reconstruct it. In their work, they used an $\ell_1$-regularizer rather than an $\ell_0$-regularizer to regularize the objective function and then calculated the residuals between the original test sample and the reconstructed one to identify the query image's label. This sparse representation based classification framework has achieved great success in face recognition and has boosted research on sparsity-related machine learning methods.

Traditional classification algorithms [5], including SRC, are designed to achieve the lowest recognition error and assume the same losses for different types of misclassification. However, this assumption may not be suitable for many real-world applications. For example, a gallery subject who is misclassified as an impostor and not allowed to enter a room controlled by a face recognition system suffers only inconvenience, but an impostor misclassified as a gallery subject and allowed to enter may cause a serious loss. In such settings, the loss of misclassification should be taken into consideration, and "cost" information can be introduced to measure the severity of a misclassification. In recent years, many cost-sensitive methods have been proposed. Typical works include the Cost-Sensitive Semisupervised Support Vector Machine (CS4VM) and Cost-Sensitive Laplacian Support Vector Machines (CSLSVM) proposed by Zhou et al. [6, 7], a cost-sensitive Naïve Bayes method from a novel perspective of inferring the order relation of costs [8] proposed by Fang et al., and a novel cost-sensitive approach proposed by Castro and Braga to improve the performance of the multilayer perceptron [9]. In [10], an instance weighting method was incorporated into various Bayesian network classifiers; the instance weighting modifies the probability estimates of the classifiers and thereby makes them cost-sensitive. In [11], Lo et al. presented a basis expansion model for multilabel classification to handle the cost-sensitive multilabel classification problem, where each basis function is an LP classifier trained on a random label set. In [12], Wan et al. proposed a cost-sensitive feature selection method called Discriminative Cost-Sensitive Laplacian Score (DCSLS) for face recognition, which incorporates the idea of local discriminant analysis into the Laplacian Score.

Cost-sensitive learning often coexists with class-imbalance in applications whose goal is to minimize the total misclassification cost [13]. Class-imbalance is considered one of the most challenging problems in machine learning and data mining. The imbalance ratio (the size of the majority class relative to the minority class) can be as large as 100, or even 10000. Much work has been done to address the class-imbalance problem, and cost-sensitive learning is an effective way to deal with imbalanced data classification. In recent years, cost-sensitive learning has been studied widely and has become one of the most important topics for solving the class-imbalance problem. In [14], Zhou and Liu empirically studied the effect of sampling and threshold-moving in training cost-sensitive neural networks and revealed that threshold-moving and soft-ensemble are relatively good choices. There are also cost-sensitive learning methods that improve existing algorithms: in [15], Sun et al. proposed cost-sensitive boosting algorithms developed by introducing cost items into the learning framework of AdaBoost. Another strategy for the class-imbalance problem is based on changing the distribution of the data set. In [16], Jiang et al. proposed a novel Minority Cloning Technique (MCT) for class-imbalanced cost-sensitive learning; MCT alters the class distribution of the training data by cloning each minority class instance according to the similarity between it and the mode of the minority class. Generally, users focus more on the minority class and consider the cost of misclassifying a minority class instance to be more expensive. In our study, we adopt the same strategy to address this problem.

In [17], a probabilistic cost-sensitive classifier was proposed for face recognition; it utilizes a probabilistic model to estimate the posterior probability of a test sample and computes all the misclassification losses via the posterior probabilities. Motivated by this probabilistic model and by probabilistic subspace clustering [17–19], we propose a new method to handle misclassification cost. In sparse representation, a coefficient with a larger value plays a more important role in reconstruction [20]. In the extreme, the coefficient equals 1 when a query sample is represented by a dictionary that contains the query sample itself. Just as a sample close to the mean vector of a Gaussian distribution has a higher probability, a class whose coefficients carry more weight should be more probable. Inspired by this probabilistic model, we use the coefficient vector to calculate the posterior probabilities rather than the noise (residual) distribution used in [17], which must itself be estimated. The main advantage of our method is its reduced computational complexity and cost, and its contribution is obtaining the posterior probability from the coefficient vector of the sparse representation. After calculating all the misclassification losses via the posterior probabilities, the test sample is assigned to the class whose loss is minimal. Experimental results on UCI databases validate the effectiveness and efficiency of our method.

This paper is organized as follows. Section 2 outlines the details of the relevant method. Section 3 presents the details of the proposed algorithm. Section 4 reports the experiments. Finally, Section 5 concludes the paper and offers suggestions for future research.

#### 2. Related Works

In this section, we briefly introduce some related works, including sparse representation based classification and cost-sensitive learning framework.

##### 2.1. Sparse Representation Based Classification

Sparse representation is a typical method in machine learning [3, 21, 22], which uses labeled training samples from distinct object classes to learn a dictionary and determine the label of an unseen test sample. We denote the data set of $n_i$ training samples from the $i$th class as a matrix $A_i = [v_{i,1}, v_{i,2}, \ldots, v_{i,n_i}] \in \mathbb{R}^{m \times n_i}$, and $n = \sum_{i=1}^{k} n_i$ is the number of all training samples, where $k$ is the number of classes in the training set. Given sufficient training samples of the $i$th class, any test sample $y \in \mathbb{R}^m$ from the same class can be approximately represented linearly by the training samples of class $i$:

$$y = \alpha_{i,1} v_{i,1} + \alpha_{i,2} v_{i,2} + \cdots + \alpha_{i,n_i} v_{i,n_i}. \tag{1}$$

Then, rewrite the above representation of $y$ in matrix form as $y = A_i x_i$, where $x_i = [\alpha_{i,1}, \ldots, \alpha_{i,n_i}]^T$. Then, define a new matrix $A$ for the entire training set as follows:

$$A = [A_1, A_2, \ldots, A_k] = [v_{1,1}, v_{1,2}, \ldots, v_{k,n_k}]. \tag{2}$$

Many distance-based methods are not robust in real-world applications because of various occlusions. To overcome this limitation, Wright introduced the sparse representation based classification method to represent the query image. Then, the linear representation of $y$ can be rewritten in terms of all training samples as

$$y = A x_0, \tag{3}$$

where $x_0 = [0, \ldots, 0, \alpha_{i,1}, \ldots, \alpha_{i,n_i}, 0, \ldots, 0]^T \in \mathbb{R}^n$, whose entries are zero except those associated with the $i$th class. This motivates us to seek the sparsest solution by solving the following optimization problem:

$$\hat{x}_0 = \arg\min_x \|x\|_0 \quad \text{subject to} \quad Ax = y, \tag{4}$$

where $\|\cdot\|_0$ denotes the $\ell_0$-norm, which counts the number of nonzero entries in a vector. However, the above problem of finding the sparsest solution ($\ell_0$-norm minimization) is nonconvex and actually NP-hard. Generally, if the solution sought is sparse enough, the solution of the $\ell_0$-minimization problem is equal to the solution of the following $\ell_1$-minimization problem [4, 22, 23]:

$$\hat{x}_1 = \arg\min_x \|x\|_1 \quad \text{subject to} \quad Ax = y. \tag{5}$$

Real data are noisy, so the model may not represent the test sample exactly. To deal with the noise, Wright et al. extended the $\ell_1$-norm minimization problem to the following formulation:

$$y = A x_0 + z, \tag{6}$$

where $z \in \mathbb{R}^m$ is a noise term with bounded energy $\|z\|_2 < \varepsilon$. The sparse solution can still be obtained by solving the following stable $\ell_1$-minimization problem:

$$\hat{x}_1 = \arg\min_x \|x\|_1 \quad \text{subject to} \quad \|Ax - y\|_2 \le \varepsilon. \tag{7}$$

To better harness such linear structure, they instead classify based on how well the coefficients associated with the training samples of each object class reproduce $y$. Let $\hat{x}_1$ be the solution of (7); for each class $i$, let $\delta_i : \mathbb{R}^n \to \mathbb{R}^n$ be the characteristic function that selects the coefficients associated with the $i$th class. Using these coefficients, one can approximate the given test sample as $\hat{y}_i = A \delta_i(\hat{x}_1)$. They then compute the residual (Euclidean distance) between $y$ and $\hat{y}_i$:

$$r_i(y) = \left\|y - A \delta_i(\hat{x}_1)\right\|_2. \tag{8}$$

The label of the test sample $y$ can then be identified by minimizing $r_i(y)$:

$$\text{identity}(y) = \arg\min_i r_i(y). \tag{9}$$
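As an illustration, the SRC scheme above can be sketched in Python. The paper does not prescribe a particular $\ell_1$ solver, so the `ista` loop below (iterative soft-thresholding for the lasso form of the problem) is an illustrative stand-in, and all function names are assumptions:

```python
import numpy as np

def ista(A, y, lam=0.05, n_iter=1000):
    """Approximately solve min_x 0.5*||Ax - y||_2^2 + lam*||x||_1
    by iterative soft-thresholding (a simple stand-in l1 solver)."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2        # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        g = x - step * A.T @ (A @ x - y)          # gradient step on the data term
        x = np.sign(g) * np.maximum(np.abs(g) - step * lam, 0.0)  # soft shrinkage
    return x

def src_classify(A, labels, y):
    """SRC: keep each class's coefficients in turn, reconstruct y,
    and return the class with the smallest residual r_i(y)."""
    x = ista(A, y)
    classes = np.unique(labels)
    residuals = np.array([
        np.linalg.norm(y - A @ np.where(labels == c, x, 0.0)) for c in classes
    ])
    return classes[np.argmin(residuals)], residuals
```

For a toy dictionary whose first two (unit-norm) columns come from class 0 and last two from class 1, a test vector lying near the class-0 columns is assigned label 0, since only the class-0 coefficients can reproduce it with a small residual.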

##### 2.2. Cost-Sensitive Function

In multiclass cost-sensitive learning, consider $G$ gallery subjects with class labels $\{1, 2, \ldots, G\}$ and many impostors, whose labels are all denoted as $G+1$. In [7], Zhang and Zhou categorized the costs into three types: the cost of false acceptance $C_{FA}$, the cost of false rejection $C_{FR}$, and the cost of false identification $C_{FI}$. Empirically, it is evident that $C_{FA}$, $C_{FR}$, and $C_{FI}$ are unequal. A cost setting is given according to the users, and $C_{FA}$, $C_{FR}$, and $C_{FI}$ are reassigned accordingly; here, for ease of understanding, we preserve the original formulation. We can construct a multiclass cost matrix $C \in \mathbb{R}^{(G+1)\times(G+1)}$ as shown in

$$C_{ij} = \begin{cases} 0, & i = j, \\ C_{FI}, & i, j \le G,\ i \ne j, \\ C_{FR}, & i \le G,\ j = G+1, \\ C_{FA}, & i = G+1,\ j \le G, \end{cases} \tag{10}$$

where $C_{ij}$ indicates the cost of misclassifying a sample of the $i$th class as the $j$th class. The diagonal elements of $C$ are all zero since there is no loss for correct recognition.
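For concreteness, such a three-cost matrix can be assembled as follows. The numeric cost values here are hypothetical, chosen only to reflect one plausible ordering in which accepting an impostor is the most expensive error:

```python
import numpy as np

G = 3                                  # gallery subjects; index G is the impostor class
C_FA, C_FR, C_FI = 10.0, 1.0, 5.0      # hypothetical cost values

C = np.full((G + 1, G + 1), C_FI)      # gallery subject i misidentified as subject j
C[:G, G] = C_FR                        # gallery subject falsely rejected as an impostor
C[G, :G] = C_FA                        # impostor falsely accepted as a gallery subject
np.fill_diagonal(C, 0.0)               # correct decisions incur no cost
```

Row index is the true class and column index the predicted class, matching the convention that $C_{ij}$ is the cost of misclassifying class $i$ as class $j$.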

Cost-sensitive learning usually takes the misclassification cost as the objective function and identifies the label by minimizing the loss function. Given a test sample $y$ and a candidate class label $j$, the predicted label is obtained by minimizing the objective function:

$$c^* = \arg\min_{j \in \{1, \ldots, G+1\}} \text{loss}(y, j), \tag{11}$$

where

$$\text{loss}(y, j) = \sum_{i=1}^{G+1} P(i \mid y)\, C_{ij}, \tag{12}$$

where $c^*$ is the optimal prediction of $y$ and $\{1, \ldots, G\}$ represents the gallery subjects in the classification problem.
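A minimal sketch of this expected-cost decision rule, with an illustrative binary cost matrix chosen to show that the minimum-cost label need not be the maximum-posterior label (all values hypothetical):

```python
import numpy as np

def min_cost_label(posteriors, C):
    """Return the label minimizing the expected misclassification cost
    loss(j) = sum_i P(i | y) * C[i, j]."""
    losses = posteriors @ C
    return int(np.argmin(losses)), losses

# Class 0 is more probable, but misclassifying a true class-1 sample
# as class 0 is ten times more costly than the converse.
posteriors = np.array([0.6, 0.4])
C = np.array([[0.0, 1.0],
              [10.0, 0.0]])
label, losses = min_cost_label(posteriors, C)
# losses = [0.4 * 10, 0.6 * 1] = [4.0, 0.6], so the cheaper prediction is class 1
```

Even though class 0 has the higher posterior, the asymmetric costs shift the decision to class 1; this is exactly the behavior the cost matrix is meant to induce.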

#### 3. Cost-Sensitive SRC

As described in Section 2.1, the residuals, i.e., the Euclidean distances between the reconstructed samples $\hat{y}_i$ and the original test sample $y$, are calculated to identify the class label of a test sample [5]. In cost-sensitive learning, the loss function (see (12)) is regarded as an objective function to identify the label of a test sample. In a binary classification problem, there are two misclassification costs: we denote by $C_{+-}$ the cost of misclassifying the positive class as negative and by $C_{-+}$ the cost of the converse. Then a cost matrix can be constructed as shown in

$$C = \begin{pmatrix} 0 & C_{+-} \\ C_{-+} & 0 \end{pmatrix}, \tag{13}$$

where $+1$ and $-1$ represent the labels of the minority class and the majority class, respectively.

It is well known that the loss function can be related to the posterior probability $P(c \mid y)$. Then the loss function can be rewritten as follows:

$$\text{loss}(y, +1) = P(-1 \mid y)\, C_{-+}, \qquad \text{loss}(y, -1) = P(+1 \mid y)\, C_{+-}. \tag{14}$$

Other things being equal, the test sample belongs to the class with the higher posterior probability. Now, we will estimate $P(+1 \mid y)$ and $P(-1 \mid y)$.

In the coefficient vector, the larger an element's value is, the more important the role it plays in reconstructing a test sample. In other words, the test sample is best represented by training samples sharing its class label, with no samples from a different class appearing in the linear combination. The posterior probability can therefore be related to the coefficient vector. Accordingly, we rewrite the solution of (7) as $\hat{x}_1 = [\hat{x}_+; \hat{x}_-]$, where $\hat{x}_+ \in \mathbb{R}^{n_+}$ and $\hat{x}_- \in \mathbb{R}^{n_-}$ represent the positive-class coefficients and the negative-class coefficients, respectively. Here, $n_+$ is the number of positive samples and $n_-$ is the number of negative samples in the dictionary. Then, we can obtain the posterior probabilities:

$$P(+1 \mid y) = \frac{\|\hat{x}_+\|_1}{S}, \qquad P(-1 \mid y) = \frac{\|\hat{x}_-\|_1}{S}, \tag{15}$$

where $S = \|\hat{x}_+\|_1 + \|\hat{x}_-\|_1$. Then, (14) can be written as

$$\text{loss}(y, +1) = \frac{\|\hat{x}_-\|_1}{S}\, C_{-+}, \qquad \text{loss}(y, -1) = \frac{\|\hat{x}_+\|_1}{S}\, C_{+-}. \tag{16}$$

We can obtain the label of a test sample by minimizing (16):

$$\text{label}(y) = \arg\min_{c \in \{+1, -1\}} \text{loss}(y, c). \tag{17}$$

The whole process of CSSRC is described in Algorithm 1.
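A small sketch of this coefficient-based posterior estimate, assuming (as read here) an $\ell_1$ normalization of the two coefficient blocks; the function name is illustrative:

```python
import numpy as np

def coefficient_posteriors(x_hat, n_pos):
    """Split the sparse coefficient vector into its positive-class block
    (first n_pos entries) and negative-class block, and normalize the
    l1 mass of each block to estimate P(+1 | y) and P(-1 | y)."""
    s_pos = np.abs(x_hat[:n_pos]).sum()
    s_neg = np.abs(x_hat[n_pos:]).sum()
    total = s_pos + s_neg
    return s_pos / total, s_neg / total

# Coefficients concentrated on the positive-class atoms give a high P(+1 | y):
p_pos, p_neg = coefficient_posteriors(np.array([0.7, 0.2, 0.05, 0.05]), n_pos=2)
# p_pos = 0.9, p_neg = 0.1
```

This replaces the noise-distribution estimation of [17] with a quantity that falls out of the sparse coding step for free, which is where the computational saving comes from.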

*Algorithm 1 (CSSRC algorithm).*

*Input*. Dictionary $A$, test sample $y$.
*Output*. The label of the test sample $y$.
(1) Normalize the columns of $A$ to unit $\ell_2$-norm.
(2) Solve the $\ell_1$-minimization problem
$$\hat{x}_1 = \arg\min_x \|x\|_1 \quad \text{subject to} \quad Ax = y,$$
or alternatively solve
$$\hat{x}_1 = \arg\min_x \|x\|_1 \quad \text{subject to} \quad \|Ax - y\|_2 \le \varepsilon.$$
Assume the solution is $\hat{x}_1 = [\hat{x}_+; \hat{x}_-]$.
(3) Calculate the loss function
$$\text{loss}(y, +1) = \frac{\|\hat{x}_-\|_1}{S}\, C_{-+}, \qquad \text{loss}(y, -1) = \frac{\|\hat{x}_+\|_1}{S}\, C_{+-},$$
where $S = \|\hat{x}_+\|_1 + \|\hat{x}_-\|_1$.
(4) Obtain the label of $y$:
$$\text{label}(y) = \arg\min_{c \in \{+1, -1\}} \text{loss}(y, c).$$
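The four steps of Algorithm 1 can be sketched end to end in Python. The ISTA loop is an illustrative stand-in for the $\ell_1$ solver, the cost arguments are user-supplied, and all names are assumptions rather than the paper's implementation:

```python
import numpy as np

def ista(A, y, lam=0.05, n_iter=1000):
    """Iterative soft-thresholding as a simple stand-in l1 solver."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        g = x - step * A.T @ (A @ x - y)
        x = np.sign(g) * np.maximum(np.abs(g) - step * lam, 0.0)
    return x

def cssrc(A, labels, y, cost_pos_as_neg, cost_neg_as_pos):
    """CSSRC sketch for binary labels in {+1, -1}: estimate posteriors from
    the l1 mass of each class's coefficients, then pick the cheaper label."""
    A = A / np.linalg.norm(A, axis=0)          # step 1: unit l2-norm columns
    x = ista(A, y)                             # step 2: sparse coding
    s_pos = np.abs(x[labels == 1]).sum()
    s_neg = np.abs(x[labels == -1]).sum()
    p_pos = s_pos / (s_pos + s_neg)
    p_neg = s_neg / (s_pos + s_neg)
    loss_pos = p_neg * cost_neg_as_pos         # step 3: expected losses
    loss_neg = p_pos * cost_pos_as_neg
    return 1 if loss_pos < loss_neg else -1    # step 4: minimum-loss label
```

With a heavily skewed cost on misclassifying the minority positive class, the decision shifts toward predicting $+1$, which is the intended cost-sensitive behavior.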

#### 4. Experiments

##### 4.1. Data Sets and Experimental Setting

We test the proposed method on seven UCI data sets [24]. Detailed information about these data sets is summarized in Table 1.