Applied Computational Intelligence and Soft Computing

Volume 2016, Article ID 2783568, 9 pages

http://dx.doi.org/10.1155/2016/2783568

## Low-Rank Kernel-Based Semisupervised Discriminant Analysis

^{1}School of Electronic and Information Engineering, Hebei University of Technology, Tianjin 300401, China

^{2}Key Lab of Big Data Computation of Hebei Province, Tianjin 300401, China

Received 16 April 2016; Accepted 14 June 2016

Academic Editor: Yu Cao

Copyright © 2016 Baokai Zu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Semisupervised Discriminant Analysis (SDA) aims at dimensionality reduction with both limited labeled data and copious unlabeled data, but it may fail to discover the intrinsic geometry structure when the data manifold is highly nonlinear. The kernel trick is widely used to map the original nonlinearly separable problem to an intrinsically higher dimensional space where the classes become linearly separable. Inspired by low-rank representation (LRR), we propose a novel kernel SDA method called low-rank kernel-based SDA (LRKSDA), in which LRR is used as the kernel representation. Since LRR can capture the global data structures and obtain the lowest rank representation in a parameter-free way, the low-rank kernel method is extremely effective and robust for various kinds of data. Extensive experiments on public databases show that the proposed LRKSDA dimensionality reduction algorithm achieves better performance than other related kernel SDA methods.

#### 1. Introduction

For many real world data mining and pattern recognition applications, labeled data are expensive or difficult to obtain, while unlabeled data are often copious and readily available. How to use both labeled and unlabeled data to improve performance has therefore become a significant problem [1, 2]. Recently, semisupervised dimensionality reduction, which can exploit the whole database directly, has attracted considerable attention [3]. Inspired by semisupervised learning (SSL), many methods have been put forward to relieve the so-called small sample size (SSS) problem of LDA [4, 5]. Semisupervised Discriminant Analysis (SDA) was first proposed by Cai et al. [2]; it easily resolves the out-of-sample problem [6] and is well suited to real world applications. In the SDA algorithm, the labeled samples are used to maximize the separability between different classes, and the unlabeled ones are used to estimate the intrinsic geometric information of the data.

Semisupervised Discriminant Analysis may fail to discover the intrinsic geometry structure when the data manifold is highly nonlinear [2, 7]. The kernel trick [8] has been widely used to generalize linear dimensionality reduction algorithms to nonlinear ones; it maps the original nonlinearly separable problem to an intrinsically higher dimensional space where the classes become linearly separable. Kernel SDA (KSDA) [2, 7] can thus discover the underlying subspace more exactly in the feature space, which yields a better subspace for the classification task via a nonlinear learning technique. Cai et al. discussed how to perform SDA in a Reproducing Kernel Hilbert Space (RKHS), which gives rise to kernel SDA [2]. You et al. presented the derivations of a first approach to optimizing the parameters of a kernel; it maps the original class distributions to a space where these are optimally (with respect to Bayes) separated by a hyperplane [7]. A new kernel-based nonlinear discriminant analysis algorithm was proposed to address the fundamental limitations of LDA [9]. A novel KFDA kernel parameter optimization criterion was presented for simultaneously maximizing the uniformity of class-pair separabilities and the class separability in kernel space [10]. To overcome LFDA's limitations in nonlinear dimensionality reduction and in adopting multiple features, Wang and Sun proposed a new dimensionality reduction algorithm called multiple kernel local Fisher discriminant analysis (MKLFDA), based on multiple kernel learning [11]. The kernelization of graph embedding applies the kernel trick to the linear graph embedding algorithm to handle data with nonlinear distributions [12]. Weinberger et al. described an algorithm for nonlinear dimensionality reduction based on semidefinite programming and kernel matrix factorization, which learns a kernel matrix for high dimensional data that lies on or near a low-dimensional manifold [13].

Low-rank matrix decomposition and completion have recently become very popular since Yang et al. and Chen et al. showed that a robust estimate of an underlying subspace can be obtained by decomposing the observations into a low-rank matrix and a sparse error matrix [14, 15]. Liu et al. proposed a low-rank representation method which is robust to noise and data corruptions due to its ability to separate noise from the data set [14]. More recently, low-rank representation [16, 17], as a promising method to capture the underlying low-dimensional structures of data, has attracted much attention in the pattern analysis and signal processing communities. The LRR method [16–18] seeks the lowest rank representation of all data jointly, such that each data point can be represented as a linear combination of some bases.

The major problem of kernel methods is finding proper kernel parameters. These kernel methods usually use fixed global parameters to determine the kernel matrix and are very sensitive to the parameter setting. In fact, the most suitable kernel parameters may vary greatly across different random distributions of the same data. Moreover, the kernel mapping of KSDA always analyzes the relationship of the data in a one-to-others mode, which emphasizes local information and lacks global constraints on the solutions. These shortcomings limit the performance and efficiency of KSDA methods. To overcome the disadvantages of traditional kernel methods, inspired by LRR, we propose a novel kernel-based Semisupervised Discriminant Analysis called low-rank kernel-based SDA (LRKSDA), where the low-rank representation is used as the kernel method. Compared with other kernels, the low-rank kernel jointly obtains the representation of all the samples under a global low-rank constraint [19]. Thus it is better at capturing the global data structures and very robust to different random distributions of the data set. In addition, we can get the lowest rank representation in a parameter-free way, which is very convenient and robust for various kinds of data. Extensive experiments on public databases show that our proposed LRKSDA dimensionality reduction algorithm achieves better performance than other related methods.

The rest of the paper is organized as follows. Section 2 gives a brief overview of SDA. Section 3 introduces the low-rank kernel-based SDA framework. Section 4 reports experimental results on real world databases. Section 5 concludes the paper.

#### 2. Overview of SDA

Given a set of samples $X = [x_1, x_2, \ldots, x_n]$, where $x_i \in \mathbb{R}^{d}$, the first $l$ samples are labeled and the remaining $n - l$ are unlabeled. They all belong to $c$ classes. SDA [2] seeks a projection vector $a$, which motivates us to present the prior assumption of consistency by a regularizer term. The objective function is as follows:

$$a_{\mathrm{opt}} = \arg\max_{a} \frac{a^{T} S_b a}{a^{T} S_t a + \alpha J(a)} \quad (1)$$

where $S_b$ and $S_t$ are the between-class scatter and total scatter matrices. The within-class scatter matrix $S_w$ is defined as

$$S_w = \sum_{k=1}^{c} \sum_{i=1}^{l_k} \left(x_i^{(k)} - \mu^{(k)}\right)\left(x_i^{(k)} - \mu^{(k)}\right)^{T} \quad (2)$$

where $\mu$ is the mean vector of all labeled samples, $l_k$ is the number of samples in the $k$th class, $\mu^{(k)}$ is the mean vector of the $k$th class, and $x_i^{(k)}$ is the $i$th sample in the $k$th class, so that $S_t = S_b + S_w$.

The parameter $\alpha$ in (1) balances the model complexity and the empirical loss. The regularizer term $J(a)$ gives us the flexibility to incorporate prior knowledge into the application. We aim at constructing a graph that captures the manifold structure through the available unlabeled samples [2]. The key of an SSL algorithm is the prior assumption of consistency. For classification, it means that nearby samples are likely to have the same label [20]. For dimensionality reduction, it implies that nearby samples have similar embeddings (low-dimensional representations).

Given a set of samples $\{x_i\}_{i=1}^{n}$, we can construct a graph to represent the relationship between nearby samples by the $k$-NN algorithm, putting an edge between samples that are nearest neighbors of each other. The corresponding weight matrix is defined as follows:

$$W_{ij} = \begin{cases} 1, & \text{if } x_i \in N_k(x_j) \text{ or } x_j \in N_k(x_i), \\ 0, & \text{otherwise,} \end{cases} \quad (3)$$

where $N_k(x_i)$ denotes the set of $k$ nearest neighbors of $x_i$. Then the term $J(a)$ can be defined as follows:

$$J(a) = \sum_{i,j} \left(a^{T} x_i - a^{T} x_j\right)^{2} W_{ij} = 2\, a^{T} X L X^{T} a \quad (4)$$

where $D$ is a diagonal matrix whose entries are the column (or row, since $W$ is symmetric) sums of $W$; that is, $D_{ii} = \sum_j W_{ij}$. The Laplacian matrix [21] is $L = D - W$.
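The graph construction above can be sketched in a few lines of NumPy (a minimal illustration; the function name, the column-per-sample layout, and the default $k$ are assumptions, not from the paper):

```python
import numpy as np

def knn_graph_laplacian(X, k=4):
    """Build the symmetric k-NN weight matrix W and graph Laplacian L = D - W.

    X : (d, n) array, one sample per column (illustrative layout).
    An edge of weight 1 joins x_i and x_j if either is among the other's
    k nearest neighbours, matching the 0/1 weights in the text.
    """
    n = X.shape[1]
    # pairwise squared Euclidean distances
    sq = np.sum(X**2, axis=0)
    dist = sq[:, None] + sq[None, :] - 2 * X.T @ X
    np.fill_diagonal(dist, np.inf)            # exclude self-matches
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(dist[i])[:k]        # k nearest neighbours of x_i
        W[i, nbrs] = 1
    W = np.maximum(W, W.T)                    # edge if either direction holds
    D = np.diag(W.sum(axis=1))                # degree matrix
    return W, D - W                           # Laplacian L = D - W
```

Each row of the resulting Laplacian sums to zero, which is what makes $J(a)$ in (4) penalize differences between the embeddings of connected samples.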

We can now write the objective function of SDA with the regularizer term [2]:

$$a_{\mathrm{opt}} = \arg\max_{a} \frac{a^{T} S_b a}{a^{T} \left(S_t + \alpha X L X^{T}\right) a} \quad (5)$$

By solving the corresponding generalized eigenvalue problem, we obtain the projective vector $a$:

$$S_b a = \lambda \left(S_t + \alpha X L X^{T}\right) a \quad (6)$$
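The eigenproblem (6) can be sketched as follows (a hedged NumPy/SciPy illustration; the function name, the small numerical ridge, and the convention that the labeled samples occupy the first $l$ columns are assumptions):

```python
import numpy as np
from scipy.linalg import eigh

def sda_projection(X, y, L, alpha=0.1, dim=2):
    """Sketch of the SDA solver: S_b a = lam (S_t + alpha X L X^T) a.

    X : (d, n) data matrix, one sample per column, first l columns labeled;
    y : length-l labels; L : (n, n) graph Laplacian over all n samples.
    """
    d, n = X.shape
    l = len(y)
    Xl = X[:, :l]
    mu = Xl.mean(axis=1, keepdims=True)               # overall labeled mean
    Sb = np.zeros((d, d))
    for c in np.unique(y):
        Xc = Xl[:, y == c]
        mc = Xc.mean(axis=1, keepdims=True)
        Sb += Xc.shape[1] * (mc - mu) @ (mc - mu).T   # between-class scatter
    St = (Xl - mu) @ (Xl - mu).T                      # total scatter
    # Laplacian-regularized denominator; tiny ridge keeps it positive definite
    reg = St + alpha * X @ L @ X.T + 1e-6 * np.eye(d)
    vals, vecs = eigh(Sb, reg)                        # generalized eigenproblem
    return vecs[:, ::-1][:, :dim]                     # top-dim eigenvectors
```

`scipy.linalg.eigh` returns eigenvalues in ascending order, so the columns are reversed to take the leading projective vectors.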

#### 3. Low-Rank Kernel-Based SDA Framework

##### 3.1. Low-Rank Representation

Yan and Wang [22] proposed sparse representation (SR) to construct the $\ell^1$-graph [23] by solving an $\ell^1$ optimization problem. However, the $\ell^1$-graph lacks global constraints, which greatly reduces performance when the data is grossly corrupted. To address this drawback, Liu et al. proposed low-rank representation and used it to construct the affinities of an undirected graph (here called the LR-graph) [19]. It jointly obtains the representation of all the samples under a global low-rank constraint, and thus it is better at capturing the global data structures [24].

Let $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{d \times n}$ be a set of samples; each column is a sample which can be represented by a linear combination of the dictionary $A$ [19]. Here, we select the samples themselves as the dictionary, $A = X$:

$$X = XZ \quad (7)$$

where $Z = [z_1, z_2, \ldots, z_n]$ is the coefficient matrix with each $z_i$ being the representation coefficient of $x_i$. Different from SR, which may not capture the global structure of the data, LRR seeks the lowest rank solution by solving the following optimization problem [19]:

$$\min_{Z} \operatorname{rank}(Z) \quad \text{s.t. } X = XZ \quad (8)$$

The above optimization problem can be relaxed to the following convex optimization [25]:

$$\min_{Z} \|Z\|_{*} \quad \text{s.t. } X = XZ \quad (9)$$

Here $\|\cdot\|_{*}$ denotes the nuclear norm (or trace norm) [26] of a matrix, that is, the sum of the matrix's singular values. Considering the noise or corruption present in real world applications, a more reasonable objective function is

$$\min_{Z, E} \|Z\|_{*} + \lambda \|E\|_{\ell} \quad \text{s.t. } X = XZ + E \quad (10)$$

where $\|E\|_{\ell}$ can be the $\ell^{2,1}$-norm or the $\ell^{1}$-norm. In this paper we choose the $\ell^{2,1}$-norm as the error term, defined as $\|E\|_{2,1} = \sum_{j=1}^{n} \sqrt{\sum_{i=1}^{d} \left([E]_{ij}\right)^{2}}$. The parameter $\lambda$ is used to balance the effect of the low-rank term and the error term. The optimal solution can be obtained via the inexact augmented Lagrange multipliers (ALM) method [27, 28].
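The inexact-ALM solver referenced above can be sketched roughly as follows. This is a simplified illustration of the standard LRR updates (auxiliary variable $J = Z$, singular value thresholding for the nuclear norm, and the column-wise $\ell^{2,1}$ proximal step); the function name and all parameter defaults are illustrative, not the exact settings of [27, 28]:

```python
import numpy as np

def lrr_ialm(X, lam=0.1, mu=1e-2, rho=1.5, mu_max=1e6, tol=1e-6, max_iter=500):
    """Simplified inexact-ALM sketch for LRR:
        min_{Z,E} ||Z||_* + lam * ||E||_{2,1}  s.t.  X = X Z + E
    with auxiliary variable J = Z.
    """
    d, n = X.shape
    Z = np.zeros((n, n)); J = np.zeros((n, n)); E = np.zeros((d, n))
    Y1 = np.zeros((d, n)); Y2 = np.zeros((n, n))     # Lagrange multipliers
    XtX = X.T @ X
    inv = np.linalg.inv(np.eye(n) + XtX)             # reused in every Z-step
    for _ in range(max_iter):
        # J-step: singular value thresholding at level 1/mu
        U, s, Vt = np.linalg.svd(Z + Y2 / mu, full_matrices=False)
        J = U @ np.diag(np.maximum(s - 1.0 / mu, 0)) @ Vt
        # Z-step: closed-form least-squares update
        Z = inv @ (XtX - X.T @ E + J + (X.T @ Y1 - Y2) / mu)
        # E-step: column-wise shrinkage (the l2,1 proximal operator)
        Q = X - X @ Z + Y1 / mu
        norms = np.linalg.norm(Q, axis=0)
        E = Q * (np.maximum(norms - lam / mu, 0) / (norms + 1e-12))
        # dual ascent on the two constraints
        r1 = X - X @ Z - E
        r2 = Z - J
        Y1 += mu * r1
        Y2 += mu * r2
        mu = min(rho * mu, mu_max)
        if max(np.abs(r1).max(), np.abs(r2).max()) < tol:
            break
    return Z, E
```

On clean low-rank data the recovered $Z$ satisfies $X \approx XZ + E$ with $E$ absorbing column-wise outliers, which is what makes the decomposition robust to sample-specific corruption.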

##### 3.2. Kernel SDA

Semisupervised Discriminant Analysis may fail to discover the intrinsic geometry structure when the data manifold is highly nonlinear. The kernel trick is a popular technique in machine learning which uses a kernel function to map samples to a high dimensional space [8, 29, 30]. By using the kernel trick, we can nonlinearly map the original data to the kernel feature space.

Let $\phi$ be a nonlinear mapping from the input space into a feature space $\mathcal{F}$. For any two points $x_i$ and $x_j$, we use a kernel function $k(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle$ to map the data into a kernel feature space. Commonly used kernels include the Gaussian radial basis function (RBF) kernel $k(x_i, x_j) = \exp\left(-\|x_i - x_j\|^{2} / 2\sigma^{2}\right)$, the polynomial kernel $k(x_i, x_j) = \left(x_i^{T} x_j + c\right)^{d}$, and the sigmoid kernel $k(x_i, x_j) = \tanh\left(\kappa\, x_i^{T} x_j + \theta\right)$ [2, 31].
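The Gram matrices for these standard kernels can be computed as follows (a minimal NumPy sketch; function names and the column-per-sample layout are assumptions):

```python
import numpy as np

def rbf_kernel(X, sigma=1.0):
    """Gaussian RBF kernel: K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)).

    X is (d, n), one sample per column; returns the (n, n) Gram matrix.
    """
    sq = np.sum(X**2, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2 * X.T @ X
    return np.exp(-np.maximum(d2, 0) / (2 * sigma**2))

def poly_kernel(X, c=1.0, p=2):
    """Polynomial kernel: K_ij = (x_i . x_j + c)^p."""
    return (X.T @ X + c) ** p
```

Both functions return symmetric positive semidefinite matrices, and the RBF Gram matrix has unit diagonal, as expected from the kernel definitions above.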

Let $X_\phi = [\phi(x_1), \ldots, \phi(x_n)]$ denote the data matrix in the kernel space. The projective vectors are obtained from the eigenvector problem in (6), giving the transformation matrix $A = [a_1, \ldots, a_m]$. The number $m$ of feature dimensions can be chosen freely. A data point $x$ can then be embedded into the $m$-dimensional feature space by $y = A^{T} \phi(x)$, where $y$ is its low-dimensional representation.

Kernel SDA (KSDA) [2, 7] can discover the underlying subspace more exactly in the feature space. It results in a better subspace for the classification task by a nonlinear learning technique.

##### 3.3. Low-Rank Kernel-Based SDA

The major problem of all these kernel methods is finding proper kernel parameters. They usually use fixed global parameters to determine the kernel matrix and are very sensitive to the parameter setting. In fact, the most suitable kernel parameters may vary greatly across different random distributions even of the same data. Moreover, the traditional kernel mapping always analyzes the relationship of the data in a one-to-others mode, which emphasizes local information and lacks global constraints on the solutions. These shortcomings limit the performance and efficiency of KSDA methods. To overcome them, inspired by low-rank representation, we propose a novel kernel-based Semisupervised Discriminant Analysis (LRKSDA) where LRR is used as the kernel representation.

Let $\phi_{LR}$ be a low-rank mapping from the input space into a low-rank kernel feature space $\mathcal{F}$. For the database $X$, a reasonable objective function is as follows:

$$\min_{Z, E} \|Z\|_{*} + \lambda \|E\|_{2,1} \quad \text{s.t. } X = XZ + E \quad (11)$$

The optimal solution $Z^{*} = [z_1^{*}, \ldots, z_n^{*}]$ is the coefficient matrix with each $z_i^{*}$ being the low-rank representation coefficient of $x_i$.

Let $Z^{*}$ denote the data matrix in the kernel space. The projective vectors are obtained from the eigenvector problem in (6), and the transformation matrix is $A = [a_1, \ldots, a_m]$. The number $m$ of feature dimensions can be chosen freely. A data point $x_i$ can then be embedded into the $m$-dimensional feature space by $y_i = A^{T} z_i^{*}$, where $z_i^{*}$ is the low-rank representation of $x_i$.

Since the low-rank representation jointly obtains the representation of all the samples under a global low-rank constraint that captures the global data structures, we can get the lowest rank representation in a parameter-free way, which is very convenient and robust for various kinds of data. Thus the low-rank kernel-based SDA algorithm can improve performance to a large extent. The steps of LRKSDA are as follows.

Firstly, map the labeled and unlabeled data to the LR-graph kernel space. Secondly, execute the SDA algorithm for dimensionality reduction. Finally, execute the nearest neighbor method for the final classification in the derived low-dimensional feature subspace. The procedure of low-rank kernel-based SDA is described as follows.

*Algorithm 1 (low-rank kernel-based SDA algorithm).*

*Input.* The whole data set $X$, where $l$ samples are labeled and $n - l$ are unlabeled.

*Output.* The classification results.

*Step 1.* Map the labeled and unlabeled data to the feature space by the LRR algorithm.

*Step 2.* Implement the SDA algorithm for dimensionality reduction.

*Step 3.* Execute the nearest neighbor method for final classification.
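The three steps above can be sketched end to end as follows. This is a hedged illustration: all names and defaults are assumptions, and, to keep the sketch short, Step 1 uses a simple ridge self-representation $Z = (X^{T}X + \varepsilon I)^{-1} X^{T} X$ as a cheap stand-in where the paper's full nuclear-norm LRR solver would be plugged in:

```python
import numpy as np
from scipy.linalg import eigh

def lrksda_pipeline(X, y_labeled, n_labeled, alpha=0.1, k=2, dim=1, ridge=10.0):
    """End-to-end sketch of Algorithm 1 (illustrative names/defaults only).

    X : (d, n) data, labeled samples in the first n_labeled columns.
    Step 1 below substitutes a ridge self-representation for the LRR solver.
    """
    d, n = X.shape
    # Step 1 (stand-in): self-representation coefficients as kernel-space samples
    G = X.T @ X
    Z = np.linalg.solve(G + ridge * np.eye(n), G)
    # Step 2a: k-NN graph Laplacian over the kernel-space samples
    sq = np.sum(Z**2, axis=0)
    dist = sq[:, None] + sq[None, :] - 2 * Z.T @ Z
    np.fill_diagonal(dist, np.inf)
    W = np.zeros((n, n))
    for i in range(n):
        W[i, np.argsort(dist[i])[:k]] = 1
    W = np.maximum(W, W.T)
    L = np.diag(W.sum(axis=1)) - W
    # Step 2b: SDA scatter matrices from the labeled columns
    Zl = Z[:, :n_labeled]
    m_all = Zl.mean(axis=1, keepdims=True)
    Sb = np.zeros((n, n))
    for c in np.unique(y_labeled):
        Zc = Zl[:, y_labeled == c]
        mc = Zc.mean(axis=1, keepdims=True)
        Sb += Zc.shape[1] * (mc - m_all) @ (mc - m_all).T
    St = (Zl - m_all) @ (Zl - m_all).T
    reg = St + alpha * Z @ L @ Z.T + 1e-6 * np.eye(n)
    _, vecs = eigh(Sb, reg)             # regularized generalized eigenproblem
    A = vecs[:, ::-1][:, :dim]          # projection matrix
    Y = A.T @ Z                         # embedded samples
    # Step 3: 1-NN classification of the unlabeled columns
    preds = []
    for j in range(n_labeled, n):
        dd = np.linalg.norm(Y[:, :n_labeled] - Y[:, [j]], axis=0)
        preds.append(y_labeled[int(np.argmin(dd))])
    return np.array(preds)
```

In a faithful implementation, the ridge stand-in in Step 1 would be replaced by the inexact-ALM LRR solver, and the graph size $k$, the regularization weight, and the embedding dimension would be tuned per database.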

#### 4. Experiments and Analysis

In this section, we conduct extensive experiments to examine the efficiency of the low-rank kernel-based SDA algorithm. The experiments are conducted in the MATLAB 7.11.0 (R2010b) environment on a computer with an AMD Phenom(tm) II P960 1.79 GHz CPU and 2 GB RAM.

##### 4.1. Experiment Overview

###### 4.1.1. Databases

The proposed LRKSDA is tested on six real world databases, including three face databases and three University of California Irvine (UCI) databases. In these experiments, we normalize each sample to unit norm.

*(1) Extended Yale Face Database B [2].* This database has 38 individuals with around 64 near frontal images under different illuminations per individual. Each face image is resized to 32 × 32 pixels. We select the first 20 persons and choose 20 samples of each subject.

*(2) ORL Database [22].* The ORL database contains 10 different images of each of 40 distinct subjects. The images were taken at different times, varying the lighting, facial expressions, and facial details. Each face image is manually cropped and resized to 32 × 32 pixels, with 256 grey levels per pixel.

*(3) CMU PIE Face Database [2].* It contains 68 subjects with 41,368 face images captured under varying poses, illuminations, and expressions. Each image is resized to 32 × 32 pixels. We select the first 20 persons and choose 20 samples per subject.

*(4) Musk (Version 2) Data Set.* This database contains 2 classes and 6598 instances with 166 features. Here, we randomly select 300 examples for the experiments.

*(5) Seeds Data Set.* It contains 210 instances from three different wheat varieties. A soft X-ray technique and the GRAINS package were used to construct all seven real-valued attributes.

*(6) SPECT Heart Data Set.* This database describes the diagnosis of cardiac Single Photon Emission Computed Tomography (SPECT) images. Each patient is classified into one of two categories: normal and abnormal. The database of 267 SPECT image sets was processed to extract features that summarize the original SPECT images, which were further processed to obtain 22 binary feature patterns.

###### 4.1.2. Compared Algorithms

In order to demonstrate how the semisupervised dimensionality reduction performance can be improved by low-rank kernel-based SDA, we compare it with the SDA, KSDA1, and KSDA2 algorithms. In all experiments, the number of nearest neighbors $k$ in the $k$-NN regularizer graph is set to 4.

*(1) KSDA1 Algorithm.* KSDA1 is the KSDA with the Gaussian radial basis function (RBF) kernel $k(x_i, x_j) = \exp\left(-\|x_i - x_j\|^{2} / 2\sigma^{2}\right)$.

*(2) KSDA2 Algorithm.* KSDA2 is the KSDA with the polynomial kernel $k(x_i, x_j) = \left(x_i^{T} x_j + c\right)^{d}$.

The classification accuracy is influenced by the kernel parameters. After comparison, we choose suitable kernel parameters $\sigma$ and $d$ for the KSDA1 and KSDA2 algorithms on each database (Extended Yale Face Database B, ORL, CMU PIE, Musk, Seeds, and SPECT Heart, respectively). Since the most suitable kernel parameters vary greatly across different random distributions even of the same data, these parameters were selected as relatively suitable after many runs.

##### 4.2. Experiment 1: Different Algorithms Performances

To examine the effectiveness of the proposed LRKSDA algorithm, we conduct experiments on the six public databases. In each experiment, we randomly select 30% of the samples from each class as labeled samples and evaluate the performance with different numbers of selected features. The evaluations are conducted over 20 independent runs for each algorithm, and the averaged results are reported. First we use the different kernel methods to obtain the kernel mapping; then we apply the SDA algorithm for dimensionality reduction; finally, the nearest neighbor approach is employed for classification in the derived low-dimensional feature subspace. For each database, the classification accuracy of the different algorithms is shown in Figure 1. Table 1 shows the performance comparison of the different algorithms. Note that the reported results are the best over all the numbers of selected features mentioned above. From these results, we can observe the following.