Abstract

Semisupervised Discriminant Analysis (SDA) aims at dimensionality reduction with both limited labeled data and copious unlabeled data, but it may fail to discover the intrinsic geometric structure when the data manifold is highly nonlinear. The kernel trick is widely used to map the original nonlinearly separable problem to an intrinsically higher-dimensional space where the classes become linearly separable. Inspired by low-rank representation (LRR), we propose a novel kernel SDA method called low-rank kernel-based SDA (LRKSDA), in which LRR is used as the kernel representation. Since LRR captures the global data structure and yields the lowest rank representation in a parameter-free way, the low-rank kernel method is effective and robust for various kinds of data. Extensive experiments on public databases show that the proposed LRKSDA dimensionality reduction algorithm achieves better performance than other related kernel SDA methods.

1. Introduction

For many real world data mining and pattern recognition applications, labeled data are expensive or difficult to obtain, while unlabeled data are often copious and readily available. How to use both labeled and unlabeled data to improve performance therefore becomes a significant problem [1, 2]. Recently, semisupervised dimensionality reduction, which can directly exploit the whole database, has attracted considerable attention [3]. Inspired by semisupervised learning (SSL), many methods have been put forward to relieve the so-called small sample size (SSS) problem of Linear Discriminant Analysis (LDA) [4, 5]. Semisupervised Discriminant Analysis (SDA) was first proposed by Cai et al. [2]; it easily resolves the out-of-sample problem [6] and is well suited to real world applications. In the SDA algorithm, the labeled samples are used to maximize the separability between different classes, and the unlabeled ones are used to estimate the intrinsic geometric structure of the data.

Semisupervised Discriminant Analysis may fail to discover the intrinsic geometric structure when the data manifold is highly nonlinear [2, 7]. The kernel trick [8] has been widely used to generalize linear dimensionality reduction algorithms to nonlinear ones: it maps the original nonlinearly separable problem to an intrinsically higher-dimensional space where the classes are linearly separable. Kernel SDA (KSDA) [2, 7] can therefore discover the underlying subspace more precisely in the feature space, which yields a better subspace for the classification task through a nonlinear learning technique. Cai et al. discussed how to perform SDA in a Reproducing Kernel Hilbert Space (RKHS), which gives rise to kernel SDA [2]. You et al. presented the derivation of a first approach to optimizing the parameters of a kernel so that the original class distributions are mapped to a space where they are optimally (with respect to Bayes) separated by a hyperplane [7]. A kernel-based nonlinear discriminant analysis algorithm was proposed to address the fundamental limitations of LDA [9]. A novel kernel Fisher discriminant analysis (KFDA) kernel parameter optimization criterion was presented for simultaneously maximizing the uniformity of class-pair separabilities and the class separability in kernel space [10]. To overcome the limitations of local Fisher discriminant analysis (LFDA) in nonlinear dimensionality reduction and in adopting multiple features, Wang and Sun proposed a new dimensionality reduction algorithm called multiple kernel local Fisher discriminant analysis (MKLFDA) based on multiple kernel learning [11]. The kernelization of graph embedding applies the kernel trick to the linear graph embedding algorithm to handle data with nonlinear distributions [12]. Weinberger et al. described an algorithm for nonlinear dimensionality reduction based on semidefinite programming and kernel matrix factorization, which learns a kernel matrix for high dimensional data that lie on or near a low-dimensional manifold [13].

Low-rank matrix decomposition and completion have recently become very popular since Yang et al. and Chen et al. proved that a robust estimate of an underlying subspace can be obtained by decomposing the observations into a low-rank matrix and a sparse error matrix [14, 15]. Liu et al. proposed a low-rank representation method which is robust to noise and data corruptions owing to its ability to separate noise from the data set [14]. More recently, low-rank representation [16, 17], as a promising method to capture the underlying low-dimensional structure of data, has attracted much attention in the pattern analysis and signal processing communities. The LRR method [16–18] seeks the lowest rank representation of all data jointly, such that each data point can be represented as a linear combination of some bases.

The major problem of kernel methods is finding proper kernel parameters. These kernel methods usually use fixed global parameters to determine the kernel matrix, which makes them very sensitive to the parameter setting. In fact, the most suitable kernel parameters may vary greatly across different random splits of the same data. Moreover, the kernel mapping of KSDA always analyzes the relationship between samples in a one-versus-others manner, which emphasizes local information and lacks global constraints on the solution. These shortcomings limit the performance and efficiency of KSDA methods. To overcome the disadvantages of traditional kernel methods, inspired by LRR, we propose a novel kernel-based Semisupervised Discriminant Analysis called low-rank kernel-based SDA (LRKSDA), in which the low-rank representation is used as the kernel. Compared with other kernels, the low-rank kernel jointly obtains the representation of all samples under a global low-rank constraint [19]. It is thus better at capturing the global data structure and very robust to different random splits of the data set. In addition, we can get the lowest rank representation in a parameter-free way, which is very convenient and robust for various kinds of data. Extensive experiments on public databases show that our proposed LRKSDA dimensionality reduction algorithm achieves better performance than other related methods.

The rest of the paper is organized as follows. Section 2 gives a brief overview of SDA. Section 3 introduces the low-rank kernel-based SDA framework. Section 4 reports the experimental results on real world databases. Section 5 concludes the paper.

2. Overview of SDA

Given a set of samples X = [x_1, x_2, ..., x_n], where x_i \in R^m, the first l samples are labeled with y_i \in {1, 2, ..., c} and the remaining n - l samples are unlabeled; all samples belong to c classes. SDA [2] seeks a projection vector a that maximizes the separability of the different classes on the labeled samples, while a regularizer term encodes the prior assumption of consistency. The objective function is as follows:

a^{*} = \arg\max_{a} \frac{a^{T} S_b a}{a^{T} S_t a + \alpha J(a)},   (1)

where S_b and S_t are the between-class scatter and total scatter matrices of the labeled samples,

S_b = \sum_{k=1}^{c} l_k (\mu^{(k)} - \mu)(\mu^{(k)} - \mu)^{T},  S_t = S_b + S_w,   (2)

and S_w is defined as the within-class scatter matrix

S_w = \sum_{k=1}^{c} \sum_{i=1}^{l_k} (x_i^{(k)} - \mu^{(k)})(x_i^{(k)} - \mu^{(k)})^{T},

where \mu is the mean vector of the total sample, l_k is the number of samples in the kth class, \mu^{(k)} is the average vector of the kth class, and x_i^{(k)} is the ith sample in the kth class.
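To make these definitions concrete, the following is a minimal Python/NumPy sketch (the function name and array layout are our own choices, not taken from the paper) that computes S_b, S_w, and S_t from the labeled samples:

import numpy as np

def scatter_matrices(X, y):
    # X: (m, l) labeled samples, one column per sample; y: (l,) class labels
    m, l = X.shape
    mu = X.mean(axis=1, keepdims=True)              # mean of all labeled samples
    S_b = np.zeros((m, m))
    S_w = np.zeros((m, m))
    for k in np.unique(y):
        X_k = X[:, y == k]                          # samples of class k
        l_k = X_k.shape[1]
        mu_k = X_k.mean(axis=1, keepdims=True)      # class mean
        S_b += l_k * (mu_k - mu) @ (mu_k - mu).T    # between-class scatter
        S_w += (X_k - mu_k) @ (X_k - mu_k).T        # within-class scatter
    S_t = S_b + S_w                                 # total scatter
    return S_b, S_w, S_t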

The parameter \alpha in (1) balances the model complexity and the empirical loss. The regularizer term J(a) gives us the flexibility to incorporate prior knowledge into the application. We aim at constructing a graph that captures the manifold structure revealed by the available unlabeled samples [2]. The key of SSL algorithms is the prior assumption of consistency. For classification, it means that nearby samples are likely to have the same label [20]. For dimensionality reduction, it implies that nearby samples should have similar embeddings (low-dimensional representations).

Given the set of samples X = [x_1, x_2, ..., x_n], we can construct a graph to represent the relationship between nearby samples with the k-nearest-neighbor (kNN) algorithm: an edge is placed between two samples if either is among the k nearest neighbors of the other. The corresponding weight matrix W is defined as follows:

W_{ij} = 1 if x_i \in N_k(x_j) or x_j \in N_k(x_i); W_{ij} = 0 otherwise,   (3)

where N_k(x_i) denotes the set of k nearest neighbors of x_i. Then the term J(a) can be defined as follows:

J(a) = \frac{1}{2} \sum_{i,j} (a^{T} x_i - a^{T} x_j)^{2} W_{ij} = a^{T} X L X^{T} a,   (4)

where D is a diagonal matrix whose entries are the column (or row, since W is symmetric) sums of W; that is, D_{ii} = \sum_{j} W_{ij}. The Laplacian matrix [21] is L = D - W.
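A minimal sketch of this graph construction in Python/NumPy (assuming the symmetric "either is a neighbor of the other" rule in (3); the helper name is ours):

import numpy as np

def knn_graph_laplacian(X, k=4):
    # X: (m, n) data matrix, one column per sample; k: number of nearest neighbors
    n = X.shape[1]
    sq_norms = np.sum(X**2, axis=0)
    dist = sq_norms[:, None] + sq_norms[None, :] - 2 * X.T @ X   # pairwise squared distances
    np.fill_diagonal(dist, np.inf)                               # a sample is not its own neighbor
    W = np.zeros((n, n))
    nn = np.argsort(dist, axis=1)[:, :k]                         # k nearest neighbors of each sample
    for i in range(n):
        W[i, nn[i]] = 1.0
    W = np.maximum(W, W.T)                                       # edge if either sample is a neighbor of the other
    D = np.diag(W.sum(axis=1))
    L = D - W                                                    # graph Laplacian
    return W, L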

We can get the objective function of SDA with the regularizer term [2]:

a^{*} = \arg\max_{a} \frac{a^{T} S_b a}{a^{T} (S_t + \alpha X L X^{T}) a}.   (5)

By maximizing (5), we obtain the projective vector a from the generalized eigenvalue problem

S_b a = \lambda (S_t + \alpha X L X^{T}) a,   (6)

taking the eigenvectors associated with the largest eigenvalues.
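The generalized eigenvalue problem (6) can be solved with a standard symmetric-definite eigensolver. A hedged Python sketch, reusing scatter_matrices from above and adding a small ridge term for numerical stability (the ridge and the default parameter values are our own choices):

import numpy as np
from scipy.linalg import eigh

def sda_projection(X, X_labeled, y, L, alpha=0.1, dim=10):
    # X: (m, n) all samples; X_labeled: (m, l) labeled columns; y: labels; L: graph Laplacian
    S_b, S_w, S_t = scatter_matrices(X_labeled, y)
    M = S_t + alpha * X @ L @ X.T                                # denominator matrix of (5)
    M += 1e-6 * np.trace(M) / M.shape[0] * np.eye(M.shape[0])    # ridge to keep M positive definite
    evals, evecs = eigh(S_b, M)                                  # solves S_b a = lambda M a
    order = np.argsort(evals)[::-1]                              # largest eigenvalues first
    return evecs[:, order[:dim]]                                 # (m, dim) projection matrix A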

3. Low-Rank Kernel-Based SDA Framework

3.1. Low-Rank Representation

Yan and Wang [22] proposed sparse representation (SR) to construct the l1-graph [23] by solving an l1-minimization problem. However, the l1-graph lacks global constraints, which greatly reduces performance when the data are grossly corrupted. To overcome this drawback, Liu et al. proposed the low-rank representation and used it to construct the affinities of an undirected graph (here called the LR-graph) [19]. It jointly obtains the representation of all samples under a global low-rank constraint and is thus better at capturing the global data structure [24].

Let X = [x_1, x_2, ..., x_n] be a set of samples; each column is a sample which can be represented by a linear combination of a dictionary D = [d_1, d_2, ..., d_k] [19]. Here, we select the samples themselves as the dictionary, D = X:

X = XZ,

where Z = [z_1, z_2, ..., z_n] is the coefficient matrix with each z_i being the representation coefficient of x_i. Different from SR, which may not capture the global structure of the data, LRR seeks the lowest rank solution by solving the following optimization problem [19]:

\min_{Z} \operatorname{rank}(Z)  \text{s.t.}  X = XZ.

The above optimization problem can be relaxed to the following convex optimization [25]:

\min_{Z} \|Z\|_{*}  \text{s.t.}  X = XZ,

where \|\cdot\|_{*} denotes the nuclear norm (or trace norm) [26] of a matrix, that is, the sum of the matrix's singular values. Considering the noise or corruption present in real world applications, a more reasonable objective function is

\min_{Z, E} \|Z\|_{*} + \lambda \|E\|_{\ell}  \text{s.t.}  X = XZ + E,

where \|E\|_{\ell} can be the l2,1-norm or the l1-norm. In this paper we choose the l2,1-norm as the error term, which is defined as \|E\|_{2,1} = \sum_{j=1}^{n} \sqrt{\sum_{i=1}^{m} E_{ij}^{2}}. The parameter \lambda is used to balance the effect of the low-rank term and the error term. The optimal solution can be obtained via the inexact augmented Lagrange multipliers (IALM) method [27, 28].
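A compact Python sketch of the inexact ALM iterations for this l2,1-regularized LRR problem (following the commonly used update scheme with an auxiliary variable J; the default values of lam, rho, and mu are illustrative, not those used in the paper):

import numpy as np

def svt(M, tau):
    # singular value thresholding: proximal operator of the nuclear norm
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def shrink_l21(Q, tau):
    # column-wise shrinkage: proximal operator of the l2,1-norm
    E = np.zeros_like(Q)
    for j in range(Q.shape[1]):
        nrm = np.linalg.norm(Q[:, j])
        if nrm > tau:
            E[:, j] = (nrm - tau) / nrm * Q[:, j]
    return E

def lrr_ialm(X, lam=0.1, rho=1.1, mu=1e-2, mu_max=1e6, tol=1e-6, max_iter=500):
    m, n = X.shape
    Z = np.zeros((n, n)); J = np.zeros((n, n)); E = np.zeros((m, n))
    Y1 = np.zeros((m, n)); Y2 = np.zeros((n, n))
    XtX = X.T @ X
    for _ in range(max_iter):
        J = svt(Z + Y2 / mu, 1.0 / mu)                                 # nuclear-norm step
        Z = np.linalg.solve(np.eye(n) + XtX,
                            XtX - X.T @ E + J + (X.T @ Y1 - Y2) / mu)  # least-squares step
        E = shrink_l21(X - X @ Z + Y1 / mu, lam / mu)                  # l2,1 error step
        R1 = X - X @ Z - E                                             # constraint residuals
        R2 = Z - J
        Y1 += mu * R1
        Y2 += mu * R2
        mu = min(rho * mu, mu_max)
        if max(np.abs(R1).max(), np.abs(R2).max()) < tol:
            break
    return Z, E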

3.2. Kernel SDA

Semisupervised Discriminant Analysis may fail to discover the intrinsic geometry structure when the data manifold is highly nonlinear. The kernel trick is a popular technique in machine learning which uses a kernel function to map samples to a high dimensional space [8, 29, 30]. By using the kernel trick, we can nonlinearly map the original data to the kernel feature space.

Let \phi be a nonlinear mapping from the input space into a feature space F. For any two points x_i and x_j, we use a kernel function k(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle to map the data into the kernel feature space. Commonly used kernels include the Gaussian radial basis function (RBF) kernel k(x_i, x_j) = \exp(-\|x_i - x_j\|^2 / 2\sigma^2), the polynomial kernel k(x_i, x_j) = (x_i^{T} x_j + 1)^{d}, and the sigmoid kernel k(x_i, x_j) = \tanh(\kappa x_i^{T} x_j + \theta) [2, 31].
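A brief Python sketch of these kernel matrices (the kernel forms and default parameters above are standard choices assumed here, since the original expressions were lost in extraction):

import numpy as np

def rbf_kernel(X, sigma=1.0):
    # X: (m, n), one column per sample; returns the (n, n) Gram matrix
    sq_norms = np.sum(X**2, axis=0)
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2 * X.T @ X
    return np.exp(-sq_dist / (2 * sigma**2))

def poly_kernel(X, d=2, c=1.0):
    return (X.T @ X + c) ** d

def sigmoid_kernel(X, kappa=1.0, theta=0.0):
    return np.tanh(kappa * (X.T @ X) + theta)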

Let K denote the data matrix in the kernel space, with K_{ij} = k(x_i, x_j). The projective vectors are obtained from the eigenvector problem in (6) computed in the kernel space, which gives the transformation matrix A = [a_1, ..., a_d]. The number of feature dimensions d can be chosen by the user. A data point x can then be embedded into the d-dimensional feature space by

y = A^{T} K_x,

where K_x = [k(x_1, x), k(x_2, x), ..., k(x_n, x)]^{T}.
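Putting the pieces together, a hedged sketch of the kernel SDA embedding (reusing rbf_kernel and sda_projection from the earlier sketches; treating the columns of the kernel matrix as the mapped features is our reading of the formulation above, not code from the paper):

def ksda_embed(X, labeled_idx, y_labeled, L, sigma=1.0, alpha=0.1, dim=10):
    K = rbf_kernel(X, sigma)                                     # (n, n) kernel matrix; column i represents x_i
    A = sda_projection(K, K[:, labeled_idx], y_labeled, L, alpha, dim)
    return A.T @ K                                               # (dim, n) embeddings of all samples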

Kernel SDA (KSDA) [2, 7] can discover the underlying subspace more accurately in the feature space, resulting in a better subspace for the classification task through a nonlinear learning technique.

3.3. Low-Rank Kernel-Based SDA

The major problem of all these kernel methods is finding proper kernel parameters. They usually use fixed global parameters to determine the kernel matrix, which makes them very sensitive to the parameter setting. In fact, the most suitable kernel parameters may vary greatly across different random splits even of the same data. Moreover, the traditional kernel mapping always analyzes the relationship between samples in a one-versus-others manner, which emphasizes local information and lacks global constraints on the solution. These shortcomings limit the performance and efficiency of KSDA methods. To overcome them, inspired by low-rank representation, we propose a novel kernel-based Semisupervised Discriminant Analysis (LRKSDA) in which LRR is used as the kernel representation.

Let \phi_{LR} be a low-rank mapping from the input space into a low-rank kernel feature space. For the database X = [x_1, x_2, ..., x_n], a reasonable objective function is as follows:

\min_{Z, E} \|Z\|_{*} + \lambda \|E\|_{2,1}  \text{s.t.}  X = XZ + E.

The optimal solution Z^{*} = [z_1, z_2, ..., z_n] is the coefficient matrix, with each z_i being the low-rank representation coefficient of x_i.

Let Z^{*} denote the data matrix in the low-rank kernel space. The projective vectors are obtained from the eigenvector problem in (6), which gives the transformation matrix A = [a_1, ..., a_d]. The number of feature dimensions d can be chosen by the user. A data point x_i can then be embedded into the d-dimensional feature space by

y_i = A^{T} z_i,

where z_i is the low-rank representation of x_i.

Since the low-rank representation jointly obtains the representation of all samples under a global low-rank constraint and thereby captures the global data structure, and since the lowest rank representation is obtained in a parameter-free way, the method is very convenient and robust for various kinds of data. The low-rank kernel-based SDA algorithm can therefore improve the performance to a very large extent. The steps of LRKSDA are as follows.

Firstly, map the labeled and unlabeled data to the LR-graph kernel space. Secondly, execute the SDA algorithm for dimensionality reduction. Finally, execute the nearest neighbor method for the final classification in the derived low-dimensional feature subspace. The procedure of low-rank kernel-based SDA is described as follows.

Algorithm 1 (low-rank kernel-based SDA algorithm). Input. The whole data set X = [x_1, x_2, ..., x_n], where l samples are labeled and n - l are unlabeled.
Output. The classification results.
Step  1. Map the labeled and unlabeled data to the feature space by the LRR algorithm:

\min_{Z, E} \|Z\|_{*} + \lambda \|E\|_{2,1}  \text{s.t.}  X = XZ + E.

Step  2. Implement the SDA algorithm for dimensionality reduction.
Step  3. Execute the nearest neighbor method for final classification.
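An end-to-end sketch of Algorithm 1 in Python, reusing lrr_ialm, knn_graph_laplacian, and sda_projection from the earlier snippets. Building the regularizer graph on the low-rank codes and using a 1-NN classifier are our illustrative choices; they are not prescribed by the text above:

import numpy as np

def lrksda_classify(X, labeled_idx, y_labeled, test_idx, k=4, lam=0.1, alpha=0.1, dim=10):
    # Step 1: low-rank representation of all (labeled + unlabeled) samples
    Z, _ = lrr_ialm(X, lam=lam)                       # (n, n); column i is the low-rank code of x_i
    # Step 2: SDA in the low-rank kernel space
    _, L = knn_graph_laplacian(Z, k=k)                # kNN regularizer graph on the low-rank codes
    A = sda_projection(Z, Z[:, labeled_idx], y_labeled, L, alpha, dim)
    Y = A.T @ Z                                       # (dim, n) low-dimensional embeddings
    # Step 3: 1-nearest-neighbor classification in the reduced space
    train, test = Y[:, labeled_idx], Y[:, test_idx]
    d2 = (np.sum(test**2, axis=0)[:, None]
          + np.sum(train**2, axis=0)[None, :]
          - 2 * test.T @ train)
    return y_labeled[np.argmin(d2, axis=1)]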

4. Experiments and Analysis

In this section, we conduct extensive experiments to examine the efficiency of the low-rank kernel-based SDA algorithm. The simulation experiments are conducted in the MATLAB 7.11.0 (R2010b) environment on a computer with an AMD Phenom(tm) II P960 1.79 GHz CPU and 2 GB RAM.

4.1. Experiment Overview
4.1.1. Databases

The proposed LRKSDA is tested on six real world databases, including three face databases and three University of California Irvine (UCI) databases. In these experiments, we normalize each sample to unit norm.

(1) Extended Yale Face Database B [2]. This database contains 38 individuals with around 64 near frontal images under different illuminations per individual. Each face image is resized to 32 × 32 pixels. We select the first 20 persons and choose 20 samples per subject.

(2) ORL Database [22]. The ORL database contains 10 different images of each of 40 distinct subjects. The images were taken at different times, with varying lighting, facial expressions, and facial details. Each face image is manually cropped and resized to 32 × 32 pixels, with 256 grey levels per pixel.

(3) CMU PIE Face Database [2]. It contains 68 subjects with 41,368 face images captured under varying poses, illuminations, and expressions. Each image is resized to 32 × 32 pixels. We select the first 20 persons and choose 20 samples per subject.

(4) Musk (Version 2) Data Set. This database contains 2 classes and 6598 instances with 166 features. Here, we randomly select 300 examples for the experiments.

(5) Seeds Data Set. It contains 210 instances of three different wheat varieties. A soft X-ray technique and the GRAINS package were used to construct all seven real-valued attributes.

(6) SPECT Heart Data Set. This database describes the diagnosis of cardiac Single Proton Emission Computed Tomography (SPECT) images. Each patient is classified into one of two categories: normal and abnormal. The database of 267 SPECT image sets was processed to extract features that summarize the original SPECT images, and each pattern was further processed to obtain 22 binary features.

4.1.2. Compared Algorithms

In order to demonstrate how the semisupervised dimensionality reduction performance can be improved by low-rank kernel-based SDA, we compare it with the SDA, KSDA1, and KSDA2 algorithms. In all experiments, the number of nearest neighbors k in the kNN regularizer graph is set to 4.

(1) KSDA1 Algorithm. KSDA1 is KSDA with the Gaussian radial basis function (RBF) kernel k(x_i, x_j) = \exp(-\|x_i - x_j\|^2 / 2\sigma^2).

(2) KSDA2 Algorithm. KSDA2 is KSDA with the polynomial kernel k(x_i, x_j) = (x_i^{T} x_j + 1)^{d}, where the degree d is fixed for the experiments.

The classification accuracy is influenced by the kernel parameters. Therefore, after comparison, we choose relatively suitable kernel parameters \sigma and d for the KSDA1 and KSDA2 algorithms on each database, with a separate pair chosen for Extended Yale Face Database B, the ORL database, the CMU PIE database, the Musk database, the Seeds Data Set, and the SPECT Heart Data Set, respectively. Since the most suitable kernel parameters vary greatly across different random splits even of the same data, these parameters were selected as relatively suitable after many repeated runs.

4.2. Experiment  1: Different Algorithms Performances

To examine the effectiveness of the proposed LRKSDA algorithm, we conduct experiments on the six public databases. In our experiments, we randomly select 30% of the samples from each class as labeled samples and evaluate the performance for different numbers of selected features. The evaluations are conducted with 20 independent runs for each algorithm, and we average the results. We first utilize the different kernel methods to obtain the kernel mapping and then implement the SDA algorithm for dimensionality reduction. Finally, the nearest neighbor approach is employed for the final classification in the derived low-dimensional feature subspace. For each database, the classification accuracy of the different algorithms is shown in Figure 1. Table 1 shows the performance comparison of the different algorithms; the reported values are the best results over all the selected feature dimensionalities mentioned above. From these results, we observe the following.

In most cases, our proposed low-rank kernel-based SDA algorithm consistently achieves the highest classification accuracy among the compared algorithms. LRKSDA achieves the best performance once the dimensionality exceeds a certain low value, and its classification accuracy is much higher than that of the other kernel SDA algorithms. It thus improves the classification performance to a large extent, which suggests that the low-rank kernel is more informative and better suited to the SDA algorithm.

Since proper kernel parameters are crucial for the traditional algorithms, and since the kernel parameters of KSDA1 and KSDA2 are fixed global parameters, these two algorithms are very sensitive to different data and to different random splits of the same data, and their performance improvement is not obvious. More seriously, because the labeled samples are selected randomly, the random split in each run may not match the so-called proper kernel parameters of KSDA1 and KSDA2. Moreover, the traditional kernel mapping always analyzes the relationship between samples in a one-versus-others manner, which emphasizes local information and lacks global constraints on the solution; this can result in poor performance in some cases. In contrast, the low-rank representation is better at capturing the global data structure, and the lowest rank representation is obtained in a parameter-free way, which is very convenient and robust for various kinds of data. So low-rank kernel-based SDA separates the different classes very well compared with the other kernel SDA methods and improves the performance to a very large extent, which means that our proposed low-rank kernel method is extremely effective.

4.3. Experiment  2: Influence of the Label Number

We evaluate the influence of the number of labeled samples in this part. The experiments are conducted with 20 independent runs for each algorithm, and we average the results. The procedure is the same as in experiment 1. For each database, we vary the percentage of labeled samples from 10% to 50%; the recognition accuracy is shown in Tables 2 and 3, from which we observe the following.

In most cases, our proposed low-rank kernel-based SDA algorithm consistently achieves the best results and is robust to variations of the label percentage. The compared algorithms are not as robust as LRKSDA; their classification accuracy degrades severely when the label rate is low. Thus, our proposed method has a clear advantage over the traditional KSDA and SDA algorithms. These traditional methods may sometimes achieve good performance on some databases when the label rate is high enough, but they are not as stable as our proposed algorithm. Since labeled data are very expensive and difficult to obtain, our proposed algorithm is more robust and better suited to real world data.

As mentioned in the previous part, since the low-rank kernel method obtains the kernel matrix in a parameter-free way, it is robust for different kinds of data. For traditional kernels such as the Gaussian radial basis function kernel and the polynomial kernel, if the data's structure does not fit the fixed kernel parameters they use, they cannot obtain a good representation of the original data set. Therefore, the low-rank kernel method is much more stable across all the data sets we use. In addition, the low-rank representation jointly obtains the representation of all samples under a global low-rank constraint, which captures the global data structure. So it is robust to variations of the label percentage even when the label rate is low.

4.4. Experiment  3: Robustness to Different Types of Noise

In this test we compare the performance of the different algorithms in noisy environments. Extended Yale Face Database B and the Musk database are randomly selected for this experiment. Gaussian white noise, "salt and pepper" noise, and multiplicative noise are added to the data, respectively. The Gaussian white noise has mean 0 and variances ranging from 0 to 0.1. The "salt and pepper" noise is added to the images with noise densities ranging from 0 to 0.1. The multiplicative noise is added to the data using the equation \tilde{x} = x + n x, where x and \tilde{x} are the original and noised data and n is uniformly distributed random noise with mean 0 and variance varying from 0 to 0.1. The percentage of labeled samples in each class is 30%. The experiments are conducted with 20 runs for each algorithm, and we average the results. The procedure is the same as in experiment 1. For each setting, we vary the corresponding noise parameter. The results are shown in Tables 4 and 5.
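For reference, a small Python sketch of the three noise models as we read them (the helper names and the mapping of "variance" and "density" to the formulas are our assumptions, loosely following common MATLAB imnoise conventions):

import numpy as np

rng = np.random.default_rng(0)

def add_gaussian_noise(X, var=0.05):
    # zero-mean Gaussian white noise with the given variance
    return X + rng.normal(0.0, np.sqrt(var), X.shape)

def add_salt_pepper_noise(X, density=0.05):
    # a fraction `density` of entries is set to the minimum or maximum data value
    Xn = X.copy()
    corrupted = rng.random(X.shape) < density
    salt = rng.random(X.shape) < 0.5
    Xn[corrupted & salt] = X.max()
    Xn[corrupted & ~salt] = X.min()
    return Xn

def add_multiplicative_noise(X, var=0.05):
    # speckle noise: X_noisy = X + n * X, with n uniform, mean 0, and the given variance
    a = np.sqrt(3.0 * var)                  # uniform on [-a, a] has variance a^2 / 3
    n = rng.uniform(-a, a, X.shape)
    return X + n * X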

As we can see, our proposed low-rank kernel-based SDA algorithm always achieves the best results, which means that our method is stable under Gaussian noise, "salt and pepper" noise, and multiplicative noise. Because of the robustness of the low-rank representation to noise, LRKSDA is much more robust than the other algorithms. As the different kinds of noise gradually increase, the performance of the traditional KSDA and SDA algorithms falls considerably, while the performance of our method drops only slightly under all three types of noise.

Notice that the noise comes from a model other than the original data's subspaces. LRR can solve the low-rank representation problem well: even when the data are corrupted by arbitrary errors, LRR can still approximately recover the original data with theoretical guarantees. In other words, LRR is robust in an efficient way. Therefore, our method is much more robust than the other algorithms under the three types of noise mentioned above.

5. Conclusions

In this paper, we propose a novel low-rank kernel-based SDA (LRKSDA) algorithm, which largely improves the performance of KSDA and SDA. Since the low-rank representation is better at capturing the global data structure, the LRKSDA algorithm separates the different classes very well compared with other kernel SDA methods. Therefore, our proposed low-rank kernel method is extremely effective. Empirical studies on six real world databases show that our proposed low-rank kernel-based SDA is robust and well suited to real world applications.

Disclosure

Current affiliation for Baokai Zu is Computer Science Department, Worcester Polytechnic Institute, Worcester, MA 01609, USA.

Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (no. 51208168), Tianjin Natural Science Foundation (no. 13JCYBJC37700), Hebei Province Natural Science Foundation (no. E2016202341), Hebei Province Natural Science Foundation (no. F2013202254 and no. F2013202102), and Hebei Province Foundation for Returned Scholars (no. C2012003038).