Complexity

Volume 2018, Article ID 2857594, 11 pages

https://doi.org/10.1155/2018/2857594

## Manifold Adaptive Kernelized Low-Rank Representation for Semisupervised Image Classification

^{1}School of Computer Science, Hangzhou Dianzi University, Hangzhou 310018, China
^{2}Jiangsu Key Laboratory of Big Data Security & Intelligent Processing, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
^{3}Center for OPTical IMagery Analysis and Learning (OPTIMAL), Northwestern Polytechnical University, Xi'an 710072, China

Correspondence should be addressed to Yong Peng; yongpeng@hdu.edu.cn

Received 19 December 2017; Revised 22 March 2018; Accepted 2 April 2018; Published 15 May 2018

Academic Editor: Eulalia Martínez

Copyright © 2018 Yong Peng et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Constructing a powerful graph that effectively depicts the intrinsic connections among data points is a critical step in making graph-based semisupervised learning algorithms achieve promising performance. Among popular graph construction algorithms, low-rank representation (LRR) is a very competitive one that can simultaneously explore the global structure of data and recover data from noisy environments; therefore, the learned low-rank coefficient matrix in LRR can be used to construct the data affinity matrix. Two problems remain, however: the essentially linear nature of LRR makes it ill-suited to the possible nonlinear structure of data, and learning performance can be greatly enhanced by exploiting the structure information of data. We therefore propose a new manifold kernelized low-rank representation (MKLRR) model that performs LRR in a data manifold adaptive kernel space. Specifically, the manifold structure is incorporated into the kernel space by using the graph Laplacian, so that the underlying geometry of the data is reflected by the warped kernel space. Experimental results on semisupervised image classification tasks show the effectiveness of MKLRR. For example, MKLRR obtains accuracies of 96.13%, 98.09%, and 96.08% on the ORL, Extended Yale B, and PIE data sets when given 5, 20, and 20 labeled face images per subject, respectively.

#### 1. Introduction

Since it is usually not easy to collect a large number of labeled samples to train learning models, the semisupervised learning (SSL) paradigm, which can harness both labeled and unlabeled samples to improve learning performance, has drawn considerable attention in recent studies [1–7]. Among existing SSL algorithms, graph-based algorithms are among the most popular approaches, in which label propagation is performed on a graph [8–11]. The underlying idea of graph-based algorithms is to characterize the relationship between data pairs by an affinity matrix. Although researchers have pointed out that sparsity, high discriminative power, and adaptive neighborhood are desirable properties of a good graph [12], how to learn a graph that accurately uncovers the latent relationships in data is still a challenging problem.

Among existing graph construction methods, the $k$-nearest neighbors and $\varepsilon$-neighborhood schemes are the two most widely used. However, they are usually sensitive to noisy environments, especially when data points contain outliers. To construct more effective graphs, many new algorithms have been proposed. The sparse graph [8] is parameter-free and insensitive to outliers; it is derived by encoding each datum as a sparse representation of the remaining samples. The sparse graph can automatically select the most informative neighbors for each datum. However, since sparse representation encodes each datum individually, the resultant sparse graph only emphasizes the local structure of data while neglecting its global structure. This property deteriorates its performance, especially when data are grossly corrupted [13]. Different from sparse representation, which enforces the representation coefficients to be sparse [14], low-rank representation aims to learn the data affinities jointly, which can reveal the global structure of data and preserve the membership of samples belonging to the same class in noisy environments [15, 16]. The learned LRR graph can capture the global mixture-of-subspaces structure via the low-rankness property, and thus it is both generative and discriminative for semisupervised learning tasks [9].

Apart from the conventional LRR model, many advanced variants have been proposed recently. To efficiently explore the structure information of data, Zheng et al. imposed a local constraint on the representation coefficients and thus formulated the low-rank representation with local constraint (LRRLC) model [10]. Lu et al. proposed the graph regularized LRR (GLRR), which introduces a graph regularizer to enforce the local consistency of data [17]. Zhuang et al. proposed incorporating sparse and nonnegative constraints into low-rank representation and formulated the NNLRS model [9]. The manifold low-rank representation (MLRR) [18] first uses a sparse learning objective to identify the data manifold and then incorporates the manifold information into low-rank representation as a regularizer. Additionally, [19] proposed preserving the structure information of data from two aspects: local affinity and distant repulsion. Li and Fu proposed constructing a graph based on low-rank coding and a $b$-matching constraint to obtain a sparse and balanced graph [20]. All the above-mentioned low-rank models are linear; therefore, they inevitably have limitations in modeling complex data distributions, which often follow a nonlinear rather than a linear model. To make the low-rank model effectively handle the nonlinear structure of data, [11] proposed the kernel low-rank representation (KLRR) graph for semisupervised classification by using the kernel trick. As a nonlinear extension of LRR, KLRR has also shown excellent performance in face recognition [21].

Recent studies [22–26] have shown that learning performance can be greatly enhanced by considering the geometrical structure of data and the local invariance idea [27]. This idea should clearly be considered in both the original data space and the reproducing kernel Hilbert space (RKHS). However, no existing LRR variant takes into account the intrinsic manifold structure in the RKHS. In this paper, we propose a novel manifold adaptive kernelized LRR for semisupervised classification. By using the data-dependent norm on the RKHS proposed by [28], we can warp the structure of the RKHS to reflect the underlying geometry of the data. Then, the conventional low-rank representation can be performed in the manifold adaptive kernel space. The main contributions of this paper can be briefly summarized as follows:

1. We construct the manifold adaptive kernel space, in which the underlying geometry of data is reflected by the graph Laplacian.
2. We give the model formulation, the optimization method, and the complexity analysis of MKLRR in detail.
3. We conduct extensive experiments on semisupervised image classification tasks to evaluate the effectiveness of MKLRR; the experimental results show that MKLRR achieves promising performance.

The remainder of this paper is organized as follows. In Section 2, we give a brief review of the conventional LRR model and the semisupervised learning framework used in our work. Section 3 describes the model formulation, optimization method, and complexity analysis of the manifold adaptive kernelized LRR model in detail. Experimental studies of MKLRR on semisupervised image classification tasks are presented in Section 4. Section 5 concludes the paper and presents an extension of MKLRR as future work.

#### 2. Related Work

In this section, we give a brief review of the conventional low-rank representation model [15] and the semisupervised classification framework based on Gaussian Fields and Harmonic Functions (GHF) [1].

##### 2.1. LRR

Given a set of samples $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{d \times n}$, LRR aims to represent each sample as a linear combination of the bases in $X$ by $X = XZ$, where $Z = [z_1, z_2, \ldots, z_n] \in \mathbb{R}^{n \times n}$ is the matrix in which each $z_i$ is the representation coefficient corresponding to sample $x_i$. Therefore, each entry $Z_{ji}$ can be viewed as the contribution of $x_j$ to the reconstruction of $x_i$ with $X$ as the dictionary. LRR seeks to find the lowest rank solution by solving the following optimization problem [15]:

$$\min_{Z} \ \operatorname{rank}(Z) \quad \text{s.t.} \quad X = XZ. \tag{1}$$

It is NP-hard to directly optimize the rank function. Therefore, we usually use the trace norm (also called the nuclear norm) as the closest convex surrogate of the rank function, which leads to the following objective [29]:

$$\min_{Z} \ \|Z\|_* \quad \text{s.t.} \quad X = XZ, \tag{2}$$

where $\|Z\|_*$ is the sum of the singular values of the matrix $Z$ [30]. Considering the fact that samples are usually noisy or even grossly corrupted, a more reasonable objective for LRR can be expressed as

$$\min_{Z, E} \ \|Z\|_* + \lambda \|E\|_{2,1} \quad \text{s.t.} \quad X = XZ + E, \tag{3}$$

where $\lambda > 0$ is a trade-off parameter and $\|E\|_{2,1} = \sum_{j=1}^{n} \sqrt{\sum_{i=1}^{d} E_{ij}^2}$. The second term in (3) characterizes the error by modeling sample-specific corruptions. Some existing studies instead employed the $\ell_1$-norm to measure the error term [31, 32]. The optimal solution can be obtained via the inexact augmented Lagrange multiplier method [31].
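As a small illustration (ours, not from the paper), the noiseless LRR problem $\min_Z \|Z\|_*$ s.t. $X = XZ$ admits a closed-form solution: the "shape interaction matrix" $Z^* = V_r V_r^T$, where $V_r$ holds the right singular vectors of the skinny SVD of $X$ [15]. The helper name `lrr_noiseless` is ours:

```python
import numpy as np

def lrr_noiseless(X, tol=1e-10):
    """Closed-form solution of min ||Z||_* s.t. X = XZ (noiseless LRR)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    r = int(np.sum(s > tol * s.max()))   # numerical rank of X
    Vr = Vt[:r].T                        # n x r right singular vectors
    return Vr @ Vr.T                     # n x n low-rank coefficient matrix

# Toy data: 10 samples drawn from a 2-dimensional subspace of R^5.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 10))

Z = lrr_noiseless(X)
print(np.allclose(X, X @ Z))        # True: the constraint X = XZ holds
print(np.linalg.matrix_rank(Z))     # 2: rank of Z equals the subspace dimension
```

For noisy data, objective (3) has no closed form and is solved iteratively, e.g., by the inexact ALM method cited above.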

##### 2.2. GHF

Assume that we have a data set $X = \{x_1, \ldots, x_l, x_{l+1}, \ldots, x_n\}$ from $c$ classes, where $\{x_i\}_{i=1}^{l}$ and $\{x_i\}_{i=l+1}^{n}$ are the labeled and unlabeled samples, respectively. The label indicator matrix $Y \in \mathbb{R}^{n \times c}$ is defined as follows: for each sample $x_i$, $y_i$ is its label vector. If $x_i$ is from the $j$th ($1 \le j \le c$) class, then only the $j$th entry of $y_i$ is one and all the other entries are zeros. If $x_i$ is an unlabeled sample, then $y_i = \mathbf{0}$.

GHF is a well-known graph-based semisupervised learning framework in which the predicted label matrix $F \in \mathbb{R}^{n \times c}$ is estimated on the graph with respect to the label fitness and the manifold smoothness. Let $f_i$ and $y_i$, respectively, denote the $i$th rows of $F$ and $Y$. GHF tries to minimize the following objective:

$$\min_{F} \ \mu \sum_{i=1}^{l} \|f_i - y_i\|^2 + \frac{1}{2} \sum_{i,j=1}^{n} W_{ij} \|f_i - f_j\|^2, \tag{4}$$

where $\mu$ is a very large value such that $f_i = y_i$ ($1 \le i \le l$) can be approximately satisfied and $W$ is an affinity matrix depicting the pairwise similarity of samples. Obviously, (4) can be rewritten in the compact matrix form as

$$\min_{F} \ \mu \, \operatorname{tr}\big((F - Y)^T U (F - Y)\big) + \operatorname{tr}(F^T L F), \tag{5}$$

where the graph Laplacian matrix is calculated as $L = D - W$; $D_{ii} = \sum_j W_{ij}$ (or $\sum_j W_{ji}$ since $W$ is usually a symmetric matrix) is a diagonal degree matrix. $U$ is also a diagonal matrix, with the first $l$ diagonal entries equal to one and the remaining entries equal to zero.
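Because the GHF objective is quadratic in $F$, setting its gradient to zero yields the closed-form minimizer $F = (\mu U + L)^{-1} \mu U Y$. The sketch below (our own illustration; the function name `ghf_propagate` and the value of $\mu$ are assumptions, not the authors' code) propagates two labels along a small chain graph:

```python
import numpy as np

def ghf_propagate(W, Y, n_labeled, mu=1e6):
    """Closed-form minimizer of mu*tr((F-Y)^T U (F-Y)) + tr(F^T L F)."""
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W             # graph Laplacian L = D - W
    U = np.zeros((n, n))
    U[np.arange(n_labeled), np.arange(n_labeled)] = 1.0
    F = np.linalg.solve(mu * U + L, mu * (U @ Y))
    return F.argmax(axis=1)                    # predicted class per sample

# Toy chain graph 0-2-3-4-1; nodes 0 and 1 are the labeled endpoints.
W = np.zeros((5, 5))
for i, j in [(0, 2), (2, 3), (3, 4), (4, 1)]:
    W[i, j] = W[j, i] = 1.0
Y = np.zeros((5, 2))
Y[0, 0] = 1.0   # node 0 labeled as class 0
Y[1, 1] = 1.0   # node 1 labeled as class 1
print(ghf_propagate(W, Y, n_labeled=2))
```

Labels spread smoothly along the chain: node 2 (adjacent to node 0) is assigned class 0 and node 4 (adjacent to node 1) class 1.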

#### 3. Manifold Adaptive Low-Rank Representation

##### 3.1. Manifold Adaptive Kernel

In this section, we show how to incorporate the manifold structure into the reproducing kernel Hilbert space (RKHS), which leads to manifold adaptive kernel space.

The kernel trick is usually applied with the hope of discovering the nonlinear structure in data by mapping the original nonlinear observations into a higher dimensional linear space [33]. The most commonly used kernels are the Gaussian and polynomial kernels. However, the nonlinear structure captured by such data-independent kernels may not be consistent with the intrinsic manifold structure, such as geodesic distance, curvature, and homology [34, 35].

In this work, we adopt the manifold adaptive kernel proposed by [28]. Let $\mathcal{H}$ be an RKHS, let $\mathcal{V}$ be a linear space with a positive semidefinite inner product (quadratic form), and let $S: \mathcal{H} \rightarrow \mathcal{V}$ be a bounded linear operator. We define $\tilde{\mathcal{H}}$ to be the space of functions from $\mathcal{H}$ with the manifold inner product

$$\langle f, g \rangle_{\tilde{\mathcal{H}}} = \langle f, g \rangle_{\mathcal{H}} + \langle Sf, Sg \rangle_{\mathcal{V}}. \tag{6}$$

$\tilde{\mathcal{H}}$ is still an RKHS [28].

Given samples $\{x_i\}_{i=1}^{m}$, let $S: \mathcal{H} \rightarrow \mathbb{R}^{m}$ be the evaluation map

$$S(f) = \big(f(x_1), f(x_2), \ldots, f(x_m)\big)^T. \tag{7}$$

Denote $\mathbf{f} = S(f)$ and $\mathbf{g} = S(g)$. Note that $\langle Sf, Sg \rangle_{\mathcal{V}}$ is a quadratic form on $\mathbb{R}^{m}$; thus we have

$$\langle Sf, Sg \rangle_{\mathcal{V}} = \mathbf{f}^T M \mathbf{g}, \tag{8}$$

where $M$ is a positive semidefinite matrix. For a data vector $x$, we define

$$k_x = \big(k(x, x_1), k(x, x_2), \ldots, k(x, x_m)\big)^T. \tag{9}$$

It can be shown that the reproducing kernel in $\tilde{\mathcal{H}}$ is

$$\tilde{k}(x, z) = k(x, z) - \gamma k_x^T (I + \gamma M K)^{-1} M k_z, \tag{10}$$

where $I$ is an identity matrix, $K$ is the kernel matrix in $\mathcal{H}$, and $\gamma$ is a constant controlling the smoothness of the functions. The key issue now is the choice of $M$, so that the deformation of the kernel induced by the data-dependent norm respects the intrinsic geometry of the data.

Without loss of generality, we assume that there are $m$ data points to be utilized to derive the linear space $\mathcal{V}$. It is easy to rewrite formulation (10) in the compact matrix form as

$$\tilde{K} = K - \gamma K (I + \gamma M K)^{-1} M K = K (I + \gamma M K)^{-1}, \tag{11}$$

where the matrices $\tilde{K}$, $K$, and $M$ are all in $\mathbb{R}^{m \times m}$. Here, $I$ is an identity matrix with the same size as $K$. $\tilde{K}$ is referred to as the kernel matrix in the warped RKHS.

The key issue now is the choice of $M$. As mentioned above, the manifold structure can be discovered by the graph Laplacian associated with the data points.
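The warped kernel matrix $\tilde{K} = K (I + \gamma M K)^{-1}$ is straightforward to compute. The sketch below (our own illustration, not the authors' code) uses a Gaussian base kernel and a placeholder positive semidefinite $M$; the helper names and the values of `gamma` and `sigma` are assumptions:

```python
import numpy as np

def gaussian_kernel(X, sigma=1.0):
    """Gaussian kernel matrix for the columns of X (d x m)."""
    sq = np.sum(X**2, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)
    return np.exp(-d2 / (2.0 * sigma**2))

def warped_kernel(K, M, gamma=1.0):
    """Manifold adaptive kernel: K_tilde = K (I + gamma * M K)^{-1}."""
    m = K.shape[0]
    return K @ np.linalg.inv(np.eye(m) + gamma * (M @ K))

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 8))     # 8 samples in R^3
K = gaussian_kernel(X)
M = np.eye(8)                       # placeholder psd matrix (graph Laplacian in MKLRR)
Kt = warped_kernel(K, M, gamma=0.5)
print(np.allclose(Kt, Kt.T))        # the warped kernel matrix stays symmetric
```

For symmetric $K$ and $M$, both forms of the warped kernel agree, and $\tilde{K}$ remains symmetric; in MKLRR, $M$ is replaced by the graph Laplacian, as described next.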

##### 3.2. The Objective Function

From [11], the objective of kernel low-rank representation was formulated as

$$\min_{Z, E} \ \|Z\|_* + \lambda \|E\|_{2,1} \quad \text{s.t.} \quad \phi(X) = \phi(X) Z + E, \tag{12}$$

where $\phi(\cdot)$ is an implicit nonlinear mapping into the RKHS. In order to learn a low-rank representation that is consistent with the manifold geometry, it is natural to take advantage of the manifold adaptive kernel in KLRR.

In order to model the manifold structure, we construct a $k$ nearest-neighbor graph $G$. For each data point $x_i$, we find its $k$ nearest neighbors, denoted by $N_k(x_i)$, and put an edge between $x_i$ and its neighbors. There are many choices for the weight matrix $W$ on the graph, and we use the "0-1" form defined as follows:

$$W_{ij} = \begin{cases} 1, & \text{if } x_i \in N_k(x_j) \text{ or } x_j \in N_k(x_i), \\ 0, & \text{otherwise}. \end{cases} \tag{13}$$

The graph Laplacian [36] is defined as $L = D - W$, where $D$ is a diagonal degree matrix given by $D_{ii} = \sum_j W_{ij}$ (or $\sum_j W_{ji}$ since $W$ is symmetric). The graph Laplacian provides the following smoothness penalty on the graph:

$$\frac{1}{2} \sum_{i,j=1}^{n} W_{ij} \big(f(x_i) - f(x_j)\big)^2 = \mathbf{f}^T L \mathbf{f}. \tag{14}$$

Therefore, it is natural to substitute $M$ with the graph Laplacian $L$. For convenience, we make use of all the available data points to derive the linear space $\mathcal{V}$ in the warped RKHS (i.e., $m = n$); then (11) can be rewritten as

$$\tilde{K} = K (I + \gamma L K)^{-1}, \tag{15}$$

where $\tilde{K}$ indicates that this kernel matrix is in a manifold adaptive RKHS.
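The "0-1" $k$-NN weight matrix and its Laplacian can be built as follows (a minimal sketch of ours; the helper name `knn_graph_laplacian` is an assumption). Note the symmetrization step, which implements the "or" rule in the weight definition:

```python
import numpy as np

def knn_graph_laplacian(X, k=2):
    """Build the 0-1 k-NN weight matrix W and Laplacian L = D - W.

    X is d x n with samples as columns.
    """
    n = X.shape[1]
    sq = np.sum(X**2, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)   # squared distances
    np.fill_diagonal(d2, np.inf)                       # exclude self-neighbors
    W = np.zeros((n, n))
    for i in range(n):
        W[i, np.argsort(d2[i])[:k]] = 1.0              # k nearest neighbors of x_i
    W = np.maximum(W, W.T)                             # "or" rule: symmetrize
    L = np.diag(W.sum(axis=1)) - W
    return W, L

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 10))
W, L = knn_graph_laplacian(X, k=3)
print(np.allclose(L.sum(axis=1), 0))   # Laplacian rows sum to zero
```

The resulting `L` is symmetric positive semidefinite, as required for the positive semidefinite quadratic form $M$ in the warped kernel.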

Using the nuclear norm to replace the rank function, we arrive at the following objective of manifold adaptive kernelized LRR:

$$\min_{Z, E} \ \|Z\|_* + \lambda \|E\|_{2,1} \quad \text{s.t.} \quad \tilde{\phi}(X) = \tilde{\phi}(X) Z + E, \tag{16}$$

where $\tilde{\phi}(\cdot)$ is the implicit mapping into the manifold adaptive kernel space whose kernel matrix is $\tilde{K}$.

Figure 1 shows the connection between MKLRR and LRR as well as its variants. As we can see, LRR variants such as GLRR, LRRLC, and MLRR can be reached by incorporating manifold information. By using the kernel trick, the KLRR model can find the lowest rank representation in RKHS. Further, by considering the geometric structure of data in RKHS, we can formulate the MKLRR model. Both KLRR and MKLRR are nonlinear models, since an implicit nonlinear mapping is employed.