Complexity

Volume 2018, Article ID 7963210, 16 pages

https://doi.org/10.1155/2018/7963210

## Robust Semisupervised Nonnegative Local Coordinate Factorization for Data Representation

^{1}School of Mathematics, Liaoning Normal University, Dalian 116029, China

^{2}Institute of Information and Control, Hangzhou Dianzi University, Hangzhou 541004, China

Correspondence should be addressed to Wei Jiang; swxxjw@aliyun.com

Received 19 December 2017; Revised 20 March 2018; Accepted 24 April 2018; Published 1 August 2018

Academic Editor: Gao Cong

Copyright © 2018 Wei Jiang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Obtaining an optimum data representation is a challenging issue that arises in many intelligent data processing techniques such as data mining, pattern recognition, and gene clustering. Many existing methods formulate this problem as a nonnegative matrix factorization (NMF) approximation problem. The standard NMF uses the least-squares loss function, which is not robust to outlier points and noise and fails to utilize prior label information to enhance the discriminability of representations. In this study, we develop a novel matrix factorization method called robust semisupervised nonnegative local coordinate factorization by integrating robust NMF, a robust local coordinate constraint, and local spline regression into a unified framework. We use the $\ell_{2,1}$ norm for the loss function of the NMF and for the local coordinate constraint term to make our method insensitive to outlier points and noise. In addition, we exploit the local and global consistencies of sample labels to guarantee that the data representation is compact and discriminative. An efficient multiplicative updating algorithm is derived to minimize the resulting objective function, followed by a rigorous proof of convergence. Several experiments conducted in this study on face and gene datasets indicate that the proposed method is more effective and robust than the state-of-the-art methods.

#### 1. Introduction

Owing to the rapid development of data collection and storage techniques, there has been an increase in the demand for effective data representation approaches [1] to cope with image and gene information, particularly in the fields of pattern recognition, machine learning, and gene clustering. For large databases, an efficient representation of data [2–4] can improve the performance of numerous intelligent learning systems such as those used for classification and clustering analysis. In many application fields, the input samples are represented in high-dimensional form, which is infeasible for direct calculation. The efficiency and effectiveness of learning models exponentially decrease with each increase in the dimensionality of input samples, which is generally referred to as the “curse of dimensionality.” Accordingly, dimensionality reduction [5–7] is becoming increasingly important as it can overcome the curse of dimensionality, enhance the learning speed, and even offer critical insights into the essence of the issue. In general, dimensionality reduction methods can be divided into two categories: feature extraction [5, 8, 9] and selection [10–14]. Feature selection involves selecting discriminative and highly related features from an input feature set, whereas feature extraction combines original features to form new features of data variables.

In recent years, there has been an increasing interest in feature extraction. Many feature extraction methods are designed to obtain a low-dimensional feature of high-dimensional data. These methods include singular value decomposition (SVD), principal component analysis (PCA) [5], nonnegative matrix factorization (NMF) [15, 16], and concept factorization (CF) [17]. Despite the different motivations of these models, they can all be interpreted as matrix decomposition, which often finds two or more low-dimensional matrices to approximate the original matrix. Factorization leads to a reduced representation of high-dimensional data and belongs to the category of methods employed for dimension reduction.

Unlike PCA [5] and SVD, NMF [15, 16] factorizes a sample matrix as a product of two matrices constrained to have nonnegative elements. One matrix comprises new basis vectors that reveal the semantic structure, and the other matrix can be regarded as the set of coefficients that express each sample point as a linear combination of the new bases. Owing to its ability to extract the most discriminative features and its computational feasibility, many extensions [4, 18, 19] of NMF have been developed from various perspectives to enhance the original NMF. Sparseness-constrained NMF [20] has been introduced by adding $\ell_1$ norm minimization on the learned factor matrices to enhance sparsity for data representation. Fisher’s criterion [21] has been incorporated into the NMF formulation to achieve a discriminant representation. The semi- and convex-NMF formulations [22] relax the nonnegativity constraint of NMF by allowing the basis and coefficient matrices to have mixed signs, thereby extending the applicability of the method. Liu et al. [23] proposed a constrained NMF in which the label information is incorporated into the standard NMF for data representation. Cai et al. [24] extended NMF and proposed a graph-regularized NMF (GNMF) scheme, which uses an affinity graph to impose the intrinsic geometry latent in a high-dimensional dataset onto the traditional NMF. Chen et al. [9] presented a nonnegative local coordinate factorization (NLCF) method that imposes a locality constraint onto the original NMF to explore faithful intrinsic geometry.

Traditional NMF and its variants usually adopt the squared Euclidean distance to measure the approximation error. Although it has a solid theoretical foundation and has shown encouraging performance in most cases, the squared Euclidean distance is not always optimal for decomposing a data matrix. The squared error is optimal when the noise is Gaussian [25]. However, real-world data often violate this assumption. The squared loss is sensitive to outlier points and noise when the reconstruction error is measured, and even a single outlier point may dominate the objective function. In recent years, some variants have been presented to enhance the robustness of the classical NMF. A robust type of NMF that factorizes the sample matrix as the sum of a product of two nonnegative matrices and one sparse error matrix was presented by Zhang et al. [26]. Zhang et al. [27] presented a robust NMF (RNMF) using an $\ell_{2,1}$ norm objective function, which can deal with outlier points and noise. Zhang et al. [28] presented a robust nonnegative graph-embedding framework (RNGE) that can simultaneously cope with noisy labels, noisy data, and uneven distribution.

Supervised learning algorithms [29–32] can generally achieve better performance than unsupervised learning techniques when label information is available. The motivation of semisupervised learning methods [33–38] is to employ numerous unlabeled samples together with relatively few labeled samples to construct a better model for high-dimensional data analysis. A surge of research interest in graph-based semisupervised learning techniques [37–40] has recently occurred. Gaussian fields and harmonic functions (GFHF) [33] is an efficient and effective semisupervised learning method in which the predicted label matrix is estimated on the graph with respect to manifold smoothness and label fitness. Xiang et al. [37] presented a method called local spline regression (LSR) in which an iterative algorithm is built on local neighborhoods through spline regression. Han et al. [38] presented a model of video semantic recognition using semisupervised feature selection via spline regression (S2FS2R). These methods not only consider label information but also employ the local and global structure consistency assumption.

Despite NMF’s appealing advantages, it suffers from the following problems in real-world applications: (1) Data may often be contaminated by noise and outliers due to illumination (e.g., specular reflections), image noise (e.g., scanned image data), and occlusion (e.g., sunglasses or a scarf in front of a face), among others. Although NMF can deal with noise in the test data to some extent, it suffers from severe performance degradation when the training samples are noisy. (2) In an NMF method, a data point may be represented by basis vectors that are far from the data point, resulting in poor clustering performance. The standard NMF does not preserve locality during its decomposition process, whereas local coordinate coding can preserve such properties. (3) One of the challenges for classification tasks in the real world is the lack of labeled training data. Therefore, data labeled by an expert is often used as an alternative. Unfortunately, assigning labels requires considerable human effort and is thus time-consuming and difficult to manage. In addition, an accurate label may require expert knowledge. Unlabeled samples, by contrast, are relatively easy to obtain.

To address all the aforementioned issues, we present an efficient and effective matrix factorization framework called robust semisupervised nonnegative local coordinate factorization (RSNLCF) in which both the data reconstruction function and a local coordinate constraint regularization term are formulated with the $\ell_{2,1}$ norm to make our model robust to outlier points and noise. By integrating Green’s functions and a set of primitive polynomials into the local spline, the local and global label consistency of data can be characterized based on their distribution. The main work of our study and its contributions are summarized as follows:

(i) The proposed RSNLCF model is robust to outlier points and noise as a result of employing the $\ell_{2,1}$ norm formulations of NMF and a local coordinate constraint regularization term. In addition, to guarantee that the data representation is discriminative, local spline regression over labels is exploited.

(ii) Unlike traditional dimension reduction approaches that treat feature extraction and selection separately, the proposed RSNLCF algorithm integrates the two aspects into a single optimization framework.

(iii) We present an efficient algorithm to solve the presented RSNLCF model and provide a rigorous convergence proof and correctness analysis of our model.

The remainder of this paper is organized as follows. Related studies are introduced in Section 2. We introduce our RSNLCF method and the optimization scheme in Section 3 and offer a convergence proof in Section 4. We describe and analyze the results of our experiments in Section 5. We conclude and discuss future work in Section 6.

#### 2. Related Work

In this section, we summarize the notations and the definitions of the norms used in this study and briefly review NMF.

##### 2.1. Notations and Definitions

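Matrices and vectors are denoted by boldface capital and lowercase letters, respectively. $\|\mathbf{a}\|_p$ denotes the $\ell_p$ norm of the vector $\mathbf{a}$. $\mathbf{a}^i$ and $\mathbf{a}_j$ denote the $i$th row and the $j$th column of matrix $\mathbf{A}$, respectively. $A_{ij}$ is the element in the $i$th row and $j$th column of $\mathbf{A}$, $\mathrm{Tr}(\mathbf{A})$ denotes the trace of $\mathbf{A}$ if $\mathbf{A}$ is a square matrix, and $\mathbf{A}^T$ denotes the transposed matrix of $\mathbf{A}$. The Frobenius norm of the matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$ is defined as

$$\|\mathbf{A}\|_F = \sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n} A_{ij}^2} = \sqrt{\mathrm{Tr}(\mathbf{A}^T\mathbf{A})}. \tag{1}$$

The $\ell_{2,1}$ norm of a matrix is defined as

$$\|\mathbf{A}\|_{2,1} = \sum_{i=1}^{m}\sqrt{\sum_{j=1}^{n} A_{ij}^2} = \sum_{i=1}^{m}\|\mathbf{a}^i\|_2 = \mathrm{Tr}(\mathbf{A}^T\mathbf{D}\mathbf{A}), \tag{2}$$

where $\mathbf{D}$ is a diagonal matrix with $D_{ii} = 1/\|\mathbf{a}^i\|_2$. However, $\|\mathbf{a}^i\|_2$ could approach zero. For this case, we define $D_{ii} = 1/(\|\mathbf{a}^i\|_2 + \epsilon)$, where $\epsilon$ is a very small constant.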
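The relation between the $\ell_{2,1}$ norm and its trace form can be checked numerically. The following is a minimal sketch, assuming the row-wise convention (sum of the Euclidean norms of the rows) and the scaling $D_{ii} = 1/(\|\mathbf{a}^i\|_2 + \epsilon)$; the function names `l21_norm` and `l21_weights` are illustrative and not taken from the paper.

```python
import numpy as np

def l21_norm(A):
    """Sum of the Euclidean norms of the rows of A (row-wise l2,1 norm)."""
    return float(np.sum(np.linalg.norm(A, axis=1)))

def l21_weights(A, eps=1e-12):
    """Diagonal matrix D with D_ii = 1 / (||a^i||_2 + eps).

    The small constant eps (an assumed value) keeps D finite for zero rows.
    """
    row_norms = np.linalg.norm(A, axis=1)
    return np.diag(1.0 / (row_norms + eps))

A = np.array([[3.0, 4.0],
              [0.0, 0.0],   # a zero row, handled by eps
              [1.0, 0.0]])

D = l21_weights(A)
direct = l21_norm(A)                        # row norms are 5, 0, and 1
via_trace = float(np.trace(A.T @ D @ A))    # trace form with fixed D
```

With `D` held fixed, the trace form is a smooth quadratic in `A`, which is what makes an iteratively reweighted solver possible.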

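Assume that the data samples are represented as $\mathbf{X} = [\mathbf{X}_l, \mathbf{X}_u] \in \mathbb{R}^{m \times n}$, where $\mathbf{X}_l$ and $\mathbf{X}_u$ denote labeled and unlabeled data, respectively. The labels of $\mathbf{X}_l$ are denoted as $y_i \in \{1, \ldots, c\}$, with $c$ being the total number of categories. Let $\mathbf{Y} \in \{0, 1\}^{n \times c}$ be a label indicator binary matrix with the $(i, j)$th entry $Y_{ij} = 1$ if and only if $\mathbf{x}_i$ is labeled with the $j$th class; $Y_{ij} = 0$ otherwise. We also introduce a predicted label matrix $\mathbf{F} \in \mathbb{R}^{n \times c}$, where each row $\mathbf{f}^i$ is the predicted label vector of the data point $\mathbf{x}_i$.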

##### 2.2. NMF

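Given a nonnegative matrix $\mathbf{X} \in \mathbb{R}_+^{m \times n}$, each column of $\mathbf{X}$ is a sample point. The main idea of NMF is to find two nonnegative matrices $\mathbf{U} \in \mathbb{R}_+^{m \times k}$ and $\mathbf{V} \in \mathbb{R}_+^{k \times n}$ that minimize the Euclidean distance between $\mathbf{X}$ and $\mathbf{U}\mathbf{V}$. The corresponding optimization problem is as follows:

$$\min_{\mathbf{U} \ge 0,\, \mathbf{V} \ge 0} \|\mathbf{X} - \mathbf{U}\mathbf{V}\|_F^2, \tag{3}$$

where $\|\cdot\|_F$ is the Frobenius norm. To solve the objective function, Lee and Seung [15] proposed an iterative multiplicative updating algorithm as follows:

$$U_{ik} \leftarrow U_{ik}\frac{(\mathbf{X}\mathbf{V}^T)_{ik}}{(\mathbf{U}\mathbf{V}\mathbf{V}^T)_{ik}}, \qquad V_{kj} \leftarrow V_{kj}\frac{(\mathbf{U}^T\mathbf{X})_{kj}}{(\mathbf{U}^T\mathbf{U}\mathbf{V})_{kj}}. \tag{4}$$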

In NMF, each column of $\mathbf{U}$ can be viewed as a basis vector, while the matrix $\mathbf{V}$ can be treated as the set of coefficients. Each sample point $\mathbf{x}_j$ is approximated by a linear combination of the bases, weighted by the components of $\mathbf{v}_j$.
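The multiplicative updating scheme of Lee and Seung for the Frobenius-norm objective can be sketched as follows. This is an illustrative implementation of standard NMF only, not of RSNLCF; it assumes the factorization $\mathbf{X} \approx \mathbf{U}\mathbf{V}$ with samples stored as columns, and the function name, the iteration count, and the small constant `eps` guarding the denominators are assumptions made for this example.

```python
import numpy as np

def nmf_multiplicative(X, k, n_iter=200, eps=1e-10, seed=0):
    """Standard NMF, X ~ U @ V, via multiplicative updates.

    X: (m, n) nonnegative data matrix, one sample per column.
    k: number of basis vectors.
    Returns U (m, k), V (k, n), and the squared-error history.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, k)) + eps
    V = rng.random((k, n)) + eps
    history = []
    for _ in range(n_iter):
        # Elementwise updates; nonnegativity is preserved because every
        # factor in the ratio is nonnegative.
        U *= (X @ V.T) / (U @ V @ V.T + eps)
        V *= (U.T @ X) / (U.T @ U @ V + eps)
        history.append(float(np.linalg.norm(X - U @ V, "fro") ** 2))
    return U, V, history

# Synthetic nonnegative data of rank 4 (hypothetical test input).
rng = np.random.default_rng(1)
X = rng.random((20, 4)) @ rng.random((4, 30))
U, V, history = nmf_multiplicative(X, k=4)
```

The recorded `history` makes the monotone decrease of the reconstruction error easy to inspect.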

#### 3. The Proposed RSNLCF Framework

In this section, we introduce our novel learning method for image clustering (RSNLCF), which is used to find an effective and robust representation of data.

##### 3.1. Robust Sparse NMF

The square loss function based on the Frobenius norm is used to learn the data representations in NMF. However, it is very sensitive to outlier points and noise. Therefore, our robust representation model is represented as where is the regularization parameter. Because the $\ell_{2,1}$ norm does not square the residuals, errors of large magnitude contribute less to the loss function, and corrupted samples are much less likely to dominate the objective function. In this sense, the loss function is insensitive to outlier points and noise. Meanwhile, the regularization term ensures that is sparse in rows. This means that some of ’s rows approximate zero. Consequently, can be considered the combination coefficient for the most discriminative features. Feature selection is then achieved by , where only the features related to the nonzero rows in are chosen.
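The difference in outlier sensitivity between the two losses can be illustrated with a toy residual matrix. This is a minimal sketch under stated assumptions: the residual of each sample is a column of `R`, the robust loss sums the unsquared column norms, and the residual values are invented for illustration.

```python
import numpy as np

def outlier_share(R, power):
    """Fraction of the total loss contributed by the worst column of R.

    power=2 corresponds to the squared Frobenius loss (sum of squared
    column norms); power=1 corresponds to a column-wise l2,1 loss.
    """
    col_losses = np.linalg.norm(R, axis=0) ** power
    return float(col_losses.max() / col_losses.sum())

# Hypothetical residuals: 10 features, 50 samples.
R = np.zeros((10, 50))
R[0, :] = 1.0      # 49 clean samples with residual norm 1
R[0, 0] = 100.0    # one corrupted sample with residual norm 100

share_squared = outlier_share(R, power=2)   # 10000 / 10049
share_l21 = outlier_share(R, power=1)       # 100 / 149
```

In this example the single corrupted sample accounts for more than 99% of the squared loss but roughly two thirds of the unsquared loss, so its influence is reduced although not removed.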

##### 3.2. Robust Local Coordinate Constraint

Motivated by the concept of local coordinate coding [41], we present a robust local coordinate constraint as a regularization term for image clustering. First, we define coordinate coding.

*Definition 1. *Coordinate coding [41] can be written as concept pair (, ), where is defined as a set of anchor points with dimensions and is a map of to . It induces the following physical approximation of in .

For the local coordinate coding system, NMF can be considered as coordinate coding in which the columns of the matrix can be viewed as a set of anchor points, and each column of the coefficient matrix represents the corresponding coordinate coding for each data point. We might further hope that each sample point is represented as a linear combination of only a few proximate anchor points. A natural assumption here would be that if is far away from the anchor points , then its coordinate coding with respect to will tend to be zero and thus achieve sparsity and locality simultaneously. The local coordinate constraint [41] can be defined as follows: where denotes the th column of , is the th column of , is the coordinate of with respect to , and , indicates a conversion of the vector into a diagonal matrix in which the th diagonal element is .
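The locality idea described above can be sketched numerically. The penalty below, $\sum_i \sum_k |V_{ki}| \, \|\mathbf{u}_k - \mathbf{x}_i\|^2$, follows the form used in the local coordinate coding literature and is an assumption about the exact expression intended here; the function name and the toy data are illustrative.

```python
import numpy as np

def local_coordinate_penalty(X, U, V):
    """Return sum_i sum_k |V[k, i]| * ||U[:, k] - X[:, i]||^2.

    X: (m, n) samples as columns; U: (m, k) anchors as columns;
    V: (k, n) coordinates of each sample with respect to the anchors.
    """
    diff = U[:, :, None] - X[:, None, :]     # shape (m, k, n)
    sq_dist = np.sum(diff ** 2, axis=0)      # sq_dist[k, i]
    return float(np.sum(np.abs(V) * sq_dist))

X = np.array([[0.0, 10.0],
              [0.0, 0.0]])      # two samples in 2-D
U = np.array([[0.0, 10.0],
              [1.0, 1.0]])      # two anchor points, one near each sample

V_local = np.eye(2)             # each sample coded by its nearest anchor
V_far = np.array([[0.0, 1.0],
                  [1.0, 0.0]])  # each sample coded by the distant anchor

penalty_local = local_coordinate_penalty(X, U, V_local)
penalty_far = local_coordinate_penalty(X, U, V_far)
```

Coding a sample with a distant anchor is penalized heavily, which pushes the corresponding coefficients toward zero and yields sparsity and locality together.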

The local coordinate constraint employs a square loss. When the dataset is corrupted by outlier points and noise, the local coordinate constraint may fail to achieve sparsity and locality simultaneously. In order to alleviate the side effect of noisy data, our robust local coordinate constraint can be formulated as where the Frobenius norm-based square loss function has been substituted by the $\ell_{2,1}$ norm.

##### 3.3. Local Spline Regression

In this subsection, we briefly introduce local spline regression [42].

Given data points sampled from the underlying submanifold , we use set to denote and its nearest neighbor points, where , and is the local predicted label matrix for the th region. The task of local spline regression is to seek the predicting function in order to map each data point to the local predicted class label . The model of local spline regression can be expressed as where is a regularization term and is a small positive regularization parameter to control the smoothness of the spline [42]. If is defined as a seminorm of a Sobolev space, can be solved by the following objective function [43]: where , in which is the order of the partial derivatives [43]. and are a set of primitive polynomials and a Green’s function, respectively. The coefficients and can be achieved by solving the following problem: where is a symmetrical matrix with elements , and is a matrix with its elements . The local spline regression model can then be expressed as [42] where is the upper left submatrix of the inverse matrix of the coefficient matrix in (10). Because the local predicted label matrix is a part of the global predicted label matrix , we can construct a selection matrix for each such that where the selection matrix is defined as follows:
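The role of the selection matrix can be shown with a small example. This sketch assumes that the global predicted label matrix stores one row per data point, so that a binary matrix `S` of size (neighborhood size) × (number of points) extracts the local rows by left multiplication; the neighbor indices and the stand-in label matrix are hypothetical.

```python
import numpy as np

def selection_matrix(neighbor_idx, n):
    """Binary matrix S with S[p, q] = 1 iff q is the p-th neighbor index."""
    k = len(neighbor_idx)
    S = np.zeros((k, n))
    S[np.arange(k), neighbor_idx] = 1.0
    return S

n, c = 6, 3
# Stand-in global predicted label matrix (n points, c classes).
F = np.arange(n * c, dtype=float).reshape(n, c)

neighbors = [4, 0, 2]            # hypothetical neighborhood of one point
S = selection_matrix(neighbors, n)
F_local = S @ F                  # rows 4, 0, and 2 of F, in that order
```

Writing the extraction as a matrix product lets the local losses be summed into a single quadratic form in the global matrix.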

After the local predicted label matrices are established, we combine them by minimizing the following loss function: where

Based on the studies of [33, 34], the predicted label matrix of the labeled data points should be consistent with the ground truth labels matrix . With the consistence constraints, the objective function (14) can be written as follows: where is a diagonal matrix whose diagonal elements are 1 for labeled data and 0 for unlabeled data, and the elements of are defined as follows:

When is sufficiently large, the optimal solution to the problem (16) makes the second term approximately equal to zero. Thus, the objective function (16) guarantees local and global structural consistency over labels. All the elements of are restricted to be nonnegative.

##### 3.4. Objective Function of RSNLCF

By combining the RNMF (5), robust local coordinate constraint (7), and semisupervised local spline regression (16) into a unified framework, we can formulate the objective function as follows: where and are two trade-off parameters. We call (18) our proposed RSNLCF.

#### 4. Optimization

The objective function (18) involves the $\ell_{2,1}$ norm, which is nonsmooth, so the problem does not have a closed-form solution. Consequently, we propose to solve it iteratively as follows.

Denote and . When considering the nonnegative constraint on , , and , the objective function (18) could be reformulated as where , , and are three diagonal matrices with their diagonal elements given as , , and , respectively.

##### 4.1. Update Rules

The objective function of RSNLCF in (19) is not jointly convex in all of its variables. Therefore, it is unrealistic to expect an algorithm to find the global minimum. In this subsection, we describe an iterative algorithm based on the Lagrangian multiplier method, which can reach a local minimum. Following some algebraic steps, the objective function can be written as follows:

To tackle the nonnegative constraint on , , and , the objective (20) can be rewritten as the Lagrangian multiplier. where , , and are the Lagrangian multipliers. Let the partial derivatives of the objective function (21) with respect to , , and be zero. Thus, we have where is a diagonal matrix whose entries are row sums of . is a matrix whose columns are . is a matrix, and .

Based on the Karush-Kuhn-Tucker conditions [44] and , we obtain

The corresponding equivalent formulas are as follows:

Solving (24), (25), and (26), we obtain the following update rules, given by

In this manner, we obtain the solver for the objective function (19).

##### 4.2. Convergence Analysis

In this subsection, we demonstrate that the objective function (20) converges to a local optimum by using the update rules (27), (28), and (29) after finite iterations. We adopt the auxiliary function approach [16] to prove the convergence. Here, we first introduce the definition of an auxiliary function.

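*Definition 2.* $G(h, h')$ is an auxiliary function for $F(h)$ if the following properties are satisfied:

$$G(h, h') \ge F(h), \qquad G(h, h) = F(h). \tag{30}$$

Lemma 1. *If $G$ is an auxiliary function for $F$, then $F$ is nonincreasing under the update:*

$$h^{(t+1)} = \arg\min_{h} G(h, h^{(t)}). \tag{31}$$

*Proof 1.*

$$F(h^{(t+1)}) \le G(h^{(t+1)}, h^{(t)}) \le G(h^{(t)}, h^{(t)}) = F(h^{(t)}). \tag{32}$$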
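The auxiliary function argument can be illustrated on a one-dimensional toy problem. The sketch below is not the paper's objective: it minimizes the assumed function $F(h) = (h - 2)^2 + |h|$ using the majorizer $|h| \le h^2 / (2|h'|) + |h'|/2$, which is the same reweighting idea that underlies the trace form of the $\ell_{2,1}$ norm. All names are illustrative.

```python
def objective(h):
    """Toy nonsmooth objective F(h) = (h - 2)^2 + |h|."""
    return (h - 2.0) ** 2 + abs(h)

def auxiliary(h, h_prev):
    """Majorizer G(h, h_prev) >= F(h), with equality at h == h_prev.

    Uses |h| <= h^2 / (2|h_prev|) + |h_prev| / 2, valid for h_prev != 0.
    """
    return (h - 2.0) ** 2 + h ** 2 / (2.0 * abs(h_prev)) + abs(h_prev) / 2.0

def mm_step(h_prev):
    """Closed-form minimizer of auxiliary(., h_prev).

    Setting 2(h - 2) + h / |h_prev| = 0 gives h = 4a / (2a + 1), a = |h_prev|.
    """
    a = abs(h_prev)
    return 4.0 * a / (2.0 * a + 1.0)

h = 0.5
trace = [objective(h)]
for _ in range(40):
    h = mm_step(h)
    trace.append(objective(h))
```

Each step minimizes the auxiliary function built at the current iterate, so the objective values in `trace` never increase, and the iterate approaches the minimizer $h = 1.5$ of the toy objective.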

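Lemma 2. *For any nonnegative matrices $\mathbf{A} \in \mathbb{R}^{n \times n}$, $\mathbf{B} \in \mathbb{R}^{k \times k}$, $\mathbf{S} \in \mathbb{R}^{n \times k}$, and $\mathbf{S}' \in \mathbb{R}^{n \times k}$, where $\mathbf{A}$ and $\mathbf{B}$ are symmetric, the following inequality holds:*

$$\sum_{i=1}^{n}\sum_{p=1}^{k}\frac{(\mathbf{A}\mathbf{S}'\mathbf{B})_{ip}\,S_{ip}^2}{S'_{ip}} \ge \mathrm{Tr}(\mathbf{S}^T\mathbf{A}\mathbf{S}\mathbf{B}). \tag{33}$$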

The convergence of the algorithms is demonstrated in the following:

For given , the optimizing objective function (20) w.r.t. is equivalent to minimizing

Theorem 1. *The following function
is an auxiliary function for .*

*Proof 1. *In one sense, is obvious. However, we need to prove that To accomplish this, we compare (34) and (35) to find out that .

By applying Lemma 2, we obtain

To obtain the upper bound for the third and fifth terms, we use the inequality , which holds for any , , and these third and fifth terms in are bounded by

To obtain lower bounds for the remaining terms, we adopt the inequality , , and then

Summing all inequalities, we can obtain which obviously satisfies . Therefore, is an auxiliary function of .

Theorem 2. *The updating rule (28) can be obtained by minimizing the auxiliary function .*

*Proof 1. *To find the minimum of , we set the derivative and obtain

Thus, by simple algebraic manipulation, we obtain the iterative updating rule (28).

Based on the properties of the auxiliary function, the objective function (20) is monotonically nonincreasing under the updating rule (28).

The convergence proofs for the updating rules (27) and (29) are similar to the one given above.

#### 5. Experiments and Discussion

We systematically evaluated the performance of the proposed RSNLCF method and compared it with several popular clustering methods.

##### 5.1. Datasets

Three standard face datasets and one gene dataset were selected to evaluate the different methods. The four datasets are described as follows:

(i) *Extended YaleB dataset*: the extended YaleB dataset contains 2414 frontal face images of 38 individuals. In this dataset, the size of each face image is 192 × 168, and the images were acquired under 64 illumination conditions and nine poses. Each image was resized to 32 × 32 in our experiments.

(ii) *ORL face dataset*: the ORL dataset contains 400 images of 40 individuals. All images were captured at different times and with different variations including lighting, facial expressions (open and closed eyes, smiling, and not smiling), and specific facial details (glasses and no glasses). The original images had a size of 92 × 112. Each image was rescaled to 32 × 32.

(iii) *AR dataset*: the AR dataset contains over 4000 frontal face images of 126 individuals (70 men and 56 women) with different facial expressions, illumination conditions, and occlusions (sunglasses and scarf). All individuals participated in two photo sessions, and 26 images of each individual were captured. Each image was scaled to 32 × 32.

(iv) *Leukemia dataset*: the leukemia dataset contains samples of acute myelogenous leukemia (AML) and acute lymphoblastic leukemia (ALL). ALL can be further classified into T and B subtypes. This dataset consists of 5000 genes measured in 38 tumor samples and contains 19 samples of B-cell ALL, eight samples of T-cell ALL, and 11 samples of AML.

##### 5.2. Experimental Design

In this section, we describe our evaluation metrics, the compared methods, and our parameter selection.

###### 5.2.1. Evaluation Metrics

In our experiments, two widely used metrics (i.e., accuracy (Acc) and normalized mutual information (NMI)) were adopted to evaluate the clustering results [45]. We evaluated the algorithms by comparing the cluster label of each data point with the label provided by the dataset. The Acc metric is defined as follows:

$$\mathrm{Acc} = \frac{\sum_{i=1}^{n}\delta(s_i, \mathrm{map}(r_i))}{n},$$

where $n$ refers to the total number of samples, $r_i$ denotes the cluster label of $\mathbf{x}_i$, and $s_i$ is the true class label. In addition, $\delta(x, y)$ is the delta function that is equal to 1 if $x = y$ and 0 otherwise, and $\mathrm{map}(r_i)$ is the mapping function that maps the obtained label $r_i$ to the equivalent label from the dataset. The best mapping function can be determined by using the Kuhn-Munkres algorithm [46]. The value of Acc is equal to 1 if and only if the clustering result and the true labels are identical. The second measure is the NMI, which is adopted in order to evaluate the quality of clusters. Given a clustering result, the NMI is defined as follows:

$$\mathrm{NMI} = \frac{\sum_{i=1}^{c}\sum_{j=1}^{c} n_{ij}\log\dfrac{n\,n_{ij}}{n_i\hat{n}_j}}{\sqrt{\left(\sum_{i=1}^{c} n_i\log\dfrac{n_i}{n}\right)\left(\sum_{j=1}^{c}\hat{n}_j\log\dfrac{\hat{n}_j}{n}\right)}},$$

where $n_i$ denotes the number of images contained in the $i$th cluster $C_i$ based on the clustering results, $\hat{n}_j$ is the number of images belonging to the $j$th class $L_j$, and $n_{ij}$ is the number of images that are in the intersection of $C_i$ and $L_j$.
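Both metrics can be computed with a few lines of standard-library Python. The sketch below is illustrative rather than the evaluation code used in the experiments: for brevity it finds the best label mapping by brute force over permutations instead of the Kuhn-Munkres algorithm, which is only practical for a small number of clusters, and the function names and toy labels are assumptions.

```python
from collections import Counter
from itertools import permutations
from math import log, sqrt

def clustering_accuracy(true_labels, cluster_labels):
    """Acc under the best one-to-one mapping of clusters to classes.

    Brute-force search over mappings; a stand-in for Kuhn-Munkres.
    """
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_labels))
    if len(clusters) > len(classes):
        raise ValueError("this sketch needs at most as many clusters as classes")
    overlap = Counter(zip(cluster_labels, true_labels))
    best = 0
    for assigned in permutations(classes, len(clusters)):
        hits = sum(overlap[(r, s)] for r, s in zip(clusters, assigned))
        best = max(best, hits)
    return best / len(true_labels)

def normalized_mutual_information(true_labels, cluster_labels):
    """NMI normalized by the geometric mean of the two entropies."""
    n = len(true_labels)
    cluster_sizes = Counter(cluster_labels)
    class_sizes = Counter(true_labels)
    overlap = Counter(zip(cluster_labels, true_labels))
    mutual = sum(n_ij * log(n * n_ij / (cluster_sizes[r] * class_sizes[s]))
                 for (r, s), n_ij in overlap.items())
    h_cluster = sum(n_i * log(n_i / n) for n_i in cluster_sizes.values())
    h_class = sum(n_j * log(n_j / n) for n_j in class_sizes.values())
    denom = sqrt(h_cluster * h_class)
    return mutual / denom if denom > 0 else 0.0

truth = [0, 0, 0, 1, 1, 1]
perfect = [1, 1, 1, 0, 0, 0]     # same partition, permuted cluster ids
imperfect = [1, 1, 0, 0, 0, 0]   # one sample assigned to the wrong cluster

acc_perfect = clustering_accuracy(truth, perfect)
nmi_perfect = normalized_mutual_information(truth, perfect)
acc_imperfect = clustering_accuracy(truth, imperfect)
nmi_imperfect = normalized_mutual_information(truth, imperfect)
```

Both metrics are invariant to how the cluster ids are numbered, which is why the permuted labeling still scores 1.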

###### 5.2.2. Compared Methods

To verify the clustering performance of our RSNLCF, several popular methods were compared on the same datasets. The methods are listed as follows:

(i) RNMF using the $\ell_{2,1}$ norm [27]

(ii) Semisupervised graph-regularized NMF (semi-GNMF) [24]

(iii) Constrained NMF (CNMF) [23]

(iv) Local centroid-structured NMF (LCSNMF) [47]

(v) Unsupervised robust seminonnegative graph embedding through the $\ell_{2,1}$ norm (URNGE) [28]

(vi) Nonnegative local coordinate factorization (NLCF) [9]

(vii) Our proposed RSNLCF

Sample images are shown in Figure 1.