Mathematical Problems in Engineering

Volume 2014, Article ID 439417, 10 pages

http://dx.doi.org/10.1155/2014/439417

## Low-Rank Representation for Incomplete Data

^{1}School of Science, Xi’an University of Architecture and Technology, Xi’an 710055, China

^{2}School of Mathematics and Computer Science, Shaanxi University of Technology, Hanzhong 723001, China

Received 20 August 2014; Revised 25 November 2014; Accepted 19 December 2014; Published 31 December 2014

Academic Editor: Wanquan Liu

Copyright © 2014 Jiarong Shi et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Low-rank matrix recovery (LRMR) has become an increasingly popular technique for analyzing data with missing entries, gross corruptions, and outliers. As a significant component of LRMR, the low-rank representation (LRR) model seeks the lowest-rank representation among all samples and is robust in recovering subspace structures. This paper addresses the problem of LRR with partially observed entries. First, we construct a nonconvex minimization problem that accounts for low rankness, robustness, and incompleteness. Then we employ the technique of augmented Lagrange multipliers to solve the proposed program. Finally, experimental results on synthetic and real-world datasets validate the feasibility and effectiveness of the proposed method.

#### 1. Introduction

In pattern recognition, machine learning, and computer vision, the datasets under investigation usually have an intrinsically low-rank structure even though they may be high-dimensional. Low-rank matrix recovery (LRMR) [1–3] is a family of models that exploit this low-complexity structure to complete missing entries, recover sparse noise, identify outliers, and build affinity matrices. It can also be regarded as a generalization of compressed sensing from first-order (vector) to second-order (matrix) data, because the low rankness of a matrix is equivalent to the sparsity of its singular values. Recently, LRMR has received increasing attention in information science and engineering and has achieved great success in video background modeling [4, 5], collaborative filtering [6, 7], and subspace clustering [3, 8, 9], to name just a few.

Generally, LRMR comprises three main types of model: matrix completion (MC) [1], robust principal component analysis (RPCA) [2, 4], and low-rank representation (LRR) [3, 8]. Among them, MC aims to complete the missing entries with the aid of the low-rank property and was initially described as an affine rank minimization problem. The affine rank minimization has since been convexly relaxed into a nuclear norm minimization [10], and it has been proven that if the number of sampled entries and the singular vectors satisfy certain conditions, then most low-rank matrices can be perfectly recovered by solving this convex program [1].

Classical principal component analysis (PCA) [11] is very effective against small Gaussian noise, but it does not work well in practice when data samples are corrupted by outliers or large sparse noise. To address this, several robust variants of PCA have been proposed during the past two decades [12, 13]. Since the seminal work [4], the principal component pursuit (PCP) approach has become a standard for RPCA. This approach minimizes a weighted combination of the nuclear norm and the $\ell_1$ norm subject to linear equality constraints. It has been proven that both the low-rank and the sparse components can be recovered exactly with high probability under some conditions by solving PCP [4].

In subspace clustering, a commonly used assumption is that the data lie in a union of multiple low-rank subspaces and that each subspace has sufficiently many samples compared with its rank. Liu et al. [3] proposed a robust subspace recovery technique via LRR. Any sample in a subspace can be represented as a linear combination of the bases, and the low complexity of the linear representation coefficients is very useful in exploiting the low-rank structure. LRR seeks the lowest-rank representation of all data jointly, and it has been demonstrated that data contaminated by outliers can be exactly recovered under certain conditions by solving a convex program [8]. If the bases are chosen as the columns of an identity matrix and the $\ell_1$ norm is employed to measure the sparsity, then LRR reduces to the PCP of RPCA.

For datasets with missing entries and large sparse corruption, the robust recovery of subspace structures can be a challenging task. The available algorithms for MC are not robust to gross corruption. Moreover, a large number of missing values degrades the recovery performance of LRR and RPCA. In this paper, we address the problem of low-rank subspace recovery in the presence of missing values and sparse noise. Specifically, we present a model of incomplete low-rank representation (ILRR), which is a direct generalization of LRR. The ILRR model reduces to a nonconvex optimization problem that minimizes a combination of the nuclear norm and the $\ell_{2,1}$-norm. To solve this program, we design an iterative scheme based on the method of inexact augmented Lagrange multipliers (ALM).

The rest of this paper is organized as follows. Section 2 briefly reviews preliminaries and related works on LRMR. The model and algorithm for ILRR are presented in Section 3. In Section 4, we discuss the extension of ILRR and its relationship with the existing works. We compare the performance of ILRR with the state-of-the-art algorithms on synthetic data and real-world datasets in Section 5. Finally, Section 6 draws some conclusions.

#### 2. Preliminaries and Related Works

This section introduces the relevant preliminary material concerning matrices and representative models of low-rank matrix recovery (LRMR).

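The choice of matrix norms plays a significant role in LRMR. In the following, we present four important types of matrix norms. For arbitrary $X \in \mathbb{R}^{m \times n}$, the Frobenius norm of $X$ is expressed by $\|X\|_F = \sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n} x_{ij}^2}$, the $\ell_1$-norm is $\|X\|_1 = \sum_{i=1}^{m}\sum_{j=1}^{n} |x_{ij}|$, the $\ell_{2,1}$-norm is $\|X\|_{2,1} = \sum_{j=1}^{n} \sqrt{\sum_{i=1}^{m} x_{ij}^2}$, and the nuclear norm is $\|X\|_* = \sum_{i=1}^{\min(m,n)} \sigma_i$, where $x_{ij}$ is the $(i,j)$th entry of $X$ and $\sigma_i$ is the $i$th largest singular value. Among them, the matrix nuclear norm is the tightest convex relaxation of the rank function, and the $\ell_1$-norm and the $\ell_{2,1}$-norm are frequently used to measure the sparsity of a noise matrix.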
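As a small numerical illustration of the four norms, the following NumPy sketch evaluates each of them on a tiny matrix. The function names are hypothetical and chosen only for this example; the $\ell_{2,1}$-norm is taken column-wise, which is the convention assumed here because each column represents one sample.

```python
import numpy as np

def frobenius_norm(X):
    # Square root of the sum of squared entries.
    return np.sqrt(np.sum(X ** 2))

def l1_norm(X):
    # Sum of absolute values of all entries.
    return np.sum(np.abs(X))

def l21_norm(X):
    # Sum of the Euclidean norms of the columns (assumed column-wise convention).
    return np.sum(np.sqrt(np.sum(X ** 2, axis=0)))

def nuclear_norm(X):
    # Sum of the singular values.
    return np.sum(np.linalg.svd(X, compute_uv=False))

# A rank-one matrix whose second column is entirely zero.
X_demo = np.array([[3.0, 0.0],
                   [4.0, 0.0]])
norms_demo = (frobenius_norm(X_demo), l1_norm(X_demo),
              l21_norm(X_demo), nuclear_norm(X_demo))
```

The zero column contributes nothing to the $\ell_{2,1}$-norm, which is why this norm is suited to noise that corrupts only a few samples.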

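Consider a data matrix $D \in \mathbb{R}^{m \times n}$ stacked by $n$ training samples, where each column of $D$ indicates a sample with the dimensionality of $m$. Within the field of LRMR, the following three proximal minimization problems [3, 4] are extensively employed:

$$\min_{X} \; \varepsilon \|X\|_1 + \frac{1}{2}\|X - Q\|_F^2, \qquad \min_{X} \; \varepsilon \|X\|_* + \frac{1}{2}\|X - Q\|_F^2, \qquad \min_{X} \; \varepsilon \|X\|_{2,1} + \frac{1}{2}\|X - Q\|_F^2, \tag{1}$$

where $\varepsilon$ is a positive constant used to balance the regularization term and the approximation error. For given $\varepsilon$ and $Q \in \mathbb{R}^{m \times n}$, we define three thresholding operators $\mathcal{S}_\varepsilon(Q)$, $\mathcal{D}_\varepsilon(Q)$, and $\mathcal{W}_\varepsilon(Q)$ as follows:

$$\left[\mathcal{S}_\varepsilon(Q)\right]_{ij} = \operatorname{sgn}(q_{ij}) \max\left(|q_{ij}| - \varepsilon, 0\right), \qquad \mathcal{D}_\varepsilon(Q) = U \mathcal{S}_\varepsilon(\Sigma) V^T, \qquad \left[\mathcal{W}_\varepsilon(Q)\right]_{:,j} = \frac{\max\left(\|q_j\|_2 - \varepsilon, 0\right)}{\|q_j\|_2}\, q_j, \tag{2}$$

where $U \Sigma V^T$ is the singular value decomposition (SVD) of $Q$, $q_j$ is the $j$th column of $Q$, and the $j$th column of $\mathcal{W}_\varepsilon(Q)$ is set to zero when $q_j = 0$. It is proven that the aforementioned three optimization problems have closed-form solutions denoted by $\mathcal{S}_\varepsilon(Q)$ [4], $\mathcal{D}_\varepsilon(Q)$ [14], and $\mathcal{W}_\varepsilon(Q)$ [8], respectively.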
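The three closed-form solutions are simple to implement. The sketch below is a minimal NumPy version under the standard definitions of entry-wise soft thresholding, singular value thresholding, and column-wise shrinkage; the function names are hypothetical, and the small constant guarding the division is an implementation detail assumed here to handle zero columns.

```python
import numpy as np

def soft_threshold(Q, eps):
    # Entry-wise shrinkage: proximal operator of eps * (l1-norm).
    return np.sign(Q) * np.maximum(np.abs(Q) - eps, 0.0)

def singular_value_threshold(Q, eps):
    # Shrink the singular values: proximal operator of eps * (nuclear norm).
    U, s, Vt = np.linalg.svd(Q, full_matrices=False)
    return (U * np.maximum(s - eps, 0.0)) @ Vt

def column_threshold(Q, eps):
    # Shrink each column toward zero: proximal operator of eps * (l2,1-norm).
    norms = np.linalg.norm(Q, axis=0)
    # Columns with norm <= eps get scale 0; the guard avoids division by zero.
    scale = np.maximum(norms - eps, 0.0) / np.maximum(norms, 1e-12)
    return Q * scale

S_demo = soft_threshold(np.array([[3.0, -0.5], [-2.0, 1.0]]), 1.0)
D_demo = singular_value_threshold(np.diag([3.0, 1.0]), 2.0)
W_demo = column_threshold(np.array([[3.0, 0.1], [4.0, 0.1]]), 1.0)
```

In this example the second column of the last input has norm below the threshold and is removed entirely, while the first column is rescaled by $(5 - 1)/5$.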

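We assume that $D$ is low-rank. Because the number of degrees of freedom of a low-rank matrix is far smaller than its number of entries, it is possible to recover all missing entries exactly from partially observed entries as long as the number of sampled entries satisfies certain conditions. Formally, the problem of matrix completion (MC) [1] can be formulated as follows:

$$\min_{X} \; \|X\|_* \quad \text{s.t.} \quad x_{ij} = d_{ij}, \; (i,j) \in \Omega, \tag{3}$$

where $\Omega \subseteq [m] \times [n]$, $[m] = \{1, 2, \ldots, m\}$, $[n] = \{1, 2, \ldots, n\}$. We define a linear projection operator $\mathcal{P}_\Omega : \mathbb{R}^{m \times n} \to \mathbb{R}^{m \times n}$ as follows:

$$\left[\mathcal{P}_\Omega(X)\right]_{ij} = \begin{cases} x_{ij}, & (i,j) \in \Omega, \\ 0, & \text{otherwise}. \end{cases} \tag{4}$$

Hence, the constraints in problem (3) can be rewritten as $\mathcal{P}_\Omega(X) = \mathcal{P}_\Omega(D)$.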
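In code, the sampling set is conveniently stored as a boolean mask, and the projection keeps the observed entries and zeroes the rest. The following sketch assumes this mask representation; the name `project` and the random test data are illustrative only.

```python
import numpy as np

def project(X, mask):
    # Keep entries where mask is True (the observed set), set the others to zero.
    return np.where(mask, X, 0.0)

rng = np.random.default_rng(0)
D_full = rng.standard_normal((4, 5))      # hypothetical complete data
mask = rng.random((4, 5)) < 0.6           # True marks an observed entry
M_obs = project(D_full, mask)             # incomplete data, zeros where missing
```

Projecting onto the mask and onto its complement splits a matrix into two parts that sum back to the original, which is the property used when observed entries are held fixed and only the missing ones are updated.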

PCA obtains the optimal estimate for small additive Gaussian noise but breaks down for large sparse contamination. Here, the data matrix $D$ is assumed to be the superposition of a low-rank matrix $A$ and a sparse noise matrix $E$. In this situation, robust principal component analysis (RPCA) is very effective at recovering both the low-rank and the sparse components by solving a convex program. Mathematically, RPCA can be described as the following nuclear norm minimization [2]:

$$\min_{A, E} \; \|A\|_* + \lambda \|E\|_1 \quad \text{s.t.} \quad D = A + E, \tag{5}$$

where $\lambda$ is a positive weighting parameter. Subsequently, RPCA was generalized into a stable version which is simultaneously stable to small perturbations and robust to gross sparse corruption [15].

We further assume that the dataset is self-expressive and that the representation coefficient matrix is also low-rank. Based on these two assumptions, the model of low-rank representation (LRR) [3] is expressed as

$$\min_{Z, E} \; \|Z\|_* + \lambda \|E\|_{2,1} \quad \text{s.t.} \quad D = DZ + E, \tag{6}$$

where $Z \in \mathbb{R}^{n \times n}$ is the coefficient matrix, $E \in \mathbb{R}^{m \times n}$ is the noise matrix, and $\lambda$ is a positive trade-off parameter. This model is very effective at detecting outliers, and the optimal $Z$ facilitates robust subspace recovery. In subspace clustering, the affinity matrix can be constructed from the optimal $Z$ of problem (6).
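The text does not specify how the affinity matrix is built from the coefficient matrix. One common choice in the LRR literature, assumed here purely for illustration, is to symmetrize the absolute values of the coefficients; the function name is hypothetical.

```python
import numpy as np

def affinity_from_coefficients(Z):
    # Symmetrize the magnitudes of the representation coefficients.
    # This |Z| + |Z|^T rule is an assumed, commonly used construction.
    A = np.abs(Z)
    return A + A.T

# Toy block-diagonal coefficients: samples 0-1 and samples 2-3 form two groups.
Z_demo = np.array([[0.0, 0.9, 0.0, 0.0],
                   [0.8, 0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0, -0.7],
                   [0.0, 0.0, 0.6, 0.0]])
W_aff = affinity_from_coefficients(Z_demo)
```

The resulting matrix is symmetric and nonnegative, so it can be passed to a graph-based method such as spectral clustering; samples from different groups keep zero affinity in this toy example.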

Problems (3), (5), and (6) are nuclear norm minimizations. Existing algorithms for these problems mainly include iterative thresholding, accelerated proximal gradient, the dual approach, and augmented Lagrange multipliers (ALM) [16]. These algorithms are scalable because they use only first-order information. Among them, ALM, also called the alternating direction method of multipliers (ADMM) [17] when the variables are updated alternately, is a very popular and effective method for solving nuclear norm minimizations.

#### 3. Model and Algorithm of Incomplete Low-Rank Representation

This section proposes a model of low-rank representation for incomplete data and develops a corresponding iterative scheme for this model.

##### 3.1. Model

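We consider an incomplete data matrix $M \in \mathbb{R}^{m \times n}$ and denote the sampling index set by $\Omega$. The $(i,j)$th entry $m_{ij}$ of $M$ is missing if and only if $(i,j) \notin \Omega$. For the sake of convenience, we set all missing entries of $M$ to zero. To recover simultaneously the missing entries and the low-rank subspace structure, we construct an incomplete low-rank representation (ILRR) model:

$$\min_{Z, E, D} \; \|Z\|_* + \lambda \|E\|_{2,1} \quad \text{s.t.} \quad D = DZ + E, \; \mathcal{P}_\Omega(D) = M, \tag{7}$$

where $\lambda$ is a positive constant, $\mathcal{P}_\Omega$ is the projection onto the entries indexed by $\Omega$, and $D$ corresponds to the completion argument of $M$. If there is no missing entry, that is, $\Omega = [m] \times [n]$, then the above model is equivalent to LRR. In other words, LRR is a special case of ILRR.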

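To solve the nonconvex nuclear norm minimization (7) conveniently, we introduce two auxiliary matrix variables $J \in \mathbb{R}^{n \times n}$ and $X \in \mathbb{R}^{m \times n}$. Under this circumstance, the above optimization problem is reformulated as

$$\min_{J, Z, D, E, X} \; \|J\|_* + \lambda \|E\|_{2,1} \quad \text{s.t.} \quad D = XZ + E, \; Z = J, \; D = X, \; \mathcal{P}_\Omega(D) = M. \tag{8}$$

Because the added quadratic penalty vanishes on the feasible set, this minimization problem is equivalent to

$$\min_{J, Z, D, E, X} \; \|J\|_* + \lambda \|E\|_{2,1} + \frac{\mu}{2}\left(\|D - XZ - E\|_F^2 + \|Z - J\|_F^2 + \|D - X\|_F^2\right) \quad \text{s.t.} \quad D = XZ + E, \; Z = J, \; D = X, \; \mathcal{P}_\Omega(D) = M, \tag{9}$$

where the factor $\mu > 0$. Without considering the constraint $\mathcal{P}_\Omega(D) = M$, we construct the augmented Lagrange function of problem (9) as follows:

$$L = \|J\|_* + \lambda \|E\|_{2,1} + \langle Y_1, D - XZ - E \rangle + \langle Y_2, Z - J \rangle + \langle Y_3, D - X \rangle + \frac{\mu}{2}\left(\|D - XZ - E\|_F^2 + \|Z - J\|_F^2 + \|D - X\|_F^2\right), \tag{10}$$

where $\langle \cdot, \cdot \rangle$ is the inner product operator between matrices and $Y_i$ is a Lagrange multiplier matrix, $i = 1, 2, 3$. In the next part, we will propose an inexact augmented Lagrange multipliers (IALM) method to solve problem (8) or problem (9).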
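A useful identity when working with an augmented Lagrange function is that each multiplier term can be merged with its quadratic penalty by completing the square. In the sketch below, $R$ stands for any one constraint residual and $Y$ for the associated multiplier matrix; both are generic placeholder symbols assumed here for illustration.

```latex
\langle Y, R \rangle + \frac{\mu}{2}\,\|R\|_F^2
  \;=\; \frac{\mu}{2}\,\Bigl\| R + \frac{1}{\mu}\,Y \Bigr\|_F^2
        \;-\; \frac{1}{2\mu}\,\|Y\|_F^2 .
```

The last term does not depend on the primal variables, so minimizing over a block that carries a nuclear norm or an $\ell_{2,1}$-norm reduces to a proximal problem of the kind reviewed in Section 2, while minimizing over an unregularized block reduces to a linear least-squares problem.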

##### 3.2. Algorithm

The inexact ALM (IALM) method employs an alternating update strategy: at each iteration it minimizes the function $L$ with respect to each primal block variable and maximizes it with respect to the Lagrange multipliers. Let $Y = \{Y_1, Y_2, Y_3\}$.

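*Computing $J$*. When the auxiliary variable $J$ is unknown and other variables are fixed, the calculation procedure of $J$ is as follows:

$$J := \arg\min_{J} L = \arg\min_{J} \; \frac{1}{\mu}\|J\|_* + \frac{1}{2}\left\|J - \left(Z + \frac{1}{\mu}Y_2\right)\right\|_F^2 = \mathcal{D}_{1/\mu}\left(Z + \frac{1}{\mu}Y_2\right),$$

where $\mathcal{D}_{1/\mu}$ denotes singular value thresholding with threshold $1/\mu$.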

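*Computing $Z$*. If matrix $Z$ is unknown and other variables are given, $Z$ is updated by minimizing $L$:

$$Z := \arg\min_{Z} L = \arg\min_{Z} f(Z).$$

Let $f(Z) = \langle Y_1, D - XZ - E \rangle + \langle Y_2, Z - J \rangle + \frac{\mu}{2}\left(\|D - XZ - E\|_F^2 + \|Z - J\|_F^2\right)$. By setting the derivative of $f(Z)$ to zero, we have

$$\left(X^T X + I\right) Z = X^T (D - E) + J + \frac{1}{\mu}\left(X^T Y_1 - Y_2\right)$$

or, equivalently,

$$Z := \left(X^T X + I\right)^{-1}\left(X^T (D - E) + J + \frac{1}{\mu}\left(X^T Y_1 - Y_2\right)\right),$$

where $I$ is an $n$-order identity matrix.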
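The coefficient update amounts to solving one $n \times n$ linear system. The sketch below is a minimal NumPy version under assumed notation (the extracted text lost its symbols): `X` is the dictionary copy of the completed data, `J` the auxiliary copy of the coefficients, `Y1` and `Y2` the multipliers of the constraints $D = XZ + E$ and $Z = J$, and `mu` the penalty factor. The function names and the random inputs are hypothetical.

```python
import numpy as np

def update_Z(X, D, E, J, Y1, Y2, mu):
    # Solve (X^T X + I) Z = X^T (D - E) + J + (X^T Y1 - Y2) / mu.
    n = X.shape[1]
    lhs = X.T @ X + np.eye(n)
    rhs = X.T @ (D - E) + J + (X.T @ Y1 - Y2) / mu
    # A linear solve is preferred over forming the explicit inverse.
    return np.linalg.solve(lhs, rhs)

def grad_f(Z, X, D, E, J, Y1, Y2, mu):
    # Gradient of the smooth part of the Lagrangian with respect to Z.
    R = D - X @ Z - E
    return -X.T @ Y1 + Y2 - mu * (X.T @ R) + mu * (Z - J)

rng = np.random.default_rng(1)
m, n, mu = 6, 8, 1.5
X, D, E, Y1 = (rng.standard_normal((m, n)) for _ in range(4))
J, Y2 = (rng.standard_normal((n, n)) for _ in range(2))
Z_new = update_Z(X, D, E, J, Y1, Y2, mu)
G = grad_f(Z_new, X, D, E, J, Y1, Y2, mu)
```

Because $X^T X + I$ is symmetric positive definite, the system always has a unique solution, and the gradient `G` evaluated at the returned matrix vanishes up to rounding error.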

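*Computing $D$*. The update formulation of matrix $D$ is calculated as follows:

$$D := \arg\min_{D} \; \langle Y_1, D - XZ - E \rangle + \langle Y_3, D - X \rangle + \frac{\mu}{2}\left(\|D - XZ - E\|_F^2 + \|D - X\|_F^2\right) = \frac{1}{2}\left(XZ + E + X - \frac{1}{\mu}\left(Y_1 + Y_3\right)\right).$$

Considering the constraint $\mathcal{P}_\Omega(D) = M$, we further obtain the iteration formulation of $D$:

$$D := M + \mathcal{P}_{\bar{\Omega}}\left(\frac{1}{2}\left(XZ + E + X - \frac{1}{\mu}\left(Y_1 + Y_3\right)\right)\right),$$

where $\bar{\Omega}$ is the complementary set of $\Omega$.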
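The completion update can be sketched as an unconstrained minimizer followed by re-imposing the observed entries. The code below assumes a boolean mask for the sampling set and an incomplete matrix `M` whose missing entries are zero, as stated in the model; the symbols `X`, `Z`, `E`, `Y1`, `Y3`, and `mu` are assumed names for the dictionary copy, coefficients, noise, multipliers, and penalty factor.

```python
import numpy as np

def update_D(M, mask, X, Z, E, Y1, Y3, mu):
    # Unconstrained minimizer of the quadratic terms that involve D.
    D_free = 0.5 * (X @ Z + E + X - (Y1 + Y3) / mu)
    # Observed entries are fixed to M; only missing entries take new values.
    # This equals M + P_complement(D_free) because M is zero off the mask.
    return np.where(mask, M, D_free)

rng = np.random.default_rng(2)
m, n, mu = 5, 7, 2.0
mask = rng.random((m, n)) < 0.7
M = np.where(mask, rng.standard_normal((m, n)), 0.0)
X, E, Y1, Y3 = (rng.standard_normal((m, n)) for _ in range(4))
Z = rng.standard_normal((n, n))
D_new = update_D(M, mask, X, Z, E, Y1, Y3, mu)
```

The observed entries are therefore never altered by the iteration, so the sampling constraint holds exactly after every update.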

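*Computing $E$*. Fix $Z$, $D$, and $X$ and minimize $L$ with respect to $E$:

$$E := \arg\min_{E} \; \frac{\lambda}{\mu}\|E\|_{2,1} + \frac{1}{2}\left\|E - \left(D - XZ + \frac{1}{\mu}Y_1\right)\right\|_F^2 = \mathcal{W}_{\lambda/\mu}\left(D - XZ + \frac{1}{\mu}Y_1\right),$$

where $\mathcal{W}_{\lambda/\mu}$ denotes column-wise shrinkage with threshold $\lambda/\mu$.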

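*Computing $X$*. Fix $Z$, $D$, and $E$ to calculate the auxiliary variable $X$ as follows:

$$X := \arg\min_{X} L = \arg\min_{X} g(X),$$

where $g(X) = \langle Y_1, D - XZ - E \rangle + \langle Y_3, D - X \rangle + \frac{\mu}{2}\left(\|D - XZ - E\|_F^2 + \|D - X\|_F^2\right)$. The derivative of $g(X)$ is $\nabla g(X) = \mu X \left(Z Z^T + I\right) - \mu\left((D - E) Z^T + D\right) - Y_1 Z^T - Y_3$. Hence, we obtain the update of $X$ by setting $\nabla g(X) = 0$:

$$X := \left((D - E) Z^T + D + \frac{1}{\mu}\left(Y_1 Z^T + Y_3\right)\right)\left(Z Z^T + I\right)^{-1}.$$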
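This update multiplies by an inverse from the right, which in code is handled by solving the transposed system. The sketch below assumes the symbols `Z`, `D`, `E`, `Y1`, `Y3`, and `mu` for the coefficients, completed data, noise, multipliers, and penalty factor; the function names and random inputs are hypothetical.

```python
import numpy as np

def update_X(Z, D, E, Y1, Y3, mu):
    # Solve X (Z Z^T + I) = (D - E) Z^T + D + (Y1 Z^T + Y3) / mu.
    n = Z.shape[0]
    lhs = Z @ Z.T + np.eye(n)
    rhs = (D - E) @ Z.T + D + (Y1 @ Z.T + Y3) / mu
    # lhs is symmetric, so X lhs = rhs is equivalent to lhs X^T = rhs^T.
    return np.linalg.solve(lhs, rhs.T).T

def grad_g(X, Z, D, E, Y1, Y3, mu):
    # Gradient of the smooth part of the Lagrangian with respect to X.
    n = Z.shape[0]
    return (mu * X @ (Z @ Z.T + np.eye(n))
            - mu * ((D - E) @ Z.T + D) - Y1 @ Z.T - Y3)

rng = np.random.default_rng(3)
m, n, mu = 6, 8, 0.7
D, E, Y1, Y3 = (rng.standard_normal((m, n)) for _ in range(4))
Z = rng.standard_normal((n, n))
X_new = update_X(Z, D, E, Y1, Y3, mu)
G_X = grad_g(X_new, Z, D, E, Y1, Y3, mu)
```

As with the coefficient update, the matrix $Z Z^T + I$ is symmetric positive definite, so the solve is well posed for any $Z$.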

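*Computing $Y$*. Given $J$, $Z$, $D$, $E$, and $X$, we calculate the multipliers $Y = \{Y_1, Y_2, Y_3\}$ as follows:

$$Y := \arg\max_{Y} L.$$

In the detailed implementation, $Y$ is updated according to the following formulations:

$$Y_1 := Y_1 + \mu\left(D - XZ - E\right), \qquad Y_2 := Y_2 + \mu\left(Z - J\right), \qquad Y_3 := Y_3 + \mu\left(D - X\right).$$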

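We denote by $O_{m \times n}$ the $m \times n$ zeros matrix. The whole iterative procedure is outlined in Algorithm 1. The stopping condition of Algorithm 1 can be set as

$$\max\left\{\|D - XZ - E\|_F, \; \|Z - J\|_F, \; \|D - X\|_F\right\} < \varepsilon,$$

where $\varepsilon$ is a sufficiently small positive number.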
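Putting the block updates together gives an iteration of the following shape. This is a minimal sketch rather than a reproduction of Algorithm 1: the symbol names, the zero initialization, the geometric growth of the penalty factor (`rho`, `mu_max`), the default parameter values, and the residual-based stopping rule are all assumptions made for illustration.

```python
import numpy as np

def svt(Q, eps):
    # Singular value thresholding (proximal operator of the nuclear norm).
    U, s, Vt = np.linalg.svd(Q, full_matrices=False)
    return (U * np.maximum(s - eps, 0.0)) @ Vt

def col_shrink(Q, eps):
    # Column-wise shrinkage (proximal operator of the l2,1-norm).
    norms = np.linalg.norm(Q, axis=0)
    return Q * (np.maximum(norms - eps, 0.0) / np.maximum(norms, 1e-12))

def ilrr_ialm(M, mask, lam=0.1, mu=1e-2, rho=1.1, mu_max=1e8,
              tol=1e-6, max_iter=200):
    # M: incomplete data with zeros at missing entries; mask: True if observed.
    m, n = M.shape
    D, X = M.copy(), M.copy()
    Z, J, E = np.zeros((n, n)), np.zeros((n, n)), np.zeros((m, n))
    Y1, Y2, Y3 = np.zeros((m, n)), np.zeros((n, n)), np.zeros((m, n))
    I = np.eye(n)
    for _ in range(max_iter):
        J = svt(Z + Y2 / mu, 1.0 / mu)
        Z = np.linalg.solve(X.T @ X + I,
                            X.T @ (D - E) + J + (X.T @ Y1 - Y2) / mu)
        D_free = 0.5 * (X @ Z + E + X - (Y1 + Y3) / mu)
        D = np.where(mask, M, D_free)          # keep observed entries fixed
        E = col_shrink(D - X @ Z + Y1 / mu, lam / mu)
        rhs = (D - E) @ Z.T + D + (Y1 @ Z.T + Y3) / mu
        X = np.linalg.solve(Z @ Z.T + I, rhs.T).T
        R1, R2, R3 = D - X @ Z - E, Z - J, D - X
        Y1, Y2, Y3 = Y1 + mu * R1, Y2 + mu * R2, Y3 + mu * R3
        if max(np.linalg.norm(R1), np.linalg.norm(R2),
               np.linalg.norm(R3)) < tol:
            break
        mu = min(rho * mu, mu_max)             # assumed penalty schedule
    return Z, E, D

# Hypothetical synthetic test: a rank-3 matrix with about 80% observed entries.
rng = np.random.default_rng(4)
D_true = rng.standard_normal((20, 3)) @ rng.standard_normal((3, 30))
mask = rng.random(D_true.shape) < 0.8
M = np.where(mask, D_true, 0.0)
Z_hat, E_hat, D_hat = ilrr_ialm(M, mask)
```

Since the problem is nonconvex, this sketch makes no claim about convergence to a global minimizer; the only property guaranteed by construction is that the returned completion agrees with the observed entries.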