Abstract

Low-rank matrix recovery (LRMR) has become an increasingly popular technique for analyzing data with missing entries, gross corruptions, and outliers. As a significant component of LRMR, the model of low-rank representation (LRR) seeks the lowest-rank representation among all samples and is robust for recovering subspace structures. This paper attempts to solve the problem of LRR with partially observed entries. Firstly, we construct a nonconvex minimization problem by taking low-rankness, robustness, and incompleteness into consideration. Then we employ the technique of augmented Lagrange multipliers to solve the proposed program. Finally, experimental results on synthetic and real-world datasets validate the feasibility and effectiveness of the proposed method.

1. Introduction

In the communities of pattern recognition, machine learning, and computer vision, the investigated datasets usually have an intrinsically low-rank structure even though they are typically high-dimensional. Low-rank matrix recovery (LRMR) [13] is a class of models that utilizes this crucial low-complexity information to complete missing entries, remove sparse noise, identify outliers, and build an affinity matrix. It can also be regarded as a generalization of compressed sensing from vectors to matrices, since the low-rankness of a matrix is equivalent to the sparsity of its vector of singular values. Recently, LRMR has received increasing attention in the fields of information science and engineering and achieved great success in video background modeling [4, 5], collaborative filtering [6, 7], and subspace clustering [3, 8, 9], to name just a few.

Generally, LRMR mainly comprises three appealing models, namely, matrix completion (MC) [1], robust principal component analysis (RPCA) [2, 4], and low-rank representation (LRR) [3, 8]. Among them, MC aims to complete the missing entries with the aid of the low-rank property and was initially described as an affine rank minimization problem. In the past few years, affine rank minimization has been convexly relaxed into nuclear norm minimization [10], and it has been proven that if the number of sampled entries and the singular vectors satisfy certain conditions, then most low-rank matrices can be perfectly recovered by solving the aforementioned convex program [1].

Classical principal component analysis (PCA) [11] is very effective against small Gaussian noise, but it does not work well in practice when data samples are corrupted by outliers or large sparse noise. For this purpose, several robust variants of PCA have been proposed successively during the past two decades [12, 13]. Since the seminal research work [4], the principal component pursuit (PCP) approach has become a standard for RPCA. This approach minimizes a weighted combination of the nuclear norm and the $\ell_1$-norm subject to linear equality constraints. It is proven that both the low-rank and the sparse components can be recovered exactly with high probability under certain conditions by solving PCP [4].

In subspace clustering, a commonly used assumption is that the data lie in the union of multiple low-rank subspaces and each subspace has sufficient samples compared with its rank. Liu et al. [3] proposed a robust subspace recovery technique via LRR. Any sample in each subspace can be represented as a linear combination of the bases, and the low complexity of the linear representation coefficients is very useful in exploiting the low-rank structure. LRR attempts to seek the lowest-rank representation of all data jointly, and it is demonstrated that data contaminated by outliers can be exactly recovered under certain conditions by solving a convex program [8]. If the bases are chosen as the columns of an identity matrix and the $\ell_1$-norm is employed to measure sparsity, then LRR reduces to the PCP formulation of RPCA.

For datasets with missing entries and large sparse corruption, the robust recovery of subspace structures may be a challenging task. The available MC algorithms are not robust to gross corruption. Moreover, a large quantity of missing values will degrade the recovery performance of LRR or RPCA. In this paper, we attempt to address the problem of low-rank subspace recovery in the presence of missing values and sparse noise. Specifically, we present a model of incomplete low-rank representation (ILRR) which is a direct generalization of LRR. The ILRR model boils down to a nonconvex optimization problem which minimizes a combination of the nuclear norm and the $\ell_{2,1}$-norm. To solve this program, we design an iterative scheme by applying the method of inexact augmented Lagrange multipliers (ALM).

The rest of this paper is organized as follows. Section 2 briefly reviews preliminaries and related works on LRMR. The model and algorithm for ILRR are presented in Section 3. In Section 4, we discuss the extension of ILRR and its relationship with the existing works. We compare the performance of ILRR with the state-of-the-art algorithms on synthetic data and real-world datasets in Section 5. Finally, Section 6 draws some conclusions.

2. Preliminaries and Related Works

This section introduces the relevant preliminary material concerning matrix norms and representative models of low-rank matrix recovery (LRMR).

The choice of matrix norms plays a significant role in LRMR. In the following, we present four important types of matrix norms. For an arbitrary matrix $X = (x_{ij}) \in \mathbb{R}^{m \times n}$, the Frobenius norm of $X$ is expressed by $\|X\|_F = \sqrt{\sum_{i,j} x_{ij}^2}$, the $\ell_1$-norm is $\|X\|_1 = \sum_{i,j} |x_{ij}|$, the $\ell_{2,1}$-norm is $\|X\|_{2,1} = \sum_{j=1}^{n} \sqrt{\sum_{i=1}^{m} x_{ij}^2}$, and the nuclear norm is $\|X\|_* = \sum_{i} \sigma_i(X)$, where $x_{ij}$ is the $(i,j)$th entry of $X$ and $\sigma_i(X)$ is the $i$th largest singular value. Among them, the matrix nuclear norm is the tightest convex relaxation of the rank function, and the $\ell_1$-norm and the $\ell_{2,1}$-norm are frequently used to measure the sparsity of a noise matrix.
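To make the notation concrete, the four norms can be computed as in the following NumPy sketch; the function names are ours and are introduced only for illustration.

```python
import numpy as np

def frobenius_norm(X):
    # ||X||_F: square root of the sum of squared entries
    return np.sqrt(np.sum(X ** 2))

def l1_norm(X):
    # ||X||_1: sum of the absolute values of all entries
    return np.sum(np.abs(X))

def l21_norm(X):
    # ||X||_{2,1}: sum of the Euclidean norms of the columns
    return np.sum(np.linalg.norm(X, axis=0))

def nuclear_norm(X):
    # ||X||_*: sum of the singular values
    return np.sum(np.linalg.svd(X, compute_uv=False))
```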

Consider a data matrix $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{d \times n}$ stacked by $n$ training samples, where each column of $X$ indicates a sample with dimensionality $d$. Within the field of LRMR, the following three proximal minimization problems [3, 4] are extensively employed:
$$\min_{A} \; \tau\|A\|_* + \frac{1}{2}\|A - B\|_F^2, \qquad \min_{A} \; \tau\|A\|_1 + \frac{1}{2}\|A - B\|_F^2, \qquad \min_{A} \; \tau\|A\|_{2,1} + \frac{1}{2}\|A - B\|_F^2, \tag{1}$$
where $\tau > 0$ is a positive constant used to balance the regularization term and the approximation error. For given $\tau$ and $B$, we define three thresholding operators $\mathcal{D}_\tau$, $\mathcal{S}_\tau$, and $\mathcal{H}_\tau$ as follows:
$$\mathcal{D}_\tau(B) = U \mathcal{S}_\tau(\Sigma) V^{\mathsf T}, \qquad [\mathcal{S}_\tau(B)]_{ij} = \operatorname{sgn}(b_{ij}) \max\left(|b_{ij}| - \tau, 0\right), \qquad [\mathcal{H}_\tau(B)]_{:,j} = \frac{\max\left(\|b_j\|_2 - \tau, 0\right)}{\|b_j\|_2}\, b_j, \tag{2}$$
where $B = U \Sigma V^{\mathsf T}$ is the singular value decomposition (SVD) of $B$ and $b_j$ denotes the $j$th column of $B$. It is proven that the aforementioned three optimization problems have closed-form solutions denoted by $\mathcal{D}_\tau(B)$ [4], $\mathcal{S}_\tau(B)$ [14], and $\mathcal{H}_\tau(B)$ [8], respectively.
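Under the standard definitions above, the three thresholding operators admit the following NumPy sketch; the function names svt, soft_threshold, and column_shrink are our own labels for $\mathcal{D}_\tau$, $\mathcal{S}_\tau$, and $\mathcal{H}_\tau$.

```python
import numpy as np

def svt(B, tau):
    # D_tau(B): singular value thresholding, the proximal operator of tau*||.||_*
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def soft_threshold(B, tau):
    # S_tau(B): entrywise shrinkage, the proximal operator of tau*||.||_1
    return np.sign(B) * np.maximum(np.abs(B) - tau, 0.0)

def column_shrink(B, tau):
    # H_tau(B): columnwise shrinkage, the proximal operator of tau*||.||_{2,1}
    norms = np.linalg.norm(B, axis=0)
    scale = np.maximum(norms - tau, 0.0) / np.maximum(norms, np.finfo(float).eps)
    return B * scale
```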

We assume that $X$ is low-rank. Because the number of degrees of freedom of a low-rank matrix is far less than its number of entries, it is possible to exactly recover all missing entries from the partially observed ones as long as the number of sampled entries satisfies certain conditions. Formally, the problem of matrix completion (MC) [1] can be formulated as follows:
$$\min_{A} \; \|A\|_* \quad \text{s.t.} \quad a_{ij} = x_{ij}, \; (i, j) \in \Omega, \tag{3}$$
where $A = (a_{ij}) \in \mathbb{R}^{d \times n}$ and $\Omega$ is the index set of the observed entries. We define a linear projection operator $\mathcal{P}_\Omega : \mathbb{R}^{d \times n} \to \mathbb{R}^{d \times n}$ as follows:
$$\left[\mathcal{P}_\Omega(X)\right]_{ij} = \begin{cases} x_{ij}, & (i, j) \in \Omega, \\ 0, & \text{otherwise}. \end{cases} \tag{4}$$
Hence, the constraints in problem (3) can be rewritten as $\mathcal{P}_\Omega(A) = \mathcal{P}_\Omega(X)$.
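In code, a boolean mask is a convenient way to realize the projection operator $\mathcal{P}_\Omega$; the sketch below is a minimal illustration with names of our own choosing.

```python
import numpy as np

def project_omega(X, omega_mask):
    # P_Omega(X): keep the entries indexed by Omega and set the rest to zero.
    # omega_mask is a boolean array of the same shape as X (True on Omega).
    return np.where(omega_mask, X, 0.0)

# The constraint of problem (3) then reads P_Omega(A) = P_Omega(X), i.e.,
# np.allclose(project_omega(A, omega_mask), project_omega(X, omega_mask)).
```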

PCA obtains the optimal estimate under small additive Gaussian noise but breaks down under large sparse contamination. Here, the data matrix $X$ is assumed to be the superposition of a low-rank matrix $A$ and a sparse noise matrix $E$. In this situation, robust principal component analysis (RPCA) is very effective in recovering both the low-rank and the sparse components by solving a convex program. Mathematically, RPCA can be described as the following nuclear norm minimization [2]:
$$\min_{A, E} \; \|A\|_* + \lambda \|E\|_1 \quad \text{s.t.} \quad X = A + E, \tag{5}$$
where $\lambda$ is a positive weighting parameter. Subsequently, RPCA has been generalized into a stable version which is simultaneously stable to small perturbations and robust to gross sparse corruption [15].
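For reference, problem (5) is commonly solved by an inexact ALM loop of the kind discussed at the end of this section. The following sketch reuses svt and soft_threshold from above; the parameter defaults are conventional choices of ours rather than values taken from this paper.

```python
import numpy as np

def rpca_ialm(X, lam, rho=1.5, max_iter=500, tol=1e-7):
    # Minimal inexact-ALM sketch for min ||A||_* + lam*||E||_1 s.t. X = A + E.
    E = np.zeros_like(X)
    Y = np.zeros_like(X)                      # Lagrange multiplier
    mu = 1.25 / np.linalg.norm(X, 2)          # heuristic initial penalty (assumption)
    for _ in range(max_iter):
        A = svt(X - E + Y / mu, 1.0 / mu)                 # low-rank update
        E = soft_threshold(X - A + Y / mu, lam / mu)      # sparse update
        R = X - A - E                                     # constraint residual
        Y = Y + mu * R                                    # dual ascent
        mu = rho * mu
        if np.linalg.norm(R, 'fro') <= tol * np.linalg.norm(X, 'fro'):
            break
    return A, E
```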

We further assume that the dataset is self-expressive and the representation coefficient matrix is also low-rank. Based on the above two assumptions, the model of low-rank representation (LRR) [3] is expressed as
$$\min_{Z, E} \; \|Z\|_* + \lambda \|E\|_{2,1} \quad \text{s.t.} \quad X = XZ + E, \tag{6}$$
where $Z \in \mathbb{R}^{n \times n}$ is the coefficient matrix, $E$ is the noise matrix, and $\lambda$ is a positive trade-off parameter. This model is very effective in detecting outliers, and the optimal $Z$ favors robust subspace recovery. In subspace clustering, the affinity matrix can be constructed from the optimal solution $Z^*$ to problem (6).

Problems (3), (5), and (6) belong to the class of nuclear norm minimizations. The existing algorithms for these optimization problems mainly include iterative thresholding, the accelerated proximal gradient method, the dual approach, and the method of augmented Lagrange multipliers (ALM) [16]. These algorithms are scalable owing to their use of first-order information. Among them, ALM, also called the alternating direction method of multipliers (ADMM) [17], is a very popular and effective method for solving nuclear norm minimizations.

3. Model and Algorithm of Incomplete Low-Rank Representation

This section proposes a model of low-rank representation for incomplete data and develops a corresponding iterative scheme for this model.

3.1. Model

We consider an incomplete data matrix $D \in \mathbb{R}^{d \times n}$ and denote the sampling index set by $\Omega$. The $(i, j)$th entry of $D$ is missing if and only if $(i, j) \notin \Omega$. For the sake of convenience, we set all missing entries of $D$ to zeros. To recover simultaneously the missing entries and the low-rank subspace structure, we construct an incomplete low-rank representation (ILRR) model:
$$\min_{X, Z, E} \; \|Z\|_* + \lambda \|E\|_{2,1} \quad \text{s.t.} \quad X = XZ + E, \;\; \mathcal{P}_\Omega(X) = \mathcal{P}_\Omega(D), \tag{7}$$
where $\lambda$ is a positive constant and $X$ corresponds to the completion of $D$. If there is no missing entry, that is, $\Omega = \{1, \ldots, d\} \times \{1, \ldots, n\}$, then the above model is equivalent to LRR. In other words, LRR is a special case of ILRR.

In order to solve the nonconvex nuclear norm minimization (7) conveniently, we introduce two auxiliary matrix variables $J$ and $A$. Under this circumstance, the above optimization problem is reformulated as
$$\min_{X, Z, J, E, A} \; \|J\|_* + \lambda \|E\|_{2,1} \quad \text{s.t.} \quad X = AZ + E, \;\; Z = J, \;\; A = X, \;\; \mathcal{P}_\Omega(X) = \mathcal{P}_\Omega(D). \tag{8}$$
Without considering the constraint $\mathcal{P}_\Omega(X) = \mathcal{P}_\Omega(D)$, we construct the augmented Lagrange function of problem (8) as follows:
$$\begin{aligned}
L(J, Z, X, E, A, Y_1, Y_2, Y_3, \mu) = {} & \|J\|_* + \lambda \|E\|_{2,1} + \langle Y_1, X - AZ - E \rangle + \langle Y_2, Z - J \rangle + \langle Y_3, A - X \rangle \\
& + \frac{\mu}{2} \left( \|X - AZ - E\|_F^2 + \|Z - J\|_F^2 + \|A - X\|_F^2 \right),
\end{aligned} \tag{10}$$
where $\langle \cdot, \cdot \rangle$ is the inner product operator between matrices, $Y_i$ ($i = 1, 2, 3$) is a Lagrange multiplier matrix, and $\mu > 0$ is a penalty parameter. In the next part, we will propose an inexact augmented Lagrange multipliers (IALM) method to solve problem (8).

3.2. Algorithm

The inexact ALM (IALM) method employs an alternating update strategy: at each iteration, it minimizes or maximizes the augmented Lagrange function $L$ with respect to one block variable while the other block variables are fixed at their most recent values.

Computing $J$. When $J$ is unknown and the other variables are fixed, the calculation procedure of $J$ is as follows:
$$J = \arg\min_{J} \; \frac{1}{\mu}\|J\|_* + \frac{1}{2}\left\|J - \left(Z + \frac{Y_2}{\mu}\right)\right\|_F^2 = \mathcal{D}_{1/\mu}\!\left(Z + \frac{Y_2}{\mu}\right). \tag{11}$$

Computing $Z$. If the matrix $Z$ is unknown and the other variables are given, $Z$ is updated by minimizing $L$ with respect to $Z$. Let
$$f(Z) = \langle Y_1, X - AZ - E \rangle + \langle Y_2, Z - J \rangle + \frac{\mu}{2}\left( \|X - AZ - E\|_F^2 + \|Z - J\|_F^2 \right).$$
By setting the derivative of $f(Z)$ with respect to $Z$ to zero, we have
$$\left(A^{\mathsf T} A + I\right) Z = A^{\mathsf T}(X - E) + J + \frac{A^{\mathsf T} Y_1 - Y_2}{\mu}$$
or, equivalently,
$$Z = \left(A^{\mathsf T} A + I\right)^{-1} \left( A^{\mathsf T}(X - E) + J + \frac{A^{\mathsf T} Y_1 - Y_2}{\mu} \right), \tag{14}$$
where $I$ is an $n$-order identity matrix.

Computing $X$. The update formulation of the matrix $X$ is calculated by minimizing $L$ with respect to $X$, which gives
$$X = \frac{1}{2}\left( AZ + E + A + \frac{Y_3 - Y_1}{\mu} \right).$$
Considering the constraint $\mathcal{P}_\Omega(X) = \mathcal{P}_\Omega(D)$, we further obtain the iteration formulation of $X$:
$$X = \mathcal{P}_\Omega(D) + \mathcal{P}_{\Omega^c}\!\left( \frac{1}{2}\left( AZ + E + A + \frac{Y_3 - Y_1}{\mu} \right) \right), \tag{16}$$
where $\Omega^c$ is the complementary set of $\Omega$.

Computing $E$. Fix $J$, $Z$, $X$, and $A$ and minimize $L$ with respect to $E$:
$$E = \arg\min_{E} \; \frac{\lambda}{\mu}\|E\|_{2,1} + \frac{1}{2}\left\| E - \left( X - AZ + \frac{Y_1}{\mu} \right) \right\|_F^2 = \mathcal{H}_{\lambda/\mu}\!\left( X - AZ + \frac{Y_1}{\mu} \right). \tag{17}$$

Computing $A$. Fix $J$, $Z$, $X$, and $E$ to calculate $A$. Minimizing $L$ with respect to $A$ amounts to minimizing
$$g(A) = \langle Y_1, X - AZ - E \rangle + \langle Y_3, A - X \rangle + \frac{\mu}{2}\left( \|X - AZ - E\|_F^2 + \|A - X\|_F^2 \right),$$
where the derivative of $g$ is $\nabla g(A) = -Y_1 Z^{\mathsf T} + Y_3 + \mu\left( (AZ + E - X) Z^{\mathsf T} + A - X \right)$. Hence, we obtain the update of $A$ by setting $\nabla g(A) = 0$:
$$A = \left( \left(X - E + \frac{Y_1}{\mu}\right) Z^{\mathsf T} + X - \frac{Y_3}{\mu} \right) \left( Z Z^{\mathsf T} + I \right)^{-1}. \tag{19}$$

Computing $Y_1$, $Y_2$, $Y_3$, and $\mu$. Given $J$, $Z$, $X$, $E$, and $A$, we calculate the Lagrange multipliers as follows:
$$Y_1 = Y_1 + \mu(X - AZ - E), \qquad Y_2 = Y_2 + \mu(Z - J), \qquad Y_3 = Y_3 + \mu(A - X). \tag{21}$$
In the detailed implementation, $\mu$ is updated according to the formulation $\mu = \min(\rho\mu, \mu_{\max})$, where $\rho > 1$ and $\mu_{\max}$ is a prescribed upper bound of the penalty parameter.

We denote by $\mathbf{0}$ the zero matrix. The whole iterative procedure is outlined in Algorithm 1; a Python sketch of the complete loop is given after the listing. The stopping condition of Algorithm 1 can be set as
$$\max\left\{ \|X - AZ - E\|_\infty, \; \|Z - J\|_\infty, \; \|A - X\|_\infty \right\} < \varepsilon,$$
where $\varepsilon$ is a sufficiently small positive number.

Input: Data matrix $D$, sampling index set $\Omega$, trade-off parameter $\lambda$.
Initialize: $Z = J = \mathbf{0}$, $E = \mathbf{0}$, $A = X = D$, $Y_1 = Y_2 = Y_3 = \mathbf{0}$,
     $\mu > 0$, $\mu_{\max} > \mu$, $\rho > 1$, $\varepsilon > 0$.
Output: The completed matrix $X$ and the low-rank representation $Z$.
While not converged do
  (1) Update $J$ according to (11).
  (2) Update $Z$ according to (14).
  (3) Update $X$ according to (16).
  (4) Update $E$ according to (17).
  (5) Update $A$ according to (19).
  (6) Update $Y_1$, $Y_2$, and $Y_3$ according to (21).
  (7) Update $\mu$ as $\mu = \min(\rho\mu, \mu_{\max})$.
End while
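The listing above can be transcribed almost line by line into NumPy. The sketch below follows the reconstruction of the update formulas in Section 3.2 and reuses svt and column_shrink from Section 2; the initialization $A = X = D$ and the default values of $\mu$, $\rho$, and $\mu_{\max}$ are our assumptions, not values reported in the original implementation.

```python
import numpy as np

def ilrr_ialm(D, mask, lam, mu=1e-3, mu_max=1e6, rho=1.5, max_iter=500, eps=1e-6):
    # D    : observed data matrix with missing entries set to zero (d x n)
    # mask : boolean array of the same shape as D, True on the sampling set Omega
    # lam  : trade-off parameter lambda
    d, n = D.shape
    X, A = D.copy(), D.copy()
    Z, J = np.zeros((n, n)), np.zeros((n, n))
    E = np.zeros((d, n))
    Y1, Y2, Y3 = np.zeros((d, n)), np.zeros((n, n)), np.zeros((d, n))
    I_n = np.eye(n)
    for _ in range(max_iter):
        # (1) J-update by singular value thresholding, cf. (11)
        J = svt(Z + Y2 / mu, 1.0 / mu)
        # (2) Z-update by solving the n x n linear system (14)
        Z = np.linalg.solve(A.T @ A + I_n,
                            A.T @ (X - E) + J + (A.T @ Y1 - Y2) / mu)
        # (3) X-update (16): unconstrained minimizer, then re-impose P_Omega(X) = P_Omega(D)
        X_free = 0.5 * (A @ Z + E + A + (Y3 - Y1) / mu)
        X = np.where(mask, D, X_free)
        # (4) E-update by columnwise shrinkage, cf. (17)
        E = column_shrink(X - A @ Z + Y1 / mu, lam / mu)
        # (5) A-update by solving the linear system (19): A (Z Z^T + I) = RHS
        RHS = (X - E + Y1 / mu) @ Z.T + X - Y3 / mu
        A = np.linalg.solve(Z @ Z.T + I_n, RHS.T).T
        # (6) multiplier updates (21)
        R1, R2, R3 = X - A @ Z - E, Z - J, A - X
        Y1, Y2, Y3 = Y1 + mu * R1, Y2 + mu * R2, Y3 + mu * R3
        # (7) penalty update
        mu = min(rho * mu, mu_max)
        if max(np.abs(R1).max(), np.abs(R2).max(), np.abs(R3).max()) < eps:
            break
    return X, Z, E
```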

3.3. Convergence

When solving ILRR via the IALM method, the block variables are updated alternately. Suppose instead that we update these five block variables simultaneously; namely,
$$(J, Z, X, E, A) = \arg\min_{J, Z, X, E, A} L(J, Z, X, E, A, Y_1, Y_2, Y_3, \mu).$$
The modified method is called the exact ALM method. Since the objective function in problem (8) is continuous, the exact ALM method is convergent [18]. However, it is still difficult to prove the convergence of IALM. There are two reasons for this difficulty: one is the existence of nonconvex constraints in (8), and the other is that the number of block variables is more than two. Nevertheless, the experimental results in Section 5 demonstrate the validity and effectiveness of Algorithm 1 in practice.

4. Model Extensions

In the ILRR model, the main aim of the term $\lambda\|E\|_{2,1}$ is to enhance the robustness to noise and outliers. If we do not consider outliers, then $\|E\|_{2,1}$ should be replaced with $\|E\|_1$. For the new ILRR model, we can design an algorithm by only revising Step (4) of Algorithm 1 as follows:
$$E = \mathcal{S}_{\lambda/\mu}\!\left( X - AZ + \frac{Y_1}{\mu} \right).$$
If we further drop the noise term $E$, then problem (7) reduces to the incomplete version of low-rank subspace clustering (LRSC) with uncorrupted data [9]. If we replace $\|Z\|_*$ and $\|E\|_{2,1}$ by $\|Z\|_1$ and $\|E\|_1$, respectively, and incorporate $\operatorname{diag}(Z) = \mathbf{0}$ into the constraints, then problem (7) becomes the incomplete version of sparse subspace clustering (SSC) without dense errors [19].
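Under the substitution described above (the $\ell_{2,1}$-norm on $E$ replaced by the $\ell_1$-norm), the revised Step (4) becomes an entrywise soft-thresholding. A minimal sketch, with a function name of our own choosing:

```python
import numpy as np

def e_update_l1(X, A, Z, Y1, mu, lam):
    # Revised Step (4): with lam*||E||_1 in place of lam*||E||_{2,1}, the
    # E-subproblem is solved by entrywise soft thresholding of X - A@Z + Y1/mu.
    B = X - A @ Z + Y1 / mu
    return np.sign(B) * np.maximum(np.abs(B) - lam / mu, 0.0)
```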

ILRR uses the data itself as the dictionary. Now, we extend the dictionary and the noise sparsity measure to more general cases. As a result, we obtain a comprehensive form of ILRR:
$$\min_{X, Z, E} \; \|Z\|_* + \lambda \|E\|_\ell \quad \text{s.t.} \quad X = BZ + E, \;\; \mathcal{P}_\Omega(X) = \mathcal{P}_\Omega(D), \tag{25}$$
where $B$ represents the dictionary, $Z$ is the coefficient matrix, and $\|E\|_\ell$ indicates a certain norm of $E$. If $B$ is a $d$-order identity matrix and $\|E\|_\ell$ is chosen as $\|E\|_1$, then problem (25) corresponds to the incomplete version of RPCA [20]. If we further reinforce $E = \mathbf{0}$, then problem (25) becomes an equivalent formulation of MC [16]. Moreover, if $B$ is an unknown orthogonal matrix, then problem (25) is equivalent to the matrix decomposition method for MC [21]. Finally, if we let $X$ and $B$ be stacked by the testing samples and the training samples, respectively, and replace $\|Z\|_*$ by $\|Z\|_1$, then problem (25) is changed into the incomplete version of robust pattern recognition via sparse representation [22].

5. Experiments

In this section, we validate the effectiveness and efficiency of the proposed method by carrying out experiments on synthetic data and real-world datasets. The experimental results of incomplete low-rank representation (ILRR) are compared with those of other state-of-the-art methods: sparse subspace clustering (SSC), low-rank subspace clustering (LRSC), and low-rank representation (LRR). For the latter three methods, the missing values are replaced by zeros. For SSC and LRSC, their parameters are tuned to achieve the best performance. The tolerant error $\varepsilon$ is set to the same small value in all experiments.

5.1. Synthetic Data

We randomly generate an orthogonal matrix $U_1$ and a rotation matrix $T$. Four other orthogonal matrices are constructed as $U_{i+1} = T U_i$, $i = 1, 2, 3, 4$. Thus, five independent low-rank subspaces are spanned by the columns of $U_1, \ldots, U_5$, respectively. We draw 40 data vectors from each subspace by $X_i = U_i Q_i$, $i = 1, \ldots, 5$, where the entries of $Q_i$ are independent of each other and obey the standard normal distribution. We set $X = [X_1, X_2, X_3, X_4, X_5]$ and randomly choose 20 of its column vectors to be corrupted. In this part, each chosen column vector is contaminated by additive Gaussian noise with zero mean.
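The following NumPy sketch mirrors this construction; the ambient dimension, subspace rank, noise level, and random seed are illustrative choices of ours, since the original values are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n_per = 100, 5, 40            # ambient dimension, subspace rank, samples per subspace

# Random orthogonal basis U1 and a random orthogonal matrix T acting as a rotation
U1, _ = np.linalg.qr(rng.standard_normal((d, r)))
T, _ = np.linalg.qr(rng.standard_normal((d, d)))

bases = [U1]
for _ in range(4):                  # U_{i+1} = T @ U_i, i = 1, ..., 4
    bases.append(T @ bases[-1])

# 40 samples per subspace with standard normal coefficients: X_i = U_i @ Q_i
X = np.hstack([U @ rng.standard_normal((r, n_per)) for U in bases])

# Corrupt 20 randomly chosen columns with zero-mean Gaussian noise
corrupted = rng.choice(X.shape[1], size=20, replace=False)
X[:, corrupted] += 0.1 * rng.standard_normal((d, corrupted.size))

# Uniform sampling mask with a prescribed sampling ratio
sr = 0.6
mask = rng.random(X.shape) < sr
D = np.where(mask, X, 0.0)          # incomplete matrix with missing entries zeroed
```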

We draw samples from $X$ uniformly at random and denote by $\Omega$ the sampling index set. The sampling ratio (SR) is defined as $\mathrm{SR} = |\Omega| / (dn)$, where $|\Omega|$ is the cardinality of $\Omega$, $d$ and $n$ are the numbers of rows and columns of $X$, and $\mathrm{SR} = 1$ means that no entry is missing. Hence, an incomplete matrix is generated by $D = \mathcal{P}_\Omega(X)$. The trade-off parameter $\lambda$ in both LRR and ILRR is set to 0.1. After running Algorithm 1, we obtain the optimal low-rank representation matrix $Z^*$. On the basis of $Z^*$, we construct an affinity matrix $W = \frac{1}{2}\left( |Z^*| + |Z^*|^{\mathsf T} \right)$, where $|\cdot|$ denotes the entrywise absolute value operator. Then we choose spectral clustering [23] as the clustering algorithm and evaluate the clustering performance by normalized mutual information (NMI). Let $\mathcal{C}$ be the set of true cluster labels and let $\mathcal{C}'$ be the set of clusters obtained from the spectral clustering algorithm. NMI is calculated as
$$\mathrm{NMI}(\mathcal{C}, \mathcal{C}') = \frac{\mathrm{MI}(\mathcal{C}, \mathcal{C}')}{\max\left\{ H(\mathcal{C}), H(\mathcal{C}') \right\}},$$
where $\mathrm{MI}(\mathcal{C}, \mathcal{C}')$ is the mutual information metric and $H(\cdot)$ is the entropy. NMI takes values in $[0, 1]$, and a larger NMI value indicates better clustering performance.
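A sketch of the clustering and evaluation pipeline is given below, using scikit-learn's spectral clustering and NMI implementations; note that the library's default NMI normalization may differ slightly from the formula written above.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics import normalized_mutual_info_score

def cluster_and_score(Z_star, n_clusters, labels_true):
    # Symmetric nonnegative affinity built from the optimal representation Z*
    W = 0.5 * (np.abs(Z_star) + np.abs(Z_star).T)
    labels_pred = SpectralClustering(n_clusters=n_clusters,
                                     affinity='precomputed').fit_predict(W)
    # Normalized mutual information between true labels and obtained clusters
    return normalized_mutual_info_score(labels_true, labels_pred)
```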

In the experimental implementation, we vary the SR from 0.2 to 1 with an interval of 0.1. For each fixed SR, we repeat the experiments 10 times and report the average NMI. We first compare the affinity matrices produced by SSC, LRSC, LRR, and ILRR, as shown partially in Figure 1. From Figure 1, we observe that the affinity matrix of our method exhibits an obvious block-diagonal structure, whereas those of SSC, LRSC, and LRR show no clear block-diagonal structure at low sampling ratios. This observation means that only ILRR keeps compact representations within the same subspace and divergent representations across different subspaces when a large number of values are missing.

Secondly, we compare the NMI values of ILRR with those of the other three methods at different SR, as shown in Figure 2. It can be seen from this figure that the NMI values of SSC, LRR, and ILRR are almost one when the SR is close to one, while ILRR attains higher NMI values than the other three methods at lower SR. In other words, the NMI values of SSC, LRSC, and LRR decrease drastically as the SR decreases. These observations verify that ILRR is more robust than the other methods in the presence of missing values.

5.2. Face Clustering

We carry out face clustering experiments on a part of the Extended Yale Face Database B [24] with large corruptions. The whole database consists of 38 subjects, and each subject has about 64 images. We choose the first 10 subjects, and each image is resized to 32 × 32. Thus, the face image dataset is represented by a matrix of size 1024 × 640. Each column of the data matrix is normalized to unit length in consideration of the variable illumination conditions and poses.
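As a small illustration of this preprocessing step, the sketch below vectorizes the resized images and normalizes each column to unit length; the helper name is ours.

```python
import numpy as np

def build_face_matrix(images):
    # images: iterable of 32x32 grayscale arrays, one per face image.
    # Each image becomes a 1024-dimensional column, scaled to unit Euclidean length.
    X = np.stack([img.reshape(-1).astype(float) for img in images], axis=1)
    X /= np.maximum(np.linalg.norm(X, axis=0), np.finfo(float).eps)
    return X
```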

The sampling set is generated in the same manner as in the previous part. We vary the SR from 0.1 to 1 with an interval of 0.1 and fix the trade-off parameter $\lambda$ for LRR and ILRR. NMI values are compared among SSC, LRSC, LRR, and ILRR, as shown in Figure 3, where each NMI value is the average result of ten repeated experiments. It can be seen from Figure 3 that ILRR achieves relatively stable NMI values over a wide range of SR, while the other three methods obtain much worse NMI values than ILRR when a large fraction of entries is missing. This observation shows that SSC, LRSC, and LRR are more sensitive to the SR than ILRR.

Furthermore, ILRR has an advantage over the other three methods in recovering the low-rank components. In the following, we illustrate the recovery performance of ILRR. Here, we only consider two different values of SR. For these two given SR, the original images, sampled images, recovered images, and noise images are shown partially in Figures 4 and 5, respectively. In the sampled images, the unsampled entries are shown in white. From these two figures, we can see that ILRR not only corrects the corruptions (shadow and noise) automatically but also recovers the low-rank components effectively.

5.3. Motion Segmentation

In this part, we test the proposed ILRR method on the task of motion segmentation. We consider eight sequences of outdoor traffic scenes, a subset of the Hopkins 155 dataset [25]. These sequences were taken by a moving handheld camera and track two cars translating and rotating on a street, as shown partly in Figure 6. Each sequence contains two or three motions, where one motion corresponds to one subspace. There are 8 clustering tasks in total since the motion segmentation of each sequence is a separate clustering task. The tasks of motion segmentation are carried out on feature points extracted and tracked across multiple frames of the above video sequences.

The extracted feature trajectories of each sequence can be reshaped into an approximately low-rank matrix with $P$ columns, where $P$ equals the number of feature points. In the detailed experimental implementation, we consider different SR varying from 0.2 to 1 with an interval of 0.2. The regularization parameter $\lambda$ in LRR and ILRR is set to 2. For a fixed sequence and a fixed SR, each method is repeated 10 times and the mean NMI values are recorded. We report the mean and the standard deviation of the NMI values over the 8 sequences, as shown in Table 1.

From Table 1, we can see that ILRR obtains very high NMI values even at low SR, while the NMI values of the other three methods are unacceptable in that regime. Although SSC segments each subspace exactly when all entries are observed, a small SR has a fatal influence on its segmentation performance. In summary, ILRR is more robust to a large number of missing values than the other three methods.

6. Conclusions

In this paper, we investigate the model of low-rank representation for incomplete data, which can be regarded as a generalization of both low-rank representation and matrix completion. For the model of incomplete low-rank representation, we propose an iterative scheme based on the augmented Lagrange multipliers method. Experimental results show that the proposed method is feasible and effective in recovering low-rank structures, completing missing entries, and removing noise. How to construct a more general model of low-rank matrix recovery and design the corresponding algorithm still needs further research.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work is partially supported by the National Natural Science Foundation of China (no. 61403298, no. 11326204, no. 11401457, and no. 11401357), by the Natural Science Basic Research Plan in Shaanxi Province of China (no. 2014JQ8323 and no. 2014JQ1019), by the Shaanxi Provincial Education Department (no. 2013JK0587 and no. 2013JK0588), and by the Hanzhong Administration of Science & Technology (no. 2013hzzx-39).