Abstract

This paper considers the problem of recovering low-rank matrices that are heavily corrupted by outliers or large errors. To improve the robustness of existing recovery methods, the problem is formulated as a generalized nonsmooth, nonconvex minimization problem by exploiting the Schatten p-norm and the ℓ_q seminorm. Two numerical algorithms are provided based on the augmented Lagrange multiplier (ALM) and accelerated proximal gradient (APG) methods, together with efficient root-finder strategies. Experimental results demonstrate that the proposed generalized approach is more inclusive and effective than state-of-the-art methods, either convex or nonconvex.

1. Introduction

In many practical applications, such as removing shadows and specularities from face images, separating foregrounds and backgrounds in surveillance videos, ranking, and collaborative filtering, the observed data matrix D can naturally be decomposed into a low-rank matrix A and a corruption matrix E. That is, D = A + E, where the entries of E can be arbitrarily large in magnitude and E is usually assumed to be sparse and unknown. The problem is whether it is possible to recover the low-rank matrix A from the observed D. Recently, it has been shown that the answer is affirmative as long as the corruption matrix E is sufficiently sparse and the rank of A is sufficiently low [1, 2]. That is to say, under certain conditions one can exactly recover the low-rank matrix A with high probability by solving the following convex optimization problem, that is, the idealization of robust principal component analysis (RPCA):

$$\min_{A,E} \ \|A\|_* + \lambda \|E\|_{\ell_1} \quad \text{s.t.} \quad D = A + E, \tag{1}$$

where $\|A\|_*$ denotes the nuclear norm of the low-rank matrix A, that is, the sum of the singular values of A, $\|E\|_{\ell_1}$ denotes the ℓ_1 norm of the matrix E when seen as a vector, and λ is a positive tuning parameter. Recently, there has been a lot of research focusing on solving the RPCA problem, for example, [3–6].

In spite of the great theoretical success and wide practical applications of RPCA (1), it has a major limitation due to the use of the nuclear and ℓ_1 norms as regularizers. Specifically, compared with the intrinsic rank constraint rank(A), the nuclear norm regularizer not only penalizes the large singular values of A too heavily but also shrinks the disturbed small singular values too weakly. A similar analysis applies to the ℓ_1 norm regularizer. As a consequence, the performance of RPCA (1) in dimensionality reduction and outlier separation will not be as good as expected in some scenarios. In Section 3, it is empirically demonstrated that RPCA (1) is not robust when either the matrix A is not sufficiently low-rank or the matrix E is more grossly corrupted.

To improve the robustness of RPCA [3–6], this paper proposes a generalized nonsmooth nonconvex minimization framework for low-rank matrix recovery by exploiting the Schatten p-norm and the ℓ_q seminorm. Two numerical algorithms are then deduced based on the augmented Lagrange multiplier (ALM) and accelerated proximal gradient (APG) methods, together with efficient root-finder strategies. Experimental results show that the proposed generalized approach is more inclusive and effective than state-of-the-art methods [3–6]. Notice that, very recently, a nonconvex relaxation approach for low-rank matrix recovery [7] was proposed exploiting a nonconvex penalty known as minimax concave plus (MC+) together with a nonconvex loss function. However, our approach is different from [7] and is better in terms of recovery accuracy and robustness than [7] as well as the other two nonconvex methods [8, 9].

The paper is organized as follows. Section 2 provides the generalized nonsmooth nonconvex minimization framework, including the problem formulation and two numerical algorithms based on the ALM and APG methods. Section 3 verifies the recovery performance of the proposed method and compares it against state-of-the-art methods. Finally, the paper is concluded in Section 4.

2. Proposed Model and Algorithms

2.1. Problem Formulation

Taking into account both recovery robustness and computational efficiency, the Schatten p-norm is used to better approximate the intrinsic rank constraint rank(A); similarly, the ℓ_q seminorm is exploited to replace the ℓ_1 norm of a matrix when seen as a vector. It is now intuitive to generalize RPCA as follows:

$$\min_{A,E} \ \|A\|_{S_p}^p + \lambda \|E\|_{\ell_q}^q \quad \text{s.t.} \quad D = A + E, \tag{2}$$

where $\|A\|_{S_p}$, $\|E\|_{\ell_q}$, and λ are defined, respectively, in the following. Assume $0 < p, q \le 1$; then the ℓ_q seminorm of a matrix E when seen as a vector can be defined as

$$\|E\|_{\ell_q} = \Big(\sum_{i,j} |E_{ij}|^q\Big)^{1/q},$$

where $E_{ij}$ is the (i, j)th element of E, and the Schatten p-norm of a matrix A can be defined as

$$\|A\|_{S_p} = \Big(\sum_{i} \sigma_i^p\Big)^{1/p},$$

where $\sigma_i$ is the ith singular value of A and the singular value decomposition (SVD) of A is $A = U \Sigma V^{T}$. Clearly, as $p = q = 1$, (2) reduces to convex RPCA in (1); as $0 < p, q < 1$, (2) corresponds to a constrained nonsmooth and nonconvex minimization problem. We now turn to the numerical iteration schemes for (2).
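For concreteness, the two regularizers in (2) can be evaluated in a few lines of MATLAB. The snippet below is only our illustrative sketch with toy matrices and example exponents; it is not part of the proposed algorithms, and the variable names are ours.

% Illustrative sketch (ours): evaluating the two regularizers in (2) on toy matrices
p = 0.85; q = 0.85;                          % example exponents, cf. Section 3.2
A = randn(5,2)*randn(2,5); E = randn(5);     % toy low-rank and corruption matrices
lq_term = sum(abs(E(:)).^q);                 % ||E||_{l_q}^q = sum_{i,j} |E_{ij}|^q
sp_term = sum(svd(A).^p);                    % ||A||_{S_p}^p = sum_i sigma_i^p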

In recent years, the signal processing and computational mathematics communities have shown increasing interest in developing efficient algorithms for nonlinear nonsmooth optimization problems [10], such as iterative soft thresholding, split Bregman iteration, accelerated proximal gradient, and augmented Lagrange multiplier methods, which have significantly simplified sparse optimization problems including RPCA (1). In a similar spirit to [1, 3, 4, 7], this paper exploits the accelerated proximal gradient (APG) and augmented Lagrange multiplier (ALM) methods to solve the generalized minimization problem (2), considering that APG and ALM are currently the two most popular numerical algorithms for such problems. As for ALM, it has been shown [11] that, under fairly general conditions, ALM converges linearly to the optimal solution. As for APG, though little is known about the actual convergence of the sequences it produces, the rate of convergence of the objective function values it achieves is optimal [10]. However, we should note that the above-mentioned convergence results are not applicable to the new problem (2) because of its nonsmooth and nonconvex properties. In spite of that, the empirical studies in Section 3 demonstrate that the two algorithms deduced in the following can both solve (2) very well, with an empirically fast convergence rate.

2.2. ALM-Based Algorithm

This subsection exploits ALM to solve problem (2), which is a nonconvex extension of [3]. First of all, define the functions

$$f(A) = \|A\|_{S_p}^p, \qquad g(E) = \lambda \|E\|_{\ell_q}^q.$$

According to ALM, the augmented Lagrange function for (2) is given as

$$L(A, E, Y, \mu) = f(A) + g(E) + \langle Y, D - A - E \rangle + \frac{\mu}{2}\,\|D - A - E\|_F^2,$$

where Y is a matrix of Lagrange multipliers and μ is the augmented Lagrange penalty parameter. It is seen that (A, E) can be solved for iteratively by alternating minimization of L. In the meanwhile, a continuation strategy is applied to μ in order to improve both the accuracy and efficiency of low-rank matrix recovery. Specifically, the iteration process is described in Algorithm 1.

Set k = 0 and ρ > 1. Initialize A_0, E_0, Y_0, and μ_0.
while residual error ||D - A_k - E_k||_F / ||D||_F > tol
    A_{k+1} = argmin_A L(A, E_k, Y_k, μ_k);  E_{k+1} = argmin_E L(A_{k+1}, E, Y_k, μ_k);
    Y_{k+1} = Y_k + μ_k (D - A_{k+1} - E_{k+1});  μ_{k+1} = ρ μ_k;  k = k + 1;
    if k > maximum number of iterations, break; end
end

In the following, $A_{k+1}$ and $E_{k+1}$ are solved for, respectively, by

$$A_{k+1} = \arg\min_{A}\ L(A, E_k, Y_k, \mu_k), \tag{7}$$

$$E_{k+1} = \arg\min_{E}\ L(A_{k+1}, E, Y_k, \mu_k). \tag{8}$$

As for (7), suppose that the SVD of $G_k = D - E_k + \mu_k^{-1} Y_k$ is $G_k = U_k \Sigma_k V_k^{T}$; then (7) can be rewritten as

$$A_{k+1} = \arg\min_{A}\ \|A\|_{S_p}^{p} + \frac{\mu_k}{2}\,\|A - G_k\|_F^{2}.$$

According to von Neumann's theorem on singular values [12], we have

$$\langle A, G_k \rangle \le \sum_{i} \sigma_i(A)\,\sigma_i(G_k).$$

Hence, the minimization problem (7) can be approximated by a minimization over the singular values alone,

$$\min_{\sigma_1 \ge \cdots \ge 0}\ \sum_{i}\Big[\sigma_i^{p} + \frac{\mu_k}{2}\big(\sigma_i - \sigma_i(G_k)\big)^{2}\Big],$$

with $A_{k+1} = U_k\,\mathrm{diag}(\sigma)\,V_k^{T}$. More concretely, that is,

$$\sigma_i = \arg\min_{\sigma \ge 0}\ \sigma^{p} + \frac{\mu_k}{2}\big(\sigma - \sigma_i(G_k)\big)^{2}, \qquad i = 1, 2, \ldots. \tag{12}$$

Let $H_k = D - A_{k+1} + \mu_k^{-1} Y_k$. It is not hard either to find that the minimization problem (8) can be reduced to the elementwise problems

$$\big(E_{k+1}\big)_{ij} = \arg\min_{x}\ \lambda\,|x|^{q} + \frac{\mu_k}{2}\big(x - (H_k)_{ij}\big)^{2}. \tag{13}$$

From (12) and (13), the kernel of Algorithm 1 is the following scalar root-finder problem:

$$\min_{x}\ |x|^{\rho} + \frac{\beta}{2}\,(x - s)^{2}, \qquad 0 < \rho \le 1,\ \beta > 0,\ s \in \mathbb{R}. \tag{14}$$
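Given (12)–(14), the per-iteration updates of Algorithm 1 can be summarized in a few lines of MATLAB. The sketch below is our own illustration, not the authors' released code: shrink_p denotes a hypothetical scalar solver of (14) applied elementwise (for instance, the analytical formulas of [13] or the Newton-Raphson sketch given after the next paragraph), rho_cont (> 1) is the continuation factor, and D, E, Y, mu, p, q, and lambda are the current iteration variables of Algorithm 1.

% Illustrative MATLAB sketch (ours) of one iteration of Algorithm 1.
% shrink_p(s, rho, beta) is a hypothetical elementwise solver of (14): min_x |x|^rho + (beta/2)*(x - s)^2.
G = D - E + Y/mu;                              % target matrix of the A-subproblem (7)
[U, S, V] = svd(G, 'econ');
A = U * diag(shrink_p(diag(S), p, mu)) * V';   % shrink the singular values, cf. (12)
H = D - A + Y/mu;                              % target matrix of the E-subproblem (8)
E = shrink_p(H, q, mu/lambda);                 % elementwise shrinkage, cf. (13)
Y = Y + mu * (D - A - E);                      % ALM multiplier update
mu = rho_cont * mu;                            % continuation on the penalty parameter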

Here, we borrow the numerical idea in [13] to solve (14). For exponents equal to 1/2 or 2/3, analytical solutions are used, as calculated by Algorithms 2 and 3 in [13]. For all other values of p and q, the Newton-Raphson numerical root-finding method is exploited to solve (14).
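The following function is only our sketch of such a scalar solver for (14); it does not reproduce Algorithms 2 and 3 of [13]. It runs Newton-Raphson on the stationarity condition of (14) and then compares the resulting stationary point with the candidate x = 0, since the global minimizer of (14) is always one of the two.

function x = shrink_p(S, rho, beta)
% Our sketch of an elementwise solver of (14): min_x |x|^rho + (beta/2)*(x - s)^2, 0 < rho <= 1.
x = zeros(size(S));
for k = 1:numel(S)
    s = abs(S(k));
    if s < 1e-12, continue; end                % the minimizer is 0 for (near-)zero inputs
    t = s;                                     % start Newton-Raphson from the least-squares point
    for it = 1:50
        g  = rho*t^(rho-1) + beta*(t - s);     % first derivative of the objective at t > 0
        dg = rho*(rho-1)*t^(rho-2) + beta;     % second derivative
        t  = max(t - g/dg, 1e-12);             % Newton step, kept strictly positive
    end
    if t^rho + 0.5*beta*(t - s)^2 < 0.5*beta*s^2   % compare with the objective value at x = 0
        x(k) = sign(S(k)) * t;
    end
end
end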

2.3. APG-Based Algorithm

This subsection exploits APG as well as the continuation technique to solve problem (2), which is a nonconvex extension of [4]. First of all, a relaxed minimization problem is produced from (2); that is,

$$\min_{A,E}\ F(A, E) = \mu\big(\|A\|_{S_p}^p + \lambda \|E\|_{\ell_q}^q\big) + \frac{1}{2}\,\|D - A - E\|_F^2,$$

where μ is a relaxation parameter. Obviously, F is different from the Lagrange function of (2) in Section 2.2. However, instead of directly minimizing F, a sequence of separable quadratic approximations to F is minimized, denoted as $Q(A, E; Y_A^k, Y_E^k)$, where $(Y_A^k, Y_E^k)$ are specifically chosen points and the smooth coupling term $\frac{1}{2}\|D - A - E\|_F^2$ is majorized around $(Y_A^k, Y_E^k)$ with Lipschitz constant $L_f = 2$. Then, (A, E) can be solved for iteratively by alternating minimization of Q with reasonable choices of $(Y_A^k, Y_E^k)$; that is, $A_{k+1}$ and $E_{k+1}$ are taken as the minimizers of $Q(\cdot, \cdot; Y_A^k, Y_E^k)$ given in (19) and (20) below. To assure both the accuracy and efficiency of minimizing F, two key strategies are taken into account deliberately. For one thing, $(Y_A^k, Y_E^k)$ are determined by the iterative smoothed (extrapolation) computation suggested in [14]. For another, the continuation technique is also applied to μ, just as in Algorithm 1. Moreover, the stopping criterion is identical to the one proposed in [15] and utilized in [4]. The iteration scheme is presented in Algorithm 2 specifically.

Set k = 0, t_0 = t_{-1} = 1, and 0 < η < 1. Initialize A_0 = A_{-1} = 0, E_0 = E_{-1} = 0, μ_0, and the continuation target μ̄.
while residual error > tol
    Y_A^k = A_k + ((t_{k-1} - 1)/t_k)(A_k - A_{k-1}),  Y_E^k = E_k + ((t_{k-1} - 1)/t_k)(E_k - E_{k-1});
    A_{k+1} = argmin_A Q(A, E_k; Y_A^k, Y_E^k),  E_{k+1} = argmin_E Q(A_{k+1}, E; Y_A^k, Y_E^k);
    t_{k+1} = (1 + sqrt(1 + 4 t_k^2))/2;  μ_{k+1} = max(η μ_k, μ̄);  k = k + 1;
    if k > maximum number of iterations, break; end
end

In the following, $A_{k+1}$ and $E_{k+1}$ can be solved for, respectively, by

$$A_{k+1} = \arg\min_{A}\ \mu_k \|A\|_{S_p}^p + \frac{L_f}{2}\,\|A - G_A^k\|_F^2, \tag{19}$$

$$E_{k+1} = \arg\min_{E}\ \mu_k \lambda \|E\|_{\ell_q}^q + \frac{L_f}{2}\,\|E - G_E^k\|_F^2, \tag{20}$$

where $G_A^k = Y_A^k - \frac{1}{L_f}(Y_A^k + Y_E^k - D)$ and $G_E^k = Y_E^k - \frac{1}{L_f}(Y_A^k + Y_E^k - D)$ are the gradient-step points computed from the extrapolation points $(Y_A^k, Y_E^k)$. Similar to (12) and (13) in Section 2.2, both (19) and (20) are instances of the root-finder problem (14) and hence can be solved efficiently by borrowing the numerical idea in [13].
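For concreteness, one full iteration of Algorithm 2 may be sketched in MATLAB as follows. This is only our illustration under the standard APG scheme with Lipschitz constant 2 for the quadratic coupling term (as in [4]); A, A_prev, E, E_prev, t, t_prev, mu, eta, and mu_bar denote the loop state carried between iterations, and shrink_p is the hypothetical scalar solver of (14) from Section 2.2.

% Our sketch (not the authors' code) of one APG iteration for the relaxed problem; Lf = 2 as in [4].
YA = A + ((t_prev - 1)/t) * (A - A_prev);       % extrapolation points, cf. [14]
YE = E + ((t_prev - 1)/t) * (E - E_prev);
GA = YA - 0.5*(YA + YE - D);                    % gradient step on (1/2)*||D - A - E||_F^2
GE = YE - 0.5*(YA + YE - D);
A_prev = A; E_prev = E;
[U, S, V] = svd(GA, 'econ');
A = U * diag(shrink_p(diag(S), p, 2/mu)) * V';  % singular-value shrinkage, cf. (19)
E = shrink_p(GE, q, 2/(mu*lambda));             % elementwise shrinkage, cf. (20)
t_prev = t; t = (1 + sqrt(1 + 4*t_prev^2))/2;   % FISTA-type update of the momentum parameter
mu = max(eta*mu, mu_bar);                       % continuation on the relaxation parameter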

3. Experimental Results

3.1. Experimental Settings

In this section, simulation experiments are designed and conducted to show the validity of the proposed approach. We first need to produce the observed data using D = A + B, in which A and B are, respectively, the true low-rank and sparse matrices that we wish to recover. Without loss of generality, A is generated as a product of two matrices whose entries are sampled i.i.d. from a Gaussian distribution, and the sparse matrix B is constructed by setting a proportion of its entries to ±1 and the rest to zeros. More specifically, if r and spr represent, respectively, the matrix rank and the sparsity ratio, then the MATLAB v7.0 scripts for generating A and B can be given as
(i) A = 1/m*randn(m,r)*1/m*randn(r,m);
(ii) B = zeros(m,m);
(iii) p = randperm(m*m);
(iv) L = round(spr*m*m);
(v) B(p(1:L)) = sign(randn(L,1));
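As a usage illustration (ours, with example parameter values taken from the settings of the next subsection), one problem instance can be generated and assembled as follows.

% Usage sketch (ours): one synthetic problem instance
m = 500; r = 50; spr = 0.05;                  % example settings, cf. the next subsection
A = 1/m*randn(m,r)*1/m*randn(r,m);            % true low-rank matrix
B = zeros(m,m); p = randperm(m*m); L = round(spr*m*m);
B(p(1:L)) = sign(randn(L,1));                 % true sparse corruption matrix
D = A + B;                                    % observed data matrix fed to Algorithms 1 and 2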

In the following experiments, m is set to 500, r to 50, 100, 150, and 200, and spr to 5%, 10%, 15%, and 20%. To be noted, the matrix recovery problem (2) roughly changes from easy to hard as r or spr changes from small to large. To assess the accuracy of low-rank matrix recovery, the relative squared error (RSE) is used, defined as

$$\mathrm{RSE} = \frac{\|\hat{A} - A\|_F}{\|A\|_F},$$

where $\hat{A}$ is the recovered low-rank matrix. The number of SVDs is used to evaluate computational efficiency, since the running time of Algorithms 1 and 2, as well as of [1, 3, 4, 7], is dominated by the SVD in each iteration. The experiments in this paper are conducted on a Lenovo computer equipped with an Intel Pentium(R) Core i5-3470 CPU (3.20 GHz) and 8 GB of RAM.
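For reference, the two evaluation quantities above can be computed as follows (our sketch; Ahat stands for the low-rank matrix returned by either algorithm).

RSE = norm(Ahat - A, 'fro') / norm(A, 'fro');  % relative squared error of the recovery
recovered_rank = rank(Ahat);                   % compared against the true rank r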

3.2. Comparison between Algorithm 1 and RPCA [3]

In the literature, although several different numerical algorithms for solving RPCA have been reported [3–6], the ALM method [3] has been shown to possess the best performance in both accuracy and efficiency. Hence, this subsection compares Algorithm 1 with its convex, reduced version, that is, RPCA [3]. When implementing Algorithm 1, the algorithmic parameters are set uniformly across all experiments, with the initial penalty parameter chosen in terms of the largest singular value of the observed matrix D. Besides, the initial matrices of Algorithm 1 are set as zero matrices. As for the value choices of p and q, we set them both as 0.85 based on empirical studies, despite the fact that more accurate recovery might be produced with choices adaptive to different r and spr.

Experimental results of Algorithm 1 and [3] are provided in Tables 1, 2, 3, and 4, corresponding to the different settings. When the sparsity ratio spr equals 5%, it is clearly observed that Algorithm 1 recovers the true rank of A perfectly and is better than [3] in terms of RSE. It is also noticed that, as the sparsity ratio spr becomes larger, the recovery accuracy of both Algorithm 1 and [3] decreases. But it is still the case that Algorithm 1 behaves better than [3] in terms of both RSE and true rank recovery.

One more point should be noted: slightly lower RSEs can be achieved by Algorithm 1 under alternative parameter settings. However, since the improvement in recovery accuracy is very limited, the settings above are kept for computational efficiency.

3.3. Comparison between Algorithm 2 and RPCA [4]

When running Algorithm 2, the algorithmic parameters are set uniformly across all experiments, with the initial relaxation parameter chosen in terms of the largest singular value of the observed matrix D, that is, ||D||_2. In addition, the initial matrices of Algorithm 2 are all set as zero matrices. As for p and q, similar to the above manner, they are both set as 0.9 based on intensive empirical studies.

Experimental results of Algorithm 2 and [4] are provided in Tables 5, 6, 7, and 8, corresponding to the different settings. It is also remarkable that Algorithm 2 recovers the true rank of A in almost all the scenarios, which makes it much superior to Algorithm 1 in this respect. Its second advantage over Algorithm 1 is that it achieves slightly more robust recovery when E is more grossly corrupted and A is not sufficiently low-rank, for example, when rank(A) is 200. In spite of that, in the majority of the other cases Algorithm 1 outperforms Algorithm 2 in terms of RSE. Therefore, it can be concluded that both algorithms possess their own advantages and disadvantages, and, on the whole, Algorithm 1 shows better performance in terms of both recovery accuracy and efficiency.

3.4. Comparison between Proposed Approach and [7]

In the literature, several nonconvex approaches for low-rank matrix recovery have also been proposed, for example, [7–9]. However, only [7] reports that it outperforms ALM-based RPCA [3] in terms of recovery accuracy.

Table 9 presents the RSE, the number of SVDs, and the recovered rank achieved by [7] with sparsity ratios equal to 5% and 20%. Comparing Tables 1, 4, 5, 8, and 9, we can claim that both Algorithms 1 and 2 outperform [7] in terms of RSE and true rank recovery when A is not sufficiently low-rank or E is more grossly corrupted. Meanwhile, we should also note that our method is computationally less efficient than [7] because slightly more SVDs are used in each iteration, which is one of the issues to be studied in future work.

3.5. Empirical Analysis on the Convergence of Algorithms 1 and 2

As mentioned earlier, the existing convergence results on ALM and APG in [10, 11] are not applicable to problem (2) owing to the use of the nonconvex ℓ_q seminorm and Schatten p-norm, which also makes a theoretical convergence analysis of the proposed algorithms difficult. In spite of that, an empirical analysis can be made by plotting the residual error curve against the iteration number for each algorithm. Specifically, the residual error curves for both Algorithms 1 and 2 with the sparsity ratio equal to 20% are shown, respectively, in Figures 1 and 2. It is obvious that the two deduced algorithms exhibit empirically fast convergence in each recovery scenario. Actually, this observation is also valid for the other, easier recovery cases with lower sparsity ratios. In addition, the number of iterations required for each recovery problem can be read off from the residual error curves.
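Such curves can be reproduced from the per-iteration residuals recorded while running either algorithm; the lines below are only our illustrative sketch, where res_hist is a hypothetical vector holding ||D - A_k - E_k||_F / ||D||_F for each iteration (not a variable from the paper).

semilogy(1:numel(res_hist), res_hist, '-o');   % residual error versus iteration number
xlabel('Iteration number'); ylabel('Residual error');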

4. Conclusions and Discussions

In this paper, a generalized robust minimization framework is proposed for low-rank matrix recovery by exploiting the Schatten p-norm and the ℓ_q seminorm. Two numerical algorithms are deduced based on the ALM and APG methods, together with efficient root-finder techniques. Experimental results demonstrate that the proposed algorithms possess their own advantages and disadvantages and that both perform more effectively than state-of-the-art methods, either convex or nonconvex, in terms of both RSE and true rank recovery.

Note that this paper does not consider the influence of additive noise on the proposed algorithms, which actually corresponds to the problem of noisy RPCA [16, 17]. As claimed in [17], noisy RPCA is intrinsically different from the RPCA problem that is the focus of this paper. Indeed, the proposed algorithms in this paper are not quite robust to additive noise, just the same as many existing approaches to RPCA, for example, [1–4, 6–9]. To some degree, this observation coincides with the investigations in [18, 19]; that is, the ℓ_q seminorm as a sparsity-enforcing penalty is vulnerable to the influence of additive noise on the data, as it resembles the ℓ_0 seminorm when q approaches 0, in spite of the fact that q in Algorithms 1 and 2 is chosen, respectively, as 0.85 and 0.9. Our future research topic is to extend the proposed algorithms to the noisy RPCA problem, with applications to the field of image and vision computing.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

The authors would like to express their gratitude to the anonymous reviewers for their serious, pertinent, and helpful comments. Wen-Ze Shao is very grateful to Professor Zhihui Wei, Professor Yizhong Ma, and Dr. Min Wu for their kind support over the past years. He also gives many thanks to Mr. Yatao Zhang and other kind people for helping him through his lost and sad years. The work is supported in part by the NSF of Jiangsu Province (BK20130868, BK20130883), the NSRF of Jiangsu Universities (13KJB510022, 13KJB120005), the TIF of NJUPT (NY212014, NY213007, NY213011, NY213066, and NY213139), and the NSF of China (61203270).