Abstract

As a special three-block separable convex program, the stable principal component pursuit (SPCP) arises in many disciplines, such as statistical learning, signal processing, and web data ranking. In this paper, we propose a proximal fully parallel splitting method (PFPSM) for solving SPCP, in which all of the resulting subproblems admit closed-form solutions and can be solved in a distributed manner. Compared with similar algorithms in the literature, PFPSM attaches a Glowinski relaxation factor to the updating formula for its Lagrange multiplier, which can be used to accelerate the convergence of the generated sequence. Under mild conditions, the global convergence of PFPSM is proved. Preliminary computational results show that the proposed algorithm performs well in practice.

1. Introduction

Recovering the low-rank and sparse components of a given matrix is a fundamental task in many scientific fields, which can be modeled as the following two-block separable nonconvex and nonsmooth program:
$$\min_{L,S}\ \operatorname{rank}(L)+\tau\|S\|_{0}\quad\text{s.t.}\quad L+S=D,\tag{1}$$
where $D\in\mathbb{R}^{m\times n}$ is a given matrix, $\operatorname{rank}(L)$ is the rank of the matrix $L$, $\|S\|_{0}$ denotes the $\ell_{0}$-norm of $S$, defined as the number of nonzero entries of $S$, and $\tau>0$ is a trade-off parameter balancing low rank and sparsity. Recent studies [1–3] indicate that the NP-hard task (1) can be accurately accomplished by solving one of its convex relaxations: the principal component pursuit (PCP) problem, in which the rank and the $\ell_{0}$-norm are replaced by their tightest convex relaxations, that is, the nuclear norm and the $\ell_{1}$-norm, respectively. The PCP problem can be mathematically modeled as the following optimization problem:
$$\min_{L,S}\ \|L\|_{*}+\tau\|S\|_{1}\quad\text{s.t.}\quad L+S=D,\tag{2}$$
where the nuclear norm $\|L\|_{*}$ is the sum of all singular values of $L$, which induces the low-rank component of the given matrix $D$, and the $\ell_{1}$-norm $\|S\|_{1}$ is the sum of the absolute values of all entries of $S$, which induces the sparse component of the given matrix $D$.

To capture even more applications, Zhou et al. [4] considered the case where the observation matrix is corrupted by both small entry-wise noise and gross sparse errors. A more general case was then studied by Tao and Yuan [5], wherein only a fraction of the entries of the given matrix $D$ can be observed, and they suggested recovering the low-rank and sparse components of $D$ by solving the following stable principal component pursuit (SPCP) problem:
$$\min_{L,S}\ \|L\|_{*}+\tau\|S\|_{1}\quad\text{s.t.}\quad \|P_{\Omega}(D-L-S)\|_{F}\le\delta,\tag{3}$$
where $\delta>0$ is related to the impulsive or Gaussian noise level, $\|\cdot\|_{F}$ denotes the Frobenius norm, $\Omega$ is the index set of the observable entries of $D$, and $P_{\Omega}$ is the projection operator defined by
$$[P_{\Omega}(X)]_{ij}=\begin{cases}X_{ij},&(i,j)\in\Omega,\\ 0,&\text{otherwise}.\end{cases}\tag{4}$$
In [6], Hou et al. transformed (3) into the following unconstrained problem:
$$\min_{L,S}\ \|L\|_{*}+\tau\|S\|_{1}+\frac{1}{2\mu}\|P_{\Omega}(D-L-S)\|_{F}^{2},\tag{5}$$
where $\mu>0$ is a penalty parameter. Problems (3) and (5) are both convex optimization problems, which can be equivalently reformulated as semidefinite programming (SDP) problems and then solved in polynomial time by state-of-the-art SDP solvers such as SeDuMi [7] and SDPT3 [8]. However, these solvers are not applicable to high-dimensional instances of (3) and (5), because a linear system must be solved (in)exactly at each iteration, which can be prohibitively costly and require a very large amount of memory in actual implementations [9].
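For illustration only, the projection $P_{\Omega}$ in (4) simply keeps the observed entries and zeroes out the rest. A minimal NumPy sketch, with a hypothetical boolean mask `omega` encoding the index set $\Omega$, reads as follows.

```python
import numpy as np

def proj_omega(X, omega):
    """Projection P_Omega: keep entries flagged by the boolean mask omega, zero the rest."""
    return np.where(omega, X, 0.0)

# Example: observe roughly 80% of the entries of a random 5 x 4 matrix.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 4))
omega = rng.random((5, 4)) < 0.8   # True where the entry is observed
print(proj_omega(X, omega))
```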

By introducing an auxiliary variable $Z=D-L-S$, the model (5) can be rewritten in the following equivalent form:
$$\min_{L,S,Z}\ \|L\|_{*}+\tau\|S\|_{1}+\frac{1}{2\mu}\|P_{\Omega}(Z)\|_{F}^{2}\quad\text{s.t.}\quad L+S+Z=D,\tag{6}$$
which decouples the quadratic term of (5). Recently, there has been an intensive study of iterative methods for solving (6) [5, 6, 10–12]. Interestingly, these methods usually exploit only the first-order information of the objective function of (6), and most of them are based on the well-known alternating direction method of multipliers (ADMM), such as the sequential ADMM [11], the partially parallel ADMMs [5, 6, 10, 12], and the fully parallel ADMMs [13, 14]. As an influential first-order iterative method in optimization, ADMM was independently proposed by Glowinski and Marrocco [15] and Gabay and Mercier [16] about forty years ago, and, owing to its high efficiency in numerical simulation, it has received considerable attention in recent years. ADMM can be viewed as an application of the Douglas–Rachford splitting method to the dual of the two-block separable convex program (2) [17], as a special case of the proximal point method for general convex programming [18], or as a generalization of the classical Uzawa method for solving saddle-point problems [19]. Chen et al. [20] showed by a simple counterexample that the sequential ADMM cannot be directly extended to three-block separable convex programming, and, similarly, He et al. [13] showed that the fully parallel ADMM may diverge even for two-block separable convex programming. There are three strategies to deal with this issue: the substitution procedure [13, 21], the technique of regularization [14, 22], and the technique of small step sizes [23]. For other recent developments of ADMM, including the (sub)linear convergence rate, various acceleration techniques, indefinite regularized matrices, larger step sizes, and its generalization to nonconvex and nonsmooth multiblock separable programming, we refer to [24–28].

In this paper, we focus our attention on the fully parallel ADMM, which is based on the substitution procedure and is more suitable for distributed computation. Following the full Jacobian decomposition of the augmented Lagrangian method (FJDALM) in [13], we propose a new fully parallel ADMM, termed the proximal fully parallel splitting method (PFPSM) in the following analysis. Compared with other similar methods in the literature, the new method has three properties. First, all of its subproblems are regularized by proximal terms, which can enhance its numerical efficiency in practice. Second, a Glowinski relaxation factor $\gamma$ is attached to the updating formula for its Lagrange multiplier, and larger values of $\gamma$ can often accelerate the convergence of the proposed method (as reported in Section 4). To the best of our knowledge, this is the first fully parallel ADMM-type method with a Glowinski relaxation factor for solving problem (6). Third, the method dynamically updates its step size at each iteration.

The remainder of this paper is organized as follows. In Section 2, we present some necessary notations and useful lemmas. In Section 3, we propose the proximal fully parallel splitting method for solving problem (6) and prove its global convergence step by step. Some numerical results for the proposed method are shown and discussed in Section 4. Finally, in Section 5, some concluding remarks are drawn and several future research directions are also discussed.

2. Preliminaries

In this section, we briefly introduce some necessary notations, definitions, and lemmas.

For any matrix $X$, let $\|X\|_{F}$ denote the Frobenius norm of $X$. For any two matrices $X$ and $Y$ of the same dimensions, $\langle X,Y\rangle$ denotes their Frobenius inner product. Then, we have $\|X\|_{F}^{2}=\langle X,X\rangle$. If $G$ is a symmetric positive definite matrix, we denote by $\|X\|_{G}=\sqrt{\langle X,GX\rangle}$ the $G$-norm of the matrix $X$. The effective domain of a function $f$ is defined as $\operatorname{dom}(f)=\{x\mid f(x)<+\infty\}$. The set of all relative interior points of a given nonempty convex set $C$ is denoted by $\operatorname{ri}(C)$. $\operatorname{diag}(A_{1},\dots,A_{n})$ denotes the block diagonal matrix that has $A_{i}$ as its $i$th block on the diagonal.

A function $f$ is convex if and only if
$$f(\theta x+(1-\theta)y)\le\theta f(x)+(1-\theta)f(y),\quad\forall\,x,y\in\operatorname{dom}(f),\ \theta\in[0,1].$$
Then, for a convex function $f$, we have the following basic inequality:
$$f(y)\ge f(x)+\langle\xi,\,y-x\rangle,\quad\forall\,\xi\in\partial f(x),\tag{8}$$
where $\partial f(x)$ denotes the subdifferential of $f$ at the point $x$. The above definition and property (8) can be easily extended to functions of matrix variables.
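As a concrete instance of (8), which is also what underlies the soft shrinkage operator used later in Section 4, consider $f(x)=\|x\|_{1}$; the following display (added here purely for illustration) records its componentwise subdifferential and the resulting subgradient inequality.

```latex
% Subgradient inequality (8) specialized to f(x) = \|x\|_1.
\[
\bigl(\partial\|x\|_{1}\bigr)_{i}=
\begin{cases}
\{\operatorname{sign}(x_{i})\}, & x_{i}\neq 0,\\
[-1,\,1], & x_{i}=0,
\end{cases}
\qquad
\|y\|_{1}\;\ge\;\|x\|_{1}+\langle\xi,\,y-x\rangle
\quad\forall\,\xi\in\partial\|x\|_{1},\ \forall\,y .
\]
% In particular, at x = 0 the inequality reduces to \|y\|_1 >= <xi, y>
% for any xi with \|\xi\|_\infty <= 1.
```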

For the convenience of our analysis, let us relabel $\|L\|_{*}$ as $\theta_{1}(L)$, $\tau\|S\|_{1}$ as $\theta_{2}(S)$, and $\frac{1}{2\mu}\|P_{\Omega}(Z)\|_{F}^{2}$ as $\theta_{3}(Z)$, and denote $\theta=\theta_{1}+\theta_{2}+\theta_{3}$. Then $\theta_{1}$, $\theta_{2}$, and $\theta_{3}$ are all convex functions.

We make the following standard assumption about problem (6) in this paper.

Assumption 1. The generalized Slater’s condition holds; that is, there is a point $(\hat L,\hat S,\hat Z)\in\operatorname{ri}(\operatorname{dom}\theta_{1}\times\operatorname{dom}\theta_{2}\times\operatorname{dom}\theta_{3})\cap\mathcal{F}$, where the set $\mathcal{F}$ is defined by $\mathcal{F}=\{(L,S,Z)\mid L+S+Z=D\}$.

Under Assumption 1, it follows from the relevant theorems of [29] that $(L^{*},S^{*},Z^{*})$ is an optimal solution of problem (6) if and only if there exists a matrix $\Lambda^{*}$ such that $(L^{*},S^{*},Z^{*},\Lambda^{*})$ is a solution of the following KKT system:
$$\Lambda^{*}\in\partial\theta_{1}(L^{*}),\qquad \Lambda^{*}\in\partial\theta_{2}(S^{*}),\qquad \Lambda^{*}=\nabla\theta_{3}(Z^{*}),\qquad L^{*}+S^{*}+Z^{*}=D.\tag{10}$$
The following lemma is the basis for reformulating (10) as a mixed variational inequality problem.

Lemma 2. For any two matrices $Z$ and $\Lambda$, the relationship $\Lambda=\nabla\theta_{3}(Z)$ is equivalent to the following mixed variational inequality problem:
$$\theta_{3}(Z')-\theta_{3}(Z)+\langle Z'-Z,\,-\Lambda\rangle\ge 0,\quad\forall\,Z'.$$

Proof. From $\Lambda=\nabla\theta_{3}(Z)$, one has $\Lambda\in\partial\theta_{3}(Z)$, which together with the convexity of $\theta_{3}$ and the subgradient inequality (8) implies $\theta_{3}(Z')\ge\theta_{3}(Z)+\langle\Lambda,\,Z'-Z\rangle$ for all $Z'$; that is, the above mixed variational inequality holds. Conversely, from the mixed variational inequality, one has $\theta_{3}(Z')\ge\theta_{3}(Z)+\langle\Lambda,\,Z'-Z\rangle$ for all $Z'$, which implies $\Lambda\in\partial\theta_{3}(Z)$. From this and the corresponding theorem of [29] on differentiable convex functions, we have $\Lambda=\nabla\theta_{3}(Z)$. The proof is completed.

Remark 3. Based on (10) and Lemma 2, the matrix triple $(L^{*},S^{*},Z^{*})$ is an optimal solution to problem (6) if and only if there exists a matrix $\Lambda^{*}$ such that, for all $(L,S,Z,\Lambda)$,
$$\begin{aligned}
&\theta_{1}(L)-\theta_{1}(L^{*})+\langle L-L^{*},\,-\Lambda^{*}\rangle\ge 0,\\
&\theta_{2}(S)-\theta_{2}(S^{*})+\langle S-S^{*},\,-\Lambda^{*}\rangle\ge 0,\\
&\theta_{3}(Z)-\theta_{3}(Z^{*})+\langle Z-Z^{*},\,-\Lambda^{*}\rangle\ge 0,\\
&\langle \Lambda-\Lambda^{*},\,L^{*}+S^{*}+Z^{*}-D\rangle\ge 0.
\end{aligned}\tag{17}$$
Moreover, any $\Lambda^{*}$ satisfying (17) is an optimal solution to the dual of problem (6).

Furthermore, we set $U=(L,S,Z)$ and $W=(L,S,Z,\Lambda)$. Obviously, (17) can be written as the following mixed variational inequality problem, abbreviated as VI: find a matrix $W^{*}$ such that
$$\theta(U)-\theta(U^{*})+\langle W-W^{*},\,F(W^{*})\rangle\ge 0,\quad\forall\,W,$$
where $\theta(U)=\theta_{1}(L)+\theta_{2}(S)+\theta_{3}(Z)$, and
$$F(W)=\begin{pmatrix}-\Lambda\\-\Lambda\\-\Lambda\\L+S+Z-D\end{pmatrix}.$$
The set of solutions of VI, denoted by $\mathcal{W}^{*}$, is nonempty by Assumption 1 and Remark 3. Because $F$ is a linear mapping and its coefficient matrix is skew-symmetric, it satisfies the following nice property:
$$\langle W_{1}-W_{2},\,F(W_{1})-F(W_{2})\rangle=0,\quad\forall\,W_{1},W_{2}.$$

3. The Proximal Fully Parallel Splitting Method

In this section, we shall present the proximal fully parallel splitting method for solving (6) and discuss its convergence results.

Firstly, the augmented Lagrangian function of the convex program (6) can be written as
$$\mathcal{L}_{\beta}(L,S,Z,\Lambda)=\|L\|_{*}+\tau\|S\|_{1}+\frac{1}{2\mu}\|P_{\Omega}(Z)\|_{F}^{2}-\langle\Lambda,\,L+S+Z-D\rangle+\frac{\beta}{2}\|L+S+Z-D\|_{F}^{2},$$
where $\beta>0$ is the penalty parameter and $\Lambda$ is the Lagrange multiplier for the linear constraint of (6). To simplify our notation in later analysis, we define a block diagonal matrix as in (23), whose diagonal blocks are built from some symmetric positive semidefinite regularization matrices and a predefined parameter. Note that this block diagonal matrix is symmetric positive definite.
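For orientation before the algorithm is stated, the following display sketches what a plain Jacobi-type (fully parallel) splitting of this augmented Lagrangian looks like: each block is updated from the previous iterate only, with assumed proximal regularization matrices $R_{1},R_{2},R_{3}$ and the relaxation factor $\gamma$ in the multiplier update. This is an illustrative template under the notation introduced above, not the exact prediction and correction formulas of PFPSM, which are given in Algorithm 4 below.

```latex
% Illustrative Jacobi-type prediction template (not the exact PFPSM formulas):
\begin{align*}
\tilde L^{k} &= \arg\min_{L}\; \mathcal{L}_{\beta}(L, S^{k}, Z^{k}, \Lambda^{k})
               + \tfrac{1}{2}\,\|L - L^{k}\|_{R_{1}}^{2},\\
\tilde S^{k} &= \arg\min_{S}\; \mathcal{L}_{\beta}(L^{k}, S, Z^{k}, \Lambda^{k})
               + \tfrac{1}{2}\,\|S - S^{k}\|_{R_{2}}^{2},\\
\tilde Z^{k} &= \arg\min_{Z}\; \mathcal{L}_{\beta}(L^{k}, S^{k}, Z, \Lambda^{k})
               + \tfrac{1}{2}\,\|Z - Z^{k}\|_{R_{3}}^{2},\\
\tilde \Lambda^{k} &= \Lambda^{k} - \gamma\beta\,(\tilde L^{k} + \tilde S^{k} + \tilde Z^{k} - D).
\end{align*}
```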

Algorithm 4 (proximal fully parallel splitting method (PFPSM)).
Step 0. Given a tolerance, two constants, and an initial iterate, set $k=0$.
Step 1 (prediction step). Compute the temporary iterate by the prediction formulas (24).
Step 2 (convergence verification). If the stopping criterion (25) is satisfied, then stop.
Step 3 (correction step). Generate the new iterate by the correction formula (26), where the correction matrix is any symmetric positive definite matrix and the step size is defined by (27) together with (28). Set $k:=k+1$ and go to Step 1.

Remark 5. In PFPSM, we apply the classical proximal point algorithm (PPA) to regularize its subproblems. Compared with the method in [13], PFPSM extends the admissible range of the relaxation factor $\gamma$ from the single value 1 to an interval, and larger values of $\gamma$ are often preferred in practice; see Figure 2. Furthermore, compared with the method in [14], PFPSM removes the restriction imposed on the regularization matrices; see the corresponding condition in [14].

The following lemma provides a lower bound on the quantity defined in (28), that is, the numerator of the step size defined in (27), and this lower bound plays an important role in the proof of the global convergence of PFPSM.

Lemma 6. Let the temporary iterate be generated by PFPSM from the given iterate, and let the quantity in (28) be defined accordingly. Then the lower bound (29) holds, where the involved constant is determined by the algorithm parameters.

Proof. By the matrix form of a basic inequality, applied with suitable arguments, one has (31). It then follows from (23) and (28) that the claimed bound holds, where the first inequality comes from (31) and the second inequality follows from the choice of the algorithm parameters. The proof is completed.

Remark 7. It follows from (27) and (29) that the step size admits a positive lower bound involving the minimum eigenvalue of the positive definite matrix defined in (23). Therefore the sequence of step sizes generated by PFPSM is bounded away from zero. In particular, for a special choice of the regularization matrices, this lower bound takes an explicit form.

Now, we shall show that the stopping criterion (25) is reasonable.

Lemma 8. If the quantity in the stopping criterion (25) vanishes, then the corresponding iterate is a solution of VI.

Proof. Deriving the first-order optimality condition of the first subproblem in (24), we obtain an inequality for the temporary iterate. Using the updating formula for the multiplier in (24) and substituting it into this inequality, we get a relation that can be reduced to a variational inequality in the first block. Similarly, analogous inequalities hold for the other two blocks. Furthermore, the updating formula for the multiplier in (24) can be rewritten as a variational inequality as well. Adding the above four inequalities and rearranging, one obtains (41). Therefore, substituting the hypothesis of the lemma into (41), we conclude that the corresponding iterate satisfies the defining inequality of VI; that is, it is a solution of VI. This completes the proof.

If PFPSM terminates at Step 2, then the current iterate is an approximate solution of VI. Thus, it is assumed, without loss of generality, that PFPSM generates an infinite sequence of iterates. Now, we intend to prove that the correction direction used in (26) is a descent direction of the merit function $\tfrac{1}{2}\|W-W^{*}\|^{2}$ (measured in the norm induced by the matrix in (23)) at the current point, where $W^{*}\in\mathcal{W}^{*}$.

Lemma 9. Let the sequence of iterates and the sequence of temporary iterates be generated by PFPSM. Then, one has (44).

Proof. Since the solution set $\mathcal{W}^{*}$ is nonempty, it follows from (41), with a solution taken as the comparison point, that a first inequality holds. Thus, a chain of inequalities follows, where the second inequality follows from the property of the mapping $F$, and the second equality comes from (24). Therefore, we obtain (44), which completes the proof of the lemma.

From inequalities (29) and (44), one sees that the correction direction is indeed a descent direction of the merit function at the current iterate. Therefore, it is natural to generate the new iterate along this direction, as in the iterative scheme (26). Now, let us investigate how to choose a suitable step size to achieve as much progress as possible. Denote the new iterate obtained with an as yet undetermined step size accordingly, and let the profit function of the $k$th iteration, which measures the reduction of the merit function, be defined by (50). However, we cannot maximize the profit function directly because the solution $W^{*}$ is unknown. Motivated by the strategy in [13], we derive in the next lemma a tight lower bound of the profit function, which is independent of the unknown matrix $W^{*}$, in order to obtain a suboptimal step size.

Lemma 10. For any given iterate, let the temporary iterate be generated by (24), the quantity in (28) be defined accordingly, and the profit function be defined by (50). Then, one has the lower bound (51).

Proof. The proof is similar to that of the corresponding lemma in [13] and is omitted here.

It follows from Lemma 10 that the right-hand side of (51) is a lower bound of the profit function for any step size. Therefore, we can maximize this lower bound to accelerate the convergence of PFPSM. Note that the lower bound is a quadratic function of the step size, and it reaches its maximum at the value defined by (27). The following theorem shows that the sequence generated by PFPSM is Fejér monotone with respect to the solution set of VI.

Theorem 11. For any solution $W^{*}\in\mathcal{W}^{*}$, there exists a positive constant such that the sequence generated by PFPSM satisfies the contraction inequality (52), where the constant is given explicitly in terms of the algorithm parameters.

Proof. It follows from (50) and (51) that a chain of relations holds, where the second equality follows from the definition of the step size in (27) and the second inequality comes from (29) and (33). Assertion (52) is thus proved.

Remark 12. For very large-scale instances of (6), computing the step size defined by (27) at every iteration can be burdensome. For such instances, with a suitable fixed choice of the parameters in PFPSM, the step size can be fixed as the constant defined in Lemma 6, and the new iterate can be generated by the resulting simplified correction step. Then, referring to the proof of the above theorem, a corresponding contraction inequality still holds. Based on the Fejér monotonicity of the sequence generated by PFPSM, the global convergence of PFPSM can be proved by arguments similar to those in [13], which are omitted.

Theorem 13. The sequence generated by PFPSM converges to some solution of VI.

4. Numerical Results

In this section, we run a series of numerical experiments and apply the proposed PFPSM to solve some concrete instances of SPCP (6). We compare it with the splitting method for separable convex programming (denoted by HTY) in [30], the full Jacobian decomposition of the augmented Lagrangian method (denoted by FJDALM) in [13], and the fully parallel ADMM (denoted by PADMM) in [14]. Through these numerical experiments, we want to illustrate that suitable proximal terms can enhance PFPSM's numerical efficiency in practice, that larger values of the Glowinski relaxation factor $\gamma$ can often accelerate PFPSM's convergence speed, and that, compared with the other three ADMM-type methods, the dynamically updated step size defined in (27) can accelerate the convergence of PFPSM. All codes were written in MATLAB R2010a and run on a ThinkPad notebook with a Pentium(R) Dual-Core CPU and 4 GB of memory.

Now, let us first introduce two shrinkage operators, which are used to solve the subproblems of the four tested methods. Let $\rho>0$ be a constant, let $Y$ be a given matrix, and let $Y=U\Sigma V^{T}$ with $\Sigma=\operatorname{diag}(\sigma_{1},\sigma_{2},\dots)$ be the singular value decomposition (SVD) of $Y$. Then, the solution of the optimization problem
$$\min_{X}\ \rho\|X\|_{*}+\frac{1}{2}\|X-Y\|_{F}^{2}$$
is given by the singular value shrinkage operator $\mathcal{D}_{\rho}(Y)$, which is defined by [31]
$$\mathcal{D}_{\rho}(Y)=U\operatorname{diag}\bigl(\max\{\sigma_{i}-\rho,\,0\}\bigr)V^{T}.\tag{58}$$
The solution of the optimization problem
$$\min_{X}\ \rho\|X\|_{1}+\frac{1}{2}\|X-Y\|_{F}^{2}$$
is given by the soft shrinkage operator $\mathcal{S}_{\rho}(Y)$, which is defined componentwise by [32]
$$[\mathcal{S}_{\rho}(Y)]_{ij}=\operatorname{sign}(Y_{ij})\max\{|Y_{ij}|-\rho,\,0\},\tag{60}$$
where $\operatorname{sign}(\cdot)$ is the sign function.
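For concreteness, a small NumPy sketch of the two shrinkage operators (58) and (60) is given below; the function names `svt` and `soft_threshold` are our own labels, not notation from the paper.

```python
import numpy as np

def soft_threshold(Y, rho):
    """Entrywise soft shrinkage (60): sign(Y_ij) * max(|Y_ij| - rho, 0)."""
    return np.sign(Y) * np.maximum(np.abs(Y) - rho, 0.0)

def svt(Y, rho):
    """Singular value shrinkage (58): soft-threshold the singular values of Y."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ np.diag(np.maximum(s - rho, 0.0)) @ Vt

# Quick sanity checks on a random matrix: shrinkage never increases rank or support.
rng = np.random.default_rng(0)
Y = rng.standard_normal((6, 4))
print(np.linalg.matrix_rank(svt(Y, 1.0)) <= np.linalg.matrix_rank(Y))
print(np.count_nonzero(soft_threshold(Y, 0.5)) <= Y.size)
```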

In the following, let us discuss how to solve the three subproblems of PFPSM. In fact, these three subproblems are simple enough to have closed-form solutions, which are listed as follows. For simplicity, the proximal regularization matrices are taken as scaled identity matrices in PFPSM. Based on the two shrinkage operators (58) and (60), the closed-form solutions of the three subproblems in the prediction step of PFPSM can be summarized as follows (an illustrative implementation is sketched after this list):
(i) The subproblem in the low-rank block admits an explicit solution given by the singular value shrinkage operator (58).
(ii) The subproblem in the sparse block admits an explicit solution given by the soft shrinkage operator (60).
(iii) The subproblem in the auxiliary (noise) block is a strongly convex quadratic program, whose explicit solution is obtained componentwise.
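The following Python sketch illustrates how one parallel (Jacobi-type) prediction step could be assembled from the two shrinkage operators for the formulation (6). The penalty `beta`, trade-off `tau`, noise parameter `mu`, relaxation factor `gamma`, and the unregularized form of the subproblems are our own simplifying assumptions, so this shows the structure of the update rather than the exact PFPSM prediction step (which also involves the proximal terms and is followed by the correction step (26)). It assumes the helper functions `svt`, `soft_threshold`, and `proj_omega` from the earlier sketches are in scope.

```python
import numpy as np
# Assumes svt, soft_threshold, and proj_omega from the sketches above are available.

def jacobi_prediction(L, S, Z, Lam, D, omega, tau, mu, beta, gamma):
    """One illustrative Jacobi-type prediction step for
         min ||L||_* + tau*||S||_1 + 1/(2*mu)*||P_Omega(Z)||_F^2  s.t.  L + S + Z = D,
       using the augmented Lagrangian with penalty beta and a multiplier update scaled
       by the relaxation factor gamma.  Each block update reads only the old iterate,
       so the three updates can be carried out in parallel."""
    # Low-rank block: nuclear-norm proximal step -> singular value shrinkage (58).
    L_new = svt(D - S - Z + Lam / beta, 1.0 / beta)
    # Sparse block: l1-norm proximal step -> entrywise soft shrinkage (60).
    S_new = soft_threshold(D - L - Z + Lam / beta, tau / beta)
    # Noise block: strongly convex quadratic, solved componentwise in closed form.
    M = D - L - S + Lam / beta
    Z_new = M - proj_omega(M, omega) / (beta * mu + 1.0)
    # Multiplier update with relaxation factor gamma.
    Lam_new = Lam - gamma * beta * (L_new + S_new + Z_new - D)
    return L_new, S_new, Z_new, Lam_new
```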

4.1. Synthetic Data

In the experiment, following [5], we used the following synthetic data: the true low-rank matrix $L^{*}=AB^{T}$ is generated from matrices $A$ and $B$ with independent, normally distributed entries. Hence, the rank of $L^{*}$ equals the common column number of $A$ and $B$, and the low-rank ratio of $L^{*}$ is defined as this rank divided by the matrix dimension. The nonzero entries of the sparse matrix $S^{*}$ are generated uniformly within a symmetric interval, and its sparsity ratio is defined as the number of nonzero entries of $S^{*}$ divided by the total number of entries. Then, we form the matrix $L^{*}+S^{*}$. Here, we consider that only a fraction of its entries can be observed; the indices of the observed entries are collected in the index set $\Omega$, which is determined randomly with a certain sample ratio (defined, in percent, as the cardinality of $\Omega$ divided by the total number of entries). The observed entries are further corrupted by additive Gaussian noise with a fixed standard deviation, which finally yields the observed matrix $D$. We test scenarios specified by the matrix size, the low-rank ratio, the sparsity ratio, and the sample ratio. The trade-off parameter $\tau$ and the penalty parameter $\mu$ in (6) are set to fixed values for these tests.
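A sketch of this data-generation procedure in NumPy is given below. The concrete sizes, ratios, amplitude of the sparse entries, and noise level are placeholders chosen for illustration, not the exact values used in the experiments.

```python
import numpy as np

def make_spcp_instance(m=500, n=500, rank_ratio=0.05, sparsity_ratio=0.05,
                       sample_ratio=0.8, sigma=1e-3, amp=50.0, seed=0):
    """Generate a synthetic SPCP instance (illustrative parameter values only):
       observed matrix D = P_Omega(L* + S* + noise) with low-rank L*, sparse S*."""
    rng = np.random.default_rng(seed)
    r = int(rank_ratio * n)
    L_true = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # rank-r matrix
    S_true = np.zeros((m, n))
    mask = rng.random((m, n)) < sparsity_ratio
    S_true[mask] = rng.uniform(-amp, amp, size=mask.sum())               # sparse outliers
    omega = rng.random((m, n)) < sample_ratio                             # observed entries
    noise = sigma * rng.standard_normal((m, n))
    D = np.where(omega, L_true + S_true + noise, 0.0)                     # observed data
    return D, omega, L_true, S_true
```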

Throughout, we use the stopping criterion (65), and the same initial point is used for all tested methods. For the penalty parameter, we take a common value for all tested methods. The remaining parameters of the tested methods, HTY, FJDALM, PADMM, and PFPSM, are set individually for each method.

Let us first test the sensitivity of PFPSM with respect to the proximal parameter and the Glowinski relaxation factor $\gamma$, so that the numerical advantage of attaching these two parameters to the iterative scheme of PFPSM becomes clear. We vary the proximal parameter over 0 : 0.1 : 1 and fix the remaining parameters. The maximum number of iterations is set to 500. The numerical results are plotted in Figure 1, in which “RLR” denotes the relative error of the recovered low-rank matrix and “RSP” denotes the relative error of the recovered sparse matrix when the stopping criterion (65) is satisfied. Figure 1 shows that small values of the proximal parameter can improve the accuracy of the solutions, while large values can accelerate the convergence speed of PFPSM. Taking into consideration the numerical performance of the other compared methods, in the following we fix the proximal parameter at a suitable value. Now let us carry out the sensitivity analysis of the parameter $\gamma$. We vary $\gamma$ over 0.86 : 0.02 : 1.15 and fix the remaining parameters. The maximum number of iterations is also set to 500. The numerical results are plotted in Figure 2. The two subplots in Figure 2 indicate that the number of iterations decreases and the elapsed CPU time has a decreasing tendency as $\gamma$ increases, which verifies the numerical advantage of PFPSM with large values of $\gamma$. Based on this observation, we fix $\gamma$ at a relatively large value in the following experiments.

Now, we compare PFPSM with HTY, FJDALM, and PADMM to examine its numerical advantage in terms of the number of iterations (denoted by “Iter.”), the elapsed CPU time in seconds (denoted by “Time”), the relative error of the recovered low-rank matrix, and the relative error of the recovered sparse matrix when the stopping criterion (65) is satisfied. Some numerical results are listed in Tables 1 and 2, which report the average results over 20 trials for each setting of parameters.

The numerical results in Tables 1 and 2 indicate the following. (1) In terms of the relative errors of the recovered low-rank and sparse matrices, all the tested methods successfully recover the given matrix. (2) In terms of the “Iter.” and “Time” criteria, our new method PFPSM performs better than the other three tested methods; it is almost the fastest in all cases, and the reason may be that a suitable choice of its parameters can accelerate the convergence of PFPSM. (3) Compared with HTY, PFPSM is a fully parallel splitting method, for which only the maximum time taken by its three subproblems is recorded. Thus, we observe in Tables 1 and 2 that PFPSM takes less CPU time than HTY, although it needs to compute a more involved correction step at each iteration and/or takes more iterations to terminate than HTY. In fact, taking one test case as an example, we find that the correction step of PFPSM takes about 7.26% of its whole CPU time, while the computation of the recovered sparse matrix and the simple correction step of HTY take about 13.10% of its whole CPU time. In conclusion, the proposed PFPSM can be used as a surrogate for HTY, FJDALM, and PADMM.

In Figures 3 and 4, we display the evolution of the relative errors of the recovered low-rank and sparse matrices with respect to the iteration counter $k$, respectively, from which we can see that PFPSM performs better than FJDALM and PADMM and slightly worse than HTY in terms of accuracy. The reason may be that PFPSM, FJDALM, and PADMM are fully parallel methods, while HTY is a partially parallel method; therefore, HTY can make fuller use of the latest iteration information to speed up its convergence. The relationship is analogous to that between the Jacobi and Gauss–Seidel iteration methods for linear systems of equations. However, as illustrated in Table 1, the CPU time of PFPSM is almost always smaller than that of HTY, which is the main advantage of fully parallel methods over partially parallel or sequential methods.

4.2. Background Extraction from Surveillance Video with Missing and Noisy Data

In this subsection, we demonstrate the efficiency of the proposed method on a real-world problem: background extraction from surveillance video with missing and noisy data. In the experiment, following [6], we focus on a well-tested video taken in an airport, which consists of 50 grayscale frames. We vectorize all frames of the video and obtain a matrix in which each column represents a frame; this matrix is further contaminated with additive Gaussian noise of mean zero and a fixed standard deviation. Similar to the last experiment, only a fraction of the entries can be observed, and the indices of the observed entries are again collected in the index set $\Omega$. In the experiment, the initial iterate is set to zero, and the other parameters remain the same as those used in the last experiment for all the tested algorithms. The numerical results are given in Table 3, where Obj. represents the value of the objective function of (6) when the tested methods terminate.
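For illustration, stacking the video frames into a data matrix with one vectorized frame per column can be done as in the following NumPy sketch; the 144 x 176 frame resolution used here is a hypothetical placeholder, not a specification taken from the paper.

```python
import numpy as np

def frames_to_matrix(frames):
    """Stack 2-D grayscale frames into a matrix with one vectorized frame per column."""
    return np.stack([f.reshape(-1) for f in frames], axis=1)

# Illustration with 50 synthetic frames at a hypothetical 144 x 176 resolution.
rng = np.random.default_rng(0)
frames = [rng.random((144, 176)) for _ in range(50)]
D = frames_to_matrix(frames)      # shape (144*176, 50); each column is one frame
print(D.shape)
```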

Table 3 shows that, although PFPSM requires more iterations than HTY on all test instances, its performance in recovering the low-rank and sparse components is much better than that of the other two fully parallel methods. In Figure 5, we exhibit a frame of the tested video and the separation results with 30% random missing data. Clearly, all the tested methods can extract the background from the disturbed image quite well.

5. Conclusion

In this paper, we have proposed a new fully parallel splitting method for the stable principal component pursuit problem, which inherits the merits of both ADMM and parallel methods. Compared with other similar methods in the literature, the new method attaches a Glowinski relaxation factor $\gamma$ to the updating formula for the Lagrange multiplier, and large values of $\gamma$ are often beneficial to its convergence. Preliminary numerical results indicate that the new method is quite efficient.

To ensure the convergence of the proposed iterative method, the admissible range of the relaxation factor $\gamma$ is a proper subset of the interval $\bigl(0,\tfrac{1+\sqrt{5}}{2}\bigr)$, which is the range of the Glowinski relaxation factor for sequential ADMM-type methods applied to two-block separable convex programming [25, 33]. Therefore, it is worth extending the admissible range of $\gamma$ and studying its optimal choice.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

The first author proved the convergence results; the second and third authors carried out the numerical experiments. All authors read and approved the final manuscript.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (nos. 11671228 and 11601475), the Foundation of First Class Discipline of Zhejiang-A (Zhejiang University of Finance and Economics-Statistics), the Natural Science Foundation of Shandong Province (no. ZR2016AL05), and the Scientific Research Project of Shandong Universities (no. J15LI11).