Mathematical Problems in Engineering

Volume 2017, Article ID 9674528, 15 pages

https://doi.org/10.1155/2017/9674528

## A Proximal Fully Parallel Splitting Method for Stable Principal Component Pursuit

^{1}School of Mathematics and Statistics, Linyi University, Shandong 276005, China
^{2}School of Data Sciences, Zhejiang University of Finance and Economics, Zhejiang 310018, China
^{3}School of Mathematics and Statistics, Zaozhuang University, Shandong 277160, China
^{4}School of Management, Qufu Normal University, Shandong 276826, China

Correspondence should be addressed to Min Sun; ziyouxiaodou@163.com

Received 30 March 2017; Revised 8 June 2017; Accepted 29 August 2017; Published 25 October 2017

Academic Editor: Laurent Bako

Copyright © 2017 Hongchun Sun et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

As a special case of three-block separable convex programming, the stable principal component pursuit (SPCP) problem arises in many different disciplines, such as statistical learning, signal processing, and web data ranking. In this paper, we propose a proximal fully parallel splitting method (PFPSM) for solving SPCP, in which the resulting subproblems all admit closed-form solutions and can be solved in a distributed manner. Compared with other similar algorithms in the literature, PFPSM attaches a Glowinski relaxation factor to the updating formula for its Lagrange multiplier, which can be used to accelerate the convergence of the generated sequence. Under mild conditions, the global convergence of PFPSM is proved. Preliminary computational results show that the proposed algorithm works very well in practice.

#### 1. Introduction

Recovering the low-rank and sparse components of a given matrix is a fundamental task in many scientific fields, which can be modeled as the following two-block separable nonconvex and nonsmooth programming:

$$\min_{L,S}\ \operatorname{rank}(L)+\lambda\|S\|_0\quad\text{s.t.}\quad L+S=M,\tag{1}$$

where $M\in\mathbb{R}^{m\times n}$ is a given matrix, $\operatorname{rank}(L)$ is the rank of the matrix $L$, $\|S\|_0$ denotes the $\ell_0$-norm of $S$, which is defined as the number of nonzero entries of $S$, and $\lambda>0$ is a trade-off parameter balancing low rank and sparsity. Recent studies [1–3] indicate that the NP-hard task (1) can be accurately accomplished by solving one of its convex relaxation problems: the principal component pursuit (PCP) problem, in which the rank and the $\ell_0$-norm are replaced by their tightest convex relaxations, that is, the nuclear norm $\|L\|_*$ and the $\ell_1$-norm $\|S\|_1$, respectively. The PCP problem can be mathematically modeled as the following optimization problem:

$$\min_{L,S}\ \|L\|_*+\lambda\|S\|_1\quad\text{s.t.}\quad L+S=M,\tag{2}$$

where the nuclear norm $\|L\|_*$ is the sum of all singular values of $L$, which induces the low-rank component of the given matrix $M$, and the $\ell_1$-norm $\|S\|_1$ is the sum of the absolute values of all entries of $S$, which induces the sparse component of the given matrix $M$.

To capture even more applications, Zhou et al. [4] considered the case in which the observation matrix is corrupted by both small entry-wise noise and gross sparse errors. A more general case was then studied by Tao and Yuan [5], wherein only a fraction of the entries of the given matrix can be observed, and they suggested recovering the low-rank and sparse components of $M$ by solving the following stable principal component pursuit (SPCP) problem:

$$\min_{L,S}\ \|L\|_*+\lambda\|S\|_1\quad\text{s.t.}\quad \|P_\Omega(M-L-S)\|_F\le\delta,\tag{3}$$

where $\delta>0$ is related to the impulsive or Gaussian noise level, $\|\cdot\|_F$ denotes the Frobenius norm, $\Omega$ is the index set of the observable entries of $M$, and $P_\Omega$ is the projection operator defined by

$$[P_\Omega(X)]_{ij}=\begin{cases}X_{ij},&(i,j)\in\Omega,\\0,&\text{otherwise}.\end{cases}\tag{4}$$

In [6], Hou et al. transformed (3) into the following unconstrained problem:

$$\min_{L,S}\ \|L\|_*+\lambda\|S\|_1+\frac{\mu}{2}\|P_\Omega(M-L-S)\|_F^2,\tag{5}$$

where $\mu>0$ is a penalty parameter. Problems (3) and (5) are both convex optimization problems, which can be equivalently reformulated as semidefinite programming (SDP) problems and then solved in polynomial time by state-of-the-art SDP solvers such as SeDuMi [7] and SDPT3 [8]. However, these solvers are not applicable to high-dimensional instances of (3) and (5), because a linear system needs to be solved (in)exactly at each iteration, which can be costly and require a very large amount of memory in actual implementations [9].

By introducing an auxiliary variable $Z=M-L-S$, the model (5) can be rewritten in the following equivalent form:

$$\min_{L,S,Z}\ \|L\|_*+\lambda\|S\|_1+\frac{\mu}{2}\|P_\Omega(Z)\|_F^2\quad\text{s.t.}\quad L+S+Z=M,\tag{6}$$

which decouples the quadratic term of (5). Recently, there has been an intensive study of iteration methods for solving (6) [5, 6, 10–12]. Interestingly, these methods usually exploit only the first-order information of the objective function of (6), and most of them are based on the well-known *alternating direction method of multipliers* (ADMM), such as the sequential ADMM [11], the partially parallel ADMMs [5, 6, 10, 12], and the fully parallel ADMMs [13, 14]. As an influential first-order iteration method in optimization, ADMM was independently proposed by Glowinski and Marrocco [15] and Gabay and Mercier [16] about forty years ago, and due to its high efficiency in numerical simulation, ADMM has received considerable attention in recent years. ADMM can be viewed as an application of the Douglas-Rachford splitting method to the dual of the two-block separable convex programming (2) [17], as a special case of the proximal point method for general convex programming [18], or as a generalization of the classical Uzawa method for solving saddle-point problems [19]. Chen et al. [20] showed by a simple counterexample that the sequential ADMM cannot be directly extended to three-block separable convex programming, and similarly He et al. [13] showed that the fully parallel ADMM may diverge even for two-block separable convex programming. There are three strategies to deal with this issue: the substitution procedure [13, 21], the technique of regularization [14, 22], and the technique of small step sizes [23]. For other recent developments of ADMM, including the (sub)linear convergence rate, various acceleration techniques, indefinite regularized matrices, larger step sizes, and its generalization to nonconvex and nonsmooth multiblock separable programming, we refer to [24–28].

In this paper, we focus our attention on the fully parallel ADMM, which is based on the substitution procedure and is more suitable for distributed computation. Following the full Jacobian decomposition of the augmented Lagrangian method (FJDALM) in [13], we shall propose a new fully parallel ADMM, termed the proximal fully parallel splitting method (PFPSM) in the following analysis. Compared to other similar methods in the literature, the new iteration method has three properties. The first is that all of its subproblems are regularized by proximal terms, which can enhance its numerical efficiency in practice. The second is that a Glowinski relaxation factor is attached to the updating formula for its Lagrange multiplier, and larger values of this factor can often accelerate the convergence of the proposed method (as reported in Section 4). To the best of our knowledge, this is the first fully parallel ADMM-type method with the Glowinski relaxation factor for solving problem (6). The third is that it dynamically updates its step size.
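The displayed formulas of the original article did not survive extraction; with the conventional notation of a multiplier $\Lambda$ and penalty parameter $\beta$ (both notational assumptions here), a Glowinski-relaxed multiplier update for the constraint $L+S+Z=M$ typically takes the form:

```latex
\Lambda^{k+1} \;=\; \Lambda^{k} \;-\; \gamma\beta\,\bigl(L^{k+1}+S^{k+1}+Z^{k+1}-M\bigr),
\qquad \gamma \in \Bigl(0,\ \tfrac{1+\sqrt{5}}{2}\Bigr),
```

where the interval for $\gamma$ is the classical Glowinski bound. With $\gamma=1$ this reduces to the standard multiplier update; values of $\gamma$ larger than 1 overrelax the dual step, which is the acceleration effect described above.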

The remainder of this paper is organized as follows. In Section 2, we present some necessary notations and useful lemmas. In Section 3, we propose the proximal fully parallel splitting method for solving problem (6) and prove its global convergence step by step. Some numerical results for the proposed method are shown and discussed in Section 4. Finally, in Section 5, some concluding remarks are drawn and several future research directions are also discussed.

#### 2. Preliminaries

In this section, we briefly introduce some necessary notations, definitions, and lemmas.

For any matrix $X\in\mathbb{R}^{m\times n}$, let $\|X\|_F$ denote the Frobenius norm of $X$. For any two matrices $X,Y\in\mathbb{R}^{m\times n}$, $\langle X,Y\rangle$ denotes their Frobenius inner product. Then, we have $\|X\|_F^2=\langle X,X\rangle$. If $G$ is a symmetric positive definite matrix, we denote by $\|X\|_G=\sqrt{\langle X,GX\rangle}$ the $G$-norm of the matrix $X$. The effective domain of a function $f$ is defined as $\operatorname{dom}f=\{x\mid f(x)<+\infty\}$. The set of all relative interior points of a given nonempty convex set $\mathcal{C}$ is denoted by $\operatorname{ri}(\mathcal{C})$. $\operatorname{diag}(G_1,G_2,G_3)$ denotes the block diagonal matrix that has $G_i$ as its $i$th block on the diagonal.

A function $f$ is convex if and only if

$$f(\alpha x+(1-\alpha)y)\le\alpha f(x)+(1-\alpha)f(y)\quad\forall x,y\in\operatorname{dom}f,\ \alpha\in[0,1].\tag{7}$$

Then, for a convex function $f$, we have the following basic inequality:

$$f(y)\ge f(x)+\langle\xi,\,y-x\rangle\quad\forall\xi\in\partial f(x),\tag{8}$$

where $\partial f(x)$ denotes the subdifferential of $f$ at the point $x$. The above definition and property (8) can be easily extended to functions with matrix arguments.

For the convenience of our analysis, let us relabel $\|L\|_*$ as $\theta_1(L)$, $\lambda\|S\|_1$ as $\theta_2(S)$, and $\frac{\mu}{2}\|P_\Omega(Z)\|_F^2$ as $\theta_3(Z)$, and denote $\theta(L,S,Z)=\theta_1(L)+\theta_2(S)+\theta_3(Z)$. Then $\theta_1$, $\theta_2$, and $\theta_3$ are all convex functions.

We make the following standard assumption about problem (6) in this paper.

*Assumption 1. *The generalized Slater's condition holds; that is, there is a point in the relative interior of the constraint set of problem (6).

Under Assumption 1, it follows from the corresponding theorems of [29] that a point is an optimal solution of problem (6) if and only if there exists a Lagrange multiplier matrix such that the resulting pair solves the KKT system (10). The following lemma is the basis for reformulating (10) as a mixed variational inequality problem.

Lemma 2. *For any two matrices , the relationship is equivalent to the following mixed variational inequality problem:*

*Proof. *From the first relation, one has an inclusion which, together with the convexity of the function and the subgradient inequality (8), implies the desired inequality; that is, the mixed variational inequality holds. Conversely, from the mixed variational inequality, one has an inequality which implies the inclusion. From this and the corresponding theorem of [29], the first relation follows. The proof is completed.

*Remark 3. *Based on (10) and Lemma 2, a matrix triple is an optimal solution to problem (6) if and only if there exists a multiplier matrix such that (17) holds. Moreover, any multiplier satisfying (17) is an optimal solution to the dual of problem (6).

Furthermore, we collect the primal variables and the multiplier into a single matrix variable $w$. Obviously, (17) can then be written as the following mixed variational inequality problem, abbreviated as VI: find $w^*$ such that the corresponding inequality holds for all $w$. The set of solutions of VI, denoted by $\mathcal{W}^*$, is nonempty by Assumption 1 and Remark 3. Because the underlying mapping $F$ is a linear mapping and its coefficient matrix is skew-symmetric, it satisfies the nice property $\langle w_1-w_2,\,F(w_1)-F(w_2)\rangle=0$ for any $w_1,w_2$.

#### 3. The Proximal Fully Parallel Splitting Method

In this section, we shall present the proximal fully parallel splitting method for solving (6) and discuss its convergence results.

Firstly, the augmented Lagrangian function of the convex programming (6) can be written as

$$\mathcal{L}_\beta(L,S,Z,\Lambda)=\|L\|_*+\lambda\|S\|_1+\frac{\mu}{2}\|P_\Omega(Z)\|_F^2-\langle\Lambda,\,L+S+Z-M\rangle+\frac{\beta}{2}\|L+S+Z-M\|_F^2,$$

where $\beta>0$ is the penalty parameter and $\Lambda$ is the Lagrange multiplier for the linear constraint of (6). To simplify our notation in the later analysis, we define a block diagonal matrix whose diagonal blocks are built from some symmetric positive semidefinite regularization matrices and a predefined parameter. Note that this block diagonal matrix is symmetric positive definite.

*Algorithm 4 (proximal fully parallel splitting method (termed PFPSM)).*

*Step 0.* Given the tolerance, two constants, and the initial iterate, set $k=0$.

*Step 1* (prediction step). Compute the temporary iterate $\tilde w^k$ by (24).

*Step 2* (convergence verification). If the stopping criterion (25) is satisfied, then stop.

*Step 3* (correction step). Generate the new iterate $w^{k+1}$ by (26), where the correction matrix is any symmetric positive definite matrix and the step size $\alpha_k$ is defined by (27) with (28). Set $k:=k+1$ and go to Step 1.

*Remark 5. *In PFPSM, we apply the classical proximal point algorithm (PPA) to regularize its subproblems. Compared to the iteration method in [13], the new iteration method PFPSM extends the scope of the Glowinski relaxation factor from the fixed value 1 to a larger interval, and larger values of this factor are often preferred in practice; see Figure 2. Furthermore, compared to the iteration method in [14], PFPSM removes the restriction imposed on the regularized matrices; see the corresponding condition in [14].

The following lemma provides a lower bound on the numerator of the step size $\alpha_k$, and this lower bound plays an important role in the proof of the global convergence of PFPSM.

Lemma 6. *Let $\tilde w^k$ be the matrix generated by PFPSM from the given $w^k$, with the involved quantity defined by (28). Then the numerator of the step size admits a positive lower bound depending only on the algorithm parameters.*

*Proof. *By the matrix form of the elementary inequality, with suitable choices of its two arguments, one has (31). It then follows from (23) and (28) that the desired bound holds, where the first inequality comes from (31) and the second follows from the restrictions on the parameters. The proof is completed.

*Remark 7. *It follows from (27) and (29) that the step size $\alpha_k$ is bounded below by a positive constant involving the minimum eigenvalue of the positive definite correction matrix. Therefore the step size sequence generated by PFPSM is bounded away from zero. In particular, if the correction matrix is chosen as the identity, the lower bound simplifies accordingly.

Now, we shall show that the stopping criterion (25) is reasonable.

Lemma 8. *If $w^k=\tilde w^k$, then the matrix $\tilde w^k$ is a solution of VI.*

*Proof. *Deriving the first-order optimality condition of the first subproblem in (24), we obtain an inequality. From the updating formula for the multiplier in (24), and substituting it into the above inequality, we get a form that can be further reduced. Similarly, we obtain the analogous reduced inequalities for the other two variables. Furthermore, the updating formula for the multiplier in (24) can be rewritten as a variational inequality problem. Then, adding the above four inequalities and rearranging, one has (41). Therefore, substituting the hypothesis of the lemma into (41), we get an inequality which indicates that $\tilde w^k$ is a solution of VI. This completes the proof.

If the iteration method PFPSM terminates at Step 2, then the iterate $\tilde w^k$ is an approximate solution of VI. Thus, it is assumed, without loss of generality, that PFPSM generates an infinite sequence. Now, we intend to prove that the correction direction is a descent direction of the merit function at the current point, where the merit function measures the distance to an arbitrary solution of VI.

Lemma 9. *Let $\{w^k\}$ and $\{\tilde w^k\}$ be the two sequences generated by PFPSM. Then, one has the inequality (44).*

*Proof. *Since $w^*$ solves VI, it follows from (41) that the first inequality below holds. Thus, the claimed bound follows, where the second inequality follows from the skew-symmetry property of the mapping, and the second equality comes from (24). Therefore, we have (44), which completes the proof of the lemma.

From inequalities (29) and (44), one deduces that the correction direction is a descent direction of the merit function at $w^k$. Therefore, it is natural to generate the new iterate along this direction as in the iterative scheme (26). Now, let us investigate how to choose a suitable step size to achieve as much progress as possible. Denote the new iterate with an undetermined step size $\alpha$ accordingly; then one has (49). Then, let (50) be the profit function of the $k$th iteration. However, we cannot maximize it directly because the solution $w^*$ is unknown. Motivated by the strategy in [13], we derive a tight lower bound of the profit function in the next lemma, which is independent of the unknown matrix $w^*$, to obtain a suboptimal step size.

Lemma 10. *For any given matrix $w^k$, let $\tilde w^k$ be the matrix generated by (24), with the quantity defined by (28) and the profit function defined by (50). Then, one has the lower bound (51).*

*Proof. *The proof is similar to that of the corresponding lemma in [13] and is omitted here.

It follows from Lemma 10 that (51) gives a lower bound of the profit function for any $\alpha$. Therefore, we can maximize this lower bound to accelerate the convergence of PFPSM. Note that the lower bound is a quadratic function of $\alpha$, and it reaches its maximum at the value $\alpha_k$ defined by (27). The following theorem shows that the sequence generated by PFPSM is Fejér monotone with respect to the solution set of VI.
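The displayed formulas are lost here, so the following is the generic calculation the text describes, written with hypothetical coefficients $a_k,b_k$ standing in for the paper's elided quantities: if the lower bound of the profit function is a concave quadratic in $\alpha$,

```latex
q(\alpha) \;=\; \alpha\, a_k \;-\; \frac{\alpha^{2}}{2}\, b_k ,
\qquad a_k,\, b_k > 0,
```

then setting $q'(\alpha)=a_k-\alpha b_k=0$ shows that $q$ is maximized at $\alpha_k=a_k/b_k$, which is the closed-form flavor of the step size (27); a lower bound on the numerator, as in Lemma 6, then keeps $\alpha_k$ bounded away from zero.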

Theorem 11. *For any solution $w^*$ of VI, there exists a positive constant such that the sequence generated by PFPSM satisfies the contraction inequality (52), where the constant is given by (53).*

*Proof. *It follows from (50) and (51) that the claimed contraction holds, where the second equality follows from the definition of $\alpha_k$ in (27), and the second inequality comes from (29) and (33). Assertion (52) is thus proved.

*Remark 12. *For some very large-scale instances of (6), computing the step size defined by (27) is a tedious task requiring substantial computation. For these instances, the step size can instead be fixed at the constant lower bound given in Lemma 6, and the new iterate can be generated by the corresponding simplified correction scheme. Then, referring to the proof of the above theorem, we obtain the analogous contraction inequality. Based on the Fejér monotonicity of the sequence generated by PFPSM, the global convergence of PFPSM can be proved by arguments similar to those in [13], which are omitted.

Theorem 13. *The sequence generated by PFPSM converges to some solution of VI.*

#### 4. Numerical Results

In this section, we run a series of numerical experiments, applying the proposed PFPSM to some concrete examples of SPCP (6). We compare it with the splitting method for separable convex programming (denoted HTY) in [30], the full Jacobian decomposition of the augmented Lagrangian method (denoted FJDALM) in [13], and the fully parallel ADMM (denoted PADMM) in [14]. Through these numerical experiments, we want to illustrate that suitable proximal terms can enhance PFPSM's numerical efficiency in practice, that larger values of the Glowinski relaxation factor can often accelerate PFPSM's convergence speed, and that, compared with the other three ADMM-type methods, the dynamically updated step size defined in (27) can accelerate the convergence of PFPSM. All codes were written in Matlab R2010a and run on a ThinkPad notebook with a Pentium(R) Dual-Core CPU T4400@2.2 GHz and 4 GB of memory.

Now, let us first introduce two shrinkage operators, which are used to solve the subproblems of the four tested methods. Let $\tau>0$ be a constant, let $Y$ be a given matrix, and let $Y=U\Sigma V^\top$ be the singular value decomposition (SVD) of $Y$. Then, the solution of the optimization problem

$$\min_X\ \tau\|X\|_*+\frac12\|X-Y\|_F^2$$

is given by the operator $\mathcal{D}_\tau(Y)$, which is defined by [31]

$$\mathcal{D}_\tau(Y)=U\,\operatorname{diag}\bigl(\max\{\sigma_i-\tau,0\}\bigr)V^\top.$$

The solution of the optimization problem

$$\min_X\ \tau\|X\|_1+\frac12\|X-Y\|_F^2$$

is given by the operator $\mathcal{S}_\tau(Y)$, which is defined componentwise by [32]

$$[\mathcal{S}_\tau(Y)]_{ij}=\operatorname{sign}(Y_{ij})\max\{|Y_{ij}|-\tau,0\},$$

where $\operatorname{sign}(\cdot)$ is the sign function.
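As a quick sanity check, the two shrinkage operators are easy to implement; the following NumPy sketch (the function names are ours) mirrors the closed-form definitions above.

```python
import numpy as np

def svt(Y, tau):
    """Singular value thresholding D_tau(Y): shrink the singular values of Y by tau.
    Closed-form solution of  min_X tau*||X||_* + (1/2)*||X - Y||_F^2."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def soft_threshold(Y, tau):
    """Entrywise soft thresholding S_tau(Y) = sign(Y) * max(|Y| - tau, 0).
    Closed-form solution of  min_X tau*||X||_1 + (1/2)*||X - Y||_F^2."""
    return np.sign(Y) * np.maximum(np.abs(Y) - tau, 0.0)
```

For example, under `tau = 1`, `soft_threshold` maps an entry 3.0 to 2.0 and zeroes every entry with magnitude below 1, which is exactly how the $\ell_1$ term induces sparsity; `svt` does the same to the singular values, which is how the nuclear norm induces low rank.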

In the following, let us discuss how to solve the three subproblems of PFPSM. In fact, these three subproblems are simple enough to have closed-form solutions, which are listed as follows. For simplicity, we choose the proximal matrices as scaled identities in PFPSM. Based on the two shrinkage operators (58) and (60), the closed-form solutions of the three subproblems in the prediction step of PFPSM are summarized as follows:

(i) the subproblem in the low-rank variable is solved explicitly by the singular value thresholding operator;

(ii) the subproblem in the sparse variable is solved explicitly by the entrywise soft-thresholding operator;

(iii) the subproblem in the auxiliary variable is an unconstrained convex quadratic program, so its explicit solution follows directly from its first-order optimality condition.
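The displayed update formulas did not survive extraction, so the following NumPy sketch should be read as a schematic stand-in rather than the authors' exact scheme: it performs one fully parallel (Jacobian-type) prediction step for model (6) under standard augmented-Lagrangian notation, with the penalty `beta`, multiplier `Lam`, and relaxation factor `gamma` all being our assumptions, and without PFPSM's proximal terms.

```python
import numpy as np

def svt(Y, tau):
    # singular value thresholding (nuclear-norm shrinkage)
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def soft_threshold(Y, tau):
    # entrywise l1 shrinkage
    return np.sign(Y) * np.maximum(np.abs(Y) - tau, 0.0)

def parallel_prediction_step(L, S, Z, Lam, M, mask, lam, mu, beta, gamma):
    """One Jacobian-type prediction step for
         min ||L||_* + lam*||S||_1 + (mu/2)*||P_Omega(Z)||_F^2  s.t. L+S+Z = M.
    All three subproblems use only the *old* iterates, so they can run in parallel."""
    G = M + Lam / beta                          # shifted target shared by all blocks
    L_new = svt(G - S - Z, 1.0 / beta)          # nuclear-norm shrinkage
    S_new = soft_threshold(G - L - Z, lam / beta)
    # Z-subproblem is quadratic and separable over entries; mask marks Omega
    Z_new = np.where(mask, beta / (mu + beta) * (G - L - S), G - L - S)
    # multiplier update with Glowinski relaxation factor gamma
    Lam_new = Lam - gamma * beta * (L_new + S_new + Z_new - M)
    return L_new, S_new, Z_new, Lam_new
```

In PFPSM proper, the temporary iterate produced by such a step is then corrected along the direction $w^k-\tilde w^k$ with the dynamically chosen step size, which is what distinguishes it from a plain Jacobian ADMM.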

##### 4.1. Synthetic Data

In the experiment, following [5], we used the following synthetic data: the true low-rank matrix $L^*$ is generated as the product of two normally distributed factor matrices; hence the rank of $L^*$ equals the common inner dimension of the factors, and the low-rank ratio is defined as this rank divided by the matrix dimension. The nonzero entries of the true sparse matrix $S^*$ are generated uniformly, and its sparsity ratio is defined as the number of its nonzero entries divided by the total number of entries. Then, we form the matrix $L^*+S^*$. Here, we consider that only a fraction of its entries can be observed; the indices of the observed entries are collected in the index set $\Omega$, which is determined randomly with a certain sample ratio (the ratio of the cardinality of $\Omega$ to the total number of entries, denoted in percent). The observed entries are further corrupted by adding Gaussian noise with a prescribed standard deviation, and we finally obtain the observed matrix $M$. We test four scenarios with different combinations of these ratios, and the parameters $\lambda$ and $\mu$ in (6) are set correspondingly.
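The concrete sizes, intervals, and noise levels were lost in extraction, so the generator below is only a sketch with illustrative values: the function name, the interval for the sparse entries, and all defaults are our assumptions, while the overall recipe (normally distributed low-rank factors, uniformly distributed sparse outliers, random partial observation, additive Gaussian noise) follows the description above.

```python
import numpy as np

def make_spcp_data(m, n, rank, spr, sr, sigma, rng=None):
    """Generate a synthetic SPCP instance.

    m, n  : matrix size              rank : rank of the true low-rank part
    spr   : sparsity ratio of S*     sr   : fraction of observed entries
    sigma : std. dev. of the additive Gaussian noise
    """
    rng = np.random.default_rng(rng)
    # low-rank part: product of two normally distributed factor matrices
    L_true = rng.standard_normal((m, rank)) @ rng.standard_normal((rank, n))
    # sparse part: spr*m*n nonzeros, uniform in an (assumed) interval
    S_true = np.zeros((m, n))
    k = int(spr * m * n)
    idx = rng.choice(m * n, size=k, replace=False)
    S_true.flat[idx] = rng.uniform(-10.0, 10.0, size=k)
    # observe a random fraction sr of the entries, then add Gaussian noise
    mask = rng.random((m, n)) < sr
    M = np.where(mask, L_true + S_true + sigma * rng.standard_normal((m, n)), 0.0)
    return M, mask, L_true, S_true
```

By construction the returned `M` is zero outside the observed index set, the true low-rank factor has the requested rank, and the sparse part has exactly `int(spr*m*n)` nonzeros, matching the ratios defined in the text.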

Throughout, we use the stopping criterion (65), and the same initial point and penalty parameter value are used for all tested methods. The remaining parameters of HTY, FJDALM, PADMM, and PFPSM are set individually for each method.

Let us first test the sensitivity of PFPSM with respect to the proximal parameter and the Glowinski relaxation factor, so that the numerical advantage of attaching these two parameters to the iterative scheme of PFPSM becomes clear. We vary the proximal parameter over the grid 0 : 0.1 : 1 with the other parameters fixed. The maximum number of iterations is set to 500. The numerical results are plotted in Figure 1, in which "RLR" denotes the relative error of the recovered low-rank matrix and "RSP" denotes the relative error of the recovered sparse matrix when the stopping criterion (65) is satisfied. Figure 1 shows that small values of the proximal parameter can improve the accuracy of the solutions, while large values can accelerate the convergence speed of PFPSM. Taking into consideration the numerical performance of the other compared methods, in the following we fix the proximal parameter. Now let us analyze the sensitivity of the relaxation factor. We vary it over the grid 0.86 : 0.02 : 1.15 with the other parameters fixed. The maximum number of iterations is again set to 500. The numerical results are plotted in Figure 2. The two subplots in Figure 2 indicate that the number of iterations decreases and the elapsed CPU time has a decreasing tendency as the relaxation factor increases, which verifies the numerical advantage of PFPSM with large values of this factor. Based on this observation, we fix the relaxation factor in the following experiments.