Abstract

Proximal-based parallel decomposition methods were recently proposed to solve structured convex optimization problems. These algorithms are amenable to parallel computation and can be used efficiently for solving large-scale separable problems. In this paper, we show that, compared with the previous theoretical results, the range of the involved parameters can be enlarged while convergence can still be established. Preliminary numerical tests on the stable principal component pursuit problem testify to the advantages of the enlargement.

1. Introduction

Consider the constrained convex optimization problem with a separable objective function in the following form: where , , , , and , are two nonempty, closed, and convex sets and , are convex functions. Problems of this type arise in a number of fields such as signal processing, compressed sensing, machine learning, and semidefinite programming (see, e.g., [1–7] and references cited therein).
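The separable structure in question is the standard two-block form; written with the notation θ₁, θ₂, A, B, b, 𝒳, 𝒴 (assumed here for illustration rather than quoted from the original display), it reads
\[
\min_{x,y}\ \theta_{1}(x)+\theta_{2}(y)
\quad\text{s.t.}\quad Ax+By=b,\ \ x\in\mathcal{X},\ \ y\in\mathcal{Y},
\]
where A and B are given matrices and b is a given vector of compatible dimensions.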

To solve (1), the classical alternating direction method (ADM) generates the new iterate via the following scheme: where is the Lagrange multiplier associated with the linear constraint and is a penalty parameter for the violation of the linear constraint.
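In the same assumed notation, writing λ^k for the multiplier and β > 0 for the penalty parameter, the classical ADM recursion is usually stated as
\[
\begin{aligned}
x^{k+1}&=\arg\min_{x\in\mathcal{X}}\Big\{\theta_{1}(x)+\tfrac{\beta}{2}\big\|Ax+By^{k}-b-\lambda^{k}/\beta\big\|^{2}\Big\},\\
y^{k+1}&=\arg\min_{y\in\mathcal{Y}}\Big\{\theta_{2}(y)+\tfrac{\beta}{2}\big\|Ax^{k+1}+By-b-\lambda^{k}/\beta\big\|^{2}\Big\},\\
\lambda^{k+1}&=\lambda^{k}-\beta\big(Ax^{k+1}+By^{k+1}-b\big),
\end{aligned}
\]
which makes the sequential (Gauss-Seidel) nature of the two subproblems explicit.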

At each iteration, ADM essentially splits the subproblem of the augmented Lagrangian method into two subproblems in a Gauss-Seidel fashion. The subproblems are solved in consecutive order, which makes it possible for ADM to exploit the individual structures of and . The decomposed subproblems in (2) are often easy when and in (1) are both identity matrices and the resolvent operators of and have closed-form solutions or can be solved efficiently to high precision. Here, the resolvent operator of a function (say, ) is defined by where and . However, in some cases neither nor is an identity matrix; the two subproblems in ADM (2) are then difficult to solve because the evaluation of the following type of minimization could be costly, where is a given nonidentity matrix, for example, or .
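For concreteness, and still in assumed notation, the resolvent (proximal) operator of a convex function θ with parameter r > 0 is
\[
\operatorname{prox}_{\theta/r}(a)=\arg\min_{u}\Big\{\theta(u)+\tfrac{r}{2}\,\|u-a\|^{2}\Big\},
\]
whereas the costly variant referred to above replaces the term \(\|u-a\|^{2}\) by \(\|Mu-a\|^{2}\) for a nonidentity matrix M (for example, M = A or M = B), which in general destroys the closed-form solvability.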

For the purpose of parallel and easy computing, the first parallel decomposition method [8] (abbreviated as FPDM) generates the new iterate as follows: where the parameters , are required to satisfy and . Here, denotes the largest eigenvalue of the matrix . It is easy to verify that the proximal-based decomposition method proposed in [9] is a special case of the FPDM.
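As a sketch only (the exact formulas in [8] may differ in details), a linearized, fully parallel update consistent with the description above, in which both subproblems use only the previous iterate (x^k, y^k, λ^k), reads
\[
\begin{aligned}
x^{k+1}&=\arg\min_{x\in\mathcal{X}}\Big\{\theta_{1}(x)+\tfrac{r}{2}\big\|x-x^{k}+\tfrac{1}{r}A^{\top}\big(\beta(Ax^{k}+By^{k}-b)-\lambda^{k}\big)\big\|^{2}\Big\},\\
y^{k+1}&=\arg\min_{y\in\mathcal{Y}}\Big\{\theta_{2}(y)+\tfrac{s}{2}\big\|y-y^{k}+\tfrac{1}{s}B^{\top}\big(\beta(Ax^{k}+By^{k}-b)-\lambda^{k}\big)\big\|^{2}\Big\},\\
\lambda^{k+1}&=\lambda^{k}-\beta\big(Ax^{k+1}+By^{k+1}-b\big),
\end{aligned}
\]
where r, s > 0 are the proximal parameters restricted by the conditions mentioned above.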

When (4) is easy to evaluate for and , the second parallel decomposition method [8] (abbreviated as SPDM) can be used, which generates the new iterate as follows: where the parameters , are required to satisfy and .

Note that the subproblems in FPDM and SPDM can be processed in parallel because the first subproblem involving is independent of the second subproblem involving . Thus, FPDM and SPDM are suitable for solving large-scale distributed machine learning and big-data-related optimization problems.

ADM was first described in [10] and is closely related to many other algorithms, such as augmented Lagrangian methods, the proximal point algorithm [11], and split Bregman methods [12]. Recently, the convergence of ADM has been analyzed under certain assumptions (see, e.g., [13–16]), and the direct extension of ADM to multiblock convex minimization problems has been shown to be not necessarily convergent [17].

In this paper, we study the proximal-based parallel decomposition methods from the perspective of variational inequalities. We show that the required ranges of the parameters , , and can be significantly enlarged. Our contributions are as follows.
(i) For the FPDM, we show that the requirements on the step sizes , , and can be uniformly relaxed by
(ii) For the SPDM, we show that the requirements on the step sizes , , and can be uniformly relaxed by
(iii) We provide a new application example in machine learning, namely, the stable principal component pursuit problem. Preliminary numerical experiments testify to the advantages of the enlargement.

The rest of this paper is organized as follows. In Section 2, we derive a variational reformulation of (1) and summarize some preliminaries on variational inequalities. In Section 3, we present our main theoretical results and analyze the convergence of the resulting methods. We report some numerical results in Section 4 and draw some conclusions in Section 5.

2. Preliminaries

2.1. Variational Inequality Characterization

In this section, we derive a variational reformulation of (1) which will be used in subsequent analysis.

Since the functions and are both assumed to be convex, by invoking the first-order optimality conditions for (1), we can easily verify that solving (1) amounts to finding a solution of the following variational inequality (VI): with where
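In the standard notation for this reformulation (assumed here for illustration), one sets w = (x, y, λ), Ω = 𝒳 × 𝒴 × ℝ^m, and
\[
F(w)=\begin{pmatrix}-A^{\top}\lambda\\ -B^{\top}\lambda\\ Ax+By-b\end{pmatrix},
\]
so that the VI consists of finding
\[
w^{*}\in\Omega\quad\text{such that}\quad (w-w^{*})^{\top}F(w^{*})\ge 0\ \ \text{for all } w\in\Omega .
\]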

Problem (9) is referred to as a structured variational inequality (SVI) and has been studied extensively in both theory and applications. Recently, He et al. [18, 19] proposed a unified framework of proximal-like contraction methods for monotone VIs. They also established the convergence rate of projection and contraction methods for VIs with Lipschitz continuous monotone operators [20]. Xu et al. [21] proposed two classes of correction methods for SVIs in which the mapping does not have an explicit form. Yuan and Li [22] developed a logarithmic-quadratic proximal (LQP) based decomposition method by applying LQP terms to regularize the ADM subproblems. Tao and Yuan [23] established the convergence rate of ADM with LQP regularization. Bnouhachem et al. [24] studied a new inexact LQP alternating direction method that solves a series of related systems of nonlinear equations.

2.2. Some Properties of Variational Inequalities

In this section, we summarize some basic knowledge and related definitions of variational inequalities.

Let be a symmetric positive definite matrix; the -norm of the vector is denoted by . In particular, when , is the Euclidean norm of . Let be the projection operator onto under the -norm; that is, From the above definition, we have the following well-known properties:
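For reference (notation assumed here), the projection under the G-norm and its two well-known properties can be written as
\[
P_{\Omega,G}(v)=\arg\min\big\{\|u-v\|_{G}\,:\,u\in\Omega\big\},
\]
\[
\big(v-P_{\Omega,G}(v)\big)^{\top}G\,\big(u-P_{\Omega,G}(v)\big)\le 0\ \ \text{for all } u\in\Omega,
\qquad
\big\|P_{\Omega,G}(v)-P_{\Omega,G}(w)\big\|_{G}\le\|v-w\|_{G}.
\]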

The mapping is said to be monotone with respect to if
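Written out (in the same assumed notation), monotonicity of F with respect to Ω means
\[
(u-v)^{\top}\big(F(u)-F(v)\big)\ge 0\qquad\text{for all } u,v\in\Omega .
\]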

The following lemma [25, page 267] states an important result which characterizes a VI by a projection equation.

Lemma 1. Let be a closed convex set in and let be any positive definite matrix; then is a solution of VI() if and only if it satisfies
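In the notation above, the projection equation in question is the classical characterization
\[
u^{*}\in\Omega,\qquad u^{*}=P_{\Omega,G}\big[u^{*}-G^{-1}F(u^{*})\big],
\]
which is valid for any symmetric positive definite matrix G.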

3. Theoretical Results of the Relaxation

In this section, we show that the range of the parameters , , and in FPDM and SPDM can be enlarged beyond that permitted by the previous theoretical results. We also establish the global convergence of FPDM and SPDM under the new parameter conditions.

3.1. Parameter Relaxation for the FPDM

From (5), the subproblems of FPDM can be, respectively, characterized by the following VI form: find such that with where and is a positive definite matrix.

Lemma 2. For a given , let be generated by (16) and (17). If then for any , one has

Proof. Since and , one has Applying the Cauchy-Schwarz inequality to the term on the right-hand side, we get

Theorem 3. Let the sequence be generated by FPDM (16) and (17). If then one has

Proof. Consider On the other hand, by setting in (16), we have Using the fact that is a monotone operator, we have Rearranging the term in (26) and using (27), we derive that Substituting (28) into (24), we get By Lemma 1, substituting (19) into (29), we get which completes the proof.

Remark 4. Compared with the requirements on the parameters , , in [8], we now allow the step sizes , , to be chosen according to rule (7). In fact, the restriction on and proposed in [8] is which is a special case of rule (18), since that is, . Hence, the requirement on the parameters is significantly relaxed.

3.2. Parameter Relaxation for the SPDM

In this subsection, we extend our analysis to the SPDM. From (6), the subproblems of SPDM can be characterized by the following VI form: find such that with where and is a positive definite matrix.

Lemma 5. For a given , let be generated by (33) and (34). If then for any one has

Proof. Analogously, we have Applying the Cauchy-Schwarz inequality to the term on the right-hand side, we get

Theorem 6. Let the sequence be generated by SPDM (33) and (34). If then one has

Proof. Consider On the other hand, by setting in (33), we have By the monotonicity of , we have Rearranging the term in (43), we derive that Substituting (45) into (41), we get By Lemma 2, substituting (36) into (46), we get which completes the proof.

Remark 7. Compared with the requirements on the parameters , , in [8], we now allow the step sizes , , to be chosen according to rule (8). In fact, the restriction on and proposed in [8] is which is a special case of rule (35), since that is, . Hence, the requirement on the parameters is significantly relaxed.

3.3. The Convergence

In this subsection, we give the main convergence theorem for the FPDM and SPDM under the new parameter conditions.

Theorem 8. The sequence generated by the FPDM (resp., SPDM) under condition (18) (resp., (35)) converges to some , which is a solution of the SVI (9).

Proof. Theorem 3 (resp., Theorem 6) shows that the generated sequence is Fejér monotone with respect to the solution set, and the assertion follows immediately from the standard properties of Fejér monotone sequences.

4. Numerical Experiments

In this section, we report the sensitivity of the involved parameters , , of FPDM on the stable principal component pursuit (SPCP) problem. Since SPDM is an extended version of FPDM and the sensitivity results of SPDM are similar to those of FPDM, we omit the numerical results for SPDM for the sake of succinctness. The problem tested is from Example 2 of [26]. All codes were written in Matlab 2009b, and all programs were run on an HP notebook with a 2.0 GHz Intel Core CPU and 2 GB of memory.

SPCP, arising in compressed sensing, seeks to decompose a given observation matrix into the sum of three matrices: , where is a nonnegative low-rank matrix, is a sparse matrix, and is a noise matrix. The SPCP model can be cast as where is the so-called nuclear norm (the sum of all singular values), is the norm, and is an indicator function.
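A common way to write this model (with M the observation, τ > 0 a trade-off parameter, and δ > 0 the noise level; the symbols are assumed here for illustration) is
\[
\min_{L,S}\ \|L\|_{*}+\tau\|S\|_{1}
\quad\text{s.t.}\quad \|M-L-S\|_{F}\le\delta,\ \ L\ge 0,
\]
where the nonnegativity constraint on the low-rank part is the one that can be encoded by an indicator function in the objective.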

Following the procedure described in [26], by introducing an auxiliary variable , grouping and as one big block , and grouping and as another big block , (50) can be reformulated in the standard form of (1) as follows: Then the largest singular value of the coefficient matrix of is . The largest singular value of the coefficient matrix of is . For ease of illustration, we denote .

FPDM (5) applied to (51) yields the following iterative scheme:

There are two main advantages of applying FPDM to SPCP. First, all the resulting minimization subproblems in (52a)–(52d) have closed-form solutions. Second, the subproblems (52a)–(52d) can be solved fully in parallel, making FPDM appealing for parallel or distributed computing. Now, we elaborate on the strategy for solving the resulting subproblems at each iteration.

(i) The -subproblem (52a) amounts to evaluating the proximal operator of the nuclear norm function; its solution is given by the matrix shrinkage operation where the matrix shrinkage operator is defined as and is the SVD of the matrix .
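In standard notation (assumed here), the matrix shrinkage operator with threshold ν > 0 acts on a matrix X with SVD X = UΣV^⊤, Σ = diag(σ₁, …, σ_p), via singular value thresholding:
\[
\mathcal{S}_{\nu}(X)=U\,\mathrm{diag}\big(\max\{\sigma_{i}-\nu,\,0\}\big)\,V^{\top}.
\]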

(ii) The closed-form solution of the -subproblem (52b) is given by the shrinkage operation: where the shrinkage operator is defined as
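Likewise, the componentwise shrinkage (soft-thresholding) operator with threshold ν > 0 can be written, in assumed notation, as
\[
\big(\mathrm{shrink}_{\nu}(X)\big)_{ij}=\mathrm{sign}(X_{ij})\,\max\big\{|X_{ij}|-\nu,\,0\big\}.
\]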

(iii) The -subproblem (52c) amounts to projecting the matrix onto the Euclidean ball , whose closed-form solution is given by
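For a Frobenius-norm ball of radius δ centered at the origin (the general case follows by a translation; notation assumed here), this projection is the familiar scaling
\[
P_{\{\|Z\|_{F}\le\delta\}}(X)=\min\Big\{1,\ \frac{\delta}{\|X\|_{F}}\Big\}\,X .
\]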

(iv) The -subproblem (52d) amounts to projecting the matrix onto the nonnegative orthant, whose closed-form solution is given by For detailed analytical derivations of the solutions of (52a)–(52d), the reader is referred to, for example, [26, 27].
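For completeness, the projection in (iv) is the componentwise truncation (assumed notation)
\[
\big(P_{\ge 0}(X)\big)_{ij}=\max\{X_{ij},\,0\}.
\]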

In our experiment, we generate the data of (50) randomly in the same way as in [26]. For given , , the rank- matrix was generated by , where and are random matrices whose entries are independently and identically distributed (i.i.d.), uniformly in . Note that, in this experiment, is the componentwise nonnegative and low-rank matrix that we want to recover. The support of the sparse matrix was chosen uniformly at random, and the nonzero entries of were i.i.d. uniform in the interval . The entries of the noise matrix were generated as i.i.d. Gaussian with standard deviation .
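As an illustration only, the following Matlab sketch mimics this data generation; the dimensions m and n, the rank r, the cardinality card, the noise level sigma, and the interval [0, 1] for the uniform entries are assumptions made here, not the exact values used in [26].

% Sketch of random SPCP test data (assumed parameters, for illustration only)
m = 200; n = 200;                   % matrix dimensions (assumed)
r = 10;                             % target rank (assumed)
card = round(0.05*m*n);             % cardinality of the sparse part (assumed)
sigma = 1e-3;                       % noise standard deviation (assumed)
Lstar = rand(m, r) * rand(r, n);    % componentwise nonnegative, rank-r matrix
idx = randperm(m*n);                % support of the sparse part, uniform at random
Sstar = zeros(m, n);
Sstar(idx(1:card)) = rand(1, card); % nonzero entries i.i.d. uniform (interval assumed)
Z = sigma * randn(m, n);            % i.i.d. Gaussian noise
M = Lstar + Sstar + Z;              % observed matrix to be decomposed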

As in [26], we set ; we chose . The initial iterate for FPDM is , , . The stopping criterion is set as where is the tolerance, set as . We denote so that the rank of is and so that the cardinality of is . For several choices of the dimension , we report the number of iterations (Iter.), the relative error of the low-rank matrix , the relative error of the sparse matrix , and the CPU time in seconds (CPU(s)), where the relative errors are defined as For each instance, we randomly generated ten examples and averaged the results over the ten runs.
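For definiteness, the relative errors can be taken (as is standard; notation assumed here) to be
\[
\mathrm{rel}_{L}=\frac{\|L-L^{*}\|_{F}}{\|L^{*}\|_{F}},
\qquad
\mathrm{rel}_{S}=\frac{\|S-S^{*}\|_{F}}{\|S^{*}\|_{F}},
\]
where (L*, S*) is the true low-rank and sparse pair and (L, S) is the pair returned by the algorithm.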

The computational results are presented in Table 1. For each instance, the values of , , and were chosen to satisfy condition (7). It can be seen that if we choose a little smaller than and a little larger than , then FPDM with the newly selected parameters performs better than in the case where .

5. Conclusions

In this paper, we have shown that the required ranges of the involved parameters of the proximal-based parallel decomposition methods can be significantly enlarged. We have proved the global convergence of the methods under the new conditions. Preliminary numerical experiments on the stable principal component pursuit problem testify to the advantages of the enlargement.

Conflict of Interests

We declare that there is no conflict of interests regarding the publication of this article.

Acknowledgment

The work of Mingfang Ni is supported in part by the Natural Science Foundation of China Grant no. NSFC-70971136.