Abstract

Proximal-based parallel decomposition methods were recently proposed to solve structured convex optimization problems. These algorithms are amenable to parallel computation and can be used efficiently for solving large-scale separable problems. In this paper, we show that, compared with the previous theoretical results, the range of the involved parameters can be enlarged while convergence can still be established. Preliminary numerical tests on the stable principal component pursuit problem testify to the advantages of the enlargement.

1. Introduction

Consider the constrained convex optimization problem with a separable objective function in the following form: where , , , , and , are two nonempty, closed, and convex sets and , are convex functions. Problems of this type arise in a number of fields such as signal processing, compressed sensing, machine learning, and semidefinite programming (see, e.g., [1–7] and references cited therein).
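The separable structure in question is the standard two-block form; written with the notation θ₁, θ₂, A, B, b, 𝒳, 𝒴 (assumed here for illustration rather than quoted from the original display), it reads
\[
\min_{x,y}\ \theta_{1}(x)+\theta_{2}(y)
\quad\text{s.t.}\quad Ax+By=b,\ \ x\in\mathcal{X},\ \ y\in\mathcal{Y},
\]
where A and B are given matrices and b is a given vector of compatible dimensions.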

To solve (1), the classical alternating direction method (ADM) generates the new iterate via the following scheme: where is the Lagrange multiplier associated with the linear constraint and is a penalty parameter for the violation of the linear constraint.
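In the same assumed notation, writing λ^k for the multiplier and β > 0 for the penalty parameter, the classical ADM recursion is usually stated as
\[
\begin{aligned}
x^{k+1}&=\arg\min_{x\in\mathcal{X}}\Big\{\theta_{1}(x)+\tfrac{\beta}{2}\big\|Ax+By^{k}-b-\lambda^{k}/\beta\big\|^{2}\Big\},\\
y^{k+1}&=\arg\min_{y\in\mathcal{Y}}\Big\{\theta_{2}(y)+\tfrac{\beta}{2}\big\|Ax^{k+1}+By-b-\lambda^{k}/\beta\big\|^{2}\Big\},\\
\lambda^{k+1}&=\lambda^{k}-\beta\big(Ax^{k+1}+By^{k+1}-b\big),
\end{aligned}
\]
which makes the sequential (Gauss-Seidel) nature of the two subproblems explicit.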

At each iteration, ADM essentially splits the subproblem of the augmented Lagrangian method into two subproblems in a Gauss-Seidel fashion. The subproblems are solved in consecutive order, which makes it possible for ADM to exploit the individual structures of and . The decomposed subproblems in (2) are often easy when and in (1) are both identity matrices and the resolvent operators of and have closed-form solutions or can be solved efficiently to high precision. Here, the resolvent operator of a function (say, ) is defined by where and . However, in some cases neither nor is an identity matrix; the two subproblems in ADM (2) are then difficult to solve because the evaluation of the following type of minimization could be costly, where is a given nonidentity matrix, for example, or .
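For concreteness, and still in assumed notation, the resolvent (proximal) operator of a convex function θ with parameter r > 0 is
\[
\operatorname{prox}_{\theta/r}(a)=\arg\min_{u}\Big\{\theta(u)+\tfrac{r}{2}\,\|u-a\|^{2}\Big\},
\]
whereas the costly variant referred to above replaces the term \(\|u-a\|^{2}\) by \(\|Mu-a\|^{2}\) for a nonidentity matrix M (for example, M = A or M = B), which in general destroys the closed-form solvability.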

For the purpose of parallel and easy computing, the first parallel decomposition method [8] (abbreviated as FPDM) generates the new iterate as follows: where the parameters , are required to satisfy and . Here, denotes the largest eigenvalue of the matrix . It is easy to verify that the proximal-based decomposition method proposed in [9] is a special case of the FPDM.
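As a sketch only (the exact formulas in [8] may differ in details), a linearized, fully parallel update consistent with the description above, in which both subproblems use only the previous iterate (x^k, y^k, λ^k), reads
\[
\begin{aligned}
x^{k+1}&=\arg\min_{x\in\mathcal{X}}\Big\{\theta_{1}(x)+\tfrac{r}{2}\big\|x-x^{k}+\tfrac{1}{r}A^{\top}\big(\beta(Ax^{k}+By^{k}-b)-\lambda^{k}\big)\big\|^{2}\Big\},\\
y^{k+1}&=\arg\min_{y\in\mathcal{Y}}\Big\{\theta_{2}(y)+\tfrac{s}{2}\big\|y-y^{k}+\tfrac{1}{s}B^{\top}\big(\beta(Ax^{k}+By^{k}-b)-\lambda^{k}\big)\big\|^{2}\Big\},\\
\lambda^{k+1}&=\lambda^{k}-\beta\big(Ax^{k+1}+By^{k+1}-b\big),
\end{aligned}
\]
where r, s > 0 are the proximal parameters restricted by the conditions mentioned above.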

When (4) is easy to evaluate for and , the second parallel decomposition method [8] (abbreviated as SPDM) can be used, which generates the new iterate as follows: where the parameters , are required to satisfy and .

Note that the subproblems in FPDM and SPDM can be processed in parallel because the first subproblem involving is independent of the second subproblem involving . Thus, FPDM and SPDM are suitable for solving large-scale distributed machine learning and big-data-related optimization problems.

ADM was first described in [10] and is closely related to many other algorithms, such as augmented Lagrangian methods, the proximal point algorithm [11], and split Bregman methods [12]. Recently, the convergence of ADM has been analyzed under certain assumptions (see, e.g., [13–16]), and the direct extension of ADM to multiblock convex minimization problems has been shown to be not necessarily convergent [17].

In this paper, we study the proximal-based parallel decomposition methods from the perspective of variational inequalities. We show that the required ranges of the parameters , , and can be significantly enlarged. Our contributions are as follows.
(i) For the FPDM, we show that the requirements on the step sizes , , and can be uniformly relaxed by
(ii) For the SPDM, we show that the requirements on the step sizes , , and can be uniformly relaxed by
(iii) We provide a new application example in machine learning, namely, the stable principal component pursuit problem. Preliminary numerical experiments testify to the advantages of the enlargement.

The rest of this paper is organized as follows. In Section 2, we derive a variational reformulation of (1) and summarize some preliminaries on variational inequalities. In Section 3, we present our main theoretical results and analyze the convergence of the resulting methods. We report some numerical results in Section 4 and draw some conclusions in Section 5.

2. Preliminaries

2.1. Variational Inequality Characterization

In this section, we derive a variational reformulation of (1) which will be used in subsequent analysis.

Since the functions and are both assumed to be convex, by invoking the first-order optimality conditions for (1), we can easily verify that solving (1) amounts to finding a solution of the following variational inequality (VI): with where
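In the standard notation for this reformulation (assumed here for illustration), one sets w = (x, y, λ), Ω = 𝒳 × 𝒴 × ℝ^m, and
\[
F(w)=\begin{pmatrix}-A^{\top}\lambda\\ -B^{\top}\lambda\\ Ax+By-b\end{pmatrix},
\]
so that the VI consists of finding
\[
w^{*}\in\Omega\quad\text{such that}\quad (w-w^{*})^{\top}F(w^{*})\ge 0\ \ \text{for all } w\in\Omega .
\]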

Problem (9) is referred to as a structured variational inequality (SVI) and has been studied extensively in both theory and applications. Recently, He et al. [18, 19] proposed a unified framework of proximal-like contraction methods for monotone VIs. They also established the convergence rate of projection and contraction methods for VIs with Lipschitz continuous monotone operators [20]. Xu et al. [21] proposed two classes of correction methods for SVIs in which the mapping does not have an explicit form. Yuan and Li [22] developed a logarithmic-quadratic proximal (LQP) based decomposition method by applying LQP terms to regularize the ADM subproblems. Tao and Yuan [23] established the convergence rate of ADM with LQP regularization. Bnouhachem et al. [24] studied a new inexact LQP alternating direction method that solves a series of related systems of nonlinear equations.

2.2. Some Properties of Variational Inequalities

In this section, we summarize some basic knowledge and related definitions of variational inequalities.

Let be a symmetric positive definite matrix; the -norm of the vector is denoted by . In particular, when , is the Euclidean norm of . Let be the projection operator onto under the -norm; that is, From the above definition, we have the following well-known properties:
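For reference (notation assumed here), the projection under the G-norm and its two well-known properties can be written as
\[
P_{\Omega,G}(v)=\arg\min\big\{\|u-v\|_{G}\,:\,u\in\Omega\big\},
\]
\[
\big(v-P_{\Omega,G}(v)\big)^{\top}G\,\big(u-P_{\Omega,G}(v)\big)\le 0\ \ \text{for all } u\in\Omega,
\qquad
\big\|P_{\Omega,G}(v)-P_{\Omega,G}(w)\big\|_{G}\le\|v-w\|_{G}.
\]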

The mapping is said to be monotone with respect to if
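Written out (in the same assumed notation), monotonicity of F with respect to Ω means
\[
(u-v)^{\top}\big(F(u)-F(v)\big)\ge 0\qquad\text{for all } u,v\in\Omega .
\]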

The following lemma [25, page 267] states an important result which characterizes a VI by a projection equation.

Lemma 1. Let be a closed convex set in and let be any positive definite matrix; then is a solution of VI() if and only if it satisfies
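In the notation above, the projection equation in question is the classical characterization
\[
u^{*}\in\Omega,\qquad u^{*}=P_{\Omega,G}\big[u^{*}-G^{-1}F(u^{*})\big],
\]
which is valid for any symmetric positive definite matrix G.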

3. Theoretical Results of the Relaxation

In this section, we show that the range of the parameters , , and in FPDM and SPDM can be enlarged beyond that permitted by the previous theoretical results. We also establish the global convergence of FPDM and SPDM under the new parameter conditions.

3.1. Parameter Relaxation for the FPDM

From (5), the subproblems of FPDM can be, respectively, characterized by the following VI form: find such that with where and is a positive definite matrix.

Lemma 2. For a given , let be generated by (16) and (17). If then for any , one has

Proof. Since and , one has Applying the Cauchy-Schwarz inequality to the term on the right-hand side, we get

Theorem 3. Let the sequence be generated by FPDM (16) and (17). If then one has

Proof. Consider On the other hand, by setting in (16), we have Using the fact that is a monotone operator, we have Rearranging the term in (26) and using (27), we derive that Substituting (28) into (24), we get By Lemma 1, substituting (19) into (29), we get which completes the proof.

Remark 4. Compared with the requirements on the parameters , , in [8], we now allow the step sizes , , to be chosen according to rule (7). In fact, the restriction on and proposed in [8] is which is a special case of rule (18), since that is, . Hence, the requirement on the parameters is significantly relaxed.

3.2. Parameter Relaxation for the SPDM

In this subsection, we extend our analysis to the SPDM. From (6), the subproblems of SPDM can be characterized by the following VI form: find such that with where and is a positive definite matrix.

Lemma 5. For a given , let be generated by (33) and (34). If then for any one has

Proof. Analogously, we have Applying the Cauchy-Schwarz inequality to the term on the right-hand side, we get

Theorem 6. Let the sequence be generated by SPDM (33) and (34). If then one has

Proof. Consider On the other hand, by setting in (33), we have By the monotonicity of , we have Rearranging the term in (43), we derive that Substituting (45) into (41), we get By Lemma 2, substituting (36) into (46), we get which completes the proof.

Remark 7. Compared with the requirements on the parameters , , in [8], we now allow the step sizes , , to be chosen according to rule (8). In fact, the restriction on and proposed in [8] is which is a special case of rule (35), since that is, . Hence, the requirement on the parameters is significantly relaxed.

3.3. The Convergence

In this subsection, we give the main convergence theorem for the FPDM and SPDM under the new parameter conditions.

Theorem 8. The sequence generated by the FPDM (resp., SPDM) under condition (18) (resp., (35)) converges to some , which is a solution of the SVI (9).

Proof. Theorem 3 (resp., Theorem 6) shows that the generated sequence is Fejér monotone with respect to the solution set, and the assertion follows immediately from the standard properties of Fejér monotone sequences.

4. Numerical Experiments

In this section, we report the sensitivity of the involved parameters , , of FPDM on the stable principal component pursuit (SPCP) problem. Since SPDM is an extended version of FPDM and the sensitivity results of SPDM are similar to those of FPDM, we omit the numerical results for SPDM for the sake of succinctness. The problem tested is from Example 2 of [26]. All codes were written in Matlab 2009b, and all programs were run on an HP notebook with a 2.0 GHz Intel Core CPU and 2 GB of memory.

SPCP, arising in compressed sensing, seeks to decompose a given observation matrix into the sum of three matrices: , where is a nonnegative low-rank matrix, is a sparse matrix, and is a noise matrix. The SPCP model can be cast as where is the so-called nuclear norm (the sum of all singular values), is the norm, and is an indicator function.
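A common way to write this model (with M the observation, τ > 0 a trade-off parameter, and δ > 0 the noise level; the symbols are assumed here for illustration) is
\[
\min_{L,S}\ \|L\|_{*}+\tau\|S\|_{1}
\quad\text{s.t.}\quad \|M-L-S\|_{F}\le\delta,\ \ L\ge 0,
\]
where the nonnegativity constraint on the low-rank part is the one that can be encoded by an indicator function in the objective.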

Following the procedure described in [26], by introducing an auxiliary variable , grouping and as one big block , and grouping and as another big block , (50) can be reformulated in the standard form of (1) as follows: Then the largest singular value of the coefficient matrix of is . The largest singular value of the coefficient matrix of is . For ease of illustration, we denote .

FPDM (5) applied to (51) yields the following iterative scheme:

There are two main advantages of applying FPDM to SPCP. First, all the resulting minimization subproblems in (52a)–(52d) have closed-form solutions. Second, the subproblems (52a)–(52d) can be solved fully in parallel, making FPDM appealing for parallel or distributed computing. Now, we elaborate on the strategy for solving the resulting subproblems at each iteration.

(i) The -subproblem (52a) amounts to evaluating the proximal operator of the nuclear norm function; its solution is given by the matrix shrinkage operation where the matrix shrinkage operator is defined as and is the SVD of the matrix .
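In standard notation (assumed here), the matrix shrinkage operator with threshold ν > 0 acts on a matrix X with SVD X = UΣV^⊤, Σ = diag(σ₁, …, σ_p), via singular value thresholding:
\[
\mathcal{S}_{\nu}(X)=U\,\mathrm{diag}\big(\max\{\sigma_{i}-\nu,\,0\}\big)\,V^{\top}.
\]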

(ii) The closed-form solution of the -subproblem (52b) is given by the shrinkage operation: where the shrinkage operator is defined as
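Likewise, the componentwise shrinkage (soft-thresholding) operator with threshold ν > 0 can be written, in assumed notation, as
\[
\big(\mathrm{shrink}_{\nu}(X)\big)_{ij}=\mathrm{sign}(X_{ij})\,\max\big\{|X_{ij}|-\nu,\,0\big\}.
\]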

(iii) The -subproblem (52c) amounts to projecting the matrix onto the Euclidean ball , whose closed-form solution is given by
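For a Frobenius-norm ball of radius δ centered at the origin (the general case follows by a translation; notation assumed here), this projection is the familiar scaling
\[
P_{\{\|Z\|_{F}\le\delta\}}(X)=\min\Big\{1,\ \frac{\delta}{\|X\|_{F}}\Big\}\,X .
\]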

(iv) The -subproblem (52d) amounts to projecting the matrix onto the nonnegative orthant, whose closed-form solution is given by For detailed analytical derivations of the solutions of (52a)–(52d), the reader is referred to, for example, [26, 27].
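For completeness, the projection in (iv) is the componentwise truncation (assumed notation)
\[
\big(P_{\ge 0}(X)\big)_{ij}=\max\{X_{ij},\,0\}.
\]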

In our experiment, we generate the data of (50) randomly in the same way as in [26]. For given , , the rank- matrix was generated by , where and are random matrices whose entries are independently and identically distributed (i.i.d.), uniformly in . Note that, in this experiment, is the componentwise nonnegative and low-rank matrix that we want to recover. The support of the sparse matrix was chosen uniformly at random, and the nonzero entries of were i.i.d. uniform in the interval . The entries of the noise matrix were generated as i.i.d. Gaussian with standard deviation .
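As an illustration only, the following Matlab sketch mimics this data generation; the dimensions m and n, the rank r, the cardinality card, the noise level sigma, and the interval [0, 1] for the uniform entries are assumptions made here, not the exact values used in [26].

% Sketch of random SPCP test data (assumed parameters, for illustration only)
m = 200; n = 200;                   % matrix dimensions (assumed)
r = 10;                             % target rank (assumed)
card = round(0.05*m*n);             % cardinality of the sparse part (assumed)
sigma = 1e-3;                       % noise standard deviation (assumed)
Lstar = rand(m, r) * rand(r, n);    % componentwise nonnegative, rank-r matrix
idx = randperm(m*n);                % support of the sparse part, uniform at random
Sstar = zeros(m, n);
Sstar(idx(1:card)) = rand(1, card); % nonzero entries i.i.d. uniform (interval assumed)
Z = sigma * randn(m, n);            % i.i.d. Gaussian noise
M = Lstar + Sstar + Z;              % observed matrix to be decomposed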

As in [26], we set ; we chose . The initial iterate for FPDM is , , . The stopping criterion is set as where is the tolerance, set as . We denote so that the rank of is and so that the cardinality of is . For several choices of the dimension , we report the number of iterations (Iter.), the relative error of the low-rank matrix , the relative error of the sparse matrix , and the CPU time in seconds (CPU(s)), where the relative errors are defined as For each instance, we randomly generated ten examples and averaged the results over the ten runs.
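For definiteness, the relative errors can be taken (as is standard; notation assumed here) to be
\[
\mathrm{rel}_{L}=\frac{\|L-L^{*}\|_{F}}{\|L^{*}\|_{F}},
\qquad
\mathrm{rel}_{S}=\frac{\|S-S^{*}\|_{F}}{\|S^{*}\|_{F}},
\]
where (L*, S*) is the true low-rank and sparse pair and (L, S) is the pair returned by the algorithm.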

The computational results are presented in Table 1. For each instance, the values of , , and were chosen to satisfy condition (7). It can be seen that if we choose a little smaller than and a little larger than , then FPDM with the newly selected parameters performs better than in the case where .

5. Conclusions

In this paper, we have shown that the required ranges of the involved parameters of the proximal-based parallel decomposition methods can be significantly enlarged. We have proved the global convergence of the methods under the new conditions. Preliminary numerical experiments on the stable principal component pursuit problem testify to the advantages of the enlargement.

Conflict of Interests

We declare that there is no conflict of interests regarding the publication of this article.

Acknowledgment

The work of Mingfang Ni is supported in part by the Natural Science Foundation of China Grant no. NSFC-70971136.