Special Issue: Some Recent Trends in Variational Inequalities and Optimization Problems with Applications
Research Article | Open Access
On the Convergence Analysis of the Alternating Direction Method of Multipliers with Three Blocks
We consider a class of linearly constrained separable convex programming problems whose objective functions are the sum of three convex functions without coupled variables. For those problems, Han and Yuan (2012) have shown that the sequence generated by the alternating direction method of multipliers (ADMM) with three blocks converges globally to their KKT points under some technical conditions. In this paper, a new proof of this result is found under new conditions which are much weaker than Han and Yuan’s assumptions. Moreover, in order to accelerate the ADMM with three blocks, we also propose a relaxed ADMM involving an additional computation of optimal step size and establish its global convergence under mild conditions.
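1. Introduction
In various fields of applied mathematics and engineering, many problems can be equivalently formulated as a separable convex optimization problem with two blocks; that is, given two closed convex functions $f_i:\mathbb{R}^{n_i}\to(-\infty,+\infty]$, $i=1,2$, find a solution pair $(x_1^*,x_2^*)$ of the following problem:
$$\min\ f_1(x_1)+f_2(x_2)\quad\text{s.t.}\quad A_1x_1+A_2x_2=b, \tag{1}$$
where $A_i$ is a matrix in $\mathbb{R}^{m\times n_i}$, $i=1,2$, and $b$ is a vector in $\mathbb{R}^m$. The classical alternating direction method of multipliers (ADMM) [1, 2] applied to problem (1) yields the following scheme:
$$\begin{aligned}
x_1^{k+1} &= \arg\min_{x_1}\Big\{f_1(x_1)-\langle\lambda^k,A_1x_1\rangle+\tfrac{\beta}{2}\big\|A_1x_1+A_2x_2^k-b\big\|^2\Big\},\\
x_2^{k+1} &= \arg\min_{x_2}\Big\{f_2(x_2)-\langle\lambda^k,A_2x_2\rangle+\tfrac{\beta}{2}\big\|A_1x_1^{k+1}+A_2x_2-b\big\|^2\Big\},\\
\lambda^{k+1} &= \lambda^k-\beta\big(A_1x_1^{k+1}+A_2x_2^{k+1}-b\big),
\end{aligned}\tag{2}$$
where $\lambda\in\mathbb{R}^m$ is a Lagrangian multiplier and $\beta>0$ is a penalty parameter. Possibly due to its simplicity and effectiveness, the ADMM with two blocks has received continuous attention in both theory and applications. We refer to [3–8] for theoretical results on ADMM with two blocks and to [9–13] for its efficient applications in high-dimensional statistics, compressive sensing, finance, image processing, and engineering, to name just a few.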
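The two-block scheme can be illustrated on a toy instance. The sketch below is not taken from the paper: it assumes scalar blocks, quadratic objectives $f_i(x_i)=\tfrac{\mu_i}{2}(x_i-c_i)^2$, and an augmented Lagrangian in which the multiplier term enters with a minus sign (a sign convention chosen here for illustration). The names `admm_two_blocks` and `kkt_solution` are hypothetical helpers. Under these assumptions each subproblem has a closed-form minimizer, so the iteration can be compared against the KKT point obtained by solving the optimality conditions directly.

```python
def admm_two_blocks(mu, a, c, b, beta=1.0, iters=200):
    """Two-block ADMM for
        min  sum_i (mu[i]/2) * (x_i - c[i])**2   s.t.  a[0]*x_1 + a[1]*x_2 = b.

    Assumed augmented Lagrangian (sign convention is an assumption):
        L(x, lam) = sum_i f_i(x_i) - lam * r + (beta/2) * r**2,
        r = a[0]*x_1 + a[1]*x_2 - b.
    """
    x1, x2, lam = 0.0, 0.0, 0.0
    for _ in range(iters):
        # x1-step: closed-form minimizer of L over x1 with x2 and lam fixed.
        x1 = (mu[0] * c[0] + a[0] * lam - beta * a[0] * (a[1] * x2 - b)) / (
            mu[0] + beta * a[0] ** 2
        )
        # x2-step: same formula, using the freshly updated x1.
        x2 = (mu[1] * c[1] + a[1] * lam - beta * a[1] * (a[0] * x1 - b)) / (
            mu[1] + beta * a[1] ** 2
        )
        # Multiplier step driven by the constraint residual.
        lam = lam - beta * (a[0] * x1 + a[1] * x2 - b)
    return x1, x2, lam


def kkt_solution(mu, a, c, b):
    """Solve mu_i*(x_i - c_i) = a_i*lam together with sum_i a_i*x_i = b."""
    lam = (b - sum(ai * ci for ai, ci in zip(a, c))) / sum(
        ai * ai / mi for ai, mi in zip(a, mu)
    )
    x = [ci + ai * lam / mi for ai, ci, mi in zip(a, c, mu)]
    return x, lam
```

For example, `admm_two_blocks([1.0, 2.0], [1.0, 2.0], [1.0, -1.0], 3.0)` can be compared with `kkt_solution` on the same data; the iterates and the multiplier should agree with the KKT point up to numerical tolerance.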
In this paper, we concentrate on the linearly constrained convex programming problem with three blocks:
$$\min\ f_1(x_1)+f_2(x_2)+f_3(x_3)\quad\text{s.t.}\quad A_1x_1+A_2x_2+A_3x_3=b, \tag{3}$$
where $f_3:\mathbb{R}^{n_3}\to(-\infty,+\infty]$ is a closed convex function and $A_3$ is a matrix in $\mathbb{R}^{m\times n_3}$. For solving (3), a natural idea is to extend the ADMM with two blocks to the ADMM with three blocks, in which the next iterate is updated by
$$\begin{aligned}
x_1^{k+1} &= \arg\min_{x_1}\ \mathcal{L}_\beta\big(x_1,x_2^k,x_3^k,\lambda^k\big),\\
x_2^{k+1} &= \arg\min_{x_2}\ \mathcal{L}_\beta\big(x_1^{k+1},x_2,x_3^k,\lambda^k\big),\\
x_3^{k+1} &= \arg\min_{x_3}\ \mathcal{L}_\beta\big(x_1^{k+1},x_2^{k+1},x_3,\lambda^k\big),\\
\lambda^{k+1} &= \lambda^k-\beta\Big(\sum_{i=1}^{3}A_ix_i^{k+1}-b\Big),
\end{aligned}$$
where
$$\mathcal{L}_\beta(x_1,x_2,x_3,\lambda)=\sum_{i=1}^{3}f_i(x_i)-\Big\langle\lambda,\sum_{i=1}^{3}A_ix_i-b\Big\rangle+\frac{\beta}{2}\Big\|\sum_{i=1}^{3}A_ix_i-b\Big\|^2.$$
Similar to the ADMM with two blocks, the ADMM with three blocks has found numerous applications in a broad spectrum of areas, such as doubly nonnegative cone programming [14], high-dimensional statistics [15, 16], imaging science [17], and engineering [18]. Even though its numerical efficiency is clearly seen from those applications, the theoretical treatment of ADMM with three blocks is challenging, and its convergence is still open when only convexity of the objective functions is assumed. To alleviate this difficulty, the authors of [19, 20] proposed prediction-correction type methods to solve general separable convex programs; however, numerical results show that the direct ADMM outperforms its variants substantially. Therefore, it is of great significance to investigate the theoretical performance of the ADMM with three blocks, even if only to provide sufficient conditions that guarantee convergence.

To the best of our knowledge, there exist only two works that attack the convergence problem of the direct ADMM with three blocks. By using an error bound analysis method, Hong and Luo [21] proved the linear convergence of the ADMM with blocks for sufficiently small subject to some technical conditions. However, the sufficiently small requirement on makes the algorithm difficult to implement. In [22], Han and Yuan employed a contractive analysis method to establish the convergence of ADMM under the assumptions that $f_1$, $f_2$, and $f_3$ are strongly convex and that the parameter $\beta$ is less than a threshold depending on all the strong convexity moduli. In this paper, we first prove the convergence of ADMM with three blocks under two conditions weaker than those of [22]. In our conditions, the threshold on the parameter $\beta$ only relies on the strong convexity moduli of and , and furthermore is not necessarily strongly convex in one of our conditions. Also, the restricted range of $\beta$ in this paper is shown to be at least three times as big as that of [22].
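The direct three-block extension described above amounts to one Gauss-Seidel sweep over the three blocks followed by a multiplier update. The following is a minimal sketch under assumptions that are not from the paper: scalar blocks, strongly convex quadratics $f_i(x_i)=\tfrac{\mu_i}{2}(x_i-c_i)^2$, an augmented Lagrangian with a minus sign on the multiplier term, and an illustrative small penalty $\beta=0.3$ (not a threshold taken from the text). The function name `admm_three_blocks` is hypothetical.

```python
def admm_three_blocks(mu, a, c, b, beta, iters=500):
    """Direct three-block ADMM for
        min  sum_i (mu[i]/2) * (x_i - c[i])**2   s.t.  sum_i a[i]*x_i = b.

    Assumed augmented Lagrangian (sign convention is an assumption):
        L(x, lam) = sum_i f_i(x_i) - lam * r + (beta/2) * r**2,
        r = sum_i a[i]*x_i - b.
    Returns the final blocks, the multiplier, and the residual history.
    """
    x = [0.0, 0.0, 0.0]
    lam = 0.0
    residuals = []
    for _ in range(iters):
        # Gauss-Seidel sweep: block i sees the newest values of the others.
        for i in range(3):
            others = sum(a[j] * x[j] for j in range(3) if j != i)
            x[i] = (mu[i] * c[i] + a[i] * lam - beta * a[i] * (others - b)) / (
                mu[i] + beta * a[i] ** 2
            )
        # Multiplier update with the full constraint residual.
        r = sum(a[j] * x[j] for j in range(3)) - b
        lam -= beta * r
        residuals.append(abs(r))
    return x, lam, residuals
```

With `mu = [1, 1, 1]`, `a = [1, 1, 1]`, `c = [1, 2, 3]`, `b = 0`, the KKT conditions $\mu_i(x_i-c_i)=a_i\lambda$ and $\sum_i a_ix_i=b$ give $\lambda^*=-2$ and $x^*=(-1,0,1)$, which the sketch should reproduce. All three objectives are strongly convex in this toy instance, so it does not probe the weaker conditions discussed in the text.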
In order to accelerate the ADMM with three blocks, we also propose a relaxed ADMM with three blocks which involves an additional computation of optimal step size. Specifically, with the triple , we first generate a predictor according to (5) and then obtain in the next iteration by
where and is a special step size defined in (43). The convergence of the relaxed ADMM is also established under mild conditions. We should mention that the analysis given in this paper can be modified to handle problems with more than three separable blocks, but this is not the focus of this paper.
The remaining parts of this paper are organized as follows. In Section 2, we list some preliminaries on the strongly convex function, subdifferential, and the ADMM with three blocks. In Section 3, we first show the contractive property of the distance between the sequence generated by ADMM with three blocks and the solution set and then prove the convergence of ADMM under certain conditions. In Section 4, we extend the direct ADMM with three blocks to the relaxed ADMM with an optimal step size and establish its convergence under suitable conditions. We conclude our paper in Section 5.
2. Preliminaries
Notation. For any positive integer $n$, let $I_n$ be the $n\times n$ identity matrix. We use and to denote the vector Euclidean norm and the spectral norm of matrices (defined as the maximum singular value of matrices). For any symmetric matrix , we write for any . and are two matrices defined by respectively. For given , and , we frequently use and to denote the joint vectors of and , respectively; that is, while and are the joint vectors corresponding to and .
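Throughout this paper, we assume that $f_i$, $i=1,2,3$, are strongly convex functions with modulus $\mu_i\ge 0$; that is,
$$f_i\big(tx+(1-t)y\big)\le t f_i(x)+(1-t)f_i(y)-\frac{\mu_i}{2}\,t(1-t)\|x-y\|^2$$
for each $x,y\in\mathbb{R}^{n_i}$ and $t\in[0,1]$. Note that $f_i$ is strongly convex with modulus $\mu_i=0$ if and only if $f_i$ is convex. Let $x$ be a point of $\operatorname{dom} f_i$; the subdifferential of $f_i$ at $x$ is defined by
$$\partial f_i(x)=\big\{\xi\in\mathbb{R}^{n_i}: f_i(y)\ge f_i(x)+\langle\xi,y-x\rangle\ \ \forall y\in\mathbb{R}^{n_i}\big\}.$$
From Proposition 6 in [23], we know that, for each $i$, $\partial f_i$ is strongly monotone with modulus $\mu_i$, which means
$$\langle\xi-\zeta,x-y\rangle\ge\mu_i\|x-y\|^2\quad\forall\,\xi\in\partial f_i(x),\ \zeta\in\partial f_i(y). \tag{11}$$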
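The link between strong convexity and strong monotonicity of the subdifferential can be checked numerically on a simple nonsmooth example. The sketch below is illustrative only: it assumes the scalar function $f(x)=\tfrac{\mu}{2}x^2+|x|$, which is strongly convex with modulus $\mu$, and picks the subgradient $\mu x+\operatorname{sign}(x)$ (with $\operatorname{sign}(0)=0$). The helper names `f`, `subgrad`, `monotonicity_gap`, and `convexity_gap` are hypothetical. A nonnegative gap means the corresponding inequality holds on the sampled points for the tested modulus.

```python
def f(x, mu_f):
    """Example function: (mu_f/2)*x**2 + |x| (an assumed test function)."""
    return 0.5 * mu_f * x * x + abs(x)


def subgrad(x, mu_f):
    """One element of the subdifferential of f at x (sign(0) taken as 0)."""
    sign = (x > 0) - (x < 0)
    return mu_f * x + sign


def monotonicity_gap(points, mu_f, modulus):
    """Smallest value of <xi - zeta, x - y> - modulus*(x - y)**2 on the grid."""
    return min(
        (subgrad(x, mu_f) - subgrad(y, mu_f)) * (x - y) - modulus * (x - y) ** 2
        for x in points
        for y in points
    )


def convexity_gap(points, mu_f, modulus, ts=(0.25, 0.5, 0.75)):
    """Smallest slack in the strong convexity inequality on the grid."""
    return min(
        t * f(x, mu_f)
        + (1 - t) * f(y, mu_f)
        - 0.5 * modulus * t * (1 - t) * (x - y) ** 2
        - f(t * x + (1 - t) * y, mu_f)
        for x in points
        for y in points
        for t in ts
    )
```

Evaluating both gaps on a grid such as `[k / 4 for k in range(-8, 9)]` with `mu_f = 2.0` should give nonnegative values for `modulus = 2.0` and negative values for a larger modulus such as `2.5`, consistent with $\mu$ being the sharp modulus for this example.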
The next lemma introduced in  plays a key role in the convergence analysis of the ADMM and the relaxed ADMM with three blocks.
3. The ADMM with Three Blocks
In this section, we first investigate the contractive property of the distance between the sequence generated by ADMM with three blocks and the solution set under the condition that .
Lemma 2. Let be a KKT point of problem (3) and let the sequence be generated by the ADMM with three blocks. Then, it holds that
Proof. Since minimizes , we deduce from the first order optimality condition that By (14) and the monotonicity of (11), it is easily seen that Then for each , where the last “≥” follows from the elementary inequality Since by direct computations, we further obtain that which, together with , implies Note that We complete the proof of this lemma.
With the above preparation, we are ready to prove the convergence of the ADMM with three blocks for solving (3) given the following conditions.
Theorem 3. Let be the sequence generated by the ADMM with three blocks. Then converges to a KKT point of problem (3) if either of the following conditions holds:(i) and ; (ii) is of full column rank, , and .
Proof. By the inequality (13), it follows that the sequence is bounded. Recall that
Hence is also bounded. Moreover, from (13) we see immediately that
According to the condition that , we know
It therefore holds that
Therefore, the sequence , , is bounded, which, together with the boundedness of , implies that is bounded, and is bounded given the condition or is of full column rank. Moreover, since
by the first equality in (25) and the third equality in (26), it holds that
We proceed to prove the convergence of ADMM by considering the following two cases.
Case 1 ( and ). In this case, the sequence converges to and then By the second equality in (26), we deduce from (29) that Since is bounded, there exist a triple and a subsequence such that which, combined with (25), (29), and the given conditions, implies Note that Then, by taking the limits on both sides of (33), using (25) and (29), and invoking the upper semicontinuity of , , and , one can immediately write which indicates is a KKT point of problem (3). Hence, the inequality (13) is also valid if is replaced by . Then it holds that which yields By adding the last two equalities in (26) to (36), we know Therefore, we have shown that the whole sequence converges to under condition (i) in Theorem 3.
Case 2 ( is of full column rank, , and ). In this case, the sequence converges to and is bounded. From the second equality in (25) and (28), we have Since is of full column rank, it therefore holds that Let be a cluster point of the sequence . Following a similar proof in Case 1, we are able to show is a KKT point of problem (3) and the whole sequence converges to this point.
Remark 4 (see [22]). The authors proved the convergence of the ADMM under the conditions that , and are strongly convex and . Our result improves the upper bound by . Moreover, in our condition (ii), the strong convexity assumption is only imposed on and while is not necessarily strongly convex with positive modulus.
4. The Relaxed ADMM with Three Blocks
For the ADMM with two blocks, Ye and Yuan [25] developed a variant of the alternating direction method with an optimal step size. Numerical results demonstrated that the additional computation of the optimal step size improves the efficiency of the new variant of ADMM. In this section, by adopting the essential idea of Ye and Yuan [25], we propose a relaxed ADMM with three blocks to accelerate the ADMM via an optimal step size. For notational simplicity, we write With , the new iterate of the relaxed ADMM is produced by where is the solution of (5) and is defined by
Lemma 5. Let the sequence be generated by the relaxed ADMM with three blocks. Then, if , the following statements are valid:(i) and thus ; (ii) − − .
Proof. By direct computations to , we know that where the second inequality follows from the Cauchy-Schwarz inequality. It therefore holds that which completes the proof of the first part. By Lemma 1 and the elementary inequality (17), it can be easily verified that and then This, together with the fact that , completes the proof.
Based on the above inequality, we are able to prove the following convergence result of the relaxed ADMM with three blocks. Since the proof is in line with that of Theorem 3, we omit it.
Theorem 6. Let be the sequence generated by the relaxed ADMM. Then converges to a KKT point of problem (3) under the conditions that and , and are of full column rank.
5. Concluding Remarks
In this paper, we take a step toward understanding the ADMM for separable convex programming problems with three blocks. Based on a contractive analysis of the distance between the sequence and the solution set, we establish theoretical results that guarantee the global convergence of ADMM with three blocks under weaker conditions than those employed in [22]. By adopting the essential idea of [25], we also present a relaxed ADMM with an optimal step size to accelerate the ADMM and prove its convergence under mild assumptions.
Acknowledgments
The first author is supported by the Natural Science Foundation of Jiangsu Province and the National Natural Science Foundation of China under Project no. 71271112. The second author is supported by the University Natural Science Research Fund of Jiangsu Province under Grant no. 13KJD110002.
References
- [1] D. Gabay and B. Mercier, “A dual algorithm for the solution of nonlinear variational problems via finite element approximations,” Computers & Mathematics with Applications, vol. 2, pp. 17–40, 1976.
- [2] R. Glowinski and A. Marrocco, “Sur l'approximation, par éléments finis d'ordre un, et la résolution, par pénalisation-dualité, d'une classe de problèmes de Dirichlet non linéaires,” Revue Française d'Automatique, Informatique, Recherche Opérationnelle. Analyse Numérique, vol. 9, no. R-2, pp. 41–76, 1975.
- [3] J. Eckstein and D. Bertsekas, “On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators,” Mathematical Programming, vol. 55, no. 3, pp. 293–318, 1992.
- [4] D. Gabay, “Chapter IX: Applications of the method of multipliers to variational inequalities,” Studies in Mathematics and Its Applications, vol. 15, pp. 299–331, 1983.
- [5] R. Glowinski, Numerical Methods for Nonlinear Variational Problems, Springer, New York, NY, USA, 1984.
- [6] R. Glowinski and P. Le Tallec, Augmented Lagrangian and Operator-Splitting Methods in Nonlinear Mechanics, vol. 9, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, Pa, USA, 1989.
- [7] B. He, L. Liao, D. Han, and H. Yang, “A new inexact alternating directions method for monotone variational inequalities,” Mathematical Programming, vol. 92, no. 1, pp. 103–118, 2002.
- [8] B. He and X. Yuan, “On the convergence rate of the Douglas-Rachford alternating direction method,” SIAM Journal on Numerical Analysis, vol. 50, no. 2, pp. 700–709, 2012.
- [9] C. Chen, B. He, and X. Yuan, “Matrix completion via an alternating direction method,” IMA Journal of Numerical Analysis, vol. 32, no. 1, pp. 227–245, 2012.
- [10] M. Fazel, T. K. Pong, D. F. Sun, and P. Tseng, “Hankel matrix rank minimization with applications to system identification and realization,” SIAM Journal on Matrix Analysis and Applications, vol. 34, no. 3, pp. 946–977, 2013.
- [11] B. He, M. Xu, and X. Yuan, “Solving large-scale least squares semidefinite programming by alternating direction methods,” SIAM Journal on Matrix Analysis and Applications, vol. 32, no. 1, pp. 136–152, 2011.
- [12] J. Yang, Y. Zhang, and W. Yin, “A fast alternating direction method for TVL1-L2 signal reconstruction from partial Fourier data,” IEEE Journal of Selected Topics in Signal Processing, vol. 4, no. 2, pp. 288–297, 2010.
- [13] J. Yang and Y. Zhang, “Alternating direction method algorithms for l1-problems in compressive sensing,” SIAM Journal on Scientific Computing, vol. 33, no. 1, pp. 250–278, 2011.
- [14] Z. Wen, D. Goldfarb, and W. Yin, “Alternating direction augmented Lagrangian methods for semidefinite programming,” Mathematical Programming Computation, vol. 2, no. 3-4, pp. 203–230, 2010.
- [15] M. Tao and X. Yuan, “Recovering low-rank and sparse components of matrices from incomplete and noisy observations,” SIAM Journal on Optimization, vol. 21, no. 1, pp. 57–81, 2011.
- [16] J. Yang, D. Sun, and K. Toh, “A proximal point algorithm for log-determinant optimization with group Lasso regularization,” SIAM Journal on Optimization, vol. 23, no. 2, pp. 857–893, 2013.
- [17] Y. Peng, A. Ganesh, J. Wright, W. Xu, and Y. Ma, “RASL: robust alignment by sparse and low-rank decomposition for linearly correlated images,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '10), pp. 763–770, 2010.
- [18] K. Mohan, P. London, M. Fazel, D. Witten, and S. I. Lee, “Node-based learning of multiple Gaussian graphical models,” http://arxiv.org/abs/1303.5145.
- [19] B. He, M. Tao, M. H. Xu, and X. M. Yuan, “Alternating directions based contraction method for generally separable linearly constrained convex programming problems,” Optimization, vol. 62, no. 4, pp. 573–596, 2013.
- [20] B. He, M. Tao, and X. Yuan, “Alternating direction method with Gaussian back substitution for separable convex programming,” SIAM Journal on Optimization, vol. 22, no. 2, pp. 313–340, 2012.
- [21] M. Hong and Z. Luo, “On the linear convergence of the alternating direction method of multipliers,” http://arxiv.org/abs/1208.3922.
- [22] D. Han and X. Yuan, “A note on the alternating direction method of multipliers,” Journal of Optimization Theory and Applications, vol. 155, no. 1, pp. 227–238, 2012.
- [23] R. Rockafellar, “Monotone operators and the proximal point algorithm,” SIAM Journal on Control and Optimization, vol. 14, no. 5, pp. 877–898, 1976.
- [24] R. Rockafellar, Convex Analysis, Princeton Mathematical Series, no. 28, Princeton University Press, Princeton, NJ, USA, 1970.
- [25] C. Ye and X.-M. Yuan, “A descent method for structured monotone variational inequalities,” Optimization Methods & Software, vol. 22, no. 2, pp. 329–338, 2007.
Copyright © 2013 Caihua Chen et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.