Abstract

The alternating direction method of multipliers (ADMM) is an effective method for solving two-block separable convex problems, and its convergence is well understood. When the number of blocks exceeds two, or a nonconvex function is involved, or the structure is nonseparable, ADMM or its directly extended version may fail to converge. In this paper, we propose an ADMM-based algorithm for nonconvex multiblock optimization problems with a nonseparable structure. We show that, under mild conditions, any cluster point of the iterative sequence generated by the proposed algorithm is a critical point. Furthermore, we establish the strong convergence of the whole sequence under the condition that the potential function satisfies the Kurdyka–Łojasiewicz property. This provides a theoretical basis for applying the proposed ADMM in practice. Finally, we give some preliminary numerical results to show the effectiveness of the proposed algorithm.

1. Introduction

In this paper, we consider the following possibly nonconvex and nonsmooth optimization problem:where the variables are vectors, is differentiable, each is proper and lower semicontinuous, are given matrices, and .

The alternating direction method of multipliers (ADMM) is a very effective method for solving the convex two-block optimization problem [1, 2]. A natural idea is to extend ADMM to solve problem (1). However, ADMM or its directly extended version may not converge when either the number of blocks exceeds two, or a nonconvex function is involved, or the structure is nonseparable. Recently, there have been a few developments on this topic, e.g., [3–13].

Hong et al. [6] considered the sharing and consensus problem and showed that the classical ADMM converges to the set of stationary solutions, provided that the penalty parameter in the augmented Lagrangian is chosen sufficiently large. Li and Pong [8] studied the convergence of ADMM for some special two-block nonconvex models, where one of the matrices A and B is an identity matrix. Wang et al. [9, 10] studied the convergence of the nonconvex Bregman ADMM algorithm, which includes ADMM as a special case. Wang et al. [11] studied the convergence of ADMM for nonconvex nonsmooth optimization with a nonseparable structure. Guo et al. [4, 5] studied the convergence of classical ADMM for two-block and multiblock nonconvex models where one of the matrices is an identity matrix. Yang et al. [13] studied the convergence of ADMM for a nonconvex optimization model which comes from background/foreground extraction.

The purpose and main contribution of this paper is to propose a new variant of ADMM for the nonconvex coupled problem (1) and to prove its convergence. The novelty of this paper can be summarized as follows:(1)Compared to the existing literature, the model in this paper is more general. There is no nonseparable structure in the models considered by [4–10, 12, 13]. Wang et al. [11] considered two scenarios. If , then (1) is scenario 1 in [11]. If , for , then (1) becomes scenario 2 in [11]. Furthermore, in this paper, the matrices and B need not have full column or row rank.(2)The proposed algorithm combines linearization with regularization. These two techniques can effectively reduce the difficulty of solving the subproblems.

The rest of this paper is organized as follows. In Section 2, some basic concepts and necessary preliminaries for further analysis are summarized. In Section 3, we propose the algorithm and analyze its convergence for 3-block nonconvex and nonsmooth coupled problems. Finally, some conclusions are drawn in Section 4.

2. Preliminaries

denotes the n-dimensional Euclidean space, denotes the extended real number set, and denotes the natural number set. The image space of a matrix is defined as . denotes the Euclidean projection onto . If matrix , let denote the smallest positive singular value of the matrix . represents the Euclidean norm. is the domain of a function . . . For a set and a point , let If , we set for all . For a point-to-set mapping F, its graph is defined by

Definition 1 (see [14]). Let f be a proper function. If there exists δ > 0 such that f(y) ≥ f(x) + ⟨ξ, y − x⟩ + (δ/2)‖y − x‖² for all x, y and ξ ∈ ∂f(x), then f is called strongly convex with modulus δ.

Definition 2 (see [15]). For a convex differentiable function ϕ, the associated Bregman distance is defined as D_ϕ(x, y) = ϕ(x) − ϕ(y) − ⟨∇ϕ(y), x − y⟩. The Bregman distance plays an important role in iterative algorithms. It shares many of the nice properties of the Euclidean distance. However, the Bregman distance is not a metric, since it satisfies neither the triangle inequality nor symmetry. Some examples of Bregman distances include [16](i)Classical Euclidean distance: if ϕ(x) = ‖x‖², then D_ϕ(x, y) = ‖x − y‖²(ii)Itakura–Saito distance: if ϕ(x) = −∑ᵢ log xᵢ, then D_ϕ(x, y) = ∑ᵢ (xᵢ/yᵢ − log(xᵢ/yᵢ) − 1)(iii)Mahalanobis distance: if ϕ(x) = ⟨Qx, x⟩ with Q a symmetric positive definite matrix, then D_ϕ(x, y) = ⟨Q(x − y), x − y⟩ Let us now collect some useful properties of the Bregman distance.
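As a quick concrete check of the definition and the first two examples above, here is a small plain-Python sketch; the helper names are ours, and vectors are plain lists:

```python
import math

def bregman(phi, grad_phi, x, y):
    """Bregman distance D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>."""
    inner = sum(g * (xi - yi) for g, xi, yi in zip(grad_phi(y), x, y))
    return phi(x) - phi(y) - inner

# Classical Euclidean distance: phi(x) = ||x||^2 gives D(x, y) = ||x - y||^2.
sq_norm = lambda v: sum(t * t for t in v)
grad_sq = lambda v: [2 * t for t in v]

# Itakura-Saito distance: phi(x) = -sum_i log x_i (on the positive orthant)
# gives D(x, y) = sum_i (x_i / y_i - log(x_i / y_i) - 1).
neg_log = lambda v: -sum(math.log(t) for t in v)
grad_neg_log = lambda v: [-1.0 / t for t in v]

x, y = [1.0, 2.0], [3.0, 1.0]
d_euclid = bregman(sq_norm, grad_sq, x, y)    # equals (1-3)^2 + (2-1)^2 = 5
d_is = bregman(neg_log, grad_neg_log, x, y)   # equals 1/3 + log(3/2)
```

Note that `d_is` is positive but `bregman(neg_log, grad_neg_log, y, x)` would generally differ, illustrating the lack of symmetry mentioned above.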

Proposition 1 (see [15]). Let ϕ be a differentiable, strongly convex function with modulus δ, then(i)D_ϕ(x, y) ≥ (δ/2)‖x − y‖², and D_ϕ(x, y) = 0 if and only if x = y(ii) for all The following notations and definitions are quite standard and can be found in [14, 17].

Definition 3. Let be a proper lower semicontinuous function.(i)The subdifferential, or regular subdifferential, of f at isWhen , we set .(ii)The limiting subdifferential, or simply the subdifferential, of f at , written , is defined as(iii)A point that satisfies is called a critical point or a stationary point of the function f. The set of critical points of f is denoted by .The following proposition collects some properties of the subdifferential.

Proposition 2 (see [17]). Let and be proper lower semicontinuous functions. Then, the following holds:(i) for each . Moreover, the first set is closed and convex, while the second is closed and not necessarily convex.(ii)Let be a sequence such that it converges to . If , then .(iii)If is a local minimizer of f then .(iv)If is continuous differentiable, then .

The Lagrangian function of (1), with multiplier , is defined as

Definition 4. If such thatthen is called a critical point or stationary point of the Lagrange function .
A very important technique for proving the strong convergence of ADMM for nonconvex optimization problems relies on the assumption that the potential function satisfies the Kurdyka–Łojasiewicz (KL) property [18–21]. Many functions satisfy this inequality. In particular, when the function belongs to certain function classes, e.g., semialgebraic, real subanalytic, and log-exp functions (see [22–24]), it is often elementary to check that such an inequality holds.
For notational simplicity, we use to denote the set of concave functions such that(i)(ii) is continuously differentiable on and continuous at 0(iii)The KL property can be described as follows.

Definition 5 (see [18–21]) (KL property). Let f be a proper lower semicontinuous function. If there exist , a neighborhood U of , and a function , such that for all , it holds thatthen f is said to have the KL property at .
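In the notation of [18–21], the KL inequality in Definition 5 is usually written in the following standard form (a sketch: here x̄ is the reference point, U its neighborhood, η > 0, and φ is the concave desingularizing function from the class defined above):

```latex
\varphi'\bigl(f(x) - f(\bar{x})\bigr)\,\operatorname{dist}\bigl(0, \partial f(x)\bigr) \ge 1
\quad \text{for all } x \in U \cap \{x : f(\bar{x}) < f(x) < f(\bar{x}) + \eta\}.
```

Intuitively, the inequality says that f can be "desingularized" by φ near x̄, so that the subgradients of φ∘(f − f(x̄)) stay bounded away from zero.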

Lemma 1 (see [22]) (uniformized KL property). Suppose that f is a proper lower semicontinuous function and Ω is a compact set. If f is constant on Ω and satisfies the KL property at each point of Ω, then there exist , and such thatfor all

Lemma 2 (see [25]) (descent lemma). Let f be a continuously differentiable function whose gradient is Lipschitz continuous with modulus L, then for any x, y, we have f(y) ≤ f(x) + ⟨∇f(x), y − x⟩ + (L/2)‖y − x‖².
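The descent lemma can be checked numerically on a simple instance. The choice f = sin (whose gradient cos is 1-Lipschitz, since |sin''| ≤ 1) and the sample points below are our assumptions for the illustration:

```python
import math

def descent_bound_holds(x, y, L=1.0):
    """Check f(y) <= f(x) + f'(x)(y - x) + (L/2)(y - x)^2 for f = sin,
    whose derivative cos is Lipschitz with modulus L = 1."""
    lhs = math.sin(y)
    rhs = math.sin(x) + math.cos(x) * (y - x) + 0.5 * L * (y - x) ** 2
    return lhs <= rhs + 1e-12  # tiny tolerance for floating-point rounding

# The bound holds at every pair of points, including far-apart ones.
pairs = [(0.0, 1.0), (2.0, -3.0), (0.5, 0.50001), (-1.0, 4.0)]
all_ok = all(descent_bound_holds(x, y) for x, y in pairs)
```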

Lemma 3 (see [26]). Let M be a nonzero matrix and let λ_M denote the smallest positive eigenvalue of MMᵀ. Then, for every u ∈ Im(M), there holds ‖Mᵀu‖² ≥ λ_M‖u‖².

3. Algorithm and Convergence

For the convenience of analysis, we only consider the case of . The obtained results could naturally be generalized to the case of . Thus, in the rest of this paper, we consider the following nonconvex and nonsmooth 3-block optimization problem:where is proper and lower semicontinuous but possibly nonconvex, is differentiable, , and .

In this paper, we present the following algorithm for (12).

Algorithm 1. LBADMM: start with and . With the given iteration point , the new iteration point is given as follows:where , , and are the Bregman distances associated with , , and ϕ, respectively.
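The update pattern of LBADMM — sequential (Gauss–Seidel) minimization of the augmented Lagrangian with a proximal Bregman-type term on each block, followed by a dual ascent step on the multiplier — can be sketched on a toy instance. Everything concrete below is our assumption for the illustration, not the authors' exact scheme: the strongly convex model min ½(x−a)² + ½(y−b)² + ½(z−c)² s.t. x + y + z = d, the Euclidean proximal term with weight μ, the penalty β, and the closed-form scalar subproblem solutions.

```python
def lbadmm_toy(a, b, c, d, beta=1.0, mu=1.0, iters=2000):
    """Gauss-Seidel ADMM with a proximal (Euclidean-Bregman) term on
    min 0.5(x-a)^2 + 0.5(y-b)^2 + 0.5(z-c)^2  s.t.  x + y + z = d.
    Each subproblem is a scalar quadratic, solved in closed form."""
    x = y = z = lam = 0.0
    for _ in range(iters):
        # x-step: argmin 0.5(x-a)^2 + lam*x + beta/2*(x+y+z-d)^2 + mu/2*(x-x_old)^2;
        # the RHS uses the previous x as the proximal center.
        x = (a - lam - beta * (y + z - d) + mu * x) / (1 + beta + mu)
        y = (b - lam - beta * (x + z - d) + mu * y) / (1 + beta + mu)
        z = (c - lam - beta * (x + y - d) + mu * z) / (1 + beta + mu)
        lam += beta * (x + y + z - d)  # dual ascent on the multiplier
    return x, y, z, lam

x, y, z, lam = lbadmm_toy(1.0, 2.0, 3.0, 3.0)
residual = abs(x + y + z - 3.0)  # feasibility residual
```

For this instance the KKT point is (x, y, z) = (0, 1, 2) with multiplier λ = 1, and the iterates approach it while the feasibility residual decays.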

Remark 1. Due to the different structure of the problem, the algorithm in this paper differs from the existing algorithms. In order to exploit the properties of the differentiable blocks and simplify the computation in each iteration, we linearize the differentiable part in the and subproblems. If the function depends only on the variable y, that is, , then the algorithm LBADMM reduces to the Bregman ADMM in [9, 10]. Unlike [9, 10], we do not assume that B has full row rank.
In this section, we always assume that the sequence is generated by algorithm LBADMM. Let , where denotes the smallest positive eigenvalue of .

Assumption 1. (i) is -Lipschitz continuous, i.e., for all (ii) and (iii) are Lipschitz continuous with moduli , respectively(iv) is strongly convex with modulus , and (v)The following lemma establishes the relationship between the dual variable and the original variables.

Lemma 4. For each ,

Proof. By Assumption 1 (ii) and Lemma 3, we haveThe optimality condition of the y-subproblem in (14) yieldsTaking into account , one hasThus,It follows from the abovementioned formula and (17) thatThe proof is completed.
The augmented Lagrangian function with multiplier of (12) is defined aswhere is the Lagrangian function of (12). LetLetThe following lemma implies the monotonicity of the sequence .

Lemma 5. For each ,where

Proof. From (17), we haveAdding up the abovementioned three formulas, we haveand hencethat is,From Lemma 2, Assumption 1 (iv), and Proposition 1, we obtainRecall thatAdding the two formulas above, we haveTogether with (14), we obtainwhich implies thatThat is, (23) holds.

Remark 2. From Assumption 1 (iv), we have and . Furthermore, from Assumption 1 (v), we have .

Lemma 6. If the sequence is bounded, then

Proof. Since is bounded, the sequence is bounded and there exists a subsequence such that Since are lower semicontinuous and is Lipschitz differentiable, the function is lower semicontinuous, which leads toThus, is bounded from below. From Lemma 5, is nonincreasing. Thus, is convergent. Furthermore, is also convergent and for each k. By Lemma 5, we haveFrom the abovementioned formula, we obtainNote that and the arbitrariness of t, we obtainIn view of (14), we haveThus,

Lemma 7. There exists such thatwhere

Proof. From the definition of , we haveFrom (14) and the optimality conditions, one hasThat is,From (43) and (45), we havewhereThus,It follows from Assumption 1 and Lemma 4 that there exists a such thatThe following theorem shows that the algorithm LBADMM has global convergence.

Theorem 1. Let denote the cluster point set of , then(i) is a nonempty compact set, and (ii)If , then (iii) is finite and constant on and equal to

Proof. (i)By the definition of , it is trivial.(ii)Let , then there exists a subsequence of converging to . Since , Since and , LetIt follows from (14) that and . Thus,Noting that and are lower semicontinuous, we haveIt follows from the abovementioned four formulas thatTogether with the continuity of and the closedness of , taking the limit in (45) along the subsequence yieldsThat is, is a critical point of the Lagrange function L of (12).(iii)From (53) and Lemma 5, we haveFrom (55) and the descent of , we obtainTherefore, is constant on . Moreover,
The following theorem is the main result of this paper.

Theorem 2 (strong convergence). Suppose that Assumption 1 holds, satisfies the property at each point of , then(i)(ii) converges to a critical point of

Proof. From Theorem 1, we have for all . We consider two cases.(i)If there exists an integer such that . From Lemma 5, we haveThus, for any we have Hence, for any it follows that and the assertion holds.(ii)Assume that for all . Since , it follows that for any given there exists such that , for all . Since , for given there exists such that , for all . Consequently, when ,Since is a nonempty compact set and is constant on , applying Lemma 1, we haveFrom Lemma 7, one hasFrom the concavity of , we haveThus, associating with Lemma 5 and , we haveFor convenience, we set Thus,That is,By the fact , we obtainwhich along with (64) yieldsSumming up the abovementioned formula for yieldsNotice that ; thus,whereThus,By Lemma 4, one has . Furthermore, . Consequently is a Cauchy sequence. The assertion then follows immediately from Theorem 1.

Remark 3. In this section, the main conclusions are based on the boundedness assumption of the sequences . The following conclusion shows that we only need to assume that the sequence is bounded.

Proposition 3. If and the sequence is bounded, then is bounded.

Proof. From (17), one hasSince is bounded, is bounded. By Assumption 1 (ii) and Lemma 3, we haveThus, is bounded.
Next, we present a sufficient condition for the boundedness of the sequence , which is similar to Lemma 8 in [4].

Lemma 8. Let be the sequence generated by Algorithm 1. Suppose that and there exists such that

Ifthen is bounded.

Proof. From Lemma 5, we know thatThen, combining with , we obtainNoting that , we haveUnder the assumptions, one can easily observe that , and are all bounded. The boundedness of follows from Proposition 3. Therefore, is bounded.

4. Numerical Results

In compressed sensing, a fundamental problem is recovering an n-dimensional sparse signal x from a set of m incomplete measurements. In such a case, one needs to find the sparsest solution of a linear system, which can be modeled aswhere is the measurement matrix, is the observed data, is a regularization parameter, and denotes the number of nonzero elements of x. In general, the abovementioned model is NP-hard. To overcome this difficulty, one can relax the regularization to regularization, and some scholars solve the following problem instead of problem (78) [10, 27]:where

Based on (79), we construct the following problems:

In order to verify the effectiveness of the algorithm LBADMM, we now apply it to solve the nonconvex optimization problem (81). Applying the algorithm LBADMM to problem (81) with and , we havewhere is the half shrinkage operator [27] defined as withwith
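The half shrinkage operator is the proximal mapping of the ℓ₁/₂ penalty. Rather than reproduce the closed-form expression from [27], the sketch below evaluates the underlying scalar proximal problem argminₛ ½(s − t)² + λ|s|^{1/2} by brute-force grid search; the function name, grid size, and search interval are our assumptions for the illustration.

```python
def half_prox_scalar(t, lam, grid=200001, span=None):
    """Numerically evaluate argmin_s 0.5*(s - t)**2 + lam*abs(s)**0.5
    over a fine symmetric grid (an illustrative stand-in for the
    closed-form half shrinkage operator)."""
    span = span if span is not None else max(1.0, 2 * abs(t))
    best_s, best_v = 0.0, 0.5 * t * t  # candidate s = 0 (always feasible)
    for i in range(grid):
        s = -span + 2 * span * i / (grid - 1)
        v = 0.5 * (s - t) ** 2 + lam * abs(s) ** 0.5
        if v < best_v:
            best_s, best_v = v and s, v
        if v < best_v:
            best_s, best_v = s, v
    return best_s

z_small = half_prox_scalar(0.1, 1.0)  # below the threshold: mapped to zero
z_big = half_prox_scalar(10.0, 1.0)   # large input: only slightly shrunk
```

The two sample evaluations illustrate the thresholding behavior that makes the operator useful for sparse recovery: small inputs are set exactly to zero, while large inputs are shrunk only slightly toward zero.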

In the experiment, we choose , and normalize the columns to have unit norm. The variable is generated with 100 nonzero entries, each sampled from a Gaussian distribution. The variables , , , and were initialized to zero. The vector , where . We set , , , and , and the regularization parameter . It is easy to verify that these parameters satisfy Assumption 1 (v). Defining the residual at iteration k as , a reasonable termination criterion is that the residual be small, so we choose the stopping criterion as
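The data generation described above can be sketched as follows (plain Python; the dimensions m = 30, n = 60 and sparsity k = 5 are scaled down from the experiment, and all names are our assumptions):

```python
import random

def make_instance(m=30, n=60, k=5, seed=0):
    """Generate a toy compressed-sensing instance: a Gaussian measurement
    matrix with columns normalized to unit norm, a k-sparse Gaussian
    signal, and the observation b = A @ x."""
    rng = random.Random(seed)
    A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
    for j in range(n):  # normalize column j to unit Euclidean norm
        nrm = sum(A[i][j] ** 2 for i in range(m)) ** 0.5
        for i in range(m):
            A[i][j] /= nrm
    x = [0.0] * n
    for j in rng.sample(range(n), k):  # place k nonzero Gaussian entries
        x[j] = rng.gauss(0.0, 1.0)
    b = [sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
    return A, x, b

A, x, b = make_instance()
col0 = sum(A[i][0] ** 2 for i in range(30)) ** 0.5  # unit column norm
nnz = sum(1 for t in x if t != 0.0)                 # sparsity of the signal
```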

The numerical results are reported in Table 1. The codes were written in MATLAB R2016a, and the program was run on a computer with Windows 10, an Intel(R) Core(TM) i7-6500U 2.5 GHz CPU, and 8 GB of memory. We report the number of iterations (“Iter.”), the computing time in seconds (“Time”), and the objective function value (“f-val”). The numerical results show that the algorithm LBADMM is stable and effective.

A part of the computational results is presented in Figures 1–3. In each figure, we plot the trend of the objective value (“objective-value”) and the trend of the residual defined by (“”).

5. Conclusions

We propose a new algorithm, called linearized Bregman ADMM, for the three-block optimization problem with a nonseparable structure. The proposed algorithm integrates linearization and regularization techniques. We show that any cluster point of the sequence generated by the proposed algorithm is a critical point. Under the condition that the potential function satisfies the Kurdyka–Łojasiewicz property and the penalty parameter is larger than a constant, the strong convergence of the algorithm is proved. Preliminary numerical results show that the algorithm LBADMM is stable and effective.

Data Availability

No data were used to support this study.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the Natural Science Foundation of China (nos. 11601095 and 11771383) and Natural Science Foundation of Guangxi Province (nos. 2016GXNSFBA380185 and 2016GXNSFDA380019).