Abstract

The Bregman Proximal Gradient (BPG) algorithm is a method for minimizing the sum of two convex functions, one of which is nonsmooth. The supercoercivity of the objective function is necessary for the convergence of this algorithm, which precludes its use in many applications. In this paper, we give an inexact version of the BPG algorithm that circumvents the condition of supercoercivity by replacing it with a simple condition on the parameters of the problem. Our study covers the existing results while providing new ones.

1. Introduction

We consider the following minimization problem:
$$\min \{\Phi(x) := f(x) + g(x) : x \in \mathbb{R}^n\}, \qquad (1)$$
where f is a convex proper lower-semicontinuous (l.s.c.) function and g is a convex continuously differentiable function. This problem arises in many applications including compressed sensing [1], signal recovery [2], and the phase retrieval problem [3]. One classical algorithm for solving this problem is the proximal gradient (PG) method:
$$x_{k+1} = \operatorname{prox}_{\lambda_k f}\big(x_k - \lambda_k \nabla g(x_k)\big), \qquad (2)$$
where $\lambda_k > 0$ is the stepsize at each iteration. The proximal gradient method and its variants [4–14] have long been a central topic in the optimization field due to their simple form. A key property required in the analysis of gradient methods is the Lipschitz continuity of the gradient of the smooth part g. However, in many applications, the differentiable function g does not have such a property, e.g., in the broad class of Poisson inverse problems. In [15], by introducing the Bregman distance [16] generated by some reference convex function h, defined by
$$D_h(x, y) = h(x) - h(y) - \langle \nabla h(y), x - y \rangle, \qquad (3)$$
the authors could replace the intricate question of Lipschitz continuity of gradients by a convexity condition that is easy to verify, which we call below the LC property. Thereby, they proposed and studied the algorithm called NoLips, defined by
$$x_{k+1} = \operatorname*{argmin}_{x} \Big\{ f(x) + \langle \nabla g(x_k), x - x_k \rangle + \frac{1}{\lambda_k} D_h(x, x_k) \Big\}, \qquad (4)$$
where $\lambda_k > 0$. When $g \equiv 0$, equation (4) reduces to the Bregman Proximal (BP) iteration studied in [17–21].
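
To make (1)–(4) concrete, the following minimal numerical sketch (ours, not the authors' code) illustrates the Bregman distance (3) and one classical PG step (2) for the illustrative l1-regularized least-squares instance $f(x) = \mu\|x\|_1$, $g(x) = \frac{1}{2}\|Ax - b\|^2$; all data below are synthetic.

import numpy as np

def bregman_distance(h, grad_h, x, y):
    # D_h(x, y) = h(x) - h(y) - <grad h(y), x - y>, cf. (3).
    return h(x) - h(y) - np.dot(grad_h(y), x - y)

def pg_step(x, grad_g, lam, mu):
    # One PG step (2): gradient step on g, then prox of lam*mu*||.||_1
    # (soft-thresholding).
    z = x - lam * grad_g(x)
    return np.sign(z) * np.maximum(np.abs(z) - lam * mu, 0.0)

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
grad_g = lambda x: A.T @ (A @ x - b)

h = lambda x: 0.5 * np.dot(x, x)   # energy; here D_h(x, y) = 0.5*||x - y||^2
grad_h = lambda x: x

x = np.zeros(5)
for _ in range(200):
    # stepsize 1/L with L = ||A||_2^2, the Lipschitz constant of grad g
    x = pg_step(x, grad_g, lam=1.0 / np.linalg.norm(A, 2) ** 2, mu=0.1)
print(bregman_distance(h, grad_h, x, np.zeros(5)))   # = 0.5*||x||^2 here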

In this article, we give an inexact version of the BPG algorithm, in which the subgradient inclusion characterizing (4) is only required to hold up to an error term. While circumventing the condition of supercoercivity required in [15, 22] by replacing it with a simple condition on the parameters of the problem, our study covers the existing results while providing new ones.

Our notation is fairly standard: $\langle \cdot, \cdot \rangle$ is the scalar product on $\mathbb{R}^n$ and $\|\cdot\|$ the associated norm. The closure (relative interior) of a set C is denoted by cl C (ri C, respectively). For any convex function f, we denote by (1) dom f its effective domain, (2) $\partial f$ its subdifferential, (3) argmin f its set of minimizers, and (4) min f its optimal value.

2. Preliminaries

In this section, we present the main convergence results for NoLips.

Definition 1 (see [23]). Let C be a nonempty convex subset of $\mathbb{R}^n$. (i) A convex function h is of Legendre type on C if it verifies the three following conditions: (a) C = int(dom h); (b) h is differentiable on C; (c) $\|\nabla h(x_k)\| \to +\infty$ for any sequence $(x_k)$ of C that converges towards a boundary point of C. (ii) The class of strictly convex functions verifying (a), (b), and (c) is called the class of Legendre functions on C.

Definition 2 (see [23]). Let $F: \mathbb{R}^n \to (-\infty, +\infty]$; we say that F is supercoercive if
$$\lim_{\|x\| \to +\infty} \frac{F(x)}{\|x\|} = +\infty.$$
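
For concreteness (our illustration, not part of the original text), two standard instances of Definition 2:
$$F(x) = \tfrac{1}{2}\|x\|^2: \qquad \frac{F(x)}{\|x\|} = \tfrac{1}{2}\|x\| \to +\infty \quad (\|x\| \to +\infty),$$
so F is supercoercive, whereas
$$F(x) = \|x\|: \qquad \frac{F(x)}{\|x\|} = 1 \not\to +\infty,$$
so the norm is coercive but not supercoercive.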

Consider the following assumptions.

Assumption 1. (i) $h: \mathbb{R}^n \to (-\infty, +\infty]$ is of Legendre type. (ii) $g: \mathbb{R}^n \to (-\infty, +\infty]$ is convex proper l.s.c. with $\operatorname{dom} g \supseteq \operatorname{dom} h$, and g is differentiable on int(dom h). (iii) $f: \mathbb{R}^n \to (-\infty, +\infty]$ is convex proper lower semicontinuous (l.s.c.). (iv) $\operatorname{dom} f \cap \operatorname{int}(\operatorname{dom} h) \neq \emptyset$. (v) We consider the following minimization problem: (P): $\min \{\Phi(x) := f(x) + g(x) : x \in \operatorname{cl}(\operatorname{dom} h)\}$.
Let the operator $T_\lambda$ be defined by
$$T_\lambda(x) = \operatorname*{argmin}_{u} \Big\{ f(u) + \langle \nabla g(x), u - x \rangle + \frac{1}{\lambda} D_h(u, x) \Big\}. \qquad (7)$$

Lemma 1 (well-posedness of the method). Under Assumption 1, suppose one of the following assumptions holds: (i) argmin Φ is nonempty and compact; (ii) h is supercoercive. Then the map $T_\lambda$ defined in (7) is nonempty and single-valued from int(dom h) to int(dom h).

Definition 3. The couple (g, h) verifies a Lipschitz-like/Convexity Condition (LC) if there exists L > 0 with Lh − g convex on int(dom h).
By posing
$$p_\lambda(x) = (\nabla h)^{-1}\big(\nabla h(x) - \lambda \nabla g(x)\big),$$
they showed that
$$T_\lambda(x) = \operatorname{prox}^h_{\lambda f}\big(p_\lambda(x)\big),$$
where $\operatorname{prox}^h_{\lambda f}(y) = \operatorname*{argmin}_u \{\lambda f(u) + D_h(u, y)\}$. The operator $T_\lambda$ thus appears as the composition of two prox-type operators. The NoLips algorithm then becomes
$$x_{k+1} = \operatorname{prox}^h_{\lambda_k f}\big(p_{\lambda_k}(x_k)\big).$$
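
As an illustration of this two-operator composition, here is a sketch under the assumption $h = \frac{1}{2}\|\cdot\|^2$, in which case $\nabla h$ is the identity, the Bregman proximal map reduces to the classical prox, and the composition collapses to the classical PG step (2); the choice $f = \mu\|\cdot\|_1$ is ours, purely for illustration.

import numpy as np

def bregman_gradient_step(x, grad_g, lam):
    # y solves grad h(y) = grad h(x) - lam*grad g(x); with h = 0.5*||.||^2
    # this is the plain gradient step.
    return x - lam * grad_g(x)

def bregman_prox_l1(y, lam, mu):
    # prox^h_{lam f}(y) = argmin_u { lam*f(u) + D_h(u, y) } with
    # f = mu*||.||_1 and h = 0.5*||.||^2: soft-thresholding.
    return np.sign(y) * np.maximum(np.abs(y) - lam * mu, 0.0)

def T(x, grad_g, lam, mu):
    # One NoLips step as the composition of the two operators above.
    return bregman_prox_l1(bregman_gradient_step(x, grad_g, lam), lam, mu)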

Assumption 2. (i) argmin Φ is nonempty and compact, or h is supercoercive. (ii) For every $x \in \operatorname{cl}(\operatorname{dom} h)$ and $\gamma \in \mathbb{R}$, the level set $\{u \in \operatorname{int}(\operatorname{dom} h) : D_h(x, u) \le \gamma\}$ is bounded. (iii) If $(x_k)$ converges to some x in int(dom h), then $D_h(x, x_k) \to 0$. (iv) Reciprocally, if x is in int(dom h) and if $(x_k)$ is such that $D_h(x, x_k) \to 0$, then $x_k \to x$. (v) There exists L > 0 with Lh − g convex on int(dom h) (LC).

Theorem 1 (Global Convergence). Assume that (i) $0 < \lambda_k \le 1/L$ for all k, and (ii) Assumptions 1 and 2 are satisfied. Then, the sequence $(x_k)$ generated by (4) converges to some solution of (P).

Our contribution can be summarized in two essential points:
(1) Improvements of some assumptions:
(a) Whereas f and g are both assumed to be convex in [15, 22], we show that we can weaken this hypothesis by supposing only that Φ = f + g is convex, which allows us to distinguish two interesting cases that have not yet been studied, neither for BPG nor for PG: (i) the nonsmooth part f is possibly not convex and the smooth part g is convex; (ii) the nonsmooth part f is convex and the smooth part g is possibly not convex.
(b) The assumption of [15] is as follows: argmin Φ is compact or h is supercoercive. This is a condition on f and g (see [15]), which precludes the application of NoLips to non-supercoercive functions. In this work, we show that we can circumvent this condition by coupling the LC property with the boundedness of the level sets of the Bregman distance: this is a condition which relates only to the parameter h and which is verified by most of the interesting Bregman distances.
(2) An inexact version of NoLips.

We propose an inexact version of NoLips which verifies a relative error criterion.

The convergence result is established in Section 4. This study covers the convergence results given for PG and BPG, while providing new ones; in particular, the convergence of the inexact version of the interior method with Bregman distance studied in [24], a result that had not been established until now.

We also note that the convergence of NoLips is given under a condition on the domain of h. For this reason, and for the clarity of the hypotheses, we suppose in what follows that int(dom h) = S, with S being an open convex subset of $\mathbb{R}^n$.

3. Main Results

In order to clarify the status of the parameter h, we give the following definitions. Let S be a convex open subset of $\mathbb{R}^n$ and $h: \mathbb{R}^n \to (-\infty, +\infty]$. Let us consider the following hypotheses:
(H1) h is continuously differentiable on S.
(H2) h is continuous and strictly convex on cl S.
(H3) For every $x \in \operatorname{cl} S$ and $\gamma \in \mathbb{R}$, the sets $\{y \in S : D_h(x, y) \le \gamma\}$ are bounded.
(H4) For every $y \in S$ and $\gamma \in \mathbb{R}$, the sets $\{x \in \operatorname{cl} S : D_h(x, y) \le \gamma\}$ are bounded.
(H5) If $(y_k) \subset S$ converges to some $y \in \operatorname{cl} S$, then $D_h(y, y_k) \to 0$.
(H6) If $D_h(x_k, y_k) \to 0$, $y_k \to y \in \operatorname{cl} S$, and $(x_k)$ is bounded, then $x_k \to y$.

Definition 4. (i) $h: \mathbb{R}^n \to (-\infty, +\infty]$ is a Bregman function on S, or "D-function," if h verifies (H1)–(H6). (ii) For all $(x, y) \in \operatorname{cl} S \times S$:
$$D_h(x, y) = h(x) - h(y) - \langle \nabla h(y), x - y \rangle. \qquad (18)$$
Equation (18) is called a Bregman distance if h is a Bregman function. We put the following conditions:

Proposition 1. Let h and verify .

Lemma 2.

Example 1. If $S = \mathbb{R}^n$ and $h(x) = \frac{1}{2}\|x\|^2$, then $D_h(x, y) = \frac{1}{2}\|x - y\|^2$.

Example 2. If $S = \mathbb{R}^n_{++}$ and
$$h(x) = \sum_{j=1}^n x_j \log x_j,$$
with the convention $0 \log 0 = 0$, then for all $(x, y) \in \operatorname{cl} S \times S$:
$$D_h(x, y) = \sum_{j=1}^n \Big( x_j \log \frac{x_j}{y_j} - x_j + y_j \Big) \quad \text{(the Kullback–Leibler divergence)}.$$

Example 3. If $S = \mathbb{R}^n_{++}$ and $h(x) = -\sum_{j=1}^n \log x_j$ (Burg's entropy), then
$$D_h(x, y) = \sum_{j=1}^n \Big( \frac{x_j}{y_j} - \log \frac{x_j}{y_j} - 1 \Big).$$
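
A quick numerical cross-check (ours, assuming the standard closed forms reconstructed in Examples 1–3 above) that these expressions match the generic definition (18):

import numpy as np

def D(h, grad_h, x, y):
    # Generic Bregman distance (18).
    return h(x) - h(y) - np.dot(grad_h(y), x - y)

x = np.array([0.5, 1.5, 2.0])
y = np.array([1.0, 2.0, 0.5])

# Example 1: energy, D_h(x, y) = 0.5*||x - y||^2.
assert np.isclose(D(lambda u: 0.5 * u @ u, lambda u: u, x, y),
                  0.5 * np.sum((x - y) ** 2))

# Example 2: Boltzmann-Shannon entropy, D_h = Kullback-Leibler divergence.
assert np.isclose(D(lambda u: np.sum(u * np.log(u)), lambda u: np.log(u) + 1, x, y),
                  np.sum(x * np.log(x / y) - x + y))

# Example 3: Burg entropy, D_h = Itakura-Saito distance.
assert np.isclose(D(lambda u: -np.sum(np.log(u)), lambda u: -1.0 / u, x, y),
                  np.sum(x / y - np.log(x / y) - 1))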

Proposition 2 (see [19]).

We consider the following minimization problem:
$$\text{(P)}: \quad \min \{\Phi(x) := f(x) + g(x) : x \in \operatorname{cl} S\}.$$

The following assumptions on the problem’s data are made throughout the paper (and referred to as the blanket assumptions).

Assumption 3. (i) h is a Legendre function on S. (ii) $g: \mathbb{R}^n \to (-\infty, +\infty]$ is proper (l.s.c.) with $\operatorname{dom} g \supseteq \operatorname{cl} S$, and is continuously differentiable on S. (iii) $f: \mathbb{R}^n \to (-\infty, +\infty]$ is proper (l.s.c.). (iv) $\operatorname{dom} f \cap S \neq \emptyset$. (v) We consider the operator $T_\lambda$ defined by:
$$T_\lambda(x) = \operatorname*{argmin}_{u} \Big\{ f(u) + \langle \nabla g(x), u - x \rangle + \frac{1}{\lambda} D_h(u, x) \Big\}. \qquad (25)$$
We give in the following a series of lemmas allowing the establishment of Theorem 2, which assures the well-posedness of the method proposed in Section 4.

Lemma 3.

Proof. When, we have

Lemma 4. If the pair (g, h) verifies the condition (LC), then: (i) (ii)

Proof. (i) If Lh − g is convex on S, then Let There exists a sequence such that then, we have Lh − g is continuous in ; then, (ii) Let

Lemma 5. If h is Legendre on S, then $h_\lambda := h - \lambda g$ is also Legendre on S, for all λ such that $0 < \lambda < 1/L$.

Proof. Conditions (a) and (b) of Definition 1 being verified, let us demonstrate that condition (c) is verified too. Let λ be such that $0 < \lambda < 1/L$; then, $h_\lambda$ is strictly convex on S. Indeed, were it not, the strong convexity of h on S would yield a contradiction, which is absurd. Hence, $h_\lambda$ is strictly convex on S.

Lemma 6. Consider the following:

Proof. Since h is a Legendre function on S, $h_\lambda$ is also Legendre. By application of Theorem 26.1 in [23], $h_\lambda$ verifies the following: (i) If , then (ii) If , then

Theorem 2 (well-posedness of the method). We assume that (i) Φ = f + g is convex; (ii) the pair (g, h) verifies the condition (LC); (iii) for every $x \in \operatorname{cl} S$ and $\gamma \in \mathbb{R}$, the sets $\{u \in S : D_h(x, u) \le \gamma\}$ are bounded. Then, the map $T_\lambda$ defined in (25) is nonempty and single-valued from S to S.

Proof. We first show that $T_\lambda(x)$ is nonempty; for this, it is enough to demonstrate that the level sets of the objective in (25), which are closed, are bounded when they are nonempty. This follows from (iii): the level sets are bounded, which shows that the argmin in (25) is nonempty. Let us now show that $T_\lambda(x) \subset S$. Suppose, on the contrary, that a minimizer lies on the boundary of S; from [10], we can then write an optimality inclusion which produces a sequence in contradiction with Lemma 6. Then, $T_\lambda(x) \subset S$.
On the other hand, $h_\lambda$ is strictly convex on S and Φ is convex, so the objective of (25) is strongly convex on S. Then, $T_\lambda(x)$ has a unique value for all $x \in S$.

Remark 1. This result is free of the supercoercivity of Φ and of the simultaneous convexity of f and g, as required by Lemma 2 of [15].

Proposition 3.

Proof. Since , we haveFor (46), just take since

Proposition 4. where

Proof. The first equality is due to Lemma 3. The second is established in [15].

The first equality played a decisive role in the development of this paper.

4. Analysis of the NoLips Algorithm

In this section, we propose an Inexact Bregman Proximal Gradient (IBPG) algorithm, which is an inexact version of the BPG algorithm described in [15, 22]; the IBPG framework allows an error in the subgradient inclusion. We study two algorithms: (i) Algorithm 1: the inexact Bregman Proximal Gradient (IBPG) algorithm without relative error criterion; (ii) Algorithm 2: the inexact Bregman Proximal Gradient (IBPG) algorithm with relative error criterion, which we call NoLips.

We establish the main convergence properties of the proposed algorithms. In particular, we prove their global rate of convergence, showing that they share the sublinear rate of basic first-order methods such as the classical PG and BPG. We also derive the global convergence of the sequence generated by NoLips to a minimizer of (P).

Assumption 4. (i) Assumption 3 is satisfied. (ii) Φ = f + g is convex. (iii) The pair (g, h) verifies the condition (LC). (iv) argmin Φ ≠ ∅.

In our analysis, Φ is supposed to be a convex function; this allows us to distinguish two interesting cases: (i) the nonsmooth part f is possibly not convex, and the smooth part g is convex; (ii) the nonsmooth part f is convex, and the smooth part g is possibly not convex. In what follows, the choice of the sequence $(\lambda_k)$ depends on the convexity of g.
Let $(\lambda_k)$ be a sequence of positive stepsizes chosen as follows: if g is not convex, then we choose the stepsizes in the smaller admissible range; if g is convex, then we choose them in the larger one. Under these conditions, we easily show that, for all k and all u, the inequalities below hold. We pose

Proposition 5. The sequence $(x_k)$ defined by (IBPG) exists and verifies, for all k:

Proof. Existence is deduced directly from (45); the inequality then follows by applying Lemma 2.

Remark 2. This result shows that IBPG is an inexact version of BPG; it is exactly the BPG when the error terms vanish.

Proposition 6. For all ,

Proof. (i) We put in (59); we get (57). (ii) We put in (59); we have

Corollary 1. (i) If , the sequence is nonincreasing. (ii) Summability: if , then and

Proof. (i) From (57), . (ii) From (58), :

In the following, we pose

Proposition 7 (Global Estimate in Function Values). (a) For all ,

Proof. We have ; from (59), we have . From (52), we have , so

This theorem covers the evaluation of the global convergence rate given in [15, 22], as shown by the following corollary.

Corollary 2. We assume that (a) , (b) , and (c) . Then, , i.e.,

Proof. Immediate consequence of Proposition 7.
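
For intuition, we sketch the standard telescoping argument behind sublinear estimates of this kind (illustrative only; the exact constants in Corollary 2 may differ, and we start from the exact, error-free descent property). Suppose that, for every u and every k,
$$\lambda\,\big(\Phi(x_{k+1}) - \Phi(u)\big) \;\le\; D_h(u, x_k) - D_h(u, x_{k+1}).$$
Summing this inequality for $k = 0, \dots, K-1$ at $u = x^\ast \in \operatorname{argmin} \Phi$, and using the monotonicity of $(\Phi(x_k))$, gives
$$K\lambda\,\big(\Phi(x_K) - \Phi(x^\ast)\big) \;\le\; \sum_{k=0}^{K-1} \lambda\,\big(\Phi(x_{k+1}) - \Phi(x^\ast)\big) \;\le\; D_h(x^\ast, x_0),$$
hence $\Phi(x_K) - \Phi(x^\ast) \le D_h(x^\ast, x_0)/(K\lambda) = O(1/K)$.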

Now, we derive a global convergence of the sequence generated by Algorithm 1 to a minimizer of (P).

Theorem 3. We assume that one of the following assumptions holds: (i) such that , ; (ii) the sequence $(\Phi(x_k))$ is nonincreasing and . Then, (a) and (b)

Proof. (a) Suppose (i). Let , and put in (69); we have (68) and (69). Suppose (ii). For in (63), we have , so . (b) , so . Then, is bounded, and from , is bounded as well. Let ; there exists then a subsequence of such that . From , . On the other hand, , so . Indeed, , which shows that . Then, . Since and , we have . We have , so ; then, . And from , we have .
The IBPG algorithm generates a sequence $(x_k)$ such that $(\Phi(x_k))$ is not necessarily nonincreasing; for this reason, and to improve the global estimate in function values, we now propose NoLips, which is an inexact version of BPG with a relative error criterion. Let σ be given as follows.
In what follows, we will derive a convergence rate result (Theorem 4) for the NoLips framework. First, we need to establish a few technical lemmas.

In the following, denotes the sequence generated by NoLips.

Lemma 7. For every , for all ,

Proof. Since , we have from (62) . From Algorithm 2, we have . From the condition (LC), we have

(1) Input:
(2) For with we obtain
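
Since the precise relative error test of Algorithm 2 is not reproduced above, the following schematic loop is only a plausible sketch in the spirit of relative-error proximal methods: we assume the inner solver reports an error measure for the subgradient inclusion and that the test bounds it by σ times the Bregman progress $D_h(x_{k+1}, x_k)$; the names inner_solve and D_h are hypothetical placeholders, not the authors' notation.

def inexact_nolips(x0, inner_solve, D_h, sigma, lam, max_iter=100):
    # Schematic only: 'inner_solve' approximately computes the BPG step at x
    # and returns a trial point together with an error estimate 'err'; the
    # step is accepted once the relative error criterion (assumed form)
    # err <= sigma * D_h(x_new, x) holds.
    x = x0
    for _ in range(max_iter):
        x_new, err = inner_solve(x, lam)
        while err > sigma * D_h(x_new, x):
            x_new, err = inner_solve(x, lam)   # refine the inner solution
        x = x_new
    return x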

Remark 3. We now notice that $(\Phi(x_k))$ is nonincreasing: it suffices to replace u by $x_k$ in (79).

Lemma 8. For every and , we have

Proof. Replacing u by in (79), and since and , we have . Replacing u by in (79), we have . Since , by adding (84) and (85), we obtain (83).

Lemma 9. For every

Proof. This result is obtained by adding inequality (83) from 1 to k.

We are now ready to state the convergence rate result for the NoLips framework. This result improves and completes the one given in Proposition 7.

Theorem 4. For every , the following statements hold, where we pose :

Proof. From (86) and since and , we immediately have (87) and (88).

Corollary 3. Consider an instance of the NoLips framework with for every . Then, for every , the following statements hold:

Proof. Combined with (87), we obtain (90); combined with (88), we obtain (90).

Remark 4. Estimates (87) and (89) represent exactly the convergence rate established in [15, 22]. Estimate (90) is a new result not established in [15, 22]; it shows that the corresponding quantity converges to zero at a rate of $O(1/k)$.

Theorem 5. If , then(a)(b)

Proof. Replacing u by in (81), we have . By adding inequality (92) from 1 to k, we have , so . From (69), we have . Since is nonincreasing and (94) holds, by applying Theorem 3 (ii), we obtain the results of Theorem 5.

5. Application to Nonnegative Linear Inverse Problem

In Poisson inverse problems (e.g., [25, 26]), we are given a nonnegative observation matrix $A \in \mathbb{R}^{m \times n}$ and a noisy measurement vector $b \in \mathbb{R}^m_{+}$, and the goal is to reconstruct a signal $x \ge 0$ such that $Ax \simeq b$. We can naturally adopt the Kullback–Leibler divergence to measure the residuals between two nonnegative points, with
$$D_{KL}(b, Ax) = \sum_{i=1}^m \Big( b_i \log \frac{b_i}{\langle a_i, x \rangle} - b_i + \langle a_i, x \rangle \Big),$$
where $a_i$ denotes the ith row of A.

In this section, we propose an approach for solving the nonnegative linear inverse problem defined by
$$\min \{ D_{KL}(b, Ax) : x \ge 0 \}.$$

We take $g(x) = D_{KL}(b, Ax)$ and, as reference function, Burg's entropy $h(x) = -\sum_{j=1}^n \log x_j$, with $S = \mathbb{R}^n_{++}$.

It is shown in [15] that the couple (g, h) verifies a Lipschitz-like/Convexity Condition (LC) on $\mathbb{R}^n_{++}$ for any L such that $L \ge \|b\|_1$, where $a^j$ denotes the jth column of A (assumed nonzero for every j).

For this choice of parameters, Theorem 3 is applicable and guarantees the global convergence of Algorithm 1 to an optimal solution of the problem.

(1)Input:
(2)Choose and find and such that
(3)Set and go to step 1

Given $x_k$, the iteration amounts to solving the one-dimensional problem:

For $j = 1, \ldots, n$,
$$x_{k+1}^j = \frac{x_k^j}{1 + \lambda_k\, x_k^j\, \nabla g(x_k)_j},$$
where $x_k^j$ is the jth component of $x_k$.
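
A hedged numerical sketch of this scheme on synthetic data, assuming (per the reconstruction above) Burg's entropy as reference function, $g(x) = D_{KL}(b, Ax)$, and the stepsize bound $\lambda = 1/\|b\|_1$ in the spirit of [15]; the data below are ours and purely illustrative.

import numpy as np

rng = np.random.default_rng(1)
m, n = 30, 10
A = rng.uniform(0.1, 1.0, (m, n))     # nonnegative observation matrix
x_true = rng.uniform(0.5, 2.0, n)
b = A @ x_true                        # noiseless measurements

def grad_g(x):
    # Gradient of x -> D_KL(b, Ax): A^T (1 - b / (Ax)).
    Ax = A @ x
    return A.T @ (1.0 - b / Ax)

lam = 1.0 / np.sum(b)                 # stepsize 1/L with L = ||b||_1 (assumed bound)
x = np.ones(n)
for _ in range(5000):
    # Coordinatewise closed-form Bregman step with Burg's entropy:
    # -1/x_new = -1/x - lam * grad g(x), i.e., x_new = x / (1 + lam*x*grad g(x)).
    x = x / (1.0 + lam * x * grad_g(x))

# KL residual; should be near 0 on this noiseless instance.
print(np.sum(b * np.log(b / (A @ x)) - b + A @ x))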

6. Conclusion

The proposed algorithms constitute a unified framework for the existing algorithms BPG, BP, and PG, while providing new ones, in particular, the inexact version of the interior method with Bregman distance studied in [24]. More precisely:
(i) when the errors vanish, our algorithm is the NoLips studied in [15, 22];
(ii) when g = 0, our algorithm is the inexact version of the Bregman Proximal (BP) studied in [19];
(iii) when g = 0 and the errors vanish, our algorithm is the Bregman Proximal (BP) studied in [17, 21];
(iv) when f = 0, our algorithm is the inexact version of the interior method with Bregman distance studied in [24];
(v) when f = 0 and the errors vanish, our algorithm is the interior method with Bregman distance studied in [24];
(vi) when $h = \frac{1}{2}\|\cdot\|^2$, our algorithm is the proximal gradient method (PG) and its variants [4–14, 27].

Our analysis is different from, and simpler than, the one given in [15, 22], and allows us to weaken some hypotheses, in particular, the supercoercivity of Φ as well as the simultaneous convexity of f and g.

Data Availability

No data were used to support this study.

Conflicts of Interest

The authors declare that they have no conflicts of interest.