Convergence Analysis on an Accelerated Proximal Point Algorithm for Linearly Constrained Optimization Problems

Lu, Sha; Wei, Zengxin

doi:https://doi.org/10.1155/2020/8873507

Mathematical Problems in Engineering

On this page

Abstract Introduction Conclusions Data Availability Conflicts of Interest Acknowledgments References Copyright Related Articles

Special Issue

Machine Learning and its Applications in Image Restoration

View this Special Issue

Research Article | Open Access

Volume 2020 | Article ID 8873507 | https://doi.org/10.1155/2020/8873507

Convergence Analysis on an Accelerated Proximal Point Algorithm for Linearly Constrained Optimization Problems

Sha Lu^1,2and Zengxin Wei³

Academic Editor: Weijun Zhou

Received01 Sept 2020

Revised17 Oct 2020

Accepted20 Oct 2020

Published10 Nov 2020

Abstract

Proximal point algorithm is a type of method widely used in solving optimization problems and some practical problems such as machine learning in recent years. In this paper, a framework of accelerated proximal point algorithm is presented for convex minimization with linear constraints. The algorithm can be seen as an extension to Gler’s methods for unconstrained optimization and linear programming problems. We prove that the sequence generated by the algorithm converges to a KKT solution of the original problem under appropriate conditions with the convergence rate of .

1. Introduction

Proximal point algorithm (abbr. PPA) is a type of method widely used in solving optimization problems, fixed point problems, maximal monotone operator problems, and so on. The framework of the proximal point method is closely related to many algorithms. It even can do more interpretation and generalization of some other methods and algorithms. In recent years, combining the idea of the proximal point method or the proximal terms with some existing algorithms shows that it can improve the performance of the original algorithms in a certain extent. The main step of the PPA is to compute a subproblem consisting of proximal point operator. In some practical problems with suitable conditions, the proximal point subproblems may be expressed as convex optimization problems with smaller scales or better properties which make them even have closed form solutions. In recent years, the proximal point method, together with its relative models, and several types of PPAs have been used in machine learning, image recognition, signal processing, and so on [1–3].

The original PPA can be seen in Martinet [4, 5] about the research of the fixed point problem and the variational inequality. For any given in a Hilbert space, Martinet [4] gave an iterative sequence for variational inequality problem by a PPA based on the following inclusion relation:where is a sequence of positive integers and is a maximal monotone operator. Later, in the unconstrained minimization of , where is a proper lower semicontinuous convex function in a Hilbert space, Rockafellar [6, 7] presented a more practical PPA which had the iteration as

If , for all , it was proved that the PPA weakly converges to a zero solution of the maximal monotone operator . And when , the convergence rate is superlinear. In 1991, Gler [8] further discussed the PPA and its convergent property. He proved that, under weaker conditions, that is, when satisfies , the PPA is convergent. Furthermore, its global convergence rate can be given as

For solving the convex programming problem with Lipschitz continuous gradients, Nesterov [9] gave an accelerated gradient method, which added a step to the gradient descent direction in order to reduce the times of iterations and obtained the iterative estimation of general first-order methods. The idea of acceleration was then introduced into other optimization algorithms, such as Gler [10], who gave two new PPAs by adding auxiliary point series and adopting acceleration strategy. Compared with (3), his accelerated algorithms obtained better convergence rate:

Gler [11] also applied this method to linear programming and proposed new augmented Lagrangian multiplier algorithms. While the iteration subproblems were solved inexactly, the algorithms not only maintained the global convergent property but also provided faster global convergence rates and can terminate in finite iterations to obtain the primal and dual optimal solutions. In the literature of Birge et al. [12], the authors further generalized Gler’s accelerated proximal point method to unconstrained nonsmooth convex optimization problems. They gave a model algorithm and then presented a family of PPAs which generated under a different rule from the classical PPA. Under weaker conditions, the estimation of the global convergence rate is obtained. They also discussed the application of PPA on stochastic programming and the variant proximal bundle method. More research studies on the proximal point method can be found in the review of Parikh and Boyd [13]. In recent years, much effort has been made to accelerate various first-order methods and the convergence analysis for linearly constrained convex optimization. For example, Ke and Ma [14] proposed an accelerated augmented Lagrangian method for solving the linearly constrained convex programming and showed its convergence rate is . Xu [15] proposed two accelerated methods for solving structured linearly constrained convex programming and discussed their convergence rate under different conditions. Zhang et al. [16] applied the proximal method of multipliers for equality constrained optimization problems and proved that, under linear independence constraint qualification and the second-order sufficiency optimality condition, it is linearly convergent. And when the penalty parameter increases to , it is superlinear.

Now, we consider an accelerated PPA for solving convex optimization problem with linear equality constraints:where is a proper lower semicontinuous (not necessary smooth) convex function and , . Our main work is as follows. (1) By using the Lagrange function, the KKT system, and the original-dual relations, we construct appropriate auxiliary sequences by quadratic convex functions and auxiliary points . Then, we update and the Lagrange multipliers, respectively, to extend the accelerated PPA to general convex optimization problems. (2) In the extended algorithm, the parameter which is related to the convergence rate is updated with an introduced constant . The update of in Gler’s algorithm can be seen as a special case with . (3) When the iteration subproblems are solved exactly, the algorithm has the convergence rate of in terms of the objective residual of the associated Lagrange function.

The remaining parts of this paper are organized as follows. In Section 2, a framework of the accelerated PPA is presented for constrained optimization problem (5). In Section 3, the global convergence is established under mild assumptions. In Section 4, the convergence rate based on the function values is given. In Section 5, we conclude this paper with some remarks.

2. An Accelerated PPA for Constrained Convex Optimization

For an unconstrained optimization problem,the classical PPA generates the next iteration point by

And the main accelerating idea of Nesterov [9] and Gler [10] is to construct a sequence of auxiliary quadratic convex functions which can be seen as estimations of but with better functional states. While increases, the difference between the auxiliary functions and the original objective function is compressed such that, for any ,

Then, in each iteration of the algorithm, it produces satisfyingwhere is the minimal point of . Thus, while the minimal point exists, since is proper, it obtains

Since , the sequence minimizes and the speed that converges to is related to .

The key steps in the accelerated algorithm are the construction of the auxiliary quadratic convex functions with suitable estimation on , the producing of satisfying (15), and the selection of the compressing factor . Inspired by Nesterov’s estimate sequences and the PPA given by [10], we consider the constrained optimization problem (5). It is known that the dual problem of (5) iswhere is the Fenchel conjugate function of . And the augmented Lagrangian function associated with (5) is defined aswhere is the multiplier and is a penalty parameter. To simplify the discussion, we denote the proximal point about function at a given as

Firstly, we construct a series of quadratic regular functions for (12) before the accelerated PPA is given. For given , and , let

Since is convex and is a quadratic regular function, for any , can be written in the canonical form:where means the minimizer of .

On the contrary, from (15), we have

Comparing the second terms of (16) and (17), it implies

It is not hard to see that the minimum values and the minimizers between two adjacent quadratic functions have the following relationships:and

If we generate the iteration points by for , the following lemma shows the relationship between the Lagrange function, the proximal point , the constructed quadratic estimation function , and its minimizer .

Lemma 1. Suppose for , , , , , and are defined as previously mentioned. Let ,

Then, we have

Proof. We prove it by induction. While , (22) is obvious. Now, assumeSince is convex, is strong convex, from the definition of ; then, we haveBy convexity, for any , it hasThen, by (23),Substitute (18) and (26) into (20), it impliesNotice that, in the last term of (27), by using , it turns outThus, by (20), we gainSubstituting this formula into (27), it deducesBy using of , , and , we haveFrom Lemma 1, can be seen as an upper bound estimation of the Lagrange function on . , , , and all can be updated by explicit formulas. Thus, we give the accelerated PPA for the constrained optimization problem (5) as follows.
Accelerated PPA for constrained convex optimization :(i)Step 0: let . Choose . Let .(ii)Step 1: let ,(iii)Step 2: compute the proximal point and the updating(iv)Step 3: let and go to Step 1.

Remark 1. It is obvious that by the definition. Actually, from (10), the convergence of the algorithm relates to . In the later Theorem 1, it needs to assume the nonzero lower bound such that . In particular, when , it obtains .

Remark 2. In the iterations, can be computed from (34) or (35), which has the same result though they have different quadratic terms. We will use these two different subproblem forms in later discussion on convergence for convenience. One is used to prove the sequence converges to the KKT solution and the other (35), which is more suitable for measuring changes in function values and the constraints, is used in the analysis on the convergence rate.

Remark 3. In the problem (35), scaled proximal terms also can be used such as , where is a positive semidefinite matrix. For some structured function and appropriately chosen , the augmented term can be linearized, and it makes (35) have a closed form solution. Or in practice, we can solve the problem inexactly, but it may not retain convergence rate, which we will show in the following sections under the same assumptions. Further discussion will be out of the scope of this paper, and we leave the discussion on inexactly solving cases and their numerical experiments to another paper.

3. Global Convergence

Assumption 1. Assume that is a proper lower semicontinuous convex function and is a feasible point of (5).
From Corollaries 28.2.2 and 28.3.1 of [17], under Assumption 1, the KKT set of (5) is nonempty. Moreover, is the solution of (5) iff there exists such that satisfies the KKT system:Denote , , and . We have the lemmas as follows.

Lemma 2. Suppose that is generated by C-APPA and is a KKT solution of (5). Then, for , it has

Proof. By the first-order optimization condition of (34) and together with (36), we haveSince is proper lower semicontinuous convex function and by Theorem 12.17 of Rockalellar and Wets [18], is a maximal monotone operator and there exists a self-adjoint positive semidefinite linear operator such that, for any with and ,Notice that and satisfy the KKT system, and it impliesBy using of (36) and (39), it is easy to know thatThus,By (36) and (39) in the algorithm, we haveandThen,From the basic relations,it deducesSince , we gain (41) by (53).

Lemma 3. Suppose that is generated by C-APPA and is a KKT solution of (5). Then, for it has

Proof. Fromand together with (42), (43), and (45), we haveThen,By (55), it deducesFrom (50) and (55), we can obtainAlso by using of (50), it hasSince for all and by (59), (60), and (50), it deducesThus, (54) is true.
Now, for , we construct the following and turn to the analysis of the convergence of the algorithm:The theorem shows that the sequence decreases monotonically under appropriate conditions, and then C-APPA is convergent since for all .

Theorem 1. Suppose the solution set of (5) is nonempty. Denote as one solution satisfying the KKT system. Let be defined as (62) and is generated by C-APPA. Then,Furthermore, for all , if there exists and such that and , then

Proof. From Lemmas 1 and 2, we can substitute (49), (41), and (54) into the left side of inequality (48). Sincethen by using of (62) and (48), we obtainwhich deduces the sequence is nonnegative and monotone nonincreasing. When the sequence is infinite, are all bounded. Let , and by (66), we haveSince and are the lower bounds of and , respectively, it can be known that , , , , , , and are all bounded. By and , it has is bounded. Also, is bounded. We denote the convergent subsequence of asBy (42) and (43), it hasThen, by taking limits of on both sides of (69) simultaneously, it deducesAssume for general that . Thus, by the definition of , there must exist subsequence such thatBecause is nonincreasing and bounded, converges to 0. By (62), it hasfor . Thus, by , it obtainsAlso, by , we haveFromwe get .

4. The Convergence Rate of C-APPA

Suppose that (34) is solved exactly in the iteration and we denote . Let

Lemma 4. Suppose that , , and are generated by algorithm C-APPA. Then, for all , it haswhere

Proof. By (78), it hasBecause is the minimizer of (34), we haveThen, by the definition of , it impliesFrom the convexity of , for any , we haveSubstituting it into (81), then (79) can be proved.
The following lemma shows that, for any , if is the minimizer of and we write as the canonical formthen is the upper bound of on .

Lemma 5. Suppose that and are defined as (76)–(78), respectively, and the other sequences are generated by algorithm C-APPA. Then, , and for all , it has

Proof. We prove it by induction. By (77) and , (86) is true for . Now, suppose for , . Then, by (78), (85), (37), and (38), it impliesSince is convex and , thenSubstituting it into (87) and by using of (33), we haveNotice that and are defined as (37) and (33), respectively, and it deduceswhich meansThe following bounded conclusion of is similar to the one of given in Lemma 2.2 by Gler [10], but it introduces a parameter in order to have more flexibility. So, here we omit the proving detail for concision.

Lemma 6. Let . Then, for all ,Now, we give the main convergence conclusion of C-APPA.

Theorem 2. Suppose that the solution set of (5) is nonempty and bounded. Denote as a solution satisfying the KKT system and . Let be an initial feasible point and is generated by algorithm C-APPA.(1)Denote . Then, it has(2)If, for any , there exists and such that and , then

Proof. Let , andIt can be seen thatBy (79), it deducesAlso by Lemma 5, we haveThen, by using of (76)–(78), it obtainsFrom (81), we havewhich obtainsBy (36) and (39), we deduceThus,From (99), it hasWhen the assumptions are true, converges to a certain KKT solution from Theorem 1. Furthermore, from Lemma 6, it obtainsWhile , it holds that

5. Conclusions

In this paper, a framework of accelerated PPA is presented for constrained convex optimization. The algorithm can be seen as an extension to Gler’s methods for unconstrained optimization and linear programming problems. Different from [11], where the accelerated proximity point method is used on the dual problem of the linear programming, our algorithm directly uses the accelerating technique both on the proximal point operator and the Lagrange multiplier for the original problem in the iterations by constructing appropriate auxiliary sequences with the quadratic convex functions and auxiliary points. We prove that the sequence generated by the algorithm converges to a KKT solution under suitable conditions. Also, it retains a similar convergence rate on objective function values.

This paper has paid more attention on analyzing the convergence of the accelerated PPA for solving convex minimization with linear constrains. In the iterations of C-APPA, computing the proximal mapping of is required. Usually, the proximal point of is hard to know exactly. Then, series of , and other auxiliary quantities being updated and calculated by a nonprecision rule should be considered. Several inexactly accelerated PPAs, the convergent properties under some proper conditions, and the numerical results are going to be given in another paper of ours. And more research on different inexact conditions, their further applications, and testings in practice would be further carried out in the future.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest regarding the publication of this paper.

Acknowledgments

The authors express the gratitude for the cooperation of Congzuo Municipal Bureau of Statistics during this research. Also, this work was supported in part by the Key Project of Guangxi Provincial Natural Science Foundation under Grants 2017GXNSFDA198014, 2018GXNSFDA050014, and 2019AC04004.

References

P. Ravikumar, A. Agarwal, and M. Wainwright, “Message-passing for graphstructured linear programs: proximal methods and rounding schemes,” Journal of Machine Learning Research, vol. 11, pp. 1043–1080, 2010.
View at: Google Scholar
N. Pustelnik, J.-C. Pesquet, and C. Chaux, “Relaxing tight frame condition in parallel proximal methods for signal restoration,” IEEE Transactions on Signal Processing, vol. 60, no. 2, pp. 968–973, 2012.
View at: Publisher Site | Google Scholar
J. Bai, J. Li, P. Dai et al., “General parameterized proximal point algorithm with applications in statistical learning,” International Journal of Computer Mathematics, vol. 96, no. 1-4, pp. 199–215, 2019.
View at: Publisher Site | Google Scholar
B. Martinet, “Régularisation d’Inéquations variation nelles par approximations successives,” French Journal of Computing and Operational Research, vol. 4, 1970.
View at: Google Scholar
B. Martinet, “Determination approche d’un point fixe d’une application pseudo-contractante,” Comptes rendus de l’Académie des Sciences, vol. 274A, pp. 163–165, 1972.
View at: Google Scholar
R. Rockafellar, “Monotone operators and the proximal point algorithm,” SIAM Journal on Control and Optimization, vol. 14, no. 5, pp. 877–898, 1976.
View at: Publisher Site | Google Scholar
R. Rockafellar, “Augmented Lagrangians and applications of the proximal point algorithm in convex programming,” Mathematics of Operations Research, vol. 1, no. 2, pp. 97–116, 1976.
View at: Publisher Site | Google Scholar
O. GÜler, “On the convergence of the proximal point algorithm for convex minimization,” SIAM Journal on Control and Optimization, vol. 29, no. 2, pp. 403–419, 1991.
View at: Google Scholar
Y. Nesterov, “A method for solving the convex programming problem with convergence rate ,” Doklady Akademii Nauk: SSSR, vol. 269, no. 3, pp. 543–547, 1983.
View at: Google Scholar
O. GÜler, “New proximal point algorithms for convex minimization,” SIAM Journal on Optimization, vol. 2, no. 4, pp. 649–664, 1992.
View at: Google Scholar
O. GÜler, “Augmented Lagrangian algorithms for linear programming,” Journal of Optimization Theory and Applications, vol. 75, no. 3, pp. 445–470, 1992.
View at: Google Scholar
J. Birge, L. Qi, and Z. Wei, “A general approach to convergence properties of some methods for nonsmooth convex optimization,” Applied Mathematics and Optimization, vol. 38, no. 2, pp. 141–158, 1998.
View at: Google Scholar
N. Parikh and S. Boyd, “Proximal algorithms,” Foundations and Trends in Optimization, vol. 1, no. 3, pp. 127–239, 2014.
View at: Publisher Site | Google Scholar
Y.-f. Ke and C.-f. Ma, “An accelerated augmented Lagrangian method for linearly constrained convex programming with the rate of convergence ,” Applied Mathematics-A Journal of Chinese Universities, vol. 32, no. 1, pp. 117–126, 2017.
View at: Publisher Site | Google Scholar
Y. Xu, “Accelerated first-order primal-dual proximal methods for linearly constrained composite convex programming,” SIAM Journal on Optimization, vol. 27, no. 3, pp. 1459–1484, 2017.
View at: Publisher Site | Google Scholar
Y. Zhang, J. Wu, and L. Zhang, “The rate of convergence of proximal method of multipliers for equality constrained optimization problems,” Optimization Letters, vol. 14, no. 6, p. 1599, 2019.
View at: Publisher Site | Google Scholar
R. Rockafellar, Convex Analysis, Princeton University Press, Princeton, NY, USA, 1970.
R. Rockafellar and R. Wets, Variational Analysis, Springer, Berlin, Germany, 1998.

Copyright

Copyright © 2020 Sha Lu and Zengxin Wei. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

PDF Download Citation

Download other formats

Order printed copies

Views

439

Downloads

550

Citations