Mathematical Problems in Engineering

Special Issue

Machine Learning and its Applications in Image Restoration


Research Article | Open Access

Volume 2020 | Article ID 8873507

Sha Lu, Zengxin Wei, "Convergence Analysis on an Accelerated Proximal Point Algorithm for Linearly Constrained Optimization Problems", Mathematical Problems in Engineering, vol. 2020, Article ID 8873507, 13 pages, 2020.

Convergence Analysis on an Accelerated Proximal Point Algorithm for Linearly Constrained Optimization Problems

Academic Editor: Weijun Zhou
Received: 01 Sep 2020
Revised: 17 Oct 2020
Accepted: 20 Oct 2020
Published: 10 Nov 2020


The proximal point algorithm is a class of methods widely used in recent years for solving optimization problems and practical problems such as machine learning. In this paper, a framework of accelerated proximal point algorithms is presented for convex minimization with linear constraints. The algorithm can be seen as an extension of Güler's methods for unconstrained optimization and linear programming problems. We prove that the sequence generated by the algorithm converges to a KKT solution of the original problem under appropriate conditions, with a convergence rate of O(1/k^2).

1. Introduction

The proximal point algorithm (abbr. PPA) is a class of methods widely used for solving optimization problems, fixed-point problems, maximal monotone operator problems, and so on. The framework of the proximal point method is closely related to many algorithms and can even be used to interpret and generalize other methods. In recent years, combining the idea of the proximal point method, or proximal terms, with existing algorithms has been shown to improve the performance of the original algorithms to a certain extent. The main step of the PPA is to solve a subproblem involving the proximal point operator. In practical problems with suitable structure, the proximal point subproblems may be expressed as convex optimization problems of smaller scale or with better properties, which may even admit closed-form solutions. In recent years, the proximal point method, together with related models and several types of PPAs, has been used in machine learning, image recognition, signal processing, and so on [1-3].

The original PPA can be traced back to Martinet [4, 5] in research on the fixed-point problem and the variational inequality. For any given x^0 in a Hilbert space, Martinet [4] gave an iterative sequence {x^k} for the variational inequality problem by a PPA based on the following inclusion relation:

0 ∈ c_k T(x^{k+1}) + (x^{k+1} - x^k), (1)

where {c_k} is a sequence of positive numbers and T is a maximal monotone operator. Later, for the unconstrained minimization of f, where f is a proper lower semicontinuous convex function on a Hilbert space, Rockafellar [6, 7] presented a more practical PPA with the iteration

x^{k+1} = arg min_x { f(x) + (1/(2 c_k)) ||x - x^k||^2 }. (2)
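As a concrete illustration of the classical proximal iteration above, the sketch below (a toy example of our own, not from the paper) applies the PPA to f(x) = |x|, whose proximal map has the well-known closed-form soft-thresholding solution:

```python
import numpy as np

# Proximal map of f(x) = |x|: prox_{c f}(z) = argmin_x { |x| + (1/(2c))(x - z)^2 },
# which reduces to the soft-thresholding operation below.
def prox_abs(z, c):
    return np.sign(z) * np.maximum(np.abs(z) - c, 0.0)

# Classical PPA iteration x^{k+1} = prox_{c_k f}(x^k) with constant c_k = 1.
x = 5.0
for k in range(10):
    x = prox_abs(x, 1.0)

print(x)  # the iterates reach the minimizer x* = 0 of |x|
```

Each iteration shrinks the iterate toward 0 by c_k, so after finitely many steps the minimizer itself is attained for this simple f.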

If c_k >= c > 0 for all k, it was proved that the PPA converges weakly to a zero of the maximal monotone operator ∂f. When c_k -> ∞, the convergence rate is superlinear. In 1991, Güler [8] further discussed the PPA and its convergence properties. He proved that the PPA converges under the weaker condition that {c_k} satisfies sum_k c_k = ∞. Furthermore, its global convergence rate can be given as

f(x^k) - min f = O(1 / sum_{i=0}^{k-1} c_i). (3)

For solving convex programming problems with Lipschitz continuous gradients, Nesterov [9] gave an accelerated gradient method, which adds an extrapolation step to the gradient descent direction in order to reduce the number of iterations and attains the optimal iteration complexity estimate of general first-order methods. The idea of acceleration was then introduced into other optimization algorithms; for example, Güler [10] gave two new PPAs by adding auxiliary point sequences and adopting an acceleration strategy. Compared with (3), his accelerated algorithms obtain the better convergence rate

f(x^k) - min f = O(1 / (sum_{i=0}^{k-1} sqrt(c_i))^2). (4)

Güler [11] also applied this method to linear programming and proposed new augmented Lagrangian multiplier algorithms. Even when the iteration subproblems are solved inexactly, the algorithms not only maintain the global convergence property but also provide faster global convergence rates and can terminate in finitely many iterations to obtain the primal and dual optimal solutions. Birge et al. [12] further generalized Güler's accelerated proximal point method to unconstrained nonsmooth convex optimization problems. They gave a model algorithm and then presented a family of PPAs generated under a rule different from that of the classical PPA. Under weaker conditions, an estimation of the global convergence rate was obtained. They also discussed the application of the PPA to stochastic programming and a variant of the proximal bundle method. More research on the proximal point method can be found in the review of Parikh and Boyd [13]. In recent years, much effort has been devoted to accelerating various first-order methods and to the convergence analysis for linearly constrained convex optimization. For example, Ke and Ma [14] proposed an accelerated augmented Lagrangian method for solving linearly constrained convex programming and showed that its convergence rate is O(1/k^2). Xu [15] proposed two accelerated methods for solving structured linearly constrained convex programming and discussed their convergence rates under different conditions. Zhang et al. [16] applied the proximal method of multipliers to equality constrained optimization problems and proved that, under the linear independence constraint qualification and the second-order sufficiency optimality condition, it is linearly convergent; when the penalty parameter increases to +∞, the convergence is superlinear.

Now, we consider an accelerated PPA for solving the convex optimization problem with linear equality constraints:

min_x f(x)  s.t.  Ax = b, (5)

where f is a proper lower semicontinuous (not necessarily smooth) convex function, A ∈ R^{m×n}, and b ∈ R^m. Our main work is as follows. (1) By using the Lagrange function, the KKT system, and the primal-dual relations, we construct appropriate auxiliary sequences of quadratic convex functions and auxiliary points. Then, we update the iterates and the Lagrange multipliers, respectively, to extend the accelerated PPA to general convex optimization problems. (2) In the extended algorithm, the parameter that is related to the convergence rate is updated with an introduced constant, and the update rule in Güler's algorithm can be seen as a special case. (3) When the iteration subproblems are solved exactly, the algorithm has a convergence rate of O(1/k^2) in terms of the objective residual of the associated Lagrange function.

The remaining parts of this paper are organized as follows. In Section 2, a framework of the accelerated PPA is presented for constrained optimization problem (5). In Section 3, the global convergence is established under mild assumptions. In Section 4, the convergence rate based on the function values is given. In Section 5, we conclude this paper with some remarks.

2. An Accelerated PPA for Constrained Convex Optimization

For an unconstrained optimization problem

min_x f(x), (6)

the classical PPA generates the next iterate by

x^{k+1} = arg min_x { f(x) + (1/(2 c_k)) ||x - x^k||^2 }. (7)

The main accelerating idea of Nesterov [9] and Güler [10] is to construct a sequence of auxiliary quadratic convex functions φ_k, which can be seen as estimations of f but with better functional properties. As k increases, the difference between the auxiliary functions and the original objective function is compressed so that, for any x,

φ_k(x) <= (1 - λ_k) f(x) + λ_k φ_0(x), (8)

where λ_k -> 0.

Then, in each iteration, the algorithm produces x^k satisfying

f(x^k) <= φ_k^* := min_x φ_k(x), (9)

where v^k is the minimal point of φ_k. Thus, whenever a minimizer x^* of f exists, since f is proper, it follows from (8) and (9) that

f(x^k) - f(x^*) <= λ_k (φ_0(x^*) - f(x^*)). (10)

Since λ_k -> 0, the sequence {x^k} minimizes f, and the speed at which f(x^k) converges to f(x^*) is related to λ_k.
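To make the acceleration idea above tangible, the following sketch (our own illustration; it uses a FISTA-style extrapolation rule as an assumption, not Güler's exact parameter schedule) runs an accelerated proximal iteration on the simple quadratic f(x) = 0.5 x^2, whose proximal map is z/(1 + c):

```python
import math

# f(x) = 0.5 * x^2 has the closed-form proximal map prox_{c f}(z) = z / (1 + c).
def prox_quad(z, c):
    return z / (1.0 + c)

c = 1.0
x = y = 10.0      # x: current iterate, y: extrapolated auxiliary point
t = 1.0           # momentum parameter, updated FISTA-style below
for k in range(50):
    x_new = prox_quad(y, c)
    t_new = 0.5 * (1.0 + math.sqrt(1.0 + 4.0 * t * t))
    y = x_new + ((t - 1.0) / t_new) * (x_new - x)  # extrapolation (momentum) step
    x, t = x_new, t_new

print(x)  # approaches the minimizer 0
```

The extrapolated point y plays the role of the auxiliary points mentioned in the text: the proximal step is taken at y rather than at x, which is what yields the improved O(1/k^2)-type behavior.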

The key steps in the accelerated algorithm are the construction of the auxiliary quadratic convex functions with a suitable estimation of the objective, the production of x^k satisfying (15), and the selection of the compressing factor λ_k. Inspired by Nesterov's estimate sequences and the PPA given in [10], we consider the constrained optimization problem (5). It is known that the dual problem of (5) is

max_μ  -f^*(-A^T μ) - b^T μ, (11)

where f^* is the Fenchel conjugate function of f. The augmented Lagrangian function associated with (5) is defined as

L_β(x, μ) = f(x) + μ^T (Ax - b) + (β/2) ||Ax - b||^2, (12)

where μ is the multiplier and β > 0 is a penalty parameter. To simplify the discussion, we denote the proximal point of a function h at a given point z as

prox_{c h}(z) = arg min_x { h(x) + (1/(2c)) ||x - z||^2 }. (13)
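As a sanity check on the augmented Lagrangian definition above, the snippet below (a toy example; the choice f(x) = ||x||_1 is our own assumption for illustration) evaluates it and confirms that at a feasible point the multiplier and penalty terms vanish:

```python
import numpy as np

# Augmented Lagrangian L_beta(x, mu) = f(x) + mu^T (Ax - b) + (beta/2)||Ax - b||^2.
def aug_lagrangian(f, x, mu, A, b, beta):
    r = A @ x - b                          # constraint residual Ax - b
    return f(x) + mu @ r + 0.5 * beta * (r @ r)

A = np.array([[1.0, 1.0]])
b = np.array([1.0])
mu = np.array([0.3])

x_feas = np.array([0.5, 0.5])              # feasible point: A x = b, residual 0
val = aug_lagrangian(lambda z: np.abs(z).sum(), x_feas, mu, A, b, beta=10.0)
print(val)  # equals f(x_feas) = 1.0, since the residual terms vanish
```

At infeasible points, the quadratic penalty term grows with β, which is what forces the iterates toward the constraint set Ax = b.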

Firstly, before the accelerated PPA is given, we construct a series of quadratic regular functions for (12). For given initial data, let

Since f is convex and φ_0 is a quadratic regular function, for any k, φ_k can be written in the canonical form

φ_k(x) = φ_k^* + (γ_k/2) ||x - v^k||^2, (16)

where v^k denotes the minimizer of φ_k.

On the other hand, from (15), we have

Comparing the quadratic terms of (16) and (17), we obtain

It is not hard to see that the minimum values and the minimizers of two adjacent quadratic functions satisfy the following relationships:

and

If we generate the iteration points by the proximal step for k >= 0, the following lemma shows the relationship among the Lagrange function, the proximal point, the constructed quadratic estimation function, and its minimizer.

Lemma 1. Suppose that, for k >= 0, the sequences and functions are defined as previously mentioned. Let

Then, we have

Proof. We prove it by induction. When k = 0, (22) is obvious. Now, assume that (22) holds for some k.

Since f is convex and the proximal term is strongly convex, from the definition, we have

By convexity, for any x, we have

Then, by (23),

Substituting (18) and (26) into (20) implies

Notice that, in the last term of (27), it turns out that

Thus, by (20), we obtain

Substituting this formula into (27), we deduce

Using the preceding relations, we have

From Lemma 1, the constructed quadratic function can be seen as an upper bound estimation of the Lagrange function, and all the involved quantities can be updated by explicit formulas. Thus, we give the accelerated PPA for the constrained optimization problem (5) as follows.
Accelerated PPA for constrained convex optimization (C-APPA):
(i) Step 0: let k = 0. Choose the initial points and parameters.
(ii) Step 1: update the auxiliary quantities.
(iii) Step 2: compute the proximal point and perform the updating.
(iv) Step 3: let k = k + 1 and go to Step 1.
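Since the explicit update formulas of Steps 1 and 2 did not survive extraction, the sketch below shows only the general augmented-Lagrangian pattern that the algorithm builds on: alternately minimizing L_β in x and updating the multiplier. It is a plain (non-accelerated) method of multipliers on a toy problem min 0.5||x||^2 s.t. Ax = b of our own choosing, not C-APPA itself:

```python
import numpy as np

A = np.array([[1.0, 2.0]])
b = np.array([3.0])
beta = 5.0
x = np.zeros(2)
mu = np.zeros(1)
I = np.eye(2)

for k in range(100):
    # x-step: argmin_x 0.5||x||^2 + mu^T(Ax - b) + (beta/2)||Ax - b||^2,
    # which for this quadratic f reduces to the linear system below.
    x = np.linalg.solve(I + beta * (A.T @ A), A.T @ (beta * b - mu))
    # multiplier step: mu <- mu + beta * (Ax - b)
    mu = mu + beta * (A @ x - b)

print(x)  # converges to the constrained minimizer (0.6, 1.2)
```

For this problem, the projection of the origin onto {x : x_1 + 2 x_2 = 3} is (0.6, 1.2), and the multiplier iteration contracts linearly, so the iterates settle quickly; the accelerated scheme in the paper improves precisely this kind of iteration by introducing auxiliary points and compressing factors.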

Remark 1. It is obvious from the definition. Actually, from (10), the convergence of the algorithm is related to the compressing factor. In Theorem 1 below, we need to assume a nonzero lower bound on it. In particular, one choice of the introduced constant recovers the update in Güler's algorithm.

Remark 2. In the iterations, the proximal point can be computed from (34) or (35), which give the same result although they have different quadratic terms. We will use these two different subproblem forms for convenience in the later discussion on convergence. One is used to prove that the sequence converges to a KKT solution, and the other, (35), which is more suitable for measuring changes in the function values and the constraints, is used in the analysis of the convergence rate.

Remark 3. In problem (35), scaled proximal terms can also be used, where the scaling is a positive semidefinite matrix. For some structured functions f and an appropriately chosen scaling, the augmented term can be linearized, which gives (35) a closed-form solution. In practice, we can also solve the problem inexactly, but then the convergence rate shown in the following sections may not be retained under the same assumptions. Further discussion is beyond the scope of this paper, and we leave the inexact case and its numerical experiments to another paper.

3. Global Convergence

Assumption 1. Assume that f is a proper lower semicontinuous convex function and that problem (5) has a feasible point.
From Corollaries 28.2.2 and 28.3.1 of [17], under Assumption 1, the KKT set of (5) is nonempty. Moreover, x^* is a solution of (5) if and only if there exists μ^* such that (x^*, μ^*) satisfies the KKT system:

0 ∈ ∂f(x^*) + A^T μ^*,  Ax^* - b = 0. (36)

With appropriate notations, we have the following lemmas.

Lemma 2. Suppose that the sequence is generated by C-APPA and (x^*, μ^*) is a KKT solution of (5). Then, for k >= 0, we have

Proof. By the first-order optimality condition of (34), together with (36), we have

Since f is a proper lower semicontinuous convex function, by Theorem 12.17 of Rockafellar and Wets [18], ∂f is a maximal monotone operator, and there exists a self-adjoint positive semidefinite linear operator such that the corresponding inequality holds for any admissible pair.

Notice that x^* and μ^* satisfy the KKT system, which implies

Using (36) and (39), it is easy to see that

Thus,

By (36) and (39) in the algorithm, we have

and

Then,

From the basic relations, we deduce

Then, we obtain (41) by (53).

Lemma 3. Suppose that the sequence is generated by C-APPA and (x^*, μ^*) is a KKT solution of (5). Then, for k >= 0, we have

Proof. From the definitions, together with (42), (43), and (45), we have

Then,

By (55), we deduce

From (50) and (55), we can obtain

Also, using (50), we have

Since this holds for all k, by (59), (60), and (50), we deduce

Thus, (54) is true.
Now, for k >= 0, we construct the following sequence and turn to the analysis of the convergence of the algorithm:

The following theorem shows that the sequence decreases monotonically under appropriate conditions, and then C-APPA is convergent since the terms are nonnegative for all k.

Theorem 1. Suppose that the solution set of (5) is nonempty, and denote by (x^*, μ^*) a solution satisfying the KKT system. Let the sequence be defined as in (62) and generated by C-APPA. Then,

Furthermore, for all k, if there exist positive constants bounding the corresponding parameters from below, then

Proof. From Lemmas 1 and 2, we can substitute (49), (41), and (54) into the left side of inequality (48). Since

then, using (62) and (48), we obtain

which shows that the sequence is nonnegative and monotonically nonincreasing. When the sequence is infinite, the iterates are all bounded. By (66), we have

Since the corresponding quantities are bounded below, it follows that all the sequences involved are bounded. We denote a convergent subsequence as

By (42) and (43), we have

Then, taking limits on both sides of (69), we deduce

Assume in general that the subsequence converges. Thus, by the definition, there must exist a subsequence such that

Because the sequence is nonincreasing and bounded, it converges to 0. By (62), we have

along the subsequence. Thus, by