An Accelerated Proximal Algorithm for the Difference of Convex Programming

Shen, Feichao; Zhang, Ying; Wang, Xueyong

doi:https://doi.org/10.1155/2021/9994015

Mathematical Problems in Engineering

On this page

Abstract Introduction Conclusions Data Availability Conflicts of Interest Authors’ Contributions Acknowledgments References Copyright Related Articles

Research Article | Open Access

Volume 2021 | Article ID 9994015 | https://doi.org/10.1155/2021/9994015

An Accelerated Proximal Algorithm for the Difference of Convex Programming

Feichao Shen,¹Ying Zhang,²and Xueyong Wang³

Academic Editor: Zhenbo Wang

Received04 Mar 2021

Revised08 Apr 2021

Accepted15 Apr 2021

Published26 Apr 2021

Abstract

In this paper, we propose an accelerated proximal point algorithm for the difference of convex (DC) optimization problem by combining the extrapolation technique with the proximal difference of convex algorithm. By making full use of the special structure of DC decomposition and the information of stepsize, we prove that the proposed algorithm converges at rate of under milder conditions. The given numerical experiments show the superiority of the proposed algorithm to some existing algorithms.

1. Introduction

Difference of convex problem (DCP) is an important kind of nonlinear programming problems in which the objective function is described as the difference of convex (DC) functions. It finds numerous applications in digital communication system [1], assignment and power allocation [2], compressed sensing [3–6], and so on [7–13].

It is well known that the method to solve the DCP is the so-called difference of the convex algorithm (DCA) [14] in which the concave part is replaced by a linear majorant in the objective function and a convex optimization subproblem needs to be solved at each iteration. Note that the difficulty of the involved subproblem relies heavily on the DC decomposition of the objective function, and it can be easily solved when the objective function can be written as the sum of a smooth convex function with Lipschitz gradient, a proper closed convex function, and a continuous concave function [15]. Motivated by this, Gotoh et al. [16] proposed the so-called proximal difference of the convex algorithm (PDCA) for solving DCP, in which not only the concave part is replaced by a linear majorant in each iteration but also the smooth convex part is replaced by some techniques. Furthermore, if the proximal mapping of the proper closed convex function can be easily computed, then the subproblem involved in the PDCA can be solved efficiently. However, when the concave part of the objective is void, the PDCA reduces to the proximal gradient algorithm which may be slow in computing [17]. In fact, since the convergence rate of the PDCA heavily depends on the Lojasiewicz exponent of the objective function, the PDCA converges linearly in general [18, 19]. To accelerate the convergence rate of the proximal difference of the convex algorithm, researchers recall the well-known extrapolation technique to design some efficient algorithms [20–24]. This technique has been extensively used in accelerating the proximal type algorithms for convex programming [25, 26], and the convergence rate of the algorithms can be improved from to . Motivated by this, Wen et al. [27] proposed the proximal difference of the convex algorithm with extrapolation (PDCAE) for solving the DCP. The numerical experiments [27] show that the PDCAE has a better performance although it converges linearly in theory [27]. Now, a question is posed naturally: can we propose new type of the PDCA in which the convergence rate can be improved in theory? This constitutes the motivation of the paper.

In this paper, inspired by the work in [20–23, 27], we establish an accelerated proximal DC programming algorithm (APDCA) for the DCP by combining the extrapolation technique and the PDCA. In the algorithm, the current iteration point is replaced by a linear combination of the previous two points, and extrapolation technique is involved in the stepsize. By making full use of the special structure of DC decomposition and the information of stepsize, we prove that the APDCA converges at rate of under milder conditions. The given numerical experiments show the superiority to some existing algorithms.

The remainder of the paper is organized as follows. In Section 2, we describe the DC optimization problem considered in this paper and present our new designed algorithm. In Section 3, we establish the global convergence and the quadratic convergence rate of the new designed algorithm. Some numerical experiments are provided in Section 4. Some conclusions are drawn in Section 5.

To end this section, we recall some definitions used in the subsequent analysis [28–30].

For an extended real valued function , we denote its domain by dom . The function is said to be strongly convex if there exists an such that for all , where is a convex set and is a identity matrix. The function is said to be proper if it never equals and dom . Moreover, a proper function is closed if it is lower semicontinuous. A proper closed function is said to be level-bounded if the lower level sets of are bounded; that is, are bounded for any . Given a proper closed function , the limiting subdifferential of at is given as follows:where mean and . Note that dom It is well known that the (limiting) subdifferential reduces to the classical subdifferential in convex analysis when is a convex function; that is,

Furthermore, if is continuously differentiable, then the (limiting) subdifferential reduces to the gradient of and denoted by .

2. Algorithms for DC Programming

Consider the following difference of convex programming:where is a strongly convex function with , is a smooth convex function, is Lipschitz continuous with , is a continuous convex function, and is Lipschitz continuous with .

For the DCP, the following is a classical DCA which takes the following iterative scheme [14]:

By replacing the concave part in the objective function by a linear majorant and replacing the smooth convex part by a quadratic majorant, Gotoh et al. [16] proposed a proximal DCA for the DCP. For the sake of completeness, we list Algorithm 1 as follows.

	Initial step. Take , , and .
	Iterative step. Compute the new iterate by the following iterative scheme:

	until is satisfied
where is the Lipschitz constant of .

Despite a simple subproblem is involved in the algorithm, the PDCA is potentially slow [19, 27]. To accelerate the convergence rate of the PDCA, we incorporate extrapolation technique into the PDCA to obtain the following algorithm (Algorithm 2).

	Initial step. Take , with , , , and .
	Iterative step. Compute the new iterate by the following iterative scheme:



	until is satisfied.

3. Convergence Analysis of the APDCA

In this section, we establish the global convergence of the algorithm and its convergence rate. To continue, we first recall the following conclusions.

Lemma 1. (see [25]). Let be a continuously differentiable function with Lipschitz continuity gradient whose Lipschitz constant . Then, for any , it holds that

Lemma 2. Let . For the sequence generated by the APDCA, it holds that

Proof. Since is strong convex function, there exists constant such thatwhere .
Connecting the fact that is Lipschitz continuous with constant with Lemma 1, we havewhere , which means thatIt follows from is convex function thatConnecting (7) and (9) with (10), we haveOn the other hand, since is convex, it follows thatwhich means thatConnecting the fact that is Lipschitz continuous with constant with Lemma 1, we havewhere . Summing (13) and (14), we haveAdding to both sides of (15) yieldsBy taking , (16) yields thatBy the optimality conditions of (8), one hasthat is,Then, for , it follows from (11) and (17) thatwhere the first equality follows from (19), the second equality follows from the fact that , and the last inequality follows from . We have conclusion (6).
Before proceeding further, we need the following conclusions.

Lemma 3. (see [25, 31]). Let . Then, the sequence generated by (6) is increasing, and

Lemma 4. Let be a sequence generated by the APDCA. Then,where , , and is the critical point of problem (3).

Proof. From (7) and (6), we have . Then, it follows thatHence, to show the assertion, we only need to show thatIn fact, by taking , one has from Lemma 2 thatHence,Using Lemma 2 again, one has from thatthat is,Multiplying (25) by and (27) by , respectively, and summing them yieldwhere the first equality follows from the fact that and the last equality follows by some manipulation. The desired result follows.
Now, we are ready to show the convergence rate of the APDCA.

Theorem 1. For the sequence generated by the APDCA, it holds thatwhere is a stationary point of (3).

Proof. Using the notations used in Lemma 4, let , and it follows from (27) thatHence,Then, from Lemma 4, we know that the sequence is nonincreasing. Therefore,where the second inequation follows from and , and the last equation follows from .
Then, it follows from Lemma 3 thatThe desired result follows.

4. Numerical Experiments

In this section, we evaluate the performance of the APDCA by applying it to the DC regularized least squares problem. We will compare the performance of the APDCA with the algorithm in [15] (PDCA) and GIST in [32].

On APDCA and PDCA, we set and . On GIST, we set . We initialize the three algorithms at the origin point and terminate the algorithms when

Furthermore, we terminate PDCA when the number of iteration is more than 5000 (denoted by “max” on the report).

Example 1. Least squares problems with regularizer are as follows:where , and is the regularization parameter.
This problem takes the form of (3) with , , and . Note that the purpose of adding is to ensure strong convexity of .
To compare the performance of the three algorithms, we report the number of iterations (denoted by Iter), CPU times in seconds (denoted by CPU time), the sparsity of the solution (denoted by sparsity), and the function values at termination (denoted by fval), averaged over the 30 random instances. The numerical results are reported in Tables 1 and 2, from which we can see that the APDCA always outperforms PDCA and GIST. Specifically, from Table 1, we can see that the APDCA is about 2.5 times faster than GIST and is about 5.2 times faster than PDCA for the parameter . From Table 2, we can see that the APDCA is about 2.1 times faster than GIST and is about 8.4 times faster than PDCA for the parameter . Tables 1 and 2 also show that the APDCA requires fewer iteration steps than the other two algorithms. Specifically, from Table 1, the iteration step of APDCA is about of GIST for the parameter . From Table 2, the iteration step of APDCA is about of GIST for the parameter . Meanwhile, Tables 1 and 2 also show that the solution given by APDCA is more sparse than that given by GIST and PDCA.

Example 2. Least squares problems with logarithmic regularizer are as follows:where is a constant, and is the regularization parameter.
This problem takes the form of (3) with , , and . Note that the purpose of adding is to ensure strong convexity of . For this example, we set .
To compare the performance of the three algorithms, we report the number of iterations (denoted by Iter), CPU times in seconds (denoted by CPU time), the sparsity of the solution (denoted by sparsity), and the function values at termination (denoted by fval), averaged over the 30 random instances. The numerical results are reported in Tables 3 and 4, from which we can see that the APDCA always outperforms PDCA and GIST. Specifically, from Table 3, we can see that the APDCA is about 1.9 times faster than GIST and is about 8.3 times faster than PDCA for the parameter . From Table 4, we can see that the APDCA is about 1.6 times faster than GIST and is about 11.3 times faster than PDCA for the parameter . Tables 3 and 4 also show that the APDCA requires fewer iteration steps than the other two algorithms. Specifically, from Table 3, the iteration step of APDCA is about of GIST for the parameter . From Table 4, the iteration step of APDCA is about of GIST and is about of PDCA for the parameter . Meanwhile, Tables 3 and 4 also show that the solution given by APDCA is more sparse than that given by GIST and PDCA.

5. Conclusions

In this paper, we propose an accelerated proximal point algorithm for the difference of convex optimization problem by combining the extrapolation technique with the proximal difference of the convex algorithm. By making full use of the special structure of difference of convex decomposition and the information of stepsize, we prove that the proposed algorithm converges at rate of under milder conditions. The given numerical experiments show the superiority of the proposed algorithm to some existing algorithms.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

The authors equally contributed to this paper and read and approved the final manuscript.

Acknowledgments

This project was supported by the Natural Science Foundation of China (grants nos. 11801309, 11901343, and 12071249).

References

A. Alvarado, G. Scutari, and J.-S. Pang, “A new decomposition method for multiuser DC-programming and its applications,” IEEE Transactions on Signal Processing, vol. 62, no. 11, pp. 2984–2998, 2014.
View at: Publisher Site | Google Scholar
M. Sanjabi, M. Razaviyayn, and Z.-Q. Luo, “Optimal joint base station assignment and beamforming for heterogeneous networks,” IEEE Transactions on Signal Processing, vol. 62, no. 8, pp. 1950–1961, 2014.
View at: Publisher Site | Google Scholar
P. Yin, Y. Lou, Q. He, and J. Xin, “Minimization of $\ell_{1-2}$ for compressed sensing,” SIAM Journal on Scientific Computing, vol. 37, no. 1, pp. A536–A563, 2015.
View at: Publisher Site | Google Scholar
G. Wang, Y. Wang, and Y. Wang, “Some Ostrowski-type bound estimations of spectral radius for weakly irreducible nonnegative tensors,” Linear and Multilinear Algebra, vol. 68, no. 9, pp. 1817–1834, 2020.
View at: Publisher Site | Google Scholar
G. Wang, G. Zhou, G. Zhou, and L. Caccetta, “Z-eigenvalue inclusion theorems for tensors,” Discrete & Continuous Dynamical Systems-B, vol. 22, no. 1, pp. 187–198, 2017.
View at: Publisher Site | Google Scholar
G. Wang, Y. Zhang, and Y. Zhang, “\begin {document} $ Z $\end {document} -eigenvalue exclusion theorems for tensors,” Journal of Industrial & Management Optimization, vol. 16, no. 4, pp. 1987–1998, 2020.
View at: Publisher Site | Google Scholar
W. De Oliveira, “Proximal bundle methods for nonsmooth DC programming,” Journal of Global Optimization, vol. 75, no. 2, pp. 523–563, 2019.
View at: Publisher Site | Google Scholar
D. Feng, M. Sun, and X. Wang, “A family of conjugate gradient methods for large-scale nonlinear equations,” Journal of Inequalities and Applications, vol. 236, 2017.
View at: Publisher Site | Google Scholar
H. A. Le Thi and T. Pham Dinh, “DC programming in communication systems: challenging problems and methods,” Vietnam Journal of Computer Science, vol. 1, no. 1, pp. 15–28, 2014.
View at: Publisher Site | Google Scholar
H. A. Le Thi and T. Pham Dinh, “DC programming and DCA: thirty years of developments,” Mathematical Programming, vol. 169, no. 3, pp. 5–68, 2018.
View at: Publisher Site | Google Scholar
Z. Lu and Z. Zhou, “Nonmonotone enhanced proximal DC algorithms for a class of structured nonsmooth DC programming,” SIAM Journal on Optimization, vol. 29, no. 4, pp. 2725–2752, 2019.
View at: Publisher Site | Google Scholar
Y. Lou, T. Zeng, S. Osher, and J. Xin, “A weighted difference of anisotropic and isotropic total variation model for image processing,” SIAM Journal on Imaging Sciences, vol. 8, no. 3, pp. 1798–1823, 2015.
View at: Publisher Site | Google Scholar
X. Wang, “Alternating proximal penalization algorithm for the modified multiple-sets split feasibility problems,” Journal of Inequalities and Applications, vol. 2018, p. 48, 2018.
View at: Publisher Site | Google Scholar
D. T. Pham and H. A. Le Thi, “Convex analysis approach to DC programming:theory, algorithms and applications,” Acta Mathematica Vietnamica, vol. 22, pp. 289–355, 1997.
View at: Google Scholar
D. T. Pham and H. A. Le Thi, “A D. C. Optimization algorithm for solving the trust-region subproblem,” SIAM Journal on Optimization, vol. 8, pp. 476–505, 1998.
View at: Publisher Site | Google Scholar
J.-Y. Gotoh, A. Takeda, and K. Tono, “DC formulations and algorithms for sparse optimization problems,” Mathematical Programming, vol. 169, no. 1, pp. 141–176, 2018.
View at: Publisher Site | Google Scholar
B. O’ Donoghue and E. J. Candes, “Adaptive restart for accelerated gradient schemes,” Foundations of Computational Mathematics, vol. 15, pp. 715–732, 2015.
View at: Publisher Site | Google Scholar
H. A. Le Thi, V. N. Huynh, and T. Pham Dinh, “Convergence analysis of difference-of-convex algorithm with subanalytic data,” Journal of Optimization Theory and Applications, vol. 179, no. 1, pp. 103–126, 2018.
View at: Publisher Site | Google Scholar
X. Wang, Y. Zhang, H. Chen, and X. Kou, “Convergence rate analysis of the proximal difference of convex algorithm,” Mathematical Problem in Engineering, vol. 2021, Article ID 5629868, 5 pages, 2021.
View at: Publisher Site | Google Scholar
Y. Nesterov, “A method of solving a convex programming problem with convergence rate ,” Proceedings of the USSR Academy of Sciences, vol. 269, pp. 543–547, 1983.
View at: Google Scholar
Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course, Springer, Berlin, Germany, 2004.
Y. Nesterov, “Dual extrapolation and its applications to solving variational inequalities and related problems,” Mathematical Programming, vol. 109, no. 2-3, pp. 319–344, 2007.
View at: Publisher Site | Google Scholar
Y. Nesterov, “Gradient methods for minimizing composite functions,” Mathematical Programming, vol. 140, no. 1, pp. 125–161, 2013.
View at: Publisher Site | Google Scholar
X. Wang, Y. Wang, Y. Wang, and G. Wang, “An accelerated augmented Lagrangian method for multi-criteria optimization problem,” Journal of Industrial & Management Optimization, vol. 16, no. 1, pp. 1–9, 2020.
View at: Publisher Site | Google Scholar
A. Beck and M. Teboulle, “A fast iterative shrinkage-thresholding algorithm for linear inverse problems,” SIAM Journal on Imaging Sciences, vol. 2, no. 1, pp. 183–202, 2009.
View at: Publisher Site | Google Scholar
A. Moudafi and A. Gibali, “ regularization of split feasibility problems,” Numerical Algorithms, vol. 78, no. 3, pp. 739–757, 2018.
View at: Publisher Site | Google Scholar
B. Wen, X. Chen, and T. K. Pong, “A proximal difference-of-convex algorithm with extrapolation,” Computational Optimization and Applications, vol. 69, no. 2, pp. 297–324, 2018.
View at: Publisher Site | Google Scholar
F. Facchinei and J. Pang, Finite Dimensional Variational Inequalities and Complementarity Problems, Springer, Berlin, Germany, 2003.
R. T. Rockafellar and R. J.-B. Wets, Variational Analysis, Springer, Berlin, Germany, 1998.
S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, Cambridge, UK, 2004.
D. Goldfarb, S. Ma, and K. Scheinberg, “Fast alternating linearization methods for minimizing the sum of two convex functions,” Mathematical Programming, vol. 141, no. 1-2, pp. 349–382, 2013.
View at: Publisher Site | Google Scholar
P. Gong, C. Zhang, Z. Lu, J. Huang, and J. Ye, “A general iteraitve shrinkage and thresholding algorithm for nonconvex regularized optimization problems,” in Proceedings of the 2013 30 th International Conference on Machine Learning, Atlanta, GA, USA, 2013.
View at: Google Scholar

Copyright

Copyright © 2021 Feichao Shen et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

PDF Download Citation

Download other formats

Order printed copies

Views

415

Downloads

597

Citations