Research Article | Open Access
A General Proximal Alternating Minimization Method with Application to Nonconvex Nonsmooth 1D Total Variation Denoising
We deal with a class of problems whose objective functions are compositions of nonconvex nonsmooth functions, which have a wide range of applications in signal/image processing. We introduce a new auxiliary variable and propose an efficient general proximal alternating minimization algorithm, which solves this class of nonconvex nonsmooth problems through alternating minimization. We give a systematic analysis to guarantee the convergence of the algorithm. Simulation results and a comparison with two existing algorithms for 1D total variation denoising validate the efficiency of the proposed approach. The algorithm contributes to the analysis and applications of a wide class of nonconvex nonsmooth problems.
In the past few years, increasing attention has been paid to convex optimization problems, which consist of minimizing a sum of convex or smooth functions [1–3]. Each of the objective functions enjoys several desirable properties, such as strong convexity, Lipschitz continuity, or other convexity conditions. These properties usually lead to great computational advantages. Meanwhile, work on such convex problems has provided a sound theoretical foundation. Both the theoretical and the computational advantages have brought many benefits to practical use, particularly in signal/image processing, machine learning, computer vision, and so forth. However, it deserves special attention that the convex or smooth models are often approximations of nonconvex nonsmooth problems. For example, the nonconvex $\ell_0$-norm in sparse recovery problems is routinely relaxed to the convex $\ell_1$-norm, and many related works have been developed [4, 5]. Although the difference between the nonconvex nonsmooth problem and its approximations vanishes in certain cases, it is sometimes nonnegligible, as in the problem in paper . On the other hand, the excellent numerical performance of various nonconvex nonsmooth algorithms inspires researchers to continue toward nonconvex methodology.
Nonconvex nonsmooth optimization problems are ubiquitous in different disciplines, including signal denoising , image deconvolution [8, 9], and other ill-posed inverse problems , to name a few. In this paper, we aim at solving the following generic nonconvex nonsmooth optimization problem, formulated in real Hilbert spaces and, for some: where (i) and are proper lower semicontinuous functions; (ii) the operators are linear; and (iii) the set of minimizers is supposed to be nonempty.
It is quite meaningful to find a common convergent point in the optimal set of sums of simple functions [2, 11]. Insightful studies for nonconvex problems are presented in [12, 13]: if nonconvex structured functions of the type have the Kurdyka-Lojasiewicz (K-L) property, then each bounded sequence generated by a proximal alternating minimization algorithm converges to a critical point of . Our work is mainly based on this convergence result and introduces a generic proximal minimization method for the general model (1). Note that if some of the functions in (1) are zero and the linear operator is the identity, the model reduces to the common one in compressed sensing . The model we consider generalizes many application problems, such as the common “lasso” problem and the compositions in papers [2, 16]. Here, we provide a few typical examples.
Example 1 (1D total variation minimization ). In this application, we need to solve the following denoising problem: where is the input signal, , counts the number of nonzero elements of , and is defined as the derivative (first-order difference) of the original signal.
Noise removal is the basis and prerequisite of many subsequent applications, and the total variation (TV) regularizer is of great importance here since it can efficiently deal with noisy signals that have sparse derivatives (or gradients), for instance, a piecewise constant (PWC) signal that has flat sections with a number of abrupt jumps. The 1D total variation minimization can be extended to the related two-dimensional image restoration.
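The counting regularizer of Example 1 can be made concrete with a minimal sketch. It assumes the common form of the objective, namely half the squared error plus a weight `lam` times the number of nonzero first-order differences; the factor 1/2, the name `lam`, and the function name are illustrative assumptions rather than the paper's notation.

```python
import numpy as np

def l0_tv_objective(x, y, lam):
    """Assumed form: 0.5*||y - x||_2^2 + lam * (number of nonzero first differences of x)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    fidelity = 0.5 * np.sum((y - x) ** 2)
    # np.diff plays the role of the first-order difference operator.
    jumps = np.count_nonzero(np.diff(x))
    return fidelity + lam * jumps

# A piecewise constant signal with two abrupt jumps (hypothetical data).
x_pwc = np.array([1.0, 1.0, 1.0, 3.0, 3.0, 0.5, 0.5])
```

For a piecewise constant signal the penalty counts only the jumps, so it stays small no matter how large the jumps are, which is the behavior that distinguishes the counting penalty from a sum of absolute differences.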
Example 2 (group lasso ). In this application, one needs to solve where are the decision block variables and with is the corresponding block size.
In the past few years, research on structured sparse signal recovery has been very popular, and group lasso is a typical example of these important problems. It attracts much attention in face recognition, multiband signal processing, and other machine learning problems. The general case also applies to many other kinds of structured sparse recovery problems, like $\ell_{2,1}$-minimization  and block sparse recovery .
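To make the block structure concrete, the following sketch evaluates a group penalty (the sum of Euclidean norms of the blocks) and applies block soft thresholding, the well-known proximal map of that convex penalty. The function names, the consecutive-block layout given by `block_sizes`, and the weight `lam` are illustrative assumptions, not notation from the paper.

```python
import numpy as np

def group_penalty(x, block_sizes):
    """Sum of Euclidean norms of consecutive blocks of x."""
    blocks = np.split(np.asarray(x, dtype=float), np.cumsum(block_sizes)[:-1])
    return sum(np.linalg.norm(b) for b in blocks)

def block_soft_threshold(x, block_sizes, lam):
    """Proximal map of lam * group_penalty: shrink every block toward zero."""
    blocks = np.split(np.asarray(x, dtype=float), np.cumsum(block_sizes)[:-1])
    out = []
    for b in blocks:
        norm = np.linalg.norm(b)
        # Blocks with norm at most lam are set to zero as a whole.
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        out.append(scale * b)
    return np.concatenate(out)
```

Whole blocks are kept or discarded together, which is what produces the structured sparsity mentioned above.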
Example 3 (image deconvolution ). In this application, one needs to solve where and . The discrete total variation, denoted by in (4), is defined as follows. We define matrix: and the discrete gradient operator is defined as Then we have .
Deconvolution finds many applications in signal processing and image processing [21–23]. In this paper, it is considered only as a specific case of problem (1), although the further details of problem (4) are equally important.
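A discrete total variation of an image can be sketched as follows. This is a minimal illustration under stated assumptions: forward differences with a zero difference at the last row and column, and the isotropic combination of the two difference images. The paper's exact matrix and gradient operator may differ, and the function names are hypothetical.

```python
import numpy as np

def discrete_gradient(img):
    """Forward differences along rows and columns; the last row/column difference is zero."""
    img = np.asarray(img, dtype=float)
    dv = np.zeros_like(img)
    dh = np.zeros_like(img)
    dv[:-1, :] = img[1:, :] - img[:-1, :]   # vertical differences
    dh[:, :-1] = img[:, 1:] - img[:, :-1]   # horizontal differences
    return dv, dh

def total_variation(img):
    """Isotropic discrete TV: sum over pixels of the Euclidean norm of the gradient."""
    dv, dh = discrete_gradient(img)
    return float(np.sum(np.sqrt(dv ** 2 + dh ** 2)))
```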
The main difficulty in solving (1) lies in the fact that is coupled with . In order to surmount this computational barrier, we introduce a new auxiliary variable and split the problem into two sequences of subproblems, described in detail in the next section. Our problem is then an extension of the problem given in paper . The paper aims at giving a generic proximal alternating minimization method for the class of nonconvex nonsmooth problems (1), to be applied in many fields. The key idea is to introduce auxiliary variables and split the original problem into two kinds of easier nonconvex nonsmooth subproblems. Recent studies often make a reasonable assumption on the regularization ; namely, that the proximal map of is easy to calculate. The convergence analysis can then be extended in the context of the present work . In the last section, we show an application to nonconvex nonsmooth 1D total variation signal denoising.
In order to reduce the computational complexity caused by the composition of the nonconvex function and the operator, we introduce a sequence of auxiliary variables. Then, the problem in (1) can be represented equivalently as follows: for each where are represented as a concise whole and The last term ensures the high similarity between and ). This quadratic function minimization can be easily solved. Hence, the original complex composite is split into two simpler objectives.
This form can now be solved by a series of alternating proximal minimization steps : for each ,
We then make the following assumptions about (9a) and (9b): The assumptions are mild and can be easily met. The three examples given in the first section all satisfy the first two conditions, and the third condition holds when proper parameters are set in practical tests. Besides, each of the proximal terms and in (9a) and (9b) is used to meet the decrease condition of the objective function. If we omit them, the algorithm can also perform well, but a direct general convergence theory cannot be built. In that case, it reduces to the alternating projection minimization method.
3. Convergence Results
In fact, the problem is now a proximal alternating minimization case, whose global convergence has been analyzed in detail in paper . That paper mainly concentrates on the theoretical analysis of problem with the following form: And if has the Kurdyka-Lojasiewicz (K-L) property, then each bounded sequence generated by the above algorithm converges to a critical point of . Even the convergence rate of the algorithm can be computed, which depends on the geometrical properties of the function around its critical points. It is remarkable that the K-L property can be verified for many common functions.
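For orientation, the two-block setting analyzed by Attouch et al. [12] can be written as below. The symbols $L$, $f$, $g$, $Q$, $\lambda_k$, and $\mu_k$ follow that reference and are assumptions here; they are not necessarily the notation used for (9a) and (9b). In [12], $f$ and $g$ are proper lower semicontinuous and $Q$ is a smooth coupling term.

```latex
\begin{aligned}
L(x,y) &= f(x) + Q(x,y) + g(y),\\
x^{k+1} &\in \operatorname*{argmin}_{u}\Big\{ L(u,y^{k}) + \tfrac{1}{2\lambda_k}\,\|u-x^{k}\|^{2} \Big\},\\
y^{k+1} &\in \operatorname*{argmin}_{v}\Big\{ L(x^{k+1},v) + \tfrac{1}{2\mu_k}\,\|v-y^{k}\|^{2} \Big\}.
\end{aligned}
```

Each step minimizes the objective in one block while a quadratic proximal term keeps the new iterate close to the previous one, which is what yields the decrease of the objective along the iterations.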
The difference between the algorithm of paper  and ours, as far as convergence is concerned, is that our minimization involves not two variables but and a sequence of variables. In this section, we give similar results for algorithms (9a) and (9b).
Definition 4 (subdifferentials [24, 25]). Let be a proper and lower semicontinuous function. (1) For a given , the Fréchet subdifferential of at , written as , is the set of all vectors which satisfy When , we set . (2) The “limiting” subdifferential, or simply the subdifferential, of at , written as , is defined through the following closure process:
A necessary condition for to be a minimizer of is A point that satisfies (13) is called a limiting-critical or simply critical point. The set of critical points of is denoted by .
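For reference, the standard forms of these objects in variational analysis [24, 25] are recalled below for a proper lower semicontinuous function $f$. The symbols $f$, $\hat{\partial} f$, $\partial f$, $x$, and $v$ are the customary ones and are assumed here; the finite-dimensional setting is also an assumption.

```latex
\begin{aligned}
\hat{\partial} f(x) &= \Big\{ v : \liminf_{y \to x,\; y \neq x}
  \frac{f(y) - f(x) - \langle v,\, y - x \rangle}{\|y - x\|} \ge 0 \Big\},
  \qquad x \in \operatorname{dom} f,\\
\partial f(x) &= \big\{ v : \exists\, x^{k} \to x,\ f(x^{k}) \to f(x),\
  v^{k} \in \hat{\partial} f(x^{k}),\ v^{k} \to v \big\},\\
x \text{ is critical} &\iff 0 \in \partial f(x).
\end{aligned}
```

As a one-line example, for $f(t) = |t|$ on the real line one has $\partial f(0) = [-1, 1]$, so $0 \in \partial f(0)$ and the minimizer $t = 0$ is a critical point.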
Corollary 5. Assume that the sequences are generated by (9a) and (9b) under assumption ; then they are well defined. Moreover, consider the following: (i) The following estimate holds: hence does not increase. (ii) hence . (iii) For , we have where above is a multivector with the same dimension as .
Besides, for every bounded subsequence of ,
Corollary 6. Assume that hold. Let be a sequence complying with (9a) and (9b). Let denote the set (possibly empty) of its limit points. Then (i) if is bounded, then is a nonempty compact connected set and as ; (ii) ; (iii) is finite and constant on , equal to .
The above corollary gives some convergence results about the sequences generated by (7), (9a), and (9b). Point (ii) guarantees that all limit points produced by (9a) and (9b) must be limiting-critical points, and point (iii) states that the objective converges to a finite constant.
3.2. Convergence to a Critical Point
Let be a proper lower semicontinuous function. For , let us set . Then we give an important definition in the optimization theory.
Definition 7 (K-L property ). The function is said to have the K-L property at if there exist , a neighborhood of and a continuous concave function such that (i) (ii) is on (iii) for all , ; (iv) for all , the Kurdyka-Lojasiewicz inequality holds:
To verify that a function has the K-L property, we should estimate . Many convex functions, for instance, satisfy the above property with and . Besides, many nonconvex examples are also given in paper .
Below, we give the convergence analysis to a critical point.
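A small worked example may help fix ideas. It uses the K-L inequality in its standard form from [12, 13], with a concave desingularizing function $\varphi$; the symbols $f$, $\bar{x}$, and $\varphi$ are assumed names, and the example function is chosen only for illustration.

```latex
\begin{aligned}
&\text{K-L inequality:}\quad
  \varphi'\big(f(x) - f(\bar{x})\big)\,\operatorname{dist}\big(0, \partial f(x)\big) \ge 1,\\[2pt]
&\text{Example: } f(t) = t^{2},\ \bar{x} = 0,\ \varphi(s) = s^{1/2}
  \ \Rightarrow\ \varphi'(s) = \tfrac{1}{2}\, s^{-1/2},\\
&\varphi'\big(t^{2}\big)\,|f'(t)| = \frac{1}{2|t|}\cdot 2|t| = 1 \ge 1
  \quad \text{for all } t \neq 0 .
\end{aligned}
```

Here $\varphi$ is continuous and concave with $\varphi(0) = 0$ and $\varphi' > 0$ on $(0, \infty)$, so $f(t) = t^2$ has the K-L property at its minimizer.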
Theorem 8 (convergence). Assume that satisfies and has the Kurdyka-Lojasiewicz property at each point of the domain of . Then (i) either tends to infinity, (ii) or is , that is, and, as a consequence, converges to a critical point of .
The proof of the above theorem follows the same analysis as in paper , so here we present only the convergence results and omit their proofs.
4. Application to 1D TV Denoising
In practical scientific and engineering contexts, noise removal is the basis and prerequisite of many subsequent applications, and it has received extensive attention. A range of computational algorithms have been proposed to solve the denoising problem [26–28]. Among these approaches, the total variation (TV) regularizer is of great importance since it can efficiently deal with noisy signals that have sparse derivatives (or gradients). For instance, a noisy piecewise constant (PWC)  signal, whose derivative is sparse relative to the signal dimension, can be denoised by the TV denoising method.
In the 1D TV denoising problem , one needs to solve model (2). TV denoising minimizes a composite of two parts. The first part keeps the error between the observed data and the original signal as small as possible. The second part promotes sparsity of the gradients. Usually, a denoising model is defined as a combination of a quadratic data fidelity term and either a convex regularization term or a differentiable regularization term, for example, a convex but nonsmooth problem or a differentiable but nonconvex problem where . Exact solutions of the above two types can be obtained by very fast direct algorithms [7, 30]. In fact, the convex $\ell_1$-norm is the replacement of the nonconvex $\ell_0$-norm in (19), since convex optimization techniques have been deeply studied. The latter type, with penalties such as the logarithmic penalty and the arctangent penalty, can be solved by majorization-minimization (MM) iterations, in which the total objective function (including data fidelity and regularization terms) should be strictly convex.
In this test, we apply our algorithms (9a) and (9b) to this example. An auxiliary variable is introduced to reduce the complexity of the composition. Then (2) is represented as This problem clearly satisfies the convergence conditions . The concrete steps of algorithms (9a) and (9b) are shown in (22a) and (22b).
Computation of (22b). The latter step (22b) can be rewritten as a proximal operator of the function ; that is, . Consider the proximal operator . When , the norm is reduced to , where one easily establishes that
When is arbitrary, trivial algebraic manipulations give, with : and thus is a perfectly known object.
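The proximal operator of the counting penalty has a well-known closed form, hard thresholding: the componentwise minimizer of `lam*|x|_0 + 0.5*(x - u)^2` keeps `u` when `|u| > sqrt(2*lam)` and returns 0 when `|u| < sqrt(2*lam)`. A minimal sketch follows; the name `prox_l0`, the weight `lam`, and the scaling of the quadratic term are assumptions and may differ from the constants used in (22b).

```python
import numpy as np

def prox_l0(u, lam):
    """Componentwise minimizer of lam*|x|_0 + 0.5*(x - u)^2 (hard thresholding).

    At the tie |u| == sqrt(2*lam) both 0 and u are minimizers; this sketch returns 0.
    """
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) > np.sqrt(2.0 * lam), u, 0.0)
```

Because the operator is explicit and componentwise, the nonconvex subproblem costs no more than a single pass over the vector.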
Total variation denoising examples with three convex and nonconvex regularization instances (the two others are the convex and the nonconvex but smooth algorithms in [7, 30], resp.) are shown in Figure 1. The original piecewise constant signal data with length is obtained with MakeSignal in paper . The noisy data is obtained using additive white Gaussian noise (AWGN) (). For both convex and nonconvex cases, we set , consistent with the range suggested in  for standard (convex) TV denoising, and the nonconvexity parameter is set to its maximal value, the default in . These settings could lead to the best denoising result in their papers. All the other settings are consistent with paper . The maximum iteration numbers are all . All the codes are tested on the same computer.
According to the comparison between our algorithm for the TV-norm and the algorithms proposed in papers [7, 30], our algorithm gives a better result, with a smaller root-mean-square error (RMSE), where . Referring to Figure 1, the best RMSE results for 1D TV denoising with the convex penalty  and the smooth penalty  are 0.2720 and 0.2611, respectively. Ours is 0.1709, clearly lower than in the convex and smooth cases.
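An experiment of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the code behind Figure 1: the split objective is taken as `0.5*||y - x||^2 + lam*||z||_0 + 0.5*mu*||D x - z||^2`, the proximal weights `a` and `b`, the test signal, the noise level, and all parameter values are hypothetical and untuned, and no RMSE value is claimed for it.

```python
import numpy as np

def l0_tv_denoise(y, lam, mu=1.0, a=1e-3, b=1e-3, iters=200):
    """Alternating proximal minimization for the assumed split objective
    0.5*||y - x||^2 + lam*||z||_0 + 0.5*mu*||D x - z||^2."""
    y = np.asarray(y, dtype=float)
    n = y.size
    D = np.diff(np.eye(n), axis=0)              # (n-1) x n first-order difference matrix
    A = (1.0 + a) * np.eye(n) + mu * D.T @ D    # system matrix of the quadratic x-step
    x, z = y.copy(), np.zeros(n - 1)
    thresh = np.sqrt(2.0 * lam / (mu + b))
    for _ in range(iters):
        # x-step: quadratic terms plus the proximal term (a/2)*||x - x_k||^2
        x = np.linalg.solve(A, y + mu * D.T @ z + a * x)
        # z-step: hard-threshold the weighted average of D x and the previous z
        w = (mu * (D @ x) + b * z) / (mu + b)
        z = np.where(np.abs(w) > thresh, w, 0.0)
    return x

def rmse(x, x_true):
    """Root-mean-square error between an estimate and the clean signal."""
    return float(np.sqrt(np.mean((np.asarray(x) - np.asarray(x_true)) ** 2)))

rng = np.random.default_rng(0)
x_true = np.repeat([0.0, 2.0, -1.0, 1.0], 64)          # piecewise constant test signal
y = x_true + 0.5 * rng.standard_normal(x_true.size)    # additive white Gaussian noise
x_hat = l0_tv_denoise(y, lam=0.2, mu=5.0)
```

Calling `rmse(x_hat, x_true)` and `rmse(y, x_true)` then gives the kind of comparison reported above; the outcome depends strongly on `lam` and `mu`, which would need tuning for any real signal.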
Nonconvex nonsmooth algorithms find applications in many fields. In this paper, we give a general proximal alternating minimization method for a class of nonconvex nonsmooth problems with complex composition. It has a concise form, sound theoretical results, and promising numerical results. For the specific 1D standard TV denoising problem, the improvement over the existing algorithms [7, 30] is substantial. Besides, our algorithm applies to other nonconvex nonsmooth problems, such as block sparse recovery, group lasso, and image deconvolution, among many others.
The authors declare that they have no competing interests.
The work is supported in part by National Natural Science Foundation of China, no. 61571008.
- P. L. Combettes and J.-C. Pesquet, “Primal-dual splitting algorithm for solving inclusions with mixtures of composite, Lipschitzian, and parallel-sum type monotone operators,” Set-Valued and Variational Analysis, vol. 20, no. 2, pp. 307–330, 2012.
- L. Condat, “A primal-dual splitting method for convex optimization involving Lipschitzian, proximable and linear composite terms,” Journal of Optimization Theory and Applications, vol. 158, no. 2, pp. 460–479, 2013.
- J. E. Esser, Primal dual algorithms for convex models and applications to image restoration, registration and nonlocal inpainting [Ph.D. thesis], University of California, Los Angeles, Calif, USA, 2010.
- Y. B. Zhao and D. Li, “Reweighted $\ell_1$-minimization for sparse solutions to underdetermined linear systems,” SIAM Journal on Optimization, vol. 22, no. 3, pp. 1065–1088, 2012.
- J. Wright and Y. Ma, “Dense error correction via $\ell_1$-minimization,” IEEE Transactions on Information Theory, vol. 56, no. 7, pp. 3540–3560, 2010.
- T. Sun, H. Zhang, and L. Cheng, “Subgradient projection for sparse signal recovery with sparse noise,” Electronics Letters, vol. 50, no. 17, pp. 1200–1202, 2014.
- I. W. Selesnick, A. Parekh, and I. Bayram, “Convex 1-D total variation denoising with non-convex regularization,” IEEE Signal Processing Letters, vol. 22, no. 2, pp. 141–144, 2015.
- L. He and S. Schaefer, “Mesh denoising via L0 minimization,” ACM Transactions on Graphics, vol. 32, no. 4, article 64, 2013.
- L. Xu, C. Lu, Y. Xu, and J. Jia, “Image smoothing via L0 gradient minimization,” ACM Transactions on Graphics, vol. 30, no. 6, article 174, 2011.
- C. Brandt, H.-P. Seidel, and K. Hildebrandt, “Optimal spline approximation via $\ell_0$-minimization,” Computer Graphics Forum, vol. 34, no. 2, pp. 617–626, 2015.
- H. Zhang, L. Cheng, and W. Yin, “A dual algorithm for a class of augmented convex models,” https://arxiv.org/abs/1308.6337
- H. Attouch, J. Bolte, P. Redont, and A. Soubeyran, “Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka-Lojasiewicz inequality,” Mathematics of Operations Research, vol. 35, no. 2, pp. 438–457, 2010.
- H. Attouch, J. Bolte, and B. F. Svaiter, “Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods,” Mathematical Programming, vol. 137, no. 1-2, pp. 91–129, 2013.
- J. Bolte, S. Sabach, and M. Teboulle, “Proximal alternating linearized minimization for nonconvex and nonsmooth problems,” Mathematical Programming, vol. 146, no. 1-2, pp. 459–494, 2014.
- R. Tibshirani, “Regression shrinkage and selection via the lasso,” Journal of the Royal Statistical Society. Series B (Methodological), vol. 58, no. 1, pp. 267–288, 1996.
- G. Chen and M. Teboulle, “A proximal-based decomposition method for convex minimization problems,” Mathematical Programming, vol. 64, no. 1–3, pp. 81–101, 1994.
- L. Meier, S. Van De Geer, and P. Bühlmann, “The group Lasso for logistic regression,” Journal of the Royal Statistical Society. Series B. Statistical Methodology, vol. 70, no. 1, pp. 53–71, 2008.
- X. Liao, H. Li, and L. Carin, “Generalized alternating projection for weighted-$\ell_{2,1}$ minimization with applications to model-based compressive sensing,” SIAM Journal on Imaging Sciences, vol. 7, no. 2, pp. 797–823, 2014.
- E. Elhamifar and R. Vidal, “Block-sparse recovery via convex optimization,” IEEE Transactions on Signal Processing, vol. 60, no. 8, pp. 4094–4107, 2012.
- L. Condat, “A generic proximal algorithm for convex optimization—application to total variation minimization,” IEEE Signal Processing Letters, vol. 21, no. 8, pp. 985–989, 2014.
- T. F. Chan and C.-K. Wong, “Total variation blind deconvolution,” IEEE Transactions on Image Processing, vol. 7, no. 3, pp. 370–375, 1998.
- L. He, A. Marquina, and S. J. Osher, “Blind deconvolution using TV regularization and Bregman iteration,” International Journal of Imaging Systems and Technology, vol. 15, no. 1, pp. 74–83, 2005.
- N. Dey, L. Blanc-Feraud, C. Zimmer et al., “Richardson-Lucy algorithm with total variation regularization for 3D confocal microscope deconvolution,” Microscopy Research and Technique, vol. 69, no. 4, pp. 260–266, 2006.
- B. S. Mordukhovich, Variational Analysis and Generalized Differentiation I: Basic Theory, Springer Science & Business Media, Berlin, Germany, 2006.
- R. T. Rockafellar and R. J. B. Wets, Variational Analysis, Springer Science & Business Media, Berlin, Germany, 2009.
- L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Physica D: Nonlinear Phenomena, vol. 60, no. 1–4, pp. 259–268, 1992.
- M. Lysaker, A. Lundervold, and X.-C. Tai, “Noise removal using fourth-order partial differential equation with applications to medical magnetic resonance images in space and time,” IEEE Transactions on Image Processing, vol. 12, no. 12, pp. 1579–1590, 2003.
- M. Lysaker, S. Osher, and X.-C. Tai, “Noise removal using smoothed normals and surface fitting,” IEEE Transactions on Image Processing, vol. 13, no. 10, pp. 1345–1357, 2004.
- M. A. Little and N. S. Jones, “Generalized methods and solvers for noise removal from piecewise constant signals. I. Background theory,” Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, vol. 467, no. 2135, pp. 3088–3114, 2011.
- L. Dümbgen and A. Kovac, “Extensions of smoothing via taut strings,” Electronic Journal of Statistics, vol. 3, pp. 41–75, 2009.
Copyright © 2016 Xiaoya Zhang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.