Abstract

We consider a class of problems whose objective functions are compositions of nonconvex nonsmooth functions with linear operators, a structure that arises widely in signal/image processing. By introducing an auxiliary variable, we propose an efficient general proximal alternating minimization algorithm that solves this class of nonconvex nonsmooth problems through alternating minimization. We give a systematic analysis that guarantees the convergence of the algorithm. Simulation results and a comparison with two existing algorithms for 1D total variation denoising validate the efficiency of the proposed approach. The algorithm thus contributes to the analysis and application of a wide class of nonconvex nonsmooth problems.

1. Introduction

In the past few years, increasing attention has been paid to convex optimization problems that consist of minimizing a sum of convex or smooth functions [1–3]. Each of the objective functions enjoys several desirable properties, such as strong convexity, Lipschitz continuity, or other convexity-related conditions, which usually lead to great computational advantages. Meanwhile, works on such convex problems have provided a sound theoretical foundation. Both the theoretical and the computational advantages have brought many benefits to practical use, particularly in signal/image processing, machine learning, computer vision, and so forth. However, it deserves special attention that convex or smooth models are often only approximations of nonconvex nonsmooth problems. For example, the nonconvex $\ell_0$ norm in sparse recovery problems is routinely relaxed to the convex $\ell_1$ norm, and many related works have been developed [4, 5]. Although the difference between a nonconvex nonsmooth problem and its approximation vanishes in certain cases, it is sometimes nonnegligible, as in the problem of [6]. On the other hand, the excellent numerical performance of various nonconvex nonsmooth algorithms encourages researchers to pursue the nonconvex methodology further.

Nonconvex and nonsmooth optimization problems are ubiquitous in different disciplines, including signal denoising [7], image deconvolution [8, 9], and other ill-posed inverse problems [10], to name a few. In this paper, we aim at solving the following generic nonconvex nonsmooth optimization problem, formulated in real Hilbert spaces $\mathcal{H}$ and $\mathcal{G}_i$, for some integer $m \geq 1$:

$$\min_{x \in \mathcal{H}} \; F(x) := f(x) + \sum_{i=1}^{m} g_i(B_i x), \tag{1}$$

where (i) $f : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$ and $g_i : \mathcal{G}_i \to \mathbb{R} \cup \{+\infty\}$ are proper lower semicontinuous functions; (ii) the operators $B_i : \mathcal{H} \to \mathcal{G}_i$ are linear; and (iii) the set of minimizers is supposed to be nonempty.

It is quite meaningful to find a common convergent point in the optimal set of sums of simple functions [2, 11]. Insightful studies for nonconvex problems are presented in [12, 13]: if a nonconvex structured function of the type $L(x, y) = f(x) + Q(x, y) + g(y)$ has the Kurdyka-Lojasiewicz (K-L) property, then each bounded sequence generated by a proximal alternating minimization algorithm converges to a critical point of $L$. Our work is based on this convergence result and introduces a generic proximal minimization method for the general model (1). Note that if some of the functions in (1) are zero and the linear operator $B_i$ is the identity, the model reduces to the common one in compressed sensing [14]. The model we consider is a generalization of many application problems, such as the common "lasso" problem [15] and the compositions in [2, 16]. Here, we provide a few typical examples.

Example 1 (1D total variation minimization [7]). In this application, we need to solve the following denoising problem:

$$\min_{x \in \mathbb{R}^n} \; \frac{1}{2}\|y - x\|_2^2 + \lambda \|Dx\|_0, \tag{2}$$

where $y \in \mathbb{R}^n$ is the input (noisy) signal, $\lambda > 0$, $\|\cdot\|_0$ counts the number of nonzero elements of its argument, and $Dx$ is defined as the first-order difference (discrete derivative) of the original signal; that is, $(Dx)_i = x_{i+1} - x_i$ for $i = 1, \dots, n-1$.
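For concreteness, the following minimal Python sketch builds the difference operator $D$ and evaluates the objective in (2) as written above; the function names are illustrative, not from the cited papers.

```python
import numpy as np

def diff_matrix(n):
    """First-order difference operator D of size (n-1) x n, so (D x)_i = x[i+1] - x[i]."""
    D = np.zeros((n - 1, n))
    D[np.arange(n - 1), np.arange(n - 1)] = -1.0
    D[np.arange(n - 1), np.arange(1, n)] = 1.0
    return D

def tv_l0_objective(x, y, lam):
    """Objective of (2): 0.5 * ||y - x||^2 + lam * ||D x||_0 (number of nonzero jumps)."""
    Dx = np.diff(x)  # equivalent to diff_matrix(len(x)) @ x
    return 0.5 * np.sum((y - x) ** 2) + lam * np.count_nonzero(Dx)
```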

Noise removal is the basis and prerequisite of other subsequent applications. The total variation (TV) regularizer is of great importance among denoising approaches since it can efficiently deal with noisy signals that have sparse derivatives (or gradients), for instance, piecewise constant (PWC) signals consisting of flat sections separated by a number of abrupt jumps. The 1D total variation minimization can also be extended to the related 2-dimensional image restoration problem.

Example 2 (group lasso [17]). In this application, one needs to solve

$$\min_{x \in \mathbb{R}^n} \; \frac{1}{2}\|Ax - b\|_2^2 + \lambda \sum_{i=1}^{s} \|x_i\|_2, \tag{3}$$

where $x_1, \dots, x_s$ are the decision block variables and $x = (x_1, \dots, x_s)$ with $x_i \in \mathbb{R}^{n_i}$, where $n_i$ is the corresponding block size.

In the past few years, research on structured sparse signal recovery has been very popular, and group lasso is a typical one among those important problems. It attracts much attention in face recognition, multiband signal processing, and other machine learning problems. The general case also applies to many other kinds of structured sparse recovery problems, such as the minimization problem in [18] and block sparse recovery [19].

Example 3 (image deconvolution [20]). In this application, one needs to solve

$$\min_{x} \; \frac{1}{2}\|Ax - b\|_2^2 + \lambda \,\mathrm{TV}(x), \tag{4}$$

where $A$ is the blurring (convolution) operator and $b$ is the observed image. The discrete total variation, denoted by $\mathrm{TV}(x)$ in (4), is defined as follows. We define the $n \times n$ first-order difference matrix

$$D = \begin{pmatrix} -1 & 1 & & & \\ & -1 & 1 & & \\ & & \ddots & \ddots & \\ & & & -1 & 1 \\ & & & & 0 \end{pmatrix},$$

and the discrete gradient operator $\nabla$ is defined as $\nabla x = (Dx, \, x D^{\top})$, collecting the vertical and horizontal differences of the $n \times n$ image $x$. Then we have $\mathrm{TV}(x) = \sum_{i,j} \|(\nabla x)_{i,j}\|$.

The concept of deconvolution finds many applications in signal and image processing [21–23]. In this paper, it is considered only as a specific instance of problem (1), although studying problem (4) with its own particular structure is equally important.

The main difficulty in solving (1) lies in the fact that each $g_i$ is coupled with the linear operator $B_i$. In order to surmount this computational barrier, we introduce new auxiliary variables and split the problem into two sequences of subproblems, described in detail in the next section. The resulting problem is then an extension of the problem studied in [12]. This paper aims at giving a generic proximal alternating minimization method for the class of nonconvex nonsmooth problems (1), applicable in many fields. The key idea is the introduction of auxiliary variables that split the original problem into two kinds of easier nonconvex nonsmooth subproblems. Recent studies often place a reasonable assumption on the regularizer $g_i$, namely, that the proximal map of $g_i$ is easy to calculate. The convergence analysis can then be extended within the framework of [12]. In the last section, we show an application to nonconvex nonsmooth 1D total variation signal denoising.

2. Algorithm

In order to reduce the computational complexity caused by the composition of the nonconvex function $g_i$ and the operator $B_i$, we introduce a sequence of auxiliary variables $u_i$, one for each $i = 1, \dots, m$. Then, the problem in (1) can be represented equivalently as follows:

$$\min_{x, u} \; L(x, u) := f(x) + \sum_{i=1}^{m} \Big( g_i(u_i) + \frac{\beta_i}{2}\|u_i - B_i x\|^2 \Big), \tag{7}$$

where the variables $u_1, \dots, u_m$ are represented as a concise whole $u = (u_1, \dots, u_m)$ and $\beta_i > 0$. The last term ensures high similarity between $u_i$ and $B_i x$, and this quadratic term can be minimized easily. Hence, the original complex composition is split into two simpler objectives.

Apparently, this form can now be solved by a series of alternating proximal minimization steps [12]: for each $k \geq 0$,

$$x^{k+1} \in \arg\min_{x} \; f(x) + \sum_{i=1}^{m} \frac{\beta_i}{2}\|u_i^{k} - B_i x\|^2 + \frac{1}{2\lambda_k}\|x - x^{k}\|^2, \tag{9a}$$

$$u_i^{k+1} \in \arg\min_{u_i} \; g_i(u_i) + \frac{\beta_i}{2}\|u_i - B_i x^{k+1}\|^2 + \frac{1}{2\mu_{i,k}}\|u_i - u_i^{k}\|^2, \quad i = 1, \dots, m. \tag{9b}$$

We then make the following assumption (A) about (9a) and (9b): (i) $f$ and $g_i$, $i = 1, \dots, m$, are proper lower semicontinuous functions; (ii) the objective $L$ in (7) is bounded from below, that is, $\inf L > -\infty$; (iii) the proximal parameters are bounded, that is, $0 < r_{-} \leq \lambda_k, \mu_{i,k} \leq r_{+} < +\infty$ for all $k$. The assumption is weak and can easily be met. The three examples given in the first section all satisfy the first two conditions, and the third condition holds when proper parameters are set in the practical tests. Besides, each of the proximal terms $\frac{1}{2\lambda_k}\|x - x^{k}\|^2$ and $\frac{1}{2\mu_{i,k}}\|u_i - u_i^{k}\|^2$ in (9a) and (9b) is used to ensure a sufficient decrease of the objective function. If we omit them, the algorithm can still perform well in practice, but the general convergence theory can no longer be built directly; in that case the method reduces to an alternating projection minimization method.
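To make the alternating structure concrete, here is a minimal Python sketch of updates (9a)–(9b) as reconstructed above. The helper names (`solve_x_subproblem`, `prox_g`) and the constant proximal parameters are illustrative assumptions, not part of the paper; each user-supplied callable is expected to solve the corresponding subproblem.

```python
import numpy as np

def proximal_alternating_minimization(x0, u0, B, solve_x_subproblem, prox_g,
                                       beta, lam, mu, max_iter=100):
    """Alternating updates in the spirit of (9a)-(9b).

    solve_x_subproblem(u, x_prev) should return
        argmin_x  f(x) + sum_i beta[i]/2 * ||u[i] - B[i] @ x||^2
                        + 1/(2*lam) * ||x - x_prev||^2,
    and prox_g[i](v, t) should return  argmin_u  g_i(u) + 1/(2*t) * ||u - v||^2.
    """
    x, u = x0, [ui.copy() for ui in u0]
    for _ in range(max_iter):
        # (9a): update x with all auxiliary variables fixed.
        x = solve_x_subproblem(u, x)
        # (9b): update each auxiliary variable by a proximal step on g_i.
        for i in range(len(u)):
            t = 1.0 / (beta[i] + 1.0 / mu)              # combined quadratic weight
            v = t * (beta[i] * (B[i] @ x) + u[i] / mu)  # weighted average of B_i x and u_i^k
            u[i] = prox_g[i](v, t)
        # A stopping test on the decrease of L(x, u) could be added here.
    return x, u
```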

3. Convergence Results

In fact, the reformulated problem is now a proximal alternating minimization case, whose global convergence has been analyzed in detail in [12]. That paper mainly concentrates on the theoretical analysis of problems of the following form:

$$\min_{x, y} \; L(x, y) = f(x) + Q(x, y) + g(y),$$

where $f$ and $g$ are proper lower semicontinuous and $Q$ is a smooth coupling term. If $L$ has the Kurdyka-Lojasiewicz (K-L) property, then each bounded sequence generated by the proximal alternating minimization algorithm converges to a critical point of $L$. Even the convergence rate of the algorithm can be computed, which depends on the geometrical properties of $L$ around its critical points. It is remarkable that the K-L property can be verified for many common functions.

The difference between the convergence setting of [12] and ours is that our minimization objective involves not two variables $x$ and $y$ but $x$ and a sequence of variables $u_1, \dots, u_m$. In this section, our work is to establish analogous results for algorithms (9a) and (9b).

3.1. Preliminary

Definition 4 (subdifferentials [24, 25]). Let $h$ be a proper and lower semicontinuous function. (1) For a given $x \in \operatorname{dom} h$, the Fréchet subdifferential of $h$ at $x$, written as $\hat{\partial} h(x)$, is the set of all vectors $v$ which satisfy

$$\liminf_{z \to x,\, z \neq x} \; \frac{h(z) - h(x) - \langle v, z - x \rangle}{\|z - x\|} \;\geq\; 0.$$

When $x \notin \operatorname{dom} h$, we set $\hat{\partial} h(x) = \emptyset$. (2) The "limiting" subdifferential, or simply the subdifferential, of $h$ at $x$, written as $\partial h(x)$, is defined through the following closure process:

$$\partial h(x) = \big\{ v : \exists\, x^k \to x,\; h(x^k) \to h(x),\; v^k \in \hat{\partial} h(x^k),\; v^k \to v \big\}.$$

A necessary condition for $x$ to be a minimizer of $h$ is

$$0 \in \partial h(x). \tag{13}$$

A point that satisfies (13) is called a limiting-critical point or simply a critical point. The set of critical points of $h$ is denoted by $\operatorname{crit} h$.
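As a simple illustration of Definition 4 and condition (13) (our own example, not taken from [24, 25]), consider $h(x) = |x|$ on $\mathbb{R}$:

```latex
% Subdifferential of h(x) = |x| and its critical point.
\[
  \partial h(x) =
  \begin{cases}
    \{-1\}, & x < 0,\\[2pt]
    [-1,\,1], & x = 0,\\[2pt]
    \{+1\}, & x > 0,
  \end{cases}
  \qquad\text{so } 0 \in \partial h(0),
\]
% and hence x = 0 satisfies the necessary condition (13): it is the unique critical
% point of h, which here is also its global minimizer.
```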

Given a starting point $(x^0, u^0)$, recall that the sequence generated by (9a) and (9b) is of the form $(x^k, u^k)_{k \geq 0}$ with $u^k = (u_1^k, \dots, u_m^k)$. According to the basic properties of $L$, we can deduce a few important points.

Corollary 5. Assume that the sequences $(x^k, u^k)$ are generated by (9a) and (9b) under assumption (A); then they are well defined. Moreover, consider the following. (i) The following estimate holds:

$$L(x^{k+1}, u^{k+1}) + \frac{1}{2\lambda_k}\|x^{k+1} - x^{k}\|^2 + \sum_{i=1}^{m} \frac{1}{2\mu_{i,k}}\|u_i^{k+1} - u_i^{k}\|^2 \;\leq\; L(x^{k}, u^{k});$$

hence $L(x^k, u^k)$ does not increase. (ii) $\sum_{k=0}^{\infty} \big( \|x^{k+1} - x^{k}\|^2 + \sum_{i=1}^{m} \|u_i^{k+1} - u_i^{k}\|^2 \big) < +\infty$; hence $\|x^{k+1} - x^{k}\| \to 0$ and $\|u_i^{k+1} - u_i^{k}\| \to 0$. (iii) For every $k \geq 0$, we have $w^{k+1} \in \partial L(x^{k+1}, u^{k+1})$, where

$$w^{k+1} = \Big( \sum_{i=1}^{m} \beta_i B_i^{\ast}\big(u_i^{k} - u_i^{k+1}\big) - \frac{x^{k+1} - x^{k}}{\lambda_k},\; -\frac{u_1^{k+1} - u_1^{k}}{\mu_{1,k}},\; \dots,\; -\frac{u_m^{k+1} - u_m^{k}}{\mu_{m,k}} \Big)$$

is a multivector with the same dimension as $(x, u)$.
Besides, for every bounded subsequence $(x^{k_j}, u^{k_j})$ of $(x^k, u^k)$ converging to some point $(\bar{x}, \bar{u})$, we have $L(x^{k_j}, u^{k_j}) \to L(\bar{x}, \bar{u})$ as $j \to \infty$.

Corollary 6. Assume that (A) holds. Let $(x^k, u^k)$ be a sequence complying with (9a) and (9b), and let $\omega$ denote the set (possibly empty) of its limit points. Then (i) if $(x^k, u^k)$ is bounded, then $\omega$ is a nonempty compact connected set and $\operatorname{dist}\big((x^k, u^k), \omega\big) \to 0$ as $k \to \infty$; (ii) $\omega \subset \operatorname{crit} L$; (iii) $L$ is finite and constant on $\omega$, equal to $\lim_{k \to \infty} L(x^k, u^k)$.

The above corollaries give some convergence results about the sequences generated by (7), (9a), and (9b). Point (ii) guarantees that all limit points produced by (9a) and (9b) must be limiting-critical points, and (iii) shows that the objective $L(x^k, u^k)$ converges to a finite constant value.

3.2. Convergence to a Critical Point

This part gives a more precise convergence analysis of the proximal algorithms (9a) and (9b).

Let $h$ be a proper lower semicontinuous function. For $-\infty < \eta_1 < \eta_2 \leq +\infty$, let us set $[\eta_1 < h < \eta_2] = \{x : \eta_1 < h(x) < \eta_2\}$. Then we give an important definition in optimization theory.

Definition 7 (K-L property [12]). The function $h$ is said to have the K-L property at $\bar{x} \in \operatorname{dom} \partial h$ if there exist $\eta \in (0, +\infty]$, a neighborhood $U$ of $\bar{x}$, and a continuous concave function $\varphi : [0, \eta) \to \mathbb{R}_{+}$ such that (i) $\varphi(0) = 0$; (ii) $\varphi$ is $C^1$ on $(0, \eta)$; (iii) for all $s \in (0, \eta)$, $\varphi'(s) > 0$; (iv) for all $x \in U \cap [h(\bar{x}) < h < h(\bar{x}) + \eta]$, the Kurdyka-Lojasiewicz inequality holds:

$$\varphi'\big(h(x) - h(\bar{x})\big)\,\operatorname{dist}\big(0, \partial h(x)\big) \;\geq\; 1.$$

To verify that a function has the K-L property, we need to exhibit a suitable function $\varphi$. Many convex functions, for instance, satisfy the above property with $\varphi(s) = c\, s^{1-\theta}$ for some $c > 0$ and $\theta \in [0, 1)$. Besides, a lot of nonconvex examples are given in [12].
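As a concrete check (an illustration of Definition 7, not an example taken from [12]), the function $h(x) = x^2$ satisfies the K-L inequality at its critical point $\bar{x} = 0$ with $\varphi(s) = \sqrt{s}$:

```latex
% K-L inequality for h(x) = x^2 at \bar{x} = 0 with \varphi(s) = \sqrt{s},
% so that \varphi'(s) = 1/(2\sqrt{s}).
\[
  h(x) - h(0) = x^2,
  \qquad
  \operatorname{dist}\bigl(0, \partial h(x)\bigr) = 2\,|x|,
\]
\[
  \varphi'\bigl(h(x) - h(0)\bigr)\,\operatorname{dist}\bigl(0, \partial h(x)\bigr)
  = \frac{1}{2\,|x|}\cdot 2\,|x| = 1 \;\ge\; 1
  \quad\text{for all } x \neq 0,
\]
% so conditions (i)-(iv) of Definition 7 hold with U = \mathbb{R} and \eta = +\infty.
```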

Below, we give the analysis of convergence to a critical point.

Theorem 8 (convergence). Assume that $L$ satisfies (A) and has the Kurdyka-Lojasiewicz property at each point of the domain of $\partial L$. Then (i) either $\|(x^k, u^k)\|$ tends to infinity, (ii) or $(x^k, u^k)_{k \geq 0}$ has finite length, that is, $\sum_{k=0}^{\infty} \big( \|x^{k+1} - x^{k}\| + \sum_{i=1}^{m} \|u_i^{k+1} - u_i^{k}\| \big) < +\infty$, and, as a consequence, $(x^k, u^k)$ converges to a critical point of $L$.

The proof of the above theorem follows the same line of analysis as in [12], so we present only the convergence results and omit the proofs.

4. Application to 1D TV Denoising

In practical scientific and engineering contexts, noise removal is the basis and prerequisite of other subsequent applications and has received extensive attention. A range of computational algorithms has been proposed to solve the denoising problem [26–28]. Among these solvers, the total variation (TV) regularizer is of great importance since it can efficiently deal with noisy signals that have sparse derivatives (or gradients). For instance, a piecewise constant (PWC) [29] signal with noise, whose derivative is sparse relative to the signal dimension, can be denoised by the powerful TV denoising method.

In the 1D TV denoising problem [10], one needs to solve model (2). TV denoising minimizes a composite objective with two parts. The first part keeps the error between the observed data and the estimate as small as possible; the second part promotes sparsity of the gradients. Usually, the denoising model is defined as a combination of a quadratic data fidelity term and either a convex regularization term or a differentiable regularization term, for example, the convex but nonsmooth problem [30]

$$\min_{x \in \mathbb{R}^n} \; \frac{1}{2}\|y - x\|_2^2 + \lambda \sum_{i=1}^{n-1} |x_{i+1} - x_i|, \tag{19}$$

or the differentiable but nonconvex problem [7]

$$\min_{x \in \mathbb{R}^n} \; \frac{1}{2}\|y - x\|_2^2 + \lambda \sum_{i=1}^{n-1} \phi(x_{i+1} - x_i),$$

where $\phi$ is a smooth nonconvex penalty. Exact solutions of the above two types can be obtained by very fast direct algorithms [7, 30]. In fact, the convex $\ell_1$ penalty in (19) is a common replacement of the nonconvex $\ell_0$ penalty in (2), since convex optimization techniques have been deeply studied. The latter penalties, like the logarithmic penalty and the arctangent penalty, can be solved by MM (majorization-minimization) update iterations, in which the total objective function (including data fidelity and regularization terms) has to remain strictly convex.
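The practical difference between the convex relaxation (19) and the nonconvex $\ell_0$ penalty in (2) is visible in their proximal maps; the following short Python sketch (an illustration of ours, not code from the cited papers) contrasts the two:

```python
import numpy as np

def soft_threshold(v, t):
    """Prox of t*|u| (the convex l1 penalty of (19)): shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def hard_threshold(v, t):
    """Prox of t*||u||_0 applied elementwise (the nonconvex l0 penalty of (2)):
    keep an entry unchanged if |v| > sqrt(2t), otherwise set it to zero."""
    return np.where(np.abs(v) > np.sqrt(2.0 * t), v, 0.0)
```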

In this test, we apply our algorithms (9a) and (9b) to this example. An auxiliary variable $u$ is introduced to reduce the complexity of the composition. Then (2) is represented as

$$\min_{x, u} \; \frac{1}{2}\|y - x\|_2^2 + \lambda \|u\|_0 + \frac{\beta}{2}\|u - Dx\|_2^2.$$

Apparently this problem satisfies the convergence conditions of [12]. The concrete steps given by algorithms (9a) and (9b) are

$$x^{k+1} = \arg\min_{x} \; \frac{1}{2}\|y - x\|_2^2 + \frac{\beta}{2}\|u^{k} - Dx\|_2^2 + \frac{1}{2\lambda_k}\|x - x^{k}\|_2^2, \tag{22a}$$

$$u^{k+1} \in \arg\min_{u} \; \lambda \|u\|_0 + \frac{\beta}{2}\|u - Dx^{k+1}\|_2^2 + \frac{1}{2\mu_k}\|u - u^{k}\|_2^2. \tag{22b}$$

In fact, in the tests the proximal parameters $\lambda_k$ and $\mu_k$ can be set to very large constants, so the last proximal terms $\frac{1}{2\lambda_k}\|x - x^{k}\|_2^2$ and $\frac{1}{2\mu_k}\|u - u^{k}\|_2^2$ in each computing step become negligible. Hence, the computation of (22a) and (22b) proceeds as follows.

Computation of (22a). The former step (22a) minimizes a smooth quadratic function and can be computed by many techniques, such as gradient descent.
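Here is a minimal sketch of this x-update, assuming the reconstructed form of (22a) above with the proximal term dropped; instead of gradient descent, it solves the optimality system $(I + \beta D^{\top}D)x = y + \beta D^{\top}u$ directly:

```python
import numpy as np

def update_x(y, u, beta):
    """x-step of (22a) with the proximal term dropped:
    minimize 0.5*||y - x||^2 + beta/2 * ||u - D x||^2.
    Setting the gradient to zero gives (I + beta * D^T D) x = y + beta * D^T u."""
    n = y.size
    D = np.zeros((n - 1, n))
    D[np.arange(n - 1), np.arange(n - 1)] = -1.0
    D[np.arange(n - 1), np.arange(1, n)] = 1.0
    A = np.eye(n) + beta * (D.T @ D)
    return np.linalg.solve(A, y + beta * (D.T @ u))
```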

Computation of (22b). Apparently, the latter step (22b), with the proximal term dropped, can be rewritten as the evaluation of a proximal operator [12]; that is, $u^{k+1} \in \operatorname{prox}_{(\lambda/\beta)\|\cdot\|_{0}}(Dx^{k+1})$, where $\operatorname{prox}_{t\|\cdot\|_{0}}(v) = \arg\min_{u} \, t\|u\|_{0} + \frac{1}{2}\|u - v\|^{2}$. Consider this proximal operator in the scalar case with unit weight ($t = 1$), where the $\ell_0$ norm reduces to the indicator of nonzero values; one easily establishes that $\operatorname{prox}_{|\cdot|_{0}}(v) = v$ if $|v| > \sqrt{2}$, $\operatorname{prox}_{|\cdot|_{0}}(v) = 0$ if $|v| < \sqrt{2}$, and both values are admissible if $|v| = \sqrt{2}$.

When the weight $t > 0$ is arbitrary, trivial algebraic manipulations give, componentwise and with threshold $\sqrt{2t}$,

$$\big(\operatorname{prox}_{t\|\cdot\|_{0}}(v)\big)_{i} = \begin{cases} v_{i}, & |v_{i}| > \sqrt{2t}, \\ 0, & |v_{i}| \leq \sqrt{2t}, \end{cases}$$

and thus $\operatorname{prox}_{t\|\cdot\|_{0}}$ is a perfectly known object: componentwise hard thresholding.
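A matching sketch of the u-update (22b), again with the proximal term dropped so that the step reduces to componentwise hard thresholding of $Dx^{k+1}$ with threshold $\sqrt{2\lambda/\beta}$:

```python
import numpy as np

def update_u(x, lam, beta):
    """u-step of (22b) with the proximal term dropped:
    minimize lam*||u||_0 + beta/2 * ||u - D x||^2, i.e. componentwise hard
    thresholding of D x with threshold sqrt(2*lam/beta)."""
    v = np.diff(x)  # v = D x
    thr = np.sqrt(2.0 * lam / beta)
    return np.where(np.abs(v) > thr, v, 0.0)
```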

Total variation denoising examples with three convex and nonconvex regularization instances (the other two are the convex algorithm of [30] and the nonconvex-but-smooth algorithm of [7]) are shown in Figure 1. The original piecewise constant signal data is generated with MakeSignal from [7]. The noisy data is obtained by adding white Gaussian noise (AWGN). For both the convex and nonconvex cases, the regularization parameter is set consistently with the range suggested in [30] for standard (convex) TV denoising, and the nonconvexity parameter is set to its maximal value, the default in [7]. These settings lead to the best denoising results reported in the respective papers. All the other settings are consistent with [7]. The maximum numbers of iterations are the same for all methods. All the codes are tested on the same computer.
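The following end-to-end sketch shows how such an experiment can be assembled in Python, reusing `update_x` and `update_u` from the sketches above. The synthetic signal, noise level, and parameters (`lam`, `beta`, `n_iter`) are illustrative stand-ins, not the paper's actual data or settings, and MakeSignal from [7] is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative piecewise constant signal (a stand-in for MakeSignal in [7]).
x_true = np.concatenate([np.full(64, 0.0), np.full(64, 3.0),
                         np.full(64, 1.0), np.full(64, 4.0)])
y = x_true + 0.5 * rng.standard_normal(x_true.size)   # AWGN with illustrative sigma

lam, beta, n_iter = 1.0, 10.0, 100                     # illustrative parameters
x, u = y.copy(), np.diff(y)
for _ in range(n_iter):
    x = update_x(y, u, beta)     # quadratic x-step, see the computation of (22a)
    u = update_u(x, lam, beta)   # hard-thresholding u-step, see the computation of (22b)

rmse = np.sqrt(np.mean((x - x_true) ** 2))             # Root-Mean-Square-Error
print(f"RMSE = {rmse:.4f}")
```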

In the comparison between our algorithm for the $\ell_0$ TV model (2) and the algorithms proposed in [7, 30], our algorithm achieves a better result with a smaller Root-Mean-Square-Error (RMSE), where $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(\hat{x}_i - x_i^{\ast})^2}$, $\hat{x}$ denotes the denoised estimate, and $x^{\ast}$ the noise-free signal. Referring to Figure 1, the best RMSE results for 1D TV denoising with the convex penalty [30] and the smooth nonconvex penalty [7] are 0.2720 and 0.2611, respectively, while ours is 0.1709, much better than the convex and smooth cases.

5. Conclusion

Nonconvex nonsmooth algorithms find interesting applications in many fields. In this paper, we give a general proximal alternating minimization method for a class of nonconvex nonsmooth problems with a complex composite structure. It has a concise form, good theoretical results, and promising numerical results. For the specific 1D standard TV denoising problem, the improvement over the existing algorithms [7, 30] is substantial. Besides, our algorithm works on other nonconvex nonsmooth problems, such as block sparse recovery, group lasso, and image deconvolution, among many others.

Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

The work is supported in part by National Natural Science Foundation of China, no. 61571008.