Iterative Methods for Nonlinear Equations or Systems and Their Applications
Research Article, Open Access
New Nonsmooth Equations-Based Algorithms for ${\ell}_{1}$-Norm Minimization and Applications
Abstract
Recently, Xiao et al. proposed a nonsmooth equations-based method to solve the ${\ell}_{1}$-norm minimization problem (2011). The advantages of this method are its simplicity and low storage requirement. In this paper, based on a new nonsmooth equations reformulation, we investigate new nonsmooth equations-based algorithms for solving ${\ell}_{1}$-norm minimization problems. Under mild conditions, we show that the proposed algorithms are globally convergent. Preliminary numerical results demonstrate the effectiveness of the proposed algorithms.
1. Introduction
We consider the ${\ell}_{1}$-norm minimization problem where , , , and is a nonnegative parameter. Throughout the paper, we use and to denote the Euclidean norm and the ${\ell}_{1}$-norm of vector , respectively. Problem (1.1) has many important practical applications, particularly in compressed sensing (abbreviated as CS) [1] and image restoration [2]. It can also be viewed as a regularization technique to overcome the ill-conditioned, or even singular, nature of matrix , when trying to infer from noiseless observations or from noisy observations , where is the white Gaussian noise of variance [3–5].
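As an illustration, the following minimal sketch evaluates an objective of the kind described above. It assumes that (1.1) has the standard form of a least-squares term plus a weighted ${\ell}_{1}$ penalty, 0.5*||Ax - b||_2^2 + rho*||x||_1. The names `A`, `b`, `x`, `rho`, `matvec`, and `l1_ls_objective` are chosen here for illustration only.

```python
def matvec(A, x):
    """Multiply a dense matrix (given as a list of rows) by a vector."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def l1_ls_objective(A, b, x, rho):
    """Return 0.5 * ||A x - b||_2^2 + rho * ||x||_1 (assumed form of (1.1))."""
    residual = [r_i - b_i for r_i, b_i in zip(matvec(A, x), b)]
    smooth_part = 0.5 * sum(r_i * r_i for r_i in residual)
    l1_part = rho * sum(abs(x_j) for x_j in x)
    return smooth_part + l1_part


# Small hypothetical instance: A x = [1, 1], residual = [0, -1].
A = [[1.0, 0.0], [0.0, 2.0]]
b = [1.0, 2.0]
value = l1_ls_objective(A, b, [1.0, 0.5], rho=0.1)
```

Here the smooth part equals 0.5 and the penalty equals 0.15, so `value` is 0.65 up to rounding.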
The convex optimization problem (1.1) can be cast as a second-order cone programming problem and thus could be solved via interior point methods. However, in many applications, the problem is not only large scale but also involves dense matrix data, which often precludes the use and potential advantage of sophisticated interior point methods. This motivated the search for simpler first-order algorithms for solving (1.1), where the dominant computational effort is a relatively cheap matrix-vector multiplication involving the matrix in (1.1) and its transpose. In the past few years, several first-order algorithms have been proposed. One of the most popular falls into the iterative shrinkage/thresholding (IST) class [6, 7]. It was first designed for wavelet-based image deconvolution problems [8] and analyzed subsequently by many authors; see, for example, [9–11]. Figueiredo et al. [12] reformulated problem (1.1) as a box-constrained quadratic program and solved it by a gradient projection method with Barzilai-Borwein steps [13] (denoted by GPSR-BB). Wright et al. [14] presented a sparse reconstruction algorithm (denoted by SPARSA) to solve (1.1). Yun and Toh [15] proposed a block coordinate gradient descent algorithm for solving (1.1). Yang and Zhang [16] investigated alternating direction algorithms for solving (1.1).
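To make the cost of a first-order iteration concrete, here is a minimal sketch of one textbook IST step. It assumes that the objective is 0.5*||Ax - b||_2^2 + rho*||x||_1 and uses a fixed step size `alpha`. The function names and the step size are illustrative assumptions, not taken from [6, 7]. The step uses exactly one product with the matrix and one with its transpose.

```python
def soft_threshold(u, t):
    """Componentwise shrinkage: sign(u_i) * max(|u_i| - t, 0)."""
    return [max(u_i - t, 0.0) + min(u_i + t, 0.0) for u_i in u]


def ist_step(A, b, x, rho, alpha):
    """One IST step: gradient step on 0.5*||Ax - b||^2, then shrinkage."""
    m, n = len(A), len(x)
    # First matrix-vector product: residual r = A x - b.
    r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
    # Second matrix-vector product: gradient g = A^T r.
    g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
    u = [x[j] - alpha * g[j] for j in range(n)]
    return soft_threshold(u, alpha * rho)


# Hypothetical instance with A = I, where one step from zero is exact.
A = [[1.0, 0.0], [0.0, 1.0]]
b = [3.0, 0.5]
x_next = ist_step(A, b, [0.0, 0.0], rho=1.0, alpha=1.0)
```

With the identity matrix the step reduces to shrinking `b` by `rho`, which gives `[2.0, 0.0]`.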
Quite recently, Xiao et al. [17] developed a nonsmooth equations-based algorithm (called SGCS) for solving ${\ell}_{1}$-norm minimization problems in CS. They reformulated the box-constrained quadratic program obtained by Figueiredo et al. [12] into a system of nonsmooth equations and then applied the spectral gradient projection method [18] to solve it. The main advantages of SGCS are its simplicity and low storage requirement. Unlike the algorithms above, SGCS does not use a line search to decrease the value of the objective function at each iteration; instead, it uses a projection step to accelerate the iterative process. However, each projection step in SGCS requires two matrix-vector multiplications involving the matrix in (1.1) or its transpose, so each iteration requires four such multiplications, whereas each iteration of GPSR-BB and IST requires only two. This increases the computational cost per iteration. In addition, the dimension of the system of nonsmooth equations is twice that of the original problem. These drawbacks motivate us to study new nonsmooth equations-based algorithms for the ${\ell}_{1}$-norm minimization problem.
In this paper, we first reformulate problem (1.1) into a system of nonsmooth equations. This system is Lipschitz continuous and monotone, and many effective algorithms (see, e.g., [18–22]) can be used to solve it. We then apply the spectral gradient projection (denoted by SGP) method [18] to solve the resulting system. As with SGCS, each iteration of SGP requires four matrix-vector multiplications involving the matrix in (1.1) or its transpose. To reduce this cost, we also propose a modified SGP (denoted by MSGP) method to solve the resulting system. Under mild conditions, we establish the global convergence of the proposed algorithms.
The remainder of the paper is organized as follows. In Section 2, we first review some existing results of nonsmooth analysis and then derive a system of nonsmooth equations equivalent to problem (1.1). We also verify some useful properties of the resulting system in that section. In Section 3, we propose the algorithms and establish their global convergence. In Section 4, we apply the proposed algorithms to some practical problems arising from compressed sensing and image restoration and compare their performance with that of SGCS, SPARSA, and GPSR-BB.
Throughout the paper, we use to denote the inner product of two vectors in .
2. Preliminaries
By nonsmooth analysis, a necessary condition for a vector to be a local minimizer of a nonsmooth function is that where denotes the subdifferential of at [23]. If is convex, then (2.1) is also sufficient for to be a solution of (1.1). The subdifferential of the absolute value function is given by the signum function , that is, For problem (1.1), the optimality conditions therefore translate to where , . It is clear that the function defined by (1.1) is convex. Therefore, a point is a solution of problem (1.1) if and only if it satisfies Formally, we call the above conditions the optimality conditions for problem (1.1).
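For orientation, the conditions described in this paragraph take the following textbook form when the objective is assumed to be $f(x)=\frac{1}{2}\|Ax-b\|_2^2+\rho\|x\|_1$. The symbols $A$, $b$, $\rho$, and $g$ are notation adopted for this sketch and may differ from the paper's own.

```latex
% Subdifferential of the absolute value function
\partial |t| =
\begin{cases}
\{1\}, & t > 0,\\
[-1,1], & t = 0,\\
\{-1\}, & t < 0.
\end{cases}

% Optimality conditions 0 \in \partial f(x), written componentwise
% with g = A^{T}(Ax - b) and i = 1, \dots, n:
\begin{cases}
g_i + \rho = 0, & x_i > 0,\\
|g_i| \le \rho, & x_i = 0,\\
g_i - \rho = 0, & x_i < 0.
\end{cases}
```

Because the assumed $f$ is convex, these componentwise conditions are both necessary and sufficient for $x$ to be a minimizer.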
For any given , we define a mapping by Then is a continuous mapping and is closely related to problem (1.1). It is generally not Fréchet differentiable, but it is semismooth in the sense of Qi and Sun [24]. The following proposition shows that the ${\ell}_{1}$-norm minimization problem (1.1) is equivalent to a nonsmooth equation. It follows easily from the optimality conditions and the convexity of the function defined by (1.1).
Proposition 2.1. Let be any given constant. A point is a solution of problem (1.1) if and only if it satisfies
The above proposition has reformulated problem (1.1) as a system of nonsmooth equations. Compared with the nonsmooth equation reformulation in [17], the dimension of (2.6) is only half of the dimension of the equation in [17].
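To illustrate how a minimization problem of this type can be written as a nonsmooth equation in the original variables, the sketch below uses the well-known shrinkage (proximal) fixed-point residual. This is a stand-in chosen for illustration and is not necessarily the mapping defined in (2.5). It assumes the objective 0.5*||Ax - b||_2^2 + rho*||x||_1, and `fixed_point_residual`, `rho`, and `beta` are hypothetical names. For any `beta > 0`, the residual vanishes exactly at the minimizers.

```python
def fixed_point_residual(A, b, x, rho, beta):
    """Return x - shrink(x - beta * A^T (A x - b), beta * rho)."""
    m, n = len(A), len(x)
    r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
    grad = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
    u = [x[j] - beta * grad[j] for j in range(n)]
    t = beta * rho
    shrunk = [max(u_j - t, 0.0) + min(u_j + t, 0.0) for u_j in u]
    return [x[j] - shrunk[j] for j in range(n)]


# Hypothetical instance with A = I, whose minimizer is shrink(b, rho) = [2, 0].
A = [[1.0, 0.0], [0.0, 1.0]]
b = [3.0, 0.5]
res_at_solution = fixed_point_residual(A, b, [2.0, 0.0], rho=1.0, beta=0.5)
res_elsewhere = fixed_point_residual(A, b, [0.0, 0.0], rho=1.0, beta=0.5)
```

The residual has the same length as `x`, which mirrors the dimension advantage noted above relative to a reformulation with twice as many unknowns.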
Given . It is easy to verify that (see, e.g., [25]) with It is clear that , . By (2.5), for any , it holds that where , . Define two diagonal matrices and by Then we obtain Since , we get The next proposition shows the Lipschitz continuity of defined by (2.5).
Proposition 2.2. For each , there exists a positive constant such that
Proof. By (2.10) and (2.12), we have Let . Then (2.13) holds. The proof is complete.
The following proposition shows another good property of the system of nonsmooth equations (2.6).
Proposition 2.3. There exists a constant such that for any , the mapping is monotone, that is
Proof. Let be the th diagonal element of . It is clear that , . Set . Note that is symmetric and positive semidefinite. Consequently, for any , matrix is also positive semidefinite. Therefore, it follows from (2.12) that This completes the proof.
3. Algorithms and Their Convergence
In this section, we describe the proposed algorithms in detail and establish their convergence. Let be given. For simplicity, we omit and abbreviate as .
Algorithm 3.1 (spectral gradient projection method, abbreviated as SGP). Given initial point and constants , , , , . Set .
Step 1. Compute by where for each , is defined by with and . Stop if .
Step 2. Determine the step length with being the smallest nonnegative integer such that Set . Stop if .
Step 3. Compute Set , and go to Step 1.
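The three steps can be sketched as follows for a generic monotone, continuous mapping `F`. This is a simplified illustration in the spirit of the method of [18], not the exact Algorithm 3.1. In particular, it uses a constant shift `r` in the spectral coefficient instead of the modified choice described in Step 1, and the parameter names and default values (`beta`, `sigma`, `r`, `tol`, `max_iter`) are assumptions made for this sketch.

```python
import math


def dot(u, v):
    return sum(a * c for a, c in zip(u, v))


def sgp_solve(F, x0, beta=0.5, sigma=1e-4, r=0.01, tol=1e-8, max_iter=500):
    """Spectral gradient projection sketch for a monotone, continuous F."""
    x = list(x0)
    Fx = F(x)
    x_prev = F_prev = None
    for _ in range(max_iter):
        if math.sqrt(dot(Fx, Fx)) <= tol:
            break
        # Step 1: spectral (Barzilai-Borwein-type) scaling of -F(x).
        theta = 1.0
        if x_prev is not None:
            s = [a - c for a, c in zip(x, x_prev)]
            y = [a - c + r * s_i for a, c, s_i in zip(Fx, F_prev, s)]
            sy = dot(s, y)
            if sy > 0.0:
                theta = dot(s, s) / sy
        d = [-theta * f_i for f_i in Fx]
        # Step 2: backtrack until -F(x + alpha d)^T d >= sigma*alpha*||d||^2.
        alpha, dd = 1.0, dot(d, d)
        while True:
            z = [a + alpha * d_i for a, d_i in zip(x, d)]
            Fz = F(z)
            if -dot(Fz, d) >= sigma * alpha * dd:
                break
            alpha *= beta
        if math.sqrt(dot(Fz, Fz)) <= tol:
            x, Fx = z, Fz
            break
        # Step 3: project x onto the hyperplane {u : F(z)^T (u - z) = 0}.
        lam = dot(Fz, [a - c for a, c in zip(x, z)]) / dot(Fz, Fz)
        x_prev, F_prev = x, Fx
        x = [a - lam * f_i for a, f_i in zip(x, Fz)]
        Fx = F(x)
    return x


# Hypothetical monotone test mapping F(x) = 3 (x - c), with solution x = c.
c = [1.0, -2.0, 0.5]
solution = sgp_solve(lambda v: [3.0 * (v_i - c_i) for v_i, c_i in zip(v, c)],
                     [0.0, 0.0, 0.0])
```

Each pass evaluates `F` at a trial point in Step 2 and again at the new iterate after Step 3, which is the source of the extra mapping evaluations per iteration compared with methods that evaluate only once.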
Remark 3.2. (i) The idea of the above algorithm comes from [18]. The major difference between Algorithm 3.1 and the method in [18] lies in the definition of . The choice of in Step 1 follows the modified BFGS method [26]. The purpose of the term is to bring closer to as tends to a solution of (2.6).
(ii) Step 3 is called the projection step. It originates from [20]. The advantage of the projection step is that it makes closer to the solution set of (2.6) than . We refer to [20] for details.
(iii) Since , by the continuity of , it is easy to see that inequality (3.3) holds for all sufficiently large. Therefore Step 2 is well defined and so is Algorithm 3.1.
The following lemma comes from [20].
Lemma 3.3. Let be monotone and satisfy . Let Then for any satisfying , it holds that
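For reference, a commonly cited form of this projection property reads as follows. The symbols $F$, $x$, $y$, $x^{+}$, and $\bar{x}$ are notation assumed for this sketch, and the lemma's exact statement should be taken from [20].

```latex
% Assumptions: F is monotone and F(y)^{T}(x - y) > 0.
x^{+} = x - \frac{F(y)^{T}(x - y)}{\|F(y)\|^{2}}\, F(y)

% Then, for any \bar{x} with F(\bar{x}) = 0:
\|x^{+} - \bar{x}\|^{2} \le \|x - \bar{x}\|^{2} - \|x^{+} - x\|^{2}
```

The point $x^{+}$ is the projection of $x$ onto the hyperplane $\{u : F(y)^{T}(u - y) = 0\}$. By monotonicity, this hyperplane separates $x$ from every zero of $F$, which is why the distance to any solution cannot increase.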
The following theorem establishes the global convergence for Algorithm 3.1.
Theorem 3.4. Let be generated by Algorithm 3.1 and a solution of (2.6). Then one has In particular, is bounded. Furthermore, it holds that either is finite and the last iterate is a solution of the system of nonsmooth equations (2.6), or the sequence is infinite and . Moreover, converges to some solution of (2.6).
Proof. The proof is similar to that in [18]. We omit it here.
Remark 3.5. The computational cost of each step of SGP is clear. In large-scale problems, most of the work is matrix-vector multiplication involving the matrix in (1.1) and its transpose. Steps 1 and 2 of SGP each require two such multiplications, while each iteration of GPSR-BB requires only two in total. This increases the cost per iteration. Therefore, we give a modification of SGP. The modified algorithm, which will be called MSGP in the rest of the paper, coincides with SGP except at Step 3, whose description is given below.
Algorithm 3.6 (modified spectral gradient projection method, abbreviated as MSGP). Given initial point and constants , , , a positive integer . Set .
Step 3. Let . If is a positive integer, compute otherwise, let . Set , and go to Step 1.
Lemma 3.7. Assume that is a sequence generated by Algorithm 3.6 and satisfies . Let be the maximum eigenvalue of and . Then it holds that
Proof. Let be generated by (3.8). It follows from Lemma 3.3 that (3.9) holds. In the following, we assume that . Then, we obtain This together with (2.12) implies that Let . Then we get This completes the proof.
Now we establish a global convergence theorem for Algorithm 3.6.
Theorem 3.8. Let be the maximum eigenvalue of and . Assume that is generated by Algorithm 3.6 and is a solution of (2.6). Then one has In particular, is bounded. Furthermore, it holds that either is finite and the last iterate is a solution of the system of nonsmooth equations (2.6), or the sequence is infinite and . Moreover, converges to some solution of (2.6).
Proof. We first note that if the algorithm terminates at some iteration , then we have or . By the definition of , we have if . This shows that either or is a solution of (2.6).
Suppose that and for all . Then an infinite sequence is generated. It follows from (3.3) that
Let be an arbitrary solution of (2.6). By Lemmas 3.7 and 3.3, we obtain
where is a nonnegative integer. In particular, the sequence is nonincreasing and hence convergent. Moreover, the sequence is bounded, and
Following from (3.8) and (3.14), we have
This together with (3.16) yields
Now we consider the following two possible cases:(i);(ii).
If (i) holds, then by the continuity of and the boundedness of , it is clear that the sequence has some accumulation point such that . Since the sequence converges, it must hold that converges to .
If (ii) holds, then by the boundedness of and the continuity of , there exist a positive constant and a positive integer such that
On the other hand, from (3.2) and the definitions of and , we have
which together with (3.19) and Propositions 2.2 and 2.3 implies
Consequently, we obtain from (3.1), (3.19), and (3.21)
Therefore, it follows from (3.18) that . By the line search rule, we have for all sufficiently large, will not satisfy (3.3). This means
Since and are bounded, we can choose subsequences of and converging to and , respectively. Taking the limit in (3.23) for the subsequence, we obtain
However, it is not difficult to deduce from (3.1), (3.19), and (3.21) that
This yields a contradiction. Consequently, is not possible. The proof is then complete.
4. Applications to Compressed Sensing and Image Restoration
In this section, we apply the proposed algorithms, that is, SGP and MSGP, to solve some practical problems arising from compressed sensing and image restoration. We compare the proposed algorithms with SGCS, SPARSA, and GPSR-BB. The system of nonsmooth equations in SGCS is where , , are defined as those in [17]. All experiments were carried out on a Lenovo PC (2.53 GHz, 2.00 GB of RAM) using Matlab 7.8. The parameters in SGCS are specified as follows: The parameters in SGP and MSGP are specified as follows: Throughout the experiments, we choose the initial iterate to be .
In our first experiment, we consider a typical CS scenario, where the goal is to reconstruct a length sparse signal (in the canonical basis) from observations, where . The matrix is obtained by first filling it with independent samples of the standard Gaussian distribution and then orthonormalizing the rows. Owing to the storage limitations of the PC, we test a small signal with , . The observed vector is , where is Gaussian white noise with variance and is the original signal with randomly placed spikes and with zeros in the remaining elements. The regularization parameter is chosen as . We compare the performance of SGP and MSGP with that of SGCS, SPARSA, and GPSR-BB by solving the problem and choose in the SGP and MSGP algorithms. We measure the quality of restoration by the mean squared error (MSE) with respect to the original signal, defined by where is the restored signal. To perform this comparison, we first run the SGCS algorithm and stop it when the following inequality is satisfied: and then run each of the other algorithms until it reaches the same value of the objective function reached by SGCS.
The original signal and the estimate obtained by solving (1.1) with the MSGP method are shown in Figure 1. We can see from Figure 1 that MSGP does an excellent job of locating the spikes of the original signal. In Figure 2, we plot the evolution of the objective function versus iteration number and CPU time for these algorithms. It is readily seen that MSGP is faster than the other algorithms.
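As a small illustration of the quality measure, the sketch below computes the MSE in its usual form for this kind of experiment, namely the squared Euclidean distance between the restored and original signals divided by the signal length. This form is an assumption, and the function and variable names are illustrative.

```python
def mean_squared_error(x_true, x_restored):
    """MSE = (1/n) * ||x_restored - x_true||_2^2."""
    n = len(x_true)
    return sum((r - t) ** 2 for t, r in zip(x_true, x_restored)) / n


# Hypothetical 4-sample signal with one spike recovered at half amplitude.
mse = mean_squared_error([0.0, 1.0, 0.0, -1.0], [0.0, 0.5, 0.0, -1.0])
```

Only one entry differs, by 0.5, so `mse` equals 0.25 / 4 = 0.0625.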
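Figure 1: The original signal and the estimate obtained by MSGP (panels (a)–(c)).
Figure 2: Evolution of the objective function versus iteration number and CPU time (panels (a) and (b)).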
In the second experiment, we test MSGP on three image restoration problems based on the images House, Cameraman, and Barbara. The House and Cameraman images are of size and the other is of size . All the pixels are contaminated by Gaussian noise with a standard deviation of , together with blurring. The blurring function is chosen to be a two-dimensional Gaussian, truncated such that the function has a support of . The image restoration problem has the form (1.1), where and are the composition of the uniform blur matrix and the Haar discrete wavelet transform (DWT) operator. We compare the performance of MSGP with that of SGCS, SPARSA, and GPSR-BB by solving the problem and choose in the MSGP method. As usual, we measure the quality of restoration by the signal-to-noise ratio (SNR) defined as where and are the original and restored images, respectively. We first run SGCS and stop the process when the following inequality is satisfied: and then run the other algorithms until their objective function values reach the value attained by SGCS. Table 1 reports the number of iterations (Iter), the CPU time in seconds (Time), and the SNR with respect to the original images (SNR).
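The SNR computation can be sketched as below, assuming the common definition 20*log10(||x|| / ||x - x_restored||), which equals 10*log10 of the corresponding energy ratio. The paper's exact formula may differ, for instance by subtracting the image mean, and the names used are illustrative. Images are treated as flat lists of pixel values.

```python
import math


def snr_db(original, restored):
    """SNR in decibels: 10 * log10(||original||^2 / ||original - restored||^2)."""
    signal_energy = sum(v * v for v in original)
    # The error energy is zero for a perfect restoration; callers should guard that case.
    error_energy = sum((o - r) ** 2 for o, r in zip(original, restored))
    return 10.0 * math.log10(signal_energy / error_energy)


# Hypothetical two-pixel example: energy 25, error energy 0.25, ratio 100.
value_db = snr_db([3.0, 4.0], [3.0, 3.5])
```

An energy ratio of 100 corresponds to 20 dB, and a higher SNR indicates a restoration closer to the original image.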

It is easy to see from Table 1 that MSGP is competitive with the well-known algorithms SPARSA and GPSR-BB in computing time and number of iterations, and that it improves on SGCS considerably. Therefore, we conclude that MSGP provides a valid approach for solving ${\ell}_{1}$-norm minimization problems arising from image restoration.
Preliminary numerical experiments show that the SGP and MSGP algorithms improve on the SGCS algorithm considerably. This may be because the system of nonsmooth equations solved here has lower dimension than that in [17] and because our modification of the projection step reduces the computational cost.
Acknowledgments
The authors would like to thank the anonymous referee for the valuable suggestions and comments. L. Wu was supported by the NNSF of China under Grant 11071087; Z. Sun was supported by the NNSF of China under Grants 11126147 and 11201197.
References
[1] D. L. Donoho, “Compressed sensing,” IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
[2] M. Elad, B. Matalon, and M. Zibulevsky, “Image denoising with shrinkage and redundant representations,” in Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '06), pp. 1924–1931, New York, NY, USA, June 2006.
[3] S. Alliney and S. A. Ruzinsky, “An algorithm for the minimization of mixed ${\ell}_{1}$ and ${\ell}_{2}$ norms with application to Bayesian estimation,” IEEE Transactions on Signal Processing, vol. 42, no. 3, pp. 618–627, 1994.
[4] E. J. Candès, J. K. Romberg, and T. Tao, “Stable signal recovery from incomplete and inaccurate measurements,” Communications on Pure and Applied Mathematics, vol. 59, no. 8, pp. 1207–1223, 2006.
[5] D. L. Donoho, “For most large underdetermined systems of linear equations the minimal ${\ell}_{1}$-norm solution is also the sparsest solution,” Communications on Pure and Applied Mathematics, vol. 59, no. 6, pp. 797–829, 2006.
[6] I. Daubechies, M. Defrise, and C. De Mol, “An iterative thresholding algorithm for linear inverse problems with a sparsity constraint,” Communications on Pure and Applied Mathematics, vol. 57, no. 11, pp. 1413–1457, 2004.
[7] C. De Mol and M. Defrise, “A note on wavelet-based inversion algorithms,” Contemporary Mathematics, vol. 313, pp. 85–96, 2002.
[8] A. Chambolle, R. A. DeVore, N. Y. Lee, and B. J. Lucier, “Nonlinear wavelet image processing: variational problems, compression, and noise removal through wavelet shrinkage,” IEEE Transactions on Image Processing, vol. 7, no. 3, pp. 319–335, 1998.
[9] A. Beck and M. Teboulle, “A fast iterative shrinkage-thresholding algorithm for linear inverse problems,” SIAM Journal on Imaging Sciences, vol. 2, no. 1, pp. 183–202, 2009.
[10] E. T. Hale, W. Yin, and Y. Zhang, “Fixed-point continuation for ${\ell}_{1}$-minimization: methodology and convergence,” SIAM Journal on Optimization, vol. 19, no. 3, pp. 1107–1130, 2008.
[11] Z. Wen, W. Yin, D. Goldfarb, and Y. Zhang, “A fast algorithm for sparse reconstruction based on shrinkage, subspace optimization, and continuation,” SIAM Journal on Scientific Computing, vol. 32, no. 4, pp. 1832–1857, 2010.
[12] M. A. T. Figueiredo, R. D. Nowak, and S. J. Wright, “Gradient projection for sparse reconstruction: application to compressed sensing and other inverse problems,” IEEE Journal of Selected Topics in Signal Processing, vol. 1, no. 4, pp. 586–597, 2007.
[13] J. Barzilai and J. M. Borwein, “Two-point step size gradient methods,” IMA Journal of Numerical Analysis, vol. 8, no. 1, pp. 141–148, 1988.
[14] S. J. Wright, R. D. Nowak, and M. A. T. Figueiredo, “Sparse reconstruction by separable approximation,” IEEE Transactions on Signal Processing, vol. 57, no. 7, pp. 2479–2493, 2009.
[15] S. Yun and K.-C. Toh, “A coordinate gradient descent method for ${\ell}_{1}$-regularized convex minimization,” Computational Optimization and Applications, vol. 48, no. 2, pp. 273–307, 2011.
[16] J. Yang and Y. Zhang, “Alternating direction algorithms for ${\ell}_{1}$-problems in compressive sensing,” SIAM Journal on Scientific Computing, vol. 33, no. 1, pp. 250–278, 2011.
[17] Y. Xiao, Q. Wang, and Q. Hu, “Non-smooth equations based method for ${\ell}_{1}$-norm problems with applications to compressed sensing,” Nonlinear Analysis: Theory, Methods & Applications, vol. 74, no. 11, pp. 3570–3577, 2011.
[18] L. Zhang and W. J. Zhou, “Spectral gradient projection method for solving nonlinear monotone equations,” Journal of Computational and Applied Mathematics, vol. 196, no. 2, pp. 478–484, 2006.
[19] Q. N. Li and D. H. Li, “A class of derivative-free methods for large-scale nonlinear monotone equations,” IMA Journal of Numerical Analysis, vol. 31, no. 4, pp. 1625–1635, 2011.
[20] M. V. Solodov and B. F. Svaiter, “A globally convergent inexact Newton method for systems of monotone equations,” in Reformulation: Nonsmooth, Piecewise Smooth, Semismooth and Smoothing Methods, M. Fukushima and L. Qi, Eds., vol. 22, pp. 355–369, Kluwer Academic Publishers, 1998.
[21] W. J. Zhou and D. H. Li, “Limited memory BFGS method for nonlinear monotone equations,” Journal of Computational Mathematics, vol. 25, no. 1, pp. 89–96, 2007.
[22] W. J. Zhou and D. H. Li, “A globally convergent BFGS method for nonlinear monotone equations without any merit functions,” Mathematics of Computation, vol. 77, no. 264, pp. 2231–2240, 2008.
[23] F. H. Clarke, Optimization and Nonsmooth Analysis, John Wiley & Sons, New York, NY, USA, 1983.
[24] L. Q. Qi and J. Sun, “A nonsmooth version of Newton's method,” Mathematical Programming A, vol. 58, no. 3, pp. 353–367, 1993.
[25] X. Chen and S. Xiang, “Computation of error bounds for P-matrix linear complementarity problems,” Mathematical Programming A, vol. 106, no. 3, pp. 513–525, 2006.
[26] D. H. Li and M. Fukushima, “A modified BFGS method and its global convergence in nonconvex minimization,” Journal of Computational and Applied Mathematics, vol. 129, no. 1-2, pp. 15–35, 2001.
Copyright
Copyright © 2012 Lei Wu and Zhe Sun. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.