Sparse Recovery by Semi-Iterative Hard Thresholding Algorithm
We propose a computationally simple and efficient method for sparse recovery, termed semi-iterative hard thresholding (SIHT). Unlike existing iterative-shrinkage algorithms, which rely crucially on the negative gradient as the search direction, the proposed algorithm uses a linear combination of the current gradient and the directions of a few previous steps as the search direction. Compared with other iterative-shrinkage algorithms, the proposed method shows a clear improvement in the number of iterations and in recovery error in the noiseless case, while the computational complexity does not increase.
Compressed sensing (CS) [1–3] is a new framework for acquiring sparse signals based on the revelation that a small number of linear measurements of a signal contain enough information for its reconstruction. CS relies on the fact that many natural signals are sparse or compressible when expressed in a proper basis or frame. The CS model can be written as a linear sampling operator, given by a matrix $\Phi$, that yields a measurement vector
$$y = \Phi x, \tag{1}$$
where $\Phi$ is an $M \times N$ matrix, $x$ is a $K$-sparse vector, and $K < M \ll N$. Since the linear sampling operator is not a bijection, (1) has infinitely many solutions, so efficient algorithms for finding sparse solutions are becoming very important. This leads to the $\ell_0$-minimization problem
$$\min_{x} \|x\|_0 \quad \text{subject to} \quad y = \Phi x. \tag{2}$$
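To make the measurement model (1) and its non-uniqueness concrete, here is a minimal pure-Python sketch (standard library only). The sizes, seed, and function names are hypothetical. It builds a $K$-sparse $x$, a Gaussian $\Phi$, and $y = \Phi x$, then adds a null-space direction of $\Phi$ to obtain a second, dense vector with the same measurements, which is why the sparsity objective in (2) is needed.

```python
import math
import random


def matvec(A, x):
    """Return A @ x for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def l0(x, tol=1e-12):
    """Count the nonzero entries of x (the quantity minimized in (2))."""
    return sum(1 for v in x if abs(v) > tol)


def null_space_vector(A, rng):
    """Return a random vector z with A z = 0 (up to rounding).

    The rows of A are orthonormalized with modified Gram-Schmidt, and the
    row-space component of a random Gaussian vector is then removed.
    """
    basis = []
    for row in A:
        v = list(row)
        for q in basis:
            c = sum(a * b for a, b in zip(v, q))
            v = [a - c * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        basis.append([a / norm for a in v])
    z = [rng.gauss(0.0, 1.0) for _ in range(len(A[0]))]
    for q in basis:
        c = sum(a * b for a, b in zip(z, q))
        z = [a - c * b for a, b in zip(z, q)]
    return z


rng = random.Random(0)
M, N, K = 10, 30, 3                        # hypothetical small sizes, K < M < N
Phi = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(M)]
x = [0.0] * N
for i in rng.sample(range(N), K):
    x[i] = rng.choice([-1.0, 1.0])         # K randomly placed spikes
y = matvec(Phi, x)                          # measurement model (1)

x_alt = [a + b for a, b in zip(x, null_space_vector(Phi, rng))]
y_alt = matvec(Phi, x_alt)
print(l0(x), l0(x_alt))                     # sparse vs. dense vector
print(max(abs(a - b) for a, b in zip(y, y_alt)))  # tiny: identical measurements
```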
Unfortunately, this minimization problem is NP-hard. As alternatives, approximation algorithms are often considered. Approximation algorithms for finding sparse solutions may be classified into greedy pursuit algorithms, convex relaxation algorithms, Bayesian frameworks, and nonconvex optimization. In this paper, we focus on greedy pursuit and convex relaxation algorithms; more details on Bayesian frameworks and nonconvex optimization methods can be found in [4, 5]. Greedy pursuit algorithms include orthogonal matching pursuit (OMP), stagewise OMP (StOMP), regularized OMP (ROMP), compressive sampling matching pursuit (CoSaMP), iterative hard thresholding (IHT), and gradient descent with sparsification (GraDeS). Convex relaxation algorithms include gradient projection for sparse reconstruction (GPSR) and sparse reconstruction by separable approximation (SpaRSA); further details on convex relaxation algorithms can be found in the literature. Convex relaxation algorithms succeed with a very small number of measurements, but they tend to be computationally burdensome.
An alternative family of numerical algorithms, the iterative-shrinkage algorithms, has gradually been developed and addresses these optimization problems very effectively. Iterative-shrinkage algorithms include IHT, GraDeS, parallel coordinate descent (PCD), and the fast iterative shrinkage-thresholding algorithm (FISTA). In these methods, each iteration consists of a multiplication by $\Phi$ and by its transpose, along with a scalar shrinkage step on the resulting vector. Among the iterative-shrinkage algorithms, IHT and GraDeS use the negative gradient as the search direction, that is, the Landweber iteration. The main drawback of the Landweber iteration is its slow performance: a large number of iterations is needed to attain the optimal convergence rate.
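The negative-gradient (Landweber) step followed by hard thresholding can be sketched as below. This is a minimal pure-Python illustration, not the authors' MATLAB code; the function names are hypothetical. With `step=1.0` it is the IHT update $x \leftarrow H_K(x + \Phi^T(y - \Phi x))$; a step $1/\gamma$ with $\gamma > 1$ gives the GraDeS-style update. Each iteration costs one product with $\Phi$ and one with $\Phi^T$.

```python
def matvec(A, x):
    """Return A @ x for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def rmatvec(A, r):
    """Return A^T @ r without forming the transpose."""
    out = [0.0] * len(A[0])
    for row, ri in zip(A, r):
        for j, a in enumerate(row):
            out[j] += a * ri
    return out


def hard_threshold(v, k):
    """H_K: keep the k largest-magnitude entries of v and zero the rest."""
    keep = set(sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)[:k])
    return [v[i] if i in keep else 0.0 for i in range(len(v))]


def iht(Phi, y, k, step=1.0, iterations=100):
    """Fixed-step hard thresholding along the negative gradient.

    step=1.0 is the IHT update; step=1/gamma with gamma > 1 mimics GraDeS.
    """
    x = [0.0] * len(Phi[0])
    for _ in range(iterations):
        r = [yi - p for yi, p in zip(y, matvec(Phi, x))]  # residual y - Phi x
        g = rmatvec(Phi, r)                                 # negative gradient
        x = hard_threshold([xi + step * gi for xi, gi in zip(x, g)], k)
    return x
```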
Inspired by semi-iterative methods and hard thresholding, we present an algorithm for sparse recovery that requires less time and fewer iterations.
2. Background on Compressed Sensing
2.1. Sensing Matrix
Without further information, it is impossible to recover $x$ from $y$, since (1) is highly underdetermined. In order to recover a good estimate of $x$ from $M$ measurements, the measurement matrix $\Phi$ must obey the restricted isometry property (RIP)
$$(1 - \delta_K)\|x\|_2^2 \le \|\Phi x\|_2^2 \le (1 + \delta_K)\|x\|_2^2 \tag{3}$$
for all $x \in \Sigma_K$, where $\Sigma_K$ denotes the set of $K$-sparse vectors and $\delta_K \in (0, 1)$ is the restricted isometry constant, provided that $M \ge C K \log(N/K)$, where $C$ is some constant depending on each instance. It is difficult to verify the RIP for a given matrix. A widely used technique for avoiding checking the RIP directly is to generate the matrix randomly, for example as a Gaussian matrix, a symmetric Bernoulli matrix, or a partial Fourier matrix [1–3], and to show that the resulting random matrix satisfies the RIP with high probability. In this paper, we use a Gaussian matrix as the measurement matrix.
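As a rough, non-certifying look at (3), the sketch below samples the ratio $\|\Phi x\|_2^2 / \|x\|_2^2$ for random $K$-sparse $x$ and a Gaussian $\Phi$ with i.i.d. $\mathcal{N}(0, 1/M)$ entries (the scaling and the sizes are assumptions chosen so the ratio centers on 1). Because the RIP must hold for every $K$-sparse vector simultaneously, sampling random vectors only suggests the size of $\delta_K$; it does not verify the property.

```python
import random


def rip_ratio_samples(m, n, k, trials, seed=0):
    """Sample ||Phi x||_2^2 / ||x||_2^2 for random k-sparse x.

    Phi has i.i.d. N(0, 1/m) entries, so E||Phi x||^2 = ||x||^2.  Only random
    sparse vectors are probed, so this is a rough look at delta_K, not a
    certificate: the RIP (3) must hold for every k-sparse x at once.
    """
    rng = random.Random(seed)
    sigma = 1.0 / m ** 0.5
    phi = [[rng.gauss(0.0, sigma) for _ in range(n)] for _ in range(m)]
    ratios = []
    for _ in range(trials):
        support = rng.sample(range(n), k)
        vals = [rng.gauss(0.0, 1.0) for _ in support]
        # Only the k columns on the support contribute to Phi x.
        phix = [sum(row[i] * v for i, v in zip(support, vals)) for row in phi]
        ratios.append(sum(p * p for p in phix) / sum(v * v for v in vals))
    return ratios


ratios = rip_ratio_samples(m=128, n=512, k=5, trials=200)
print(min(ratios), max(ratios))  # empirical spread around 1
```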
2.2. Sparse Recovery
One approach to sparse signal recovery is based on iterative greedy pursuit and tries to approximate the solution to (2) directly. In this case, problem (2) is closely related to the optimization problem
$$\min_{x} \frac{1}{2}\|y - \Phi x\|_2^2 \quad \text{subject to} \quad \|x\|_0 \le K, \tag{4}$$
where $\|x\|_0$ denotes the sparsity level of the vector $x$.
The second approach is convex relaxation. In this case, problem (2) is closely related to the optimization problem
$$\min_{x} \frac{1}{2}\|y - \Phi x\|_2^2 + \lambda \|x\|_1, \tag{5}$$
where $\lambda > 0$ is a regularization parameter. However, these methods are often inefficient, requiring many iterations and excessive central processing unit time to reach their solutions.
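For contrast with hard thresholding, problem (5) can be attacked by proximal-gradient iterations with soft thresholding. The sketch below is ISTA, the unaccelerated precursor of FISTA, swapped in purely for illustration; it is not GPSR or SpaRSA, and the function names are hypothetical. It assumes `0 < step <= 1 / ||Phi||_2^2`, the usual condition for convergence.

```python
import math


def matvec(A, x):
    """Return A @ x for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def rmatvec(A, r):
    """Return A^T @ r without forming the transpose."""
    out = [0.0] * len(A[0])
    for row, ri in zip(A, r):
        for j, a in enumerate(row):
            out[j] += a * ri
    return out


def soft_threshold(v, t):
    """Entrywise soft thresholding: sign(a) * max(|a| - t, 0)."""
    return [math.copysign(max(abs(a) - t, 0.0), a) for a in v]


def ista(Phi, y, lam, step, iterations=200):
    """Proximal-gradient iterations for 0.5 * ||y - Phi x||_2^2 + lam * ||x||_1."""
    x = [0.0] * len(Phi[0])
    for _ in range(iterations):
        r = [yi - p for yi, p in zip(y, matvec(Phi, x))]
        g = rmatvec(Phi, r)  # negative gradient of the quadratic term
        x = soft_threshold([xi + step * gi for xi, gi in zip(x, g)], step * lam)
    return x
```

With $\Phi = I$ and `step=1.0`, the minimizer of (5) is simply `soft_threshold(y, lam)`, which makes a convenient sanity check.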
An alternative family of numerical algorithms, the iterative-shrinkage algorithms, has gradually been developed and addresses the above optimization problems very effectively. We discuss iterative-shrinkage algorithms in the next section.
3. Semi-Iterative Hard Thresholding
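The main drawback of the Landweber iteration is its comparatively slow rate of convergence, because only information about the last iterate is used to construct the new approximation. To overcome this drawback, more sophisticated iterative methods have been developed on the basis of the so-called semi-iterative methods. A basic step of a semi-iterative method (also called a polynomial acceleration method) consists of one step of the basic iteration followed by an averaging process over all or some of the previously obtained approximations. It has the form
$$x_k = \sum_{j=1}^{k} \mu_{k,j}\, x_{k-j} + \omega_k \Phi^T (y - \Phi x_{k-1}),$$
where $\sum_{j=1}^{k} \mu_{k,j} = 1$. Examples of semi-iterative methods with an optimal rate of convergence are the $\nu$-methods (two-step methods), which, for a parameter $\nu > 0$, are defined by
$$x_k = x_{k-1} + \mu_k (x_{k-1} - x_{k-2}) + \omega_k \Phi^T (y - \Phi x_{k-1})$$
(that is, $\mu_{k,1} = 1 + \mu_k$, $\mu_{k,2} = -\mu_k$, and $\mu_{k,j} = 0$ for $j > 2$), where $\mu_1 = 0$, $\omega_1 = (4\nu + 2)/(4\nu + 1)$, and, for $k > 1$,
$$\mu_k = \frac{(k-1)(2k-3)(2k+2\nu-1)}{(k+2\nu-1)(2k+4\nu-1)(2k+2\nu-3)}, \qquad \omega_k = \frac{4(2k+2\nu-1)(k+\nu-1)}{(k+2\nu-1)(2k+4\nu-1)}.$$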
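The effect of the $\nu$-method coefficients can be seen on a single eigen-component. In the sketch below (an illustration under stated assumptions, with hypothetical names), `lam` plays the role of one eigenvalue of $\Phi^T \Phi$, assumed to lie in $(0, 1]$ as it does when $\|\Phi\|_2 \le 1$. The exact solution is 1 and the start is 0, so the returned value is the relative error after `iterations` steps; plain Landweber gives $(1 - \lambda)^k$.

```python
def nu_coefficients(k, nu):
    """Return (mu_k, omega_k) of the nu-method for iteration k >= 1."""
    if k == 1:
        mu = 0.0  # mu_1 = 0 by definition
    else:
        mu = ((k - 1) * (2 * k - 3) * (2 * k + 2 * nu - 1)) / (
            (k + 2 * nu - 1) * (2 * k + 4 * nu - 1) * (2 * k + 2 * nu - 3))
    omega = 4 * (2 * k + 2 * nu - 1) * (k + nu - 1) / (
        (k + 2 * nu - 1) * (2 * k + 4 * nu - 1))
    return mu, omega


def error_after(lam, iterations, nu=None):
    """Relative error on one eigen-component lam of Phi^T Phi (0 < lam <= 1).

    nu=None runs the Landweber iteration (error (1 - lam)^k); otherwise the
    two-step nu-method recursion is applied to the error e_k = 1 - x_k.
    """
    e_prev, e = 1.0, 1.0  # x_{-1} = x_0 = 0 and exact solution x* = 1
    for k in range(1, iterations + 1):
        if nu is None:
            e_prev, e = e, (1.0 - lam) * e
        else:
            mu, omega = nu_coefficients(k, nu)
            e_prev, e = e, e + mu * (e - e_prev) - omega * lam * e
    return abs(e)


for lam in (1.0, 0.1, 0.01):
    print(lam, error_after(lam, 50), error_after(lam, 50, nu=1.0))
```

The gain shows up on small eigenvalues, which is where Landweber stalls; for $\lambda = 1$ Landweber is already exact after one step.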
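From (4), the gradient of the cost function $f(x) = \frac{1}{2}\|y - \Phi x\|_2^2$ is
$$g(x) = -\Phi^T (y - \Phi x),$$
and it is easy to compute the step length $\alpha_k$ that minimizes $f(x_k - \alpha g_k)$, where $g_k = g(x_k)$. Differentiating with respect to $\alpha$ gives
$$\frac{d}{d\alpha} f(x_k - \alpha g_k) = -g_k^T g(x_k - \alpha g_k) = -\|g_k\|_2^2 + \alpha \|\Phi g_k\|_2^2.$$
Setting the derivative to zero yields
$$\alpha_k = \frac{\|g_k\|_2^2}{\|\Phi g_k\|_2^2}. \tag{10}$$
If the step lengths are chosen by (10), then $g_{k+1}^T g_k = 0$; that is, the new gradient is orthogonal to the previous gradient (the previous search direction). In this case, the sequence of iterates zigzags. IHT and GraDeS use the negative gradient of the cost function as the search direction, and the sampling matrix must obey the RIP (3), which, whenever (3) applies to $g_k$, confines $\alpha_k$ to the interval $[1/(1+\delta_K),\ 1/(1-\delta_K)]$ around the unit step used by IHT; thus the iterates zigzag toward the solution. As a result, a large number of iterations is needed to obtain the optimal solution.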
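The orthogonality claim can be checked numerically. The sketch below runs plain steepest descent with the exact step (10) on $f(x) = \frac{1}{2}\|y - \Phi x\|_2^2$ (no thresholding; sizes and seed are hypothetical) and records the cosine between successive gradients, which stays at rounding level: the zigzag pattern described above.

```python
import random


def matvec(A, x):
    """Return A @ x for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def rmatvec(A, r):
    """Return A^T @ r without forming the transpose."""
    out = [0.0] * len(A[0])
    for row, ri in zip(A, r):
        for j, a in enumerate(row):
            out[j] += a * ri
    return out


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gradient(Phi, y, x):
    """g(x) = -Phi^T (y - Phi x), the gradient of 0.5 * ||y - Phi x||^2."""
    r = [yi - p for yi, p in zip(y, matvec(Phi, x))]
    return [-v for v in rmatvec(Phi, r)]


rng = random.Random(1)
M, N = 20, 40  # hypothetical sizes
Phi = [[rng.gauss(0.0, 1.0 / M ** 0.5) for _ in range(N)] for _ in range(M)]
y = [rng.gauss(0.0, 1.0) for _ in range(M)]
x = [0.0] * N
g = gradient(Phi, y, x)
cosines = []
for _ in range(5):
    Phig = matvec(Phi, g)
    alpha = dot(g, g) / dot(Phig, Phig)  # exact line-search step (10)
    x = [xi - alpha * gi for xi, gi in zip(x, g)]
    g_new = gradient(Phi, y, x)
    cosines.append(dot(g_new, g) / (dot(g, g) * dot(g_new, g_new)) ** 0.5)
    g = g_new
print(cosines)  # successive gradients are orthogonal up to rounding
```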
To avoid zigzagging toward the solution and to find a sparse solution of (4), we draw on the $\nu$-methods mentioned above and present the semi-iterative hard thresholding method, which has the form
$$x_k = H_K\big(x_{k-1} + \mu_k (x_{k-1} - x_{k-2}) + \omega_k \Phi^T (y - \Phi x_{k-1})\big), \tag{11}$$
where $\mu_k$ and $\omega_k$ are the $\nu$-method coefficients and $H_K(\cdot)$ is the nonlinear operator that sets all but the $K$ largest (in magnitude) elements of a vector to zero. In (11), the new search direction is a linear combination of the current negative gradient and the search direction of the previous step, $x_{k-1} - x_{k-2}$. Hence the search direction does not tend to become orthogonal to the gradient, and SIHT avoids zigzagging toward the solution. The algorithm is summarized in Algorithm 1.
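Iteration (11) can be sketched in a few lines of pure Python. This is a minimal illustration, not the authors' MATLAB implementation: it assumes $\|\Phi\|_2 \le 1$ (as the $\nu$-method does), runs a fixed number of iterations in place of a stopping rule, and uses hypothetical names throughout. The demo builds a Gaussian matrix with orthonormalized rows and prints the 2-norm error; no particular value is claimed.

```python
import math
import random


def matvec(A, x):
    """Return A @ x for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def rmatvec(A, r):
    """Return A^T @ r without forming the transpose."""
    out = [0.0] * len(A[0])
    for row, ri in zip(A, r):
        for j, a in enumerate(row):
            out[j] += a * ri
    return out


def hard_threshold(v, k):
    """H_K: keep the k largest-magnitude entries of v and zero the rest."""
    keep = set(sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)[:k])
    return [v[i] if i in keep else 0.0 for i in range(len(v))]


def nu_coefficients(k, nu):
    """(mu_k, omega_k) of the nu-method; mu_1 = 0 by definition."""
    mu = 0.0 if k == 1 else ((k - 1) * (2 * k - 3) * (2 * k + 2 * nu - 1)) / (
        (k + 2 * nu - 1) * (2 * k + 4 * nu - 1) * (2 * k + 2 * nu - 3))
    omega = 4 * (2 * k + 2 * nu - 1) * (k + nu - 1) / (
        (k + 2 * nu - 1) * (2 * k + 4 * nu - 1))
    return mu, omega


def siht(Phi, y, k_sparse, nu=1.0, iterations=200):
    """Sketch of (11): a nu-method step followed by hard thresholding."""
    n = len(Phi[0])
    x_older = [0.0] * n  # x_{k-2}
    x = [0.0] * n        # x_{k-1}
    for k in range(1, iterations + 1):
        mu, omega = nu_coefficients(k, nu)
        r = [yi - p for yi, p in zip(y, matvec(Phi, x))]
        g = rmatvec(Phi, r)  # negative gradient Phi^T (y - Phi x_{k-1})
        v = [xi + mu * (xi - oi) + omega * gi for xi, oi, gi in zip(x, x_older, g)]
        x_older, x = x, hard_threshold(v, k_sparse)
    return x


if __name__ == "__main__":
    # Hypothetical demo: Gaussian matrix with orthonormalized rows, K spikes.
    rng = random.Random(0)
    M, N, K = 32, 64, 3
    Phi = []
    for _ in range(M):
        v = [rng.gauss(0.0, 1.0) for _ in range(N)]
        for q in Phi:  # modified Gram-Schmidt against earlier rows
            c = sum(a * b for a, b in zip(v, q))
            v = [a - c * b for a, b in zip(v, q)]
        nrm = math.sqrt(sum(a * a for a in v))
        Phi.append([a / nrm for a in v])
    x_true = [0.0] * N
    for i in rng.sample(range(N), K):
        x_true[i] = rng.choice([-1.0, 1.0])
    x_hat = siht(Phi, matvec(Phi, x_true), K, nu=1.0, iterations=300)
    err = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_hat, x_true)))
    print("2-norm error:", err)
```

Per iteration the sketch performs one product with $\Phi$, one with $\Phi^T$, and stores only the two most recent iterates, matching the cost discussed below.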
As mentioned above, the semi-iterative hard thresholding algorithm is easy to implement. Each iteration involves one application of $\Phi$ and one of $\Phi^T$, as well as two vector additions. The storage requirements are small: apart from $\Phi$, we only need to store the two most recent iterates, which requires $2N$ elements. The choice of the parameter $\nu$ is discussed in the next section.
4. Experimental Results
This section describes experiments that test the performance of the proposed algorithm. All experiments were carried out on an HP z600 workstation with eight Intel Xeon 2.13 GHz processors and 16 GB of memory, using a MATLAB implementation under Windows XP.
4.1. Choice of the Parameter
In our experiments, we consider a typical CS scenario in which the goal is to reconstruct a length-$N$ sparse vector from $M$ measurements. First, the random $M \times N$ matrix $\Phi$ is created by filling it with independent and identically distributed Gaussian entries and then orthogonalizing the rows. Second, the original vector $x$ contains $K$ randomly placed spikes, and the measurement $y$ is generated according to (1). Unless otherwise stated, the iteration is terminated by a fixed stopping criterion and tolerance.
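A test instance of this kind can be generated as sketched below (pure Python; `make_problem` is a hypothetical helper). The spike values of $\pm 1$ are an assumption, since only "randomly placed spikes" is specified, and rows are orthonormalized with modified Gram-Schmidt as one way of orthogonalizing them. Orthonormal rows also give $\|\Phi\|_2 = 1$, which suits the $\nu$-method coefficients.

```python
import math
import random


def make_problem(m, n, k, seed=0):
    """Return (Phi, x, y): orthonormal-row Gaussian Phi, k spikes, y = Phi x."""
    rng = random.Random(seed)
    rows = []
    for _ in range(m):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]  # i.i.d. Gaussian row
        for q in rows:  # remove components along earlier rows
            c = sum(a * b for a, b in zip(v, q))
            v = [a - c * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        rows.append([a / norm for a in v])
    x = [0.0] * n
    for i in rng.sample(range(n), k):
        x[i] = rng.choice([-1.0, 1.0])  # assumed +/-1 spike values
    y = [sum(a * b for a, b in zip(row, x)) for row in rows]  # model (1)
    return rows, x, y


Phi, x, y = make_problem(m=64, n=256, k=8)
gram_err = max(abs(sum(a * b for a, b in zip(Phi[i], Phi[j])) - (1.0 if i == j else 0.0))
               for i in range(64) for j in range(64))
print(gram_err)  # rows are orthonormal up to rounding
```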
This experiment assesses how the running time of the proposed algorithm depends on the parameter $\nu$. To find a good value of the parameter, we vary $\nu$ over a range of values and record the running time of our method for each. Figure 1 shows the running time of our algorithm as $\nu$ is varied; each label gives the number of measurements $M$ and the sparsity level $K$ of the length-$N$ vector used in our experiments. A careful examination reveals that, as $\nu$ increases, the running time of our method reaches a minimum; beyond that value, the running time increases only marginally as $\nu$ is increased, so a single choice of $\nu$ appears to give good performance for a wide range of problems.
4.2. Comparison in Recovery Rate
In this experiment, we compare the empirical performance of GraDeS, IHT, SpaRSA, FISTA, and SIHT for sparse recovery. We generated a Gaussian random matrix and sparse spike vectors. A reconstruction is considered exact when the 2-norm of the difference between the reconstruction and the original vector is below a fixed tolerance. We repeated the experiment 100 times for each tested value from 2 to 128 (in steps of 2). Figure 2 shows that SIHT provides a higher probability of perfect recovery than GraDeS, IHT, SpaRSA, and FISTA when the sparse vectors are spike vectors. Furthermore, in terms of perfect recovery, GraDeS and SpaRSA perform similarly. The results also reveal that SIHT requires fewer measurements than IHT, GraDeS, SpaRSA, and FISTA to recover the sparse vector for a given signal length and sparsity level.
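The Monte Carlo protocol behind a recovery-rate curve can be sketched as a small harness. Everything here is hypothetical: the tolerance `tol=1e-4` is a placeholder for the unspecified threshold, and `make_instance`/`solver` are caller-supplied callables (for example, a problem generator and IHT or SIHT). Sweeping the tested values would call `recovery_rate` once per value in `range(2, 129, 2)`.

```python
import math


def is_exact(x_hat, x_true, tol=1e-4):
    """Exact-recovery test: ||x_hat - x_true||_2 < tol (tol is a placeholder)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_hat, x_true))) < tol


def recovery_rate(make_instance, solver, trials=100):
    """Empirical probability of exact recovery over `trials` random instances.

    make_instance(t) -> (Phi, x_true, y, k) and solver(Phi, y, k) -> x_hat.
    """
    hits = 0
    for t in range(trials):
        Phi, x_true, y, k = make_instance(t)
        if is_exact(solver(Phi, y, k), x_true):
            hits += 1
    return hits / trials


def toy_instance(t):
    """Toy 1-sparse instance with Phi = I, so y already equals x_true."""
    n = 5
    x = [0.0] * n
    x[t % n] = 1.0
    eye = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    return eye, x, list(x), 1


print(recovery_rate(toy_instance, lambda Phi, y, k: y))                # -> 1.0
print(recovery_rate(toy_instance, lambda Phi, y, k: [0.0] * len(y)))   # -> 0.0
```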
4.3. Comparison in Running Time
To evaluate the running time of the proposed algorithm, these experiments include comparisons with OMP, StOMP, ROMP, IHT, and GraDeS. The sampling matrix $\Phi$, the measurement vector $y$, and the sparsity level $K$ are given to each algorithm as inputs. For the proposed algorithm, we fix the parameter $\nu$; the performance is insensitive to this choice. Table 1 compares the running time of the MATLAB implementations of SIHT and the five existing methods; a dedicated symbol in the table marks the cases in which an algorithm fails.
Table 1 shows that the iterative-shrinkage algorithms are significantly faster than the matching pursuit algorithms. For simplicity, we compare the performance of SIHT with IHT and GraDeS in the following experiments.
4.4. Comparison in Sparsity
In this experiment, we show how the 2-norm errors of different algorithms depend on the sparsity level $K$. Figure 3 compares the 2-norm errors of SIHT with those of IHT, GraDeS, SpaRSA, and FISTA at different sparsity levels. We generated a Gaussian random matrix and either a sparse spike vector or a sparse Gaussian vector. We repeated the experiment 100 times for each value of $K$ from 2 to 120 (in steps of 10). Both GraDeS and SpaRSA begin to fail when the sparsity level is above 120; the failed results are therefore omitted from the figure.
Figure 3 shows that GraDeS, IHT, and SIHT perform similarly for sparse Gaussian vectors, whereas GraDeS fails to recover sparse spike vectors when the sparsity level is above 70; that is, GraDeS requires more measurements to recover these sparse vectors. The figure also reveals that FISTA, IHT, and SIHT are insensitive to the sparsity level $K$, whereas GraDeS and SpaRSA are sensitive to it. In addition, SIHT outperforms the other algorithms in 2-norm error for both sparse spike and sparse Gaussian vectors.
4.5. Comparison in Number of Iterations
In this experiment, we compare the number of iterations required by SIHT with that required by four other algorithms, namely, IHT, GraDeS, SpaRSA, and FISTA, for sparse spike vectors and sparse Gaussian vectors. We generated a Gaussian random matrix and a sparse spike or sparse Gaussian vector. Figures 4 and 5 show the number of iterations needed by these algorithms for the values of $M$, $N$, and $K$ indicated in the figures.
Figures 4 and 5 show that IHT and GraDeS converge faster when the number of iterations is less than 4. However, when the number of iterations is above 6, FISTA and SIHT, owing to polynomial acceleration, converge faster than the others. In addition, Figures 4 and 5 show that, for each tested configuration, FISTA and IHT are roughly similar in terms of the number of iterations. Because SIHT uses a linear combination of the current gradient and the directions of a few previous steps as the new search direction, it converges faster than the others, whereas GraDeS converges more slowly than the others.
From Figures 4 and 5, the 2-norm errors of all algorithms except SpaRSA decrease strictly monotonically as the number of iterations increases.
As expected, these results suggest that SIHT outperforms the other iterative-shrinkage algorithms in terms of the number of iterations and 2-norm error.
In this paper, a semi-iterative hard thresholding algorithm for sparse recovery was proposed. The proposed algorithm uses a linear combination of the current gradient and the directions of a few previous steps as the new search direction, which avoids zigzagging toward the solution. Owing to this new search direction, SIHT improves on the performance of existing iterative-shrinkage algorithms.
This work was supported in part by the National Natural Science Foundation of China (Grant no. 61271294). The authors would like to thank Arvind Ganesh, Allen Y. Yang, and Zihan Zhou for sharing their software packages (L1benchmark) with us.
D. L. Donoho, Y. Tsaig, I. Drori, and J. L. Starck, “Sparse solution of underdetermined linear equations by stagewise orthogonal matching pursuit,” Stanford Statistics Technical Report 2006-2, 2006.
M. Zibulevsky and M. Elad, “L1-L2 optimization in signal and image processing,” IEEE Signal Processing Magazine, vol. 27, no. 3, pp. 76–88, 2010.
R. Garg and R. Khandekar, “Gradient descent with sparsification: an iterative algorithm for sparse recovery with restricted isometry property,” in Proceedings of the 26th International Conference on Machine Learning, pp. 337–344, 2009.