Mathematical Problems in Engineering
Volume 2014, Article ID 358742, 23 pages
http://dx.doi.org/10.1155/2014/358742
Research Article

A Greedy Multistage Convex Relaxation Algorithm Applied to Structured Group Sparse Reconstruction Problems Based on Iterative Support Detection

School of Mathematical Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan 611731, China

Received 27 April 2014; Revised 27 August 2014; Accepted 29 August 2014; Published 21 October 2014

Academic Editor: Yi-Kuei Lin

Copyright © 2014 Liangtian He and Yilun Wang. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

We propose a new, effective algorithm for recovering a group sparse signal from very limited observations or measured data. It is known that better reconstruction quality can be achieved when more structural information besides sparsity is encoded; accordingly, the commonly employed ℓ2,1-regularization, which incorporates the prior grouping information, performs better than plain ℓ1-regularized models, as expected. In this paper we make further use of the prior grouping information, as well as possibly other prior information, by considering a weighted ℓ2,1 model. Specifically, we propose a multistage convex relaxation procedure to alternately estimate the weights and solve the resulting weighted problem. The weight estimation makes better use of the prior grouping information and is implemented based on iterative support detection (Wang and Yin, 2010). Comprehensive numerical experiments show that our approach brings significant recovery enhancements compared with the plain ℓ2,1 model, solved via the alternating direction method (ADM) (Deng et al., 2013), in both noiseless and noisy environments.

1. Introduction

1.1. Group Sparse Reconstruction and Related Work

Finding sparse solutions of underdetermined linear systems has become a hot research topic in the last few years in various fields, for example, compressive sensing (CS), signal processing, statistics, and machine learning [1], with applications such as multiple kernel learning [2], microarray data analysis [3], and channel estimation in doubly dispersive multicarrier systems [4]. For example, in machine learning, high dimensionality poses significant challenges for building interpretable models with high prediction accuracy, and many sparsity-related regularization techniques have been commonly utilized to obtain more stable and interpretable models. In compressive sensing, sparsity regularization allows us to reconstruct high dimensional data from only a small number of samples. However, recent studies encourage us to go beyond sparsity to further enhance recoverability, that is, to take into account additional information about the underlying structure of the solutions. As an important case, many solutions are known to have a certain group sparsity structure: not only do their components have a natural grouping division, but the components within a group are likely to be all nonzero or all zero. Encoding the group sparsity structure thus reduces the degrees of freedom of the solution, resulting in better recovery performance.

In this paper, we focus on the above group sparsity and the corresponding commonly used ℓ2,1-regularized model and try to extend it for better recovery performance. Assume that x̄ ∈ Rⁿ denotes an unknown group sparse solution. Let {g_1, …, g_s} be the grouping division of {1, …, n}, where g_i is the index set corresponding to the ith group and x_{g_i} denotes the subvector of x indexed by g_i; the grouping is generally predefined based on prior information about the underlying solution. The mixed ℓ2,1 norm is defined as ‖x‖_{2,1} = Σ_{i=1}^{s} ‖x_{g_i}‖_2, and the corresponding reconstruction model is

  min_x ‖x‖_{2,1}, s.t. Ax = b. (1)

Compared with the classical use of ℓ1-regularization for sparse reconstruction, ℓ2,1-regularization takes the grouping information into consideration and promotes group sparsity. Notice that the resulting problem is convex, and the mixed ℓ2,1 optimization problem (1) is commonly solved through several efficient first-order algorithms proposed in the literature, for example, the spectral projected gradient method (SPGL1) [5], the accelerated gradient method (SLEP) [6], block-coordinate descent algorithms [7], and SpaRSA [8]. In particular, the authors of [9] proposed the alternating direction method (ADM) to solve the primal and dual formulations of ℓ2,1-regularized optimization problems. Their preliminary numerical results have shown that the ADM algorithms are fast, stable, and robust, outperforming the previously known state-of-the-art algorithms.

1.2. Weighted ℓ2,1-Norm Regularized Group Sparse Reconstruction Problem

In this paper, for better reconstruction quality, instead of the ℓ2,1-norm we consider a more general formulation, the weighted ℓ2,1 (or ℓ_{w,2,1}) norm [9], defined as

  ‖x‖_{w,2,1} = Σ_{i=1}^{s} w_i ‖x_{g_i}‖_2, (2)

where w_i ≥ 0 are the weights associated with each group, forming a weighting vector w ∈ Rˢ. We assume that the groups form a division of {1, …, n} unless otherwise specified. We emphasize that the weighted ℓ2,1-norm can also be extended to more general group configurations, for example, overlapping and/or incomplete-cover group sparsity.
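For concreteness, here is a minimal Python sketch of the (weighted) mixed ℓ2,1 norm for a nonoverlapping partition; the helper names are illustrative and not part of any package:

```python
import math

def group_norms(x, groups):
    # Euclidean norm of each subvector x_{g_i} for a given group partition.
    return [math.sqrt(sum(x[i] ** 2 for i in g)) for g in groups]

def weighted_l21_norm(x, groups, weights=None):
    # Weighted l2,1 mixed norm: sum_i w_i * ||x_{g_i}||_2 (w_i = 1 by default).
    if weights is None:
        weights = [1.0] * len(groups)
    return sum(w * n for w, n in zip(weights, group_norms(x, groups)))

# One active group of norm 5 and one zero group:
x = [3.0, 4.0, 0.0, 0.0]
groups = [[0, 1], [2, 3]]
print(weighted_l21_norm(x, groups))              # 5.0
print(weighted_l21_norm(x, groups, [0.0, 1.0]))  # 0.0: a 0-1 weighting removes the penalty on the first group
```

Note that zeroing the weight of a group removes its penalty entirely, which is exactly the mechanism the 0-1 weighting scheme below exploits.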

As a nonconvex model overall, (2) is well known to behave better than the unweighted counterpart (1) with appropriate settings of the weights. The key question is how to determine the weights, and this constitutes our main contribution, as summarized in Section 1.3.

We will consider several weighted models in this paper. The following basis pursuit (BP) model is considered when the measurement vector b does not contain noise:

  min_x ‖x‖_{w,2,1}, s.t. Ax = b, (3)

where A ∈ R^{m×n} and m < n. Without loss of generality, we suppose A has full rank. The basis pursuit denoising (BPDN) models are commonly employed when the measurement vector contains noise, including the constrained form

  min_x ‖x‖_{w,2,1}, s.t. ‖Ax − b‖_2 ≤ σ, (4)

and the unconstrained form

  min_x ‖x‖_{w,2,1} + (1/(2μ)) ‖Ax − b‖_2², (5)

where σ and μ are the noise-level and penalty parameters, respectively. It should be noted that the constrained form is equivalent to the unconstrained form from the viewpoint of optimization theory when the parameters σ and μ are properly selected. In this paper, we focus on the basis pursuit model; the extension of our proposed algorithms to the basis pursuit denoising models (4) and (5) follows similarly. Furthermore, it should be pointed out that the basis pursuit model (3) can also be good for reconstructing noisy data if the iterations are stopped properly, prior to convergence, based on the noise level.

1.3. Contributions

In this paper, we propose an effective way to determine the weights of the above weighted models by extending the iterative support detection (ISD) proposed in [10] from plain sparsity to group sparsity, which is a special case of structured sparsity [11–13]. In other words, for the weighted ℓ2,1 reconstruction problem in this paper, based on ISD, we obtain the final result via a multistage convex relaxation process [14], which consists of solving a series of weighted ℓ2,1 (or ℓ_{w,2,1}) convex optimization problems, where the weights are estimated via support detection applied to the reconstructed signal of the previous stage. The solution of the multistage weighted model is usually better than that of the traditional ℓ2,1 model in terms of relative error and reconstruction quality, from both the theoretical and practical points of view, and numerical results demonstrate that, in our cases, our proposed algorithm can recover a satisfying result even from a failed ℓ2,1 reconstruction.

In addition, we empirically demonstrate that, in the case of group sparsity, the fast decaying property previously required by the threshold-ISD proposed in [10] for plain sparse signal recovery is no longer necessary, and we give some intuitive explanation.

1.4. Organization

The rest of this paper is organized as follows. In Section 2, we present our algorithmic framework. In Section 3, we provide comprehensive numerical experiments to evaluate the performance of our proposed algorithm for group sparse signal reconstruction and compare it with the ADM approach proposed in [9]. We end this paper with conclusions and a discussion of possible future work.

2. Algorithmic Framework

In this section, without loss of generality, we suppose that the grouping is a partition of the solution, predefined as prior knowledge; the method can be easily extended to general group configurations, for example, overlapping and incomplete-cover grouping cases.

The main difficulty of setting the weights is that we do not usually know the true solution and that even when we have some knowledge about the true solution, we still need to find a proper way to make use of it to help obtain a better solution. The iterative support detection is an effective way to deal with the above difficulty and we will generalize this idea to the cases of group sparsity. So we first review the iterative support detection.

2.1. Revisiting Iterative Support Detection and Some New Thoughts

Iterative support detection was first proposed in our early work [10], and the idea of exploiting partial support detection arises in several subsequent works, for example, [15–20]. We first briefly review iterative support detection in compressive sensing in terms of single sparse signal reconstruction [10]. In addition, we give some novel analysis of the advantages of the 0-1 weighting scheme adopted in ISD compared with other weighting alternatives [21, 22]. Compressive sensing (CS) [23, 24] reconstructs a sparse signal from a small set of linear projections. Let x̄ ∈ Rⁿ denote a k-sparse signal, let A ∈ R^{m×n} be the measurement matrix, and let b = Ax̄ represent the linear projections of x̄. The general optimization model is the basis pursuit (BP) problem:

  min_x ‖x‖_1, s.t. Ax = b. (6)

ISD alternately calls its two components: support detection and signal reconstruction. Support detection identifies an index set I from an incorrect reconstruction, which contains some elements of the true support supp(x̄). After acquiring the detected support, the resulting truncated BP problem is considered:

  min_x Σ_{i∈T} |x_i|, s.t. Ax = b, (7)

where T = Iᶜ = {1, …, n} \ I. Let x̄ be the true sparse signal. If the detected support I = supp(x̄), then the solution of (7) is, of course, equal to the true x̄. But we should point out that even if I contains enough, though not necessarily all, entries of supp(x̄), a better solution can still be expected. When I does contain enough true support entries, they help (7) return a better solution in comparison to (6); from this better solution, support detection will be able to identify more entries of supp(x̄) and then yield an even better I. In this way, the two components of ISD work together to gradually recover supp(x̄) and improve the reconstruction performance. It is clear that ISD is a multistage procedure.

ISD requires reliable true support detection from inexact reconstruction, which can be obtained by taking advantage of the features and prior information of the original true signal [10, 25]. For example, for sparse or compressible signals whose nonzero components have a fast decaying distribution of magnitudes [10], one can perform support detection by thresholding the solution of (7); the corresponding ISD implementation is denoted threshold-ISD.

In this paper, we would like to present some further analysis of ISD. ISD adopts a specific 0-1 weighting scheme, which has several advantages. First, its improvement over the single-stage pure ℓ1 model has been proved rigorously in [10] once correct partial support information is detected (by thresholding, e.g., in this paper), while most related alternatives, such as the reweighted ℓ1 algorithm [21], do not have such rigorous theoretical guarantees. Second, for the reweighted ℓ1 algorithm, the weights are usually set as w_i = 1/(|x_i| + ε). In [10], we pointed out that the tuning parameter ε is the key parameter and should be determined carefully. Roughly, ε should not be a fixed value but should decrease from a large value to a small one. In an extreme case, if ε were always fixed at 0, then we would not get a better solution at the next stage, even if there were no numerical trouble of dividing by 0, because we would passively use all the information of the current solution without filtering out the inaccurate part. From our analysis, the choice of ε controls the extraction of useful information while suppressing the distortion caused by recovery noise. The 0-1 weights of ISD are a more explicit and straightforward way to implement this idea: make use of the correct information (mainly the locations of components of large magnitude, whose weights are set to 0) and give up the rest of the too-noisy information (components of small magnitude are mostly overwhelmed by recovery noise, so there is little point in setting different weights according to their magnitudes; it is more reasonable to set the same weight, 1).

We need to point out that estimating the weights based on support detection is often advantageous. The implementation of support detection can be flexible and tailored to specific signals, in order to make use of their different underlying structures, for example, grouping structure, tree structure, or even graph structure.

2.2. ISD Extended to the Weighted Model

The main effort of this paper is to extend the idea of ISD from the plain sparse vector recovery to the group sparsity cases and demonstrate the extraordinary advantages of thresholding support detection in these cases, compared to the plain sparsity cases.

As the original ISD does, our extension is in general an alternating optimization procedure that decouples the nonlinear coupling of the weights w and the signal x, by repeatedly applying the following two steps.

Step 1. First we optimize x with w fixed (initially all weights equal to one): this is a convex problem in x.

Step 2. Second we determine the value of w according to the currently reconstructed x. This value of w will be used in Step 1 of the next iteration.

The plain ℓ2,1 model, as a single-stage process, is generally solved once, and its solution is treated as the final result. For the weighted ℓ2,1 model, we obtain the final result from a multistage process by solving a series of weighted ℓ2,1 problems. At each stage, the adaptive weights change according to the newly reconstructed signal. The full procedure and the details of Steps 1 and 2 are presented in the following sections.
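The alternation above can be sketched as the following skeleton, where `solve_weighted` and `update_weights` are placeholder callables standing in for Step 1 and Step 2 (they are illustrative names, not part of any package):

```python
def multistage_weighted_recovery(solve_weighted, update_weights, w0, n_stages):
    # Alternate Step 1 (solve the weighted problem for fixed weights, warm-started
    # from the previous stage) and Step 2 (re-estimate weights by support detection).
    w, x = w0, None
    for _ in range(n_stages):
        x = solve_weighted(w, x)   # Step 1: convex subproblem given weights w
        w = update_weights(x)      # Step 2: new weights from the current solution
    return x

# Toy demo with stand-in callables: Step 1 adds 1 to each weight, Step 2 doubles x.
x_final = multistage_weighted_recovery(
    lambda w, x: [wi + 1.0 for wi in w],
    lambda x: [2.0 * xi for xi in x],
    w0=[1.0], n_stages=2)
# x_final == [5.0]
```

In the actual algorithm, `solve_weighted` is an ADM solver for the weighted ℓ2,1 model and `update_weights` is the thresholding-based support detection described below.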

2.3. Step 1: Solving the Weighted ℓ2,1 Model Given the Weights

Assuming the weights are given, [9] proposed an approach for solving the weighted ℓ2,1-problem (3), based on the variable splitting technique and the alternating direction method (ADM, for short) [26–32]. However, how to select proper weights was not discussed there, and uniform weights were used in their numerical experiments, that is, the plain ℓ2,1 model. Here, we briefly review their approach for the nonoverlapping case; in the numerical experiments, overlapping group sparsity [33] will also be considered.

2.3.1. Applying ADM to the Primal Weighted Model

For the primal ℓ_{w,2,1}-problem (3), we first introduce an auxiliary variable z and transform (3) into the equivalent constrained optimization problem

  min_{x,z} Σ_{i=1}^{s} w_i ‖z_{g_i}‖_2, s.t. z = x, Ax = b. (8)

The corresponding augmented Lagrangian function of (8) is defined as

  L_A(x, z, λ_1, λ_2) = Σ_{i=1}^{s} w_i ‖z_{g_i}‖_2 − λ_1ᵀ(x − z) + (β_1/2)‖x − z‖_2² − λ_2ᵀ(Ax − b) + (β_2/2)‖Ax − b‖_2², (9)

where λ_1 ∈ Rⁿ, λ_2 ∈ Rᵐ are multipliers and β_1, β_2 > 0 are penalty parameters, respectively. Starting from z⁰ and (λ_1⁰, λ_2⁰), the iterative framework of the augmented Lagrangian method alternately minimizes L_A with respect to x and z and then updates the multipliers. The x-subproblem, namely, the minimization of L_A with respect to x, is an equivalent convex quadratic problem, and it reduces to solving the linear system given by the optimality condition

  (β_1 I + β_2 AᵀA) x = β_1 z + λ_1 + Aᵀ(β_2 b + λ_2). (13)

Similarly, minimizing L_A with respect to z in the iterative framework gives

  min_z Σ_{i=1}^{s} w_i ‖z_{g_i}‖_2 + λ_1ᵀz + (β_1/2)‖x − z‖_2². (14)

After simple manipulations, it is easy to see that (14) is equivalent to

  min_z Σ_{i=1}^{s} w_i ‖z_{g_i}‖_2 + (β_1/2)‖z − (x − λ_1/β_1)‖_2². (15)

Notice that the solution of (15) has a closed form according to the one-dimensional shrinkage (or soft-thresholding) formula [34]:

  z_{g_i} = max{‖r_{g_i}‖_2 − w_i/β_1, 0} · r_{g_i}/‖r_{g_i}‖_2, i = 1, …, s, (16)

where r = x − λ_1/β_1, and the convention 0 · (0/0) = 0 is assumed. For shortness, we denote this group-wise shrinkage operation as z = Shrink(x − λ_1/β_1, w/β_1).

Finally, we update the multipliers λ_1 and λ_2. Note that step lengths γ_1, γ_2 can be incorporated into the update; that is,

  λ_1 ← λ_1 − γ_1 β_1 (x − z), λ_2 ← λ_2 − γ_2 β_2 (Ax − b), (17)

where γ_1, γ_2 ∈ (0, (1 + √5)/2) are step lengths. Under certain assumptions, convergence of the ADM framework with step lengths was demonstrated in [31, 32] in the context of variational inequalities. The above procedure is summarized in Algorithm 1.
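The group-wise shrinkage operator at the heart of the z-subproblem can be sketched in a few lines of Python (an illustrative pure-Python version, not the YALL1-group implementation):

```python
import math

def group_shrink(r, groups, weights, beta):
    # Group-wise soft-thresholding:
    #   z_{g_i} = max(||r_{g_i}||_2 - w_i/beta, 0) * r_{g_i}/||r_{g_i}||_2,
    # with the convention 0*(0/0) = 0 for zero groups.
    z = [0.0] * len(r)
    for g, w in zip(groups, weights):
        nrm = math.sqrt(sum(r[i] ** 2 for i in g))
        scale = max(nrm - w / beta, 0.0) / nrm if nrm > 0.0 else 0.0
        for i in g:
            z[i] = scale * r[i]
    return z

# The first group (norm 5) survives, shrunk toward zero; the second (norm ~0.14) is zeroed out.
z = group_shrink([3.0, 4.0, 0.1, 0.1], [[0, 1], [2, 3]], [1.0, 1.0], 1.0)
```

Note that a group is kept or killed as a whole, which is exactly what promotes group sparsity.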

alg1
Algorithm 1: Primal-based ADM for group sparsity [9].

2.3.2. Applying ADM to the Dual Weighted Model

In this part, we briefly review the ADM approach to the dual form of the weighted ℓ2,1 model, which yields an equally simple yet often more efficient algorithm. The dual of (3) is given as follows:

  max_y bᵀy, s.t. ‖(Aᵀy)_{g_i}‖_2 ≤ w_i, i = 1, …, s. (18)

Similarly, we introduce an auxiliary variable z = Aᵀy and transform (18) into the equivalent constrained optimization problem

  max_y bᵀy, s.t. z = Aᵀy, ‖z_{g_i}‖_2 ≤ w_i, i = 1, …, s. (19)

The augmented Lagrangian function of (19) is given by

  L_A(y, z, x) = −bᵀy − xᵀ(z − Aᵀy) + (β/2)‖z − Aᵀy‖_2², (20)

where β > 0 is a penalty parameter; note that the multiplier x is essentially the primal variable. Starting from x⁰ and y⁰, the iterative framework of the augmented Lagrangian method alternately minimizes L_A over y and z and then updates x. The y-subproblem of this iterative framework can be solved as a linear system according to its optimality condition:

  β A Aᵀ y = b − Ax + β A z. (22)

The z-subproblem has the form

  min_z −xᵀz + (β/2)‖z − Aᵀy‖_2², s.t. ‖z_{g_i}‖_2 ≤ w_i, i = 1, …, s. (24)

It is easy to see that (24) has the closed-form solution

  z = P_B(Aᵀy + x/β), (25)

where P_B represents the projection (in the Euclidean norm) onto the convex set B = {z : ‖z_{g_i}‖_2 ≤ w_i, i = 1, …, s}. Finally, we update the multiplier x, that is, essentially the primal variable:

  x ← x − γβ(z − Aᵀy), (26)

where γ > 0 is a step length.

The resulting ADM iteration scheme for (18) is summarized in Algorithm 2.
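The z-subproblem of the dual ADM is a projection onto a product of weighted ℓ2 balls, one per group; a minimal illustrative sketch (the function name is ours, not from the YALL1-group package):

```python
import math

def project_group_balls(v, groups, weights):
    # Euclidean projection of each subvector v_{g_i} onto the ball ||z_{g_i}||_2 <= w_i:
    # rescale the subvector when its norm exceeds w_i, leave it unchanged otherwise.
    z = list(v)
    for g, w in zip(groups, weights):
        nrm = math.sqrt(sum(v[i] ** 2 for i in g))
        if nrm > w:
            for i in g:
                z[i] = v[i] * (w / nrm)
    return z

# The first group (norm 5) is rescaled onto the unit ball; the second is already inside.
z = project_group_balls([3.0, 4.0, 0.2, 0.1], [[0, 1], [2, 3]], [1.0, 1.0])
```

Because the constraint set is separable across groups, the projection decomposes into these independent per-group rescalings.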

2.4. Step 2: Adaptive Weights Determined Based on Iterative Support Detection

In this section, we will present the way to determine the weights in Step 2, extending the idea of iterative support detection (ISD) [10] from single sparse vector cases to group sparsity cases.

The support detection based on thresholding in terms of group sparsity is as follows:

  I^(s+1) = {i : ‖x^(s)_{g_i}‖_2 > ε^(s)}, (27)

where s denotes the stage number. The elements of the weighting vector w^(s+1) are set to 0 if the corresponding positions belong to the detected support I^(s+1), and to 1 otherwise. Before discussing the choice of ε^(s), it should be pointed out that the detected support sets are not necessarily increasing and nested; that is, I^(s) ⊆ I^(s+1) may not hold for all s, because the support detected from the current solution may contain wrong detections due to thresholding, and not requiring I^(s) to be monotonic leaves the chance for support detection to remove previous wrong detections. This makes I^(s) less sensitive to ε^(s), thus making the threshold value easier to choose. The tuning parameter ε^(s) is a key parameter; it is not a fixed value but preferably decreases from a large value to a small one, which extracts more correct nonzero information from the intermediate reconstruction results as the ISD iteration proceeds. In addition, we have proved in [10] that ISD can tolerate a certain ratio of wrong support detections and still achieve a better reconstruction.

We set the threshold value ε^(s) to decrease with the stage number s at a rate controlled by a tuning parameter ρ (28). An excessively large ρ results in too many false detections and low solution quality, while an excessively small ρ tends to need a large number of stages. This rule is quite effective with an appropriate ρ, and the proper range of ρ is case-dependent [35]. Empirically, the performance of our algorithm is not very sensitive to the choice of ρ in our cases.
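A minimal sketch of the thresholding-based support detection (27) and the resulting 0-1 weighting; the helper names are illustrative:

```python
import math

def detect_groups(x, groups, eps):
    # Support detection by thresholding: group indices i with ||x_{g_i}||_2 > eps.
    return [k for k, g in enumerate(groups)
            if math.sqrt(sum(x[i] ** 2 for i in g)) > eps]

def zero_one_weights(n_groups, detected):
    # 0-1 weighting: detected (likely nonzero) groups get weight 0, the rest 1.
    d = set(detected)
    return [0.0 if k in d else 1.0 for k in range(n_groups)]

detected = detect_groups([3.0, 4.0, 0.1, 0.1], [[0, 1], [2, 3]], eps=1.0)
w = zero_one_weights(2, detected)   # [0.0, 1.0]
```

With these weights, the next weighted ℓ2,1 solve leaves the detected group unpenalized while still penalizing all undetected groups.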

Here we would like to point out the difference between the situation in this paper and that in our previous work [10]. While we still use threshold-ISD [10], the fast decaying property of the unknown signal is no longer required, because the prior grouping information improves the performance of threshold-based support detection. A simple intuitive explanation follows. In [10], the components of a sparse signal are considered separately, whereas here the known grouping information of the sparse signal is exploited. Therefore, when performing the thresholding via (27), the prior grouping information reduces the degrees of freedom of the unknown signal and provides better robustness to the recovery errors of the intermediate results than plain sparse recovery, which amounts to either arbitrary grouping or purely component-wise treatment.

2.5. Our Algorithm Framework and Some Further Analysis

Now, we summarize the algorithmic framework of the multistage convex relaxation for the weighted ℓ2,1 model based on ISD. The algorithm repeatedly performs the two steps described above: support detection to determine the 0-1 weights, and solving the resulting weighted ℓ2,1 model with the ADM algorithms. Moreover, we present some new viewpoints on ISD.

Notice that since the initial weights are all ones, the weighted ℓ2,1 model (3) in Step 2(b) is nothing but the plain ℓ2,1 model in iteration 0. The weighted model in Step 2(b) of Algorithm 3 can be solved by Algorithm 1 or Algorithm 2. The support detection in Step 2(c) was introduced in Section 2.4. In each iteration, ISD estimates the indices of the nonzero groups by thresholding. Since ISD belongs to the greedy methods and is also a multistage procedure, the new method is denoted GM-ADM. Each GM-ADM iteration needs to solve a weighted ℓ2,1 problem, and hence GM-ADM is computationally more demanding than ADM. However, its running time is not necessarily several times that of the original ℓ2,1 model. The reason is the adopted warm-starting: the output of the current stage (outer iteration) is employed as the input of the next stage, and in the next stage we often need only a few inner iterations to obtain a better updated solution. In addition, the number of stages is usually not large (no more than 9 empirically). Notice that, for our problems, we mainly focus on reconstruction quality, in which GM-ADMs perform much better than ADMs, and this is worth the extra computational cost.

alg2
Algorithm 2: Dual-based ADM for group sparsity [9].

alg3
Algorithm 3: The GM-ADM algorithm.

ISD can also be considered a procedure based on mixed soft-thresholding and hard-thresholding. Specifically, obtaining the solution of the subproblem (15) is actually a soft-thresholding procedure, and this group-wise shrinkage operation yields a group sparse solution if the weights are all equal to 1. However, this kind of shrinkage has a serious disadvantage: it also shrinks the components of the true nonzero groups and reduces the sharpness of the groups in the solution. ISD avoids uniform weights in the weighted ℓ2,1 (ℓ_{w,2,1}) model and instead uses greedy 0-1 weights. It is easy to see that the components of groups believed unlikely to be zero groups will not be shrunk: the weights of these groups are set to 0, which corresponds to hard-thresholding. Thus, when ISD is applied, the solution process becomes a selective shrinkage procedure, that is, mixed soft-thresholding and hard-thresholding.

3. Numerical Experiments

In this section, we show numerical results to evaluate the performance of our proposed GM-ADM approach in comparison with the ADM approach [9] in the case of group sparsity, because [9] has made comprehensive comparisons with other group sparsity methods and their algorithms mostly outperform the previously known state-of-the-art algorithms. The code of ADM (the YALL1-group package) can be downloaded from the website [36]. All experiments were performed under Windows 7 and MATLAB v7.10.0 (R2010a) running on a desktop with an Intel(R) Pentium(R) CPU G640 (2.80 GHz) and 2 GB of memory.

3.1. Synthetic Nonoverlapping Group Sparsity Experiment

We generate the nonoverlapping group sparse solutions as follows: we first randomly divide an n-vector into groups and then randomly pick a number of them as active (nonzero) groups, whose entries are iid random Gaussian or ±1 (for Bernoulli signals, instead of randomly dividing the n-vector into groups, we fix the number of components in each group and randomly pick the active groups). The purpose of testing the sparse Bernoulli signal is to demonstrate that, in the case of group sparsity, the fast decaying property of the magnitudes of the nonzero components is no longer required, even in terms of the magnitudes of groups of variables. We use standard iid Gaussian sensing matrices generated by  A = randn(m,n)  in MATLAB. We add Gaussian noise to the observation by  noise = randn(m,1); b = b + sigma*norm(b)/norm(noise)*noise  in MATLAB. The test sets are summarized in Table 1. It should be pointed out that the YALL1-group package not only includes solvers for the constrained model (3) but also includes 6 different approaches for solving the corresponding nonoverlapping group sparse problems, as summarized in Table 2. The continuation technique periodically increases the penalty parameters during the iterations, for both the primal-based ADM (PADM3) and the dual-based ADM (DADM3). The ADMs are terminated when one of the following situations is met: the relative change of two consecutive iterates becomes smaller than the tolerance, or the iteration number reaches the prescribed maximum. We use the same inner termination condition for GM-ADMs, that is, the termination condition of Step 2(b) in Algorithm 3, and fix the tolerance in all of the numerical experiments.
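The signal generation described above can be sketched in Python as a stand-in for the MATLAB code (the function name, seed, and sizes below are our own illustrative choices):

```python
import random

def make_group_sparse_signal(n_groups, group_size, k_active, bernoulli=False, seed=0):
    # k_active of n_groups nonoverlapping groups are active; active entries are
    # iid standard Gaussian, or +-1 for the (non-fast-decaying) Bernoulli signals.
    rng = random.Random(seed)
    active = set(rng.sample(range(n_groups), k_active))
    x = []
    for g in range(n_groups):
        for _ in range(group_size):
            if g in active:
                x.append(rng.choice([1.0, -1.0]) if bernoulli else rng.gauss(0.0, 1.0))
            else:
                x.append(0.0)
    return x

x = make_group_sparse_signal(n_groups=64, group_size=8, k_active=6, bernoulli=True)
```

For the Bernoulli case, all nonzero groups have identical ℓ2 magnitude, which is exactly the non-fast-decaying situation discussed below.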

tab1
Table 1: Summary of test sets.
tab2
Table 2: The 6 different primal-based and dual-based ADM approaches.

We list in Table 3 the optimal parameters for the 6 different ADMs, which achieve the best reconstruction quality in terms of relative error. These parameter values are mostly borrowed from [9], to which one can refer for guidance; empirically, these parameters are not very sensitive, and we believe the comparisons are fair under this parameter setting. Here we use the MATLAB-type notation mean(abs(b)) to denote the arithmetic average of the absolute values of b, which is used in setting the penalty parameters of the primal-based and dual-based ADMs, respectively. With regard to our proposed GM-ADM, considering that Step 2(b) in Algorithm 3 can be solved by any of the 6 above-mentioned ADMs, we call the resulting methods GM-PADM1, GM-PADM2, GM-PADM3, GM-DADM1, GM-DADM2, and GM-DADM3, respectively; their parameters are set the same as those of the corresponding ADMs. In addition, we fix the tuning parameter ρ in the threshold rule (28) throughout, since empirically the performance is not very sensitive to it.

tab3
Table 3: The optimal parameter setting of 6 different primal-based and dual-based ADM approaches that achieve the best reconstruction quality.

As mentioned before, in the case of group sparsity, the fast decaying property of the magnitudes of the nonzero components is no longer required for the effectiveness of ISD, though it might further enhance ISD's performance. Our experiments include the results of both fast decaying and non-fast-decaying signals.

Figure 1 shows that the sorted group magnitudes of our test nonoverlapping group sparse Gaussian signal have the fast decaying property, which is good, though not necessary, for threshold-ISD. In Figure 2, we present the relative error between the recovered signal and the original true signal for test 1, and we can see that our GM-ADMs bring significant enhancements compared with the corresponding ADMs, for both primal-based and dual-based ADMs. From Figure 2, we can also see that GM-ADMs obtain a satisfactory improvement over ADMs even when the maximum stage number is not very large. The key factor for our algorithm is that ISD requires reliable true support detection from an inexact reconstruction. If the output of the first iteration (the result of the ADMs) is rather unsatisfactory, for example, due to an insufficient number of measurements and/or a considerable amount of noise, it becomes hard for GM-ADMs to achieve a very large improvement because of the inexact support detection. Thus the improvement by ISD is less obvious when the number of measurements is small. In Figure 3, we show the comparison between ADMs and GM-ADMs in noisy environments, where ISD still brings better results.

358742.fig.001
Figure 1: The sorted groups’ magnitude of our tested nonoverlapping group sparse Gaussian signal has fast decaying property.
358742.fig.002
Figure 2: Comparison results of different primal-based ADMs, dual-based ADMs, and the corresponding GM-ADMs (nonoverlapping group sparse Gaussian cases), in the noiseless setting. The x-axis represents the number of measurements, and the y-axis the relative error. The maximum number of stages differs between the top, middle, and bottom rows.
358742.fig.003
Figure 3: Comparison results of different primal-based ADMs, dual-based ADMs, and the corresponding GM-ADMs (nonoverlapping group sparse Gaussian cases), at two noise levels (top and bottom rows). The x-axis represents the number of measurements, and the y-axis the relative error. The maximum number of stages is fixed.

In order to better illustrate the practical advantage of our algorithm, we also give visual comparisons of the reconstructions. Figures 4, 5, and 6 plot the reconstructed signals at three noise levels, respectively. For conciseness, we give only the comparisons between PADM3 and GM-PADM3 and between DADM3 and GM-DADM3, since the other cases lead to similar conclusions. From Figures 4, 5, and 6, we can see that the results of GM-PADM3 and GM-DADM3 are much better than those of the corresponding PADM3 and DADM3, in both noiseless and noisy environments.

358742.fig.004
Figure 4: Visual comparison of the performance between PADM3 and GM-PADM3, DADM3 and GM-DADM3 (nonoverlapping group sparse Gaussian cases). Noise level . The maximum number of stages .
358742.fig.005
Figure 5: Visual comparison of the performance between PADM3 and GM-PADM3, DADM3 and GM-DADM3 (nonoverlapping group sparse Gaussian cases). Noise level . The maximum number of stages .
358742.fig.006
Figure 6: Visual comparison of the performance between PADM3 and GM-PADM3, DADM3 and GM-DADM3 (nonoverlapping group sparse Gaussian cases). Noise level . The maximum number of stages .

In Figure 7, we show the sorted group magnitudes of a nonoverlapping group sparse Bernoulli signal, whose nonzero components are randomly either 1 or −1 and therefore do not have the fast decaying property. In Figure 8, we present the comparison between the ADMs and the corresponding GM-ADMs, and we can see that the GM-ADMs achieve considerable improvement in both noiseless and noisy environments. We also give the visual comparison between PADM3 and GM-PADM3 in Figure 9. From Figures 8 and 9, for nonoverlapping group sparse Bernoulli signals, we observe conclusions similar to those for nonoverlapping group sparse Gaussian signals.

358742.fig.007
Figure 7: The sorted groups’ magnitude of our tested nonoverlapping group sparse Bernoulli signal does not have fast decaying property.
358742.fig.008
Figure 8: Comparison results of different primal-based ADMs, dual-based ADMs, and the corresponding GM-ADMs (nonoverlapping group sparse Bernoulli cases). The x-axis represents the number of measurements, and the y-axis the relative error. The maximum number of stages is fixed; the top, middle, and bottom rows correspond to increasing noise levels.
358742.fig.009
Figure 9: Visual comparison of the performance between PADM3 and GM-ADM3 (nonoverlapping group sparse Bernoulli cases). Noise level from top row to bottom row: , , and , respectively. The maximum number of stages .
3.2. Synthetic Overlapping Group Sparsity Experiment

To assess the performance of our algorithm when overlapping groups are given a priori, we generate the simulation data with 1010 variables, covered by 126 groups of 10 variables with an overlap of 2 variables between consecutive groups. We then randomly pick some of them as active (nonzero) groups, whose entries are either iid random Gaussian or ±1, while the remaining groups are all zero. We use the same test sets as shown in Table 1; the only difference is that the signal has an overlapping group sparse structure here. In addition, the optimal parameters and termination condition of Section 3.1 are still applicable. Considering that the YALL1-group package only includes primal-based ADMs for overlapping group sparse problems, here we just compare PADM3 and GM-PADM3, for fairness and conciseness. Empirically, a small maximum stage number can already achieve satisfactory reconstruction quality in terms of relative error, and we therefore fix it for all of the experiments.
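The overlapping group layout (consecutive groups of 10 variables sharing 2 with the next group) can be constructed as follows; the helper name is illustrative:

```python
def overlapping_groups(n_groups=126, size=10, overlap=2):
    # Consecutive index groups of `size` variables, each sharing `overlap`
    # variables with the next group, so the stride between group starts is size - overlap.
    stride = size - overlap
    return [list(range(k * stride, k * stride + size)) for k in range(n_groups)]

groups = overlapping_groups()
n_vars = groups[-1][-1] + 1   # 126 groups of 10 with stride 8 cover 1010 variables
```

Note that the total of 1010 variables follows directly from this layout: 125 strides of 8 plus one final group of 10.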

In Figure 10, we show the sorted group magnitudes of our tested overlapping group sparse signal; its nonzero components are drawn from a Gaussian distribution and also exhibit the fast decaying property. In Figure 11, we present the comparison between PADM3 and GM-PADM3, and we can see that GM-PADM3 achieves considerable improvement in both the noiseless and the noisy environments.

Figure 10: The sorted group magnitudes of our tested overlapping group sparse Gaussian signal have the fast decaying property.
Figure 11: Comparison of the performance of PADM3 and GM-PADM3 (overlapping group sparse Gaussian cases). Noise level from left to right: , , and , respectively. The maximum number of stages .

To better illustrate the practical advantage of our algorithm for overlapping group sparse reconstruction, we also give visual comparisons in Figure 12, where we can see that the accuracy of the element values recovered by PADM3 is fairly low, while GM-PADM3 achieves a rather high accuracy.

Figure 12: Visual comparison of the performance of PADM3 and GM-PADM3 (overlapping group sparse Gaussian cases). Noise level from top row to bottom row: , , and , respectively. The maximum number of stages .

In Figure 13, we show that the sorted group magnitudes of an overlapping group sparse signal whose nonzeros are drawn at random from either or do not have the fast decaying property. In Figure 14, we present the comparison between PADM3 and GM-PADM3, where GM-PADM3 achieves considerable improvement in both the noiseless and the noisy environments. We also give a visual comparison between PADM3 and GM-PADM3 in Figure 15. From Figures 14 and 15, for overlapping group sparse Bernoulli signals, we can draw conclusions similar to those for the overlapping group sparse Gaussian signals.

Figure 13: The sorted group magnitudes of our tested overlapping group sparse Bernoulli signal do not have the fast decaying property.
Figure 14: Comparison of the performance of PADM3 and GM-PADM3 (overlapping group sparse Bernoulli cases). Noise level from left to right: , , and , respectively. The maximum number of stages .
Figure 15: Visual comparison of the performance of PADM3 and GM-PADM3 (overlapping group sparse Bernoulli cases). Noise level from top row to bottom row: , , and , respectively. The maximum number of stages .
3.3. A Nonoverlapping Group Sparsity Simulation Example from Collaborative Spectrum Sensing

In this part, we study an interesting special case of the nonoverlapping group sparsity structure called joint sparsity; namely, a set of sparse solutions share a common nonzero support. This example comes from collaborative spectrum sensing [37], which aims at detecting spectrum holes (i.e., channels not used by any primary users) and is a precondition for the implementation of Cognitive Radio (CR). The CR nodes must constantly sense the spectrum to detect the presence of the Primary Radio (PR) nodes and to use the spectrum holes without causing harmful interference to the PRs. Hence, sensing the spectrum reliably is of critical importance and constitutes a major challenge in CR networks. Collaborative spectrum sensing is expected to improve the ability to check complete spectrum usage. We consider a cognitive radio network with CR nodes that locally monitor a subset of channels. A channel is either occupied by a PR or unoccupied, corresponding to the states 1 and 0, respectively. It is assumed that the number of occupied channels is much smaller than . The goal is to recover the occupied channels from the CR nodes' observations. Via frequency-selective filters, a CR takes a small number of measurements that are linear combinations of multiple channels. To mix the sensing information of the different channels, the filter coefficients are designed to be random numbers. Then, the filter outputs are sent to the fusion center. Assume that there are frequency-selective filters in each CR node sending out reports regarding the channels. The sensing process at each CR can be represented by a filter coefficient matrix . Let an diagonal matrix represent the states of all the channel sources, using 0 and 1 as diagonal entries to indicate the unoccupied and occupied states, respectively. There are nonzero entries in the diagonal matrix .
In addition, the channel gains between the CRs and the channels are described by an channel gain matrix given by [38]. Then, the measurement reports sent to the fusion center can be written as a matrix as follows:

Now, we need a highly effective method for recovering

In , each column (denoted by ) corresponds to the channel occupancy status received by CR , and each row (denoted by ) corresponds to the status of channel . Since only a small number of channels are in use, is sparse in terms of the number of nonzero rows. In each nonzero row , if , the other entries in the same row are likely nonzero. Therefore, is a joint sparse matrix.

The weighted model of the joint sparsity problem is where denotes the collection of jointly sparse solutions, and and denote the th row and th column of , respectively.

Let us define where is the identity matrix, and and are standard notations for the vectorization of a matrix and the Kronecker product, respectively. We partition into groups , where corresponds to the th row of matrix . Thus, we can obtain an equivalent group problem to (34) as follows:

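The vectorization step above can be sketched numerically. The reformulation uses the standard identity vec(AX) = (I ⊗ A) vec(X), and each row of X becomes one (nonoverlapping) group in the vectorized unknown; all dimensions below are illustrative toy sizes, not the paper's settings.

```python
import numpy as np

# Toy sizes: m measurements, n channels, L CR nodes (illustrative only).
rng = np.random.default_rng(0)
m, n, L = 5, 8, 3
A = rng.standard_normal((m, n))
X = rng.standard_normal((n, L))
B = A @ X                                  # joint-sparsity measurement model A X = B

# Equivalent vectorized system: (I kron A) vec(X) = vec(B),
# with column-major (Fortran-order) vectorization.
A_big = np.kron(np.eye(L), A)
x_vec = X.flatten(order="F")

# Group i collects the entries of row i of X inside vec(X):
# they sit at positions i, i + n, i + 2n, ...
groups = [np.arange(i, n * L, n) for i in range(n)]
```

With this partition, joint sparsity of X (few nonzero rows) becomes nonoverlapping group sparsity of vec(X), so the group solvers from the earlier sections apply directly.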
The main advantage of joint sparsity reconstruction is its applicability to large-scale problems. Therefore, the following simulation is carried out for a relatively high-dimensional application with the following settings: we consider a 16-node cognitive radio network (i.e., ), the number of channels is 1024 (i.e., ), the number of active PR nodes ranges from 100 to 120 on the given set of 1024 channels, the measurement matrix is a Gaussian random matrix, and its size is fixed as .

Here, we adopt the support detection strategy originally used in our previous work [10]. To better illustrate this strategy, we introduce a vector whose elements are the magnitudes of the groups (i.e., ). Our choice rule is based on locating the "first significant jump" in the increasingly sorted sequence ( denotes the th largest component of by magnitude); the rule looks for the smallest such that Then, we set in (27).
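The first-significant-jump rule above can be sketched as follows. The idea, following [10], is to sort the group magnitudes increasingly and flag the smallest index where the gap between two consecutive sorted values exceeds a data-driven tolerance; the tolerance below (a multiple of the mean gap, in the spirit of the MATLAB mean(diff(sort(...))) heuristic mentioned later) and the multiplier are illustrative assumptions, not the paper's exact formula.

```python
import numpy as np

def first_jump_support(group_mags, factor=4.0):
    """Detect the groups lying above the first significant jump
    in the increasingly sorted sequence of group magnitudes."""
    order = np.argsort(group_mags)        # indices sorting magnitudes increasingly
    sorted_mags = group_mags[order]
    gaps = np.diff(sorted_mags)           # consecutive differences of the sorted sequence
    tol = factor * gaps.mean()            # data-driven jump tolerance (assumed form)
    jump = int(np.argmax(gaps > tol))     # first index whose gap exceeds the tolerance
    if gaps[jump] <= tol:                 # no significant jump: return empty support
        return np.array([], dtype=int)
    return np.sort(order[jump + 1:])      # groups above the jump form the detected support

# Small / large group magnitudes separated by a clear jump:
mags = np.array([0.01, 0.02, 0.03, 0.05, 2.1, 2.5, 3.0])
detected = first_jump_support(mags)
```

On this toy input, the rule detects the three large groups (indices 4, 5, 6); when the sorted sequence has no pronounced gap, it returns an empty support, which matches the intuition that no group stands out.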

For conciseness, here we give only the comparison between the ADM1 and GM-ADM1 algorithms, to demonstrate the superiority of our method; namely, the multistage process achieves impressive improvement over the original single-stage process. The parameters and tolerance of the ADM1 method are the same as in the nonoverlapping group sparse experiment. In addition, for the GM-ADM1 algorithm, we empirically fix the maximum stage number , *mean(diff(sort in MATLAB.

In Figure 16, we present the comparison between the ADM1 algorithm and the corresponding GM-ADM1 algorithm in terms of relative error, in both the noiseless and the noisy environments (noise level ). We can draw conclusions similar to those in Parts A and B. Moreover, we also give the ideal results of our algorithm, that is, results where the support detection is based on the underlying true solution. While the true solution is usually unknown in practice, here we use it as a reference and name the resulting method the ideal GM-ADM1 method (shortened as IGM-ADM1), an ideal upper bound on the performance of iterative support detection based multistage methods. It fully demonstrates the superiority of our new idea: the iterative support detection based multistage process can bring significant enhancement over the standard single-stage process. In addition, it suggests that the GM-ADMs can deliver dramatically better reconstructions than the corresponding ADMs, as long as enough reliable support detections can be acquired. Even though the ideal case is not attainable in practice, it serves as a benchmark and charts a path for further exploration. To make a further comparison between the ADM1 and GM-ADM1 algorithms, in Figure 17 we show the recoverability results of the two algorithms. We deem a reconstruction successful if its relative error is below a given threshold. Here, we consider only the noiseless case and set the threshold to and , respectively. Not surprisingly, the recoverability of GM-ADM1 is better than that of ADM1.
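The success criterion behind the recoverability curves can be made concrete as follows. A run counts as successful when the relative error falls below a chosen threshold, and recoverability is the fraction of successful runs; the threshold value below is illustrative, since the exact thresholds used in Figure 17 are not recoverable from this text.

```python
import numpy as np

def relative_error(x_hat, x_true):
    """Relative reconstruction error ||x_hat - x_true|| / ||x_true||."""
    return np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)

def recoverability(recons, truths, threshold=1e-3):
    """Fraction of runs whose relative error is below the success threshold."""
    successes = [relative_error(xh, xt) < threshold for xh, xt in zip(recons, truths)]
    return float(np.mean(successes))
```

Averaging this indicator over many random trials (100 runs per point in Figures 16 and 17) yields one point on a recoverability curve.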

Figure 16: Comparison results of ADM1, GM-ADM1, and IGM-ADM1 on the joint sparsity problem (, , and ) from Cognitive Radio. The horizontal axes represent the number of nonzero rows and the vertical axes the relative error. The first column corresponds to noiseless data and the second to data with Gaussian noise (noise level ). The results are averages over 100 runs.
Figure 17: Comparison results of ADM1 and GM-ADM1 on the joint sparsity problem (, , and ) from Cognitive Radio. The horizontal axes represent the number of nonzero rows and the vertical axes the recoverability. The success thresholds of the first and second columns are and , respectively. The results are averages over 100 runs.

4. Conclusions and Possible Future Work

In this paper, we propose a novel GM-ADM approach for group sparse reconstruction problems. The final result is obtained by a multistage process consisting of solving a series of weighted models and determining the adaptive weights via ISD. The numerical results demonstrate that iterative support detection extends from plain sparsity to group sparsity.

Our previous work [10] demonstrated that, for plain sparse signal recovery, the effectiveness of threshold-ISD depends on the fast decaying property. In this paper, however, threshold-ISD still achieves impressive performance without relying on the fast decaying property, thanks to the prior grouping information. Considering that support detection is not limited to thresholding and that reliable support detection guarantees better performance, in the future we would like to study other specific signals, for example, structured sparse signals, and to design more effective support detection methods based on their particular properties.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported by the Natural Science Foundation of China (Grant nos. 11201054 and 91330201) and by the Fundamental Research Funds for the Central Universities (Grants ZYGX2012J118 and ZYGX2013Z005).

References

  1. M. Yuan and Y. Lin, “Model selection and estimation in regression with grouped variables,” Journal of the Royal Statistical Society, Series B: Statistical Methodology, vol. 68, no. 1, pp. 49–67, 2006.
  2. F. R. Bach, “Consistency of the group lasso and multiple kernel learning,” Journal of Machine Learning Research, vol. 9, pp. 1179–1225, 2008.
  3. S. Ma, X. Song, and J. Huang, “Supervised group Lasso with applications to microarray data analysis,” BMC Bioinformatics, vol. 8, no. 1, article 60, 2007.
  4. D. Eiwen, G. Taubock, F. Hlawatsch, and H. Feichtinger, Group Sparsity Methods for Compressive Channel Estimation in Doubly Dispersive Multicarrier Systems, 2010.
  5. E. van den Berg, M. Schmidt, M. Friedlander, and K. Murphy, “Group sparsity via linear-time projection,” Tech. Rep., Department of Computer Science, University of British Columbia, Vancouver, Canada, 2008.
  6. J. Liu, S. Ji, and J. Ye, SLEP: Sparse Learning with Efficient Projections, Arizona State University, 2009.
  7. Z. Qin, K. Scheinberg, and D. Goldfarb, “Efficient block-coordinate descent algorithms for the group Lasso,” Mathematical Programming Computation, vol. 5, no. 2, pp. 143–169, 2013.
  8. S. J. Wright, R. D. Nowak, and M. A. Figueiredo, “Sparse reconstruction by separable approximation,” IEEE Transactions on Signal Processing, vol. 57, no. 7, pp. 2479–2493, 2009.
  9. W. Deng, W. Yin, and Y. Zhang, “Group sparse optimization by alternating direction method,” in Wavelets and Sparsity XV, vol. 8858 of Proceedings of SPIE, 2013.
  10. Y. Wang and W. Yin, “Sparse signal reconstruction via iterative support detection,” SIAM Journal on Imaging Sciences, vol. 3, no. 3, pp. 462–491, 2010.
  11. J. Huang, T. Zhang, and D. Metaxas, “Learning with structured sparsity,” The Journal of Machine Learning Research, vol. 12, pp. 3371–3412, 2011.
  12. S. Kim and E. P. Xing, “Tree-guided group lasso for multi-task regression with structured sparsity,” in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 543–550, June 2010.
  13. F. Bach, R. Jenatton, J. Mairal, and G. Obozinski, “Structured sparsity through convex optimization,” Statistical Science, vol. 27, no. 4, pp. 450–468, 2012.
  14. T. Zhang, “Analysis of multi-stage convex relaxation for sparse regularization,” The Journal of Machine Learning Research, vol. 11, pp. 1081–1107, 2010.
  15. N. Vaswani and W. Lu, “Modified-CS: modifying compressive sensing for problems with partially known support,” IEEE Transactions on Signal Processing, vol. 58, no. 9, pp. 4595–4607, 2010.
  16. W. Lu and N. Vaswani, “Regularized modified BPDN for noisy sparse reconstruction with partial erroneous support and signal value knowledge,” IEEE Transactions on Signal Processing, vol. 60, no. 1, pp. 182–196, 2012.
  17. L. Jacques, “A short note on compressed sensing with partially known signal support,” Signal Processing, vol. 90, no. 12, pp. 3308–3312, 2010.
  18. M. P. Friedlander, H. Mansour, R. Saab, and O. Yilmaz, “Recovering compressively sampled signals using partial support information,” IEEE Transactions on Information Theory, vol. 58, no. 2, pp. 1122–1134, 2012.
  19. R. E. Carrillo, L. F. Polania, and K. E. Barner, “Iterative algorithms for compressed sensing with partially known support,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '10), pp. 3654–3657, March 2010.
  20. R. E. Carrillo, L. F. Polanía, and K. E. Barner, “Iterative hard thresholding for compressed sensing with partially known support,” in Proceedings of the 36th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '11), pp. 4028–4031, May 2011.
  21. E. J. Candes, M. B. Wakin, and S. P. Boyd, “Enhancing sparsity by reweighted l1 minimization,” The Journal of Fourier Analysis and Applications, vol. 14, no. 5-6, pp. 877–905, 2008.
  22. R. Chartrand and W. Yin, “Iteratively reweighted algorithms for compressive sensing,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '08), pp. 3869–3872, April 2008.
  23. E. J. Candès, J. K. Romberg, and T. Tao, “Stable signal recovery from incomplete and inaccurate measurements,” Communications on Pure and Applied Mathematics, vol. 59, no. 8, pp. 1207–1223, 2006.
  24. D. L. Donoho, “Compressed sensing,” IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
  25. W. Guo and W. Yin, “Edge guided reconstruction for compressive imaging,” SIAM Journal on Imaging Sciences, vol. 5, no. 3, pp. 809–834, 2012.
  26. Y. Zhang, An Alternating Direction Algorithm for Nonnegative Matrix Factorization, TR10-03, Rice University, 2010.
  27. Z. Wen, W. Yin, and Y. Zhang, “Solving a low-rank factorization model for matrix completion by a nonlinear successive over-relaxation algorithm,” Mathematical Programming Computation, vol. 4, no. 4, pp. 333–361, 2012.
  28. Y. Shen, Z. Wen, and Y. Zhang, “Augmented Lagrangian alternating direction method for matrix separation based on low-rank factorization,” Optimization Methods & Software, vol. 29, no. 2, pp. 239–263, 2014.
  29. Y. Xu, W. Yin, Z. Wen, and Y. Zhang, “An alternating direction algorithm for matrix completion with nonnegative factors,” Frontiers of Mathematics in China, vol. 7, no. 2, pp. 365–384, 2011.
  30. M. J. D. Powell, “A method for nonlinear constraints in minimization problems,” in Optimization, R. Fletcher, Ed., pp. 283–298, Academic Press, New York, NY, USA, 1969.
  31. R. Glowinski and P. Le Tallec, Augmented Lagrangian and Operator-Splitting Methods in Nonlinear Mechanics, Society for Industrial and Applied Mathematics, 1989.
  32. R. Glowinski, Numerical Methods for Nonlinear Variational Problems, Springer, 2008.
  33. L. Jacob, G. Obozinski, and J.-P. Vert, “Group lasso with overlap and graph lasso,” in Proceedings of the 26th International Conference on Machine Learning (ICML '09), pp. 433–440, ACM, June 2009.
  34. D. L. Donoho, “De-noising by soft-thresholding,” IEEE Transactions on Information Theory, vol. 41, no. 3, pp. 613–627, 1995.
  35. D. H. Wolpert and W. G. Macready, “No free lunch theorems for optimization,” IEEE Transactions on Evolutionary Computation, vol. 1, no. 1, pp. 67–82, 1997.
  36. YALL1-Group: a solver for group/joint sparse reconstruction, http://www.convexoptimization.com/wikimization/index.php.
  37. J. Meng, W. Yin, H. Li, E. Hossain, and Z. Han, “Collaborative spectrum sensing from sparse observations in cognitive radio networks,” IEEE Journal on Selected Areas in Communications, vol. 29, no. 2, pp. 327–337, 2011.
  38. T. S. Rappaport, Wireless Communications: Principles and Practice, Prentice Hall, 2nd edition, 2002.