Maryam A. Alghamdi, Mohammad Ali Alghamdi, Naseer Shahzad, Hong-Kun Xu, "Properties and Iterative Methods for the $Q$-Lasso", Abstract and Applied Analysis, vol. 2013, Article ID 250943, 8 pages, 2013. https://doi.org/10.1155/2013/250943
Properties and Iterative Methods for the $Q$-Lasso
We introduce the $Q$-lasso, which generalizes the well-known lasso of Tibshirani (1996), with $Q$ a closed convex subset of a Euclidean $m$-space for some integer $m \ge 1$. This set $Q$ can be interpreted as the set of errors within a given tolerance level when linear measurements are taken to recover a signal/image via the lasso. Solutions of the $Q$-lasso depend on a tuning parameter $\gamma$. In this paper, we obtain basic properties of the solutions as a function of $\gamma$. Because of ill-posedness, we also apply $\ell_1$-$\ell_2$ regularization to the $Q$-lasso. In addition, we discuss iterative methods for solving the $Q$-lasso, which include the proximal-gradient algorithm and the projection-gradient algorithm.
1. Introduction
The lasso of Tibshirani [1] is the minimization problem
$$\min_{x \in \mathbb{R}^n} \frac{1}{2}\|Ax - b\|_2^2 + \gamma \|x\|_1, \tag{1}$$
where $A$ is an $m \times n$ (real) matrix, $b \in \mathbb{R}^m$, and $\gamma > 0$ is a tuning parameter. It is equivalent to the basis pursuit (BP) of Chen et al. [2]:
$$\min_{x \in \mathbb{R}^n} \|x\|_1 \quad \text{subject to } Ax = b. \tag{2}$$
It is well known that both lasso and BP model a number of applied problems arising from machine learning, signal/image processing, and statistics, due to the fact that they promote the sparsity of a signal $x \in \mathbb{R}^n$. Sparsity is a common phenomenon in practical problems, since a solution may have a sparse representation in terms of an appropriate basis, and it has therefore received much attention.
Observe that both the lasso (1) and BP (2) can be viewed as the $\ell_1$ regularization applied to the inverse linear system in $\mathbb{R}^n$:
$$Ax = b. \tag{3}$$
In sparse recovery, the system (3) is underdetermined (i.e., $m < n$ and often $m \ll n$ indeed). The theory of compressed sensing of Donoho [3] and Candès et al. [4, 5] made the breakthrough that, under certain conditions, the underdetermined system (3) can determine a unique $k$-sparse solution. (Recall that a signal $x \in \mathbb{R}^n$ is said to be $k$-sparse if the number of nonzero entries of $x$ is no bigger than $k$.)
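However, due to errors of measurements, the system (3) is actually inexact: $Ax \approx b$. It turns out that the BP (2) is reformulated as
$$\min_{x \in \mathbb{R}^n} \|x\|_1 \quad \text{subject to } \|Ax - b\| \le \varepsilon, \tag{4}$$
where $\varepsilon > 0$ is the tolerance level of errors and $\|\cdot\|$ is a norm on $\mathbb{R}^m$ (often it is the $\ell_p$ norm $\|\cdot\|_p$ for $p = 1, 2, \infty$; a solution to (4) when the tolerance is measured by the $\ell_\infty$ norm is known as the Dantzig selector by Candès and Tao [6]; see also [7]).
Note that if we let $Q := B_\varepsilon(b)$ be the closed ball in $\mathbb{R}^m$ around $b$ and with radius $\varepsilon$, then (4) is rewritten as
$$\min_{x \in \mathbb{R}^n} \|x\|_1 \quad \text{subject to } Ax \in Q. \tag{5}$$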
Let now $Q$ be a nonempty closed convex subset of $\mathbb{R}^m$ and let $P_Q$ be the projection from $\mathbb{R}^m$ onto $Q$. Then, noticing that the condition $Ax \in Q$ is equivalent to the condition $Ax - P_Q Ax = 0$, we see that the problem (5) is solved via
$$\min_{x \in \mathbb{R}^n} \|x\|_1 \quad \text{subject to } (I - P_Q)Ax = 0. \tag{6}$$
Applying the Lagrangian method, we arrive at the following equivalent minimization:
$$\min_{x \in \mathbb{R}^n} \frac{1}{2}\|(I - P_Q)Ax\|_2^2 + \gamma \|x\|_1, \tag{7}$$
where $\gamma > 0$ is a Lagrangian multiplier (also interpreted as a regularization parameter).
Alternatively, we may view (7) as the $\ell_1$ regularization of the inclusion
$$Ax \in Q, \tag{8}$$
which extends the linear system (3) in an obvious way. We refer to the problem (7) as the $Q$-lasso since it is the $\ell_1$ regularization of inclusion (8), just as lasso (1) is the $\ell_1$ regularization of the linear system (3). Throughout the rest of this paper, we always assume that (8) is consistent (i.e., solvable).
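To make the objective in (7) concrete, the following minimal Python sketch evaluates it when $Q$ is taken to be a closed Euclidean ball around $b$, as in (5). The helper names (`project_ball`, `matvec`, `q_lasso_objective`) and the list-of-rows matrix layout are assumptions made for this illustration only; they do not come from the paper.

```python
import math


def project_ball(y, center, radius):
    """Euclidean projection of y onto the closed l2 ball B(center, radius)."""
    diff = [yi - ci for yi, ci in zip(y, center)]
    dist = math.sqrt(sum(d * d for d in diff))
    if dist <= radius:
        return list(y)  # y already lies in the ball
    scale = radius / dist
    return [ci + scale * d for ci, d in zip(center, diff)]


def matvec(A, x):
    """Product A x for a matrix stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def q_lasso_objective(A, x, project_Q, gamma):
    """Value of 0.5 * ||(I - P_Q) A x||_2^2 + gamma * ||x||_1."""
    Ax = matvec(A, x)
    residual = [u - p for u, p in zip(Ax, project_Q(Ax))]
    return 0.5 * sum(r * r for r in residual) + gamma * sum(abs(t) for t in x)
```

With $Q = \{b\}$ (radius zero) the residual becomes $Ax - b$, so the same function evaluates the ordinary lasso objective (1).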
$Q$-lasso (7) is also connected with the so-called split feasibility problem (SFP) of Censor and Elfving [8] (see also [9]), which is stated as finding a point $x$ with the property
$$x \in C, \quad Ax \in Q, \tag{9}$$
where $C$ and $Q$ are closed convex subsets of $\mathbb{R}^n$ and $\mathbb{R}^m$, respectively. An equivalent minimization formulation of the SFP (9) is given as
$$\min_{x \in C} \frac{1}{2}\|(I - P_Q)Ax\|_2^2. \tag{10}$$
Its $\ell_1$ regularization is given as the minimization
$$\min_{x \in C} \frac{1}{2}\|(I - P_Q)Ax\|_2^2 + \gamma \|x\|_1, \tag{11}$$
where $\gamma > 0$ is a regularization parameter. Problem (7) is a special case of (11) when the set of constraints, $C$, is taken to be the entire space $\mathbb{R}^n$.
The purpose of this paper is to study the behavior, in terms of $\gamma > 0$, of solutions to the regularized problem (7). (We leave the more general problem (11) to further work, because the involvement of another closed convex set $C$ brings technical difficulties that are not easy to overcome.) We discuss iterative methods for solving the $Q$-lasso, including the proximal-gradient method and the projection-gradient method, the latter being derived via a duality technique. Due to ill-posedness, we also apply the $\ell_1$-$\ell_2$ regularization to the $Q$-lasso.
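2. Preliminaries
Let $n \ge 1$ be an integer and let $\mathbb{R}^n$ be the Euclidean $n$-space. If $p \ge 1$, we use $\|\cdot\|_p$ to denote the $\ell_p$ norm on $\mathbb{R}^n$. Namely, for $x = (x_1, \ldots, x_n)^{\top} \in \mathbb{R}^n$,
$$\|x\|_p = \Big(\sum_{j=1}^n |x_j|^p\Big)^{1/p} \ (1 \le p < \infty), \qquad \|x\|_\infty = \max_{1 \le j \le n} |x_j|. \tag{12}$$
Let $K$ be a closed convex subset of $\mathbb{R}^n$. Recall that the projection from $\mathbb{R}^n$ to $K$ is defined as the operator
$$P_K x := \arg\min_{y \in K} \|x - y\|_2, \quad x \in \mathbb{R}^n. \tag{13}$$
The projection $P_K$ is characterized as follows: given $x \in \mathbb{R}^n$ and $z \in K$,
$$z = P_K x \iff \langle x - z, y - z \rangle \le 0 \quad \text{for all } y \in K. \tag{14}$$
Projections are nonexpansive. Namely, we have the following.
Proposition 1. One has that $P_K$ is firmly nonexpansive in the sense that
$$\langle P_K x - P_K y, x - y \rangle \ge \|P_K x - P_K y\|_2^2, \quad x, y \in \mathbb{R}^n. \tag{15}$$
In particular, $P_K$ is nonexpansive; that is, $\|P_K x - P_K y\|_2 \le \|x - y\|_2$ for all $x, y \in \mathbb{R}^n$.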
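The firm nonexpansiveness in Proposition 1 can be checked numerically. The sketch below is only an illustration under assumptions not made in the paper: the convex set is taken to be a box (for which the projection is a coordinatewise clamp), and the function names are hypothetical.

```python
import random


def project_box(x, lo, hi):
    """Projection onto the box [lo, hi]^n, computed by clamping each entry."""
    return [min(max(xi, lo), hi) for xi in x]


def firm_nonexpansive_gap(x, y, proj):
    """Return <Px - Py, x - y> - ||Px - Py||_2^2, which should be >= 0."""
    px, py = proj(x), proj(y)
    inner = sum((a - b) * (c - d) for a, b, c, d in zip(px, py, x, y))
    squared = sum((a - b) ** 2 for a, b in zip(px, py))
    return inner - squared


if __name__ == "__main__":
    random.seed(0)
    proj = lambda v: project_box(v, -1.0, 1.0)
    gaps = []
    for _ in range(1000):
        x = [random.uniform(-5.0, 5.0) for _ in range(4)]
        y = [random.uniform(-5.0, 5.0) for _ in range(4)]
        gaps.append(firm_nonexpansive_gap(x, y, proj))
    # Up to rounding error, the smallest gap should not be negative.
    print(min(gaps))
```

A random test like this cannot prove the inequality; it only illustrates what the inequality asserts for one particular closed convex set.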
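Recall that a function $\varphi : \mathbb{R}^n \to \mathbb{R}$ is convex if
$$\varphi((1 - \lambda)x + \lambda y) \le (1 - \lambda)\varphi(x) + \lambda \varphi(y) \tag{16}$$
for all $\lambda \in (0, 1)$ and $x, y \in \mathbb{R}^n$. (Note that we only consider finite-valued functions.)
The subdifferential of a convex function $\varphi$ is defined as the operator $\partial\varphi$ given by
$$\partial\varphi(x) = \{\xi \in \mathbb{R}^n : \varphi(y) \ge \varphi(x) + \langle \xi, y - x \rangle, \ y \in \mathbb{R}^n\}. \tag{17}$$
The inequality in (17) is referred to as the subdifferential inequality of $\varphi$ at $x$. We say that $\varphi$ is subdifferentiable at $x$ if $\partial\varphi(x)$ is nonempty. It is well known that an everywhere finite-valued convex function $\varphi$ on $\mathbb{R}^n$ is everywhere subdifferentiable.
Examples. (i) If $\varphi(x) = |x|$ for $x \in \mathbb{R}$, then $\partial\varphi(0) = [-1, 1]$; (ii) if $\varphi(x) = \|x\|_1$ for $x \in \mathbb{R}^n$, then $\partial\varphi(x)$ is given componentwise by
$$(\partial\varphi(x))_j = \begin{cases} \operatorname{sgn}(x_j), & x_j \ne 0, \\ [-1, 1], & x_j = 0, \end{cases} \quad 1 \le j \le n. \tag{18}$$
Here $\operatorname{sgn}$ is the sign function; that is, for $a \in \mathbb{R}$,
$$\operatorname{sgn}(a) = \begin{cases} 1, & a > 0, \\ 0, & a = 0, \\ -1, & a < 0. \end{cases} \tag{19}$$
Consider the unconstrained minimization problem
$$\min_{x \in \mathbb{R}^n} \varphi(x). \tag{20}$$
The following are well known.
Proposition 2. Let $\varphi$ be everywhere finite-valued on $\mathbb{R}^n$. (i) If $\varphi$ is strictly convex, then (20) admits at most one solution. (ii) If $\varphi$ is convex and satisfies the coercivity condition
$$\lim_{\|x\| \to \infty} \varphi(x) = \infty, \tag{21}$$
then there exists at least one solution to (20). Therefore, if $\varphi$ is both strictly convex and coercive, there exists one and only one solution to (20).
Proposition 3. Let $\varphi$ be everywhere finite-valued convex on $\mathbb{R}^n$ and $x^* \in \mathbb{R}^n$. Suppose $\varphi$ is bounded below (i.e., $\inf\{\varphi(x) : x \in \mathbb{R}^n\} > -\infty$). Then $x^*$ is a solution to minimization (20) if and only if it satisfies the first-order optimality condition:
$$0 \in \partial\varphi(x^*). \tag{22}$$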
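3. Properties of the $Q$-Lasso
We study some basic properties of the $Q$-lasso, which is repeated below:
$$\min_{x \in \mathbb{R}^n} \varphi_\gamma(x) := \frac{1}{2}\|(I - P_Q)Ax\|_2^2 + \gamma\|x\|_1, \tag{23}$$
where $\gamma > 0$ is a regularization parameter. We also consider the following minimization (we call it the $Q$-least squares problem):
$$\min_{x \in \mathbb{R}^n} \frac{1}{2}\|(I - P_Q)Ax\|_2^2. \tag{24}$$
Denote by $S$ and $S_\gamma$ the solution sets of (24) and (23), respectively. Since $\varphi_\gamma$ is continuous, convex, and coercive (i.e., $\varphi_\gamma(x) \to \infty$ as $\|x\| \to \infty$), $S_\gamma$ is closed, convex, and nonempty. Notice also that since we assume the consistency of (8), we have $S \ne \emptyset$; moreover, the solution sets of (8) and (24) coincide.
Observe that the assumption that $S \ne \emptyset$ actually implies that $S_\gamma$ is uniformly bounded in $\gamma > 0$, as shown by the lemma below.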
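Lemma 4. Assume that (24) is consistent (i.e., $S \ne \emptyset$). Then, for $\gamma > 0$ and $x_\gamma \in S_\gamma$, one has $\|x_\gamma\|_1 \le \min_{x \in S} \|x\|_1$.
Proof. Let $x_\gamma \in S_\gamma$. In the relation
$$\frac{1}{2}\|(I - P_Q)Ax_\gamma\|_2^2 + \gamma\|x_\gamma\|_1 \le \frac{1}{2}\|(I - P_Q)Ax\|_2^2 + \gamma\|x\|_1, \quad x \in \mathbb{R}^n, \tag{25}$$
taking $x \in S$ yields (for $(I - P_Q)Ax = 0$)
$$\frac{1}{2}\|(I - P_Q)Ax_\gamma\|_2^2 + \gamma\|x_\gamma\|_1 \le \gamma\|x\|_1. \tag{26}$$
It follows that
$$\|x_\gamma\|_1 \le \|x\|_1, \quad x \in S. \tag{27}$$
This proves the conclusion of the lemma.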
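Proposition 5. One has the following.
(i) The functions
$$\rho(\gamma) := \|x_\gamma\|_1, \qquad \eta(\gamma) := \|(I - P_Q)Ax_\gamma\|_2^2 \tag{28}$$
are well defined for $\gamma > 0$. That is, they do not depend upon the particular choice of $x_\gamma \in S_\gamma$.
(ii) The function $\rho(\gamma) = \|x_\gamma\|_1$ is decreasing in $\gamma > 0$.
(iii) The function $\eta(\gamma) = \|(I - P_Q)Ax_\gamma\|_2^2$ is increasing in $\gamma > 0$.
(iv) $(I - P_Q)Ax_\gamma$ is continuous in $\gamma > 0$.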
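Proof. For $x_\gamma \in S_\gamma$, we have the optimality condition:
$$0 \in A^{\top}(I - P_Q)Ax_\gamma + \gamma\,\partial\|x_\gamma\|_1. \tag{29}$$
Here $A^{\top}$ is the transpose of $A$ and $\partial$ stands for the subdifferential in the sense of convex analysis. Equivalently,
$$-\frac{1}{\gamma}A^{\top}(I - P_Q)Ax_\gamma \in \partial\|x_\gamma\|_1. \tag{30}$$
It follows by the subdifferential inequality that
$$\gamma\|x\|_1 \ge \gamma\|x_\gamma\|_1 - \langle (I - P_Q)Ax_\gamma, Ax - Ax_\gamma \rangle, \quad x \in \mathbb{R}^n. \tag{31}$$
In particular, for $\hat{x}_\gamma \in S_\gamma$,
$$\gamma\|\hat{x}_\gamma\|_1 \ge \gamma\|x_\gamma\|_1 - \langle (I - P_Q)Ax_\gamma, A\hat{x}_\gamma - Ax_\gamma \rangle. \tag{32}$$
Interchange $x_\gamma$ and $\hat{x}_\gamma$ to get
$$\gamma\|x_\gamma\|_1 \ge \gamma\|\hat{x}_\gamma\|_1 - \langle (I - P_Q)A\hat{x}_\gamma, Ax_\gamma - A\hat{x}_\gamma \rangle. \tag{33}$$
Adding up (32) and (33) yields
$$\langle (I - P_Q)Ax_\gamma - (I - P_Q)A\hat{x}_\gamma, Ax_\gamma - A\hat{x}_\gamma \rangle \le 0. \tag{34}$$
Consequently, since $I - P_Q$ is firmly nonexpansive, $(I - P_Q)Ax_\gamma = (I - P_Q)A\hat{x}_\gamma$. It then follows that $A\hat{x}_\gamma - Ax_\gamma = P_Q A\hat{x}_\gamma - P_Q Ax_\gamma$, so the inner products in (32) and (33) are nonpositive by the characterization (14) of projections. Hence (32) and (33) imply that $\|\hat{x}_\gamma\|_1 \ge \|x_\gamma\|_1$ and $\|x_\gamma\|_1 \ge \|\hat{x}_\gamma\|_1$, respectively. Hence $\|x_\gamma\|_1 = \|\hat{x}_\gamma\|_1$, and it follows that the functions
$$\rho(\gamma) := \|x_\gamma\|_1, \qquad \eta(\gamma) := \|(I - P_Q)Ax_\gamma\|_2^2 \tag{35}$$
are well defined for $\gamma > 0$.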
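Now substituting $x_\beta \in S_\beta$ (with $\beta > 0$) for $x$ in (31), we get
$$\gamma\|x_\beta\|_1 \ge \gamma\|x_\gamma\|_1 - \langle (I - P_Q)Ax_\gamma, Ax_\beta - Ax_\gamma \rangle. \tag{36}$$
Interchange $\gamma$ and $\beta$ and $x_\gamma$ and $x_\beta$ to find
$$\beta\|x_\gamma\|_1 \ge \beta\|x_\beta\|_1 - \langle (I - P_Q)Ax_\beta, Ax_\gamma - Ax_\beta \rangle. \tag{37}$$
Adding up (36) and (37) and using the fact that $I - P_Q$ is firmly nonexpansive, we deduce that
$$(\gamma - \beta)(\|x_\beta\|_1 - \|x_\gamma\|_1) \ge \langle (I - P_Q)Ax_\gamma - (I - P_Q)Ax_\beta, Ax_\gamma - Ax_\beta \rangle \ge \|(I - P_Q)Ax_\gamma - (I - P_Q)Ax_\beta\|_2^2. \tag{38}$$
We therefore find that if $\gamma > \beta$, then $\|x_\gamma\|_1 \le \|x_\beta\|_1$. This proves that $\gamma \mapsto \|x_\gamma\|_1$ is nonincreasing in $\gamma > 0$. From (38) and the boundedness of $\|x_\gamma\|_1$ (Lemma 4) it also follows that $(I - P_Q)Ax_\gamma$ is continuous for $\gamma > 0$, which implies the continuity of $\gamma \mapsto \|(I - P_Q)Ax_\gamma\|_2^2$ for $\gamma > 0$.
To see that $\gamma \mapsto \|(I - P_Q)Ax_\gamma\|_2^2$ is increasing, we use the inequality (as $x_\gamma \in S_\gamma$)
$$\frac{1}{2}\|(I - P_Q)Ax_\gamma\|_2^2 + \gamma\|x_\gamma\|_1 \le \frac{1}{2}\|(I - P_Q)Ax_\beta\|_2^2 + \gamma\|x_\beta\|_1, \tag{39}$$
which implies that
$$\frac{1}{2}\big(\|(I - P_Q)Ax_\gamma\|_2^2 - \|(I - P_Q)Ax_\beta\|_2^2\big) \le \gamma(\|x_\beta\|_1 - \|x_\gamma\|_1). \tag{40}$$
Now if $\beta > \gamma$, then, as $\|x_\beta\|_1 \le \|x_\gamma\|_1$, we immediately get that $\|(I - P_Q)Ax_\gamma\|_2^2 \le \|(I - P_Q)Ax_\beta\|_2^2$, and the monotonicity is proven.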
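A one-dimensional worked example may help to visualize these monotonicity properties. It is only an illustration under assumptions chosen here, not taken from the paper: $m = n = 1$, $A = 1$, and $Q = [l, u]$ with $0 < l \le u$, so that the objective is $\frac{1}{2}\operatorname{dist}(x, Q)^2 + \gamma|x|$.

```latex
% Assumed setting: m = n = 1, A = 1, Q = [l, u] with 0 < l <= u.
\begin{align*}
\varphi_\gamma(x) &= \tfrac{1}{2}\operatorname{dist}(x,[l,u])^2 + \gamma |x|, \\
% On (0, l) the derivative is x - l + \gamma; on [l, u] it is \gamma > 0;
% on (-\infty, 0) it is x - l - \gamma < 0.
x_\gamma &= \max\{\, l - \gamma,\ 0 \,\}, \\
|x_\gamma| &= \max\{\, l - \gamma,\ 0 \,\}
  && \text{(nonincreasing in } \gamma\text{)}, \\
|(I - P_Q)x_\gamma| &= \min\{\, \gamma,\ l \,\}
  && \text{(nondecreasing and continuous in } \gamma\text{)}, \\
\lim_{\gamma \to 0} x_\gamma &= l = \arg\min_{x \in [l,u]} |x| .
\end{align*}
```

In this toy case the solution set of the $Q$-least squares problem is $[l, u]$ itself, the bound $|x_\gamma| \le l$ of Lemma 4 is visible directly, and $x_\gamma = 0$ as soon as $\gamma \ge l$.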
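Proposition 6. One has the following.
(i) $\lim_{\gamma \to 0} \|(I - P_Q)Ax_\gamma\|_2^2 = \inf_{x \in \mathbb{R}^n} \|(I - P_Q)Ax\|_2^2$.
(ii) $\lim_{\gamma \to 0} \|x_\gamma\|_1 = \min_{x \in S} \|x\|_1$.
Proof. (i) Taking the limit as $\gamma \to 0$ in the inequality (and using the boundedness of $(x_\gamma)$)
$$\frac{1}{2}\|(I - P_Q)Ax_\gamma\|_2^2 \le \frac{1}{2}\|(I - P_Q)Ax_\gamma\|_2^2 + \gamma\|x_\gamma\|_1 \le \frac{1}{2}\|(I - P_Q)Ax\|_2^2 + \gamma\|x\|_1, \quad x \in \mathbb{R}^n,$$
we get $\limsup_{\gamma \to 0} \|(I - P_Q)Ax_\gamma\|_2^2 \le \|(I - P_Q)Ax\|_2^2$ for every $x \in \mathbb{R}^n$.
The result in (i) then follows.
As for (ii), we have, by (27), $\|x_\gamma\|_1 \le \|x\|_1$ for any $x \in S$. In particular, $\|x_\gamma\|_1 \le \|x^\dagger\|_1$, where $x^\dagger$ is an $\ell_1$ minimum-norm element of $S$; that is, $x^\dagger \in \arg\min_{x \in S} \|x\|_1$.
Assume $\gamma_k \to 0$ is such that $x_{\gamma_k} \to \hat{x}$. Then for any $x \in \mathbb{R}^n$,
$$\frac{1}{2}\|(I - P_Q)A\hat{x}\|_2^2 = \lim_{k \to \infty} \Big(\frac{1}{2}\|(I - P_Q)Ax_{\gamma_k}\|_2^2 + \gamma_k\|x_{\gamma_k}\|_1\Big) \le \lim_{k \to \infty} \Big(\frac{1}{2}\|(I - P_Q)Ax\|_2^2 + \gamma_k\|x\|_1\Big) = \frac{1}{2}\|(I - P_Q)Ax\|_2^2.$$
It follows that $\hat{x}$ solves the minimization problem $\min_{x \in \mathbb{R}^n} \frac{1}{2}\|(I - P_Q)Ax\|_2^2$; that is, $\hat{x} \in S$. Consequently,
$$\|\hat{x}\|_1 = \lim_{k \to \infty} \|x_{\gamma_k}\|_1 \le \|x^\dagger\|_1 = \min_{x \in S} \|x\|_1 \le \|\hat{x}\|_1.$$
This suffices to ensure that the conclusion of (ii) holds.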
How to select the tuning (i.e., regularizing) parameter $\gamma$ in lasso (1) and $Q$-lasso (7) is a challenging problem. There is no general rule for selecting $\gamma$, which should instead be chosen case by case. The following result, however, points out that $\gamma$ cannot be too large.
Proposition 7. Let be a nonempty closed convex subset of and assume that -lasso (7) is consistent (i.e., solvable). If (note that this condition is reduced to for lasso (1) for which ), then . (Here is, as before, the solution set of the -least squares problem (24).)
Proof. Let . The optimality condition
Taking in subdifferential inequality (31) yields
It follows that
Now by Lemma 4, we have . Hence, from (49) it follows that if , we must have . This completes the proof.
Proposition 8. Let be a nonempty closed convex subset of and let and . Then is a solution of the -lasso (23) if and only if and . It turns out that where is the null space of and where denotes the closed ball centered at the origin and with radius of . This shows that if one can find one solution to -lasso (23), then all solutions are found by (50).
Proof. If , then from the relations we obtain . This together with the assumption of yields that which in turn implies that and hence .
4. Iterative Methods
In this section we discuss the proximal iterative methods for solving $Q$-lasso (7). The basics are Moreau's concept of proximal operators and their fundamental properties, which are briefly mentioned below. (For our purposes, however, we confine ourselves to the finite-dimensional setting.)
4.1. Proximal Operators
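Let $\Gamma_0(\mathbb{R}^n)$ be the space of functions on $\mathbb{R}^n$ that are proper, lower semicontinuous, and convex.
Definition 9 (see [10, 11]). The proximal operator of $\varphi \in \Gamma_0(\mathbb{R}^n)$ is defined by
$$\operatorname{prox}_\varphi(x) := \arg\min_{v \in \mathbb{R}^n} \Big\{\varphi(v) + \frac{1}{2}\|v - x\|_2^2\Big\}, \quad x \in \mathbb{R}^n.$$
The proximal operator of $\varphi$ of order $\lambda > 0$ is defined as the proximal operator of $\lambda\varphi$; that is,
$$\operatorname{prox}_{\lambda\varphi}(x) := \arg\min_{v \in \mathbb{R}^n} \Big\{\varphi(v) + \frac{1}{2\lambda}\|v - x\|_2^2\Big\}, \quad x \in \mathbb{R}^n.$$
For fundamental properties of proximal operators, the reader is referred to [12, 13] for details. Here we only mention the fact that the proximal operator can have a closed-form expression in some important cases as shown in the examples below .
(a) If we take $\varphi$ to be the Euclidean norm $\|\cdot\|_2$ of $\mathbb{R}^n$, then
$$\operatorname{prox}_{\lambda\|\cdot\|_2}(x) = \begin{cases} \big(1 - \frac{\lambda}{\|x\|_2}\big)x, & \|x\|_2 > \lambda, \\ 0, & \|x\|_2 \le \lambda. \end{cases}$$
In particular, if we take $\varphi$ to be the absolute value function of the real line $\mathbb{R}$, we get
$$\operatorname{prox}_{\lambda|\cdot|}(x) = \operatorname{sgn}(x)\max\{|x| - \lambda, 0\},$$
which is also known as the scalar soft-thresholding operator.
(b) Let $\{e_k\}_{k=1}^n$ be an orthonormal basis of $\mathbb{R}^n$ and let $\{\omega_k\}_{k=1}^n$ be real positive numbers. Define $\varphi$ by
$$\varphi(x) = \sum_{k=1}^n \omega_k |\langle x, e_k \rangle|.$$
Then $\operatorname{prox}_\varphi(x) = \sum_{k=1}^n \alpha_k e_k$, where
$$\alpha_k = \operatorname{sgn}(\langle x, e_k \rangle)\max\{|\langle x, e_k \rangle| - \omega_k, 0\}.$$
In particular, if $\varphi(x) = \|x\|_1$ for $x \in \mathbb{R}^n$, then
$$\operatorname{prox}_{\lambda\|\cdot\|_1}(x) = (\alpha_1, \ldots, \alpha_n)^{\top},$$
where $\alpha_k = \operatorname{sgn}(x_k)\max\{|x_k| - \lambda, 0\}$ for $k = 1, \ldots, n$.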
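The scalar soft-thresholding operator mentioned above, and its coordinatewise use as the proximal operator of a multiple of the $\ell_1$ norm, can be sketched in a few lines of Python. The function names are hypothetical and chosen only for this illustration.

```python
def soft_threshold(t, lam):
    """Scalar soft-thresholding: sgn(t) * max(|t| - lam, 0) for lam >= 0."""
    if t > lam:
        return t - lam
    if t < -lam:
        return t + lam
    return 0.0


def prox_l1(x, lam):
    """Proximal operator of lam * ||.||_1, applied entry by entry."""
    return [soft_threshold(t, lam) for t in x]
```

For instance, `prox_l1([3.0, -0.5, -2.0], 1.0)` shrinks each entry toward zero by 1 and sets the entry of magnitude 0.5 to zero, which is the mechanism by which $\ell_1$ regularization promotes sparsity.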
4.2. Proximal-Gradient Algorithm
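The proximal operators can be used to minimize the sum of two convex functions:
$$\min_{x \in \mathbb{R}^n} f(x) + g(x), \tag{59}$$
where $f, g \in \Gamma_0(\mathbb{R}^n)$. It is often the case that one of them is differentiable. The following is an equivalent fixed point formulation of (59): if $f$ is finite-valued and differentiable and $\lambda > 0$, then $x^*$ is a solution of (59) if and only if
$$x^* = \operatorname{prox}_{\lambda g}(x^* - \lambda\nabla f(x^*)). \tag{60}$$
Initialize $x_0 \in \mathbb{R}^n$ and iterate
$$x_{k+1} = \operatorname{prox}_{\lambda_k g}(x_k - \lambda_k\nabla f(x_k)), \tag{61}$$
where $(\lambda_k)$ is a sequence of positive real numbers.
Theorem 11 (see [12, 14]). Let $f, g \in \Gamma_0(\mathbb{R}^n)$ and assume (59) is consistent. Assume in addition the following.
(i) $\nabla f$ is Lipschitz continuous on $\mathbb{R}^n$: $\|\nabla f(x) - \nabla f(y)\| \le L\|x - y\|$ for $x, y \in \mathbb{R}^n$.
(ii) $0 < \liminf_{k \to \infty} \lambda_k \le \limsup_{k \to \infty} \lambda_k < 2/L$.
Then the sequence $(x_k)$ generated by the proximal-gradient algorithm (61) converges to a solution of (59).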
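The proximal-gradient iteration (61) can be sketched generically as follows. This is a minimal illustration with a constant step size, not the authors' implementation; the callables `grad_f` and `prox_g` and the toy problem (a lasso with $A = I$) are assumptions introduced here.

```python
def proximal_gradient(grad_f, prox_g, x0, step, iterations):
    """Iterate x <- prox_{step*g}(x - step * grad_f(x)) with a constant step.

    grad_f(x) returns the gradient of f at x (a list of floats);
    prox_g(v, step) returns prox_{step*g}(v).
    """
    x = list(x0)
    for _ in range(iterations):
        g = grad_f(x)
        forward = [xi - step * gi for xi, gi in zip(x, g)]  # gradient step
        x = prox_g(forward, step)                           # proximal step
    return x


def soft_threshold_vec(v, tau):
    """Entrywise soft-thresholding, i.e. the prox of tau * ||.||_1."""
    return [t - tau if t > tau else t + tau if t < -tau else 0.0 for t in v]


# Toy problem (assumed for illustration): f(x) = 0.5*||x - b||_2^2 and
# g(x) = gamma*||x||_1, so grad f is 1-Lipschitz and steps below 2 qualify.
b = [3.0, -0.2, 1.0]
gamma = 0.5
grad_f = lambda x: [xi - bi for xi, bi in zip(x, b)]
prox_g = lambda v, step: soft_threshold_vec(v, step * gamma)
x_approx = proximal_gradient(grad_f, prox_g, [0.0, 0.0, 0.0], 0.5, 50)
```

For this toy problem the exact minimizer is the soft-thresholding of `b` at level `gamma`, which gives a simple way to check the iterates.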
4.3. The Relaxed Proximal-Gradient Algorithm
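The relaxed proximal-gradient algorithm generates a sequence $(x_k)$ by the following iteration process.
Initialize $x_0 \in \mathbb{R}^n$ and iterate
$$x_{k+1} = (1 - \alpha_k)x_k + \alpha_k \operatorname{prox}_{\lambda_k g}(x_k - \lambda_k\nabla f(x_k)),$$
where $(\alpha_k)$ is the sequence of relaxation parameters and $(\lambda_k)$ is a sequence of positive real numbers.
Theorem 12 (see ). Let $f, g \in \Gamma_0(\mathbb{R}^n)$ and assume (59) is consistent. Assume in addition the following.
(i) $\nabla f$ is Lipschitz continuous on $\mathbb{R}^n$: $\|\nabla f(x) - \nabla f(y)\| \le L\|x - y\|$ for $x, y \in \mathbb{R}^n$.
(ii) $0 < \liminf_{k \to \infty} \lambda_k \le \limsup_{k \to \infty} \lambda_k < 2/L$.
(iii) $0 < \liminf_{k \to \infty} \alpha_k \le \limsup_{k \to \infty} \alpha_k < 4/(2 + L\limsup_{k \to \infty} \lambda_k)$.
Then the sequence $(x_k)$ generated by the relaxed proximal-gradient algorithm above converges to a solution of (59).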
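If we take $\lambda_k \equiv \lambda \in (0, 2/L)$, then the relaxation parameters $\alpha_k$ can be chosen from a larger pool; they are allowed to be close to zero. More precisely, we have the following theorem.
Theorem 13 (see ). Let $f, g \in \Gamma_0(\mathbb{R}^n)$ and assume (59) is consistent. Define the sequence $(x_k)$ by the following relaxed proximal algorithm:
$$x_{k+1} = (1 - \alpha_k)x_k + \alpha_k \operatorname{prox}_{\lambda g}(x_k - \lambda\nabla f(x_k)).$$
Suppose that
(a) $\nabla f$ satisfies the Lipschitz continuity condition (i) in Theorem 12;
(b) $0 < \lambda < 2/L$ and $0 \le \alpha_k \le 4/(2 + \lambda L)$ for all $k$;
(c) $\sum_{k=0}^\infty \alpha_k\big(4/(2 + \lambda L) - \alpha_k\big) = \infty$.
Then $(x_k)$ converges to a solution of (59).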
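The relaxed iteration with a constant step size can be sketched as below. This is a hedged illustration only: the constant relaxation parameter, the function names, and the toy problem (a lasso with $A = I$, for which $L = 1$) are assumptions made here and are not taken from the paper.

```python
def relaxed_proximal_gradient(grad_f, prox_g, x0, step, relax, iterations):
    """Iterate x <- (1 - relax)*x + relax * prox_{step*g}(x - step*grad_f(x)).

    Both the step size and the relaxation parameter are kept constant here.
    """
    x = list(x0)
    for _ in range(iterations):
        g = grad_f(x)
        forward = [xi - step * gi for xi, gi in zip(x, g)]
        backward = prox_g(forward, step)
        # Averaging between the current point and the proximal-gradient step.
        x = [(1.0 - relax) * xi + relax * pi for xi, pi in zip(x, backward)]
    return x


def soft_threshold_vec(v, tau):
    """Entrywise soft-thresholding, i.e. the prox of tau * ||.||_1."""
    return [t - tau if t > tau else t + tau if t < -tau else 0.0 for t in v]


# Toy problem (assumed): f(x) = 0.5*||x - b||_2^2, g(x) = gamma*||x||_1.
b = [3.0, -0.2, 1.0]
gamma = 0.5
grad_f = lambda x: [xi - bi for xi, bi in zip(x, b)]
prox_g = lambda v, step: soft_threshold_vec(v, step * gamma)
x_relaxed = relaxed_proximal_gradient(grad_f, prox_g, [0.0] * 3, 1.0, 0.5, 60)
```

With `relax = 1.0` the function reduces to the unrelaxed proximal-gradient iteration; smaller values damp each update, which slows convergence on this toy problem but does not change the limit.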
4.4. Proximal-Gradient Algorithms Applied to Lasso
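For $Q$-lasso (7), we take $f(x) = \frac{1}{2}\|(I - P_Q)Ax\|_2^2$ and $g(x) = \gamma\|x\|_1$. Noticing that $\nabla f(x) = A^{\top}(I - P_Q)Ax$, which is Lipschitz continuous with constant $L = \|A\|_2^2$ since $I - P_Q$ is nonexpansive, we find that proximal-gradient algorithm (61) is reduced to the following algorithm for solving $Q$-lasso (7):
$$x_{k+1} = \operatorname{prox}_{\lambda_k\gamma\|\cdot\|_1}\big(x_k - \lambda_k A^{\top}(I - P_Q)Ax_k\big). \tag{66}$$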
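A minimal sketch of this specialization is given below, assuming for illustration that $Q$ is a box (so that $P_Q$ is a coordinatewise clamp) and that the step size is constant. The function names and the tiny test problem are assumptions, not part of the paper.

```python
def matvec(A, x):
    """Product A x for a matrix stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def rmatvec(A, y):
    """Product A^T y for a matrix stored as a list of rows."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]


def project_box(y, lo, hi):
    """Projection onto the box with per-coordinate bounds lo[i] <= y[i] <= hi[i]."""
    return [min(max(yi, l), h) for yi, l, h in zip(y, lo, hi)]


def soft_threshold_vec(v, tau):
    """Entrywise soft-thresholding, i.e. the prox of tau * ||.||_1."""
    return [t - tau if t > tau else t + tau if t < -tau else 0.0 for t in v]


def q_lasso_prox_grad(A, project_Q, gamma, x0, step, iterations):
    """Proximal-gradient iteration for 0.5*||(I - P_Q)Ax||^2 + gamma*||x||_1."""
    x = list(x0)
    for _ in range(iterations):
        Ax = matvec(A, x)
        residual = [u - p for u, p in zip(Ax, project_Q(Ax))]  # (I - P_Q)Ax
        grad = rmatvec(A, residual)                            # A^T (I - P_Q)Ax
        forward = [xi - step * gi for xi, gi in zip(x, grad)]
        x = soft_threshold_vec(forward, step * gamma)
    return x
```

As a usage example, with `A` the 2-by-2 identity, `Q = [1, 2] x [-0.5, 0.5]`, `gamma = 0.25`, and step 1 (admissible since $\|A\|_2^2 = 1$), the iterates started at `[3.0, 3.0]` settle at `[0.75, 0.0]`: the first coordinate is pulled below the lower edge of $Q$ by exactly `gamma`, and the second is set to zero because $0$ already lies in the corresponding interval.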
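Remark 16. Proximal-gradient algorithm (61) can be reduced to a projection-gradient algorithm in the case where the convex function $g$ is homogeneous (i.e., $g(tx) = tg(x)$ for $t \ge 0$ and $x \in \mathbb{R}^n$), because the homogeneity of $g$ implies that the proximal operator of $g$ is actually a projection; more precisely, we have
$$\operatorname{prox}_{\lambda g} = I - P_{\lambda K}, \quad \lambda > 0,$$
where $K = \partial g(0)$. As a result, proximal-gradient algorithm (61) is reduced to the following projection-gradient algorithm:
$$x_{k+1} = (I - P_{\lambda_k K})\big(x_k - \lambda_k\nabla f(x_k)\big). \tag{68}$$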
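Now we apply projection-gradient algorithm (68) to $Q$-lasso (7). In this case, we have $f(x) = \frac{1}{2}\|(I - P_Q)Ax\|_2^2$ and $g(x) = \gamma\|x\|_1$ (homogeneous). Thus, $\nabla f(x) = A^{\top}(I - P_Q)Ax$ and the convex set $K$ is given as $K = \partial g(0) = \gamma[-1, 1]^n$. We find that, for each positive number $\lambda$, $P_{\lambda K}$ is the projection of the Euclidean space $\mathbb{R}^n$ to the $\ell_\infty$ ball with radius $\lambda\gamma$; that is, $\lambda K = \{x \in \mathbb{R}^n : \|x\|_\infty \le \lambda\gamma\}$. It turns out that proximal-gradient algorithm (66) is rewritten as a projection algorithm below:
$$x_{k+1} = (I - P_{\lambda_k\gamma[-1,1]^n})\big(x_k - \lambda_k A^{\top}(I - P_Q)Ax_k\big).$$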
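The identification of the proximal operator of a multiple of the $\ell_1$ norm with "identity minus projection onto an $\ell_\infty$ ball" can be checked directly. The short sketch below compares the two formulas on a sample vector; the function names are hypothetical and used only for this illustration.

```python
def project_linf_ball(v, radius):
    """Projection onto {x : ||x||_inf <= radius}: clamp each entry."""
    return [min(max(t, -radius), radius) for t in v]


def prox_l1_via_projection(v, tau):
    """Compute (I - P)(v), with P the projection onto the l_inf ball of radius tau."""
    return [t - p for t, p in zip(v, project_linf_ball(v, tau))]


def soft_threshold_vec(v, tau):
    """Entrywise soft-thresholding, i.e. the prox of tau * ||.||_1."""
    return [t - tau if t > tau else t + tau if t < -tau else 0.0 for t in v]


sample = [3.0, -0.5, -2.0, 0.25]
via_projection = prox_l1_via_projection(sample, 1.0)
via_threshold = soft_threshold_vec(sample, 1.0)
```

Both computations give the same vector on this sample, which is the componentwise content of the projection form of the algorithm with threshold $\tau = \lambda\gamma$.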
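5. An $\ell_1$-$\ell_2$ Regularization for the $Q$-Lasso
$Q$-lasso (7) may be ill-posed and therefore needs to be regularized. Inspired by the elastic net [15], which regularizes lasso (1), we introduce an $\ell_1$-$\ell_2$ regularization for the $Q$-lasso as the minimization
$$\min_{x \in \mathbb{R}^n} \frac{1}{2}\|(I - P_Q)Ax\|_2^2 + \gamma\|x\|_1 + \frac{\delta}{2}\|x\|_2^2, \tag{70}$$
where $\gamma > 0$ and $\delta > 0$ are regularization parameters. This is indeed the traditional Tikhonov regularization applied to $Q$-lasso (7).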
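Let $x_{\gamma,\delta}$ be the unique solution of (70) and set $x_\gamma^\dagger := \lim_{\delta \to 0} x_{\gamma,\delta}$ and $x_\delta := \lim_{\gamma \to 0} x_{\gamma,\delta}$, which are the limits of $x_{\gamma,\delta}$ as $\delta \to 0$ and $\gamma \to 0$, respectively. Let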
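Proposition 17. Assume the $Q$-least-squares problem
$$\min_{x \in \mathbb{R}^n} \frac{1}{2}\|(I - P_Q)Ax\|_2^2 \tag{73}$$
is consistent (i.e., solvable) and let $S$ be its nonempty set of solutions. Let $x_{\gamma,\delta}$ denote the unique solution of (70).
(i) As $\delta \to 0$ (for each fixed $\gamma > 0$), $x_{\gamma,\delta} \to x_\gamma^\dagger$, which is the ($\ell_2$) minimum-norm solution to $Q$-lasso (7). Moreover, as $\gamma \to 0$, every cluster point of $x_\gamma^\dagger$ is an ($\ell_1$) minimum-norm solution of $Q$-least squares problem (73), that is, a point in the set $\arg\min_{x \in S} \|x\|_1$.
(ii) As $\gamma \to 0$ (for each fixed $\delta > 0$), $x_{\gamma,\delta} \to x_\delta$, which is the unique solution to the $\ell_2$ regularized problem:
$$\min_{x \in \mathbb{R}^n} \frac{1}{2}\|(I - P_Q)Ax\|_2^2 + \frac{\delta}{2}\|x\|_2^2.$$
Moreover, as $\delta \to 0$, $x_\delta \to x^\dagger$, which is the $\ell_2$ minimal norm solution of (73); that is, $x^\dagger = \arg\min_{x \in S} \|x\|_2$.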
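Proof. We have that $x_{\gamma,\delta}$ satisfies the optimality condition:
$$0 \in \partial\varphi_{\gamma,\delta}(x_{\gamma,\delta}),$$
where the subdifferential of $\varphi_{\gamma,\delta}(x) := \frac{1}{2}\|(I - P_Q)Ax\|_2^2 + \gamma\|x\|_1 + \frac{\delta}{2}\|x\|_2^2$ is given by
$$\partial\varphi_{\gamma,\delta}(x) = A^{\top}(I - P_Q)Ax + \delta x + \gamma\,\partial\|x\|_1.$$
It turns out that the above optimality condition is reduced to
$$-\frac{1}{\gamma}\big(A^{\top}(I - P_Q)Ax_{\gamma,\delta} + \delta x_{\gamma,\delta}\big) \in \partial\|x_{\gamma,\delta}\|_1.$$
Using the subdifferential inequality, we obtain
$$\gamma\|x\|_1 \ge \gamma\|x_{\gamma,\delta}\|_1 - \langle A^{\top}(I - P_Q)Ax_{\gamma,\delta} + \delta x_{\gamma,\delta}, x - x_{\gamma,\delta} \rangle$$
for $x \in \mathbb{R}^n$. Replacing $x$ with $x_{\gamma',\delta'}$ for $\gamma' > 0$ and $\delta' > 0$ yields
$$\gamma\|x_{\gamma',\delta'}\|_1 \ge \gamma\|x_{\gamma,\delta}\|_1 - \langle A^{\top}(I - P_Q)Ax_{\gamma,\delta} + \delta x_{\gamma,\delta}, x_{\gamma',\delta'} - x_{\gamma,\delta} \rangle. \tag{79}$$
Interchange $\gamma$ and $\gamma'$ and $\delta$ and $\delta'$ to get
$$\gamma'\|x_{\gamma,\delta}\|_1 \ge \gamma'\|x_{\gamma',\delta'}\|_1 - \langle A^{\top}(I - P_Q)Ax_{\gamma',\delta'} + \delta' x_{\gamma',\delta'}, x_{\gamma,\delta} - x_{\gamma',\delta'} \rangle. \tag{80}$$
Adding up (79) and (80) results in
$$(\gamma - \gamma')(\|x_{\gamma',\delta'}\|_1 - \|x_{\gamma,\delta}\|_1) \ge \langle (I - P_Q)Ax_{\gamma,\delta} - (I - P_Q)Ax_{\gamma',\delta'}, Ax_{\gamma,\delta} - Ax_{\gamma',\delta'} \rangle + \langle \delta x_{\gamma,\delta} - \delta' x_{\gamma',\delta'}, x_{\gamma,\delta} - x_{\gamma',\delta'} \rangle.$$
Since $\ell_1$-$\ell_2$ regularization (70) is the Tikhonov regularization of $Q$-lasso (7), we get
$$\|x_{\gamma,\delta}\|_2 \le \|x_\gamma^\dagger\|_2 \le c\,\|x_\gamma^\dagger\|_1 \le c\min_{x \in S}\|x\|_1,$$
where $x_\gamma^\dagger$ denotes the $\ell_2$ minimum-norm element of $S_\gamma$ and the last inequality is Lemma 4. Here $c > 0$ is a constant. It follows that $(x_{\gamma,\delta})$ is bounded.
(i) For fixed $\gamma > 0$, we can use the theory of Tikhonov regularization to conclude that $x_{\gamma,\delta}$ is continuous in $\delta > 0$ and converges, as $\delta \to 0$, to $x_\gamma^\dagger$, which is the ($\ell_2$) minimum-norm solution to $Q$-lasso (7), that is, the unique element $x_\gamma^\dagger = \arg\min_{x \in S_\gamma}\|x\|_2$. By Proposition 6, we also find that every cluster point of $x_\gamma^\dagger$, as $\gamma \to 0$, lies in the set $\arg\min_{x \in S}\|x\|_1$.
(ii) Fix $\delta > 0$ and use Proposition 6 to see that $x_{\gamma,\delta} \to x_\delta$ as $\gamma \to 0$. Now the standard property of Tikhonov's regularization ensures that $x_\delta \to x^\dagger$ as $\delta \to 0$.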
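We can also take $f(x) = \frac{1}{2}\|(I - P_Q)Ax\|_2^2 + \frac{\delta}{2}\|x\|_2^2$ and $g(x) = \gamma\|x\|_1$. Then $\nabla f(x) = A^{\top}(I - P_Q)Ax + \delta x$ is Lipschitz continuous with constant $L = \|A\|_2^2 + \delta$, and the proximal algorithm (61) is reduced to
$$x_{k+1} = \operatorname{prox}_{\nu_k\|\cdot\|_1}\big((1 - \delta\lambda_k)x_k - \lambda_k A^{\top}(I - P_Q)Ax_k\big).$$
Here $\nu_k = \lambda_k\gamma$. Convergence of this algorithm is given below.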
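One step of the proximal algorithm for the $\ell_1$-$\ell_2$ regularized problem (70), with the quadratic term absorbed into the smooth part as described above, can be sketched as follows. The function names, the constant step size, and the one-dimensional test case with $Q = \{b\}$ are assumptions made for this illustration and are not taken from the paper.

```python
def matvec(A, x):
    """Product A x for a matrix stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def rmatvec(A, y):
    """Product A^T y for a matrix stored as a list of rows."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]


def soft_threshold_vec(v, tau):
    """Entrywise soft-thresholding, i.e. the prox of tau * ||.||_1."""
    return [t - tau if t > tau else t + tau if t < -tau else 0.0 for t in v]


def l1_l2_step(x, A, project_Q, gamma, delta, step):
    """One update x <- prox_{step*gamma*||.||_1}((1 - delta*step)*x - step*A^T(I - P_Q)Ax)."""
    Ax = matvec(A, x)
    residual = [u - p for u, p in zip(Ax, project_Q(Ax))]
    grad = rmatvec(A, residual)
    forward = [(1.0 - delta * step) * xi - step * gi for xi, gi in zip(x, grad)]
    return soft_threshold_vec(forward, step * gamma)


def l1_l2_solve(x0, A, project_Q, gamma, delta, step, iterations):
    """Repeat l1_l2_step a fixed number of times with a constant step size."""
    x = list(x0)
    for _ in range(iterations):
        x = l1_l2_step(x, A, project_Q, gamma, delta, step)
    return x
```

In the scalar case $A = 1$, $Q = \{b\}$ with $b > \gamma$, the minimizer of (70) is $(b - \gamma)/(1 + \delta)$, which gives a simple sanity check: for `b = 2`, `gamma = 0.5`, `delta = 0.5`, the iterates approach `1.0`.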
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
The authors are grateful to the anonymous referees for their helpful comments and suggestions, which improved the presentation of this paper. This work was funded by the Deanship of Scientific Research (DSR), King Abdulaziz University, under Grant no. 2-363-1433-HiCi. The authors, therefore, acknowledge the technical and financial support of KAU.
- R. Tibshirani, “Regression shrinkage and selection via the lasso,” Journal of the Royal Statistical Society B, vol. 58, no. 1, pp. 267–288, 1996.
- S. S. Chen, D. L. Donoho, and M. A. Saunders, “Atomic decomposition by basis pursuit,” SIAM Journal on Scientific Computing, vol. 20, no. 1, pp. 33–61, 1998.
- D. L. Donoho, “Compressed sensing,” IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
- E. J. Candès, J. Romberg, and T. Tao, “Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information,” IEEE Transactions on Information Theory, vol. 52, no. 2, pp. 489–509, 2006.
- E. J. Candès, J. K. Romberg, and T. Tao, “Stable signal recovery from incomplete and inaccurate measurements,” Communications on Pure and Applied Mathematics, vol. 59, no. 8, pp. 1207–1223, 2006.
- E. Candès and T. Tao, “The Dantzig selector: statistical estimation when p is much larger than n,” Annals of Statistics, vol. 35, no. 6, pp. 2313–2351, 2007.
- T. T. Cai, G. Xu, and J. Zhang, “On recovery of sparse signals via $\ell_1$ minimization,” IEEE Transactions on Information Theory, vol. 55, no. 7, pp. 3388–3397, 2009.
- Y. Censor and T. Elfving, “A multiprojection algorithm using Bregman projections in a product space,” Numerical Algorithms, vol. 8, no. 2–4, pp. 221–239, 1994.
- H.-K. Xu, “Iterative methods for the split feasibility problem in infinite-dimensional Hilbert spaces,” Inverse Problems, vol. 26, no. 10, Article ID 105018, 17 pages, 2010.
- J.-J. Moreau, “Propriétés des applications ‘prox’,” Comptes Rendus de l'Académie des Sciences, vol. 256, pp. 1069–1071, 1963.
- J.-J. Moreau, “Proximité et dualité dans un espace hilbertien,” Bulletin de la Société Mathématique de France, vol. 93, pp. 273–299, 1965.
- P. L. Combettes and V. R. Wajs, “Signal recovery by proximal forward-backward splitting,” Multiscale Modeling & Simulation, vol. 4, no. 4, pp. 1168–1200, 2005.
- C. A. Micchelli, L. Shen, and Y. Xu, “Proximity algorithms for image models: denoising,” Inverse Problems, vol. 27, no. 4, Article ID 045009, 30 pages, 2011.
- H.-K. Xu, “Properties and iterative methods for the Lasso and its variants,” Chinese Annals of Mathematics B, vol. 35, no. 3, 2014.
- H. Zou and T. Hastie, “Regularization and variable selection via the elastic net,” Journal of the Royal Statistical Society B, vol. 67, no. 2, pp. 301–320, 2005.
Copyright © 2013 Maryam A. Alghamdi et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.