Abstract
We present a normalization of the $\ell_p$ norm. A compressive sensing criterion is proposed using the normalized zero norm. Based on the method of Lagrange multipliers, we derive the solution of the proposed optimization framework. It turns out that the new solution is a limit case of the least fractional norm solution for $p \to 0$, where its fixed-point iteration algorithm can readily follow an existing algorithm. The derivation of the minimal normalized zero norm solution herein relates, from the viewpoint of the Lagrange multiplier method, to existing works that invoke the least fractional norm and least pseudo zero norm criteria.
1. Introduction
Various applications in science and engineering need to recover a desired signal $\mathbf{x} \in \mathbb{R}^{N}$ from a set of observed or measured data $\mathbf{y} \in \mathbb{R}^{M}$ based on a modeling or measurement matrix $\mathbf{A} \in \mathbb{R}^{M \times N}$, which either depends on the model or can be chosen beforehand. The linear system in Figure 1 can be represented by
$$\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{n},$$
where $\mathbf{n}$ is the amount of perturbation hidden in the output $\mathbf{y}$.
The signal can be recovered by solving an optimization problem related to linear least squares (LLS), i.e., minimizing $\|\mathbf{y} - \mathbf{A}\mathbf{x}\|_{2}$ while keeping the residual within $\varepsilon$, where $\varepsilon$ is the square root of the maximal allowable noise power [1]. Usually, the matrix $\mathbf{A}$ is of full rank, i.e., $\operatorname{rank}(\mathbf{A}) = \min(M, N)$, where $\operatorname{rank}(\cdot)$ is the rank of a matrix. Depending on whether the number of provided data $M$ is greater than, equal to, or less than the number of unknown variables $N$, i.e., on the size of the matrix $\mathbf{A}$, the LLS problem can be classified into the overdetermined ($M > N$), determined ($M = N$), and underdetermined ($M < N$) cases.
The solution of (4) is well known as the LLS estimate
$$\hat{\mathbf{x}} = \left(\mathbf{A}^{T}\mathbf{A}\right)^{-1}\mathbf{A}^{T}\mathbf{y},$$
where $(\cdot)^{T}$ is the transpose of a vector or matrix, and $(\cdot)^{-1}$ is the inverse of a square matrix. The LLS estimate exists provided that $\mathbf{A}^{T}\mathbf{A}$ (or $\mathbf{A}\mathbf{A}^{T}$, or $\mathbf{A}$ itself in the square case) is invertible, i.e., the measurement matrix needs to be of full rank. The LLS estimate generally assigns nonzero values to all entries of the desired signal $\mathbf{x}$.
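As a quick illustration (this sketch is ours, not the paper's code), the LLS estimate $(\mathbf{A}^{T}\mathbf{A})^{-1}\mathbf{A}^{T}\mathbf{y}$ can be computed for a full-rank overdetermined system; all names here are illustrative:

```python
import numpy as np

# Illustrative sketch: the LLS estimate x_hat = (A^T A)^{-1} A^T y
# for a full-rank A with M >= N.
def lls_estimate(A, y):
    # Solve the normal equations; np.linalg.solve is numerically
    # preferable to forming the explicit inverse of A^T A.
    return np.linalg.solve(A.T @ A, A.T @ y)

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3))        # M = 8 observations, N = 3 unknowns
x_true = np.array([1.0, -2.0, 0.5])
y = A @ x_true                         # noiseless measurements
x_hat = lls_estimate(A, y)
print(np.allclose(x_hat, x_true))      # True in the noiseless case
```

Note that, as the text says, every entry of `x_hat` is generically nonzero, which motivates the sparsity-aware criteria below.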
Many works indicate that the desired signal is often subject to sparsity, i.e., the situation in which a number of elements in $\mathbf{x}$ are zeros. Even though the signal sometimes does not strictly entail sparsity, it is efficient to keep an approximate value of the signal that contains only a sufficient number of its largest components; such a signal is called compressible. The sparsity nature of the signal is usually hidden and can be exposed by discovering a sparse basis and its associated spanning coefficients. The decomposition of the coefficient vector can be seen as a superposition of dictionary elements with a remaining term [2]. Compressed sensing is an emerging field that spans many applications in science and engineering, e.g., imaging and vision [3], photonic mixer device [4], electronic defense [5], security and cryptosystem [6], radar [7, 8], earth observation [9], wireless networks [10, 11], biometric watermarking [12], and healthcare [13].
In compressive sensing, the $\ell_0$ norm is originally adopted to impose zero elements in the solution. The optimization of the $\ell_0$ norm criterion, however, is a combinatorial nondeterministic polynomial-time hard (NP-hard) problem, which appears prohibitive. The performance of the above optimization problem can nevertheless be analyzed, e.g., in [14].
Instead of the $\ell_0$ norm, the $\ell_1$ norm is often of interest because it is more convenient than $\ell_0$ norm optimization in terms of computability while its accuracy is comparable (see, e.g., [15, 16]). A widely considered family of methods designed for sparse recovery of dictionary coefficients is known as matching pursuit [17]. Its variations are presented in terms of basis pursuit denoising [2], orthogonal matching pursuit [18], compressive sampling matching pursuit [19], stagewise orthogonal matching pursuit [20], gradient pursuits [21], etc. Most approaches based on matching pursuit involve the $\ell_0$ norm, except for basis pursuit denoising, which considers the $\ell_1$ norm.
In this work, we point out that the zero norm mostly adopted in the compressed sensing literature is not the actual zero norm, but rather a pseudo zero norm. We also show that the actual zero norm is unbounded and thus trivial. We then present a normalized $\ell_p$ norm and apply its special case for $p \to 0$ as a new objective function. By using the method of Lagrange multipliers, the proposed constrained optimization is solved, and the emerging solution equals the limit case of that given by the least fractional norm for $p \to 0$.
This paper is organized as follows. In Section 2, we point out that the $\ell_0$ norm diverges or is undefined, whereas the so-called zero norm adopted in compressive sensing is actually not a proper norm, but only a pseudonorm. In Section 3, we propose a normalized $\ell_p$ norm. It is later shown that for $p \to 0$ the normalized zero norm is approximately a geometric mean, which unfortunately does not satisfy the triangle inequality of a proper norm. In Section 4, we consider the compressive sensing model. The fractional norm for $0 < p < 1$ and its criterion presented in the past are revisited. The corresponding solution is found by the method of Lagrange multipliers. In Section 5, we propose an alternative criterion based on the normalized zero norm. We derive its solution through the method of Lagrange multipliers. The solution is found in closed form and turns out to be a limiting case of that of the least fractional norm criterion in former works. In Section 6, numerical examples of the solution are provided in conjunction with other works. Concluding remarks are provided in Section 7.
2. Conventional Zero Norm
Let $\mathbf{x} \in \mathbb{C}^{N}$ be a complex-valued vector, expressed as
$$\mathbf{x} = \left[x_1, x_2, \ldots, x_N\right]^{T}.$$
Let $p$ be a positive integer. The $\ell_p$ norm, or simply the $p$-norm, $\|\mathbf{x}\|_p$, is given by
$$\|\mathbf{x}\|_{p} = \left(\sum_{i=1}^{N}|x_i|^{p}\right)^{1/p},$$
where $|x_i|$ is the absolute value of $x_i$. Basic properties of the norm in (7) are as follows: (i) The norm is positive-definite, i.e., $\|\mathbf{x}\|_p \ge 0$. (ii) The norm is zero if and only if the vector is zero, i.e., $\|\mathbf{x}\|_p = 0 \Leftrightarrow \mathbf{x} = \mathbf{0}$, where $\mathbf{0}$ is a zero vector whose elements are all zeros. (iii) The norm satisfies a triangle inequality, i.e., $\|\mathbf{x} + \mathbf{y}\|_p \le \|\mathbf{x}\|_p + \|\mathbf{y}\|_p$ for $\mathbf{x}, \mathbf{y} \in \mathbb{C}^{N}$. (iv) Furthermore, the norm has a scalability or homogeneity property, which can be shown as $\|\alpha\mathbf{x}\|_p = |\alpha|\,\|\mathbf{x}\|_p$ for $\alpha \in \mathbb{C}$ and $\mathbf{x} \in \mathbb{C}^{N}$.
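The definition and the four properties above can be checked numerically; the following sketch (our illustration, with our own function names) does so for $p = 2$:

```python
import numpy as np

# Illustrative check of the l_p norm definition and its properties.
def lp_norm(x, p):
    # ||x||_p = (sum_i |x_i|^p)^(1/p)
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

x = np.array([3.0, -4.0])
y = np.array([1.0, 2.0])

print(lp_norm(x, 2))                                        # 5.0 (Euclidean norm)
assert lp_norm(x, 2) > 0                                    # (i) positive-definite
assert lp_norm(np.zeros(2), 1) == 0                         # (ii) zero only for the zero vector
assert lp_norm(x + y, 2) <= lp_norm(x, 2) + lp_norm(y, 2)   # (iii) triangle inequality
assert np.isclose(lp_norm(3 * x, 2), 3 * lp_norm(x, 2))     # (iv) homogeneity
```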
2.1. Zero Norm
When $p$ tends to zero, the 0-norm of a vector can be determined from
$$\|\mathbf{x}\|_{0} = \lim_{p\to 0}\left(\sum_{i=1}^{N}|x_i|^{p}\right)^{1/p} = \lim_{p\to 0}\left(\sum_{i=1}^{N}\exp\left(p\ln|x_i|\right)\right)^{1/p},$$
where $\exp(\cdot)$ is the exponential function. From the Taylor series expansion, we can express
$$\exp\left(p\ln|x_i|\right) = \sum_{k=0}^{\infty}\frac{\left(p\ln|x_i|\right)^{k}}{k!}.$$
For a small value of $p$, i.e., $p \to 0$, we can approximate
$$|x_i|^{p} = 1 + p\ln|x_i| + O\!\left(p^{2}\right),$$
where $O(\cdot)$ is the big 'oh' notation of the Bachmann–Landau symbols, i.e., $f(p) = O(p^{2})$ means that $|f(p)| \le C p^{2}$ for some constant $C$ as $p \to 0$.
Substituting (14) into (12), we obtain
$$\|\mathbf{x}\|_{0} \approx \lim_{p\to 0}\left(N + p\sum_{i=1}^{N}\ln|x_i|\right)^{1/p}.$$
By using a property of the logarithm of a power, we can show that
$$\|\mathbf{x}\|_{0} \approx \lim_{p\to 0} N^{1/p}\left(\prod_{i=1}^{N}|x_i|\right)^{1/N} = \infty \quad \text{for } N \ge 2.$$
It is obvious that the 0-norm is unbounded and thus trivial.
2.2. Pseudo Zero Norm
Most works, however, instead consider
$$\|\mathbf{x}\|_{0} \triangleq \lim_{p\to 0}\sum_{i=1}^{N}|x_i|^{p}.$$
By assigning $0^{0}$ to 0, we have a convenient relation
$$\|\mathbf{x}\|_{0} = \left|\left\{i : x_i \neq 0\right\}\right|,$$
where $|\mathcal{S}|$ is the cardinality of a set $\mathcal{S}$, or herein the number of nonzero members in the set. The pseudo zero norm counts the number of nonzero elements in $\mathbf{x}$. It is important to note that the pseudo zero norm actually is not the zero norm and is not a proper norm, because it does not preserve homogeneity. However, it is widely used to replace the zero norm due to its simple computability.
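A small numerical illustration (ours, not from the paper) of both the counting behavior and the failure of homogeneity:

```python
import numpy as np

# Illustrative: the pseudo zero norm counts nonzero entries; scaling
# the vector does not scale the count, so homogeneity fails.
def pseudo_zero_norm(x):
    return np.count_nonzero(x)

x = np.array([0.0, 2.0, 0.0, -1.5])
print(pseudo_zero_norm(x))       # 2
print(pseudo_zero_norm(10 * x))  # 2, not 10 * 2: not a proper norm
```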
3. Normalized Zero Norm
Let us introduce a normalization of the $\ell_p$ norm as
$$\|\mathbf{x}\|_{\bar{p}} = \left(\frac{1}{N}\sum_{i=1}^{N}|x_i|^{p}\right)^{1/p}.$$
In statistical analysis ([22], Ch. 3), the above quantity is known as a generalized mean, Hölder mean, mean of degree $p$, or power mean. We can represent the relation between the normalized norm and the conventional norm by $\|\mathbf{x}\|_{\bar{p}} = N^{-1/p}\,\|\mathbf{x}\|_{p}$.
When $p$ tends to zero, we can derive
$$\|\mathbf{x}\|_{\bar{0}} = \lim_{p\to 0}\left(\frac{1}{N}\sum_{i=1}^{N}|x_i|^{p}\right)^{1/p}.$$
Using the Taylor series expansion of the exponential function, we have
$$\|\mathbf{x}\|_{\bar{0}} = \lim_{p\to 0}\left(\frac{1}{N}\sum_{i=1}^{N}\sum_{k=0}^{\infty}\frac{\left(p\ln|x_i|\right)^{k}}{k!}\right)^{1/p}.$$
It is hard to deal with the result in (23). We would rather consider an approximation of (22) associated with (14), which is given by
$$\|\mathbf{x}\|_{\bar{0}} \approx \lim_{p\to 0}\left(1 + \frac{p}{N}\sum_{i=1}^{N}\ln|x_i|\right)^{1/p}.$$
Under the same manipulation, i.e., using $\lim_{p\to 0}\left(1 + ap\right)^{1/p} = e^{a}$, we can show that
$$\|\mathbf{x}\|_{\bar{0}} \approx \exp\!\left(\frac{1}{N}\sum_{i=1}^{N}\ln|x_i|\right) = \left(\prod_{i=1}^{N}|x_i|\right)^{1/N}.$$
We can see that, due to the first-order Taylor series approximation, $\|\mathbf{x}\|_{\bar{0}}$ can be approximated by the geometric mean of $|x_1|, \ldots, |x_N|$. The geometric mean has the following properties: (i) It is nonnegative for any $\mathbf{x}$, i.e., $\left(\prod_{i=1}^{N}|x_i|\right)^{1/N} \ge 0$. (ii) It can be zero when only one of all entries in $\mathbf{x}$ is zero, i.e., $x_i = 0$ for some $i$ makes the geometric mean zero even though $\mathbf{x} \neq \mathbf{0}$. (iii) It does not hold the triangle inequality; e.g., for $\mathbf{x} = [1, 0]^{T}$ and $\mathbf{y} = [0, 1]^{T}$, the geometric means of $\mathbf{x}$ and $\mathbf{y}$ are both zero while that of $\mathbf{x} + \mathbf{y}$ is one, which means the geometric mean of a sum can exceed the sum of the geometric means. (iv) It is homogeneous, i.e., $\left(\prod_{i=1}^{N}|\alpha x_i|\right)^{1/N} = |\alpha|\left(\prod_{i=1}^{N}|x_i|\right)^{1/N}$. (v) It is concave on nonnegative arguments (see, e.g., [23]) and a monotonically increasing function of each $|x_i|$.
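The limiting behavior above can be verified numerically; in this sketch (our illustration), the power mean of degree $p$ visibly approaches the geometric mean as $p$ shrinks:

```python
import numpy as np

# Illustrative numerical check: the normalized (power) mean of degree p
# approaches the geometric mean as p -> 0.
def power_mean(x, p):
    # ((1/N) * sum_i |x_i|^p)^(1/p)
    return np.mean(np.abs(x) ** p) ** (1.0 / p)

def geometric_mean(x):
    # exp of the mean log-magnitude; requires all |x_i| > 0
    return np.exp(np.mean(np.log(np.abs(x))))

x = np.array([1.0, 2.0, 4.0])
for p in (1.0, 0.1, 0.001):
    print(power_mean(x, p))      # tends to 2.0 as p shrinks
print(geometric_mean(x))         # 2.0 = (1 * 2 * 4)^(1/3)
```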
4. Compressive Sensing
Let us consider an underdetermined system where there are more unknown signal components than equations, i.e., $M < N$. In this case, there are infinitely many solutions for $\mathbf{x}$. Recent works indicate that the desired signal is often subject to sparsity, i.e., the situation in which a number of elements in $\mathbf{x}$ are zeros. Even though the signal sometimes does not strictly entail sparsity, it is efficient to keep an approximate version that contains only a sufficient number of its largest components; such a signal is called compressible. The sparsity nature of the signal is hidden and can be imposed by discovering a sparse basis and its associated spanning coefficients. The decomposition of the coefficient vector can be seen as a superposition of dictionary elements with a remaining term [2]. Let $K$ be the number of nonzero elements in $\mathbf{x}$. If $\mathbf{x}^{\star}$ is the true value of $\mathbf{x}$, there is a relation $K = |\mathcal{S}|$, where $|\cdot|$ is the cardinality of a set, and $\mathcal{S} = \{i : x_i^{\star} \neq 0\}$ is the support set, or the sparsity pattern, of $\mathbf{x}^{\star}$. The number $K$ is often known as the sparsity degree. Compressive sensing can be seen as a problem of finding a sparse signal [24]. Unfortunately, the solution in (5) does not preserve the inherent sparsity of the signal. Different kinds of vector norms can be used to explore the signal sparsity. The signal recovery can be formulated as an optimization problem, i.e.,
$$\min_{\mathbf{x}} \left|\left\{i : x_i \neq 0\right\}\right| \quad \text{subject to} \quad \mathbf{y} = \mathbf{A}\mathbf{x}.$$
Note that we do not express the norm in (31) as $\|\mathbf{x}\|_{0}$, as most works do, because the zero norm in most works is equal to the pseudo zero norm of this paper. In general, the objective function in terms of the modified zero norm is nonconvex. If $\mathbf{A}$ is an identity matrix, an exact solution of (31) is a hard shrinkage of $\mathbf{y}$. For an arbitrary matrix $\mathbf{A}$, one may resort to combinatorial optimization. Even approximating the true minimum of the problem in (31) is nondeterministic polynomial-time hard, or NP-hard, which appears prohibitive. The performance of the above optimization problem can, however, be analyzed, e.g., in [14].
4.1. Compressive Sensing by Fractional Norm
An alternative way is the consideration of the $\ell_p$ norm with a fractional exponent $0 < p < 1$ [24–26]. When $p$ lies in the range $0 < p < 1$: (i) the power function $t^{p}$ accepts only a nonnegative real-valued argument $t \ge 0$; (ii) it is a concave function; and (iii) the fractional norm does not hold the triangle property, e.g., for $\mathbf{x} = [1, 0]^{T}$ and $\mathbf{y} = [0, 1]^{T}$ with $p = 1/2$, $\|\mathbf{x} + \mathbf{y}\|_{1/2} = 4 > 2 = \|\mathbf{x}\|_{1/2} + \|\mathbf{y}\|_{1/2}$.
The last one implies that the fractional norm is not a proper norm, but only a quasinorm. For $0 < p < 1$, the compressive sensing problem
$$\min_{\mathbf{x}} \|\mathbf{x}\|_{p}^{p} \quad \text{subject to} \quad \mathbf{y} = \mathbf{A}\mathbf{x}$$
is nonconvex, nonsmooth, and non-Lipschitz. The fractional norm gives a closer approximation to the pseudo zero norm than the 1-norm, since the smaller the norm index $p$, the sparser the solution. It is shown in [27] that although only a local minimum may be found, exact reconstruction is possible with a much sparser solution than that required by the 1-norm reconstruction. The case of $p \to 0$ provides the sparsest solution, while compressive sensing solutions for sufficiently small values of $p$ show no significant difference from one another [28, 29].
4.2. Method of Lagrange Multipliers
Let $\lambda_1, \ldots, \lambda_M$ be the Lagrange constants. The constrained optimization problem in (35) can be solved by the Lagrange function [25], i.e.,
$$L(\mathbf{x}, \boldsymbol{\lambda}) = \|\mathbf{x}\|_{p}^{p} + \boldsymbol{\lambda}^{T}\left(\mathbf{y} - \mathbf{A}\mathbf{x}\right),$$
where $\boldsymbol{\lambda}$ is given by $\boldsymbol{\lambda} = [\lambda_1, \lambda_2, \ldots, \lambda_M]^{T}$.
The solution of (36) is given by [30]
$$\hat{\mathbf{x}} = \boldsymbol{\Pi}(\hat{\mathbf{x}})\,\mathbf{A}^{T}\left(\mathbf{A}\,\boldsymbol{\Pi}(\hat{\mathbf{x}})\,\mathbf{A}^{T}\right)^{-1}\mathbf{y},$$
where $\operatorname{diag}(\mathbf{v})$ is the diagonal matrix whose diagonal is taken from the vector $\mathbf{v}$, and $\boldsymbol{\Pi}(\hat{\mathbf{x}})$ is given by
$$\boldsymbol{\Pi}(\hat{\mathbf{x}}) = \operatorname{diag}\!\left(|\hat{x}_1|^{2-p}, |\hat{x}_2|^{2-p}, \ldots, |\hat{x}_N|^{2-p}\right).$$
We can see that when $p$ is equal to 2, $\boldsymbol{\Pi}(\hat{\mathbf{x}})$ reduces to the identity matrix, and the least fractional norm criterion provides the minimum $\ell_2$ norm solution consistent with the LLS criterion in (5), i.e.,
$$\hat{\mathbf{x}} = \mathbf{A}^{T}\left(\mathbf{A}\mathbf{A}^{T}\right)^{-1}\mathbf{y}.$$
Unlike the solution by the LLS criterion in (5), the solution by the fractional norm in (38) depends on the unknown variable $\hat{\mathbf{x}}$ itself, which therefore needs to be computed iteratively. The iterative computation procedure, known as the FOCal Underdetermined System Solver (FOCUSS), is summarized in [30].
Algorithm 1: The FOCUSS fixed-point iteration [30].
Algorithm 1 can suffer from many local minima. An alternative tries to avoid the NP-hard problem by sequentially minimizing a smooth approximating function.
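The fixed-point iteration of (38) can be sketched as follows. This is our minimal illustration of the FOCUSS update described above; the initialization, iteration count, and the small guard term `eps` are our assumptions, not prescriptions from [30]:

```python
import numpy as np

# Minimal sketch of the FOCUSS fixed point: repeatedly apply
# x <- Pi(x) A^T (A Pi(x) A^T)^{-1} y, with Pi(x) = diag(|x_i|^{2-p}).
def focuss(A, y, p=0.5, n_iter=30, eps=1e-8):
    x = A.T @ np.linalg.solve(A @ A.T, y)   # minimum l2-norm starting point
    for _ in range(n_iter):
        w = np.abs(x) ** (2.0 - p) + eps    # eps keeps A Pi A^T invertible
        B = A * w                           # equals A @ diag(w)
        x = w * (A.T @ np.linalg.solve(B @ A.T, y))
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((10, 30))           # underdetermined: M = 10 < N = 30
x_true = np.zeros(30)
x_true[[3, 12, 25]] = [1.0, -2.0, 0.5]      # 3-sparse signal
y = A @ x_true
x_hat = focuss(A, y)
print(np.linalg.norm(A @ x_hat - y))        # small: each iterate satisfies y = A x
```

Note how small entries get small weights $|x_i|^{2-p}$ and are driven further toward zero, which is the mechanism behind both the sparsification and the local-minima issue mentioned above.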
Sometimes, the least fractional exponent norm criterion is called iteratively reweighted least squares (IRLS) in compressive sensing [31–33]. When the vector $\hat{\mathbf{x}}$ in Algorithm 1 converges to the true value of $\mathbf{x}$, a number of elements in $\hat{\mathbf{x}}$ may be close to zero, which can cause an ill condition of the matrix $\mathbf{A}\,\boldsymbol{\Pi}(\hat{\mathbf{x}})\,\mathbf{A}^{T}$. It is suggested in [31] that the diagonal weights be regularized, e.g., as
$$\boldsymbol{\Pi}(\hat{\mathbf{x}}) = \operatorname{diag}\!\left(\left(\hat{x}_1^{2} + \epsilon\right)^{1-p/2}, \ldots, \left(\hat{x}_N^{2} + \epsilon\right)^{1-p/2}\right),$$
where $\epsilon$ is a regularization quantity that is large at the beginning of the iteration and becomes gradually smaller as the iteration converges. In [32], the regularization quantity $\epsilon$ in (41) is replaced by its square $\epsilon^{2}$. Let $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_N \ge 0$ be the nonincreasing arrangement of all elements of $|\hat{\mathbf{x}}|$.
Let $\sigma_k$ be the $k$th element of this nonincreasing arrangement, i.e., the $k$th largest magnitude in $\hat{\mathbf{x}}$.
Algorithm 2: The IRLS fixed-point iteration with a decreasing regularization quantity.
Algorithm 2 is different from [32] in two aspects. First, the regularization parameter in [32] is computed from the estimate updated at the current iteration, which is not yet available. Second, the procedure addressed in [32] considers only a particular value of $p$.
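The regularized iteration can be sketched as follows. This is our illustration of the IRLS-style variant discussed above: the same fixed point as FOCUSS, but with weights smoothed by a quantity $\epsilon$ that starts large and decreases across iterations; the concrete schedule `eps / 10` and the floor value are our assumptions, not the rules of [31, 32]:

```python
import numpy as np

# IRLS-style sketch: weights (x_i^2 + eps)^{1 - p/2} keep A Pi A^T
# well conditioned as entries of x approach zero.
def irls(A, y, p=0.5, n_iter=30):
    x = A.T @ np.linalg.solve(A @ A.T, y)       # minimum l2-norm starting point
    eps = 1.0
    for _ in range(n_iter):
        w = (x ** 2 + eps) ** (1.0 - p / 2.0)   # regularized |x_i|^{2-p}
        B = A * w                                # equals A @ diag(w)
        x = w * (A.T @ np.linalg.solve(B @ A.T, y))
        eps = max(eps / 10.0, 1e-8)             # gradually decrease eps
    return x

rng = np.random.default_rng(2)
A = rng.standard_normal((10, 30))
x_true = np.zeros(30)
x_true[[0, 7, 19]] = [2.0, -1.0, 0.75]
y = A @ x_true
x_hat = irls(A, y)
print(np.linalg.norm(A @ x_hat - y))            # small: iterates stay feasible
```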
5. Compressive Sensing by Normalized Zero Norm
We propose a new criterion by using the normalized zero norm. Under a similar idea to (32), the usual compressive sensing problem can alternatively be formulated as
$$\min_{\mathbf{x}} \left(\prod_{i=1}^{N}|x_i|\right)^{1/N} \quad \text{subject to} \quad \mathbf{y} = \mathbf{A}\mathbf{x}.$$
Fortunately, when approximated as the geometric mean in (25), the normalized zero norm renders the optimization problem nonconvex with a concave objective function under an affine/convex constraint.
The Lagrange function can be expressed as
$$L(\mathbf{x}, \boldsymbol{\lambda}) = \left(\prod_{i=1}^{N}|x_i|\right)^{1/N} + \boldsymbol{\lambda}^{T}\left(\mathbf{y} - \mathbf{A}\mathbf{x}\right).$$
The derivatives of the Lagrange function with respect to $\mathbf{x}$ and $\boldsymbol{\lambda}$ can be written as
$$\frac{\partial L}{\partial \mathbf{x}} = \frac{\partial}{\partial \mathbf{x}}\left(\prod_{i=1}^{N}|x_i|\right)^{1/N} - \mathbf{A}^{T}\boldsymbol{\lambda}, \qquad \frac{\partial L}{\partial \boldsymbol{\lambda}} = \mathbf{y} - \mathbf{A}\mathbf{x}.$$
We can show that, for $x_j \neq 0$,
$$\frac{\partial}{\partial x_j}|x_j| = \operatorname{sgn}(x_j) = \frac{x_j}{|x_j|}.$$
Using the chain rule for a real-valued variable $x_j$, the derivative can be written as
$$\frac{\partial}{\partial x_j}\left(\prod_{i=1}^{N}|x_i|\right)^{1/N} = \frac{1}{N}\left(\prod_{i=1}^{N}|x_i|\right)^{1/N}\frac{\operatorname{sgn}(x_j)}{|x_j|} = \frac{1}{N}\left(\prod_{i=1}^{N}|x_i|\right)^{1/N}\frac{x_j}{|x_j|^{2}}.$$
By using the result from (47), we can derive
$$\frac{\partial L}{\partial \mathbf{x}} = \frac{1}{N}\left(\prod_{i=1}^{N}|x_i|\right)^{1/N}\operatorname{diag}\!\left(|x_1|^{-2}, \ldots, |x_N|^{-2}\right)\mathbf{x} - \mathbf{A}^{T}\boldsymbol{\lambda}.$$
At the critical point $\partial L/\partial \mathbf{x} = \mathbf{0}$, we have
$$\mathbf{x} = \frac{N}{\left(\prod_{i=1}^{N}|x_i|\right)^{1/N}}\,\boldsymbol{\Pi}(\mathbf{x})\,\mathbf{A}^{T}\boldsymbol{\lambda}, \qquad \boldsymbol{\Pi}(\mathbf{x}) = \operatorname{diag}\!\left(|x_1|^{2}, \ldots, |x_N|^{2}\right).$$
At the critical point $\partial L/\partial \boldsymbol{\lambda} = \mathbf{0}$, we have $\mathbf{y} = \mathbf{A}\mathbf{x}$, so that
$$\boldsymbol{\lambda} = \frac{\left(\prod_{i=1}^{N}|x_i|\right)^{1/N}}{N}\left(\mathbf{A}\,\boldsymbol{\Pi}(\mathbf{x})\,\mathbf{A}^{T}\right)^{-1}\mathbf{y}.$$
Substituting (50) into (49), the scalar factors cancel and we obtain
$$\hat{\mathbf{x}} = \boldsymbol{\Pi}(\hat{\mathbf{x}})\,\mathbf{A}^{T}\left(\mathbf{A}\,\boldsymbol{\Pi}(\hat{\mathbf{x}})\,\mathbf{A}^{T}\right)^{-1}\mathbf{y}.$$
It should be noted that the result in (51) is equal to (38) with $p \to 0$. Thus, the compressive sensing problem using the geometric mean is the limit case of the fractional norm problem (35) for $p \to 0$, i.e.,
$$\min_{\mathbf{x}} \left(\prod_{i=1}^{N}|x_i|\right)^{1/N} \;\text{s.t.}\; \mathbf{y} = \mathbf{A}\mathbf{x} \quad\equiv\quad \lim_{p\to 0}\left[\min_{\mathbf{x}} \|\mathbf{x}\|_{p}^{p} \;\text{s.t.}\; \mathbf{y} = \mathbf{A}\mathbf{x}\right],$$
and tends toward the desired but complicated problem with the pseudo zero norm in (31). Although the solution of the minimization of the normalized zero norm by the Lagrange multiplier method appears to be the same as in former works, it relates, from the Lagrange multiplier point of view, to existing works that invoke the least fractional norm and least pseudo zero norm criteria.
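The stated limit can be checked numerically on the weighting matrices themselves; in this illustration (ours), the fractional-norm weights $|x_i|^{2-p}$ visibly converge to the weights $|x_i|^{2}$ of the normalized-zero-norm derivation:

```python
import numpy as np

# Numerical check: the fixed-point weights |x_i|^{2-p} of the least
# fractional norm solution tend to |x_i|^2, the weighting obtained
# from the geometric-mean (normalized zero norm) derivation.
x = np.array([0.5, 1.0, 2.0])
w_limit = np.abs(x) ** 2
for p in (0.5, 0.1, 0.01):
    gap = np.max(np.abs(np.abs(x) ** (2.0 - p) - w_limit))
    print(p, gap)   # gap shrinks toward 0 as p -> 0
```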
6. Numerical Examples
All computer simulations in this work are conducted in the Python language. The root-mean-squared relative error (RMSRE), denoted by
$$\mathrm{RMSRE} = \sqrt{\mathbb{E}\!\left[\frac{\|\hat{\mathbf{x}} - \mathbf{x}\|_{2}^{2}}{\|\mathbf{x}\|_{2}^{2}}\right]},$$
is the index for evaluating the performance of each algorithm. It is calculated as the square root of the probabilistic average of the squared normalized estimation error, where the expectation is taken with respect to the randomization caused by (i) the true value of $\mathbf{x}$, which is assumed to follow an independent and identically distributed real-valued Gaussian distribution with zero mean and unit variance, and (ii) the sparsity pattern of all nonzero elements in $\mathbf{x}$, which is assumed to have equal probability over all possible positions (for a $K$-sparse signal vector).
The algorithms intended for comparison include (i) the $\ell_1$ norm, the problem in (4) with the Euclidean norm replaced by the $\ell_1$ norm; (ii) the $\ell_2$ norm, the problem in (4); (iii) FOCUSS, Algorithm 1 with a fixed norm exponent and a maximum number of iterations; (iv) IRLS, Algorithm 2 with the same setting; and (v) the theoretical approximate normalized zero norm (TANZN), the best possibility of the first iteration of the fixed-point iteration in (52), calculated by substituting $\hat{\mathbf{x}}$ on the right-hand side by the theoretical or true value $\mathbf{x}^{\star}$, i.e.,
$$\hat{\mathbf{x}}_{\mathrm{TANZN}} = \boldsymbol{\Pi}(\mathbf{x}^{\star})\,\mathbf{A}^{T}\left(\mathbf{A}\,\boldsymbol{\Pi}(\mathbf{x}^{\star})\,\mathbf{A}^{T}\right)^{-1}\mathbf{y}.$$
The minimizations of the $\ell_1$ norm and $\ell_2$ norm subject to $\mathbf{y} = \mathbf{A}\mathbf{x}$ are conducted by an interior-point solver for convex optimization [34]. For both fixed-point iteration methods, i.e., the FOCUSS and the IRLS, a common norm exponent $p$ is assumed. It should be noted that the solution in (55) is the ideal case of (51), because the true value $\mathbf{x}^{\star}$ is unknown in practice. Realistic implementations of (55) were addressed in the past, e.g., in terms of the FOCUSS, the IRLS, etc. The design of a more accurate algorithm for the fixed-point iteration required by (51) may remain open for future work. The matrix inverse in (55) is usually subject to a large condition number, which can cause a numerical failure. One has to resolve this numerical instability by adding a tiny amount to each diagonal element of $\mathbf{A}\,\boldsymbol{\Pi}(\mathbf{x}^{\star})\,\mathbf{A}^{T}$ before its inverse operation.
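The diagonal-loading remedy described above can be sketched as follows; this is our illustration, and the value `delta = 1e-6` is an assumption rather than the paper's setting:

```python
import numpy as np

# Illustrative diagonal loading: add a tiny amount delta to each
# diagonal element of the matrix before inversion so that a large
# condition number does not cause a numerical failure.
def loaded_solve(M, y, delta=1e-6):
    return np.linalg.solve(M + delta * np.eye(M.shape[0]), y)

# A nearly singular 2x2 system that plain inversion handles poorly:
M = np.array([[1.0, 1.0],
              [1.0, 1.0 + 1e-14]])
y = np.array([2.0, 2.0])
z = loaded_solve(M, y)
print(z)   # close to the balanced solution [1, 1]
```

The loading biases the solution slightly, but it trades a small, controlled error for numerical robustness of the inverse.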
In Figure 2, numerical computation is done with fixed dimensions and a number of independent runs for each value of the sparsity degree $K$. One can see that the $\ell_1$ norm approach is very precise up to a critical sparsity level. Beyond this critical region, the error rises abruptly, probably because the problem is beyond the capability of the interior-point method in convex optimization. The $\ell_2$ norm method does not exploit the sparsity nature of the signal vector and thus performs worst. The IRLS performs almost identically to the FOCUSS, except for slightly better performance in the transition region. The ideal TANZN method stays constant for any value of $K$. It is worse than its practical iterative implementations, such as the FOCUSS and the IRLS, over part of the sparsity range. However, it is worth noting that both fixed-point iteration techniques involve multiple iterations, while the TANZN represents the best case for a single substitution, or the first iteration.
In Figure 3, we assume that the length $N$ of the input vector is fixed and the number of nonzero elements $K$ is set as a fixed fraction of the number of output elements $M$, rounded down, where $\lfloor\cdot\rfloor$ is the operator of rounding to the nearest lower integer. One can see that when more observed data are available, the RMSRE decreases, i.e., the signal acquisition is more precise, for all the above-mentioned methods. The FOCUSS approach performs identically to the IRLS algorithm. The $\ell_1$ norm minimization is the realistic method that provides the least signal recovery error. The TANZN technique indicates that if the estimate of the desired signal reaches its true value, the possible acquisition error can be lower than that of the $\ell_1$ norm minimization.
7. Conclusion
A normalization of the $\ell_p$ norm is presented. A compressive sensing criterion using the normalized zero norm is proposed. Based on the method of Lagrange multipliers, the solution of the proposed optimization framework is derived. It turns out that the new solution is a limit case of the fractional norm solution for $p \to 0$, where its fixed-point algorithm can readily follow the FOCUSS algorithm in [30]. In our companion works, we find that the minimization of the normalized zero norm by the Tikhonov regularization method provides a different solution from that of the fractional norm [28, 29].
Data Availability
The data used to support the findings of this study are available upon request.
Conflicts of Interest
The author declares no conflicts of interest.