Abstract

Analysis sparse representation has recently emerged as an alternative to the synthesis sparse model. Most existing algorithms employ the $\ell_0$-norm, whose minimization is generally NP-hard. Other algorithms employ the $\ell_1$-norm to relax the $\ell_0$-norm, which sometimes cannot promote adequate sparsity. Moreover, most of these algorithms target general signals and are not suited to nonnegative signals, even though many signals, such as spectral data, are inherently nonnegative. In this paper, we present a novel and efficient analysis dictionary learning algorithm for nonnegative signals based on a determinant-type sparsity measure, which is convex and differentiable. We cast the analysis sparse representation problem into three subproblems: sparse coding, dictionary update, and signal update. Because the determinant-type sparsity measure leads to a nonconvex optimization problem that cannot easily be solved by standard convex optimization methods, we employ a difference of convex (DC) programming scheme for the nonconvex subproblem. According to our theoretical analysis and simulation study, the main advantage of the proposed algorithm is its greater dictionary learning efficiency, particularly compared with state-of-the-art algorithms. In addition, the proposed algorithm performs well in image denoising.

1. Introduction

Real signals in daily life are usually distributed in a high-dimensional space; however, low-dimensional structures can be found in these signals, so they can be represented by a proper model with only a few parameters [1]. A proper model should be simple while still matching the signals. In the past decades, the sparse and redundant representation model has been proven to be an efficient and beneficial model [2-5]. The theoretical background for sparse models is given by compressed sensing (CS) [6-8]. CS states that if a signal is sparse or compressible, the original signal can be reconstructed from a few measurements, far fewer than the counts suggested by previous sampling theories [6, 7, 9-11]. Sparse representation has also proven to be an extraordinarily powerful tool for a wide range of real-world applications, especially in image processing (e.g., image denoising, deblurring, inpainting, restoration, and superresolution), as well as in machine learning, computer vision, and beyond [12-21].

Sparse representation can be formulated by either a synthesis model or an analysis model. The synthesis model is popular and mature. The analysis model has been less investigated for sparse representation, although several analysis dictionary learning algorithms have been proposed, such as the analysis K-SVD [12], Greedy Analysis Pursuit (GAP) [22], and the analysis thresholding algorithm [23].

In practice, some signals, such as chemical concentrations in experimental results and pixels in video frames and images, are inherently nonnegative, and dedicated factorization methods have been proposed for them [24, 25]. The analysis sparse representation algorithms mentioned above are all designed for general signals, which contain both nonnegative and negative elements; applying such methods directly to nonnegative signals does not achieve satisfactory results. An existing analysis dictionary learning method for nonnegative signals [26], which uses blocked determinants as the sparseness measure, is computationally expensive. The purpose of this paper is to address the problem of nonnegative analysis sparse representation.

In this paper, we present a novel analysis sparse representation algorithm for nonnegative signals, which parallels synthesis sparse representation in principle and structure. Although the analysis model has been studied in the past, nonnegative analysis representation is still not a mature field, and algorithms designed for general signals cannot be applied satisfactorily to nonnegative signals. We therefore focus on nonnegative sparse representation with the analysis model. We cast the analysis sparse representation problem into three subproblems (analysis dictionary update, sparse coding, and signal recovery) and use an alternating scheme to obtain an optimized solution. We adopt the determinant-type sparseness measure as the sparsity constraint, which is convex and differentiable. The objective function for sparse coding is nonconvex, so standard convex optimization methods cannot be employed directly. Fortunately, the objective function is the difference of two convex functions, so we can introduce difference of convex (DC) programming to solve this nonconvex optimization problem.

The remainder of this paper is organized as follows. The conventional sparse representation problem is reviewed in Section 2. In Section 3, we introduce the analysis representation. In Section 4, we describe the problem formulation for analysis sparse representation and present the optimization framework. The experiments described in Section 5 demonstrate the practical advantages of the proposed algorithm over state-of-the-art algorithms on both artificial and real-world datasets. Finally, we present our conclusions in Section 6.

1.1. Notations

Here we list the notations used in this paper. A boldface uppercase letter such as $\mathbf{X}$ denotes a matrix, and the lowercase letter $x_{ij}$ denotes the $(i,j)$th entry of $\mathbf{X}$. A boldface lowercase letter such as $\mathbf{x}$ denotes a vector, and the lowercase letter $x_i$ denotes the $i$th entry of $\mathbf{x}$. The matrix slices $\mathbf{X}_{i,:}$ and $\mathbf{X}_{:,j}$ denote the $i$th row and the $j$th column of $\mathbf{X}$, respectively. The Frobenius norm of a matrix is defined as $\|\mathbf{X}\|_F = (\sum_{i,j} x_{ij}^2)^{1/2}$. The determinant of a matrix is denoted by $\det(\mathbf{X})$. Note that, in this paper, all parameters take real values.

2. Preliminaries

2.1. Sparse Representation

Sparse representation decomposes observed signals into the product of a dictionary matrix, which contains signal bases, and a sparse coefficient matrix [13-17]. There are two different structures: the synthesis model and the analysis model. The synthesis model was proposed first and is the more popular of the two; we review it in this section.

Assume that we want to model the signals $\mathbf{Y} \in \mathbb{R}^{m \times n}$, where $m$ is the signal dimensionality and $n$ is the number of measurements. The synthesis sparse model suggests that the signals can be expressed as
$$\mathbf{Y} = \mathbf{D}\mathbf{A} + \mathbf{E} \quad (1)$$
or
$$\mathbf{Y} \approx \mathbf{D}\mathbf{A}, \quad (2)$$
where $\mathbf{D} \in \mathbb{R}^{m \times k}$ is referred to as a dictionary, $\mathbf{A} \in \mathbb{R}^{k \times n}$ is a representation coefficient matrix, and $\mathbf{E}$ is a small residual [27]. Here, $k$ is the number of bases, which are also called dictionary atoms. We further assume that the representation matrix $\mathbf{A}$ is sparse (i.e., has many zero entries) to obtain sparse representations of the signals. Equations (1) and (2) mean that each signal can be represented as a linear combination of a few atoms from the dictionary matrix $\mathbf{D}$.
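As a toy illustration of the synthesis model in (1), the following Python/NumPy sketch builds a nonnegative dictionary and a sparse nonnegative coefficient matrix and synthesizes signals from a few atoms per column; all dimensions, names, and the residual scale are illustrative choices rather than values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
m, k, n = 20, 40, 100                                 # signal dimension, atoms, measurements

D = np.abs(rng.standard_normal((m, k)))               # nonnegative synthesis dictionary (columns = atoms)
A = np.zeros((k, n))                                  # sparse, nonnegative coefficient matrix
for j in range(n):
    support = rng.choice(k, size=3, replace=False)    # three active atoms per signal
    A[support, j] = rng.random(3)
E = 0.01 * np.abs(rng.standard_normal((m, n)))        # small residual
Y = D @ A + E                                         # Y = D A + E, cf. (1)
print(Y.shape, np.count_nonzero(A) / A.size)          # (20, 100) and the fraction of nonzeros
```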

A key issue in sparse representation is the choice of the dictionary with which the observed signals are decomposed. One choice is a predefined dictionary, such as the discrete Fourier transform (DFT), the discrete cosine transform (DCT), or wavelets [28]. Another choice is to learn a signal-specific dictionary from the observed signals; the learned dictionary matches the content of the signals better and often exhibits better performance than predefined dictionaries in real-world applications [29, 30].

Intriguingly, there exists a “twin” of the synthesis model called the analysis model [31]. Assume that there is a matrix $\mathbf{\Omega}$ that produces a sparse coefficient matrix $\mathbf{U}$ when multiplied with the signal matrix: $\mathbf{U} = \mathbf{\Omega}\mathbf{Y}$. This coefficient matrix can be obtained as the solution to a minimization problem of the error function $\|\mathbf{U} - \mathbf{\Omega}\mathbf{Y}\|_F^2$. Remarkably, this error function is convex, and standard optimization methods can be employed; since error functions in the synthesis model are nonconvex, optimization in the analysis model is often easier. We call $\mathbf{\Omega}$ the analysis dictionary. Atoms in the analysis dictionary are its rows, rather than the columns as in the synthesis dictionary $\mathbf{D}$. The term “analysis” means that the dictionary analyzes the signal to produce a sparse result [32]. To emphasize the difference between the analysis and synthesis models, the term “cosparsity” has been introduced in the literature [31, 33]; it counts the number of zero-valued elements of $\mathbf{\Omega}\mathbf{y}$, that is, zero elements coproduced by $\mathbf{\Omega}$ and $\mathbf{y}$ [34]. The analysis sparse model is therefore also called the cosparse model, and the analysis dictionary is also called the cosparse dictionary.
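As a minimal numerical illustration of cosparsity (dimensions and names are arbitrary), the following sketch builds an analysis dictionary with unit-norm rows and a signal lying in the null space of a few of its atoms, so that the analysis representation contains the corresponding zeros:

```python
import numpy as np

rng = np.random.default_rng(1)
p, m = 30, 20
Omega = rng.standard_normal((p, m))
Omega /= np.linalg.norm(Omega, axis=1, keepdims=True)   # unit-norm rows (analysis atoms)

# Build a signal that is exactly orthogonal to the first 5 atoms,
# so that Omega @ x has at least 5 zero entries (cosparsity >= 5).
_, _, Vt = np.linalg.svd(Omega[:5])
x = Vt[-1]                                   # a vector in the null space of those 5 rows
u = Omega @ x                                # analysis representation
print(np.sum(np.abs(u) < 1e-10))             # cosparsity of x with respect to Omega
```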

Now we look more closely at the analysis sparse model. Consider one signal $\mathbf{x}$, which is a column of the signal matrix, and a proper analysis dictionary $\mathbf{\Omega} \in \mathbb{R}^{p \times m}$, whose $i$th row, namely, the $i$th atom, is denoted by $\boldsymbol{\omega}_i$. We want to make the analysis representation vector $\mathbf{\Omega}\mathbf{x}$ sparse. This is formulated by introducing a sparsity measure $g(\cdot)$ that decreases as its argument becomes sparser, so that minimizing $g(\mathbf{\Omega}\mathbf{x})$ yields the sparsest solution. Although employing the $\ell_0$-norm, that is, setting $g(\cdot) = \|\cdot\|_0$, yields the sparsest solution [35], the resulting optimization problem is combinatorial and in general NP-hard. Therefore, other sparsity measures such as the $\ell_1$-norm are employed to obtain easier optimization problems. Nevertheless, it is known that the $\ell_1$-norm often overpenalizes large elements, and the resulting solutions are not always adequately sparse.

2.2. Sparseness Measure

The $\ell_p$-norms, for example with $p = 0$ or $p = 1$, are popular measures for assessing the sparseness of a vector. Since the $\ell_0$-norm yields an NP-hard problem, its convex relaxation, the $\ell_1$-norm, is often preferred [36, 37]. The $\ell_1$-norm of a vector $\mathbf{x}$ is defined as the sum of the absolute values of its entries, $\|\mathbf{x}\|_1 = \sum_i |x_i|$. If the vector is nonnegative, that is, $\mathbf{x} \geq \mathbf{0}$, its $\ell_1$-norm is simply $\sum_i x_i$. For nonnegative vectors, the $\ell_1$-norm is therefore differentiable and smooth, and gradient methods can be used in optimization. Some authors combine a norm penalty with nonnegative matrix factorization, since the nonnegativity constraints themselves encourage sparse solutions; however, the resulting solutions are not sparser than those obtained with the $\ell_0$-norm or $\ell_1$-norm [38].

The sparsity measures mentioned above reflect the instantaneous sparseness of one single signal [35], but they are not suitable for evaluating sparsity across different measurements [39]. In order to describe the joint sparseness of nonnegative sources, we introduce the determinant-type sparsity measure. In spectral unmixing for remote sensing image interpretation, where the signals are nonnegative, numerical experiments show that the determinant-type sparsity measure [40] obtains results comparable to other sparseness-based methods [41]. Thus, the determinant-type measure can explicitly measure the sparseness of nonnegative matrices.

The determinant-type sparsity measure has several good properties. If a nonnegative matrix is normalized, its determinant value is well bounded and interpolates monotonically between the two extremes 0 and 1 as the sparsity increases. For instance, if the nonnegative matrix is nonsparse and its rows satisfy the sum-to-one constraint, then the determinant value is close to 0. On the other hand, the determinant value approaches 1 if and only if the matrix is maximally sparse [42]. Namely, the determinant value lies in $[0, 1]$; it equals 0 if all entries of the matrix are the same, and it equals 1 when the following two criteria are satisfied at the same time: (1) every row of the matrix contains only one nonzero element; (2) any two distinct rows are orthogonal, that is, their inner product is zero.

The detailed proof can be found in [40]. Thus we can use the determinant measure in the cost function. Figure 1 illustrates the sparseness degrees of three different matrices gauged by the determinant measure; the determinant values of the matrices from left to right are 0.0625, 0.5, and 1. We can see that the sparser the matrix is, the larger the determinant measure becomes. Thus, the sparse coding problem with the determinant constraint can be expressed as an optimization problem that maximizes the determinant measure of the nonnegative coefficient matrix under a data-fidelity constraint.
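The exact normalized form of the measure used in [40, 26] is not reproduced here; as an illustrative assumption, the following sketch instantiates it as the determinant of the Gram matrix of the row-normalized (sum-to-one) nonnegative matrix, which exhibits the qualitative behaviour described above:

```python
import numpy as np

def det_sparsity(S):
    """Determinant-type sparsity surrogate (assumed form): determinant of the
    Gram matrix of the row-normalized nonnegative matrix."""
    S = np.asarray(S, dtype=float)
    S_bar = S / S.sum(axis=1, keepdims=True)     # rows normalized to sum to one
    return np.linalg.det(S_bar @ S_bar.T)

dense    = np.ones((3, 6))                       # all entries equal: not sparse at all
blocky   = np.kron(np.eye(3), np.ones(2))        # two nonzeros per row, disjoint supports
sparsest = np.eye(3)                             # one nonzero per row, rows orthogonal
for S in (dense, blocky, sparsest):
    print(round(det_sparsity(S), 4))             # values increase with sparsity: ~0, 0.125, 1.0
```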

2.3. Related Works

Existing methods for sparse representation with the analysis model employ various sparsity constraints. The analysis K-SVD algorithm [12] minimizes the error between the noisy (observed) signals and the estimated (reconstructed) signals with the $\ell_0$-norm as the sparsity constraint; the optimal backward greedy (OBG) algorithm is employed to estimate the analysis dictionary, which incurs a high computational complexity. In [43-46], the $\ell_1$-norm is imposed as the sparsity constraint on the sparse coefficients, and projected subgradient-based algorithms are employed to learn the analysis dictionary. The analysis operator learning (AOL) algorithm [43], which utilizes the $\ell_1$-norm constraint, restricts the dictionary to a uniform normalized tight frame (UNTF) and learns the analysis dictionary as an entire matrix by solving a cheap soft-thresholding problem and then projecting the dictionary onto the UNTF set. Li et al. [26] propose an analysis sparse model based on the determinant-type measure, where the optimization problem is solved by an iterative sparseness maximization scheme; the problem is cast into row-by-row optimization with respect to the analysis atoms, and the quadratic programming (QP) technique is then used to solve the optimization problem for each row. However, this method has no signal recovery stage and cannot be applied to denoising and other applications. Recently, DC programming has been introduced to deal with nonconvex optimization problems in sparse representation [47]. DC programming and the DC algorithm (DCA) constitute the backbone of smooth and nonsmooth nonconvex programming and global optimization [48]; they are therefore considered a powerful tool for nonconvex optimization [48, 49]. The existing DC-programming-based sparse representation algorithms all adopt the synthesis model; to the best of our knowledge, we are the first to apply DCA to the analysis model.

3. Formulation

Now we turn to the problem of analysis sparse representation. We consider the following setting: given an observed signal vector $\mathbf{y}$, we assume that $\mathbf{y}$ is a noisy version of a signal $\mathbf{x}$. Thus, $\mathbf{y} = \mathbf{x} + \mathbf{v}$, where $\mathbf{v}$ is additive positive white Gaussian noise. Using an analysis dictionary $\mathbf{\Omega} \in \mathbb{R}^{p \times m}$, each row of which defines an analysis atom, we assume that $\mathbf{x}$ satisfies $\|\mathbf{\Omega}\mathbf{x}\|_0 = p - \ell$, where $\ell$ is the cosparsity of the signal, defined to be the number of zero elements of $\mathbf{\Omega}\mathbf{x}$. We then extend this to a matrix of signals: the columns of $\mathbf{X}$ are the clean signals and the columns of $\mathbf{Y}$ are the corresponding noisy observations. Here we use $g(\cdot)$ as the sparsity measure. Taking into account the noise in the measured signals, we formulate an optimization task for analysis dictionary learning as
$$\min_{\mathbf{\Omega}, \mathbf{X}} \; g(\mathbf{\Omega}\mathbf{X}) \quad \text{subject to} \quad \|\mathbf{Y} - \mathbf{X}\|_F^2 \leq \epsilon, \;\; \mathbf{\Omega} \in \mathcal{C},$$
where $\epsilon$ is a noise-level parameter, $\mathcal{C}$ is the constraint set on the analysis dictionary, and $g(\cdot)$ is the sparse regularization. We prefer to use an alternative, regularized version of the above problem with penalty multipliers. We employ an auxiliary matrix $\mathbf{U}$ as an approximation of $\mathbf{\Omega}\mathbf{X}$, which makes the learning easier and faster in practice; the analysis sparse coding then becomes cheap, since it can be obtained by thresholding the product $\mathbf{\Omega}\mathbf{X}$ under the sparsity measure on $\mathbf{U}$. Thus, the analysis sparse representation problem can be rewritten as
$$\min_{\mathbf{\Omega}, \mathbf{X}, \mathbf{U}} \; \frac{1}{2}\|\mathbf{Y} - \mathbf{X}\|_F^2 + \frac{\lambda}{2}\|\mathbf{\Omega}\mathbf{X} - \mathbf{U}\|_F^2 + \mu\, g(\mathbf{U}) \quad \text{subject to} \quad \mathbf{\Omega} \in \mathcal{C},$$
where $\mathbf{U}$ is the representation coefficient matrix. Having a sparse representation means that the representation matrix $\mathbf{U}$ is sparse. We impose a normalization constraint on the rows of the analysis dictionary, which helps remove a scale ambiguity.

We can cast the entire problem into three variable-wise optimization subproblems: the analysis dictionary update subproblem, the sparse coefficient update subproblem, and the signal update subproblem.

Pseudocode for this analysis dictionary learning scheme is presented in Algorithm 1 (ADLA).

(1) Initialization: $\mathbf{\Omega} \leftarrow \mathbf{\Omega}^{0}$, $\mathbf{X} \leftarrow \mathbf{Y}$, $\mathbf{U} \leftarrow [\mathbf{\Omega}^{0}\mathbf{Y}]_{+}$
(2) while not converged do
(3) Update $\mathbf{\Omega}$ by the dictionary update stage with $\mathbf{X}$ and $\mathbf{U}$ fixed (Section 4.1)
(4) Project the rows of $\mathbf{\Omega}$ onto the unit sphere
(5) Update $\mathbf{U}$ by the sparse coding stage (DC programming) with $\mathbf{\Omega}$ and $\mathbf{X}$ fixed (Section 4.2)
(6) Update $\mathbf{X}$ by the signal recovery stage with $\mathbf{\Omega}$ and $\mathbf{U}$ fixed (Section 4.3)
(7) end while
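For concreteness, the following NumPy sketch mirrors the alternating structure of Algorithm 1 under the regularized objective above. It is an illustrative skeleton rather than the authors' implementation: all names and parameter values are assumptions, and the sparse coding step is replaced by a simple nonnegative thresholding stand-in for the DC-programming update detailed in Section 4.2.

```python
import numpy as np

def normalize_rows(Omega, rng):
    """Project every atom (row) onto the unit sphere; a zero row is replaced by
    a random unit vector, as described in Section 4.1."""
    norms = np.linalg.norm(Omega, axis=1)
    for i, nrm in enumerate(norms):
        if nrm < 1e-12:
            v = rng.standard_normal(Omega.shape[1])
            Omega[i] = v / np.linalg.norm(v)
        else:
            Omega[i] = Omega[i] / nrm
    return Omega

def adla(Y, p, lam=1.0, mu=0.1, n_iter=50, seed=0):
    """Illustrative alternating scheme (not the authors' implementation)."""
    rng = np.random.default_rng(seed)
    m, _ = Y.shape
    X = Y.copy()                                            # signal estimate
    Omega = normalize_rows(rng.standard_normal((p, m)), rng)
    U = np.maximum(Omega @ X, 0.0)                          # nonnegative analysis coefficients
    for _ in range(n_iter):
        # dictionary update: least-squares fit of Omega X ~ U, then row projection
        Omega = np.linalg.lstsq(X.T, U.T, rcond=None)[0].T
        Omega = normalize_rows(Omega, rng)
        # sparse coding: simple nonnegative thresholding stand-in for the
        # DC-programming update of Section 4.2
        U = np.maximum(Omega @ X - mu / lam, 0.0)
        # signal recovery: closed-form solve of the quadratic subproblem (Section 4.3)
        X = np.linalg.solve(np.eye(m) + lam * Omega.T @ Omega, Y + lam * Omega.T @ U)
    return Omega, U, X

# Example call on random nonnegative data (shapes are arbitrary):
Y = np.abs(np.random.default_rng(1).standard_normal((20, 100)))
Omega, U, X = adla(Y, p=30)
print(Omega.shape, U.shape, X.shape)
```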
3.1. Analysis Sparse Representation with Determinant Constraint

In this section, we introduce a novel determinant-type constrained sparse method to learn an analysis dictionary for the sparse representation of nonnegative signals. Furthermore, we set the parameters $\lambda$ and $\mu$ in the regularized problem above as positive constants. Considering these factors, our minimization problem for analysis sparse dictionary learning with the determinant-type constraint can be expressed as
$$\min_{\mathbf{\Omega}, \mathbf{X}, \mathbf{U} \geq 0} \; \frac{1}{2}\|\mathbf{Y} - \mathbf{X}\|_F^2 + \frac{\lambda}{2}\|\mathbf{\Omega}\mathbf{X} - \mathbf{U}\|_F^2 - \mu\, D(\mathbf{U}) \quad \text{subject to} \quad \mathbf{\Omega} \in \mathcal{C},$$
where $D(\cdot)$ denotes the determinant-type sparsity measure, which grows as $\mathbf{U}$ becomes sparser. The above objective function is nonconvex and hence hard to solve by traditional convex optimization methods.

4. Proposed Algorithm

The analysis representation problem with the determinant-type constraint can be cast into three subproblems: an analysis dictionary update stage, a sparse coefficient coding stage, and a signal recovery stage. The corresponding formulations are as follows.

Analysis Dictionary Update Stage
$$\mathbf{\Omega} \leftarrow \arg\min_{\mathbf{\Omega} \in \mathcal{C}} \; \|\mathbf{\Omega}\mathbf{X} - \mathbf{U}\|_F^2.$$

Sparse Coefficient Coding Stage
$$\mathbf{U} \leftarrow \arg\min_{\mathbf{U} \geq 0} \; \frac{\lambda}{2}\|\mathbf{\Omega}\mathbf{X} - \mathbf{U}\|_F^2 - \mu\, D(\mathbf{U}).$$

Signal Recovery Stage
$$\mathbf{X} \leftarrow \arg\min_{\mathbf{X}} \; \frac{1}{2}\|\mathbf{Y} - \mathbf{X}\|_F^2 + \frac{\lambda}{2}\|\mathbf{\Omega}\mathbf{X} - \mathbf{U}\|_F^2.$$

We will describe each stage in detail in the following subsections.

4.1. Analysis Dictionary Update

We use a projected subgradient-type algorithm to solve the analysis dictionary update subproblem as follows:
$$\min_{\mathbf{\Omega}} \; \|\mathbf{\Omega}\mathbf{X} - \mathbf{U}\|_F^2 \quad \text{subject to} \quad \|\boldsymbol{\omega}_i\|_2 = 1, \; i = 1, \dots, p.$$

Since the objective function above is a quadratic function of $\mathbf{\Omega}$, its unconstrained minimizer can be found analytically as the zero-gradient point. The gradient of the objective function is
$$\nabla_{\mathbf{\Omega}} = 2(\mathbf{\Omega}\mathbf{X} - \mathbf{U})\mathbf{X}^{\mathsf{T}}.$$
By solving $\nabla_{\mathbf{\Omega}} = \mathbf{0}$, we obtain
$$\mathbf{\Omega} = \mathbf{U}\mathbf{X}^{\mathsf{T}}(\mathbf{X}\mathbf{X}^{\mathsf{T}})^{-1}.$$

The projection of a dictionary onto the set of matrices with fixed row norms can be performed easily for nonzero rows by scaling each row to have unit norm; we use $\Pi(\cdot)$ to denote this projection. If a row is zero, we set it to a normalized random vector, so $\Pi(\cdot)$ is not uniquely defined; this is due to the fact that the set of uniformly normalized vectors is not convex. The projection is given by
$$\Pi(\boldsymbol{\omega}_i) = \begin{cases} \boldsymbol{\omega}_i / \|\boldsymbol{\omega}_i\|_2, & \boldsymbol{\omega}_i \neq \mathbf{0}, \\ \mathbf{w}, & \boldsymbol{\omega}_i = \mathbf{0}, \end{cases}$$
where $\mathbf{w}$ is a random vector on the unit sphere.
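A minimal sketch of this dictionary update stage, assuming the quadratic objective above (the pseudo-inverse guards against a singular $\mathbf{X}\mathbf{X}^{\mathsf{T}}$; all names are illustrative):

```python
import numpy as np

def update_dictionary(X, U, rng):
    """Dictionary update sketch: zero-gradient point of ||Omega X - U||_F^2,
    Omega = U X^T (X X^T)^{-1}, followed by the row-wise projection Pi."""
    Omega = U @ X.T @ np.linalg.pinv(X @ X.T)
    norms = np.linalg.norm(Omega, axis=1)
    for i, nrm in enumerate(norms):
        if nrm < 1e-12:                          # zero row: replace by a random point on the unit sphere
            v = rng.standard_normal(Omega.shape[1])
            Omega[i] = v / np.linalg.norm(v)
        else:
            Omega[i] = Omega[i] / nrm            # scale the atom to unit length
    return Omega

# Example with arbitrary sizes:
rng = np.random.default_rng(0)
X = np.abs(rng.standard_normal((20, 100)))
U = np.maximum(rng.standard_normal((30, 100)), 0)
Omega = update_dictionary(X, U, rng)
print(Omega.shape, np.allclose(np.linalg.norm(Omega, axis=1), 1.0))
```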

4.2. Sparse Coding by DC Programming

Optimization for estimating the sparse coefficients is the core problem. The formulation is the difference of two convex functions, which makes the objective function nonconvex in general; thus we cannot use traditional convex optimization methods to solve it directly.

In this paper, we introduce the DC programming scheme to translate this minimization problem into a DC program. Following the theory of DC programming [48, 49], we construct a DC decomposition of our objective. The reformulated DC objective function can be expressed as
$$F(\mathbf{U}) = g(\mathbf{U}) - h(\mathbf{U}),$$
where
$$g(\mathbf{U}) = \frac{\lambda}{2}\|\mathbf{\Omega}\mathbf{X} - \mathbf{U}\|_F^2 + \chi_{+}(\mathbf{U}), \qquad h(\mathbf{U}) = \mu\, D(\mathbf{U}).$$
Here $g$ and $h$ are two convex functions. Then, the sparse coding problem can be reformulated as the DC program
$$\min_{\mathbf{U}} \; g(\mathbf{U}) - h(\mathbf{U}),$$
where $\chi_{+}$ is an indicator function, defined by $\chi_{+}(\mathbf{U}) = 0$ if $\mathbf{U} \geq 0$ and $\chi_{+}(\mathbf{U}) = +\infty$ otherwise.

We utilize the DCA scheme [48]. At each iteration $k$, we first compute an element of the subdifferential of $h$ at the current estimate,
$$\mathbf{Z}^{k} \in \partial h(\mathbf{U}^{k}) = \mu\, \nabla D(\mathbf{U}^{k}).$$
At the same iteration, the estimate is then updated by
$$\mathbf{U}^{k+1} = \arg\min_{\mathbf{U}} \; g(\mathbf{U}) - \langle \mathbf{Z}^{k}, \mathbf{U} \rangle.$$
Considering that this problem is separable with respect to the columns of $\mathbf{U}$, it can be rewritten as independent column-wise problems
$$\mathbf{u}_j^{k+1} = \arg\min_{\mathbf{u}_j \geq \mathbf{0}} \; \frac{\lambda}{2}\|\mathbf{\Omega}\mathbf{x}_j - \mathbf{u}_j\|_2^2 - \langle \mathbf{z}_j^{k}, \mathbf{u}_j \rangle, \quad j = 1, \dots, n.$$

For the sparse coding stage of the iterative scheme, we fix the dictionary and the signals and update only $\mathbf{U}$ in the DC program above. To keep the coefficient matrix nonnegative, the update of the sparse coefficients can be briefly described as
$$\mathbf{U}^{k+1} = \Big[\mathbf{\Omega}\mathbf{X} + \tfrac{1}{\lambda}\mathbf{Z}^{k}\Big]_{+},$$
where $[\cdot]_{+}$ takes the positive part. Algorithm 2 summarizes the proposed sparse coding procedure based on DC programming.

Require: matrix $\mathbf{\Omega}$, matrix $\mathbf{X}$, and the parameters $\lambda$ and $\mu$.
(1) Initialize $\mathbf{U}^{0} \geq 0$, let $k = 0$
(2) while not converged do
(3) Compute $\mathbf{Z}^{k}$ from the gradient of $h$ at $\mathbf{U}^{k}$.
(4) Update $\mathbf{U}^{k+1}$ by the positive-part thresholding step; $k \leftarrow k + 1$.
(5) end while
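The following sketch implements the DCA iteration above under the assumptions made in this section. In particular, `grad_det_measure` uses the determinant of the Gram matrix of the row-normalized coefficient matrix as an assumed instantiation of the determinant-type measure; it is not the exact expression from [40], and all names and parameter values are illustrative.

```python
import numpy as np

def grad_det_measure(U, eps=1e-8):
    """Gradient of D(U) = det(Ubar Ubar^T), where Ubar is U with rows normalized
    to sum to one (an assumed instantiation of the determinant-type measure)."""
    r = U.sum(axis=1, keepdims=True) + eps        # row sums (U assumed nonnegative)
    Ubar = U / r
    G = Ubar @ Ubar.T + eps * np.eye(U.shape[0])  # Gram matrix with a small ridge
    D = np.linalg.det(G)
    return (2.0 * D / r) * (np.linalg.solve(G, Ubar) - 1.0)

def dca_sparse_coding(Omega, X, lam=1.0, mu=0.1, n_iter=30):
    """DCA iterations for  min_{U >= 0} (lam/2)*||Omega X - U||_F^2 - mu*D(U)
    (assumed form of the sparse-coding subproblem)."""
    P = Omega @ X
    U = np.maximum(P, 0.0)                        # feasible nonnegative starting point
    for _ in range(n_iter):
        Z = mu * grad_det_measure(U)              # linearize the subtracted convex part
        U = np.maximum(P + Z / lam, 0.0)          # minimize the remaining convex part, clip to >= 0
    return U

# Tiny usage example with random data (dimensions arbitrary):
rng = np.random.default_rng(2)
Omega = rng.standard_normal((8, 10))
X = np.abs(rng.standard_normal((10, 40)))
U = dca_sparse_coding(Omega, X)
print(U.shape, float(np.mean(U == 0)))            # sparsity induced by the nonnegative clipping
```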
4.3. Signal Recovery

We now consider the signal recovery stage. In this stage the analysis dictionary and the coefficients are fixed, and the optimization problem for signal recovery is formulated as
$$\min_{\mathbf{X}} \; \frac{1}{2}\|\mathbf{Y} - \mathbf{X}\|_F^2 + \frac{\lambda}{2}\|\mathbf{\Omega}\mathbf{X} - \mathbf{U}\|_F^2.$$

The above objective function is a quadratic function of the signals $\mathbf{X}$. Thus, its optimal solution can be obtained analytically by setting the gradient to zero. The gradient of the objective function is
$$\nabla_{\mathbf{X}} = (\mathbf{X} - \mathbf{Y}) + \lambda\, \mathbf{\Omega}^{\mathsf{T}}(\mathbf{\Omega}\mathbf{X} - \mathbf{U}).$$
By solving $\nabla_{\mathbf{X}} = \mathbf{0}$, we obtain
$$\mathbf{X} = (\mathbf{I} + \lambda\, \mathbf{\Omega}^{\mathsf{T}}\mathbf{\Omega})^{-1}(\mathbf{Y} + \lambda\, \mathbf{\Omega}^{\mathsf{T}}\mathbf{U}).$$
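A one-function sketch of this closed-form update, assuming the quadratic signal-recovery objective given above (names illustrative):

```python
import numpy as np

def recover_signals(Y, Omega, U, lam):
    """Closed-form signal update for  min_X ||Y - X||_F^2 + lam * ||Omega X - U||_F^2
    (assumed form of the signal-recovery objective)."""
    m = Y.shape[0]
    A = np.eye(m) + lam * Omega.T @ Omega      # system matrix from the zero-gradient condition
    return np.linalg.solve(A, Y + lam * Omega.T @ U)

# Example with arbitrary sizes:
rng = np.random.default_rng(0)
Y = np.abs(rng.standard_normal((20, 100)))
Omega = rng.standard_normal((30, 20))
U = np.maximum(Omega @ Y, 0)
X = recover_signals(Y, Omega, U, lam=1.0)
print(X.shape)
```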

5. Experiments and Discussion

This section presents the results of numerical experiments to evaluate the performance of the proposed algorithms. The programs were coded in Matlab (R2016b) and run on a machine with a 3.3 GHz Intel Core i7 CPU and 16 GB of memory under Microsoft Windows 10.

5.1. Phase Transition

The goal of this phase transition experiment is to study how well the proposed algorithm with the determinant-type sparsity measure can identify the nonzero elements of a sparse matrix. Specifically, we examine where the success or failure of nonzero element identification switches as the sparsity of the matrix changes. We varied the number of nonzero elements in the sparse matrix, the dimensionality of the dictionary atoms, and the number of measurements over a range of values.

Figure 2 compares the phase transition diagrams of the determinant-type measure and the $\ell_1$-norm. The white regions indicate success in identifying the nonzero elements, whereas the black regions indicate failure. It can be seen that the white area for the determinant-type measure is larger than that for the $\ell_1$-norm, which demonstrates that the determinant-type measure outperforms the $\ell_1$-norm, especially as the number of measurements increases.

5.2. Dictionary Recovery

This subsection examines if the proposed algorithm can recover the true analysis dictionary. To quantitatively evaluate the performance, we used synthetic signals with a known ground-truth dictionary.

To evaluate the sparsity of the coefficients, we use the Hoyer sparsity [50], which is normalized to the interval $[0, 1]$ [51]. For a vector $\mathbf{u} \in \mathbb{R}^{p}$, such as a column of the coefficient matrix, it is defined as
$$\mathrm{sparsity}(\mathbf{u}) = \frac{\sqrt{p} - \|\mathbf{u}\|_1 / \|\mathbf{u}\|_2}{\sqrt{p} - 1}.$$
The Hoyer sparsity of a matrix is defined accordingly from the column-wise values. The larger the Hoyer sparsity, the sparser the coefficient matrix. Note that the sparse representation optimization problem with a Hoyer sparsity constraint is hard to solve; thus the measure is generally used only for evaluation.
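For reference, a small implementation of the Hoyer sparsity; the matrix-level score is taken here as the average of the column-wise values, which is an assumption about the exact aggregation used in the paper.

```python
import numpy as np

def hoyer_sparsity(x):
    """Hoyer sparsity of a vector: 1 for a 1-sparse vector, 0 when all entries are equal."""
    x = np.asarray(x, dtype=float).ravel()
    n = x.size
    l1, l2 = np.abs(x).sum(), np.linalg.norm(x)
    if l2 == 0:
        return 0.0
    return (np.sqrt(n) - l1 / l2) / (np.sqrt(n) - 1)

def hoyer_sparsity_matrix(U):
    """Matrix-level score: average of the column-wise values (assumed aggregation)."""
    return float(np.mean([hoyer_sparsity(U[:, j]) for j in range(U.shape[1])]))

print(hoyer_sparsity([0, 0, 5, 0]))      # 1.0 (maximally sparse)
print(hoyer_sparsity([1, 1, 1, 1]))      # 0.0 (not sparse at all)
```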

5.2.1. Experimental Setup

A ground-truth dictionary and observed signals were built as follows. A random synthesis dictionary was generated by taking the absolute values of i.i.d. zero-mean, unit-variance normal random variables, and the ground-truth analysis dictionary was derived from this synthesis dictionary. A set of observation signals was then generated from the synthesis dictionary; namely, each signal column was generated as a linear combination of a few different atoms of the dictionary, with nonnegative coefficients drawn uniformly at random and placed at random, independent locations. Naturally, the elements of the observed signals are nonnegative, with different sparsities. In this experiment, we generated a set of such observation signal matrices.

To initialize the proposed algorithm, we generated the initial estimate by linearly mixing the ground-truth dictionary with a normalized random matrix according to a mixing weight, and then projecting the result to satisfy the uniformly normalized and orthogonal constraints. When the mixing weight is zero, we effectively initialize with the ground-truth dictionary, and as the weight grows, the initial analysis dictionary becomes random.

The analysis dictionary learned by our algorithm was compared with the ground-truth dictionary. Since there is a row-shuffle indeterminacy, we swept all rows of the learned dictionary and of the ground-truth dictionary and matched the closest rows between the two dictionaries. A distance of less than 0.01 between an atom of the learned dictionary and an atom of the ground-truth dictionary was counted as a successful recovery.
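A sketch of this recovery-rate computation; the distance between two unit-norm atoms is taken here as $1 - |\langle \hat{\boldsymbol{\omega}}, \boldsymbol{\omega} \rangle|$, which is an assumed choice of metric rather than the paper's exact definition.

```python
import numpy as np

def recovery_rate(Omega_learned, Omega_true, tol=0.01):
    """Fraction of ground-truth atoms matched by some learned atom within tol."""
    W1 = Omega_learned / np.linalg.norm(Omega_learned, axis=1, keepdims=True)
    W2 = Omega_true / np.linalg.norm(Omega_true, axis=1, keepdims=True)
    D = 1.0 - np.abs(W1 @ W2.T)            # pairwise distances between atoms
    return float(np.mean(D.min(axis=0) < tol))
```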

First, we varied the number of nonzero elements per signal to examine the performance under different sparsity levels. The Hoyer sparsities of the corresponding ground-truth coefficient matrices are 1, 0.95, 0.91, 0.87, 0.84, and 0.81. Figure 3 presents the recovery curves of the analysis dictionary, from which we can see that the recovery rate reaches 100% in several of the settings and exceeds 99% in the remaining ones. Figure 4 presents the Hoyer sparsity of the learned sparse coefficients, showing that the Hoyer sparsity converges in every setting and nearly reaches the Hoyer sparsity of the corresponding ground-truth coefficient matrix.

Next, we fixed the sparsity level corresponding to a ground-truth Hoyer sparsity of 0.91 and investigated the sensitivity of the algorithm to the other experimental settings. Figure 5 shows the recovery curves of the analysis dictionary under these different settings; the recovery ratio reaches about 100% in many of them. Figure 6 presents the Hoyer sparsity of the learned sparse coefficients, which converges to nearly the same value as the ground-truth Hoyer sparsity of 0.91.

We then compared our proposed algorithm with the existing determinant-based algorithm SADL-DET [26], which is also designed for nonnegative signals. We compared the recovery rate and the Hoyer sparsity of the two algorithms for several problem sizes. Figure 7 shows the results for the first case, in which our proposed algorithm converges faster than SADL-DET. Figure 8 shows the results for the second case, in which our proposed algorithm again converges faster on the recovery curve and obtains a better recovery rate. Figure 9 shows the results for the third case, which confirms that our proposed algorithm achieves good results.

We also compared the computational cost. Figure 10 indicates that our proposed algorithm is markedly faster than SADL-DET.

5.3. Image Denoising

We tested our proposed method on a practical image denoising problem [18]. The noise level and the quality of denoising were evaluated by the peak signal-to-noise ratio (PSNR), where PSNR (dB) is defined as $10\log_{10}(I_{\max}^2 / \mathrm{MSE})$, with $I_{\max}$ the peak intensity value and $\mathrm{MSE}$ the mean squared error between the original signal and the signal polluted by noise. The tested images, as well as the tested noise levels, are from the Cropped Yale face database (http://vision.ucsd.edu/extyaleb/CroppedYaleBZip/CroppedYale.zip). Note that, to ensure that the observed signals are nonnegative, the noise added to the original images consists of nonnegative uniformly distributed pseudorandom values.
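A small helper for the PSNR computation together with the nonnegative-noise setup; the peak value of 255 assumes 8-bit images, and the image size and noise amplitude below are arbitrary illustrative choices.

```python
import numpy as np

def psnr(x, x_hat, peak=255.0):
    """Peak signal-to-noise ratio in dB (peak=255 assumes 8-bit images)."""
    mse = np.mean((np.asarray(x, float) - np.asarray(x_hat, float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

# Example: corrupt an image with nonnegative uniformly distributed noise,
# as done in the paper's denoising experiment.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64)).astype(float)
noisy = np.clip(img + 25.0 * rng.random(img.shape), 0, 255)
print(round(psnr(img, noisy), 2))
```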

Figure 11 shows original, noisy, and denoised images produced by our proposed algorithm. We then compare our proposed algorithm with an analysis-model-based algorithm, Constrained Overcomplete Analysis Operator Learning (COAOL), and we also report the denoising results obtained with a fixed finite difference (FD) operator. The average PSNRs of the denoised results for the different face cases are presented in Table 1, which shows that our algorithm performs well in image denoising. To see the denoising performance of the different methods more intuitively, we take faces 05, 10, 15, and 20 as examples. Figure 12 shows the original faces, the noisy faces, and the faces denoised by the different methods (COAOL, ours, and FD). Although COAOL performs well for general signals, it cannot achieve good results under nonnegative noise. Our proposed algorithm has a clear advantage in nonnegative image denoising.

6. Conclusion

In this study, we have proposed a novel and efficient analysis sparse representation algorithm with the determinant-type sparsity measure, which focuses on nonnegative signal processing. We separated the whole problem into three subproblems: analysis dictionary update, sparse coding, and signal recovery. In the sparse coding stage, we employed DC programming to solve the nonconvex minimization problem. The experimental results verify the effectiveness of the proposed algorithm and of the determinant measure of sparsity. Remarkable advantages of the proposed algorithm include its faster running time and its good performance in image denoising. Moving forward, we plan to apply the proposed algorithm to further applications, such as image inpainting and superresolution.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported in part by NEDO and JSPS KAKENHI 26730130 and 16K00335.