Abstract

The dictionary learning problem has been an active research topic for decades. Most existing learning methods train the dictionary to adapt to a particular class of signals. However, as the number of dictionary atoms is increased so that the signals can be represented more sparsely, the coherence between the atoms becomes higher. According to greedy-pursuit and compressed sensing theory, this works against the implementation of sparse coding. In this paper, a novel approach is proposed to learn a dictionary that minimizes the sparse representation error for the training signals while taking the coherence into consideration. The coherence is constrained by making the Gram matrix of the desired dictionary approximate an identity matrix of proper dimension. The method for handling the proposed model is mainly based on an alternating minimization procedure and, in each step, the closed-form solution is derived. A series of experiments on synthetic data and audio signals is carried out to demonstrate the promising performance of the learnt incoherent dictionary and the superiority of the learning method over existing ones.

1. Introduction

Sparse representation (SR) theory [1, 2] indicates that a signal can be represented as a linear combination of a few atoms of a prespecified dictionary. It is an evolving field, with state-of-the-art results in many signal processing tasks, such as coding, denoising, face recognition, deblurring, and compressed sensing [3–7].

A fundamental consideration in employing the above theory is the choice of the dictionary, and this leads to the well-known dictionary learning (DL) problem. DL has attracted a lot of attention since its introduction at the end of the last century [8, 9]. Most of the research has been devoted to learning a data-adaptive dictionary so that a particular class of signals can be sparsely represented in this dictionary with low approximation error.

Under the SR framework, a signal vector $y \in \mathbb{R}^n$ can be expressed in the form of
$$y = Dx, \tag{1}$$
where $D = [d_1, d_2, \ldots, d_m] \in \mathbb{R}^{n\times m}$ is the dictionary with its columns $d_i = D(:, i)$ referred to as atoms (throughout this paper, MATLAB notations are used) and $x \in \mathbb{R}^m$ is the corresponding sparse coefficient vector.

Let $x = [x_1, x_2, \ldots, x_m]^T$ with $x_i$ being its $i$th element. The $\ell_p$-norm of vector $x$ is defined as
$$\|x\|_p \triangleq \Big(\sum_{i=1}^{m}|x_i|^p\Big)^{1/p}. \tag{2}$$
Note that $\|x\|_p$ is not a norm in a strict sense for $0 \le p < 1$. For convenience, $\|x\|_0$ is used to denote the number of nonzero elements in $x$. A vector $y$ given by (1) is said to be $K$-sparse in $D$ if $\|x\|_0 \le K$.

Let $Y = [y_1, y_2, \ldots, y_L] \in \mathbb{R}^{n\times L}$ be a set of training samples from a class of signals to be considered. The basic problem of DL is to find a dictionary $D$ such that, for each $y_l$, there exists a vector $x_l$ that is sparse. Such a problem has been widely investigated during the last decade or so [10–12] and can be formulated as
$$\min_{D, X}\ \|Y - DX\|_F^2 + \sum_{l=1}^{L}\lambda_l K_l, \tag{3}$$
where $\|\cdot\|_F$ denotes the Frobenius norm, $\{\lambda_l\}$ are proper constants, $K_l = \|x_l\|_0$ is the sparsity of the sparse vector $x_l$, and
$$X = [x_1, x_2, \ldots, x_L]. \tag{4}$$
Such a problem is difficult to solve as it is nonconvex in $D$ and $X$, and the $\ell_0$-norm is nonsmooth and highly unstable. A popularly used approach is based on the alternating minimization strategy. A two-stage procedure is usually carried out for solving the above problem and also for avoiding the selection of $\{\lambda_l\}$ [10–12]. The problem in the first stage is referred to as sparse coding, aiming at finding the (column) sparse matrix $X$ with a given $D$; that is,
$$\min_{X}\ \|Y - DX\|_F^2 \quad \text{s.t. } \|x_l\|_0 \le K,\ \forall l. \tag{5}$$
Note that the equivalent expression to (5) where the constraint is a fixed sparse representation error (SRE) level can also be formulated as
$$\min_{X}\ \sum_{l=1}^{L}\|x_l\|_0 \quad \text{s.t. } \|Y - DX\|_F^2 \le \epsilon, \tag{6}$$
with $\epsilon$ being the error threshold. Such a problem can be solved using the orthogonal matching pursuit (OMP) based methods [13, 14]. Furthermore, it can be shown that the solution of the above problem is the same as that of the $\ell_1$-based minimization below:
$$\min_{X}\ \sum_{l=1}^{L}\|x_l\|_1 \quad \text{s.t. } \|Y - DX\|_F^2 \le \epsilon, \tag{7}$$
while the latter can be addressed using algorithms such as basis pursuit (BP) [15] and other $\ell_1$-based optimization techniques [16].
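For illustration, a minimal greedy solver in the spirit of OMP for problem (5) could be sketched as follows; this is a generic sketch with assumed names (the routine `omp`, a fixed sparsity `K`), not the exact implementation of [13, 14].

```python
import numpy as np

def omp(D, y, K):
    """Greedy sparse coding: approximately solve min ||y - D x||_2 s.t. ||x||_0 <= K."""
    n, m = D.shape
    residual = y.copy()
    support = []
    x = np.zeros(m)
    for _ in range(K):
        # Select the atom most correlated with the current residual.
        correlations = np.abs(D.T @ residual)
        correlations[support] = 0.0              # do not reselect chosen atoms
        support.append(int(np.argmax(correlations)))
        # Refit all coefficients on the current support by least squares.
        Ds = D[:, support]
        coeffs, *_ = np.linalg.lstsq(Ds, y, rcond=None)
        residual = y - Ds @ coeffs
    x[support] = coeffs
    return x
```

Applying such a routine column by column to $Y$ with a fixed $D$ produces the sparse matrix $X$ in (5).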

Many algorithms for solving (3) differ from each other mainly in the second stage, that is, dictionary updating. For the dictionary $D \in \mathbb{R}^{n\times m}$, in order to code the signals of interest more sparsely, we usually set $m > n$, which means that $D$ is overcomplete. However, this redundancy increases the pairwise similarity of dictionary atoms. According to the work in [13], such similarity has a direct influence on the dictionary's performance, especially on the accuracy of the sparse coding stage. If any two atoms degenerate to the same vector, this will lead to overfitting to the training data. Thus, an incoherent dictionary is expected to improve the performance of the SR model.

Yaghoobi et al. proposed a design method for parametric dictionaries [17]. The authors attempted to optimize the dictionary so that the corresponding Gram matrix approximates the Gram of an equiangular tight frame (ETF), which possesses good coherence behavior. However, this method relies heavily on a priori knowledge of an appropriate parameter-selection criterion related to a given class of signals. A new algorithm named INK-SVD was developed in [18]. In each iteration of the K-SVD algorithm [11], the dictionary updating stage is followed by an additional decorrelation step: each pair of atoms whose coherence is above the threshold has its inner angle increased symmetrically so as to reduce the coherence. But this procedure implicitly destroys the original SR result from the K-SVD algorithm. To compensate for this problem, the authors of [19] improved the work of [18] by incorporating a new decorrelation step (also related to the ETF through its low coherence) and a dictionary rotation operation into the update stage. In [20], a weighting model was formulated to balance the coherence of the dictionary and the sparse representation ability, and a gradient-based method was carried out for solving the corresponding problem.

The main objective of this paper is to propose a new incoherent dictionary learning (IDL) method that constrains the coherence of the dictionary while minimizing the SRE. The contributions are threefold:
(i) A novel model is proposed for learning an incoherent dictionary. The main contribution lies in the dictionary updating procedure: the SRE, that is, $\|Y - DX\|_F^2$, is minimized under a coherence constraint obtained by making the corresponding Gram matrix approximate an identity matrix of proper dimension.
(ii) An iterative algorithm that updates the sparse coefficients and the components of the dictionary alternately is put forward to solve the design problem. In every step of dictionary updating, the solution for each component of the dictionary is derived analytically.
(iii) A series of experiments on synthetic data and audio signals is carried out to demonstrate the performance of each compared algorithm.

The remainder of this paper is arranged as follows. In Section 2, some preliminaries are provided and the main issue of learning an incoherent dictionary is formulated. The algorithm proposed for addressing the corresponding design problem is investigated in Section 3. Simulations are carried out in Section 4 to examine the performance of the proposed algorithm and to compare it with existing ones. Some concluding remarks are given in Section 5.

2. Preliminaries and Problem Formulation

In this section, some preliminaries are introduced and the two main competing methods considered in this paper are reviewed in detail. Based on these, we formulate the problem of incoherent dictionary learning with the purpose of improving the approximation performance of the dictionary for a particular class of signals under a coherence constraint.

The most fundamental quantity associated with a dictionary is the mutual coherence (MC) [21]. MC indicates the degree of similarity between different dictionary columns. It equals the maximum absolute (normalized) inner product between two distinct atoms:
$$\mu(D) \triangleq \max_{1\le i\ne j\le m}\frac{|d_i^T d_j|}{\|d_i\|_2\|d_j\|_2}, \tag{8}$$
where $(\cdot)^T$ denotes the transpose operator. As shown in [21], a $K$-sparse signal generated according to (1) can be exactly recovered with OMP as long as
$$K < \frac{1}{2}\Big(1 + \frac{1}{\mu(D)}\Big). \tag{9}$$
Roughly speaking, MC measures how much two atoms can look alike. Equation (9) is just a worst-case bound and only reflects the most extreme correlations in the dictionary. Nevertheless, MC is easy to manipulate and it captures well the behavior of some dictionaries. Generally, a dictionary is called incoherent if the corresponding MC is small [18, 19]. Besides, as pointed out in [19], the coherence of a dictionary is related to the condition numbers of its subdictionaries. This implies that achieving a low MC value results in well-conditioned subdictionaries.

Define the Gram matrix of $D$ as
$$G \triangleq D^T D. \tag{10}$$
It is common to study the MC in (8) via the Gram matrix. Let $W$ be the diagonal matrix whose $i$th diagonal element is given by $1/\|d_i\|_2$ for $i = 1, \ldots, m$. The normalized Gram matrix of $D$, denoted as $\bar{G} \triangleq WGW$, then satisfies $\bar{G}(i, i) = 1$, $\forall i$. Obviously, $\mu(D) = \max_{i\ne j}|\bar{G}(i, j)|$.

For $m > n$, it has been shown in [22] that $\mu(D)$ is bounded from below by
$$\mu(D) \ge \mu_{\min} \triangleq \sqrt{\frac{m-n}{n(m-1)}}, \tag{11}$$
with $\mu_{\min}$ being the Welch bound. If each (normalized) atomic inner product meets this bound, that is, $|\bar{G}(i, j)| = \mu_{\min}$ for all $i \ne j$, the dictionary is called an ETF. An ETF has a very nice MC behavior and has been considered for use in optimal dictionary design [17, 19].
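As a quick numerical illustration of (8)–(11), the snippet below (an assumed helper, not part of the original paper) computes the normalized Gram matrix, the mutual coherence, and the Welch bound for a given dictionary.

```python
import numpy as np

def mutual_coherence(D):
    """Return the mutual coherence of D and the Welch bound for its dimensions."""
    n, m = D.shape
    Dn = D / np.linalg.norm(D, axis=0, keepdims=True)   # normalize the atoms
    G = Dn.T @ Dn                                       # normalized Gram matrix, cf. (10)
    mu = np.max(np.abs(G - np.eye(m)))                  # largest off-diagonal magnitude, cf. (8)
    welch = np.sqrt((m - n) / (n * (m - 1)))            # Welch bound, cf. (11)
    return mu, welch

mu, welch = mutual_coherence(np.random.randn(20, 50))
print(f"mu = {mu:.3f}, Welch bound = {welch:.3f}")
```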

2.1. Related Works

It is worth noting that ETFs only exist for matrices whose dimensions satisfy $m \le n(n+1)/2$ if the atoms are real. So, one usually replaces the set of ETF Grams with a relaxed version [17, 19] that is defined as
$$\mathcal{H}_\theta \triangleq \{H \in \mathbb{R}^{m\times m}: H = H^T,\ H(i, i) = 1\ \forall i,\ \max_{i\ne j}|H(i, j)| \le \theta\}, \tag{12}$$
where $\theta \ge \mu_{\min}$ is a constant to control the searching space. Clearly, when $\theta = \mu_{\min}$, $\mathcal{H}_\theta$ contains all the ETF Grams.

Besides the space $\mathcal{H}_\theta$, the authors of [19] define a spectral constraint set as
$$\mathcal{F} \triangleq \{F \in \mathbb{R}^{m\times m}: F = F^T,\ \operatorname{eig}(F) \ge 0,\ \operatorname{rank}(F) \le n\}. \tag{13}$$
Here $\operatorname{eig}(\cdot)$ returns the vector of eigenvalues and $\operatorname{rank}(\cdot)$ is the rank operator. The algorithm for learning an incoherent dictionary proposed in [19] can be outlined as follows:
(i) Sparse coding with OMP.
(ii) Dictionary updating employing K-SVD.
(iii) Atom decorrelation through an iterative projection procedure.
(iv) Dictionary rotation to minimize the approximation error while keeping the MC unchanged.
The main contributions of [19] lie in the last two steps. The atom decorrelation is executed by iteratively projecting the Gram of the output dictionary of K-SVD between the sets $\mathcal{H}_\theta$ and $\mathcal{F}$ until a stopping criterion is met. With the singular value decomposition (SVD) of the resulting positive semidefinite Gram matrix $\hat{G}$ being expressed as
$$\hat{G} = Q\Lambda Q^T, \tag{14}$$
where $Q$ is orthonormal and $\Lambda$ is the diagonal singular value matrix with all its elements being nonnegative, the incoherent dictionary can be obtained as
$$D = W\Lambda^{1/2}(1{:}n, :)\,Q^T \triangleq W\hat{D}, \tag{15}$$
with $W$ being an arbitrary $n\times n$ orthonormal matrix. Finally, the authors exploit this degree of freedom to further reduce the SRE by solving
$$\min_{W\in\mathcal{W}}\ \|Y - W\hat{D}X\|_F^2, \tag{16}$$
where $\mathcal{W}$ is the set of $n\times n$ orthonormal matrices. This is the rotation procedure.

Remark 1. Compared with the decorrelation operation in [18], the above-mentioned atom decorrelation can achieve a much smaller MC value. Besides, the additional rotation procedure can partially recover the SR ability. However, the approximation performance of the dictionary is severely degraded by the iterative projections. Though the dictionary rotation procedure is carried out for compensation, the effect of this single degree of freedom on the SR ability is quite limited.

In [20], the authors consider another strategy for IDL, where the dictionary's coherence is minimized along with the SRE. The cost function can be expressed as
$$f(D) \triangleq \|Y - DX\|_F^2 + \lambda\|D^T D - I_m\|_F^2, \tag{17}$$
with $I_m$ denoting the identity matrix of dimension $m$. It is clear that $I_m$ is the simplest ETF Gram (with coherence equal to zero). The Lagrange multiplier $\lambda$ controls the trade-off between minimizing the SRE and minimizing the dictionary's coherence. With the gradient of (17) calculated as
$$\nabla_D f(D) = -2(Y - DX)X^T + 4\lambda D(D^T D - I_m), \tag{18}$$
the update of the dictionary is then executed by the steepest descent algorithm [20].
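For concreteness, one steepest-descent update under a cost of the form (17) could look like the sketch below; the step size `eta` and the column renormalization are illustrative assumptions and not the exact settings of [20].

```python
import numpy as np

def gradient_step(D, Y, X, lam, eta):
    """One descent step on ||Y - D X||_F^2 + lam * ||D^T D - I||_F^2, cf. (17)-(18)."""
    m = D.shape[1]
    grad = -2.0 * (Y - D @ X) @ X.T + 4.0 * lam * D @ (D.T @ D - np.eye(m))
    D = D - eta * grad
    return D / np.linalg.norm(D, axis=0, keepdims=True)  # keep the atoms unit-norm
```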

Remark 2. (i) The choice of $\lambda$ remains open-ended and no selection criterion is introduced in [20]. From the simulation results of [20], a larger $\lambda$ leads to better performance.
(ii) It is well known that gradient-based algorithms may easily fall into a local minimum if the initialization is not properly set [23, 24]. As a gradient-based method is used for solving (17), there remains room to improve the efficiency and accuracy.

2.2. Problem Formulation

Let $D \in \mathbb{R}^{n\times m}$ be the dictionary, let $Y \in \mathbb{R}^{n\times L}$ be the signal set with columns $y_l$, and let $X \in \mathbb{R}^{m\times L}$ be the corresponding sparse coefficient matrix of $Y$ in $D$ as defined previously. For the problem indicated in (3), we update $X$ and $D$ alternately. For a fixed $D$, $X$ can be calculated by the greedy algorithms or the $\ell_1$-based convex optimization methods. In the following, we focus our discussion on the dictionary updating stage. For the traditional case, that is,
$$\min_{D}\ \|Y - DX\|_F^2, \tag{19}$$
the authors of [10] simply update the dictionary as $D = YX^T(XX^T)^{-1}$. But when $X$ is not of full row rank, this method fails to work. The K-SVD algorithm [11] minimizes (19) for each atom separately. When updating the dictionary, the coefficients are also renewed simultaneously. In every iteration between coefficients and dictionary, it needs $m$ SVD operations. It is a time-consuming algorithm and not well suited to enforcing a coherence constraint, which is important for the implementation of sparse coding.

As an ETF can achieve a small MC value, this motivates us to design a dictionary that is as close as possible to an ETF [17–20]. So the following constrained model is proposed:
$$\min_{D, X}\ \|Y - DX\|_F^2 \quad \text{s.t. } D \in \arg\min_{\tilde{D}}\|\tilde{D}^T\tilde{D} - I_m\|_F^2,\ \ \|x_l\|_0 \le K\ \forall l. \tag{20}$$
The closed-form solution set of $\min_{\tilde{D}}\|\tilde{D}^T\tilde{D} - I_m\|_F^2$ has been derived in [7, 24] as
$$\tilde{D} = U[I_n\ \ \mathbf{0}]V^T, \tag{21}$$
where $U$ and $V$ are both arbitrary orthonormal matrices of dimensions $n\times n$ and $m\times m$, respectively. So (20) can be rewritten as
$$\min_{U, V, X}\ \|Y - U[I_n\ \ \mathbf{0}]V^T X\|_F^2 \quad \text{s.t. } U^T U = I_n,\ V^T V = I_m,\ \|x_l\|_0 \le K\ \forall l. \tag{22}$$
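The structure of the solution set (21) can be verified numerically: for any orthonormal $U$ and $V$, the dictionary $D = U[I_n\ \mathbf{0}]V^T$ satisfies $DD^T = I_n$, and its Gram matrix is an orthogonal projection, which is as close to $I_m$ as the Gram of an $n\times m$ matrix can be. The following is a small sanity check under these assumptions (dimensions chosen arbitrarily).

```python
import numpy as np

n, m = 20, 50
U, _ = np.linalg.qr(np.random.randn(n, n))            # random orthonormal U (n x n)
V, _ = np.linalg.qr(np.random.randn(m, m))            # random orthonormal V (m x m)
Pi = np.hstack([np.eye(n), np.zeros((n, m - n))])     # [I_n  0]
D = U @ Pi @ V.T

print(np.allclose(D @ D.T, np.eye(n)))                # True: all nonzero singular values equal 1
print(np.linalg.norm(D.T @ D - np.eye(m)) ** 2)       # m - n, the smallest possible ||D^T D - I||_F^2
```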

Remark 3. (i) Here we choose the identity matrix as the target Gram for the following reasons: it is easy to handle (avoiding the iterative projection between $\mathcal{H}_\theta$ and $\mathcal{F}$ as carried out in [19]), and expression (21) contains more degrees of freedom than (15) for further minimizing the SRE.
(ii) As pointed out in [20], a flatter singular value spectrum of the dictionary indicates a less coherent dictionary. Our design strategy works under constraint (21), which means that the nonzero singular values of the designed dictionary are all equal (the same effect as (17) with $\lambda \to \infty$). Hence, better coherence performance can be expected.

3. Coherence Constrained Dictionary Learning

In this section, an alternating minimization algorithm is developed to address the dictionary learning problem (22).

3.1. Algorithm for IDL

To solve the above multivariate problem and also avoid the selection of the penalty constants $\{\lambda_l\}$, the alternating minimization strategy as introduced for addressing (3) seems a natural choice. The pseudocode of the proposed algorithm (named CCDL, standing for Coherence Constrained Dictionary Learning) is summarized as follows:

Initialization:
$D_{\text{init}}$: initial random dictionary;
$Y$: training data;
$T_1$: number of iterations between sparse coding and dictionary updating;
$T_2$: number of iterations between updating $U$ and $V$.
Calculate the SVD of the initial dictionary as $D_{\text{init}} = U\Sigma V^T$ and set $U^{(0)} = U$ and $V^{(0)} = V$.

For $t = 1, 2, \ldots, T_1$, do the following.

Step 1. Set $D^{(t-1)} = U^{(t-1)}[I_n\ \ \mathbf{0}](V^{(t-1)})^T$, and update $X$ column by column by solving
$$\min_{x_l}\ \|y_l - D^{(t-1)}x_l\|_2^2 \quad \text{s.t. } \|x_l\|_0 \le K, \quad l = 1, \ldots, L, \tag{23}$$
with an OMP based algorithm, and take the approximate solution as $X^{(t)}$.
Set $k = 1$, $U^{(0)} = U^{(t-1)}$, and $V^{(0)} = V^{(t-1)}$.

Step 2. While $k \le T_2$, do the following:
(a) For fixed $V^{(k-1)}$, update $U^{(k)}$ by solving
$$\min_{U}\ \|Y - U[I_n\ \ \mathbf{0}](V^{(k-1)})^T X^{(t)}\|_F^2 \quad \text{s.t. } U^T U = I_n. \tag{24}$$
The analytical solution will be given in the next subsection.
(b) For fixed $U^{(k)}$, update $V^{(k)}$ by solving
$$\min_{V}\ \|Y - U^{(k)}[I_n\ \ \mathbf{0}]V^T X^{(t)}\|_F^2 \quad \text{s.t. } V^T V = I_m. \tag{25}$$
The solution will be derived in the next subsection.
(c) Set $k = k + 1$ and return to Step 2.

Step 3. Set $U^{(t)} = U^{(T_2)}$ and $V^{(t)} = V^{(T_2)}$. End the for loop if $t = T_1$.

End. Output $D = U^{(T_1)}[I_n\ \ \mathbf{0}](V^{(T_1)})^T$.

3.2. Update the Components of Dictionary

Now, let us focus on solving (24) and (25). For convenience, we omit the iteration superscripts in the expressions. As the sparse coefficient matrix $X$ is assumed to be fixed in Step 2, we can rewrite the cost function of (22) as
$$f(U, V) \triangleq \|Y - U[I_n\ \ \mathbf{0}]V^T X\|_F^2, \tag{26}$$
where the orthonormal matrices $U$ and $V$ can be updated alternately. Recall that the dictionary has the following SVD (for an arbitrary matrix $A$, the general SVD form can be expressed as $A = P_A\Sigma_A Q_A^T$ with $P_A$ and $Q_A$ orthonormal and $\Sigma_A$ diagonal with nonnegative entries):
$$D = U[I_n\ \ \mathbf{0}]V^T. \tag{27}$$
Define
$$\Pi \triangleq [I_n\ \ \mathbf{0}], \tag{28}$$
with $\mathbf{0}$ being the $n\times(m-n)$ zero matrix. We then have two alternative expressions for $f(U, V)$:
$$f(U, V) = \|Y - U\Pi V^T X\|_F^2 = \|Y\|_F^2 + \|\Pi V^T X\|_F^2 - 2\operatorname{tr}(U\Pi V^T X Y^T). \tag{29}$$
Assume that initial $U$ and $V$ are given. In what follows, we derive a procedure for updating them such that
$$f(U^{\star}, V^{\star}) \le f(U, V), \tag{30}$$
where $U^{\star}$ and $V^{\star}$ denote the updated matrices.

3.2.1. Update $U$

First of all, consider
$$\min_{U}\ \|Y - U\Pi V^T X\|_F^2 \quad \text{s.t. } U^T U = I_n. \tag{31}$$
This model can be solved by the following theorem [19].

Theorem 4. For both $A$ and $B$ belonging to $\mathbb{R}^{p\times q}$, the solution of
$$\min_{Q}\ \|A - QB\|_F^2 \quad \text{s.t. } Q^T Q = I_p \tag{32}$$
can be characterized as
$$Q^{\star} = P_1 P_2^T, \tag{33}$$
where $P_1$ and $P_2$ are both orthonormal matrices given by the following SVD:
$$AB^T = P_1\Sigma_{AB}P_2^T. \tag{34}$$

Let
$$Y(\Pi V^T X)^T = P_1\Sigma_1 P_2^T \tag{35}$$
be the SVD as in (34). The solution to (31) can then be derived as
$$U = P_1 P_2^T. \tag{36}$$
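The update (35)–(36) is a standard orthogonal Procrustes step and can be sketched as follows; the helper name `procrustes` is an assumption for illustration.

```python
import numpy as np

def procrustes(A, B):
    """Theorem 4: argmin over orthonormal Q of ||A - Q B||_F^2, i.e., Q = P1 P2^T with A B^T = P1 Sigma P2^T."""
    P1, _, P2t = np.linalg.svd(A @ B.T)
    return P1 @ P2t

# Update (36): with V and X fixed, the new U is
#   U = procrustes(Y, (V.T @ X)[:n, :])
```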

3.2.2. Update $V$

Now, let us consider
$$\min_{V}\ \|Y - U\Pi V^T X\|_F^2 \quad \text{s.t. } V^T V = I_m. \tag{37}$$
Obviously, (30) is satisfied with $U^{\star}$ and $V^{\star}$ obtained as the solutions of (31) and (37), respectively.

Note that, unlike (31), problem (37) is not a standard orthogonal Procrustes problem, since only the first $n$ rows of $V^T X$ enter the residual. Define the auxiliary function
$$g(V, Z) \triangleq \|Y - U\Pi V^T X\|_F^2 + \|Z - [\mathbf{0}_{(m-n)\times n}\ \ I_{m-n}]V^T X\|_F^2, \tag{38}$$
where $Z \in \mathbb{R}^{(m-n)\times L}$ is an auxiliary matrix. It is clear that such a function has the following properties:
$$g(V, Z) \ge f(U, V), \quad \forall V, Z, \tag{39}$$
$$g(V, Z) = f(U, V), \quad \text{for } Z = [\mathbf{0}_{(m-n)\times n}\ \ I_{m-n}]V^T X, \tag{40}$$
where $\Pi$ is as defined in (28).

Denote
$$V^{(j+1)} \triangleq \arg\min_{V:\ V^T V = I_m} g(V, Z^{(j)}), \quad j = 0, 1, 2, \ldots, \tag{41}$$
where each $Z^{(j)}$ with $j \ge 0$ is constructed from the current iterate $V^{(j)}$ using
$$Z^{(j)} = [\mathbf{0}_{(m-n)\times n}\ \ I_{m-n}](V^{(j)})^T X. \tag{42}$$
It follows from (38) and Theorem 4 that the solution to (41) is given by
$$V^{(j+1)} = P_4 P_3^T, \quad \text{where } \Big(\Pi^T U^T Y + [\mathbf{0}_{(m-n)\times n}\ \ I_{m-n}]^T Z^{(j)}\Big)X^T = P_3\Sigma_3 P_4^T. \tag{43}$$
It turns out from (39) and (40) that
$$f(U, V^{(j+1)}) \le g(V^{(j+1)}, Z^{(j)}), \qquad g(V^{(j)}, Z^{(j)}) = f(U, V^{(j)}), \tag{44}$$
while (41) indicates that
$$g(V^{(j+1)}, Z^{(j)}) \le g(V^{(j)}, Z^{(j)}). \tag{45}$$
This implies that constructing $Z^{(j)}$ using (42) and then solving (41) makes $\{f(U, V^{(j)})\}$ a nonincreasing sequence and, therefore, the solution to (37) can be estimated with
$$V = V^{(T_3)}, \tag{46}$$
where $T_3$ denotes the number of inner iterations.

Remark 5. (i) It may be possible that $V^{(T_3)}$ is not the exact minimizer of (37), but $f(U, V^{(j+1)}) \le f(U, V^{(j)})$ is always true. Therefore, $V$ can safely be updated with $V^{(T_3)}$.
(ii) For the whole CCDL, there actually exist three loops (indexed by $t$, $k$, and $j$, resp.). For the loop indexed by $j$, (44) and (45) indicate that the procedure of updating $V$ makes $f(U, V)$ decrease as $j$ increases, so the solution (or an approximate one) of (37) can be obtained. Besides, the solution of (31) is derived analytically as (36). All of this results in the convergence of the second loop indexed by $k$, that is, dictionary updating. Assuming that OMP performs perfectly in the sparse coding stage, the nonincreasing trend of Step 1 is ensured. To sum up, the cost function (22) decreases in every step and hence the convergence of CCDL is guaranteed.
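Collecting the updates above, the dictionary updating stage (Step 2 of CCDL) can be sketched as below. The code reuses the `procrustes` helper given after Theorem 4; the function name, the identity initialization of $V$, and the default iteration counts are illustrative assumptions.

```python
import numpy as np

def dictionary_update(Y, X, n, T2=10, T3=5):
    """Alternately solve (31) and (37): min ||Y - U [I_n 0] V^T X||_F^2 over orthonormal U and V."""
    m = X.shape[0]
    V = np.eye(m)                             # simple initialization for this sketch
    for _ in range(T2):
        U = procrustes(Y, (V.T @ X)[:n, :])   # U-update, cf. (36)
        for _ in range(T3):                   # V-update via the construction (41)-(43)
            Z = (V.T @ X)[n:, :]              # auxiliary rows, cf. (42)
            A = np.vstack([U.T @ Y, Z])       # corresponds to \tilde{U}^T \tilde{Y}
            V = procrustes(A, X).T            # V^T solves the Procrustes problem, cf. (43)
    return U, V
```

Each pass of the loops does not increase the cost (26), in line with Remark 5.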

4. Experiment Results

In this section, we evaluate the performance of the proposed model and algorithm with synthetic data and audio signals.

4.1. Convergence Performance

Firstly, several simulations are carried out to verify the convergence performance of the proposed CCDL. As the main contributions of the new method lie in the dictionary updating stage, we focus on the performance of designing the orthonormal matrices $U$ and $V$, that is, minimizing (26).

The dictionary dimensions $n$ and $m$ and the number of signals $L$ are fixed. There exist two loops in dictionary updating, as introduced in the second point of Remark 5, indexed by $k$ and $j$, respectively. The maximum iteration numbers $T_2$ and $T_3$ are both fixed to the same value.

4.1.1. For Synthetic Dictionary

The coefficient matrix $X$ is taken as a Gaussian random matrix. Two orthonormal matrices $U_0$ and $V_0$ are generated to form the authentic dictionary $D_0$:
$$D_0 = U_0[I_n\ \ \mathbf{0}]V_0^T. \tag{47}$$
Then $Y$ is produced as $Y = D_0 X$. The performance is evaluated by
$$e \triangleq \|Y - \hat{D}X\|_F^2, \tag{48}$$
with $\hat{D}$ being the learnt dictionary.
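Under the assumptions of the sketches above (and with illustrative dimensions), this synthetic test can be reproduced roughly as follows, using the `dictionary_update` routine sketched in Section 3.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, L = 16, 32, 200                                    # illustrative dimensions
U0, _ = np.linalg.qr(rng.standard_normal((n, n)))
V0, _ = np.linalg.qr(rng.standard_normal((m, m)))
Pi = np.hstack([np.eye(n), np.zeros((n, m - n))])
D0 = U0 @ Pi @ V0.T                                      # authentic dictionary, cf. (47)
X = rng.standard_normal((m, L))                          # Gaussian coefficient matrix
Y = D0 @ X

U, V = dictionary_update(Y, X, n)
e = np.linalg.norm(Y - U @ Pi @ V.T @ X, "fro") ** 2     # performance measure, cf. (48)
print(e)                                                 # expected to be small, cf. Figures 1 and 2
```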

Starting from an initial random dictionary, Figures 1 and 2 show the convergence performance of the loops indexed by $j$ (with $k = 1$) and $k$, respectively.

Remark 6. (i) As seen from the minimum values in Figures 1 and 2 (very close to zero), the design processes of $U$ and $V$ can result in a dictionary that is almost the same as the authentic one, $D_0$.
(ii) The loop indexed by $j$ is embedded in that of $k$. When $k = 1$, that is, the case of Figure 1, $e$ already achieves a minimum value that is very close to zero. This manifests that the orthonormal matrix $V$ plays the more important role in minimizing $e$. The result of Figure 2 also verifies this conclusion, as the value of $e$ converges in one iteration of the loop indexed by $k$. Recall Remarks 1 and 3, which state that one of the main differences between the proposed algorithm and the method in [19] is the extra degree of freedom $V$. So better sparse representation ability of the proposed algorithm can be expected.

Figures 1 and 2 show the efficiency of the proposed CCDL when the authentic dictionary is generated with the ideal form (47). In what follows, a randomly generated dictionary will be considered.

4.1.2. For Random Dictionary

In this case, the coefficient matrix $X$, the authentic dictionary $D_0$, and the initial dictionary are all chosen randomly with proper dimensions and without any correlation. $Y$ is produced as $Y = D_0 X$. Figures 3 and 4 depict the convergence performance of the loops indexed by $j$ (with $k = 1$) and $k$, respectively.

The phenomena observed in this case are similar to those of the previous one, except that the value of $e$ is larger compared to the case with the synthetic $D_0$. It should be pointed out that, for a random $D_0$, its singular values are not all equal [25], so $D_0$ does not admit the form (47). This explains why the minimum of $e$ cannot approach zero in this case.

4.2. Simulations with Synthetic Data

We now carry out experiments to illustrate the performance of dictionaries learnt with different approaches. For comparison, the algorithms in [11, 19, 20] are also performed. For convenience, the learning systems are denoted as $\mathcal{S}_{\text{CCDL}}$, $\mathcal{S}_{[11]}$, $\mathcal{S}_{[19]}$, and $\mathcal{S}_{[20]}$ for the proposed CCDL and the methods in [11], [19], and [20], respectively.

We generate two dictionaries $D_{\text{init}}$ and $D_0$, both with normally distributed entries. $D_{\text{init}}$ is used as the initial condition for executing the different learning algorithms, and $D_0$ is the authentic dictionary. A set of $K$-sparse vectors $\{x_l\}_{l=1}^{L}$ is produced, where the nonzero elements of each $x_l$ are randomly positioned and drawn i.i.d. from a Gaussian distribution with zero mean and unit variance. With the authentic dictionary $D_0$, the set of signal vectors is generated by $y_l = D_0 x_l$, $l = 1, \ldots, L$, for training the dictionaries.

The parameters $n$, $m$, $K$, and $L$ are fixed, and the number of iterations for dictionary learning is the same for all four methods. Besides, the number of iterative projections and rotations in [19] is fixed, and the gradient descent in [20] is executed a fixed number of times with a constant step size. For CCDL, the maximum iteration numbers for the loops indexed by $k$ and $j$, that is, $T_2$ and $T_3$, are fixed.

The mutual coherence performance of the different dictionaries is compared and the results are shown in Figure 5. For $\mathcal{S}_{[19]}$, the horizontal axis refers to the constant $\theta$ which controls the searching space in (12), while for $\mathcal{S}_{[20]}$ the horizontal axis indicates the weighting factor $\lambda$ in (17), varied over a grid of values. In order to have clear comparisons, some results beyond certain ranges have been omitted, mainly those corresponding to overly small parameter values.

With the synthetic data, we test the sparse representation abilities of the learnt dictionaries. The representation accuracy is usually quantified with the mean square error (MSE) defined as [11]
$$\text{MSE} \triangleq \frac{1}{nL}\sum_{l=1}^{L}\|y_l - \hat{y}_l\|_2^2, \tag{49}$$
where $\hat{y}_l = \hat{D}\hat{x}_l$ is the reconstructed signal, with $\hat{D}$ being the output dictionary of $\mathcal{S}_{\text{CCDL}}$, $\mathcal{S}_{[11]}$, $\mathcal{S}_{[19]}$, or $\mathcal{S}_{[20]}$ and $\hat{x}_l$ being the corresponding coefficients of $y_l$ in the different dictionaries calculated by OMP. Figure 6 depicts the MSE results of the different systems.

Remark 7. (i) As is known, for $\mathcal{S}_{[19]}$, when $\theta$ approaches $1$ the coherence constraint becomes inactive and the system regresses to $\mathcal{S}_{[11]}$. The results in Figures 5 and 6 confirm this conclusion for the cases where $\theta$ is close to $1$. Besides, if too small a $\theta$ is chosen, the MSE performance of $\mathcal{S}_{[19]}$ degenerates, though small mutual coherence values would be achieved.
(ii) The results of $\mathcal{S}_{[20]}$ fluctuate a lot in both tests. Though some surprisingly good performance is achieved, this superiority is highly sensitive to the data. As will be seen in the next experiment, when musical audio signals are tested, the fluctuations of $\mathcal{S}_{[20]}$ become gentle.
(iii) Judged by the recovery accuracy indicator that is most crucial for evaluating the systems' performance, that is, Figure 6, the results of $\mathcal{S}_{\text{CCDL}}$ are superior to those of the other three methods in most of the cases.

4.3. Experiments with Musical Audio Signals

The effectiveness of all the algorithms is evaluated on an audio signal coding task, which is popularly used for testing the performance of incoherent dictionaries [19, 20].

The audio signals are selected from the “testMusic16kHz” set of SMALLbox [26]. Following the operations in [19], we divide each recording into overlapping blocks with rectangular windows and arrange the resulting time-domain signals as columns of the training data matrix $Y$, which fixes $n$ and $L$ for each musical excerpt. An overcomplete Gabor dictionary is used as the initialization, so that $m > n$. The sparsity level $K$ is fixed for all the tests. The numbers of iterations are kept with the same settings as used for the synthetic data. When learning the dictionaries, the OMP algorithm is applied in the sparse coding stage.

The recovery accuracy is quantified with the signal-to-noise ratio (SNR) defined as [19]
$$\text{SNR} \triangleq 10\log_{10}\frac{\|Y\|_F^2}{\|Y - \hat{D}\hat{X}\|_F^2}\ (\text{dB}), \tag{50}$$
with $\hat{D}$ being the output dictionary of each learning method and $\hat{X}$ being the sparse coefficient matrix corresponding to $Y$ calculated from $\hat{D}$ by the $\ell_1$-Homotopy algorithm [16] with a prescribed stopping tolerance.
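The SNR in (50) can be computed directly from the training matrix and its sparse approximation; the helper below is a generic sketch (the $\ell_1$-Homotopy solver itself is not reproduced here, and any coefficient matrix $\hat{X}$ can be passed in).

```python
import numpy as np

def snr_db(Y, D_hat, X_hat):
    """SNR (in dB) of the approximation D_hat @ X_hat to the signal matrix Y, cf. (50)."""
    err = np.linalg.norm(Y - D_hat @ X_hat, "fro") ** 2
    return 10.0 * np.log10(np.linalg.norm(Y, "fro") ** 2 / err)
```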

For all of $\mathcal{S}_{\text{CCDL}}$, $\mathcal{S}_{[11]}$, $\mathcal{S}_{[19]}$, and $\mathcal{S}_{[20]}$, we test each of the ten musical excerpts in the “testMusic16kHz” set and keep the average results for comparison. The mutual coherence behavior of each of the learnt dictionaries is depicted in Figure 7.

When recovering with $\ell_1$-Homotopy, the SNR performance is shown in Figure 8.

Remark 8. In this case, where musical audio signals are tested and the $\ell_1$-Homotopy algorithm is applied for signal reconstruction, the fluctuations of $\mathcal{S}_{[20]}$ become gentle for both the SNR and the mutual coherence performance versus the weighting factor $\lambda$. For $\mathcal{S}_{[20]}$, the results of mutual coherence and SNR are also more consistent; that is, smaller mutual coherence leads to a higher SNR value. The performance of $\mathcal{S}_{\text{CCDL}}$ is superior to the others, and note that these results are obtained by averaging over the ten musical excerpts.

5. Conclusion

In this paper, we have investigated the problem of learning an incoherent dictionary. The contributions are threefold. The first is a novel model for IDL that minimizes the sparse representation error for the training signals under a coherence constraint, obtained by making the Gram matrix of the dictionary approximate the identity matrix. The second is an alternating minimization algorithm named CCDL for solving the learning problem, in which the solution for each component of the optimal dictionary is derived analytically. The last is a set of experiments on synthetic data and musical audio signals that demonstrate the superiority of the proposed model and algorithm.

Competing Interests

The authors declare that there are no competing interests regarding the publication of this paper.

Acknowledgments

This work was supported by NSFC Grants 61273195, 61473262, and 61503339, ZJNSF Grant LQ14F030008, and the Zhejiang Hua Yue Institute of Information and Data Processing.