Abstract

Optimizing the mutual coherence of a learned dictionary plays an important role in sparse representation and compressed sensing. In this paper, an efficient framework is developed to learn an incoherent dictionary for sparse representation. In particular, the coherence of a previous dictionary (or Gram matrix) is reduced sequentially by finding a new dictionary (or Gram matrix) that is closest to the reference unit norm tight frame of the previous dictionary (or Gram matrix). The optimization problem is solved by restricting the tightness and coherence alternately at each iteration of the algorithm. The significant and distinguishing aspect of our proposed framework is that the learned dictionary can approximate an equiangular tight frame. Furthermore, manifold optimization is used to avoid degrading the sparse representation while the coherence of the learned dictionary is being reduced; this optimization can be performed after the dictionary update process rather than during it. Experiments on synthetic and real audio data show that our proposed methods yield notably lower coherence, faster running times, and greater robustness than several existing methods.

1. Introduction

In recent years, research on dictionary learning has attracted a lot of attention because a learned dictionary captures some of the intrinsic features of training samples in many applications such as denoising [1], compressed sensing [2], pattern recognition, and classification tasks [3-5]. A learned dictionary allows a signal of interest to be represented as a linear combination of relatively few atoms, so that the representation coefficients are as sparse as possible. Hence, the problem of dictionary learning can be stated as follows [6]:

$$\min_{\mathbf{D},\mathbf{X}}\;\|\mathbf{Y}-\mathbf{D}\mathbf{X}\|_F^2\quad\text{s.t.}\;\mathbf{D}\in\mathcal{D},\;\mathbf{X}\in\mathcal{X},\tag{1}$$

where $\mathbf{Y}\in\mathbb{R}^{n\times N}$ contains the training samples, $\mathcal{D}$ is the admissible set of all column-normalized dictionaries, $\mathbf{D}\in\mathbb{R}^{n\times K}$ is an overcomplete dictionary ($K>n$), and each column $\mathbf{d}_i$ of $\mathbf{D}$ is referred to as an atom. $\mathcal{X}$ represents the admissible set of all sparse coefficient matrices (i.e., most of the entries are either zero or sufficiently small in magnitude), and $\mathbf{X}\in\mathbb{R}^{K\times N}$. $\|\mathbf{x}_i\|_0$ represents the number of nonzero entries in the coefficient vector $\mathbf{x}_i$.

Equation (1) is not a convex problem with respect to the pair $(\mathbf{D},\mathbf{X})$, so most dictionary learning methods employ alternating optimization over $\mathbf{D}$ and $\mathbf{X}$. The following two stages are repeated until convergence:

(1) Sparse coding:
$$\mathbf{X}^{(k+1)}=\arg\min_{\mathbf{X}\in\mathcal{X}}\;\|\mathbf{Y}-\mathbf{D}^{(k)}\mathbf{X}\|_F^2.\tag{2}$$
(2) Dictionary update:
$$\mathbf{D}^{(k+1)}=\arg\min_{\mathbf{D}\in\mathcal{D}}\;\|\mathbf{Y}-\mathbf{D}\mathbf{X}^{(k+1)}\|_F^2.\tag{3}$$

The first stage is sparse coding with $\mathbf{D}$ fixed, and the second stage is a dictionary update that updates some or all of the atoms with $\mathbf{X}$ fixed.
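To make the two-stage scheme concrete, the following NumPy sketch alternates a simple orthogonal matching pursuit for the sparse coding stage with a MOD-style least-squares dictionary update. The function names, dimensions, and sparsity level are illustrative placeholders rather than the exact setup used in this paper.

```python
import numpy as np

def omp(D, y, s):
    """Greedy orthogonal matching pursuit: select s atoms to approximate one signal y."""
    residual, support = y.copy(), []
    x = np.zeros(D.shape[1])
    for _ in range(s):
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs
    x[support] = coeffs
    return x

def learn_dictionary(Y, K, s, n_iter=30, seed=0):
    """Alternate sparse coding (stage 1, OMP) with a MOD-style update (stage 2)."""
    rng = np.random.default_rng(seed)
    n, N = Y.shape
    D = rng.standard_normal((n, K))
    D /= np.linalg.norm(D, axis=0)                       # unit-norm atoms
    for _ in range(n_iter):
        X = np.column_stack([omp(D, Y[:, i], s) for i in range(N)])   # stage 1
        D = Y @ X.T @ np.linalg.pinv(X @ X.T)                          # stage 2 (MOD)
        D /= np.maximum(np.linalg.norm(D, axis=0), 1e-12)
    return D, X
```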

(1) Related Work. Different applications tend to use different optimization algorithms for learning sparsifying dictionaries with the desired characteristics. Traditional dictionary learning methods, such as the method of optimal directions (MOD) [7] and K-means singular value decomposition (K-SVD) [8], aim at optimizing a dictionary to represent all training samples sparsely, but the coherence between atoms is ignored. However, many studies of compressed sensing focus on the mutual coherence of an effective dictionary (the product of a sensing matrix and a dictionary) [9-12], which is a key factor in controlling the support of the solutions of penalized least-squares and greedy pursuit problems. Furthermore, highly incoherent dictionaries tend to avoid ambiguity and improve noise stability when sparse coding is enforced. Therefore, an incoherent frame is typically applied to optimize the sensing matrix in compressed sensing. Tsiligianni et al. [13] constructed an incoherent frame to optimize a sensing matrix by averaged projections onto a Gram matrix and obtained better sparse signal recovery performance. Rusu and González-Prelcic [14] directly optimized the maximum inner product between pairs of atoms to construct incoherent frames using convex optimization. Unlike these works, we focus on learning an incoherent dictionary for sparse representation.

Much research has concentrated on reducing the coherence of a learned dictionary. Yaghoobi et al. [15, 16] introduced the optimization of dictionary coherence by imposing a minimal coherence constraint to design a parametric dictionary that deals in advance with a relatively well-known signal model. A penalty term on the coherence is added to the dictionary learning objective; therefore, (1) can be reformulated as follows [17-20]:

$$\min_{\mathbf{D},\mathbf{X}}\;\|\mathbf{Y}-\mathbf{D}\mathbf{X}\|_F^2+\lambda\sum_{i\neq j}\langle\mathbf{d}_i,\mathbf{d}_j\rangle^2\quad\text{s.t.}\;\mathbf{D}\in\mathcal{D},\;\mathbf{X}\in\mathcal{X},\tag{4}$$

where $\lambda>0$ weights the coherence penalty.

Inspired by MOD [7], Ramirez et al. [17] proposed the method of optimal coherence-constrained directions (MOCOD) to learn a dictionary. The limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm was used to co-optimize the coherence between atoms and the performance of sparse representation [18]. Abolghasemi et al. [19] constructed an incoherent dictionary by using steepest-gradient descent in the iterations of K-SVD. Bao et al. [20] proposed a hybrid alternating proximal algorithm for incoherent dictionary learning, accompanied by a convergence analysis and proof. The abovementioned methods cannot reduce the coherence substantially, because only the sum of the squared inner products of all atom pairs is minimized in the second term of (4). Meanwhile, a learned dictionary cannot reach an arbitrary target coherence. Mailhé et al. [21] proposed an incoherent K-SVD (INKSVD) algorithm, in which each pair of atoms whose coherence is higher than the target is decorrelated in the dictionary update after K-SVD is performed. The key idea is to cluster atoms and symmetrically decrease the correlation of each pair of atoms based on a greedy method. The main drawback is that, if the target coherence is set too low, the method does not perform well in sparse representation (Figure 5), and the computation rises dramatically (Table 2). Barchiesi and Plumbley [22] proposed an incoherent dictionary learning method that enforces iterative projections (IP) onto the spectral and structural constraint sets in order to obtain the optimal Gram matrix; dictionary optimization was then performed based on the orthogonal Procrustes problem (OPP) for better sparse representation performance. More recently, Rusu and González-Prelcic [14] directly constructed an incoherent dictionary followed by orthogonal constraints, as in [22], for sparse representation. Similar work on dictionary optimization has been done in [14, 22]. We report only results obtained using the methods from [21, 22], as they seem to provide the better-performing incoherent dictionaries and sparse representations. Note that it is a difficult task to obtain an arbitrarily low coherence, and the methods of [21, 22] do not approximate the flat spectrum of an equiangular tight frame (ETF) (see Section 4). Additionally, our proposed methods improve upon [21, 22]: Manopt is employed to solve the orthogonal Procrustes problem in the dictionary optimization, because reducing only the coherence of a learned dictionary degrades the performance of sparse representation. This is very different from the work done in [14, 21, 22] and is also the major contribution of our work.

(2) Our Contributions. There are three specific characteristics of our proposed incoherent dictionary learning methods that distinguish them from prior methods.
(1) Rather than using general dictionary learning methods, an efficient framework based on a unit norm tight frame (UNTF) is developed for solving the incoherent dictionary learning problem, which constrains the dictionary to approximate an ETF.
(2) The mutual coherence of the dictionary is reduced by alternately restricting the tightness and coherence, which yields a significantly lower coherence than those reported in [21, 22].
(3) We use manifold optimization (Manopt) to solve the optimization problem with orthogonal constraints, that is, (14), which aims to obtain better performance from the incoherent dictionaries in sparse representation. Experiments are carried out on synthetic data and real audio data to illustrate the better performance of our proposed methods.

(3) Organization of the Paper. The rest of this paper is organized as follows. Section 2 gives the definitions of mutual coherence and ETFs, after which our proposed algorithms are presented in Section 3. Section 4 gives the details of dictionary optimization employing Manopt. Section 5 reports on extensive experiments carried out on synthetic data and real audio data. Finally, conclusions are drawn in Section 6.

2. Incoherent Dictionary

2.1. The Mutual Coherence

The mutual coherence of a dictionary $\mathbf{D}$ is defined as the maximum absolute inner product between distinct atoms [23]:

$$\mu(\mathbf{D})=\max_{i\neq j}\frac{|\langle\mathbf{d}_i,\mathbf{d}_j\rangle|}{\|\mathbf{d}_i\|_2\,\|\mathbf{d}_j\|_2},\tag{5}$$

where $\mathbf{d}_i$ and $\mathbf{d}_j$ denote two different atoms.

The coherence measures the similarity between atoms. We have $0\le\mu(\mathbf{D})\le 1$, and a dictionary is considered incoherent if $\mu(\mathbf{D})$ is small. Mutual coherence provides an important sufficient condition for guaranteeing exact sparse signal recovery.

Theorem 1 (see [9, 10]). Let $\mathbf{D}$ be an overcomplete dictionary with mutual coherence $\mu(\mathbf{D})$. If the condition

$$\|\mathbf{x}\|_0<\frac{1}{2}\left(1+\frac{1}{\mu(\mathbf{D})}\right)\tag{6}$$

is satisfied, where $\|\mathbf{x}\|_0$ is the number of nonzero entries in $\mathbf{x}$, then, for the system $\mathbf{y}=\mathbf{D}\mathbf{x}$, the sparse vector $\mathbf{x}$ can be recovered using basis pursuit (BP) and orthogonal matching pursuit (OMP). Theorem 1 shows that an incoherent dictionary is desirable; the best one can hope for is that the mutual coherence reaches the Welch bound.

Theorem 2 (see [25]). Consider an overcomplete dictionary $\mathbf{D}\in\mathbb{R}^{n\times K}$ with normalized columns. The coherence satisfies

$$\mu(\mathbf{D})\ge\sqrt{\frac{K-n}{n(K-1)}}.\tag{7}$$

The bound is achieved if and only if the matrix $\mathbf{D}$ is an equiangular tight frame (ETF).

Therefore, optimizing a dictionary to approximate an ETF is an effective way to reduce the coherence in sparse representation.
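As a quick illustration, the sketch below computes the mutual coherence of a column-normalized dictionary, the corresponding Welch bound, and the sparsity level for which Theorem 1 guarantees recovery; the random dimensions are arbitrary placeholders, not values from the experiments.

```python
import numpy as np

def mutual_coherence(D):
    """Maximum absolute inner product between distinct unit-norm atoms."""
    Dn = D / np.linalg.norm(D, axis=0)
    G = np.abs(Dn.T @ Dn)
    np.fill_diagonal(G, 0.0)
    return G.max()

def welch_bound(n, K):
    """Lower bound on the coherence of an n x K overcomplete dictionary."""
    return np.sqrt((K - n) / (n * (K - 1)))

D = np.random.default_rng(0).standard_normal((20, 50))
mu = mutual_coherence(D)
print("coherence:", mu, " Welch bound:", welch_bound(20, 50))
print("Theorem 1 guarantees recovery for sparsity s <", 0.5 * (1 + 1 / mu))
```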

2.2. Equiangular Tight Frames

An ETF can be defined as follows.

Definition 3 (see [26]). Let $\mathbf{D}\in\mathbb{R}^{n\times K}$ be a matrix, where $n<K$, whose columns are $\mathbf{d}_1,\ldots,\mathbf{d}_K$. The matrix $\mathbf{D}$ is called an equiangular tight frame if the following conditions are met: (1) each column has unit norm, $\|\mathbf{d}_i\|_2=1$; (2) the columns are equiangular, that is, for some nonnegative $\alpha$ we have $|\langle\mathbf{d}_i,\mathbf{d}_j\rangle|=\alpha$ for all $i\neq j$; (3) the columns form a tight frame, that is, $\mathbf{D}\mathbf{D}^T=(K/n)\,\mathbf{I}_n$, where $\mathbf{I}_n$ is the identity matrix of size $n$. It follows that $\alpha=\sqrt{(K-n)/(n(K-1))}$ is the lowest possible coherence, the matrix $\mathbf{D}$ has full row rank, and its nonzero singular values are all equal to $\sqrt{K/n}$.
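The three conditions of Definition 3 can be checked numerically; the following sketch (with an illustrative tolerance and a small hand-built example) is only a sanity check, not part of the proposed algorithms.

```python
import numpy as np

def is_equiangular_tight_frame(D, tol=1e-8):
    """Check the three conditions of Definition 3 for an n x K matrix D."""
    n, K = D.shape
    unit_norm = np.allclose(np.linalg.norm(D, axis=0), 1.0, atol=tol)
    G = np.abs(D.T @ D)
    off_diag = G[~np.eye(K, dtype=bool)]
    equiangular = np.allclose(off_diag, off_diag[0], atol=tol)  # all |<d_i, d_j>| equal
    tight = np.allclose(D @ D.T, (K / n) * np.eye(n), atol=tol)
    return unit_norm and equiangular and tight

# Three unit vectors at 120 degrees in R^2 form an ETF and attain the
# Welch bound sqrt((3 - 2) / (2 * 2)) = 0.5.
D = np.array([[1.0, -0.5, -0.5],
              [0.0, np.sqrt(3) / 2, -np.sqrt(3) / 2]])
print(is_equiangular_tight_frame(D))   # True
```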

3. Our Proposed Incoherent Dictionary Learning

Frames are an overcomplete version of a basis, and tight frames are an overcomplete version of an orthogonal basis. ETFs generalize the geometric properties of an orthogonal basis [26]. However, an ETF is difficult to construct. In particular, Tropp et al. [27] have demonstrated that the $\alpha$-tight frame is the closest design, in the Frobenius-norm sense, to the solution of the relaxed problem.

Theorem 4 (see [27]). Given a matrix $\mathbf{A}\in\mathbb{R}^{n\times K}$ with $n\le K$ and singular value decomposition $\mathbf{A}=\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^T$, the matrix $\mathbf{U}\mathbf{V}^T$ is called the orthogonal polar factor. With regard to the Frobenius norm, $\alpha\mathbf{U}\mathbf{V}^T$ is the closest $\alpha$-tight frame to the matrix $\mathbf{A}$, and it can also be obtained by computing $\alpha(\mathbf{A}\mathbf{A}^T)^{-1/2}\mathbf{A}$.

We call the given $\alpha$-tight frame a unit norm tight frame (UNTF) if all of its columns satisfy $\|\mathbf{d}_i\|_2=1$, in which case $\alpha=\sqrt{K/n}$. A UNTF is employed in our proposed methods because it is closest, in the Frobenius norm, to the computed low-coherence dictionary.
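A minimal sketch of the projection suggested by Theorem 4: take the orthogonal polar factor of the thin SVD and rescale it to obtain the closest $\sqrt{K/n}$-tight frame (the dimensions below are placeholders).

```python
import numpy as np

def closest_tight_frame(D):
    """Closest sqrt(K/n)-tight frame to D in the Frobenius norm (Theorem 4):
    rescale the orthogonal polar factor U V^T of the thin SVD."""
    n, K = D.shape
    U, _, Vt = np.linalg.svd(D, full_matrices=False)
    return np.sqrt(K / n) * (U @ Vt)

rng = np.random.default_rng(0)
D = rng.standard_normal((20, 50))
T = closest_tight_frame(D)
print(np.allclose(T @ T.T, (50 / 20) * np.eye(20)))   # True: T is tight

# Alternating this projection with column normalization, as in the algorithms
# below, drives the dictionary towards a unit norm tight frame.
T /= np.linalg.norm(T, axis=0)
```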

3.1. Our Proposed Incoherent Dictionary Learning Algorithms

To constrain the coherence between atoms, (3) can be reformulated as

$$\min_{\mathbf{D}}\;\|\mathbf{Y}-\mathbf{D}\mathbf{X}\|_F^2\quad\text{s.t.}\;\mathbf{D}\in\mathcal{D},\;\mu(\mathbf{D})\le\mu_0,\tag{8}$$

where $\mu_0$ is the target coherence. Next, we modify the INKSVD [21] and IP [22] algorithms according to Theorem 4 in the expectation that the new dictionary will be close to an ETF. Following these modifications, the proposed algorithms are named UNTF-INKSVD and UNTF-IP, respectively, in order to distinguish our framework from the prior work.

3.1.1. The Improvement of INKSVD Algorithm

In the INKSVD algorithm [21], coherence optimization is added after K-SVD is performed. It is expressed as follows:

$$\min_{\mathbf{D}}\;\|\mathbf{D}-\boldsymbol{\Psi}\|_F^2\quad\text{s.t.}\;\mu(\mathbf{D})\le\mu_0,\tag{9}$$

where $\boldsymbol{\Psi}$ is the given dictionary, and a matrix nearness minimization is employed to solve (9).

In the first algorithm, the coherence of an initial dictionary is reduced sequentially by finding a new dictionary that has a lower coherence and is nearest to the previous one. Accordingly, we modify the objective function based on Theorem 4:

$$\min_{\mathbf{D}}\;\|\mathbf{D}-\mathbf{D}_{\mathrm{UNTF}}^{(k)}\|_F^2\quad\text{s.t.}\;\mu(\mathbf{D})\le\mu_0,\tag{10}$$

where $\mathbf{D}_{\mathrm{UNTF}}^{(k)}$ is the reference UNTF of the previous dictionary. Equation (10) can be solved via a sequence of local convex problems using the convex-optimization toolbox CVX (http://cvxr.com/cvx/doc/CVX.pdf).

The proposed algorithm is called UNTF-INKSVD and is summarized in Algorithm 1. First, we take the normalized orthogonal polar factor of the initial dictionary as the initial UNTF; then (10) is used to seek a new dictionary with a lower coherence that is close to the reference UNTF of the previous one. That is to say, $\mathbf{D}_{\mathrm{UNTF}}^{(k)}$ can be viewed as the reference UNTF in the $k$th iteration. Lastly, we project the new dictionary onto the UNTF manifold, obtaining an incoherent tight frame. Thus, the constraints of coherence optimization and projection onto the UNTF manifold are enforced alternately in the iterative dictionary update, yielding tightness and lower coherence between the atoms.

       Input: initial dictionary D_init, target coherence μ_0, number of iterations N_iter
      Output: D
(1)    Initialize: k ← 0;
(2)    Compute the SVD D_init = UΣV^T;
(3)    D^(0) ← √(K/n)·UV^T;
(4)    μ^(0) ← μ(D^(0));
(5)    Normalize the columns of D^(0);
(6)    while μ^(k) > μ_0 and k < N_iter do
(7)      Obtain the new dictionary D by solving (10) based on the reference UNTF D^(k);
(8)      Compute the SVD D = UΣV^T and set D ← √(K/n)·UV^T;
(9)      Normalize the columns of D, d_i ← d_i/‖d_i‖_2;
(10)   Compute the new mutual coherence μ(D) and compare it with μ^(k);
(11)   if μ(D) ≥ μ^(k) then
(12)    D ← D^(k);
(13)   end
(14)   Update D^(k+1) ← D, μ^(k+1) ← μ(D^(k+1)), k ← k + 1;
(15)   if μ^(k) ≤ μ_0 then
(16)    break;
(17)   end
(18) end
(19) return D ← D^(k);
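For intuition, the sketch below mimics one possible realization of Algorithm 1 in NumPy. The CVX subproblem (10) is replaced here by a simple symmetric pairwise decorrelation of atoms whose coherence exceeds the target, which is only a stand-in for the actual convex solver; the alternation with the tight-frame projection of Theorem 4 is the part this sketch is meant to illustrate.

```python
import numpy as np

def decorrelate_pair(d1, d2, mu0):
    """Symmetrically rotate two unit-norm atoms in their common plane so that the
    magnitude of their inner product drops to mu0 (assumes the atoms are not
    parallel or antiparallel)."""
    c = float(d1 @ d2)
    sgn = 1.0 if c >= 0 else -1.0
    u = d1 + sgn * d2
    v = d1 - sgn * d2
    u /= np.linalg.norm(u)
    v /= np.linalg.norm(v)
    theta = 0.5 * np.arccos(mu0)            # new inner product is cos(2*theta) = mu0
    new1 = np.cos(theta) * u + np.sin(theta) * v
    new2 = np.cos(theta) * u - np.sin(theta) * v
    return new1, sgn * new2

def untf_inksvd_like(D, mu0, n_iter=50):
    """Alternate pairwise decorrelation with the tight-frame projection of
    Theorem 4 and column normalization, following the structure of Algorithm 1."""
    n, K = D.shape
    D = D / np.linalg.norm(D, axis=0)
    for _ in range(n_iter):
        for i in range(K):                   # decorrelate pairs above the target
            for j in range(i + 1, K):
                if abs(D[:, i] @ D[:, j]) > mu0:
                    D[:, i], D[:, j] = decorrelate_pair(D[:, i], D[:, j], mu0)
        U, _, Vt = np.linalg.svd(D, full_matrices=False)
        D = np.sqrt(K / n) * (U @ Vt)        # closest tight frame (Theorem 4)
        D /= np.linalg.norm(D, axis=0)       # back to unit-norm atoms
    return D
```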
3.1.2. The Improvement of IP Algorithm

The off-diagonal entries of the Gram matrix $\mathbf{G}=\mathbf{D}^T\mathbf{D}$ represent the coherence between atoms, so another technique for reducing coherence is to operate directly on the entries of the Gram matrix so that they satisfy the following property:

$$|g_{ij}|\le\mu_0,\quad i\neq j,\tag{11}$$

where $\mu_0$ is the target coherence.

Barchiesi and Plumbley [22] proposed iterative projections (IP) onto the Gram matrix to reduce the correlation between atoms. Shrinkage is performed on the off-diagonal entries of the Gram matrix based on the following function:

$$g_{ij}\leftarrow\begin{cases}g_{ij}, & |g_{ij}|\le\mu_0,\\ \operatorname{sign}(g_{ij})\,\mu_0, & |g_{ij}|>\mu_0.\end{cases}\tag{12}$$

Unfortunately, the rank of the shrunken Gram matrix may be greater than $n$. Therefore, an SVD (equivalently, an eigendecomposition of the symmetric Gram matrix) is used to keep the best rank-$n$ approximation. The decomposition can be used further to extract a square root of the new Gram matrix, thus obtaining the optimal dictionary $\mathbf{D}$.
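Assuming (12) clamps the off-diagonal Gram entries at the target coherence, as reconstructed above, a minimal NumPy sketch of the two Gram-matrix operations in this paragraph could read as follows; the function names are illustrative.

```python
import numpy as np

def shrink_gram(G, mu0):
    """Clamp off-diagonal Gram entries to magnitude mu0 and keep the unit diagonal."""
    H = np.clip(G, -mu0, mu0)
    np.fill_diagonal(H, 1.0)
    return H

def gram_to_dictionary(H, n):
    """Best rank-n positive semidefinite approximation of H, followed by a
    factorization D such that D^T D approximates H."""
    w, V = np.linalg.eigh(H)                 # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:n]            # keep the n largest eigenvalues
    w_top = np.maximum(w[idx], 0.0)
    return (V[:, idx] * np.sqrt(w_top)).T    # n x K "square root" of H
```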

In the second algorithm, the coherence of the initial Gram matrix is decreased sequentially by finding a new Gram matrix that has a lower coherence and is nearest to the previous one. The optimization in (9) can be modified into the following problem:

$$\min_{\mathbf{G}}\;\|\mathbf{G}-\mathbf{G}_{\mathrm{UNTF}}^{(k)}\|_F^2\quad\text{s.t.}\;|g_{ij}|\le\mu_0\ (i\neq j),\;\operatorname{rank}(\mathbf{G})=n,\tag{13}$$

where $\mathbf{G}_{\mathrm{UNTF}}^{(k)}=(\mathbf{D}_{\mathrm{UNTF}}^{(k)})^T\mathbf{D}_{\mathrm{UNTF}}^{(k)}$ is called the reference Gram matrix.

The core methodology is to operate on the reference Gram matrix $\mathbf{G}_{\mathrm{UNTF}}^{(k)}$ rather than on the Gram matrix of the previous dictionary itself. Our modified algorithm is referred to as UNTF-IP, and the optimization process is described in Algorithm 2.

      Input: initial dictionary D_init, target coherence μ_0, number of iterations N_iter;
      Output: D
(1)    Initialize k ← 0;
(2)    Compute the SVD D_init = UΣV^T;
(3)    D^(0) ← √(K/n)·UV^T;
(4)    Normalize the columns of D^(0);
(5)    while μ(D^(k)) > μ_0 and k < N_iter do
(6)      Compute the reference Gram matrix G ← (D^(k))^T D^(k);
(7)      Apply (12) to G to decrease the coherence;
(8)      Apply an SVD to G to obtain the closest matrix whose rank is equal to n;
(9)      Build the square root of G to obtain a new dictionary D;
(10)   Compute the SVD D = UΣV^T and D ← √(K/n)·UV^T to obtain the next closest UNTF;
(11)   Normalize the columns of D, d_i ← d_i/‖d_i‖_2;
(12)   Update D^(k+1) ← D, k ← k + 1;
(13)   Compute the new mutual coherence μ(D^(k));
(14)   if μ(D^(k)) ≤ μ_0 then
(15)    break;
(16)   end
(17) end
(18) return D ← D^(k);

Firstly, the closest $\sqrt{K/n}$-tight frame is obtained. Normalization is then executed, after which the Gram matrix is computed. In the $k$th iteration, $\mathbf{G}_{\mathrm{UNTF}}^{(k)}$ can be viewed as the best coherence reference for the current dictionary, obtained by employing Theorem 4. The shrinkage operation (12) is performed, and an SVD is enforced to obtain the best rank-$n$ approximation. The updated Gram matrix is then decomposed to obtain the new dictionary. Lastly, we project the new dictionary onto the UNTF manifold, obtaining the next reference UNTF. Consequently, we obtain a tighter and lower-coherence dictionary than those obtained with the IP algorithm [22].
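Putting these steps together, one possible NumPy realization of the UNTF-IP loop of Algorithm 2 is sketched below; the shrinkage, rank reduction, square root, and Theorem 4 projection are the steps described above, while the iteration count and stopping test are simplified placeholders.

```python
import numpy as np

def untf_ip_like(D, mu0, n_iter=50):
    """UNTF-IP-style loop: shrink the Gram matrix, rebuild a dictionary from its
    rank-n square root, and project onto the closest tight frame (Theorem 4)."""
    n, K = D.shape
    U, _, Vt = np.linalg.svd(D, full_matrices=False)
    D = np.sqrt(K / n) * (U @ Vt)                    # initial closest tight frame
    D /= np.linalg.norm(D, axis=0)
    for _ in range(n_iter):
        G = D.T @ D                                   # reference Gram matrix
        G = np.clip(G, -mu0, mu0)                     # shrink off-diagonal entries
        np.fill_diagonal(G, 1.0)
        w, V = np.linalg.eigh(G)                      # best rank-n approximation
        idx = np.argsort(w)[::-1][:n]
        D = (V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))).T   # square root -> n x K
        U, _, Vt = np.linalg.svd(D, full_matrices=False)
        D = np.sqrt(K / n) * (U @ Vt)                 # next closest tight frame
        D /= np.linalg.norm(D, axis=0)
        mu = np.max(np.abs(D.T @ D - np.eye(K)))      # current mutual coherence
        if mu <= mu0:
            break
    return D
```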

4. Dictionary Optimization with Manopt

Reducing only the coherence in (10) and (13) results in poor sparse representation performance. Hence, after (10) and (13) are solved, we add a dictionary optimization step, based on the OPP, to maintain good sparse representation performance. Equation (1) can be formulated equivalently as the following orthogonally constrained minimization:

$$\min_{\mathbf{W}}\;\|\mathbf{Y}-\mathbf{W}\mathbf{D}\mathbf{X}\|_F^2\quad\text{s.t.}\;\mathbf{W}^T\mathbf{W}=\mathbf{I}.\tag{14}$$

It is clear that $\mu(\mathbf{W}\mathbf{D})=\mu(\mathbf{D})$, since an orthogonal $\mathbf{W}$ leaves all inner products between atoms unchanged. So the dictionary optimization in (14) has two advantages: (I) good representation performance can be obtained; (II) the incoherence remains unchanged.
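A small numerical check of advantage (II): rotating the dictionary by any orthogonal matrix leaves its Gram matrix, and hence its mutual coherence, unchanged (the random sizes below are placeholders).

```python
import numpy as np

rng = np.random.default_rng(0)
D = rng.standard_normal((20, 50))
D /= np.linalg.norm(D, axis=0)

# Any orthogonal W satisfies (W D)^T (W D) = D^T D, so rotating the dictionary
# leaves its Gram matrix, and therefore its mutual coherence, unchanged.
W, _ = np.linalg.qr(rng.standard_normal((20, 20)))
print(np.allclose((W @ D).T @ (W @ D), D.T @ D))   # True
```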

In [14, 22], dictionary rotation (DR) is employed to solve (14), but it is performed within the iterative dictionary update of (10) and (13). As demonstrated in [28], Manopt provides efficient algorithms for finding an optimal solution of the OPP. In the following, we introduce an optimization framework based on Manopt.

Let $f(\mathbf{W})=\|\mathbf{Y}-\mathbf{W}\mathbf{D}\mathbf{X}\|_F^2$. We consider the special orthogonal constraint in (14) as defining a Riemannian submanifold $\mathcal{M}$ of $\mathbb{R}^{n\times n}$. Hence, the purpose of manifold optimization is to find an optimal solution of the following model:

$$\min_{\mathbf{W}\in\mathcal{M}}\;f(\mathbf{W}),\tag{15}$$

where the search space $\mathcal{M}$ is a Riemannian manifold that can be linearized locally at each point $\mathbf{W}$ as a tangent space $T_{\mathbf{W}}\mathcal{M}$.

The inner problem at the current iterate $\mathbf{W}^{(k)}$ is defined as follows:

$$\min_{\boldsymbol{\eta}\in T_{\mathbf{W}^{(k)}}\mathcal{M},\;\|\boldsymbol{\eta}\|\le\Delta_k}\;m_k(\boldsymbol{\eta})=f(\mathbf{W}^{(k)})+\langle\operatorname{grad}f(\mathbf{W}^{(k)}),\boldsymbol{\eta}\rangle+\tfrac{1}{2}\langle\operatorname{Hess}f(\mathbf{W}^{(k)})[\boldsymbol{\eta}],\boldsymbol{\eta}\rangle,\tag{16}$$

where $\operatorname{grad}f(\mathbf{W}^{(k)})$ and $\operatorname{Hess}f(\mathbf{W}^{(k)})$ are the Riemannian gradient and the Hessian of the cost function at $\mathbf{W}^{(k)}$, respectively.

The Riemannian gradient of $f$ at $\mathbf{W}$ is defined as follows:

$$\operatorname{grad}f(\mathbf{W})=P_{T_{\mathbf{W}}\mathcal{M}}\bigl(\nabla f(\mathbf{W})\bigr),\tag{17}$$

where $\nabla f(\mathbf{W})$ is the gradient of $f$ as a function in $\mathbb{R}^{n\times n}$ and $P_{T_{\mathbf{W}}\mathcal{M}}$ is the orthogonal projection onto the tangent space $T_{\mathbf{W}}\mathcal{M}$.

Intuitively, we also define the Riemannian Hessian of $f$ at $\mathbf{W}$ along a tangent direction $\boldsymbol{\eta}$:

$$\operatorname{Hess}f(\mathbf{W})[\boldsymbol{\eta}]=P_{T_{\mathbf{W}}\mathcal{M}}\bigl(\nabla^2 f(\mathbf{W})[\boldsymbol{\eta}]\bigr),\tag{18}$$

where $\nabla^2 f(\mathbf{W})[\boldsymbol{\eta}]$ is the Hessian of $f$ at $\mathbf{W}$ along $\boldsymbol{\eta}$, viewed as a function in $\mathbb{R}^{n\times n}$.

Next, $\boldsymbol{\eta}^{(k)}$ is calculated by inner iterations of the Steihaug-Toint truncated conjugate gradient (tCG) method [29]; a candidate next iterate is then produced by

$$\mathbf{W}_{+}=R_{\mathbf{W}^{(k)}}\bigl(\boldsymbol{\eta}^{(k)}\bigr).\tag{19}$$

The map $R_{\mathbf{W}}$ is a retraction on the manifold and describes the mapping from the tangent space $T_{\mathbf{W}}\mathcal{M}$ to $\mathcal{M}$ for any point $\mathbf{W}$. A simple mapping is selected: $R_{\mathbf{W}}(\boldsymbol{\eta})=\operatorname{qf}(\mathbf{W}+\boldsymbol{\eta})$, where $\operatorname{qf}(\cdot)$ denotes the Q factor of the QR decomposition, so the candidate is orthogonalized. The decision of whether to accept or discard the candidate, as well as the update of the trust-region radius, is based on the quotient

$$\rho_k=\frac{f(\mathbf{W}^{(k)})-f\bigl(R_{\mathbf{W}^{(k)}}(\boldsymbol{\eta}^{(k)})\bigr)}{m_k(\mathbf{0})-m_k(\boldsymbol{\eta}^{(k)})}.\tag{20}$$

We optimize $\mathbf{W}$ using the Manopt toolbox [29] while $\mathbf{D}$ and $\mathbf{X}$ are fixed. Algorithm 3 presents the procedure for this optimization. Afterwards, the optimal dictionary is obtained by $\mathbf{D}\leftarrow\mathbf{W}\mathbf{D}$. A better sparse representation performance can be achieved while the dictionary coherence remains unaffected. Furthermore, this optimization leads to a faster algorithm because it can be performed after the dictionary update process, in contrast to [22].
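The sketch below is not the Manopt trust-region solver of Algorithm 3; it only illustrates the manifold ingredients (tangent-space projection and QR retraction) with a simple fixed-step Riemannian gradient descent for (14). The step size and iteration count are assumptions made for the sketch.

```python
import numpy as np

def qf_retraction(M):
    """QR-based retraction onto the orthogonal matrices (the diagonal of R is
    forced positive so the map is uniquely defined)."""
    Q, R = np.linalg.qr(M)
    return Q * np.sign(np.diag(R))

def rotate_dictionary(Y, D, X, n_iter=200, step=1e-3):
    """Fixed-step Riemannian gradient descent for min_W ||Y - W D X||_F^2 with
    W^T W = I; Manopt's trust-region solver (Algorithm 3) adapts the step
    automatically, which this simplified sketch does not."""
    n = D.shape[0]
    W = np.eye(n)
    A = D @ X
    for _ in range(n_iter):
        egrad = -2.0 * (Y - W @ A) @ A.T               # Euclidean gradient
        rgrad = W @ (W.T @ egrad - egrad.T @ W) / 2.0  # project onto the tangent space
        W = qf_retraction(W - step * rgrad)            # retract back to the manifold
    return W @ D                                       # rotated dictionary W D
```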

      Input: D, X, maximum radius Δ̄ > 0, acceptance threshold ρ' ∈ [0, 1/4), and number of iterations N_iter
      Output: D
(1)    Initialize W^(0) ← I, k ← 0;
(2)    Set the initial trust-region radius Δ_0 ∈ (0, Δ̄];
(3)    while k < N_iter do
(4)      Apply (16) to compute η^(k) by the truncated conjugate gradient method;
(5)      Apply (19) to compute the candidate iterate W_+ = R_{W^(k)}(η^(k));
(6)      Apply (20) to compute the quotient ρ_k for the trust-region radius;
(7)      if ρ_k < 1/4 then
(8)       Δ_{k+1} ← Δ_k/4;
(9)       else if ρ_k > 3/4 and ‖η^(k)‖ = Δ_k then
(10)     Δ_{k+1} ← min(2Δ_k, Δ̄);
(11)     else
(12)      Δ_{k+1} ← Δ_k;
(13)     end
(14)   if ρ_k > ρ' then
(15)    W^(k+1) ← W_+;
(16)    else
(17)     W^(k+1) ← W^(k);
(18)    end
(19)   k ← k + 1;
(20) end
(21) D ← W^(k) D;
(22) return D;

5. Experiment Results

In this section, we report on experiments with synthetic data and real audio data that compare our proposed incoherent dictionary learning methods with the prior methods. All the experiments were performed on a Dell computer with 4 GB of memory and a 2-core 2.6 GHz Intel Pentium processor. All the code was written in MATLAB.

5.1. Incoherent Dictionary Construction

In this experiment, incoherent dictionaries are constructed without learning from training samples; we aim to directly reduce the mutual coherence of a given dictionary. The initial dictionary is set randomly, with the dimensions chosen so that the dictionary is overcomplete. Each atom is normalized to unit norm, and the corresponding Welch bound is 0.1750. In order to observe the benefits of our proposed methods, the dictionary update is performed as follows: (I) INKSVD [21]; (II) IP [22]; (III) UNTF-INKSVD; (IV) UNTF-IP. The INKSVD and IP implementations are taken from the web (http://code.soundsoftware.ac.uk/). Each algorithm is executed ten times, and average results are reported. Specifically, the same initial dictionary and number of iterations are used for all measurements, and we evaluate the mutual coherence of the dictionaries constructed by each algorithm.

Figure 1 shows the mutual coherence of the constructed dictionaries. We note that our proposed algorithms exhibit significantly lower coherence, with the performance of the UNTF-IP algorithm slightly exceeding those of IP, UNTF-INKSVD, and INKSVD. A reference line is drawn in Figure 2, indicating that an ETF has nonzero singular values that are all equal. As can be seen, the UNTF-IP and UNTF-INKSVD algorithms give approximately flat spectra and thus approximate the properties of an ETF. This is a better outcome than with the IP or INKSVD algorithms, because alternating the constraints on tightness and coherence has a beneficial effect on incoherent dictionary construction. As a result, the incoherent dictionary constructed with the UNTF-IP algorithm clearly approximates an ETF. The error bars in Figures 1 and 2 show the standard deviation over 10 runs of each test and demonstrate the consistency of the results.

5.2. Incoherent Dictionary Learning for Sparse Representation with Synthetic Data

In this section, we investigate the incoherent dictionary learning performance for sparse representation of synthetic data. The training samples are generated via the underdetermined model $\mathbf{Y}=\mathbf{D}\mathbf{X}$, where $\mathbf{D}$ and $\mathbf{X}$ are generated randomly. The dictionary $\mathbf{D}$ is column normalized and overcomplete. The matrix $\mathbf{X}$ is sparse; the nonzero coefficients are distributed randomly, and their values are drawn from a standard Gaussian distribution. The target coherence is set to the Welch bound. Table 1 summarizes the tested methods, which are executed for 30 iterations alternating between dictionary update and optimization. Each algorithm is executed 10 times, and average results are reported.
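For reference, synthetic training data of this kind and the SNR measure used below can be generated as in the following sketch; the dimensions, number of samples, and sparsity level are placeholders, not the values used in the experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K, N, s = 20, 50, 10_000, 4           # placeholder sizes and sparsity level

D0 = rng.standard_normal((n, K))
D0 /= np.linalg.norm(D0, axis=0)         # column-normalized ground-truth dictionary

X0 = np.zeros((K, N))
for i in range(N):                        # s random nonzeros per column, Gaussian values
    X0[rng.choice(K, size=s, replace=False), i] = rng.standard_normal(s)

Y = D0 @ X0                               # synthetic training samples

def snr_db(Y, D, X):
    """Representation SNR in dB: 20 * log10(||Y||_F / ||Y - D X||_F)."""
    return 20 * np.log10(np.linalg.norm(Y) / np.linalg.norm(Y - D @ X))
```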

The error bars in Figures 3 and 4 show the standard deviation over 10 runs of each test, demonstrating the consistency of the results. Figure 3(a) shows the coherence of each learned dictionary. The UNTF-IP and UNTF-INKSVD algorithms achieve lower mutual coherence on average than the IP [22] and INKSVD [21] algorithms. In particular, the coherence of the dictionary learned with the UNTF-IP algorithm is closest to the Welch bound. Note that we have used Manopt to achieve a better sparse representation performance. A signal-to-noise ratio (SNR), defined as $20\log_{10}(\|\mathbf{Y}\|_F/\|\mathbf{Y}-\mathbf{D}\mathbf{X}\|_F)$, is computed in order to evaluate the sparse representation performance. The SNR values are shown in Figure 3(b), where it can be seen that Manopt gives a better sparse representation performance compared to [21, 22], while the coherence is reduced.

Figure 4 shows the ratio between the SNR and the coherence. The experimental results show that our proposed algorithms achieve a good balance between coherence and sparse representation performance and thus demonstrate the overall advantage of the learned incoherent dictionaries.

5.3. Application to Audio Data

To verify the efficiency of our proposed methods, experiments on real audio data are reported in this section via the model $\mathbf{Y}=\mathbf{D}\mathbf{X}+\mathbf{E}$, where $\mathbf{E}$ is a predetermined noise term. For the purposes of comparison and analysis, the audio dataset that we use is the one adopted by [21, 22], in which the data comprise an audio sample from a 16 kHz guitar recording. Furthermore, all columns in the initial dictionary are selected randomly from the training samples and normalized.

In this simulation, the target coherence ranges from 0.05 to 0.5 with a step size of 0.05. The tested methods are the same as those in Table 1; each is executed 10 times, and average results are reported. The termination criterion is that the target coherence is reached. We then evaluate our proposed incoherent dictionary learning methods by computing the mutual coherence and the SNR.

As shown in Figure 5, the standard deviation over the repeated runs is displayed, demonstrating the consistency of the results across tests. When the target coherence is less than 0.3, the proposed method II in Table 1, which employs UNTF-IP followed by Manopt, gives the best results compared to the other methods and approaches the lower bound. However, if the target coherence is greater than 0.3, the SNR of [21] is the highest, followed by that of our proposed method I. Table 2 shows the computational running times. The key idea behind [21] is to symmetrically decrease the correlation of each pair of atoms whose coherence exceeds the target, based on a greedy method. Therefore, when the target coherence is higher, the number of pairs of atoms to be decorrelated decreases dramatically, and the computation time drops accordingly, as shown in the first row of Table 2. Unlike [22], the most important benefit of our proposed methods is better computational efficiency when the target mutual coherence is very low, because Manopt can be performed after the dictionary update process rather than during it. Compared with the prior methods, the present experimental results indicate that our learned dictionaries have a lower coherence while maintaining a certain degree of sparse representation quality.

6. Conclusion

In this paper, we have proposed two methods for learning an incoherent dictionary for sparse representation, adding a modified dictionary update and a dictionary optimization step to traditional dictionary learning.

First, the UNTF-INKSVD and UNTF-IP algorithms were developed to learn a more incoherent and tighter dictionary effectively. Unlike other dictionary learning algorithms, our proposed algorithms learn an incoherent dictionary based on a unit norm tight frame in the dictionary update. An efficient framework was developed for sequentially reducing the coherence of an initial dictionary (or Gram matrix) by finding a new dictionary (or Gram matrix) that has a lower coherence and is nearest to the previous one. Hence, our learned incoherent dictionaries approximate the properties of ETFs, and the support recoverable by sparse coding is maximized.

Second, Manopt was employed to solve the orthogonal Procrustes problem in the dictionary optimization, because reducing only the coherence of a learned dictionary degrades the performance of sparse representation. We compared our proposed methods with the other methods, and the experimental results showed that our proposed methods balance incoherence against sparse representation performance. In particular, our proposed methods provide state-of-the-art results when the target coherence is very low and have higher running speeds and better representation performance compared to [21, 22]. This is because Manopt is performed after the dictionary update rather than during the dictionary update process.

However, a drawback is that our proposed methods are mainly suitable for learning an incoherent dictionary for sparse representation; traditional dictionary learning seems to work well if the coherence of the learned dictionary is not restricted. In our work, more general objective functions are proposed (see (10) and (13)) to construct an incoherent dictionary in which tightness and coherence are restricted alternately at each iteration of the algorithm, a procedure similar to alternating minimization. The theoretical proof of convergence of alternating minimization over more than two sets is still an open issue [13, 15, 22]. Nevertheless, the experiments in our work show that the incoherent dictionary learning methods can converge to a set of accumulation points under certain conditions. Our proposed algorithms converge approximately to the target coherence values, as shown in Figures 1, 3(a), and 5. Dictionaries constructed with our proposed algorithms give the approximately flat spectrum of an ETF in Figure 2. The SNR values shown in Figures 3(b) and 5 demonstrate the effectiveness of our proposed algorithms compared to [21, 22]. The convergence of the objective value does not, however, prove the convergence of our proposed algorithms. Therefore, we will continue working on proving the convergence of our proposed algorithms and on applying our proposed methods to other domains.

Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (61573299, 61673162, and 61672216), the Scientific Research Project of the Hunan Province Education Department (15C1328), and the Control Science and Engineering Disciplinary Construction Funds of Xiangtan University.