Abstract

In image denoising (IDN), the low-rank property is usually considered an important image prior. As convex relaxations of the rank function, nuclear norm-based algorithms and their variants have attracted significant attention. These algorithms can be collectively called image domain-based methods, whose common drawback is that they require a great number of iterations to reach an acceptable solution. Meanwhile, the sparsity of images in a certain transform domain has also been exploited in image denoising problems. Sparsifying transform learning algorithms can achieve extremely fast computations as well as desirable performance. To combine the advantages of the image domain and the transform domain in a general framework, we propose a sparsifying transform learning and weighted singular values minimization method (STLWSM) for IDN problems. The proposed method makes full use of the strengths of both domains. To solve the nonconvex cost function, we also present an efficient alternating solution for acceleration. Experimental results show that the proposed STLWSM achieves improvement both visually and quantitatively by a large margin over state-of-the-art approaches based on either single domain. It also needs far fewer iterations than all the image domain algorithms.

1. Introduction

Noise inevitably exists in images acquired from real-world scenes because of physical limitations, which makes image denoising (IDN) a fundamental task in image processing. Recent IDN methods can be categorized as data-driven and prior-driven approaches.

The data-driven methods turn to a certain deep convolutional neural network (CNN), such as Universal Denoising Net (UDN) [1] and Fractional Optimal Control Net [2], for the IDN problem. These CNN models, although they have achieved great success when provided with sufficient training samples, may not perform well in small-scale data applications. For example, one cannot obtain acceptable network parameters from a single corrupted image, which is the case considered in this study. The aim of the prior-driven methods for image denoising is to restore the degraded image by means of certain image priors or other properties, such as local smoothness, nonlocal similarity, low-rank structure, and so forth [3–5]. More specifically, the prior-based image denoising process seeks the underlying ideal image from the degraded one by extracting a few significant factors and excluding the noisy information. It is a typical ill-posed linear inverse problem, and a widely used image degradation model can be generally formulated as follows [6–9]:

$$Y = HX + N, \tag{1}$$

where $X$ and $Y$ are both matrices representing the original image and the degraded one, respectively, $H$ is a matrix denoting the noninvertible degradation operator, and $N$ is the additive noise.
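As a minimal illustration of the degradation model $Y = HX + N$, the following Python sketch simulates the pure denoising case, in which the operator is assumed to be the identity and the noise is assumed to be i.i.d. Gaussian. The function name `degrade`, the flat test image, and the noise level are hypothetical choices made for this example only.

```python
import numpy as np

def degrade(X, sigma, H=None, seed=0):
    """Return Y = H X + N, with N i.i.d. Gaussian noise of standard deviation sigma."""
    rng = np.random.default_rng(seed)
    HX = X if H is None else H @ X        # H = identity corresponds to pure denoising
    return HX + sigma * rng.standard_normal(HX.shape)

X = np.full((64, 64), 128.0)              # hypothetical flat gray test image
Y = degrade(X, sigma=20.0)                # noisy observation
```

Recovering `X` from `Y` alone is ill-posed, which is why a prior term is added to the data-fidelity term in the formulations that follow.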

To cope with the ill-posed problem, the general image denoising problem can be formulated as follows [9, 10]:

$$\hat{X} = \arg\min_{X} \|Y - HX\|_F^2 + F(X), \tag{2}$$

where $F(X)$ is regarded as the image prior knowledge, including local smoothness, nonlocal similarity, low-rank, and sparsity, and $\|\cdot\|_F$ denotes the Frobenius norm. According to the sparsity property, the degraded image $x$ ($x$ is the vectorization of $X$, $x \in \mathbb{R}^{n}$) satisfies $x = D\alpha + e$, where $D \in \mathbb{R}^{n \times m}$ is a synthesis overcomplete dictionary, $\alpha \in \mathbb{R}^{m}$ is the sparse coefficient, and $e$ is an approximation term in the image domain [11]. This model is called the synthesis model, and $\alpha$ is supposed to be sparse ($\|\alpha\|_0 \ll m$).

To be specific, given an image $x$, the synthesis sparse coding problem is to find a sparse $\alpha$ that minimizes $\|x - D\alpha\|_2^2$. Various algorithms have been proposed [10, 12–15] to tackle this NP-hard problem. Numerous researchers have learned the synthesis dictionary and updated the nonzero coefficients simultaneously to represent the potential high-quality image well, and these methods have been demonstrated to be useful in image denoising. Specifically, these synthesis models typically alternate between two steps: sparse coding and dictionary learning. However, the practical operation of synthesis models requires some rigorous conditions, which are often violated in applications.

While the synthesis model has attracted extensive attention, the analysis model has also been receiving notice recently [16, 17]. The analysis model considers that an image $x$ satisfies $\|\Omega x\|_0 \ll m$, where $\Omega \in \mathbb{R}^{m \times n}$ is regarded as an analysis dictionary, since it "analyzes" the image $x$ to a sparse form. In essence, the zero pattern of $\Omega x$ defines the subspace to which the image belongs. A noisy observation is formulated as $y = x + e$, where $x$ is the underlying ideal image and $e$ represents the noise. The denoising problem is to find $x$ by minimizing $\|y - x\|_2^2$ subject to a sparsity constraint on $\Omega x$. This problem is also NP-hard and resembles sparse coding in the synthesis model. Approximation algorithms for learning the analysis dictionary have been proposed in recent years, which, similar to the synthesis case, are also computationally expensive.

More recently, a generalized analysis model named the transform learning model has been proposed, which follows the intuition that images are essentially sparse in a certain transform domain and can be expressed as $Wx = \alpha + e$, where $W$ is the transform matrix, $\alpha$ is the sparse coefficient, and $e$ is the approximation error [18]. The feature that distinguishes it from the synthesis and analysis models is that the approximation error of the transform learning model lies in the transform domain and is likely to be small. Another advantage of the transform model over the image domain models is that it can achieve exact and extremely fast computations.

Instead of learning a synthesis or analysis dictionary, the transform learning model aims at learning the transform matrix to minimize the approximation error. After the transform $W$ is learned, the original image is recovered by $\hat{x} = W^{\dagger}\alpha$, where $W^{\dagger}$ is the pseudoinverse of $W$. The transform learning model has achieved great success in image denoising in terms of both efficiency and effectiveness [18–21].
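The transform model can be illustrated with a fixed analytical transform in place of a learned one. The sketch below is an assumption-laden example, not the authors' code: it builds an orthonormal DCT matrix as $W$, sparsifies $Wx$ by keeping the largest-magnitude coefficients, and recovers the signal through the pseudoinverse. The helper names `dct_matrix` and `keep_largest` and the test signal are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n (used here as a fixed transform W)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    W = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    W[0, :] /= np.sqrt(2.0)
    return W

def keep_largest(v, s):
    """Hard thresholding: keep the s largest-magnitude entries of v, zero the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-s:]
    out[idx] = v[idx]
    return out

n = 64
W = dct_matrix(n)
# Test signal proportional to one DCT basis vector, hence exactly 1-sparse under W.
x = np.cos(np.pi * (2 * np.arange(n) + 1) * 3 / (2 * n))
alpha = keep_largest(W @ x, s=1)          # sparse code in the transform domain
x_hat = np.linalg.pinv(W) @ alpha         # recovery via the pseudoinverse of W
```

Both steps are closed-form, which reflects the exact and fast computation attributed to the transform model; for a learned, possibly non-orthogonal $W$ the same two lines apply.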

Nonetheless, a remaining drawback is that the transform model overemphasizes transform domain but ignores the primary image domain. There is always a connection between image domain and transform domain, and this can be treated as a regularization term in image denoising.

To make full use of the advantages of both the image domain and the transform domain in the single image denoising problem, this study focuses on sparsifying transform learning and the essential sparsity property of images, and proposes a novel algorithm named sparsifying transform learning and weighted singular values minimization (STLWSM). Specifically, our model simultaneously considers the sparsifying transform learning and the weighted singular values minimization of image patches.

The remainder of this paper is organized as follows. In the next section, a brief review of the transform domain and image domain for IDN is provided. In section 3, we propose our method and obtain the efficient solution. Section 4 provides experimental results of gray images and color images. Conclusions are drawn in section 5.

2.1. Transform Domain for IDN

As mentioned in the previous section, the transform model can utilize the sparsity of images in the transform domain to increase efficiency. Therefore, analytical transform models such as wavelets and the discrete cosine transform (DCT) are widely used in practical applications, for instance, in the image compression standard JPEG2000. As classical and effective tools, transform models have been increasingly used in image denoising. Inspired by dictionary learning, Ravishankar and Bresler [19] proposed a learning sparsifying transform (LST) model. In [19], any noisy image $X$ is first reformed to another resolution as $X'$, where each column represents a vectorized square patch of the original $X$ extracted by a sliding window. Second, a square transform matrix $W$ is randomly initialized to formulate the transform sparse coding problem as follows:

$$\min_{W,\alpha} \|WX' - \alpha\|_F^2 + \lambda Q(W) \quad \text{s.t.}\ \|\alpha_i\|_0 \le s\ \ \forall i, \tag{3}$$

where $\alpha$ is the sparse coefficient, $\alpha_i$ is the $i$th column of $\alpha$, and $s$ is a constant representing the sparsity level. The additional regularization term $Q(W) = -\log|\det W| + \|W\|_F^2$ is used to avoid a trivial solution, $\lambda$ is a balance coefficient, and $\log|\det W|$ is the log-determinant of $W$ with base 10. Ravishankar and Bresler [19] solved the proposed problem by alternately updating $W$ and $\alpha$ and proved the convergence. Building on this work, they further proposed learning doubly sparse transforms (LDST) for IDN [21]. Specifically, $W = B\Phi$ is adopted to replace the original $W$, where $B$ and $\Phi$ are both square matrices of the same size, $B$ is a transform constrained to be sparse, and $\Phi$ is an analytical transform with an efficient implementation. They used the doubly sparse transform model in image denoising and obtained faster and better results than with unstructured transforms. Then, Wen et al. [18, 20] proposed a structured overcomplete sparsifying transform learning (SOLST) model. The main feature that differs from the aforementioned transform models is that Wen et al. cluster the image patches and learn a distinct $W$ for each patch group. This process can be formulated as follows:

$$\min_{\{W_k,\alpha_i,C_k\}} \sum_{k=1}^{K}\Big\{\sum_{i\in C_k}\|W_k x'_i-\alpha_i\|_2^2+\lambda Q(W_k)\Big\} \quad \text{s.t.}\ \|\alpha_i\|_0\le s\ \ \forall i,\ \{C_k\}\in G, \tag{4}$$

where $x'_i$ is the $i$th column of $X'$ and $Q(W_k)$ is a regularization term to prevent trivial solutions. $\{C_k\}$ indicates the specific class of the image patch $x'_i$, $K$ is the number of categories, and $G$ is the set of all classes.
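To make the transform sparse coding problem concrete, the sketch below evaluates a cost of the form $\|WX' - \alpha\|_F^2 + \lambda Q(W)$ for a given transform. It assumes $Q(W) = -\log_{10}|\det W| + \|W\|_F^2$, which is a reading of the description above rather than a confirmed formula, and it uses random data in place of real image patches. All function names are hypothetical.

```python
import numpy as np

def sparse_code(Z, s):
    """Keep the s largest-magnitude entries in every column of Z (hard thresholding)."""
    A = np.zeros_like(Z)
    rows = np.argsort(np.abs(Z), axis=0)[-s:, :]   # row indices of the s largest entries per column
    cols = np.arange(Z.shape[1])
    A[rows, cols] = Z[rows, cols]
    return A

def regularizer(W):
    """Assumed Q(W) = -log10|det W| + ||W||_F^2; infinite when W is singular."""
    _, logabsdet = np.linalg.slogdet(W)            # natural log of |det W|
    return -logabsdet / np.log(10.0) + np.sum(W ** 2)

def lst_objective(W, X, s, lam):
    """Return the cost value and the sparse code obtained for a fixed W."""
    A = sparse_code(W @ X, s)
    return np.sum((W @ X - A) ** 2) + lam * regularizer(W), A

rng = np.random.default_rng(0)
X_patches = rng.standard_normal((16, 200))         # stand-in for vectorized 4x4 patches
W0 = np.eye(16)                                    # trivial initial transform
value, A = lst_objective(W0, X_patches, s=4, lam=0.1)
```

The regularizer returns infinity for a singular matrix such as the all-zero transform, which is how the trivial solution $W = 0$ is excluded.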

2.2. Image Domain for IDN

While the transform learning models have achieved great success, various algorithms have also been proposed for IDN in the image domain. As mentioned before, in the general image denoising model, $F(X)$ is an additional regularization. The widely studied regularizations include the $l_1$, $l_2$, and $l_{1/2}$ norms, the nuclear norm, the low-rank property, and so on [22–24]. Focusing on the patch form instead of the vector form, the low-rank property has attracted significant research interest. As a convex relaxation of the low-rank matrix factorization (LRMF) problem, nuclear norm minimization (NNM) has drawn more attention [4, 6, 24, 25]. The nuclear norm of an image $X$ is defined as $\|X\|_* = \sum_i \sigma_i(X)$, where $\sigma_i(X)$ is the $i$th singular value of $X$. However, many researchers hold that different singular values should be treated differently in the minimization. Liu et al. [4] proposed weighted nuclear norm minimization (WNNM) for image denoising problems. The weighted nuclear norm is defined as $\|X\|_{w,*} = \sum_i w_i\sigma_i(X)$, and $w = [w_1, w_2, \ldots, w_n]$ is nonnegative. At this point, we can treat $F(X)$ as $\|X\|_{w,*}$, and the denoising model is

$$\hat{X} = \arg\min_{X} \|Y - X\|_F^2 + \|X\|_{w,*}. \tag{5}$$

By taking into consideration the different singular values, as well as the image structure, WNNM shows strong denoising capability. Meanwhile, Hu et al. [6] proposed truncated nuclear norm regularization (TNNR) for matrix completion. They deemed that minimizing the smallest $\min(m, n) - r$ singular values can maintain the original matrix rank by holding the first $r$ nonzero singular values fixed. Using $\|X\|_r = \sum_{i=r+1}^{\min(m,n)} \sigma_i(X)$, the TNNR constrained model can be written as follows:

$$\min_{X} \|X\|_r \quad \text{s.t.}\ \mathcal{P}(X) = \mathcal{P}(Y), \tag{6}$$

where $\mathcal{P}(\cdot)$ keeps the observed entries of a matrix and sets the others to zero.
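The three singular-value-based quantities discussed in this subsection can be computed directly from an SVD. The following sketch assumes the definitions given in the text (sum of singular values, weighted sum, and sum of all but the largest $r$ singular values); the function names and the rank-one test matrix are hypothetical.

```python
import numpy as np

def singular_values(X):
    """Singular values of X, sorted in descending order."""
    return np.linalg.svd(X, compute_uv=False)

def nuclear_norm(X):
    return singular_values(X).sum()

def weighted_nuclear_norm(X, w):
    """Sum of w_i * sigma_i(X); w must have min(m, n) nonnegative entries."""
    return float(np.dot(w, singular_values(X)))

def truncated_nuclear_norm(X, r):
    """Sum of the smallest min(m, n) - r singular values."""
    return singular_values(X)[r:].sum()

u = np.array([3.0, 4.0])
v = np.array([1.0, 2.0, 2.0])
X = np.outer(u, v)   # rank one; its only nonzero singular value is |u| * |v| = 5 * 3
```

For this rank-one matrix the nuclear norm equals 15, while the truncated nuclear norm with $r = 1$ vanishes, which illustrates why the truncated version is a closer surrogate of the rank.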

TNNR gives a better approximation to the rank function than nuclear norm-based approaches. Inspired by both WNNM and TNNR, Liu et al. [26] improved the previous algorithms by reweighting the residual error separately and minimizing the truncated nuclear norm of the error matrix simultaneously (TNNR-WRE). In their work, $F(X)$ is considered as follows:

$$F(X) = \|H\|_* - \mathrm{Tr}\left(U_r^T H V_r\right), \tag{7}$$

where $H = X - Y$, $U$ and $V$ are the left and right singular matrices of the singular value decomposition (SVD) of $H$, respectively, $U_r$ and $V_r$ consist of their first $r$ columns, and $r$ is the truncation parameter. TNNR-WRE achieves higher accuracy than TNNR.

As discussed above, the nuclear norm-based algorithms usually obtain good results because of the essential low-rank property in the image domain. To combine the advantages of the transform domain and the image domain in IDN, the sparsifying transform learning and weighted singular values minimization (STLWSM) method is proposed. In contrast to LST, LDST, SOLST, WNNM, TNNR, and TNNR-WRE, the proposed STLWSM jointly considers sparsity in the transform domain and low rank in the image domain. The main contributions of our work can be enumerated as follows:
(i) We propose a general framework for image processing in both the transform domain and the image domain, which combines the sparsifying transform learning of image patches and the low-rank property of the original image.
(ii) As image patches can take advantage of the nonlocal similarity existing inherently in the image, we learn the sparsifying transform for each group of patches that are similar in terms of Euclidean distance.
(iii) To solve the proposed NP-hard problem, we present an efficient alternating optimization algorithm. In practical applications, our method requires a limited number of iterations, mostly no more than 3, to reach the final solution.
(iv) We apply our model to IDN, and the results show that STLWSM achieves evident PSNR (peak signal-to-noise ratio) improvements over other state-of-the-art methods.

3. Proposed Method

In this section, we propose a general framework that works in both the transform domain and the image domain. To be clear, we perform sparsifying transform learning in the transform domain and weighted singular values minimization in the image domain simultaneously. To solve this NP-hard problem, an efficient solution is also derived.

3.1. Sparsifying Transform Learning and Weighted Singular Values Minimization (STLWSM)

In light of the observations mentioned above, we first introduce sparsifying transform learning based on image patches and then utilize weighted singular values minimization to improve the image quality.

Given a noisy image $X$, nonlocal similarity is a well-known patch-based prior, which means that one patch in an image has many similar patches [7–9]. Accordingly, overlapped image patches can be extracted with a sliding window with a fixed step size. For each specific patch, we choose the $M$ most similar patches by Euclidean distance [4, 7, 18–20] for potential low-rank structure, and a matrix $X_i \in \mathbb{R}^{p^2 \times M}$ is constructed. The patch size is $p \times p$, and the total number of $X_i$ depends on the size of the original image $X$, the patch size, and the step size. After the similar-patch aggregation process, $X_i$ is obtained for each group. Following the idea of the transform learning algorithm [18–20], with the obtained $X_i$ and some initialized $W_i$, our preliminary model can be formulated as follows:

$$\min_{W_i,\alpha_i} \|W_iX_i-\alpha_i\|_F^2+\lambda Q(W_i) \quad \text{s.t.}\ \|\alpha_{i,j}\|_0\le s\ \ \forall j. \tag{8}$$

The definition of $Q(W_i)$ is the same as the one in problem (4), but $\alpha_i$ is the sparse representation of $X_i$ in the transform domain, which is a matrix with columns $\alpha_{i,j}$. Suppose the transform $W_i$ and the sparse coefficient $\alpha_i$ have been updated. The denoised patch group can be obtained by $\tilde{X}_i = W_i^{\dagger}\alpha_i$. Obviously, $\tilde{X}_i$ also has low-rank structure; hence, we utilize weighted singular values minimization to approximate the matrix. The unified denoising minimization is

$$\min_{W_i,\alpha_i} \|W_iX_i-\alpha_i\|_F^2+\lambda Q(W_i)+\mu\|W_i^{\dagger}\alpha_i\|_{w,*} \quad \text{s.t.}\ \|\alpha_{i,j}\|_0\le s\ \ \forall j, \tag{9}$$

where $\lambda$ and $\mu$ are the regularization parameters and are usually set empirically. This formulation minimizes the residual in the transform domain and the rank of the recovered matrix simultaneously.
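The patch extraction and grouping step described above can be sketched as follows. This is a simplified illustration under stated assumptions: similar patches are searched over the whole image rather than within a local window, and the function names `extract_patches` and `group_similar` as well as the random test image are hypothetical.

```python
import numpy as np

def extract_patches(img, p, step):
    """Slide a p x p window with the given step; return a (p*p, N) patch matrix and the corner coordinates."""
    H, W = img.shape
    coords = [(r, c) for r in range(0, H - p + 1, step)
                     for c in range(0, W - p + 1, step)]
    P = np.stack([img[r:r + p, c:c + p].ravel() for r, c in coords], axis=1)
    return P, coords

def group_similar(P, ref, M):
    """Return the M patches closest to patch `ref` in Euclidean distance, as a (p*p, M) matrix."""
    d = np.sum((P - P[:, [ref]]) ** 2, axis=0)     # squared distances to the reference patch
    idx = np.argsort(d)[:M]                        # the reference itself comes first (distance 0)
    return P[:, idx], idx

rng = np.random.default_rng(0)
img = rng.random((20, 20))                         # stand-in for a noisy image
P, coords = extract_patches(img, p=6, step=2)
group, idx = group_similar(P, ref=0, M=10)         # one patch group with 36 rows and 10 columns
```

Each column of `group` is a vectorized patch, so groups built from genuinely similar patches have strongly correlated columns, which is the source of the low-rank structure exploited later.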

3.2. Efficient Optimization of the Proposed Model

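In this subsection, we introduce an efficient solution for the nonconvex sparsifying transform learning and weighted singular values minimization problem. According to [16–19], the transform learning process is not sensitive to the initialization of $W$. As a result, with $W_i$ given, the subproblem of $\alpha_i$ can be solved using cheap hard-thresholding, $\alpha_i = H_s(W_iX_i)$. Here, $H_s(\cdot)$ is the hard-thresholding operator, which keeps the $s$ largest-magnitude entries of each column and sets the others to zero. The subproblem of $W_i$ is as follows:

$$\min_{W_i} \|W_iX_i-\alpha_i\|_F^2+\lambda Q(W_i)+\mu\|W_i^{\dagger}\alpha_i\|_{w,*}. \tag{10}$$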

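Because the term $\mu\|W_i^{\dagger}\alpha_i\|_{w,*}$ acts more like a postprocessing operator, we divide the updating process of $W_i$ into two parts:

(a) The first formula is

$$\min_{W_i} \|W_iX_i-\alpha_i\|_F^2+\lambda Q(W_i). \tag{11}$$

Decomposing $X_iX_i^T+\lambda I$ as $LL^T$ and letting $B = W_iL$, (11) can be written, up to an additive constant, as $\mathrm{tr}(BB^T)-2\,\mathrm{tr}(BL^{-1}X_i\alpha_i^T)-\lambda\log|\det B|$ (the logarithm is taken as natural here; another base only rescales $\lambda$). Let $B$ and $L^{-1}X_i\alpha_i^T$ have full SVDs of $U\Gamma V^T$ and $Q\Sigma R^T$, respectively. The trace term is largest when $U = R$ and $V = Q$, so, taking only their diagonal matrices into consideration, the foregoing formula can be rewritten as

$$\min_{\Gamma} \sum_j\left(\gamma_j^2-2\gamma_j\sigma_j-\lambda\log\gamma_j\right)+C, \tag{12}$$

where $\gamma_j$ and $\sigma_j$ are the diagonal entries of $\Gamma$ and $\Sigma$, and $C$ is a constant that can be omitted. The revised problem is convex in $\gamma_j$, so the optimal solution can be found by taking the partial derivative with respect to $\gamma_j$ and setting it to 0, that is, $2\gamma_j-2\sigma_j-\lambda/\gamma_j=0$.

Therefore, excluding the nonpositive results, the solution is

$$\gamma_j=\frac{\sigma_j+\sqrt{\sigma_j^2+2\lambda}}{2}. \tag{13}$$

To sum up, the transform update step can be computed as follows:

$$W_i=\frac{1}{2}R\left(\Sigma+\left(\Sigma^2+2\lambda I\right)^{1/2}\right)Q^TL^{-1}. \tag{14}$$

(b) The second formula is

$$\min \mu\|W_i^{\dagger}\alpha_i\|_{w,*}. \tag{15}$$

With the fixed $W_i$ obtained in step (a), and with $\mu$ absorbed into the weights, this part can be simply seen as

$$\hat{X}_i=\arg\min_{\hat{X}_i}\|\tilde{X}_i-\hat{X}_i\|_F^2+\|\hat{X}_i\|_{w,*}, \tag{16}$$

where $\tilde{X}_i=W_i^{\dagger}\alpha_i$, and $\hat{X}_i$ represents the denoised matrix. Following Liu et al. [4], a desirable weighting vector $w$ in the image domain can be given as

$$w_j=\frac{c}{\sigma_j(\tilde{X}_i)+\varepsilon}, \tag{17}$$

where $\sigma_j(\tilde{X}_i)$ is the $j$th singular value of $\tilde{X}_i$, $c$ is a positive constant, and $\varepsilon = 10^{-16}$ is used to avoid dividing by zero. The optimal solution of the second formula is

$$\hat{X}_i=US_w(\Sigma)V^T, \tag{18}$$

where $\tilde{X}_i=U\Sigma V^T$ is the SVD of $\tilde{X}_i$ and the soft-thresholding operator is defined as $S_w(\Sigma)_{jj}=\max(\Sigma_{jj}-w_j/2,\,0)$.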
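The two closed-form updates of this subsection can be sketched in a few lines. The code below is a hedged illustration, not the authors' released implementation: it assumes the regularizer $Q(W) = \|W\|_F^2 - \ln|\det W|$ with a natural logarithm, weights of the form $c/(\sigma_j + \varepsilon)$, and a soft threshold of $w_j/2$ matching a squared Frobenius data term without a factor of one half. The names `update_transform`, `weighted_svt`, and `cost`, and the random test data, are hypothetical.

```python
import numpy as np

def update_transform(X, A, lam):
    """Closed-form minimizer of ||W X - A||_F^2 + lam * (||W||_F^2 - ln|det W|)."""
    n = X.shape[0]
    L = np.linalg.cholesky(X @ X.T + lam * np.eye(n))    # X X^T + lam I = L L^T
    L_inv = np.linalg.inv(L)
    Q, sig, Rt = np.linalg.svd(L_inv @ X @ A.T)          # full SVD: Q diag(sig) R^T
    gamma = 0.5 * (sig + np.sqrt(sig ** 2 + 2.0 * lam))  # positive root of 2g^2 - 2*sig*g - lam = 0
    return Rt.T @ np.diag(gamma) @ Q.T @ L_inv

def weighted_svt(X_tilde, c, eps=1e-16):
    """Weighted singular value soft-thresholding with weights w_j = c / (sigma_j + eps)."""
    U, sig, Vt = np.linalg.svd(X_tilde, full_matrices=False)
    w = c / (sig + eps)                                  # small singular values get large weights
    sig_hat = np.maximum(sig - 0.5 * w, 0.0)
    return (U * sig_hat) @ Vt

def cost(W, X, A, lam):
    """Transform-domain cost used to check the update."""
    _, logabsdet = np.linalg.slogdet(W)
    return np.sum((W @ X - A) ** 2) + lam * (np.sum(W ** 2) - logabsdet)

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 50))                         # stand-in for one patch group
A = np.where(np.abs(X) > 0.5, X, 0.0)                    # stand-in sparse code
W_new = update_transform(X, A, lam=0.5)
X_hat = weighted_svt(np.linalg.pinv(W_new) @ A, c=4.0)   # image-domain estimate of the group
```

Because both updates are closed-form and need only one Cholesky factorization and two SVDs per group, each outer iteration is cheap, which is consistent with the small number of iterations reported for the method.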

The summary of our optimization solution is presented in Algorithm 1 where the similar patches are determined by Euclidean distance.

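Input: $X$ is the noisy image, $p$ is the patch size, $M$ is the number of similar patches, $s$ is the initial sparsity, and $\lambda$, $\mu$, and $c$ are the constants.
 Output: $\hat{X}$ is the denoised image
 Initialization: $W_i$ is the DCT matrix of size $p^2 \times p^2$, and $N$ is the number of similar-patch groups.
 For iteration = 1 : 3
 Do:
 For each group $i = 1, \ldots, N$ calculate
(1) Transform domain:
 (a) Decompose the image $X$ into the patch form $X'$.
 (b) Compute $\alpha_i$ by $\alpha_i = H_s(W_iX_i)$.
 (c) Update $W_i$ by $W_i = \frac{1}{2}R(\Sigma + (\Sigma^2 + 2\lambda I)^{1/2})Q^TL^{-1}$.
(2) Image domain:
 (a) Compute $U$, $\Sigma$, and $V$ by the SVD $W_i^{\dagger}\alpha_i = U\Sigma V^T$.
 (b) Compute the weights $w_j = c/(\Sigma_{jj} + \varepsilon)$.
 (c) Compute $\hat{X}_i = US_w(\Sigma)V^T$.
 End.
(3) Image reconstruction.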
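The final image reconstruction step is not detailed in the text. A common choice, assumed here, is to place every denoised patch back at its original position and average the overlapping pixels. The helper names `extract_patches` and `aggregate` and the toy image are hypothetical.

```python
import numpy as np

def extract_patches(img, p, step):
    """Return a (p*p, N) matrix of vectorized patches and their top-left coordinates."""
    H, W = img.shape
    coords = [(r, c) for r in range(0, H - p + 1, step)
                     for c in range(0, W - p + 1, step)]
    P = np.stack([img[r:r + p, c:c + p].ravel() for r, c in coords], axis=1)
    return P, coords

def aggregate(patches, coords, p, shape):
    """Put patches back at their coordinates and average wherever they overlap."""
    acc = np.zeros(shape)
    cnt = np.zeros(shape)
    for k, (r, c) in enumerate(coords):
        acc[r:r + p, c:c + p] += patches[:, k].reshape(p, p)
        cnt[r:r + p, c:c + p] += 1.0
    return acc / np.maximum(cnt, 1.0)      # pixels covered by no patch stay at zero

img = np.arange(64, dtype=float).reshape(8, 8)
P, coords = extract_patches(img, p=3, step=1)
rebuilt = aggregate(P, coords, p=3, shape=img.shape)
```

With unmodified patches the aggregation returns the input image exactly; in the denoising loop the columns of `P` would instead be replaced by the denoised patches of each group before aggregation.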

4. Experiment Results

In this section, we choose 25, 12, 15, and 10 reference images from TID2008 [27], USC-SIPI, Live-IQAD [28], and IVC-SQDB [29], respectively, to test the image denoising effects. As we apply six different noise levels to the test images in our experiments, the total number of distorted images is 372. Some representative images from the USC-SIPI database are shown in Figures 1 and 2. Four recently proposed methods are adopted for comparison: the patch-based algorithm GSR, the weighted nuclear norm method WNNM, the sparsifying transform learning scheme SOLST, and the sparsifying transform learning and low-rank model STROLLR. The noisy images are obtained by adding Gaussian noise with $\sigma$ = 15, 20, 30, 40, 50, and 75. All competing algorithms use their default settings, which have been finely tuned and thoroughly verified in their original publications. Since our method is derived from both the image domain and the transform domain schemes, we set our parameters the same as the representative methods in these two domains, i.e., WNNM and SOLST, for fairness. That is, for the image denoising application, the patch size $p$, the number of similar patches $M$, and one further constant are chosen according to the noise level: in the lowest noise range, $p$ is 6, $M$ is 70, and the constant is 0.54; in the second range, $p$ is 7, $M$ is 90, and the constant is 0.56; in the third range, $p$ is 8, $M$ is 120, and the constant is 0.58; otherwise, $p$ is 9, $M$ is 140, and the constant is 0.58. In addition, 6 images from USC-SIPI (Figure 3) are used in the image inpainting application, for which we follow a similar setting rule. The two balance parameters are set to the same value. Table 1 shows the detailed parameter settings in our experiments, where the values in brackets and the plain values apply to the two different image sizes.

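The peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) are used to evaluate the quality of the denoised images. PSNR is defined by

$$\mathrm{PSNR}=10\log_{10}\frac{255^2}{\mathrm{MSE}}, \tag{19}$$

where MSE is the mean squared error between the original image and the denoised one. SSIM is defined as [28, 30]

$$\mathrm{SSIM}(x,y)=\frac{(2\mu_x\mu_y+C_1)(2\sigma_{xy}+C_2)}{(\mu_x^2+\mu_y^2+C_1)(\sigma_x^2+\sigma_y^2+C_2)}, \tag{20}$$

where $x$ and $y$ represent the original image and the denoised one, respectively, $\mu_x$ and $\mu_y$ are the mean values of $x$ and $y$, $\sigma_x^2$ and $\sigma_y^2$ are the variances, and $\sigma_{xy}$ is the covariance. $C_1$ and $C_2$ denote two stabilization variables.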
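Both quality measures are straightforward to compute. The sketch below assumes 8-bit images with a peak value of 255 and the commonly used constants $C_1 = (0.01 \cdot 255)^2$ and $C_2 = (0.03 \cdot 255)^2$, which are not stated in the text. Note also that `ssim_global` evaluates the SSIM formula once over the whole image, whereas standard SSIM implementations average it over local windows; the function names are hypothetical.

```python
import numpy as np

def psnr(x, y, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference x and an estimate y."""
    mse = np.mean((np.asarray(x, dtype=float) - np.asarray(y, dtype=float)) ** 2)
    return np.inf if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def ssim_global(x, y, peak=255.0):
    """SSIM formula evaluated on global image statistics (single window)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    C1, C2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2      # assumed stabilization constants
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cxy = np.mean((x - mx) * (y - my))                   # covariance
    return ((2 * mx * my + C1) * (2 * cxy + C2)) / ((mx ** 2 + my ** 2 + C1) * (vx + vy + C2))

x = np.arange(64, dtype=float).reshape(8, 8)
y = x + 1.0                                              # every pixel off by one, so MSE = 1
```

For this toy pair the MSE is exactly 1, so the PSNR reduces to $10\log_{10}(255^2)$, and comparing an image with itself gives an SSIM of 1.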

For a thorough comparison, we list the average denoising results from all the 372 distorted images in Table 2. Also, the experimental results from all the gray images of USC-SIPI are provided in Table 3.

From these two tables, we can observe that, among the competing algorithms, GSR also adopts the nonlocal similarity that groups image patches for low-rank structure. However, it requires too many iterations in practical applications, e.g., 100 or even up to 200. In contrast, WNNM needs fewer iterations, around 14, and achieves better results than the other 3 algorithms at an average of 8.26 dB for gray images. Meanwhile, the proposed STLWSM needs the fewest iterations and achieves the best performance.

SOLST and STROLLR are both transform algorithms and have efficiency that is hard to match. STROLLR trains transform matrices for each group, while SOLST combines nonlocal low-rank and transform learning, and the latter also achieves better results than STROLLR at an average of 5.54 dB. In Table 3, the numerical results of the proposed STLWSM are all set in bold, which marks the best result among the five algorithms. It is evident that the proposed method achieves a visible improvement in PSNR under all noise levels at an average of 13.61 dB. More visual results are shown in Figure 4, in which our method clearly outperforms all other methods.

Moreover, considering that GSR needs too many iterations and that pure transform learning algorithms are much faster, we compare our time consumption against WNNM, and the results are shown in Figure 5. It can be seen that our method spends much less time than WNNM, at an average of 55.46%.

Our algorithm also has good scalability. We further use RGB images in IDN, and the experimental results show that the proposed STLWSM still outperforms the other algorithms; specific numerical comparisons are shown in Table 4. Figures 6 and 7 show the results in terms of the average PSNR and the elapsed time, respectively, which also demonstrate our superiority over the other competitors. Figures 8 and 9 show the average SSIM comparison for gray images and color images, respectively. It can be seen that our method preserves the structure of the denoised image even at high noise levels.

To display the efficiency of our algorithm in detail, we provide its results over different numbers of iterations (up to 10). The experimental results are shown in Figure 10. The PSNR values of all 12 images are averaged for each noise level. The PSNR values of the original noisy images at the different noise levels are shown as the starting point, where the top black line is the maximum value of 24.63, the bottom black line is the minimum value of 10.65, the red line represents the median of 17.64, and the green star is the average of 17.72. Figure 10 shows that our algorithm converges fast and needs a limited number of iterations, mostly 3, to reach the final solution.

We also applied our method to image inpainting with 6 images. The degraded images are obtained by multiplying the originals element-wise with a random logical matrix, and the missing rates are set to 15%, 20%, 30%, 40%, and 50%. The image inpainting results are shown in Table 5, and the original images are shown in Figure 3. The results show that all methods achieve good inpainting results in filling in the missing pixels, and the proposed STLWSM still outperforms all the other state-of-the-art algorithms. Taking the image denoising results into account as well, our STLWSM is more robust, with much smaller PSNR changes than the other competing approaches.
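The degradation used for the inpainting experiments can be reproduced with a random logical mask. The sketch below assumes that each pixel is dropped independently with probability equal to the missing rate; the function name `random_mask`, the seed, and the flat test image are hypothetical.

```python
import numpy as np

def random_mask(shape, missing_rate, seed=0):
    """Logical matrix in which True marks an observed pixel and False a missing one."""
    rng = np.random.default_rng(seed)
    return rng.random(shape) >= missing_rate

X = np.full((100, 100), 200.0)             # stand-in for a clean image
mask = random_mask(X.shape, missing_rate=0.3)
Y = X * mask                               # element-wise product zeroes the missing pixels
```

The observed fraction is only approximately one minus the missing rate, since the mask is drawn at random.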

5. Conclusions

In this paper, we have proposed a unified framework for image denoising that uses knowledge from both the image domain and the transform domain, namely, sparsifying transform learning and weighted singular values minimization (STLWSM). Specifically, we learned the transform matrix for each group of patches with similar structure. After obtaining the optimized transform matrix and the sparse coefficient with an efficient optimization algorithm, we further restored the image patch groups through their low-rank prior. By applying STLWSM to all the groups, a denoised image can be reconstructed. For both gray images and color images, experimental results show that the proposed model achieves a visible improvement in PSNR over other state-of-the-art approaches. Our efficient optimization algorithm also costs much less running time than the typical image domain-based method. Note that while the pure transform learning methods run faster than STLWSM, they perform worse by a large margin. Further improving the efficiency of our framework will be our main work in the near future.

Data Availability

The image data are provided in the manuscript, and all images can be found in http://sipi.usc.edu/database/. The codes of this article are available in https://github.com/Yapan0975/STLWSM.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported in part by the National Key Research and Development Program of China under grant No 2018YFE0126100, National Science Fund of China under grant Nos. 51875524, 61873240, and 61602413, and the Natural Science Foundation of Zhejiang Province of China under grant No LY19F030016.