Optimizing Shrinkage Curves and Application in Image Denoising
A shrinkage curve optimization is proposed for weighted nuclear norm minimization and is adapted to image denoising. The proposed optimization method employs a penalty function utilizing the difference between a latent matrix and its observation and uses odd polynomials to shrink the singular values of the observation matrix. As a result, the coefficients of polynomial characterize the shrinkage operator fully. Furthermore, the Frobenius norm of the penalty function is converted into the corresponding spectral norm, and thus the parameter optimization problem can be easily solved by using off-and-shelf plain least-squares. In the practical application, the proposed denoising method does not work on the whole image at once, but rather a series of matrix termed Rank-Ordered Similar Matrix (ROSM). Simulation results on 256 noisy images demonstrate the effectiveness of the proposed algorithms.
Low rank matrix approximation has been attracting significant research interest in recent years. This approach aims to reconstruct the latent data from its degraded observation matrix and is frequently applied in many fields, such as machine learning , computer version , recommendation system , and image processing . As a branch of this research, a regularized nuclear norm minimization problem is widely considered over matrices,where denotes a matrix, scalar is a parameter, and and are the data fidelity term and the data regularization term, respectively. In this formula, is called the nuclear norm of , where denotes the th largest singular value of .
If is convex, then is a convex function because nuclear norm is also convex. Thus, problem (1) is a convex optimization problem and can be treated by using various classic iterative optimization algorithms including steepest-descent, conjugate-gradient, and interior-point algorithms. When the data fidelity term , where is an observation matrix and denotes the Frobenius norm operator, this is the well-known nuclear norm minimization (NNM) . The NNM problem was proved that it can be solved by applying a soft threshold operation on the singular values of , and the solution can be achieved using a Singular Value Thresholding (SVT) algorithm .
Despite the success of NNM, it is not flexible enough to handle more complex issues. To pursue the convex property, NNM treats each singular value equally. As a result, the soft-thresholding operator shrinks each singular value by the same amount . In principal component analysis however, different principal directions quantify different information. For example, the large singular value delivers the major feature information such as edges and texture. This implies that, in image denoising, the larger the singular value is, the lesser the amount shrinks. Obviously, the NNM model and the corresponding solver cannot handle this issue.
To overcome this limitation, a regularized nuclear norm minimization with weights was put forward. These weights may enhance the representation capability of the original nuclear norm. Its form is as follows:where is the weight designed to .
Problem (2) is a nonconvex nonsmooth low rank minimization problem. Of course, if is replaced with , problem (2) reverts to problem (1). Solving problem (2) is challenging, or even NP-hard. To solve this problem, researchers presented some assumption to handle it. Gu et al.  assumed that is nondescending on , , and thus problem (2) becomes convex and can be solved by a soft-thresholding operation. Moreover, the authors devised a solver in which is inversely proportional to . Gasso et al.  argued that if both and are convex, problem (2) can be solved by DC (Difference of Convex functions) programming. Lu et al.  assumed that is concave increasing monotonically on and satisfies Lipschitz continuous condition; the weights are achieved at the super gradient point of the concave function . Based on this assumption, the authors proposed Iteratively Reweighted Nuclear Norm (IRNN) method. In addition, Hu et al.  reported their Truncated Nuclear Norm Regularization (TNNR) method, based on the same assumptions as in .
By applying these low rank matrix approximation theories, different image denoising methods have been reported. For example, a method of coupling sparse denoising and unmixing with low rank constraint is proposed for hyperspectral image in ; a scheme of incorporating iterative support detection into TNNR is presented to reduce white Gaussian noise in ; the eigenvectors of the Laplacian are considered to suppress Gaussian noise in ; a weighted nuclear norm minimization model is presented  and is used in three applications, that is, image denoising, background subtraction, and image completion. These methods achieve high quality results. A main reason is that all of them employ a powerful patch-based technique.
Inspired by weighted nuclear norm minimization and patch-based technique, a parameter optimization method is proposed in this paper. The proposed method utilizes the difference between a latent matrix and its observation to design a penalty function and employs odd polynomials to shrink the singular values of the observation matrix. As a result, the coefficients of polynomial fully characterize the shrinkage operator. Furthermore, for the penalty function, its Frobenius norm is converted into a spectral norm. Thereby, the parameter optimization can be easily solved by using plain least-squares.
To validate the effectiveness, the optimization theory is applied in image denoising. Since the proposed method is to optimize shrinkage curves, it is called OSC method. In the practical application, the OSC method does not work on the whole image at once, but rather a series of matrix termed Rank-Ordered Similar Matrix (ROSM, see Definition 2). Thirty-two images were tested. Experimental results show that the OSC method achieves better results than the Bilateral Filter; when the noisy standard deviation is less than 20, the results achieved by OSC are better than those by BM3D, and when the noisy standard deviation varies from 20 to 40, the results by OSC are weaker than by BM3D.
The contribution of this paper is twofold. Firstly, in the penalty function devised for weighted nuclear norm minimization, the weight representation is replaced by odd polynomials. So, the coefficients of polynomial characterize the role of the weights fully. Furthermore, the Frobenius norm of the penalty function is converted into a spectral norm. Secondly, the proposed optimization method is adapted to image denoising. Experimental results show that the proposed OSC method outperforms the Bilateral Filter and also is superior to the BM3D method on the case of low noise.
The rest of paper is organized as follows. In Section 2, the shrinkage curve optimization is formulated. Section 3 describes the image denoising algorithm, and the corresponding analysis is followed in Section 4. Section 5 reports the experimental results, and conclusions are drawn in Section 6.
2. Optimizing Shrinkage Curves
In this section, the problem to be discussed is formulated, and the method of optimizing shrinkage curves is followed.
2.1. Problem Formulation
Let be a unknown square matrix in , and let be its observation. The observed matrix is corrupted by white Gaussian noise with deviation . This is expressed as
To reconstruct the original square matrix from its noisy version, the following weighted nuclear norm minimization with a constraint is considered:where is a threshold, , and . In this formula, denotes the th largest singular value of , and and are the weighted nuclear norm of and the Frobenius norm of , respectively. Our aim is to use polynomial coefficients to characterize the weights and obtain the solution of the problem.
2.2. Optimization Method
As in , it is proven that the weighted nuclear norm minimization (4) can be solved by imposing a soft threshold operation on the singular values of observation matrix. The form is as follows:where is the Singular Value Decomposition (SVD) of and is the soft-thresholding operator. This operator with weight vector shrinks the singular values; . To obtain the thresholds, the following penalty function is employed:
Assuming that shrinkage operation is applied differently to every singular value in matrix , and thus it can be broken intowhere is the th largest singular value of and denotes the th shrinkage operator. In our work, odd polynomials are taken to represent these shrinkage operators, where the coefficients of polynomial characterize the shrinkage operator fully. Thus, the shrinkage operator is expressed as
Substituting (8) into (7), the shrinkage operator can be rewritten as
Substituting (9) into (6) and considering is a unitary matrix, the penalty function can be rewritten as
The focus now shifts to optimization of the penalty function. To obtain the optimal solution by using plain least-squares, the Frobenius norm in (10) is converted into a spectral norm. For ease of formulation, a vector is defined that contains all the coefficients series: , ; that is,
A matrix is also defined that is a block-diagonal matrix with blocks,where each of the blocks is size of , with the content
In addition, two operators, and , are introduced. The operator returns a vector by concatenating the columns in a matrix; the operator returns a diagonal matrix by putting the elements of a vector on the main diagonal. For example, if is a matrix, returns the vector ; if is a vector, returns the diagonal matrix where the main diagonal is .
Using these notations and operators, the penalty function (10) can be rewritten aswhere and .
The optimal set of parameters that define the shrinkage curves iswhich leads to
3. Application in Image Denoising
In this section, the OSC method is introduced for image denoising, containing denoising modeling data and the denoising algorithm.
3.1. Denoising Modeling Data
The proposed OSC method does not work on the whole image at once, but rather a matrix-set in which each matrix contains a fixed number of similar patches extracted from the original noisy images. A patch is first defined as follows.
Definition 1. denotes an image, sized pixels. Let be a reference pixel. A block of size is extracted from , where resides at the top-left corner. By applying to the block, the block is identified with a vector in . The corresponding patch is defined bywhere and . When all pixels are complete, a patch-set in can be built, denoted by .
Definition 2. Let be a reference patch and denote the Euclidean norm; the distance between and can be calculated, using the metricThese scalar distances are then sorted and the patches in are correspondingly ordered,where denotes the th smallest distance value, denotes the order relation between patches, and denotes the number of patches in . The denoising modeling data, termed Rank-Ordered Similar Matrix (ROSM), are defined asObviously, ROSM is a square matrix of size and . When all patches in are complete, a matrix-set in can be built, denoted by .
3.2. Denoising Algorithm
The proposed denoising algorithm consists of Algorithms 1 and 2. The former trains the polynomial coefficients , and the latter reduces noise.
Suppose is a noisy image and is its noise-free version. Based on Definition 2, the matrix-set of is built, and the corresponding noise-free matrix-set is also obtained, denoted by . Thereby, there exist paired samples to train . The corresponding penalty function is as follows:where . The optimum parameters arewhere , , and can be obtained by using (12) and (13).
Next, the two-iteration denoising algorithm is introduced. Suppose is a noisy image to be processed, of size . Employing Definition 2, the matrix-set of is built. For any ROSM in , the estimate of its noise-free version can be obtained bywhere is the SVD of and is expressed as (10). When all ROSMs in are complete, an estimated-set can be built, denoted by . Every estimate is put back into the original image canvas, then intensities of pixels falling in the same position in the canvas are averaged. And thus, the first-estimate of the noise-free image is obtained, denoted by . Repeating the above operations again, the second-estimate is also yielded.
In addition, a scaled version of the residual error, the difference between the noisy and estimated image, is considered. Let denote the operation putting the th estimate back into the original image canvas; the denoising method can be expressed as the following tuples formula:where is the th image estimate, is the th adjustment, is the scaled factor of residual error, denotes entry-wise division, and denotes the matrix with the same size as the noisy image, in which each entry records the number that the pixel on the same location is processed.
4. Algorithm Analysis
In this section, the influence of different levels of noise for the singular vectors is first discussed; then the shrunken scales of singular values are analyzed.
4.1. Influence of Different Levels of Noise for Singular Vectors
Two matrices, and , are chosen arbitrarily from the noisy matrix-set , and and . Thus, and , where denotes the noise-free vector and is white noise. The squared Euclidean distance between the two vectors and iswhere the symbol is the inner-product operator.
The vector difference, , is the Gaussian noise with covariance matrix , where symbol denotes identity matrix. Based on the argument in , if the dimension of is large, the norm squared, , is concentrated around its mean , with high probability. Similarly, the projection of the noise on the deterministic vector is concentrated around 0 with high probability. Therefore, (17) can be interpreted as a translation-invariant procedure.
On the other hand, as the noise level increases, the singular vectors of are no longer aligned with the original singular vectors of noise-free matrix . As a result, the number of reliable singular vectors of becomes smaller. For example, when is corrupted by Gaussian noise at very small magnitude, all the singular vectors of track the noise; when corrupted at a large magnitude, the number of the singular vectors tracking the noise becomes smaller. Therefore, at increasing noise level, the singular vectors of become more and more unreliable and result in worse restored feature.
4.2. Shrunken Scales of Different Singular Values
Let be a paired samples and be the th largest singular value of . The coefficients vector , of size , is trained on the paired samples and then is divided into parts; , where corresponds to the th largest singular value of . And thus, the estimate of the th largest singular value of is as follows:
To observe the shrunken scales of the different visually, the shrinkage curves are shown in Figure 1. Test results show that (i) different singular values have different shrunk scales; (ii) the larger the singular value, the smaller the shrunk scales; (iii) the shrunken scales for fixed index singular value are almost the same although is corrupted by different levels of noise.
In this section, the OSC denoising performance is evaluated. In Section 5.1, setting parameters are formulated. Section 5.2 introduces two metrics, the Peak Signal-to-Noise-Ratio (PSNR) and the Mean Structural Similarity Index Measure (MSSIM). Section 5.3 reports the experimental results, and the comparisons with other methods are discussed.
5.1. Setting Parameters
A total of three parameters are set in the proposed method, a scale factor of residual error, an integer regulating odd polynomial order, and a patch-size . The large may lead to oscillatory effects. For all noise levels, it is desirable for to be set to 0.1. To study the effect of parameters () on denoising, the coefficients vector was trained on the Lena image, and the trained is used to reduce noise for the Barbara image. The influences of and for PSNR results are shown in the left and right insets in Figure 2, respectively. By experience, these parameter values used in our experiments are shown in Table 1. During implementation process, because the behavior of coefficients on large entries may be completely distorted, thereby an input value outside the training range may be violently magnified. For this case, the algorithm checks whether the input is within the learning data range, and if not, it is not modified at all.
5.2. Two Metrics
The PSNR measurement is based on pixel intensity errors between the noise-free and the restored images, given in decibels (dB). Higher PSNR means better denoising capability. Let and be the noise-free and the restored image, respectively, denote the cardinality of , and denote the Frobenius norm; the calculation of PSNR is as follows:
The MSSIM measurement is the mean SSIM that yields mean value of structural similarities. MSSIM values are bounded in the range , and the closer the value is to 1, the better the denoising scheme is implied. The calculation of SSIM involves paired blocks extracted from the noise-free image and the restored , respectively. Let and denote paired blocks, and be the mean values of and , respectively, and be respective variances, and be their covariance. Assuming and are two stabilization variables, the SSIM value can be calculated by
5.3. Experimental Results and Comparisons
A test set was built to evaluate the OSC method, which was a combination of two groups, denoted by . Each group contained noisy images, with eight different levels of white noise. The group is associated with the sixteen noise-free images widely used, shown in Figure 3; the group is associated with the sixteen noise-free biomedical images, shown in Figure 5. These noise-free images were taken from the CVG-Granada database.
The corresponding training set was also built, which was a subset of the test , denoted by . The group contained the noisy versions of the five noise-free images, the Einstein, Elaine, Barbara, Fingerprint, and the Man image; the group included the noisy versions of the six noise-free images, the C05c, Celulas, Cromo2, Mr2, Mr032, and the Mr034 image. The OSC method was applied to the test set . Using the parameter values shown in Algorithm 1, the coefficients were trained by Algorithm 1. Algorithm 2 used the same parameter values and the trained . The PSNR and MSSIM results associated with both groups are reported in Tables 2 and 3, respectively. Moreover, the visual results and zoom-in for three widely used images are shown in Figure 4 and are shown in Figure 6 for three biomedical images.
To augment the performance evaluations, the OSC method was compared with Bilateral Filter  and BM3D . For the Bilateral Filter there are three parameters needed to be set, containing a sliding window of size , a geometric spread standard deviation , and a photometric spread standard deviation . The three parameters, , , and , were set to 5, 5, and 0.1, respectively. The source codes of BM3D were taken from the original authors, and the parameter values used in the experiments were those recommended by the authors. The PSNR and MSSIM results from the two methods are reported in Table 2, and the visual results for three images are shown in Figure 4.
In addition, the mean PSNR and mean MSSIM results for fixed noise are calculated for noisy images, the Bilateral Filter, BM3D, and the OSC method, respectively. The corresponding calculations are as follows:
In these two formulae, denotes all the images with the same level noise in the test set , and thus ; and and denote the PSNR and MSSIM values of the th image with the noise , respectively. For example, if the OSC method employs formula (29) to calculate, denotes the arithmetical mean PSNR for all images with the noise , associated with the OSC method. The mean PSNR and MSSIM values for fixed noise for the four methods are plotted in Figure 7.
Some observations and conclusions can be drawn from the quantitative measurements and visual results. Firstly, the OSC method achieved the desirable results. The OSC method achieved the same results as the BM3D method on average and significantly outperformed the noisy method and the Bilateral Filter by 8.49 dB and 5.26 dB on average, respectively, in the PSNR results. The OSC method achieved the same results as the BM3D method on average and significantly outperformed the noisy method and the Bilateral Filter by 0.417 and 0.275 on average, respectively, in the MSSIM results. Secondly, the OSC method has a strong capability to preserve detail. The OSC method reconstructed more image details from noisy images than the Bilateral Filter and was almost the same as the BM3D. As seen in Figure 4, the fine-details of the moustache for the Baboon image were preserved, while the Bilateral Filter introduced excessive smoothing and resulted in visual feature blurring. Thirdly, in contrast to BM3D, the OSC method achieved decreasing results at increasing levels noise, as in Figure 7.
In this paper, a shrinkage curves optimization is proposed for weighted nuclear norm minimization. Based on the theory  that the weighted nuclear norm minimization can be solved by imposing a soft threshold operation on the singular values, the proposed optimization method designs a penalty function using the difference between a latent matrix and its observation and employs odd polynomials to shrink the singular values of the observation matrix. As a result, the coefficients of polynomial fully characterize the shrinkage operator. Furthermore, the Frobenius norm of the penalty function is converted into the corresponding spectral norm, so the parameter optimization can be easily solved by off-and-shelf plain least-squares.
To validate the effectiveness, the proposed parameter optimization method is adapted to image denoising. In our experiments, a total of 256 noisy images were tested. They contained the noisy version of thirty-two noise-free images, corrupted by eight different levels of noise. Experimental results show that the proposed denoising method is effective in terms of the comparative results. The proposed method achieves better results than the Bilateral Filter; when the noisy standard deviation is less than 20, the proposed method slightly outperforms the BM3D method and is weaker than the BM3D method when the noisy standard deviation varies from 20 to 40.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
This work was partially supported by National Natural Science Foundation of China under Grant no. 61300192, by Fundamental Research Funds for the Central Universities of China under Grant no. ZYGX2014J052, by Foundation Science and Forefront Technology of Chongqing Science & Technology Commission under Grant no. cstc2016jcyjA0571, and by Ph.D. Cultivation Foundation of Chongqing University of Posts and Telecommunications under Grant no. RC 2016002.
M. Ye, Y. Qian, and J. Zhou, “Multitask sparse nonnegative matrix factorization for joint spectral-spatial hyperspectral imagery denoising,” IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 5, pp. 2621–2639, 2015.View at: Publisher Site | Google Scholar
Y. Peng, A. Ganesh, J. Wright, W. Xu, and Y. Ma, “RASL: robust alignment by sparse and low-rank decomposition for linearly correlated images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 11, pp. 2233–2246, 2012.View at: Publisher Site | Google Scholar
Y. Li, Y. Sun, and Y. Chi, “Low-rank positive semidefinite matrix recovery from corrupted rank-one measurements,” IEEE Transactions on Signal Processing, vol. 65, no. 2, pp. 397–408, 2017.View at: Publisher Site | Google Scholar | MathSciNet
Y.-Q. Zhao and J. Yang, “Hyperspectral image denoising via sparse representation and low-rank constraint,” IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 1, pp. 296–308, 2015.View at: Publisher Site | Google Scholar
B. Recht, M. Fazel, and P. A. Parrilo, “Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization,” SIAM Review, vol. 52, no. 3, pp. 471–501, 2010.View at: Publisher Site | Google Scholar | Zentralblatt MATH | MathSciNet
J.-F. Cai, E. J. Candes, and Z. Shen, “A singular value thresholding algorithm for matrix completion,” SIAM Journal on Optimization, vol. 20, no. 4, pp. 1956–1982, 2010.View at: Publisher Site | Google Scholar | MathSciNet
S. Gu, L. Zhang, W. Zuo, and X. Feng, “Weighted nuclear norm minimization with application to image denoising,” in Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '14), pp. 2862–2869, June 2014.View at: Publisher Site | Google Scholar
G. Gasso, A. Rakotomamonjy, and S. Canu, “Recovering sparse signals with a certain family of nonconvex penalties and DC programming,” IEEE Transactions on Signal Processing, vol. 57, no. 12, pp. 4686–4698, 2009.View at: Publisher Site | Google Scholar | MathSciNet
C. Lu, J. Tang, S. Yan, and Z. Lin, “Generalized nonconvex nonsmooth low-rank minimization,” in Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '14), pp. 4130–4137, June 2014.View at: Publisher Site | Google Scholar
Y. Hu, D. Zhang, J. Ye, X. Li, and X. He, “Fast and accurate matrix completion via truncated nuclear norm regularization,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 9, pp. 2117–2130, 2013.View at: Publisher Site | Google Scholar
J. Yang, Y.-Q. Zhao, J. C.-W. Chan, and S. G. Kong, “Coupled sparse denoising and unmixing with low-rank constraint for hyperspectral image,” IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 3, pp. 1818–1833, 2016.View at: Publisher Site | Google Scholar
Y. Wang and X. Su, “Truncated nuclear norm minimization for image restoration based on iterative support detection,” Mathematical Problems in Engineering, vol. 2014, Article ID 937560, 2014.View at: Publisher Site | Google Scholar
F. G. Meyer and X. Shen, “Perturbation of the eigenvectors of the graph Laplacian: application to image denoising,” Applied and Computational Harmonic Analysis, vol. 36, no. 2, pp. 326–334, 2014.View at: Publisher Site | Google Scholar | MathSciNet
S. Gu, Q. Xie, D. Meng, W. Zuo, X. Feng, and L. Zhang, “Weighted nuclear norm minimization and its applications to low level vision,” International Journal of Computer Vision, vol. 121, no. 2, pp. 183–208, 2017.View at: Publisher Site | Google Scholar
N. El Karoui, “On information plus noise kernel random matrices,” The Annals of Statistics, vol. 38, no. 5, pp. 3191–3216, 2010.View at: Publisher Site | Google Scholar | Zentralblatt MATH | MathSciNet
C. Tomasi and R. Manduchi, “Bilateral filtering for gray and color images,” in Proceedings of the IEEE 6th International Conference on Computer Vision, pp. 839–846, January 1998.View at: Google Scholar
K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image denoising by sparse 3-D transform-domain collaborative filtering,” IEEE Transactions on Image Processing, vol. 16, no. 8, pp. 2080–2095, 2007.View at: Publisher Site | Google Scholar | MathSciNet