EURASIP Journal on Advances in Signal Processing 
Volume 2008 (2008), Article ID 429128, 12 pages
doi:10.1155/2008/429128
Research Article

Iterative Estimation Algorithms Using Conjugate Function Lower Bound and Minorization-Maximization with Applications in Image Denoising

Guang Deng1 and Wai-Yin Ng2

1Department of Electronic Engineering, La Trobe University, Bundoora, Victoria 3086, Australia
2Department of Information Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong

Received 19 September 2007; Revised 3 January 2008; Accepted 11 February 2008

Recommended by Hubert Cardot

Abstract

A fundamental problem in signal processing is to estimate signal from noisy observations. This is usually formulated as an optimization problem. Optimizations based on variational lower bound and minorization-maximization have been widely used in machine learning research, signal processing, and statistics. In this paper, we study iterative algorithms based on the conjugate function lower bound (CFLB) and minorization-maximization (MM) for a class of objective functions. We propose a generalized version of these two algorithms and show that they are equivalent when the objective function is convex and differentiable. We then develop a CFLB/MM algorithm for solving the MAP estimation problems under a linear Gaussian observation model. We modify this algorithm for wavelet-domain image denoising. Experimental results show that using a single wavelet representation the performance of the proposed algorithms makes better than that of the bishrinkage algorithm which is arguably one of the best in recent publications. Using complex wavelet representations, the performance of the proposed algorithm is very competitive with that of the state-of-the-art algorithms.

1. Introduction

Estimating signal from noisy observations is a fundamental task in signal processing. A linear observation model is given by(1)where is a column vector of the true signal, and are vectors of observations and noise, respectively. We assume that is a known matrix. When noise is assumed independent and identically distributed (i.i.d.) Gaussian, the maximum likelihood (ML) estimation is a typical least-squares problem [1]. When the distribution of the noise is assumed heavy-tailed, the ML estimation is a robust regression problem [2]. When noise is i.i.d. Gaussian, the maximum a posteriori (MAP) estimation problem is essentially a penalized least-squares problem. For example, the problem is known as a ridge-regression [3] or weight-decay [4] problem when the prior for is also i.i.d. Gaussian. When the prior for is non Gaussian, it is usually chosen to model the sparseness of the signal [5, 6], to control the complexity of the model [3, 4], or to perform model selection [7, 8]. A typical application that exploits the sparseness of the signal is in wavelet-based image denoising [911].

Among the non-Gaussian distributions, a particularfamily of heavy-tailed distributions is the scale mixture of Gaussian (SMG) distribution [12]. These distribution functions, including power exponential, student-t, and slash distributions [13, 14], have found many successful applications in robust statistical data analysis [14, 15], image processing [11, 16], and machine learning problems [6, 17]. An interesting discussion of robust estimation using the SMG distribution is given in [18].

After we make assumptions about the likelihood andthe prior, the MAP estimate of can be determined by solving a specific optimization problem. Optimization algorithms that are based on variational methods [19, 20] and the minorization-maximization principle [21] are powerful alternatives to the EM algorithm. For example, optimization algorithms based on variational methods have been widely used in Bayesian learning that employs non-Gaussian distributions. These algorithms are based on maximizing the variational lower bound which is given either by introducing an auxiliary distribution function [20, 22, 23] or by representing the objective function in its variational form using the conjugate function [19, 24, 25]. Variational methods have also been applied to a number of signal processing problems [17, 23, 26]. The basic idea of the minorization-maximization (MM) algorithm [21] is that, instead of directly maximizing the objective function, another objective function that minorizes the original objective function is iteratively maximized. (The formal definition is presented in Section 2.2.) The MM algorithm has been successfully applied to solve many statistical problems including variable selection [27] and quantile regression [28]. It has also found applications in machine learning research [29]. Both variational methods and the MM algorithm have long been applied to solve many signal processing problems such as image restoration [3034] and computer tomography [3538].

In this paper, we develop an iterative algorithm to determine the MAP estimate of based on the Gaussian linear observation model given by (1) and the prior given by an i.i.d. scale mixture of Gaussian distribution. A direct application of this algorithm is image denoising in the wavelet domain. This is performed by letting in the observation model and regarding and as the observed and original wavelet coefficients, respectively. This is a special case of the linear observation model. Since our study is based on the conjugate function lower bound (CFLB) and minorization-maximization, we also study the connection between the two.

This paper is organized as follows. In Section 2, after a brief introduction of the CFLB and MM algorithms, we present a generalized view of both algorithms and two extensions. In particular, we study iterative optimization algorithms for a class of objective functions . We assume that through a suitable mapping of the variable , the resulting function () is convex and differentiable. This type of objective function is studied because the log-prior (the logarithm of the scale mixture of Gaussian distribution) has this property and plays an important role in the algorithmic development. We describe iterative algorithms called the CFLB algorithm and MM algorithm for maximizing the objective function. We then propose a generalized version of both algorithms and show that they are indeed equivalent for the class of objective functions considered in this paper. In Section 3, we develop an iterative algorithm for MAP estimate of the signal under the general model setting of (1). We also discuss the connection between the developed algorithm with an EM algorithm. In Section 4, we modify the algorithm developed in Section 3 for image denoising in the wavelet domain. We study two heavy-tailed priors for wavelet coefficients: student-t and slash distributions which are of interest as they have not yet been widely studied in image denoising. Recognizing that the proposed algorithms can be regarded as generalized Wiener estimation algorithms, we propose two algorithms which exploit the local statistics. One is a noniterative algorithm which has a parameter that accounts for the heavy-tailed characteristics of the signal. The other is an iterative algorithm based on either the student-t or slash distribution. We also discuss the connection of proposed algorithms with algorithms based on empirical Bayes and issues related to using the proposed algorithm in complex wavelet representations. Experimental results show that when using a single wavelet representation the performance of the proposed algorithms is better than that of the bi-shrinkage algorithm [39] which is arguably one of the best in recent publications. Using over-complete wavelet representations, the performance of the proposed algorithm is competitive to that of the state-of-the-art image denoising algorithms [16, 39, 40].

2. Iterative Maximization Based on the Convexity of the Objective Function

In this section, we present a brief introduction to the CFLB and MM algorithms for determining the local/global maximum of a objective function . We assume that through a suitable mapping we have a convex and differentiable function such that . The proposed CFLB and MM algorithms are based on the convexity of . We then present a generalized version of these two algorithms and two extensions. This is followed by two families of objective functions for which the CFLB and MM algorithms are useful tools.

2.1. The Conjugate Function Lower Bound 85(CFLB) Algorithm [19, 20]

The conjugate function [41] of is(2)When is convex and differentiable, the maximizer of satisfies , where . For a fixed , is recovered by(3)Using Fenchel's inequality [41], for any and , the conjugate function lower bound for is given by(4)We can define a new objective function(5)Substituting into (3), it is clear that (6)and for any .

An iterative algorithm that guarantees a nondecreasing sequence of is the following. The algorithm, called the conjugate function lower bound (CFLB) algorithm, has two maximization steps. At the kth iteration, we know and maximize to obtain . It is easy to show that(7)where . Next, we calculate by maximizing ,(8)where we assume that there is at least a local maximum for . Since is fixed, is a constant. We can write(9)From the above two maximization steps, we can write(10)According to definition, we have and . Thus, . There-fore, the CFLB algorithm leads to a nondecreasing sequence of .

2.2. The Minorization-Maximization (MM) Algorithm

A function with a known parameter is said to minorize at the point provided(11)Let be the maximizer of , such that(12)From the definition, we have(13)Therefore, maximizing results in a nondecreasing sequence . This algorithm is called a minorization-maximization (MM) algorithm.

For a convex and differentiable function , we have(14)Substituting into (14), we have(15)Thus, is minorized by(16)since and ). Therefore, the MM algorithm that iteratively maximizes (17)leads to a nondecreasing sequence . The convergent property of the MM algorithm and techniques to speed up the convergence rate are discussed in [21, 42].

2.3. Generalization, Comparison, and Extensions
2.3.1. Generalization and Comparison

The basic ideas of the CFLB and MM algorithms can be generalized as the following. To find a local/global maximum of , we assume there is a function : (18)where is a suitable function of . An iterative algorithm can now be developed. In the th step, we assume is known and we calculate . Then in the th step, we determine such that(19)Since by definition and , we have . Therefore, when the two conditions stated in (18) are satisfied, the uphill location for is also the uphill location for .

The CFLB and MM algorithms are special cases of the above generalized algorithm. There are two special considerations in the CFLB and MM algorithms.

(i) One is operational. is determined by themaximization In light of (19), this maximization step in the CFLB and MM algorithm is sufficient but not necessary. (ii) The other is structural. For an MM algorithm, , while for a CFLB algorithm, the function depends on the definition of the conjugate function lower bound . In addition, we can clearly see that the objective function ) of the CFLB algorithm is exactly the same as the minorization function of the MM algorithm. This is because for a convex and differentiable function its conjugate function lower bound [41] is the same as the minorizing function used in the MM algorithm. Therefore, the CFLB algorithm and the MM algorithm, when they rely on the convexity of , are essentially the same in searching for a local/global maximum of the function . We note that for the MM algorithm there are other tools to construct the minorizing function which is not necessarily the same as that constructed by using the CFLB.

2.3.2. Two Extensions

Here, we assume the objective function has at least a local maximum. In the first extension, we consider an objective function of a vector variable : (20)where and is scalar. Assume is convex. Then, is minorized by . The CFLB/MM algorithm for maximizing is given by(21)

In the second extension, we consider an objective function which is the sum of defined in (20) and another objective function :(22)Here, we assume that has at least a local maximum. Since is minorized by , is minorized by . The CFLB/MM algorithm for maximizing is given by(23)

2.4. An Example

In this section, we show that the logarithm of the scale mixture of Gaussian distribution function [12] has the desired convex property after suitable mapping of variables. A family of heavy-tailed distributions for a zero-mean scalar random variable is defined as a scale mixtures of Gaussian:(24)where and is the prior distribution of . (We adopt the following notations in this paper. The conditional distribution function is denoted by , while a function parameterized by a parameter is denoted by . A vector is written in bold-face font, while its nth elements is denoted by .) Here, we have included two parameters and to account for certain distributions such as the student-t distribution which has a scaling parameter and a parameter for the degree of freedom. Different settings for result in a family of heavy-tailed distributions [11, 12, 18]. For example, when is a gamma distribution with both parameters set to , the resulting SMG is the student-t distribution. The Gaussian distribution is a special case where is not a random variable but is a constant . The definitions of three heavy-tailed distributions: power exponential, student-t, and slash are shown in Table 1. More examples can be found in [11]. We note that the Laplacian () and Gaussian distribution () are two special cases of the power exponential distribution. In addition, the power exponential distribution function can be represented as the SMG when . (This is a subset of the power exponential distribution function that can be represented as SMG.)

Table 1: Three heavy-tailed distributions are presented, where and are the gamma function and incomplete gamma function, respectively. We assume is a fixed parameter.

We will use to simplify notations in the following discussion. We study the logarithm of the scale mixture of Gaussian distribution function(25)where(26)Changing the variable , we have a new function and its first derivative as follows:(27)(28)

We can verify the convexity of by recognizing that it is the logarithm of the Laplace transform of . The Laplace transform of islog-convex [41]. Therefore, the proposed CFLB/MM algorithm can be used to maximize an objective function which involves log-SMG distribution functions. The function and its first derivative for the three distributions are listed in Table 2.

Table 2: The function and its first derivative for the three heavy-tailed distributions. For power exponential, the parameter must be set as such that is convex. We note that and can be directly calculated from the distribution function. The knowledge of the exact form of is not required.

3. Map Estimation under Linear Gaussian Observation Model

In this section, we present the development of a CFLB/MM algorithm for solving a general MAP estimation problem of linear Gaussian model. We then discuss the connection between the developed algorithm with an expectation maximization (EM) algorithm.

3.1. Development of the Iterative Algorithm

Given a linear observation model in (1), we want todetermine a MAP estimate of the parameter vector based on the following model assumptions. Elements of are i.i.d. Gaussian with known variance . Elements of are i.i.d. scale mixture of Gaussian with unknown scaling parameter . The degree of freedom is assumed to be a free parameter that can be tuned. (A full Bayesian estimate for is generally very computationally complicated [15, 43] and is beyond the scope of this paper.) More specifically, the log-posterior is given by(29)where(30)Changing variable , we have , where the function is convex and is given by (27). We can rewrite (29) as(31)where(32)Since is minorized by , is minorized by the following objective function:(33)where is a diagonal matrix. The update for is then obtained by maximizing ,(34)Here, we need to make a further assumption that the matrix is properly defined such that the matrix inversion in the above equation can be carried out for each iteration. The next step is to determine the update for the scaling parameter . To simplify presentation, we assume is a uniform distribution in this section. Other priors for are considered in Section 4.1. The update of the scaling parameter is given by(35)

3.2. Equivalence with the EM Algorithm

In [44], we develop an EM algorithm for the MAP estimation problem. In this section, we present the details of the EM algorithm and show that it is equivalent to the CFLB/MM algorithm. In developing the EM algorithm, we regard the parameters as the missing data. The signal and the scaling factor , denoted , are the data to be estimated. We then determine the Q-function [45](36)We now give details of calculating the Q-function and the E-step. Using Bayes' rule, we can write(37)where(38)Therefore, ignoring constants and unrelated terms, we have the following results:(39)where is the conditional mean(40)Equation (40) states the calculation required for the E-step. In the M-step, we maximize the Q-function to determine .

The E-step is calculated as the following:(41)where(42)Since is Gaussian(43)we have(44)where we have used the substitution . Substituting (44) into (41) and comparing with (28), we can see that . Comparing objective function of the CFLB/MM algorithm (33) with the Q-function of the EM algorithm (39), we can see the two algorithms are equivalent.

4. Applications in Image Denoising

4.1. Iterative Denoising Algorithms

We now use image denoising in wavelet transform domain as an example to demonstrate the application of the proposed algorithm. (Part of this section was presented in [34, 46].) In the wavelet domain, we have the following observation model:(45)where and are observed and original wavelet coefficients of the signal, respectively. is the additive Gaussian noise. Therefore, the denoising problem is a special case of the MAP estimation problem (considered in Section 3), where is an identity matrix. From (34), we can easily derive the update for the signal(46)where we have used to simplify notation. (At the end of Section 3.2, we have commented on the relationship between and .) The update of the scaling parameter depends on its prior distribution. In this section, we consider three priors: a conjugate prior given by the inverse-chi-square (Inv-) distribution, Jeffreys' prior (), and the uniform prior [45]. The Inv- distribution is given by(47)where is the degree of freedom. For the mean of is given by . Therefore, if we have prior knowledge about the mean of , say , then . With these considerations, we can determine the update for the scaling parameter using the Inv- prior, Jeffreys' prior and the uniform prior as the following:(48)

4.2. Generalized Wiener Estimation

We recall that for the observation model given by (45), when the signal is modeled i.i.d. Gaussian with zero-mean and known variance , the MAP estimate of is a Wiener estimate given by(49)To link the proposed iterative algorithm with the Wiener estimation, we compare (46) and (49). It is easy to see that (49) is a special case of (46) where is a constant, that is, . We regard the proposed algorithm as a generalized Wiener estimate, because (a) the variable in (46) is a scaling factor that accounts for the heavy-tailed characteristic of the distribution, and (b) it is an iterative algorithm.

To gain further insight into the proposed algorithm, we study the student-t distribution with the degree of freedom . The relationship between the variance of the signal and the scaling factor is given by(50)Thus, once the estimated scale parameter at the th iteration is known, the estimated signal variance is also known. Using this relationship, we can rewrite (46) as(51)where we have used and . We can further rewrite (51) as(52)where(53)Comparing (49) and (52), we can see that the latter can be regarded as a generalized Wiener estimate of the signal, where a localized signal variance is estimated by taking a weighted average of the signal variance and the local signal energy. It can be easily seen that when , the student- distribution approaches the Gaussian distribution and . In this case, (52) is a generalized form of (49) in that it represents an iterative algorithm for estimating signal under unknown signal variance.

4.3. Two Image Denoising Algorithms

Direct application of the proposed algorithm for image denoising does not necessarily lead to satisfactory results. This is because in developing the algorithm we have ignored that image signals are generally nonstationary. Since the proposed algorithms can be regarded as generalized Wiener estimates that use localized information, they are modified in the following two ways for image denoising.

4.3.1. A Noniterative Generalized Wiener Estimation Algorithm

Motivated by developing a low-complexity algorithm, we consider a noniterative algorithm given by(54)where is a localized estimate of the signal energy at the nth location and is constant to be determined for a particular class of signals. When , this algorithm is a Wiener estimate using local statistics. The heavy-tail distribution of the signal is accounted for by setting . The performance of this algorithm also depends on the estimation of noise variance and the local signal variance . A robust estimation of the variance [10] of the noise is given by(55)A simple method to estimate the signal variance is the following:(56)where . The underlyingprinciple for this estimation is that the signal is uncorrelated with noise. With the above results, we can see that the proposed algorithm (in (54)) is actually a combination of shrinkage and hard-thresholding. The shrinkage part is a generalized adaptive Wiener filter with the parameter accounting for the heavy-tailed characteristics of the signal, while the hard-thresholding part, which is well established [10], plays an essential role in favouring a sparse solution.

It should be noted that (54) is not a direct result of an optimization problem. It is, however, a low-complexity approximation of the iterative algorithm given by (46). Indeed, comparing these two equations, we can see that and in (46) are replaced by and , respectively. As such, we can regard (46) as a one-step implementation of the iterative algorithm.

4.3.2. An Iterative Generalized Wiener Estimation Algorithm Using Local Statistics

This algorithm is motivated by using the local statistics discussed in Section 4.2. We note that in developing (46) we have assumed a global scaling parameter for the whole image. This assumption is useful to simplify discussion. However, it is not necessarily a valid one for waveletcoefficients of an image. Therefore, we propose to replace the global scaling parameter with a localized scaling parameter which is estimated by(57)From Table 2, we can see that is a function of for the student-t and slash distribution. We replace it with , where(58)The estimate of the signal is then given by(59)Comparing (59) with (46), we can see that we have used a local estimate of the scaling parameter to replace the global scaling parameter.

4.3.3. Experimental Results

The noniterative and the iterative algorithms will be referred to as generalized Wiener estimate (GWE) and iterative generalized Wiener estimate (IGWE), respectively. For the GWE algorithm, extensive experiments using different images have shown that setting has led to good results in terms of the peak-signal-to-noise ratio (PSNR) of the denoised image. For the IGWE algorithm, since the power exponential and a number of SMG distributions havebeen studied [11, 16, 48], we focus on the student-t and slash distributions which have not been widely applied to denoising problem. We use IGWE-T and IGWE-S to indicate the student-t and slash distributions being used, respectively. Experimental results show that good results are obtained for 3 to 4 iterations for the IGWE-T () and IGWE-S () algorithms.

We first test image denoising using a single waveletrepresentation. In all experiments, an image is decomposed into 6 levels using the wavelet. Each subband of the signal is then denoised independently. We have performed simulations using the Barbara and Lena images. The experimental results for each noise level setting are obtained by taking the average of the PSNR of 100 runs of the program. In each run of the program, pseudo-Gaussian noise generator is reset to a different state and noise is added to the image.

We can see from the experimental results shown in Table 3 that for the Barbara image the iterative algorithms perform better than the noniterative algorithms. For the Lena image, their performance is about the same. We can also see that using the GWE algorithm, the PSNR associated with the setting is generally higher than that with the setting . The difference in PSNR is significant for images with high-noise levels. We also note that slight improvement in PSNR can be achieved by varying the value of between 1.2 to 1.5 according to the estimated noise variance.

Table 3: PSNR (dB) results using two noisy images with different levels of additive noise. GWE1 and GWE2 are the GWE algorithms with and respectively.

We compare the performance of the proposed algorithms with that of the bi-shrinkage [39] which is arguably one of the best in recent publications. We can see that the performance of proposed algorithms (GWE2, IGWE-T, and IGWE-S) is consistently better than that of the bi-shrinkage algorithm.

Next, we test the proposed algorithm using the complex wavelet representation [39]. In our experiments, we use the proposed algorithms (IGWE-T and IGWE-S) to process each individual image subband of the complex wavelet representation. We use exactly the same complex wavelet transform as that used in [39]. For IGWE-T and IGWE-S, the number of iterations is 3 and the degrees of freedom are set (IGWE-T) and (IGWE-S). Results are shown in Table 4.

Table 4: A comparison of denoising results based on the peak-signal-to-noise ratio (dB). Five test images are used under different noise levels. It should be noted that results due to references [16, 39, 40] are calculated using the available software from the authors. These results may be slightly different from those presented in the original paper.

We can see that the performances of the two proposed iterative algorithms are almost the same. When we compare the results of the proposed algorithms in Table 4 and with respective results in Table 3, we can see that using the complex wavelet representation has led to substantially improved results. Next, we compare results from three image denoising algorithms which use different over complete wavelet representations and different statistical models [16, 39, 40]. We can see from Tables 4 and 5 that the performance of the proposed algorithms are comparable with that of the three state-of-the-art algorithms in terms of the peak-signal-to-noise ratio and mean absolute error.

Table 5: A comparison of denoising results based on the mean absolute error. Five test images are used under different noise levels. It should be noted that results due to references [16, 39, 40] are calculated using the available software from the authors.

The mandrill (also known as baboon) image is quite different from the other four test images in that it contains a lot of fine details. We notice that when the noise level is low () the performances of the two iterative algorithms are not as good as those of three published algorithms. This may be because the window size () used in the calculation of local signal variance does not match the characteristics of the image. Another reason could be that the prior with a fixed setting of parameter does not model the signal well. Therefore, the proposed algorithm could be improved by introducing an adaptive estimation of the window size and the parameter . However, this may significantly increase the computational cost.

In Figures 1 and 2, we compare the results ofdenoised images using algorithms in [16, 39, 40] and the proposed IGWE-T and GWE2 algorithms. (Professor Xin Li kindly provided us with his source code. Matlab codes for the algorithms in [16, 39] are available from the following addresses: http://decsai.ugr.es/~javier/denoise/software/ and http://taco.poly.edu/WaveletSoftware/, respectively. Default settings for algorithms in [39, 40] are used and suggested settings to reproduce results in [16] are also used.) Again, we can see that the denoised images are quite similar. We note that computation time of these algorithms are quite different in our simulations using a PC with a Pentium 4 3 GHz processor. While the running time for algorithm in [16] is more than 75 seconds those for the proposed GWE2 and IGWET algorithms are about 2.5 and 3.7 seconds, respectively. The running time for the algorithm in [39] is about 3.6 seconds and the running time for the algorithm in [40] is about 7.2 seconds.

Figure 1: Image denoising results using the Lena image added with random noise (). Images shown from top to bottom, left to right are the noisy results of algorithms in [16, 39, 40], the proposed IGWE-T and GWE2 algorithms. Typical PSNR values for these images are listed in Table 4.
Figure 2: Image denoising results using the Barbara image added with random noise (). Images shown from top to bottom, left to right are the noisy, results of algorithms in [16, 39, 40], the proposed IGWE-T and GWE2 algorithms. Typical PSNR values for these images are listed in Table 4.
4.4. Discussion

In [49], an empirical Bayes (EB) approach is proposed to develop low-complexity image denoising algorithms in which parameters of the prior are estimated from the data. These estimated parameters are then “plugged" into the posterior. The proposed iterative algorithms using local statistics can be regarded as a generalization of the idea of [49] in that the scaling parameter is treated as a random variable and is jointly estimated with the signal. More specifically, the difference is in the way the problem isformulated. For the denoising problem considered in this paper, if we used an EB approach, we would first determine an estimate (e.g., a MAP estimate) of the scale parameter from the marginal distribution , where . We would then determine an estimate (e.g., a MAP estimate) of the signal by assuming a known scale parameter , that is, . The approach used in this paper, however, is different in that we determine a MAP estimate from the joint posterior by using the proposed iterative algorithm.

Another interesting question is as follows. In the observation model (see (45)), it is assumed that the wavelet transform is orthogonal. However, the complex wavelet transform is redundant and is usually nonorthogonal. Can we still apply the denoising method developed for signals in the orthogonal wavelet transform domain to signals in the complex wavelet transform domain? This question is partly answered in a recent paper [50] by Elad. Elad showed that for signal denoising using redundant representations an iterative algorithm such as the basis pursuit [51] is usually employed. Elad further showed that applying a shrinkage function (usually developed for orthogonal wavelet representations) to redundant wavelet representations is justified in that this can be regarded as the first iteration step of the basis pursuit algorithm.

In addition, the number of iterations deserves further study. As the proposed iterative algorithm is essentially an EM algorithm, it may converge to a local minimum. On the other hand, since we use a local neighborhood to update the scale parameter , we effectively make a further assumption that the scale parameter also follows an i.i.d. distribution locally. This assumption may fit the data well in the first few iterations. But this may not be the case after a few iterations when the signal is less noisy. This is perhaps an intuitive explanation of the observation that the performance (measured by the PSNR and mean absolute errors) of the proposed iterative algorithm improves in the first 3 to 4 iterations, drops slightly, and converges to a suboptimal estimate.

The proposed algorithms are not optimal in removing non-Gaussian noise (e.g., impulsive noise). This is because we have taken a model-based approach in solving a MAP estimating problem involving a Gaussian linear observation model which has been used in many recent publications (see, e.g., [11] and reference therein). As such, the solution is only optimal for Gaussian noise. From a model-based point of view, to deal with non-Gaussian noise, we need to make proper assumption about the noise distribution function and solve the MAP problem. Such work is beyond the scope of this paper.

5. Conclusions

In this paper, we have studied CFLB/MM algorithms for a special class of objective functions that are convex through a suitable mapping of variable. We proposed a generalized version of the CFLB/MM algorithm and show that the CFLB and MM algorithms are equivalent for this class of objective functions. We develop a CFLB/MM algorithm for general MAP estimation problems under linear Gaussian observation models. We also study the relationship between the CFLB/MM algorithm and the EM algorithm. We then modify the proposed algorithm to image denoising. We showthat the proposed image denoising algorithm can be regarded as a generalization of the classical Wiener estimate algorithm.We propose a noniterative and an iterativealgorithm for image denoising. We discuss connections of the proposed iterativealgorithm with those algorithms using empirical Bayes and issues related to using the proposed algorithms in over-complete wavelet representations. Experimental results show that the performance of the proposed algorithmusing a single wavelet representation is better than that of the bi-shrinkage algorithm which is arguably one of the best in recent publications. When over-complete wavelet representations such as the complex wavelets are used, the performance of the proposed algorithms are competitive with three state-of-the-art algorithms.

References

  1. S. M. Kay, Fundamentals of Statistical Signal Processing Estimation Theory, Prentice Hall, Englewood Cliffs, NJ, USA, 1993.
  2. P. J. Huber, Robust Statistics, John Wiley & Sons, New York, NY, USA, 1981.
  3. T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, New York, NY, USA, 2001.
  4. C. M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, London, UK, 1995.
  5. B. A. Olshausen and D. J. Field, “Emergence of simple-cell receptive field properties by learning a sparse code for natural images,” Nature, vol. 381, no. 6583, pp. 607–609, 1996.
  6. M. A. T. Figueiredo, “Adaptive sparseness for supervised learning,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 9, pp. 1150–1159, 2003.
  7. J. Fan and R. Li, “Variable selection via nonconcave penalized likelihood and its oracle properties,” Journal of the American Statistical Association, vol. 96, no. 456, pp. 1348–1360, 2001.
  8. R. Tibshirani, “Regression shrinkage and selection via lasso,” Journal of the Royal Statistical Society. Series B, vol. 57, pp. 257–288, 1995.
  9. A. Antoniadis and J. Fan, “Regularization of wavelet approximations,” Journal of the American Statistical Association, vol. 96, no. 455, pp. 939–967, 2001.
  10. D. Donoho and I. Johnstone, “Adapting to unknown smoothness via wavelet shrinkage,” Journal of the American Statistical Association, vol. 90, no. 432, pp. 1200–1224, 1995.
  11. J. M. Bioucas-Dias, “Bayesian wavelet-based image deconvolution: a GEM algorithm exploiting a class of heavy-tailed priors,” IEEE Transactions on Image Processing, vol. 15, no. 4, pp. 937–951, 2006.
  12. D. F. Andrews and C. L. Mallows, “Scale mixtures of normal distributions,” Journal of the Royal Statistical Society. Series B, vol. 36, pp. 99–102, 1974.
  13. W. H. Rogers and J. W. Tukey, “Understanding some long-tailed distributions,” Statistica Neerlandica, vol. 26, pp. 211–226, 1962.
  14. K. L. Lange and J. S. Sinsheimer, “Normal/independent distributions and their applications in robust regression,” Journal of Computational and Graphical Statistics, vol. 2, no. 2, pp. 175–198, 1993.
  15. L. Chuanhai, “Bayesian robust multivariate linear regression with incomplete data,” Journal of the American Statistical Association, vol. 91, no. 435, pp. 1219–1227, 1996.
  16. J. Portilla, V. Strela, M. J. Wainwright, and E. P. Simoncelli, “Image denoising using scale mixtures of Gaussians in the wavelet domain,” IEEE Transactions on Image Processing, vol. 12, no. 11, pp. 1338–1351, 2003.
  17. D. P. Wipf and B. D. Rao, “Sparse Bayesian learning for basis selection,” IEEE Transactions on Signal Processing, vol. 52, no. 8, pp. 2153–2164, 2004.
  18. F. Girosi, “Models of noise and robust estimates,” Massachusetts Institute of Technology, Cambridge, Mass, USA, 1991.
  19. M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul, “An introduction to variational methods for graphical models,” in Learning in Graphical Models, M. I. Jordan, Ed., Kluwer Academic Publishers, Dordrecht, The Netherlands, 1999.
  20. T. S. Jaakkola, “Tutorial on variational approximation methods,” in Advanced Mean Field Methods: Theory and Practice, D. Saad and M. Opper, Eds., MIT Press, Cambridge, Mass, USA, 2000.
  21. D. R. Hunter and K. Lange, “A tutorial on MM algorithms,” American Statistician, vol. 58, no. 1, pp. 30–37, 2004.
  22. M. J. Beal, Variational algorithms for approximate Bayesian inference, Ph.D. dissertation, The Gatsby Computational Neuroscience Unit, University College of London, London, UK, May 2003.
  23. S. J. Roberts and W. D. Penny, “Variational Bayes for generalized autoregressive models,” IEEE Transactions on Signal Processing, vol. 50, no. 9, pp. 2245–2257, 2002.
  24. M. Girolami, “A variational method for learning sparse and overcomplete representations,” Neural Computation, vol. 13, no. 11, pp. 2517–2532, 2001.
  25. M. Welling, G. E. Hinton, and S. Osindero, “Learning sparse topographic representations with products of student-t distributions,” in Proceedings of the 15th Conference on Neural Information Processing Systems (NIPS '02), G. Tesauro, D. S. Touretzky, and T. K. Leen, Eds., pp. 1359–1366, MIT Press, Vancouver, BC, Canada, December 2002.
  26. A. C. Likas and N. P. Galatsanos, “A variational approach for Bayesian blind image deconvolution,” IEEE Transactions on Signal Processing, vol. 52, no. 8, pp. 2222–2233, 2004.
  27. D. R. Hunter and R. Li, “Variable selection using MM algorithms,” Annals of Statistics, vol. 33, no. 4, pp. 1617–1642, 2005.
  28. D. R. Hunter and K. Lange, “Quantile regression via an MM algorithm,” Journal of Computational & Graphical Statistics, vol. 9, pp. 60–77, 2000.
  29. Z. Zhang, J. T. Kwok, and D.-Y. Yeung, “Surrogate maximization/minimization algorithms for AdaBoost and the logistic regression model,” in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 927–934, Banff, Canada, July 2004.
  30. D. Geman and G. Reynolds, “Constrained restoration and the recovery of discontinuities,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, no. 3, pp. 367–383, 1992.
  31. D. Geman and C. Yang, “Nonlinear image recovery with half-quadratic regularization,” IEEE Transactions on Image Processing, vol. 4, no. 7, pp. 932–946, 1995.
  32. M. A. T. Figueiredo and R. D. Nowak, “A bound optimization approach to wavelet-based image deconvolution,” in Proceedings of the International Conference on Image Processing (ICIP '05), vol. 2, pp. 782–785, Genova, Italy, September 2005.
  33. J. Bioucas-Dias, M. Figueiredo, and J. Oliveira, “Total variation image deconvolution: a majorization-minimization approach,” in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '06), vol. 2, Toulouse, France, May 2006.
  34. G. Deng and W.-Y. Ng, “A minorization-maximization algorithm for maximum a posteriori signal estimation,” in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '06), vol. 2, pp. 617–620, Toulouse, France, May 2006.
  35. K. Lange and J. Fessler, “Globally convergent algorithms for maximuma posteriori transmission tomography,” IEEE Transactions on Image Processing, vol. 4, no. 10, pp. 1430–1438, 1995.
  36. A. de Pierro, “A modified expectation maximization algorithm for penalized likelihood estimation in emission tomography,” IEEE Transactions on Medical Imaging, vol. 14, no. 1, pp. 132–137, 1995.
  37. P. Charbonnier, L. Blanc-Féraud, G. Aubert, and M. Barlaud, “Deterministic edge-preserving regularization in computed imaging,” IEEE Transactions on Image Processing, vol. 6, no. 2, pp. 298–311, 1997.
  38. H. Erdogan and J. A. Fessler, “Monotonic algorithms for transmission tomography,” IEEE Transactions on Medical Imaging, vol. 18, no. 9, pp. 801–814, 1999.
  39. L. Şendur and I. W. Selesnick, “Bivariate shrinkage with local variance estimation,” IEEE Signal Processing Letters, vol. 9, no. 12, pp. 438–441, 2002.
  40. X. Li and M. T. Orchard, “Spatially adaptive image denoising under overcomplete expansion,” in Proceedings of IEEE International Conference on Image Processing (ICIP '00), vol. 3, pp. 300–303, Vancouver, BC, Canada, September 2000.
  41. S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, Cambridge, UK, 2004.
  42. D. R. Hunter, K. Lange, and I. Yang, “Optimization transfer using surrogate objective functions,” Journal of Computational and Graphical Statistics, vol. 9, no. 1, pp. 1–20, 2000.
  43. M. Jamshidian, “Adaptive robust regression by using a nonlinear regression program,” Journal of Statistical Software, vol. 4, pp. 1–25, 1999.
  44. G. Deng, “Generalized wiener estimation algorithms based on a family of heavy-tail distributions,” in Proceedings of IEEE International Conference on Image Processing (ICIP '05), vol. 1, pp. 457–460, Genova, Italy, September 2005.
  45. A. Gelman, H. B. Carlin, H. S. Stern, and D. B. Rubin, Bayesian Data Analysis, Chapman & Hall/CRC, Boca Raton, Fla, USA, 2004.
  46. G. Deng, “Signal estimation using multiple-wavelet representations and Gaussian models,” in Proceedings of IEEE International Conference on Image Processing (ICIP '05), vol. 1, pp. 453–456, Genova, Italy, September 2005.
  47. L. Şendur and I. W. Selesnick, “Bivariate shrinkage functions for wavelet-based denoising exploiting interscale dependency,” IEEE Transactions on Signal Processing, vol. 50, no. 11, pp. 2744–2756, 2002.
  48. S. G. Chang, B. Yu, and M. Vetterli, “Adaptive wavelet thresholding for image denoising and compression,” IEEE Transactions on Image Processing, vol. 9, no. 9, pp. 1532–1546, 2000.
  49. M. K. Mihcak, I. Kozintsev, K. Ramchandram, and P. Moulin, “Low-complexity image denoising based on statistical modelling of wavelet coefficients,” IEEE Signal Processing Letters, vol. 6, no. 12, pp. 300–303, 1999.
  50. M. Elad, “Why simple shrinkage is still relevant for redundant representations?,” IEEE Transactions on Information Theory, vol. 52, no. 12, pp. 5559–5569, 2006.
  51. S. S. Chen, Basis pursuit, Ph.D. dissertation, Department of Statistics, Stanford University, Stanford, Calif, USA, November 1995.