Department of Electronic Engineering, La Trobe University, Bundoora, Victoria 3086, Australia
Department of Information Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong
Abstract
A fundamental problem in signal processing is to estimate signal from noisy observations.
This is usually formulated as an optimization problem. Optimizations based on variational
lower bound and minorization-maximization have been widely used in machine learning
research, signal processing, and statistics. In this paper, we study iterative algorithms based
on the conjugate function lower bound (CFLB) and minorization-maximization (MM) for a
class of objective functions. We propose a generalized version of these two algorithms and
show that they are equivalent when the objective function is convex and differentiable. We
then develop a CFLB/MM algorithm for solving the MAP estimation problems under a linear
Gaussian observation model. We modify this algorithm for wavelet-domain image denoising.
Experimental results show that using a single wavelet representation the performance of the
proposed algorithms makes better than that of the bishrinkage algorithm which is arguably one
of the best in recent publications. Using complex wavelet representations, the performance
of the proposed algorithm is very competitive with that of the state-of-the-art algorithms.
1. Introduction
Estimating signal from noisy observations is a
fundamental task in signal processing. A linear observation model is given
by
(1)where
is a column
vector of the true signal,
and
are vectors of
observations and noise, respectively. We assume that
is a known
matrix. When noise is assumed independent and identically distributed (i.i.d.)
Gaussian, the maximum likelihood (ML) estimation is a typical least-squares
problem [1]. When the
distribution of the noise is assumed heavy-tailed, the ML estimation is a
robust regression problem [2]. When noise is i.i.d. Gaussian, the maximum a
posteriori (MAP) estimation problem is essentially a penalized least-squares
problem. For example, the problem is known as a ridge-regression [3] or weight-decay [4] problem when the prior for
is also i.i.d.
Gaussian. When the prior for
is
non Gaussian, it is usually chosen to model the sparseness of the signal
[5, 6], to control the complexity
of the model [3, 4], or to perform model selection [7, 8]. A typical application that exploits the sparseness of
the signal is in wavelet-based image denoising [9–11].
Among the non-Gaussian distributions, a particularfamily of heavy-tailed distributions is the scale mixture of Gaussian (SMG)
distribution [12].
These distribution functions, including power exponential,
student-t,
and slash distributions [13, 14], have found many successful applications in robust
statistical data analysis [14, 15], image processing [11, 16], and machine learning problems [6, 17]. An interesting discussion
of robust estimation using the SMG distribution is given in [18].
After we make assumptions about the likelihood andthe
prior, the MAP estimate of
can be
determined by solving a specific optimization problem. Optimization algorithms that are based on variational methods [19, 20] and the minorization-maximization principle [21] are powerful alternatives
to the EM algorithm. For example, optimization algorithms based on variational
methods have been widely used in Bayesian learning that employs non-Gaussian
distributions. These algorithms are based on maximizing the variational lower
bound which is given either by introducing an auxiliary distribution function [20, 22, 23] or by representing the
objective function in its variational form using the conjugate
function [19, 24, 25]. Variational methods have also been applied to a number of signal processing problems [17, 23, 26]. The basic idea of the minorization-maximization (MM)
algorithm [21] is that, instead of directly maximizing the objective function, another objective function that minorizes the original objective function is iteratively
maximized. (The formal definition is presented in Section
2.2.) The MM algorithm has been successfully
applied to solve many statistical problems including variable selection
[27] and quantile
regression [28]. It
has also found applications in machine learning research [29]. Both variational methods
and the MM algorithm have long been applied to solve many signal processing
problems such as image restoration [30–34] and computer tomography
[35–38].
In this paper, we develop an iterative algorithm to
determine the MAP estimate of
based on the
Gaussian linear observation model given by (1) and the prior
given by an
i.i.d. scale mixture of Gaussian distribution. A direct application of this
algorithm is image denoising in the wavelet domain. This is performed by
letting
in the
observation model and regarding
and
as the observed
and original wavelet coefficients, respectively. This is a special case of the
linear observation model. Since our study is based on the conjugate function
lower bound (CFLB) and minorization-maximization, we also study the connection
between the two.
This paper is organized as follows. In Section 2,
after a brief introduction of the CFLB and MM algorithms, we present a
generalized view of both algorithms and two extensions. In particular, we study
iterative optimization algorithms for a class of objective functions
. We assume that through a suitable mapping of the
variable
, the resulting function
(
) is convex and
differentiable. This type of objective function is studied because the
log-prior (the logarithm of the scale mixture of Gaussian distribution) has
this property and plays an important role in the algorithmic development. We
describe iterative algorithms called the CFLB algorithm and MM algorithm for
maximizing the objective function. We then propose a generalized version of
both algorithms and show that they are indeed equivalent for the class of
objective functions considered in this paper. In Section 3, we develop an
iterative algorithm for MAP estimate of the signal under the general model
setting of (1). We also discuss the connection between the developed algorithm
with an EM algorithm. In Section 4, we modify the algorithm developed in
Section 3 for image denoising in the wavelet domain. We study two heavy-tailed
priors for wavelet coefficients: student-t and slash distributions which
are of interest as they have not yet been widely studied in image denoising.
Recognizing that the proposed algorithms can be regarded as generalized Wiener
estimation algorithms, we propose two algorithms which exploit the local
statistics. One is a noniterative algorithm which has a parameter that accounts
for the heavy-tailed characteristics of the signal. The other is an iterative
algorithm based on either the student-t or slash distribution. We also
discuss the connection of proposed algorithms with algorithms based on
empirical Bayes and issues related to using the proposed algorithm in complex
wavelet representations. Experimental results show that when using a single
wavelet representation the performance of the proposed algorithms is better
than that of the bi-shrinkage algorithm [39] which is arguably one of the best in recent
publications. Using over-complete wavelet representations, the performance of
the proposed algorithm is competitive to that of the state-of-the-art image
denoising algorithms [16, 39, 40].
2. Iterative Maximization Based on the Convexity of the Objective Function
In this section, we present a brief introduction to
the CFLB and MM algorithms for determining the local/global maximum of a
objective function
. We assume that through a suitable mapping
we have a
convex and differentiable function
such that
. The proposed CFLB and MM algorithms are based on the
convexity of
. We then present a generalized version of these two
algorithms and two extensions. This is followed by two families of objective
functions for which the CFLB and MM algorithms are useful tools.
2.1. The Conjugate Function Lower Bound 85(CFLB) Algorithm [19, 20]
The conjugate function [41] of
is
(2)When
is convex and
differentiable, the maximizer of
satisfies
, where
. For a fixed
,
is recovered
by
(3)Using Fenchel's inequality
[41], for any
and
, the conjugate function lower bound for
is given
by
(4)We can define a new objective
function
(5)Substituting
into (3), it is clear that
(6)and
for any
.
An iterative algorithm that guarantees a nondecreasing
sequence of
is the
following. The algorithm, called the conjugate function lower bound (CFLB)
algorithm, has two maximization steps. At the kth iteration, we know
and maximize
to obtain
. It is easy to show that
(7)where
. Next, we calculate
by maximizing
,
(8)where we assume that there is at
least a local maximum for
. Since
is fixed,
is a constant.
We can write
(9)From the above two maximization
steps, we can write
(10)According to definition, we have
and
. Thus,
. There-fore, the CFLB algorithm leads to a
nondecreasing sequence of
.
2.2. The Minorization-Maximization (MM) Algorithm
A function
with a known
parameter
is said to
minorize
at the point
provided
(11)Let
be the
maximizer of
, such that
(12)From the definition, we
have
(13)Therefore, maximizing
results in a
nondecreasing sequence
. This algorithm is called a minorization-maximization
(MM) algorithm.
For a convex and differentiable function
, we have
(14)Substituting
into (14), we
have
(15)Thus,
is minorized
by
(16)since
and
). Therefore,
the MM algorithm that iteratively maximizes 
(17)leads to a nondecreasing
sequence
. The convergent property of the MM algorithm and
techniques to speed up the convergence rate are discussed in [21, 42].
2.3. Generalization, Comparison, and Extensions
2.3.1. Generalization and Comparison
The basic ideas of the CFLB and MM algorithms can be
generalized as the following. To find a local/global maximum of
, we assume there is a function
:
(18)where
is a suitable
function of
. An iterative algorithm can now be developed. In the
th step, we
assume
is known and we
calculate
. Then in the
th step, we
determine
such
that
(19)Since by definition
and
, we have
. Therefore, when the two conditions stated in (18) are satisfied, the uphill location
for
is also the
uphill location for
.
The CFLB and MM algorithms are special
cases of the above generalized algorithm. There are two special considerations
in the CFLB and MM algorithms.
(i)
One is operational.
is determined
by themaximization
In light of
(19), this maximization step in the CFLB and MM algorithm is sufficient but not
necessary.
(ii)
The other is
structural. For an MM algorithm,
, while for a CFLB algorithm, the function
depends on the
definition of the conjugate function lower bound
.
In addition, we
can clearly see that the objective function
) of the CFLB
algorithm is exactly the same as the minorization function
of the MM
algorithm. This is because for a convex and differentiable function its
conjugate function lower bound [41] is the same as the minorizing function used in the MM
algorithm. Therefore, the CFLB algorithm and the MM algorithm, when they rely
on the convexity of
, are essentially the same in searching
for a local/global maximum of the function
. We note that for the MM algorithm there are other
tools to construct the minorizing function which is not necessarily the same as
that constructed by using the CFLB.
2.3.2. Two Extensions
Here, we assume the objective function has at least a
local maximum. In the first extension, we consider an objective function of a
vector variable
:
(20)where
and
is scalar.
Assume
is convex.
Then,
is minorized by
. The CFLB/MM
algorithm for maximizing
is given
by
(21)
In the second extension, we consider an objective
function which is the sum of
defined in (20)
and another objective function
:
(22)Here, we assume that
has at least a
local maximum. Since
is minorized by
,
is minorized by
. The CFLB/MM algorithm for maximizing
is given
by
(23)
2.4. An Example
In this section, we show that the logarithm of the
scale mixture of Gaussian distribution function [12] has the desired convex
property after suitable mapping of variables. A family of heavy-tailed
distributions for a zero-mean scalar random variable
is defined as a
scale mixtures of Gaussian:
(24)where
and
is the prior distribution of
. (We adopt the following
notations in this paper. The conditional distribution function is denoted by
, while a function parameterized by a parameter is
denoted by
. A vector
is written in
bold-face font, while its nth elements is denoted
by
.) Here, we have included
two parameters
and
to account for
certain distributions such as the student-t distribution which has a
scaling parameter and a parameter for the degree of freedom. Different settings
for
result in a
family of heavy-tailed distributions [11, 12, 18].
For example, when
is a gamma
distribution with both parameters set to
, the resulting SMG is the student-t distribution. The Gaussian distribution is a special case where
is not a random
variable but is a constant
. The definitions of three heavy-tailed distributions:
power exponential, student-t, and slash are shown in Table 1. More
examples can be found in [11]. We note that
the Laplacian (
) and Gaussian
distribution (
) are two
special cases of the power exponential distribution. In addition, the power
exponential distribution function can be represented as the SMG when
. (This is a subset of the
power exponential distribution function that can be represented as
SMG.)
Table 1: Three
heavy-tailed distributions are presented, where

and

are the gamma
function and incomplete gamma function, respectively. We assume

is a fixed
parameter.
We will use
to simplify
notations in the following discussion. We study the logarithm of the scale
mixture of Gaussian distribution function
(25)where
(26)Changing the variable
, we have a new function
and its first
derivative as follows:
(27)
(28)
We can verify the convexity of
by recognizing
that it is the logarithm of the Laplace transform of
. The Laplace transform of
islog-convex
[41]. Therefore, the
proposed CFLB/MM algorithm can be used to maximize an objective function which
involves log-SMG distribution functions. The function
and its first
derivative for the three distributions are listed in Table 2.
Table 2: The function

and its first
derivative for the three heavy-tailed distributions. For power exponential, the
parameter

must be set as

such that

is convex. We
note that

and

can be directly
calculated from the distribution function. The knowledge of the exact form of

is not
required.
3. Map Estimation under Linear Gaussian Observation Model
In this section, we present the development of a
CFLB/MM algorithm for solving a general MAP estimation problem of linear
Gaussian model. We then discuss the connection between the developed algorithm
with an expectation maximization (EM) algorithm.
3.1. Development of the Iterative Algorithm
Given a linear observation model in (1), we want todetermine a MAP estimate of the parameter vector
based on the
following model assumptions. Elements of
are i.i.d.
Gaussian with known variance
. Elements of
are i.i.d.
scale mixture of Gaussian with unknown scaling parameter
. The degree of freedom
is assumed
to be a free parameter that can be tuned.
(A full Bayesian estimate for
is generally
very computationally complicated [15, 43] and is beyond the scope of this
paper.) More specifically, the log-posterior is
given by
(29)where
(30)Changing variable
, we have
, where the function
is convex and
is given by (27). We can rewrite (29) as
(31)where
(32)Since
is minorized by
,
is minorized by
the following objective function:
(33)where
is a diagonal
matrix. The update for
is then
obtained by maximizing
,
(34)Here, we need to make a further
assumption that the matrix
is properly
defined such that the matrix inversion in the above equation can be carried out
for each iteration. The next step is to determine the update for the scaling
parameter
. To simplify presentation, we assume
is a uniform
distribution in this section. Other priors for
are considered
in Section 4.1. The update of the scaling parameter is given by
(35)
3.2. Equivalence with the EM Algorithm
In [44], we develop an EM algorithm for the MAP estimation
problem. In this section, we present the details of the EM algorithm and show
that it is equivalent to the CFLB/MM algorithm. In developing the EM algorithm,
we regard the parameters
as the missing
data. The signal
and the scaling
factor
, denoted
, are the data to be estimated. We then determine the
Q-function [45]
(36)We now give details of
calculating the Q-function and the E-step. Using Bayes' rule, we can
write
(37)where
(38)Therefore, ignoring constants
and unrelated terms, we have the following results:
(39)where
is the
conditional mean
(40)Equation (40) states the
calculation required for the E-step. In the M-step, we maximize the Q-function
to determine
.
The E-step is calculated as the
following:
(41)where
(42)Since
is
Gaussian
(43)we have
(44)where we have used the
substitution
. Substituting (44) into (41) and comparing with (28),
we can see that
. Comparing objective function of the CFLB/MM algorithm
(33) with the Q-function of the EM algorithm (39), we can see the two
algorithms are equivalent.
4. Applications in Image Denoising
4.1. Iterative Denoising Algorithms
We now use image denoising in wavelet transform domain
as an example to demonstrate the application of the proposed algorithm. (Part of this section was presented in [34, 46].) In the wavelet domain,
we have the following observation model:
(45)where
and
are observed
and original wavelet coefficients of the signal, respectively.
is the additive
Gaussian noise. Therefore, the denoising problem is a special case of the MAP
estimation problem (considered in Section 3), where
is an identity
matrix. From (34), we can easily derive the update for the
signal
(46)where we have used
to simplify
notation. (At the end of Section 3.2, we have
commented on the relationship between
and
.) The update of the
scaling parameter depends on its prior distribution. In this section, we
consider three priors: a conjugate prior given by the inverse-chi-square (Inv-
) distribution,
Jeffreys' prior (
), and the
uniform prior [45].
The Inv-
distribution is
given by
(47)where
is the degree
of freedom. For
the mean of
is given by
. Therefore, if we have prior knowledge about the mean
of
, say
, then
. With these considerations, we can determine the
update for the scaling parameter using the Inv-
prior,
Jeffreys' prior and the uniform prior as the following:
(48)
4.2. Generalized Wiener Estimation
We recall that for the observation model given by
(45), when the signal is modeled i.i.d. Gaussian with zero-mean and known
variance
, the MAP estimate of
is a Wiener
estimate given by
(49)To link the proposed iterative
algorithm with the Wiener estimation, we compare (46) and
(49). It is easy to see that (49) is a special case of (46) where
is a constant,
that is,
. We regard the proposed algorithm as a generalized
Wiener estimate, because (a) the variable
in (46) is a
scaling factor that accounts for the heavy-tailed characteristic of the
distribution, and (b) it is an iterative algorithm.
To gain further insight into the proposed algorithm,
we study the student-t distribution with the degree of freedom
. The relationship between the variance of the signal
and the scaling
factor
is given
by
(50)Thus, once the estimated scale
parameter
at the
th iteration is
known, the estimated signal variance
is also known.
Using this relationship, we can rewrite (46) as
(51)where we have used
and
. We can further rewrite (51) as
(52)where
(53)Comparing (49) and (52), we can
see that the latter can be regarded as a generalized Wiener estimate of the
signal, where a localized signal variance
is estimated by
taking a weighted average of the signal variance
and the local
signal energy. It can be easily seen that when
, the student-
distribution approaches the Gaussian
distribution and
. In this case, (52) is a generalized form of (49) in
that it represents an iterative algorithm for estimating signal under unknown
signal variance.
4.3. Two Image Denoising Algorithms
Direct application of the proposed algorithm for image
denoising does not necessarily lead to satisfactory results. This is because in
developing the algorithm we have ignored that image signals are generally
nonstationary. Since the proposed algorithms can be regarded as generalized
Wiener estimates that use localized information, they are modified in the
following two ways for image denoising.
4.3.1. A Noniterative Generalized Wiener Estimation Algorithm
Motivated by developing a low-complexity algorithm, we
consider a noniterative algorithm given by
(54)where
is a localized
estimate of the signal energy at the nth location and
is constant to
be determined for a particular class of signals. When
, this algorithm is a Wiener estimate using local
statistics. The heavy-tail distribution of the signal is accounted for by
setting
. The performance of this algorithm also depends on
the estimation of noise variance
and the local
signal variance
. A robust
estimation of the variance [10] of the noise is given by
(55)A simple method to estimate the
signal variance is the following:
(56)where
. The underlyingprinciple for this estimation is that
the signal is uncorrelated with noise. With the above results, we can see that
the proposed algorithm (in (54)) is actually a
combination of shrinkage and hard-thresholding. The shrinkage part is a
generalized adaptive Wiener filter with the parameter
accounting for
the heavy-tailed characteristics of the signal, while the hard-thresholding
part, which is well established [10], plays an essential role in favouring a sparse
solution.
It should be noted that (54) is not a direct result of
an optimization problem. It is, however, a low-complexity approximation of the
iterative algorithm given by (46). Indeed, comparing these two equations, we
can see that
and
in (46) are
replaced by
and
, respectively. As such, we can regard (46) as a
one-step implementation of the iterative algorithm.
4.3.2. An Iterative Generalized Wiener Estimation Algorithm Using Local Statistics
This algorithm is motivated by using the local
statistics discussed in Section 4.2. We note that in developing (46) we have
assumed a global scaling parameter
for the whole
image. This assumption is useful to simplify discussion. However, it is not
necessarily a valid one for waveletcoefficients of an image. Therefore, we
propose to replace the global scaling parameter
with a
localized scaling parameter
which is
estimated by
(57)From Table 2, we can see that
is a function
of
for the
student-t and slash distribution. We replace it with
, where
(58)The estimate of the signal is
then given by
(59)Comparing (59) with (46), we can
see that we have used a local estimate of the scaling parameter to replace the
global scaling parameter.
4.3.3. Experimental Results
The noniterative and the iterative
algorithms will be referred to as generalized Wiener
estimate (GWE) and iterative generalized Wiener estimate (IGWE), respectively.
For the GWE algorithm, extensive experiments using different images have shown
that setting
has led to good
results in terms of the peak-signal-to-noise ratio (PSNR) of the denoised
image. For the IGWE algorithm, since the power exponential and a number of SMG
distributions havebeen studied [11, 16, 48], we focus on the student-t and slash
distributions which have not been widely applied to denoising problem. We use
IGWE-T and IGWE-S to indicate the student-t and slash distributions
being used, respectively. Experimental results show that good results are
obtained for 3 to 4 iterations for the IGWE-T (
) and IGWE-S (
) algorithms.
We first test image denoising using a single waveletrepresentation. In all experiments, an image is decomposed into 6 levels using
the
wavelet. Each subband of the signal is then denoised independently.
We have performed simulations using the Barbara and Lena images. The
experimental results for each noise level setting are obtained by taking the
average of the PSNR of 100 runs of the program. In each run of the program,
pseudo-Gaussian noise generator is reset to a different state and noise is added to the image.
We can see from the experimental results shown in
Table 3 that for the Barbara image the iterative algorithms perform better than
the noniterative algorithms. For the Lena image, their performance is about the
same. We can also see that using the GWE algorithm, the PSNR associated with
the setting
is generally
higher than that with the setting
. The difference in PSNR is significant for images
with high-noise levels. We also note that slight improvement in PSNR can be
achieved by varying the value of
between 1.2 to
1.5 according to the estimated noise variance.
Table 3: PSNR (dB)
results using two noisy images with different levels of additive noise. GWE1
and GWE2 are the GWE algorithms with

and

respectively.
We compare the performance of the proposed algorithms
with that of the bi-shrinkage [39] which is arguably one of the best in recent
publications. We can see that the performance of proposed algorithms (GWE2,
IGWE-T, and IGWE-S) is consistently better than that of the bi-shrinkage
algorithm.
Next, we test the proposed algorithm using the complex
wavelet representation [39]. In our experiments, we use the proposed algorithms
(IGWE-T and IGWE-S) to process each individual image subband of the complex
wavelet representation. We use exactly the same complex wavelet transform as
that used in [39]. For
IGWE-T and IGWE-S, the number of iterations is 3 and the degrees of freedom are
set
(IGWE-T) and
(IGWE-S).
Results are shown in Table 4.
Table 4: A comparison of denoising results based on the
peak-signal-to-noise ratio (dB). Five test images are used under different
noise levels. It should be noted that results due to references [
16,
39,
40] are calculated using the
available software from the authors. These results may be slightly different
from those presented in the original paper.
We can see that the performances of the two proposed
iterative algorithms are almost the same. When we compare the results of the
proposed algorithms in Table 4 and with respective results in Table 3, we can
see that using the complex wavelet representation has led to substantially
improved results. Next, we compare results from three image denoising
algorithms which use different over complete wavelet representations and
different statistical models [16, 39, 40]. We can see from Tables 4 and 5 that the performance
of the proposed algorithms are comparable with that of the three
state-of-the-art algorithms in terms of the peak-signal-to-noise ratio and mean
absolute error.
Table 5: A comparison of denoising results based on the mean
absolute error. Five test images are used under different noise levels. It
should be noted that results due to references [
16,
39,
40] are calculated using the
available software from the authors.
The mandrill (also known as baboon) image is quite
different from the other four test images in that it contains a lot of fine
details. We notice that when the noise level is low (
) the
performances of the two iterative algorithms are not as good as those of three
published algorithms. This may be because the window size (
) used in the
calculation of local signal variance does not match the characteristics of the
image. Another reason could be that the prior with a fixed setting of parameter
does not model
the signal well. Therefore, the proposed algorithm could be improved by
introducing an adaptive estimation of the window size and the parameter
. However, this may significantly increase the
computational cost.
In Figures 1 and 2, we compare the results ofdenoised
images using algorithms in [16, 39, 40] and the proposed IGWE-T and GWE2 algorithms.
(Professor Xin Li kindly provided us with his
source code. Matlab codes for the algorithms in [16, 39] are available from the following addresses:
http://decsai.ugr.es/~javier/denoise/software/ and http://taco.poly.edu/WaveletSoftware/, respectively. Default settings for
algorithms in [39, 40] are used and suggested settings to reproduce results
in [16] are also
used.) Again, we can see that the denoised images
are quite similar. We note that computation time of these algorithms are quite
different in our simulations using a PC with a
Pentium 4 3 GHz processor. While the running time
for algorithm in [16]
is more than 75 seconds those for the proposed GWE2 and IGWET algorithms are
about 2.5 and 3.7 seconds, respectively. The running time for the algorithm in
[39] is about 3.6
seconds and the running time for the algorithm in [40] is about 7.2 seconds.
Figure 1: Image denoising
results using the Lena image added with random noise (

). Images shown
from top to bottom, left to right are the noisy results of algorithms in
[
16,
39,
40], the proposed IGWE-T and
GWE2 algorithms. Typical PSNR values for these images are listed in Table
4.
Figure 2: Image denoising
results using the Barbara image added with random noise (

). Images shown
from top to bottom, left to right are the noisy, results of algorithms in
[
16,
39,
40], the proposed IGWE-T and
GWE2 algorithms. Typical PSNR values for these images are listed in Table
4.
4.4. Discussion
In [49], an empirical Bayes (EB) approach is proposed to develop low-complexity image denoising algorithms in which parameters of the
prior are estimated from the data. These estimated parameters are then
“plugged" into the posterior. The proposed iterative algorithms using local
statistics can be regarded as a generalization of the idea of [49] in that the scaling
parameter is treated as a random variable and is jointly estimated with the
signal. More specifically, the difference is in the way the problem isformulated. For the denoising problem considered in this paper, if we used an
EB approach, we would first determine an estimate (e.g., a MAP estimate) of the
scale parameter
from the
marginal distribution
, where
. We would then determine an estimate (e.g., a MAP
estimate) of the signal by assuming a known scale parameter
, that is,
. The approach used in this paper, however, is
different in that we determine a MAP estimate from the joint posterior
by using the
proposed iterative algorithm.
Another interesting question is as follows. In the
observation model (see (45)), it is assumed that the wavelet transform is
orthogonal. However, the complex wavelet transform is redundant and is usually
nonorthogonal. Can we still apply the denoising method developed for signals in
the orthogonal wavelet transform domain to signals in the complex wavelet
transform domain? This question is partly answered in a recent paper [50] by Elad. Elad showed that
for signal denoising using redundant representations an iterative algorithm
such as the basis pursuit [51] is usually employed. Elad further showed that
applying a shrinkage function (usually developed for orthogonal wavelet
representations) to redundant wavelet representations is justified in that this
can be regarded as the first iteration step of the basis pursuit algorithm.
In addition, the number of iterations deserves further
study. As the proposed iterative algorithm is essentially an EM algorithm, it
may converge to a local minimum. On the other hand,
since we use a local neighborhood to update the scale parameter
, we effectively
make a further assumption that the scale parameter also follows an i.i.d.
distribution locally. This assumption may fit the data well in the first few
iterations. But this may not be the case after a few iterations when the signal
is less noisy. This is perhaps an intuitive explanation of the observation that
the performance (measured by the PSNR and mean absolute errors) of the proposed
iterative algorithm improves in the first 3 to 4 iterations, drops slightly,
and converges to a suboptimal estimate.
The proposed algorithms are not optimal in removing
non-Gaussian noise (e.g., impulsive noise). This is because we have taken a
model-based approach in solving a MAP estimating problem involving a Gaussian
linear observation model which has been used in many recent publications (see,
e.g., [11] and
reference therein). As such, the solution is only optimal for Gaussian noise.
From a model-based point of view, to deal with non-Gaussian noise, we need to
make proper assumption about the noise distribution function and solve the MAP
problem. Such work is beyond the scope of this paper.
5. Conclusions
In this paper, we have studied CFLB/MM algorithms for
a special class of objective functions that are convex through a suitable
mapping of variable. We proposed a generalized version of the CFLB/MM algorithm
and show that the CFLB and MM algorithms are equivalent for this class of
objective functions. We develop a CFLB/MM algorithm for general MAP estimation
problems under linear Gaussian observation models. We also study the
relationship between the CFLB/MM algorithm and the EM algorithm. We then modify
the proposed algorithm to image denoising. We showthat the proposed image denoising algorithm can be regarded as a generalization of the classical Wiener estimate algorithm.We propose a noniterative and an iterativealgorithm for
image denoising. We discuss connections of the proposed iterativealgorithm
with those algorithms using empirical Bayes and issues related to using the
proposed algorithms in over-complete wavelet representations. Experimental
results show that the performance of the proposed algorithmusing a single
wavelet representation is better than that of the bi-shrinkage algorithm which
is arguably one of the best in recent publications. When over-complete wavelet
representations such as the complex wavelets are used, the performance of the
proposed algorithms are competitive with three state-of-the-art algorithms.
References
- S. M. Kay, Fundamentals of Statistical Signal Processing Estimation Theory, Prentice Hall, Englewood Cliffs, NJ, USA, 1993.
- P. J. Huber, Robust Statistics, John Wiley & Sons, New York, NY, USA, 1981.
- T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, New York, NY, USA, 2001.
- C. M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, London, UK, 1995.
- B. A. Olshausen and D. J. Field, “Emergence of simple-cell receptive field properties by learning a sparse code for natural images,” Nature, vol. 381, no. 6583, pp. 607–609, 1996.
- M. A. T. Figueiredo, “Adaptive sparseness for supervised learning,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 9, pp. 1150–1159, 2003.
- J. Fan and R. Li, “Variable selection via nonconcave penalized likelihood and its oracle properties,” Journal of the American Statistical Association, vol. 96, no. 456, pp. 1348–1360, 2001.
- R. Tibshirani, “Regression shrinkage and selection via lasso,” Journal of the Royal Statistical Society. Series B, vol. 57, pp. 257–288, 1995.
- A. Antoniadis and J. Fan, “Regularization of wavelet approximations,” Journal of the American Statistical Association, vol. 96, no. 455, pp. 939–967, 2001.
- D. Donoho and I. Johnstone, “Adapting to unknown smoothness via wavelet shrinkage,” Journal of the American Statistical Association, vol. 90, no. 432, pp. 1200–1224, 1995.
- J. M. Bioucas-Dias, “Bayesian wavelet-based image deconvolution: a GEM algorithm exploiting a class of heavy-tailed priors,” IEEE Transactions on Image Processing, vol. 15, no. 4, pp. 937–951, 2006.
- D. F. Andrews and C. L. Mallows, “Scale mixtures of normal distributions,” Journal of the Royal Statistical Society. Series B, vol. 36, pp. 99–102, 1974.
- W. H. Rogers and J. W. Tukey, “Understanding some long-tailed distributions,” Statistica Neerlandica, vol. 26, pp. 211–226, 1962.
- K. L. Lange and J. S. Sinsheimer, “Normal/independent distributions and their applications in robust regression,” Journal of Computational and Graphical Statistics, vol. 2, no. 2, pp. 175–198, 1993.
- L. Chuanhai, “Bayesian robust multivariate linear regression with incomplete data,” Journal of the American Statistical Association, vol. 91, no. 435, pp. 1219–1227, 1996.
- J. Portilla, V. Strela, M. J. Wainwright, and E. P. Simoncelli, “Image denoising using scale mixtures of Gaussians in the wavelet domain,” IEEE Transactions on Image Processing, vol. 12, no. 11, pp. 1338–1351, 2003.
- D. P. Wipf and B. D. Rao, “Sparse Bayesian learning for basis selection,” IEEE Transactions on Signal Processing, vol. 52, no. 8, pp. 2153–2164, 2004.
- F. Girosi, “Models of noise and robust estimates,” Massachusetts Institute of Technology, Cambridge, Mass, USA, 1991.
- M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul, “An introduction to variational methods for graphical models,” in Learning in Graphical Models, M. I. Jordan, Ed., Kluwer Academic Publishers, Dordrecht, The Netherlands, 1999.
- T. S. Jaakkola, “Tutorial on variational approximation methods,” in Advanced Mean Field Methods: Theory and Practice, D. Saad and M. Opper, Eds., MIT Press, Cambridge, Mass, USA, 2000.
- D. R. Hunter and K. Lange, “A tutorial on MM algorithms,” American Statistician, vol. 58, no. 1, pp. 30–37, 2004.
- M. J. Beal, Variational algorithms for approximate Bayesian inference, Ph.D. dissertation, The Gatsby Computational Neuroscience Unit, University College of London, London, UK, May 2003.
- S. J. Roberts and W. D. Penny, “Variational Bayes for generalized autoregressive models,” IEEE Transactions on Signal Processing, vol. 50, no. 9, pp. 2245–2257, 2002.
- M. Girolami, “A variational method for learning sparse and overcomplete representations,” Neural Computation, vol. 13, no. 11, pp. 2517–2532, 2001.
- M. Welling, G. E. Hinton, and S. Osindero, “Learning sparse topographic representations with products of student-t distributions,” in Proceedings of the 15th Conference on Neural Information Processing Systems (NIPS '02), G. Tesauro, D. S. Touretzky, and T. K. Leen, Eds., pp. 1359–1366, MIT Press, Vancouver, BC, Canada, December 2002.
- A. C. Likas and N. P. Galatsanos, “A variational approach for Bayesian blind image deconvolution,” IEEE Transactions on Signal Processing, vol. 52, no. 8, pp. 2222–2233, 2004.
- D. R. Hunter and R. Li, “Variable selection using MM algorithms,” Annals of Statistics, vol. 33, no. 4, pp. 1617–1642, 2005.
- D. R. Hunter and K. Lange, “Quantile regression via an MM algorithm,” Journal of Computational & Graphical Statistics, vol. 9, pp. 60–77, 2000.
- Z. Zhang, J. T. Kwok, and D.-Y. Yeung, “Surrogate maximization/minimization algorithms for AdaBoost and the logistic regression model,” in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 927–934, Banff, Canada, July 2004.
- D. Geman and G. Reynolds, “Constrained restoration and the recovery of discontinuities,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, no. 3, pp. 367–383, 1992.
- D. Geman and C. Yang, “Nonlinear image recovery with half-quadratic regularization,” IEEE Transactions on Image Processing, vol. 4, no. 7, pp. 932–946, 1995.
- M. A. T. Figueiredo and R. D. Nowak, “A bound optimization approach to wavelet-based image deconvolution,” in Proceedings of the International Conference on Image Processing (ICIP '05), vol. 2, pp. 782–785, Genova, Italy, September 2005.
- J. Bioucas-Dias, M. Figueiredo, and J. Oliveira, “Total variation image deconvolution: a majorization-minimization approach,” in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '06), vol. 2, Toulouse, France, May 2006.
- G. Deng and W.-Y. Ng, “A minorization-maximization algorithm for maximum a posteriori signal estimation,” in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '06), vol. 2, pp. 617–620, Toulouse, France, May 2006.
- K. Lange and J. Fessler, “Globally convergent algorithms for maximuma posteriori transmission tomography,” IEEE Transactions on Image Processing, vol. 4, no. 10, pp. 1430–1438, 1995.
- A. de Pierro, “A modified expectation maximization algorithm for penalized likelihood estimation in emission tomography,” IEEE Transactions on Medical Imaging, vol. 14, no. 1, pp. 132–137, 1995.
- P. Charbonnier, L. Blanc-Féraud, G. Aubert, and M. Barlaud, “Deterministic edge-preserving regularization in computed imaging,” IEEE Transactions on Image Processing, vol. 6, no. 2, pp. 298–311, 1997.
- H. Erdogan and J. A. Fessler, “Monotonic algorithms for transmission tomography,” IEEE Transactions on Medical Imaging, vol. 18, no. 9, pp. 801–814, 1999.
- L. Şendur and I. W. Selesnick, “Bivariate shrinkage with local variance estimation,” IEEE Signal Processing Letters, vol. 9, no. 12, pp. 438–441, 2002.
- X. Li and M. T. Orchard, “Spatially adaptive image denoising under overcomplete expansion,” in Proceedings of IEEE International Conference on Image Processing (ICIP '00), vol. 3, pp. 300–303, Vancouver, BC, Canada, September 2000.
- S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, Cambridge, UK, 2004.
- D. R. Hunter, K. Lange, and I. Yang, “Optimization transfer using surrogate objective functions,” Journal of Computational and Graphical Statistics, vol. 9, no. 1, pp. 1–20, 2000.
- M. Jamshidian, “Adaptive robust regression by using a nonlinear regression program,” Journal of Statistical Software, vol. 4, pp. 1–25, 1999.
- G. Deng, “Generalized wiener estimation algorithms based on a family of heavy-tail distributions,” in Proceedings of IEEE International Conference on Image Processing (ICIP '05), vol. 1, pp. 457–460, Genova, Italy, September 2005.
- A. Gelman, H. B. Carlin, H. S. Stern, and D. B. Rubin, Bayesian Data Analysis, Chapman & Hall/CRC, Boca Raton, Fla, USA, 2004.
- G. Deng, “Signal estimation using multiple-wavelet representations and Gaussian models,” in Proceedings of IEEE International Conference on Image Processing (ICIP '05), vol. 1, pp. 453–456, Genova, Italy, September 2005.
- L. Şendur and I. W. Selesnick, “Bivariate shrinkage functions for wavelet-based denoising exploiting interscale dependency,” IEEE Transactions on Signal Processing, vol. 50, no. 11, pp. 2744–2756, 2002.
- S. G. Chang, B. Yu, and M. Vetterli, “Adaptive wavelet thresholding for image denoising and compression,” IEEE Transactions on Image Processing, vol. 9, no. 9, pp. 1532–1546, 2000.
- M. K. Mihcak, I. Kozintsev, K. Ramchandram, and P. Moulin, “Low-complexity image denoising based on statistical modelling of wavelet coefficients,” IEEE Signal Processing Letters, vol. 6, no. 12, pp. 300–303, 1999.
- M. Elad, “Why simple shrinkage is still relevant for redundant representations?,” IEEE Transactions on Information Theory, vol. 52, no. 12, pp. 5559–5569, 2006.
- S. S. Chen, Basis pursuit, Ph.D. dissertation, Department of Statistics, Stanford University, Stanford, Calif, USA, November 1995.