Abstract
In this study, two new approaches for speech signal noise reduction
based on the empirical mode decomposition (EMD) recently introduced by
Huang et al. (1998) are proposed. Because they build on the EMD, both reduction schemes are fully data-driven. The noisy signal is decomposed adaptively
into oscillatory components called intrinsic mode functions (IMFs) by
a temporal decomposition called the sifting process. Two strategies for noise
reduction are proposed: filtering and thresholding. In both, the signal is
reconstructed from IMFs that have first been either filtered, using the minimum mean-squared error (MMSE) filter introduced by I. Y. Soon et al. (1998), or thresholded, using a shrinkage function. The performance of these methods
is analyzed and compared with that of the MMSE filter and of wavelet
shrinkage. The study is limited to signals corrupted by additive white
Gaussian noise. The results show that the proposed denoising
schemes perform better than the MMSE filter and the wavelet approach.
1. Introduction
Speech enhancement is a classical problem in signal
processing, particularly in the case of additive white Gaussian noise where
different noise reduction methods have been proposed [1–4]. When a noise estimate is
available, filtering gives accurate results. However, these methods are
less effective when the noise is difficult to estimate. Linear methods such as
Wiener filtering [5]
are used because linear filters are easy to design and implement. These linear
methods are less effective for signals presenting sharp edges or impulses of
short duration. Furthermore, real signals are often nonstationary. In order to
overcome these shortcomings, nonlinear methods have been proposed,
especially those based on wavelet thresholding [6, 7]. The idea of wavelet
thresholding relies on the assumption that signal magnitudes dominate the
magnitudes of noise in a wavelet representation so that wavelet coefficients
can be set to zero if their magnitudes are less than a predetermined threshold
[7]. A limitation of the
wavelet approach is that the basis functions are fixed and thus do not
necessarily match all real signals. To avoid this problem, time-frequency
atomic signal decomposition can be used [8, 9]. As for wavelet packets, if the dictionary is very
large and rich, with a collection of atomic waveforms located on a
much finer grid in time-frequency space than wavelet and cosine packet tables,
then it should be possible to represent a large class of real signals; but, in
spite of this, the basis functions must still be specified (Gabor functions, damped
sinusoids, etc.).
Recently, a new
data-driven technique, referred to as empirical mode decomposition (EMD), has
been introduced by Huang et al. [10] for analyzing data
from nonstationary and nonlinear processes. The EMD has received much attention
in terms of applications [11–23], interpretation [24, 25],
and improvement [26, 27]. The major advantage of the EMD is that basis
functions are derived from the signal itself. Hence, the analysis is adaptive
in contrast to the traditional methods where basis functions are fixed. The EMD
is based on the sequential extraction of energy associated with various
intrinsic time scales of the signal, called intrinsic mode functions (IMFs),
starting from finer temporal scales (high-frequency IMFs) to coarser ones
(low-frequency IMFs). The total sum of the IMFs matches the signal very well and
therefore ensures completeness [10]. We have shown that the EMD can be used for signal
denoising [14, 15] or filtering [17]. The
denoising method reconstructs the signal from all the IMFs after they have been
thresholded, as in wavelet analysis, or filtered [14, 15]. The filtering scheme
relies on the basic idea that most structures of the signal are often
concentrated in the lower-frequency components (last IMFs) and decrease toward
the high-frequency modes (first IMFs) [17]. Thus, the recovered signal is reconstructed with
only the few IMFs that are signal dominated, selected using an energy criterion.
Compared to the approach introduced in [14, 15], no thresholding or filtering is required. The
proposed filtering method is a fully data-driven approach [17].
In this paper, we show how the idea of thresholding
IMFs using hard or soft shrinkage introduced in [14, 15] can be extended and adapted
to speech signals for enhancement purposes. Depending on whether the noise level can
be correctly estimated, two noise reduction methods are proposed. The
first strategy combines the EMD and the minimum mean-squared error (MMSE)
filter [1], and the
second one associates the EMD with hard shrinkage [14, 15]. The two methods are
applied to speech signals corrupted with different noise levels, and the
results are compared to the MMSE filter and the wavelet approach.
2. EMD Algorithm
The EMD decomposes a signal $x(t)$ into a series of IMFs through an iterative
process called sifting, each IMF having a distinct time scale [10]. The decomposition is based
on the local time scale of $x(t)$ and yields adaptive basis functions. The EMD
can be seen as a type of wavelet decomposition whose subbands are built up as
needed to separate the different components of $x(t)$.
Each IMF replaces the signal's detail at a certain scale or frequency band
[24]. The EMD picks
out the highest-frequency oscillation that remains in $x(t)$.
By definition, an IMF satisfies two conditions:
(1) the number of extrema and the number of zero crossings may differ by no more than one;
(2) the average value of the envelope defined by the local maxima and the envelope defined by the local minima is zero.
Thus, locally,
each IMF contains lower-frequency oscillations than the just-extracted one. To
be successfully decomposed into IMFs, $x(t)$
must have at least two extrema: one minimum
and one maximum. The sifting involves the following steps:
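Step 1. fix the threshold $\varepsilon$ and set $j \leftarrow 1$ ($j$th IMF);
Step 2. $r_{j-1}(t) \leftarrow x(t)$ (residual);
Step 3. extract the $j$th IMF:
(a) $h_{j,i-1}(t) \leftarrow r_{j-1}(t)$, $i \leftarrow 1$ ($i$, number of sifts),
(b) extract local maxima/minima of $h_{j,i-1}(t)$,
(c) compute upper and lower envelopes $U_{j,i-1}(t)$ and $L_{j,i-1}(t)$ by interpolating, using cubic spline, respectively, local maxima and minima of $h_{j,i-1}(t)$,
(d) compute the mean of the envelopes: $\mu_{j,i-1}(t) = \left(U_{j,i-1}(t) + L_{j,i-1}(t)\right)/2$,
(e) update: $h_{j,i}(t) \leftarrow h_{j,i-1}(t) - \mu_{j,i-1}(t)$,
(f) calculate the stopping criterion: $\mathrm{SD}(i) = \sum_{t=0}^{T} \left|h_{j,i-1}(t) - h_{j,i}(t)\right|^2 / \left(h_{j,i-1}(t)\right)^2$,
(g) if $\mathrm{SD}(i) \ge \varepsilon$, set $i \leftarrow i+1$ and repeat Steps (b)–(f); otherwise put $\mathrm{IMF}_j(t) \leftarrow h_{j,i}(t)$ ($j$th IMF);
Step 4. update residual: $r_j(t) \leftarrow r_{j-1}(t) - \mathrm{IMF}_j(t)$;
Step 5. repeat Step 3 with $j \leftarrow j+1$ until the number of extrema in $r_j(t)$ is $\le 2$;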
where $T$ is the time duration. The sifting is repeated several
times ($i$) in order to get a
true IMF that fulfills conditions (1) and
(2). The result of the sifting is that $x(t)$
is decomposed into a sum of $C$
IMFs and a residual $r_C(t)$
such that
$$x(t) = \sum_{j=1}^{C} \mathrm{IMF}_j(t) + r_C(t). \tag{1}$$
The value of $C$ is determined automatically using SD
(Step 3(f)). The sifting has two effects: (a) it eliminates riding waves and
(b) it smooths uneven amplitudes. To guarantee
that the IMF components retain enough physical sense of both amplitude and
frequency modulation, a value of SD has to be chosen for the sifting. This is
accomplished by limiting the size of the standard deviation SD computed from
two consecutive sifting results. Usually, SD (or
$\varepsilon$) is set between 0.2 and 0.3 [10].
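The sifting steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the function names (`local_extrema`, `envelope_mean`, `emd`), the safety caps `max_sifts` and `max_imfs`, the small constant guarding the division in SD, and the boundary treatment (the two end samples are simply added as spline knots) are choices made only for this example. It relies on NumPy and on `scipy.interpolate.CubicSpline`.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def local_extrema(h):
    """Return indices of interior local maxima and minima of a 1-D array."""
    d = np.diff(h)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima


def envelope_mean(h, maxima, minima):
    """Mean of the cubic-spline upper and lower envelopes (Steps 3(c)-(d)).

    Assumption: both end samples are used as knots of each envelope,
    a crude boundary rule that the text does not specify.
    """
    t = np.arange(len(h))
    last = len(h) - 1
    up = np.unique(np.concatenate(([0], maxima, [last])))
    lo = np.unique(np.concatenate(([0], minima, [last])))
    upper = CubicSpline(up, h[up])(t)
    lower = CubicSpline(lo, h[lo])(t)
    return 0.5 * (upper + lower)


def emd(x, eps=0.25, max_sifts=50, max_imfs=12):
    """Decompose x so that x = sum of IMFs + residual (equation (1))."""
    residual = np.asarray(x, dtype=float).copy()
    imfs = []
    while len(imfs) < max_imfs:
        maxima, minima = local_extrema(residual)
        if len(maxima) + len(minima) <= 2:       # Step 5: too few extrema
            break
        h = residual.copy()
        for _ in range(max_sifts):               # hypothetical safety cap
            maxima, minima = local_extrema(h)
            if len(maxima) + len(minima) <= 2:
                break
            h_new = h - envelope_mean(h, maxima, minima)        # Step 3(e)
            sd = np.sum((h - h_new) ** 2 / (h ** 2 + 1e-12))    # Step 3(f)
            h = h_new
            if sd < eps:                                        # Step 3(g)
                break
        imfs.append(h)
        residual = residual - h                                 # Step 4
    return np.array(imfs), residual
```

Because each IMF is subtracted from the running residual, the sum of the returned IMFs and the residual reproduces the input up to rounding error, whatever the quality of the individual modes. In this sketch the pointwise SD can stay large near zero crossings, in which case the loop simply stops at `max_sifts`.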
3. Denoising Principle
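Let a clean
speech signal $x(t)$
be corrupted by an additive white Gaussian
noise $b(t)$
as follows:
$$y(t) = x(t) + b(t). \tag{2}$$
The noisy signal is decomposed into a sum of IMFs by the EMD, such
that
$$y(t) = \sum_{j=1}^{C} \mathrm{IMF}_j(t) + r_C(t), \tag{3}$$
where $\mathrm{IMF}_j(t)$
is a noisy version of the data $f_j(t)$:
$$\mathrm{IMF}_j(t) = f_j(t) + b_j(t). \tag{4}$$
An estimation $\tilde{f}_j(t)$
of $f_j(t)$
based on the noisy observation $\mathrm{IMF}_j(t)$
is given by
$$\tilde{f}_j(t) = \Gamma\left[\mathrm{IMF}_j(t), \tau_j\right], \tag{5}$$
where $\Gamma[\cdot, \tau_j]$
is a preprocessing function, defined by a set
of parameters $\tau_j$,
applied to the signal $\mathrm{IMF}_j(t)$
[14, 15]. The function $\Gamma$
is chosen depending on whether the noise level can be
estimated. When this estimation is possible, $\Gamma$
reduces to the MMSE filter [1]. However, when the
estimation is not easy, the preprocessing can be a thresholding [14, 15]: the function $\Gamma$
is then a shrinkage, and $\tau_j$
is a threshold parameter. Finally, the
denoised signal, $\tilde{x}(t)$,
is given by
$$\tilde{x}(t) = \sum_{j=1}^{C} \tilde{f}_j(t) + r_C(t). \tag{6}$$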
3.1. EMD-MMSE
Generally,
speech noise estimation is performed using Boll's method [28]. Accordingly, the silence
periods of the signal are detected, and the noise power spectrum is then
estimated as the average of the power spectra of the noisy signal
over the first $M$ temporal frames, which are considered to be
moments of silence, following the relation
$$\hat{\sigma}^2(k) = \frac{1}{M} \sum_{l=1}^{M} |Y(l,k)|^2, \tag{7}$$
where $|Y(l,k)|^2$
is the power spectrum value at the discrete
frequency $k$
of frame $l$.
This method gives a correct estimation of the noise [28].
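The averaging described above can be illustrated with a short Python sketch. It assumes that the first frames of the recording are speech-free, as stated in the text; the frame length, the number of silence frames, the Hann window, the use of non-overlapping frames, and the use of the FFT as the spectral transform are hypothetical choices for this example (the function name `noise_power_spectrum` is also illustrative).

```python
import numpy as np


def noise_power_spectrum(y, frame_len=256, n_silence_frames=5):
    """Average the power spectra of the first frames, assumed speech-free."""
    y = np.asarray(y, dtype=float)
    needed = frame_len * n_silence_frames
    if y.size < needed:
        raise ValueError("signal shorter than the assumed silence period")
    # Cut the leading samples into non-overlapping frames (assumption).
    frames = y[:needed].reshape(n_silence_frames, frame_len)
    window = np.hanning(frame_len)
    # Power spectrum of each windowed frame, one row per frame.
    spectra = np.abs(np.fft.rfft(frames * window, axis=1)) ** 2
    # Mean over frames gives one noise power value per frequency bin.
    return spectra.mean(axis=0)
```

The result has `frame_len // 2 + 1` bins. In practice the number of leading silence frames would come from a silence detector rather than a fixed constant.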
Extensive simulations have shown that when a speech
signal with a silence sequence is decomposed by EMD, its first IMF corresponds
to that silence sequence. Thus, the first IMF can be used to correctly estimate
the noise level. According to [24], the noise level of the modes following the first IMF
($j \ge 2$) is estimated via
$$\hat{\sigma}_j^2 = \frac{\hat{\sigma}_1^2}{0.719}\, 2.01^{-j}, \tag{8}$$
where $\hat{\sigma}_1^2$
is the noise level of the first IMF.
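As an illustration of how the noise level can be propagated from the first IMF to the following ones, the sketch below assumes the white-noise energy model commonly quoted from the EMD literature, in which the noise energy of mode $j \ge 2$ equals the energy of the first mode divided by 0.719 and multiplied by $2.01^{-j}$. Both constants, and the function name `imf_noise_energies`, are assumptions of this example and should be checked against the cited reference before use.

```python
import numpy as np


def imf_noise_energies(sigma1_sq, n_imfs, beta=0.719, rho=2.01):
    """Noise energy of each IMF given the noise energy of the first one.

    Assumed model: energy_j = sigma1_sq / beta * rho ** (-j) for j >= 2.
    """
    energies = np.empty(n_imfs)
    energies[0] = sigma1_sq                      # mode 1: measured directly
    j = np.arange(2, n_imfs + 1, dtype=float)    # modes 2, 3, ...
    energies[1:] = sigma1_sq / beta * rho ** (-j)
    return energies
```

Under this model the noise energy decreases by roughly a factor of two from one mode to the next, which is why the later IMFs are increasingly signal dominated.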
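The combination
of the EMD and the MMSE filter [1] is called the EMD-MMSE strategy. Each IMF is
filtered by the MMSE filter as follows:
$$\tilde{F}_j(l,k) = H(l,k)\, \mathrm{IMF}_j(l,k), \tag{9}$$
where $\mathrm{IMF}_j(l,k)$
and $\tilde{F}_j(l,k)$
are the spectral noisy IMF and the spectral
estimated IMF, respectively, observed at the discrete frequency $k$
on the frame $l$.
$H(l,k)$ is described as follows [1]:
$$H(l,k) = \frac{\mathrm{SNR}_{\mathrm{prio}}(l,k)}{1 + \mathrm{SNR}_{\mathrm{prio}}(l,k)}. \tag{10}$$
The signal-to-noise ratio, $\mathrm{SNR}_{\mathrm{prio}}$,
is estimated according to the method of Ephraim and Malah [2], which is based on the
estimated $\tilde{F}_j$
from the previous frame and on a local
estimation of $\mathrm{SNR}_{\mathrm{prio}}$:
$$\mathrm{SNR}_{\mathrm{prio}}(l,k) = \beta\, \frac{|\tilde{F}_j(l-1,k)|^2}{\hat{\sigma}_j^2} + (1-\beta)\, \max\left(\mathrm{SNR}_{\mathrm{inst}}(l,k), 0\right), \tag{11}$$
where $\beta$
is a weighting factor (equal to 0.98) and $\mathrm{SNR}_{\mathrm{inst}}$
indicates the instantaneous SNR, defined as
the local estimation of $\mathrm{SNR}_{\mathrm{prio}}$:
$$\mathrm{SNR}_{\mathrm{inst}}(l,k) = \frac{|\mathrm{IMF}_j(l,k)|^2}{\hat{\sigma}_j^2} - 1, \tag{12}$$
with $\hat{\sigma}_j^2$ the estimated noise level of the $j$th IMF.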
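The frame-by-frame filtering of one IMF can be sketched as below. The example assumes a Wiener-type gain of the form a priori SNR over one plus a priori SNR, and the usual decision-directed rule in which the a priori SNR mixes the previous frame's estimate (weight 0.98) with the rectified instantaneous SNR. The function name `mmse_filter_frames`, the array layout (frames along rows), the zero initialization before the first frame, and the use of a single noise variance for all bins are assumptions of this sketch, not details given in the text.

```python
import numpy as np


def mmse_filter_frames(imf_spec, noise_var, beta=0.98):
    """Filter the spectral coefficients of one IMF, frame by frame.

    imf_spec : array of shape (n_frames, n_bins), real or complex.
    noise_var: estimated noise power of this IMF (scalar or per bin).
    """
    imf_spec = np.asarray(imf_spec) * 1.0        # promote integers to float
    est = np.zeros_like(imf_spec)
    prev_power = np.zeros(imf_spec.shape[1])     # no estimate before frame 0
    for l in range(imf_spec.shape[0]):
        # Instantaneous SNR of the current frame.
        snr_inst = np.abs(imf_spec[l]) ** 2 / noise_var - 1.0
        # Decision-directed a priori SNR.
        snr_prio = (beta * prev_power / noise_var
                    + (1.0 - beta) * np.maximum(snr_inst, 0.0))
        gain = snr_prio / (1.0 + snr_prio)       # always in [0, 1)
        est[l] = gain * imf_spec[l]
        prev_power = np.abs(est[l]) ** 2
    return est
```

Since the gain never exceeds one, coefficients are only attenuated; bins whose power stays near the noise level are driven toward zero, while persistent high-energy bins pass almost unchanged after a few frames.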
3.2. EMD-Shrinkage
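A smooth
version of the input signal can be obtained by thresholding the IMFs before
signal reconstruction [14, 15]. In this case, the threshold parameter is estimated
by the following expression [6, 14, 15, 29, 30]:
$$\tau_j = \hat{\sigma}_j \sqrt{2 \ln N}, \tag{13}$$
where $N$
is the signal length and $\hat{\sigma}_j$
is the estimated noise level (scale level).
The level $\hat{\sigma}_1$
is given by [14, 15, 31]
$$\hat{\sigma}_1 = \frac{\mathrm{Median}\left\{\left|\mathrm{IMF}_1(t) - \mathrm{Median}\{\mathrm{IMF}_1(t)\}\right|\right\}}{0.6745}. \tag{14}$$
According to [24, 32], using $\hat{\sigma}_1$,
the noise level $\hat{\sigma}_j$
of the IMFs can be estimated using (8).
There are different nonlinear shrinkage functions
[33]. In the present
work, we use the hard shrinkage, which has given interesting denoising results
for speech enhancement:
$$\tilde{f}_j(t) = \begin{cases} \mathrm{IMF}_j(t), & |\mathrm{IMF}_j(t)| > \tau_j, \\ 0, & |\mathrm{IMF}_j(t)| \le \tau_j. \end{cases} \tag{15}$$
The association of the EMD and the hard shrinkage is called the
EMD-shrinkage method.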
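The three ingredients of the shrinkage step can be sketched as follows, assuming the standard forms from the wavelet shrinkage literature: a noise level estimated from the median absolute deviation divided by 0.6745, the universal threshold (noise level times the square root of twice the natural logarithm of the signal length), and hard thresholding that zeroes every sample whose magnitude does not exceed the threshold. The function names are illustrative only.

```python
import numpy as np


def mad_sigma(imf):
    """Robust noise level: median absolute deviation divided by 0.6745."""
    imf = np.asarray(imf, dtype=float)
    return np.median(np.abs(imf - np.median(imf))) / 0.6745


def universal_threshold(sigma, n):
    """Universal threshold sigma * sqrt(2 ln n) for a signal of n samples."""
    return sigma * np.sqrt(2.0 * np.log(n))


def hard_shrink(imf, tau):
    """Keep samples with magnitude above tau, set the others to zero."""
    imf = np.asarray(imf, dtype=float)
    return np.where(np.abs(imf) > tau, imf, 0.0)
```

For example, `hard_shrink(imf, universal_threshold(mad_sigma(imf), len(imf)))` thresholds a single IMF with a level estimated from that IMF itself; the text instead derives the levels of modes two and above from the first mode.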
4. Results
The two
proposed noise reduction methods are tested on speech signals corrupted by
additive white Gaussian noise with different SNRs. The results are compared to
the MMSE filter and the wavelet approach (Haar, Symmlet 4, Daubechies 4). As
indicated, the EMD denoising schemes depend on the noise estimation. So, if the
prespeech period of the noisy signal is detected, then the EMD-MMSE is used.
Otherwise, the EMD-shrinkage is used. The SNR is used as an objective measure
to evaluate the performance of the denoising methods. More precisely, the SNR is used
to compare the EMD-MMSE to the MMSE filter and the EMD-shrinkage to the
wavelet approach. The SNR is defined by
$$\mathrm{SNR} = 10 \log_{10} \frac{\sum_{t} x^2(t)}{\sum_{t} \left(x(t) - \tilde{x}(t)\right)^2}, \tag{16}$$
where $x(t)$
and $\tilde{x}(t)$
are the original signal and the reconstructed
one, respectively.
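A direct Python transcription of this measure is given below, assuming the usual definition: ten times the base-10 logarithm of the energy of the original signal divided by the energy of the reconstruction error. The function name `snr_db` is illustrative.

```python
import numpy as np


def snr_db(x, x_hat):
    """SNR in dB between the original x and the reconstruction x_hat."""
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    signal_energy = np.sum(x ** 2)
    error_energy = np.sum((x - x_hat) ** 2)
    return 10.0 * np.log10(signal_energy / error_energy)
```

A reconstruction whose error energy is one hundredth of the signal energy therefore scores 20 dB; a perfect reconstruction gives a zero denominator and should be handled separately by the caller.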
The EMD-MMSE
denoising scheme is applied to four clean speech signals “a”,
“b”, “c”, and “d” (Figures 1(a)–1(d)) corrupted
by additive white Gaussian noise with SNR values ranging from 4 dB to 10 dB.
Noisy versions of the original signals corresponding to SNR = 5 dB are shown in
Figure 2. For each SNR value, 100
independent noise realizations are generated and the results are averaged.
Figure 3 shows the denoising results obtained by the EMD-MMSE and the MMSE
filter. Comparing this figure with the respective clean signals of Figure 1, one can conclude that the EMD-MMSE performs better in terms of noise
reduction than the MMSE filter. This is confirmed by the results shown in
Figure 4, where the EMD-MMSE gives a higher final SNR than
the MMSE filter. Indeed, the SNR improvement of the EMD-MMSE is about
1 dB greater than that of the MMSE filter for all four considered signals
“a”, “b”, “c”, and “d”.
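The simulation protocol described above (noise added at a prescribed SNR, results averaged over independent realizations) can be sketched as follows. The helper names `add_awgn` and `mean_output_snr`, the fixed random seed, and the `denoise` callback standing in for any of the compared methods are assumptions of this example.

```python
import numpy as np


def add_awgn(x, snr_db, rng):
    """Add white Gaussian noise so that the input SNR is about snr_db."""
    x = np.asarray(x, dtype=float)
    noise_power = np.mean(x ** 2) / 10.0 ** (snr_db / 10.0)
    return x + rng.normal(0.0, np.sqrt(noise_power), size=x.shape)


def mean_output_snr(x, denoise, snr_db, n_trials=100, seed=0):
    """Average output SNR (dB) of `denoise` over independent noise draws."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    total = 0.0
    for _ in range(n_trials):
        x_hat = denoise(add_awgn(x, snr_db, rng))
        # Output SNR: clean-signal energy over reconstruction-error energy.
        total += 10.0 * np.log10(np.sum(x ** 2) / np.sum((x - x_hat) ** 2))
    return total / n_trials
```

Passing the identity function as `denoise` returns approximately the input SNR, which is a convenient sanity check before plugging in a real denoiser.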
Figure 1: The original signals
“a”, “b”, “c”, and “d”.
Figure 2: The noisy version of signals
“a”, “b”, “c”, and “d” (SNR = 5 dB).
Figure 3: Denoising results of signals
“a”, “b”, “c”, and “d” by the EMD-MMSE
and the MMSE filter.
Figure 4: Final SNR values obtained for
different initial noise levels of signals “a”, “b”, “c”, and “d”. The results are
averaged over 100 noise realizations and are reported for the EMD-MMSE and the MMSE
filter.
The
EMD-shrinkage is applied to four clean speech signals “e”,
“f”, “g”, and “h” (Figure 5), corrupted by
additive white Gaussian noise with SNR values ranging from
−10 dB to 3 dB. Noisy
versions of the original signals corresponding to SNR =
−1 dB are shown in Figure 6. Denoising results of the EMD-shrinkage (hard thresholding) and the wavelet
method (Daubechies 4) are shown in Figure 7. A careful examination of the
signals of Figures 5 and 7 shows that the EMD-shrinkage performs better than
the wavelet method in terms of noise reduction. Furthermore, signal structures
or features are globally better preserved with the EMD-shrinkage than with the
wavelet method. Figure 8 shows the improvement in SNR values obtained for
different noise levels of the signals “e”, “f”,
“g”, and “h” for the EMD-shrinkage and for the wavelet
method with three wavelets (Haar, Symmlet 4, Daubechies 4). This figure demonstrates that for noise
SNR values from
−10 dB to 3 dB, the improvement in SNR provided by the
EMD-shrinkage varies from
0.7 dB to 11.5 dB. In addition, the gain in SNR of the
EMD-shrinkage is much better than that obtained by the wavelet method for the
three wavelets. When listening to the enhanced speech, both the EMD-MMSE and
the EMD-shrinkage are found to produce lower residual noise and, noticeably,
less speech distortion for all the signals compared to the MMSE filter or the wavelet
method.
Figure 5: The original signals
“e”, “f”, “g”, and “h”.
Figure 6: The noisy version of signals
“e”, “f”, “g”, and “h” (SNR = −1 dB).
Figure 7: Denoising results of signals
“e”, “f”, “g”, and “h” by the
EMD-shrinkage and the wavelet approach (Daubechies 4).
Figure 8: Final SNR values obtained for
different initial noise levels of signals “e”, “f”, “g”, and “h”. The results are
averaged over 100 noise realizations and are reported for the EMD-shrinkage and for
three different wavelets (Haar, Symmlet 4, Daubechies 4).
5. Conclusion
This paper
presents two new speech denoising methods. Both schemes are based on the EMD
and thus are simple and fully data-driven methods. The methods do not use any
pre- or postprocessing and do not require any parameter setting (except
the sifting threshold $\varepsilon$). The study is limited to signals corrupted
by additive white Gaussian noise. The results obtained for clean speech signals
corrupted with additive Gaussian noise with SNR values ranging from
−10 dB to 10 dB show that the proposed EMD-denoising methods, associated with the
MMSE filter or the shrinkage strategy, perform better than the MMSE filter and
the wavelet approach, respectively. These results show that the EMD-denoising
methods are effective for noise removal and confirm our previous findings
[14, 15]. The EMD-shrinkage is very
attractive, especially in the case where the noise estimation is not easy. Even
in the case when the noise level estimation is possible, the EMD improves the
denoising result with the classical MMSE filter. The obtained results also show
that it is more efficient to apply thresholding or filtering to the different
components (IMFs) of the signal than to the signal itself. To confirm the
obtained results and the effectiveness of the EMD-denoising approaches, the
schemes must be evaluated with a large class of speech signals and in different
experimental conditions, such as sampling rates, sample sizes, multiplicative
noise, or the type of noise.
References
- I. Y. Soon, S. N. Koh, and C. K. Yeo, “Noisy speech enhancement using discrete cosine transform,” Speech Communication, vol. 24, no. 3, pp. 249–257, 1998.
- Y. Ephraim and D. Malah, “Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 32, no. 6, pp. 1109–1121, 1984.
- I.-Y. Soon and S. N. Koh, “Low distortion speech enhancement,” IEE Proceedings: Vision, Image and Signal Processing, vol. 147, no. 3, pp. 247–253, 2000.
- P. Scalart and J. V. Filho, “Speech enhancement based on a priori signal to noise estimation,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '96), vol. 2, pp. 629–632, Atlanta, Ga, USA, May 1996.
- J. G. Proakis and D. G. Manolakis, Digital Signal Processing: Principles, Algorithms, and Applications, Prentice-Hall, Upper Saddle River, NJ, USA, 3rd edition, 1996.
- D. L. Donoho and I. M. Johnstone, “Ideal spatial adaptation by wavelet shrinkage,” Biometrika, vol. 81, no. 3, pp. 425–455, 1994.
- D. L. Donoho, “De-noising by soft-thresholding,” IEEE Transactions on Information Theory, vol. 41, no. 3, pp. 613–627, 1995.
- S. G. Mallat and Z. Zhang, “Matching pursuits with time-frequency dictionaries,” IEEE Transactions on Signal Processing, vol. 41, no. 12, pp. 3397–3415, 1993.
- M. M. Goodwin and M. Vetterli, “Matching pursuit and atomic signal models based on recursive filter banks,” IEEE Transactions on Signal Processing, vol. 47, no. 7, pp. 1890–1902, 1999.
- N. E. Huang, Z. Shen, S. R. Long, et al., “The empirical mode decomposition and Hilbert spectrum for nonlinear and non-stationary time series analysis,” Proceedings of the Royal Society A, vol. 454, no. 1971, pp. 903–995, 1998.
- A.-O. Boudraa, J. C. Cexus, F. Salzenstein, and L. Guillon, “IF estimation using empirical mode decomposition and nonlinear Teager energy operator,” in Proceedings of the 1st International Symposium on Control, Communications and Signal Processing (ISCCSP '04), pp. 45–48, Hammamet, Tunisia, March 2004.
- J. C. Cexus and A. O. Boudraa, “Non-stationary signals analysis by Teager-Huang transform (THT),” in Proceedings of the 14th European Signal Processing Conference (EUSIPCO '06), p. 5, Florence, Italy, September 2006.
- J. C. Cexus and A. O. Boudraa, “Teager-Huang analysis applied to sonar target recognition,” International Journal of Signal Processing, vol. 1, no. 1, pp. 23–27, 2004.
- A. O. Boudraa, J. C. Cexus, and Z. Saidi, “EMD-based signal noise reduction,” International Journal of Signal Processing, vol. 1, no. 1, pp. 33–37, 2004.
- A. O. Boudraa and J. C. Cexus, “Denoising via empirical mode decomposition,” in Proceedings of the IEEE International Symposium on Control, Communications and Signal Processing (ISCCSP '06), p. 4, Marrakech, Morocco, March 2006.
- B. Weng, M. Blanco-Velasco, and K. E. Barner, “ECG denoising based on the empirical mode decomposition,” in Proceedings of the 28th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS '06), pp. 1–4, New York, NY, USA, August-September 2006.
- A. O. Boudraa, J. C. Cexus, S. Benramdane, and A. Beghdadi, “Noise filtering using empirical mode decomposition,” in Proceedings of the IEEE International Symposium on Signal Processing and Its Applications (ISSPA '07), p. 4, Sharjah, United Arab Emirates, February 2007.
- Z. Liu and S. Peng, “Boundary processing of bidimensional EMD using texture synthesis,” IEEE Signal Processing Letters, vol. 12, no. 1, pp. 33–36, 2005.
- A. O. Boudraa, J. C. Cexus, F. Salzenstein, and A. Beghdadi, “EMD-based multibeam echosounder images segmentation,” in Proceedings of the 2nd IEEE International Symposium on Communications, Control and Signal Processing (ISCCSP '06), p. 4, Marrakech, Morocco, March 2006.
- K. Zeng and M.-X. He, “A simple boundary process technique for empirical mode decomposition,” in Proceedings of the IEEE International Geoscience and Remote Sensing Symposium Proceedings (IGARSS '04), vol. 6, pp. 4258–4261, Anchorage, Alaska, USA, September 2004.
- P. Flandrin, P. Gonçalvès, and G. Rilling, “Detrending and denoising with empirical mode decomposition,” in Proceedings of the 12th European Signal Processing Conference (EUSIPCO '04), pp. 1581–1584, Vienna, Austria, September 2004.
- G. Rilling, P. Flandrin, and P. Gonçalvès, “Empirical mode decomposition, fractional Gaussian noise and Hurst exponent estimation,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '05), vol. 4, pp. 489–492, 2005.
- S. Benramdane, J. C. Cexus, A. O. Boudraa, and J. A. Astolfi, “Transient turbulent pressure signal processing using empirical mode decomposition,” in Proceedings of the 4th International Conference on Physics in Signal and Image Processing, p. 4, Mulhouse, France, January 2007.
- P. Flandrin, G. Rilling, and P. Gonçalvès, “Empirical mode decomposition as a filter bank,” IEEE Signal Processing Letters, vol. 11, no. 2, part 1, pp. 112–114, 2004.
- Z. Wu and N. E. Huang, “A study of the characteristics of white noise using the empirical mode decomposition method,” Proceedings of the Royal Society A, vol. 460, no. 2046, pp. 1597–1611, 2004.
- B. Weng and K. E. Barner, “Optimal and bidirectional optimal empirical mode decomposition,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '07), vol. 3, pp. 1501–1504, Honolulu, Hawaii, USA, April 2007.
- R. Deering and J. F. Kaiser, “The use of a masking signal to improve empirical mode decomposition,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '05), vol. 4, pp. 485–488, Philadelphia, Pa, USA, March 2005.
- S. F. Boll, “Suppression of acoustic noise in speech using spectral subtraction,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 27, no. 2, pp. 113–120, 1979.
- D. L. Donoho and I. M. Johnstone, “Adapting to unknown smoothness via wavelet shrinkage,” Journal of the American Statistical Association, vol. 90, no. 432, pp. 1200–1224, 1995.
- D. L. Donoho, I. M. Johnstone, G. Kerkyacharian, and D. Picard, “Wavelet shrinkage: asymptopia?” Journal of the Royal Statistical Society: Series B, vol. 57, no. 2, pp. 301–369, 1995.
- W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, Numerical Recipes in C: The Art of Scientific Computing, Cambridge University Press, New York, NY, USA, 2nd edition, 1992.
- G. Steidl, J. Weickert, T. Brox, P. Mrazek, and M. Welk, “On the equivalence of soft wavelet shrinkage, total variation diffusion, total variation regularization and SIDEs,” Department of Mathematics, University of Bremen, Bremen, Germany, 2003.
- S. Mallat, Une Exploration des Signaux en Ondelettes, Ecole Polytechnique, Palaiseau, France, 2000.