Journal of Electrical and Computer Engineering
Volume 2018, Article ID 1895190, 5 pages
https://doi.org/10.1155/2018/1895190
Research Article

Comparative Study between the Discrete-Frequency Kalman Filtering and the Discrete-Time Kalman Filtering with Application in Noise Reduction in Speech Signals

1Department of Electrical Engineering, Universidade Federal de Uberlândia, Av. João Naves de Ávila, 2160 Bloco 3N, Campus Santa Mônica, Uberlândia, MG, Brazil
2Department of Electrical Engineering, Faculdade de Talentos Humanos, R. Manoel Gonçalves de Rezende, 230 São Cristóvão, Uberaba, MG, Brazil
3Department of Electrical Engineering, Universidade Federal de Goiás, Av. Esperança, s/n. Campus Universitário, Goiânia, GO, Brazil

Correspondence should be addressed to Leandro Aureliano da Silva; leandro_aureliano@yahoo.com.br

Received 26 January 2018; Accepted 15 April 2018; Published 3 June 2018

Academic Editor: Shunyi Zhao

Copyright © 2018 Leandro Aureliano da Silva et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

This article aims to carry out a comparative study between discrete-time and discrete-frequency Kalman filters. In order to assess the performance of both methods for speech reconstruction, we measured the output segmental signal-to-noise ratio and the Itakura-Saito distance provided by each algorithm over 25 different voice signals. The results show that although the two algorithms performed very similarly regarding noise reduction, the discrete-time Kalman filter produced smaller spectral distortion on the estimated signals when compared with the discrete-frequency Kalman filter.

1. Introduction

Even with the advent of the Internet, voice transmission is still one of the most important means of communication. The quality and intelligibility of speech signals play a key role in the ease and precision of information exchange. In almost all voice transmission applications, quality can be affected by factors such as ambient noise, losses due to digital link encoding, and interference from other conversations or even from other signal sources [1].

Digital speech processing techniques can be employed to reduce or even eliminate these harmful effects. In recent years, techniques such as spectral subtraction, Kalman filtering, psychoacoustic modeling, and wavelet transforms have gained prominence, especially in noise reduction, and considerable research effort has gone into improving them.

In [2, 3], the authors enhance speech quality by removing the musical noise introduced by spectral subtraction. In [1], the authors combined spectral subtraction and wavelets on a prefiltering approach for noise reduction in speech signals and used the result as an initial guess for a Kalman filter. When compared to Kalman filtering using only wavelets or spectral subtraction alone to produce the initial guess, their method showed the least spectral distortion and a similar segmental output signal-to-noise ratio.

Since wavelet-based denoising is highly dependent on thresholding the approximation and detail coefficients, recent research in this area focuses on new thresholds [4, 5].

Shao and Chang [6] cascaded a Kalman filter with a bank of wavelet filters and a perceptual weighting filter, which they derived from the masking properties of a psychoacoustic model. According to the authors, that work brought two contributions. The first was a wavelet-based auditory model with a perceptual wavelet filter bank that maps the frequency response of the human auditory system through subband decomposition. The second was a Kalman filter using a voice state space model in the wavelet domain, whose computational cost was lower than that of the discrete-time Kalman filter. They were able to reduce the noise in different environments with low signal degradation.

Dhivya and Justin [7] proposed a noise reduction technique that applies spectral subtraction to the wavelet approximation coefficients and soft thresholding to the detail coefficients. They used five wavelet filters and compared them according to their output signal-to-noise ratios (SNR). Besides the output SNR, they also considered the correlation coefficient and the perceptual evaluation of speech quality (PESQ) criteria.

Although these algorithms show significant advances in noise removal, most of them neither evaluate spectral distortion nor attempt to minimize it. Since the method in [1] provided low spectral distortion, this article proposes a comparative study between discrete-time and discrete-frequency Kalman filters that simply uses the noisy signal as the initial estimate. According to Fujimoto and Ariki [8], the main difference between the two approaches is that the Kalman filter is more computationally efficient in the frequency domain than in the time domain.

On the other hand, transforming the set of Kalman filter equations to and from the frequency domain produces a significant distortion in the estimated signal. We therefore also used prefiltering based on spectral subtraction to reduce this distortion. In order to assess the performance of the two algorithms, we measured both the segmental signal-to-noise ratio of the outputs and the Itakura-Saito distance.

This article is structured as follows: Sections 2 and 3 describe the discrete-time and discrete-frequency Kalman filtering algorithms, respectively. Section 4 presents the experimental results, and Section 5 presents the conclusions.

2. Discrete-Time Kalman Filtering (DTKF)

In 1960, Rudolf Emil Kalman published the paper “A New Approach to Linear Filtering and Prediction Problems”, describing a recursive solution to the discrete-time linear filtering problem [1]. Since then, due to the major advances of digital computing, Kalman filtering has become a very important technique in several areas such as navigation, process monitoring, economics, and signal reconstruction from noisy samples.

In this article, the Kalman filtering development follows the heuristics described by Vaseghi [9]. Thus, the speech signal is modeled as an autoregressive process of order $p$, AR($p$), according to

$$x(m) = \sum_{k=1}^{p} a_k\, x(m-k) + e(m), \tag{1}$$

where $a_k$ are the linear prediction coefficients of order $p$, $e(m)$ is the prediction error associated with the excitation of the source-filter model of speech production, and $x(m)$ is the $m$th sample of the speech signal.

It can be observed that, in the acquisition process of audio and speech signals, most of the signals are captured in the presence of some type of additive noise. Consequently, we can model the noisy signal as shown in

$$y(m) = x(m) + n(m), \tag{2}$$

where $y(m)$ is the noisy speech signal and $n(m)$ is a white Gaussian additive noise.
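As a rough illustration of the autoregressive speech model and the additive-noise observation described above, the following Python sketch synthesizes an AR process and contaminates it with white Gaussian noise. The function names, the AR(2) coefficients used in the usage note, and the random seeds are assumptions made for the example; they do not come from the article.

```python
import random


def synthesize_ar(coeffs, num_samples, excitation_std=1.0, seed=0):
    """Generate x(m) = sum_k a_k * x(m - k) + e(m) with white Gaussian e(m).

    coeffs holds [a_1, ..., a_p]; initial conditions are assumed to be zero.
    """
    rng = random.Random(seed)
    order = len(coeffs)
    x = [0.0] * order  # x(-p), ..., x(-1)
    for _ in range(num_samples):
        past = x[-1:-order - 1:-1]  # [x(m-1), x(m-2), ..., x(m-p)]
        prediction = sum(a * xp for a, xp in zip(coeffs, past))
        x.append(prediction + rng.gauss(0.0, excitation_std))
    return x[order:]


def add_white_noise(x, noise_std, seed=1):
    """Form the noisy observation y(m) = x(m) + n(m)."""
    rng = random.Random(seed)
    return [xm + rng.gauss(0.0, noise_std) for xm in x]
```

For instance, `add_white_noise(synthesize_ar([1.5, -0.7], 200), 0.5)` produces a noisy test sequence from a stable, hypothetical AR(2) model.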

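From (1) and (2), we can set up a state space model described by (3) and (4), respectively [9]:

$$\mathbf{x}(m) = \mathbf{\Phi}(m,m-1)\,\mathbf{x}(m-1) + \mathbf{e}(m), \tag{3}$$

$$\mathbf{y}(m) = \mathbf{H}\,\mathbf{x}(m) + \mathbf{n}(m), \tag{4}$$

where $\mathbf{x}(m)$ is the state vector at time $m$; $\mathbf{\Phi}(m,m-1)$ is the state transition matrix with dimensions $p \times p$ that relates current time $m$ with past time $m-1$; $\mathbf{e}(m)$ is the input vector of the state equation and it is modeled as a white noise; $\mathbf{y}(m)$ is the observation vector; $\mathbf{H}$ is the channel distortion matrix of dimensions $M \times p$, with $M$ the length of the observation vector; and $\mathbf{n}(m)$ is an additive white noise vector [9].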
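One common way to cast an AR model into a state space form is the companion matrix. The sketch below assumes the state vector is ordered from oldest to newest sample, [x(m-p+1), ..., x(m)], and that the observation is the scalar newest sample; both choices, and all function names, are illustrative assumptions rather than the article's exact layout.

```python
def companion_matrix(coeffs):
    """State transition matrix for the state [x(m-p+1), ..., x(m)].

    coeffs holds the AR coefficients [a_1, ..., a_p].
    """
    p = len(coeffs)
    phi = [[0.0] * p for _ in range(p)]
    for i in range(p - 1):
        phi[i][i + 1] = 1.0  # shift: each entry takes the next, newer sample
    # Last row forms the new sample: a_p * x(m-p) + ... + a_1 * x(m-1).
    phi[p - 1] = [float(a) for a in reversed(coeffs)]
    return phi


def observation_row(p):
    """Row vector H that picks the newest sample, i.e. the last state entry."""
    return [0.0] * (p - 1) + [1.0]


def mat_vec(matrix, vector):
    """Matrix-vector product for plain nested lists."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, vector)) for row in matrix]
```

With the hypothetical coefficients `[1.5, -0.7]`, `mat_vec(companion_matrix([1.5, -0.7]), [2.0, 3.0])` shifts the state and appends the noise-free AR prediction of the next sample.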

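According to Vaseghi [9], $\mathbf{e}(m)$ and $\mathbf{n}(m)$ are assumed to be independent white noise processes so that

$$E\left[\mathbf{e}(m)\,\mathbf{e}^{T}(k)\right] = \mathbf{Q}\,\delta(m-k), \tag{5}$$

$$E\left[\mathbf{n}(m)\,\mathbf{n}^{T}(k)\right] = \mathbf{R}\,\delta(m-k), \tag{6}$$

where $\mathbf{R}$ and $\mathbf{Q}$ are diagonal covariance matrices, respectively, related to the additive noise and the prediction error.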

The Kalman filtering estimates a process by using a kind of feedback control: first, the filter estimates the state of the process at a given time, then the feedback is obtained in the form of a new measurement.

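Brown and Hwang [10] and Vaseghi [9] divided the Kalman filtering equations into two groups. The first ones are the time-update equations (prediction) and the second are the measurement-update equations (correction). Equation (7) describes the time-update:

$$\begin{aligned} \hat{\mathbf{x}}(m|m-1) &= \mathbf{\Phi}(m,m-1)\,\hat{\mathbf{x}}(m-1), \\ \mathbf{P}(m|m-1) &= \mathbf{\Phi}(m,m-1)\,\mathbf{P}(m-1)\,\mathbf{\Phi}^{T}(m,m-1) + \mathbf{Q}, \end{aligned} \tag{7}$$

while the measurement-update equations are shown in (8) to (10):

$$\mathbf{K}(m) = \mathbf{P}(m|m-1)\,\mathbf{H}^{T}\left[\mathbf{H}\,\mathbf{P}(m|m-1)\,\mathbf{H}^{T} + \mathbf{R}\right]^{-1}, \tag{8}$$

$$\hat{\mathbf{x}}(m) = \hat{\mathbf{x}}(m|m-1) + \mathbf{K}(m)\left[\mathbf{y}(m) - \mathbf{H}\,\hat{\mathbf{x}}(m|m-1)\right], \tag{9}$$

$$\mathbf{P}(m) = \left[\mathbf{I} - \mathbf{K}(m)\,\mathbf{H}\right]\mathbf{P}(m|m-1), \tag{10}$$

where $\mathbf{P}(m)$ is the error covariance matrix at time $m$; $\mathbf{K}(m)$ is the Kalman gain matrix, responsible for minimizing $\mathbf{P}(m)$; and $\hat{\mathbf{x}}(m|m-1)$ is the state estimate at time $m$, according to the previous observations up to $\mathbf{y}(m-1)$.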
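The predict/correct cycle described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: the observation is taken to be a scalar that equals the last state entry (so H = [0, ..., 0, 1] and the matrix inverse reduces to a division), Q is passed as a list holding its diagonal, and every function name is hypothetical.

```python
def mat_mul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def dtkf_step(x_est, p_est, y, phi, q_diag, r):
    """One Kalman cycle for the scalar observation y(m) = x(m) + n(m).

    x_est, p_est: previous state estimate and error covariance.
    phi: state transition matrix; q_diag: diagonal of Q; r: noise variance.
    """
    p = len(x_est)
    # Time update (prediction).
    x_pred = [sum(phi[i][j] * x_est[j] for j in range(p)) for i in range(p)]
    p_pred = mat_mul(mat_mul(phi, p_est), transpose(phi))
    for i in range(p):
        p_pred[i][i] += q_diag[i]
    # Measurement update (correction) with H = [0, ..., 0, 1].
    innovation_var = p_pred[p - 1][p - 1] + r          # H P H^T + R
    gain = [p_pred[i][p - 1] / innovation_var for i in range(p)]
    innovation = y - x_pred[p - 1]                     # y - H x_pred
    x_new = [x_pred[i] + gain[i] * innovation for i in range(p)]
    # P(m) = (I - K H) P(m|m-1)
    p_new = [[p_pred[i][j] - gain[i] * p_pred[p - 1][j] for j in range(p)]
             for i in range(p)]
    return x_new, p_new
```

Calling `dtkf_step` once per noisy sample and keeping the last entry of each returned state gives the filtered sequence. With `r = 0` the filter trusts the measurement completely, while a very large `r` leaves the prediction almost unchanged.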

3. Discrete-Frequency Kalman Filtering (DFKF)

Fujimoto and Ariki [8] introduced the discrete-frequency Kalman filtering (DFKF) in 2000 to provide a more computationally efficient algorithm. This is accomplished by transforming the Kalman filter equations to be iterated in the frequency domain and then inverse transforming the estimated spectrum back to the time domain to find the estimated signal. In order to do so, they split the processing into multiple frames in such a way that, for the $t$th frame, $X_t(k)$ is the complex spectrum of the noiseless signal $x_t(m)$ and $n_t(m)$ is the white Gaussian noise. Thus, the noise-corrupted signal is given by the following equation [8]:

$$y_t(m) = x_t(m) + n_t(m). \tag{11}$$

Since $x_t(m)$ can be replaced by the Inverse Discrete Fourier Transform (IDFT) of $X_t(k)$, we have

$$y_t(m) = \frac{1}{N}\sum_{k=0}^{N-1} X_t(k)\, e^{j2\pi km/N} + n_t(m). \tag{12}$$

In matrix notation, (12) can be represented as

$$y_t(m) = \frac{1}{N}\begin{bmatrix} 1 & e^{j2\pi m/N} & \cdots & e^{j2\pi (N-1)m/N} \end{bmatrix} \begin{bmatrix} X_t(0) \\ X_t(1) \\ \vdots \\ X_t(N-1) \end{bmatrix} + n_t(m), \tag{13}$$

which can be simply written as

$$y_t(m) = \mathbf{h}(m)\,\mathbf{X}_t + n_t(m), \tag{14}$$

where $m$ represents time within the $t$th frame, $N$ is the number of samples in the frame, and $\mathbf{h}(m)$ is the row vector containing the basis of the Discrete Fourier Transform (DFT), scaled by $1/N$. $\mathbf{X}_t$ is the complex spectrum vector of the $t$th frame. Since time has no meaning for $\mathbf{X}_t$, there is no state transition matrix in the Kalman equations for the frequency domain, so that the computational effort of the DFKF is significantly reduced.
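To make the frequency-domain observation model concrete, the short sketch below checks that the inner product between a row of scaled DFT basis functions and the complex spectrum of a frame returns the corresponding time-domain sample. The helper names and the direct O(N^2) DFT are assumptions chosen for readability, not part of the original work.

```python
import cmath


def dft(frame):
    """Direct DFT, X(k) = sum_m x(m) * exp(-j*2*pi*k*m/N)."""
    n_samples = len(frame)
    return [sum(frame[m] * cmath.exp(-2j * cmath.pi * k * m / n_samples)
                for m in range(n_samples))
            for k in range(n_samples)]


def basis_row(m, n_samples):
    """Row vector h(m) = (1/N) * [exp(j*2*pi*k*m/N) for k = 0..N-1]."""
    return [cmath.exp(2j * cmath.pi * k * m / n_samples) / n_samples
            for k in range(n_samples)]


def observe(spectrum, m):
    """Noise-free sample at time m inside the frame: h(m) times the spectrum."""
    h = basis_row(m, len(spectrum))
    return sum(h_k * x_k for h_k, x_k in zip(h, spectrum))
```

For a real frame, `observe(dft(frame), m)` returns `frame[m]` up to rounding, with a negligible imaginary part.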

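Analogous to the DTKF, the DFKF can be represented by the following equations:

$$\mathbf{K}_t(m) = \mathbf{P}_t(m-1)\,\mathbf{h}^{H}(m)\left[\mathbf{h}(m)\,\mathbf{P}_t(m-1)\,\mathbf{h}^{H}(m) + R\right]^{-1}, \tag{15}$$

$$\hat{\mathbf{X}}_t(m) = \hat{\mathbf{X}}_t(m-1) + \mathbf{K}_t(m)\left[y_t(m) - \mathbf{h}(m)\,\hat{\mathbf{X}}_t(m-1)\right], \tag{16}$$

$$\mathbf{P}_t(m) = \left[\mathbf{I} - \mathbf{K}_t(m)\,\mathbf{h}(m)\right]\mathbf{P}_t(m-1), \tag{17}$$

where $R$ is the variance of the additive noise and $(\cdot)^{H}$ means the complex conjugate transpose of a matrix.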

In order to obtain the estimated signal of the Kalman filter in the time domain, we must apply the Inverse Discrete Fourier Transform (IDFT) on (16).
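A frequency-domain Kalman recursion of this kind, followed by the IDFT, might look like the sketch below. It is only an illustration under explicit assumptions: the state is the complex spectrum of one frame, there is no state transition or process noise, the initial spectrum estimate is zero, the initial covariance is `p0` times the identity, and all names (`dfkf_frame`, `idft_real`, `p0`, `noise_var`) are hypothetical rather than taken from [8] or from the authors' code.

```python
import cmath


def basis_row(m, n):
    """Row vector h(m) = (1/N) * [exp(j*2*pi*k*m/N) for k = 0..N-1]."""
    return [cmath.exp(2j * cmath.pi * k * m / n) / n for k in range(n)]


def dfkf_frame(noisy_frame, noise_var, p0=1.0):
    """Estimate the complex spectrum of one frame from its noisy samples."""
    n = len(noisy_frame)
    x_est = [0j] * n
    p = [[complex(p0) if i == j else 0j for j in range(n)] for i in range(n)]
    for m, y in enumerate(noisy_frame):
        h = basis_row(m, n)
        h_conj = [v.conjugate() for v in h]
        # P h^H (column vector) and the scalar h P h^H + R.
        ph = [sum(p[i][j] * h_conj[j] for j in range(n)) for i in range(n)]
        s = sum(h[i] * ph[i] for i in range(n)).real + noise_var
        gain = [v / s for v in ph]
        # Correct the spectrum with the innovation y - h X_est.
        innovation = y - sum(h[i] * x_est[i] for i in range(n))
        x_est = [x_est[i] + gain[i] * innovation for i in range(n)]
        # P <- (I - K h) P, using the row vector h P.
        hp = [sum(h[i] * p[i][j] for i in range(n)) for j in range(n)]
        p = [[p[i][j] - gain[i] * hp[j] for j in range(n)] for i in range(n)]
    return x_est


def idft_real(spectrum):
    """Real part of the inverse DFT, i.e. the time-domain estimate."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * m / n)
                for k in range(n)).real / n for m in range(n)]
```

Under the assumptions of this sketch, one pass over a frame of length N scales every DFT bin of the noisy frame by `p0 / (p0 + N * noise_var)`, so a small `noise_var` returns the input almost unchanged. A practical filter would need a more informative prior on the spectrum than the flat one assumed here.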

4. Results

In order to compare the performances of the studied techniques, we used 25 different recorded speech signals sampled at 22050 Hz and coded with 16 bits per sample. Each signal was windowed by a Hamming window of size 512 with 50% overlap. All tests were performed using MATLAB R2013b on a computer with a Core i7 processor and 8 GB of RAM.
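The framing step described above can be sketched as follows. The symmetric Hamming definition and the choice to drop a trailing partial frame are assumptions of this example; the article does not state how the window was computed or how the signal edges were handled.

```python
import math


def hamming(n):
    """Symmetric Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def split_frames(signal, frame_len=512, overlap=0.5):
    """Cut the signal into overlapping, Hamming-windowed frames.

    A trailing segment shorter than frame_len is discarded (assumption).
    """
    hop = int(frame_len * (1.0 - overlap))
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        segment = signal[start:start + frame_len]
        frames.append([s * w for s, w in zip(segment, window)])
    return frames
```

At 22050 Hz, a 512-sample frame spans about 23.2 ms and a 50% overlap corresponds to a hop of 256 samples.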

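The quality of the estimated speech signal at the output of each filter was evaluated using the segmental signal-to-noise ratio (SNRseg). We chose the SNRseg because it is calculated over short segments of the speech signal, which balances the weights assigned to segments of higher and lower signal strength. SNRseg is given by [11]

$$\mathrm{SNR}_{\mathrm{seg}} = \frac{1}{L}\sum_{j=0}^{L-1} 10\log_{10}\left[\frac{\sum_{n=m_j-N+1}^{m_j} x^{2}(n)}{\sum_{n=m_j-N+1}^{m_j}\left[x(n)-\hat{x}(n)\right]^{2}}\right], \tag{18}$$

where $m_0, m_1, \ldots, m_{L-1}$ are the limits of each one of the $L$ frames of length $N$, $x(n)$ is the original signal, and $\hat{x}(n)$ is the estimated signal. To carry out the tests, the signals were contaminated by additive white noise and the input segmental signal-to-noise ratio (SNRI) was adjusted to 3 dB.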
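A segmental signal-to-noise ratio of this kind can be computed with a few lines of Python. The sketch below assumes non-overlapping frames and adds a small floor `eps` to avoid taking the logarithm of zero in silent or perfectly reconstructed frames; both choices, and the function name, are assumptions of the example rather than details given in the article.

```python
import math


def segmental_snr(clean, estimate, frame_len, eps=1e-12):
    """Average of the per-frame SNR values, in dB.

    clean and estimate must be time-aligned sequences of equal length.
    """
    num_frames = len(clean) // frame_len
    total_db = 0.0
    for j in range(num_frames):
        lo, hi = j * frame_len, (j + 1) * frame_len
        signal_energy = sum(s * s for s in clean[lo:hi])
        error_energy = sum((s - e) ** 2
                           for s, e in zip(clean[lo:hi], estimate[lo:hi]))
        ratio = max(signal_energy, eps) / max(error_energy, eps)
        total_db += 10.0 * math.log10(ratio)
    return total_db / num_frames
```

As a sanity check, an estimate whose error amplitude is one tenth of the signal amplitude in every frame gives 20 dB, and an all-zero estimate gives 0 dB.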

As reported by Rabiner and Schafer [12], a suitable way to measure spectrum variations is the Itakura-Saito distance. Such measure can be calculated as

$$d(\mathbf{a},\mathbf{b}) = \log\left(\frac{\mathbf{b}\,\mathbf{R}_x\,\mathbf{b}^{T}}{\mathbf{a}\,\mathbf{R}_x\,\mathbf{a}^{T}}\right), \tag{19}$$

where $\mathbf{a}$ and $\mathbf{b}$ are the linear prediction coefficients (LPC) vectors of the original and estimated signals, respectively, and $\mathbf{R}_x$ is the autocorrelation matrix of the original signal. The closer to each other the spectra of the original and estimated signals, the smaller $d(\mathbf{a},\mathbf{b})$. Thus, an Itakura-Saito distance equal to zero indicates that the spectra are equal [12].
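An LPC-based spectral distance of the kind described above can be sketched as a log ratio of two quadratic forms in the autocorrelation matrix of the original signal. The example below assumes the autocorrelation method with a Levinson-Durbin recursion, LPC vectors written as prediction-error filters [1, c_1, ..., c_p], and a default order of 10; none of these settings are stated in the article, and the function names are hypothetical.

```python
import math


def autocorrelation(x, max_lag):
    """Biased autocorrelation estimates r(0), ..., r(max_lag)."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) / n
            for lag in range(max_lag + 1)]


def lpc_error_filter(r, order):
    """Levinson-Durbin recursion; returns [1, c_1, ..., c_p]."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)
    return a


def quadratic_form(v, r):
    """v R v^T for the Toeplitz matrix with entries R[i][j] = r[|i - j|]."""
    n = len(v)
    return sum(v[i] * r[abs(i - j)] * v[j] for i in range(n) for j in range(n))


def lpc_distance(original, estimated, order=10):
    """Log ratio of prediction-error energies measured on the original signal."""
    r_orig = autocorrelation(original, order)
    a = lpc_error_filter(r_orig, order)
    b = lpc_error_filter(autocorrelation(estimated, order), order)
    return math.log(quadratic_form(b, r_orig) / quadratic_form(a, r_orig))
```

Because the LPC vector of the original signal minimizes the quadratic form in the denominator, the distance is zero when both signals yield the same LPC vector and non-negative otherwise.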

The DTKF algorithm was employed in the first test, which used the utterance “elétrica” (“electrical” in Portuguese). The results are shown in Figures 1, 2, and 3.

Figure 1: Noiseless signal used for comparison with the estimated signal.
Figure 2: Contaminated signal with white noise applied to the DTKF algorithm.
Figure 3: Estimated signal after processing with the DTKF algorithm.

Comparing Figures 2 and 3 shows the noise reduction, especially during the silent parts of the signal. The output segmental signal-to-noise ratio (SNRO) in this case was 10 dB and the Itakura-Saito distance was 0.3250.

The second test preserved the same parameters of the first test except for the use of DFKF. The results are shown in Figures 4 and 5, respectively. The SNRO was 8 dB and the comparison of Figures 4 and 5 shows a considerable reduction in the noise. However, the Itakura-Saito distance was 0.3782, which indicates a larger distortion in the filtering.

Figure 4: Contaminated signal with white noise applied to the DFKF algorithm.
Figure 5: Estimated signal after processing with the DFKF algorithm.

Therefore, for this utterance, the DTKF algorithm produced smaller spectral distortion than the DFKF and also provided a larger SNRO (10 dB against 8 dB).

The results of the tests for the 25 words are presented in Figures 6 and 7. Figure 6 shows that the SNRO was almost always the same for DTKF and DFKF across the tests, with an average of 9 dB.

Figure 6: Comparison for segmental signal-to-noise ratio output (SNRO) with 25 words contaminated by white noise with signal-to-noise ratio input (SNRI) of 3 dB.
Figure 7: Comparison for spectral distortion for 25 words contaminated by white noise with signal-to-noise ratio input (SNRI) of 3 dB.

Figure 7 shows that the DTKF algorithm produced smaller spectral distortion for all tests. Thus, we can affirm that the DTKF is more suitable than the DFKF for speech processing.

Tests were also performed after prefiltering the noisy signals. The prefiltering was based on spectral subtraction, as in [1]. All results showed that the DTKF produced smaller spectral distortion than the DFKF. The spectral distortions for the 25 words are shown in Figure 8 for an SNRI of 3 dB.

Figure 8: Comparison for spectral distortion for 25 words contaminated by white noise with signal-to-noise ratio input (SNRI) of 3 dB, using spectral subtraction with prefiltering of the contaminated signal.

The comparison of Figures 7 and 8 indicates that prefiltering allowed only a tiny improvement in the reduction of spectral distortion provided by the DTKF algorithm.

5. Conclusions

This paper presented a comparative study between discrete-time and discrete-frequency Kalman filtering algorithms. Tests were carried out with 25 different words using Itakura-Saito distance to measure the spectral distortion and the segmental signal-to-noise ratio to evaluate the noise reduction.

Although the two algorithms performed very similarly regarding noise reduction, discrete-time Kalman filtering produced smaller spectral distortion on the estimated signals for all targeted tests. This shows that discrete-time Kalman filtering is more suitable than discrete-frequency Kalman filtering for the reconstruction of speech signals corrupted by additive white noise.

Data Availability

The voice data (.wav files) used to support the results of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

  1. L. A. da Silva and M. B. Joaquim, “Redução de Ruído em Sinais de Voz Usando, Filtros de Kalman de Tempo e Freqüência Discretos, Combinados com Subtração Espectral de Potência e/ou Wavelets,” in Proceedings of the XXV Simpósio Brasileiro de Telecomunicações-SBrT, pp. 3–6, 2007.
  2. C. Lu, K. Tseng, and C. Chen, “Reduction of Musical Residual Noise Using Hybrid Median Filter,” in Proceedings of the 2012 Spring Congress on Engineering and Technology (S-CET), pp. 1–4, Xi'an, China, May 2012.
  3. R. Miyazaki, H. Saruwatari, T. Inoue, Y. Takahashi, K. Shikano, and K. Kondo, “Musical-noise-free speech enhancement based on optimized iterative spectral subtraction,” IEEE Transactions on Audio, Speech and Language Processing, vol. 20, no. 7, pp. 2080–2094, 2012.
  4. R. Li, C. Bao, B. Xia, and M. Jia, “Speech enhancement using the combination of adaptive wavelet threshold and spectral subtraction based on wavelet packet decomposition,” in Proceedings of the 2012 11th International Conference on Signal Processing, ICSP 2012, pp. 481–484, China, October 2012.
  5. R. Aggarwal, J. Karan Singh, V. Kumar Gupta, S. Rathore, M. Tiwari, and A. Khare, “Noise Reduction of Speech Signal using Wavelet Transform with Modified Universal Threshold,” International Journal of Computer Applications, vol. 20, no. 5, pp. 14–19, 2011.
  6. Y. Shao and C.-H. Chang, “A Kalman filter based on wavelet filter-bank and psychoacoustic modeling for speech enhancement,” in Proceedings of the ISCAS 2006: 2006 IEEE International Symposium on Circuits and Systems, pp. 121–124, Greece, May 2006.
  7. R. Dhivya and J. Justin, “A novel speech enhancement technique,” International Journal of Research in Engineering and Technology, vol. 3, no. 19, pp. 98–102, 2014.
  8. M. Fujimoto and Y. Ariki, “Noisy speech recognition using noise reduction method based on Kalman filter,” in Proceedings of the 25th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2000, vol. 3, pp. 1727–1730, June 2000.
  9. S. V. Vaseghi, Advanced Digital Signal Processing and Noise Reduction, John Wiley & Sons, 2008.
  10. R. G. Brown and P. Y. C. Hwang, Introduction to random signals and applied Kalman filtering: with MATLAB exercises and solutions, Wiley, New York, NY, USA, 1997.
  11. J. R. Deller, J. G. Proakis, and J. H. Hansen, Discrete Time Processing of Speech Signals, Prentice Hall PTR, 1993.
  12. L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals, Prentice Hall, 1978.