A New Speech Enhancement Technique Based on Stationary Bionic Wavelet Transform and MMSE Estimate of Spectral Amplitude

Talbi, Mourad; Bouhlel, Med Salim

doi:https://doi.org/10.1155/2021/9968275

Security and Communication Networks

Special Issue

Application-Aware Multimedia Security Techniques

Review Article | Open Access

Volume 2021 | Article ID 9968275 | https://doi.org/10.1155/2021/9968275

Mourad Talbi¹ and Med Salim Bouhlel²

Academic Editor: Manjit Kaur

Received 02 Apr 2021

Revised 26 Apr 2021

Accepted 10 Oct 2021

Published 24 Dec 2021

Abstract

Speech enhancement has gained considerable attention in applications such as speech transmission over communication channels, speaker identification, speech-based biometric systems, video conferencing, hearing aids, mobile phones, voice conversion, and microphones. Handling background noise is essential for designing a successful speech enhancement system. In this work, a new speech enhancement technique based on the Stationary Bionic Wavelet Transform (SBWT) and the Minimum Mean Square Error (MMSE) Estimate of Spectral Amplitude is proposed. In the first step, the SBWT is applied to the noisy speech signal in order to obtain eight noisy stationary bionic wavelet coefficients. Each of those coefficients is then denoised by applying the denoising method based on the MMSE Estimate of Spectral Amplitude. Finally, the inverse SBWT (SBWT⁻¹) is applied to the denoised stationary wavelet coefficients to obtain the enhanced speech signal. The performance of the proposed technique is assessed by computing the Signal to Noise Ratio (SNR), the Segmental SNR (SSNR), and the Perceptual Evaluation of Speech Quality (PESQ).

1. Introduction

In many speech-related applications, the input speech signal is frequently corrupted by environmental noise and must be processed by a speech enhancement technique to improve its quality before being used [1]. Speech enhancement techniques can generally be divided into two groups: supervised and unsupervised. Unsupervised techniques include spectral subtraction (SS) [2–4], Wiener filtering [5, 6], short-time spectral amplitude (STSA) estimation [7], and short-time log-spectral amplitude estimation (logSTSA) [8]. Supervised speech enhancement techniques employ a training set for learning separate models for noisy and clean speech signals; examples include codebook-based methods [9] and Hidden Markov Model (HMM)-based techniques [10]. Classical speech enhancement techniques usually process a noisy utterance frame by frame, that is, they enhance each short-time segment of the utterance almost independently. Some research works showed that considering the inter-frame variation over a relatively long time span can lead to superior enhancement performance [1]. Well-known approaches along this direction include modulation-domain spectral subtraction [11], modulation-domain Wiener filtering [12], and modulation-domain Kalman filtering [13]. Moreover, whereas the Fourier transform (FT) only takes the frequency content of a signal into consideration, the discrete wavelet transform (DWT) [14] takes both the temporal and the frequency characteristics of the analyzed signal into consideration. The DWT has become a well-known method in speech analysis. In Wavelet Thresholding Denoising (WTD) [15], the wavelet transform is applied to split the time-domain signal into sub-bands. After that, the obtained wavelet coefficients (sub-bands) are thresholded.
In [16], the DWT [17, 18] was applied to the speech signal and only the resulting approximation portion was kept, which simultaneously achieves data compression and noise robustness in recognition. In [1], the DWT was employed for analyzing the spectrogram of a noisy utterance along the temporal axis, and the resulting detail portion was then attenuated with the expectation of reducing the noise effect and thereby improving speech quality. Despite the ease of its implementation, the preliminary evaluation results indicate that the technique proposed in [1] gives the input signals better perceptual quality. It was also shown that this technique can be paired with many well-known speech enhancement approaches to achieve even better performance [1]. In this work, a novel speech enhancement technique based on the Stationary Bionic Wavelet Transform (SBWT) [19–21] and the Minimum Mean Square Error (MMSE) Estimate of Spectral Amplitude [22] is proposed. In this paper, this approach is evaluated and compared to four other speech enhancement approaches, which are as follows:
(i) Unsupervised speech denoising via perceptually motivated robust principal component analysis [23].
(ii) The speech enhancement technique based on MSS-SMPO [24, 25].
(iii) The denoising technique based on MMSE Estimate of Spectral Amplitude [22].
(iv) Our previous speech enhancement technique based on LWT and Artificial Neural Network (ANN) and using MMSE Estimate of Spectral Amplitude [26].

The fourth technique, which is based on LWT and ANN [27–29] and uses the MMSE Estimate of Spectral Amplitude [26], can be summarized by the following steps:
(i) First step: applying the LWT to the noisy speech signal to obtain two noisy detail coefficients, cD1 and cD2, and one approximation coefficient.
(ii) Second step: denoising cD1 and cD2 by soft thresholding. Suitable thresholds have to be used for this purpose, and they are determined by an Artificial Neural Network (ANN). This soft thresholding yields two denoised detail coefficients.
(iii) Third step: applying the denoising approach based on the MMSE Estimate of Spectral Amplitude [22] to the approximation coefficient to obtain a denoised approximation coefficient.
(iv) Fourth step: applying the inverse LWT to the three denoised coefficients to finally obtain the enhanced signal.
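The soft-thresholding operation used in the second step can be sketched as follows. This is only a minimal illustration: the function name `soft_threshold` and the fixed threshold value are assumptions made for the example, whereas in [26] the thresholds are produced by an ANN that is not reproduced here.

```python
import numpy as np

def soft_threshold(coeffs, thr):
    """Shrink each coefficient toward zero by thr; values with |c| <= thr become 0."""
    coeffs = np.asarray(coeffs, dtype=float)
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - thr, 0.0)

# Hypothetical detail coefficients and a hand-picked threshold (not ANN-derived).
detail = np.array([-3.0, -0.5, 0.2, 2.0])
denoised_detail = soft_threshold(detail, thr=1.0)
```

Small coefficients, which are assumed to be dominated by noise, are set to zero, while larger ones are kept but reduced in magnitude by the threshold.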

As future work, we will develop a novel speech enhancement approach using ANN [30–36] or deep learning [37, 38] for thresholding the noisy stationary bionic wavelet coefficients. Those coefficients are obtained by applying the SBWT to the noisy speech signal.

In Section 2 of this paper, materials and methods are presented. Section 2.4 describes the speech enhancement technique proposed in this work. In Section 3, results and discussion are presented. Finally, Section 4 concludes the paper.

2. Materials and Methods

2.1. The Stationary Bionic Wavelet Transform (SBWT)

In [19], the SBWT was proposed as a novel wavelet transform. This transform was initially introduced to solve the perfect reconstruction problem that exists with the Bionic Wavelet Transform (BWT). It has been applied to speech enhancement [19, 20] and also to ECG denoising [21].

2.2. The MMSE Estimate of Spectral Amplitude

In [22], it was proposed to estimate the noise power spectral density by means of Minimum Mean Square Error (MMSE) optimal estimation. It was shown that the resulting estimator can be interpreted as a Voice Activity Detector (VAD)-based noise power estimator, in which the noise power is updated only when speech absence is detected and a bias compensation is required [22]. It was also shown that the bias compensation is not needed if the VAD is replaced by a soft Speech Presence Probability (SPP) with fixed priors [22]. Choosing fixed priors has the benefit of decoupling the noise power estimator from subsequent steps of a speech enhancement algorithm, such as the estimation of the speech power and of the clean speech [22]. Gerkmann and Hendriks [22] showed that the SPP approach maintains the quick noise tracking performance of the bias-compensated MMSE-based technique while exhibiting less overestimation of the spectral noise power and an even lower computational complexity.
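An SPP-driven noise power tracker of the kind described above might be sketched as follows. This is an illustrative reading of [22], not the reference implementation: the function name `spp_noise_psd`, the initialization from the first few frames, and the numerical values (a fixed a priori SNR of 15 dB under speech presence, equal priors, smoothing constants 0.8 and 0.9, and the 0.99 cap) are assumptions that should be checked against [22].

```python
import numpy as np

def spp_noise_psd(periodograms, xi_h1_db=15.0, alpha=0.8, init_frames=5):
    """Track the noise power per frequency bin from noisy periodograms |Y|^2.

    periodograms: array of shape (n_frames, n_bins).
    Returns an array of the same shape holding the noise power estimate.
    """
    P = np.asarray(periodograms, dtype=float)
    xi_h1 = 10.0 ** (xi_h1_db / 10.0)              # assumed fixed a priori SNR
    noise_psd = P[:init_frames].mean(axis=0)       # assumes noise-only start
    noise_psd = np.maximum(noise_psd, 1e-12)
    smoothed_spp = np.zeros(P.shape[1])
    track = np.empty_like(P)
    for l, periodogram in enumerate(P):
        # A posteriori speech presence probability with equal fixed priors.
        exponent = -(periodogram / noise_psd) * xi_h1 / (1.0 + xi_h1)
        spp = 1.0 / (1.0 + (1.0 + xi_h1) * np.exp(exponent))
        # Guard against stagnation when the SPP stays close to 1 for long.
        smoothed_spp = 0.9 * smoothed_spp + 0.1 * spp
        spp = np.where(smoothed_spp > 0.99, np.minimum(spp, 0.99), spp)
        # Soft decision: mix the new periodogram with the previous estimate.
        noise_periodogram = (1.0 - spp) * periodogram + spp * noise_psd
        noise_psd = alpha * noise_psd + (1.0 - alpha) * noise_periodogram
        track[l] = noise_psd
    return track
```

Because the update is weighted by a probability rather than gated by a hard VAD decision, no separate bias compensation step appears in the loop.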

2.3. Signal Model

In [22], Gerkmann and Hendriks considered frame-by-frame processing of time-domain signals, where the Discrete Fourier Transform (DFT) is applied to each frame. Let the complex spectral noise and speech coefficients be given, respectively, by $N_k(l)$ and $S_k(l)$, where $l$ is the time frame index and $k$ is the frequency bin index [22]. In [22], it was assumed that, in the short-time Fourier domain, the noise and speech signals are additive. Therefore, the complex spectral noisy observation has the following expression:

$$Y_k(l) = S_k(l) + N_k(l).$$

In [22], it was supposed that the noise and speech signals have zero mean and are independent, so that

$$E\left(|Y_k(l)|^2\right) = E\left(|S_k(l)|^2\right) + E\left(|N_k(l)|^2\right),$$

where $E(\cdot)$ denotes the statistical expectation operator.

The spectral noise and speech power are expressed as follows:

$$\sigma_N^2 = E\left(|N_k(l)|^2\right), \qquad \sigma_S^2 = E\left(|S_k(l)|^2\right).$$

Then, the a posteriori SNR, $\gamma$, and the a priori SNR, $\xi$, are expressed as follows:

$$\gamma = \frac{|Y_k(l)|^2}{\sigma_N^2}, \qquad \xi = \frac{\sigma_S^2}{\sigma_N^2}.$$

All details about MMSE-based noise power estimation are given in [22].
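The frame-by-frame DFT processing and the two SNR quantities of this signal model can be illustrated with a short sketch. The frame length, hop size, Hann window, and the function names `stft_frames` and `snr_quantities` are assumptions chosen for the example and are not taken from [22].

```python
import numpy as np

def stft_frames(x, frame_len=256, hop=128):
    """Split x into Hann-windowed frames and return the one-sided DFT of each frame."""
    x = np.asarray(x, dtype=float)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)   # shape: (n_frames, frame_len // 2 + 1)

def snr_quantities(Y, speech_power, noise_power):
    """Return the a posteriori SNR |Y|^2 / noise_power and the a priori SNR."""
    gamma = np.abs(Y) ** 2 / noise_power
    xi = speech_power / noise_power
    return gamma, xi
```

Since the DFT is linear, the coefficients of a noisy signal are the sum of the speech and noise coefficients, which is the additivity assumption stated above.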

2.4. The Proposed Speech Enhancement Technique

The speech enhancement technique introduced in this work is based on the SBWT [19–21] and the MMSE Estimate of Spectral Amplitude [22]. The novelty of this approach consists in applying the speech enhancement method based on the MMSE Estimate of Spectral Amplitude [1, 22] in the SBWT domain. This technique [22] is applied to each noisy stationary bionic wavelet coefficient in order to denoise it. Those noisy coefficients are obtained by applying the SBWT to the noisy speech signal. Then, the inverse SBWT (SBWT⁻¹) is applied to the denoised coefficients to finally obtain the enhanced speech signal. Figure 1 illustrates the flowchart of the proposed technique.

According to Figure 1, the first step of the proposed approach is to apply the SBWT to the noisy speech signal to obtain eight noisy stationary bionic wavelet coefficients. Each of those coefficients is denoised by the speech enhancement technique based on the MMSE Estimate of Spectral Amplitude [1, 22], which yields eight denoised coefficients (Figure 1). Finally, the inverse SBWT (SBWT⁻¹) is applied to those denoised coefficients to obtain the enhanced signal.
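The three-stage structure described above (decompose into eight coefficients, denoise each one, reconstruct) can be sketched as follows. The analysis and synthesis steps here are a simple stand-in and are not the SBWT: the signal is split into eight full-length sub-band signals with disjoint FFT masks, chosen only because such a split reconstructs perfectly by summation. The per-band denoiser is passed in as a callable, and all function names are hypothetical.

```python
import numpy as np

def split_into_bands(x, n_bands=8):
    """Stand-in analysis step (NOT the SBWT): n_bands full-length sub-band signals."""
    x = np.asarray(x, dtype=float)
    spectrum = np.fft.rfft(x)
    edges = np.linspace(0, len(spectrum), n_bands + 1).astype(int)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        masked = np.zeros_like(spectrum)
        masked[lo:hi] = spectrum[lo:hi]          # keep one frequency slice
        bands.append(np.fft.irfft(masked, n=len(x)))
    return bands

def merge_bands(bands):
    """Stand-in synthesis step: the disjoint sub-bands sum back to the signal."""
    return np.sum(bands, axis=0)

def enhance(noisy, denoise_band, n_bands=8):
    """Decompose, denoise each sub-band independently, then reconstruct."""
    bands = split_into_bands(noisy, n_bands)
    denoised = [denoise_band(band) for band in bands]
    return merge_bands(denoised)
```

With an identity denoiser, `enhance` returns the input unchanged, which mirrors the perfect reconstruction property that motivates using the SBWT instead of the BWT.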

2.5. Minimum Mean Square Error (MMSE) Estimate of Spectral Amplitude in the SBWT Domain

In general, classical speech enhancement approaches based on thresholding in the wavelet transform domain can introduce distortions into the original speech signal, particularly for unvoiced sounds. Consequently, a great number of wavelet-based speech enhancement techniques employ other tools such as spectral subtraction (SS), Wiener filtering, and MMSE-STSA estimation [39, 40]. This is the reason why we apply the MMSE Estimate of Spectral Amplitude in the SBWT domain in our speech enhancement system. Using the SBWT solves the perfect reconstruction problem that exists when the BWT is applied [19]. Furthermore, the SBWT, like other wavelet transforms [41, 42], tends to decorrelate the data [43], which facilitates noise suppression. Applying the MMSE Estimate of Spectral Amplitude [22] to each noisy stationary bionic wavelet coefficient allows the speech and noise estimates to adapt better than when this technique [22] is applied to the whole noisy speech signal.

2.6. Unsupervised Speech Denoising via Perceptually Motivated Robust Principal Component Analysis [23]

To overcome two shortcomings of existing sparse and low-rank speech denoising techniques, namely that the auditory perceptual properties are not fully exploited and that the speech degradation is easily perceived, a perceptually motivated robust principal component analysis (ISNRPCA) technique was presented [23]. In order to reflect the non-linear frequency perception of the basilar membrane, the cochleagram is employed as the input of ISNRPCA. The latter employs the perceptually meaningful Itakura–Saito measure as its optimization objective function. Furthermore, non-negative constraints are imposed to regularize the decomposed terms with respect to their physical meaning [23]. In [23], Min et al. proposed an alternating direction method of multipliers (ADMM) for solving the optimization problem of ISNRPCA. ISNRPCA is completely unsupervised: neither the noise model nor the speech model needs to be trained beforehand. Experimental results under diverse kinds of noise and different SNRs show that ISNRPCA gives promising results for speech denoising [23].
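For readers unfamiliar with the Itakura–Saito measure mentioned above, its standard definition between two non-negative spectra can be computed as in the sketch below. This only illustrates the divergence itself; how [23] embeds it in the ISNRPCA objective is not reproduced, and the function name and the small `eps` safeguard are assumptions.

```python
import numpy as np

def itakura_saito(p, q, eps=1e-12):
    """Itakura-Saito divergence sum(p/q - log(p/q) - 1) between non-negative spectra."""
    p = np.asarray(p, dtype=float) + eps   # eps avoids division by zero and log(0)
    q = np.asarray(q, dtype=float) + eps
    ratio = p / q
    return float(np.sum(ratio - np.log(ratio) - 1.0))
```

The measure is zero only when the two spectra coincide, it is not symmetric in its arguments, and it depends only on the ratio of the spectra, so it is invariant to a common scaling of both.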

2.7. The Speech Enhancement Technique Based on MSS-SMPO [25]

In [25], a two-step enhancement technique based on spectral subtraction and phase spectrum compensation was presented for noisy speech in adverse environments involving non-stationary noise and medium to low SNR levels. In the first step of the technique proposed in [25], the magnitude of the noisy speech spectrum is modified by a spectral subtraction technique, for which a noise estimation approach was introduced. The latter is based on the low-frequency information of the noisy speech and is able to estimate the non-stationary noise precisely. In the second step, the phase spectrum of the noisy speech is modified by phase spectrum compensation, where an SNR-dependent technique determines the amount of compensation to be imposed on the phase spectrum [25]. A modified complex spectrum is obtained by combining the magnitude from the spectral subtraction step with the modified phase spectrum from the phase compensation step, and it is found to be a better representation of the enhanced speech spectrum.

3. Results and Discussion

In this work, the proposed technique is evaluated by applying it to ten Arabic speech sentences pronounced by a male speaker and ten others pronounced by a female speaker (Table 1). Those speech signals are artificially degraded by additive noise at different values of SNR (before denoising). To corrupt those speech signals (Table 1), we have chosen four kinds of noise: white Gaussian, car, F16, and tank noises. Those twenty speech signals are sampled at and are listed in Table 1.
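Degrading a clean signal with additive noise at a prescribed SNR is commonly done by rescaling the noise before adding it. The sketch below shows one such procedure; it is an assumption about how such test signals can be produced, not a description of the authors' scripts, and the function name is hypothetical.

```python
import numpy as np

def add_noise_at_snr(clean, noise, snr_db):
    """Scale noise so that clean + scaled noise has the requested SNR in dB."""
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noise, dtype=float)[:len(clean)]  # noise must be at least as long
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    gain = np.sqrt(clean_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return clean + gain * noise
```

The gain follows from requiring that the ratio of the clean power to the scaled noise power equals 10^(SNR/10).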

Also, to evaluate the proposed technique, it is compared with three other speech enhancement approaches, which are as follows:
(i) The denoising approach based on MMSE Estimate of Spectral Amplitude [22].
(ii) The unsupervised speech denoising technique via perceptually motivated robust principal component analysis [23].
(iii) The speech enhancement approach based on MSS-SMPO [24].

This evaluation is performed through the computations of the SNR (Signal to Noise Ratio), the Segmental SNR (SSNR), and the PESQ (Perceptual Evaluation of Speech Quality). The results obtained from these computations are presented in Tables 2–16.
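The two waveform-based measures can be computed as in the sketch below. The frame length and the clamping of per-frame values to the range from -10 dB to 35 dB are common conventions assumed here, since the exact settings used for the tables are not stated; the function names are hypothetical. PESQ is a standardized perceptual measure and is not sketched.

```python
import numpy as np

def snr_db(clean, enhanced):
    """Global SNR in dB between a clean reference and an enhanced signal."""
    clean = np.asarray(clean, dtype=float)
    error = clean - np.asarray(enhanced, dtype=float)
    return float(10.0 * np.log10(np.sum(clean ** 2) / np.sum(error ** 2)))

def segmental_snr_db(clean, enhanced, frame_len=256, lo=-10.0, hi=35.0, eps=1e-12):
    """Mean of per-frame SNRs in dB, each clamped to [lo, hi] (assumed limits)."""
    clean = np.asarray(clean, dtype=float)
    enhanced = np.asarray(enhanced, dtype=float)
    n_frames = len(clean) // frame_len
    values = []
    for i in range(n_frames):
        seg = slice(i * frame_len, (i + 1) * frame_len)
        signal_energy = np.sum(clean[seg] ** 2)
        error_energy = np.sum((clean[seg] - enhanced[seg]) ** 2)
        frame_snr = 10.0 * np.log10(signal_energy / (error_energy + eps) + eps)
        values.append(np.clip(frame_snr, lo, hi))
    return float(np.mean(values))
```

Averaging in the dB domain over short frames gives low-energy speech segments more weight than the global SNR does.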

According to these tables, the best results are the values in italics, and almost all of them are obtained with the proposed technique. Therefore, this technique outperforms the other speech enhancement approaches [22–25] used in this evaluation.

Figure 2 illustrates an example of speech enhancement in which the proposed technique is applied to a clean speech signal (Figure 2(a)) corrupted additively by car noise (Volvo) with SNR = 0 dB (Figure 2(b)). According to this figure, the technique considerably reduces the noise and yields an enhanced speech signal (Figure 2(c)) with little distortion, despite the low SNR value (0 dB). Figure 3 illustrates the spectrograms of the clean, noisy, and enhanced speech signals.

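Figure 2: (a) Clean speech signal; (b) noisy speech signal (car noise, SNR = 0 dB); (c) enhanced speech signal.

Figure 3: Spectrograms of the (a) clean, (b) noisy, and (c) enhanced speech signals.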

The spectrogram in Figure 3(b) shows that the noise corrupting the speech signal is localized in the low-frequency region. The spectrogram in Figure 3(c) shows that the car noise is considerably reduced by the proposed speech enhancement technique. Moreover, this technique yields an enhanced speech signal with low distortion compared to the clean speech signal (Figure 2(a)).

In the following, we compare the proposed technique with our previous speech enhancement approach, which is based on LWT and ANN and uses MMSE [26]. The first difference between the two is that they use completely different wavelet transforms: the SBWT for the technique proposed in this paper and the LWT for our previous approach proposed in [26]. The second difference is that, in the technique proposed in this paper, the denoising approach based on the MMSE Estimate of Spectral Amplitude [22] is applied to all stationary bionic wavelet coefficients, whereas in our previous technique [26] this approach [22] is applied only to the approximation coefficient. The previous technique also uses an Artificial Neural Network (ANN), which further differentiates it [26] from the technique proposed in this paper. The two techniques are again compared in terms of SNR, SSNR, and PESQ. They are applied to a speech signal degraded by car noise at diverse values of SNR before denoising. Tables 17–19 present the results obtained from the computation of SNR, SSNR, and PESQ for the two techniques.

According to these tables, the best results are the values in italics and they are obtained from the application of the proposed technique. Therefore, this technique outperforms the other speech enhancement approach proposed in [26].

4. Conclusion

In this paper, we propose a new speech enhancement technique based on the SBWT and the MMSE Estimate of Spectral Amplitude. In the first step of this technique, the SBWT is applied to the noisy speech signal to obtain eight noisy stationary bionic wavelet coefficients. Each of those coefficients is then denoised by applying the denoising approach based on the MMSE Estimate of Spectral Amplitude. Finally, the inverse SBWT is applied to the denoised stationary wavelet coefficients to obtain the enhanced speech signal. This technique is evaluated by comparing it with four other speech enhancement approaches. The first one is the denoising technique based on the MMSE Estimate of Spectral Amplitude. The second one is the speech enhancement technique based on MSS-SMPO. The third one is the unsupervised speech denoising approach through perceptually motivated robust principal component analysis. The fourth one is the speech enhancement technique based on LWT and ANN and using the MMSE Estimate of Spectral Amplitude. The evaluation is performed by computing the Signal to Noise Ratio (SNR), the Segmental SNR (SSNR), and the Perceptual Evaluation of Speech Quality (PESQ). The results obtained from these computations show that the proposed technique outperforms the other techniques mentioned above. Furthermore, the technique proposed in this work considerably reduces the noise corrupting the clean speech signal and yields an enhanced speech signal with good perceptual quality.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

References

[1] S.-K. Lee, S.-S. Wang, T. Yu, and J.-W. Hung, “Speech enhancement based on reducing the detail portion of speech spectrograms in modulation domain via discrete wavelet transform,” in Proceedings of the 2018 11th International Symposium on Chinese Spoken Language Processing (ISCSLP), Taipei City, Taiwan, November 2018.
[2] S. Boll, “Suppression of acoustic noise in speech using spectral subtraction,” IEEE Transactions on Acoustics, Speech, & Signal Processing, vol. 27, no. 2, pp. 113–120, 1979.
[3] M. Berouti, R. Schwartz, and J. Makhoul, “Enhancement of speech corrupted by acoustic noise,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 208–211, Washington, DC, USA, April 1979.
[4] S. Kamath and P. Loizou, “A multi-band spectral subtraction method for enhancing speech corrupted by colored noise,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Orlando, Florida, USA, May 2002.
[5] C. Plapous, C. Marro, and P. Scalart, “Improved signal-to-noise ratio estimation for speech enhancement,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, no. 6, pp. 2098–2108, 2006.
[6] P. Scalart and J. V. Filho, “Speech enhancement based on a priori signal to noise estimation,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 629–632, Atlanta, GA, USA, June 1996.
[7] Y. Ephraim and D. Malah, “Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator,” IEEE Transactions on Acoustics, Speech, & Signal Processing, vol. 32, no. 6, pp. 1109–1121, 1984.
[8] Y. Ephraim and D. Malah, “Speech enhancement using a minimum mean-square error log-spectral amplitude estimator,” IEEE Transactions on Acoustics, Speech, & Signal Processing, vol. 33, no. 2, pp. 443–445, 1985.
[9] S. Srinivasan, J. Samuelsson, and W. Kleijn, “Codebook driven short-term predictor parameter estimation for speech enhancement,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, no. 1, pp. 163–176, 2006.
[10] D. Y. Zhao and W. B. Kleijn, “HMM-based gain modeling for enhancement of speech in noise,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 3, pp. 882–892, 2007.
[11] K. K. Paliwal, K. K. Wojcicki, and B. Schwerin, “Single-channel speech enhancement using spectral subtraction in the short-time modulation domain,” Speech Communication, vol. 52, no. 5, pp. 450–475, 2010.
[12] C.-C. Hsu, K.-M. Cheong, J.-T. Chien, and T.-S. Chi, “Modulation Wiener filter for improving speech intelligibility,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 370–374, Queensland, Australia, April 2015.
[13] S. So and K. K. Paliwal, “Modulation-domain Kalman filtering for single-channel speech enhancement,” Speech Communication, vol. 53, no. 6, pp. 818–829, 2011.
[14] O. Rioul and M. Vetterli, Wavelets and Signal Processing, Springer, Berlin Heidelberg, Germany, 1991.
[15] S. G. Chang, B. Bin Yu, and M. Vetterli, “Adaptive wavelet thresholding for image denoising and compression,” IEEE Transactions on Image Processing, vol. 9, no. 9, pp. 1532–1546, 2000.
[16] S.-S. Wang, P. Lin, Y. Tsao, J.-W. Hung, and B. Su, “Suppression by selecting wavelets for feature compression in distributed speech recognition,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 26, no. 3, pp. 564–579, 2018.
[17] D. Huang, K. Lanyan, B. Mi, G. Wei, J. Wang, and S. Wan, “A cooperative denoising algorithm with interactive dynamic adjustment function for security of stacker in industrial internet of things,” Security and Communication Networks, vol. 2019, Article ID 4049765, 16 pages, 2019.
[18] M. Ali Nematollahi, C. Vorakulpipat, and H. G. Rosales, “Optimization of a blind speech watermarking technique against amplitude scaling,” Security and Communication Networks, vol. 2017, Article ID 5454768, 13 pages, 2017.
[19] T. Mourad, “Speech enhancement based on stationary bionic wavelet transform and maximum a posterior estimator of magnitude-squared spectrum,” International Journal of Speech Technology, vol. 20, no. 1, pp. 75–88, 2017.
[20] M. Talbi and M. S. Bouhlel, “A novel approach of speech enhancement based on SBWT and MMSE estimate of spectral amplitude,” in Proceedings of the 2020 4th International Conference on Advanced Systems and Emergent Technologies (IC_ASET), Hammamet, Tunisia, March 2020.
[21] M. Talbi, “New approach of ECG denoising based on 1-D double-density complex DWT and SBWT,” Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization, vol. 8, no. 6, pp. 608–620, 2020.
[22] T. Gerkmann and R. C. Hendriks, “Unbiased MMSE-based noise power estimation with low complexity and low tracking delay,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 4, pp. 1383–1393, 2012.
[23] G. Min, X. Zou, W. Han, X. Zhang, and W. Tan, “Unsupervised speech denoising via perceptually motivated robust principal component analysis,” Shengxue Xuebao/Acta Acustica, vol. 42, no. 2, pp. 246–256, 2017.
[24] Y. Lu and P. C. Loizou, “Estimators of the magnitude-squared spectrum and methods for incorporating SNR uncertainty,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 5, pp. 1123–1137, 2011.
[25] M. T. Islam, A. Asaduzzaman, C. Shahnaz, W. P. Zhu, and M. O. Ahmad, “Speech enhancement in adverse environments based on non-stationary noise-driven spectral subtraction and SNR-dependent phase compensation,” 2018, arXiv preprint https://arxiv.org/abs/1803.00396.
[26] M. Talbi, R. Baazaoui, and M. Salim Bouhlel, “Speech enhancement based on LWT and artificial neural network and using MMSE estimate of spectral amplitude,” Deep Learning Applications, 2021.
[27] T. Chen, N. Kapron, and J. C.-Y. Chen, “Using evolving ANN-based algorithm models for accurate meteorological forecasting applications in Vietnam,” Mathematical Problems in Engineering, vol. 2020, Article ID 8179652, 8 pages, 2020.
[28] E. Vilavicencio-Arcadia, S. G. Navarro, S. G. Navarro et al., “Application of artificial neural networks for the automatic spectral classification,” Mathematical Problems in Engineering, vol. 2020, Article ID 1751932, 15 pages, 2020.
[29] K.-C. Yang, C. Yang, P.-Y. Chao, and Po-H. Shih, “Applying artificial neural network to predict semiconductor machine outliers,” Mathematical Problems in Engineering, vol. 2013, Article ID 210740, 10 pages, 2013.
[30] B. Ramesh Murlidhar, R. K. Sinha, E. T. Mohamad, R. Sonkar, and M. Khorami, “The effects of particle swarm optimisation and genetic algorithm on ANN results in predicting pile bearing capacity,” International Journal of Hydromechatronics, vol. 3, no. 1, p. 69, 2020.
[31] M. Safa, M. Ahmadi, J. Mehrmashadi et al., “Selection of the most influential parameters on vectorial crystal growth of highly oriented vertically aligned carbon nanotubes by adaptive neuro-fuzzy technique,” International Journal of Hydromechatronics, vol. 3, no. 3, p. 238, 2020.
[32] C. Zhu, W. Yan, X. Cai, S. Liu, T. H. Li, and G. Li, “Neural saliency algorithm guide bi-directional visual perception style transfer,” CAAI Transactions on Intelligence Technology, vol. 5, no. 1, pp. 1–8, 2020.
[33] T. Sangeetha and G. Mary Amalanathan, “Outlier detection in neutrosophic sets by using rough entropy based weighted density method,” CAAI Transactions on Intelligence Technology, vol. 5, no. 2, pp. 121–127, 2020.
[34] Z. Ali and T. Mahmood, “Complex neutrosophic generalised dice similarity measures and their application to decision making,” CAAI Transactions on Intelligence Technology, vol. 5, no. 2, pp. 78–87, 2020.
[35] T. Goehring, F. Bolner, J. J. M. Monaghan et al., “Speech enhancement based on neural networks improves speech intelligibility in noise for cochlear implant users,” Hearing Research, vol. 344, pp. 183–194, 2017.
[36] R. Birok, R. Kapoor, and M. Singh Choudhry, “ECG denoising using artificial neural networks and complete ensemble empirical mode decomposition,” Turkish Journal of Computer and Mathematics Education, vol. 12, no. 2, pp. 2382–2389, 2021.
[37] J. Llombart, D. Ribas, A. Miguel, L. Vicente, A. Ortega, and E. Lleida, “Progressive loss functions for speech enhancement with deep neural networks,” EURASIP Journal on Audio, Speech, and Music Processing, vol. 2021, no. 1, 2021.
[38] P. Karjol, M. Ajay Kumar, and P. K. Ghosh, “Speech enhancement using multiple deep neural networks,” in Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Canada, April 2018.
[39] H. Tasmaz and E. Erçelebi, “Speech enhancement based on undecimated wavelet packet-perceptual filterbanks and MMSE-STSA estimation in various noise environments,” Digital Signal Processing, vol. 18, no. 5, pp. 797–812, 2008.
[40] Y. Ephraim and D. Malah, “Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator,” IEEE Transactions on Acoustics, Speech, & Signal Processing, vol. 32, no. 6, pp. 1109–1121, 1984.
[41] A. Biswas, P. K. Sahu, A. Bhowmick, and M. Chandra, “Feature extraction technique using ERB like wavelet sub-band periodic and aperiodic decomposition for TIMIT phoneme recognition,” International Journal of Speech Technology, vol. 17, no. 4, pp. 389–399, 2014.
[42] S. Singh and A. M. Mutawa, “A wavelet-based transform method for quality improvement in noisy speech patterns of Arabic language,” International Journal of Speech Technology, vol. 19, no. 4, pp. 677–685, 2016.
[43] M. Bahoura and J. Rouat, “Wavelet speech enhancement based on time-scale adaptation,” Speech Communication, vol. 48, no. 12, pp. 1620–1637, 2006.

Copyright

Copyright © 2021 Mourad Talbi and Med Salim Bouhlel. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
