ISRN Signal Processing
Volume 2014, Article ID 357048, 13 pages
http://dx.doi.org/10.1155/2014/357048
Research Article

Complex Cepstrum Based Voice Conversion Using Radial Basis Function

1Department of Electronics Engineering, SVNIT, Surat, India
2Department of Computer Engineering, SVNIT, Surat, India

Received 30 November 2013; Accepted 26 December 2013; Published 6 February 2014

Academic Editors: K. S. Chuang and A. Maier

Copyright © 2014 Jagannath Nirmal et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. H. Kuwabara and Y. Sagisaka, “Acoustic characteristics of speaker individuality: control and conversion,” Speech Communication, vol. 16, no. 2, pp. 165–173, 1995.
  2. K.-S. Lee, “Statistical approach for voice personality transformation,” IEEE Transactions on Audio, Speech and Language Processing, vol. 15, no. 2, pp. 641–651, 2007.
  3. A. Kain and M. W. Macon, “Design and evaluation of a voice conversion algorithm based on spectral envelope mapping and residual prediction,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, pp. 813–816, May 2001.
  4. D. Sündermann, “Voice conversion: state-of-the-art and future work,” in Proceedings of the 31st German Annual Conference on Acoustics (DAGA '01), Munich, Germany, 2001.
  5. D. G. Childers, B. Yegnanarayana, and W. Ke, “Voice conversion: factors responsible for quality,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '85), vol. 1, pp. 748–751, Tampa, Fla, USA, 1985.
  6. M. Abe, S. Nakamura, K. Shikano, and H. Kuwabara, “Voice conversion through vector quantization,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 655–658, 1988.
  7. W. Verhelst and J. Mertens, “Voice conversion using partitions of spectral feature space,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '96), pp. 365–368, May 1996.
  8. N. Iwahashi and Y. Sagisaka, “Speech spectrum conversion based on speaker interpolation and multi-functional representation with weighting by radial basis function networks,” Speech Communication, vol. 16, no. 2, pp. 139–151, 1995.
  9. Y. Stylianou, O. Cappé, and E. Moulines, “Continuous probabilistic transform for voice conversion,” IEEE Transactions on Speech and Audio Processing, vol. 6, no. 2, pp. 131–142, 1998.
  10. S. Desai, A. W. Black, B. Yegnanarayana, and K. Prahallad, “Spectral mapping using artificial neural networks for voice conversion,” IEEE Transactions on Audio, Speech and Language Processing, vol. 18, no. 5, pp. 954–964, 2010.
  11. C. Orphanidou, I. M. Moroz, and S. J. Roberts, “Wavelet-based voice morphing,” WSEAS Journal of Systems, vol. 10, no. 3, pp. 3297–3302, 2004.
  12. E. Helander, T. Virtanen, J. Nurminen, and M. Gabbouj, “Voice conversion using partial least squares regression,” IEEE Transactions on Audio, Speech and Language Processing, vol. 18, no. 5, pp. 912–921, 2010.
  13. L. M. Arslan, “Speaker transformation algorithm using segmental codebooks (STASC),” Speech Communication, vol. 28, no. 3, pp. 211–226, 1999.
  14. K. S. Rao, “Voice conversion by mapping the speaker-specific features using pitch synchronous approach,” Computer Speech and Language, vol. 24, no. 3, pp. 474–494, 2010.
  15. S. Hayakawa and F. Itakura, “Text-dependent speaker recognition using the information in the higher frequency band,” in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP '94), pp. 137–140, Adelaide, Australia, 1994.
  16. H. Kawahara, I. Masuda-Katsuse, and A. de Cheveigné, “Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: possible role of a repetitive structure in sounds,” Speech Communication, vol. 27, no. 3, pp. 187–207, 1999.
  17. R. J. McAulay and T. F. Quatieri, “Phase modelling and its application to sinusoidal transform coding,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '86), vol. 1, pp. 1713–1716, Tokyo, Japan, 1986.
  18. J. Nirmal, P. Kachare, S. Patnaik, and M. Zaveri, “Cepstrum liftering based voice conversion using RBF and GMM,” in Proceedings of the IEEE International Conference on Communications and Signal Processing (ICCSP '13), pp. 570–575, April 2013.
  19. A. V. Oppenheim, “Speech analysis and synthesis system based on homomorphic filtering,” The Journal of the Acoustical Society of America, vol. 45, no. 2, pp. 458–465, 1969.
  20. W. Verhelst and O. Steenhaut, “A new model for the short-time complex cepstrum of voiced speech,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 34, no. 1, pp. 43–51, 1986.
  21. H. Deng, R. K. Ward, M. P. Beddoes, and M. Hodgson, “A new method for obtaining accurate estimates of vocal-tract filters and glottal waves from vowel sounds,” IEEE Transactions on Audio, Speech and Language Processing, vol. 14, no. 2, pp. 445–455, 2006.
  22. T. Drugman, B. Bozkurt, and T. Dutoit, “Complex cepstrum-based decomposition of speech for glottal source estimation,” in Proceedings of the 10th Annual Conference of the International Speech Communication Association (INTERSPEECH '09), pp. 116–119, Brighton, UK, September 2009.
  23. T. F. Quatieri Jr., “Minimum and mixed phase speech analysis-synthesis by adaptive homomorphic deconvolution,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 27, no. 4, pp. 328–335, 1979.
  24. T. Drugman, B. Bozkurt, and T. Dutoit, “Causal-anticausal decomposition of speech using complex cepstrum for glottal source estimation,” Speech Communication, vol. 53, no. 6, pp. 855–866, 2011.
  25. M. Vondra and R. Vích, “Speech modeling using the complex cepstrum,” in Toward Autonomous, Adaptive, and Context-Aware Multimodal Interfaces: Theoretical and Practical Issues, vol. 6456 of Lecture Notes in Computer Science, pp. 324–330, 2011.
  26. R. Maia, M. Akamine, and M. Gales, “Complex cepstrum as phase information in statistical parametric speech synthesis,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '12), pp. 4581–4584, 2012.
  27. K. Shikano, S. Nakamura, and M. Abe, “Speaker adaptation and voice conversion by codebook mapping,” in Proceedings of the IEEE International Symposium on Circuits and Systems, pp. 594–597, June 1991.
  28. T. Toda, H. Saruwatari, and K. Shikano, “Voice conversion algorithm based on Gaussian mixture model with dynamic frequency warping of STRAIGHT spectrum,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 841–844, May 2001.
  29. H. Ye and S. Young, “High quality voice morphing,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. I9–I12, May 2004.
  30. R. Laskar, K. Banerjee, F. Talukdar, and K. Sreenivasa Rao, “A pitch synchronous approach to design voice conversion system using source-filter correlation,” International Journal of Speech Technology, vol. 15, pp. 419–431, 2012.
  31. D. Sündermann, A. Bonafonte, H. Ney, and H. Höge, “A study on residual prediction techniques for voice conversion,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), pp. I13–I16, March 2005.
  32. K. S. Rao, R. H. Laskar, and S. G. Koolagudi, “Voice transformation by mapping the features at syllable level,” in Pattern Recognition and Machine Intelligence, vol. 4815 of Lecture Notes in Computer Science, pp. 479–486, Springer, 2007.
  33. http://sp-tk.sourceforge.net/.
  34. S. Imai, “Cepstral analysis and synthesis on the mel-frequency scale,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '83), pp. 93–96, Boston, Mass, USA, 1983.
  35. K. Tokuda, T. Kobayashi, T. Masuko, and S. Imai, “Mel-generalized cepstral analysis—a unified approach to speech spectral estimation,” in Proceedings of the International Conference on Spoken Language Processing (ICSLP '94), pp. 1043–1046, 1994.
  36. J. Kominek and A. W. Black, “CMU ARCTIC speech databases,” in Proceedings of the 5th ISCA Speech Synthesis Workshop, pp. 223–224, Pittsburgh, Pa, USA, June 2004.