The Scientific World Journal
Volume 2014 (2014), Article ID 628516, 9 pages
http://dx.doi.org/10.1155/2014/628516
Research Article

Cost-Sensitive Learning for Emotion Robust Speaker Recognition

1School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
2Department of Computer Science and Technology, Zhejiang University, No. 38, Yuquan Road, Zhejiang 310027, China
3School of Management, Fudan University, No. 220, Handan Road, Shanghai 200433, China

Received 19 April 2014; Accepted 22 April 2014; Published 4 June 2014

Academic Editor: Yu-Bo Yuan

Copyright © 2014 Dongdong Li et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
