Advances in Human-Computer Interaction
Volume 2010, Article ID 782802, 15 pages
http://dx.doi.org/10.1155/2010/782802
Research Article

Segmenting into Adequate Units for Automatic Recognition of Emotion-Related Episodes: A Speech-Based Approach

Anton Batliner,1 Dino Seppi,2 Stefan Steidl,1 and Björn Schuller3

1Pattern Recognition Laboratory, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), D-91058 Erlangen, Germany
2ESAT, Katholieke Universiteit Leuven, B-3001 Leuven, Belgium
3Institute for Human-Machine Communication, Technische Universität München (TUM), D-80333 Munich, Germany

Received 1 April 2009; Accepted 12 December 2009

Academic Editor: Elisabeth André

Copyright © 2010 Anton Batliner et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
