Journal of Electrical and Computer Engineering
Volume 2016, Article ID 2013658, 8 pages
http://dx.doi.org/10.1155/2016/2013658
Research Article

Experiments on Detection of Voiced Hesitations in Russian Spontaneous Speech

Vasilisa Verkhodanova and Vladimir Shapranov

St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences, St. Petersburg, Russia

Received 31 July 2016; Accepted 31 October 2016

Academic Editor: Alexander Petrovsky

Copyright © 2016 Vasilisa Verkhodanova and Vladimir Shapranov. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

References

  1. F. Goldman-Eisler, Psycholinguistics: Experiments in Spontaneous Speech, Academic Press, 1968.
  2. W. L. Chafe, Ed., The Pear Stories: Cognitive, Cultural, and Linguistic Aspects of Narrative Production, Ablex Publishing Corp, Norwood, NJ, USA, 1980.
  3. E. Shriberg, Preliminaries to a theory of speech disfluencies [Ph.D. thesis], University of California at Berkeley, 1994.
  4. J. E. Fox Tree, “The effects of false starts and repetitions on the processing of subsequent words in spontaneous speech,” Journal of Memory and Language, vol. 34, no. 6, pp. 709–738, 1995.
  5. A. Kibrik and V. Podlesskaya, Eds., Night Dream Stories: Corpus Study of Russian Discourse, Litres, 2014.
  6. D. C. O'Connell and S. Kowal, “The history of research on the filled pause as evidence of the written language bias in linguistics (Linell, 1982),” Journal of Psycholinguistic Research, vol. 33, no. 6, pp. 459–474, 2004.
  7. A. Stolcke, E. Shriberg, R. A. Bates et al., “Automatic detection of sentence boundaries and disfluencies based on recognized words,” in Proceedings of the International Conference on Spoken Language Processing (ICSLP '98).
  8. J. J. Godfrey, E. C. Holliman, and J. McDaniel, “SWITCHBOARD: telephone speech corpus for research and development,” in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP '92), vol. 1, pp. 517–520, San Francisco, Calif, USA, March 1992.
  9. H. Medeiros, H. Moniz, F. Batista, I. Trancoso, and L. Nunes, “Disfluency detection based on prosodic features for university lectures,” in Proceedings of the 14th Annual Conference of the International Speech Communication Association (INTERSPEECH '13), pp. 2629–2633, Lyon, France, August 2013.
  10. E. Shriberg, “Spontaneous speech: how people really talk and why engineers should care,” in Proceedings of the 9th European Conference on Speech Communication and Technology (INTERSPEECH '05), pp. 1781–1784, ISCA, Lisbon, Portugal, September 2005.
  11. H. Clark, Using Language, Cambridge University Press, Cambridge, UK, 1996.
  12. R. Ogden, “Turn-holding, turn-yielding and laryngeal activity in Finnish talk-in-interaction,” Journal of the International Phonetic Association, vol. 31, no. 1, pp. 139–152, 2001.
  13. T. Arbisi-Kelm and S. A. Jun, “A comparison of disfluency patterns in normal and stuttered speech,” in Disfluency in Spontaneous Speech, 2005.
  14. F. Ferreira, E. F. Lau, and K. G. D. Bailey, “Disfluencies, language comprehension, and tree adjoining grammars,” Cognitive Science, vol. 28, no. 5, pp. 721–749, 2004.
  15. Y. Liu, E. Shriberg, A. Stolcke, D. Hillard, M. Ostendorf, and M. Harper, “Enriching speech recognition with automatic detection of sentence boundaries and disfluencies,” IEEE Transactions on Audio, Speech and Language Processing, vol. 14, no. 5, pp. 1526–1539, 2006.
  16. K. Audhkhasi, K. Kandhway, O. D. Deshmukh, and A. Verma, “Formant-based technique for automatic filled-pause detection in spontaneous spoken English,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '09), pp. 4857–4860, April 2009.
  17. M. Goto, K. Itou, and S. Hayamizu, “A real-time filled pause detection system for spontaneous speech recognition,” in Proceedings of the 6th European Conference on Speech Communication and Technology (Eurospeech '99), pp. 227–230, ISCA, Budapest, Hungary, 1999.
  18. V. Khurshudian, “Hesitation in typologically different languages: an experimental study,” in Proceedings of the International Conference on Computational Linguistics Dialogue, pp. 497–501, 2005.
  19. S. Stepanova, “Some features of filled hesitation pauses in spontaneous Russian,” in Proceedings of the 16th International Congress of Phonetic Sciences, vol. 16, pp. 1325–1328, Saarbrücken, Germany, 2007.
  20. A. Giannini, “Hesitation phenomena in spontaneous Italian,” in Proceedings of the 15th International Congress of Phonetic Sciences, pp. 2653–2656, Barcelona, Spain, 2003.
  21. E. Shriberg, “To ‘Errrr’ is human: ecology and acoustics of speech disfluencies,” Journal of the International Phonetic Association, vol. 31, no. 1, pp. 153–169, 2001.
  22. J. Peters, “LM studies on filled pauses in spontaneous medical dictation,” in Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, vol. 2, pp. 82–84, Association for Computational Linguistics, Edmonton, Canada, May 2003.
  23. F. Stouten and J. P. Martens, “A feature-based filled pause detection system for Dutch,” in Proceedings of the Workshop on Automatic Speech Recognition and Understanding (ASRU '03), pp. 309–314, IEEE, 2003.
  24. D. O'Shaughnessy, “Recognition of hesitations in spontaneous speech,” in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP '92), vol. 1, pp. 521–524, IEEE, 1992.
  25. E. Shriberg, R. A. Bates, and A. Stolcke, “A prosody-only decision-tree model for disfluency detection,” in Proceedings of the 5th European Conference on Speech Communication and Technology (Eurospeech '97), pp. 2383–2386, Rhodes, Greece, 1997.
  26. H. Medeiros, F. Batista, H. Moniz, I. Trancoso, and H. Meinedo, “Experiments on automatic detection of filled pauses using prosodic features,” Actas de Inforum, pp. 335–345, 2013.
  27. INTERSPEECH: Computational Paralinguistics Challenge, 2013, http://emotion-research.net/sigs/speech-sig/is13-compare.
  28. R. Gupta, K. Audhkhasi, S. Lee, and S. Narayanan, “Paralinguistic event detection from speech using probabilistic time-series smoothing and masking,” in Proceedings of the 14th Annual Conference of the International Speech Communication Association (INTERSPEECH '13), pp. 173–177, Lyon, France, August 2013.
  29. D. Prylipko, O. Egorow, I. Siegert, and A. Wendemuth, “Application of image processing methods to filled pauses detection from spontaneous speech,” in Proceedings of the 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages (INTERSPEECH '14), pp. 1816–1820, Singapore, September 2014.
  30. F. Eyben, M. Wöllmer, and B. Schuller, “openSMILE: the Munich versatile and fast open-source audio feature extractor,” in Proceedings of the 18th ACM International Conference on Multimedia (MM '10), pp. 1459–1462, ACM, Firenze, Italy, October 2010.
  31. Y. Liu, Structural event detection for rich transcription of speech [Ph.D. thesis], Purdue University, 2004.
  32. LDC: English CTS treebank with structural metadata, http://catalog.ldc.upenn.edu/LDC2009T01.
  33. LDC: Czech broadcast conversation MDE transcripts, http://catalog.ldc.upenn.edu/LDC2009T20.
  34. LDC, “Czech broadcast conversation speech,” http://catalog.ldc.upenn.edu/LDC2009S02.
  35. J. Kolář, J. Švec, S. Strassel, C. Walker, D. Kozlíková, and J. Psutka, “Czech spontaneous speech corpus with structural metadata,” in Proceedings of the 9th European Conference on Speech Communication and Technology (INTERSPEECH '05), Lisbon, Portugal, September 2005.
  36. V. Verkhodanova and V. Shapranov, “Automatic detection of filled pauses and lengthenings in the spontaneous Russian speech,” in Proceedings of the 7th International Conference on Speech Prosody (SP '14), pp. 1110–1114, Dublin, Ireland, May 2014.
  37. E. Zemskaya, Russian Spoken Speech: Linguistic Analysis and the Problems of Learning, Moscow, 1979.
  38. A. Anderson, M. Bader, E. Bard et al., “The HCRC map task corpus,” Language and Speech, vol. 34, no. 4, pp. 351–366, 1991.
  39. K. J. Kohler, “Labelled data bank of spoken standard German: the Kiel Corpus of read/spontaneous speech,” in Proceedings of the 4th International Conference on Spoken Language Processing (ICSLP '96), vol. 3, pp. 1938–1941, IEEE, October 1996.
  40. S. A. Zahorian, J. Wu, M. Karnjanadecha et al., “Open-source multi-language audio database for spoken language processing applications,” in Proceedings of the 12th Annual Conference of the International Speech Communication Association (INTERSPEECH '11), pp. 1493–1496, Florence, Italy, August 2011.
  41. Department of Phonetics of Saint Petersburg University, http://phonetics.spbu.ru/.
  42. G. Garg and N. Ward, “Detecting filled pauses in tutorial dialogs,” 2006.
  43. J. Snyman, Practical Mathematical Optimization: An Introduction to Basic Optimization Theory and Classical and New Gradient-Based Algorithms, vol. 97, Springer Science & Business Media, 2005.
  44. V. Verkhodanova and V. Shapranov, “Multi-factor method for detection of filled pauses and lengthenings in Russian spontaneous speech,” in Speech and Computer: 17th International Conference, SPECOM 2015, Athens, Greece, September 20–24, 2015, Proceedings, vol. 9319 of Lecture Notes in Computer Science, pp. 285–292, Springer, Berlin, Germany, 2015.
  45. V. Verkhodanova, V. Shapranov, and A. Karpov, “Filled pauses and lengthenings detection using machine learning techniques,” in Proceedings of the 7th Tutorial and Research Workshop on Experimental Linguistics ExLing, pp. 175–178, Saint Petersburg, Russia, July 2016.
  46. A. Akusok, K.-M. Björk, Y. Miche, and A. Lendasse, “High-performance extreme learning machines: a complete toolbox for big data applications,” IEEE Access, vol. 3, pp. 1011–1025, 2015.
  47. P. Boersma and D. Weenink, Praat: doing phonetics by computer [computer program], version 6.0.11, http://www.praat.org/.
  48. V. Verkhodanova and V. Shapranov, “Detecting filled pauses and lengthenings in Russian spontaneous speech using SVM,” in Speech and Computer: 18th International Conference, SPECOM 2016, Budapest, Hungary, August 23–27, 2016, Proceedings, vol. 9811 of Lecture Notes in Computer Science, pp. 224–231, Springer, Berlin, Germany, 2016.
  49. Scikit-Learn: Machine learning in Python, http://scikit-learn.org.
  50. C. C. Chang and C. J. Lin, “LIBSVM: a library for support vector machines,” ACM Transactions on Intelligent Systems and Technology (TIST), vol. 2, no. 3, article 27, 2011.
  51. H. J. Heijmans, “Mathematical morphology: a modern approach in image processing based on algebra and geometry,” SIAM Review, vol. 37, no. 1, pp. 1–36, 1995.