Journal of Electrical and Computer Engineering
Volume 2016, Article ID 4062786, 9 pages
http://dx.doi.org/10.1155/2016/4062786
Research Article

A Russian Keyword Spotting System Based on Large Vocabulary Continuous Speech Recognition and Linguistic Knowledge

1Speech Drive LLC, Saint Petersburg, Russia
2V.A. Trapeznikov Institute of Control Sciences of RAS, Moscow, Russia
3RUDN University, Moscow, Russia

Received 11 July 2016; Revised 27 October 2016; Accepted 14 November 2016

Academic Editor: Alexey Karpov

Copyright © 2016 Valentin Smirnov et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

References

  1. T. J. Hazen, F. Richardson, and A. Margolis, “Topic identification from audio recordings using word and phone recognition lattices,” in Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU '07), pp. 659–664, Kyoto, Japan, December 2007.
  2. D. A. James and S. J. Young, “A fast lattice-based approach to vocabulary independent wordspotting,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 1, pp. 377–380, Adelaide, Australia, 1994.
  3. I. Szöke, P. Schwarz, P. Matějka, L. Burget, M. Karafiát, and J. Černocký, “Phoneme based acoustics keyword spotting in informal continuous speech,” in Text, Speech and Dialogue: 8th International Conference, TSD 2005, Karlovy Vary, Czech Republic, September 12–15, 2005. Proceedings, vol. 3658 of Lecture Notes in Computer Science, pp. 302–309, Springer, Berlin, Germany, 2005.
  4. M. Yamada, M. Naito, T. Kato, and H. Kawai, “Improvement of rejection performance of keyword spotting using anti-keywords derived from large vocabulary considering acoustical similarity to keywords,” in Proceedings of the 9th European Conference on Speech Communication and Technology, Lisbon, Portugal, September 2005.
  5. M. Matsushita, H. Nishizaki, S. Nakagawa et al., “Evaluating multiple LVCSR model combination in NTCIR-3 speech-driven web retrieval task,” in Proceedings of the 8th European Conference on Speech Communication and Technology (EUROSPEECH '03), pp. 1205–1208, Geneva, Switzerland, September 2003.
  6. I. Szöke et al., “Comparison of keyword spotting approaches for informal continuous speech,” in Proceedings of the 2nd Joint Workshop on Multimodal Interaction and Related Machine Learning Algorithms (INTERSPEECH '05), pp. 633–636, Edinburgh, UK, 2005.
  7. D. Karakos, R. Schwartz, S. Tsakalidis et al., “Score normalization and system combination for improved keyword spotting,” in Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU '13), pp. 210–215, Olomouc, Czech Republic, December 2013.
  8. J. Mamou, J. Cui, X. Cui et al., “System combination and score normalization for spoken term detection,” in Proceedings of the 38th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '13), pp. 8272–8276, Vancouver, Canada, May 2013.
  9. G. Hinton, L. Deng, D. Yu et al., “Deep neural networks for acoustic modeling in speech recognition,” IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82–97, 2012.
  10. M. Harper, “IARPA Babel Program,” https://www.iarpa.gov/index.php/research-programs/babel?highlight=WyJiYWJlbCJd.
  11. J. Cui, X. Cui, B. Ramabhadran et al., “Developing speech recognition systems for corpus indexing under the IARPA Babel program,” in Proceedings of the 38th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '13), pp. 6753–6757, IEEE, Vancouver, Canada, May 2013.
  12. M. J. F. Gales, K. M. Knill, A. Ragni, and S. P. Rath, “Speech recognition and keyword spotting for low resource languages: babel project research at CUED,” in Proceedings of the 4th International Workshop on Spoken Language Technologies for Under-Resourced Languages, pp. 16–23, St. Petersburg, Russia, 2014.
  13. F. Grézl, M. Karafiát, and M. Janda, “Study of probabilistic and Bottle-Neck features in multilingual environment,” in Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU '11), pp. 359–364, December 2011.
  14. P. Baljekar, J. F. Lehman, and R. Singh, “Online word-spotting in continuous speech with recurrent neural networks,” in Proceedings of the IEEE Workshop on Spoken Language Technology (SLT '14), pp. 536–541, South Lake Tahoe, Nev, USA, December 2014.
  15. K. Kilgour and A. Waibel, “A neural network keyword search system for telephone speech,” in Speech and Computer: 16th International Conference, SPECOM 2014, Novi Sad, Serbia, October 5–9, 2014. Proceedings, A. Ronzhin, R. Potapova, and V. Delic, Eds., pp. 58–65, Springer, Berlin, Germany, 2014.
  16. I. Bulyko, M. Ostendorf, M. Siu, T. Ng, A. Stolcke, and Ö. Çetin, “Web resources for language modeling in conversational speech recognition,” ACM Transactions on Speech and Language Processing, vol. 5, no. 1, article 1, 2007.
  17. A. Gandhe, L. Qin, F. Metze, A. Rudnicky, I. Lane, and M. Eck, “Using web text to improve keyword spotting in speech,” in Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU '13), pp. 428–433, Olomouc, Czech Republic, December 2013.
  18. G. Mendels, E. Cooper, V. Soto et al., “Improving speech recognition and keyword search for low resource languages using web data,” in Proceedings of the 16th Annual Conference of the International Speech Communication Association (Interspeech '15), pp. 829–833, Dresden, Germany, September 2015.
  19. SpeechKit API, http://api.yandex.ru/speechkit/.
  20. K. Levin, I. Ponomareva, A. Bulusheva et al., “Automated closed captioning for Russian live broadcasting,” in Proceedings of the 15th Annual Conference of the International Speech Communication Association (INTERSPEECH '14), pp. 1438–1442, Singapore, September 2014.
  21. A. Karpov, I. Kipyatkova, and A. Ronzhin, “Speech recognition for East Slavic languages: the case of Russian,” in Proceedings of the 3rd International Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU '12), pp. 84–89, Cape Town, South Africa, 2012.
  22. A. Karpov, K. Markov, I. Kipyatkova, D. Vazhenina, and A. Ronzhin, “Large vocabulary Russian speech recognition using syntactico-statistical language modeling,” Speech Communication, vol. 56, no. 1, pp. 213–228, 2014.
  23. M. Zulkarneev, R. Grigoryan, and N. Shamraev, “Acoustic modeling with deep belief networks for Russian Speech,” in Speech and Computer: 15th International Conference, SPECOM 2013, Pilsen, Czech Republic, September 1–5, 2013. Proceedings, vol. 8113 of Lecture Notes in Computer Science, pp. 17–23, Springer, Berlin, Germany, 2013.
  24. K.-F. Lee, H.-W. Hon, and R. Reddy, “An overview of the SPHINX speech recognition system,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 38, no. 1, pp. 35–45, 1990.
  25. V. A. Smirnov, M. N. Gusev, and M. P. Farkhadov, “The function of linguistic processor in the system for automated analysis of unstructured speech data,” Automation and Modern Technologies, no. 8, pp. 22–28, 2013.
  26. M. Bisani and H. Ney, “Joint-sequence models for grapheme-to-phoneme conversion,” Speech Communication, vol. 50, no. 5, pp. 434–451, 2008.
  27. T. Hain, “Implicit modelling of pronunciation variation in automatic speech recognition,” Speech Communication, vol. 46, no. 2, pp. 171–188, 2005.
  28. D. Jouvet, D. Fohr, and I. Illina, “Evaluating grapheme-to-phoneme converters in automatic speech recognition context,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '12), pp. 4821–4824, IEEE, Kyoto, Japan, March 2012.
  29. L. Lu, A. Ghoshal, and S. Renals, “Acoustic data-driven pronunciation lexicon for large vocabulary speech recognition,” in Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU '13), pp. 374–379, Olomouc, Czech Republic, December 2013.
  30. S. G. Paulo and L. C. Oliveira, “Generation of word alternative pronunciations using weighted finite state transducers,” in Proceedings of Interspeech 2005, pp. 1157–1160, Lisbon, Portugal, September 2005.
  31. I. Kipyatkova, A. Karpov, V. Verkhodanova, and M. Železný, “Modeling of pronunciation, language and nonverbal units at conversational Russian speech recognition,” International Journal of Computer Science and Applications, vol. 10, no. 1, pp. 11–30, 2013.
  32. L. V. Bondarko, A. Iivonen, L. C. W. Pols, and V. de Silva, “Common and language dependent phonetic differences between read and spontaneous speech in Russian, Finnish and Dutch,” in Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS '03), pp. 2977–2980, Barcelona, Spain, 2003.
  33. L. V. Bondarko, N. B. Volskaya, S. O. Tananaiko, and L. A. Vasilieva, “Phonetic properties of Russian spontaneous speech,” in Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS '03), p. 2973, Barcelona, Spain, 2003.
  34. A. Stolcke, “SRILM - an extensible language modeling toolkit,” in Proceedings of the International Conference on Spoken Language Processing, Denver, Colo, USA, September 2002.
  35. T. Mikolov, M. Karafiát, L. Burget, J. Černocký, and S. Khudanpur, “Recurrent neural network based language model,” in Proceedings of the 11th Annual Conference of the International Speech Communication Association (INTERSPEECH '10), pp. 1045–1048, Makuhari, Japan, September 2010.
  36. R. V. Bilik, V. A. Zhozhikashvili, N. V. Petukhova, and M. P. Farkhadov, “Analysis of the oral interface in the interactive servicing systems. II,” Automation and Remote Control, vol. 70, no. 3, pp. 434–448, 2009.
  37. V. A. Zhozhikashvili, N. V. Petukhova, and M. P. Farkhadov, “Computerized queuing systems and speech technologies,” Control Sciences, no. 2, pp. 3–7, 2006.
  38. V. A. Zhozhikashvili, R. V. Bilik, V. A. Vertlib, A. V. Zhozhikashvili, N. V. Petukhova, and M. P. Farkhadov, “Open queuing system with speech recognition,” Control Sciences, no. 4, pp. 55–62, 2003.
  39. N. V. Petukhova, S. V. Vas'kovskii, M. P. Farkhadov, and V. A. Smirnov, “Architecture and features of speech recognition systems,” Neurocomputers: Development, Applications, no. 12, pp. 22–30, 2013.