Journal of Electrical and Computer Engineering
Volume 2016 (2016), Article ID 4062786, 9 pages
Research Article

A Russian Keyword Spotting System Based on Large Vocabulary Continuous Speech Recognition and Linguistic Knowledge

¹Speech Drive LLC, Saint Petersburg, Russia
²V.A. Trapeznikov Institute of Control Sciences of RAS, Moscow, Russia
³RUDN University, Moscow, Russia

Received 11 July 2016; Revised 27 October 2016; Accepted 14 November 2016

The paper describes the key concepts of a word spotting system for Russian based on large vocabulary continuous speech recognition. It presents the key algorithms and system settings, including the pronunciation variation algorithm, together with the system architecture and user interface, and reports experimental results on real-life telecom data. The system is built on the CMU Sphinx open-source speech recognition platform and on linguistic models and algorithms developed by Speech Drive LLC. The combination of baseline statistical methods, real-world training data, and intensive use of linguistic knowledge yielded results of sufficient quality for industrial use.