Department of Language and Speech, Radboud University, 6500 HD Nijmegen, The Netherlands
Copyright © 2010 Joost van Doremalen et al. This is an open access article distributed under the
Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
Computer-Assisted Language Learning (CALL) applications for improving the oral skills of low-proficiency learners have to cope with non-native speech that is particularly challenging. Since unconstrained non-native ASR is still problematic, a possible solution is to elicit constrained responses from the learners. In this paper, we describe experiments aimed at selecting utterances from lists of responses. The first experiment, on utterance selection, indicates that the decoding process can be improved by optimizing the language model and the acoustic models, thus reducing the utterance error rate from 29–26% to 10–8%. Since giving feedback on incorrectly recognized utterances is confusing, we verify the correctness of the recognized utterance before providing feedback. The results of the second experiment, on utterance verification, indicate that combining duration-related features with a likelihood ratio (LR) yields an equal error rate (EER) of 10.3%, which is significantly better than the EER obtained with the individual measures in isolation.
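
For readers unfamiliar with the metric, the equal error rate can be illustrated with a minimal sketch. It assumes that each utterance receives a single confidence score (for example a likelihood ratio, or a combination of LR and duration features) and that an utterance is accepted when its score reaches a threshold. The function name `equal_error_rate` and the example scores are hypothetical and are not taken from the experiments reported here.

```python
def equal_error_rate(correct_scores, incorrect_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    correct_scores: scores of correctly recognized utterances (should be accepted)
    incorrect_scores: scores of incorrectly recognized utterances (should be rejected)
    Returns (eer, threshold) at the point where the two error rates are closest.
    """
    thresholds = sorted(set(correct_scores) | set(incorrect_scores))
    best = None
    for t in thresholds:
        # An utterance is accepted when its score is >= t.
        false_accept = sum(s >= t for s in incorrect_scores) / len(incorrect_scores)
        false_reject = sum(s < t for s in correct_scores) / len(correct_scores)
        gap = abs(false_accept - false_reject)
        if best is None or gap < best[0]:
            # Report the mean of the two rates at the closest crossing point.
            best = (gap, (false_accept + false_reject) / 2, t)
    return best[1], best[2]


# Illustrative scores only (assumed, not experimental data).
correct = [0.9, 0.8, 0.7, 0.4]
incorrect = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(correct, incorrect)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

With a finite set of scores the false-accept and false-reject rates rarely cross exactly, so this sketch reports their mean at the closest point; other conventions, such as interpolating between neighbouring thresholds, are equally common.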