Research Article

Using Morphological Data in Language Modeling for Serbian Large Vocabulary Speech Recognition

Table 3

Textual database overview.

Corpus part          # Sentences   # Words   # Characters

Journalistic         737k          17M       94M
Literary             303k          3.9M      18M
Scientific           23k           503k      3M
Administrative       15k           378k      2M
Popular-scientific   18k           357k      2M
Conversational       38k           128k      530k
Transcriptions       251k          3.2M      15M

Total                1.4M          26M       135M
“Dev” set            20k           470k      2.6M