Research Article
Using Morphological Data in Language Modeling for Serbian Large Vocabulary Speech Recognition
Table 3
Textual database overview.
| Corpus part | #Sentences | #Words | #Characters |
| --- | --- | --- | --- |
| Journalistic | 737k | 17M | 94M |
| Literary | 303k | 3.9M | 18M |
| Scientific | 23k | 503k | 3M |
| Administrative | 15k | 378k | 2M |
| Popular-scientific | 18k | 357k | 2M |
| Conversational | 38k | 128k | 530k |
| Transcriptions | 251k | 3.2M | 15M |
| Total | 1.4M | 26M | 135M |
| “Dev” set | 20k | 470k | 2.6M |