Research Article

Recognition of the Script in Serbian Documents Using Frequency Occurrence and Co-Occurrence Analysis

Table 4

Percentage of script type occurrence in document.

Type of script (TOS)    Latin     Cyrillic    Latin/Cyrillic (times)
S                       71.88%    82.76%      0.87
A                       19.39%    1.74%       11.14
D                       8.46%     14.64%      0.57
F                       0.27%     0.86%       0.31

Compared with the Cyrillic document, the Latin document has a smaller share of short (S), descender (D), and full (F) letters. The decisive difference, however, lies in the ascender (A) letters, which are about 11 times more frequent in the Latin document (19.39% versus 1.74%). Hence, the share of ascender letters can serve as a measure of confidence for detecting the script of a document written in the Serbian language.
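The comparison in Table 4 can be sketched in a few lines of Python. The snippet below recomputes the Latin-to-Cyrillic ratio for each letter type from the reported percentages and picks the type whose ratio departs most from 1. The function names and the 10% ascender threshold in `guess_script` are illustrative assumptions, not part of the method described in the article.

```python
# Shares of each letter type (in percent) as reported in Table 4:
# letter type -> (Latin document, Cyrillic document).
TABLE_4 = {
    "S": (71.88, 82.76),  # short
    "A": (19.39, 1.74),   # ascender
    "D": (8.46, 14.64),   # descender
    "F": (0.27, 0.86),    # full
}


def latin_to_cyrillic_ratios(table):
    """Return the Latin/Cyrillic ratio for every letter type."""
    return {kind: latin / cyrillic for kind, (latin, cyrillic) in table.items()}


def most_discriminative_type(ratios):
    """Pick the letter type whose ratio is farthest from 1 in either direction."""
    return max(ratios, key=lambda kind: max(ratios[kind], 1.0 / ratios[kind]))


def guess_script(ascender_share, threshold=10.0):
    """Guess the script from the ascender share (in percent).

    The threshold is a hypothetical value placed between the two shares
    in Table 4 (1.74% and 19.39%); it is not taken from the article.
    """
    return "Latin" if ascender_share > threshold else "Cyrillic"


ratios = latin_to_cyrillic_ratios(TABLE_4)
best = most_discriminative_type(ratios)

for kind, ratio in ratios.items():
    print(f"{kind}: {ratio:.2f}")
print("Most discriminative letter type:", best)
```

Recomputing from the rounded percentages reproduces the last column of Table 4 up to rounding, and the ascender type stands out as the one with the largest margin, which is consistent with using it as the confidence measure.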