Research Article

Recognition of the Script in Serbian Documents Using Frequency Occurrence and Co-Occurrence Analysis

Table 6

Co-occurrence descriptors for Latin and Cyrillic cipher text.

Descriptor (Serbian language)   Latin     Cyrillic   Characterization
Uniformity (energy)             0.2459    0.3811     Latin < Cyrillic
Entropy                        −1.6298   −1.4363     Latin > Cyrillic
Maximum probability             0.3722    0.5863     Latin < Cyrillic
Dissimilarity                   0.7356    0.6669     Latin > Cyrillic
Contrast                        1.0423    1.2660     Latin < Cyrillic

The results in Table 6 show that the co-occurrence descriptors characterize the difference between Latin and Cyrillic script: each of the five descriptors takes a distinctly different value for the two scripts. Hence, the frequency occurrence analysis can be supplemented with these additional attributes to establish a wider margin between the scripts, which serves as a stronger criterion for identifying the script of a given document.
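The five descriptors in Table 6 can be computed from a normalized co-occurrence matrix. The sketch below assumes that the text has already been mapped to a sequence of integer codes 0..L−1 (the coding scheme is not given in this section), that co-occurrence is counted between directly adjacent codes (offset 1, non-symmetric), and it uses the common textbook definitions of the descriptors. The function names are hypothetical. The entropy here is the usual non-negative −Σ p log₂ p; the negative entropy values in Table 6 suggest that the source uses a different sign convention (and possibly a different logarithm base), so the numbers are not directly comparable.

```python
import math
from typing import Dict, List, Sequence


def cooccurrence_matrix(codes: Sequence[int], levels: int) -> List[List[float]]:
    """Normalized co-occurrence matrix of adjacent codes (offset 1).

    Assumption: the text is already coded as integers 0..levels-1;
    the actual coding scheme is not specified here.
    """
    counts = [[0] * levels for _ in range(levels)]
    # Count each ordered pair (current code, next code).
    for a, b in zip(codes, codes[1:]):
        counts[a][b] += 1
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("need at least two codes to form a pair")
    # Divide by the number of pairs so that all entries sum to 1.
    return [[c / total for c in row] for row in counts]


def cooccurrence_descriptors(p: List[List[float]]) -> Dict[str, float]:
    """Textbook co-occurrence descriptors of a normalized matrix p."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        # Uniformity (energy): sum of squared probabilities.
        "uniformity": sum(v * v for _, _, v in cells),
        # Entropy: -sum p*log2(p), skipping empty cells (0*log 0 := 0).
        "entropy": -sum(v * math.log2(v) for _, _, v in cells if v > 0),
        # Maximum probability: the most frequent code pair.
        "max_probability": max(v for _, _, v in cells),
        # Dissimilarity: pairs weighted by |i - j|.
        "dissimilarity": sum(abs(i - j) * v for i, j, v in cells),
        # Contrast: pairs weighted by (i - j)^2.
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
    }
```

For example, the coded sequence `[0, 0, 1, 1]` yields the pairs (0,0), (0,1) and (1,1), each with probability 1/3, so `cooccurrence_descriptors(cooccurrence_matrix([0, 0, 1, 1], 2))` gives a uniformity of 1/3 and a contrast of 1/3. Comparing such descriptor vectors for Latin-coded and Cyrillic-coded text is what Table 6 summarizes.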