Research Article

Recognition of the Script in Serbian Documents Using Frequency Occurrence and Co-Occurrence Analysis

Table 10

Five GLCM descriptors of script-type co-occurrence for the documents in the database.

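Printed documents

| Descriptor | Doc 1 Latin | Doc 1 Cyrillic | Doc 2 Latin | Doc 2 Cyrillic | Doc 3 Latin | Doc 3 Cyrillic | Doc 4 Latin | Doc 4 Cyrillic | Doc 5 Latin | Doc 5 Cyrillic |
|---|---|---|---|---|---|---|---|---|---|---|
| Uniformity | 0.2885 | 0.4725 | 0.2473 | 0.4167 | 0.2557 | 0.4120 | 0.2759 | 0.4545 | 0.2707 | 0.4498 |
| Entropy | −1.5191 | −1.1774 | −1.6379 | −1.3079 | −1.6047 | −1.2999 | −1.5675 | −1.1650 | −1.5847 | −1.1799 |
| Max. probability | 0.4655 | 0.6636 | 0.3952 | 0.6139 | 0.4120 | 0.6098 | 0.4439 | 0.6457 | 0.4349 | 0.6405 |
| Dissimilarity | 0.6847 | 0.5933 | 0.7469 | 0.6592 | 0.7502 | 0.6427 | 0.7064 | 0.6041 | 0.7117 | 0.6217 |
| Contrast | 1.0324 | 1.1790 | 1.1106 | 1.2859 | 1.1258 | 1.2261 | 1.0577 | 1.1449 | 1.0630 | 1.1949 |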

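Web documents

| Descriptor | Doc 6 Latin | Doc 6 Cyrillic | Doc 7 Latin | Doc 7 Cyrillic | Doc 8 Latin | Doc 8 Cyrillic | Doc 9 Latin | Doc 9 Cyrillic | Doc 10 Latin | Doc 10 Cyrillic |
|---|---|---|---|---|---|---|---|---|---|---|
| Uniformity | 0.2447 | 0.3714 | 0.2754 | 0.3817 | 0.2533 | 0.5005 | 0.2252 | 0.3147 | 0.2522 | 0.3325 |
| Entropy | −1.6524 | −1.3738 | −1.5725 | −1.3412 | −1.5990 | −1.0779 | −1.6778 | −1.5650 | −1.6144 | −1.5059 |
| Max. probability | 0.3964 | 0.5650 | 0.4409 | 0.5753 | 0.3972 | 0.6844 | 0.3195 | 0.5154 | 0.4016 | 0.5318 |
| Dissimilarity | 0.7723 | 0.7320 | 0.6912 | 0.7209 | 0.7294 | 0.5686 | 0.8317 | 0.7667 | 0.7256 | 0.7416 |
| Contrast | 1.1862 | 1.3869 | 1.0287 | 1.3681 | 1.0459 | 1.1158 | 1.2122 | 1.4220 | 1.0641 | 1.3716 |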

These results are further processed to calculate the ratio of script-type co-occurrence between the Latin and Cyrillic versions of each document. The ratios are shown in Table 11.
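To illustrate how descriptors like those in Table 10 and a Latin-to-Cyrillic ratio might be computed, here is a minimal Python/NumPy sketch. It rests on several assumptions not stated in this section: each character has already been mapped to one of four script-type codes (0 to 3); co-occurrence is counted between horizontally adjacent characters in one direction only (a non-symmetric GLCM); entropy uses the natural logarithm with the sign convention sum(p * log p), which would produce negative values like those in Table 10; and the ratio is taken as Latin divided by Cyrillic. The function names `glcm`, `descriptors` and `ratio` are hypothetical and are not the authors' code.

```python
import numpy as np


def glcm(codes, levels=4):
    """Normalized co-occurrence matrix of adjacent script-type codes.

    Assumption: counts only left-to-right neighbour pairs (non-symmetric).
    """
    m = np.zeros((levels, levels))
    for a, b in zip(codes[:-1], codes[1:]):
        m[a, b] += 1
    return m / m.sum()


def descriptors(p):
    """Five GLCM descriptors of a normalized co-occurrence matrix p."""
    i, j = np.indices(p.shape)  # row and column index grids
    nz = p[p > 0]               # skip zero cells so log(0) never occurs
    return {
        "uniformity": float(np.sum(p ** 2)),
        # Assumed sign convention: sum(p * ln p), i.e. no leading minus sign.
        "entropy": float(np.sum(nz * np.log(nz))),
        "max_probability": float(p.max()),
        "dissimilarity": float(np.sum(np.abs(i - j) * p)),
        "contrast": float(np.sum((i - j) ** 2 * p)),
    }


def ratio(latin, cyrillic):
    """Per-descriptor ratio, assumed to be Latin / Cyrillic."""
    return {k: latin[k] / cyrillic[k] for k in latin}


# Usage with the Doc 1 values from Table 10.
doc1_latin = {"uniformity": 0.2885, "entropy": -1.5191,
              "max_probability": 0.4655, "dissimilarity": 0.6847,
              "contrast": 1.0324}
doc1_cyrillic = {"uniformity": 0.4725, "entropy": -1.1774,
                 "max_probability": 0.6636, "dissimilarity": 0.5933,
                 "contrast": 1.1790}
print(ratio(doc1_latin, doc1_cyrillic))
```

Counting in one direction keeps the sketch close to reading order; a symmetric GLCM (adding the transpose before normalizing) is a common alternative and would change the values.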