Recognition of the Script in Serbian Documents Using Frequency Occurrence and Co-Occurrence Analysis
Table 10
The five GLCM descriptors of the script type co-occurrence for the documents from the database.
Printed documents

| Descriptor | Doc 1 Latin | Doc 1 Cyrillic | Doc 2 Latin | Doc 2 Cyrillic | Doc 3 Latin | Doc 3 Cyrillic | Doc 4 Latin | Doc 4 Cyrillic | Doc 5 Latin | Doc 5 Cyrillic |
|---|---|---|---|---|---|---|---|---|---|---|
| Uniformity | 0.2885 | 0.4725 | 0.2473 | 0.4167 | 0.2557 | 0.4120 | 0.2759 | 0.4545 | 0.2707 | 0.4498 |
| Entropy | −1.5191 | −1.1774 | −1.6379 | −1.3079 | −1.6047 | −1.2999 | −1.5675 | −1.1650 | −1.5847 | −1.1799 |
| Max. probability | 0.4655 | 0.6636 | 0.3952 | 0.6139 | 0.4120 | 0.6098 | 0.4439 | 0.6457 | 0.4349 | 0.6405 |
| Dissimilarity | 0.6847 | 0.5933 | 0.7469 | 0.6592 | 0.7502 | 0.6427 | 0.7064 | 0.6041 | 0.7117 | 0.6217 |
| Contrast | 1.0324 | 1.1790 | 1.1106 | 1.2859 | 1.1258 | 1.2261 | 1.0577 | 1.1449 | 1.0630 | 1.1949 |

Web documents

| Descriptor | Doc 6 Latin | Doc 6 Cyrillic | Doc 7 Latin | Doc 7 Cyrillic | Doc 8 Latin | Doc 8 Cyrillic | Doc 9 Latin | Doc 9 Cyrillic | Doc 10 Latin | Doc 10 Cyrillic |
|---|---|---|---|---|---|---|---|---|---|---|
| Uniformity | 0.2447 | 0.3714 | 0.2754 | 0.3817 | 0.2533 | 0.5005 | 0.2252 | 0.3147 | 0.2522 | 0.3325 |
| Entropy | −1.6524 | −1.3738 | −1.5725 | −1.3412 | −1.5990 | −1.0779 | −1.6778 | −1.5650 | −1.6144 | −1.5059 |
| Max. probability | 0.3964 | 0.5650 | 0.4409 | 0.5753 | 0.3972 | 0.6844 | 0.3195 | 0.5154 | 0.4016 | 0.5318 |
| Dissimilarity | 0.7723 | 0.7320 | 0.6912 | 0.7209 | 0.7294 | 0.5686 | 0.8317 | 0.7667 | 0.7256 | 0.7416 |
| Contrast | 1.1862 | 1.3869 | 1.0287 | 1.3681 | 1.0459 | 1.1158 | 1.2122 | 1.4220 | 1.0641 | 1.3716 |
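For orientation, the five descriptors listed in Table 10 can be computed from a co-occurrence matrix roughly as in the following sketch. It uses the standard textbook definitions, which may differ in detail from those used in this paper. In particular, the entropy is taken as the sum of p·log(p) without a leading minus sign, an assumption suggested by the negative entropy values in the table, and the natural logarithm is likewise an assumption. The function name `glcm_descriptors` and the example counts are hypothetical and are not data from the database.

```python
import math


def glcm_descriptors(counts):
    """Return five descriptors of a co-occurrence matrix given as raw counts."""
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("co-occurrence matrix is empty")

    # Normalize the counts to joint probabilities p(i, j).
    cells = [
        (i, j, c / total)
        for i, row in enumerate(counts)
        for j, c in enumerate(row)
    ]

    return {
        "uniformity": sum(p ** 2 for _, _, p in cells),
        # Assumed sign convention: sum of p*log(p), which is never positive.
        # Zero cells are skipped because p*log(p) tends to 0 as p tends to 0.
        "entropy": sum(p * math.log(p) for _, _, p in cells if p > 0),
        "max_probability": max(p for _, _, p in cells),
        "dissimilarity": sum(abs(i - j) * p for i, j, p in cells),
        "contrast": sum((i - j) ** 2 * p for i, j, p in cells),
    }


# Hypothetical 4x4 matrix of script type co-occurrence counts (illustrative only).
example_counts = [
    [4, 1, 0, 0],
    [1, 2, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
descriptors = glcm_descriptors(example_counts)
for name, value in descriptors.items():
    print(f"{name}: {value:.4f}")
```

Under these definitions, a matrix whose mass is concentrated in a few cells gives higher uniformity and maximum probability and an entropy closer to zero, which is the pattern the Cyrillic columns of Table 10 show relative to the Latin ones.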
The results above are further processed to calculate the ratio of the script type co-occurrence descriptors between the Latin and Cyrillic documents. These ratios are shown in Table 11.
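As an illustration of this step, the following sketch computes per-descriptor ratios for Doc 1 from the values in Table 10. The direction of the ratio (Cyrillic divided by Latin) is an assumption, since the text does not state it here, and the helper name `descriptor_ratios` is hypothetical.

```python
# Doc 1 values copied from Table 10.
latin = {
    "uniformity": 0.2885,
    "entropy": -1.5191,
    "max_probability": 0.4655,
    "dissimilarity": 0.6847,
    "contrast": 1.0324,
}
cyrillic = {
    "uniformity": 0.4725,
    "entropy": -1.1774,
    "max_probability": 0.6636,
    "dissimilarity": 0.5933,
    "contrast": 1.1790,
}


def descriptor_ratios(numerator, denominator):
    """Divide each descriptor in `numerator` by the same descriptor in `denominator`."""
    return {name: numerator[name] / denominator[name] for name in numerator}


# Assumed direction: Cyrillic over Latin.
ratios = descriptor_ratios(cyrillic, latin)
for name, value in ratios.items():
    print(f"{name}: {value:.4f}")
```

Swapping the two arguments gives the inverse ratios, so the sketch works for either convention.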