Research Article
Mixed Script Identification Using Automated DNN Hyperparameter Optimization
Table 1
Corpus statistics (Eng = English, Hin = Hindi, Sar = Saraiki, Ben = Bengali, and RU = Roman Urdu).
| Dataset | Type | Source (FB = Facebook, W = WhatsApp, T = Twitter) | Token count | Total tokens | Sentences |
| --- | --- | --- | --- | --- | --- |
| Eng-Roman Urdu | Space oriented | FB + T + W | ENG (102311) + RU (97235) | 199546 | 3558 |
| Saraiki-Hindi | Cursive | FB + T + W | SAR (78412) + HIN (87563) | 165975 | 4256 |
| Bengali-Hindi | Cursive | FB + T + W | BEN (85672) + HIN (87563) | 173235 | 3801 |
| Eng-Bengali | Mix | FB + T + W | ENG (102311) + BEN (85672) | 187983 | 4065 |
| Saraiki-English | Mix | FB + T + W | SAR (78412) + ENG (102311) | 180723 | 3457 |
| Saraiki-Roman Urdu | Mix | FB + T + W | SAR (78412) + RU (97235) | 175647 | 3962 |
| Saraiki-Bengali | Cursive | FB + T + W | SAR (78412) + BEN (85672) | 164084 | 2864 |
| Eng-Bengali-Saraiki-Hindi-Roman Urdu | Mix | FB + T + W | ENG (102311) + BEN (85672) + SAR (78412) + HIN (87563) + RU (97235) | 451193 | 4539 |