Research Article

Mixed Script Identification Using Automated DNN Hyperparameter Optimization

Table 1

Corpus statistics (Eng = English, Hin = Hindi, Sar = Saraiki, Ben = Bengali, and RU = Roman Urdu).

| Dataset | Type | Source | Token count | Total tokens | Sentences |
| --- | --- | --- | --- | --- | --- |
| Eng-Roman Urdu | Space-oriented | FB + T + W | ENG (102311) + RU (97235) | 199546 | 3558 |
| Saraiki-Hindi | Cursive | FB + T + W | SAR (78412) + HIN (87563) | 165975 | 4256 |
| Bengali-Hindi | Cursive | FB + T + W | BEN (85672) + HIN (87563) | 173235 | 3801 |
| Eng-Bengali | Mix | FB + T + W | ENG (102311) + BEN (85672) | 187983 | 4065 |
| Saraiki-English | Mix | FB + T + W | SAR (78412) + ENG (102311) | 180723 | 3457 |
| Saraiki-Roman Urdu | Mix | FB + T + W | SAR (78412) + RU (97235) | 175647 | 3962 |
| Saraiki-Bengali | Cursive | FB + T + W | SAR (78412) + BEN (85672) | 164084 | 2864 |
| Eng-Bengali-Saraiki-Hindi-Roman Urdu | Mix | FB + T + W | ENG (102311) + BEN (85672) + SAR (78412) + HIN (87563) + RU (97235) | 451193 | 4539 |

Source: FB = Facebook, T = Twitter, W = WhatsApp.
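Each total in Table 1 is simply the sum of the per-language token counts of its constituent languages. The short Python sketch below re-derives every total from the five per-language counts as a consistency check. Note that the Eng-Roman Urdu total (199546) is consistent only with the Roman Urdu count of 97235, not the Hindi count of 87563. The names `LANG_TOKENS`, `DATASETS`, and `check_totals` are hypothetical helpers introduced here for illustration, and the tokens-per-sentence column is a derived ratio, not a figure reported by the authors.

```python
# Hypothetical consistency check for Table 1 (not the authors' code).
# Per-language token counts as listed in Table 1.
LANG_TOKENS = {
    "ENG": 102311,
    "HIN": 87563,
    "SAR": 78412,
    "BEN": 85672,
    "RU": 97235,
}

# (dataset name, constituent languages, reported total tokens, reported sentences)
DATASETS = [
    ("Eng-Roman Urdu", ["ENG", "RU"], 199546, 3558),
    ("Saraiki-Hindi", ["SAR", "HIN"], 165975, 4256),
    ("Bengali-Hindi", ["BEN", "HIN"], 173235, 3801),
    ("Eng-Bengali", ["ENG", "BEN"], 187983, 4065),
    ("Saraiki-English", ["SAR", "ENG"], 180723, 3457),
    ("Saraiki-Roman Urdu", ["SAR", "RU"], 175647, 3962),
    ("Saraiki-Bengali", ["SAR", "BEN"], 164084, 2864),
    ("Eng-Bengali-Saraiki-Hindi-Roman Urdu",
     ["ENG", "BEN", "SAR", "HIN", "RU"], 451193, 4539),
]


def check_totals():
    """Return (dataset, computed_total, reported_total, tokens_per_sentence) rows."""
    rows = []
    for name, langs, reported_total, sentences in DATASETS:
        # Sum the token counts of the languages that make up this dataset.
        computed = sum(LANG_TOKENS[lang] for lang in langs)
        # Derived average sentence length in tokens (illustrative only).
        rows.append((name, computed, reported_total, computed / sentences))
    return rows


if __name__ == "__main__":
    for name, computed, reported, per_sent in check_totals():
        status = "ok" if computed == reported else "MISMATCH"
        print(f"{name:40s} {computed:>7d} {reported:>7d} {per_sent:6.1f} {status}")
```

Running the script prints one line per dataset. Every row reports "ok", which confirms that the Total tokens column is the plain sum of its component token counts.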