Research Article

Design and Implementation of a Machine Learning-Based Authorship Identification Model

Table 7

Unsupervised classification of documents based on LDA topics with cosine similarity on four datasets of UrduCorpus.

| Method and dataset | Parameters | Accuracy rate (%) |
|---|---|---|
| LDA instance-based (original text) | Vocabulary 44,634, k = 24 | 91.42 |
| LDA instance-based (n-grams) | Vocabulary 104,201, k = 60 | 93.17 |
| LDA profile-based (original text) | Vocabulary 44,634, k = 60 | 91.83 |
| LDA profile-based (n-grams) | Vocabulary 55,423, k = 72 | 91.75 |
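
The cosine-similarity step behind Table 7 can be illustrated with a minimal sketch. It assumes each document has already been mapped to a k-dimensional LDA topic distribution; the LDA fitting itself is not shown. The function names, the author labels, and the three-topic vectors are hypothetical toy values, not data from the study. In particular, the profile here is formed by averaging an author's topic vectors, which is one plausible reading of "profile-based" and may differ from how the profiles in the study were built.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify_instance_based(query, training_docs):
    """Return the author of the single training document most similar to the query.

    training_docs is a list of (author, topic_vector) pairs.
    """
    best_author, _ = max(
        training_docs, key=lambda item: cosine_similarity(query, item[1])
    )
    return best_author


def build_profiles(training_docs):
    """Average each author's topic vectors into one profile vector (an assumption)."""
    sums, counts = {}, {}
    for author, vec in training_docs:
        if author not in sums:
            sums[author] = [0.0] * len(vec)
            counts[author] = 0
        sums[author] = [s + x for s, x in zip(sums[author], vec)]
        counts[author] += 1
    return {a: [s / counts[a] for s in sums[a]] for a in sums}


def classify_profile_based(query, profiles):
    """Return the author whose profile vector is most similar to the query."""
    return max(profiles, key=lambda a: cosine_similarity(query, profiles[a]))


# Toy topic distributions over k = 3 topics (hypothetical values).
training = [
    ("author_a", [0.7, 0.2, 0.1]),
    ("author_a", [0.6, 0.3, 0.1]),
    ("author_b", [0.1, 0.2, 0.7]),
    ("author_b", [0.2, 0.1, 0.7]),
]
profiles = build_profiles(training)
query = [0.65, 0.25, 0.10]

instance_label = classify_instance_based(query, training)
profile_label = classify_profile_based(query, profiles)
```

The instance-based variant compares the query against every training document, while the profile-based variant compares it against one vector per author, so the latter needs far fewer similarity computations at prediction time.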