Research Article
Design and Implementation of a Machine Learning-Based Authorship Identification Model
Table 7
Unsupervised classification of documents based on LDA topics with cosine similarity on four datasets of UrduCorpus.
| Method and dataset | Parameters | Accuracy rate (%) |
| --- | --- | --- |
| LDA instance-based (original text) | Vocabulary 44,634, k = 24 | 91.42 |
| LDA instance-based (n-grams) | Vocabulary 104,201, k = 60 | 93.17 |
| LDA profile-based (original text) | Vocabulary 44,634, k = 60 | 91.83 |
| LDA profile-based (n-grams) | Vocabulary 55,423, k = 72 | 91.75 |
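The scheme named in the Table 7 caption, attributing a document by the cosine similarity of its LDA topic distribution, can be illustrated with a minimal sketch. It assumes the topic vectors have already been inferred by an LDA model. The author labels, the three-topic vectors, and the function names `cosine`, `instance_based`, and `profile_based` are hypothetical and are not taken from the article or from UrduCorpus. The profile here is approximated by averaging an author's per-document topic vectors, which is an assumption; the article may build author profiles differently, for example by merging an author's texts before topic inference.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def instance_based(train, query):
    """Return the author of the single training document most similar to query."""
    best_author, _ = max(train, key=lambda item: cosine(item[1], query))
    return best_author


def profile_based(train, query):
    """Return the author whose averaged topic vector is most similar to query.

    Assumption: a profile is the mean of the author's document topic vectors.
    """
    sums, counts = {}, {}
    for author, vec in train:
        if author not in sums:
            sums[author] = [0.0] * len(vec)
            counts[author] = 0
        sums[author] = [s + x for s, x in zip(sums[author], vec)]
        counts[author] += 1
    profiles = {a: [s / counts[a] for s in sums[a]] for a in sums}
    return max(profiles, key=lambda a: cosine(profiles[a], query))


# Hypothetical topic distributions (3 topics) for documents with known authors.
train = [
    ("author_a", [0.7, 0.2, 0.1]),
    ("author_a", [0.6, 0.3, 0.1]),
    ("author_b", [0.1, 0.2, 0.7]),
    ("author_b", [0.2, 0.1, 0.7]),
]

# Hypothetical topic distribution of a document with unknown author.
query = [0.65, 0.25, 0.10]

print(instance_based(train, query))
print(profile_based(train, query))
```

In the instance-based variant every training document is compared with the query separately, while the profile-based variant compares the query against one aggregated vector per author, so it needs only as many comparisons as there are authors.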