Research Article
Design and Implementation of a Machine Learning-Based Authorship Identification Model
Table 6
Unsupervised classification of documents based on LDA topics with cosine similarity on the twelve PAN12 datasets.
| Dataset with description | Parameters | Accuracy rate (%) |
| --- | --- | --- |
| A1 instance-based (original text) | Vocabulary 900, k = 6 | 83.3 |
| A2 instance-based (n-grams) | Vocabulary 1500, k = 6 | 100 |
| A3 profile-based (original text) | Vocabulary 900, k = 3 | 83.3 |
| A4 profile-based (n-grams) | Vocabulary 1500, k = 3 | 100 |
| C1 instance-based (original text) | Vocabulary 2400, k = 16 | 50.0 |
| C2 instance-based (n-grams) | Vocabulary 4000, k = 16 | 75.0 |
| C3 profile-based (original text) | Vocabulary 2400, k = 8 | 62.5 |
| C4 profile-based (n-grams) | Vocabulary 4000, k = 8 | 75.0 |
| I1 instance-based (original text) | Vocabulary 4200, k = 28 | 64.3 |
| I2 instance-based (n-grams) | Vocabulary 7000, k = 28 | 78.6 |
| I3 profile-based (original text) | Vocabulary 4200, k = 14 | 64.3 |
| I4 profile-based (n-grams) | Vocabulary 7000, k = 14 | 78.6 |
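
The similarity step named in the caption of Table 6 can be illustrated with a minimal sketch. It assumes each document or author profile has already been reduced to a vector of LDA topic proportions; the LDA fitting itself is not shown. The topic vectors, the labels `author_A` to `author_C`, and the helper names `cosine_similarity`, `assign_author`, and `accuracy_rate` are hypothetical and chosen only for illustration. They are not taken from the article, and the toy numbers do not reproduce any row of the table.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an all-zero vector as having no similarity
    return dot / (norm_u * norm_v)


def assign_author(unknown_topics, known_topics):
    """Return the label whose topic vector is most similar to the unknown one."""
    return max(
        known_topics,
        key=lambda label: cosine_similarity(unknown_topics, known_topics[label]),
    )


def accuracy_rate(predicted, actual):
    """Percentage of predictions that match the true labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


# Hypothetical topic proportions with k = 3 topics (illustrative values only).
profiles = {
    "author_A": [0.7, 0.2, 0.1],
    "author_B": [0.1, 0.8, 0.1],
    "author_C": [0.2, 0.1, 0.7],
}
unknown_docs = [[0.6, 0.3, 0.1], [0.1, 0.7, 0.2], [0.3, 0.3, 0.4]]
true_authors = ["author_A", "author_B", "author_A"]

predictions = [assign_author(doc, profiles) for doc in unknown_docs]
print(predictions)
print(round(accuracy_rate(predictions, true_authors), 1))
```

In this sketch one topic vector per author stands in for a profile-based setup; an instance-based variant would, under the same assumptions, keep one vector per known document and map the best-matching document back to its author.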