Research Article

Design and Implementation of a Machine Learning-Based Authorship Identification Model

Table 6

Unsupervised classification of documents based on LDA topics with cosine similarity on twelve PAN12 datasets.

| Dataset with description | Parameters | Accuracy rate (%) |
| --- | --- | --- |
| A1 instance-based (original text) | Vocabulary 900, k = 6 | 83.3 |
| A2 instance-based (n-grams) | Vocabulary 1500, k = 6 | 100 |
| A3 profile-based (original text) | Vocabulary 900, k = 3 | 83.3 |
| A4 profile-based (n-grams) | Vocabulary 1500, k = 3 | 100 |
| C1 instance-based (original text) | Vocabulary 2400, k = 16 | 50.0 |
| C2 instance-based (n-grams) | Vocabulary 4000, k = 16 | 75.0 |
| C3 profile-based (original text) | Vocabulary 2400, k = 8 | 62.5 |
| C4 profile-based (n-grams) | Vocabulary 4000, k = 8 | 75.0 |
| I1 instance-based (original text) | Vocabulary 4200, k = 28 | 64.3 |
| I2 instance-based (n-grams) | Vocabulary 7000, k = 28 | 78.6 |
| I3 profile-based (original text) | Vocabulary 4200, k = 14 | 64.3 |
| I4 profile-based (n-grams) | Vocabulary 7000, k = 14 | 78.6 |
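
The attribution step summarized in Table 6 can be illustrated with a minimal sketch, assuming each document has already been reduced to a k-dimensional LDA topic distribution. The LDA fitting itself is omitted, and the topic vectors, author labels, and function names (`cosine_similarity`, `attribute`, `accuracy_rate`) are hypothetical choices made for illustration; they are not taken from the article. The reference vectors may stand for one profile per author (profile-based) or for individual training documents (instance-based).

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def attribute(test_vec, reference_vecs):
    """Return the label whose reference topic vector is most similar."""
    return max(
        reference_vecs,
        key=lambda label: cosine_similarity(test_vec, reference_vecs[label]),
    )


def accuracy_rate(predicted, actual):
    """Percentage of test documents attributed to the correct label."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


# Hypothetical topic distributions with k = 3 (illustrative values only).
profiles = {
    "author_1": [0.70, 0.20, 0.10],
    "author_2": [0.10, 0.75, 0.15],
    "author_3": [0.15, 0.15, 0.70],
}

# Each test item is (topic vector, true author); the third is made
# deliberately ambiguous so that it is misattributed.
test_docs = [
    ([0.60, 0.30, 0.10], "author_1"),
    ([0.20, 0.60, 0.20], "author_2"),
    ([0.40, 0.35, 0.25], "author_3"),
]

predicted = [attribute(vec, profiles) for vec, _ in test_docs]
actual = [label for _, label in test_docs]
rate = accuracy_rate(predicted, actual)
print(predicted, round(rate, 1))
```

With these made-up vectors, two of the three test documents are attributed to the correct author. The accuracy rate is computed in the same way as the percentages in the table: correct attributions divided by the number of test documents.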