Research Article

Design and Implementation of a Machine Learning-Based Authorship Identification Model

Table 5. PAN12 datasets used in the experiments.

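| Dataset | Training documents words | Distinct words | Vocabulary size |
|---------|--------------------------|----------------|-----------------|
| A1 | 25,771 | 3,252 | 900 |
| A2 | 128,845 | 48,012 | 1,500 |
| A3 | 25,771 | 3,252 | 900 |
| A4 | 128,845 | 48,012 | 1,500 |
| C1 | 96,052 | 26,654 | 2,400 |
| C2 | 480,250 | 133,256 | 4,000 |
| C3 | 96,052 | 26,654 | 2,400 |
| C4 | 480,250 | 133,256 | 4,000 |
| I1 | 2,353,267 | 137,315 | 4,200 |
| I2 | 11,766,325 | 7,839,471 | 7,000 |
| I3 | 2,353,267 | 137,315 | 4,200 |
| I4 | 11,766,325 | 7,839,471 | 7,000 |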