Research Article

Proposed Framework for the Evaluation of Standalone Corpora Processing Systems: An Application to Arabic Corpora

Table 1

Number of words, number of texts, and average text length for the corpora used in the performance evaluation.

Corpus           Number of words   Number of texts   Average text length (words)
2012 CAN         2,207,469         2,910             759
4M KACST ATCC    4,356,509         5,939             734
7M KACST ATCC    7,198,767         11,719            614
KACST ATCC       11,555,276        17,658            654
KSUCCA           50,602,412        410               123,421
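
The average text length reported for each corpus in Table 1 is consistent with dividing the number of words by the number of texts and rounding to the nearest word. The following is a minimal sketch of that arithmetic, not part of the evaluated framework; the names `CORPORA` and `average_text_length` are illustrative assumptions, and the figures are copied from Table 1.

```python
# Figures copied from Table 1:
# corpus name -> (number of words, number of texts, reported average text length)
CORPORA = {
    "2012 CAN": (2_207_469, 2_910, 759),
    "4M KACST ATCC": (4_356_509, 5_939, 734),
    "7M KACST ATCC": (7_198_767, 11_719, 614),
    "KACST ATCC": (11_555_276, 17_658, 654),
    "KSUCCA": (50_602_412, 410, 123_421),
}


def average_text_length(words: int, texts: int) -> int:
    """Average text length in words, rounded to the nearest whole word.

    Assumption: the table's averages are words / texts, rounded.
    """
    return round(words / texts)


if __name__ == "__main__":
    for name, (words, texts, reported) in CORPORA.items():
        computed = average_text_length(words, texts)
        status = "matches" if computed == reported else "differs from"
        print(f"{name}: computed {computed:,} {status} reported {reported:,}")
```

The averages make the contrast between the corpora explicit: KSUCCA has far fewer texts than the other corpora, but each text is far longer on average.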