Research Article

A Dirichlet Process Mixture Based Name Origin Clustering and Alignment Model for Transliteration

Table 1

The size of the three multiorigin corpora.

Corpora     Corpus size
            Training set    Dev. set    Test set
EO          32,680          3,267       3,267
ECJ-O       32,500          3,250       3,250
Multi-O     33,290          3,328       3,328