Research Article
A Dirichlet Process Mixture Based Name Origin Clustering and Alignment Model for Transliteration
Table 1
The size of the three multiorigin corpora.
| Corpora | Training set size | Dev. set size | Test set size |
|---|---|---|---|
| EO | 32,680 | 3,267 | 3,267 |
| ECJ-O | 32,500 | 3,250 | 3,250 |
| Multi-O | 33,290 | 3,328 | 3,328 |