Research Article
Unsupervised Chunking Based on Graph Propagation from Bilingual Corpus
Table 4
(a) English-Chinese parallel corpus. (b) Graph instance. (c) Chinese unlabeled dataset: CTB7 corpora.
(a)
| Number of sentence pairs | Number of seeds | Number of words |
| 10,000 | 27,940 | 31,678 |
(b)
| Number of sentences | Number of vertices |
| 17,617 | 185,441 |
(c)
| Dataset | Source | Number of sentences |
| Training dataset | Xinhua 1–321 | 7,617 |
| Testing dataset | Xinhua 363–403 | 912 |