Research Article

Unsupervised Chunking Based on Graph Propagation from Bilingual Corpus

Table 4

(a) English-Chinese parallel corpus. (b) Graph instance. (c) Chinese unlabeled dataset: CTB7 corpora.
(a)

Number of sentence pairs    Number of seeds    Number of words
10,000                      27,940             31,678

(b)

Number of sentences    Number of vertices
17,617                 185,441

(c)

Dataset             Source            Number of sentences
Training dataset    Xinhua 1–321      7,617
Testing dataset     Xinhua 363–403    912