Research Article
Improving Transformer-Based Neural Machine Translation with Prior Alignments
Table 3
Overview of the datasets.
| Vietnamese/English | Training | Validation | Testing |
| --- | --- | --- | --- |
| Number of news articles | 930 | 40 | 30 |
| Number of sentences | 42,026 | 1,482 | 1,527 |
| Average sentence length | 26.2/19.2 | 24.5/17.8 | 28.3/20.6 |
| Alignments per sentence | 22.4 | 20.8 | 23.1 |
| Number of unique tokens | 16,441/36,672 | 2,720/4,981 | 3,462/6,211 |
| Number of alignments | 942,001 | 30,821 | 35,291 |
| Number of tokens | 1,099,205/806,456 | 36,276/26,315 | 43,286/31,513 |