Research Article

Improving Transformer-Based Neural Machine Translation with Prior Alignments

Table 3

Overview of the datasets.

| Vietnamese/English | Training | Validation | Testing |
| --- | --- | --- | --- |
| Number of news articles | 930 | 40 | 30 |
| Number of sentences | 42,026 | 1,482 | 1,527 |
| Average sentence length | 26.2/19.2 | 24.5/17.8 | 28.3/20.6 |
| Alignments per sentence | 22.4 | 20.8 | 23.1 |
| Number of unique tokens | 16,441/36,672 | 2,720/4,981 | 3,462/6,211 |
| Number of alignments | 942,001 | 30,821 | 35,291 |
| Number of tokens | 1,099,205/806,456 | 36,276/26,315 | 43,286/31,513 |
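
The per-sentence figures in Table 3 can be cross-checked against the totals. The sketch below assumes that each average is simply the corresponding total divided by the number of sentences and rounded to one decimal place. The article does not state how the averages were computed, so this is an assumption, and the names `TABLE3`, `per_sentence_averages`, and `AVERAGES` are illustrative only.

```python
# Totals taken from Table 3, keyed by split:
# (sentences, Vietnamese tokens, English tokens, alignments).
TABLE3 = {
    "training": (42026, 1099205, 806456, 942001),
    "validation": (1482, 36276, 26315, 30821),
    "testing": (1527, 43286, 31513, 35291),
}


def per_sentence_averages(sentences, vi_tokens, en_tokens, alignments):
    """Return (avg Vietnamese length, avg English length, alignments per sentence).

    Assumption: each average is total / sentences, rounded to one decimal.
    """
    return (
        round(vi_tokens / sentences, 1),
        round(en_tokens / sentences, 1),
        round(alignments / sentences, 1),
    )


# Derived averages for every split, to compare with the table rows
# "Average sentence length" and "Alignments per sentence".
AVERAGES = {split: per_sentence_averages(*totals) for split, totals in TABLE3.items()}

if __name__ == "__main__":
    for split, averages in AVERAGES.items():
        print(split, averages)
```

Under this assumption the derived values agree with the reported rows for all three splits, which suggests that sentence length is measured in tokens.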