Research Article

Improving Transformer-Based Neural Machine Translation with Prior Alignments

Algorithm 1

Procedure to construct statistical alignments.
(1) We tokenize both the Vietnamese source sentences and the English target sentences, using the same token types in the Transformer-S1 model as in the Transformer-M model. A token in both the source and target sentences is a sequence of characters delimited by spaces. Linguistically, the Vietnamese-English Transformer-M and Transformer-S1 models are therefore syllable-to-word models, since spaces delimit syllables in Vietnamese but words in English.
(2) We construct many-to-one alignments from Vietnamese to English, using the fast_align token aligner.
(3) We repeat step 2 in the reverse direction, from English to Vietnamese.
(4) We merge the bidirectional alignments generated in steps 2 and 3, following the grow-diagonal heuristic proposed by Koehn et al. [24].
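The grow-diagonal merge in step 4 can be sketched as follows. This is a minimal illustration of the heuristic, not the authors' implementation: it assumes `fwd` and `rev` are sets of (source index, target index) pairs produced by the two fast_align runs, starts from their intersection, and grows toward their union through neighbouring (including diagonal) points.

```python
def grow_diag(fwd, rev):
    """Symmetrize two directional alignments with the grow-diag heuristic.

    fwd, rev: sets of (src, tgt) index pairs from the forward and
    reverse alignment runs. Returns the merged alignment set.
    """
    # Horizontal, vertical, and diagonal neighbours of an alignment point.
    neighbours = [(-1, 0), (0, -1), (1, 0), (0, 1),
                  (-1, -1), (-1, 1), (1, -1), (1, 1)]
    union = fwd | rev
    alignment = fwd & rev          # start from the high-precision intersection
    src_aligned = {i for i, _ in alignment}
    tgt_aligned = {j for _, j in alignment}

    added = True
    while added:                   # repeat until no new point can be added
        added = False
        for i, j in sorted(alignment):
            for di, dj in neighbours:
                ni, nj = i + di, j + dj
                # Add a neighbouring union point if it covers a still
                # unaligned source or target position.
                if ((ni, nj) in union and (ni, nj) not in alignment
                        and (ni not in src_aligned or nj not in tgt_aligned)):
                    alignment.add((ni, nj))
                    src_aligned.add(ni)
                    tgt_aligned.add(nj)
                    added = True
    return alignment
```

For example, if the forward run aligns source token 2 to target token 1 and the reverse run aligns source token 1 to target token 2, both points are recovered because each borders the intersection point (1, 1) and covers an otherwise unaligned position.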