Research Article

An Improved Transformer-Based Neural Machine Translation Strategy: Interacting-Head Attention

Table 2: Possible number of heads in our model.

Dataset          Model          Number of heads
IWSLT16 DE-EN    512     25     [2, 4, 8, 16]
WMT17 EN-DE      512     20     [2, 4, 8, 16]
WMT17 EN-CS      512     20     [2, 4, 8, 16]
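
As a rough illustration of the head counts listed in Table 2, the sketch below computes the per-head dimension for each candidate. It assumes that 512 is the model (embedding) dimension and that the dimension must split evenly across heads, as in standard multi-head attention. The table does not state either point, and the names `D_MODEL`, `CANDIDATE_HEADS`, and `per_head_dim` are hypothetical, introduced only for this example.

```python
# Assumption: the value 512 in Table 2 is the model (embedding) dimension.
D_MODEL = 512
# Candidate head counts as listed in Table 2.
CANDIDATE_HEADS = [2, 4, 8, 16]


def per_head_dim(d_model: int, num_heads: int) -> int:
    """Return the size of each head's subspace.

    Assumes the standard multi-head attention convention that the model
    dimension is divided evenly among the heads.
    """
    if num_heads <= 0 or d_model % num_heads != 0:
        raise ValueError(
            f"d_model={d_model} cannot be split evenly across {num_heads} heads"
        )
    return d_model // num_heads


# Map each candidate head count to its per-head dimension.
head_dims = {h: per_head_dim(D_MODEL, h) for h in CANDIDATE_HEADS}
print(head_dims)  # {2: 256, 4: 128, 8: 64, 16: 32}
```

Under these assumptions every listed head count divides 512 evenly, so each configuration yields an integer per-head dimension between 32 and 256.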