Research Article
An Improved Transformer-Based Neural Machine Translation Strategy: Interacting-Head Attention
Table 2
Possible number of heads in our model.
| Dataset | Model | Number of heads |
| IWSLT16 DE-EN | 512 | 25 [2, 4, 8, 16] |
| WMT17 EN-DE | 512 | 20 [2, 4, 8, 16] |
| WMT17 EN-CS | 512 | 20 [2, 4, 8, 16] |
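As a rough illustration of what the candidate head counts in Table 2 imply, the sketch below computes the per-head width for a model dimension of 512. It assumes the conventional even split used in standard multi-head attention (per-head width = model dimension / number of heads); the table itself does not state that the proposed model partitions dimensions this way. The names `D_MODEL`, `CANDIDATE_HEADS`, and `head_dim` are hypothetical and introduced only for this example.

```python
# Illustrative sketch only: assumes the standard multi-head attention split,
# where each head receives d_model / num_heads dimensions.

D_MODEL = 512                    # model dimension listed in Table 2
CANDIDATE_HEADS = [2, 4, 8, 16]  # bracketed head counts listed in Table 2


def head_dim(d_model: int, num_heads: int) -> int:
    """Return the per-head width, requiring an even split (assumed convention)."""
    if d_model % num_heads != 0:
        raise ValueError(
            f"d_model={d_model} is not divisible by num_heads={num_heads}"
        )
    return d_model // num_heads


# Map each candidate head count to its per-head width.
head_dims = {h: head_dim(D_MODEL, h) for h in CANDIDATE_HEADS}

for heads, width in head_dims.items():
    print(f"{heads:>2} heads -> {width} dimensions per head")
```

Under this assumption, every listed head count divides 512 exactly, giving per-head widths of 256, 128, 64, and 32.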