|
Reference | Authors | Training and optimization | Mechanism | Search at decoder (beam size) |
|
[18] | Rush et al. | Stochastic gradient descent to minimise negative log-likelihood | | Beam search |
[39] | Chopra et al. | End-to-end minimization of negative log-likelihood using stochastic gradient descent | Encodes the position information of the input words | Beam search |
[55] | Nallapati et al. | Optimize the conditional likelihood using Adadelta | Pointer mechanism | Beam search (5) |
[52] | Zhou et al. | Stochastic gradient descent, Adam optimizer, optimizing the negative log-likelihood | Attention mechanism | Beam search (12) |
[53] | Cao et al. | Adam optimizer, optimizing the negative log-likelihood | Copy mechanism, coverage mechanism, dual-attention decoder | Beam search (6) |
[54] | Cai et al. | Cross entropy is used as the loss function | Attention mechanism | Beam search (5) |
[50] | Adelson et al. | Adam | Attention mechanism | |
[29] | Lopyrev | RMSProp adaptive gradient method | Simple and complex attention mechanism | Beam search |
[38] | Jobson et al. | Adadelta, minimising the negative log probability of the predicted word | Bilinear attention mechanism, pointer mechanism | |
[56] | See et al. | Adadelta | Coverage mechanism, attention mechanism, pointer mechanism | Beam search (4) |
[57] | Paulus et al. | Adam, RL | Intradecoder attention mechanism, pointer mechanism, copy mechanism, RL | Beam search (5) |
[58] | Liu et al. | Adadelta stochastic gradient descent | Attention mechanism, pointer mechanism, copy mechanism, RL | |
[30] | Song et al. | | Attention mechanism, copy mechanism | |
[35] | Al-Sabahi et al. | Adagrad | Pointer mechanism, coverage mechanism, copy mechanism | Bidirectional beam search |
[59] | Li et al. | Adadelta | Attention mechanism, pointer mechanism, copy mechanism, prediction guide mechanism | Beam search |
[60] | Kryściński et al. | Asynchronous gradient descent optimizer | Temporal attention and intra-attention pointer mechanism, RL | Beam search |
[61] | Yao et al. | RL, Adagrad | Attention mechanism, pointer mechanism, copy mechanism, coverage mechanism, RL | Beam search (4) |
[62] | Wan et al. | Adagrad | Attention mechanism, pointer mechanism | Backward beam search (2) and forward beam search (4) |
[65] | Liu et al. | Adam | Self-attention mechanism | Beam search (5) |
[63] | Wang et al. | Gradient of reinforcement learning, Adam, cross-entropy loss function | Attention mechanism, pointer mechanism, copy mechanism, new coverage mechanism | Beam search |
[64] | Egonmwan et al. | Adam | Self-attention mechanism | Greedy decoding during training and validation; beam search during testing |
[49] | Peng et al. | Adam, gradient descent, cross-entropy loss | Coverage mechanism, RL, double attention pointer network (DAPT) | Beam search (5) |
|
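Most of the models in the table decode with beam search, using beam sizes between 2 and 12. As an illustration only, the following is a minimal sketch of generic beam search and is not taken from any of the listed papers. The names `beam_search`, `next_log_probs`, and `toy_model`, the end-of-sequence token `</s>`, and the toy probabilities are all hypothetical; a real summarizer would supply next-token log-probabilities from its decoder, and many systems additionally apply length normalization, which is omitted here.

```python
import math
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[Tuple[str, ...], float]  # (tokens, cumulative log-probability)


def beam_search(
    next_log_probs: Callable[[Tuple[str, ...]], Dict[str, float]],
    beam_size: int,
    max_len: int,
    eos: str = "</s>",
) -> List[Hypothesis]:
    """Return hypotheses sorted by cumulative log-probability, best first."""
    beams: List[Hypothesis] = [((), 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates: List[Hypothesis] = []
        for seq, score in beams:
            for tok, lp in next_log_probs(seq).items():
                candidates.append((seq + (tok,), score + lp))
        # Keep only the beam_size highest-scoring expansions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:
            if seq[-1] == eos:
                finished.append((seq, score))  # hypothesis is complete
            else:
                beams.append((seq, score))     # hypothesis stays live
        if not beams:
            break
    # Hypotheses cut off by max_len are returned alongside completed ones.
    finished.extend(beams)
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished


def toy_model(prefix: Tuple[str, ...]) -> Dict[str, float]:
    """Hypothetical stand-in for a decoder: fixed next-token distributions."""
    table = {
        (): {"a": 0.6, "b": 0.4},
        ("a",): {"x": 0.55, "y": 0.45},
        ("b",): {"z": 0.9, "w": 0.1},
    }
    probs = table.get(prefix, {"</s>": 1.0})
    return {tok: math.log(p) for tok, p in probs.items()}


greedy_best = beam_search(toy_model, beam_size=1, max_len=5)[0]
beam_best = beam_search(toy_model, beam_size=2, max_len=5)[0]
print(greedy_best[0], math.exp(greedy_best[1]))
print(beam_best[0], math.exp(beam_best[1]))
```

In this toy setting, a beam size of 1 (greedy decoding) commits to the locally best first token `a` and ends with a sequence of probability about 0.33, whereas a beam size of 2 keeps `b` alive and recovers the sequence `b z </s>` with probability about 0.36. This is the usual motivation for the larger beam sizes reported in the table, at the cost of more decoder evaluations per step.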