| Model | BLEU | Paper | Code |
| --- | --- | --- | --- |
| Transformer | 34.44 | Attention Is All You Need | - |
| TaLK Convolutions | 35.5 | Time-aware Large Kernel Convolutions | - |
| Mask Attention Network (small) | 36.3 | Mask Attention Networks: Rethinking and Strengthen Transformer | - |
| Minimum Risk Training [Edunov2017] | 32.84 | Classical Structured Prediction Losses for Sequence to Sequence Learning | - |
| TransformerBase + AutoDropout | 35.8 | AutoDropout: Learning Dropout Patterns to Regularize Deep Networks | - |
| Back-Translation Finetuning | 28.83 | Tag-less Back-Translation | - |
| Local Joint Self-attention | 35.7 | Joint Source-Target Self Attention with Locality Constraints | - |
| Transformer + R-Drop + Cutoff | 37.90 | R-Drop: Regularized Dropout for Neural Networks | - |
| Actor-Critic [Bahdanau2017] | 28.53 | An Actor-Critic Algorithm for Sequence Prediction | - |
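The strongest BLEU score in the table comes from combining the baseline Transformer with R-Drop regularization. The core of R-Drop is a symmetric KL consistency penalty between two forward passes of the same input under independent dropout masks, added to the usual cross-entropy loss. Below is a minimal PyTorch sketch of that loss term, assuming a classification-style `model` that returns logits; the names `model`, `x`, `y` and the weight `alpha` are illustrative stand-ins, not the paper's exact training setup.

```python
import torch
import torch.nn.functional as F

def r_drop_loss(model, x, y, alpha=5.0):
    """Cross-entropy plus a symmetric KL consistency term between two
    forward passes whose dropout masks differ (the R-Drop idea)."""
    logits1 = model(x)  # first pass, one dropout mask
    logits2 = model(x)  # second pass, an independent dropout mask
    ce = 0.5 * (F.cross_entropy(logits1, y) + F.cross_entropy(logits2, y))
    log_p = F.log_softmax(logits1, dim=-1)
    log_q = F.log_softmax(logits2, dim=-1)
    kl = 0.5 * (F.kl_div(log_p, log_q, reduction="batchmean", log_target=True)
                + F.kl_div(log_q, log_p, reduction="batchmean", log_target=True))
    return ce + alpha * kl

# Toy usage: dropout must be active (train mode) so the two passes differ.
model = torch.nn.Sequential(torch.nn.Linear(16, 8),
                            torch.nn.Dropout(0.3),
                            torch.nn.Linear(8, 4))
model.train()
x, y = torch.randn(32, 16), torch.randint(0, 4, (32,))
r_drop_loss(model, x, y).backward()
```

The table's 37.90 entry additionally applies Cutoff data augmentation on top of this regularizer; that part is not shown here.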