NMT Training through the Lens of SMT

This is a post for the EMNLP 2021 paper "Language Modeling, Lexical Translation, Reordering: The Training Process of NMT through the Lens of Classical SMT".

In SMT, different competences (e.g. language modeling, lexical translation, reordering) are handled by distinct models. In NMT, the whole translation task is modeled with a single neural network. How and when does NMT learn all of these competences? We show that:

  • during training, NMT undergoes three different stages:
      • target-side language modeling;
      • learning how to use the source and approaching word-by-word translation;
      • refining translations, which shows up as increasingly complex reorderings but is almost invisible to standard metrics (e.g. BLEU);

  • not only is this fun, but it can also help in practice! For example, it helps in settings where data complexity matters, such as non-autoregressive NMT.