This is a post for the EMNLP 2021 paper Language Modeling, Lexical Translation, Reordering: The Training Process of NMT through the Lens of Classical SMT.

In SMT, model competences are modelled with distinct models. In NMT, the whole translation task is modelled with a single neural network. How and when does NMT get to learn all the competences? We show that

during training, NMT undergoes three different stages:
target-side language modeling,
learning how to use source and approaching word-by-word translation,
refining translations, visible by increasingly complex reorderings, but almost invisible to standard metrics (e.g. BLEU);

morda-min

not only this is fun, but it can also help in practice! For example, in settings where data complexity matters, such as non-autoregressive NMT.

view code

NMT Training through the Lens of SMT