The published system is not exactly the same system we trained for the paper, as we have lost the original models and config files. I reconstructed the system with a newer version of Marian, and there are several reasons why the M2 scores differ:
- This is a different training run.
- The implementation of transformer models or the default training parameters in Marian may have changed slightly over the last year.
- I replaced averaging the four best model checkpoints with Marian's built-in exponential smoothing, which is similar but probably slightly more effective. It was also a nice simplification (see the sketch after this list).
- I used my recent experience with training transformer models to choose training parameters that we did not mention in the paper.
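In case the difference between the two strategies is unclear, here is a minimal sketch, not Marian's actual implementation: checkpoint averaging takes a plain mean of the parameters of the k best saved checkpoints after training, while exponential smoothing maintains a running smoothed copy of the parameters during training (enabled in Marian via the `--exponential-smoothing` option). The decay value, function names, and exact update rule below are assumptions chosen for illustration.

```python
import numpy as np

def average_checkpoints(checkpoints):
    """Post-hoc averaging: plain mean over parameter vectors of the k best checkpoints."""
    return np.mean(np.stack(checkpoints), axis=0)

def exponential_smoothing(param_history, decay=1e-4):
    """Running exponentially smoothed parameters, updated after every training step."""
    smoothed = None
    for params in param_history:
        if smoothed is None:
            smoothed = params.copy()
        else:
            # smoothed <- (1 - decay) * smoothed + decay * params
            smoothed = (1.0 - decay) * smoothed + decay * params
    return smoothed

# Toy usage: one "parameter vector" per training step / checkpoint.
steps = [np.random.randn(10) for _ in range(1000)]
best_four = steps[-4:]                      # stand-in for the four best checkpoints
avg = average_checkpoints(best_four)
ema = exponential_smoothing(steps, decay=1e-4)
```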
So these are changes that someone could make while reconstructing our system from scratch using the same data. The training data, subword segmentation codes, and vocabularies are exactly the same.
The M2 score you report on CoNLL-2014 is 57.53, but in the paper it is 55.8.