The published system is not exactly the same system we trained for the paper, as we have lost the original models and config files. I reconstructed the system with a newer version of Marian, and there are several reasons why the M2 scores differ:
- This is a different training run.
- The implementation of transformer models or the default training parameters in Marian may have changed slightly over the last year.
- I replaced averaging the four best model checkpoints with Marian's built-in exponential smoothing, which is similar but probably slightly more effective. It was also a nice simplification (see the sketch after this list).
- I used my recent experience with training transformer models to choose training parameters that we did not mention in the paper.
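In case the difference between the two strategies is unclear, here is a minimal sketch, not Marian's actual implementation: checkpoint averaging takes a plain mean of the parameters of the k best saved checkpoints after training, while exponential smoothing maintains a running smoothed copy of the parameters during training (enabled in Marian via the `--exponential-smoothing` option). The decay value, function names, and exact update rule below are assumptions chosen for illustration.

```python
import numpy as np

def average_checkpoints(checkpoints):
    """Post-hoc averaging: plain mean over parameter vectors of the k best checkpoints."""
    return np.mean(np.stack(checkpoints), axis=0)

def exponential_smoothing(param_history, decay=1e-4):
    """Running exponentially smoothed parameters, updated after every training step."""
    smoothed = None
    for params in param_history:
        if smoothed is None:
            smoothed = params.copy()
        else:
            # smoothed <- (1 - decay) * smoothed + decay * params
            smoothed = (1.0 - decay) * smoothed + decay * params
    return smoothed

# Toy usage: one "parameter vector" per training step / checkpoint.
steps = [np.random.randn(10) for _ in range(1000)]
best_four = steps[-4:]                      # stand-in for the four best checkpoints
avg = average_checkpoints(best_four)
ema = exponential_smoothing(steps, decay=1e-4)
```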
So these are changes that someone could make while reconstructing our system from scratch using the same data. The training data, subword segmentation codes, and vocabularies are exactly the same.
The M2 score you report on CoNLL-2014 is 57.53, but in the paper it is 55.8.