Conversation


@Addyk-24 Addyk-24 commented Oct 11, 2025

What does this PR do?

Fixes #41492

Fixes incorrect target-language generation during evaluation/validation in run_translation.py for multilingual translation models (mBART, M2M100).

Problem

When fine-tuning multilingual models, forced_bos_token_id was only set in model.config, not in model.generation_config. During evaluation, model.generate() reads from generation_config, so generation happened in the wrong language, producing artificially low BLEU scores (previously ~2-5).

Solution

Set forced_bos_token_id in both model.config and model.generation_config.
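A minimal sketch of the fix. The attribute names mirror the Transformers API; the lightweight stand-in objects and the token id below are only for illustration (with a real mBART model you would use something like tokenizer.lang_code_to_id["ro_RO"]):

```python
from types import SimpleNamespace


def set_forced_bos_token_id(model, token_id):
    """Force the first generated token to be `token_id` (the target language).

    model.generate() reads model.generation_config, while other code paths
    still consult model.config, so the value is written to both places.
    """
    model.config.forced_bos_token_id = token_id
    model.generation_config.forced_bos_token_id = token_id


# Stand-ins for a real model, for illustration only.
model = SimpleNamespace(config=SimpleNamespace(), generation_config=SimpleNamespace())
set_forced_bos_token_id(model, 250020)
print(model.generation_config.forced_bos_token_id)  # 250020
```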

Results:

  • ✅ BLEU score: 29.07 (up from ~2-5 when generating in the wrong language)
  • ✅ No warning about modifying model.config directly for generation; setting model.generation_config removes this warning and ensures Transformers v5+ picks up the setting correctly
  • ✅ All evaluations complete without errors

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@zach-huggingface @Cyrilvallez


Bmingg commented Oct 11, 2025

What worked for me was setting model.generation_config.decoder_start_token_id to the target language ID for mBART. When I discovered this bug, I checked the forced_bos_token_id of mBART's output, and it was still the start token `</s>`. In this case, what was missing, if I remember correctly, was the target language ID after the start token.
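A minimal sketch of this alternative, assuming an mBART-style tokenizer that exposes lang_code_to_id (the stand-in objects and the id value are only for illustration):

```python
from types import SimpleNamespace


def set_decoder_start_to_target_lang(model, tokenizer, tgt_lang):
    """Point the decoder's start token at the target language id."""
    # mBART tokenizers expose lang_code_to_id, mapping language codes
    # like "ro_RO" to vocabulary ids.
    lang_id = tokenizer.lang_code_to_id[tgt_lang]
    model.generation_config.decoder_start_token_id = lang_id
    return lang_id


# Stand-ins for a real model/tokenizer; the id below is made up.
tokenizer = SimpleNamespace(lang_code_to_id={"ro_RO": 250020})
model = SimpleNamespace(generation_config=SimpleNamespace())
set_decoder_start_to_target_lang(model, tokenizer, "ro_RO")
print(model.generation_config.decoder_start_token_id)  # 250020
```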



Development

Successfully merging this pull request may close these issues.

For finetuning MBart-based model, setting decoder_start_token_id in model.config is NOT ENOUGH.
