Conversation


@Addyk-24 Addyk-24 commented Oct 11, 2025

What does this PR do?

Fixes #41492

Fixes incorrect target-language generation during evaluation/validation in run_translation.py for multilingual translation models (mBART, M2M100).

Problem

When fine-tuning multilingual models, forced_bos_token_id was only set in model.config, not in model.generation_config. During evaluation, model.generate() reads from generation_config, so generation happened in the wrong language, producing artificially low BLEU scores (previously ~2-5).

Solution

Set forced_bos_token_id in both model.config and model.generation_config.
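A minimal sketch of the fix. The attribute names mirror the Transformers API; the lightweight stand-in objects and the token id below are only for illustration (with a real mBART model you would use something like tokenizer.lang_code_to_id["ro_RO"]):

```python
from types import SimpleNamespace


def set_forced_bos_token_id(model, token_id):
    """Force the first generated token to be `token_id` (the target language).

    model.generate() reads model.generation_config, while other code paths
    still consult model.config, so the value is written to both places.
    """
    model.config.forced_bos_token_id = token_id
    model.generation_config.forced_bos_token_id = token_id


# Stand-ins for a real model, for illustration only.
model = SimpleNamespace(config=SimpleNamespace(), generation_config=SimpleNamespace())
set_forced_bos_token_id(model, 250020)
print(model.generation_config.forced_bos_token_id)  # 250020
```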

Results:

  • ✅ BLEU score: 29.07 (up from ~2-5 when generating in the wrong language)
  • ✅ No warning about modifying model.config directly for generation; setting model.generation_config removes this warning and ensures Transformers v5+ picks up the setting correctly
  • ✅ All evaluations complete without errors

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@zach-huggingface @Cyrilvallez


Bmingg commented Oct 11, 2025

What worked for me was setting model.generation_config.decoder_start_token_id to the target language ID for mBART. When I discovered this bug, I checked the forced_bos_token_id of mBART's output, and it was still the start token `</s>`. In this case, what was missing, if I remember correctly, was the target language ID after the start token.
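A minimal sketch of this alternative, assuming an mBART-style tokenizer that exposes lang_code_to_id (the stand-in objects and the id value are only for illustration):

```python
from types import SimpleNamespace


def set_decoder_start_to_target_lang(model, tokenizer, tgt_lang):
    """Point the decoder's start token at the target language id."""
    # mBART tokenizers expose lang_code_to_id, mapping language codes
    # like "ro_RO" to vocabulary ids.
    lang_id = tokenizer.lang_code_to_id[tgt_lang]
    model.generation_config.decoder_start_token_id = lang_id
    return lang_id


# Stand-ins for a real model/tokenizer; the id below is made up.
tokenizer = SimpleNamespace(lang_code_to_id={"ro_RO": 250020})
model = SimpleNamespace(generation_config=SimpleNamespace())
set_decoder_start_to_target_lang(model, tokenizer, "ro_RO")
print(model.generation_config.decoder_start_token_id)  # 250020
```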



Development

Successfully merging this pull request may close these issues.

For finetuning MBart-based model, setting decoder_start_token_id in model.config is NOT ENOUGH.
