
Conversation

qgallouedec (Member)

These parameters are now automatically computed, see huggingface/transformers#40426


remi-or (Collaborator) commented Sep 11, 2025

As a reminder, while max_batch_tokens and num_blocks are inferred, block_size isn't and defaults to 32. You might want to test how the change from 128 to 32 will impact you 🙂
Not really familiar with TRL, so I can't really review this, sorry!
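
A minimal sketch of what pinning block_size back to its previous value could look like while benchmarking. It assumes the continuous-batching parameters mentioned above (max_batch_tokens, num_blocks, block_size) are forwarded through transformers' GenerationConfig, as in huggingface/transformers#40426; the exact plumbing inside TRL may differ.

```python
# Sketch only: assumes block_size / num_blocks / max_batch_tokens are picked up by
# transformers' paged-attention (continuous batching) backend via GenerationConfig,
# as discussed in huggingface/transformers#40426. The real TRL wiring may differ.
from transformers import GenerationConfig

# New default behaviour: max_batch_tokens and num_blocks are inferred automatically,
# while block_size is not inferred and defaults to 32.
auto_config = GenerationConfig(max_new_tokens=128)

# To compare against the previous behaviour, one could pin block_size explicitly
# (assumption: the kwarg is accepted and forwarded to the paged-attention cache).
pinned_config = GenerationConfig(max_new_tokens=128, block_size=128)
```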

qgallouedec (Member, Author)

Thanks for the feedback @remi-or!

> change from 128 to 32 will impact you 🙂

will do!

albertvillanova (Member) left a comment

LGTM if the change in block_size does not have a negative impact in testing. Otherwise, we should keep the original 128 value.

qgallouedec (Member, Author)

Tried with 0.5B and 1.7B models: no difference between block_size=32 and block_size=128 👍
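
For context, a rough outline of the kind of A/B timing check described here (editor's sketch; run_generation is a hypothetical placeholder standing in for the trainer's actual paged-attention generation step):

```python
# Rough sketch of a block_size A/B timing check. `run_generation` is a hypothetical
# placeholder for the real generation call used during training.
import time

def run_generation(block_size: int) -> None:
    """Placeholder workload; replace with the actual generation call."""
    time.sleep(0.01)

def benchmark(block_size: int, n_runs: int = 3) -> float:
    """Return the best wall-clock time over n_runs for a given block_size."""
    times = []
    for _ in range(n_runs):
        start = time.perf_counter()
        run_generation(block_size=block_size)
        times.append(time.perf_counter() - start)
    return min(times)

for bs in (32, 128):
    print(f"block_size={bs}: {benchmark(bs):.3f}s (best of 3)")
```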

@qgallouedec qgallouedec merged commit deac14a into main Sep 23, 2025
10 of 12 checks passed
@qgallouedec qgallouedec deleted the remove-kward-paged-attention branch September 23, 2025 14:50
qgallouedec added a commit that referenced this pull request Sep 23, 2025