
Inconsistent output lengths when max_length=20 is set implicitly vs explicitly in generate() #35765

imantdaunhawer opened this issue Jan 18, 2025 · 9 comments · May be fixed by #36215

@imantdaunhawer

System Info

  • transformers version: 4.49.0.dev0
  • Platform: macOS-15.1.1-arm64-arm-64bit
  • Python version: 3.11.10
  • Huggingface_hub version: 0.27.1
  • Safetensors version: 0.5.2
  • Accelerate version: 1.2.1
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.5.1 (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: no

Who can help?

@gante
@ArthurZucker

Related PR that discusses recent default max_length-related changes: #34814.

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

When using generate() with a model that has generation_config.max_length=20, the output length differs depending on whether max_length is passed explicitly or used implicitly from the generation_config.

from transformers import AutoModelForCausalLM, AutoTokenizer

# Setup from tests/generation/test_utils.py::GenerationIntegrationTests
article = "Today a dragon flew over Paris."
model = AutoModelForCausalLM.from_pretrained("hf-internal-testing/tiny-random-gpt2")
tokenizer = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-gpt2")
input_ids = tokenizer(article, return_tensors="pt").input_ids

# Case 1: Implicit max_length from generation_config
out_gen_implicit = model.generate(input_ids=input_ids)
print(out_gen_implicit.shape[-1])  # 36

# Case 2: Explicit max_length
out_gen_explicit = model.generate(
    input_ids=input_ids,
    max_length=model.generation_config.max_length
)
print(out_gen_explicit.shape[-1])  # 20

In the first case, the generated text is longer than in the second case (36 vs. 20 tokens).

Reason and scope

In the first case, max_length is overwritten as follows in file src/transformers/generation/utils.py, function _prepare_generated_length:

if generation_config.max_length == GenerationConfig().max_length:
    generation_config.max_length = generation_config.max_length + input_ids_length

Since GenerationConfig().max_length defaults to 20, the bug only affects models with generation_config.max_length set to 20.
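
This is easy to verify: a freshly constructed GenerationConfig carries the library-wide default that the branch above compares against.

from transformers import GenerationConfig

# The library default is max_length=20, so the special case above fires for
# any model whose generation_config.max_length happens to equal 20.
print(GenerationConfig().max_length)  # 20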

Expected behavior

The calls model.generate(input_ids=input_ids) and model.generate(input_ids=input_ids, max_length=model.generation_config.max_length) should generate texts of the same length when generation_config.max_length is set to 20.
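
As a sketch of the expected invariant, reusing model and input_ids from the reproduction above:

# Both lengths should be equal; with the current special case they are not.
implicit_len = model.generate(input_ids=input_ids).shape[-1]
explicit_len = model.generate(
    input_ids=input_ids, max_length=model.generation_config.max_length
).shape[-1]
print(implicit_len, explicit_len)  # expected: equal; currently 36 vs. 20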

@Rocketknight1 (Member)

cc @gante

@zucchini-nlp (Member)

By the way, is there a reason why we made the default max_length act differently from a user-defined value, instead of adding a new default of max_new_tokens=20, @gante?

@ArthurZucker (Collaborator)

That is expected: the default is to generate 20 new tokens, i.e. max_new_tokens, not max_length. max_length is really for when you don't want more than a certain total length, regardless of the input length.
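
A minimal sketch of the distinction, reusing model and input_ids from the reproduction above (the max_length=30 value is arbitrary and only for illustration):

# max_new_tokens bounds what is generated on top of the prompt...
out_new = model.generate(input_ids=input_ids, max_new_tokens=20)
print(out_new.shape[-1])    # prompt length + 20

# ...while max_length bounds the total sequence (prompt + generation).
out_total = model.generate(input_ids=input_ids, max_length=30)
print(out_total.shape[-1])  # at most 30, however long the prompt is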

@imantdaunhawer (Author)

That is expected: the default is to generate 20 new tokens, i.e. max_new_tokens, not max_length. max_length is really for when you don't want more than a certain total length, regardless of the input length.

Ok, I understand that.

But is it safe to assume that generation_config.max_length is 20 for all models and is never changed manually? Because whenever it is changed, model.generate(input_ids) no longer generates 20 new tokens by default.

For example, if generation_config.max_length = 21 is set, then model.generate(input_ids) treats it as a cap on the total length instead of generating 20 new tokens.
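
A sketch of that scenario, reusing the objects from the reproduction above (the prompt there tokenizes to 16 tokens, since 36 = 16 + 20):

model.generation_config.max_length = 21  # any value other than the default 20
out = model.generate(input_ids=input_ids)
print(out.shape[-1])  # at most 21 in total, i.e. only 5 new tokens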

@ArthurZucker (Collaborator)

Yeah, it does not, because sometimes your prompt can be answered in fewer than 20 tokens!

If you explicitly set max_length and no max_new_tokens it will respect what you ask it to do!

@imantdaunhawer (Author)

I understand that the generation uses max_length instead of max_new_tokens if the former is set explicitly. It is also clear that some prompts can be answered in fewer tokens than the number specified.

Yet, I still think that generate shows inconsistent behavior depending on the value of model.generation_config.max_length (see the sketch after this list):

  • If model.generation_config.max_length is set to 20 (the default), then generate produces at most 20 new tokens, independent of the number of input tokens.
  • If model.generation_config.max_length is set to any value $Y \neq 20$, then generate produces at most $Y - X$ new tokens, where $X$ is the number of input tokens.
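
The two branches can be condensed into a small hypothetical helper (effective_new_token_cap is illustrative only, not part of the transformers API):

def effective_new_token_cap(config_max_length: int, input_len: int) -> int:
    # Mirrors the special case in _prepare_generated_length: the library
    # default of 20 is silently reinterpreted as "20 new tokens", while any
    # other value caps the total sequence length.
    if config_max_length == 20:
        return 20
    return max(config_max_length - input_len, 0)

print(effective_new_token_cap(20, 16))  # 20 new tokens
print(effective_new_token_cap(21, 16))  # 5 new tokens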

@zucchini-nlp (Member)

I agree with the above. We could change the defaults to max_length=None and max_new_tokens=20 to be more transparent, @ArthurZucker.
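
Until the defaults change, callers can already opt into the unambiguous behavior by passing max_new_tokens explicitly (sketch, reusing the reproduction objects):

# Always generates up to 20 new tokens, whatever generation_config.max_length says.
out = model.generate(input_ids=input_ids, max_new_tokens=20)
print(out.shape[-1])  # prompt length + 20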

@ArthurZucker (Collaborator)

Happy to update as well! Sorry that this created confusion; many things are kept for backward compatibility 😿

@gante (Member) commented Feb 15, 2025

I'll open a PR to update this behavior and make the default clearer :)

@gante linked a pull request (#36215) on Feb 15, 2025 that will close this issue.