
Inconsistent output lengths when max_length=20 is set implicitly vs explicitly in generate() #35765

imantdaunhawer opened this issue Jan 18, 2025 · 9 comments · May be fixed by #36215

@imantdaunhawer

System Info

  • transformers version: 4.49.0.dev0
  • Platform: macOS-15.1.1-arm64-arm-64bit
  • Python version: 3.11.10
  • Huggingface_hub version: 0.27.1
  • Safetensors version: 0.5.2
  • Accelerate version: 1.2.1
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.5.1 (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: no

Who can help?

@gante
@ArthurZucker

Related PR that discusses recent default max_length-related changes: #34814.

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

When using generate() with a model that has generation_config.max_length=20, the output length differs depending on whether max_length is passed explicitly or used implicitly from the generation_config.

from transformers import AutoModelForCausalLM, AutoTokenizer

# Setup from tests/generation/test_utils.py::GenerationIntegrationTests
article = "Today a dragon flew over Paris."
model = AutoModelForCausalLM.from_pretrained("hf-internal-testing/tiny-random-gpt2")
tokenizer = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-gpt2")
input_ids = tokenizer(article, return_tensors="pt").input_ids

# Case 1: Implicit max_length from generation_config
out_gen_implicit = model.generate(input_ids=input_ids)
print(out_gen_implicit.shape[-1])  # 36

# Case 2: Explicit max_length
out_gen_explicit = model.generate(
    input_ids=input_ids,
    max_length=model.generation_config.max_length
)
print(out_gen_explicit.shape[-1])  # 20

In the first case, the generated text is longer than in the second case (36 vs. 20 tokens).

Reason and scope

In the first case, max_length is overwritten as follows in file src/transformers/generation/utils.py, function _prepare_generated_length:

if generation_config.max_length == GenerationConfig().max_length:
    generation_config.max_length = generation_config.max_length + input_ids_length

Since GenerationConfig().max_length defaults to 20, the bug only affects models with generation_config.max_length set to 20.
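
This is easy to verify: a freshly constructed GenerationConfig carries the library-wide default that the branch above compares against.

from transformers import GenerationConfig

# The library default is max_length=20, so the special case above fires for
# any model whose generation_config.max_length happens to equal 20.
print(GenerationConfig().max_length)  # 20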

Expected behavior

The calls model.generate(input_ids=input_ids) and model.generate(input_ids=input_ids, max_length=model.generation_config.max_length) should generate texts of the same length when generation_config.max_length is set to 20.
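
As a sketch of the expected invariant, reusing model and input_ids from the reproduction above:

# Both lengths should be equal; with the current special case they are not.
implicit_len = model.generate(input_ids=input_ids).shape[-1]
explicit_len = model.generate(
    input_ids=input_ids, max_length=model.generation_config.max_length
).shape[-1]
print(implicit_len, explicit_len)  # expected: equal; currently 36 vs. 20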

@Rocketknight1 (Member)

cc @gante

@zucchini-nlp (Member)

By the way, is there a reason why we made the default max_length act differently from a user-defined value, instead of adding a new default of max_new_tokens=20, @gante?

@ArthurZucker (Collaborator)

That is expected: the default is to generate 20 new tokens, i.e. max_new_tokens, not max_length. max_length is really for when you don't want more than a certain total length, regardless of the input length.
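
A minimal sketch of the distinction, reusing model and input_ids from the reproduction above (the max_length=30 value is arbitrary and only for illustration):

# max_new_tokens bounds what is generated on top of the prompt...
out_new = model.generate(input_ids=input_ids, max_new_tokens=20)
print(out_new.shape[-1])    # prompt length + 20

# ...while max_length bounds the total sequence (prompt + generation).
out_total = model.generate(input_ids=input_ids, max_length=30)
print(out_total.shape[-1])  # at most 30, however long the prompt is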

@imantdaunhawer (Author)

That is expected: the default is to generate 20 new tokens, i.e. max_new_tokens, not max_length. max_length is really for when you don't want more than a certain total length, regardless of the input length.

Ok, I understand that.

But is it safe to assume that generation_config.max_length is 20 for all models and is never changed manually? Because whenever it is changed, model.generate(input_ids) no longer generates 20 new tokens by default.

For example, if generation_config.max_length = 21 is set, then model.generate(input_ids) treats it as a cap on the total length instead of generating 20 new tokens.
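
A sketch of that scenario, reusing the objects from the reproduction above (the prompt there tokenizes to 16 tokens, since 36 = 16 + 20):

model.generation_config.max_length = 21  # any value other than the default 20
out = model.generate(input_ids=input_ids)
print(out.shape[-1])  # at most 21 in total, i.e. only 5 new tokens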

@ArthurZucker (Collaborator)

Yeah, it does not, because sometimes your prompt can be answered in fewer than 20 tokens!

If you explicitly set max_length and no max_new_tokens it will respect what you ask it to do!

@imantdaunhawer (Author)

I understand that the generation uses max_length instead of max_new_tokens if the former is set explicitly. It is also clear that some prompts can be answered in fewer tokens than the number specified.

Yet, I still think that generate shows inconsistent behavior depending on the value of model.generation_config.max_length (see the sketch after this list):

  • If model.generation_config.max_length is set to 20 (the default), then generate produces at most 20 new tokens, independent of the number of input tokens.
  • If model.generation_config.max_length is set to any value $Y \neq 20$, then generate produces at most $Y - X$ new tokens, where $X$ is the number of input tokens.
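
The two branches can be condensed into a small hypothetical helper (effective_new_token_cap is illustrative only, not part of the transformers API):

def effective_new_token_cap(config_max_length: int, input_len: int) -> int:
    # Mirrors the special case in _prepare_generated_length: the library
    # default of 20 is silently reinterpreted as "20 new tokens", while any
    # other value caps the total sequence length.
    if config_max_length == 20:
        return 20
    return max(config_max_length - input_len, 0)

print(effective_new_token_cap(20, 16))  # 20 new tokens
print(effective_new_token_cap(21, 16))  # 5 new tokens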

@zucchini-nlp (Member)

I agree with the above. We could change the defaults to max_length=None and max_new_tokens=20 to be more transparent, @ArthurZucker.
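
Until the defaults change, callers can already opt into the unambiguous behavior by passing max_new_tokens explicitly (sketch, reusing the reproduction objects):

# Always generates up to 20 new tokens, whatever generation_config.max_length says.
out = model.generate(input_ids=input_ids, max_new_tokens=20)
print(out.shape[-1])  # prompt length + 20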

@ArthurZucker (Collaborator)

Happy to update as well! Sorry that this created confusion; many things are kept for backward compatibility 😿

@gante (Member) commented Feb 15, 2025

I'll open a PR to update this behavior and make the default clearer :)

@gante linked a pull request (#36215) on Feb 15, 2025 that will close this issue.