
resizing token embeddings causes output embedding to be reinitialized in post_init when tie_word_embedding is False #35141

Open
avishaiElmakies opened this issue Dec 7, 2024 · 9 comments · May be fixed by #36221
Labels
bug · fixme · Good Second Issue (issues that are more difficult than "Good First" issues - give it a try if you want!)

Comments

@avishaiElmakies
Contributor

System Info

  • transformers version: 4.46.3
  • Platform: Linux-6.6.20-aufs-1-x86_64-with-glibc2.36
  • Python version: 3.11.2
  • Huggingface_hub version: 0.26.1
  • Safetensors version: 0.4.5
  • Accelerate version: 1.0.1
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.5.1+cu124 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: No
  • Using GPU in script?: Yes
  • GPU type: NVIDIA A10

Who can help?

@ArthurZucker

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

This code reproduces the problem:

from transformers import AutoModelForCausalLM

pythia = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-410m")
pythia.resize_token_embeddings(502)
pythia.post_init()
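
A quick check (a sketch added here, not part of the original report) makes the reinitialization visible by snapshotting the resized output embedding weights before and after post_init:

import torch
from transformers import AutoModelForCausalLM

pythia = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-410m")
pythia.resize_token_embeddings(502)

# Snapshot the resized output head before post_init.
before = pythia.get_output_embeddings().weight.detach().clone()

pythia.post_init()

after = pythia.get_output_embeddings().weight.detach()
# On affected versions this reportedly prints False, because post_init
# re-ran _init_weights on the freshly created output head.
print(torch.equal(before, after))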

The default value of tie_word_embeddings in pythia is False.
I believe the problem arises from the fact that when tie_word_embeddings is False, resize_token_embeddings creates a new nn.Linear object that does not carry the _is_hf_initialized flag (so getattr returns False for it), and post_init then calls _init_weights on the new module.

The referenced line in src/transformers/modeling_utils.py:

new_lm_head = nn.Linear(
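
As a rough illustration (a sketch based on the flag check described above), the missing flag can be observed directly on the resized head; modules whose _is_hf_initialized attribute is truthy are skipped by the initialization pass, so the new head is not skipped:

lm_head = pythia.get_output_embeddings()
# The freshly constructed nn.Linear from resize_token_embeddings has no
# _is_hf_initialized attribute, so getattr falls back to False and the
# _init_weights pass in post_init does not skip this module.
print(getattr(lm_head, "_is_hf_initialized", False))  # False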

Expected behavior

post_init should not change the weights of output_embeddings after a resize.

@Rocketknight1
Member

@avishaiElmakies this might be a bug, but can you explain why you're calling post_init() after the layer resize?

@avishaiElmakies
Contributor Author

I am using the model inside a more complex model to which I add more modules. I want those to be initialized normally, which, from what I understand, should be done using post_init.

@Rocketknight1
Member

@avishaiElmakies I understand! In that case, would setting _is_hf_initialized in resize_token_embeddings fix the issue? Would you be willing to make that PR?

@avishaiElmakies
Contributor Author

avishaiElmakies commented Dec 10, 2024

@Rocketknight1 yes, I think setting _is_hf_initialized in the function would fix the issue. I would be willing to open a PR, but I don't think I will have the time this week, so it's fine if someone else wants to pick it up.
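
For reference, a possible user-side workaround until a fix lands (a sketch, not an official API; it simply sets the same flag the proposed fix would set inside resize_token_embeddings):

from transformers import AutoModelForCausalLM

pythia = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-410m")
pythia.resize_token_embeddings(502)

# Mark the resized output head as already initialized so that post_init's
# _init_weights pass skips it. The resized input embedding may warrant the
# same treatment, since it is also newly constructed by the resize.
pythia.get_output_embeddings()._is_hf_initialized = True
pythia.get_input_embeddings()._is_hf_initialized = True

pythia.post_init()  # the resized embeddings are now left untouched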

@Rocketknight1
Member

@avishaiElmakies thank you! The PR isn't urgent, but we'd definitely appreciate the fix if you get a chance.

@avishaiElmakies
Contributor Author

Thank you for the response!

@ArthurZucker
Collaborator

A PR is most welcome; I am not sure this is the intended API, but it might be good to have!


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

ArthurZucker reopened this on Jan 23, 2025
ArthurZucker added the Good Second Issue label on Jan 23, 2025
@sambhavnoobcoder
Contributor

sambhavnoobcoder commented Feb 16, 2025

Hi @Rocketknight1 @ArthurZucker,
I've submitted PR #36221 to address this issue. While analyzing the problem, I found that the solution might be straightforward: adding the _is_hf_initialized flag to prevent unintended reinitialization during post_init().
I've added comprehensive tests that verify the behaviour, but given the minimal nature of the change, I'd appreciate your review to make sure I haven't overlooked any aspect of the issue. The tests pass, but I want to confirm the solution aligns with the intended behaviour. I also understand if adding tests for this feels like overkill; I kept them in for initial reference and to make the desired behaviour easy to verify.
Looking forward to your feedback!
