
resizing token embeddings causes output embedding to be reinitialized in post_init when tie_word_embedding is False #35141

Open
avishaiElmakies opened this issue Dec 7, 2024 · 9 comments · May be fixed by #36221
Labels
bug · fixme · Good Second Issue (issues that are more difficult than "Good First" issues - give it a try if you want!)

Comments

@avishaiElmakies
Contributor

System Info

  • transformers version: 4.46.3
  • Platform: Linux-6.6.20-aufs-1-x86_64-with-glibc2.36
  • Python version: 3.11.2
  • Huggingface_hub version: 0.26.1
  • Safetensors version: 0.4.5
  • Accelerate version: 1.0.1
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.5.1+cu124 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: No
  • Using GPU in script?: Yes
  • GPU type: NVIDIA A10

Who can help?

@ArthurZucker

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

This code reproduces the problem:

from transformers import AutoModelForCausalLM

pythia = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-410m")
pythia.resize_token_embeddings(502)
pythia.post_init()
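
A quick check (a sketch added here, not part of the original report) makes the reinitialization visible by snapshotting the resized output embedding weights before and after post_init:

import torch
from transformers import AutoModelForCausalLM

pythia = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-410m")
pythia.resize_token_embeddings(502)

# Snapshot the resized output head before post_init.
before = pythia.get_output_embeddings().weight.detach().clone()

pythia.post_init()

after = pythia.get_output_embeddings().weight.detach()
# On affected versions this reportedly prints False, because post_init
# re-ran _init_weights on the freshly created output head.
print(torch.equal(before, after))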

The default value of tie_word_embeddings in pythia is False.
I believe the problem arises from the fact that when tie_word_embeddings is False, resize_token_embeddings creates a new nn.Linear object that does not carry the _is_hf_initialized flag (so getattr returns False for it), and post_init then calls _init_weights on the new module.

The referenced line in src/transformers/modeling_utils.py:

new_lm_head = nn.Linear(
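
As a rough illustration (a sketch based on the flag check described above), the missing flag can be observed directly on the resized head; modules whose _is_hf_initialized attribute is truthy are skipped by the initialization pass, so the new head is not skipped:

lm_head = pythia.get_output_embeddings()
# The freshly constructed nn.Linear from resize_token_embeddings has no
# _is_hf_initialized attribute, so getattr falls back to False and the
# _init_weights pass in post_init does not skip this module.
print(getattr(lm_head, "_is_hf_initialized", False))  # False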

Expected behavior

post_init should not change the weights of output_embeddings after a resize.

@Rocketknight1
Member

@avishaiElmakies this might be a bug, but can you explain why you're calling post_init() after the layer resize?

@avishaiElmakies
Contributor Author

I am using the model inside a more complex model to which I add more modules. I want those to be initialized normally, which, from what I understand, should be done using post_init.

@Rocketknight1
Member

@avishaiElmakies I understand! In that case, would setting _is_hf_initialized in resize_token_embeddings fix the issue? Would you be willing to make that PR?

@avishaiElmakies
Contributor Author

avishaiElmakies commented Dec 10, 2024

@Rocketknight1 yes, I think setting _is_hf_initialized in the function would fix the issue. I would be willing to open a PR, but I don't think I will have the time this week, so it's fine if someone else wants to pick it up.
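
For reference, a possible user-side workaround until a fix lands (a sketch, not an official API; it simply sets the same flag the proposed fix would set inside resize_token_embeddings):

from transformers import AutoModelForCausalLM

pythia = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-410m")
pythia.resize_token_embeddings(502)

# Mark the resized output head as already initialized so that post_init's
# _init_weights pass skips it. The resized input embedding may warrant the
# same treatment, since it is also newly constructed by the resize.
pythia.get_output_embeddings()._is_hf_initialized = True
pythia.get_input_embeddings()._is_hf_initialized = True

pythia.post_init()  # the resized embeddings are now left untouched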

@Rocketknight1
Member

@avishaiElmakies thank you! The PR isn't urgent, but we'd definitely appreciate the fix if you get a chance.

@avishaiElmakies
Contributor Author

Thank you for the response!

@ArthurZucker
Collaborator

A PR is most welcome; I am not sure this is the intended API, but it might be good to have!


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

ArthurZucker reopened this on Jan 23, 2025
ArthurZucker added the Good Second Issue label on Jan 23, 2025
@sambhavnoobcoder
Contributor

sambhavnoobcoder commented Feb 16, 2025

Hi @Rocketknight1 @ArthurZucker,
I've submitted PR #36221 to address this issue. While analyzing the problem, I found that the solution might be straightforward: adding the _is_hf_initialized flag to prevent unintended reinitialization during post_init().
I've added comprehensive tests that verify the behaviour, but given the minimal nature of the change, I'd appreciate your review to make sure I haven't overlooked any aspect of the issue. The tests pass, but I want to confirm the solution aligns with the intended behaviour. I also understand if adding tests for this feels like overkill; I kept them in for initial reference and to make the desired behaviour easy to verify.
Looking forward to your feedback!
