
[Bug Report] Padding side inconsistency with Huggingface Transformers #801

Open
1 task done
spfrommer opened this issue Nov 27, 2024 · 0 comments
Labels
  • bug: Something isn't working
  • complexity-moderate: Moderately complicated issues for people who have intermediate experience with the code
  • needs-investigation: Issues that need to be recreated, or investigated before work can be done


Describe the bug
The HookedTransformer tokenizer has its padding side set to "right" for Gemma 2 2B, whereas the Hugging Face AutoTokenizer for the same checkpoint has it set to "left". I'm not sure why these are inconsistent.

Code example

from transformer_lens import HookedTransformer
from transformers import AutoTokenizer

# Load the same checkpoint through TransformerLens and directly through Hugging Face.
model = HookedTransformer.from_pretrained('google/gemma-2-2b')
tokenizer = AutoTokenizer.from_pretrained('google/gemma-2-2b')

# The two tokenizers disagree on the padding side.
print(model.tokenizer.padding_side)
print(tokenizer.padding_side)

Output:

right
left
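
Possible workaround

For now I am overriding the padding side on the TransformerLens tokenizer directly. This is only a sketch under the assumption that downstream padding in HookedTransformer follows tokenizer.padding_side; padding_side is a standard Hugging Face tokenizer attribute, but I have not checked every code path in transformer_lens that pads inputs.

from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained('google/gemma-2-2b')

# Assumed workaround: align the wrapped tokenizer with the Hugging Face
# AutoTokenizer default for this model by padding on the left. Whether every
# TransformerLens code path honors this override has not been verified.
model.tokenizer.padding_side = 'left'

print(model.tokenizer.padding_side)  # now prints "left"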

System Info
Linux; installed with pip in a Python 3.10.12 virtualenv. Package versions are:

  • transformer_lens: 2.9.0
  • transformers: 4.46.1

Checklist

  • I have checked that there is no similar issue in the repo (required)
@bryce13950 added the bug, complexity-moderate, and needs-investigation labels on Dec 9, 2024.