[Bug Report] Global and Local Attn layer order of Gemma2 is wrong? #778
Labels
complexity-moderate (Moderately complicated issues for people who have intermediate experience with the code)
implementation-inaccuracy (Any issues related to our implementation being off from the official version)
Describe the bug
In HuggingFace's implementation, the attention layers of Gemma2-2b are ordered [local, global, local, ...]. However, the Gemma2-2b configuration in transformer_lens uses [global, local, global, ...].
Code example
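A minimal sketch of how to see the order on the transformer_lens side (this is my reading of the TransformerLens API; `cfg.attn_types` and the model name may need adjustment for other versions, and gemma-2-2b is a gated model that requires HF authentication):

```python
from transformer_lens import HookedTransformer

# Load gemma-2-2b and inspect the per-layer attention types TransformerLens assigns.
model = HookedTransformer.from_pretrained("gemma-2-2b")
print(model.cfg.attn_types[:6])
# With transformer_lens 2.8.1 this reportedly starts with
# ['global', 'local', 'global', 'local', 'global', 'local'],
# i.e. global attention on even-indexed layers.
```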
For comparison, the HuggingFace implementation can be checked here:
https://github.com/huggingface/transformers/blob/a06a0d12636756352494b99b5b264ac9955bc735/src/transformers/models/gemma2/modeling_gemma2.py#L505
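As I read the linked code, the sliding-window (local) layers are selected with a `not bool(layer_idx % 2)` check, which implies this pattern (a quick illustration, assuming that reading is correct):

```python
# Even-indexed layers use sliding-window (local) attention,
# odd-indexed layers use global attention.
hf_layer_types = ["local" if i % 2 == 0 else "global" for i in range(6)]
print(hf_layer_types)  # ['local', 'global', 'local', 'global', 'local', 'global']
```

This is the opposite of the [global, local, ...] order in the transformer_lens configuration above.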
Additional context
I'm using transformer_lens 2.8.1.