[Bug Report] Global and Local Attn layer order of Gemma2 is wrong? #778
Labels
complexity-moderate (Moderately complicated issues for people who have intermediate experience with the code)
implementation-inaccuracy (Any issues related to our implementation being off from the official version)
Describe the bug
In HuggingFace's implementation, the attention layers of Gemma2-2b are ordered [local, global, local, ...]. However, the Gemma2-2b configuration in transformer_lens uses [global, local, global, ...].
Code example
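A minimal sketch of how to see the order on the transformer_lens side (this is my reading of the TransformerLens API; `cfg.attn_types` and the model name may need adjustment for other versions, and gemma-2-2b is a gated model that requires HF authentication):

```python
from transformer_lens import HookedTransformer

# Load gemma-2-2b and inspect the per-layer attention types TransformerLens assigns.
model = HookedTransformer.from_pretrained("gemma-2-2b")
print(model.cfg.attn_types[:6])
# With transformer_lens 2.8.1 this reportedly starts with
# ['global', 'local', 'global', 'local', 'global', 'local'],
# i.e. global attention on even-indexed layers.
```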
For comparison, the HuggingFace implementation can be checked here:
https://github.com/huggingface/transformers/blob/a06a0d12636756352494b99b5b264ac9955bc735/src/transformers/models/gemma2/modeling_gemma2.py#L505
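As I read the linked code, the sliding-window (local) layers are selected with a `not bool(layer_idx % 2)` check, which implies this pattern (a quick illustration, assuming that reading is correct):

```python
# Even-indexed layers use sliding-window (local) attention,
# odd-indexed layers use global attention.
hf_layer_types = ["local" if i % 2 == 0 else "global" for i in range(6)]
print(hf_layer_types)  # ['local', 'global', 'local', 'global', 'local', 'global']
```

This is the opposite of the [global, local, ...] order in the transformer_lens configuration above.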
Additional context
I'm using transformer_lens 2.8.1.