How does the recovery of locality work? #1
Comments
Hi! Thank you for the great question. Yes, I think that is a fair way to understand the operation. Let me explain using the shifted position matrix of Llama3.1 128K as an example: If we do not set the …
If my answer does not fully address your question, please feel free to ask for further clarification.
Thanks for the response! I assume there will be some overlap in position ids, e.g. multiple tokens associated with position 128? Can the model work in this setting without further training?
In fact, positions …
Congrats on the great paper!
I have a simple question about the recovery of locality.
By adding a `w`, it literally shifts the bottom-left corner to the right, which just reverts the first step of shifting leftwards. Am I misunderstanding this operation, or are there more tricks or findings here?
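To make the question concrete, here is a minimal sketch of the operation as the question describes it, not the authors' implementation. It assumes a causal relative position matrix whose entry `[i][j]` is `i - j`, that the bottom-left corner (entries with `i - j >= shift`) is shifted leftwards by `shift`, and that `window` (the `w` above) is then added back to that same corner. The function names and the parameters `shift` and `window` are hypothetical, chosen for illustration only.

```python
def relative_positions(n):
    """Causal relative positions: entry [i][j] = i - j for j <= i, else None."""
    return [[i - j if j <= i else None for j in range(n)] for i in range(n)]


def shifted_positions(n, shift, window=0):
    """Shift the bottom-left corner left by `shift`, then right by `window`.

    Assumption: the "corner" is every entry with i - j >= shift.
    Entries closer to the diagonal keep their original relative position.
    """
    out = relative_positions(n)
    for i in range(n):
        for j in range(i + 1):
            if i - j >= shift:
                out[i][j] = i - j - shift + window
    return out


if __name__ == "__main__":
    # Last row of an 8-token sequence, corner shifted by 4.
    print(shifted_positions(8, shift=4)[7])            # [3, 2, 1, 0, 3, 2, 1, 0]
    # Same row after adding w = 2 back to the corner.
    print(shifted_positions(8, shift=4, window=2)[7])  # [5, 4, 3, 2, 3, 2, 1, 0]
```

In this toy setting, the last row contains repeated position ids both with and without `window`. With `window=0` the farthest tokens reuse the ids 0 and 1, which belong to the nearest neighbours; with `window=2` the corner starts at id 2, so the two smallest ids are held only by tokens next to the diagonal.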