linear_focus_attention question #1
Comments
Thank you for your comments. This is a typo; our results were obtained without this operation. Our motivation was to make attention more focused, particularly concerning …
I understand. Thank you for your answer!
Sorry, I still have a question. In the paper, formula (18) deals with Z_t by subtracting the reciprocal of the curvature k, while the code adds the curvature k. Looking forward to your answer!
Thanks for your question! For the first question: if we fix the curvature …, … Therefore, we directly set a … For the second question: using … But why do we use …
Thank you for your answer! |
In the linear_focus_attention part, why is an operation like phi_qs = (F.relu(qs) + 1e-6) / (self.norm_scale.abs() + 1e-6) not also applied to the v values? I ask because formula (15) in the paper applies the Phi function to all of Q_s, K_s, and V_s.
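For readers following this thread, the question above concerns linear attention with a focusing map phi applied to the queries and keys but not the values. The following is a minimal NumPy sketch of that pattern, not the repository's actual implementation: the function name, shapes, and the scalar `norm_scale` are assumptions, and phi is taken verbatim from the snippet quoted in the question.

```python
import numpy as np

def linear_focus_attention(qs, ks, vs, norm_scale, eps=1e-6):
    """Sketch (hypothetical) of linear attention with a focusing map.

    phi(x) = (relu(x) + eps) / (|norm_scale| + eps) is applied to the
    queries and keys only; vs is used as-is, which is precisely the
    asymmetry the question asks about.
    qs: [n_q, d], ks: [n_k, d], vs: [n_k, d_v], norm_scale: [d] or scalar.
    """
    relu = lambda x: np.maximum(x, 0.0)
    phi_q = (relu(qs) + eps) / (np.abs(norm_scale) + eps)   # [n_q, d]
    phi_k = (relu(ks) + eps) / (np.abs(norm_scale) + eps)   # [n_k, d]
    # Kernel trick of linear attention: compute (phi_k^T V) once, so the
    # cost is O(n * d * d_v) instead of the O(n_q * n_k) softmax matrix.
    kv = phi_k.T @ vs                                       # [d, d_v]
    num = phi_q @ kv                                        # [n_q, d_v]
    den = phi_q @ phi_k.sum(axis=0)[:, None]                # [n_q, 1]
    return num / (den + eps)
```

Because phi is non-negative, each output row is a convex combination of the rows of `vs`; feeding a constant `vs` therefore returns (approximately) that constant, which is an easy sanity check for this kind of kernelized attention.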