It would be nice to have a Pallas implementation of the following paper (Native Sparse Attention) by DeepSeek:
https://arxiv.org/pdf/2502.11089
Here is a JAX implementation by me:
https://github.com/OhadRubin/nsa_attention_jax
Actually, I finished this implementation back in October of 2024, but it is unoptimized.
Just thought I would let you know, since NSA is compatible with existing pre-trained regular-attention models and might see some nice speedups once it's optimized.
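For context, here is a rough, unoptimized sketch of the three-branch structure NSA describes (block compression, block selection, and a sliding window, mixed with learned per-branch gates) in plain JAX. This is not the kernel from the linked repo: the mean-pooling compression, the gate projection `gate_w`, and the use of dense masks instead of gathers are simplifications for illustration, and causal masking is only applied in the window branch.

```python
import jax
import jax.numpy as jnp


def attend(q, k, v, mask=None):
    """Plain scaled dot-product attention for a single head."""
    scores = q @ k.T / jnp.sqrt(q.shape[-1])
    if mask is not None:
        scores = jnp.where(mask, scores, -1e9)
    return jax.nn.softmax(scores, axis=-1) @ v


def nsa_single_head(q, k, v, gate_w, block=16, top_blocks=2, window=32):
    """Mix compressed, selected, and sliding-window branches with learned gates.

    Simplifications vs. the paper: mean-pooling stands in for the learned
    compression, and causal masking is only applied in the window branch.
    Assumes the sequence length is a multiple of `block`.
    """
    T, d = k.shape
    n_blocks = T // block

    # 1) Compression branch: pool each K/V block to one token and attend.
    k_cmp = k.reshape(n_blocks, block, d).mean(axis=1)
    v_cmp = v.reshape(n_blocks, block, d).mean(axis=1)
    out_cmp = attend(q, k_cmp, v_cmp)

    # 2) Selection branch: score blocks against the compressed keys, keep the
    #    top-n blocks per query, and attend densely inside them (done with a
    #    dense mask here, which is the unoptimized part).
    blk_scores = q @ k_cmp.T / jnp.sqrt(d)                     # [Tq, n_blocks]
    top = jax.lax.top_k(blk_scores, top_blocks)[1]             # [Tq, top_blocks]
    sel = jnp.zeros((q.shape[0], n_blocks), bool).at[
        jnp.arange(q.shape[0])[:, None], top].set(True)
    out_sel = attend(q, k, v, jnp.repeat(sel, block, axis=1))

    # 3) Sliding-window branch: each query sees its last `window` positions.
    pos_q = jnp.arange(q.shape[0])[:, None]
    pos_k = jnp.arange(T)[None, :]
    out_win = attend(q, k, v, (pos_k <= pos_q) & (pos_k > pos_q - window))

    # Per-branch sigmoid gates computed from the query mix the three outputs.
    gates = jax.nn.sigmoid(q @ gate_w)                         # [Tq, 3]
    return (gates[:, 0:1] * out_cmp
            + gates[:, 1:2] * out_sel
            + gates[:, 2:3] * out_win)
```

An optimized Pallas kernel would replace the dense selection mask with gathers of only the chosen K/V blocks, which is where the sparsity actually turns into a speedup over full attention.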