Is there code available, either here or elsewhere, that implements a PyTorch layer for streaming attention with sinks, independent of the various LLMs it goes into?
I can imagine turning this into a separate layer:
https://github.com/mit-han-lab/streaming-llm/blob/main/streaming_llm/pos_shift/modify_llama.py
But if that work has already been done somewhere else, that would be a great time saver.
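For reference, a standalone version of the idea could look roughly like the sketch below: a single-head attention module whose KV cache always retains the first `n_sink` tokens ("attention sinks") plus a rolling window of the most recent tokens. This is a hypothetical simplification, not the repo's implementation — the class name, the single-head/no-RoPE setup, and the one-token-per-step interface are all my assumptions; the real `modify_llama.py` additionally re-applies rotary embeddings to shifted cache positions.

```python
import torch
import torch.nn as nn


class StreamingSinkAttention(nn.Module):
    """Hypothetical standalone attention layer with attention sinks.

    The KV cache keeps the first `n_sink` tokens plus the most recent
    `window` tokens, evicting everything in between — a simplified,
    single-head, RoPE-free sketch of the streaming-llm idea.
    """

    def __init__(self, dim: int, n_sink: int = 4, window: int = 252):
        super().__init__()
        self.dim, self.n_sink, self.window = dim, n_sink, window
        self.q_proj = nn.Linear(dim, dim, bias=False)
        self.k_proj = nn.Linear(dim, dim, bias=False)
        self.v_proj = nn.Linear(dim, dim, bias=False)
        self.o_proj = nn.Linear(dim, dim, bias=False)
        self.k_cache = None  # (1, cache_len, dim)
        self.v_cache = None

    def _evict(self, cache: torch.Tensor) -> torch.Tensor:
        # Keep sink tokens + most recent `window` tokens; drop the middle.
        max_len = self.n_sink + self.window
        if cache.size(1) <= max_len:
            return cache
        return torch.cat(
            [cache[:, : self.n_sink], cache[:, -self.window:]], dim=1
        )

    @torch.no_grad()
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (1, 1, dim) — one new token per decoding step.
        q = self.q_proj(x)
        k, v = self.k_proj(x), self.v_proj(x)
        if self.k_cache is None:
            self.k_cache, self.v_cache = k, v
        else:
            self.k_cache = self._evict(torch.cat([self.k_cache, k], dim=1))
            self.v_cache = self._evict(torch.cat([self.v_cache, v], dim=1))
        # Plain scaled dot-product attention over the bounded cache.
        attn = torch.softmax(
            q @ self.k_cache.transpose(1, 2) / self.dim ** 0.5, dim=-1
        )
        return self.o_proj(attn @ self.v_cache)
```

The point of the sketch is that cache memory stays bounded at `n_sink + window` entries no matter how many tokens are streamed in, while the sink tokens (which soak up disproportionate attention mass) are never evicted.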