Draft PR Adding mistral 0.1 #1131
Conversation
This is ready for review! Something might be up with the self-hosted runner for tests? It seems not to have the proper packages installed, including pytest.
Does this require flash-attention >= 2.3? Sliding window attention is only available from that version (see https://github.com/Dao-AILab/flash-attention?tab=readme-ov-file#23-local-ie-sliding-window-attention). With the current Docker image (…
Ah yes, you're correct. I will add a check for this in #1162.
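A minimal sketch of the kind of guard discussed above, assuming the check keys off the installed `flash_attn` version; the actual check lands in #1162 and the argument name used here (`sliding_window_width`) is purely illustrative:

```python
# Hedged sketch: refuse to enable sliding window attention unless
# flash-attn >= 2.3 is installed (window_size was introduced in 2.3).
from packaging import version

try:
    import flash_attn
    _FLASH_SWA_OK = version.parse(flash_attn.__version__) >= version.parse("2.3.0")
except ImportError:
    _FLASH_SWA_OK = False


def check_sliding_window_support(neox_args):
    # `sliding_window_width` is a hypothetical attribute name for this example.
    if getattr(neox_args, "sliding_window_width", None) is not None and not _FLASH_SWA_OK:
        raise ValueError(
            "Sliding window attention requires flash-attn >= 2.3; "
            "please upgrade the package (or the Docker image)."
        )
```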
This is the PR for the October addition of support for Mistral 7B v0.1 in GPT-NeoX, referred to in issue 1050.
Among other things, this PR also adds support for sliding window attention in GPT-NeoX, both through FlashAttention2 and through Megatron.
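To illustrate the FlashAttention2 path, here is a minimal sketch (assuming flash-attn >= 2.3 and a CUDA device) of how sliding window attention is exposed by `flash_attn_func`: `window_size=(left, right)` restricts each query to the previous `left` keys and the following `right` keys, with `(-1, -1)` meaning no window.

```python
import torch
from flash_attn import flash_attn_func

# Toy tensors in the (batch, seqlen, n_heads, head_dim) layout flash-attn expects.
batch, seqlen, n_heads, head_dim = 2, 8192, 32, 128
q = torch.randn(batch, seqlen, n_heads, head_dim, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# Mistral 7B v0.1 uses a 4096-token sliding window; causal=True for decoding,
# so the right side of the window is 0.
out = flash_attn_func(q, k, v, causal=True, window_size=(4096, 0))
```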
An example script is included to show how to run the conversion of a HuggingFace (HF) Mistral 7B v0.1 model into corresponding GPT-NeoX checkpoints.
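This is not the PR's conversion script, but a hedged sketch of the HF side of such a conversion: loading the public `mistralai/Mistral-7B-v0.1` checkpoint with `transformers` and reading the config fields a GPT-NeoX conversion has to map (the actual remapping of parameter names and model-parallel sharding is what the included script handles).

```python
from transformers import AutoConfig, AutoModelForCausalLM

hf_config = AutoConfig.from_pretrained("mistralai/Mistral-7B-v0.1")
print(hf_config.hidden_size)          # 4096
print(hf_config.num_hidden_layers)    # 32
print(hf_config.num_attention_heads)  # 32
print(hf_config.num_key_value_heads)  # 8   (grouped-query attention)
print(hf_config.sliding_window)       # 4096 (sliding window attention)

# The conversion then walks the HF state dict and writes out the
# corresponding GPT-NeoX checkpoint shards.
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1", torch_dtype="auto")
state_dict = model.state_dict()
```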
The items left to do since then are to:
- test the converted checkpoints against the HF reference model (conversion with pp>0 is not supported). Note: issue 1124 recently added support for conversion back to HF to enable such testing and is also concerned with supporting pp>0 in the conversion scripts.