Support Mistral Models #1050
Mistral just released a nice 7B. Let's support loading it into gpt-neox.
Comments
Is somebody already actively working on this?
Hello, yes, I have the model implemented in the adding-mistral-0.1 branch of my fork, but I'm currently still testing it. A few items are left to wrap up.
@AIproj Sounds great! Good progress :)
@AIproj Any updates on this?
Yep, had a meeting today with @haileyschoelkopf to figure out some bugs and test training. By the way, one of the bugs we ran into requires a PR to be merged in the DeeperSpeed repo (link). We're meeting again tomorrow, hopefully to wrap this up ASAP.
Training works. The current issues revolve around running lm-eval on a NeoX model (I haven't converted to HF yet): I'm using the DS 0.12-based DeeperSpeed, and it seems some things broke. To give more details, some attributes of self.model no longer resolve.
I had no errors during training, since training doesn't access the self.model attributes I mentioned.
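To illustrate the kind of breakage being described, here is a minimal sketch, assuming the DeepSpeed engine wraps the underlying NeoX module and stops exposing some of its attributes directly after the version bump. Both get_model_attr and sequence_length are hypothetical names used purely for illustration, not anything from the actual lm-eval or gpt-neox code:

```python
# A minimal sketch, assuming the DeepSpeed engine wraps the underlying
# module in a `module` attribute and no longer forwards attribute access.
# `sequence_length` below is a hypothetical attribute name.
def get_model_attr(model, name, default=None):
    # Try the (possibly wrapping) engine first, then fall back to the
    # wrapped module, then to the caller-supplied default.
    if hasattr(model, name):
        return getattr(model, name)
    inner = getattr(model, "module", None)
    if inner is not None and hasattr(inner, name):
        return getattr(inner, name)
    return default

# Hypothetical usage inside an eval adapter:
# max_len = get_model_attr(engine, "sequence_length", default=2048)
```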
We have updated
Closed by #1131, which allows Mistral-7B-v0.1 and the instruct versions 0.1 and 0.2 to be converted from the Meta/Mistral distributed-weights format, trained in NeoX, and exported to HF.
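For reference, a minimal sketch of consuming such an HF export with the standard transformers API. The export directory path is hypothetical, and this uses generic transformers calls rather than the PR's own tooling:

```python
# A minimal sketch, assuming the NeoX-to-HF export lands in a local
# directory; "./mistral-7b-neox-export" is a hypothetical path.
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "./mistral-7b-neox-export"  # hypothetical export directory
model = AutoModelForCausalLM.from_pretrained(path)
tokenizer = AutoTokenizer.from_pretrained(path)

# Quick smoke test: generate a few tokens from the exported checkpoint.
inputs = tokenizer("Mistral says:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```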