[New Model]: Codestral Mamba #6479

Open
K-Mistele opened this issue Jul 16, 2024 · 4 comments · May be fixed by #9292
Labels: new model (Requests to new models), stale

Comments

@K-Mistele
Contributor

The model to consider.

Mamba Codestral: https://huggingface.co/mistralai/mamba-codestral-7B-v0.1

Highlights:

  • SOTA 7B code model
  • theoretically unlimited context length; tested up to 256k
  • inference cost scales linearly with sequence length, whereas transformer attention scales quadratically
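
To make the linear-versus-quadratic point concrete, here is a rough sketch that counts idealized operations for each approach. The function names are made up for illustration. The counts ignore hidden-dimension constants and implementation details such as KV caching, so treat them as a scaling illustration and not as a benchmark of Codestral Mamba or any transformer.

```python
def attention_score_count(seq_len: int) -> int:
    """Causal self-attention over a full sequence: token i attends to
    tokens 0..i, so it computes i + 1 scores. Total grows as O(n^2)."""
    return seq_len * (seq_len + 1) // 2


def recurrent_update_count(seq_len: int) -> int:
    """Recurrent / state-space scan: one fixed-size state update per
    token, regardless of how many tokens came before. Total is O(n)."""
    return seq_len


if __name__ == "__main__":
    # 256_000 matches the "tested up to 256k" figure from the highlights.
    for n in (1_000, 16_000, 256_000):
        ratio = attention_score_count(n) / recurrent_update_count(n)
        print(f"n={n:>7}: attention/recurrent operation ratio = {ratio:,.1f}")
```

Under these assumptions, doubling the sequence length doubles the recurrent count but roughly quadruples the attention count, which is why the gap widens at long context.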

The closest model vllm already supports.

Jamba seems to be the closest model, since it is Mamba-based: https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/jamba.py

What's your difficulty of supporting the model you want?

Mamba is a non-transformer architecture, but vLLM already supports one Mamba-based model (Jamba), so it's unclear how difficult this one would be to support.

@K-Mistele added the "new model" label on Jul 16, 2024
@simon-mo
Collaborator

cc @tlrmchlsmth who is working on it

@digantamisra98

Any updates on this?

@tlrmchlsmth
Collaborator

@digantamisra98 I have this working on a branch. I'm planning to land #6484 soon (possibly today) and will follow up with Mamba2 support afterwards, which will include support for Codestral Mamba.

@tlrmchlsmth linked a pull request on Oct 12, 2024 that will close this issue

github-actions bot commented Jan 9, 2025

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

@github-actions bot added the "stale" label on Jan 9, 2025